What is the meaning of UTF-8, UTF-16, UTF-32 and UCS-2?
Unicode defines the mapping between code points and symbols; the actual byte encoding of those code points is specified by a Unicode Transformation Format (UTF). The most commonly used UTF encodings are:
- UTF-32
  - a character is represented by one 4-byte integer
- UTF-16
  - a character is represented by one or two 2-byte integers
- UTF-8
  - a character requires between 1 and 4 bytes
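The sizes listed above can be checked with a short Python sketch. The helper name `encoded_sizes` and the sample characters are illustrative choices, not part of the product documentation; the little-endian codec variants are used only so that Python does not prepend a byte order mark to the output.

```python
def encoded_sizes(ch):
    """Return the number of bytes needed to encode ch in each UTF encoding.

    The "-le" codec variants are used so that no byte order mark (BOM)
    is added to the result, which would inflate the counts.
    """
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


# Sample characters: ASCII letter, accented Latin letter, euro sign,
# and a character outside the first 65536 code points.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print("U+%04X" % ord(ch), encoded_sizes(ch))
```

UTF-32 always uses 4 bytes, UTF-16 uses 2 bytes except for the last sample (4 bytes), and UTF-8 grows from 1 to 4 bytes across the four samples.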
Compared to the other UTF encodings, UTF-8 has the advantage of being compatible with ASCII: a text that consists only of ASCII characters has the same representation in UTF-8 and ASCII. As a consequence, UTF-8 is also more compact than the other UTF encodings for English and most European languages, because most of the characters in such texts belong to the ASCII set and therefore take a single byte each.
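Both properties, ASCII compatibility and compactness, can be illustrated with a minimal Python sketch. The function names `same_as_ascii` and `encoded_lengths` and the sample strings are assumptions made for this example only.

```python
def same_as_ascii(text):
    """Return True if text is pure ASCII and its UTF-8 bytes equal its ASCII bytes."""
    try:
        return text.encode("utf-8") == text.encode("ascii")
    except UnicodeEncodeError:
        # The text contains a character that ASCII cannot represent.
        return False


def encoded_lengths(text):
    """Return the byte length of text in each UTF encoding (no byte order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


print(same_as_ascii("Hello, world"))              # True
print(encoded_lengths("caf\u00e9 cr\u00e8me"))    # 10 characters, 2 of them non-ASCII
```

For the second sample, the eight ASCII characters take one byte each in UTF-8 and the two accented letters take two bytes each, giving 12 bytes against 20 in UTF-16 and 40 in UTF-32.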
UCS-2 (2-byte Universal Character Set) is a deprecated encoding originally used in Windows and Java. It encodes each character as a single 2-byte integer and is therefore limited to the first 65536 code points of Unicode, which is why it has gradually been replaced by UTF-16.
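The difference between UCS-2 and UTF-16 shows up for code points above U+FFFF, which UTF-16 encodes as a pair of 2-byte units (a surrogate pair) and which UCS-2 cannot represent at all. The following Python sketch illustrates this; `utf16_code_units` and `fits_in_ucs2` are hypothetical helper names, and `fits_in_ucs2` is a simplified range check on a single character rather than a full UCS-2 validator.

```python
import struct


def utf16_code_units(ch):
    """Return the list of 16-bit code units used by UTF-16 to encode ch."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def fits_in_ucs2(ch):
    """Simplified check: a single 2-byte unit only covers U+0000 to U+FFFF."""
    return ord(ch) <= 0xFFFF


for ch in ["\u20ac", "\U0001F600"]:
    units = ["0x%04X" % u for u in utf16_code_units(ch)]
    print("U+%04X" % ord(ch), units, "UCS-2 OK" if fits_in_ucs2(ch) else "needs UTF-16")
```

The euro sign (U+20AC) needs a single unit, while U+1F600 needs the two units 0xD83D and 0xDE00 and so lies outside the range that UCS-2 can encode.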
© 2001-2019 Fair Isaac Corporation. All rights reserved. This documentation is the property of Fair Isaac Corporation (“FICO”). Receipt or possession of this documentation does not convey rights to disclose, reproduce, make derivative works, use, or allow others to use it except solely for internal evaluation purposes to determine whether to purchase a license to the software described in this documentation, or as otherwise set forth in a written software license agreement between you and FICO (or a FICO affiliate). Use of this documentation and the software described in it must conform strictly to the foregoing permitted uses, and no other use is permitted.