What is the meaning of UTF-8, UTF-16, UTF-32 and UCS-2?

Unicode defines the mapping between code points and symbols; the actual byte-level encoding of those code points is specified by a Unicode Transformation Format (UTF). The most commonly used UTF encodings are:

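UTF-32: every code point is represented by a single 4-byte integer
UTF-16: a code point is represented by one or two 2-byte integers (two are needed for code points above U+FFFF, which are encoded as a surrogate pair)
UTF-8: a code point requires between 1 and 4 bytes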
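As a quick illustration of the sizes listed above, the following Python sketch encodes a few sample characters and counts the resulting bytes. The helper name `encoded_sizes` and the choice of sample characters are only for illustration; the `-le` codec variants are used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch):
    """Return the number of bytes needed to encode `ch` in each UTF encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-le" variants fix the byte order, so no BOM is prepended.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

# One sample per UTF-8 length: U+0041, U+00E9, U+20AC, U+1F600
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-32 always reports 4 bytes, UTF-16 reports 2 bytes except for the last sample (which lies above U+FFFF and needs 4), and UTF-8 grows from 1 to 4 bytes across the four samples.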

Compared to the other UTF encodings, UTF-8 has the advantage of being compatible with ASCII: a text that consists only of ASCII characters has exactly the same byte representation in UTF-8 and in ASCII. As a consequence, UTF-8 is also more compact than the other UTF encodings for English and most European languages, because most of the characters in such texts belong to the ASCII set and therefore take a single byte each.
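Both claims can be checked with a few lines of Python. This is only a sketch: the sample strings and the helper name `sizes` are arbitrary choices made for the example.

```python
def sizes(text):
    """Byte length of `text` in each UTF encoding (no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

english = "Hello, world!"

# Pure ASCII text yields identical bytes in ASCII and in UTF-8.
assert english.encode("ascii") == english.encode("utf-8")

print(sizes(english))             # 13 ASCII characters
print(sizes("d\u00e9j\u00e0 vu"))  # 7 characters, 2 of them accented
```

For the English string, UTF-8 needs 13 bytes against 26 for UTF-16 and 52 for UTF-32. For the French sample, the two accented letters take 2 bytes each in UTF-8, giving 9 bytes against 14 and 28.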

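UCS-2 (the 2-byte form of the Universal Character Set; the "2" refers to the number of bytes, not to a version) is an obsolete encoding originally used in Windows and Java. It encodes each character as a single 2-byte integer and is therefore limited to the first 65536 code points of Unicode (the Basic Multilingual Plane). This is why it has gradually been replaced by UTF-16, which can reach the remaining code points by using two 2-byte integers.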
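The difference between UCS-2 and UTF-16 can be sketched in Python. Python ships no UCS-2 codec, so `encode_ucs2_le` below is a hypothetical helper written for this example, not a standard function; `utf16_code_units` is likewise an illustrative name.

```python
import struct

def encode_ucs2_le(text):
    """Hypothetical UCS-2 encoder: exactly one 2-byte integer per code point."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no way to represent code points beyond 16 bits.
            raise ValueError(f"U+{cp:04X} cannot be represented in UCS-2")
        out += struct.pack("<H", cp)
    return bytes(out)

def utf16_code_units(text):
    """Return the 2-byte integers that UTF-16 uses to encode `text`."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

# Within the first 65536 code points, both encodings produce the same bytes.
assert encode_ucs2_le("\u20ac") == "\u20ac".encode("utf-16-le")

# Above U+FFFF, UTF-16 falls back to two 2-byte integers.
print([hex(u) for u in utf16_code_units("\U0001F600")])
```

Calling `encode_ucs2_le("\U0001F600")` raises `ValueError`, whereas UTF-16 encodes the same code point as the pair `0xd83d`, `0xde00`.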
