15.3 Supported encodings


The following is the list of currently supported encodings. The first column gives the encoding name, the second column lists its aliases, the third column gives the names of its CES and CCS components, and the fourth column is a short description.

Name | Aliases | CES/CCS | Short description
big5 | csbig5, big_five, bigfive, cn_big5, cp950 | table_pcs / big5, us_ascii | The encoding for Traditional Chinese.
cp775 | ibm775, cspc775baltic | table / cp775 | The updated version of CP 437 that supports the Baltic languages.
cp850 | ibm850, 850, cspc850multilingual | table / cp850 | IBM 850 - the updated version of CP 437 where several Latin 1 characters have been added in place of some less often used characters such as the line-drawing and Greek ones.
cp852 | ibm852, 852, cspcp852 | | IBM 852 - the updated version of CP 437 where several Latin 2 characters have been added in place of some less often used characters such as the line-drawing and Greek ones.
cp855 | ibm855, 855, csibm855 | table / cp855 | IBM 855 - the updated version of CP 437 that supports Cyrillic.
cp866 | 866, IBM866, CSIBM866 | table / cp866 | IBM 866 - the updated version of CP 855 that more closely follows the logical Russian alphabet ordering of the alternative variant preferred by many Russian users.
euc_jp | eucjp | euc / jis_x0208_1990, jis_x0201_1976, jis_x0212_1990 | EUC-JP - The EUC for Japanese.
euc_kr | euckr | euc / ksx1001 | EUC-KR - The EUC for Korean.
euc_tw | euctw | euc / cns11643_plane1, cns11643_plane2, cns11643_plane14 | EUC-TW - The EUC for Traditional Chinese.
iso_8859_1 | iso8859_1, iso88591, iso_8859_1:1987, iso_ir_100, latin1, l1, ibm819, cp819, csisolatin1 | table / iso_8859_1 | ISO 8859-1:1987 - Latin 1, West European.
iso_8859_10 | iso_8859_10:1992, iso_ir_157, iso885910, latin6, l6, csisolatin6, iso8859_10 | table / iso_8859_10 | ISO 8859-10:1992 - Latin 6, Nordic.
iso_8859_11 | iso8859_11, iso885911 | table / iso_8859_11 | ISO 8859-11 - Thai.
iso_8859_13 | iso_8859_13:1998, iso8859_13, iso885913 | table / iso_8859_13 | ISO 8859-13:1998 - Latin 7, Baltic Rim.
iso_8859_14 | iso_8859_14:1998, iso885914, iso8859_14 | table / iso_8859_14 | ISO 8859-14:1998 - Latin 8, Celtic.
iso_8859_15 | iso885915, iso_8859_15:1998, iso8859_15 | table / iso_8859_15 | ISO 8859-15:1998 - Latin 9, West Europe, successor of Latin 1.
iso_8859_2 | iso8859_2, iso88592, iso_8859_2:1987, iso_ir_101, latin2, l2, csisolatin2 | table / iso_8859_2 | ISO 8859-2:1987 - Latin 2, East European.
iso_8859_3 | iso_8859_3:1988, iso_ir_109, iso8859_3, latin3, l3, csisolatin3, iso88593 | table / iso_8859_3 | ISO 8859-3:1988 - Latin 3, South European.
iso_8859_4 | iso8859_4, iso88594, iso_8859_4:1988, iso_ir_110, latin4, l4, csisolatin4 | table / iso_8859_4 | ISO 8859-4:1988 - Latin 4, North European.
iso_8859_5 | iso8859_5, iso88595, iso_8859_5:1988, iso_ir_144, cyrillic, csisolatincyrillic | table / iso_8859_5 | ISO 8859-5:1988 - Cyrillic.
iso_8859_6 | iso_8859_6:1987, iso_ir_127, iso8859_6, ecma_114, asmo_708, arabic, csisolatinarabic, iso88596 | table / iso_8859_6 | ISO 8859-6:1987 - Arabic.
iso_8859_7 | iso_8859_7:1987, iso_ir_126, iso8859_7, elot_928, ecma_118, greek, greek8, csisolatingreek, iso88597 | table / iso_8859_7 | ISO 8859-7:1987 - Greek.
iso_8859_8 | iso_8859_8:1988, iso_ir_138, iso8859_8, hebrew, csisolatinhebrew, iso88598 | table / iso_8859_8 | ISO 8859-8:1988 - Hebrew.
iso_8859_9 | iso_8859_9:1989, iso_ir_148, iso8859_9, latin5, l5, csisolatin5, iso88599 | table / iso_8859_9 | ISO 8859-9:1989 - Latin 5, Turkish.
iso_ir_111 | ecma_cyrillic, koi8_e, koi8e, csiso111ecmacyrillic | table / iso_ir_111 | ISO IR 111/ECMA Cyrillic.
koi8_r | cskoi8r, koi8r, koi8 | table / koi8_r | RFC 1489 Cyrillic.
koi8_ru | koi8ru | table / koi8_ru | The obsolete Ukrainian.
koi8_u | koi8u | table / koi8_u | RFC 2319 Ukrainian.
koi8_uni | koi8uni | table / koi8_uni | KOI8 Unified.
ucs_2 | ucs2, iso_10646_ucs_2, iso10646_ucs_2, iso_10646_ucs2, iso10646_ucs2, iso10646ucs2, csUnicode | ucs_2 / (UCS) | ISO-10646-UCS-2. Big Endian, NBSP is always interpreted as NBSP (BOM isn’t supported).
ucs_2_internal | ucs2_internal, ucs_2internal, ucs2internal | ucs_2_internal / (UCS) | ISO-10646-UCS-2 in system byte order. NBSP is always interpreted as NBSP (BOM isn’t supported).
ucs_2be | ucs2be | ucs_2 / (UCS) | Big Endian version of ISO-10646-UCS-2 (in fact, equivalent to ucs_2). NBSP is always interpreted as NBSP (BOM isn’t supported).
ucs_2le | ucs2le | ucs_2 / (UCS) | Little Endian version of ISO-10646-UCS-2. NBSP is always interpreted as NBSP (BOM isn’t supported).
ucs_4 | ucs4, iso_10646_ucs_4, iso10646_ucs_4, iso_10646_ucs4, iso10646_ucs4, iso10646ucs4 | ucs_4 / (UCS) | ISO-10646-UCS-4. Big Endian, NBSP is always interpreted as NBSP (BOM isn’t supported).
ucs_4_internal | ucs4_internal, ucs_4internal, ucs4internal | ucs_4_internal / (UCS) | ISO-10646-UCS-4 in system byte order. NBSP is always interpreted as NBSP (BOM isn’t supported).
ucs_4be | ucs4be | ucs_4 / (UCS) | Big Endian version of ISO-10646-UCS-4 (in fact, equivalent to ucs_4). NBSP is always interpreted as NBSP (BOM isn’t supported).
ucs_4le | ucs4le | ucs_4 / (UCS) | Little Endian version of ISO-10646-UCS-4. NBSP is always interpreted as NBSP (BOM isn’t supported).
us_ascii | ansi_x3.4_1968, ansi_x3.4_1986, iso_646.irv:1991, ascii, iso646_us, us, ibm367, cp367, csascii | us_ascii / (ASCII) | 7-bit ASCII.
utf_16 | utf16 | utf_16 / (UCS) | RFC 2781 UTF-16. The very first NBSP code in the stream is interpreted as BOM.
utf_16be | utf16be | utf_16 / (UCS) | Big Endian version of RFC 2781 UTF-16. NBSP is always interpreted as NBSP (BOM isn’t supported).
utf_16le | utf16le | utf_16 / (UCS) | Little Endian version of RFC 2781 UTF-16. NBSP is always interpreted as NBSP (BOM isn’t supported).
utf_8 | utf8 | utf_8 / (UCS) | RFC 3629 UTF-8.
win_1250 | cp1250 | | Win-1250 - Croatian.
win_1251 | cp1251 | table / win_1251 | Win-1251 - Cyrillic.
win_1252 | cp1252 | table / win_1252 | Win-1252 - Latin 1.
win_1253 | cp1253 | table / win_1253 | Win-1253 - Greek.
win_1254 | cp1254 | table / win_1254 | Win-1254 - Turkish.
win_1255 | cp1255 | table / win_1255 | Win-1255 - Hebrew.
win_1256 | cp1256 | table / win_1256 | Win-1256 - Arabic.
win_1257 | cp1257 | table / win_1257 | Win-1257 - Baltic.
win_1258 | cp1258 | table / win_1258 | Win-1258 - Vietnamese.
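
As an illustration of how the names and aliases in the table might be used, the following minimal sketch passes two of the listed aliases (latin1 and utf8) to the standard iconv_open(), iconv(), and iconv_close() functions. The helper name convert_bytes and its parameters are hypothetical and exist only for this example. Which spellings of a name are accepted depends on the iconv implementation in use, so treat the strings below as assumptions to check against your own build.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): converts `inlen` bytes of
 * `in` from encoding `from` to encoding `to`, writing at most `outcap` bytes
 * into `out`. Returns the number of bytes written, or (size_t)-1 on failure. */
size_t convert_bytes(const char *from, const char *to,
                     const char *in, size_t inlen,
                     char *out, size_t outcap)
{
    /* Note the argument order: iconv_open(tocode, fromcode). */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;      /* name unknown or conversion unsupported */

    char *inp = (char *)in;     /* iconv() takes a non-const char ** */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    /* A single call is enough for short, stateless input such as this;
     * stateful encodings would also need a final flush call. */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;      /* invalid input or output buffer too small */
    return outcap - outleft;    /* bytes produced */
}
```

For example, calling convert_bytes("latin1", "utf8", "\xE9", 1, buf, sizeof buf) with a sufficiently large buf should produce the two-byte UTF-8 sequence 0xC3 0xA9 for the Latin 1 character 0xE9, provided the implementation recognizes both aliases.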