The Encoding type defines the set of supported character encodings used by methods such as:

- Data.fromRawString(str, encoding): converts a text string into binary data using the specified encoding.
- Data.toRawString(encoding): converts binary data back into a string using the specified encoding.

These encodings allow interoperability with various text formats and systems, ensuring compatibility across languages and platforms.

| Encoding | Description |
|---|---|
| "utf-8" / "utf8" | UTF-8 (8-bit Unicode Transformation Format). The most common encoding for web and modern text processing. Efficient for ASCII-compatible text and supports all Unicode characters. |
| "utf-16" / "utf16" | UTF-16 (16-bit Unicode Transformation Format). Common on Windows and Apple platforms. Characters in the Basic Multilingual Plane use 2 bytes; all other characters use 4 bytes (a surrogate pair). |
| "utf-32" / "utf32" | UTF-32 (32-bit Unicode Transformation Format). Fixed 4-byte representation per character, used for direct Unicode code point manipulation. |
| "ascii" | American Standard Code for Information Interchange. A 7-bit code (values 0–127) for English letters, digits, and basic symbols, stored as one byte per character. |
| "iso2022JP" | ISO-2022-JP. A Japanese character encoding used for emails and legacy systems. Supports JIS X 0201/0208 character sets. |
| "isoLatin1" | ISO-8859-1 (Latin-1). Covers Western European languages such as English, French, German, and Spanish. |
| "japaneseEUC" | EUC-JP (Extended Unix Code for Japanese). Another Japanese encoding, used mainly in Unix systems. |
| "macOSRoman" | Apple’s MacRoman encoding, historically used on classic Mac OS before Unicode adoption. |
| "nextstep" | NextStep encoding (NS encoding). A legacy encoding from NeXTSTEP systems. Rarely used today. |
| "nonLossyASCII" | Non-lossy ASCII encoding. Ensures that any Unicode string can be safely represented as ASCII escape sequences and later restored without data loss. |
| "shiftJIS" | Shift-JIS encoding for Japanese text, commonly used on Windows in Japan. |
| "symbol" | Symbol encoding, used for specialized symbol fonts such as the Symbol typeface. |
| "unicode" | A general alias for Unicode encodings (usually UTF-16). Behaves similarly to "utf16". |
| "utf16BigEndian" | UTF-16 with big-endian byte order. The most significant byte (MSB) comes first. |
| "utf16LittleEndian" | UTF-16 with little-endian byte order. The least significant byte (LSB) comes first. |
| "utf32BigEndian" | UTF-32 with big-endian byte order. |
| "utf32LittleEndian" | UTF-32 with little-endian byte order. |
| "windowsCP1250" | Windows code page 1250 for Central and Eastern European languages (e.g., Polish, Czech, Hungarian). |
| "windowsCP1251" | Windows code page 1251 for Cyrillic scripts (e.g., Russian, Bulgarian, Serbian). |
| "windowsCP1252" | Windows code page 1252 for Western European languages. Similar to Latin-1, but assigns printable symbols such as the euro sign and curly quotes to the 0x80–0x9F range. |
| "windowsCP1253" | Windows code page 1253 for Greek language support. |
| "windowsCP1254" | Windows code page 1254 for Turkish language support. |
| "gbk" | GBK (Guojia Biaozhun Kuozhan, "National Standard Extended"). A widely used encoding for simplified Chinese that extends GB2312 with many additional characters, including traditional Chinese characters. It is backward compatible with GB2312. |
| "gb18030" | GB18030 (National Standard of the People's Republic of China). A superset of GBK and GB2312, and the official mandatory standard in China. Fully compatible with Unicode and capable of representing all Unicode characters. |
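
The difference between the big-endian and little-endian UTF-16 variants in the table can be illustrated with a small sketch. It does not use the Data API; `encodeUtf16` is a hypothetical helper written only for this illustration, and it assumes no byte order mark is emitted.

```typescript
// Hypothetical helper (not part of the Data API): encode a string as
// UTF-16 bytes in the requested byte order, without a byte order mark.
// JavaScript strings are already sequences of UTF-16 code units, so each
// unit only needs to be split into its high and low byte.
function encodeUtf16(text: string, littleEndian: boolean): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const high = (unit >> 8) & 0xff; // most significant byte
    const low = unit & 0xff;         // least significant byte
    if (littleEndian) {
      bytes.push(low, high);
    } else {
      bytes.push(high, low);
    }
  }
  return bytes;
}

// "A" is U+0041 and "\u20AC" (the euro sign) is U+20AC.
const beBytes = encodeUtf16("A\u20AC", false); // [0x00, 0x41, 0x20, 0xAC]
const leBytes = encodeUtf16("A\u20AC", true);  // [0x41, 0x00, 0xAC, 0x20]
```

The same text produces the same byte values in both variants; only the order within each 2-byte unit changes, which is why the reader and writer must agree on the variant.
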
"utf-8" is recommended due to its compatibility and efficiency."shiftJIS", "iso2022JP", and "windowsCP125x" are provided for interoperability with older file formats and systems.