UTF-8 Unicode Character Barcode Encoding
UTF-8 is a variable-length method of encoding Unicode characters, such as Chinese, Japanese, Russian or Thai characters. Any character in the Unicode standard can be encoded in UTF-8. The first 128 characters (US-ASCII) use only one byte and do not require conversion. Characters above U+007F require two or more bytes. To encode these characters in 2D barcodes such as PDF417, Data Matrix and QR Code, the data must first be converted to a string of bytes in little-endian mode without the byte order mark (BOM). This conversion should take place before the bytes are encoded into the barcode. In addition, the decoder must be able to properly decode the data. When the data can be represented with ASCII characters alone, encoding ASCII instead of UTF-8 is recommended.
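The conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IDAutomation's implementation: the function names utf8_to_byte_string and byte_string_to_text are hypothetical, and the sketch assumes the barcode component accepts a string in which each character carries one byte value (0-255). The bytes are kept in the order the UTF-8 encoder produces them, and no BOM is added.

```python
def utf8_to_byte_string(text: str) -> str:
    """Hypothetical helper: return a string whose characters each carry one UTF-8 byte."""
    raw = text.encode("utf-8")     # Python's "utf-8" codec does not emit a BOM
    return raw.decode("latin-1")   # map each byte 0-255 to the code point of the same value


def byte_string_to_text(data: str) -> str:
    """Hypothetical helper: reverse the conversion on the decoding side."""
    return data.encode("latin-1").decode("utf-8")


packed = utf8_to_byte_string("Привет")
print(len(packed))                  # 12: six Cyrillic letters take two bytes each
print(byte_string_to_text(packed))  # Привет
```

Plain ASCII input passes through unchanged, which matches the statement that the first 128 characters need no conversion. The second function shows what a decoder has to do to recover the original text; a reader that skips this step displays the individual bytes as separate characters.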
IDAutomation offers a built-in method that converts UTF-8 to bytes for encoding Unicode characters above U+007F in 2D barcodes such as PDF417, Data Matrix and QR Code. Any Unicode character in the range U+0000 to U+FFFF (0-65535) can be encoded using this method.
This built-in conversion method is available for Data Matrix, PDF417 and QR Code in the 2021 or later versions of the following products:
- .NET Standard & .NET Core Barcode Generator
- Access Native Barcode Generator
- Crystal Reports Native Generator
- Excel Barcode Generator
- Java Barcode Package
- SSRS Barcode Generator
It is also supported for .NET Standard, VBA, Access, Excel, Word, Crystal Reports, Java and SSRS in the following Font Packages:
IDAutomation currently offers these products by request for all Developer Licenses and above with an active Level 2 Support and Upgrade Subscription. To obtain the built-in method for native UTF-8 encoding, open a private incident with your order number. IDAutomation can also provide source code so this conversion method can be performed outside of the barcode generation component. This conversion method is available for PDF417, Data Matrix and QR Code in most products updated 2021 or later.
This built-in method converts the text string into a sequence of bytes (using 1 byte for the range [0-127], 2 bytes for the range [128-2047] and 3 bytes for the range [2048-65535]) and arranges the byte sequence into a new string in little-endian mode without BOM. This is the format most scanners and decoders use.
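The byte counts per range can be checked with a short Python sketch. The function name utf8_byte_count is hypothetical and is not part of any IDAutomation product; it only restates the ranges above and compares them with Python's own UTF-8 encoder.

```python
def utf8_byte_count(code_point: int) -> int:
    """Hypothetical helper: UTF-8 bytes needed for a code point in 0-65535."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only covers the range 0-65535")
    if code_point <= 0x7F:     # 0-127: US-ASCII, one byte
        return 1
    if code_point <= 0x7FF:    # 128-2047: two bytes
        return 2
    return 3                   # 2048-65535: three bytes


# Compare against Python's encoder for a few sample characters.
# (Surrogate code points 0xD800-0xDFFF are not characters and are not sampled.)
for ch in ("A", "é", "Ж", "中", "ไ"):
    assert utf8_byte_count(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_byte_count(ord(ch))} byte(s)")
```

The boundaries are the values to watch: 2047 still fits in two bytes, while 2048 is the first code point that needs three.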
Reading and Decoding UTF-8 in 2D Barcodes
Most keyboard wedge and USB scanners cannot properly decode barcodes that include UTF-8 or Unicode. The following barcode decoder apps have been tested and are known to properly decode UTF-8: