UTF-8 Barcode Unicode Character Encoding

UTF-8 is a variable-length method of encoding Unicode characters, such as those used in Chinese, Japanese, Arabic, Russian or Thai text. Any character in the Unicode standard can be encoded in UTF-8. The first 128 characters (US-ASCII) use only one byte and do not require conversion; characters above U+007F require two or more bytes. To encode these characters in 2D barcodes such as PDF417, Data Matrix and QR Code, the data must first be converted to a string of bytes in little-endian mode without the byte order mark (BOM), and this conversion should take place before the bytes are encoded into the barcode. In addition, the decoder must be able to properly decode the data. When the data can be represented in ASCII alone, encoding ASCII instead of UTF-8 is recommended.
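As a minimal sketch of the text-to-bytes conversion described above, the following Python uses the standard `str.encode` method. UTF-8 defines a single byte order, and Python's `utf-8` codec does not prepend a BOM, so the result needs no further rearranging. The helper name `to_barcode_byte_string` and the choice to pack each byte into one character (via Latin-1) are assumptions for illustration only, not IDAutomation's API; check what input your barcode component actually expects.

```python
def to_utf8_bytes(text: str) -> bytes:
    """Encode text as UTF-8 without a byte order mark."""
    # The "utf-8" codec never prepends a BOM (the "utf-8-sig" codec does).
    return text.encode("utf-8")


def to_barcode_byte_string(text: str) -> str:
    """Hypothetical helper: map each UTF-8 byte to one character (0-255).

    Assumes the barcode component takes a string and encodes one byte
    per character; this is an illustration, not a vendor API.
    """
    return to_utf8_bytes(text).decode("latin-1")


if __name__ == "__main__":
    # "A" (ASCII), "e" with acute accent, and two Japanese characters.
    for sample in ["A", "\u00e9", "\u65e5\u672c"]:
        data = to_utf8_bytes(sample)
        # ascii() keeps the output printable on any console encoding.
        print(ascii(sample), len(data), data.hex(" "))
```

ASCII input passes through unchanged, which is why no conversion is needed when the data is limited to the first 128 characters.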

IDAutomation offers a built-in UTF-8 to byte conversion method for encoding Unicode characters above U+007F in 2D barcodes such as PDF417, Data Matrix and QR Code. Any Unicode character in the range 0-65535 (U+0000 to U+FFFF) can be encoded using this method.

This built-in method of conversion is available now (for Data Matrix, PDF417 & QR-Code) in the 2021 or later versions of the following products:

It is also supported for .NET Standard, VBA, Access, Excel, Word, Crystal Reports, Java and SSRS in the following Font Packages:

IDAutomation currently offers these products by request for all Developer Licenses and above with an active Level 2 Support and Upgrade Subscription. To obtain the built-in method for native UTF-8 encoding, open a private incident with your order number. IDAutomation can also provide source code so this conversion method can be performed outside of the barcode generation component. This conversion method is available for PDF417, Data Matrix and QR Code in most products updated 2021 or later.

This built-in method converts the text string into a sequence of bytes (using 1 byte for the range [0-127], 2 bytes for the range [128-2047] and 3 bytes for the range [2048-65535]) and arranges the byte sequence into a new string in little-endian mode without BOM. This is the format most scanners and decoders use.
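The byte ranges above follow the standard UTF-8 bit layout, in which the three-byte form begins at code point 2048. The sketch below spells that layout out by hand for code points 0-65535. It is an illustration of the general UTF-8 rules, not IDAutomation's source code, and the function names are hypothetical.

```python
def encode_bmp_code_point(cp: int) -> bytes:
    """Encode one code point in the range 0-65535 as UTF-8 bytes."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("only code points 0-65535 are handled in this sketch")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not valid in UTF-8")
    if cp <= 0x7F:
        # 0-127: one byte, 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 128-2047: two bytes, 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 2048-65535: three bytes, 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def encode_text(text: str) -> bytes:
    """Concatenate the UTF-8 bytes of every character, with no BOM."""
    return b"".join(encode_bmp_code_point(ord(ch)) for ch in text)


if __name__ == "__main__":
    # Boundary code points of each range, checked against Python's codec.
    for cp in (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF):
        assert encode_bmp_code_point(cp) == chr(cp).encode("utf-8")
        print(f"U+{cp:04X}", encode_bmp_code_point(cp).hex(" "))
```

In practice the built-in `str.encode("utf-8")` does the same job; the manual version is only useful for seeing where each range boundary falls.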

Reading and Decoding UTF-8 in 2D Barcodes

Most USB barcode scanners cannot properly decode barcodes that include UTF-8 or Unicode. The following barcode decoder apps have been tested and are known to properly decode UTF-8:

Recommended Product:

UTF-8 Encode and Decode Example:

QR Code Symbol with UTF-8 Encoding.

Decode using the IDAutomation Barcode Decoder Verifier App.
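For applications that receive the scanned data themselves, the decode step can be sketched as follows. How the bytes arrive is an assumption: some decoders hand back raw bytes, while others return a string with one character per byte, which the hypothetical `from_byte_string` helper covers. Neither function is part of any IDAutomation product.

```python
def decode_payload(payload: bytes) -> str:
    """Decode UTF-8 barcode bytes back into text.

    Raises UnicodeDecodeError if the payload is not valid UTF-8.
    """
    return payload.decode("utf-8")


def from_byte_string(byte_string: str) -> str:
    """Hypothetical helper: input holds one character (0-255) per UTF-8 byte."""
    return decode_payload(byte_string.encode("latin-1"))


if __name__ == "__main__":
    # "Zurich" with u-umlaut, followed by "Tokyo" in Japanese.
    original = "Z\u00fcrich \u6771\u4eac"
    payload = original.encode("utf-8")  # what the barcode would carry
    print(decode_payload(payload) == original)
    print(from_byte_string(payload.decode("latin-1")) == original)
```

If decoding fails with `UnicodeDecodeError`, the scanner or decoder has most likely altered the bytes or applied a different character set.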

Other UTF-8 Decoding Products:

  • Cognex Barcode Scanner App & SDK (iOS | Android)
  • BeeTag on iOS by Connvision Ltd. (Does not scan large codes)
  • GDPicture.NET (Latest version only)
  • iOS camera app (for QR Code only)