UTF-8 Barcode Unicode Character Encoding
UTF-8 is a variable-length method of encoding Unicode characters, such as Chinese, Japanese Kanji, Arabic, Russian, or Thai characters. Any character in the Unicode standard can be encoded in UTF-8. The first 128 characters (US-ASCII) use only one byte and do not require conversion, while characters above U+007F require two or more bytes. To encode these characters in 2D barcodes such as PDF417, Data Matrix, and QR Code, the data must first be converted to a string of UTF-8 bytes without the byte order mark (BOM), and the decoder must be able to decode that data properly. This conversion should take place before the bytes are encoded into the barcode. When the data can be represented in ASCII, encoding ASCII instead of UTF-8 is recommended.
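As a minimal sketch of that conversion step, the following example uses Python's standard `str.encode` to produce the BOM-free byte string described above. It assumes Python 3.7 or later. The helper names `to_utf8_bytes` and `needs_utf8` are hypothetical and are not part of any IDAutomation product.

```python
def to_utf8_bytes(text: str) -> bytes:
    """Hypothetical helper: convert text to UTF-8 bytes for a 2D barcode encoder."""
    # The "utf-8" codec never writes a BOM; the separate "utf-8-sig" codec would prepend EF BB BF.
    return text.encode("utf-8")


def needs_utf8(text: str) -> bool:
    """Hypothetical helper: True if any character is above U+007F."""
    # Pure ASCII text produces one identical byte per character either way, so it needs no conversion.
    return not text.isascii()


if __name__ == "__main__":
    for sample in ("ABC123", "Привет", "日本語"):
        data = to_utf8_bytes(sample)
        print(f"{len(sample)} chars -> {len(data)} bytes, UTF-8 needed: {needs_utf8(sample)}")
```

In this sketch the six ASCII characters stay at six bytes, the six Cyrillic characters take two bytes each, and the three CJK characters take three bytes each.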
IDAutomation offers a built-in method that converts Unicode characters above U+007F to UTF-8 bytes for encoding in 2D barcodes such as PDF417, Data Matrix, and QR Code. Any Unicode character in the range 0 to 65535 (U+0000 to U+FFFF, the Basic Multilingual Plane) can be encoded using this method.
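The source does not say how the built-in method handles characters above U+FFFF, such as many emoji. A hedged pre-check, assuming you want to detect such characters before generating a symbol, could look like the following. The function name `chars_outside_range` is hypothetical.

```python
from typing import List


def chars_outside_range(text: str, limit: int = 0xFFFF) -> List[str]:
    """Hypothetical pre-check: list code points above `limit` as "U+XXXX" labels."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > limit]


if __name__ == "__main__":
    print(chars_outside_range("日本語"))         # all characters are within U+0000 to U+FFFF
    print(chars_outside_range("QR \U0001F600"))  # the emoji lies outside that range
```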
The built-in conversion method is available for Data Matrix and QR Code (refer to the bottom of the page for PDF417) in the 2021 or later versions of the following products:
- .NET Barcode Generator (Framework, Standard, Core, .NET 6 +)
- ASPX Barcode Generator Script
- Access Native Barcode Generator
- Barcode Generator Subscription Service (SaaS)
- Barcode Label Software Pro
- Crystal Reports Native Generator
- Excel Barcode Generator
- FileMaker Barcode Generator
- Google Sheets | Docs | Apps Script
- IIS Streaming Barcode Server
- Java Barcode Package
- JavaScript Barcode Generator
- SSRS Barcode Generator
- Windows Forms Control (WinForms)
UTF-8 is also supported in the following 2D Font Packages:
IDAutomation currently offers these products on request for all Developer Licenses and above with an active Level 2 Support and Upgrade Subscription. IDAutomation can also provide source code on request with any developer license purchase, so that this conversion can be performed outside of the barcode generation component. The built-in method converts the text string into a sequence of bytes (1 byte for the range [0-127], 2 bytes for the range [128-2047], and 3 bytes for the range [2048-65535]) and arranges the byte sequence into a new string without a BOM. This is the format most scanners and decoders use.
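The byte ranges listed above correspond to the standard UTF-8 bit patterns. The following Python sketch encodes characters in the 0 to 65535 range by hand, and its output can be checked against Python's built-in codec. The function name `encode_bmp_utf8` is hypothetical. This is not IDAutomation's source code.

```python
def encode_bmp_utf8(text: str) -> bytes:
    """Hypothetical manual UTF-8 encoder for code points 0 to 65535 (no BOM is written)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0x7F:
            # [0-127]: 1 byte, pattern 0xxxxxxx
            out.append(cp)
        elif cp <= 0x7FF:
            # [128-2047]: 2 bytes, pattern 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp <= 0xFFFF:
            # Surrogate code points are not valid characters on their own
            if 0xD800 <= cp <= 0xDFFF:
                raise ValueError(f"U+{cp:04X} is a surrogate code point")
            # [2048-65535]: 3 bytes, pattern 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            raise ValueError(f"U+{cp:04X} is outside the 0 to 65535 range")
    return bytes(out)


if __name__ == "__main__":
    print(encode_bmp_utf8("日本語") == "日本語".encode("utf-8"))
```

Because UTF-8 always writes the most significant bits in the first byte, the byte order is fixed and no endianness choice is involved.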
Reading and Decoding UTF-8 in 2D Barcodes
Most USB barcode scanners cannot properly decode barcodes that include UTF-8 or Unicode. The following barcode decoder apps have been tested and are known to properly decode UTF-8:
Recommended Product:
- IDAutomation Barcode Decoder Verifier App & SDK (iOS | Android)
- Decodes UTF-8 in QR Code
- Decodes TLV and Base64 encoded data
- Provides detailed information about the symbol
- Available to developers as a Visual Studio Xamarin project
UTF-8 Encode and Decode Example:
QR Code Symbol with UTF-8 Encoding.
Decode using the IDAutomation Barcode Decoder Verifier App.
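On the decoding side, the raw bytes read from the symbol must be interpreted as UTF-8. The following Python sketch uses a hypothetical helper, `decode_payload`, to show a correct round trip. For contrast, it also shows what happens when the same bytes are read one byte per character as ISO-8859-1. The source does not state why many scanners fail, so treating this as the cause is an assumption.

```python
def decode_payload(raw: bytes) -> str:
    """Hypothetical helper: interpret decoded barcode bytes as UTF-8.

    Falls back to ISO-8859-1 (one character per byte) if the bytes are not valid UTF-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


if __name__ == "__main__":
    payload = "日本語".encode("utf-8")      # 9 bytes as stored in the symbol
    print(decode_payload(payload) == "日本語")  # the round trip restores the original text
    garbled = payload.decode("latin-1")     # each byte read as its own character
    print(len(garbled))                     # 9 garbled characters instead of 3
```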
Other UTF-8 Decoding Products:
- Cognex Barcode Scanner App & SDK (iOS | Android)
- BeeTag on iOS by Connvision Ltd. (Does not scan large codes)
- GDPicture.NET (Latest version only)
- iOS camera app (for QR Code only)
PDF417 UTF-8 Support
Encoding UTF-8 in PDF417 is not very efficient compared to Data Matrix and QR Code, so it is not recommended. However, this functionality is included in some products. The built-in method of encoding UTF-8 in PDF417 is available in the latest version of the following products:
- .NET Barcode Generator
- ASPX Barcode Generator Script
- Access Native Barcode Generator
- Crystal Reports Native Generator
- Excel Barcode Generator
- Java Barcode Package
- JavaScript Barcode Generator
- SSRS Barcode Generator
- PDF417 Font and Encoder (.NET, Java, VBA, Crystal Reports, SSRS, Access, and Excel only)
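To give a rough sense of the PDF417 cost, the following Python sketch estimates data codewords under PDF417 Byte Compaction. It assumes one mode-latch codeword, 5 codewords per full group of 6 bytes, and 1 codeword per leftover byte. It ignores the length descriptor, padding, and error correction. The function name `pdf417_byte_codewords` is hypothetical, and the figures are estimates rather than what any particular encoder produces.

```python
def pdf417_byte_codewords(data: bytes) -> int:
    """Hypothetical estimate of PDF417 data codewords for `data` in Byte Compaction mode."""
    full_groups, leftover = divmod(len(data), 6)
    # 1 latch codeword + 5 codewords per 6-byte group + 1 codeword per remaining byte
    return 1 + 5 * full_groups + leftover


if __name__ == "__main__":
    for text in ("ABCDEF", "日本語日本語"):
        data = text.encode("utf-8")
        print(len(text), "chars,", len(data), "bytes,", pdf417_byte_codewords(data), "codewords")
```

Under these assumptions, six CJK characters expand to 18 UTF-8 bytes and about 16 data codewords. Six ASCII bytes need about 6 codewords in Byte Compaction and would normally be packed even more tightly in Text Compaction.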