Why character sets matter

A symbology is not a free-form text encoder. Each one supports a specific set of characters: some only digits, some uppercase letters and digits, some the full 128-character ASCII set, and some 2D codes add byte and Kanji modes. Choosing a symbology that cannot represent the data you need to carry is one of the most common mistakes in label design: a serial number with an embedded letter cannot go in an EAN-13, and a Chinese product name cannot go in a Code 39.

The character set is part of the symbology specification; it determines what module patterns are defined and which check-digit arithmetic can be applied.
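As a rough illustration of the point above, character-set membership can be checked before a label is designed. The sketch below is hypothetical: the `CHARSETS` table and the `unsupported_chars` helper are names invented for this example, and the table covers only three of the symbologies described on this page. It checks characters only, not length or check digits.

```python
import string

# Hypothetical lookup table for illustration; only three symbologies are listed.
CHARSETS = {
    "EAN-13": set(string.digits),
    "Code 39": set(string.digits + string.ascii_uppercase + "-. $/+%"),
    "Code 128": {chr(i) for i in range(128)},
}


def unsupported_chars(symbology: str, data: str) -> list[str]:
    """Return the characters in `data` that `symbology` cannot encode.

    Characters are reported once each, in order of first appearance.
    An empty list means every character is in the symbology's set.
    """
    allowed = CHARSETS[symbology]
    rejected: list[str] = []
    for ch in data:
        if ch not in allowed and ch not in rejected:
            rejected.append(ch)
    return rejected
```

For example, `unsupported_chars("EAN-13", "40123A5678901")` returns `['A']`, while `unsupported_chars("Code 39", "SN-1234")` returns an empty list.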

Numeric-only symbologies

These symbologies encode the digits 0–9 and nothing else (Code 11 also allows a hyphen). The small character set keeps the printed footprint small for a given number of characters.

| Symbology | Length | Notes |
|---|---|---|
| EAN-13 | 13 digits (incl. check) | Worldwide retail POS |
| EAN-8 | 8 digits | Small items |
| UPC-A | 12 digits | US/Canada retail POS |
| UPC-E | 8 digits (compressed UPC-A) | Very small items |
| Interleaved 2 of 5 | Even number of digits | Cartons, ITF-14 |
| ITF-14 | 14 digits (fixed) | Outer-case GTIN-14 |
| Industrial 2 of 5 | Variable digits | Older industrial labels |
| Code 11 | Digits 0–9 plus hyphen (-) | Telecoms equipment |
| PLANET, IMb | PLANET: 12 or 14 digits; IMb: 20/25/29/31 digits | USPS mail tracking |
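The "(incl. check)" note in the table refers to a trailing check digit. For the GS1 family (EAN-13, EAN-8, UPC-A, ITF-14) this is the standard mod-10 scheme with alternating weights 3 and 1, counted from the rightmost data digit. A minimal sketch follows; the function name `gs1_check_digit` is made up for this example, and it does not apply to the non-GS1 symbologies in the table.

```python
def gs1_check_digit(payload: str) -> int:
    """Compute the GS1 mod-10 check digit for a string of data digits.

    `payload` is the number WITHOUT its check digit, e.g. the first
    12 digits of an EAN-13 or the first 11 digits of a UPC-A.
    """
    if not payload or any(ch not in "0123456789" for ch in payload):
        raise ValueError("payload must be a non-empty string of digits 0-9")

    total = 0
    # Walk from the rightmost data digit: weights go 3, 1, 3, 1, ...
    for position, ch in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += int(ch) * weight

    # The check digit tops the weighted sum up to the next multiple of 10.
    return (10 - total % 10) % 10
```

For instance, `gs1_check_digit("400638133393")` returns `1`, giving the full EAN-13 `4006381333931`.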

Alphanumeric symbologies (43-character)

These symbologies encode the digits 0–9, the 26 uppercase letters A–Z, and seven special characters. The total alphabet is 43 characters, hence the "mod-43" check-digit arithmetic.

0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z - . space $ / + %
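The mod-43 arithmetic can be sketched in a few lines: each character's value is its position in the alphabet above (0 = 0 through % = 42), the values are summed, and the remainder modulo 43 selects the check character from the same alphabet. The names `CODE39_ALPHABET` and `mod43_check_char` are chosen for this example only.

```python
# The 43-character alphabet in value order: '0' has value 0, '%' has value 42.
CODE39_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def mod43_check_char(data: str) -> str:
    """Return the mod-43 check character for `data`.

    Raises ValueError if `data` contains a character outside the
    43-character alphabet (for example a lowercase letter).
    """
    total = 0
    for ch in data:
        value = CODE39_ALPHABET.find(ch)
        if value < 0:
            raise ValueError(f"{ch!r} is not in the 43-character alphabet")
        total += value
    return CODE39_ALPHABET[total % 43]
```

For `"CODE39"` the values are 12 + 24 + 13 + 14 + 3 + 9 = 75, and 75 mod 43 = 32, which is the letter `W`.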

The symbologies that share this 43-character alphabet:

  • Code 39 - the original 1974 design
  • LOGMARS - Code 39 + mandatory mod-43 check (US DoD MIL-STD-1189B)
  • Codabar - a related but separate design: it does not use the 43-character alphabet, only a 16-character data set (0–9 plus - $ : / . +), with A/B/C/D reserved as start/stop characters

Full ASCII (128-character)

These symbologies encode the complete 128-character ASCII set: digits, both letter cases, punctuation, and control codes. The character set is internally divided into three subsets that the encoder switches between to save modules.

| Symbology | Character set | Subset trick |
|---|---|---|
| Code 128 | Full ASCII (0–127) | Three character subsets: A (control codes + uppercase + digits + punctuation, ASCII 0–95), B (printable ASCII including lowercase, ASCII 32–127), C (numeric pairs 00–99, two digits per symbol character for high density). The encoder picks the most compact mix and moves between subsets with code-set switch characters (CODE A/B/C) or a one-character SHIFT between A and B. |
| GS1-128 | Code 128 + FNC1 separator | FNC1 (a Code 128 function character) immediately follows the start character; subsequent FNC1s separate variable-length fields. The character set is the same as Code 128. |
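To make the three subsets concrete, the sketch below reports which single subset could carry a whole string without any switching. It is a simplification: a real Code 128 encoder mixes subsets within one symbol to minimise length, which this example does not attempt. The function name `usable_code_sets` is invented for illustration.

```python
def usable_code_sets(data: str) -> list[str]:
    """List the Code 128 subsets that can encode all of `data` on their own.

    A: ASCII 0-95   (control codes, digits, uppercase, punctuation)
    B: ASCII 32-127 (printable ASCII, both letter cases)
    C: digit pairs  (requires an even, non-zero number of digits)
    """
    usable = []
    if all(ord(ch) <= 95 for ch in data):
        usable.append("A")
    if all(32 <= ord(ch) <= 127 for ch in data):
        usable.append("B")
    if data and len(data) % 2 == 0 and all(ch in "0123456789" for ch in data):
        usable.append("C")
    return usable
```

An empty result, as for `"abc\t"` (lowercase needs B, the tab needs A), means the data is still encodable but only by switching subsets inside the symbol.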

2D symbologies: multiple data modes

2D barcodes support several internal encoding modes and switch between them for efficiency - numeric data packs more densely than mixed alphanumeric, and a byte mode covers anything outside the structured alphabets.

| Symbology | Modes |
|---|---|
| QR Code | Numeric, alphanumeric (45-char), byte (ISO-8859-1 default; UTF-8 via ECI), Kanji (Shift JIS), and ECI for switching to other encodings. Per-segment mode switch within one symbol. |
| Data Matrix | ASCII, C40 (uppercase + digits, 3 chars per 2 codewords), Text (lowercase variant), X12 (EDI), EDIFACT, Base 256 (byte mode). Per-segment mode switch. |
| PDF417 | Text (alphanumeric + punctuation), Byte, Numeric. Per-segment mode switch with shift codes. |
| Aztec | Upper, Lower, Mixed, Punct, Digit, Byte modes. Designed for high efficiency on short data. |
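The density difference between modes can be shown with QR Code's bit costs: numeric mode packs 3 digits into 10 bits, alphanumeric mode packs 2 characters into 11 bits, and byte mode spends 8 bits per byte. The sketch below picks the densest single mode for a whole string and counts the data bits. It is deliberately simplified: Kanji mode, per-segment switching, and the mode indicator and character-count header are left out, and byte mode is assumed to carry UTF-8 bytes. The names `densest_qr_mode` and `qr_data_bits` are hypothetical.

```python
# The 45 characters of QR alphanumeric mode.
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def densest_qr_mode(data: str) -> str:
    """Pick the most compact single QR mode that can hold all of `data`."""
    if data and all(ch in "0123456789" for ch in data):
        return "numeric"
    if data and all(ch in QR_ALPHANUMERIC for ch in data):
        return "alphanumeric"
    return "byte"


def qr_data_bits(data: str) -> int:
    """Count data bits for `data` in its densest single mode (headers excluded)."""
    mode = densest_qr_mode(data)
    n = len(data)
    if mode == "numeric":
        # 10 bits per 3 digits; a leftover 1 or 2 digits cost 4 or 7 bits.
        return 10 * (n // 3) + (0, 4, 7)[n % 3]
    if mode == "alphanumeric":
        # 11 bits per pair; a leftover single character costs 6 bits.
        return 11 * (n // 2) + 6 * (n % 2)
    # Byte mode: 8 bits per byte (UTF-8 assumed for this sketch).
    return 8 * len(data.encode("utf-8"))
```

Ten digits cost 34 bits in numeric mode, `"HELLO WORLD"` costs 61 bits in alphanumeric mode, and the five lowercase letters of `"hello"` cost 40 bits in byte mode.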

Quick reference: pick a symbology by character set

| If your data is… | Use… |
|---|---|
| Digits only, retail GTIN | EAN-13 / UPC-A (or QR Code with GS1 Digital Link) |
| Digits only, outer carton GTIN-14 | ITF-14 |
| Mixed uppercase letters and digits, short | Code 39 or LOGMARS |
| Mixed case, special characters, variable length | Code 128 or GS1-128 with AIs |
| Anything longer than ~30 characters | QR Code, Data Matrix, PDF417, or Aztec |
| Unicode / non-Latin text | QR (Kanji + byte/ECI modes) or Data Matrix (byte mode + ECI) |
| Embedded URL for consumer scanning | QR (GS1 Digital Link if retail) |
