What are hexadecimal numbers,
like the ones you find in Unicode values?



10

We all count in a “decimal” system. Numbers are represented in columns of ten possible digits, zero through nine (“deci-” being the Latin prefix for 10). This can also be referred to as counting in base‑10.

We count in this decimal/base‑10 system because we have ten fingers (even the word “digit” comes from the Latin word for “finger”). Once you’ve counted the tenth thing you have to start counting again from zero, while remembering that you’ve made one pass through counting on your hands. So base‑10 is perfectly familiar to us: everything that we’re taught about math and counting happens with zero through nine.

0b10 (2)

Computers are different. When saving data to a hard drive, processing it in memory, or really doing anything at all, it all has to end up in “binary” numbers: just two digits, representing the two states of “on” (one) and “off” (zero). This is a concept we’ve all at least heard of; the stock photo of a hacker overlaid with a pattern of ones and zeros is familiar enough. Even if you never come in contact with binary numbers while using a computer, you can rest assured they’re there.

Binary is a base‑2 system, with only two possible digits per column: zero and one.

So it takes three columns of binary digits to count eight values, 000 through 111 (the equivalent of zero to seven in base‑10).
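The three-column count can be written out with a short Python sketch. Python is used here purely for illustration, and the helper name `three_bit_count` is made up for this example:

```python
def three_bit_count():
    """Return the numbers zero through seven written as three binary digits."""
    # "03b" means: binary, padded with leading zeros to three columns.
    return [format(n, "03b") for n in range(8)]

# Print each base-10 number next to its three-column binary form.
for decimal, binary in enumerate(three_bit_count()):
    print(decimal, binary)
```

The last line printed is `7 111`, the highest value that fits in three columns.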

To make it clear that you’re counting in binary and not in base‑10, it’s common to start the number off with “0b”. This doesn’t change the value of the number; it just says “this is a binary number”. For example, “0b101” is the binary way of writing five.
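Several programming languages accept this prefix directly. As a small sketch in Python (chosen only for illustration), the prefix marks how the number is written without changing what it is worth:

```python
# The "0b" prefix marks a binary literal; the value itself is unchanged.
five = 0b101
print(five)      # 5
print(bin(5))    # 0b101

# Converting text that holds binary digits works the same way.
also_five = int("101", 2)
print(also_five == five)   # True
```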

Computers don’t mind the complexity, but for us humans binary numbers can be really difficult to keep track of when they get large. It’s not easy for us to read a binary number at a glance and keep it in mind. They can get pretty unwieldy:
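As one illustrative case (the number one million is just an arbitrary pick for this sketch, and Python is used only for illustration), here is what a fairly ordinary number looks like in binary:

```python
one_million = 1_000_000
binary_text = bin(one_million)

print(binary_text)            # 0b11110100001001000000
print(len(binary_text) - 2)   # 20 digits after the "0b" prefix
```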

That’s such a long binary number! Way too long and complicated to keep track of in our heads.

So, even though computers have to think and work in binary, there has to be a better way for humans to work with those numbers. As an example, let’s say that we limited computers to handling numbers in chunks of ten, just to keep it easy for us. Here’s how the numbers zero through nine are represented in binary: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, and 1001.

It takes four columns of binary digits to write the tenth number, a nine. The thing is, whether it’s a “1” or a “0”, all of these binary digits take the same amount of space in the computer’s memory. If you’re going to represent a number with four digits of binary, you might as well count as high as you can go with four digits, which is 1111, or fifteen in base‑10.

So that’s a total of 16 values that can be represented in four columns of binary: zero through fifteen.
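A quick way to check that count, sketched in Python purely for illustration (the variable name `four_bit_values` is made up here):

```python
# Every value that fits in four columns of binary, padded with leading zeros.
four_bit_values = [format(n, "04b") for n in range(16)]

print(len(four_bit_values))    # 16
print(four_bit_values[9])      # 1001  (nine, the last single decimal digit)
print(four_bit_values[15])     # 1111  (fifteen, the highest four-digit value)
```

Stopping at nine would leave six of the sixteen possible four-digit patterns unused.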

0x10 (16)

Enter: the “hexadecimal” number, base‑16. Instead of having ten digits (zero through nine) or two digits (zero and one), it has sixteen possible digits per column (zero through nine, and then the first six letters of the alphabet — “A” through “F”).

Similar to how you would commonly write a binary number starting with “0b” just to say that it’s binary, you’ll usually see a hexadecimal number starting with “0x” to let you know that it’s not a normal base‑10 number.

It’s still a little strange to read hexadecimal numbers since they don't match up with the base‑10 counting that we’re used to, but it’s better than trying to read and write in binary!

Those “0x” hexadecimal numbers are MUCH easier to read, write, and keep track of than the “0b” binary versions. Of course base‑10 would be ideal, but if the computer has to store numbers in binary we might as well look at those binary numbers in a way that’s easier to keep track of.
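Because one hexadecimal digit covers exactly four columns of binary, converting between the two is just a matter of grouping. The sketch below shows the idea in Python (used only for illustration); the function name `nibbles_to_hex` is hypothetical, and Python’s built-in `hex()` would do the same job:

```python
def nibbles_to_hex(bits):
    """Convert a string of binary digits to hex, four bits per hex digit."""
    # Pad on the left with zeros so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    # Cut the digits into groups of four columns each.
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each group of four binary digits becomes one hexadecimal digit.
    return "0x" + "".join(format(int(group, 2), "X") for group in groups)

print(nibbles_to_hex("11110100001001000000"))   # 0xF4240
print(nibbles_to_hex("101111"))                 # 0x2F
```

Twenty binary digits shrink to five hexadecimal ones, which is the whole appeal.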

And why “A” through “F”? Some work has been done in the past to come up with new written forms for these extra six digits, but it’s easier to just type and remember letters that you’re already familiar with. Note: it doesn’t matter if you use a capital or lowercase letter, so “0x2F” is the same as “0x2f”. Maybe it could be a fun exercise to invent new shapes for these extra six digits...
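The point about capital and lowercase letters is easy to confirm. A small Python sketch, used only for illustration:

```python
# Upper and lower case hexadecimal digits mean the same thing.
print(0x2F == 0x2f)     # True
print(0x2F)             # 47  (2 sixteens plus 15 ones)

# The same holds when converting from text.
print(int("2F", 16) == int("2f", 16))   # True
```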

Hex colors

The more you look around, the more situations you’ll find where these hexadecimal versions of binary numbers occur on the computer. When making a new document in Photoshop you might choose to make it with “8 bit” color, which means that each pixel is represented with eight columns’ worth of binary digits for each of the red, green, blue, and alpha channels, so every channel holds a value from 0b00000000 to 0b11111111.

Since “0b11111111” is the number 255 in base‑10, we have a total of 256 shades in each color channel per pixel, with 0 being the darkest. Dealing with color values in ones and zeroes is too complicated for us to keep track of, so a web designer (for instance) would define colors using “hex values”, which really means that these eight binary digits are reduced to two when they’re converted to hexadecimal: 0b11111111 becomes 0xFF.
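The arithmetic for a single channel can be checked with a few lines of Python (used only for illustration; the variable names are made up for this sketch):

```python
max_channel = 0b11111111            # eight columns of binary, all "on"

print(max_channel)                  # 255
print(format(max_channel, "X"))     # FF

# Counting zero as a shade too gives 256 possible values per channel.
shades = max_channel + 1
print(shades)                       # 256
```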

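Therefore, when specifying colors for the web, a color is typically written as six hexadecimal digits, two each for red, green, and blue: “#FF0000” is pure red and “#000000” is black.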
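To see how a web color breaks down into its channels, here is a minimal Python sketch. It assumes the common six-digit “#RRGGBB” form only, and the function name `parse_hex_color` and the sample color are hypothetical, chosen for this example:

```python
def parse_hex_color(color):
    """Split a "#RRGGBB" string into red, green, and blue values (0 to 255)."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits")
    # Each pair of hexadecimal digits is one eight-bit channel.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(parse_hex_color("#FF8000"))   # (255, 128, 0)
```

Each pair of digits is one of the eight-column binary values described above, just written more compactly.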

But what does this have to do with fonts?

ASCII

You might not see it, but characters of text are usually represented with these hex values too.

In older computing systems, if someone typed the letter “A”, the computer wouldn’t know anything about what an “A” was. The keyboard would really just be sending the binary signal 0b1000001, which is the number 65 in our base‑10 counting system. The computer would then know to display the 65th glyph in the font. I’m simplifying things a little, but that’s essentially all there is to it!
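The link between the letter and its number is still visible in modern languages. A short Python sketch, used only for illustration, where `ord` and `chr` are Python’s built-in conversions between a character and its number:

```python
code = ord("A")             # the number behind the letter "A"

print(code)                 # 65
print(code == 0b1000001)    # True
print(chr(0b1000001))       # A
```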

In the ASCII chart from the ancient days of computing, if you find the letter “A” and then write out the binary “bits” that it gives you in order, you get 1000001.

In this case, there were only 7 bits, which gave you a total of 128 kinds of characters that the computer knew how to deal with. A bunch of those characters aren’t even letters or numbers; they were used for commands like ESC for the “Escape” key, HT for “Horizontal Tab”, or BEL to make the computer beep when it came across that character.
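Both the 128 total and the control characters can be sketched in a few lines of Python (used only for illustration; the dictionary name `control_codes` is made up, while the numbers 7, 9, and 27 are the standard ASCII values for BEL, HT, and ESC):

```python
# Seven columns of binary give 2 to the power of 7 possible values.
print(2 ** 7)    # 128

# A few of the ASCII control characters and their numbers.
control_codes = {"BEL": 7, "HT": 9, "ESC": 27}

for name, code in control_codes.items():
    # Show each code in base-10 and as seven binary digits.
    print(name, code, format(code, "07b"))
```

The last line printed is `ESC 27 0011011`.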

Unicode

Back to the 65 for the “A”: this number 65 written in hexadecimal/base‑16 is 0x41, which just so happens to be the Unicode value for the “A”. You might be used to seeing it written as “0041” or “0x0041”, but those extra zeroes make no difference, just as the decimal number 0065 is the same as 65 without the leading zeroes.
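Here is the same conversion as a small Python sketch (used only for illustration). The “U+” form in the last line is the conventional way Unicode values are written, padded to at least four hexadecimal digits:

```python
print(hex(65))                  # 0x41
print(0x41 == 0x0041 == 65)     # True, leading zeroes change nothing

# Format the number behind "A" the way Unicode charts usually show it.
label = "U+" + format(ord("A"), "04X")
print(label)                    # U+0041
```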

The first 128 Unicode values match up exactly with this early ASCII system, but then the Unicode standard takes it pretty far from there.

How far? Those four columns of hexadecimal values, 0x0000 to 0xFFFF, are enough for 65,536 characters, which was an early limit in Unicode (but that’s still a lot!). Now there are 17 separate “planes” of 65,536 code points each for a total of 1,114,112 code points, all still of course identifiable with their own unique hexadecimal number.
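The totals are easy to verify. A minimal Python sketch, used only for illustration, with constant names made up for this example:

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000     # 65536, one more than 0xFFFF

total = PLANES * CODE_POINTS_PER_PLANE
print(total)                        # 1114112

# Counting starts at zero, so the highest code point is one less than the total.
highest = total - 1
print(format(highest, "X"))         # 10FFFF
```

So the largest value takes six hexadecimal digits at most, which is still far easier to read than its twenty-one binary digits.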

 




— Andy Clymer, for Type@Cooper, August 2017