ASCII Encoding
Imagine you have some text that you want inside a computer, such as the following:
aaAH! computers!
This is complicated by the fact that computers aren't very smart and don't know what letters are.
All digital information must eventually be stored as a series of `0`s and `1`s. This is known as binary, with each `0` or `1` being a "bit." In order to store and display text, we have to somehow convert all of our complicated human symbols into binary.
We call this process encoding. Roughly, we may say encoding is "taking information stored in one format, and turning it into another." There are essentially an infinite number of ways to do that, and computer scientists spend a lot of time arguing about them.
It's important to recognize that there's a somewhat fundamental obstacle with representing things in binary—there are no spaces to separate things! Say for instance you started writing out the alphabet like this, counting up in binary:
| Letter | Code |
|---|---|
| a | 0 |
| b | 1 |
| c | 10 |
| ... | ... |
This starts to work: if we see just a `1`, that's clearly `b`, and just a `0` is `a`. But what about `10`? Does that mean `ba` or `c`? We aren't allowed to do something like `1 0 10 = bac`; it all has to be in one long line, like `1010`.
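To see the problem concretely, here's a small sketch (the `all_decodings` helper is hypothetical, written just for this illustration) that tries every possible way to split a bit string using the toy table above:

```python
# Toy variable-length code from the table above: a=0, b=1, c=10
codes = {"a": "0", "b": "1", "c": "10"}

def all_decodings(bits):
    """Return every way the bit string could be split into letters."""
    if bits == "":
        return [""]
    results = []
    for letter, code in codes.items():
        if bits.startswith(code):
            # This letter could come first; decode the rest recursively.
            for rest in all_decodings(bits[len(code):]):
                results.append(letter + rest)
    return results

print(all_decodings("1010"))  # -> ['baba', 'bac', 'cba', 'cc'] — four different readings!
```

Because the codes for `a` and `b` are prefixes of the code for `c`, a single bit string has several valid readings, which is exactly the ambiguity we need to avoid.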
This introduces us to two fundamental questions any text encoding must answer:
- Can I represent every letter uniquely?
- Can I tell where one letter ends and another begins?
ASCII is a simple and widely recognized English text encoding that fits these requirements. It's old and has some historic quirks, but for now we'll pretend it's perfect.
It solves our problem in the most straightforward way: just make everything the exact same size. It turns out that if we use 7 bits per character (allowing us to have 128 unique codes), we can represent all uppercase and lowercase letters, along with some important extras like spaces, punctuation, and special characters.
For ~computer science reasons~, we often add another `0` to the beginning to bring this up to a nice even 8 bits. So, to find where one letter begins and another ends in our long string of binary, we can simply go forward and backward 8 bits and we'll always land on a clean boundary.
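A quick sketch of why fixed width makes decoding so easy: finding letter boundaries is just arithmetic, stepping 8 bits at a time.

```python
# With a fixed-width code, every 8-bit slice is guaranteed to be one letter.
# (These are the real ASCII codes, padded to 8 bits.)
bits = "011000010110001001100011"  # "abc" in ASCII

chars = []
for i in range(0, len(bits), 8):    # step forward 8 bits at a time
    chunk = bits[i:i + 8]           # always lands on a clean boundary
    chars.append(chr(int(chunk, 2)))  # binary string -> number -> character

print("".join(chars))  # -> abc
```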
| Letter | Code |
|---|---|
| a | 01100001 |
| b | 01100010 |
| c | 01100011 |
| ... | ... |
So, using ASCII, our text from earlier becomes:
`aaAH! computers!` = `01100001011000010100000101001000001000010010000001100011011011110110110101110000011101010111010001100101011100100111001100100001`

or, written out with a little more separation,

`01100001 01100001 01000001 01001000 00100001 00100000 01100011 01101111 01101101 01110000 01110101 01110100 01100101 01110010 01110011 00100001`
That's it! We've successfully taken some text and converted it to binary using ASCII encoding.
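The whole encoding step can be sketched in a couple of lines, since Python's `ord` gives a character's ASCII number and `format(..., "08b")` writes it as 8 binary digits:

```python
# Each character -> its ASCII number -> 8 binary digits, joined with spaces.
text = "aaAH! computers!"
encoded = " ".join(format(ord(ch), "08b") for ch in text)
print(encoded)
# -> 01100001 01100001 01000001 01001000 00100001 00100000 ...
```

Try changing `text` to any other English string and you'll get its ASCII encoding the same way.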