Encoding

“Ah Gravity, Thou Art a Heartless Bitch”

Sheldon Cooper – The Big Bang Theory

I like how, in that series, Sheldon is okay with calling gravity that, and I had a similar feeling when I first started researching encoding. It is a small piece of the puzzle that is easy to reason about but often overlooked in programming.

During my online research I found some good articles for understanding the concepts. In this post I will try to summarize them so that when we implement encoding later, it makes sense and comes naturally to us. Please leave a comment if you would like to share your own understanding of the topic, or if you have some interesting facts or examples. So, let's discuss what encoding is and why it is important; in coming posts we will see some implementation details in C#. This post will also serve as a reference point that we can come back to whenever we need.

To encode means to use something to represent something else. An encoding is the set of rules with which to convert something from one representation to another.

If you are dealing with text in a computer, you need to know about encoding. Period.

A computer cannot store “letters”, “numbers”, “pictures” or anything else. The only thing it can store and work with are bits. A bit can only have two values: yes or no, true or false, 1 or 0 or whatever else you want to call these two values.

To use bits to represent anything at all besides bits, we need rules. We need to convert a sequence of bits into something like letters, numbers and pictures using an encoding scheme, or encoding for short. For example, one such scheme maps the bit pattern 01100001 to the letter a and 01100010 to the letter b.

That encoding scheme happens to be ASCII. In total, there are 128 characters defined in the ASCII encoding.
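
To make the idea of a lookup between bits and letters concrete, here is a minimal sketch in C#. The class and method names (AsciiBits, ToBitString, FromBitString) are my own for illustration; only Encoding.ASCII and the Convert methods come from .NET.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AsciiBits
{
    // Encode text as ASCII bytes and write each byte as 8 binary digits.
    public static string ToBitString(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return string.Join(" ", bytes.Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));
    }

    // Read space-separated groups of binary digits back into ASCII text.
    public static string FromBitString(string bits)
    {
        byte[] bytes = bits
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(group => Convert.ToByte(group, 2))
            .ToArray();
        return Encoding.ASCII.GetString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(ToBitString("bits"));
        // 01100010 01101001 01110100 01110011
        Console.WriteLine(FromBitString("01100010 01101001 01110100 01110011"));
        // bits
    }
}
```

Note that Encoding.ASCII silently replaces any character outside the 128 defined ones with a question mark, so this sketch is only meaningful for plain English text.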

Encode: Convert into a coded form (code = a system of words, letters, figures or other symbols)

Text is always a sequence of bits that needs to be translated into human-readable text using lookup tables. If the wrong lookup table is used, the wrong character is shown.
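
Here is a small sketch of what "the wrong lookup table" looks like in practice: the same two bytes are decoded once as UTF-8 and once as ISO-8859-1 (Latin-1). The class name WrongTableDemo and its helper methods are my own; I am assuming the "iso-8859-1" encoding name is available, which it is by default on current .NET runtimes.

```csharp
using System;
using System.Text;

public static class WrongTableDemo
{
    // The letter é (U+00E9) is stored by UTF-8 as the two bytes C3 A9.
    public static byte[] EncodeAsUtf8(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Decode the same bytes with whichever lookup table the caller picks.
    public static string Decode(byte[] bytes, Encoding table)
    {
        return table.GetString(bytes);
    }

    public static void Main()
    {
        byte[] bytes = EncodeAsUtf8("\u00E9");
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        string right = Decode(bytes, Encoding.UTF8); // "é", one character
        string wrong = Decode(bytes, latin1);        // "Ã©", two characters

        Console.WriteLine(BitConverter.ToString(bytes)); // C3-A9
        Console.WriteLine(right.Length);                 // 1
        Console.WriteLine(wrong.Length);                 // 2
    }
}
```

The bytes never changed; only the table used to read them did, and that alone turned one character into two different ones.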

You’re never actually directly dealing with “characters” or “text”, you’re always dealing with bits as seen through several layers of abstractions.

If two systems are talking to each other, they always need to specify which encoding they are using. The simplest example of this is this website telling your browser that it is encoded in UTF-8.

Bits, Bytes and Representation of Information

Digital representation means that everything is represented by numbers only.
The usual sequence:

  • Something (sound, pictures, text, instructions, ...) is converted into numbers by some mechanism.
  • These numbers can be stored, retrieved, processed, transmitted.
  • The numbers might be reconstituted into a version of the original.

Binary, Octal, Decimal, Hex

There are many ways to write numbers. For example, 10011111 (binary) is equal to:

  • 237 in octal
  • 159 in decimal
  • 9F in hexadecimal

They all represent the same value, but hexadecimal is shorter and easier to read than binary.
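
If you want to check conversions like these yourself, the following sketch uses the built-in Convert class. The class name NumberBases and the helper Rewrite are hypothetical names of mine; Convert.ToInt32(string, int) and Convert.ToString(int, int) are standard .NET methods that accept base 2, 8, 10 and 16.

```csharp
using System;

public static class NumberBases
{
    // Parse a string of binary digits and write the same value in another base.
    public static string Rewrite(string binaryDigits, int toBase)
    {
        int value = Convert.ToInt32(binaryDigits, 2);
        // Convert.ToString returns lowercase hex digits, so normalize to uppercase.
        return Convert.ToString(value, toBase).ToUpperInvariant();
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("10011111", 8));  // 237
        Console.WriteLine(Rewrite("10011111", 10)); // 159
        Console.WriteLine(Rewrite("10011111", 16)); // 9F
    }
}
```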

Hex Encoding / Decoding in .NET

Let's start simple: we will encode the plain text “Hello World” into a hex representation (a hex string). The other half of the equation is to reconstruct the plain text from that hex string. You can find plenty of other examples in different programming languages, with various variations and patterns; I will try to keep the process as simple as possible. The code is meant to be self-explanatory, but if something is not clear, feel free to ask in the comments and I will try to answer. The focus here is to familiarize ourselves with the code, and we will keep building on top of it.

Encoding Example
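
A minimal sketch of the encoding step could look like this. The names HexEncoder and ToHex are my own, and I am assuming UTF-8 as the byte encoding of the text; for plain English text such as “Hello World” this gives the same bytes as ASCII.

```csharp
using System;
using System.Text;

public static class HexEncoder
{
    // Turn text into bytes (UTF-8 assumed), then write each byte as two hex digits.
    public static string ToHex(string plainText)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(plainText);
        var hex = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            hex.Append(b.ToString("X2")); // "X2" = uppercase hex, padded to 2 digits
        }
        return hex.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToHex("Hello World"));
        // 48656C6C6F20576F726C64
    }
}
```

Each character of “Hello World” becomes exactly two hex digits here, for example H is 48 and the space is 20.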

Decoding Example
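
The reverse direction can be sketched in the same spirit. Again, HexDecoder and FromHex are hypothetical names, and the bytes are assumed to be UTF-8 text; the hex string is read two digits at a time because one byte is always two hex digits.

```csharp
using System;
using System.Text;

public static class HexDecoder
{
    // Read the hex string two digits at a time, rebuild the bytes,
    // then translate the bytes back into text (UTF-8 assumed).
    public static string FromHex(string hex)
    {
        if (hex.Length % 2 != 0)
        {
            throw new ArgumentException("A hex string must have an even number of digits.", nameof(hex));
        }

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(FromHex("48656C6C6F20576F726C64"));
        // Hello World
    }
}
```

Convert.ToByte throws a FormatException if it meets a character that is not a hex digit, which is a reasonable behavior for a first version.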

So, this was a basic introduction to the mechanics of encoding (and, to some degree, decoding) with a simple example. We will build on this knowledge to be able to communicate with devices.

Have a nice day and Happy Coding

Jawad Hasan Shani

References:
http://kunststube.net/encoding/
http://dotnet.sys-con.com/node/192527/mobile