
Numerical Bases – Part 2: Bit-wise operations

In my previous post, I covered the essentials of how to convert between representations of a number in different bases.
In this post, I will go into some more advanced topics related to numerical bases.

The primary use of a numerical base is to represent numbers. At the start, a base is chosen for convenience. We ended up with base 10 ostensibly because we have 10 fingers and we used them to count. When the fingers ran out, we’d count up hands, thereby introducing a positional system.

But it did not have to be like that: it is perhaps no accident that we ended up with base 10, but neither was it a foregone conclusion. For example, French retains vestiges of base-20 thinking in the words for the numerals between 60 and 99. Unlike the other Romance languages, or English, where numerals are organised in groups of 10, French numerals from 61 to 79 are all prefixed by “soixante” (“sixty”) followed by a unit from “one” to “nineteen”. Numbers from 80 to 99 are prefixed by the wonderfully descriptive “quatre-vingts” (“eighty”), literally meaning “four twenties”.

Even more familiar to us, although perhaps in hiding, are duodecimal (base 12) and sexagesimal (base 60), present in our time reckoning (hours, minutes, seconds) inherited from the Sumerians and Babylonians, and in several traditional measurement systems, among them the British Imperial system.

The reason base 12 is so favoured for units of measurement is that it has more divisors, so more fractions can be represented exactly instead of as approximations with repeating digits. For example, the quantity 1/3 becomes 0.333... in base 10, but exactly 0.4 in base 12.

The purpose of this introduction is to show that using different bases for different purposes, depending on your needs, is a natural choice to make and that we should not feel constrained by base-10 just because we’ve been using it all our lives. And in computing, we have good reasons to use base-2 or base-16, not only to speed up computations but also to manipulate information.

From the start of computing, computers have been binary. What this means is that the smallest unit of information we can store in a computer, or rather a memory circuit, is a state being turned ON or OFF. In physical terms, this is represented by the passage or blocking of an electrical current.

As well as a minimal unit of information, we also have minimal circuits that implement the basic logical operations of negation, conjunction and disjunction. The corresponding circuits are known as ‘logic gates’, and their names are, respectively, NOT, AND, and OR.

By combining these gates over an arbitrary number of bits, we can devise a logical circuit for any computable function, and ultimately any classical (ie. non-quantum) computer. In practice, circuits can be built using only NAND gates, but those are less intuitive and do not show up in programming languages. The basic logic gates NOT, AND, OR and XOR, however, do, and they are used very often.

The reason is that they are very fast operations (they map directly to basic gates) which allow us to operate on each bit individually, saving memory in the process. So, how do we do it?

In the old Spectrum days, programmers had to save memory and time wherever they could. They could not afford to store a single boolean value in a 32-bit integer, wasting 31 entire bits. If they wanted to represent the state of an object in a game, every aspect of it would be coded in as few bits as possible. That means, for example, that a single byte could represent several different things about a moving object. Similar packing is still widely used in network protocols, like IP or TCP.

For example, suppose you had a figure in a platform game that moved from left to right and back again at one of 4 different speeds, alternated between two colours as it went, changed its graphics (aka sprite) at each position (to give the illusion of animation), and chose between two modes of locomotion (running or flying). All of this state could be represented in a single byte like this:

bit 7: direction of movement
bit 6-5: speed
bit 4: colour
bit 3-1: sprite
bit 0: locomotion mode

That is a lot to pack into a single byte. Although this byte clearly corresponds to some decimal number, the programmer is not interested in that value at all. No. This byte is in effect a single record with 5 different fields, and each of them has to be changeable independently of the others. How do we do that efficiently?

Suppose the figure is currently represented by the sixth (out of 8) sprite of the flying animation, moving towards the left, at speed 2 (out of 0 to 3), and in red instead of white. This could be represented, for example, like this:

bit 7: 0
bit 6-5: 10
bit 4: 1
bit 3-1: 101
bit 0: 1
Byte: 01011011
Decimal value: 91
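The layout and the example state above can be sketched in Python (the function names `pack_state` and `unpack_state` are my own, purely illustrative; this is not code from any real game):

```python
# Hypothetical packing of the game-object state described above.
# Bit layout: 7 = direction, 6-5 = speed, 4 = colour, 3-1 = sprite, 0 = locomotion.
def pack_state(direction, speed, colour, sprite, locomotion):
    return (direction << 7) | (speed << 5) | (colour << 4) | (sprite << 1) | locomotion

def unpack_state(byte):
    return {
        "direction": (byte >> 7) & 0b1,
        "speed": (byte >> 5) & 0b11,
        "colour": (byte >> 4) & 0b1,
        "sprite": (byte >> 1) & 0b111,
        "locomotion": byte & 0b1,
    }

# The example state: moving left, speed 2, red, sprite 5, flying.
state = pack_state(0, 0b10, 1, 0b101, 1)
print(state)  # 91
```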

At the next frame, the figure will have advanced to the next sprite in the cycle, and changed colours back to white, conserving all other values. The result would be

01001101(2) = 77(10)

We can change the state by updating the byte’s value, subtracting 14.
For the next step, the sprite number increases again, and once more the colour reverts to red. The new state is:

01011111(2) = 95(10).

So, in the first case I subtracted 14, in the second case I added 18. Is there a consistent logic to this? Can we map every possible state change to a value that we add or subtract?

I might be able to do that, but it is going to be very tedious and difficult. If there are several fields changing in one change of state, I’d have to consider all the possible combinations of the current state of those fields, and there are better things I can do with my time. What I really want is to isolate each field change in an individual operation (or series thereof) and then execute them in any order. Let’s take a simple one: toggle the colour:

IF colour == white THEN change to red:
IF bit4 == 0 SET bit4 = 1. Equivalently: ADD 16

IF colour == red THEN change to white:
IF bit4 == 1 SET bit4 = 0. Equivalently: SUB 16

Adding or subtracting 16 is easy, but there is the matter of the IF. How do I know what colour is set? That depends on the state of all the other bits, and because there are 7 of them, they can be in 128 different states. That is, out of 256 possible numbers represented by the byte, half of them will represent the red colour, and half the white colour. If we are going to rely on standard integer operations, we will have to test 128 equalities, or do a series of divisions until we get the value of that bit.

This is where bitwise operations come in, and where they can save the day elegantly and powerfully. Colour is represented by a single bit. To toggle it, I just have to toggle the bit, that is:

SET bit4 = NOT bit4

Your chosen programming language may well have a simple instruction that toggles precisely bit 4 of a certain byte. More often than not, though, that won’t be the case, and you can only manipulate bytes as a whole (or in multiples). In that case, I want to toggle one bit and keep all the others. This can be achieved with the XOR function:

XOR| 0 | 1
---+---+---
 0 | 0 | 1
---+---+---
 1 | 1 | 0

Notice that b XOR 0 = b and b XOR 1 = NOT b where b is a single bit. This operator works as either the identity or the negation function, depending on its second argument, and so can be used to do a selective NOT by XORing with 0 where we want to keep a bit and XORing with 1 where we want to change it. To change only bit 4, I do this:

TOGGLE bit 4:
SET bit4 = bit4 XOR 00010000(2).

This second argument is a bit-wise mask. A mask selects the bits we want to change with a logical operation: it places the operation’s neutral element (the bit value that does nothing) at the positions we want to keep, and the opposite bit value at the positions we want to change. This logic works for all the basic gates.
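In Python, for instance, the whole colour toggle is a single XOR with that mask:

```python
state = 0b01011011       # 91, the red state from the example

state ^= 0b00010000      # toggle bit 4: red -> white
print(bin(state))        # 0b1001011 (75): only bit 4 changed

state ^= 0b00010000      # toggle again: back to red
print(state)             # 91
```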

A more convoluted example is given by the change of sprite. The implicit logic of the animation states that after it reaches the last sprite of the sequence it should go back to the first. That is, from sprite 111(2) we go back to sprite 000(2). Incrementing the sprite number must be cyclic and corresponds to consecutively adding 1 modulo 8. Except that these bits are in the middle of a byte, and so simple addition is powerless to do it, even with a MOD operator.
This is going to require well coordinated use of several different (mostly bitwise) operations.

The target bits occupy positions 3-1. In isolation they can represent the numbers 0 to 7. One way of incrementing this counter and applying the modulo is to do a three-fold algorithm:

1. Read the counter value (read bits 1-3)
2. Increment it and apply modulo 8
3. Place the result back in the byte (write to bits 1-3)

Step 1: read bits 1-3 of byte n

  1. Copy the existing state value to a safe place (eg the stack)
  2. n = n AND 00001110(2)
  3. n = n SHIFT RIGHT 1

In item 2, I executed an AND with a new mask. Because the neutral element of AND is 1, this keeps exactly bits 1-3 unchanged, and applies AND 0 to all others. The effect is to set those bits of n to 0.

At the end of item 2, I have only the 3 bits I care about, but they are not in the position I want. If only bit 1 is set (ie equal to 1), the value of n is 2, and if I increase the counter by 1, n increases by 2 each time. It is, in short, double what I want, because there is one bit to the right of the counter bits. I can discard that bit and push the counter bits to the right with a single SHIFT RIGHT.
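Step 1, spelled out in Python (using the example state byte from above):

```python
n = 0b01011011          # state byte from the example; sprite field = 101 (5)
saved = n               # item 1: safe copy of the whole state
n = n & 0b00001110      # item 2: keep only bits 1-3, zero the rest
n = n >> 1              # item 3: push the counter down to bits 0-2
print(n)                # 5
```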

Step 2: Add 1 modulo 8

  1. n = n+1
  2. n = n AND 00000111(2)

At this stage I have a byte whose value is a number between 0 and 7. There is no simple bit-wise operation, or logical gate, that adds one to a number. So I’ll simply use a common addition or increment. This works except for n = 7, at which point I’d get n = 8.
I would like 0 instead.
But that is not a problem. 8 is the first number that requires 4 bits to represent, which means its lowest 3 bits will all be 0. If I just ignore anything except these 3 bits, I will have a cyclic increment with modulo already computed. That is precisely what the AND does.
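The same step in Python, confirming the wrap-around after 7:

```python
counter = 7                       # last sprite in the cycle
counter = (counter + 1) & 0b111   # increment, then keep only the low 3 bits
print(counter)                    # 0: 7 + 1 = 8 = 1000(2), whose low 3 bits are 000
```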

This is a very useful trick. Computing the remainder (the result of a division modulo something) is in general an expensive calculation, requiring a division to be performed. But when the modulo is a power of 2, there is a shortcut: the remainder is simply the bits to the right of the bit that is set in the modulo. Remember that the remainder is obtained from the integer division:

n / m = (q, r), where:
n = q x m + r

In the above, q is the quotient, and r is the remainder, and the quotient is the largest integer that satisfies the above (for the simpler case where all numbers are positive). Now, look at n in its binary form, for example

10010010110.

Take bit 3 as an example. If we make m = 2^3 = 8, we can rewrite n according to the integer division formula:

n = 10010010000 + 110

where I have removed to a second term all the bits to the right of bit 3.
Now, in binary, as in base 10, a number ending in 0 can be exactly divided by the base in which it is represented. For example, 520 is a multiple of 10 and can be divided by 10 by shifting its digits right once. Equally, 138000 / 1000 can be computed by shifting 3 times. With this same trick, n can be rewritten:

n = 10010010(2) x 2^3 + 110(2)

It should now be obvious that the 3 rightmost bits are the remainder of n / 2^3, which can be computed with the AND shown above; and also that the integer division n / 2^3 can be done simply by shifting 3 times to the right, discarding the 3 rightmost bits.
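Both shortcuts are easy to verify in Python with the example number above:

```python
n = 0b10010010110   # 1174, the example number above

# Remainder of n / 8: the 3 rightmost bits, extracted with an AND mask.
print(n % 8, n & 0b111)    # 6 6

# Integer division by 8: shift the 3 rightmost bits away.
print(n // 8, n >> 3)      # 146 146
```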

Step 3: write to bits 1-3 of byte n
The third step is the reverse of the first. After extracting the bits representing the counter, and operating on them, the result must now be placed back in its proper place.

This can be done by inverting each of the operations of step 1:

  1. restore the safe copy of the original state and place it in A
  2. n = n SHIFT LEFT 1
  3. A = (A AND 11110001) OR n
  4. n = A

The shift instruction above is the counterpart of the shift in Step 1 and places the current value of the counter in its correct position, bits 1-3.
The next instruction is the most interesting one: there are two logical operations in here. The first applies to A the inverse of the mask used in Step 1, and has the effect of erasing exactly the bits 1-3, that is, those where we want to place the new value of the counter. All the other bits of A are preserved by the AND operation, since 1 is its neutral element.
Then, the OR operation simply pastes the value of n into the hole created in A.

This works because the previous steps have made sure that the value of the counter present in n was only written in 3 bits. If this were not the case or I wanted to be safe, I’d add a second mask like this:

A = (A AND 11110001) OR (n AND 00001110)

ensuring that the “hole” created by the mask in A is matched to a “key” in the exact same position in n.
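Step 3 in Python, with the safer double-masked form:

```python
saved = 0b01011011    # safe copy of the original state (sprite field = 101)
n = 0b00000110        # the new counter value, 6, computed in step 2

n = n << 1                                    # move the counter back to bits 1-3
A = (saved & 0b11110001) | (n & 0b00001110)   # erase the old field, paste the new
print(bin(A))                                 # 0b1011101: sprite updated, rest intact
```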

Conclusion

In summary, a mask is a sequence of bits that, when applied to a logical operation and an argument, keeps some bits of that argument unchanged, and applies the operation to the others.
Think of a mask as something that covers a face, except where it has holes: a bit mask “covers” some bits and only lets the others be visible to the operation.
The covered bits are those in positions where the mask has the neutral element of the operation (1 for AND, 0 for OR and XOR).

The example above did not afford an occasion to use an OR mask, but we saw masks for both XOR and AND. Typically, we use XOR for inverting bits or, more generally, for operations that need to be reversible, because XOR is its own inverse. That is, if you apply the same XOR twice you go back to where you started: a XOR b XOR b = a.
The example above showed a typical usage of AND, which is as a selector of important bits. Unselected bits are discarded, that is, turned to 0.
The OR operation is the dual of the AND: it works in the same way, but “discarded” bits are instead turned to 1, and the neutral element is 0 instead of 1.
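A quick Python illustration of an OR mask (not needed in the example above): forcing a bit to 1 unconditionally.

```python
state = 0b01001101   # the white state (77)

# OR mask: force bit 4 to 1 (make the figure red) regardless of its current
# value, leaving every other bit alone (0 is the neutral element of OR).
state = state | 0b00010000
print(bin(state))    # 0b1011101
```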

This post explored the usage of base 2 not as a representation for arithmetic calculations but for usages concerned with storage and the precise location of bits. It showed how a number can be seen not just as a quantity, but as an indexable series of data. And although I didn’t need to do any conversion between bases, understanding how numbers are represented, and how each bit contributes to the final sum, is important when we need to shift bits around.

All this talk about operators may feel rather abstract. In reality, though, you should always keep a programming language in the back of your mind, as these operators usually have a counterpart in the language.
In C, and most of its descendants, AND, OR, XOR and NOT are available, written respectively &, |, ^ and ~. Shift operators also exist: << and >>.

Solidity also has all of these, but for some reason the shift operators are not native: instead, they are replaced by an actual division by the corresponding power of 2! (this is scheduled to change with the coming Constantinople fork).

Python has all the bitwise operators of C. The English words and, or and not also exist, but beware: they are logical (boolean) operators, not synonyms for the bitwise ones.
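The difference is easy to see at the interpreter prompt:

```python
print(6 & 3)     # 2: bitwise AND of 110 and 011 is 010
print(6 and 3)   # 3: logical and returns the second operand when the first is truthy
print(~6)        # -7: bitwise NOT on two's-complement integers
print(not 6)     # False: logical negation of a truthy value
```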

Finally, a word about representations. When we deal with masks we usually have to think in binary, because they tend to affect individual bits, but for economy of representation they are often written in hexadecimal instead. The long binary masks used above could instead be written in hexadecimal like this:

00010000(2) = 10(16)
00001110(2) = 0E(16)
00000111(2) = 07(16)
11110001(2) = F1(16)
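Python accepts binary and hexadecimal literals directly, so the equivalences above can be checked in place:

```python
# Each binary mask from the post, matched against its hexadecimal shorthand.
assert 0b00010000 == 0x10
assert 0b00001110 == 0x0E
assert 0b00000111 == 0x07
assert 0b11110001 == 0xF1
print("all masks match")
```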

Hexadecimal is, in effect, a convenient shorthand notation for binary data, more than a calculation tool. I hope this has shed some light on a more advanced use of the binary representation of numbers, in particular for cases where you need to optimise for speed and memory.

See you next time.
