RSGC ACES' PRIMER ON FIXED POINT MATHEMATICS (aka INTEGER MATH) Originally prepared in 2009 for Java Software ACES ; updated in 2019 for Arduino C Hardware ACES 
Introduction
Intense real time computational mathematics involving real numbers (as typically
found in 3D gaming software) places a unique burden on the processor. Over
the past few decades,
game developers
have overlooked software techniques to optimize performance, relying heavily
on increases in the speed of processors as compensation. However, with
the advent of small, inexpensive, handheld devices sharing the spotlight
with traditional platforms, the
need
for software to do the heavy lifting has returned.
Fixed Point vs Floating Point
At the most fundamental level, all computer platforms employ a processor to
manipulate bits (0s and 1s) of memory.
The processor
bundles groups of 8, 16, 32, or 64 bits and passes them around
in its internal bus while executing instructions. The numbers that these bundles
represent typically fall into one of two categories: Fixed Point or
Floating Point. The difference being the location of the decimal point
within the bundle.
When the Java programmer declares a variable of type int, he is requesting that 32 bits be reserved and that the position of the decimal point is to the right of the least significant bit, just as it is when when we write any integer in mathematics. Since the location of the decimal point remains in this location at all times, we refer to this as the fixed point representation.
When the Java programmer declares a variable of type double, he is requesting that 64 bits be reserved in a structure similar to scientific notation. Within this design, 52 bits are set aside to represent the number (mantissa), and 11 bits are reserved for the exponent. The value of exponent essentially dictates the location of the decimal point. In this manner, we say the location floats and identify this design as the floating point representation.
Performance Considerations
Keeping track of the location of the decimal point in the
floating point representation requires resources. Some computers solve
this problem by including a coprocessor for floating point manipulations or
simply additional
circuitry
that exploits a hardware solution to the problem of decimal point location management.
Either way, overall performance is adversely affected as additional instructions
must be added to the instruction set.
Optimal performance would then seem to rest with calculations involving fixed point representations such as ints where the locations of the decimal place is never in doubt as it remains in one place. The only problem is that we need fractional quantities to the right of the decimal point.
Java's Bitwise Operators
Below is a summary by example of the useful operators Java offers
for lowlevel bit manipulation of the four integer types.
Operator 
Operation 
Example 
Details 
>> 
right shift with sign extension 
56 >> 3 = 7 
111000 becomes 000111 
<< 
left shift 
3 << 4 = 48 
11 becomes 110000 
& 
AND 
5 & 6 = 4 
101 & 110 = 100 
 
OR 
5  6 = 7 
101  110 = 111 
~ 
NOT (complement) 
~20 = 21 
~00010100 = 11101011* 
Implicit Location of the Decimal Point
To capitalize on the processor's efficiency in the manipulation of ints
while maintaining fractional values, the programmer simply decides on an
implicit position of the decimal place that best fits the range of numbers
he needs to work with. For simplicity, let's assume that ints are only 8
bits wide (not 32), the decimal place is in the middle, between
bit 3 and 4, and that they are unsigned. What number then would the following
represent?
1  0  1  1  0  1  0  1 
This design could be referred to as 4.4, since there are 4 bits to the left of the assumed position of the decimal point and 4 bits to the right. What number would the configuration above represent under a 5.3 design?
Addition
As with addition tasks you undertake manually, once the decimal
places of the operands are aligned, the result
can be obtained. The location of the decimal place of the result remains the
same as each of the two operands. No adjustment necessary.
Review of Data Types
To assume management responsibilities for the location of the decimal point,
it is essential to understand the number of bits in each of the
four integer types available in Java.
Type 
Bytes 
Bits 
States 
Range 
byte 
1 
8 
256 
128..127 
short 
2 
16 
65,536 
32768..32767 
int 
4 
32 
4,294,967,296

2,147,483,648..2,147,483,647 
long 
8 
64 
18,446,744,073,709,551,616 
9,223,372,036,854,775,808..9,223372036,854,775,807 
Multiplication
The strategy for handling fixed point multiplication is similar to the way you
multiply decimal numbers. Once the numbers are rightaligned, you multiply
without regard to the number of decimal places, and then set the number of
decimal places in the product to the sum of the numbers of decimal places
in the multiplicands.
However, unlike addition, the number of decimal places in the product can be as many as double the number of places in the multiplicands. To accommodate this, cast the int operands up to long for the product and then, after adjusting the decimal place, cast the result back to int prior to returning it.
Fractions
Fixed Point representation of fractions consisting solely of sum of powers of
2 with negative exponents can be created in the following straightforward manner.
Expression 
Value 
1 << FP.res  1 
0.5 
1 << FP.res  3 
0.125 
3 << FP.res  1 
1.5 
15 << FP.res 2 
3.75 
Other fractional values can only be approximated and the programmer has to live with roundoff error. Consider the following examples of the binary equivalents of the reciprocals of the first few integers.
n  1/n base 2 

2
 .1 
3
 .01 01 
4
 .01 
5
 .0011 0011 
6
 .001 01 
7
 .001 001 
8
 .001 
9
 .0 000111 
10
 .00011 0011 
11
 .00010111 0100010111 
12
 .0001 01 
13
 .0001001 110110001001 
14
 .0 001 
15
 .0001 
For Fixed Point representations of other rational numbers (fractions) a division method can be employed.
Division
This could be handled two ways. First, would be to combine the earlier concepts
of multiplication and reciprocal fractions. The second is to use integer
division (int/int)
through the use of the / operator. Since the operands are not floating point
values the processing cost is reasonable.
The latter technique requires two considerations. First, to avoid the truncation error associated with integer division, the dividend can be shifted left as far as you like (preferably as far as possible). Second, to normalize the result, the quotient is shifted right the correct number of places to preserve your Fixed Point format.
Negative Numbers: Two's Complement
The Two's Complement Algorithm is the method used by most computer platforms
to negate integers. The internal binary representation for 1 is the result
of applying the Algorithm to +1. This is demonstrated as an 8bit example
in the table below, resulting in 1 being stored as 1111 1111. Now, grab
a piece of paper and apply the Two's Complement Algorithm to 1 and examine
the outcome. As you
see from your result, the Two's Complement Algorithm should be thought
of as the process of
negation.
x: 
0 
0 
0 
0 
0 
0 
0 
1 
~x: 
1 
1 
1 
1 
1 
1 
1 
0 
+1: 
1 

x: 
1 
1 
1 
1 
1 
1 
1 
1 
Below is a table consisting of the nibblesize binary representations of the integers in the range, 2^{3}<= n <= 2^{3}1.
n 
Internal Binary Representation 
7 
0111 
6 
0110 
5 
0101 
4 
0100 
3 
0011 
2 
0010 
1 
0001 
0 
0000 
1 
1111 
2 
1110 
3 
1101 
4 
1100 
5 
1011 
6 
1010 
7 
1001 
8 
1000 
