RSGC ACES' PRIMER ON FIXED POINT MATHEMATICS (aka INTEGER MATH)
Originally prepared in 2009 for Java Software ACES; updated in 2019 for Arduino C Hardware ACES
Introduction
Intense real-time computational mathematics involving real numbers (as typically found in 3D gaming software) places a unique burden on the processor. Over the past few decades, game developers have been able to overlook software techniques for optimizing performance, relying heavily on increases in processor speed as compensation. However, with the advent of small, inexpensive, handheld devices sharing the spotlight with traditional platforms, the need for software to do the heavy lifting has returned.
Fixed Point vs Floating Point
At the most fundamental level, all computer platforms employ a processor to manipulate bits (0s and 1s) of memory. The processor bundles groups of 8, 16, 32, or 64 bits and passes them around on its internal bus while executing instructions. The numbers that these bundles represent typically fall into one of two categories: Fixed Point or Floating Point. The difference lies in the location of the decimal point within the bundle.
When the Java programmer declares a variable of type int, he is requesting that 32 bits be reserved and that the position of the decimal point is to the right of the least significant bit, just as it is when we write any integer in mathematics. Since the location of the decimal point remains in this position at all times, we refer to this as the fixed point representation.
When the Java programmer declares a variable of type double, he is requesting that 64 bits be reserved in a structure similar to scientific notation. Within this design, 52 bits are set aside to represent the number (the mantissa), 11 bits are reserved for the exponent, and the remaining bit holds the sign. The value of the exponent essentially dictates the location of the decimal point. In this manner, we say the location floats and identify this design as the floating point representation.
Performance Considerations
Keeping track of the location of the decimal point in the floating point representation requires resources. Some computers solve this problem by including a coprocessor for floating point manipulations, or simply additional circuitry, so that the management of the decimal point's location is handled in hardware. Either way, overall performance is adversely affected, as additional instructions must be added to the instruction set.
Optimal performance would then seem to rest with calculations involving fixed point representations such as ints, where the location of the decimal point is never in doubt because it remains in one place. The only problem is that we need fractional quantities to the right of the decimal point.
Java's Bitwise Operators
Below is a summary by example of the useful operators Java offers for low-level bit manipulation of the four integer types.
| Operator | Operation | Example | Details |
|---|---|---|---|
| >> | right shift with sign extension | 56 >> 3 = 7 | 111000 becomes 000111 |
| << | left shift | 3 << 4 = 48 | 11 becomes 110000 |
| & | AND | 5 & 6 = 4 | 101 & 110 = 100 |
| \| | OR | 5 \| 6 = 7 | 101 \| 110 = 111 |
| ~ | NOT (complement) | ~20 = -21 | ~00010100 = 11101011 |
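If you would like to verify these results for yourself, a short sketch such as the following (the class name `BitwiseDemo` is ours, not part of the primer) prints each row of the table:

```java
// Prints the result of each example from the operator table above.
public class BitwiseDemo {
    public static void main(String[] args) {
        System.out.println(56 >> 3);   // 7   : 111000 becomes 000111
        System.out.println(3 << 4);    // 48  : 11 becomes 110000
        System.out.println(5 & 6);     // 4   : 101 & 110 = 100
        System.out.println(5 | 6);     // 7   : 101 | 110 = 111
        System.out.println(~20);       // -21 : ~00010100 = 11101011 (two's complement)
    }
}
```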
Implicit Location of the Decimal Point
To capitalize on the processor's efficiency in the manipulation of ints while maintaining fractional values, the programmer simply decides on an implicit position of the decimal point that best fits the range of numbers he needs to work with. For simplicity, let's assume that ints are only 8 bits wide (not 32), that the decimal point is in the middle, between bits 3 and 4, and that the values are unsigned. What number then would the following represent?
1 0 1 1 0 1 0 1
This design could be referred to as 4.4, since there are 4 bits to the left of the assumed position of the decimal point and 4 bits to the right. What number would the configuration above represent under a 5.3 design?
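To see how a value moves in and out of such a format, here is a minimal sketch of our own, assuming an unsigned 4.4 layout held in a Java int: encoding multiplies by 2⁴ (a left shift of 4), and decoding divides by 2⁴.

```java
// Hypothetical 4.4 fixed point helper: 4 bits of integer part, 4 bits of fraction.
public class Q44Demo {
    static final int FRAC_BITS = 4;                    // bits to the right of the point

    static int encode(double x) { return (int) (x * (1 << FRAC_BITS)); }
    static double decode(int q) { return q / (double) (1 << FRAC_BITS); }

    public static void main(String[] args) {
        int q = encode(6.75);                          // 6.75 is 0110.1100 in 4.4
        System.out.println(Integer.toBinaryString(q)); // 1101100
        System.out.println(decode(q));                 // 6.75
    }
}
```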
Addition
As with addition tasks you undertake manually, once the decimal points of the operands are aligned, the result can be obtained. The location of the decimal point of the result remains the same as in each of the two operands. No adjustment is necessary.
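A minimal sketch, assuming both operands carry 8 fractional bits (a resolution we chose for illustration), shows that plain int addition is all that is required:

```java
// Fixed point addition: operands share the same implied point, so the sum does too.
public class FixedAdd {
    static final int RES = 8;                          // fractional bits (assumed)

    public static void main(String[] args) {
        int a = (int) (2.5  * (1 << RES));             // 2.5  -> 640
        int b = (int) (1.25 * (1 << RES));             // 1.25 -> 320
        int sum = a + b;                               // 960, still in 8-fraction-bit form
        System.out.println(sum / (double) (1 << RES)); // 3.75
    }
}
```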
Review of Data Types
To assume management responsibilities for the location of the decimal point, it is essential to understand the number of bits in each of the four integer types available in Java.
| Type | Bytes | Bits | States | Range |
|---|---|---|---|---|
| byte | 1 | 8 | 256 | -128..127 |
| short | 2 | 16 | 65,536 | -32,768..32,767 |
| int | 4 | 32 | 4,294,967,296 | -2,147,483,648..2,147,483,647 |
| long | 8 | 64 | 18,446,744,073,709,551,616 | -9,223,372,036,854,775,808..9,223,372,036,854,775,807 |
Multiplication
The strategy for handling fixed point multiplication is similar to the way you multiply decimal numbers. Once the numbers are right-aligned, you multiply without regard to the number of decimal places, and then set the number of decimal places in the product to the sum of the numbers of decimal places in the multiplicands.
However, unlike addition, the number of decimal places in the product can be as many as double the number of places in the multiplicands. To accommodate this, cast the int operands up to long for the product and then, after adjusting the decimal place, cast the result back to int prior to returning it.
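Here is a sketch of that strategy, assuming 16 fractional bits (our choice) for both operands and the result:

```java
// Fixed point multiplication: widen to long, multiply, shift right to restore format.
public class FixedMul {
    static final int RES = 16;                         // fractional bits (assumed)

    static int mul(int a, int b) {
        long product = (long) a * (long) b;            // carries 2 * RES fractional bits
        return (int) (product >> RES);                 // back to RES fractional bits
    }

    public static void main(String[] args) {
        int a = (int) (3.5  * (1 << RES));
        int b = (int) (2.25 * (1 << RES));
        System.out.println(mul(a, b) / (double) (1 << RES)); // 7.875
    }
}
```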
Fractions
Fixed Point representations of fractions consisting solely of sums of powers of 2 with negative exponents can be created in the following straightforward manner.
| Expression | Value |
|---|---|
| 1 << FP.res - 1 | 0.5 |
| 1 << FP.res - 3 | 0.125 |
| 3 << FP.res - 1 | 1.5 |
| 15 << FP.res - 2 | 3.75 |
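Assuming FP.res names the number of fractional bits, a sketch like the following (with a stand-in constant RES of 8) confirms the table. Note that in Java the subtraction binds more tightly than <<, so 1 << FP.res - 1 means 1 << (FP.res - 1).

```java
// Building powers-of-two fractions with shifts, using RES in place of FP.res.
public class FixedFractions {
    static final int RES = 8;                          // fractional bits (assumed)

    static double decode(int q) { return q / (double) (1 << RES); }

    public static void main(String[] args) {
        System.out.println(decode(1 << RES - 1));      // 0.5
        System.out.println(decode(1 << RES - 3));      // 0.125
        System.out.println(decode(3 << RES - 1));      // 1.5
        System.out.println(decode(15 << RES - 2));     // 3.75
    }
}
```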
Other fractional values can only be approximated, and the programmer has to live with roundoff error. Consider the following examples of the binary equivalents of the reciprocals of the first few integers; the repeating portion of each expansion is shown in parentheses.
| n | 1/n base 2 |
|---|---|
| 2 | .1 |
| 3 | .(01) |
| 4 | .01 |
| 5 | .(0011) |
| 6 | .001(01) |
| 7 | .(001) |
| 8 | .001 |
| 9 | .(000111) |
| 10 | .00011(0011) |
| 11 | .00010111(0100010111) |
| 12 | .0001(01) |
| 13 | .0001001(110110001001) |
| 14 | .0(001) |
| 15 | .(0001) |
For Fixed Point representations of other rational numbers (fractions) a division method can be employed.
Division
This could be handled two ways. The first would be to combine the earlier concepts of multiplication and reciprocal fractions. The second is to use integer division (int/int) through the / operator. Since the operands are not floating point values, the processing cost is reasonable.
The latter technique requires two considerations. First, to avoid the truncation error associated with integer division, the dividend can be shifted left as far as you like (preferably as far as possible). Second, to normalize the result, the quotient is shifted right the correct number of places to preserve your Fixed Point format.
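A sketch of the second technique, again assuming 8 fractional bits, pre-shifts the dividend by exactly that amount so the quotient comes out already in the correct format; shifting further before the divide, and then shifting the quotient back right, preserves additional precision as described above.

```java
// Fixed point division: shift the dividend left before integer division.
public class FixedDiv {
    static final int RES = 8;                          // fractional bits (assumed)

    static int div(int a, int b) {
        // Without the pre-shift, int division would leave 0 fractional bits and
        // the fractional part of the quotient would be truncated away.
        return (int) (((long) a << RES) / b);
    }

    public static void main(String[] args) {
        int a = (int) (7.5 * (1 << RES));
        int b = (int) (2.0 * (1 << RES));
        System.out.println(div(a, b) / (double) (1 << RES)); // 3.75
    }
}
```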
Negative Numbers: Two's Complement
The Two's Complement Algorithm is the method used by most computer platforms to negate integers. The internal binary representation for -1 is the result of applying the Algorithm to +1. This is demonstrated as an 8-bit example in the table below, resulting in -1 being stored as 1111 1111. Now, grab a piece of paper, apply the Two's Complement Algorithm to -1, and examine the outcome. As you see from your result, the Two's Complement Algorithm should be thought of as the process of negation.
| x: | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
|---|---|---|---|---|---|---|---|---|
| ~x: | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| +1: |  |  |  |  |  |  |  | 1 |
| -x: | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
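The same exercise can be carried out in code; this small sketch (names are ours) confirms that complementing and adding one behaves as negation in both directions:

```java
// Two's complement negation: ~x + 1 negates any int.
public class TwosComplement {
    public static void main(String[] args) {
        int x = 1;
        int negated = ~x + 1;                                 // two's complement of +1
        System.out.println(negated);                          // -1
        System.out.println(Integer.toBinaryString(negated));  // thirty-two 1s
        System.out.println(~negated + 1);                     // back to +1
    }
}
```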
Below is a table consisting of the nibble-size binary representations of the integers in the range -2³ ≤ n ≤ 2³ - 1.
| n | Internal Binary Representation |
|---|---|
| 7 | 0111 |
| 6 | 0110 |
| 5 | 0101 |
| 4 | 0100 |
| 3 | 0011 |
| 2 | 0010 |
| 1 | 0001 |
| 0 | 0000 |
| -1 | 1111 |
| -2 | 1110 |
| -3 | 1101 |
| -4 | 1100 |
| -5 | 1011 |
| -6 | 1010 |
| -7 | 1001 |
| -8 | 1000 |