
Floating-Point Representation

Floating-point representation stores real numbers in a fixed number of bits, which makes it suitable for values that are very large or very small. Because only finitely many bit patterns are available, most real numbers are approximated, trading precision against range.

The IEEE 754 Standard

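The IEEE 754 standard is the most common way to represent floating-point numbers. It splits a floating-point number into three components:

Sign (S): 1 bit; 0 for positive, 1 for negative.

Exponent (E): 8 bits in single precision, stored with a bias added.

Mantissa (M): 23 bits in single precision; the fractional bits that follow an implicit leading 1.

These combine according to the formula: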

$$ \text{value} = (-1)^S \times 1.M \times 2^{(E - \text{Bias})} $$

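where Bias is 127 for single precision (32-bit) and $1.M$ is the binary number formed by an implicit leading 1 followed by the mantissa bits. The formula applies to normalized numbers; exponent fields of all zeros or all ones are reserved.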
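The formula can be checked directly against the bits of a stored number. The following is a minimal sketch, assuming Python's standard `struct` module; the helper name `decode_float32` is hypothetical and chosen only for illustration. It extracts S, E, and M from the 32-bit pattern and recombines them, and it deliberately rejects exponent fields of all zeros or all ones, where the formula does not apply.

```python
import struct

def decode_float32(x):
    """Split x (stored as single precision) into S, E, M and rebuild its value.

    Hypothetical helper for illustration; valid for normalized numbers only.
    """
    # Pack as a big-endian float32, then reinterpret the 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, still biased
    mantissa = bits & 0x7FFFFF        # 23 bits after the implicit leading 1
    if exponent in (0, 0xFF):
        raise ValueError("reserved exponent field: the formula does not apply")
    # value = (-1)^S * 1.M * 2^(E - Bias), with Bias = 127
    value = (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
    return sign, exponent, mantissa, value

print(decode_float32(-0.15625))  # (1, 124, 2097152, -0.15625)
```

Here -0.15625 is -1.25 × 2^-3, so the stored exponent is -3 + 127 = 124 and the mantissa field holds 0.25 × 2^23 = 2097152.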

Single-Precision (32-bit) Example

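Let’s break down the number 10.25 to see how it’s represented. In binary, 10.25 is 1010.01, which normalizes to:

$$ 1010.01_2 = 1.01001_2 \times 2^3 $$

The sign bit is 0 because the number is positive. The stored exponent is $3 + 127 = 130$, which is 10000010 in binary. The mantissa is the fraction 01001 padded with zeros to 23 bits. Sign, exponent, and mantissa together give:

$$ 0 | 10000010 | 01001000000000000000000 $$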
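The encoding of 10.25 can be reproduced in a few lines. This is a sketch, assuming Python's standard `struct` module; `float32_fields` is a hypothetical helper name, not part of any library.

```python
import struct

def float32_fields(x):
    """Return the sign, exponent, and mantissa bit strings of x as a float32."""
    # Pack as a big-endian float32 and reinterpret the bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = format(bits, "032b")      # 32-character binary string
    return b[0], b[1:9], b[9:]    # 1 sign bit, 8 exponent bits, 23 mantissa bits

sign, exponent, mantissa = float32_fields(10.25)
print(sign, exponent, mantissa)   # 0 10000010 01001000000000000000000
print(int(exponent, 2) - 127)     # 3, the unbiased exponent
```

Subtracting the bias from the stored exponent recovers the power of two from the normalized form.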

Special Values in Floating-Point Representation

In the IEEE 754 floating-point standard, certain special values are reserved for edge cases such as zero, infinity, and undefined operations. These values help systems represent situations that can’t be expressed as regular floating-point numbers.

Special Values Overview

| Value | Sign Bit (S) | Exponent (E) | Mantissa (M) | Description |
| --- | --- | --- | --- | --- |
| +Zero | 0 | 00000000 | 00000000000000000000000 | Represents positive zero |
| -Zero | 1 | 00000000 | 00000000000000000000000 | Represents negative zero |
| +Infinity | 0 | 11111111 | 00000000000000000000000 | Represents positive infinity |
| -Infinity | 1 | 11111111 | 00000000000000000000000 | Represents negative infinity |
| NaN | 0 or 1 | 11111111 | Non-zero value | Represents “Not a Number”, used for undefined operations |
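The bit patterns of the special values can be inspected the same way. This is a sketch, assuming Python's standard `struct` and `math` modules; `float32_bits` is a hypothetical helper name. No exact output is shown for NaN because its sign and mantissa bits can vary between platforms; only the all-ones exponent and a non-zero mantissa are guaranteed.

```python
import math
import struct

def float32_bits(x):
    """Format x as 'S EEEEEEEE MMM...M' using its single-precision encoding."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = format(bits, "032b")
    return f"{b[0]} {b[1:9]} {b[9:]}"

print(float32_bits(0.0))        # 0 00000000 00000000000000000000000
print(float32_bits(-0.0))       # 1 00000000 00000000000000000000000
print(float32_bits(math.inf))   # 0 11111111 00000000000000000000000
print(float32_bits(-math.inf))  # 1 11111111 00000000000000000000000
print(float32_bits(math.nan))   # exponent all ones, mantissa non-zero

# Behaviors that follow from these encodings:
print(0.0 == -0.0)              # True: the two zeros compare equal
print(math.nan == math.nan)     # False: NaN is not equal to anything, itself included
```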

Try them out here: https://www.h-schmidt.net/FloatConverter/IEEE754.html