Floating-point representation is used to store real numbers, especially when dealing with very large or very small values. It approximates real numbers in a way that balances precision and range.
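The IEEE 754 standard is the most common way to represent floating-point numbers. It splits a floating-point number into three components: a sign bit (S), an exponent (E), and a mantissa (M). In single precision these occupy 1, 8, and 23 bits respectively.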
The formula:
$$\text{value} = (-1)^{S} \times 1.M \times 2^{(E - \text{Bias})}$$
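where Bias is 127 for single-precision (32-bit). The formula applies to normalized numbers, where the stored exponent E is neither all zeros nor all ones.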
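As a minimal sketch of the formula, the Python snippet below pulls S, E, and M out of a 32-bit pattern and evaluates the expression directly. The helper name `decode_float32` is hypothetical, and the sketch assumes a normalized number (stored exponent between 1 and 254); it rejects anything else rather than handling it.

```python
import struct

BIAS = 127  # exponent bias for single precision


def decode_float32(bits: int) -> float:
    """Evaluate (-1)^S * 1.M * 2^(E - Bias) for a 32-bit pattern.

    Hypothetical helper for illustration; normalized numbers only.
    """
    sign = (bits >> 31) & 0x1        # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with the bias added
    mantissa = bits & 0x7FFFFF       # 23 bits, the fraction after "1."
    if not 1 <= exponent <= 254:
        raise ValueError("not a normalized single-precision number")
    significand = 1 + mantissa / 2**23  # restores the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - BIAS)


bits = 0x3FC00000  # 0 | 01111111 | 10000000000000000000000
print(decode_float32(bits))  # 1.5

# Cross-check against the interpretation done by the struct module.
print(struct.unpack(">f", struct.pack(">I", bits))[0])  # 1.5
```

The cross-check at the end reinterprets the same four bytes as a float through `struct`, so both lines should agree for any normalized input.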
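Let’s break down the number 10.25 in binary to see how it’s represented. The integer part 10 is 1010 in binary and the fractional part 0.25 is 0.01, so:
$$1010.01_2 = 1.01001_2 \times 2^{3}$$
The sign bit is 0 (positive), the stored exponent is 3 + 127 = 130 = 10000010, and the mantissa is the fraction 01001 padded with zeros to 23 bits:
0|10000010|01001000000000000000000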
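The bit pattern for 10.25 can be checked with a few lines of Python. This is only a sketch: `float32_fields` is a hypothetical helper name, and it relies on the standard `struct` module to pack the value as a big-endian 32-bit float before splitting the bits into the three fields.

```python
import struct


def float32_fields(x: float) -> str:
    """Return the sign|exponent|mantissa bits of x as a single-precision float."""
    # Pack as a 32-bit float, then reread the same bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = f"{bits:032b}"  # 32 binary digits, zero-padded
    return f"{b[0]}|{b[1:9]}|{b[9:]}"


print(float32_fields(10.25))  # 0|10000010|01001000000000000000000
```

Trying a negative value such as `float32_fields(-10.25)` flips only the leading sign bit and leaves the other two fields unchanged.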
In the IEEE 754 floating-point standard, certain special values are reserved for edge cases such as zero, infinity, and undefined operations. These values help systems represent situations that can’t be expressed as regular floating-point numbers.
Value | Sign Bit (S) | Exponent (E) | Mantissa (M) | Description |
---|---|---|---|---|
+Zero | 0 | 00000000 | 00000000000000000000000 | Represents positive zero |
-Zero | 1 | 00000000 | 00000000000000000000000 | Represents negative zero |
+Infinity | 0 | 11111111 | 00000000000000000000000 | Represents positive infinity |
-Infinity | 1 | 11111111 | 00000000000000000000000 | Represents negative infinity |
NaN | 0 or 1 | 11111111 | Non-zero value | Represents “Not a Number”, used for undefined operations |
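The rows of the table can be reproduced in Python by building each bit pattern and inspecting the result. The sketch below is illustrative only: `from_bits` and `classify` are hypothetical helper names, and the specific NaN pattern `0x7FC00000` is just one of many non-zero mantissas that encode NaN.

```python
import math
import struct


def from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def classify(bits: int) -> str:
    """Name the special value a 32-bit pattern encodes, following the table."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:  # all ones: infinity or NaN
        if mantissa != 0:
            return "NaN"
        return "-Infinity" if sign else "+Infinity"
    if exponent == 0 and mantissa == 0:  # all zeros: signed zero
        return "-Zero" if sign else "+Zero"
    return "other finite value"


for pattern in (0x00000000, 0x80000000, 0x7F800000, 0xFF800000, 0x7FC00000):
    print(f"{pattern:#010x}", classify(pattern), from_bits(pattern))

# Undefined operations produce NaN, and NaN never compares equal to itself.
nan = math.inf - math.inf
print(math.isnan(nan), nan == nan)  # True False
```

Note that `+Zero` and `-Zero` compare equal in Python even though their sign bits differ; `math.copysign(1, x)` is one way to tell them apart.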
Try them out here: https://www.h-schmidt.net/FloatConverter/IEEE754.html