===== Floating-Point Representation =====

Floating-point representation is used to store real numbers, especially very large or very small values. It approximates a real number with a fixed number of significant bits plus a scaling exponent, balancing precision against range.

==== The IEEE 754 Standard ====

The IEEE 754 standard is the most common way to represent floating-point numbers. It splits a floating-point number into three components:

  * **Sign (S)**: Determines whether the number is positive (0) or negative (1) (1 bit).
  * **Exponent (E)**: Sets the scale of the number as a power of two and is stored with a bias (8 bits for single precision).
  * **Mantissa (M)** (also called the significand or fraction): Holds the bits after the binary point and determines the precision (23 bits for single precision).

For normalized numbers (stored exponent between 1 and 254 in single precision), the value is:

$$ \text{value} = (-1)^S \times 1.M \times 2^{(E - \text{Bias})} $$

where **Bias** is 127 for single precision (32-bit). The leading 1 in 1.M is implicit and is not stored.

==== Single-Precision (32-bit) Example ====

Let’s break down the number **10.25** to see how it is represented:

  * **Step 1**: Convert **10.25** to binary:
    * Integer part: **10** in binary is **1010**.
    * Fraction part: **0.25** (= 1/4) is **0.01** in binary.
    * So **10.25** in binary is **1010.01**.
  * **Step 2**: Normalize it into binary scientific notation by moving the binary point three places to the left:

$$ 1010.01 = 1.01001 \times 2^3 $$

  * **Step 3**: Identify the components:
    * **Sign (S)**: 0 (positive)
    * **Exponent (E)**: Add the bias (127) to the actual exponent (3), so **E = 3 + 127 = 130**. In binary: **10000010**
    * **Mantissa (M)**: Drop the implicit leading 1 from **1.01001**, leaving **01001**, then pad with trailing zeros to 23 bits: **01001000000000000000000**
  * **Step 4**: Combine them (sign | exponent | mantissa):

$$ 0 \mid 10000010 \mid 01001000000000000000000 $$

As a 32-bit word this is 0x41240000.

==== Special Values in Floating-Point Representation ====

In the IEEE 754 floating-point standard, certain bit patterns are reserved for //edge cases// such as **zero**, **infinity**, and the results of **undefined operations**. They use exponent fields outside the normalized range (all zeros or all ones), letting systems represent situations that cannot be expressed as ordinary normalized numbers.
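The field layout, the 10.25 walkthrough and the reserved exponent patterns can be checked with a short Python sketch. It assumes Python 3 and uses the standard `struct` module to reinterpret a value as a big-endian 32-bit pattern; the helper names `float32_fields`, `classify` and `decode_normal` are hypothetical, chosen only for this illustration. Python floats are 64-bit doubles, so `struct.pack(">f", ...)` rounds the value to single precision first. An all-zeros exponent with a non-zero mantissa encodes a subnormal number, a case this section does not cover, so the sketch only labels it.

```python
import struct


def float32_fields(x: float) -> tuple:
    """Split x (rounded to single precision) into (sign, exponent, mantissa)."""
    # Encode as a big-endian IEEE 754 single, then reread the 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF      # 23 bits; the leading 1 is implicit
    return sign, exponent, mantissa


def classify(sign: int, exponent: int, mantissa: int) -> str:
    """Name the category encoded by single-precision fields."""
    if exponent == 0xFF:  # all ones: infinity or NaN
        if mantissa != 0:
            return "NaN"
        return "-Infinity" if sign else "+Infinity"
    if exponent == 0:  # all zeros: zero (or a subnormal, not covered here)
        if mantissa == 0:
            return "-Zero" if sign else "+Zero"
        return "subnormal"
    return "normal"


def decode_normal(sign: int, exponent: int, mantissa: int) -> float:
    """Apply value = (-1)^S * 1.M * 2^(E - 127); valid only for normal numbers."""
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)


s, e, m = float32_fields(10.25)
print(s, format(e, "08b"), format(m, "023b"))  # 0 10000010 01001000000000000000000
print(classify(s, e, m), decode_normal(s, e, m))  # normal 10.25
```

Masking and shifting an unsigned integer keeps each field independent, which mirrors how the three components are laid out side by side in the 32-bit word.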
=== Special Values Overview ===

  * **Zero**: Represents positive or negative zero.
  * **Infinity**: Represents positive or negative infinity, resulting from overflow or from dividing a nonzero number by zero.
  * **NaN (Not a Number)**: Represents undefined results, such as **0/0** or **sqrt(-1)**.

^ Value     ^ Sign Bit (S) ^ Exponent (E) ^ Mantissa (M)            ^ Description ^
| +Zero     | 0            | 00000000     | 00000000000000000000000 | Represents positive zero |
| -Zero     | 1            | 00000000     | 00000000000000000000000 | Represents negative zero |
| +Infinity | 0            | 11111111     | 00000000000000000000000 | Represents positive infinity |
| -Infinity | 1            | 11111111     | 00000000000000000000000 | Represents negative infinity |
| NaN       | 0 or 1       | 11111111     | Any non-zero value      | Represents "Not a Number", used for undefined operations |

Try them out here: https://www.h-schmidt.net/FloatConverter/IEEE754.html
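To see where these values come from in practice, the following sketch produces each one through ordinary arithmetic and prints its single-precision bit pattern (sign | exponent | mantissa) for comparison with the table. It assumes Python 3; `float32_bits` is a hypothetical helper. Python raises `ZeroDivisionError` for float division by zero instead of returning infinity, so the sketch uses overflow and `inf - inf` instead.

```python
import math
import struct


def float32_bits(x: float) -> str:
    """Return x's single-precision encoding as 'S|EEEEEEEE|MMMMMMMMMMMMMMMMMMMMMMM'."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = format(bits, "032b")
    return f"{b[0]}|{b[1:9]}|{b[9:]}"


# Python floats are doubles; zero, infinity and NaN keep their meaning
# when re-encoded as float32 by struct.pack.
big = 1e308
examples = {
    "+Zero": 0.0,
    "-Zero": -0.0,
    "+Infinity": big * 10,       # overflow rounds to +inf (float * does not raise)
    "-Infinity": -big * 10,      # overflow rounds to -inf
    "NaN": math.inf - math.inf,  # undefined operation; sign bit is platform-dependent
}

for name, value in examples.items():
    print(f"{name:>9}  {float32_bits(value)}")
```

The zero and infinity rows should match the table exactly. For NaN, only the all-ones exponent and a non-zero mantissa are guaranteed; the sign bit and the exact mantissa bits may differ between platforms, which is consistent with the table listing the sign as 0 or 1.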