An Introduction to the IEEE 754 Standard for Binary Floating-Point Arithmetic
Single-precision 32-bit
A single-precision binary floating-point number is stored in 32 bits.
Bit values for the IEEE 754 32-bit float 0.15625
The exponent is biased by 2^(8−1) − 1 = 127 in this case (exponents in the range −126 to +127 are representable; see the above explanation to understand why biasing is done). An exponent of
For normalised numbers, the most common, exponent
The number has value v:
v = s × 2^e × m
Where
s = +1 (positive numbers) when the sign bit is 0
s = −1 (negative numbers) when the sign bit is 1
e = Exp − 127 (in other words the exponent is stored with 127 added to it, also called "biased with 127")
m = 1.fraction in binary (that is, the significand is the binary number 1 followed by the radix point followed by the binary bits of the fraction). Therefore, 1 ≤ m < 2.
In the example shown above, the sign is zero, the exponent is −3, and the significand is 1.01 (in binary, which is 1.25 in decimal). The represented number is therefore +1.25 × 2^−3, which is +0.15625.
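The decoding just described can be checked with a minimal Python sketch. The helper name `decode_float32` is hypothetical (it is not part of any library), and the sketch assumes the input is a normalized number, so it always adds the implicit leading 1.

```python
import struct

def decode_float32(bits: int):
    """Split a 32-bit pattern into its fields and evaluate v = s * 2**e * m.

    Hypothetical helper for illustration; handles normalized numbers only.
    """
    sign_bit = (bits >> 31) & 0x1      # 1 bit
    exp = (bits >> 23) & 0xFF          # 8-bit biased exponent (Exp)
    fraction = bits & 0x7FFFFF         # 23-bit fraction
    s = -1.0 if sign_bit else 1.0
    e = exp - 127                      # remove the bias
    m = 1.0 + fraction / 2**23         # 1.fraction, so 1 <= m < 2
    return sign_bit, exp, fraction, s * 2.0**e * m

# Obtain the bit pattern of 0.15625 as a big-endian 32-bit float.
bits = struct.unpack(">I", struct.pack(">f", 0.15625))[0]
sign_bit, exp, fraction, value = decode_float32(bits)
print(hex(bits), sign_bit, exp - 127, value)
```

Here the stored Exp field is 124, which gives e = 124 − 127 = −3, matching the example.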
Notes:
1. Denormalized numbers are the same except that e = −126 and m is 0.fraction. (e is not −127: the fraction has to be shifted to the right by one more bit in order to include the leading bit, which is not always 1 in this case. This is balanced by incrementing the exponent to −126 for the calculation.)
2. −126 is the smallest exponent for a normalized number.
3. There are two zeroes, +0 (s is 0) and −0 (s is 1).
4. There are two infinities, +∞ (s is 0) and −∞ (s is 1).
5. NaNs may have a sign and a fraction, but these have no meaning other than for diagnostics; the first bit of the fraction is often used to distinguish signaling NaNs from quiet NaNs.
6. NaNs and infinities have all 1s in the Exp field.
7. The positive and negative numbers closest to zero (represented by the denormalized value with all 0s in the Exp field and the binary value 1 in the Fraction field) are
±2^−149 ≈ ±1.4012985 × 10^−45
8. The positive and negative normalized numbers closest to zero (represented with the binary value 1 in the Exp field and 0 in the Fraction field) are
±2^−126 ≈ ±1.175494351 × 10^−38
9. The finite positive and finite negative numbers furthest from zero (represented by the value with 254 in the Exp field and all 1s in the Fraction field) are
±(1 − (1/2)^24) × 2^128 [2] ≈ ±3.4028235 × 10^38
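The special values in the notes above can be reproduced by building each bit pattern directly. This is a small sketch using Python's standard `struct` module; `float32_from_bits` is a hypothetical helper name, and the hexadecimal patterns are derived from the field descriptions in the notes.

```python
import math
import struct

def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit integer as a big-endian IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_denormal = float32_from_bits(0x00000001)  # Exp = 0, Fraction = 1
smallest_normal = float32_from_bits(0x00800000)    # Exp = 1, Fraction = 0
largest_finite = float32_from_bits(0x7F7FFFFF)     # Exp = 254, Fraction all 1s
pos_inf = float32_from_bits(0x7F800000)            # Exp all 1s, Fraction = 0
neg_zero = float32_from_bits(0x80000000)           # sign bit only
quiet_nan = float32_from_bits(0x7FC00000)          # Exp all 1s, first fraction bit set

print(smallest_denormal == 2.0**-149)
print(smallest_normal == 2.0**-126)
print(largest_finite == (1 - 2.0**-24) * 2.0**128)
print(math.isinf(pos_inf), math.copysign(1.0, neg_zero), math.isnan(quiet_nan))
```

Python floats are double precision, so every single-precision value above is held exactly and the comparisons are exact rather than approximate.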
Here is the summary table from the previous section, filled in with 32-bit single-precision examples:
Type | Exponent | Significand | Value |
Zero | 0000 0000 | 000 0000 0000 0000 0000 0000 | 0.0 |
One | 0111 1111 | 000 0000 0000 0000 0000 0000 | 1.0 |
Denormalized number | 0000 0000 | 100 0000 0000 0000 0000 0000 | 5.9×10^−39 |
Large normalized number | 1111 1110 | 111 1111 1111 1111 1111 1111 | 3.4×10^38 |
Small normalized number | 0000 0001 | 000 0000 0000 0000 0000 0000 | 1.18×10^−38 |
Infinity | 1111 1111 | 000 0000 0000 0000 0000 0000 | Infinity |
NaN | 1111 1111 | non zero | NaN |
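The rows of the table amount to a simple decision rule on the Exp and Fraction fields. A minimal sketch of that rule follows; the function name `classify_float32` and the returned labels are hypothetical, chosen only for this illustration.

```python
def classify_float32(bits: int) -> str:
    """Classify a 32-bit pattern by its Exp and Fraction fields (sign ignored)."""
    exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exp == 0:
        # All-zero exponent: zero or a denormalized number.
        return "zero" if fraction == 0 else "denormalized"
    if exp == 0xFF:
        # All-one exponent: infinity or NaN.
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"

# Bit patterns assembled from the table rows (sign bit 0 in every case).
examples = {
    "Zero": 0x00000000,
    "One": 0x3F800000,
    "Denormalized number": 0x00400000,
    "Large normalized number": 0x7F7FFFFF,
    "Small normalized number": 0x00800000,
    "Infinity": 0x7F800000,
}
for name, pattern in examples.items():
    print(name, "->", classify_float32(pattern))
```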
A more complex example
Bit values for the IEEE 754 32-bit float −118.625
Let us encode the decimal number −118.625 using the IEEE 754 system.
1. First we need to get the sign, the exponent and the fraction. Because it is a negative number, the sign is "1".
2. Now, we write the number (without the sign; i.e. unsigned, no two's complement) using binary notation. The result is 1110110.101.
3. Next, let's move the radix point left, leaving only a 1 at its left: 1110110.101 = 1.110110101 × 2^6. This is a normalized floating-point number. The fraction is the part to the right of the radix point, padded with 0s on the right until we get all 23 bits. That is 11011010100000000000000.
4. The exponent is 6, but we need to bias it and convert it to binary (the bias makes every stored exponent a non-negative binary number, with the most negative exponent stored as the smallest value). For the 32-bit IEEE 754 format, the bias is 127, and so 6 + 127 = 133. In binary, this is written as 10000101.
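The four steps above can be followed in a short Python sketch and compared against the encoding produced by the standard `struct` module. The variable names are illustrative only. The sketch assumes the value is exactly representable in single precision (true for −118.625), so it truncates the fraction and does not implement rounding.

```python
import math
import struct

value = -118.625

# Step 1: the sign bit.
sign_bit = 1 if value < 0 else 0

# Steps 2-3: normalize |value| to 1.xxx * 2**exponent.
# math.frexp returns m in [0.5, 1), so rescale to [1, 2).
m, e2 = math.frexp(abs(value))
significand = m * 2            # 1.110110101 in binary
exponent = e2 - 1              # 6

# The fraction is the significand without its leading 1, scaled to 23 bits.
fraction_field = int((significand - 1) * 2**23)

# Step 4: bias the exponent.
exp_field = exponent + 127     # 133

# Assemble the three fields and compare with the library's encoding.
assembled = (sign_bit << 31) | (exp_field << 23) | fraction_field
reference = struct.unpack(">I", struct.pack(">f", value))[0]

print(sign_bit, format(exp_field, "08b"), format(fraction_field, "023b"))
print(hex(assembled), assembled == reference)
```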