fraction

package

v0.0.4 | Published: Oct 18, 2025 | License: MIT | Imports: 2 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

codeberg.org/g2t/goblib

Documentation

Index

func Float32Fractions(f float32) (int32, int32, error)

Functions

func Float32Fractions

func Float32Fractions(f float32) (int32, int32, error)

Float32Fractions calculates the numerator and denominator of a fraction num/den that represents f, using the encoded bits of the float32 number. The bits are laid out as follows: s|hhh hhhh h|nnn nnnn nnnn nnnn nnnn nnnn

sign S: 1 bit s; exponent value E: 8 bits h (bit count r=8); mantissa value M: 23 bits n (bit count p=23)

S = 0 for positive and S = 1 for negative values, so the sign factor (-1)^S is 1 or -1
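A minimal sketch of how S, E and M can be extracted from this bit layout, assuming the standard library's math.Float32bits; the helper name fields is hypothetical and not part of this package:

```go
package main

import (
	"fmt"
	"math"
)

// fields is a hypothetical helper that splits a float32 into the sign bit S,
// the biased exponent E and the mantissa M, following the layout
// s|hhh hhhh h|nnn nnnn nnnn nnnn nnnn nnnn.
func fields(f float32) (sign, biasedExp, mant uint32) {
	bits := math.Float32bits(f)
	sign = bits >> 31               // 1 bit
	biasedExp = (bits >> 23) & 0xFF // 8 bits (r = 8)
	mant = bits & 0x7FFFFF          // 23 bits (p = 23)
	return sign, biasedExp, mant
}

func main() {
	fmt.Println(fields(18.4))  // 0 131 1258291
	fmt.Println(fields(-18.4)) // 1 131 1258291
}
```

For 18.4 the biased exponent is 131, i.e. e = 131 - 127 = 4.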

creating the real mantissa "m" (1 <= m < 2) from M (0 <= M <= 2^p - 1, so 0 <= M/2^p < 1): the 24-bit mantissa is 1nnn nnnn nnnn nnnn nnnn nnnn (the leading 1 is implicit), so m = 1 + M/2^p with 2^p = 2^23 = 8388608

creating the real exponent "e" (-126 <= e <= 127): e = E - B with bias B = 127

f = (-1)^S * m * 2^e
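The three steps above can be checked with a short sketch that rebuilds a normalized float32 from its fields. It assumes 1 <= E <= 254 (no zero, subnormal, Inf or NaN) and uses math.Ldexp for m * 2^e; the helper decode is hypothetical:

```go
package main

import (
	"fmt"
	"math"
)

// decode is a hypothetical helper that rebuilds a normalized float32
// (1 <= E <= 254) from its fields: m = 1 + M/2^23, e = E - 127 and
// f = (-1)^S * m * 2^e.
func decode(f float32) (m float64, e int, back float64) {
	bits := math.Float32bits(f)
	S := bits >> 31
	E := int((bits >> 23) & 0xFF)
	M := bits & 0x7FFFFF
	m = 1 + float64(M)/(1<<23) // 1 <= m < 2, 2^23 = 8388608
	e = E - 127                // bias B = 127
	back = math.Ldexp(m, e)    // m * 2^e, exact for float32 inputs
	if S == 1 {
		back = -back
	}
	return m, e, back
}

func main() {
	for _, f := range []float32{18.4, -0.05} {
		m, e, back := decode(f)
		fmt.Println(m, e, back == float64(f))
	}
}
```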

With a freely chosen scaling exponent k, the target values can be calculated as follows: num = (-1)^S * [2^k + M*2^(k-23)]; den = 2^(k-e)
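Substituting m = 1 + M/2^23 shows why this pair reproduces f for any k (a short check derived from the definitions above, not taken from the package source):

```latex
\frac{\mathrm{num}}{\mathrm{den}}
  = \frac{(-1)^S \left[2^k + M \cdot 2^{k-23}\right]}{2^{k-e}}
  = (-1)^S \, \frac{2^k \left(1 + M/2^{23}\right)}{2^{k-e}}
  = (-1)^S \cdot m \cdot 2^{e} = f
```

So k only scales numerator and denominator by the same power of two; it is chosen to keep both values inside the integer type.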

For fast calculation, use the shift operation: y = a*2^x => y = a<<x; "a" can be 1.
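A small sketch of the shift rule applied to the terms of num and den. How the case k < 23 (a negative shift count for M) is handled is an assumption here: it uses a right shift, which drops low-order bits of M. Both helper names are hypothetical:

```go
package main

import "fmt"

// pow2 computes 2^x as 1<<x, i.e. y = a<<x with a = 1.
func pow2(x uint) int64 { return int64(1) << x }

// scaleM is a hypothetical helper computing M*2^(k-23) with shifts:
// a left shift for k >= 23; for k < 23 a right shift, which drops the
// low-order bits of M (an assumption about how that case is handled).
func scaleM(m int64, k int) int64 {
	if k >= 23 {
		return m << uint(k-23)
	}
	return m >> uint(23-k)
}

func main() {
	fmt.Println(pow2(26))            // 67108864
	fmt.Println(scaleM(1258291, 30)) // 161061248
	fmt.Println(scaleM(5033165, 25)) // 20132660
}
```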

If int32 is a concern for the internal calculation, k must be chosen dynamically: k = 30+e, and if k > 30, then k = 30. This means k = 30 for |f| >= 1 (e >= 0); otherwise (e < 0) k = 0..29 (for -30 <= e <= -1).
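A minimal sketch of this int32-only variant with a dynamic k. It assumes that zero, subnormals, Inf, NaN and exponents outside -30..30 are simply rejected with an error; the name fractionDynamicK and its error handling are hypothetical and not the package's Float32Fractions:

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// fractionDynamicK is a hypothetical sketch (not the package's code) of the
// int32-only variant: k = 30+e capped at 30, so that num < 2^31 and
// den = 2^(k-e) <= 2^30. Zero, subnormals, Inf, NaN and e outside
// -30..30 are simply rejected here.
func fractionDynamicK(f float32) (num, den int32, err error) {
	bits := math.Float32bits(f)
	S := bits >> 31
	E := int32((bits >> 23) & 0xFF)
	M := int32(bits & 0x7FFFFF)
	e := E - 127
	if E == 0 || E == 255 || e < -30 || e > 30 {
		return 0, 0, errors.New("value not handled by this sketch")
	}
	k := 30 + e
	if k > 30 {
		k = 30
	}
	// M*2^(k-23): left shift for k >= 23, otherwise a right shift that
	// drops the low bits of M (assumed handling, may lose accuracy).
	var scaledM int32
	if k >= 23 {
		scaledM = M << uint(k-23)
	} else {
		scaledM = M >> uint(23-k)
	}
	num = int32(1)<<uint(k) + scaledM // at most 2^31 - 2^7
	den = int32(1) << uint(k-e)
	if S == 1 {
		num = -num
	}
	return num, den, nil
}

func main() {
	for _, f := range []float32{18.4, 0.05, -18.4} {
		num, den, err := fractionDynamicK(f)
		fmt.Println(f, num, den, err)
	}
}
```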

example 18.4: e=4, M=1258291, k = 30+e = 34 => k=30; den = 2^(30-4) = 2^26 = 67108864 = 0x04000000; num = 2^30 + M*2^(30-23) = 2^30 + M*2^7 = 1073741824 + 161061248 = 1234803072 = 0x49999980; f = num/den = 18.3999996185

example 0.05: e=-5, M=5033165 (float32 bits 0x3D4CCCCD), k = 30+e = 25; den = 2^(25-(-5)) = 2^30 = 1073741824 = 0x40000000; num = 2^25 + M*2^(25-23) = 2^25 + M*2^2 = 33554432 + 20132660 = 53687092 = 0x03333334; f = num/den = 0.05000000075
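A quick way to check such worked examples is to compare num/den with the float32 value itself. This sketch (the helper exact is hypothetical) also shows that the round-to-nearest float32(0.05) has the bits 0x3D4CCCCD, i.e. M = 5033165; the truncated mantissa 5033164 gives 53687088/2^30, which is the neighboring float32 just below it:

```go
package main

import "fmt"

// exact reports whether num/den equals f exactly. The float64 division is
// exact here because den is a power of two and |num| < 2^53.
func exact(f float32, num, den int64) bool {
	return float64(num)/float64(den) == float64(f)
}

func main() {
	fmt.Println(exact(18.4, 1234803072, 67108864)) // true
	fmt.Println(exact(0.05, 53687092, 1073741824)) // true:  M = 5033165
	fmt.Println(exact(0.05, 53687088, 1073741824)) // false: M = 5033164
}
```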

If a bigger range (e.g. int64) is used for the internal calculation and the values are kept in this wider form until just before returning, some calculations and checks can be simplified or dropped. In particular:
* k can be the constant 30
* a constant can be used for "2^30" instead of calculating it from k
* a constant can be used for "2^(30-23)" instead of calculating it from k
* no case distinction regarding the sign of e is needed at the beginning
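A sketch of the simplifications listed above, assuming int64 as the wider internal type and rejecting values outside -30 <= e <= 30; the name fractionInt64 is hypothetical. It only computes num64 and den64; reducing them to int32 is a separate step:

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// fractionInt64 is a hypothetical sketch of the "bigger range" variant:
// k is the constant 30, 2^30 and 2^(30-23) are constants, and the sign of e
// needs no special handling while num64 and den64 are computed.
func fractionInt64(f float32) (num64, den64 int64, err error) {
	const (
		pow30 = int64(1) << 30 // 2^k with k = 30
		pow7  = int64(1) << 7  // 2^(k-23)
	)
	bits := math.Float32bits(f)
	E := int((bits >> 23) & 0xFF)
	M := int64(bits & 0x7FFFFF)
	e := E - 127
	if E == 0 || E == 255 || e < -30 || e > 30 {
		return 0, 0, errors.New("value not handled by this sketch")
	}
	num64 = pow30 + M*pow7         // always < 2^31
	den64 = int64(1) << uint(30-e) // 2^0 .. 2^60
	if bits>>31 == 1 {
		num64 = -num64
	}
	return num64, den64, nil
}

func main() {
	num64, den64, err := fractionInt64(0.05)
	fmt.Println(num64, den64, err) // 1717986944 34359738368 <nil>
}
```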

valid range of e for int32: -30 <= e <= 30. With k=30 this means (k-e) = 0..30 for |f| >= 1 (e >= 0), otherwise (e < 0) (k-e) = 31..60

if e < 0, "den" = 2^(30-e) exceeds the int32 limit, so "num" and "den" need to be adjusted as follows:
* let den = 2^30
* compensate "num" for the accuracy that "den" has lost, which means dividing num by 2^(-e)
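The adjustment can be sketched as follows. Using division instead of an arithmetic right shift is a design choice of this sketch: Go's integer division truncates toward zero, so positive and negative numerators are treated symmetrically. The helper toInt32 is hypothetical:

```go
package main

import "fmt"

// toInt32 is a hypothetical helper applying the adjustment for e < 0:
// den is fixed at 2^30 and num64 is divided by 2^(-e). Division (rather
// than >>) truncates toward zero, so positive and negative values behave
// symmetrically. For e >= 0 both values already fit into int32.
func toInt32(num64, den64 int64, e int) (num, den int32) {
	if e < 0 {
		return int32(num64 / (int64(1) << uint(-e))), 1 << 30
	}
	return int32(num64), int32(den64)
}

func main() {
	// float32(0.05): e = -5, num64 = 1717986944, den64 = 2^35.
	fmt.Println(toInt32(1717986944, 34359738368, -5)) // 53687092 1073741824
}
```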

example 18.4: e=4, M=1258291, k=30; den = 2^(30-4) = 2^26 = 67108864 = 0x04000000; num = 2^30 + M*2^(30-23) = 2^30 + M*2^7 = 1073741824 + 161061248 = 1234803072 = 0x49999980; f = num/den = 18.3999996185; num and den are guaranteed to be in the int32 range

example 0.05: e=-5, M=5033165 (float32 bits 0x3D4CCCCD), k=30; den64 = 2^(30-(-5)) = 2^35 = 34359738368 = 0x0800000000; num64 = 2^30 + M*2^(30-23) = 2^30 + M*2^7 = 1073741824 + 644245120 = 1717986944 = 0x66666680; f64 = num64/den64 = 0.05000000075; den = 2^30 = 1073741824; num = num64/2^(-(-5)) = num64/2^5 = 1717986944/32 = 53687092 = 0x03333334; f = num/den = 0.05000000075
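Whether dividing num64 by 2^(-e) loses accuracy depends only on its low -e bits. Because num64 = 2^30 + M*2^7 is always a multiple of 2^7, the adjustment is lossless for -7 <= e < 0; for smaller e it depends on the low bits of M. A sketch of that check (the helper lossless is hypothetical):

```go
package main

import "fmt"

// lossless is a hypothetical helper reporting whether the e < 0 adjustment
// (dividing num64 by 2^(-e)) keeps every bit, i.e. the low -e bits of
// num64 are zero. Since num64 = 2^30 + M*2^7, this always holds for
// -7 <= e < 0; for smaller e it depends on the low bits of M.
func lossless(num64 int64, e int) bool {
	if e >= 0 {
		return true // no adjustment needed
	}
	mask := int64(1)<<uint(-e) - 1
	return num64&mask == 0
}

func main() {
	fmt.Println(lossless(1717986944, -5))   // true  (float32(0.05))
	fmt.Println(lossless(1<<30+1<<7, -10)) // false (M = 1, e = -10)
}
```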

special cases:
* E=0, M=0: f=0 => num=0; den=1
* E=0, M>0 (subnormal, very small numbers): |f| < 2^-126; possible results are f=1/MaxInt32 (error=e^-95) or f=0 (error < 2^-126); f=0 is used => num=0; den=1. Please note: with this the sign is lost, but the accuracy is higher.
* E=255 (all bits set), M=0: +/-Inf; but for an int32 numerator, +Inf is for e=31, i.e. E=158; -Inf is for e=
* E=255 (all bits set), M>0: NaN
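The special cases can be told apart from E and M alone. A minimal sketch; the helper classify and its string labels are hypothetical and only illustrate the case split, not the package's error values:

```go
package main

import (
	"fmt"
	"math"
)

// classify is a hypothetical helper sorting a float32 into the special
// cases above, using only the biased exponent E and the mantissa M.
func classify(f float32) string {
	bits := math.Float32bits(f)
	E := (bits >> 23) & 0xFF
	M := bits & 0x7FFFFF
	switch {
	case E == 0 && M == 0:
		return "zero" // num=0, den=1
	case E == 0:
		return "subnormal" // mapped to num=0, den=1 (sign is lost)
	case E == 255 && M == 0:
		return "inf"
	case E == 255:
		return "nan"
	default:
		return "normal"
	}
}

func main() {
	subnormal := math.Float32frombits(1) // E = 0, M = 1
	inf := float32(math.Inf(1))
	nan := float32(math.NaN())
	fmt.Println(classify(0), classify(subnormal), classify(inf), classify(nan), classify(18.4))
	// zero subnormal inf nan normal
}
```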

See: https://de.wikipedia.org/wiki/IEEE_754

The accuracy is very good, similar to calculating with "math/big.Rat", but about 20 times faster (nRF52840: 1.526µs to 4.577µs).
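For comparison, an exact reference fraction can be obtained from the standard library's math/big.Rat (Rat.SetFloat64, Num, Denom). This is only a sketch of such a reference, not the benchmark behind the timings above; the helper name reference is hypothetical:

```go
package main

import (
	"fmt"
	"math/big"
)

// reference returns the exact value of f as a reduced fraction via
// math/big.Rat, as an accuracy reference. Every finite float32 is exactly
// representable as a float64, so no rounding happens; for Inf or NaN,
// SetFloat64 returns nil.
func reference(f float32) *big.Rat {
	return new(big.Rat).SetFloat64(float64(f))
}

func main() {
	fmt.Println(reference(18.4)) // 9646899/524288
	fmt.Println(reference(0.05)) // 13421773/268435456
}
```

Unlike the power-of-two fractions in the examples, big.Rat reduces to lowest terms: 1234803072/67108864 and 9646899/524288 are the same value.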

Other options that were considered: see the function in the test file.

Source Files

fraction.go
