Implement the "unum" representation in D?

Frustrated c1514843 at drdrb.com
Thu Feb 20 21:21:53 PST 2014


On Thursday, 20 February 2014 at 23:41:12 UTC, jerro wrote:
>> We might not need random access though.
>
> Even if you don't need random access, you can't
> store them in a packed way if you want to be able
> to mutate them in place, since mathematical operations
> on them can change the number of bits they take.
>
>> Depends on the application.
>
> I suspect that in most numerical applications
> the number of bits needed to store the numbers
> will keep increasing during the computation until
> it reaches the maximal supported value. That can
> actually happen very fast - a single division
> with a non power of two is enough. Applications
> that could actually benefit from this in any way
> are probably extremely rare. The only one I can
> think of is to use it as a very simple form of
> compression, but I'm sure there are better options
> for that.

Yes, but his ubox method attempts to solve this problem by
keeping only the most accurate answers. By using a "sliding"
floating point you can control the accuracy much better, and by
using the ubox method you can zero in on the most accurate
solutions.

I think a few here are missing the point. First: the unums
self-scale to provide the "fewest bits" representation, a la his
Morse code vs. ASCII example. Not a huge deal; after all, it's
only memory. But by using such representations one can get more
accurate results, because one can better represent values that
are not exactly representable in standard fp.
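The "fewest bits" idea can be sketched as a toy self-describing
encoding (in Python for brevity, even though the thread is about D;
the `encode`/`decode` names and the 4-bit length field are assumptions
of this sketch, not anything from the unum proposal):

```python
# Toy sketch of "fewest bits" storage: each value carries a small
# size field saying how many payload bits follow, so small numbers
# cost fewer total bits than a fixed-width format would.

def encode(value: int) -> str:
    """Encode a non-negative integer as a 4-bit length field plus payload."""
    payload = bin(value)[2:] if value else "0"
    assert len(payload) <= 15, "toy format: payload limited to 15 bits"
    return format(len(payload), "04b") + payload

def decode(bits: str) -> tuple[int, str]:
    """Decode one value from the front; return (value, remaining bits)."""
    n = int(bits[:4], 2)
    return int(bits[4:4 + n], 2), bits[4 + n:]

stream = encode(3) + encode(1000)   # 3 needs 2 payload bits, 1000 needs 10
v1, rest = decode(stream)
v2, _ = decode(rest)
print(v1, v2, len(stream))   # 3 1000 20  (4+2 bits plus 4+10 bits)
```

A fixed 16-bit format would have spent 32 bits on the same pair; the
trade-off, as noted below, is that values can no longer be mutated in
place once packed.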

I think, though, that adding a "repeating" bit would make it even
more accurate, so that repeating decimals that fit within the
maximum number of bits could be represented exactly. E.g., 1/3 =
0.3333... could be represented exactly with such a bit and a
sliding fp type. With proper CPU support one could have
0.3333... * 3 = 1 exactly.
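The claim can be checked with exact rational arithmetic; here
Python's `Fraction` stands in for the proposed "repeating" bit
(illustrative only, since a repeating decimal is just a rational
number):

```python
from fractions import Fraction

# A repeating decimal like 0.3333... is exactly the rational 1/3,
# so any representation that keeps the ratio (or a repeat marker)
# makes 1/3 * 3 == 1 exact, unlike binary floating point.
third = Fraction(1, 3)
print(third * 3 == 1)        # True: exact
print(0.1 + 0.2 == 0.3)      # False: standard fp rounds each term
```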

By having two extra bits one could represent constants to any
degree of accuracy: e.g., one bit pattern says the value is the
ith entry in some list of constants. This would allow very common
irrational constants (e, pi, sqrt(2), etc.) to be used with more
accuracy than normal and handled internally by the CPU. With such
constants one could have sqrt(2)*sqrt(2) = 2 exactly. (Designate
part of the constant table as "squares"; with n index bits one
could have up to 2^n - 1 constants.)
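A rough sketch of the constants-table idea (Python, illustrative;
`Const`, `CONSTANTS`, and `SQUARES` are invented names for this
example): a tagged value holds an index into a table of exact
constants, and multiplication checks whether an exact product is
known before falling back to floats:

```python
import math

# Table of constants and the subset whose squares are known exactly.
CONSTANTS = {"SQRT2": math.sqrt(2), "PI": math.pi, "E": math.e}
SQUARES = {"SQRT2": 2}

class Const:
    """A value that is an index (here, a name) into the constants table."""
    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        # Same "square" constant on both sides: return the exact product
        # instead of a rounded float.
        if (isinstance(other, Const) and other.name == self.name
                and self.name in SQUARES):
            return SQUARES[self.name]
        rhs = CONSTANTS[other.name] if isinstance(other, Const) else other
        return CONSTANTS[self.name] * rhs

print(Const("SQRT2") * Const("SQRT2"))       # 2, exact
print(math.sqrt(2) * math.sqrt(2) == 2.0)    # False with IEEE doubles
```

Hardware could do the same lookup with an n-bit index instead of a
name, which is where the 2^n - 1 limit above comes from.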

The problem I see is that to make it all work requires a lot of
work on the hardware. Without it, there is no real reason to use
it.

One also runs into the problem of optimization. Variable-sized fp
numbers may not be very efficient in terms of memory alignment: a
list of resizable fp values would contain 1-, 2-, 3-, 4-, 5-, ...
byte numbers of unpredictable size.

If you do a calculation on the list and try to reuse the list as
storage, you could end up with a different-sized list, which
could be very inefficient. You'd probably have to allocate a list
of the maximum possible size in the first place to prevent buffer
overflows, which then defeats the purpose in some sense.
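The storage problem can be demonstrated with a toy packing scheme
(Python, and a hypothetical layout: one length byte per value
followed by the minimal payload bytes):

```python
# Packing variable-width values densely means any operation that
# grows a value no longer fits in place: the whole buffer must be
# rewritten, or pre-allocated at the maximum width.

def pack(values):
    """Pack non-negative ints as 1 length byte + minimal big-endian bytes."""
    out = bytearray()
    for v in values:
        b = v.to_bytes((v.bit_length() + 7) // 8 or 1, "big")
        out.append(len(b))
        out += b
    return bytes(out)

nums = [3, 200, 70000]
before = pack(nums)
after = pack([n * 1000 for n in nums])   # results grew in width
print(len(before), len(after))           # 8 12: same list, bigger buffer
```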

In any case, one could have a type where the last 2 bits
designate the representation.

00 - Positive Integer
01 - Floating point
10 - Constant - (value represents an index, possibly virtual,
into a table of constants)
11 - Negative Integer

For floating point values, the third-to-last bit could mark a
repeating decimal, or common repeating decimals could simply be
entries in the constants table (since, chances are, repeated
calculations would not produce repeating decimals).
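A minimal sketch of the 2-bit tag scheme (Python; placing the tag in
the two low bits of a word, and the integer payload above it, is an
assumption made for this illustration):

```python
# The last two bits of a word select the representation, matching
# the table above; only the two integer cases are sketched here.
TAG_POS_INT, TAG_FLOAT, TAG_CONST, TAG_NEG_INT = 0b00, 0b01, 0b10, 0b11

def tag_int(v: int) -> int:
    """Store an integer with its sign encoded in the 2-bit tag."""
    return (abs(v) << 2) | (TAG_NEG_INT if v < 0 else TAG_POS_INT)

def untag(word: int) -> int:
    tag, payload = word & 0b11, word >> 2
    if tag == TAG_POS_INT:
        return payload
    if tag == TAG_NEG_INT:
        return -payload
    raise NotImplementedError("float/const payloads not sketched here")

print(untag(tag_int(42)), untag(tag_int(-7)))   # 42 -7
```

Folding the sign into the tag this way also frees the payload from
needing its own sign bit, at the cost of two bits of magnitude.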



More information about the Digitalmars-d mailing list