Is this a bug or a VERY sneaky case?

WebFreak001 d.forum at webfreak.org
Tue Dec 28 15:25:30 UTC 2021


On Sunday, 26 December 2021 at 06:34:14 UTC, rempas wrote:
> On Saturday, 25 December 2021 at 21:43:39 UTC, Rumbu wrote:
>>
>> It's common sense, log10 means "give me the power of 10 to 
>> obtain n". And we know that 10^x means 1 followed by x zeroes, 
>> hence the maximum width for the number. You add 1 because of 
>> the 1 before the zeroes.
>>
>> So if we take as an example ubyte.max, we have ```log10(255) = 
>> 2.4```. Truncated as int, you get ```2```. Add ```1``` and you 
>> obtain ```3```, the exact length of ```255```.
>>
>> Or you can take 1000. ```log10(1000) = 3```, add ```1```, you 
>> obtain ```4```, the exact length of ```1000```.
>>
> I got that now! You want to replace the static ifs for the enum 
> "buffer_size" (which is a name I'm going to change). However, 
> is the algorithm built-in? If I have to make it, this means 
> that I'll have to spend time finding how to make it and I will 
> also end up with more code I will eliminate. Unless of course 
> there is still something I don't understand...

log10 (or, more correctly for any base, `log(num, base)` or 
`log(num)/log(base)`) is the mathematically correct answer for 
**positive** numbers, but I think that in practice its major 
disadvantage in code like this (performance) outweighs its 
advantage (saving memory by being exact), given that the integers 
here range over at most -2^63..2^64.
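To make the comparison concrete, here is a minimal sketch of the log10 approach next to a plain integer loop. The names `digitsLog10` and `digitsLoop` are made up for illustration and are not from the code under discussion. The log10 version assumes a positive input, and floating-point rounding may make it unreliable for values very close to a power of 10.

```d
import std.math : log10;

/// Decimal digit count via log10, as described above.
/// Assumes n > 0; rounding near powers of 10 is not guaranteed to be exact.
size_t digitsLog10(ulong n)
in (n > 0)
{
    return (cast(size_t) log10(cast(real) n)) + 1;
}

/// Same count using only integer division (no floating point involved).
size_t digitsLoop(ulong n)
{
    size_t count = 1;
    while (n >= 10)
    {
        n /= 10;
        ++count;
    }
    return count;
}

unittest
{
    assert(digitsLog10(255) == 3);  // log10(255) is about 2.4, truncated to 2, plus 1
    assert(digitsLoop(255) == 3);
    assert(digitsLoop(1000) == 4);
}
```

Both need work at runtime for every call, which is the cost the fixed per-type buffer sizes avoid.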

I think your current code is good as it is (for base 10 at 
least). I don't think you will really save any memory by leaving 
out the spare bytes; the malloc itself might add far more overhead. 
There is really no need to overthink this: the `static if` 
cases for the different data types are a good enough tradeoff, 
saving a few bytes of memory with no extra work needed at runtime.

You could improve your code's performance and memory usage more by 
looking into better allocation strategies for the small memory 
blocks you allocate. But you should only really need this when 
your custom to_str function is called a massive number of times. 
(For an int -> string function like this I could imagine it being 
worthwhile in certain scenarios, though.)

For base 2 the biggest number would then be 64 characters, still 
very manageable.
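As a rough sketch of where the 64 comes from: in base 2 every bit becomes one character, so the worst case is just the bit width of the type. The name `maxBinaryDigits` is hypothetical, and the count covers digits only (a sign or a null terminator would come on top).

```d
import std.conv : to;

/// Worst-case number of base-2 digits for an integer type T:
/// one character per bit. Digits only, no sign or terminator.
enum size_t maxBinaryDigits(T) = T.sizeof * 8;

static assert(maxBinaryDigits!ubyte == 8);
static assert(maxBinaryDigits!ulong == 64);

unittest
{
    // Cross-check against Phobos' radix conversion.
    assert(to!string(ulong.max, 2).length == maxBinaryDigits!ulong);
}
```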

Some tips I think I would rather suggest to you based on that 
code: (for style and to avoid bugs, not changing performance or 
memory usage much)

- use [contracts](https://dlang.org/spec/function#contracts) for 
stuff like the base to indicate that only bases 2..16 are 
allowed:
   `char* to_str(T)(T num, u8 base) in(base >= 2 && base <= 16) { ...`
   (nice for documentation, and it catches accidental bugs in 
development; in release builds these checks are omitted, which 
is part of the reason why you should never catch AssertError, 
Error or Throwable!)
- use D's datatypes, not your own ones (if you want others to 
look/work on your code too it's better to use the common names 
for stuff) - but ofc staying consistent across your code is more 
important
- use `is(T == ubyte)` etc. instead of your custom `is_same!(val, 
ubyte)` (same reason as above, people need to read the definition 
of is_same first)
- work with slices, not with pointers (if you plan to use your 
code from D, it's much cleaner and avoids bugs! does not need a 
trailing null terminator and works with @safe code)
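
Put together, the tips above could look roughly like the sketch below. This is not your function, just an illustration under a few assumptions: `toStr` is a made-up name, it only handles the unsigned types, and it returns a GC-allocated `string` instead of a malloc'd `char*`.

```d
/// Hypothetical to_str-like function: D's own type names, `is(T == ...)`
/// in the constraint, an `in` contract for the base, and a slice as result.
string toStr(T)(T num, ubyte base = 10)
if (is(T == ubyte) || is(T == ushort) || is(T == uint) || is(T == ulong))
in (base >= 2 && base <= 16, "base must be in 2..16")
{
    static immutable digits = "0123456789abcdef";
    char[T.sizeof * 8] buf;      // worst case is base 2: one char per bit
    size_t pos = buf.length;
    do
    {
        // Fill the buffer from the end, least significant digit first.
        buf[--pos] = digits[cast(size_t)(num % base)];
        num /= base;
    } while (num != 0);
    return buf[pos .. $].idup;   // copy only the used part into a string
}

unittest
{
    assert(toStr(255u) == "255");
    assert(toStr(cast(ubyte) 255, 2) == "11111111");
    assert(toStr(ulong.max, 16) == "ffffffffffffffff");
}
```

The slice carries its own length, so there is no null terminator to forget, and the contract documents the allowed bases right in the signature.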

