Counting an initialised array, and segments

Mon Jun 26 19:09:24 UTC 2023

On Monday, 26 June 2023 at 12:28:15 UTC, Jonathan M Davis wrote:
> On Monday, June 26, 2023 5:08:06 AM MDT Cecil Ward via 
> Digitalmars-d-learn wrote:
>> On Monday, 26 June 2023 at 08:26:31 UTC, Jonathan M Davis 
>> wrote:
>> > On Sunday, June 25, 2023 4:08:19 PM MDT Cecil Ward via
>> > Digitalmars-d-learn wrote:
>> >> I recently had some problems
>> >>
>> >> dchar[] arr = [ ' ', TAB, CR, LF … ];
>> >>
>> >> and I got errors from the compiler which led to me having to
>> >> count the elements in the initialiser and declare the array
>> >> with an explicit size. I don’t want the array to be mutable,
>> >> so I later added immutable to it, but that didn’t help
>> >> matters. At one point, because the array was quite long, I
>> >> got the arr[ n_elements ] number wrong: it was too small, and
>> >> the remainder of the array was full of 0xffs (or something),
>> >> which was good, as it helped me spot the bug.
>> >>
>> >> Is there any way to get the compiler to count the number of
>> >> elements in the initialiser and set the array to that size?
>> >> And it’s immutable.
>> >
>> > Without seeing the errors, I can't really say what the 
>> > problem was, but most character literals are going to be 
>> > char, not dchar, so you may have had issues related to the 
>> > type that the compiler was inferring for the array literal. 
>> > I don't recall at the moment how exactly the compiler 
>> > decides the type of an array literal when it's given values 
>> > of differing types for the elements.
>> >
>> > Either way, if you want a static array, and you don't want 
>> > to have to count the number of elements, then 
>> > https://dlang.org/phobos/std_array.html#staticArray should 
>> > take care of that problem.
>> >
>> > - Jonathan M Davis
>>
>> Where I used symbolic names, such as TAB, that was defined as
>> an int (or uint):
>> enum TAB = 9;
>> or
>> enum uint TAB = 9;
>> I forget which. So I had at least one item that was typed as
>> something wider than a char.
>>
>> I tried the usual sizeof( arr ) / sizeof dchar, but the compiler
>> wouldn’t have that for some reason, and yes, I know it should be
>> D syntax. God, how I long for C sizeof()!
>
> sizeof is a property in D. So, you can do char.sizeof or 
> varName.sizeof. But regardless, there really is no reason to 
> use sizeof with D arrays under normal circumstances. And in the 
> case of dynamic arrays, sizeof will give you the size of the 
> dynamic array itself, not the slice of memory that it refers 
> to. You're essentially using sizeof on
>
> struct DynamicArray(T)
> {
>     size_t length;
>     T* ptr;
> }
>
> which is not going to tell you anything about the memory it 
> points to. The length property of an array already tells you 
> the length of the array (be it static or dynamic), so using 
> sizeof like you're talking about really does not apply to D.
>
> And I wouldn't advise using uint for a character in D. That's 
> what char, wchar, and dchar are for. Depending on the 
> circumstances, you get implicit conversions between character 
> and integer types, but they are distinct types, and mixing and 
> matching them willy-nilly could result in compilation errors 
> depending on what your code is doing.
>
> - Jonathan M Davis
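
For reference, here is a minimal sketch of the staticArray suggestion, assuming the separators are space, tab, CR and LF. The names SPACE, TAB, CR, LF, separators and separatorsSlice are hypothetical, and the symbolic names are declared as dchar rather than int or uint. staticArray (from std.array) lets the compiler count the initialiser, and the asserts illustrate the .length versus .sizeof difference described above.

```d
import std.array : staticArray;
import std.stdio : writeln;

// Hypothetical symbolic names, typed as dchar rather than int or uint.
enum dchar SPACE = ' ';
enum dchar TAB   = '\t';
enum dchar CR    = '\r';
enum dchar LF    = '\n';

// The compiler counts the initialiser: the type is immutable(dchar[4]).
immutable separators = [SPACE, TAB, CR, LF].staticArray;

// Alternative with no fixed size at all: an immutable slice.
immutable dchar[] separatorsSlice = [SPACE, TAB, CR, LF];

void main()
{
    // .length gives the element count for static and dynamic arrays alike.
    assert(separators.length == 4);
    assert(separatorsSlice.length == 4);

    // For a static array, .sizeof covers the elements themselves...
    static assert(separators.sizeof == 4 * dchar.sizeof);
    // ...but for a slice it is only the (length, ptr) pair.
    static assert(separatorsSlice.sizeof == 2 * size_t.sizeof);

    // staticArray!dchar widens plain char literals to dchar
    // while still letting the compiler count them.
    auto widened = [' ', '\t', '\r', '\n'].staticArray!dchar;
    static assert(is(typeof(widened) == dchar[4]));
    assert(widened == separators);

    writeln("separators: ", separators.length);
}
```

If a fixed size isn't actually needed, the immutable slice avoids counting entirely, and .length still reports the number of elements.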

No, point taken, it was a sloppy example. I don’t in fact do that in
the real code; I use dchar everywhere appropriate instead of uint. In
fact I have aliases for dstring and dchar, and I successfully did an
alternative build with the aliases changed to use 16-bit wchar /
wstring instead of the 32-bit types, rebuilt, and all was well, just
to test that the code is independent of the code word size.

I would need to do something different, though, if I ever decided to
switch to 16-bit code words in memory, because I would still want to
manipulate 32-bit values for code points while they are held in
registers, for efficiency as well as correctness: 16-bit ‘partial
words’ are bad news for performance on x86-64. I perhaps ought to
introduce a new alias called codepoint, which is always 32 bits, to
distinguish 32-bit values in registers from code words in memory.

It turns out that I can get away with not caring about UTF-16, as I’m
merely _scanning_ a string. I could never get away with changing the
in-memory code word type to 8-bit chars and using UTF-8, though, as I
do occasionally deal with non-ASCII characters, and I would have to
either pre-convert the UTF-8 (doing the decoding up front) or parse
8-bit code words and handle the decoding myself on the fly, which
would be madness. If I have to handle UTF-8 data, I will just
pre-convert it.
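
A minimal sketch of the codepoint idea and of pre-converting UTF-8, under stated assumptions: the aliases CodeUnit and codepoint and the function countSeparators are hypothetical, not taken from the real code. Characters are stored as CodeUnit (wchar here, switchable to dchar) and widened to a 32-bit codepoint while being examined. Scanning code units without decoding is only safe here because every character searched for is ASCII, and UTF-16 surrogate code units (0xD800 to 0xDFFF) can never be mistaken for ASCII.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical aliases: the in-memory code unit type can be switched,
// while values being worked on are always 32 bits.
alias CodeUnit  = wchar;  // storage type in memory; dchar also works
alias codepoint = dchar;  // always 32 bits

// Hypothetical scanner: counts ASCII separators by looking at code units
// directly, with no UTF-16 decoding.
size_t countSeparators(const(CodeUnit)[] s)
{
    size_t n = 0;
    foreach (CodeUnit unit; s)  // iterates code units, not decoded characters
    {
        codepoint c = unit;     // widen to 32 bits while working with it
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            ++n;
    }
    return n;
}

void main()
{
    // UTF-8 input with one non-ASCII character and one outside the BMP.
    string utf8 = "a\tb c\r\nd \u00E9 \U0001F600";

    // Pre-convert once, rather than decoding UTF-8 on the fly while scanning.
    auto converted = utf8.to!(immutable(CodeUnit)[]);

    // With CodeUnit = wchar this prints: 13 code units, 6 separators
    writeln(converted.length, " code units, ",
            countSeparators(converted), " separators");
}
```

Note that the foreach loop variable is declared as CodeUnit: declaring it as dchar instead would make D decode the string while iterating.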