Counting an initialised array, and segments
Cecil Ward
cecil at cecilward.com
Mon Jun 26 19:09:24 UTC 2023
On Monday, 26 June 2023 at 12:28:15 UTC, Jonathan M Davis wrote:
> On Monday, June 26, 2023 5:08:06 AM MDT Cecil Ward via
> Digitalmars-d-learn wrote:
>> On Monday, 26 June 2023 at 08:26:31 UTC, Jonathan M Davis
>> wrote:
>> > On Sunday, June 25, 2023 4:08:19 PM MDT Cecil Ward via
>> > Digitalmars-d-learn wrote:
>> >> I recently had some problems
>> >>
>> >> dchar[] arr = [ ' ', TAB, CR, LF … ];
>> >>
>> >> and I got errors from the compiler which led to me having to
>> >> count the elements in the initialiser and declare the array
>> >> with an explicit size. I don’t want the array to be mutable
>> >> so I later added immutable to it, but that didn’t help
>> >> matters. At one point, because the array was quite long, I
>> >> got the arr[ n_elements ] number wrong, it was too small and
>> >> the remainder of the array was full of 0xffs (or something),
>> >> which was good, helped me spot the bug.
>> >>
>> >> Is there any way to get the compiler to count the number of
>> >> elements in the initialiser and set the array to that size?
>> >> And it’s immutable.
>> >
>> > Without seeing the errors, I can't really say what the
>> > problem was, but most character literals are going to be
>> > char, not dchar, so you may have had issues related to the
>> > type that the compiler was inferring for the array literal.
>> > I don't recall at the moment how exactly the compiler
>> > decides the type of an array literal when it's given values
>> > of differing types for the elements.
>> >
>> > Either way, if you want a static array, and you don't want
>> > to have to count the number of elements, then
>> > https://dlang.org/phobos/std_array.html#staticArray should
>> > take care of that problem.
>> >
>> > - Jonathan M Davis
>>
>> Where I used symbolic names, such as TAB, that was defined as
>> an int (or uint)
>> enum TAB = 9;
>> or
>> enum uint TAB = 9;
>> I forget which. So I had at least one item that was typed
>> something wider than a char.
>>
>> I tried the usual sizeof( arr )/ sizeof dchar, compiler
>> wouldn’t have that for some reason, and yes I know it should
>> be D syntax, god how I long for C sizeof()!
>
> sizeof is a property in D. So, you can do char.sizeof or
> varName.sizeof. But regardless, there really is no reason to
> use sizeof with D arrays under normal circumstances. And in the
> case of dynamic arrays, sizeof will give you the size of the
> dynamic array itself, not the slice of memory that it refers
> to. You're essentially using sizeof on
>
> struct DynamicArray(T)
> {
>     size_t length;
>     T* ptr;
> }
>
> which is not going to tell you anything about the memory it
> points to. The length property of an array already tells you
> the length of the array (be it static or dynamic), so using
> sizeof like you're talking about really does not apply to D.
>
> And I wouldn't advise using uint for a character in D. That's
> what char, wchar, and dchar are for. Depending on the
> circumstances, you get implicit conversions between character
> and integer types, but they are distinct types, and mixing and
> matching them willy-nilly could result in compilation errors
> depending on what your code is doing.
>
> - Jonathan M Davis
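
As an illustration of the advice quoted above, here is a minimal sketch of letting the compiler count the elements. It assumes the named characters are declared as dchar (the enum declarations below are a guess at how that might look, not the real code) and uses std.array.staticArray from the link above. It also shows what .length and .sizeof report for a static array compared with a dynamic one.

```d
import std.array : staticArray;

// Assumed declarations: named characters typed as dchar, not int or uint.
enum dchar TAB = '\t';
enum dchar CR = '\r';
enum dchar LF = '\n';

void main()
{
    // Static array: the length is inferred from the initialiser,
    // so nothing has to be counted by hand.
    immutable whitespace = staticArray!dchar([' ', TAB, CR, LF]);
    static assert(is(typeof(whitespace) == immutable(dchar[4])));
    static assert(whitespace.length == 4);

    // For a static array, sizeof covers the elements themselves.
    assert(whitespace.sizeof == 4 * dchar.sizeof);

    // Dynamic array: the compiler also counts the elements, but sizeof
    // only measures the (length, pointer) pair, not the data it refers to.
    immutable dchar[] dyn = [' ', TAB, CR, LF];
    assert(dyn.length == 4);
    assert(dyn.sizeof == 2 * size_t.sizeof);
}
```

If a fixed-size array is not actually required, the plain immutable dchar[] form may be all that is needed, since .length gives the element count in both cases.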
No, point taken, that was a sloppy example. I don’t in fact do
that in the real code; I use dchar everywhere appropriate instead
of uint. In fact I have aliases for dstring and dchar, and I
successfully did an alternative build with the aliases changed to
use 16-bit wchar / wstring instead of 32-bit, and all was well,
just to test that the code is code-word-size independent.

I would need to do something different, though, if I ever decided
to change to 16-bit code words in memory, because I would still
want to manipulate 32-bit values for code points when they are
being handled in registers, for efficiency as well as
correctness, as 16-bit ‘partial words’ are bad news for
performance on x86-64. I perhaps ought to introduce a new alias
called codepoint, which is always 32 bits, to distinguish dchar
in registers from words in memory. It turns out that I can get
away with not caring about UTF-16, as I’m merely _scanning_ a
string.

I couldn’t ever get away with changing the in-memory code word
type to 8-bit chars and using UTF-8, though, as I do occasionally
deal with non-ASCII characters, and I would have to either
preconvert the UTF-8 to do the decoding, or parse 8-bit code
words and handle the decoding myself on the fly, which would be
madness. If I have to handle UTF-8 data I will just preconvert
it.
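
A rough sketch of the alias arrangement described above, to make the idea concrete. The alias names (CodeUnit, Text, CodePoint) and the countWhitespace function are hypothetical, chosen only for illustration; they are not the names in the real code. The sketch assumes std.conv.to is used to preconvert UTF-8 input into whichever in-memory string type is selected.

```d
import std.conv : to;

// Hypothetical alias names, for illustration only.
alias CodeUnit = dchar;              // in-memory element type; could be switched to wchar
alias Text = immutable(CodeUnit)[];  // same as dstring while CodeUnit is dchar
alias CodePoint = dchar;             // always 32 bits, for values held in registers

// Scan the code units one by one, widening each to 32 bits before comparing.
// Naming the element type in foreach keeps this a plain scan with no decoding.
size_t countWhitespace(Text s)
{
    size_t n = 0;
    foreach (CodeUnit u; s)
    {
        CodePoint c = u; // wchar and dchar both convert implicitly to dchar
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            ++n;
    }
    return n;
}

void main()
{
    // UTF-8 input is preconverted once, then scanned in the wider form.
    string utf8 = "a\tb \u00E9\n";
    Text t = utf8.to!Text;
    assert(countWhitespace(t) == 3);
}
```

Changing the CodeUnit alias to wchar should leave the rest of the sketch compiling unchanged, which mirrors the alternative 16-bit build mentioned above, while CodePoint stays at 32 bits either way.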