A couple of questions about arrays and slices

Sat Jun 24 14:43:00 UTC 2023

On Saturday, 24 June 2023 at 12:05:26 UTC, Jonathan M Davis wrote:
> On Saturday, June 24, 2023 1:43:53 AM MDT Cecil Ward via 
> Digitalmars-d-learn wrote:
>> On Saturday, 24 June 2023 at 07:36:26 UTC, Cecil Ward wrote:
>> > [...]
>>
>> I just realised something, your point about altering the table 
>> and having to rehash, is well taken. I hadn’t considered that. 
>> The reason for my foolishness in failing to realise that I’m 
>> asking the impractical is my pattern of usage. I add all the 
>> entries into the mapping table and have no interest in any 
>> lookups until it is fully built. Then a second function starts 
>> to do lookups while the data remains unchanging and that usage 
>> pattern can be guaranteed. I could even idup it if that would 
>> help, as copying < 32 uints wouldn’t take forever. A typical 
>> value would be a mere 5 or less. I only picked 32 to be 
>> completely safely ott.
>
> Well, if the key were a struct or a class, the hashing function 
> would be opHash. For built-in types, the runtime has hashing 
> functions that it uses. Either way, with AAs, you really don't 
> worry about managing the memory, because it's completely 
> outside of your control. You just put the elements in there 
> using their associated keys, and if you want to try to speed it 
> up after you've populated it, you use rehash so that the 
> runtime can try to move the elements around within the 
> container so that lookup speeds will be closer to optimal.
>
> As such, for the most part, when dealing with AAs and worrying 
> about efficiency, the question really becomes whether AAs are 
> the correct solution rather than much of anything having to do 
> with how you manage their memory.
>
> With so few elements, it's also possible that using 
> std.algorithm.searching.find would be faster - e.g. having a 
> dynamic array of strings where the matching int is at the same 
> index in a dynamic array of ints - or you could use 
> std.typecons.Tuple!(string, int)[] with something like 
> arr.find!(a => a[0] == key)() to find the tuple with the int 
> you want.
>
> Simply comparing a small number of strings like that might be 
> faster than what goes on with hashing the string and then 
> finding the corresponding element within the AA - or it might 
> not be. You'd have to test that to know. The AA would 
> definitely be faster with a large number of elements, but with 
> a small number of elements, the algorithmic complexity doesn't 
> really matter, and the extra overhad with the AA lookups could 
> actually mean that the search through the dynamic array is 
> faster even though it's O(n). But you can only know which is 
> faster by testing it out with the actual data that you're 
> dealing with.
>
> Regardless, you need to remember that associative arrays are 
> not arrays in the C sense. Rather, they're hash tables, so they 
> function very differently from dynamic arrays, and the rehash 
> function is the closest that you're going to get to affecting 
> how the elements are laid out internally or how much memory the 
> AA is using.
>
> - Jonathan M Davis

I started out looking into a number of runtime library routines, 
but in the end it seemed quicker to roll my own code for a crude 
recursive descent parser/lexer that parses part of D’s grammar 
for expressions, and (again partial grammar) parser for string 
literal expressions and so on. I find certain special elements 
and execute actions which involve doing the AA lookup and 
replacing variable names with ordinal numbers in decimal in the 
output stream. Admission: The parsing is the thing that has to be 
fast, even though again the size of the D language text is not 
likely to be huge at all. But 40 years ago, I came from a world 
with 2k RAM and 0.9 MHz clock rates so I have developed a habit 
of always thinking about speed before I do anything, needful or 
not, to be honest. I once wrote a program that took 35 mins to 
evaluate 2+2 and print out the answer, so I’m now ashamed of 
writing slow code. Those were bad days, to be honest. 4 GHz+ and 
ILP is nicer.