Either I'm confused or the gc is

Sat Oct 24 14:21:52 UTC 2020

On Friday, 23 October 2020 at 23:41:05 UTC, rikki cattermole 
wrote:
> Thanks!
>
> The problem is not in that block of code I believe.
>
> Now how about bind_text?
>
> Is that still using c variadic args?

No. Read on.

The detective in me couldn't resist (I really need a Peugeot and 
a basset hound) and so I did some more work on this despite not 
really having the time and I have made some progress.

The walk_account_tree function is defined inside the main 
function. Before it is defined,
I prepare all the sql statements needed to do the verification, 
stored in local variables. One of the queries is count_children:

"select count(guid) from accounts where parent_guid = ?1"

That query is run in the section of walk_account_tree that we've 
been looking at:
         // Now do the children of this account
         // First determine how many there are
         bind_text(count_children, 1, account.guid);
         int n_children = one_row(count_children, 
&get_int).integer_value;
         if (n_children > 0)
         {
             Account[] children = new Account[n_children];
             bind_text(find_children, 1, account.guid);

So the result of that query is saved in n_children and determines 
the size of the children array.
Recall that 'Account account' is the first argument to 
walk_account_tree, so n_children is the number of accounts that 
have account.guid,
an account's unique identifier, as its parent. In other words, 
its children.

If there are children, I allocate the children array and run the 
find_children query:

             "select name, guid, ifnull(commodity_guid, ''), flags
                 from accounts where parent_guid = ?1");

The number of rows that come back ought to be same as n_children. 
On a hunch, I put in a
test to check this:

         // Now do the children of this account
         // First determine how many there are
         bind_text(count_children, 1, account.guid);
         int n_children = one_row(count_children, 
&get_int).integer_value;
         if (n_children > 0)
         {
             Account[] children = new Account[n_children];
             bind_text(find_children, 1, account.guid);
             int i = 0;
             while (next_row_available_p(find_children, 
&reset_stmt))
             {
                 children[i].name = 
fromStringz(sqlite3_column_text(find_children, 0)).dup;
                 children[i].path = format("%s:%s", account.path, 
children[i].name);
                children[i].guid = 
fromStringz(sqlite3_column_text(find_children, 1)).dup; // guid
                 children[i].commodity_guid = 
fromStringz(sqlite3_column_text(find_children, 2)).dup; // 
commodity_guid
                 children[i].flags = 
sqlite3_column_int(find_children,
                         3) | account.flags & 
(account_flag_descendents_are_assets
                         | 
account_flag_descendents_are_liabilities | 
account_flag_descendents_are_income
                         | account_flag_descendents_are_expenses | 
account_flag_descendents_are_marketable
                         | 
account_flag_self_and_descendents_are_tax_related
                         | 
account_flag_descendents_need_commodity_link); // flags
                 i = i + 1;
             }
             if (i != n_children)
                 panic(format("walk_account_tree: bad child index, 
%d, %d, %s", i, n_children, account.path));
             foreach (child; children) {
                 walk_account_tree(child, ancestor_flags | 
account.flags);
             }
         }

And, sure enough, the panic happens:
dca at pangloss:~/Software/newcash_d/verifier$ ./verifier 
/tmp/Finances.newcash
walk_account_tree: bad child index, 231, 344, 
:Assets:Investments:Equities and derivatives:Taxable:Donald C. 
Allen 2003 Revocable Trust:TD Ameritrade
dca at pangloss:~/Software/newcash_d/verifier$

Something has caused sqlite to end the return of rows 
prematurely. But next_row_p did not panic,
which it will if the the return code from sqlite3_step, which it 
calls, is other than SQLITE_ROW or SQLITE_DONE.
Since next_row_p just returned 'false', sqlite3_step must have 
returned SQLITE_DONE. sqlite returned 231 of 344 rows but is 
acting like everything is normal.

This explains why my earlier gdb session revealed that the error 
messages the verifier was spewing were due to being fed account 
structs that were all zero.
The recursive calls are being done on all 344 of the elements of 
the children array, but only 231 have data in them.
The rest are zero, as they were when the whole children array was 
allocated and those dataless "children" look bogus to the 
verifier.

Now, I know from prior experiment that if I disable the gc, the 
problem does not occur, so I tried this again with the newly 
installed panic:

dca at pangloss:~/Software/newcash_d/verifier$ ./verifier 
"--DRT-gcopt=disable:1" /tmp/Finances.newcash
dca at pangloss:~/Software/newcash_d/verifier$

This was run with the same executable as the above. No panic.

Next, I put in a call to GC.disable() just before the 'while' and 
a call to GC.enable{} after the loop returns. No panic.

So I moved GC.disable and enable calls into the loop, at first 
with the disable call at the top of the loop, which did not 
panic. I moved it down
one line at a time and when I get to this:

         // Now do the children of this account
         // First determine how many there are
         bind_text(count_children, 1, account.guid);
         int n_children = one_row(count_children, 
&get_int).integer_value;
         if (n_children > 0)
         {
             Account[] children = new Account[n_children];
             bind_text(find_children, 1, account.guid);
             int i = 0;
             while (next_row_available_p(find_children, 
&reset_stmt))
             {
                 children[i].name = 
fromStringz(sqlite3_column_text(find_children, 0)).dup;
                 children[i].path = format("%s:%s", account.path, 
children[i].name);
                 GC.disable();
                children[i].guid = 
fromStringz(sqlite3_column_text(find_children, 1)).dup; // guid
                 children[i].commodity_guid = 
fromStringz(sqlite3_column_text(find_children, 2)).dup; // 
commodity_guid
                 children[i].flags = 
sqlite3_column_int(find_children,
                         3) | account.flags & 
(account_flag_descendents_are_assets
                         | 
account_flag_descendents_are_liabilities | 
account_flag_descendents_are_income
                         | account_flag_descendents_are_expenses | 
account_flag_descendents_are_marketable
                         | 
account_flag_self_and_descendents_are_tax_related
                         | 
account_flag_descendents_need_commodity_link); // flags
                 i = i + 1;
                 GC.enable();
             }
             if (i != n_children)
                 panic(format("walk_account_tree: bad child index, 
%d, %d, %s", i, n_children, account.path));
             foreach (child; children) {
                 walk_account_tree(child, ancestor_flags | 
account.flags);
             }
         }

the panic occurs.

Then I had another of my hunches. I re-wrote the line containing 
the call to format like so:

         // Now do the children of this account
         // First determine how many there are
         bind_text(count_children, 1, account.guid);
         int n_children = one_row(count_children, 
&get_int).integer_value;
         if (n_children > 0)
         {
             Account[] children = new Account[n_children];
             bind_text(find_children, 1, account.guid);
             int i = 0;
             while (next_row_available_p(find_children, 
&reset_stmt))
             {
                 children[i].name = 
fromStringz(sqlite3_column_text(find_children, 0)).dup;
                 size_t n_collections = 
GC.ProfileStats.numCollections;
                 children[i].path = (account.path ~ ":" ~ 
children[i].name);
                 bool collection_occured = 
GC.ProfileStats.numCollections != n_collections;
                children[i].guid = 
fromStringz(sqlite3_column_text(find_children, 1)).dup; // guid
                 children[i].commodity_guid = 
fromStringz(sqlite3_column_text(find_children, 2)).dup; // 
commodity_guid
                 children[i].flags = 
sqlite3_column_int(find_children,
                         3) | account.flags & 
(account_flag_descendents_are_assets
                         | 
account_flag_descendents_are_liabilities | 
account_flag_descendents_are_income
                         | account_flag_descendents_are_expenses | 
account_flag_descendents_are_marketable
                         | 
account_flag_self_and_descendents_are_tax_related
                         | 
account_flag_descendents_need_commodity_link); // flags
                 i = i + 1;
             }
             if (i != n_children)
                 panic(format("walk_account_tree: bad child index, 
%d, %d, %s, %s", i, collection_occurred, n_children, 
account.path));
             foreach (child; children) {
                 walk_account_tree(child, ancestor_flags | 
account.flags);
             }
         }

This does not panic! In theory, this should consume the same 
amount of GC memory as the format call (though I have not looked 
at the format code; there may well
be intermediate allocations as it does its work).

Now I decided to instrument the code so I could determine whether 
GCs were occurring as a result of the call to format:

         // Now do the children of this account
         // First determine how many there are
         bind_text(count_children, 1, account.guid);
         int n_children = one_row(count_children, 
&get_int).integer_value;
         if (n_children > 0)
         {
             size_t collections_before = 0;
             size_t collections_after = 0;
             Account[] children = new Account[n_children];
             bind_text(find_children, 1, account.guid);
             int i = 0;
             while (next_row_available_p(find_children, 
&reset_stmt))
             {
                 children[i].name = 
fromStringz(sqlite3_column_text(find_children, 0)).dup;
                 collections_before += 
GC.profileStats().numCollections;
                 //children[i].path = (account.path ~ ":" ~ 
children[i].name);
                 children[i].path = format("%s:%s", account.path, 
children[i].name);
                 collections_after += 
GC.profileStats().numCollections;
               children[i].guid = 
fromStringz(sqlite3_column_text(find_children, 1)).dup; // guid
                 children[i].commodity_guid = 
fromStringz(sqlite3_column_text(find_children, 2)).dup; // 
commodity_guid
                 children[i].flags = 
sqlite3_column_int(find_children,
                         3) | account.flags & 
(account_flag_descendents_are_assets
                         | 
account_flag_descendents_are_liabilities | 
account_flag_descendents_are_income
                         | account_flag_descendents_are_expenses | 
account_flag_descendents_are_marketable
                         | 
account_flag_self_and_descendents_are_tax_related
                         | 
account_flag_descendents_need_commodity_link); // flags
                 i = i + 1;
             }
             if (i != n_children)
                 panic(format("walk_account_tree: bad child index, 
%d, %d, %s, %d, %d", i, n_children, account.path, 
collections_before, collections_after));
             if (account.path == ":Assets:Investments:Equities and 
derivatives:Taxable:Donald C. Allen 2003 Revocable Trust:TD 
Ameritrade") {
                 writeln(format("walk_account_tree: %d, %d, %s, 
%d, %d", i, n_children, account.path, collections_before, 
collections_after));
             }
             foreach (child; children) {
                 walk_account_tree(child, ancestor_flags | 
account.flags);
             }
         }

Running this produces:
dca at pangloss:~/Software/newcash_d/verifier$ ./verifier 
"--DRT-gcopt=profile:1" /tmp/Finances.newcash
walk_account_tree: bad child index, 231, 344, 
:Assets:Investments:Equities and derivatives:Taxable:Donald C. 
Allen 2003 Revocable Trust:TD Ameritrade, 33, 34
dca at pangloss:~/Software/newcash_d/verifier$

This proves that at least once during the loop that failed, a GC 
occurred as a result of the format call.

Next I tried uncommenting the concatenation line and commenting 
out the format call. That produces:
dca at pangloss:~/Software/newcash_d/verifier$ ./verifier 
"--DRT-gcopt=profile:1" /tmp/Finances.newcash
walk_account_tree: 344, 344, :Assets:Investments:Equities and 
derivatives:Taxable:Donald C. Allen 2003 Revocable Trust:TD 
Ameritrade, 0, 0
         Number of collections:  2
         Total GC prep time:  0 milliseconds
         Total mark time:  0 milliseconds
         Total sweep time:  0 milliseconds
         Max Pause Time:  0 milliseconds
         Grand total GC time:  0 milliseconds
GC summary:    5 MB,    2 GC    0 ms, Pauses    0 ms <    0 ms
dca at pangloss:~/Software/newcash_d/verifier$

So again, using concatenation instead of format fixes the 
problem. And no GCs have occurred after finishing the loop on the 
account at issue.

And one other tidbit. If I add a precision to %s for the path in 
the format call, like so and change the test for the problem 
account by checking for its
guid rather than the truncated path:
         // Now do the children of this account
         // First determine how many there are
         bind_text(count_children, 1, account.guid);
         int n_children = one_row(count_children, 
&get_int).integer_value;
         if (n_children > 0)
         {
             size_t collections_before = 0;
             size_t collections_after = 0;
             Account[] children = new Account[n_children];
             bind_text(find_children, 1, account.guid);
             int i = 0;
             while (next_row_available_p(find_children, 
&reset_stmt))
             {
                 children[i].name = 
fromStringz(sqlite3_column_text(find_children, 0)).dup;
                 collections_before += 
GC.profileStats().numCollections;
                 //children[i].path = (account.path ~ ":" ~ 
children[i].name);
                 children[i].path = format("%s:%.10s", 
account.path, children[i].name);
                 collections_after += 
GC.profileStats().numCollections;
               children[i].guid = 
fromStringz(sqlite3_column_text(find_children, 1)).dup; // guid
                 children[i].commodity_guid = 
fromStringz(sqlite3_column_text(find_children, 2)).dup; // 
commodity_guid
                 children[i].flags = 
sqlite3_column_int(find_children,
                         3) | account.flags & 
(account_flag_descendents_are_assets
                         | 
account_flag_descendents_are_liabilities | 
account_flag_descendents_are_income
                         | account_flag_descendents_are_expenses | 
account_flag_descendents_are_marketable
                         | 
account_flag_self_and_descendents_are_tax_related
                         | 
account_flag_descendents_need_commodity_link); // flags
                 i = i + 1;
             }
             if (i != n_children)
                 panic(format("walk_account_tree: bad child index, 
%d, %d, %s, %d, %d", i, n_children, account.path, 
collections_before, collections_after));
             if (account.guid == 
"c369abf6ec2c7222e3fdd174ce2c0c9a") {
                 writeln(format("walk_account_tree: %d, %d, %s, 
%d, %d", i, n_children, account.path, collections_before, 
collections_after));
             }
             foreach (child; children) {
                 walk_account_tree(child, ancestor_flags | 
account.flags);
             }
         }

I get:
dca at pangloss:~/Software/newcash_d/verifier$ ./verifier 
"--DRT-gcopt=profile:1" /tmp/Finances.newcash
walk_account_tree: 344, 344, :Assets:Investment:Equities 
a:Taxable:Donald C. :TD Ameritr, 0, 0
         Number of collections:  3
         Total GC prep time:  0 milliseconds
         Total mark time:  0 milliseconds
         Total sweep time:  0 milliseconds
         Max Pause Time:  0 milliseconds
         Grand total GC time:  1 milliseconds
GC summary:    5 MB,    3 GC    1 ms, Pauses    1 ms <    0 ms
dca at pangloss:~/Software/newcash_d/verifier$

No panic. So there appears to be a problem in format when dealing 
with strings fed to an unconstrained %s, probably when 
interacting the garbage collector. My guess is that the issue is
in format, not the gc.

That's as far as I've gotten. But I think we know more now than 
before.