three little issues

spir denis.spir at gmail.com
Sun Feb 6 04:11:23 PST 2011


Hello,

Here are three little issues I faced while implementing a lexing toolkit (see 
other post).

1. Regex match

Let us say there are three "natures" or "modes" of lexeme:
* SKIP: not even kept, just matched and dropped (eg optional spacing)
* MARK: kept, but slice is irrelevant data (eg all kinds of punctuation)
* DATA: slice is necessary data (eg constant value or symbol)

For the first two cases, I still need to get the size of the matched slice, to 
advance in the source by the corresponding offset. Is there a way to get this 
information without fetching the slice by calling hit()?
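
To make the three modes concrete, here is a minimal sketch of the advancing logic. It does not use Regex at all: runLength is a hypothetical stand-in for "match and return only the length", and Mode, Lexeme and lex are illustrative names, not the toolkit's actual code.

```d
import std.ascii : isAlphaNum, isWhite;

// All names here are hypothetical; this is not the toolkit's actual code.
enum Mode { SKIP, MARK, DATA }

struct Lexeme {
    Mode mode;
    string slice;   // only filled in for DATA lexemes
}

// Length of the leading run of characters satisfying pred
// (stands in for a regex match that reports only its size).
size_t runLength(alias pred)(string source) {
    size_t n = 0;
    while (n < source.length && pred(source[n]))
        ++n;
    return n;
}

Lexeme[] lex(string source) {
    Lexeme[] lexemes;
    size_t pos = 0;
    while (pos < source.length) {
        string rest = source[pos .. $];

        // SKIP: spacing is matched and dropped; only the length is used.
        size_t n = runLength!isWhite(rest);
        if (n > 0) {
            pos += n;
            continue;
        }

        // DATA: the slice itself is kept (a slice of source, not a copy).
        n = runLength!isAlphaNum(rest);
        if (n > 0) {
            lexemes ~= Lexeme(Mode.DATA, rest[0 .. n]);
            pos += n;
            continue;
        }

        // MARK: any other single character is kept, but its slice is not.
        lexemes ~= Lexeme(Mode.MARK, null);
        pos += 1;
    }
    return lexemes;
}
```

In this sketch only the DATA branch ever builds a slice; the SKIP and MARK branches need nothing but the match length, which is the information I would like to get from Regex.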

Also, I would like to know when Regex.hit() copies or slices.


2. reference escape

This is a little enigma I face somewhere in this module. Say S is a struct:
     ...
     auto s = S(data);
     return &s;
This code is obviously wrong, and the compiler gently warns me about it. But 
the variant below is allowed and, moreover, seems to work fine:
     return &(S(data));
To me, both versions are synonymous. So why does the compiler accept the 
latter, and why does it work? Any later use of the returned struct (recorded in 
an array) should fail miserably with a segfault. (*)
Or is it that the compiler recognises the idiom and implicitly allocates the 
struct outside the local stack?
Example:

struct S { int i; }

S* newS (int i) {
     if (i < 0)
         return null;
//  auto s = S(i);
//  return &s;  // Error: escaping reference to local s
     return &(S(i));
}

unittest {
     int[] ints = [2, -2, 1, -1, 0];
     S[] structs;
     foreach (i ; ints) {
         auto p = newS(i);
         if (p) {
             structs ~= *p;      // explicit deref!
         }
     }
     assert ( structs == [S(2), S(1), S(0)] );   // pass!
}

How can this work?
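
For comparison, here is a minimal sketch of a variant that allocates the struct explicitly on the GC heap with new, so that the returned pointer does not refer to the local stack. The name newHeapS is hypothetical, and I am not claiming that this is what the compiler does with &(S(i)).

```d
struct S { int i; }

// Hypothetical variant of newS: `new` allocates the struct on the GC heap,
// so the pointer remains valid after the function returns.
S* newHeapS (int i) {
     if (i < 0)
         return null;
     return new S(i);
}

unittest {
     int[] ints = [2, -2, 1, -1, 0];
     S[] structs;
     foreach (i ; ints) {
         auto p = newHeapS(i);
         if (p) {
             structs ~= *p;      // copies the heap-allocated struct
         }
     }
     assert ( structs == [S(2), S(1), S(0)] );
}
```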


3. implicit deref

But there is something even more mysterious to me: if I first access the struct 
before recording it, as in:

unittest {
     int[] ints = [2, -2, 1, -1, 0];
     S[] structs;
     foreach (i ; ints) {
         auto p = newS(i);
         if (p) {
             write (p.i,' ');    // implicit deref!
             structs ~= *p;      // explicit deref!
         }
     }
     assert ( structs == [S(2), S(1), S(0)] );   // fails!
}

...then the final assert fails!? But the written i's are correct ("2 1 0").
Worse, if I exchange the two deref lines:

unittest {
     int[] ints = [2, -2, 1, -1, 0];
     S[] structs;
     foreach (i ; ints) {
         auto p = newS(i);
         if (p) {
             structs ~= *p;      // explicit deref!
             write (p.i,' ');    // implicit deref!
         }
     }
     assert ( structs == [S(2), S(1), S(0)] );   // pass!
}

...then the assertion passes, but the written integers are wrong (the output 
looks like either garbage or an address, repeated 3 times, e.g. "134518949 
134518949 134518949"; successive runs consistently produce the same value).


Denis
-- 
_________________
vita es estrany
spir.wikidot.com


