Why use a DFA instead of DIP1000?
Richard (Rikki) Andrew Cattermole
richard at cattermole.co.nz
Tue Sep 16 10:47:03 UTC 2025
On 16/09/2025 9:59 PM, Dennis wrote:
> On Monday, 15 September 2025 at 23:02:25 UTC, Adam Wilson wrote:
>> Here are examples of C#'s DFA in action:
>>
>> (...)
>>
>> There is no rule that states that DFA must be slow as there exist
>> examples disproving the assertion.
>
> When Walter talks about DFA, he refers to the specific algorithm he
> learned in the 1980s and implemented in his backend optimizer, which
> solves data-flow equations in a loop until it converges on a fixed
> point. This algorithm is superlinear, so it is considered 'too slow' for
> the frontend, which aims to analyse n statements in O(n) time.
>
> What Rikki implements is a 4th semantic pass, which analyzes the code
> after the existing 3 linear semantic passes have completed. I don't know
> exactly how it works, but from his communication I gather it's a subset
> of 'full' DFA which trades a bit of accuracy for compile speed.
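For reference, the classical scheme described above re-runs per-block transfer functions, joining predecessor states, until nothing changes. A minimal sketch in D, with hypothetical names rather than anything from dmd's backend:

```d
// Toy fixed-point solver: "facts" here are just "variable may be null" flags.
alias Facts = bool[string];

Facts join(Facts a, Facts b)
{
    Facts r = a.dup;
    foreach (k, v; b)
        r[k] = r.get(k, false) || v;
    return r;
}

struct Block { size_t[] preds; Facts delegate(Facts) transfer; }

Facts[] solve(Block[] blocks)
{
    auto state = new Facts[blocks.length];
    bool changed = true;
    while (changed)   // superlinear: blocks may be revisited many times
    {
        changed = false;
        foreach (i, b; blocks)
        {
            Facts input;
            foreach (p; b.preds)
                input = join(input, state[p]);
            auto output = b.transfer(input);
            if (output != state[i]) { state[i] = output; changed = true; }
        }
    }
    return state;
}
```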
You haven't got my stuff entirely right.
The fast DFA, which is what I'm working on right now, runs in semantic 3.
It's very similar to DIP1000 in that it does a single forward pass with
the statement and expression walkers.
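As a rough illustration (a toy model, not dmd's actual walker API), a single forward pass keeps a per-variable lattice value and updates it as each statement is visited, with no iteration back to a fixed point:

```d
import std.stdio;

enum Nullness { maybeNull, nonNull, unknown }
enum StmtKind { assign, deref }

// Hypothetical toy IR; real statement/expression walkers visit the AST.
struct Stmt { StmtKind kind; string var; Nullness value; }

struct ForwardWalker
{
    Nullness[string] states;   // per-variable lattice value

    void walkStatement(Stmt s)
    {
        final switch (s.kind)
        {
            case StmtKind.assign:
                states[s.var] = s.value;   // an expression walker would compute this
                break;
            case StmtKind.deref:
                if (states.get(s.var, Nullness.unknown) == Nullness.maybeNull)
                    writeln("possible null dereference of ", s.var);
                break;
        }
    }
}

void main()
{
    auto walker = ForwardWalker();
    // One pass, top to bottom.
    foreach (s; [Stmt(StmtKind.assign, "p", Nullness.maybeNull),
                 Stmt(StmtKind.deref, "p", Nullness.unknown)])
        walker.walkStatement(s);
}
```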
The slow DFA, which I want to work on eventually, is indeed more like the
classical DFAs of the '80s and would run in semantic 4. Since it is a
known quantity, and it isn't the default experience that all these
features hinge on having a solution to, it isn't a priority.
Like classical DFAs, the fast DFA does indeed use lattices with join and
meet operators. But I have extra information available, such as depth
checks, write counts, and scope modelling, that classical DFAs do not have.
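For instance, a join at a control-flow merge over a small pointer lattice might look like this (the names are assumptions, not Rikki's actual types):

```d
enum PtrState { bottom, isNull, nonNull, unknown }

// Least upper bound of two facts arriving from different branches.
PtrState join(PtrState a, PtrState b)
{
    if (a == b) return a;
    if (a == PtrState.bottom) return b;
    if (b == PtrState.bottom) return a;
    return PtrState.unknown;   // conflicting facts meet at the top element
}

unittest
{
    // After `if (cond) p = null; else p = &x;` the two branch states join:
    assert(join(PtrState.isNull, PtrState.nonNull) == PtrState.unknown);
}
```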
The fast DFA does something quite interesting: rather than modelling what
is modellable, it models what it can't model, and whatever is left over
is modellable. You need to have an unknown state, with ways of putting
variables into it. This allows me to cull as many false and
perceived-false positives as possible.
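For example (hypothetical user code, my illustration rather than the engine's actual behaviour): once a variable escapes to something the engine can't see into, it is demoted to the unknown state instead of being reported on.

```d
void mystery(ref int* p);   // declaration only; the body isn't visible here

void example()
{
    int* p = null;
    mystery(p);        // p leaves the modellable set: its state becomes unknown
    if (p !is null)
        *p = 1;        // no diagnostic: p is unknown, not provably null
}
```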
Data flow analysis is a derivative of proof assistants. They work on
properties, and sometimes, as in the case of VRP (value range
propagation), they integrate values into their calculations.
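D's frontend already does a little of this: value range propagation lets an int expression narrow to a smaller type when its range provably fits, for example:

```d
void vrpExample(int x)
{
    ubyte b = x & 0xFF;   // accepted: VRP proves the result is in 0 .. 255
    // ubyte c = x;       // rejected: the full int range does not fit in ubyte
}
```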
If it weren't for my zero false & perceived-false positives rule, the
fast DFA engine would be almost as good as a full DFA engine. It's
missing complete interprocedural analysis and single-function cyclicity
handling.
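To make those two gaps concrete, here is a hedged sketch (hypothetical code, my framing) of the cases where such an engine has to stay conservative:

```d
void makeNull(ref int* p) { p = null; }

void needsInterprocedural()
{
    int x;
    int* p = &x;
    makeNull(p);   // without interprocedural analysis the callee's effect on p
    *p = 1;        // is invisible, so this can't be proven to be a null dereference
}

void needsCyclicity(int n)
{
    int* p = new int;
    foreach (i; 0 .. n)
    {
        if (p is null)
            return;   // a single forward pass only sees p as non-null here;
        p = null;     // the fact from this later statement reaches the test
    }                 // above only by iterating the loop body to a fixed point
}
```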
> The C# examples of unused symbols are, from my brief experimentation on
> https://dotnetfiddle.net, even simpler than that. They don't take
> control flow into account at all:
>
> ```C#
> public class Program
> {
>     static int y;
>
>     public static void Main()
>     {
>         int x = 0;
>         if (y < 0)
>             if (y > 0) // this branch is dead code
>                 x++;   // but C# doesn't mark this line as unreachable, or x as unused
>     }
> }
> ```
It does actually.
Godbolt has a control flow graph display for C#.
https://csharp.godbolt.org/z/nMjPqYens
```
Program:Main() (FullOpts):
xor edi, edi
tail.jmp [System.Console:WriteLine(int)]
```
It silently culled the branches. Exactly what DFA should do in this
situation.
> That's not to say it's not useful; I think dmd should compute this
> information similarly and expose it through an LSP interface. However,
> this can be implemented in the existing linear semantic3 pass with a
> simple usage counter/boolean per declaration.
And that would be a rewrite, and I don't mean just of DIP1000.
Doing this stuff as part of the type system isn't the normal way to
handle it, and for good reason. Attaching a DFA engine directly into the
type system today to resolve this is, quite frankly, too late. From what
I've seen that would be a bad direction to go in, and I have no desire
to watch it being tried.