Parsing D Maybe Not Such a Good Idea <_<;

Basile B. via Digitalmars-d digitalmars-d at puremagic.com
Fri Jun 17 01:07:13 PDT 2016


On Thursday, 16 June 2016 at 17:20:39 UTC, cy wrote:
> On Wednesday, 15 June 2016 at 07:16:31 UTC, Basile B. wrote:
>
>> You're right, it's not so simple, and you're also right about 
>> "everything": my "everything" was not used adequately...
>
> Sorry, I don't mean to complain. Actually the work has already 
> all been done, rather elegantly in fact. If libdparse can get 
> through a significant subset of D2 code, I have to say I'm 
> pretty impressed with the project, and can't praise it enough.
>
> https://github.com/Hackerpilot/libdparse // disclaimer: this 
> link not endorsed by the hackerpilot org ltd
>
> It already has a D formatter in it, which dumps (prettified!) D 
> code to any sort of output range, and there's a case in it for 
> every single kind of node in the AST.

Yes, libdparse is the reference, and anyone who has to parse D 
code really should use it. Of all the D libraries, it's the one 
I know best. I use it to build CE's symbol list (it's an AST 
visitor) and to detect "TODO comments" ;)

But sometimes it's too much (speaking for myself here), for 
example when you only need to parse simple constructs. In CE the 
**only** constructs that are parsed directly in the IDE (the two 
other cases mentioned above are handled by external tools) are 
ModuleDeclaration and VersionCondition. For those, libdparse is 
not necessary: they can be detected by hand in the token list.
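For illustration only, here is a minimal Python sketch of that kind of by-hand detection. The lexer, the (kind, text) token tuples and the helper names (lex, strip_comments, module_name, version_identifiers) are hypothetical, not libdparse or CE code. The lexer ignores strings, nested /+ +/ comments and most other D tokens, and module_name assumes no attributes precede `module`.

```python
import re

# Hypothetical, deliberately tiny D lexer: only comments, identifiers, integer
# literals and a few punctuators. Strings, nested /+ +/ comments and most other
# D tokens are NOT handled; a real tool would more likely reuse libdparse's lexer.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<ident>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<int>[0-9]+)
  | (?P<punct>[.;(){}=])
  | (?P<ws>\s+)
""", re.VERBOSE | re.DOTALL)

def lex(source):
    """Return the token list as (kind, text) tuples, whitespace dropped."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def strip_comments(tokens):
    """Drop comment tokens so the checks below see only code tokens."""
    return [t for t in tokens if t[0] != "comment"]

def module_name(tokens):
    """Match `module a.b.c ;` at the start (assumes no leading attributes)."""
    if not tokens or tokens[0] != ("ident", "module"):
        return None
    parts, i = [], 1
    while i + 1 < len(tokens):
        kind, text = tokens[i]
        if kind != "ident":
            return None
        parts.append(text)
        sep = tokens[i + 1][1]
        if sep == ";":
            return ".".join(parts)
        if sep != ".":
            return None
        i += 2
    return None

def version_identifiers(tokens):
    """Collect X from every `version ( X )` token sequence (a VersionCondition)."""
    return [tokens[i + 2][1] for i in range(len(tokens) - 3)
            if tokens[i] == ("ident", "version") and tokens[i + 1][1] == "("
            and tokens[i + 2][0] in ("ident", "int") and tokens[i + 3][1] == ")"]

src = "/* header */ module foo.bar; // trailing\nversion (linux) { } version(Windows) {}"
toks = strip_comments(lex(src))
print(module_name(toks))          # foo.bar
print(version_identifiers(toks))  # ['linux', 'Windows']
print(module_name(lex(src)))      # None: the leading comment hides `module`
```

The last line shows why removing comment tokens first simplifies these hand-written checks: with comments left in, even "is the first token `module`?" stops being a one-line test.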

> (speaking of which, when are we getting static switch 
> statements?)
>
> What I meant by "D is not simple" isn't that I'm up a creek, 
> without a paddle, but that the paddle is really complex, and 
> I'd have no hope of tackling it if it wasn't already done. The 
> complexity of D's syntax is not so much a problem here, as a 
> spectacle.
>
>> It depends on the grammatical construct you want to parse. But 
>> it's already much simpler when the comments are removed 
>> from the lexical token list.
>
> I suppose. What's complicated is the shoving of expressions 
> everywhere, since those spider out to all possible forms of 
> construct. That means the difficulty of parsing does NOT depend 
> on the grammatical construct you want to parse, except for a 
> few, very minor constructs, only the ones that don't even 
> *potentially* include expressions in their grammar.
>
> So, regardless of what you're doing, you pretty much have to 
> handle every single kind of construct,

No, simple constructs can be detected in a token list. But if I 
understand correctly, you started this topic because you wanted 
to detect functionDeclaration, right?
Obviously you need the AST for that. Function declarations can be 
disabled by a versionCondition, enabled by a static if, injected 
by a mixin template, injected by a string mixin... They cannot be 
accurately detected by picking 4 or 5 tokens out of a list.
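To make that concrete, here is a hypothetical naive detector in Python (with its own throwaway lexer; none of these names come from libdparse) that reports NAME for every `TYPE NAME ( ... ) {` pattern in the token list. On a tiny sample it reports a function that version (none) compiles out and misses one injected by a string mixin.

```python
import re

# Hypothetical throwaway lexer: identifiers, "..." strings without escapes,
# and every other non-space character as a one-character "punct" token.
TOKEN_RE = re.compile(r'(?P<ident>[A-Za-z_]\w*)|(?P<string>"[^"]*")|(?P<punct>\S)')

def lex(source):
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]

def naive_function_names(tokens):
    """Report NAME for every `TYPE NAME ( ... ) {` token pattern."""
    names = []
    for i in range(len(tokens) - 2):
        if (tokens[i][0] == "ident" and tokens[i + 1][0] == "ident"
                and tokens[i + 2][1] == "("):
            depth = 0
            for j in range(i + 2, len(tokens)):
                if tokens[j][1] == "(":
                    depth += 1
                elif tokens[j][1] == ")":
                    depth -= 1
                    if depth == 0:  # end of the parameter list
                        if j + 1 < len(tokens) and tokens[j + 1][1] == "{":
                            names.append(tokens[i + 1][1])
                        break
    return names

SOURCE = '''
version (none) { void disabled() {} }
mixin("void injected() {}");
void visible() {}
'''
print(naive_function_names(lex(SOURCE)))
# ['disabled', 'visible']: `disabled` is compiled out, `injected` is missed
```

Getting both cases right needs information that the token pattern does not have: which version identifiers are set, and what the mixin string expands to.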

> but if "handle" means "transform, then output" and you can 
> separate those two steps, then if someone does all the output 
> for you, the "transform" step can be very simple and specific. 
> Not because you can remove the comment nodes, but because you 
> can ignore ALL nodes that you're not interested in transforming.
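A minimal sketch of that separation, assuming a hypothetical toy AST in Python (not libdparse's node classes or its ASTVisitor API): the default Transformer rebuilds every node unchanged, so a transform overrides only the one node kind it cares about, while a separate format_d function handles all the output.

```python
from dataclasses import dataclass, fields, is_dataclass, replace

# Hypothetical toy AST; libdparse's real node classes are far richer.
@dataclass(frozen=True)
class Identifier:
    name: str

@dataclass(frozen=True)
class Call:
    callee: Identifier
    args: tuple  # argument expressions (here: Identifiers)

@dataclass(frozen=True)
class FunctionDeclaration:
    return_type: str
    name: Identifier
    body: tuple  # statements (here: just Calls)

@dataclass(frozen=True)
class Module:
    name: str
    decls: tuple

class Transformer:
    """Rebuilds every node unchanged unless a visit_<NodeType> method exists."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, None)
        return method(node) if method else self.generic_visit(node)

    def generic_visit(self, node):
        if isinstance(node, tuple):
            return tuple(self.visit(child) for child in node)
        if is_dataclass(node):
            return replace(node, **{f.name: self.visit(getattr(node, f.name))
                                    for f in fields(node)})
        return node  # leaves such as str are returned as-is

class Rename(Transformer):
    """The whole 'transform' step: only Identifier nodes are looked at."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Identifier(self, node):
        return Identifier(self.new) if node.name == self.old else node

def format_d(node):
    """The 'output' step: print the toy AST back as D-like source."""
    if isinstance(node, Module):
        return f"module {node.name};\n" + "".join(format_d(d) for d in node.decls)
    if isinstance(node, FunctionDeclaration):
        body = "".join(f"    {format_d(s)};\n" for s in node.body)
        return f"{node.return_type} {format_d(node.name)}()\n{{\n{body}}}\n"
    if isinstance(node, Call):
        args = ", ".join(format_d(a) for a in node.args)
        return f"{format_d(node.callee)}({args})"
    if isinstance(node, Identifier):
        return node.name
    raise TypeError(f"no formatter for {type(node).__name__}")

tree = Module("app", (
    FunctionDeclaration("void", Identifier("main"), (
        Call(Identifier("foo"), ()),
        Call(Identifier("bar"), (Identifier("foo"),)),
    )),
))
print(format_d(Rename("foo", "baz").visit(tree)))
# module app;
# void main()
# {
#     baz();
#     bar(baz);
# }
```

Because the nodes are frozen, the transform returns a new tree and leaves the original untouched; the renaming logic never has to know how Module or FunctionDeclaration are printed.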


