DCT: D compiler as a collection of libraries

Marco Leise Marco.Leise at gmx.de
Sun May 20 10:42:17 PDT 2012


On Sun, 20 May 2012 10:09:34 +0200,
"Roman D. Boiko" <rb at d-coding.com> wrote:

> Could you name a few specific concerns?

Mostly my own gut feeling that things which sound great in my head turn out to bite me in the end: things one just doesn't think of because everyone's horizon is limited, and in my case probably also for lack of experience in writing compilers and tools.
On the other hand, I have had good experiences with working from community feedback, so the rough edges in the design may already be ironed out by now.

> I'm going to pick up several use cases and prioritize them 
> according to my judgement. Feel free to suggest any cases that 
> you think are needed (with motivation). Prioritizing is necessary 
> to define what is out of scope and plan work into milestones, in 
> order to ensure the project is feasible.

There is one feature I remember caused some headaches for Code::Blocks. They used a separate parser instance for every project in the IDE, which meant that all the standard include files were parsed and kept in memory multiple times. When they later switched to a common parser for all projects, they ran into stability issues. If you can arrange it, it would be great for multi-project IDEs to be able to add projects to and remove them from your parser without reparsing Phobos/druntime (which may have its .di files replaced by .d files, judging by the past discussion).
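To make the request concrete, here is a minimal sketch in C of one way a shared parser could avoid reparsing: a single cache of parsed modules, reference-counted by the projects that use them. All names (ModuleCache, cache_acquire, cache_release) are hypothetical and not taken from DCT or Code::Blocks, and the "parsing" is only a stand-in counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: one cache of parsed modules shared by all projects. */
enum { MAX_MODULES = 64, MAX_PATH_LEN = 128 };

typedef struct {
    char path[MAX_PATH_LEN]; /* module path, used as the cache key */
    int  refs;               /* number of open projects using this module */
    int  parse_count;        /* how often the (stand-in) parser ran for it */
} CachedModule;

typedef struct {
    CachedModule modules[MAX_MODULES];
    int count;
} ModuleCache;

static CachedModule *cache_find(ModuleCache *c, const char *path) {
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->modules[i].path, path) == 0)
            return &c->modules[i];
    return NULL;
}

/* Called when a project that imports `path` is added.
   The module is parsed only the first time any project asks for it. */
CachedModule *cache_acquire(ModuleCache *c, const char *path) {
    CachedModule *m = cache_find(c, path);
    if (m == NULL) {
        if (c->count == MAX_MODULES || strlen(path) >= MAX_PATH_LEN)
            return NULL; /* cache full or path too long for this sketch */
        m = &c->modules[c->count++];
        strcpy(m->path, path);
        m->refs = 0;
        m->parse_count = 1; /* stand-in for actually parsing the file */
    }
    m->refs++;
    return m;
}

/* Called when a project is removed. Entries with refs == 0 are kept here,
   so reopening a project does not reparse; a real cache would need some
   eviction policy. */
void cache_release(CachedModule *m) {
    assert(m->refs > 0);
    m->refs--;
}
```

With this layout, two projects that both import the same standard library module receive the same CachedModule, and removing one project only lowers the count.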

C bindings could be an option (as in: the lowest common denominator). They would allow existing tools (written in Java, C#, Python, ...) to use your library.
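As an illustration of what such bindings tend to look like, here is a small sketch of an opaque-handle C interface. The demo_lexer_* names are purely hypothetical (DCT defines no such API), and the body just counts whitespace-separated words as a stand-in for a real lexer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Public part: callers only ever see an opaque pointer, which is what
   foreign-function interfaces in other languages handle most easily. */
typedef struct demo_lexer demo_lexer;

demo_lexer *demo_lexer_create(const char *source, size_t length);
size_t      demo_lexer_token_count(const demo_lexer *lx);
void        demo_lexer_destroy(demo_lexer *lx);

/* Private part: the layout can change without breaking callers. */
struct demo_lexer {
    size_t token_count;
};

demo_lexer *demo_lexer_create(const char *source, size_t length) {
    demo_lexer *lx = malloc(sizeof *lx);
    if (lx == NULL)
        return NULL;
    lx->token_count = 0;
    int in_word = 0;
    /* Stand-in "lexer": count runs of non-whitespace characters. */
    for (size_t i = 0; i < length; i++) {
        int space = isspace((unsigned char)source[i]) != 0;
        if (!space && !in_word)
            lx->token_count++;
        in_word = !space;
    }
    return lx;
}

size_t demo_lexer_token_count(const demo_lexer *lx) {
    return lx->token_count;
}

void demo_lexer_destroy(demo_lexer *lx) {
    free(lx);
}
```

The source is passed as pointer plus length, so the caller does not need to NUL-terminate it, and ownership of the handle is explicit through the create/destroy pair.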

> > Since assembly code is usually small I just preallocate an 
> > array of sourceCode.length tokens and realloc it to the correct 
> > size when I'm done parsing. Nothing pretty, but simple and I am 
> > sure it won't get any faster ;).
> I'm sure it will :) (I'm going to elaborate on this some time 
> later).

I'm curious.
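For reference, the preallocation scheme quoted above could look roughly like the following C sketch. It assumes every token spans at least one character, so the source length is an upper bound on the token count. Token and lex_all are hypothetical names, and splitting on whitespace stands in for real assembly lexing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t start;  /* offset of the token in the source */
    size_t length; /* number of characters in the token */
} Token;

/* Returns a heap array of tokens (caller frees) and stores the count in
   *count_out. Returns NULL when there are no tokens or allocation fails. */
Token *lex_all(const char *src, size_t len, size_t *count_out) {
    *count_out = 0;
    if (len == 0)
        return NULL;

    /* Worst case: one token per source character. */
    Token *tokens = malloc(len * sizeof *tokens);
    if (tokens == NULL)
        return NULL;

    size_t n = 0, i = 0;
    while (i < len) {
        while (i < len && isspace((unsigned char)src[i]))
            i++;
        size_t start = i;
        while (i < len && !isspace((unsigned char)src[i]))
            i++;
        if (i > start) {
            tokens[n].start = start;
            tokens[n].length = i - start;
            n++;
        }
    }

    if (n == 0) {
        free(tokens);
        return NULL;
    }

    /* Shrink to the size actually used; if shrinking fails, the original
       (larger) block is still valid, so keep it. */
    Token *shrunk = realloc(tokens, n * sizeof *tokens);
    *count_out = n;
    return shrunk != NULL ? shrunk : tokens;
}
```

No growth check is needed inside the loop, which is the whole appeal; the cost is a temporarily oversized allocation.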

> There are several EoF conditions: \0, \x1A, __EOF__ and physical 
> EOF. And any loop would need to check for all. Preallocation 
> could eliminate the need to check for physical EoF, but would 
> make it impossible to avoid input string copying and also would 
> not allow passing a function a slice of string to be lexed.

I found that I usually either load from a file into a newly allocated buffer (where a copy occurs only because I forgot about assumeUnique in std.exception), or I am editing the file, in which case I recreate the source string after every keystroke anyway. I can still pass slices of that string to functions, though, so I am not sure what you mean.
It probably doesn't work as well for D as it does for ASM code, but I could also check for \x1A and __EOF__ in the same fashion. (By the way, is it \x1A (substitute, ^Z), or did you mean \x04 (end-of-transmission, ^D)?) The way it works is this: parser states like 'in expression' can safely peek at the next character at any time. If it doesn't match what they expect, they emit an error and drop back to the "surrounding" parser state. The "file" level is the only place where an EOF (which occurs only once per file anyway) is consumed.
In theory one can drop all string length checks and work on char* directly, given a known terminator char that is distinct from anything else in the input.
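A minimal sketch of that scheme in C, under stated assumptions: the buffer ends in '\0', the toy grammar is `number ('+' number)* ';'`, and the names Parser, parse_number, parse_expression and parse_file are made up for illustration. It is not the actual assembler parser described above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *p; /* cursor into a '\0'-terminated buffer */
    int  errors;   /* number of malformed statements seen */
    long sum;      /* sum of all well-formed statements */
} Parser;

/* 'in number' state: peeks freely. The terminator is not a digit, so the
   loop stops on it without any length check. */
static int parse_number(Parser *ps, long *out) {
    if (*ps->p < '0' || *ps->p > '9')
        return 0; /* unexpected character, possibly the terminator */
    long v = 0;
    while (*ps->p >= '0' && *ps->p <= '9')
        v = v * 10 + (*ps->p++ - '0');
    *out = v;
    return 1;
}

/* 'in expression' state: number ('+' number)*. On a mismatch it simply
   returns to the surrounding state without consuming the bad character. */
static int parse_expression(Parser *ps, long *out) {
    long total, term;
    if (!parse_number(ps, &total))
        return 0;
    while (*ps->p == '+') {
        ps->p++;
        if (!parse_number(ps, &term))
            return 0;
        total += term;
    }
    *out = total;
    return 1;
}

/* 'file' state: the only function that tests for the terminator. */
void parse_file(Parser *ps) {
    for (;;) {
        while (*ps->p == ' ' || *ps->p == '\n')
            ps->p++;
        if (*ps->p == '\0')
            return; /* end of input, seen once per file */
        long value;
        if (parse_expression(ps, &value) && *ps->p == ';') {
            ps->p++;
            ps->sum += value;
        } else {
            ps->errors++;
            /* Recover: skip past the next ';' or stop at the terminator. */
            while (*ps->p != ';' && *ps->p != '\0')
                ps->p++;
            if (*ps->p == ';')
                ps->p++;
        }
    }
}
```

The inner states never compare against a length or an end pointer; they only ask whether the next character is one they expect. That works as long as the terminator can never be mistaken for valid input in any state.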

> > ** Errors  **
> > I generally keep the start and end column, in case someone 
> > wants that.
>
> This functionality has been the priority from the beginning. Not 
> implemented yet but designed for. Evaluation of column and line 
> only on demand is caused by the assumption that such information 
> is needed rarely (primarily to display information to the user). 
> My new design also addresses the use cases when it is needed 
> frequently.
>
> [...] 
>
> Incremental changes are the key to efficiency, and I'm going to 
> invest a lot of effort into making them. Also immutability of 
> data structures will enable many optimizations.

No more questions!

-- 
Marco


