Want to help DMD bugfixing? Write a simple utility.

Wed Mar 23 14:16:02 PDT 2011

> On 03/23/2011 09:16 AM, Jonathan M Davis wrote:
> >> On Sun, 20 Mar 2011 07:50:10 -0000, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> >>>> Jonathan M Davis wrote:
> >>>>> On Saturday 19 March 2011 18:04:57 Don wrote:
> >>>>>> Jonathan M Davis wrote:
> >>>>>>> On Saturday 19 March 2011 17:11:56 Don wrote:
> >>>>>>>> Here's the task:
> >>>>>>>> Given a .d source file, strip out all of the unittest {} blocks,
> >>>>>>>> including everything inside them.
> >>>>>>>> Strip out all comments as well.
> >>>>>>>> Print out the resulting file.
> >>>>>>>> 
> >>>>>>>> Motivation: Bug reports frequently come with very large test
> >>>>>>>> cases. Even ones which look small often import from Phobos.
> >>>>>>>> Reducing the test case is the first step in fixing the bug, and
> >>>>>>>> it's frequently ~30% of the total time required. Stripping out
> >>>>>>>> the unit tests is the most time-consuming and error-prone part
> >>>>>>>> of reducing the test case.
> >>>>>>>> 
> >>>>>>>> This should be a good task if you're relatively new to D but would
> >>>>>>>> like to do something really useful.
> >>>>>>> 
> >>>>>>> Unfortunately, to do that 100% correctly, you need to actually
> >>>>>>> have a working D lexer (and possibly parser). You might be able
> >>>>>>> to get something close enough to work in most cases, but it
> >>>>>>> doesn't take all that much to throw off a basic implementation
> >>>>>>> of this sort of thing if you don't lex/parse it with something
> >>>>>>> which properly understands D.
> >>>>>>> 
> >>>>>>> - Jonathan M Davis
> >>>>>> 
> >>>>>> I didn't say it needs 100% accuracy. You can assume, for example,
> >>>>>> that "unittest" always occurs at the start of a line. The only
> >>>>>> other things you need to lex are {}, string literals, and comments.
> >>>>>> 
> >>>>>> BTW, the immediate motivation for this is std.datetime in Phobos.
> >>>>>> The sheer number of unittests in there is an absolute catastrophe
> >>>>>> for tracking down bugs. It makes a tool like this MANDATORY.
> >>>>> 
> >>>>> I tried to create a similar tool before and gave up because I
> >>>>> couldn't make it 100% accurate and was running into problems with
> >>>>> it. If someone wants to take a shot at it though, that's fine.
> >>>>> 
> >>>>> As for the unit tests in std.datetime making it hard to track down
> >>>>> bugs, that only makes sense to me if you're trying to look at the
> >>>>> whole thing at once and track down a compiler bug which happens
> >>>>> _somewhere_ in the code, but you don't know where. Other than a
> >>>>> problem like that, I don't really see how the unit tests get in
> >>>>> the way of tracking down bugs. Is it that you need to compile in a
> >>>>> version of std.datetime which doesn't have any unit tests compiled
> >>>>> in but you still need to compile with -unittest for other stuff?
> >>>> 
> >>>> No. All you know is that there's a bug being triggered somewhere in
> >>>> Phobos (with -unittest). It's probably not in std.datetime.
> >>>> But Phobos is a horrible ball of mud where everything imports
> >>>> everything else, and std.datetime is near the centre of that ball.
> >>>> What you have to do is reduce the amount of code, and especially the
> >>>> number of modules, as rapidly as possible; this means getting rid of
> >>>> imports.
> >>>> 
> >>>> To do this, you need to remove large chunks of code from the files.
> >>>> This is pretty simple; comment out half of the file, if it still
> >>>> works, then delete it. Normally this works well because typically
> >>>> only about a dozen lines are actually being used. After doing this
> >>>> about three or four times it's small enough that you can usually get
> >>>> rid of most of the imports. Unittests foul this up because they use
> >>>> functions/classes from inside the file.
> >>>> 
> >>>> In the case of std.datetime it's even worse because the
> >>>> signal-to-noise ratio is so incredibly poor; it's really difficult to
> >>>> find the few lines of code that are actually being used by other
> >>>> Phobos modules.
> >>>> 
> >>>> My experience (obviously only over the last month or so) has been that
> >>>> if the reduction of a bug is non-obvious, more than 10% of the total
> >>>> time taken to fix that bug is the time taken to cut down std.datetime.
> >>> 
> >>> Hmmm. I really don't know what could be done to fix that (other than
> >>> making it easier to rip out the unittest blocks). And enough of
> >>> std.datetime depends on other parts of std.datetime that trimming it
> >>> down isn't (and can't be) exactly easy. In general, SysTime is the
> >>> most likely type to be used, and it depends on Date, TimeOfDay, and
> >>> DateTime, and all 4 of those depend on most of the free functions in
> >>> the module. It's not exactly designed in a manner which allows you to
> >>> cut out large chunks and still have it compile. And I don't think
> >>> that it _could_ be designed that way and still have the functionality
> >>> that it has.
> >>> 
> >>> I guess that this sort of problem is one that would pop up mainly
> >>> when dealing with compiler bugs. I have a hard time seeing it popping
> >>> up with your typical bug in Phobos itself. So, I guess that this is
> >>> the sort of thing that you'd run into and I likely wouldn't.
> >>> 
> >>> I really don't know how the situation could be improved though other
> >>> than making it easier to cut out the unit tests.
> >> 
> >> I was just thinking .. if we get a list of the symbols the linker is
> >> including, then write an app to take that list, and strip everything
> >> else out of the source .. would that work?  The questions are how hard
> >> it is to get the symbols from the linker and then how hard it is to
> >> match those to source.  IIRC there are functions in Phobos to convert
> >> to/from symbol names, so if the app had sufficient lexing and parsing
> >> capability it could match on those.
> > 
> > That would require a full-blown D lexer and parser.
> > 
> > - Jonathan M Davis
> 
> Why are we talking about having to recreate a full-blown lexer and
> parser when there has to be one that exists for D anyway? This is
> sounding more and more like you're asking the wrong crowd to solve a
> problem. To do it right, the people who have access to the real D lexer
> and parser would need to write this utility, and in some ways, it's
> already written since compiling without the -unittest flag already omits
> all the unittests.
> 
> So I'm a bit confused about two things.
> 
> 1) Why ask the wrong people to write the tool in the first place?
> 2) Why are we the wrong people anyway?

There are tasks for which you need to be able to lex and parse D code. To 100% 
correctly remove unit tests would be one such task. Another would be if you 
want a program to be able to syntax highlight some D code. Currently, as far 
as I know, there are only two lexers and two parsers for D: the C++ front end 
which dmd, gdc, and ldc use and the D front end which ddmd uses and which is 
based on the C++ front end. Both of those are under the GPL (which makes them 
useless for a lot of stuff) and both of them are tied to compilers. Being able 
to lex D code and get the list of tokens in a D program and being able to 
parse D code and get the resultant abstract syntax tree would be very useful 
for a number of programs.

So, while your average program may not care about being able to lex and parse 
D code, there _are_ programs that do, and being able to do so in D would be 
highly valuable for such programs. Previously Walter asked for a volunteer to 
port the lexer from the C++ front end to D under the Boost license to be put 
into Phobos (I volunteered for that and have been working on it off and on, 
slowly making progress on it). Andrei's reaction was that we should have a 
generic lexer which uses generic programming and is not tied to D at all, and 
_that_ is what someone may be working on for the GSoC (there are still solid 
arguments for having a D-specific lexer though, so hopefully we end up with 
both).

Now, for this particular problem, in order to track down certain types of
compiler bugs, Don needs to be able to build with -unittest but not have
irrelevant code compiled in. So, for instance, if he's testing a bug related
to compiling std.file with -unittest and std.file imports std.datetime, he
would want to strip out as much of std.datetime as std.file doesn't need in
order to minimize the code that he has to deal with to find the bug.
std.datetime's unit tests are a prime example of code that would be
unnecessary. So, he wants a tool to strip the unit tests from a file. You
can't use the compiler's lexer or parser to do that without a lot of changes.
To do it 100% correctly, he needs a lexer (and possibly a parser) which can be
used by a utility other than the compiler to read in a source file, strip out
the unit tests, and then write out the file again. However, he's willing to
settle for a utility that _mostly_ works, and you can do that without a
full-blown D lexer or parser.
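
To give an idea of what a "mostly works" utility might look like, here is a
rough sketch. It is written in Python purely to keep it short (the real tool
would presumably be written in D), and the function name
strip_unittests_and_comments is made up for illustration. It follows Don's
simplifications: "unittest" is only recognized when it is the first thing on a
line, and the only things it lexes are braces, comments (//, /* */, and
nesting /+ +/), and the common literal forms ("...", r"...", `...`, '...').
It deliberately ignores token strings (q{...}), delimited strings, and
anything else that would need a real lexer, so it can be fooled.

```python
def strip_unittests_and_comments(src: str) -> str:
    """Approximately remove comments and unittest blocks from D source."""
    n = len(src)

    def is_ident(ch):
        return ch.isalnum() or ch == "_"

    def skip_quoted(i, quote, escapes):
        # Index just past the literal whose opening quote is at src[i].
        j = i + 1
        while j < n and src[j] != quote:
            j += 2 if (escapes and src[j] == "\\") else 1
        return min(j + 1, n)

    def skip_comment(i):
        # Index past a comment starting at i, or i if no comment starts here.
        if src.startswith("//", i):
            j = src.find("\n", i)
            return n if j < 0 else j          # the newline itself is kept
        if src.startswith("/*", i):
            j = src.find("*/", i + 2)
            return n if j < 0 else j + 2
        if src.startswith("/+", i):           # /+ +/ comments nest in D
            depth, j = 1, i + 2
            while j < n and depth:
                if src.startswith("/+", j):
                    depth, j = depth + 1, j + 2
                elif src.startswith("+/", j):
                    depth, j = depth - 1, j + 2
                else:
                    j += 1
            return j
        return i

    def skip_literal(i):
        # Index past a string/char literal starting at i, or i if none.
        c = src[i]
        if c in ('"', "'"):
            return skip_quoted(i, c, True)
        if c == "`":
            return skip_quoted(i, c, False)
        if (c == "r" and src.startswith('"', i + 1)
                and (i == 0 or not is_ident(src[i - 1]))):
            return skip_quoted(i + 1, '"', False)   # r"..." has no escapes
        return i

    def starts_unittest(i):
        # Assumption from the thread: "unittest" is first on its line.
        if not src.startswith("unittest", i):
            return False
        line_start = src.rfind("\n", 0, i) + 1
        end = i + len("unittest")
        return (src[line_start:i].strip() == ""
                and not (end < n and is_ident(src[end])))

    def skip_block(i):
        # Index just past the {...} block that opens at or after i.
        depth = 0
        while i < n:
            j = skip_comment(i)
            if j == i:
                j = skip_literal(i)
            if j != i:
                i = j
                continue
            if src[i] == "{":
                depth += 1
            elif src[i] == "}":
                depth -= 1
                if depth <= 0:
                    return i + 1
            i += 1
        return n

    out, i = [], 0
    while i < n:
        j = skip_comment(i)
        if j != i:
            if not src.startswith("//", i):
                out.append(" ")               # keep neighbouring tokens apart
            i = j
        elif (j := skip_literal(i)) != i:
            out.append(src[i:j])              # literals are copied verbatim
            i = j
        elif starts_unittest(i):
            i = skip_block(i + len("unittest"))
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Calling strip_unittests_and_comments(open("datetime.d").read()) and printing
the result would be the whole utility (the file name is just an example).
Because "unittest" has to be the first thing on its line, something like
version(unittest) is left alone, which is the kind of case that would trip up
a naive search for the word.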

- Jonathan M Davis