[dmd-internals] Rare and pernicious bug in string append
Andrei Alexandrescu
andrei at erdani.com
Tue Mar 16 08:04:49 PDT 2010
This bug ruined a couple of workdays for me. (I'm using dmd 2.042 beta.)
I'd appreciate it very much if people who know the innards of string append
could look into it at their earliest convenience. Currently the safe
version is twice as slow as the fast (buggy) version, so I'm looking at
8 hours instead of 4 for completing an experiment against 5.75 million
HTML files.
The bug is exceedingly rare. It occurs only once every few thousand HTML
files. The failing file comes up after 28,000 files have been processed
successfully.
The code may be further simplified, but not a lot. This is apparently a
low-level bug because small changes in the input or the code make the
bug manifest differently or not at all.
To reproduce: copy untag.d and data.html to an empty directory. Then
compile untag:
$ dmd untag
To run untag without the bug, run:
$ ./untag --bug=0
To run it with bug #1 related to string ~=, run:
$ ./untag --bug=1
You will see:
Invalid UTF sequence: 255
To run it with bug #2 related to string ~, run:
$ ./untag --bug=2
You will see:
Invalid UTF sequence: 252
The three modes should have identical semantics. Characters 255 and
252 are not present in the input file.
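One way to double-check that last claim is to count the offending byte
values in data.html directly. The sketch below assumes that "characters
255 and 252" means the raw bytes 0xFF and 0xFC (neither can occur in
well-formed UTF-8). count_byte is a hypothetical helper name used only
for illustration; it is not part of untag.

```shell
# Hypothetical helper: print how many bytes in file $1 have decimal value $2.
count_byte() {
    # od dumps every byte as an unsigned decimal number, tr puts one
    # number per line, and grep -c -x counts exact whole-line matches.
    # grep exits non-zero when the count is 0, hence the "|| true".
    od -An -v -tu1 "$1" | tr -s ' ' '\n' | grep -c -x "$2" || true
}
```

Running "count_byte data.html 255" and "count_byte data.html 252" should
each print 0 if the input really is free of those bytes.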
Andrei