[dmd-internals] Rare and pernicious bug in string append

Andrei Alexandrescu andrei at erdani.com
Tue Mar 16 08:04:49 PDT 2010


This bug ruined a couple of workdays for me. (I'm using dmd 2.042 beta.) 
I'd appreciate it very much if people who know the innards of string 
append could look into it at their earliest convenience. Currently the 
safe version is twice as slow as the fast (buggy) version, so I'm looking 
at 8 hours instead of 4 hours to complete an experiment against 5.75 
million HTML files.

The bug is exceedingly rare. It occurs only once every few thousand HTML 
files. The failing file comes up after 28,000 files have been processed 
successfully.

The code may be further simplified, but not a lot. This is apparently a 
low-level bug because small changes in the input or the code make the 
bug manifest differently or not at all.

To reproduce: copy untag.d and data.html to an empty directory. Then 
compile untag:

$ dmd untag

To run untag without the bug, run:

$ ./untag --bug=0

To run it with bug #1 related to string ~=, run:

$ ./untag --bug=1

You will see:

Invalid UTF sequence: 255

To run it with bug #2 related to string ~, run:

$ ./untag --bug=2

You will see:

Invalid UTF sequence: 252

The three programs should have identical semantics. Characters 255 and 
252 are not present in the input file.
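
To make the expected equivalence concrete, here is a minimal sketch. It is 
not taken from untag.d. The helper names (copyWithAppendAssign, 
copyWithConcat, hasSuspectBytes) and the sample input are hypothetical, and 
the imports assume a current Phobos rather than the 2.042 beta. It builds 
the same string once with ~= and once with ~, validates both results with 
std.utf.validate, and checks that neither contains byte 252 or 255.

```d
import std.algorithm.searching : canFind;
import std.utf : validate;

// Hypothetical helper: copy `input` one code unit at a time using ~=.
string copyWithAppendAssign(string input)
{
    string result;
    foreach (char c; input)
        result ~= c;
    return result;
}

// Hypothetical helper: the same copy using binary ~ and reassignment.
string copyWithConcat(string input)
{
    string result;
    foreach (char c; input)
        result = result ~ c;
    return result;
}

// True if the bytes contain 252 or 255, the values reported in the errors.
bool hasSuspectBytes(const(ubyte)[] bytes)
{
    return bytes.canFind(cast(ubyte) 252) || bytes.canFind(cast(ubyte) 255);
}

void main()
{
    // Stand-in for the contents of data.html (an assumption, not the real file).
    string input = "<p>caf\u00e9</p>";

    auto a = copyWithAppendAssign(input);
    auto b = copyWithConcat(input);

    validate(a); // throws UTFException if `a` is not well-formed UTF-8
    validate(b);

    assert(a == input);
    assert(b == input);
    assert(!hasSuspectBytes(cast(const(ubyte)[]) a));
    assert(!hasSuspectBytes(cast(const(ubyte)[]) b));
}
```

With a correct runtime, every assertion holds for any valid UTF-8 input. A 
failure in either copy would point at the append path rather than at the 
data.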


Andrei


