DustMite, a D test case minimization tool
Robert Jacques
sandford at jhu.edu
Sun May 22 16:15:49 PDT 2011
On Sun, 22 May 2011 09:40:19 -0400, Vladimir Panteleev
<vladimir at thecybershadow.net> wrote:
> On Sun, 22 May 2011 11:56:33 +0300, KennyTM~ <kennytm at gmail.com> wrote:
>
>> Nice tool! I tried to use it to reduce bug 6044, but encountered 2
>> problems:
>>
>> 1. DustMite will load _all_ files, including the _binary_ ones, which
>> are seldom valid UTF-8, and that causes a UtfException to be thrown
>> from 'save.dump' because 'e.header' contains those invalid
>> characters. (BTW, Andrei, is it really necessary to include the whole
>> invalid string in the exception?!)
>
> The real question here is why would appender validate UTF when appending
> a string to a string? This reduces the complexity of whatever a GC
> allocation COULD be to linear, so for large strings it might be slower
> than appending to an array. The following comment is in Phobos, but I
> don't understand it:
>
> // note, we disable this branch for appending one type of char to
> // another because we can't trust the length portion.
Essentially, this comment is about how you have to decode and then
re-encode any time the character type changes, i.e. the fact that 1 dchar
!= 1 wchar != 1 char in terms of code units. So a 5-dchar string might
require up to 20 chars to represent.
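
A minimal sketch of that length mismatch, assuming a current Phobos where
std.utf.encode writes one code point into a char[4] buffer and returns the
number of code units used. The helper name utf8Length is hypothetical and
only serves the illustration:

```d
import std.stdio : writeln;
import std.utf : encode;

/// Counts the UTF-8 code units (chars) needed to represent a UTF-32 string.
/// utf8Length is a hypothetical helper, not part of Phobos.
size_t utf8Length(const(dchar)[] s)
{
    size_t n = 0;
    char[4] buf;
    foreach (dchar c; s)
        n += encode(buf, c); // encode returns how many chars it wrote
    return n;
}

void main()
{
    // Five ASCII code points: one char each in UTF-8.
    dstring ascii = "hello"d;
    // Five copies of U+1D11E, which lies outside the BMP: four chars each.
    dstring astral = "\U0001D11E\U0001D11E\U0001D11E\U0001D11E\U0001D11E"d;

    writeln(ascii.length, " dchars -> ", utf8Length(ascii), " chars");
    writeln(astral.length, " dchars -> ", utf8Length(astral), " chars");
}
```

Both inputs have the same .length as dstrings, so the source length alone
says nothing reliable about how much room the destination needs.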
As for performance, using appender is never slower than ~=, as it uses
essentially the same code. Furthermore, you cannot actually make appender
use linear allocation, even when you are doing a transcoding operation, as
it always grows by max(needed, newCapacity()), which gives it a roughly
exponential growth rate. Also, if you're concerned about appender
performance, I'd recommend using the patch from Issue 5813.
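
One rough way to observe the growth behaviour is to count how often the
reported capacity changes while appending. This is only a sketch: it
assumes Appender exposes the capacity and data properties as in current
Phobos, countCapacityChanges is a hypothetical helper, and the exact count
depends on the Phobos version and the GC, so no specific number is claimed.

```d
import std.array : appender;
import std.stdio : writeln;

/// Appends n single chars and counts how often the reported capacity
/// changes, as a rough proxy for the number of buffer growth steps.
/// countCapacityChanges is a hypothetical helper, not part of Phobos.
size_t countCapacityChanges(size_t n)
{
    auto app = appender!string();
    size_t changes = 0;
    size_t last = app.capacity;
    foreach (i; 0 .. n)
    {
        app.put('x');
        if (app.capacity != last)
        {
            last = app.capacity;
            ++changes;
        }
    }
    return changes;
}

void main()
{
    enum n = 100_000;
    writeln("appends: ", n, ", capacity changes: ", countCapacityChanges(n));

    // Transcoding on put: a dstring is encoded into the string appender
    // one code point at a time.
    auto app = appender!string();
    app.put("h\u00E9llo"d);
    assert(app.data == "h\u00E9llo");
    assert(app.data.length == 6); // U+00E9 takes 2 UTF-8 code units
}
```

If growth is geometric, the number of capacity changes should stay far
below the number of appends, which is what makes the amortized cost per
append roughly constant.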