RFC: std.json successor

Kiith-Sa via Digitalmars-d digitalmars-d at puremagic.com
Mon Aug 25 16:24:42 PDT 2014


On Monday, 25 August 2014 at 22:40:00 UTC, Ola Fosheim Grøstad 
wrote:
> On Monday, 25 August 2014 at 21:53:50 UTC, Ola Fosheim Grøstad 
> wrote:
>> I presume you can load 16 bytes and do BITWISE-AND on the MSB, 
>> then match against string-end and carefully use this to boost 
>> performance of simultaneous UTF validation, escape-scanning, 
>> and string-end scan. A bit tricky, of course.
>
> I think it is doable and worth it…
>
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/
>
> e.g.:
>
> __mmask16 _mm_cmpeq_epu8_mask (__m128i a, __m128i b)
> __mmask32 _mm256_cmpeq_epu8_mask (__m256i a, __m256i b)
> __mmask64 _mm512_cmpeq_epu8_mask (__m512i a, __m512i b)
> __mmask16 _mm_test_epi8_mask (__m128i a, __m128i b)
> etc.
>
> So you can:
>
> 1. preload registers with "\\\\\\\\…" ,  "\"\"…"  and "\0\0\0…"
> 2. then compare signed/unsigned/equal whatever.
> 3. then load 16,32 or 64 bytes of data and stream until the 
> masks trigger
> 4. test masks
> 5. resolve any potential issues, goto 3

D:YAML uses a similar approach, but with 8 bytes at a time (a 
plain ulong, so it is portable) to count how many ASCII chars 
there are before the first non-ASCII UTF-8 sequence. It 
significantly improves performance: I didn't keep any numbers, 
unfortunately, but it decreases decoding overhead to a fraction 
for most inputs, since YAML (and JSON) files tend to be mostly 
ASCII, with non-ASCII from time to time in strings. If we know 
that e.g. the next 100 chars are plain ASCII, we can use a fast 
path for them and only consider decoding after that.

See the countASCII() function in 
https://github.com/kiith-sa/D-YAML/blob/master/source/dyaml/reader.d
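
To illustrate the idea, here is a minimal sketch of the 
8-bytes-at-a-time check. It is not the actual countASCII() code 
from D:YAML; the name countAsciiPrefix and its signature are 
made up for this example. It only relies on the fact that every 
non-ASCII UTF-8 byte has its most significant bit set.

```d
import core.stdc.string : memcpy;

/// Hypothetical helper (not D:YAML's countASCII): returns the number of
/// leading bytes in `buf` that are plain ASCII (below 0x80).
size_t countAsciiPrefix(const(char)[] buf) @nogc nothrow
{
    // One 0x80 per byte: the MSB of each of the 8 bytes in a ulong.
    enum ulong highBits = 0x8080_8080_8080_8080UL;

    size_t i = 0;
    // Check 8 bytes at a time. If any MSB is set, a non-ASCII byte is
    // somewhere in this chunk, so fall through to the byte-wise loop.
    while (buf.length - i >= 8)
    {
        ulong word;
        // memcpy avoids an unaligned pointer cast; byte order does not
        // matter because we only test whether any MSB is set.
        memcpy(&word, buf.ptr + i, 8);
        if (word & highBits)
            break;
        i += 8;
    }
    // Locate the exact position (and handle the tail) one byte at a time.
    while (i < buf.length && buf[i] < 0x80)
        ++i;
    return i;
}

unittest
{
    assert(countAsciiPrefix("") == 0);
    assert(countAsciiPrefix("hello") == 5);
    // 12 ASCII bytes, then a multi-byte sequence.
    assert(countAsciiPrefix("0123456789ab\u0161cd") == 12);
    // Non-ASCII inside the first 8-byte chunk.
    assert(countAsciiPrefix("asds\u0161\u0161df") == 4);
}
```

A caller would copy or widen the first countAsciiPrefix(buf) 
bytes directly and only run the real UTF-8 decoder from that 
offset on.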

However, this approach is useful only if you decode the whole 
buffer at once, not if you do something like foreach(dchar ch; 
"asdsššdfáľäô") {}, which is the most obvious way to decode in D.

FWIW, decoding _was_ a significant overhead in D:YAML (again, I 
didn't keep numbers, but at one point it was around 10% in the 
profiler), and I didn't like the fact that it prevented making my 
code @nogc. I ended up copying chunks of std.utf and making them 
@nogc nothrow (D:YAML as a whole is not @nogc, but I use @nogc in 
some parts basically as "@noalloc" to ensure I don't allocate 
anything).
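
For readers wondering what a @nogc nothrow decoder can look 
like, here is a rough sketch. It is neither the std.utf code nor 
what D:YAML ships; the name tryDecode and the bool-plus-out 
signature are assumptions made for this example. The point is 
only that reporting errors through a return value instead of an 
exception removes the need to allocate.

```d
/// Hypothetical sketch: decodes one code point of UTF-8 from `s` at `index`.
/// On success stores it in `result`, advances `index`, and returns true.
/// On malformed input returns false and leaves `index` unchanged.
bool tryDecode(const(char)[] s, ref size_t index, out dchar result)
    @nogc nothrow pure @safe
{
    if (index >= s.length)
        return false;

    immutable uint lead = s[index];
    if (lead < 0x80) // ASCII: a single byte
    {
        result = cast(dchar) lead;
        ++index;
        return true;
    }

    size_t extra;  // number of continuation bytes expected
    uint cp;       // code point being assembled
    uint minValue; // smallest value allowed for this length (rejects overlongs)
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; minValue = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minValue = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minValue = 0x1_0000; }
    else
        return false; // stray continuation byte or invalid lead byte

    if (s.length - index <= extra)
        return false; // sequence is truncated

    foreach (k; 1 .. extra + 1)
    {
        immutable uint c = s[index + k];
        if ((c & 0xC0) != 0x80)
            return false; // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }

    // Reject overlong encodings, surrogates, and values above U+10FFFF.
    if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

    result = cast(dchar) cp;
    index += extra + 1;
    return true;
}

unittest
{
    size_t i = 0;
    dchar c;
    assert(tryDecode("a\u0161", i, c) && c == 'a' && i == 1);
    assert(tryDecode("a\u0161", i, c) && c == '\u0161' && i == 3);
    i = 0;
    assert(!tryDecode("\xC5", i, c) && i == 0); // truncated sequence
}
```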

