Of possible interest: fast UTF8 validation
Patrick Schluter
Patrick.Schluter at bbox.fr
Fri May 18 05:31:49 UTC 2018
On Thursday, 17 May 2018 at 23:16:03 UTC, H. S. Teoh wrote:
> On Thu, May 17, 2018 at 07:13:23PM +0000, Patrick Schluter via
> Digitalmars-d wrote: [...]
>> [...]
>
> Yes. Imagine if we standardized on a header-based string
> encoding, and we wanted to implement a substring function over
> a string that contains multiple segments of different
> languages. Instead of a cheap slicing over the string, you'd
> need to scan the string or otherwise keep track of which
> segment the start/end of the substring lies in, allocate memory
> to insert headers so that the segments are properly
> interpreted, etc. It would be an implementational nightmare,
> and an unavoidable performance hit (you'd have to copy data
> every time you take a substring), and the @nogc guys would be
> up in arms.
>
> [...]
That's essentially what RTF with code pages was. I'm happy that
we got rid of it and that it was replaced by XML. Even if
Microsoft's document XML is a bloated, ridiculous mess, it's
still an order of magnitude less problematic than RTF (I mean at
the text encoding level).
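
To make the quoted substring point concrete, here is a rough Python sketch of the two cases. It is only an illustration under assumptions: the "header-based" string is modelled as a list of (codec, payload) pairs, and the names `Segment`, `substring_segmented`, `substring_utf8` and `decode_segments` are made up for this example, not taken from any real library or from RTF itself.

```python
from typing import List, Tuple

# Hypothetical model: each segment carries its codec name as a "header".
Segment = Tuple[str, bytes]


def substring_segmented(segments: List[Segment], start: int, end: int) -> List[Segment]:
    """Take characters [start, end) from a header-tagged string.

    Segments must be decoded (scanned) to find where the slice falls, and
    new segments are allocated so that every piece keeps its own header.
    """
    out: List[Segment] = []
    pos = 0  # character offset of the current segment's first character
    for codec, payload in segments:
        text = payload.decode(codec)
        seg_end = pos + len(text)
        lo, hi = max(start, pos), min(end, seg_end)
        if lo < hi:
            # Copy: re-encode the selected characters under the same header.
            out.append((codec, text[lo - pos:hi - pos].encode(codec)))
        pos = seg_end
        if pos >= end:
            break
    return out


def substring_utf8(data: memoryview, start: int, end: int) -> memoryview:
    """Slice UTF-8 by byte offsets: a view onto the same buffer, no scan, no copy.

    The caller is assumed to pass offsets that fall on code point boundaries.
    """
    return data[start:end]


def decode_segments(segments: List[Segment]) -> str:
    return "".join(payload.decode(codec) for codec, payload in segments)


# A string mixing two legacy code pages, and the same text as plain UTF-8.
mixed = [
    ("latin-1", "café ".encode("latin-1")),
    ("koi8-r", "привет".encode("koi8-r")),
]
utf8 = "café привет".encode("utf-8")

part_segmented = substring_segmented(mixed, 3, 8)       # character offsets
part_utf8 = substring_utf8(memoryview(utf8), 3, 12)     # byte offsets
```

Both calls select the same five characters, but the segmented version has to walk the segments and build two new ones, while the UTF-8 version just returns a window onto the existing bytes.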