The Case Against Autodecode
H. S. Teoh via Digitalmars-d
digitalmars-d at puremagic.com
Tue May 31 13:35:42 PDT 2016
On Tue, May 31, 2016 at 10:47:56PM +0300, Dmitry Olshansky via Digitalmars-d wrote:
> On 31-May-2016 01:00, Walter Bright wrote:
> > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
> > > I don't agree on changing those. Indexing and slicing a char[] is
> > > really useful and actually not hard to do correctly (at least with
> > > regard to handling code units).
> >
> > Yup. It isn't hard at all to use arrays of code units correctly.
>
> Ehm, as long as all you care about is operating on substrings, I'd say.
> Working with individual characters requires either decoding or clever
> tricks like operating on encoded UTF directly.
[...]
Working on individual characters requires byGrapheme, unless you know
beforehand that the character(s) you're working with are ASCII or fit in
a single code unit.
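
To illustrate, here's a minimal sketch (using only Phobos's
std.uni.byGrapheme and std.range.walkLength) of why code units and code
points don't line up with user-perceived characters:

    import std.range : walkLength;
    import std.uni : byGrapheme;

    void main()
    {
        // 'e' followed by U+0301 (combining acute accent): one visible character.
        string s = "e\u0301";

        assert(s.length == 3);                 // three UTF-8 code units
        assert(s.walkLength == 2);             // two code points (what autodecoding yields)
        assert(s.byGrapheme.walkLength == 1);  // one grapheme, i.e. one "character"
    }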
About "clever tricks", it's not really that hard. I was thinking that
things like s.canFind('Ш') should translate the 'Ш' into a UTF-8 byte
sequence, and then do a substring search directly on the encoded string.
This way, a large number of single-character algorithms don't even need
to decode. The way UTF-8 is designed guarantees that there will not be
any false positives. This will eliminate a lot of the current overhead
of autodecoding.
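
Here's a minimal sketch of that idea (containsChar is just a
hypothetical helper, not how Phobos actually implements canFind): encode
the needle once with std.utf.encode, then search the raw code units via
std.string.representation, which sidesteps autodecoding entirely:

    import std.algorithm.searching : canFind;
    import std.string : representation;
    import std.utf : encode;

    // Hypothetical helper: single-character search without decoding the haystack.
    bool containsChar(string haystack, dchar needle)
    {
        char[4] buf;
        immutable len = encode(buf, needle);   // the needle's UTF-8 byte sequence

        // Plain substring search over the raw bytes; .representation avoids
        // autodecoding. UTF-8's lead/continuation byte scheme guarantees a
        // match can never begin in the middle of another character.
        return haystack.representation.canFind(buf[0 .. len].representation);
    }

    void main()
    {
        assert(containsChar("Широка страна моя", 'Ш'));
        assert(!containsChar("hello", 'Ш'));
    }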
T
--
Klein bottle for rent ... inquire within. -- Stephen Mulraney