Major performance problem with std.array.front()
Marc Schütz <schuetzm at gmx.net>
Sun Mar 9 07:12:28 PDT 2014
On Friday, 7 March 2014 at 23:13:50 UTC, H. S. Teoh wrote:
> On Fri, Mar 07, 2014 at 10:35:46PM +0000, Sarath Kodali wrote:
>> On Friday, 7 March 2014 at 20:43:45 UTC, Vladimir Panteleev wrote:
>> >On Friday, 7 March 2014 at 19:57:38 UTC, Andrei Alexandrescu wrote:
> [...]
>> >>Clearly one might argue that their app has no business dealing
>> >>with diacriticals or Asian characters. But that's the typical
>> >>provincial view that marred many languages' approach to UTF and
>> >>internationalization.
>> >
>> >So is yours, if you think that making everything magically a dchar
>> >is going to solve all problems.
>> >
>> >The TDPL example only showcases the problem. Yes, it works with
>> >Swedish. Now try it again with Sanskrit.
>>
>> +1
>> In Indian languages, a character consists of one or more UNICODE
>> code points. For example, in Sanskrit "ddhrya"
>> http://en.wikipedia.org/wiki/File:JanaSanskritSans_ddhrya.svg
>> consists of 7 UNICODE code points. So to search for this char I have
>> to use string search.
> [...]
>
> That's what I've been arguing for. The most general form of character
> searching in Unicode requires substring searching, and similarly many
> character-based operations on Unicode strings are effectively
> substring-based operations, because said "character" may be a multibyte
> code point, or, in your case, multiple code points. Since that's the
> case, we might as well just forget about the distinction between
> "character" and "string", and treat all such operations as substring
> operations (even if the operand is supposedly "just 1 character long").
>
> This would allow us to get rid of the hackish auto-decoding of narrow
> strings, and thus eliminate the needless overhead of always decoding.

That won't work, because your needle might be in a different
normalization form than your haystack, in which case a byte-by-byte
comparison will not be able to find it.
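
To make the normalization point concrete, here is a minimal sketch. It is in Python rather than D purely for brevity, and the helper names `find_raw` and `find_normalized` are made up for illustration. It assumes the needle is a precomposed "é" (U+00E9) while the haystack spells the same character as "e" plus a combining acute accent (U+0065 U+0301):

```python
import unicodedata

# Same visible character, two different encodings:
needle = "\u00e9"         # precomposed "é" (NFC form)
haystack = "cafe\u0301"   # "e" followed by COMBINING ACUTE ACCENT (NFD form)

def find_raw(haystack: str, needle: str) -> int:
    """Byte-by-byte substring search on the UTF-8 data, no normalization."""
    return haystack.encode("utf-8").find(needle.encode("utf-8"))

def find_normalized(haystack: str, needle: str, form: str = "NFC") -> int:
    """Bring both operands to the same normalization form, then search.
    The returned index counts code points in the normalized haystack."""
    h = unicodedata.normalize(form, haystack)
    n = unicodedata.normalize(form, needle)
    return h.find(n)

print(find_raw(haystack, needle))         # -1: the byte sequences differ
print(find_normalized(haystack, needle))  # 3: found once both are NFC
```

The raw search fails even though both strings render identically. Normalizing both sides first fixes the lookup, but it means touching every code point of the haystack again, which is the overhead a pure substring approach was supposed to avoid.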