Major performance problem with std.array.front()

Tue Mar 11 05:44:33 PDT 2014

On Sunday, 9 March 2014 at 21:38:06 UTC, Nick Sabalausky wrote:
> On 3/9/2014 7:47 AM, w0rp wrote:
>>
>> My knowledge of Unicode pretty much just comes from having
>> to deal with foreign language customers and discovering the
>> problems with the code unit abstraction most languages seem to
>> use. (Java and Python suffer from similar issues, but they don't
>> really have algorithms in the way that we do.)
>>
>
> Python 2 or 3 (out of curiosity)? If you're including Python3, 
> then that somewhat surprises me as I thought greatly improved 
> Unicode was one of the biggest reasons for the jump from 2 to 
> 3. (Although it isn't *completely* surprising since, as we all 
> know far too well here, fully correct Unicode is *not* easy.)

Late reply here. Python 3 is a lot better in terms of Unicode 
support than 2. The situation in Python 2 was this.

1. The default string type is 'str', an immutable array of bytes.
2. A 'str' could hold data in any of many encodings, including UTF-16, etc.
3. There is an extra 'unicode' type for when you want a Unicode 
string.
4. Python implicitly converts between the two, often in wrong
ways, often causing exceptions to appear where you didn't expect
them to.

In 3, this changed to...

1. The default string type is still named 'str', only now it's 
like the 'unicode' of olde.
2. 'bytes' is a new immutable array of bytes type like the Python 
2 'str'.
3. Conversion between 'str' and 'bytes' is always explicit.
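
To make the Python 3 rules concrete, here is a minimal sketch, assuming CPython 3.3 or later. The sample word and the variable names are just illustrations, not anything from the thread.

```python
text = "na\u00efve"          # str: a sequence of Unicode code points ("naïve")
data = text.encode("utf-8")  # bytes: produced only by an explicit encode

n_code_points = len(text)    # counts code points
n_bytes = len(data)          # counts bytes; U+00EF needs two bytes in UTF-8

# Mixing the two types raises TypeError instead of converting silently,
# which is the behaviour Python 2 got wrong.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Going back to text is an explicit decode.
round_trip = data.decode("utf-8")
```

The point is that the error shows up at the line that mixes the types, rather than somewhere far away when non-ASCII data finally arrives.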

However, Python 3 works at the code point level (and before 3.3,
narrow builds effectively worked at the UTF-16 code unit level),
and you don't see very many algorithms which take, say, combining
characters into account. So Python suffers from similar issues.
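
A small sketch of what I mean by combining characters, assuming CPython 3.3 or later and using only the standard unicodedata module. The strings are made-up examples.

```python
import unicodedata

composed = "\u00e9"      # 'é' as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Both render as the same character, but str sees different sequences.
same_length = len(composed) == len(decomposed)   # 1 vs 2 code points
equal_raw = composed == decomposed

# Normalising to NFC composes the pair, so the comparison then succeeds.
equal_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Slicing works per code point, so reversing detaches the accent
# from its base letter and moves it to the front of the string.
word = "cafe\u0301"
reversed_word = word[::-1]
```

Normalisation only helps where a precomposed form exists. Handling user-perceived characters in general needs grapheme cluster segmentation, which the standard library does not provide as far as I know.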