The Case Against Autodecode

Tue May 31 10:06:16 PDT 2016

On 05/31/2016 12:54 PM, Jonathan M Davis via Digitalmars-d wrote:
> Equality does not require decoding. Similarly, functions like find don't
> either. Something like filter generally would, but it's also not
> particularly normal to filter a string on a by-character basis. You'd
> probably want to get to at least the word level in that case.

It's nice that the stdlib takes care of that.

> To make matters worse, functions like find or splitter are frequently used
> to look for ASCII delimiters, even when the strings themselves contain
> Unicode characters. So, even if decoding were necessary when looking for a
> Unicode character, it's utterly wasteful when the character you're looking
> for is ASCII.

Good idea. We could overload functions such as find on char, wchar, and 
dchar. Jonathan, could you look into a PR to do that?
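
To make the ASCII case concrete, here is a minimal sketch of a code-unit 
search that never decodes. It is written in Python purely to illustrate 
the UTF-8 property involved; it is not Phobos code, and find_ascii is a 
hypothetical helper name, not an existing function.

```python
def find_ascii(haystack: bytes, needle: str) -> int:
    """Return the byte offset of an ASCII needle in UTF-8 bytes, or -1.

    Hypothetical helper for illustration only.
    """
    if len(needle) != 1 or ord(needle) >= 0x80:
        raise ValueError("needle must be a single ASCII character")
    # In UTF-8, every byte of a multi-byte sequence has its high bit set,
    # so a byte below 0x80 can only ever be a standalone ASCII character.
    # A raw byte scan therefore cannot produce a false match.
    return haystack.find(bytes([ord(needle)]))


# "naïve,café" with the non-ASCII letters written as escapes.
text = "na\u00efve,caf\u00e9".encode("utf-8")
offset = find_ascii(text, ",")

# Slicing at the returned offset lands on a code point boundary.
prefix = text[:offset].decode("utf-8")
```

The offset comes back in code units (bytes here), which is what a slice 
needs anyway, so nothing is lost by skipping the decode.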

> But searching generally does not require decoding so long as
> the same character is always encoded the same way.

Yah, a good rule of thumb is to get the same (consistent, heh) results 
for a given string (including a given normalization) regardless of the 
encoding used. So e.g. it's nice that walkLength returns the same number 
for the string whether it's UTF8/16/32.
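
A rough sketch of that invariant, in Python rather than D and only as an 
illustration: count code points straight from the code units of each 
encoding and check that the three answers agree. The function names are 
made up for this example, the input is assumed to be well-formed, and the 
UTF-16/32 data is assumed little-endian with no BOM.

```python
def count_utf8(data: bytes) -> int:
    # Every code point has exactly one byte that is not a
    # continuation byte (continuation bytes look like 10xxxxxx).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def count_utf16(data: bytes) -> int:
    # Split into 16-bit little-endian units; a surrogate pair is one
    # code point, so skip the low (trailing) surrogate of each pair.
    units = [int.from_bytes(data[i:i + 2], "little")
             for i in range(0, len(data), 2)]
    return sum(1 for u in units if not (0xDC00 <= u <= 0xDFFF))


def count_utf32(data: bytes) -> int:
    # One 32-bit unit per code point.
    return len(data) // 4


# Mixes ASCII, a two-byte UTF-8 character, and a character outside the BMP.
s = "h\u00e9llo \U0001F600"

n8 = count_utf8(s.encode("utf-8"))
n16 = count_utf16(s.encode("utf-16-le"))
n32 = count_utf32(s.encode("utf-32-le"))
```

All three counts match even though the code unit counts differ, and none 
of them is the number of graphemes, which also depends on normalization.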

Andrei