Fix Phobos dependencies on autodecoding

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Aug 13 16:18:03 UTC 2019


On Tue, Aug 13, 2019 at 07:31:28AM +0000, a11e99z via Digitalmars-d wrote:
[...]
> imo autodecoding is one of right thing.
[...]
> why u decide to fight with autodecoding?

Because it *appears* to be right, but it's actually wrong. For example:

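	import std.range : retro;
	import std.stdio;

	void main() {
		// No combining marks: reversing by code point works.
		writeln("привет".retro);
		// 'е' is followed by U+0301 (combining acute accent).
		writeln("приве́т".retro);
	}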

Expected output:
	тевирп
	те́вирп

Actual output:
	тевирп
	т́евирп

The problem is that autodecoding assumes that one Unicode code point ==
one grapheme, but this is not true. It's usually true for European
languages, though even there it breaks as soon as a combining mark shows
up (as in the accented string above), and it fails routinely for many
other languages. So autodecoding gives you the illusion of correctness,
but when you ship your product to Asia suddenly you get a ton of bug
reports.

To guarantee correctness you need to work with graphemes (see
std.uni.byGrapheme). But we can't make that the default because it's a
big performance hit, and many string algorithms don't actually need
grapheme segmentation.
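
For the retro example above, a rough sketch of the grapheme-based
version might look like the following. The helper name reverseGraphemes
is made up for illustration; byGrapheme and byCodePoint are the std.uni
functions. Note the extra array allocations, which are part of the cost
mentioned above.

```d
import std.array : array;
import std.conv : to;
import std.range : retro;
import std.stdio : writeln;
import std.uni : byCodePoint, byGrapheme;

// Hypothetical helper: reverse a string one grapheme at a time, so a
// combining mark stays attached to its base character.
string reverseGraphemes(string s)
{
    return s.byGrapheme   // segment into grapheme clusters
            .array        // byGrapheme is not bidirectional; retro needs an array
            .retro        // reverse the order of the clusters
            .byCodePoint  // flatten the clusters back into code points
            .array
            .to!string;   // re-encode the dchar[] as UTF-8
}

void main()
{
    writeln(reverseGraphemes("привет"));
    writeln(reverseGraphemes("приве\u0301т")); // accent stays on the 'е'
}
```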

Ultimately, the correct solution is to put the onus on the programmer to
select the iteration scheme (by code units, code points, or graphemes)
depending on what's actually needed at the application level.
Arbitrarily choosing one of them to be the default leads to a false
sense of security.
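
To make the three schemes concrete, here is a small sketch that counts
the same string at each level. The count* function names are invented
for this example; byCodeUnit and byDchar come from std.utf, and
byGrapheme from std.uni.

```d
import std.range : walkLength;
import std.stdio : writefln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helpers: one explicit choice of iteration scheme each.
size_t countCodeUnits(string s)  { return s.byCodeUnit.walkLength; }
size_t countCodePoints(string s) { return s.byDchar.walkLength; }
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // Six Cyrillic letters plus one combining acute accent (U+0301).
    string s = "приве\u0301т";
    writefln("code units:  %s", countCodeUnits(s));  // UTF-8 bytes
    writefln("code points: %s", countCodePoints(s)); // what autodecoding sees
    writefln("graphemes:   %s", countGraphemes(s));  // what the user sees
}
```

The three numbers differ for this string (14, 7 and 6), which is the
whole point: none of them is "the" length, and the caller has to say
which one is meant.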


T

-- 
That's not a bug; that's a feature!

