Should this work?

Jakob Ovrum jakobovrum at gmail.com
Thu Jan 9 17:45:55 PST 2014


On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
> [snip]

Using std.algorithm or std.range requires learning about ranges. 
You shouldn't be surprised that string handling with ranges works 
differently from the specialized string functions that are the 
norm in most languages. For anyone with even a cursory knowledge 
of ranges and range algorithms, it's no surprise that the result 
of a range composition is not of string type, even when the input 
is a string.
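The following is a minimal sketch of that point. It assumes a Phobos with autodecoding, so iterating a `string` yields `dchar`. The helper names `upperRange` and `shout` are hypothetical and exist only for illustration. `map` returns a lazy range whose element type is `dchar`, and you have to materialize it explicitly when you need a `string` back.

```d
import std.algorithm : map;
import std.array : array;
import std.conv : to;
import std.range : ElementType;
import std.stdio : writeln;
import std.uni : toUpper;

/// Hypothetical helper: lazily upper-cases each code point.
/// The return type is a MapResult range of dchar, not a string.
auto upperRange(string s)
{
    return s.map!(c => toUpper(c));
}

/// Hypothetical helper: materializes the lazy range into dchar[],
/// then transcodes it back to an immutable UTF-8 string.
string shout(string s)
{
    return upperRange(s).array.to!string;
}

void main()
{
    auto r = upperRange("hello");
    // The composition is not a string, and its elements are code points.
    static assert(!is(typeof(r) == string));
    static assert(is(ElementType!(typeof(r)) == dchar));

    writeln(shout("hello")); // HELLO
}
```

Keeping the range lazy until the final conversion means intermediate steps allocate nothing. The cost is that the caller decides where the `string` is produced.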

If you don't want to learn about ranges, use std.string. If 
std.string is not sufficient, then you should consider learning 
about ranges, which means accepting that, yes, things will be 
different. Learning how to use ranges for string manipulation is 
not the easiest thing right now due to a dearth of learning 
material, but that's not a problem with ranges themselves. 
Compiler error messages are indeed part of the problem, but they 
are a work in progress: 2.065 contains an incremental improvement 
to the error messages emitted when overload resolution fails 
(thanks, Kenji).

Regarding Unicode, the unit that the language promotes and the 
standard library embraces is `dchar`, the Unicode code point. The 
choice of code points over graphemes is a compromise between 
correctness and performance. It means the onus is still on the 
user to cover the last mile of correctness, so the user is not 
exempt from learning at least the basics of Unicode in order to 
write Unicode-correct code in D. However, this is a surprisingly 
reasonable compromise. As long as all inputs are normalized to 
the same normalization form (which may require std.uni.normalize 
if the source of the input does not guarantee a particular form), 
it's very hard to break grapheme clusters with range-based code 
outside of contrived examples, even though that code operates on 
code points. Explicit handling of graphemes is typically only 
needed in very specific domains, such as writing a text rendering 
library or a text input box. Thus typical range-based string 
manipulation tends to be correct even for multi-code-point 
graphemes, without the author having to handle them consciously.
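Here is a small sketch of what "the unit is `dchar`" and normalization mean in practice. It assumes `std.uni.normalize` with the `NFC` form. The helpers `codePointCount` and `sameText` are hypothetical. The same visible character "é" can be one code point or two, and the two spellings only compare equal after normalization.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : normalize, NFC;

/// Hypothetical helper: counts code points, the unit a range over a
/// string yields when it is iterated.
size_t codePointCount(string s)
{
    return s.walkLength;
}

/// Hypothetical helper: compares two strings after bringing both to NFC.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string composed = "\u00E9";    // 'é' as a single code point
    string decomposed = "e\u0301"; // 'e' + COMBINING ACUTE ACCENT

    writeln(codePointCount(composed));       // 1
    writeln(codePointCount(decomposed));     // 2
    writeln(composed == decomposed);         // false
    writeln(sameText(composed, decomposed)); // true
}
```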

2.065 has std.uni.byGrapheme and std.uni.byCodePoint for 
range-based grapheme manipulation. However, grapheme decoding 
carries a performance cost, so I recommend against using 
`byGrapheme` dogmatically. The result of `byGrapheme` is not 
bidirectional yet; someone needs to take the time to implement 
`decodeGraphemeBack` and/or `graphemeStrideBack` first.
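The sketch below shows `byGrapheme` and `byCodePoint` together. It follows the style of the std.uni documentation examples, and the helpers `graphemeCount` and `reverseGraphemes` are hypothetical. Because the `byGrapheme` range is not bidirectional, it cannot be passed to `retro` directly. The sketch therefore copies the graphemes into an array first, which costs an extra allocation.

```d
import std.array : array;
import std.conv : text;
import std.range : retro, walkLength;
import std.stdio : writeln;
import std.uni : byCodePoint, byGrapheme;

/// Hypothetical helper: counts user-perceived characters (graphemes).
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

/// Hypothetical helper: reverses by grapheme cluster, so combining marks
/// stay attached to their base. byGrapheme is not bidirectional, so the
/// graphemes are first copied into an array, which is.
string reverseGraphemes(string s)
{
    return s.byGrapheme.array.retro.byCodePoint.text;
}

void main()
{
    string s = "noe\u0308l"; // "noël" spelled with e + COMBINING DIAERESIS

    writeln(s.walkLength);                        // 5 (code points)
    writeln(graphemeCount(s));                    // 4 (graphemes)
    writeln(reverseGraphemes(s) == "le\u0308on"); // true
}
```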


More information about the Digitalmars-d mailing list