Should this work?
Jakob Ovrum
jakobovrum at gmail.com
Thu Jan 9 17:45:55 PST 2014
On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
> [snip]
Using std.algorithm or std.range requires learning about ranges.
You shouldn't be surprised that string handling with ranges works
differently from the specialized string handling functions that
are the norm in most languages. For anyone with even a cursory
knowledge of ranges and range algorithms, it's no surprise that
the result of a range composition is not of string type, even when
the input is a string.
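To illustrate, here is a minimal sketch (the helper name `withoutChar` is made up for this example). The result of `filter` is a lazy range of `dchar`, so getting a `string` back takes an explicit conversion:

```d
import std.algorithm : filter;
import std.array : array;
import std.conv : to;
import std.range : ElementType;

// Hypothetical helper: removes every occurrence of `unwanted` from `s`.
string withoutChar(string s, dchar unwanted)
{
    // Lazy range composition; nothing is copied yet.
    auto r = s.filter!(c => c != unwanted);

    // The result is some range type over code points, not a string.
    static assert(!is(typeof(r) == string));
    static assert(is(ElementType!(typeof(r)) == dchar));

    // Materialize to dchar[] and re-encode as UTF-8 only when a
    // string is actually needed.
    return r.array.to!string;
}

void main()
{
    assert(withoutChar("hello, world", 'l') == "heo, word");
}
```

If the next step is just another range algorithm, you can skip the conversion and keep composing on `r` directly.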
If you don't want to learn about ranges, use std.string. If
std.string is not sufficient, then you should consider learning
about ranges, which means accepting that yes, things will be
different. Learning about ranges and how to use them for string
manipulation is not the easiest thing right now due to a dearth
of learning material, but that's not a problem with ranges.
Compiler error messages are indeed part of the problem, but they
are a WIP. 2.065 contains an incremental improvement to error
messages on failure of overload resolution (Thanks Kenji).
About Unicode, the unit that the language promotes and the
standard library embraces is `dchar`, the Unicode code point. The
choice of not using graphemes is a compromise between correctness
and performance. That means that the onus is still on the user to
cover the last mile of correctness, so the user is not exempt
from having to learn at least the basics of Unicode in order to
write Unicode-correct code in D. However, this is a surprisingly
reasonable compromise: as long as all inputs are normalized to
the same format (which may require std.uni.normalize if the
source of the input does not guarantee a particular format), then
outside of contrived examples it's very hard to break grapheme
clusters by using range-based code, even though they are ranges
of code points. Explicit handling of graphemes is typically only
needed in very specific domains, such as a text rendering library
or a text input box. Thus typical
range-based string manipulation tends to be correct even for
multi-code-point graphemes, without the author having to
consciously handle it.
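As a rough sketch of the normalization point (the helper name `sameText` is made up, and NFC is just one possible choice of form), two inputs that spell the same text with different code point sequences only compare equal once both are normalized to the same form:

```d
import std.uni : normalize, NFC;

// Hypothetical helper: compares two strings after bringing both
// to the same normalization form (NFC chosen for illustration).
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string composed = "\u00E9";     // 'é' as a single code point
    string decomposed = "e\u0301";  // 'e' followed by a combining acute accent

    // The raw strings differ, even though they render the same.
    assert(composed != decomposed);

    // After normalization they are equal.
    assert(sameText(composed, decomposed));
}
```

If the source of your input already guarantees a particular form, the `normalize` calls are unnecessary.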
2.065 has std.uni.byGrapheme and std.uni.byCodePoint for
range-based grapheme manipulation. However, there is a performance
cost involved, so I recommend against reaching for `byGrapheme`
everywhere as a matter of principle. The result of
`byGrapheme` is not bidirectional yet - someone needs to take the
time to implement `decodeGraphemeBack` and/or
`graphemeStrideBack` first.
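A small sketch of the difference between the three units, assuming the std.uni API as of 2.065 (the helper name `graphemeCount` is made up for this example):

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: counts user-perceived characters (graphemes).
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' + combining acute accent, followed by 'a'.
    string s = "e\u0301a";

    assert(s.length == 4);          // UTF-8 code units
    assert(s.walkLength == 3);      // code points (the default range view)
    assert(graphemeCount(s) == 2);  // graphemes
}
```

The grapheme walk does more work per element than plain code point decoding, which is the cost mentioned above.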