Today's programming challenge - How's your Range-Fu ?

Walter Bright via Digitalmars-d digitalmars-d at puremagic.com
Sat Apr 18 01:18:51 PDT 2015


On 4/18/2015 12:58 AM, John Colvin wrote:
> On Friday, 17 April 2015 at 18:41:59 UTC, Walter Bright wrote:
>> On 4/17/2015 9:59 AM, H. S. Teoh via Digitalmars-d wrote:
>>> So either you have to throw out all pretenses of Unicode-correctness and
>>> just stick with ASCII-style per-character line-wrapping, or you have to
>>> live with byGrapheme with all the complexity that it entails. The former
>>> is quite easy to write -- I could throw it together in a couple o' hours
>>> max, but the latter is a pretty big project (cf. Unicode line-breaking
>>> algorithm, which is one of the TR's).
>>
>> It'd be good enough to duplicate the existing behavior, which is to treat
>> decoded Unicode characters as one column.
>
> Code points aren't equivalent to characters. They're not the same thing in most
> European languages,

I know a bit of German; for which characters is that not true?

> never mind the rest of the world. If we have a line-wrapping
> algorithm in Phobos that works by code points, it needs a large "THIS IS ONLY
> FOR SIMPLE ENGLISH TEXT" warning.
>
> Code points are a useful chunk size for some tasks and completely insufficient
> for others.

The first order of business is making wrap() work with ranges, and otherwise 
work the same as it always has (it's one of the oldest Phobos functions).

There are different standard levels of Unicode support. The lowest level is 
working correctly with code points, which is what wrap() does. Going to a higher 
level of support comes after range support.
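
As a rough illustration of the gap between the code-point level and the next
level up, here is a minimal sketch in Python (not Phobos code; the function
names are made up for this example). It assumes that skipping combining marks
is a fair stand-in for "one column per visible character", which is only an
approximation of real grapheme segmentation such as byGrapheme.

```python
import unicodedata


def columns_by_code_point(text):
    """Count one column per code point (the lowest level of support)."""
    return len(text)


def columns_skipping_combining_marks(text):
    """Count one column per code point that is not a combining mark.

    This only approximates grapheme clusters; it is an assumption made
    for illustration, not a full Unicode segmentation algorithm.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# The German word "für", spelled two ways that look identical on screen.
precomposed = "f\u00fcr"   # 'ü' as the single code point U+00FC
decomposed = "fu\u0308r"   # 'u' followed by combining diaeresis U+0308

print(columns_by_code_point(precomposed), columns_by_code_point(decomposed))
print(columns_skipping_combining_marks(precomposed),
      columns_skipping_combining_marks(decomposed))
```

With precomposed input the two counts agree, so code-point counting gives the
expected width for German text stored that way. They differ only when the same
text arrives in decomposed form.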

I know little about combining characters. You obviously know much more; do you 
want to take charge of this function?

