[Issue 18241] New: Missing characters from std.uni.unicode.Default_Ignorable_Code_Point

d-bugmail at puremagic.com d-bugmail at puremagic.com
Mon Jan 15 23:50:32 UTC 2018


https://issues.dlang.org/show_bug.cgi?id=18241

          Issue ID: 18241
           Summary: Missing characters from
                    std.uni.unicode.Default_Ignorable_Code_Point
           Product: D
           Version: D2
          Hardware: x86_64
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P1
         Component: phobos
          Assignee: nobody at puremagic.com
          Reporter: hsteoh at quickfur.ath.cx

The set returned by unicode.Default_Ignorable_Code_Point is missing some
characters listed in:

    http://www.unicode.org/L2/L2002/02368-default-ignorable.pdf

where Default_Ignorable_Code_Point is defined as:

    Other_Default_Ignorable_Code_Point + (Cf + Cc + Cs - White_Space)

While characters in Other_Default_Ignorable_Code_Point seem to be included
correctly, two characters in Cf appear to be missing from the set:

- U+06DD
- U+070F

Furthermore, characters in (Cc - White_Space) are also missing:

- U+0000 to U+0008
- U+000E to U+001F
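
As a rough cross-check, independent of std.uni, the category term of the
definition quoted above can be evaluated with Python's unicodedata module.
This is only a sketch under stated assumptions: the names
WHITE_SPACE_CONTROLS and in_category_term are made up for illustration, the
White_Space members are hardcoded because unicodedata does not expose that
property, and the formula is taken as quoted in this report rather than from
the current Unicode data files, whose derivation may differ.

```python
import unicodedata

# Assumption: the only White_Space members that are also Cf, Cc, or Cs are the
# control characters U+0009..U+000D and U+0085, so they are hardcoded here
# (Python's unicodedata module does not expose the White_Space property).
WHITE_SPACE_CONTROLS = set(range(0x09, 0x0E)) | {0x85}

def in_category_term(cp):
    """Return True if cp is in (Cf + Cc + Cs - White_Space)."""
    category = unicodedata.category(chr(cp))
    return category in ("Cf", "Cc", "Cs") and cp not in WHITE_SPACE_CONTROLS

# Code points this report lists as missing from the std.uni set.
reported = [0x06DD, 0x070F, *range(0x00, 0x09), *range(0x0E, 0x20)]

# Print each code point, its general category, and whether the term covers it.
for cp in reported:
    print(f"U+{cp:04X} {unicodedata.category(chr(cp))} {in_category_term(cp)}")
```

The result depends on the Unicode version bundled with the Python build, so
it shows what the quoted formula implies, not what any particular release of
std.uni should return.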


(See also: PR #5, referencing the Unicode Standard section 5.22.)


I am not sure whether this is because the missing characters were added in a
later version of the Unicode Standard than the one originally implemented in
std.uni.
