Unicode combining mark / std.uni question

ikod geller.garry at gmail.com
Tue Dec 5 20:04:29 UTC 2017


Hello,

I have to create a very basic IDNA (Internationalized Domain Names 
in Applications) library. There are two parts to IDNA: user 
input checks and punycode encoding/decoding.

The punycode part is already complete, and now I have to add some 
checks, but I'm weak in Unicode and can't find the proper way to 
express these tests using std.uni.

Here is the list of prohibited domain labels from RFC 5891 
(https://tools.ietf.org/html/rfc5891):

    o  Labels whose first character is a combining mark (see The Unicode
       Standard, Section 2.11 [Unicode]).

    o  Labels containing prohibited code points, i.e., those that are
       assigned to the "DISALLOWED" category of the Tables document
       [RFC5892].

    o  Labels containing code points that are identified in the Tables
       document as "CONTEXTJ", i.e., requiring exceptional contextual
       rule processing on lookup, but that do not conform to those rules.
       Note that this implies that a rule must be defined, not null: a
       character that requires a contextual rule but for which the rule
       is null is treated in this step as having failed to conform to the
       rule.

    o  Labels containing code points that are identified in the Tables
       document as "CONTEXTO", but for which no such rule appears in the
       table of rules.  Applications resolving DNS names or carrying out
       equivalent operations are not required to test contextual rules
       for "CONTEXTO" characters, only to verify that a rule is defined
       (although they MAY make such tests to provide better protection or
       give better information to the user).

    o  Labels containing code points that are unassigned in the version
       of Unicode being used by the application, i.e., in the UNASSIGNED
       category of the Tables document.
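
For the first rule only, a minimal sketch might look like the one below. It
assumes that std.uni.isMark (general categories Mn, Mc and Me) matches what
the RFC means by "combining mark", which may not be exactly right. The
function name startsWithCombiningMark is hypothetical, not something that
exists in Phobos. The remaining rules depend on the per-code-point categories
from RFC 5892 and are not covered here.

```d
import std.range.primitives : empty, front;
import std.uni : isMark;

/// Hypothetical helper: returns true if the label begins with a
/// combining mark, i.e. a code point in general category Mn, Mc or Me.
/// Assumption: this is the set RFC 5891 means by "combining mark".
bool startsWithCombiningMark(const(char)[] label)
{
    // `front` on a UTF-8 string decodes the first code point to a dchar.
    return !label.empty && isMark(label.front);
}
```

A label such as "\u0301abc" (starting with U+0301 COMBINING ACUTE ACCENT)
would be rejected by this check, while "abc" and the empty string would pass
it.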

Can anybody help with this task?

Thanks!



More information about the Digitalmars-d mailing list