Why is BOM required to use unicode in tokens?

James Blachly james.blachly at gmail.com
Wed Sep 16 00:24:45 UTC 2020


On 9/15/20 8:10 PM, James Blachly wrote:
> Steve: It sounds as if the spec is correct but the glyph (codepoint?) 
> range is outdated. If this is the case, it would be a worthwhile update. 
> Do you really think it would be rejected out of hand?

OK, interestingly, this code point 0x2202 falls within the range 
"mathematical operators" [0], and I could see why in general a range 
called "operators" (which includes e.g. set membership, relations, 
operators you would see in abstract algebra, etc.) would not be allowed 
in token names. However, the first 8 codepoints in the range are 
"Miscellaneous mathematical symbols" and include several that would be 
appropriate in token names.
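
For anyone who wants to look at those first 8 codepoints directly, here 
is a minimal sketch using Python's standard unicodedata module. It is 
Python rather than D, and the helper name describe is my own, purely 
for illustration. It only reports what the Unicode character database 
says about each code point; it says nothing about which table the D 
lexer actually consults.

```python
import unicodedata

def describe(cp):
    """Return (label, character, Unicode name, general category) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", ch, unicodedata.name(ch), unicodedata.category(ch))

# The first eight code points of the Mathematical Operators block (U+2200..U+2207).
first_eight = [describe(cp) for cp in range(0x2200, 0x2208)]
for row in first_eight:
    print(*row)

# The character in question: U+2202 is in general category Sm (Symbol, math).
partial = describe(0x2202)

# Python's own identifier rule rejects it as well, while ordinary Greek delta
# is accepted. This is Python's rule, shown for comparison only, not D's.
print("∂".isidentifier(), "δ".isidentifier())
```

The general category is the interesting column: a tool that decides 
identifier membership by category will treat U+2202 as a symbol no 
matter how letter-like its use in mathematics is.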

Indeed, chapter 22, page 823 of the Unicode standard groups ∂ U+2202 
(the partial differential symbol in question) with the "Basic Set of 
Alphanumeric Characters", which includes Latin 0-9, [a-z,A-Z], uppercase 
Greek A-Ω, nabla and variant theta, the lowercase Greek letters, and, 
besides U+2202 ∂, six additional glyph variants.

Because Unicode de-duplicates code points, a character that could 
rightly appear in multiple ranges (like U+2202 ∂) is encoded in only 
one of them, and that, I think, is the fate that befell this variant delta.



More information about the Digitalmars-d-learn mailing list