Why is BOM required to use unicode in tokens?
james.blachly at gmail.com
Wed Sep 16 00:24:45 UTC 2020
On 9/15/20 8:10 PM, James Blachly wrote:
> Steve: It sounds as if the spec is correct but the glyph (codepoint?)
> range is outdated. If this is the case, it would be a worthwhile update.
> Do you really think it would be rejected out of hand?
OK, interestingly, this code point U+2202 falls within the block
"Mathematical Operators", and I could see why, in general, a range
called "operators" (which includes e.g. set membership, relations,
operators you would see in abstract algebra, etc.) would be excluded
from identifiers. However, the first 8 code points in the range are
"Miscellaneous mathematical symbols" and include several that would be
appropriately included as/in token names.
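For anyone who wants to check those first 8 code points, here is a
minimal sketch in Python (chosen only for convenience, not D; it uses
nothing beyond the standard unicodedata module, and the helper name
describe is made up for illustration). It assumes the interpreter ships
a reasonably current Unicode database.

```python
import unicodedata

def describe(cp):
    """Return (code point label, general category, character name)."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.category(ch), unicodedata.name(ch))

# U+2200 through U+2207: the first eight code points of the block.
first_eight = [describe(cp) for cp in range(0x2200, 0x2208)]

for label, category, name in first_eight:
    print(label, category, name)
```

All eight report the general category Sm (Symbol, Math) rather than a
letter category, U+2202 PARTIAL DIFFERENTIAL included.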
Indeed, chapter 22, page 823 of the Unicode standard groups ∂ U+2202
(the partial differential symbol in question) with the "Basic Set of
Alphanumeric Characters", which comprises the digits 0-9, Latin
[a-z, A-Z], uppercase Greek Α-Ω, nabla and variant theta, the lowercase
Greek letters, and, besides U+2202 ∂, six additional glyph variants.
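One way to see that grouping in practice: the styled alphabets in the
Mathematical Alphanumeric Symbols block each carry their own partial
differential, and compatibility normalization folds every one of them
back to U+2202. A rough Python sketch, again using only the standard
unicodedata module (STYLES and styled_partials are names invented here
for illustration):

```python
import unicodedata

# Style prefixes used in the names of the styled Greek alphabets.
STYLES = [
    "BOLD",
    "ITALIC",
    "BOLD ITALIC",
    "SANS-SERIF BOLD",
    "SANS-SERIF BOLD ITALIC",
]

def styled_partials():
    """Look up each styled partial differential and its NFKC form."""
    rows = []
    for style in STYLES:
        ch = unicodedata.lookup(f"MATHEMATICAL {style} PARTIAL DIFFERENTIAL")
        folded = unicodedata.normalize("NFKC", ch)
        rows.append((f"U+{ord(ch):05X}", style, folded))
    return rows

for label, style, folded in styled_partials():
    print(label, style, f"-> U+{ord(folded):04X}")
```

Each styled variant normalizes to the plain ∂ at U+2202, which is the
character encoded in the Mathematical Operators block.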
Because Unicode does not duplicate code points, a character that could
rightly appear in multiple ranges (like U+2202 ∂) is encoded only once,
and that, I think, is the fate that befell this variant delta.
More information about the Digitalmars-d-learn