Why UTF-8/16 character encodings?
Torje Digernes
torjehoa at pvv.org
Wed May 29 03:10:52 PDT 2013
On Tuesday, 28 May 2013 at 01:17:37 UTC, H. S. Teoh wrote:
> On Tue, May 28, 2013 at 02:54:30AM +0200, Torje Digernes wrote:
>> On Tuesday, 28 May 2013 at 00:34:20 UTC, Manu wrote:
>> >On 28 May 2013 09:05, Walter Bright <newshound2 at digitalmars.com> wrote:
>> >
>> >>On 5/27/2013 3:18 PM, H. S. Teoh wrote:
>> >>
>> >>>Well, D *does* support non-English identifiers, y'know... for
>> >>>example:
>> >>>
>> >>> import std.stdio;
>> >>>
>> >>> void main(string[] args) {
>> >>>     int число = 1;
>> >>>     foreach (и; 0..100)
>> >>>         число += и;
>> >>>     writeln(число);
>> >>> }
>> >>>
>> >>>Of course, whether that's a good practice is a different
>> >>>story. :)
>> >>>
>> >>
>> >>I've recently come to the opinion that that's a bad idea, and D
>> >>should not support it.
>> >>
>> >
>> >Why? You said previously that you'd love to support extended
>> >operators ;)
>>
>> I find features such as support for uncommon symbols in variables a
>> strength as it makes some physics formulas a bit easier to read in
>> code form, which in my opinion is a good thing.
>
> I think there's a difference between allowing math symbols (which
> includes things like (a subset of) Greek letters that mathematicians
> love) in identifiers, and allowing full Unicode. What if you're
> assigned to maintain code containing identifiers that have letters
> that don't appear in any of your installed fonts?
>
> I think it's OK to allow math symbols, but allowing the entire set of
> Unicode characters is going a bit too far, IMO. For one thing, if some
> code has identifiers written in Arabic, I wouldn't be able to
> understand the code, simply because I'd have a hard time telling
> different identifiers apart. Besides, if the rest of the language
> (keywords, Phobos, etc.) are in English, then I don't see any
> compelling reason to use a different language in identifiers, other
> than to submit IODCC entries. :-P
>
> C doesn't support Unicode identifiers, for one thing, but I've seen
> working C code written by people who barely understand any English --
> it didn't stop them at all. (The comments were of course in their
> native language -- the compiler ignores everything inside anyway so
> 8-bit native encodings or even UTF-8 can be sneaked in without
> provoking compiler errors.)
>
>
> T
I think there is very little difference; both cases artificially
limit the allowable symbols. What about symbols relevant to other
fields that do not primarily use Greek letters? Are they to be
treated differently?

What you propose is a built-in coding standard for D, based on your
feelings on the topic.

If what you fear is that Unicode will suddenly make cooperation
impossible, I doubt you are right. After all, there are all kinds of
ways to make terrible variable names (q, w, e, r ... qq, qw). If any
such identifiers show up in a project, I assume they get cleaned up,
so why wouldn't the same happen to Unicode identifiers if they cause
problems? Think about it: it should happen even faster, because the
symbol might not be accessible to everyone, whereas single- or
double-letter gibberish is perfectly reproducible and might grow into
the project, confusing every new reader. Are you going to argue for
disallowing variable names that are not an English dictionary word
or a compound of such words?
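
A minimal sketch of the physics-formula case raised earlier in the
thread, assuming a compiler that accepts Greek letters in identifiers
(as the quoted Cyrillic example suggests D does). The quantities and
the names c, f, π, λ and ω are illustrative assumptions, not taken
from any real project:

```d
import std.stdio : writefln;
import std.math : PI;

void main()
{
    // Illustrative input values only (assumed for this sketch).
    immutable double c = 299_792_458.0; // speed of light, m/s
    immutable double f = 100.0e6;       // signal frequency, Hz

    // Greek identifiers mirror the symbols used in the textbook formulas.
    immutable double π = PI;
    immutable double λ = c / f;         // wavelength, m
    immutable double ω = 2 * π * f;     // angular frequency, rad/s

    writefln("λ = %s m, ω = %s rad/s", λ, ω);
}
```

Whether `λ = c / f` reads better than `wavelength = c / f` is exactly
the kind of choice a project's own style guide can settle.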