Why UTF-8/16 character encodings?

Tue May 28 04:12:05 PDT 2013

On Monday, 27 May 2013 at 23:46:17 UTC, H. S. Teoh wrote:
> On Tue, May 28, 2013 at 01:28:22AM +0200, Hans W. Uhlig wrote:
>> On Monday, 27 May 2013 at 23:05:46 UTC, Walter Bright wrote:
>> >On 5/27/2013 3:18 PM, H. S. Teoh wrote:
>> >>Well, D *does* support non-English identifiers, y'know... for
>> >>example:
>> >>
>> >>	void main(string[] args) {
>> >>		int число = 1;
>> >>		foreach (и; 0..100)
>> >>			число += и;
>> >>		writeln(число);
>> >>	}
>> >>
>> >>Of course, whether that's a good practice is a different story.
>> >>:)
>> >
>> >I've recently come to the opinion that that's a bad idea, and D
>> >should not support it.
>
> Currently, the above code snippet compiles (upon inserting
> "import std.stdio;", that is). Should that be made illegal?
>
>
>> Why do you think it's a bad idea? It allows code to be written in
>> various languages, doesn't it? Or is it just the lack of keyboard
>> support?
>
> I can't speak for Walter, but one issue that comes to mind is when
> someone reads the code and doesn't understand the language the
> identifiers are in, or worse, can't reliably recognize the distinctions
> between the glyphs, and so can't match identifier names correctly -- if
> you don't know Japanese, for example, a bunch of Japanese identifiers
> of equal length will all look more or less the same (all gibberish to
> you), so it only obscures the code. Or if your computer doesn't have
> the requisite fonts to display the alphabet in question, then you'll
> just see a bunch of ?'s or black blotches for all program identifiers,
> making the code completely unreadable.
>
> Since language keywords are already in English, we might as well
> standardize on English identifiers too. (After all, Phobos identifiers
> are English as well.) While it's cool to have multilingual identifiers,
> I'm not sure if it actually adds any practical value. :) If anything, it
> arguably detracts from usability. Multilingual program output, of
> course, is a different kettle o' fish.
>
>
> T

I can tell you for a fact that there are tons of *private* companies
that create closed-source programs whose source code is *not* in
English. And from *their* business perspective, it makes sense.
They don't care if you can't understand their source code, since
*you* will never see their source code. I'm quite confident there
are tons of programs that you use that *aren't* written in
English.

My wife writes the embedded software for the hardware her company
sells. I can tell you the source code sure as hell isn't in English.
Why would it be? The entire company speaks the local language
natively. I've worked in Japan, and I can tell you the norm over
there is *not* to code in English.

And why should it be? Why would you code in a language that is not
your own if you never plan to share your code outside your team?
Why would you care about users who don't have Unicode support if
all your employees' workstations are Unicode-compatible?

Allowing Unicode identifiers makes their work easier. Why should we
take that away from them?

There are advantages and disadvantages to non-ASCII identifiers,
but whether or not you should use them is a question for a coding
standard, not something to enforce as a compiler limitation.