Fix Phobos dependencies on autodecoding

Walter Bright newshound2 at digitalmars.com
Fri Aug 16 20:44:20 UTC 2019


On 8/16/2019 2:27 AM, Patrick Schluter wrote:
> While the results are far from 
> perfect, they would be absolutely impossible if we used what you propose here.

Google Translate can (and does) figure it out from the context, just like a 
human reader would.

Sentences written in mixed languages *are* written for human consumption. I have 
many books written that way. They are quite readable, and have no need to 
clue in the reader that "the next word is in French/Latin/Greek/German".

And frankly, if data processing software is totally reliant on using the correct 
language-specific glyph, it will fail, because people will not type in the 
correct one, and they cannot visually proofread it for correctness. Anything 
that does OCR is going to completely fail at this.

Robust data processing software is going to be forced to accept and allow for 
multiple encodings of the same glyph, pretty much rendering the semantic 
difference meaningless.
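
To make "accept and allow for multiple encodings of the same glyph" concrete, 
here is a minimal sketch in C. Everything in it is an assumption made for 
illustration: the names fold_confusable and same_glyphs are hypothetical, and 
the table is a tiny hand-picked sample of look-alike Greek and Cyrillic letters, 
not a complete mapping. Real software would more likely start from a maintained 
data set such as the Unicode confusables list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, hand-picked sample: code points that render like a Latin
   letter in most fonts, mapped to that Latin letter. Not exhaustive. */
struct confusable { uint32_t from; uint32_t to; };

static const struct confusable confusables[] = {
    { 0x0391, 'A' },  /* Greek capital alpha      */
    { 0x0410, 'A' },  /* Cyrillic capital a       */
    { 0x039F, 'O' },  /* Greek capital omicron    */
    { 0x041E, 'O' },  /* Cyrillic capital o       */
    { 0x03BF, 'o' },  /* Greek small omicron      */
    { 0x043E, 'o' },  /* Cyrillic small o         */
    { 0x0430, 'a' },  /* Cyrillic small a         */
    { 0x0435, 'e' },  /* Cyrillic small ie        */
    { 0x0440, 'p' },  /* Cyrillic small er        */
    { 0x0441, 'c' },  /* Cyrillic small es        */
};

/* Map a code point to its Latin look-alike, or return it unchanged. */
uint32_t fold_confusable(uint32_t cp)
{
    size_t n = sizeof confusables / sizeof confusables[0];
    for (size_t i = 0; i < n; i++)
        if (confusables[i].from == cp)
            return confusables[i].to;
    return cp;
}

/* Return 1 if two code point sequences look the same after folding,
   0 otherwise. */
int same_glyphs(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    if (na != nb)
        return 0;
    for (size_t i = 0; i < na; i++)
        if (fold_confusable(a[i]) != fold_confusable(b[i]))
            return 0;
    return 1;
}
```

With this table, the Latin word "cope" and the Cyrillic sequence U+0441 U+043E 
U+0440 U+0435 compare as equal, which is the point: once the software has to 
fold look-alikes together to be robust, the per-code-point distinction no longer 
carries any weight.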

I bet in 10 or 20 years of being clobbered by experience you'll reluctantly 
agree with me that assigning semantics to individual code points was a mistake. :-)

BTW, I was a winner in the 1986 Obfuscated C Code Contest with:

-------------------------
#include <stdio.h>
#define O1O printf
#define OlO putchar
#define O10 exit
#define Ol0 strlen
#define QLQ fopen
#define OlQ fgetc
#define O1Q abs
#define QO0 for
typedef char lOL;

lOL*QI[] = {"Use:\012\011dump file\012","Unable to open file '\x25s'\012",
  "\012","   ",""};

main(I,Il)
lOL*Il[];
{	FILE *L;
	unsigned lO;
	int Q,OL[' '^'0'],llO = EOF,

	O=1,l=0,lll=O+O+O+l,OQ=056;
	lOL*llL="%2x ";
	(I != 1<<1&&(O1O(QI[0]),O10(1011-1010))),
	((L = QLQ(Il[O],"r"))==0&&(O1O(QI[O],Il[O]),O10(O)));
	lO = I-(O<<l<<O);
	while (L-l,1)
	{	QO0(Q = 0L;((Q &~(0x10-O))== l);
			OL[Q++] = OlQ(L));
		if (OL[0]==llO) break;
		O1O("\0454x: ",lO);
		if (I == (1<<1))
		{	QO0(Q=Ol0(QI[O<<O<<1]);Q<Ol0(QI[0]);
			Q++)O1O((OL[Q]!=llO)?llL:QI[lll],OL[Q]);/*"
			O10(QI[1O])*/
			O1O(QI[lll]);{}
		}
		QO0 (Q=0L;Q<1<<1<<1<<1<<1;Q+=Q<0100)
		{	(OL[Q]!=llO)? /* 0010 10lOQ 000LQL */
			((D(OL[Q])==0&&(*(OL+O1Q(Q-l))=OQ)),
			OlO(OL[Q])):
			OlO(1<<(1<<1<<1)<<1);
		}
		O1O(QI[01^10^9]);
		lO+=Q+0+l;}
	}
	D(l) { return l>=' '&&l<='\~';
}
-------------------------

http://www.formation.jussieu.fr/ars/2000-2001/C/cours/COMPLEMENTS/DOC/www.ioccc.org/years.html#1986_bright

I am indeed aware of the problems with confusing the characters O, 0, l, 1 
and |. D does take steps to be more tolerant of bad fonts: for example, the 
integer literal 10l (ten with a lowercase l suffix) is allowed in C, but not 
in D. I seriously considered banning the identifiers l and O. Perhaps I should 
have. | is not a problem because the grammar (i.e. the context) detects errors 
with it.
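
As a small illustration of the 10l point, the following C sketch shows two 
literals that are nearly indistinguishable in many fonts but have different 
values. The function names are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Ten, written with a lowercase 'l' (long) suffix. Legal C. */
long ten_lowercase(void) { return 10l; }

/* One hundred and one: digits only, no suffix. */
long one_hundred_one(void) { return 101; }

/* Ten again, with the uppercase suffix that cannot be misread as a digit. */
long ten_uppercase(void) { return 10L; }

/* Self-check: the two look-alike literals differ by 91. */
void check_literals(void)
{
    assert(ten_lowercase() == 10);
    assert(one_hundred_one() == 101);
    assert(ten_lowercase() == ten_uppercase());
}
```

A C compiler accepts all three functions without complaint, so the only defense 
in C is the reader's eyesight; requiring the uppercase suffix moves that check 
into the compiler.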

