Notice/Warning on narrowStrings .length

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Apr 26 20:54:51 PDT 2012


On Thu, Apr 26, 2012 at 09:55:54PM -0400, Nick Sabalausky wrote:
[...]
> Crazy stuff! Some of them look rather similar to Arabic or Korean's
> Hangul (sp?), at least to my untrained eye. And then others are just
> *really* interesting-looking, like:
> 
> http://www.omniglot.com/writing/12480.htm
> http://www.omniglot.com/writing/ayeri.htm
> http://www.omniglot.com/writing/oxidilogi.htm
> 
> You're right though, if I were in charge of Unicode and tasked with
> handling some of those, I think I'd just say "Screw it. Unicode is now
> deprecated.  Use ASCII instead. Doesn't have the characters for your
> language? Tough! Fix your language!" :)

You think that's crazy, huh? Check this out:

	http://www.omniglot.com/writing/sumerian.htm

Now take a deep breath...

... this writing was *actually used* in ancient times. Yeah.

Which means it probably has a Unicode block assigned to it, right now.
:-)


> > When I get the time? Hah... I really need to get my lazy bum back to
> > working on the new AA implementation first. I think that would
> > contribute greater value than optimizing Unicode algorithms. :-) I
> > was hoping *somebody* would be inspired by my idea and run with
> > it...
> >
> 
> Heh, yea. It is a tempting project, but my plate's overflowing too.
> (Now if only I could make the same happen to bank account...!)
[...]

On the other hand though, sometimes it's refreshing to take a break from
"serious" low-level core language D code, and just write plain ole
normal boring application code in D. It's good to be reminded just how
easy and pleasant it is to write application code in D.

For example, just today I was playing around with a regex-based version
of formattedRead: you pass in a regex and a bunch of pointers, and the
function uses compile-time introspection to convert regex matches into
the correct value types. So you could call it like this:

	int year;
	string month;
	int day;
	regexRead(input, `(\d{4})\s+(\w+)\s+(\d{2})`, &year, &month, &day);

Basically, each pair of capturing parentheses corresponds to one pointer
argument, in order; non-capturing parentheses (?:) can be used for
grouping without assigning to an item.

Its current implementation is still kinda crude, but it does support
assigning to user-defined types if you define a fromString() method that
does the requisite conversion from the matching substring.
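
To make the idea concrete, here is a rough sketch of how such a function
could be put together. This is not the actual implementation described
above, just one possible shape for it: the bool return value, the
isPointer check, and the use of matchFirst (from newer versions of
std.regex) are all assumptions made for illustration.

```d
import std.conv : to;
import std.regex : matchFirst, regex;
import std.traits : isPointer;

/// Hypothetical sketch: match `pattern` against `input` and store capture
/// group i+1 into the variable that args[i] points to.
/// Returns false if there is no match or the group count is wrong.
bool regexRead(Args...)(string input, string pattern, Args args)
{
    auto m = matchFirst(input, regex(pattern));

    // m.length counts the whole match plus one entry per capturing group.
    if (m.empty || m.length != args.length + 1)
        return false;

    // foreach over a variadic argument list is unrolled at compile time,
    // so each iteration sees the concrete pointee type.
    foreach (i, arg; args)
    {
        static assert(isPointer!(typeof(arg)),
            "regexRead expects pointer arguments");
        alias T = typeof(*arg);

        // Prefer a user-defined fromString() if the type provides one;
        // otherwise fall back on std.conv.to (throws on bad input).
        static if (__traits(compiles, T.fromString(m[i + 1])))
            *arg = T.fromString(m[i + 1]);
        else
            *arg = m[i + 1].to!T;
    }
    return true;
}

unittest
{
    int year;
    string month;
    int day;
    assert(regexRead("2012 April 26", `(\d{4})\s+(\w+)\s+(\d{2})`,
                     &year, &month, &day));
    assert(year == 2012 && month == "April" && day == 26);
}
```

The unittest block can be exercised with something like
`dmd -unittest -main -run file.d`. Compiling the regex on every call is
wasteful; a real version would probably cache it or accept a precompiled
Regex.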

The next step is to standardize on enums in user-defined types that
specify a regex substring to be used for matching items of that type, so
that the caller doesn't have to know what kind of string pattern is
expected by fromString(). I envision something like this:

	struct MyDate {
		enum stdFmt = `(\d{4}-\d{2}-\d{2})`;
		enum americanFmt = `(\d{2}-\d{2}-\d{4})`;
		static MyDate fromString(Char)(Char[] value) { ... }
	}
	...
	string label1, label2;
	MyDate dt1, dt2;
	regexRead(input, `\s+(\w+)\s*=\s*`~MyDate.stdFmt~`\s*$`,
			&label1, &dt1);
	regexRead(input, `\s+(\w+)\s*=\s*`~MyDate.americanFmt~`\s*$`,
			&label2, &dt2);

So the user can specify, in the regex, which date format to use in
parsing the dates.
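
For illustration, here is a minimal sketch of what the type side of this
might look like. The year/month/day fields and the body of fromString()
are assumptions (the real body is elided above), and this version only
understands the stdFmt layout; how fromString() would learn which format
was matched is left open. It uses std.regex directly rather than
regexRead, just to show the pattern being spliced together.

```d
import std.conv : to;
import std.exception : enforce;
import std.regex : matchFirst, regex;

struct MyDate
{
    int year, month, day;   // assumed fields, for illustration only

    // Each pattern carries its own capturing group, so it can be
    // concatenated into a larger regex and still map to one argument.
    enum stdFmt = `(\d{4}-\d{2}-\d{2})`;
    enum americanFmt = `(\d{2}-\d{2}-\d{4})`;

    // Assumed implementation: parses only the YYYY-MM-DD (stdFmt) layout.
    static MyDate fromString(Char)(Char[] value)
    {
        enforce(value.length == 10, "expected YYYY-MM-DD");
        return MyDate(value[0 .. 4].to!int,
                      value[5 .. 7].to!int,
                      value[8 .. 10].to!int);
    }
}

unittest
{
    // The caller picks the date format by splicing the enum into the regex.
    auto re = regex(`\s+(\w+)\s*=\s*` ~ MyDate.stdFmt ~ `\s*$`);
    auto m = matchFirst("  released = 2012-04-26", re);
    assert(!m.empty);
    assert(m[1] == "released");
    assert(MyDate.fromString(m[2]) == MyDate(2012, 4, 26));
}
```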

I think this is a vast improvement over the current straitjacketed
formattedRead. ;-) And it's so much fun to code (and use).


T

-- 
Let X be the set not defined by this sentence...

