Notice/Warning on narrowStrings .length

H. S. Teoh hsteoh at quickfur.ath.cx
Fri Apr 27 14:56:39 PDT 2012


On Fri, Apr 27, 2012 at 12:20:13PM +0400, Dmitry Olshansky wrote:
> On 27.04.2012 1:23, H. S. Teoh wrote:
[...]
> >What we *really* should be doing, esp. for commonly-used functions
> >like computing various lengths, is to automatically process said
> >tables and encode the computation in finite-state machines that can
> >then be optimized at the FSM level (there are known algos for
> >generating optimal FSMs),
> 
> FSA are based on tables so it's all runs in the circle. Only the
> layout changes. Yet the speed gains of non-decoding are huge.

Yes, but hand-coded tables tend to go out of date, be prone to bugs, or
are missing optimizations done by an FSA generator (e.g. a lexer
generator). Collapsed FSA states, for example, can greatly reduce table
size and speed things up.


>  codegen'd, and then optimized again at the assembly level by the
> >compiler. These FSMs will operate at the native narrow string char
> >type level, so that there will be no need for explicit decoding.
> >
> >The generation algo can then be run just once per unicode release,
> >and everything will Just Work.
> >
> This year Unicode in D will receive a nice upgrade.
> http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dolsh/20002#
> 
> Anyway keep me posted if you have these FSA ever come to soil your
> sleep ;)
[...]

One area where autogenerated Unicode algos will be very useful is in
normalization. Unicode normalization is non-trivial, to say the least;
it involves looking up various character properties and performing
mappings between them in a specified order.

If we can encode this process as FSA, then we can let an automated FSA
optimizer produce code that maps directly between the (non-decoded!)
source string and the target (non-decoded!) normalized string. Similar
things can be done for string concatenation (which requires
arbitrarily-distant scanning in either direction from the joining point,
though in normal use cases the distance should be very short).


T

-- 
Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG


More information about the Digitalmars-d mailing list