String implementations

Sat Jan 19 23:35:21 PST 2008

Janice Caron:
> Also, isn't perl an interpreted language? You can get away with a lot
> more in an interpreted language, but you pay the price in speed.

I'm not a Perl expert, and I don't know how well Perl manages Unicode (maybe Python manages Unicode better than Perl). But Perl was designed to process text, so if you process strings you will find that Perl is pretty *fast*: it's easy to write Perl programs that process text faster (and in a more flexible way) than C++ ones. (Note that Python 3.0 will use Unicode strings by default.)

For example, Python dicts (AAs) with string keys seem faster than the current DMD AAs, and that's probably true for the Perl ones too. This was a tiny example:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=57986
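To give an idea of the kind of code I mean, here is a minimal Python sketch of a dict with string keys. It is not the benchmark from the linked post; the function name word_freq and the input text are made up for illustration.

```python
def word_freq(text):
    """Count how often each whitespace-separated word occurs in text."""
    counts = {}  # dict with string keys, the Python counterpart of a D AA
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# Made-up input, just to exercise the function.
freqs = word_freq("the cat and the hat and the bat")
```

Nearly all the time here goes into hashing strings and looking them up, which is the part that seems well tuned in Python.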

Perl and Python have well-refined GCs, so they may be faster than the current DMD GC if you manage a lot of strings. This was an example where D was slower than Python too:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=62369

With Python you can also use Psyco, a JIT, to speed it up. Because Python strings are immutable, Psyco can use tricks to avoid actually copying strings and string slices in most cases, much as D does with its slices (plain Python copies the data when you take a slice).
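The difference between a copying slice and a non-copying one can be sketched in Python itself. This is only an illustration and assumes a recent Python with memoryview; it says nothing about how Psyco works internally, and the data is made up.

```python
data = b"ACGT" * 4             # 16 bytes of made-up sequence data

copy = data[4:8]               # ordinary slice: a new bytes object with its own copy
view = memoryview(data)[4:8]   # a view into the same buffer, no bytes are copied

same_content = (view.tobytes() == copy)   # both see the same four bytes
shares_buffer = (view.obj is data)        # the view still refers to the original object
```

The view is closer to what a D slice gives you: a pointer and a length into existing memory.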

REs in current DMD are *way* slower than the Perl/Python/Tcl ones. Some time ago I found a situation where the RE sub() of D looks O(n^2):
http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=dlang&id=4
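For comparison, the substitution step of that kind of benchmark looks roughly like this in Python. It is a simplified sketch and not the shootout program: the input string and the short substitution table are made up.

```python
import re

# Made-up input; the real benchmark reads a large FASTA file.
seq = "agggtaaaBtttaccctN"

# Hypothetical, shortened table of code -> alternatives substitutions.
subst = [("B", "(c|g|t)"), ("N", "(a|c|g|t)")]

# Each re.sub call scans the string once, so I'd expect the total time
# to grow roughly linearly with len(seq), not quadratically.
for pattern, replacement in subst:
    seq = re.sub(pattern, replacement, seq)
```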

The string methods of Python are written in really refined C, like this one:
http://effbot.org/zone/stringlib.htm
And they are usually faster than the less refined versions you can find in the current Phobos. I have implemented, and I use, a fastJoin, an xsplit, etc., that are faster than the Phobos ones.
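The idea of a lazy split can be sketched like this. It is in Python rather than D, it is not the D code mentioned above, and it assumes that xsplit means a split that yields one piece at a time instead of building the whole array first.

```python
def xsplit(text, sep):
    """Lazily yield the pieces of text separated by sep (sep must be non-empty)."""
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        end = text.find(sep, start)
        if end == -1:
            yield text[start:]   # last piece: everything after the final separator
            return
        yield text[start:end]
        start = end + len(sep)

# Made-up input; list() forces the generator only to show the result.
parts = list(xsplit("a,b,,c", ","))
```

When you only need the first few fields of each line, a lazy version like this never builds the rest.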

The built-in sort of Python is Timsort, which is way faster than the D built-in one (I have written a rather simple sort that is up to 3 times faster than the built-in sort of D, and it's always faster no matter what data I use).

Now and then the text I/O on disk of the current DMD is slower than Python's; this comes from some of my benchmarks.

I know all those parts of DMD can be improved later. When you create a new language you can't (and you don't want to) optimize every little bit, because it may be premature optimization; optimization must come later, so I understand Walter in this regard. But all this is just to show you that if today you have to process a lot of text in a very flexible way, it's not easy to beat languages designed for it, like Perl (but Python/Ruby/Tcl too; Ruby is less good than Python for Unicode texts, I think).

If you take a look near the bottom of this thread:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/0b3ded6d0f494d06/0068cb1406ab9e4c
you can see that I'd like to use D to speed up some text-processing-related bioinformatics scripts of mine, but often I find that the Python programs are faster for that purpose ;-)

Bye,
bearophile