Interesting performance data-point
Don Allen
donaldcallen at gmail.com
Tue Dec 31 15:36:31 UTC 2024
As I've mentioned in previous messages, I've ported my personal
finance package from C to D, having first ported some of it to
Rust until I just couldn't stand it anymore.
One of the utilities that exists in both the D and Rust versions
reads .csv files downloaded from American Express and loads the
transactions into the Sqlite database that contains my financial
data, trying to assign an expense account to each incoming
transaction by fuzzy-comparing the transaction's description to
existing transactions, using an algorithm based on Levenshtein
distance. The Levenshtein calculation is done using a
user-defined Sqlite function that is loaded as an extension.
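
For readers unfamiliar with the measure, here is a minimal sketch of Levenshtein distance and of exposing it to Sqlite as a user-defined function. It is written in Python purely for illustration and is not the extension the utility actually loads. The function name `levenshtein`, the helper `register`, and the use of `create_function` (an application-defined function rather than a loadable extension) are assumptions of the sketch.

```python
import sqlite3


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one previous row."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row stays small
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def register(conn: sqlite3.Connection) -> None:
    """Make levenshtein(x, y) callable from SQL on this connection."""
    conn.create_function("levenshtein", 2, levenshtein)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    register(conn)
    print(conn.execute("SELECT levenshtein('kitten', 'sitting')").fetchone()[0])
```

The cost is proportional to the product of the two string lengths, which is why evaluating it for every candidate row dominates the run time.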
What I've found is that the D version of this utility, compiled
with DMD, runs about twice as fast as the Rust version and
produces identical results. While I haven't done detailed enough
measurements to explain the performance disparity with certainty,
I've done enough to know that both versions spend most of their
time in the Levenshtein distance function.
But I have a theory that I think is the likely explanation. And
if I'm correct, it highlights one of D's strongest points -- the
ability to call C libraries directly, without the need for an
elaborate interface layer.
What I think is going on is that rusqlite, the crate that is
Rust's primary Sqlite interface package, does not provide a way
to step through the results of a select query, as the Sqlite
library itself does, stopping when you are happy. Instead, you
run the 'query' method (or one of its variants) on a prepared
statement, which either returns an iterator for you to access all
the returned rows or calls a closure to process each row. This
difference matters when each row involves an expensive
calculation.
In my case, I want the most recent transaction that meets the
Levenshtein distance criterion, which will be the first row in
the result set, since I order them by post-date descending. In D,
I am able to step the match query and either I get a row or I
don't. If I do, I stop, use that transaction's expense account
and I'm done. The entire result set is not computed. In Rust,
rusqlite computes the entire result set, which is expensive due
to the Levenshtein calculation, and then hands it to me row by
row.
It is not a simple matter to convince Sqlite to restrict the
result set to the most recent row. 'limit 1' makes no difference
in the Rust application's performance (I tried it). Apparently
Sqlite applies 'limit' after computing the result set. There
*may* be a way to do this using Sqlite's windowing capability,
but that's a bit of a research project that I have no inclination
to take on.
I have also not found a Rust crate that provides step-level
control over Sqlite *and* lets you load extensions.
I think this illustrates a strength of D that not enough people
appreciate -- the ability to talk directly and
easily to the C world. People complain that D doesn't have a rich
set of libraries. It doesn't need one; all the C libraries are
almost as easily accessible from D as they are from C or C++. And
this has gotten even easier with the advent of ImportC, which I
think is a very important addition to D and worth continued
development to hide the craziness in C header files.
In my case, in D, I can use a straightforward query and have the
same simple interaction with Sqlite that I would have in C. There
may be a way to match D's performance in this case with Rust, but
it would require effort, perhaps a lot. This is typical of the
Rust experience compared to D. Things are just more difficult,
mainly because the user plays a bigger role in memory management
in Rust than in languages, like D, that provide a GC (I simply do
not understand the anti-GC religious fanatics, especially when we
are talking about ordinary applications on today's multi-GHz
hardware with huge amounts of memory). D's performance is
comparable (except in the case of the AMEX utility, where it is a
lot better) and the code is more readable. Unfortunately, people
jump on bandwagons mindlessly.