Proposal: Database Engine for D

Chris Wright via Digitalmars-d digitalmars-d at puremagic.com
Sat Jan 2 12:47:37 PST 2016


On Sat, 02 Jan 2016 16:40:16 +0000, Piotrek wrote:

> On Friday, 1 January 2016 at 10:00:43 UTC, Kapps wrote:
> 
>> This example shows the difficulty of doing this in D. You can't really
>> have something like `p.Name == "James"`, or `p.Age < 21`
>> translate to SQL properly without language changes, which I believe
>> Walter or Andrei were against. This has been the key problem when
>> things like Linq to Sql for D have been brought up before.
> 
> Not really. There is no translation stage to SQL or any other DSL in the
> proposal. So this problem doesn't exist and no language changes are
> needed.

So you want to create the following query:

  people.filter!(x => x.surname == "Slughorn");

And you've got ten million people in the collection, and you want this 
query to finish soonish. So you need to use an index. But a full index 
scan isn't so great; you want to do an index lookup if possible.

That's simple enough; we generate proxy types to record what properties 
you're using and what operations you're performing. PersonProxy records 
that you're accessing a field 'surname', gives a StringFieldProxy, and 
that records that you're checking for equality with the string "Slughorn". 
The lambda returns true when opEquals returns true.

But people write queries that are more complex than that, like:

  people.filter!(x => x.surname == "Slughorn" || x.age <= 17);

First time you run this, x.surname.opEquals("Slughorn") returns true and 
the expression as a whole returns true. You missed the second part of the 
expression. That's bad.

So we need to evaluate this lambda twice per parameter. (Actually, thanks 
to opCmp, sometimes you'll have to evaluate it three times.) We use that 
to build up a giant truth table, and then we can execute your query.

And that "twice per parameter" thing is exponential, and we build up a 
truth table that's exponentially large with respect to the complexity of 
the query. Some queries I've written for production systems would take a 
week for this system to prepare to execute and require a petabyte of 
storage space.

This is, shall we say, less than ideal.

You might be able to support queries as large as ten comparisons in a 
reasonable timeframe. But for all but the most trivial queries, it'll be 
faster to use SQL.

Fortunately, trivial queries are common. You could write this system and 
have it work up to, say, five comparisons, and when your query exceeds 
that limit, it will throw an exception asking you to rewrite the query in 
SQL.

It wouldn't be able to tell you what the equivalent query is, however. 
It's using a truth table and doesn't have access to what you wrote. I 
mean, sure, it could try to work backwards from the truth table, but 
that's rather expensive.


More information about the Digitalmars-d mailing list