[std.database]

Sat Oct 8 17:25:35 PDT 2011

Jonathan M Davis wrote:

> On Saturday, October 08, 2011 12:00:37 Andrei Alexandrescu wrote:
>> 1. If we build a D wrapper for ODBC, then we allow people to write code
>> for any database that has an ODBC driver. This, assuming we commit to
>> ODBC as D's standard database interface, would complete the project.
>> 
>> 2. If we want to go the route of "one std.database API with drivers for
>> each DBMS" and consider ODBC one of several DBMSs, then we need to
>> define our own driver architecture, write a few drivers ourselves
>> (including probably ODBC), and hope that people will add more drivers.
>> That's a larger project but it unties us from ODBC.
>> 
>> 3. If we want to go the route of "similar but not identical specialized
>> APIs per database system" and consider ODBC again only one of the
>> database systems, then we need to define one specialized API per DBMS
>> and force users to essentially choose upfront what DBMS they'll use, and
>> code for it. It's an even larger project and I don't see obvious
>> advantages for it than less of a need for upfront design.
> 
> I definitely vote for #2 or #3. One of our projects at work uses it
> (though not one that I personally work on), and I've never heard good
> things about it. Supporting it makes good sense, but I wouldn't want us to
> design Phobos' database solution around it.
> 
> We should probably explore #2 first, and then if it doesn't work well
> enough, then at least we have a solid base for the API's being similar
> with #3. However, given what I've heard about ODBC and its attempts to
> unify databases, I'm skeptical of how well we'll be able to have a unified
> DBMS API without harming performance. And from what I understand, it's
> pretty rare to change DBMSes for a project. You just pick one and use it.
> And then in the rare case where you have to change, you do the work to do
> it (and as long as the DB is appropriately modularized with regards to the
> rest of the program, it doesn't have a hugely negative affect on the rest
> of the program). So, I question that
> #2 gains us a whole lot over #3 ultimately (_especially_ if it ends up
> #costing
> much of anything in terms of performance or usability), but I'm not a DB
> expert, and I do think that it's at least worth exploring #2 first - if
> nothing else because it could lead to the APIs for #3 being better unified
> without harming their performance or usability.
> 
> - Jonathan M Davis
I agree that once a project picks its database it rarely changes but I find 
most people tend to use common functionality on the application code side 
and it's the sql which tends to get optimised a lot (there are some traps in 
the application code that potentially reduce performance like impropper use 
of mysql prepared statements). I think with #2 we should be able to hit the 
main db requirements so people can atleast develop database connected 
applications. The drivers would give us the flexibility to handle more 
databases at a basic level (at least an odbc driver will let us achieve this 
quickly until someone wants to provide D driver implementation of a db if it 
provides advantage of some kind).

That being said how do people tend to handle the SQL itself? Unless you are 
using a library to create the SQL for you, like hibernate, does anyone find 
you tend to run into SQL syntax variation across different databases? For 
example MySQL and MSSQL handle results pagination differently 
(http://forums.asp.net/p/1427310/3186427.aspx#3186427). I know there is ANSI 
SQL but my understanding is most databases don't implement it well (not all 
of it or poor performance etc). Seems to me even if you used a common API 
you are going to need to abstract the different types of databases some how 
just because of the SQL itself.

Even if #2 is completed #3 could still be worked on, if someone is willing. 
A common API is likely to miss some features from the specific APIs (I've 
only looked into mysql so this is an assumption). The implmentations from #3 
could then be used to implement drivers for #2 allowing those who want 
flexibility to use #2 but those with more specific requirements, and 
bothered about performance, can use #3. That is more work than #2 or #3 
separately but like I said "if someone is willing".

My plan for my personal implementation was to develop specific APIs (#3) for 
MySql and Postgresql, trying to keep them similar where possible, then look 
into providing a wrapper api for common/basic functionality (#2). That 
project was as much about getting to grips with D as a project I intend to 
use, which is why I chose the long way round to implement it.