Database interface design - was how to build up the library.

Sean Kelly sean at invisibleduck.org
Fri Oct 7 11:23:55 PDT 2011


On Oct 7, 2011, at 8:46 AM, Andrei Alexandrescu wrote:

> On 10/7/11 9:11 AM, Steve Teale wrote:
>> I'm thinking that for each database that gets covered there will
>> need to be two modules, like:
>> 
>> etc.c.mysql
>> etc.mysqld
>> 
>> The etc.c.xxx modules would be completely different between databases -
>> just translations of the necessary header files. The xxxd modules
>> would be as similar as possible consistent with allowing the strong
>> points of individual database systems to show through, and the weak
>> points of others to be avoided. I don't think all should be reduced
>> to some LCD.
> 
> Well we may be hasty to go that way. The driver-based approach works well for other languages and APIs - why wouldn't it work for us? The differences across different DBMSs would be mostly in the SQL support, not the basic APIs.

It may be worth targeting ODBC early on as well, since that would get the API working for pretty much everything.  Then targeting a DBMS-specific API would be an optimization step rather than a necessary one.


>> These modules should attempt to make a good range of capabilities
>> available to the D programmer, but they not have to be all encompassing.
>> Those users who want to do really fancy things can drop back to the
>> low-level interface. They should probably have the following capabilities:
>> 
>> 1) Establishing and closing database connections.
> 
> Sounds good. Since DB connections are a precious resource, there must be a RAII struct holding them. The functions below may be members of it.

The way I've done this before is via reference counting, since the connection handle often has to be held by all related statements, resultsets, etc.  Would scope(exit) be sufficient to provide RAII functionality here?


>> 3) The capability to create prepared statements with in and out
>> parameters and association of the parameters with a source, and
>> then to execute these. This breaks down into several components/
>> capabilities, which could be labeled:
>> 
>> 3a) createPreparedStatement() - marshal parameters, associate them
>> with a sourceand have the server prepare the statement.
>> 
>> 3b) execStatement() - for those SQL statements that don't have a
>> result set.
>> 
>> 3c) execStatementIncremental()/getNext() - execute the prepared statement,
>> then fetch the result rows one at a time into some kind of result set.
> 
> Here's where the range interface might be used. We might simply have execStatement() that returns an input range. If the statement produced no rows, the range will be empty. No need for distinguishing execStatement() and execStatementIncremental().

Absolutely, though things may be a bit tricky here.  It's possible for a SQL query to return multiple resultsets, each of which is ideally represented by a range.


>> 3d) execStatementAll() - execute the prepared statement and get all
>> the resulting rows into some kind of result set.
> 
> This is not a primitive, but instead a convenience function that should be used with caution and only for queries known to be small. I agree that it's good to have.
> 
>> 3e) (maybe) execScalar() - do the whole sequence prepare, execute,
>> and get a single value result set placed into a D variable.
> 
> That's convenient for e.g. "SELECT COUNT(*) FROM table WHERE condition", which people run all the time.
> 
>> 3f) (maybe) execStoredProcedure() - another 'do the whole thing'
>> capability TBD.
> 
> Well wouldn't that just execute a special SQL a la "CALL procedure"?

A stored procedure can have a return value and multiple resultsets, I believe.  Certainly one or the other anyway.  I think SPs are covered by the other query methods, though perhaps someone can think of a QOI feature that would benefit from a special call.


>> It is when we come to the nature of the result sets that there is
>> likely to be dissent. I favor arrays of structs, but we should
>> probably do arrays of arrays of variants too for those situations
>> where structures can't be sorted out at compile time. There needs
>> to be some symmetry between what is used here, and what can be used
>> as input to operations such as a prepared insert. It is of course
>> vital that this part of each middle layer produce exactly the same
>> kind of results. Otherwise the top layer could become difficult.
> 
> I'd like arrays of structs too but we must acknowledge that most databases people use will be large, in which case loading the whole thing eagerly takes a lot of RAM and potentially wastes time. So arrays are out except for e.g. convenience functions - but then we already have array() that converts an arbitrary input range into an array.
> 
> Now, regarding the "structs" part, I'd like that too when the schema is statically known. Two issues:
> 
> 1. MFC had at a point a wizard that generated one struct per resultset. It was an absolute maintenance disaster and they recanted by offering dynamically-bound result sets. The lesson there is that defining a struct for each query won't likely play out, so we better use Tuple!(ColType1, ColType2, ...). A possible API would be:
> 
> auto db = std.database.connect("cdb.mysql");
> auto rows = db.sql!(double, ulong)
>    ("SELECT SUM(x), COUNT(x) FROM table");
> // We know that rows is a range of Tuple!(double, ulong).
> writeln("Sum is ", rows.front[0], " count is ", rows.front[1]);
> 
> Cool beans. I'd love to use such an API!

It's only necessary to statically specify the parameter types if static checking of arguments is desired.  Otherwise it can all work implicitly.  For example, here's how you'd call a SQL API I wrote, which works against ODBC (it's a quick and dirty port of a C++ API I wrote in the mid 90s… not really a candidate for standardization):

        auto conn = new Connection( "driver={SQL Server};"
                                    "server=(local);"
                                    "trusted_connection=no;"
                                    "database=MyDB;"
                                    "uid=sa;"
                                    "pwd=;" );
                                    //"network=dbmssocn;" );
        auto stmt = conn.prepare( "SELECT Name FROM Person WHERE PersonID = ?" );
        // the prepare statement learns the type of each parameter from the SQL server
        stmt[0] = 1; // will throw if the wrong type is supplied
        auto rs = stmt.open();
        printf( "%.*s\n----\n", rs[0].name );
        while( rs.next() )
            printf( "%.*s\n", rs[0].asUtf8 );


> 2. Statically-bound tuples work only when we know the query beforehand. We need to cater for resultsets of unknown column count and types. The question here is whether we traffic in untyped memory a la ubyte[] or some variant type. I favor Variant because it's efficient, accommodates any SQL type easily, and is convenient to use.

Ideally, both should be available, though if one must be chosen then I vote for Variant.


More information about the Digitalmars-d mailing list