Databases and the D Standard Library

Sat Dec 31 19:24:31 PST 2016

Hi Everyone,

I've seen a lot of talk on the forums over the past year about the need 
for database support in the D Standard Library and I completely agree. 
At the end of the day the purpose of any programming language and its 
attendant libraries is to allow the developer to solve their problems 
quickly and efficiently; and a large subset of those solutions require 
some form of structured data store. To my mind, this makes some form of 
interface(s) to a data-store an essential component of the D Standard 
Library. And since this is something that my particular problem spaces 
also need, I thought it would be useful to attempt to do something about it.

First, I've seen a couple of promising projects, the most complete, and 
recent of which, dstddb (Github: https://github.com/cruisercoder/dstddb) 
hasn't seen a commit since June. An additional setback came when I tried 
to use it and was greeted with a litany of compiler errors.

This is *not* a problem, it's the natural course of a volunteer 
community such as ours; and I want to thank Erik Smith profusely for his 
work. Priorities and circumstances change and that means that valuable 
projects are inexplicably dropped.

But we still lack a critical component, and to get the conversation 
started, I'd like to break down the issues I've seen brought up in past 
threads on this subject and encourage you to bring your own. I may have 
ideas, but I can't possibly know the entire problem space.

1. Isn't this an enormous amount of work?

My answer: Absolutely, depending on your preferred scope of work.

In general, I've seen two distinct camps on this issue. One says that we 
should implement everything in D from the ground up, including 
re-implementation of the database drivers themselves in D. If this is 
your preferred scope of work then you will invariably become 
disheartened at the truly stupendous amount of work you face and give up.

The other camp says that we should make use of existing drivers and 
include them in the D Standard Library. This is difficult path to follow 
as the vanilla build of the D Standard Library now requires a 
significant number of foreign libraries, all with differing licenses, be 
built and distributed to everyone; regardless of whether or not they use 
them in their project. This is more-or-less than path the dstddb is/was on.

My idea: Focus on defining the interface, not the individual driver 
implementations.

If instead we focused on defining an interface that a "conforming 
implementation" had to follow, we would allow developers to only pull in 
the library they need or build a from-scratch library if they so desire.

Indeed this is the model that both Java (JDO) and .NET (ADO.NET) follow 
and I think we would be well advised to follow their lead here. Not only 
is the methodology battle-proven, it is also well understood by a 
significant portion of D's potential user-base. By way of example, 
Npgsql is the ADO.NET implementation of a driver for PostgreSQL.

2. There are so many different types of data storage systems, how do you 
design a system generic enough for all of them?

My answer: You don't. Nobody else has bothered trying, and I believe 
that our worry over that question is a large part of why we don't have 
anything substantive today.

My idea: Split the data storage systems out by category of data-store.
For example:
	- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
	- Document: std.database.document (Mongo, CouchDB, etc.)
	- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

If you want something that doesn't fit into a category above, you're own 
your own, but you were also on your own in other languages.

3. We need to provide a single interface for all data-stores in the 
SQL/Document/Key-Value category.

My answer: Are you sure? The problem is that each underlying data-store 
has it's own dialect. For example, PostgreSQL and MSSQL are both 
ostensibly ANSI-SQL, except where they aren't. Re-targeting data-stores, 
even in the same category, is never going to be as simple as changing a 
connection string. And additionally, you will have to implement a 
super-set of features in the interface to support all the variations and 
throw exceptions where the chosen implementation does not support a 
specific feature.

My idea: Each data store has it's own implementation with it's own 
naming convention. For example (ADO.NET):
	- SqlConnection (MSSQL)
	- NpgsqlConnection (Npgsql)

Yes, this means that you have to change names in your code if you switch 
data-stores, but since you are already changing your queries, which is a 
much more difficult change, this isn't a significant additional cost. 
Also, the code becomes clearer to those who take over maintenance duties 
from the original author, especially when you are mixing data-stores. 
But in all honest, most developers will pick on technology and stick 
with it for the entirety of the software's lifespan.

4. We should hide querying from the developer because they are bad at 
it, security flaws, etc.

My answer: While agree in principal, especially with the security 
concerns, in reality what you are asking for is an ORM. In my opinion, 
that is a separate concern from a database interface, and is typically 
implemented as layer over the DB interface.

My idea: Don't do it. Save it for a different project.

5. D has so many useful features for data access, we should use as many 
as possible!

My answer: D absolutely has many useful features for data access and 
manipulation. But that doesn't mean that a good interface has to use any 
of them. The first job of a Database Interface, and indeed any library, 
is to get the job done with a minimum of overhead. Let's worry about 
that before going crazy adding in all the D goodness. Ranges have been a 
particular target for abuse here, and while I love ranges, I think the 
mechanics of data-store manipulation don't lend themselves well to 
working with ranges. I'd love to hear your ideas on this though.

My idea: Focus on a more conservative implementation in the style of JDO 
or ADO.NET. This will allow us to ship something that works in a 
reasonable time frame. I'm not saying that we can't use any of D's 
unique talents, but using those talents should be subordinate to 
designing an interface that works efficiently.

That is all I have for now. I am looking forward to hear your thoughts 
on this topic! Until then, I am going to go close out 2016 (PST) with 
family and friends and I wish you all a Happy New Year!

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;