Is it possible to collect object usage information during compilation?

Sat Jan 10 12:53:46 PST 2015

On Saturday, 10 January 2015 at 18:31:18 UTC, Paolo Invernizzi 
wrote:
> On Saturday, 10 January 2015 at 17:31:42 UTC, DaveG wrote:
>> On Saturday, 10 January 2015 at 13:19:19 UTC, Martin Nowak 
>> wrote:
>>> Here is a sketch for an optimal solution. I'm actually 
>>> eagerly waiting that someone finally implements it.
>>>
>>> http://dpaste.dzfl.pl/cd375ac594cf
>>
>> I would also have to sell the idea of writing an ORM which is 
>> certainly not on the roadmap, but this will certainly help my 
>> argument.
>
> Maybe not, something simpler than a full ORM should be 
> compelling also.
>
> I guess you know about the ORM Vietnam [1], but also this [2] 
> can be of some help in selling a simple D solution.
>
> I would like to see, someday, something in D that:
>
>  - can check at compile time the syntax of SQL;
>  - can check at compile time the SQL query statement against 
> the current DB schema;
>  - can read the output of a DB schema dump at CT, and parse it 
> into what is needed for the previous points (more complicated);
>
> The first point should be easy today, the second and the last 
> one involve more work...
>
> [1] 
> http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
> [2] http://wozniak.ca/what-orms-have-taught-me-just-learn-sql
> ---
> Paolo

I have no intention of writing anything as massive as Entity 
Framework or Hibernate. We have been successful over the past 4 
years with just a small collection of functions to reduce some of 
the pain (and redundancy) in writing a lot of dynamic SQL. Now 
that we have an opportunity to start fresh we have a chance to do 
something better.

The traditional problems with ORMs in general are well known and 
these are the reasons why I have never used one in production.

1. Complexity. You basically need to learn an entire new language 
(sometimes literally). This is an investment which can be worth 
it if the abstraction is successful. The following problems are 
why I think the investment is not worth it.

2. Limitations. Unfortunately too often you need to drop in to 
SQL to really get things done. This alone is a non-starter. If I 
need to bypass the abstraction to do anything really interesting 
or complex, it has failed. Sometimes (usually) this is for 
performance, other times it's because there is simply no way (or 
it's too complicated) to express what I want through the 
abstraction.

3. Compilation/Translation. The time to translate commands to SQL 
(or whatever backend) can be a high price. Most ORMs do some type 
of caching now which is generally sufficient. In D most of the 
work can be done at compile time which is even better.

4. Unnecessary Data. Greedy data retrieval is way to common, the 
default is usually to get everything. For small queries and data 
sets you can write it off as "not a problem", but when your model 
gets large and interconnects, this can be catastrophic. Again, 
thanks Martin for the clever basis for a solution in D.

5. DB Performance. The efficiency of the SQL that is actually 
generated. People seem to focus on this because the generated SQL 
is generally quite verbose. Interestingly, in my experience, this 
is often the smallest performance problem because the query 
optimizer (at least in SQL Server with good indexes and 
statistics) will generate the same execution plan regardless. 
This is also a code gen problem that can be tweaked without 
breaking user code.

You may have noticed that 4 of 5 problems are about performance. 
That's because, at least in our case, it is that important and it 
is that much of a problem. Current ORMs often look great, but in 
my experience, the price is always to high. Some "micro-ORMs" 
avoid the performance problems, but they do so by sacrificing 
most of the features (you still have to write raw SQL for 
example). Some of the problems are inherit to solution and cannot 
be "solved", but they can be reduced.

For a long time I thought some of these problems where 
fundamental and had basically written off the concept of ORMs 
[see: Vietnam of Computer Science]. The good news is most of the 
problems appear to be solvable.
#1 is unavoidable obviously there will be something new (whether 
it's a DSL or just an API)
#2 is really dependent on the other problems and implementation.
#3 is "just" implementation.
#4 has a conceptual solution, now it's "just" implementation.
#5 does not have a solution because it will depend on the 
backend, but I think it's reasonable to expect a solution that 
works for almost all cases. It will be impossible to know without 
testing.

One final note. You may have noticed I didn't mention the schema 
syncing problem (keeping database and code in sync). There was a 
time I would have said that was essential and while it would be 
nice in a perfect world, I'm comfortable keeping them in sync 
manually (or semi-manual with scripts). I can generate a bunch of 
classes from an existing database fairly easily and when I change 
a table I can manually update a class. If I was writing SQL 
directly I would have to update my query, this is really no 
different. Doing validation in unit tests is perfectly acceptable 
to me.

Sorry for long post.
-Dave