[GSoC Proposal] Statically Checked Measurement Units

Tue Mar 29 07:45:26 PDT 2011

On 03/29/2011 02:06 AM, Don wrote:
> Cristi Cobzarenco wrote:
>> First, let me apologize for this very late entry, it's the end of
>> university and it's been a very busy period, I hope you will still
>> consider it.
>>
>> Note this email is best read using a fixed font.
>>
>> PS: I'm really sorry if this is the wrong mailing list to post and I
>> hope you'll forgive me if that's the case.
>>
>> ======= Google Summer of Code Proposal: Statically Checked Units =======
>>
>>
>> Abstract
>> -------------
>>
>> Measurement units make it possible to statically check the correctness
>> of assignments and expressions at virtually no performance cost and
>> with very little extra effort. When it comes to physics the advantages
>> are obvious – if you try to assign a force to a variable measuring
>> distance, you've almost certainly got a formula wrong somewhere along
>> the way. Also, showing a sensor measurement in gallons on a litre
>> display that keeps track of the remaining fuel of a plane (a big
>> no-no) is easily avoidable with this technique. What this translates
>> to is that one more of the many hidden assumptions in source code is
>> made visible: units naturally complement other contract checking
>> techniques, like assertions, invariants and the like. After all, the
>> unit that a value is measured in is part of the contract.
>
> This is one of those features that gets proposed frequently in multiple
> languages. It's a great example for metaprogramming. But, are there
> examples of this idea being seriously *used* in production code in ANY
> language?
> (For example, does anybody actually use Boost.Units?)

At work we use C++ enums for categorical types to great effect. The way 
it works is:

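// Unscoped enumerators share the enclosing scope, so each needs its own name;
// 1u keeps the shift out of the sign bit of a 32-bit int.
enum UserId { minUserId = 0, maxUserId = 1u << 31 };
enum AppId { minAppId = 0, maxAppId = 1u << 31 };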

then we express data in terms of UserId and AppId instead of an integral
type, and we cast to them when we read the values off the wire or the
database. The beauty of it is that you can never pass by mistake an AppId
instead of a UserId or vice versa, or even a raw int as one, without
explicitly stating intent.
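
A minimal sketch of that pattern, assuming C++11 or later. The helper names
(toUserId, toAppId, shardOf) and the enumerator names are hypothetical and
only serve the illustration; they are not our production code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical id types modeled on the enums above; each enumerator has its
// own name because unscoped enumerators share the enclosing scope.
enum UserId { kUserIdMin = 0, kUserIdMax = 1u << 31 };
enum AppId { kAppIdMin = 0, kAppIdMax = 1u << 31 };

// Hypothetical boundary helpers: the only places where a raw integer read
// off the wire or the database is turned into an id, via an explicit cast.
inline UserId toUserId(unsigned raw) { return static_cast<UserId>(raw); }
inline AppId toAppId(unsigned raw) { return static_cast<AppId>(raw); }

// Hypothetical consumer that accepts a UserId and nothing else.
inline unsigned shardOf(UserId id) { return static_cast<unsigned>(id) % 16; }

// The compiler rejects the mix-ups described above: there is no implicit
// conversion from AppId or from a raw int to UserId.
static_assert(!std::is_convertible<AppId, UserId>::value,
              "an AppId must not silently become a UserId");
static_assert(!std::is_convertible<int, UserId>::value,
              "a raw int must not silently become a UserId");
```

With these definitions, shardOf(toUserId(42)) compiles, while
shardOf(toAppId(42)) and shardOf(42) are compile-time errors. Note that the
reverse direction (enum to integer) is still implicit for unscoped enums, so
the protection only covers values flowing into the id types.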

It's saved us a lot of bugs (I know because I found some when converting
raw ints to enums) and has presumably prevented others.

If we used quantities, a similar benefit would probably emerge from
dimensional analysis. I know that in my machine learning code it's very
difficult to spot bugs because "it's all numbers". If I used a sort of a
double "enum" that could only be a probability, I'm sure I'd save myself
a ton of bugs.
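
One way such a double "enum" might look, as a sketch only: the Probability
class and its members are hypothetical, and the runtime range check is an
assumption added on top of the purely static distinction the enums give.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical wrapper: a double that can only hold a value in [0, 1].
class Probability {
public:
    // explicit: a raw double never becomes a Probability by accident.
    // The negated comparison also rejects NaN.
    explicit Probability(double p) : value_(p) {
        if (!(p >= 0.0 && p <= 1.0))
            throw std::invalid_argument("probability outside [0, 1]");
    }

    // Getting the number back out is an explicit, visible step.
    double value() const { return value_; }

    // P(A and B) for independent events; the product stays in [0, 1].
    friend Probability operator*(Probability a, Probability b) {
        return Probability(a.value_ * b.value_);
    }

    // P(not A).
    Probability complement() const { return Probability(1.0 - value_); }

private:
    double value_;
};

// "It's all numbers" no longer applies: a plain double is a different type.
static_assert(!std::is_convertible<double, Probability>::value,
              "a raw double must not silently become a Probability");
```

Only operations that keep the result a probability are offered here, so
adding two probabilities or passing a log-likelihood where a probability is
expected would not compile without an explicit conversion.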

Andrei