dataframe implementations

Wed Nov 18 09:15:37 PST 2015

On Monday, 2 November 2015 at 13:54:09 UTC, Jay Norwood wrote:
> I was reading about the Julia dataframe implementation 
> yesterday, trying to understand their decisions and how D might 
> implement one.
>
> From my notes,
> 1. they are currently using a dictionary of column vectors.
> 2. for NA (not available) they are currently using an array of 
> bytes, effectively as a Boolean flag, rather than a bitVector, 
> for performance reasons.
> 3. they are not currently implementing hierarchical headers.
> 4. they are transforming non-valid symbol header strings (read 
> from csv, for example) to valid symbols by replacing '.' with 
> underscore and prefixing numbers with 'x', as examples.  This 
> allows use in expressions.
> 5. Along with 4., they currently have @with for DataVector, to 
> allow expressions to use, for example, :symbol_name instead of 
> dv[:symbol_name].
> 6. They have operator symbols for element-wise operations on 
> two vectors; for example, a ./ b divides a by b element by 
> element.
> 7. They currently have only row indexes, no row names or 
> symbols.
>
> I saw someone posting that they were working on DataFrame 
> implementation here, but haven't been able to locate any code 
> in github, and was wondering what implementation decisions are 
> being made here.  Thanks.

What do you think about using NaN for missing floats?  In 
theory I could imagine wanting to distinguish between a NaN in 
the source file and a missing value, but in my world I have never 
felt the need for this.  For integers and bools it is different, 
of course, since those types have no NaN value to use as a 
sentinel.
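
To make the trade-off concrete, here is a minimal sketch in Python (not D or Julia, and not taken from any existing dataframe library). It parses the same column twice: once with NaN as the missing-value sentinel, and once with a separate byte-per-element NA mask like the one described in the quoted notes. The names raw, parse_with_nan_sentinel, parse_with_mask and parse_int_with_mask are made up for illustration, and treating an empty CSV field as "missing" is an assumption.

```python
import math

# Hypothetical CSV cells: "" is a missing field, "nan" is a literal NaN in the file.
raw = ["1.5", "", "nan", "4.0"]

def parse_with_nan_sentinel(cells):
    """Missing fields become NaN, so they merge with NaNs already in the source."""
    return [float(c) if c != "" else math.nan for c in cells]

def parse_with_mask(cells):
    """Keep a separate NA mask, one byte per element (1 = missing)."""
    values = [float(c) if c != "" else 0.0 for c in cells]  # 0.0 is only a placeholder
    na = bytearray(1 if c == "" else 0 for c in cells)
    return values, na

def parse_int_with_mask(cells):
    """Integers have no NaN, so a mask (or some other flag) is required."""
    values = [int(c) if c != "" else 0 for c in cells]  # 0 is only a placeholder
    na = bytearray(1 if c == "" else 0 for c in cells)
    return values, na

sentinel = parse_with_nan_sentinel(raw)
values, na = parse_with_mask(raw)

# Which elements does each representation report as missing?
missing_sentinel = [math.isnan(v) for v in sentinel]
missing_mask = [bool(b) for b in na]

print(missing_sentinel)  # [False, True, True, False]
print(missing_mask)      # [False, True, False, False]
```

With the sentinel, the literal NaN at index 2 is indistinguishable from the missing field at index 1.  The mask keeps them apart, and it is the only one of the two approaches that carries over unchanged to integer and bool columns.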