dataframe implementations
Laeeth Isharc via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Nov 18 09:15:37 PST 2015
On Monday, 2 November 2015 at 13:54:09 UTC, Jay Norwood wrote:
> I was reading about the Julia dataframe implementation
> yesterday, trying to understand their decisions and how D might
> implement one.
>
> From my notes,
> 1. they are currently using a dictionary of column vectors.
> 2. for NA (not available) they are currently using an array of
> bytes, effectively as a Boolean flag, rather than a bitVector,
> for performance reasons.
> 3. they are not currently implementing hierarchical headers.
> 4. they are transforming non-valid symbol header strings (read
> from csv, for example) to valid symbols by replacing '.' with
> underscore and prefixing numbers with 'x', as examples. This
> allows use in expressions.
> 5. Along with 4., they currently have @with for DataVector, to
> allow expressions to use, for example, :symbol_name instead of
> dv[:symbol_name].
> 6. They have operator symbols for per-element operations on
> two vectors; for example, a ./ b applies the division to each
> pair of elements.
> 7. They currently only have row indexes, no row names or
> symbols.
>
> I saw someone posting that they were working on DataFrame
> implementation here, but haven't been able to locate any code
> in github, and was wondering what implementation decisions are
> being made here. Thanks.
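To make points 1, 2 and 4 of the notes above concrete, here is a rough
sketch of a dictionary of column vectors with a byte-per-element NA mask
and cleaned-up header names. It is written in Python purely for brevity;
it is not the Julia implementation and not a proposed D design. The names
sanitize_header and Column are made up for illustration, and the exact
replacement rule (every non-alphanumeric character becomes an underscore)
is an assumption that goes a little beyond the '.' case mentioned in the
notes.

```python
import re


def sanitize_header(name):
    """Turn a raw CSV header into an identifier-like name (hypothetical rule)."""
    # Assumption: '.' and any other non-identifier character becomes '_'.
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name)
    # A leading digit is not valid in a symbol, so prefix it with 'x'.
    if cleaned and cleaned[0].isdigit():
        cleaned = "x" + cleaned
    return cleaned


class Column:
    """A column vector plus one byte per element used as a Boolean NA flag."""

    def __init__(self, values, na=None):
        self.values = list(values)
        # One byte per element rather than a packed bit vector.
        self.na = bytearray(na if na is not None else [0] * len(self.values))

    def is_na(self, i):
        return self.na[i] != 0


# The frame itself is just a dictionary mapping header name to column.
frame = {
    sanitize_header("price.usd"): Column([1.5, 0.0, 2.5], na=[0, 1, 0]),
    sanitize_header("2015"): Column([3, 4, 5]),
}
```

The byte mask costs eight times the memory of a bit vector, but each flag
can be read and written without shifting or masking, which is presumably
the performance reason given in the notes.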
What do you think about using NaN to mark missing floats? In
theory I could imagine wanting to distinguish between a NaN in
the source file and a missing value, but in my world I have never
felt the need for this. For integers and bools it is different, of
course, since those types have no spare value to act as a marker.
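
A minimal sketch of the two approaches, in Python only to keep it short
(in D the float check would be std.math.isNaN). The function names
mean_skipping_nan and mean_with_mask are hypothetical, and returning NaN
for an all-missing column is just one possible convention.

```python
import math


def mean_skipping_nan(values):
    """Floats: NaN itself is the missing-value marker, no extra storage."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


def mean_with_mask(values, na):
    """Integers or bools: no spare value exists, so a separate mask is needed."""
    present = [v for v, flag in zip(values, na) if not flag]
    return sum(present) / len(present) if present else math.nan


# A NaN read from the source file and a NaN used for "missing" are
# treated identically here, which is the trade-off mentioned above.
floats = [1.0, float("nan"), 3.0]
ints = [1, 0, 5]
ints_na = [0, 1, 0]  # second entry is missing; its stored 0 is a placeholder
```

With NaN as the marker the float column needs no mask at all, at the
cost of losing the distinction between "missing" and "was NaN in the
input".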