dataframe implementations

Laeeth Isharc via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon Nov 2 07:33:32 PST 2015


On Monday, 2 November 2015 at 13:54:09 UTC, Jay Norwood wrote:
> I was reading about the Julia dataframe implementation 
> yesterday, trying to understand their decisions and how D might 
> implement one.
>
> From my notes,
> 1. they are currently using a dictionary of column vectors.
> 2. for NA (not available) they are currently using an array of 
> bytes, effectively as a Boolean flag, rather than a bitVector, 
> for performance reasons.
> 3. they are not currently implementing hierarchical headers.
> 4. they are transforming non-valid symbol header strings (read 
> from csv, for example) to valid symbols by replacing '.' with 
> underscore and prefixing numbers with 'x', as examples.  This 
> allows use in expressions.
> 5. Along with 4., they currently have @with for DataVector, to 
> allow expressions to use, for example, :symbol_name instead of 
> dv[:symbol_name].
> 6. They have operator symbols for per-element operations on 
> two vectors, for example a ./ b divides each element of a by 
> the corresponding element of b.
> 7. They currently only have row indexes, no row names or 
> symbols.
>
> I saw someone posting that they were working on a DataFrame 
> implementation here, but haven't been able to locate any code 
> on GitHub, and was wondering what implementation decisions are 
> being made here.  Thanks.

Hi Jay.

That may have been me.  I have implemented something very basic, 
but you can read and write my proto dataframe to/from CSV and 
HDF5.  The code is up here:

https://github.com/Laeeth/d_dataframes

You should think of it as a crude prototype that has nonetheless 
been useful to me.  It was done in the old-school hacker spirit 
of getting something working first rather than being designed 
properly.  The reason is that I have a lot on my plate at the 
moment, and technology is only one of many things on it, although 
an important one.  In time I may get someone else to work on 
dataframes and open-source the results, but that may be some 
months away.

So I'd welcome any assistance, or even someone taking it over.  I 
haven't really done a good job of providing idiomatic access, but 
it's something, and a start.
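
To make the layout in your notes concrete, here is a minimal D 
sketch of points 1, 2, 4 and 6.  It is not taken from the 
repository above: Column, Frame, sanitizeHeader and divide are 
hypothetical names chosen for illustration, it only handles 
double columns, and the sanitizing rule (anything that is not 
alphanumeric or '_' becomes '_') is an assumption that is broader 
than the '.' example you mention.

```d
import std.ascii : isAlphaNum, isDigit;
import std.stdio : writeln;

/// One column of doubles plus a parallel NA mask (point 2).
/// The mask is one byte per element rather than a bit vector.
struct Column
{
    double[] values;
    ubyte[] isNA;   // 1 = not available, 0 = present

    void put(double v) { values ~= v; isNA ~= 0; }
    void putNA()       { values ~= double.nan; isNA ~= 1; }
}

/// Point 1: a frame as an associative array of column vectors,
/// keyed by the (sanitized) header name.
struct Frame
{
    Column[string] columns;
}

/// Point 4: turn a raw CSV header into a valid identifier.
/// Assumed rule: non-alphanumeric characters become '_', and a
/// leading digit (or an empty name) gets an 'x' prefix.
string sanitizeHeader(string raw)
{
    string result;
    foreach (char c; raw)
        result ~= (isAlphaNum(c) || c == '_') ? c : '_';
    if (result.length == 0 || isDigit(result[0]))
        result = "x" ~ result;
    return result;
}

/// Point 6: element-wise division (Julia's a ./ b).
/// The result is NA wherever either operand is NA.
Column divide(const Column a, const Column b)
{
    assert(a.values.length == b.values.length);
    Column r;
    foreach (i; 0 .. a.values.length)
    {
        if (a.isNA[i] || b.isNA[i])
            r.putNA();
        else
            r.put(a.values[i] / b.values[i]);
    }
    return r;
}

void main()
{
    Column price, volume;
    price.put(10.0);  price.putNA();   price.put(30.0);
    volume.put(2.0);  volume.put(4.0); volume.put(5.0);

    Frame df;
    df.columns[sanitizeHeader("price.usd")] = price;
    df.columns[sanitizeHeader("2015 volume")] = volume;

    // Headers are now "price_usd" and "x2015_volume".
    auto ratio = divide(df.columns["price_usd"],
                        df.columns["x2015_volume"]);
    writeln(ratio.values, " ", ratio.isNA);
}
```

The byte mask costs more memory than a bit vector but keeps 
element access a plain array index, which is the trade-off your 
note 2 describes.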


Laeeth.



