D language manipulation of dataframe type structures

John Colvin john.loughran.colvin at gmail.com
Wed Sep 25 13:21:26 PDT 2013


On Wednesday, 25 September 2013 at 18:37:48 UTC, Jared Miller 
wrote:
> I agree with other posters that a D REPL and
> interactive/visualization data environment would be very cool,
> but unfortunately doesn't exist. Batch computing is more
> practical, but REPLs really hook new users. I see statistical
> computing as a huge opportunity for D adoption. (R is just
> super-ugly and slow, leaving Python + its various native-code
> cyborg appendages as the hot new stats environment).
>
> There are tons of ways of accomplishing the same thing in D, but
> as far as I know there isn't a "standard" at this point. A
> statically typed dataframe is, at minimum, just a range of
> structs -- even more minimally, a bare *array* of structs, or
> alternatively just a 2-D array in a thin wrapper that provides
> access via column labels rather than indexes. You can manipulate
> these ranges with functions from std.range and std.algorithm.
> Missing or N/A data is a common issue, and can be represented in
> a variety of ways, with integers being the most annoying since
> there is no built-in NaN value for ints (check out the Nullable
> template from std.typecons).
>
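
A minimal sketch of the range-of-structs idea quoted above, using only Phobos. The `Row` type, its fields, and `meanAge` are hypothetical names made up for illustration; this is one possible layout, not an established D dataframe API.

```d
import std.algorithm : filter, map, sum;
import std.array : array;
import std.typecons : Nullable;

// Hypothetical row type: one struct instance per observation.
struct Row
{
    string name;
    double height;     // a floating-point column could use double.nan for N/A
    Nullable!int age;  // ints have no NaN, so Nullable marks a missing value
}

// Mean of the non-missing ages; NaN when every age is missing.
double meanAge(Row[] rows)
{
    auto known = rows.filter!(r => !r.age.isNull)
                     .map!(r => cast(double) r.age.get)
                     .array;
    return known.length ? known.sum / known.length : double.nan;
}

void main()
{
    import std.stdio : writeln;

    auto frame = [
        Row("ann", 1.70, Nullable!int(30)),
        Row("bob", 1.82, Nullable!int.init), // age missing
        Row("cat", 1.65, Nullable!int(40)),
    ];

    // Row filter plus column selection, straight from std.algorithm.
    auto tall = frame.filter!(r => r.height > 1.68).map!(r => r.name).array;
    writeln(tall);
    writeln(meanAge(frame));
}
```

The "dataframe" here is just `Row[]`, so anything in std.range and std.algorithm applies to it directly.
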
> Supporting features like having *both* rows and columns
> accessible via labels rather than indexes requires a little bit
> more wrapping. We have a NamedMatrix class at my workplace for
> that purpose. It's easy to overload the index operator [] for
> access, * for matrix multiplication, etc.
>
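
A rough sketch of what such a wrapper might look like. `LabeledMatrix` and all of its members are hypothetical and are not the NamedMatrix class mentioned above, whose code has not been posted; the sketch only shows label-based `[]` access and `*` as matrix multiplication through D's operator overloading.

```d
import std.exception : enforce;

// Hypothetical thin wrapper: a row-major 2-D array plus label lookups.
struct LabeledMatrix
{
    string[] rowLabels;
    string[] colLabels;
    double[][] data;
    size_t[string] rowIndex; // label -> row position
    size_t[string] colIndex; // label -> column position

    this(string[] rowLabels, string[] colLabels, double[][] values)
    {
        enforce(values.length == rowLabels.length, "row count mismatch");
        foreach (row; values)
            enforce(row.length == colLabels.length, "column count mismatch");
        this.rowLabels = rowLabels;
        this.colLabels = colLabels;
        this.data = values;
        foreach (i, label; rowLabels) rowIndex[label] = i;
        foreach (j, label; colLabels) colIndex[label] = j;
    }

    // m["row", "col"] reads a cell by label.
    double opIndex(string row, string col)
    {
        return data[rowIndex[row]][colIndex[col]];
    }

    // m["row", "col"] = x writes a cell by label.
    void opIndexAssign(double value, string row, string col)
    {
        data[rowIndex[row]][colIndex[col]] = value;
    }

    // a * b is matrix multiplication; the result takes a's row labels
    // and b's column labels.
    LabeledMatrix opBinary(string op)(LabeledMatrix rhs) if (op == "*")
    {
        enforce(colLabels.length == rhs.rowLabels.length, "shape mismatch");
        auto result = new double[][](rowLabels.length, rhs.colLabels.length);
        foreach (i; 0 .. rowLabels.length)
            foreach (j; 0 .. rhs.colLabels.length)
            {
                result[i][j] = 0; // new double arrays start out as NaN
                foreach (k; 0 .. colLabels.length)
                    result[i][j] += data[i][k] * rhs.data[k][j];
            }
        return LabeledMatrix(rowLabels, rhs.colLabels, result);
    }
}

void main()
{
    import std.stdio : writeln;

    auto a = LabeledMatrix(["r1", "r2"], ["x", "y"], [[1.0, 2.0], [3.0, 4.0]]);
    auto b = LabeledMatrix(["x", "y"], ["p", "q"], [[0.0, 1.0], [1.0, 0.0]]);
    writeln((a * b)["r2", "p"]);
}
```
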
> CSV loads can be done with std.csv; unfortunately there's no
> corresponding support in that module for *writing* CSV (I've
> rolled my own). At my workplace we also have a MysqlConnection
> class that provides one-liner loading from a SQL query into
> minimalist, range-of-structs dataframes.
>
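
For the CSV side, a small sketch: reading goes through `std.csv.csvReader`, while the writer is hand-rolled as described above. `Sample`, `csvField` and `toCsvLine` are hypothetical names, and the quoting rule (quote a field containing a comma, quote or line break, and double any embedded quotes) is an assumption in the style of RFC 4180, not the writer referred to in the quote. Precision control for floating-point values is left out.

```d
import std.algorithm : any;
import std.array : array, join, replace;
import std.conv : to;
import std.csv : csvReader;

// Hypothetical record layout; csvReader fills the fields in declaration order.
struct Sample
{
    string name;
    int count;
    double score;
}

// Quote a field only when it contains a comma, a quote or a line break.
string csvField(string s)
{
    if (s.any!(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
        return `"` ~ s.replace(`"`, `""`) ~ `"`;
    return s;
}

// Serialize one struct as a CSV line by walking its fields in order.
string toCsvLine(T)(T row)
{
    string[] cells;
    foreach (field; row.tupleof)
        cells ~= csvField(field.to!string);
    return cells.join(",");
}

void main()
{
    import std.stdio : writeln;

    // Header-less input; the second name contains a comma, so it is quoted.
    string text = "alpha,3,2.5\n\"beta, jr\",7,0.5";
    auto rows = csvReader!Sample(text).array;
    foreach (r; rows)
        writeln(toCsvLine(r));
}
```
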
> Beyond that, it really depends on how you want to manipulate the
> dataframes. What specific things do you want to do? If you've
> got an idea, I could work up some sample code.
>
> So yes, there are people doing it in The Real World.
> Unfortunately my colleagues don't have a nice, tidy,
> self-contained DataFrame module to share (yet). But having one
> would be a great thing for D. The bigger problem though is
> matching the huge 3rd-party stats libraries (like CRAN for R).
>
>
> On Wednesday, 25 September 2013 at 03:41:36 UTC, Jay Norwood
> wrote:
>> I've been playing with the Python pandas app, which enables 
>> interactive manipulation of tables of data in their dataframe 
>> structure, which they say is similar to the structures used in 
>> R.
>>
>> It appears pandas has laid claim to being a faster version of 
>> R, but is doing so basically limited to what they can exploit 
>> from moving operations back and forth from underlying cython 
>> code.
>>
>> Has anyone written an example app in D that manipulates 
>> dataframe type structures?

I had considered one day making a semi-port of pandas, at 
the very least stealing Wes' basic algorithms (no point 
reinventing the hard stuff). I reckon the interface could be 
better in D than in Python, although of course the lack of a 
REPL is a bit of a show-stopper.

