D language manipulation of dataframe type structures

Jared Miller jared at economicmodeling.com
Wed Sep 25 11:37:46 PDT 2013


I agree with other posters that a D REPL and
interactive/visualization data environment would be very cool,
but unfortunately neither exists. Batch computing is more
practical, but REPLs really hook new users. I see statistical
computing as a huge opportunity for D adoption. (R is just
super-ugly and slow, leaving Python + its various native-code
cyborg appendages as the hot new stats environment).

There are tons of ways of accomplishing the same thing in D, but
as far as I know there isn't a "standard" at this point. A
statically typed dataframe is, at minimum, just a range of
structs -- even more minimally, a bare *array* of structs, or
alternatively just a 2-D array in a thin wrapper that provides
access via column labels rather than indexes. You can manipulate
these ranges with functions from std.range and std.algorithm.
Missing or N/A data is a common issue, and can be represented in
a variety of ways, with integers being the most annoying since
there is no built-in NaN value for ints (check out the Nullable
template from std.typecons).
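
To make that concrete, here is a minimal sketch of the range-of-structs
approach. The Row type, its field names, and the meanWage helper are all
made up for illustration; only std.algorithm, std.array, and
std.typecons.Nullable are real library pieces.

```d
import std.algorithm : filter, map, sum;
import std.array : array;
import std.typecons : Nullable;

// Hypothetical row type; the field names are invented for this example.
struct Row
{
    string region;
    double wage;        // double.nan could mark a missing value here
    Nullable!int jobs;  // ints have no NaN, so wrap them in Nullable
}

// Mean of `wage` over rows whose `jobs` value is present and >= minJobs.
// Returns NaN when no row qualifies.
double meanWage(Row[] frame, int minJobs)
{
    auto wages = frame
        .filter!(r => !r.jobs.isNull && r.jobs.get >= minJobs)
        .map!(r => r.wage)
        .array;
    return wages.length ? wages.sum / wages.length : double.nan;
}

unittest
{
    auto frame = [
        Row("north", 20.0, Nullable!int(100)),
        Row("south", 30.0, Nullable!int()),     // jobs is missing
        Row("west",  40.0, Nullable!int(300)),
    ];
    assert(meanWage(frame, 50) == 30.0);        // "south" is skipped
}
```

Compile with -unittest to run the check. The "dataframe" here is nothing
more than Row[], so every range algorithm applies to it directly.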

Supporting features like making *both* rows and columns
accessible via labels rather than indexes requires a little bit
more wrapping. We have a NamedMatrix class at my workplace for
that purpose. It's easy to overload the index operator [] for
access, * for matrix multiplication, etc.
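
I can't post our NamedMatrix class, but a rough sketch of the idea might
look like the following. LabeledMatrix and everything inside it is
hypothetical and written for this post; it only shows how opIndex and
opBinary give label-based access and matrix multiplication.

```d
// Hypothetical sketch; this is not the NamedMatrix class mentioned above.
struct LabeledMatrix
{
    string[] rowNames;
    string[] colNames;
    double[][] data;    // data[i][j] is row i, column j

    // Linear search keeps the sketch short; a real type would probably
    // cache an associative array from label to index.
    private static size_t labelIndex(string[] names, string label)
    {
        import std.algorithm.searching : countUntil;
        auto i = names.countUntil(label);
        assert(i >= 0, "unknown label: " ~ label);
        return cast(size_t) i;
    }

    // Enables m["row", "col"].
    double opIndex(string row, string col)
    {
        return data[labelIndex(rowNames, row)][labelIndex(colNames, col)];
    }

    // Enables a * b as a matrix product. The result keeps a's row labels
    // and b's column labels.
    LabeledMatrix opBinary(string op : "*")(LabeledMatrix rhs)
    {
        assert(colNames.length == rhs.rowNames.length, "shape mismatch");
        auto result = new double[][](rowNames.length, rhs.colNames.length);
        foreach (i; 0 .. rowNames.length)
            foreach (j; 0 .. rhs.colNames.length)
            {
                double acc = 0;
                foreach (k; 0 .. colNames.length)
                    acc += data[i][k] * rhs.data[k][j];
                result[i][j] = acc;
            }
        return LabeledMatrix(rowNames.dup, rhs.colNames.dup, result);
    }
}

unittest
{
    auto a  = LabeledMatrix(["x", "y"], ["p", "q"], [[1.0, 2.0], [3.0, 4.0]]);
    auto id = LabeledMatrix(["p", "q"], ["u", "v"], [[1.0, 0.0], [0.0, 1.0]]);
    auto c = a * id;
    assert(c["y", "u"] == 3.0);
}
```

The sketch does not check that the inner labels of the two operands
match, only that the shapes do; whether to enforce that is a design
choice.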

CSV loads can be done with std.csv; unfortunately there's no
corresponding support in that module for *writing* CSV (I've
rolled my own). At my workplace we also have a MysqlConnection
class that provides one-liner loading from a SQL query into
minimalist, range-of-structs dataframes.
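
As a rough illustration (not the code we actually use), reading goes
through std.csv.csvReader, and a hand-rolled writer can be quite small.
Record, loadRecords, csvField, and toCsv are hypothetical names made up
for this sketch, and the quoting rules are the usual "quote when needed,
double embedded quotes" convention.

```d
import std.algorithm : any;
import std.array : array, join, replace;
import std.conv : to;
import std.csv : csvReader;
import std.format : format;

// Hypothetical row type for this example.
struct Record
{
    string name;
    int count;
    double value;
}

// Parse CSV text whose first line is a header naming these three columns.
Record[] loadRecords(string text)
{
    return csvReader!Record(text, ["name", "count", "value"]).array;
}

// Quote a field only when it contains a separator, quote, or line break.
string csvField(string s)
{
    if (!s.any!(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
        return s;
    return `"` ~ s.replace(`"`, `""`) ~ `"`;
}

// Write the rows back out, header line first.
string toCsv(Record[] rows)
{
    string[] lines = ["name,count,value"];
    foreach (r; rows)
    {
        // %.17g keeps enough digits for a double to survive a round trip.
        lines ~= [csvField(r.name), r.count.to!string,
                  format("%.17g", r.value)].join(",");
    }
    return lines.join("\n") ~ "\n";
}

unittest
{
    auto rows = loadRecords("name,count,value\nwidget,3,2.5\n\"a,b\",7,0.25");
    assert(rows.length == 2);
    assert(rows[1].name == "a,b" && rows[1].count == 7);
    assert(toCsv(rows) == "name,count,value\nwidget,3,2.5\n\"a,b\",7,0.25\n");
}
```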

Beyond that, it really depends on how you want to manipulate the
dataframes. What specific things do you want to do? If you've got
an idea, I could work up some sample code.

So yes, there are people doing it in The Real World.
Unfortunately my colleagues don't have a nice, tidy,
self-contained DataFrame module to share (yet). But having one
would be a great thing for D. The bigger problem though is
matching the huge 3rd-party stats libraries (like CRAN for R).


On Wednesday, 25 September 2013 at 03:41:36 UTC, Jay Norwood
wrote:
> I've been playing with the Python pandas app, which enables 
> interactive manipulation of tables of data in their dataframe 
> structure, which they say is similar to the structures used in 
> R.
>
> It appears pandas has laid claim to being a faster version of 
> R, but is doing so basically limited to what they can exploit 
> from moving operations back and forth from underlying cython 
> code.
>
> Has anyone written an example app in D that manipulates 
> dataframe type structures?

