[GSoC] Dataframes for D
Prateek Nayak
lelouch.cpp at gmail.com
Sat Aug 10 04:10:31 UTC 2019
On Friday, 9 August 2019 at 08:08:39 UTC, bioinfornatics wrote:
> Dear D community,
>
> Thanks, Prateek Nayak, for your work.
> As I am currently working with pandas (Python, dataframes...),
> there is one extra feature that I appreciate a lot: the IO
> tools:
>
> * SQL
> method: read_sql and to_sql
> Description: allow reading from and saving to a database.
> These methods combined with SQLAlchemy are awesome.
>
> * Parquet
> method: read_parquet and to_parquet
> Description: Parquet is a file format often used in big data
> environments.
>
> These abilities make pandas and its DataFrame API a core library
> to have. Used this way, it standardizes the data structures
> used in our applications and at the same time offers a rich
> statistics API.
>
> Indeed, this is important for code maintainability. From the
> FAIR data point of view, an application is: input data +
> program features = result. Thus, treating the data structure
> as the first component when thinking about how to develop an
> application is important.
> The application becomes more robust and flexible, as it can
> handle multiple input data file formats.
>
> I hope to see such features in D.
>
>
> Best regards
>
> Sources:
> - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
> - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
> - https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#parquet
I was looking into Parquet, and it even came up in the Reddit post
I had linked to earlier: its smaller file size and better I/O
make it really good for industrial use.
A quick search on DUB didn't turn up a Parquet parser, so I'll
probably write a library for working with Parquet files.
I looked into Cap'n Proto too; it looks promising, but it's
missing from the pandas I/O section, which was disappointing.
Thanks for mentioning SQL. I will start working on these features
soon.
> Again, thank you so much for working on this!
> We will be excited to put Magpie through its paces in our lab,
> but it is missing* a few key (really, basic IMO) features we
> make heavy use of in pandas.
> * I have read the README and glanced at the code but have not
> used Magpie yet, so if I am wrong about the below, please correct me!
> Since you are soliciting ideas:
> 1. Selecting/indexing into data with boolean vectors, e.g.:
> df[df.A > 30 && df.B != "ignore"]
> 1a. This really means returning a boolean vector for df.COL
> <op> <operand>
> 1b. ...and being able to subset data by a bool vector
> 2. We make heavy use of "pivot" functionality.
> Kind regards
I was thinking of the same feature as 1: a filter-like function
for DataFrame and Group. I'm currently looking for possible ways to
implement it.
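Points 1a and 1b describe two separate operations: a comparison that turns a column into a boolean vector, and a selection that keeps the rows where that vector is true. What follows is a minimal sketch in plain Python, not Magpie (or pandas) code; the dict-of-lists table and the helper names `compare`, `mask_and` and `select` are hypothetical, chosen only to show the mechanism a filter-like function would need. (In pandas itself the conditions are combined with `&` rather than `&&`, e.g. `df[(df.A > 30) & (df.B != "ignore")]`.)

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical column-oriented table: column name -> list of values.
# This is not Magpie's or pandas' API; it only illustrates points 1a and 1b.
Table = Dict[str, List[Any]]

def compare(column: List[Any], op: Callable[[Any, Any], bool], operand: Any) -> List[bool]:
    """1a: `df.COL <op> <operand>` yields one bool per row."""
    return [bool(op(value, operand)) for value in column]

def mask_and(left: List[bool], right: List[bool]) -> List[bool]:
    """Element-wise AND of two masks, the `&&` in the example query."""
    if len(left) != len(right):
        raise ValueError("masks must have the same length")
    return [a and b for a, b in zip(left, right)]

def select(table: Table, mask: List[bool]) -> Table:
    """1b: keep only the rows where the mask is True, in every column."""
    for name, values in table.items():
        if len(values) != len(mask):
            raise ValueError(f"column {name!r} and mask differ in length")
    return {name: [v for v, keep in zip(values, mask) if keep]
            for name, values in table.items()}

df: Table = {"A": [10, 40, 50, 35], "B": ["keep", "ignore", "keep", "keep"]}
# Equivalent of df[df.A > 30 && df.B != "ignore"]
mask = mask_and(compare(df["A"], operator.gt, 30),
                compare(df["B"], operator.ne, "ignore"))
print(mask)              # [False, False, True, True]
print(select(df, mask))  # {'A': [50, 35], 'B': ['keep', 'keep']}
```

Keeping the mask as a separate value is what makes 1b composable: several masks can be combined before any rows are copied.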
I'm really embarrassed to admit I never even thought about pivot.
It looks like a beautiful feature to have, and I will definitely add
it to Magpie soon (possibly over the next couple of weeks; I'm a bit
tied down right now with the start of the university term, but
it will definitely come soon).
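Pivoting reshapes "long" data (one record per observation) into a "wide" grid keyed by two columns. A minimal sketch of the idea, again in plain Python with a hypothetical list-of-dicts input and a hypothetical `pivot` helper: distinct values of `index` become rows, distinct values of `columns` become columns, and cells that receive several values are combined with `aggfunc`. The `aggfunc` name is borrowed from pandas' `DataFrame.pivot_table`, whose default is the mean; this sketch defaults to `sum`, and cells with no observations are `None`.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Grid = List[List[Optional[Any]]]

def pivot(rows: List[Row], index: str, columns: str, values: str,
          aggfunc: Callable[[List[Any]], Any] = sum) -> Tuple[List[Any], List[Any], Grid]:
    """Hypothetical helper (not Magpie's or pandas' API): long rows -> wide grid."""
    cells: Dict[Tuple[Any, Any], List[Any]] = defaultdict(list)
    row_keys: Dict[Any, None] = {}  # dicts keep insertion order: first-seen order
    col_keys: Dict[Any, None] = {}
    for row in rows:
        r, c = row[index], row[columns]
        row_keys.setdefault(r)
        col_keys.setdefault(c)
        cells[(r, c)].append(row[values])
    # Aggregate each populated cell; cells with no observations stay None.
    grid = [[aggfunc(cells[(r, c)]) if (r, c) in cells else None
             for c in col_keys]
            for r in row_keys]
    return list(row_keys), list(col_keys), grid

# Hypothetical long-format data: one reading per (sample, marker) observation.
long_rows = [
    {"sample": "s1", "marker": "m1", "reading": 5},
    {"sample": "s1", "marker": "m2", "reading": 3},
    {"sample": "s2", "marker": "m1", "reading": 7},
    {"sample": "s1", "marker": "m1", "reading": 2},
]
rows, cols, grid = pivot(long_rows, index="sample", columns="marker", values="reading")
print(rows, cols)  # ['s1', 's2'] ['m1', 'm2']
print(grid)        # [[7, 3], [7, None]]
```

Passing a different `aggfunc` (for example `len` to count observations per cell) changes only how duplicates are combined, not the reshaping itself.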
More information about the Digitalmars-d
mailing list