OT: why do people use python when it is slow?
data pulverizer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Oct 18 04:53:58 PDT 2015
On Thursday, 15 October 2015 at 21:16:18 UTC, Laeeth Isharc wrote:
> On Wednesday, 14 October 2015 at 22:11:56 UTC, data pulverizer
> wrote:
>> On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc
>> wrote:
>>> https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
>>> Andrei suggested posting more widely.
>>
>> I am coming at D by way of R, C++, Python etc. so I speak as a
>> statistician who is interested in data science applications.
>
> Welcome... Looks like we have similar interests.
That's good to know.
>> To sit on the deployment side, D needs to grow its big
>> data/noSQL infrastructure for a start, then hook into a whole
>> ecosystem of analytic tools in an easy and straightforward
>> manner. This will take a lot of work!
>
> Indeed. The dlangscience project managed by John Colvin is
> very interesting. It is not a pure stats project, but there
> will be many shared areas of need. He has some v interesting
> ideas, and being able to mix Python and D in a Jupyter notebook
> is rather nice (you can do this already).
Thanks for bringing this to my attention; it looks interesting.
> Sounds interesting. Take a look at Colvin's dlang science
> draft white paper, and see what you would add. It's a chance
> to shape things whilst they are still fluid.
Good suggestion.
>> 3. A solid interface to a big data database, one that allows
>> easy movement between a D data table and the database
>
> Which ones do you have in mind for stats? The different
> choices seem to serve quite different needs. And when you say
> big data, how big do you typically mean ?
What I mean is to start by tapping into current big data
technologies. HDFS and Cassandra have C APIs which we can wrap
for D.
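To make the wrapping idea concrete, here is a minimal sketch of how a C client API could be exposed to D. The C function and type names below (`db_connect` and friends) are purely illustrative placeholders, not the real HDFS or Cassandra headers:

```d
// Hypothetical C client API declared for D; the names are
// illustrative, not taken from any real library.
extern (C) nothrow @nogc
{
    struct db_conn;                        // opaque C handle
    db_conn* db_connect(const(char)* host);
    int      db_query(db_conn* c, const(char)* q);
    void     db_close(db_conn* c);
}

// A thin D wrapper adding deterministic cleanup and D strings.
struct Connection
{
    private db_conn* handle;

    this(string host)
    {
        import std.string : toStringz;
        handle = db_connect(host.toStringz);
    }

    ~this()
    {
        if (handle) db_close(handle);
    }

    int query(string q)
    {
        import std.string : toStringz;
        return db_query(handle, q.toStringz);
    }
}
```

The pattern is the usual one: declare the C functions with `extern (C)`, then layer a small D struct on top so users never touch raw pointers or `char*`.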
>> 4. Functional programming: especially around data table and
>> array structures. R's apply(), lapply(), tapply(), plyr and
>> now data.table(,, by = list()) provides powerful tools for
>> data manipulation.
>
> Any thoughts on what the design should look like?
Yes, I think this is easy to implement but still important. The
real devil is my point #1, the dynamic data table object.
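As a rough illustration of how the apply family maps onto D, here is a sketch using ranges from `std.algorithm`; `chunkBy` over sorted data plays the role of `tapply`'s grouping (the data values are made up for the example):

```d
import std.algorithm : chunkBy, map, sort, sum;
import std.stdio : writeln;
import std.typecons : tuple;

void main()
{
    // sapply/lapply analogue: map a function lazily over a range.
    writeln([1, 2, 3, 4].map!(x => x * x));  // [1, 4, 9, 16]

    // tapply / data.table(, , by = ...) analogue: sort on the key,
    // group contiguous runs, then aggregate within each group.
    auto rows = [tuple("a", 1), tuple("b", 2), tuple("a", 3)];
    rows.sort!((x, y) => x[0] < y[0]);
    foreach (group; rows.chunkBy!((x, y) => x[0] == y[0]))
        writeln(group.front[0], ": ", group.map!(r => r[1]).sum);
    // a: 4
    // b: 2
}
```

The range approach composes well, but notice it works element-wise over a single sequence; the hard part, as said above, is the table structure these operations should run over.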
>
> To an extent there is a balance between wanting to explore data
> iteratively (when you don't know where you will end up), and
> wanting to build a robust process for production. I have been
> wondering myself about using LuaJIT to strap together D
> building blocks for the exploration (and calling it based on a
> custom console built around Adam Ruppe's terminal).
Sounds interesting.
>> 6. Nullable types makes talking about missing data more
>> straightforward and gives you the opportunity to code them
>> into a set value in your analysis. D is streaks ahead of
>> Python here, but this is built into R at a basic level.
>
> So matrices with nullable types within? Is nan enough for you
> ? If not then could be quite expensive if back end is C.
I am not suggesting that we pass nullable matrices to C
algorithms. Yes, NaN is how this is done in practice, but you
wouldn't have NaNs in your matrix at the point of modelling -
they'd just propagate and trash your answer. Nullable types are
useful in data acquisition and exploration - the more practical
side of data handling. I was quite shocked to see them in D, when
they are essentially absent from "high level" programming
languages like Python. Real data is messy, and having nullable
types is useful in processing, storing and summarizing raw data.
I put this in as #6 because I think it is possible to do practical
statistics without them, working around their absence with
sentinel-value hacks. Nullables are something that C# and R have,
and that Python's pandas has struggled with. The great news is
that they are available in D, so we can use them.
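For anyone who hasn't seen it, here is a small sketch of D's `std.typecons.Nullable` used for missing data; the column values are invented for the example:

```d
import std.stdio : writeln;
import std.typecons : Nullable, nullable;

void main()
{
    // A raw column with a missing entry, as read from messy input.
    Nullable!double[] col =
        [nullable(1.5), Nullable!double.init, nullable(3.0)];

    // Summarize while skipping missing values - no magic
    // sentinel like -999 or a silent NaN needed.
    double total = 0;
    size_t n = 0;
    foreach (x; col)
        if (!x.isNull)
        {
            total += x.get;
            ++n;
        }
    writeln("mean of observed = ", total / n);  // 2.25
}
```

The point is that missingness is explicit in the type: you cannot accidentally feed an unobserved value into a computation without the compiler making you check `isNull` first (via `get`).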
>>
>> If D can get points 1, 2, 3 many people would be all over D
>> because it is a fantastic programming language and is wicked
>> fast.
> What do you like best about it ? And in your own domain, what
> have the biggest payoffs been in practice?
I am playing with D at the moment. To become useful to me, the
data table structure is a must. I previously said points 1, 2,
and 3 would get data scientists sucked into D, but the data table
structure is the seed: a dynamic structure like that in D would
catalyze the rest. Everything else is either wrappers or routine
work - maybe a lot of it, but straightforward to implement. The
data table structure is, for me, the real enigma.
The way that R's data types are structured around SEXPs is the
key to all of this. I am currently reading through R's internal
documentation to get my head around this.
https://cran.r-project.org/doc/manuals/r-release/R-ints.html
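As a very rough sketch of the SEXP idea in D terms: each column carries its own run-time type, and operations dispatch on it, much as R's internals branch on TYPEOF(s). One way to model that is `std.variant.Algebraic` (the column names and data here are made up):

```d
import std.stdio : writeln;
import std.variant : Algebraic;

// A crude analogue of R's SEXP: a column whose element type is
// known only at run time, so a table can mix column types.
alias Column = Algebraic!(double[], int[], string[]);

void main()
{
    Column[string] table;
    table["height"] = Column([1.70, 1.82, 1.65]);
    table["name"]   = Column(["ann", "bob", "cis"]);

    // Dispatch on the run-time type, as R does on TYPEOF(s).
    foreach (name, col; table)
    {
        if (auto p = col.peek!(double[]))
            writeln(name, " is numeric, length ", (*p).length);
        else if (auto p = col.peek!(string[]))
            writeln(name, " is character, length ", (*p).length);
    }
}
```

This loses a lot of what makes SEXPs interesting (attributes, copy-on-modify, the ALTREP-style tricks), but it shows the basic shape a dynamic D data table might take.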