OT: why do people use python when it is slow?
data pulverizer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Oct 14 15:11:53 PDT 2015
On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc wrote:
> https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
> Andrei suggested posting more widely.
I am coming at D by way of R, C++, Python etc. so I speak as a
statistician who is interested in data science applications.
It's about programmer time. You have to weigh the time it takes
you to do the task in each programming language, and if you are
doing statistical analysis now, R and Python come out streaks ahead.
The scope, roughly speaking, is Research -> Deployment. R and
Python sit on the research side, and Python/JVM technologies sit
on the deployment side (broadly speaking). The question is where
does D sit? What should D's data science strategy be?
To sit on the deployment side, D needs to grow its big
data/noSQL infrastructure for a start, then hook into a whole
ecosystem of analytic tools in an easy and straightforward
manner. This will take a lot of work!
I believe it is easier and more effective to start on the
research side. D will need:
1. A data table structure like R's data.frame or data.table. This
is a dynamic data structure that represents a table that can have
lots of operations applied to it. It is the data structure that
separates R from most programming languages. It is what pandas
tries to emulate. It should include text file and database I/O,
with MySQL and ODBC for a start.
2. Formula class: the ability to talk about statistical models
using formulas e.g. y ~ x1 + x2 + x3 etc and then use these
formulas to generate model matrices for input into statistical
algorithms.
3. A solid interface to a big data database that makes it easy to
move data between a D data table and the database.
4. Functional programming: especially around data table and array
structures. R's apply(), lapply(), tapply(), plyr and now
data.table(,, by = list()) provide powerful tools for data
manipulation.
5. A factor data type: for categorical variables. This is easy to
implement! This ties into the creation of model matrices.
6. Nullable types make talking about missing data more
straightforward and give you the opportunity to code them into a
set value in your analysis. D is streaks ahead of Python here,
but this is built into R at a basic level.
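To make points 2, 5 and 6 concrete, here is a rough sketch of how a formula, a factor type and missing values could combine to produce a model matrix. It is written in Python only for illustration and is not taken from R, pandas or any D library. The names Factor and model_matrix are hypothetical, None stands in for a missing value, only additive formulas like y ~ x1 + x2 are handled, and factors are assumed to be treatment-coded with the first level as the baseline.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    """Hypothetical categorical type: sorted levels plus integer codes."""
    levels: list
    codes: list  # None marks a missing observation

    @classmethod
    def from_values(cls, values):
        # Levels are the distinct non-missing values, in sorted order.
        levels = sorted({v for v in values if v is not None})
        index = {level: i for i, level in enumerate(levels)}
        codes = [None if v is None else index[v] for v in values]
        return cls(levels, codes)


def model_matrix(formula, data):
    """Return (column names, rows) for an additive formula such as 'y ~ x1 + g'.

    Numeric columns pass through unchanged. Factor columns become 0/1
    indicator columns for every level except the first (the baseline).
    Rows with a missing response or predictor are dropped.
    """
    response, rhs = (part.strip() for part in formula.split("~"))
    terms = [t.strip() for t in rhs.split("+")]

    # Column names: intercept first, then one or more columns per term.
    names = ["(Intercept)"]
    for term in terms:
        col = data[term]
        if isinstance(col, Factor):
            names += [f"{term}{level}" for level in col.levels[1:]]
        else:
            names.append(term)

    rows = []
    for i in range(len(data[response])):
        complete = data[response][i] is not None
        row = [1.0]
        for term in terms:
            col = data[term]
            if isinstance(col, Factor):
                code = col.codes[i]
                if code is None:
                    complete = False
                    break
                row += [1.0 if code == k else 0.0
                        for k in range(1, len(col.levels))]
            else:
                if col[i] is None:
                    complete = False
                    break
                row.append(float(col[i]))
        if complete:
            rows.append(row)
    return names, rows


# Made-up example data: one missing x1 value, one three-level factor.
data = {
    "y": [1.2, 3.4, 2.2, 5.1],
    "x1": [0.5, None, 1.5, 2.0],
    "group": Factor.from_values(["a", "b", "c", "a"]),
}
names, rows = model_matrix("y ~ x1 + group", data)
```

Dropping incomplete rows is just one possible policy. Because the missing values stay explicit in the data, an analysis could equally code them into a set value before building the matrix.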
If D can get points 1, 2 and 3, many people would be all over D
because it is a fantastic programming language and is wicked fast.