OT: why do people use python when it is slow?

data pulverizer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Oct 14 15:11:53 PDT 2015


On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc wrote:
> https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
> Andrei suggested posting more widely.

I am coming at D by way of R, C++, Python etc. so I speak as a 
statistician who is interested in data science applications.

It's about programmer time. You have to weigh the time it takes 
you to do the task in each programming language; if you are doing 
statistical analysis now, R and Python come out streaks ahead.

The scope, roughly speaking, is Research -> Deployment. R and 
Python sit on the research side, and Python/JVM technologies sit 
on the deployment side (broadly speaking). The question is: where 
does D sit? What should D's data science strategy be?

To sit on the deployment side, D needs to grow its big 
data/NoSQL infrastructure for a start, then hook into a whole 
ecosystem of analytic tools in an easy and straightforward 
manner. This will take a lot of work!

I believe it is easier and more effective to start on the 
research side. D will need:

1. A data table structure like R's data.frame or data.table. This 
is a dynamic data structure that represents a table that can have 
lots of operations applied to it. It is the data structure that 
separates R from most programming languages. It is what pandas 
tries to emulate. This includes text file and database I/O from 
MySQL and ODBC for a start.

2. Formula class: the ability to talk about statistical models 
using formulas, e.g. y ~ x1 + x2 + x3, and then use these 
formulas to generate model matrices for input into statistical 
algorithms.

3. A solid interface to a big data database that makes D data 
table <-> database transfer easy.

4. Functional programming: especially around data table and array 
structures. R's apply(), lapply(), tapply(), plyr and now 
data.table(,, by = list()) provide powerful tools for data 
manipulation.

5. A factor data type: for categorical variables. This is easy to 
implement! This ties into the creation of model matrices.

6. Nullable types make talking about missing data more 
straightforward and give you the opportunity to code them into a 
set value in your analysis. D is streaks ahead of Python here, 
but this is built into R at a basic level.
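
To make points 2, 5 and 6 concrete, here is a rough sketch of how a 
factor type, missing values and a formula combine into a model 
matrix. It is written in Python only to show the idea, not as a 
proposed D API. Factor, dummy_columns and model_matrix are 
hypothetical names invented for this illustration, the formula 
parser only understands additive terms, and rows with missing 
values are simply dropped.

```python
from typing import Optional, Sequence


class Factor:
    """Hypothetical categorical type: integer codes plus sorted levels."""

    def __init__(self, values: Sequence[Optional[str]]):
        # Levels are the sorted distinct non-missing values.
        self.levels = sorted({v for v in values if v is not None})
        index = {level: i for i, level in enumerate(self.levels)}
        # A missing value keeps a None code rather than a sentinel number.
        self.codes = [None if v is None else index[v] for v in values]

    def dummy_columns(self):
        """Treatment coding: a 0/1 column per level except the first."""
        return {
            level: [None if c is None else float(c == i) for c in self.codes]
            for i, level in enumerate(self.levels)
            if i > 0
        }


def model_matrix(formula, data):
    """Return (column names, rows, response) for e.g. 'y ~ x1 + x2'."""
    response, rhs = (part.strip() for part in formula.split("~"))
    terms = [t.strip() for t in rhs.split("+")]

    columns = {"(Intercept)": [1.0] * len(data[response])}
    for term in terms:
        col = data[term]
        if isinstance(col, Factor):
            # A factor expands into one dummy column per non-baseline level.
            for level, dummy in col.dummy_columns().items():
                columns[term + level] = dummy
        else:
            columns[term] = [None if v is None else float(v) for v in col]

    names = list(columns)
    rows, y = [], []
    for i, target in enumerate(data[response]):
        row = [columns[name][i] for name in names]
        if target is None or any(v is None for v in row):
            continue  # complete-case analysis: skip rows with missing data
        rows.append(row)
        y.append(float(target))
    return names, rows, y


# Made-up data: one numeric column with a missing value, one factor.
data = {
    "y": [1.0, 2.0, 3.0, 4.0],
    "x1": [0.5, None, 1.5, 2.0],
    "group": Factor(["a", "b", "b", "c"]),
}
names, X, y = model_matrix("y ~ x1 + group", data)
```

Here the second row is dropped because x1 is missing, and the 
factor expands into two dummy columns with "a" as the baseline 
level. The resulting X and y are what you would hand to a fitting 
routine.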

If D can get points 1, 2 and 3, many people would be all over D 
because it is a fantastic programming language and is wicked fast.
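
For points 1 and 4, the kind of data table plus by-group operation 
I have in mind can be sketched as follows. Again this is Python 
used purely for illustration. DataTable, filter and aggregate are 
hypothetical names, and a real implementation would need typed 
columns, joins, file and database I/O and much more.

```python
from collections import defaultdict


class DataTable:
    """Hypothetical column-oriented table: named, equal-length columns."""

    def __init__(self, **columns):
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {name: list(col) for name, col in columns.items()}

    def __len__(self):
        return len(next(iter(self.columns.values()), []))

    def row(self, i):
        """Return row i as a dict of column name -> value."""
        return {name: col[i] for name, col in self.columns.items()}

    def filter(self, predicate):
        """Return a new table with the rows where predicate(row) is true."""
        keep = [i for i in range(len(self)) if predicate(self.row(i))]
        return DataTable(
            **{name: [col[i] for i in keep] for name, col in self.columns.items()}
        )

    def aggregate(self, by, column, func):
        """Apply func to the values of `column` within each group of `by`."""
        groups = defaultdict(list)
        for key, value in zip(self.columns[by], self.columns[column]):
            groups[key].append(value)
        return {key: func(values) for key, values in groups.items()}


# Made-up data for illustration.
dt = DataTable(species=["a", "b", "a", "b"], weight=[1.0, 2.0, 3.0, 5.0])

# Mean weight per species, in the spirit of R's tapply().
means = dt.aggregate(by="species", column="weight",
                     func=lambda v: sum(v) / len(v))

# Row filtering with an arbitrary function.
heavy = dt.filter(lambda r: r["weight"] > 1.5)
```

The point is that operations take functions as arguments and 
return new tables or summaries, which is what makes R's apply 
family and data.table so productive.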


