OT: why do people use python when it is slow?

Thu Oct 15 14:16:16 PDT 2015

On Wednesday, 14 October 2015 at 22:11:56 UTC, data pulverizer 
wrote:
> On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc 
> wrote:
>> https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
>> Andrei suggested posting more widely.
>
> I am coming at D by way of R, C++, Python etc. so I speak as a 
> statistician who is interested in data science applications.

Welcome...  Looks like we have similar interests.

> To sit on the deployment side, D needs to grow its big 
> data/noSQL infrastructure for a start, then hook into a whole 
> ecosystem of analytic tools in an easy and straightforward 
> manner. This will take a lot of work!

Indeed.  The dlangscience project managed by John Colvin is very 
interesting.  It is not a pure stats project, but there will be 
many shared areas of need.  He has some very interesting ideas, and 
being able to mix Python and D in a Jupyter notebook is rather 
nice (you can do this already).
>
> I believe it is easier and more effective to start on the 
> research side. D will need:
>
> 1. A data table structure like R's data.frame or data.table. 
> This is a dynamic data structure that represents a table that 
> can have lots of operations applied to it. It is the data 
> structure that separates R from most programming languages. It 
> is what pandas tries to emulate. This includes text file and 
> database i/o from MySQL and ODBC for a start.

I fully agree, and have made a very simple start on this.  See 
GitHub.  It's usable for my needs as they stand, although far from 
production ready or elegant.  You can read and write to/from CSV 
and HDF5.  I guess MySQL and ODBC wouldn't be hard to add, but I 
don't need them myself for now and won't have time to add them.  If 
I have capacity I may channel some resources in that direction some 
time next year.

> 2. Formula class : the ability to talk about statistical models 
> using formulas e.g. y ~ x1 + x2 + x3 etc and then use these 
> formulas to generate model matrices for input into statistical 
> algorithms.

Sounds interesting.  Take a look at Colvin's dlangscience draft 
white paper, and see what you would add.  It's a chance to shape 
things whilst they are still fluid.
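
For anyone reading who hasn't used R formulas, here is roughly what I take item 2 to mean.  It is only a sketch in Python (stdlib only) to pin down the idea, not a proposed D design: parse_formula and model_matrix are names I made up, it handles only additive terms, and it assumes string columns are factors coded the way R does by default (first level dropped as the baseline).

```python
# Hypothetical sketch: none of these names come from an existing library.

def parse_formula(formula):
    """Split 'y ~ x1 + x2' into ('y', ['x1', 'x2'])."""
    response, _, rhs = formula.partition("~")
    terms = [t.strip() for t in rhs.split("+") if t.strip()]
    return response.strip(), terms


def model_matrix(formula, table):
    """Build (y, X, column_names) from a dict of equal-length columns.

    Numeric columns are copied as-is. String columns are treated as
    factors and expanded to 0/1 dummy columns, dropping the first
    (alphabetically smallest) level as the baseline.
    """
    response, terms = parse_formula(formula)
    n = len(table[response])
    names = ["(Intercept)"]
    columns = [[1.0] * n]  # intercept column of ones
    for term in terms:
        values = table[term]
        if all(isinstance(v, str) for v in values):
            levels = sorted(set(values))
            for level in levels[1:]:  # skip the baseline level
                names.append(f"{term}{level}")
                columns.append([1.0 if v == level else 0.0 for v in values])
        else:
            names.append(term)
            columns.append([float(v) for v in values])
    X = [list(row) for row in zip(*columns)]  # column-major -> row-major
    y = [float(v) for v in table[response]]
    return y, X, names


table = {
    "y": [1.0, 2.5, 3.1, 4.8],
    "x1": [0.5, 1.5, 2.5, 3.5],
    "x2": ["a", "b", "a", "c"],  # categorical, i.e. a factor
}
y, X, names = model_matrix("y ~ x1 + x2", table)
print(names)
for row in X:
    print(row)
```

The point is that the formula plus the table is enough to produce the numeric matrix a fitting routine wants, which is also where the factor type from item 5 comes in.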

> 3. Solid interface to a big data database, that allows easy 
> D data table <-> database transfer

Which ones do you have in mind for stats?  The different choices 
seem to serve quite different needs.  And when you say big data, 
how big do you typically mean?

> 4. Functional programming: especially around data table and 
> array structures. R's apply(), lapply(), tapply(), plyr and now 
> data.table(,, by = list()) provides powerful tools for data 
> manipulation.

Any thoughts on what the design should look like?
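
As a reference point for that discussion, this is the behaviour I understand by R's tapply(): split one column by the values of another, then apply a function to each piece.  A minimal Python sketch, where the function name simply mirrors R and the data is invented:

```python
from collections import defaultdict
from statistics import mean


def tapply(values, groups, func):
    """Apply func to the values falling in each group, like R's tapply()."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: func(vals) for group, vals in buckets.items()}


# Invented example data.
sales = [10.0, 20.0, 30.0, 60.0, 50.0]
region = ["north", "south", "north", "south", "north"]

print(tapply(sales, region, mean))  # per-region mean
print(tapply(sales, region, len))   # per-region row count
```

In D I would expect something along these lines to fall out of ranges and std.algorithm rather than needing a special-purpose function, but the interface question (what a "column" is, and how groups are named) is the same.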

To an extent there is a balance between wanting to explore data 
iteratively (when you don't know where you will end up), and 
wanting to build a robust process for production.  I have been 
wondering myself about using LuaJIT to strap together D building 
blocks for the exploration (and driving it from a custom console 
built around Adam Ruppe's terminal).
>
> 5. A factor data type for categorical variables. This is easy 
> to implement! This ties into the creation of model matrices.
>
> 6. Nullable types make talking about missing data more 
> straightforward and gives you the opportunity to code them into 
> a set value in your analysis. D is streaks ahead of Python 
> here, but this is built into R at a basic level.

So matrices with nullable types within?  Is NaN enough for you?  
If not, then it could be quite expensive if the back end is C.
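
To spell out the distinction I am asking about, here is a rough Python sketch.  NA and the two helper names are made up for illustration.  NaN works as a missing-value marker only for floating point columns; integer or string columns need either a separate marker or a mask, and that extra bookkeeping is what a plain C numeric back end does not give you for free.

```python
import math

NA = None  # hypothetical explicit missing-value marker


def mean_skip_nan(xs):
    """Mean of a float column where NaN marks a missing value."""
    present = [x for x in xs if not math.isnan(x)]
    return sum(present) / len(present)


def mean_skip_na(xs):
    """Mean of a numeric column where NA marks a missing value."""
    present = [x for x in xs if x is not NA]
    return sum(present) / len(present)


heights = [1.5, float("nan"), 2.0, 2.5]  # floats: NaN can be the marker
counts = [3, NA, 5, 7]                   # ints: there is no integer NaN

print(mean_skip_nan(heights))
print(mean_skip_na(counts))
print(sum(heights))  # NaN propagates if the missing value is not handled
```

If NaN is enough, the data can stay a flat array of doubles and go straight to the numeric routines.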
>
> If D can get points 1, 2, 3 many people would be all over D 
> because it is a fantastic programming language and is wicked 
> fast.

What do you like best about it?  And in your own domain, what 
have the biggest payoffs been in practice?