OT: why do people use python when it is slow?
data pulverizer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Oct 18 04:53:58 PDT 2015
On Thursday, 15 October 2015 at 21:16:18 UTC, Laeeth Isharc wrote:
> On Wednesday, 14 October 2015 at 22:11:56 UTC, data pulverizer
> wrote:
>> On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc
>> wrote:
>>> https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
>>> Andrei suggested posting more widely.
>>
>> I am coming at D by way of R, C++, Python etc. so I speak as a
>> statistician who is interested in data science applications.
>
> Welcome... Looks like we have similar interests.
That's good to know.
>> To sit on the deployment side, D needs to grow its big
>> data/noSQL infrastructure for a start, then hook into a whole
>> ecosystem of analytic tools in an easy and straightforward
>> manner. This will take a lot of work!
>
> Indeed. The dlangscience project managed by John Colvin is
> very interesting. It is not a pure stats project, but there
> will be many shared areas of need. He has some v interesting
> ideas, and being able to mix Python and D in a Jupyter notebook
> is rather nice (you can do this already).
Thanks for bringing this to my attention; it looks interesting.
> Sounds interesting. Take a look at Colvin's dlang science
> draft white paper, and see what you would add. It's a chance
> to shape things whilst they are still fluid.
Good suggestion.
>> 3. A solid interface to a big data database, one that allows
>> easy movement between a D data table and the database
>
> Which ones do you have in mind for stats? The different
> choices seem to serve quite different needs. And when you say
> big data, how big do you typically mean ?
What I mean is to start by tapping into current big data
technologies. HDFS and Cassandra have C APIs which we can wrap
for D.
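To make the wrapping idea concrete, here is a minimal sketch of how a C client API could be exposed to D. The C function and type names below (`db_connect` and friends) are purely illustrative placeholders, not the real HDFS or Cassandra headers:

```d
// Hypothetical C client API declared for D; the names are
// illustrative, not taken from any real library.
extern (C) nothrow @nogc
{
    struct db_conn;                        // opaque C handle
    db_conn* db_connect(const(char)* host);
    int      db_query(db_conn* c, const(char)* q);
    void     db_close(db_conn* c);
}

// A thin D wrapper adding deterministic cleanup and D strings.
struct Connection
{
    private db_conn* handle;

    this(string host)
    {
        import std.string : toStringz;
        handle = db_connect(host.toStringz);
    }

    ~this()
    {
        if (handle) db_close(handle);
    }

    int query(string q)
    {
        import std.string : toStringz;
        return db_query(handle, q.toStringz);
    }
}
```

The pattern is the usual one: declare the C functions with `extern (C)`, then layer a small D struct on top so users never touch raw pointers or `char*`.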
>> 4. Functional programming: especially around data table and
>> array structures. R's apply(), lapply(), tapply(), plyr and
>> now data.table(,, by = list()) provides powerful tools for
>> data manipulation.
>
> Any thoughts on what the design should look like?
Yes, I think this is easy to implement but still important. The
real devil is my point #1, the dynamic data table object.
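As a rough illustration of how the apply family maps onto D, here is a sketch using ranges from `std.algorithm`; `chunkBy` over sorted data plays the role of `tapply`'s grouping (the data values are made up for the example):

```d
import std.algorithm : chunkBy, map, sort, sum;
import std.stdio : writeln;
import std.typecons : tuple;

void main()
{
    // sapply/lapply analogue: map a function lazily over a range.
    writeln([1, 2, 3, 4].map!(x => x * x));  // [1, 4, 9, 16]

    // tapply / data.table(, , by = ...) analogue: sort on the key,
    // group contiguous runs, then aggregate within each group.
    auto rows = [tuple("a", 1), tuple("b", 2), tuple("a", 3)];
    rows.sort!((x, y) => x[0] < y[0]);
    foreach (group; rows.chunkBy!((x, y) => x[0] == y[0]))
        writeln(group.front[0], ": ", group.map!(r => r[1]).sum);
    // a: 4
    // b: 2
}
```

The range approach composes well, but notice it works element-wise over a single sequence; the hard part, as said above, is the table structure these operations should run over.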
>
> To an extent there is a balance between wanting to explore data
> iteratively (when you don't know where you will end up), and
> wanting to build a robust process for production. I have been
> wondering myself about using LuaJIT to strap together D
> building blocks for the exploration (and calling it based on a
> custom console built around Adam Ruppe's terminal).
Sounds interesting.
>> 6. Nullable types makes talking about missing data more
>> straightforward and gives you the opportunity to code them
>> into a set value in your analysis. D is streaks ahead of
>> Python here, but this is built into R at a basic level.
>
> So matrices with nullable types within? Is nan enough for you
> ? If not then could be quite expensive if back end is C.
I am not suggesting that we pass nullable matrices to C
algorithms. Yes, NaN is how this is done in practice, but you
wouldn't have NaNs in your matrix at the point of modelling -
they'd just propagate and trash your answer. Nullable types are
useful in data acquisition and exploration - the more practical
side of data handling. I was quite shocked to see them in D, when
they are essentially absent from "high level" programming
languages like Python. Real data is messy, and having nullable
types is useful in processing, storing and summarizing raw data.
I put this in as #6 because I think it is possible to do practical
statistics without them, working around their absence with
sentinel-value hacks. Nullables are something that C# and R have,
and that Python's pandas has struggled with. The great news is
that they are available in D, so we can use them.
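For anyone who hasn't seen it, here is a small sketch of D's `std.typecons.Nullable` used for missing data; the column values are invented for the example:

```d
import std.stdio : writeln;
import std.typecons : Nullable, nullable;

void main()
{
    // A raw column with a missing entry, as read from messy input.
    Nullable!double[] col =
        [nullable(1.5), Nullable!double.init, nullable(3.0)];

    // Summarize while skipping missing values - no magic
    // sentinel like -999 or a silent NaN needed.
    double total = 0;
    size_t n = 0;
    foreach (x; col)
        if (!x.isNull)
        {
            total += x.get;
            ++n;
        }
    writeln("mean of observed = ", total / n);  // 2.25
}
```

The point is that missingness is explicit in the type: you cannot accidentally feed an unobserved value into a computation without the compiler making you check `isNull` first (via `get`).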
>>
>> If D can get points 1, 2, 3 many people would be all over D
>> because it is a fantastic programming language and is wicked
>> fast.
> What do you like best about it ? And in your own domain, what
> have the biggest payoffs been in practice?
I am playing with D at the moment. To become useful to me, the
data table structure is a must. I previously said points 1, 2,
and 3 would get data scientists sucked into D, but the data table
structure is the seed: a dynamic structure like that in D would
catalyze the rest. Everything else is either wrappers or routine
work - maybe a lot of it, but straightforward to implement. The
data table structure is, for me, the real enigma.
The way that R's data types are structured around SEXPs is the
key to all of this. I am currently reading through R's internal
documentation to get my head around this.
https://cran.r-project.org/doc/manuals/r-release/R-ints.html
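As a very rough sketch of the SEXP idea in D terms: each column carries its own run-time type, and operations dispatch on it, much as R's internals branch on TYPEOF(s). One way to model that is `std.variant.Algebraic` (the column names and data here are made up):

```d
import std.stdio : writeln;
import std.variant : Algebraic;

// A crude analogue of R's SEXP: a column whose element type is
// known only at run time, so a table can mix column types.
alias Column = Algebraic!(double[], int[], string[]);

void main()
{
    Column[string] table;
    table["height"] = Column([1.70, 1.82, 1.65]);
    table["name"]   = Column(["ann", "bob", "cis"]);

    // Dispatch on the run-time type, as R does on TYPEOF(s).
    foreach (name, col; table)
    {
        if (auto p = col.peek!(double[]))
            writeln(name, " is numeric, length ", (*p).length);
        else if (auto p = col.peek!(string[]))
            writeln(name, " is character, length ", (*p).length);
    }
}
```

This loses a lot of what makes SEXPs interesting (attributes, copy-on-modify, the ALTREP-style tricks), but it shows the basic shape a dynamic D data table might take.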