OT: why do people use python when it is slow?

Laeeth Isharc via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Oct 15 14:26:32 PDT 2015


On Thursday, 15 October 2015 at 07:57:51 UTC, Russel Winder wrote:
> On Thu, 2015-10-15 at 06:48 +0000, data pulverizer via 
> Digitalmars-d-learn wrote:
>> 
> […]
>> A journey of a thousand miles ...
>
> Exactly.
>
>> I tried to start creating a data-table-type object by 
>> investigating variantArray:
>> http://forum.dlang.org/thread/hhzavwrkbrkjzfohczyq@forum.dlang.org
>> but hit the snag that D is a static programming language and 
>> may not allow the kind of behaviour you need to create 
>> data-table-like objects.
>> 
>> I envisage such an object as being composed of arrays of 
>> vectors, where each vector represents a column in a table as 
>> in R - easier for model matrix creation. Some people believe 
>> you should instead work with arrays of tuple rows, which may 
>> be more big-data friendly. I am not overly wedded to either 
>> approach.
>> 
>> Anyway, it seems I have hit an inherent limitation in the 
>> language - correct me if I am wrong. The data frame needs 
>> dynamic behaviour: bind rows and columns, return parts of 
>> itself as a data table, and so on - and since D is a static 
>> language we cannot do this.
>
> Just because D doesn't have this now doesn't mean it cannot. C 
> doesn't have such a capability, but R and Python do, even 
> though R and CPython are themselves written in C.
>
> Pandas data structures rely on the NumPy n-dimensional array 
> implementation; it is not beyond the bounds of possibility 
> that that data structure could be realized as a D module.
>
> Is R's data.table written in R or in C? In either case, it is 
> not beyond the bounds of possibility that that data structure 
> could be realized as a D module.
>
> The core issue is to have a seriously efficient n-dimensional 
> array that is amenable to data parallelism and is extensible. 
> As far as I am aware currently (I will investigate more), the 
> NumPy array is a good native-code array, but it has some 
> issues with data parallelism, and Pandas has to do quite a lot 
> of work to get the extensibility. I wonder how the R 
> data.table works.
>
> I have this nagging feeling that, like NumPy, data.table seems 
> a lot better than it actually is. From small experiments, D is 
> hugely faster (and Chapel even more so) than Python/NumPy at 
> things Python people think NumPy is brilliant for. The 
> expectations of Python programmers are set by the scale of 
> Python performance, so NumPy seems brilliant. Compared to the 
> scale set by D and Chapel, NumPy is very disappointing. I bet 
> the same is true of R (I have never really used R).
>
> This is therefore an opportunity for D to step in. However, it 
> is a journey of a thousand miles to get something production 
> worthy. Python/NumPy/Pandas have had a very large number of 
> programmer hours expended on them. Doing this poorly as a D 
> module is likely worse than not doing it at all.
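On the data-parallelism point above: for the simple element-wise 
case at least, D already gets you a long way with std.parallelism.  
A toy sketch (illustrative only, not a benchmark):

import std.math : log;
import std.parallelism : parallel;
import std.stdio : writeln;

void main()
{
    // pretend this is one numeric column of a large table
    auto col = new double[1_000_000];

    // element-wise transform, spread across all cores
    foreach (i, ref elem; parallel(col))
        elem = log(i + 1.0);

    writeln(col[0 .. 4]);  // [0, 0.693147, 1.09861, 1.38629]
}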

I think it's much better to start, which means solving your own 
problems in a way that is acceptable to you rather than letting 
perfection be the enemy of the good.  It's always easier to do 
something a second time, too, as you learn from your successes 
and mistakes and have a better idea of what you want.  Of course 
it's better to put some thought into design early on, but that 
shouldn't turn into analysis paralysis.  It seems to me that John 
Colvin and others are putting quite a lot of thought into dlang 
science, but they are also getting stuff done.  Running D in a 
Jupyter notebook is something very useful.  It doesn't matter 
that it's cosmetically imperfect at this stage, and it won't stay 
that way.  And that's just a small step towards the bigger goal.
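
And on the variantArray snag at the top of the thread: a 
column-oriented table doesn't need the language itself to be 
dynamic, because std.variant.Variant can carry the per-column 
runtime typing while each column stays a plain typed slice 
underneath.  A minimal sketch of the idea - untested, with 
purely illustrative names:

import std.variant : Variant;
import std.stdio : writeln;

// Minimal column-oriented table: each column is a Variant wrapping a
// typed slice (double[], string[], ...), much like an R data.frame.
struct DataTable
{
    string[] names;     // column names
    Variant[] columns;  // each entry holds a typed array

    // append a column of any element type T
    void bindCol(T)(string name, T[] values)
    {
        names ~= name;
        columns ~= Variant(values);
    }

    // recover the typed slice behind a column
    T[] col(T)(string name)
    {
        foreach (i, n; names)
            if (n == name)
                return columns[i].get!(T[]);
        assert(0, "no such column: " ~ name);
    }
}

void main()
{
    DataTable dt;
    dt.bindCol("height", [1.62, 1.75, 1.80]);
    dt.bindCol("name", ["ann", "bob", "cho"]);

    writeln(dt.col!double("height"));  // [1.62, 1.75, 1.8]
    writeln(dt.col!string("name"));    // ["ann", "bob", "cho"]
}

Binding further columns or rows and slicing out sub-tables would 
presumably layer on top of the same scheme; nothing about it 
fights the static type system.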


