OT: why do people use python when it is slow?

data pulverizer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Oct 15 06:26:17 PDT 2015


On Thursday, 15 October 2015 at 07:57:51 UTC, Russel Winder wrote:
> On Thu, 2015-10-15 at 06:48 +0000, data pulverizer via 
> Digitalmars-d-learn wrote:
> Just because D doesn't have this now doesn't mean it cannot. C 
> doesn't have such capability but R and Python do even though R 
> and CPython are just C codes.

I think the way R does this is that its dynamic runtime 
environment is used to bind together native C basic-type arrays. 
I wonder if we could simulate that dynamic behaviour by 
leveraging D's short compilation time to dynamically 
write/update data table source file(s) containing the structure 
of new/modified data tables?
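
A rough sketch of what I mean (purely my own illustration, not 
an existing D module): a row struct generated from column names 
and types via a string mixin. The same generated string could 
just as well be written out to a .d file and recompiled, which 
is the "dynamic" workflow I'm imagining:

// Sketch: build a row struct for a data table from column names
// and types at compile time with a string mixin.
import std.stdio;

string makeRowStruct(string name, string[] fieldTypes, string[] fieldNames)
{
    string code = "struct " ~ name ~ " {\n";
    foreach (i, t; fieldTypes)
        code ~= "    " ~ t ~ " " ~ fieldNames[i] ~ ";\n";
    code ~= "}";
    return code;
}

// The generated source could instead be dumped to a file and
// recompiled quickly when the table layout changes.
mixin(makeRowStruct("Row", ["int", "double", "string"],
                           ["id", "value", "label"]));

void main()
{
    auto r = Row(1, 3.14, "first");
    writeln(r);
}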

> Pandas data structures rely on the NumPy n-dimensional array 
> implementation, it is not beyond the bounds of possibility that 
> that data structure could be realized as a D module.

Julia's DArray object is an interesting take on this: 
https://github.com/JuliaParallel/DistributedArrays.jl

I believe that parallelising arrays and parallelising data 
tables are different challenges. Data tables are the easier 
case, since we can parallelise by row, hence the preference for 
row-based tuples.
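
Something like this minimal sketch, assuming the table is held 
as an array of row tuples, shows how std.parallelism lets each 
row be an independent unit of work:

// Sketch: a table as an array of row tuples, processed one row
// per work unit on the default task pool.
import std.parallelism : parallel;
import std.typecons : Tuple;
import std.stdio;

alias Row = Tuple!(int, "id", double, "value");

void main()
{
    Row[] rows = [Row(1, 2.0), Row(2, 4.0), Row(3, 8.0)];

    // Each row is independent, so the loop body may run on any thread.
    foreach (ref row; parallel(rows))
        row.value *= 2.0;

    writeln(rows);
}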

> The core issue is to have a seriously efficient n-dimensional 
> array that is amenable to data parallelism and is extensible. 
> As far as I am aware currently (I will investigate more) the 
> NumPy array is a good native code array, but has some issues 
> with data parallelism and Pandas has to do quite a lot of work 
> to get the extensibility. I wonder how the R data.table works.

R's data.table is not currently parallelised.

> I have this nagging feeling that like NumPy, data.table seems a 
> lot better than it could be. From small experiments D is (and 
> also Chapel is even more) hugely faster than Python/NumPy at 
> things Python people think NumPy is brilliant for. Expectations 
> of Python programmers are set by the scale of Python 
> performance, so NumPy seems brilliant. Compared to the scale 
> set by D and Chapel, NumPy is very disappointing. I bet the 
> same is true of R (I have never really used R).

Thanks for notifying me about Chapel - something else 
interesting to investigate. When it comes to speed R is very 
strange. Basic math operations (e.g. *, +, /) on an R array can 
be fast, but for-looping will kill speed by hundreds of times - 
most things are slow in R unless they are directly baked into 
its base operations. You can, however, write code in C or C++ 
and call it very easily from R using its Rcpp interface.
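
Just to illustrate the contrast (a toy sketch, nothing more): in 
D the explicit loop that would cripple R compiles down to a 
plain native loop, so there is no need to reach for a C/C++ 
extension simply to avoid writing it out:

import std.stdio;

void main()
{
    auto x = new double[](1_000_000);
    x[] = 1.5;           // array-wise assignment, like R's vectorised ops

    double total = 0.0;
    foreach (v; x)       // explicit loop, still runs at native speed
        total += v;

    writeln(total);
}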


> This is therefore an opportunity for D to step in. However it 
> is a journey of a thousand miles to get something production 
> worthy. Python/NumPy/Pandas have had a very large number of 
> programmer hours expended on them.  Doing this poorly as a D 
> module is likely worse than not doing it at all.

I think D has a lot to offer the world of data science.

