OT: why do people use python when it is slow?
data pulverizer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Oct 15 06:26:17 PDT 2015
On Thursday, 15 October 2015 at 07:57:51 UTC, Russel Winder wrote:
> On Thu, 2015-10-15 at 06:48 +0000, data pulverizer via
> Digitalmars-d-learn wrote:
> Just because D doesn't have this now doesn't mean it cannot
> have it. C doesn't have such a capability, but R and Python
> do, even though R and CPython are just C code.
I think the way R does this is that its dynamic runtime
environment is used to bind together native C basic-type
arrays. I wonder if we could simulate dynamic behaviour by
leveraging D's short compilation time to dynamically
write/update data table source file(s) containing the
structure of new/modified data tables?
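As a rough illustration of the compile-time half of that idea,
here is a minimal D sketch: the schema, the struct name, and
the generateRow helper are all invented for illustration. It
builds a typed row struct from a column description with a
string mixin; a runtime variant would write similar source to
a .d file and recompile, which D's short compile times make
plausible:

    import std.stdio : writeln;

    // Hypothetical column description: name/type pairs as strings.
    enum string[][] schema = [["id", "long"], ["price", "double"],
                              ["name", "string"]];

    // CTFE-able: turn the schema into the source of a row struct.
    string generateRow(string structName, string[][] cols)
    {
        string src = "struct " ~ structName ~ " {\n";
        foreach (col; cols)
            src ~= "    " ~ col[1] ~ " " ~ col[0] ~ ";\n";
        return src ~ "}";
    }

    // Changing the schema regenerates the struct on the next
    // (fast) compile - a crude stand-in for dynamic behaviour.
    mixin(generateRow("Row", schema));

    void main()
    {
        auto r = Row(1, 9.99, "widget");
        writeln(r);  // prints Row(1, 9.99, "widget")
    }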
> Pandas data structures rely on the NumPy n-dimensional array
> implementation; it is not beyond the bounds of possibility
> that that data structure could be realized as a D module.
Julia's DArray object is an interesting take on this:
https://github.com/JuliaParallel/DistributedArrays.jl
I believe that parallelising arrays and parallelising data
tables are different challenges. Data tables are easier since
we can parallelise by row, hence the preference for row-based
tuples.
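For example, here is a minimal sketch of row-wise parallelism
over a table of named tuples using std.parallelism; the Row
layout and the discount computation are made up for
illustration:

    import std.parallelism : parallel;
    import std.stdio : writeln;
    import std.typecons : Tuple;

    alias Row = Tuple!(long, "id", double, "price");

    void main()
    {
        auto table = new Row[1_000_000];
        foreach (i, ref row; table)
            row = Row(cast(long) i, i * 0.5);

        // Rows are independent, so the work splits cleanly
        // across cores with no locking needed.
        auto discounted = new double[table.length];
        foreach (i, row; parallel(table))
            discounted[i] = row.price * 0.9;

        writeln(discounted[0 .. 5]);
    }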
> The core issue is to have a seriously efficient n-dimensional
> array that is amenable to data parallelism and is extensible.
> As far as I am aware (I will investigate more), the NumPy
> array is a good native-code array, but it has some issues
> with data parallelism, and Pandas has to do quite a lot of
> work to get the extensibility. I wonder how the R data.table
> works.
R's data.table is not currently parallelised.
> I have this nagging feeling that, like NumPy, data.table
> seems a lot better than it actually is. From small
> experiments, D (and Chapel even more so) is hugely faster
> than Python/NumPy at things Python people think NumPy is
> brilliant for. Expectations of Python programmers are set by
> the scale of Python performance, so NumPy seems brilliant.
> Compared to the scale set by D and Chapel, NumPy is very
> disappointing. I bet the same is true of R (I have never
> really used R).
Thanks for notifying me about Chapel - something else
interesting to investigate. When it comes to speed, R is very
strange. Basic math operations (e.g. *, +, /) on an R array
can be fast, but an explicit for loop will kill speed by a
factor of hundreds - most things are slow in R unless they are
directly baked into its base operations. You can, however,
write code in C or C++ and call it very easily from R using
its Rcpp interface.
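For contrast, the equivalent explicit loop in D runs at native
speed with no vectorised built-in required. A throwaway timing
sketch (assumes a recent compiler with std.datetime.stopwatch;
the array size and numbers are arbitrary and will vary by
machine):

    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.stdio : writeln;

    void main()
    {
        auto x = new double[10_000_000];
        foreach (i, ref v; x)
            v = i;

        auto sw = StopWatch(AutoStart.yes);
        double total = 0.0;
        foreach (v; x)  // a plain loop, the kind that crawls in R
            total += v * 2.0;
        sw.stop();
        writeln(total, " in ", sw.peek.total!"msecs", " ms");
    }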
> This is therefore an opportunity for D to step in. However,
> it is a journey of a thousand miles to get something
> production-worthy. Python/NumPy/Pandas have had a very large
> number of programmer hours expended on them. Doing this
> poorly as a D module is likely worse than not doing it at
> all.
I think D has a lot to offer the world of data science.