Pandas-like features

Abdulhaq alynch4047 at gmail.com
Fri Oct 30 18:23:38 UTC 2020


On Friday, 30 October 2020 at 12:15:58 UTC, Russel Winder wrote:
> On Fri, 2020-10-30 at 10:12 +0000, jmh530 via Digitalmars-d 
> wrote:
>> 
> […]
>> I think the point on multi-threaded Python came away as a big 
>> complaint there. Lots of mentions of the GIL or people being 
>> CPU-bound. Pandas was mentioned in this context as well.
>
> << I haven't properly read the blog entry as yet. Sorry. >>
>
> Guido saw (cf. he and I had a long "discussion" at EuroPython 
> 2010, there were many witnesses) the GIL as absolutely fine for 
> CPython in perpetuity, and held that if PyPy came up with a 
> GIL-free VM then that would be fine. His mindset was (and I 
> suspect may still be) that Python code was/is not about being 
> CPU-bound code; it was/is about sequential and concurrent code, 
> not parallel-for-performance code. As long as there is NumPy and 
> other PVM extensions, or use of message passing between 
> processes, that allow for GIL-free parallel, CPU-bound 
> processing, it is hard to say Guido was/is wrong. (And in 2010 
> it was even harder :-) )
>
> Pandas is built on NumPy and so has the same parallelism 
> properties as any other NumPy-based package.

I've spent much of the last 5 years writing code for trade 
studies and other optimisations on top of Python, NumPy and 
multiprocessing. Lately I have been working a lot with Pandas for 
multi-dimensional optimisation and machine learning.

Python's slow performance in the glue layer between NumPy, 
multiprocessing, etc. is a non-issue. With that combination I can 
easily keep all 8 cores busy running efficient C++ CFD codes, 
machine learning codes, and so on.
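
As a rough illustration of the process-based approach, here is a 
minimal sketch using only the standard library. The names 
`cpu_bound_task` and `run_parallel` are hypothetical, and the 
sum-of-squares loop merely stands in for a real CFD or ML job. 
Each worker is a separate OS process with its own interpreter and 
its own GIL, so CPU-bound Python work can spread across cores.

```python
from concurrent.futures import ProcessPoolExecutor
import os


def cpu_bound_task(n: int) -> int:
    """Hypothetical stand-in for a CPU-heavy job: sum of i*i for i < n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_parallel(sizes: list[int]) -> list[int]:
    # Each worker is a separate process with its own GIL, so these
    # pure-Python loops can run on several cores at the same time.
    # os.cpu_count() may return None, which the executor treats as "default".
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        # map() preserves input order in the returned results.
        return list(pool.map(cpu_bound_task, sizes))


if __name__ == "__main__":
    # The guard matters: with the "spawn" (or "forkserver") start method,
    # workers re-import this module, and unguarded top-level code would
    # run again inside every worker.
    print(run_parallel([10_000, 20_000, 30_000]))
```

In practice the worker would call into NumPy or an external solver 
rather than loop in Python; the point is only that the GIL is per 
process, not per machine.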

The migration from Python 2 to Python 3 was also pretty tame. For 
people doing real work, it's not a big deal. Sure, it was a 
distraction, but it has its benefits and I'm glad they did it. 
Boring opinion, and it doesn't generate ad income from blog hits, 
but there you go.

I would like to see D have a NumPy equivalent, but realistically 
you won't duplicate the NumPy ecosystem here; it's too much work. 
And why do it? Just wrap the NumPy ecosystem from D and use it 
that way.

Core Pandas on its own, BTW, isn't hard to implement IMO. It turns 
out to be very expressive and very useful, but it is not a hard 
thing to copy.
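
To give a feel for how small the core idea is, here is a toy, 
hypothetical `MiniFrame` class in Python: a dict of equal-length 
columns with boolean-mask row filtering and a group-by sum. It is 
a sketch under loose assumptions, not the pandas API or its 
implementation, and it ignores indexes, dtypes, missing values, 
joins and performance.

```python
from collections import defaultdict
from typing import Any


class MiniFrame:
    """Hypothetical toy column store: a dict of equal-length column lists."""

    def __init__(self, columns: dict[str, list[Any]]):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        # Copy the lists so later mutation by the caller doesn't leak in.
        self.columns = {name: list(values) for name, values in columns.items()}

    def __len__(self) -> int:
        # Number of rows; an empty frame (no columns) has length 0.
        return len(next(iter(self.columns.values()), []))

    def __getitem__(self, name: str) -> list[Any]:
        return self.columns[name]

    def filter(self, mask: list[bool]) -> "MiniFrame":
        # Boolean row selection, in the spirit of df[df["x"] > 0].
        return MiniFrame({
            name: [v for v, keep in zip(values, mask) if keep]
            for name, values in self.columns.items()
        })

    def groupby_sum(self, key: str, value: str) -> dict[Any, Any]:
        # Split-apply-combine in miniature: bucket rows by key, sum values.
        sums: dict[Any, Any] = defaultdict(int)
        for k, v in zip(self.columns[key], self.columns[value]):
            sums[k] += v
        return dict(sums)


if __name__ == "__main__":
    df = MiniFrame({"case": ["a", "b", "a", "c"], "drag": [1.5, 2.0, 0.5, 3.0]})
    print(df.filter([d > 1.0 for d in df["drag"]])["case"])  # ['a', 'b', 'c']
    print(df.groupby_sum("case", "drag"))  # {'a': 2.0, 'b': 2.0, 'c': 3.0}
```

Most of what makes pandas pleasant (labelled indexes, vectorised 
NumPy-backed columns, rich I/O) sits on top of a core this simple, 
which is where the real implementation effort goes.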

