Pandas like features

data pulverizer data.pulverizer at gmail.com
Thu Nov 5 20:22:45 UTC 2020


On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:
>
> The question for me is if you can work with the same data 
> structures in D, R, Python, and Julia. Can your main program be 
> written in D, but calling out to all three for loading, 
> transforming, and analyzing the data? I'm guessing not, but 
> would be awesome if you could do it.

It's actually a problem I've been thinking about on and off for a 
while, but I haven't got round to actually trying to implement it.

1. If I had to do this, I would first decide on a collection of 
common data structures to share, starting with *compositions* of 
R/Python/Julia style multi-dimensional arrays: contiguous arrays 
of basic element types, with the dimension information held in 
another array. So a 2x3 double matrix is a double array of 
length 6 plus a long array containing [2, 3]. R has 
externalptr, Julia can interface with pointers, as can Python.

2. Next I would use memory mapped i/o for storage. Memory mapped 
files are often mapped privately to one process, but I believe 
the mapping can be made shared (e.g. MAP_SHARED with POSIX mmap), 
so that memory written in one language can be accessed by 
another. For security you could use cryptographic keys to control 
access to the files between processes.

3. Binary file i/o for those is pretty simple, but necessary to 
store results and read them in any of the programs afterwards.

4. All the languages have C APIs, so you'd write interfaces in D 
using these to call from D into the languages. All the languages 
can call D extern(C) functions in dlls directly using their 
versions of ccall.
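
Points 1 to 3 could be sketched roughly like this in Python, using 
only the standard library. The file layout (a rank, then the 
dimensions as 64-bit integers, then the doubles) is an assumption 
made up for illustration, not an existing format, and the names 
write_array and open_shared are hypothetical. Column-major order is 
assumed, as in R and Julia.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout (an assumption for illustration, not a standard):
#   int64 rank | int64 dims[rank] | float64 data[prod(dims)], little-endian.

def write_array(path, dims, values):
    """Store a flat list of doubles plus its dimension array in a binary file."""
    count = 1
    for d in dims:
        count *= d
    if count != len(values):
        raise ValueError("dims do not match the number of values")
    with open(path, "wb") as f:
        f.write(struct.pack("<q", len(dims)))
        f.write(struct.pack(f"<{len(dims)}q", *dims))
        f.write(struct.pack(f"<{len(values)}d", *values))

def open_shared(path):
    """Memory-map the file; returns (mapping, dims, byte offset of the data)."""
    with open(path, "r+b") as f:
        # ACCESS_WRITE: writes to the mapping go through to the underlying file.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
    (rank,) = struct.unpack_from("<q", mm, 0)
    dims = struct.unpack_from(f"<{rank}q", mm, 8)
    return mm, dims, 8 + 8 * rank

path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
write_array(path, [2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

mm, dims, offset = open_shared(path)
# Element (i, j) of a column-major matrix sits at flat index i + j * nrows.
i, j = 1, 2
index = i + j * dims[0]
struct.pack_into("<d", mm, offset + 8 * index, 60.0)  # write via the mapping
mm.flush()
mm.close()

# A second reader (standing in for another language) sees the update.
mm2, dims2, offset2 = open_shared(path)
data = struct.unpack_from("<6d", mm2, offset2)
mm2.close()
```

The same header could be read from D, R or Julia by mapping the file 
and interpreting the first bytes the same way, which is why the 
dimension array is stored next to the data rather than in 
language-specific metadata.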

An alternative to mmap is network serialization, which would be 
more cross-platform and flexible, but it seems to me like it 
could be slow.
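
For comparison, a rough sketch of the serialization route in Python, 
again with a made-up wire format (a length prefix, then rank, dims 
and data) and hypothetical helper names. A local socket pair stands 
in for a real network connection. Unlike a shared mapping, every 
transfer copies the whole array, which is one reason it may be 
slower.

```python
import socket
import struct

def serialize(dims, values):
    """Pack rank, dims and data into one byte string (hypothetical format)."""
    header = struct.pack(f"<q{len(dims)}q", len(dims), *dims)
    return header + struct.pack(f"<{len(values)}d", *values)

def deserialize(payload):
    """Inverse of serialize: returns (dims, values) as tuples."""
    (rank,) = struct.unpack_from("<q", payload, 0)
    dims = struct.unpack_from(f"<{rank}q", payload, 8)
    count = (len(payload) - 8 - 8 * rank) // 8
    values = struct.unpack_from(f"<{count}d", payload, 8 + 8 * rank)
    return dims, values

def recv_exact(sock, n):
    """Read exactly n bytes, since recv may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed early")
        buf.extend(chunk)
    return bytes(buf)

# A connected socket pair stands in for two programs talking over a network.
sender, receiver = socket.socketpair()
payload = serialize([2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sender.sendall(struct.pack("<q", len(payload)) + payload)  # length prefix first

(size,) = struct.unpack("<q", recv_exact(receiver, 8))
dims, values = deserialize(recv_exact(receiver, size))
sender.close()
receiver.close()
```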



