Pandas like features
data pulverizer
data.pulverizer at gmail.com
Thu Nov 5 20:22:45 UTC 2020
On Thursday, 5 November 2020 at 19:18:11 UTC, bachmeier wrote:
>
> The question for me is if you can work with the same data
> structures in D, R, Python, and Julia. Can your main program be
> written in D, but calling out to all three for loading,
> transforming, and analyzing the data? I'm guessing not, but
> would be awesome if you could do it.
It's actually a problem I've been thinking about on and off for a
while, but I haven't got round to actually trying to implement it.
1. If I had to do this, I would first decide on a collection of
common data structures to share, starting with *compositions* of
R/Python/Julia style multi-dimensional arrays: contiguous arrays
of a basic element type, with the dimension information held in
another array. So a 2x3 double matrix is a double array of
length 6 plus a long array containing [2, 3]. R has
externalptr, Julia can interface with pointers, as can Python.
2. Next I would use memory mapped i/o for storage. A memory
mapped file can be private to one process, but a shared mapping
(e.g. MAP_SHARED on POSIX) lets several processes map the same
file, so that memory written in one language can be accessed by
another. For security you could use cryptographic keys to control
access to the files between processes.
3. Binary file i/o for these structures is pretty simple, but necessary to
store results and read them back in any of the programs afterwards.
4. All the languages have C APIs, so you'd write interfaces in D
using these to call from D into the languages. All the languages
can call D extern(C) functions in shared libraries (dlls) directly
using their versions of ccall (ccall in Julia, .C/.Call in R,
ctypes in Python).
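
To make steps 1 to 3 a bit more concrete, here is a rough Python sketch of what I have in mind. The file layout (a dimension count, the dims array, then the flat double buffer), the function names and the file name are all made up for illustration, not an existing format or library. It writes a 2x3 matrix to a binary file, maps it, changes one element through a C-style double view, and reads the change back through a second, independent mapping that stands in for "another language". Indexing is column-major, as in R and Julia.

```python
import ctypes
import mmap
import os
import struct
import tempfile

# Hypothetical layout (an assumption for this sketch, not an existing format):
#   int64 ndim | int64 dims[ndim] | float64 data[prod(dims)]
# "=" means native byte order with fixed sizes; a real format would pin one.

def flat_index(dims, idx):
    """Column-major (R/Julia order) offset of a zero-based multi-index."""
    offset, stride = 0, 1
    for n, i in zip(dims, idx):
        offset += i * stride
        stride *= n
    return offset

def write_array(path, dims, data):
    """Step 3: dump the dims array and the flat data buffer to a binary file."""
    with open(path, "wb") as f:
        f.write(struct.pack("=q", len(dims)))
        f.write(struct.pack(f"={len(dims)}q", *dims))
        f.write(struct.pack(f"={len(data)}d", *data))

def map_array(path):
    """Step 2: map the file; returns (mapping, dims, byte offset of the data)."""
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)  # whole file, shared read-write mapping
    (ndim,) = struct.unpack_from("=q", mm, 0)
    dims = struct.unpack_from(f"={ndim}q", mm, 8)
    return mm, dims, 8 + 8 * ndim

def demo():
    path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
    write_array(path, [2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

    # "Writer" side: view the mapped payload as a C double array.
    # ctypes.addressof(view) would be the raw pointer handed to foreign code.
    mm_a, dims, off = map_array(path)
    view = (ctypes.c_double * (dims[0] * dims[1])).from_buffer(mm_a, off)
    view[flat_index(dims, (1, 2))] = 60.0  # overwrite element [1, 2]

    # "Reader" side: an independent mapping of the same file sees the change.
    mm_b, dims_b, off_b = map_array(path)
    pos = off_b + 8 * flat_index(dims_b, (1, 2))
    (seen,) = struct.unpack_from("=d", mm_b, pos)

    del view  # release the exported buffer before closing the mapping
    mm_a.close()
    mm_b.close()
    return dims_b, seen

print(demo())
```

The same header-plus-buffer layout is what a D, R or Julia reader would have to agree on; none of the access control mentioned in step 2 is shown here.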
Another alternative to mmap is network serialization, which
would be more cross-platform and flexible, but it seems to me
that this could be slow.
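
For comparison, a minimal sketch of the serialization route, again in Python and again with made-up names (send_array, recv_array) and a made-up wire format (network byte order: dimension count, dims, then the doubles). A local socket pair stands in for two separate programs. Note that the data is packed, copied through the socket and unpacked again, which is where I would expect the slowdown relative to a shared mapping to come from.

```python
import socket
import struct

def send_array(sock, dims, data):
    """Serialize dims and data in network byte order and send them."""
    header = struct.pack("!q", len(dims)) + struct.pack(f"!{len(dims)}q", *dims)
    payload = struct.pack(f"!{len(data)}d", *data)
    sock.sendall(header + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, since recv may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)

def recv_array(sock):
    """Receive one array: dimension count, dims, then prod(dims) doubles."""
    (ndim,) = struct.unpack("!q", recv_exact(sock, 8))
    dims = struct.unpack(f"!{ndim}q", recv_exact(sock, 8 * ndim))
    count = 1
    for n in dims:
        count *= n
    data = struct.unpack(f"!{count}d", recv_exact(sock, 8 * count))
    return list(dims), list(data)

# A connected socket pair stands in for two programs; the message is small
# enough to fit in the socket buffer, so sending before receiving is safe here.
a, b = socket.socketpair()
send_array(a, [2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
dims, data = recv_array(b)
a.close()
b.close()
```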