dataframe implementations

Wed Nov 18 10:04:29 PST 2015

On Wednesday, 18 November 2015 at 17:15:38 UTC, Laeeth Isharc 
wrote:
> What do you think about the use of NaN for missing floats?  In 
> theory I could imagine wanting to distinguish between an NaN in 
> the source file and a missing value, but in my world I never 
> felt the need for this.  For integers and bools, that is 
> different of course.

The Julia discussions mention another dataframe implementation, I 
believe it was for R, where NaN was used.  There was some mention 
of the virtues of their own choice and the problems with NaN.  I 
think their NA was a particular encoding of NaN.  Other 
implementations they mentioned reserve some value in each of the 
numeric data types to represent NA.  In the Julia case, I believe 
what they use is a separate byte vector for each column that 
holds the NA status.  They discussed some other possible 
enhancements, but I don't know what they implemented.  For 
example, since a single byte already holds the NA flag, the cell 
value is free to hold additional info ... maybe the reason for 
the NA.  There was also some discussion of having the associated 
cell hold a repeat count for the NA status, which I suppose meant 
that the NA applies to that many following cells in the column 
vector.  I'll try to find the discussions and post the link.
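
To make the two representations concrete, here is a minimal sketch 
in Python.  It is not code from Julia or R.  The NA bit pattern, 
the helper names and the MaskedColumn class are all assumptions 
made up for illustration.  The first half reserves one specific 
NaN encoding as NA, so an ordinary NaN from the source file stays 
distinguishable.  The second half keeps a separate byte per cell 
for the NA status, which also works for integers and bools.

```python
import math
import struct

# ASSUMPTION: an illustrative NA bit pattern, a quiet NaN with payload 1954.
# It is chosen for this example and not taken from any library's source.
NA_BITS = 0x7FF8000000000000 | 1954


def float_to_bits(x):
    """Return the 64-bit IEEE 754 pattern of a float as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits):
    """Build a float from a 64-bit IEEE 754 pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


NA = bits_to_float(NA_BITS)


def is_na(x):
    """True only for the reserved NA pattern, not for other NaNs."""
    return float_to_bits(x) == NA_BITS


class MaskedColumn:
    """Hypothetical column type: values plus one NA flag byte per cell."""

    def __init__(self, values, fill=0):
        # None in the input marks a missing cell; `fill` is a placeholder
        # kept in the data slot and never returned to the caller.
        self.data = [fill if v is None else v for v in values]
        self.na = bytearray(1 if v is None else 0 for v in values)

    def __getitem__(self, i):
        # A flagged cell reads back as None regardless of the data slot.
        return None if self.na[i] else self.data[i]

    def count_na(self):
        return sum(self.na)


# A float column where a genuine NaN and a missing value coexist.
floats = MaskedColumn([1.5, None, math.nan])
# The same mask works for integers, which have no NaN to borrow.
ints = MaskedColumn([1, None, 3])
```

Whether a NaN payload survives arithmetic depends on the hardware 
and the language, so the first half only inspects stored values.  
The mask costs one extra byte per cell but needs no reserved 
value in the data type.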