Initial feedback for std.experimental.image

via Digitalmars-d digitalmars-d at puremagic.com
Wed Jul 8 11:07:08 PDT 2015


On Monday, 6 July 2015 at 13:48:53 UTC, Rikki Cattermole wrote:
>
> Please destroy!
>

You asked for it! :)

As a reference for a library that handles images at a 
professional level (in the VFX industry), I'd encourage you to 
look at the feature set and interfaces of OpenImageIO. Sure, 
it's a big library, and some of it is definitely out of scope 
for what you're trying to accomplish (image tile caching and 
texture sampling, obviously).

Yet, there are some features I specifically want to mention here 
to challenge the scope of your design:

- arbitrary channel layouts in images: this is a big one. You 
mention 3D engines as a targeted use case in the specification. 
3D rendering is one of the worst offenders when it comes to crazy 
channel layouts in textures (which are obviously stored as image 
files). If you have a data texture that requires 2 channels (e.g. 
uv offsets for texture lookups in shaders or some crazy data 
tables), its memory layout should also only ever have two 
channels. Don't expand it to RGB transparently or anything else 
braindead. Don't change the data type of the pixel values wildly 
without being asked to do so. The developer most likely has 
chosen a 16 bit signed integer per channel (or whatever else) for 
a good reason. Some high end file formats like OpenEXR even allow 
users to store completely arbitrary channels as well, often with 
a different per-channel data format (leading to layouts like 
RGBAZ with an additional mask channel on top). But support for 
that really bloats image library interfaces. I'd stick with a 
sane subset of the uncompressed texture formats that the OpenGL 
specification lists, and take that as the target set of 
supported in-memory image formats. It mostly matches current GPU 
hardware support and probably will for some time to come.
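To make the point concrete, here is a minimal sketch in C of a pixel-format descriptor that records exactly the channel count and per-channel data type the developer asked for. All names here are hypothetical, not part of any proposed API:

```c
#include <stddef.h>

/* Hypothetical pixel-format descriptor: it preserves the channel count
 * and data type the developer chose (e.g. two signed 16-bit channels
 * for a uv-offset data texture) instead of silently expanding
 * everything to 8-bit RGB(A). */
typedef enum {
    CH_U8,   /* unsigned 8-bit integer */
    CH_S16,  /* signed 16-bit integer  */
    CH_F32   /* 32-bit float           */
} ChannelType;

typedef struct {
    int channels;     /* 1..4 - exactly what the data needs */
    ChannelType type; /* one type for all channels, matching GPU formats */
} PixelFormat;

static size_t channel_size(ChannelType t) {
    switch (t) {
    case CH_U8:  return 1;
    case CH_S16: return 2;
    case CH_F32: return 4;
    }
    return 0;
}

static size_t bytes_per_pixel(PixelFormat f) {
    return (size_t)f.channels * channel_size(f.type);
}
```

A two-channel signed 16-bit texture then stays exactly 4 bytes per 
pixel; nothing forces a third channel onto it behind the 
developer's back.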

- padding and memory alignment: depending on the platform, image 
format and task at hand you may want the in-memory layout of your 
image to be padded in various ways. For example, you would want 
your scanlines and pixel values aligned to certain offsets to 
make use of SIMD instructions which often carry alignment 
restrictions with them. This is one of the reasons why RGB images 
are sometimes expanded to have a dummy channel between the 
triplets. Also, aligning the start of each scanline may be 
important, which introduces a "pitch" between them that is 
greater than just the storage size of each scanline by itself. 
Again, this may help speed up image processing.
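The pitch computation itself is tiny; here is a sketch (the helper name is made up) of rounding each scanline up to an alignment boundary:

```c
#include <stddef.h>

/* Round a scanline's byte size up to the next multiple of `alignment`
 * (e.g. 16 bytes for SSE, 32 for AVX). The result is the "pitch":
 * the distance between scanline starts, which may be larger than the
 * scanline's raw storage size. */
static size_t aligned_pitch(size_t width, size_t bytes_per_pixel,
                            size_t alignment) {
    size_t row = width * bytes_per_pixel;
    return (row + alignment - 1) / alignment * alignment;
}
```

For a 5-pixel RGB8 scanline (15 bytes) and 16-byte alignment this 
yields a pitch of 16, leaving one padding byte per row.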

- subimages: this one may seem obscure, but it occurs in a 
number of common file formats (GIF, MNG, DDS, probably TIFF and 
others). Subimages can be - for instance - individual animation 
frames or precomputed mipmaps. This means that they may have 
metadata attached to them (e.g. framerate or delay to next frame) 
or they may come in totally different dimensions (mipmap levels).
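A container along those lines might be sketched as follows; the names are invented, but it shows per-subimage dimensions and metadata, plus the usual mip-level size rule:

```c
/* Sketch: each subimage carries its own dimensions and metadata. */
typedef struct {
    int width, height;
    int delay_ms;          /* e.g. GIF frame delay; 0 if not applicable */
    unsigned char *pixels;
} SubImage;

typedef struct {
    int count;
    SubImage *subimages;   /* animation frames or mipmap levels */
} ImageFile;

/* Mipmap levels halve each dimension, clamped to at least 1 pixel. */
static int mip_dimension(int base, int level) {
    int d = base >> level;
    return d > 0 ? d : 1;
}
```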

- window regions: now this is not quite your average image format 
feature, but relevant for some use cases. The gist of it is that 
the image file may define a coordinate system for a whole image 
frame but only contain actual data within certain regions that do 
not cover the whole frame. These regions may even extend beyond 
the defined image frame (used e.g. for VFX image postprocessing 
to have properly defined pixel values to filter into the visible 
part of the final frame). Again, the OpenEXR documentation 
explains this feature nicely. Again, I think this likely is out 
of scope for this library.
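In OpenEXR terms these are the "display window" (the nominal frame) and the "data window" (where pixel data actually exists). A tiny sketch of the distinction:

```c
/* Inclusive integer bounding box, as OpenEXR uses for its windows. */
typedef struct { int x0, y0, x1, y1; } Box2i;

static int box_contains(Box2i b, int x, int y) {
    return x >= b.x0 && x <= b.x1 && y >= b.y0 && y <= b.y1;
}
```

A 1920x1080 display window would be {0, 0, 1919, 1079}, while a 
data window of {-16, -16, 1935, 1095} provides 16 pixels of 
overscan on every side to filter into the visible frame.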

My first point also leads me to this criticism:

- I do not see a way to discover the actual data format of a PNG 
file through your loader. Is it 8-bit palette-based, 8 bits per 
pixel, or 16 bits per pixel? Especially the latter should not be 
transparently converted to 8 bits per pixel if encountered 
because it is a lossy transformation. As I see it right now you 
have to know the pixel format up front to instantiate the loader. 
I consider that bad design. You can only have true knowledge of 
the file contents after the image header has been parsed. The same is 
generally true of most actually useful image formats out there.
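For PNG specifically, the bit depth and color type sit in the IHDR chunk, which the PNG specification requires to be the first chunk after the 8-byte signature, so a loader can discover them before committing to a pixel format. A minimal sketch (no CRC check or detailed error reporting):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t width, height;
    uint8_t  bit_depth;  /* 1, 2, 4, 8 or 16 */
    uint8_t  color_type; /* 0 gray, 2 RGB, 3 palette, 4 gray+A, 6 RGBA */
} PngHeader;

/* Parse the signature plus IHDR from the start of a PNG byte buffer.
 * Returns 0 on success, -1 on malformed input. */
static int parse_png_header(const uint8_t *buf, size_t len, PngHeader *out) {
    static const uint8_t sig[8] = {137, 'P', 'N', 'G', 13, 10, 26, 10};
    if (len < 29 || memcmp(buf, sig, 8) != 0) return -1;
    if (memcmp(buf + 12, "IHDR", 4) != 0) return -1;
    const uint8_t *d = buf + 16; /* 13 bytes of IHDR payload */
    out->width  = (uint32_t)d[0] << 24 | d[1] << 16 | d[2] << 8 | d[3];
    out->height = (uint32_t)d[4] << 24 | d[5] << 16 | d[6] << 8 | d[7];
    out->bit_depth  = d[8];
    out->color_type = d[9];
    return 0;
}
```

With that information in hand, a 16-bit RGB file can be loaded into 
a 16-bit buffer instead of being lossily squashed down to 8 bits.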

- Could support for image data alignment be added by defining a 
new ImageStorage subclass? The actual in-memory data is not 
exposed to direct access, is it? Access to the raw image data 
would be preferable for those cases where you know exactly what 
you are doing. Going through per-pixel access functions for large 
image regions is going to be dreadfully slow in comparison to 
what can be achieved with proper processing/filtering code.
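For illustration, a raw view with exposed scanline pointers (all names invented here) lets bulk operations use memset/memcpy or SIMD kernels instead of millions of accessor calls:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical raw view of image memory. */
typedef struct {
    unsigned char *data;
    size_t pitch;           /* bytes between scanline starts */
    size_t bytes_per_pixel;
    int width, height;
} RawImageView;

static unsigned char *scanline(RawImageView *img, int y) {
    return img->data + (size_t)y * img->pitch;
}

/* Bulk clear: one memset per row instead of width * height
 * per-pixel setter calls. */
static void clear_image(RawImageView *img, unsigned char value) {
    for (int y = 0; y < img->height; ++y)
        memset(scanline(img, y), value,
               (size_t)img->width * img->bytes_per_pixel);
}
```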

- Also, uploading textures to the GPU requires passing raw memory 
blocks and a format description of sorts to the 3D API. Being 
required to slowly copy the image data in question into a 
temporary buffer for this process is not an adequate solution.
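A zero-copy upload path only needs the raw pointer plus a small format triple. The constants below are standard OpenGL enum values, defined locally so this sketch compiles without GL headers; the struct and function names are invented:

```c
/* Standard OpenGL enum values (as defined in the GL registry). */
#define GL_RG         0x8227
#define GL_RG16_SNORM 0x8F99
#define GL_SHORT      0x1402

typedef struct {
    unsigned internal_format; /* e.g. GL_RG16_SNORM */
    unsigned format;          /* e.g. GL_RG         */
    unsigned type;            /* e.g. GL_SHORT      */
} GlUploadFormat;

/* Two signed, normalized 16-bit channels: a uv-offset data texture. */
static GlUploadFormat rg16_snorm(void) {
    GlUploadFormat f = { GL_RG16_SNORM, GL_RG, GL_SHORT };
    return f;
}

/* The upload is then a single call on the untouched image memory:
 * glTexImage2D(GL_TEXTURE_2D, 0, f.internal_format, width, height, 0,
 *              f.format, f.type, raw_pixel_pointer);
 */
```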

Let me know what you think!

