Initial feedback for std.experimental.image

Sun Jul 12 07:08:00 PDT 2015

On 13/07/2015 1:43 a.m., "Gregor =?UTF-8?B?TcO8Y2tsIg==?= 
<gregormueckl at gmx.de>" wrote:
> On Thursday, 9 July 2015 at 15:05:12 UTC, Rikki Cattermole wrote:
>> On 10/07/2015 2:07 a.m., "Gregor =?UTF-8?B?TcO8Y2tsIg==?=
>> <gregormueckl at gmx.de>" wrote:
>>> On Thursday, 9 July 2015 at 04:09:11 UTC, Rikki Cattermole wrote:
>>>> On 9/07/2015 6:07 a.m., "Gregor =?UTF-8?B?TcO8Y2tsIg==?=
>>>> <gregormueckl at gmx.de>" wrote:
>>>>> On Monday, 6 July 2015 at 13:48:53 UTC, Rikki Cattermole wrote:
>>>>>>
>>>>>> Please destroy!
>>>>>>
>>>>>
>>>>> You asked for it! :)
>>>>>
>>>>> As a reference to a library that is used to handle images on a
>>>>> professional level (VFX industry), I'd encourage you to look at the
>>>>> feature set and interfaces of OpenImageIO. Sure, it's a big library
>>>>> and
>>>>> some of it is definitely out of scope for what you try to accomplish
>>>>> (image tile caching and texture sampling, obviously).
>>>>>
>>>>> Yet, there are some features I specifically want to mention here to
>>>>> challenge the scope of your design:
>>>>>
>>>>> - arbitrary channel layouts in images: this is a big one. You
>>>>> mention 3D
>>>>> engines as a targeted use case in the specification. 3D rendering
>>>>> is one
>>>>> of the worst offenders when it comes to crazy channel layouts in
>>>>> textures (which are obviously stored as image files). If you have a
>>>>> data
>>>>> texture that requires 2 channels (e.g. uv offsets for texture
>>>>> lookups in
>>>>> shaders or some crazy data tables), its memory layout should also only
>>>>> ever have two channels. Don't expand it to RGB transparently or
>>>>> anything
>>>>> else braindead. Don't change the data type of the pixel values wildly
>>>>> without being asked to do so. The developer most likely has chosen
>>>>> a 16
>>>>> bit signed integer per channel (or whatever else) for a good reason.
>>>>> Some high end file formats like OpenEXR even allow users to store
>>>>> completely arbitrary channels as well, often with a different
>>>>> per-channel data format (leading to layouts like RGBAZ with an
>>>>> additional mask channel on top). But support for that really bloats
>>>>> image library interfaces. I'd stick with a sane variant of the
>>>>> uncompressed texture formats that the OpenGL specification lists as
>>>>> the
>>>>> target set of supported in-memory image formats. That mostly matches
>>>>> current GPU hardware support and probably will for some time to come.
>>>>
>>>> As long as the color implementation matches isColor from
>>>> std.experimental.color. Then it's a color. I do not handle that :)
>>>> The rest of how it maps in memory is defined by the image storage
>>>> types. Any image loader/exporter can use any as long as you specify it
>>>> via a template argument *currently*.
>>>>
>>>
>>> Hm... in that case you introduce transparent mappings between
>>> user-facing types and the internal mapping which may be lossy in various
>>> ways. This works, but the internal type should be discoverable somehow.
>>> This leads down a similar road to OpenGL texture formats: they have
>>> internal storage formats and there's the host formats to/from which the
>>> data is converted when passing back and forth. This adds a lot of
>>> complexity and potential for surprises, unfortunately. I'm not entirely
>>> sure what to think here.
>>
>> Internal color to an image storage type is well known at compile time.
>> Now SwappableImage that wraps another image type. That definitely
>> muddies the water a lot. Since it auto converts from the original
>> format. Which could be, you know messy.
>> It's actually the main reason I asked Manu for a gain/loss precision
>> functions. For detecting if precision is being changed. Mostly for
>> logging purposes.
>>
>
> I haven't encountered SwappableImage anywhere. Did I miss anything?

It's only used for image formats right now directly.
http://cattermole.co.nz/phobosImage/docs/html/std_experimental_image_interfaces.html#.SwappableImage

>>>>> - window regions: now this not quite your average image format
>>>>> feature,
>>>>> but relevant for some use cases. The gist of it is that the image file
>>>>> may define a coordinate system for a whole image frame but only
>>>>> contain
>>>>> actual data within certain regions that do not cover the whole frame.
>>>>> These regions may even extend beyond the defined image frame (used
>>>>> e.g.
>>>>> for VFX image postprocessing to have properly defined pixel values to
>>>>> filter into the visible part of the final frame). Again, the OpenEXR
>>>>> documentation explains this feature nicely. Again, I think this likely
>>>>> is out of scope for this library.
>>>>
>>>> Ugh based upon what you said, that is out of scope of the image
>>>> loader/exporters that I'm writing. Also right now it is only using
>>>> unsigned integers for coordinates. I'm guessing if it is outside the
>>>> bounds it can go negative then.
>>>> Slightly too specialized for what we need in the general case.
>>>>
>>>
>>> Yes, this is a slightly special use case. I can think of quite a lot of
>>> cases where you would want border regions of some kind for what you are
>>> doing, but they are all related to rendering and image processing.
>>
>> You have convinced me that I need to add a subimage struct which is
>> basically SwappableImage. Just with offset/size different to original.
>>
>
> Again, I unfortunately fail to follow. Sorry.
>
> I wanted to argue that window regions are out of scope, unlike subimages
> (which actually are completely separate images within the same file -
> they do not share a single coordinate system).

What I mean by subimage is basically a region of another image.
Like a sprite in a sprite sheet. Same idea.

>>>>> My first point also leads me to this criticism:
>>>>>
>>>>> - I do not see a way to discover the actual data format of a PNG file
>>>>> through your loader. Is it 8 bit palette-based, 8 bit per pixel or 16
>>>>> bits per pixel? Especially the latter should not be transparently
>>>>> converted to 8 bits per pixel if encountered because it is a lossy
>>>>> transformation. As I see it right now you have to know the pixel
>>>>> format
>>>>> up front to instantiate the loader. I consider that bad design. You
>>>>> can
>>>>> only have true knowledge of the file contents after the image header
>>>>> were parsed. The same is generally true of most actually useful image
>>>>> formats out there.
>>>>
>>>> The reasoning is because this is what I know I can work with. You
>>>> specify what you want to use, it'll auto convert after that. It makes
>>>> user code a bit simpler.
>>>>
>>>
>>> I can understand your reasoning and this is why libraries like FreeImage
>>> make it very simple to get the image data converted to the format you
>>> want from an arbitrary input. What I'd like to see is more of an
>>> extension of the current mechanism: make it possible to query the data
>>> format of the image file. That way, the application can make a wiser
>>> decision on the format in which it wants to receive the data, but it
>>> always is able to get the data in a format it understands. The format
>>> description for the file format would have to be quite complex to cover
>>> all possibilities, though. The best that I can come up with is a list of
>>> tuples of channel names (as strings) and data type (as enums).
>>> Processing those isn't fun, though.
>>
>> The problem here is simple. You must know what color type you are
>> going to be working with. There is no guessing. If you want to change
>> to match the file loader better, you'll have to load it twice and then
>> you have to understand the file format internals a bit more.
>> This is kinda where it gets messy.
>>
>
> Messy in what way? My conceptual view of the user code is that it could
> do either of two things:
>
> 1. ask for a image in a format the user specifies - the loader must meet
> that request exactly and do all required conversions
> 2. ask the loader for the image data format contained in the file,
> choose a representation that best matches it in the user's view and only
> then ask the loader as in 1.
>
> Note that the user gets exactly what he asked for in each case, but
> makes an informed decision in 2. only because he chose to.

Think of it like this:

void myfunc(string name) {
	import std.experimental.image.fileformats.png;
	ForwardRange!ubyte input = ...;

	auto headers = loadPNGHeaders(input.save());
	if (headers.pixelType == PNGPixelType.RGB && headers.samples == 16) {
		write("output.png", loadPNG!RGB16(input.save())
		.flipHorizontalRange
		.flipVerticalRange
		. ...
		copyTemporary!MyImage(allocator)
		.asPNG
		.toBytes);
	} else if (headers.pixelType == PNGPixelType.RGBA && headers.samples == 
16) {
		write("output.png", loadPNG!RGBA16(input.save())
		.flipHorizontalRange
		.flipVerticalRange
		. ...
		copyTemporary!MyImage(allocator)
		.asPNG
		.toBytes);
	}

}

Of course that was off the top of my head in more pseudo code then 
anything else. It's also late and I'm tired so excuse any wrong terminology.

>> But, would it be better if you could just parse the headers? So it
>> doesn't initialize the image data. I doubt it would be all that hard.
>> It's just disabling a series of features.
>>
>
> That's what this would boil down to, I believe: a separate entry point
> for the loader that just parses the headers and returns file format
> information to the user, but not the image data itself. The user-facing
> interface would just be an extra interface method. Internally, it would
> share most of the code with the actual image loading code path.

Okay, I'm long since convinced its needed. So no worries here.

>>>>> - Could support for image data alignment be added by defining a new
>>>>> ImageStorage subclass? The actual in-memory data is not exposed to
>>>>> direct access, is it? Access to the raw image data would be preferable
>>>>> for those cases where you know exactly what you are doing. Going
>>>>> through
>>>>> per-pixel access functions for large image regions is going to be
>>>>> dreadfully slow in comparison to what can be achieved with proper
>>>>> processing/filtering code.
>>>>
>>>> I ugh... had this feature once. I removed it as if you already know
>>>> the implementation why not just directly access it?
>>>> But, if there is genuine need to get access to it as e.g. void* then I
>>>> can do it again.
>>>>
>>>>> - Also, uploading textures to the GPU requires passing raw memory
>>>>> blocks
>>>>> and a format description of sorts to the 3D API. Being required to
>>>>> slowly copy the image data in question into a temporary buffer for
>>>>> this
>>>>> process is not an adequate solution.
>>>>
>>>> Again for previous answer, was possible. No matter what the image
>>>> storage type was. But it was hairy and could and would cause bugs in
>>>> the future. Your probably better off knowing the type and getting
>>>> access directly to it that way.
>>>>
>>>
>>> This is where the abstraction of ImageStorage with several possible
>>> implementations becomes iffy. The user is at the loader's mercy to
>>> hopefully hand over the right implementation type. I'm not sure I like
>>> that idea. This seems inconsistent with making the pixel format the
>>> user's choice.  Why should the user have choice over one thing and not
>>> the other?
>>
>> If the image loader uses another image storage type then it is miss
>> behaving. There is no excuse for it.
>>
>
> Am I looking at an old version? The PNG loader I see documented returns
> a specific ImageStorage implementation.

It doesn't?
SwappableImage is used in its place.
Also it is wrapped in a PNGFileFormat because file format != image 
storage type. So while yes it behaves as an image. It is not an image 
primarily. As it includes the headers.

>> Anyway the main thing about this to understand is that if the image
>> loader does not initialize, then it would have to resize and since not
>> all image storage types have to support resizing...
>>
>
> OK, why do you abstract image storage at all? What's the rationale
> there? Storage formats other than the trivial row-by-row storage are
> only used in very few cases in my experience, mostly when it involves
> compression (compressed textures for games - effectively read only after
> compression) or certain kinds of out-of-core image processing. This
> seems to be mostly out of scope this library IMO.

Your right. There is only three that really matter to most users.
One of the primary users of the library is for game development. It is 
who I care personally care about. There is also a lot of overlap between 
efficient GUI toolkit's and game development for images.
While implementations beyond the big three, horizontal/vertical scan 
lines and one dimensional array are out of scope, the extension isn't.

>>>>
>>>> Some very good points that I believe definitely needs to be touched
>>>> upon where had.
>>>>
>>>> I've had a read of OpenImageIO documentation and all I can say is
>>>> irkkk.
>>>> Most of what is in there with e.g. tiles and reflection styles methods
>>>> are out of scope out right as they are a bit specialized for my
>>>> liking. If somebody wants to add it, take a look at the offset
>>>> support. It was written as an 'extension' like ForwardRange is for
>>>> ranges.
>>>>
>>>
>>> I mentioned OpenImageIO as this library is full-featured and very
>>> complete in a lot of areas. It shows what it takes to be as flexible as
>>> possible regarding the image data that is processed. Take it as a
>>> catalog of things to consider, but not as template.
>>>
>>>> The purpose of this library is to work more for GUI's and games then
>>>> anything else. It is meant more for the average programmer then
>>>> specialized in imagery ones. It's kinda why I wrote the specification
>>>> document. Just so in the future if somebody comes in saying its awful
>>>> who does know and use these kinds of libraries. Will have to
>>>> understand that it was out of scope and was done on purpose.
>>>
>>> Having a specification is a good thing and this is why I entered the
>>> discussion. Although your specification is still a bit vague in my
>>> opinion, the general direction is good. The limitation of the scope
>>> looks fine to me and I'm not arguing against that. My point is rather
>>> that your design can still be improved to match that scope better.
>>
>> Yeah indeed. Any tips for specification document improvement?
>> I would love to make it standard for Phobos additions like this.
>
> This is a really big can of worms, especially if you want to set
> standards for the future. You can do anything from informal and lax to
> extreme level of detail.
>
> For example, we require students to write specifications that include
> numbered lists of *all* functional and non-functional requirements for
> their (rather small) projects, which fill endless pages in the end.
> Additionally they have to describe the development tools and
> environment, the target audience, manual and automatic tests, and so on.
> It's the classic waterfall model with all the problems it has (don't ask
> - decision from the top...).
>
> In your case I'd start by defining a target audience and describing
> target use cases in a high-level manner (e.g. "identify the format and
> required loader for an image file", "load the image so that the user
> sees a representation which he understands", ...). Stating non-goals or
> non-features is a good thing to limit scope (your spec already mentions
> some).
>
> The list of use cases can be checked against relevance for the target
> audience. Also, the use cases can be translated into test cases of
> various kinds:
>
> - example snippets demonstrating ease of use of the design (or lack
> thereof)
> - automatic tests (e.g. unit tests)
> - manual tests of complex scenarios requiring user interaction (not
> relevant in this case, but e.g. for UI libraries)
>
> A very formal approach could number the use cases and tests and keep
> cross-references between them to make it easier to check for
> completeness. But for the most part, this would be tedious and boring
> for little gain.

Thank you. I'm definitely staring this to read later tomorrow when 
working on it on stream.
It'll also help with dismissing the arguments regarding signed/unsigned 
vs set size/word size of the coordinate type.