Speed of horizontal flip

Thu Apr 2 17:14:28 PDT 2015

On 3/04/2015 4:27 a.m., John Colvin wrote:
> On Thursday, 2 April 2015 at 11:49:44 UTC, Rikki Cattermole wrote:
>> On 3/04/2015 12:29 a.m., John Colvin wrote:
>>> On Thursday, 2 April 2015 at 09:55:15 UTC, Rikki Cattermole wrote:
>>>> On 2/04/2015 10:47 p.m., Rikki Cattermole wrote:
>>>>> On 2/04/2015 2:52 a.m., tchaloupka wrote:
>>>>>> Hi,
>>>>>> I have a bunch of square r16 and png images which I need to flip
>>>>>> horizontally.
>>>>>>
>>>>>> My flip method looks like this:
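>>>>>> void hFlip(T)(T[] data, int w)
>>>>>> {
>>>>>>   import std.algorithm : reverse;
>>>>>>   import std.datetime : StopWatch;
>>>>>>   import std.stdio : writeln;
>>>>>>
>>>>>>   StopWatch sw;
>>>>>>   sw.start();
>>>>>>
>>>>>>   // The images are square, so the row count equals the width w.
>>>>>>   foreach(int i; 0..w)
>>>>>>   {
>>>>>>     // Slice out row i and reverse it in place.
>>>>>>     auto row = data[i*w..(i+1)*w];
>>>>>>     row.reverse();
>>>>>>   }
>>>>>>
>>>>>>   sw.stop();
>>>>>>   writeln("Img flipped in: ", sw.peek().msecs, "[ms]");
>>>>>> }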
>>>>>>
>>>>>> With the simple r16 file format it's pretty fast, but with RGB PNG
>>>>>> files (2048x2048) I noticed it's somewhat slow, so I tried to
>>>>>> compare it with C# and was pretty surprised by the results.
>>>>>>
>>>>>> C#:
>>>>>> PNG load - 90ms
>>>>>> PNG flip - 10ms
>>>>>> PNG save - 380ms
>>>>>>
>>>>>> D using dlib (http://code.dlang.org/packages/dlib):
>>>>>> PNG load - 500ms
>>>>>> PNG flip - 30ms
>>>>>> PNG save - 950ms
>>>>>>
>>>>>> D using imageformats
>>>>>> (http://code.dlang.org/packages/imageformats):
>>>>>> PNG load - 230ms
>>>>>> PNG flip - 30ms
>>>>>> PNG save - 1100ms
>>>>>>
>>>>>> I used dmd 2.067 with -release -inline -O
>>>>>> C# was just a debug build with Visual Studio attached to the process
>>>>>> for debugging, and even with that it is much faster.
>>>>>>
>>>>>> I know that System.Drawing is using Windows GDI+, which can be
>>>>>> used with D too, but not on Linux.
>>>>>> If we ignore the PNG loading and saving (I haven't tried libpng
>>>>>> yet), even the flip method itself is 3 times slower. I don't know D
>>>>>> well enough to be sure whether there is a more efficient way to do
>>>>>> the flip. I like how the slices can be used here.
>>>>>>
>>>>>> For a C# user who expects things to just work as fast as
>>>>>> possible in a system-level programming language, it is
>>>>>> somewhat disappointing to see that the pure D version is about 3
>>>>>> times slower.
>>>>>>
>>>>>> Am I doing something utterly wrong?
>>>>>> Note that this example is not critical for me, it's just a simple
>>>>>> hobby script I use to move and flip some images - I can wait. But
>>>>>> I post it to see if this can be taken somewhat closer to what can
>>>>>> be expected from a system level programming language.
>>>>>>
>>>>>> dlib:
>>>>>> auto im = loadPNG(name);
>>>>>> hFlip(cast(ubyte[3][])im.data, cast(int)im.width);
>>>>>> savePNG(im, newName);
>>>>>>
>>>>>> imageformats:
>>>>>> auto im = read_image(name);
>>>>>> hFlip(cast(ubyte[3][])im.pixels, cast(int)im.w);
>>>>>> write_image(newName, im.w, im.h, im.pixels);
>>>>>>
>>>>>> C# code:
>>>>>> static void Main(string[] args)
>>>>>> {
>>>>>>     var files = Directory.GetFiles(args[0]);
>>>>>>
>>>>>>     foreach (var f in files)
>>>>>>     {
>>>>>>         var sw = Stopwatch.StartNew();
>>>>>>         var img = Image.FromFile(f);
>>>>>>
>>>>>>         Debug.WriteLine("Img loaded in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
>>>>>>         sw.Restart();
>>>>>>
>>>>>>         img.RotateFlip(RotateFlipType.RotateNoneFlipX);
>>>>>>         Debug.WriteLine("Img flipped in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
>>>>>>         sw.Restart();
>>>>>>
>>>>>>         img.Save(Path.Combine(args[0], "test_" + Path.GetFileName(f)));
>>>>>>         Debug.WriteLine("Img saved in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
>>>>>>         sw.Stop();
>>>>>>     }
>>>>>> }
>>>>>
>>>>>
>>>>> Assuming I've done it correctly, Devisualization.Image takes around
>>>>> 8ms in debug mode to flip horizontally using dmd. But 3ms for release.
>>>>>
>>>>> module test;
>>>>>
>>>>> void main() {
>>>>>    import devisualization.image;
>>>>>    import devisualization.image.mutable;
>>>>>    import devisualization.util.core.linegraph;
>>>>>
>>>>>    import std.stdio;
>>>>>
>>>>>    writeln("===============\nREAD\n===============");
>>>>>    Image img = imageFromFile("test/large.png");
>>>>>    img = new MutableImage(img);
>>>>>
>>>>>    import std.datetime : StopWatch;
>>>>>
>>>>>    StopWatch sw;
>>>>>    sw.start();
>>>>>
>>>>>    foreach(i; 0 .. 1000) {
>>>>>        img.flipHorizontal;
>>>>>    }
>>>>>
>>>>>    sw.stop();
>>>>>
>>>>>    writeln("Img flipped in: ", sw.peek().msecs / 1000, "[ms]");
>>>>> }
>>>>>
>>>>> I was planning on doing this earlier, but I discovered that a PR I
>>>>> pulled, which fixed things for 2.067, broke reading of chunk types.
>>>>
>>>> My bad, forgot I decreased test image resolution to 256x256. I'm
>>>> totally out of the running. I have some serious work to do by the
>>>> looks.
>>>
>>> Have you considered just being able to grab an object with changed
>>> iteration order instead of actually doing the flip? The same goes for
>>> transposes and 90° rotations. Sure, sometimes you do need to actually
>>> rearrange the memory, and in a subset of those cases you need it to be
>>> done fast, but a lot of the time you're better off* just using a
>>> different iteration scheme (which, for ranges, should probably be part
>>> of the type to avoid checking the scheme every iteration).
>>>
>>> *for speed and memory reasons. Need to keep the original and the
>>> transpose? No need for any duplicates.
>>>
>>> Note that this is what numpy does with transposes. The .T and .transpose
>>> methods of ndarray don't actually modify the data, they just set the
>>> memory order** whereas the transpose function actually moves memory
>>> around.
>>>
>>> **using a runtime flag, which is ok for them because internal iteration
>>> lets you only branch once on it.
>>
>> I've got it down to ~12ms using dmd now. But if the image was much
>> bigger (let's say a height of ushort.max), I wouldn't be able to use a
>> little trick. But this is only because I'm using multithreading.
>
> That would be an insanely large image. If it was square it would be a
> 4GiB image. I think it's safe to say that someone with images that large
> will be looking for quite specialised solutions and wouldn't be
> disappointed if things aren't optimally fast off-the-shelf!

Most image editing software could definitely not handle it. I would be
very surprised if e.g. libpng can even read such a file. Although I'm
pretty sure mine can ;)

Worst-case scenario, for more than ushort.max I think it'll be a couple
hundred ms.
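
For anyone who wants to try the multithreaded route themselves, here is a minimal sketch using std.parallelism. To be clear, this is not the Devisualization.Image code and not the trick mentioned above; `hFlipParallel` is a hypothetical name, and it assumes a row-major buffer whose length is a multiple of the row width. Rows are disjoint slices, so each one can be reversed in place on whatever worker thread picks it up:

```d
import std.algorithm : reverse;
import std.parallelism : parallel;
import std.range : iota;

/// Hypothetical helper: flips a row-major image horizontally in place,
/// handing the rows out to the default task pool.
void hFlipParallel(T)(T[] data, size_t w)
{
    assert(w > 0 && data.length % w == 0, "buffer must hold whole rows");
    immutable h = data.length / w; // number of rows, need not equal w

    // Each iteration touches only its own row, so no locking is needed.
    foreach (y; parallel(iota(h)))
        reverse(data[y * w .. (y + 1) * w]);
}

void main()
{
    // A 2x3 "image" with one int per pixel.
    int[] img = [1, 2, 3,
                 4, 5, 6];

    hFlipParallel(img, 3);

    assert(img == [3, 2, 1,
                   6, 5, 4]);
}
```

For an image as small as 2048x2048 the cost of handing work to the task pool may well eat most of the gain, so it is worth timing against the plain loop before committing to it.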