Speed of horizontal flip

Thu Apr 2 08:27:47 PDT 2015

On Thursday, 2 April 2015 at 11:49:44 UTC, Rikki Cattermole wrote:
> On 3/04/2015 12:29 a.m., John Colvin wrote:
>> On Thursday, 2 April 2015 at 09:55:15 UTC, Rikki Cattermole wrote:
>>> On 2/04/2015 10:47 p.m., Rikki Cattermole wrote:
>>>> On 2/04/2015 2:52 a.m., tchaloupka wrote:
>>>>> Hi,
>>>>> I have a bunch of square r16 and PNG images which I need to
>>>>> flip horizontally.
>>>>>
>>>>> My flip method looks like this:
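>>>>> void hFlip(T)(T[] data, int w)
>>>>> {
>>>>>   import std.algorithm : reverse;
>>>>>   import std.datetime : StopWatch;
>>>>>   import std.stdio : writeln;
>>>>>
>>>>>   StopWatch sw;
>>>>>   sw.start();
>>>>>
>>>>>   // w doubles as the row count, so this assumes a square image
>>>>>   foreach(int i; 0..w)
>>>>>   {
>>>>>     auto row = data[i*w..(i+1)*w];
>>>>>     row.reverse(); // reverses the row in place
>>>>>   }
>>>>>
>>>>>   sw.stop();
>>>>>   writeln("Img flipped in: ", sw.peek().msecs, "[ms]");
>>>>> }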
>>>>>
>>>>> With the simple r16 file format it's pretty fast, but with RGB
>>>>> PNG files (2048x2048) I noticed it's somewhat slow, so I tried
>>>>> to compare it with C# and was pretty surprised by the results.
>>>>>
>>>>> C#:
>>>>> PNG load - 90ms
>>>>> PNG flip - 10ms
>>>>> PNG save - 380ms
>>>>>
>>>>> D using dlib (http://code.dlang.org/packages/dlib):
>>>>> PNG load - 500ms
>>>>> PNG flip - 30ms
>>>>> PNG save - 950ms
>>>>>
>>>>> D using imageformats
>>>>> (http://code.dlang.org/packages/imageformats):
>>>>> PNG load - 230ms
>>>>> PNG flip - 30ms
>>>>> PNG save - 1100ms
>>>>>
>>>>> I used dmd 2.067 with -release -inline -O.
>>>>> C# was just a debug build with Visual Studio attached to the
>>>>> process for debugging, and even with that it is much faster.
>>>>>
>>>>> I know that System.Drawing is using Windows GDI+, which can be
>>>>> used with D too, but not on Linux.
>>>>> If we ignore the PNG loading and saving (I haven't tried libpng
>>>>> yet), even the flip method itself is 3 times slower. I don't
>>>>> know D well enough to be sure there isn't some more efficient
>>>>> way to do the flip. I like how the slices can be used here.
>>>>>
>>>>> For a C# user who expects things to just work as fast as
>>>>> possible in a system-level programming language, it is
>>>>> somewhat disappointing to see that the pure D version is about
>>>>> 3 times slower.
>>>>>
>>>>> Am I doing something utterly wrong?
>>>>> Note that this example is not critical for me, it's just a
>>>>> simple hobby script I use to move and flip some images, so I
>>>>> can wait. But I'm posting it to see if this can be brought
>>>>> somewhat closer to what can be expected from a system-level
>>>>> programming language.
>>>>>
>>>>> dlib:
>>>>> auto im = loadPNG(name);
>>>>> hFlip(cast(ubyte[3][])im.data, cast(int)im.width);
>>>>> savePNG(im, newName);
>>>>>
>>>>> imageformats:
>>>>> auto im = read_image(name);
>>>>> hFlip(cast(ubyte[3][])im.pixels, cast(int)im.w);
>>>>> write_image(newName, im.w, im.h, im.pixels);
>>>>>
>>>>> C# code:
>>>>> static void Main(string[] args)
>>>>> {
>>>>>     var files = Directory.GetFiles(args[0]);
>>>>>
>>>>>     foreach (var f in files)
>>>>>     {
>>>>>         var sw = Stopwatch.StartNew();
>>>>>         var img = Image.FromFile(f);
>>>>>
>>>>>         Debug.WriteLine("Img loaded in {0}[ms]",
>>>>>             (int)sw.Elapsed.TotalMilliseconds);
>>>>>         sw.Restart();
>>>>>
>>>>>         img.RotateFlip(RotateFlipType.RotateNoneFlipX);
>>>>>         Debug.WriteLine("Img flipped in {0}[ms]",
>>>>>             (int)sw.Elapsed.TotalMilliseconds);
>>>>>         sw.Restart();
>>>>>
>>>>>         img.Save(Path.Combine(args[0], "test_" +
>>>>>             Path.GetFileName(f)));
>>>>>         Debug.WriteLine("Img saved in {0}[ms]",
>>>>>             (int)sw.Elapsed.TotalMilliseconds);
>>>>>         sw.Stop();
>>>>>     }
>>>>> }
>>>>
>>>>
>>>> Assuming I've done it correctly, Devisualization.Image takes
>>>> around 8ms in debug mode to flip horizontally using dmd, but
>>>> 3ms for release.
>>>>
>>>> module test;
>>>>
>>>> void main() {
>>>>    import devisualization.image;
>>>>    import devisualization.image.mutable;
>>>>    import devisualization.util.core.linegraph;
>>>>
>>>>    import std.stdio;
>>>>
>>>>    writeln("===============\nREAD\n===============");
>>>>    Image img = imageFromFile("test/large.png");
>>>>    img = new MutableImage(img);
>>>>
>>>>    import std.datetime : StopWatch;
>>>>
>>>>    StopWatch sw;
>>>>    sw.start();
>>>>
>>>>    foreach(i; 0 .. 1000) {
>>>>        img.flipHorizontal;
>>>>    }
>>>>
>>>>    sw.stop();
>>>>
>>>>    writeln("Img flipped in: ", sw.peek().msecs / 1000, "[ms]");
>>>> }
>>>>
>>>> I was planning on doing this earlier. But I discovered that a
>>>> PR I pulled, which fixed things for 2.067, broke reading of
>>>> chunk types.
>>>
>>> My bad, forgot I decreased test image resolution to 256x256.
>>> I'm totally out of the running. I have some serious work to do
>>> by the looks of it.
>>
>> Have you considered just being able to grab an object with a
>> changed iteration order instead of actually doing the flip? The
>> same goes for transposes and 90° rotations. Sure, sometimes you
>> do need to actually rearrange the memory, and in a subset of
>> those cases you need it to be done fast, but a lot of the time
>> you're better off* just using a different iteration scheme
>> (which, for ranges, should probably be part of the type to avoid
>> checking the scheme every iteration).
>>
>> *for speed and memory reasons. Need to keep the original and the
>> transpose? No need for any duplicates.
>>
>> Note that this is what numpy does with transposes. The .T
>> attribute and .transpose method of ndarray don't actually modify
>> the data, they just set the memory order**, and the transpose
>> function likewise returns a view; memory only moves if you
>> explicitly copy the result.
>>
>> **using a runtime flag, which is ok for them because internal
>> iteration lets you only branch once on it.
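
To make the "changed iteration order" idea concrete, here is a minimal sketch in D using Phobos ranges. It assumes a row-major pixel buffer; the name hFlipView is made up for illustration and is not part of dlib, imageformats or Devisualization.Image. The mirroring is encoded in the range type, so nothing is checked per iteration and no pixel is moved.

```d
import std.algorithm : equal, map;
import std.range : chunks, retro;

/// Hypothetical helper: lazily yields each row of a row-major
/// image in mirrored order. `data` itself is left untouched.
auto hFlipView(T)(T[] data, size_t w)
{
    // chunks(w) splits the buffer into rows of w pixels;
    // retro walks each row back to front without copying.
    return data.chunks(w).map!(row => row.retro);
}

void main()
{
    // A 3x2 image stored row by row.
    int[] img = [1, 2, 3,
                 4, 5, 6];

    auto flipped = hFlipView(img, 3);

    // Each row is seen reversed...
    assert(flipped.equal!equal([[3, 2, 1], [6, 5, 4]]));
    // ...while the underlying memory is unchanged.
    assert(img == [1, 2, 3, 4, 5, 6]);
}
```

If the flipped pixels are needed in memory after all (e.g. to hand them to a PNG writer), the view can be materialised once with std.range.joiner and std.array.array, at the cost of one copy.
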
>
> I've got it down to ~12ms using dmd now. But if the image was
> much bigger (let's say a height of ushort.max), I wouldn't be
> able to use a little trick. But this is only because I'm using
> multithreading.

That would be an insanely large image. If it were square it would 
be a 4 GiB image at one byte per pixel. I think it's safe to say 
that someone with images that large will be looking for quite 
specialised solutions and wouldn't be disappointed if things 
aren't optimally fast off-the-shelf!
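
For anyone who wants to check the size estimate, a quick sketch of the arithmetic in D. The helper name imageBytes and the choice of 1, 3 and 4 bytes per pixel are just assumptions for illustration; the real footprint depends on the pixel format in use.

```d
import std.stdio : writefln;

/// Hypothetical helper: bytes needed for a w x h image
/// at the given number of bytes per pixel.
ulong imageBytes(ulong w, ulong h, ulong bytesPerPixel)
{
    return w * h * bytesPerPixel;
}

void main()
{
    enum ulong side = ushort.max; // 65_535
    enum double GiB = 1024.0 * 1024.0 * 1024.0;

    // 65_535 * 65_535 pixels, one byte each
    assert(imageBytes(side, side, 1) == 4_294_836_225UL);

    // Greyscale, RGB and RGBA at 8 bits per channel
    foreach (bpp; [1UL, 3UL, 4UL])
        writefln("%s bytes/pixel: %.2f GiB", bpp,
                 imageBytes(side, side, bpp) / GiB);
}
```

So the 4 GiB figure holds at one byte per pixel; an RGB image of that size would be roughly three times as large.
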