Speed of horizontal flip

Rikki Cattermole via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Apr 2 04:49:44 PDT 2015


On 3/04/2015 12:29 a.m., John Colvin wrote:
> On Thursday, 2 April 2015 at 09:55:15 UTC, Rikki Cattermole wrote:
>> On 2/04/2015 10:47 p.m., Rikki Cattermole wrote:
>>> On 2/04/2015 2:52 a.m., tchaloupka wrote:
>>>> Hi,
>>>> I have a bunch of square r16 and png images which I need to flip
>>>> horizontally.
>>>>
>>>> My flip method looks like this:
>>>> void hFlip(T)(T[] data, int w)
>>>> {
>>>>    import std.datetime : StopWatch;
>>>>
>>>>    StopWatch sw;
>>>>    sw.start();
>>>>
>>>>    foreach(int i; 0..w)
>>>>    {
>>>>      auto row = data[i*w..(i+1)*w];
>>>>      row.reverse();
>>>>    }
>>>>
>>>>    sw.stop();
>>>>    writeln("Img flipped in: ", sw.peek().msecs, "[ms]");
>>>> }
>>>>
>>>> With the simple r16 file format it's pretty fast, but with RGB PNG
>>>> files (2048x2048) I noticed it's somewhat slow, so I tried to
>>>> compare it with C# and was pretty surprised by the results.
>>>>
>>>> C#:
>>>> PNG load - 90ms
>>>> PNG flip - 10ms
>>>> PNG save - 380ms
>>>>
>>>> D using dlib (http://code.dlang.org/packages/dlib):
>>>> PNG load - 500ms
>>>> PNG flip - 30ms
>>>> PNG save - 950ms
>>>>
>>>> D using imageformats
>>>> (http://code.dlang.org/packages/imageformats):
>>>> PNG load - 230ms
>>>> PNG flip - 30ms
>>>> PNG save - 1100ms
>>>>
>>>> I used dmd 2.067 with -release -inline -O
>>>> C# was just with debug and VisualStudio attached to process for
>>>> debugging and even with that it is much faster.
>>>>
>>>> I know that System.Drawing is using Windows GDI+, that can be
>>>> used with D too, but not on linux.
>>>> If we ignore the PNG loading and saving (haven't tried libpng
>>>> yet), even the flip method itself is 3 times slower - I don't know D
>>>> well enough to be sure there isn't some more efficient way to do
>>>> the flip. I like how the slices can be used here.
>>>>
>>>> For a C# user who is expecting things to just work as fast as
>>>> possible from a system level programming language this can be
>>>> somewhat disappointing to see that pure D version is about 3
>>>> times slower.
>>>>
>>>> Am I doing something utterly wrong?
>>>> Note that this example is not critical for me, it's just a simple
>>>> hobby script I use to move and flip some images - I can wait. But
>>>> I post it to see if this can be taken somewhat closer to what can
>>>> be expected from a system level programming language.
>>>>
>>>> dlib:
>>>> auto im = loadPNG(name);
>>>> hFlip(cast(ubyte[3][])im.data, cast(int)im.width);
>>>> savePNG(im, newName);
>>>>
>>>> imageformats:
>>>> auto im = read_image(name);
>>>> hFlip(cast(ubyte[3][])im.pixels, cast(int)im.w);
>>>> write_image(newName, im.w, im.h, im.pixels);
>>>>
>>>> C# code:
>>>> static void Main(string[] args)
>>>> {
>>>>     var files = Directory.GetFiles(args[0]);
>>>>
>>>>     foreach (var f in files)
>>>>     {
>>>>         var sw = Stopwatch.StartNew();
>>>>         var img = Image.FromFile(f);
>>>>
>>>>         Debug.WriteLine("Img loaded in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
>>>>         sw.Restart();
>>>>
>>>>         img.RotateFlip(RotateFlipType.RotateNoneFlipX);
>>>>         Debug.WriteLine("Img flipped in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
>>>>         sw.Restart();
>>>>
>>>>         img.Save(Path.Combine(args[0], "test_" + Path.GetFileName(f)));
>>>>         Debug.WriteLine("Img saved in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
>>>>         sw.Stop();
>>>>     }
>>>> }
>>>
>>>
>>> Assuming I've done it correctly, Devisualization.Image takes around 8ms
>>> in debug mode to flip horizontally using dmd. But 3ms for release.
>>>
>>> module test;
>>>
>>> void main() {
>>>     import devisualization.image;
>>>     import devisualization.image.mutable;
>>>     import devisualization.util.core.linegraph;
>>>
>>>     import std.stdio;
>>>
>>>     writeln("===============\nREAD\n===============");
>>>     Image img = imageFromFile("test/large.png");
>>>     img = new MutableImage(img);
>>>
>>>     import std.datetime : StopWatch;
>>>
>>>     StopWatch sw;
>>>     sw.start();
>>>
>>>     foreach(i; 0 .. 1000) {
>>>         img.flipHorizontal;
>>>     }
>>>
>>>     sw.stop();
>>>
>>>     writeln("Img flipped in: ", sw.peek().msecs / 1000, "[ms]");
>>> }
>>>
>>> I was planning on doing this earlier. But I discovered that a PR I
>>> pulled, which fixed things for 2.067, broke reading of chunk types.
>>
>> My bad, forgot I decreased test image resolution to 256x256. I'm
>> totally out of the running. I have some serious work to do by the looks.
>
> Have you considered just being able to grab an object with changed
> iteration order instead of actually doing the flip? The same goes for
> transposes and 90° rotations. Sure, sometimes you do need to actually
> rearrange the memory and in a subset of those cases you need it to be
> done fast, but a lot of the time you're better off* just using a
> different iteration scheme (which, for ranges, should probably be part
> of the type to avoid checking the scheme every iteration).
>
> *for speed and memory reasons. Need to keep the original and the
> transpose? No need for any duplicates.
>
> Note that this is what numpy does with transposes. The .T and .transpose
> methods of ndarray don't actually modify the data, they just set the
> memory order** whereas the transpose function actually moves memory around.
>
> **using a runtime flag, which is ok for them because internal iteration
> lets you only branch once on it.
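
A minimal sketch of the "changed iteration order" idea, assuming a
row-major pixel buffer. hFlipView is a made-up name for illustration, not
part of dlib, imageformats or Devisualization.Image. It only uses chunks,
retro and map from Phobos, and no pixel is moved:

```d
import std.algorithm : map;
import std.array : array;
import std.range : chunks, retro;

// Hypothetical helper: a lazy, horizontally flipped view of a row-major
// buffer. Each row of width w is walked backwards; the data is untouched.
auto hFlipView(T)(T[] data, size_t w)
{
    return data.chunks(w).map!(row => row.retro());
}

void main()
{
    // 3x2 "image", one int per pixel to keep the example short.
    int[] pixels = [1, 2, 3,
                    4, 5, 6];

    auto flipped = hFlipView(pixels, 3);

    // Materialise the view only so it can be compared in one assert.
    int[][] rows = flipped.map!(r => r.array).array;
    assert(rows == [[3, 2, 1], [6, 5, 4]]);

    // The original buffer still has its original order.
    assert(pixels == [1, 2, 3, 4, 5, 6]);
}
```

Because the reversal is part of the range type, a consumer that only
iterates the pixels never pays for a copy. Code that needs the flipped
bytes laid out in memory (a PNG encoder, say) would still have to
materialise them.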

I've got it down to ~12ms using dmd now. But if the image were much
bigger (let's say a height of ushort.max), I wouldn't be able to use a
little trick. But this is only because I'm using multithreading.
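
As a rough illustration of what a multithreaded in-place flip can look
like (a sketch only, not the Devisualization.Image code and not the trick
mentioned above; hFlipParallel is a made-up name), rows can be handed to
std.parallelism and reversed independently:

```d
import std.algorithm : reverse;
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical helper: reverse every row of a row-major w*h buffer in
// place, letting the default task pool spread the rows across threads.
void hFlipParallel(T)(T[] data, size_t w, size_t h)
{
    assert(data.length == w * h, "buffer size must match w * h");

    foreach (y; parallel(iota(h)))
    {
        // Rows never overlap, so each one can be reversed on its own.
        reverse(data[y * w .. (y + 1) * w]);
    }
}

void main()
{
    // 3x2 "image", one int per pixel to keep the example short.
    int[] pixels = [1, 2, 3,
                    4, 5, 6];

    hFlipParallel(pixels, 3, 2);

    assert(pixels == [3, 2, 1, 6, 5, 4]);
}
```

Whether this beats a plain loop depends on image size and core count, so
it would need timing against the single-threaded version before drawing
any conclusion.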

