Speed of horizontal flip
John Colvin via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Apr 2 08:27:47 PDT 2015
On Thursday, 2 April 2015 at 11:49:44 UTC, Rikki Cattermole wrote:
> On 3/04/2015 12:29 a.m., John Colvin wrote:
>> On Thursday, 2 April 2015 at 09:55:15 UTC, Rikki Cattermole
>> wrote:
>>> On 2/04/2015 10:47 p.m., Rikki Cattermole wrote:
>>>> On 2/04/2015 2:52 a.m., tchaloupka wrote:
>>>>> Hi,
>>>>> I have a bunch of square r16 and png images which I need to
>>>>> flip
>>>>> horizontally.
>>>>>
>>>>> My flip method looks like this:
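>>>>> void hFlip(T)(T[] data, int w)
>>>>> {
>>>>>     import std.algorithm : reverse;
>>>>>     import std.datetime : StopWatch;
>>>>>     import std.stdio : writeln;
>>>>>
>>>>>     StopWatch sw;
>>>>>     sw.start();
>>>>>
>>>>>     // The images are square, so w is both the row length and the row count.
>>>>>     foreach(int i; 0..w)
>>>>>     {
>>>>>         auto row = data[i*w..(i+1)*w];
>>>>>         row.reverse(); // reverse one row in place
>>>>>     }
>>>>>
>>>>>     sw.stop();
>>>>>     writeln("Img flipped in: ", sw.peek().msecs, "[ms]");
>>>>> }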
>>>>>
>>>>> With the simple r16 file format it's pretty fast, but with RGB
>>>>> PNG
>>>>> files (2048x2048) I noticed it's somewhat slow, so I tried to
>>>>> compare it with C# and was pretty surprised by the results.
>>>>>
>>>>> C#:
>>>>> PNG load - 90ms
>>>>> PNG flip - 10ms
>>>>> PNG save - 380ms
>>>>>
>>>>> D using dlib (http://code.dlang.org/packages/dlib):
>>>>> PNG load - 500ms
>>>>> PNG flip - 30ms
>>>>> PNG save - 950ms
>>>>>
>>>>> D using imageformats
>>>>> (http://code.dlang.org/packages/imageformats):
>>>>> PNG load - 230ms
>>>>> PNG flip - 30ms
>>>>> PNG save - 1100ms
>>>>>
>>>>> I used dmd-2.067 with -release -inline -O
>>>>> C# was just with debug and VisualStudio attached to process
>>>>> for
>>>>> debugging and even with that it is much faster.
>>>>>
>>>>> I know that System.Drawing is using Windows GDI+, that can
>>>>> be
>>>>> used with D too, but not on linux.
>>>>> If we ignore the PNG loading and saving (I haven't tried libpng
>>>>> yet), even the flip method itself is 3 times slower - I don't
>>>>> know D
>>>>> enough to be sure if there isn't some more efficient way to
>>>>> make
>>>>> the flip. I like how the slices can be used here.
>>>>>
>>>>> For a C# user who expects things to just work as fast as
>>>>> possible in a system-level programming language, it is
>>>>> somewhat disappointing to see that the pure D version is about 3
>>>>> times slower.
>>>>>
>>>>> Am I doing something utterly wrong?
>>>>> Note that this example is not critical for me, it's just a
>>>>> simple
>>>>> hobby script I use to move and flip some images - I can
>>>>> wait. But
>>>>> I post it to see if this can be taken somewhat closer to
>>>>> what can
>>>>> be expected from a system level programming language.
>>>>>
>>>>> dlib:
>>>>> auto im = loadPNG(name);
>>>>> hFlip(cast(ubyte[3][])im.data, cast(int)im.width);
>>>>> savePNG(im, newName);
>>>>>
>>>>> imageformats:
>>>>> auto im = read_image(name);
>>>>> hFlip(cast(ubyte[3][])im.pixels, cast(int)im.w);
>>>>> write_image(newName, im.w, im.h, im.pixels);
>>>>>
>>>>> C# code:
>>>>> static void Main(string[] args)
>>>>> {
>>>>> var files = Directory.GetFiles(args[0]);
>>>>>
>>>>> foreach (var f in files)
>>>>> {
>>>>> var sw = Stopwatch.StartNew();
>>>>> var img = Image.FromFile(f);
>>>>>
>>>>> Debug.WriteLine("Img loaded in {0}[ms]",
>>>>> (int)sw.Elapsed.TotalMilliseconds);
>>>>> sw.Restart();
>>>>>
>>>>> img.RotateFlip(RotateFlipType.RotateNoneFlipX);
>>>>> Debug.WriteLine("Img flipped in {0}[ms]",
>>>>> (int)sw.Elapsed.TotalMilliseconds);
>>>>> sw.Restart();
>>>>>
>>>>> img.Save(Path.Combine(args[0], "test_" +
>>>>> Path.GetFileName(f)));
>>>>> Debug.WriteLine("Img saved in {0}[ms]",
>>>>> (int)sw.Elapsed.TotalMilliseconds);
>>>>> sw.Stop();
>>>>> }
>>>>> }
>>>>
>>>>
>>>> Assuming I've done it correctly, Devisualization.Image takes
>>>> around 8ms
>>>> in debug mode to flip horizontally using dmd. But 3ms for
>>>> release.
>>>>
>>>> module test;
>>>>
>>>> void main() {
>>>> import devisualization.image;
>>>> import devisualization.image.mutable;
>>>> import devisualization.util.core.linegraph;
>>>>
>>>> import std.stdio;
>>>>
>>>> writeln("===============\nREAD\n===============");
>>>> Image img = imageFromFile("test/large.png");
>>>> img = new MutableImage(img);
>>>>
>>>> import std.datetime : StopWatch;
>>>>
>>>> StopWatch sw;
>>>> sw.start();
>>>>
>>>> foreach(i; 0 .. 1000) {
>>>> img.flipHorizontal;
>>>> }
>>>>
>>>> sw.stop();
>>>>
>>>> writeln("Img flipped in: ", sw.peek().msecs / 1000,
>>>> "[ms]");
>>>> }
>>>>
>>>> I was planning on doing this earlier. But I discovered that a PR
>>>> I pulled,
>>>> which fixed things for 2.067, broke reading of chunk types.
>>>
>>> My bad, forgot I decreased test image resolution to 256x256.
>>> I'm
>>> totally out of the running. I have some serious work to do by
>>> the looks.
>>
>> Have you considered just being able to grab an object with
>> changed
>> iteration order instead of actually doing the flip? The same
>> goes for
>> transposes and 90º rotations. Sure, sometimes you do need to
>> actually
>> rearrange the memory and in a subset of those cases you need
>> it to be
>> done fast, but a lot of the time you're better off* just using
>> a
>> different iteration scheme (which, for ranges, should probably
>> be part
>> of the type to avoid checking the scheme every iteration).
>>
>> *for speed and memory reasons. Need to keep the original and
>> the
>> transpose? No need for any duplicates.
>>
>> Note that this is what numpy does with transposes. The .T and
>> .transpose
>> methods of ndarray don't actually modify the data, they just
>> set the
>> memory order** whereas the transpose function actually moves
>> memory around.
>>
>> **using a runtime flag, which is ok for them because internal
>> iteration
>> lets you only branch once on it.
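
To make the idea quoted above concrete, here is a minimal sketch in D of a lazily flipped view. It is only an illustration, not what Devisualization.Image, dlib or imageformats do: `hFlipView` is a name I made up, and it assumes the pixels sit in a flat row-major array with `w` pixels per row. Nothing is moved in memory; each row is simply iterated backwards.

```d
import std.algorithm : equal, map;
import std.range : chunks, retro;
import std.stdio : writeln;

/// Hypothetical helper: a lazy, horizontally flipped view of a
/// row-major image with `w` pixels per row. No pixel is copied or moved.
auto hFlipView(T)(T[] data, size_t w)
{
    // chunks(w) splits the flat array into rows (slices of the original);
    // retro walks each row from its last pixel to its first.
    return data.chunks(w).map!(row => row.retro);
}

void main()
{
    // A 3x2 "image" with one int per pixel.
    int[] img = [1, 2, 3,
                 4, 5, 6];

    auto flipped = hFlipView(img, 3);

    // Each row is seen reversed...
    assert(flipped.equal!((a, b) => equal(a, b))([[3, 2, 1], [6, 5, 4]]));
    // ...while the underlying data is untouched.
    assert(img == [1, 2, 3, 4, 5, 6]);

    foreach (row; flipped)
        writeln(row);
}
```

Because the reversal is baked into the range type, there is no per-element check of the iteration scheme, and the original orientation stays available without a duplicate.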
>
> I've got it down to ~12ms using dmd now. But if the image were
> much bigger (let's say a height of ushort.max), I wouldn't be
> able to use a little trick. But this is only because I'm using
> multithreading.
That would be an insanely large image. If it was square it would
be a 4GiB image. I think it's safe to say that someone with
images that large will be looking for quite specialised solutions
and wouldn't be disappointed if things aren't optimally fast
off-the-shelf!
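
For reference, the arithmetic behind that figure as a small D sketch. `imageBytes` is a made-up helper, and one byte per pixel is my assumption for the 4GiB number; with the 3-byte RGB pixels from the original post it would be three times that.

```d
import std.stdio : writefln;

/// Hypothetical helper: raw size in bytes of a square image.
ulong imageBytes(ulong side, ulong bytesPerPixel)
{
    return side * side * bytesPerPixel;
}

void main()
{
    enum double GiB = 1024.0 * 1024.0 * 1024.0;

    // ushort.max is 65_535, so a square image has 65_535 * 65_535 pixels.
    assert(imageBytes(ushort.max, 1) == 4_294_836_225UL);

    // Print the raw size for 1-byte and 3-byte (RGB) pixels.
    foreach (bpp; [1, 3])
        writefln("%s byte(s) per pixel: %.2f GiB",
                 bpp, imageBytes(ushort.max, bpp) / GiB);
}
```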