Gfrktyl

Question

I'm currently running this function 60 times per second in order to get the dominant color on my screen in RGB values. It is using approximately 15% of my CPU on 30FPS and 25% of my CPU on 60FPS. Is there any way I can improve the efficiency of this loop or is there a better way to get the color altogether?

public Color getDominantColor(System.Drawing.Bitmap bmp) {

            BitmapData srcData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb);



            int stride = srcData.Stride;



            IntPtr Scan0 = srcData.Scan0;



            int totals = new int { 0, 0, 0 };



            int width = bmp.Width;

            int height = bmp.Height;



            unsafe

            {

                byte* p = (byte*)(void*)Scan0;



                for (int y = 0; y < height; y++) {

                    for (int x = 0; x < width; x++) {

                        for (int color = 0; color < 3; color++) {

                            int idx = (y * stride) + x * 4 + color;

                            totals[color] += p[idx];

                        }

                    }

                }

            }



            int avgB = totals[0] / (width * height);

            int avgG = totals[1] / (width * height);

            int avgR = totals[2] / (width * height);



            bmp.UnlockBits(srcData);



            return Color.FromArgb(avgR, avgG, avgB);

        }

The indentation looks very off - does it look exactly like that in your IDE? — Mar 13 '17 at 18:56
Did you try to move independent computations outside the loop and not recalculating stuff? — Mar 14 '17 at 5:01

Community♦ 1 · Answer 1 · 2017-05-23 12:40:52Z

If we look at your for loop very carefully we can see that you can eliminate two multiplications:

for (int y = 0; y < height; y++) {

    for (int x = 0; x < width; x++) {

        for (int color = 0; color < 3; color++) {

            int idx = (y * stride) + x * 4 + color;

            totals[color] += p[idx];

        }

    }

}

You don't use y except for y * stride, and you don't use x except for x * 4, rewrite the for loop and you can eliminate those entirely.

var heightLimit = height * stride;

var widthLimit = width * 4;



for (int y = 0; y < heightLimit; y += stride) {

    for (int x = 0; x < widthLimit; x += 4) {

        for (int color = 0; color < 3; color++) {

            int idx = y + x + color;

            totals[color] += p[idx];

        }

    }

}

By removing all three multiplication operations, we effectively reduce the amount of work we do by a significant portion. (There are roughly 8 instructions in the original, and 6 in the reduced, so we've eliminated 25% of our work.)

Other than that, there really isn't much you can do. You could consider chunking the work, then only recalculate the regions that were re-drawn if that's reasonable, you could also chunk and thread it (this wouldn't reduce CPU usage but it would reduce time spend on this method).

With as frequently as you call this method, the next lines may be a potential for optimization as well:

int avgB = totals[0] / (width * height);

int avgG = totals[1] / (width * height);

int avgR = totals[2] / (width * height);

Why do width * height in all three calculations? Why not store var pixelCount = width * height; then divide by pixelCount? Of course, division is still slow, but you're not using floating-point math so we can't use the reciprocal of it.

You could consider, as mentioned in a comment, using CUDA/OpenGL/GPU-level work. Basically, operating on the GPU itself instead of using the CPU to do what could be very efficient on the GPU. (It's built specifically for this type of processing.) There is at least one Stack Overflow question on running C# code on the GPU, it's not very easy or simple, but it can give you a lot of power.

Peter Taylor 15.3k2657 · Answer 2 · 2017-03-14 09:00:59Z

One useful performance trick when dealing with low-level pixel manipulations like this is that it's often possible to process red and blue together using the 8 bits for green as a gap. Since here you're just adding them, you can add 256 blue values before they would overflow past green into red.

Taking into account John Wu's comment about the stride being irrelevant you can do (untested, and in particular might have endianness bugs; it's several years since I regularly wrote this kind of code, and that was in Java rather than C#):

        unsafe

        {

            uint* p = (uint*)(void*)Scan0;



            uint pixelCount = width * height;

            uint idx = 0;

            while (idx < (pixelCount & ~0xff)) {

                uint sumRR00BB = 0;

                uint sum00GG00 = 0;

                for (int j = 0; j < 0x100; j++) {

                    sumRR00BB += p[idx] & 0xff00ff;

                    sum00GG00 += p[idx] & 0x00ff00;

                    idx++;

                }



                totals[0] += sumRR00BB >> 16;

                totals[1] += sum00GG00 >> 8;

                totals[2] += sumRR00BB & 0xffff;

            }



            // And the final partial block of fewer than 0x100 pixels.

            {

                uint sumRR00BB = 0;

                uint sum00GG00 = 0;

                while (idx < pixelCount) {

                    sumRR00BB += p[idx] & 0xff00ff;

                    sum00GG00 += p[idx] & 0x00ff00;

                    idx++;

                }



                totals[0] += sumRR00BB >> 16;

                totals[1] += sum00GG00 >> 8;

                totals[2] += sumRR00BB & 0xffff;

            }

        }

You need to change to totals[0] += sumRR00BB & 0xffff; and totals[2] += sumRR00BB >> 16; because the color components are stored BGR instead of RGB. — Mar 14 '17 at 11:45
This makes no sense anyway right? 'p' would be the BGR addresses, so it makes no sense to apply bitmasks over the same p[idx] value unless i'm missing something? Maybe that's how java does it in memory, but surely C# doesnt return uint* in that case. — Oct 31 at 19:43
@Viezevingertjes, I don't understand your objection. What does p[idx] contain if not a pixel's value? — Oct 31 at 19:57
p[idx] = 1 byte = 8 bits, 0-255 so R, G or B, not all three, why would those bitmasks work exactly? Not trying to be a dick, would be interested in how this theoretically would work but not seeying this work in C# as-is. — Oct 31 at 20:01
@Viezevingertjes, I tried to explain that in the first paragraph. If you sum 256 values in the range 0x00 to 0xff the result will be in the range 0x0000 to 0xffff. So if you mask with 0xff00ff and add 256 values then the top 16 bits of the result will contain the sums of the bits masked by 0xff0000 and the bottom 16 bits of the result will contain the sums of the bits masked by 0x0000ff. If that's a clearer explanation then I can edit it in, or feel free to do so yourself. — Nov 1 at 7:53

Community♦ 1 · Answer 3 · 2017-05-23 12:40:49Z

Five ideas, from easy to hard:

You can simplify your x/y loop to run in one dimension instead of two-- for ( i = 0; i < y * x * c; i += 4 ). Since you're looking at the whole image, you don't need to worry about the stride. Not only will this reduce the number of operations required, but you may get better performance from your CPU due to better pipelining and branch prediction.

If you can, use a lower color depth (I don't think you need 24 bits of color depth if you're just computing an average). The smaller storage size will yield a smaller memory area to scan. You will have to shift bits around to do the math, but that sort of thing is faster than memory access.

You could try resizing or scaling the bitmap to something lower rez. The resize operation will interpolate color. In theory you could scale it to a 1x1 image and just read that one pixel. If you use GDI+ to perform the scale it could use hardware acceleration and be very fast.

Keep a copy of the last bitmap and its totals. Use REPE CMPSD (yes, this is assembly) to compare your new bitmap to the old one and find non-matching cells. Adjust totals and recompute average. This is probably a little harder than it sounds but the scan would be incredibly fast. This option would work better if most pixels are expected to stay the same from frame to frame.

Do the entire scan in assembly, four bytes at a time. DWord operations, believe it or not, are faster than byte operations, for a modern CPU. You can get the byte you need through bit shifting, which take very few clock cycles. Been a while for me, but would look something like this:
```
    MOV ECX, ArrayLength ;ECX is our counter (= bytecount ÷ 4)

    MOV EDX, Scan0       ;EDX is our data pointer

    SUB BX, BX           ;Set BX to 0 for later

Loop:

    LODSL                ;Load EAX from array, increment pointer

    SHRL 8, EAX          ;Dump the least eight bits

    ADDB GreenTotal, AL  ;Add least 8 bits to green total

    ADCW GreenTotal+1,BX ;Carry the 1

    SHRL 8, EAX          ;Shift right 8 more bits

    ADDB BlueTotal, AL   ;Add least 8 bits to blue total

    ADCW BlueTotal+1, BX ;Carry the 1

    SHRL 8, EAX          ;Shift right 8 more bits

    ADDB RedTotal, AL    ;Add least 8 bits to red total

    ADCW RedTotal+1, BX  ;Carry the 1

    LOOPNZ Loop          ;Decrement CX and keep going until it is zero
```
If the assembly is too much to take on, you can try to do the same in C++ and maybe the compiler will do a pretty good job of it. At the very least, we have gotten rid of all of your multiplication operations (which can take up 5-20x the number of clock cycles compared to a shift), two of your loops, and a whole bunch of if conditionals (which would mess up your CPU's branch prediction). Also we will get nice big cache bursts regardless of the dword alignment of the byte buffer, because it is a single-dimensional contiguous blob.

I find it hard to believe that this would improve performance at all. — Mar 14 '17 at 4:15
Maybe, maybe not. Depends how you do it. If you render+transform the bitmap onto a framebuffer (that you never actually display), the operation may take advantage of GPU hardware acceleration-- that could actually be blazingly fast. — Mar 14 '17 at 6:20
Oh, for context, the answer has been edited since I replied to it -- when I replied, it was just the point about resizing. — Mar 14 '17 at 15:22
Haha, true. Anyway, most of this conversation is irrelevant; I've purged all but my first two comments. I'd recommend editing the think you commented into your answer as a potential resource for how to link C++ to C#; it's useful to speed things up. — Mar 14 '17 at 17:20

Community♦ 1 · Answer 4 · 2017-04-13 12:40:41Z

Validation

Because the method is public you shouldn't assume that you get a valid non null Bitmap. You should add a null check otherwise you are exposing implementation details of your method.

Naming

Based on the C# Naming guidelines methods should be named using PascalCase casing. Method level variables should be named using camelCase casing. Hence getDominantColor->GetDominantColor and IntPtr Scan0->IntPtr scan0.

Possible problems

You state in your question that this method is used to get the dominant color of your desktop. If you use it only for that, then all will be good.

A problem could come up if you use this method with some different bitmap.

If the bitmap which is passed is of DIN A4 size with e.g 300dpi your int totals will overflow.

Performance

I would suggest to use pointer arithmetic instead of calculating each time the idx value. I also would remove the most inner loop like @Zefick posted.

public System.Drawing.Color GetDominantColor(Bitmap bmp)

{

    if (bmp == null)

    {

        throw new ArgumentNullException("bmp");

    }



    BitmapData srcData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadOnly, bmp.PixelFormat);



    int bytesPerPixel = Image.GetPixelFormatSize(srcData.PixelFormat) / 8;



    int stride = srcData.Stride;



    IntPtr scan0 = srcData.Scan0;



    long totals = new long { 0, 0, 0 };



    int width = bmp.Width * bytesPerPixel;

    int height = bmp.Height;



    unsafe

    {

        byte* p = (byte*)(void*)scan0;



        for (int y = 0; y < height; y++)

        {

            for (int x = 0; x < width; x += bytesPerPixel)

            {

                totals[0] += p[x + 0];

                totals[1] += p[x + 1];

                totals[2] += p[x + 2];

            }



            p += stride;

        }

    }



    long pixelCount = bmp.Width * height;



    int avgB = Convert.ToInt32(totals[0] / pixelCount);

    int avgG = Convert.ToInt32(totals[1] / pixelCount);

    int avgR = Convert.ToInt32(totals[2] / pixelCount);



    bmp.UnlockBits(srcData);



    return Color.FromArgb(avgR, avgG, avgB);



}

Benchmarking with BechnmarkDotNet with x64 compiled yields

Yours: 17.5252 ms

EBrown's: 14.6109 ms

Mine: 8.4846 ms
Peter Taylor's: 4.6419 ms

Until @PeterTylor doesn't change its code please see my comment: Getting the dominant RGB color of a bitmap

score 1 · Answer 5 · 2017-03-14 05:31:30Z

At least you can unroll the most inner loop without losing quality following way:

for (int y = 0; y < height; y++) {

    for (int x = 0; x < width; x++) {

         int idx = (y * stride) + x * 4;

         totals[color] += p[idx];

         totals[color+1] += p[idx+1];

         totals[color+2] += p[idx+2];

    }

}

Potentially, the compiler can do this optimization itself, but I'm not sure that it does this inside an "unsafe" block.

Viezevingertjes 1375 · Answer 6 · 2018-11-13 07:51:18Z

Currently doing real-time processing where every frame counts, with a modified version of Peter Taylor's answer. Tested on a weak Raspberry PI it still resulted in 24 out of 25 frames per second. Which is amazing performance knowing what the specifications are of it and running it through Mono.

The method itself was hard to understand at first, but maybe this way of using it might clear a few things up.

Please note that this only works with 32bpp images due to the use of the Unsigned 32-bit integer to go through the memory.

    public unsafe uint GetAverageColorInRegion(Bitmap source, Rectangle region)

    {

        var rectangle = new Rectangle(0, 0, source.Width, source.Height);

        var bitmapData = source.LockBits(rectangle, ImageLockMode.ReadOnly, source.PixelFormat);

        var pixelCount = region.Width * region.Height;

        var scanWidth = bitmapData.Stride / 4;



        uint totals = { 0, 0, 0 };

        int flushCounter = 0;

        uint sumRR00BB = 0;

        uint sum00GG00 = 0;



        for (var y = region.Y; y < region.Y + region.Height; y++)

        {

            uint* row = (uint*)bitmapData.Scan0 + y * scanWidth;



            for (var x = region.X; x < region.X + region.Width; x++)

            {

                sumRR00BB += row[x] & 0xff00ff;

                sum00GG00 += row[x] & 0x00ff00;



                // Flush before overflow occurs.

                if (flushCounter++ == 0xff)

                {

                    totals[0] += sumRR00BB >> 16;

                    totals[1] += sum00GG00 >> 8;

                    totals[2] += sumRR00BB & 0xffff;



                    sumRR00BB = 0;

                    sum00GG00 = 0;



                    flushCounter = 0;

                }

            }

        }



        // Flush left-over's.

        totals[0] += sumRR00BB >> 16;

        totals[1] += sum00GG00 >> 8;

        totals[2] += sumRR00BB & 0xffff;



        // Average the totals.

        totals[0] /= (uint)pixelCount;

        totals[1] /= (uint)pixelCount;

        totals[2] /= (uint)pixelCount;



        source.UnlockBits(bitmapData);



        return totals;

    }

Community♦ 1 · Answer 7 · 2017-05-23 12:40:52Z

If we look at your for loop very carefully we can see that you can eliminate two multiplications:

for (int y = 0; y < height; y++) {

    for (int x = 0; x < width; x++) {

        for (int color = 0; color < 3; color++) {

            int idx = (y * stride) + x * 4 + color;

            totals[color] += p[idx];

        }

    }

}

You don't use y except for y * stride, and you don't use x except for x * 4, rewrite the for loop and you can eliminate those entirely.

var heightLimit = height * stride;

var widthLimit = width * 4;



for (int y = 0; y < heightLimit; y += stride) {

    for (int x = 0; x < widthLimit; x += 4) {

        for (int color = 0; color < 3; color++) {

            int idx = y + x + color;

            totals[color] += p[idx];

        }

    }

}

By removing all three multiplication operations, we effectively reduce the amount of work we do by a significant portion. (There are roughly 8 instructions in the original, and 6 in the reduced, so we've eliminated 25% of our work.)

Other than that, there really isn't much you can do. You could consider chunking the work, then only recalculate the regions that were re-drawn if that's reasonable, you could also chunk and thread it (this wouldn't reduce CPU usage but it would reduce time spend on this method).

With as frequently as you call this method, the next lines may be a potential for optimization as well:

int avgB = totals[0] / (width * height);

int avgG = totals[1] / (width * height);

int avgR = totals[2] / (width * height);

Why do width * height in all three calculations? Why not store var pixelCount = width * height; then divide by pixelCount? Of course, division is still slow, but you're not using floating-point math so we can't use the reciprocal of it.

You could consider, as mentioned in a comment, using CUDA/OpenGL/GPU-level work. Basically, operating on the GPU itself instead of using the CPU to do what could be very efficient on the GPU. (It's built specifically for this type of processing.) There is at least one Stack Overflow question on running C# code on the GPU, it's not very easy or simple, but it can give you a lot of power.

Peter Taylor 15.3k2657 · Answer 8 · 2017-03-14 09:00:59Z

One useful performance trick when dealing with low-level pixel manipulations like this is that it's often possible to process red and blue together using the 8 bits for green as a gap. Since here you're just adding them, you can add 256 blue values before they would overflow past green into red.

Taking into account John Wu's comment about the stride being irrelevant you can do (untested, and in particular might have endianness bugs; it's several years since I regularly wrote this kind of code, and that was in Java rather than C#):

        unsafe

        {

            uint* p = (uint*)(void*)Scan0;



            uint pixelCount = width * height;

            uint idx = 0;

            while (idx < (pixelCount & ~0xff)) {

                uint sumRR00BB = 0;

                uint sum00GG00 = 0;

                for (int j = 0; j < 0x100; j++) {

                    sumRR00BB += p[idx] & 0xff00ff;

                    sum00GG00 += p[idx] & 0x00ff00;

                    idx++;

                }



                totals[0] += sumRR00BB >> 16;

                totals[1] += sum00GG00 >> 8;

                totals[2] += sumRR00BB & 0xffff;

            }



            // And the final partial block of fewer than 0x100 pixels.

            {

                uint sumRR00BB = 0;

                uint sum00GG00 = 0;

                while (idx < pixelCount) {

                    sumRR00BB += p[idx] & 0xff00ff;

                    sum00GG00 += p[idx] & 0x00ff00;

                    idx++;

                }



                totals[0] += sumRR00BB >> 16;

                totals[1] += sum00GG00 >> 8;

                totals[2] += sumRR00BB & 0xffff;

            }

        }

You need to change to totals[0] += sumRR00BB & 0xffff; and totals[2] += sumRR00BB >> 16; because the color components are stored BGR instead of RGB. — Mar 14 '17 at 11:45
This makes no sense anyway right? 'p' would be the BGR addresses, so it makes no sense to apply bitmasks over the same p[idx] value unless i'm missing something? Maybe that's how java does it in memory, but surely C# doesnt return uint* in that case. — Oct 31 at 19:43
@Viezevingertjes, I don't understand your objection. What does p[idx] contain if not a pixel's value? — Oct 31 at 19:57
p[idx] = 1 byte = 8 bits, 0-255 so R, G or B, not all three, why would those bitmasks work exactly? Not trying to be a dick, would be interested in how this theoretically would work but not seeying this work in C# as-is. — Oct 31 at 20:01
@Viezevingertjes, I tried to explain that in the first paragraph. If you sum 256 values in the range 0x00 to 0xff the result will be in the range 0x0000 to 0xffff. So if you mask with 0xff00ff and add 256 values then the top 16 bits of the result will contain the sums of the bits masked by 0xff0000 and the bottom 16 bits of the result will contain the sums of the bits masked by 0x0000ff. If that's a clearer explanation then I can edit it in, or feel free to do so yourself. — Nov 1 at 7:53

Community♦ 1 · Answer 9 · 2017-05-23 12:40:49Z

Five ideas, from easy to hard:

You can simplify your x/y loop to run in one dimension instead of two-- for ( i = 0; i < y * x * c; i += 4 ). Since you're looking at the whole image, you don't need to worry about the stride. Not only will this reduce the number of operations required, but you may get better performance from your CPU due to better pipelining and branch prediction.

If you can, use a lower color depth (I don't think you need 24 bits of color depth if you're just computing an average). The smaller storage size will yield a smaller memory area to scan. You will have to shift bits around to do the math, but that sort of thing is faster than memory access.

You could try resizing or scaling the bitmap to something lower rez. The resize operation will interpolate color. In theory you could scale it to a 1x1 image and just read that one pixel. If you use GDI+ to perform the scale it could use hardware acceleration and be very fast.

Keep a copy of the last bitmap and its totals. Use REPE CMPSD (yes, this is assembly) to compare your new bitmap to the old one and find non-matching cells. Adjust totals and recompute average. This is probably a little harder than it sounds but the scan would be incredibly fast. This option would work better if most pixels are expected to stay the same from frame to frame.

Do the entire scan in assembly, four bytes at a time. DWord operations, believe it or not, are faster than byte operations, for a modern CPU. You can get the byte you need through bit shifting, which take very few clock cycles. Been a while for me, but would look something like this:
```
    MOV ECX, ArrayLength ;ECX is our counter (= bytecount ÷ 4)

    MOV EDX, Scan0       ;EDX is our data pointer

    SUB BX, BX           ;Set BX to 0 for later

Loop:

    LODSL                ;Load EAX from array, increment pointer

    SHRL 8, EAX          ;Dump the least eight bits

    ADDB GreenTotal, AL  ;Add least 8 bits to green total

    ADCW GreenTotal+1,BX ;Carry the 1

    SHRL 8, EAX          ;Shift right 8 more bits

    ADDB BlueTotal, AL   ;Add least 8 bits to blue total

    ADCW BlueTotal+1, BX ;Carry the 1

    SHRL 8, EAX          ;Shift right 8 more bits

    ADDB RedTotal, AL    ;Add least 8 bits to red total

    ADCW RedTotal+1, BX  ;Carry the 1

    LOOPNZ Loop          ;Decrement CX and keep going until it is zero
```
If the assembly is too much to take on, you can try to do the same in C++ and maybe the compiler will do a pretty good job of it. At the very least, we have gotten rid of all of your multiplication operations (which can take up 5-20x the number of clock cycles compared to a shift), two of your loops, and a whole bunch of if conditionals (which would mess up your CPU's branch prediction). Also we will get nice big cache bursts regardless of the dword alignment of the byte buffer, because it is a single-dimensional contiguous blob.

I find it hard to believe that this would improve performance at all. — Mar 14 '17 at 4:15
Maybe, maybe not. Depends how you do it. If you render+transform the bitmap onto a framebuffer (that you never actually display), the operation may take advantage of GPU hardware acceleration-- that could actually be blazingly fast. — Mar 14 '17 at 6:20
Oh, for context, the answer has been edited since I replied to it -- when I replied, it was just the point about resizing. — Mar 14 '17 at 15:22
Haha, true. Anyway, most of this conversation is irrelevant; I've purged all but my first two comments. I'd recommend editing the think you commented into your answer as a potential resource for how to link C++ to C#; it's useful to speed things up. — Mar 14 '17 at 17:20

Community♦ 1 · Answer 10 · 2017-04-13 12:40:41Z

Validation

Because the method is public you shouldn't assume that you get a valid non null Bitmap. You should add a null check otherwise you are exposing implementation details of your method.

Naming

Based on the C# Naming guidelines methods should be named using PascalCase casing. Method level variables should be named using camelCase casing. Hence getDominantColor->GetDominantColor and IntPtr Scan0->IntPtr scan0.

Possible problems

You state in your question that this method is used to get the dominant color of your desktop. If you use it only for that, then all will be good.

A problem could come up if you use this method with some different bitmap.

If the bitmap which is passed is of DIN A4 size with e.g 300dpi your int totals will overflow.

Performance

I would suggest to use pointer arithmetic instead of calculating each time the idx value. I also would remove the most inner loop like @Zefick posted.

public System.Drawing.Color GetDominantColor(Bitmap bmp)

{

    if (bmp == null)

    {

        throw new ArgumentNullException("bmp");

    }



    BitmapData srcData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadOnly, bmp.PixelFormat);



    int bytesPerPixel = Image.GetPixelFormatSize(srcData.PixelFormat) / 8;



    int stride = srcData.Stride;



    IntPtr scan0 = srcData.Scan0;



    long totals = new long { 0, 0, 0 };



    int width = bmp.Width * bytesPerPixel;

    int height = bmp.Height;



    unsafe

    {

        byte* p = (byte*)(void*)scan0;



        for (int y = 0; y < height; y++)

        {

            for (int x = 0; x < width; x += bytesPerPixel)

            {

                totals[0] += p[x + 0];

                totals[1] += p[x + 1];

                totals[2] += p[x + 2];

            }



            p += stride;

        }

    }



    long pixelCount = bmp.Width * height;



    int avgB = Convert.ToInt32(totals[0] / pixelCount);

    int avgG = Convert.ToInt32(totals[1] / pixelCount);

    int avgR = Convert.ToInt32(totals[2] / pixelCount);



    bmp.UnlockBits(srcData);



    return Color.FromArgb(avgR, avgG, avgB);



}

Benchmarking with BechnmarkDotNet with x64 compiled yields

Yours: 17.5252 ms

EBrown's: 14.6109 ms

Mine: 8.4846 ms
Peter Taylor's: 4.6419 ms

Until @PeterTylor doesn't change its code please see my comment: Getting the dominant RGB color of a bitmap

score 1 · Answer 11 · 2017-03-14 05:31:30Z

At least you can unroll the most inner loop without losing quality following way:

for (int y = 0; y < height; y++) {

    for (int x = 0; x < width; x++) {

         int idx = (y * stride) + x * 4;

         totals[color] += p[idx];

         totals[color+1] += p[idx+1];

         totals[color+2] += p[idx+2];

    }

}

Potentially, the compiler can do this optimization itself, but I'm not sure that it does this inside an "unsafe" block.

Viezevingertjes 1375 · Answer 12 · 2018-11-13 07:51:18Z

Currently doing real-time processing where every frame counts, with a modified version of Peter Taylor's answer. Tested on a weak Raspberry PI it still resulted in 24 out of 25 frames per second. Which is amazing performance knowing what the specifications are of it and running it through Mono.

The method itself was hard to understand at first, but maybe this way of using it might clear a few things up.

Please note that this only works with 32bpp images due to the use of the Unsigned 32-bit integer to go through the memory.

    public unsafe uint GetAverageColorInRegion(Bitmap source, Rectangle region)

    {

        var rectangle = new Rectangle(0, 0, source.Width, source.Height);

        var bitmapData = source.LockBits(rectangle, ImageLockMode.ReadOnly, source.PixelFormat);

        var pixelCount = region.Width * region.Height;

        var scanWidth = bitmapData.Stride / 4;



        uint totals = { 0, 0, 0 };

        int flushCounter = 0;

        uint sumRR00BB = 0;

        uint sum00GG00 = 0;



        for (var y = region.Y; y < region.Y + region.Height; y++)

        {

            uint* row = (uint*)bitmapData.Scan0 + y * scanWidth;



            for (var x = region.X; x < region.X + region.Width; x++)

            {

                sumRR00BB += row[x] & 0xff00ff;

                sum00GG00 += row[x] & 0x00ff00;



                // Flush before overflow occurs.

                if (flushCounter++ == 0xff)

                {

                    totals[0] += sumRR00BB >> 16;

                    totals[1] += sum00GG00 >> 8;

                    totals[2] += sumRR00BB & 0xffff;



                    sumRR00BB = 0;

                    sum00GG00 = 0;



                    flushCounter = 0;

                }

            }

        }



        // Flush left-over's.

        totals[0] += sumRR00BB >> 16;

        totals[1] += sum00GG00 >> 8;

        totals[2] += sumRR00BB & 0xffff;



        // Average the totals.

        totals[0] /= (uint)pixelCount;

        totals[1] /= (uint)pixelCount;

        totals[2] /= (uint)pixelCount;



        source.UnlockBits(bitmapData);



        return totals;

    }

搜尋此網誌

Gfrktyl

Getting the dominant RGB color of a bitmap

6 Answers
6

Validation

Naming

Possible problems

Performance

Your Answer

Post as a guest

6 Answers
6

6 Answers
6

Validation

Naming

Possible problems

Performance

Validation

Naming

Possible problems

Performance

Validation

Naming

Possible problems

Performance

Validation

Naming

Possible problems

Performance

Post as a guest

Popular posts from this blog

Список кардиналов, возведённых папой римским Каликстом III

Deduzione

Mysql.sock missing - “Can't connect to local MySQL server through socket”

Getting the dominant RGB color of a bitmap

6 Answers 6

Validation

Naming

Possible problems

Performance

Your Answer

Sign up or log in

Post as a guest

Post as a guest

6 Answers 6

6 Answers 6

Validation

Naming

Possible problems

Performance

Validation

Naming

Possible problems

Performance

Validation

Naming

Possible problems

Performance

Validation

Naming

Possible problems

Performance

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Список кардиналов, возведённых папой римским Каликстом III

Deduzione

Mysql.sock missing - “Can't connect to local MySQL server through socket”

6 Answers
6

6 Answers
6

6 Answers
6