Fast Image Pre-processing with OpenCV 2.4, C++, CUDA: Memory, CLAHE

August 19, 2015
The previous couple of posts described some retina image pre-processing with OpenCV and IPython notebooks. Python is great, but with about 88,000 images to pre-process (35,000 train and 53,000 test) I had my doubts about how long it would all take. Besides, I am a huge fan of CUDA, I have a GTX Titan GPU and I am not afraid to use it! The Python bindings of OpenCV 2.4.11, unfortunately, do not support CUDA, so we turn to C++.

Installation

Just a quick note: there are plenty of guides on OpenCV installation. Checking out the sources and building with CMake is a breeze, and building from source is also required for CUDA support.

Sources

All sources for this post are on my GitHub.

Preprocessing Steps

We do the following preprocessing on each image:
  1. Find eye contours, create mask
  2. Color transfer via histogram specification
  3. Convert to HSV, apply CLAHE on value channel
  4. Convert back to BGR (this is OpenCV, so BGR rather than RGB), apply mask to filter out the background
I will only highlight the more interesting parts of this process briefly.
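
For readers unfamiliar with the histogram specification in step 2, here is a minimal single-channel sketch of the general idea: remap the source levels so that the cumulative histogram of the source follows that of a reference image. It uses plain std::vector data instead of Mat so that it stands alone, the function names are made up for illustration, and it is not necessarily how the code in the repository does it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Normalized cumulative histogram (CDF) of an 8-bit channel.
std::array<double, 256> cumulativeHistogram(const std::vector<std::uint8_t>& pixels)
{
    assert(!pixels.empty());
    std::array<double, 256> cdf{};
    for (std::uint8_t p : pixels) cdf[p] += 1.0;
    for (int i = 1; i < 256; ++i) cdf[i] += cdf[i - 1];
    for (double& v : cdf) v /= static_cast<double>(pixels.size());
    return cdf;
}

// Remap `source` so that its histogram approximates that of `reference`.
// (Hypothetical helper, single channel only.)
std::vector<std::uint8_t> matchHistogram(const std::vector<std::uint8_t>& source,
                                         const std::vector<std::uint8_t>& reference)
{
    const std::array<double, 256> srcCdf = cumulativeHistogram(source);
    const std::array<double, 256> refCdf = cumulativeHistogram(reference);

    // For every source level, pick the smallest reference level whose CDF
    // reaches the source CDF. Both CDFs are non-decreasing, so j only grows.
    std::array<std::uint8_t, 256> lut{};
    int j = 0;
    for (int i = 0; i < 256; ++i) {
        while (j < 255 && refCdf[j] < srcCdf[i]) ++j;
        lut[i] = static_cast<std::uint8_t>(j);
    }

    std::vector<std::uint8_t> out;
    out.reserve(source.size());
    for (std::uint8_t p : source) out.push_back(lut[p]);
    return out;
}
```

For a color image the same mapping would be built and applied per channel, with one well-exposed image serving as the reference for the whole data set.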

Data Structures, Memory Management

Unlike Python, where OpenCV images are stored in NumPy arrays, OpenCV 2.4 in C++ uses Mat for images in host RAM and gpu::GpuMat for images in GPU memory. Some algorithms run on the GPU, some don’t. That means transfers between RAM and GPU memory may become an issue, since they are among the more time-consuming operations in GPU development. Raw CUDA code for memory transfers is cumbersome, but OpenCV makes them painless. On the other hand, when all kinds of transforms are wrapped in a class, keeping track of where the object to be transformed currently resides can become complicated and hard to maintain. My first stab at this was, perhaps, not the most elegant:
    class TransformImage
    {
    protected:
        // Host (RAM) images
        Mat _image;    // original
        Mat _enhanced; // after all transforms
        Mat _buf;      // buffer for intermediate operations

        // GPU counterparts of the members above
        gpu::GpuMat g_image;
        gpu::GpuMat g_enhanced;
        gpu::GpuMat g_buf;

        // Copy the current GPU result into the GPU buffer
        inline void MakeSafe() { g_enhanced.copyTo(g_buf); }
        ...
    };
OpenCV APIs for image transforms take a source and a destination parameter. It was not immediately obvious to me in which cases these two may refer to the same structure, so to be safe, I often made use of the buffer. Of course, accessors now take on an additional role of moving images back and forth between regular memory and the GPU:
    // Keep a host copy, upload to the GPU, and start the GPU result from the original
    void setImage(Mat& image)
        { image.copyTo(_image); g_image.upload(image); g_image.copyTo(g_enhanced); }
    Mat& getImage() { return _image; }
    void setChannel(Channels channel) { _channel = channel; }
    // Download the current GPU result before handing it back
    Mat& getEnhanced()
        { g_enhanced.download(_enhanced); return _enhanced; }
As the code shows, moving data between GPU and RAM takes very little effort in OpenCV from the coder’s perspective: upload() copies RAM -> GPU and download() copies GPU -> RAM.
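
One common way to keep this bookkeeping manageable is to record which copy of an image is current and to transfer only when the other side is actually requested. The sketch below illustrates that idea and nothing more: SyncedBuffer and all of its members are hypothetical names, and plain std::vector buffers stand in for Mat and GpuMat so that it compiles without OpenCV. In real code the two marked copies would be upload() and download().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Tracks which side holds the current data and copies lazily.
class SyncedBuffer
{
public:
    explicit SyncedBuffer(std::vector<std::uint8_t> host)
        : _host(std::move(host)), _hostValid(true), _deviceValid(false) {}

    // Read-only access never invalidates the other copy.
    const std::vector<std::uint8_t>& host()   { syncToHost();   return _host; }
    const std::vector<std::uint8_t>& device() { syncToDevice(); return _device; }

    // Mutable access marks the other copy as stale.
    std::vector<std::uint8_t>& mutableHost()
        { syncToHost(); _deviceValid = false; return _host; }
    std::vector<std::uint8_t>& mutableDevice()
        { syncToDevice(); _hostValid = false; return _device; }

    // Number of simulated transfers so far.
    std::size_t transfers() const { return _transfers; }

private:
    void syncToDevice()
    {
        if (_deviceValid) return;
        assert(_hostValid);   // at least one copy must be current
        _device = _host;      // stand-in for upload()
        _deviceValid = true;
        ++_transfers;
    }

    void syncToHost()
    {
        if (_hostValid) return;
        assert(_deviceValid); // at least one copy must be current
        _host = _device;      // stand-in for download()
        _hostValid = true;
        ++_transfers;
    }

    std::vector<std::uint8_t> _host, _device;
    bool _hostValid, _deviceValid;
    std::size_t _transfers = 0;
};
```

With something along these lines, a chain of GPU transforms costs a single upload, and the download happens only when the CPU side asks for the result.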

CLAHE on GPU

For a discussion of histogram equalization, see this tutorial.

This is somewhat off-topic for this post, but to break up all the bookkeeping, I thought it would be good to at least demonstrate how to do CLAHE on the GPU. The code does not differ much from the CPU version!

My transform class applies the CLAHE algorithm to a single channel of the image. I have a g_oneChannel array that holds the separate channels of the image. CLAHE is designed to work on single-channel (grayscale) images. If we want to apply it to an RGB (or, in the case of OpenCV, BGR) image, we first need to convert the image to HSV (or HSI or some such), apply the algorithm to the value (intensity) channel, and merge that channel back into the image. This is where the channel handling in my C++ code comes from. Fortunately, we can do all of this without leaving the GPU.
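
To make the "contrast limited" part less abstract, here is a rough CPU sketch of that step for a single tile of an 8-bit channel: clip the histogram, spread the clipped excess evenly over all bins, and use the scaled cumulative sum as a lookup table. It illustrates the general idea only and is not OpenCV's implementation: real CLAHE also interpolates between neighbouring tiles, and OpenCV derives the per-bin limit from the clip limit in its own way. The function name and the absolute clipCount parameter are assumptions of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Builds an equalization lookup table for one tile, limiting every
// histogram bin to `clipCount` pixels before the CDF is computed.
// (Hypothetical helper; `clipCount` is an absolute pixel count.)
std::array<std::uint8_t, 256> clippedEqualizationLut(const std::vector<std::uint8_t>& tile,
                                                     int clipCount)
{
    assert(!tile.empty() && clipCount > 0);

    std::array<int, 256> hist{};
    for (std::uint8_t p : tile) ++hist[p];

    // Clip tall bins and collect the excess.
    int excess = 0;
    for (int& h : hist) {
        if (h > clipCount) { excess += h - clipCount; h = clipCount; }
    }

    // Redistribute the excess evenly; the remainder goes to the first bins.
    const int share = excess / 256;
    const int remainder = excess % 256;
    for (int i = 0; i < 256; ++i) hist[i] += share + (i < remainder ? 1 : 0);

    // The scaled cumulative histogram is the lookup table.
    std::array<std::uint8_t, 256> lut{};
    const double scale = 255.0 / static_cast<double>(tile.size());
    long long sum = 0;
    for (int i = 0; i < 256; ++i) {
        sum += hist[i];
        lut[i] = static_cast<std::uint8_t>(sum * scale + 0.5);
    }
    return lut;
}
```

Each pixel p of the tile is then replaced by lut[p]. A lower clip count flattens the mapping and so limits how strongly contrast (and noise) is amplified. The GPU version in OpenCV hides all of this behind a few calls.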

    gpu::GpuMat& ApplyClahe()
    {
        Ptr<gpu::CLAHE> clahe = gpu::createCLAHE();

        clahe->setClipLimit(4.);
        clahe->setTilesGridSize(Size(16, 16));
        // Equalize the selected channel into the intermediate buffer
        clahe->apply(g_oneChannel[(int)_channel], g_buf);

        // Put the equalized channel back in place of the original one
        g_buf.copyTo(g_oneChannel[(int)_channel]);

        // Recombine the channels into the result image
        gpu::merge(g_oneChannel, g_enhanced);
        return g_enhanced;
    }

Notice the use of g_buf to hold the intermediate result, just to be safe. I did not check whether it could have been skipped.

The C++ version is reminiscent of the Python code from the tutorial, shown here inside a function so that the return statement is valid:

    import cv2

    def apply_clahe(img):  # function name added for illustration
        # img: single-channel 8-bit image
        # create a CLAHE object (arguments are optional)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        cl1 = clahe.apply(img)
        return cl1

Here are the “before” and “after” CLAHE images of Kaggle 16_left.jpeg:

[Images: eye, eye_enhanced]

2 thoughts on “Fast Image Pre-processing with OpenCV 2.4, C++, CUDA: Memory, CLAHE”

  1. Jon Lee

    Hello, I was wondering if you know how to use gpu::pyrDown and gpu::pyrUp in opencv? I’ve been having a very hard time finding anything related to my problem and I was wondering if you could help.

    for (int i = 0; i < Pyramid_Size; i++) {
        cv::gpu::pyrDown(DownUp, DownUp);
    }

    for (int i = 0; i < Pyramid_Size; i++) {
        cv::gpu::pyrUp(DownUp, DownUp);
    }

    This is where the problem lies. I found that once I use pyrDown and pyrUp, every pixel inside the GpuMat DownUp has a value of 0. Do you know why this may be?
