I am using this library to process raw frames from a 60 fps camera. I wrote a simple benchmarking utility and I get excellent timings and almost 100% GPU utilization. However, when using the library in my real scenario, performance worsens considerably (for example, encoding an HD image goes from 0.5 ms to 3 ms).
After experimenting a bit, I modified my benchmark code by inserting a simple 17 ms sleep inside the encoding loop to simulate receiving images at 60 fps. After doing this and monitoring GPU activity, I discovered that the GPU works at full load for about 4-5 seconds and then its activity suddenly drops.
I think this could be related to some automatic workload management the GPU performs (for instance, down-clocking during the idle gaps between frames), but I don't know enough about CUDA or NVIDIA hardware to pinpoint what happens, or whether there is any way of forcing the GPU to give GPUJPEG a higher "priority".
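If the drop really is the driver down-clocking the GPU while it sits idle between frames, it should be possible to rule that in or out by pinning the clocks. The following nvidia-smi commands are only a sketch of that test (they need root, clock locking is only available on newer GPUs and drivers, and the clock value shown is an example to be replaced with one your GPU actually reports):

```shell
# Query current clocks and the list of supported clock values
nvidia-smi -q -d CLOCK
nvidia-smi -q -d SUPPORTED_CLOCKS

# Enable persistence mode so the driver keeps the GPU initialized
sudo nvidia-smi -pm 1

# Lock graphics clocks to a fixed min,max range (example value;
# pick one reported under "Supported Clocks" for your GPU)
sudo nvidia-smi -lgc 1500,1500

# Reset clocks to default behavior afterwards
sudo nvidia-smi -rgc
```

If locking the clocks makes the activity drop disappear, that would point at power management rather than anything GPUJPEG does.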
Here you can see GPU usage when benchmarking in "normal" mode:

Here you can see GPU usage when benchmarking in "60fps" mode:

Here is my benchmark loop for reference:
struct gpujpeg_decoder_output decoder_output;
struct gpujpeg_parameters param;
struct gpujpeg_image_parameters param_image;
struct gpujpeg_decoder* decoder;
unsigned char *jpeg_input = /* input image is loaded here */;
size_t jpeg_input_size = /* ... */;

if (gpujpeg_init_device(0, 0))
{
    perror("Failed to initialize GPU device");
    return 1;
}

decoder = gpujpeg_decoder_create(0);
if (decoder == NULL)
{
    perror("Failed to create decoder");
    return 1;
}

gpujpeg_set_default_parameters(&param);
gpujpeg_image_set_default_parameters(&param_image);
gpujpeg_decoder_init(decoder, &param, &param_image);
gpujpeg_decoder_output_set_default(&decoder_output);
gpujpeg_decoder_set_output_format(decoder, GPUJPEG_RGB, GPUJPEG_444_U8_P012);

clock_t total_processing_time = 0;
for (int i = 0; i < iterations; ++i)
{
    clock_t t0 = clock();
    gpujpeg_decoder_decode(decoder, jpeg_input, jpeg_input_size, &decoder_output);
    clock_t t1 = clock();
    total_processing_time += t1 - t0;
    usleep(17 * 1000); // simulate 60fps, remove this line for continuous processing
}

printf("Total processing time (seconds): %f\n", ((double)total_processing_time) / CLOCKS_PER_SEC);
printf("Average processing time per iteration (milliseconds): %f\n", ((double)total_processing_time) / iterations / CLOCKS_PER_SEC * 1000);
printf("Average frames per second: %f\n", iterations / (((double)total_processing_time) / CLOCKS_PER_SEC));
printf("Average megapixels per second: %f\n", (decoder_output.data_size / 3) / (double)1000000 * iterations / (((double)total_processing_time) / CLOCKS_PER_SEC));
gpujpeg_decoder_destroy(decoder);