
Performance degrades in non-benchmark situations #109

@simdeistud

Description

I am using this library to process raw frames from a 60 fps camera. I wrote a simple benchmarking utility and I get excellent timings and almost 100% GPU utilization. However, when using the library in my real scenario, performance worsens considerably (for example, we go from 0.5 ms to 3 ms to encode an HD image).

After experimenting a bit, I modified my benchmark code by inserting a simple 17 ms sleep inside the encoding loop to simulate receiving images at 60 fps. After doing this and monitoring GPU activity, I discovered that the GPU works at full load for about 4 to 5 seconds and then its activity suddenly drops.

I think this could be related to some automatic workload management the GPU performs, but I don't know enough about CUDA or NVIDIA hardware to pinpoint what is happening, or whether there is a way to force the GPU to give GPUJPEG a higher "priority".
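(If the cause is the driver dropping the GPU to a lower performance state during the idle gaps between frames, which is a common effect with intermittent load, one way to test that hypothesis is to pin the clocks with nvidia-smi. These are standard nvidia-smi options, but the clock values below are placeholders; the supported values depend on the specific GPU.)

```shell
# Enable persistence mode so the driver keeps the GPU initialized
sudo nvidia-smi -pm 1

# List the clock rates this GPU actually supports
nvidia-smi -q -d SUPPORTED_CLOCKS

# Lock graphics clocks to a fixed range (example values; use ones
# reported as supported above)
sudo nvidia-smi -lgc 1500,1800

# Revert to default automatic clock management afterwards
sudo nvidia-smi -rgc
```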

Here you can see GPU usage when benchmarking in "normal" mode:

[image]

Here you can see GPU usage when benchmarking in "60fps" mode:

[image]

Here is my benchmark loop for reference:

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <libgpujpeg/gpujpeg.h>

struct gpujpeg_decoder_output decoder_output;
struct gpujpeg_parameters param;
struct gpujpeg_image_parameters param_image;
struct gpujpeg_decoder* decoder;

unsigned char *jpeg_input = //input image is loaded here;
size_t jpeg_input_size = //;

if (gpujpeg_init_device(0, 0))
{
  perror("Failed to initialize GPU device");
  return 1;
}

decoder = gpujpeg_decoder_create(0);
if (decoder == NULL)
{
  perror("Failed to create decoder");
  return 1;
}

gpujpeg_set_default_parameters(&param);
gpujpeg_image_set_default_parameters(&param_image);
gpujpeg_decoder_init(decoder, &param, &param_image);
gpujpeg_decoder_output_set_default(&decoder_output);
gpujpeg_decoder_set_output_format(decoder, GPUJPEG_RGB, GPUJPEG_444_U8_P012);
clock_t total_processing_time = 0;
int width = 0, height = 0;
for(int i = 0; i < iterations; ++i)
{
  clock_t t0 = clock();
  gpujpeg_decoder_decode(decoder, jpeg_input, jpeg_input_size, &decoder_output);
  clock_t t1 = clock();
  total_processing_time += t1 - t0;
  usleep(17 * 1000); // simulate 60fps, remove this line for continuous processing
}
printf("Total processing time (seconds):%f\n", ((double)total_processing_time) / CLOCKS_PER_SEC);
printf("Average processing time per iteration (milliseconds):%f\n", ((double)total_processing_time) / iterations / CLOCKS_PER_SEC * 1000);
printf("Average frames per second:%f\n", iterations / (((double)total_processing_time) / CLOCKS_PER_SEC));
printf("Average megapixels per second:%f\n", (decoder_output.data_size / 3) / (double)1000000 * iterations / (((double)total_processing_time) / CLOCKS_PER_SEC));
gpujpeg_decoder_destroy(decoder);
