I am using this library to process raw frames from a 60 fps camera. I wrote a simple benchmarking utility and I get excellent timings and almost 100% GPU utilization. However, when using the library in my real scenario, performance worsens considerably (for example, encoding an HD image goes from 0.5 ms to 3 ms).
After experimenting a bit, I modified my benchmark code by inserting a simple 17 ms sleep inside the encoding loop to simulate receiving images at 60 fps. After doing this and monitoring GPU activity, I discovered that the GPU works at full load for about 4-5 seconds and then its activity suddenly drops.
I think this could be related to some automatic workload management the GPU performs (for instance, down-clocking during the idle gaps between frames), but I don't know enough about CUDA or NVIDIA hardware to pinpoint what happens, or whether there is any way of forcing the GPU to give GPUJPEG a higher "priority".
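If the drop really is the driver down-clocking the GPU while it sits idle between frames, it should be possible to rule that in or out by pinning the clocks. The following nvidia-smi commands are only a sketch of that test (they need root, clock locking is only available on newer GPUs and drivers, and the clock value shown is an example to be replaced with one your GPU actually reports):

```shell
# Query current clocks and the list of supported clock values
nvidia-smi -q -d CLOCK
nvidia-smi -q -d SUPPORTED_CLOCKS

# Enable persistence mode so the driver keeps the GPU initialized
sudo nvidia-smi -pm 1

# Lock graphics clocks to a fixed min,max range (example value;
# pick one reported under "Supported Clocks" for your GPU)
sudo nvidia-smi -lgc 1500,1500

# Reset clocks to default behavior afterwards
sudo nvidia-smi -rgc
```

If locking the clocks makes the activity drop disappear, that would point at power management rather than anything GPUJPEG does.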
Here you can see GPU usage when benchmarking in "normal" mode:

Here you can see GPU usage when benchmarking in "60fps" mode:

Here is my benchmark loop for reference:
struct gpujpeg_decoder_output decoder_output;
struct gpujpeg_parameters param;
struct gpujpeg_image_parameters param_image;
struct gpujpeg_decoder* decoder;
unsigned char *jpeg_input = /* input image is loaded here */;
size_t jpeg_input_size = /* ... */;

if (gpujpeg_init_device(0, 0))
{
    perror("Failed to initialize GPU device");
    return 1;
}

decoder = gpujpeg_decoder_create(0);
if (decoder == NULL)
{
    perror("Failed to create decoder");
    return 1;
}

gpujpeg_set_default_parameters(&param);
gpujpeg_image_set_default_parameters(&param_image);
gpujpeg_decoder_init(decoder, &param, &param_image);
gpujpeg_decoder_output_set_default(&decoder_output);
gpujpeg_decoder_set_output_format(decoder, GPUJPEG_RGB, GPUJPEG_444_U8_P012);

clock_t total_processing_time = 0;
for (int i = 0; i < iterations; ++i)
{
    clock_t t0 = clock();
    gpujpeg_decoder_decode(decoder, jpeg_input, jpeg_input_size, &decoder_output);
    clock_t t1 = clock();
    total_processing_time += t1 - t0;
    usleep(17 * 1000); // simulate 60fps, remove this line for continuous processing
}

printf("Total processing time (seconds): %f\n", ((double)total_processing_time) / CLOCKS_PER_SEC);
printf("Average processing time per iteration (milliseconds): %f\n", ((double)total_processing_time) / iterations / CLOCKS_PER_SEC * 1000);
printf("Average frames per second: %f\n", iterations / (((double)total_processing_time) / CLOCKS_PER_SEC));
printf("Average megapixels per second: %f\n", (decoder_output.data_size / 3) / (double)1000000 * iterations / (((double)total_processing_time) / CLOCKS_PER_SEC));
gpujpeg_decoder_destroy(decoder);