ENH: kernels for random.vonmises; part 2 #681
Conversation
Vvec = reinterpret_cast<_DataType*>(dpnp_memory_alloc_c(size * sizeof(_DataType)));

for (size_t n = 0; n < size;)
n = reinterpret_cast<size_t*>(dpnp_memory_alloc_c(sizeof(size_t)));
This is quite strange (making a scalar into an array with one element).
I think it should be a scalar, not an array.
A scalar and this are the same thing. You just cannot pass n into the SYCL region in any other way.
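For context, a minimal standalone sketch of the point being made, assuming plain SYCL with USM rather than the DPNP allocator (the kernel name and values are illustrative, not the PR's code): a host scalar captured by value is copied into the kernel, so any update made there never reaches the host; a one-element device-accessible allocation is what lets the kernel write the value back.

```cpp
#include <CL/sycl.hpp>
#include <iostream>

int main()
{
    cl::sycl::queue q;

    // A one-element USM allocation plays the role of the "scalar": the kernel can
    // write through the pointer and the host sees the update after the kernel ends.
    size_t* n = cl::sycl::malloc_shared<size_t>(1, q);
    n[0] = 0;

    q.submit([&](cl::sycl::handler& cgh) {
        cgh.single_task<class write_back_n>([=]() {
            n[0] = 42; // a host scalar captured by value could not be updated like this
        });
    }).wait();

    std::cout << n[0] << std::endl; // prints 42

    cl::sycl::free(n, q);
    return 0;
}
```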
Y = 0.0;
else if (Y > 1.0)
Y = 1.0;
n[0] = n[0] + 1;
This is a mistake. This is a parallel environment (a SYCL kernel); writing into the same memory location from multiple work-items causes a race condition: https://en.wikipedia.org/wiki/Race_condition
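As an illustration (a standalone sketch, not the PR's kernel), an unsynchronized counter increment inside a parallel_for shows the problem: many work-items perform a read-modify-write on the same location, so the final value usually ends up far below the number of work-items.

```cpp
#include <CL/sycl.hpp>
#include <iostream>

int main()
{
    cl::sycl::queue q;
    constexpr size_t size = 1 << 16;

    size_t* n = cl::sycl::malloc_shared<size_t>(1, q);
    n[0] = 0;

    q.submit([&](cl::sycl::handler& cgh) {
        cgh.parallel_for<class racy_increment>(
            cl::sycl::range<1>(size), [=](cl::sycl::id<1>) {
                n[0] = n[0] + 1; // unsynchronized read-modify-write: a data race
            });
    }).wait();

    // Typically prints a value much smaller than size.
    std::cout << n[0] << " (expected " << size << " without the race)" << std::endl;

    cl::sycl::free(n, q);
    return 0;
}
```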
V = Vvec[i];
sn2 = sn * sn;
cn2 = cn * cn;
auto paral_kernel_some = [&](cl::sycl::handler& cgh) {
This kernel is submitted inside a loop with a large trip count. It would be more efficient to parallelize (build the kernel) over the larger value size instead of size - n, which will require a loop inside the kernel.
It is questionable which will be more performant:
- a loop with a queue of kernels (data dependent)
- a kernel with a loop inside
It is hard to predict without performance measurements, but I would vote that parallelization with a larger number of threads should be better.
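A sketch of the "kernel with a loop inside" option (illustrative shape only: the per-item loop here is a trivial stand-in for the data-dependent rejection loop, not the actual von Mises math): the kernel is launched once over the full size, and each work-item iterates privately until it has produced its own element, writing only to its own slot.

```cpp
#include <CL/sycl.hpp>
#include <iostream>

int main()
{
    cl::sycl::queue q;
    constexpr size_t size = 1024;

    double* result = cl::sycl::malloc_shared<double>(size, q);

    q.submit([&](cl::sycl::handler& cgh) {
        cgh.parallel_for<class kernel_with_loop>(
            cl::sycl::range<1>(size), [=](cl::sycl::id<1> idx) {
                const size_t i = idx[0];
                // Stand-in for the data-dependent loop: keep refining the candidate
                // until it passes the (here, trivial) acceptance test.
                double candidate = static_cast<double>(i);
                while (candidate >= 1.0)
                {
                    candidate *= 0.5;
                }
                result[i] = candidate; // each work-item owns one slot: no race
            });
    }).wait();

    std::cout << result[0] << " " << result[size - 1] << std::endl;

    cl::sycl::free(result, q);
    return 0;
}
```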
@samir-nasibli Is this PR ready for review or still in the development stage?
I will update this PR, or move part of these changes to another PR and close this one.
Changed the title from "random.vonmises" to "random.vonmises; part 2".
*n = *n + 1;
result1[*n] = cl::sycl::asin(cl::sycl::sqrt(Y));
It looks like we are getting a race condition here, which is why we are getting wrong results. To prevent it, we should calculate n (the index into the result) from i.
*n = *n + 1;
result1[*n] = cl::sycl::acos(W);
It looks like we are getting a race condition here as well, which is why we are getting wrong results. To prevent it, we should calculate n (the index into the result) from i; see the sketch below.
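A minimal sketch of that suggestion (standalone code with placeholder buffers and values, not the PR's exact kernel): the destination index is derived from the work-item id i, so each item writes only its own slot and the shared counter *n is no longer needed.

```cpp
#include <CL/sycl.hpp>
#include <iostream>

int main()
{
    cl::sycl::queue q;
    constexpr size_t size = 1024;

    double* Yvec = cl::sycl::malloc_shared<double>(size, q);
    double* result1 = cl::sycl::malloc_shared<double>(size, q);
    for (size_t i = 0; i < size; ++i)
    {
        Yvec[i] = static_cast<double>(i) / size; // stand-in input in [0, 1)
    }

    q.submit([&](cl::sycl::handler& cgh) {
        cgh.parallel_for<class index_from_i>(
            cl::sycl::range<1>(size), [=](cl::sycl::id<1> idx) {
                const size_t i = idx[0];
                // The index comes from i, not from a shared counter, so there is no race.
                result1[i] = cl::sycl::asin(cl::sycl::sqrt(Yvec[i]));
            });
    }).wait();

    std::cout << result1[size - 1] << std::endl;

    cl::sycl::free(Yvec, q);
    cl::sycl::free(result1, q);
    return 0;
}
```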
* Fix race condition in dpnp_rng_vonmises_small_kappa_c and dpnp_rng_vonmises_large_kappa_c
* Rename arrays and change if condition in kernels in dpnp_rng_vonmises_large_kappa_c and dpnp_rng_vonmises_small_kappa_c
* Add space
* Fix indices in dpnp_rng_vonmises_small_kappa_c and dpnp_rng_vonmises_large_kappa_c
@LukichevaPolina
Using extra memory that scales with the amount of data is not good optimization practice. We must avoid such cases. Ideas?
Using extra memory is a brute-force approach. We need to investigate it.
Stale PR?
Description
Enable computations on devices [CPU/GPU].
Tests
TODO
tests/test_random.py::TestDistributionsVonmises::test_seed failed on both devices. Bug.