Make PCA projection reproducible given seed #46

Draft
NetZissou wants to merge 1 commit into main from fix/pca-seed-reproducibility

Conversation

@NetZissou

Collaborator

sklearn auto-selects the randomized SVD solver for large inputs, so calling `PCA(n_components=2)` without `random_state` produced a different projection on every run. This is now fixed by passing the seed as `random_state` in the API call.
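A minimal sketch of the sklearn side of the fix, for illustration only. The helper name `project_2d`, the synthetic 1500 x 768 input, and the seed value are assumptions, not code from this PR. The solver is forced to `"randomized"` so the example does not depend on sklearn's `auto` heuristic, which varies between versions.

```python
import numpy as np
from sklearn.decomposition import PCA


def project_2d(embeddings, seed=None):
    """Project embeddings to 2D with PCA (hypothetical helper, not the PR's code)."""
    # random_state seeds the randomized SVD solver; with seed=None the
    # solver draws fresh random numbers on every call.
    pca = PCA(n_components=2, svd_solver="randomized", random_state=seed)
    return pca.fit_transform(embeddings)


# Synthetic stand-in for 768-dimensional embeddings (assumed shape).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1500, 768)).astype(np.float32)

first = project_2d(embeddings, seed=42)
second = project_2d(embeddings, seed=42)

# With the same seed, repeated runs give the same projection.
print(np.allclose(first, second))
```

Calling `project_2d(embeddings)` without a seed leaves the randomized solver unseeded, which is the behavior this PR removes.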

cuML PCA is left unchanged: it has no `random_state` parameter (passing one raises `TypeError`, which triggers a silent fallback to sklearn), and its full-SVD solver is already deterministic. Verified on a V100: cuML PCA stays on the GPU path and is reproducible from run to run. Switching the solver to "jacobi" does not make sense for interactive-scale data with 768 dimensions.
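The backend split described above could look roughly like the following sketch. The function name `fit_pca_2d`, the `prefer_gpu` flag, and the fallback structure are hypothetical and may not match the repository's code. The cuML branch passes no `random_state`, following the statement in this PR, and only runs when cuML is installed.

```python
import numpy as np


def fit_pca_2d(embeddings, seed, prefer_gpu=True):
    """Return (projection, backend_name). Hypothetical helper for illustration."""
    if prefer_gpu:
        try:
            # Optional dependency; only importable where cuML is installed.
            from cuml.decomposition import PCA as CumlPCA
        except ImportError:
            CumlPCA = None
        if CumlPCA is not None:
            # No random_state is passed on the GPU path (per the PR description);
            # the default solver is left untouched.
            projection = CumlPCA(n_components=2).fit_transform(embeddings)
            return np.asarray(projection), "cuml"

    # CPU path: the seed is forwarded so the randomized solver is reproducible.
    from sklearn.decomposition import PCA
    projection = PCA(n_components=2, random_state=seed).fit_transform(embeddings)
    return projection, "sklearn"


# Synthetic stand-in for 768-dimensional embeddings (assumed shape).
data = np.random.default_rng(1).standard_normal((1200, 768)).astype(np.float32)

cpu_a, backend_a = fit_pca_2d(data, seed=7, prefer_gpu=False)
cpu_b, backend_b = fit_pca_2d(data, seed=7, prefer_gpu=False)
print(backend_a, np.allclose(cpu_a, cpu_b))
```

Keeping the seed out of the cuML constructor avoids the `TypeError` and the unintended fallback to the CPU path.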

@NetZissou NetZissou requested a review from egrace479 June 18, 2026 13:29
@NetZissou NetZissou self-assigned this Jun 18, 2026
@NetZissou NetZissou added the bug Something isn't working label Jun 18, 2026
@NetZissou NetZissou marked this pull request as draft June 22, 2026 19:49