Skip to content

Fix LMF Negative Sampling to Sample Uniformly#747

Open
Fazel94 wants to merge 2 commits intobenfred:mainfrom
Fazel94:fix/lmf-negative-sampling
Open

Fix LMF Negative Sampling to Sample Uniformly#747
Fazel94 wants to merge 2 commits intobenfred:mainfrom
Fazel94:fix/lmf-negative-sampling

Conversation

@Fazel94
Copy link
Copy Markdown

@Fazel94 Fazel94 commented May 2, 2026

Fix four bugs in lmf_update that gutted negative sampling

The negative sampling loop in lmf_update had four bugs that together
meant the model was barely seeing true negatives:

  • n_items used item_vectors.shape[1] (= n_factors+2) instead of shape[0],
    capping the negative loop at ~34 regardless of catalogue size
  • negatives were drawn from CSR indices[], which only holds interacted
    items — never zero-interaction ones, and popularity-biased on top
  • the outer loop variable _ got clobbered by the inner factor loops,
    so it ran once instead of neg_prop * seen_items times
  • a single RNG seeded with nnz-1 was shared by both the user-update and
    item-update passes; each needs its own with the correct range

Fix: shape[0] for n_items, sample item/user IDs directly from [0, n)
with rejection for positives, distinct loop variables, two RNGVectors.

Added five regression tests covering each bug and the overall cluster
recovery behavior.

AI use disclosure:
I have used LLMs extensively for understanding the issue, cleaning up and generating comments for my code and pr.
I have written the code and am responsible for it.

Fazel94 added 2 commits May 1, 2026 12:58
Bug A: item_vectors.shape[1] returned n_factors+2, not n_items.
  Fix: use shape[0].

Bug B: RNGVector range was [0, nnz-1] and i = indices[index] only
  samples from already-interacted items (popularity-biased, never
  zero-interaction items).  Fix: sample i directly from [0, n_items).

Bug C: outer negative-sample loop and inner factor loops all used
  as the loop variable.  Each inner loop left _ == n_factors, so the
  outer loop ran at most once regardless of neg_prop.
  Fix: use f for inner factor loops.

Bug D: a single RNG seeded with nnz-1 was shared by the user-update
  pass (needs item IDs) and item-update pass (needs user IDs).
  Fix: two separate RNGVector instances with correct ranges.
test_cluster_recovery         – overall in-cluster precision >= 0.70 (A+B+C+D)
test_n_items_dimension        – catalogue size, not n_factors+2, used as loop bound (A)
test_negatives_not_in_user_positives – negative gradient term uses true negatives (B)
test_negative_loop_variable_shadowing – neg_prop=5 outperforms neg_prop=1 (C)
test_separate_rngs_for_user_and_item_update – both factor matrices finite + precise (D)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant