Fix: Potential Data Leakage in Quantum Data Tutorial.#829

Open
OkuyanBoga wants to merge 1 commit into tensorflow:master from OkuyanBoga:fix-data-leakage-in-tutorial

Conversation

@OkuyanBoga

A solution to potential data leakage in #828.

Instead of concatenating the train and test sets, each set should be handled separately when generating the stilted dataset:

In lines L745-752:

y_train_new = get_stilted_dataset(S_pqk, V_pqk, S_original, V_original)
y_test_new = get_stilted_dataset(S_pqk_test, V_pqk_test, S_test_original, V_test_original)

where the spectrum is calculated separately for the test set:

S_pqk_test, V_pqk_test = get_spectrum(
    tf.reshape(x_test_pqk, [-1, len(qubits) * 3]))

S_test_original, V_test_original = get_spectrum(
    tf.cast(x_test, tf.float32), gamma=0.005)

print('Eigenvectors of pqk kernel matrix for test:', V_pqk_test)
print('Eigenvectors of original kernel matrix for test:', V_test_original)
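
To illustrate why the split matters, here is a minimal NumPy sketch, not the tutorial's code. The helpers `kernel_matrix` and `spectrum` are hypothetical stand-ins for the tutorial's kernel and `get_spectrum`, and the data is random. It only shows that a spectrum computed on the concatenated data depends on the test points, while per-split spectra do not.

```python
import numpy as np


def kernel_matrix(x, gamma=1.0):
    """Hypothetical stand-in kernel: Gaussian of pairwise squared distances."""
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def spectrum(x, gamma=1.0):
    """Absolute eigenvalues and eigenvectors of the kernel matrix built from x only."""
    s, v = np.linalg.eigh(kernel_matrix(x, gamma))
    return np.abs(s), v


rng = np.random.default_rng(0)
x_train = rng.normal(size=(8, 3))  # illustrative sizes, not the tutorial's
x_test = rng.normal(size=(4, 3))

# Leaky variant: one decomposition over train + test, so the test points
# influence the eigenvectors later used to relabel the training rows.
s_all, v_all = spectrum(np.concatenate([x_train, x_test]))

# Separate variant: each split gets its own decomposition.
s_train, v_train = spectrum(x_train)
s_test, v_test = spectrum(x_test)

print(v_all.shape, v_train.shape, v_test.shape)
```

With separate decompositions, changing `x_test` leaves `s_train` and `v_train` untouched, which is the property the proposed change relies on.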

Closes #828.

@mhucka
Member

mhucka commented Feb 25, 2026

@OkuyanBoga Thank you for this contribution. Would you be able to resolve the (simple) conflicts that have arisen? I will then review the PR.

@mhucka mhucka self-assigned this Feb 25, 2026
@mhucka mhucka added the area/docs Involves documentation – problems, ideas, requests label Feb 25, 2026
@mhucka
Member

mhucka commented Mar 6, 2026

Closing due to age and nonresponse.

@mhucka mhucka closed this Mar 6, 2026
@OkuyanBoga
Author

Hi, sorry for the late response, but I think the issue I have shown here breaks the whole tutorial. Without the leakage, the performance of the method drops significantly.

Any suggestions or comments?

@mhucka mhucka reopened this Mar 10, 2026
@mhucka
Member

mhucka commented Mar 10, 2026

@OkuyanBoga Thanks for your reply and raising awareness of the problem. I reopened this PR.

It looks like it's close to being mergeable, but there are differences in the .ipynb file that are not part of the actual changes. This is a separate matter from the 188-commit difference that GitHub also reports; the actual changes to the file are much smaller. However, there are still irrelevant changes in the .ipynb file, e.g. to metadata elements. Before merging, we'd like to make the diff as small as possible so that the change history is easier to follow in the future.

One way may be to click "resolve conflicts" here on this PR page, then out of the 8 changes that GitHub notes in the diff view, accept your incoming changes just for the ones that matter.
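
As a rough aid for telling which notebook changes matter, the sketch below compares only the cell sources of two notebooks and ignores metadata and outputs. The function names and the tiny inline notebooks are illustrative assumptions; it relies only on the standard `.ipynb` layout (a top-level `cells` list whose entries carry `source` as a string or a list of lines) and assumes cells line up by position.

```python
def cell_sources(notebook):
    """Return each cell's source as one string, ignoring metadata and outputs."""
    sources = []
    for cell in notebook.get("cells", []):
        src = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of lines.
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def changed_cell_indices(old_nb, new_nb):
    """Indices of cells whose source text differs (simple positional comparison)."""
    old, new = cell_sources(old_nb), cell_sources(new_nb)
    changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    # Cells present in only one notebook also count as changed.
    changed.extend(range(min(len(old), len(new)), max(len(old), len(new))))
    return changed


# Hypothetical notebooks: cell 0 differs only in metadata and source layout,
# cell 1 has a real code change.
old_nb = {"metadata": {"colab": {"name": "a"}}, "cells": [
    {"cell_type": "code", "metadata": {"id": "x1"}, "source": ["a = 1\n", "b = 2"]},
    {"cell_type": "code", "metadata": {"id": "x2"}, "source": "y = f(x)"},
]}
new_nb = {"metadata": {"colab": {"name": "b"}}, "cells": [
    {"cell_type": "code", "metadata": {"id": "zz"}, "source": "a = 1\nb = 2"},
    {"cell_type": "code", "metadata": {"id": "x2"}, "source": "y = g(x)"},
]}

print(changed_cell_indices(old_nb, new_nb))  # [1]
```

For real files, each notebook could be loaded with `json.load` before being passed in. Inserted or deleted cells in the middle would shift the positional comparison, so this is only a first-pass check.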

Labels

area/docs Involves documentation – problems, ideas, requests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Possible data leakage in quantum/docs/tutorials/quantum_data.ipynb

2 participants