Skip to content

[refine](exec/operator) replace std::mutex/std::lock_guard with annotated wrappers for thread safety analysis#63070

Open
Mryange wants to merge 2 commits intoapache:masterfrom
Mryange:use-AnnotatedMutex-for-pipeline
Open

[refine](exec/operator) replace std::mutex/std::lock_guard with annotated wrappers for thread safety analysis#63070
Mryange wants to merge 2 commits intoapache:masterfrom
Mryange:use-AnnotatedMutex-for-pipeline

Conversation

@Mryange
Copy link
Copy Markdown
Contributor

@Mryange Mryange commented May 8, 2026

What problem does this PR solve?

What problem does this PR solve?

Issue Number: N/A

Problem Summary:

This PR introduces Clang thread safety annotations (-Wthread-safety) to pipeline operator shared states by
replacing raw std::mutex/std::lock_guard/std::unique_lock with annotated wrappers (AnnotatedMutex, LockGuard,
UniqueLock), and by decorating guarded member variables with GUARDED_BY attributes. This enables the compiler to
statically detect data races where a field is accessed without holding its associated mutex.

The change also fixes two bugs uncovered during the annotation process:

  • MultiCastDataStreamer::push: _eos (member) was checked instead of eos (parameter), causing the "set always ready"
    branch to fire on the prior call's stale state rather than the current one.
  • MultiCastDataStreamer::pull's spill lambda: _cached_blocks[sender_idx].empty() was checked outside the mutex; the
    check is now done via a boolean captured inside the lock.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 8, 2026

run buildall

@Mryange Mryange force-pushed the use-AnnotatedMutex-for-pipeline branch from 7f97364 to a29fb60 Compare May 8, 2026 07:54
@Mryange Mryange changed the title [only test now] [refine](pipeline) replace std::mutex/std::lock_guard with annotated wrappers for thread safety analysis May 8, 2026
@Mryange Mryange changed the title [refine](pipeline) replace std::mutex/std::lock_guard with annotated wrappers for thread safety analysis [refine](exec/operator) replace std::mutex/std::lock_guard with annotated wrappers for thread safety analysis May 8, 2026
@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 8, 2026

/review

@Mryange Mryange marked this pull request as ready for review May 8, 2026 08:00
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/25544376096

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 8, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review opinion: no blocking issues found in the changed PR diff.

Critical checkpoint conclusions:

  • Goal/test: The PR appears focused on adding annotated mutex/lock wrappers and applying them to selected pipeline shared-state locks so Clang thread-safety annotations can be used. The existing multicast streamer test was adjusted to respect the new guarded fields; I did not run tests in this review environment.
  • Scope/focus: The change is small and focused on lock annotation/conversion in the authoritative PR patch.
  • Concurrency: Reviewed the converted pipeline locks, dependency interactions, multicast spill/read paths, shared hash table finish dependency list, exchange finished-channel set, and scan conjunct lock. I did not find a concrete introduced race, missed wake-up, or deadlock from the changed lock wrappers/usages.
  • Lifecycle/static initialization: No new static/global lifecycle dependency was introduced.
  • Configuration/compatibility: No new config items, wire protocol changes, persisted format changes, or rolling-upgrade compatibility concerns were found.
  • Parallel paths: The modified paths are localized conversions; no obvious parallel path requires the same annotation change for correctness.
  • Conditional checks/error handling: No new unchecked Status or silent error continuation was found in the changed code.
  • Test coverage/results: Existing test coverage is lightly adjusted for guarded access; no new behavioral test was required by the annotation-only intent. I did not verify by running BE tests.
  • Observability: No new runtime behavior requiring additional logs or metrics was introduced.
  • Transactions/data writes/FE-BE passing: Not applicable to this PR diff.
  • Performance: The annotated wrappers preserve std::mutex semantics at reviewed call sites; no obvious hot-path extra work is introduced outside BE_TEST mock sleeps.

User focus: no additional user-provided review focus was present.

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented May 8, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 29606 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 753929183e9f677fffd6359d30686f312da8b64c, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17838	3799	3799	3799
q2	q3	10718	888	605	605
q4	4661	466	354	354
q5	7460	1325	1135	1135
q6	204	180	140	140
q7	918	940	754	754
q8	9320	1431	1283	1283
q9	5565	5412	5361	5361
q10	6300	2094	1853	1853
q11	471	262	262	262
q12	679	424	296	296
q13	18164	3878	2729	2729
q14	292	284	264	264
q15	q16	905	879	796	796
q17	1019	1028	665	665
q18	6523	5661	5611	5611
q19	1177	1199	1108	1108
q20	520	410	263	263
q21	5013	2357	2003	2003
q22	472	395	325	325
Total cold run time: 98219 ms
Total hot run time: 29606 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4602	4493	4459	4459
q2	q3	4629	4740	4215	4215
q4	2123	2140	1374	1374
q5	4946	5046	5311	5046
q6	205	169	135	135
q7	2053	1829	1624	1624
q8	3355	3117	3112	3112
q9	8423	8618	8349	8349
q10	4446	4539	4319	4319
q11	592	416	409	409
q12	746	746	513	513
q13	3341	3725	2885	2885
q14	308	314	293	293
q15	q16	753	786	687	687
q17	1333	1344	1271	1271
q18	7944	7043	7067	7043
q19	1153	1153	1170	1153
q20	2237	2240	1940	1940
q21	6169	5374	4835	4835
q22	541	481	392	392
Total cold run time: 59899 ms
Total hot run time: 54054 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169778 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 753929183e9f677fffd6359d30686f312da8b64c, data reload: false

query5	4304	650	506	506
query6	325	228	200	200
query7	4210	544	296	296
query8	324	235	225	225
query9	8833	4018	4010	4010
query10	467	363	302	302
query11	5698	2417	2188	2188
query12	186	133	133	133
query13	1276	645	423	423
query14	5949	5355	5066	5066
query14_1	4388	4327	4360	4327
query15	210	206	184	184
query16	1033	466	439	439
query17	1186	766	651	651
query18	2555	491	366	366
query19	228	221	177	177
query20	144	134	132	132
query21	225	156	117	117
query22	13590	13477	13533	13477
query23	17029	16296	16001	16001
query23_1	16056	16127	16089	16089
query24	7409	1746	1336	1336
query24_1	1360	1358	1347	1347
query25	588	518	463	463
query26	1315	343	172	172
query27	2663	623	343	343
query28	4428	1951	1958	1951
query29	1044	658	537	537
query30	316	243	203	203
query31	1120	1067	944	944
query32	88	78	77	77
query33	548	361	301	301
query34	1196	1144	632	632
query35	752	794	661	661
query36	1298	1345	1179	1179
query37	147	104	86	86
query38	3239	3110	2997	2997
query39	929	907	891	891
query39_1	880	875	873	873
query40	228	161	137	137
query41	66	67	60	60
query42	112	107	107	107
query43	325	328	291	291
query44	
query45	210	202	193	193
query46	1068	1202	754	754
query47	2320	2333	2182	2182
query48	396	389	300	300
query49	632	546	445	445
query50	700	286	215	215
query51	4312	4319	4118	4118
query52	107	103	93	93
query53	258	288	206	206
query54	305	263	248	248
query55	91	89	83	83
query56	289	302	306	302
query57	1406	1386	1333	1333
query58	295	281	267	267
query59	1570	1605	1408	1408
query60	349	337	309	309
query61	156	159	153	153
query62	666	621	558	558
query63	245	198	208	198
query64	2408	814	641	641
query65	
query66	1721	507	395	395
query67	30199	29933	29804	29804
query68	
query69	455	344	296	296
query70	1048	993	998	993
query71	317	275	254	254
query72	2957	2765	2430	2430
query73	854	787	441	441
query74	5041	4898	4743	4743
query75	2766	2657	2319	2319
query76	2318	1139	771	771
query77	411	429	340	340
query78	12968	12878	12422	12422
query79	1522	1034	719	719
query80	1347	591	492	492
query81	511	283	245	245
query82	1222	161	127	127
query83	364	281	251	251
query84	260	143	113	113
query85	930	517	448	448
query86	446	353	301	301
query87	3462	3351	3247	3247
query88	3547	2689	2640	2640
query89	453	382	336	336
query90	1988	178	183	178
query91	177	167	142	142
query92	79	77	80	77
query93	944	950	554	554
query94	728	347	289	289
query95	651	479	345	345
query96	1040	745	327	327
query97	2681	2683	2586	2586
query98	239	230	233	230
query99	1125	1119	989	989
Total cold run time: 253660 ms
Total hot run time: 169778 ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants