[feat](storage) Implement adaptive batch size for SegmentIterator #61535

Open
mrhhsg wants to merge 1 commit into apache:master from mrhhsg:adaptive_batch_size

Conversation

@mrhhsg
Member

@mrhhsg mrhhsg commented Mar 20, 2026

Introduce an EWMA-based AdaptiveBlockSizePredictor that dynamically adjusts SegmentIterator chunk row counts so each output Block stays near the session-variable-configured preferred_block_size_bytes target. A complementary byte-budget stop condition is added to all BlockReader and VCollectIterator accumulation loops so the final Block returned to the upper layer is also bounded.

Key changes:

  • New BE config: enable_adaptive_batch_size (default: true)
  • New session variables: preferred_block_size_bytes (8 MB), preferred_max_column_in_block_size_bytes (1 MB)
  • New Thrift fields 204/205 for FE→BE propagation
  • AdaptiveBlockSizePredictor: EWMA per-row and per-column byte estimator with conservative segment-metadata bootstrap
  • SegmentIterator: predicts rows before each next_batch call; updates EWMA on success path
  • BlockReader: byte-stop in _replace_key_next_block, _unique_key_next_block, _agg_key_next_block
  • VCollectIterator: byte-stop in Level1Iterator::_merge_next
  • Unit tests and regression tests included
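As a rough illustration of the mechanism described above, an EWMA-based row predictor can be sketched as follows. The class and method names echo the PR description, but the smoothing factor, bootstrap logic, and clamping are assumptions for illustration, not the actual Doris implementation:

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative sketch of an EWMA-based row-count predictor.
// alpha and the clamping behavior are assumed values, not taken
// from the actual Doris code.
class AdaptiveBlockSizePredictor {
public:
    explicit AdaptiveBlockSizePredictor(size_t target_block_bytes, double alpha = 0.3)
            : _target_block_bytes(target_block_bytes), _alpha(alpha) {}

    // Seed the estimate from segment metadata before any batch is read.
    void bootstrap(size_t estimated_bytes_per_row) {
        _bytes_per_row = static_cast<double>(std::max<size_t>(estimated_bytes_per_row, 1));
        _initialized = true;
    }

    // Predict how many rows fit in the target byte budget.
    size_t predict_next_rows(size_t min_rows, size_t max_rows) const {
        if (!_initialized) {
            return max_rows; // no history yet: fall back to the row-based limit
        }
        auto predicted = static_cast<size_t>(_target_block_bytes / _bytes_per_row);
        return std::clamp(predicted, min_rows, max_rows);
    }

    // Fold the observed bytes-per-row of the last batch into the EWMA.
    void update(size_t batch_bytes, size_t batch_rows) {
        if (batch_rows == 0) return;
        double observed = static_cast<double>(batch_bytes) / batch_rows;
        _bytes_per_row = _initialized ? _alpha * observed + (1.0 - _alpha) * _bytes_per_row
                                      : observed;
        _initialized = true;
    }

private:
    size_t _target_block_bytes;
    double _alpha;
    double _bytes_per_row = 0.0;
    bool _initialized = false;
};
```

The prediction converges toward target_bytes / observed_bytes_per_row, and the metadata bootstrap lets the first batches avoid starting from the full row-based limit.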

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message) and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what the possible impacts are.
  3. What features were added and why.
  4. Which code was refactored and why.
  5. Which functions were optimized, and what is the difference before and after the optimization?

@mrhhsg
Member Author

mrhhsg commented Mar 20, 2026

run buildall

@mrhhsg
Member Author

mrhhsg commented Mar 20, 2026

/review


@github-actions github-actions bot left a comment


Code Review Summary

This PR introduces an EWMA-based AdaptiveBlockSizePredictor that dynamically adjusts SegmentIterator chunk row counts to target a preferred block byte size. The design is reasonable and the overall approach is sound. However, there are several issues that need to be addressed.

Critical Checkpoints

Goal and correctness: The feature aims to dynamically adjust batch sizes to target byte-based block sizes instead of fixed row counts. The implementation adds prediction at the SegmentIterator level and byte-budget stops at BlockReader and VCollectIterator. There are correctness bugs in the record_rowids path (see inline comments).

Concurrency: The predictor is single-threaded per instance and used within a single SegmentIterator — no concurrency concern.

Lifecycle / SIOF: No static initialization order issues. The predictor is created as a unique_ptr member.

Config items: enable_adaptive_batch_size is dynamic (mutable), which is appropriate. Session variables preferred_block_size_bytes and preferred_max_column_in_block_size_bytes are correctly propagated via Thrift.

Incompatible changes: New Thrift field IDs (210, 211) have defaults matching the session variable defaults. Backward compatible.

Parallel code paths: The byte-budget check is added to _replace_key_next_block, _unique_key_next_block, _agg_key_next_block, and VCollectIterator::_merge_next. The _direct_next_block path (DUP KEY) relies on the SegmentIterator-level predictor limiting block_row_max, which is correct by design.

Test coverage: Regression tests cover DUP, AGG, and UNIQUE key table types. Unit tests are thorough for the predictor class itself. However, no test covers the record_rowids path which has bugs.

Observability: Profile counters AdaptiveBatchPredictMinRows and AdaptiveBatchPredictMaxRows are added. However, a LOG(INFO) debug log is left in hot path (see inline).

Performance: Block::columns_byte_size() is called per-row in some paths. Each call is O(num_columns) with O(1) per column. Acceptable for typical column counts but could be a concern for very wide tables (hundreds of columns).

Issues Found

  1. [Bug] _replace_key_next_block and _unique_key_next_block: Missing _block_row_locations.resize(target_block_row) when byte-budget triggers break with record_rowids enabled. This will cause the DCHECK at line 604 to fail and potentially return stale/incorrect row locations.

  2. [Bug/Severity:High] LOG(INFO) with ****** prefix on every predict_next_rows() call — this is a debug log accidentally left in production code. It will spam logs on every batch of every segment scan.

  3. [Minor] The _replace_key_next_block function does not call block->set_columns() after mutating columns — this is pre-existing behavior but worth noting for consistency.

  4. [Observation] In VCollectIterator::_merge_next, the byte-budget check uses Block::columns_byte_size(target_columns) before flushing continuous_row_in_block pending rows, so the actual output block can exceed the budget. This is a design choice (conservative), not a correctness bug.

static_cast<size_t>(static_cast<double>(_block_size_bytes) / estimated_bytes_per_row);
LOG(INFO) << "****** predicted rows: " << predicted
<< ", estimated_bytes_per_row: " << estimated_bytes_per_row
<< ", _block_size_bytes: " << _block_size_bytes;

[Bug/High] This LOG(INFO) with ****** prefix is called on every predict_next_rows() invocation — i.e., every batch in every segment scan. This will generate massive log spam in production.

This should be changed to VLOG_DEBUG or removed entirely. At minimum, this is clearly a debug statement accidentally left in:

VLOG_DEBUG << "predicted rows: " << predicted
           << ", estimated_bytes_per_row: " << estimated_bytes_per_row
           << ", _block_size_bytes: " << _block_size_bytes;

Block::columns_byte_size(target_columns) >=
_reader_context.preferred_block_size_bytes) {
break;
}

[Bug] When record_rowids is enabled, this byte-budget break exits the outer while loop without resizing _block_row_locations to target_block_row. At line 349, _block_row_locations was resized to batch_size(), so after this break it will still have batch_size() entries instead of target_block_row entries.

The EOF path at lines 372-374 correctly does _block_row_locations.resize(target_block_row). This break path needs the same treatment:

if (config::enable_adaptive_batch_size && _reader_context.preferred_block_size_bytes > 0 &&
    Block::columns_byte_size(target_columns) >=
            _reader_context.preferred_block_size_bytes) {
    if (UNLIKELY(_reader_context.record_rowids)) {
        _block_row_locations.resize(target_block_row);
    }
    break;
}

Without this fix, current_block_row_locations() returns a vector with stale entries, and the DCHECK at line 604 (DCHECK_EQ(_block_row_locations.size(), block->rows() + delete_count)) will fail.

Block::columns_byte_size(target_columns) >=
_reader_context.preferred_block_size_bytes) {
break;
}

[Bug] Same issue as in _replace_key_next_block: when record_rowids is enabled, this byte-budget break exits without resizing _block_row_locations to target_block_row. At line 530, _block_row_locations was resized to _reader_context.batch_size. The EOF path at lines 548-550 correctly resizes.

Fix:

if (config::enable_adaptive_batch_size && _reader_context.preferred_block_size_bytes > 0 &&
    Block::columns_byte_size(target_columns) >=
            _reader_context.preferred_block_size_bytes) {
    if (UNLIKELY(_reader_context.record_rowids)) {
        _block_row_locations.resize(target_block_row);
    }
    break;
}

Note: in _unique_key_next_block, there is also the _delete_sign_available filtering path (starting at line 566) that uses target_block_row and _block_row_locations. An incorrectly-sized _block_row_locations could corrupt the filter logic.

@doris-robot

TPC-H: Total hot run time: 30963 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit daebf3f9ccd3a062258f75fb387de4d89c7ecce6, data reload: false

------ Round 1 ----------------------------------
q1	17671	4802	4775	4775
q2	q3	10807	879	646	646
q4	4755	454	351	351
q5	8228	1316	1095	1095
q6	258	209	176	176
q7	950	934	776	776
q8	10868	1770	1663	1663
q9	7254	5324	5288	5288
q10	6476	2021	1901	1901
q11	487	320	294	294
q12	839	671	526	526
q13	18097	3025	2203	2203
q14	236	234	218	218
q15	q16	789	777	704	704
q17	935	819	758	758
q18	6446	5588	5519	5519
q19	1231	1428	1089	1089
q20	676	602	476	476
q21	5538	2505	2175	2175
q22	431	355	330	330
Total cold run time: 102972 ms
Total hot run time: 30963 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5106	5051	5083	5051
q2	q3	4094	4498	4007	4007
q4	1364	1465	1080	1080
q5	4535	4723	4680	4680
q6	244	215	178	178
q7	1910	1680	1541	1541
q8	2860	3070	2966	2966
q9	7980	7939	7658	7658
q10	4027	4178	3874	3874
q11	614	525	477	477
q12	615	700	528	528
q13	3018	3319	2470	2470
q14	304	311	295	295
q15	q16	774	811	747	747
q17	1395	1560	1468	1468
q18	7783	7183	6930	6930
q19	1284	1334	1259	1259
q20	2104	2347	2038	2038
q21	4969	4316	4150	4150
q22	516	445	422	422
Total cold run time: 55496 ms
Total hot run time: 51819 ms

@doris-robot

TPC-DS: Total hot run time: 173281 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit daebf3f9ccd3a062258f75fb387de4d89c7ecce6, data reload: false

query5	4316	650	495	495
query6	335	228	226	226
query7	4249	527	281	281
query8	342	255	234	234
query9	9250	5274	5237	5237
query10	501	420	341	341
query11	7061	5150	4884	4884
query12	188	131	122	122
query13	1289	479	367	367
query14	5878	3947	3605	3605
query14_1	2995	3015	3003	3003
query15	209	200	180	180
query16	996	490	437	437
query17	916	757	662	662
query18	2436	488	357	357
query19	237	235	192	192
query20	136	131	129	129
query21	231	132	117	117
query22	13184	13369	13319	13319
query23	15963	15577	15361	15361
query23_1	15545	15514	15426	15426
query24	7340	1779	1334	1334
query24_1	1316	1310	1329	1310
query25	553	470	428	428
query26	1240	279	171	171
query27	2781	501	324	324
query28	4664	2704	2662	2662
query29	868	599	510	510
query30	312	234	193	193
query31	1053	965	882	882
query32	86	80	75	75
query33	513	346	288	288
query34	930	938	594	594
query35	678	692	606	606
query36	1120	1159	956	956
query37	138	104	86	86
query38	2973	3021	2889	2889
query39	869	830	814	814
query39_1	793	788	803	788
query40	233	153	146	146
query41	69	63	63	63
query42	273	265	271	265
query43	258	264	227	227
query44	
query45	197	196	185	185
query46	967	1071	672	672
query47	2165	2187	2105	2105
query48	345	345	250	250
query49	646	469	395	395
query50	744	288	230	230
query51	4105	4090	4093	4090
query52	269	266	263	263
query53	314	353	309	309
query54	326	276	308	276
query55	92	87	89	87
query56	328	344	309	309
query57	1959	1825	1706	1706
query58	287	275	278	275
query59	2881	3060	2789	2789
query60	353	351	327	327
query61	158	158	186	158
query62	628	598	548	548
query63	323	304	287	287
query64	5107	1301	1007	1007
query65	
query66	1474	486	372	372
query67	24257	24455	24139	24139
query68	
query69	476	333	304	304
query70	937	1051	1003	1003
query71	350	310	306	306
query72	2814	2677	2423	2423
query73	597	601	374	374
query74	9644	9612	9419	9419
query75	3028	2901	2560	2560
query76	2342	1169	855	855
query77	404	400	318	318
query78	11103	11115	10530	10530
query79	1204	824	615	615
query80	729	648	556	556
query81	480	262	227	227
query82	1303	155	131	131
query83	353	274	263	263
query84	299	126	114	114
query85	872	518	480	480
query86	395	304	291	291
query87	3158	3122	3026	3026
query88	4341	3346	3346	3346
query89	460	390	353	353
query90	1969	208	196	196
query91	174	166	147	147
query92	78	80	77	77
query93	1020	933	594	594
query94	461	331	307	307
query95	605	358	340	340
query96	726	564	266	266
query97	2475	2502	2469	2469
query98	259	227	222	222
query99	1027	1034	914	914
Total cold run time: 251402 ms
Total hot run time: 173281 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.71% (19793/37554)
Line Coverage 36.23% (184854/510166)
Region Coverage 32.50% (143146/440485)
Branch Coverage 33.66% (62584/185927)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.32% (26944/36750)
Line Coverage 56.73% (288414/508437)
Region Coverage 53.94% (239774/444521)
Branch Coverage 55.74% (103911/186407)

@mrhhsg mrhhsg force-pushed the adaptive_batch_size branch from daebf3f to f404b06 on March 20, 2026 07:06
@mrhhsg
Member Author

mrhhsg commented Mar 20, 2026

run buildall

@mrhhsg
Member Author

mrhhsg commented Mar 20, 2026

/review


@github-actions github-actions bot left a comment


Code Review Summary: feat Implement adaptive batch size for SegmentIterator

This PR introduces an EWMA-based AdaptiveBlockSizePredictor that dynamically adjusts chunk row counts so each output Block stays near a configurable byte target. The overall design is sound, with proper plumbing from FE session variables through Thrift to the storage layer. The byte-budget checks in BlockReader and VCollectIterator are placed at safe boundaries.

However, I found several issues ranging from a unit test correctness bug to regression test standard violations.

Critical Checkpoints

1. Goal & Correctness: The PR achieves its stated goal of adaptive batch sizing. The EWMA predictor logic is correct. Byte-budget stops in _replace_key_next_block, _unique_key_next_block, _agg_key_next_block, and _merge_next are all placed at safe row boundaries. No data correctness issues found in the core logic.

2. Focused & Minimal: The change is reasonably focused. Adding new session variables, config flags, thrift fields, the predictor, and byte-budget stops across multiple layers is inherent to the feature.

3. Concurrency: The TOCTOU race in _collect_profile_before_close for min/max counter updates is benign (display-only counters) but should be documented or fixed. See inline comment.

4. Lifecycle: No lifecycle issues. The predictor is owned by SegmentIterator and destroyed with it.

5. Configuration: enable_adaptive_batch_size is mutable and appropriately documented. Session variables preferred_block_size_bytes and preferred_max_column_in_block_size_bytes are properly forwarded.

6. Incompatible changes: Thrift fields 210/211 are optional with correct IDs. No compatibility issues. (Note: PR description incorrectly states fields 204/205.)

7. Parallel code paths: The three BlockReader paths (_replace_key, _agg_key, _unique_key) are all covered. VCollectIterator merge path is covered. Compaction paths are not affected (they don't set preferred_block_size_bytes).

8. Test coverage: Unit tests are comprehensive (19+ test cases) but have a critical mock bug that makes metadata-hint tests read indeterminate memory. Regression tests have standard violations. See inline comments.

9. Observability: Profile counters AdaptiveBatchPredictMinRows/MaxRows are added. The else branch stats overwrite is problematic. See inline comment.

10. Performance: Block::columns_byte_size() is O(num_columns) per call (all byte_size() implementations are O(1)). Called per-row in _unique_key_next_block and _merge_next, this adds measurable but acceptable overhead for wide tables.
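The per-row cost noted in point 10 can in principle be amortized by checking the byte budget only every few rows. This is a generic mitigation sketch, not something this PR implements; the trade-off is that the block can overshoot the budget by up to stride - 1 rows:

```cpp
#include <cstddef>
#include <vector>

// Accumulate rows until a byte budget is reached, but only pay for the
// (conceptually O(num_columns)) size check once every `stride` rows.
// per_row_bytes stands in for the bytes each appended row adds.
size_t rows_until_budget(const std::vector<size_t>& per_row_bytes,
                         size_t budget_bytes, size_t stride) {
    size_t total = 0;
    for (size_t i = 0; i < per_row_bytes.size(); ++i) {
        total += per_row_bytes[i];
        // Sampled budget check: amortizes the per-row cost at the price
        // of overshooting the budget by at most stride - 1 rows.
        if ((i + 1) % stride == 0 && total >= budget_bytes) {
            return i + 1;
        }
    }
    return per_row_bytes.size();
}
```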

Issues Found

  1. [High] Unit test: MockSegment::num_rows() mock is broken — Segment::num_rows() is non-virtual, so ON_CALL has no effect through a Segment& reference. Tests using metadata hints read indeterminate _num_rows.
  2. [Medium] Regression test: uses qt_ prefix instead of order_qt_ (violates regression test standards).
  3. [Medium] Regression test: drops 4 of 5 tables after use (tables should only be dropped before use, to preserve debug state).
  4. [Low] olap_scanner.cpp: TOCTOU race on min/max counter updates (benign for display-only counters).
  5. [Low] segment_iterator.cpp: the else branch overwrites stats on every call when the predictor is inactive, producing misleading values.
  6. [Info] PR description: states Thrift fields 204/205, but the actual code uses 210/211.


// Set up num_rows mock.
ON_CALL(*seg, num_rows()).WillByDefault(Return(num_rows));


Bug (High): Segment::num_rows() is non-virtual (declared as uint32_t num_rows() const { return _num_rows; } in segment.h:119). The ON_CALL(*seg, num_rows()).WillByDefault(Return(num_rows)) configures gmock's virtual mock method, but when predict_next_rows() calls segment.num_rows() through a Segment& reference, the compiler statically dispatches to the base class's non-virtual method, bypassing the mock entirely.

The actual value returned will be whatever _num_rows was left at by the base Segment constructor — which does not initialize _num_rows in its member initializer list (see segment.cpp:168-174). This means tests like PredictNoHistoryMetadataHint are reading indeterminate memory (undefined behavior).

Fix: Since MockSegment is declared as a friend of Segment, you can directly set the member:

seg->_num_rows = num_rows;

instead of (or in addition to) the ON_CALL mock.


if (stats.adaptive_batch_size_predict_max_rows > 0) {
auto cur_min = local_state->_adaptive_batch_predict_min_rows_counter->value();
if (cur_min == 0 || stats.adaptive_batch_size_predict_min_rows < cur_min) {

Minor (Low): This read-check-store pattern on shared RuntimeProfile::Counter is a classic TOCTOU race. Multiple scanners call _collect_profile_before_close concurrently on the same local_state counters. Thread A can read a stale cur_min, pass the check, and overwrite thread B's correct value.

Since these are display-only profile counters, this is benign — the worst outcome is a slightly inaccurate min/max in the query profile. But consider documenting this or using COUNTER_UPDATE with atomic min/max if RuntimeProfile::Counter supports it.
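The race described above is the standard lost-update pattern. If the counter's underlying value were exposed as a std::atomic<int64_t> (an assumption — RuntimeProfile::Counter's actual API may differ), a compare-exchange loop would make the min update race-free:

```cpp
#include <atomic>
#include <cstdint>

// Race-free "update minimum" on a shared counter value, keeping the
// same "0 means unset" sentinel used in the reviewed code. This is a
// sketch; it assumes direct access to a std::atomic<int64_t>.
void update_min(std::atomic<int64_t>& cur_min, int64_t candidate) {
    int64_t observed = cur_min.load(std::memory_order_relaxed);
    // Retry until we install the smaller value or another thread has
    // already stored something <= candidate. On CAS failure, `observed`
    // is refreshed with the current value.
    while ((observed == 0 || candidate < observed) &&
           !cur_min.compare_exchange_weak(observed, candidate,
                                          std::memory_order_relaxed)) {
    }
}
```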

def res_enabled = sql "select id, length(c1) as l1, length(c2) as l2, length(c3) as l3 from abs_wide_table order by 1, 2, 3, 4"

qt_wide "select id, length(c1) as l1, length(c2) as l2, length(c3) as l3 from abs_wide_table order by 1, 2, 3, 4 limit 50"


Standards violation (Medium): Per regression test standards, use order_qt_ prefix instead of qt_ to ensure deterministic ordered output. This applies to all query tags in this file (qt_wide, qt_narrow, qt_agg, qt_unique, qt_flag).

While some of these queries have explicit ORDER BY or return single rows, the standard convention is to consistently use order_qt_ prefix.

// toward returning close to max_rows (batch is still row-limited).

sql "drop table if exists abs_narrow_table"
sql """

Standards violation (Medium): Per regression test standards: "After completing tests, do not drop tables; instead drop tables before using them in tests, to preserve the environment for debugging."

This drop table after use (and similar ones for abs_agg_table, abs_unique_table, abs_flag_table) should be removed. The drop table if exists before CREATE TABLE at the beginning of each test case is the correct pattern (and is already present).

static_cast<int64_t>(predicted));
} else {
_opts.stats->adaptive_batch_size_predict_min_rows = _opts.block_row_max;
_opts.stats->adaptive_batch_size_predict_max_rows = _opts.block_row_max;

Minor (Low): When _block_size_predictor is null (feature disabled), this else branch unconditionally overwrites adaptive_batch_size_predict_min_rows and adaptive_batch_size_predict_max_rows with _opts.block_row_max on every next_batch() call. Since OlapReaderStatistics is shared across segment iterators for the same scanner, and adaptive_batch_size_predict_min_rows is initialized to INT64_MAX, the first segment iterator's overwrite clobbers the sentinel.

This means the profile counters will show misleading values when the feature is disabled. Consider only setting these once, or guarding with a check (e.g., only set if currently INT64_MAX / 0).
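One way to preserve the sentinel semantics suggested above can be sketched as follows. The struct is a hypothetical stand-in for the relevant OlapReaderStatistics fields; the sentinel values (INT64_MAX for min, 0 for max) follow the review comment:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the shared statistics fields.
struct Stats {
    int64_t predict_min_rows = std::numeric_limits<int64_t>::max(); // sentinel: unset
    int64_t predict_max_rows = 0;                                   // sentinel: unset
};

// Record the fallback (predictor disabled) values only while the
// sentinels are still in place, so a later call cannot clobber a real
// prediction written by another segment iterator sharing the stats.
void record_fallback(Stats& stats, int64_t block_row_max) {
    if (stats.predict_min_rows == std::numeric_limits<int64_t>::max()) {
        stats.predict_min_rows = block_row_max;
    }
    if (stats.predict_max_rows == 0) {
        stats.predict_max_rows = block_row_max;
    }
}
```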

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.63% (1796/2284)
Line Coverage 64.38% (32274/50130)
Region Coverage 65.27% (16162/24760)
Branch Coverage 55.71% (8611/15456)

@doris-robot

TPC-H: Total hot run time: 27323 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f404b067c8aa7622413fd7d269235aaccd195121, data reload: false

------ Round 1 ----------------------------------
q1	17655	4402	4359	4359
q2	q3	11076	846	550	550
q4	5270	389	259	259
q5	9358	1253	1020	1020
q6	252	179	151	151
q7	833	892	691	691
q8	10989	1566	1438	1438
q9	6834	4874	4905	4874
q10	6372	1930	1703	1703
q11	471	255	242	242
q12	746	586	471	471
q13	18060	2948	2210	2210
q14	231	236	205	205
q15	q16	747	763	684	684
q17	743	876	469	469
q18	6012	5388	5145	5145
q19	1117	997	638	638
q20	543	482	377	377
q21	4650	2002	1555	1555
q22	371	327	282	282
Total cold run time: 102330 ms
Total hot run time: 27323 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4645	4569	4703	4569
q2	q3	3992	4406	3881	3881
q4	943	1227	847	847
q5	4215	4459	4331	4331
q6	197	191	150	150
q7	1868	1676	1584	1584
q8	2544	2745	2608	2608
q9	7604	7456	7373	7373
q10	3857	4110	3620	3620
q11	564	515	484	484
q12	508	607	501	501
q13	2764	3188	2341	2341
q14	300	306	282	282
q15	q16	739	793	832	793
q17	1375	1433	1422	1422
q18	7227	6874	6555	6555
q19	1041	989	989	989
q20	2113	2190	2015	2015
q21	4021	3561	3439	3439
q22	452	426	397	397
Total cold run time: 50969 ms
Total hot run time: 48181 ms

@doris-robot

TPC-DS: Total hot run time: 169237 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f404b067c8aa7622413fd7d269235aaccd195121, data reload: false

query5	4328	658	511	511
query6	339	234	209	209
query7	4225	503	273	273
query8	357	255	238	238
query9	8698	2777	2790	2777
query10	514	391	349	349
query11	6983	5116	4889	4889
query12	195	133	132	132
query13	1279	462	362	362
query14	5762	3839	3592	3592
query14_1	2908	2900	2917	2900
query15	211	188	176	176
query16	972	413	452	413
query17	879	712	607	607
query18	2439	459	341	341
query19	231	225	194	194
query20	138	127	128	127
query21	214	135	114	114
query22	13266	14164	14916	14164
query23	16166	15714	15584	15584
query23_1	15849	15782	15826	15782
query24	7273	1668	1244	1244
query24_1	1243	1251	1242	1242
query25	562	469	410	410
query26	1255	274	154	154
query27	2768	507	305	305
query28	4477	1897	1859	1859
query29	841	566	499	499
query30	309	216	196	196
query31	1008	948	876	876
query32	88	74	70	70
query33	495	349	287	287
query34	913	903	543	543
query35	644	686	620	620
query36	1087	1108	927	927
query37	137	96	84	84
query38	2946	2916	2867	2867
query39	856	827	804	804
query39_1	786	786	796	786
query40	245	158	141	141
query41	63	60	58	58
query42	263	257	260	257
query43	248	274	257	257
query44	
query45	200	196	187	187
query46	945	1019	633	633
query47	2099	2123	2026	2026
query48	319	322	231	231
query49	625	464	400	400
query50	717	289	224	224
query51	4164	4042	4004	4004
query52	266	266	257	257
query53	307	344	291	291
query54	315	285	273	273
query55	99	90	90	90
query56	320	328	317	317
query57	1906	1855	1658	1658
query58	294	287	272	272
query59	2822	2956	2743	2743
query60	363	344	335	335
query61	152	154	155	154
query62	631	584	529	529
query63	324	296	287	287
query64	5029	1298	1058	1058
query65	
query66	1488	493	398	398
query67	24209	24298	24147	24147
query68	
query69	422	316	297	297
query70	970	988	931	931
query71	366	322	306	306
query72	3170	2883	2363	2363
query73	562	573	332	332
query74	9597	9540	9387	9387
query75	2892	2778	2490	2490
query76	2292	1098	728	728
query77	378	415	330	330
query78	10970	11041	10476	10476
query79	3115	781	580	580
query80	1748	655	548	548
query81	576	268	226	226
query82	978	157	122	122
query83	339	269	255	255
query84	302	118	110	110
query85	948	500	477	477
query86	499	307	302	302
query87	3144	3130	3005	3005
query88	3685	2684	2686	2684
query89	435	386	360	360
query90	2024	187	189	187
query91	181	166	138	138
query92	81	80	71	71
query93	1816	846	505	505
query94	650	332	281	281
query95	594	415	326	326
query96	654	540	233	233
query97	2469	2501	2423	2423
query98	250	221	218	218
query99	1018	1005	931	931
Total cold run time: 253570 ms
Total hot run time: 169237 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.85% (19877/37610)
Line Coverage 36.36% (185718/510836)
Region Coverage 32.61% (143863/441137)
Branch Coverage 33.79% (62960/186314)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.32% (26985/36806)
Line Coverage 56.83% (289314/509111)
Region Coverage 54.07% (240723/445178)
Branch Coverage 55.85% (104336/186800)

Introduce an EWMA-based AdaptiveBlockSizePredictor that dynamically
adjusts SegmentIterator chunk row counts so each output Block stays
near the session-variable-configured preferred_block_size_bytes target.
A complementary byte-budget stop condition is added to all BlockReader
and VCollectIterator accumulation loops so the final Block returned to
the upper layer is also bounded.

Key changes:
- New BE config: enable_adaptive_batch_size (default: true)
- New session variables: preferred_block_size_bytes (8 MB),
  preferred_max_column_in_block_size_bytes (1 MB)
- New Thrift fields 204/205 for FE→BE propagation
- AdaptiveBlockSizePredictor: EWMA per-row and per-column byte
  estimator with conservative segment-metadata bootstrap
- SegmentIterator: predicts rows before each next_batch call;
  updates EWMA on success path
- BlockReader: byte-stop in _replace_key_next_block,
  _unique_key_next_block, _agg_key_next_block
- VCollectIterator: byte-stop in Level1Iterator::_merge_next
- Unit tests and regression tests included

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mrhhsg mrhhsg force-pushed the adaptive_batch_size branch from f404b06 to ae43fe9 on March 20, 2026 14:21
@mrhhsg
Member Author

mrhhsg commented Mar 20, 2026

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.63% (1796/2284)
Line Coverage 64.36% (32264/50130)
Region Coverage 65.24% (16154/24760)
Branch Coverage 55.65% (8602/15456)

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.80% (19860/37616)
Line Coverage 36.28% (185340/510878)
Region Coverage 32.54% (143556/441152)
Branch Coverage 33.73% (62850/186327)

@doris-robot

TPC-H: Total hot run time: 26528 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ae43fe96d6a20040c16d54f6ee49141576a11c2a, data reload: false

------ Round 1 ----------------------------------
q1	16921	4568	4330	4330
q2	q3	10395	787	521	521
q4	4687	356	257	257
q5	7572	1217	995	995
q6	168	172	146	146
q7	785	852	673	673
q8	9335	1455	1313	1313
q9	4828	4768	4620	4620
q10	6236	1918	1617	1617
q11	448	265	243	243
q12	694	582	468	468
q13	18037	2983	2184	2184
q14	234	234	216	216
q15	q16	735	745	689	689
q17	724	844	427	427
q18	5822	5439	5196	5196
q19	1115	981	617	617
q20	528	495	365	365
q21	4411	1855	1413	1413
q22	335	297	238	238
Total cold run time: 94010 ms
Total hot run time: 26528 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4890	4658	4594	4594
q2	q3	3900	4490	3870	3870
q4	905	1212	779	779
q5	4058	4350	4343	4343
q6	193	181	145	145
q7	1775	1660	1651	1651
q8	2575	2705	2561	2561
q9	7498	7300	7357	7300
q10	3737	3983	3606	3606
q11	510	450	449	449
q12	503	635	475	475
q13	2721	3157	2469	2469
q14	295	303	282	282
q15	q16	739	836	742	742
q17	1217	1318	1415	1318
q18	7172	6692	6694	6692
q19	896	876	896	876
q20	2061	2130	2011	2011
q21	3995	3533	3365	3365
q22	494	438	391	391
Total cold run time: 50134 ms
Total hot run time: 47919 ms

@doris-robot

TPC-DS: Total hot run time: 169678 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ae43fe96d6a20040c16d54f6ee49141576a11c2a, data reload: false

query5	4311	670	499	499
query6	340	232	235	232
query7	4219	492	266	266
query8	334	237	226	226
query9	8742	2742	2713	2713
query10	539	414	374	374
query11	6986	5083	4861	4861
query12	189	131	127	127
query13	1280	475	349	349
query14	5737	3760	3545	3545
query14_1	2939	2858	2883	2858
query15	209	200	176	176
query16	982	472	477	472
query17	924	744	642	642
query18	2448	459	351	351
query19	219	226	195	195
query20	133	127	129	127
query21	215	140	112	112
query22	13437	14426	14713	14426
query23	16117	15651	15646	15646
query23_1	15909	15702	15816	15702
query24	7297	1620	1240	1240
query24_1	1234	1239	1279	1239
query25	559	498	404	404
query26	1237	268	151	151
query27	2774	487	305	305
query28	4345	1861	1865	1861
query29	820	566	488	488
query30	310	224	196	196
query31	996	952	903	903
query32	80	71	71	71
query33	515	346	293	293
query34	893	879	547	547
query35	646	703	592	592
query36	1074	1104	986	986
query37	141	100	82	82
query38	2937	2940	2900	2900
query39	849	822	811	811
query39_1	793	794	809	794
query40	237	157	140	140
query41	63	59	61	59
query42	260	261	255	255
query43	256	248	229	229
query44	
query45	199	193	186	186
query46	875	997	605	605
query47	2495	2130	2072	2072
query48	308	332	234	234
query49	635	471	393	393
query50	693	281	221	221
query51	4027	3958	4048	3958
query52	263	270	260	260
query53	297	348	293	293
query54	311	275	284	275
query55	100	88	87	87
query56	320	336	346	336
query57	1888	1916	1708	1708
query58	285	280	274	274
query59	2783	2967	2773	2773
query60	352	346	328	328
query61	161	161	162	161
query62	638	581	511	511
query63	315	287	294	287
query64	4882	1302	1006	1006
query65	
query66	1467	463	390	390
query67	24220	24234	24122	24122
query68	
query69	418	321	304	304
query70	993	962	946	946
query71	360	316	309	309
query72	3077	2928	2659	2659
query73	556	559	327	327
query74	9567	9619	9383	9383
query75	2884	2801	2468	2468
query76	2155	1052	695	695
query77	373	432	317	317
query78	10875	11020	10521	10521
query79	2691	762	589	589
query80	1792	675	574	574
query81	564	262	233	233
query82	965	152	121	121
query83	338	271	250	250
query84	303	130	103	103
query85	921	506	477	477
query86	419	317	279	279
query87	3141	3175	3028	3028
query88	3631	2686	2677	2677
query89	429	377	348	348
query90	2038	193	188	188
query91	174	165	137	137
query92	87	74	84	74
query93	1367	822	523	523
query94	638	280	292	280
query95	600	398	325	325
query96	648	532	235	235
query97	2442	2487	2398	2398
query98	248	225	217	217
query99	1029	1008	919	919
Total cold run time: 252034 ms
Total hot run time: 169678 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.19% (26943/36811)
Line Coverage 56.69% (288618/509150)
Region Coverage 54.00% (240384/445189)
Branch Coverage 55.67% (103990/186813)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.18% (26940/36811)
Line Coverage 56.68% (288581/509150)
Region Coverage 53.98% (240333/445189)
Branch Coverage 55.66% (103972/186813)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.19% (26941/36811)
Line Coverage 56.68% (288599/509150)
Region Coverage 53.98% (240297/445189)
Branch Coverage 55.66% (103987/186813)
