Skip to content

v2.8.45#920

Open
jmrenouard wants to merge 14 commits into
major:masterfrom
jmrenouard:2.8.45
Open

v2.8.45#920
jmrenouard wants to merge 14 commits into
major:masterfrom
jmrenouard:2.8.45

Conversation

@jmrenouard

Copy link
Copy Markdown
Collaborator

Release Notes - v2.8.45

Date: 2026-06-01

📝 Executive Summary

2.8.45 2026-06-01

- feat: filter system databases from PFS/sys schema analysis queries and dumpdir exports
- feat: integrate temptable_max_ram into per-thread buffer memory calculations
- feat: add InnoDB index/data ratio check and CSV dump for tables > 50,000 rows
- fix: normalize dollar sign escaping in SQL queries to prevent syntax errors and shell interpolation failures
- fix: skip x$ps_schema_table_statistics_io views in dumpdir schema exports to prevent execution failures on newer MariaDB releases
- chore: restore doc_sync.py utility script and run doc synchronization
- test: add unit tests for system database filtering logic
- test: add unit tests for temptable memory limit calculations
- test: add unit test for index/data ratio calculation logic

📈 Diagnostic Growth Indicators

Metric Current Progress Status
Total Indicators 16 +1 🛡️
Efficiency Checks 1 +1 🛡️
Risk Detections 2 0 🛡️
Information Points 13 0 🛡️

🛠️ Internal Commit History

  • feat: filter system databases from PFS/sys schema analysis queries and dumpdir exports
  • feat: integrate temptable_max_ram into per-thread buffer memory calculations
  • feat: add InnoDB index/data ratio check and CSV dump for tables > 50,000 rows
  • fix: normalize dollar sign escaping in SQL queries to prevent syntax errors and shell interpolation failures
  • fix: skip x$ps_schema_table_statistics_io views in dumpdir schema exports to prevent execution failures on newer MariaDB releases
  • chore: restore doc_sync.py utility script and run doc synchronization
  • test: add unit tests for system database filtering logic
  • test: add unit tests for temptable memory limit calculations
  • test: add unit test for index/data ratio calculation logic

⚙️ Technical Evolutions

➕ CLI Options Added

None.

➖ CLI Options Deprecated

None.

✅ Laboratory Verification Results

  • Automated TDD suite passed.
  • Multi-DB version laboratory execution validated.
  • Performance indicator delta analysis completed.

- feat: filter system databases from PFS/sys schema analysis queries and dumpdir
  exports.

- feat: integrate temptable_max_ram into per-thread buffer memory calculations.

- feat: add InnoDB index/data ratio check and CSV dump for tables > 50,000 rows.

- fix: normalize dollar sign escaping in SQL queries to prevent syntax errors
  and shell interpolation failures.

- fix: skip x\$ps_schema_table_statistics_io views in dumpdir schema exports to
  prevent execution failures on newer MariaDB releases.

- chore: restore doc_sync.py utility script and run doc synchronization.

- test: add unit tests for system database filtering logic.

- test: add unit tests for temptable memory limit calculations.

- test: add unit test for index/data ratio calculation logic.
@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f55e6272-8c26-4986-89c6-d4926d57f41e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (7)
tests/issue_864.t (2)

143-146: 💤 Low value

Edge case: what if the global limit isn't the limiting factor?

This test exercises the happy path where temptable_max_ram (2MB) is smaller than the theoretical per-connection ceiling (10GB). You're not testing the inverse—when the per-connection limit is smaller and temptable_max_ram becomes a non-binding upper bound.

Given that the first subtest implicitly covers the "no temptable_max_ram" fallback, you've got decent coverage. But if you ever wonder why a future regression slipped through, well... you've been warned.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/issue_864.t` around lines 143 - 146, Add a complementary subtest that
covers the inverse scenario where the per-connection ceiling is the binding
limit and temptable_max_ram is non‑binding: change the test data used in
tests/issue_864.t so that 'tmp_table_size' and/or 'max_heap_table_size' produce
a small per-connection limit (e.g., much smaller than 2 MB) while setting
'temptable_max_ram' to a very large value, then assert the computed temp table
allowance equals the per-connection limit (use the same assertion style and
helper as the existing subtest to mirror the happy-path checks).

112-130: ⚡ Quick win

Consider extracting the mock setup boilerplate.

You've copied 19 lines of local *main:: declarations from the first subtest. When the inevitable sixth developer tweaks a mock in one place and forgets the other, the resulting bug will be... character-building.

A shared setup helper or a BEGIN block would let you write the pain once and reuse the gain.

♻️ Example: shared mock setup

Extract common mocking into a helper at the top of the file:

sub setup_common_mocks {
    no warnings 'redefine';
    no warnings 'uninitialized';
    local *main::debugprint = sub { };
    local *main::is_int = sub { return $_[0] && $_[0] =~ /^\d+$/ };
    local *main::human_size = sub { return $_[0] };
    local *main::hr_bytes = sub { return $_[0] };
    local *main::percentage = sub { return 0 };
    local *main::badprint = sub { };
    local *main::infoprint = sub { };
    local *main::select_one = sub { return 0 };
    local *main::select_array = sub { return () };
    local *main::mysql_version_ge = sub { return 1 };
    local *main::mysql_version_le = sub { return 0 };
    local *main::mysql_version_eq = sub { return 0 };
    local *main::execute_system_command = sub { return "" };
    local *main::get_pf_memory = sub { return 0 };
    local *main::get_gcache_memory = sub { return 0 };
    local *main::is_remote = sub () { return 0 };
    local *main::mysql_cloud_discovery = sub { return "none" };
}

Then call it in each subtest:

subtest 'Issue `#864`: per-thread memory buffer calculations' => sub {
    setup_common_mocks();
    # ... rest of test
};

Note: Perl's local requires care with scope—verify that extracted mocks still apply within the subtest block.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/issue_864.t` around lines 112 - 130, Extract the repeated mock
declarations (all the local *main::... subs like debugprint, is_int, human_size,
hr_bytes, percentage, badprint, infoprint, select_one, select_array,
mysql_version_ge, mysql_version_le, mysql_version_eq, execute_system_command,
get_pf_memory, get_gcache_memory, is_remote, mysql_cloud_discovery) into a
single helper sub (e.g. setup_common_mocks) placed near the top of the test
file, then call setup_common_mocks() inside each subtest instead of duplicating
the twenty lines; ensure you preserve the use of local to maintain lexical
scoping (or, if needed for scope, call the helper inside the subtest block or
convert to a BEGIN block) so the mocks apply correctly during each test run.
build/doc_sync.py (2)

9-9: 💤 Low value

The function name parse_aff is cryptic.

I've debugged enough code to know that three months from now, nobody (including you) will remember what "aff" stands for. Since this parses markdown frontmatter and headers, consider parse_markdown_metadata or parse_md_frontmatter.

📝 Proposed rename for clarity
-def parse_aff(file_path):
+def parse_markdown_metadata(file_path):

Then update the two call sites on lines 51 and 55.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@build/doc_sync.py` at line 9, Rename the cryptic function parse_aff to a
descriptive name like parse_markdown_metadata (or parse_md_frontmatter) and
update its two call sites that invoke parse_aff so they call the new name;
specifically change the function definition def parse_aff(...) to def
parse_markdown_metadata(...) and replace the two occurrences of parse_aff(...)
in this module with parse_markdown_metadata(...), keeping the implementation and
parameters unchanged.

55-55: 💤 Low value

The title variable on line 55 is unpacked but never used.

You're extracting the title from the markdown file but only using the description. Either use the title in the output or unpack only what you need to silence the linter.

🧹 Proposed fix to unpack only desc
-            title, desc = parse_aff(os.path.join(folder_path, filename))
+            _, desc = parse_aff(os.path.join(folder_path, filename))

Or if you meant to use the title in the table (which might be better than showing just the filename):

             title, desc = parse_aff(os.path.join(folder_path, filename))
-            output.append(f"| [`{filename}`](./{folder}/{filename}) | {desc} |")
+            output.append(f"| [`{title}`](./{folder}/{filename}) | {desc} |")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@build/doc_sync.py` at line 55, The code unpacks title and desc from
parse_aff(os.path.join(folder_path, filename)) but never uses title; either
discard the unused value or actually include it in the generated output. Locate
the parse_aff call (parse_aff(...)) in build/doc_sync.py and replace the unpack
with a single-value discard (e.g., use "_" or only assign desc) if you don't
need the title, or update the table/metadata writing logic that uses desc (where
filename is shown) to include title instead of filename so title is consumed.
tests/test_index_ratio.t (2)

47-52: 💤 Low value

The moment of truth – does your code actually work, or just think it does?

Your assertions look shipshape. You're validating both the filename and the actual CSV content with proper regex patterns that match the expected format. The three test cases (under/over/ideal) exercise the classification logic nicely.

However, I notice this test is laser-focused on the happy path where InnoDB exists, dumpdir is set, and all tables are large enough to analyze. While that's perfectly fine for a feature validation test, you might want to consider adding edge case coverage as optional future enhancements:

  • Tables with ≤50,000 rows (should be filtered out per the PR objectives)
  • Behavior when have_innodb isn't 'YES'
  • Behavior when dumpdir isn't configured
  • The skipsize threshold mentioned in your layer description

Think of it as insurance against future you wondering "wait, does this handle X?" at 2am during an incident.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_index_ratio.t` around lines 47 - 52, Tests only cover the happy
path; add edge-case assertions to ensure correct behavior when InnoDB is absent,
dumpdir is unset, small tables are filtered, and the skipsize threshold is
respected: extend tests around main::mysql_innodb() and the variables
$dumped_file/$dumped_content to include scenarios where have_innodb != 'YES',
dumpdir is not configured, a table has <=50000 rows (should be excluded from the
CSV), and a table at the skipsize boundary to validate inclusion/exclusion and
classification logic.

16-38: 💤 Low value

Ah, the noble art of mocking – where functions go to pretend they're working.

The test setup looks solid. Your mocks are appropriately stubbed, and I appreciate that you've included three representative test cases (under/over/ideal ratios). The ratio calculations in your comments are correct, which is good because nobody wants to debug floating-point math on a Monday morning.

One minor observation: the select_array mock returns data unconditionally without validating the query being executed. This is fine for a focused feature test (and consistent with this codebase's testing patterns), but it does mean if mysql_innodb() accidentally queries the weather in Tahiti instead of table statistics, this test will still pass. Consider it a trust exercise between you and your future self.

🔍 Optional: Validate the query being executed

If you want to be extra thorough, you could capture and validate the query:

+    my $captured_query = '';
     local *main::select_array = sub { 
+        $captured_query = $_[0];
         return `@mock_ratio_tables`; 
     };
     
     # ... after mysql_innodb() call ...
+    like($captured_query, qr/information_schema\.tables/i, 'Query targets information_schema');
+    like($captured_query, qr/table_rows\s*>\s*50000/i, 'Query filters for >50k rows');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_index_ratio.t` around lines 16 - 38, The test's select_array mock
returns canned rows for any query, so it won't catch if the code under test
calls the wrong SQL; update the mock for select_array to capture the incoming
SQL (e.g., store $_[0] or similar) and assert it matches the expected InnoDB
table stats query or a key pattern (so the test fails if mysql_innodb() queries
the wrong thing); keep using the same `@mock_ratio_tables` as the return value but
only after validating the captured query, and reference the mocked sub
select_array and the test block 'InnoDB Table Index/Data Ratio Check' to locate
where to add the capture/assertion.
tests/test_sysdb_filter.t (1)

16-16: ⚡ Quick win

Path consistency: prefer absolute path resolution.

Ah, the classic "works on my machine" path trap. Lines 13-15 went through the trouble of computing absolute paths, but this line reverted to a relative ./ path. If someone runs the test from a different working directory (say, during debugging), this require will fail while the mysqltuner.pl require succeeds.

🗺️ Proposed fix for consistent path handling
-require './tests/MySQLTuner/TestHelper.pm';
+my $helper = abs_path(File::Spec->catfile($script_dir, 'MySQLTuner', 'TestHelper.pm'));
+require $helper;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_sysdb_filter.t` at line 16, The test uses a relative require
('./tests/MySQLTuner/TestHelper.pm') which breaks when run from a different
working directory; replace that relative require with the absolute path variant
computed earlier in the file (use the same variable/logic created on lines 13-15
that resolves test paths) so the require loads TestHelper.pm via the resolved
absolute tests directory instead of './tests/...'.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@build/doc_sync.py`:
- Line 60: Replace the deprecated os.popen('date') usage inside the
output.append call in build/doc_sync.py with a safe Python datetime or
subprocess alternative: use datetime.datetime.now().strftime(...) (or
datetime.now().isoformat()) to produce the date string, or if you need the exact
shell date formatting invoke subprocess.run(['date', '+FORMAT'],
capture_output=True, text=True) and use the stdout; update the expression used
in the output.append("---\n*Generated automatically by `/doc-sync` on " + ... +
"*") to use the chosen safe alternative instead of os.popen(...).read().strip().

In `@mysqltuner.pl`:
- Around line 11475-11479: The WHERE clause in the queries against
events_statements_summary_by_digest currently uses "SCHEMA_NAME NOT IN (...)"
which unintentionally excludes rows where SCHEMA_NAME IS NULL; update those
WHERE clauses (the query string referenced and the similar block around lines
handling events_statements_summary_by_digest) to explicitly allow NULL-schema
digests by adding a NULL check such as including "SCHEMA_NAME IS NULL OR" before
the NOT IN check (i.e., ensure the predicate is (SCHEMA_NAME IS NULL OR
SCHEMA_NAME NOT IN (...))). Locate the queries that reference
events_statements_summary_by_digest and SCHEMA_NAME and apply the same change to
the second occurrence noted in the comment.
- Around line 11019-11021: The loop over `@ratio_tables` is splitting each $row on
'|' but select_array() returns tab-delimited rows, so $name becomes undefined
and rows are skipped; change the split in the foreach loop (where $row is
processed and assigned to ($schema, $name, $data_len, $index_len, $rows)) to
split on tabs (e.g., use "\t" or /\t/) so
$schema/$name/$data_len/$index_len/$rows are correctly populated and the
index/data ratio check can run.
- Around line 7611-7613: The current schema filters ($sys_db_filter_db,
$sys_db_filter_ts, $sys_db_filter_os) drop rows where the schema column is NULL;
change each to explicitly allow NULLs by wrapping the NOT IN condition with an
IS NULL check so NULL-valued schema rows are preserved (e.g. use "db IS NULL OR
db NOT IN (...)" for $sys_db_filter_db and similarly "table_schema IS NULL OR
table_schema NOT IN (...)" for $sys_db_filter_ts and "object_schema IS NULL OR
object_schema NOT IN (...)" for $sys_db_filter_os).
- Around line 12902-12905: The current WHERE clause built in $query using
$sys_schema_filter_cols{$sys_view} and $sys_excl_list drops rows where the key
column is NULL because "col NOT IN (...)" yields NULL; change the predicate to
be NULL-safe so NULL-valued rows are preserved in dumpdir exports — e.g. replace
the plain "$col NOT IN ($sys_excl_list)" with a NULL-aware condition such as
"($col IS NULL OR $col NOT IN ($sys_excl_list))" when appending to $query so the
filter excludes listed schemas but retains NULLs.

---

Nitpick comments:
In `@build/doc_sync.py`:
- Line 9: Rename the cryptic function parse_aff to a descriptive name like
parse_markdown_metadata (or parse_md_frontmatter) and update its two call sites
that invoke parse_aff so they call the new name; specifically change the
function definition def parse_aff(...) to def parse_markdown_metadata(...) and
replace the two occurrences of parse_aff(...) in this module with
parse_markdown_metadata(...), keeping the implementation and parameters
unchanged.
- Line 55: The code unpacks title and desc from
parse_aff(os.path.join(folder_path, filename)) but never uses title; either
discard the unused value or actually include it in the generated output. Locate
the parse_aff call (parse_aff(...)) in build/doc_sync.py and replace the unpack
with a single-value discard (e.g., use "_" or only assign desc) if you don't
need the title, or update the table/metadata writing logic that uses desc (where
filename is shown) to include title instead of filename so title is consumed.

In `@tests/issue_864.t`:
- Around line 143-146: Add a complementary subtest that covers the inverse
scenario where the per-connection ceiling is the binding limit and
temptable_max_ram is non‑binding: change the test data used in tests/issue_864.t
so that 'tmp_table_size' and/or 'max_heap_table_size' produce a small
per-connection limit (e.g., much smaller than 2 MB) while setting
'temptable_max_ram' to a very large value, then assert the computed temp table
allowance equals the per-connection limit (use the same assertion style and
helper as the existing subtest to mirror the happy-path checks).
- Around line 112-130: Extract the repeated mock declarations (all the local
*main::... subs like debugprint, is_int, human_size, hr_bytes, percentage,
badprint, infoprint, select_one, select_array, mysql_version_ge,
mysql_version_le, mysql_version_eq, execute_system_command, get_pf_memory,
get_gcache_memory, is_remote, mysql_cloud_discovery) into a single helper sub
(e.g. setup_common_mocks) placed near the top of the test file, then call
setup_common_mocks() inside each subtest instead of duplicating the twenty
lines; ensure you preserve the use of local to maintain lexical scoping (or, if
needed for scope, call the helper inside the subtest block or convert to a BEGIN
block) so the mocks apply correctly during each test run.

In `@tests/test_index_ratio.t`:
- Around line 47-52: Tests only cover the happy path; add edge-case assertions
to ensure correct behavior when InnoDB is absent, dumpdir is unset, small tables
are filtered, and the skipsize threshold is respected: extend tests around
main::mysql_innodb() and the variables $dumped_file/$dumped_content to include
scenarios where have_innodb != 'YES', dumpdir is not configured, a table has
<=50000 rows (should be excluded from the CSV), and a table at the skipsize
boundary to validate inclusion/exclusion and classification logic.
- Around line 16-38: The test's select_array mock returns canned rows for any
query, so it won't catch if the code under test calls the wrong SQL; update the
mock for select_array to capture the incoming SQL (e.g., store $_[0] or similar)
and assert it matches the expected InnoDB table stats query or a key pattern (so
the test fails if mysql_innodb() queries the wrong thing); keep using the same
`@mock_ratio_tables` as the return value but only after validating the captured
query, and reference the mocked sub select_array and the test block 'InnoDB
Table Index/Data Ratio Check' to locate where to add the capture/assertion.

In `@tests/test_sysdb_filter.t`:
- Line 16: The test uses a relative require ('./tests/MySQLTuner/TestHelper.pm')
which breaks when run from a different working directory; replace that relative
require with the absolute path variant computed earlier in the file (use the
same variable/logic created on lines 13-15 that resolves test paths) so the
require loads TestHelper.pm via the resolved absolute tests directory instead of
'./tests/...'.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6867b918-30f8-44cc-b61d-01905264411b

📥 Commits

Reviewing files that changed from the base of the PR and between 188a0f1 and 8d0ddeb.

⛔ Files ignored due to path filters (1)
  • .agent/README.md is excluded by !.agent/**
📒 Files selected for processing (10)
  • CURRENT_VERSION.txt
  • Changelog
  • USAGE.md
  • build/doc_sync.py
  • mysqltuner.pl
  • releases/v2.8.45.md
  • tests/issue_864.t
  • tests/test_index_ratio.t
  • tests/test_sysdb_filter.t
  • tests/unit_system.t

Comment thread build/doc_sync.py Outdated

output.append("\n")

output.append("---\n*Generated automatically by `/doc-sync` on " + os.popen('date').read().strip() + "*")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace the deprecated os.popen call—it's been living on borrowed time since Python 2.6.

Using os.popen('date') triggers security scanner warnings (S605, S607) and is deprecated. While your hardcoded date command isn't malicious, security tooling won't know that, and future Python versions may finally put os.popen out of its misery.

⏰ Proposed fix using datetime
+from datetime import datetime
+
 # ... later in the file ...
 
-    output.append("---\n*Generated automatically by `/doc-sync` on " + os.popen('date').read().strip() + "*")
+    output.append("---\n*Generated automatically by `/doc-sync` on " + datetime.now().strftime("%Y-%m-%d %H:%M:%S") + "*")

Or if you prefer shell date format, use subprocess:

+import subprocess
+
 # ... later in the file ...
 
-    output.append("---\n*Generated automatically by `/doc-sync` on " + os.popen('date').read().strip() + "*")
+    timestamp = subprocess.run(['date'], capture_output=True, text=True, check=True).stdout.strip()
+    output.append(f"---\n*Generated automatically by `/doc-sync` on {timestamp}*")
🧰 Tools
🪛 Ruff (0.15.15)

[error] 60-60: Starting a process with a shell: seems safe, but may be changed in the future; consider rewriting without shell

(S605)


[error] 60-60: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@build/doc_sync.py` at line 60, Replace the deprecated os.popen('date') usage
inside the output.append call in build/doc_sync.py with a safe Python datetime
or subprocess alternative: use datetime.datetime.now().strftime(...) (or
datetime.now().isoformat()) to produce the date string, or if you need the exact
shell date formatting invoke subprocess.run(['date', '+FORMAT'],
capture_output=True, text=True) and use the stdout; update the expression used
in the output.append("---\n*Generated automatically by `/doc-sync` on " + ... +
"*") to use the chosen safe alternative instead of os.popen(...).read().strip().

Comment thread mysqltuner.pl Outdated
Comment on lines +7611 to +7613
my $sys_db_filter_db = "db NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')";
my $sys_db_filter_ts = "table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')";
my $sys_db_filter_os = "object_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make these schema filters NULL-safe.

db NOT IN (...) also drops rows where db is NULL, so statements executed without a default schema vanish along with the system noise. Very efficient filter; slightly too efficient.

Suggested fix
-    my $sys_db_filter_db = "db NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')";
-    my $sys_db_filter_ts = "table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')";
-    my $sys_db_filter_os = "object_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')";
+    my $sys_db_filter_db = "(db IS NULL OR db NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys'))";
+    my $sys_db_filter_ts = "(table_schema IS NULL OR table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys'))";
+    my $sys_db_filter_os = "(object_schema IS NULL OR object_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys'))";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
my $sys_db_filter_db = "db NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')";
my $sys_db_filter_ts = "table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')";
my $sys_db_filter_os = "object_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')";
my $sys_db_filter_db = "(db IS NULL OR db NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys'))";
my $sys_db_filter_ts = "(table_schema IS NULL OR table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys'))";
my $sys_db_filter_os = "(object_schema IS NULL OR object_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys'))";
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mysqltuner.pl` around lines 7611 - 7613, The current schema filters
($sys_db_filter_db, $sys_db_filter_ts, $sys_db_filter_os) drop rows where the
schema column is NULL; change each to explicitly allow NULLs by wrapping the NOT
IN condition with an IS NULL check so NULL-valued schema rows are preserved
(e.g. use "db IS NULL OR db NOT IN (...)" for $sys_db_filter_db and similarly
"table_schema IS NULL OR table_schema NOT IN (...)" for $sys_db_filter_ts and
"object_schema IS NULL OR object_schema NOT IN (...)" for $sys_db_filter_os).

Comment thread mysqltuner.pl
Comment on lines +11019 to +11021
foreach my $row (@ratio_tables) {
my ($schema, $name, $data_len, $index_len, $rows) = split(/\|/, $row);
next unless defined $schema && defined $name;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Split on tabs here, not on optimism.

select_array() returns tab-delimited rows, but this loop splits on |. That leaves $name undefined, skips every row, and the new index/data ratio check quietly reports nothing. Classic silent failure: the most polite kind.

Suggested fix
-        my ($schema, $name, $data_len, $index_len, $rows) = split(/\|/, $row);
+        my ($schema, $name, $data_len, $index_len, $rows) = split(/\t/, $row);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
foreach my $row (@ratio_tables) {
my ($schema, $name, $data_len, $index_len, $rows) = split(/\|/, $row);
next unless defined $schema && defined $name;
foreach my $row (`@ratio_tables`) {
my ($schema, $name, $data_len, $index_len, $rows) = split(/\t/, $row);
next unless defined $schema && defined $name;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mysqltuner.pl` around lines 11019 - 11021, The loop over `@ratio_tables` is
splitting each $row on '|' but select_array() returns tab-delimited rows, so
$name becomes undefined and rows are skipped; change the split in the foreach
loop (where $row is processed and assigned to ($schema, $name, $data_len,
$index_len, $rows)) to split on tabs (e.g., use "\t" or /\t/) so
$schema/$name/$data_len/$index_len/$rows are correctly populated and the
index/data ratio check can run.

Comment thread mysqltuner.pl
Comment on lines 11475 to 11479
"SELECT digest_text, count_star, sum_no_index_used, sum_no_good_index_used "
. "FROM performance_schema.events_statements_summary_by_digest "
. "WHERE sum_no_index_used > 0 OR sum_no_good_index_used > 0 "
. "WHERE (sum_no_index_used > 0 OR sum_no_good_index_used > 0) "
. "AND SCHEMA_NAME NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') "
. "ORDER BY sum_no_index_used DESC LIMIT 5" );

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The anti-pattern filters are dropping NULL-schema digests too.

events_statements_summary_by_digest.SCHEMA_NAME can be NULL. With SCHEMA_NAME NOT IN (...), statements run without a default schema disappear from the full-scan and disk-temp reports, which is a fun way to lose the interesting queries.

Suggested fix
-          . "WHERE (sum_no_index_used > 0 OR sum_no_good_index_used > 0) "
-          . "AND SCHEMA_NAME NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') "
+          . "WHERE (sum_no_index_used > 0 OR sum_no_good_index_used > 0) "
+          . "AND (SCHEMA_NAME IS NULL OR SCHEMA_NAME NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')) "
           . "ORDER BY sum_no_index_used DESC LIMIT 5" );
...
-          . "WHERE sum_created_tmp_disk_tables > 0 "
-          . "AND SCHEMA_NAME NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') "
+          . "WHERE sum_created_tmp_disk_tables > 0 "
+          . "AND (SCHEMA_NAME IS NULL OR SCHEMA_NAME NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')) "
           . "ORDER BY sum_created_tmp_disk_tables DESC LIMIT 5" );

Also applies to: 11503-11507

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mysqltuner.pl` around lines 11475 - 11479, The WHERE clause in the queries
against events_statements_summary_by_digest currently uses "SCHEMA_NAME NOT IN
(...)" which unintentionally excludes rows where SCHEMA_NAME IS NULL; update
those WHERE clauses (the query string referenced and the similar block around
lines handling events_statements_summary_by_digest) to explicitly allow
NULL-schema digests by adding a NULL check such as including "SCHEMA_NAME IS
NULL OR" before the NOT IN check (i.e., ensure the predicate is (SCHEMA_NAME IS
NULL OR SCHEMA_NAME NOT IN (...))). Locate the queries that reference
events_statements_summary_by_digest and SCHEMA_NAME and apply the same change to
the second occurrence noted in the comment.

Comment thread mysqltuner.pl Outdated
Comment on lines +12902 to +12905
my $query = 'use sys; select * from sys.\`' . $sys_view_table . '\`';
if ( my $col = $sys_schema_filter_cols{$sys_view} ) {
$query .= " WHERE $col NOT IN ($sys_excl_list)";
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Carry the NULL-safe filter into dumpdir exports as well.

The generic WHERE $col NOT IN (...) has the same three-valued-logic gremlin for views keyed on db: rows with NULL in that column get dropped from the CSV snapshot. So the export stops being “exclude system schemas” and starts being “also exclude normal activity if SQL feels mysterious today.”

Suggested fix
-        if ( my $col = $sys_schema_filter_cols{$sys_view} ) {
-            $query .= " WHERE $col NOT IN ($sys_excl_list)";
+        if ( my $col = $sys_schema_filter_cols{$sys_view} ) {
+            $query .= " WHERE ($col IS NULL OR $col NOT IN ($sys_excl_list))";
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
my $query = 'use sys; select * from sys.\`' . $sys_view_table . '\`';
if ( my $col = $sys_schema_filter_cols{$sys_view} ) {
$query .= " WHERE $col NOT IN ($sys_excl_list)";
}
my $query = 'use sys; select * from sys.\`' . $sys_view_table . '\`';
if ( my $col = $sys_schema_filter_cols{$sys_view} ) {
$query .= " WHERE ($col IS NULL OR $col NOT IN ($sys_excl_list))";
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mysqltuner.pl` around lines 12902 - 12905, The current WHERE clause built in
$query using $sys_schema_filter_cols{$sys_view} and $sys_excl_list drops rows
where the key column is NULL because "col NOT IN (...)" yields NULL; change the
predicate to be NULL-safe so NULL-valued rows are preserved in dumpdir exports —
e.g. replace the plain "$col NOT IN ($sys_excl_list)" with a NULL-aware
condition such as "($col IS NULL OR $col NOT IN ($sys_excl_list))" when
appending to $query so the filter excludes listed schemas but retains NULLs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant