Fix trace-file accumulation and Data Retention job failure (#972) #973
xp_delete_file cannot delete SQL Trace (.trc) files: it only accepts backup files and Maintenance Plan reports, and it validates the file header. The #951 trace cleanup in config.data_retention therefore never worked, and its malformed wildcard path raised an uncatchable Msg 22049 that failed the Data Retention Agent job on every run.

- Remove the broken xp_delete_file block from config.data_retention.
- collect.trace_management_collector now creates the trace with a rollover file-count cap (@filecount, via the new @max_files param), so SQL Server prunes old .trc files itself. START also replaces an unbounded trace left by an older version, so the fix self-heals without waiting for a SQL Server restart.
- scheduled_master_collector calls START instead of RESTART, so it no longer tears the trace down and orphans its files every cycle.
- Add tools/Remove-OrphanedTraceFiles.ps1 to sweep trace files left on disk by versions <= 2.11.0; document it in the README troubleshooting section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
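Why both halves of the fix are needed (a file-count cap *and* START instead of RESTART) can be shown with a small model. This is a minimal Python sketch, not the project's T-SQL. It assumes, as the `sp_trace_create` documentation describes for `@filecount`, that a rolling trace deletes its own oldest file before opening a new one once the cap is reached, and that it never touches files written by earlier traces. `TraceSession`, `simulate`, the file names, and the one-file-per-cycle rate are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TraceSession:
    """One server-side trace writing rollover files (simplified model).

    Assumption: once the cap is reached, the trace deletes its own oldest
    file before opening a new one, and never touches other traces' files.
    """

    base_name: str
    max_files: Optional[int]  # None models a trace created without @filecount
    live_files: List[str] = field(default_factory=list)
    next_index: int = 0

    def roll_over(self, disk: Set[str]) -> None:
        # Model naming for this trace's chain: base.trc, base_1.trc, base_2.trc, ...
        suffix = "" if self.next_index == 0 else f"_{self.next_index}"
        name = f"{self.base_name}{suffix}.trc"
        self.next_index += 1
        # Enforce the cap before opening the new file.
        if self.max_files is not None and len(self.live_files) >= self.max_files:
            disk.discard(self.live_files.pop(0))
        self.live_files.append(name)
        disk.add(name)


def simulate(cycles: int, restart_each_cycle: bool, max_files: Optional[int]) -> int:
    """Return how many .trc files remain on disk after `cycles` collector runs."""
    disk: Set[str] = set()
    session = TraceSession("Monitor_LongQueries_0", max_files)
    for cycle in range(cycles):
        if restart_each_cycle and cycle > 0:
            # RESTART: tear down the trace and start a fresh, differently named one.
            # The old session's files are orphaned; nothing will prune them.
            session = TraceSession(f"Monitor_LongQueries_{cycle}", max_files)
        session.roll_over(disk)  # assume each cycle fills exactly one file
    return len(disk)


if __name__ == "__main__":
    print("RESTART every cycle, cap 5:", simulate(100, True, 5))
    print("START (keep running), cap 5:", simulate(100, False, 5))
    print("START, no cap:", simulate(100, False, None))
```

Under these assumptions the three runs leave 100, 5, and 100 files: a cap alone does not help while every cycle starts a new trace, and a long-running trace alone does not help without a cap.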
I've been doing some troubleshooting today and was about to add similar comments (to yours) to #951 (but you beat me to it on #972). Sorry about suggesting xp_delete_file; I didn't know it was hobbled to only work with SQL backup files (you learn something new every day). I did upgrade the dashboard (full) to 2.11.0 a few days ago and then used the dashboard to upgrade a few of my targeted SQL servers (two SQL 2019 and two SQL 2022 servers), but none of them got the new version of the config.data_retention proc. The PerformanceMonitor database on each of them still has the older version of the proc with the xp_delete_file code.
@ That likely explains what you're seeing: the Dashboard binary is current, but the procs on your instances are still whatever the script workflow last installed. A few things that would help pin it down:
Sorry, I didn't explain myself very well. When I said I "used the dashboard to upgrade", what I meant was that I did: Manage Servers > picked a server > Check for Updates > Upgrade Now. The dashboard binaries that I have installed are v2.11.0 (not a nightly revision).

That will be why

For completeness, this is the recent install history on one of the affected SQL instances:

Coincidentally, v2.11.0 was the first time I used the

All good, I'll just be patient. 🙂
@ Your instincts were right on both counts, too:

So waiting for v2.12.0 is the cleanest path, but if the failing Data Retention job is noisy in the meantime, grabbing a nightly and re-running the per-server upgrade will get you the fix immediately. The

Thanks for the careful investigation and the kind words, genuinely appreciated. 🙂


Problem
The trace-file cleanup added in v2.11.0 (#951) failed the `PerformanceMonitor - Data Retention` Agent job on every run with `Msg 22049` once any `Monitor_LongQueries_*.trc` files existed.

Root causes found while investigating #972:
- `xp_delete_file` cannot delete `.trc` files at all: it only accepts SQL Server backup files and Maintenance Plan report files, and it validates the file header. The #951 cleanup ("[BUG] Trace files never get cleaned up") could never have worked.
- The cleanup's malformed wildcard path raised `Msg 22049` (extended-proc errors bypass `TRY...CATCH`), which failed the whole Agent job step.
- `scheduled_master_collector` issued `RESTART` every cycle (tearing down the trace and spawning a fresh timestamped one), and the trace was created with no rollover file-count cap, so `.trc` files piled up with nothing pruning them.

Changes
- `config.data_retention`: removed the broken `xp_delete_file` block (the crash).
- `collect.trace_management_collector`: new `@max_files` parameter (default 5), passed to `sp_trace_create @filecount`, so SQL Server prunes old `.trc` files itself as the trace rolls over. `START` now also replaces an unbounded trace left by an older version, so the fix self-heals without waiting for a SQL Server restart.
- `scheduled_master_collector`: calls `START` instead of `RESTART`, keeping one bounded trace running instead of orphaning files every cycle.
- `tools/Remove-OrphanedTraceFiles.ps1`: new one-time cleanup for `.trc` files left on disk by versions ≤ 2.11.0; referenced from the README troubleshooting section.

No version bump (release-time step).
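The `START` behavior described above amounts to a three-way decision: create a bounded trace, replace an unbounded one, or leave a bounded one alone. A minimal Python sketch of that decision, as a model rather than the T-SQL proc, follows. `RunningTrace`, `StartAction`, and `plan_start` are hypothetical names, and the sketch assumes any running trace that already has a file-count cap counts as current. The `@max_files` check mirrors `sp_trace_create`, whose `@filecount` must be greater than one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


@dataclass(frozen=True)
class RunningTrace:
    # Hypothetical view of the monitor trace currently running on the instance.
    # max_files is None when the trace was created without a rollover cap,
    # as older versions did.
    max_files: Optional[int]


class StartAction(Enum):
    CREATE = "create a bounded trace"
    REPLACE = "stop the unbounded trace and create a bounded one"
    NO_OP = "a bounded trace is already running"


def plan_start(current: Optional[RunningTrace], max_files: int = 5) -> StartAction:
    """Decide what START should do (a model of the behavior, not the proc)."""
    # @filecount must be greater than 1, so reject 1 (or less) up front.
    if max_files < 2:
        raise ValueError("@max_files must be greater than 1")
    if current is None:
        return StartAction.CREATE
    if current.max_files is None:
        # Left over from an older version: replace it so the fix applies
        # without waiting for a SQL Server restart.
        return StartAction.REPLACE
    # Already bounded: calling START again changes nothing.
    return StartAction.NO_OP
```

Because a repeated `START` on an already-fixed instance is a no-op, `scheduled_master_collector` can call it every cycle without discarding the trace and its files the way `RESTART` did.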
`CHANGELOG.md` updated under `[Unreleased]`.

Test plan
Tested live on SQL Server 2016 and 2019:
- `@max_files = 1` → validation error.
- `START` → trace created with `max_files = 5` and rollover on.
- `START` against an unbounded trace → replaced with a bounded one.
- `START` against a bounded trace → idempotent no-op.
- `RESTART` / `STATUS` / `STOP` all work.
- `config.data_retention` runs with `SUCCESS` (cleaned 51 tables) and no `Msg 22049`.
- `Remove-OrphanedTraceFiles.ps1`: a `-WhatIf` preview plus a real run deleted 181 (2019) / 280 (2016) orphaned files, correctly skipping the running trace's file and locked files.
- `@filecount` rollover-delete behavior verified against the MS Learn `sp_trace_create` docs.
- Existing callers of `trace_management_collector` use named parameters, so the new parameter is safe.

🤖 Generated with Claude Code
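The orphan-sweep results in the test plan imply three selection rules: touch only monitor trace files, never the running trace's file, and skip anything locked. Those rules can be modeled as a filter. This is a minimal Python sketch, not the PowerShell script: `select_orphans` and the `is_locked` callback are hypothetical stand-ins (a real sweep would need its own lock check, for example attempting to open the file), and the default pattern is the `Monitor_LongQueries_*.trc` name from the Problem section. `PureWindowsPath` keeps Windows path semantics, including case-insensitive comparison, on any OS.

```python
import fnmatch
from pathlib import PureWindowsPath
from typing import Callable, Iterable, List, Optional


def select_orphans(
    files: Iterable[PureWindowsPath],
    running_trace_file: Optional[PureWindowsPath],
    is_locked: Callable[[PureWindowsPath], bool],
    pattern: str = "Monitor_LongQueries_*.trc",
) -> List[PureWindowsPath]:
    """Return the trace files that are safe to delete, in sorted order."""
    orphans = []
    for path in sorted(files):
        # Only touch monitor trace files; Windows names are case-insensitive.
        if not fnmatch.fnmatchcase(path.name.lower(), pattern.lower()):
            continue
        # The running trace is still writing to this file.
        if running_trace_file is not None and path == running_trace_file:
            continue
        # Another process holds the file open; skip it instead of failing.
        if is_locked(path):
            continue
        orphans.append(path)
    return orphans


if __name__ == "__main__":
    folder = PureWindowsPath(r"D:\Traces")  # hypothetical trace directory
    files = [folder / f"Monitor_LongQueries_{i}.trc" for i in range(1, 4)]
    files.append(folder / "unrelated.trc")
    locked = {folder / "Monitor_LongQueries_2.trc"}
    running = folder / "Monitor_LongQueries_3.trc"
    # -WhatIf-style preview: list what would be deleted without deleting it.
    for path in select_orphans(files, running, lambda p: p in locked):
        print(f"What if: would delete {path}")
```

A real run would delete each returned path instead of printing it; a file that becomes locked between selection and deletion would still need its own error handling.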