fix regexp_count should count empty-pattern matches#22311
Conversation
There was a problem hiding this comment.
@xiedeyantu
Thanks for the fix here. I walked through the updated behavior and everything looks good overall. I just have one small suggestion around expanding the SQL regression coverage for the user-visible paths.
| ---- | ||
| 4 | ||
|
|
||
| query I |
There was a problem hiding this comment.
Nice addition to cover regexp_count('abc', ''). Since this change also affects the start-position and flag handling paths, it would be great to add a couple more SQL-visible cases here as well, like regexp_count('abc', '', 2) and regexp_count('abc', '', 1, 'i'). It may also be worth covering the one-past-end or beyond-end boundary behavior if that is intentional. The Rust unit tests already exercise some of the internals, but adding these to the SLT suite would help lock in the user-facing behavior that motivated this PR.
There was a problem hiding this comment.
Thanks for your review! I have added some tests in this slt file. Please take a look again.
There was a problem hiding this comment.
You can refer to the execution results of PGSQL.
Which issue does this PR close?
regexp_countshould count empty-pattern matches #22267Rationale for this change
regexp_countdid not handle empty regular-expression patterns correctly. An empty pattern should be counted as valid matches instead of returning0. This also affects calls that use thestartargument and certain flag combinations.What changes are included in this PR?
regexp_countso empty-pattern matches are counted correctly.starthandling so character offsets are computed correctly.start, and flags.Are these changes tested?
cargo test -p datafusion-sqllogictest --test sqllogictests table_functionscargo test -p datafusion-sqllogictest --test sqllogictests scalarAre there any user-facing changes?
regexp_countnow returns correct counts for empty patterns, so results may differ from the previous behavior.