fix(db): persist CPE column in postgres and mysql writers#2494
Conversation
Adds cpe column to both SQL schemas, idempotent migration for existing tables, and JSON binding in InsertBatch.
WalkthroughMySQL and PostgreSQL drivers add a ChangesCPE Column Database Support
Input streaming and resolver handling
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/db/mysql.go (1)
176-194: 💤 Low valueConsider restricting allowed column definitions to prevent SQL injection.
The
definitionparameter is interpolated directly into the SQL statement without validation. While currently called only with hardcoded"JSON", this pattern could become a risk if the function is reused with dynamic input.Static analysis flagged this as a potential SQL injection vector. Since table and column names are already properly quoted via
quoteIdentifier, the practical risk is low given current usage.♻️ Optional: validate definition against allowed types
+var allowedColumnTypes = map[string]bool{ + "JSON": true, +} + func (m *mysqlDatabase) ensureColumn(ctx context.Context, column, definition string) error { + if !allowedColumnTypes[definition] { + return fmt.Errorf("unsupported column definition: %s", definition) + } var count int🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/db/mysql.go` around lines 176 - 194, The ensureColumn function currently interpolates the definition string directly into an ALTER TABLE SQL and should validate it first to avoid injection; update ensureColumn to allow only a small whitelist (e.g., "JSON", "TEXT", fixed-size "VARCHAR(n)" matching a strict regex like ^VARCHAR\([1-9][0-9]?\)$) or explicit allowed tokens, reject or return an error for anything else, and also reject characters like ';' or '--'. Keep using quoteIdentifier for m.cfg.TableName and column, and only call fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", ...) with the validated/normalized definition so dynamic input cannot inject SQL.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/db/mysql.go`:
- Around line 176-194: The ensureColumn function currently interpolates the
definition string directly into an ALTER TABLE SQL and should validate it first
to avoid injection; update ensureColumn to allow only a small whitelist (e.g.,
"JSON", "TEXT", fixed-size "VARCHAR(n)" matching a strict regex like
^VARCHAR\([1-9][0-9]?\)$) or explicit allowed tokens, reject or return an error
for anything else, and also reject characters like ';' or '--'. Keep using
quoteIdentifier for m.cfg.TableName and column, and only call fmt.Sprintf("ALTER
TABLE %s ADD COLUMN %s %s", ...) with the validated/normalized definition so
dynamic input cannot inject SQL.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7527d076-4683-4924-8b5b-4c790eb6b290
📒 Files selected for processing (2)
internal/db/mysql.gointernal/db/postgres.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/db/mysql.go`:
- Around line 169-176: The current ad-hoc migration in m.ensureColumn uses a
COUNT(*) then ALTER TABLE which races during concurrent upgrades; update
ensureColumn to make the schema change idempotent by performing the ALTER TABLE
ADD COLUMN using "ADD COLUMN IF NOT EXISTS" when supported or, if not, by
executing the ALTER TABLE and treating MySQL error 1060 (duplicate column) as a
non-fatal success. Locate the ensureColumn implementation and change the logic
that does COUNT(*) + ALTER to instead attempt the ALTER directly and
ignore/handle the duplicate-column error (or prefer ADD COLUMN IF NOT EXISTS) so
concurrent callers do not return an error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f88c6546-557c-4a37-a6df-28d34beb904f
📒 Files selected for processing (2)
internal/db/mysql.gointernal/db/postgres.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/db/postgres.go
| // Back-compat for databases whose schema was created before CPE support. | ||
| // New installs already get this column via the CREATE TABLE above; this | ||
| // path only matters for in-place upgrades. | ||
| // TODO: replace these ad-hoc ensureColumn calls with a proper migration | ||
| // framework (e.g. golang-migrate / goose) once more schema changes accumulate. | ||
| if err := m.ensureColumn(ctx, "cpe", "JSON"); err != nil { | ||
| return fmt.Errorf("failed to ensure cpe column: %w", err) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd internal/db && head -200 mysql.go | tail -50Repository: projectdiscovery/httpx
Length of output: 1521
🏁 Script executed:
cat -n internal/db/mysql.go | sed -n '169,200p'Repository: projectdiscovery/httpx
Length of output: 1329
🏁 Script executed:
# Check the import structure to understand current state
cat -n internal/db/mysql.go | head -20Repository: projectdiscovery/httpx
Length of output: 526
🏁 Script executed:
# Verify MySQL error code 1060
web_searchRepository: projectdiscovery/httpx
Length of output: 117
🌐 Web query:
MySQL error code 1060 duplicate column
💡 Result:
MySQL error code 1060 (ER_DUP_FIELDNAME, SQLSTATE 42S21) occurs with the message "Duplicate column name '%s'".[1][2][3] It is triggered when attempting to define a column with a name that already exists in the table or result set, enforcing unique column names within tables and query results.[4][5][1] Common scenarios: - CREATE TABLE with duplicate column definitions, e.g., two 'email' columns.[4] - ALTER TABLE ADD COLUMN where the column already exists.[6] - SELECT * from joined tables with matching column names (e.g., both have 'id').[7] - CREATE VIEW without unique column aliases.[8] - Subqueries or INSERT...SELECT without column aliases.[9][10] Fixes: 1. Check the error message for the specific duplicate column name '%s'. 2. For CREATE/ALTER: Remove or rename the duplicate (e.g., ADD COLUMN new_email).[4][6] 3. For SELECT/JOIN/VIEW: Use aliases, e.g., SELECT t1.id AS t1_id, t2.id AS t2_id.[7][8] Example: SELECT t.id AS tips_id FROM tips t LEFT JOIN tip_usage ON tip_usage.tip_id = t.id GROUP BY t.id;[7] 4. Verify table schema with SHOW CREATE TABLE or DESCRIBE table to confirm existing columns.[6] This behavior is consistent across MySQL versions including 8.0+ and 9.x, and also in MariaDB.[3][1][2]
Citations:
- 1: https://dev.mysql.com/doc/mysql-errors/en/server-error-reference.html
- 2: http://dev.mysql.com/doc/mysql-errors/9.6/en/server-error-reference.html
- 3: https://mariadb.com/docs/server/reference/error-codes/mariadb-error-codes-1000-to-1099/e1060
- 4: https://deverrors.com/errors/mysql-error-1060-duplicate-column-name
- 5: https://philipmcclarence.com/resolving-mysql-error-1060-dealing-with-duplicate-column-names/
- 6: https://stackoverflow.com/questions/13736570/mysql-error-1060-duplicate-column-name-alter-table
- 7: https://stackoverflow.com/questions/4815627/1060-duplicate-column-name-id
- 8: https://help.navicat.com/hc/en-us/articles/217791868-1060-Duplicate-column-name-xxx
- 9: https://stackoverflow.com/questions/38757229/error-code-1060-duplicate-column-name
- 10: https://stackoverflow.com/questions/25349763/mysql-get-an-error-error-er-dup-fieldname-duplicate-column
🏁 Script executed:
# Check if errors package is already imported
rg "^import|\"errors\"" internal/db/mysql.goRepository: projectdiscovery/httpx
Length of output: 75
🏁 Script executed:
# Verify go-sql-driver/mysql exports MySQLError type
# Search for usage patterns in the codebase
rg "MySQLError|mysql\.MySQLError" --type goRepository: projectdiscovery/httpx
Length of output: 48
🏁 Script executed:
# Check the actual go-sql-driver/mysql module to understand the error type
web_searchRepository: projectdiscovery/httpx
Length of output: 117
🌐 Web query:
go-sql-driver mysql MySQLError type documentation
💡 Result:
In github.com/go-sql-driver/mysql, MySQL server errors are represented by the struct type MySQLError (pointer receiver form is used): - Type: type MySQLError struct { Number uint16; SQLState [1]byte; Message string } [2] - String formatting: func (me *MySQLError) Error string {... } [2] - Error matching: func (me *MySQLError) Is(err error) bool { if merr, ok:= err.(*MySQLError); ok { return merr.Number == me.Number } return false } [2] So, for documentation/usage you typically: 1) Import the driver and run queries via database/sql. [3] 2) Type-assert / extract *mysql.MySQLError from returned errors when you want the MySQL error number and SQLState; the driver supports matching on MySQLError, including via errors.Is-style number matching because MySQLError defines Is. [2][4] Example (extract typed MySQL error): var mysqlErr *mysql.MySQLError if errors.As(err, &mysqlErr) { log.Printf("MySQL Error %d (%s): %s", mysqlErr.Number, string(mysqlErr.SQLState[:]), mysqlErr.Message) } This works because MySQLError carries Number/SQLState/Message. [2] Notes on exported sentinel errors: - The driver also defines various exported error variables (e.g., ErrInvalidConn, ErrMalformPkt, ErrNoTLS, etc.) for non–server-returned error scenarios, and these can be checked directly. [2] If you want the exact code definition, see errors.go in the driver repository. [2]
Citations:
- 1: https://dev.to/gabrielanhaia/errorsjoin-vs-multi-return-when-to-aggregate-when-to-wrap-3b5o
- 2: https://github.com/go-sql-driver/mysql/blob/master/errors.go
- 3: https://pkg.go.dev/github.com/go-sql-driver/mysql
- 4: https://stackoverflow.com/questions/47009068/how-to-determine-the-error-type-from-mysql
Make the column migration safe for concurrent upgrades.
The COUNT(*) + ALTER TABLE sequence at lines 181-199 is racy. If two instances start against the same pre-existing table, both can observe the column missing; one ADD COLUMN succeeds and the other fails with MySQL error 1060 (duplicate column), so the migration is not idempotent during rolling deploys.
💡 Suggested fix
import (
"context"
"database/sql"
"encoding/json"
+ "errors"
"fmt"
"strings"
- _ "github.com/go-sql-driver/mysql"
+ mysql "github.com/go-sql-driver/mysql"
"github.com/projectdiscovery/httpx/runner"
)
@@
func (m *mysqlDatabase) ensureColumn(ctx context.Context, column, definition string) error {
var count int
err := m.db.QueryRowContext(ctx,
`SELECT COUNT(*) FROM information_schema.columns
WHERE table_schema = DATABASE() AND table_name = ? AND column_name = ?`,
m.cfg.TableName, column,
).Scan(&count)
if err != nil {
return err
}
if count > 0 {
return nil
}
_, err = m.db.ExecContext(ctx,
fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s",
quoteIdentifier(m.cfg.TableName), quoteIdentifier(column), definition),
)
+ if err != nil {
+ var mysqlErr *mysql.MySQLError
+ if errors.As(err, &mysqlErr) && mysqlErr.Number == 1060 {
+ return nil
+ }
+ }
return err
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/db/mysql.go` around lines 169 - 176, The current ad-hoc migration in
m.ensureColumn uses a COUNT(*) then ALTER TABLE which races during concurrent
upgrades; update ensureColumn to make the schema change idempotent by performing
the ALTER TABLE ADD COLUMN using "ADD COLUMN IF NOT EXISTS" when supported or,
if not, by executing the ALTER TABLE and treating MySQL error 1060 (duplicate
column) as a non-fatal success. Locate the ensureColumn implementation and
change the logic that does COUNT(*) + ALTER to instead attempt the ALTER
directly and ignore/handle the duplicate-column error (or prefer ADD COLUMN IF
NOT EXISTS) so concurrent callers do not return an error.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@runner/runner.go`:
- Around line 746-749: In the loops iterating "for item, err := range
fileutil.Lines(r.options.InputFile)" do not silently return on err; instead
surface the error (log it via the runner's logger or return it up the call
chain). Replace the bare "if err { return }" with code that logs a descriptive
message including r.options.InputFile and the error (or wraps and returns the
error) so read errors at the fileutil.Lines call are visible and fail the scan;
apply the same change to the other two occurrences handling errors from
fileutil.Lines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0b2c1de0-cbba-4122-a14c-37ce9652b7c3
📒 Files selected for processing (2)
runner/options.gorunner/runner.go
| for item, err := range fileutil.Lines(r.options.InputFile) { | ||
| if err != nil { | ||
| return | ||
| } |
There was a problem hiding this comment.
Do not swallow stream read errors silently.
At Line 748, Line 765, and Line 778, read errors cause an immediate return with no log/error signal. That can truncate scans silently and is very hard to diagnose.
Proposed fix
@@
- for item, err := range fileutil.Lines(r.options.InputFile) {
+ for item, err := range fileutil.Lines(r.options.InputFile) {
if err != nil {
+ gologger.Error().Msgf("Could not read input file '%s': %s\n", r.options.InputFile, err)
return
}
@@
- for item, err := range fileutil.Lines(file) {
+ for item, err := range fileutil.Lines(file) {
if err != nil {
+ gologger.Error().Msgf("Could not read input file '%s': %s\n", file, err)
return
}
@@
- for item, err := range fileutil.LinesReader(os.Stdin) {
+ for item, err := range fileutil.LinesReader(os.Stdin) {
if err != nil {
+ gologger.Error().Msgf("Could not read input from stdin: %s\n", err)
return
}Also applies to: 763-766, 776-779
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@runner/runner.go` around lines 746 - 749, In the loops iterating "for item,
err := range fileutil.Lines(r.options.InputFile)" do not silently return on err;
instead surface the error (log it via the runner's logger or return it up the
call chain). Replace the bare "if err { return }" with code that logs a
descriptive message including r.options.InputFile and the error (or wraps and
returns the error) so read errors at the fileutil.Lines call are visible and
fail the scan; apply the same change to the other two occurrences handling
errors from fileutil.Lines.
Proposed changes
Closes #2487. Add
cpecolumn to postgres and mysql schemas with idempotent migration for existing tables and JSON binding in InsertBatch. MongoDB unaffected.Proof
Fresh and pre-existing tables verified on postgres 17 and mysql 8.4.
go vetand unit tests pass.Checklist
Summary by CodeRabbit