Skip to content

vitess: add vschemas with vindexes#8634

Open
jsha wants to merge 13 commits intomainfrom
vschemas-and-vindexes
Open

vitess: add vschemas with vindexes#8634
jsha wants to merge 13 commits intomainfrom
vschemas-and-vindexes

Conversation

@jsha
Copy link
Contributor

@jsha jsha commented Feb 19, 2026

In order to exercise sharding logic in Vitess, we need our integration test environment to use sharded keyspaces, which means we need to use vindexes. This PR enables vindexes and changes DB initialization in support. It does not yet shard the keyspaces since doing so causes some consistency issues I still need to debug.

DB initialization changes:

  • Stop using sql-migrate inside the boulder container.
  • Make the bmariadb and bvitess containers responsible for applying schemas when they start up.
  • Use different database / keyspace names for different test environments: boulder_sa / boulder_sa_next. This allows the two to coexist, which means we don't need to down --volumes the bmariadb container when running locally and switching from ./tn.sh to ./t.sh.
  • The bmariadb image has support for applying SQL schemas from a directory. Use that. The vttestserver has support for applying SQL schemas and vschemas from a directory.

Foreign keys: we had one foreign key constraint (serials.registrationID -> registrations), but we probably don't actually want this in the Vitess world. It makes things more complicated with no benefit. I removed it from the SQL schemas.

test/db.go: In vttestserver we can't delete more than 10k rows, so I changed deleteEverythingInAllTables to delete in a loop. Incidentally I cleaned up some stuff about foreign keys and 1 = 1 that is no longer needed. Also, I changed deletion calls in sa_test.go to use the shared deletion functions.

Deletion from overrides was failing under this setup with Error 1105 (HY000): VT09015: schema tracking required. I concluded this was because the table had no primary key. I changed its UNIQUE KEY to a PRIMARY KEY.

test/vars/vars.go: calculate DSNs based on the BOULDER_CONFIG_DIR variables, to select between boulder_sa and boulder_sa_next (etc.).

@jsha jsha marked this pull request as ready for review February 27, 2026 04:58
@jsha jsha requested a review from a team as a code owner February 27, 2026 04:58
@jsha jsha requested a review from aarongable February 27, 2026 04:58
@github-actions
Copy link
Contributor

@jsha, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

1 similar comment
@github-actions
Copy link
Contributor

@jsha, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

@jsha
Copy link
Contributor Author

jsha commented Feb 27, 2026

One thing worth noting about this approach: while it's kind of nice that the bmariadb and bvitess containers handle their own SQL schema setup, it means that syntax errors for the .sql files get buried in the logs for those two containers, and need e.g. docker compose logs bvitess to see them.

Copy link
Contributor

@aarongable aarongable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great for this PR to come with some explicit documentation pointing towards the mariadb and vitess docs around loading schemas from disk. I had to dig for a bit until I found mariadb's docker-entrypoint process_init_files and finally understood how the schemas were getting loading into mariadb.

Overall LGTM, it took me a minute to wrap my head around how things work in this new world but the result is much cleaner and simpler than the status quo. I like it.

@aarongable
Copy link
Contributor

Hmm, I thought the integration failure was a flake, but the exact same failure happened on my retry. Looks like it might be real.

@jsha
Copy link
Contributor Author

jsha commented Mar 4, 2026

The error is from revocation_test:297:

			if !strings.Contains(err.Error(), "urn:ietf:params:acme:error:serverInternal") ||
				!strings.Contains(err.Error(), "SCT embedding") {
				t.Fatal(err)
			}

This is testing that when we cause CT to reject a certificate (so we never create a final certificate), we can still revoke it. As part of the setup, we check that (a) the certificate issuance really did fail, and (b) the error message said something about SCT embeddeding.

In fact we got some errors like these:

2026-03-04T05:20:55.555677+00:00Z 87a3d59626fe unknown boulder-wfe2[694]: 6 boulder-wfe2 Vmx34w POST /acme/finalize/ 398 500 14973 0.0.0.0 JSON={"Slug":"398/343","InternalErrors":["rpc error: code = Unknown desc = persisting linting precert to database: rpc error: code = Unknown desc = failed to insert *sa.certificateStatusModel: context deadline exceeded (also, while rolling back: sql: transaction has already been committed or rolled back)"],"Error":"500 :: serverInternal :: Error finalizing order","ua":"eggsampler-acme/v3 Go-http-client/1.1","Extra":{"KeyType":"ECDSA P-256"},"Identifiers":[{"type":"dns","value":"9d232e.com"}]}

and these:

2026-03-04T05:20:55.566687+00:00Z 87a3d59626fe unknown boulder-ra[656]: 6 boulder-ra hYHGmA [AUDIT] Certificate request - error JSON={"ID":"RwYsuZHRkdvSMeE8Cf7Tl_ZA16uJMnbUOUb6v8K8Pus","Requester":403,"OrderID":348,"VerifiedFields":["subject.commonName","subjectAltName"],"Identifiers":[{"Ident":{"type":"dns","value":"df35d8.com"},"Authz":"356","Challenge":"http-01","Validated":"2026-03-04T05:20:40Z"}],"NotBefore":"0001-01-01T00:00:00Z","NotAfter":"0001-01-01T00:00:00Z","RequestTime":"2026-03-04T05:20:40.604393426Z","ResponseTime":"2026-03-04T05:20:55.566657086Z","Error":"rpc error: code = Unknown desc = getting SCTs: ra.SCTProvider.GetSCTs timed out after 892 ms","PreviousCertificateIssued":"0001-01-01T00:00:00Z","UserAgent":"eggsampler-acme/v3 Go-http-client/1.1"}

Also, with the addition of some improved error returns in the failOrder path I saw:

boulder-wfe2[694]: 6 boulder-wfe2 1i_XSA POST /acme/finalize/ 400 500 3054 0.0.0.0 JSON={"Slug":"400/336","InternalErrors":["getting SCTs: failed to get 2 SCTs, got 6 error(s): ct submission to \"A1 Current\" (\"http://boulder.service.consul:4600\") failed: rpc error: code = Unknown desc = got HTTP status \"400 Bad Request\"; ct submission to \"A2 Current\" (\"http://boulder.service.consul:4603\") failed: rpc error: code = Unknown desc = got HTTP status \"400 Bad Request\"; ct submission to \"B2\" (\"http://boulder.service.consul:4605\") failed: rpc error: code = Unknown desc = got HTTP status \"400 Bad Request\"; ct submission to \"C1\" (\"http://boulder.service.consul:4606\") failed: rpc error: code = Unknown desc = got HTTP status \"400 Bad Request\"; ct submission to \"D1\" (\"http://boulder.service.consul:4607\") failed: rpc error: code = Unknown desc = got HTTP status \"400 Bad Request\"; ct submission to \"B1\" (\"http://boulder.service.consul:4604\") failed: rpc error: code = Unknown desc = got HTTP status \"400 Bad Request\""],"Error":"500 :: serverInternal :: Error finalizing order :: Unable to meet CA SCT embedding requirements","ua":"eggsampler-acme/v3 Go-http-client/1.1","Extra":{"KeyType":"ECDSA P-256"},"Identifiers":[{"type":"dns","value":"b95747.com"}]}
 boulder-ra S5A7EQ [AUDIT] Persisting failed order JSON={"error":"rpc error: code = Unknown desc = error updating order error field:%!(EXTRA db.ErrDatabaseOp=failed to exec orders: context deadline exceeded) (also, while rolling back: sql: transaction has already been committed or rolled back)","order":339,"prob":"problemType:\"serverInternal\" detail:\"Error finalizing order :: Unable to meet CA SCT embedding requirements\" httpStatus:500","requester":397}

I'm able to reproduce locally, intermittently. Still needs more investigation. Also I'm noticing a couple of things in the output from TestRevocation:

  • crl-storer spams the logs a lot, making it hard to find relevant log lines.
  • we call t.Fatal in a goroutine, which we're not supposed to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants