Conversation
Summary of Changes

This pull request significantly enhances the project's operational readiness and robustness. It introduces detailed documentation for deploying a multi-node Elastickv cluster using Docker, with clear instructions for manual setup across multiple virtual machines. It also improves reliability through expanded test coverage, including a new test for Raft's dynamic cluster membership changes and a refactoring of the DynamoDB compatibility tests. Finally, the PR refines the Raft bootstrap mechanism by allowing explicit definition of initial cluster members, and fixes a minor bug in DynamoDB JSON attribute unmarshalling.
Code Review
This pull request introduces a valuable feature for bootstrapping a multi-node cluster from a static configuration, complete with documentation and tests. It also improves test coverage by adding a new integration test for AddVoter and refactoring a large test for better readability. The changes are generally solid. I have a couple of suggestions to improve test robustness and address the use of a deprecated function.
adapter/add_voter_join_test.go
```go
	return nodes[0].raft.State() == raft.Leader
}, waitTimeout, waitInterval)

adminConn, err := grpc.NewClient(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))
```
grpc.NewClient is deprecated and should be avoided in new code. Please use grpc.Dial instead. This also applies to the other grpc.NewClient calls in this test on lines 158 and 166.
Suggested change:

```diff
-adminConn, err := grpc.NewClient(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))
+adminConn, err := grpc.Dial(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))
```
```go
})
require.NoError(t, err)
require.Empty(t, getOut.Item)
t.Run("ThreadItemOperations", func(t *testing.T) {
```
This sub-test, and others that follow, are not independent and rely on state created by the TableLifecycle sub-test. This means running a sub-test in isolation (e.g., go test -run TestDynamoDB_TableAPICompatibility/ThreadItemOperations) will fail.
To make the tests more robust and easier to debug, consider making each sub-test independent by performing its own setup. For example, you could call createThreadsTable(t) and createMessagesTable(t) at the beginning of each sub-test that requires them.
Pull request overview
This PR adds support for bootstrapping a Raft cluster with an explicit member list, expands/organizes test coverage around bootstrapping and join behavior, and adds a multi-node Docker deployment runbook.
Changes:
- Add `--raftBootstrapMembers` support (parsing + validation) and plumb explicit bootstrap server configuration into Raft startup.
- Refactor startup code to use a `startRuntimeServers` helper and improve demo cluster join logic to await `AddVoter`.
- Add/reshape tests for bootstrap parsing/validation and the Raft join path; add deployment documentation.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| shard_config.go | Adds parseRaftBootstrapMembers + new error for invalid entries. |
| shard_config_test.go | Adds unit tests for parsing bootstrap member strings. |
| main.go | Adds --raftBootstrapMembers, validates/derives bootstrap server list, plumbs into runtime startup; introduces startRuntimeServers. |
| main_bootstrap_test.go | Adds tests for bootstrap server resolution/validation behavior. |
| multiraft_runtime.go | Extends newRaftGroup to accept explicit bootstrap server config. |
| multiraft_runtime_test.go | Updates newRaftGroup call site for new signature. |
| cmd/server/demo.go | Refactors join logic into helpers and awaits AddVoter completion. |
| adapter/add_voter_join_test.go | Adds an integration-style test covering AddVoter join + serving adapter traffic. |
| adapter/dynamodb_types.go | Refactors AttributeValue JSON unmarshal path. |
| adapter/dynamodb_table_compat_test.go | Refactors large compatibility test into subtests + helper closures. |
| adapter/dynamodb.go | Adds clarification comment on transaction StartTS alignment with condition evaluation snapshot. |
| docs/docker_multinode_manual_run.md | Adds multi-VM manual docker run runbook for 4–5 node clusters. |
| README.md | Links the new deployment/runbook document. |
```go
func TestAddVoterJoinPath_RegistersMemberAndServesAdapterTraffic(t *testing.T) {
	const (
		waitTimeout  = 8 * time.Second
		waitInterval = 100 * time.Millisecond
	)
```
Adapter unit tests in this repo generally call t.Parallel() at the start of each top-level test; this new test doesn’t, which can unnecessarily serialize the adapter test suite. Consider adding t.Parallel() here (ports are already dynamically assigned).
```go
t.Run("TableLifecycle", func(t *testing.T) {
	createThreadsTable(t)
	createMessagesTable(t)
```
These subtests share mutable state but only create the tables inside the TableLifecycle subtest. Running an individual subtest via go test -run TestDynamoDB_TableAPICompatibility/ThreadItemOperations will fail because the tables were never created. Move table creation to a shared setup before the t.Run calls, or have each subtest create (and optionally delete) the tables it depends on.
Suggested change:

```diff
-t.Run("TableLifecycle", func(t *testing.T) {
-	createThreadsTable(t)
-	createMessagesTable(t)
+// Ensure required tables exist before running any subtests so that
+// individual subtests can be run in isolation (e.g. with -run).
+createThreadsTable(t)
+createMessagesTable(t)
+t.Run("TableLifecycle", func(t *testing.T) {
```
adapter/dynamodb_types.go
```diff
+var (
+	kind     string
+	valueRaw json.RawMessage
+	found    bool
+)
 for k, v := range raw {
-	return a.unmarshalSingleAttributeValue(k, v)
+	kind = k
+	valueRaw = v
+	found = true
+	break
 }
-return errors.New("invalid attribute value")
+if !found {
+	return errors.New("invalid attribute value")
+}
+return a.unmarshalSingleAttributeValue(kind, valueRaw)
```
Given len(raw) != 1 is already rejected above, the found flag and the if !found branch are redundant (the loop will always execute exactly once). This can be simplified by directly extracting the single map entry (or returning from the range loop as before).
docs/docker_multinode_manual_run.md
```markdown
# Manually building a 4–5 node Elastickv cluster across multiple VMs with Docker (no docker compose)

This document describes the steps to build an Elastickv Raft cluster across **multiple VMs** using only `docker run`.
`docker compose` is not used. **Running all nodes on a single VM is not supported.**
```
This deployment/runbook doc is entirely in Japanese, while the existing docs/README are written in English. For consistency and broader accessibility, consider adding an English version (or at least an English summary at the top) so non-Japanese readers can follow the deployment steps.
```go
localAddr := groups[0].address
for _, s := range servers {
	if string(s.ID) != raftID {
		continue
	}
	if string(s.Address) != localAddr {
		return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersLocalAddrMismatch, "expected %q got %q", localAddr, s.Address))
	}
```
resolveBootstrapServers compares bootstrap member address against groups[0].address, but the error text says it must match --address. This can mislead when --raftGroups is provided (group address may differ from --address). Either validate against the actual --address value (pass it in) or adjust the error/message to reference the group address/--raftGroups instead of --address.
adapter/add_voter_join_test.go
```go
bound, ls, retry, err := bindListeners(ctx, &lc, ports[i])
require.NoError(t, err)
require.False(t, retry)
ports[i] = bound
listeners = append(listeners, ls)
```
This test assumes bindListeners will never return retry=true (EADDRINUSE), and fails immediately if it does. Since ports are assigned globally, collisions can happen in parallel test runs, making this test flaky. Use the existing retry loop pattern from setupNodes (or call setupNodes/createNode) to reassign ports until listeners bind successfully.
Suggested change:

```diff
-bound, ls, retry, err := bindListeners(ctx, &lc, ports[i])
-require.NoError(t, err)
-require.False(t, retry)
-ports[i] = bound
-listeners = append(listeners, ls)
+for {
+	bound, ls, retry, err := bindListeners(ctx, &lc, ports[i])
+	require.NoError(t, err)
+	if !retry {
+		ports[i] = bound
+		listeners = append(listeners, ls)
+		break
+	}
+	// Port was in use; get a new suggestion and retry binding.
+	newPorts := assignPorts(1)
+	ports[i] = newPorts[0]
+}
```
```bash
docker run -d \
  --name elastickv \
  --restart unless-stopped \
  --network host \
  -v /var/lib/elastickv:/var/lib/elastickv \
  ghcr.io/bootjp/elastickv:latest /app \
  --address "10.0.0.11:50051" \
  --redisAddress "0.0.0.0:6379" \
  --dynamoAddress "0.0.0.0:8000" \
  --raftId "n1" \
  --raftDataDir "/var/lib/elastickv" \
```
The docker run examples here start the Elastickv container with --network host and bind --redisAddress/--dynamoAddress to 0.0.0.0, while also exposing the gRPC --address on a non-loopback IP. This combination publishes unauthenticated Redis, DynamoDB-compatible, and Raft admin/data-plane gRPC endpoints to the entire host network, so any machine with network access can read/modify data or reconfigure the cluster using redis-cli/grpcurl without credentials. To avoid unauthorized access, restrict these bindings to trusted interfaces and/or enforce network-level ACLs and application-layer authentication/TLS for Redis, DynamoDB, and Raft gRPC before recommending this setup for multi-VM deployments.
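As a sketch of the mitigation (the loopback bindings and the iptables rules below are illustrative, not from the runbook; adapt interface addresses and ACLs to your environment):

```shell
# Bind client-facing listeners to loopback (or a trusted private interface)
# instead of 0.0.0.0, so only local or explicitly allowed clients reach them.
docker run -d \
  --name elastickv \
  --restart unless-stopped \
  --network host \
  -v /var/lib/elastickv:/var/lib/elastickv \
  ghcr.io/bootjp/elastickv:latest /app \
  --address "10.0.0.11:50051" \
  --redisAddress "127.0.0.1:6379" \
  --dynamoAddress "127.0.0.1:8000" \
  --raftId "n1" \
  --raftDataDir "/var/lib/elastickv"

# Restrict the Raft/gRPC port to cluster peers at the host firewall
# (example rules assuming the peers live in 10.0.0.0/24):
sudo iptables -A INPUT -p tcp --dport 50051 -s 10.0.0.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 50051 -j DROP
```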
/gemini review |
Code Review
This pull request introduces a significant feature for cluster bootstrapping using a fixed list of members via the --raftBootstrapMembers flag, which is very useful for environments without dynamic service discovery. The implementation is well-supported with new documentation, unit tests, and an end-to-end test.
Additionally, the PR improves test coverage with a new integration test for the AddVoter join path and refactors an existing large test for better maintainability. The demo server's cluster joining logic has also been made more robust.
My review found one medium-severity issue in the new add_voter_join_test.go test regarding error handling in goroutines, which could lead to flaky test results. Overall, this is a high-quality contribution that enhances the project's deployment flexibility and test robustness.
adapter/add_voter_join_test.go
```go
raftadmin.Register(s, r)

go func(srv *grpc.Server, l net.Listener) {
	assert.NoError(t, srv.Serve(l))
```
The assert.NoError call inside a goroutine can lead to race conditions in test reporting and will fail on expected shutdown errors. When the server is shut down during test cleanup, srv.Serve(l) will return an error (e.g., grpc.ErrServerStopped), causing assert.NoError to fail. This will mark the test as failed, but it won't stop the test's execution.
A better approach is to check for expected shutdown errors and only assert on unexpected ones. This applies to the other two goroutines starting the Redis and DynamoDB servers as well (lines 177 and 182).
```go
if err := srv.Serve(l); err != nil && err != grpc.ErrServerStopped {
	assert.NoError(t, err)
}
```
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
```go
var lc net.ListenConfig
l, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:0")
require.NoError(t, err)
addr := l.Addr().String()
require.NoError(t, l.Close())
return addr
```
reserveTCPAddr closes the listener and returns the address, which is inherently racy (another process/test can grab the port before the node binds). This can make the E2E test flaky; consider allocating listeners and keeping them open until startRuntimeServers begins serving, or add a retry loop around server startup to handle transient "address already in use" failures.
Suggested change:

```diff
-var lc net.ListenConfig
-l, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:0")
-require.NoError(t, err)
-addr := l.Addr().String()
-require.NoError(t, l.Close())
-return addr
+// Return a wildcard localhost address and let the actual server bind
+// to an ephemeral port, avoiding the racy "reserve then close" pattern.
+return "127.0.0.1:0"
```
main_bootstrap_e2e_test.go
```go
_, err := clients[writerIdx].RawPut(context.Background(), &pb.RawPutRequest{Key: key, Value: value})
require.NoError(t, err)

for i := range clients {
	client := clients[i]
	require.Eventually(t, func() bool {
		resp, getErr := client.RawGet(context.Background(), &pb.RawGetRequest{Key: key})
		if getErr != nil {
			return false
		}
		return resp.Exists && bytes.Equal(resp.Value, value)
	}, waitTimeout, waitInterval)
}
```
These RPCs use context.Background() and the conns are configured with grpc.WaitForReady(true). If a node is down/unreachable, WaitForReady can cause the call to block indefinitely (no deadline), which can hang the test beyond require.Eventually's timeout. Prefer per-RPC contexts with timeouts/deadlines (or avoid WaitForReady in polling loops).
```go
conn, err := grpc.NewClient(ep.raftAddr,
	grpc.WithTransportCredentials(insecure.NewCredentials()),
	grpc.WithDefaultCallOptions(grpc.WaitForReady(true)),
)
require.NoError(t, err)
conns = append(conns, conn)
clients = append(clients, pb.NewRawKVClient(conn))
}
```
The client conns are created with grpc.WithDefaultCallOptions(grpc.WaitForReady(true)), but the test calls RPCs with contexts that have no deadlines. In failure scenarios this can block indefinitely and hang the test run; consider removing WaitForReady here or ensuring every RPC uses a bounded context (e.g., context.WithTimeout).
```go
followerConn, err := grpc.NewClient(nodes[1].grpcAddress,
	grpc.WithTransportCredentials(insecure.NewCredentials()),
	grpc.WithDefaultCallOptions(grpc.WaitForReady(true)),
)
require.NoError(t, err)
t.Cleanup(func() { _ = followerConn.Close() })
followerRaw := pb.NewRawKVClient(followerConn)

leaderConn, err := grpc.NewClient(nodes[0].grpcAddress,
	grpc.WithTransportCredentials(insecure.NewCredentials()),
	grpc.WithDefaultCallOptions(grpc.WaitForReady(true)),
)
require.NoError(t, err)
t.Cleanup(func() { _ = leaderConn.Close() })
leaderRaw := pb.NewRawKVClient(leaderConn)
```
These gRPC conns are configured with grpc.WaitForReady(true), but later RPCs use a context.Background()-derived context with no deadline. If a node becomes unavailable, calls can block indefinitely and hang the test; prefer per-RPC deadlines/timeouts (or avoid WaitForReady for these test clients).
main.go
```go
		return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersLocalAddrMismatch, "expected %q got %q", localAddr, s.Address))
	}
	return servers, nil
}
return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersMissingLocalNode, "raftId=%q", raftID))
```
errors.Wrapf already records stack/context (via cockroachdb/errors). Wrapping the result again with errors.WithStack(...) is redundant and slightly obscures the error chain; consider returning errors.Wrapf(...) directly here (and similarly for the MissingLocalNode case below).
Suggested change:

```diff
-		return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersLocalAddrMismatch, "expected %q got %q", localAddr, s.Address))
+		return nil, errors.Wrapf(ErrBootstrapMembersLocalAddrMismatch, "expected %q got %q", localAddr, s.Address)
 	}
 	return servers, nil
 }
-return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersMissingLocalNode, "raftId=%q", raftID))
+return nil, errors.Wrapf(ErrBootstrapMembersMissingLocalNode, "raftId=%q", raftID)
```
/gemini review |
Code Review
This pull request introduces a significant improvement for cluster deployment by adding a --raftBootstrapMembers flag, allowing for static cluster bootstrapping. This is well-documented in the new docker_multinode_manual_run.md guide. Test coverage is also enhanced with new end-to-end and unit tests for this feature, and existing tests have been refactored for clarity. The demo cluster join logic has also been made more robust. My review found a few places in new test files where the deprecated grpc.NewClient function is used; I've suggested replacing it with grpc.Dial for better future maintainability.
```go
	return nodes[0].raft.State() == raft.Leader
}, waitTimeout, waitInterval)

adminConn, err := grpc.NewClient(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))
```
grpc.NewClient is deprecated and will be removed in a future version of google.golang.org/grpc. Please use grpc.Dial instead.
Suggested change:

```diff
-adminConn, err := grpc.NewClient(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))
+adminConn, err := grpc.Dial(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))
```
```go
followerConn, err := grpc.NewClient(nodes[1].grpcAddress,
	grpc.WithTransportCredentials(insecure.NewCredentials()),
)
```
grpc.NewClient is deprecated and will be removed in a future version of google.golang.org/grpc. Please use grpc.Dial instead.
Suggested change:

```diff
-followerConn, err := grpc.NewClient(nodes[1].grpcAddress,
+followerConn, err := grpc.Dial(nodes[1].grpcAddress,
 	grpc.WithTransportCredentials(insecure.NewCredentials()),
 )
```
```go
leaderConn, err := grpc.NewClient(nodes[0].grpcAddress,
	grpc.WithTransportCredentials(insecure.NewCredentials()),
)
```
grpc.NewClient is deprecated and will be removed in a future version of google.golang.org/grpc. Please use grpc.Dial instead.
Suggested change:

```diff
-leaderConn, err := grpc.NewClient(nodes[0].grpcAddress,
+leaderConn, err := grpc.Dial(nodes[0].grpcAddress,
 	grpc.WithTransportCredentials(insecure.NewCredentials()),
 )
```
```go
conn, err := grpc.NewClient(ep.raftAddr,
	grpc.WithTransportCredentials(insecure.NewCredentials()),
)
```
grpc.NewClient is deprecated and will be removed in a future version of google.golang.org/grpc. Please use grpc.Dial instead.
Suggested change:

```diff
-conn, err := grpc.NewClient(ep.raftAddr,
+conn, err := grpc.Dial(ep.raftAddr,
 	grpc.WithTransportCredentials(insecure.NewCredentials()),
 )
```
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
```go
if err := waitForJoinRetry(ctx, joinRetryInterval); err != nil {
	return joinClusterWaitError(err)
```
joinNodeWithRetry always waits joinRetryInterval after a failed attempt, including after the final retry. This unnecessarily delays failure reporting by an extra interval. Consider only sleeping when i < joinRetries-1, or restructure the loop so the sleep happens before the next attempt.
Suggested change:

```diff
-if err := waitForJoinRetry(ctx, joinRetryInterval); err != nil {
-	return joinClusterWaitError(err)
+if i < joinRetries-1 {
+	if err := waitForJoinRetry(ctx, joinRetryInterval); err != nil {
+		return joinClusterWaitError(err)
+	}
```
```go
func startBootstrapE2ECluster(
	t *testing.T,
	baseDir string,
	nodeCount int,
	startupAttempts int,
) ([]bootstrapE2EEndpoint, []*bootstrapE2ENode) {
	t.Helper()

	var (
		lastErr error
		nodes   []*bootstrapE2ENode
	)

	for attempt := 0; attempt < startupAttempts; attempt++ {
		endpoints := allocateBootstrapE2EEndpoints(t, nodeCount)
		started, err := tryStartBootstrapE2ECluster(baseDir, endpoints)
		if err == nil {
			return endpoints, started
		}
		closeBootstrapE2ENodesIgnoreError(started)
		lastErr = err
		if !isAddressInUseError(err) {
			break
		}
		nodes = nil
	}

	require.NoError(t, lastErr)
	return nil, nodes
}
```
startBootstrapE2ECluster retries startup on "address already in use", but it reuses the same baseDir across attempts while node IDs stay the same (n1..nN). If a partial attempt writes Raft state before failing, the next attempt can pick up stale state (with different addresses) and behave unpredictably. Use a per-attempt subdirectory (or a fresh temp dir) when retrying, or clean up the prior attempt’s Raft data before retrying.
```go
title, ok := getOut.Item["title"].(*ddbTypes.AttributeValueMemberS)
require.True(t, ok)
require.Equal(t, "title1", title.Value)
// Shared setup for all sub-tests so each sub-test can run in isolation with -run.
```
The comment claims each sub-test can run in isolation with -run, but the sub-tests share state (tables and items) and later sub-tests depend on earlier ones mutating data. Either make each sub-test fully self-contained (unique table names + seed required items inside each sub-test) or remove the isolation claim and keep this as a single end-to-end scenario test.
Suggested change:

```diff
-// Shared setup for all sub-tests so each sub-test can run in isolation with -run.
+// Shared setup for all sub-tests.
```
```diff
 t.Run("TransactWriteItems", func(t *testing.T) {
 	_, txErr := client.TransactWriteItems(ctx, &dynamodb.TransactWriteItemsInput{
 		TransactItems: []ddbTypes.TransactWriteItem{
 			{
 				Put: &ddbTypes.Put{
 					TableName: aws.String(threadsTable),
 					Item: map[string]ddbTypes.AttributeValue{
 						"threadId":    &ddbTypes.AttributeValueMemberS{Value: "t4"},
 						"title":       &ddbTypes.AttributeValueMemberS{Value: "title4"},
 						"createdAt":   &ddbTypes.AttributeValueMemberS{Value: "2026-01-04T00:00:00Z"},
 						"status":      &ddbTypes.AttributeValueMemberS{Value: "pending"},
 						"accessToken": &ddbTypes.AttributeValueMemberS{Value: ""},
 					},
 				},
 			},
 		},
 	})
-	require.NoError(t, err)
-	queryPendingAfterTransact, err := client.Query(ctx, &dynamodb.QueryInput{
-		TableName:              aws.String(threadsTable),
-		IndexName:              aws.String("statusIndex"),
-		KeyConditionExpression: aws.String("#status = :status"),
-		ExpressionAttributeNames: map[string]string{
-			"#status": "status",
-		},
-		ExpressionAttributeValues: map[string]ddbTypes.AttributeValue{
-			":status": &ddbTypes.AttributeValueMemberS{Value: "pending"},
-		},
-		ScanIndexForward: aws.Bool(false),
-	})
+	require.NoError(t, txErr)
+
+	queryPendingAfterTransact := queryThreadsByStatus(t, "pending", false)
 	require.Len(t, queryPendingAfterTransact.Items, 2)
 })
```
TransactWriteItems asserts there are 2 pending items after inserting t4, but this relies on ThreadItemOperations having already inserted and updated t1-t3 in the same table. If this sub-test is run alone (e.g., -run TestDynamoDB_TableAPICompatibility/TransactWriteItems), it will likely fail. Seed the required items within this sub-test (or use separate tables per sub-test) so the expectation is local.
docs/docker_multinode_manual_run.md
```bash
GRPCURL_IMG="fullstorydev/grpcurl:v1.9.3"

# Wait for every node gRPC endpoint
for ip in 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14 10.0.0.15; do
  until docker run --rm --network host "${GRPCURL_IMG}" \
    -plaintext "${ip}:50051" list >/dev/null 2>&1; do
    sleep 1
  done
done

# Check Raft members
docker run --rm --network host "${GRPCURL_IMG}" \
  -plaintext -d '{}' 10.0.0.11:50051 RaftAdmin/GetConfiguration
```

For a 4-node cluster, remove `10.0.0.15` from the loop.

## 6) Validate Read/Write

Check leader:

```bash
docker run --rm --network host fullstorydev/grpcurl:v1.9.3 \
  -plaintext -d '{}' 10.0.0.11:50051 RaftAdmin/Leader
```
These commands pull and run the third-party image fullstorydev/grpcurl:v1.9.3 directly with docker run --network host using only a mutable tag, which creates a supply-chain risk. If the image registry or publisher is compromised and the v1.9.3 tag is repointed, an attacker-controlled image would run inside your cluster node with full gRPC network access to Elastickv and could exfiltrate or corrupt data. Consider pinning the image to a known-good immutable digest (or hosting/compiling your own grpcurl binary or internal image) and documenting how operators should verify the provenance of the tool before use.
/gemini review |
Code Review
This pull request introduces a significant improvement for cluster bootstrapping by adding the --raftBootstrapMembers flag, allowing for a declarative, fixed-member startup. This is a more robust alternative to dynamic joins for initial cluster formation. The implementation is well-tested with new unit and end-to-end tests. Additionally, the PR improves existing test coverage by refactoring a large test into more manageable sub-tests and enhances the demo server's join logic to be more reliable. The new deployment documentation is clear and helpful. Overall, this is a high-quality contribution that improves both functionality and maintainability.
```bash
# Pin to an immutable digest (replace with a verified digest from your supply-chain policy).
GRPCURL_IMG="fullstorydev/grpcurl@sha256:<verified_digest>"
```
While it's great to recommend pinning to a SHA256 digest for security, providing a placeholder <verified_digest> makes it harder for users to follow this guide. Consider providing a specific, known-good digest as an example, along with instructions on how a user can find and verify the latest digest for themselves. This would make the documentation more self-contained and user-friendly.
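For instance, the doc could show how to resolve a tag to its digest with standard Docker commands (a sketch; the placeholder must still be replaced with a digest the operator has verified out-of-band against the publisher's releases):

```shell
# Resolve the digest behind the mutable tag, then verify it out-of-band
# against the publisher's release notes or your supply-chain policy.
docker pull fullstorydev/grpcurl:v1.9.3
docker image inspect --format '{{index .RepoDigests 0}}' fullstorydev/grpcurl:v1.9.3

# Use the immutable digest reference from then on:
GRPCURL_IMG="fullstorydev/grpcurl@sha256:<digest_printed_above>"
docker run --rm --network host "${GRPCURL_IMG}" -plaintext 10.0.0.11:50051 list
```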
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
```text
|| Node ID | VM | IP |
|| --- | --- | --- |
|| n1 | vm1 | 10.0.0.11 |
|| n2 | vm2 | 10.0.0.12 |
```
The tables use a || prefix on each row (e.g., || Node ID | VM | IP |), which renders as an extra empty column in most Markdown renderers. Consider switching to standard table syntax with single | delimiters so the topology/fault-tolerance tables render correctly on GitHub.
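For illustration, standard GitHub-flavored table syntax for these rows would be:

```markdown
| Node ID | VM | IP |
| --- | --- | --- |
| n1 | vm1 | 10.0.0.11 |
| n2 | vm2 | 10.0.0.12 |
```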
```go
await, err := client.Await(ctx, future)
if err != nil {
	return errors.WithStack(err)
}
if await.GetError() != "" {
```
tryJoinNode uses the caller-provided ctx for both AddVoter and the blocking Await call. In the demo path this ctx is typically the errgroup context with no deadline, so Await can block indefinitely and prevent the retry loop from making progress. Use per-attempt context.WithTimeout (like other join-path tests do) or otherwise ensure ctx always has a deadline before calling Await.
```go
var lc net.ListenConfig
l, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:0")
require.NoError(t, err)
addr := l.Addr().String()
require.NoError(t, l.Close())
return addr
```
reserveTCPAddr closes the listener and returns the address, which does not actually reserve the port; another process/test can bind it before the node starts. This makes the E2E test inherently racy and reliant on retries. Prefer keeping listeners open until server startup (or using :0 binds at startup and plumbing the actual bound address through) to eliminate port-reuse flakes.