
Add deployment docs and improve test coverage#326

Open
bootjp wants to merge 6 commits into main from feature/prd-workload

Conversation


@bootjp bootjp commented Feb 27, 2026

No description provided.

Copilot AI review requested due to automatic review settings February 27, 2026 03:46
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's operational readiness and robustness. It introduces detailed documentation for deploying a multi-node Elastickv cluster using Docker, providing clear instructions for manual setup on multiple virtual machines. Concurrently, it improves the reliability of the system through expanded test coverage, including a new test for Raft's dynamic cluster membership changes and a refactoring of DynamoDB compatibility tests. Additionally, the PR refines the Raft bootstrap mechanism by allowing explicit definition of initial cluster members and addresses a minor bug in DynamoDB JSON attribute unmarshalling.

Highlights

  • Deployment Documentation: A new comprehensive guide has been added for manually deploying a multi-node Elastickv cluster using Docker across multiple VMs, providing clear instructions for setup.
  • Raft Bootstrap Enhancement: A new --raftBootstrapMembers flag was introduced, allowing users to specify a fixed list of Raft cluster members during bootstrap, which improves control over initial cluster formation.
  • Improved Test Coverage: Test coverage has been expanded with a new integration test for the Raft AddVoter functionality and existing DynamoDB API compatibility tests were refactored into more granular sub-tests for better organization.
  • DynamoDB JSON Unmarshalling Fix: A bug in the attributeValue JSON unmarshalling logic within the DynamoDB adapter has been corrected to properly handle single-key maps.
  • Code Refactoring: Server startup and Raft cluster joining logic have been refactored for improved modularity, readability, and error handling.
Changelog
  • README.md
    • Updated to include a link to the new deployment documentation.
  • adapter/add_voter_join_test.go
    • Added a new test case to verify that a new voter can successfully join the Raft cluster and serve traffic.
  • adapter/dynamodb.go
    • Added comments to clarify the purpose of StartTS in buildTransactWriteItemsRequest for transaction alignment.
  • adapter/dynamodb_table_compat_test.go
    • Refactored the DynamoDB table API compatibility tests into distinct sub-tests and introduced helper functions for better organization and readability.
  • adapter/dynamodb_types.go
    • Corrected the JSON unmarshalling logic for attributeValue to properly handle single-key maps.
  • cmd/server/demo.go
    • Refactored the joinCluster function by extracting retry logic and AddVoter await into dedicated helper functions (joinNodeWithRetry, tryJoinNode).
  • docs/docker_multinode_manual_run.md
    • Added a new document detailing the manual setup of a 4-5 node Elastickv cluster on multiple VMs using docker run.
  • main.go
    • Introduced a new command-line flag --raftBootstrapMembers.
    • Updated buildShardGroups to accept bootstrap servers.
    • Refactored the main server startup logic into a new startRuntimeServers function.
  • main_bootstrap_test.go
    • Added new unit tests for the resolveBootstrapServers function, covering various scenarios for bootstrap member resolution.
  • multiraft_runtime.go
    • Modified newRaftGroup to accept an optional list of bootstrapServers for initial Raft cluster configuration.
  • multiraft_runtime_test.go
    • Updated the TestNewRaftGroupBootstrap function to reflect changes in newRaftGroup signature.
  • shard_config.go
    • Defined a new error type ErrInvalidRaftBootstrapMembersEntry.
    • Implemented the parseRaftBootstrapMembers function to parse the new bootstrap members flag.
  • shard_config_test.go
    • Added new unit tests for the parseRaftBootstrapMembers function, including tests for valid parsing, whitespace trimming, and error conditions.
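The changelog above describes a parseRaftBootstrapMembers function that parses the new flag value with whitespace trimming and error reporting. A minimal sketch of such a parser, assuming a hypothetical comma-separated `id=host:port` entry format — the PR's actual syntax, names, and error type may differ:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// BootstrapMember pairs a Raft node ID with its advertised address.
type BootstrapMember struct {
	ID      string
	Address string
}

var errInvalidEntry = errors.New("invalid raft bootstrap members entry")

// parseBootstrapMembers parses a flag value such as
// "n1=10.0.0.11:50051,n2=10.0.0.12:50051", trimming whitespace around
// each entry and rejecting malformed ones.
func parseBootstrapMembers(raw string) ([]BootstrapMember, error) {
	if strings.TrimSpace(raw) == "" {
		return nil, nil // empty flag: fall back to default bootstrap behavior
	}
	var members []BootstrapMember
	for _, entry := range strings.Split(raw, ",") {
		entry = strings.TrimSpace(entry)
		id, addr, ok := strings.Cut(entry, "=")
		if !ok || id == "" || addr == "" {
			return nil, fmt.Errorf("%w: %q", errInvalidEntry, entry)
		}
		members = append(members, BootstrapMember{ID: id, Address: addr})
	}
	return members, nil
}
```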


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable feature for bootstrapping a multi-node cluster from a static configuration, complete with documentation and tests. It also improves test coverage by adding a new integration test for AddVoter and refactoring a large test for better readability. The changes are generally solid. I have a couple of suggestions to improve test robustness and address the use of a deprecated function.

return nodes[0].raft.State() == raft.Leader
}, waitTimeout, waitInterval)

adminConn, err := grpc.NewClient(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))

medium

grpc.NewClient is deprecated and should be avoided in new code. Please use grpc.Dial instead. This also applies to the other grpc.NewClient calls in this test on lines 158 and 166.

Suggested change
adminConn, err := grpc.NewClient(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))
adminConn, err := grpc.Dial(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))

})
require.NoError(t, err)
require.Empty(t, getOut.Item)
t.Run("ThreadItemOperations", func(t *testing.T) {

medium

This sub-test, and others that follow, are not independent and rely on state created by the TableLifecycle sub-test. This means running a sub-test in isolation (e.g., go test -run TestDynamoDB_TableAPICompatibility/ThreadItemOperations) will fail.

To make the tests more robust and easier to debug, consider making each sub-test independent by performing its own setup. For example, you could call createThreadsTable(t) and createMessagesTable(t) at the beginning of each sub-test that requires them.
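The independence idea above — each sub-test performing its own setup — works cleanly when the setup helpers are idempotent, so whichever sub-test runs first can create what it needs. A toy sketch of that pattern; the registry type and method names are hypothetical stand-ins, not the PR's createThreadsTable/createMessagesTable helpers:

```go
package main

// tableRegistry stands in for the test's DynamoDB endpoint. Because
// ensureTable is idempotent, every sub-test can declare the tables it
// needs up front, and running a single sub-test in isolation still works.
type tableRegistry struct {
	tables map[string]bool
}

func newTableRegistry() *tableRegistry {
	return &tableRegistry{tables: map[string]bool{}}
}

// ensureTable creates the table if it is missing; it reports whether a
// new table was created so callers can decide whether to seed fixtures.
func (r *tableRegistry) ensureTable(name string) bool {
	if r.tables[name] {
		return false // already present: safe to call from any sub-test
	}
	r.tables[name] = true
	return true
}
```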

Copilot AI left a comment

Pull request overview

This PR adds support for bootstrapping a Raft cluster with an explicit member list, expands/organizes test coverage around bootstrapping and join behavior, and adds a multi-node Docker deployment runbook.

Changes:

  • Add --raftBootstrapMembers support (parsing + validation) and plumb explicit bootstrap server configuration into Raft startup.
  • Refactor startup code to use a startRuntimeServers helper and improve demo cluster join logic to await AddVoter.
  • Add/reshape tests for bootstrap parsing/validation and Raft join path; add deployment documentation.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
shard_config.go Adds parseRaftBootstrapMembers + new error for invalid entries.
shard_config_test.go Adds unit tests for parsing bootstrap member strings.
main.go Adds --raftBootstrapMembers, validates/derives bootstrap server list, plumbs into runtime startup; introduces startRuntimeServers.
main_bootstrap_test.go Adds tests for bootstrap server resolution/validation behavior.
multiraft_runtime.go Extends newRaftGroup to accept explicit bootstrap server config.
multiraft_runtime_test.go Updates newRaftGroup call site for new signature.
cmd/server/demo.go Refactors join logic into helpers and awaits AddVoter completion.
adapter/add_voter_join_test.go Adds an integration-style test covering AddVoter join + serving adapter traffic.
adapter/dynamodb_types.go Refactors AttributeValue JSON unmarshal path.
adapter/dynamodb_table_compat_test.go Refactors large compatibility test into subtests + helper closures.
adapter/dynamodb.go Adds clarification comment on transaction StartTS alignment with condition evaluation snapshot.
docs/docker_multinode_manual_run.md Adds multi-VM manual docker run runbook for 4–5 node clusters.
README.md Links the new deployment/runbook document.


Comment on lines +23 to +27
func TestAddVoterJoinPath_RegistersMemberAndServesAdapterTraffic(t *testing.T) {
const (
waitTimeout = 8 * time.Second
waitInterval = 100 * time.Millisecond
)
Copilot AI Feb 27, 2026

Adapter unit tests in this repo generally call t.Parallel() at the start of each top-level test; this new test doesn’t, which can unnecessarily serialize the adapter test suite. Consider adding t.Parallel() here (ports are already dynamically assigned).

Comment on lines 112 to 115
t.Run("TableLifecycle", func(t *testing.T) {
createThreadsTable(t)
createMessagesTable(t)

Copilot AI Feb 27, 2026

These subtests share mutable state but only create the tables inside the TableLifecycle subtest. Running an individual subtest via go test -run TestDynamoDB_TableAPICompatibility/ThreadItemOperations will fail because the tables were never created. Move table creation to a shared setup before the t.Run calls, or have each subtest create (and optionally delete) the tables it depends on.

Suggested change
t.Run("TableLifecycle", func(t *testing.T) {
createThreadsTable(t)
createMessagesTable(t)
// Ensure required tables exist before running any subtests so that
// individual subtests can be run in isolation (e.g. with -run).
createThreadsTable(t)
createMessagesTable(t)
t.Run("TableLifecycle", func(t *testing.T) {

Comment on lines +150 to +164
var (
	kind     string
	valueRaw json.RawMessage
	found    bool
)
for k, v := range raw {
	kind = k
	valueRaw = v
	found = true
	break
}
if !found {
	return errors.New("invalid attribute value")
}
return a.unmarshalSingleAttributeValue(kind, valueRaw)
Copilot AI Feb 27, 2026

Given len(raw) != 1 is already rejected above, the found flag and the if !found branch are redundant (the loop will always execute exactly once). This can be simplified by directly extracting the single map entry (or returning from the range loop as before).

Comment on lines +1 to +4
# Manually deploying a 4–5 node Elastickv cluster across multiple VMs with Docker (without docker compose)

This document describes how to build an Elastickv Raft cluster across **multiple VMs** using only `docker run`.
`docker compose` is not used. **Running all nodes on a single VM is out of scope.**
Copilot AI Feb 27, 2026

This deployment/runbook doc is entirely in Japanese, while the existing docs/README are written in English. For consistency and broader accessibility, consider adding an English version (or at least an English summary at the top) so non-Japanese readers can follow the deployment steps.

Comment on lines +205 to +212
localAddr := groups[0].address
for _, s := range servers {
if string(s.ID) != raftID {
continue
}
if string(s.Address) != localAddr {
return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersLocalAddrMismatch, "expected %q got %q", localAddr, s.Address))
}
Copilot AI Feb 27, 2026

resolveBootstrapServers compares bootstrap member address against groups[0].address, but the error text says it must match --address. This can mislead when --raftGroups is provided (group address may differ from --address). Either validate against the actual --address value (pass it in) or adjust the error/message to reference the group address/--raftGroups instead of --address.
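One way to address the mismatch flagged above is to pass the address being validated in explicitly, so the error message names exactly what was compared. This is a hedged sketch — the `server` struct stands in for raft.Server, and the function name and signature are hypothetical, not the PR's resolveBootstrapServers:

```go
package main

import "fmt"

// server is a minimal stand-in for raft.Server.
type server struct {
	ID      string
	Address string
}

// validateLocalMember checks that the bootstrap member whose ID matches
// the local raftID advertises exactly localAddr. Taking localAddr as a
// parameter (rather than reading it from group state inside) keeps the
// error text honest about which flag or config value was compared.
func validateLocalMember(servers []server, raftID, localAddr string) error {
	for _, s := range servers {
		if s.ID != raftID {
			continue
		}
		if s.Address != localAddr {
			return fmt.Errorf("bootstrap member %q address mismatch: expected %q, got %q",
				raftID, localAddr, s.Address)
		}
		return nil
	}
	return fmt.Errorf("bootstrap members do not include local node %q", raftID)
}
```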

Comment on lines +36 to +40
bound, ls, retry, err := bindListeners(ctx, &lc, ports[i])
require.NoError(t, err)
require.False(t, retry)
ports[i] = bound
listeners = append(listeners, ls)
Copilot AI Mar 1, 2026

This test assumes bindListeners will never return retry=true (EADDRINUSE), and fails immediately if it does. Since ports are assigned globally, collisions can happen in parallel test runs, making this test flaky. Use the existing retry loop pattern from setupNodes (or call setupNodes/createNode) to reassign ports until listeners bind successfully.

Suggested change
bound, ls, retry, err := bindListeners(ctx, &lc, ports[i])
require.NoError(t, err)
require.False(t, retry)
ports[i] = bound
listeners = append(listeners, ls)
for {
bound, ls, retry, err := bindListeners(ctx, &lc, ports[i])
require.NoError(t, err)
if !retry {
ports[i] = bound
listeners = append(listeners, ls)
break
}
// Port was in use; get a new suggestion and retry binding.
newPorts := assignPorts(1)
ports[i] = newPorts[0]
}

Comment on lines +80 to +90
docker run -d \
--name elastickv \
--restart unless-stopped \
--network host \
-v /var/lib/elastickv:/var/lib/elastickv \
ghcr.io/bootjp/elastickv:latest /app \
--address "10.0.0.11:50051" \
--redisAddress "0.0.0.0:6379" \
--dynamoAddress "0.0.0.0:8000" \
--raftId "n1" \
--raftDataDir "/var/lib/elastickv" \
Copilot AI Feb 27, 2026

The docker run examples here start the Elastickv container with --network host and bind --redisAddress/--dynamoAddress to 0.0.0.0, while also exposing the gRPC --address on a non-loopback IP. This combination publishes unauthenticated Redis, DynamoDB-compatible, and Raft admin/data-plane gRPC endpoints to the entire host network, so any machine with network access can read/modify data or reconfigure the cluster using redis-cli/grpcurl without credentials. To avoid unauthorized access, restrict these bindings to trusted interfaces and/or enforce network-level ACLs and application-layer authentication/TLS for Redis, DynamoDB, and Raft gRPC before recommending this setup for multi-VM deployments.

@bootjp bootjp requested a review from Copilot March 1, 2026 16:20

bootjp commented Mar 1, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant feature for cluster bootstrapping using a fixed list of members via the --raftBootstrapMembers flag, which is very useful for environments without dynamic service discovery. The implementation is well-supported with new documentation, unit tests, and an end-to-end test.

Additionally, the PR improves test coverage with a new integration test for the AddVoter join path and refactors an existing large test for better maintainability. The demo server's cluster joining logic has also been made more robust.

My review found one medium-severity issue in the new add_voter_join_test.go test regarding error handling in goroutines, which could lead to flaky test results. Overall, this is a high-quality contribution that enhances the project's deployment flexibility and test robustness.

raftadmin.Register(s, r)

go func(srv *grpc.Server, l net.Listener) {
assert.NoError(t, srv.Serve(l))

medium

The assert.NoError call inside a goroutine can lead to race conditions in test reporting and will fail on expected shutdown errors. When the server is shut down during test cleanup, srv.Serve(l) will return an error (e.g., grpc.ErrServerStopped), causing assert.NoError to fail. This will mark the test as failed, but it won't stop the test's execution.

A better approach is to check for expected shutdown errors and only assert on unexpected ones. This applies to the other two goroutines starting the Redis and DynamoDB servers as well (lines 177 and 182).

if err := srv.Serve(l); err != nil && err != grpc.ErrServerStopped {
	assert.NoError(t, err)
}

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.



Comment on lines +129 to +134
var lc net.ListenConfig
l, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:0")
require.NoError(t, err)
addr := l.Addr().String()
require.NoError(t, l.Close())
return addr
Copilot AI Mar 1, 2026

reserveTCPAddr closes the listener and returns the address, which is inherently racy (another process/test can grab the port before the node binds). This can make the E2E test flaky; consider allocating listeners and keeping them open until startRuntimeServers begins serving, or add a retry loop around server startup to handle transient "address already in use" failures.

Suggested change
var lc net.ListenConfig
l, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:0")
require.NoError(t, err)
addr := l.Addr().String()
require.NoError(t, l.Close())
return addr
// Return a wildcard localhost address and let the actual server bind
// to an ephemeral port, avoiding the racy "reserve then close" pattern.
return "127.0.0.1:0"

Comment on lines +111 to +123
_, err := clients[writerIdx].RawPut(context.Background(), &pb.RawPutRequest{Key: key, Value: value})
require.NoError(t, err)

for i := range clients {
client := clients[i]
require.Eventually(t, func() bool {
resp, getErr := client.RawGet(context.Background(), &pb.RawGetRequest{Key: key})
if getErr != nil {
return false
}
return resp.Exists && bytes.Equal(resp.Value, value)
}, waitTimeout, waitInterval)
}
Copilot AI Mar 1, 2026

These RPCs use context.Background() and the conns are configured with grpc.WaitForReady(true). If a node is down/unreachable, WaitForReady can cause the call to block indefinitely (no deadline), which can hang the test beyond require.Eventually's timeout. Prefer per-RPC contexts with timeouts/deadlines (or avoid WaitForReady in polling loops).

Comment on lines +294 to +301
conn, err := grpc.NewClient(ep.raftAddr,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithDefaultCallOptions(grpc.WaitForReady(true)),
)
require.NoError(t, err)
conns = append(conns, conn)
clients = append(clients, pb.NewRawKVClient(conn))
}
Copilot AI Mar 1, 2026

The client conns are created with grpc.WithDefaultCallOptions(grpc.WaitForReady(true)), but the test calls RPCs with contexts that have no deadlines. In failure scenarios this can block indefinitely and hang the test run; consider removing WaitForReady here or ensuring every RPC uses a bounded context (e.g., context.WithTimeout).

Comment on lines +52 to +66
followerConn, err := grpc.NewClient(nodes[1].grpcAddress,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithDefaultCallOptions(grpc.WaitForReady(true)),
)
require.NoError(t, err)
t.Cleanup(func() { _ = followerConn.Close() })
followerRaw := pb.NewRawKVClient(followerConn)

leaderConn, err := grpc.NewClient(nodes[0].grpcAddress,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithDefaultCallOptions(grpc.WaitForReady(true)),
)
require.NoError(t, err)
t.Cleanup(func() { _ = leaderConn.Close() })
leaderRaw := pb.NewRawKVClient(leaderConn)
Copilot AI Mar 1, 2026

These gRPC conns are configured with grpc.WaitForReady(true), but later RPCs use a context.Background()-derived context with no deadline. If a node becomes unavailable, calls can block indefinitely and hang the test; prefer per-RPC deadlines/timeouts (or avoid WaitForReady for these test clients).

main.go Outdated
Comment on lines +211 to +215
return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersLocalAddrMismatch, "expected %q got %q", localAddr, s.Address))
}
return servers, nil
}
return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersMissingLocalNode, "raftId=%q", raftID))
Copilot AI Mar 1, 2026

errors.Wrapf already records stack/context (via cockroachdb/errors). Wrapping the result again with errors.WithStack(...) is redundant and slightly obscures the error chain; consider returning errors.Wrapf(...) directly here (and similarly for the MissingLocalNode case below).

Suggested change
return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersLocalAddrMismatch, "expected %q got %q", localAddr, s.Address))
}
return servers, nil
}
return nil, errors.WithStack(errors.Wrapf(ErrBootstrapMembersMissingLocalNode, "raftId=%q", raftID))
return nil, errors.Wrapf(ErrBootstrapMembersLocalAddrMismatch, "expected %q got %q", localAddr, s.Address)
}
return servers, nil
}
return nil, errors.Wrapf(ErrBootstrapMembersMissingLocalNode, "raftId=%q", raftID)

@bootjp bootjp requested a review from Copilot March 2, 2026 20:04

bootjp commented Mar 2, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant improvement for cluster deployment by adding a --raftBootstrapMembers flag, allowing for static cluster bootstrapping. This is well documented in the new docker_multinode_manual_run.md guide. Test coverage is also enhanced with new end-to-end and unit tests for this feature, and existing tests have been refactored for better clarity. The demo cluster join logic has also been made more robust. My review found a few places in new test files where the deprecated grpc.NewClient function is used; I've suggested replacing it with grpc.Dial for better future maintainability.

return nodes[0].raft.State() == raft.Leader
}, waitTimeout, waitInterval)

adminConn, err := grpc.NewClient(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))

medium

grpc.NewClient is deprecated and will be removed in a future version of google.golang.org/grpc. Please use grpc.Dial instead.

Suggested change
adminConn, err := grpc.NewClient(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))
adminConn, err := grpc.Dial(nodes[0].grpcAddress, grpc.WithTransportCredentials(insecure.NewCredentials()))

Comment on lines +57 to +59
followerConn, err := grpc.NewClient(nodes[1].grpcAddress,
grpc.WithTransportCredentials(insecure.NewCredentials()),
)

medium

grpc.NewClient is deprecated and will be removed in a future version of google.golang.org/grpc. Please use grpc.Dial instead.

Suggested change
followerConn, err := grpc.NewClient(nodes[1].grpcAddress,
grpc.WithTransportCredentials(insecure.NewCredentials()),
)
followerConn, err := grpc.Dial(nodes[1].grpcAddress,
grpc.WithTransportCredentials(insecure.NewCredentials()),
)

Comment on lines +64 to +66
leaderConn, err := grpc.NewClient(nodes[0].grpcAddress,
grpc.WithTransportCredentials(insecure.NewCredentials()),
)

medium

grpc.NewClient is deprecated and will be removed in a future version of google.golang.org/grpc. Please use grpc.Dial instead.

Suggested change
leaderConn, err := grpc.NewClient(nodes[0].grpcAddress,
grpc.WithTransportCredentials(insecure.NewCredentials()),
)
leaderConn, err := grpc.Dial(nodes[0].grpcAddress,
grpc.WithTransportCredentials(insecure.NewCredentials()),
)

Comment on lines +368 to +370
conn, err := grpc.NewClient(ep.raftAddr,
grpc.WithTransportCredentials(insecure.NewCredentials()),
)

medium

grpc.NewClient is deprecated and will be removed in a future version of google.golang.org/grpc. Please use grpc.Dial instead.

Suggested change
conn, err := grpc.NewClient(ep.raftAddr,
grpc.WithTransportCredentials(insecure.NewCredentials()),
)
conn, err := grpc.Dial(ep.raftAddr,
grpc.WithTransportCredentials(insecure.NewCredentials()),
)

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.



Comment on lines +208 to +209
if err := waitForJoinRetry(ctx, joinRetryInterval); err != nil {
return joinClusterWaitError(err)
Copilot AI Mar 2, 2026

joinNodeWithRetry always waits joinRetryInterval after a failed attempt, including after the final retry. This unnecessarily delays failure reporting by an extra interval. Consider only sleeping when i < joinRetries-1, or restructure the loop so the sleep happens before the next attempt.

Suggested change
if err := waitForJoinRetry(ctx, joinRetryInterval); err != nil {
return joinClusterWaitError(err)
if i < joinRetries-1 {
if err := waitForJoinRetry(ctx, joinRetryInterval); err != nil {
return joinClusterWaitError(err)
}

Comment on lines +114 to +143
func startBootstrapE2ECluster(
t *testing.T,
baseDir string,
nodeCount int,
startupAttempts int,
) ([]bootstrapE2EEndpoint, []*bootstrapE2ENode) {
t.Helper()

var (
lastErr error
nodes []*bootstrapE2ENode
)

for attempt := 0; attempt < startupAttempts; attempt++ {
endpoints := allocateBootstrapE2EEndpoints(t, nodeCount)
started, err := tryStartBootstrapE2ECluster(baseDir, endpoints)
if err == nil {
return endpoints, started
}
closeBootstrapE2ENodesIgnoreError(started)
lastErr = err
if !isAddressInUseError(err) {
break
}
nodes = nil
}

require.NoError(t, lastErr)
return nil, nodes
}
Copilot AI Mar 2, 2026

startBootstrapE2ECluster retries startup on "address already in use", but it reuses the same baseDir across attempts while node IDs stay the same (n1..nN). If a partial attempt writes Raft state before failing, the next attempt can pick up stale state (with different addresses) and behave unpredictably. Use a per-attempt subdirectory (or a fresh temp dir) when retrying, or clean up the prior attempt’s Raft data before retrying.

title, ok := getOut.Item["title"].(*ddbTypes.AttributeValueMemberS)
require.True(t, ok)
require.Equal(t, "title1", title.Value)
// Shared setup for all sub-tests so each sub-test can run in isolation with -run.
Copilot AI Mar 2, 2026

The comment claims each sub-test can run in isolation with -run, but the sub-tests share state (tables and items) and later sub-tests depend on earlier ones mutating data. Either make each sub-test fully self-contained (unique table names + seed required items inside each sub-test) or remove the isolation claim and keep this as a single end-to-end scenario test.

Suggested change
// Shared setup for all sub-tests so each sub-test can run in isolation with -run.
// Shared setup for all sub-tests.

Comment on lines +274 to 295
t.Run("TransactWriteItems", func(t *testing.T) {
	_, txErr := client.TransactWriteItems(ctx, &dynamodb.TransactWriteItemsInput{
		TransactItems: []ddbTypes.TransactWriteItem{
			{
				Put: &ddbTypes.Put{
					TableName: aws.String(threadsTable),
					Item: map[string]ddbTypes.AttributeValue{
						"threadId":    &ddbTypes.AttributeValueMemberS{Value: "t4"},
						"title":       &ddbTypes.AttributeValueMemberS{Value: "title4"},
						"createdAt":   &ddbTypes.AttributeValueMemberS{Value: "2026-01-04T00:00:00Z"},
						"status":      &ddbTypes.AttributeValueMemberS{Value: "pending"},
						"accessToken": &ddbTypes.AttributeValueMemberS{Value: ""},
					},
				},
			},
		},
	})
	require.NoError(t, txErr)

	queryPendingAfterTransact := queryThreadsByStatus(t, "pending", false)
	require.Len(t, queryPendingAfterTransact.Items, 2)
})

Copilot AI Mar 2, 2026

TransactWriteItems asserts there are 2 pending items after inserting t4, but this relies on ThreadItemOperations having already inserted and updated t1-t3 in the same table. If this sub-test is run alone (e.g., -run TestDynamoDB_TableAPICompatibility/TransactWriteItems), it will likely fail. Seed the required items within this sub-test (or use separate tables per sub-test) so the expectation is local.

Comment on lines +165 to +188
```bash
GRPCURL_IMG="fullstorydev/grpcurl:v1.9.3"

# Wait for every node gRPC endpoint
for ip in 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14 10.0.0.15; do
until docker run --rm --network host "${GRPCURL_IMG}" \
-plaintext "${ip}:50051" list >/dev/null 2>&1; do
sleep 1
done
done

# Check Raft members
docker run --rm --network host "${GRPCURL_IMG}" \
-plaintext -d '{}' 10.0.0.11:50051 RaftAdmin/GetConfiguration
```

For a 4-node cluster, remove `10.0.0.15` from the loop.

## 6) Validate Read/Write

Check leader:

```bash
docker run --rm --network host fullstorydev/grpcurl:v1.9.3 \
-plaintext -d '{}' 10.0.0.11:50051 RaftAdmin/Leader
```

Copilot AI Mar 2, 2026

These commands pull and run the third-party image fullstorydev/grpcurl:v1.9.3 directly with docker run --network host using only a mutable tag, which creates a supply-chain risk. If the image registry or publisher is compromised and the v1.9.3 tag is repointed, an attacker-controlled image would run inside your cluster node with full gRPC network access to Elastickv and could exfiltrate or corrupt data. Consider pinning the image to a known-good immutable digest (or hosting/compiling your own grpcurl binary or internal image) and documenting how operators should verify the provenance of the tool before use.

@bootjp

bootjp commented Mar 2, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant improvement for cluster bootstrapping by adding the --raftBootstrapMembers flag, allowing for a declarative, fixed-member startup. This is a more robust alternative to dynamic joins for initial cluster formation. The implementation is well-tested with new unit and end-to-end tests. Additionally, the PR improves existing test coverage by refactoring a large test into more manageable sub-tests and enhances the demo server's join logic to be more reliable. The new deployment documentation is clear and helpful. Overall, this is a high-quality contribution that improves both functionality and maintainability.


```bash
# Pin to an immutable digest (replace with a verified digest from your supply-chain policy).
GRPCURL_IMG="fullstorydev/grpcurl@sha256:<verified_digest>"
```

medium

While it's great to recommend pinning to a SHA256 digest for security, providing a placeholder <verified_digest> makes it harder for users to follow this guide. Consider providing a specific, known-good digest as an example, along with instructions on how a user can find and verify the latest digest for themselves. This would make the documentation more self-contained and user-friendly.
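One way to make that guidance actionable in the doc. These commands assume a local Docker daemon, and the digest placeholder is illustrative rather than a verified value:

```bash
# Resolve the tag to the digest it currently points at.
docker pull fullstorydev/grpcurl:v1.9.3
docker inspect --format '{{index .RepoDigests 0}}' fullstorydev/grpcurl:v1.9.3
# prints something like fullstorydev/grpcurl@sha256:<64-hex-digest>

# After verifying the digest against your supply-chain policy, pin it:
GRPCURL_IMG="fullstorydev/grpcurl@sha256:<64-hex-digest>"
```

Pulling by digest guarantees the same image bytes on every node, even if the `v1.9.3` tag is later repointed.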


Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.



Comment on lines +24 to +27
| Node ID | VM | IP |
| --- | --- | --- |
| n1 | vm1 | 10.0.0.11 |
| n2 | vm2 | 10.0.0.12 |

Copilot AI Mar 2, 2026

The tables use a || prefix on each row (e.g., || Node ID | VM | IP |), which renders as an extra empty column in most Markdown renderers. Consider switching to standard table syntax with single | delimiters so the topology/fault-tolerance tables render correctly on GitHub.

Comment on lines +229 to +233
```go
	await, err := client.Await(ctx, future)
	if err != nil {
		return errors.WithStack(err)
	}
	if await.GetError() != "" {
```

Copilot AI Mar 2, 2026

tryJoinNode uses the caller-provided ctx for both AddVoter and the blocking Await call. In the demo path this ctx is typically the errgroup context with no deadline, so Await can block indefinitely and prevent the retry loop from making progress. Use per-attempt context.WithTimeout (like other join-path tests do) or otherwise ensure ctx always has a deadline before calling Await.

Comment on lines +201 to +206
```go
	var lc net.ListenConfig
	l, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:0")
	require.NoError(t, err)
	addr := l.Addr().String()
	require.NoError(t, l.Close())
	return addr
```

Copilot AI Mar 2, 2026

reserveTCPAddr closes the listener and returns the address, which does not actually reserve the port; another process/test can bind it before the node starts. This makes the E2E test inherently racy and reliant on retries. Prefer keeping listeners open until server startup (or using :0 binds at startup and plumbing the actual bound address through) to eliminate port-reuse flakes.
