Skip to content

fix(windows): CI matrix, zip archives, and NTFS ACL for sensitive files#129

Merged
kvaps merged 35 commits intomainfrom
fix/windows-support
Apr 28, 2026
Merged

fix(windows): CI matrix, zip archives, and NTFS ACL for sensitive files#129
kvaps merged 35 commits intomainfrom
fix/windows-support

Conversation

@lexfrei
Copy link
Copy Markdown
Contributor

@lexfrei lexfrei commented Apr 17, 2026

Summary

Close the Windows support loop. The original "template path with backslash" symptom was already fixed upstream of this branch (engine-side NormalizeTemplatePath in v0.22.0), but nothing proved it stayed fixed — CI only ran on Ubuntu, so the Windows-tagged tests never executed, and the release archive was a .tar.gz that stock Windows cannot open. This PR makes Windows a first-class target and closes the security gap that becomes visible once Windows users can actually run talm.

Closes #11

What changes

1. Windows CI runner

.github/workflows/pr.yml runs go test ./... on both ubuntu-latest and windows-latest with fail-fast: false. The existing engine_windows_test.go (build-tagged //go:build windows) now actually executes. Any future regression in Windows-specific path handling fails CI.

2. Windows zip archives

.goreleaser.yaml uses format_overrides so Windows binaries ship as .zip. Linux and macOS remain .tar.gz.

3. Regression tests for backslash paths

Users running talm from PowerShell pass paths with \ separators, which must be normalized before reaching the helm engine's forward-slash-only map keys. Both code paths that touch template arguments now have Windows-tagged tests:

  • apply_windows_test.go::TestResolveTemplatePaths_BackslashInput covers talm apply via resolveTemplatePaths.
  • template_windows_test.go::TestResolveEngineTemplatePaths_BackslashInput covers talm template via the newly-extracted resolveEngineTemplatePaths.

The two resolvers are intentionally different (apply resolves against rootDir, template against CWD with a templates/<basename> fallback), so they need separate coverage.

4. New pkg/secureperm package for sensitive files

A drop-in replacement for os.WriteFile(path, data, 0o600) and os.Chmod(path, 0o600). On Unix it delegates to those calls with an explicit follow-up Chmod (required because os.WriteFile preserves prior mode bits on overwrite). On Windows it installs a protected NTFS DACL with a single Allow ACE for the current user SID.

WriteFile is also atomic on both platforms via write-to-tmp + rename — a failure mid-write leaves the original file intact. Secrets files are not reconstructible (corrupting secrets.yaml forces a cluster PKI reissue), so non-atomic writes are a real hazard.

Call sites migrated:

  • pkg/age/age.gotalm.key creation, decrypted secrets.yaml, generic decrypted YAML.
  • pkg/commands/init.go — talosconfig, secrets.yaml bundle (via the new writeSecureToDestination helper).
  • pkg/commands/rotate_ca_handler.go — rotated secrets.yaml.
  • pkg/commands/talosconfig.go — regenerated talosconfig.
  • pkg/commands/kubeconfig_handler.go — kubeconfig tighten-after-fetch.
  • pkg/commands/template.go — rendered machine config via talm template --inplace (embeds certs + PKI keys + cluster join tokens).

Tests cover the atomic-write contract on both OSes (TestWriteFile_PreservesOriginalOnFailure_{Unix,Windows}), the DACL shape on Windows (TestWriteFile_*_DACL_Windows parse the SDDL string and assert D:P + exactly one Allow ACE for the current user SID), overwrite-downgrade behavior on Unix (TestWriteFile_OverwriteDowngrades_Unix), and the new-file happy path on both.

5. Collateral cleanups

Three small defects surfaced while working in this area:

  • pkg/commands/kubeconfig_handler.go had a vacuous if err == nil { ... } wrapper around the entire 60-line post-write block, guarding a condition that was already checked with early-return one line above. The wrapper also shadowed err with inner filepath.Abs assignments, making the outer variable meaningless. Removed.
  • pkg/commands/init.go::writeToDestination printed Created <path> unconditionally after the write, so a permission failure produced a confusing "Created foo.key" + error pair. Both writeToDestination and the new writeSecureToDestination now gate the message on err == nil. Pinned by TestWriteToDestination_SilentOnFailure and TestWriteToDestination_AnnouncesOnSuccess.
  • writeSecretsBundleToFile called validateFileExists twice — the outer call was redundant with the same check inside writeSecureToDestination. Removed.

6. README

New "Windows" installation subsection pointing at the release zip and documenting the -t / --template separator equivalence.

Verification

  • go test ./... on macOS — pass
  • GOOS=windows GOARCH=amd64 go build ./... — clean
  • GOOS=windows GOARCH=amd64 go vet ./... — clean
  • GOOS=windows GOARCH=amd64 go test -c -o /dev/null ./... — all test packages compile for Windows
  • golangci-lint run ./... — 0 issues
  • CI matrix (ubuntu-latest + windows-latest) is exercised by this PR itself — watch for --- PASS on both runners

Summary by CodeRabbit

  • New Features

    • Windows platform support with installation docs and Windows-compatible template path handling.
    • Secure, cross-platform writing for sensitive outputs to ensure owner-only access.
  • Bug Fixes

    • Consistent permission/ACL tightening for secret files across platforms.
  • Tests

    • Expanded Unix and Windows tests covering permission, overwrite, and path-resolution behaviors.
  • Chores

    • CI matrix expanded to include Windows; release/archive formats and dependency declarations updated.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 17, 2026

Warning

Rate limit exceeded

@lexfrei has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 57 minutes and 21 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 30816b27-91f7-4cd0-b267-ebd4a5448203

📥 Commits

Reviewing files that changed from the base of the PR and between e81053c and f5b584b.

📒 Files selected for processing (24)
  • .github/workflows/pr.yml
  • .goreleaser.yaml
  • README.md
  • go.mod
  • pkg/age/age.go
  • pkg/age/age_unix_test.go
  • pkg/commands/apply.go
  • pkg/commands/apply_test.go
  • pkg/commands/apply_windows_test.go
  • pkg/commands/init.go
  • pkg/commands/init_test.go
  • pkg/commands/kubeconfig_handler.go
  • pkg/commands/rotate_ca_handler.go
  • pkg/commands/talosconfig.go
  • pkg/commands/template.go
  • pkg/commands/template_test.go
  • pkg/commands/template_unix_test.go
  • pkg/commands/template_windows_test.go
  • pkg/secureperm/secureperm.go
  • pkg/secureperm/secureperm_test.go
  • pkg/secureperm/secureperm_unix.go
  • pkg/secureperm/secureperm_unix_test.go
  • pkg/secureperm/secureperm_windows.go
  • pkg/secureperm/secureperm_windows_test.go
📝 Walkthrough

Walkthrough

The PR adds a cross-platform secureperm package that atomically writes sensitive files with owner-only permissions, updates multiple command handlers to use it instead of raw OS writes/chmods, expands path-resolution logic for templates, adds Windows CI and goreleaser tweaks, and introduces platform-specific tests and README Windows docs.

Changes

Cohort / File(s) Summary
CI / Release
/.github/workflows/pr.yml, .goreleaser.yaml, go.mod
Adds Windows runner to test matrix, sets Windows archives to zip via format_overrides, and promotes golang.org/x/sys from indirect to direct requirement.
Docs
README.md
Adds Windows installation/usage notes for the talm binary and clarifies path-separator behavior for template and config flags.
Secure Permission Core
pkg/secureperm/secureperm.go, pkg/secureperm/secureperm_unix.go, pkg/secureperm/secureperm_windows.go
New package implementing atomic owner-only writes and lockdown semantics: Unix uses tmp-file + chmod + rename; Windows creates temp with protected owner-only DACL, writes, and renames.
Secureperm Tests
pkg/secureperm/secureperm_test.go, pkg/secureperm/secureperm_unix_test.go, pkg/secureperm/secureperm_windows_test.go
Adds cross-platform and platform-specific tests verifying content persistence, permission tightening, atomicity on failure, and Windows DACL properties.
Commands — Migration to secureperm
pkg/age/age.go, pkg/commands/rotate_ca_handler.go, pkg/commands/talosconfig.go, pkg/commands/kubeconfig_handler.go, pkg/commands/init.go, pkg/commands/template.go
Replaces direct os.WriteFile/os.Chmod usages with secureperm.WriteFile/secureperm.LockDown; introduces writeSecureToDestination, writeInplaceRendered, createdSink, and control-flow adjustments around kubeconfig handling.
Template & Apply Path Handling
pkg/commands/apply.go, pkg/commands/template.go, pkg/commands/template_test.go
Introduces isOutsideRoot and resolveEngineTemplatePaths to correctly classify paths and normalize template keys; removes naive strings.HasPrefix("..") checks.
Tests — Commands
pkg/age/age_unix_test.go, pkg/commands/init_test.go, pkg/commands/template_unix_test.go, pkg/commands/template_windows_test.go, pkg/commands/apply_windows_test.go, pkg/commands/apply_test.go, pkg/commands/template_test.go
Adds/updates unit and platform-specific tests validating permission behavior, backslash normalization, created-file messaging, overwrite refusal semantics, and path classification edge cases.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as "CLI Command"
    participant Secure as "pkg/secureperm"
    participant FS as "Filesystem/OS"
    CLI->>Secure: WriteFile(path, data)
    Note right of Secure: create hidden temp sibling\n(set permissions / DACL per-OS)
    Secure->>FS: create temp file (owner-only)
    FS-->>Secure: temp handle/path
    Secure->>FS: write bytes, sync, close
    FS-->>Secure: write OK
    Secure->>FS: rename temp -> target (atomic)
    FS-->>Secure: rename OK
    Secure->>FS: (optional) parent dir sync
    Secure-->>CLI: success / error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibbled code and stitched the seam,

Hidden keys safe as in a dream,
Unix hush and Windows lock,
Temp-to-rename — click of a rock,
Hoppy secrets, snug and tight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.85% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title concisely and accurately summarizes the main changes: Windows CI setup, zip archives for releases, and NTFS ACL implementation for sensitive files.
Linked Issues check ✅ Passed The PR successfully addresses issue #11 by implementing Windows CI matrix testing, Windows-specific tests for path handling, secure file writing with proper permissions, and documentation updates for Windows users.
Out of Scope Changes check ✅ Passed All changes are directly aligned with restoring Windows support: CI/release setup, sensitive file permissions, path handling fixes, and Windows-specific tests and documentation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/windows-support

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lexfrei lexfrei marked this pull request as ready for review April 17, 2026 22:55
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Code review skipped — your organization's overage spend limit has been reached.

Code review is billed via overage credits. To resume reviews, an organization admin can raise the monthly limit at claude.ai/admin-settings/claude-code.

Once credits are available, reopen this pull request to trigger a review.

@lexfrei lexfrei self-assigned this Apr 17, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements cross-platform secure file handling by introducing the secureperm package, which ensures sensitive files like private keys and secrets are written with owner-only permissions on both Unix (0600) and Windows (protected DACLs). It also improves Windows support by refactoring template path resolution to handle backslashes and updating the release configuration. A security improvement was suggested to use more restrictive permissions (0700) when creating parent directories for sensitive files in the init command to prevent them from being world-readable.

Comment thread pkg/commands/init.go Outdated

parentDir := filepath.Dir(destination)

if err := os.MkdirAll(parentDir, os.ModePerm); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Since writeSecureToDestination is specifically designed for sensitive files (like talm.key and secrets.yaml), the parent directory should be created with restrictive permissions (0700) on Unix-like systems. Using os.ModePerm (0777) relies entirely on the system umask and may leave the directory world-readable or world-writable in some environments, which is inconsistent with the goal of this security-focused helper.

Suggested change
if err := os.MkdirAll(parentDir, os.ModePerm); err != nil {
if err := os.MkdirAll(parentDir, 0700); err != nil {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to 0o700 in cddfef4. MkdirAll is a no-op when the directory already exists, so this only affects freshly created parent dirs for secrets — the files themselves were already 0o600 via secureperm, but the parent could land at 0o755 under a permissive umask, which didn't match the helper's goal.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
pkg/commands/kubeconfig_handler.go (1)

104-162: LockDown swap looks correct; minor duplication in root-abs resolution.

secureperm.LockDown is the right replacement for os.Chmod here — the Windows branch now actually tightens the DACL instead of silently no-op'ing. The warning-on-failure behavior is preserved.

Minor: filepath.Abs(Config.RootDir) is computed twice (Line 111 and Line 139) for two independent "is this inside the project root?" checks. Hoisting it once keeps the two blocks in sync if the root ever changes and avoids a redundant syscall.

♻️ Proposed hoist
+		rootAbs, rootAbsErr := filepath.Abs(Config.RootDir)
+
 		// Set secure permissions (600) on kubeconfig file. On Windows
 		// this lays down an NTFS DACL; os.Chmod would have been a no-op.
 		if err := secureperm.LockDown(absPath); err != nil {
 			// Don't fail the command if the tighten fails, but log warning
 			fmt.Fprintf(os.Stderr, "Warning: failed to set permissions on kubeconfig: %v\n", err)
 		}
 
-		rootAbs, err := filepath.Abs(Config.RootDir)
-		if err == nil {
+		if rootAbsErr == nil {
 			relPath, err := filepath.Rel(rootAbs, absPath)
 			...
 		}
 		...
 		if !loginFlagValue {
-			// Get relative path from project root for encryption
-			rootAbs, err := filepath.Abs(Config.RootDir)
-			if err == nil {
+			if rootAbsErr == nil {
 				relKubeconfigPath, err := filepath.Rel(rootAbs, absPath)
 				...
 			}
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/commands/kubeconfig_handler.go` around lines 104 - 162, Hoist the
duplicate call to filepath.Abs(Config.RootDir) into a single rootAbs variable
computed once before the two "is inside project root?" checks so both blocks
reuse it; locate the code around secureperm.LockDown and the later block gated
by loginFlagValue that computes relPath and relKubeconfigPath and replace the
second filepath.Abs(Config.RootDir) with the previously computed rootAbs,
keeping the existing nil-checks and semantics (use the same err variable
handling for the initial abs calculation and then use rootAbs when calling
filepath.Rel for both absPath -> relPath and absPath -> relKubeconfigPath).
pkg/secureperm/secureperm_windows.go (1)

146-177: WriteFile atomicity and cleanup look correct.

Handle ownership is transferred cleanly to os.File via os.NewFile(uintptr(handle), …), the committed flag gates cleanup correctly, double-Close on the success path is a no-op on *os.File, and os.Remove(tmpPath) on any pre-rename error preserves the original destination — matching the contract documented in secureperm.go and exercised by TestWriteFile_PreservesOriginalOnFailure_Windows.

One optional hardening idea for later: consider f.Sync() before Close() on Windows for the same crash-durability reason you'd want it on Unix. Not a blocker; the atomicity contract here is about "no partial destination visible", not fsync durability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/secureperm/secureperm_windows.go` around lines 146 - 177, Add an optional
durability hardening by calling f.Sync() before closing the temporary file in
WriteFile to flush data to disk on Windows: after writing to f and before the
existing f.Close() call, invoke f.Sync() and handle/return any error (wrap with
the same "close tmp" style or a new "sync tmp" message), keeping the
tmpPath/handle/f/committed logic and cleanup via createSecureTmp intact so the
atomic rename behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/commands/template_unix_test.go`:
- Around line 56-59: The test currently seeds the file using os.WriteFile(path,
[]byte("old"), 0o644) which is subject to the process umask; to ensure the test
truly starts with mode 0644 before calling writeInplaceRendered, explicitly set
the file mode after creation using os.Chmod(path, 0o644). Update the test in
pkg/commands/template_unix_test.go (around the seed + writeInplaceRendered call)
to call os.Chmod(path, 0o644) after os.WriteFile and before calling
writeInplaceRendered so the test reliably verifies that writeInplaceRendered
tightens permissions.

In `@pkg/commands/template.go`:
- Around line 423-435: The check using strings.HasPrefix(relPath, "..") is too
broad and can misclassify paths whose first path element merely starts with
".."; change this to test the first path element exactly equals ".." by
splitting relPath with filepath.SplitList or strings.SplitN using
string(os.PathSeparator)/filepath.Separator (e.g., first :=
strings.SplitN(relPath, string(filepath.Separator), 2)[0]) and then use if first
== ".." { ... } so the rest of the logic (building possiblePath, stat check, and
setting resolved[i] = engine.NormalizeTemplatePath(...)) remains unchanged.

In `@pkg/secureperm/secureperm_unix_test.go`:
- Around line 48-75: The tests assume os.WriteFile(path, 0o644) yields lax perms
but umask can tighten them; make the precondition independent of umask by
explicitly forcing the lax mode after seeding: after the os.WriteFile calls in
TestWriteFile_OverwriteDowngrades_Unix and the other test that seeds a 0644
file, call os.Chmod(path, 0o644) (or equivalent) and check for errors before
invoking secureperm.WriteFile or secureperm.LockDown so the tests actually
verify that those functions tighten permissions.

In `@pkg/secureperm/secureperm_unix.go`:
- Around line 38-69: The WriteFile function must fsync the temporary file and
the containing directory to guarantee durability: after writing to f (and before
closing it) call f.Sync and handle/return any error; after os.Rename(tmpPath,
path) open the parent directory (using dir from filepath.Dir(path)), call Sync
on that directory file descriptor and return any error, ensuring committed is
only set true after both fsyncs succeed; keep existing cleanup/defer behavior
and error wrapping consistent with the surrounding error messages.

In `@pkg/secureperm/secureperm_windows_test.go`:
- Around line 30-38: Update the comment for
assertProtectedOwnerOnlyDACL/ownerOnlyDescriptor in secureperm_windows_test.go
to remove the incorrect `AI` inheritance flag: change the example SDDL from
`D:PAI(A;;FA;;;<USER-SID>)` to `D:P(A;;FA;;;<USER-SID>)` (or
`D:P(A;;FA;;;<current-user-SID>)`) and note that `AI` is stripped when
SE_DACL_PROTECTED is set so a protected owner-only DACL should not include `AI`.

---

Nitpick comments:
In `@pkg/commands/kubeconfig_handler.go`:
- Around line 104-162: Hoist the duplicate call to filepath.Abs(Config.RootDir)
into a single rootAbs variable computed once before the two "is inside project
root?" checks so both blocks reuse it; locate the code around
secureperm.LockDown and the later block gated by loginFlagValue that computes
relPath and relKubeconfigPath and replace the second
filepath.Abs(Config.RootDir) with the previously computed rootAbs, keeping the
existing nil-checks and semantics (use the same err variable handling for the
initial abs calculation and then use rootAbs when calling filepath.Rel for both
absPath -> relPath and absPath -> relKubeconfigPath).

In `@pkg/secureperm/secureperm_windows.go`:
- Around line 146-177: Add an optional durability hardening by calling f.Sync()
before closing the temporary file in WriteFile to flush data to disk on Windows:
after writing to f and before the existing f.Close() call, invoke f.Sync() and
handle/return any error (wrap with the same "close tmp" style or a new "sync
tmp" message), keeping the tmpPath/handle/f/committed logic and cleanup via
createSecureTmp intact so the atomic rename behavior is unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0cda9ea0-9793-4fa3-bacb-3440bb6a419d

📥 Commits

Reviewing files that changed from the base of the PR and between 00358fd and 8d4d0a0.

📒 Files selected for processing (21)
  • .github/workflows/pr.yml
  • .goreleaser.yaml
  • README.md
  • go.mod
  • pkg/age/age.go
  • pkg/age/age_unix_test.go
  • pkg/commands/apply_windows_test.go
  • pkg/commands/init.go
  • pkg/commands/init_test.go
  • pkg/commands/kubeconfig_handler.go
  • pkg/commands/rotate_ca_handler.go
  • pkg/commands/talosconfig.go
  • pkg/commands/template.go
  • pkg/commands/template_unix_test.go
  • pkg/commands/template_windows_test.go
  • pkg/secureperm/secureperm.go
  • pkg/secureperm/secureperm_test.go
  • pkg/secureperm/secureperm_unix.go
  • pkg/secureperm/secureperm_unix_test.go
  • pkg/secureperm/secureperm_windows.go
  • pkg/secureperm/secureperm_windows_test.go

Comment thread pkg/commands/template_unix_test.go
Comment thread pkg/commands/template.go
Comment thread pkg/secureperm/secureperm_unix_test.go
Comment thread pkg/secureperm/secureperm_unix.go
Comment thread pkg/secureperm/secureperm_windows_test.go
Copy link
Copy Markdown
Contributor

@myasnikovdaniil myasnikovdaniil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have enough to draft the review. Here it is.


Draft review for PR #129

Title: fix(windows): CI matrix, zip archives, and NTFS ACL for sensitive files
Author: lexfrei
Base: main · Head: fix/windows-support · Files: 24

Summary

This is a substantial and well-executed Windows-support PR. The CI matrix + zip archives convert Windows from a theoretical to a real release target; the new pkg/secureperm package replaces scattered os.WriteFile(..., 0o600) / os.Chmod(..., 0o600) calls — which were silently no-ops for permissions on NTFS — with an atomic write-to-tmp + rename that lays down a protected owner-only DACL on Windows and an explicit Chmod on Unix. Test coverage is thorough: SDDL-shape assertions on Windows, mode assertions on Unix, atomic-rollback assertions on both, and direct regression tests for the PowerShell backslash path. The documentation in secureperm itself reads well and explains why CREATE_NEW + SECURITY_ATTRIBUTES is the load-bearing detail. A few things I'd like the author to consider before merging — the most substantive being a missing fsync on Windows that breaks the parity the package doc claims, and the kubeconfig_handler.go still using the bare HasPrefix("..") check that this PR fixed elsewhere.

Recommendation

Comment — leaning approve once the Windows fsync omission and the kubeconfig_handler.go HasPrefix("..") carry-over are addressed (or explicitly punted with a follow-up issue). The rest are nits / discussion. Test coverage and the secureperm package docs are notably good.


That's the draft. Nothing has been posted; ready for your review and any edits before you send it.

}
if err := f.Close(); err != nil {
return fmt.Errorf("close tmp %s: %w", tmpPath, err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing fsync before rename on Windows

The Unix variant explicitly sync's the tmp before close + sync's the parent dir after rename, with this rationale:

fsync the tmp file before rename so its contents are on stable storage; otherwise a crash/power-loss between rename and the delayed disk flush can surface the renamed inode pointing at zero-length or stale data on reboot — the canonical failure mode the atomic-rename pattern is meant to avoid. Secrets files are not reconstructible, so the full fsync is warranted.

The Windows variant skips this entirely. os.Rename on Windows uses MoveFileEx(MOVEFILE_REPLACE_EXISTING) without MOVEFILE_WRITE_THROUGH, so a power loss between rename and the OS cache flush leaves you exactly in the failure mode the Unix doc warns about. *os.File.Sync() on Windows calls FlushFileBuffers and is the right tool here.

	if _, err := f.Write(data); err != nil {
		return fmt.Errorf("write tmp %s: %w", tmpPath, err)
	}
	if err := f.Sync(); err != nil {
		return fmt.Errorf("sync tmp %s: %w", tmpPath, err)
	}
	if err := f.Close(); err != nil {
		return fmt.Errorf("close tmp %s: %w", tmpPath, err)
	}
	if err := os.Rename(tmpPath, path); err != nil {
		return fmt.Errorf("rename tmp %s -> %s: %w", tmpPath, path, err)
	}
	committed = true

If you want full parity with Unix you'd also want a directory flush after rename, though Windows has no clean public API for that and most filesystems don't need it — the file flush is the more important of the two.


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e8c8051 — added f.Sync() before close so FlushFileBuffers runs on the Windows path too. Comment carries the same rationale as the Unix variant. Skipped the directory-level flush per your note that Windows has no clean public API for that; the file flush is the load-bearing call.

Comment thread pkg/commands/kubeconfig_handler.go Outdated
rootAbs, err := filepath.Abs(Config.RootDir)
if err == nil {
relPath, err := filepath.Rel(rootAbs, absPath)
if err == nil && !strings.HasPrefix(relPath, "..") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same HasPrefix("..") misclassification this PR fixes elsewhere

This PR adds isOutsideRoot to fix exactly this bug in apply.go and template.go, but kubeconfig_handler.go (in the same package) still has two raw strings.HasPrefix(relPath, "..") checks. A user with a sibling directory like ..mykube would have their kubeconfig misclassified as outside-root and the gitignore-add / encrypted-update branches would be skipped silently.

The impact is smaller than the template-resolution case (it just suppresses two side-effects rather than producing a "template not found"), but it's the same logical bug and the helper to fix it is already in the same package:

		relPath, err := filepath.Rel(rootAbs, absPath)
		if err == nil && !isOutsideRoot(relPath) {
			// Path is within project root, add to .gitignore

and

				relKubeconfigPath, err := filepath.Rel(rootAbs, absPath)
				if err == nil && !isOutsideRoot(relKubeconfigPath) {
					// Path is within project root

Worth either folding into this PR (it's a one-line change in two places) or filing a follow-up so it doesn't drift.


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Folded into this PR in 9383179. The boolean contract of isOutsideRoot is now also pinned directly via TestIsOutsideRoot in d6a069b so any future call site inherits validated semantics rather than re-deriving them.

Comment thread pkg/commands/kubeconfig_handler.go Outdated
rootAbs, err := filepath.Abs(Config.RootDir)
if err == nil {
relKubeconfigPath, err := filepath.Rel(rootAbs, absPath)
if err == nil && !strings.HasPrefix(relKubeconfigPath, "..") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same HasPrefix("..") misclassification this PR fixes elsewhere

This PR adds isOutsideRoot to fix exactly this bug in apply.go and template.go, but kubeconfig_handler.go (in the same package) still has two raw strings.HasPrefix(relPath, "..") checks. A user with a sibling directory like ..mykube would have their kubeconfig misclassified as outside-root and the gitignore-add / encrypted-update branches would be skipped silently.

The impact is smaller than the template-resolution case (it just suppresses two side-effects rather than producing a "template not found"), but it's the same logical bug and the helper to fix it is already in the same package:

		relPath, err := filepath.Rel(rootAbs, absPath)
		if err == nil && !isOutsideRoot(relPath) {
			// Path is within project root, add to .gitignore

and

				relKubeconfigPath, err := filepath.Rel(rootAbs, absPath)
				if err == nil && !isOutsideRoot(relKubeconfigPath) {
					// Path is within project root

Worth either folding into this PR (it's a one-line change in two places) or filing a follow-up so it doesn't drift.


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same fix in 9383179.

Comment thread pkg/secureperm/secureperm_windows.go Outdated
Comment on lines +147 to +150
dir := filepath.Dir(path)
if dir == "" {
dir = "."
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead guard

	dir := filepath.Dir(path)
	if dir == "" {
		dir = "."
	}

filepath.Dir is documented to never return the empty string — it returns "." for a bare filename and "." for an empty input. The guard never fires. Either drop it or leave a comment explaining what it's defending against; right now it reads like there's a code path that produces "" and there isn't.


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped in e8c8051 (same commit as the fsync — both touch the same WriteFile entry). Agreed it read like there was a code path producing "" and there isn't.

Comment on lines +38 to +85
func WriteFile(path string, data []byte) error {
dir := filepath.Dir(path)
f, err := os.CreateTemp(dir, ".secureperm-*")
if err != nil {
return fmt.Errorf("create tmp in %s: %w", dir, err)
}
tmpPath := f.Name()
committed := false
defer func() {
if !committed {
_ = f.Close()
_ = os.Remove(tmpPath)
}
}()

// os.CreateTemp already uses 0o600 but enforce explicitly so the
// contract survives any future stdlib change.
if err := f.Chmod(0o600); err != nil {
return fmt.Errorf("chmod tmp: %w", err)
}
if _, err := f.Write(data); err != nil {
return fmt.Errorf("write tmp: %w", err)
}
// fsync the tmp file before rename so its contents are on stable
// storage; otherwise a crash/power-loss between rename and the
// delayed disk flush can surface the renamed inode pointing at
// zero-length or stale data on reboot — the canonical failure mode
// the atomic-rename pattern is meant to avoid. Secrets files are
// not reconstructible, so the full fsync is warranted.
if err := f.Sync(); err != nil {
return fmt.Errorf("sync tmp: %w", err)
}
if err := f.Close(); err != nil {
return fmt.Errorf("close tmp: %w", err)
}
if err := os.Rename(tmpPath, path); err != nil {
return fmt.Errorf("rename tmp -> %s: %w", path, err)
}
committed = true
// Best-effort fsync of the parent dir so the rename entry itself is
// durable. Ignored errors: dir fsync is unsupported on a few
// filesystems; the tmp fsync above already protects the payload.
if d, openErr := os.Open(dir); openErr == nil {
_ = d.Sync()
_ = d.Close()
}
return nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ownership change when run as a different user from the prior owner

Worth calling out in the package doc. With the old os.WriteFile(path, data, 0o600) against an existing file, the inode is opened with O_TRUNC so owner/group are preserved. With the new tmp + rename strategy, the result is owned by whoever runs the process. If a user ever runs talm as root (e.g. via sudo talm rotate-ca on a CI box that had previously been operated by an unprivileged user), the secrets files will silently change ownership to root and the original user can't read them anymore.

Probably fine for the typical single-user workstation flow, but the contract change deserves one line in the package doc next to the atomicity claim. No code change required if you're comfortable with the trade-off.


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented in b28c948. Took the no-code-change path you suggested — a paragraph in the package doc next to the atomicity claim, explaining the uid/gid difference vs os.WriteFile's O_TRUNC semantics and pointing mixed-uid setups at running talm under a consistent identity.

// have picked instead).
if err := os.MkdirAll(filepath.Join(rootDir, "..templates"), 0o755); err != nil {
t.Fatalf("mkdir ..templates: %v", err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.Chdir in tests

Both new tests do os.Chdir(rootDir) with t.Cleanup to restore. os.Chdir is process-global, so anyone adding t.Parallel() to these or to a sibling test would silently corrupt state. The existing t.Cleanup keeps them honest in serial mode, but a // not safe with t.Parallel — uses os.Chdir comment would prevent a future regression.


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added in e81053c.


// Chdir so relative paths resolve against rootDir exactly as they
// would when a user cd's into their project directory and runs
// `talm template -t templates\controlplane.yaml`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.Chdir in tests

Both new tests do os.Chdir(rootDir) with t.Cleanup to restore. os.Chdir is process-global, so anyone adding t.Parallel() to these or to a sibling test would silently corrupt state. The existing t.Cleanup keeps them honest in serial mode, but a // not safe with t.Parallel — uses os.Chdir comment would prevent a future regression.


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added in e81053c.

Comment thread .github/workflows/pr.yml
Comment on lines +10 to +14
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest]
runs-on: ${{ matrix.os }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fail-fast: false is correct, just confirming intent

    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}

Reads as "let both runners finish even if one fails." That's the right call for a platform-parity matrix — you want to see Windows-specific failures even if Linux passed (or vice versa). Just calling it out so the PR description's "watch for --- PASS on both runners" claim is verified by behavior, not just name.


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed — fail-fast: false is intentional precisely so a Windows-specific failure surfaces even when Linux is green (and vice versa). I'll make the behavioural claim explicit in future PR descriptions rather than letting it ride on the reader recognising the strategy block.

Comment on lines +103 to +104
for range 100 {
candidate := filepath.Join(dir, fmt.Sprintf(".secureperm-%d-%d", os.Getpid(), rand.Uint32()))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp filename entropy

	for range 100 {
		candidate := filepath.Join(dir, fmt.Sprintf(".secureperm-%d-%d", os.Getpid(), rand.Uint32()))

math/rand/v2 is auto-seeded per process so two concurrent talm processes in the same dir would have different streams; combined with os.Getpid() and the 100-attempt retry, collisions are statistically zero. Just noting the design is sound — no change requested.


Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the audit. Agreed — math/rand/v2's per-process auto-seeding plus os.Getpid() plus the 100-attempt retry makes collisions a non-issue in practice.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/commands/template_windows_test.go (1)

106-154: Optional: extract Windows DACL test helpers to a shared testutil package.

The SDDL parsing, current-user SID lookup, and D:P / Allow-ACE assertions here are near-verbatim duplicates of extractAllowACETrustee, assertTrusteeMatches, and assertProtectedOwnerOnlyDACL in pkg/secureperm/secureperm_windows_test.go. They live in different packages (commands vs secureperm_test) so direct reuse isn't possible, but a small internal/testutil/winperm (or similar) helper package would let both test files share one implementation and keep the assertion shape consistent if the DACL contract evolves.

Not blocking — fine to defer until a third caller appears.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/commands/template_windows_test.go` around lines 106 - 154, Extract the
duplicated Windows DACL test logic (SDDL parsing, current-user SID lookup, and
DACL assertions) into a shared internal test helper package (e.g.
internal/testutil/winperm) and replace the inline code in this test with calls
to those helpers; specifically, implement functions mirroring the behavior of
secureperm_test helpers (extractAllowACETrustee, assertTrusteeMatches,
assertProtectedOwnerOnlyDACL) that encapsulate the regex parsing of SDDL,
calling windows.GetNamedSecurityInfo, token.OpenProcessToken/token.GetTokenUser,
windows.StringToSid and windows.EqualSid, then update this file to call the
shared helpers instead of repeating the regex/retrieve/compare logic so both
packages use the same implementation.
pkg/commands/apply_test.go (1)

283-304: Good contract test; consider a couple of additional edge cases.

TestIsOutsideRoot covers the key HasPrefix("..") misclassification. Two optional additions worth considering:

  • An empty-string case ("") — clarifies behavior for callers that may receive empty rel paths from filepath.Rel of identical paths.
  • A trailing-.. element case like "foo" + string(filepath.Separator) + ".." — verifies the predicate inspects each path element rather than only the leading one (relevant if a caller ever passes an unclean path).

Also a minor readability nit: the string(filepath.Separator) + "foo" + string(filepath.Separator) + "bar" chains could be replaced with filepath.Join(...) for terser cases, e.g. filepath.Join("..", "foo", "bar").

♻️ Optional refactor
 	cases := []struct {
 		relPath string
 		want    bool
 	}{
 		{"..", true},
-		{".." + string(filepath.Separator) + "foo", true},
-		{".." + string(filepath.Separator) + "foo" + string(filepath.Separator) + "bar", true},
+		{filepath.Join("..", "foo"), true},
+		{filepath.Join("..", "foo", "bar"), true},
 		{"..foo", false},
-		{"..foo" + string(filepath.Separator) + "bar", false},
-		{"..templates" + string(filepath.Separator) + "controlplane.yaml", false},
+		{filepath.Join("..foo", "bar"), false},
+		{filepath.Join("..templates", "controlplane.yaml"), false},
 		{"..mykube", false},
 		{"foo", false},
-		{"foo" + string(filepath.Separator) + "..bar", false},
+		{filepath.Join("foo", "..bar"), false},
 		{".", false},
+		{"", false},
+		{filepath.Join("foo", ".."), false}, // unclean: cleans to "."; document expected behavior
 	}

Note: the last suggested case depends on whether isOutsideRoot calls filepath.Clean internally — verify before adding.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/commands/apply_test.go` around lines 283 - 304, Add two additional cases
to TestIsOutsideRoot: one where relPath is the empty string "" and one where
relPath is filepath.Join("foo", "..") (or "foo"+string(filepath.Separator)+"..")
to ensure isOutsideRoot handles empty and trailing ".." path elements; also
refactor existing composite path literals to use filepath.Join (e.g.
filepath.Join("..","foo","bar")) for readability. Before adding the
trailing-".." test, confirm whether isOutsideRoot calls filepath.Clean; if it
does, adjust the expected behavior accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/commands/apply_test.go`:
- Around line 283-304: Add two additional cases to TestIsOutsideRoot: one where
relPath is the empty string "" and one where relPath is filepath.Join("foo",
"..") (or "foo"+string(filepath.Separator)+"..") to ensure isOutsideRoot handles
empty and trailing ".." path elements; also refactor existing composite path
literals to use filepath.Join (e.g. filepath.Join("..","foo","bar")) for
readability. Before adding the trailing-".." test, confirm whether isOutsideRoot
calls filepath.Clean; if it does, adjust the expected behavior accordingly.

In `@pkg/commands/template_windows_test.go`:
- Around line 106-154: Extract the duplicated Windows DACL test logic (SDDL
parsing, current-user SID lookup, and DACL assertions) into a shared internal
test helper package (e.g. internal/testutil/winperm) and replace the inline code
in this test with calls to those helpers; specifically, implement functions
mirroring the behavior of secureperm_test helpers (extractAllowACETrustee,
assertTrusteeMatches, assertProtectedOwnerOnlyDACL) that encapsulate the regex
parsing of SDDL, calling windows.GetNamedSecurityInfo,
token.OpenProcessToken/token.GetTokenUser, windows.StringToSid and
windows.EqualSid, then update this file to call the shared helpers instead of
repeating the regex/retrieve/compare logic so both packages use the same
implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c8c181d-1167-4283-9a78-e07f790edc12

📥 Commits

Reviewing files that changed from the base of the PR and between 8d4d0a0 and e81053c.

📒 Files selected for processing (12)
  • pkg/commands/apply.go
  • pkg/commands/apply_test.go
  • pkg/commands/init.go
  • pkg/commands/kubeconfig_handler.go
  • pkg/commands/template.go
  • pkg/commands/template_test.go
  • pkg/commands/template_unix_test.go
  • pkg/commands/template_windows_test.go
  • pkg/secureperm/secureperm_unix.go
  • pkg/secureperm/secureperm_unix_test.go
  • pkg/secureperm/secureperm_windows.go
  • pkg/secureperm/secureperm_windows_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/commands/template_unix_test.go
  • pkg/commands/template.go
  • pkg/commands/kubeconfig_handler.go

Copy link
Copy Markdown
Contributor

@myasnikovdaniil myasnikovdaniil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All review comments addressed; Windows fsync-before-rename and kubeconfig_handler isOutsideRoot fixes verified. LGTM.

lexfrei added 21 commits April 28, 2026 15:50
Converts the pr.yml test job to a strategy matrix so engine_windows_test.go
and any future //go:build windows tests actually execute. Until now the
Windows-tagged NormalizeTemplatePath table test had no signal, leaving
the ticket open despite the symptom being fixed.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Adds format_overrides so Windows archives ship as .zip instead of .tar.gz.
Stock Windows has no double-click handler for .tar.gz; tar.exe exists
since Windows 10 1803 but requires a shell. Linux and darwin archives
remain tar.gz as before.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Introduces pkg/secureperm with WriteFile(path, data) and LockDown(path).
On Unix both are thin wrappers over os.WriteFile/os.Chmod with mode
0o600. On Windows os.Chmod does not translate to NTFS DACLs — files
inherit ACLs from the parent directory and remain readable by
BUILTIN\Users. The Windows implementation sets a protected DACL
granting access only to the current user SID via
SetNamedSecurityInfo from golang.org/x/sys/windows, which was already
a transitive dependency.

No call sites migrated in this commit; the helper is introduced alone
so the migration is easy to review.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Replaces os.WriteFile(..., 0o600) and os.Chmod(..., 0o600) at the
sensitive-file sites with secureperm.WriteFile / secureperm.LockDown
so Windows gets an actual owner-only DACL instead of a silent no-op.

Sites migrated:
- pkg/age/age.go: talm.key creation, decrypted secrets.yaml write,
  generic decrypted plain file write
- pkg/commands/init.go: talosconfig + secrets.yaml via the existing
  writeToDestination helper (0o600 branch now routes through secureperm)
- pkg/commands/rotate_ca_handler.go: rotated secrets.yaml
- pkg/commands/talosconfig.go: talosconfig regeneration
- pkg/commands/kubeconfig_handler.go: kubeconfig tighten-after-fetch
  (os.Chmod swapped for LockDown)

No behavior change on Unix — the helper delegates to the same
os.WriteFile/os.Chmod calls.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Adds //go:build windows table test for resolveTemplatePaths covering
the four shapes users actually type in PowerShell: relative with
backslashes, nested backslashes, mixed separators, and an absolute
path inside rootDir. Plus a case for an absolute path outside rootDir
asserting that the helm-engine-bound string is still forward-slashed
even when the function declines to relativize.

With the
windows-latest CI runner added, any future path-handling change that
reintroduces backslashes into helm engine keys will fail CI.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The 60-line post-execution block was guarded by 'if err == nil {' five
lines after an 'if err != nil { return ... }' that makes that branch
vacuously true. The wrapper also shadowed err with inner filepath.Abs
assignments, so the outer variable became meaningless — making it
unclear whether gitignore / endpoint / encrypt steps were meant to
depend on an earlier success.

Dead control flow. Un-indented.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Previously writeToDestination branched on permissions == 0o600 exact
match to decide whether to route through secureperm. Fragile: a future
caller passing 0o400 or 0o640 would silently bypass the Windows DACL.

Splits into two named helpers so intent is explicit at the call site:
writeToDestination for non-sensitive files (preset Chart.yaml / values
/ templates, 0o644), writeSecureToDestination for secrets (talosconfig,
secrets.yaml). The two existing 0o600 callers switch to the new name;
writeToDestination reverts to its pre-PR body.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The outside-root case hardcoded 'C:\elsewhere\...' and asserted the
exact forward-slash string. On GitHub Actions windows-latest runner
t.TempDir() lives under D:\ on some images, in which case filepath.Rel
between rootDir and a C:\ path errors out — the function still
normalizes the original input, but the result differs from the
hardcoded expected value.

Construct the outside path via filepath.Join on rootDir (guaranteed
same drive, guaranteed outside) and assert the invariant that matters
for the PowerShell use case: the result contains no backslashes, regardless of which
internal branch the function took.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The cross-platform LockDown test only verified the owner can still
read the file after tighten. That is a necessary but far-from-
sufficient check — it does not verify that anyone else was actually
excluded, which is the whole point of the Windows variant.

Adds a Windows-only test that reads the security descriptor back via
GetNamedSecurityInfo and asserts the SDDL string is structurally
protected (D:P), contains exactly one Allow ACE, and references the
current user SID. If a future refactor drops PROTECTED_DACL_SECURITY_
INFORMATION (letting inherited BUILTIN\Users ACEs return) or the ACE
count mismatches, this test catches it in CI.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
With CI running on windows-latest, a dedicated release archive format,
and NTFS DACL enforcement for secrets, Windows is now a first-class
target. The README previously only mentioned Homebrew (macOS/Linux) and
a POSIX install.sh script, leaving Windows users without a clear
download path. Add an installation subsection pointing at the release
zip and a one-line note that -t path flags accept either separator.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
os.WriteFile only applies the mode argument when creating the file —
on overwrite the prior bits are preserved. A user with a pre-existing
secrets.yaml at 0o644 (copied in from a tarball or an older talm)
would see subsequent talm runs leave it world-readable despite the
'sensitive-file helper' contract. Add an explicit os.Chmod after the
write.

New TestWriteFile_OverwriteDowngrades_Unix pins the fix: seeds a file
at 0o644, calls secureperm.WriteFile, asserts the mode is now 0o600.
Fails on the pre-fix implementation.

Package doc updated to describe the actual sequence.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Previously WriteFile called os.WriteFile (which leaves the inherited
parent-directory DACL — typically BUILTIN\Users readable) and then
LockDown to tighten. Between those two calls, the fully-populated
secret (talm.key, decrypted secrets.yaml) was readable by every local
user. A local attacker polling the directory can grab it.

Now the file is created via CreateFile with a SECURITY_ATTRIBUTES
descriptor that already contains the protected owner-only DACL. The
file's contents never touch disk under a lax ACL.

Factors the owner-only DACL builder out so both WriteFile (at create
time) and LockDown (tighten-after-the-fact) share the same ACE shape.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Both writeToDestination and writeSecureToDestination emitted 'Created
<path>' unconditionally after the underlying write, so a permission
failure produced the confusing 'Created foo.key' + error pair.
Especially misleading on Windows where a DACL failure may leave the
file on disk under the parent's inherited ACL.

Routes the message through a package-level io.Writer (createdSink,
defaults to os.Stderr) gated on err == nil. The writer is swappable
in tests — TestWriteToDestination_SilentOnFailure passes a bytes.Buffer
and asserts no 'Created' line when the destination is a directory,
TestWriteToDestination_AnnouncesOnSuccess pins the happy-path message.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Per MSDN, CreateFile silently ignores lpSecurityDescriptor when
opening an existing file or device. So the previous implementation —
CreateFile + CREATE_ALWAYS + SECURITY_ATTRIBUTES — installed the
owner-only DACL only for brand-new files; an overwrite of a
pre-existing secrets.yaml/talosconfig/talm.key left with an inherited
permissive DACL would leave the weak DACL in place after rewrite.

Fix: request WRITE_DAC in the CreateFile access mask and call
SetSecurityInfo on the open handle before writing any bytes. The
handle is opened exclusive (share mode = 0), so no other process can
observe the file between the DACL switch and the write.

New Windows-only tests:
- TestWriteFile_NewFile_DACL_Windows pins the create-with-protected-
  DACL happy path.
- TestWriteFile_Overwrite_DACL_Windows seeds a file via os.WriteFile
  (which inherits the parent TempDir DACL, typically including
  BUILTIN\Users) and asserts the post-WriteFile DACL is protected and
  owner-only. Fails on the pre-fix code.
- TestLockDown_DACL_Windows mirrors the assertion for LockDown alone.

Assertions use a shared helper that reads the SDDL via
GetNamedSecurityInfo and structurally checks for D:P and exactly one
Allow ACE naming the current user SID.

Package doc updated to reflect the new behavior.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The AnnouncesOnSuccess test compared the printed 'Created <path>' line
against 'dir + "/ok.txt"'. That works today because writeToDestination
prints destination verbatim, but is fragile to a future refactor that
runs destination through filepath.Clean (which would switch separators
on Windows). filepath.Join sidesteps the coupling.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…failure

Previous implementation opened the target path with CreateFile +
CREATE_ALWAYS (Windows) or os.WriteFile + O_TRUNC (Unix), both of
which truncate the existing file before the new bytes land. A failure
between truncate and the final write left the caller with zero bytes
— and secrets files (talm.key, secrets.yaml) are not reconstructible,
so the blast radius is a cluster PKI reissue.

Both implementations now:
  1. Create a hidden sibling tmp file in the same directory with the
     final permissions/DACL already in place (os.CreateTemp on Unix
     gives 0o600 by construction; on Windows a custom retry loop
     calls CreateFile with CREATE_NEW + a protected owner-only
     SECURITY_ATTRIBUTES).
  2. Write the bytes to the tmp.
  3. Close the tmp.
  4. os.Rename the tmp over the target. Rename is atomic on both
     POSIX and NTFS when source and destination live on the same
     filesystem, which they do by construction.
  5. On any pre-rename failure, the tmp is removed and the original
     path is left untouched.

New TestWriteFile_PreservesOriginalOnFailure_Unix pins the contract:
seed a secret, make the parent directory read-only so os.CreateTemp
fails, confirm the original content is still intact after WriteFile
returns an error. Skipped when running as root (directory mode bits
are ignored).

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The --inplace branch rewrote the node config with os.WriteFile +
0o644. The rendered content embeds the cluster's certs, PKI keys, and
join tokens — exactly the material secureperm was introduced to
protect. Without this migration, talm template -i on Windows leaves
the rendered config readable by every local user (inherited parent
DACL), and on Unix leaves it world-readable. Both regress the contract
the rest of this branch enforces.

Extract the write into writeInplaceRendered so the migration is
testable without spinning up a Talos client. Two new Unix tests pin
the contract:

- TestWriteInplaceRendered_Mode0600_Unix: freshly rendered file lands
  at mode 0o600.
- TestWriteInplaceRendered_OverwriteDowngrades_Unix: overwriting an
  older 0o644 file tightens to 0o600.

Windows coverage is implicit — secureperm.WriteFile is the same call
already exercised by secureperm's TestWriteFile_Overwrite_DACL_Windows.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The package-level doc still described the earlier 'os.WriteFile +
Chmod' / 'CreateFile + SetSecurityInfo on handle' strategy that was
superseded in commit 6fe6771 by atomic tmp + rename on both platforms.
Anyone reading the package for the first time — an auditor, the next
maintainer, a security reviewer — would form a wrong mental model of
how overwrite is handled (rename, not SetSecurityInfo) and why
CREATE_NEW matters (it's what makes CreateFile honor SECURITY_-
ATTRIBUTES; CREATE_ALWAYS would ignore it).

Rewrite the doc to describe the actual tmp + rename strategy on both
Unix and Windows, the crash/failure preservation argument, and the
MSDN gotcha that motivates the CREATE_NEW retry loop.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The atomic-write contract on Windows — if the rename fails, the
original file is left intact — was asserted only on Unix. The Windows
path uses a different primitive (CreateFile + os.Rename/MoveFileEx)
with different failure modes, and crashing mid-operation over a
production secrets.yaml costs the user a cluster PKI reissue, so it
needs its own CI coverage.

TestWriteFile_PreservesOriginalOnFailure_Windows induces the failure
by setting FILE_ATTRIBUTE_READONLY on the destination — MoveFileEx
with MOVEFILE_REPLACE_EXISTING returns ERROR_ACCESS_DENIED against a
read-only target, so os.Rename propagates the error and the deferred
cleanup in WriteFile removes the tmp. The test then reads the
original content back and asserts it is byte-identical to the seed.

Also rewrites the stale comment on TestWriteFile_Overwrite_DACL_Windows
that still referenced the CREATE_ALWAYS + SetSecurityInfo strategy
replaced by the atomic tmp + rename strategy in commit 6fe6771.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ndleToFile

writeSecretsBundleToFile called validateFileExists(secretsFile)
directly, then handed off to writeSecureToDestination, which runs the
same check internally. Not a correctness bug, but brittle — a future
change to only one of those call sites would silently diverge
behavior. Drop the outer call and let the helper own the gate.

TestWriteSecretsBundleToFile_StillRefusesOverwrite pins the contract:
with --force=false and an existing secrets.yaml, the call must still
error out with an 'already exists' message and leave the original
content intact. Fails if the inner validateFileExists inside
writeSecureToDestination is ever removed too.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Comments must describe the problem directly rather than pointing at
an issue-tracker number. The test subject (backslash paths from
PowerShell hitting a forward-slash-only helm engine) already conveys
what the regression guard is for.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
lexfrei added 14 commits April 28, 2026 15:50
The original PowerShell reproducer — `talm template -t
templates\worker.yaml` — has been fixed since v0.19.1 via
engine.NormalizeTemplatePath, but no automated test exercised the
actual template-command path-resolution logic. apply_windows_test.go
covers apply's near-duplicate resolveTemplatePaths, not this one.

Extract the generateOutput engine-path resolution block into
resolveEngineTemplatePaths so it can be unit-tested directly, then
add TestResolveEngineTemplatePaths_BackslashInput with the four
shapes PowerShell users actually type: relative with backslashes,
nested backslashes, mixed separators, and an absolute path under
rootDir. The invariant asserted is the one the helm engine cares
about — zero backslashes in the resolved path.

Also adds TestWriteInplaceRendered_ProtectedDACL_Windows asserting
that `talm template --inplace` writes the rendered machine config
(certs, PKI keys, join tokens) under a protected owner-only NTFS
DACL, not the parent directory's inherited permissive one.

No behavior change in generateOutput — the extracted function is
structurally identical to the previous inline block.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
GenerateKey now writes through secureperm.WriteFile. The package had
no test file before this commit, so a regression that silently
reverts the call back to os.WriteFile would leave the age private
key — the root of trust for every encrypted secret in the project —
readable by other local users without any CI signal.

Add a small standalone test asserting talm.key lands at mode 0600
after GenerateKey returns. Windows DACL coverage is transitive via
secureperm's existing TestWriteFile_NewFile_DACL_Windows.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The previous text implied all path flags accepted both separators,
which is only true for -t / --template (and now -f / --file in
apply) where talm runs filepath.ToSlash before handing the key to
the helm engine. --talosconfig is passed through to talosctl's
config loader and follows standard Windows path rules, not talm's
normalization.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Two failure modes surfaced only after windows-latest joined the test
matrix:

1. apply_test.go::TestResolveTemplatePaths/path_outside_rootDir_is_kept_as-is
   hardcoded /other/project/templates/controlplane.yaml as the
   outside-root input. That string is absolute on POSIX but not on
   Windows (no drive letter), so the resolver treated it as relative,
   joined it with tmpRoot, and produced 'other/project/...' instead of
   keeping it as-is. Construct the absolute path via
   filepath.VolumeName + filepath.Separator so it is absolute on both
   OSes, and expect filepath.ToSlash(absOutside) as the normalized
   result.

2. DACL assertions in secureperm_windows_test.go and
   template_windows_test.go used strings.Contains(sddl, fullSid) to
   verify the trustee. On GitHub Actions windows-latest the runner's
   RID-500 admin account is emitted as the SDDL alias 'LA' rather than
   the literal SID, so the substring check failed. Extract the ACE
   trustee via regex and resolve it through windows.StringToSid (which
   accepts both literal SIDs and well-known aliases), then compare
   against the current user SID with windows.EqualSid.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Address review feedback from coderabbitai on pkg/commands/template.go:
`strings.HasPrefix(relPath, "..")` also matched valid paths whose
first directory component literally starts with "..", such as
`..templates/controlplane.yaml`. On the template command path, that
sent a legitimate input into the templates/<basename> fallback and
silently loaded a different file when one happened to exist under
templates/.

Extract isOutsideRoot helper that matches ".\".\" + path
separator" or the exact ".." element, and route all three duplicate
check sites (apply.go resolveTemplatePaths, template.go generateOutput
modeline branch, template.go resolveEngineTemplatePaths) through it.
Add a regression test that seeds a rootDir/..templates/controlplane.yaml
real target alongside a rootDir/templates/controlplane.yaml decoy and
asserts the real target is returned.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Address review feedback from gemini-code-assist on pkg/commands/init.go:
writeSecureToDestination writes sensitive files (talm.key,
secrets.yaml, talosconfig) but called os.MkdirAll with os.ModePerm,
relying on the process umask to drop world-readable/writable bits.
Under a permissive umask the parent dir could end up 0o755 even
though the files themselves land at 0o600, weakening defense-in-depth.

Use 0o700 directly so a newly-created parent dir for secrets is
owner-only regardless of umask. MkdirAll is a no-op when the dir
already exists, so this does not alter pre-existing dir permissions.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Address review feedback from coderabbitai on
pkg/secureperm/secureperm_unix.go: the atomic tmp+rename pattern
protects against partial writes, but without fsync of the tmp file
before rename and of the parent directory after, a crash or power
loss between os.Rename and the delayed disk flush can surface the
renamed inode pointing at zero-length or stale data — the canonical
failure mode this pattern is meant to avoid. The package contract
targets non-reconstructible secrets (losing secrets.yaml forces a
cluster PKI reissue), so the full fsync is warranted.

Sync the tmp file between Write and Close (error is returned because
a failed fsync means the data is not on disk), then best-effort
fsync the parent directory after rename so the rename entry itself
is durable. Directory-fsync errors are ignored because a handful of
filesystems don't support it and the tmp fsync already covers the
payload.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Address review feedback from coderabbitai on
pkg/commands/template_unix_test.go:
TestWriteInplaceRendered_OverwriteDowngrades_Unix seeded the target
with os.WriteFile(path, 0o644), which is filtered by the process
umask. On a developer machine with umask 0o077 the seed lands at
0o600 and the test passes without proving that writeInplaceRendered
tightened a lax file. Follow the WriteFile call with an explicit
os.Chmod(path, 0o644) so the precondition is deterministic.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Address review feedback from coderabbitai on
pkg/secureperm/secureperm_unix_test.go:
TestLockDown_Mode0600_Unix and TestWriteFile_OverwriteDowngrades_Unix
seed a 0o644 file via os.WriteFile, but that mode is filtered by the
process umask. On a developer machine with a restrictive umask such
as 0o077, the seed lands at 0o600 and the tests pass without
exercising the downgrade path. Chmod the seed to 0o644 explicitly
after the write so the lax-mode precondition is deterministic.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add a direct unit test that fixes the boolean contract independently
of any caller — the helper distinguishes a path that truly escapes
root (".\." or first element of "..") from one whose first element
merely starts with ".." (e.g. "..templates", "..mykube"). Pinning
the contract here means future call sites using the helper inherit
validated semantics rather than re-deriving them.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The two strings.HasPrefix(relPath, "..") checks misclassify a sibling
directory whose name merely starts with ".." (e.g. "..mykube") as
outside-root. Concretely: a kubeconfig living under such a sibling
would silently skip the .gitignore-add and the encrypted-update
branches.

This is the same logical bug the PR already fixes in apply.go and
template.go via the isOutsideRoot helper; folding the fix into the
same package keeps the path-classification rule consistent.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The Windows path skipped the FlushFileBuffers call that the Unix path
performs before rename. os.Rename on Windows uses MoveFileEx without
MOVEFILE_WRITE_THROUGH, so a crash/power-loss between rename and the
OS cache flush could leave the renamed file pointing at zero-length
or stale data on reboot — the canonical failure mode the atomic-
rename pattern is meant to avoid. Secrets files are not
reconstructible, so the flush is warranted on both platforms.

Also drop the unreachable dir == "" guard: filepath.Dir is
documented to never return empty (returns "." for a bare filename
and "." for empty input), so the branch never fires.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
os.WriteFile would have preserved the existing inode's owner via
O_TRUNC; tmp + rename produces a file owned by the calling process
uid/gid instead. Document the trade-off so a mixed-uid invocation
(e.g. once under sudo, then as the original user) is not surprising.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Both tests use os.Chdir with t.Cleanup to restore. os.Chdir is
process-global, so a future addition of t.Parallel here or to a
sibling test would silently corrupt CWD state. The comment is the
cheapest regression guard.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Copy link
Copy Markdown
Member

@kvaps kvaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Windows support is thorough: atomic writes, proper NTFS DACL handling, comprehensive cross-platform tests, and the collateral fixes (..templates edge case, vacuous err wrapper) are appreciated.

@kvaps kvaps merged commit 2545b3b into main Apr 28, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Windows support is broken

3 participants