You built dynamic-shell-server so MCP clients can run shell commands with user approval and audit logging. The dynamic approval system is a good first-layer control.
One gap it leaves: the approval step is interactive, but there's no signed artifact proving which commands were actually executed vs. blocked that survives independently after the session. If an agent's tool output contained a prompt injection that shaped what the user saw at the approval prompt, the existing audit log wouldn't detect that.
We're building HELM AI Kernel, a small OSS fail-closed execution boundary that can sit between the MCP client and your server: it applies default-deny policy deterministically, records every decision as a signed receipt, and bundles decisions into an offline-verifiable EvidencePack. The boundary operates independently of the approval UI — so both layers would need to be bypassed for a risky command to execute unrecorded.
Would you be open to pointing dynamic-shell-server at HELM for one risky workflow — say a cleanup task that would run rm -rf on matched paths — and filing an issue with any policy ergonomics or CLI rough edges?
Local demo (~5 min):
brew install mindburnlabs/tap/helm-ai-kernel
helm-ai-kernel serve --policy ./release.high_risk.v3.toml
helm-ai-kernel receipts tail --agent agent.demo.exec --server http://127.0.0.1:7714
helm-ai-kernel verify evidence-pack.tar
One feedback question: Did HELM's default policies feel too coarse or too weak for the kinds of commands your users typically approve? What extra metadata from MCP tool calls would make policy decisions less brittle?
Repo: https://github.com/Mindburn-Labs/helm-ai-kernel
You built dynamic-shell-server so MCP clients can run shell commands with user approval and audit logging. The dynamic approval system is a good first-layer control.
One gap it leaves: the approval step is interactive, but there's no signed artifact proving which commands were actually executed vs. blocked that survives independently after the session. If an agent's tool output contained a prompt injection that shaped what the user saw at the approval prompt, the existing audit log wouldn't detect that.
We're building HELM AI Kernel, a small OSS fail-closed execution boundary that can sit between the MCP client and your server: it applies default-deny policy deterministically, records every decision as a signed receipt, and bundles decisions into an offline-verifiable EvidencePack. The boundary operates independently of the approval UI — so both layers would need to be bypassed for a risky command to execute unrecorded.
Would you be open to pointing dynamic-shell-server at HELM for one risky workflow — say a cleanup task that would run
rm -rfon matched paths — and filing an issue with any policy ergonomics or CLI rough edges?Local demo (~5 min):
One feedback question: Did HELM's default policies feel too coarse or too weak for the kinds of commands your users typically approve? What extra metadata from MCP tool calls would make policy decisions less brittle?
Repo: https://github.com/Mindburn-Labs/helm-ai-kernel