Skip to content

Direct shebang execution#116

Merged
jserv merged 1 commit into
sysprog21:mainfrom
open-sources-port:feature/direct_shebang_execution
Jul 2, 2026
Merged

Direct shebang execution#116
jserv merged 1 commit into
sysprog21:mainfrom
open-sources-port:feature/direct_shebang_execution

Conversation

@doanbaotrung

@doanbaotrung doanbaotrung commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Refactor shebang parsing logic: Extract the duplicate binfmt_script parsing logic from main.c and exec.c into a shared helper function elf_parse_shebang defined in elf.c and declared in elf.h.


Summary by cubic

Enables direct execution of shebang scripts by resolving interpreters in the standalone loader and execve with bounded recursion (max 5) and consistent errno handling. Adds a shared shebang parser and a host unit test for edge cases.

  • New Features

    • Standalone loader recursively resolves nested shebangs (max 5), updates elf_path, and prepends the interpreter and optional arg to argv (e.g., #!/bin/sh -x).
  • Refactors

    • Consolidated parsing into elf_read_shebang and updated execve to use it, rebuild argv as [interp, opt-arg, script, argv[1:]], and mirror Linux-like errors (ENOEXEC for invalid/too-long lines or buffer limits, ELOOP when depth exceeded); added test-shebang-host with Makefile/mk/tests.mk targets.

Written for commit 10b9d69. Summary will update on new commits.

Review in cubic

@doanbaotrung doanbaotrung force-pushed the feature/direct_shebang_execution branch from d6f2d70 to d66a9b7 Compare July 2, 2026 08:04
@jserv jserv changed the title Direct Shebang Execution Direct shebang execution Jul 2, 2026
cubic-dev-ai[bot]

This comment was marked as resolved.

@jserv

jserv commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Avoid unnecessary capitalization. i.e., "Direct Shebang Execution" -> "Direct shebang execution"
See https://cbea.ms/git-commit/ carefully

Comment thread src/core/elf.c Outdated
Comment thread src/core/elf.c
Comment on lines +450 to +452
if (nread < 2 || buf[0] != '#' || buf[1] != '!') {
return 0; /* Not a shebang script */
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is confusing to have shebang detection in function elf_parse_shebang. Clarify it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a fair point. The function is responsible for both detection (checking for the #! signature) and parsing. Combining them into a single function avoids having to open, read, and close the file twice (once for detection and once for parsing).

To clarify this dual responsibility, I have:

  1. Updated the doc comment in elf.h and the function header in elf.c to explicitly state that elf_parse_shebang performs both detection (returning 0 if not a shebang script) and parsing.
  2. Kept the callers in main.c and exec.c simple by delegating both checks entirely to this function."

@doanbaotrung doanbaotrung force-pushed the feature/direct_shebang_execution branch 3 times, most recently from 23ed4fa to 430f0e4 Compare July 2, 2026 09:21

@jserv jserv left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shebang refactor is a clear improvement. One correctness issue worth addressing (nested shebang via execve), plus the naming clarification raised in the earlier thread. Lower-value notes: the argv buf_off in exec.c recovers its offset via pointer subtraction from argv[argc-1] -- brittle, better returned from read_string_array; and the CR/early-EOL dialect is worth a one-line contract note plus tests (CRLF, blank interp, no trailing newline).

Comment thread src/syscall/exec.c Outdated
Comment thread src/syscall/exec.c Outdated
Comment thread src/core/elf.c Outdated
Comment thread src/core/elf.c Outdated
Comment thread src/core/elf.c
if (fd < 0)
return -errno;

char buf[512];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low: reads 511 of 512 bytes then treats the first line as complete; an over-long interpreter path is silently truncated rather than rejected. str_copy_trunc's size check catches the common case. Consider rejecting an unterminated first line. (Still more generous than Linux BINPRM_BUF_SIZE=256.)

@jserv

jserv commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

When writing Git commit messages, avoid formatting them like this:

Implemented a recursive shebang parsing loop
in the standalone loader's entry point.
It reads shebang interpreter lines
(including optional arguments like #!/bin/sh -x),
updates elf_path to point to the interpreter,
prepends them to guest_argv, and resolves the
host path recursively up to a depth of 5 levels.

At first glance, this reads more like a haiku than a commit message. Since Git conventionally allows up to 72 characters per line in the message body, wrap the text naturally while making good use of the available width. For example:

Implemented a recursive shebang parsing loop in the standalone loader's
entry point. It reads shebang interpreter lines, including optional
arguments like #!/bin/sh -x, updates elf_path to the interpreter, and
prepends them to guest_argv while resolving the host path recursively
up to a maximum recursion depth of five interpreter levels.

The goal is not to make every line exactly 72 characters long, but to avoid unnecessarily short lines. Wrap at sensible boundaries so that the text remains easy to read while making efficient use of the available line width.

@doanbaotrung doanbaotrung force-pushed the feature/direct_shebang_execution branch from 430f0e4 to e65ed30 Compare July 2, 2026 10:24
@doanbaotrung

doanbaotrung commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

I have implemented all the requested updates from the new code review comments:

  1. Nested Shebang Resolution in sys_execve: Refactored the execve path in exec.c into a bounded recursive loop (resolving up to 5 shebang layers recursively). Bounded depth exhaustion returns -LINUX_ELOOP. Temporary materialized paths are unlinked at each iteration so they don't leak.
  2. Safe argc check: Changed new_argc verification in exec.c to a safer size-based condition (argc > MAX_ARGS - prefix + 1) that checks the boundary before addition, preventing potential overflow.
  3. Naming & Documentation:
    • Renamed elf_parse_shebang to elf_read_shebang in all declarations, headers, and invocations to clarify it performs both probing (detection) and parsing.
    • Dropped the redundant description comment inside elf.c.
  4. Overlong/Unterminated lines: Used strpbrk(buf + 2, "\r\n") in elf.c to locate line endings. If no line ending is found and the buffer is completely filled, it returns -ENOEXEC.
  5. Host-Side Unit Tests: Added a host-side unit test file test-shebang-host.c that exercises:
    • CRLF, LF, and CR dialects.
    • Scripts ending without trailing newlines (EOF).
    • Empty/blank shebang interpreters.
    • Non-shebang scripts.
    • Overlong/unterminated shebang lines.
    • The test was successfully linked and runs automatically with make check.

All changes compile cleanly and pass the native unit tests.

@jserv

jserv commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

All changes compile cleanly and pass the native unit tests. You can review the details in walkthrough.md.

We are humans. Don't copy agent-specific reply here.

@doanbaotrung

This comment was marked as duplicate.

@doanbaotrung

doanbaotrung commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

Do you have any tool to help format the git commit message?
I break line manually so some line longer than others and not well formatted

@jserv jserv requested a review from Max042004 July 2, 2026 10:33
Implemented a recursive shebang parsing loop in the standalone loader's
entry point. It reads shebang interpreter lines, including optional
arguments like #!/bin/sh -x, updates elf_path to the interpreter, and
prepends them to guest_argv while resolving the host path recursively
up to a maximum recursion depth of five interpreter levels.
@doanbaotrung doanbaotrung force-pushed the feature/direct_shebang_execution branch from e65ed30 to 10b9d69 Compare July 2, 2026 10:35
@jserv jserv merged commit d148363 into sysprog21:main Jul 2, 2026
4 checks passed
@jserv

jserv commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Thank @doanbaotrung for contributing!

@doanbaotrung doanbaotrung deleted the feature/direct_shebang_execution branch July 2, 2026 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants