Simplify CLI: Accept URLs directly for indexing #12
🤖 Augment PR Summary
Summary: This PR simplifies the indexing CLI by allowing users to pass a repository/website URL directly and having the tool auto-detect the source type.
Changes:
Technical Notes: Self-hosted GitLab/Bitbucket URLs are supported via
src/core/url-parser.ts
Outdated
}

const owner = pathParts[0];
const repo = pathParts[1];
URLs pasted from clone endpoints often end with .git; currently repo/projectName/defaultIndexName will include the suffix, which likely breaks GitHub/GitLab/Bitbucket API paths and yields odd default index keys. Consider stripping a trailing .git from repo/project segments when parsing.
Other Locations
src/core/url-parser.ts:125
src/core/url-parser.ts:152
✅ Fixed! Added a stripGitSuffix() helper function that removes trailing .git from repo/project names. Applied to all three parsers (GitHub, GitLab, Bitbucket) and added comprehensive tests.
See commit 845aa3b.
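The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from commit 845aa3b: only the name stripGitSuffix() comes from the thread, while its body and the ownerAndRepo() helper are assumptions made for the example.

```typescript
// Hypothetical sketch; the real stripGitSuffix() in src/core/url-parser.ts may differ.
function stripGitSuffix(name: string): string {
  // Remove a single trailing ".git"; names without the suffix pass through unchanged.
  return name.endsWith(".git") ? name.slice(0, -".git".length) : name;
}

// Illustrative helper (not from the PR): split a URL pathname into owner and repo.
function ownerAndRepo(pathname: string): { owner: string; repo: string } | null {
  const pathParts = pathname.split("/").filter((part) => part.length > 0);
  const [owner, repo] = pathParts;
  if (owner === undefined || repo === undefined) return null;
  return { owner, repo: stripGitSuffix(repo) };
}
```

With this shape, a clone URL path such as /owner/repo.git and a browser URL path such as /owner/repo yield the same repo name, so the default index key stays stable.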
src/core/url-parser.ts
Outdated
}

// GitLab (gitlab.com or hostname contains "gitlab")
if (hostname === "gitlab.com" || hostname.includes("gitlab")) {
Using hostname.includes("gitlab") / hostname.includes("bitbucket") can misclassify unrelated websites whose domain happens to contain those substrings and then fail parsing instead of falling back to website. Consider a more conservative self-hosted detection strategy to reduce false positives.
✅ Fixed! Changed from hostname.includes() to hostname.startsWith() for self-hosted detection:
// Before (too broad)
hostname.includes("gitlab")
// After (conservative)
hostname.startsWith("gitlab.")
This prevents false positives like "notgitlab.com" or "mygitlabtools.com" from being misclassified. Added tests to verify the conservative behavior.
See commit 845aa3b.
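A small sketch of the conservative check, to make the before/after difference concrete. The function name classifyHost and the GitHost type are hypothetical; only the startsWith("gitlab.") test is taken from the reply above, and the Bitbucket branch assumes the same rule was applied there.

```typescript
// Hypothetical classifier; names are illustrative and not from the PR.
type GitHost = "gitlab" | "bitbucket" | "other";

function classifyHost(hostname: string): GitHost {
  const host = hostname.toLowerCase();
  // Match the public host or a self-hosted name whose first label is "gitlab".
  if (host === "gitlab.com" || host.startsWith("gitlab.")) return "gitlab";
  // Assumption: Bitbucket detection follows the same prefix rule.
  if (host === "bitbucket.org" || host.startsWith("bitbucket.")) return "bitbucket";
  // Anything else falls through so the caller can treat it as a website.
  return "other";
}
```

The trade-off is that a self-hosted instance on a hostname such as git.example.com would not be detected by prefix alone and would presumably need the explicit gitlab or bitbucket subcommand.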
program.parse();
// Auto-detect URL mode: ctxc index <url> -> ctxc index url <url>
// This allows users to skip the 'url' subcommand when providing a URL directly
const indexIdx = process.argv.indexOf("index");
The URL auto-rewrite only triggers when the URL is the first argument after index, so ctxc index -i myidx https://… won’t be rewritten and will likely error. Is that limitation intentional, or should the rewrite scan forward for the first non-option arg?
This is intentional. The URL is treated like a positional argument (similar to how git clone <url> works), so it must come first after index.
The supported patterns are:
ctxc index https://github.com/owner/repo # URL first
ctxc index https://github.com/owner/repo -i myidx # Options after URL
Scanning forward for non-option args would add complexity and could lead to ambiguous parsing in edge cases. The current behavior is consistent with how most CLI tools handle positional arguments.
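The positional rule described in this reply can be sketched as a pure function over the argument list. This is an assumption-laden illustration rather than the code in src/bin/index.ts: rewriteIndexArgs and looksLikeUrl are hypothetical names, and the http(s) prefix test is a stand-in for whatever URL check the PR actually uses.

```typescript
// Hypothetical URL test; the real check in the PR may be stricter or looser.
function looksLikeUrl(arg: string): boolean {
  return /^https?:\/\//i.test(arg);
}

// Insert the "url" subcommand only when a URL immediately follows "index".
function rewriteIndexArgs(argv: readonly string[]): string[] {
  const indexIdx = argv.indexOf("index");
  if (indexIdx === -1) return [...argv];
  const next = argv[indexIdx + 1];
  // Options or subcommands right after "index" leave the arguments untouched.
  if (next === undefined || !looksLikeUrl(next)) return [...argv];
  return [...argv.slice(0, indexIdx + 1), "url", ...argv.slice(indexIdx + 1)];
}
```

Under this sketch, ctxc index https://github.com/owner/repo -i myidx gains the url subcommand, while ctxc index -i myidx https://… is passed through unchanged and left for the normal parser to reject.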
- Parse GitHub URLs (owner/repo, tree/branch, commit/sha)
- Parse GitLab URLs (project path, subgroups, /-/tree/branch)
- Parse Bitbucket URLs (workspace/repo, src/branch, branch/name)
- Fallback to website source for unknown URLs
- Extract default index names from URLs
- Support self-hosted GitLab and Bitbucket instances
- Export parseSourceUrl and ParsedUrl from @augmentcode/context-connectors/core
- Add comprehensive unit tests (19 test cases)
Agent-Id: agent-8394bd07-7a81-41d0-ac95-1ca62623e6fb
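To illustrate the parsing behavior this commit lists, here is a reduced sketch covering only the GitHub and website-fallback cases. It is not the exported parseSourceUrl: the function name parseSourceUrlSketch, the shape of ParsedUrlSketch, and the choice of the hostname as a website's default index name are all assumptions.

```typescript
// Hypothetical result shape; the real ParsedUrl type is not shown in the PR page.
type ParsedUrlSketch =
  | { type: "github"; owner: string; repo: string; ref: string | null; defaultIndexName: string }
  | { type: "website"; url: string; defaultIndexName: string };

function parseSourceUrlSketch(input: string): ParsedUrlSketch {
  const url = new URL(input); // throws a TypeError for invalid URLs
  const parts = url.pathname.split("/").filter((part) => part.length > 0);
  const [owner, rawRepo, kind, refSegment] = parts;

  if (url.hostname.toLowerCase() === "github.com" && owner !== undefined && rawRepo !== undefined) {
    const repo = rawRepo.replace(/\.git$/, "");
    // /tree/<branch> or /commit/<sha>. Simplification: only the first segment
    // after the keyword is used, so branch names containing "/" are truncated.
    const hasRef = (kind === "tree" || kind === "commit") && refSegment !== undefined;
    return { type: "github", owner, repo, ref: hasRef ? refSegment : null, defaultIndexName: repo };
  }

  // Unknown hosts fall back to a website source (index name choice is an assumption).
  return { type: "website", url: url.href, defaultIndexName: url.hostname };
}
```

A discriminated union keeps the caller honest: code that reads owner or repo has to check type === "github" first, which mirrors the "fallback to website" item in the list above.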
Adds command that auto-detects source type (GitHub, GitLab, Bitbucket, or website) from the URL and creates the appropriate source.
Features:
- Parses URL using parseSourceUrl() to determine source type
- Supports --ref option to override URL-detected branch/tag
- Supports -i/--index option to override default index name
- Supports --store and --store-path options
- Default index name derived from repo/project name
- Graceful error handling for invalid URLs
Agent-Id: agent-c9423996-94bb-4ab3-8311-ca0cc822da14
Adds pre-parse argument rewriting that auto-detects when a URL is passed directly to 'ctxc index' and transparently inserts the 'url' subcommand.
Before: ctxc index url https://github.com/owner/repo
After: ctxc index https://github.com/owner/repo
Both syntaxes now work. Existing subcommands (github, gitlab, etc.) are unchanged and continue to work.
Agent-Id: agent-ce81a04d-72f2-4289-8eb7-c3074d7d8030
- Strip .git suffix from repo/project names (clone URLs now work)
- Conservative self-hosted detection (hostname.startsWith instead of includes)
- CLI: Reorder args so options before URL are handled correctly
Fixes:
1. URLs like https://github.com/owner/repo.git now parse correctly
2. notgitlab.com no longer incorrectly matches as GitLab
3. now works (options can be anywhere)
Added 8 new tests for edge cases.
Agent-Id: agent-ce81a04d-72f2-4289-8eb7-c3074d7d8030
00f2c42 to
ad138f4
Remove argument reordering logic. URL must now appear immediately after 'index', consistent with how other subcommands work:
ctxc index https://github.com/owner/repo -i name ✓
ctxc index -i name https://github.com/owner/repo ✗ (error)
This is more predictable and matches CLI conventions.
884ab0a to
89e6268
Agent-Id: agent-f65941cf-2aac-4651-a905-32f3d8b9313d
Linked-Note-Id: 26a5d7df-b154-45b3-9351-1698a06d4fd0
- Move examples from description to addHelpText() so they only appear in
  Usage: ctxc index [options] [command]
  Index a data source
  Options:
    -h, --help display help for command
  Commands:
    github [options] Index a GitHub repository
    gitlab [options] Index a GitLab project
    bitbucket [options] Index a Bitbucket repository
    website [options] Crawl and index a website
    help [command] display help for command
  , not in the main menu
- Simplify usage line to instead of showing two separate usage patterns
Agent-Id: agent-8cab8bce-f29f-48f1-8fc1-86167ff2398b
Summary
Simplifies the ctxc index command to accept URLs directly, eliminating the need for verbose source-specific flags.
Before
After
Changes
1. URL Parser Module (src/core/url-parser.ts)
parseSourceUrl() function that auto-detects source type from URLs
/tree/main
2. CLI URL Mode (src/bin/cmd-index.ts)
url subcommand: ctxc index url <url>
-i)
--ref to override branch/tag
3. Direct URL Syntax (src/bin/index.ts)
ctxc index <url> without the url subcommand
url when a URL is detected
Backward Compatibility
✅ All existing subcommands work unchanged:
ctxc index github --owner augmentcode --repo context-connectors # Still works
Testing
Examples
Pull Request opened by Augment Code with guidance from the PR author