feat(similarity): script similarity detection and integrity check system by CodFrm · Pull Request #3 · scriptscat/scriptlist

CodFrm · 2026-04-16T07:39:18Z

No description provided.

Implements parseAndNormalize(code) which parses JavaScript using goja's parser and walks the resulting AST to produce a normalized Token stream. Identifiers collapse to KindVar, literals to their typed kinds, and structural constructs emit KindKeyword/KindPunct tokens so the stream still encodes program structure. Unknown node types fall through to KindUnknown to keep the walker deterministic as more JS constructs surface in subsequent tasks. Promotes github.com/dop251/goja from indirect to direct dependency.

…roperties

…lexity

…t dual-channel error

Scaffolds the data-layer entities and gormigrate migration for the Phase 2 code similarity detection and integrity review feature:

- Fingerprint / SimilarPair / SuspectSummary
- SimilarityWhitelist / IntegrityWhitelist / IntegrityReview

Registers T20260414 in migrations/init.go.

…inter receivers

- Replace errors.Is(err, gorm.ErrRecordNotFound) with db.RecordNotFound(err) to match the canonical CaGo helper used in 64 other repo sites.
- Drop createtime/updatetime mutation from Upsert; service layer owns the clock per existing repo convention (see internal/service/* sites).
- UpdateParseStatus now takes scannedAt int64 explicitly so the repo stays clock-free; regenerate mock to reflect the new signature.

…calls

docs/docs.go is regenerated by swag and its legacy strings.Replace calls trigger staticcheck QF1004 — we don't own the template, so exclude the entire directory rather than chase transient hits.

Adds the Phase 4 operations tooling required by §4.5 and §8.5 bootstrap:

- similarity_patrol crontab handler with two modes: daily Patrol() for incremental catch-up of scripts whose latest code is newer than their last fingerprint scan, and RunBackfill() kicked off by the admin endpoint to iterate every active script from a persisted cursor with rate-limiting and resumable state.
- similarity_repo.PatrolQueryRepo with ListStaleScriptIDs / ListScriptIDsFromCursor / CountScripts for the two modes.
- similarity_svc.BackfillState helpers persisted via system_config: TryAcquireBackfillLock, SetBackfillCursor, FinishBackfill, ResetBackfillCursor. State survives restarts and prevents simultaneous admin clicks from double-starting.
- Admin endpoints POST /admin/similarity/backfill (with reset flag for §8.5 step 9), GET /admin/similarity/backfill/status, POST /admin/similarity/scan/:script_id, and POST /admin/similarity/stop-fp/refresh (§8.5 step 8 on-demand refresh).
- RegisterBackfillRunner + RegisterStopFpRefresher function-injection seams wire the crontab handler methods into admin_svc without an import cycle.
- Production code uses function-typed fields / package vars for all Redis, NSQ producer, system_config, and PatrolQuery dependencies so unit tests can substitute fakes (matches existing similarity_stop_fp pattern).
- 31 new unit tests covering backfill state (9), admin_backfill + stop-fp refresh (11), and patrol handler (11), including resume-from-cursor, ctx cancellation during rate-limit sleep, re-entry guards, and publish-failure continuation.
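The backfill behaviour described in this commit (resume from a persisted cursor, rate-limit between scripts, stop promptly on ctx cancellation, keep going after a publish failure) could look roughly like the sketch below. Every name in it (backfillDeps, sleepCtx, runBackfill, demoRun) is hypothetical and the dependencies are in-memory fakes; it is not the PR's actual handler.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// backfillDeps holds the injected dependencies as function-typed fields so
// tests can swap in fakes. All names are illustrative, not the PR's API.
type backfillDeps struct {
	listFromCursor func(cursor int64, limit int) []int64 // IDs > cursor, ascending
	publish        func(scriptID int64) error            // enqueue one scan
	saveCursor     func(cursor int64)                    // persist progress
}

// sleepCtx waits for d but returns ctx.Err() as soon as ctx is cancelled.
func sleepCtx(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-t.C:
		return nil
	}
}

// runBackfill resumes from cursor, publishes one scan per script, and
// persists the cursor after each script. A publish failure is counted but
// does not abort the run.
func runBackfill(ctx context.Context, d backfillDeps, cursor int64, batch int, pause time.Duration) (int64, int, error) {
	failed := 0
	for {
		ids := d.listFromCursor(cursor, batch)
		if len(ids) == 0 {
			return cursor, failed, nil
		}
		for _, id := range ids {
			if err := d.publish(id); err != nil {
				failed++
			}
			cursor = id
			d.saveCursor(cursor)
			if err := sleepCtx(ctx, pause); err != nil {
				return cursor, failed, err
			}
		}
	}
}

// demoRun wires in-memory fakes: scripts 1..5 exist and publishing script 4 fails.
func demoRun(start int64, cancelled bool) (int64, int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	pause := time.Duration(0)
	if cancelled {
		cancel()
		pause = time.Hour // would block for an hour without the ctx check
	}
	d := backfillDeps{
		listFromCursor: func(cursor int64, limit int) []int64 {
			var out []int64
			for id := int64(1); id <= 5 && len(out) < limit; id++ {
				if id > cursor {
					out = append(out, id)
				}
			}
			return out
		},
		publish: func(id int64) error {
			if id == 4 {
				return errors.New("publish failed")
			}
			return nil
		},
		saveCursor: func(int64) {},
	}
	return runBackfill(ctx, d, start, 2, pause)
}

func main() {
	fmt.Println(demoRun(2, false)) // resumes after script 2
	fmt.Println(demoRun(0, true))  // cancelled during the first sleep
}
```

Persisting the cursor after every script, rather than after every batch, trades a few extra writes for a tighter resume point after a restart; which granularity the PR uses is not stated here.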

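- Replace Σ CommonCount with an ES cardinality aggregation to fix the coverage calculation, eliminating double counting across candidate fingerprints (spec §4.1 Step 5)
- Add PurgeScriptData to cascade-clean ES/fingerprint/pair/summary, wired into the hard-delete path through a new consumer driven by the ScriptDeleteMsg.HardDelete field (spec §4.6)
- Switch the backfill running flag to an atomic Redis SETNX CAS, eliminating the race when two admins click start at the same time; metadata still lives in system_config (spec §2.3/§4.5)
- DBConfigProvider gains GetBool/GetFloat/GetInt; Similarity() overlays dynamic pre_system_config overrides on top of YAML so the admin console can adjust the 14 similarity.* thresholds and switches in real time (spec §1.1/§6.1)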
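To see why summing CommonCount over candidates inflates coverage, here is a small in-memory sketch. It assumes coverage means "share of the target script's fingerprints found in at least one candidate"; the function names are hypothetical, and plain Go maps stand in for the ES cardinality aggregation the commit actually uses.

```go
package main

import "fmt"

// toSet turns a fingerprint slice into a set.
func toSet(fps []uint64) map[uint64]bool {
	s := make(map[uint64]bool, len(fps))
	for _, fp := range fps {
		s[fp] = true
	}
	return s
}

// coverageBySum adds up per-candidate common counts. A fingerprint shared
// with two candidates is counted twice, so the result can be inflated.
func coverageBySum(target []uint64, candidates [][]uint64) float64 {
	t := toSet(target)
	if len(t) == 0 {
		return 0
	}
	sum := 0
	for _, cand := range candidates {
		for fp := range toSet(cand) {
			if t[fp] {
				sum++
			}
		}
	}
	return float64(sum) / float64(len(t))
}

// coverageByDistinct counts each matched target fingerprint once, which is
// what a cardinality (distinct count) aggregation computes.
func coverageByDistinct(target []uint64, candidates [][]uint64) float64 {
	t := toSet(target)
	if len(t) == 0 {
		return 0
	}
	matched := map[uint64]bool{}
	for _, cand := range candidates {
		for _, fp := range cand {
			if t[fp] {
				matched[fp] = true
			}
		}
	}
	return float64(len(matched)) / float64(len(t))
}

func main() {
	target := []uint64{1, 2, 3, 4}
	candidates := [][]uint64{{1, 2, 9}, {2, 3, 8}} // fingerprint 2 is in both
	fmt.Println(coverageBySum(target, candidates))
	fmt.Println(coverageByDistinct(target, candidates))
}
```

In this example fingerprint 2 matches both candidates, so the summed variant reports full coverage while only three of the four target fingerprints are actually matched.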

gosec (G118) flagged the missing defer even though the goroutine fires cancel on the happy path — defer guards the early-return paths.

Introduce a codeFeatures struct computed once per Check so the four Category-A signals share a single rune pass (line count, max line, whitespace, comment bytes) and the two Category-B signals share one collectIdents call. Adds a benchmark covering 1MB obfuscated and 256KB plain samples. On M1: obfuscated 1MB goes 142ms → 80ms (1.78x, allocs halved), plain 256KB 64ms → 50ms (1.29x, allocs -40%).
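A single-pass feature struct of the kind this commit describes might look like the sketch below. The field names are assumptions, and comment-byte counting is left out because it needs tokenizer state that a plain rune loop does not have; only line count, longest line, and whitespace are shown.

```go
package main

import (
	"fmt"
	"unicode"
)

// codeFeatures caches values that several signals can share.
// Field names are illustrative, not the PR's actual struct.
type codeFeatures struct {
	lineCount  int // number of lines (0 for empty input)
	maxLineLen int // longest line, in runes, excluding the newline
	whitespace int // count of whitespace runes, newlines included
	totalRunes int
}

// computeFeatures walks the source once and fills every field.
func computeFeatures(code string) codeFeatures {
	var f codeFeatures
	if code == "" {
		return f
	}
	f.lineCount = 1
	cur := 0
	for _, r := range code {
		f.totalRunes++
		if unicode.IsSpace(r) {
			f.whitespace++
		}
		if r == '\n' {
			if cur > f.maxLineLen {
				f.maxLineLen = cur
			}
			cur = 0
			f.lineCount++
			continue
		}
		cur++
	}
	if cur > f.maxLineLen {
		f.maxLineLen = cur
	}
	return f
}

// whitespaceRatio is an example of a signal derived from the cached counts.
func (f codeFeatures) whitespaceRatio() float64 {
	if f.totalRunes == 0 {
		return 0
	}
	return float64(f.whitespace) / float64(f.totalRunes)
}

func main() {
	f := computeFeatures("ab\ncdef\n  x")
	fmt.Printf("%+v ratio=%.2f\n", f, f.whitespaceRatio())
}
```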

ScriptCat wraps background/cron scripts in (async function(){ ... })() at runtime, making top-level return and await legal. The fingerprint parser treated the source as a standalone ECMAScript Script, so scripts using either feature were rejected with "Illegal return statement" and marked parse_status=failed, falling out of the similarity index. parseAndNormalize now retries wrapped on parse failure and shifts token positions back by the wrapper prefix length (clamped into the original source range) so downstream match segments still point at real bytes.
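The retry-and-shift idea can be sketched as follows. The wrapper strings, the clamp range [0, len(src)], and the fakeParse stand-in are all assumptions made for illustration; the real parseAndNormalize uses goja's parser and emits tokens rather than bare offsets.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Assumed wrapper text; the exact strings used by the PR are not shown.
const (
	wrapperPrefix = "(async function(){\n"
	wrapperSuffix = "\n})()"
)

// unshiftPos maps an offset in the wrapped source back to the original
// source and clamps it into [0, srcLen].
func unshiftPos(pos, prefixLen, srcLen int) int {
	p := pos - prefixLen
	if p < 0 {
		p = 0
	}
	if p > srcLen {
		p = srcLen
	}
	return p
}

// parseWithWrapperRetry parses src as-is and, only if that fails, retries
// with the async wrapper and shifts the resulting offsets back.
func parseWithWrapperRetry(src string, parse func(string) ([]int, error)) ([]int, error) {
	pos, err := parse(src)
	if err == nil {
		return pos, nil
	}
	wrapped, werr := parse(wrapperPrefix + src + wrapperSuffix)
	if werr != nil {
		return nil, err // report the error from the unwrapped attempt
	}
	out := make([]int, len(wrapped))
	for i, p := range wrapped {
		out[i] = unshiftPos(p, len(wrapperPrefix), len(src))
	}
	return out, nil
}

// fakeParse is a stand-in parser: it rejects a bare top-level "return" and
// otherwise reports two offsets (first "return" or 0, and end of input).
func fakeParse(src string) ([]int, error) {
	idx := strings.Index(src, "return")
	if idx >= 0 && !strings.HasPrefix(src, wrapperPrefix) {
		return nil, errors.New("Illegal return statement")
	}
	if idx < 0 {
		idx = 0
	}
	return []int{idx, len(src)}, nil
}

func main() {
	fmt.Println(parseWithWrapperRetry("return 1;", fakeParse))
}
```

The end-of-input offset reported by the wrapped parse lies past the original source (it includes the suffix), which is the case the clamp is there for.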

Drop the 512KB auto-default on MaxCodeSize. scan.go already guards on `MaxCodeSize > 0`, so zero now means unlimited (bounded only by the API-level 10MB cap on script code). Default config example updated to 0 so fresh deployments index all scripts the backend will accept.

Adds GET /admin/similarity/parse-failures so operators can triage scripts that are invisible to similarity comparison. Default filter is parse_status=failed; pass status=2 to see skipped rows. Rescan uses the existing POST /admin/similarity/scan/:script_id, no new action required. Introduces FingerprintRepo.ListByParseStatus with ParseFailureFilter, the adminSvc.ListParseFailures handler composing script + user briefs, and wires the route into the admin middleware group.

Reset=true on /admin/similarity/backfill previously only zeroed the cursor but left the Scan code_hash short-circuit intact, so every rescanned script no-oped with "code unchanged, skipping" and the admin saw no effect. Thread a force flag from TriggerBackfill → RunBackfill → SimilarityScanMsg → consumer → ScanSvc.Scan. When force=true the short-circuit is bypassed so extraction, ES indexing, and pair upsert all run again. Patrol and the publish/update script events keep force=false to stay idempotent.
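The effect of the force flag on the short-circuit can be reduced to one predicate. This is a minimal sketch: shouldSkipScan and codeHash are hypothetical names, and SHA-256 is assumed for the code hash only because the PR uses a sha256 helper elsewhere.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// codeHash returns a hex digest of the script source (algorithm assumed).
func codeHash(code string) string {
	sum := sha256.Sum256([]byte(code))
	return hex.EncodeToString(sum[:])
}

// shouldSkipScan reports whether a scan may be skipped because the code is
// unchanged since the last fingerprint. force=true always disables the skip.
func shouldSkipScan(storedHash, code string, force bool) bool {
	if force {
		return false
	}
	return storedHash != "" && storedHash == codeHash(code)
}

func main() {
	code := "var a = 1"
	stored := codeHash(code)
	fmt.Println(shouldSkipScan(stored, code, false)) // unchanged code, normal scan
	fmt.Println(shouldSkipScan(stored, code, true))  // unchanged code, forced backfill
}
```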

…ments IntegritySvc.Check now logs final score, per-category breakdown, and hit signal names so ops can trace why a given script landed in a specific zone. RecordWarning surfaces marshal/upsert failures with full context. BuildMatchSegments logs each load step (fingerprint row, ES positions) and the final segment count, making the evidence-page build path debuggable without attaching a debugger.

Two related issues on the admin similar-pairs view:

1. Soft-deleted scripts kept showing in the pair list with no indication. Per spec §4.6 we deliberately preserve the underlying fingerprint as evidence, so instead of cascading the delete we surface the state: ScriptBrief now exposes IsDeleted, and ListPairsRequest accepts an ExcludeDeleted toggle that JOINs cdb_tampermonkey_script to filter out pairs where either side is in DELETE status.
2. After a script's code changes such that an old pair drops below the Jaccard threshold, the row in pre_script_similar_pair was never touched again and lingered as a zombie. Scan() now calls DeletePendingByScriptID right after candidate lookup so any pair that's still similar gets re-Upserted by step 11 while obsolete pending rows disappear. Whitelisted / reviewed pairs are preserved because those statuses are explicit admin decisions.

walkNode only handled an ES5 subset and dropped to a single KindUnknown for any unrecognized AST node, so any modern userscript starting with a top-level `class` (or built around let/const, arrows, async/await, template literals, destructuring, etc.) collapsed to under 14 tokens and tripped the `too_few_fingerprints` skip in scan, leaving stale similar pairs frozen forever.

Rewrite walkNode to cover the full goja AST: classes (including private fields, static blocks, methods, getters), lexical declarations, arrow functions, template literals, await/yield, try/catch/throw, switch/case, for-in/for-of, do-while, with, optional chaining, spread/rest, destructuring patterns, sequence + conditional + unary expressions, new, super, this, meta-property, and PropertyKeyed/Short (which the old object-literal walker had been silently turning into KindUnknown).

Also plug the scan early-exit cleanup hole: when scan bails out at any of the five guard paths (soft-deleted / oversized / parse-failed / too-few-fingerprints / non-active), still purge pending pairs touching this script. Otherwise scripts that *used to* match leave their old pairs visible forever, since no later scan reaches step 10b for them.

Tested with testdata/1.js (ScriptCat OCS helper, 59KB, 1335 lines): fingerprints went from 1 to 866, total tokens from <14 to 4705. walkNode coverage 63.5% -> 85.6%, purgePendingPairs 50% -> 100%.

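The HTTP request path runs only the fast signals (precomputed Cat A + known-packer Cat D). The expensive regex signals (identifier extraction, comment statistics, string-array detection, etc.) are handled asynchronously by the similarity.scan NSQ consumer, so publishing a large script does not time out.

- Add a CheckFast() method; a known-packer signature match blocks immediately (score=1.0)
- scan.go step 2b: asynchronous integrity check + auto-archive + warning record
- Remove the deprecated integrity.warning message-queue flow
- Add the integrity_async_auto_archive config option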
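A fast path of this shape could be as small as a substring scan. The sketch below covers only the known-packer part; the signature list is an assumption (the marker is the usual prologue of Dean Edwards packer output, chosen because the PR ships a dean_edwards_packer.js fixture), and checkFast and fastResult are hypothetical names rather than the PR's CheckFast().

```go
package main

import (
	"fmt"
	"strings"
)

// fastResult is a reduced stand-in for the integrity result type.
type fastResult struct {
	Score  float64
	Signal string
}

// packerSignature pairs a signal name with a marker substring.
type packerSignature struct{ name, marker string }

// knownPackers is an illustrative list, not the PR's actual signature set.
var knownPackers = []packerSignature{
	{"dean_edwards_packer", "eval(function(p,a,c,k,e,"},
}

// checkFast returns score 1.0 as soon as a known packer marker is found.
// The slower signals are assumed to run later in an asynchronous consumer.
func checkFast(code string) fastResult {
	for _, sig := range knownPackers {
		if strings.Contains(code, sig.marker) {
			return fastResult{Score: 1.0, Signal: sig.name}
		}
	}
	return fastResult{}
}

func main() {
	fmt.Printf("%+v\n", checkFast("eval(function(p,a,c,k,e,d){return p}('0',1,1,''.split('|'),0,{}))"))
	fmt.Printf("%+v\n", checkFast("console.log('hello')"))
}
```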

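These two encodings (AAEncode and JJEncode) are extremely obscure; real malicious scripts almost never use them, and their code characteristics are already covered by other signals (single-character identifier ratio, whitespace ratio, etc.), so no dedicated detection is needed.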

Copilot

Pull request overview

This PR introduces the plumbing for a script similarity detection system plus an integrity (minify/obfuscation) pre-check, including persistence tables, NSQ topics/consumers, cron-driven patrol/backfill jobs, and admin/evidence endpoints.

Changes:

Add DB migrations + new similarity/integrity entities and repositories (MySQL + Elasticsearch index init).
Add similarity.scan producer/consumer, hard-delete purge consumer, and cron handlers for patrol/backfill + stop-fingerprint refresh.
Add integrity fast pre-check into script create/update flows, plus config (YAML + DB overrides) and routing/controller wiring.

Reviewed changes

Copilot reviewed 103 out of 107 changed files in this pull request and generated 3 comments.

File	Description
migrations/init.go	Registers new migration.
migrations/20260414.go	Creates/drops similarity tables.
internal/task/producer/topic.go	Adds similarity.scan topic.
internal/task/producer/similarity.go	Producer + subscribe helpers.
internal/task/producer/similarity_test.go	Round-trip msg parsing test.
internal/task/producer/script.go	Adds HardDelete flag to delete msg.
internal/task/crontab/handler/similarity_stop_fp.go	Stop-fingerprint refresh job.
internal/task/crontab/handler/similarity_stop_fp_test.go	Stop-fp handler unit tests.
internal/task/crontab/handler/similarity_patrol.go	Patrol + backfill cron handler.
internal/task/crontab/handler/similarity_patrol_test.go	Patrol/backfill unit tests.
internal/task/crontab/crontab.go	Registers similarity cron handlers.
internal/task/consumer/subscribe/similarity_scan.go	Consumer for similarity.scan.
internal/task/consumer/subscribe/similarity_scan_test.go	Consumer dispatch/force tests.
internal/task/consumer/subscribe/similarity_purge.go	Hard-delete purge consumer.
internal/task/consumer/subscribe/similarity_purge_test.go	Purge consumer tests.
internal/task/consumer/consumer.go	Registers new subscribers.
internal/service/similarity_svc/testdata/reorder_pair/original.js	Similarity fixtures.
internal/service/similarity_svc/testdata/reorder_pair/reordered.js	Similarity fixtures.
internal/service/similarity_svc/testdata/rename_pair/original.js	Similarity fixtures.
internal/service/similarity_svc/testdata/rename_pair/renamed.js	Similarity fixtures.
internal/service/similarity_svc/testdata/different_pair/a.js	Similarity fixtures.
internal/service/similarity_svc/testdata/different_pair/b.js	Similarity fixtures.
internal/service/similarity_svc/testdata/integrity/normal/plain_userscript.js	Integrity fixtures.
internal/service/similarity_svc/testdata/integrity/normal/embedded_small_lib.js	Integrity fixtures.
internal/service/similarity_svc/testdata/integrity/minified/uglify_output.js	Integrity fixtures.
internal/service/similarity_svc/testdata/integrity/minified/terser_output.js	Integrity fixtures.
internal/service/similarity_svc/testdata/integrity/packed/dean_edwards_packer.js	Integrity fixtures.
internal/service/similarity_svc/testdata/integrity/obfuscated/obfuscator_io_level1.js	Integrity fixtures.
internal/service/similarity_svc/testdata/integrity/obfuscated/obfuscator_io_level4.js	Integrity fixtures.
internal/service/similarity_svc/testdata/integrity/borderline/has_vendored_json.js	Integrity fixtures.
internal/service/similarity_svc/purge.go	Purge cascade implementation.
internal/service/similarity_svc/purge_test.go	Purge cascade tests.
internal/service/similarity_svc/pending_warning.go	Integrity result types.
internal/service/similarity_svc/mock/scan.go	Generated scan mock.
internal/service/similarity_svc/mock/integrity.go	Generated integrity mock.
internal/service/similarity_svc/match_segments.go	Build UI match segments.
internal/service/similarity_svc/match_segments_test.go	Match segment tests.
internal/service/similarity_svc/integrity_signals.go	Integrity signals implementation.
internal/service/similarity_svc/integrity_signals_test.go	Signal unit tests.
internal/service/similarity_svc/integrity.go	Integrity service + messaging.
internal/service/similarity_svc/integrity_test.go	Integrity end-to-end tests.
internal/service/similarity_svc/integrity_bench_test.go	Integrity benchmarks.
internal/service/similarity_svc/doc.go	Package-level docs.
internal/service/similarity_svc/backfill_state.go	Backfill state + Redis lock.
internal/service/similarity_svc/backfill_state_test.go	Backfill state tests.
internal/service/similarity_svc/admin_backfill.go	Admin backfill/manual scan hooks.
internal/service/similarity_svc/access.go	Evidence access middleware.
internal/service/similarity_svc/access_test.go	Access service smoke test.
internal/service/script_svc/script.go	Integrates integrity gate + scan publish.
internal/repository/similarity_repo/fingerprint.go	Fingerprint MySQL repo.
internal/repository/similarity_repo/fingerprint_test.go	Repo shape test.
internal/repository/similarity_repo/fingerprint_es_init.go	ES index create helper.
internal/repository/similarity_repo/fingerprint_es_test.go	ES query-body tests.
internal/repository/similarity_repo/patrol_query.go	Patrol/backfill SQL repo.
internal/repository/similarity_repo/similar_pair.go	Pair repo + normalization.
internal/repository/similarity_repo/suspect_summary.go	Suspect summary repo.
internal/repository/similarity_repo/similarity_whitelist.go	Pair whitelist repo.
internal/repository/similarity_repo/integrity_whitelist.go	Integrity whitelist repo.
internal/repository/similarity_repo/integrity_review.go	Integrity review queue repo.
internal/repository/similarity_repo/*_test.go	Repo/interface shape tests.
internal/repository/similarity_repo/mock/*.go	Generated repo mocks.
internal/repository/similarity_repo/doc.go	Repo package docs.
internal/repository/script_repo/script_code.go	Adds FindByIDIncludeDeleted.
internal/repository/script_repo/mock/script_code.go	Updates mock for new method.
internal/pkg/code/code.go	Adds similarity error codes.
internal/pkg/code/zh_cn.go	Adds similarity zh-CN messages.
internal/model/entity/similarity_entity/*.go	New similarity/integrity entities.
internal/controller/similarity_ctr/similarity.go	Similarity controller methods.
internal/api/router.go	Wires admin + evidence routes.
configs/db_provider.go	Adds typed getters (bool/float/int).
configs/db_provider_test.go	Adds DB provider tests.
configs/config.go	Adds SimilarityConfig + defaults/overrides + validate hook.
configs/config.yaml.example	Adds similarity config examples.
cmd/app/main.go	Registers similarity repos/services + ensures ES index.
go.mod	Adds deps for similarity/integrity.
go.sum	Updates dependency checksums.
.golangci.yml	Updates lint exclusions.
.gitignore	Ignores .omc directory.

Copilot · 2026-04-16T07:45:15Z

+	// similarity.scan_enabled=true requires an elasticsearch address (cago reads the elasticsearch.address list)
+	if cfg.Bool(ctx, "similarity.scan_enabled") {
+		var esAddress []string
+		_ = cfg.Scan(ctx, "elasticsearch.address", &esAddress)
+		if len(esAddress) == 0 {
+			return fmt.Errorf("similarity.scan_enabled=true requires elasticsearch.address to be set")
+		}
+	}


Validate() checks cfg.Bool("similarity.scan_enabled") to decide whether Elasticsearch must be configured, but Similarity() defaults ScanEnabled=true even when the YAML key is absent. This can let the app start without elasticsearch.address while similarity scanning is effectively enabled (and main.go later calls EnsureFingerprintIndex based on Similarity().ScanEnabled). Consider basing this check on Similarity().ScanEnabled (or otherwise applying the same defaulting logic as Similarity()) so startup validation matches runtime behavior.

Copilot · 2026-04-16T07:45:15Z

+	// Integrity pre-check (runs only the fast signals; expensive signals are handled asynchronously by the similarity scan consumer)
+	if similarity_svc.IntegrityEnabled() && similarity_svc.Integrity() != nil && req.Code != "" {
+		latest, _ := script_repo.ScriptCode().FindLatest(ctx, script.ID, 0, true)
+		var existingHash string
+		if latest != nil {
+			existingHash = sha256HexString(latest.Code)
+		}
+		newHash := sha256HexString(req.Code)
+		if newHash != existingHash {
+			whitelisted, _ := similarity_svc.Integrity().IsWhitelisted(ctx, script.ID)
+			if !whitelisted {


UpdateCode's integrity pre-check ignores errors from ScriptCode().FindLatest and Integrity().IsWhitelisted (both assigned to _). If either call fails transiently, the code may treat the script as not whitelisted / changed and incorrectly block an update (400) instead of surfacing a server error or skipping the integrity gate. Handle these errors explicitly (e.g., return the error, or fail-open with a warning log depending on desired policy) to avoid false rejections.

Copilot · 2026-04-16T07:45:15Z

+	ok, release, err := h.acquireBackfillLock(ctx)
+	if err != nil || !ok {
+		logger.Ctx(ctx).Warn("similarity backfill: redis lock unavailable",
+			zap.Bool("ok", ok), zap.Error(err))
+		return nil
+	}


RunBackfill returns nil when acquireBackfillLock returns an error (it checks if err != nil || !ok { ...; return nil }). This suppresses real Redis failures and makes backfill runs silently no-op, which is hard to detect/alert on. Consider returning the error when err != nil, and only treating !ok (lock held) as a nil/no-op path.

CodFrm · 2026-04-16T07:59:32Z

Code review

Found 7 issues (issues 1-3 confirm Copilot's findings, 4-7 are additional):

1. Config validation mismatch: Validate() defaults scan_enabled=false but Similarity() defaults true (CLAUDE.md says "New required config keys must be added to Validate() function (fails fast at startup)")

Validate() reads cfg.Bool(ctx, "similarity.scan_enabled") which returns false when the YAML key is absent. But Similarity() initializes ScanEnabled: true before applying YAML. If a user deploys without the similarity block in config.yaml, Validate() skips the ES check, but Similarity().ScanEnabled returns true at runtime — the app starts cleanly then crashes on ES operations.

Fix: use Similarity().ScanEnabled in Validate() instead of cfg.Bool().

scriptlist/configs/config.go

Lines 239 to 246 in 074e812

    
    // similarity.scan_enabled=true requires an elasticsearch address (cago reads the elasticsearch.address list)
    if cfg.Bool(ctx, "similarity.scan_enabled") {
        var esAddress []string
        _ = cfg.Scan(ctx, "elasticsearch.address", &esAddress)
        if len(esAddress) == 0 {
            return fmt.Errorf("similarity.scan_enabled=true requires elasticsearch.address to be set")
        }
    }
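The mismatch and the suggested fix can be reproduced with a toy config. In this sketch rawConfig, rawBool, scanEnabled, and validate are hypothetical stand-ins for the cago config object, cfg.Bool, Similarity().ScanEnabled, and Validate(); the point is only that validation and runtime should read the flag through the same defaulting logic.

```go
package main

import (
	"errors"
	"fmt"
)

// rawConfig models parsed YAML: a key is either present or absent.
type rawConfig map[string]bool

// rawBool mimics a plain boolean getter: an absent key reads as false.
func rawBool(cfg rawConfig, key string) bool { return cfg[key] }

// scanEnabled applies the runtime default: absent means enabled.
func scanEnabled(cfg rawConfig) bool {
	v, ok := cfg["similarity.scan_enabled"]
	if !ok {
		return true
	}
	return v
}

// validate uses the defaulted value, so startup checks match runtime behaviour.
func validate(cfg rawConfig, esAddress []string) error {
	if scanEnabled(cfg) && len(esAddress) == 0 {
		return errors.New("similarity.scan_enabled=true requires elasticsearch.address to be set")
	}
	return nil
}

func main() {
	empty := rawConfig{}
	// The raw getter and the defaulted getter disagree when the key is absent.
	fmt.Println(rawBool(empty, "similarity.scan_enabled"), scanEnabled(empty))
	fmt.Println(validate(empty, nil))
	fmt.Println(validate(rawConfig{"similarity.scan_enabled": false}, nil))
}
```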

2. Silently discarded errors in UpdateCode integrity pre-check (bug: transient DB/Redis failures cause false rejections)

FindLatest() and IsWhitelisted() errors are assigned to _. If FindLatest fails, existingHash stays empty so the hash comparison always triggers — every update runs integrity check even when code is unchanged. If IsWhitelisted fails, whitelisted scripts are treated as non-whitelisted and can be spuriously rejected with HTTP 400.

scriptlist/internal/service/script_svc/script.go

Lines 453 to 472 in 074e812

    
    if similarity_svc.IntegrityEnabled() && similarity_svc.Integrity() != nil && req.Code != "" {
        latest, _ := script_repo.ScriptCode().FindLatest(ctx, script.ID, 0, true)
        var existingHash string
        if latest != nil {
            existingHash = sha256HexString(latest.Code)
        }
        newHash := sha256HexString(req.Code)
        if newHash != existingHash {
            whitelisted, _ := similarity_svc.Integrity().IsWhitelisted(ctx, script.ID)
            if !whitelisted {
                result := similarity_svc.Integrity().CheckFast(ctx, req.Code)
                if result.Score >= similarity_svc.IntegrityBlockThreshold() {
                    return nil, i18n.NewErrorWithStatus(
                        ctx, http.StatusBadRequest,
                        code.SimilarityIntegrityRejected,
                        result.BuildUserMessage(),
                    )
                }
            }
        }
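One way to stop lookup failures from turning into false rejections is to return them to the caller. The sketch below assumes that policy (Copilot's comment also mentions fail-open with a warning log as an alternative); gateDeps, integrityGate, fakeDeps, and the 0.8 threshold are all hypothetical, and code is compared directly instead of by hash to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient simulates a DB or Redis hiccup in the fakes below.
var errTransient = errors.New("transient backend failure")

// gateDeps lists what the pre-check needs; names are illustrative.
type gateDeps struct {
	latestCode    func() (code string, found bool, err error)
	isWhitelisted func() (bool, error)
	fastScore     func(code string) float64
	threshold     float64
}

// integrityGate reports whether newCode should be rejected. Lookup errors
// are returned to the caller instead of being read as "changed" or
// "not whitelisted".
func integrityGate(d gateDeps, newCode string) (bool, error) {
	if newCode == "" {
		return false, nil
	}
	latest, found, err := d.latestCode()
	if err != nil {
		return false, fmt.Errorf("find latest code: %w", err)
	}
	if found && latest == newCode {
		return false, nil // unchanged code: nothing to check
	}
	whitelisted, err := d.isWhitelisted()
	if err != nil {
		return false, fmt.Errorf("check integrity whitelist: %w", err)
	}
	if whitelisted {
		return false, nil
	}
	return d.fastScore(newCode) >= d.threshold, nil
}

// fakeDeps builds in-memory dependencies for the demo and tests.
func fakeDeps(latest string, whitelisted bool, lookupErr, whitelistErr error, score float64) gateDeps {
	return gateDeps{
		latestCode:    func() (string, bool, error) { return latest, latest != "", lookupErr },
		isWhitelisted: func() (bool, error) { return whitelisted, whitelistErr },
		fastScore:     func(string) float64 { return score },
		threshold:     0.8, // assumed value
	}
}

func main() {
	fmt.Println(integrityGate(fakeDeps("old", false, nil, nil, 1.0), "new"))
	fmt.Println(integrityGate(fakeDeps("old", true, nil, errTransient, 1.0), "new"))
}
```

With this shape a whitelisted script hit by a transient whitelist failure surfaces as a server-side error rather than an HTTP 400 integrity rejection.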

3. RunBackfill swallows Redis errors and leaves backfill lock held for 2 hours (bug: err != nil and !ok collapsed into one branch)

When acquireBackfillLock returns a Redis error, the function returns nil (no error) and exits before the defer finishBackfill() on line 176. The backfillRunningRedisKey stays held for its 2-hour TTL, causing all subsequent TriggerBackfill calls to return 409 Conflict.

Fix: split the condition — return err when err != nil, only return nil when !ok && err == nil.

scriptlist/internal/task/crontab/handler/similarity_patrol.go

Lines 168 to 175 in 074e812

    
    // Redis lock is belt-and-suspenders.
    ok, release, err := h.acquireBackfillLock(ctx)
    if err != nil || !ok {
        logger.Ctx(ctx).Warn("similarity backfill: redis lock unavailable",
            zap.Bool("ok", ok), zap.Error(err))
        return nil
    }
    defer release()
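The split condition proposed in the fix can be sketched in isolation. Here runBackfill takes its lock and work functions as parameters so the example stays self-contained; those parameters and errRedisDown are illustrative and do not mirror the handler's real signature.

```go
package main

import (
	"errors"
	"fmt"
)

// errRedisDown simulates a Redis failure while acquiring the lock.
var errRedisDown = errors.New("redis unavailable")

// runBackfill separates the two outcomes that the original code merged:
// a lock error is propagated, while "lock already held" is a quiet no-op.
func runBackfill(acquire func() (bool, func(), error), work func() error) (bool, error) {
	ok, release, err := acquire()
	if err != nil {
		return false, fmt.Errorf("acquire backfill lock: %w", err)
	}
	if !ok {
		return false, nil // another run holds the lock
	}
	defer release()
	return true, work()
}

func main() {
	noop := func() error { return nil }

	_, err := runBackfill(func() (bool, func(), error) { return false, nil, errRedisDown }, noop)
	fmt.Println("redis error:", err)

	ran, err := runBackfill(func() (bool, func(), error) { return false, nil, nil }, noop)
	fmt.Println("lock held:", ran, err)

	ran, err = runBackfill(func() (bool, func(), error) { return true, func() {}, nil }, noop)
	fmt.Println("acquired:", ran, err)
}
```

Returning the error lets the caller log or alert on real Redis failures, while a held lock stays a normal outcome.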

4. stopFpRedisKey constant duplicated across two packages (bug: silent decoupling risk)

The same Redis key "similarity:stop_fp" is independently defined in scan.go (reader) and similarity_stop_fp.go (writer). If either is changed independently, the system silently breaks — scans read an empty set and treat all fingerprints as non-stop.

scriptlist/internal/service/similarity_svc/scan.go

Lines 76 to 78 in 074e812

    
    // stopFpRedisKey holds the current stop-fingerprint set (populated by the
    // Task 20 crontab). It is a Redis SET of hex-encoded uint64 fingerprints.
    const stopFpRedisKey = "similarity:stop_fp"

scriptlist/internal/task/crontab/handler/similarity_stop_fp.go

Lines 16 to 18 in 074e812

    
    const (
        stopFpRedisKey = "similarity:stop_fp"
        stopFpLockKey  = "crontab:similarity:stop_fp_refresh:lock"

5. BuildUserMessage() result silently dropped: zh_cn message has no format directives (bug: users see generic error instead of signal details)

Both Create and UpdateCode pass result.BuildUserMessage() as an extra argument to i18n.NewErrorWithStatus, but the registered message "代码未通过完整性检查，请勿提交压缩或混淆后的代码" ("Code did not pass the integrity check; do not submit minified or obfuscated code") has no %s format directives. The detailed signal breakdown from BuildUserMessage() is silently discarded by fmt.Sprintf.

scriptlist/internal/service/script_svc/script.go

Lines 312 to 318 in 074e812

    
    if result.Score >= similarity_svc.IntegrityBlockThreshold() {
        return nil, i18n.NewErrorWithStatus(
            ctx, http.StatusBadRequest,
            code.SimilarityIntegrityRejected,
            result.BuildUserMessage(),
        )
    }

scriptlist/internal/pkg/code/zh_cn.go

Lines 151 to 152 in 074e812

    
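    SimilarityAccessDenied:       "无权访问该相似对", // "No permission to access this similar pair"
    SimilarityIntegrityRejected:  "代码未通过完整性检查，请勿提交压缩或混淆后的代码", // "Code did not pass the integrity check; do not submit minified or obfuscated code"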

6. Jaccard denominator uses stale FingerprintCntEffective from script B (bug: score can exceed 1.0 or go negative)

denom := effective + other.FingerprintCntEffective - c.CommonCount mixes the current stop-fp-filtered count for script A with a stored count for script B that was computed under a potentially different stop-fp set. If the stop-fp set changed since B was last scanned, the denominator is inconsistent — it can go negative (pair silently dropped) or produce Jaccard > 1.0.

scriptlist/internal/service/similarity_svc/scan.go

Lines 316 to 321 in 074e812

    
    // Jaccard = |A ∩ B| / |A ∪ B| where |A ∪ B| = |A| + |B| - |A ∩ B|.
    denom := effective + other.FingerprintCntEffective - c.CommonCount
    if denom <= 0 {
        continue
    }
    jaccard := float64(c.CommonCount) / float64(denom)
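A guarded version of this computation, in the spirit of the later "clamp Jaccard score" commit, might look like the sketch below. The function name and the exact guard conditions are assumptions; clamping only bounds the score and does not remove the approximation caused by counts taken under different stop-fp sets.

```go
package main

import "fmt"

// jaccardFromCounts computes |A ∩ B| / (|A| + |B| - |A ∩ B|) from counts
// that may be mutually inconsistent. It returns ok=false when the
// denominator is not positive and clamps the score to at most 1.0.
func jaccardFromCounts(a, b, common int) (float64, bool) {
	denom := a + b - common
	if denom <= 0 || common < 0 {
		return 0, false
	}
	j := float64(common) / float64(denom)
	if j > 1 {
		j = 1 // stale counts can push the raw ratio above 1
	}
	return j, true
}

func main() {
	fmt.Println(jaccardFromCounts(10, 10, 5)) // consistent counts
	fmt.Println(jaccardFromCounts(10, 4, 8))  // stale |B|: raw ratio is 8/6
	fmt.Println(jaccardFromCounts(3, 2, 5))   // denominator is zero
}
```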

7. SyncOnce auto-sync now subject to integrity block with no bypass (bug: upstream code quality change silently breaks sync)

SyncOnce calls UpdateCode() which now includes the integrity pre-check. If an upstream sync URL starts serving minified/obfuscated code, auto-sync is silently rejected with SimilarityIntegrityRejected. There is no mechanism to distinguish system-sync from user submission, and no per-script exemption for the sync path.

scriptlist/internal/service/script_svc/script.go

Lines 1026 to 1030 in 074e812

    
    }
    if _, err := s.UpdateCode(ctx, req); err != nil {
        logger.Error("更新代码失败", zap.String("sync_url", script.SyncUrl), zap.Error(err)) // log message: "failed to update code"
        return err
    }

🤖 Generated with Claude Code

If this code review was useful, please react with 👍. Otherwise, react with 👎.

…stent defaults

…llowing them

… constant Export StopFpRedisKey from similarity_svc (the domain owner) and remove the duplicate local definition from the stop-fp crontab handler, so a key change in either place can no longer silently break the other.

…Code integrity check

…ayer Extract Redis lock and system_config cursor primitives from similarity_svc into a new BackfillStateRepo interface in similarity_repo, following the project's service-locator convention. Service layer retains business logic; tests updated to use MockBackfillStateRepo instead of function-var faking.

CodFrm added 30 commits April 13, 2026 13:42

feat(similarity): scaffold similarity_svc package with goja/xxhash deps

5a1d1b7

feat(similarity): define core fingerprint types and TokenKind enum

b49dac1

test(similarity): cover KindUnknown in TokenKind_String table

9320821

fix(similarity): nil guards on Expression walks; walk ObjectLiteral p…

b6ff6c8

…roperties

feat(similarity): implement k-gram xxhash64 sliding window

cfad561

test(similarity): cover operator value distinction and zero/negative k

eeb5f67

feat(similarity): implement winnowing with monotonic deque

4758845

test(similarity): add winnow brute-force invariant test; clarify comp…

4230617

…lexity

style(similarity): use range over int in property test loop

3e3586f

feat(similarity): wire ExtractFingerprints public API

aee6188

test(similarity): tighten ExtractFingerprints test contracts; documen…

980e613

…t dual-channel error

feat(similarity): implement pure set-based Jaccard similarity

c9b9d66

test(similarity): add Jaccard symmetry and proper-subset cases

f022d3d

test(similarity): golden test for rename invariance

179d61e

test(similarity): golden test for code reorder similarity

acd98ce

test(similarity): golden test for unrelated code disjointness

4c8b66d

chore(similarity): lint-fix polish

b4b8e4b

update .gitignore

3e1e888

refactor(similarity): convert entity status enums to named types + po…

f9e1007

…inter receivers

feat(similarity): add FingerprintRepo with upsert + parse-status helpers

68910c1

feat(similarity): add SimilarPairRepo with normalized-pair upsert

f9084c3

feat(similarity): add SuspectSummaryRepo with upsert

c2d9979

feat(similarity): add SimilarityWhitelistRepo (pair-level)

b6fe291

feat(similarity): add IntegrityWhitelistRepo (script-level)

07843e6

feat(similarity): add IntegrityReviewRepo with code-id upsert

e1f6e67

feat(similarity): add error codes + zh_CN i18n at 114000 range

9e624c1

feat(similarity): add similarity.* config keys + Validate() check

9d8e41f

CodFrm added 17 commits April 14, 2026 14:33

fix(similarity): silence errcheck on new ES resp.Body.Close deferred …

20367af

…calls

chore: exclude auto-generated docs/ dir from golangci-lint

38342bd

docs/docs.go is regenerated by swag and its legacy strings.Replace calls trigger staticcheck QF1004 — we don't own the template, so exclude the entire directory rather than chase transient hits.

fix(similarity): defer cancel in patrol context cancellation test

9d5099a

gosec (G118) flagged the missing defer even though the goroutine fires cancel on the happy path — defer guards the early-return paths.

chore(similarity): silence gosec G304 on bench test fixture read

33fcdf7

refactor(similarity): remove AAEncode and JJEncode integrity signals

2a79ba8

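These two encodings (AAEncode and JJEncode) are extremely obscure; real malicious scripts almost never use them, and their code characteristics are already covered by other signals (single-character identifier ratio, whitespace ratio, etc.), so no dedicated detection is needed.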

style: fix gofmt formatting in integrity signals

074e812

CodFrm requested a review from Copilot April 16, 2026 07:41

Copilot started reviewing on behalf of CodFrm April 16, 2026 07:42

Copilot AI reviewed Apr 16, 2026

CodFrm added 9 commits April 16, 2026 16:03

fix(similarity): use Similarity().ScanEnabled in Validate() for consi…

248fc59

…stent defaults

fix(similarity): propagate Redis errors in RunBackfill instead of swa…

47dcd93

…llowing them

fix(similarity): clamp Jaccard score and document stop-fp approximation

8c81038

fix(similarity): use format directive in integrity rejection message

3cb51cf

fix(similarity): handle FindLatest and IsWhitelisted errors in Update…

5274a96

…Code integrity check

fix(similarity): skip integrity pre-check for auto-sync code updates

5485bb9

refactor(similarity): split admin.go into evidence and whitelist files

4073a28

feat(similarity): script similarity detection and integrity check system #3
CodFrm wants to merge 87 commits into main from test/hotfix
