feat: Add Java import resolution to extractImportMappings for cross-module disambiguation #314

@ZanXusV

Background

While analyzing CodeGraph's resolution layer for a Java microservices project (1,500+ files / 44,000+ symbols across 8+ Maven modules), I noticed that src/resolution/import-resolver.ts handles import parsing for TypeScript/JavaScript, Python, Go, and PHP — but has no corresponding extractJavaImports branch for Java.

Current Behavior

extractImportMappings returns [] for any Java file, which means:

  1. resolveViaImport always returns null for Java references.
  2. Every Java cross-file symbol lookup falls through to findBestMatch in name-matcher.ts, which uses file-path proximity as a heuristic (up to +80 score for shared directory segments).
  3. Confidence for multi-candidate Java resolutions is capped at 0.4–0.7, versus 0.9 for languages with import-based resolution.

Impact: Java Same-Name Symbol Collisions Are Resolved Heuristically

In a real Maven multi-module project, same-name classes in different packages are common — especially converter/mapper/DTO classes that follow naming conventions like XxxConverter or XxxMapper. A quick query on the project's .codegraph/codegraph.db confirms:

| qualified_name | node count | modules spanned |
| --- | --- | --- |
| FooConverter::convert | 11 | dao/converter + service/converter |
| BarConverter::convert | 10 | same pattern |
| BazConverter::convert | 9 | same pattern |

When a service-layer class calls FooConverter.convert(), CodeGraph picks the right node most of the time due to path proximity — but this is coincidental, not semantic. If the caller is in a cross-cutting module equidistant from both converters, the wrong node gets chosen, producing incorrect call edges.
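To make the equidistant case concrete, here is a toy illustration. It is not CodeGraph's actual scoring in name-matcher.ts; `sharedLeadingSegments`, `rank`, and the directory layout are all hypothetical, and the score is simply the number of leading directory segments two paths share.

```typescript
// Toy proximity measure (an assumption, NOT CodeGraph's real scoring):
// count the leading directory segments that two file paths have in common.
function sharedLeadingSegments(a: string, b: string): number {
  const segsA = a.split("/").slice(0, -1); // drop the file name
  const segsB = b.split("/").slice(0, -1);
  let n = 0;
  while (n < segsA.length && n < segsB.length && segsA[n] === segsB[n]) n++;
  return n;
}

// Hypothetical layout with two same-name converters in different modules.
const converterPaths: string[] = [
  "svc/dao/src/main/java/com/example/project/dao/converter/FooConverter.java",
  "svc/service/src/main/java/com/example/project/service/converter/FooConverter.java",
];

function rank(caller: string): Array<{ path: string; shared: number }> {
  return converterPaths.map((path) => ({
    path,
    shared: sharedLeadingSegments(caller, path),
  }));
}

// A caller inside the service module: proximity clearly favors one candidate.
const serviceCaller =
  "svc/service/src/main/java/com/example/project/service/OrderService.java";
console.log(rank(serviceCaller).map((r) => r.shared)); // [1, 9]

// A caller in a cross-cutting module: both candidates tie, so the choice
// is arbitrary unless the import statement is consulted.
const crossCuttingCaller =
  "svc/web/src/main/java/com/example/project/web/OrderController.java";
console.log(rank(crossCuttingCaller).map((r) => r.shared)); // [1, 1]
```

In the tie case, only the caller's import line says which FooConverter is meant.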

Root Cause

Java import statements carry the full FQN:

import com.example.project.dao.converter.FooConverter;

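This is exactly the disambiguation signal needed. For any class named in a single-type import, a simple regex over the caller's source file resolves the ambiguity without an AST change; it only needs a new extractJavaImports branch parallel to extractGoImports. Wildcard imports and same-package references carry no per-class import line, so they would still fall back to the existing heuristic.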

Why No Java Import Parsing Today?

My hypothesis (please correct if wrong): Java's import resolution differs architecturally from JS/TS. A Java import names a fully qualified type at the top of the file and doesn't map to a resolvable file path within the project the way import { Foo } from './foo' does in JS. The resolveImportPath function expects a relative or alias-based path, not a FQN like com.example.project.dao.converter.FooConverter.

This means the existing resolveViaImport → resolveImportPath → findExportedSymbol chain doesn't directly apply to Java. A Java-specific branch would need to:

  1. Parse import com.example.Foo; → extract simple name Foo and FQN
  2. Look up nodes by name Foo in the context
  3. Use the FQN to filter candidates whose file_path matches the package path (e.g., path contains com/example/Foo.java)

This avoids touching resolveImportPath entirely and stays within the existing context API.
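The three steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: `CandidateNode`, `fqnToPathSuffix`, and `filterByFqn` are hypothetical names, and the node shape is a stand-in for whatever the resolution context actually returns.

```typescript
// Hypothetical minimal node shape; the real node type likely has more fields.
interface CandidateNode {
  name: string;
  filePath: string;
}

// "com.example.Foo" -> "com/example/Foo.java"
function fqnToPathSuffix(fqn: string): string {
  return fqn.split(".").join("/") + ".java";
}

// Keep only the candidates whose file path ends with the package path
// derived from the imported FQN.
function filterByFqn(candidates: CandidateNode[], fqn: string): CandidateNode[] {
  // The leading "/" prevents a partial match on the first path segment
  // (for example "xcom/example/Foo.java").
  const suffix = "/" + fqnToPathSuffix(fqn);
  return candidates.filter((c) => {
    const normalized = "/" + c.filePath.replace(/\\/g, "/"); // tolerate Windows separators
    return normalized.endsWith(suffix);
  });
}

// Two same-name candidates, as returned by a lookup on the simple name.
const fooCandidates: CandidateNode[] = [
  {
    name: "FooConverter",
    filePath: "svc/dao/src/main/java/com/example/project/dao/converter/FooConverter.java",
  },
  {
    name: "FooConverter",
    filePath: "svc/service/src/main/java/com/example/project/service/converter/FooConverter.java",
  },
];

const resolved = filterByFqn(
  fooCandidates,
  "com.example.project.dao.converter.FooConverter",
);
console.log(resolved.map((c) => c.filePath));
```

If the filter leaves exactly one candidate, the reference can be resolved with import-level confidence; if it leaves none or several, falling back to findBestMatch keeps today's behavior.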

Real-World Use Case: Multi-Microservice Architecture

I run ~10 Java microservices in a monorepo-style layout under a single parent directory. Each service is an independent Maven multi-module project (client / common / dao / service / web). A typical feature spans 3–4 services and involves:

  • A client jar from Service A imported as a Maven dependency in Service B
  • Service B's service module calling converter classes from its own dao module
  • Cross-service event payloads defined in common modules

In this setup, same-named converter/mapper/DTO classes exist across modules and across services. CodeGraph's cross-project indexing (via nested git repo discovery) is valuable here — but only if the edge resolution is semantically correct. Heuristic path proximity works when services are well-structured, but breaks silently when two modules share a similar directory layout.

Concretely: when I use codegraph_callers to find all callers of a specific converter method before refactoring it, a misresolved edge means a caller in another module is invisible — exactly the case where missing an edge causes a production incident.

Proposed Fix

In src/resolution/import-resolver.ts, add:

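// Inside extractImportMappings, alongside the existing language branches:
else if (language === 'java') {
    mappings.push(...extractJavaImports(content));
}

// Extracts single-type and static-member imports from Java source.
// Wildcard imports (import com.example.*;) do not match and are skipped.
function extractJavaImports(content: string): ImportMapping[] {
    const mappings: ImportMapping[] = [];
    // Group 1: optional "static" keyword; group 2: dotted name before the semicolon.
    const importRegex = /^\s*import\s+(static\s+)?([\w.]+)\s*;/gm;
    let match: RegExpExecArray | null;
    while ((match = importRegex.exec(content)) !== null) {
        const isStatic = match[1] !== undefined;
        const segments = match[2].split('.');
        const simpleName = segments[segments.length - 1];
        // For static imports the last segment is a member, so the owning
        // class FQN (everything before it) is what maps to a source file.
        const source = isStatic ? segments.slice(0, -1).join('.') : match[2];
        mappings.push({
            localName: simpleName,
            exportedName: simpleName,
            source,
            isDefault: false,
            isNamespace: false,
        });
    }
    return mappings;
}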

resolveViaImport would then need a small Java-specific path to translate FQN → file path: com.example.project.dao.FooConverter → **/com/example/project/dao/FooConverter.java.
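One wrinkle in that translation is nested classes: import com.example.Outer.Inner; refers to a type declared in Outer.java, not Inner.java. A rough sketch of a translation that accounts for this is below. `fqnToFileGlob` is a hypothetical helper, and it assumes the common (but not compiler-enforced) convention that package segments are lowercase and class names start with an uppercase letter.

```typescript
// Translate a Java FQN into a file glob, stopping at the first segment that
// looks like a class name. Assumption: packages are lowercase and top-level
// classes start with an uppercase letter (a convention, not a language rule).
function fqnToFileGlob(fqn: string): string {
  const segments = fqn.split(".");
  const classIndex = segments.findIndex((s) => /^[A-Z]/.test(s));
  // If no segment looks like a class name, fall back to the whole FQN.
  const end = classIndex === -1 ? segments.length : classIndex + 1;
  return "**/" + segments.slice(0, end).join("/") + ".java";
}

console.log(fqnToFileGlob("com.example.project.dao.FooConverter"));
// **/com/example/project/dao/FooConverter.java

console.log(fqnToFileGlob("com.example.Outer.Inner"));
// **/com/example/Outer.java
```

Codebases that break the naming convention would need a different strategy, such as trying progressively shorter prefixes of the FQN against the indexed file paths.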
