fix(compiler): handle Rust identifier escaping and name collisions by BaldDemian · Pull Request #3658 · apache/fory

BaldDemian · 2026-05-08T05:27:19Z

Why?

Identifiers colliding with Rust keywords should be escaped to prevent compiling errors.

What does this PR do?

Workflow

The current implementation broadly follows the approach used by prost-build. It first normalizes identifiers from the IDL according to Rust naming conventions, e.g. converting struct names to UpperCamelCase and field names to snake_case. It then checks whether the normalized identifiers are Rust keywords and escapes them when necessary.

This process is not trivial as I first expected. For example, we need to cache the escaped name of each identifier to make sure the definition site and all use sites refer to the same name consistently.

Known issues

Although I have tested the implementation against many corner cases, this handling is still incomplete, much like prost itself. It may still generate Rust code that fails to compile for valid (but rare, I suppose 🤔) IDL. In most cases, this happens because an identifier from the IDL conflicts with an identifier already used by the Rust code generated by Fory.

Example 1:

IDL:

package demo;
message Fory {
  string value = 1;
}

Generated Rust code:

use fory::Fory;
pub struct Fory {
    pub value: String,
}
pub fn register_types(fory: &mut Fory) -> Result<(), fory::Error> {
    fory.register::<Fory>(...)?;
    Ok(())
}

Compiling errors:

error[E0255]: the name `Fory` is defined multiple times
error[E0117]: derive/impl resolves to the external `fory::Fory` type

Example 2:

IDL:

package demo;
message HashMap {
  string value = 1;
}
message Holder {
  map<string, string> values = 1;
  HashMap item = 2;
}

Generated Rust code:

use std::collections::HashMap;
pub struct HashMap {
    pub value: String,
}
pub struct Holder {
    pub values: HashMap<String, String>,
    pub item: HashMap,
}

Compiling errors:

error[E0255]: the name `HashMap` is defined multiple times
error[E0107]: missing generics for struct `std::collections::HashMap`

However, I think IDL like the above are rare in real practice. The resulting compiling errors are clear and straightforward and the user can handle the conflicting names easily in the IDL.

Changes made to the FBS compiler

In the current implementation, I use the tuple <file_path, line, column> as the stable key for an identifier. This lets the generator connect an identifier's definition site with all of its use sites, so they consistently resolve to the same name after normalization and Rust keyword escaping.

However, when translating a FlatBuffers union into the Fory IR, the FBS translator currently assigns every generated union case the same source location: the line and column of the union declaration itself.

fory/compiler/fory_compiler/frontend/fbs/translator.py

Lines 243 to 244 in 389b66c

    
           line=fbs_union.line, 
        
           column=fbs_union.column,

As a result, different cases in the same union can end up sharing the same identifier key. This causes the later case to overwrite the earlier one in the Rust generator's name cache, and can incorrectly make distinct union variants resolve to the same Rust name.

Since both the FDL parser and the Protobuf translator preserve finer-grained source locations for union/oneof cases:

fory/compiler/fory_compiler/frontend/proto/translator.py

Lines 132 to 133 in 389b66c

    
           line=v.line, 
        
           column=v.column,

fory/compiler/fory_compiler/frontend/fdl/parser.py

Lines 598 to 599 in 389b66c

    
           line=start.line, 
        
           column=start.column,

the FBS translator should do the same by assigning each union case the location of its own token.

Related issues

Close #3544.

AI Contribution Checklist

Substantial AI assistance was used in this PR: no
If yes, I included a completed AI Contribution Checklist in this PR description and the required AI Usage Disclosure.
If yes, my PR description includes the required ai_review summary and screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes.

Does this PR introduce any user-facing change?

Does this PR introduce any public API change?
Does this PR introduce any binary protocol compatibility change?

Benchmark

BaldDemian · 2026-05-08T05:42:35Z

~~I am aware of the root cause of the failing xlang test. Working on the fix~~ Now CI passes

chaokunyang · 2026-05-08T08:21:19Z

+            scope = match.nested_messages
+        return lineage
+
+    def _allocate_type_scope_names(self, type_defs: List[object], scope: str) -> None:


This allocator only checks IDL declarations against each other, but the generated Rust module also uses helper and prelude names unqualified. For example, message String { string value = 1; } generates pub struct String { pub value: String, }, so the field type resolves to the generated struct instead of std::string::String and the Rust code is recursive/invalid. Vec and Fory have similar failures with Vec<T> and use fory::Fory. Please reserve the generated/helper/prelude names in the relevant scopes or fully qualify those references, and add a generated Rust compile check for String, Vec, and Fory.

I'm considering changing the generated Rust code to use crate-root absolute paths for generator-owned dependencies, for example emitting ::std::string::String instead of std::string::String, and removing generated use statements.

Prost follows this absolute path style.

Using absolute paths keeps Fory/runtime/std/core references out of the generated code's local namespace, so IDL-defined names such as String, Vec cannot collide with the generator-owned types.

I'd like to hear your opinion @chaokunyang

good idea, absolute paths is better

chaokunyang · 2026-05-08T08:21:20Z

-        if grpc:
-            service_files = generator.generate_services()
-            files.extend(service_files)
+        if lang == "rust":


This guard only rejects overwriting existing non-generated files. It does not catch two inputs in the same invocation that normalize to the same Rust output path. For example, packages foo.bar and foo_bar both produce foo_bar.rs; the second file sees the generated marker, overwrites the first file, and the command exits successfully with only the second schema's types. Please track generated target paths during the compile invocation and fail when distinct source files map to the same Rust output file, with a two-file CLI regression test.

fix(compiler): handle Rust identifier escaping and name collisions

4d4a54f

BaldDemian force-pushed the rust-keywords branch from 97caecd to 4d4a54f Compare May 8, 2026 07:12

chaokunyang reviewed May 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(compiler): handle Rust identifier escaping and name collisions#3658

fix(compiler): handle Rust identifier escaping and name collisions#3658
BaldDemian wants to merge 1 commit intoapache:mainfrom
BaldDemian:rust-keywords

BaldDemian commented May 8, 2026 •

edited

Loading

Uh oh!

BaldDemian commented May 8, 2026 •

edited

Loading

Uh oh!

chaokunyang May 8, 2026

Uh oh!

BaldDemian May 8, 2026

Uh oh!

chaokunyang May 8, 2026

Uh oh!

chaokunyang May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

BaldDemian commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why?

What does this PR do?

Workflow

Known issues

Changes made to the FBS compiler

Related issues

AI Contribution Checklist

Does this PR introduce any user-facing change?

Benchmark

Uh oh!

BaldDemian commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chaokunyang May 8, 2026

Choose a reason for hiding this comment

Uh oh!

BaldDemian May 8, 2026

Choose a reason for hiding this comment

Uh oh!

chaokunyang May 8, 2026

Choose a reason for hiding this comment

Uh oh!

chaokunyang May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

BaldDemian commented May 8, 2026 •

edited

Loading

BaldDemian commented May 8, 2026 •

edited

Loading