Skip to content

fix(compiler): handle Rust identifier escaping and name collisions#3658

Open
BaldDemian wants to merge 1 commit intoapache:mainfrom
BaldDemian:rust-keywords
Open

fix(compiler): handle Rust identifier escaping and name collisions#3658
BaldDemian wants to merge 1 commit intoapache:mainfrom
BaldDemian:rust-keywords

Conversation

@BaldDemian
Copy link
Copy Markdown
Contributor

@BaldDemian BaldDemian commented May 8, 2026

Why?

Identifiers colliding with Rust keywords should be escaped to prevent compiling errors.

What does this PR do?

Workflow

The current implementation broadly follows the approach used by prost-build. It first normalizes identifiers from the IDL according to Rust naming conventions, e.g. converting struct names to UpperCamelCase and field names to snake_case. It then checks whether the normalized identifiers are Rust keywords and escapes them when necessary.

This process is not trivial as I first expected. For example, we need to cache the escaped name of each identifier to make sure the definition site and all use sites refer to the same name consistently.

Known issues

Although I have tested the implementation against many corner cases, this handling is still incomplete, much like prost itself. It may still generate Rust code that fails to compile for valid (but rare, I suppose 🤔) IDL. In most cases, this happens because an identifier from the IDL conflicts with an identifier already used by the Rust code generated by Fory.

Example 1:

IDL:

package demo;
message Fory {
  string value = 1;
}

Generated Rust code:

use fory::Fory;
pub struct Fory {
    pub value: String,
}
pub fn register_types(fory: &mut Fory) -> Result<(), fory::Error> {
    fory.register::<Fory>(...)?;
    Ok(())
}

Compiling errors:

error[E0255]: the name `Fory` is defined multiple times
error[E0117]: derive/impl resolves to the external `fory::Fory` type

Example 2:

IDL:

package demo;
message HashMap {
  string value = 1;
}
message Holder {
  map<string, string> values = 1;
  HashMap item = 2;
}

Generated Rust code:

use std::collections::HashMap;
pub struct HashMap {
    pub value: String,
}
pub struct Holder {
    pub values: HashMap<String, String>,
    pub item: HashMap,
}

Compiling errors:

error[E0255]: the name `HashMap` is defined multiple times
error[E0107]: missing generics for struct `std::collections::HashMap`

However, I think IDL like the above are rare in real practice. The resulting compiling errors are clear and straightforward and the user can handle the conflicting names easily in the IDL.

Changes made to the FBS compiler

In the current implementation, I use the tuple <file_path, line, column> as the stable key for an identifier. This lets the generator connect an identifier's definition site with all of its use sites, so they consistently resolve to the same name after normalization and Rust keyword escaping.

However, when translating a FlatBuffers union into the Fory IR, the FBS translator currently assigns every generated union case the same source location: the line and column of the union declaration itself.

line=fbs_union.line,
column=fbs_union.column,

As a result, different cases in the same union can end up sharing the same identifier key. This causes the later case to overwrite the earlier one in the Rust generator's name cache, and can incorrectly make distinct union variants resolve to the same Rust name.

Since both the FDL parser and the Protobuf translator preserve finer-grained source locations for union/oneof cases:

line=v.line,
column=v.column,

line=start.line,
column=start.column,

the FBS translator should do the same by assigning each union case the location of its own token.

Related issues

Close #3544.

AI Contribution Checklist

  • Substantial AI assistance was used in this PR: no
  • If yes, I included a completed AI Contribution Checklist in this PR description and the required AI Usage Disclosure.
  • If yes, my PR description includes the required ai_review summary and screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes.

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

@BaldDemian
Copy link
Copy Markdown
Contributor Author

BaldDemian commented May 8, 2026

I am aware of the root cause of the failing xlang test. Working on the fix Now CI passes

scope = match.nested_messages
return lineage

def _allocate_type_scope_names(self, type_defs: List[object], scope: str) -> None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocator only checks IDL declarations against each other, but the generated Rust module also uses helper and prelude names unqualified. For example, message String { string value = 1; } generates pub struct String { pub value: String, }, so the field type resolves to the generated struct instead of std::string::String and the Rust code is recursive/invalid. Vec and Fory have similar failures with Vec<T> and use fory::Fory. Please reserve the generated/helper/prelude names in the relevant scopes or fully qualify those references, and add a generated Rust compile check for String, Vec, and Fory.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering changing the generated Rust code to use crate-root absolute paths for generator-owned dependencies, for example emitting ::std::string::String instead of std::string::String, and removing generated use statements.

Prost follows this absolute path style.

Using absolute paths keeps Fory/runtime/std/core references out of the generated code's local namespace, so IDL-defined names such as String, Vec cannot collide with the generator-owned types.

I'd like to hear your opinion @chaokunyang

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, absolute paths is better

if grpc:
service_files = generator.generate_services()
files.extend(service_files)
if lang == "rust":
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guard only rejects overwriting existing non-generated files. It does not catch two inputs in the same invocation that normalize to the same Rust output path. For example, packages foo.bar and foo_bar both produce foo_bar.rs; the second file sees the generated marker, overwrites the first file, and the command exits successfully with only the second schema's types. Please track generated target paths during the compile invocation and fail when distinct source files map to the same Rust output file, with a two-file CLI regression test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Keywords are not escaped in compiler

2 participants