[Question]: [rank0]: FileNotFoundError: [Errno 2] No such file or directory: 'traj_data/r2r'

### Question

### Environment
- Running on local machine (no SLURM)
- Script: `scripts/train/qwenvl_train/train_system2_local.sh`
- 2x GPU setup via `torchrun --nproc_per_node=2`

### Problem

After downloading the dataset and placing it according to the documentation, training fails with:

```
FileNotFoundError: [Errno 2] No such file or directory: 'traj_data/r2r'
```

The error originates from `get_annotations_from_lerobot_data` in `internnav/dataset/internvla_n1_lerobot_dataset.py` at line 762:

```python
scene_ids = [d for d in os.listdir(data_path) if os.path.isdir(os.path.join(data_path, d))]
```

The `data_path` is hardcoded as a relative path (`traj_data/r2r`) in the dataset config dictionary (lines 51–100). A relative path is resolved against the current working directory of each `torchrun` worker, which is not guaranteed to be the project root, so the lookup fails whenever the two differ.
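For illustration, here is a minimal sketch of how a relative `data_path` could be anchored to a fixed root instead of the working directory. The helper name `resolve_data_path` and the use of a `PROJECT_ROOT_PATH` environment variable are assumptions made for this example; they are not taken from the InternNav code.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_data_path(data_path: str, project_root: Optional[str] = None) -> Path:
    """Return an absolute path for data_path (hypothetical helper).

    Absolute inputs are returned unchanged. Relative inputs are joined onto
    project_root, then the PROJECT_ROOT_PATH environment variable (an assumed
    name), and only as a last resort the current working directory.
    """
    path = Path(data_path)
    if path.is_absolute():
        return path
    root = project_root or os.environ.get("PROJECT_ROOT_PATH") or os.getcwd()
    return (Path(root) / path).resolve()
```

With something like this, `resolve_data_path("traj_data/r2r", project_root="/path/to/project")` would point at the same directory no matter where the workers are launched from.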

### Steps to Reproduce

1. Download the InternNav N1 dataset following the official instructions
2. Place data at `<project_root>/traj_data/r2r` and `<project_root>/traj_data/rxr`
3. Run `bash scripts/train/qwenvl_train/train_system2_local.sh` from the project root
4. Training crashes immediately with `FileNotFoundError`

### Additional Issue: Data structure mismatch

Even after working around the path issue (e.g. via symlinks), a second error appears:

```
ValueError: num_samples should be a positive integer value, but got num_samples=0
```

This is because the downloaded data has the structure:
```
traj_data/r2r/<scene_id>/trajectory_N/meta/episodes.jsonl
```

But `get_annotations_from_lerobot_data` expects:
```
traj_data/r2r/<scene_id>/meta/episodes.jsonl
```

The code only iterates one level deep (`scene_ids`), missing the `trajectory_N` subdirectory level entirely, resulting in 0 episodes loaded.
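To make the mismatch concrete, the sketch below collects `meta/episodes.jsonl` files from both layouts and fails loudly when none are found, rather than letting an empty dataset reach the sampler. The function name `find_episode_files` is hypothetical, and this is only a guess at what a tolerant loader might look like, not the behavior of `get_annotations_from_lerobot_data`.

```python
import os
from typing import List


def find_episode_files(data_path: str) -> List[str]:
    """Collect meta/episodes.jsonl files one or two levels below data_path."""
    found = []
    for scene_id in sorted(os.listdir(data_path)):
        scene_dir = os.path.join(data_path, scene_id)
        if not os.path.isdir(scene_dir):
            continue
        # Layout the loader appears to expect: <scene_id>/meta/episodes.jsonl
        flat = os.path.join(scene_dir, "meta", "episodes.jsonl")
        if os.path.isfile(flat):
            found.append(flat)
            continue
        # Layout of the download: <scene_id>/trajectory_N/meta/episodes.jsonl
        for traj in sorted(os.listdir(scene_dir)):
            nested = os.path.join(scene_dir, traj, "meta", "episodes.jsonl")
            if os.path.isfile(nested):
                found.append(nested)
    if not found:
        # Fail early with the offending path instead of yielding 0 samples.
        raise FileNotFoundError(f"no meta/episodes.jsonl found under {data_path!r}")
    return found
```

Running this against `traj_data/r2r` would also be a quick way to check which of the two layouts a given download actually has.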

### Expected Behavior

- The training script should resolve `data_path` relative to the project root regardless of working directory, or use absolute paths
- The dataset loader should either document the exact expected folder structure, or handle the `trajectory_N` subdirectory level

### Questions

1. What is the exact expected directory structure of the downloaded data for `traj_data/r2r`?
2. Should `data_path` be set as an absolute path or resolved relative to `PROJECT_ROOT_PATH`?
3. Is there a data preprocessing step needed before training (e.g. to flatten the `trajectory_N` structure)?

Thank you!
