Merged
48 changes: 32 additions & 16 deletions AGENTS.md
@@ -34,7 +34,7 @@ pathrex/
│ ├── mod.rs # FormatError enum, re-exports
│ ├── csv.rs # Csv<R> — CSV → Edge iterator (CsvConfig, ColumnSpec)
│ ├── mm.rs # MatrixMarket directory loader (vertices.txt, edges.txt, *.txt)
│ └── nt.rs # NTriples<R> — N-Triples → Edge iterator (full predicate IRI labels)
│ └── rdf.rs # Rdf: unified RDF parser (N-Triples, Turtle) → Edge iterator
├── tests/
│ ├── inmemory_tests.rs # Integration tests for InMemoryBuilder / InMemoryGraph
│ ├── mm_tests.rs # Integration tests for MatrixMarket format
@@ -139,7 +139,7 @@ feed itself into a specific [`GraphBuilder`]:
- [`apply_to(self, builder: B) -> Result<B, B::Error>`](src/graph/mod.rs:169) — consumes the
source and returns the populated builder.

[`Csv<R>`](src/formats/csv.rs), [`MatrixMarket`](src/formats/mm.rs), and [`NTriples<R>`](src/formats/nt.rs)
[`Csv<R>`](src/formats/csv.rs), [`MatrixMarket`](src/formats/mm.rs), and [`Rdf`](src/formats/rdf.rs)
implement `GraphSource<InMemoryBuilder>` (see [`src/graph/inmemory.rs`](src/graph/inmemory.rs)), so they
can be passed to [`GraphBuilder::load`] and [`Graph::try_from`].

@@ -207,7 +207,6 @@ which is used by the MatrixMarket loader.
Three built-in parsers are available, each yielding
`Iterator<Item = Result<Edge, FormatError>>` and pluggable into
`GraphBuilder::load()` via `GraphSource<InMemoryBuilder>` (see [`src/graph/inmemory.rs`](src/graph/inmemory.rs)).
CSV and MatrixMarket edge loaders are available:

#### `Csv<R>`

@@ -251,26 +250,40 @@ Helper functions:

`MatrixMarket` implements `GraphSource<InMemoryBuilder>` in [`src/graph/inmemory.rs`](src/graph/inmemory.rs) (see the `impl` at line 215): `vertices.txt` maps are converted from 1-based file indices to 0-based matrix ids before [`set_node_map`](src/graph/inmemory.rs:67); `edges.txt` indices are unchanged for `n.txt` lookup.
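
The 1-based to 0-based conversion described above can be sketched as follows. This is a minimal illustration, not the code in `src/graph/inmemory.rs`: the function name `to_zero_based` and the choice to return `None` for an index of 0 are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: shift a 1-based `index -> name` map (as read from
/// `vertices.txt`) to 0-based ids. Returns `None` if any index is 0, since
/// a 1-based file should never contain it.
fn to_zero_based(by_index: &HashMap<usize, String>) -> Option<HashMap<usize, String>> {
    by_index
        .iter()
        // `checked_sub` avoids an underflow panic on a malformed index of 0.
        .map(|(idx, name)| idx.checked_sub(1).map(|zero| (zero, name.clone())))
        .collect()
}
```

Using `checked_sub` keeps the shift explicit and turns a malformed index into a recoverable failure instead of a panic.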

#### `NTriples<R>`
#### `Rdf` — Unified RDF Parser

[`NTriples<R>`](src/formats/nt.rs:64) parses [W3C N-Triples](https://www.w3.org/TR/n-triples/)
RDF files using `oxttl` and `oxrdf`. Each triple `(subject, predicate, object)` becomes an
[`Edge`](src/graph/mod.rs:158) where:
[`Rdf`](src/formats/rdf.rs) is a unified parser for RDF formats using `oxttl` and `oxrdf`.
It supports both **N-Triples** (`.nt`) and **Turtle** (`.ttl`) formats via the [`RdfFormat`](src/formats/rdf.rs) enum.

Each triple `(subject, predicate, object)` becomes an [`Edge`](src/graph/mod.rs:158) where:

- `source` — subject IRI or blank-node ID (`_:label`).
- `target` — object IRI or blank-node ID; triples whose object is an RDF
literal yield `Err(FormatError::LiteralAsNode)` (callers may filter these out).
- `label` — predicate IRI, transformed by [`LabelExtraction`](src/formats/nt.rs:38):
- `label` — full predicate IRI string (including fragment `#…` when present).
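
Since triples with literal objects surface as `Err(FormatError::LiteralAsNode)`, a caller that only wants node-to-node edges can drop those items and keep every other result. The sketch below uses stand-in types (`Edge`, `SketchError`) and a hypothetical helper `skip_literal_objects`; none of them are the crate's real definitions.

```rust
/// Stand-in for the crate's `Edge`; the field names follow the list above.
#[derive(Debug, PartialEq)]
struct Edge {
    source: String,
    target: String,
    label: String,
}

/// Stand-in for `FormatError`, reduced to what the example needs.
#[derive(Debug, PartialEq)]
enum SketchError {
    LiteralAsNode,
    Other(String),
}

/// Hypothetical adapter: drop literal-object errors, pass everything else
/// through unchanged so that real parse errors are still reported.
fn skip_literal_objects<I>(edges: I) -> impl Iterator<Item = Result<Edge, SketchError>>
where
    I: Iterator<Item = Result<Edge, SketchError>>,
{
    edges.filter(|item| !matches!(item, Err(SketchError::LiteralAsNode)))
}
```

Only the literal case is filtered, so other errors still reach the builder.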

Constructor:

| Variant | Behaviour |
- [`Rdf::from_path(path)`](src/formats/rdf.rs) — auto-detects format from file extension (`.nt` → N-Triples, `.ttl` → Turtle). Parses in parallel using memory-mapping and rayon.

Format detection via [`RdfFormat::from_path(path)`](src/formats/rdf.rs):

| Extension | Format |
|---|---|
| `LocalName` (default) | Fragment (`#name`) or last path segment of the predicate IRI |
| `FullIri` | Full predicate IRI string |
| `.nt`, `.ntriples` | `RdfFormat::NTriples` |
| `.ttl`, `.turtle` | `RdfFormat::Turtle` |
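
The extension mapping in the table can be sketched with the standard library alone. `SketchFormat` and `detect_format` are hypothetical stand-ins for `RdfFormat` and `RdfFormat::from_path`; returning `Option` and matching extensions case-insensitively are assumptions of this example, not documented behaviour.

```rust
use std::path::Path;

/// Stand-in for `RdfFormat`; not the crate's definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchFormat {
    NTriples,
    Turtle,
}

/// Hypothetical detection routine following the extension table.
/// Returns `None` for a missing or unrecognised extension.
fn detect_format(path: &Path) -> Option<SketchFormat> {
    // Assumption: extensions are compared case-insensitively.
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "nt" | "ntriples" => Some(SketchFormat::NTriples),
        "ttl" | "turtle" => Some(SketchFormat::Turtle),
        _ => None,
    }
}
```

For instance, `detect_format(Path::new("data.ttl"))` yields `Some(SketchFormat::Turtle)`, while a `.csv` path yields `None`.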

Constructors:
Example usage:

- [`NTriples::new(reader)`](src/formats/nt.rs:70) — uses `LabelExtraction::LocalName`.
- [`NTriples::with_label_extraction(reader, strategy)`](src/formats/nt.rs:74) — explicit strategy.
```rust
use pathrex::formats::Rdf;
use pathrex::graph::{Graph, InMemory};

// Auto-detect from extension
let graph = Graph::<InMemory>::try_from(
    Rdf::from_path("data.ttl")?
)?;
```

### SPARQL parsing (`src/sparql/mod.rs`)

@@ -421,7 +434,10 @@ LAGraph. Safe Rust wrappers live in [`graph::mod`](src/graph/mod.rs):
- [`GraphblasVector`](src/graph/mod.rs:128) — RAII wrapper around `GrB_Vector`
(derives `Debug`).
- [`GraphblasMatrix`](src/graph/mod.rs) — RAII wrapper around `GrB_Matrix` (`dup` + `free` on drop).
- [`ensure_grb_init()`](src/graph/mod.rs:39) — one-time `LAGraph_Init` via `std::sync::Once`.
- [`ensure_grb_init()`](src/graph/wrappers.rs:11) — internal one-time `LAGraph_Init` via
`std::sync::Once`. Called automatically by RAII-wrapped constructors
(`LagraphGraph::from_coo`, `LagraphGraph::from_matrix`, `ThreadScope::enter`) and by
`load_mm_file`. Crate-private; no other code should call it.
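
The one-time initialisation pattern mentioned for `ensure_grb_init()` can be illustrated with `std::sync::Once`. This is a simplified sketch: `ensure_init` and `INIT_CALLS` are hypothetical, the counter stands in for the native `LAGraph_Init` call, and error propagation (the real function returns a `Result`) is left out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INIT: Once = Once::new();
/// Counts how often the initialiser body actually ran (illustration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a crate-private init guard. Safe to call from
/// any constructor; the closure runs at most once per process.
fn ensure_init() {
    INIT.call_once(|| {
        // A real implementation would call into the native library here.
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    });
}
```

Every constructor can call the guard unconditionally; repeated calls after the first return without rerunning the closure.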

### Macros & helpers (`src/utils.rs`)

@@ -472,7 +488,7 @@ Tests in `src/graph/mod.rs` use `CountingBuilder` / `CountOutput` / `VecSource`
[`src/utils.rs`](src/utils.rs) — these do **not** call into GraphBLAS and run without
native libraries.

Tests in `src/formats/csv.rs` and `src/formats/nt.rs` are pure Rust and need no native dependencies.
Tests in `src/formats/csv.rs` and `src/formats/rdf.rs` are pure Rust and need no native dependencies.

Tests in `src/sparql/mod.rs` are pure Rust and need no native dependencies.

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -7,8 +7,10 @@ edition = "2024"
csv = "1.4.0"
egg = "0.10.0"
libc = "0.2"
memmap2 = "0.9"
oxrdf = "0.3.3"
oxttl = "0.2.3"
rayon = "1"
rustfst = "1.2"
spargebra = "0.4.6"
thiserror = "1.0"
2 changes: 2 additions & 0 deletions build.rs
@@ -82,6 +82,8 @@ fn regenerate_bindings() {
.allowlist_function("LAGraph_CheckGraph")
.allowlist_function("LAGraph_Init")
.allowlist_function("LAGraph_Finalize")
.allowlist_function("LAGraph_SetNumThreads")
.allowlist_function("LAGraph_GetNumThreads")
.allowlist_function("LAGraph_New")
.allowlist_function("LAGraph_Delete")
.allowlist_function("LAGraph_Cached_AT")
60 changes: 9 additions & 51 deletions src/formats/mm.rs
@@ -24,56 +24,12 @@
//! ```

use std::collections::HashMap;
use std::ffi::CString;
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::os::fd::IntoRawFd;
use std::path::{Path, PathBuf};

use crate::formats::FormatError;
use crate::graph::{GraphError, ensure_grb_init};
use crate::la_ok;
use crate::lagraph_sys::{FILE, GrB_Matrix, LAGraph_MMRead};

/// Read a single MatrixMarket file and return the raw [`GrB_Matrix`].
pub fn load_mm_file(path: impl AsRef<Path>) -> Result<GrB_Matrix, FormatError> {
    let path = path.as_ref();

    ensure_grb_init().map_err(|e| match e {
        GraphError::LAGraph(info, msg) => FormatError::MatrixMarket {
            code: info,
            message: msg,
        },
        _ => FormatError::MatrixMarket {
            code: crate::lagraph_sys::GrB_Info::GrB_PANIC,
            message: "Failed to initialize GraphBLAS".to_string(),
        },
    })?;

    let file = File::open(path)?;
    let fd = file.into_raw_fd();

    let c_mode = CString::new("r").unwrap();
    let f = unsafe { libc::fdopen(fd, c_mode.as_ptr()) };
    if f.is_null() {
        unsafe { libc::close(fd) };
        return Err(std::io::Error::last_os_error().into());
    }

    let mut matrix: GrB_Matrix = std::ptr::null_mut();

    let err = la_ok!(LAGraph_MMRead(&mut matrix, f as *mut FILE));
    unsafe { libc::fclose(f) };

    match err {
        Ok(_) => Ok(matrix),
        Err(GraphError::LAGraph(info, msg)) => Err(FormatError::MatrixMarket {
            code: info,
            message: msg,
        }),
        _ => unreachable!("should be either mm read error or ok"),
    }
}
pub use crate::graph::load_mm_file;

// Trims first "<" and last ">".
fn normalize_map_name(name: &str) -> String {
@@ -92,12 +48,12 @@ pub(crate) fn apply_base_iri(name: String, base: Option<&str>) -> String {
}
}

type IndexMap = (HashMap<usize, String>, HashMap<String, usize>);

/// Parse a `<name> <index>` mapping file.
///
/// Returns an error on non-positive or duplicate indices.
pub(crate) fn parse_index_map(
path: &Path,
) -> Result<(HashMap<usize, String>, HashMap<String, usize>), FormatError> {
pub(crate) fn parse_index_map(path: &Path) -> Result<IndexMap, FormatError> {
let file_name = path
.file_name()
.map(|n| n.to_string_lossy().into_owned())
@@ -189,7 +145,7 @@ impl MatrixMarket {
self
}

pub(crate) fn mm_path(&self, idx: usize) -> PathBuf {
pub fn mm_path(&self, idx: usize) -> PathBuf {
self.dir.join(format!("{}.txt", idx))
}
}
@@ -278,10 +234,12 @@ mod tests {

#[test]
fn test_load_nonexistent_mm_file_returns_io_error() {
use crate::formats::FormatError;
use crate::graph::GraphError;
let result = load_mm_file("/nonexistent/path/to/file.txt");
assert!(
matches!(result, Err(FormatError::Io(_))),
"expected Io error for missing file, got: {:?}",
matches!(result, Err(GraphError::Format(FormatError::Io(_)))),
"expected Format(Io) error for missing file, got: {:?}",
result
);
}
19 changes: 10 additions & 9 deletions src/formats/mod.rs
@@ -4,28 +4,29 @@
//!
//! ```no_run
//! use pathrex::graph::{Graph, InMemory, GraphDecomposition};
//! use pathrex::formats::{Csv, NTriples};
//! use pathrex::formats::{Csv, Rdf};
//! use std::fs::File;
//!
//! // Build from CSV in one line
//! // Build from CSV
//! let g = Graph::<InMemory>::try_from(
//! Csv::from_reader(File::open("edges.csv").unwrap()).unwrap()
//! ).unwrap();
//!
//! // Build from N-Triples in one line
//! // Build from Turtle (auto-detect from extension)
//! let g2 = Graph::<InMemory>::try_from(
//! NTriples::new(File::open("data.nt").unwrap())
//! Rdf::from_path("data.ttl").unwrap()
//! ).unwrap();
//! ```

pub mod csv;
pub mod mm;
pub mod nt;
pub mod rdf;

pub use csv::Csv;
pub use mm::MatrixMarket;
pub use nt::NTriples;
pub use rdf::{Rdf, RdfFormat};

use oxttl::TurtleSyntaxError;
use thiserror::Error;

use crate::lagraph_sys::GrB_Info;
@@ -57,9 +58,9 @@ pub enum FormatError {
reason: String,
},

/// An error produced by the N-Triples parser.
#[error("N-Triples parse error: {0}")]
NTriples(String),
/// An error produced by an RDF parser (N-Triples, Turtle, etc.)
#[error("RDF parse error: {0}")]
Rdf(#[from] TurtleSyntaxError),

/// An RDF literal appeared as a subject or object where a node IRI or
/// blank node was expected.