From d4a62e3062cceae25b5a2b3277b11d206af21813 Mon Sep 17 00:00:00 2001
From: Georgi Mammen Mullassery <54147004+Mullassery@users.noreply.github.com>
Date: Mon, 15 Jun 2026 10:02:40 +0530
Subject: [PATCH 1/3] Add StreamXL to Data Processing section

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 6bb6dc8..0288bc6 100644
--- a/README.md
+++ b/README.md
@@ -51,6 +51,7 @@ High-performance data processing and serialization libraries.
 - [polars](https://github.com/pola-rs/polars) - DataFrame library with a Pandas-like API.
 - [pydantic-core](https://github.com/pydantic/pydantic-core) - Core validation logic for Pydantic v2.
 - [rustworkx](https://github.com/Qiskit/rustworkx) - High-performance Python graph library implemented in Rust.
+- [StreamXL](https://github.com/Mullassery/StreamXL) - Streaming XLSX reader for Python powered by Rust. Reads large Excel files row-by-row at constant memory. 4-5x faster than openpyxl.
 - [yaml-rs](https://github.com/lava-sh/yaml-rs) - High-performance YAML v1.2 parser.
 
 ## Development Tools

From 868976c5c4d22a7ea2c6bb72cfd1b7d8f145032f Mon Sep 17 00:00:00 2001
From: Georgi Mammen Mullassery <54147004+Mullassery@users.noreply.github.com>
Date: Tue, 16 Jun 2026 23:43:26 +0530
Subject: [PATCH 2/3] =?UTF-8?q?Add=20AudiencePro,=20statguard,=20StreamXL?=
 =?UTF-8?q?=20=E2=80=94=20Rust-powered=20Python=20libs=20for=20ML,=20data?=
 =?UTF-8?q?=20quality,=20and=20Excel=20streaming?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0288bc6..31de588 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,8 @@ High-performance data processing and serialization libraries.
 - [polars](https://github.com/pola-rs/polars) - DataFrame library with a Pandas-like API.
 - [pydantic-core](https://github.com/pydantic/pydantic-core) - Core validation logic for Pydantic v2.
 - [rustworkx](https://github.com/Qiskit/rustworkx) - High-performance Python graph library implemented in Rust.
-- [StreamXL](https://github.com/Mullassery/StreamXL) - Streaming XLSX reader for Python powered by Rust. Reads large Excel files row-by-row at constant memory. 4-5x faster than openpyxl.
+- [statguard](https://github.com/Mullassery/statguard) - Declarative data quality and validation library — schema checks, drift detection (PSI + KS), anomaly detection, and native Delta Lake/Iceberg support. 13–25× faster than pandera and Great Expectations.
+- [StreamXL](https://github.com/Mullassery/StreamXL) - Streaming XLSX reader that processes large Excel files row-by-row at constant memory usage. 4–5× faster than openpyxl with PyO3 bindings.
 - [yaml-rs](https://github.com/lava-sh/yaml-rs) - High-performance YAML v1.2 parser.
 
 ## Development Tools
@@ -89,6 +90,7 @@ Web servers, networking libraries, and cryptographic tools.
 
 Tools for machine learning, NLP, and AI applications.
 
+- [AudiencePro](https://github.com/Mullassery/AudiencePro) - Python library for customer segmentation — RFM analysis, KMeans/K-Prototypes clustering, drift detection, and streaming updates at 10–25× the speed of scikit-learn + pandas.
 - [boxlite](https://github.com/boxlite-ai/boxlite) - Local-first sandbox for AI agents.
 - [chroma](https://github.com/chroma-core/chroma) - Search and retrieval database for AI applications.
 - [monty](https://github.com/pydantic/monty) - Minimal secure Python interpreter for AI workloads.

From d46a3a8238de991784db47f8cc8a9368047ea039 Mon Sep 17 00:00:00 2001
From: Georgi Mammen Mullassery <54147004+Mullassery@users.noreply.github.com>
Date: Sat, 20 Jun 2026 01:49:10 +0530
Subject: [PATCH 3/3] Update AudiencePro to ClusterAudienceKit

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 31de588..a9e4183 100644
--- a/README.md
+++ b/README.md
@@ -90,7 +90,7 @@ Web servers, networking libraries, and cryptographic tools.
 
 Tools for machine learning, NLP, and AI applications.
 
-- [AudiencePro](https://github.com/Mullassery/AudiencePro) - Python library for customer segmentation — RFM analysis, KMeans/K-Prototypes clustering, drift detection, and streaming updates at 10–25× the speed of scikit-learn + pandas.
+- [ClusterAudienceKit](https://github.com/Mullassery/ClusterAudienceKit) - Python library for customer segmentation in Martech pipelines — RFM analysis, clustering, streaming updates, and drift detection in a single pip install.
 - [boxlite](https://github.com/boxlite-ai/boxlite) - Local-first sandbox for AI agents.
 - [chroma](https://github.com/chroma-core/chroma) - Search and retrieval database for AI applications.
 - [monty](https://github.com/pydantic/monty) - Minimal secure Python interpreter for AI workloads.