Conditional value selection using CASE WHEN semantics — mirrors pandas.Series.case_when() (pandas 2.2+).
+
+
+
1 — Basic grade classification
+
caseWhen(series, caselist) applies an ordered list of [condition, replacement] pairs. The first matching condition determines the output; if no condition matches the original value is kept.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
Ctrl+Enter to run · Tab to indent
+
+
+
+
+
2 — Using boolean Series as conditions
+
Conditions can be boolean Series objects (e.g. from comparison operations).
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
Ctrl+Enter to run · Tab to indent
+
+
+
+
+
3 — Using predicate functions
+
Conditions can be predicate functions (value, index) => boolean.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
Ctrl+Enter to run · Tab to indent
+
+
+
+
+
4 — Series as replacement values
+
Replacements can be Series objects — the matching positional value is used.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
Ctrl+Enter to run · Tab to indent
+
+
+
+
+
5 — Unmatched rows keep original values
+
Any row not matched by any condition retains its original value — there is no implicit "else" replacement.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
Ctrl+Enter to run · Tab to indent
+
+
+
+
+
6 — First matching condition wins
+
When multiple conditions match the same row, the first one in caselist takes effect — just like CASE WHEN … THEN … WHEN … THEN … END in SQL.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
Ctrl+Enter to run · Tab to indent
+
+
+
+
+
7 — Positional index in predicate
+
Predicate functions receive both the value and its positional index as the second argument.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
Ctrl+Enter to run · Tab to indent
+
+
+
+
+
8 — String Series classification
+
caseWhen works on any Series type — numbers, strings, booleans, or mixed.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
Ctrl+Enter to run · Tab to indent
+
+
+
+
+
9 — Comparison with where / mask
+
caseWhen generalises whereSeries to multiple branches. Use whereSeries for a single condition; use caseWhen for multi-branch logic.
readTable(text, opts?) — parse delimiter-separated text into a DataFrame. Defaults to tab separator; all ReadCsvOptions forwarded. Mirrors pandas.read_table().
+ readTable(text, opts?) mirrors
+ pandas.read_table().
+ It parses delimiter-separated text into a DataFrame, defaulting to
+ a tab (\t) separator — unlike readCsv which defaults to a comma.
+
+
+
Quick Examples
+
+
+
+
+
+
+
+
+
+
+
+
+
Live Demo
+
Edit the text below and configure options, then click Parse.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
API Reference
+
readTable(text: string, options?: ReadTableOptions): DataFrame
+
+interface ReadTableOptions {
+ sep?: string; // separator (default: "\t")
+ header?: number | null; // header row index (default: 0)
+ indexCol?: string | number | null; // column to use as index
+ dtype?: Record<string, DtypeName>;
+ naValues?: string[]; // extra NA string values
+ skipRows?: number; // rows to skip after header
+ nRows?: number; // max rows to read
+}
+
+
Comparison: readTable vs readCsv
+
// readTable defaults to tab separator:
+const df1 = readTable("a\tb\n1\t2"); // sep="\t" by default
+
+// readCsv defaults to comma separator:
+const df2 = readCsv("a,b\n1,2"); // sep="," by default
+
+// readTable with explicit comma sep = same as readCsv:
+const df3 = readTable("a,b\n1,2", { sep: "," }); // identical result
Parse XML text into a DataFrame with
+ auto-detection of row elements, attribute and child-element columns, entity decoding,
+ CDATA support, namespace stripping, and numeric coercion. Serialize any DataFrame
+ back to well-formed XML with full formatting control. Mirrors
+ pandas.read_xml() and pandas.DataFrame.to_xml().
+ Edit any code block below and press ▶ Run
+ (or Ctrl+Enter) to execute it live in your browser.
+
+
+
+
+
1 · Basic readXml — child-element rows
+
The most common XML layout: a root element containing repeating row elements,
+ each with child elements as columns. readXml auto-detects the row
+ tag and coerces numeric strings automatically.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
+
+
+
+
+
2 · Attribute rows
+
XML elements can carry data as attributes instead of (or in addition to) child
+ elements. Use attribs: true (the default) to include them.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
+
+
+
+
+
3 · usecols, nrows, indexCol
+
Restrict the columns returned with usecols, limit rows with
+ nrows, and promote a column to the index with indexCol.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
+
+
+
+
+
4 · naValues — custom NA strings
+
Built-in NA strings include "", "NA", "NaN",
+ "N/A", "null", "None", "nan".
+ Use naValues to add your own.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
+
+
+
+
+
5 · Entities & CDATA
+
Named entities (&, <, …), decimal/hex
+ character references (A, A), and
+ CDATA sections (<![CDATA[…]]>) are all handled transparently.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
+
+
+
+
+
6 · toXml — child elements (default)
+
toXml(df) produces a well-formed XML document with an XML declaration,
+ a configurable root element, and one child element per row containing one sub-element
+ per column.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
+
+
+
+
+
7 · toXml — attribs mode
+
Set attribs: true to emit column values as XML attributes on each
+ row element instead of as child elements — produces more compact output.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
+
+
+
+
+
8 · toXml — namespaces & CDATA columns
+
Declare XML namespace prefixes on the root element with namespaces.
+ Wrap sensitive columns in CDATA sections with cdataCols to preserve
+ special characters literally.
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
+
+
+
+
+
9 · Round-trip: toXml → readXml
+
Serializing a DataFrame to XML and reading it back should produce an identical
+ DataFrame (shape and values).
+
+
+ TypeScript
+
+
+
+
+
+
+
Click ▶ Run to execute
+
+
+
+
+
+
diff --git a/src/index.ts b/src/index.ts
index 2f49842f..719a54b6 100644
--- a/src/index.ts
+++ b/src/index.ts
@@ -62,6 +62,10 @@ export { toJsonDenormalize, toJsonRecords, toJsonSplit, toJsonIndex } from "./io
export type { JsonDenormalizeOptions, JsonSplitOptions, JsonSplitResult } from "./io/index.ts";
export { readHtml } from "./io/index.ts";
export type { ReadHtmlOptions } from "./io/index.ts";
+export { readXml, toXml } from "./io/index.ts";
+export type { ReadXmlOptions, ToXmlOptions } from "./io/index.ts";
+export { readTable } from "./io/index.ts";
+export type { ReadTableOptions } from "./io/index.ts";
export { pearsonCorr, dataFrameCorr, dataFrameCov } from "./stats/index.ts";
export type { CorrMethod, CorrOptions, CovOptions } from "./stats/index.ts";
export { Rolling } from "./window/index.ts";
@@ -783,3 +787,5 @@ export {
IndexError,
} from "./errors.ts";
export type { PandasError } from "./errors.ts";
+export { caseWhen } from "./stats/index.ts";
+export type { CaseWhenBranch, CaseWhenPredicate } from "./stats/index.ts";
diff --git a/src/io/index.ts b/src/io/index.ts
index 6c5edea0..f061e4e2 100644
--- a/src/io/index.ts
+++ b/src/io/index.ts
@@ -23,6 +23,10 @@ export type {
} from "./to_json_normalize.ts";
export { readHtml } from "./read_html.ts";
export type { ReadHtmlOptions } from "./read_html.ts";
+export { readXml, toXml } from "./xml.ts";
+export type { ReadXmlOptions, ToXmlOptions } from "./xml.ts";
+export { readTable } from "./read_table.ts";
+export type { ReadTableOptions } from "./read_table.ts";
// readExcel / xlsxSheetNames use node:zlib and cannot be bundled for the
// browser. Import them directly from "tsb/io/read_excel" when running in
diff --git a/src/io/read_table.ts b/src/io/read_table.ts
new file mode 100644
index 00000000..b1b56253
--- /dev/null
+++ b/src/io/read_table.ts
@@ -0,0 +1,52 @@
+/**
+ * readTable — read a general delimiter-separated text file into a DataFrame.
+ *
+ * Mirrors `pandas.read_table()`:
+ * - Same signature as `readCsv` but defaults `sep` to `"\t"`.
+ * - Handles any single-character (or multi-character) delimiter.
+ * - All `ReadCsvOptions` are supported; when `sep` is omitted it falls back
+ * to `"\t"` (tab), distinguishing this function from `readCsv` (whose
+ * default is `","`).
+ *
+ * @module
+ */
+
+import { readCsv } from "./csv.ts";
+import type { ReadCsvOptions } from "./csv.ts";
+import type { DataFrame } from "../core/index.ts";
+
+// ─── public types ─────────────────────────────────────────────────────────────
+
+/**
+ * Options for {@link readTable}.
+ *
+ * Identical to {@link ReadCsvOptions} except the default `sep` is `"\t"`.
+ */
+export interface ReadTableOptions extends ReadCsvOptions {
+ /** Column separator. Default: `"\t"` (tab). */
+ readonly sep?: string;
+}
+
+// ─── implementation ───────────────────────────────────────────────────────────
+
+/**
+ * Parse a delimiter-separated text string into a {@link DataFrame}.
+ *
+ * Equivalent to `pandas.read_table()` — the same as {@link readCsv} but
+ * defaults to a tab separator instead of a comma.
+ *
+ * ```ts
+ * import { readTable } from "tsb";
+ *
+ * const tsv = "name\tage\tscity\nAlice\t30\tNY\nBob\t25\tLA";
+ * const df = readTable(tsv);
+ * // DataFrame with columns: name, age, city
+ * ```
+ *
+ * @param text Raw text content of the file.
+ * @param options Parsing options (see {@link ReadTableOptions}).
+ */
+export function readTable(text: string, options: ReadTableOptions = {}): DataFrame {
+ const sep = options.sep ?? "\t";
+ return readCsv(text, { ...options, sep });
+}
diff --git a/src/io/xml.ts b/src/io/xml.ts
new file mode 100644
index 00000000..b0916210
--- /dev/null
+++ b/src/io/xml.ts
@@ -0,0 +1,488 @@
+/**
+ * readXml / toXml — XML I/O for DataFrame.
+ *
+ * Mirrors `pandas.read_xml()` and `DataFrame.to_xml()`:
+ * - `readXml(text, options?)` — parse an XML string into a DataFrame
+ * - `toXml(df, options?)` — serialize a DataFrame to an XML string
+ *
+ * Implemented without any external dependencies — uses a hand-rolled
+ * zero-dependency XML tokenizer that handles:
+ * - Attributes on row elements
+ * - Text-content child elements as columns
+ * - xmlns namespace prefixes (stripped for column names)
+ * - CDATA sections
+ * - XML comments (skipped)
+ * - Entity references (& < > ' " N; N;)
+ * - nrows, usecols, xpath-like row selection (element name filter)
+ * - naValues, converters (auto-numeric coercion)
+ * - indexCol
+ *
+ * @module
+ */
+
+import { DataFrame } from "../core/frame.ts";
+import { Index } from "../core/index.ts";
+import { RangeIndex } from "../core/index.ts";
+import type { Scalar } from "../types.ts";
+
+// ─── public types ─────────────────────────────────────────────────────────────
+
+/** Options for {@link readXml}. */
+export interface ReadXmlOptions {
+ /**
+ * Local-name of the element to treat as a row. Defaults to the first
+ * repeating child element name found inside the document root.
+ */
+ readonly rowTag?: string;
+
+ /**
+ * Column name or 0-based column index to use as the row index.
+ * Defaults to a plain RangeIndex.
+ */
+ readonly indexCol?: string | number | null;
+
+ /**
+ * Only include these column names (subset). `null` = all columns.
+ */
+ readonly usecols?: readonly string[] | null;
+
+ /**
+ * Extra strings to treat as NaN in addition to the built-in defaults
+ * (`""`, `"NA"`, `"NaN"`, `"N/A"`, `"null"`, `"None"`, `"nan"`).
+ */
+ readonly naValues?: readonly string[];
+
+ /**
+ * Whether to try to coerce column values to numbers. Defaults to `true`.
+ */
+ readonly converters?: boolean;
+
+ /**
+ * Maximum number of rows to read. Defaults to unlimited.
+ */
+ readonly nrows?: number;
+
+ /**
+ * Whether to read element attributes as columns. Defaults to `true`.
+ */
+ readonly attribs?: boolean;
+
+ /**
+ * Whether to read child element text content as columns. Defaults to `true`.
+ */
+ readonly elems?: boolean;
+}
+
+/** Options for {@link toXml}. */
+export interface ToXmlOptions {
+ /**
+ * Name of the document root element. Defaults to `"data"`.
+ */
+ readonly rootName?: string;
+
+ /**
+ * Name of each row element. Defaults to `"row"`.
+ */
+ readonly rowName?: string;
+
+ /**
+ * Emit column values as XML attributes instead of child elements.
+ * Defaults to `false`.
+ */
+ readonly attribs?: boolean;
+
+ /**
+ * Whether to include the `` declaration.
+ * Defaults to `true`.
+ */
+ readonly xmlDeclaration?: boolean;
+
+ /**
+ * Map of prefix → namespace URI to declare on the root element.
+ * E.g. `{ xsi: "http://www.w3.org/2001/XMLSchema-instance" }`.
+ */
+ readonly namespaces?: Readonly>;
+
+ /**
+ * Indentation string (spaces or `"\t"`). Defaults to `" "` (2 spaces).
+ * Set to `""` or `null` to disable indentation.
+ */
+ readonly indent?: string | null;
+
+ /**
+ * Names of columns whose values should be wrapped in a CDATA section.
+ */
+ readonly cdataCols?: readonly string[];
+}
+
+// ─── default NA strings ───────────────────────────────────────────────────────
+
+const DEFAULT_NA: readonly string[] = ["", "NA", "NaN", "N/A", "null", "None", "nan"];
+
+// ─── entity decoding ──────────────────────────────────────────────────────────
+
+const NAMED_ENTITIES: Readonly> = {
+ amp: "&",
+ lt: "<",
+ gt: ">",
+ apos: "'",
+ quot: '"',
+ nbsp: "\u00a0",
+};
+
+function decodeEntities(s: string): string {
+ return s.replace(/&([^;]+);/g, (_, ref: string) => {
+ if (ref.startsWith("#x") || ref.startsWith("#X")) {
+ const cp = Number.parseInt(ref.slice(2), 16);
+ return Number.isNaN(cp) ? `&${ref};` : String.fromCodePoint(cp);
+ }
+ if (ref.startsWith("#")) {
+ const cp = Number.parseInt(ref.slice(1), 10);
+ return Number.isNaN(cp) ? `&${ref};` : String.fromCodePoint(cp);
+ }
+ return NAMED_ENTITIES[ref] ?? `&${ref};`;
+ });
+}
+
+// ─── entity encoding ──────────────────────────────────────────────────────────
+
+function encodeEntities(s: string): string {
+ return s
+ .replace(/&/g, "&")
+ .replace(//g, ">")
+ .replace(/"/g, """)
+ .replace(/'/g, "'");
+}
+
+// ─── local name (strip namespace prefix) ──────────────────────────────────────
+
+function localName(qname: string): string {
+ const colon = qname.indexOf(":");
+ return colon === -1 ? qname : qname.slice(colon + 1);
+}
+
+// ─── minimal XML tokenizer ────────────────────────────────────────────────────
+
+type Token =
+ | { kind: "open"; name: string; attrs: Record; selfClose: boolean }
+ | { kind: "close"; name: string }
+ | { kind: "text"; text: string }
+ | { kind: "pi" }
+ | { kind: "comment" }
+ | { kind: "doctype" };
+
+function tokenize(xml: string): Token[] {
+ const tokens: Token[] = [];
+ let pos = 0;
+ const len = xml.length;
+
+ while (pos < len) {
+ if (xml[pos] !== "<") {
+ // text node
+ const end = xml.indexOf("<", pos);
+ const raw = end === -1 ? xml.slice(pos) : xml.slice(pos, end);
+ tokens.push({ kind: "text", text: decodeEntities(raw) });
+ pos = end === -1 ? len : end;
+ continue;
+ }
+ // starts with <
+ if (xml.startsWith("", pos + 4);
+ tokens.push({ kind: "comment" });
+ pos = end === -1 ? len : end + 3;
+ continue;
+ }
+ if (xml.startsWith("", pos + 9);
+ const text = end === -1 ? xml.slice(pos + 9) : xml.slice(pos + 9, end);
+ tokens.push({ kind: "text", text });
+ pos = end === -1 ? len : end + 3;
+ continue;
+ }
+ if (xml.startsWith("", pos)) {
+ const end = xml.indexOf("?>", pos + 2);
+ tokens.push({ kind: "pi" });
+ pos = end === -1 ? len : end + 2;
+ continue;
+ }
+ if (xml.startsWith("", pos + 2);
+ tokens.push({ kind: "doctype" });
+ pos = end === -1 ? len : end + 1;
+ continue;
+ }
+ if (xml[pos + 1] === "/") {
+ // closing tag
+ const end = xml.indexOf(">", pos + 2);
+ const raw = end === -1 ? xml.slice(pos + 2) : xml.slice(pos + 2, end);
+ tokens.push({ kind: "close", name: raw.trim() });
+ pos = end === -1 ? len : end + 1;
+ continue;
+ }
+ // opening tag
+ const end = xml.indexOf(">", pos + 1);
+ if (end === -1) { pos = len; continue; }
+ const inner = xml.slice(pos + 1, end);
+ const selfClose = inner.endsWith("/");
+ const tagContent = selfClose ? inner.slice(0, -1) : inner;
+ // parse tag name and attributes
+ const match = /^([^\s/]+)([\s\S]*)$/.exec(tagContent.trim());
+ if (!match) { pos = end + 1; continue; }
+ const [, rawName = "", attrStr = ""] = match;
+ const attrs: Record = {};
+ // parse attributes: name="value" or name='value'
+ const attrRe = /([^\s=]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
+ let am: RegExpExecArray | null;
+ while ((am = attrRe.exec(attrStr)) !== null) {
+ const [, attrName = "", dq = "", sq = ""] = am;
+ attrs[localName(attrName)] = decodeEntities(dq || sq);
+ }
+ tokens.push({ kind: "open", name: rawName.trim(), attrs, selfClose });
+ pos = end + 1;
+ }
+ return tokens;
+}
+
+// ─── readXml ──────────────────────────────────────────────────────────────────
+
+/**
+ * Parse an XML string into a DataFrame.
+ *
+ * @example
+ * ```ts
+ * const xml = `
+ * Alice30
+ * Bob25
+ * `;
+ * const df = readXml(xml);
+ * df.columns.toArray(); // ["id", "name", "age"]
+ * df.shape; // [2, 3]
+ * ```
+ */
+export function readXml(text: string, options: ReadXmlOptions = {}): DataFrame {
+ const {
+ rowTag,
+ indexCol = null,
+ usecols = null,
+ naValues: extraNa = [],
+ converters = true,
+ nrows,
+ attribs = true,
+ elems = true,
+ } = options;
+
+ const naSet = new Set([...DEFAULT_NA, ...extraNa]);
+
+ const tokens = tokenize(text);
+ const rows: Array> = [];
+
+ // Discover rowTag from first repeating child of root if not specified
+ let resolvedRowTag = rowTag;
+ if (!resolvedRowTag) {
+ const childCounts: Map = new Map();
+ let depth = 0;
+ for (const tok of tokens) {
+ if (tok.kind === "open") {
+ depth++;
+ if (depth === 2) {
+ const n = localName(tok.name);
+ childCounts.set(n, (childCounts.get(n) ?? 0) + 1);
+ }
+ if (tok.selfClose && depth === 2) depth--;
+ } else if (tok.kind === "close") {
+ depth--;
+ }
+ }
+ // pick the element with the highest count (most repeated child of root)
+ let best = "";
+ let bestCount = 0;
+ for (const [name, count] of childCounts) {
+ if (count > bestCount) { bestCount = count; best = name; }
+ }
+ resolvedRowTag = best || "row";
+ }
+
+ // Parse rows
+ let depth = 0;
+ let inRow = false;
+ let currentRow: Record = {};
+ let currentElem = "";
+ let currentText = "";
+ let rowCount = 0;
+
+ for (const tok of tokens) {
+ if (tok.kind === "open") {
+ depth++;
+ if (!inRow && depth >= 2 && localName(tok.name) === resolvedRowTag) {
+ inRow = true;
+ currentRow = {};
+ if (attribs) {
+ for (const [k, v] of Object.entries(tok.attrs)) {
+ currentRow[k] = v;
+ }
+ }
+ if (tok.selfClose) {
+ inRow = false;
+ rows.push({ ...currentRow });
+ rowCount++;
+ if (nrows !== undefined && rowCount >= nrows) break;
+ }
+ } else if (inRow && elems) {
+ currentElem = localName(tok.name);
+ currentText = "";
+ // self-closing child elem → null
+ if (tok.selfClose) {
+ currentRow[currentElem] = null;
+ currentElem = "";
+ }
+ }
+ if (tok.selfClose) depth--;
+ } else if (tok.kind === "text") {
+ if (inRow && currentElem) {
+ currentText += tok.text;
+ }
+ } else if (tok.kind === "close") {
+ const cln = localName(tok.name);
+ if (inRow && elems && currentElem && cln === currentElem) {
+ currentRow[currentElem] = currentText;
+ currentElem = "";
+ currentText = "";
+ } else if (inRow && cln === resolvedRowTag) {
+ inRow = false;
+ rows.push({ ...currentRow });
+ rowCount++;
+ if (nrows !== undefined && rowCount >= nrows) break;
+ }
+ depth--;
+ }
+ }
+
+ if (rows.length === 0) {
+ return DataFrame.fromColumns({});
+ }
+
+ // Collect all column names in order of first appearance
+ const colSet = new Set();
+ for (const row of rows) {
+ for (const k of Object.keys(row)) colSet.add(k);
+ }
+ let cols = [...colSet];
+ if (usecols) cols = cols.filter((c) => usecols.includes(c));
+
+ // Build column arrays
+ const colData: Record = {};
+ for (const col of cols) {
+ colData[col] = rows.map((row) => {
+ const raw = row[col] ?? null;
+ if (raw === null || naSet.has(raw)) return null;
+ if (converters) {
+ const n = Number(raw);
+ if (!Number.isNaN(n) && raw.trim() !== "") return n;
+ }
+ return raw;
+ });
+ }
+
+ // Determine index
+ let idxCol: string | null = null;
+ if (typeof indexCol === "string") {
+ idxCol = indexCol;
+ } else if (typeof indexCol === "number" && indexCol < cols.length) {
+ idxCol = cols[indexCol] ?? null;
+ }
+
+ if (idxCol !== null && cols.includes(idxCol)) {
+ const idxData = colData[idxCol] ?? [];
+ const dataColNames = cols.filter((c) => c !== idxCol);
+ const dataColData: Record = {};
+ for (const c of dataColNames) {
+ dataColData[c] = colData[c] ?? [];
+ }
+ const idx = new Index(idxData);
+ return DataFrame.fromColumns(dataColData, { index: idx });
+ }
+
+ return DataFrame.fromColumns(colData);
+}
+
+// ─── toXml ────────────────────────────────────────────────────────────────────
+
+/**
+ * Serialize a DataFrame to an XML string.
+ *
+ * @example
+ * ```ts
+ * const df = DataFrame.fromColumns({ name: ["Alice", "Bob"], age: [30, 25] });
+ * console.log(toXml(df));
+ * //
+ * //
+ * // Alice30
+ * // Bob25
+ * //
+ * ```
+ */
+export function toXml(df: DataFrame, options: ToXmlOptions = {}): string {
+ const {
+ rootName = "data",
+ rowName = "row",
+ attribs = false,
+ xmlDeclaration = true,
+ namespaces = {},
+ indent = " ",
+ cdataCols = [],
+ } = options;
+
+ const ind = indent ?? "";
+ const nl = ind ? "\n" : "";
+
+ const lines: string[] = [];
+
+ if (xmlDeclaration) {
+ lines.push('');
+ }
+
+ // Root element opening with optional namespace declarations
+ const nsAttrs = Object.entries(namespaces)
+ .map(([prefix, uri]) => ` xmlns:${prefix}="${encodeEntities(uri)}"`)
+ .join("");
+ lines.push(`<${rootName}${nsAttrs}>`);
+
+ const columns = df.columns.toArray();
+ const nRows = df.shape[0];
+
+ for (let i = 0; i < nRows; i++) {
+ const rowValues: string[] = [];
+ for (const col of columns) {
+ const series = df.col(col);
+ const val = series.iloc(i);
+ rowValues.push(val === null || val === undefined ? "" : String(val));
+ }
+
+ if (attribs) {
+ // emit as attributes on the row element
+ const attrStr = columns
+ .map((c, j) => `${c}="${encodeEntities(rowValues[j] ?? "")}"`)
+ .join(" ");
+ lines.push(`${ind}<${rowName} ${attrStr}/>`);
+ } else {
+ // emit as child elements
+ const childLines: string[] = [];
+ for (let j = 0; j < columns.length; j++) {
+ const col = columns[j] ?? "";
+ const raw = rowValues[j] ?? "";
+ const isCdata = cdataCols.includes(col);
+ const content = isCdata ? `` : encodeEntities(raw);
+ childLines.push(`${ind}${ind}<${col}>${content}${col}>`);
+ }
+ if (childLines.length === 0) {
+ lines.push(`${ind}<${rowName}/>`);
+ } else {
+ lines.push(`${ind}<${rowName}>${nl}${childLines.join(nl)}${nl}${ind}${rowName}>`);
+ }
+ }
+ }
+
+ lines.push(`${rootName}>`);
+ return lines.join(nl) + nl;
+}
diff --git a/src/stats/case_when.ts b/src/stats/case_when.ts
new file mode 100644
index 00000000..22054e77
--- /dev/null
+++ b/src/stats/case_when.ts
@@ -0,0 +1,163 @@
+/**
+ * case_when — conditional value selection using CASE WHEN semantics.
+ *
+ * Mirrors `pandas.Series.case_when(caselist)` (added in pandas 2.2):
+ *
+ * - {@link caseWhen} — apply an ordered list of (condition, replacement) pairs
+ * to a Series, returning a new Series where each element is set to the
+ * replacement from the **first** matching condition. If no condition
+ * matches for a given row the original value is kept.
+ *
+ * ### Semantics
+ *
+ * ```
+ * for i in range(len(series)):
+ * for (cond, replacement) in caselist:
+ * if cond[i] is true:
+ * result[i] = replacement[i] # or scalar
+ * break
+ * else:
+ * result[i] = series[i] # default: keep original
+ * ```
+ *
+ * This is equivalent to a SQL `CASE WHEN … THEN … WHEN … THEN … ELSE … END`
+ * expression.
+ *
+ * @example
+ * ```ts
+ * import { Series, caseWhen } from "tsb";
+ *
+ * const s = new Series({ data: [1, 2, 3, 4, 5] });
+ * const result = caseWhen(s, [
+ * [s.map(v => (v as number) < 2), "small"],
+ * [s.map(v => (v as number) < 4), "medium"],
+ * ]);
+ * // result: ["small", "medium", "medium", 4, 5]
+ * ```
+ *
+ * @module
+ */
+
+import { Series } from "../core/index.ts";
+import type { Scalar } from "../types.ts";
+
+// ─── public types ─────────────────────────────────────────────────────────────
+
+/**
+ * A predicate function that receives the element value and positional index
+ * and returns `true` when the condition is satisfied.
+ */
+export type CaseWhenPredicate = (value: Scalar, idx: number) => boolean;
+
+/**
+ * A single branch in a `caselist`.
+ *
+ * - `condition` — a boolean `Series`, an array of booleans, or a predicate
+ * function `(value, index) => boolean`.
+ * - `replacement` — the value to use when `condition` is true. May be a
+ * scalar, a `Series`, or a plain array. When a `Series` or array is
+ * supplied the value at the matching position is used.
+ */
+export type CaseWhenBranch = [
+ condition: Series | readonly boolean[] | CaseWhenPredicate,
+ replacement: Scalar | Series | readonly Scalar[],
+];
+
+// ─── helpers ──────────────────────────────────────────────────────────────────
+
+function isBoolSeriesGuard(
+ v: Series | readonly boolean[] | CaseWhenPredicate,
+): v is Series {
+ return v instanceof Series;
+}
+
+function isReplSeries(
+ v: Scalar | Series | readonly Scalar[],
+): v is Series {
+ return v instanceof Series;
+}
+
+function isReplArray(
+ v: Scalar | Series | readonly Scalar[],
+): v is readonly Scalar[] {
+ return Array.isArray(v);
+}
+
+// ─── internal resolved branch type ───────────────────────────────────────────
+
+type ResolvedCond = readonly (boolean | undefined)[] | CaseWhenPredicate;
+type ResolvedRepl = readonly Scalar[] | Scalar;
+
+type ResolvedBranch = {
+ readonly cond: ResolvedCond;
+ readonly repl: ResolvedRepl;
+};
+
+/**
+ * Apply an ordered list of `(condition, replacement)` branches to `series`,
+ * returning a new `Series` of the same length.
+ *
+ * The first condition that is `true` for a given row determines the
+ * replacement value; if no condition matches the original value is preserved.
+ *
+ * @param series The input Series (any element type).
+ * @param caselist Ordered list of `[condition, replacement]` pairs.
+ *
+ * @example
+ * ```ts
+ * import { Series, caseWhen } from "tsb";
+ *
+ * const score = new Series({ data: [45, 72, 88, 95, 60] });
+ * const grade = caseWhen(score, [
+ * [score.map(v => (v as number) >= 90), "A"],
+ * [score.map(v => (v as number) >= 75), "B"],
+ * [score.map(v => (v as number) >= 60), "C"],
+ * [score.map(v => (v as number) >= 45), "D"],
+ * ]);
+ * // grade: ["D", "C", "B", "A", "C"]
+ * ```
+ */
+export function caseWhen(
+ series: Series,
+ caselist: ReadonlyArray,
+): Series {
+ const n = series.length;
+ const srcValues = series.toArray();
+ const result: Scalar[] = new Array(n);
+
+ // Pre-convert Series to plain arrays so inner loop avoids repeated toArray() calls.
+ const resolved: ResolvedBranch[] = caselist.map(([cond, replacement]) => ({
+ cond: isBoolSeriesGuard(cond) ? cond.toArray() : cond,
+ repl: isReplSeries(replacement) ? replacement.toArray() : replacement,
+ }));
+
+ for (let i = 0; i < n; i++) {
+ const original = srcValues[i] ?? null;
+ let matched = false;
+
+ for (const branch of resolved) {
+ let condTrue: boolean;
+ if (typeof branch.cond === "function") {
+ condTrue = branch.cond(original, i);
+ } else {
+ condTrue = (branch.cond[i] ?? false) === true;
+ }
+
+ if (condTrue) {
+ if (isReplArray(branch.repl)) {
+ result[i] = branch.repl[i] ?? null;
+ } else {
+ result[i] = branch.repl;
+ }
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched) {
+ result[i] = original;
+ }
+ }
+
+ return new Series({ data: result, index: series.index });
+}
diff --git a/src/stats/index.ts b/src/stats/index.ts
index 76ed0c09..e77f1cde 100644
--- a/src/stats/index.ts
+++ b/src/stats/index.ts
@@ -512,3 +512,5 @@ export {
seriesToLaTeX,
} from "./format_table.ts";
export type { ToMarkdownOptions, ToLaTeXOptions } from "./format_table.ts";
+export { caseWhen } from "./case_when.ts";
+export type { CaseWhenBranch, CaseWhenPredicate } from "./case_when.ts";
diff --git a/tests/io/read_table.test.ts b/tests/io/read_table.test.ts
new file mode 100644
index 00000000..274213cb
--- /dev/null
+++ b/tests/io/read_table.test.ts
@@ -0,0 +1,310 @@
+/**
+ * Tests for src/io/read_table.ts — readTable().
+ *
+ * Mirrors pandas.read_table() test suite:
+ * - default tab separator
+ * - custom separator
+ * - all ReadCsvOptions are forwarded
+ * - property-based round-trips
+ */
+import { describe, expect, it } from "bun:test";
+import fc from "fast-check";
+import { DataFrame, readCsv, readTable } from "../../src/index.ts";
+
+// ─── basic parsing ────────────────────────────────────────────────────────────
+
+describe("readTable — basic TSV parsing", () => {
+ it("parses a simple tab-separated file", () => {
+ const tsv = "name\tage\tcity\nAlice\t30\tNY\nBob\t25\tLA";
+ const df = readTable(tsv);
+ expect(df.shape).toEqual([2, 3]);
+ expect([...df.columns.values]).toEqual(["name", "age", "city"]);
+ expect([...df.col("name").values]).toEqual(["Alice", "Bob"]);
+ expect([...df.col("age").values]).toEqual([30, 25]);
+ expect([...df.col("city").values]).toEqual(["NY", "LA"]);
+ });
+
+ it("infers integer dtype for numeric columns", () => {
+ const tsv = "x\ty\n1\t2\n3\t4";
+ const df = readTable(tsv);
+ expect(df.col("x").dtype.name).toBe("int64");
+ expect(df.col("y").dtype.name).toBe("int64");
+ });
+
+ it("infers float dtype", () => {
+ const tsv = "a\tb\n1.5\t2.7\n3.1\t4.9";
+ const df = readTable(tsv);
+ expect(df.col("a").dtype.name).toBe("float64");
+ });
+
+ it("keeps string columns as object dtype", () => {
+ const tsv = "name\tval\nAlice\t10\nBob\t20";
+ const df = readTable(tsv);
+ expect(df.col("name").dtype.name).toBe("object");
+ });
+
+ it("handles a single column", () => {
+ const tsv = "x\n1\n2\n3";
+ const df = readTable(tsv);
+ expect(df.shape).toEqual([3, 1]);
+ expect([...df.col("x").values]).toEqual([1, 2, 3]);
+ });
+
+ it("handles empty file (header only)", () => {
+ const tsv = "a\tb\tc";
+ const df = readTable(tsv);
+ expect(df.shape).toEqual([0, 3]);
+ });
+
+ it("handles NA values in columns", () => {
+ const tsv = "a\tb\n1\tNA\n2\t3";
+ const df = readTable(tsv);
+ expect(Number.isNaN(df.col("b").values[0])).toBe(true);
+ expect(df.col("b").values[1]).toBe(3);
+ });
+
+ it("handles empty string fields as NaN for numeric columns", () => {
+ const tsv = "a\tb\n1\t\n2\t4";
+ const df = readTable(tsv);
+ expect(Number.isNaN(df.col("b").values[0])).toBe(true);
+ });
+});
+
+// ─── custom separator ─────────────────────────────────────────────────────────
+
+describe("readTable — custom separator", () => {
+ it("uses comma separator when explicitly passed", () => {
+ const csv = "a,b,c\n1,2,3";
+ const df = readTable(csv, { sep: "," });
+ expect(df.shape).toEqual([1, 3]);
+ expect([...df.col("a").values]).toEqual([1]);
+ });
+
+ it("uses pipe separator", () => {
+ const piped = "a|b|c\n1|2|3\n4|5|6";
+ const df = readTable(piped, { sep: "|" });
+ expect(df.shape).toEqual([2, 3]);
+ expect([...df.col("b").values]).toEqual([2, 5]);
+ });
+
+ it("uses semicolon separator", () => {
+ const text = "x;y\n10;20\n30;40";
+ const df = readTable(text, { sep: ";" });
+ expect([...df.col("x").values]).toEqual([10, 30]);
+ expect([...df.col("y").values]).toEqual([20, 40]);
+ });
+
+ it("uses multi-char separator", () => {
+ const text = "a::b::c\n1::2::3";
+ const df = readTable(text, { sep: "::" });
+ expect([...df.col("a").values]).toEqual([1]);
+ expect([...df.col("c").values]).toEqual([3]);
+ });
+});
+
+// ─── ReadCsvOptions forwarding ────────────────────────────────────────────────
+
+describe("readTable — ReadCsvOptions forwarding", () => {
+ it("respects indexCol option", () => {
+ const tsv = "id\tval\n1\t10\n2\t20";
+ const df = readTable(tsv, { indexCol: "id" });
+ expect([...df.index.values]).toEqual([1, 2]);
+ expect([...df.columns.values]).toEqual(["val"]);
+ });
+
+ it("respects nRows option", () => {
+ const tsv = "a\tb\n1\t2\n3\t4\n5\t6";
+ const df = readTable(tsv, { nRows: 2 });
+ expect(df.shape).toEqual([2, 2]);
+ expect([...df.col("a").values]).toEqual([1, 3]);
+ });
+
+ it("respects skipRows option", () => {
+ const tsv = "a\tb\n1\t2\n3\t4\n5\t6";
+ const df = readTable(tsv, { skipRows: 1 });
+ expect(df.shape).toEqual([2, 2]);
+ expect([...df.col("a").values]).toEqual([3, 5]);
+ });
+
+ it("respects header: null (no header row)", () => {
+ const tsv = "1\t2\t3\n4\t5\t6";
+ const df = readTable(tsv, { header: null });
+ expect(df.shape).toEqual([2, 3]);
+ // Columns are auto-assigned (0, 1, 2)
+ expect(df.columns.length).toBe(3);
+ });
+
+ it("respects dtype option", () => {
+ const tsv = "x\ty\n1\t2\n3\t4";
+ const df = readTable(tsv, { dtype: { x: "float64" } });
+ expect(df.col("x").dtype.name).toBe("float64");
+ });
+
+ it("respects naValues option", () => {
+ const tsv = "a\tb\n1\tMISSING\n2\t3";
+ const df = readTable(tsv, { naValues: ["MISSING"] });
+ expect(Number.isNaN(df.col("b").values[0])).toBe(true);
+ expect(df.col("b").values[1]).toBe(3);
+ });
+});
+
+// ─── default vs explicit separator ───────────────────────────────────────────
+
+describe("readTable vs readCsv — default separator difference", () => {
+ it("readTable defaults to tab; readCsv defaults to comma", () => {
+ const tsv = "a\tb\n1\t2";
+ const csv = "a,b\n1,2";
+
+ const dfTable = readTable(tsv);
+ const dfCsv = readCsv(csv);
+
+ expect([...dfTable.columns.values]).toEqual(["a", "b"]);
+ expect([...dfCsv.columns.values]).toEqual(["a", "b"]);
+ expect([...dfTable.col("a").values]).toEqual([1]);
+ expect([...dfCsv.col("a").values]).toEqual([1]);
+ });
+
+ it("readTable with comma-sep text treats entire line as single column", () => {
+ // Default sep=\t — commas are NOT separators
+ const csv = "a,b\n1,2\n3,4";
+ const df = readTable(csv);
+ // The whole "a,b" is one column name
+ expect(df.columns.length).toBe(1);
+ });
+});
+
+// ─── whitespace and edge cases ────────────────────────────────────────────────
+
+describe("readTable — edge cases", () => {
+ it("handles trailing newline", () => {
+ const tsv = "a\tb\n1\t2\n";
+ const df = readTable(tsv);
+ expect(df.shape).toEqual([1, 2]);
+ });
+
+ it("handles Windows-style CRLF", () => {
+ const tsv = "a\tb\r\n1\t2\r\n3\t4\r\n";
+ const df = readTable(tsv);
+ expect(df.shape).toEqual([2, 2]);
+ expect([...df.col("a").values]).toEqual([1, 3]);
+ });
+
+ it("handles a large file", () => {
+ const rows = Array.from({ length: 1000 }, (_, i) => `${i}\t${i * 2}`);
+ const tsv = "idx\tval\n" + rows.join("\n");
+ const df = readTable(tsv);
+ expect(df.shape).toEqual([1000, 2]);
+ expect(df.col("idx").values[999]).toBe(999);
+ expect(df.col("val").values[999]).toBe(1998);
+ });
+});
+
+// ─── property-based tests ─────────────────────────────────────────────────────
+
+describe("readTable — property-based", () => {
+ it("round-trips integer data through tab-separated format", () => {
+ fc.assert(
+ fc.property(
+ fc.array(
+ fc.record({ a: fc.integer({ min: -1000, max: 1000 }), b: fc.integer({ min: 0, max: 9999 }) }),
+ { minLength: 1, maxLength: 50 },
+ ),
+ (rows) => {
+ const lines = ["a\tb", ...rows.map((r) => `${r.a}\t${r.b}`)];
+ const tsv = lines.join("\n");
+ const df = readTable(tsv);
+ expect(df.shape).toEqual([rows.length, 2]);
+ for (let i = 0; i < rows.length; i++) {
+ expect(df.col("a").values[i]).toBe(rows[i]!.a);
+ expect(df.col("b").values[i]).toBe(rows[i]!.b);
+ }
+ },
+ ),
+ );
+ });
+
+ it("produces same result as readCsv with matching sep", () => {
+ fc.assert(
+ fc.property(
+ fc.array(
+ fc.record({
+ x: fc.float({ min: -100, max: 100, noNaN: true }),
+ }),
+ { minLength: 1, maxLength: 30 },
+ ),
+ (rows) => {
+ const lines = ["x", ...rows.map((r) => String(r.x))];
+ const tsv = lines.join("\n");
+ const dfTable = readTable(tsv, { sep: "\n" === "\n" ? "\t" : "," });
+ const dfCsv = readCsv(tsv.replaceAll("\t", "\t"), { sep: "\t" });
+ expect(dfTable.shape).toEqual(dfCsv.shape);
+ },
+ ),
+ );
+ });
+
+ it("readTable with explicit sep matches readCsv with same sep", () => {
+ fc.assert(
+ fc.property(
+ fc.array(fc.integer({ min: 0, max: 9999 }), { minLength: 1, maxLength: 20 }),
+ (vals) => {
+ const lines = ["v", ...vals.map(String)];
+ const text = lines.join("\n");
+ const dfTable = readTable(text, { sep: "\n" === "\n" ? undefined : "," });
+ // Default sep=\t, and our data has no tabs, so single col
+ // Just check shape is valid
+ expect(dfTable.shape[0]).toBe(vals.length);
+ },
+ ),
+ );
+ });
+
+ it("comma-sep round-trip: readTable({sep:','}) equals readCsv", () => {
+ fc.assert(
+ fc.property(
+ fc.array(
+ fc.record({
+ col1: fc.integer({ min: 0, max: 100 }),
+ col2: fc.integer({ min: 0, max: 100 }),
+ }),
+ { minLength: 1, maxLength: 40 },
+ ),
+ (rows) => {
+ const csv = "col1,col2\n" + rows.map((r) => `${r.col1},${r.col2}`).join("\n");
+ const dfTable = readTable(csv, { sep: "," });
+ const dfCsv = readCsv(csv);
+ expect(dfTable.shape).toEqual(dfCsv.shape);
+ for (let i = 0; i < rows.length; i++) {
+ expect(dfTable.col("col1").values[i]).toBe(dfCsv.col("col1").values[i]);
+ expect(dfTable.col("col2").values[i]).toBe(dfCsv.col("col2").values[i]);
+ }
+ },
+ ),
+ );
+ });
+});
+
+// ─── DataFrame integration ────────────────────────────────────────────────────
+
+describe("readTable — DataFrame integration", () => {
+ it("returns a proper DataFrame instance", () => {
+ const df = readTable("a\tb\n1\t2");
+ expect(df).toBeInstanceOf(DataFrame);
+ });
+
+ it("can chain DataFrame methods after readTable", () => {
+ const tsv = "a\tb\tc\n1\t2\t3\n4\t5\t6\n7\t8\t9";
+ const df = readTable(tsv);
+ const filtered = df.filter(["a", "c"]);
+ expect(filtered.shape).toEqual([3, 2]);
+ expect([...filtered.columns.values]).toEqual(["a", "c"]);
+ });
+
+ it("supports multi-row operations on parsed data", () => {
+ const tsv = "x\ty\n10\t20\n30\t40\n50\t60";
+ const df = readTable(tsv);
+ // Sum via reduce
+ const sumX = [...df.col("x").values].reduce((a, b) => (a as number) + (b as number), 0);
+ expect(sumX).toBe(90);
+ });
+});
diff --git a/tests/io/xml.test.ts b/tests/io/xml.test.ts
new file mode 100644
index 00000000..0c60236c
--- /dev/null
+++ b/tests/io/xml.test.ts
@@ -0,0 +1,373 @@
+/**
+ * Tests for readXml / toXml — XML I/O for DataFrame.
+ */
+
+import { describe, expect, test } from "bun:test";
+import fc from "fast-check";
+import { DataFrame } from "../../src/index.ts";
+import { readXml, toXml } from "../../src/index.ts";
+
+// ─── basic readXml ────────────────────────────────────────────────────────────
+
+describe("readXml — basic parsing", () => {
+ test("parses child-element rows", () => {
+ const xml = `
+
+ Alice30
+ Bob25
+`;
+ const df = readXml(xml);
+ expect(df.shape).toEqual([2, 2]);
+ expect(df.columns.toArray()).toEqual(["name", "age"]);
+ expect(df.col("name").toArray()).toEqual(["Alice", "Bob"]);
+ expect(df.col("age").toArray()).toEqual([30, 25]);
+ });
+
+ test("parses attribute rows", () => {
+ const xml = `
+
+
+`;
+ const df = readXml(xml);
+ expect(df.shape).toEqual([2, 2]);
+ expect(df.col("id").toArray()).toEqual([1, 2]);
+ expect(df.col("name").toArray()).toEqual(["Alice", "Bob"]);
+ });
+
+ test("mixes attributes and child elements", () => {
+ const xml = `
+
+
+`;
+ const df = readXml(xml, { rowTag: "item" });
+ expect(df.shape).toEqual([2, 2]);
+ expect(df.col("id").toArray()).toEqual([1, 2]);
+ expect(df.col("label").toArray()).toEqual(["foo", "bar"]);
+ });
+
+ test("auto-detects rowTag", () => {
+ const xml = `
+ 1
+ 2
+ 3
+`;
+ const df = readXml(xml);
+ expect(df.shape[0]).toBe(3);
+ expect(df.col("x").toArray()).toEqual([1, 2, 3]);
+ });
+
+ test("handles empty XML gracefully", () => {
+ const df = readXml("");
+ expect(df.shape).toEqual([0, 0]);
+ });
+
+ test("returns empty DataFrame for no matching rows", () => {
+ const xml = `x`;
+ const df = readXml(xml, { rowTag: "row" });
+ expect(df.shape).toEqual([0, 0]);
+ });
+});
+
+// ─── options ──────────────────────────────────────────────────────────────────
+
+describe("readXml — options", () => {
+ const xml = `
+ 1hello3.14
+ 2world2.71
+ 3foo1.41
+`;
+
+ test("usecols filters columns", () => {
+ const df = readXml(xml, { usecols: ["a", "c"] });
+ expect(df.columns.toArray()).toEqual(["a", "c"]);
+ expect(df.shape[1]).toBe(2);
+ });
+
+ test("nrows limits rows", () => {
+ const df = readXml(xml, { nrows: 2 });
+ expect(df.shape[0]).toBe(2);
+ });
+
+ test("converters=false keeps strings", () => {
+ const df = readXml(xml, { converters: false });
+ expect(df.col("a").toArray()).toEqual(["1", "2", "3"]);
+ });
+
+ test("naValues marks as null", () => {
+ const xml2 = `
+ 1
+ MISSING
+ 3
+`;
+ const df = readXml(xml2, { naValues: ["MISSING"] });
+ expect(df.col("x").toArray()).toEqual([1, null, 3]);
+ });
+
+ test("indexCol by name", () => {
+ const df = readXml(xml, { indexCol: "a" });
+ expect(df.columns.toArray()).toEqual(["b", "c"]);
+ expect(df.index.toArray()).toEqual([1, 2, 3]);
+ });
+
+ test("indexCol by number", () => {
+ const df = readXml(xml, { indexCol: 0 });
+ expect(df.columns.toArray()).toEqual(["b", "c"]);
+ expect(df.index.toArray()).toEqual([1, 2, 3]);
+ });
+
+ test("attribs=false ignores attributes", () => {
+ const xml2 = `
+ Alice
+ Bob
+`;
+ const df = readXml(xml2, { attribs: false });
+ expect(df.columns.toArray()).toEqual(["name"]);
+ });
+
+ test("elems=false ignores child elements", () => {
+ const xml2 = `
+ Alice
+ Bob
+`;
+ const df = readXml(xml2, { elems: false });
+ expect(df.columns.toArray()).toEqual(["id"]);
+ });
+});
+
+// ─── entity + CDATA handling ──────────────────────────────────────────────────
+
+describe("readXml — entities and CDATA", () => {
+ test("decodes named entities", () => {
+ const xml = `a & b < c`;
+ const df = readXml(xml, { converters: false });
+ expect(df.col("v").at(0)).toBe("a & b < c");
+ });
+
+ test("decodes numeric entities", () => {
+ const xml = `AB`;
+ const df = readXml(xml, { converters: false });
+ expect(df.col("v").at(0)).toBe("AB");
+ });
+
+ test("CDATA section text is read as-is", () => {
+ const xml = `]]>`;
+ const df = readXml(xml, { converters: false });
+ expect(df.col("v").at(0)).toBe("hello & ");
+ });
+
+ test("comments are ignored", () => {
+ const xml = `
+
+ 1
+
+ 2
+`;
+ const df = readXml(xml);
+ expect(df.shape[0]).toBe(2);
+ });
+});
+
+// ─── namespace handling ───────────────────────────────────────────────────────
+
+describe("readXml — namespaces", () => {
+ test("strips namespace prefixes from element names", () => {
+ const xml = `
+ Alice
+`;
+ const df = readXml(xml, { rowTag: "row" });
+ expect(df.columns.toArray()).toEqual(["name"]);
+ expect(df.col("name").at(0)).toBe("Alice");
+ });
+
+ test("strips namespace prefixes from attribute names", () => {
+ const xml = `
+
+`;
+ const df = readXml(xml);
+ expect(df.columns.toArray()).toContain("id");
+ expect(df.columns.toArray()).toContain("val");
+ });
+});
+
+// ─── default NA values ────────────────────────────────────────────────────────
+
+describe("readXml — built-in NA values", () => {
+ test("empty string becomes null", () => {
+ const xml = ``;
+ const df = readXml(xml);
+ expect(df.col("x").at(0)).toBeNull();
+ });
+
+ test("NA string becomes null", () => {
+ const xml = `NA`;
+ const df = readXml(xml);
+ expect(df.col("x").at(0)).toBeNull();
+ });
+
+ test("NaN string becomes null", () => {
+ const xml = `NaN`;
+ const df = readXml(xml);
+ expect(df.col("x").at(0)).toBeNull();
+ });
+});
+
+// ─── toXml basic ─────────────────────────────────────────────────────────────
+
+describe("toXml — basic serialization", () => {
+ test("produces valid XML with child elements by default", () => {
+ const df = DataFrame.fromColumns({ name: ["Alice", "Bob"], age: [30, 25] });
+ const xml = toXml(df);
+ expect(xml).toContain("");
+ expect(xml).toContain("");
+ expect(xml).toContain("Alice");
+ expect(xml).toContain("30");
+ expect(xml).toContain("");
+ });
+
+ test("custom root and row names", () => {
+ const df = DataFrame.fromColumns({ x: [1, 2] });
+ const xml = toXml(df, { rootName: "records", rowName: "record" });
+ expect(xml).toContain("");
+ expect(xml).toContain("");
+ expect(xml).toContain("");
+ });
+
+ test("attribs mode emits attributes", () => {
+ const df = DataFrame.fromColumns({ id: [1, 2], name: ["Alice", "Bob"] });
+ const xml = toXml(df, { attribs: true });
+ expect(xml).toContain('id="1"');
+ expect(xml).toContain('name="Alice"');
+ });
+
+ test("xmlDeclaration=false omits PI", () => {
+ const df = DataFrame.fromColumns({ x: [1] });
+ const xml = toXml(df, { xmlDeclaration: false });
+ expect(xml).not.toContain("");
+ });
+
+ test("namespaces are declared on root", () => {
+ const df = DataFrame.fromColumns({ x: [1] });
+ const xml = toXml(df, { namespaces: { xsi: "http://www.w3.org/2001/XMLSchema-instance" } });
+ expect(xml).toContain('xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"');
+ });
+
+ test("indent=null produces compact output", () => {
+ const df = DataFrame.fromColumns({ x: [1] });
+ const xml = toXml(df, { indent: null });
+ expect(xml).not.toContain(" "); // no leading spaces
+ });
+
+ test("cdataCols wraps in CDATA", () => {
+ const df = DataFrame.fromColumns({ html: ["bold"] });
+ const xml = toXml(df, { cdataCols: ["html"] });
+ expect(xml).toContain("bold]]>");
+ });
+
+ test("encodes entities in non-CDATA columns", () => {
+ const df = DataFrame.fromColumns({ v: ["a & b"] });
+ const xml = toXml(df, { cdataCols: [] });
+ expect(xml).toContain("a & b");
+ });
+
+ test("empty DataFrame produces root with no rows", () => {
+ const df = DataFrame.fromColumns({});
+ const xml = toXml(df);
+ expect(xml).toContain("");
+ expect(xml).toContain("");
+ expect(xml).not.toContain("");
+ });
+});
+
+// ─── round-trip ───────────────────────────────────────────────────────────────
+
+describe("toXml / readXml round-trip", () => {
+ test("round-trips string columns", () => {
+ const df = DataFrame.fromColumns({
+ name: ["Alice", "Bob", "Carol"],
+ city: ["NYC", "LA", "Chicago"],
+ });
+ const xml = toXml(df, { xmlDeclaration: false });
+ const df2 = readXml(xml, { converters: false });
+ expect(df2.shape).toEqual(df.shape);
+ expect(df2.col("name").toArray()).toEqual(["Alice", "Bob", "Carol"]);
+ expect(df2.col("city").toArray()).toEqual(["NYC", "LA", "Chicago"]);
+ });
+
+ test("round-trips numeric columns", () => {
+ const df = DataFrame.fromColumns({ x: [1, 2, 3], y: [4.5, 5.6, 6.7] });
+ const xml = toXml(df);
+ const df2 = readXml(xml);
+ expect(df2.col("x").toArray()).toEqual([1, 2, 3]);
+ expect(df2.col("y").toArray()).toEqual([4.5, 5.6, 6.7]);
+ });
+
+ test("round-trips attribs mode", () => {
+ const df = DataFrame.fromColumns({ id: [1, 2], name: ["Alice", "Bob"] });
+ const xml = toXml(df, { attribs: true });
+ const df2 = readXml(xml);
+ expect(df2.shape).toEqual(df.shape);
+ expect(df2.col("id").toArray()).toEqual([1, 2]);
+ expect(df2.col("name").toArray()).toEqual(["Alice", "Bob"]);
+ });
+});
+
+// ─── property-based tests ─────────────────────────────────────────────────────
+
+describe("readXml / toXml — property tests", () => {
+ const safeStr = fc
+ .stringMatching(/^[A-Za-z0-9 _-]*$/)
+ .filter((s) => s.length > 0 && !["NA", "NaN", "N/A", "null", "None", "nan"].includes(s));
+
+ test("round-trip: toXml then readXml preserves shape", () => {
+ fc.assert(
+ fc.property(
+ fc.array(safeStr, { minLength: 1, maxLength: 4 }),
+ fc.integer({ min: 1, max: 5 }),
+ (colNames, nRows) => {
+ const uniqueCols = [...new Set(colNames)];
+ const colData: Record = {};
+ for (const c of uniqueCols) {
+ colData[c] = Array.from({ length: nRows }, (_, i) => `v${i}`);
+ }
+ const df = DataFrame.fromColumns(colData);
+ const xml = toXml(df);
+ const df2 = readXml(xml, { converters: false });
+ return df2.shape[0] === nRows && df2.shape[1] === uniqueCols.length;
+ },
+ ),
+ { numRuns: 50 },
+ );
+ });
+
+ test("toXml produces valid XML structure", () => {
+ fc.assert(
+ fc.property(
+ fc.integer({ min: 0, max: 10 }),
+ (nRows) => {
+ const df = DataFrame.fromColumns({ x: Array.from({ length: nRows }, (_, i) => i) });
+ const xml = toXml(df);
+ return xml.includes("") && xml.includes("");
+ },
+ ),
+ { numRuns: 50 },
+ );
+ });
+
+ test("nrows limits output correctly", () => {
+ fc.assert(
+ fc.property(
+ fc.integer({ min: 1, max: 10 }),
+ fc.integer({ min: 1, max: 10 }),
+ (total, limit) => {
+ const df = DataFrame.fromColumns({ x: Array.from({ length: total }, (_, i) => i) });
+ const xml = toXml(df);
+ const df2 = readXml(xml, { nrows: limit });
+ return df2.shape[0] === Math.min(total, limit);
+ },
+ ),
+ { numRuns: 50 },
+ );
+ });
+});
diff --git a/tests/stats/case_when.test.ts b/tests/stats/case_when.test.ts
new file mode 100644
index 00000000..73888720
--- /dev/null
+++ b/tests/stats/case_when.test.ts
@@ -0,0 +1,316 @@
+/**
+ * Tests for src/stats/case_when.ts
+ * Covers caseWhen — conditional value selection using CASE WHEN semantics.
+ */
+import { describe, expect, it } from "bun:test";
+import fc from "fast-check";
+import { Series, caseWhen } from "../../src/index.ts";
+import type { Scalar } from "../../src/index.ts";
+
+// ─── helpers ─────────────────────────────────────────────────────────────────
+
+function s(data: readonly Scalar[]): Series {
+ return new Series({ data: [...data] });
+}
+
+function boolS(data: readonly boolean[]): Series {
+ return new Series({ data: [...data] });
+}
+
+// ─── basic functionality ──────────────────────────────────────────────────────
+
+describe("caseWhen — basic", () => {
+ it("empty caselist returns copy of original", () => {
+ const ser = s([1, 2, 3]);
+ const res = caseWhen(ser, []);
+ expect(res.toArray()).toEqual([1, 2, 3]);
+ });
+
+ it("single branch — scalar replacement", () => {
+ const ser = s([1, 2, 3, 4]);
+ const cond = boolS([true, false, true, false]);
+ const res = caseWhen(ser, [[cond, 99]]);
+ expect(res.toArray()).toEqual([99, 2, 99, 4]);
+ });
+
+ it("single branch — Series replacement", () => {
+ const ser = s([1, 2, 3]);
+ const cond = boolS([true, false, true]);
+ const repl = s([10, 20, 30]);
+ const res = caseWhen(ser, [[cond, repl]]);
+ expect(res.toArray()).toEqual([10, 2, 30]);
+ });
+
+ it("single branch — array replacement", () => {
+ const ser = s([1, 2, 3]);
+ const cond = boolS([false, true, true]);
+ const res = caseWhen(ser, [[cond, [100, 200, 300]]]);
+ expect(res.toArray()).toEqual([1, 200, 300]);
+ });
+
+ it("first matching condition wins", () => {
+ const ser = s([1, 2, 3, 4, 5]);
+ const lt3 = boolS([true, true, false, false, false]);
+ const lt5 = boolS([true, true, true, true, false]);
+ const res = caseWhen(ser, [
+ [lt3, "small"],
+ [lt5, "medium"],
+ ]);
+ expect(res.toArray()).toEqual(["small", "small", "medium", "medium", 5]);
+ });
+
+ it("grade classification — pandas docs example style", () => {
+ const score = new Series({ data: [45, 72, 88, 95, 60] });
+ const d = score.toArray();
+ const ge90 = boolS(d.map(v => v >= 90));
+ const ge75 = boolS(d.map(v => v >= 75));
+ const ge60 = boolS(d.map(v => v >= 60));
+ const ge45 = boolS(d.map(v => v >= 45));
+ const grade = caseWhen(score, [
+ [ge90, "A"],
+ [ge75, "B"],
+ [ge60, "C"],
+ [ge45, "D"],
+ ]);
+ expect(grade.toArray()).toEqual(["D", "C", "B", "A", "C"]);
+ });
+
+ it("predicate function condition", () => {
+ const ser = s([10, 20, 30, 40]);
+ const res = caseWhen(ser, [
+ [(v) => (v as number) > 25, "big"],
+ ]);
+ expect(res.toArray()).toEqual([10, 20, "big", "big"]);
+ });
+
+ it("predicate receives positional index as second arg", () => {
+ const ser = s([1, 2, 3, 4]);
+ const indices: number[] = [];
+ caseWhen(ser, [[(_v, i) => { indices.push(i); return false; }, 0]]);
+ expect(indices).toEqual([0, 1, 2, 3]);
+ });
+
+ it("boolean array condition", () => {
+ const ser = s(["a", "b", "c", "d"]);
+ const res = caseWhen(ser, [[[true, false, false, true], "X"]]);
+ expect(res.toArray()).toEqual(["X", "b", "c", "X"]);
+ });
+
+ it("no condition matches — original value preserved", () => {
+ const ser = s([1, 2, 3]);
+ const allFalse = boolS([false, false, false]);
+ const res = caseWhen(ser, [[allFalse, 99]]);
+ expect(res.toArray()).toEqual([1, 2, 3]);
+ });
+
+ it("null original value preserved when no condition matches", () => {
+ const ser = s([null, 2, null]);
+ const allFalse = boolS([false, false, false]);
+ const res = caseWhen(ser, [[allFalse, 0]]);
+ expect(res.toArray()).toEqual([null, 2, null]);
+ });
+
+ it("handles null in replacement Series", () => {
+ const ser = s([1, 2, 3]);
+ const cond = boolS([true, true, true]);
+ const repl = s([null, null, null]);
+ const res = caseWhen(ser, [[cond, repl]]);
+ expect(res.toArray()).toEqual([null, null, null]);
+ });
+
+ it("preserves index from source series", () => {
+ const ser = new Series({ data: [1, 2, 3], index: ["a", "b", "c"] });
+ const cond = boolS([true, false, true]);
+ const res = caseWhen(ser, [[cond, 0]]);
+ expect(res.index.toArray()).toEqual(["a", "b", "c"]);
+ });
+
+ it("all conditions true — first replacement always wins", () => {
+ const ser = s([1, 2, 3]);
+ const allTrue = boolS([true, true, true]);
+ const res = caseWhen(ser, [
+ [allTrue, "first"],
+ [allTrue, "second"],
+ ]);
+ expect(res.toArray()).toEqual(["first", "first", "first"]);
+ });
+
+ it("mixed types in replacements", () => {
+ const ser = s([1, 2, 3, 4]);
+ const cond1 = boolS([true, false, false, false]);
+ const cond2 = boolS([false, true, false, false]);
+ const res = caseWhen(ser, [
+ [cond1, "text"],
+ [cond2, 42.5],
+ ]);
+ expect(res.toArray()).toEqual(["text", 42.5, 3, 4]);
+ });
+
+ it("boolean Series condition with mismatched true values", () => {
+ const ser = s([10, 20, 30]);
+ const cond = boolS([false, true, false]);
+ const res = caseWhen(ser, [[cond, -1]]);
+ expect(res.toArray()).toEqual([10, -1, 30]);
+ });
+
+ it("three branches cover all rows", () => {
+ const ser = new Series({ data: [1, 5, 10, 15, 20] });
+ const d = ser.toArray();
+ const lt5 = boolS(d.map(v => v < 5));
+ const lt10 = boolS(d.map(v => v < 10));
+ const lt20 = boolS(d.map(v => v < 20));
+ const res = caseWhen(ser, [
+ [lt5, "low"],
+ [lt10, "mid"],
+ [lt20, "high"],
+ ]);
+ expect(res.toArray()).toEqual(["low", "mid", "mid", "high", 20]);
+ });
+});
+
+// ─── edge cases ──────────────────────────────────────────────────────────────
+
+describe("caseWhen — edge cases", () => {
+ it("single element series", () => {
+ const ser = s([42]);
+ const res = caseWhen(ser, [[boolS([true]), "replaced"]]);
+ expect(res.toArray()).toEqual(["replaced"]);
+ });
+
+ it("empty series", () => {
+ const ser = s([]);
+ const res = caseWhen(ser, [[boolS([]), 0]]);
+ expect(res.toArray()).toEqual([]);
+ expect(res.length).toBe(0);
+ });
+
+ it("string series — text classification", () => {
+ const ser = s(["apple", "banana", "cherry", "date"]);
+ const res = caseWhen(ser, [
+ [(v) => (v as string).length > 5, "long"],
+ [(v) => (v as string).length > 4, "medium"],
+ ]);
+ expect(res.toArray()).toEqual(["medium", "long", "long", "date"]);
+ });
+
+ it("boolean values in series", () => {
+ const ser = new Series({ data: [true, false, true] });
+ const cond = boolS([true, true, false]);
+ const res = caseWhen(ser, [[cond, null]]);
+ expect(res.toArray()).toEqual([null, null, true]);
+ });
+
+ it("replacement array shorter than series uses null for missing", () => {
+ // When replacement array is shorter, missing positions yield null
+ const ser = s([1, 2, 3]);
+ const cond = boolS([false, false, true]);
+ const res = caseWhen(ser, [[cond, [10, 20]]]);
+ // index 2 is true, replacement[2] is undefined → null
+ expect(res.toArray()).toEqual([1, 2, null]);
+ });
+});
+
+// ─── property-based tests ─────────────────────────────────────────────────────
+
+describe("caseWhen — property tests", () => {
+ it("length is always preserved", () => {
+ fc.assert(
+ fc.property(
+ fc.array(fc.integer({ min: -100, max: 100 }), { minLength: 0, maxLength: 20 }),
+ (data) => {
+ const ser = new Series({ data: [...data] });
+ const cond = boolS(data.map(v => v > 0));
+ const res = caseWhen(ser, [[cond, 999]]);
+ return res.length === data.length;
+ },
+ ),
+ );
+ });
+
+ it("empty caselist is identity", () => {
+ fc.assert(
+ fc.property(
+ fc.array(fc.oneof(fc.integer(), fc.constant(null)), { minLength: 0, maxLength: 20 }),
+ (data) => {
+ const ser = s(data);
+ const res = caseWhen(ser, []);
+ const orig = ser.toArray();
+ const got = res.toArray();
+ for (let i = 0; i < orig.length; i++) {
+ if (orig[i] !== got[i]) return false;
+ }
+ return true;
+ },
+ ),
+ );
+ });
+
+ it("all-true condition replaces all values with scalar", () => {
+ fc.assert(
+ fc.property(
+ fc.array(fc.integer(), { minLength: 1, maxLength: 20 }),
+ fc.integer(),
+ (data, scalar) => {
+ const ser = new Series({ data: [...data] });
+ const allTrue = boolS(data.map(() => true));
+ const res = caseWhen(ser, [[allTrue, scalar]]);
+ return res.toArray().every(v => v === scalar);
+ },
+ ),
+ );
+ });
+
+ it("all-false condition keeps original values", () => {
+ fc.assert(
+ fc.property(
+ fc.array(fc.integer(), { minLength: 1, maxLength: 20 }),
+ (data) => {
+ const ser = new Series({ data: [...data] });
+ const allFalse = boolS(data.map(() => false));
+ const res = caseWhen(ser, [[allFalse, 999]]);
+ const orig = ser.toArray();
+ const got = res.toArray();
+ for (let i = 0; i < orig.length; i++) {
+ if (orig[i] !== got[i]) return false;
+ }
+ return true;
+ },
+ ),
+ );
+ });
+
+ it("index is preserved", () => {
+ fc.assert(
+ fc.property(
+ fc.array(fc.integer(), { minLength: 1, maxLength: 15 }),
+ (data) => {
+ const index = data.map((_, i) => `key_${i}`);
+ const ser = new Series({ data: [...data], index: [...index] });
+ const cond = boolS(data.map(v => v > 0));
+ const res = caseWhen(ser, [[cond, 0]]);
+ return JSON.stringify(res.index.toArray()) === JSON.stringify(index);
+ },
+ ),
+ );
+ });
+
+ it("predicate condition equivalent to boolean array", () => {
+ fc.assert(
+ fc.property(
+ fc.array(fc.integer({ min: -50, max: 50 }), { minLength: 1, maxLength: 20 }),
+ (data) => {
+ const ser = new Series({ data: [...data] });
+ const bools = data.map(v => v > 0);
+ const res1 = caseWhen(ser, [[boolS(bools), -1]]);
+ const res2 = caseWhen(ser, [[(v) => (v as number) > 0, -1]]);
+ const a1 = res1.toArray();
+ const a2 = res2.toArray();
+ for (let i = 0; i < a1.length; i++) {
+ if (a1[i] !== a2[i]) return false;
+ }
+ return true;
+ },
+ ),
+ );
+ });
+});