
Poor compressor behavior on tiny array. #8131

@danking

Description


What happened?

In [1]: import vortex as vx
In [2]: x = vx.array(['alpha', 'charlie', 'bravo'])
In [3]: vx.io.write(x, '/tmp/foo.vortex')

Then drag that file into https://explore.vortex.dev.

You'll notice that the array is encoded using FSST with a symbol array of:

61 00 00 00 00 00 00 00                              a.......

And this codes array:

00 ff 6c ff 70 ff 68 00  ff 63 ff 68 00 ff 72 ff     ..l.p.h..c.h..r.
6c ff 69 ff 65 ff 62 ff  72 00 ff 76 ff 6f           l.i.e.b.r..v.o
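To see where the bytes go, the codes can be decoded by hand. Here is a minimal sketch, assuming the standard FSST convention: code 0xFF is an escape followed by one literal byte, and every other code indexes the symbol table. `fsst_decode` is a hypothetical helper, not part of the vortex API. The symbol table holds only the one-byte symbol "a", and the three strings' codes are laid out back to back as in the dump.

```python
ESCAPE = 0xFF  # FSST escape code: the next byte is emitted literally


def fsst_decode(codes: bytes, symbols: list[bytes]) -> bytes:
    """Decode an FSST code stream given its symbol table (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(codes):
        code = codes[i]
        if code == ESCAPE:
            # Escaped literal: copy the following byte unchanged.
            out.append(codes[i + 1])
            i += 2
        else:
            # Regular code: expand to the symbol it indexes.
            out += symbols[code]
            i += 1
    return bytes(out)


# Symbol table from the file: a single symbol, "a" (length 1).
symbols = [b"a"]
# Codes array from the file, copied from the hex dump.
codes = bytes.fromhex(
    "00ff6cff70ff6800ff63ff6800ff72ff"
    "6cff69ff65ff62ff7200ff76ff6f"
)

print(len(codes))                   # 30
print(fsst_decode(codes, symbols))  # b'alphacharliebravo'
```

Only the four occurrences of "a" hit the symbol table. Every other character is escaped, so it costs two bytes instead of one.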

A bit odd. There are 17 characters in total, but we compress them into 39 bytes: 30 bytes of codes, 8 bytes for the symbol table, and 1 extra byte for the length of the only symbol, "a".
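With "a" as the only symbol, each of the four a's costs one code byte. Each of the other 13 characters costs two bytes (escape plus literal). The sketch below reproduces that accounting. It assumes the same per-symbol cost described in this issue (an 8-byte padded symbol slot plus 1 length byte) and ignores offsets and other metadata. `fsst_payload_size` is a hypothetical helper that only handles single-byte symbols, which is all this file's table contains.

```python
def fsst_payload_size(strings: list[str], symbols: list[bytes]) -> int:
    """FSST payload bytes: codes + 8-byte symbol slots + 1 length byte per symbol.

    Hypothetical helper; only single-byte symbols are matched.
    """
    single = {s[0] for s in symbols if len(s) == 1}
    codes = 0
    for s in strings:
        for byte in s.encode():
            # A symbol hit costs 1 code byte; a miss costs escape + literal.
            codes += 1 if byte in single else 2
    table = 8 * len(symbols) + len(symbols)  # padded symbols + their lengths
    return codes + table


strings = ["alpha", "charlie", "bravo"]
raw = sum(len(s.encode()) for s in strings)
print(raw)                                 # 17
print(fsst_payload_size(strings, [b"a"]))  # 39
```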

Seems like we should just use a string view?
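One way to read that suggestion is to keep FSST only when it actually beats the uncompressed bytes. Below is a minimal sketch of such a size check. `pick_string_encoding` and its arguments are hypothetical and do not reflect how the vortex compressor currently chooses encodings. Both sizes ignore offsets, views, and other per-array overhead.

```python
def pick_string_encoding(raw_bytes: int, fsst_bytes: int) -> str:
    """Keep FSST only if its payload is strictly smaller than the raw strings."""
    return "fsst" if fsst_bytes < raw_bytes else "uncompressed"


# Sizes from this issue: 17 raw bytes vs. 39 bytes of FSST symbols + codes.
print(pick_string_encoding(17, 39))  # uncompressed
```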


I had to give the file a .txt extension to upload it here, but it is a Vortex file.

foo.vortex.txt

Steps to reproduce

In [1]: import vortex as vx
In [2]: x = vx.array(['alpha', 'charlie', 'bravo'])
In [3]: vx.io.write(x, '/tmp/foo.vortex')

Environment

# uv run python3 --version
Python 3.11.6
# uname -a
Darwin Daniels-MacBook-Pro-3.local 24.6.0 Darwin Kernel Version 24.6.0: Wed Nov  5 21:32:38 PST 2025; root:xnu-11417.140.69.705.2~1/RELEASE_ARM64_T6031 arm64
# git show
commit 11b5de1ef4879a5c99563b90751c42c8076db32e (HEAD, origin/ct/tq-error, ct/tq-error)
Author: Connor Tsui <connor.tsui20@gmail.com>
Date:   Wed May 27 17:32:44 2026 +0100

    address comments

    Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

diff --git a/Cargo.lock b/Cargo.lock

