vortex @vortexdotdev

An extensible, state-of-the-art columnar file format. Formerly at @spiraldb, now a Linux Foundation project (@LFAIDataFdn).

Apache-2.0 · vortex.dev · GitHub · Joined May 2025

Tweets

22
Followers

255
Following

14
Likes

61

Will Manning @willmanning

3 weeks ago

so cool to see another blazing fast database built on vortex!

Just announced at Interrupt! SmithDB. Agent traces have outgrown the databases built to hold them. That’s why we built SmithDB, a purpose-built distributed database for agent observability. Read the announcement from Co-Founder @ankush_gola11 → langchain.com/blog/introduci…

Ankush Gola @ankush_gola11

3 weeks ago

We leveraged two amazing open source projects when building SmithDB. One is @ApacheDataFusio: an extensible Rust-based query engine. We built custom execution plans specifically tuned for our workloads and storage backend, and DataFusion made it straightforward to plumb everything together. The other is @vortexdotdev: an extensible file format that allows you to build custom layouts with specific encoding and chunking strategies for different columns. I would highly recommend checking out both of these projects if you're interested in modern data systems.

Ankush Gola @ankush_gola11

3 weeks ago

We built SmithDB: the database purpose-built for agent observability workloads that now powers many parts of LangSmith. Agent observability presents a challenging data problem. Agent traces can contain tens of thousands of intermediate spans and large, unbounded payloads. These

Spice AI @spice_ai

2 months ago

The Research Behind Modern Data Compression & @vortexdotdev

When we chose Vortex as the storage layer for Spice Cayenne (the data accelerator engine in Spice), we were betting on decades of database research finally reaching production-ready maturity. Here's the research behind Vortex:

📄 BtrBlocks (SIGMOD 2023): The core algorithm, from the Technical University of Munich. Cascading multiple lightweight encodings outperforms monolithic compression. Optimize for decompression speed, not just compression ratio.
📄 FastLanes (VLDB 2023): Hardware-friendly integer compression. Structures bit-packing to maximize SIMD utilization across AVX-512, AVX2, and ARM NEON. Near-memory-bandwidth decompression.
📄 FSST (VLDB 2020): Fast Static Symbol Table for strings. Near-LZ4 ratios at 5-10× faster decompression. Critical for string-heavy columns.
📄 ALP (CWI Amsterdam): Adaptive Lossless floating-Point compression. Exploits real-world float patterns (prices with 2 decimals, sensor readings with limited precision).
📄 MonetDB/X100 + Morsel-Driven Parallelism: Foundations for vectorized, NUMA-aware query execution that Vortex builds on.

The result? Compression that is tailored to your data:
• Integers via FastLanes bit-packing
• Floats via ALP adaptive encoding
• Strings via FSST symbol tables
• Timestamps via delta encoding
• Sorted columns via run-length encoding

Why does this matter for production systems?
1️⃣ Query performance scales with decompression speed. A focus on decode performance translates directly into faster queries.
2️⃣ Automatic encoding selection means zero configuration. The algorithm samples your data and picks optimal strategies per column.
3️⃣ SIMD acceleration is baked in. FastLanes was designed for vectorized, hardware-accelerated execution from day one.
4️⃣ Zero-copy Arrow access. Data decompresses directly to Arrow arrays with no intermediate copies.

Vortex is now a Linux Foundation AI & Data project, and researchers are building on it (Anyblox, F3). You get SOTA research in production systems. The future of data storage is exciting. To learn more about our Vortex implementation, check out the blog: hubs.ly/Q04bGfvf0

#datafusion #ai #data #vortex #spiceai #arrow #parquet
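The cascading idea behind BtrBlocks, applied to the timestamp case listed above, can be sketched in a few lines of Python: delta-encode the timestamps, then bit-pack the deltas. This is a minimal illustration under stated assumptions, not Vortex's implementation or API. The function names are hypothetical, the bit-packing stage is plain frame-of-reference plus sequential bit-packing (swapped in for FastLanes, which uses a SIMD-friendly interleaved layout), and Python integers stand in for fixed-width buffers.

```python
from typing import List, Tuple

# Hypothetical packed-block tuple: (reference, bit width, value count, payload bytes)
Packed = Tuple[int, int, int, bytes]


def bitpack(values: List[int]) -> Packed:
    """Frame-of-reference + bit-packing: subtract the block minimum, then
    store every offset in exactly `width` bits (little-endian bit order)."""
    if not values:
        return 0, 0, 0, b""
    reference = min(values)
    offsets = [v - reference for v in values]
    width = max(offsets).bit_length()  # 0 when all values are equal
    packed = 0
    for i, off in enumerate(offsets):
        packed |= off << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return reference, width, len(values), packed.to_bytes(nbytes, "little")


def bitunpack(block: Packed) -> List[int]:
    """Undo bitpack: one shift-and-mask per value, then add the reference back."""
    reference, width, count, payload = block
    packed = int.from_bytes(payload, "little")
    mask = (1 << width) - 1
    return [reference + ((packed >> (i * width)) & mask) for i in range(count)]


def encode_timestamps(ts: List[int]) -> Tuple[int, Packed]:
    """Cascade: delta-encode the timestamps, then bit-pack the deltas."""
    if not ts:
        raise ValueError("need at least one timestamp")
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    return ts[0], bitpack(deltas)


def decode_timestamps(first: int, block: Packed) -> List[int]:
    """Unpack the deltas and rebuild the timestamps with a running sum."""
    out = [first]
    for d in bitunpack(block):
        out.append(out[-1] + d)
    return out


if __name__ == "__main__":
    # Millisecond timestamps roughly 250 ms apart (illustrative data).
    ts = [1_700_000_000_000, 1_700_000_000_250, 1_700_000_000_500,
          1_700_000_000_750, 1_700_000_001_010]
    first, block = encode_timestamps(ts)
    print(block[:3], len(block[3]))  # (250, 4, 4) 2
    assert decode_timestamps(first, block) == ts
```

Here four deltas that would each occupy 8 bytes as raw int64 fit in a 2-byte payload, because after subtracting the reference (250) the largest offset is 10, which needs only 4 bits. Each stage is cheap to undo (a shift-and-mask per value plus a running sum), which is the decode-speed-first trade-off the tweet describes.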

Will Manning @willmanning

2 months ago

Connor Tsui & I just merged a first cut of TurboQuant into @vortexdotdev , already validated on production embeddings 🚀🚀🚀

vortex @vortexdotdev

2 months ago

Fastest OSS file format, in both performance and velocity

vortex @vortexdotdev

2 months ago

you took up with Weasley, but he can't afford sliceable cascaded encodings. now your random access is dogged, and your cortisol is properly spiked, potter
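The technical point under the joke is random access. With a lightweight encoding such as bit-packing, element `i` sits at a computable bit offset, so a point lookup reads only the one or two bytes that hold it, whereas a general-purpose block codec must decompress the whole block first. A minimal Python sketch, assuming a plain little-endian bit layout; `pack` and `get` are hypothetical helpers, not Vortex code, and real Vortex encodings use different layouts.

```python
from typing import List


def pack(values: List[int], width: int) -> bytes:
    """Hypothetical helper: pack non-negative ints into `width` bits each,
    little-endian bit order."""
    acc = 0
    for i, v in enumerate(values):
        if v < 0 or v >= (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        acc |= v << (i * width)
    return acc.to_bytes((len(values) * width + 7) // 8, "little")


def get(packed: bytes, width: int, index: int) -> int:
    """Hypothetical helper: random access that reads only the bytes
    spanning element `index` (no bounds check in this sketch)."""
    bit = index * width
    first_byte = bit // 8
    last_byte = (bit + width - 1) // 8
    chunk = int.from_bytes(packed[first_byte:last_byte + 1], "little")
    return (chunk >> (bit % 8)) & ((1 << width) - 1)


if __name__ == "__main__":
    values = [3, 17, 42, 5, 99, 0, 64, 127]
    packed = pack(values, width=7)  # 8 values * 7 bits = 7 bytes
    print(len(packed), get(packed, 7, 4))  # 7 99
```

The lookup cost is constant regardless of how many values the buffer holds. Cascading more encodings on top preserves this only if every layer can be sliced the same way, which is what "sliceable" refers to.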

vortex @vortexdotdev

2 months ago

hey man, thrilled that you're interested in contributing. we'll be waiting for you in slack vortex.dev/slack

MeekMill @MeekMill

2 months ago

I need a GitHub too! Is it like that or nah?

Luke Kim @lukekim

3 months ago

CASE-WHEN support coming to @vortexdotdev. Guess I'm a Vortex contributor now!

vortex @vortexdotdev

4 months ago

🦆❤️🚀

DuckDB @duckdb

4 months ago

DuckDB now supports reading from and writing to the Vortex file format! The DuckDB Labs and Spiral teams have worked together to make Vortex available as a core extension in DuckDB. Vortex is an open source, columnar file format whose design is heavily influenced by recent

Luke Kim @lukekim

5 months ago

🌪️ Why LF Vortex for hot data?
@ApacheParquet: great compression, slow decode
@ApacheArrow: instant decode, no compression
Vortex: encoding-efficient compression with SIMD decode to Arrow
80% of Parquet's compression, 10x faster decode

Alfonso Subiotto ❄️ @asubiotto

6 months ago

Happy to share that I've been nominated to the @vortexdotdev Technical Steering Committee! It's been fun and productive switching to Vortex from Parquet as our storage format at Polar Signals and I'm excited to continue contributing to the Vortex project.

Will Manning @willmanning

6 months ago

Super cool, they forked @DeltaLakeOSS to replace Parquet (for data) with Vortex and JSON (for metadata) with Vortex. Huge performance gains! Maybe we should upstream this one 😁 @vortexdotdev

Polar Signals @PolarSignalsIO

6 months ago

🧊 New on the Polar Signals Blog — Our Delta Lake Fork Purpose-built for our continuous profiling product. In our latest post, we walk through how Delta Lake works, and the changes we've made to improve performance for our product. 👉 Read the full post: buff.ly/KwHINtO

Will Manning @willmanning

6 months ago

So cool!! Polar Signals reduced query runtimes by 70% switching from Parquet to Vortex 🤯🚀

Polar Signals @PolarSignalsIO

6 months ago

We completed a major project to switch our storage file format from Parquet to Vortex 🌪️ resulting in 70% average query performance improvement across the board 🚀 Learn more about how rethinking interface-imposed limitations unlocked these gains in our latest blog post 👇

Polar Signals @PolarSignalsIO

6 months ago

polarsignals.com/blog/posts/202…

Andrew Lamb @andrewlamb1111

8 months ago

The talk on @SpiralDB at @CMUDB: youtube.com/watch?v=zyn_T5… is a great one. I think it would also be interesting to hear a counterpoint about @ApacheParquet that explains actual technical details of that format, the Cathedral vs. Bazaar management, options with metadata, etc.

CMU Database Group @CMUDB

8 months ago

Today's Future Data Systems Seminar Speaker: Will Manning (@willmanning) will present @SpiralDB's Vortex file format (@vortexdotdev). Vortex is now a @LFAIDataFdn project. Zoom talk open to the public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futured…