Parquet Storage Engine

Deep dive into how Fortifiers manages data lifecycle, compression, and retrieval using Apache Parquet.

Automated Archival Schedule

Automated workers identify data eligible for archival based on age and access patterns. The process is transparent to users and requires no action on your part.

Data Type	Archival Trigger	Schedule
Product Catalog	Full catalog refresh	Daily @ 2:00 AM
Chat History	Conversations > 30 days old	Daily @ 3:00 AM
User Memories	Beyond 1,000 most recent	Sundays @ 4:00 AM
Audit Logs	Logs > 7 days old	Sundays @ 5:00 AM
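
The triggers in the table can be read as simple eligibility rules. The following is a minimal sketch of that logic, not the actual worker code: the constant and function names are hypothetical, and the timestamps are made-up sample values chosen only to exercise the rules.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds mirroring the table above; the names are illustrative.
CHAT_MAX_AGE = timedelta(days=30)
AUDIT_MAX_AGE = timedelta(days=7)
MEMORY_KEEP_RECENT = 1000


def is_older_than(created_at: datetime, max_age: timedelta, now: datetime) -> bool:
    """Return True when a record's age exceeds max_age."""
    return now - created_at > max_age


def memories_to_archive(created_ats: list, keep: int = MEMORY_KEEP_RECENT) -> list:
    """Return every timestamp beyond the `keep` most recent ones."""
    newest_first = sorted(created_ats, reverse=True)
    return newest_first[keep:]


# Sample values (assumed, for illustration only).
now = datetime(2024, 6, 1)
chat_is_eligible = is_older_than(datetime(2024, 4, 1), CHAT_MAX_AGE, now)     # 61 days old
audit_is_eligible = is_older_than(datetime(2024, 5, 30), AUDIT_MAX_AGE, now)  # 2 days old
```

Age-based rules (chat history, audit logs) only need a timestamp comparison, while the count-based rule (user memories) needs the records sorted by recency first.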

Hybrid Query Strategy

When you request data (e.g., "Show me all quotes from 2023"), our API routes the query along two paths and merges the results:

Hot Path: The API checks PostgreSQL for recent, active data.
Cold Path: Simultaneously, it queries the Parquet archives using optimized vector operations.
Merge: The two result sets are combined and returned to the frontend.
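
The three steps above can be sketched with two concurrent lookups followed by a merge. This is an illustrative sketch under stated assumptions, not the API's real implementation: `query_hot`, `query_cold`, and `hybrid_query` are hypothetical names, the in-memory rows stand in for PostgreSQL and Parquet reads, and the rule that the hot row wins when both stores return the same id is an assumption.

```python
import asyncio


async def query_hot(year: int) -> list:
    """Stand-in for the PostgreSQL lookup (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for real database I/O
    rows = [{"id": 3, "year": 2023, "source": "postgres"}]
    return [r for r in rows if r["year"] == year]


async def query_cold(year: int) -> list:
    """Stand-in for the Parquet archive scan (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for real file I/O
    rows = [
        {"id": 1, "year": 2023, "source": "parquet"},
        {"id": 2, "year": 2022, "source": "parquet"},
        {"id": 3, "year": 2023, "source": "parquet"},
    ]
    return [r for r in rows if r["year"] == year]


async def hybrid_query(year: int) -> list:
    # Run both paths concurrently instead of one after the other.
    hot, cold = await asyncio.gather(query_hot(year), query_cold(year))
    # Merge by id; assumed policy: the hot (live) row replaces an archived copy.
    merged = {row["id"]: row for row in cold}
    merged.update({row["id"]: row for row in hot})
    return sorted(merged.values(), key=lambda r: r["id"])


quotes_2023 = asyncio.run(hybrid_query(2023))
```

Because both lookups are awaited together, the total wait is roughly that of the slower path rather than the sum of the two.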

This "Hybrid RAG" (Retrieval-Augmented Generation) approach gives our AI agents, such as Freya, access to your entire business history without the latency of scanning one massive database.

Compression & Efficiency

Parquet is a columnar storage format: values from the same column are stored together. To calculate the sum of a "Total Price" column, we read only that column and never touch the "Customer Name" or "Date" columns.
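
The difference between the two layouts can be illustrated in plain Python. This sketch does not read real Parquet files; it only models the idea with a dictionary of lists, and the column names and sample values are made up for illustration.

```python
# Row-oriented layout: every record carries all of its fields.
rows = [
    {"customer_name": "Acme", "date": "2023-01-05", "total_price": 120.0},
    {"customer_name": "Globex", "date": "2023-02-11", "total_price": 80.5},
    {"customer_name": "Initech", "date": "2023-03-20", "total_price": 42.0},
]

# Column-oriented layout: one list per field, as in a columnar file.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_column(table: dict, name: str) -> float:
    """Sum one column; the other columns are never accessed."""
    return sum(table[name])


total = sum_column(columns, "total_price")
```

In the row layout, the same sum would have to walk every record and skip past the name and date fields; in the column layout, the aggregate reads a single list.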

Compression Ratio: We typically see a 13x storage reduction compared to raw JSON/SQL.
Vector Search: We link Parquet files to Qdrant vector stores, allowing semantic search over archived data.