Parquet Storage Engine

Deep dive into how Fortifiers manages data lifecycle, compression, and retrieval using Apache Parquet.


Automated Archival Schedule

Our system runs automated workers that identify data eligible for archival based on age and access patterns. This process is transparent to the user; you don't need to do anything.

Data Type       | Archival Trigger            | Schedule
Product Catalog | Full catalog refresh        | Daily @ 2:00 AM
Chat History    | Conversations > 30 days old | Daily @ 3:00 AM
User Memories   | Beyond 1,000 most recent    | Sundays @ 4:00 AM
Audit Logs      | Logs > 7 days old           | Sundays @ 5:00 AM
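
To make the triggers concrete, here is a minimal sketch of how a worker might pick records for archival. It is an illustration only: the function names, the `created_at` field, and the sample records are assumptions, not the actual Fortifiers implementation. Only the thresholds (30 days, 7 days, 1,000 most recent) come from the schedule above.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the schedule; the constant names are hypothetical.
CHAT_MAX_AGE = timedelta(days=30)
AUDIT_MAX_AGE = timedelta(days=7)
MEMORY_KEEP_RECENT = 1_000


def older_than(records, max_age, now):
    """Return records whose 'created_at' is more than max_age before now."""
    cutoff = now - max_age
    return [r for r in records if r["created_at"] < cutoff]


def beyond_most_recent(records, keep):
    """Return every record except the `keep` newest ones."""
    newest_first = sorted(records, key=lambda r: r["created_at"], reverse=True)
    return newest_first[keep:]


# Hypothetical sample data: one old conversation, one recent one.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
chats = [
    {"id": 1, "created_at": now - timedelta(days=45)},
    {"id": 2, "created_at": now - timedelta(days=10)},
]

to_archive = older_than(chats, CHAT_MAX_AGE, now)
print([c["id"] for c in to_archive])  # [1]
```

The same `older_than` helper would cover audit logs with `AUDIT_MAX_AGE`, while `beyond_most_recent` models the count-based rule for user memories.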

Hybrid Query Strategy

When you request data (e.g., "Show me all quotes from 2023"), our API routes the query along two paths and merges the results:

  1. Hot Path: The API checks PostgreSQL for recent, active data.
  2. Cold Path: Simultaneously, it queries the Parquet archives using optimized vector operations.
  3. Merge: The results from both paths are combined and returned to the frontend.

This "Hybrid RAG" (Retrieval-Augmented Generation) approach ensures that our AI agents, like Freya, have access to your entire business history without the latency of scanning a massive database.
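
The hot path, cold path, and merge steps can be sketched roughly as follows. This is a simplified illustration: `query_hot` and `query_cold` are hypothetical stand-ins that return in-memory rows instead of calling PostgreSQL or reading Parquet, and the rule that the hot copy wins on a duplicate `id` is an assumption rather than documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor


def query_hot(year):
    """Stand-in for the PostgreSQL lookup (hypothetical data)."""
    rows = [
        {"id": 3, "year": 2023, "source": "postgres"},
        {"id": 4, "year": 2024, "source": "postgres"},
    ]
    return [r for r in rows if r["year"] == year]


def query_cold(year):
    """Stand-in for the Parquet archive scan (hypothetical data)."""
    rows = [
        {"id": 1, "year": 2023, "source": "parquet"},
        {"id": 3, "year": 2023, "source": "parquet"},
    ]
    return [r for r in rows if r["year"] == year]


def hybrid_query(year):
    # Run both paths concurrently and wait for both results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        hot = pool.submit(query_hot, year)
        cold = pool.submit(query_cold, year)
        hot_rows, cold_rows = hot.result(), cold.result()

    # Assumption: if a record exists in both stores, the hot copy wins.
    merged = {r["id"]: r for r in cold_rows}
    merged.update({r["id"]: r for r in hot_rows})
    return sorted(merged.values(), key=lambda r: r["id"])


quotes_2023 = hybrid_query(2023)
print([(r["id"], r["source"]) for r in quotes_2023])
# [(1, 'parquet'), (3, 'postgres')]
```

Keying the merge on `id` keeps a record that is mid-archival from appearing twice in the response.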

Compression & Efficiency

Parquet is a columnar storage format: values from the same column are stored together on disk. To calculate the sum of a "Total Price" column, we read only that column and never touch the "Customer Name" or "Date" columns.
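
The difference between the two layouts can be shown with plain Python data structures. This is a toy model of the idea, not a Parquet reader; the field names and values are made up for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together,
# so summing one field means walking over every full record.
rows = [
    {"customer_name": "Ada", "date": "2023-01-05", "total_price": 120.0},
    {"customer_name": "Bo", "date": "2023-02-11", "total_price": 80.0},
    {"customer_name": "Cy", "date": "2023-03-20", "total_price": 50.0},
]

# Column-oriented layout: one contiguous list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_column(table, name):
    """Sum a single column without touching any other column."""
    return sum(table[name])


total = sum_column(columns, "total_price")
print(total)  # 250.0
```

With a real Parquet file, the pyarrow library offers the same kind of column pruning by passing `columns=["total_price"]` to `pyarrow.parquet.read_table`; whether Fortifiers uses pyarrow internally is not stated here.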

  • Compression Ratio: We typically see a 13x storage reduction compared to raw JSON/SQL.
  • Vector Search: We link Parquet files to Qdrant vector stores, allowing semantic search over archived data.