July 29, 2025
By Benedetto Proietti
Amazon S3,
S3 Vectors,
Faiss,
AI,
Vector Search,
Open Source,
Vector Database,
Redis,
JECQ,
Cost Optimization
...And Say Goodbye to Expensive SaaS Pricing.
Vector databases are powerful — and often expensive. In this article, you'll learn how to build a scalable, low-cost vector search system using S3 Vectors, Product Quantization (PQ), and open-source tools.
Here’s an estimated price comparison for storing 1 billion and 10 billion vectors using the most common SaaS vector databases. These numbers are pulled directly from each provider’s pricing calculator.
Want the short version?
Vector databases are expensive because they rely on powerful, always-on hardware. They keep in-memory indexes fresh and caches hot — and that costs money.
For tabular or log data, we decoupled compute and storage a long time ago: store data cheaply in S3 or MinIO, and spin up compute (like Spark or Presto) only when needed.
Amazon has now extended this model to vector embeddings with Amazon S3 Vectors. [Quick dive here]
S3 Vectors lets you store huge volumes of vector data at low cost and run similarity searches in under a second — ideal for batch workloads and analytics.
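Here's what that looks like in practice: a minimal sketch using the boto3 `s3vectors` client. The bucket and index names are placeholders, the zero-filled lists stand in for real embeddings, and the parameter names follow the S3 Vectors API as documented at launch, so verify them against your current SDK version.

```python
import boto3

DIM = 768  # embedding dimensionality

# S3 Vectors has its own boto3 client; bucket and index names are placeholders.
s3v = boto3.client("s3vectors", region_name="us-east-1")

embedding = [0.0] * DIM        # stand-in for a real 768-dim embedding
query_embedding = [0.0] * DIM  # stand-in for a real query embedding

# Store a batch of embeddings, each keyed by an ID, with optional metadata.
s3v.put_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="docs-index",
    vectors=[{
        "key": "doc-0001",
        "data": {"float32": embedding},
        "metadata": {"source": "kb"},
    }],
)

# Similarity search against the index: returns the topK nearest vectors.
resp = s3v.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="docs-index",
    queryVector={"float32": query_embedding},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
)
for match in resp["vectors"]:
    print(match["key"], match.get("distance"))
```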
Can we ditch expensive Vector DBs now?
Not quite. S3 Vectors doesn’t offer the low latency needed for real-time use cases — like fraud detection, recommendation engines, or chatbots that require sub-100 ms responses.
Instead, think of S3 Vectors as the durable, budget-friendly foundation. You’ll still need a lightweight layer on top to meet real-time latency requirements.
Let’s start simple: use open-source software out of the box, no code changes, just run it.
We lower similarity-search latency the classic computer-science way: with indexes, and by keeping data in RAM (which is fast but expensive).
Performing exact distance calculations (cosine, Euclidean) on billions of 768-dimensional vectors is too slow and compute-heavy.
Product Quantization (PQ) helps by compressing vectors into compact forms. This makes searches 10–100× faster — with minimal accuracy loss.
PQ splits each high-dimensional vector into smaller chunks (e.g., groups of 8 dimensions), then maps each chunk to the closest centroid in a precomputed codebook. Only the centroid IDs are stored.
At query time, instead of computing exact distances against billions of raw vectors, the system precomputes distances to the ~256 centroids in each chunk's codebook once per query, then scores every stored code with cheap table lookups, massively reducing compute time.
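To make the mechanics concrete, here's a toy NumPy sketch of PQ encoding. The codebooks are random purely for illustration; in a real system they're learned with k-means over a training sample.

```python
import numpy as np

D, M, K = 768, 96, 256   # vector dim, number of chunks, centroids per codebook
SUB = D // M             # 8 dimensions per chunk

rng = np.random.default_rng(0)
# Real codebooks come from k-means over training vectors;
# random centroids here are purely for illustration.
codebooks = rng.normal(size=(M, K, SUB)).astype(np.float32)

def pq_encode(vec):
    """Compress one 768-dim float32 vector (3,072 bytes) to 96 one-byte codes."""
    chunks = vec.reshape(M, SUB)                                  # split into chunks
    dists = ((codebooks - chunks[:, None, :]) ** 2).sum(axis=2)   # (M, K) distances
    return dists.argmin(axis=1).astype(np.uint8)                  # nearest centroid IDs

vec = rng.normal(size=D).astype(np.float32)
code = pq_encode(vec)
print(code.nbytes, "bytes vs", vec.nbytes, "raw")  # 96 vs 3072 -> 32x compression
```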
For most NLP workloads, PQ delivers excellent recall while cutting memory and compute costs.
FAISS — originally developed by Facebook AI Research — is the go-to library for efficient similarity search and clustering of dense vectors. It’s widely adopted for high-performance vector indexing at scale. But I recommend JECQ, a drop-in replacement with 6× lower memory usage.
Disclaimer: I created JECQ. That said, it works well. But use FAISS if you prefer.
You can find the open-source JECQ project on GitHub.
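As a concrete starting point, here's a minimal FAISS PQ index, with random data standing in for real embeddings. JECQ positions itself as a drop-in replacement, so the train/add/search flow should carry over.

```python
import numpy as np
import faiss

D, M, NBITS = 768, 96, 8   # dim, sub-quantizers (D % M == 0), 8 bits -> 256 centroids

rng = np.random.default_rng(0)
xb = rng.normal(size=(100_000, D)).astype(np.float32)   # stand-in for real embeddings
xq = rng.normal(size=(5, D)).astype(np.float32)         # stand-in query embeddings

index = faiss.IndexPQ(D, M, NBITS)   # stores each vector as M bytes instead of D * 4
index.train(xb)                      # learns the per-chunk codebooks via k-means
index.add(xb)

distances, ids = index.search(xq, 10)   # approximate top-10 neighbors per query
print(ids[0])
```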
You can cache a subset of raw vectors in RAM using tools like Redis or Valkey, depending on your licensing needs.
For 10 billion vectors (~30TB in S3), storing just 1% in RAM (about 300GB) can make a big difference. Pricey, but manageable.
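Here's a minimal sketch of that cache layer with redis-py. The key scheme, the TTL, and the fetch_from_s3 fallback are illustrative assumptions, not a prescribed design.

```python
import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 3600   # illustrative eviction policy; tune to your hit rate

def fetch_from_s3(vector_id: str) -> np.ndarray:
    """Cold path, e.g. a lookup against S3 Vectors (placeholder)."""
    raise NotImplementedError

def get_vector(vector_id: str) -> np.ndarray:
    """Serve raw vectors from RAM when hot; fall back to S3 when cold."""
    cached = r.get(f"vec:{vector_id}")
    if cached is not None:
        return np.frombuffer(cached, dtype=np.float32)
    vec = fetch_from_s3(vector_id)   # slow path: S3 Vectors latency
    r.set(f"vec:{vector_id}", vec.astype(np.float32).tobytes(), ex=TTL_SECONDS)
    return vec
```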
Let's walk through the architecture. Level 1 is what we've assembled so far: S3 Vectors as the durable store, a PQ index (FAISS or JECQ) in RAM for candidate search, and a Redis or Valkey cache for hot raw vectors.
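Tying the pieces together, here's a sketch of the Level 1 query path. `index` and `get_vector` are the hypothetical objects from the FAISS and Redis sketches above, and the 10× over-fetch before re-ranking is an arbitrary choice.

```python
import numpy as np

def search(query: np.ndarray, k: int = 10) -> list[tuple[int, float]]:
    """Level 1 query path: PQ candidates in RAM, exact re-rank on raw vectors."""
    # 1. Approximate search over compressed codes (the FAISS/JECQ index above).
    _, candidate_ids = index.search(query[None, :].astype(np.float32), k * 10)

    # 2. Fetch raw vectors for the candidates: Redis first, S3 Vectors fallback.
    raw = {int(i): get_vector(str(i)) for i in candidate_ids[0] if i != -1}

    # 3. Re-rank with exact distances and keep the top k.
    ranked = sorted(
        ((vid, float(np.linalg.norm(query - vec))) for vid, vec in raw.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]
```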
This Level 1 setup works fine for datasets up to ~1 billion vectors or demo workloads. But if you want 10–100 ms P95 latency at multi-billion scale, you'll need more.
That next level takes solid engineering chops, but it's necessary if you want to build a cost-effective vector database with 10–100 ms latency on top of S3 Vectors.
Stay tuned for Level 2!
Can Amazon S3 replace a dedicated vector database?
Not on its own — but when paired with tools like Product Quantization (PQ), in-memory caching (e.g., Redis), and libraries like JECQ or FAISS, Amazon S3 can serve as the foundation of a fully functional vector search system. This architecture offers a cost-effective and scalable alternative to commercial SaaS vector databases, especially for workloads where sub-100 ms latency isn't critical.
Why build on S3 Vectors?
Using S3 Vectors offloads storage to a low-cost, durable system and allows you to decouple compute from storage. This avoids the always-on, memory-heavy infrastructure typical of SaaS vector databases, significantly lowering costs — especially at scale.
What is JECQ?
JECQ is a drop-in replacement for FAISS that provides similar functionality with a 6× lower memory footprint. It's ideal for cost-sensitive, large-scale vector search applications where memory optimization is key. JECQ is fully open source and available on GitHub, making it easy for developers to integrate, modify, and contribute.