
How to Build a Vector Database Using Amazon S3 Vectors

July 29, 2025

By Benedetto Proietti

Tags: Amazon S3, S3 Vectors, Faiss, AI, Vector Search, Open Source, Vector Database, Redis, Jecq, Cost Optimization


...And Say Goodbye to Expensive SaaS Pricing.

Vector databases are powerful — and often expensive. In this article, you'll learn how to build a scalable, low-cost vector search system using S3 Vectors, Product Quantization (PQ), and open-source tools.


Here’s an estimated price comparison for storing 1 billion and 10 billion vectors using the most common SaaS vector databases. These numbers are pulled directly from each provider’s pricing calculator.

[Image: estimated monthly price comparison for 1B and 10B vectors across popular SaaS vector databases]

Want the short version?

Vector databases are expensive because they rely on powerful, always-on hardware. They keep in-memory indexes fresh and caches hot — and that costs money.

Old Tricks, New Vector World

For tabular or log data, we decoupled compute and storage a long time ago: store data cheaply in S3 or MinIO, and spin up compute (like Spark or Presto) only when needed.

Amazon has now extended this model to vector embeddings with Amazon S3 Vectors. [Quick dive here]

S3 Vectors lets you store huge volumes of vector data at low cost and run similarity searches in under a second — ideal for batch workloads and analytics.
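To make that concrete, here is a minimal sketch of writing and querying vectors with the boto3 s3vectors client. The bucket and index names are placeholders, and the operation and field names follow the preview documentation, so verify them against your SDK version.

```python
# Minimal sketch of Amazon S3 Vectors via boto3 (preview API; names may change).
import boto3

s3v = boto3.client("s3vectors", region_name="us-east-1")

# Store a batch of embeddings in an existing vector bucket/index
# (assumption: bucket and index were already created).
s3v.put_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="docs-index",
    vectors=[
        {
            "key": "doc-42",
            "data": {"float32": [0.12, -0.03, 0.98]},  # real vectors are e.g. 768-d
            "metadata": {"source": "docs/42.md"},
        }
    ],
)

# Similarity search: top-5 nearest neighbors to a query embedding.
resp = s3v.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="docs-index",
    queryVector={"float32": [0.11, -0.01, 0.95]},
    topK=5,
    returnMetadata=True,
    returnDistance=True,
)
for v in resp["vectors"]:
    print(v["key"], v.get("distance"))
```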

Can we ditch expensive Vector DBs now?

Not quite. S3 Vectors doesn’t offer the low latency needed for real-time use cases — like fraud detection, recommendation engines, or chatbots that require sub-100 ms responses.

Instead, think of S3 Vectors as the durable, budget-friendly foundation. You’ll still need a lightweight layer on top to meet real-time latency requirements.

Level 1: Download and Run

Let’s start simple: run open-source software out of the box, with no code changes.

We lower similarity search latency by using a classic Computer Science technique — indexes — and storing data in RAM (which is fast but expensive).

Performing exact distance calculations (cosine, Euclidean) on billions of 768-dimensional vectors is too slow and compute-heavy.

Product Quantization (PQ) helps by compressing vectors into compact forms. This makes searches 10–100× faster — with minimal accuracy loss.

How PQ Works

PQ splits each high-dimensional vector into smaller chunks (e.g., groups of 8 dimensions), then maps each chunk to the closest centroid in a precomputed codebook. Only the centroid IDs are stored.

At query time, instead of comparing against billions of raw vectors, the system compares to ~256 centroids per chunk — massively reducing compute time.

For most NLP workloads, PQ delivers excellent recall while cutting memory and compute costs.
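Here is a minimal sketch of the same idea using FAISS's stock PQ index, assuming 768-dimensional embeddings split into 96 chunks of 8 dimensions with 256 centroids each; random data stands in for real embeddings.

```python
# Product Quantization with FAISS: 768-d vectors -> 96 chunks of 8 dims,
# 8 bits (256 centroids) per chunk. Random data for illustration only.
import faiss
import numpy as np

d = 768        # embedding dimension
m = 96         # sub-quantizers: 768 / 8 = 96 chunks of 8 dims each
nbits = 8      # 2^8 = 256 centroids per chunk

index = faiss.IndexPQ(d, m, nbits)

train = np.random.rand(100_000, d).astype("float32")
index.train(train)   # learns the codebooks
index.add(train)     # stores only centroid IDs: 96 bytes/vector vs ~3 KB raw

query = np.random.rand(1, d).astype("float32")
distances, ids = index.search(query, k=10)
print(ids[0])
```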

[Image: how Product Quantization splits vectors into chunks and maps each chunk to a codebook centroid]

Tool Selection

FAISS — originally developed by Facebook AI Research — is the go-to library for efficient similarity search and clustering of dense vectors. It’s widely adopted for high-performance vector indexing at scale. But I recommend JECQ, a drop-in replacement with 6× lower memory usage.

Disclaimer: I created JECQ. That said, it works well; use FAISS if you prefer.

You can find the open-source JECQ project on GitHub.

In-Memory Cache

You can cache a subset of raw vectors in RAM using tools like Redis or Valkey, depending on your licensing needs.

For 10 billion vectors (~30TB in S3), storing just 1% in RAM (about 300GB) can make a big difference. Pricey, but manageable.
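A minimal sketch of such a cache, assuming raw vectors are stored in Redis as float32 bytes keyed by vector ID (the key scheme is illustrative, and the same code works against Valkey):

```python
# Cache raw vectors in Redis as packed float32 bytes, keyed by vector ID.
import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)

def cache_put(vec_id: str, vec: np.ndarray) -> None:
    # A 768-d float32 vector is ~3 KB per cached entry.
    r.set(f"vec:{vec_id}", vec.astype("float32").tobytes())

def cache_get(vec_id: str) -> np.ndarray | None:
    raw = r.get(f"vec:{vec_id}")
    return np.frombuffer(raw, dtype="float32") if raw is not None else None
```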

Level 1 Architecture

Let’s walk through the architecture:

[Image: Level 1 architecture diagram — router, in-memory index, Redis cache, S3 Vectors]
  • A router service handles incoming similarity search requests. It’s stateless and can scale horizontally.
  • Each node loads a copy of the JECQ (or FAISS) index in memory.
  • The router uses JECQ to find candidate vector IDs.
  • It then checks Redis for raw vectors:
    • Cache hit: Redis returns vectors. Router re-ranks and returns results.
    • Cache miss: Router pulls vectors from S3, re-ranks, and returns results.
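Putting the pieces together, here is a sketch of the router's query path. It reuses the cache helpers above; fetch_from_s3 and id_map (mapping index rows to vector IDs) are hypothetical placeholders for your own storage layout.

```python
# Level 1 query path: compressed-index candidates -> Redis/S3 raw vectors
# -> exact cosine re-rank. fetch_from_s3() is a hypothetical helper.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query: np.ndarray, index, id_map: list[str], k: int = 10, oversample: int = 5):
    # 1. Approximate search over the compressed index. Oversample so the
    #    exact re-rank has room to correct PQ quantization error.
    _, rows = index.search(query[None, :].astype("float32"), k * oversample)
    candidate_ids = [id_map[r] for r in rows[0] if r != -1]

    # 2. Resolve raw vectors: Redis first, S3 on a miss.
    scored = []
    for vec_id in candidate_ids:
        vec = cache_get(vec_id)            # cache hit: microseconds
        if vec is None:
            vec = fetch_from_s3(vec_id)    # cache miss: ~100 ms S3 read
            cache_put(vec_id, vec)         # warm the cache for next time
        scored.append((vec_id, cosine(query, vec)))

    # 3. Exact re-rank and return the top-k.
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]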

Level 2: Teaser

Level 1 works fine for datasets up to ~1 billion vectors or demo workloads. But if you want 10–100 ms P95 latency at multi-billion scale, you’ll need more:

  • Local raw vectors on NVMe: A middle layer (5–10% of raw size, ~1.5TB) between RAM and S3 to avoid frequent S3 fetches.
[Image: RAM / NVMe / S3 storage hierarchy]
  • Hierarchical data layers: JECQ + Redis/NVMe integration enables local posting list retrieval, turning 100 ms S3 reads into 2–5 ms NVMe reads.
  • Index sharding: Splits PQ clusters across nodes and avoids duplicating 100GB+ compressed data per node.
  • Advanced cache management: Store frequent queries, support MFU/LFU/LRU caching strategies, and pre-load data based on user behavior.
  • Aggressive S3 Vectors indexing: Each query hits just one index. A single S3 bucket can hold 10K indexes, each with ≤50M vectors. Smart indexing helps reduce latency significantly.
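To make the last point concrete, here is a sketch of routing each query to exactly one index by hashing a namespace (tenant, collection, time bucket) to a shard, keeping every index under the 50M-vector ceiling. The shard count and index naming are illustrative.

```python
# One query, one index: hash a namespace to one of N S3 Vectors indexes.
import hashlib

NUM_INDEXES = 200  # e.g. 10B vectors / 50M per index

def index_for(namespace: str) -> str:
    shard = int(hashlib.sha256(namespace.encode()).hexdigest(), 16) % NUM_INDEXES
    return f"vectors-shard-{shard:04d}"

# Every query for a given namespace hits exactly one index:
print(index_for("tenant-1234"))  # -> e.g. "vectors-shard-0137"
```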

All of this requires solid engineering chops — but it's necessary if you want to build a cost-effective vector database with 10–100 ms latency on top of S3 Vectors.

Stay tuned for Level 2!

Frequently Asked Questions

Can Amazon S3 serve as a vector database?

Not on its own — but when paired with tools like Product Quantization (PQ), in-memory caching (e.g., Redis), and libraries like JECQ or FAISS, Amazon S3 can serve as the foundation of a fully functional vector search system. This architecture offers a cost-effective and scalable alternative to commercial SaaS vector databases, especially for workloads where sub-100 ms latency isn't critical.

Why is this architecture cheaper than a SaaS vector database?

Using S3 Vectors offloads storage to a low-cost, durable system and allows you to decouple compute from storage. This avoids the always-on, memory-heavy infrastructure typical of SaaS vector databases, significantly lowering costs — especially at scale.

What is JECQ, and how does it compare to FAISS?

JECQ is a drop-in replacement for FAISS that provides similar functionality but with a 6× lower memory footprint. It's ideal for cost-sensitive, large-scale vector search applications where memory optimization is key. JECQ is fully open source and available on GitHub, making it easy for developers to integrate, modify, and contribute.
