03. How It Works
Exact vs. approximate nearest neighbor
Exact k-NN computes the true nearest neighbors by comparing the query to every vector. Accuracy is 100%. Speed does not scale.
Approximate nearest neighbor (ANN) builds an index structure that enables the search to skip most of the vector space and examine only a promising subset. Recall (what fraction of the true top-k are returned) is typically 95-99%, which is sufficient for retrieval tasks. The two dominant ANN index types are HNSW and IVF.
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layer graph during ingestion. At the bottom layer every vector is a node, connected to its nearest neighbors. At each higher layer the graph becomes sparser, containing only a random subset of vectors. The structure resembles a highway system: top layers are express roads for fast long-distance navigation, lower layers are local roads for fine-grained search.
At query time, the search starts at the top layer, greedy-walks toward the query's neighborhood, drops to the next layer, repeats, and terminates at the bottom layer with a candidate set. This typically achieves 98%+ recall with query latency around 2-15ms at 10 million vectors.
HNSW uses more memory than IVF (the graph connections must be stored) and build time is slower, but query performance is consistently strong across diverse query distributions.
IVF (Inverted File Index)
IVF clusters the vector space using k-means during index construction, producing a set of centroids. Each stored vector is assigned to its nearest centroid. At query time, the system computes distances from the query to all centroids, selects the top nprobe nearest clusters, and does exact search only within those clusters.
IVF uses less memory than HNSW, builds faster, and performs better under heavy metadata filtering (because candidate sets are well-defined). Recall degrades if the data distribution shifts after index build, requiring periodic re-clustering.
Product Quantization (PQ)
Product Quantization is a compression technique applied on top of either index type. It splits each high-dimensional vector into sub-vectors and represents each sub-vector by a short code pointing to an entry in a learned codebook. A 1536-dimension float32 vector (6 KB) can be compressed to a few dozen bytes with a 64:1 ratio. This allows billions of vectors to fit in RAM that would otherwise require disk. Search runs on compressed codes with an optional re-scoring step on the original vectors for the top candidates.
Distance metrics
Cosine similarity measures the angle between two vectors, ignoring magnitude. It is the standard for text embeddings because embedding norms often carry length information that is not meaningful for semantic similarity. Cosine similarity = dot product of unit-normalized vectors.
Dot product (inner product) measures both angle and magnitude. Suitable when magnitude is meaningful, such as in recommendation systems where magnitude might encode popularity or confidence.
Euclidean distance (L2) measures straight-line distance in vector space. More sensitive to magnitude differences. Common in image and vision embeddings.
Most vector databases support all three. For text-based RAG with OpenAI or Cohere embeddings, cosine similarity is the default and correct choice.