Embeddings

In Short

An embedding is a list of numbers (a vector) that represents the meaning of a piece of text, an image, or other data. Items with similar meanings produce similar vectors. This numerical representation of meaning is what allows computers to search by concept, not just by keyword, and it is the foundation of modern semantic search, RAG, and recommendation systems.

01. What It Is

An embedding is a dense numeric vector: an ordered list of floating-point numbers, typically hundreds or thousands of values long. An embedding model takes unstructured data (a sentence, a document, an image, a piece of audio) and maps it to a specific point in a high-dimensional vector space.

The key property: items that are semantically similar end up close together in that vector space, and items that are semantically different end up far apart. "Dog" and "puppy" produce nearby vectors. "Dog" and "spreadsheet" produce distant ones.

This is fundamentally different from keyword search, where "automobile" and "car" are different strings with no word in common, so a naive search engine treats them as unrelated. Embedding models are trained so that words and phrases with the same meaning map to nearby vectors.
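A minimal sketch of why exact matching fails here, using a toy bag-of-words count vector. The vocabulary and the bag_of_words helper are made up for illustration; real keyword engines use more elaborate scoring such as TF-IDF or BM25.

```python
def bag_of_words(text, vocabulary):
    """Count how often each vocabulary term appears in the text (a sparse vector)."""
    tokens = text.lower().split()
    return [tokens.count(term) for term in vocabulary]

# Hypothetical eight-word vocabulary; real vocabularies have many thousands of terms.
vocabulary = ["automobile", "bank", "car", "dog", "loan", "puppy", "river", "spreadsheet"]

sparse_car = bag_of_words("car", vocabulary)
sparse_auto = bag_of_words("automobile", vocabulary)

# Dot product of the two count vectors: nonzero only if they share a term.
overlap = sum(a * b for a, b in zip(sparse_car, sparse_auto))
print(sparse_car, sparse_auto, overlap)
```

The two vectors have no nonzero position in common, so the overlap is 0 even though the words mean the same thing. A dense embedding has no such one-slot-per-word structure, which is what lets synonyms land close together.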

02. Why It Matters

Embeddings are the computational substrate for a wide class of modern AI features:

Semantic search:
A user types "how to cancel my subscription." A keyword search finds pages containing those exact words. A semantic search finds pages about account cancellation, even if they say "terminate your plan" instead.

Retrieval-Augmented Generation (RAG):
When an LLM needs to answer questions about a private knowledge base, it cannot hold the entire database in its context window. Instead, documents are pre-converted into embeddings and stored in a vector database. At query time, the question is also embedded, and the most semantically similar documents are retrieved and passed to the LLM. This is the dominant architecture for knowledge-intensive LLM applications in 2026.

Recommendations:
Embed users and products into the same space. A user's embedding is shaped by their history. Products close to that embedding are good recommendations. Streaming platforms, e-commerce, and content feeds all use this pattern.
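One simple way to realize this pattern is sketched below. It assumes the user embedding is the plain average of the vectors of items in the user's history, which is only one of several possible choices. The catalog entries and their 2-dimensional vectors are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    # Assumes equal-length, nonzero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(history, catalog, k=1):
    """Rank products the user has not seen by similarity to the averaged history."""
    user_vector = mean_vector([catalog[item] for item in history])
    candidates = [name for name in catalog if name not in history]
    candidates.sort(key=lambda name: cosine_similarity(user_vector, catalog[name]), reverse=True)
    return candidates[:k]

# Hypothetical product embeddings (hand-written, not model output).
catalog = {
    "space_documentary": [0.9, 0.1],
    "rocket_launch_film": [0.8, 0.2],
    "astronomy_series": [0.85, 0.15],
    "cooking_show": [0.1, 0.9],
}
picks = recommend(["space_documentary", "rocket_launch_film"], catalog, k=1)
print(picks)
```

With this toy data the unseen item closest to the user's averaged history is "astronomy_series", while "cooking_show" ranks last.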

Clustering and anomaly detection:
Financial institutions embed transactions and flag outliers: points far from any cluster in transaction space may indicate fraud.
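A deliberately simplified sketch of the outlier idea: it assumes a single cluster, uses the centroid of all points as the cluster center, and flags points beyond a fixed distance. The transaction vectors and the threshold are invented; a production system would use proper clustering and a tuned threshold.

```python
import math

def centroid(vectors):
    """Component-wise mean of the vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def flag_outliers(vectors, threshold):
    """Return the indexes of vectors whose Euclidean distance to the centroid exceeds threshold."""
    center = centroid(vectors)
    return [i for i, vector in enumerate(vectors) if math.dist(vector, center) > threshold]

# Hypothetical 2-dimensional transaction embeddings; the last one sits far from the rest.
transactions = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [5.0, 5.0]]
outliers = flag_outliers(transactions, threshold=2.0)
print(outliers)
```

Only the last transaction is flagged. Note that the outlier itself pulls the centroid toward it, which is one reason robust methods are preferred in practice.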

Cross-modal retrieval:
Modern multimodal embedding models (like SigLIP 2 or OpenCLIP) embed both text and images into a shared space, enabling image search by text description and vice versa.

03. How It Works

Embedding models are neural networks, typically trained with a contrastive objective. During training, pairs of semantically similar texts are pulled toward each other in vector space, while dissimilar pairs are pushed apart. The result is a model that encodes meaning as geometry.
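One widely used contrastive objective is an InfoNCE-style softmax loss. The sketch below assumes that loss for a single anchor text; the similarity scores and the temperature of 0.1 are illustrative values, not taken from any particular model.

```python
import math

def info_nce_loss(sim_positive, sim_negatives, temperature=0.1):
    """Negative log-probability of picking the positive among all candidates.

    sim_positive: similarity between the anchor and its matching text.
    sim_negatives: similarities between the anchor and non-matching texts.
    """
    logits = [sim_positive / temperature] + [s / temperature for s in sim_negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]

# Positive pair already much closer than the negatives: small loss.
well_separated = info_nce_loss(0.9, [0.1, 0.0])
# A negative scores higher than the positive: large loss.
poorly_separated = info_nce_loss(0.3, [0.4, 0.2])
print(well_separated, poorly_separated)
```

Minimizing this loss raises the similarity of matching pairs relative to non-matching ones, which is the "pull together, push apart" behavior described above.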

The resulting vectors have a fixed number of dimensions regardless of input length. Common embedding dimensions:

128-256 dimensions: fast and compact, useful for lightweight retrieval, but can lose semantic nuance.
512 dimensions: a balanced middle ground for many general-purpose retrieval tasks.
1024-1536+ dimensions: richer representation, often better for multilingual and multimodal tasks, but higher storage and compute cost.
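The storage side of this trade-off is simple arithmetic. The sketch below assumes each value is stored as a 4-byte float32 and counts only the raw vectors, ignoring index structures and metadata. The corpus size of one million vectors is a hypothetical figure.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats

def raw_index_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for the vectors alone: count x dimensions x bytes per value."""
    return num_vectors * dimensions * bytes_per_value

NUM_VECTORS = 1_000_000  # hypothetical corpus size

for dims in (256, 512, 1536):
    gib = raw_index_bytes(NUM_VECTORS, dims) / 2**30
    print(f"{dims} dimensions: {gib:.2f} GiB")
```

Under these assumptions the three sizes come to roughly 0.95, 1.91, and 5.72 GiB, so moving from 512 to 1536 dimensions triples the raw storage.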

To compare two embeddings, the standard metric is cosine similarity. Cosine similarity measures the angle between two vectors, ignoring their magnitude. A value of 1.0 means identical direction (maximum similarity). A value of 0 means perpendicular (unrelated). A value of -1.0 means opposite directions. In practice, text embeddings rarely score strongly negative, and antonyms or contradictory statements often score fairly high because they concern the same topic.

The formula is: cosine similarity = (A · B) / (|A| × |B|), where A · B is the dot product of the two vectors and |A| is the length (Euclidean norm) of A.
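The formula translates directly into code. This is a minimal standard-library sketch; the 3-dimensional vectors are hand-made to show the three reference cases and are not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: close to 1.0.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Perpendicular vectors: 0.0.
perpendicular = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
# Opposite direction: close to -1.0.
opposite = cosine_similarity([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])
print(same_direction, perpendicular, opposite)
```

The first case shows why magnitude is ignored: doubling every component leaves the angle, and therefore the score, unchanged up to floating-point rounding.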

In a vector database, this comparison is done at scale using approximate nearest neighbor (ANN) algorithms, which trade a small amount of accuracy for speed and can find the closest vectors among millions of stored embeddings in milliseconds.
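For contrast, here is the exact search that ANN indexes approximate: score every stored vector and keep the best k. This is a brute-force baseline, not an ANN algorithm, and its cost grows linearly with the number of stored vectors. The document IDs and vectors are invented for illustration.

```python
import heapq
import math

def cosine_similarity(a, b):
    # Assumes equal-length, nonzero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_exact(query, stored, k):
    """Score every stored vector against the query and return the k best IDs."""
    scored = ((cosine_similarity(query, vector), doc_id) for doc_id, vector in stored.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

# Hypothetical stored embeddings keyed by document ID.
stored = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.1],
}
nearest = top_k_exact([1.0, 0.2, 0.0], stored, k=2)
print(nearest)
```

With this data the query is closest to "doc_a", then "doc_c". An ANN index aims to return the same answer most of the time without visiting every vector.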

04. Key Terms

Vector:
An ordered list of numbers. An embedding is a specific kind of vector produced by a neural network to represent semantic meaning.

Vector space:
The high-dimensional mathematical space in which embeddings live. Similar concepts cluster together in this space.

Dimensions:
The number of values in an embedding vector. Also called the embedding size or dimensionality.

Cosine similarity:
The standard measure of similarity between two embeddings. Ranges from -1 to 1, where 1 means the vectors point in the same direction and 0 means they are unrelated.

Vector database:
A specialized database optimized for storing and querying embeddings. Examples: Pinecone, Weaviate, Chroma, pgvector (Postgres extension). Used in RAG pipelines to retrieve relevant document chunks.

Semantic similarity:
Similarity of meaning, as opposed to lexical similarity (similarity of the words used). Embeddings capture semantic similarity.

Dense vector:
A vector where most or all values are nonzero. Contrasted with sparse vectors (like TF-IDF bag-of-words), where most values are zero.

MTEB:
Massive Text Embedding Benchmark, the standard evaluation framework for comparing embedding models across retrieval, classification, clustering, and other tasks.

05. Examples / Analogies

Imagine mapping every city in the world to coordinates on a globe. London and Paris are close. London and Tokyo are far. The coordinates are not the cities: they are a compressed representation that preserves geographic relationships.

Embeddings do the same thing for meaning. Every sentence gets mapped to a point in a high-dimensional space. "The bank approved my loan" and "My mortgage application was accepted" land near each other. "The river bank was muddy" lands far from both. The model has learned to separate the financial sense of "bank" from the geographical one.

In a RAG pipeline, the workflow is:

1. Convert all documents to embeddings and store them in a vector database.
2. When a user asks a question, convert the question to an embedding.
3. Find the stored documents with the highest cosine similarity to the question embedding.
4. Pass those documents as context to the LLM.
5. The LLM synthesizes an answer grounded in retrieved content.
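The workflow above can be sketched end to end, up to the point where the LLM is called. Everything named here is a stand-in: embed looks up hand-written placeholder vectors instead of calling an embedding model, a Python list plays the role of the vector database, and the sample texts and the build_prompt wording are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    # Assumes equal-length, nonzero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Placeholder vectors standing in for real embedding model output.
PLACEHOLDER_VECTORS = {
    "To terminate your plan, open Billing and choose End plan.": [0.9, 0.1, 0.0],
    "Our office is closed on public holidays.": [0.0, 0.2, 0.9],
    "Refunds are issued within 5 days of cancellation.": [0.7, 0.6, 0.1],
    "How do I cancel my subscription?": [1.0, 0.2, 0.0],
}

def embed(text):
    """Hypothetical embedding call; a real pipeline would invoke a model here."""
    return PLACEHOLDER_VECTORS[text]

def retrieve(question, index, k=2):
    """Embed the question, then rank stored documents by cosine similarity."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Assemble the retrieved documents into context for the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How do I cancel my subscription?"
documents = [text for text in PLACEHOLDER_VECTORS if text != question]
index = [(doc, embed(doc)) for doc in documents]  # indexing: embed and store documents
chunks = retrieve(question, index, k=2)           # query time: embed question and search
prompt = build_prompt(question, chunks)           # this prompt would be sent to the LLM
print(prompt)
```

With these placeholder vectors the plan-termination and refund documents are retrieved and the office-hours document is left out, even though none of them contains the word "subscription".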

06. Common Misconceptions

"Embeddings are just word vectors."
Early embeddings like Word2Vec (2013) operated at the word level: a single vector per word, regardless of context. Modern sentence embeddings and document embeddings capture full-context meaning. "Bank" in a financial context produces a different embedding than "bank" in a river context.

"More dimensions are always better."
Higher dimensions can capture more nuance but increase storage, memory, and query latency. A 512-dimensional model can match or outperform a 1536-dimensional one on many practical retrieval tasks when its training data matches the use case, and ANN search is faster over smaller vectors.

"Vector search replaces keyword search."
In practice, hybrid search (combining vector similarity with keyword matching) outperforms pure vector search for most production retrieval tasks. Keywords catch exact product codes, names, and identifiers that semantic search can miss.
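There are several ways to combine the two result lists. The sketch below assumes reciprocal rank fusion (RRF), one common choice, with the conventional constant k=60. The document IDs and both rankings are invented for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document scores the sum of 1 / (k + rank) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for one query from the two retrievers.
vector_ranking = ["faq_cancel", "faq_refund", "sku_ab123"]
keyword_ranking = ["sku_ab123", "faq_cancel"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Documents that appear in both lists rise to the top: "faq_cancel" and "sku_ab123" both outrank "faq_refund", which only the vector retriever returned. RRF uses ranks rather than raw scores, so it does not need the two retrievers' scores to be on a comparable scale.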

"Similar embeddings mean identical meaning."
Embeddings measure semantic proximity, not identity. High cosine similarity means the model judged the texts conceptually related, not logically equivalent. Two contradictory statements on the same topic can produce similar embeddings.

