03. How It Works
Collaborative filtering
Collaborative filtering (CF) recommends items based on the behavior of similar users or items, without needing to understand what an item is.
User-based CF: Find users with similar rating or interaction histories, then recommend items they liked that the target user has not seen. Similarity is typically computed with cosine similarity or Pearson correlation. The main drawback is scalability: computing pairwise similarity across millions of users is expensive.
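As a rough illustration of the user-based procedure, the sketch below computes cosine similarity over sparse rating dictionaries and scores unseen items by the ratings of the nearest neighbors. The `ratings` data, the function names, and the choice of `k` are all hypothetical; a real system would not compare every pair of users this way.

```python
import math

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 4, "B": 5, "D": 5},
    "carol": {"A": 1, "C": 5, "E": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (missing entries count as 0)."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_user_based(target, ratings, k=1):
    """Rank items the target has not rated by similarity-weighted neighbor ratings."""
    neighbors = sorted(
        ((cosine(ratings[target], r), user)
         for user, r in ratings.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbors:
        for item, rating in ratings[user].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_user_based("alice", ratings))       # nearest neighbor only
print(recommend_user_based("alice", ratings, k=2))  # two nearest neighbors
```

The loop over every other user inside `recommend_user_based` is exactly the pairwise cost that makes this approach hard to scale.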
Item-based CF: Compute similarity between items based on how users rated them. Amazon popularized this approach. It is more stable over time because item-item relationships change less frequently than user-item interactions.
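A minimal sketch of the item-based variant follows, assuming a small hypothetical `ratings` table. It transposes the data so that each item is described by the users who rated it, precomputes an item-item similarity table, and scores unseen items against the ones a user already rated. It is an illustration of the general idea, not Amazon's production algorithm.

```python
import math
from collections import defaultdict

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob":   {"A": 4, "B": 5, "C": 2},
    "carol": {"B": 1, "C": 5, "D": 4},
    "dave":  {"C": 4, "D": 5},
}

def item_vectors(ratings):
    """Transpose user -> item ratings into item -> {user: rating}."""
    items = defaultdict(dict)
    for user, rated in ratings.items():
        for item, r in rated.items():
            items[item][user] = r
    return items

def cosine(u, v):
    shared = set(u) & set(v)
    dot = sum(u[x] * v[x] for x in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def item_similarities(ratings):
    """Item-item similarity table; it can be computed offline and reused."""
    vecs = item_vectors(ratings)
    return {a: {b: cosine(vecs[a], vecs[b]) for b in vecs if b != a} for a in vecs}

def recommend_item_based(user, ratings, sims):
    """Score unseen items by their similarity to items the user rated."""
    seen = ratings[user]
    scores = defaultdict(float)
    for item, r in seen.items():
        for other, s in sims[item].items():
            if other not in seen:
                scores[other] += s * r
    return sorted(scores, key=scores.get, reverse=True)

sims = item_similarities(ratings)
print(recommend_item_based("alice", ratings, sims))
```

Because `sims` depends only on item columns, it can be refreshed on a slower schedule than user activity arrives, which is one way to read the stability argument above.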
Matrix factorization: The dominant CF method since the Netflix Prize (2006-2009). The user-item interaction matrix is decomposed into two lower-dimensional matrices: one containing user latent factors and one containing item latent factors. The dot product of a user's latent vector and an item's latent vector predicts the interaction. Simon Funk's implementation of this approach, later called Funk MF, achieved highly competitive RMSE on the Netflix Prize dataset early in the competition. SVD++ extends Funk MF by incorporating implicit feedback (clicks, views) alongside explicit ratings.
CF has three well-known problems: cold start, sparsity, and scalability. In the cold-start case, a brand-new user or item has no interaction history, so the model cannot generate useful embeddings for it.
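The factorization idea can be sketched with plain stochastic gradient descent on the observed ratings. The data, the hyperparameters (`lr`, `reg`, `epochs`, two latent factors), and the function names below are assumptions chosen for illustration. The sketch updates all factors together, a common simplification, and omits the bias terms and implicit-feedback terms that fuller models add.

```python
import math
import random

# Hypothetical toy data: (user_index, item_index, rating).
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
]

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    sq = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))

def init_factors(n_rows, k, rng):
    """Small random starting values for an n_rows x k factor matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]

def train_mf(ratings, P, Q, lr=0.02, reg=0.02, epochs=1000):
    """SGD on squared error with L2 regularization, over observed entries only."""
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(len(P[u])):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

rng = random.Random(0)
P = init_factors(3, 2, rng)  # 3 users, 2 latent factors
Q = init_factors(4, 2, rng)  # 4 items, 2 latent factors
before = rmse(P, Q, ratings)
train_mf(ratings, P, Q)
after = rmse(P, Q, ratings)
print(f"training RMSE before: {before:.3f}, after: {after:.3f}")
print(f"predicted rating for user 0, unseen item 3: {predict(P, Q, 0, 3):.2f}")
```

Only observed entries contribute to the updates, so the learned factors can then score any user-item pair, including pairs with no rating.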
Content-based filtering
Content-based methods recommend items similar to those the user has already interacted with, using item features rather than other users' behavior. Pandora's Music Genome Project tagged each song with up to 450 attributes and built stations by finding songs whose attribute vectors were similar to those of songs a user liked. The approach handles cold start on items well (a new song can be tagged immediately) but tends toward over-specialization: it rarely recommends genuinely surprising items.
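The attribute-vector matching can be sketched as follows. The four attributes and their values are invented for illustration and are not taken from the Music Genome Project; the user profile is assumed to be the simple average of liked songs' vectors.

```python
import math

# Hypothetical attribute vectors: [tempo, acoustic, vocal_intensity, distortion].
songs = {
    "folk_ballad":    [0.2, 0.9, 0.6, 0.0],
    "acoustic_pop":   [0.5, 0.8, 0.7, 0.1],
    "thrash_metal":   [0.9, 0.0, 0.8, 1.0],
    "new_indie_folk": [0.3, 0.9, 0.5, 0.1],  # newly tagged, no interactions yet
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(liked, songs):
    """Average the attribute vectors of the songs the user liked."""
    vecs = [songs[s] for s in liked]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def recommend_content_based(liked, songs):
    """Rank songs the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked, songs)
    candidates = [s for s in songs if s not in liked]
    return sorted(candidates, key=lambda s: cosine(profile, songs[s]), reverse=True)

print(recommend_content_based(["folk_ballad", "acoustic_pop"], songs))
```

The new song ranks first despite having no listening history, which is the item cold-start advantage. The metal track ranks last, which is the over-specialization tendency seen from the other side.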
Hybrid systems
Most production systems combine collaborative and content-based methods. Netflix uses collaborative signals (what similar users watched) alongside content signals (genre, director, cast) and contextual signals (device, time of day). Hybrid approaches outperform pure methods on both accuracy and cold-start resilience, per Ricci et al.'s Recommender Systems Handbook (2022).
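One simple way to combine signals is a weighted blend of per-item scores, sketched below. The weights, the assumption that each score lies in [0, 1], and the fallback rule for items with no collaborative signal are all hypothetical. This is not a description of how Netflix or any other platform actually blends its signals.

```python
def hybrid_score(cf_score, content_score, context_score, weights=(0.6, 0.3, 0.1)):
    """Blend three per-item scores; pass cf_score=None for cold-start items."""
    w_cf, w_content, w_ctx = weights
    if cf_score is None:
        # No collaborative signal yet: renormalize over the remaining signals.
        total = w_content + w_ctx
        return (w_content * content_score + w_ctx * context_score) / total
    return w_cf * cf_score + w_content * content_score + w_ctx * context_score

# An established item with a strong collaborative signal.
print(hybrid_score(0.9, 0.5, 0.2))
# A brand-new item scored from content and context alone.
print(hybrid_score(None, 0.8, 0.4))
```

In practice the blend is often learned by a ranking model rather than set by hand; fixed weights are used here only to keep the example short.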
The two-tower neural architecture
The two-tower model is the dominant architecture for large-scale retrieval. Two neural networks encode users and items independently into a shared vector space. Item embeddings are pre-computed offline and stored in an approximate nearest neighbor (ANN) index (e.g., ScaNN or FAISS). At inference, only the user embedding has to be computed, and the retrieval step is an efficient nearest neighbor search over potentially billions of items.
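The retrieval step can be sketched with an exact, brute-force search standing in for the ANN index. The embeddings and names below are hypothetical. A library such as ScaNN or FAISS would replace the linear scan with an approximate index, and its API is deliberately not shown here.

```python
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_top_k(user_emb, item_index, k=2):
    """Exact maximum-inner-product search; an ANN index approximates this result."""
    return heapq.nlargest(
        k, item_index, key=lambda item_id: dot(user_emb, item_index[item_id])
    )

# Hypothetical pre-computed item embeddings (produced offline by the item tower).
item_index = {
    "item_a": [0.9, 0.1],
    "item_b": [0.1, 0.9],
    "item_c": [0.7, 0.6],
}

# Hypothetical user embedding (produced at request time by the user tower).
user_emb = [1.0, 0.2]
print(retrieve_top_k(user_emb, item_index))
```

The scan costs one dot product per item, which is why catalogs with millions or billions of items rely on approximate indexes instead.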
The user tower typically takes interaction history, demographics, and session context as input. The item tower takes metadata, content embeddings, and popularity signals. The towers are trained jointly to maximize similarity between a user's embedding and embeddings of items they interacted with, and to minimize similarity with negative samples.
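A common way to express this objective is a softmax loss with in-batch negatives: each user's positive item competes against the items paired with other users in the same batch. The sketch below assumes each tower is a single linear layer with hand-picked weights, which is far simpler than a real tower, and it computes the loss only (no gradient step). Corrections for sampling bias are omitted.

```python
import math

def matvec(W, x):
    """Apply a linear layer: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def in_batch_softmax_loss(user_embs, item_embs):
    """Mean cross-entropy where row i of each list forms the positive pair
    and every other row's item acts as a negative for user i."""
    total = 0.0
    for i, u in enumerate(user_embs):
        logits = [dot(u, v) for v in item_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(user_embs)

# Hypothetical tower weights: 3 user features -> 2 dims, 2 item features -> 2 dims.
W_user = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
W_item = [[1.0, 0.0], [0.0, 1.0]]

user_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
item_feats = [[1.0, 0.0], [0.0, 1.0]]  # row i is the item user i interacted with

user_embs = [matvec(W_user, x) for x in user_feats]
item_embs = [matvec(W_item, x) for x in item_feats]

aligned = in_batch_softmax_loss(user_embs, item_embs)
mismatched = in_batch_softmax_loss(user_embs, item_embs[::-1])
print(f"loss with correct pairs: {aligned:.4f}, with swapped pairs: {mismatched:.4f}")
```

Training would adjust `W_user` and `W_item` together to push this loss down, which raises the score of each positive pair relative to the in-batch negatives.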
Yi et al. (2019), "Sampling-bias-corrected neural modeling for large corpus item recommendations" (Google, RecSys 2019), formalized the two-tower retrieval architecture. An earlier precursor is Covington et al. (2016), "Deep neural networks for YouTube recommendations" (RecSys 2016), which introduced the deep candidate-generation-and-ranking pipeline that two-tower retrieval later refined. The architecture is now standard at Google, Meta, Twitter, and most large-scale platforms.
LLM-based and generative recommendation
The newest frontier replaces or augments embedding-based systems with large language models. Two directions:
- LLMs as feature encoders: Use a pre-trained LLM to generate rich item embeddings from text descriptions, replacing hand-crafted content features. This improves zero-shot performance on new items.
- Generative recommendation: Frame recommendation as a sequence generation task. Meta's HSTU (Hierarchical Sequential Transduction Units) architecture treats all user actions as tokens in a generative model, enabling training at trillion-parameter scale. Wikipedia's entry on recommender systems notes that generative recommenders "improve recommendation quality in test simulations and in real-world tests, while being faster than previous Transformer-based systems when handling long lists of user actions."
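To make the sequence-generation framing concrete, the toy sketch below turns user actions into tokens and predicts the next token from bigram counts. The count model is a deliberately tiny stand-in chosen for illustration. It is not HSTU and shares nothing with it beyond the next-token framing, and the event data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def tokenize(events):
    """Flatten (action, item) events into tokens such as 'click:A'."""
    return [f"{action}:{item}" for action, item in events]

def fit_bigram(sequences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def generate_next(counts, history, k=1):
    """Return the k most frequent next tokens after the last token in history."""
    following = counts.get(history[-1], Counter())
    return [tok for tok, _ in following.most_common(k)]

# Hypothetical action logs, one list per user.
logs = [
    [("view", "A"), ("click", "A"), ("view", "B")],
    [("view", "A"), ("click", "A"), ("view", "C")],
    [("view", "D"), ("click", "A"), ("view", "B")],
]

model = fit_bigram([tokenize(events) for events in logs])
print(generate_next(model, ["view:A", "click:A"]))
```

A generative recommender replaces the count table with a large sequence model that conditions on the whole action history, but the input and output have the same shape: a token sequence in, a predicted next action out.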
As of 2025-2026, generative recommenders are in production at Meta and are being evaluated at other large platforms.