Understanding Vectors and Embeddings: The Math Behind AI Search
Vectors and embeddings are fundamental to how modern AI systems understand and search through information. Here's what they are and why they matter.
What is a Vector?
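A vector is simply an ordered list of numbers, such as [0.12, -0.48, 0.91]. In AI, vectors are used to represent the meaning of words, sentences, or documents: texts with similar meanings are represented by similar vectors, that is, vectors that lie close together.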
What are Embeddings?
Embeddings are vectors produced by AI models (often called embedding models) that capture the semantic meaning of text. The sentences "I love cats" and "I adore felines" will have similar embeddings, even though they use different words.
How Semantic Search Works
Traditional search matches keywords. Semantic search matches meaning:
- Convert all documents to embeddings
- Convert the search query to an embedding
- Rank the documents by how similar their vectors are to the query vector, and return the closest matches
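The three steps above can be sketched in a few lines of plain Python. This is a minimal brute-force version under stated assumptions: steps 1 and 2 are taken as already done, so the document and query embeddings are made-up 3-dimensional toy values rather than output from a real model (real embeddings have hundreds or thousands of dimensions), and documents are ranked by cosine similarity, covered below. The names `DOCUMENTS`, `cosine_similarity`, and `semantic_search` are hypothetical helpers for illustration, not part of any library.

```python
import math

# Hypothetical toy embeddings. Real embedding models output vectors with
# hundreds or thousands of dimensions; these 3-D values are invented.
DOCUMENTS = {
    "I love cats": [0.9, 0.1, 0.2],
    "I adore felines": [0.85, 0.15, 0.25],
    "Stock prices fell today": [0.1, 0.9, 0.3],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vector: list[float],
    documents: dict[str, list[float]],
    top_k: int = 2,
) -> list[tuple[str, float]]:
    """Score every document against the query and return the top_k matches."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:top_k]


# Hypothetical query embedding (imagine it came from the query "kittens are great").
query = [0.88, 0.12, 0.22]
for text, score in semantic_search(query, DOCUMENTS):
    print(f"{score:.3f}  {text}")
```

With these toy values the two cat sentences rank above the finance sentence. Scoring every document like this is exact, but the cost grows linearly with the size of the collection, which is why larger systems hand this step to a vector database.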
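Cosine Similarity
The most common way to measure how similar two embeddings are is cosine similarity, which measures the angle between two vectors. It is computed as the dot product of the vectors divided by the product of their lengths: cos(θ) = (A · B) / (‖A‖ × ‖B‖). The result ranges from -1 to 1; a smaller angle gives a value closer to 1, which means the texts are more similar in meaning.
Real-World Applications
- RAG (retrieval-augmented generation) systems: find relevant documents for an LLM to reference
- Recommendation engines: find similar products, articles, or users
- Deduplication: find near-duplicate content
- Classification and clustering: assign items to categories or group similar items together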
Vector Databases
Pinecone, Weaviate, and Qdrant are specialized databases built for storing and searching embeddings efficiently; pgvector adds the same capability to PostgreSQL as an extension.
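To make the idea concrete, here is a toy in-memory store in Python showing the two operations such systems are built around: storing vectors under an ID with some metadata, and querying for the most similar ones. The class `InMemoryVectorStore`, its `upsert` and `query` methods, and the `normalize` helper are hypothetical and do not mirror the API of Pinecone, Weaviate, Qdrant, or pgvector. One design choice worth noting: vectors are scaled to unit length when stored, so cosine similarity at query time reduces to a plain dot product. The search here is exact and brute-force; production vector databases generally rely on approximate nearest-neighbour indexes (pgvector, for example, supports HNSW and IVFFlat) to stay fast at scale.

```python
import math
from typing import Optional


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class InMemoryVectorStore:
    """Hypothetical toy vector store: exact brute-force search, no persistence or indexing."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self._vectors: dict[str, list[float]] = {}
        self._metadata: dict[str, dict] = {}

    def _check_dimension(self, vector: list[float]) -> None:
        # Every stored and queried vector must have the same number of dimensions.
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")

    def upsert(self, item_id: str, vector: list[float], metadata: Optional[dict] = None) -> None:
        """Insert or overwrite an item; the vector is stored already normalized."""
        self._check_dimension(vector)
        self._vectors[item_id] = normalize(vector)
        self._metadata[item_id] = metadata or {}

    def query(self, vector: list[float], top_k: int = 3) -> list[tuple[str, float, dict]]:
        """Return (id, cosine similarity, metadata) for the top_k most similar items."""
        self._check_dimension(vector)
        q = normalize(vector)
        # With both sides at unit length, the dot product equals cosine similarity.
        results = [
            (item_id, sum(a * b for a, b in zip(q, stored)), self._metadata[item_id])
            for item_id, stored in self._vectors.items()
        ]
        results.sort(key=lambda row: row[1], reverse=True)
        return results[:top_k]


# Usage with made-up 3-D vectors.
store = InMemoryVectorStore(dimension=3)
store.upsert("doc-1", [0.9, 0.1, 0.2], {"title": "I love cats"})
store.upsert("doc-2", [0.1, 0.9, 0.3], {"title": "Stock prices fell today"})
print(store.query([0.88, 0.12, 0.22], top_k=1))
```

Normalizing once at write time means each query pays only for a dot product per stored vector, a trade-off that matters more as the collection grows.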
Why This Matters
Understanding embeddings helps you build better RAG systems, optimize search, and debug AI applications. It's a core concept in our Build with LLMs programme.