🧠 What Are Embeddings?
Embeddings are how we turn symbolic data (like words, sentences, images, or even users) into vectors of numbers, in a way that captures meaning, similarity, or structure.
Think of embeddings as giving your data a GPS coordinate in meaning-space.
They're the bridge between raw input and neural network understanding.
🔢 Why Not Just Use Words Directly?
Computers can't understand raw text like "dragon" or "honor." We need to turn them into numbers. One way is one-hot encoding:
```python
dragon = [0, 0, 0, 1, 0, 0, 0]
honor  = [0, 1, 0, 0, 0, 0, 0]
```
But that doesn't say anything about how similar "dragon" and "honor" are: every pair of distinct one-hot vectors is orthogonal, so every word looks equally unrelated to every other word.
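To see the problem concretely, here is a minimal sketch in Python; the vocabulary and its ordering are made up to match the 7-dimensional vectors above:

```python
import numpy as np

# Illustrative vocabulary; the ordering is arbitrary.
vocab = ["sword", "honor", "castle", "dragon", "knight", "queen", "shield"]

def one_hot(word: str) -> np.ndarray:
    """Return a vector that is 1 at the word's index and 0 everywhere else."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

dragon = one_hot("dragon")  # [0, 0, 0, 1, 0, 0, 0]
honor = one_hot("honor")    # [0, 1, 0, 0, 0, 0, 0]

# Distinct one-hot vectors always have a dot product of 0:
# "dragon" looks exactly as unrelated to "honor" as to any other word.
print(dragon @ honor)  # 0.0
```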
💫 Embeddings Fix That
An embedding maps each item (like a word) to a dense vector of real numbers. The vector is learned by a model and captures semantic or structural relationships.
So instead of landing at arbitrary positions, "dragon" and "honor" end up closer together in vector space if they appear in similar contexts.
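Here is a minimal sketch of what "learned by a model" looks like in code, using PyTorch's `nn.Embedding` as one common implementation (the toy vocabulary and the sizes are made up):

```python
import torch
import torch.nn as nn

# Toy vocabulary: each word gets an integer ID.
vocab = {"dragon": 0, "honor": 1, "queen": 2, "banana": 3}

# A trainable lookup table: one row per word, 8 numbers per row.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

ids = torch.tensor([vocab["dragon"], vocab["honor"]])
vectors = embedding(ids)  # shape (2, 8): one dense vector per word
print(vectors.shape)

# The rows start out random. During training, gradients flow into
# embedding.weight, and words that appear in similar contexts end up
# with similar rows.
```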
🔧 Embeddings In Practice
| Domain | Embedding Example |
|---|---|
| NLP | Words → vectors (Word2Vec, GloVe, BERT) |
| Images | Images → vectors (ResNet, CLIP) |
| Recommenders | Users & items → embedding vectors |
| Graphs | Nodes → embeddings (e.g., Node2Vec, GNNs) |
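To make the NLP row concrete, here is a sketch using the sentence-transformers library; this is just one option among many, and the model name is simply a commonly used small text encoder:

```python
from sentence_transformers import SentenceTransformer

# A small pretrained text encoder; any embedding model would work here.
model = SentenceTransformer("all-MiniLM-L6-v2")

vectors = model.encode([
    "a dragon guarding its hoard",
    "a knight bound by honor",
])
print(vectors.shape)  # (2, 384): one dense vector per sentence
```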
These vectors are often used to:
- Compare similarity (cosine distance, dot product; see the sketch after this list)
- Feed into deeper neural layers
- Store in vector databases for fast search (e.g., RAG)
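As a sketch of the first and last bullets, here is cosine similarity plus brute-force nearest-neighbor search over a tiny in-memory "index"; the vectors are made up, and a real system would get them from a model and store them in a vector database:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Tiny in-memory "index" of item embeddings (made-up 4-dimensional vectors).
index = {
    "a dragon guarding its hoard": np.array([0.9, 0.1, 0.3, 0.0]),
    "a knight bound by honor":     np.array([0.7, 0.2, 0.5, 0.1]),
    "a recipe for banana bread":   np.array([-0.2, 0.8, 0.1, -0.1]),
}

# Pretend this vector is the embedding of the query "tell me about dragons".
query = np.array([0.8, 0.1, 0.4, 0.0])

# Rank every stored item by similarity to the query.
for text, vec in sorted(index.items(),
                        key=lambda kv: cosine_similarity(query, kv[1]),
                        reverse=True):
    print(f"{cosine_similarity(query, vec):.3f}  {text}")
```

Vector databases do essentially this, but with approximate nearest-neighbor indexes so the search stays fast at millions of vectors.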
🧪 Example: Word Embeddings
| Word | Embedding Vector (truncated) |
|---|---|
| king | [0.61, -0.12, 0.43, ..., 0.09] |
| queen | [0.59, -0.11, 0.45, ..., 0.10] |
| banana | [-0.28, 0.77, 0.13, ..., -0.09] |
You can see:
- `king` and `queen` are close
- `banana` is somewhere else entirely
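You can check that numerically with cosine similarity. The vectors below are short made-up stand-ins for the truncated ones in the table:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Short illustrative stand-ins for the truncated vectors above.
king   = np.array([0.61, -0.12, 0.43, 0.09])
queen  = np.array([0.59, -0.11, 0.45, 0.10])
banana = np.array([-0.28, 0.77, 0.13, -0.09])

print(cos(king, queen))   # ~1.0: nearly the same direction
print(cos(king, banana))  # much lower (negative here): a different direction
```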
📌 TL;DR: What Are Embeddings?
- A vector representation of some discrete input (like a word)
- Learned by a model to capture relationships
- Useful for search, similarity, classification, and as input to neural networks