A deep dive into Matryoshka embeddings, dimension reduction economics, and why pizza statistically belongs closer to video games than a controller does.
OpenAI's third-generation models introduce dynamic dimensionality via Matryoshka learning: a complete departure from the rigid fixed-size vectors of the past.
| Model | Max Dimensions | MTEB English | MIRACL Multilingual | Price / 1M tokens |
|---|---|---|---|---|
| text-embedding-3-small (3rd gen) | 1,536 | 62.3 | 44.0 | $0.02 |
| text-embedding-3-large (3rd gen) | 3,072 | 64.6 | 54.9 | $0.13 |
Matryoshka Representation Learning (MRL) makes dimension-flexible embeddings possible. Understanding it reshapes how you think about vector storage.
A 512×512 and a 1024×1024 photo both show you a leather boot; one just has more detail. But do you need 10,000 px to recognize it's a boot? At some point, extra resolution stops adding meaningful information. The first few hundred dimensions capture the broad ontological category: enough for most retrieval tasks.
Two cameras, both set to 512×512. One is a basic smartphone; the other a professional DSLR. Same resolution, but the DSLR captures better color fidelity, texture, and depth. The Large model is that DSLR: more parameters and a deeper network, packing 768 dimensions with higher-quality representations than Small does.
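The trim-and-keep-the-prefix trick is mechanically simple. A minimal sketch with NumPy, using a random placeholder vector instead of a real model output (the third-generation API can also do this server-side via its `dimensions` parameter):

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep the first `dims` coordinates of a Matryoshka embedding,
    then re-normalize to unit length so cosine math still behaves."""
    v = np.asarray(vec, dtype=np.float64)[:dims]
    return v / np.linalg.norm(v)

# Placeholder standing in for a full 3,072-dim text-embedding-3-large output.
rng = np.random.default_rng(0)
full = rng.normal(size=3072)
full /= np.linalg.norm(full)

short = truncate_embedding(full, 768)
print(short.shape)            # (768,)
print(np.linalg.norm(short))  # ~1.0
```

Re-normalizing after the cut matters: cosine similarity assumes unit-length vectors, and the truncated prefix alone is slightly shorter than 1.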
Query: "Video game". Results ordered by cosine similarity (most similar first), extracted directly from the experiment. Two stories worth telling.
A quarter of the max storage. Better results. The model's internal architecture matters more than dimension count.
Half the storage. Identical ranking. This is Matryoshka learning doing exactly what it promises.
Query: "Video game" tested against 11 candidate terms, across both models and four dimension sizes, using Euclidean distance and Cosine similarity.
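That experiment loop can be sketched as follows. The vectors here are random stand-ins (a real run would fetch embeddings from the API) and the candidate names are illustrative, not the actual 11 terms:

```python
import numpy as np

def rankings(query, candidates, dims, metric="cosine"):
    """Rank candidate names against the query at a given truncation size."""
    q = query[:dims] / np.linalg.norm(query[:dims])
    scored = []
    for name, vec in candidates.items():
        c = vec[:dims] / np.linalg.norm(vec[:dims])
        if metric == "cosine":
            score = -(q @ c)              # higher similarity -> smaller score
        else:                             # Euclidean distance on unit vectors
            score = np.linalg.norm(q - c)
        scored.append((score, name))
    return [name for _, name in sorted(scored)]

rng = np.random.default_rng(1)
query = rng.normal(size=1536)
candidates = {w: rng.normal(size=1536)
              for w in ["controller", "console", "pizza", "keyboard"]}

for dims in (1536, 768, 384, 256):        # four truncation sizes
    print(dims, rankings(query, candidates, dims))
```

One detail worth noting: on unit vectors, Euclidean distance is a monotone function of cosine distance, so the two metrics always produce the same ranking order.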
Storage Win

Half the storage. Half the vector length. Identical semantic rankings in this context. For product-catalog matching tasks, you can cut your vector database footprint in half with zero measurable loss in retrieval quality.

⚠️ Tested on product names in an e-commerce context. More nuanced tasks, like matching paragraphs of dense legal or academic text, may reveal differences at higher dimensions.

Architecture Wins

One-eighth of the Large model's max storage, and it still produces better rankings. This confirms the camera-sensor analogy: the model's internal architecture dominates raw dimension count. The Small model, fully unrolled, can't match Large's compressed precision in this domain.

⚠️ This finding applies to short, categorical terms (product names, labels). For longer, denser inputs the gap may narrow or shift.

Mind-bending

A controller is literally a video game accessory. Pizza is food. Yet the model places pizza closer in the vector space, and once you understand how embeddings actually work, this makes complete sense.
Query: "Video game" · Model: text-embedding-3-large · 768 dimensions · Cosine distance (lower = more similar).
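For reference, the quantity plotted above is cosine distance, which any embedding library computes the same way:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (identical direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```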
Ontologically, pizza and video games share zero overlap. Yet in the statistical geography of human language, across billions of forum posts, stories, and conversations, they're neighbors.
Semantic similarity: direct synonyms or near-identical meaning. Words that can substitute for each other without changing the core meaning.
Semantic relatedness: linked by function, culture, or co-occurrence; not necessarily alike in meaning, but appearing in the same situations.
The word "controller" appears across vastly different contexts in billions of training examples. Its vector is pulled in many directions simultaneously, diluting its gravitational pull toward gaming.
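A toy illustration of that dilution, with three hand-picked "sense" directions standing in for the many contexts a real training corpus would supply (all vectors and weights here are invented for the sketch):

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Invented orthogonal "context" directions.
gaming  = np.array([1.0, 0.0, 0.0])
music   = np.array([0.0, 1.0, 0.0])   # e.g. MIDI controllers
finance = np.array([0.0, 0.0, 1.0])   # e.g. financial controllers

# "pizza" co-occurs mostly with one cluster of casual-leisure contexts;
# "controller" is pulled toward three unrelated clusters at once.
pizza      = unit(0.9 * gaming + 0.1 * music)
controller = unit(gaming + music + finance)

print(gaming @ pizza)        # ~0.99: stays close to the gaming direction
print(gaming @ controller)   # ~0.58: averaged across senses, pulled away
```

The averaged multi-sense vector ends up farther from the gaming direction than the single-sense one, even though "controller" is the more gaming-specific word.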
API inference costs are often negligible at enterprise scale. The compounding cost lives in storage, RAM, and query latency, and that's where dimension reduction pays off.
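The storage arithmetic is easy to check. Assuming float32 vectors (4 bytes per dimension) and a hypothetical 100-million-document corpus, ignoring index overhead:

```python
def storage_gb(n_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    """Raw vector storage in gigabytes (index overhead not included)."""
    return n_vectors * dims * bytes_per_dim / 1e9

N = 100_000_000  # hypothetical corpus size
for dims in (3072, 1536, 768, 384):
    print(f"{dims:>5} dims: {storage_gb(N, dims):,.1f} GB")
# 3072 dims -> 1,228.8 GB; truncating to 384 drops that to 153.6 GB
```

Every halving of dimensions halves not just disk, but the RAM an in-memory index needs and the bytes each query must touch, which is where the latency win comes from.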