Why We Used pgvector Instead of Pinecone for Ad-Library Semantic Search

Every ad AdWhispr ingests gets an embedding. When you ask "show me their pain-point hooks" or "which ads lean on social proof," we don't keyword-match — we run a semantic search over the meaning of the copy. That requires a vector index. The default advice in 2026 is still "spin up Pinecone." We didn't. We store ad embeddings in the same Postgres database that holds the ad rows, using the pgvector extension, and we'd make the same call again.

This is the engineering reasoning behind pgvector vs Pinecone for ad search at our scale — what we gained, what we gave up, and the exact point where we'd flip.

What the workload actually is

Before picking a database, it helps to be honest about the shape of the problem. AdWhispr's vector workload is not a billion-vector recommendation engine. It looks like this:

Dimension	Reality at AdWhispr
Embedding model	OpenAI `text-embedding-3-small`, 1536 dimensions
Vectors per brand	Tens to a few hundred active ads, sometimes low thousands
Total corpus	Hundreds of brands, not hundreds of millions of ads
Query pattern	Almost always scoped to one brand (or a small compare set)
Write pattern	Bursty — a full ad library lands during ingestion, then daily deltas
Read latency need	Sub-100ms inside a chat turn that's already streaming an LLM

The single most important line in that table is "almost always scoped to one brand." We are virtually never doing an unfiltered nearest-neighbor search across the entire corpus. We're doing WHERE brand_id = $1 first, then ranking by vector distance inside that brand's few hundred ads. That metadata pre-filter changes everything about which tool is the right one.

The case for pgvector: one database, one source of truth

The ad rows already live in Postgres. Hook taxonomy, format, tone, first-seen and last-seen dates, daily snapshots, engagement counts — all relational, all in one place. Putting the embedding in a separate vector service means every ad now lives in two systems that have to be kept in sync.

That sync problem is not theoretical. It's the part nobody puts in the architecture diagram and everybody debugs at 1am.

Transactional consistency. When we ingest a brand, the embedding is written in the same transaction as the ad row. If the transaction rolls back, there is no orphan vector pointing at an ad that doesn't exist, and no ad row whose vector silently failed to land in a remote service. With an external vector DB you get dual-write semantics: write the row to Postgres, write the vector to Pinecone, and pray the second call doesn't fail after the first succeeded. You end up building a reconciliation job to find the drift. We build no such job, because there is no drift.

Filter-then-search is just a SQL query. Because our hot path is "this brand's ads, ranked by semantic similarity," the ideal query joins a metadata filter and a vector sort:

SELECT id, primary_text, days_running
FROM ads
WHERE brand_id = $1
  AND ad_delivery_stop_time IS NULL
ORDER BY embedding <=> $2
LIMIT 12;

That's one round trip. The planner uses the brand_id filter to shrink the candidate set, then orders by cosine distance (<=>). In a separate vector store, metadata filtering is a feature you configure and pay for, and combining it with our own relational facts — like joining against the daily snapshot history that powers our days-running signal — means pulling IDs out of the vector DB and re-querying Postgres anyway. Two systems, two round trips, more code.

Operational simplicity. We already back up Postgres, monitor Postgres, run migrations against Postgres, and reason about Postgres connection limits on serverless. Adding pgvector added one extension and one column. It added zero new dashboards, zero new API keys to rotate, zero new vendor status pages to watch during an incident, and zero new line items on a bill.

Cost. A managed vector database is a second piece of always-on infrastructure with its own pricing curve. At our vector count, the marginal storage and compute for embeddings inside our existing Postgres instance is rounding-error territory. We're not paying a separate service to hold a few hundred thousand vectors we could index in the database we already run.

The honest tradeoffs

This isn't a free lunch, and pretending it is would be exactly the kind of thing this blog refuses to do. Here's what pgvector costs you.

pgvector isn't a first-class Prisma citizen. Prisma's schema language doesn't model the vector(1536) type, so the embedding column and its index are created with raw SQL, and reads/writes that touch the vector go through raw queries rather than the typed ORM. It's a sharp edge you have to remember exists.
Index tuning is on you. pgvector gives you HNSW and IVFFlat indexes, but you own the recall-versus-speed knobs (m, ef_construction, ef_search for HNSW). A managed service hides those decisions behind a slider or removes them entirely.
Write-heavy index maintenance shares the database. A big rebuild competes with your transactional workload for the same CPU and I/O. Our writes are bursty and our corpus is small, so this is fine; at constant high write volume it wouldn't be.
Horizontal scale is Postgres's scale. pgvector rides on whatever you've done to scale Postgres — read replicas, a bigger box, a serverless Postgres tier. There's no independent, near-infinite vector tier you can scale separately from your relational data.

When we'd reach for Pinecone instead

Choosing pgvector is a function of scale and access pattern, not ideology. We'd switch — or run a dedicated vector DB alongside Postgres — the moment any of these became true:

Corpus in the tens of millions of vectors and growing. When the index no longer fits comfortably in memory next to your relational working set, a purpose-built engine that's designed to shard and scale vectors independently earns its keep.
Unfiltered global similarity is the hot path. If our main query became "find the most similar ad across every brand we've ever ingested" with no metadata pre-filter, the dedicated systems' approximate-nearest-neighbor performance would pull ahead decisively.
You want vector scaling fully decoupled from your OLTP database. Managed scaling, zero index-maintenance ops, and not having a giant index fighting your transactional queries for resources is a real, buyable benefit — you're paying to make those problems someone else's.

None of those are true for us today. Recall on a few hundred per-brand candidates is excellent, latency is well inside the budget of a chat turn, and the operational surface stays small. The decision tracks the workload, and the workload is small-N, heavily filtered, and tightly coupled to relational facts we own.

The deeper reason: the relational data IS the product

There's a strategic angle under the engineering one. AdWhispr's edge isn't the embedding — anyone can call OpenAI. The edge is the history: the daily snapshots that let us compute how long an ad has been running, the creative-iteration rate derived from first-seen dates, the engagement counts we scrape and join. Those are relational, time-series, and ours. The semantic search is a lens we point at that data, not a separate product.

Keeping the vectors in the same database as the signal they describe means a single query can rank ads by meaning and filter by proven longevity and join engagement in one pass. Splitting the embedding into a remote service would have severed the vector from exactly the facts that make the result worth reading. For us, that coupling is the whole point — and it's why pgvector won.

We default to the boring, consolidated choice and only add infrastructure when the workload forces it. So far, it hasn't.

Curious what semantic search over a competitor's real ad library actually surfaces? Try it at adwhispr.com, or read more engineering notes on the blog.