Extracting Meta Ad Creatives from render_ad URLs with Puppeteer

The Meta Ad Library API hands you a wall of metadata — page name, ad copy, spend ranges, impression ranges, delivery dates — and then it withholds the one thing you actually want to look at: the creative. There is no image_url, no video_url, no CDN link to the thing a human saw in their feed. What you get instead is ad_snapshot_url, a link to a JavaScript-rendered single-page app, plus a sibling URL pattern that almost nobody documents but that solves the whole problem: render_ad.

This is the part of building competitor-ad tooling that nobody warns you about. Below is exactly how to extract Meta ad creative with Puppeteer — the right target URL, the DOM extraction logic, and the filters that keep junk out of your database.

Why the public Ad Library page is the wrong target

Your first instinct is to point a headless browser at the public ad you can see in your own browser:

https://www.facebook.com/ads/library/?id={metaAdId}

Do not do this. That URL renders the full Ad Library experience — the Facebook chrome, a left rail, a sidebar, and frequently a grid of other promoted ads from the same page. When you scrape the largest image off that DOM, you have no guarantee you grabbed the ad you asked for. Worse: when the specific ad has stopped delivering and its creative is gone, the page falls back to showing sibling content. Puppeteer dutifully grabs the largest visible image, which is now some unrelated sidebar ad.

The failure mode is silent and ugly. You run a batch of 100 ads for a brand, and a chunk of them all come back with the same wrong thumbnail — whatever happened to be the most prominent image in the chrome. You don't notice until you're staring at a dashboard where six different ads share one picture.

The fix is to stop scraping the page meant for humans and scrape the endpoint meant for rendering one ad in isolation.

The render_ad URL: one ad, no chrome

Meta exposes a rendering endpoint that returns a single ad creative with nothing else around it:

https://www.facebook.com/ads/archive/render_ad/?id={adId}&access_token={token}

Pass the ad's archive ID and a valid Ad Library access token, and you get back an isolated iframe-style render: just the creative, the headline, the CTA button. No sidebar. No promoted grid. No page chrome. Because there is exactly one creative on the page, "grab the biggest media element" becomes a reliable instruction instead of a gamble.

Two notes before you wire it up:

Treat the token as a secret. It belongs in an environment variable and never in client code, logs, or a committed file. Rotate it on Meta's 60-day expiry cycle.
It's still an SPA. The creative is injected by JavaScript after load, so a plain HTTP fetch returns an empty shell. You need a real browser to execute the page. That's where Puppeteer comes in.
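
As a minimal sketch of both notes, the helper below builds the render_ad URL with an encoded token and masks the token before a URL is written to a log. The names buildRenderAdUrl and redactToken are illustrative, and the check that archive IDs are digit-only is an assumption rather than something Meta guarantees.

```typescript
const RENDER_AD_BASE = "https://www.facebook.com/ads/archive/render_ad/";

// Hypothetical helper: build the render_ad URL for one ad.
// Assumes archive IDs are digit-only strings; relax the check if yours are not.
function buildRenderAdUrl(adId: string, token: string): string {
  if (!/^\d+$/.test(adId)) {
    throw new Error(`Unexpected ad id: ${adId}`);
  }
  if (token.length === 0) {
    throw new Error("Missing access token");
  }
  return `${RENDER_AD_BASE}?id=${adId}&access_token=${encodeURIComponent(token)}`;
}

// Hypothetical helper: mask the token so the URL is safe to log.
function redactToken(url: string): string {
  return url.replace(/(access_token=)[^&#]*/, "$1[REDACTED]");
}
```

Where the token comes from is left to the caller, for example an environment variable read once at startup. Log redactToken(url), never the raw URL.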

Driving headless Chromium

The job is small: open the page, wait for the creative to paint, read the DOM, close. The function below launches its own browser to stay self-contained; across a batch, launch once and reuse the instance so you're not paying cold-start cost per ad.

import puppeteer from "puppeteer";

const RENDER_BASE = "https://www.facebook.com/ads/archive/render_ad/";

export async function extractCreative(adId: string, token: string) {
  const browser = await puppeteer.launch({
    headless: true,
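    // Sandbox disabled for containers running as root; drop these flags where the Chromium sandbox works.
    args: ["--no-sandbox", "--disable-setuid-sandbox"],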
  });

  try {
    const page = await browser.newPage();
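    // Encode the query values so stray characters cannot corrupt the URL.
    const query = `id=${encodeURIComponent(adId)}&access_token=${encodeURIComponent(token)}`;
    const url = `${RENDER_BASE}?${query}`;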
    await page.goto(url, { waitUntil: "networkidle2", timeout: 30000 });

    // Give the SPA a beat to inject the creative.
    await page.waitForSelector("img, video", { timeout: 10000 }).catch(() => {});

    return await page.evaluate(extractFromDom);
  } finally {
    await browser.close();
  }
}

networkidle2 resolves once there have been no more than two open network connections for at least 500 ms, which for this endpoint correlates well with "the creative has loaded." The waitForSelector is a belt-and-suspenders guard for the slow case. Its rejection is swallowed so that a missing creative degrades to "no media" instead of throwing.
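
The function above launches a fresh browser for every ad, which is simple but pays the cold-start cost mentioned earlier. One way to batch, sketched here under assumptions: launch a single browser yourself, then fan the ad IDs out through a small concurrency limiter. mapWithLimit is a hypothetical helper, not a Puppeteer API, and the per-ad worker is left to the caller.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Results come back in input order. If any worker rejects, the returned promise rejects.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage sketch (extractWithBrowser is hypothetical: extractCreative refactored
// to open a page on an already-launched browser instead of launching its own):
//
//   const browser = await puppeteer.launch({ headless: true });
//   try {
//     const results = await mapWithLimit(adIds, 4, (id) =>
//       extractWithBrowser(browser, id, token));
//   } finally {
//     await browser.close();
//   }
```

A limit of a few pages per browser is a reasonable starting point; the right number depends on memory and on how hard you are willing to hit the endpoint.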

Extracting video: src and poster

Video ads are the easy win, because a video ad renders an actual <video> element and the element carries both the playable file and a preview frame:

video.src — a direct fbcdn / fna MP4 URL. Store this as your playback source.
video.poster — a scontent image URL. Store this as the thumbnail.

function extractFromDom() {
  const video = document.querySelector("video");
  if (video && video.src) {
    return {
      type: "video" as const,
      mediaUrl: video.src,
      thumbnailUrl: video.poster || null,
    };
  }
  // ...fall through to image handling
}

That gives you an inline-playable MP4 plus a poster frame, both straight off Meta's CDN. No transcoding, no storage layer required to display it.
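
One caveat the snippet above does not cover: a <video> can carry its file on nested <source> children, in which case video.src is empty and video.currentSrc holds the chosen URL. Players that stream through Media Source Extensions expose a blob: URL instead, which is useless outside the page. Whether render_ad ever does either is not established here, so treat the following as a defensive sketch. VideoSnapshot and pickVideo are hypothetical names; the snapshot is meant to be filled in inside page.evaluate.

```typescript
// Plain-data snapshot of a <video> element, collected inside page.evaluate.
interface VideoSnapshot {
  src: string;          // video.src
  currentSrc: string;   // video.currentSrc
  sourceSrcs: string[]; // src of each nested <source> element
  poster: string;       // video.poster
}

interface VideoResult {
  type: "video";
  mediaUrl: string;
  thumbnailUrl: string | null;
}

// Returns the first storable (http/https) URL, or null when only
// blob: or empty values are present.
function pickVideo(v: VideoSnapshot): VideoResult | null {
  const candidates = [v.currentSrc, v.src, ...v.sourceSrcs];
  const mediaUrl = candidates.find((u) => /^https?:\/\//i.test(u));
  if (!mediaUrl) return null;
  return { type: "video", mediaUrl, thumbnailUrl: v.poster || null };
}
```

A null result here means "a video element exists but nothing storable was found", which is worth logging separately from "no video at all".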

Extracting image: largest CDN img, minus the profile pic

Image ads have no <video>, so you walk the <img> elements and pick the creative. The naive "largest image" heuristic is close but has one trap: the brand's profile picture. It's an img, it's on a CDN, and on a sparse render it can be the only image besides the creative.

Filter by pixel area. Profile pics are small, roughly 100×100, so about 10,000 px² or less. The actual creative is far larger. Pick the largest image whose area clears the threshold:

  const imgs = Array.from(document.querySelectorAll("img"));
  const creative = imgs
    .map((el) => ({
      url: el.currentSrc || el.src,
      area: el.naturalWidth * el.naturalHeight,
    }))
    .filter((c) => c.url && c.area > 10000)        // drop profile pics & icons
    .sort((a, b) => b.area - a.area)[0];           // biggest wins

  return creative
    ? { type: "image" as const, mediaUrl: creative.url, thumbnailUrl: creative.url }
    : { type: "none" as const, mediaUrl: null, thumbnailUrl: null };

Use naturalWidth/naturalHeight (the intrinsic pixel size), not the CSS box — the layout can shrink a large creative, and you want to rank by the real asset. The same area filter that kills the profile pic also kills CTA icons, the Meta watermark, and tracking pixels for free.
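
The heading promises a CDN image, but the snippet above ranks every <img> regardless of host. A hedged variant that also checks the host is sketched below as a pure function over plain data, so it can be unit-tested without a browser. The rule that creative assets are served from hosts ending in fbcdn.net is an assumption; verify it against the URLs you actually see. ImgSnapshot, isCdnUrl, pickCreative, and MIN_AREA are illustrative names.

```typescript
interface ImgSnapshot {
  url: string;    // el.currentSrc || el.src
  width: number;  // el.naturalWidth
  height: number; // el.naturalHeight
}

// Same threshold as the article: a 100×100 profile picture does not clear it.
const MIN_AREA = 10000;

// Assumption: creative assets live on hosts ending in "fbcdn.net".
function isCdnUrl(url: string): boolean {
  try {
    const host = new URL(url).hostname;
    return host === "fbcdn.net" || host.endsWith(".fbcdn.net");
  } catch (_err) {
    return false; // empty or otherwise unparseable URL
  }
}

// Largest CDN-hosted image above the area threshold, or null if none qualifies.
function pickCreative(imgs: ImgSnapshot[]): ImgSnapshot | null {
  const ranked = imgs
    .filter((img) => isCdnUrl(img.url) && img.width * img.height > MIN_AREA)
    .sort((a, b) => b.width * b.height - a.width * a.height);
  return ranked[0] ?? null;
}

// Worked example with made-up URLs: only the 1080×1080 CDN image survives.
const sample: ImgSnapshot[] = [
  { url: "https://scontent.xx.fbcdn.net/profile.jpg", width: 100, height: 100 },
  { url: "https://www.facebook.com/tr?id=1", width: 1, height: 1 },
  { url: "https://static.example.com/banner.png", width: 2000, height: 2000 },
  { url: "https://scontent.xx.fbcdn.net/creative.jpg", width: 1080, height: 1080 },
];
const picked = pickCreative(sample);
```

Collect the ImgSnapshot array inside page.evaluate and run pickCreative in Node; that keeps the selection rule in one testable place.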

Edge cases worth handling

A few things that bite you at scale:

Case	What happens	Handling
Carousel ad	Multiple slides; default extraction grabs only the first/largest	Iterate the carousel DOM nodes and collect each slide; otherwise accept first-slide-only
Dead creative	Ad stopped delivering, render returns empty	Return `type: "none"` and skip — never fall back to the public page
CDN expiry	fbcdn URLs can stop resolving after days/weeks	Persist the bytes to R2/S3 if you need permanence; otherwise re-extract on a schedule
Token expired	render returns an auth wall	Surface as a hard error, not a missing creative — it's a config problem

The CDN-expiry one matters most if you're keeping a historical library. The URLs are signed and ephemeral. For a swipe file you intend to keep, download the asset during extraction and rehost it; for a live dashboard, re-running extraction on your ingest cron is enough.
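
The dead-creative and token-expired rows call for different outcomes, yet a swallowed waitForSelector timeout makes both look like "no media". A rough sketch of keeping them apart follows. The signals are assumptions: that a rejected token shows up as an HTTP 401 or 403 from page.goto, or as a login form in the DOM. Check what the endpoint actually returns for an expired token before relying on either. RenderSignals, classifyRender, and assertUsable are hypothetical names.

```typescript
interface RenderSignals {
  httpStatus: number | null; // status of the response returned by page.goto, null if none
  hasLoginForm: boolean;     // assumed signal, e.g. a password input was found in the DOM
  mediaFound: boolean;       // extraction returned a video or an image
}

type RenderOutcome = "ok" | "dead_creative" | "auth_error";

function classifyRender(s: RenderSignals): RenderOutcome {
  // Assumed auth signals: treat these as a configuration problem, not a missing creative.
  if (s.httpStatus === 401 || s.httpStatus === 403 || s.hasLoginForm) {
    return "auth_error";
  }
  if (s.mediaFound) return "ok";
  // Page loaded, no auth wall, no media: the creative is gone.
  return "dead_creative";
}

// Fail loudly on auth problems so a bad token stops the batch
// instead of quietly recording every ad as having no creative.
function assertUsable(outcome: RenderOutcome): void {
  if (outcome === "auth_error") {
    throw new Error("render_ad rejected the access token; check configuration");
  }
}
```

A "dead_creative" result maps to the type: "none" record from the extraction code, while "auth_error" should abort the run.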

Where this fits in a research pipeline

Creative extraction is plumbing, not the product. The creative tells you what an ad looks like; it can't tell you whether the ad is working. The Ad Library API exposes no CTR, no CPC, no ROAS for a competitor — those numbers live only inside the advertiser's own account, and any tool that displays a competitor's exact ROAS invented it.

The signal you can trust is longevity. An ad that's been delivering for 100+ days is very likely a winner, because few advertisers keep paying to run a loser. But the API returns no history, only a current snapshot. So the history is something you have to build: snapshot every ad daily and read the distribution of run-times over time. That's the layer that turns "here's a video file" into "here's the creative this brand has bet on for four months straight."
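
As a rough sketch of that derivation, assume each daily run stores one record per ad it saw. Run-time then falls out of the first and last dates on which an ad appeared. The Snapshot shape and the runLengths name are hypothetical, not part of any API.

```typescript
interface Snapshot {
  adId: string;
  date: string; // UTC day the ad was observed, formatted "YYYY-MM-DD"
}

interface RunLength {
  adId: string;
  firstSeen: string;
  lastSeen: string;
  daysSeen: number; // inclusive span between first and last observation
}

const DAY_MS = 24 * 60 * 60 * 1000;

function runLengths(snapshots: Snapshot[]): RunLength[] {
  const byAd = new Map<string, { first: string; last: string }>();
  for (const { adId, date } of snapshots) {
    const seen = byAd.get(adId);
    if (!seen) {
      byAd.set(adId, { first: date, last: date });
    } else {
      // ISO dates in this format compare correctly as plain strings.
      if (date < seen.first) seen.first = date;
      if (date > seen.last) seen.last = date;
    }
  }
  // Date-only ISO strings parse as UTC, so the difference is a whole number of days.
  return Array.from(byAd, ([adId, { first, last }]) => ({
    adId,
    firstSeen: first,
    lastSeen: last,
    daysSeen: Math.round((Date.parse(last) - Date.parse(first)) / DAY_MS) + 1,
  }));
}
```

This measures the span between sightings, so an ad that paused and resumed has the gap counted as run-time. Count distinct dates per ad instead if that distinction matters to you.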

That snapshot-and-derive pipeline — render_ad extraction, daily longevity tracking, hook/format/tone tagging, engagement-verified reach — is exactly what AdWhispr runs so you don't have to maintain a Chromium farm yourself. Paste a competitor's Facebook URL and interrogate their entire ad library by chat or straight inside Claude via MCP. More engineering teardowns live on the blog.

Build the extractor for the fun of it — then let AdWhispr keep the snapshots running while you sleep.