get_media_list

Deep media asset discovery with Zyte rendering, variant grouping, and size estimation

Overview

Finds every image, video, font, PDF, and downloadable file on a WordPress site using JavaScript-rendered crawling via Zyte. Extracts media from CSS background-image, srcset, picture elements, OG/Twitter meta tags, favicons, and @font-face declarations. Groups WordPress thumbnail variants to their originals, preserves WP upload directory structure, and estimates total storage size.
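To make the HTML-level sources concrete, here is a minimal sketch covering src, srcset, OG/Twitter meta tags, and favicons, using only the Python standard library. It is not the tool's implementation: MediaExtractor and extract_media are hypothetical names, CSS background-image and @font-face extraction are left out, and splitting srcset on commas is a simplification that would mishandle URLs containing commas.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaExtractor(HTMLParser):
    """Collects media URLs from a subset of the sources described above."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def _add(self, url):
        # Resolve relative URLs against the page URL; skip missing attributes.
        if url:
            self.urls.add(urljoin(self.base_url, url.strip()))

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("img", "video", "audio", "source", "iframe"):
            self._add(a.get("src"))
        if tag in ("img", "source") and a.get("srcset"):
            # srcset is a comma-separated list of "URL descriptor" candidates.
            for candidate in a["srcset"].split(","):
                parts = candidate.split()
                if parts:
                    self._add(parts[0])
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key in ("og:image", "twitter:image"):
                self._add(a.get("content"))
        if tag == "link" and "icon" in (a.get("rel") or "").lower().split():
            self._add(a.get("href"))


def extract_media(html, base_url):
    """Return a sorted list of absolute media URLs found in the HTML."""
    parser = MediaExtractor(base_url)
    parser.feed(html)
    return sorted(parser.urls)
```

Calling extract_media(page_html, page_url) on each fetched page and merging the results would give the raw URL set that later steps group and size.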

How It Works

  1. Discovers pages via XML sitemap (sitemap.xml, sitemap_index.xml, wp-sitemap.xml) before crawling.
  2. Fetches each page with Zyte (JavaScript-rendered) when available, falling back to static fetch.
  3. Extracts media from img, picture, srcset, video, audio, iframe, CSS background-image, OG/Twitter meta, favicons, and @font-face.
  4. Normalizes WordPress thumbnail suffixes (-300x200, -1024x768) to identify base images and groups variants.
  5. Preserves wp-content/uploads/ directory structure in suggested paths (e.g., /public/uploads/2024/03/photo.jpg).
  6. Estimates total storage by sampling file sizes via HEAD requests.
  7. Optionally scans WP export files for media referenced in content and meta fields.
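Steps 4 and 5 can be sketched as follows. This is an illustrative approximation, not the tool's code: the function names, the regular expression, and the /public/uploads target root (taken from the example path in step 5) are assumptions. An original file whose name legitimately ends in a pattern like -1920x1080 would be misclassified as a variant by this simple rule.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Matches a WordPress-style size suffix such as "-300x200" right before the extension.
THUMB_SUFFIX = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")
UPLOADS_MARKER = "/wp-content/uploads/"


def base_image_url(url):
    """Strip the thumbnail size suffix, if any, to get the presumed original."""
    return THUMB_SUFFIX.sub("", url)


def group_variants(urls):
    """Map each presumed original URL to the list of its thumbnail variants."""
    groups = defaultdict(list)
    for url in urls:
        base = base_image_url(url)
        groups[base]  # make sure originals without variants still get an entry
        if base != url:
            groups[base].append(url)
    return dict(groups)


def suggested_path(url, target_root="/public/uploads"):
    """Keep everything after wp-content/uploads/ under the target root."""
    path = urlparse(url).path
    _, marker, rest = path.partition(UPLOADS_MARKER)
    if not marker:
        return None  # not a WordPress upload; the caller decides where it goes
    return f"{target_root}/{rest}"
```

Downloading only the keys of group_variants(...) and writing each to suggested_path(...) keeps the year/month layout while skipping the resized copies.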

Input Parameters

Parameter    Type    Required  Description
site_url     string  required  URL of the WordPress site to scan
export_path  string  optional  Path to a WP export file for additional media discovery
max_pages    number  optional  Maximum pages to scan for media (default: 10)
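As a rough illustration of how these parameters fit together, the sketch below builds an argument dictionary for a get_media_list call. How the tool is actually invoked is not specified here, so build_params is a hypothetical helper; the checks on site_url and max_pages are assumptions rather than documented validation rules. Only the parameter names and the default of 10 come from the table.

```python
def build_params(site_url, export_path=None, max_pages=10):
    """Assemble arguments for get_media_list (hypothetical helper)."""
    # Assumed check: the tool presumably needs an absolute http(s) URL.
    if not site_url.startswith(("http://", "https://")):
        raise ValueError("site_url must be an absolute http(s) URL")
    # Assumed check: scanning fewer than one page makes no sense.
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    params = {"site_url": site_url, "max_pages": max_pages}
    if export_path is not None:
        # Optional: path to a WP export file for additional media discovery.
        params["export_path"] = export_path
    return params


# Example values are placeholders, not real sites or files.
example = build_params(
    "https://shop.example.com",
    export_path="exports/site-export.xml",
    max_pages=50,
)
```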

What You Get Back

Example Use Case

Run this on a WooCommerce site before cloning it. In this example it discovers 450 original product images (with their 1,200 thumbnail variants grouped under them), 12 PDFs, 3 videos, and 8 web fonts, and estimates 2.1 GB of total storage, all with suggested paths that match the original wp-content/uploads/ structure.

Tips

Increase max_pages for sites with lots of media spread across many pages.
Combine with the export file scan to catch media referenced in post content that might not appear on the live site anymore.
The suggested paths preserve WordPress upload directory structure (year/month) for easy migration.
Variant grouping helps you download only original images and regenerate thumbnails on the clone.
Storage estimation samples up to 20 files — accuracy improves with larger image sets.
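The sampling idea behind the storage estimate can be sketched like this. The extrapolation rule (mean sampled size multiplied by the total file count) is an assumption about how such an estimate could work, not a description of the tool's internals, and both function names are hypothetical. The size lookup is passed in as a parameter so the logic can be exercised without network access.

```python
import random
import urllib.request


def head_content_length(url, timeout=10):
    """Return Content-Length from a HEAD request, or None if unavailable."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            value = resp.headers.get("Content-Length")
        return int(value) if value is not None else None
    except (OSError, ValueError):
        # Network errors, HTTP errors, or a malformed header: treat as unknown.
        return None


def estimate_total_bytes(urls, get_size=head_content_length, sample_size=20, seed=None):
    """Estimate total bytes by sampling up to sample_size URLs."""
    urls = list(urls)
    if not urls:
        return 0
    rng = random.Random(seed)
    sample = rng.sample(urls, min(sample_size, len(urls)))
    sizes = [s for s in (get_size(u) for u in sample) if s is not None]
    if not sizes:
        return 0  # nothing measurable; the caller may want to flag this case
    mean_size = sum(sizes) / len(sizes)
    return round(mean_size * len(urls))
```

Because the estimate scales a sample mean, a site that mixes a few large videos with many small images can be badly over- or under-estimated depending on which files land in the sample.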

Related Tools