discover_all_pages

Comprehensive page discovery using sitemaps, crawling, and intelligent mapping

Overview

Before you can clone a site, you need to know every page it has. This tool queues a discovery run using multiple methods — XML sitemap parsing, recursive HTML crawling, and intelligent page mapping — and returns immediately with a job_id. Discovery runs in the background. Use get_migration_status to poll for completion.

How It Works

Creates (or reuses) a cloning job for the source domain and queues the discovery pipeline step.
Returns immediately with job_id and discovery_status — the actual crawl runs in the background.
The crawler parses XML sitemaps, follows internal links recursively, and finds JavaScript-rendered routes.
URLs are normalized and deduplicated; each page is classified by type (homepage, product, blog post, etc.).
Use get_migration_status to track when discovery finishes and how many pages were found.
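The normalization and deduplication step can be pictured with a short Python sketch. The specific rules below are assumptions chosen for illustration: lowercasing the host, dropping default ports and fragments, and ignoring a trailing slash. The same goes for the path-prefix page-type heuristic in the hypothetical classify helper. The crawler's actual rules and page types are not documented here.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed normalization rules; the real crawler's rules may differ.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form of url for deduplication (hypothetical rules)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if ":" in host:  # re-bracket IPv6 literals that .hostname unwrapped
        host = f"[{host}]"
    # Keep the port only when it is not the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    # Treat "/about" and "/about/" as the same page; keep "/" for the root.
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment: "#section" never identifies a separate page.
    return urlunsplit((scheme, host, path, parts.query, ""))


def dedupe(urls):
    """Normalize urls and drop duplicates, preserving first-seen order."""
    seen = set()
    result = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


def classify(url: str) -> str:
    """Rough, hypothetical page-type heuristic based only on the URL path."""
    path = urlsplit(url).path
    if path == "/":
        return "homepage"
    if path.startswith("/product/"):
        return "product"
    if path.startswith("/blog/"):
        return "blog_post"
    return "page"
```

With these rules, dedupe(["https://Example.com/about/", "https://example.com/about#team"]) collapses both inputs to the single URL https://example.com/about.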

Input Parameters

Parameter	Type	Required	Description
`source_url`	`string`	required	Homepage URL of the WordPress site
`clone_domain`	`string`	optional	Domain of the clone site (for automatic URL mapping)
`job_id`	`string`	optional	Existing job ID to update instead of creating a new one
`mode`	`append | replace | discover_then_delete`	optional	Re-discovery mode (default: append). 'replace' is destructive: it wipes all existing pages and data for the job. 'discover_then_delete' re-crawls, then removes pages that no longer exist on the source site. Omit mode on a domain that already has a job to short-circuit and get the existing job_id back without re-crawling.
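A minimal sketch of assembling arguments for a call, assuming a Python client that takes the parameters as a plain dict. It shows that optional fields, especially mode, are left out rather than sent as defaults, because omitting mode is what triggers the short-circuit on an existing domain. The allow_replace flag is a hypothetical client-side guard, not a parameter of the tool.

```python
from typing import Any, Dict, Optional

# Mode values taken from the parameter table.
VALID_MODES = {"append", "replace", "discover_then_delete"}


def build_discover_args(
    source_url: str,
    clone_domain: Optional[str] = None,
    job_id: Optional[str] = None,
    mode: Optional[str] = None,
    allow_replace: bool = False,
) -> Dict[str, Any]:
    """Build a discover_all_pages argument dict, leaving out unset optional fields."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    # Hypothetical client-side guard: 'replace' wipes all pages and data,
    # so require an explicit opt-in before sending it.
    if mode == "replace" and not allow_replace:
        raise ValueError("mode='replace' is destructive; pass allow_replace=True to confirm")
    args: Dict[str, Any] = {"source_url": source_url}
    # Omitting "mode" (rather than sending a default) is what lets an existing
    # domain short-circuit and return its current job_id.
    optional = {"clone_domain": clone_domain, "job_id": job_id, "mode": mode}
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```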

What You Get Back

job_id — use this with compare_page_pair, run_full_comparison, and get_migration_status
source_domain — the parsed origin URL
discovery_status — 'discovering' (queued) or 'in_progress' (already running)
mode — the re-discovery mode that was applied
message — guidance to poll get_migration_status for progress
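If you consume the response from Python, a small typed wrapper keeps the field names in one place. This is a sketch. It assumes the response arrives as a dict with the keys listed above. It treats mode and message as optional, since this page does not say whether they are present on every call.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class DiscoveryResponse:
    """Typed view of a discover_all_pages response (field names from the list above)."""

    job_id: str
    source_domain: str
    discovery_status: str  # "discovering" (queued) or "in_progress" (already running)
    mode: Optional[str] = None  # assumed optional, e.g. on a short-circuited call
    message: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "DiscoveryResponse":
        # A missing required field raises KeyError rather than passing silently.
        return cls(
            job_id=data["job_id"],
            source_domain=data["source_domain"],
            discovery_status=data["discovery_status"],
            mode=data.get("mode"),
            message=data.get("message"),
        )

    @property
    def already_running(self) -> bool:
        """True when an earlier discovery run for this job is still in progress."""
        return self.discovery_status == "in_progress"
```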

Example Use Case

Call discover_all_pages on telepresencerobots.com. It returns immediately with job_id and discovery_status: 'discovering'. Then poll get_migration_status to track progress — when discovery finishes, it will show 983 pages found across multiple discovery methods.
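The queue-then-poll flow in this example might look like the sketch below. call_tool is a hypothetical stand-in for whatever client invokes the tools. The sketch assumes get_migration_status accepts the job_id as its argument. Because this page does not describe the shape of the status response, completion is decided by an is_done predicate that you supply.

```python
import time
from typing import Any, Callable, Mapping, Tuple

# Hypothetical client hook: call_tool(tool_name, arguments) -> response mapping.
ToolCaller = Callable[[str, Mapping[str, Any]], Mapping[str, Any]]


def wait_for_discovery(
    call_tool: ToolCaller,
    source_url: str,
    is_done: Callable[[Mapping[str, Any]], bool],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> Tuple[str, Mapping[str, Any]]:
    """Queue discovery, then poll get_migration_status until is_done(status) holds."""
    started = call_tool("discover_all_pages", {"source_url": source_url})
    job_id = started["job_id"]  # returned immediately; the crawl runs in the background
    deadline = time.monotonic() + timeout
    while True:
        status = call_tool("get_migration_status", {"job_id": job_id})
        if is_done(status):
            return job_id, status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"discovery for job {job_id} not finished after {timeout}s")
        time.sleep(interval)
```

Pass an is_done that checks the completion field your get_migration_status response actually exposes. Set interval and timeout to suit the size of the site.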

Tips

✓Always provide clone_domain if you know it — this pre-maps source URLs to clone URLs for compare_page_pair.

✓The tool returns immediately — discovery runs in the background. Use get_migration_status to poll for completion.

✓Re-running on the same domain without a mode param returns the existing job_id instead of re-crawling.

✓The job_id returned here is used by compare_page_pair, run_full_comparison, and get_migration_status.
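The clone_domain pre-mapping can be approximated as a host swap. This sketch assumes the clone serves each page at the same path and query string as the source, which may not hold if the clone's URL structure differs. map_to_clone and pair_urls are hypothetical helpers, not part of the tool.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit, urlunsplit


def map_to_clone(source_url: str, clone_domain: str) -> str:
    """Swap the host of source_url for clone_domain, keeping scheme, path and query."""
    # Accept "clone.example.com" as well as "https://clone.example.com".
    if "://" in clone_domain:
        clone_domain = urlsplit(clone_domain).netloc
    parts = urlsplit(source_url)
    return urlunsplit((parts.scheme, clone_domain, parts.path, parts.query, parts.fragment))


def pair_urls(source_urls: Iterable[str], clone_domain: str) -> List[Tuple[str, str]]:
    """Build (source, clone) pairs, e.g. as candidates for compare_page_pair."""
    return [(url, map_to_clone(url, clone_domain)) for url in source_urls]
```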
