How Googlebot's two-wave indexing works

Crawling and indexing aren’t the same thing. The gap between them is where Google decides what to do with the URL it just fetched, and the decision pipeline has more stages than most SEO discussions acknowledge.

Googlebot fetches URLs. The indexing system decides what to do with them. The rendering pipeline handles pages that depend on JavaScript. The selection algorithms decide which version of duplicate or near-duplicate content gets surfaced in search results. Each stage operates with different criteria, and a URL can pass one stage while failing another.

The “two-wave indexing” framing that dominated SEO discussions through 2019 captured part of the picture: pages with JavaScript-rendered content went through a separate, slower pipeline than pages with HTML-rendered content. The framing was useful but incomplete. As of 2026, the pipeline has more stages, more decision points, and more ways for content to fail to appear in search results despite being crawled.

What follows is a map of the pipeline: how it actually works, where URLs commonly get stuck, and what site owners can influence at each stage.

The stages of the indexing pipeline:

The pipeline has five distinct stages, each producing a decision that determines whether the next stage runs:

Stage	What happens	Decision produced
Discovery	Google learns about the URL (sitemap, link, redirect, manual submission)	URL added to crawl queue, or ignored
Crawl	Googlebot fetches the URL and processes the HTML response	Content extracted, or crawl failed (404, 500, robots.txt block)
Rendering	When needed, Web Rendering Service executes JavaScript and produces the rendered DOM	Rendered content available, or rendering failed/timed out
Indexing	Google decides whether the content is worth storing in the index	URL indexed, or excluded (low quality, duplicate, noindex)
Serving	Google selects which indexed URLs to surface for specific queries	URL appears in results for relevant queries, or not selected

Each stage operates with different signals and different latency. The URL discovery stage can complete in seconds. The serving stage operates on every query, in real time. The stages in between vary: median latency runs in seconds to hours for most sites, while a long tail of low-authority or heavily JavaScript-dependent URLs can wait days to weeks.
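
The gating behavior described above, where each stage's decision determines whether the next one runs, can be sketched in a few lines. This is a toy model, not Google's implementation: the `Stage` names mirror the table, while `StageResult`, `first_blocked_stage`, and the sample observations are hypothetical names and data invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    # Order matters: a URL only reaches a stage if every earlier one passed.
    DISCOVERY = 1
    CRAWL = 2
    RENDERING = 3
    INDEXING = 4
    SERVING = 5


@dataclass
class StageResult:
    stage: Stage
    passed: bool
    reason: str = ""


def first_blocked_stage(results: list[StageResult]) -> Optional[StageResult]:
    """Return the earliest stage that did not pass, or None if all passed."""
    for result in sorted(results, key=lambda r: r.stage):
        if not result.passed:
            return result
    return None


# Hypothetical observations for one URL: crawled and rendered, then excluded.
observed = [
    StageResult(Stage.DISCOVERY, True),
    StageResult(Stage.CRAWL, True),
    StageResult(Stage.RENDERING, True),
    StageResult(Stage.INDEXING, False, "duplicate of another URL"),
]

blocked = first_blocked_stage(observed)
print(blocked.stage.name, "-", blocked.reason)
```

The point of the model is the question it forces: not "was the page indexed?" but "which stage produced the decision that stopped it?"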

Where the “two waves” framing came from:

Google’s John Mueller and Tom Greenaway described the rendering pipeline at Google I/O 2018 using a two-wave model. The waves:

First wave: Googlebot fetches the URL. The HTML response is parsed. Content visible in the HTML gets indexed immediately. Links visible in the HTML get added to the crawl queue.
Second wave: URLs that need JavaScript execution to display content go to the render queue. The Web Rendering Service processes the queue. After rendering, the additional content gets indexed.

The framing produced useful intuition: pages with content in the HTML response get indexed faster than pages with content built by JavaScript. Sites that depended on JavaScript for primary content paid a latency cost.

The complication that emerged after 2019: the pipeline isn’t really two waves anymore. The rendering pipeline operates more like a continuous queue. The decision logic is more nuanced than “render or don’t render.” Google’s Martin Splitt has acknowledged in subsequent talks that the two-wave model was a simplification for clarity, and the actual system is closer to “fetch, queue for render if needed, render when capacity allows, index the result.”

The implication for SEO: the two-wave model is still useful for explaining why JavaScript-rendered content takes longer to index, but the specifics matter less than the general principle. Content in the HTML response gets indexed faster than content that requires JavaScript execution.

What happens in the render queue:

The render queue handles JavaScript-dependent sites. The medians have improved substantially since the original two-wave era: Splitt cited a median render delay of about 5 seconds at Chrome Dev Summit 2019, and Vercel’s 2024 nextjs.org analysis put the 25th percentile at 4 seconds. The tail is where SEO problems live; the median is fine.

URLs land in the queue when the crawl stage flags them as needing JavaScript execution. The flag is set when the HTML response is missing content that internal heuristics suggest should be there, or when the URL pattern matches a site Google has previously identified as JavaScript-rendered.

Priority is set by signals similar to crawl prioritization. High-authority sites and important URLs get rendered first; low-authority sites and tail URLs wait longer.
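
A priority queue is one way to picture that ordering. The sketch below is an assumption-laden toy: Google does not publish how render priority is computed, so the numeric scores, the URLs, and the `enqueue` and `drain` helpers are all invented for illustration.

```python
import heapq
import itertools

# Counter breaks ties so equal-priority URLs leave in insertion order.
_order = itertools.count()


def enqueue(queue: list, url: str, priority: float) -> None:
    """Push a URL; heapq is a min-heap, so negate to pop high priority first."""
    heapq.heappush(queue, (-priority, next(_order), url))


def drain(queue: list) -> list[str]:
    """Pop URLs in the order a single render worker would process them."""
    ordered = []
    while queue:
        _, _, url = heapq.heappop(queue)
        ordered.append(url)
    return ordered


queue: list = []
# Hypothetical priority scores: higher means rendered sooner.
enqueue(queue, "https://example.com/tail-page", 0.1)
enqueue(queue, "https://example.com/", 0.9)
enqueue(queue, "https://example.com/category", 0.5)

render_order = drain(queue)
print(render_order)
```

The tail page was queued first and rendered last, which is the behavior the text describes: arrival order matters less than priority.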

Inside the queue, the render worker uses Chromium-based rendering, with the version regularly updated to match recent stable Chrome. JavaScript executes, CSS applies, images load (sometimes), and the resulting DOM gets captured.

That captured DOM moves to the indexing stage. Content present in the rendered DOM but not in the HTML response gets evaluated for indexing at this point.

Latency variance is the real cost. Most URLs render within seconds; outlier cases (low-authority sites with heavy JavaScript dependencies) can wait days, and Onely’s research has documented 5-50% of newly-added JavaScript-dependent pages still partially unindexed two weeks after sitemap submission. The long tail pulls the mean far above the median, and a JavaScript-dependent site can find itself in that tail without knowing.
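
A small worked example shows how far a couple of outliers can separate the two averages. The delays below are made-up numbers chosen only to illustrate the shape of the distribution; they are not measurements from Google or from any study cited here.

```python
from statistics import mean, median

# Hypothetical render delays in seconds for ten URLs on one site.
# Eight render within seconds; one waits an hour, one waits a week.
delays_s = [4, 5, 5, 6, 8, 10, 12, 20, 3_600, 604_800]

median_s = median(delays_s)
mean_s = mean(delays_s)

print(f"median: {median_s:.0f} s")
print(f"mean:   {mean_s / 3600:.1f} h")
```

With these invented numbers the median is 9 seconds while the mean is roughly 17 hours. A site that only watches the median would conclude everything is fine while two of its ten URLs sit in the queue.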

The implication: SEO performance for JavaScript-rendered sites depends on factors that don’t apply to HTML-rendered sites. Render queue priority, render success rates, and rendering accuracy become operational concerns alongside the standard ranking signals.

What can fail at the rendering stage:

Five failure modes recur:

The render times out. Google’s rendering has a timeout limit that varies but is generally measured in seconds. Pages that take too long to render don’t get fully rendered. The content that hadn’t loaded by the timeout doesn’t get indexed.
JavaScript errors halt execution. A runtime error in the page’s JavaScript (uncaught exception, infinite loop, blocked async call) stops further rendering. The page gets indexed in whatever state it reached.
Required resources are blocked. Resources blocked by robots.txt (JavaScript files, CSS files, API endpoints) can’t be fetched during rendering. The page renders in a degraded state, missing whatever the blocked resources would have provided.
Hydration produces different content than the initial render. When client-side JavaScript modifies the DOM significantly after initial render, the captured DOM may differ from what users see. Google sees one version; users see another.
Lazy-loaded content doesn’t trigger. Content loaded only on scroll, on click, or on viewport intersection may not load during rendering. The content exists in the codebase but doesn’t reach the indexing stage.
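
The lazy-loading failure mode is one of the easier ones to screen for from the HTML alone. The sketch below assumes one common pattern, an image whose real URL lives in `data-src` and is swapped into `src` by a script; `LazyImageFinder`, `find_deferred_images`, and the sample markup are hypothetical. It says nothing about how Google treats any particular lazy-loading library.

```python
from html.parser import HTMLParser


class LazyImageFinder(HTMLParser):
    """Collect <img> tags whose real URL sits in data-src with no usable src."""

    def __init__(self) -> None:
        super().__init__()
        self.deferred: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        data_src = attributes.get("data-src")
        # A data-src with an empty or placeholder src only resolves
        # once a script copies it across, typically on scroll.
        if data_src and (not src or src.startswith("data:")):
            self.deferred.append(data_src)


def find_deferred_images(html: str) -> list[str]:
    finder = LazyImageFinder()
    finder.feed(html)
    return finder.deferred


sample = """
<img src="/hero.jpg" alt="Hero">
<img data-src="/product-1.jpg" alt="Product 1">
<img src="data:image/gif;base64,R0lGOD" data-src="/product-2.jpg" alt="Product 2">
<img src="/footer.png" loading="lazy" alt="Footer">
"""

deferred = find_deferred_images(sample)
print(deferred)
```

Only the two product images are flagged. The hero and footer images carry a real `src`, so their URLs are present in the markup whether or not any script runs.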

Each failure mode produces different symptoms in Search Console. Sometimes the page reports as indexed but with missing content in the rendered HTML view. Sometimes it reports as crawled but not indexed because Google’s quality signals downgraded it after rendering. Sometimes it appears in results but with title and snippet that don’t match the visual rendering. Different pages on the same site can have wildly different indexation success despite identical templates.

How indexing decides:

After rendering completes, the indexing stage evaluates the content. The decisions made here aren’t visible in Search Console as cleanly as the earlier stages, but the patterns are:

Duplicate content detection. If the indexing system identifies the URL as substantially similar to another URL on the same site or elsewhere, only one version typically gets indexed. The canonical signal influences which version, but doesn’t override Google’s judgment.
Quality evaluation. Pages that the indexing system judges as low value (thin content, low-quality templates, machine-generated text without editorial value) get excluded from the index regardless of crawl success.
noindex respect. Pages with noindex directives get excluded, but the directive must be visible to the rendering pipeline. A noindex tag that JavaScript adds before the rendering pipeline captures the DOM gets respected; one added after the capture may not.
Soft 404 detection. Pages returning 200 status but containing content that signals “no results” or “page not found” get treated as soft 404s. The page exists in the crawl but not in the index.
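
For the noindex case, it helps to check both the HTML response and the rendered DOM, since the directive can appear in one and not the other. A minimal sketch, assuming you already have both versions as strings: `RobotsMetaParser`, `has_noindex`, and the sample markup are hypothetical, and the header handling is simplified (it ignores user-agent prefixes in `X-Robots-Tag`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gather directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in {"robots", "googlebot"}:
            for token in attributes.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if noindex appears in a robots meta tag or the X-Robots-Tag value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_tokens = {token.strip().lower() for token in x_robots_tag.split(",")}
    return "noindex" in parser.directives or "noindex" in header_tokens


# Hypothetical page: the directive only exists after JavaScript has run.
raw_html = "<head><title>Widgets</title></head>"
rendered_html = '<head><title>Widgets</title><meta name="robots" content="noindex, follow"></head>'

print(has_noindex(raw_html), has_noindex(rendered_html))
```

A page where the two calls disagree is worth a closer look, because its indexation then depends on when the script runs relative to the capture.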

The indexing decisions produce the “Indexed” vs “Crawled – currently not indexed” vs “Excluded” categories in Search Console. Each category reflects different decisions made at this stage.

Common diagnostic patterns:

Six patterns recur in diagnosing pipeline problems.

Start by comparing the HTML response to the rendered DOM. Tools like Google’s Rich Results Test and URL Inspection in Search Console show what Google sees after rendering (the standalone Mobile-Friendly Test was retired in late 2023). Comparison with the source HTML reveals what depends on JavaScript and confirms whether rendering produces the expected result.
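
Once both versions are saved as text, the comparison itself can be automated. The sketch below assumes the rendered HTML was copied out of one of those tools or captured with a headless browser; `VisibleText`, `words_in`, `rendered_only_words`, and the sample pages are hypothetical, and word-level comparison is a deliberately crude stand-in for a real diff.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside of script, style, noscript, and template elements."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self.words: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def words_in(html: str) -> list[str]:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.words


def rendered_only_words(raw_html: str, rendered_html: str) -> list[str]:
    """Words present after rendering that the HTML response did not contain."""
    raw_words = set(words_in(raw_html))
    return [word for word in words_in(rendered_html) if word not in raw_words]


raw_page = '<body><div id="app"></div><script>render()</script></body>'
rendered_page = (
    '<body><div id="app"><h1>Blue widgets</h1><p>In stock today</p></div>'
    "<script>render()</script></body>"
)

js_dependent = rendered_only_words(raw_page, rendered_page)
print(js_dependent)
```

Everything the function returns is content that only exists after JavaScript execution, which is exactly the content exposed to the render queue's latency and failure modes.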

Next, monitor Search Console’s “Crawled – currently not indexed” report. This category captures URLs that Google fetched but chose not to index. Reasons vary, but the patterns are visible. Sudden growth in this category suggests a recent change to quality signals or to duplicate detection.
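
"Sudden growth" is easier to act on with a concrete threshold. A minimal sketch, assuming daily counts for the category are exported or logged by hand: the `sudden_growth` helper, the 7-day window, and the 1.5x factor are arbitrary illustrative choices, not values recommended by Google.

```python
from statistics import mean


def sudden_growth(daily_counts: list[int], window: int = 7, factor: float = 1.5) -> bool:
    """True when the latest count exceeds the trailing-window average by `factor`."""
    if len(daily_counts) <= window:
        return False  # not enough history to form a baseline
    # Baseline is the `window` days immediately before the latest one.
    baseline = mean(daily_counts[-window - 1:-1])
    return baseline > 0 and daily_counts[-1] > factor * baseline


# Hypothetical daily counts of URLs in the category.
steady = [100, 102, 101, 99, 103, 100, 98, 104]
spiking = [100, 102, 101, 99, 103, 100, 98, 180]

print(sudden_growth(steady), sudden_growth(spiking))
```

The second series trips the check because 180 is well above 1.5 times its roughly 100-URL baseline; the first does not.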

Render timing on important pages deserves a separate check. Pages with heavy JavaScript that takes 5+ seconds to render face high failure rates in the render queue. Reducing the render time produces measurable indexation improvements.

Audit robots.txt for resources required by rendering. Sites that block their own JavaScript files in robots.txt produce rendering failures Google reports vaguely. The fix is allowing the resources.
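
Python's standard library can run this audit offline. The sketch below uses `urllib.robotparser`; the robots.txt content, the resource URLs, and the `blocked_resources` helper are hypothetical. One caveat: the standard-library parser does plain prefix matching and does not reproduce Google's handling of `*` and `$` wildcards, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def blocked_resources(
    robots_txt: str, resource_urls: list[str], user_agent: str = "Googlebot"
) -> list[str]:
    """Return the resource URLs that the given robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that blocks the site's own scripts and API.
robots_txt = """\
User-agent: *
Disallow: /assets/js/
Disallow: /api/
"""

# Hypothetical resources the page needs in order to render.
resources = [
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/site.css",
    "https://example.com/api/products",
]

blocked = blocked_resources(robots_txt, resources)
print(blocked)
```

Here the script bundle and the API endpoint come back as blocked while the stylesheet is reachable. The resource list would normally come from the page's network requests in DevTools.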

Test the page with JavaScript disabled. Open it in DevTools with JS turned off. The visible content is what crawl-stage indexing sees. The gap between that and the JavaScript-enabled view is what depends on rendering.

Finally, check for hydration mismatches. The HTML response and the rendered DOM should match for the parts that affect SEO. Significant differences suggest hydration issues that may produce inconsistent indexation.
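
For the SEO-critical parts specifically, a field-by-field comparison is more useful than a full diff. A rough sketch under the assumption that title, first h1, canonical link, and meta description are the fields worth checking: `SeoFields`, `extract`, `mismatches`, and the sample pages are hypothetical.

```python
from html.parser import HTMLParser


class SeoFields(HTMLParser):
    """Pull the title, first h1, canonical URL, and meta description."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {"title": "", "h1": "", "canonical": "", "description": ""}
        self._capturing = None

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag in ("title", "h1") and not self.fields[tag]:
            self._capturing = tag  # start collecting this element's text
        elif tag == "link" and attributes.get("rel", "").lower() == "canonical":
            self.fields["canonical"] = attributes.get("href", "")
        elif tag == "meta" and attributes.get("name", "").lower() == "description":
            self.fields["description"] = attributes.get("content", "")

    def handle_endtag(self, tag):
        if tag == self._capturing:
            self._capturing = None

    def handle_data(self, data):
        if self._capturing:
            self.fields[self._capturing] += data


def extract(html: str) -> dict[str, str]:
    parser = SeoFields()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so formatting differences are not reported.
    return {key: " ".join(value.split()) for key, value in parser.fields.items()}


def mismatches(raw_html: str, rendered_html: str) -> dict[str, tuple[str, str]]:
    """Map each differing field to its (HTML response, rendered DOM) values."""
    raw, rendered = extract(raw_html), extract(rendered_html)
    return {key: (raw[key], rendered[key]) for key in raw if raw[key] != rendered[key]}


raw_doc = (
    "<head><title>Blue widgets | Shop</title>"
    '<link rel="canonical" href="https://example.com/widgets/blue"></head>'
    "<body><h1>Blue widgets</h1></body>"
)
rendered_doc = (
    "<head><title>Shop</title>"
    '<link rel="canonical" href="https://example.com/widgets/blue"></head>'
    "<body><h1>Blue widgets</h1></body>"
)

differences = mismatches(raw_doc, rendered_doc)
print(differences)
```

In this invented example only the title differs: client-side code has overwritten a descriptive title with a generic one, which is the kind of mismatch that shows up later as an unexpected title in results.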

What site owners can influence:

The pipeline operates on Google’s schedule, but several inputs are within site control:

The HTML response content. The more SEO-critical content is in the initial HTML, the less depends on the rendering stage. Title tags, meta descriptions, headings, body text, structured data, and primary navigation should all appear in the HTML response.
Render performance. Pages that render fast have higher success rates in the render queue. Optimizing JavaScript bundle size, reducing third-party scripts, and avoiding render-blocking resources all help.
Crawl resource accessibility. Every resource needed for rendering must be reachable. Audit robots.txt and CDN configurations to ensure CSS, JavaScript, and API endpoints aren’t blocked.
Internal linking signals. Internal links from authoritative pages signal which URLs should get pipeline priority. Pages with strong internal linking get crawled and rendered faster than orphan pages.
Sitemap accuracy. Sitemaps signal which URLs are important. A sitemap with thousands of low-value URLs dilutes the signal; a sitemap with the actual priority URLs focuses Google’s attention.
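
Sitemap accuracy is the easiest of these inputs to audit mechanically. The sketch below parses a sitemap and separates URLs that look like priority pages from those worth reviewing. The rules are assumptions for illustration: `LOW_VALUE_PREFIXES`, the query-string test, `split_sitemap`, and the sample sitemap are all hypothetical, and what counts as low value depends on the site.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical path prefixes this site considers low value.
LOW_VALUE_PREFIXES = ("/tag/", "/search")


def split_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (urls to keep, urls to review) from a sitemap document."""
    root = ET.fromstring(xml_text)
    keep, review = [], []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        # Parameterized URLs and listed prefixes are flagged, not deleted.
        if parts.query or parts.path.startswith(LOW_VALUE_PREFIXES):
            review.append(url)
        else:
            keep.append(url)
    return keep, review


sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/widgets/blue</loc></url>
  <url><loc>https://example.com/tag/misc/</loc></url>
  <url><loc>https://example.com/widgets?sort=price</loc></url>
</urlset>"""

keep, review = split_sitemap(sitemap_xml)
print(len(keep), "to keep;", len(review), "to review")
```

The function flags rather than removes, since a human still has to decide whether a flagged URL belongs in the sitemap.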

The pattern: the pipeline rewards sites that make Google’s job easy. Sites that produce clean HTML, fast rendering, accessible resources, and clear priority signals get indexed fast and consistently. Sites that fight the pipeline pay the latency cost.

The pipeline as system, not black box:

Understanding the pipeline matters less for what to do and more for diagnosing why things aren’t working. When a page gets indexed quickly, the pipeline ran smoothly. When a page gets stuck, the stuck stage usually reveals which input failed.
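
One way to make "the stuck stage" concrete is a lookup from the status Search Console reports to the stage that most plausibly produced it. The mapping below is an interpretive assumption, not something Google documents in this form; `STATUS_TO_STAGE` and `likely_stage` are hypothetical names, and the status wording in your own reports may differ slightly.

```python
# Assumed mapping from Search Console page statuses to the pipeline stage
# most likely responsible. Keys are lowercase with plain hyphens and quotes.
STATUS_TO_STAGE = {
    "discovered - currently not indexed": "discovery",
    "blocked by robots.txt": "crawl",
    "not found (404)": "crawl",
    "server error (5xx)": "crawl",
    "crawled - currently not indexed": "indexing",
    "excluded by 'noindex' tag": "indexing",
    "soft 404": "indexing",
    "duplicate without user-selected canonical": "indexing",
}


def likely_stage(status: str) -> str:
    """Normalize dashes, quotes, case, and spacing, then look up the stage."""
    normalized = (
        status.lower()
        .replace("\u2013", "-")  # en dash
        .replace("\u2018", "'")  # left single quote
        .replace("\u2019", "'")  # right single quote
    )
    return STATUS_TO_STAGE.get(" ".join(normalized.split()), "unknown")


for status in ("Crawled \u2013 currently not indexed", "Blocked by robots.txt", "Something else"):
    print(status, "->", likely_stage(status))
```

Rendering has no entry because Search Console does not report a dedicated rendering status; rendering problems tend to surface indirectly, which is why the comparison checks earlier in this article matter.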

For sites with primary content in HTML, the pipeline is fast and the discussion is mostly academic. The two-wave model and the rendering queue don’t produce meaningful delays.

For sites that depend on JavaScript rendering, the pipeline becomes operationally important. The render queue, the failure modes, and the latency are all factors that affect when and how content appears in search results.

The sites that handle technical SEO well at scale treat the pipeline as a system to understand rather than a black box to fight. The understanding produces better architecture decisions, faster diagnostic work, and more accurate expectations about what will and won’t appear in search results after content publication.

The operational impact varies by content velocity. News sites publishing dozens of articles per day measure pipeline latency in minutes; the gap between publish time and indexation directly affects breaking-news traffic. E-commerce sites publishing dozens of new products per day measure it in hours, with indexation lag delaying revenue from new inventory. Blog sites publishing weekly measure it in days, with little business consequence. The pipeline matters more as content velocity rises.

The decision pipeline has more stages than most SEO discussions acknowledge, and that’s the point. The “Google indexed my page” or “Google didn’t index my page” framing collapses a multi-stage decision process into a binary outcome. Discovery is one stage; crawl scheduling another; HTML parsing another; rendering another; canonicalization another; quality evaluation another; selection for the index another. Each stage can succeed or fail or stall, and the page that doesn’t appear in search results failed somewhere specific in that chain. Diagnosing where, instead of treating the whole pipeline as opaque, is what separates technical SEO that works from technical SEO that guesses.
