Technical SEO audit with Screaming Frog, Sitebulb, GSC

A complete technical SEO audit combines three data sources: a site crawl that mimics how a search engine sees the site, a deeper analysis tool that surfaces patterns the crawl misses, and Google Search Console showing what Google actually does. Screaming Frog, Sitebulb, and Search Console fill these three roles. Used together, they answer most of the diagnostic questions that come up on a healthy site, and most of the urgent questions that come up when something breaks.

This isn’t the only stack. Botify, Oncrawl, ContentKing, and other platforms cover similar ground. But Screaming Frog and Sitebulb represent the desktop crawler tier that handles most audit work, and Search Console is non-negotiable because it’s the only source of Google’s own perspective.

What follows maps out how each tool fits into the audit workflow, what to look for in each, and how the findings connect.


The three tools and what each is for:

| Tool | Role | What it shows |
| --- | --- | --- |
| Screaming Frog SEO Spider | The site crawler | URL inventory, status codes, redirects, canonicals, meta tags, internal links, images, structured data |
| Sitebulb | Deeper analysis layer | Crawl with built-in audits, hint system, visualizations, technical issue prioritization |
| Google Search Console | Google's perspective | Indexed pages, search performance, crawl stats, manual actions, enhancement reports |

Screaming Frog produces the raw inventory. Sitebulb interprets the inventory. Search Console shows what Google is actually doing with the site. The three together answer most diagnostic questions.


Starting with Screaming Frog:

A first crawl with Screaming Frog produces the URL inventory and the baseline data. Feature sets shift between releases (JavaScript rendering, structured data validation, hreflang, and Core Web Vitals reporting have all been extended over time), so check the release notes for the version in use rather than relying on a version number quoted in a guide. The configuration that matters:

  • User agent. Default is Screaming Frog’s own UA. Switch to Googlebot Smartphone to see what Google’s primary crawler sees, especially important for mobile-first indexing.
  • JavaScript rendering. Off by default for speed. Turn on for sites that rely on client-side rendering, or audits will miss content that Googlebot can see.
  • Respect robots.txt. Default on. Turn off only when auditing what robots.txt is blocking.
  • Custom extraction. Use XPath or CSS selectors to extract specific data from pages: author names, publish dates, schema content, internal modules.
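
To make the custom extraction idea concrete, here is a minimal sketch of the same XPath-style lookups run outside the crawler. The sample page, the field names, and the `extract_fields` helper are all hypothetical. The standard-library parser used here only accepts well-formed markup, so real-world HTML would usually need a more lenient parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (an assumption for illustration).
SAMPLE_PAGE = """<html>
  <head>
    <title>Example article</title>
    <meta name="author" content="Jane Doe"/>
    <script type="application/ld+json">{"@type": "Article", "datePublished": "2024-01-15"}</script>
  </head>
  <body><h1>Example article</h1></body>
</html>"""


def extract_fields(page_source: str) -> dict:
    """Pull author, publish date, and H1 from one page with XPath-style paths."""
    root = ET.fromstring(page_source)

    # Roughly equivalent to the XPath //meta[@name='author']/@content.
    author_el = root.find(".//meta[@name='author']")
    author = author_el.get("content") if author_el is not None else None

    # JSON-LD sits in a script tag; parse it and read datePublished if present.
    published = None
    for script in root.findall(".//script[@type='application/ld+json']"):
        data = json.loads(script.text or "{}")
        if isinstance(data, dict) and "datePublished" in data:
            published = data["datePublished"]
            break

    h1_el = root.find(".//h1")
    h1 = h1_el.text if h1_el is not None else None

    return {"author": author, "published": published, "h1": h1}


fields = extract_fields(SAMPLE_PAGE)
```

In the crawler itself, only the path expressions would be entered in the custom extraction settings; the surrounding Python is just a way to test selectors against a saved page first.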

The first-pass reports to look at:

  • Response codes. Sort by status code. Identify 4xx and 5xx pages, redirect chains, and pages that should be returning different status codes than they are.
  • Page titles. Filter for missing, duplicate, too long, too short. Each is a separate optimization opportunity.
  • Meta descriptions. Same filters as titles.
  • H1 tags. Missing H1s, duplicate H1s, multiple H1s on a single page.
  • Canonical tags. Pages with no canonical, pages with canonical pointing elsewhere, pages with conflicting canonical signals.
  • Internal links. Orphan pages (no internal links pointing in), pages with only one inbound internal link, the most-linked-to pages on the site.
  • Images. Missing alt text, image file size, image format distribution.
  • Hreflang. If the site is international, validate hreflang tag implementation.
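
Once the crawl is exported, several of these first-pass checks can be scripted. The sketch below assumes a CSV export with columns named `Address`, `Status Code`, `Title 1`, and `H1-1`; those names and the sample rows are assumptions, so adjust them to match the actual export.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of a crawl export; column names are assumptions.
SAMPLE_EXPORT = """Address,Status Code,Title 1,H1-1
https://example.com/,200,Home,Welcome
https://example.com/a,200,Widgets,Widgets
https://example.com/b,200,Widgets,Widgets for sale
https://example.com/c,200,,Untitled
https://example.com/old,404,Not found,
https://example.com/api,500,,
"""


def triage(export_csv: str) -> dict:
    """Group URLs from a crawl export into first-pass issue buckets."""
    rows = list(csv.DictReader(io.StringIO(export_csv)))
    ok_rows = [r for r in rows if r["Status Code"] == "200"]

    # Count non-empty titles so duplicates can be flagged.
    title_counts = Counter(r["Title 1"] for r in ok_rows if r["Title 1"])

    return {
        "client_errors": [r["Address"] for r in rows if r["Status Code"].startswith("4")],
        "server_errors": [r["Address"] for r in rows if r["Status Code"].startswith("5")],
        "missing_title": [r["Address"] for r in ok_rows if not r["Title 1"]],
        "duplicate_title": [r["Address"] for r in ok_rows if title_counts[r["Title 1"]] > 1],
        "missing_h1": [r["Address"] for r in ok_rows if not r["H1-1"]],
    }


report = triage(SAMPLE_EXPORT)
```

Title and H1 checks are restricted to pages returning 200, since error pages and redirects are handled under response codes instead.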

Screaming Frog’s bulk export functions let analysts move data into spreadsheets for further analysis. The crawl data becomes the foundation for most subsequent audit work.


Moving to Sitebulb:

Sitebulb does a crawl similar to Screaming Frog’s, but adds an interpretation layer: hints that flag specific issues and rank them by priority. Where Screaming Frog produces raw data, Sitebulb produces opinions about what matters.

The audit features that distinguish Sitebulb:

  • Hint system. Each crawl produces a list of hints organized by category (indexability, internal linking, content, performance). Each hint shows which URLs trigger it and explains why it matters.
  • Crawl maps. Visualizations of the site’s internal link structure, useful for spotting orphan pages, isolated sections, and over-linked or under-linked areas.
  • Comparison reports. Compare two crawls over time to see what changed: new errors, fixed issues, new pages, removed pages.
  • Search Console integration. Pulls Search Console data into the audit to combine crawl data with performance data.

The Sitebulb hints to prioritize:

  • Critical hints (broken, blocking, or actively harmful issues)
  • High-priority hints (issues that consistently affect ranking or indexation)
  • Internal link distribution (pages with abnormally few or many internal links)
  • Indexability issues (canonical conflicts, noindex on pages that should be indexed)
  • Content issues (thin content, duplicate content, missing required elements)

For audit teams that prefer raw data, Screaming Frog alone is sufficient. For teams that value the interpretation layer or visualization, Sitebulb adds value. Many audit workflows use both: Screaming Frog for bulk operations and custom extraction, Sitebulb for the prioritized issue list.


Search Console: the third source:

Search Console is the only source for what Google actually does with the site. The reports that matter for technical audits:

Pages report (formerly Index Coverage). Shows every URL Google knows about, classified by status:

  • Indexed
  • Crawled - currently not indexed
  • Discovered - currently not indexed
  • Page with redirect
  • Excluded by ‘noindex’ tag
  • Blocked by robots.txt
  • Soft 404
  • Not found (404)
  • Server error (5xx)

The classifications reveal pipeline problems. “Discovered - currently not indexed” means Google knows the URL exists but hasn’t crawled it yet, often a crawl budget or server capacity signal. “Crawled - currently not indexed” means Google fetched the page and chose not to index it, often a quality signal. “Soft 404” means the page returned a success code but Google judged the content to be an error page or an empty result.

Crawl Stats report. Shows Googlebot’s activity on the site:

  • Total requests per day
  • Total bytes downloaded per day
  • Average response time
  • Distribution by file type, response code, Googlebot type, purpose (refresh vs discovery)

The Crawl Stats data complements log analysis. Sites without log access rely on Crawl Stats as the primary source for crawl pattern analysis.

Performance report. Shows search performance data:

  • Total clicks, impressions, CTR, position
  • Breakdown by query, page, country, device, search appearance
  • Comparison across time periods

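Performance data identifies pages that are indexed but not ranking and pages that rank but don’t get clicks. Joined with analytics data (Search Console itself doesn’t report conversions), it also identifies pages that get clicks but don’t convert. Each is a different optimization opportunity.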
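
A rough way to sort pages into the first two buckets is sketched below. The `PageStats` shape, the bucket labels, and both thresholds (500 impressions, 1% CTR) are illustrative assumptions, not values taken from Search Console.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    page: str
    clicks: int
    impressions: int


def classify(stats: PageStats, min_impressions: int = 500, low_ctr: float = 0.01) -> str:
    """Label a page by its search performance; thresholds are assumptions."""
    if stats.impressions < min_impressions:
        return "indexed but not ranking"
    ctr = stats.clicks / stats.impressions if stats.impressions else 0.0
    if ctr < low_ctr:
        return "ranking but not clicked"
    return "no flag"


# Hypothetical rows, as they might be read from a performance export.
SAMPLE = [
    PageStats("/a", clicks=2, impressions=40),
    PageStats("/b", clicks=5, impressions=2000),
    PageStats("/c", clicks=180, impressions=2000),
]

labels = {p.page: classify(p) for p in SAMPLE}
```

The thresholds should be tuned per site: a 1% CTR can be normal for a page at position 9 and a warning sign at position 1.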

Enhancement reports. Show structured data status:

  • Pages with each schema type
  • Errors and warnings
  • Performance over time

These reports flag schema implementation issues that Screaming Frog and Sitebulb crawls can’t see, because they show Google’s interpretation rather than just the markup.

Manual actions report. Shows whether the site has any manual actions from Google’s spam team. Always check this; manual actions explain ranking problems that no other diagnosis would catch.


The standard audit workflow:

A complete audit on a healthy site follows this sequence:

Phase 1: Crawl the site. Run Screaming Frog or Sitebulb to produce the URL inventory. Configure the crawler to match how Googlebot accesses the site (mobile user agent, JavaScript rendering if needed).

Phase 2: Cross-reference with Search Console. Export Search Console’s Pages report. Match Search Console’s URL list against the crawl’s URL list. Identify:

  • URLs Google knows about that the crawl didn’t find (discovery issue)
  • URLs the crawl found that Google doesn’t have indexed (indexation issue)
  • URLs in both lists with different status (canonicalization or quality issue)
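
The matching step is essentially a set comparison. The sketch below is one way to partition the two lists, assuming the crawl data has been reduced to a URL-to-indexable mapping and the Search Console export to a URL-to-status mapping. Both input shapes, the sample URLs, and the light URL normalization are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop fragments, and treat an empty path as '/'."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def cross_reference(crawl: dict, gsc: dict) -> dict:
    """crawl maps URL -> indexable (bool); gsc maps URL -> Pages report status."""
    crawl_n = {normalize(u): indexable for u, indexable in crawl.items()}
    gsc_n = {normalize(u): status for u, status in gsc.items()}

    both = crawl_n.keys() & gsc_n.keys()
    return {
        # Google knows the URL, the crawl never reached it.
        "google_only": sorted(gsc_n.keys() - crawl_n.keys()),
        # The crawl found the URL, Google has no record of it.
        "crawl_only": sorted(crawl_n.keys() - gsc_n.keys()),
        # Present in both, but the crawler and Google disagree on indexability.
        "mismatch": sorted(u for u in both if crawl_n[u] != (gsc_n[u] == "Indexed")),
    }


# Hypothetical inputs for illustration.
crawl_data = {
    "https://example.com/": True,
    "https://example.com/a": True,
    "https://example.com/b": False,
    "https://example.com/new": True,
}
gsc_data = {
    "https://EXAMPLE.com": "Indexed",
    "https://example.com/a": "Crawled - currently not indexed",
    "https://example.com/b": "Indexed",
    "https://example.com/legacy": "Indexed",
}

result = cross_reference(crawl_data, gsc_data)
```

Normalization matters more than the set logic: without it, differences in host casing or a missing trailing slash on the homepage show up as false discrepancies.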

Phase 3: Diagnose the discrepancies. Each category of discrepancy has standard causes. URLs Google has but the site doesn’t show as crawlable are often orphan pages or pages reachable only through paths the crawl didn’t follow. URLs the site has but Google doesn’t have indexed are often quality issues or technical blocks.

Phase 4: Prioritize the issues. Some issues affect many pages and high-value pages; others affect few pages or low-priority pages. The audit output ranks issues by impact, not by count.
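
One simple way to rank by impact rather than count is to weight each issue by its severity and by the traffic of the pages it touches. The scoring formula, the severity weights, and the sample data below are all assumptions for illustration; any real audit would calibrate them to the business.

```python
# Assumed severity weights; not taken from any tool.
SEVERITY_WEIGHT = {"critical": 5, "medium": 2, "low": 1}


def impact_score(issue: dict, clicks_by_url: dict) -> int:
    """Severity weight multiplied by (1 + total clicks on the affected URLs)."""
    traffic = sum(clicks_by_url.get(url, 0) for url in issue["urls"])
    return SEVERITY_WEIGHT[issue["severity"]] * (1 + traffic)


def rank_issues(issues: list, clicks_by_url: dict) -> list:
    """Return issues sorted from highest to lowest impact score."""
    return sorted(issues, key=lambda issue: impact_score(issue, clicks_by_url), reverse=True)


# Hypothetical findings and click data.
issues = [
    {"name": "Images missing alt text", "severity": "low",
     "urls": ["/blog/1", "/blog/2", "/blog/3", "/blog/4"]},
    {"name": "noindex on category page", "severity": "critical", "urls": ["/shoes"]},
    {"name": "Redirect chain", "severity": "medium", "urls": ["/sale", "/promo"]},
]
clicks = {"/shoes": 1200, "/sale": 300, "/blog/1": 20, "/blog/2": 5}

ranked_names = [issue["name"] for issue in rank_issues(issues, clicks)]
```

Sorted by URL count, the alt text issue would come first; sorted by this score, the single noindexed category page does.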

Phase 5: Produce remediation recommendations. For each priority issue, specify the change needed, who can make it (developer, content team, SEO team), and how to verify the fix.

Phase 6: Set up monitoring. Schedule re-crawls (weekly for active sites, monthly for stable sites). Set up Search Console alerts for new errors. Build dashboards that combine crawl data and Search Console data so trends are visible.


Common findings across audits:

The issues that appear in most technical audits:

  • Redirect chains. A chain of A to B to C to D where the canonical destination should be reachable in one hop.
  • Mixed signals. Pages that carry a noindex meta tag but are also listed in the sitemap or named as the preferred version in other pages’ canonical tags.
  • Orphan pages. Pages with no internal links pointing to them, often because the linking module that used to surface them was removed.
  • Soft 404s. Pages returning 200 with “not found” or “no results” content.
  • Canonical conflicts. Multiple canonical signals pointing to different URLs (canonical tag, sitemap, hreflang, internal links).
  • Hreflang errors. Missing return tags, conflicting language codes, and language or region codes that aren’t valid ISO 639-1 or ISO 3166-1 alpha-2 values (en-UK instead of en-GB is a common example).
  • Schema errors. Required properties missing, structured data that doesn’t match visible content.
  • Images missing alt text. Particularly on content pages where image search visibility matters.
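
Redirect chains are the easiest of these to detect programmatically. The sketch below assumes the crawl's redirects have been reduced to a source-to-target mapping; the function names and sample paths are hypothetical.

```python
def resolve_chain(start: str, redirects: dict, max_hops: int = 10) -> list:
    """Follow redirects from start; stop at the final URL, a loop, or max_hops."""
    chain = [start]
    seen = {start}
    current = start
    while current in redirects and len(chain) <= max_hops:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            break  # redirect loop: the repeated URL is the last chain entry
        seen.add(current)
    return chain


def find_chains(redirects: dict) -> dict:
    """Return {start: chain} for every start URL that needs more than one hop."""
    chains = {}
    for start in redirects:
        chain = resolve_chain(start, redirects)
        if len(chain) > 2:
            chains[start] = chain
    return chains


# Hypothetical redirect map: /a -> /b -> /c -> /d, plus a clean single hop.
redirect_map = {"/a": "/b", "/b": "/c", "/c": "/d", "/x": "/y"}

chains = find_chains(redirect_map)
# Suggested fix: point each chained source straight at its final destination.
flattened = {start: chain[-1] for start, chain in chains.items()}
```

The `flattened` mapping is the remediation list: each source should redirect to its final destination in a single hop.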

The fixes are usually straightforward once the issues are identified. The harder work is the identification, which is what the audit tooling supports.


When to do which kind of audit:

The audit cadence depends on the site:

  • Annual deep audit for stable sites. A complete pass through every report, cross-referenced with Search Console, with documented recommendations.
  • Quarterly mid-audit for active sites. Focus on what’s changed since the last audit, new issues, fixes that need follow-up.
  • Monthly monitoring for high-velocity sites. Automated reports, alerts on threshold violations, manual review of trend changes.
  • Triggered audit when something breaks. Ranking drops, indexation drops, manual actions, server issues, deployment problems.

The audit isn’t one-size-fits-all. Sites with thousands of changes per day need different cadence and tooling than sites with monthly content updates.


AI crawler audit components:

Traditional technical SEO audits focused on Googlebot behavior. Modern audits need to account for AI crawler activity as a separate category with its own diagnostic questions.

The components to add to standard audit checklists:

  • AI crawler identification in logs. Server logs should be filtered for known AI crawler user agents (GPTBot, ClaudeBot, PerplexityBot, CCBot, Meta-ExternalAgent, Bytespider, Applebot, OAI-SearchBot, ChatGPT-User, Claude-User). Google-Extended is a robots.txt control token rather than a user agent string, so it never appears in logs; the fetching is done by Google’s existing crawlers. The audit reports which AI crawlers are active, at what rates, and accessing which content paths.
  • Robots.txt vs actual access comparison. For each AI crawler that the site’s robots.txt addresses, verify that the actual log behavior matches the declared rule. Crawlers that ignore opt-outs need CDN-layer enforcement, not just robots.txt declarations.
  • AI fetch agent behavior on key pages. Real-time fetch agents (ChatGPT-User, Claude-User) retrieve content when users ask questions. Spot-checking representative pages by issuing test queries through the AI systems reveals whether the site’s content is being cited and how accurately.
  • CDN-layer AI crawler policy verification. Cloudflare AI Audit, Fastly Bot Management, and similar tools have their own dashboards showing AI crawler categorization. The dashboard data should align with what the site’s policy specifies.
  • Emerging standards coverage. Whether the site implements TDM Reservation headers, llms.txt, ai.txt, or other emerging signals is now part of comprehensive audits. Most sites haven’t addressed these; the audit identifies whether the omission matters for the site’s content business.
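
The first two components can be combined in one pass over the access log. The sketch below assumes logs in combined log format and matches crawlers by user agent substring. The sample robots.txt, log lines, and user agent strings are hypothetical. User agents can be spoofed, so a production audit would also verify source IPs, which is not shown here.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Substrings to look for in the user agent. Google-Extended is left out on
# purpose: it is a robots.txt control token, not a string sent in requests.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Meta-ExternalAgent",
    "Bytespider", "Applebot", "OAI-SearchBot", "ChatGPT-User", "Claude-User",
]

# Matches the request, status, size, referrer, and user agent of a combined-format line.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Allow: /
"""

SAMPLE_LOG = """\
203.0.113.5 - - [10/Mar/2025:10:00:00 +0000] "GET /articles/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"
198.51.100.7 - - [10/Mar/2025:10:00:05 +0000] "GET /articles/2 HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
198.51.100.7 - - [10/Mar/2025:10:00:09 +0000] "GET /private/report HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
192.0.2.9 - - [10/Mar/2025:10:00:12 +0000] "GET /articles/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
"""


def identify_crawler(user_agent: str):
    """Return the first known AI crawler name found in the user agent, else None."""
    ua = user_agent.lower()
    for name in AI_CRAWLERS:
        if name.lower() in ua:
            return name
    return None


def audit_log(log_text: str, robots_txt: str):
    """Count AI crawler requests and list requests that robots.txt disallows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    hits = Counter()
    violations = []
    for line in log_text.splitlines():
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        crawler = identify_crawler(match["ua"])
        if crawler is None:
            continue
        hits[crawler] += 1
        if not rules.can_fetch(crawler, match["path"]):
            violations.append((crawler, match["path"]))
    return hits, violations


hits, violations = audit_log(SAMPLE_LOG, SAMPLE_ROBOTS)
```

Any entry in `violations` is a request that the declared policy disallows, which makes it a candidate for enforcement at the CDN layer.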

The implication for audit reports: AI crawler activity belongs in the technical SEO findings section alongside crawl budget, indexation, and Core Web Vitals. Reports that omit this category miss material risk for sites where content is valuable.


Monitoring and SIEM integration for ongoing audits:

Audits historically happened on a cadence: weekly, monthly, quarterly. Modern audit practice combines cadence audits with continuous monitoring that catches issues between scheduled reviews.

The infrastructure that supports continuous audit posture:

  • Log aggregation. Crawler activity feeds into platforms like Datadog, Splunk, Elastic, or self-hosted ELK stacks. The platforms provide queryable views of crawler behavior across time, enabling pattern detection without manual log analysis.
  • Bot management dashboards. Cloudflare Bot Management, DataDome, PerimeterX, and Akamai Bot Manager categorize incoming traffic by crawler identity with verification metadata. The dashboards show what’s allowed, what’s blocked, and what’s challenged.
  • SIEM integration. For sites with security operations, crawler activity feeds into SIEM platforms (Splunk Enterprise Security, IBM QRadar, Microsoft Sentinel) alongside other security signals. The integration matters when crawler behavior overlaps with security concerns: content exfiltration patterns, credential stuffing disguised as crawler activity, competitor scraping.
  • Synthetic monitoring. Pingdom, UptimeRobot, and Datadog Synthetics check critical URLs on schedules. Status code changes, content drift, or unexpected redirects trigger alerts within minutes rather than waiting for the next scheduled audit.
  • CrUX and Search Console alerts. Google’s free tooling provides baseline alerts for indexability changes and Core Web Vitals regressions. The signal-to-noise ratio is good for low-volume sites; high-volume sites typically supplement with paid RUM tools.
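
At its core, a synthetic check compares what a URL returns now against a stored baseline. The sketch below shows only that comparison step, with the fetching left out. The `Snapshot` shape, the helper names, and the alert wording are assumptions, and hosted monitoring tools implement this in their own ways.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    status: int       # HTTP status of the final response
    final_url: str    # URL after following redirects
    body_hash: str    # fingerprint of the response body


def fingerprint(body: str) -> str:
    """Hash the body so content drift can be detected without storing pages."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def diff_snapshots(baseline: dict, current: dict) -> list:
    """Compare per-URL snapshots and describe every change worth an alert."""
    alerts = []
    for url, old in baseline.items():
        new = current.get(url)
        if new is None:
            alerts.append(f"{url}: missing from current check")
            continue
        if new.status != old.status:
            alerts.append(f"{url}: status {old.status} -> {new.status}")
        if new.final_url != old.final_url:
            alerts.append(f"{url}: now resolves to {new.final_url} (was {old.final_url})")
        if new.body_hash != old.body_hash:
            alerts.append(f"{url}: content changed")
    return alerts


# Hypothetical baseline and a later check where /pricing started redirecting.
baseline = {
    "/": Snapshot(200, "/", fingerprint("home")),
    "/pricing": Snapshot(200, "/pricing", fingerprint("pricing v1")),
}
current = {
    "/": Snapshot(200, "/", fingerprint("home")),
    "/pricing": Snapshot(301, "/plans", fingerprint("pricing v1")),
}

alerts = diff_snapshots(baseline, current)
```

Hashing the whole body is noisy on pages with dynamic content, so a practical check would usually fingerprint only stable elements such as the title, canonical, and robots directives.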

The cadence-vs-continuous distinction matters because some issues compound silently between audits. A site that audits quarterly may discover three months of degraded crawler activity that continuous monitoring would have flagged on day one.


Audit cadence by site type and scale:

Different site types have different audit needs. The standard recommendation (weekly Search Console, monthly Screaming Frog, quarterly Sitebulb) is a baseline; site type and scale modify it.

  • Large e-commerce sites with frequent inventory changes. Daily synthetic monitoring of category and product page templates; weekly Screaming Frog crawls of changed sections; monthly full-site crawls; quarterly comprehensive Sitebulb audits with competitor comparison.
  • News and publisher sites with constant new content. Hourly synthetic monitoring of breaking news templates; daily IndexNow notification verification; weekly Screaming Frog crawls focused on freshness signals; quarterly comprehensive audits.
  • B2B sites with slower content cadence. Weekly Search Console review; monthly Screaming Frog crawl; quarterly deep audit. Lower-frequency monitoring matches lower-frequency content updates.
  • High-stakes regulated sites (legal, medical, financial). All of the above plus continuous YMYL compliance auditing. Pages affecting health or financial decisions warrant tighter audit cadence because content errors carry higher consequences.
  • Sites with major recent migrations. Daily monitoring for the first 30 days post-migration; weekly audits for 90 days; standard cadence afterward. Migrations are when undetected issues accumulate fastest.

The cost-benefit calculation: audit cadence should match content velocity, business risk, and team capacity. Sites that audit too rarely accumulate hidden issues; sites that audit too frequently spend audit time that would produce more value elsewhere. The right rhythm is whatever catches material issues within their useful diagnostic window.


Audits aren’t projects. They’re a rhythm:

Technical SEO audits exist to find the problems before they become urgent and to maintain the health of the site between major projects. The combination of Screaming Frog, Sitebulb, and Search Console covers most diagnostic questions for most sites.

The fundamentals stay consistent: crawl to see what’s there, compare to what Google sees, diagnose the differences, fix what matters most first. The tools change (new versions, new platforms, new features), but the underlying discipline doesn’t.

The teams that handle technical SEO well audit on a regular rhythm: weekly Search Console checks, monthly Screaming Frog crawls, quarterly deep Sitebulb audits. The teams that don’t handle it well let issues accumulate until something breaks visibly, then scramble to diagnose. The first approach is cheaper and produces better outcomes; the second approach is more expensive and reactive.

The tooling supports either approach, but the cadence of regular auditing determines whether the tooling pays back its cost.