
Soft 404 vs hard 404: what each does to indexing

A 404 is the server saying “this page doesn’t exist”:

When a browser or crawler requests a URL the server can’t fulfill, the server returns an HTTP status code that explains the problem. The 404 status code, the most familiar of the error codes, means the requested resource was not found.

That single status code is doing real work. The browser uses it to decide whether to display the page or an error message. The search engine uses it to decide whether to keep the URL in its index, remove it, or treat the page as still meaningful. The analytics system uses it to flag the request as a missing-page event for tracking. The CDN uses it to decide whether to cache the response and for how long.

The trouble starts when the response doesn’t match what actually happened. A page that’s gone forever should return 404 (or 410). A page that the user is being redirected from should return 301 or 302. A page that exists but is temporarily failing should return a 5xx code, typically 503 Service Unavailable. When the status code disagrees with the reality of the page, Google and other crawlers can’t trust the signal. The most common mismatch, a missing page served with 200 OK, is called a soft 404.
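The rule “one status code per real situation” can be written down as a lookup table. What follows is a minimal Python sketch. The PageState names are hypothetical and don’t come from any particular CMS. For the temporary-failure case it uses 503 Service Unavailable, the 5xx code defined for temporary conditions. The helper also encodes the soft 404 definition: a page that doesn’t exist, served with a 2xx status.

```python
from enum import Enum, auto


class PageState(Enum):
    """Hypothetical states a site might track for a URL (illustrative only)."""
    LIVE = auto()
    MISSING = auto()
    REMOVED_PERMANENTLY = auto()
    MOVED_PERMANENTLY = auto()
    MOVED_TEMPORARILY = auto()
    TEMPORARY_FAILURE = auto()


# One status code per real-world situation.
STATUS_FOR_STATE = {
    PageState.LIVE: 200,
    PageState.MISSING: 404,
    PageState.REMOVED_PERMANENTLY: 410,
    PageState.MOVED_PERMANENTLY: 301,
    PageState.MOVED_TEMPORARILY: 302,
    PageState.TEMPORARY_FAILURE: 503,
}


def status_for(state: PageState) -> int:
    """Return the status code that matches what actually happened to the page."""
    return STATUS_FOR_STATE[state]


def is_soft_404(state: PageState, status_sent: int) -> bool:
    """A soft 404: the page doesn't exist, but the server answered with 2xx."""
    gone = state in (PageState.MISSING, PageState.REMOVED_PERMANENTLY)
    return gone and 200 <= status_sent < 300
```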


Hard 404: the server confirms the page is missing:

A hard 404, sometimes called a true 404, is the response a web server returns when the URL requested doesn’t map to any resource on the site.

The HTTP response looks like this:

HTTP/1.1 404 Not Found
Content-Type: text/html

<html>
  <head><title>Page not found</title></head>
  <body>
    <h1>404 - Page not found</h1>
    <p>The page you requested doesn't exist on this site.</p>
  </body>
</html>

Two parts matter. The status line at the top declares 404 Not Found. The body of the response can be anything: a styled error page, a search box, links to popular pages, a humorous illustration. The server is saying “this URL doesn’t lead anywhere,” and the browser displays whatever HTML accompanies that declaration.

Hard 404s are the correct response for genuinely missing pages. A blog post that got deleted. A product page for a product that was discontinued. A URL someone typed wrong. The crawler reads the 404 status, removes the URL from its active index (or never adds it in the first place), and re-crawls it less often over time. The signal is unambiguous: this URL has nothing to offer.

For URLs that are permanently gone, the related status code is 410 Gone. The difference is intent. 404 says only that nothing is at this URL right now, with no promise about whether that will change. 410 says the page was removed deliberately and won’t come back. Google treats both similarly in practice, though it may drop a 410 URL slightly faster because the signal is more definitive.
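In practice, the choice between 404 and 410 usually depends on whether the site keeps a record of what it deliberately retired. Here is a minimal sketch, assuming two hypothetical in-memory sets, LIVE_PATHS and REMOVED_PATHS. A real site would read these from its CMS or routing table.

```python
# Hypothetical: URLs that currently map to real content.
LIVE_PATHS = {"/", "/blog/", "/products/widget-pro"}

# Hypothetical: URLs the site has deliberately and permanently retired.
REMOVED_PATHS = {"/blog/2019-holiday-sale", "/products/widget-classic"}


def status_for_path(path: str) -> int:
    """Pick 200, 410, or 404 for a request path."""
    if path in LIVE_PATHS:
        return 200
    if path in REMOVED_PATHS:
        return 410  # known to be gone on purpose: the more definitive signal
    return 404      # unknown: never existed, mistyped, or no record either way
```

Any URL the site has no record of falls through to 404. That makes 404 the safe default, and 410 becomes an opt-in for URLs someone has explicitly marked as retired.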


Soft 404: the page acts missing while claiming to exist:

A soft 404 is the failure mode that breaks the signal. The page returns HTTP 200 (OK, the page exists) while the body of the page tells the user the content isn’t there.

Examples that produce soft 404s:

A site’s catch-all handler serves a generic “page not found” message (or the homepage itself) for any unknown URL, returning 200 the entire way. The user sees an error. The crawler sees a successful response to a real page.

A search results page that finds zero matching products renders with the message “No results found” but returns 200. To a crawler, it’s a valid page. To a user, it’s empty.

A category page that lost all its products through inventory changes still loads, returns 200, and displays “This category is currently empty.” The page technically works. It contains nothing.

A product page that gets soft-deleted in the CMS doesn’t return 404. Instead, it redirects to a generic error template that returns 200. The product is gone. The URL is still serving content that doesn’t represent anything.

In every case, the user understands the situation: there’s nothing on this page. The crawler doesn’t. The HTTP 200 says “fine, normal page,” so the crawler indexes the URL, keeps re-crawling it, and consumes crawl budget on a page that should have been retired.

Google detects many soft 404s automatically. Detection relies on signals such as very thin content combined with “not found” phrasing in the page body. When Google classifies a URL as a soft 404, the URL is listed in Search Console’s Page indexing report (formerly the Coverage report) under the “Soft 404” reason. The page typically drops out of the index even though the server keeps returning 200. The end state resembles a hard 404 (the page falls out of search), but it is reached through pattern detection rather than a clear status code. That means it takes longer and isn’t perfectly reliable.
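Google’s actual classifier isn’t public. A site can still run the same kind of check on its own pages: thin content plus “not found” phrasing, served with 200. Below is a rough Python heuristic, not Google’s method. The phrase list and the 150-word threshold are assumptions to tune per site, and the tag stripping is deliberately crude.

```python
import re

# Assumed phrases and threshold; tune both for the site being audited.
NOT_FOUND_PHRASES = (
    "not found", "no results", "does not exist",
    "no longer available", "currently empty",
)
MIN_WORDS = 150


def visible_text(html: str) -> str:
    """Crude text extraction: drop script/style blocks, then all tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    return re.sub(r"<[^>]+>", " ", html)


def looks_like_soft_404(status: int, html: str, min_words: int = MIN_WORDS) -> bool:
    """Flag 200 responses that are both thin and phrased as 'missing'."""
    if status != 200:
        return False  # a real 404/410 is not a soft 404
    text = visible_text(html).lower()
    thin = len(re.findall(r"\w+", text)) < min_words
    says_missing = any(phrase in text for phrase in NOT_FOUND_PHRASES)
    return thin and says_missing
```

Both conditions are required, so a short but legitimate page isn’t flagged, and neither is a long article that happens to mention “not found.” The output is a list of suspects for manual review, not a verdict.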


Why soft 404s happen by accident:

Most soft 404s aren’t deliberate. They’re side effects of how sites get built. Four patterns produce the bulk of them.

The catch-all 200 handler. A CMS or framework configuration routes unknown URLs through a custom error template, and the template returns 200 instead of 404. The “page not found” message displays correctly to users while the status code says everything is fine.

The empty result page. Search pages, filter pages, and dynamic listing pages return 200 even when they have no content to show. Every empty filter combination becomes a soft 404 in Google’s eyes.

The thin redesign. A site rebuild eliminates pages that used to have content. The URLs still resolve, but the new pages contain placeholder text, generic site-wide messaging, or just a navigation menu. To a user, the page looks empty. To Google, it looks like a page that lost its content but is still being served.

The unintentional content removal. A staging environment or CMS bug deletes the body content from many pages at once while keeping the page records. URLs return 200 because the page record exists. The body is empty or contains template-default text. Without close attention, these pages can sit in the index for weeks, accumulating soft 404 flags as Google’s detection catches up.

The common pattern across all four: the server doesn’t know the page is broken. It returns 200 because, technically, the request succeeded. The breakdown is at the content layer, which the server isn’t checking.
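One way to give the server some visibility into the content layer is an audit that runs after deploys or CMS migrations. It flags page records that would be served with 200 despite having an empty or template-default body. The minimal sketch below assumes a hypothetical PageRecord with a plain-text body, a placeholder list, and a 200-character minimum. None of these come from a specific CMS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageRecord:
    """Hypothetical CMS record: the row can exist even when its body is empty."""
    path: str
    body: str  # assumed to be plain text, not HTML


# Assumed template-default strings; a real site would list its own.
PLACEHOLDER_BODIES = {"", "lorem ipsum", "content coming soon"}


def content_problem(record: PageRecord, min_chars: int = 200) -> Optional[str]:
    """Explain why a 200 for this record would likely read as a soft 404, else None."""
    body = record.body.strip()
    if body.lower() in PLACEHOLDER_BODIES:
        return f"{record.path}: body is empty or template default text"
    if len(body) < min_chars:
        return f"{record.path}: body has only {len(body)} characters"
    return None


def audit(records: List[PageRecord]) -> List[str]:
    """Collect every flagged record so someone can restore, redirect, or 404 it."""
    return [p for p in (content_problem(r) for r in records) if p is not None]
```

The audit reports problems instead of changing status codes automatically. An emptied page might need its content restored rather than a 404, and that decision belongs to a person.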


Hard 404 vs soft 404: what changes for SEO:

The practical SEO consequences of each pattern differ in specific ways.

| Behavior | Hard 404 (correct) | Soft 404 (broken) |
|---|---|---|
| HTTP status code | 404 (or 410) | 200 |
| Crawler interpretation | "URL is gone, drop from index" | "URL is fine, keep in index" |
| Indexing | URL is not indexed, or is dropped if it was | URL enters index, may be dropped later |
| Crawl frequency | Drops over time as URL repeatedly returns 404 | Continues at normal frequency, wastes crawl budget |
| Link equity | Equity from inbound links is lost cleanly | Equity gets stuck on a dead page that can't pass it along |
| User experience | Browser shows error page; users understand | Page loads, but content tells user something is wrong |
| Time to clear from index | Days to weeks for definitive removal | Weeks to months, depending on Google's soft 404 detection |
| Search Console reporting | "Not found (404)" in Page indexing report | "Soft 404" in Page indexing report |

The most damaging consequence is the crawl waste. A large site with thousands of soft 404 URLs (often from filter combinations, search pages, or stale product listings) burns through crawl budget on pages that won’t ever provide value. The pages Google should be re-crawling more often (money pages, fresh content, important updates) get crawled less because the crawler is busy re-fetching soft 404s.

The link equity consequence affects sites with external inbound links to URLs that have become soft 404s. The link signal arrives, but the destination can’t do anything with it because the page has no content to amplify. A hard 404 at the same URL at least makes the loss explicit: the URL shows up as a 404 in crawl reports, and the site can recover the signal by adding a 301 redirect to the most relevant live page. A soft 404 leaves the signal stranded.


Detection: finding the soft 404s already on the site:

Most sites with significant content have some soft 404 problems they don’t know about. Detection is the first step in fixing them.

Google Search Console’s Page indexing report (formerly the Coverage report) groups known URLs by indexing status. Under “Why pages aren’t indexed,” the “Soft 404” reason lists the URLs Google has classified as soft 404s, along with the date each was last crawled. Running the URL Inspection tool on an individual URL shows what Google actually fetched, including the crawled HTML. For most sites, this report is the primary detection tool.

Crawler tools like Screaming Frog and Sitebulb identify potential soft 404s through different signals. Screaming Frog flags pages with thin content (under a configurable word count) and pages with content matching configurable phrases (“not found,” “no results,” “page does not exist”). The crawl produces a list of suspect URLs that need manual review.

Server log analysis catches a third category: pages that return 200 to crawler requests but are never linked from within the site. A 200 page that no other page links to might be a soft 404 the site doesn’t know exists. Log analysis tools like Splunk, Screaming Frog Log File Analyser, and Botify surface these patterns at scale.
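The core of that log check is a set difference: URLs Googlebot fetched with a 200, minus URLs the site links to internally. Here is a minimal sketch, assuming access logs in the Combined Log Format and an internal-link set taken from a site crawl. Matching “Googlebot” in the user-agent string is a simplification, because user agents can be spoofed; verifying real Googlebot traffic takes a reverse DNS lookup, which is not shown here.

```python
import re
from typing import Iterable, List, Set

# Combined Log Format:
# host ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawled_200_paths(lines: Iterable[str], agent_token: str = "Googlebot") -> Set[str]:
    """Paths the crawler fetched successfully (user-agent match only, not verified)."""
    paths = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["status"] == "200" and agent_token in m["agent"]:
            paths.add(m["path"])
    return paths


def orphan_candidates(lines: Iterable[str], internally_linked: Set[str]) -> List[str]:
    """200-returning crawled paths that no internal page links to."""
    return sorted(crawled_200_paths(lines) - set(internally_linked))
```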

For small sites, manual sampling works. Pick a sample of pages that analytics flags as low-traffic, visit each one, and check whether it actually contains content. The review takes minutes per page, and the fixes that come out of it often eliminate dozens of soft 404s the site didn’t realize existed.


The four fixes:

When a soft 404 is identified, the response depends on what the page should be doing. Four possible outcomes:

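Convert to hard 404. If the page is genuinely gone and shouldn’t exist, configure the server to return 404 (or 410) for the URL. The CMS error template needs to return the correct status code, not just display the right message. In Apache, this typically means an ErrorDocument 404 directive (in the server config or .htaccess) pointing to a local path such as /404.html. If the directive points to a full URL instead, Apache sends a redirect rather than the 404. In Nginx, it means a try_files fallback that ends in =404, paired with an error_page 404 directive for the custom page. In application frameworks, it means the route handler returning the correct status.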
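At the application layer, the fix comes down to this: the fallback route still renders the friendly error template, but it sends the 404 status with it. Below is a minimal sketch using a plain WSGI callable instead of any particular framework. The PAGES dictionary and the template markup are hypothetical stand-ins for real routing and templates.

```python
# Hypothetical routes; in a real framework the router does this lookup.
PAGES = {"/": "<h1>Home</h1>", "/about": "<h1>About us</h1>"}

NOT_FOUND_HTML = """<html><head><title>Page not found</title></head>
<body><h1>404 - Page not found</h1>
<p>The page you requested doesn't exist on this site.</p>
<form action="/search"><input name="q"></form>
<p><a href="/">Back to the homepage</a></p></body></html>"""


def app(environ, start_response):
    """WSGI app whose catch-all branch sends a real 404, not a 200."""
    path = environ.get("PATH_INFO", "/")
    page = PAGES.get(path)
    if page is None:
        # Same friendly template a catch-all handler would show,
        # but the status line now matches the message.
        status, body = "404 Not Found", NOT_FOUND_HTML
    else:
        status, body = "200 OK", page
    data = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```

To try it locally, pass app to wsgiref.simple_server.make_server and call serve_forever(). Then request an unknown path and check the status line.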

Redirect to a related page. If the missing page had content that’s now somewhere else, a 301 redirect points the URL at the new location. A discontinued product redirects to a similar product or to the category page. An old blog post redirects to an updated version. The redirect preserves the link equity and serves users who arrive from external links.
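A redirect map is usually just a lookup from retired URL to replacement, checked before normal routing. Here is a minimal sketch. The REDIRECTS entries are hypothetical examples of the two cases above: a discontinued product pointing to its successor, and an old post pointing to its updated version.

```python
from typing import Dict, List, Optional, Tuple

Headers = List[Tuple[str, str]]

# Hypothetical redirect map: retired URL -> closest live replacement.
REDIRECTS: Dict[str, str] = {
    "/products/widget-classic": "/products/widget-pro",  # discontinued -> successor
    "/blog/seo-tips-2019": "/blog/seo-tips",             # old post -> updated version
}


def redirect_response(path: str) -> Optional[Tuple[str, Headers]]:
    """Return a 301 status line and Location header if path has moved, else None."""
    target = REDIRECTS.get(path)
    if target is None:
        return None  # not a moved URL: fall through to normal routing (200 or 404)
    return "301 Moved Permanently", [("Location", target)]
```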

Restore real content. If the page should have content but currently doesn’t (empty filter combinations, depleted categories, broken templates), the fix is at the content layer. Restore the products. Re-stock the category. Fix the template. The URL stays. The content returns.

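Block from indexing. For pages that should exist for users but shouldn’t be indexed (internal search results, infinite filter combinations, parameter variants), the right tool is noindex, set either as a robots meta tag or as an X-Robots-Tag response header. The page still works for users. The crawler still fetches it but stops adding it to the index. That is why the URL has to stay crawlable: if robots.txt blocks it, the crawler never sees the noindex.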
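The header form of noindex is convenient for templated pages because it can be decided in one place from the URL and the result count. The sketch below assumes internal search lives at /search and that color, size, and sort are the site’s filter parameters. Both are hypothetical conventions. The page keeps its 200 status and simply gains the header.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlsplit

# Assumed URL conventions for this illustration.
SEARCH_PATH = "/search"
FILTER_PARAMS = {"color", "size", "sort"}


def robots_headers(url: str, result_count: int) -> List[Tuple[str, str]]:
    """Extra response headers: noindex for search, filtered, or empty listings."""
    parts = urlsplit(url)
    params = set(parse_qs(parts.query))
    is_search = parts.path == SEARCH_PATH
    is_filtered = bool(params & FILTER_PARAMS)
    if is_search or is_filtered or result_count == 0:
        # Still served with 200 for users; crawlers are asked not to index it.
        return [("X-Robots-Tag", "noindex")]
    return []
```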

The choice between options depends on the page’s role. Genuinely deleted content gets 404. Content that moved gets redirected. Content that should exist gets restored. Content that exists for users but not for search gets noindex. Each wrong choice creates a different problem. Returning 404 for content that merely moved breaks bookmarks and strands inbound links. Redirecting an empty filter to the homepage creates a different kind of soft 404. Noindexing a real page hides it from searchers who would benefit.


Custom 404 pages: useful without breaking the signal:

A well-designed 404 page reduces user frustration without compromising the underlying HTTP signal. The two layers operate independently.

The status code must be 404. This is the non-negotiable part. The server returns HTTP/1.1 404 Not Found regardless of how pretty the error page looks. Tools like the Network panel in browser DevTools, curl -I, and the URL Inspection tool in Search Console all confirm whether a 404 page is returning the correct status code.
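The same check can be scripted for a list of URLs. Here is a minimal Python equivalent of curl -I, using only the standard library. It sends a HEAD request and turns off urllib’s automatic redirect following, so a 301 is reported as 301 instead of as the status of its destination. One caveat: some servers answer HEAD differently from GET, so a surprising result is worth confirming in DevTools.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses so the first status is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_opener = urllib.request.build_opener(NoRedirect)


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request (like curl -I) and return the raw HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with _opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx (and for 3xx once redirects are disabled);
        # the status code is still available on the exception.
        return err.code
```

Pointed at a made-up path on a correctly configured site, fetch_status should return 404. If it returns 200, the custom error page is a soft 404.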

The page body can be anything useful. Common patterns:

A search box for users who typed a wrong URL. A sitemap or list of popular pages for users who arrived from a stale link. A friendly explanation of what happened and what the user can do next. A link back to the homepage or to the most likely intended destination based on the URL structure. A reporting mechanism for users to flag broken links.

Some sites add humor, illustrations, or animation. These choices don’t break SEO as long as the status code is correct. The user sees a thoughtful 404 page. The crawler sees a 404 response.

The opposite pattern, a stylish “page not found” message that returns 200, is the classic soft 404. Same visible page, different status code, different SEO consequence. The fix is often a single line of server configuration.

For larger sites, the 404 page also gets logging. Patterns in the 404 traffic reveal which broken links readers are hitting most often, which often surfaces missing redirects that should be added. A 404 page that gets 500 hits a month for the same URL is a strong signal that URL should be redirecting to its replacement.
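The logging side can start as a simple count of 404 responses per path, sorted by volume. Paths at the top of the list are the first candidates for a redirect. The sketch below assumes Common or Combined Log Format access logs and looks only at GET and HEAD requests. The min_hits and limit parameters are illustrative knobs.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches the request and status fields of Common/Combined Log Format lines.
REQUEST_STATUS = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def top_404s(lines: Iterable[str], min_hits: int = 1, limit: int = 20) -> List[Tuple[str, int]]:
    """Most frequently requested 404 paths: likely missing redirects."""
    counts: Counter = Counter()
    for line in lines:
        m = REQUEST_STATUS.search(line)
        if m and m["status"] == "404":
            counts[m["path"]] += 1
    return [(path, n) for path, n in counts.most_common(limit) if n >= min_hits]
```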


Seven 404 anti-patterns:

The status-code mistakes that produce soft 404s tend to repeat across sites, often at the framework or template layer where they’re easiest to miss.

  1. Unknown URLs redirected or rewritten to a homepage that returns 200. Whether the server uses a 3xx redirect to the homepage or an internal rewrite, the end state is the same: a non-404 response for a URL that should have been a 404. Fix: keep unknown URLs returning 404. Use a dedicated 404 page that returns the right status code with helpful content.
  2. Empty search and filter pages returning 200. Zero-result pages exist as real URLs in the index. Fix: add noindex to zero-result pages. For high-volume filter combinations that should be indexable when populated, return 404 only when truly empty.
  3. Discontinued product pages with placeholder text. Pages still load, but the content reads “This product is no longer available.” Fix: decide per-product whether to redirect to a successor product (301), block from indexing (noindex), or remove entirely (404). Don’t leave placeholder pages in the index.
  4. 404 page returning 200. The visible error message displays correctly, but the status code says everything is fine. Fix: server configuration. The HTTP status code must match the user-facing message. Test with curl -I or DevTools to confirm.
  5. No 404 monitoring. Soft 404s accumulate without anyone noticing because no one looks at the Page indexing report. Fix: add the Soft 404 report to monthly site health reviews. Set up alerts for spikes in soft 404 detections.
  6. 404 page with no navigation. The error page is a dead end. Users have no way to recover. Fix: every 404 page should include site navigation, a search box, and links to high-value content. Make it useful, not just present.
  7. 404 URLs blocked by robots.txt. If robots.txt blocks the path pattern, the crawler never requests those URLs, so it never sees the 404 response and never learns the URL is gone. Fix: don’t block URL patterns that need to return 404. The 404 response itself is the signal Google uses to drop the URL; that signal can only travel if the crawler is allowed to fetch the response.
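The robots.txt anti-pattern is easy to test before deploying a rules change. Python’s standard urllib.robotparser can evaluate a robots.txt file against the list of URLs that are supposed to return 404. The sketch below assumes the site keeps such a list (retired_urls is a hypothetical input). The parser applies standard prefix matching and may not reproduce every detail of Google’s own matcher, such as wildcards, so treat its answer as a first pass.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_retired_urls(robots_txt: str, retired_urls: Iterable[str],
                         user_agent: str = "Googlebot") -> List[str]:
    """Retired URLs the crawler can't fetch, so it can never see their 404."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in retired_urls if not parser.can_fetch(user_agent, url)]
```

Every URL this returns is one whose 404 will never reach the crawler. Either remove the matching Disallow rule or accept that the URL may linger.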

An eighth pattern worth flagging: redirect chains that end in 404. A URL redirects through three or four 301s before landing at a hard 404. The crawler treats this as a 404 but wastes time getting there. Fix: audit redirect chains periodically. Either fix the final destination so the redirect lands on a real page, or shortcut the chain to land on the 404 directly.
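If redirects live in a map like the one a site would export from its server config or CMS, the chain audit can be automated. The sketch below resolves each source URL to its final target, rewrites the map so every source points straight to that target, and lists sources whose chain ends on a URL that isn’t live. The live_paths input is a hypothetical set of URLs known to return 200. The 10-hop default mirrors the redirect-hop limit Google documents for Googlebot.

```python
from typing import Dict, List, Set, Tuple


def resolve_chain(redirects: Dict[str, str], start: str, max_hops: int = 10) -> Tuple[str, int]:
    """Follow the map from start; return (final_url, hop_count). Raise on loops."""
    seen = [start]
    url = start
    while url in redirects:
        url = redirects[url]
        if url in seen:
            raise ValueError("redirect loop: " + " -> ".join(seen + [url]))
        seen.append(url)
        if len(seen) - 1 > max_hops:
            raise ValueError(f"chain from {start} exceeds {max_hops} hops")
    return url, len(seen) - 1


def flatten(redirects: Dict[str, str], live_paths: Set[str]) -> Tuple[Dict[str, str], List[str]]:
    """Point every source at its final target; report chains that end in a dead URL."""
    flat: Dict[str, str] = {}
    dead: List[str] = []
    for src in redirects:
        final, _ = resolve_chain(redirects, src)
        flat[src] = final
        if final not in live_paths:
            dead.append(src)  # chain lands on a 404: fix the target or 404 directly
    return flat, dead
```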


One status code wrong, three consequences:

A page that returns the wrong status code starts a chain of consequences the site operator doesn’t immediately see.

The first consequence is on the crawler. The crawler reads the 200 and treats the page as a successful response. The URL gets indexed. Subsequent crawls return for the same URL on the regular crawl schedule. The crawl budget that should have been spent finding new content gets spent re-fetching a page with nothing on it.

The second consequence is on the index. Google’s automatic soft 404 detection takes time, sometimes weeks, sometimes longer. During the detection window, the URL sits in the index, may appear in search results, and may even rank for queries that match its meager content. Users clicking those results land on a page that doesn’t deliver, and each of those visits is a bad experience they attribute to the site.

The third consequence is on the link signal. Inbound links to the soft 404 URL accumulate normally, but the destination has no content to amplify, no internal links to pass equity through, no way to consolidate the signal anywhere useful. The link equity gets stuck. The site receives the inbound traffic but can’t convert it to anything ranking can use.

By the time the soft 404 is detected and fixed, the consequences have already propagated. The crawl budget waste shows up as slower indexing of legitimate new content. The index pollution shows up as gradually degrading visibility for the affected sections of the site. The stranded link equity shows up as inbound links that don’t produce the search benefit they should.

The fix is straightforward at the moment of building the page: return the right status code for the situation. The cost of getting it wrong is invisible at first and structural by the time it shows up. The status code is one of the smallest pieces of metadata a server can send, and it’s one of the few that propagates consequences across every system that reads the page.