The Case of the Site that Wouldn’t Index
Complications with WordPress and Organic Search Listings
This article explores why a website that, to all appearances, seemed to be working could not be indexed by search engines like Google or Bing despite having otherwise correct settings and a clean robots.txt file.
Recently a friend of mine reached out to our social network on Facebook asking for help with his non-profit’s website, which he couldn’t seem to get Google to index.
After reading the comments it became clear he was doing everything right: the robots.txt was written correctly and the page headers didn’t contain a NOINDEX directive. Several friends suggested it just took a while to be listed in organic search results, and he replied that he had been waiting for months.
However, he explained that in Google Search Console when he attempted to “fetch” the site it would respond with an error.
Sure enough, the status was Unreachable.
It Loads Fine, Why Won’t it Index?
The first thing I did was load up the site to see if the page was rendering at all for the public. Perhaps it was only working for him in some sort of sandbox mode?
Yes, it loaded up all the content just fine. The site was fully online.
From top to bottom all of the page content appeared to be loading, all the way down to the legal footer and copyright.
Just in case, I opened up the source code for the home page to make sure the markup wasn’t being truncated and the page terminated prematurely.
Indeed, the page was loading completely. The full markup was there, and yet Google was choking when attempting to fetch it.
Wanting to check if there were any redirect issues I enabled the Chrome plugin “Redirect Path” (which I recommend for any web developer). Immediately the problem presented itself.
The site was returning a “500 Internal Server Error” status code despite rendering the page! I would’ve noticed this if I’d opened up my Chrome developer tools as well, but just happened to see it with the Redirect Path plugin first.
What is a 500 Internal Server Error?
The HTTP protocol contains a list of possible response codes for a given web request.
“200 OK” means the page loaded normally. “302 Found” (a temporary redirect) means the page wasn’t there as expected but the server knew to send you to another page instead. “404 Not Found” means the page wasn’t there and the server didn’t have anything else to give you. A full list can be found here: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
“500 Internal Server Error” means the server hit an unexpected error while building the page, typically a fatal one that causes either a portion or the entirety of the page to fail.
Yet, strangely enough, the entire site appeared to still be loading!
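The same check can be made without a browser plugin. Below is a minimal Python sketch, using only the standard library, that requests a URL and reports both the status code and the body it received. The function name fetch_status and the User-Agent string are illustrative choices, not anything from the original troubleshooting session. Note that urllib raises an exception for 4xx and 5xx responses, but that exception still carries the full response body, which is exactly the situation described above.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return (status_code, body_bytes) for a GET request to url.

    urllib raises HTTPError for 4xx/5xx responses, but the error object
    still exposes the status code and the response body, so both are
    returned either way.
    """
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "status-check/0.1"},  # arbitrary, illustrative value
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # The server answered, just not with a success code.
        return error.code, error.read()


# Example use (the URL is a placeholder):
#   status, body = fetch_status("https://example.org/")
#   print(status, len(body), "bytes")
```

A page that renders in the browser but comes back with a status of 500 here would show the same symptom: a non-empty body paired with an error code.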
Built with WordPress Minus a Few Things
After logging into the server itself to poke around I found a standard WordPress installation with a custom theme and a variety of plugins installed.
WordPress is incredibly fault tolerant these days. So despite the persistent and loud fatal errors documented in the server error log it was still serving up the page as best it could.
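When the error log is noisy, it helps to see which files the fatal errors come from. The following is a rough Python sketch that tallies PHP fatal errors per source file. The log line shape it assumes ("PHP Fatal error: <message> in <file> on line <n>", or "<file>:<n>") is common for PHP but varies by server configuration, so treat the pattern as an assumption to adjust rather than a fixed format.

```python
import re
from collections import Counter

# Assumed line shapes (adjust for your server's log format):
#   ... PHP Fatal error:  <message> in <file> on line <n>
#   ... PHP Fatal error:  <message> in <file>:<n>
FATAL_PATTERN = re.compile(
    r"PHP Fatal error:\s+(?P<message>.*?) in (?P<file>\S+)"
    r"(?: on line |:)(?P<line>\d+)"
)


def count_fatal_errors(log_lines):
    """Count fatal errors per source file from an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = FATAL_PATTERN.search(line)
        if match:
            counts[match.group("file")] += 1
    return counts


# Example use (the path is a placeholder):
#   with open("/path/to/error.log", encoding="utf-8", errors="replace") as log:
#       for path, total in count_fatal_errors(log).most_common():
#           print(total, path)
```

Files under the theme or plugin directories that dominate the tally are a reasonable place to start looking.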
Some quick Googling around showed that the theme needed the Vafpress framework for a portion of its functionality (that was essentially unused and thus wasn’t obvious), and that the JchOptimize plugin was missing parts of its installation files.
By simply installing the missing framework and disabling/re-installing the plugin the site suddenly ceased having the 500 errors and returned a “200 OK” instead.
After verifying the rest of the site pages were loading correctly as well I went back to the Google Search Console and gave it another go.
Success! We are Go for Google Organic Search
This time Google was able to fetch and render the entire site in a matter of minutes. By the next morning it was showing up in Google search results as well.
Google, and other web services, trust HTTP status codes more than their “eyes.”
In the case of this website, WordPress recovered with alacrity from the theme/plugin errors, and in doing so it masked the underlying technical problem.
Neither the hosting provider nor Google would’ve been able to provide any meaningful support in getting the site indexed. Only a direct look at how and why the page was being built and the HTTP status code being returned to the browser could clue us in to what was really happening.
It’s understandable why so much of the troubleshooting advice for WordPress users who run into problems is to switch back to a default theme and disable plugins. The problems could be buried in the custom theme code, or a conflict in the plugin, or missing dependencies. It’s better to start from the simplest possible place and add back complexity until it breaks than to be left wondering which of the many parts is broken!
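The "add back complexity until it breaks" approach can be sketched as a small loop. This is an illustrative Python outline only: find_breaking_component and the is_healthy callback are hypothetical names, and how you actually enable a theme or plugin and test the site (for example, by checking that the home page returns “200 OK”) is left to the callback.

```python
def find_breaking_component(components, is_healthy):
    """Enable components one at a time; return the first that breaks the site.

    components: ordered list of names (for example theme or plugin slugs).
    is_healthy: hypothetical callback that receives the list of currently
                enabled components and returns True when the site responds
                correctly with exactly those enabled.
    Returns None if every component can be enabled without a failure.
    """
    enabled = []
    # Confirm the baseline first: if the bare site is already broken,
    # adding components back will not tell us anything.
    if not is_healthy(enabled):
        raise RuntimeError("site is unhealthy even with everything disabled")
    for name in components:
        enabled.append(name)
        if not is_healthy(enabled):
            return name
    return None
```

This only reports the first culprit. When more than one part is broken, as with the theme framework and the plugin in this case, fix or remove the reported component and run the loop again.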