Google Search is a black box, and the inner workings of its algorithmic crawl bot are a closely guarded secret. Still, close examination of Search Console data does tell us something about the way Googlebot moves around websites, and how it uncovers and analyses new content.

Without getting all Mystic Meg, we watch for ripples in the pond – and try to unpick what they tell us about the way the stone was thrown.

Googlebot used to be incredibly thorough. A couple of years ago, you’d expect each and every page of a reasonably-sized, active and well-maintained website to be crawled and re-crawled every couple of weeks.

But over the last twelve months, people at the coalface of digital marketing have been seeing a new trend. 

Stack Exchange, Reddit, LinkedIn, Quora and Google’s own webmaster support forum have all been filling up with threads that talk about a marked decline in both the speed and frequency with which Google is crawling - and indexing - site content.

To begin with, it looked like the problem was limited to enormous eCommerce websites with 6 million+ unique URLs, or big websites with lots of rich media – websites that have always been indexed slowly, due to the sheer volume of content they’re throwing at Google.

But in recent months, we’ve seen complaints from the owners of much smaller websites, and our own data shows a similar trend spreading to 300-3,000 page websites as well. In fact, client Search Console accounts show a marked change in the way Google’s interacting with content.

Accounting for the noise you always find in Google Search Console, we found that:

  • Existing, high-quality content is crawled approximately 47% less frequently than it was in the period Feb 2022 to Feb 2023
  • New content is approximately 12% more likely to end up Discovered - Currently Not Indexed
  • New content is 34% more likely to end up Crawled - Currently Not Indexed
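
If you want to run the same sort of comparison on your own data, the arithmetic looks roughly like this. It's a minimal sketch rather than our exact method: the function names and every number in it are made up for illustration, and you'd swap in your own counts from Search Console exports.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100


def status_rate(pages_with_status: int, new_pages: int) -> float:
    """Share of newly published pages that ended up with a given status."""
    if new_pages == 0:
        raise ValueError("need at least one new page")
    return pages_with_status / new_pages


# Hypothetical numbers, purely for illustration.
crawls_per_page_before = 4.0  # average crawls per page per month, baseline period
crawls_per_page_after = 2.0   # same metric, comparison period
crawl_change = percent_change(crawls_per_page_before, crawls_per_page_after)

rate_before = status_rate(10, 100)  # 10 of 100 new pages stuck with a status
rate_after = status_rate(15, 100)   # 15 of 100 in the later period
likelihood_change = percent_change(rate_before, rate_after)

print(f"Crawl frequency change: {crawl_change:+.0f}%")
print(f"Change in likelihood of the status: {likelihood_change:+.0f}%")
```

Comparing rates rather than raw counts matters here, because a site that publishes more pages in the second period will naturally rack up more unindexed URLs even if nothing has changed on Google's side.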

 

And it’s worth noting that the type, quality and ‘discoverability’ of content hasn’t changed at all in this period. Our ability to deliver for clients depends on creating compelling content that adds value and ranks well in Google, so we haven’t suddenly started pumping out low-quality content.

In many cases, clients are actively contributing to this content too. Given that most of the businesses we work with are B2B organisations that trade on deep knowledge of complex industrial processes, we rely on the subject matter experts employed by our clients to craft, refine and improve everything from service landing pages to case studies. 

So, we’re confident that the content is good, but it’s not being crawled or indexed as frequently by Google. What’s going on? 


Discovered, Crawled And Indexed: What’s The Difference? 

Before we dive into the nitty gritty of Googlebot’s capabilities, it’s worth explaining how Google actually crawls and indexes content. Skip this bit if you’re an SEO geek, but it’s a good refresher for anyone who’s not 100% sure of the difference between content that’s been discovered, crawled and indexed.

In simple terms, there are three stages to indexing. According to Google, its crawl bot:

  • Discovers new URLs via links from another (crawled) web page
  • Crawls them, analysing the content, assessing the quality and deciding whether it’s worth serving in one or more search engine results pages (SERPs)
  • Adds them to its index, which ensures that they’re served when someone searches for a relevant query

New content can get stuck at any stage of this process. Google might see a link to it but decide that it’s probably not worth following the link and crawling the content, which is how content ends up marked as Discovered - Currently Not Indexed.

Or Google can choose to follow the link, crawl the page and then decide that it doesn’t want to index the content, which is how pages get marked as Crawled - Currently Not Indexed.
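
If it helps to see the three stages as logic, here's a simplified sketch of how a URL's progress maps onto the labels you see in Search Console. This is our own toy model, not Google's code: the `coverage_status` function and its three flags are hypothetical, and the final "Unknown to Google" label is just our shorthand for a URL that hasn't been discovered yet.

```python
def coverage_status(discovered: bool, crawled: bool, indexed: bool) -> str:
    """Map how far a URL got through the pipeline to a Search Console style label."""
    if indexed:
        return "Indexed"
    if crawled:
        # Google fetched the page, then decided not to index it.
        return "Crawled - Currently Not Indexed"
    if discovered:
        # Google knows the URL exists but hasn't fetched it yet.
        return "Discovered - Currently Not Indexed"
    return "Unknown to Google"


# Hypothetical URLs at different stages of the process.
pages = {
    "/services/": (True, True, True),
    "/blog/new-post/": (True, True, False),
    "/case-studies/latest/": (True, False, False),
}

for url, stages in pages.items():
    print(url, "->", coverage_status(*stages))
```

The order of the checks is the point: each status tells you the last stage a page reached, which in turn tells you where to start looking for the problem.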

Now, we all know that Google can be fairly selective when it comes to crawling and indexing content. There’s no shortage of Search Engine Journal, Moz or SEMRush articles documenting the fact that Google probably won’t crawl or index your content if it’s perceived as low quality, a duplicate (or near duplicate) of other web content, or hampered by technical issues.

Thing is, we’re not talking about general indexing or crawling problems here. We’re talking about a marked slowdown in the way good quality, rankable content on quick websites is handled by Google Search. 

So, what’s the explanation?


Why Would Google Struggle To Keep Up With New Content?

First things first, it’s worth remembering that Google’s not trying to index the entire internet. The boffins over at Search Engine Journal reckon that Google’s index contains hundreds of billions of web pages – and takes up over 100 billion gigabytes of storage (source: Search Engine Journal).

But that’s still less than 5% of the entire internet, which is supposed to contain approximately 64 zettabytes or 64 trillion gigabytes of data (source: Statista). The reality of the situation is that it is simply impractical for anyone to try crawling and indexing every last scrap of content.
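
For the sceptics, here's the back-of-an-envelope sum behind that comparison, using only the two figures quoted above. Treat it as a rough sanity check: it assumes decimal units (1 zettabyte = 10^12 gigabytes), and it compares the size of an index with the total volume of data, which aren't quite the same thing.

```python
INDEX_SIZE_GB = 100e9   # "over 100 billion gigabytes", as quoted above
INTERNET_SIZE_ZB = 64   # "approximately 64 zettabytes", as quoted above
GB_PER_ZB = 1e12        # 1 zettabyte = 10^21 bytes = 10^12 gigabytes

internet_size_gb = INTERNET_SIZE_ZB * GB_PER_ZB  # 64 trillion gigabytes
index_share = INDEX_SIZE_GB / internet_size_gb

print(f"Google's index as a share of the internet: {index_share:.2%}")
```

On those numbers the share comes out at a fraction of one percent, so "less than 5%" is, if anything, a generous way of putting it.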

As Google’s own Senior Search Analyst puts it, the world’s favourite search engine has to try to “find the balance between indexing as much content as possible, and indexing content that’s actually useful to search engine users” (source: Google Search Central).

Ask some pundits, and they’ll tell you that you can watch this struggle in real time. Kevin Indig, host of the (very good) Contrarian Marketing podcast, once posted a tremendously detailed article that mapped out the fluctuations in Google’s index before and after major algorithmic updates like Panda or Penguin (source: Growth Memo).

He found that Google’s index actually shrank following the Panda update, which goes some way towards demonstrating that Google’s less interested in the size of its collection, and more interested in the value individual pages provide for web users.

Still, the internet’s always been much larger than Google’s index. What we’re talking about here is a relatively new trend, marking a very definite shift in the way Google analyses useful and relevant content on relatively small sites. 

This sudden shift must be a response to specific stimuli, and we want to know exactly what’s going on behind the scenes because it could spell disaster for any business that’s gone all-in on content marketing – and can’t work out how to get their articles indexed in good time. 

Writing in Search Engine Land, Dan Taylor points out that the internet is growing at an exponential rate, and that growth has only surged following the release of generative ‘AI’ tools like ChatGPT (source: Search Engine Land).

That tallies with reports that Google is now being inundated with low-quality spam articles spun up by large language models that are freely available to the public (source: MIT Technology Review).

But it’d be both alarmist and unscientific to pin all the blame on AI (much as I’d love to. God I hate AI). 

See, there’s a sustainability angle to all of this too: Back in January 2022, Search Engine Journal announced that Google was toying with the idea of reducing its web page crawl rate in response to growing pressure to reduce the company’s carbon footprint (source: Search Engine Journal).

This might sound incongruous – bordering on ridiculous – but the logic’s actually fairly sound. Computing is an energy-intensive process and repeatedly churning through millions of URLs is a surefire way to waste resources.

Instead, Google could opt to halve or quarter the frequency with which it crawls or recrawls certain sites to significantly reduce the environmental impact of its indexing process.

This is all conjecture at the moment: We have an official statement of intent but no follow up statement declaring that Google have adjusted their web page crawl rate. Still, a slight reduction in crawl rate coupled with an exponential growth in the amount of spam content clamouring for Google’s attention would go some way towards explaining the trend we’ve noted here. 


How Can We Tackle The Slowdown?

Assuming you run a marketing team or business, your primary goal will be making sure that your own content isn’t languishing in purgatory for longer than it needs to. 

Yes, it’s nice to know that there’s a reason for the marked increase in ‘Discovered - Currently Not Indexed’ errors in your Search Console, and it’s probably equally nice to know that other people are struggling with tonnes of ‘Crawled - Currently Not Indexed’ errors too.

But priority number one is making sure that your team is equipped with the tools required to get your content indexed asap. After all, content that isn’t indexed can’t rank, and content that isn’t ranking can’t attract potential customers. 

I don’t want to get stuck into diagnosing and fixing the errors that eat up your crawl budget because that’s been done to death by people far smarter than me. But it is worth checking that you’re not currently plagued by these issues before you start trying to improve your chances of getting indexed.

A house built on shaky foundations, and all that.

Next comes the softer side: Trying to demonstrate that your content is valuable to Google. Obviously step one is making sure that you have a meaningful page title element and a reasonably compelling meta description. 

There’s no proof that a good meta description gets your content indexed faster, but there is proof that Google uses user-generated signals (interaction data) to work out what content it should index and rank (source: Search Engine Land). So it pays to nail this bit early in the process. 
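
If you'd rather not eyeball every page, a quick script can flag the ones missing these basics. Here's a minimal sketch using nothing but Python's standard library. The `HeadCheck` class and `check_head` function are our own illustrative names, and the sample HTML is made up; it only checks that the two elements exist, not whether they're any good.

```python
from html.parser import HTMLParser


class HeadCheck(HTMLParser):
    """Collects the <title> text and the meta description from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_head(html: str) -> list[str]:
    """Return a list of the basic head elements that are missing or empty."""
    parser = HeadCheck()
    parser.feed(html)
    problems = []
    if not parser.title.strip():
        problems.append("missing page title")
    if not parser.description:
        problems.append("missing meta description")
    return problems


# Made-up page with a title but no meta description.
sample = (
    "<html><head><title>Is Google Struggling To Index New Content?</title>"
    "</head><body><h1>Hello</h1></body></html>"
)
print(check_head(sample))
```

Whether a title is "meaningful" or a description "compelling" is still a job for a human, but this at least tells you which pages to look at first.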

After that, our best advice is:

  • Use clear, info-rich header tags to break up your content, make it easier to parse, and signpost value to Google
  • Make sure that your content is unique and genuinely useful. Thin waffle is easy to detect and Google makes a habit of ignoring it.
  • Remember that E-E-A-T is Google’s mantra now. All content must demonstrate:
    • Experience
    • Expertise
    • Authoritativeness
    • Trustworthiness
  • Flagging author info, update dates and other contextual data can help to demonstrate relevance
  • Rich media like videos add another element, and help to demonstrate that you’re adding value to a conversation
  • Bringing in external experts to add commentary can help if you’re really struggling to create something that’s meaningful and addresses a specific search query
  • You can always set yourself specific goals by looking at the content ranking at number one for your search, and working out how to outdo it.
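
One common way to flag author info and update dates is schema.org Article markup, embedded in the page as JSON-LD. The sketch below builds a bare-bones version. To be clear, this is an illustration rather than a guaranteed fix: there's no promise that structured data gets anything indexed faster, the `article_json_ld` helper is our own invention, and the headline, author and dates are placeholders.

```python
import json


def article_json_ld(headline: str, author_name: str,
                    date_published: str, date_modified: str) -> str:
    """Build schema.org Article markup as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 dates
        "dateModified": date_modified,
    }
    return json.dumps(data, indent=2)


# Placeholder values, purely for illustration.
markup = article_json_ld(
    headline="Is Google Struggling To Index New Content?",
    author_name="Jane Example",
    date_published="2023-06-01",
    date_modified="2023-09-15",
)

# The JSON-LD goes inside a script tag in the page's HTML.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Most content management systems and SEO plugins will generate something similar for you, so it's worth checking what your site already outputs before adding anything by hand.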

What About Pre-Existing Content?

If a lot of your historic content is already trapped in purgatory and hasn’t been indexed, we’d recommend picking a few problem pieces, and improving them using the blueprint above. 

After that, request reindexing in Google Search Console and wait to see what happens. If they end up indexed, you have a repeatable formula that you can use to get the rest of your historical content indexed asap.

Failing that, we’d recommend getting in touch. The nuances of indexing and the minute signals that can denote ‘quality’ content are so varied and intangible that there’s no real benefit to producing an exhaustive list here.

If fixing the basics doesn’t fix your problem, you probably need an expert to dig around under the hood and find out what’s going on. We’re always looking for more opportunities to geek out over indexing issues in GSC so don’t hesitate to reach out if you’re struggling to get your content crawled.

Alternatively, if you’re not fully comfortable reaching out to a brand-new agency on the basis of a single blog post, you’ll find some useful resources and further reading here: 


