
URL Structure, Parameters and Google Indexing [Case Study]
How to fix a website that returns 5 million pages when crawled?


A few weeks ago we were contacted by a client whose e-commerce website was having crawling and indexation issues – the crawl rate of the website was declining, lots of unnecessary pages were being indexed and, most importantly, they were unable to analyze their website with a crawling tool (Screaming Frog) to identify and resolve those issues. 

After getting the overall description of the situation, we decided to start planning the crawling of the website and asked for log files for later analysis. 

We realized that the bot was discovering millions of pages, while the client’s website only had around 5,000 product pages. Since it was an e-commerce website, we naturally suspected that URL parameters might be at fault. 

URL Parameters

First, let’s talk about URL parameters. According to Google, a URL parameter is “a way to pass information about a click through its URL”. In the example below, we know that we are looking at T-shirts that are blue and short-sleeved.

[Image: an example URL with parameters for blue, short-sleeved t-shirts]
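For illustration, such a URL could look something like the following (the parameter names are made up for this example, not taken from the client’s site):

https://www.example.com/t-shirts?color=blue&sleeve=short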

URL parameters are especially useful for tracking and for enhancing the user experience on the website by providing filtered and organized content. But as you may have already guessed, there is a catch! 

While certainly handy, URL parameters can also become a nightmare for SEOs. They can create duplicate content, waste the crawl budget, and even mess up the ranking signals.

So naturally, there are several SEO best practices website owners should follow to avoid potential URL parameter issues:

  • Remove URLs with blank values
[Image: a URL with a blank parameter value]
  • Avoid ID duplication in the URL
[Image: a URL with a duplicated ID parameter]
  • Keep the same order of keys in the URL 
  • Add canonical tags when applicable 
  • Add noindex tags when applicable 
  • Disallow URLs with unimportant parameters in the robots.txt file (examples below)
  • Add rules for URL parameters in the Search Console
[Screenshot: the URL Parameters tool in Search Console]

These will ensure that you are using URL parameters in accordance with your business goals instead of wasting that oh-so-essential crawl budget. 
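To illustrate the canonical, noindex and robots.txt points from the list above, here is roughly what they look like in practice (the paths and the sort parameter are made up for the example, not taken from the client’s site). The canonical tag goes in the head of a parameterized page and points to its clean version, the noindex tag keeps an unwanted page out of the index, and the robots.txt rules stop compliant bots from crawling URLs containing the parameter at all:

<link rel="canonical" href="https://www.example.com/t-shirts" />

<meta name="robots" content="noindex" />

User-agent: *
Disallow: /*?sort=
Disallow: /*&sort=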

Okay, enough with the introduction and back to the matter at hand – how to fix a 5,000-page website that returns millions of pages. 

After identifying that the issues were indeed caused by the URL parameters, we quickly analyzed the syntax of the URLs and came up with a list of 54 parameters that should be excluded from the crawl. We used Screaming Frog for this. 

We did this by going to Configuration > Exclude in Screaming Frog and adding the following patterns for each parameter, with id standing in for the parameter name: 

.*\?id.*

.*\&id.*

[Screenshot: the Exclude configuration in Screaming Frog with the parameter patterns]
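In other words, if two of the 54 parameters had been called sort and color (hypothetical names used only to illustrate the pattern), the exclude list would have contained:

.*\?sort.*
.*\&sort.*
.*\?color.*
.*\&color.*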

After running a new crawl (with the crawl limit set to 5,000,000 URLs), even with the excluded parameters, we still ended up with 5,000,000 URLs. (I know, I know.)

But! Luckily, the exclusion enabled us to see yet another issue with the URLs. Besides the parameters, a static URL was created whenever the user chose a brand on the shop page. Moreover, if several brands were chosen at the same time, a static URL including all of those brands was generated, and the order in which the brands were listed in the URL depended on the user’s selection. Here is what the URL looked like: https://www.example.com/t-shirts/brand_1-brand_2-brand_3

To confirm our suspicion, we updated the exclude list with the brand URLs. You can exclude brand URLs from your Screaming Frog crawl with rules like the following:

https://www.domain.com/.*brand_1.*

https://www.domain.com/.*brand_2.*

We were finally able to make Screaming Frog crawl the website as it was intended! *fireworks*

But it was too soon to celebrate. This is where we discovered some other issues with the URL structure, the most worrying of which was that the URLs contained lots of repeated words.

[Screenshot: Screaming Frog crawl results showing URLs with repeated words]

URL Structure

Getting the URL structure right is essential both for SEO and for a pleasant user experience. After all, the page URL is one of the first things the user sees, alongside the page title and meta description, on the search engine results page. 

While there are no fixed rules on how to structure your URLs, there are still some best practices worth following:

  • Use short keywords as topics: 

https://www.example.com/t-shirts 

  • Avoid unnecessary repetition of words (the example below shows the kind of repetition to avoid):

https://www.example.com/t-shirts/red-t-shirts/red-v-neck-t-shirts 

  • Build a hierarchy and structure and stick to it (see the example URL after this list):

 T-shirt > brand > v-neck, crewneck > color  

  • Avoid caps and pay attention to hash usage and URL length (again, the example below shows what to avoid):

https://www.example.com/t-SHIRTS/red-t_shirts/red-v-neck-t-shirts 
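Following the hierarchy sketched in the third point above, a product URL might look something like this (a purely illustrative example, not taken from the client’s site):

https://www.example.com/t-shirts/brand_1/v-neck/red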

Optimal Indexing

So, coming back to our website: after identifying the issues with the structure and parameters, it was time to identify the priority URLs that needed to be fixed. This is where the log files came into play! We got the most important URLs from the log file analysis and started with the optimizations. *more fireworks*

Since Google uses mobile-first indexing, we first prioritized the URLs according to the number of Googlebot Smartphone events recorded for them in the logs. The brand URLs had the highest number of events.

[Screenshot: URLs with the most Googlebot Smartphone events]

When it came to Googlebot Desktop and Bingbot, URLs with parameters were taking up most of the crawl budget.

[Screenshot: URLs with the most Googlebot Desktop and Bingbot events]
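For reference, here is a minimal sketch of how such a per-bot breakdown can be pulled out of raw access logs, written in Python. It assumes a standard combined log format and a file called access.log; the user-agent matching is a rough illustration of the approach, not the exact script used in this project.

import re
from collections import Counter

# A minimal sketch of a per-user-agent log breakdown, assuming a combined-format
# access log named access.log. The file name and the user-agent matching rules
# are assumptions for illustration only.

LOG_FILE = "access.log"

# Matches the request line, e.g. "GET /t-shirts?color=blue HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD)\s+(\S+)\s+HTTP/[^"]*"')

def classify_bot(line):
    # Rough classification based on well-known user-agent substrings.
    if "bingbot" in line:
        return "Bingbot"
    if "Googlebot" in line:
        # Googlebot Smartphone announces itself with a mobile user agent.
        return "Googlebot Smartphone" if "Mobile" in line else "Googlebot Desktop"
    return None

counters = {}

with open(LOG_FILE, encoding="utf-8", errors="ignore") as log:
    for line in log:
        bot = classify_bot(line)
        match = REQUEST_RE.search(line)
        if bot and match:
            counters.setdefault(bot, Counter())[match.group(1)] += 1

# Print the ten most requested URLs for each bot.
for bot, counter in counters.items():
    print("\n" + bot)
    for url, hits in counter.most_common(10):
        print(f"{hits:>8}  {url}")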

Here is a summary of how we fixed the URL parameter and URL structure issues. 

TL;DR Solution Summary

Brand URLs

URL Parameters

  • We added rules in the Search Console for all existing URL parameters.
  • We added a noindex tag to the pages that were already indexed but shouldn’t have been. 
  • Some of the parameters were disallowed through the robots.txt file. 
  • We fixed the parameter order and structure.

URL Structure

  • A new hierarchy for the URL structure was developed and implemented where necessary.

Conclusion

Bots discover your website through links and URLs, so their structure can easily make or break a website. The best way to approach URL structure is to develop a strategy for it even before developing the website. Nonetheless, even if you don’t have a strategy yet, it is never too late to start. Although time-consuming, it is always possible to track and fix URL parameter and structure issues.