How to Mimic Googlebot with Screaming Frog

Wouldn’t it be great to know exactly how Googlebot is crawling your website? Unless you have the keys to the kingdom, you won’t get perfect information. But we can get close!

Screaming Frog has nearly limitless configurations for just about any SEO use case you can imagine. This article presents a method to help you understand how Googlebot interacts with your initial HTML.

I’ve found this crawl configuration to be incredibly useful for identifying crawl inefficiencies that might sneakily be holding back your SEO performance. Read on to set up this crawl for yourself.

Instructions

1. Configure the crawl settings

From the top menu, navigate to Configuration > Crawl Config > Spider > Crawl. Configure the crawl settings exactly as you see them in the screenshot below.

  • Optionally, you can deselect “Images,” “CSS,” “JavaScript,” and/or “SWF” to speed up your crawl.

2. Provide your XML sitemap(s)

While in the Crawl section, scroll down to “XML Sitemaps” and provide the XML sitemaps that represent the URLs you want Google to crawl and index.

  • If you know your XML sitemaps are not set up properly or don’t contain all of the URLs you want crawled and indexed, I recommend addressing this issue first.
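Before handing a sitemap to Screaming Frog, it can help to list what it actually contains. Here's a minimal sketch using only Python's standard library. It assumes a plain urlset sitemap (not a sitemap index), and the function name and example URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> values of a urlset sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, for illustration only.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/shoes/</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in sitemap_urls(EXAMPLE_SITEMAP):
        print(url)
```

If the list is shorter than you expect, or contains URLs you don't want indexed, that's a sign to fix the sitemap before running the crawl.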

3. Follow redirect chains

Go to the Advanced settings (Configuration > Crawl Config > Spider > Advanced) and opt in to “Always Follow Redirects” (to identify redirect chains).
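To make the idea of a redirect chain concrete, here's a small sketch that follows a URL through a table of known redirects and flags loops or overly long chains. This is not how Screaming Frog implements the setting; the function name, the status labels, and the 10-hop limit are assumptions chosen for illustration.

```python
def redirect_chain(
    start: str, redirects: dict[str, str], max_hops: int = 10
) -> tuple[list[str], str]:
    """Follow `start` through `redirects` (source URL -> target URL).

    Returns the URLs visited plus a status: "ok", "loop", or "too_long".
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        if len(chain) - 1 >= max_hops:
            return chain, "too_long"  # gave up before reaching a final URL
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, "loop"  # the chain points back at an earlier URL
        seen.add(current)
    return chain, "ok"


if __name__ == "__main__":
    # Hypothetical redirects: HTTP -> HTTPS -> www, a two-hop chain.
    redirects = {
        "http://example.com/": "https://example.com/",
        "https://example.com/": "https://www.example.com/",
    }
    chain, status = redirect_chain("http://example.com/", redirects)
    print(status, " -> ".join(chain))
```

Any chain with more than one hop is worth a look, since each extra hop is another request a crawler has to make before it reaches the final page.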

4. Opt in for structured data extraction

Go to the Extraction settings (Configuration > Crawl Config > Spider > Extraction) and opt in for structured data.

5. Set Googlebot as the user-agent

Navigate to Configuration > Crawl Config > User-agent and select “Googlebot (Smartphone).”
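If you want to spot-check a single page outside of Screaming Frog, the sketch below requests the raw HTML while identifying as Googlebot Smartphone. The user-agent string follows the pattern Google documents for its smartphone crawler, where "W.X.Y.Z" is Google's own placeholder for the Chrome version. The function names are hypothetical, and keep in mind that a site that verifies Googlebot by IP or reverse DNS may not respond to this request the way it responds to the real crawler.

```python
from urllib.request import Request, urlopen

# User-agent pattern for Googlebot Smartphone; "W.X.Y.Z" stands in for the
# Chrome version, which changes over time.
GOOGLEBOT_SMARTPHONE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def googlebot_request(url: str) -> Request:
    """Build (but do not send) a request that identifies as Googlebot Smartphone."""
    return Request(url, headers={"User-Agent": GOOGLEBOT_SMARTPHONE_UA})


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Fetch the initial HTML of `url` without executing any JavaScript."""
    with urlopen(googlebot_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_raw_html on one of your own URLs returns the unrendered HTML, which is the same starting point this crawl configuration is built around.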

6. Start your crawl

From the top menu, select Mode > Spider. Enter the website's homepage and start the crawl.

7. Run a “Crawl Analysis”

When the crawl is finished, navigate to Crawl Analysis > Start.

What to Pay Attention to

Here’s a non-exhaustive list of potential issues to be on the lookout for:

Basic crawl inefficiencies

  • Both HTTPS & HTTP URLs are being crawled
  • Both ‘www’ and non-‘www’ URLs are being crawled
  • There are excessive internal redirects and page errors
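The first two checks are easy to automate once you have a list of crawled URLs. The sketch below groups URLs by bare domain and reports any domain that was crawled under more than one protocol or hostname variant. The function name is hypothetical and the input is a plain list of URLs, not a specific Screaming Frog export format.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_variants(urls: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Report domains crawled under more than one (scheme, hostname) variant."""
    variants = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        # Treat "www.example.com" and "example.com" as the same bare domain.
        bare = host[4:] if host.startswith("www.") else host
        variants[bare].add((parts.scheme, host))
    return {
        domain: sorted(seen) for domain, seen in variants.items() if len(seen) > 1
    }


if __name__ == "__main__":
    # Hypothetical crawl output.
    crawled = [
        "https://www.example.com/",
        "http://www.example.com/about",
        "https://example.com/contact",
    ]
    for domain, seen in host_variants(crawled).items():
        print(domain, seen)
```

An empty result means every URL in the list used a single protocol and hostname, which is what you want to see.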

Advanced crawl inefficiencies

  • There are excessive non-indexable URLs with a 200 status code being crawled
    • Noindexed URLs
    • Canonicalized URLs
  • There are seemingly limitless combinations of URL parameters being crawled
  • Redirect chains are triggering excessive crawling or even time-outs
  • Internal links are not present in the raw HTML
  • Paginated links are not crawlable
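For the URL parameter issue in particular, a quick count can show where the combinations are piling up. This sketch maps each path to the distinct sets of parameter names crawled on it. The function name and sample URLs are hypothetical, and it deliberately ignores parameter values so that it highlights combinations and not individual pages.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def parameter_combinations(urls: list[str]) -> dict[str, set[tuple[str, ...]]]:
    """Map each path to the distinct combinations of query parameter names seen."""
    combos = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
        if names:
            # Sort so that ?a=1&b=2 and ?b=2&a=1 count as the same combination.
            combos[parts.path].add(tuple(sorted(names)))
    return dict(combos)


if __name__ == "__main__":
    # Hypothetical crawl output.
    crawled = [
        "https://www.example.com/shoes/?color=red",
        "https://www.example.com/shoes/?size=9&color=red",
        "https://www.example.com/shoes/?color=blue",
    ]
    for path, seen in parameter_combinations(crawled).items():
        print(path, len(seen), sorted(seen))
```

A path with dozens of combinations is a good candidate for a closer look at faceted navigation, internal linking, or canonical tags.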

Poor XML sitemap coverage

  • The total of indexable URLs crawled far exceeds the size of the XML sitemap
  • There are URLs not included in the XML sitemap that should be
  • There are non-indexable URLs in the XML sitemap
  • There are orphan URLs
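These coverage checks boil down to comparing sets of URLs. The sketch below assumes you have three plain lists: indexable URLs found through internal links, non-indexable URLs found the same way, and the URLs in your XML sitemap. The function and key names are hypothetical, and "orphan candidates" here simply means sitemap URLs the link-based crawl never reached.

```python
def sitemap_coverage(
    crawled_indexable: list[str],
    crawled_non_indexable: list[str],
    sitemap: list[str],
) -> dict[str, list[str]]:
    """Compare link-discovered URLs against the XML sitemap."""
    indexable = set(crawled_indexable)
    non_indexable = set(crawled_non_indexable)
    in_sitemap = set(sitemap)
    return {
        # Indexable pages the crawl found that the sitemap does not list.
        "missing_from_sitemap": sorted(indexable - in_sitemap),
        # Sitemap entries that were crawled but are not indexable.
        "non_indexable_in_sitemap": sorted(non_indexable & in_sitemap),
        # Sitemap entries that no crawled internal link led to.
        "orphan_candidates": sorted(in_sitemap - indexable - non_indexable),
    }


if __name__ == "__main__":
    # Hypothetical URL lists.
    report = sitemap_coverage(
        crawled_indexable=["https://www.example.com/", "https://www.example.com/a"],
        crawled_non_indexable=["https://www.example.com/b"],
        sitemap=[
            "https://www.example.com/",
            "https://www.example.com/b",
            "https://www.example.com/c",
        ],
    )
    for issue, urls in report.items():
        print(issue, urls)
```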

Structured data issues

  • Missing structured data
  • Parse errors
  • Validation errors
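To see what a parse error looks like in practice, here's a rough sketch that pulls JSON-LD blocks out of raw HTML and tries to parse each one. It only covers JSON-LD (not Microdata or RDFa) and only catches syntax problems, so it is no substitute for the schema validation Screaming Frog performs. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_json_ld(html: str) -> tuple[list, list[str]]:
    """Return (parsed JSON-LD objects, parse error messages) for raw HTML."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    parsed, errors = [], []
    for raw in collector.blocks:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(str(exc))
    return parsed, errors
```

A page that returns no parsed objects and no errors has no JSON-LD in its raw HTML, which may mean the markup is missing or is only injected by JavaScript.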

Speed Up Your Analysis with this Google Sheets Dashboard

It isn't always easy to spot potential issues in Screaming Frog's interface. That's why I created this dashboard in Google Sheets. Import your crawl data, and the pre-built tables and charts will speed up your analysis.

First, make a copy of the spreadsheet. Then, read the "Instructions" tab to import your data correctly.

Get Your Crawl On

Replicating Googlebot’s crawling habits with Screaming Frog can reveal hidden SEO inefficiencies that are holding back your performance. If you need help transforming the issues you identified into actionable recommendations, reach out to the Uproer team!

By the way, if you found this Screaming Frog article helpful, check out our popular guide on custom extraction.

Griffin Roer

Griffin has spent more than a decade in the search engine marketing industry. After years of working as an SEO consultant to some of the country’s largest retail and tech brands, Griffin pursued his entrepreneurial calling and founded Uproer in May of 2017. He's also served as a board member for the Minnesota Search Engine Marketing Association.
