The crawl budget is the amount of time and resources a search engine allocates to crawling a particular website. In other words, it's the maximum number of pages a search engine will crawl on your site within a given timeframe.
The crawl budget can vary for different search engines (or crawlers).
Google states that you shouldn't worry about the crawl budget unless:
- You run a very large site (1 million+ unique pages), or
- You run a medium-sized site (10,000+ unique pages) with very rapidly (daily) changing content
Each website gets a different crawl budget based on these two factors:
- Crawl capacity limit: how much crawling your server can handle without slowing down or returning errors
- Crawl demand: how much Google wants to crawl your site, based on its popularity and how often its content changes
The crawl budget is important because it affects how many pages Googlebot can crawl on your site. It also influences how often Googlebot can recrawl your pages to keep its index up to date.
Google has enormous resources, yet it can't crawl (and regularly recrawl) every page on the internet. As a result, Google allocates a crawl budget to each website.
That's why you want to ensure your crawl budget isn't wasted on crawling your site's unimportant pages.
That said, you don't need to worry about the crawl budget if you're running a standard blog or small website.
Here’s how you can optimize your site’s crawl budget.
Your server response time and page loading speed directly affect crawling. It works something like this:
When Googlebot crawls your site, it downloads your resources first and then processes them. If your server responds quickly to Google's crawl requests, Googlebot can crawl more pages on your website.
So use a fast, reliable web hosting service and a content delivery network (CDN) to improve your server's initial response time.
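To get a quick feel for your server's response time, you can spot-check a few pages with a short script. Here's a minimal Python sketch using the `requests` package (the URL is a placeholder; swap in pages from your own site):

```python
import requests  # pip install requests

# Placeholder URL -- replace with a page from your own site.
URL = "https://example.com/"

response = requests.get(URL, timeout=10)

# response.elapsed measures the time from sending the request until the
# response headers finish arriving -- a rough proxy for server response time.
print(f"{URL} -> {response.status_code} in {response.elapsed.total_seconds():.3f}s")
```

For a fuller picture, tools like Google's PageSpeed Insights also report server response time alongside page loading metrics.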
At the same time, decrease your page loading times by:
- Compressing your images
- Minifying your CSS, JavaScript, and HTML
- Enabling caching
- Lazy-loading non-critical resources
- Removing unnecessary third-party scripts
The number of links pointing to a page tells Google how important that page is. Googlebot prioritizes crawling pages with more backlinks and internal links.
So you can increase your crawl budget by adding more external and internal links to your pages. Getting backlinks from external sites takes time and isn't (completely) in your control, so you can start with the easier option: internal linking.
You can get internal linking suggestions by auditing your website with our Site Audit tool.
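If you'd rather script a quick check yourself, here's a minimal Python sketch that lists the internal links found on a single page (the URL is a placeholder; it assumes the `requests` and `beautifulsoup4` packages):

```python
from urllib.parse import urljoin, urlparse

import requests                # pip install requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

PAGE = "https://example.com/blog/some-post/"  # placeholder URL
SITE = urlparse(PAGE).netloc

html = requests.get(PAGE, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Collect every link on the page, resolve relative URLs, and keep
# only those pointing to the same domain (i.e., internal links).
internal_links = {
    urljoin(PAGE, a["href"])
    for a in soup.find_all("a", href=True)
    if urlparse(urljoin(PAGE, a["href"])).netloc == SITE
}

print(f"{len(internal_links)} internal links found on {PAGE}")
for link in sorted(internal_links):
    print(link)
```

Aggregating these results across your site would show which pages receive few internal links and could use more.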
Too many broken internal links (404 or 410 response codes) and redirected URLs (3xx) can waste your site's crawl budget. Although these pages get low crawl priority if they've remained unchanged for a while, it's better to fix them, both to optimize your crawl budget and for overall site maintenance.
You can easily find broken and redirecting URLs on your site in the Internal pages report in Site Audit or with our free Webmaster Tools.
Once you find the broken internal links, you can reinstate the page at the same URL or redirect the URL to another relevant page.
For the redirects, check whether there are unnecessary redirects and redirect chains, and replace them with direct links.
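As a rough illustration, here's a Python sketch that flags broken URLs and counts redirect hops for a list of internal URLs (the URLs are placeholders; in practice you'd feed it the full URL list from your crawl):

```python
import requests  # pip install requests

# Placeholder URLs -- in practice, use the URL list from your site crawl.
urls = [
    "https://example.com/old-page/",
    "https://example.com/about/",
]

for url in urls:
    # allow_redirects=False exposes the raw status code of each URL
    # instead of silently following redirects to the final target.
    resp = requests.head(url, allow_redirects=False, timeout=10)
    if resp.status_code in (404, 410):
        print(f"BROKEN   {url} ({resp.status_code})")
    elif 300 <= resp.status_code < 400:
        # Follow the chain to see where it ends and how many hops it takes.
        final = requests.head(url, allow_redirects=True, timeout=10)
        print(f"REDIRECT {url} -> {final.url} ({len(final.history)} hops)")
    else:
        print(f"OK       {url} ({resp.status_code})")
```

Any URL reported with more than one hop is a redirect chain worth replacing with a direct link.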
Another way to get your pages crawled faster is by using Google’s Indexing API. It lets you notify Google directly whenever you add, remove, or update pages on your site.
However, the Indexing API is currently available only for use cases like live videos and job postings. So if it’s applicable to your site, you can use it to keep your URLs updated in Google’s index and search results.
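For illustration, here's a minimal Python sketch of an Indexing API call using the `google-auth` package. It assumes you've set up a Google Cloud service account with access to the Indexing API and added it as an owner of your site in Search Console; the key file path and URL below are placeholders:

```python
from google.oauth2 import service_account  # pip install google-auth
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

# Placeholder path to your service account's JSON key file.
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
session = AuthorizedSession(credentials)

# Use "URL_UPDATED" for new or changed pages, "URL_DELETED" for removed ones.
response = session.post(
    ENDPOINT,
    json={
        "url": "https://example.com/jobs/software-engineer/",  # placeholder
        "type": "URL_UPDATED",
    },
)
print(response.status_code, response.json())
```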
No, Googlebot doesn’t respect the crawl-delay settings applied in a robots.txt file.
You should care about the crawl budget only if you run a very large site (more than 1 million pages) or a medium-sized site whose content changes very frequently (daily). Most sites don't need to worry about the crawl budget.
You won’t find the exact number for crawl budget anywhere. But you can check the overview of Google crawl activity in the Crawl Stats report in Google Search Console.