Log File Analysis

What is log file analysis?

Log file analysis involves reviewing the data stored by a website’s servers in the form of log files, which record every request made to the site. This process is an essential part of technical SEO.

In SEO, log file analysis provides valuable insights into how Googlebot and other web crawlers interact with a website. By examining log files, you can identify problematic pages, understand the crawl budget, and gain other critical information related to technical SEO.

To better understand log file analysis, it’s important to first know what log files are. These records are created by the web server and contain data about each request made to the site, including the IP address of the client making the request, the request method, the user agent, a timestamp, the URL path of the requested resource, and the HTTP status code returned.
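
The exact layout depends on the server, but a single entry in the widely used combined log format (the default access-log format for Apache and NGINX) looks roughly like this; the IP address, path, and user agent below are made up for illustration:

    66.249.66.1 - - [12/Mar/2024:06:25:13 +0000] "GET /blog/seo-basics/ HTTP/1.1" 200 5316 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"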

Log files contain a wealth of information, but they are typically stored for a limited time, depending on the website’s traffic and data accumulation rate.

Why is log file analysis important?

Log file analysis is critical in technical SEO because it provides valuable insights into how Googlebot and other search engine crawlers interact with your website. By examining log files, you can track:

  • How often Google crawls your website
  • Which pages are crawled most often, which are crawled rarely, and whether your website’s most important pages are being crawled at all
  • Whether problematic or irrelevant pages are wasting the resources search engines allocate to crawling your site, also known as your crawl budget
  • The specific HTTP status codes for each page on your website
  • Sudden changes, like significant increases or drops, in crawler activity
  • Unintentional orphan URLs, i.e., pages with no incoming internal links, which search engines may struggle to discover, crawl, and index

Log file analysis provides answers to important questions related to search engine crawling behaviors and helps you make informed decisions about website optimization. By understanding which content is being crawled and how often, you can improve your website’s visibility and performance in search engine results.

How to do a log file analysis

You’ll find a high-level overview of the steps needed to complete a log file analysis below. 

1. Access the log files

Log files are kept on the server, so you will need access to download a copy. The most common way of accessing the server is via an FTP client - such as FileZilla, which is free and open source - but you can also use the file manager in your server control panel.
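
As a rough illustration of the FTP route, here is a minimal Python sketch that downloads an access log. The hostname, credentials, and file paths are placeholders, and where the logs actually live depends on your hosting setup:

    from ftplib import FTP

    # Placeholder connection details - replace with your own values
    HOST = "ftp.example.com"
    USER = "your-username"
    PASSWORD = "your-password"
    REMOTE_LOG = "/logs/access.log"   # location varies by hosting provider
    LOCAL_COPY = "access.log"

    ftp = FTP(HOST)
    ftp.login(user=USER, passwd=PASSWORD)

    # Download the remote log file to the local machine
    with open(LOCAL_COPY, "wb") as f:
        ftp.retrbinary(f"RETR {REMOTE_LOG}", f.write)

    ftp.quit()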

The actual steps you will have to take to access the log files depend on the web hosting solution you are using. 

Keep in mind that there are certain issues you might encounter when trying to access log files: 

  • The log data may be split across several servers, meaning you would have to compile the logs from each of them.
  • Log files contain users’ IP addresses, which are considered personally identifiable information, so privacy and compliance requirements may mean you have to remove or anonymize them (see the sketch after this list).
  • The server may be configured to store log data for only a few days, which makes the logs much less useful for trend analysis.
  • The files may be stored in formats your analysis tools don’t support, so they will need to be parsed before the analysis.
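
For the privacy point above, a minimal sketch of stripping IP addresses from a raw access log before handing it over for analysis could look like this. It assumes a combined-format file named access.log and only covers IPv4; adjust it to your own format:

    import re

    # Matches IPv4 addresses; extend the pattern if your logs also contain IPv6
    IP_PATTERN = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

    with open("access.log") as src, open("access_anonymized.log", "w") as dst:
        for line in src:
            # Replace every IP address so no personally identifiable data remains
            dst.write(IP_PATTERN.sub("0.0.0.0", line))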

2. Export and parse log files

Once connected to the server, you can download the log files covering the period you want to analyze and narrow them down to what you’re actually interested in - most likely the requests made by search engine bots. Do note that you may have to parse the log data and convert it into a consistent format before proceeding to the next step.
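
To make that concrete, here is a sketch of what parsing might involve: it reads a combined-format access log, extracts the fields mentioned earlier, and keeps only requests whose user agent identifies as Googlebot. The file names and regex are assumptions - adjust them to your server’s log format:

    import csv
    import re

    # Rough pattern for the combined log format:
    # ip - - [timestamp] "METHOD path HTTP/x.x" status bytes "referer" "user agent"
    LOG_PATTERN = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
        r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<user_agent>[^"]*)"'
    )

    with open("access.log") as src, open("googlebot_hits.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["ip", "timestamp", "method", "path", "status", "user_agent"])
        for line in src:
            match = LOG_PATTERN.match(line)
            if match and "Googlebot" in match.group("user_agent"):
                writer.writerow(match.group("ip", "timestamp", "method",
                                            "path", "status", "user_agent"))

Keep in mind that the user agent string can be spoofed, so a thorough analysis would also verify that the requests genuinely come from Google, for example via reverse DNS lookups.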

3. Analyze the log files

You could simply import the data into Google Sheets. However, the volume adds up quickly: even if you filter for requests from Googlebot within a limited time frame, you will likely have to comb through tens, if not hundreds, of thousands of rows of data.

The more efficient - and most certainly less time-consuming - option would be to use specialized software designed to do the manual work for you.
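
To give a sense of what that manual work looks like, here is a small sketch that aggregates the output of the hypothetical parsing script above (googlebot_hits.csv), counting Googlebot hits and error responses per URL:

    import csv
    from collections import Counter

    hits_per_path = Counter()
    errors_per_path = Counter()

    with open("googlebot_hits.csv") as f:
        for row in csv.DictReader(f):
            hits_per_path[row["path"]] += 1
            # Treat 4xx and 5xx responses as errors
            if row["status"].startswith(("4", "5")):
                errors_per_path[row["path"]] += 1

    print("Most crawled URLs:")
    for path, count in hits_per_path.most_common(10):
        print(f"{count:>6}  {path}")

    print("\nURLs returning errors to Googlebot:")
    for path, count in errors_per_path.most_common(10):
        print(f"{count:>6}  {path}")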

Here are a few examples of tools you can use for log file analysis: 

  • Logz.io
  • Splunk
  • Screaming Frog Log File Analyser
  • ELK Stack

You can also crawl your website with Ahrefs’ Site Audit and combine that data with the log file data. Bringing the two sources together - your website’s traffic, crawl depth, indexability, internal links, and status codes alongside the actual crawler hits from the logs - will provide much deeper insights.
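
As a simple illustration of combining the two sources, the sketch below flags pages that appear in a crawl export but were never requested by Googlebot according to the logs. The file name site_crawl_export.csv and its url column are assumptions standing in for whatever export your crawler produces:

    import csv
    from collections import Counter
    from urllib.parse import urlparse

    # Googlebot hits per path, taken from the parsed log data
    log_hits = Counter()
    with open("googlebot_hits.csv") as f:
        for row in csv.DictReader(f):
            log_hits[row["path"]] += 1

    # Hypothetical crawl export with a "url" column
    print("Pages found in the crawl that Googlebot never requested:")
    with open("site_crawl_export.csv") as f:
        for row in csv.DictReader(f):
            path = urlparse(row["url"]).path
            if log_hits[path] == 0:
                print(path)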

Here are some things to pay attention to as you go over your data: 

  • Examine status codes and identify HTTP errors (URLs that return 4xx and 5xx status codes, such as 404 Not Found and 410 Gone)
  • Take note of potential crawl budget wastage (due to crawling non-indexable URLs) 
  • Check which search engine bots crawl your website most frequently 
  • Monitor crawling over time, and watch for sudden increases or drops in crawling activity on your website
  • Look for orphan pages (pages with no incoming internal links, which search engines may struggle to discover and index) - a minimal sketch of one way to check for them follows below
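
Following on from the last bullet, one way to surface orphan-page candidates - under the same assumptions about file and column names as above - is to flag URLs that Googlebot requested but that never showed up in your site crawl:

    import csv
    from urllib.parse import urlparse

    # Paths discovered by crawling the internal link structure (hypothetical export)
    crawled_paths = set()
    with open("site_crawl_export.csv") as f:
        for row in csv.DictReader(f):
            crawled_paths.add(urlparse(row["url"]).path)

    # Paths Googlebot requested according to the logs
    logged_paths = set()
    with open("googlebot_hits.csv") as f:
        for row in csv.DictReader(f):
            logged_paths.add(row["path"])

    # URLs hit by Googlebot but absent from the internal link graph: orphan candidates
    for path in sorted(logged_paths - crawled_paths):
        print(path)

Any URL flagged this way is only a candidate - it may be linked from a sitemap or excluded on purpose - so review the list before acting on it.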