The X-Robots-Tag is an optional component of the HTTP response header of a URL. Much like the meta robots tag, it informs the search engines on how to crawl and index a web page.
However, there are some crucial differences between the two:
Unlike the robots meta tag (which is reserved for HTML pages), the X-Robots-Tags are used in the HTTP protocol response headers and can, therefore, be used for non-HTML files - such as images, text files, and PDF documents, too.
Here’s what the HTTP response with the X-Robots-Tag may look like:
HTTP/1.1 200 OK
Date: Tue, 15 Nov 2022 11:38:17 GMT
Content-encoding: gzip
(…)
X-Robots-Tag: noindex
(…)
The X-Robots-Tag may specify a user agent (crawler) and can use a comma-separated list of directives.
Example:
HTTP/1.1 200 OK
Date: Tue, 15 Nov 2022 11:38:17 GMT
Content-encoding: gzip
(…)
X-Robots-Tag: googlebot: noarchive, nofollow
(…)
Google first announced that it included support for the X-Robots-Tag directive in 2007 and made it clear that any directive used as a robots meta tag could also be used as an X-Robots-Tag.
X-Robots-Tag is important because it enables the use of regular expressions, executing crawler directives on non-HTML files, and applying parameters at a global, site-wide level. As such, it is a more flexible choice compared to the meta robots tag.
It’s worth noting that using an X-Robots-Tag generally isn’t as straightforward as using the meta robots tags - but since it allows you to direct search engines on how to index and crawl other file types, there are instances where it makes more sense to employ the X-Robots-Tag.
Here’s an example:
The most common way to instruct Google’s crawlers not to crawl or index page content is to use the meta robots tag found in the HTML code. But, when you want to instruct Google not to index specific text documents and other non-HTML files, an X-Robots-Tag is your only option.
In essence, X-Robots-Tags are a better choice if:
As mentioned in Google’s specifications, any directive that’s utilized in a robots meta tag can be employed as an X-Robots-Tag, as well.
The complete list of valid indexing and/or serving directives that are recognized by Google and can be used with the X-Robots-Tag is relatively long - but the most commonly used ones are:
You can find the full list of directives accepted by Google here.
You can include the X-Robots-Tag in a website’s HTTP responses using the configuration files of the website’s server.
On Apache, you can do so through the .htaccess
file or httpd.conf file - and on NGINX, it can be done in the site’s .conf
file.
Furthermore, you can combine several X-Robots-Tag directives in a comma-separated list. Here is an example of a combination of directives that will instruct Googlebot not to index URLs with a “.pdf” extension or follow any links on them:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
Adding the snippet of code shown above to the site’s root directory means that directive applies to any and all PDF files found on the website.
Also, when using an X-Robots-Tag in HTTP responses, you can appoint different instructions for different crawlers by specifying the crawler’s name.
If the bot (crawler) name isn’t specified, the directive applies to all crawlers by default. However, when a particular robot is specified - in this example, Googlebot - then the command will look like this:
Header set X-Robots-Tag "googlebot: noindex, nofollow"
There’s one crucial thing to remember when using the X-Robots-Tag:
Googlebot - or any other “good” web crawler - will only be able to read the X-Robots-Tag when it is able to access the URL. If you’ve disallowed crawling of a particular URL through the robots.txt file, then these indexing and/or serving directives will be ignored.
Unlike the robots meta tag, the X-Robots-Tag cannot be found in the HTML code of a particular page. It’s an element of the HTTP response header - and, as such, can be trickier to read in the browser.
Here’s how you can view the HTTP response headers in Google Chrome:
Our SEO Toolbar is a free browser extension available for Chrome and Firefox that simplifies that process. If present, the X-Robots-Tag of a certain URL will be displayed in the “Indexability” section.