Everything you need to know about the X-Robots-Tag HTTP header

Search engine optimization, in its most basic sense, relies on one thing above all else: search engine spiders crawling and indexing your site.

But almost every website has pages you don’t want included in that crawl.

For example, do you really want your privacy policy or internal search pages to appear in Google results?

In the best-case scenario, these do nothing to actively drive traffic to your site, and in the worst-case scenario, they can divert traffic from your most important pages.

Fortunately, Google allows webmasters to tell search engine bots which pages and content to crawl and what to ignore. There are several ways to do this, the most common of which is to use a robots.txt file or the robots meta tag.

We have an excellent and detailed explanation of the ins and outs of robots.txt, which you should definitely read.

But at a high level, it’s a plain text file that lives in the root of your website and follows the Robots Exclusion Protocol (REP).

The robots.txt file gives crawlers instructions about the site as a whole, while the robots meta tags include directions to specific pages.

Some of the robots meta tag directives you might use include index, which tells search engines to add the page to their index; noindex, which tells them not to add the page to the index or include it in search results; follow, which instructs them to follow the links on the page; nofollow, which tells them not to follow those links; and a whole host of others.
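As a quick illustration, a page-level robots meta tag combining two of those directives sits in the page’s head section and looks like this:

```html
<!-- Tells search engines not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```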

The robots.txt file and meta robots tags are both useful tools to keep in your toolbox, but there’s another way to tell search engine bots to noindex or nofollow: the X-Robots-Tag.

What is the X-Robots-Tag?

The X-Robots-Tag is another way to control how spiders crawl and index your webpages. Sent as part of the HTTP header response for a URL, it can control indexing for an entire page, as well as for specific elements on that page.

And while using meta robots tags is fairly simple, X-Robots-Tag is a bit more complicated.

But this of course raises the question:

When should you use X-Robots-Tag?

According to Google, any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag.

While you can set indexing-related directives with both the robots meta tag and the X-Robots-Tag, there are certain situations where you would want to use the X-Robots-Tag – the two most common being when:

  • You want to control how your non-HTML files are crawled and indexed.
  • You want to provide directions at the site level instead of at the page level.

For example, if you want to block a specific image or video from being crawled, the HTTP response method makes this easy.

The X-Robots-Tag header is also useful because it allows you to combine multiple directives within an HTTP response, specified as a comma-separated list.

Maybe you don’t want to cache a particular page and want it to be unavailable after a certain date. You can use a combination of the “noarchive” and “unavailable_after” tags to instruct search engine bots to follow these instructions.
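In an Apache configuration, that combination might look something like the below (the file pattern and the date are illustrative; Google accepts several widely used date formats for unavailable_after):

```apache
# Tell bots not to cache matching pages, and to drop them after the given date
<Files ~ "\.html$">
  Header set X-Robots-Tag "noarchive, unavailable_after: 25 Jun 2023 15:00:00 GMT"
</Files>
```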

The main strength of the X-Robots-Tag is that it is more flexible than the meta robots tag.

The benefit of using the X-Robots-Tag with HTTP responses is that it allows you to use regular expressions to apply crawl directives to non-HTML content, as well as apply parameters on a larger, global level.
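For instance, a single Apache rule with a regular expression can apply a directive across several non-HTML file types at once (the extension list here is just an example):

```apache
# One rule covers Word, PDF, and Excel files via a regular expression
<FilesMatch "\.(doc|docx|pdf|xls|xlsx)$">
  Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>
```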

To help you understand the difference between these directives, it’s helpful to categorize them by type. That is, are they crawler directives or indexer directives?

Here is an easy-to-reference cheat sheet:

Crawler directives:

  • robots.txt – Uses the user-agent, allow, disallow, and sitemap directives to specify where on a site search engine bots are allowed and not allowed to crawl.

Indexer directives:

  • Meta robots tag – Allows you to specify which pages of a site search engines should or should not show in search results.

  • nofollow – Allows you to specify links that should not pass on authority or PageRank.

  • X-Robots-Tag – Allows you to control how specified file types are indexed.
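For reference, a minimal robots.txt using those crawler directives might look like this (the paths and sitemap URL are illustrative):

```
User-agent: *
Disallow: /search/
Allow: /search/help
Sitemap: https://www.example.com/sitemap.xml
```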

Where do you put the X-Robots-Tag?

Let’s say you want to block specific file types. The ideal approach is to add the X-Robots-Tag to the site’s HTTP responses through the Apache server configuration or a .htaccess file.

Real world examples and uses of the X-Robots-Tag

This all sounds great in theory, but what does it look like in the real world? Let’s take a look.

Let’s say we wanted search engines not to index .pdf file types. On Apache servers, this configuration would look like the below:

  <Files ~ "\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
  </Files>

On Nginx it would look like this:

location ~* \.pdf$ {
  add_header X-Robots-Tag "noindex, nofollow";
}

Now, let’s look at a different scenario. Let’s say we want to use the X-Robots-Tag to prevent image files, such as .jpg, .gif, .png, and so on, from being indexed. On Apache, that would look like this:

  <Files ~ "\.(png|jpe?g|gif)$">
    Header set X-Robots-Tag "noindex"
  </Files>

Please note that understanding how these directives work and impact each other is crucial.

For example, what happens if the X-Robots-Tag and the meta robots tag are set when crawler bots discover the URL?

If that URL is blocked from crawling by robots.txt, then any indexing and serving directives in the header or meta tag cannot be discovered, and will therefore be ignored.

If directives need to be followed, then the URLs containing them cannot be disallowed from crawling.

Check for the X-Robots-Tag

There are several different methods that can be used to check for the presence of an X-Robots-Tag on a site.

The easiest way to check is to install a browser extension that will show you the X-Robots-Tag information for a given URL.

Screenshot of the Bot Exclusion Checker, December 2022

Another plugin you can use to determine whether an X-Robots-Tag is being used is the Web Developer plugin.

By clicking on the plugin in your browser and going to View Response Headers, you can see the various HTTP headers that are used.

Web developer plugin
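If you’d rather script the check than rely on a plugin, the header can also be read programmatically. Here is a minimal sketch using only Python’s standard library (the function names are my own, and you would swap in the URL you want to inspect):

```python
import urllib.request


def fetch_x_robots_tag(url: str) -> str:
    """HEAD-request a URL and return its raw X-Robots-Tag header value (or "")."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("X-Robots-Tag", "")


def parse_directives(header_value: str) -> list[str]:
    """Split a comma-separated X-Robots-Tag value into individual directives."""
    return [d.strip() for d in header_value.split(",") if d.strip()]
```

For example, `parse_directives(fetch_x_robots_tag("https://example.com/file.pdf"))` would return a list such as `["noindex", "nofollow"]` if that header is set on the response.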

Another method that can be used at scale, in order to pinpoint issues on websites with millions of pages, is Screaming Frog.

After running a site through Screaming Frog, you can go to the “X-Robots-Tag” column.

This will show you which sections of the site are using the tag, along with which specific directives.

Screenshot of the Screaming Frog report, X-Robots-Tag column, December 2022

Using X-Robots-Tags on your site

Understanding and controlling how search engines interact with your website is the cornerstone of search engine optimization. And X-Robots-Tag is a powerful tool that you can use to do just that.

Just be aware: It’s not without its risks. It’s very easy to make a mistake and deindex your entire site.

However, if you are reading this article, you are probably not a beginner in SEO. As long as you use it wisely, take your time and check your work, you will find X-Robots-Tag to be a useful addition to your arsenal.

Featured image: Song_about_summer / Shutterstock
