A canonical URL is a URL that Google sees as the “master” version of a set of duplicate or near-duplicate pages. Think of it as the difference between an original piece of art and its copies or prints. This canonical URL is what Google will index and potentially return to users in Google search.
Canonical URLs are important because Google will only index canonical URLs. This means that if you have duplicate content on your website, i.e., pages that are near or exact duplicates of each other, then Google is only going to index one of them (the canonical).
If you set your canonical URLs properly, Google is likely to respect your decision and view that page as canonical. However, in the absence of a specified canonical for exact or near-duplicate pages, Google is going to use its best judgment to choose a canonical for you.
The problem is, this may not be the URL you want Google to choose as the canonical one. So if you want to stand the best chance of it being the right one, you should manually set a canonical URL.
Google looks at several signals to determine the canonical URL for a set of duplicate or near-duplicate pages, one of which is the canonical tag. The canonical tag is a piece of HTML code that you add to the <head> section of a page to specify the canonical version of that page. It looks like this:
<link rel="canonical" href="https://example.com/canonical-page/" />
For example, suppose you own an ecommerce store where visitors can filter products by parameters like style, size, and color. This typically results in parameterized URLs with content virtually identical to your “master” page:
yourstore.com/tshirts (“master” page listing all T-shirts)
yourstore.com/tshirts?size=small (identical to the “master” page but filtered for small T-shirts only)
yourstore.com/tshirts?color=red (identical to the “master” page but filtered for red T-shirts only)
Without proper canonicalization, you may end up in a situation where Google chooses to index the “wrong” version of the page or all of them.
Both of these outcomes are often problematic for SEO.
To ensure Google has everything it needs to index the right page, you can set the canonical version of these URLs by adding a canonical tag to each parameterized page, pointing to the “master” version without the URL parameters.
This helps Google to understand which version of the page it should index.
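As a rough illustration of the idea above, the following sketch derives the “master” URL from a filtered URL and builds the matching canonical tag. It assumes that every query parameter is a filter that should be ignored for canonicalization, which will not hold for every site; the function names are made up for this example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_for(url: str) -> str:
    """Return the URL with its query string and fragment removed.

    Assumption: all parameters are filters (size, color, ...) and none of
    them identifies genuinely different content.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element that points a page at its canonical URL."""
    href = escape(canonical_for(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


# Tag to place in the <head> of the filtered page.
print(canonical_tag("https://yourstore.com/tshirts?size=small"))
# <link rel="canonical" href="https://yourstore.com/tshirts" />
```

The same tag would go on yourstore.com/tshirts?color=red, so both filtered pages point to the unfiltered listing.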
Learn more: Canonical Tags: A Simple Guide for Beginners
Canonicalization is a complex and technical topic, but most website owners need only know a handful of best practices. So to keep things simple, we’ll cover just a few of them here.
A self-referencing canonical tag is a canonical tag that points to the page it appears on.
For example, this page has a self-referencing canonical tag that looks like this:
<link rel="canonical" href="https://ahrefs.com/blog/what-is-a-canonical-url/" />
Although using self-referential canonical tags isn’t mandatory and may also seem like a strange thing to do, Google’s John Mueller actually recommends their use:
I recommend [using a] self-referential canonical because it really makes it clear to us which page you want to have indexed, or what the URL should be when it is indexed.
A self-referential canonical tag tells Google that you consider the URL to be canonical and that you’d like Google to index the page. Of course, indexing isn’t guaranteed, but the canonical tag, self-referential or otherwise, is one of the strongest signals Google uses to understand what is and isn’t canonical on your website.
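If you want to verify this on your own pages, a minimal sketch using only Python's standard library might look like the following. It compares the first canonical href to the page URL as an exact string, which is a simplification (it ignores differences such as trailing slashes or letter case); the class and function names are illustrative.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def is_self_referencing(page_url: str, html: str) -> bool:
    """True if the page's first canonical tag points at the page itself."""
    finder = CanonicalFinder()
    finder.feed(html)
    return bool(finder.hrefs) and finder.hrefs[0] == page_url


PAGE_URL = "https://ahrefs.com/blog/what-is-a-canonical-url/"
PAGE_HTML = (
    f'<html><head><link rel="canonical" href="{PAGE_URL}" /></head>'
    "<body></body></html>"
)
print(is_self_referencing(PAGE_URL, PAGE_HTML))  # True
```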
Google says you shouldn’t list non-canonical URLs in your sitemap because it treats every URL listed in a sitemap as a suggested canonical.
As with canonical tags, this doesn’t necessarily mean that Google will always treat a URL in your sitemap as canonical—but it’s yet another signal to help Google better understand how you view your site’s content.
One quick way to check whether you have non-canonical URLs in your sitemap is to crawl your website for free using Site Audit in Ahrefs Webmaster Tools (AWT).
Here’s how to do it in five simple steps:
If you see this error, click the error and hit the “View affected URLs” button. You can then take steps to remove these URLs from your sitemap or change their canonicalization status.
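For readers who prefer to script a quick check, here is a hedged sketch of the same idea: list the sitemap URLs whose declared canonical is a different URL. The sitemap and the page-to-canonical mapping below are invented sample data; in practice you would collect the mapping from a crawl of your own site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_canonical_sitemap_urls(sitemap_xml: str, canonicals: dict[str, str]) -> list[str]:
    """Return sitemap URLs whose declared canonical is a different URL.

    URLs missing from `canonicals` are treated as canonical (an assumption).
    """
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if canonicals.get(url, url) != url]


# Hypothetical sample data.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourstore.com/tshirts</loc></url>
  <url><loc>https://yourstore.com/tshirts?size=small</loc></url>
</urlset>"""

CANONICALS = {
    "https://yourstore.com/tshirts": "https://yourstore.com/tshirts",
    "https://yourstore.com/tshirts?size=small": "https://yourstore.com/tshirts",
}

print(non_canonical_sitemap_urls(SITEMAP, CANONICALS))
# ['https://yourstore.com/tshirts?size=small']
```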
A 404 status code is returned by the server when a page or resource cannot be found. This usually happens because the page has been deleted or taken offline.
It probably goes without saying that you shouldn’t intentionally specify a 404 as canonical, but this is still a somewhat common error that occurs on websites over time because people often remove or relocate pages.
As a result, it’s important to keep an eye out for dead pages marked as canonicals. You can do this for free using Site Audit in Ahrefs Webmaster Tools (AWT).
If you see this error, you have canonical tags on your site that specify dead URLs as canonical URLs. You can see which pages are affected by clicking the error and hitting the “View affected URLs” button.
You should replace any canonical tags linking to 4XX URLs with links to live pages.
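The check itself is simple enough to sketch in a few lines. The example below flags canonical tags whose target responds with a 4XX status. To keep it runnable offline, the status lookup is passed in as a function and backed by a made-up table; in a real audit it would wrap an HTTP request to each canonical URL.

```python
from typing import Callable


def dead_canonicals(
    canonicals: dict[str, str],
    get_status: Callable[[str], int],
) -> dict[str, str]:
    """Return {page: canonical} for canonicals that answer with a 4XX status."""
    return {
        page: target
        for page, target in canonicals.items()
        if 400 <= get_status(target) < 500
    }


# Hypothetical data: which URL each page declares as canonical.
CANONICALS = {
    "https://example.com/old-guide/": "https://example.com/guide/",
    "https://example.com/about/": "https://example.com/about/",
}

# Stand-in for real HTTP responses (an assumption for this example).
STATUS_TABLE = {
    "https://example.com/guide/": 404,
    "https://example.com/about/": 200,
}

print(dead_canonicals(CANONICALS, lambda url: STATUS_TABLE.get(url, 200)))
# {'https://example.com/old-guide/': 'https://example.com/guide/'}
```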
If you have paginated pages, such as a series of blog archive pages, these work a little differently than parameterized URLs. Paginated pages should not be canonicalized to the first page in the series. Instead, you should use a self-referencing canonical tag on each page. Google’s John Mueller confirmed on Reddit that this is the correct way to handle canonicalization with pagination.
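To make the contrast with parameterized URLs concrete, this small sketch generates one self-referencing canonical tag per page in a paginated series. The /page/N/ URL pattern is an assumption for illustration; use whatever pattern your site actually produces.

```python
def paginated_canonical_tags(base_url: str, pages: int) -> dict[str, str]:
    """Map each paginated URL to a canonical tag that points to itself."""
    tags = {}
    for n in range(1, pages + 1):
        # Assumed pattern: page 1 is the base URL, later pages add /page/N/.
        url = base_url if n == 1 else f"{base_url}page/{n}/"
        tags[url] = f'<link rel="canonical" href="{url}" />'
    return tags


for url, tag in paginated_canonical_tags("https://example.com/blog/", 3).items():
    print(url, "->", tag)
```

Note that page 2 and page 3 each reference their own URL rather than https://example.com/blog/.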
It’s not good practice to have multiple canonical tags on the same page. In this case, Google will most likely ignore all of them and may not index the page.
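A page can pick up a second canonical tag without anyone noticing, for instance when a theme and a plugin each add one. The following is a minimal sketch for spotting that situation in a page's HTML, using only the standard library; the names and the sample markup are illustrative.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Record the href of each <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels:
            self.hrefs.append(attr.get("href") or "")


def canonical_hrefs(html: str) -> list[str]:
    """Return the hrefs of all canonical tags, in document order."""
    collector = CanonicalCollector()
    collector.feed(html)
    return collector.hrefs


def has_multiple_canonicals(html: str) -> bool:
    """True if the page declares more than one canonical tag."""
    return len(canonical_hrefs(html)) > 1


# Hypothetical page with two competing canonical tags.
SAMPLE_HTML = """<html><head>
<link rel="canonical" href="https://example.com/page-a/" />
<link rel="canonical" href="https://example.com/page-b/" />
</head><body></body></html>"""

print(canonical_hrefs(SAMPLE_HTML))
# ['https://example.com/page-a/', 'https://example.com/page-b/']
print(has_multiple_canonicals(SAMPLE_HTML))  # True
```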