Become the sitemap expert: Complex guide + cheat sheet

I keep hearing all sorts of misunderstandings about sitemaps, again and again, every week.

That’s why I decided to write one comprehensive article with a printable cheat sheet. It covers the best practices from the sitemap world and clears up the most common misconceptions about them.

I gathered all of the know-how and experience that I have accumulated by creating sitemaps for my work through the years.

And voilà: here is the article that will help you become a sitemap expert, whether you are a beginner or already a professional.

In this article, you will find everything important:

  1. How sitemaps can help you with SEO
  2. Do you really need a sitemap? Find out
  3. Choose the most suitable types of sitemaps for your website
  4. Make your sitemaps in 4 steps
  5. The last important step: Notify search engines about sitemaps

1. How sitemaps can help you with SEO?

First, you should understand what a sitemap is, along with its main purpose.

What is the main purpose of sitemaps?

A sitemap is a list of web pages created for web crawlers so they can find your web content quickly and easily. A search engine can use a sitemap to see the newest pages on a website, or all the web pages together, including all images, video content, etc.

Sitemap structure

A search engine will find page 9 in a sitemap immediately, within one visit of the file. On the website illustrated above, it will have to jump through 6 web pages to find page 9.

SEO benefits of sitemaps

  • Faster indexation: Search engines find out about new pages much faster, so indexation and the appearance of the website in search results are faster as well. Sitemaps can help you with deindexation too.
  • Better indexation of deep pages: Search engines can find pages that were not discovered while crawling the website. But it doesn’t necessarily mean that all of them will be indexed.
  • Monitoring of indexed pages: In combination with Google Search Console, you can find out which of the URLs listed in the sitemap Google has indexed.

Unfortunately, webmasters often give sitemaps credit that they don’t actually deserve. I have to say it loud and clear:

Sitemaps don’t help you with better search rankings.

A sitemap is not a ranking factor. Listing URLs in a sitemap has zero influence on search rankings.

URLs that sit deep in the site structure or lack internal links get little or no link juice. Without link juice, search bots can still consider these pages low-quality, even if they are listed in sitemaps. Most likely, the search engines will refuse to display them in SERPs anyway.

Did you read the article up until this part? Great job. You know the main principles and the SEO benefits of sitemaps by now. Now, it will be much easier to go a little deeper.

2. Do you need a sitemap?

A sitemap isn’t necessary for every website. And if you need one, it doesn’t exactly have to be a sitemap.xml. You can implement several types of sitemaps, like an RSS feed or special extensions for a sitemap.xml.

They told me to make a sitemap.xml. But I have a website with only one page

A sitemap is for you, if:

  • You have a large website:
    A website with 100 pages isn’t large. Even a website with 1,000 pages is still pretty small, and a sitemap isn’t really necessary. For bigger sites, however, you should definitely consider it. Do you have one million pages? Then you need a sitemap.

  • You have a new site: 
    A sitemap helps crawlers to find a brand new website just after launch and to index the new pages much faster.

  • You change your website’s content frequently 
  • You need to index your fresh content very fast:
    A sitemap helps to accelerate the indexation of newly added pages after adding them to the sitemap, especially, when your site is in Google News.

You don’t need a sitemap if your website is:

  • One-page presentation
  • A portfolio website
  • Website of an organization
  • SaaS application

Don’t forget. 

If a blog is part of your website, you should use an RSS feed for it.

3. What types of sitemaps are most suitable for your website?

Before you start to think about how to code and implement sitemaps, you have to choose the right type of sitemap to fit your needs. Here is the complete table:

Type | Google/Bing compatibility | Purpose | Where to use it
XML sitemap | Google, Bing | Better indexation of deep linked HTML pages | New or extensive website with complex navigation
Image XML sitemap | Google, Bing | Extension for images | Website with important image content
Video XML sitemap | Google, Bing | Extension for video | Website with video content
Mobile XML sitemap | Google | Extension for mobile version pages | Website with a mobile version located on a separate URL without annotations in HTML
Alternate language XML sitemap | Google | Extension for pages in alternate languages | Website with several language versions that does not provide the hreflang attribute in the HTML code or HTTP header
Google News Sitemap | Google | Rapid indexation of news | Website included in Google News
RSS feed / Atom | Google, Bing | Newest or updated HTML pages, images | Website with frequently added or updated content
mRSS | Google, Bing | RSS video extension | Website with frequently added or updated video content

A sitemap can also be written as a simple text file. I won’t cover it further, because it contains only a list of URLs. Other sitemap formats allow better URL annotation.

Best practices for the sitemap types

Still not sure which sitemaps are most suitable for you? Treat the table above as a general recommendation, since every website is unique. Once you choose the type of sitemap, think about its purpose.

Avoid static sitemap generators

Sitemap generator tools like XML-sitemaps.com are very popular. You enter the URL of your website, and then the generator starts to crawl it and creates the sitemap. After that, you upload the generated sitemap file to the web server. It is easy to use, but it’s also mostly useless and counterproductive. If you create new web pages, the sitemap remains the same. It will not reflect the changes.

Why are XML sitemap generators useless?

1. Generated sitemaps will be outdated very quickly

As I said at the beginning of this guide, one of the main benefits of a sitemap is that search engines find out about new pages much faster. A generator only gives you a snapshot of the website at the moment the sitemap is generated, so it will be outdated very quickly.

2. The unimportant URLs are listed

As I explain in the section “Which URLs should be put in a sitemap”, it’s important to exclude unimportant URLs. Most sitemap generators put in all of the website URLs. Sure, you can delete all of the unnecessary URLs, but you can easily make a mistake in the messy data.

3. Generator sees only linked URLs

You can add URLs to a sitemap that aren’t linked from the website. A sitemap generator can’t, because it only discovers pages by following links. But don’t forget, it doesn’t necessarily mean that all of the unlinked URLs will be indexed.

So, please avoid sitemap generators unless you have a very good reason to use them. What’s a very good reason? For example, when your website is new, and you need to index it fast. In this case, you probably don’t have the time or the resources to create a dynamic sitemap, which would be updated regularly.

In most cases, you need a dynamic sitemap with regular updates. Let’s see how to implement it.

4. How to make a sitemap

Now I am going to be a little bit techy. All of the sitemaps described above are in XML format. There are two basic formats: XML sitemap and RSS.

Sitemap hierarchy

With these formats, you will cover everything: every Bing and Google sitemap. So, you don’t need to use or know more about other formats, such as an Atom feed.

4.1. XML sitemap

1. Declaration

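Crawlers recognize the sitemap by its XML declaration, <?xml version="1.0" encoding="UTF-8"?>, followed by the <urlset> root element, which declares the sitemap namespace http://www.sitemaps.org/schemas/sitemap/0.9.

2. Adding URLs

The <urlset> tag wraps the URLs, and each URL gets its own <url> tag. To add more URLs, you just put more <url> tags in there. Only one child tag is required inside a <url> tag, and that is its location, <loc>. Thanks to the <loc> tag, Google will know about all of the important URLs.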
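As a minimal sketch of this structure, the Python snippet below holds a one-URL sitemap as a string and checks that it parses. The namespace is the one defined by the sitemaps.org protocol; the URL and the helper name list_locations are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# A one-URL sitemap: XML declaration, <urlset> with the sitemap
# namespace, and a single <url> entry holding the required <loc>.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
  </url>
</urlset>
"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_locations(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]


print(list_locations(SITEMAP))
```

Parsing the file back like this is also a cheap sanity check that a generated sitemap is well-formed before you publish it.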

3. Adding lastmod to URLs

I also strongly recommend adding information about the last update of the URL inside the <url> tag. It can accelerate the recrawling of URLs with fresh content. You do it with a <lastmod> tag placed next to <loc>.

The complete sitemap example:
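One way to produce such a sitemap dynamically is sketched below with Python’s standard library. The function name build_sitemap and the page URL are assumptions made for this example, and the lastmod value is only sample data.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, datetime]]) -> str:
    """Serialize (URL, last-modified) pairs as an XML sitemap."""
    # Register the sitemap namespace as the default one, so the
    # output uses <urlset xmlns="..."> without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, modified in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # isoformat() on a timezone-aware datetime yields the W3C
        # form, including the UTC offset.
        lastmod = ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod")
        lastmod.text = modified.isoformat(timespec="seconds")
    body = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return body.decode("utf-8")


updated = datetime(2017, 10, 20, 17, 30, tzinfo=timezone(timedelta(hours=-2)))
print(build_sitemap([("http://www.example.com/page1.html", updated)]))
```

In a real site, the list of pages would come from your CMS or database each time the sitemap is requested or regenerated.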

Lastmod formatting

Lastmod uses the W3C date and time format. Let’s take a look at the value 2017-10-20T17:30:00-02:00:

Part of the value | Description
2017-10-20 | The date, 20th of October, 2017
T | Separator between date and time
17:30:00 | 17 hours, 30 minutes, 0 seconds
-02:00 | Offset from UTC: -02:00 means the local time is 2 hours behind UTC. If the time is in UTC itself, put a “Z” character instead (2017-10-20T17:30:00Z)

You can add two more tags into a sitemap, <changefreq> and <priority>. Google ignores them (Seroundtable.com, May 2015), and John Mueller from Google confirmed it again in August 2017 for the priority tag. Changefreq is ignored by most search engines. The priority tag may still be used by Bing.

Sitemap guidelines

When creating a sitemap, you have to respect some limits and follow these rules on encoding, character escaping and URL format:

  • Use UTF-8 encoding
  • Use absolute URLs. If you have the sitemap placed at http://www.example.com/sitemap.xml, you can’t specify a relative URL like /page1.html. You have to use http://www.example.com/page1.html instead.
  • You can compress the sitemap with gzip
  • Entity escaping:

Character | Symbol | Escape code
Ampersand | & | &amp;
Single quote | ' | &apos;
Double quote | " | &quot;
Greater than | > | &gt;
Less than | < | &lt;

For example, the URL http://www.example.com/page1.html&lang=en requires escaping of the ampersand (&): http://www.example.com/page1.html&amp;lang=en

  • Diacritics and other non-ASCII characters must be percent-encoded:
    For example, the URL http://www.example.com/päge1.html requires encoding of the character ä: http://www.example.com/p%C3%A4ge1.html
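Both escaping rules can be sketched with Python’s standard library. The helper name sitemap_safe_url is hypothetical; quote and escape are the standard urllib.parse and xml.sax.saxutils functions.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_safe_url(url: str) -> str:
    """Percent-encode non-ASCII characters, then entity-escape for XML."""
    # Keep URL delimiters (and existing %XX sequences) intact; other
    # characters, such as "ä", are percent-encoded as UTF-8 bytes.
    encoded = quote(url, safe=":/?#[]@!$&'()*+,;=%")
    # escape() replaces &, < and > by default; the quote characters
    # are passed explicitly.
    return escape(encoded, {"'": "&apos;", '"': "&quot;"})


print(sitemap_safe_url("http://www.example.com/page1.html&lang=en"))
print(sitemap_safe_url("http://www.example.com/päge1.html"))
```

If you build the sitemap with an XML library instead of string concatenation, the entity escaping is normally done for you, and only the percent-encoding step remains.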

Limitations: Each sitemap can contain at most 50,000 URLs, and the file must not exceed 50MB uncompressed. If you have more URLs, split them across several sitemaps.
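Respecting the URL limit can be as simple as slicing the URL list, as in this sketch. The function name split_into_sitemaps is an assumption for illustration; it only handles the URL count, so the 50MB size limit still needs its own check.

```python
from typing import Iterator

MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(
    urls: list[str], limit: int = MAX_URLS_PER_SITEMAP
) -> Iterator[list[str]]:
    """Yield consecutive slices of the URL list, each within the limit."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


urls = [f"http://www.example.com/page{i}.html" for i in range(120_001)]
chunks = list(split_into_sitemaps(urls))
print([len(chunk) for chunk in chunks])
```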

Which URLs should be put in a sitemap

Selecting the right URLs for a sitemap is a very important task. It’s not good practice to list every URL of a website in a sitemap.

Include the web pages you want to rank in Google and want to show to users: the important, high-quality ones.

Exclude:

  • Utility pages, which are necessary and useful to users (review forms, accounts, wishlists, etc.), but are not meant to be landing pages
  • Any redirects (3xx) and pages returning client (4xx) or server (5xx) errors
  • URLs with parameters or session IDs
  • URLs created by filtering (unnecessary for SEO)
  • URLs on a different subdomain than the one where sitemap.xml is placed; if you have URLs on other subdomains, you have to place another sitemap there
  • Canonicalised pages
  • Paginated pages
  • Duplicate pages
  • Pages disallowed by robots.txt
  • Pages with noindex
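Some of these exclusion rules can be checked mechanically. The sketch below assumes you already have crawl data for each page; the Page record and its fields are hypothetical, and rules such as “utility page” or “paginated page” still need site-specific logic that is not shown here.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Page:
    """Hypothetical crawl record for one URL."""
    url: str
    status: int = 200
    canonical: Optional[str] = None  # canonical URL declared by the page
    noindex: bool = False
    blocked_by_robots: bool = False


def belongs_in_sitemap(page: Page, sitemap_host: str) -> bool:
    """Apply the mechanical exclusion rules from the list above."""
    parts = urlsplit(page.url)
    return (
        page.status == 200                          # no 3xx, 4xx or 5xx
        and parts.hostname == sitemap_host          # same (sub)domain as the sitemap
        and not parts.query                         # no parameters or session IDs
        and page.canonical in (None, page.url)      # not canonicalised elsewhere
        and not page.noindex
        and not page.blocked_by_robots
    )


pages = [
    Page("http://www.example.com/page1.html"),
    Page("http://www.example.com/page1.html?sessionid=1"),
    Page("http://www.example.com/old.html", status=301),
]
print([p.url for p in pages if belongs_in_sitemap(p, "www.example.com")])
```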

4.2. Image sitemap extension

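You can extend the XML sitemap with images. Start from the basic sitemap described above and add the XML namespace for images, xmlns:image="http://www.google.com/schemas/sitemap-image/1.1", to the <urlset> tag.

Then, add the images inside the <url> tag of the page they appear on: each image gets an <image:image> tag with an <image:loc> tag holding the image URL.

The <image:image> and <image:loc> tags are required.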

Provide more information about the images

You can add more info about each of them:

Name | Tag | Description
Title | <image:title> | The title of the image. You can use the same text as in the alt attribute of the <img> tag.
License URL | <image:license> | A URL to the license of the image, for example, https://creativecommons.org/licenses/by-sa/2.0/. Choose the right license type or get the link to your license at https://creativecommons.org/choose/.
Geographic location | <image:geo_location> | The geographic location of the image, for example, “Berlin, Germany”
Caption | <image:caption> | Longer description of the image

An example of the sitemap with optional tags:
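The sketch below embeds such a sitemap in a Python string and reads it back, so the structure can be verified. The page URL, image URL and texts are invented sample values; the namespaces are the ones used by the sitemap protocol and Google’s image extension.

```python
import xml.etree.ElementTree as ET

# One page with one image, using the required tags plus the
# optional caption, geo_location, title and license tags.
IMAGE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.example.com/page1.html</loc>
    <image:image>
      <image:loc>http://www.example.com/images/berlin.jpg</image:loc>
      <image:caption>The Brandenburg Gate at sunset</image:caption>
      <image:geo_location>Berlin, Germany</image:geo_location>
      <image:title>Brandenburg Gate</image:title>
      <image:license>https://creativecommons.org/licenses/by-sa/2.0/</image:license>
    </image:image>
  </url>
</urlset>
"""

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def images_by_page(sitemap_xml: str) -> dict[str, list[str]]:
    """Map each page URL to the image URLs listed for it."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return {
        url.findtext("sm:loc", namespaces=NS): [
            loc.text for loc in url.findall("image:image/image:loc", NS)
        ]
        for url in root.findall("sm:url", NS)
    }


print(images_by_page(IMAGE_SITEMAP))
```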

Image sitemap guidelines

  • You can add up to 1,000 images for each web page.
  • Images don’t have to be in the same domain as your website. You can use a CDN (content delivery network). Make sure that your CDN is verified in the Search Console.

4.3. Video sitemap extension

Provide the required information about the videos

You have to add all of this info about each of them:

Name | Tag | Description
Thumbnail image file | <video:thumbnail_loc> | A URL pointing to the video thumbnail image file. Images must be at least 160×90 pixels, and at most 1920×1080 pixels.
Title | <video:title> | The title of the video. It should match the video title shown on the web page.
Description | <video:description> | Video description must match the description displayed on the web page.
Location | <video:content_loc> or <video:player_loc> | At least one of these must be specified. content_loc points to a URL with the actual video media file; Flash is allowed, but may be indexed less well. player_loc points to a URL with a video player. Usually this is the information in the src attribute of an <embed> tag.

Don’t forget to add the XML namespace for video and use <video:video> tag for each URL.

Provide additional information about the videos

Search engines may use more information about the videos. You can add more optional tags:

Name | Tag | Description
Duration | <video:duration> | The duration of the video in seconds; the max value is 28,800 (8 hours)
Expiration date | <video:expiration_date> | Applicable only for videos with the date after which the video will no longer be available, in W3C format
Publication date | <video:publication_date> | The date the video was published, in W3C format
Rating | <video:rating> | The rating of the video, numbers in the range 0.0 to 5.0
View count | <video:view_count> | The number of times the video has been viewed
Family friendly | <video:family_friendly> | Define “no” only if the video should be available only to users with SafeSearch turned off
Restriction | <video:restriction> | List of countries where the video may or may not be played, in ISO 3166 format; requires attribute relationship with value “allow” or “deny”
Requiring subscription | <video:requires_subscription> | “Yes” or “no,” depending on whether a subscription is required to view the video
Live video | <video:live> | “Yes” or “no,” depending on whether the video is a live stream

The example with optional tags:
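A video entry can be sketched in Python as follows. The function name video_url_entry and all URLs and texts are assumptions for illustration; the check mirrors the rule that at least one of content_loc and player_loc must be present, and optional tags are passed as keyword arguments.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO = "http://www.google.com/schemas/sitemap-video/1.1"


def video_url_entry(page_url, thumbnail, title, description,
                    content_loc=None, player_loc=None, **optional):
    """Build one <url> element carrying a <video:video> block."""
    if not (content_loc or player_loc):
        raise ValueError("specify content_loc, player_loc, or both")
    url = ET.Element(f"{{{SM}}}url")
    ET.SubElement(url, f"{{{SM}}}loc").text = page_url
    video = ET.SubElement(url, f"{{{VIDEO}}}video")
    fields = {
        "thumbnail_loc": thumbnail,
        "title": title,
        "description": description,
        "content_loc": content_loc,
        "player_loc": player_loc,
        **optional,  # e.g. duration=600, family_friendly="yes"
    }
    for tag, value in fields.items():
        if value is not None:
            ET.SubElement(video, f"{{{VIDEO}}}{tag}").text = str(value)
    return url


# Prefixes used when the element is serialized.
ET.register_namespace("", SM)
ET.register_namespace("video", VIDEO)

entry = video_url_entry(
    "http://www.example.com/videos/intro.html",
    thumbnail="http://www.example.com/thumbs/intro.jpg",
    title="Introduction to sitemaps",
    description="A short introduction to XML sitemaps.",
    content_loc="http://www.example.com/media/intro.mp4",
    duration=600,
    family_friendly="yes",
)
print(ET.tostring(entry, encoding="unicode"))
```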

As another example, check a video sitemap of The Guardian.

4.4. Alternate languages extension

If your site is available in multiple languages, you should link the versions with the “hreflang” attribute so that search engines learn about them easily and understand which version to show in the search results of specific regions. It’s the same for sitemaps. By adding the alternate element to a sitemap, you support the indexation of the different language versions.

As a good example of sitemaps with alternate languages extension, this is a sitemap index of Nike with a list of sitemaps for each language version.

You must create a separate URL element for each URL. Each URL element must include a <loc> tag indicating the page URLs, and a <xhtml:link> subelement for every alternate version of the page, including itself.

The <xhtml:link> tag includes attributes to serve the correct language or regional URL: rel="alternate" hreflang="en-us" href="http://www.example.com/".

Don’t forget to add the XML namespace for XHTML.
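The “every version lists every version, including itself” rule is easy to get wrong by hand, so here is a sketch that generates the entries. The function name hreflang_entries and the example URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML = "http://www.w3.org/1999/xhtml"


def hreflang_entries(versions: dict[str, str]) -> list[ET.Element]:
    """Return one <url> per language version of a page.

    `versions` maps an hreflang code to the URL of that version.
    Each <url> lists all versions, including itself.
    """
    entries = []
    for page_url in versions.values():
        url = ET.Element(f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = page_url
        for lang, href in versions.items():
            ET.SubElement(url, f"{{{XHTML}}}link",
                          rel="alternate", hreflang=lang, href=href)
        entries.append(url)
    return entries


# Prefixes used when the elements are serialized.
ET.register_namespace("", SM)
ET.register_namespace("xhtml", XHTML)

entries = hreflang_entries({
    "en-us": "http://www.example.com/",
    "de": "http://www.example.com/de/",
})
for entry in entries:
    print(ET.tostring(entry, encoding="unicode"))
```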

4.5. Mobile sitemap extension

A mobile sitemap is useful only for the mobile version sites. If your site uses responsive design to serve mobile users, there is no need to create a mobile sitemap, because the URL is still the same. When you annotate a mobile version in HTML, mobile sitemap extension is not necessary.

When building the mobile sitemap, add the XML namespace for mobile, and use the <xhtml:link> tag, including attributes to serve the mobile version of the URL: rel="alternate" media="only screen and (max-width: 640px)" href="http://www.m.example.com/".

4.6. Google News Sitemap

If you have a news site, build a Google News Sitemap. It lets you control which content you submit to Google News. For example, The Guardian has a very well-written news sitemap.

Before you start to use a news sitemap, register your site in Google News, so that Google will be alerted to it. After the registration, add all of the news articles created in the last two days to it, with a limit of 1,000 URLs. An update of the sitemap should be done each time a news article is published.
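The two-day window and the 1,000-URL limit can be applied with a small filter like the sketch below. The function name news_sitemap_candidates and the (URL, publication time) tuple layout are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone


def news_sitemap_candidates(articles, now,
                            max_age=timedelta(days=2), limit=1000):
    """Pick the (url, published) pairs that belong in the news sitemap."""
    # Keep only articles published within the last `max_age`.
    fresh = [a for a in articles if timedelta(0) <= now - a[1] <= max_age]
    fresh.sort(key=lambda a: a[1], reverse=True)  # newest first
    return fresh[:limit]


now = datetime(2017, 10, 20, 12, 0, tzinfo=timezone.utc)
articles = [
    ("http://www.example.com/news/today.html", now - timedelta(hours=1)),
    ("http://www.example.com/news/last-week.html", now - timedelta(days=7)),
]
print(news_sitemap_candidates(articles, now))
```

Running this filter every time an article is published keeps the news sitemap within both limits.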

Provide information about news articles

You should add this info for each news article:

Name | Tag | Description
Publication | <news:publication> with subtags <news:name> and <news:language> | First, include the name of the news publication, and then the language of your publication. Use an ISO 639 language code.
Date of publication | <news:publication_date> | Article publication date in W3C format
Title of article | <news:title> | The title of the news article defined in the <h1> heading
Keywords | <news:keywords> | Keywords describing the article topic, separated by commas
Genre | <news:genres> | Use <news:genres>Opinion</news:genres> if the article contains only opinions or comments.

When coding the news sitemap, add the XML namespace for news and use <news:news> tag for each URL.
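A minimal news sitemap, embedded in a Python string and read back as a check, could look like this sketch. The publication name, article URL and headline are invented sample values.

```python
import xml.etree.ElementTree as ET

# One article with the publication, publication date and title tags.
NEWS_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.example.com/news/article1.html</loc>
    <news:news>
      <news:publication>
        <news:name>The Example Times</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2017-10-20T17:30:00-02:00</news:publication_date>
      <news:title>Example headline</news:title>
    </news:news>
  </url>
</urlset>
"""

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "news": "http://www.google.com/schemas/sitemap-news/0.9",
}


def news_titles(sitemap_xml: str) -> dict[str, str]:
    """Map each article URL to its <news:title>."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return {
        url.findtext("sm:loc", namespaces=NS):
            url.findtext("news:news/news:title", namespaces=NS)
        for url in root.findall("sm:url", NS)
    }


print(news_titles(NEWS_SITEMAP))
```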

To display your article better in Google News, try to add an <image:image> tag with the location of a representative image. Usually, Google picks the image from the article, but if you don’t define the image in the sitemap, your article may be displayed in Google without one.

A big plus is that readers of RSS feeds can add Google News sections to their RSS aggregators. Your articles will be shown in Google News and in RSS feeds as well.

4.7. RSS feed

An RSS feed is a small file containing the ten most recent updates to your website.

Provide the required information about the articles

You have to add all of this information for the channel in the <channel> tag and for each URL in the <item> tag:

Name | Tag | Description
Title | <title> | The title of an article defined in the <h1> heading
Link URL | <link> | Link pointing to an article destination
Description | <description> | Description of an article

Provide more information about the articles

Even if these tags are not important for SEO and site indexation, they are important to your readers. The more information you provide, the more information RSS feed readers will get about your posts in their RSS aggregators.

Name | Tag | Description
Language | <language> | Language availability: RSS aggregators will group sites based on their language
Image of blog | <image> | Image link points to an image/logo destination
Description | <description> | Text description of an article.
Publication date | <pubDate> | Date and time of publication in RFC 822 date-time format
File enclosure | <enclosure> | A media file included with an item
Category | <category> | Category division: RSS aggregators will group sites based on category. If two or more categories are applicable, use the <category> tag separately for each category.

The example with optional tags:
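A feed with the required tags plus language, pubDate and category can be generated as in this sketch. The function name build_rss, the dictionary keys and the blog data are assumptions for illustration; format_datetime from the standard library produces the RFC 822 style date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


def build_rss(channel: dict, items: list[dict]) -> str:
    """Serialize channel metadata and items as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    chan = ET.SubElement(rss, "channel")
    for tag in ("title", "link", "description", "language"):
        ET.SubElement(chan, tag).text = channel[tag]
    for item in items:
        node = ET.SubElement(chan, "item")
        for tag in ("title", "link", "description"):
            ET.SubElement(node, tag).text = item[tag]
        # pubDate uses the RFC 822 date-time format.
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"])
        # One <category> tag per category.
        for name in item.get("categories", []):
            ET.SubElement(node, "category").text = name
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    {"title": "Example blog", "link": "http://www.example.com/",
     "description": "Posts about technical SEO", "language": "en"},
    [{"title": "Sitemap basics",
      "link": "http://www.example.com/sitemap-basics/",
      "description": "What a sitemap is and when you need one.",
      "published": datetime(2017, 10, 20, 15, 30, tzinfo=timezone.utc),
      "categories": ["SEO", "Sitemaps"]}],
)
print(feed)
```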

If you run a blog on WordPress, an RSS feed is generated automatically and updated at the URL www.example.com/feed/.

The final step is submitting your feed to aggregators that match your readers’ interests. There are many popular aggregators where you should try to submit your website.

4.8. mRSS extension

An mRSS feed is the same as an RSS feed, with one difference: it’s made for video content. A nice example is the Vimeo mRSS feed.

Name | Tag | Description
Media content | <media:content> | Parent of all the media tags
Media player | <media:player> | Link pointing to a video player
Media credit | <media:credit> | Requires attribute role and scheme linking to an author bio
Media thumbnail | <media:thumbnail> | A thumbnail with a link to its destination
Media title | <media:title> | A title of a video; can be the same as an item title

Create a structure for sitemaps with a sitemap index

Well-structured sitemaps greatly help you with monitoring indexation.

Search Console provides information about how many URLs in each sitemap are indexed. Have a look at this sitemap dashboard:

Sitemap index in search console

It’s much easier to examine indexation issues for separate sitemaps than just for one large file.

You can connect more sitemaps together with a sitemap index. In the example of the CNN website’s sitemap index, they build three sitemaps differing by content every month and then update them in the sitemap index file.

Implementation of a sitemap index

Here is the example of the sitemap index with two files:

Only the <loc> tag is required, but I recommend adding <lastmod> too. You can find how to format the date and time above. You can add up to 50,000 sitemaps to a sitemap index, and up to 500 sitemap index files to Search Console.
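A sitemap index with two files can be generated as in the sketch below. The function name build_sitemap_index and the file names are placeholders; the <sitemapindex> and <sitemap> elements come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps: list[tuple[str, str]]) -> str:
    """Serialize (sitemap URL, lastmod) pairs as a sitemap index."""
    # Emit the sitemap namespace as the default namespace.
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return body.decode("utf-8")


index_xml = build_sitemap_index([
    ("http://www.example.com/sitemap-pages.xml", "2017-10-20T17:30:00-02:00"),
    ("http://www.example.com/sitemap-products.xml", "2017-10-19T09:00:00-02:00"),
])
print(index_xml)
```

Splitting by content type, as in this example, is what makes the per-sitemap indexation numbers in Search Console useful.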

Validate your sitemaps

Generated sitemaps can be quite complex, and it is easy to make a mistake. After you generate a sitemap, I recommend validating it with the Yandex Sitemap Validator and after that with Google Search Console.

Yandex Sitemap Validator

Yandex’s validator is very easy to use. You can input text, a URL with a sitemap or simply upload a file.

What the validator does:

  • Checks basic XML sitemaps, RSS feeds and sitemap indexes. Unfortunately, it doesn’t help you with sitemap extensions like image sitemaps.
  • Checks XML syntax
  • Shows file sizes
  • Shows the number of links in the sitemap

The Yandex Sitemap Validator serves the results immediately. It is a great tool for initial sitemap testing or for checking the sitemaps of domains which you don’t own.

Google Search Console

Google sitemap validator

Search Console checks almost everything, including all sitemap extensions, URL status codes and much more.

It is the best solution for your sitemap testing. It takes Search Console some time to process the sitemaps, though: from hours to a few days. And you can’t test a text input or sitemaps on external domains.

5. Notify the search engines about a sitemap

In order for search engines to process your sitemaps, they must first learn about them; otherwise, the sitemaps are useless for SEO. There are several ways of doing this:

Methods the search engines use to find and use your sitemap

You can combine more than one of the following methods:

  • Put the sitemap location at robots.txt.
  • Write the sitemap location in the source code.
  • Submit the sitemap via the search engine’s webmaster tools.
  • Register yourself as a publisher to use Google News Sitemap.

Robots.txt

The most reliable way to let search engines know about your sitemap is to list the XML sitemap in the robots.txt file. Every search engine will find it there.

You can define more than one sitemap, and each URL must be absolute.

An XML sitemap is defined in robots.txt with a Sitemap line, for example: Sitemap: http://www.example.com/sitemap.xml
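To see how a crawler reads these lines, here is a sketch that parses a sample robots.txt with Python’s standard urllib.robotparser module (its site_maps() method needs Python 3.8 or newer). The file content and sitemap names are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt that allows everything and lists two sitemaps.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap-images.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
print(parser.site_maps())
```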

The biggest issue here is that you make your sitemaps public, so anyone can easily find them. Think about that carefully, because in the worst-case scenario, they can be abused by your competition.

Source code

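An RSS feed is a little bit different and has to be referenced in the source code.

An RSS feed is defined in the (X)HTML header with a <link> tag, for example: <link rel="alternate" type="application/rss+xml" title="Example feed" href="http://www.example.com/feed/">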
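The sketch below shows such a header and how a program can discover the feed from it, using Python’s standard html.parser module. The class name FeedLinkFinder and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser

# Sample (X)HTML header announcing an RSS feed.
HEAD = """
<head>
  <title>Example blog</title>
  <link rel="alternate" type="application/rss+xml"
        title="Example blog feed" href="http://www.example.com/feed/">
</head>
"""


class FeedLinkFinder(HTMLParser):
    """Collect the href of every RSS <link rel="alternate"> tag."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "link"
                and attributes.get("rel") == "alternate"
                and attributes.get("type") == "application/rss+xml"):
            self.feeds.append(attributes.get("href"))


finder = FeedLinkFinder()
finder.feed(HEAD)
print(finder.feeds)
```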

The search engines can also learn about your sitemap from the simple <a> tag linking to the sitemap destination, but this method is not reliable, and there’s no guarantee that every engine will find it.

Search engine’s webmaster tools

Not all search engines have their own webmaster tools. If an XML sitemap does not need to be confidential, it should also be listed in robots.txt.

Google Search Console

  1. Open the Google Webmaster Tool and click on “Sitemaps” under “Crawl.”
  2. Click on “Add/Test Sitemap” and paste your XML Sitemap file here and click “Test” button before hitting the “Submit” button.
  3. After this step, you can see how many web pages and other files were found in the sitemap; click “Close test” and hit “Submit.”
  4. Now repeat the process and copy all of the sitemaps you have one-by-one in the webmaster tools. If the sitemaps are listed in a sitemap index, you need to submit the index only.

Add sitemap to Google Search Console

You can submit up to 500 sitemaps index files in Search Console.

Bing Webmaster Tools

  1. Open Bing Webmaster Tools.
  2. Now click on Dashboard -> Configure my site -> Sitemap.
  3. Copy and paste the XML sitemap in the Bing sitemap box, and then hit the “Submit” button.

Add sitemap in Bing Webmaster tools

With Bing Webmaster Tools, you can only see the number of submitted pages, but not those that are indexed. Therefore, analysing data is not as easy as in Search Console. The same is true for Yandex.

Yandex Webmaster Tools

  1. Open Yandex Webmaster Tools.
  2. Click on Dashboard -> Indexing -> Sitemaps file.
  3. Copy and paste the XML sitemap in the Yandex sitemap box and hit the “Add” button.

Add sitemap to Yandex Webmaster tools

Register yourself as a publisher to use News Sitemap

Until your site is included in Google News, Google doesn’t use your news sitemap. You can request inclusion in the Google News Publisher Center:

Before your request, read this step-by-step guide from Shane Barker and the official Google guidelines.

Your competitors can abuse a visible sitemap

Location of a sitemap in robots.txt or the source code is public. Anyone can find it. If you have a well-structured and complex sitemap, competitors can easily get abusable data, like the list of categories, tags or all your products.

On the other hand, even if you don’t have visible sitemap and competitors want that data, they will get it. But they’ll have to try harder. To obtain and clean the data, they have to use a web crawler. But this way is much harder than to simply parse sitemap files.

Just be careful. If you get 99% of your traffic from Google, Bing and other search engines which have their own webmaster tools, consider the privacy of your sitemaps. It is not necessary to link to your sitemap in the robots.txt file.  

Ping search engines immediately after every update

When adding a new page to a sitemap, you can alert the search engines through so-called “pinging.” They will learn about the changes immediately, and they can then index the new content.

“Ping” your sitemap in the search engine by entering a customized URL:

  • Google: http://www.google.sk/ping?sitemap=http://example.com/sitemap-path.xml
  • Bing: http://www.bing.com/ping?sitemap=http://example.com/sitemap-path.xml

For any other search engine, try to use this URL template: <searchengine_URL>/ping?sitemap=sitemap_url

Although not every search engine supports it, it is ideal if you can automate “pinging”. Whenever you update a sitemap on the server, send a notification to Google or Bing as well.
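Building the ping URL is the part worth automating carefully, because the sitemap address has to be URL-encoded. The sketch below only constructs the URL and sends nothing; the function name ping_url is hypothetical. Search engines change these endpoints over time (Google, for example, announced in 2023 that it was retiring its ping endpoint), so check each engine’s current documentation before relying on them.

```python
from urllib.parse import quote


def ping_url(engine_base: str, sitemap_url: str) -> str:
    """Build <searchengine_URL>/ping?sitemap=<URL-encoded sitemap URL>."""
    # safe="" also encodes ":" and "/" inside the sitemap URL.
    encoded = quote(sitemap_url, safe="")
    return f"{engine_base.rstrip('/')}/ping?sitemap={encoded}"


print(ping_url("http://www.bing.com", "http://example.com/sitemap-path.xml"))
```

A deployment script could request this URL right after it uploads a regenerated sitemap.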

Get the free sitemap cheatsheet

Get your overview of all important information in one piece:

Sitemap cheatsheet: How to create, validate and notify about sitemaps, What to avoid, Overview of limits, Templates

Summary

Sitemaps help you with faster and better indexation, along with the monitoring of indexed pages. They are most suitable for websites with thousands of pages and for frequently updated content. When you want to implement sitemaps, you have to go through three steps:

You should follow sitemap guidelines, limitations and avoid the static sitemap generators. Then you will benefit from all of the advantages of sitemaps.

Are you proud of your sitemaps structure? Let us know in the comments or via janko@spotibo.com. We would like to link to some well-built sitemaps. 🙂
