Webmaster level: intermediate-advanced

To crawl, or not to crawl, that is the robots.txt question.

Making and maintaining correct robots.txt files can sometimes be difficult. While most sites have it easy (tip: they often don't even need a robots.txt file!), finding the directives within a large robots.txt file that are or were blocking individual URLs can be quite tricky. To make that easier, we're now announcing an updated robots.txt testing tool in Webmaster Tools.

You can find the updated testing tool in Webmaster Tools within the Crawl section.

Here you'll see the current robots.txt file, and you can test new URLs to see whether they're disallowed for crawling. To guide your way through complicated directives, the tool highlights the specific directive that led to the final decision. You can make changes in the file and test those too; you'll just need to upload the new version of the file to your server afterwards for the changes to take effect. Our developers site has more about robots.txt directives and how the files are processed.
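
If you'd like to reproduce this kind of check outside of Webmaster Tools, the short sketch below uses Python's standard urllib.robotparser module against a few made-up directives and URLs. It's purely illustrative: the standard-library parser applies rules in listing order rather than exactly the way Google processes robots.txt, so the tester in Webmaster Tools remains the authoritative view of how Googlebot interprets your file.

    # Minimal sketch: test example URLs against example robots.txt rules
    # with Python's standard library. The rules and URLs are made up.
    from urllib import robotparser

    EXAMPLE_RULES = [
        "User-agent: *",
        "Allow: /archive/public/",
        "Disallow: /archive/",
    ]

    parser = robotparser.RobotFileParser()
    parser.parse(EXAMPLE_RULES)

    for url in ("http://example.com/archive/2010/post.html",
                "http://example.com/archive/public/post.html"):
        allowed = parser.can_fetch("Googlebot", url)
        print(url, "->", "allowed" if allowed else "disallowed")

Running it prints one verdict per URL; swapping in your own directives and URLs gives a quick local sanity check before you upload changes to your server.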

Additionally, you'll be able to review older versions of your robots.txt file, and see when access issues block us from crawling. For example, if Googlebot sees a 500 server error for the robots.txt file, we'll generally pause further crawling of the website.
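
If you want to keep an eye on this yourself, a quick check of the HTTP status code your server returns for robots.txt can help. The sketch below is a minimal example with a placeholder URL; it doesn't replicate how Googlebot handles errors, it simply reports what your server is currently serving.

    # Minimal sketch: report the HTTP status code served for robots.txt.
    # The URL is a placeholder; substitute your own site.
    from urllib import error, request

    ROBOTS_URL = "http://example.com/robots.txt"

    try:
        with request.urlopen(ROBOTS_URL, timeout=10) as response:
            print(ROBOTS_URL, "returned", response.getcode())
    except error.HTTPError as err:
        # A 5xx result here is the kind of error that can pause crawling.
        print(ROBOTS_URL, "returned", err.code)
    except error.URLError as err:
        print("Could not reach", ROBOTS_URL, "-", err.reason)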

Since there may be errors or warnings shown for your existing sites, we recommend double-checking their robots.txt files. You can also combine this with other parts of Webmaster Tools: for example, you might use the updated Fetch as Google tool to render important pages on your website. If any blocked URLs are reported, you can use this robots.txt tester to find the directive that's blocking them, and then improve it. A common problem we've seen comes from old robots.txt files that block CSS, JavaScript, or mobile content; fixing that is often trivial once you've seen it.
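
To make the CSS/JavaScript case concrete, here's a rough sketch of what such a fix can look like, using made-up directives: an old blanket Disallow rule covering an assets directory is replaced by a narrower rule, and the same stylesheet URL goes from disallowed to allowed. As before, the Webmaster Tools tester is the authoritative check for your own file.

    # Minimal sketch: an old blanket rule that blocks page assets, and a
    # narrower replacement. Directives and URLs are made-up examples.
    from urllib import robotparser

    OLD_RULES = ["User-agent: *", "Disallow: /assets/"]
    FIXED_RULES = ["User-agent: *", "Disallow: /private/"]

    CSS_URL = "http://example.com/assets/style.css"

    for label, rules in (("old", OLD_RULES), ("fixed", FIXED_RULES)):
        parser = robotparser.RobotFileParser()
        parser.parse(rules)
        verdict = "allowed" if parser.can_fetch("Googlebot", CSS_URL) else "disallowed"
        print(label, "rules:", CSS_URL, "is", verdict)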

We hope this updated tool makes it easier for you to test and maintain your robots.txt file. Should you have any questions, or need help with crafting a good set of directives, feel free to drop by our webmaster help forum!

Developing modern multi-device websites

Fortunately, making websites that work on all modern devices is not that hard: websites can use HTML5, which is supported, sometimes exclusively, across all modern devices. To help webmasters build websites that work on all types of devices, regardless of the type of content they wish to serve, we recently announced two resources:
  • Web Fundamentals: a curated source for modern best practices.
  • Web Starter Kit: a starter framework supporting the Web Fundamentals best practices out of the box.
By following the best practices described in Web Fundamentals you can build a responsive web design, which has long been Google's recommendation for search-friendly sites. Be sure not to block Googlebot from crawling any of the page assets (CSS, JavaScript, and images), whether with robots.txt or otherwise. Being able to access these external files fully helps our algorithms detect your site's responsive web design configuration and treat it appropriately. You can use the Fetch and render as Google feature in Webmaster Tools to test how our indexing algorithms see your site.
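
As a rough self-check, you can run a handful of asset URLs through a robots.txt parser and flag anything that comes back disallowed for Googlebot. The sketch below uses Python's standard urllib.robotparser with placeholder URLs; it approximates, but does not replicate, Google's processing, so Fetch and render in Webmaster Tools remains the definitive test.

    # Minimal sketch: flag page assets that a robots.txt file disallows for
    # Googlebot. The site and asset URLs are placeholders.
    from urllib import robotparser

    parser = robotparser.RobotFileParser()
    parser.set_url("http://example.com/robots.txt")
    parser.read()  # fetch and parse the live robots.txt

    ASSET_URLS = [
        "http://example.com/css/site.css",
        "http://example.com/js/app.js",
        "http://example.com/images/hero.jpg",
    ]

    blocked = [u for u in ASSET_URLS if not parser.can_fetch("Googlebot", u)]
    if blocked:
        print("Blocked for Googlebot:")
        for u in blocked:
            print(" ", u)
    else:
        print("All listed assets appear crawlable for Googlebot.")
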
As always, if you need more help, you can ask a question in our webmaster forum.


Making sure that deployed rel-alternate-hreflang annotations are usable by search engines can be rather difficult, especially on sites with many pages, and site owners around the world haven't been shy about telling us this. Today we're releasing a feature that should make debugging rel-alternate-hreflang annotations much easier.

The Language Targeting section in the International Targeting feature enables you to identify two of the most common issues with hreflang annotations:
  • Missing return links: annotations must be confirmed from the pages they point to. If page A links to page B, page B must link back to page A; otherwise the annotations may not be interpreted correctly.
    For each error of this kind, we report where and when we detected it, as well as where the return link is expected to be.
[Screenshot: missing return link errors reported in the Language Targeting section]

  • Incorrect hreflang values: the value of the hreflang attribute must be either a language code in ISO 639-1 format such as "es", or a combination of language and country code such as "es-AR", where the country code is in ISO 3166-1 Alpha 2 format.
    If our indexing systems detect language or country codes that are not in these formats, we provide example URLs to help you fix them (a rough sketch of both checks appears below).

[Screenshot: incorrect hreflang language values reported in the Language Targeting section]
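
For readers who'd like to run a rough version of these two checks on their own annotation data, here is a minimal Python sketch. It works on a hand-written map of pages to their hreflang annotations; the URLs and values are invented, the format check only covers the lowercase-language / uppercase-country style shown in the examples above, and it is in no way a reimplementation of the Webmaster Tools feature.

    # Minimal sketch of the two checks described above, run over a made-up
    # map of page URL -> {hreflang value: alternate URL}.
    import re

    # ISO 639-1 language code, optionally followed by an ISO 3166-1 Alpha 2
    # country code, e.g. "es" or "es-AR".
    HREFLANG_RE = re.compile(r"^[a-z]{2}(-[A-Z]{2})?$")

    annotations = {
        "http://example.com/en/": {"es": "http://example.com/es/"},
        "http://example.com/es/": {},  # missing the return link to /en/
        "http://example.com/de/": {"deu-DE": "http://example.com/de/"},  # bad value
    }

    for page, alternates in annotations.items():
        for value, target in alternates.items():
            if not HREFLANG_RE.match(value):
                print("Incorrect hreflang value", repr(value), "on", page)
            if page not in annotations.get(target, {}).values():
                print("Missing return link:", target, "does not link back to", page)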

Additionally, we've moved the geographic targeting setting to this part of Webmaster Tools, so that you can find all information relevant to international and multilingual targeting in the same place.

We hope you'll find this new feature useful and that it helps you identify issues with the rel-alternate-hreflang implementation on your site. If you have comments or questions about the feature, please post in our Webmaster Help Forum.

Posted by Gary Illyes, Webmaster Trends Analyst