Thursday, December 22, 2011

Download search queries data using Python

Webmaster level: Advanced

For all the developers who have expressed interest in getting programmatic access to the search queries data for their sites in Webmaster Tools, we've got some good news. You can now get access to your search queries data in CSV format using an open source Python script from the webmaster-tools-downloads project. Search queries data is not currently available via the Webmaster Tools API; this has been a common request from API users, and we're considering it for the next API update. For those of you who need access to search queries data right now, let's look at an example of how the search queries downloader Python script can be used to download your search queries data and upload it to a Google Spreadsheet in Google Docs.

Example usage of the search queries downloader Python script
1) If Python is not already installed on your machine, download and install Python.
2) Download and install the Google Data APIs Python Client Library.
3) Create a folder and add the downloader.py script to the newly created folder.
4) Copy the example-create-spreadsheet.py script to the same folder as downloader.py and edit it to replace the example values for “website,” “email” and “password” with valid values for your Webmaster Tools verified site.
5) Open a Terminal window and run the example-create-spreadsheet.py script from the command line:
python example-create-spreadsheet.py
6) Visit Google Docs to see a new spreadsheet containing your search queries data.


If you just want to download your search queries data in a .csv file without uploading the data to a Google spreadsheet, use example-simple-download.py instead of example-create-spreadsheet.py in the example above.

You could easily schedule these scripts to run daily or monthly (for example, with a cron job or Windows Task Scheduler) to archive and view your search queries data across date ranges larger than the one month of data currently available in Webmaster Tools.
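As a starting point, here is a minimal archiving sketch (not part of the webmaster-tools-downloads project) that appends each day's exported rows to a running archive file. The file names are hypothetical; adjust them to match the CSV the downloader actually writes for your site.

import csv
import datetime
import os

TODAYS_EXPORT = "TOP_QUERIES.csv"       # assumption: output written by example-simple-download.py
ARCHIVE = "search_queries_archive.csv"  # running archive across months

def append_to_archive(export_path=TODAYS_EXPORT, archive_path=ARCHIVE):
    """Append today's exported rows to the archive, tagging each row with the date."""
    today = datetime.date.today().isoformat()
    write_header = not os.path.exists(archive_path)
    with open(export_path, newline="") as src, open(archive_path, "a", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        if write_header:
            writer.writerow(["date"] + header)
        for row in reader:
            writer.writerow([today] + row)

if __name__ == "__main__":
    append_to_archive()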

An important point to note is that this example script includes username and password credentials within the script itself. If you plan to run it in a production environment, you should follow security best practices, such as using encrypted user credentials retrieved from secure data storage. The script itself uses HTTPS to communicate with the API to protect these credentials in transit.

Take a look at the search queries downloader script and start using search queries data in your own scripts or tools. Let us know if you have questions or feedback in the Webmaster Help Forum.

Tuesday, December 20, 2011

Website user research and testing on the cheap

Webmaster level: Intermediate

The Webmaster Team is responsible for tens of thousands of Google’s informational web pages, and we’re here to offer tips and advice based on our experience as hands-on webmasters.

If you’ve never tested or analyzed usage of your website, ask yourself if you really know whether your site is useful for your target audience. If you’re unsure, why not find out? For example, did you know that on average users scroll down 5.9 times as often as they scroll up, meaning that content, once scrolled past, is often “lost”? (See Jakob Nielsen’s findings on scrolling, where he advises that users don’t mind scrolling, but within limits.)

Also, check your analytics—are you curious about high bounce rates from any of your pages, or very short time-on-page metrics?

First, think about your user


The start of a web project—whether it’s completely new or a revamp of an existing site—is a great time to ask questions like:

  • How might users access your site—home, office, on-the-go?
  • How tech-savvy are your visitors?
  • How familiar are users with the subject matter of your website?

The answers to some of these questions can be valuable when making initial design decisions.

For instance, if the user is likely to be on the road, they might be short on time to find the information they need from your site, or be in a distracting environment with a slow data connection—so a simple, single-purpose layout would work best. Additionally, if you’re providing content for a less technical audience, make sure the content isn’t hard to reach: animation might provide a “wow” factor, but only if your users appreciate it and it doesn’t get in the way of the content.

Even without testing, building a basic user profile (or “persona”) can help shape your designs for the benefit of the user—this doesn’t have to be an exhaustive biography, but just some basic considerations of your user’s behavior patterns.

Simple testing


Testing doesn’t have to be a costly operation – friends and family can be a great resource. Some pointers:

  • Sample size: Just five people can be a large enough number of users to find common problems in your layouts and navigation (see Jakob Nielsen’s article on why using a small sample size is sufficient).
  • Choosing your testers: A range of different technical ability can be useful, but be sure to only focus on trends—for example, if more than 50% of your testers have the same usability issue, it’s likely a real problem—rather than individual issues encountered.
  • Testing location: If possible, visit users in their homes and watch how they use the site—observe how they normally navigate the web when relaxed and in their natural environment. Remote testing is also a possibility if you can’t make it in person—we’ve heard that Google+ hangouts can be used effectively for this (find out more about using Google+ hangouts).
  • How to test: Based on your site’s goals, define 4 or 5 simple tasks to do on your website, and let the user try to complete the tasks. Ask your testers to speak aloud so you can better understand their experiences and thought processes.
  • What to test: Basic prototypes in clickable image or document format (for example, PDF) or HTML can be used to test the basic interactions, without having to build out a full site for testing. This way, you can test out different options for navigation and layouts to see how they perform before implementing them.
  • What not to test: Focus on functionality rather than graphic design elements; viewpoints are often subjective. You would only get useful feedback on design from quantitative testing with large (200+) numbers of users (unless, for example, the colors you use on your site make the content unreadable, which would be good feedback!). One format for getting some useful feedback on the design can be to offer 5-6 descriptive keywords and ask your user to choose the most representative ones.
Overall, basic testing is most useful for seeing how your website’s functionality is working—the ease of finding information and common site interactions.

Lessons learned


In case you’re still wondering whether research and testing are really worth it, here are a few simple things we confirmed that we wouldn’t have known if we hadn’t sat with actual users and watched them use our pages, or analyzed our web traffic.

  • Take care when using layouts that hide/show content: We found that when we used scripts to expand and collapse long text passages, users often didn’t realize the extra content was available at all—and collapsed, JavaScript-rendered content is also effectively “hidden” when a user searches within the page (for example, using Control + F, which we’ve seen used often).


    Wireframe of layout tested, showing “zipped” content on the bottom left

    Final page design showing anchor links at the top and content laid out in the main body of the page


  • Check your language: Headings, link text and button text are what catch the user’s eye most when scanning the page. Avoid using “Learn more…” in link text—users seem averse to clicking on a link which implies they will need to learn something. Instead, just use a literal description of the content the user will find behind the link—and make sure link text makes sense and is easy to understand out of context, because that is often how it will be scanned. Be mindful about language and try to make button text descriptive, inviting and interesting.
  • Test pages on a slower connection: Try out your pages using different networks (for example, try browsing your website using the wifi at your local coffee shop or a friend’s house), especially if your target users are likely to be viewing your pages from a home connection that’s not as fast as your office network. We found a considerable improvement in CTR and time-on-site metrics in some cases when we made scripted animations much simpler and faster (hint: use Google’s Page Speed Online to check performance if you don’t have access to a slower Internet connection).
So if you’re caught up in a seemingly never-ending redevelopment cycle, save yourself some time in the future by investing a little up front through user profiling and basic testing, so that you’re more likely to choose the right approach for your site layout and architecture.

We’d love to hear from you in the comments: have you tried out website usability testing? If so, how did you get on, and what are your favorite simple and low-cost tricks to get the most out of it?

Friday, December 16, 2011

Rich Snippets Instructional Videos

Webmaster level: All

When users come to Google, they have a pretty good idea of what they’re looking for, but they need help deciding which result might have the information that best suits their needs. So, the challenge for Google is to make it very clear to our users what content exists on a page in both a useful and concise manner. That’s why we have rich snippets.


Essentially, rich snippets provide you with the ability to help Google highlight aspects of your page. Whether your site contains information about products, recipes, events or apps, a few simple additions to your markup can result in more engagement with your content -- and potentially more traffic to your site.

To help you get started or fine tune your rich snippets, we’ve put together a series of tutorial videos for webmasters of all experience levels. These videos provide guidance as you mark up your site so that Google is better able to understand your content. We can use that content to power the rich snippets we display for your pages. Check out the videos below to get started:



For more information on how to use rich snippets markup for your site, visit our Help Center.

Thursday, December 15, 2011

Introducing smartphone Googlebot-Mobile

Webmaster level: All

With the number of smartphone users rapidly rising, we’re seeing more and more websites providing content specifically designed to be browsed on smartphones. Today we are happy to announce that Googlebot-Mobile now crawls with a smartphone user-agent in addition to its previous feature phone user-agents. This is to increase our coverage of smartphone content and to provide a better search experience for smartphone users.

Here are the main user-agent strings that Googlebot-Mobile now uses:

  • Feature phones Googlebot-Mobile:

    • SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
    • DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  • Smartphone Googlebot-Mobile:

    • Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

The content crawled by smartphone Googlebot-Mobile will be used primarily to improve the user experience on mobile search. For example, the new crawler may discover content specifically optimized to be browsed on smartphones as well as smartphone-specific redirects.

One new feature we’re also launching that uses these signals is Skip Redirect for Smartphone-Optimized Pages. When we discover a URL in our search results that redirects smartphone users to another URL serving smartphone-optimized content, we change the link target shown in the search results to point directly to the final destination URL. This removes the extra latency the redirect introduces, saving 0.5-1 seconds on average when visiting the landing page for such search results.

Since all Googlebot-Mobile user-agents identify themselves as a specific kind of mobile device, please treat each Googlebot-Mobile request as you would a request from a human user with the same phone user-agent. These and other guidelines are described in our previous blog post and they still apply, except for those referring to smartphones, which we are updating today. If your site has treated Googlebot-Mobile specially based on the fact that it previously crawled only with feature phone user-agents, we strongly recommend reviewing that policy and serving the appropriate content based on the requesting Googlebot-Mobile user-agent, so that both your feature phone and smartphone content will be indexed properly.
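As a rough illustration, here is a minimal sketch of serving logic keyed off the phone portion of the user-agent rather than the Googlebot-Mobile token itself. The substrings below are assumptions based on the strings listed above; in practice you would reuse whatever detection you already apply to human visitors.

import re

def classify_googlebot_mobile(user_agent):
    """Return 'smartphone', 'feature phone', or None for a user-agent string.

    Minimal sketch: the goal is to serve each Googlebot-Mobile request
    exactly as you would serve a human visitor with the same phone UA.
    """
    if "Googlebot-Mobile" not in user_agent:
        return None
    # The smartphone crawler announces itself with a smartphone UA (the
    # iPhone string above); "Android" is included here as an assumption.
    if re.search(r"iPhone|Android", user_agent):
        return "smartphone"
    # Feature phone crawlers use feature phone UAs (SAMSUNG, DoCoMo, ...).
    return "feature phone"

# Example: the smartphone Googlebot-Mobile string listed above.
ua = ("Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) "
      "AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 "
      "Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; "
      "+http://www.google.com/bot.html)")
print(classify_googlebot_mobile(ua))  # -> smartphone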

If you have more questions, please ask on our Webmaster Help forums.

Wednesday, December 14, 2011

Clicks and impressions for authors

Webmaster Level: All
(Cross-posted on the Inside Search Blog)

With the latest improvements to the way authorship annotations look in search and the addition of authorship to Google News, authors have been really excited about getting more visibility, and users benefit from seeing the name, photo, and a way to connect with the person who created the content.

Authors have also been giving us a lot of feedback on what else they'd like to see, so today we're introducing “Author Stats” in Webmaster Tools, which shows you how often your content appears on the Google search results page. If you associate your content with your Google Profile, either via e-mail verification or a simple link, you can visit Webmaster Tools to see how many impressions and clicks your content got on the Google search results page. Check out what Matt Cutts would see for his content:

To see your information, go to google.com/webmasters and log in with the same username you use for your Google+ Profile. On the left-hand panel, you can see “Author Stats” under the “Labs” section. This is an experimental feature, so we’re continuing to iterate and improve, but we wanted to get early feedback from you. You can e-mail us at authorship-pilot@google.com if you run into any issues or have feedback.

If you’re a content creator interested in learning more about authorship, check out our Help Center.

Tuesday, December 6, 2011

Tips for hosting providers and webmasters

Webmaster level: All

Some webmasters on our forums ask about hosting-related issues affecting their sites. To help both hosting providers and webmasters recognize, diagnose, and fix such problems, we’d like to share with you some of the common problems we’ve seen and suggest how you can fix them.

  • Blocking of Googlebot crawling. This is a very common issue, usually due to a misconfigured firewall or DoS protection system, and sometimes due to the content management system the site runs. Protection systems are an important part of good hosting and are often configured to block unusually high levels of server requests, sometimes automatically. However, because Googlebot often makes more requests than a human user, these protection systems may decide to block Googlebot and prevent it from crawling your website. To check for this kind of problem, use the Fetch as Googlebot function in Webmaster Tools, and check for other crawl errors shown in Webmaster Tools.

    We offer several tools to webmasters and hosting providers who want more control over Googlebot’s crawling, and to improve crawling efficiency:

    We have more information in our crawling and indexing FAQ.

  • Availability issues. A related type of problem we see is websites being unavailable when Googlebot (and users) attempt to access the site. This includes DNS issues, overloaded servers leading to timeouts and refused connections, misconfigured content distribution networks (CDNs), and many other kinds of errors. When Googlebot encounters such issues, we report them in Webmaster Tools as either URL unreachable errors or crawl errors.

  • Invalid SSL certificates. For SSL certificates to be valid for your website, they need to match the name of the site. Common problems include expired SSL certificates and servers misconfigured such that all websites on that server use the same certificate. Most web browsers will try to warn users in these situations, and Google tries to alert webmasters of this issue by sending a message via Webmaster Tools. The fix for these problems is to make sure you use SSL certificates that are valid for all of the domains and subdomains your users will interact with.

  • Wildcard DNS. Websites can be configured to respond to all subdomain requests. For example, the website at example.com can be configured to respond to requests to foo.example.com, made-up-name.example.com and all other subdomains.

    In some cases this is desirable; for example, a user-generated content website may choose to give each account its own subdomain. However, in other cases the webmaster may not wish to have this behavior, as it may cause content to be duplicated unnecessarily across different hostnames and may also affect Googlebot’s crawling.

    To minimize problems in wildcard DNS setups, either configure your website not to use them, or configure your server not to respond successfully to non-existent hostnames, either by refusing the connection or by returning an HTTP 404 (Not Found) status code.

  • Misconfigured virtual hosting. The symptom of this problem is that multiple hosts and/or domain names hosted on the same server always return the contents of only one site. In other words, although the server hosts multiple sites, it returns content from only one of them regardless of which hostname is requested. To diagnose the issue, check that the server responds correctly to the Host HTTP header (see the diagnostic sketch at the end of this post).

  • Content duplication through hosting-specific URLs. Many hosts helpfully offer URLs for your website for testing/development purposes. For example, if you’re hosting the website http://a.com/ on the hosting provider example.com, the host may offer access to your site through a URL like http://a.example.com/ or http://example.com/~a/. Our recommendation is to make these hosting-specific URLs not publicly accessible (by password-protecting them). Even if these URLs are accessible, our algorithms usually pick the URL webmasters intend; if our algorithms select the hosting-specific URLs, you can influence them to pick your preferred URLs by implementing canonicalization techniques correctly.

  • Soft error pages. Some hosting providers show error pages using an HTTP 200 status code (meaning “Success”) instead of an HTTP error status code. For example, a “Page not found” error page could return HTTP 200 instead of 404, making it a soft 404 page; or a “Website temporarily unavailable” message might return a 200 instead of correctly returning a 503 HTTP status code. We try hard to detect soft error pages, but when our algorithms fail to detect a web host’s soft error pages, these pages may get indexed with the error content. This may cause ranking or cross-domain URL selection issues.

    It’s easy to check the status code returned: simply check the HTTP headers the server returns using any one of a number of tools, such as Fetch as Googlebot (or a quick script like the one sketched at the end of this post). If an error page is returning HTTP 200, change the configuration to return the correct HTTP error status code. Also, keep an eye out for soft 404 reports in Webmaster Tools, on the Crawl errors page in the Diagnostics section.

  • Content modification and frames. Webmasters may be surprised to see their page contents modified by hosting providers, typically by injecting scripts or images into the page. Web hosts may also serve your content by embedding it in other pages using frames or iframes. To check whether a web host is changing your content in unexpected ways, simply check the source code of the page as served by the host and compare it to the code you uploaded.

    Note that some server-side code modifications may be very useful. For example, a server using Google’s mod_pagespeed Apache module or other tools may be returning your code minified for page speed optimization.

  • Spam and malware. We’ve seen some web hosts and bulk subdomain services become major sources of malware and spam. We try hard to be granular in our actions when protecting our users and search quality, but if we see a very large fraction of sites on a specific web host that are spammy or are distributing malware, we may be forced to take action on the web host as a whole. To help you keep on top of malware, we offer:

We hope this list helps both hosting providers and webmasters diagnose and fix these issues. Beyond this list, also think about the qualitative aspects of hosting like quality of service and the helpfulness of support. As always, if you have questions or need more help, please ask in our Webmaster Help Forum.
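If you want to spot-check two of the issues above yourself (soft error pages and misconfigured virtual hosting), here is a minimal diagnostic sketch. It is not an official Google tool, and the hostnames and IP address in it are placeholders; substitute your own.

import http.client
import uuid

def check_soft_404(host):
    """Request a page that should not exist and report the status code."""
    missing_path = "/" + uuid.uuid4().hex  # a path that almost certainly does not exist
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("GET", missing_path)
    status = conn.getresponse().status
    conn.close()
    if status == 200:
        print("%s may be serving soft 404s (got 200 for %s)" % (host, missing_path))
    else:
        print("%s returned %d for a missing page" % (host, status))

def check_virtual_hosting(server_ip, hostnames):
    """Fetch / from the same server with different Host headers and compare the bodies."""
    bodies = {}
    for name in hostnames:
        conn = http.client.HTTPConnection(server_ip, timeout=10)
        conn.request("GET", "/", headers={"Host": name})
        bodies[name] = conn.getresponse().read()
        conn.close()
    if len(hostnames) > 1 and len(set(bodies.values())) == 1:
        print("Warning: all hostnames returned identical content; virtual hosting may be misconfigured")
    else:
        print("Hostnames returned distinct content")

if __name__ == "__main__":
    check_soft_404("www.example.com")                          # placeholder hostname
    check_virtual_hosting("203.0.113.10", ["a.com", "b.com"])  # placeholder IP and hostnames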

Monday, December 5, 2011

New markup for multilingual content

Many websites serve users from around the world. There are different approaches to serving content appropriate to your users' language and/or region. Last year, we launched support for explicit annotations for web pages rendering the same content with different language templates.
Today we're going further with our support for multilingual content with improved handling for these two scenarios:
  • Multiregional websites using substantially the same content. Example: English webpages for Australia, Canada and USA, differing only in price
  • Multiregional websites using fully translated content, or substantially different monolingual content targeting different regions. Example: a product webpage in German, English and French

Specifying language and location

We've expanded our support of the rel="alternate" hreflang link element to handle content that is translated or provided for multiple geographic regions. The hreflang attribute can specify the language, optionally the country, and the URLs of equivalent content. By specifying these alternate URLs, our goal is to be able to consolidate signals for these pages, and to serve the appropriate URL to users in search. Alternate URLs can be on the same site or on another domain.

Annotating pages as substantially similar content

Optionally, for pages that have substantially the same content in the same language and are targeted at multiple countries, you may use the rel="canonical" link element to specify your preferred version. We’ll use that signal to focus on that version in search, while showing the local URLs to users where appropriate. For example, you could use this if you have the same product page in German, but want to target it separately to users searching on the Google properties for Germany, Austria, and Switzerland.
Update: to simplify implementation, we no longer recommend using rel=canonical.

Example usage

To explain how it works, let’s look at some example URLs:
  • http://www.example.com/ - contains the general homepage of a website, in Spanish
  • http://es-es.example.com/ - is the version for users in Spain, in Spanish
  • http://es-mx.example.com/ - is the version for users in Mexico, in Spanish
  • http://en.example.com/ - is the generic English language version
On all of these pages, we could use the following markup to specify language and optionally the region:

<link rel="alternate" hreflang="es" href="http://www.example.com/" />
<link rel="alternate" hreflang="es-ES" href="http://es-es.example.com/" />
<link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/" />
<link rel="alternate" hreflang="en" href="http://en.example.com/" />

If you specify a regional subtag, we’ll assume that you want to target that region.
Keep in mind that all of these annotations are to be used on a per-URL basis. You should take care to use the specific URL, not the homepage, for both of these link elements.
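Because every URL in the group carries the same complete set of annotations, it can help to generate the block from a single mapping. Here is a small sketch using the example URLs above; it simply prints the markup to paste into each page in the group.

ALTERNATES = {
    "es":    "http://www.example.com/",
    "es-ES": "http://es-es.example.com/",
    "es-MX": "http://es-mx.example.com/",
    "en":    "http://en.example.com/",
}

def hreflang_links(alternates):
    """Return the link elements that each page in the group should include."""
    return "\n".join(
        '<link rel="alternate" hreflang="%s" href="%s" />' % (lang, url)
        for lang, url in alternates.items()
    )

print(hreflang_links(ALTERNATES))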

More help

As always, if you need more help correctly implementing multiregional and multilingual websites, please see our Help Center article about this topic, or ask in our Webmaster Help Forum.

Monday, November 7, 2011

Grow your audience with Google+

Webmaster Level: All

At Google, we help grow your audience by connecting you with new users. We introduced the +1 button so your site would stand out on search and your users could easily share your content on Google+. But, sometimes you want to join the conversation and post content directly to where people are sharing.

Today we’re introducing Google+ for Business, a collection of tools and products that help you grow your audience. At the core of this is Google+ Pages, your site’s identity on Google+.

Google+ Pages: Have real conversations with the right people

To get your site on Google+, you first need to create a Google+ Page. On your page, you can engage in conversations with your visitors, direct readers back to your site for the latest updates, send tailored messages to specific groups of people, and see how many +1’s you have across the web. Google+ Pages will help you build relationships with your users, encouraging them to spend more time engaging with your content.

Google+ Pages are at the heart of Google+ for Business

Hangouts
Sometimes you might want to chat with your users face-to-face.  For example, if you run a food blog, you may want to invite a chef to talk about her favorite recipe, or if you manage a fashion review site, beauty specialists might want to hold how-to sessions with makeup tips. Hangouts make this easy, by letting you have high-quality video chats with nine people with a single click. You can use Hangouts to hold live forums, break news or simply get to know people better, all in real time.

Hangouts let you meet your customers, face-to-face

Circles
Circles allow you to group followers of your Page into smaller audiences. You can then share specific messages with specific groups. For example, you could create a Circle containing your most loyal readers and offer them exclusive content.
The Google+ badge: Grow your audience on Google+

To help your users find your page and start sharing, there are two buttons you can add to your site by visiting our Google+ badge configuration tool:

The Google+ icon, a small icon that directly links to your Page.

The Google+ badge, which we’re introducing in the coming days. This badge lets people add your page to their circles without leaving your site, and allows them to get updates from your site via Google+.

 

Extend the power of +1, stand out in Google search
You can also link your site to your Google+ page so that all your +1s -- from your Page, your website, and search results -- will get tallied together and appear as a single total. Potential visitors will be more likely to see the recommendations your site has received, whether they’re looking at a search result, your website, or your Page, meaning your +1s will reach not only the 40 million users of Google+, but all the people who come to Google every day. You can link your site to your Page either using the Google+ badge or with a piece of code. To set this up, visit our Google+ badge configuration tool.

Bringing Google+ to the rest of Google

Our ultimate vision for Google+ is to transform the overall Google experience -- weaving identity and sharing into all of our products. Beginning today, we’re rolling out a new experimental feature to a small group of eligible publishers, Google+ Direct Connect -- an easy way for your audience to find your Google+ Page on Google search. If you’ve linked your Page to your site and you qualify, when someone searches for your website’s name with the ‘+’ sign before it, Direct Connect will send them directly to your Page. For example, try searching for ‘+YouTube’ on Google. Users will also be prompted to automatically add Pages they find through Direct Connect to their circles.

Direct Connect suggestions start populating as you type on Google.com

Just the beginning

We want to help you get your site on Google+ as soon as possible, so we’re opening the field trial for Google+ Pages to everyone today. Creating a Google+ Page only takes a few minutes. To get started, you’ll need a personal Google+ profile. If you don’t have a Google account, it’s very quick and easy to join. And if you’re looking for inspiration, check out some of the sites that are already starting to set up their Pages:

Partner logos: Burberry, H&M, Macy's, Pepsi, ABC News, Amazon, Assassin's Creed, AT&T, Breaking News, Orange, DC Comics, Dell, NBC News, Gol Linhas Aéreas, Kia, L'Oréal, Marvel, NYTimes, Piaget, Shady, T-Mobile, Toyota, Uniqlo, Virgin

To learn more about how Google+ works for your site, check out the Google+ Your Business site. We’re just getting started, and have many more features planned for the coming weeks and months. To keep up to date on the latest news and tips, add the Google+ Your Business page to your circles. If you have ideas on how we can improve Google+ for your site, we’d love to hear them.

Tuesday, November 1, 2011

GET, POST, and safely surfacing more of the web

Webmaster Level: Intermediate to Advanced

As the web evolves, Google’s crawling and indexing capabilities also need to progress. We improved our indexing of Flash, built a more robust infrastructure called Caffeine, and we even started crawling forms where it makes sense. Now, especially with the growing popularity of JavaScript and, with it, AJAX, we’re finding more web pages requiring POST requests -- either for the entire content of the page or because the pages are missing information and/or look completely broken without the resources returned from POST. For Google Search this is less than ideal, because when we’re not properly discovering and indexing content, searchers may not have access to the most comprehensive and relevant results.

We generally advise using GET for fetching resources a page needs, and this is by far our preferred method of crawling. We’ve started experiments to rewrite POST requests to GET, and while this remains a valid strategy in some cases, often the contents returned by a web server for GET vs. POST are completely different. Additionally, there are legitimate reasons to use POST (e.g., you can attach more data to a POST request than to a GET). So, while GET requests remain far more common, to surface more content on the web, Googlebot may now perform POST requests when we believe it’s safe and appropriate.

We take precautions to avoid performing any task on a site that could result in executing an unintended user action. Our POSTs are primarily for crawling resources that a page requests automatically, mimicking what a typical user would see when they open the URL in their browser. This will evolve over time as we find better heuristics, but that’s our current approach.

Let’s run through a few POST request scenarios that demonstrate how we’re improving our crawling and indexing to evolve with the web.

Examples of Googlebot’s POST requests
  • Crawling a page via a POST redirect
    <html>
      <body onload="document.foo.submit();">
        <form name="foo" action="request.php" method="post">       <input type="hidden" name="bar" value="234"/>
        </form>
      </body>
    </html>
  • Crawling a resource via a POST XMLHttpRequest
    In this step-by-step example, we improve both the indexing of a page and its Instant Preview by following the automatic XMLHttpRequest generated as the page renders.

    1. Google crawls the URL, yummy-sundae.html.
    2. Google begins indexing yummy-sundae.html and, as a part of this process, decides to attempt to render the page to better understand its content and/or generate the Instant Preview.
    3. During the render, yummy-sundae.html automatically sends an XMLHttpRequest for a resource, hot-fudge-info.html, using the POST method.
      <html>
        <head>
          <title>Yummy Sundae</title>
          <script src="jquery.js"></script>
        </head>
        <body>
          This page is about a yummy sundae.
          <div id="content"></div>
          <script type="text/javascript">
            $(document).ready(function() {
              $.post('hot-fudge-info.html', function(data) {
                $('#content').html(data);
              });
            });
          </script>
        </body>
      </html>
    4. The URL requested through POST, hot-fudge-info.html, along with its data payload, is added to Googlebot’s crawl queue.
    5. Googlebot performs a POST request to crawl hot-fudge-info.html.
    6. Google now has an accurate representation of yummy-sundae.html for Instant Previews. In certain cases, we may also incorporate the contents of hot-fudge-info.html into yummy-sundae.html.
    7. Google completes the indexing of yummy-sundae.html.
    8. User searches for [hot fudge sundae].
    9. Google’s algorithms can now better determine how yummy-sundae.html is relevant for this query, and we can properly display a snapshot of the page for Instant Previews.
Improving your site’s crawlability and indexability

General advice for creating crawlable sites is found in our Help Center. For webmasters who want to help Google crawl and index their content and/or generate the Instant Preview, here are a few simple reminders:
  • Prefer GET for fetching resources, unless there’s a specific reason to use POST.
  • Verify that we're allowed to crawl the resources needed to render your page. In the example above, if hot-fudge-info.html is disallowed by robots.txt, Googlebot won't fetch it. More subtly, if the JavaScript code that issues the XMLHttpRequest is located in an external .js file disallowed by robots.txt, we won't see the connection between yummy-sundae.html and hot-fudge-info.html, so even if the latter is not disallowed itself, that may not help us much. We've seen even more complicated chains of dependencies in the wild. To help Google better understand your site it's almost always better to allow Googlebot to crawl all resources.

    You can test whether resources are blocked through Webmaster Tools “Labs -> Instant Previews,” or with a quick robots.txt check like the one sketched after this list.
  • Make sure to return the same content to Googlebot as is returned to users’ web browsers. Cloaking (sending different content to Googlebot than to users) is a violation of our Webmaster Guidelines because, among other things, it may cause us to provide a searcher with an irrelevant result -- the content the user views in their browser may be a complete mismatch from what we crawled and indexed. We’ve seen numerous POST-request examples where a webmaster non-maliciously cloaked (which is still a violation), and their cloaking -- on even the smallest of changes -- then caused JavaScript errors that prevented accurate indexing and completely defeated their reason for cloaking in the first place. Summarizing, if you want your site to be search-friendly, cloaking is an all-around sticky situation that’s best to avoid.

    To verify that you're not accidentally cloaking, you can use Instant Previews within Webmaster Tools, or try setting the User-Agent string in your browser to something like:

    Mozilla/5.0 (compatible; Googlebot/2.1;
      +http://www.google.com/bot.html)

    Your site shouldn't look any different after such a change. If you see a blank page, a JavaScript error, or if parts of the page are missing or different, that means that something's wrong.
  • Remember to include important content (i.e., the content you’d like indexed) as text, visible directly on the page and without requiring user-action to display. Most search engines are text-based and generally work best with text-based content. We’re always improving our ability to crawl and index content published in a variety of ways, but it remains a good practice to use text for important information.
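As a complement to the Instant Previews check mentioned above, here is a minimal sketch that tests whether a robots.txt file blocks Googlebot from the resources a page depends on. It uses the hypothetical URLs from the sundae example; note that Python’s robotparser does not implement every pattern rule Googlebot supports, so treat the result as a rough check.

from urllib import robotparser

SITE = "http://www.example.com"   # assumption: your site's root
RESOURCES = [
    "/yummy-sundae.html",
    "/jquery.js",                 # the script that issues the XMLHttpRequest
    "/hot-fudge-info.html",       # the resource requested via POST
]

rp = robotparser.RobotFileParser(SITE + "/robots.txt")
rp.read()  # fetch and parse robots.txt

for path in RESOURCES:
    allowed = rp.can_fetch("Googlebot", SITE + path)
    print("%-25s %s" % (path, "allowed" if allowed else "BLOCKED"))
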
Controlling your content

If you’d like to prevent content from being crawled or indexed for Google Web Search, traditional robots.txt directives remain the best method. To prevent the Instant Preview for your page(s), please see our Instant Previews FAQ which describes the “Google Web Preview” User-Agent and the nosnippet meta tag.

Moving forward

We’ll continue striving to increase the comprehensiveness of our index so searchers can find more relevant information. And we expect our crawling and indexing capability to improve and evolve over time, just like the web itself. Please let us know if you have questions or concerns.

Monday, October 31, 2011

Raising awareness of cross-domain URL selections

Webmaster level: Advanced

A piece of content can often be reached via several URLs, not all of which may be on the same domain. A common example we’ve talked about over the years is having the same content available on more than one URL, an issue known as duplicate content. When we discover a group of pages with duplicate content, Google uses algorithms to select one representative URL for that content. A group of pages may contain URLs from the same site or from different sites. When the representative URL is selected from a group with different sites the selection is called a cross-domain URL selection. To take a simple example, if the group of URLs contains one URL from a.com and one URL from b.com and our algorithms select the URL from b.com, the a.com URL may no longer be shown in our search results and may see a drop in search-referred traffic.

Webmasters can greatly influence our algorithms’ selections using one of the currently supported mechanisms to indicate the preferred URL, for example using rel="canonical" elements or 301 redirects. In most cases, the decisions our algorithms make correctly reflect the webmaster’s intent. In the rare cases where they don’t, however, we’ve found that webmasters are often confused about why the selection happened and what they can do if they believe it is incorrect.

To be transparent about cross-domain URL selection decisions, we’re launching new Webmaster Tools messages that will attempt to notify webmasters when our algorithms select an external URL instead of one from their website. The details about how these messages work are in our Help Center article about the topic, and in this blog post we’ll discuss the different scenarios in which you may see a cross-domain URL selection and what you can do to fix any selections you believe are incorrect.

Common causes of cross-domain URL selection

There are many scenarios that can lead our algorithms to select URLs across domains.

In most cases, our algorithms select a URL based on signals that the webmaster implemented to influence the decision. For example, a webmaster following our guidelines and best practices for moving websites is effectively signalling that the URLs on their new website are the ones they prefer for Google to select. If you’re moving your website and see these new messages in Webmaster Tools, you can take that as confirmation that our algorithms have noticed.

However, we regularly see webmasters ask questions when our algorithms select a URL they did not want selected. When your website is involved in a cross-domain selection, and you believe the selection is incorrect (i.e. not your intention), there are several strategies to improve the situation. Here are some of the common causes of unexpected cross-domain URL selections that we’ve seen, and how to fix them:

  1. Duplicate content, including multi-regional websites: We regularly see webmasters use substantially the same content in the same language on multiple domains, sometimes inadvertently and sometimes to geotarget the content. For example, it’s common to see a webmaster set up the same English language website on both example.com and example.net, or a German language website hosted on a.de, a.at, and a.ch.

    Depending on your website and your users, you can use one of the currently-supported canonicalization techniques to signal to our algorithms which URLs you wish selected. Please see the following articles about this topic:

  2. Configuration mistakes: Certain types of misconfigurations can lead our algorithms to make an incorrect decision. Examples of misconfiguration scenarios include:
    1. Incorrect canonicalization: Incorrect usage of canonicalization techniques pointing to URLs on an external website can lead our algorithms to select the external URLs to show in our search results. We’ve seen this happen with misconfigured content management systems (CMS) or CMS plugins installed by the webmaster.

      To fix this kind of situation, find how your website is incorrectly indicating the canonical URL preference (e.g. through incorrect usage of a rel="canonical" element or a 301 redirect) and fix that.

    2. Misconfigured servers: Sometimes we see hosting misconfigurations where content from site a.com is returned for URLs on b.com. A similar case occurs when two unrelated web servers return identical soft 404 pages that we may fail to detect as error pages. In both situations we may assume the same content is being returned from two different sites and our algorithms may incorrectly select the a.com URL as the canonical of the b.com URL.

      You will need to investigate which part of your website’s serving infrastructure is misconfigured. For example, your server may be returning HTTP 200 (success) status codes for error pages, or your server might be confusing requests across different domains hosted on it. Once you find the root cause of the issue, work with your server admins to correct the configuration.

  3. Malicious website attacks: Some attacks on websites introduce code that can cause undesired canonicalization. For example, the malicious code might cause the website to return an HTTP 301 redirect or insert a cross-domain rel="canonical" link element into the HTML <head> or HTTP header, usually pointing to an external URL hosting malicious content. In these cases our algorithms may select the malicious or spammy URL instead of the URL on the compromised website.

    In this situation, please follow our guidance on cleaning your site and submit a reconsideration request when done. To identify cloaked attacks, you can use the Fetch as Googlebot function in Webmaster Tools to see your page’s content as Googlebot sees it.

In rare situations, our algorithms may select a URL from an external site that is hosting your content without your permission. If you believe that another site is duplicating your content in violation of copyright law, you may contact the site’s host to request removal. In addition, you can request that Google remove the infringing page from our search results by filing a request under the Digital Millennium Copyright Act.

And as always, if you need help in identifying the cause of an incorrect decision or how to fix it, you can see our Help Center article about this topic and ask in our Webmaster Help Forum.

Tuesday, October 18, 2011

Accessing search query data for your sites

Webmaster level: All

SSL encryption on the web has been growing by leaps and bounds. As part of our commitment to provide a more secure online experience, today we announced that SSL Search on https://www.google.com will become the default experience for signed in users on google.com. This change will be rolling out over the next few weeks.

What is the impact of this change for webmasters? Today, a web site accessed through organic search results on http://www.google.com (non-SSL) can see both that the user came from google.com and their search query. (Technically speaking, the user’s browser passes this information via the HTTP referrer field.) However, for organic search results on SSL search, a web site will only know that the user came from google.com.

Webmasters can still access a wealth of search query data for their sites via Webmaster Tools. For sites which have been added and verified in Webmaster Tools, webmasters can do the following:
  • View the top 1000 daily search queries and top 1000 daily landing pages for the past 30 days.
  • View the impressions, clicks, clickthrough rate (CTR), and average position in search results for each query, and compare this to the previous 30 day period.
  • Download this data in CSV format.
In addition, users of Google Analytics’ Search Engine Optimization reports have access to the same search query data available in Webmaster Tools and can take advantage of its rich reporting capabilities.

We will continue to look into further improvements to how search query data is surfaced through Webmaster Tools. If you have questions, feedback or suggestions, please let us know through the Webmaster Tools Help Forum.

Wednesday, October 12, 2011

Create and manage Custom Search Engines from within Webmaster Tools

Webmaster level: All

Custom Search Engines (CSEs) enable you to create Google-powered customized search experiences for your sites. You can search over one or more sites, customize the look and feel to match your site, and even make money with AdSense for Search. Now it’s even easier to get started directly from Webmaster Tools.

If you’ve never created a CSE, just click on the “Custom Search” link in the Labs section and we’ll automatically create a default CSE that searches just your site. You can do some basic configuring or immediately get the code snippet to add your new CSE to your site. You can always continue on to the full CSE control panel for more advanced settings.

Once you’ve created your CSE (or if you already had one), clicking the “Custom Search” link in Labs will allow you to manage your CSEs without leaving Webmaster Tools.

We hope these new features make it easier for you to help users search your site. If you have any questions, please post them in our Webmaster Help Forum or the Custom Search Help Forum.

Wednesday, October 5, 2011

Webmaster forums' Top Contributors rock

Webmaster level: All

The TC Summit was a blast! As we wrote in our announcement post, we recently invited more than 250 Top Contributors from all over the world to California to thank them for being so awesome and to give them the opportunity to meet some of our forum guides, engineers and product managers in person.

Our colleagues Adrianne and Brenna already published a recap post on the Official Google Blog. As for us, the search folks at Google, there's not much left to say except that we enjoyed the event and meeting Top Contributors in real life, many of them for the first time. We got the feeling you guys had a great time, too. Let’s quote a few of the folks who make a huge difference on a daily basis:

Sasch Mayer on Google+ (Webmaster TC in English):

"For a number of reasons this event does hold a special place for me, and always will. It's not because I was one of comparatively few people to be invited for a Jolly at the ‘Plex, but because this trip offered the world's TCs a unique opportunity to finally meet each other in person."

Herbert Sulzer, a.k.a. Luzie on Google+ (Webmaster TC in English, German and Spanish):

“Hehehe! Fun, fun fun, this was all fun :D Huhhh”

Aygul Zagidullina on Google+ (Web Search TC in English):

“It was a truly fantastic, amazing, and unforgettable experience meeting so many other TCs across product forums and having the chance to talk to and hear from so many Googlers across so many products!”

Of course we did receive lots of constructive feedback, too. Transparency and communication were on top of the list, and we're looking into increasing our outreach efforts via Webmaster Tools, so stay tuned! By the way, if you haven’t done so yet, please remember to use the forwarding option in the Webmaster Tools Message Center to get the messages straight to your email inbox. In the meantime please keep an eye on our Webmaster Central Blog, and of course keep on contributing to discussions in the Google Webmaster Forum.

On behalf of all Google guides who participated in the 2011 Summit we want to thank you. You guys rock! :)


That’s right, TCs & Google Guides came from all over the world to convene in California.


TCs & Google Guides from Webmaster Central and Search forums after one of the sessions.


After a day packed with presentations and breakout sessions...


...we did what we actually came for...


...enjoyed a party, celebrated and had a great time together.

Tuesday, October 4, 2011

Webmaster Tools Search Queries data is now available in Google Analytics

Webmaster level: All

Earlier this year we announced a limited pilot for Search Engine Optimization reports in Google Analytics, based on Search queries data from Webmaster Tools. Thanks to valuable feedback from our pilot users, we’ve made several improvements and are pleased to announce that the following reports are now publicly available in the Traffic Sources section of Google Analytics.
  • Queries: impressions, clicks, position, and CTR info for the top 1,000 daily queries
  • Landing Pages: impressions, clicks, position, and CTR info for the top 1,000 daily landing pages
  • Geographical Summary: impressions, clicks, and CTR by country
All of these Search Engine Optimization reports offer Google Analytics’ advanced filtering and visualization capabilities for deeper data analysis. With the secondary dimensions, you can view your site’s data in ways that aren’t available in Webmaster Tools.


To enable these Search Engine Optimization reports for a web property, you must be both a Webmaster Tools verified site owner and a Google Analytics administrator of that Property. Once enabled, administrators can choose which profiles can see these reports.

If you have feedback or suggestions, please let us know in the Webmaster Help Forum.

Thursday, September 29, 2011

Work smarter, not harder, with site health

Webmaster level: All

We consistently hear from webmasters that they have to prioritize their time. Some manage dozens or hundreds of clients’ sites; others run their own business and may only have an hour to spend on website maintenance in between managing finances and inventory. To help you prioritize your efforts, Webmaster Tools is introducing the idea of “site health,” and we’ve redesigned the Webmaster Tools home page to highlight your sites with health problems. This should allow you to easily see what needs your attention the most, without having to click through all of the reports in Webmaster Tools for every site you manage.

Here’s what the new home page looks like:


You can see that sites with health problems are shown at the top of the list. (If you prefer, you can always switch back to listing your sites alphabetically.) To see the specific issues we detected on a site, click the site health icon or the “Check site health” link next to that site:


This new home page is currently only available if you have 100 or fewer sites in your Webmaster Tools account (either verified or unverified). We’re working on making it available to all accounts in the future. If you have more than 100 sites, you can see site health information at the top of the Dashboard for each of your sites.

Right now we include three issues in your site’s health check:
  1. Have we detected malware on the site?
  2. Have any important pages been removed via our URL removal tool?
  3. Are any of your important pages blocked from crawling in robots.txt?
You can click on any of these items to get more details about what we detected on your site. If the site health icon and the “Check site health” link don’t appear next to a site, it means that we didn’t detect any of these issues on that site (congratulations!).

A word about “important pages:” as you know, you can get a comprehensive list of all URLs that have been removed by going to Site configuration > Crawler access > Remove URL; and you can see all the URLs that we couldn’t crawl because of robots.txt by going to Diagnostics > Crawl errors > Restricted by robots.txt. But since webmasters often block or remove content on purpose, we only wanted to indicate a potential site health issue if we think you may have blocked or removed a page you didn’t mean to, which is why we’re focusing on “important pages.” Right now we’re looking at the number of clicks pages get (which you can see in Your site on the web > Search queries) to determine importance, and we may incorporate other factors in the future as our site health checks evolve.

Obviously these three issues—malware, removed URLs, and blocked URLs—aren’t the only things that can make a website “unhealthy;” in the future we’re hoping to expand the checks we use to determine a site’s health, and of course there’s no substitute for your own good judgment and knowledge of what’s going on with your site. But we hope that these changes make it easier for you to quickly spot major problems with your sites without having to dig down into all the data and reports.

After you’ve resolved any site health issues we’ve flagged, it will usually take several days for the warning to disappear from your Webmaster Tools account, since we have to recrawl the site, see the changes you’ve made, and then process that information through our Web Search and Webmaster Tools pipelines. If you continue to see a site health warning for that site after a week or so, the issue may not have been resolved. Feel free to ask for help tracking it down in our Webmaster Help Forum... and let us know what you think!

Thursday, September 15, 2011

View-all in search results

Webmaster level: Intermediate to Advanced

User testing has taught us that searchers much prefer the view-all, single-page version of content over a component page containing only a portion of the same information with arbitrary page breaks (which cause the user to click “next” and load another URL).


Searchers often prefer the view-all vs. paginated content with arbitrary page breaks and worse latency.

Therefore, to improve the user experience, when we detect that a content series (e.g. page-1.html, page-2.html, etc.) also contains a single-page version (e.g. page-all.html), we’re now making a larger effort to return the single-page version in search results. If your site has a view-all option, there’s nothing you need to do; we’ll work to do it on your behalf. Also, indexing properties, like links, will be consolidated from the component pages in the series to the view-all page.

However, high latency can make the view-all less preferred

Interestingly, the cases when users didn’t prefer the view-all page were correlated with high latency (e.g., when the view-all page took a while to load, say, because it contained many images). This makes sense because we know users are less satisfied with slow results. So while a view-all page is commonly desired, as a webmaster it’s important to balance this preference with the page’s load time and overall user experience.

Best practices for a series of content
  1. If your site includes view-all pages

    We aim to detect the view-all version of your content and, if available, its associated component pages. There’s nothing more you need to do! However, if you’d like to make it more explicit to us, you can include rel=”canonical” from your component pages to your view-all to increase the likelihood that we detect your series of pages appropriately.


    rel=”canonical” can specify the superset of content (i.e. the view-all page, in this case page-all.html) from the same information in a series of URLs.

    Why does this work?

    In the diagram, page-2.html of a series may specify the canonical target as page-all.html because page-all.html is a superset of page-2.html's content. When a user searches for a query term and page-all.html is selected in search results, even if the query is most closely related to page-2.html, we know the user will still see page-2.html’s relevant information within page-all.html.


    On the other hand, page-2.html shouldn’t designate page-1.html as the canonical because page-2.html’s content isn’t included on page-1.html. It’s possible that a user’s search query is relevant to content on page-2.html, but if page-2.html’s canonical is set to page-1.html, the user could then select page-1.html in search results and find herself in a position where she has to further navigate to a different page to arrive at the desired information. That’s a poor experience for the user, a suboptimal result from us, and it could also bring poorly targeted traffic to your site.


    However, if you strongly desire your view-all page not to appear in search results: 1) make sure the component pages in the series don’t include rel=”canonical” to the view-all page, and 2) mark the view-all page as “noindex” using any of the standard methods.
  2. If you’d like to surface individual, component pages (or there’s no view-all available)

    It may be the case that one or both of the situations below apply to your site:

    • The view-all page is undesirable as a search result (e.g., its load time is too high or it’s too difficult for users to navigate).
    • Your users prefer the multi-page experience and to be directed to a component page in search results, rather than the view-all page.

    If so, you can use standard HTML rel=”next” and rel=”prev” elements to specify a relationship between the component pages in your series of content. If done correctly, Google will generally strive to:

    • Consolidate indexing properties, such as links, between the component pages/URLs.
    • Send users to the most relevant page/URL from the component pages. Typically, the most relevant page is the first page of your content, but our algorithms may point users to one of the component pages in the series.
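
To make the first option concrete, here is a minimal sketch of what a component page’s <head> might contain, reusing the illustrative page-2.html and page-all.html names from the diagram above (the title and domain are placeholders):

<!-- In page-2.html, a component page of the series -->
<head>
  <title>Example article, part 2</title>
  <!-- The canonical points to the view-all page, a superset of this page's content -->
  <link rel="canonical" href="http://www.example.com/page-all.html" />
</head>

Every component page in the series would carry the same canonical target. If instead you want to keep the view-all page out of search results entirely, omit these canonical links and mark the view-all page with a standard “noindex” (for example, a robots meta tag), as described above.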

It’s not uncommon for webmasters to incorrectly use rel=”canonical” from component pages to the first page of their series (e.g. page-2.html with rel=”canonical” to page-1.html). We recommend against this implementation because the component pages don’t actually contain duplicate content. Using rel=”next” and rel=”prev” is far more appropriate.
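
For contrast, here is a sketch of that discouraged pattern (illustrative file names); avoid it, because page-1.html is not a superset of page-2.html’s content:

<!-- In page-2.html's <head>: discouraged, page-1.html does not contain page-2.html's content -->
<link rel="canonical" href="http://www.example.com/page-1.html" />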

Summary

Because users generally prefer the view-all option in search results, we’re making more of an effort to properly detect and serve this version to searchers. If you have a series of content, there’s nothing more you need to do. If you’d like to give Google a stronger hint about how best to serve your information to users:
  1. To better optimize your view-all page, you can use rel=”canonical” from component pages to the single-page version; otherwise,
  2. If a view-all page doesn’t provide a good user experience for your site, you can use the rel=”next” and rel=”prev” link elements as a strong hint for Google to identify the series of pages and still surface a component page in results.

Questions?

As always, feel free to ask in our Webmaster Help Forum.

Pagination with rel=“next” and rel=“prev”

Webmaster level: Intermediate to Advanced

Much like rel=”canonical” acts as a strong hint for duplicate content, you can now use the HTML link elements rel=”next” and rel=”prev” to indicate the relationship between component URLs in a paginated series. Throughout the web, a paginated series of content may take many shapes—it can be an article divided into several component pages, or a product category with items spread across several pages, or a forum thread divided into a sequence of URLs. Now, if you choose to include rel=”next” and rel=”prev” markup on the component pages within a series, you’re giving Google a strong hint that you’d like us to:
  • Consolidate indexing properties, such as links, from the component pages/URLs to the series as a whole (i.e., links should not remain dispersed between page-1.html, page-2.html, etc., but be grouped with the sequence).
  • Send users to the most relevant page/URL—typically the first page of the series.


The relationship between component URLs in a series can now be indicated to Google through rel=”next” and rel=”prev”.

There’s an exception to the rel=”prev” and rel=”next” implementation: If, alongside your series of content, you also offer users a view-all page, or if you’re considering a view-all page, please see our post on View-all in search results for more information. Because view-all pages are most commonly preferred by searchers, we do our best to surface this version when appropriate in results rather than a component page (component pages are more likely to surface with rel=”next” and rel=”prev”).

If you don’t have a view-all page or you’d like to override Google returning a view-all page, you can use rel="next" and rel="prev" as described in this post.

For information on paginated configurations that include a view-all page, please see our post on View-all in search results.

Outlining your options

Here are three options for a series:
  1. Leave whatever you have exactly as-is. Paginated content exists throughout the web and we’ll continue to strive to give searchers the best result, regardless of the page’s rel=”next”/rel=”prev” HTML markup—or lack thereof.
  2. If you have a view-all page, or are considering a view-all page, see our post on View-all in search results.
  3. Hint to Google the relationship between the component URLs of your series with rel=”next” and rel=”prev”. This helps us more accurately index your content and serve users the most relevant page (commonly the first page). Implementation details below.

Implementing rel=”next” and rel=”prev”

If you prefer option 3 (above) for your site, let’s get started. Say you have content paginated into the URLs:

http://www.example.com/article?story=abc&page=1
http://www.example.com/article?story=abc&page=2
http://www.example.com/article?story=abc&page=3
http://www.example.com/article?story=abc&page=4

On the first page, http://www.example.com/article?story=abc&page=1, you’d include in the <head> section:
<link rel="next" href="http://www.example.com/article?story=abc&page=2" />

On the second page, http://www.example.com/article?story=abc&page=2:
<link rel="prev" href="http://www.example.com/article?story=abc&page=1" />
<link rel="next" href="http://www.example.com/article?story=abc&page=3" />

On the third page, http://www.example.com/article?story=abc&page=3:
<link rel="prev" href="http://www.example.com/article?story=abc&page=2" />
<link rel="next" href="http://www.example.com/article?story=abc&page=4" />

And on the last page, http://www.example.com/article?story=abc&page=4:
<link rel="prev" href="http://www.example.com/article?story=abc&page=3" />

A few points to mention:
  • The first page only contains rel=”next” and no rel=”prev” markup.
  • Pages two to the second-to-last page should be doubly-linked with both rel=”next” and rel=”prev” markup.
  • The last page only contains markup for rel=”prev”, not rel=”next”.
  • rel=”next” and rel=”prev” values can be either relative or absolute URLs (as allowed by the <link> tag). And, if you include a <base> link in your document, relative paths will resolve according to the base URL (see the sketch after this list).
  • rel=”next” and rel=”prev” only need to be declared within the <head> section, not within the document <body>.
  • We allow rel=”previous” as a syntactic variant of rel=”prev” links.
  • rel="next" and rel="previous" on the one hand and rel="canonical" on the other constitute independent concepts. Both declarations can be included in the same page. For example, http://www.example.com/article?story=abc&page=2&sessionid=123 may contain:

    <link rel="canonical" href="http://www.example.com/article?story=abc&page=2" />
    <link rel="prev" href="http://www.example.com/article?story=abc&page=1&sessionid=123" />
    <link rel="next" href="http://www.example.com/article?story=abc&page=3&sessionid=123" />

  • rel=”prev” and rel=”next” act as hints to Google, not absolute directives.
  • If the markup is implemented incorrectly (for example, an expected rel="prev" or rel="next" designation is omitted from the series), we'll still index the page(s) and rely on our own heuristics to understand your content.
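
To illustrate the point about relative URLs above, here is a minimal sketch assuming an illustrative <base> of http://www.example.com/article/ on the second page of a series:

<head>
  <base href="http://www.example.com/article/" />
  <!-- With the <base> above, these relative values resolve to
       http://www.example.com/article/page-1.html and http://www.example.com/article/page-3.html -->
  <link rel="prev" href="page-1.html" />
  <link rel="next" href="page-3.html" />
</head>

Absolute URLs, as in the examples earlier in this post, avoid any ambiguity about how the values resolve.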

Questions?
You can find more information in our Help Center, or join the conversation in our Webmaster Help Forum!