Tuesday, March 30, 2010

URL removal explained, Part I: URLs & directories

Webmaster level: All

There's a lot of content on the Internet these days. At some point, something may turn up online that you would rather not have out there—anything from an inflammatory blog post you regret publishing, to confidential data that accidentally got exposed. In most cases, deleting or restricting access to this content will cause it to naturally drop out of search results after a while. However, if you urgently need to remove unwanted content that has gotten indexed by Google and you can't wait for it to naturally disappear, you can use our URL removal tool to expedite the removal of content from our search results as long as it meets certain criteria (which we'll discuss below).

We've got a series of blog posts lined up for you explaining how to successfully remove various types of content, and common mistakes to avoid. In this first post, I'm going to cover a few basic scenarios: removing a single URL, removing an entire directory or site, and reincluding removed content. I also strongly recommend our previous post on managing what information is available about you online.

Removing a single URL

In general, in order for your removal requests to be successful, the owner of the URL(s) in question—whether that's you, or someone else—must have indicated that it's okay to remove that content. For an individual URL, this can be indicated in any of three ways: with a robots.txt file disallowing the URL, with a noindex meta tag on the page, or by having the page return a 404 or 410 status code. Before submitting a removal request, you can check whether the URL is correctly blocked in each case:
  • robots.txt: You can check whether the URL is correctly disallowed using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
  • noindex meta tag: You can use Fetch as Googlebot to make sure the meta tag appears somewhere between the <head> and </head> tags. If you want to check a page you can't verify in Webmaster Tools, you can open the URL in a browser, go to View > Page source, and make sure you see the meta tag between the <head> and </head> tags.
  • 404 / 410 status code: You can use Fetch as Googlebot, or tools like Live HTTP Headers or web-sniffer.net, to verify whether the URL is actually returning the correct code. Sometimes "deleted" pages may say "404" or "Not found" on the page but actually return a 200 status code in the HTTP response, so it's good to use a proper header-checking tool (or a short script like the one below) to double-check.
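If you prefer to script that last check, here is a minimal Python sketch (standard library only, placeholder URL) that reports the status code the server actually returns rather than what the page's visible text says:

   import urllib.request
   import urllib.error

   def real_status(url):
       """Return the HTTP status code the server actually sends for this URL."""
       request = urllib.request.Request(url, method="HEAD")
       try:
           with urllib.request.urlopen(request) as response:
               return response.status
       except urllib.error.HTTPError as error:
           # 404, 410 and other error responses arrive as HTTPError; the code is what we want.
           return error.code

   # Placeholder URL: a "deleted" page that might still be returning 200.
   print(real_status("http://www.example.com/embarrassing-stuff.html"))

A 200 here means the page hasn't actually been blocked, no matter what its visible text says.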
If unwanted content has been removed from a page but the page hasn't been blocked in any of the above ways, you will not be able to completely remove that URL from our search results. This is most common when you don't own the site that's hosting that content. We cover what to do in this situation in Part II of our removals series.

If a URL meets one of the above criteria, you can remove it by going to http://www.google.com/webmasters/tools/removals, entering the URL that you want to remove, and selecting the "Webmaster has already blocked the page" option. Note that you should enter the URL where the content was hosted, not the URL of the Google search where it's appearing. For example, enter
   http://www.example.com/embarrassing-stuff.html
not
   http://www.google.com/search?q=embarrassing+stuff

This article has more details about making sure you're entering the proper URL. Remember that if you don't tell us the exact URL that's troubling you, we won't be able to remove the content you had in mind.

Removing an entire directory or site

In order for a directory or site-wide removal to be successful, the directory or site must be disallowed in the site's robots.txt file. For example, in order to remove the http://www.example.com/secret/ directory, your robots.txt file would need to include:
   User-agent: *
   Disallow: /secret/

It isn't enough for the root of the directory to return a 404 status code, because it's possible for a directory to return a 404 but still serve out files underneath it. Using robots.txt to block a directory (or an entire site) ensures that all the URLs under that directory (or site) are blocked as well. You can test whether a directory has been blocked correctly using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
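If you'd like a quick local sanity check in addition to those Webmaster Tools features, Python's standard urllib.robotparser module can evaluate your live robots.txt against specific URLs. This is only a sketch, and the domain is the example from above:

   from urllib import robotparser

   # Load and parse the site's live robots.txt file.
   parser = robotparser.RobotFileParser()
   parser.set_url("http://www.example.com/robots.txt")
   parser.read()

   # With the Disallow: /secret/ rule above, every URL under that directory is blocked.
   print(parser.can_fetch("*", "http://www.example.com/secret/"))           # False
   print(parser.can_fetch("*", "http://www.example.com/secret/file.html"))  # False
   print(parser.can_fetch("*", "http://www.example.com/other-page.html"))   # True, unless blocked elsewhere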

Only verified owners of a site can request removal of an entire site or directory in Webmaster Tools. To request removal of a directory or site, click on the site in question, then go to Site configuration > Crawler access > Remove URL. If you enter the root of your site as the URL you want to remove, you'll be asked to confirm that you want to remove the entire site. If you enter a subdirectory, select the "Remove directory" option from the drop-down menu.

Reincluding content

You can cancel removal requests for any site you own at any time, including those submitted by other people. In order to do so, you must be a verified owner of this site in Webmaster Tools. Once you've verified ownership, you can go to Site configuration > Crawler access > Remove URL > Removed URLs (or > Made by others) and click "Cancel" next to any requests you wish to cancel.

Still have questions? Stay tuned for the rest of our series on removing content from Google's search results. If you can't wait, much has already been written about URL removals, and troubleshooting individual cases, in our Help Forum. If you still have questions after reading others' experiences, feel free to ask. Note that, in most cases, it's hard to give relevant advice about a particular removal without knowing the site or URL in question. We recommend sharing your URL by using a URL shortening service so that the URL you're concerned about doesn't get indexed as part of your post; some shortening services will even let you disable the shortcut later on, once your question has been resolved.

Edit: Read the rest of this series:
Part II: Removing & updating cached content
Part III: Removing content you don't own
Part IV: Tracking requests, what not to remove

Companion post: Managing what information is available about you online

Will the Real <Your Site Here> Please Stand Up?

Webmaster Level: Intermediate



In our recent post on the Google Online Security Blog, we described our system for identifying phishing pages. Of the millions of webpages that our scanners analyze for phishing, we successfully identify 9 out of 10 phishing pages. Our classification system only incorrectly flags a non-phishing site as a phishing site about 1 in 10,000 times, which is significantly better than similar systems. In our experience, these "false positive" sites are usually built to distribute spam or may be involved with other suspicious activity. If you find that your site has been added to our phishing page list ("Reported Web Forgery!") by mistake, please report the error to us. On the other hand, if your site has been added to our malware list ("This site may harm your computer"), you should follow the instructions here. Our team tries to address all complaints within one day, and we usually respond within a few hours.

Unfortunately, sometimes when we try to follow up on your reports, we find that we are just as confused as our automated system. If you run a website, here are some simple guidelines that will allow us to quickly fix any mistakes and help keep your site off our phishing page list in the first place.

- Don’t ask for usernames and passwords that do not belong to your site. We consider this behavior phishing by definition, so don’t do it! If you want to provide an add-on service to another site, consider using a public API or OAuth instead.

- Avoid displaying logos that are not yours near login fields. Someone surfing the web might mistakenly believe that the logo represents your website, and they might be misled into entering personal information into your site that they intended for the other site. Furthermore, we can’t always be sure that you aren’t doing this intentionally, so we might block your site just to be safe. To prevent misunderstandings, we recommend exercising caution when displaying these logos.

- Minimize the number of domains used by your site, especially for logins. Asking for a username and password for Site X looks very suspicious on Site Y. Besides making it harder for us to evaluate your website, you may be inadvertently teaching your visitors to ignore suspicious URLs, making them more vulnerable to actual phishing attempts. If you must have your login page on a different domain from your main site, consider using a transparent proxy to enable users to access this page from your primary domain. If all else fails...

- Make it easy to find links to your pages. It is difficult for us (and for your users) to determine who controls an off-domain page in your site if the links to that page from your main site are hard to find. All it takes to clear this problem up is to have each off-domain page link back to an on-domain page which links to it. If you have not done this, and one of your pages ends up on our list by mistake, please mention in your error report how we can find the link from your main site to the wrongly blocked page. However, if you do nothing else...

- Don’t send strange links via email or IM. It’s all but impossible for us to verify unusual links that only appeared in your emails or instant messages. Worse, using these kinds of links conditions your users/customers/friends to click on strange links they receive through email or IM, which can put them at risk for other Internet crimes besides phishing.

While we hope you consider these recommendations to be common sense, we’ve seen major e-commerce and financial companies break these guidelines from time to time. Following them will not only improve your experience with our anti-phishing systems, but will also help provide your visitors with a better online experience.

Friday, March 19, 2010

Working with multilingual websites

Webmaster Level: Intermediate

A multilingual website is any website that offers content in more than one language. Examples of multilingual websites might include a Canadian business with an English and a French version of its site, or a blog on Latin American soccer available in both Spanish and Portuguese.

Usually, it makes sense to have a multilingual website when your target audience consists of speakers of different languages. If your blog on Latin American soccer aims to reach the Brazilian audience, you may choose to publish it only in Portuguese. But if you’d like to reach soccer fans from Argentina also, then providing content in Spanish could help you with that.

Google and language recognition


Google tries to determine the main languages of each one of your pages. You can help to make language recognition easier if you stick to only one language per page and avoid side-by-side translations. Although Google can recognize a page as being in more than one language, we recommend using the same language for all elements of a page: headers, sidebars, menus, etc.

Keep in mind that Google ignores all code-level language information, from “lang” attributes to Document Type Definitions (DTD). Some web editing programs create these attributes automatically, and therefore they aren’t very reliable when trying to determine the language of a webpage.

Someone who comes to Google and does a search in their language expects to find localized search results, and this is where you, as a webmaster, come in: if you’re going to localize, make it visible in the search results with some of our tips below.

The anatomy of a multilingual site: URL structure


There's no need to create special URLs when developing a multilingual website. Nonetheless, your users might like to identify what section of your website they’re on just by glancing at the URL. For example, the following URLs let users know that they’re on the English section of this site:

http://example.ca/en/mountain-bikes.html
http://en.example.ca/mountain-bikes.html

While these other URLs let users know that they’re viewing the same page in French:

http://example.ca/fr/mountain-bikes.html
http://fr.example.ca/mountain-bikes.html


Additionally, this URL structure will make it easier for you to analyze the indexing of your multilingual content.

If you want to create URLs with non-English characters, make sure to use UTF-8 encoding. UTF-8 encoded URLs should be properly escaped when linked from within your content. Should you need to escape your URLs manually, you can easily find an online URL encoder that will do this for you. For example, if I wanted to translate the following URL from English to French:

http://example.ca/fr/mountain-bikes.html

It might look something like this:

http://example.ca/fr/vélo-de-montagne.html

Since this URL contains one non-English character (é), this is what it would look like properly escaped for use in a link on your pages:

http://example.ca/fr/v%C3%A9lo-de-montagne.html
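If you'd rather not rely on an online encoder, the escaping can be reproduced with a couple of lines of Python; this sketch's output matches the escaped URL above:

   from urllib.parse import quote

   path = "/fr/vélo-de-montagne.html"
   # quote() percent-encodes non-ASCII characters as UTF-8 ("é" becomes "%C3%A9")
   # while leaving "/" and plain ASCII characters alone.
   print("http://example.ca" + quote(path))
   # -> http://example.ca/fr/v%C3%A9lo-de-montagne.html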

Crawling and indexing your multilingual website


We recommend that you do not allow automated translations to get indexed. Automated translations don’t always make sense and they could potentially be viewed as spam. More importantly, the point of making a multilingual website is to reach a larger audience by providing valuable content in several languages. If your users can’t understand an automated translation or if it feels artificial to them, you should ask yourself whether you really want to present this kind of content to them.

If you’re going to localize, make it easy for Googlebot to crawl all language versions of your site. Consider cross-linking page by page. In other words, you can provide links between pages with the same content in different languages. This can also be very helpful to your users. Following our previous example, let’s suppose that a French speaker happens to land on http://example.ca/en/mountain-bikes.html; now, with one click he can get to http://example.ca/fr/vélo-de-montagne.html where he can view the same content in French.

To make all of your site's content more crawlable, avoid automatic redirections based on the user's perceived language. These redirections could prevent users (and search engines) from viewing all the versions of your site.

And last but not least, keep the content for each language on separate URLs - don't use cookies to show translated versions.

Working with character encodings


Google directly extracts character encodings from HTTP headers, HTML page headers, and content. There isn’t much you need to do about character encoding, other than watching out for conflicting information - for example, between content and headers. While Google can recognize different character encodings, we recommend that you use UTF-8 on your website whenever possible.
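One rough way to catch the kind of conflict mentioned above is to compare the charset declared in the HTTP Content-Type header with the one declared in the page markup. This is only a sketch using the Python standard library, and the URL is a placeholder:

   import re
   import urllib.request

   def declared_charsets(url):
       """Return (HTTP header charset, charset declared in the HTML); either may be None."""
       with urllib.request.urlopen(url) as response:
           header_charset = response.headers.get_content_charset()
           html = response.read().decode(header_charset or "utf-8", errors="replace")
       match = re.search(r'charset=["\']?([\w-]+)', html, re.IGNORECASE)
       return header_charset, (match.group(1) if match else None)

   # Placeholder URL; if the two values disagree, that's worth fixing.
   print(declared_charsets("http://example.ca/"))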

If your tongue gets twisted...


Now that you know all of this, your tongue may get twisted when you speak many languages, but your website doesn’t have to!

For more information, read our post on multi-regional sites and stay tuned for our next post, where we'll delve into special situations that may arise when working with global websites. Until then, don't hesitate to drop by the Help Forum and join the discussion!

Monday, March 15, 2010

Sharing advice from our site clinic

Webmaster Level: All

Members of the Google Search Quality Team have participated in site clinic panels on a number of occasions. We receive a lot of positive feedback from these events and we've been thinking of ways to expand our efforts to reach even more webmasters. We decided to organize a small, free of charge pilot site clinic at Google in Dublin, and opened the invitation to webmasters from the neighborhood. The response we received was overwhelming and exceeded our expectations.


Meet the Googlers who hosted the site clinic: Anu Ilomäki, Alfredo Pulvirenti, Adel Saoud, Fili Wiese, Kaspar Szymanski and Uli Lutz.

It was fantastic to see the large turnout and we would like to share the slides presented as well as the takeaways.

These are some questions we came across, along with the advice shared:
  1. I have 3 blogs with the same content, is that a problem?

    If the content is identical, it's likely only one of the blogs will rank for it. Also, with such a scattered effort, chances are your incoming links will be distributed across the different blogs instead of pointing to one source. Therefore you're running the risk of both users and search engines not knowing which of your blogs is the definitive source. You can mitigate that by redirecting to the preferred version or by using the cross-domain rel=canonical to point to one source.

  2. Should I believe SEO agencies that promise to make my site rank first in Google in a few months and with a precise number of links?

    No one can make that promise; therefore the short answer is no, you should not. However, we have some great tips on how to find a trustworthy SEO in our Help Center.

  3. There are keywords that are relevant for my website, but they're inappropriate to be shown in the content e.g. because they could be misunderstood, slang or offensive. How can I show the relevance to Google?

    Depending on the topic of your site and expectations of the target group, you might consider actually using these keywords in a positive way, e.g. explaining their meaning and showing your users you're an authority on the subject. However if the words are plain abusive and completely inappropriate for your website, it's rather questionable whether the traffic resulting from these search queries is interesting for your website anyway.

  4. Would you advise to use the rewrite URL function?

    Some users may like seeing descriptive URLs in the search results. However, it's quite hard to correctly create and maintain rewrites that change dynamic URLs to static-looking URLs. That's why, generally speaking, we don't recommend rewriting them. If you still want to give it a try, please be sure to remove unnecessary parameters while maintaining a dynamic-looking URL and have a close look at our blog post on this topic. And if you don't, keep in mind that we might still make your URLs look readable in our search results no matter how weird they actually are.

  5. If I used the geo-targeting tool for Ireland, is Northern Ireland included?

    Google Webmaster Tools geo-targeting works on a country basis, which means that Northern Ireland would not be targeted if the setting was Republic of Ireland. One possible solution is to create a separate site or part of a website for Northern Ireland and to geo-target this site to the United Kingdom in Webmaster Tools.

  6. Is there any preference between TLDs like .com and .info in ranking?

    No, there is none. Our focus is on the content of the site.

  7. I have a website on a dot SO (.so) domain name with content meant for the Republic of Ireland. Will this hurt my rankings in the Irish search results?

    .so is the Internet country code top-level domain for Somalia, so in this case the domain extension is one signal that doesn't point to your intended audience. However, we look at a larger number of factors when ranking your website, and the extension of the domain name is just one of them. Your website can still rank in the Irish search results if you have topic-specific content. However, keep in mind that it may take our algorithms a little bit longer to fully understand where to best serve your website in our search results.
We would like to thank all participants for their time and effort. It was a pleasure to help you and we hope that it was beneficial for you, too. For any remaining questions, please don't hesitate to join the community on our GWHF.

Friday, March 12, 2010

Working with multi-regional websites

Webmaster Level: Intermediate

Did you know that a majority of users surveyed feel that having information in their own language is more important than a low price? Living in a non-English-speaking country, I've seen friends and family members explicitly look for and use local and localized websites—properly localized sites definitely have an advantage with users. Google works hard to show users the best possible search results. Many times those are going to be pages that are localized for the user's location and/or in the user's language.

If you're planning to take the time to create and maintain a localized version of your website, making it easy to recognize and find is a logical part of that process. In this blog post series, we'll take a look at what is involved with multi-regional and multilingual websites from a search engine point of view. A multi-regional website is one that explicitly targets users in various regions (generally different countries); we call a website multilingual when it is available in multiple languages; and sometimes a website does both, targeting multiple regions in multiple languages. Let's start with some general preparations and then look at websites that target multiple regions.

Preparing for global websites

Expanding a website to cover multiple regions and/or languages can be challenging. By creating multiple versions of your website, any issues with the base version will be multiplied; make sure that you have everything working properly before you start. Given that this generally means you'll suddenly be working with a multiplied number of URLs, don't forget that you'll need appropriate infrastructure to support the website.

Planning multi-regional websites

When planning sites for multiple regions (usually countries), don't forget to research legal or administrative requirements that might come into play first. These requirements may determine how you proceed, for instance whether or not you would be eligible to use a country-specific domain name.

All websites start with domain names; when it comes to domain names, Google differentiates between two types of domain names:
  • ccTLDs (country-code top level domain names): These are tied to a specific country (for example .de for Germany, .cn for China). Users and search engines use this as a strong sign that your website is explicitly for a certain country.
  • gTLDs (generic top level domain names): These are not tied to a specific country. Examples of gTLDs are .com, .net, .org, .museum. Google sees regional top level domain names such as .eu and .asia as gTLDs, since they cannot be tied to a specific country. We also treat some vanity ccTLDs (such as .tv, .me, etc.) as gTLDs as we've found that users and webmasters frequently see these as being more generic than country-targeted (we don't have a complete list of such vanity ccTLDs that we treat as gTLDs as it may change over time). You can set geotargeting for websites with gTLDs using the Webmaster Tools Geographic Target setting.

Geotargeting factors

Google generally uses the following elements to determine the geotargeting of a website (or a part of a website):
  1. Use of a ccTLD is generally a strong signal for users since it explicitly specifies a single country in an unmistakable way.
    or
    Webmaster Tools' manual geotargeting for gTLDs (this can be on a domain, subdomain or subdirectory level); more information on this can be found in our blog post and in the Help Center. With region tags from geotargeting being shown in search results, this method is also very clear to users. Please keep in mind that it generally does not make sense to set a geographic target if the same pages on your site target more than a single country (say, all German-speaking countries) — just write in that language and do not use the geotargeting setting (more on writing in other languages will follow soon!).
  2. Server location (through the IP address of the server) is frequently near your users. However, some websites use distributed content delivery networks (CDNs) or are hosted in a country with better webserver infrastructure, so we try not to rely on the server location alone.
  3. Other signals can give us hints. This could be from local addresses & phone numbers on the pages, use of local language and currency, links from other local sites, and/or the use of Google's Local Business Center (where available).

Note that we do not use locational meta tags (like "geo.position" or "distribution") or HTML attributes for geotargeting. While these may be useful in other regards, we've found that they are generally not reliable enough to use for geotargeting.

URL structures

The first three elements used for geotargeting are strongly tied to the server and to the URLs used. It's difficult to determine geotargeting on a page by page basis, so it makes sense to consider using a URL structure that makes it easy to segment parts of the website for geotargeting. Here are some of the possible URL structures with pros and cons with regards to geotargeting:

ccTLDs (e.g. example.de, example.fr)
  • pros (+): clear geotargeting; server location is irrelevant; easy separation of sites; legal requirements (sometimes)
  • cons (-): expensive (and availability); more infrastructure; ccTLD requirements (sometimes)

Subdomains with gTLDs (e.g. de.site.com, fr.site.com, etc.)
  • pros (+): easy to set up; can use Webmaster Tools geotargeting; allows different server locations; easy separation of sites
  • cons (-): users might not recognize geotargeting from the URL alone (is "de" the language or country?)

Subdirectories with gTLDs (e.g. site.com/de/, site.com/fr/, etc.)
  • pros (+): easy to set up; can use Webmaster Tools geotargeting; low maintenance (same host)
  • cons (-): users might not recognize geotargeting from the URL alone; single server location; separation of sites harder

URL parameters (e.g. site.com?loc=de, ?country=france, etc.)
  • pros (+): none (not recommended)
  • cons (-): segmentation based on the URL is difficult; users might not recognize geotargeting from the URL alone; geotargeting in Webmaster Tools is not possible

As you can see, geotargeting is not an exact science (even sites using country-code top level domain names can be global in nature), so it's important that you plan for the users from the "wrong" location. One way to do this could be to show links on all pages for users to select their region and language of choice. We'll look at some other possible solutions further on in this blog post series.

Dealing with duplicate content on global websites

Websites that provide content for different regions and in different languages sometimes create content that is the same or similar but available on different URLs. This is generally not a problem as long as the content is for different users in different countries. While we strongly recommend that you provide unique content for each different group of users, we understand that this may not always be possible for all pages and variations from the start. There is generally no need to "hide" the duplicates by disallowing crawling in a robots.txt file or by using a "noindex" robots meta tag. However, if you're providing the same content to the same users on different URLs (for instance, if both "example.de/" and "example.com/de/" show German language content for users in Germany), it would make sense to choose a preferred version and to redirect (or use the "rel=canonical" link element) appropriately.

Do you already have a website that targets multiple regions or do you have questions about the process of planning one? Come to the Help Forum and join the discussion. In following posts, we'll take a look at multi-lingual websites and then look at some special situations that can arise with global websites. Bis bald!

Thursday, March 11, 2010

Microdata support for Rich Snippets

Webmaster Level: All

HTML5 is the fifth major revision of HTML, the core language of the World Wide Web. The HTML5 specification includes a description of microdata, a new markup standard for specifying structured information within web pages.

Today, we’re happy to announce support for microdata for use in rich snippets in addition to our existing support for microformats and RDFa. By using microdata markup in your web pages, you can specify reviews, people profiles, or events information on your web pages that Google may use to improve the presentation of your pages in Google search results.

Here is a simple HTML block showing a section of a review of “L’Amourita Pizza”:


Here is the same HTML with microdata added to specify the restaurant being reviewed, the author and date of the review, and the rating:


Microdata has the nice property of balancing richness with simplicity. As you can see, it’s easy to add markup to your pages using a few HTML attributes like itemscope (to define a new item), itemtype (to specify the type of item being described), and itemprop (to specify a property of that item). Once you’ve added markup to a page, you can test it using the rich snippets testing tool to make sure that Google can parse the data on your page.

As with microformats and RDFa, the vocabulary that we support -- including which item types and item properties are understood by Google -- is specified in our rich snippets documentation as well as on data-vocabulary.org. Marking up your content does not guarantee that rich snippets will show for your site; Google will expand the use of microdata markup gradually to ensure a great user experience.

To get started, take a look at our rich snippets documentation, the vocabulary at data-vocabulary.org, and the rich snippets testing tool mentioned above.

Monday, March 8, 2010

Fetch as Googlebot Mobile and Claim your Sidewiki comment - added to Webmaster Tools Labs!

Webmaster Level: All

Last October, we launched Webmaster Tools Labs and it has been a huge success. Malware Details have helped thousands of users identify pages on their site that may be infected with malicious code, and Fetch as Googlebot has given users more insight into our crawler.

Today, we're happy to announce two additional Labs features:
  • Fetch as Googlebot-Mobile
  • Create your Sidewiki page owner entry

Fetch as Googlebot-Mobile (developed by Ryoichi Imaizumi)

After we launched Fetch as Googlebot, many users with mobile-specific sites asked if we could provide the ability to fetch their pages as Googlebot-Mobile. We thought it was a great idea, and added it as an option to our Fetch as Googlebot feature. We have two mobile options: cHTML (primarily used for Japanese sites), and XHTML/WML.





Create your Sidewiki page owner entry (developed by Derek Prothro)

Sidewiki allows users to contribute helpful information to any webpage using a sidebar in Google Toolbar or a Chrome extension. Webmasters can create a special entry, called a page owner entry, that appears above all entries written by users.



After Sidewiki launched, webmasters kept asking, "How can I put a Sidewiki page owner entry on all pages of my site quickly?" With the feature that we're introducing today, you can now create these page owner entries directly within Webmaster Tools for any site you own.



We're really happy about these new features, and hope you enjoy them as much as we do. Let us know what you think!

Tuesday, March 2, 2010

Sharing the verification love

Webmaster Level: All

Everything is more fun with a friend! We've just added a feature to Webmaster Tools Site Verification to make it easier to share verified ownership of your websites.

In the past, if more than one person needed to be a verified owner of a website, they each had to go through the meta tag or HTML file verification process. That works fine for some situations, but for others it can be challenging. For example, what if you have twenty people who need to be verified owners of your site? Adding twenty meta tags or HTML files could be pretty time consuming. Our new verification delegation feature makes adding new verified owners a snap.



Once you're a verified owner of a website, you can view the Verification Details page (linked from Webmaster Tools or the Verification home page). That page will show you information about the site as well as a list of any other verified owners. At the bottom of the list of owners, you'll now see a button labeled "Add a user...". Click that, enter the user's email address, and that person will instantly become a verified owner for the site! You can remove that ownership at any time by clicking the "Unverify" link next to the person's email address on the Details page.

There are a few important things to keep in mind as you use this feature. First, each site must always have at least one owner who has verified directly (via meta tag or HTML file). If all of the directly verified owners become unverified, the delegated owners may also become unverified. Second, you can only delegate ownership to people with Google Accounts. Finally, remember that anyone you delegate ownership to will have exactly the same access you have. They can delegate to more people, submit URL Removal requests and manage Sitelinks in Webmaster Tools, etc. Only delegate ownership to people you trust!

We hope this makes things a little easier for those of you who need more than one person to be a verified owner of your site. As always, please visit the Webmaster Help Forum if you have any questions.

Google's SEO Report Card

Webmaster Level: All

How many of Google's web pages use a descriptive title tag? Do we use description meta tags? Heading tags? While we always try to focus on the user, could our products use an SEO tune up? These are just some of the questions we set out to answer with Google's SEO Report Card.

Google's SEO Report Card is an effort to provide Google's product teams with ideas on how they can improve their products' pages using simple and accepted optimizations. These optimizations are intended to not only help search engines understand the content of our pages better, but also to improve our users' experience when visiting our sites. Simple steps such as fixing 404s and broken links, simplifying URL choice, and providing easier-to-understand titles and snippets for our pages can benefit both users and search engines. From the start of the project we also wanted to release the report card publicly so other companies and webmasters could learn from the report, which is filled with dozens of examples taken straight from our products' pages.

The project looked at the main pages of 100 different Google products, measuring them across a dozen common optimization categories. Future iterations of the project might look at deeper Google product web pages as well as international ones. We released the report card within Google last month and since then a good number of teams have taken action on it or plan to.

We hope you find our SEO Report Card useful and we'd love to hear your feedback in the comments below or in the Webmaster Central Help Forum. And if you'd like to do your own SEO tune up, a good place to start is by reading our free SEO Beginner's Guide.

Monday, March 1, 2010

Is your site hacked? New Message Center notifications for hacking and abuse

Webmaster Level: All

As we crawl the web, we see bad content inserted on to thousands of hacked sites each day. The number of sites attacked is staggering and the problem is only getting worse. Hackers and spammers target and successfully compromise any sites they can - small personal sites, schools and universities, even multinational corporations. Spam attacks against forums and user content sections of sites, though not as shocking, are even more widespread.

You may have read in an earlier post that we've begun notifying webmasters about new software versions via Webmaster Tools to help protect their sites. Continuing with our effort to provide more useful information to webmasters, we're happy to announce that we'll soon be sending even more notifications to the Message Center.

Starting this month, we will notify more webmasters of more potential issues we've detected on their websites, including suspected hacking and abuse of parts of their sites (such as spammed forums or user-generated content sections). These notifications are meant to alert webmasters of potential issues and provide next steps on how to get their sites fixed and back into Google's search results. If it pertains to a hacking or abuse issue, the notification will point to example URLs exhibiting this type of behavior. These notifications will run in parallel with our existing malware notifications.

A notice of suspected hacking, for example, will look like this:


We've been notifying webmasters of suspected hacking for years, but a recent upgrade to our systems will allow us to notify many more site owners that have been hacked. We hope webmasters will find these notifications useful in making sure their sites are clean and secure, ultimately providing a better user experience for their visitors. In the future, we may extend this effort even further to include other types of vulnerabilities or abuse issues.

Just as before, webmasters who have not already signed up for Webmaster Tools may still do so and retrieve previously sent messages within one year of their send date. And if you don't want to miss out on any important messages, remember to use the email forwarding feature to receive these alerts in your inbox.

If you have any questions, please feel free to ask in our Webmaster Help Forum or leave your comments below.

Tuesday, February 2, 2010

How did you do on the Webmaster Quiz?

Webmaster Level: All

Thanks to all of you who took our webmaster quiz and waited patiently to see how well you did! Today, we're pleased to present the Webmaster Quiz answers! We hope this quiz has provided some clarity on common issues users ask about in the Webmaster Help Forum. We'll go over a few of the questions and answers here, but if some of the answers lead you to ask more questions, we encourage you to continue the discussion in the forum!

1) You have moved your site to a new domain name. For users and search engines, the preferred way to permanently redirect traffic is:

Correct answer: a) 301 redirect

Explanation: A 301 redirect is preferred because it tells search engines, "Ok, this is the new domain I want you to show to users from now on," as opposed to something like a 302 redirect, which tells search engines, "Hey, this is only a temporary redirect--so, uh, I might change the URL soon, okay?" In addition to implementing a 301 redirect, the Change of Address feature in Webmaster Tools can help Google find your new site.
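As a purely illustrative sketch (most servers and CMSs have their own redirect configuration), a permanent redirect can be expressed in a few lines of Python WSGI; both domain names below are placeholders:

   def redirect_app(environ, start_response):
       # Preserve the requested path while sending visitors (and crawlers) to the new domain.
       new_url = "http://www.new-example.com" + environ.get("PATH_INFO", "/")
       start_response("301 Moved Permanently", [("Location", new_url)])
       return [b""]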

2) Your server is going to be moved and unavailable for a day. What should you do?

Correct answer: c) Return "Service Unavailable (503)" with a helpful message to all requests

Explanation: Maybe not as commonly known to webmasters, but very useful if your site is down! Returning a 503 tells crawlers to come back later, rather than letting them crawl and index your "Down for maintenance" pages, which is what can happen if you respond with 200 instead of 503. Check out the Help Center to learn more about HTTP status codes.
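In the same hypothetical WSGI style as the redirect sketch above, a temporary maintenance responder would return 503 along with a Retry-After header so crawlers know when to come back (the retry value is just an example):

   def maintenance_app(environ, start_response):
       # 503 signals a temporary outage; Retry-After hints when to try again (in seconds).
       headers = [("Content-Type", "text/plain"), ("Retry-After", "86400")]
       start_response("503 Service Unavailable", headers)
       return [b"Down for maintenance - please check back tomorrow."]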

3) Your website is not in the index five days after you've put it online; what should you do?

Correct answer: b) Continue working on the site

Explanation: This one is a bit tricky. There could be a number of reasons why your site is not indexed. For example, a site's robots.txt file may contain a directive to inadvertently block crawlers from searching its contents. But the main take-away from this question is that if your site is pretty new, it may just be a matter of time before it gets indexed. You should continue to focus on improving your site for your users.

6) You need to remove 192 PDF files from the /private-files/ folder which have gotten indexed. What's the fastest way to do this?

Correct answer: d) Disallow the folder in robots.txt and request removal of the whole folder in Webmaster Tools.

Explanation: Before removing a directory that you don't want indexed, you need to include the Disallow directive in your robots.txt file to tell search bots not to crawl it anymore.
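Following the same pattern as the /secret/ example earlier, the robots.txt entry for this scenario would be:

   User-agent: *
   Disallow: /private-files/

With that in place, a single directory removal request for /private-files/ in Webmaster Tools covers all of the indexed PDFs at once.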

9) You have a country-coded domain name called example.es. To associate your site with Spain, you need to:

Correct answer: c) None of this is necessary. Google should already associate a domain ending in .es with Spain.

Explanation: Some country-coded domains may overlap with international ones, like .tv--which could also be a site from Tuvalu. But these sort of cases are rare and if they do arise, don't be shy to seek out help on the forum.

Great job to everyone who took the quiz and tested their know-how! And last but certainly not least, kudos to the top scorers! Congratulations on a quiz well done!

40/40:
  • ChrisRaimondi
  • theopeek
  • beussery
39/40:
  • Petro
  • pornel
  • Ian Macfarlane
  • g1smd
  • Mattman
  • thinkpragmatic
  • GLV
  • GoalGorilla
  • rssmarketer
38/40:
  • BartVB
  • Kim Minh Kaplan
  • Ippi
  • Erik Dafforn
  • scole01
  • Konstantin
  • John
  • fer.vazquez
  • eMBe
  • Todd Nemet
  • p.jaroszynski
  • ph0b

Thursday, January 28, 2010

Request visitors' permission before installing software

(Cross-posted on the Google Korea Blog)

Webmaster Level: All

Legitimate websites may require that their visitors install software. These sites often do so to provide their users with additional functionality beyond what's available in standard web browsers, like viewing a special type of document. Please note, however, that if your site requires specific software for your visitors, the implementation of this software installation process is very important. An incorrect implementation can make it appear as though you're installing malware, triggering our malware detection filters and resulting in your site being labeled with a 'This site may harm your computer' malware warning in our search results.

If using your site requires a special software install, you need to first inform visitors why they need to install additional software. Here are two bad examples and one good example of how to handle the situation of a new visitor to such a site:

Bad: Install the required software without giving the visitor a chance to choose whether or not they want to install the software.

Bad: Pop up a confirmation dialog box that prompts the visitor to agree to install the software, without providing enough detail for the visitor to make an informed choice. (This includes the standard ActiveX control installation dialog box, since it doesn't contain enough meaningful information for a visitor to make an informed decision about that particular piece of software.)

Good: Redirect the new visitor to an information page which provides thorough details on why a special software installation is required to use the site. From this page the visitor can initiate the installation of the required software if they decide to proceed with installation.

Has your site been labeled with a malware warning in our search results due to a poorly implemented software installation requirement? Updating the installation process to ensure that visitors are fully informed on why the installation is necessary, and giving them a chance to opt out, should resolve this issue. Once you've got this in place, you can go to Webmaster Tools and request a malware review to expedite the process of removing any malware warnings associated with your site in Google's search results.

Tuesday, January 26, 2010

Protect your site from spammers with reCAPTCHA

Webmaster Level: All

If you allow users to publish content on your website, from leaving comments to creating user profiles, you’ll likely see spammers attempt to take advantage of these mechanisms to generate traffic to their own sites. Having this spammy content on your site isn't fun for anyone. Users may be subjected to annoying advertisements directing them to low-quality or dangerous sites containing scams or malware. And you as a webmaster may be hosting content that violates a search engine's quality guidelines, which can harm your site's standing in search results.

There are ways to handle this abuse, such as moderating comments and reviewing new user accounts, but there is often so much spam created that it can become impossible to keep up with. Spam can easily get to this unmanageable level because most spam isn’t created manually by a human spammer. Instead, spammers use computer programs called “bots” to automatically fill out web forms to create spam, and these bots can generate spam much faster than a human can review it.

To level the playing field, you can take steps to make sure that only humans can interact with potentially spammable features of your website. One way to determine which of your visitors are human is by using a CAPTCHA, which stands for "completely automated public Turing test to tell computers and humans apart." A typical CAPTCHA contains an image of distorted letters that humans can read but computers cannot easily interpret. Here's an example:


You can easily take advantage of this technology on your own site by using reCAPTCHA, a free service owned by Google. One unique aspect of reCAPTCHA is that data collected from the service is used to improve the process of scanning text, such as from books or newspapers. By using reCAPTCHA, you're not only protecting your site from spammers; you're helping to digitize the world's books.

Luis von Ahn, one of reCAPTCHA's co-founders, gives more details about how the service works in the video below:


If you’d like to implement reCAPTCHA for free on your own site, you can sign up here. Plugins are available for easy installation on popular applications and programming environments such as WordPress and PHP.

Friday, January 22, 2010

Introducing a new Rich Snippets format: Events

Webmaster Level: All

Last year we introduced Rich Snippets, a new feature that makes it possible to surface structured data from your pages on Google's search results. So far, user reaction to Rich Snippets has been enthusiastic -- after all, Rich Snippets help people make more informed clicks and find what they need even faster.

We originally introduced Rich Snippets with two formats: reviews and people. Later in the year we added support for marking up video information which is used to improve Video Search. Today, we're excited to kick off the new year by adding support for events.

Events markup is based on the hCalendar microformat. Here's an example of what the new events Rich Snippets will look like:


The new format shows links to specific events on the page along with dates and locations. It provides a fast and convenient way for users to determine if a page has events they may be interested in.

If you have event listings on your site, we encourage you to review the events documentation we've prepared to help you get started. Please note, however, that marking up your content is not a guarantee that Rich Snippets will show for your site. Just as we did for previous formats, we will take a gradual approach to incorporating the new event snippets to ensure a great user experience along the way.

Stay tuned for more developments in Rich Snippets throughout the year!

Thursday, January 21, 2010

Google SEO resources for beginners

Webmaster Level: Beginner

Want to eat healthier and exercise more in 2010? That's tough! Want to learn about search engine optimization (SEO) so you can disregard the rumors and know what's important? That's easy! Here's how to gain SEO knowledge as you go about your new start to 2010:

Step 1: Absorb the basics
  • If you like to learn by reading, download our SEO Starter Guide for reading while you're on an exercise bike, training for Ironman.
  • Or, if you're more a video watcher, try listening to my "Search Friendly Development" session while you're cleaning your house. Keep in mind that some parts of the presentation are a little more technical.

  • For good measure, and because at some point you'll hear references to them, check out our webmaster guidelines for yourself.

Step 2: Explore details that pique your interest
Are you done with the basics but now you have some questions? Good for you! Try researching a particular topic in our Webmaster Help Center. For example, do you want more information about crawling and indexing or understanding what links are all about?


Step 3: Verify ownership of your site in Webmaster Tools
It takes a little bit of skill, but we have tons of help for verification. Once you verify ownership of your site (i.e., signal to Google that you're the owner), you can receive site-specific information from Google in Webmaster Tools, such as messages about the crawlability of your site:


A sample message regarding the crawlability of your site


Step 4: Research before you do anything drastic
Usually the basics (e.g., good content/service and a crawlable site with indexable information) are the necessities for SEO. You may hear or read differently, but before you do anything drastic on your site, such as robots.txt disallow'ing all of your directories or revamping your entire site architecture, please try researching the change first (Steps 1 and 2 above are a good start) and asking in the Webmaster Help Forum if you're still unsure.

Wednesday, January 20, 2010

State of the Index 2009

Webmaster Level: All

At PubCon in Las Vegas in November 2009, I gave a "State of the Index" talk which covers what Google has done for users, web developers, and webmasters in the last year. I recently recreated it on video for those of you who didn't make it to the conference. You can watch it below:


And here are the slides if you'd like to follow along:


Tuesday, January 19, 2010

Test your webmaster know-how!

Webmaster Level: All

We thought it might be fun and educational to create a quiz for webmasters about issues we commonly see in the Webmaster Help Forum. Together with our awesome Bionic Posters, we've tried to come up with questions and answers that reflect recurring concerns in the forum and some information that may not be well known. Some things to keep in mind when taking this quiz:
  • The quiz will be available to take from today until Wednesday, January 27 at 5PM PST.
  • It doesn't cover all facets of webmaster problems that arise, and—as with any test—it is at best only a fun way to test your webmaster prowess ;). We leave discussion of specific cases to the forum.
  • We've set up the quiz using our very own Google Docs. This means you won't see results right away, but we plan to write a follow-up blog post explaining answers and listing top scorers. Be sure to save your answers or print out your completed quiz before submitting! This way you can check your answers against the correct ones when we publish them.
  • It's just for fun!

Thursday, January 14, 2010

Answering your December Grab Bag questions

Webmaster Level: All

You asked and Matt Cutts answered. It's time to answer the latest round of Grab Bag questions! Here's the first answer, complete with Matt's new hairstyle:


We have a lot of videos ready to share, so we're not currently taking new questions for the Grab Bag. If you have a question that you would like answered, your best bet as always is to head to our Webmaster Help Forum, where you'll find plenty of knowledgeable webmasters, including some Bionic Posters!

To be kept up-to-date on our latest video releases, you can follow @googlewmc on Twitter, where we'll announce new videos and blog posts as they're published.