Wednesday, October 10, 2012
Make the web faster with mod_pagespeed, now out of Beta
If your page is on the web, speed matters. For developers and webmasters, making your page faster shouldn’t be a hassle, which is why we introduced mod_pagespeed in 2010. Since then the development team has been working to improve the functionality, quality and performance of this open-source Apache module that automatically optimizes web pages and their resources. Now, after almost two years and eighteen releases, we are announcing that we are taking off the Beta label.
We’re committed to working with the open-source community to continue evolving mod_pagespeed, including more, better and smarter optimizations and support for other web servers. Over 120,000 sites are already using mod_pagespeed to improve the performance of their web pages using the latest techniques and trends in optimization. The product is used worldwide by individual sites, and is also offered by hosting providers, such as DreamHost, Go Daddy and content delivery networks like EdgeCast. With the move out of beta we hope that even more sites will soon benefit from the web performance improvements offered through mod_pagespeed.
mod_pagespeed is a key part of our goal to help make the web faster for everyone. Users prefer faster sites and we have seen that faster pages lead to higher user engagement, conversions, and retention. In fact, page speed is one of the signals in search ranking and ad quality scores. Besides evangelizing for speed, we offer tools and technologies to help measure, quantify, and improve performance, such as Site Speed Reports in Google Analytics, PageSpeed Insights, and PageSpeed Optimization products. In fact, both mod_pagespeed and PageSpeed Service are based on our open-source PageSpeed Optimization Libraries project, and are important ways in which we help websites take advantage of the latest performance best practices.
To learn more about mod_pagespeed and how to incorporate it in your site, watch our recent Google Developers Live session or visit the mod_pagespeed product page.
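For readers who want to try it out, here is a minimal configuration sketch. It assumes an Apache server where the mod_pagespeed module is already installed and loaded; the filter choices and the /admin/ exclusion are illustrative assumptions, not required settings.
<IfModule pagespeed_module>
    # Turn rewriting on; the core set of filters is enabled by default.
    ModPagespeed on
    # A couple of extra, low-risk optimizations beyond the core filters.
    ModPagespeedEnableFilters collapse_whitespace,remove_comments
    # Example of excluding a section of the site from rewriting.
    ModPagespeedDisallow "*/admin/*"
</IfModule>
After changing the configuration, reload Apache and compare the page source before and after to see which optimizations were applied.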
Labels:
advanced,
general tips,
performance
Friday, September 14, 2012
Answering the top questions from government webmasters
Webmaster level: Beginner - Intermediate
Government sites, from city to state to federal agencies, are extremely important to Google Search. For one thing, governments have a lot of content — and government websites are often the canonical source of information that’s important to citizens. Around 20 percent of Google searches are for local information, and local governments are experts in their communities.
That’s why I’ve spoken at the National Association of Government Webmasters (NAGW) national conference for the past few years. It’s always interesting speaking to webmasters about search, but the people running government websites have particular concerns and questions. Since some questions come up frequently I thought I’d share this FAQ for government websites.
Question 1: How do I fix an incorrect phone number or address in search results or Google Maps?
Although managing their agency’s site is plenty of work, government webmasters are often called upon to fix problems found elsewhere on the web too. By far the most common question I’ve taken is about fixing addresses and phone numbers in search results. In this case, government site owners really can do it themselves, by claiming their Google+ Local listing. Incorrect or missing phone numbers, addresses, and other information can be fixed by claiming the listing.
Most locations in Google Maps have a Google+ Local listing — businesses, offices, parks, landmarks, etc. I like to use the San Francisco Main Library as an example: it has contact info, detailed information like the hours they’re open, user reviews and fun extras like photos. When we think users are searching for libraries in San Francisco, we may display a map and a listing so they can find the library as quickly as possible.
If you work for a government agency and want to claim a listing, we recommend using a shared Google Account with an email address at your .gov domain if possible. Usually, ownership of the page is confirmed via a phone call or post card.
Question 2: I’ve claimed the listing for our office, but I have 43 different city parks to claim in Google Maps, and none of them have phones or mailboxes. How do I claim them?
Use the bulk uploader! If you have 10 or more listings / addresses to claim at the same time, you can upload a specially-formatted spreadsheet. Go to www.google.com/places/, click the "Get started now" button, and then look for the "bulk upload" link.
If you run into any issues, use the Verification Troubleshooter.
Question 3: We're moving from a .gov domain to a new .com domain. How should we move the site?
We have a Help Center article with more details, but the basic process involves the following steps:
- Make sure you have both the old and new domain verified in the same Webmaster Tools account.
- Use a 301 redirect on all pages to tell search engines your site has moved permanently.
- Don't do a single redirect from all pages to your new home page — this gives a bad user experience.
- If there's no 1:1 match between pages on your old site and your new site (a 1:1 match is recommended), try to redirect to a new page with similar content.
- If you can't do redirects, consider cross-domain canonical links.
- Make sure to check if the new location is crawlable by Googlebot using the Fetch as Google feature in Webmaster Tools.
- Use the Change of Address tool in Webmaster Tools to notify Google of your site's move.
- Have a look at the Links to Your Site in Webmaster Tools and inform the important sites that link to your content about your new location.
- We recommend not implementing other major changes at the same time, like large-scale content, URL structure, or navigational updates.
- To help Google pick up new URLs faster, use the Fetch as Google tool to ask Google to crawl your new site, and submit a Sitemap listing the URLs on your new site.
- To prevent confusion, it's best to retain control of your old site’s domain and keep redirects in place for as long as possible — at least 180 days.
If you're only moving part of your site to the new domain, many of the same steps apply: verify both sites in Webmaster Tools, use 301 redirects, clean up old links, etc. You don't need to use the Change of Address form in Webmaster Tools in that case, since only part of your site is moving. If for some reason you’ll have some of the same content on both sites, you may want to include a cross-domain canonical link pointing to the preferred domain.
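To make the 301 redirect step concrete, here is a minimal .htaccess sketch for a whole-domain move. It assumes Apache with mod_rewrite enabled, and the hostnames are hypothetical placeholders; if only part of the site is moving, scope the pattern to that path instead.
RewriteEngine On
# Match requests arriving on the old host...
RewriteCond %{HTTP_HOST} ^www\.example\.gov$ [NC]
# ...and permanently (301) redirect each URL to the same path on the new domain.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]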
Question 4: We've done a ton of work to create unique titles and descriptions for pages. How do we get Google to pick them up?
First off, that's great! Better titles and descriptions help users decide to click through to get the information they need on your page. The government webmasters I’ve spoken with care a lot about the content and organization of their sites, and work hard to provide informative text for users.
Google's generation of page titles and descriptions (or "snippets") is completely automated and takes into account both the content of a page as well as references to it that appear on the web. Changes are picked up as we recrawl your site. But you can do two things to let us know about URLs that have changed:
- Submit an updated XML Sitemap so we know about all of the pages on your site (a minimal example follows this list).
- In Webmaster Tools, use the Fetch as Google feature on a URL you’ve updated. Then you can choose to submit it to the index.
- You can choose to submit all of the linked pages as well — if you’ve updated an entire section of your site, you might want to submit the main page or an index page for that section to let us know about a broad collection of URLs.
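Here is a minimal Sitemap sketch for the first item above; the URLs and dates are hypothetical placeholders.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; <lastmod> indicates when the page last changed. -->
  <url>
    <loc>http://www.example.gov/parks/</loc>
    <lastmod>2012-09-10</lastmod>
  </url>
  <url>
    <loc>http://www.example.gov/parks/riverside-park</loc>
    <lastmod>2012-09-12</lastmod>
  </url>
</urlset>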
Question 5: How do I get into the YouTube government partner program?
For this question, I have bad news, good news, and then even better news. On the one hand, the government partner program has been discontinued. But don’t worry, because most of the features of the program are now available to your regular YouTube account. For example, you can now upload videos longer than 10 minutes.
Did I say I had even better news? YouTube has added a lot of functionality useful for governments in the past year:
- You can now broadcast live streaming video to YouTube via Hangouts On Air (requires a Google+ account).
- You can link your YouTube account with your Webmaster Tools account, making it the "official channel" for your site.
- Automatic captions continue to get better and better, supporting more languages.
Posted by Jason Morrison, Search Quality Team
Tuesday, June 26, 2012
SEO essentials for startups in under 10 minutes
Webmaster Level: Beginner to Intermediate
Wondering how to be search-friendly but lacking time for SEO research? We’d like to help! Meta keywords tag? Google Search ignores it. Meta description? Good to include.
If you:
- Work on a company website that’s under 50ish pages.
- Hope to rank well for your company name and a handful of related terms (not lots of terms like a news agency or e-commerce site).
- Want to be smart about search engines and attracting searchers, but haven’t kept up with the latest search news.
Then here’s everything I’d tell a startup if I had ten minutes as their SEO consultant.
More tips at developers.google.com/startups. Best of luck!
Written by Maile Ohye, Developer Programs Tech Lead
Labels:
beginner,
general tips,
intermediate
Tuesday, May 22, 2012
Introducing Webmaster Academy
Webmaster Level: Beginner
Looking through all of the information in Webmaster Central can feel overwhelming, especially if you’re just getting started with a website. This is why we’re excited to introduce a new set of educational materials in a program called Webmaster Academy. Webmaster Academy provides practical and easy-to-understand lessons for beginner webmasters to help you improve your site with topics like getting your site in our index, providing search engines with helpful information about your video and image content, and understanding Webmaster Tools features.
We’ve organized the content to represent what beginner webmasters should know in a way that’s both structured and modular, meaning you can go through the whole curriculum, or pick and choose your own path. Once you’ve read the articles, you can easily delve deeper into each topic, as we provide links to more in-depth articles. Most lessons are also accompanied by a video from the Webmaster Central YouTube Channel. If you’re looking to understand search and improve your site, Webmaster Academy is for you!
Have feedback? Excellent. Post it in our Webmaster Help Forum.
Posted by Julian Prentice, Search Quality Team
Contributors: Oliver Barrett & Alexi Douvas, Search Quality Team
Labels:
beginner,
general tips
Wednesday, April 25, 2012
1000 Words About Images
Webmaster level: All
Creativity is an important aspect of our lives and can enrich nearly everything we do. Say I'd like to make my teammate a cup of cool-looking coffee, but my creative batteries are empty; this would be (and is!) one of the many times when I look for inspiration on Google Images.
The images you see in our search results come from publishers of all sizes — bloggers, media outlets, stock photo sites — who have embedded these images in their HTML pages. Google can index image types formatted as BMP, GIF, JPEG, PNG and WebP, as well as SVG.
But how does Google know that the images are about coffee and not about tea? When our algorithms index images, they look at the textual content on the page the image was found on to learn more about the image. We also look at the page's title and its body; we might also learn more from the image’s filename, anchor text that points to it, and its "alt text"; we may use computer vision to learn more about the image, and may also use the caption provided in the Image Sitemap if that text also exists on the page.
To help us index your images, make sure that:
- we can crawl both the HTML page the image is embedded in, and the image itself;
- the image is in one of our supported formats: BMP, GIF, JPEG, PNG, WebP or SVG;
- the image filename is related to the image’s content;
- the alt attribute of the image describes the image in a human-friendly way;
- and finally, it also helps if the HTML page’s textual contents as well as the text near the image are related to the image.
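As a small illustration of those points, here is a hypothetical snippet where the filename, the alt attribute, and the surrounding text all describe the image:
<!-- Descriptive filename and human-friendly alt text. -->
<img src="/images/latte-art-fern.jpg"
     alt="Latte art: a fern pattern poured into a cup of coffee" />
<p>Our barista's fern-pattern latte art, poured fresh this morning.</p>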
Q: Why do I sometimes see Googlebot crawling my images, rather than Googlebot-Image?
A: Generally this happens when it’s not clear that a URL will lead to an image, so we crawl the URL with Googlebot first. If we find the URL leads to an image, we’ll usually revisit with Googlebot-Image. Because of this, it’s generally a good idea to allow crawling of your images and pages by both Googlebot and Googlebot-Image.
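A minimal robots.txt sketch reflecting that advice: neither crawler is blocked, so both pages and images stay reachable. (An empty Disallow line means nothing is disallowed.)
# Allow both crawlers to fetch pages and images.
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow: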
Q: Is it true that there’s a maximum file size for the images?
A: We’re happy to index images of any size; there’s no file size restriction.
Q: What happens to the EXIF, XMP and other metadata my images contain?
A: We may use any information we find to help our users find what they’re looking for more easily. Additionally, information like EXIF data may be displayed in the right-hand sidebar of the interstitial page that appears when you click on an image.
Q: Should I really submit an Image Sitemap? What are the benefits?
A: Yes! Image Sitemaps help us learn about your new images and may also help us learn what the images are about.
Q: I’m using a CDN to host my images; how can I still use an Image Sitemap?
A: Cross-domain restrictions apply only to the Sitemap’s <loc> tag. In Image Sitemaps, the <image:loc> tag is allowed to point to a URL on another domain, so using a CDN for your images is fine. We also encourage you to verify the CDN’s domain name in Webmaster Tools so that we can inform you of any crawl errors that we might find.
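Here is a minimal Image Sitemap sketch for the CDN case: the page lives on the site’s own domain while the image URL points at a hypothetical CDN hostname.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.example.com/coffee/</loc>
    <image:image>
      <!-- The image itself is served from the CDN domain. -->
      <image:loc>http://cdn.example-cdn.net/images/latte-art-fern.jpg</image:loc>
      <image:caption>Latte art: a fern pattern poured into a cup of coffee</image:caption>
    </image:image>
  </url>
</urlset>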
Q: Is it a problem if my images can be found on multiple domains or subdomains I own — for example, CDNs or related sites?
A: Generally, the best practice is to have only one copy of any type of content. If you’re duplicating your images across multiple hostnames, our algorithms may pick one copy as the canonical copy of the image, which may not be your preferred version. This can also lead to slower crawling and indexing of your images.
Q: We sometimes see the original source of an image ranked lower than other sources; why is this?
A: Keep in mind that we use the textual content of a page when determining the context of an image. For example, if the original source is a page from an image gallery that has very little text, it can happen that a page with more textual context is chosen to be shown in search. If you feel you've identified very bad search results for a particular query, feel free to use the feedback link below the search results or to share your example in our Webmaster Help Forum.
SafeSearch
Our algorithms use a great variety of signals to decide whether an image — or a whole page, if we’re talking about Web Search — should be filtered from the results when the user’s SafeSearch filter is turned on. In the case of images some of these signals are generated using computer vision, but the SafeSearch algorithms also look at simpler things such as where the image was used previously and the context in which the image was used. One of the strongest signals, however, is self-marked adult pages. We recommend that webmasters who publish adult content mark up their pages with one of the following meta tags:
<meta name="rating" content="adult" />
<meta name="rating" content="RTA-5042-1996-1400-1577-RTA" />
Many users prefer not to have adult content included in their search results (especially if kids use the same computer). When a webmaster provides one of these meta tags, it helps to provide a better user experience because users don't see results which they don't want to or expect to see.
As with all algorithms, sometimes it may happen that SafeSearch filters content inadvertently. If you think your images or pages are mistakenly being filtered by SafeSearch, please let us know using the following form.
If you need more information about how we index images, please check out the section of our Help Center dedicated to images, read our SEO Starter Guide which contains lots of useful information, and if you have more questions please post them in the Webmaster Help Forum.
Written by Gary Illyes, Webmaster Trends Analyst
Labels:
crawling and indexing,
general tips,
images
Location:
Zurich, Switzerland
Tuesday, April 24, 2012
How to move your content to a new location
Webmaster level: Intermediate
While maintaining a website, webmasters may decide to move the whole website or parts of it to a new location. For example, you might move content from a subdirectory to a subdomain, or to a completely new domain. Changing the location of your content can involve a bit of effort, but it’s worth doing it properly.
To help search engines understand your new site structure better and make your site more user-friendly, make sure to follow these guidelines:
- It’s important to redirect all users and bots that visit your old content location to the new content location using 301 redirects. To highlight the relationship between the two locations, make sure that each old URL points to the new URL that hosts similar content. If you’re unable to use 301 redirects, you may want to consider using cross domain canonicals for search engines instead.
- Check that you have both the new and the old location verified in the same Google Webmaster Tools account.
- Make sure to check if the new location is crawlable by Googlebot using the Fetch as Googlebot feature. It’s important to make sure Google can actually access your content in the new location. Also make sure that the old URLs are not blocked by a robots.txt disallow directive, so that the redirect or rel=canonical can be found.
- If you’re moving your content to an entirely new domain, use the Change of address option under Site configuration in Google Webmaster Tools to let us know about the change.
[Image: Tell us about moving your content via Google Webmaster Tools]
- If you've also changed your site's URL structure, make sure that it's possible to navigate it without running into 404 error pages. Google Webmaster Tools may prove useful in investigating potentially broken links. Just look for Diagnostics > Crawl errors for your new site.
- Check your Sitemap and verify that it’s up to date.
- Once you've set up your 301 redirects, you can keep an eye on users to your 404 error pages to check that users are being redirected to new pages, and not accidentally ending up on broken URLs. When a user comes to a 404 error page on your site, try to identify which URL they were trying to access, why this user was not redirected to the new location of your content, and then make changes to your 301 redirect rules as appropriate.
- Have a look at the Links to your site in Google Webmaster Tools and inform the important sites that link to your content about your new location.
- If your site’s content is specific to a particular region you may want to double check the geotargeting preferences for your new site structure in Google Webmaster Tools.
- As a general rule of thumb, try to avoid running two crawlable sites with completely or largely identical content without a 301 redirection or specifying a rel=”canonical” (see the sketch after this list).
- Lastly, we recommend not implementing other major changes when you’re moving your content to a new location, like large-scale content, URL structure, or navigational updates. Changing too much at once may confuse users and search engines.
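As a sketch of the rel="canonical" fallback mentioned above (hostnames are hypothetical), the old page would carry this element in its <head>, pointing at the preferred copy on the new domain:
<!-- Placed in the <head> of the page at the old location. -->
<link rel="canonical" href="http://www.example.com/widgets/" />
Keep in mind this is a fallback: a 301 redirect remains the preferred signal when you can implement one.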
We hope you find these suggestions useful. If you happen to have further questions on how to move your content to a new location we’d like to encourage you to drop by our Google Webmaster Help Forum and seek advice from expert webmasters.
Written by Fili Wiese (Ad Traffic Quality) & Kaspar Szymanski (Search Quality)
Monday, March 19, 2012
Five common SEO mistakes (and six good ideas!)
Webmaster Level: Beginner to Intermediate
To help you avoid common mistakes webmasters face with regard to search engine optimization (SEO), I filmed a video outlining five common mistakes I’ve noticed in the SEO industry. Almost four years ago, we also gathered information from all of you (our readers) about your SEO recommendations and updated our related Help Center article given your feedback. Much of the same advice from 2008 still holds true today -- here’s to more years ahead building a great site!
If you’re short on time, here’s the gist:
Avoid these common mistakes
1. Having no value proposition: Try not to assume that a site should rank #1 without knowing why it’s helpful to searchers (and better than the competition :)
2. Segmented approach: Be wary of setting SEO-related goals without making sure they’re aligned with your company’s overall objectives and the goals of other departments. For example, in tandem with your work optimizing product pages (and the full user experience once they come to your site), also contribute your expertise to your Marketing team’s upcoming campaign. So if Marketing is launching new videos or a more interactive site, be sure that searchers can find their content, too.
3. Time-consuming workarounds: Avoid implementing a hack rather than researching new features or best practices that could simplify development (e.g., changing the timestamp on an updated URL so it’s crawled more quickly instead of easily submitting the URL through Fetch as Googlebot).
4. Caught in SEO trends: Consider spending less time obsessing about the latest “trick” to boost your rankings and instead focus on the fundamental tasks/efforts that will bring lasting visitors.
5. Slow iteration: Aim to be agile rather than promote an environment where the infrastructure and/or processes make improving your site, or even testing possible improvements, difficult.
Six fundamental SEO tips
1. Do something cool: Make sure your site stands out from the competition -- in a good way!
2. Include relevant words in your copy: Try to put yourself in the shoes of searchers. What would they query to find you? Your name/business name, location, products, etc., are important. It's also helpful to use the same terms in your site that your users might type (e.g., you might be a trained “flower designer” but most searchers might type [florist]), and to answer the questions they might have (e.g., store hours, product specs, reviews). It helps to know your customers.
3. Be smart about your tags and site architecture: Create unique title tags and meta descriptions; include Rich Snippets markup from schema.org where appropriate (see the sketch after this list). Have intuitive navigation and good internal links.
4. Sign up for email forwarding in Webmaster Tools: Help us communicate with you, especially when we notice something awry with your site.
5. Attract buzz: Natural links, +1s, likes, follows... In every business there's something compelling, interesting, entertaining, or surprising that you can offer or share with your users. Provide a helpful service, tell fun stories, paint a vivid picture and users will share and reshare your content.
6. Stay fresh and relevant: Keep content up-to-date and consider options such as building a social media presence (if that’s where a potential audience exists) or creating an ideal mobile experience if your users are often on-the-go.
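To illustrate tip 3, here is a hypothetical snippet for a florist’s product page: a unique title, a descriptive meta description, and a touch of schema.org markup. The business details are made up for the example.
<head>
  <title>Hand-tied rose bouquets | Jane's Flower Shop, Portland</title>
  <meta name="description"
        content="Order hand-tied rose bouquets for same-day delivery in Portland. Open Mon-Sat, 9am-6pm.">
</head>
<body>
  <!-- schema.org Product markup using microdata. -->
  <div itemscope itemtype="http://schema.org/Product">
    <h1 itemprop="name">Hand-tied rose bouquet</h1>
    <p itemprop="description">A dozen locally grown roses, arranged to order.</p>
  </div>
</body>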
Good luck to everyone!
Written by Maile Ohye, Developer Programs Tech Lead
Labels:
beginner,
general tips,
intermediate
Tuesday, March 6, 2012
Keeping your free hosting service valuable for searchers
Webmaster level: Advanced
Free web hosting services can be great! Many of these services have helped to lower costs and technical barriers for webmasters and they continue to enable beginner webmasters to start their adventure on the web. Unfortunately, sometimes these lower barriers (meant to encourage less techy audiences) can attract some dodgy characters like spammers who look for cheap and easy ways to set up dozens or hundreds of sites that add little or no value to the web. When it comes to automatically generated sites, our stance remains the same: if the sites do not add sufficient value, we generally consider them as spam and take appropriate steps to protect our users from exposure to such sites in our natural search results.
If a free hosting service begins to show patterns of spam, we make a strong effort to be granular and tackle only spammy pages or sites. However, in some cases, when the spammers have pretty much taken over the free web hosting service or a large fraction of the service, we may be forced to take more decisive steps to protect our users and remove the entire free web hosting service from our search results. To prevent this from happening, we would like to help owners of free web hosting services by sharing what we think may help you save valuable resources like bandwidth and processing power, and also protect your hosting service from these spammers:
- Publish a clear abuse policy and communicate it to your users, for example during the sign-up process. This step will contribute to transparency on what you consider to be spammy activity.
- In your sign-up form, consider using CAPTCHAs or similar verification tools to only allow human submissions and prevent automated scripts from generating a bunch of sites on your hosting service. While these methods may not be 100% foolproof, they can help to keep a lot of the bad actors out.
- Try to monitor your free hosting service for other spam signals like redirections, large numbers of ad blocks, certain spammy keywords, large sections of escaped JavaScript code, etc. Using the site: operator query or Google Alerts may come in handy if you’re looking for a simple, cost efficient solution.
- Keep a record of signups and try to identify typical spam patterns like form completion time, number of requests sent from the same IP address range, user-agents used during signup, user names or other form-submitted values chosen during signup, etc. Again, these may not always be conclusive.
- Keep an eye on your webserver log files for sudden traffic spikes, especially when a newly-created site is receiving this traffic, and try to identify why you are spending more bandwidth and processing power.
- Try to monitor your free web hosting service for phishing and malware-infected pages. For example, you can use the Google Safe Browsing API to regularly test URLs from your service, or sign up to receive alerts for your AS.
- Come up with a few sanity checks. For example, if you’re running a local Polish free web hosting service, what are the odds of thousands of new and legitimate sites in Japanese being created overnight on your service? There are a number of tools you may find useful for language detection of newly created sites, for example language detection libraries or the Google Translate API v2.
Last but not least, if you run a free web hosting service be sure to monitor your services for sudden activity spikes that may indicate a spam attack in progress.
For more tips on running a quality hosting service, have a look at our previous post. Lastly, be sure to sign up and verify your site in Google Webmaster Tools so we may be able to notify you when needed or if we see issues.
Written by Fili Wiese (Ad Traffic Quality Team) & Kaspar Szymanski (Search Quality Team)
Labels:
advanced,
general tips
Thursday, February 9, 2012
Preparing your site for a traffic spike
Webmaster level: Intermediate
It’s a moment any site owner both looks forward to, and dreads: a huge surge in traffic to your site (yay!) can often cause your site to crash (boo!). Maybe you’ll create a piece of viral content, or get Slashdotted, or maybe Larry Page will get a tattoo and your site on tech tattoos will be suddenly in vogue.
Many people go online immediately after a noteworthy event—a political debate, the death of a celebrity, or a natural disaster—to get news and information about that event. This can cause a rapid increase in traffic to websites that provide relevant information, and may even cause sites to crash at the moment they’re becoming most popular. While it’s not always possible to anticipate such events, you can prepare your site in a variety of ways so that you’ll be ready to handle a sudden surge in traffic if one should occur:
- Prepare a lightweight version of your site.
  Consider maintaining a lightweight version of your website; you can then switch all of your traffic over to this lightweight version if you start to experience a spike in traffic (see the sketch after this list). One good way to do this is to have a mobile version of your site, and to make the mobile site available to desktop/PC users during periods of high traffic. Another low-effort option is to just maintain a lightweight version of your homepage, since the homepage is often the most-requested page of a site as visitors start there and then navigate out to the specific area of the site that they’re interested in. If a particular article or picture on your site has gone viral, you could similarly create a lightweight version of just that page.
  A couple of tips for creating lightweight pages:
  - Exclude decorative elements like images or Flash wherever possible; use text instead of images in the site navigation and chrome, and put most of the content in HTML.
  - Use static HTML pages rather than dynamic ones; the latter place more load on your servers. You can also cache the static output of dynamic pages to reduce server load.
- Take advantage of stable third-party services.
  Another alternative is to host a copy of your site on a third-party service that you know will be able to withstand a heavy stream of traffic. For example, you could create a copy of your site—or a pared-down version with a focus on information relevant to the spike—on a platform like Google Sites or Blogger; use services like Google Docs to host documents or forms; or use a content delivery network (CDN).
- Use lightweight file formats.
  If you offer downloadable information, try to make the downloaded files as small as possible by using lightweight file formats. For example, offering the same data as a plain text file rather than a PDF can allow users to download the exact same content at a fraction of the filesize (thereby lightening the load on your servers). Also keep in mind that, if it’s not possible to use plain text files, PDFs generated from textual content are more lightweight than PDFs with images in them. Text-based PDFs are also easier for Google to understand and index fully.
- Make tabular data available in CSV and XML formats.
  If you offer numerical or tabular data (data displayed in tables), we recommend also providing it in CSV and/or XML format. These filetypes are relatively lightweight and make it easy for external developers to use your data in external applications or services in cases where you want the data to reach as many people as possible, such as in the wake of a natural disaster.
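Here is one way to implement the "switch to a lightweight version" idea from the first item, as a minimal .htaccess sketch. It assumes Apache with mod_rewrite, a hypothetical /lite/ directory holding the lightweight pages, and a flag file you create when traffic spikes.
RewriteEngine On
# Only switch while the operator-created flag file exists.
RewriteCond %{DOCUMENT_ROOT}/traffic-spike.flag -f
# Leave requests for the lightweight copy and static assets alone.
RewriteCond %{REQUEST_URI} !^/lite/
RewriteCond %{REQUEST_URI} !\.(css|js|png|jpg|gif)$ [NC]
# Send everything else to the lightweight page with a temporary (302) redirect.
RewriteRule ^ /lite/index.html [R=302,L]
Deleting the flag file switches traffic back to the full site without another config change.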
Posted by Susan Moskwa, Webmaster Trends Analyst
Labels:
general tips,
intermediate,
performance
Sunday, September 11, 2011
Îñţérñåţîöñåļîžåţîöñ
Webmaster level: Intermediate
So you’re going global, and you need your website to follow. Should be a simple case of getting the text translated and you’re good to go, right? Probably not. The Google Webmaster Team frequently builds sites that are localized into over 40 languages, so here are some things that we take into account when launching our pages in both other languages and regions.
(Even if you think you might be immune to these issues because you only offer content in English, it could be that non-English language visitors are using tools like Google Translate to view your content in their language. This traffic should show up in your analytics dashboard, so you can get an idea of how many visitors are not viewing your site in the way it’s intended.)
More languages != more HTML templates
We can’t recommend this enough: reuse the same template for all language versions, and always try to keep the HTML of your template simple.
Keeping the HTML code the same for all languages has its advantages when it comes to maintenance. Hacking around with the HTML code for each language to fix bugs doesn’t scale–keep your page code as clean as possible and deal with any styling issues in the CSS. To name just one benefit of clean code: most translation tools will parse out the translatable content strings from the HTML document and that job is made much easier when the HTML is well-structured and valid.
How long is a piece of string?
If your design relies on text playing nicely with fixed-size elements, then translating your text might wreak havoc. For example, your left-hand side navigation text is likely to translate into much longer strings of text in several languages–check out the difference in string lengths between some English and Dutch language navigation for the same content. Be prepared for navigation titles that might wrap onto more than one line by figuring out your line height to accommodate this (also worth considering when you create your navigation text in English in the first place).
Variable word lengths cause particular issues in form labels and controls. If your form layout displays labels on the left and fields on the right, for example, longer text strings can flow over into two lines, whereas shorter text strings do not seem associated with their form input fields–both scenarios ruin the design and impede the readability of the form. Also consider the extra styling you’ll need for right-to-left (RTL) layouts (more on that later). For these reasons we design forms with labels above fields, for easy readability and styling that will translate well across languages.
Also avoid fixed-height columns–if you’re attempting to neaten up your layout with box backgrounds that match in height, chances are when your text is translated, the text will overrun areas that were only tall enough to contain your English content. Think about whether the UI elements you’re planning to use in your design will work when there is more or less text–for instance, horizontal vs. vertical tabs.
On the flip side
Source editing for bidirectional HTML can be problematic because many editors have not been built to support the Unicode bidirectional algorithm (more research on the problems and solutions). In short, the way your markup is displayed might get garbled:
<p>ابةتث <img src="foo.jpg" alt=" جحخد"< ذرزسش!</p>
Our own day-to-day usage has shown the following editors to currently provide decent solutions for bidirectional editing: particularly Coda, and also Dreamweaver, IntelliJ IDEA and JEditX.
When designing for RTL languages you can build most of the support you need into the core CSS and use the directional attribute of the html element (for backwards compatibility) in combination with a class on the body element. As always, keeping all styles in one core stylesheet makes for better maintainability.
Some key styling issues to watch out for: any elements floated right will need to be floated left and vice versa; extra padding or margin widths applied to one side of an element will need to be overridden and switched, and any text-align attributes should be reversed.
We generally use the following approach, including using a class on the body tag rather than a html[dir=rtl] CSS selector because this is compatible with older browsers:
Elements:
<body class="rtl">
<h1><a href="http://www.blogger.com/"><img alt="Google" src="http://www.google.com/images/logos/google_logo.png" /></a> Heading</h1>
Left-to-right (default) styling:
h1 {
height: 55px;
line-height: 2.05;
margin: 0 0 25px;
overflow: hidden;
}
h1 img {
float: left;
margin: 0 43px 0 0;
position: relative;
}
Right-to-left styling:
body.rtl {
direction: rtl;
}
body.rtl h1 img {
float: right;
margin: 0 0 0 43px;
}
(See this in action in English and Arabic.)
One final note on this subject: most of the time your content destined for right-to-left language pages will be bidirectional rather than purely RTL, because some strings will probably need to retain their LTR direction–for example, company names in Latin script or telephone numbers. The way to make sure the browser handles this correctly in a primarily RTL document is to wrap the embedded text strings with an inline element using an attribute to set direction, like this:
<h2>עוד ב- <span dir="ltr">Google</span></h2>
In cases where you don’t have an HTML container to hook the dir attribute into, such as title elements or JavaScript-generated source code for message prompts, you can use the following equivalent, where &#x202B; (U+202B, RIGHT-TO-LEFT EMBEDDING) and &#x202C; (U+202C, POP DIRECTIONAL FORMATTING) are the Unicode control characters that wrap the embedded right-to-left text:
<title>‫הפוך את Google לדף הבית שלך‬</title>
Example usage in JavaScript code:
var ffError = '\u202B' +'כדי להגדיר את Google כדף הבית שלך ב\x2DFirefox, לחץ על הקישור \x22הפוך את Google לדף הבית שלי\x22, וגרור אותו אל סמל ה\x22בית\x22 בדפדפן שלך.'+ '\u202C';
(For more detail, see the W3C’s articles on creating HTML for Arabic, Hebrew and other right-to-left scripts and authoring right-to-left scripts.)
It’s all Greek to me…
If you’ve never worked with non-Latin character sets before (Cyrillic, Greek, and a myriad of Asian and Indic scripts), you might find that both your editor and browser do not display content as intended.
Check that your editor and browser encodings are set to UTF-8 (recommended), and consider adding a <meta> element declaring the character encoding and the lang attribute of the html element to your HTML template so browsers know what to expect when rendering your page. This has the added benefit of ensuring that all Unicode characters are displayed correctly, so using HTML entities such as &eacute; (for é) will not be necessary, saving valuable bytes! Check the W3C’s tutorial on character encoding if you’re having trouble–it contains in-depth explanations of the issues.
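Here is a minimal template sketch for an Arabic page that pulls the pieces above together: UTF-8 declared in a <meta> element (shown here with the HTML5-style charset declaration), plus lang and dir on the html element and the rtl class on body as described earlier. The content is a placeholder.
<!DOCTYPE html>
<html lang="ar" dir="rtl">
  <head>
    <meta charset="utf-8">
    <title>مثال</title>
  </head>
  <body class="rtl">
    <!-- page content -->
  </body>
</html>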
A word on naming
Lastly, a practical tip on naming conventions when creating several language versions. Using a standard such as the ISO 639-1 language codes for naming helps when you start to deal with several language versions of the same document.
Using a conventional standard will help users understand your site’s structure as well as making it more maintainable for all webmasters who might develop the site, and using the language codes for other site assets (logo images, PDF documents) is handy to be able to quickly identify files.
See previous Webmaster Central posts for advice about URL structures and other issues surrounding working with multi-regional websites and working with multilingual websites.
That’s a summary of the main challenges we wrestle with on a daily basis; but we can vouch for the fact that putting in the planning and work up front towards well-structured HTML and robust CSS pays dividends during localization!
Posted by Kathryn Cullen, Google Webmaster Team
Labels:
general tips,
intermediate,
localization,
webmaster guidelines
Monday, July 11, 2011
Validation: measuring and tracking code quality
Webmaster level: All
Google’s Webmaster Team is responsible for most of Google’s informational websites like Google’s Jobs site or Privacy Centers. Maintaining tens of thousands of pages and constantly releasing new Google sites requires more than just passion for the job: it requires quality management.
In this post we won’t talk about all the different tests that can be run to analyze a website; instead we’ll just talk about HTML and CSS validation, and tracking quality over time.
Why does validation matter? There are different perspectives on validation—at Google there are different approaches and priorities too—but the Webmaster Team considers validation a baseline quality attribute. It doesn’t guarantee accessibility, performance, or maintainability, but it reduces the number of possible issues that could arise and in many cases indicates appropriate use of technology.
While paying a lot of attention to validation, we’ve developed a system to use it as a quality metric to measure how we’re doing on our own pages. Here’s what we do: we give each of our pages a score from 0-10 points, where 0 is worst (pages with 10 or more HTML and CSS validation errors) and 10 is best (0 validation errors). We started doing this more than two years ago, first by taking samples, now monitoring all our pages.
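As a rough sketch of that scale (our internal tooling is more involved; this only illustrates the 0-10 mapping described above):
// 0 validation errors scores 10 points; 10 or more errors scores 0.
function validationScore(htmlErrors, cssErrors) {
  var totalErrors = htmlErrors + cssErrors;
  return Math.max(0, 10 - totalErrors);
}
validationScore(0, 0);   // 10 (best)
validationScore(2, 1);   // 7
validationScore(9, 5);   // 0 (worst)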
Since the beginning we’ve been documenting the validation scores we were calculating so that we could actually see how we’re doing on average and where we’re headed: is our output improving, or is it getting worse?
Here’s what our data say:
On average there are about three validation issues per page produced by the Webmaster Team (as we combine HTML and CSS validation in the scoring process, information about the origin gets lost), down from about four issues per page two years ago.
This information is valuable for us as it tells us how close we are to our goal of always shipping perfectly valid code, and it also tells us whether we’re on track or not. As you can see, with the exception of the 2nd quarter of 2009 and the 1st quarter of 2010, we are generally observing a positive trend.
What has to be kept in mind are issues with the integrity of the data, i.e. the sample size as well as “false positives” in the validators. We’re working with the W3C in several ways, including reporting and helping to fix issues in the validators; however, as software can never be perfect, sometimes pages get dinged for non-issues: see for example the border-radius issue that has recently been fixed. We know that this is negatively affecting the validation scores we’re determining, but we have no data yet to indicate how much.
Although we track more than just validation for quality control purposes, validation plays an important role in measuring the health of Google’s informational websites.
How do you use validation in your development process?
Posted by Jens O. Meiert, Google Webmaster Team
Labels:
advanced,
beginner,
general tips,
intermediate
Thursday, June 30, 2011
Beyond PageRank: Graduating to actionable metrics
Webmaster level: Beginner
Like any curious netizen, I have a Google Alert set up to email me whenever my name is mentioned online. Usually I get a slow trickle of my forum posts, blog posts, and tweets. But by far the most popular topic of these alerts over the past couple years has been my off-handed mention that we removed PageRank distribution data from Webmaster Tools in one of our 2009 releases.
The fact that people are still writing about this almost two years later—usually in the context of “Startling news from Susan Moskwa: ...”—really drives home how much PageRank has become a go-to statistic for some webmasters. Even the most inexperienced site owners I talk with have often heard about, and want to know more about, PageRank (“PR”) and what it means for their site. However, as I said in my fateful forum post, the Webmaster Central team has been telling webmasters for years that they shouldn't focus so much on PageRank as a metric for representing the success of one’s website. Today I’d like to explain this position in more detail and give you some relevant, actionable options to fill your time once you stop tracking your PR!
Why PageRank?
In 2008 Udi Manber, VP of engineering at Google, wrote on the Official Google Blog:
“The most famous part of our ranking algorithm is PageRank, an algorithm developed by Larry Page and Sergey Brin, who founded Google. PageRank is still in use today, but it is now a part of a much larger system.”
PageRank may have distinguished Google as a search engine when it was founded in 1998; but given the rate of change Manber describes—launching “about 9 [improvements] per week on the average”—we’ve had a lot of opportunity to augment and refine our ranking systems over the last decade. PageRank is no longer—if it ever was—the be-all and end-all of ranking.
If you look at Google’s Technology Overview, you’ll notice that it calls out relevance as one of the top ingredients in our search results. So why hasn’t as much ink been spilled over relevance as has been over PageRank? I believe it’s because PageRank comes in a number, and relevance doesn’t. Both relevance and PageRank include a lot of complex factors—context, searcher intent, popularity, reliability—but it’s easy to graph your PageRank over time and present it to your CEO in five minutes; not so with relevance. I believe the succinctness of PageRank is why it’s become such a go-to metric for webmasters over the years; but just because something is easy to track doesn’t mean it accurately represents what’s going on on your website.
What do we really want?
I posit that none of us truly care about PageRank as an end goal. PageRank is just a stand-in for what we really want: for our websites to make more money, attract more readers, generate more leads, more newsletter sign-ups, etc. The focus on PageRank as a success metric only works if you assume that a higher PageRank results in better ranking, then assume that that will drive more traffic to your site, then assume that that will lead to more people doing-whatever-you-want-them-to-do on your site. On top of these assumptions, remember that we only update the PageRank displayed on the Google Toolbar a few times a year, and we may lower the PageRank displayed for some sites if we believe they’re engaging in spammy practices. So the PR you see publicly is different from the number our algorithm actually uses for ranking. Why bother with a number that’s at best three steps removed from your actual goal, when you could instead directly measure what you want to achieve? Finding metrics that are directly related to your business goals allows you to spend your time furthering those goals.
If I don’t track my PageRank, what should I be tracking?
Take a look at metrics that correspond directly to meaningful gains for your website or business, rather than just focusing on ranking signals. Also consider metrics that are updated daily or weekly, rather than numbers (like PageRank) that only change a few times a year; the latter is far too slow for you to reliably understand which of your changes resulted in the number going up or down (assuming you update your site more than a few times a year). Here are three suggestions to get you started, all of which you can track using services like Google Analytics or Webmaster Tools:
- Conversion rate
- Bounce rate
- Clickthrough rate (CTR)
Conversion rate
A “conversion” is when a visitor does what you want them to do on your website. A conversion might be completing a purchase, signing up for a mailing list, or downloading a white paper. Your conversion rate is the percentage of visitors to your site who convert (perform a conversion). This is a perfect example of a metric that, unlike PageRank, is directly tied to your business goals. When users convert they’re doing something that directly benefits your organization in a measurable way! Whereas your PageRank is both difficult to measure accurately (see above), and can go up or down without having any direct effect on your business.
Bounce rate
A “bounce” is when someone comes to your website and then leaves without visiting any other pages on your site. Your bounce rate is the percentage of visits to your site where the visitor bounces. A high bounce rate may indicate that users don’t find your site compelling, because they come, take a look, and leave directly. Looking at the bounce rates of different pages across your site can help you identify content that’s underperforming and point you to areas of your site that may need work. After all, it doesn’t matter how well your site ranks if most searchers are bouncing off of it as soon as they visit.
Clickthrough rate (CTR)
In the context of organic search results, your clickthrough rate is how often people click on your site out of all the times your site gets shown in search results. A low CTR means that, no matter how well your site is ranking, users aren’t clicking through to it. This may indicate that they don’t think your site will meet their needs, or that some other site looks better. One way to improve your CTR is to look at your site’s titles and snippets in our search results: are they compelling? Do they accurately represent the content of each URL? Do they give searchers a reason to click on them? Here’s some advice for improving your snippets; the HTML suggestions section of Webmaster Tools can also point you to pages that may need help. Again, remember that it doesn’t matter how well your site ranks if searchers don’t want to click on it.
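To make the arithmetic behind these three metrics concrete, here is a tiny sketch with made-up numbers; only the formulas matter:
var visits = 1000, conversions = 40, bounces = 300;   // from your analytics package
var impressions = 20000, clicks = 600;                // from your search query data

var conversionRate = 100 * conversions / visits;      // 4%
var bounceRate     = 100 * bounces / visits;          // 30%
var ctr            = 100 * clicks / impressions;      // 3%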
Entire blogs and books have been dedicated to explaining and exploring web metrics, so you’ll excuse me if my explanations just scrape the surface; analytics evangelist Avinash Kaushik’s site is a great place to start if you want to dig deeper into these topics. But hopefully I’ve at least convinced you that there are more direct, effective and controllable ways to measure your site’s success than PageRank.
One final note: Some site owners are interested in their site’s PR because people won’t buy links from their site unless they have a high PageRank. Buying or selling links for the purpose of passing PageRank violates our Webmaster Guidelines and is very likely to have negative consequences for your website, so a) I strongly recommend against it, and b) don’t be surprised if we aren’t interested in helping you raise your PageRank or improve your website when this is your stated goal.
We’d love to hear what metrics you’ve found useful and actionable for your website! Feel free to share your success stories with us in the comments here or in our Webmaster Help Forum.
Posted by Susan Moskwa, Webmaster Trends Analyst
Labels:
beginner,
general tips
Thursday, May 12, 2011
Website Security for Webmasters
Webmaster level: Intermediate to Advanced
Users are taught to protect themselves from malicious programs by installing sophisticated antivirus software, but often they may also entrust their private information to websites like yours, in which case it’s important to protect their data. It’s also very important to protect your own data; if you have an online store, you don’t want to be robbed.
Over the years companies and webmasters have learned—often the hard way—that web application security is not a joke; we’ve seen user passwords leaked due to SQL injection attacks, cookies stolen with XSS, and websites taken over by hackers due to negligent input validation.
Today we’ll show you some examples of how a web application can be exploited so you can learn from them; for this we’ll use Gruyere, an intentionally vulnerable application we use for security training internally, too. Do not probe others’ websites for vulnerabilities without permission as it may be perceived as hacking; but you’re welcome—nay, encouraged—to run tests on Gruyere.
Client state manipulation - What will happen if I alter the URL?
Let’s say you have an image hosting site and you’re using a PHP script to display the images users have uploaded:
http://www.example.com/showimage.php?imgloc=/garyillyes/kitten.jpg
So what will the application do if I alter the URL to something like this and userpasswords.txt is an actual file?
http://www.example.com/showimage.php?imgloc=/../../userpasswords.txt
Will I get the content of userpasswords.txt?
Another example of client state manipulation is when form fields are not validated. For instance, let’s say you have this form:

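Picture something along these lines (the URL and field names are hypothetical, reconstructed from the description below):
<form action="/submitsnippet" method="post">
<input type="hidden" name="username" value="garyillyes">
<textarea name="snippet"></textarea>
<input type="submit" value="Submit">
</form>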
It seems that the username of the submitter is stored in a hidden input field. Well, that’s great! Does that mean that if I change the value of that field to another username, I can submit the form as that user? It may very well happen; the user input is apparently not authenticated with, for example, a token which can be verified on the server.
Imagine the situation if that form were part of your shopping cart and I modified the price of a $1000 item to $1, and then placed the order.
Protecting your application against this kind of attack is not easy; take a look at the third part of Gruyere to learn a few tips about how to defend your app.
Cross-site scripting (XSS) - User input can’t be trusted

A simple, harmless URL:
http://google-gruyere.appspot.com/611788451095/%3Cscript%3Ealert('0wn3d')%3C/script%3E
But is it truly harmless? If I decode the percent-encoded characters, I get:
<script>alert('0wn3d')</script>
Gruyere, just like many sites with custom error pages, is designed to include the path component in the HTML page. This can introduce security bugs, like XSS, as it introduces user input directly into the rendered HTML page of the web application. You might say, “It’s just an alert box, so what?” The thing is, if I can inject an alert box, I can most likely inject something else, too, and maybe steal your cookies which I could use to sign in to your site as you.
Another example is when the stored user input isn’t sanitized. Let’s say I write a comment on your blog; the comment is simple:
<a href="javascript:alert('0wn3d')">Click here to see a kitten</a>
If other users click on my innocent link, I have their cookies.
You can learn how to find XSS vulnerabilities in your own web app and how to fix them in the second part of Gruyere; or, if you’re an advanced developer, take a look at the automatic escaping features in template systems we blogged about on our Online Security blog.
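If you’re not using such a template system, the core idea is to escape user input before writing it into the page. A minimal, illustrative sketch (the element ID and the userComment variable are hypothetical):
// Render user-supplied text as text, not as markup:
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;')
          .replace(/'/g, '&#39;');
}
document.getElementById('comments').innerHTML = escapeHtml(userComment);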
Cross-site request forgery (XSRF) - Should I trust requests from evil.com?
Oops, a broken picture. It can’t be dangerous--it’s broken, after all--which means that the URL of the image returns a 404 or it’s just malformed. Is that true in all of the cases?
No, it’s not! You can specify any URL as an image source, regardless of its content type. It can be an HTML page, a JavaScript file, or some other potentially malicious resource. In this case the image source was a simple page’s URL:

That page will only work if I’m logged in and I have some cookies set. Since I was actually logged in to the application, when the browser tried to fetch the image by accessing the image source URL, it also deleted my first snippet. This doesn’t sound particularly dangerous, but if I’m a bit familiar with the app, I could also invoke a URL which deletes a user’s profile or lets admins grant permissions for other users.
To protect your app against XSRF you should not allow state changing actions to be called via GET; the POST method was invented for this kind of state-changing request. This change alone may have mitigated the above attack, but usually it's not enough and you need to include an unpredictable value in all state changing requests to prevent XSRF. Please head to Gruyere if you want to learn more about XSRF.
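A sketch of what that unpredictable value can look like in practice (the URL, field names and token value are hypothetical): the server embeds a per-session token in every state-changing form and rejects any request whose token doesn’t match the one stored in the user’s session.
<form action="/deletesnippet" method="post">
<input type="hidden" name="csrf_token" value="3f7a9c51e2b84d06">
<input type="hidden" name="index" value="0">
<input type="submit" value="Delete snippet">
</form>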
Cross-site script inclusion (XSSI) - All your script are belong to us
Many sites today can dynamically update a page's content via asynchronous JavaScript requests that return JSON data. Sometimes, JSON can contain sensitive data, and if the correct precautions are not in place, it may be possible for an attacker to steal this sensitive information.
Let’s imagine the following scenario: I have created a standard HTML page and send you the link; since you trust me, you visit the link I sent you. The page contains only a few lines:
<script>function _feed(s) {alert("Your private snippet is: " + s['private_snippet']);}</script><script src="http://google-gruyere.appspot.com/611788451095/feed.gtl"></script>
Since you’re signed in to Gruyere and you have a private snippet, you’ll see an alert box on my page informing you about the contents of your snippet. As always, if I managed to fire up an alert box, I can do whatever else I want; in this case it was a simple snippet, but it could have been your biggest secret, too.
It’s not too hard to defend your app against XSSI, but it still requires careful thinking. You can use tokens as explained in the XSRF section, set your script to answer only POST requests, or simply prefix the JSON response with something that is not valid JavaScript, such as )]}' followed by a newline, to make sure the response is not executable as a script.
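A sketch of that last option (the exact prefix is a matter of convention; what matters is that the raw response is a syntax error when loaded as a script, while your own code strips the prefix before parsing):
// Server sends:
// )]}'
// {"private_snippet": "..."}
var PREFIX = ")]}'";
var body = xhr.responseText;                       // xhr: your own XMLHttpRequest
var data = JSON.parse(body.slice(PREFIX.length));  // strip the prefix, then parse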
SQL Injection - Still think user input is safe?
What will happen if I try to sign in to your app with a username like
JohnDoe'; DROP TABLE members;--
While this specific example won’t expose user data, it can cause great headaches because it has the potential to completely remove the SQL table where your app stores information about members.
Generally, you can protect your app from SQL injection with proactive thinking and input validation. First, are you sure the SQL user needs to have permission to execute “DROP TABLE members”? Wouldn’t it be enough to grant only SELECT rights? By setting the SQL user’s permissions carefully, you can avoid painful experiences and a lot of trouble. You might also want to configure error reporting in such a way that the database and its tables’ names aren’t exposed in the case of a failed query.
Second, as we learned in the XSS case, never trust user input: what looks like a login form to you looks like a potential doorway to an attacker. Always sanitize and escape the input that will be stored in a database, and whenever possible use prepared (parameterized) statements, which most database programming interfaces provide.
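For example, a minimal sketch with a parameterized query (shown here with the node.js mysql client purely as an illustration; most database APIs offer an equivalent placeholder mechanism):
var mysql = require('mysql');
var connection = mysql.createConnection({host: 'localhost', user: 'app', database: 'site'});

var username = "JohnDoe'; DROP TABLE members;--";   // hostile input stays plain data
connection.query(
  'SELECT id FROM members WHERE username = ?',      // the driver escapes the value
  [username],
  function (err, rows) { /* handle the result */ }
);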
Knowing how web applications can be exploited is the first step in understanding how to defend them. In light of this, we encourage you to take the Gruyere course, take other web security courses from the Google Code University and check out skipfish if you're looking for an automated web application security testing tool. If you have more questions please post them in our Webmaster Help Forum.
Written by Gary Illyes, Webmaster Trends Analyst
Labels:
advanced,
general tips,
intermediate
Thursday, March 31, 2011
Changes in the Chrome user agent
Webmaster Level: Intermediate to Advanced
The Chrome team is exploring a few changes to Chrome’s UA string. These changes are designed to provide additional details in the user-agent, remove redundancy, and increase compatibility with Internet Explorer. They’re also happening in conjunction with similar changes in Firefox 4.
We intend to ship Chrome 11 with these changes, assuming they don't cause major web compatibility problems. To test them out and ensure your website remains compatible with Chrome, we recommend trying the Chrome Dev and Beta channel builds. If you have any questions, please check out the blog post on the Chromium blog or drop us a line at our help forum.
Written by Peter Kasting, Software Engineer
Labels:
advanced,
general tips,
intermediate
Introducing Page Speed Online, with mobile support
Webmaster level: intermediate
At Google, we’re striving to make the whole web fast. As part of that effort, we’re launching a new web-based tool in Google Labs, Page Speed Online, which analyzes the performance of web pages and gives specific suggestions for making them faster. Page Speed Online is available from any browser, at any time. This allows website owners to get immediate access to Page Speed performance suggestions so they can make their pages faster.
In addition, we’ve added a new feature: the ability to get Page Speed suggestions customized for the mobile version of a page, specifically smartphones. Due to the relatively limited CPU capabilities of mobile devices, the high round-trip times of mobile networks, and rapid growth of mobile usage, understanding and optimizing for mobile performance is even more critical than for the desktop, so Page Speed Online now allows you to easily analyze and optimize your site for mobile performance. The mobile recommendations are tuned for the unique characteristics of mobile devices, and contain several best practices that go beyond the recommendations for desktop browsers, in order to create a faster mobile experience. New mobile-targeted best practices include eliminating uncacheable landing page redirects and reducing the amount of JavaScript parsed during the page load, two common issues that slow down mobile pages today.
Page Speed Online is powered by the same Page Speed SDK that powers the Chrome and Firefox extensions and webpagetest.org.
Please give Page Speed Online a try. We’re eager to hear your feedback on our mailing list and how you’re using it to optimize your site.
Posted by Andrew Oates and Richard Rabbat, Page Speed team
Labels:
general tips,
mobile,
performance
Tuesday, March 30, 2010
URL removal explained, Part I: URLs & directories
Webmaster level: All
There's a lot of content on the Internet these days. At some point, something may turn up online that you would rather not have out there—anything from an inflammatory blog post you regret publishing, to confidential data that accidentally got exposed. In most cases, deleting or restricting access to this content will cause it to naturally drop out of search results after a while. However, if you urgently need to remove unwanted content that has gotten indexed by Google and you can't wait for it to naturally disappear, you can use our URL removal tool to expedite the removal of content from our search results as long as it meets certain criteria (which we'll discuss below).
We've got a series of blog posts lined up for you explaining how to successfully remove various types of content, and common mistakes to avoid. In this first post, I'm going to cover a few basic scenarios: removing a single URL, removing an entire directory or site, and reincluding removed content. I also strongly recommend our previous post on managing what information is available about you online.
Removing a single URL
In general, in order for your removal requests to be successful, the owner of the URL(s) in question—whether that's you, or someone else—must have indicated that it's okay to remove that content. For an individual URL, this can be indicated in any of three ways:
- block the page from crawling via a robots.txt file
- block the page from indexing via a noindex meta tag
- indicate that the page no longer exists by returning a 404 or 410 status code
- robots.txt: You can check whether the URL is correctly disallowed using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
- noindex meta tag: You can use Fetch as Googlebot to make sure the meta tag appears somewhere between the <head> and </head> tags. If you want to check a page you can't verify in Webmaster Tools, you can open the URL in a browser, go to View > Page source, and make sure you see the meta tag between the <head> and </head> tags. (An example of the tag itself follows this list.)
- 404 / 410 status code: You can use Fetch as Googlebot, or tools like Live HTTP Headers or web-sniffer.net to verify whether the URL is actually returning the correct code. Sometimes "deleted" pages may say "404" or "Not found" on the page, but actually return a 200 status code in the page header; so it's good to use a proper header-checking tool to double-check.
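For reference, the noindex rule mentioned above is a single tag inside the page's head:
<meta name="robots" content="noindex">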
If a URL meets one of the above criteria, you can remove it by going to http://www.google.com/webmasters/tools/removals, entering the URL that you want to remove, and selecting the "Webmaster has already blocked the page" option. Note that you should enter the URL where the content was hosted, not the URL of the Google search where it's appearing. For example, enter
http://www.example.com/embarrassing-stuff.html
not
http://www.google.com/search?q=embarrassing+stuff
This article has more details about making sure you're entering the proper URL. Remember that if you don't tell us the exact URL that's troubling you, we won't be able to remove the content you had in mind.
Removing an entire directory or site
In order for a directory or site-wide removal to be successful, the directory or site must be disallowed in the site's robots.txt file. For example, in order to remove the http://www.example.com/secret/ directory, your robots.txt file would need to include:
User-agent: *
Disallow: /secret/
It isn't enough for the root of the directory to return a 404 status code, because it's possible for a directory to return a 404 but still serve out files underneath it. Using robots.txt to block a directory (or an entire site) ensures that all the URLs under that directory (or site) are blocked as well. You can test whether a directory has been blocked correctly using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
Only verified owners of a site can request removal of an entire site or directory in Webmaster Tools. To request removal of a directory or site, click on the site in question, then go to Site configuration > Crawler access > Remove URL. If you enter the root of your site as the URL you want to remove, you'll be asked to confirm that you want to remove the entire site. If you enter a subdirectory, select the "Remove directory" option from the drop-down menu.
Reincluding content
You can cancel removal requests for any site you own at any time, including those submitted by other people. In order to do so, you must be a verified owner of this site in Webmaster Tools. Once you've verified ownership, you can go to Site configuration > Crawler access > Remove URL > Removed URLs (or > Made by others) and click "Cancel" next to any requests you wish to cancel.
Still have questions? Stay tuned for the rest of our series on removing content from Google's search results. If you can't wait, much has already been written about URL removals, and troubleshooting individual cases, in our Help Forum. If you still have questions after reading others' experiences, feel free to ask. Note that, in most cases, it's hard to give relevant advice about a particular removal without knowing the site or URL in question. We recommend sharing your URL by using a URL shortening service so that the URL you're concerned about doesn't get indexed as part of your post; some shortening services will even let you disable the shortcut later on, once your question has been resolved.
Edit: Read the rest of this series:
Part II: Removing & updating cached content
Part III: Removing content you don't own
Part IV: Tracking requests, what not to remove
Companion post: Managing what information is available about you online
Posted by Susan Moskwa, Webmaster Trends Analyst
Friday, December 4, 2009
Your site's performance in Webmaster Tools
Webmaster level: Intermediate
Let's take a quick look at the individual sections in the Google Webmaster Tools' Site Performance feature:
Performance overview

The performance overview shows a graph of the aggregated speed numbers for the website, based on the pages that were most frequently accessed by visitors who use the Google Toolbar with the PageRank feature activated. By using data from Google Toolbar users, you don't have to worry about us testing your site from a location that your users do not use. For example, if your site is in Germany and all your users are in Germany, the chart will reflect the load time as seen in Germany. Similarly, if your users mostly use dial-up connections (or high-speed broadband), that would be reflected in these numbers as well. If only a few visitors of your site use the Google Toolbar, we may not be able to show this data in Webmaster Tools.
The line between the red and the green sections on the chart is the 20th percentile — only 20% of the sites we check are faster than this. This website is pretty close to the 20% mark, so which pages would we have to work on first?
Example pages with load times

In this section you can find some example pages along with the average, aggregated load times that users observed while they were on your website. These numbers may differ from what you see as they can come from a variety of different browsers, internet connections and locations. This list can help you to recognize pages which take longer than average to load — pages that slow your users down.
As the page load times are based on actual accesses made by your users, it's possible that it includes pages which are disallowed from crawling. While Googlebot will not be able to crawl disallowed pages, they may be a significant part of your site's user experience.
Keep in mind that you may see occasional spikes here, so it's recommended that you watch the load times over a short period to see what's stable. If you consistently see very large load times, that probably means that most of your users are seeing very slow page loads (whether due to slow connections or otherwise), so it's something you should take seriously.
Page Speed suggestions

These suggestions are based on the Page Speed Firefox / Firebug plugin. In order to find the details for these sample URLs, we fetch the page and all its embedded resources with Googlebot. If we are not able to fetch all of the embedded content with Googlebot, we may not be able to provide a complete analysis. Similarly, if the servers return slightly different content to Googlebot than they would to normal users, this may affect what is shown here. For example, some servers return uncompressed content for Googlebot, similar to what would be served to older browsers that do not support gzip-compressed embedded content (this is currently the case for Google Analytics' "ga.js").
When looking at flagged issues regarding common third-party code such as website analytics scripts, one factor that can also play a role is how widespread these scripts are on the web. If they are common across the web, chances are that the average user's browser will have already cached the DNS lookup and the content of the script. While these scripts will still be flagged as separate DNS lookups, in practice they might not play a strong role in the actual load time.
We offer these suggestions as a useful guideline regarding possible first performance improvement steps and recommend using the Page Speed plugin (or a similar tool) directly when working on your website. This allows you to better recognize the blocking issues and makes it easy to see how modifications on the server affect the total load time.
For questions about Webmaster Tools and this new feature, feel free to read the Help Center article, search and post in the Webmaster Help Forums or in the Page Speed discussion group. We hope this information helps you make your website even faster!
Posted by John Mueller, Webmaster Trends Analyst, Google Zürich
Labels:
general tips,
intermediate,
webmaster tools