Duplicate Content: SES NY 2009
DESCRIPTION
Sasi Parthasarathy, Program Manager for Live Search at Microsoft, talks about duplicate content and multiple-site issues at SES NY.
TRANSCRIPT
Duplicate Content & Multiple Site Issues
Sasi Parthasarathy
Program Manager, Microsoft
Topics covered
• Duplicate content
– Internal content -> URL Canonicalization
– External content -> Spam, Geo-targeting
• Content Syndication
• Good practices
• Examples Examples Examples
URL canonicalization
• Less is more: expose only one URL per piece of content, pretty please
• The practice of consolidating all versions of a page under one URL is
referred to as "canonicalization"
• Helps the search engine, and at the same time does not split your rank juice across several URLs
• Having too many duplicate URLs will waste crawl time – the crawler might
spend time indexing duplicate URLs and miss good content
• Four ways to get to microsoft.com, but we need only one:
1. microsoft.com
2. www.microsoft.com
3. www.microsoft.com/en/us/default.aspx
4. www.microsoft.com/en/us/
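To see how many URLs on your own site serve the same page, you can hash each response body and group URLs by hash. This is a minimal sketch, not how Live Search detects duplicates: the `group_duplicates` function and the crawl data are hypothetical, the page bodies are placeholder strings, and an exact hash only catches byte-for-byte copies (a page that embeds a timestamp would slip through).

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose response bodies are byte-for-byte identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        # Hash the body so large pages can be compared cheaply.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only digests shared by two or more URLs indicate duplicate content.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl result: the four entry points above all return the
# same home page (placeholder HTML, not real data).
home = "<html>...home page...</html>"
pages = {
    "http://microsoft.com/": home,
    "http://www.microsoft.com/": home,
    "http://www.microsoft.com/en/us/default.aspx": home,
    "http://www.microsoft.com/en/us/": home,
    "http://www.microsoft.com/en/us/downloads/": "<html>...downloads...</html>",
}
print(group_duplicates(pages))
```

Each group with more than one URL is a candidate for picking one canonical URL and redirecting the rest to it.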
A few recommendations for canonicalization
• Select WWW or Non-WWW, then redirect the other option to your
preferred version
• Remove the default filename from the end of your URLs
– Web servers let you configure one or more default filenames to serve when
the browser requests a directory. Check whether the default filename appears at the
end of your URLs and, if so, trim it off
• Link internally to the canonical form of your URL
– Make sure you always link to the proper canonical form of your URLs from within
your site
• Remove query-string variables, or rewrite them into readable URLs
– For example, rewrite
http://www.mysite.com/downloads/details.aspx?FamilyID=ab99&displaylang=en
to
http://www.mysite.com/downloads/en/family/ab99
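The recommendations above can be combined into one function that maps any incoming URL to its canonical form; a server would then answer non-canonical requests with a redirect to that form, and internal links would use it directly. This is a minimal sketch under stated assumptions: the host names, the default filenames, and the download rewrite rule are hypothetical settings for the example site mysite.com, only absolute http(s) URLs are handled, and any explicit port is dropped.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Hypothetical site settings: adjust to your own domain and web server.
CANONICAL_HOST = "www.mysite.com"   # the preferred (www) version
ALIAS_HOSTS = {"mysite.com"}        # hosts that should redirect to it
DEFAULT_FILENAMES = {"default.aspx", "default.htm", "index.html", "index.htm"}


def canonicalize(url: str) -> str:
    """Return the single canonical form of an absolute http(s) URL."""
    parts = urlsplit(url)           # also lowercases the scheme
    host = parts.hostname or ""     # lowercased; an explicit port is dropped
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST

    # Trim a trailing default filename, keeping the directory slash.
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment.lower() in DEFAULT_FILENAMES:
        path = path[: -len(last_segment)]

    # Hypothetical rule: rewrite the query-string download URL into a readable path.
    if path.lower() == "/downloads/details.aspx":
        query = parse_qs(parts.query)
        family = query.get("FamilyID", [""])[0]
        lang = query.get("displaylang", [""])[0]
        if family and lang:
            path = f"/downloads/{lang}/family/{family}"
            return urlunsplit((parts.scheme, host, path, "", ""))

    return urlunsplit((parts.scheme, host, path, parts.query, ""))


for raw in (
    "http://mysite.com/downloads/default.aspx",
    "HTTP://MySite.com/",
    "http://www.mysite.com/downloads/details.aspx?FamilyID=ab99&displaylang=en",
):
    print(raw, "->", canonicalize(raw))
```

Computing the canonical URL is only half the job: the server must also serve content at the rewritten path (for example through a URL rewrite rule) and redirect the old forms to it.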
Why duplicate content?
• Your intention is the key
• If your intent is to manipulate the search engine, you will
be penalized
Example 1: Multiple domains with little or no
difference in content and no clear reason for these
domains to exist
Example 2: Falsely presenting someone else's original
content as your own (please report any issues with
copied content to Live Search support)
Going International – Help Search Engines
You may have similar pages targeted at different regions.
Problems for search engines with geo-targeting:
• No standardized way to tell a search engine which region or
language your content is targeted for
• Top-level domains may not indicate the intended audience. For
example, http://ma.tt/ is an English-language site on the .tt (Trinidad and Tobago)
domain, and Orange.com is a French telecom site hosted in France.
• Use of search-unfriendly redirection techniques
A few indicators that help Live Search with geo-targeting
• Country code top-level domain (ccTLD). For example, .ca
specifically targets users in Canada
• Add all your domains to Live Search Webmaster Tools and make the
target region explicit for each one
These indicators will help us show the correct page for the correct
market
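A crude illustration of why the ccTLD is only a hint: guessing the region from the top-level domain alone gets .ca right, says nothing about a generic domain such as Orange.com, and points ma.tt (an English site) at Trinidad and Tobago. This is a hypothetical sketch, not how Live Search geo-targets; the lookup table is deliberately tiny and `region_hint` is an illustrative name.

```python
from typing import Optional

# Hypothetical, deliberately tiny lookup table; a real one would list every ccTLD.
CCTLD_REGIONS = {
    "ca": "Canada",
    "fr": "France",
    "uk": "United Kingdom",
    "tt": "Trinidad and Tobago",
}


def region_hint(hostname: str) -> Optional[str]:
    """Guess a target region from the top-level domain alone (a weak signal)."""
    tld = hostname.lower().rstrip(".").rsplit(".", 1)[-1]
    # Generic TLDs such as .com are absent from the table, so they yield None.
    return CCTLD_REGIONS.get(tld)


for host in ("www.example.ca", "orange.com", "ma.tt"):
    print(host, "->", region_hint(host))
```

The two misses are the cases where an explicit region setting in webmaster tools carries information the domain name cannot.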
Content Syndication
• Syndicate with caution (this applies to sites that syndicate their content on
other sites)
• From our perspective, we always want to show the version we think
is most appropriate for the user, which may not be the version you
prefer.
• Tip: ask your partner to use robots.txt to stop us from indexing the syndicated material
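As a sketch of that tip, suppose the partner publishes your articles under a hypothetical `/syndicated/` directory. Two lines of robots.txt keep crawlers out of that directory while the partner's own pages stay crawlable, and Python's standard `urllib.robotparser` can confirm the rules behave as intended. The host `partner.example` and the paths are placeholders; `msnbot` was the user agent of Live Search's crawler, and here it simply falls under the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt on the partner's site: syndicated copies live
# under /syndicated/, while the partner's own pages stay crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /syndicated/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check both kinds of page against the rules for the Live Search crawler.
for url in (
    "http://partner.example/syndicated/your-article",
    "http://partner.example/news/their-own-story",
):
    print(url, "crawlable:", parser.can_fetch("msnbot", url))
```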
General tips to help the Search Engine
• Dynamic URLs: if the content does not change, don’t use too many
parameters
• 301 redirects are your best friend; use them whenever you can
• No 302 hijacking!
• When you update your site, don’t leave links to expired pages
• Use robots.txt for anything you don’t want crawlers to crawl
• Use a consistent naming convention; it is easier for search engines to
understand
• Follow standard URL formation practices
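Several of these tips meet in one place: a request handler that strips parameters which do not change the content, puts the remaining parameters in one consistent order, and answers with a permanent 301 (never a temporary 302) when the requested URL differs from that clean form. This is a minimal sketch; the parameter names in `NON_CONTENT_PARAMS`, the `/list` path, and the `clean_url`/`respond` functions are hypothetical, and a real server would send the returned status and `Location` header itself.

```python
from __future__ import annotations

from http import HTTPStatus
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters that never change the page content on this site.
NON_CONTENT_PARAMS = {"sessionid", "utm_source", "utm_medium"}


def clean_url(url: str) -> str:
    """Drop content-neutral parameters and sort the rest into one stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def respond(url: str) -> tuple[int, str | None]:
    """Return (status, Location): a permanent 301 to the clean URL, or 200 if already clean."""
    target = clean_url(url)
    if target != url:
        return HTTPStatus.MOVED_PERMANENTLY, target
    return HTTPStatus.OK, None


print(respond("http://www.mysite.com/list?b=2&a=1&sessionid=XYZ"))
print(respond("http://www.mysite.com/list?a=1&b=2"))
```

Because the clean form of a clean URL is itself, a client that follows the redirect gets a 200 on the next request instead of a redirect loop.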