webmaster guide-en

Making the Mostof Your Content

A Publisher’s Guide to the Web

Contents

Introduction 2

Abriefoverviewofwebsearch 3

What’snewinGooglewebsearch? 4

CanGooglefindyoursite? 5

CanGoogleindexyoursite? 6

Controlling what Google indexes 7

Robots.txt vs. meta tags 9

Controlling caching and snippets 10

Doesyoursitehaveuniqueandusefulcontent? 11

Increasingvisibility:bestpractices 12

WebmasterCentral 13

Sitemaps 14

FrequentlyAskedQuestions 15

Glossary 19

3

Introduction

Ifyou’relookingforvisibility,theInternetistheplacetobe.Justaskanyadvertiserwhohasincreasedsalesusingonlineads,abloggerwhosepopularityhasledtoabookdeal,oranewspaperthathasexpandeditsreadershiptoaninternationalaudiencethankstoincreasedtraffic.

AtGooglewe’refrequentlyaskedhowwebsearchworks,andhowwebpublisherscanmaximisetheirvisibilityontheInternet.

We’vewrittenthisshortbooklettohelpyouunderstandhowasearchengine‘sees’yourcontent,andhowyoucanbesttailoryourwebpresencetoensurethatwhatyouwanttobefoundisfound–andwhatyouwanttokeephidden,stayshidden.

Fromwebmastertipsandonlinetools,toastep-by-stepguidetofrequentlyaskedquestions,thisbookletisgearedtowardssmallwebpublishersaswellasownersoflargesites.

JustastheInternetitselfhasevolveddramaticallyinthepastdecade,sohasGoogle’sownapproachtowebsearchanditsrelationshipwithsiteowners.We’vecreatednumeroustoolstohelpwebmastersmaximisethevisibilityoftheircontent,aswellascontrolhowtheirwebpagesareindexed.Butthere’salwaysmorewecando.Wehopethatthisbookletwillencourageyoutogiveusfeedbackandletusknowwhatwecandotomakethewebanevenbetterplaceforbothsearchersandpublishers.

- The Google Webmaster Team

A brief overview of web search: How it works

Inthemostsimpleofterms,youcouldthinkofthewebasaverylargebook,withanimpressiveindextellingyouexactlywhereeverythingislocated.

Googlehasasetofcomputers–the‘Googlebot’–thatarecontinually‘crawling’(browsing)billionsofpagesontheweb.Thiscrawlprocessisalgorithmic:computerprogramsdeterminewhichsitestocrawl,howoften,andhowmanypagestofetchfromeachsite.Wedon’tacceptpaymenttocrawlasitemorefrequently,andwekeepthesearchsideofourbusinessseparatefromourrevenue-generatingAdWordsservice.

Google’scrawlprocessbeginswithalistofwebpageURLs.AstheGooglebotbrowsesthesewebsitesitdetectslinksoneachpageandaddsthemtoitslistofpagestocrawl.TheGooglebotmakesacopyofeachofthepagesitcrawlsinordertocompileamassiveindexofallthewordsitsees.Thatlistalsoindicateswhereeachwordoccursoneachpage.

Whenauserentersaquery,ourmachinessearchtheindexformatchingpagesandreturnthemostrelevantresultstotheuser.Relevancyisdeterminedbyover200factors,oneofwhichisthe‘PageRank’foragivenpage.PageRankisthemeasureofthe‘importance’ofapagebasedontheincominglinksfromotherpages.Insimpleterms,eachwebpagelinkingtoWebpageXYZaddstoWebpageXYZ’sPageRank.

Before the search

During the search

4 5

Can Google find your site?

InclusioninGoogle’ssearchresultsisfreeandeasy;youdon’tevenneedtosubmityoursitetoGoogle.Infact,thevastmajorityofsiteslistedinourresultsaren’tmanuallysubmittedforinclusion,butfoundandaddedautomaticallywhentheGooglebotcrawlstheweb.

AlthoughGooglecrawlsbillionsofpages,it’sinevitablethatsomesiteswillbemissed.WhentheGooglebotmissesasite,it’sfrequentlyforoneofthefollowingreasons:

•thesiteisn’twellconnectedthroughmultiplelinkstoothersitesontheweb;

•thesitelaunchedafterGoogle’smostrecentcrawlwascompleted;

•thesitewastemporarilyunavailablewhenwetriedtocrawlitorwereceivedanerrorwhenwetriedtocrawlit.

UsingGooglewebmastertoolssuchasSitemaps,youcandeterminewhetheryoursiteiscurrentlyincludedinGoogle’sindexandwhetherwereceivederrorswhenwetriedtocrawlit(seepage14).YoucanalsousethesetoolstoaddyourURLtoGoogle’sindexmanually,orprovideGooglewithaSitemapthatgivesusgreaterinsightintoyourcontent.Thishelpsusfindnewsectionsandcontentfromyoursite.

What’s new in Google web search?

Whilethefundamentalsofwebsearchhavelargelyremainedconstant,Googleisconstantlyworkingtoimproveitssearchresults.

What’sdifferentfromwebsearch,say,fiveyearsago?Well,foroneit’salotfaster.

Inaddition,comparedtofiveyearsagoourcrawlandindexsystemsaremuchmoreintelligent.Forexample,wenowbrowsewebpagescontinuously,andschedulevisitstoeachpageinasmarterwaysoastomaximisefreshness.Thismoreefficientapproachtakesintoaccountthefactthatanewspaper’sonlinesite,forexample,needsfrequentcrawlingwhereasastaticwebsiteupdatedonceamonthdoesnot.Infact,wearealsolettingwebmasterscontrolhowfrequentlywecrawltheirsitesthroughourwebmastertools.Overallthisresultsinafresherandmorecomprehensiveindex.

Althoughwebsearchtodayisfasterandmoreefficientthanever,thekeyfactorsdeterminingawebsite’svisibilityintheGooglesearchresultshavebeenaprioritysincethedayoursearchenginemadeitsdebut:

CanGooglefindthesite?(page5)

CanGoogleindexthesite?(page6)

Doesthesitehaveuniqueandusefulcontent?(page11)

6 7

Controlling what Google indexes

EverywebpublisherhasadifferentgoalforwhatheorsheistryingtoachieveontheInternet.Somenewspaperpublishers,forexample,havechosentoprovidefreeaccesstotheirrecentarticles,whileoptingforapremium,paidserviceforaccesstoarchives.Somewantvisibilityonallofasearchengine’sproperties(forexample,GoogleMobile,GoogleImages,etc.),whileothersonlywanttoappearinwebsearchresults.

Searchengineswanttorespectpublishers’wishes–afterall,it’stheircontent.Butwearen’tmindreaders,soit’svitalthatwebmasterstellushowtheywanttheircontentindexed.ThiscanbedoneviatheRobotsExclusionProtocol,awell-establishedtechnicalspecificationthattellssearchengineswhichsiteorpartsofasiteshouldnotbesearchable,andwhichpartsshouldremainvisibleinthesearchresults.

Robots.txt: site-wide control

ThecoreoftheRobotsExclusionProtocolisasimpletextfilecalledrobots.txtthathasbeenanindustrystandardformanyyears.Withrobots.txtyoucancontrolaccessatmultiplelevels,fromtheentiresitetoindividualdirectories,pagesofaspecifictype,orevenindividualpages.

TherearesomepagesonmysitethatIdon’twantinGoogle’sindex.HowdoIkeepthemfromappearinginGoogle’ssearchresults?

Ingeneral,mostsiteownerswanttheGooglebottoaccesstheirsitesothattheirwebpagescanbefoundbyuserssearchingonGoogle.However,youmayhavepagesthatyoudon’twantindexed:forexample,internallogsornewsarticlesthatrequirepaymenttoaccess.

YoucanexcludepagesfromGoogle’sindexbycreatingarobots.txtfileandplacingitintherootdirectoryonyourwebserver.Therobots.txtfileliststhepagesthatsearchenginesshouldn’tindex.Creatingarobots.txtfileisstraightforwardandgivespublishersasophisticatedlevelofcontroloverhowsearchenginesaccesstheirwebsites.

Forexample,ifawebmasterwantstopreventindexingofhisinternallogstherobots.txtfileshouldcontain:

User-Agent:Googlebot – TheUser-AgentlinespecifiesthatthenextsectionisasetofinstructionsjustfortheGooglebot.

Disallow:/logs/ –TheDisallowlinetellstheGooglebotnottoaccessfilesinthelogssub-directoryofyoursite.

Can Google index your site?

Occasionally,webmasterswilldiscoverthattheirsitesarenotappearinginsearchresults.Theissuecouldbeoneof‘indexability’–whetherornottheGooglebotcanmakeacopyofawebpageforinclusioninoursearchresults.

Structure and ContentAcommonreasonfornon-inclusioninsearchresultsistiedtothestructureandcontentofthewebpage.Forexample,apagethatrequiresausertofilloutaformmaynotbeindexablebyGoogle.Norcanapageusing‘dynamiccontent’(Flash,JavaScript,framesordynamicallygeneratedURLs)alwaysbeeasilyindexedbysearchengines.Ifyouarewonderingwhetherthismightbeyoursite’sproblem,tryviewingthesiteinatextbrowserlikeLynxorinabrowserwithimages,Javascript,andFlashturnedoff,whichwillsignalwhetherallofyourcontentisaccessible.

Ifyoursiteusesalotofimages,ensurethatyoudescribetheimportantcontentofeachimageinthetext.Notonlydoesthisallowsearchenginestoindextheimagecorrectly,italsomakestheimageaccessibletovisuallyimpairedusers.Youcanalsomakeuseofalttextfortheimageandusedescriptivefilenames,asshowninthisexample(whichisanimageofalogoforacompanycalled‘Buffy’sHouseofPies’):

<imgsrc=”buffyshouseofpies.jpg”alt=”WelcometoBuffy’sHouseofPies!”>

URLsAnadditionalhurdlecouldbetheURLitself.IftherearesessionIDsorseveralparametersintheURL,oriftheURLredirectsanumberoftimes,Googlemaynotbeabletoindexthepage.

Server and NetworkServerornetworkissuesmaypreventusfromaccessingcertainpagesofyoursite.ByusingtoolsavailableatGoogle’sWebmasterCentral,publisherscanseealistoftheirpagestheGooglebotcannotaccess.TolearnmoreaboutWebmasterCentral,seepage13.

Robots Exclusion ProtocolOccasionallypageswillbeblockedbytheRobotsExclusionProtocol,atechnicalstandardthatallowswebpublishersto‘tell’searchenginesnottoindextheirsite’scontent(seepage7).Ifyourwebsiteisn’tappearinginGooglesearchresultsyoushouldchecktoensurethatrobots.txtorametatagisn’tblockingaccesstoourcrawlers.

8 9

Robots.txt vs. meta tags

Ingeneral,robots.txtisagoodwaytoprovidesite-widecontrol,whilemetatagsgivefine-graincontroloverindividualfiles.Metatagsareparticularlyusefulifyouhavepermissiontoeditindividualfilesbutnottheentiresite.Metatagsalsoallowyoutospecifycomplexaccess-controlpoliciesonapage-by-pagebasis.

Sometimeseitherofthetwotoolscansolvethesameproblem:

HowcanImakesurethetextofapageisindexed,butnottheimages?

Oneoptionwouldbetoblockaccesstoimagesbyfileextensionacrossyoursiteusingrobots.txt.Thefollowinglinesinarobots.txtfiletellGooglenottoindexanyfilesendingin*.jpgor*.jpeg:

User-agent:Googlebot

Disallow:/*.jpg$

Disallow:/*.jpeg$

Alternatively,ifyourContentManagementSystem(CMS)storesimagesinaseparatedirectory,youcanexcludethatentiredirectory.Ifyourimagesareinadirectorycalled/imagesyoucanexcludethatdirectoryfromallsearchenginesusing:

User-agent:*

Disallow:/images/

AnotheroptionwouldbetoaddaNOINDEXtagtoeachfilethatincludesanimage.

Alltheseapproacheswillkeepyourimagesfrombeingindexed;theonlyquestionishowextensiveyouwouldlikethisimageexclusiontobe.

ThesiteownerhasspecifiedthatnoneofthepagesinthelogsdirectoryshouldshowupinGoogle’ssearchresults.

Allmajorsearchengineswillreadandobeytheinstructionsyouputinrobots.txt,andyoucanspecifydifferentrulesfordifferentsearchenginesifyouwish.

Meta tags: fine-grain control

Inadditiontotherobots.txtfile–whichallowsyoutospecifyinstructionsconciselyforalargenumberoffilesonyourwebsite–youcanusetherobotsmetatagforfine-graincontroloverindividualpagesonyoursite.Toimplementthis,simplyaddspecificmetatagstoanHTMLpagetocontrolhowthatpageisindexed.Together,robots.txtandmetatagsgiveyoutheflexibilitytoexpresscomplexaccesspoliciesrelativelyeasily.

Ihaveaparticularnewsarticleonmysitethatisonlyaccessibletoregisteredusers.HowdoIkeepthisoutofGoogle’ssearchresults?

Todothis,simplyaddtheNOINDEXmetatagtothefirst<head>sectionofthearticle.Itshouldlooksomethinglikethis:

<html>

<head>

<metaname=”googlebot”content=”noindex”>

[...]

ThisstopsGooglefromindexingthisfile.

However,it’sworthkeepinginmindthatinsomecasesyoumaywantGoogletoindexthesetypesofpages–forexample,anarchivenewspaperarticlethatviewerscanpaytoreadonline.Whilethistypeof“premium”contentwon’tappearinGoogle’ssearchresults,certainGoogleserviceslikeNewsArchiveSearchwillincludethearticleintheirindexes,withpaymentinformationclearlydisplayedtousers.

10 11

Does your site have unique and useful content?

Oncethesiteisdiscoverableandindexable,thefinalquestiontoaskiswhetherthecontentofthewebpagesisuniqueanduseful.

Firstlookatyourtextasawhole.Areyourtitleandtextlinksdescriptive?Doesyourcopyflownaturallyandinaclearandintuitivemanner?

Justasachapterinabookisorganisedaroundspecificareasandthemes,soeachwebpageshouldbefocusedonaspecificareaortopic.Keywordsandphrasesemergenaturallyfromthistypeofcopy,andusersarefarmorelikelytostayonawebpagethatprovidesrelevantcontentandlinks.

Makesure,however,thatthephrasesyouwriteincludethephrasesthatvisitorswilllikelysearchfor.Forinstance,ifyoursiteisforanMGenthusiastclub,makesurethewords‘MG’and‘cars’actuallyappearinthecopy,ratherthanonlytermslike‘Britishautomobiles’.

Controlling caching and snippets

Searchresultsusuallyshowacachedpagelinkandasnippet.Here,forexample,isoneofthefirstresultsweseewhenwesearchfor‘Mallardduck’:

Snippet–anextractoftextfromthewebpage

Cached link–thislinktakesuserstoacopyofthepagestoredonGoogle’sservers

Whyhaveasnippet?Usersaremorelikelytovisitawebsiteifthesearchresultsshowasnippetfromthatsite.Thisisbecausesnippetsmakeitmucheasierforuserstoseetherelevanceoftheresulttotheirquery.Ifauserisn’tabletomakethisdeterminationquickly,heorsheusuallymovesontothenextsearchresult.

Whyhaveacachedlink?Thecachedlinkisusefulinanumberofcases,suchaswhensitesbecometemporarilyunavailable;whennewssitesgetoverloadedintheaftermathofamajorevent;orwhensitesareaccidentallydeleted.AnotheradvantageisthatGoogle’scachedcopyhighlightsthewordsemployedbytheuserintheirsearch,allowingaquickevaluationoftherelevanceofthepage.

MostwebpublisherswantGoogletodisplayboththesnippetandthecachedlink.However,therearesomecaseswhereasiteownermightwanttodisableoneorbothofthese:

Mynewspapercontentchangesseveraltimesaday.Itdoesn’tseemliketheGooglebotisindexingthatcontentasquicklyasweareupdatingit,andthecachedlinkispointingtoapagethatisnolongerup-to-date.HowcanIpreventGooglefromcreatingacachedlink?

ThenewssiteownercanpreventthiscachedlinkfromappearinginsearchresultsbyaddingtheNOARCHIVEtagtoitspage:

<METANAME=”GOOGLEBOT”CONTENT=”NOARCHIVE”>

Similarly,youcantellGooglenottodisplayasnippetforapageviatheNOSNIPPETtag:

<METANAME=”GOOGLEBOT”CONTENT=”NOSNIPPET”>

Note:AddingNOSNIPPETalsohastheeffectofpreventingacachelinkfrombeingshown,soifyouspecifyNOSNIPPET,youautomaticallygetNOARCHIVEaswell.

12 13

Increasing visibility: best practices

Siteownersoftenaskusaboutthebestwaystoincreasethevisibilityandrankingoftheirsitesinoursearchresults.Oursimpleansweris,‘Thinklikeauser,becausethat’showwetrytothink.’

Whatdoesthismeaninpractice?Aboveall,makesuretogivevisitorstheinformationtheyarelookingfor,asrelevanceiswhatwilldrivetraffictoyoursiteandhelpyouretainit.

Manysiteownersfixateonhowwelltheirrespectivewebpagesrank.Butrankingisdeterminedbyover200criteriainadditiontoPageRank.It’smuchbettertospendyourtimefocusingonthequalityofyourcontentanditsaccessibilitythantryingtofindwaysto‘game’asearchengine’salgorithm.Ifasitedoesn’tmeetourqualityguidelines,itmaybeblockedfromtheindex.

What to do:

1. Createrelevant,eye-catchingcontent:visitorswillarriveatyourpagesviavariouslinks,somakesureeachpagewillgrabtheirattention.

2. Involveusers:canyouaddacommentssectionorblogtoyourwebsite?Buildingacommunityhelpsdriveregularusageofyoursite.Gettingyourvisitorsinvolvedhelpsboostvisibilityanduserfidelity.

3. Monitoryoursite:useWebmasterCentral(seepage13)toseewhatqueriesarepushingvisitorstoyoursite,ortotrackrankingchangesinsearchresultsinrelationtolargersitechanges.

4. Aimforhigh-quality,inboundlinks.

5. Providecleartextlinks:placetextlinksappropriatelyonyoursiteandmakesuretheyincludetermsthatdescribethetopic.

What to avoid:

1. Don’tfillyourpagewithlistsofkeywords.

2. Don’tattemptto‘cloak’pagesbywritingtextthatcanbeseenbysearch enginesbutnotbyusers.

3. Don’tputup‘crawleronly’pagesbysettinguppagesorlinkswhosesole purposeistofoolsearchengines.

4. Don’tuseimagestodisplayimportantnames,contentorlinks–search enginescan’t‘read’images.

5. Don’tcreatemultiplecopiesofapageunderdifferentURLswiththeintentofmisleadingsearchengines.

Whenindoubt,consultourwebmasterguidelines,availableat:google.com/webmasters/guidelines.html

Webmaster Central

Asacompanyaimingtoprovidethemostrelevantandusefulsearchresultsontheweb,westrivetoprovidescalableandequitablesupportforallwebmastersandallwebsites,nomatterhowlargeorsmall.That’swhywe’vecreatedWebmasterCentral,locatedatgoogle.com/webmasters.

WebmasterCentralisagreatresourceforallwebpublishers.Itcomprehensivelyanswerscrawling,indexing,andrankingquestions;providesandavenueforfeedbackandissues;andoffersdiagnostictoolsthathelpwebmastersdebugpotentialcrawlingproblems.

Here’satasteofwhatyoucanfindatWebmasterCentral.

• Diagnosepotentialproblemsinaccessingpagesandoffersolutions

• Requestremovalofspecificpagesfromourindex

• Ensureyourrobots.txtfileisallowingandblockingthepagesyouexpect

• Seequeryandpagestatisticsrelatedtoyourwebsite:

• Querystatistics:Determinewhichsearchqueriesbringyoursitethemostvisitors,andwhattopicscouldyoursiteexpandupontocapturemoretraffic?

• PageAnalysis:SeeyourwebpageasGoogleseesit.Viewthemostcommonwordsonyoursite,inboundlinkstothesite,andhowothersdescribeyoursitewhentheylinktoit.

• Crawlrate:SeehowfrequentlyyoursiteisbeingcrawledbytheGooglebotandtellGooglewhetheritshouldbecrawlingfasterorslower.

Sitemaps

WebmasterCentralalsoofferspublishersSitemapsforwebsearch,mobile,andnewsresults.

Sitemapsisaprotocolwesupportwithothersearchenginestohelpwebmastersgiveusmoreinformationabouttheirwebpages.SitemapscomplementsexistingstandardwebcrawlmechanismsandwebmasterscanuseittotellGoogleaboutthepagesoftheirsiteinordertoimprovethecrawlingandvisibilityoftheirpagesinGooglesearchresults.

InadditiontoWebSearchSitemaps,wealsoofferGoogleMobileSitemaps,whichenablespublisherstosubmitURLsthatservecontentformobiledevicesintoourmobileindex.

AndforthosepublisherswhosenewssiteisincludedinGoogleNews,NewsSitemapscanhelpprovidestatisticsaboutthepublisher’sarticles,fromqueriestofrequencyofappearance.UsedinconjunctionwithWebmasterCentraldiagnostictools,NewsSitemapscanalsoprovideerrorreportsthathelpexplainanyproblemsGoogleexperienceswhencrawlingorextractingnewsarticlesfromapublisher’ssite.Inaddition,apublishercansubmitaNewsSitemapcontainingURLsthatitwouldlikeconsideredforinclusioninGoogleNews.NewsSitemaps,unlikeWebandMobileSitemaps,iscurrentlyonlyavailableinEnglish,althoughwehopetosoonmakeitavailableinotherlanguagesaswell.

Frequently Asked Questions

Whycan’tyoudoone-on-onesupportformywebsite?Bysomeestimates,thereare100millionsitesontheweb.Eachofthosewebsitesisimportanttous,becausewithoutthem–nomatterhowbigorsmall–ourindexwouldbelesscomprehensive,andultimatelylessusefultoourusers.

WebmasterCentralisagreatsourceofsupportforalltypesofwebsites.Wepostandanswerpublishers’questionssothateveryonecanbenefitfromtheinformation.AtWebmasterCentralyoucanalsofindafriendlyandusefulcommunityofwebmasterswhocansharetipsandhelpwithtroubleshooting.

Dotheadsyoushowinfluenceyourrankings?Areyouradlistingsandsearchresultstotallyseparate?Adandsearchrankingsarenotrelatedinanyway,andinfactwehaveentirelyseparateteamstoworkonthemsothatthereisnointerference.Webelievethattheobjectivityofoursearchresultsiscrucialtoprovidingthebestexperienceforusers.

HowdoIaddasitetoGoogle’ssearchindex?InclusioninGoogle’ssearchresultsisfreeandeasyanddoesnotrequireamanualsubmissionofthesitetoGoogle.Googleisafullyautomatedsearchengine;itcrawlsthewebonaregularbasisandfindssitestoaddtoourindex.Infact,thevastmajorityofsiteslistedinourresultsaren’tmanuallysubmittedforinclusion,butfoundandaddedautomaticallywhenourspiderscrawltheweb.

Inaddition,GoogleWebmasterTools(atWebmasterCentral)provideaneasywayforwebmasterstosubmitasitemaportheirURLstotheGoogleindexandgetdetailedreportsaboutthevisibilityoftheirpagesonGoogle.WithGoogleWebmasterTools,siteownerscanautomaticallykeepGoogleinformedofallcurrentpagesandofanyupdatesmadetothosepages.

Howlong,onaverage,doesittakeforGoogletofindanewlycreatedwebsite,andhowfrequentlydoesGooglecrawlthewebingeneral?ThereisnosetamountoftimeittakesforGoogletofindanewsite.

TheGooglebotregularlycrawlsthewebtorebuildourindex.ByusingWebmasterCentral,awebmastercanseehowfrequentlyhissiteisbeingcrawledbytheGooglebotandtellGooglewhetheritshouldbecrawlingfasterorslower.

WhatifIwantmywebsitetoappearonwebsearchresults,butnotonseparateserviceslikeGoogleNewsorGoogleImageSearch?Googlealwaysallowswebpublisherstooptoutofitsservices,andapublishercancontactthesupportteamofaparticularproducttodoso.

Asdiscussedearlierinthisbooklet,theRobotsExclusionProtocolcanbeusedtoblockindexingofbothimagesandwebpages.The‘URLremoval’featureinWebmasterCentralcanalsobeusedforthispurposeandcoverswebsearchandimagesearch.

14 15

Inaddition,becausetheGooglebotreliesonseveraldifferentbots,youcantargetwhatyoublock:

•Googlebot:crawlpagesfromourwebindexandournewsindex

•Googlebot-Mobile:crawlspagesforourmobileindex

•Googlebot-Image:crawlspagesforourimageindex

•Mediapartners-Google:crawlspagestodetermineAdSensecontent.WeonlyusethisbottocrawlyoursiteifyoushowAdSenseadsonyoursite.

•Adsbot-Google:crawlspagestomeasureAdWordslandingpagequality.WeonlyusethisbotifyouuseGoogleAdWordstoadvertiseyoursite.

Forinstance,toblockGooglebotentirely,youcanusethefollowingsyntax:

User-agent:Googlebot

Disallow:/

CanIchoosewhattextIwantspecifiedasasnippet? No.Thisisn’tagoodidea,bothfromtheperspectiveoftheuserandthecontentcreator.Wechooseasnippetoftextfromthesitethatshowsthesearcher’squeryincontext,whichinturndemonstratestherelevanceoftheresult.

Studiesshowthatusersaremorelikelytovisitawebsiteifthesearchresultsshowthesnippet.Thisisbecausesnippetsmakeitmucheasierforuserstoseewhytheresultisrelevanttotheirquery.Ifausercan’tmakethisdeterminationquickly,heorsheusuallymovesontothenextsearchresult.

WebpublisherscanincludeametatagintheirpagesinordertoprovideGooglewithadditionalinputincaseswherewearen’tabletogenerateausefulsnippetfromthecontentonthepagealgorithmically.Todothis,simplyaddthefollowingtothe<head>sectionofthepage:

<metaname=”description”content=”Whydoesn’tAnya

likebunnies?We’reabouttofindout.”>

Anywebpublisherwhodoesn’twantasnippetgeneratedfromtheirpagescanusetheNOSNIPPETtag,asfollows:

<metaname=”robots”content=”nosnippet”>

Finally,wesometimesuseasite’sdescriptionfromtheOpenDirectoryProjectforthesearchresultsnippet.Ifyoudon’twantthisdescriptiontobeused,simplyaddthefollowingmetatag:

<metaname=”robots”content=”noodp”>

Breakingnewsarticlesonmysiteonlyappearforafewhoursbeforebeingupdatedandmovedtoastandardarticlessection.IwantthefullarticletoappearinGoogle’sindex,butnotthesebreakingnewsstories.Oneoptionistoputallthebreakingnewsarticlesinonedirectory,anduserobots.txttodisallowtheGooglebotaccesstothatdirectory.

AnotheroptionistoaddtheNOFOLLOWtagtothe<HEAD>sectionofthehtmlofyourbreakingnewssection.ThistellstheGooglebotnottofollowanylinksitfindsonthatpage.Keepinmind,though,thatNOFOLLOWonlystopstheGooglebotfromfollowinglinksfromonepagetoanother.Ifanotherwebpagelinkstothatarticle,Googlecanstillfindthearticlewhenitindexes.

IfIhavemultipledomainnamesandrunthesamecontentoffthesedifferentdomainswillIbekickedoutofyoursearchresults?Althoughsomepublishersmaytrytofoolsearchenginesbyduplicatingcontentandrunningmirrorsites,thereisalsolegitimatecontentthatmaybeduplicatedforgoodreasons.Googledoesn’twanttopenalisethosesites.Forexample,wedon’ttreatsimilarcontentexpressedindifferentlanguages(say,EnglishononesiteandFrenchinanother)asduplicatecontent.

Havingthesamecontentonmultiplewebsites(e.g.articlesyndication)won’tnecessarilyresultinoneorsomeofthosesitesbeingremovedfromthesearchresultsentirely.However,keepinmindthateachinstanceofthearticleislikelytoappearlowerintherankingsbecauseitonlyhasafractionoftheincominglinksthatasinglecopywould.Ingeneral,asinglecopyofanarticlewillrankhigherandthereforebeseenbymoreusersthanwillmultiplecopiesofthesamecontent.

Inaddition,inordertoensuresearchquality,Googledoesn’tincludemultiplecopiesofapageinoursearchresults.Rather,weoftenchoosejustoneversionofthepagetoshow.However,webmasterscanindicatetoGoogletheirpreferredversionbyusingrobots.txtorametatagtoblockanycopiestheydon’twantshowingupinoursearchresults.

WhyismysitebeingblockedfromtheGoogleindex? First,yoursitemaynotbeblocked.Therearemanyreasonswhyasitemaynotappearinoursearchresults(seepages5-11).

Ifyoursiteisn’tpresentinganyhurdlesfordiscoveryorindexation,thenitcouldbethatyoursiteisblocked.SitesmaybeblockedfromourindexbecausetheydonotmeetthequalitystandardsoutlinedinourWebmasterGuidelines(availableatWebmasterCentral).Thismostoftenhappenswhenawebsiteisusingunfairmethodstotrytoappearhigherinthesearchrankings.Commonguidelineviolationsincludecloaking(writingtextinsuchawaythatitcanbeseenbysearchenginesbutnotbyusers)orsettinguppages/linkswiththesolepurposeoffoolingsearchenginesandmanipulatingsearchengineresults.

Whenwebmasterssuspectthattheirsitesviolateourqualityguidelines,theycanmodifytheirsitetomeettheseguidelines,thenclickthe“requestre-inclusion”linkwithinourWebmasterToolsinterfacetoaskustore-evaluatethesite.

16 17

18 19

Glossary

Cache link AsnapshotofhowapageappearedthelasttimeGooglevisitedit.Acachedcopyallowsuserstoviewapageevenwhentheliveversionisunavailable,althoughthecontentmaydifferslightly.Toviewacachedcopy,clickonthe‘cached’linkthatappearsunderneathasearchresult.

Cloaking Showingsearchenginesdifferentcontentthanwhatyoushowusers.

CMS (Content Management System) Asoftwaresystemusedtomanagecontentfromcomputer,image,andaudiofilestowebcontent.

Crawler SoftwareusedtodiscoverandindexURLsontheweboranintranet.

Crawling Theprocessusedbysearchenginestocollectpagesfromtheweb.

Dynamic content Contentsuchasimages,animations,orvideoswhichrelyonFlash,JavaScript,frames,ordynamicallygeneratedURLs.

File extension Thenameofacomputerfile(.doc,.txt,.pdf,etc.)oftenusedtoindicatethetypeofdatastoredinthefile.

HTML (Hypertext Markup Language) Amark-uplanguageusedonthewebtostructuretext.

To index Theprocessofhavingyoursite’scontentaddedtoasearchengine.

Keyword Awordthatisenteredintothesearchboxofasearchengine.Thesearchenginethenlooksforpagesthatincludethewordorphrase.

Meta tags AtagintheHTMLthatdescribesthecontentofawebpage.Metatagscanbeusedtocontrolindexingofindividualpagesinawebsite.

Mirror site Aduplicatewebpage;sometimesusedtofoolsearchenginesandtrytooptimiseindexationandwebrankingsofawebsite.

20

Page Rank AGooglefeaturethathelpsdeterminetherankofasiteinoursearchresults.PageRankreliesontheuniquelydemocraticnatureofthewebbyusingitsvastlinkstructureasanindicatorofanindividualpage’svalue.Important,high-qualitysitesreceiveahigherPageRank,whichGooglerememberseachtimeitconductsasearch.GooglecombinesPageRankwithsophisticatedtext-matchingtechniquestofindpagesthatarebothimportantandrelevanttoyoursearches.

Robots Exclusion protocol Atechnicalspecificationthattellssearchengineswhichsiteorpartsofasiteshouldbenon-searchable,andwhichpartsshouldremainvisibleinthesearchresults.

Robots.txt Atextfilethatallowsawebpublishertocontrolaccesstotheirsiteatmultiplelevels,fromtheentiresitetoindividualdirectories,pagesofaspecifictype,orevenindividualpages.Thisfiletellscrawlerswhichdirectoriescanorcannotbecrawled.

Root directory Thetoporcoredirectoryinacomputerfilesystem.

URL (Uniform Resource Locator) TheaddressofawebsiteontheInternet,consistingoftheaccessprotocol(http),domainname(www.google.com),andinsomecasesthelocationofanotherfile(www.google.com/webmaster).

FormoreinfoaboutWebmasterCentralpleasevisit:

google.com/webmasters/

©Copyright2007.GoogleisatrademarkofGoogleInc.Allothercompanyandproductnamesmaybetrademarksoftherespectivecompanieswithwhichtheyareassociated.

webmaster guide-en

Documents

meta tags

controlling caching

content

search

google

controlling

site

robots