webmaster guide-en
Post on 14-Sep-2014
924 views
DESCRIPTION
TRANSCRIPT
Contents
Introduction 2
Abriefoverviewofwebsearch 3
What’snewinGooglewebsearch? 4
CanGooglefindyoursite? 5
CanGoogleindexyoursite? 6
Controlling what Google indexes 7
Robots.txt vs. meta tags 9
Controlling caching and snippets 10
Doesyoursitehaveuniqueandusefulcontent? 11
Increasingvisibility:bestpractices 12
WebmasterCentral 13
Sitemaps 14
FrequentlyAskedQuestions 15
Glossary 19
3
Introduction
Ifyou’relookingforvisibility,theInternetistheplacetobe.Justaskanyadvertiserwhohasincreasedsalesusingonlineads,abloggerwhosepopularityhasledtoabookdeal,oranewspaperthathasexpandeditsreadershiptoaninternationalaudiencethankstoincreasedtraffic.
AtGooglewe’refrequentlyaskedhowwebsearchworks,andhowwebpublisherscanmaximisetheirvisibilityontheInternet.
We’vewrittenthisshortbooklettohelpyouunderstandhowasearchengine‘sees’yourcontent,andhowyoucanbesttailoryourwebpresencetoensurethatwhatyouwanttobefoundisfound–andwhatyouwanttokeephidden,stayshidden.
Fromwebmastertipsandonlinetools,toastep-by-stepguidetofrequentlyaskedquestions,thisbookletisgearedtowardssmallwebpublishersaswellasownersoflargesites.
JustastheInternetitselfhasevolveddramaticallyinthepastdecade,sohasGoogle’sownapproachtowebsearchanditsrelationshipwithsiteowners.We’vecreatednumeroustoolstohelpwebmastersmaximisethevisibilityoftheircontent,aswellascontrolhowtheirwebpagesareindexed.Butthere’salwaysmorewecando.Wehopethatthisbookletwillencourageyoutogiveusfeedbackandletusknowwhatwecandotomakethewebanevenbetterplaceforbothsearchersandpublishers.
- The Google Webmaster Team
A brief overview of web search: How it works
Inthemostsimpleofterms,youcouldthinkofthewebasaverylargebook,withanimpressiveindextellingyouexactlywhereeverythingislocated.
Googlehasasetofcomputers–the‘Googlebot’–thatarecontinually‘crawling’(browsing)billionsofpagesontheweb.Thiscrawlprocessisalgorithmic:computerprogramsdeterminewhichsitestocrawl,howoften,andhowmanypagestofetchfromeachsite.Wedon’tacceptpaymenttocrawlasitemorefrequently,andwekeepthesearchsideofourbusinessseparatefromourrevenue-generatingAdWordsservice.
Google’scrawlprocessbeginswithalistofwebpageURLs.AstheGooglebotbrowsesthesewebsitesitdetectslinksoneachpageandaddsthemtoitslistofpagestocrawl.TheGooglebotmakesacopyofeachofthepagesitcrawlsinordertocompileamassiveindexofallthewordsitsees.Thatlistalsoindicateswhereeachwordoccursoneachpage.
Whenauserentersaquery,ourmachinessearchtheindexformatchingpagesandreturnthemostrelevantresultstotheuser.Relevancyisdeterminedbyover200factors,oneofwhichisthe‘PageRank’foragivenpage.PageRankisthemeasureofthe‘importance’ofapagebasedontheincominglinksfromotherpages.Insimpleterms,eachwebpagelinkingtoWebpageXYZaddstoWebpageXYZ’sPageRank.
Before the search
During the search
4 5
Can Google find your site?
InclusioninGoogle’ssearchresultsisfreeandeasy;youdon’tevenneedtosubmityoursitetoGoogle.Infact,thevastmajorityofsiteslistedinourresultsaren’tmanuallysubmittedforinclusion,butfoundandaddedautomaticallywhentheGooglebotcrawlstheweb.
AlthoughGooglecrawlsbillionsofpages,it’sinevitablethatsomesiteswillbemissed.WhentheGooglebotmissesasite,it’sfrequentlyforoneofthefollowingreasons:
•thesiteisn’twellconnectedthroughmultiplelinkstoothersitesontheweb;
•thesitelaunchedafterGoogle’smostrecentcrawlwascompleted;
•thesitewastemporarilyunavailablewhenwetriedtocrawlitorwereceivedanerrorwhenwetriedtocrawlit.
UsingGooglewebmastertoolssuchasSitemaps,youcandeterminewhetheryoursiteiscurrentlyincludedinGoogle’sindexandwhetherwereceivederrorswhenwetriedtocrawlit(seepage14).YoucanalsousethesetoolstoaddyourURLtoGoogle’sindexmanually,orprovideGooglewithaSitemapthatgivesusgreaterinsightintoyourcontent.Thishelpsusfindnewsectionsandcontentfromyoursite.
What’s new in Google web search?
Whilethefundamentalsofwebsearchhavelargelyremainedconstant,Googleisconstantlyworkingtoimproveitssearchresults.
What’sdifferentfromwebsearch,say,fiveyearsago?Well,foroneit’salotfaster.
Inaddition,comparedtofiveyearsagoourcrawlandindexsystemsaremuchmoreintelligent.Forexample,wenowbrowsewebpagescontinuously,andschedulevisitstoeachpageinasmarterwaysoastomaximisefreshness.Thismoreefficientapproachtakesintoaccountthefactthatanewspaper’sonlinesite,forexample,needsfrequentcrawlingwhereasastaticwebsiteupdatedonceamonthdoesnot.Infact,wearealsolettingwebmasterscontrolhowfrequentlywecrawltheirsitesthroughourwebmastertools.Overallthisresultsinafresherandmorecomprehensiveindex.
Althoughwebsearchtodayisfasterandmoreefficientthanever,thekeyfactorsdeterminingawebsite’svisibilityintheGooglesearchresultshavebeenaprioritysincethedayoursearchenginemadeitsdebut:
CanGooglefindthesite?(page5)
CanGoogleindexthesite?(page6)
Doesthesitehaveuniqueandusefulcontent?(page11)
6 7
Controlling what Google indexes
EverywebpublisherhasadifferentgoalforwhatheorsheistryingtoachieveontheInternet.Somenewspaperpublishers,forexample,havechosentoprovidefreeaccesstotheirrecentarticles,whileoptingforapremium,paidserviceforaccesstoarchives.Somewantvisibilityonallofasearchengine’sproperties(forexample,GoogleMobile,GoogleImages,etc.),whileothersonlywanttoappearinwebsearchresults.
Searchengineswanttorespectpublishers’wishes–afterall,it’stheircontent.Butwearen’tmindreaders,soit’svitalthatwebmasterstellushowtheywanttheircontentindexed.ThiscanbedoneviatheRobotsExclusionProtocol,awell-establishedtechnicalspecificationthattellssearchengineswhichsiteorpartsofasiteshouldnotbesearchable,andwhichpartsshouldremainvisibleinthesearchresults.
Robots.txt: site-wide control
ThecoreoftheRobotsExclusionProtocolisasimpletextfilecalledrobots.txtthathasbeenanindustrystandardformanyyears.Withrobots.txtyoucancontrolaccessatmultiplelevels,fromtheentiresitetoindividualdirectories,pagesofaspecifictype,orevenindividualpages.
TherearesomepagesonmysitethatIdon’twantinGoogle’sindex.HowdoIkeepthemfromappearinginGoogle’ssearchresults?
Ingeneral,mostsiteownerswanttheGooglebottoaccesstheirsitesothattheirwebpagescanbefoundbyuserssearchingonGoogle.However,youmayhavepagesthatyoudon’twantindexed:forexample,internallogsornewsarticlesthatrequirepaymenttoaccess.
YoucanexcludepagesfromGoogle’sindexbycreatingarobots.txtfileandplacingitintherootdirectoryonyourwebserver.Therobots.txtfileliststhepagesthatsearchenginesshouldn’tindex.Creatingarobots.txtfileisstraightforwardandgivespublishersasophisticatedlevelofcontroloverhowsearchenginesaccesstheirwebsites.
Forexample,ifawebmasterwantstopreventindexingofhisinternallogstherobots.txtfileshouldcontain:
User-Agent:Googlebot – TheUser-AgentlinespecifiesthatthenextsectionisasetofinstructionsjustfortheGooglebot.
Disallow:/logs/ –TheDisallowlinetellstheGooglebotnottoaccessfilesinthelogssub-directoryofyoursite.
Can Google index your site?
Occasionally,webmasterswilldiscoverthattheirsitesarenotappearinginsearchresults.Theissuecouldbeoneof‘indexability’–whetherornottheGooglebotcanmakeacopyofawebpageforinclusioninoursearchresults.
Structure and ContentAcommonreasonfornon-inclusioninsearchresultsistiedtothestructureandcontentofthewebpage.Forexample,apagethatrequiresausertofilloutaformmaynotbeindexablebyGoogle.Norcanapageusing‘dynamiccontent’(Flash,JavaScript,framesordynamicallygeneratedURLs)alwaysbeeasilyindexedbysearchengines.Ifyouarewonderingwhetherthismightbeyoursite’sproblem,tryviewingthesiteinatextbrowserlikeLynxorinabrowserwithimages,Javascript,andFlashturnedoff,whichwillsignalwhetherallofyourcontentisaccessible.
Ifyoursiteusesalotofimages,ensurethatyoudescribetheimportantcontentofeachimageinthetext.Notonlydoesthisallowsearchenginestoindextheimagecorrectly,italsomakestheimageaccessibletovisuallyimpairedusers.Youcanalsomakeuseofalttextfortheimageandusedescriptivefilenames,asshowninthisexample(whichisanimageofalogoforacompanycalled‘Buffy’sHouseofPies’):
<imgsrc=”buffyshouseofpies.jpg”alt=”WelcometoBuffy’sHouseofPies!”>
URLsAnadditionalhurdlecouldbetheURLitself.IftherearesessionIDsorseveralparametersintheURL,oriftheURLredirectsanumberoftimes,Googlemaynotbeabletoindexthepage.
Server and NetworkServerornetworkissuesmaypreventusfromaccessingcertainpagesofyoursite.ByusingtoolsavailableatGoogle’sWebmasterCentral,publisherscanseealistoftheirpagestheGooglebotcannotaccess.TolearnmoreaboutWebmasterCentral,seepage13.
Robots Exclusion ProtocolOccasionallypageswillbeblockedbytheRobotsExclusionProtocol,atechnicalstandardthatallowswebpublishersto‘tell’searchenginesnottoindextheirsite’scontent(seepage7).Ifyourwebsiteisn’tappearinginGooglesearchresultsyoushouldchecktoensurethatrobots.txtorametatagisn’tblockingaccesstoourcrawlers.
8 9
Robots.txt vs. meta tags
Ingeneral,robots.txtisagoodwaytoprovidesite-widecontrol,whilemetatagsgivefine-graincontroloverindividualfiles.Metatagsareparticularlyusefulifyouhavepermissiontoeditindividualfilesbutnottheentiresite.Metatagsalsoallowyoutospecifycomplexaccess-controlpoliciesonapage-by-pagebasis.
Sometimeseitherofthetwotoolscansolvethesameproblem:
HowcanImakesurethetextofapageisindexed,butnottheimages?
Oneoptionwouldbetoblockaccesstoimagesbyfileextensionacrossyoursiteusingrobots.txt.Thefollowinglinesinarobots.txtfiletellGooglenottoindexanyfilesendingin*.jpgor*.jpeg:
User-agent:Googlebot
Disallow:/*.jpg$
Disallow:/*.jpeg$
Alternatively,ifyourContentManagementSystem(CMS)storesimagesinaseparatedirectory,youcanexcludethatentiredirectory.Ifyourimagesareinadirectorycalled/imagesyoucanexcludethatdirectoryfromallsearchenginesusing:
User-agent:*
Disallow:/images/
AnotheroptionwouldbetoaddaNOINDEXtagtoeachfilethatincludesanimage.
Alltheseapproacheswillkeepyourimagesfrombeingindexed;theonlyquestionishowextensiveyouwouldlikethisimageexclusiontobe.
ThesiteownerhasspecifiedthatnoneofthepagesinthelogsdirectoryshouldshowupinGoogle’ssearchresults.
Allmajorsearchengineswillreadandobeytheinstructionsyouputinrobots.txt,andyoucanspecifydifferentrulesfordifferentsearchenginesifyouwish.
Meta tags: fine-grain control
Inadditiontotherobots.txtfile–whichallowsyoutospecifyinstructionsconciselyforalargenumberoffilesonyourwebsite–youcanusetherobotsmetatagforfine-graincontroloverindividualpagesonyoursite.Toimplementthis,simplyaddspecificmetatagstoanHTMLpagetocontrolhowthatpageisindexed.Together,robots.txtandmetatagsgiveyoutheflexibilitytoexpresscomplexaccesspoliciesrelativelyeasily.
Ihaveaparticularnewsarticleonmysitethatisonlyaccessibletoregisteredusers.HowdoIkeepthisoutofGoogle’ssearchresults?
Todothis,simplyaddtheNOINDEXmetatagtothefirst<head>sectionofthearticle.Itshouldlooksomethinglikethis:
<html>
<head>
<metaname=”googlebot”content=”noindex”>
[...]
ThisstopsGooglefromindexingthisfile.
However,it’sworthkeepinginmindthatinsomecasesyoumaywantGoogletoindexthesetypesofpages–forexample,anarchivenewspaperarticlethatviewerscanpaytoreadonline.Whilethistypeof“premium”contentwon’tappearinGoogle’ssearchresults,certainGoogleserviceslikeNewsArchiveSearchwillincludethearticleintheirindexes,withpaymentinformationclearlydisplayedtousers.
10 11
Does your site have unique and useful content?
Oncethesiteisdiscoverableandindexable,thefinalquestiontoaskiswhetherthecontentofthewebpagesisuniqueanduseful.
Firstlookatyourtextasawhole.Areyourtitleandtextlinksdescriptive?Doesyourcopyflownaturallyandinaclearandintuitivemanner?
Justasachapterinabookisorganisedaroundspecificareasandthemes,soeachwebpageshouldbefocusedonaspecificareaortopic.Keywordsandphrasesemergenaturallyfromthistypeofcopy,andusersarefarmorelikelytostayonawebpagethatprovidesrelevantcontentandlinks.
Makesure,however,thatthephrasesyouwriteincludethephrasesthatvisitorswilllikelysearchfor.Forinstance,ifyoursiteisforanMGenthusiastclub,makesurethewords‘MG’and‘cars’actuallyappearinthecopy,ratherthanonlytermslike‘Britishautomobiles’.
Controlling caching and snippets
Searchresultsusuallyshowacachedpagelinkandasnippet.Here,forexample,isoneofthefirstresultsweseewhenwesearchfor‘Mallardduck’:
Snippet–anextractoftextfromthewebpage
Cached link–thislinktakesuserstoacopyofthepagestoredonGoogle’sservers
Whyhaveasnippet?Usersaremorelikelytovisitawebsiteifthesearchresultsshowasnippetfromthatsite.Thisisbecausesnippetsmakeitmucheasierforuserstoseetherelevanceoftheresulttotheirquery.Ifauserisn’tabletomakethisdeterminationquickly,heorsheusuallymovesontothenextsearchresult.
Whyhaveacachedlink?Thecachedlinkisusefulinanumberofcases,suchaswhensitesbecometemporarilyunavailable;whennewssitesgetoverloadedintheaftermathofamajorevent;orwhensitesareaccidentallydeleted.AnotheradvantageisthatGoogle’scachedcopyhighlightsthewordsemployedbytheuserintheirsearch,allowingaquickevaluationoftherelevanceofthepage.
MostwebpublisherswantGoogletodisplayboththesnippetandthecachedlink.However,therearesomecaseswhereasiteownermightwanttodisableoneorbothofthese:
Mynewspapercontentchangesseveraltimesaday.Itdoesn’tseemliketheGooglebotisindexingthatcontentasquicklyasweareupdatingit,andthecachedlinkispointingtoapagethatisnolongerup-to-date.HowcanIpreventGooglefromcreatingacachedlink?
ThenewssiteownercanpreventthiscachedlinkfromappearinginsearchresultsbyaddingtheNOARCHIVEtagtoitspage:
<METANAME=”GOOGLEBOT”CONTENT=”NOARCHIVE”>
Similarly,youcantellGooglenottodisplayasnippetforapageviatheNOSNIPPETtag:
<METANAME=”GOOGLEBOT”CONTENT=”NOSNIPPET”>
Note:AddingNOSNIPPETalsohastheeffectofpreventingacachelinkfrombeingshown,soifyouspecifyNOSNIPPET,youautomaticallygetNOARCHIVEaswell.
12 13
Increasing visibility: best practices
Siteownersoftenaskusaboutthebestwaystoincreasethevisibilityandrankingoftheirsitesinoursearchresults.Oursimpleansweris,‘Thinklikeauser,becausethat’showwetrytothink.’
Whatdoesthismeaninpractice?Aboveall,makesuretogivevisitorstheinformationtheyarelookingfor,asrelevanceiswhatwilldrivetraffictoyoursiteandhelpyouretainit.
Manysiteownersfixateonhowwelltheirrespectivewebpagesrank.Butrankingisdeterminedbyover200criteriainadditiontoPageRank.It’smuchbettertospendyourtimefocusingonthequalityofyourcontentanditsaccessibilitythantryingtofindwaysto‘game’asearchengine’salgorithm.Ifasitedoesn’tmeetourqualityguidelines,itmaybeblockedfromtheindex.
What to do:
1. Createrelevant,eye-catchingcontent:visitorswillarriveatyourpagesviavariouslinks,somakesureeachpagewillgrabtheirattention.
2. Involveusers:canyouaddacommentssectionorblogtoyourwebsite?Buildingacommunityhelpsdriveregularusageofyoursite.Gettingyourvisitorsinvolvedhelpsboostvisibilityanduserfidelity.
3. Monitoryoursite:useWebmasterCentral(seepage13)toseewhatqueriesarepushingvisitorstoyoursite,ortotrackrankingchangesinsearchresultsinrelationtolargersitechanges.
4. Aimforhigh-quality,inboundlinks.
5. Providecleartextlinks:placetextlinksappropriatelyonyoursiteandmakesuretheyincludetermsthatdescribethetopic.
What to avoid:
1. Don’tfillyourpagewithlistsofkeywords.
2. Don’tattemptto‘cloak’pagesbywritingtextthatcanbeseenbysearch enginesbutnotbyusers.
3. Don’tputup‘crawleronly’pagesbysettinguppagesorlinkswhosesole purposeistofoolsearchengines.
4. Don’tuseimagestodisplayimportantnames,contentorlinks–search enginescan’t‘read’images.
5. Don’tcreatemultiplecopiesofapageunderdifferentURLswiththeintentofmisleadingsearchengines.
Whenindoubt,consultourwebmasterguidelines,availableat:google.com/webmasters/guidelines.html
Webmaster Central
Asacompanyaimingtoprovidethemostrelevantandusefulsearchresultsontheweb,westrivetoprovidescalableandequitablesupportforallwebmastersandallwebsites,nomatterhowlargeorsmall.That’swhywe’vecreatedWebmasterCentral,locatedatgoogle.com/webmasters.
WebmasterCentralisagreatresourceforallwebpublishers.Itcomprehensivelyanswerscrawling,indexing,andrankingquestions;providesandavenueforfeedbackandissues;andoffersdiagnostictoolsthathelpwebmastersdebugpotentialcrawlingproblems.
Here’satasteofwhatyoucanfindatWebmasterCentral.
• Diagnosepotentialproblemsinaccessingpagesandoffersolutions
• Requestremovalofspecificpagesfromourindex
• Ensureyourrobots.txtfileisallowingandblockingthepagesyouexpect
• Seequeryandpagestatisticsrelatedtoyourwebsite:
• Querystatistics:Determinewhichsearchqueriesbringyoursitethemostvisitors,andwhattopicscouldyoursiteexpandupontocapturemoretraffic?
• PageAnalysis:SeeyourwebpageasGoogleseesit.Viewthemostcommonwordsonyoursite,inboundlinkstothesite,andhowothersdescribeyoursitewhentheylinktoit.
• Crawlrate:SeehowfrequentlyyoursiteisbeingcrawledbytheGooglebotandtellGooglewhetheritshouldbecrawlingfasterorslower.
Sitemaps
WebmasterCentralalsoofferspublishersSitemapsforwebsearch,mobile,andnewsresults.
Sitemapsisaprotocolwesupportwithothersearchenginestohelpwebmastersgiveusmoreinformationabouttheirwebpages.SitemapscomplementsexistingstandardwebcrawlmechanismsandwebmasterscanuseittotellGoogleaboutthepagesoftheirsiteinordertoimprovethecrawlingandvisibilityoftheirpagesinGooglesearchresults.
InadditiontoWebSearchSitemaps,wealsoofferGoogleMobileSitemaps,whichenablespublisherstosubmitURLsthatservecontentformobiledevicesintoourmobileindex.
AndforthosepublisherswhosenewssiteisincludedinGoogleNews,NewsSitemapscanhelpprovidestatisticsaboutthepublisher’sarticles,fromqueriestofrequencyofappearance.UsedinconjunctionwithWebmasterCentraldiagnostictools,NewsSitemapscanalsoprovideerrorreportsthathelpexplainanyproblemsGoogleexperienceswhencrawlingorextractingnewsarticlesfromapublisher’ssite.Inaddition,apublishercansubmitaNewsSitemapcontainingURLsthatitwouldlikeconsideredforinclusioninGoogleNews.NewsSitemaps,unlikeWebandMobileSitemaps,iscurrentlyonlyavailableinEnglish,althoughwehopetosoonmakeitavailableinotherlanguagesaswell.
Frequently Asked Questions
Whycan’tyoudoone-on-onesupportformywebsite?Bysomeestimates,thereare100millionsitesontheweb.Eachofthosewebsitesisimportanttous,becausewithoutthem–nomatterhowbigorsmall–ourindexwouldbelesscomprehensive,andultimatelylessusefultoourusers.
WebmasterCentralisagreatsourceofsupportforalltypesofwebsites.Wepostandanswerpublishers’questionssothateveryonecanbenefitfromtheinformation.AtWebmasterCentralyoucanalsofindafriendlyandusefulcommunityofwebmasterswhocansharetipsandhelpwithtroubleshooting.
Dotheadsyoushowinfluenceyourrankings?Areyouradlistingsandsearchresultstotallyseparate?Adandsearchrankingsarenotrelatedinanyway,andinfactwehaveentirelyseparateteamstoworkonthemsothatthereisnointerference.Webelievethattheobjectivityofoursearchresultsiscrucialtoprovidingthebestexperienceforusers.
HowdoIaddasitetoGoogle’ssearchindex?InclusioninGoogle’ssearchresultsisfreeandeasyanddoesnotrequireamanualsubmissionofthesitetoGoogle.Googleisafullyautomatedsearchengine;itcrawlsthewebonaregularbasisandfindssitestoaddtoourindex.Infact,thevastmajorityofsiteslistedinourresultsaren’tmanuallysubmittedforinclusion,butfoundandaddedautomaticallywhenourspiderscrawltheweb.
Inaddition,GoogleWebmasterTools(atWebmasterCentral)provideaneasywayforwebmasterstosubmitasitemaportheirURLstotheGoogleindexandgetdetailedreportsaboutthevisibilityoftheirpagesonGoogle.WithGoogleWebmasterTools,siteownerscanautomaticallykeepGoogleinformedofallcurrentpagesandofanyupdatesmadetothosepages.
Howlong,onaverage,doesittakeforGoogletofindanewlycreatedwebsite,andhowfrequentlydoesGooglecrawlthewebingeneral?ThereisnosetamountoftimeittakesforGoogletofindanewsite.
TheGooglebotregularlycrawlsthewebtorebuildourindex.ByusingWebmasterCentral,awebmastercanseehowfrequentlyhissiteisbeingcrawledbytheGooglebotandtellGooglewhetheritshouldbecrawlingfasterorslower.
WhatifIwantmywebsitetoappearonwebsearchresults,butnotonseparateserviceslikeGoogleNewsorGoogleImageSearch?Googlealwaysallowswebpublisherstooptoutofitsservices,andapublishercancontactthesupportteamofaparticularproducttodoso.
Asdiscussedearlierinthisbooklet,theRobotsExclusionProtocolcanbeusedtoblockindexingofbothimagesandwebpages.The‘URLremoval’featureinWebmasterCentralcanalsobeusedforthispurposeandcoverswebsearchandimagesearch.
14 15
Inaddition,becausetheGooglebotreliesonseveraldifferentbots,youcantargetwhatyoublock:
•Googlebot:crawlpagesfromourwebindexandournewsindex
•Googlebot-Mobile:crawlspagesforourmobileindex
•Googlebot-Image:crawlspagesforourimageindex
•Mediapartners-Google:crawlspagestodetermineAdSensecontent.WeonlyusethisbottocrawlyoursiteifyoushowAdSenseadsonyoursite.
•Adsbot-Google:crawlspagestomeasureAdWordslandingpagequality.WeonlyusethisbotifyouuseGoogleAdWordstoadvertiseyoursite.
Forinstance,toblockGooglebotentirely,youcanusethefollowingsyntax:
User-agent:Googlebot
Disallow:/
CanIchoosewhattextIwantspecifiedasasnippet? No.Thisisn’tagoodidea,bothfromtheperspectiveoftheuserandthecontentcreator.Wechooseasnippetoftextfromthesitethatshowsthesearcher’squeryincontext,whichinturndemonstratestherelevanceoftheresult.
Studiesshowthatusersaremorelikelytovisitawebsiteifthesearchresultsshowthesnippet.Thisisbecausesnippetsmakeitmucheasierforuserstoseewhytheresultisrelevanttotheirquery.Ifausercan’tmakethisdeterminationquickly,heorsheusuallymovesontothenextsearchresult.
WebpublisherscanincludeametatagintheirpagesinordertoprovideGooglewithadditionalinputincaseswherewearen’tabletogenerateausefulsnippetfromthecontentonthepagealgorithmically.Todothis,simplyaddthefollowingtothe<head>sectionofthepage:
<metaname=”description”content=”Whydoesn’tAnya
likebunnies?We’reabouttofindout.”>
Anywebpublisherwhodoesn’twantasnippetgeneratedfromtheirpagescanusetheNOSNIPPETtag,asfollows:
<metaname=”robots”content=”nosnippet”>
Finally,wesometimesuseasite’sdescriptionfromtheOpenDirectoryProjectforthesearchresultsnippet.Ifyoudon’twantthisdescriptiontobeused,simplyaddthefollowingmetatag:
<metaname=”robots”content=”noodp”>
Breakingnewsarticlesonmysiteonlyappearforafewhoursbeforebeingupdatedandmovedtoastandardarticlessection.IwantthefullarticletoappearinGoogle’sindex,butnotthesebreakingnewsstories.Oneoptionistoputallthebreakingnewsarticlesinonedirectory,anduserobots.txttodisallowtheGooglebotaccesstothatdirectory.
AnotheroptionistoaddtheNOFOLLOWtagtothe<HEAD>sectionofthehtmlofyourbreakingnewssection.ThistellstheGooglebotnottofollowanylinksitfindsonthatpage.Keepinmind,though,thatNOFOLLOWonlystopstheGooglebotfromfollowinglinksfromonepagetoanother.Ifanotherwebpagelinkstothatarticle,Googlecanstillfindthearticlewhenitindexes.
IfIhavemultipledomainnamesandrunthesamecontentoffthesedifferentdomainswillIbekickedoutofyoursearchresults?Althoughsomepublishersmaytrytofoolsearchenginesbyduplicatingcontentandrunningmirrorsites,thereisalsolegitimatecontentthatmaybeduplicatedforgoodreasons.Googledoesn’twanttopenalisethosesites.Forexample,wedon’ttreatsimilarcontentexpressedindifferentlanguages(say,EnglishononesiteandFrenchinanother)asduplicatecontent.
Havingthesamecontentonmultiplewebsites(e.g.articlesyndication)won’tnecessarilyresultinoneorsomeofthosesitesbeingremovedfromthesearchresultsentirely.However,keepinmindthateachinstanceofthearticleislikelytoappearlowerintherankingsbecauseitonlyhasafractionoftheincominglinksthatasinglecopywould.Ingeneral,asinglecopyofanarticlewillrankhigherandthereforebeseenbymoreusersthanwillmultiplecopiesofthesamecontent.
Inaddition,inordertoensuresearchquality,Googledoesn’tincludemultiplecopiesofapageinoursearchresults.Rather,weoftenchoosejustoneversionofthepagetoshow.However,webmasterscanindicatetoGoogletheirpreferredversionbyusingrobots.txtorametatagtoblockanycopiestheydon’twantshowingupinoursearchresults.
WhyismysitebeingblockedfromtheGoogleindex? First,yoursitemaynotbeblocked.Therearemanyreasonswhyasitemaynotappearinoursearchresults(seepages5-11).
Ifyoursiteisn’tpresentinganyhurdlesfordiscoveryorindexation,thenitcouldbethatyoursiteisblocked.SitesmaybeblockedfromourindexbecausetheydonotmeetthequalitystandardsoutlinedinourWebmasterGuidelines(availableatWebmasterCentral).Thismostoftenhappenswhenawebsiteisusingunfairmethodstotrytoappearhigherinthesearchrankings.Commonguidelineviolationsincludecloaking(writingtextinsuchawaythatitcanbeseenbysearchenginesbutnotbyusers)orsettinguppages/linkswiththesolepurposeoffoolingsearchenginesandmanipulatingsearchengineresults.
Whenwebmasterssuspectthattheirsitesviolateourqualityguidelines,theycanmodifytheirsitetomeettheseguidelines,thenclickthe“requestre-inclusion”linkwithinourWebmasterToolsinterfacetoaskustore-evaluatethesite.
16 17
18 19
Glossary
Cache link AsnapshotofhowapageappearedthelasttimeGooglevisitedit.Acachedcopyallowsuserstoviewapageevenwhentheliveversionisunavailable,althoughthecontentmaydifferslightly.Toviewacachedcopy,clickonthe‘cached’linkthatappearsunderneathasearchresult.
Cloaking Showingsearchenginesdifferentcontentthanwhatyoushowusers.
CMS (Content Management System) Asoftwaresystemusedtomanagecontentfromcomputer,image,andaudiofilestowebcontent.
Crawler SoftwareusedtodiscoverandindexURLsontheweboranintranet.
Crawling Theprocessusedbysearchenginestocollectpagesfromtheweb.
Dynamic content Contentsuchasimages,animations,orvideoswhichrelyonFlash,JavaScript,frames,ordynamicallygeneratedURLs.
File extension Thenameofacomputerfile(.doc,.txt,.pdf,etc.)oftenusedtoindicatethetypeofdatastoredinthefile.
HTML (Hypertext Markup Language) Amark-uplanguageusedonthewebtostructuretext.
To index Theprocessofhavingyoursite’scontentaddedtoasearchengine.
Keyword Awordthatisenteredintothesearchboxofasearchengine.Thesearchenginethenlooksforpagesthatincludethewordorphrase.
Meta tags AtagintheHTMLthatdescribesthecontentofawebpage.Metatagscanbeusedtocontrolindexingofindividualpagesinawebsite.
Mirror site Aduplicatewebpage;sometimesusedtofoolsearchenginesandtrytooptimiseindexationandwebrankingsofawebsite.
20
Page Rank AGooglefeaturethathelpsdeterminetherankofasiteinoursearchresults.PageRankreliesontheuniquelydemocraticnatureofthewebbyusingitsvastlinkstructureasanindicatorofanindividualpage’svalue.Important,high-qualitysitesreceiveahigherPageRank,whichGooglerememberseachtimeitconductsasearch.GooglecombinesPageRankwithsophisticatedtext-matchingtechniquestofindpagesthatarebothimportantandrelevanttoyoursearches.
Robots Exclusion protocol Atechnicalspecificationthattellssearchengineswhichsiteorpartsofasiteshouldbenon-searchable,andwhichpartsshouldremainvisibleinthesearchresults.
Robots.txt Atextfilethatallowsawebpublishertocontrolaccesstotheirsiteatmultiplelevels,fromtheentiresitetoindividualdirectories,pagesofaspecifictype,orevenindividualpages.Thisfiletellscrawlerswhichdirectoriescanorcannotbecrawled.
Root directory Thetoporcoredirectoryinacomputerfilesystem.
URL (Uniform Resource Locator) TheaddressofawebsiteontheInternet,consistingoftheaccessprotocol(http),domainname(www.google.com),andinsomecasesthelocationofanotherfile(www.google.com/webmaster).