Data Scraping with Excel - Campixx 2013 - Maik Schmidt

17.03.2013 – Berlin - SEO Campixx

Data Scraping with Excel – by Maik Schmidt

Who I am

• Maik Schmidt
• SEO Consultant at Catbird Seat (2010)
• Winner of the SEO contest "KubaSEOTräume"

@chillboyy

facebook.com/chillboy.de

xing.com/profile/Maik_Schmidt11

http://www.catbirdseat.de/

http://www.seocontest.de/kubaseotraeume/siegerehrung/


What are we scraping today?

• Standard KPIs
• Malware Checker
• Index Checker
• Google SERPs
• Google Suggest

Why Excel?

• Because I can't program

Drawbacks:
• Slow
• Limited data volumes


What do I need?

• Excel
• Niels Bosma SEO Tools for Excel

http://nielsbosma.se/projects/seotools/


Niels Bosma SEO Tools (1/4)

Onpage
• LinkCount
• HtmlTitle
• HtmlMetaDescription
• HtmlMetaKeywords
• HtmlMeta
• HtmlFirst
• HtmlH1
• HtmlH2
• HtmlH3
• HtmlCanonical
• W3CValidate
• PageCodeToTextRatio
• PageSize
• PageTextSize
• PageCodeSize
• HttpStatus
• HttpHeader
• ResponseTime
• PageEncoding
• IsFoundOnPage

Content
• FindDuplicatedContent
• CountWords
• LCS
• SpinText

Backlinks
• CheckBacklink
• GooglePageRank
• GoogleResultCount
• GoogleIndexCount
• GoogleLinkCount
• AlexaReach
• AlexaPopularity
• AlexaLinkCount
• DmozEntries
• WikipediaLinks

Social
• FacebookLikes
• GooglePlusCount
• TwitterCount

and many more.


SEOlytics
• Backlinks
• SVR (visibility)
• Keyword rankings
• Domain metrics
• LinkCount/URL
• Link history

MajesticSEO
• Largest backlink DB
• Fresh Index
• Historic Index
• Trust/Citation Flow

Google Analytics
• Similar to:

http://ga-dev-tools.appspot.com/explorer/

• =GoogleAnalytics(string id, string metrics, string startDate, string endDate, [string dimensions, string segment, string filter, string sort, integer startIndex, integer maxResults, bool excludeHeaderInResult, bool excludeDimensionsInResult]) : {string}


XPath Basics

XPath lets you address specific parts of an XML document.

Examples:
• Document root node: /
• Direct child element: XML_element_name
• Direct child of the root node: /XML_element_name
• Child of a child: XML_element_name/XML_element_name
• Descendant of the root: //XML_element_name
• Descendant of a node: XML_element_name//XML_element_name
• Parent of a node: ../
• A far cousin of a node: ../../XML_element_name/XML_element_name
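To try these path expressions outside Excel, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document is made up for illustration, and ElementTree supports only a subset of XPath (no leading "/" on an element), so the paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, for illustration only.
DOC = """<html>
  <body>
    <div>
      <h3>First</h3>
      <h3>Second</h3>
    </div>
    <p>Text</p>
  </body>
</html>"""

root = ET.fromstring(DOC)  # the root element, <html>

# Direct child element: body
body = root.find("body")

# Child of a child: body/div
div = root.find("body/div")

# Descendant of a node: .//h3 (any depth below the root element)
headings = [h3.text for h3 in root.findall(".//h3")]

# Parent of a node: .//h3/.. (the element that contains the h3 tags)
parents = [el.tag for el in root.findall(".//h3/..")]

print(body.tag, div.tag, headings, parents)
```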

/html/body/div/div/div/h3[position()=1]

Selects the content of the first H3 tag on this path. Used to scrape Sichtbarkeitsindex.de.

=XPathOnUrl([URL];"//h3[@class='x']/a";"href")

Returns the href of every link inside H3 tags with the class "x". Used to scrape Google SERPs.
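The two kinds of predicate used here, position and attribute, can be sketched in Python with xml.etree.ElementTree. The markup is invented for illustration, and ElementTree writes the position predicate as [1] rather than [position()=1].

```python
import xml.etree.ElementTree as ET

# Made-up markup shaped like a result list, for illustration only.
DOC = """<html><body>
  <div>
    <h3 class="x"><a href="http://example.com/a">A</a></h3>
    <h3 class="y"><a href="http://example.com/b">B</a></h3>
    <h3 class="x"><a href="http://example.com/c">C</a></h3>
  </div>
</body></html>"""

root = ET.fromstring(DOC)

# Position predicate: the first h3 below body/div.
first_h3 = root.find("body/div/h3[1]")
first_text = "".join(first_h3.itertext())

# Attribute predicate, then read the "href" attribute of each matching link.
hrefs = [a.get("href") for a in root.findall(".//h3[@class='x']/a")]

print(first_text, hrefs)
```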

Finding the XPath the easy way

With the Firefox plugin Firebug (and FirePath), the XPath of an element can be found quickly and easily.

Standard KPIs

Sources:
• Free SI: Sichtbarkeitsindex.de/deinedomain.de
• SI API: http://api.sistrix.net/domain.sichtbarkeitsindex?api_key=xy&domain=deinewebseite.de
• Alexa Rank: http://www.alexa.com/siteinfo/deinedomain.de

=XPathOnUrl([URL];"/html/body/div/div/div/h3[position()=1]")

=XPathOnUrl([SI API URL];"response/answer/sichtbarkeitsindex";"value")

=XPathOnUrl([Alexa URL];"//table[@id='siteStats']/tbody/tr[1]/td[2]/div")
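As a rough illustration of what the SI API formula does, the following Python sketch reads a "value" attribute from the path response/answer/sichtbarkeitsindex. The sample XML is an assumption derived only from that path; the real API response may contain more fields or differ in detail, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed response shape, reconstructed from the XPath in the formula.
SAMPLE = """<response>
  <answer>
    <sichtbarkeitsindex domain="deinewebseite.de" value="1.2345"/>
  </answer>
</response>"""


def visibility_index(xml_text: str) -> float:
    """Return the 'value' attribute of response/answer/sichtbarkeitsindex."""
    root = ET.fromstring(xml_text)  # the root element is <response>
    node = root.find("answer/sichtbarkeitsindex")
    if node is None or node.get("value") is None:
        raise ValueError("no sichtbarkeitsindex value in response")
    return float(node.get("value"))


print(visibility_index(SAMPLE))
```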

Google Safe Browsing API

Source: http://safebrowsing.clients.google.com/safebrowsing/diagnostic?site=domain.de

=UrlProperty([URL];"domain")

=XPathOnUrl([Google SafeBrowsing URL]; "/html/body/center/div/div/blockquote/p[position()=1]")
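The first step, reducing a URL to its domain and appending it to the diagnostic URL, might look like this in Python. This is only a sketch: the helper names are hypothetical, and whether UrlProperty strips a leading "www." is not stated in the slides, so the host is kept as-is here.

```python
from urllib.parse import urlsplit

# Diagnostic endpoint as given in the source line of this slide.
DIAGNOSTIC = "http://safebrowsing.clients.google.com/safebrowsing/diagnostic?site="


def domain_of(url: str) -> str:
    """Return the host part of a URL (assumption: no 'www.' stripping)."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in {url!r}")
    return host


def diagnostic_url(url: str) -> str:
    """Build the Safe Browsing diagnostic URL for the URL's domain."""
    return DIAGNOSTIC + domain_of(url)


print(diagnostic_url("http://www.domain.de/page?x=1"))
```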

Index Checker

Source: http://www.google.de/search?gcx=c&sourceid=chrome&ie=UTF-8&pws=0&q=info:deinewebseite.de

=HttpStatus([USER URL])

=WENN(ISTFEHLER(IDENTISCH(TEIL(XPathOnUrl("http://www.google.de/search?gcx=c&sourceid=chrome&ie=UTF-8&q="&("info:"&(A2))&"&pws=0";"//li[@class='g']//h3[@class='r']//a";"href");8;LÄNGE(A2));A2));"not indexed";"indexed")

(German Excel function names: WENN = IF, ISTFEHLER = ISERROR, IDENTISCH = EXACT, TEIL = MID, LÄNGE = LEN.)

=WENN(HtmlCanonical(A2)=A2;"self canonical";HtmlCanonical(A2))
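The string handling inside these formulas can be sketched in Python. The function names are hypothetical. Note one deliberate difference: the Excel formula branches on ISTFEHLER, whereas this sketch branches on the comparison itself, which is my reading of the intent rather than the author's exact method.

```python
PREFIX = "/url?q="  # 7 characters, which is why TEIL(...) starts at position 8


def index_status(first_result_href: str, url: str) -> str:
    """Compare the first 'info:' result with the URL we asked about."""
    if not first_result_href:
        return "not indexed"
    start = len(PREFIX)
    # Equivalent of TEIL(href; 8; LÄNGE(url)): same length as the URL.
    candidate = first_result_href[start:start + len(url)]
    return "indexed" if candidate == url else "not indexed"


def canonical_status(url: str, canonical: str) -> str:
    """Mirror of WENN(HtmlCanonical(A2)=A2; "self canonical"; HtmlCanonical(A2))."""
    return "self canonical" if canonical == url else canonical


print(index_status("/url?q=http://example.com/&sa=U", "http://example.com/"))
```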

Scraping Google Suggest

• Source: http://google.de/complete/search?output=toolbar&hl=de&q=

• Scrapes the keyword, with and without a trailing space, followed by one letter

• Array formula to scrape the 10 results per query

• Second iteration over the top 10

More than 600 suggested keywords!
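The query expansion described above (keyword, optional space, one letter) can be sketched in Python as follows. It runs offline: it only builds the request URLs and parses a made-up sample response. The response shape (toplevel / CompleteSuggestion / suggestion with a "data" attribute) is an assumption about the toolbar output format, and the function names are hypothetical.

```python
import string
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Endpoint as given in the source line of this slide.
ENDPOINT = "http://google.de/complete/search?output=toolbar&hl=de&q="


def suggest_urls(keyword: str) -> list[str]:
    """One URL per variant: keyword + letter and keyword + space + letter."""
    urls = []
    for separator in ("", " "):
        for letter in string.ascii_lowercase:
            urls.append(ENDPOINT + quote(keyword + separator + letter))
    return urls


# Made-up sample in the assumed toolbar XML shape.
SAMPLE = """<toplevel>
  <CompleteSuggestion><suggestion data="seo agentur"/></CompleteSuggestion>
  <CompleteSuggestion><suggestion data="seo analyse"/></CompleteSuggestion>
</toplevel>"""


def parse_suggestions(xml_text: str) -> list[str]:
    """Collect the 'data' attribute of every suggestion element."""
    root = ET.fromstring(xml_text)
    return [s.get("data") for s in root.findall("CompleteSuggestion/suggestion")]


print(len(suggest_urls("seo")), parse_suggestions(SAMPLE))
```

Feeding each returned suggestion back into suggest_urls would correspond to the second iteration mentioned on the slide.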

Scraping Google SERPs

Source:

http://www.google.de/search?q=deinkeyword&num=100&start=0&pws=0

Formula:

=XPathOnUrl([URL];"(//h3[@class='r']/a)["&A1&"]";"href")

Result:

/url?q=http://de.wikipedia.org/wiki/Suchmaschinenoptimierung&sa=U&ei=bTU2UP6sPMfNsgbAnoHYBQ&ved=0CB0QFjAA&usg=AFQjCNHwx6lcRxVC0-eBeDJ6GgHBiHGtFQ

Clean-up formulas:

=RECHTS(B1;LÄNGE(B1)-7)

=RECHTS(C1;LÄNGE(C1)-SUCHEN("&amp";C1))
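The goal of the clean-up step is to reduce the scraped href to the plain target URL. Outside Excel, one way to sketch this is to read the "q" parameter with Python's urllib.parse instead of cutting the string by position. This is a swapped-in technique, not the formulas from the slide, and the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# The example result shown on the slide.
HREF = (
    "/url?q=http://de.wikipedia.org/wiki/Suchmaschinenoptimierung"
    "&sa=U&ei=bTU2UP6sPMfNsgbAnoHYBQ&ved=0CB0QFjAA"
    "&usg=AFQjCNHwx6lcRxVC0-eBeDJ6GgHBiHGtFQ"
)


def target_url(href: str) -> str:
    """Return the value of the 'q' parameter of a '/url?q=...' result link."""
    values = parse_qs(urlsplit(href).query).get("q")
    if not values:
        raise ValueError(f"no q parameter in {href!r}")
    return values[0]


print(target_url(HREF))
```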


What else?

Analytics for Twitter by Microsoft

& Power Pivot

http://www.microsoft.com/en-us/download/details.aspx?id=26213




http://www.microsoft.com/de-de/download/details.aspx?id=7609

The End

Be creative!

The Excel files shown live will be available for download on the blog at www.catbirdseat.de.

With the examples and tools shown, you can in theory scrape any website and process the data in Excel.

http://www.catbirdseat.de/
