Practical deep dive into the semantic web - #smconnect
TRANSCRIPT
International Freelance SEO
What is the Semantic Web?
"The Semantic Web is a collaborative movement led by international standards body the World Wide Web Consortium (W3C). The standard promotes common data formats on the World Wide Web"
"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries"
Why are Google and other online giants interested?
So…what is the main reason?
[Chart: percentage per region (North America, South America, Europe, Asia, Africa, Oceania, Global), 2014 average versus 2015 to date]
So how does the
How about those future
So…
International Freelance SEO
SEO Consultant, Metapeople / Netbooster Group
Brand Ambassador, Majestic
Cycling & Skating
Science: Physics in particular
1. Make data available
2. Use specific markup languages
3. Data is available for everyone
―The Open Graph protocol enables any web
page to become a rich object in a social
graph. For instance, this is used on
Facebook to allow any web page to have
the same functionality as any other object
on Facebook.‖
Use: https://developers.facebook.com/docs/opengraph/
Use: https://cards-dev.twitter.com/validator
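To see which Open Graph and Twitter Card values a page actually exposes, the tags can be read straight from the HTML. A minimal sketch using Python's standard-library `html.parser`; the names `SocialMetaParser` and `social_tags` are hypothetical, and it assumes the common conventions of `property="og:..."` for Open Graph and `name="twitter:..."` for Twitter Cards. The validators linked above remain the authoritative check.

```python
from html.parser import HTMLParser


class SocialMetaParser(HTMLParser):
    """Collects Open Graph (og:*) and Twitter Card (twitter:*) meta tags."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Open Graph uses property="og:...", Twitter Cards usually name="twitter:..."
        key = attributes.get("property") or attributes.get("name") or ""
        if key.startswith(("og:", "twitter:")) and attributes.get("content") is not None:
            # Keep the first occurrence if a tag is repeated
            self.tags.setdefault(key, attributes["content"])


def social_tags(html):
    parser = SocialMetaParser()
    parser.feed(html)
    parser.close()
    return parser.tags


page = """<head>
<meta property="og:title" content="Practical deep dive into the semantic web" />
<meta property="og:type" content="article" />
<meta name="twitter:card" content="summary" />
<meta name="description" content="Not a social tag" />
</head>"""
print(social_tags(page))
# -> {'og:title': 'Practical deep dive into the semantic web', 'og:type': 'article', 'twitter:card': 'summary'}
```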
1. Schema.org microdata
2. Open Graph protocol
3. Title + meta description element
4. Best guess from page content
Use: https://developers.google.com/+/web/snippet/
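The priority order above behaves like a simple fallback chain. A minimal sketch, assuming the page's markup has already been extracted into a plain dictionary (the keys `schema`, `og`, `title`, `meta_description` and `body_text` are hypothetical), with an arbitrary first-line/150-character cut-off standing in for the unknown "best guess" logic:

```python
def pick_snippet(page):
    """Return (source, title, description) following the priority order above."""
    # 1. Schema.org microdata
    schema = page.get("schema", {})
    if schema.get("name"):
        return "schema.org", schema["name"], schema.get("description", "")
    # 2. Open Graph protocol
    og = page.get("og", {})
    if og.get("og:title"):
        return "opengraph", og["og:title"], og.get("og:description", "")
    # 3. <title> element + meta description
    if page.get("title"):
        return "title+meta", page["title"], page.get("meta_description", "")
    # 4. Best guess from page content (assumption: first line as title, first 150 chars as text)
    text = page.get("body_text", "").strip()
    first_line = text.split("\n", 1)[0]
    return "best_guess", first_line[:70], text[:150]


print(pick_snippet({"og": {"og:title": "OG title"}, "title": "HTML title"}))
# -> ('opengraph', 'OG title', '')
```

Because Open Graph outranks the `<title>` element in this chain, a page with a stale `og:title` would surface that stale text even when its title tag is up to date.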
Use: https://wordpress.org/plugins/wordpress-seo/
Used Amazon EC2 to set up a crawler and crawl
the top 1,000,000 Alexa URLs
Checked for occurrences of:
–Microdata / Schema
–OpenGraph
–Twitter Cards
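A crawler does not need a full parser just to flag which of the three formats a page uses. The sketch below is an assumption about how such a check could look, not the setup actually used for this crawl: it runs cheap case-insensitive regular expressions over the raw HTML, so it can over-count (for example, markup inside HTML comments) compared with DOM-based detection.

```python
import re

# Cheap raw-HTML checks; these patterns are illustrative assumptions, not a spec.
PATTERNS = {
    "microdata": re.compile(r"\bitem(?:scope|prop|type)\b", re.IGNORECASE),
    "opengraph": re.compile(r"""property\s*=\s*["']og:""", re.IGNORECASE),
    "twitter_cards": re.compile(r"""name\s*=\s*["']twitter:""", re.IGNORECASE),
}


def detect_markup(html):
    """Return {format: True/False} for each structured-data format."""
    return {name: bool(pattern.search(html)) for name, pattern in PATTERNS.items()}


sample = '<meta property="og:title" content="Demo"><div itemscope itemtype="http://schema.org/Person">'
print(detect_markup(sample))
# -> {'microdata': True, 'opengraph': True, 'twitter_cards': False}
```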
- Crawled at 360 URLs/sec
- 68.4 GB of data used
- 68% (683,267 URLs) returned 200 OK
- 27% returned 30x redirects
- 3% of domains had DNS issues
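Percentages like these come from bucketing each fetch outcome and dividing by the total number of URLs (for example, 683,267 of 1,000,000 is about 68%). A minimal sketch of that tally; representing a DNS failure as the string `"dns_error"` is an assumption of this example, and the input below is toy data, not the crawl results.

```python
from collections import Counter


def bucket(outcome):
    """Map an HTTP status code to its class (2xx, 3xx, ...); pass DNS failures through."""
    if outcome == "dns_error":
        return "dns_error"
    return f"{outcome // 100}xx"


def status_breakdown(outcomes):
    """Percentage of URLs per outcome bucket, rounded to two decimals."""
    counts = Counter(bucket(o) for o in outcomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Toy input: two 200 OK responses, one 301 redirect, one DNS failure
print(status_breakdown([200, 200, 301, "dns_error"]))
# -> {'2xx': 50.0, '3xx': 25.0, 'dns_error': 25.0}
```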
Occurrence per markup element (based on 683k of the top million Alexa URLs):
• OpenGraph title: 15.84%
• OpenGraph URL: 14.55%
• Twitter:title: 1.59%
• Twitter:url: 1.32%
• Schema itemprop: 7.27%
• Schema itemprop name: 2.69%
• AggregateRating: 0.22%
Commercial tool: http://www.builtwith.com
se·man·tics [si-man-tiks]
noun
the branch of linguistics that deals with the
study of meaning, changes in meaning,
and the principles that govern the
relationship between sentences or words
and their meanings
"Microdata is a set of tags, introduced with HTML5, that allows you to do this."
• Is separated from the HTML
• This gives more flexibility and scalability options
• Used in more software, like the washing machine I showed earlier
• But… Google hasn’t integrated everything yet
<div itemscope itemtype="http://data-vocabulary.org/Review-aggregate">
<span itemprop="itemreviewed">Several German beers</span>
<img itemprop="photo" src="beer.jpg" />
<span itemprop="rating" itemscope itemtype="http://data-vocabulary.org/Rating">
<span itemprop="average">9</span>
<span itemprop="best">10</span>
</span>
<span itemprop="votes">24</span>
<span itemprop="count">5</span>
</div>
<div itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Jan-Willem</span>
<img src="janwillem.jpg" itemprop="image" />
<span itemprop="jobTitle">International SEO</span>
<div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality">Amsterdam</span>,
<span itemprop="addressRegion">- Europe</span>
<span itemprop="postalCode">9999XX</span>
</div>
</div>
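Both snippets above encode nested items: a property can hold plain text, a URL (for `img`, `a`, `link`), or another item (`itemscope` plus `itemprop`, like `address`). To make that nesting visible, here is a minimal, simplified microdata reader built on Python's `html.parser`. It is a sketch under stated assumptions: the class name `MicrodataParser` is hypothetical, it ignores `itemref` and `itemid`, and it is not a complete implementation of the HTML microdata algorithm.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "meta", "link", "br", "hr", "input", "source"}
# Elements whose itemprop value comes from an attribute instead of the text
VALUE_ATTRS = {"img": "src", "a": "href", "link": "href", "meta": "content"}


class MicrodataParser(HTMLParser):
    """Simplified microdata reader: builds nested dicts from itemscope/itemprop."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items (itemscope without itemprop)
        self.scopes = []  # stack of currently open items
        self.stack = []   # open elements: (tag, opened_scope, text_prop, text_parts)

    def _add(self, prop, value):
        if self.scopes:
            self.scopes[-1]["properties"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        opened = False
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
            if prop:
                self._add(prop, item)  # nested item, e.g. address
            else:
                self.items.append(item)
            self.scopes.append(item)
            opened, prop = True, None
        elif prop and tag in VALUE_ATTRS:
            self._add(prop, a.get(VALUE_ATTRS[tag], ""))
            prop = None
        if tag in VOID_TAGS:
            if opened:
                self.scopes.pop()  # a void element cannot contain properties
            return
        self.stack.append((tag, opened, prop, []))

    def handle_data(self, data):
        # Text belongs to the nearest text-collecting element inside the current item
        for _tag, opened, prop, parts in reversed(self.stack):
            if prop:
                parts.append(data)
                break
            if opened:
                break

    def handle_endtag(self, tag):
        if not any(frame[0] == tag for frame in self.stack):
            return  # stray end tag, or the end of a void element
        while self.stack:
            open_tag, opened, prop, parts = self.stack.pop()
            if opened:
                self.scopes.pop()
            if prop:
                self._add(prop, "".join(parts).strip())
            if open_tag == tag:
                break


html = """<div itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Jan-Willem</span>
<img src="janwillem.jpg" itemprop="image" />
<div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality">Amsterdam</span>
</div>
</div>"""
parser = MicrodataParser()
parser.feed(html)
parser.close()
person = parser.items[0]
print(person["properties"]["name"], person["properties"]["address"][0]["properties"])
# -> ['Jan-Willem'] {'addressLocality': ['Amsterdam']}
```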
1. Products
2. Product offer
3. Product aggregated offer
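For products, the difference between a single offer and an aggregated offer is mostly which schema.org type wraps the price. A sketch in JSON-LD (rather than microdata), generated with Python's `json` module; the helper name `product_jsonld` and the EUR default are assumptions for illustration, while `Offer` with `price`/`priceCurrency` and `AggregateOffer` with `lowPrice`/`highPrice`/`offerCount` are schema.org properties.

```python
import json


def product_jsonld(name, prices, currency="EUR"):
    """One price -> Offer; several prices -> AggregateOffer (low/high/count)."""
    if not prices:
        raise ValueError("at least one price is required")
    data = {"@context": "https://schema.org", "@type": "Product", "name": name}
    if len(prices) == 1:
        data["offers"] = {
            "@type": "Offer",
            "price": f"{prices[0]:.2f}",
            "priceCurrency": currency,
        }
    else:
        data["offers"] = {
            "@type": "AggregateOffer",
            "lowPrice": f"{min(prices):.2f}",
            "highPrice": f"{max(prices):.2f}",
            "offerCount": len(prices),
            "priceCurrency": currency,
        }
    return json.dumps(data, indent=2)


print(product_jsonld("Several German beers", [2.5, 4.0, 3.0]))
```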
Create multiple links to relevant pages within
1 entry in the SERPs.
• https://developers.google.com/structured-data/rich-snippets/
• Schema Creator by Raven http://schema-creator.org/
• Schema.org Generator http://www.microdatagenerator.com/
• Rich Snippets Testing Tool Bookmarklet: http://www.blindfiveyearold.com/rich-snippets-testing-tool-bookmarklet
• Everything you need to know to generate rich snippets: http://seogadget.com/micro-data-schema-org-guide-to-generating-rich-snippets/
1. You have specific data points available
2. Search engines accept the specific markup language
3. Search engines accept certain snippets
4. Information within the SERPs is correct
• Implement the code and check it with the search engines' testing tools:
https://developers.google.com/structured-data/testing-tool/?hl=it
• Make sure all items are structured and nested in the correct way.
• Google's testing tool only shows errors based on missing elements, not on wrong coding!
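A testing tool that only reports missing elements will accept a rating nested under the wrong type or falling outside its own scale, so a small extra check can catch some of that wrong coding. The sketch assumes the markup has already been parsed into a dictionary (as JSON-LD would be); `check_rating` is a hypothetical helper, and the defaults of 5 for `bestRating` and 1 for `worstRating` follow the schema.org definition of `Rating`.

```python
def check_rating(item):
    """Return a list of problems with an item's aggregateRating (empty if none found)."""
    problems = []
    rating = item.get("aggregateRating")
    if rating is None:
        return problems
    if rating.get("@type") != "AggregateRating":
        problems.append("aggregateRating should be nested as an AggregateRating item")
    try:
        value = float(rating.get("ratingValue"))
        best = float(rating.get("bestRating", 5))   # schema.org default
        worst = float(rating.get("worstRating", 1))  # schema.org default
    except (TypeError, ValueError):
        problems.append("ratingValue, bestRating or worstRating is not a number")
        return problems
    if not worst <= value <= best:
        problems.append(f"ratingValue {value} is outside {worst}-{best}")
    return problems


# A 9 out of 10 rating that forgot bestRating and uses the wrong nested type:
# both the type and the implied 1-5 scale are reported.
print(check_rating({"aggregateRating": {"@type": "Rating", "ratingValue": "9"}}))
```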
https://plus.google.com/communities/103048251221048356778
"Google doesn’t use markup for ranking purposes at this time—but rich snippets can make your web pages appear more prominently in search results, so you may see an increase in traffic."
Source: https://support.google.com/webmasters/answer/1211158?hl=en
https://support.google.com/webmasters/contact/rich_snippets_spam
Top 10 research fields per number of publications:
• Artificial Intelligence and Machine Learning: 406
• Algorithms and Theory: 368
• Human-Computer Interaction and Visualization: 288
• Natural Language Processing: 248
• Machine Perception: 228
• Information Retrieval and the Web: 182
• Security, Cryptography, and Privacy: 177
• Data Mining: 148
• Software Systems: 135
What happened during the past 8 years?
2007 2010 2015
From a database to search engine result pages
Now… Let’s be honest
Basic information retrieval
Freebase only has about 200 attributes for the class Country
?
http://arxiv.org/pdf/1503.00759.pdf
http://research.google.com/pubs/pub41894.html
Four different methods to extract triples from web content
1. Text: natural language processing tools; entity recognition; entity linkage; entity verification against Freebase
2. Document Object Model (DOM): either text or database-driven "deep web" sources; think of querying HTML forms
3. Tables: 570M tables on the web; relations are difficult to extract; schema matching methods; entity verification against Freebase
4. Schema.org: mostly people related; products & events are not stored; mapping Schema.org to Freebase for predicates
Source: https://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf
Researchers deal with "duplicate content" as being just one source
Exploring the power of tables on the Web
https://research.google.com/tables
The papers share some insights about the factors relevant to Google Tables results
Sources of data Google uses according to the paper
• Optimise the surrounding content with relevant captions and texts.
• Use <th> table headings to add labels to specific columns.
• Add relevant attributes to your table headings, focusing on the queries used.
• Only add useful content to the table. Boilerplate content is filtered out.
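The `<th>` advice matters because a table extractor needs labels to turn cells into attribute/value pairs. A minimal sketch of that idea with Python's `html.parser`, assuming a single simple table with one header row and no `colspan`/`rowspan`; `table_to_records` is a hypothetical helper, not the method from the paper, and the sample table is illustrative.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects <th> labels and <td> rows of a simple table."""

    def __init__(self):
        super().__init__()
        self.headers, self.rows = [], []
        self._cell = None  # text parts of the cell being read
        self._row = None   # cells of the row being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            if tag == "th":
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header-only rows produce no data row
                self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.headers:
        return []  # without <th> labels the columns have no names
    return [dict(zip(parser.headers, row)) for row in parser.rows]


html = """<table>
<caption>Capitals</caption>
<tr><th>Country</th><th>Capital</th></tr>
<tr><td>Netherlands</td><td>Amsterdam</td></tr>
<tr><td>Italy</td><td>Rome</td></tr>
</table>"""
print(table_to_records(html))
# -> [{'Country': 'Netherlands', 'Capital': 'Amsterdam'}, {'Country': 'Italy', 'Capital': 'Rome'}]
```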
http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper3.pdf
"Extraction errors are far more prevalent than source errors. Ignoring this distinction can cause us to incorrectly distrust a website"
Back to the basics for Google (and probably the other search engines too)
Links still tell something about
relationships between pages but also
between entities.
Simply search in the indices you already have. In the case of Google, they already have "everything".
Simply gather user feedback from within
the search results.
Source: https://twitter.com/brentnau
One in 20 searches is health related according to Google.
• Use web-based fact extraction, like DOM, tables and annotated data (Schema.org)
• Text-based extractors adding more triples to the datasets
• Systems like those described in the Biperpedia paper: data is enriched and quality control takes place. Use partnerships for trusted resources.
• Use existing datasets like Freebase / Wikidata to verify extracted data and calculate probability
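Combining several noisy extractors and checking the result against an existing dataset can be illustrated with a deliberately simple scoring rule. This sketch uses a noisy-OR combination, which assumes the extractors err independently; it is a swapped-in stand-in, not the learned fusion model described in the Knowledge Vault paper, and the triples, confidences and the `kb_confidence` value are made up for illustration.

```python
from math import prod


def noisy_or(confidences):
    """Probability that at least one independent source is right."""
    return 1 - prod(1 - c for c in confidences)


def score_triples(extractions, kb, kb_confidence=0.9):
    """extractions: {(subject, predicate, object): [confidence per extractor]}
    kb: set of triples from a trusted dataset (e.g. a Freebase/Wikidata export).
    A triple confirmed by the dataset gets the dataset as one extra source."""
    scores = {}
    for triple, confidences in extractions.items():
        evidence = list(confidences)
        if triple in kb:
            evidence.append(kb_confidence)
        scores[triple] = noisy_or(evidence)
    return scores


confirmed = ("Amsterdam", "capital_of", "Netherlands")
unconfirmed = ("Amsterdam", "capital_of", "Belgium")
scores = score_triples({confirmed: [0.5, 0.5], unconfirmed: [0.5]}, kb={confirmed})
print({t[2]: round(p, 3) for t, p in scores.items()})
# -> {'Netherlands': 0.975, 'Belgium': 0.5}
```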
Make sure you understand
A few possibilities to influence the content of brand cards
The main source is still Wikipedia; always back up your edits with sources
You are able to give Google hints about your logo, corporate contacts and social profiles
Add schema.org Organization markup to your official website
Find example JSON-LD at
https://developers.google.com/structured-data/customize/overview
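A minimal sketch of generating that Organization markup with Python's `json` module, using placeholder values (example.com, the phone number and the social profile URLs are hypothetical). `logo`, `contactPoint` with `contactType`, and `sameAs` for social profiles are schema.org properties; check the page linked above for Google's current requirements.

```python
import json


def organization_jsonld(name, url, logo, phone, profiles):
    """Build a <script> tag with schema.org Organization markup."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,  # hint for the logo shown in the brand card
        "contactPoint": [{
            "@type": "ContactPoint",
            "telephone": phone,
            "contactType": "customer service",
        }],
        "sameAs": profiles,  # official social profiles
    }
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


print(organization_jsonld(
    name="Example Brand",
    url="https://www.example.com/",
    logo="https://www.example.com/logo.png",
    phone="+31-20-000-0000",
    profiles=["https://twitter.com/example", "https://www.facebook.com/example"],
))
```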
What about the localised Google search indices?
?
Contains the main subject of the required answer
Within the content, the question is answered in a single sentence
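One way to check the single-sentence criterion on your own content is to look for one sentence that mentions every key term of the question. This is a rough, hypothetical heuristic (the `answer_sentence` helper and the naive sentence splitter are assumptions), not how Google selects Direct Answers.

```python
import re


def answer_sentence(text, subject_terms):
    """Return the first sentence that mentions every subject term, or None."""
    # Naive split after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    terms = [term.lower() for term in subject_terms]
    for sentence in sentences:
        lowered = sentence.lower()
        if all(term in lowered for term in terms):
            return sentence
    return None


content = ("Euro NCAP tests new cars sold in Europe. It awards up to five stars. "
           "In the US, the NHTSA publishes car safety ratings.")
print(answer_sentence(content, ["NHTSA", "safety"]))
# -> In the US, the NHTSA publishes car safety ratings.
```

The terms "Euro NCAP" and "stars" appear only in separate sentences, so `answer_sentence(content, ["Euro NCAP", "stars"])` returns `None`: the answer exists on the page but not in a single sentence.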
No, Euro NCAP is more authoritative in the EU for car safety levels, NHTSA for the US.
Two indices, two truths?
So how can we make use of this for our brand?
Since not many are focusing on getting into the Direct Answers yet, grab the positions first!
95% of the cases had increased traffic - including movements within top 10 normal blue links.
Less than expected, probably because of the quality of the answer: results between -5% and +6% traffic.
Results varied between -3% and +11% depending on the previous position in the SERPs.
These were performing the best, with increases between 6% and 14%.
Depending on the topic (complicated topics tend to get more clicks): average results between -2% and +16%.