Practical deep dive into the semantic web - #smconnect

Post on 11-Apr-2017

International Freelance SEO

What is the Semantic Web?

“The Semantic Web is a collaborative movement led by international standards body the World Wide Web Consortium (W3C). The standard promotes common data formats on the World Wide Web”

“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries”

Why are Google and other online giants interested

So…what is the main reason?

[Chart: values per region (North America, South America, Europe, Asia, Africa, Oceania, Global), 2014 average versus 2015 to date. Data labels as extracted: 36%, 24%, 29%, 46%, 42%, 36%, 37%.]

So how does the

How about those future

So…

International Freelance SEO

SEO Consultant Metapeople / Netbooster Group

Brand Ambassador Majestic

Cycling & Skating

Science: Physics in particular

1. Make data available

2. Use specific markup languages

3. Data is available for everyone

“The Open Graph protocol enables any web page to become a rich object in a social graph. For instance, this is used on Facebook to allow any web page to have the same functionality as any other object on Facebook.”

Use: https://developers.facebook.com/docs/opengraph/

Use: https://cards-dev.twitter.com/validator

1. Schema.org microdata

2. Open Graph protocol

3. Title + metadescription element

4. Best guess from page content

Use: https://developers.google.com/+/web/snippet/
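
The fallback order in the list above can be sketched as a simple priority lookup. This is a minimal illustration only: the dictionary keys and the function name are hypothetical labels chosen for this sketch, not part of any Google API.

```python
from typing import Optional

# Priority order taken from the numbered list above.
# The key names are hypothetical labels for this sketch.
SNIPPET_SOURCES = [
    "schema_org_microdata",
    "open_graph",
    "title_and_meta_description",
    "best_guess_from_content",
]


def pick_snippet_source(available: dict) -> Optional[str]:
    """Return the first source, in priority order, that has a non-empty value."""
    for source in SNIPPET_SOURCES:
        if available.get(source):
            return source
    return None


# A page without microdata but with Open Graph tags falls through to option 2.
page = {
    "schema_org_microdata": "",
    "open_graph": "Several German beers",
    "title_and_meta_description": "Beer reviews",
}
print(pick_snippet_source(page))
```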

Use: https://wordpress.org/plugins/wordpress-seo/

Use Amazon EC2, set up a crawler and crawl the top 1.000.000 Alexa URLs

Checked for occurrences of:

–Microdata / Schema

–OpenGraph

–Twitter Cards
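
A check like the one listed above could look roughly like this. It is a sketch using only Python's standard library, not the crawler that was actually used; the class and function names are made up for illustration, and the detection rules (any `itemscope`/`itemprop` attribute, `og:` properties, `twitter:` names) are simplifying assumptions.

```python
from html.parser import HTMLParser


class MarkupDetector(HTMLParser):
    """Flags which of the three markup families appear in one HTML document."""

    def __init__(self):
        super().__init__()
        self.found = {"microdata": False, "opengraph": False, "twitter_cards": False}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Microdata: any element carrying itemscope or itemprop.
        if "itemscope" in attrs or "itemprop" in attrs:
            self.found["microdata"] = True
        if tag == "meta":
            # Open Graph uses property="og:..."; Twitter Cards use name="twitter:...".
            if (attrs.get("property") or "").startswith("og:"):
                self.found["opengraph"] = True
            if (attrs.get("name") or "").startswith("twitter:"):
                self.found["twitter_cards"] = True


def detect_markup(html: str) -> dict:
    detector = MarkupDetector()
    detector.feed(html)
    return detector.found


sample = """
<html><head>
<meta property="og:title" content="Several German beers" />
<meta name="twitter:title" content="Several German beers" />
</head><body>
<div itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Jan-Willem</span>
</div>
</body></html>
"""
print(detect_markup(sample))
```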

- Crawled at 360 URLs/sec

- 68.4GB of data used

- 68% (683267 URLs) returned 200 OK

- 27% 30X Redirects

- 3% of domains had DNS issues

Occurrence of each markup element, based on 683k of the top million Alexa URLs:

– OpenGraph Title: 15,84%
– OpenGraph URL: 14,55%
– Twitter:title: 1,59%
– Twitter:url: 1,32%
– Schema itemprop: 7,27%
– Schema Itemprop Name: 2,69%
– AggregateRating: 0,22%

Commercial tool: http://www.builtwith.com

se·man·tics [si-man-tiks]

noun

the branch of linguistics that deals with the study of meaning, changes in meaning, and the principles that govern the relationship between sentences or words and their meanings

“Microdata is a set of tags, introduced with HTML5, that allows you to do this.”

• Is separated from the HTML

• Which gives more flexibility and scalability options

• Used in more software, like the washing machine I showed earlier

• But… Google hasn’t integrated everything yet

<div itemscope itemtype="http://data-vocabulary.org/Review-aggregate">
  <span itemprop="itemreviewed">Several German beers</span>
  <img itemprop="photo" src="beer.jpg" />
  <span itemprop="rating" itemscope itemtype="http://data-vocabulary.org/Rating">
    <span itemprop="average">9</span>
    <span itemprop="best">10</span>
  </span>
  <span itemprop="votes">24</span>
  <span itemprop="count">5</span>
</div>

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jan-Willem</span>
  <img src="janwillem.jpg" itemprop="image" />
  <span itemprop="jobTitle">International SEO</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Amsterdam</span>,
    <span itemprop="addressRegion">- Europe</span>
    <span itemprop="postalCode">9999XX</span>
  </div>
</div>
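
To see what a machine can read out of the Person markup above, here is a small extractor sketch built on Python's standard `html.parser`. It is not how any search engine parses microdata: it assumes well-formed HTML, keeps only the last value when a property repeats, and the names `MicrodataParser` and `extract_microdata` are invented for this example.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they never go on the open-element stack.
VOID_TAGS = {"img", "meta", "link", "br", "hr", "input"}


class MicrodataParser(HTMLParser):
    """Collects itemscope/itemprop data into nested dictionaries."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the document
        self.scopes = []  # items whose itemscope element is still open
        self.stack = []   # one entry per open (non-void) element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        entry = {"item": None, "owner": None, "prop": None, "text": None}
        if "itemscope" in attrs:
            item = {"@type": attrs.get("itemtype")}
            if prop and self.scopes:
                self.scopes[-1][prop] = item   # nested item, e.g. address
            else:
                self.items.append(item)        # new top-level item
            self.scopes.append(item)
            entry["item"] = item
        elif prop and self.scopes:
            value = attrs.get("content") or attrs.get("src") or attrs.get("href")
            if value is not None:
                self.scopes[-1][prop] = value  # value lives in an attribute
            else:
                # Value is the element's text; collect it until the end tag.
                entry.update(owner=self.scopes[-1], prop=prop, text=[])
        if tag not in VOID_TAGS:
            self.stack.append(entry)

    def handle_data(self, data):
        for entry in self.stack:
            if entry["text"] is not None:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        entry = self.stack.pop()
        if entry["item"] is not None:
            self.scopes.pop()
        elif entry["prop"] is not None:
            entry["owner"][entry["prop"]] = "".join(entry["text"]).strip()


def extract_microdata(html: str) -> list:
    parser = MicrodataParser()
    parser.feed(html)
    return parser.items


PERSON_HTML = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jan-Willem</span>
  <img src="janwillem.jpg" itemprop="image" />
  <span itemprop="jobTitle">International SEO</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Amsterdam</span>,
    <span itemprop="postalCode">9999XX</span>
  </div>
</div>
"""
person = extract_microdata(PERSON_HTML)[0]
print(person["name"], "-", person["address"]["addressLocality"])
```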

1. Products

2. Product offer

3. Product aggregated offer

Create multiple links to relevant pages within 1 entry in the SERPs.

• https://developers.google.com/structured-data/rich-snippets/

• Schema Creator by Raven http://schema-creator.org/

• Schema.org Generator http://www.microdatagenerator.com/

• Rich Snippets Testing Tool Bookmarklet http://www.blindfiveyearold.com/rich-snippets-testing-tool-bookmarklet

• Everything you need to know to generate rich snippets: http://seogadget.com/micro-data-schema-org-guide-to-generating-rich-snippets/

1. You have specific data points available

2. Search engines accept specific markup language

3. Search engines accept certain snippets

4. Information within the SERPs is correct

• Implement the code and check it with the search engines:

https://developers.google.com/structured-data/testing-tool/?hl=it

• Make sure all items are structured and nested in the correct way.

• The Google testing tool only shows errors based on missing elements, not on wrong coding!

https://plus.google.com/communities/103048251221048356778

“Google doesn’t use markup for ranking purposes at this time—but rich snippets can make your web pages appear more prominently in search results, so you may see an increase in traffic.”

Source: https://support.google.com/webmasters/answer/1211158?hl=en

https://support.google.com/webmasters/contact/rich_snippets_spam

Top 10 Research fields per # Publications

– Artificial Intelligence and Machine Learning: 406
– Algorithms and Theory: 368
– Human-Computer Interaction and Visualization: 288
– Natural Language Processing: 248
– Machine Perception: 228
– Information Retrieval and the Web: 182
– Security, Cryptography, and Privacy: 177
– Data Mining: 148
– Software Systems: 135

What happened during the past 8 years?

2007 2010 2015

From a database to search engine result pages

Now… Let’s be honest

Basic information retrieval

Freebase only has +/- 200 attributes for the class Country

http://arxiv.org/pdf/1503.00759.pdf

http://research.google.com/pubs/pub41894.html

Four different methods to extract triples from web content

Natural Language Processing tools

Entity recognition

Entity linkage

Entity verification against Freebase

Source: https://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf

Document Object Model

Either text or database driven “deep web” sources

Think of querying HTML forms

570M tables on the web

Relations are difficult to extract

Schema matching methods

Entity verification against Freebase

Schema.org

Mostly people related

Products & Events are not stored

Mapping Schema.org to Freebase for predicates

Researchers treat “duplicate content” as just one source

Exploring the power of tables on the Web

https://research.google.com/tables

The papers share some insights about the factors relevant to Google Tables results

Sources of data Google uses according to the paper

Optimise the surrounding content with relevant captions and texts.

Use <th> table headings to add labels to specific columns

Add relevant attributes to your table headings focusing on the queries used

Only add useful content to the table. Boilerplate content is filtered out.
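
Why the <th> headings matter becomes clearer when a table is read the way a machine might read it: each heading turns into the label (predicate) for the values in its column. The sketch below is an illustration under simple assumptions, not the method from the cited paper: it treats the first column as the subject, ignores everything outside table cells, and uses invented names (`TableParser`, `table_to_triples`).

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Reads <th> texts as column labels and <td> texts as row values."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text pieces of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            self._cell = None
            if tag == "th":
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
        elif tag == "tr":
            if self._row:  # header-only rows leave this empty
                self.rows.append(self._row)
            self._row = None


def table_to_triples(html: str) -> list:
    """Return (subject, column label, value) triples; first column is the subject."""
    parser = TableParser()
    parser.feed(html)
    triples = []
    for row in parser.rows:
        for label, value in zip(parser.headers[1:], row[1:]):
            triples.append((row[0], label, value))
    return triples


TABLE_HTML = """
<table>
  <caption>Capitals and currencies</caption>
  <tr><th>Country</th><th>Capital</th><th>Currency</th></tr>
  <tr><td>Netherlands</td><td>Amsterdam</td><td>Euro</td></tr>
  <tr><td>Germany</td><td>Berlin</td><td>Euro</td></tr>
</table>
"""
for triple in table_to_triples(TABLE_HTML):
    print(triple)
```

Without the <th> row the same function returns no triples at all, because there is nothing to label the columns with.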

http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper3.pdf

“Extraction errors are far more prevalent than source errors. Ignoring this distinction can cause us to incorrectly distrust a website”

Back to the basics for Google (and probably the other search engines too)

Links still tell something about relationships between pages but also between entities.

Simply search in the indices you already have. In the case of Google, they already have “everything”.

Simply gather user feedback from within the search results.

Source: https://twitter.com/brentnau

One in 20 searches is health related according to Google.

Use Web based Fact extraction, like DOM, tables and annotated data (Schema.org)

Text based extractors adding more triples to the datasets

Systems like those described in the Biperpedia paper. Data is enriched and quality control takes place. Use partnerships for trusted resources.

Use existing datasets like Freebase / Wikidata to verify extracted data and calculate probability
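
The last step, verifying extracted triples against an existing dataset and attaching a probability, might be sketched like this. It is a deliberately naive stand-in and not the model from the cited papers: a triple confirmed by the reference data scores 1.0, a contradicted one 0.0, and anything unknown gets the share of distinct sources that support it. Repeated claims from the same source count once, in line with treating duplicate content as a single source. The function name, the source labels and the `reference` dictionary are hypothetical.

```python
from collections import defaultdict


def score_triples(extractions, reference):
    """Score (subject, predicate, object) triples between 0.0 and 1.0.

    extractions: iterable of (source, subject, predicate, obj) tuples.
    reference:   dict mapping (subject, predicate) to the known object.
    """
    # Collect the distinct sources behind each triple (duplicates collapse).
    support = defaultdict(set)
    for source, subject, predicate, obj in extractions:
        support[(subject, predicate, obj)].add(source)

    # All sources that claim any value for the same subject and predicate.
    claimants = defaultdict(set)
    for (subject, predicate, _), sources in support.items():
        claimants[(subject, predicate)] |= sources

    scores = {}
    for (subject, predicate, obj), sources in support.items():
        known = reference.get((subject, predicate))
        if known is not None:
            # The reference dataset decides when it has an answer.
            scores[(subject, predicate, obj)] = 1.0 if known == obj else 0.0
        else:
            # Otherwise fall back to the share of sources that agree.
            scores[(subject, predicate, obj)] = len(sources) / len(claimants[(subject, predicate)])
    return scores


extractions = [
    ("site-a.example", "Netherlands", "capital", "Amsterdam"),
    ("site-b.example", "Netherlands", "capital", "Amsterdam"),
    ("site-c.example", "Netherlands", "capital", "The Hague"),
    ("site-c.example", "Netherlands", "capital", "The Hague"),  # same source twice
    ("site-a.example", "Germany", "capital", "Berlin"),
]
reference = {("Germany", "capital"): "Berlin"}

for triple, score in score_triples(extractions, reference).items():
    print(triple, round(score, 2))
```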

Make sure you understand

A few possibilities to influence the content of brand cards

The main source is still Wikipedia; always back up your edits with sources

You are able to give Google hints about your logo, corporate contacts and social profiles

Add schema.org Organization markup to your official website

Find example JSON-LD at

https://developers.google.com/structured-data/customize/overview
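
As a rough idea of what such Organization markup contains, the sketch below builds a JSON-LD block with Python's `json` module. The properties `logo`, `contactPoint` and `sameAs` are regular schema.org Organization properties matching the three hints mentioned above (logo, corporate contacts, social profiles); the company name, URLs and phone number are placeholders, and the helper name is invented. Check the linked documentation for the exact fields Google expects.

```python
import json


def organization_jsonld(name, url, logo, phone, contact_type, profiles):
    """Return a <script> block with schema.org Organization markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "contactPoint": [
            {
                "@type": "ContactPoint",
                "telephone": phone,
                "contactType": contact_type,
            }
        ],
        "sameAs": list(profiles),  # official social profile URLs
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values only; replace with the details of the real organisation.
snippet = organization_jsonld(
    name="Example Company",
    url="https://www.example.com",
    logo="https://www.example.com/logo.png",
    phone="+1-555-0100",
    contact_type="customer service",
    profiles=["https://twitter.com/example", "https://www.facebook.com/example"],
)
print(snippet)
```

The resulting block goes into the HTML of the official website, separate from the visible content.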

What about the localised Google search indices?

Contains the main subject of the required answer

Within the content, the question is answered in a single sentence

No, Euro NCAP is more authoritative in the EU for car safety levels. NHTSA for the US

Two indices, two truths?

So how can we make use of this for our brand?

Since not many are focusing on getting into the Direct Answers yet, grab the positions first!

95% of the cases had increased traffic - including movements within top 10 normal blue links.

Less than expected, probably because of the quality of the answer: results between -5% and +6% traffic.

Results varied between -3% and +11% depending on previous position in the SERPs

These were performing the best, increases between 6 and 14%

Depends on the topic: complicated topics tend to get more clicks. Average results between -2% and +16%
