How to Reveal Hidden Relationships in Data and Risk Analytics

Ontotext Webinar, 28 Mar 2016

Presentation Outline

• Discovery and analytics case

• Data integration and FIBO mapping

• Discovery and analytics examples

• Future work

Apr 2016 Hidden Relationships in Data and Risk Analytics

Relation Discovery Case

• Find suspicious relationships like:
  − A company in the USA controls
  − another company in the USA
  − through a company in an off-shore zone

• Show news relevant to them

What Does It Take to Make It Work?

• Database of locations with sub-region info

• Database with companies and control relations

• Define the semantics of the relevant relationships (using FIBO)
  − sub-region and control are transitive relationships
  − located-in is transitive over sub-region

• Define suspicious relationships

# orgA controls orgB via an intermediate company y located in an off-shore zone z,
# while orgA and orgB share a location x
CONSTRUCT { ?orgA my:suspiciousLink ?orgB } WHERE {
    ?orgA ptop:locatedIn ?x ; fibo:controls ?y .
    ?y fibo:controls ?orgB ; ptop:locatedIn ?z .
    ?orgB ptop:locatedIn ?x .
    ?z a ptop:OffshoreZone .
}
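To make the pattern concrete, here is a minimal pure-Python sketch of the same idea: compute the transitive closure of the control and sub-region relations, propagate locations upwards, and then look for the A, Y, B triangle. All company and place names are hypothetical toy data, and the helper functions are illustrative; in the webinar setup this work is done by the triplestore's reasoning and the CONSTRUCT query.

```python
# Hypothetical toy data; names and relations are illustrative only.
sub_region = {("Delaware", "USA"), ("Texas", "USA"), ("GeorgeTown", "CaymanIslands")}
located_in = {("AcmeHolding", "Delaware"), ("ShellCo", "GeorgeTown"), ("AcmeRetail", "Texas")}
controls = {("AcmeHolding", "ShellCo"), ("ShellCo", "AcmeRetail")}
offshore_zones = {"CaymanIslands"}


def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as a set of pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def all_locations(located_in, sub_region):
    """located-in is transitive over sub-region: propagate each location upwards."""
    regions = transitive_closure(sub_region)
    inherited = {(entity, big) for (entity, small) in located_in
                 for (s, big) in regions if s == small}
    return set(located_in) | inherited


def suspicious_links(controls, located_in, sub_region, offshore_zones):
    """A controls Y, Y controls B, A and B share a location, Y sits in an off-shore zone."""
    ctrl = transitive_closure(controls)
    places = {}
    for entity, place in all_locations(located_in, sub_region):
        places.setdefault(entity, set()).add(place)
    return {
        (a, b)
        for (a, y) in ctrl
        for (y2, b) in ctrl
        if y == y2
        and places.get(a, set()) & places.get(b, set())
        and places.get(y, set()) & offshore_zones
    }


print(suspicious_links(controls, located_in, sub_region, offshore_zones))
```

In this toy data the two US companies only share the location "USA" after the sub-region propagation, which is why the transitivity definitions matter for the query to match.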

Presentation Outline

• Discovery and analytics case

• Data integration and FIBO mapping

• Discovery and analytics examples

• Future work

The Web of Linked Data in 2007

[Figure: diagram of linked datasets. Callouts on the diagram: structured database version of Wikipedia; database of all locations on Earth; product reviews; semantic synonym dictionary]

Note: Each bubble represents a dataset. Arrows represent mappings across datasets; e.g. dbpedia:Paris owl:sameAs geo:2988507

The Web of Linked Data is Gaining Mass

The Web of Data is Gaining Mass (2011)

The Web of Linked Data is Gaining Mass

• 2013 stats: 2 289 public datasets
  − http://stats.lod2.eu/

• Growing exponentially
  − see the dotted trend line

• Structured markup
  − Schema.org; semantic SEO

• Enables better semantic tagging!
  − As there are more concepts and richer descriptions to refer to

Linked Data Datasets (chart values)

Year      2007  2008  2009  2010  2011  2012  2013
Datasets    27    43    89   162   295   822  2,289
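As a rough check of the "growing exponentially" claim, the sketch below computes the average year-over-year growth factor, reading the chart values as 27, 43, 89, 162, 295, 822 and 2,289 datasets for 2007 to 2013. The function name is illustrative, and the geometric mean between the first and last year is only one simple way to summarize the trend.

```python
# Dataset counts per year, as read from the slide's chart.
datasets = {2007: 27, 2008: 43, 2009: 89, 2010: 162, 2011: 295, 2012: 822, 2013: 2289}


def average_yearly_growth(counts):
    """Geometric mean of the yearly growth factor between the first and last year."""
    first_year, last_year = min(counts), max(counts)
    periods = last_year - first_year
    return (counts[last_year] / counts[first_year]) ** (1 / periods)


factor = average_yearly_growth(datasets)
print(f"average growth factor per year: {factor:.2f}")
```

Under this reading the number of datasets roughly doubles every year over the period.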

Data Integration and Loading

• DBpedia (the English version only): 496M statements

• Geonames (all geographic features on Earth): 150M statements
  − owl:sameAs links between DBpedia and Geonames: 471K statements

• Company registry data (GLEI): 3M statements

• News metadata (from NOW): 128M statements

• Total size: 986M statements
  − 667M explicit statements + 318M inferred statements
  − RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-region constraints

Global Legal Entity Identifier (GLEI) data

• Global Markets Entity Identifier (GMEI) Utility data
  − The GMEI utility is DTCC's legal entity identifier solution, offered in collaboration with SWIFT
  − We downloaded a data dump from https://www.gmeiutility.org/

• RDF-ized company records
  − Fields: LEI#, legal name, ultimate parent, registered country
  − 3M explicit statements for 211 thousand organizations
    ▪ For comparison, there are 490 000 organizations in DBpedia, and D&B covers more than 200 million
  − 10,821 ultimate parent relationships and 1632 ultimate parents
  − About 2 800 organizations from the GLEI dump mapped to DBpedia

GLEI Company Data Sample: ABN-AMRO

lei:businessRegistry "Kamer van Koophandel"^^xsd:string

lei:businessRegistryNumber "34334259"^^xsd:string

lei:duplicateReference data:549300T5O0D0T4V2ZB28

lei:entityStatus "ACTIVE"^^xsd:string

lei:headquartersCity "Amsterdam"^^xsd:string

lei:headquartersState "Noord-Holland"^^xsd:string

lei:legalForm "NAAMLOZE VENNOOTSCHAP"^^xsd:string

lei:legalName "ABN AMRO Bank N.V."^^xsd:string

lei:lei "BFXS5XCH7N0Y05NIXW11"^^xsd:string

lei:registeredCity "Amsterdam"^^xsd:string

lei:registeredCountry "NL"^^xsd:string

lei:registeredPostCode "1082 PP"^^xsd:string

lei:registeredState "Noord-Holland"^^xsd:string
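The property list above is what an "RDF-ized" company record looks like. As a minimal sketch of that conversion, the code below turns a flat record into N-Triples lines, one per field. The namespace URIs are placeholders (the real URIs behind the lei: and data: prefixes are not given in the slides), and the function name is illustrative.

```python
# Placeholder namespaces: the actual URIs behind lei: and data: are not given in the slides.
LEI_NS = "http://example.org/lei/"
DATA_NS = "http://example.org/data/"


def record_to_ntriples(record):
    """Turn a flat company record (dict of strings) into N-Triples lines, one per field."""
    subject = f"<{DATA_NS}{record['lei']}>"
    lines = []
    for field, value in sorted(record.items()):
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{LEI_NS}{field}> "{escaped}" .')
    return lines


# A few fields from the ABN AMRO sample shown above.
abn_amro = {
    "lei": "BFXS5XCH7N0Y05NIXW11",
    "legalName": "ABN AMRO Bank N.V.",
    "registeredCountry": "NL",
    "headquartersCity": "Amsterdam",
}

for line in record_to_ntriples(abn_amro):
    print(line)
```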

Global Legal Entity Identifier (GLEI) data

Rank  Ultimate parent                  Children  Country
1     The Goldman Sachs Group, Inc.    1 851     US
2     United Technologies Corporation  427       US
3     Honeywell International Inc.     341       US
4     Morgan Stanley                   228       US
5     Cargill, Incorporated            217       US
6     1832 Asset Management L.P.       202       CA
7     Aegon N.V.                       174       NL
8     Union Bancaire Privée, UBP SA    138       CH
9     Citigroup Inc.                   135       US
10    State Street Corporation         128       US

Rank  Country             Companies
1     dbr:United_States   103 548
2     dbr:Canada          17 425
3     dbr:Luxembourg      13 984
4     dbr:Sweden          7 934
5     dbr:United_Kingdom  7 421
6     dbr:Belgium         6 868
7     dbr:Ireland         4 762
8     dbr:Australia       4 385
9     dbr:Germany         3 039
10    dbr:Netherlands     2 561

Quick news-analytics case

• Our Dynamic Semantic Publishing platform already offers linking of text with big open data graphs

• One can navigate from text to concepts, get trends, related entities and news

• Try it at http://now.ontotext.com

Technology: Semantic Content Enrichment

News Metadata

• Metadata from Ontotext’s Dynamic Semantic Publishing platform
  − Automatically generated as part of the NOW.ontotext.com semantic news showcase

• News stream from Google since Feb 2015, about 10k news articles per month
  − ~70 tags (annotations) per news article

• Tags link text mentions of concepts to the knowledge graph
  − Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases

News Metadata

Category                Count
International           52 074
Science and Technology  23 201
Sports                  20 714
Business                15 155
Lifestyle               11 684
Total                   122 828

Mentions / entity type  Count
Keyphrase               2 589 676
Organization            1 276 441
Location                1 260 972
Person                  1 248 784
Work                    309 093
Event                   258 388
RelationPersonRole      236 638
Species                 180 946

Class Hierarchy Map (by number of instances)

Left: The big picture Right: dbo:Agent class (2.7M organizations and persons)

Loading FIBO

• FIBO = Financial Industry Business Ontology

• We loaded FIBO Foundations and Business Entities (BE) in GraphDB
  − About 55 RDF files from the “foundations-14-11-30” and “business-entities-15-02-23” packages

• Reasoning switched to OWL 2 RL
  − Loading takes 3-4 seconds

• Number of explicit statements: 5 433

• Number of total statements: 20 646
  − Of which inferred and materialized: 15 213

FIBO Class Hierarchy

Explore properties related to a class

Mapping FIBO to DBPedia

• We mapped FIBO to the DBpedia Ontology
  − Minimalistic approach: we mapped only as much as we needed

dbo:Organization rdfs:subClassOf fibo-fnd-org-fm:FormalOrganization.

dbo:Company rdfs:subClassOf fibo-be-le-cb:Corporation.

dbo:Person rdfs:subClassOf fibo-fnd-aap-ppl:Person.

dbo:subsidiary rdfs:subPropertyOf fibo-fnd-rel-rel:controls.

• Methodological notes
  − Note that fibo-fnd-rel-rel:controls is not transitive
  − We mapped more specific DBpedia primitives to more general FIBO ones, so that the data becomes “visible” through FIBO
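The effect of these mapping axioms can be sketched in a few lines of Python: applying the standard RDFS rules for rdfs:subClassOf and rdfs:subPropertyOf makes DBpedia data answer queries phrased in FIBO terms. The instance triples below are hypothetical illustrations, and this toy loop covers only two entailment rules, not the full OWL 2 RL rule set that the triplestore applies.

```python
SUBCLASS, SUBPROP, TYPE = "rdfs:subClassOf", "rdfs:subPropertyOf", "rdf:type"

# Mapping axioms from the slide, as (subject, predicate, object) triples.
mapping = {
    ("dbo:Organization", SUBCLASS, "fibo-fnd-org-fm:FormalOrganization"),
    ("dbo:Company", SUBCLASS, "fibo-be-le-cb:Corporation"),
    ("dbo:Person", SUBCLASS, "fibo-fnd-aap-ppl:Person"),
    ("dbo:subsidiary", SUBPROP, "fibo-fnd-rel-rel:controls"),
}

# Hypothetical instance data expressed in DBpedia terms (for illustration only).
data = {
    ("dbr:Volkswagen_Group", TYPE, "dbo:Company"),
    ("dbr:Volkswagen_Group", "dbo:subsidiary", "dbr:Audi"),
}


def materialize(triples):
    """Apply the subClassOf and subPropertyOf rules until no new triple is inferred."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (a, rel, b) in inferred:
                if rel == SUBCLASS and p == TYPE and o == a:
                    new.add((s, TYPE, b))   # instance of a subclass is an instance of the superclass
                elif rel == SUBPROP and p == a:
                    new.add((s, b, o))      # statement with a sub-property also holds for the super-property
        if new <= inferred:
            return inferred
        inferred |= new


graph = materialize(mapping | data)
for triple in sorted(graph - mapping - data):
    print(triple)
```

Because the mapping goes from the specific DBpedia terms to the general FIBO ones, no DBpedia data has to be rewritten; the FIBO view is purely inferred.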

See open data through the FIBO lens

Presentation Outline

• Discovery and analytics case

• Data integration and FIBO mapping

• Discovery and analytics examples

• Future work

Semantic Press-Clipping

• We can trace references to a specific company in the news
  − This is fairly standard; however, we can also handle syntactic variations in names, because state-of-the-art Named Entity Recognition technology is used
  − More importantly, we correctly determine whether a given mention of “Paris” refers to Paris (the capital of France), Paris in Texas, Paris Hilton or Paris (the Greek hero)

• We can trace and consolidate references to subsidiaries

• We have a comprehensive industry classification
  − The one from DBpedia, but refined to accommodate identifier variations and specialization (e.g. a company classified as dbr:Bank will also be considered classified as dbr:FinancialServices)

Mentions of related entities

select distinct ?news ?title ?date ?rel_entity
from onto:disable-sameAs
where {
    BIND( dbr:Volkswagen_Group as ?entity )
    # the company itself, or any entity it controls
    { ?entity fibo-fnd-rel-rel:controls ?rel_entity }
    UNION
    { BIND(?entity as ?rel_entity) }
    # news articles that mention the related entity
    ?news pub-old:containsMention / pub-old:hasInstance / pub:exactMatch ?rel_entity .
    ?news pub-old:creationDate ?date; pub-old:title ?title .
    # restrict to articles created in April 2015
    FILTER ( (?date > "2015-04-01T00:02:00Z"^^xsd:dateTime)
          && (?date < "2015-05-01T00:02:00Z"^^xsd:dateTime))
}

Industry distribution

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX ff-map: <http://factforge.net/ff2016-mapping/>

# count companies per top-level industry, rolling industry variants up to their center
select distinct ?top_industry (count(?company) as ?companies)
where {
    ?company dbo:industry ?industry .
    ?industrySum ff-map:industryVariant ?industry;
                 ff-map:industryCenter ?top_industry .
} group by ?top_industry order by desc(?companies)
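The roll-up performed by the query above can be sketched in plain Python. The dictionary stands in for the ff-map:industryVariant and ff-map:industryCenter mapping; its entries and the ex: company identifiers are hypothetical, apart from the dbr:Bank to dbr:FinancialServices example mentioned earlier in the deck.

```python
from collections import Counter

# Hypothetical stand-in for the variant-to-center mapping: each industry identifier
# (variant) points to its top-level ("center") industry.
industry_center = {
    "dbr:Bank": "dbr:FinancialServices",
    "dbr:FinancialServices": "dbr:FinancialServices",
    "dbr:Automotive_industry": "dbr:Automotive",
    "dbr:Automotive": "dbr:Automotive",
}

# Hypothetical (company, industry) pairs, as they might come from dbo:industry.
company_industry = [
    ("ex:CompanyA", "dbr:Bank"),
    ("ex:CompanyA", "dbr:FinancialServices"),
    ("ex:CompanyB", "dbr:FinancialServices"),
    ("ex:CompanyC", "dbr:Automotive_industry"),
    ("ex:CompanyD", "dbr:Unmapped_industry"),
]


def industry_distribution(pairs, centers):
    """Count matching rows per top-level industry, most frequent first."""
    counts = Counter(centers[industry] for _, industry in pairs if industry in centers)
    return counts.most_common()


print(industry_distribution(company_industry, industry_center))
```

Like count(?company) without DISTINCT, this counts matching rows, so a company tagged with two variants of the same industry (ex:CompanyA here) contributes twice; counting each company once would need an explicit de-duplication step.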

Most popular companies per industry

# rank entities in the automotive industry by the number of news articles mentioning them
select distinct ?pub_entity ?label (count(?news) as ?news_count)
where {
    ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
    ?pub_entity pub:exactMatch ?entity; pub:preferredLabel ?label.
    ?entity dbo:industry ?industry .
    dbr:Automotive ff-map:industryVariant ?industry .
} group by ?pub_entity ?label order by desc(?news_count)

Most popular companies, including children

select distinct ?parent (count(?news) as ?news_count)
where {
    # companies in the industry, each paired with itself and with its direct children
    { select distinct ?parent ?entity {
        BIND(dbr:Software as ?industry)
        ?industry ff-map:industryVariant ?industryVar .
        ?parent dbo:industry ?industryVar .
        ?parent a dbo:Company .
        # intended to skip companies whose own parent is already in this industry
        FILTER NOT EXISTS { ?parent dbo:parent / dbo:industry / ff-map:industryVariant ?industry }
        { ?entity dbo:parent ?parent . } UNION
        { BIND(?parent as ?entity) }
    } }
    # news mentions of the parent or of any of its children
    ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
    ?pub_entity pub:exactMatch ?entity .
    ?news pub-old:creationDate ?date .
} group by ?parent order by desc(?news_count)
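The aggregation idea behind this query, crediting mentions of child companies to their parent, can be sketched as follows. The company identifiers and mention pairs are hypothetical toy data, and only one level of parent links is followed, matching the single dbo:parent step in the query.

```python
from collections import defaultdict

# Hypothetical toy data: child -> parent links and (article, mentioned entity) pairs.
parent_of = {"ex:BrandA": "ex:GroupX", "ex:BrandB": "ex:GroupX"}
mentions = [
    ("news1", "ex:GroupX"),
    ("news2", "ex:BrandA"),
    ("news3", "ex:BrandB"),
    ("news3", "ex:BrandA"),
    ("news4", "ex:GroupY"),
]


def news_count_with_children(mentions, parent_of):
    """Credit each mention to the entity's parent if it has one, else to the entity itself."""
    counts = defaultdict(int)
    for _news, entity in mentions:
        counts[parent_of.get(entity, entity)] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


print(news_count_with_children(mentions, parent_of))
```

An article that mentions two children of the same parent (news3 here) is counted twice, which mirrors counting rows rather than distinct articles; whether that is desirable depends on the analysis.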

News Popularity Ranking: Automotive

Rank  Company                    News #  Company incl. mentions of controlled  News #
1     General Motors             2722    General Motors                        4620
2     Tesla Motors               2346    Volkswagen Group                      3999
3     Volkswagen                 2299    Fiat Chrysler Automobiles             2658
4     Ford Motor Company         1934    Tesla Motors                          2370
5     Toyota                     1325    Ford Motor Company                    2125
6     Chevrolet                  1264    Toyota                                1656
7     Chrysler                   1054    Renault-Nissan Alliance               1332
8     Fiat Chrysler Automobiles  1011    Honda                                 864
9     Audi AG                    972     BMW                                   715
10    Honda                      717     Takata Corporation                    547

News Popularity: Finance

Rank  Company                    News #  Company incl. mentions of controlled  News #
1     Bloomberg L.P.             3203    Intra Bank                            261667
2     Goldman Sachs              1992    Hinduja Bank (Switzerland)            49731
3     JP Morgan Chase            1712    China Merchants Bank                  38288
4     Wells Fargo                1688    Alphabet Inc.                         22601
5     Citigroup                  1557    Capital Group Companies               4076
6     HSBC Holdings              1546    Bloomberg L.P.                        3611
7     Deutsche Bank              1414    Exor                                  2704
8     Bank of America            1335    Nasdaq, Inc.                          2082
9     Barclays                   1260    JP Morgan Chase                       1972
10    UBS                        694     Sentinel Capital Partners             1053

Note: Including investment funds, stock exchanges, agencies, etc.

News Popularity: Banking

Rank  Company                    News #  Company incl. mentions of controlled  News #
1     Goldman Sachs              996     China Merchants Bank *                38288
2     JP Morgan Chase            856     JP Morgan Chase                       1972
3     HSBC Holdings              773     Goldman Sachs                         1030
4     Deutsche Bank              707     HSBC                                  966
5     Barclays                   630     Bank of America                       771
6     Citigroup                  519     Deutsche Bank                         742
7     Bank of America            445     Barclays                              681
8     Wells Fargo                422     Citigroup                             630
9     UBS                        347     Wells Fargo                           428
10    Chase                      126     UBS                                   347

Note: including investment funds, stock exchanges, agencies, etc.

Regional exposure of a company

# count news mentions of a company and its related entities, grouped by the country of the news
select distinct ?country (count(*) as ?count)
from onto:disable-sameAs
where {
    # the company itself plus every entity related to it
    { select distinct ?related_entity {
        BIND ( dbr:Toyota as ?entity )
        { ?related_entity ff-map:agentRelation ?entity . } UNION
        { BIND(?entity as ?related_entity) }
    } }
    ?news pub-old:containsMention / pub-old:hasInstance
          / pub:exactMatch ?related_entity .
    ?news pub:country ?country .
} group by ?country order by desc(?count)

Regional exposure – normalized

# same per-country count, divided by the country's overall popularity score;
# countries with 20 or fewer mentions are dropped
select distinct ?country (count(*) as ?count) (?count / ?country_score as ?score)
from onto:disable-sameAs
where {
    { select distinct ?related_entity {
        BIND ( dbr:BP as ?entity )
        { ?related_entity ff-map:agentRelation ?entity . } UNION
        { BIND(?entity as ?related_entity) }
    } }
    ?news pub-old:containsMention / pub-old:hasInstance
          / pub:exactMatch ?related_entity .
    ?news pub:country ?country .
    ?country ff-map:countryPopularityScore ?country_score .
} group by ?country ?country_score having (?count > 20) order by desc(?score)
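The normalization step can be illustrated with a small Python sketch: divide each country's mention count by a per-country popularity score, drop countries below a minimum count, and sort by the resulting score. All numbers are hypothetical, and the popularity dictionary merely stands in for ff-map:countryPopularityScore, whose exact definition is not given in the slides.

```python
# Hypothetical numbers: mentions of one company per country, and a popularity
# score per country (standing in for ff-map:countryPopularityScore).
mentions_by_country = {"US": 900, "UK": 300, "AZ": 45, "NO": 12}
country_popularity = {"US": 60.0, "UK": 10.0, "AZ": 0.5, "NO": 1.0}


def normalized_exposure(mentions, popularity, min_count=20):
    """Return (country, count, score) rows with count > min_count, highest score first."""
    rows = [
        (country, count, count / popularity[country])
        for country, count in mentions.items()
        if count > min_count and country in popularity
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for country, count, score in normalized_exposure(mentions_by_country, country_popularity):
    print(country, count, score)
```

In this toy data the raw counts would rank the US first, while the normalized score brings a rarely covered country to the top, which is the point of dividing by overall country popularity. The minimum-count threshold keeps tiny samples from dominating the ranking.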

Relationships discovery examples

• Companies that control other companies across countries

• Companies that control other companies in the same country through a company in another country

• Companies that control other companies in the same country through a company in an off-shore zone

Presentation Outline

• Discovery and analytics case

• Data integration and FIBO mapping

• Discovery and analytics examples

• Future work

Analytics with relations extracted from text

Subject                         Object                                       Count
dbr:Chrysler                    dbr:Fiat_Chrysler_Automobiles                455
dbr:NASA                        dbr:Goddard_Space_Flight_Center              69
dbr:Time_Warner_Cable           dbr:Comcast                                  44
dbr:National_Football_League    dbr:New_England_Patriots                     40
dbr:DirecTV                     dbr:AT&T                                     33
dbr:Alcatel-Lucent              dbr:Nokia                                    31
dbr:AOL                         dbr:Verizon_Communications                   30
dbr:University_of_Pennsylvania  dbr:Perelman_School_of_Medicine_at_... UPEN  29
dbr:Time_Warner_Cable           dbr:Charter_Communications                   27
dbr:Continental_Airlines        dbr:United_Airlines                          26

Note: relation types "RelationOrganizationAffiliatedWithOrganization", "RelationAcquisition", "RelationMerger"

Future Work

• Comprehensive mapping of LEI data

• Experiments on Ultimate Parent discovery

• Partnership with commercial data providers

• Organizations, related in the news, but not in other datasets

• Organizations, co-occurring in the news, but not in other datasets

• Construct a profile of related entities for an organization

Wrap up

• We allow Open Data to be accessed via FIBO
  − It took just a few days to clean up DBpedia’s industry classifications and control relationships

• Integrating more data sources is easy (e.g. GLEI)
  − We can integrate proprietary and 3rd party data within days or weeks

• We can perform analytics on metadata
  − Regional exposure, popularity of entities, relation extraction

• All integrated in proven products and solutions
  − GraphDB triplestore, OpenPolicy, Dynamic Semantic Publishing platform

Thank you!

Experience the technology with NOW: Semantic News Portal

http://now.ontotext.com

Start using GraphDB and text-mining with S4 in the cloud

http://s4.ontotext.com

Learn more at our website or simply get in touch

[email protected], @ontotext
