Attack on Graph
DESCRIPTION
This talk describes how Trend Micro SPN uses HBase to solve the graph-model problem, and how we use PageRank to process our graph data for predictive purposes. We have also put a partial implementation of our graph solution, named HGraph, on GitHub for anyone interested in this topic.
TRANSCRIPT
![Page 1: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/1.jpg)
Scott Miao
2013/12/14
![Page 2: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/2.jpg)
Who am I
• RD, SPN, Trend Micro
• 3 years of experience with the Hadoop ecosystem
• Expertise in HDFS/MR/HBase
• @takeshi.miao
![Page 3: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/3.jpg)
THREATCONNECT
![Page 4: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/4.jpg)
IP, domain, URL, filename, process, file hash, Virus detection, registry key, etc.
Product 1 Product 2 Product 3 …
Threat Connect
[Diagram labels: Sand-box, File Detection, Threat Web, Web Reputation, Family Write-up, TE, Virus DB, APT KB]
Processes and correlates different data sources
Most relevant threat report with actionable intelligence on a single portal
![Page 5: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/5.jpg)
A GRAPH
![Page 6: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/6.jpg)
The problems
• Store large size of Graph data
• Access large size of Graph data
• Process large size of Graph data
![Page 7: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/7.jpg)
Big Data
![Page 8: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/8.jpg)
STORE
![Page 9: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/9.jpg)
Property Graph Model (1/3)
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
![Page 10: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/10.jpg)
Property Graph Model (2/3)
• A property graph has these elements
– a set of vertices
  • each vertex has a unique identifier.
  • each vertex has a set of outgoing edges.
  • each vertex has a set of incoming edges.
  • each vertex has a collection of properties defined by a map from key to value.
– a set of edges
  • each edge has a unique identifier.
  • each edge has an outgoing tail vertex.
  • each edge has an incoming head vertex.
  • each edge has a label that denotes the type of relationship between its two vertices.
  • each edge has a collection of properties defined by a map from key to value.
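The element list above can be sketched as a small in-memory data structure. This is only an illustration of the model: the class, field, and method names are hypothetical and are not taken from Blueprints or HGraph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of the property graph model.
// None of these names come from Blueprints or HGraph.
final class PropertyGraphSketch {

    static final class Vertex {
        final String id;                                         // unique identifier
        final List<Edge> outEdges = new ArrayList<>();           // outgoing edges
        final List<Edge> inEdges = new ArrayList<>();            // incoming edges
        final Map<String, Object> properties = new HashMap<>();  // key -> value

        Vertex(String id) { this.id = id; }
    }

    static final class Edge {
        final String id;      // unique identifier
        final Vertex tail;    // vertex the edge goes out of
        final Vertex head;    // vertex the edge comes into
        final String label;   // type of relationship between tail and head
        final Map<String, Object> properties = new HashMap<>();

        Edge(String id, Vertex tail, String label, Vertex head) {
            this.id = id;
            this.tail = tail;
            this.label = label;
            this.head = head;
        }
    }

    private final Map<String, Vertex> vertices = new HashMap<>();
    private final Map<String, Edge> edges = new HashMap<>();

    // Returns the existing vertex with this id, or creates it.
    Vertex addVertex(String id) {
        return vertices.computeIfAbsent(id, Vertex::new);
    }

    // Creates an edge and registers it on both endpoint vertices.
    Edge addEdge(String id, String tailId, String label, String headId) {
        if (edges.containsKey(id)) {
            throw new IllegalArgumentException("duplicate edge id: " + id);
        }
        Vertex tail = addVertex(tailId);
        Vertex head = addVertex(headId);
        Edge edge = new Edge(id, tail, label, head);
        edges.put(id, edge);
        tail.outEdges.add(edge);
        head.inEdges.add(edge);
        return edge;
    }

    public static void main(String[] args) {
        PropertyGraphSketch graph = new PropertyGraphSketch();
        Edge edge = graph.addEdge("e1", "example.com", "host", "http://example.com/a");
        edge.tail.properties.put("ip", "192.0.2.1");
        System.out.println(edge.tail.id + " --" + edge.label + "--> " + edge.head.id);
    }
}
```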
![Page 11: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/11.jpg)
Property Graph Model (3/3)
![Page 12: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/12.jpg)
The domain model for Property Graph Model
![Page 13: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/13.jpg)
The relational model for Property Graph Model
![Page 14: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/14.jpg)
Massively scalable?
Active community?
Analyzable?
![Page 15: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/15.jpg)
The winner is…
• We use HBase as a Graph Storage
– Google BigTable and PageRank
– HBaseCon2012
Yeah, we are No. 1!!
![Page 16: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/16.jpg)
Use HBase to store Graph data (1/3)
• Schema design
– Table: vertex
'<vertex-id>@<entity-type>', 'property:<property-key>@<property-value-type>', <property-value>
– Table: edge
'<vertex1-row-key>--><label>--><vertex2-row-key>', 'property:<property-key>@<property-value-type>', <property-value>
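A minimal sketch of how the row keys and column qualifiers in this schema could be built and parsed. The helper class and its method names are hypothetical (they are not HGraph's API), and the parser assumes that the `-->` separator never occurs inside a vertex id or a label.

```java
import java.util.regex.Pattern;

// Hypothetical helpers for the row-key layout shown on the slide.
final class RowKeys {
    static final String TYPE_SEP = "@";
    static final String EDGE_SEP = "-->";

    // '<vertex-id>@<entity-type>'
    static String vertexRowKey(String vertexId, String entityType) {
        return vertexId + TYPE_SEP + entityType;
    }

    // '<vertex1-row-key>--><label>--><vertex2-row-key>'
    static String edgeRowKey(String tailRowKey, String label, String headRowKey) {
        return tailRowKey + EDGE_SEP + label + EDGE_SEP + headRowKey;
    }

    // '<property-key>@<property-value-type>', stored under the 'property' column family
    static String propertyQualifier(String key, String valueType) {
        return key + TYPE_SEP + valueType;
    }

    // Because HBase keeps rows sorted by row key, all outgoing edges of a vertex
    // share this prefix and could be read with a single prefix scan.
    static String outgoingEdgePrefix(String tailRowKey) {
        return tailRowKey + EDGE_SEP;
    }

    // Splits an edge row key into {tail row key, label, head row key}.
    static String[] parseEdgeRowKey(String edgeRowKey) {
        String[] parts = edgeRowKey.split(Pattern.quote(EDGE_SEP), -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("not an edge row key: " + edgeRowKey);
        }
        return parts;
    }

    public static void main(String[] args) {
        String domain = vertexRowKey("myapps-ups.com", "domain");
        String url = vertexRowKey("http://example.com/a.exe", "url");
        String edge = edgeRowKey(domain, "host", url);
        System.out.println(edge);
        System.out.println(propertyQualifier("ip", "String"));
    }
}
```

Keeping the tail vertex first in the edge row key is what makes the outgoing-edge prefix scan possible; finding incoming edges with this layout would need a full scan or a second index.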
![Page 17: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/17.jpg)
Use HBase to store Graph data (2/3)
• Sample
– Table: vertex
'myapps-ups.com@domain', 'property:ip@String', '…'
'myapps-ups.com@domain', 'property:asn@String', '…'
…
'http://track.muapps-ups.com/InvoiceA1423AC.JPG.exe@url', 'property:path@String', '…'
'http://track.muapps-ups.com/InvoiceA1423AC.JPG.exe@url', 'property:parameter@String', '…'
– Table: edge
'myapps-ups.com@domain-->host-->http://track.muapps-ups.com/InvoiceA1423AC.JPG.exe@url', 'property:property1', '…'
'myapps-ups.com@domain-->host-->http://track.muapps-ups.com/InvoiceA1423AC.JPG.exe@url', 'property:property2', '…'
![Page 18: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/18.jpg)
Use HBase to store Graph data (3/3)
• Tables (TTL of 7776000 seconds = 90 days)
– create 'test.vertex', {NAME => 'property', BLOOMFILTER => 'ROW', COMPRESSION => 'lzo', TTL => '7776000'}
– create 'test.edge', {NAME => 'property', BLOOMFILTER => 'ROW', COMPRESSION => 'lzo', TTL => '7776000'}
![Page 19: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/19.jpg)
ACCESS
It’s not me, actually…
![Page 20: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/20.jpg)
HBase
• Data Sources: 1. Put Data
• Clients: 2. Get Data
• Algorithms: 3. Process Data
![Page 21: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/21.jpg)
Put Data
• The HBase schema design is simple and human-readable
• It is easy to write your own dumping tool as needed
– MR/Pig/Completebulkload
– A cron job can clean up broken-edge data
– TTL can also help to retire old data
• We already have a lot of practice with this task
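The core check of a broken-edge clean-up job might look like the sketch below. It assumes that a "broken edge" means an edge row whose tail or head vertex row no longer exists (for example, after the vertex expired via TTL); the slides do not define the term. In a real job the two inputs would come from scans over the vertex and edge tables; here they are plain in-memory collections, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch: find edge row keys whose endpoints are missing.
final class BrokenEdgeFinder {
    private static final String EDGE_SEP = "-->";

    static List<String> findBrokenEdges(Set<String> vertexRowKeys,
                                        Collection<String> edgeRowKeys) {
        List<String> broken = new ArrayList<>();
        for (String edgeKey : edgeRowKeys) {
            // Expected layout: <tail-row-key>--><label>--><head-row-key>
            String[] parts = edgeKey.split(Pattern.quote(EDGE_SEP), -1);
            boolean wellFormed = parts.length == 3;
            if (!wellFormed
                    || !vertexRowKeys.contains(parts[0])
                    || !vertexRowKeys.contains(parts[2])) {
                broken.add(edgeKey);   // candidate for deletion by the cron job
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        Set<String> vertices = new HashSet<>(Arrays.asList("a@domain", "b@url"));
        List<String> edges = Arrays.asList(
                "a@domain-->host-->b@url",    // both endpoints exist
                "a@domain-->host-->c@url");   // head vertex is missing
        System.out.println(findBrokenEdges(vertices, edges));
    }
}
```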
![Page 22: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/22.jpg)
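Get Data (1/2)
• A Graph API
• Better semantics for manipulating Graph data
– As a wrapper for the HBase Client API
– Rather than using the HBase Client API directly
• Simple to use
    // Look up a vertex by id, then iterate its outgoing edges labeled "knows", "foo", or "bar".
    Vertex vertex = this.graph.getVertex("40012");
    Vertex subVertex = null;
    Iterable<Edge> edges = vertex.getEdges(Direction.OUT, "knows", "foo", "bar");
    for (Edge edge : edges) {
        // As on the slide. Note: under standard Blueprints semantics, Direction.IN
        // returns the head (far-end) vertex of an outgoing edge.
        subVertex = edge.getVertex(Direction.OUT);
        ...
    }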
![Page 23: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/23.jpg)
Get Data (2/2)
• We implement the Blueprints API
– It provides interfaces as a spec for users to implement
– Currently the basic query methods are implemented
– We can benefit from it
  • Other libraries become usable if we implement more of the Blueprints API
– http://www.tinkerpop.com/
– RESTful server, graph algorithms, dataflow, etc.
![Page 24: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/24.jpg)
![Page 25: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/25.jpg)
PROCESS
![Page 26: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/26.jpg)
• Thanks to the human-readable HBase schema design and HBase's naturally random-access storage
– Write your own MR
– Write your own Pig/UDFs
• Ex. PageRank
– http://zh.wikipedia.org/wiki/Pagerank
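To make the PageRank example concrete, here is a small in-memory sketch of the iterative algorithm. It is not HGraph's implementation (which the slides describe as processing data stored in HBase); the class name, the damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from vertices without outgoing edges are all choices made for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory PageRank sketch over an adjacency map.
final class PageRankSketch {

    static Map<String, Double> pageRank(Map<String, List<String>> outLinks,
                                        double damping, int iterations) {
        // Collect every vertex, including those that only appear as targets.
        Set<String> nodes = new TreeSet<>(outLinks.keySet());
        for (List<String> targets : outLinks.values()) {
            nodes.addAll(targets);
        }
        int n = nodes.size();
        Map<String, Double> rank = new HashMap<>();
        if (n == 0) {
            return rank;
        }
        for (String v : nodes) {
            rank.put(v, 1.0 / n);   // start from a uniform distribution
        }

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String v : nodes) {
                next.put(v, 0.0);
            }
            double danglingMass = 0.0;   // rank held by vertices with no out-edges
            for (String v : nodes) {
                List<String> targets = outLinks.getOrDefault(v, Collections.emptyList());
                if (targets.isEmpty()) {
                    danglingMass += rank.get(v);
                    continue;
                }
                double share = rank.get(v) / targets.size();
                for (String t : targets) {
                    next.merge(t, share, Double::sum);
                }
            }
            for (String v : nodes) {
                double incoming = next.get(v) + danglingMass / n;
                next.put(v, (1.0 - damping) / n + damping * incoming);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("c"));
        graph.put("b", Arrays.asList("c"));
        graph.put("c", Arrays.asList("a"));
        System.out.println(pageRank(graph, 0.85, 50));
    }
}
```

In a MapReduce version, the inner loop over targets corresponds to the map phase (emit a rank share per outgoing edge) and the final loop to the reduce phase (sum the shares per vertex), with one job run per iteration.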
![Page 27: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/27.jpg)
HGraph
• An open-source project hosted on GitHub
– https://github.com/takeshimiao/HGraph
• A partial implementation released from our internal pilot project
– Follows the HBase schema design
– Reads data via the Blueprints API
– Processes data with PageRank
• Download or 'git clone' it
– Build with 'mvn clean package'
– Run on a Unix-like OS
  • Using Windows may cause some errors
![Page 28: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/28.jpg)
![Page 29: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/29.jpg)
![Page 30: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/30.jpg)
![Page 31: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/31.jpg)
There is another project
http://thinkaurelius.github.io/titan/
http://thinkaurelius.github.io/faunus/
![Page 32: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/32.jpg)
![Page 33: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/33.jpg)
OBSERVATIONS
![Page 34: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/34.jpg)
YARN
• It seems to be turning Hadoop into a de facto big data platform
– It loosens the binding to the MR framework and accommodates other frameworks
• A bunch of data processing frameworks are migrating to it
![Page 35: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/35.jpg)
http://hortonworks.com/hadoop/yarn/
![Page 36: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/36.jpg)
SQL-on-Hadoop
• Impala vs. Hive (Stinger and Tez)
– Impala seems more mature than Hive
• YARN!!
– Hive Stinger and Tez are based on YARN (HDP2)
– Impala also plans to migrate to YARN (CDH5)
– Even HBase!! (HOYA)
Hive is built on top of a batch processing framework (even MRv2), but Impala goes its own way!!
Todd Lipcon, Committer/PMC member on Apache Thrift, HBase, and Hadoop projects
![Page 37: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/37.jpg)
HBase is a popular NoSQL
• From what I saw in Europe/CA/China, I can say HBase is the most popular NoSQL solution if you have already adopted Hadoop
• Other NoSQL solutions will not get you out of ops pain points
• So the best way is to pick the right tool for you and use it well
![Page 38: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/38.jpg)
![Page 39: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/39.jpg)
http://www.slideshare.net/Hadoop_Summit/what-is-the-point-of-hadoop?from_search=1 #p34
![Page 40: Attack on graph](https://reader038.vdocument.in/reader038/viewer/2022102815/554f8b5db4c905435d8b4ddb/html5/thumbnails/40.jpg)