Webinar: Proofpoint, a pioneer in security-as-a-service, protects people, information and brands with...
Proofpoint protects people, information and brands with DataStax
July 28, 2016
Rich Sutton, CTO / VP of Engineering, Social Security & Compliance @ Proofpoint
Proofpoint
• Cloud-based security and compliance for the enterprise: email, social and mobile
• Founded 2002
• 1300 employees worldwide
• $3B public company: PFPT
• $350M revenue
• Cassandra used throughout the organization
© DataStax, All Rights Reserved. 2
Nexgate
• Security and compliance platform for digital risk, focusing on social media
• Founded in 2011, backed by Sierra Ventures
• Acquired by Proofpoint in 2014
• Early customer of DataStax
Why Cassandra
Nexgate Stack
• Frontend
  • Ruby on Rails 4
  • CanJS
• Service
  • Unicorn
  • Apache
  • CherryPy (Python)
• Data
  • MySQL
  • DataStax Cassandra
  • Redis
• Platform
  • AWS
  • Ubuntu
Deployment: Current Production
• 3 TB of data across 23 nodes in 9 datacenters (DCs) in 4 clusters
• Primary prod cluster:
  • ~1000 writes and 250 reads per second
  • LOCAL_QUORUM writes with replication factor (RF) 3
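To make "LOCAL_QUORUM writes with RF 3" concrete, here is the standard quorum arithmetic as a small Python sketch. It assumes RF 3 is the replication factor of the local datacenter (LOCAL_QUORUM only counts local replicas). The slide does not say which consistency level reads use, so the read/write overlap check is illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Replica acknowledgements required by QUORUM-style levels: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while quorum requests still succeed."""
    return replication_factor - quorum(replication_factor)


def read_sees_latest_write(read_acks: int, write_acks: int, replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return read_acks + write_acks > replication_factor


rf = 3  # local datacenter's replication factor, per the slide
print(f"LOCAL_QUORUM with RF {rf}: {quorum(rf)} acks, survives {tolerated_replica_failures(rf)} down replica(s)")
```

With RF 3, a LOCAL_QUORUM write needs 2 local acknowledgements and keeps working with one local replica down. If reads were also LOCAL_QUORUM (an assumption), 2 + 2 > 3 and a read overlaps the latest acknowledged write; with ONE reads, 1 + 2 is not greater than 3, so stale reads are possible.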
Deployment: Evolution
Start (2012):
• Cassandra 1.1.6, Ubuntu 10.04
• One datacenter, three nodes
• m1.large instances
Never down
Finish (today):
• DataStax Enterprise 4.5.7, Ubuntu 14.04
• Two datacenters, six nodes
• Solr deployed
• i2.xlarge instances
Use Case: Anglerphishing
• Problem: Bad actors respond to support tweets from fake social accounts to lure users to phishing sites
• Solution: Solr-indexed Cassandra column family that has…
  • Partition key: Tweet ID
  • Clustering key: Mentioned user ID
  • Data: Mentioning user ID, content, timestamp
• Benefits: Efficiently enforces unique writes of mentions without having to model tables for the data science team's access patterns (search)
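Why does this layout enforce uniqueness? In Cassandra an INSERT with an existing primary key is an upsert, so the (tweet ID, mentioned user ID) pair can only ever hold one row. The sketch below models that behavior in memory; `MentionsTable` and `Mention` are hypothetical names, and the dict only approximates Cassandra, which resolves conflicting writes by write timestamp rather than arrival order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    mentioning_user_id: str
    content: str
    timestamp: int


class MentionsTable:
    """Hypothetical in-memory stand-in for PRIMARY KEY ((tweet_id), mentioned_user_id)."""

    def __init__(self):
        # tweet_id (partition) -> {mentioned_user_id (clustering key): row data}
        self._partitions = {}

    def write(self, tweet_id, mentioned_user_id, mention):
        # Same primary key means the row is overwritten, never duplicated.
        self._partitions.setdefault(tweet_id, {})[mentioned_user_id] = mention

    def mentions_of(self, tweet_id):
        # A single-partition read; rows come back ordered by clustering key.
        rows = self._partitions.get(tweet_id, {})
        return [(user_id, rows[user_id]) for user_id in sorted(rows)]
```

Re-ingesting the same reply (for example, a redelivered stream event) leaves exactly one row per mentioned user, so downstream consumers never see duplicate mentions.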
Use Case: Spam Multiplicity
• Problem: Spammers on social media repeat messages across accounts
• Solution: Cassandra column family that has…
  • Partition key: Hash of content
  • Clustering key: Content "native" ID
  • Data: Timestamp
• Benefits: Efficiently count how many times we've seen a piece of content while retaining the detail data, supporting real-time analysis
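A sketch of how this layout turns "how often have we seen this?" into a single-partition read: every copy of the same content lands in one partition, one row per native ID, so the count is the number of rows. The hashing scheme (SHA-256 over lowercased, whitespace-collapsed text) and the class name `SeenContent` are assumptions for illustration; the slide does not say how content is hashed.

```python
import hashlib
from collections import defaultdict


def content_hash(text: str) -> str:
    # Assumption: normalize case and whitespace so trivially reformatted copies collide.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SeenContent:
    """Hypothetical stand-in for PRIMARY KEY ((content_hash), native_id) plus a timestamp column."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # content_hash -> {native_id: timestamp}

    def record(self, text, native_id, timestamp):
        # Replaying the same native ID overwrites its row instead of inflating the count.
        self._partitions[content_hash(text)][native_id] = timestamp

    def times_seen(self, text):
        # One partition read: the count is the number of clustering rows.
        return len(self._partitions.get(content_hash(text), {}))

    def sightings(self, text):
        # Detail data is retained: each sighting's native ID and timestamp.
        return sorted(self._partitions.get(content_hash(text), {}).items())
```

Because the detail rows are kept, the same partition that yields the count can also answer "which posts, and when?" for real-time analysis.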
Use Case: Trending Topics
• Problem: Detect when the conversation on a social account changes radically
• Solution: Cassandra column family that has…
  • Partition key: Account, Year_Month
  • Clustering key: Day_Minute, Bi-gram (parsed from content)
  • Data: Count
• Benefits: Efficiently get bi-gram counts from adjacent date ranges and analyze them for statistical differences
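The sketch below shows how this key design supports the comparison. One partition per account and month means a date range is a slice on the first clustering column (Day_Minute). The `DD_MMMM` Day_Minute format, the class and function names, and the G-test (log-likelihood ratio) used to score changes are all assumptions. The slide does not name the statistical test.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime


def bigrams(text):
    words = text.lower().split()
    return [f"{w1} {w2}" for w1, w2 in zip(words, words[1:])]


def keys_for(account, ts: datetime):
    # Assumed formats: partition (account, "YYYY_MM"); clustering prefix "DD_minute-of-day".
    partition = (account, ts.strftime("%Y_%m"))
    day_minute = f"{ts.day:02d}_{ts.hour * 60 + ts.minute:04d}"
    return partition, day_minute


class BigramCounts:
    def __init__(self):
        # (account, year_month) -> {(day_minute, bigram): count}
        self._rows = defaultdict(Counter)

    def ingest(self, account, ts, text):
        partition, day_minute = keys_for(account, ts)
        for bg in bigrams(text):
            self._rows[partition][(day_minute, bg)] += 1

    def window(self, account, year_month, start, end):
        # Range slice on the first clustering column: start <= day_minute < end.
        totals = Counter()
        for (day_minute, bg), n in self._rows.get((account, year_month), {}).items():
            if start <= day_minute < end:
                totals[bg] += n
        return totals


def g2(a, b, total_a, total_b):
    """G-test statistic for a 2x2 table: this bi-gram vs. all others, window A vs. window B."""
    n = total_a + total_b
    if n == 0:
        return 0.0
    hit, miss = a + b, n - (a + b)
    observed = [a, total_a - a, b, total_b - b]
    expected = [total_a * hit / n, total_a * miss / n, total_b * hit / n, total_b * miss / n]
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def most_changed(before: Counter, after: Counter, top=3):
    ta, tb = sum(before.values()), sum(after.values())
    scores = {bg: g2(before[bg], after[bg], ta, tb) for bg in set(before) | set(after)}
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

Higher scores mean a bi-gram's share of the conversation shifted more between the two windows; a production system would also pick a significance threshold rather than just taking the top few.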
Use Case: Archive Search
• Problem: Allow customers to identify arbitrary compliance problems in social content with an open-ended search feature
• Solution:
  • Cassandra column family that contains the content and all metadata: timestamp, user IDs and names, links, etc.
  • DataStax Enterprise Solr with a core on that column family
• Benefits: Near real-time index updates make new content searchable from the same infrastructure
Use Case: Threat Event Correlation
• Problem: Proofpoint collects billions of threat data points a day, which can't be analyzed
• Solution:
  • Build a custom graph database on top of C* (Cassandra)
  • Key is the vertex; wide rows are the edges
  • 18 nodes, 24 TB of data, ingest peaks of 1M events per second
• Benefits: Security researchers can now identify relationships between hosts, actors and threats that they couldn't before
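"Key is vertex, wide rows are edges" can be sketched as follows: each vertex is a partition, and each edge is a clustering row inside it, so fetching a vertex's neighbors is one partition read and a multi-hop traversal is a breadth-first search over partition reads. The clustering layout (edge label, then neighbor ID), writing edges in both directions, and the `last_seen` cell value are assumptions; `WideRowGraph` is a hypothetical name, not Proofpoint's implementation.

```python
from collections import defaultdict, deque


class WideRowGraph:
    """Hypothetical stand-in for PRIMARY KEY ((vertex), edge_label, neighbor)."""

    def __init__(self):
        self._rows = defaultdict(dict)  # vertex -> {(edge_label, neighbor): last_seen}

    def add_edge(self, src, label, dst, last_seen):
        # Assumption: store both directions so either endpoint's partition answers traversals.
        self._rows[src][(label, dst)] = last_seen
        self._rows[dst][(label, src)] = last_seen

    def neighbors(self, vertex, label=None):
        # One partition read; filtering by label maps to a clustering-prefix slice.
        row = self._rows.get(vertex, {})
        return sorted({n for (l, n) in row if label is None or l == label})

    def within_hops(self, start, max_hops):
        """Breadth-first search returning {vertex: hop distance} up to max_hops."""
        distance = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            if distance[v] == max_hops:
                continue
            for n in self.neighbors(v):
                if n not in distance:
                    distance[n] = distance[v] + 1
                    queue.append(n)
        return distance
```

For example, linking a malicious host to an IP address, and that IP to a second host, lets a two-hop query surface the second host as related infrastructure.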
Advice
• Understand eventual consistency
• Plan for horizontal scalability
• Test fault tolerance with your app
• Use AWS (and skip hardware planning)
• Prefer instance-attached storage, or understand EBS optimization
• Pay for support
Contacts & Resources
Proofpoint
• Rich Sutton: CTO / VP of Engineering, Social Security & Compliance
• Email: [email protected]
DataStax
• Allene Jue: Product Marketing
• Email: [email protected]
Resources
• www.proofpoint.com
• www.datastax.com
Before we go… a few reminders
• Download DSE 5.0, available today!
• Register for the August 16th webinar on “The Agility Challenge: Powering Cloud Applications with Multi-Model & Mixed Workloads”
• Join us at Cassandra Summit (Sept 7-9): https://cassandrasummit.org/ (insider tip: get 15% off with promo code Webinar15)
• Become a DataStax Professional Community Member: http://academy.datastax.com/community
Q & A
Thank You!