Cyberspace Law Committee Meeting, August 3, 2012
Big Data
Lois Mermelstein
The Law Office of Lois D. [email protected]
Ted Claypoole
Womble [email protected]
What Is Big Data?
✤ Data that exceeds the processing capacity of conventional database systems.
✤ Too much data
✤ It moves too fast
✤ It’s too diverse
How’d we get here?
✤ Storage capacity, processing speed, and bandwidth are growing exponentially
✤ Networking is expanding exponentially
✤ And you can buy all the pieces - data, infrastructure, processing
source: http://radar.oreilly.com/2011/08/building-data-startups.html
Crunching Big Data - Volume
✤ Turn 12 terabytes of tweets/day into improved product sentiment analysis
✤ Convert 350 billion annual meter readings to better predict power consumption
✤ Generate Facebook recommendations based on your friends’ interests
Crunching Big Data - Velocity
✤ Time-sensitive analysis and decision-making - to catch important events as they happen
✤ Needed when there’s too much input data to keep (so some gets tossed) or when decisions must be made immediately
✤ Examples:
✤ Scrutinize 5 million trade events/day to identify potential fraud
✤ Analyze 500 million daily call detail records in real time to predict customer churn faster
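One common way to "toss some" input without biasing what is kept is reservoir sampling. The sketch below is only an illustration of that idea and not something taken from the slides: the function name `reservoir_sample`, the `trade_id` field, and the stream of events are all hypothetical. It keeps a uniform random sample of fixed size from a stream whose length is not known in advance.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of up to k items from a stream."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Pick a slot in 0..i (inclusive); the new item replaces an
            # existing one with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

# Hypothetical stream of trade events; a real feed would arrive over time.
events = ({"trade_id": n} for n in range(100_000))
kept = reservoir_sample(events, k=1000, seed=42)
print(len(kept))
```

The point of the design is that memory use stays fixed at `k` items no matter how fast or how long the stream runs, which is the constraint the velocity problem imposes.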
Crunching Big Data - Variety
✤ Not just names/addresses in a customer database
✤ Want to analyze text, sensor data, audio, video, location data, click streams, log files, and anything else that’s available
✤ Principle: when you can, keep everything - there might be something useful in what you throw away
Unexpected Consequences
✤ Anonymous AOL searcher isn’t (NYT, 8/9/2006)
✤ Anonymous Netflix users aren’t, when compared with IMDb database (Wired, 12/13/2007)
✤ For many, browsing history is unique and repeatable (8/1/2012)
✤ Target knows when you’re pregnant (NYT, 2/19/2012)
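To make the "anonymous users aren't" point concrete, here is a toy sketch of a linkage attack: records stripped of names are matched against a public data set by counting overlapping (title, rating) pairs. It is a deliberately simplified illustration and not the method used in any of the studies cited above. Every record, user ID, and the `link` function itself are made up for the example.

```python
# All records below are invented for illustration.
anonymized = {
    "user_17": {("Film A", 5), ("Film B", 2), ("Film C", 4)},
    "user_42": {("Film A", 3), ("Film D", 5), ("Film E", 1)},
}
public_reviews = {
    "alice_public": {("Film A", 5), ("Film C", 4), ("Film F", 3)},
    "bob_public": {("Film D", 5), ("Film E", 1)},
}

def link(anonymized, public, min_overlap=2):
    """Map each anonymous ID to the public profile sharing the most ratings."""
    matches = {}
    for anon_id, anon_items in anonymized.items():
        best_name, best_score = None, 0
        for name, items in public.items():
            score = len(anon_items & items)  # shared (title, rating) pairs
            if score > best_score:
                best_name, best_score = name, score
        # Only report a match when the overlap meets the threshold.
        if best_score >= min_overlap:
            matches[anon_id] = best_name
    return matches

print(link(anonymized, public_reviews))
# → {'user_17': 'alice_public', 'user_42': 'bob_public'}
```

Even with names removed, two shared ratings are enough to single out each person in this tiny example; real data sets are larger, but they also carry many more attributes per person.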
Lessons to (Re)learn
✤ Correlation isn't causation
✤ But correlation may be all you need
✤ You can't hide in the crowd
Personally Identifiable Information
PII as a mathematical function
How many points of data do you need?
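One way to explore the "how many points" question is to count how many records become unique as more attributes are combined. The sketch below uses a handful of invented records with three hypothetical fields (ZIP code, birth year, gender); the `fraction_unique` helper is an assumption made for illustration, not a legal or statistical standard.

```python
from collections import Counter

# Hypothetical records: (ZIP code, birth year, gender). No names included.
records = [
    ("78701", 1975, "F"),
    ("78701", 1975, "M"),
    ("78701", 1980, "F"),
    ("78702", 1975, "F"),
    ("78702", 1975, "F"),
    ("78702", 1990, "M"),
]

def fraction_unique(records, fields):
    """Fraction of records that are the only one with their field values."""
    keys = [tuple(r[i] for i in fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)

combos = [
    ((0,), "ZIP only"),
    ((0, 1), "ZIP + birth year"),
    ((0, 1, 2), "ZIP + birth year + gender"),
]
for fields, label in combos:
    print(f"{label}: {fraction_unique(records, fields):.2f}")
```

In this toy data set no record is unique by ZIP code alone, but each added attribute singles out more people, even though no individual field looks like PII on its own.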
Pineda v. Williams-Sonoma Stores, Inc. (Cal., Feb. 10, 2011)
HIPAA De-Identified Data
Re-Identifying De-Identified Data
Escaping Regulatory Requirements
Privacy
Fair Credit Reporting
Redlining
Employment Discrimination
Single Transaction Owned By:
Retailer
Wholesale vendor
Manufacturer
Shipping Company
Customer’s Bank
Customer’s ISP
Retailer’s Bank
Merchant Card Processor
Phone company/Hardware/Software
Government Using Big Data
Law Enforcement
Copyright Issues
Who owns the data?
Who owns the derivative works?
Combined data?