Real-Time API: Delivering Data @ Scale
Agenda
API Overview
Key System Requirements
Big Data Systems vs. RDBMS
Architecture
Data Flow
Questions?
API Overview
API details
REST-based API
Partners can request various types of reports.
Each report covers data on the order of terabytes (TBs).
Sample Request
?start-date=2012-10-01&end-date=2012-10-29&partner=1&aggregate-by=state,city
Response
Zip file [size on the order of 10-30 MB]
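A minimal sketch of how a client might assemble the sample request's query string. The class and method names are hypothetical; only the parameter names and the yyyy-MM-dd date format come from the sample request. Each aggregate-by field is URL-encoded on its own so the comma separator stays literal, as it appears in the sample.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

public class ReportRequest {
    // Hypothetical helper that builds the query string for a report request.
    static String queryString(LocalDate start, LocalDate end, int partner, List<String> aggregateBy) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end-date must not be before start-date");
        }
        // Encode each field separately so the commas between fields are not encoded as %2C.
        String aggregate = aggregateBy.stream()
                .map(field -> URLEncoder.encode(field, StandardCharsets.UTF_8))
                .collect(Collectors.joining(","));
        // LocalDate.toString() produces ISO-8601 dates such as 2012-10-01.
        return "?start-date=" + start
                + "&end-date=" + end
                + "&partner=" + partner
                + "&aggregate-by=" + aggregate;
    }

    public static void main(String[] args) {
        System.out.println(queryString(
                LocalDate.of(2012, 10, 1), LocalDate.of(2012, 10, 29), 1, List.of("state", "city")));
    }
}
```

Running `main` reproduces the sample request's query string.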
Key System Requirements
Interactive filtering queries: partners can filter data on various parameters.
Real-time response: SLA of 1-3 minutes.
Security
Extremely private and confidential data.
Must pass an audit by an external vendor.
Scalability
Serve more customers simply by adding more machines.
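Interactive filtering on a shared query path is where filtering and security meet: partner-supplied parameters should never be spliced into SQL. The sketch below assumes a relational backend, a hypothetical per-partner table name chosen server-side, an assumed whitelist of column names, equality-only filters, and a `COUNT(*)` aggregate; none of these details come from the deck. Filter values are bound as `?` parameters, and column names are checked against the whitelist before they reach the SQL text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FilterQuery {
    // Assumed column names; only these may appear in filters or aggregate-by.
    static final Set<String> ALLOWED_COLUMNS = Set.of("state", "city", "event_date");

    record Query(String sql, List<Object> params) {}

    // `table` is assumed to come from a server-side partner mapping, never from the request.
    static Query build(String table, Map<String, Object> filters, List<String> aggregateBy) {
        if (aggregateBy.isEmpty()) {
            throw new IllegalArgumentException("aggregate-by must name at least one column");
        }
        List<Object> params = new ArrayList<>();
        StringBuilder where = new StringBuilder();
        // TreeMap gives a deterministic column order in the generated SQL.
        for (Map.Entry<String, Object> f : new TreeMap<>(filters).entrySet()) {
            requireAllowed(f.getKey());
            where.append(where.length() == 0 ? " WHERE " : " AND ")
                 .append(f.getKey()).append(" = ?");
            params.add(f.getValue()); // bound as a parameter, never concatenated
        }
        aggregateBy.forEach(FilterQuery::requireAllowed);
        String groupCols = String.join(", ", aggregateBy);
        String sql = "SELECT " + groupCols + ", COUNT(*) AS cnt FROM " + table + where
                + " GROUP BY " + groupCols;
        return new Query(sql, params);
    }

    static void requireAllowed(String column) {
        if (!ALLOWED_COLUMNS.contains(column)) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
    }
}
```

The resulting `sql` and `params` would then be passed to a JDBC `PreparedStatement`; date ranges such as start-date/end-date would need range predicates rather than the equality filters shown here.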
Big Data Systems vs. Relational Database Systems
Large amounts of data [on the order of TBs]
Hadoop/Hive
RDBMS
Real-time interactive filtering/querying
Hadoop/Hive
RDBMS
Joins between large tables [millions × millions × millions]
Hadoop/Hive
RDBMS
Access/Security Control
Hadoop/Hive
RDBMS
Resilience to hardware failure and auto-scaling
Hadoop/Hive
RDBMS
Fast read operations
Hadoop/Hive
RDBMS
Data Flow
Security Control in RDBMS
Strong user authentication mechanism.
Access restricted per user at the database and table level.
Each partner has a dedicated database user and associated tables.
No cross-referencing of data across partners' tables.
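The per-partner isolation above can be sketched as a small generator of access-control statements. This assumes PostgreSQL-style `GRANT`/`REVOKE` syntax and a hypothetical naming scheme in which partner N logs in as role `partner_N` and owns tables prefixed `partner_N_`; the deck does not specify either. The prefix check refuses to grant a partner access to any table outside its own set.

```java
import java.util.ArrayList;
import java.util.List;

public class PartnerGrants {
    // Hypothetical naming scheme: one database role per partner.
    static String roleFor(int partnerId) {
        return "partner_" + partnerId;
    }

    // Emits PostgreSQL-style statements giving a partner read-only access to its own tables.
    static List<String> grantStatements(int partnerId, List<String> tables) {
        String role = roleFor(partnerId);
        List<String> sql = new ArrayList<>();
        for (String table : tables) {
            // The trailing underscore stops partner_1 from matching partner_11's tables.
            if (!table.startsWith(role + "_")) {
                throw new IllegalArgumentException(table + " does not belong to " + role);
            }
            sql.add("REVOKE ALL ON " + table + " FROM PUBLIC;"); // no default access
            sql.add("GRANT SELECT ON " + table + " TO " + role + ";");
        }
        return sql;
    }
}
```

Keeping the check in code as well as in the database grants gives the external auditor a single place to verify the naming rule.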
Data Flow
Java API
Common pattern [streaming]:
• Read a batch of records from the DB.
• Process the records.
• Stream them back to the client.
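A minimal sketch of the read/process/stream pattern, writing straight into the zip response so the full report is never held in memory. The `Iterator` stands in for database reads (with JDBC, `Statement.setFetchSize` is the usual hint for fetching rows in batches); the CSV format and the `report.csv` entry name are assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ReportStreamer {
    // `records` stands in for rows read from the database; only one record is held at a time.
    static void streamReport(Iterator<List<String>> records, OutputStream out, int batchSize)
            throws IOException {
        // Closing the writer also finishes the zip archive and closes `out`.
        try (ZipOutputStream zip = new ZipOutputStream(out);
             Writer writer = new OutputStreamWriter(zip, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry("report.csv")); // hypothetical entry name
            int inBatch = 0;
            while (records.hasNext()) {
                // Process step (assumed): format each record as one CSV line.
                writer.write(String.join(",", records.next()));
                writer.write('\n');
                if (++inBatch == batchSize) {
                    // Hand this batch's buffered characters to the zip stream;
                    // the deflater may still buffer some compressed bytes.
                    writer.flush();
                    inBatch = 0;
                }
            }
        }
    }
}
```

In a servlet, `out` would be the response output stream, so compressed bytes reach the client while later batches are still being read.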
Avoid creating unnecessary objects:
• Java heap memory errors (OutOfMemoryError) caused by using String in place of a char array.
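One reading of the char-array point is to format every record into a single reusable `char[]` instead of building a fresh String per record. The class name, the CSV layout, and the 256-character starting size are assumptions for illustration; the buffer grows only when a record does not fit.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

public class CharBufferWriter {
    // One buffer reused for every record; 256 is an arbitrary starting size.
    private char[] buf = new char[256];

    // Writes fields as one CSV line, copying characters into the reusable buffer.
    void writeRecord(List<String> fields, Writer out) throws IOException {
        int len = 0;
        for (int i = 0; i < fields.size(); i++) {
            String f = fields.get(i);
            int needed = len + f.length() + 1; // +1 for the ',' or '\n' after the field
            if (needed > buf.length) {
                buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
            }
            f.getChars(0, f.length(), buf, len); // copy without allocating a new String
            len += f.length();
            buf[len++] = (i == fields.size() - 1) ? '\n' : ',';
        }
        out.write(buf, 0, len);
    }
}
```

Because the buffer outlives each call, per-record garbage drops to the field Strings themselves, which matters when a report runs to millions of rows.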