Twitter Frenzy FPGA Data Stream Processing Cory Kleinheksel (Team Leader) Tim Meyer David Graziano Josh Clausman


Post on 02-Jan-2016


Page 1: Twitter Frenzy  FPGA Data Stream Processing

Twitter Frenzy  FPGA Data Stream Processing

Cory Kleinheksel (Team Leader)

Tim Meyer

David Graziano

Josh Clausman


Project Idea

• Twitter Frenzy: a way to filter tweets as a set of frequencies, using an FPGA to perform packet analysis.

• Accelerate the stream processing of Twitter data queries.

• Specifically, accelerate computationally intensive, long-lived queries over short-lived data.

• The design and implementation of a frequency-based query is the primary focus (an interesting application of signal processing).

 


Details

• Input: Live (or simulated) Twitter stream data

• Java program used to simulate the Twitter feed by reading from a dataset

• Processing:
  1. Extract tweets from input stream
  2. Filter tweets based on query parameters
     • Text Matching
  3. Determine tweet frequency components
     • Frequency Analysis
  4. Apply signal filter (signal processing)

• Output: Tweets matching filter
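
The first two processing steps (tweet extraction and text matching) can be sketched in software. This is a minimal Python illustration, not the project's Java simulator or its VHDL design. It assumes the input is one JSON object per line with a `text` field, and the names `extract_tweets`, `matches_query`, and `string_filter` are hypothetical.

```python
import json

def extract_tweets(lines):
    """Step 1: yield the text of each tweet from a stream of JSON lines.

    Assumes one JSON object per line with a "text" field; lines that are
    not valid JSON objects or that lack the field are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and isinstance(record.get("text"), str):
            yield record["text"]

def matches_query(text, keywords):
    """Step 2: case-insensitive text match against any query keyword."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)

def string_filter(lines, keywords):
    """Run steps 1 and 2 and return the tweets that pass the text match."""
    return [t for t in extract_tweets(lines) if matches_query(t, keywords)]

# Simulated feed: a few well-formed tweets plus malformed input.
sample_stream = [
    '{"text": "FPGA boards are fun"}',
    '{"text": "Lunch time"}',
    'not json',
    '{"user": "no text field"}',
    '{"text": "Streaming data on an fpga"}',
]
passed = string_filter(sample_stream, ["fpga"])
```

Here `passed` holds the two tweets that mention the keyword. Steps 3 and 4 would then operate only on this reduced set.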


Design Issues

• Ability to acquire data from twitter at a useful speed

• Determining packet usefulness (send/drop) in efficient manner

• Managing concurrently arriving packets and multi-fragment packets

• How to calculate frequency and filter corresponding packets
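
One way to picture the multi-fragment issue is a reassembly buffer keyed by tweet. The Python sketch below is an illustration only. The fragment format `(tweet_id, index, is_last, payload)` and the class name `FragmentBuffer` are assumptions, since the slides do not describe how fragments are identified.

```python
class FragmentBuffer:
    """Reassemble tweets that arrive split across several packets.

    Assumed (hypothetical) fragment format: (tweet_id, index, is_last, payload).
    Fragments of different tweets may interleave and arrive out of order.
    """

    def __init__(self):
        self._parts = {}   # tweet_id -> {index: payload}
        self._total = {}   # tweet_id -> number of fragments, once known

    def add(self, tweet_id, index, is_last, payload):
        """Store one fragment; return the full tweet once complete, else None."""
        parts = self._parts.setdefault(tweet_id, {})
        parts[index] = payload
        if is_last:
            self._total[tweet_id] = index + 1
        total = self._total.get(tweet_id)
        if total is None or len(parts) < total:
            return None
        # All fragments present: release the buffer and emit the tweet.
        del self._parts[tweet_id]
        del self._total[tweet_id]
        return "".join(parts[i] for i in range(total))

buf = FragmentBuffer()
results = [
    buf.add("a", 0, False, "Hello "),        # first of three fragments
    buf.add("b", 0, True, "Single packet"),  # complete in one packet
    buf.add("a", 2, True, "world"),          # last fragment arrives early
    buf.add("a", 1, False, "FPGA "),         # missing middle fragment
]
```

A hardware design would also have to bound the buffer size and decide what to do with tweets whose fragments never arrive. This sketch leaves both out.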


Implementation Issues

• How to properly buffer and send fragmented tweets

• Time/clock cycles needed to perform frequency calculations

• Time to perform hashing (addressed by creating a lookup-table-based hashing block)

• Modules consuming data at different rates

• Debugging HW
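
The slides do not say which hash function the lookup-table-based block implements. As an illustration of the general idea, the Python sketch below uses Pearson hashing, a well-known table-driven hash. It is swapped in here as an assumption, and the table contents and the names `make_table` and `pearson_hash` are hypothetical.

```python
import random

def make_table(seed=0):
    """Build a fixed permutation of 0..255 (in hardware this could be a small ROM)."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table

def pearson_hash(data, table):
    """Return an 8-bit hash using one table lookup per input byte."""
    h = 0
    for byte in data:
        h = table[h ^ byte]
    return h

TABLE = make_table()
keyword_hash = pearson_hash(b"fpga", TABLE)
```

Each input byte costs one XOR and one table read, so the time to hash a word grows linearly with its length and involves no multiplication or division.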


System Architecture Diagram

 


Breakdown: Network Data Flow

 


Breakdown: Text Matching


Breakdown: Frequency Analysis


Algorithms

• Hashing

• String Matching

• Frequency Analysis

• Filtering (FIR)
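
The FIR filtering step can be illustrated with a direct-form filter. This Python sketch is an assumption-laden example: the slides give neither the filter taps nor the exact signal being filtered, so the input here is taken to be a count of matching tweets per time window, and the 4-tap moving average is chosen only for illustration.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], with x[n] = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        output.append(acc)
    return output

# Hypothetical input: matching tweets counted per time window.
counts = [0, 0, 4, 0, 0, 8, 8, 8]
# 4-tap moving average (a simple low-pass filter); taps are illustrative.
taps = [0.25, 0.25, 0.25, 0.25]
smoothed = fir_filter(counts, taps)
# smoothed == [0.0, 0.0, 1.0, 1.0, 1.0, 3.0, 4.0, 6.0]
```

The isolated burst of 4 is spread out and attenuated, while the sustained run of 8s builds toward its full value. This is the kind of behavior a low-pass filter over tweet frequency would show.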


Project Results

• Analyzed the problem

• Implemented full simulator in software

• Implemented in VHDL

• Simulated in ModelSim

• Tested on hardware, confirmed results against software implementation

Dataset: JSON_29493.txt
Processed 29493 tweets
192 passed string filter
133 passed frequency filter
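
The selectivity of each stage follows directly from the counts reported above. The short Python calculation below derives the pass rates. The variable names are hypothetical and the only inputs are the three reported numbers.

```python
processed = 29493        # tweets processed
passed_string = 192      # tweets that passed the string filter
passed_frequency = 133   # tweets that passed the frequency filter

# Share of all tweets that survive the string filter.
string_rate = 100 * passed_string / processed
# Share of string-filter survivors that also pass the frequency filter.
frequency_rate = 100 * passed_frequency / passed_string
# Share of all tweets that reach the output.
overall_rate = 100 * passed_frequency / processed

print(f"string filter: {string_rate:.2f}%")
print(f"frequency filter: {frequency_rate:.2f}%")
print(f"overall: {overall_rate:.2f}%")
```

Roughly 0.65% of tweets pass the string filter, about 69% of those pass the frequency filter, and about 0.45% of the input reaches the output.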


Software Simulator Example


Demo
