Data Visualizations of HYIP Dataset
Jie Han
Quantifying the World
April 23, 2012
Financial Cryptography 2012
http://fc12.ifca.ai/pre-proceedings/paper_27.pdf
This could be you!!!
Overview
1. What's an HYIP?
2. Dataset
3. Processes
4. R graph examples
5. Google Chart examples
6. Some helpful hints
High Yield Investment Programs (HYIPs)
● Also known as Ponzi or pyramid schemes
● Promise high returns on investment
● Pay existing investors with revenue from new investors
● Unsustainable in the long run
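Why paying old investors with new deposits cannot last can be seen in a toy model. The sketch below is only an illustration, not a model taken from the dataset: it assumes a constant stream of new deposits, interest owed every day on all money ever deposited, and no principal withdrawals. The function name and all parameter values are hypothetical.

```python
def days_until_collapse(daily_rate, deposits_per_day, deposit_size, max_days=10000):
    """Toy Ponzi model: return the first day on which promised interest
    exceeds the cash on hand, or None if that never happens in max_days."""
    cash = 0.0       # money the operator currently holds
    principal = 0.0  # total invested so far; interest is owed on all of it
    for day in range(1, max_days + 1):
        inflow = deposits_per_day * deposit_size
        cash += inflow
        principal += inflow
        owed = principal * daily_rate  # interest promised for this day
        if owed > cash:
            return day                 # cannot pay: the scheme collapses
        cash -= owed
    return None


if __name__ == "__main__":
    # Made-up parameters: 10 deposits of $100 per day at two advertised rates.
    for rate in (0.02, 0.04):
        print(rate, days_until_collapse(rate, 10, 100))
```

In this model the daily obligation grows with every deposit while the inflow stays flat, so the payout eventually outruns the cash, and a higher advertised rate brings that day forward.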
Why are HYIPs a problem?
● Advertised as legitimate investments
● Sophisticated online ecosystem in support of the schemes
HYIP Website
HYIP Aggregator Websites
HYIP Variables
HYIP Lifetime
Typical life cycle of an HYIP:
About the Data
● Since 11/17/2010, still running
● Collected data from nine "aggregator" websites
● Total observations: 141k+
● Total HYIPs observed: 1,576+
Process
Data collection (Python, crontab, mongoDB)
Preliminary analysis (Python, R)
Continue data collection, work on parsing all aggregators (Python)
Look at what we have, decide on what we want (R)
Difficulties in analyzing data -> create interactive data visualizations (Python, Google Charts, JS, HTML)
Use new tools to look for patterns (browser & eyes)
How an R Chart Gets Generated
Data Collection (Python)
Parse data & insert into db (Python, mongoDB)
Fetch & manipulate data (Python, mongoDB, R)
Output a .pdf image to server
New user input (HTML forms)
Front End
Back End
User interacts with data in browser
Background scripts
How Can We Trust Aggregator Data?
CDF of Standard Deviations of HYIP Lifetimes
● Aggregators agree 80% of the time
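A curve like this could be computed as follows. This is a minimal sketch with made-up numbers: it assumes each HYIP has one lifetime estimate (in days) per aggregator that tracked it, and that a standard deviation of zero means the aggregators agree. The function names and the `reports` layout are assumptions, not the format used in the study.

```python
from statistics import pstdev


def lifetime_std_devs(reports):
    """Standard deviation of reported lifetimes for each HYIP.

    `reports` maps an HYIP name to the list of lifetimes (days) reported
    by different aggregators. HYIPs seen by one aggregator are skipped.
    """
    return [pstdev(days) for days in reports.values() if len(days) >= 2]


def cdf_at(values, x):
    """Empirical CDF: the fraction of values less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)


if __name__ == "__main__":
    # Made-up example data, not taken from the HYIP dataset.
    reports = {"a": [10, 10, 10], "b": [5, 7], "c": [30, 30], "d": [1, 21]}
    stds = lifetime_std_devs(reports)
    for x in (0, 1, 10):
        print(x, cdf_at(stds, x))
```

Reading the CDF at a small threshold gives the share of HYIPs on which the aggregators report nearly the same lifetime.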
How Long Do HYIPs Last Before Collapsing?
Survival function of HYIP Lifetimes
● Most HYIPs collapse within a few weeks
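An empirical survival function gives, for each time t, the fraction of HYIPs that lasted longer than t. The sketch below uses made-up lifetimes and a hypothetical function name. It also assumes every lifetime is fully observed; HYIPs that are still running would need a censoring-aware estimator such as Kaplan-Meier, which is not shown here.

```python
def survival_function(lifetimes):
    """Return (t, S(t)) pairs, where S(t) is the fraction of lifetimes > t."""
    n = len(lifetimes)
    return [
        (t, sum(1 for x in lifetimes if x > t) / n)
        for t in sorted(set(lifetimes))
    ]


if __name__ == "__main__":
    # Made-up lifetimes in days, not taken from the HYIP dataset.
    for t, s in survival_function([5, 10, 10, 30]):
        print(t, s)
```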
What Factors Lead to Collapse?
Factors that lead to shorter HYIP lifespans:
● Higher advertised rates of return
● Shorter mandatory investment terms
R vs. Google Charts
R
● Useful if familiar with the dataset
● Good at presenting aggregate summaries
● Large learning curve, especially when you want to do something specific
● More customizable
● Most analysis techniques are available
Google Charts
● Anyone can view & interact with the data
● See a complete data distribution
● Learning curve isn't bad
● Not as customizable
● Have to wait for updates for more functionality, or write your own
How a Google Chart Gets Generated
Data Collection (Python)
Parse data & insert into db (Python, mongoDB)
Fetch & manipulate data (Python, mongoDB, R)
Write JS & HTML page (Python, JS, HTML, CSS)
New user input (HTML forms)
User interacts with data in browser
Background scripts
Back End
Front End
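The "Write JS & HTML page" step can be sketched as a Python function that fills rows into an HTML template. This is an assumed approach, not the scripts behind these slides: it uses the current Google Charts loader (`google.charts.load`), which may differ from what was available in 2012, and the function name, column names, and data are hypothetical.

```python
import json
from string import Template

# "$rows" and "$title" are the only placeholders in the template.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head>
<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
google.charts.load('current', {packages: ['corechart']});
google.charts.setOnLoadCallback(drawChart);
function drawChart() {
  var data = google.visualization.arrayToDataTable($rows);
  var chart = new google.visualization.ScatterChart(
      document.getElementById('chart'));
  chart.draw(data, {title: $title});
}
</script>
</head>
<body><div id="chart" style="width: 800px; height: 500px;"></div></body>
</html>
""")


def to_js(value):
    """Serialize to JSON and escape "</" so the script tag cannot be closed."""
    return json.dumps(value).replace("</", "<\\/")


def render_chart_page(header, rows, title):
    """Return an HTML page that draws a scatter chart of the given rows."""
    table = [list(header)] + [list(row) for row in rows]
    return PAGE_TEMPLATE.substitute(rows=to_js(table), title=to_js(title))


if __name__ == "__main__":
    # Made-up data points for illustration only.
    page = render_chart_page(
        ("Daily rate (%)", "Lifetime (days)"),
        [(1.5, 120), (4.0, 21)],
        "Made-up example",
    )
    with open("chart.html", "w", encoding="utf-8") as handle:
        handle.write(page)
```

A background script could regenerate the page whenever new data arrive or a form submission changes the selection.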
Distribution of HYIPs Around the World
Link
Motion Charts
Link
Variable Changes Over Time
cherryshares.com, aggregator rating
Link
Relationships Between Two Variables
Link
Multi-Dimensional Scatterplot
Link
Multi-Dimensional Scatterplot
Link
General Programming Tips
● Spend time on data quality
● Organize your code, variable names, and files
● Keep records of working examples
● Plan out your code to maximize pattern capture
● Error-catching, browser consoles, and regexes are friends
● Test out chunks of code before putting them together
● Google Tables take a while to load for large datasets
● Google Charts Playground allows you to test code in their environment
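As one illustration of combining regexes with error-catching, the sketch below parses a plan description and sets aside anything it cannot read instead of crashing. The text format ("2.5% daily ...") is hypothetical and is not taken from any specific aggregator; the function names are likewise assumptions.

```python
import re

# Hypothetical format: a number, a percent sign, then a payout period.
PLAN_RE = re.compile(
    r"(?P<rate>\d+(?:\.\d+)?)\s*%\s*(?P<period>daily|weekly|monthly)",
    re.IGNORECASE,
)


def parse_plan(text):
    """Return (rate, period) or raise ValueError if the text does not match."""
    match = PLAN_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognized plan description: {text!r}")
    return float(match.group("rate")), match.group("period").lower()


def parse_all(texts):
    """Parse what we can and keep the failures for manual inspection."""
    parsed, rejected = [], []
    for text in texts:
        try:
            parsed.append(parse_plan(text))
        except ValueError:
            rejected.append(text)
    return parsed, rejected


if __name__ == "__main__":
    print(parse_all(["2.5% Daily for 30 days", "n/a", "110 % weekly"]))
```

Keeping the rejected strings makes it easier to spot new page layouts and extend the pattern, which ties back to spending time on data quality.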
Future Work
● Create an interactive web-based visualization for our dataset - some examples I made
● Link scams together
● Explore larger dataset
Thanks!