coursera amazon cloudsearch presentation

Coursera + AWS CloudSearch Frank Chen Software Engineer

Upload: michael-bohlig

Post on 19-Jun-2015


TRANSCRIPT

Page 1: Coursera amazon cloudsearch presentation

Coursera + AWS CloudSearch

Frank Chen Software Engineer

Page 2: Coursera amazon cloudsearch presentation

About

•  Ed-Tech startup providing MOOCs
    o  Massive Open Online Courses
•  New company -- launched 4/18/12
    o  Less than a year old.
•  215 free courses from 33 top universities
    o  Princeton, Stanford, Penn, Duke, etc...
    o  From Cryptography to Modern and Contemporary American Poetry
•  2.5+ million users
    o  We reached a million users faster than Facebook and Pinterest.
•  ~9 million course enrollments

Page 3: Coursera amazon cloudsearch presentation

Platform Scale

•  Moderate-sized (>10,000 concurrent users)
•  65 concurrent courses running now, each with tens of thousands of enrollments
•  >600 "pretty heavy" PHP/Python dynamic pages served per second sustained
    o  Might make backend calls to services (e.g. CloudSearch or SES --> want low latencies)
•  Various other services (70+ instances on EC2 running at the moment)
•  Spiky traffic
    o  People procrastinate on deadlines - spiky on the weekends

Page 4: Coursera amazon cloudsearch presentation

Stack

•  PHP / Python / Scala backed by MySQL
•  Runs on AWS completely
•  Utilizes lots of AWS services
    o  EC2 / ELB for servers
    o  MySQL RDS for databases
    o  S3 for video and static hosting
    o  CloudFront for video / asset hosting
    o  SES for emails (>1 million emails every day)
    o  SQS for long running tasks (video encoding, gradebook generation, etc...)
    o  SNS for notification services
    o  Route 53 for DNS
    o  CloudSearch for forum search

Page 5: Coursera amazon cloudsearch presentation

Why CloudSearch?

•  Big issue for us back in March / April. Solution then didn't work
    o  MySQL Full Text Search
        §  LIKE %x% AS NATURAL LANGUAGE?
        §  Really terrible results
        §  MyISAM (eww...)
•  Requirements:
    o  Fast searches (we call backend APIs - don't want to keep the users waiting too long)
    o  Good results (need to be relevant - don't waste the students' time)
    o  Low/no maintenance (we have enough instances to manage as is)
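The `LIKE %x%` complaint can be illustrated with a small sketch. It assumes a hypothetical `posts` table and uses SQLite from the Python standard library as a stand-in for MySQL, so it is not the original Coursera query. It shows that a substring match has no notion of words or relevance: a search for "art" also returns posts that only contain "start" or "Part".

```python
import sqlite3

# Hypothetical forum table; SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts (id, body) VALUES (?, ?)",
    [
        (1, "When does the course start?"),
        (2, "Is peer grading used for the art history essay?"),
        (3, "Part two of the lecture is missing"),
    ],
)


def like_search(term):
    """Substring search: no tokenization and no relevance ranking."""
    rows = conn.execute(
        "SELECT id FROM posts WHERE body LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


# Every post matches, although only post 2 is about art.
print(like_search("art"))
```

A pattern with a leading wildcard also generally cannot use an ordinary index, so this style of search tends to scan the whole table as the forum grows.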

Page 6: Coursera amazon cloudsearch presentation

Why CloudSearch?

•  Alternatives we looked at:
    o  Apache Solr, Sphinx, fiddling with MySQL
•  Then CloudSearch was announced...
•  Early general adopter - we started using CloudSearch ~10 days after announcement
    o  We didn't get any heads-up about CS before the public announcement
    o  Wrote the code to use CloudSearch and import over our existing forum posts / comments in 2 or 3 days.
        §  From decision to production!
        §  Easy to use and great documentation
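The import step can be sketched roughly as follows. This is not the code from the talk: the post dictionary shape, the field names (`course_id`, `author_id`, `body`), the `post_` id prefix, and the helper names are all hypothetical. It assumes the JSON document batch shape of the original 2011-02-01 CloudSearch API (`type`, `id`, `version`, `lang`, `fields`) and a 5 MB cap per batch; both should be checked against the documentation for the API version in use. The sketch only builds the batches and makes no network calls.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumed per-batch cap; verify against service limits


def to_sdf_add(post, version):
    """Convert one forum post (hypothetical dict shape) into an "add" operation."""
    return {
        "type": "add",
        "id": f"post_{post['id']}",
        "version": version,  # assumed: newer versions replace older ones
        "lang": "en",
        "fields": {
            "course_id": post["course_id"],
            "author_id": post["author_id"],
            "body": post["body"],
        },
    }


def build_batches(posts, version, max_bytes=MAX_BATCH_BYTES):
    """Yield JSON batches (as bytes), each at most max_bytes when single docs fit."""
    batch, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for post in posts:
        doc = json.dumps(to_sdf_add(post, version), separators=(",", ":")).encode("utf-8")
        extra = len(doc) + (1 if batch else 0)  # +1 for the separating comma
        if batch and size + extra > max_bytes:
            yield b"[" + b",".join(batch) + b"]"
            batch, size = [], 2
            extra = len(doc)
        batch.append(doc)
        size += extra
    if batch:
        yield b"[" + b",".join(batch) + b"]"
```

Each yielded batch would then be posted to the search domain's document endpoint; that upload step is left out because the endpoint and client depend on the deployment.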

Page 7: Coursera amazon cloudsearch presentation

CloudSearch Uses

User facing forum search

Page 8: Coursera amazon cloudsearch presentation

CloudSearch Uses

•  Analytics
    o  Most frequent searches and other statistics about their courses
        §  Informing instructors about this so they can clarify information
    o  Finding posts across forums
        §  Easy for CloudSearch, hard normally because of sharded scatter-gather problems
            •  Old way: Querying 600 databases on 4 RDS servers? Not fun
        §  Usage analysis
        §  Unexpected use: Instructors often want to find all their own posts so they can save / archive common answers
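To make the scatter-gather problem concrete, here is a minimal sketch of what "find all posts by one author" looks like when posts are sharded per course. The shard names, post fields, and function names are hypothetical, and small in-memory lists stand in for the real databases. The same filter has to be sent to every shard, and the partial results then have to be merged and sorted by the caller.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for per-course database shards.
SHARDS = {
    "crypto-001": [{"id": 1, "author": "prof_a", "body": "See lecture 3"}],
    "poetry-001": [
        {"id": 2, "author": "student_b", "body": "Thanks!"},
        {"id": 3, "author": "prof_a", "body": "Deadline extended"},
    ],
    "ml-002": [],
}


def query_shard(shard_name, author):
    """Scatter step: run the same filter against a single shard."""
    return [
        dict(post, shard=shard_name)
        for post in SHARDS[shard_name]
        if post["author"] == author
    ]


def find_posts_by_author(author, max_workers=8):
    """Gather step: fan out to every shard, then merge and sort the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda name: query_shard(name, author), SHARDS))
    merged = [post for partial in partials for post in partial]
    return sorted(merged, key=lambda post: post["id"])


print(find_posts_by_author("prof_a"))
```

With a single search index holding every post, the same question becomes one query filtered on an author field, which is why the slide calls it easy for CloudSearch.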

Page 9: Coursera amazon cloudsearch presentation

CloudSearch Scale

•  Moderate scale
•  ~1.5 million documents indexed
    o  All forum posts and comments
•  50,000+ searches a day
    o  Spiky! Depends on when homework is due.

Page 10: Coursera amazon cloudsearch presentation

Experience

GREAT!

Page 11: Coursera amazon cloudsearch presentation

We Want...

•  "Did you mean..."
    o  Lots of typos from non-native speakers
•  Multilingual Tokenization / Search
    o  We are starting to run courses in other languages...
•  Find Similar Documents
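As an illustration of the "Did you mean..." request, the sketch below suggests a correction on the client side using `difflib` from the Python standard library. It is not a CloudSearch feature and not something described in the talk; the vocabulary list and the `suggest` helper are hypothetical, and in practice the vocabulary might come from indexed terms or frequent queries.

```python
import difflib

# Hypothetical vocabulary of known search terms.
VOCABULARY = ["cryptography", "gradient", "poetry", "regression"]


def suggest(query, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest known term for a likely typo, or None if no suggestion applies."""
    if query in vocabulary:
        return None  # already a known term, nothing to suggest
    matches = difflib.get_close_matches(query, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("cryptografy"))
```

The `cutoff` value trades missed suggestions against wrong ones, and would need tuning against real queries.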

Page 12: Coursera amazon cloudsearch presentation

Thank You! Questions?

[email protected]