© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Getting Maximum Performance from Amazon Redshift: Complex Queries
Timon Karnezos, Aggregate Knowledge
November 13, 2013
Multi-touch Attribution
Meet the new boss
Behavioral Analytics
Same as the old boss
Market Basket Analysis
Same as the old boss
$
We know how to do this, in SQL*!
* SQL:2003
Here it is.

    SELECT record_date, user_id, action, site, revenue,
           SUM(1) OVER (PARTITION BY user_id
                        ORDER BY record_date ASC
                        ROWS UNBOUNDED PRECEDING) AS position
    FROM user_activities;

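You can check the timeline-position pattern end to end without a Redshift cluster. The sketch below runs the same window through Python's built-in `sqlite3` module, which has supported window functions since SQLite 3.25. The column names follow the slide. The sample rows and the helper `timeline_positions` are hypothetical. The frame clause `ROWS UNBOUNDED PRECEDING` is written out because Redshift expects an explicit frame when an aggregate window function uses `ORDER BY`.

```python
import sqlite3

# Hypothetical sample rows: (record_date, user_id, action, site, revenue)
SAMPLE_ROWS = [
    ("2013-11-01", 1, "impression", "a.com", 0.0),
    ("2013-11-02", 1, "impression", "b.com", 0.0),
    ("2013-11-03", 1, "conversion", "a.com", 25.0),
    ("2013-11-01", 2, "impression", "c.com", 0.0),
    ("2013-11-04", 2, "conversion", "c.com", 10.0),
]

# Running count of each user's events in time order.
POSITION_QUERY = """
    SELECT user_id, record_date,
           SUM(1) OVER (PARTITION BY user_id
                        ORDER BY record_date ASC
                        ROWS UNBOUNDED PRECEDING) AS position
    FROM user_activities
    ORDER BY user_id, record_date
"""


def timeline_positions(rows):
    """Load rows into an in-memory table and return (user_id, record_date, position)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE user_activities ("
            "record_date TEXT, user_id INTEGER, action TEXT, site TEXT, revenue REAL)"
        )
        con.executemany("INSERT INTO user_activities VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(POSITION_QUERY).fetchall()
    finally:
        con.close()


for row in timeline_positions(SAMPLE_ROWS):
    print(row)
```

Each user's count restarts at 1 because `PARTITION BY user_id` scopes the running sum to that user's rows. With this frame, `SUM(1)` behaves like `ROW_NUMBER()`.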
So why is MTA hard?
“Web Scale”
Queries: 30 queries, 1700 lines of SQL, 20+ logical phases, GBs of output
Data: ~10^9 daily impressions, ~10^7 daily conversions, ~10^4 daily sites, × 90 days per report
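The slide gives its data volumes as orders of magnitude: roughly 10^9 impressions and 10^7 conversions per day, over 90 days. Taking those figures at face value, a back-of-envelope estimate of the rows behind a single report is:

$$
10^{9}\,\tfrac{\text{impressions}}{\text{day}} \times 90\,\text{days} \approx 9 \times 10^{10}\ \text{impressions},
\qquad
10^{7}\,\tfrac{\text{conversions}}{\text{day}} \times 90\,\text{days} \approx 9 \times 10^{8}\ \text{conversions}.
$$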
So, how do we deliver complex reports over “web scale” data?
(Pssst. The answer’s Redshift. Thanks, AWS.)
Write (good) queries.
Organize the data.
Optimize for the humans.
Write (good) queries.
Remember: SQL is code.
Software engineering rigor applies to SQL.
Factored.
Concise.
Tested.
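One lightweight way to make a query "tested" is to run it against a tiny hand-built fixture whose answer is known, then assert on the result. The sketch below uses Python's `sqlite3` as a stand-in for Redshift. The query `REVENUE_BY_USER`, the helper `run_on_fixture`, and the fixture rows are all hypothetical; the columns reuse the slide's `user_activities` example. SQLite and Redshift dialects differ, so a harness like this suits queries that stick to features both engines support.

```python
import sqlite3

# Hypothetical query under test: total conversion revenue per user.
REVENUE_BY_USER = """
    SELECT user_id, SUM(revenue) AS total_revenue
    FROM user_activities
    WHERE action = 'conversion'
    GROUP BY user_id
    ORDER BY user_id
"""


def run_on_fixture(query, fixture_rows):
    """Load a small hand-written fixture into an in-memory DB and run the query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE user_activities ("
            "record_date TEXT, user_id INTEGER, action TEXT, site TEXT, revenue REAL)"
        )
        con.executemany("INSERT INTO user_activities VALUES (?, ?, ?, ?, ?)", fixture_rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


def test_revenue_by_user():
    fixture = [
        ("2013-11-01", 1, "impression", "a.com", 0.0),
        ("2013-11-02", 1, "conversion", "a.com", 25.0),
        ("2013-11-03", 1, "conversion", "b.com", 5.0),
        ("2013-11-01", 2, "impression", "c.com", 0.0),  # user 2 never converts
    ]
    # Expected result worked out by hand from the fixture above.
    assert run_on_fixture(REVENUE_BY_USER, fixture) == [(1, 30.0)]


test_revenue_by_user()
print("ok")
```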
Common Table Expression
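Common table expressions let each logical phase of a report get its own name, which keeps long queries factored. The minimal sketch below runs through `sqlite3`. The report itself (counting each converting user's impressions before their first conversion), the CTE names `first_conversion` and `touches`, and the sample rows are hypothetical illustrations, not the talk's queries.

```python
import sqlite3

# Each CTE names one logical phase of the report.
TOUCHES_BEFORE_CONVERSION = """
    WITH first_conversion AS (      -- phase 1: earliest conversion per user
        SELECT user_id, MIN(record_date) AS converted_on
        FROM user_activities
        WHERE action = 'conversion'
        GROUP BY user_id
    ),
    touches AS (                    -- phase 2: impressions before that conversion
        SELECT a.user_id, a.site
        FROM user_activities AS a
        JOIN first_conversion AS c ON c.user_id = a.user_id
        WHERE a.action = 'impression' AND a.record_date < c.converted_on
    )
    SELECT user_id, COUNT(*) AS touch_count   -- phase 3: summarize
    FROM touches
    GROUP BY user_id
    ORDER BY user_id
"""


def run(query, rows):
    """Run a query against an in-memory user_activities table holding rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE user_activities ("
            "record_date TEXT, user_id INTEGER, action TEXT, site TEXT, revenue REAL)"
        )
        con.executemany("INSERT INTO user_activities VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


SAMPLE_ROWS = [
    ("2013-11-01", 1, "impression", "a.com", 0.0),
    ("2013-11-02", 1, "impression", "b.com", 0.0),
    ("2013-11-03", 1, "conversion", "a.com", 25.0),
    ("2013-11-04", 1, "impression", "c.com", 0.0),  # after conversion: not counted
    ("2013-11-01", 2, "impression", "c.com", 0.0),
    ("2013-11-02", 2, "conversion", "c.com", 10.0),
    ("2013-11-01", 3, "impression", "a.com", 0.0),  # never converts: excluded
]

print(run(TOUCHES_BEFORE_CONVERSION, SAMPLE_ROWS))  # [(1, 2), (2, 1)]
```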
Factored. Concise. Tested.
Window functions

    -- Position in timeline
    SUM(1) OVER (PARTITION BY user_id ORDER BY record_date DESC
                 ROWS UNBOUNDED PRECEDING)

    -- Event count in timeline
    SUM(1) OVER (PARTITION BY user_id ORDER BY record_date DESC
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

    -- Transition matrix of sites
    LAG(site_name) OVER (PARTITION BY user_id ORDER BY record_date DESC)

    -- Unique sites in timeline, up to now
    COUNT(DISTINCT site_name) OVER (PARTITION BY user_id ORDER BY record_date DESC
                                    ROWS UNBOUNDED PRECEDING)

Window functions
Scalable, combinable.
Compact but expressive.
Simple to reason about.
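Window output also composes with ordinary SQL. The minimal sketch below takes the `LAG` edges produced in a CTE and groups them into a site transition matrix, with one row per `(from_site, to_site)` pair and its count. It runs through `sqlite3`, and the `events` table, data, and helper `transition_matrix` are hypothetical.

```python
import sqlite3

# Hypothetical fixture: (record_date, user_id, site_name)
EVENTS = [
    ("2013-11-01", 1, "a.com"),
    ("2013-11-02", 1, "b.com"),
    ("2013-11-03", 1, "a.com"),
    ("2013-11-04", 1, "c.com"),
    ("2013-11-01", 2, "b.com"),
    ("2013-11-02", 2, "a.com"),
    ("2013-11-05", 3, "a.com"),
    ("2013-11-06", 3, "b.com"),
]

TRANSITION_MATRIX = """
    WITH steps AS (
        -- one row per event, paired with the site visited just before it
        SELECT LAG(site_name) OVER (PARTITION BY user_id
                                    ORDER BY record_date) AS from_site,
               site_name AS to_site
        FROM events
    )
    SELECT from_site, to_site, COUNT(*) AS transitions
    FROM steps
    WHERE from_site IS NOT NULL      -- a user's first event has no predecessor
    GROUP BY from_site, to_site
    ORDER BY from_site, to_site
"""


def transition_matrix(events):
    """Return (from_site, to_site, transitions) rows for the given events."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE events (record_date TEXT, user_id INTEGER, site_name TEXT)")
        con.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
        return con.execute(TRANSITION_MATRIX).fetchall()
    finally:
        con.close()


print(transition_matrix(EVENTS))
# [('a.com', 'b.com', 2), ('a.com', 'c.com', 1), ('b.com', 'a.com', 2)]
```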
Organize the data.
Leverage Redshift’s MPP roots.
Fast columnar scans and I/O.
Fast sort and load.
Effective when work is distributable.
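The "distributable" point can be illustrated without a cluster. The sketch below is a toy model, not Redshift's actual row-placement algorithm. It assigns rows to a hypothetical number of slices using `zlib.crc32` of `user_id`. It then computes per-user timeline positions on each slice independently and checks that the union matches a single global computation. This works only because every row for a user lands on the same slice, which is the property that distributing on the same key as `PARTITION BY` is meant to provide. The round-robin function shows what goes wrong without that property.

```python
import zlib
from collections import defaultdict


def slice_for(user_id, n_slices):
    """Toy placement: hash the distribution key (user_id) onto one of n slices."""
    return zlib.crc32(str(user_id).encode("utf-8")) % n_slices


def positions(rows):
    """Per-user position in time order, like SUM(1) OVER (PARTITION BY user_id
    ORDER BY record_date ROWS UNBOUNDED PRECEDING). rows: (record_date, user_id)."""
    by_user = defaultdict(list)
    for record_date, user_id in rows:
        by_user[user_id].append(record_date)
    return {
        (user_id, record_date, pos)
        for user_id, dates in by_user.items()
        for pos, record_date in enumerate(sorted(dates), start=1)
    }


def distributed_positions(rows, n_slices):
    """Place rows by user_id, compute each slice on its own, then union the results."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[1], n_slices)].append(row)
    result = set()
    for slice_rows in slices.values():
        result |= positions(slice_rows)  # no rows from other slices are needed
    return result


def round_robin_positions(rows, n_slices):
    """Contrast: spreading rows without regard to user_id splits timelines apart."""
    slices = defaultdict(list)
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    result = set()
    for slice_rows in slices.values():
        result |= positions(slice_rows)
    return result


rows = [("2013-11-0%d" % day, user) for user in range(1, 6) for day in range(1, 5)]
print(distributed_positions(rows, 4) == positions(rows))  # True
print(round_robin_positions(rows, 4) == positions(rows))  # False
```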
Leverage Redshift’s MPP roots.
Sort into multiple representations.
Materialize shared views.
Hash-partition by user_id.
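"Materialize shared views" can be sketched as computing an expensive intermediate once and letting several reports read it. In Redshift, a `CREATE TABLE AS` can also declare `DISTKEY` and `SORTKEY`, for example distributing on `user_id` and sorting by `user_id, record_date`. The SQLite version below has no such clauses, so it shows only the materialize-once, reuse-many shape. The table `user_timeline`, both report functions, and the data are hypothetical.

```python
import sqlite3

SCHEMA = ("CREATE TABLE user_activities ("
          "record_date TEXT, user_id INTEGER, action TEXT, site TEXT, revenue REAL)")


def build_shared_timeline(con):
    """Materialize the per-user timeline once; later reports read this table."""
    con.execute("DROP TABLE IF EXISTS user_timeline")
    con.execute("""
        CREATE TABLE user_timeline AS
        SELECT user_id, record_date, action, site, revenue,
               SUM(1) OVER (PARTITION BY user_id ORDER BY record_date
                            ROWS UNBOUNDED PRECEDING) AS position
        FROM user_activities
    """)


def first_touch_sites(con):
    """Report 1 (hypothetical): the first site in each user's timeline."""
    return con.execute(
        "SELECT user_id, site FROM user_timeline WHERE position = 1 ORDER BY user_id"
    ).fetchall()


def conversions_by_position(con):
    """Report 2 (hypothetical): how deep into the timeline conversions occur."""
    return con.execute(
        "SELECT position, COUNT(*) FROM user_timeline "
        "WHERE action = 'conversion' GROUP BY position ORDER BY position"
    ).fetchall()


con = sqlite3.connect(":memory:")
con.execute(SCHEMA)
con.executemany("INSERT INTO user_activities VALUES (?, ?, ?, ?, ?)", [
    ("2013-11-01", 1, "impression", "a.com", 0.0),
    ("2013-11-02", 1, "impression", "b.com", 0.0),
    ("2013-11-03", 1, "conversion", "a.com", 25.0),
    ("2013-11-01", 2, "impression", "c.com", 0.0),
    ("2013-11-04", 2, "conversion", "c.com", 10.0),
    ("2013-11-02", 3, "conversion", "b.com", 5.0),
])
build_shared_timeline(con)
print(first_touch_sites(con))        # [(1, 'a.com'), (2, 'c.com'), (3, 'b.com')]
print(conversions_by_position(con))  # [(1, 1), (2, 1), (3, 1)]
```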
Optimize for the humans.
Operations should not be the bottleneck.
Develop without fear.
Trade time for money.
Scale with impunity.
Operations should not be the bottleneck.
Fast S3 = scratch space for cheap
Linear query scaling = faster go-to-market (GTM)
Dashboard Ops = dev/QA envs, marts, clusters with just a click
But, be frugal.
Quantify and control costs
Test across different hardware, clusters.
Shut down clusters often.
Buy productivity, not bragging rights.
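Quantifying cost can start as simple arithmetic over measured runs. The sketch below is hypothetical throughout. `ClusterRun` and `cheapest_within_deadline` are illustrative names, and the per-node hourly rate and runtimes are caller-supplied inputs; substitute real pricing and timings. The rule it encodes, choosing the cheapest configuration that meets a deadline, is one possible policy rather than the talk's. Under ideal linear scaling, doubling the node count halves the runtime, so node-hours and cost stay flat while results arrive sooner.

```python
from dataclasses import dataclass


@dataclass
class ClusterRun:
    """One measured report run; every field is supplied by the caller."""
    name: str
    nodes: int
    hourly_rate_per_node: float  # hypothetical price input, not real pricing
    runtime_hours: float         # measured wall-clock time for one report

    @property
    def cost(self) -> float:
        # Cost of one report = node count x hourly rate x hours running.
        return self.nodes * self.hourly_rate_per_node * self.runtime_hours


def cheapest_within_deadline(runs, deadline_hours):
    """Pick the lowest-cost configuration whose runtime meets the deadline, else None."""
    eligible = [r for r in runs if r.runtime_hours <= deadline_hours]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.cost)


runs = [
    ClusterRun("4 nodes", 4, 1.0, 2.0),    # hypothetical: 8 node-hours
    ClusterRun("8 nodes", 8, 1.0, 1.0),    # ideal linear scaling: same cost, half the time
    ClusterRun("16 nodes", 16, 1.0, 0.6),  # scaling falls off: faster but pricier
]
best = cheapest_within_deadline(runs, deadline_hours=1.5)
print(best.name)  # 8 nodes
```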
Thank you!
http://bit.ly/rs_ak
References
http://www.adweek.com/news/technology/study-facebook-leads-24-sales-boost-146716
http://en.wikipedia.org/wiki/Behavioral_analytics
http://en.wikipedia.org/wiki/Market_basket_analysis
Please give us your feedback on this presentation
As a thank you, we will select prize winners daily for completed surveys!
DAT305