Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013
DESCRIPTION
AWS offers services that revolutionize the scale and cost for customers to extract information from large data sets, commonly called Big Data. This session analyzes Amazon CloudFront logs combined with additional structured data as a scenario for correlating log and transactional data. Successfully implementing this type of solution requires architects and developers to assemble a set of services with multiple decision points. The session provides a design and example of architecting and implementing the scenario using Amazon S3, AWS Data Pipeline, Amazon Elastic MapReduce, and Amazon Redshift. It explores loading, query performance, security, incremental updates, and design trade-off decisions.
TRANSCRIPT
![Page 1: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/1.jpg)
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
ARC 306: Lumberjacking on AWS
Cutting Through Logs to Find What Matters
Guy Ernest, Solutions Architecture
November 15, 2013
![Page 2: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/2.jpg)
![Page 3: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/3.jpg)
![Page 4: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/4.jpg)
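Progress Is Not Evenly Distributed

| | 1980 | Today | Change |
| --- | --- | --- | --- |
| Price | $14,000,000/TB | $30/TB | 450,000 ÷ |
| Capacity | 100 MB | 3 TB | 30,000 X |
| Transfer rate | 4 MB/s | 200 MB/s | 50 X |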
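One way to read the 1980 and today figures on this slide is as the time needed to read a whole disk sequentially. The short Python sketch below works through that arithmetic. Decimal units (1 TB = 1,000,000 MB) are an assumption, and the "full scan" framing is an interpretation of the slide, not something it states.

```python
# Figures taken from the slide; decimal units are an assumption.
MB_PER_TB = 1_000_000

capacity_1980_mb = 100
capacity_today_mb = 3 * MB_PER_TB
rate_1980_mb_s = 4
rate_today_mb_s = 200

# Growth factors quoted on the slide.
capacity_growth = capacity_today_mb / capacity_1980_mb   # 30,000 X
rate_growth = rate_today_mb_s / rate_1980_mb_s           # 50 X

# Time to read one full disk sequentially, in seconds.
scan_1980_s = capacity_1980_mb / rate_1980_mb_s
scan_today_s = capacity_today_mb / rate_today_mb_s

print(f"capacity grew {capacity_growth:,.0f}x, transfer rate grew {rate_growth:,.0f}x")
print(f"full scan: {scan_1980_s:.0f} s in 1980, {scan_today_s / 3600:.1f} h today")
```

Under these assumptions a full scan goes from 25 seconds to roughly 4.2 hours, which is the gap that spreading data over many disks is meant to close.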
![Page 5: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/5.jpg)
Solution: More Spindles
(Photo by Kheel Center, Cornell University)
![Page 6: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/6.jpg)
Case Study – Foursquare
![Page 7: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/7.jpg)
The Challenge
“…Foursquare streams hundreds
of millions of application logs
each day. The company relies on
analytics to report on its daily
usage, evaluate new offerings,
and perform long-term trend
analysis—and with millions of
new check-ins each day, the
workload is only growing…”
![Page 8: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/8.jpg)
“Real” Project Requirements Example
Cost Analysis
Data transfer
• By date/time
• By edge location
• By date/time within an edge location
• By top X URLs
• By HTTP vs. HTTPS
Marketing
Top URLs
• As-is count
• By content type
• By edge location
• By edge location and content type
Requests served
• By edge location
Revenue
• By edge location
Top games
• By age
• By income
• By gender
Operations
Error rates
• By top X URLs
• By edge location
• By edge location and content type
Revenue
Top games
• By revenue
• By edge location and revenue
Top ads
• That lead to a game purchase
![Page 9: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/9.jpg)
Viable Business
[Chart: axes # Users and $ Money; curves Operation Costs and Revenues]
![Page 10: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/10.jpg)
Available Data Sources

| Metric | Sources |
| --- | --- |
| Data transfer by date/time | CloudFront logs |
| Data transfer by edge location | CloudFront logs |
| Data transfer by date/time within an edge location | CloudFront logs |
| Data transfer by top X URLs | CloudFront logs, web server logs |
| Data transfer by HTTP vs. HTTPS | CloudFront logs |
| Top URLs | CloudFront logs, web server logs |
| Top URLs by content type | CloudFront logs |
| Top URLs by edge location | CloudFront logs |
| Top URLs by edge location and content type | CloudFront logs |
| Error rates by top X URLs | CloudFront logs, web server logs |
| Error rate by edge location | CloudFront logs |
| Error rate by edge location and content type | CloudFront logs |
| Requests served by edge location | CloudFront logs |
| Revenue by edge location | CloudFront logs, OrdersDB, app server logs |
| Top games segmented by age | CloudFront logs, user profile |
| Top games segmented by income | CloudFront logs, user profile |
| Top games segmented by gender | CloudFront logs, user profile |
| Top games by revenue | CloudFront logs, OrdersDB |
| Top games by edge location and revenue | CloudFront logs, OrdersDB |
| Top game revenue segmented by age | CloudFront logs, OrdersDB, user profile |
![Page 11: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/11.jpg)
CloudFront Access Log Format
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query
2012-05-25 22:01:30 AMS1 4448 94.212.249.78 GET d1234567890213.cloudfront.net /YT0KthT/F5SOWdDPqNqQF07tiTOXqJMpfD\
dlb3LMwv3/jP3/CINm/yDSy0MsRcWJN/Simutrans.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20M\
SIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625181
2012-05-25 22:01:30 AMS1 4952 94.212.249.78 GET d1234567890213.cloudfront.net /66IG584/CPCxY0P44BGb5ZOd3qSUrauL05\
0LOvFwaMj/eH/caw/Blob Wars-Blob And Conquer.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20M\
SIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625184
2012-05-25 22:01:30 AMS1 4556 78.8.5.135 GET d1234567890213.cloudfront.net /SwlufjC/xEjH3BRbXMXwmFWqzKt7od6tlW\
R3e13LhmH/V3eF/lo6g/AstroMenace.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;%20pl)%2\
0Presto/2.10.229%20Version/11.60 uid=100&oid=108625189
2012-05-25 22:01:30 AMS1 47172 78.8.5.135 GET d1234567890213.cloudfront.net /Di1cXoN/TskldkSHcgkvZXQEmv5vOVR25X\
5UTisFkRq/pQa/wCjUXZb/Z1HRuGlo/Kroz.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;\
%20pl)%20Presto/2.10.229%20Version/11.60 uid=100&oid=108625206
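As a rough illustration of how this format can be read programmatically, the Python sketch below takes the column names from the `#Fields:` header and splits each record on tabs (the log files are tab-delimited, even though the slide renders the separators as spaces). The function name `parse_cloudfront_log` and the shortened path and referer in the sample record are hypothetical.

```python
import io
from urllib.parse import unquote


def parse_cloudfront_log(text):
    """Yield one dict per request, keyed by the names in the '#Fields:' header."""
    fields = []
    for line in io.StringIO(text):
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Column names are space-separated after the '#Fields:' prefix.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            # Data records are tab-separated.
            yield dict(zip(fields, line.split("\t")))


# Sample in the shape shown on the slide; path and referer are made up.
sample = (
    "#Version: 1.0\n"
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) "
    "cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query\n"
    "2012-05-25\t22:01:30\tAMS1\t4448\t94.212.249.78\tGET\t"
    "d1234567890213.cloudfront.net\t/games/Simutrans.exe\t200\t"
    "http://example.com/page\tMozilla/5.0%20(compatible;%20MSIE%209.0)\t"
    "uid=100&oid=108625181\n"
)

records = list(parse_cloudfront_log(sample))
first = records[0]
print(first["x-edge-location"], first["sc-status"], int(first["sc-bytes"]))
# The user agent is percent-encoded in the log; decode it for display.
print(unquote(first["cs(User-Agent)"]))
```

Keying each record by the header names keeps the parser independent of column order, which helps if the set of logged fields changes.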
![Page 12: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/12.jpg)
Sample Your Data with R
> library(ggplot2)
> sample_data <- read.delim("SampleFiles/E123ABCDEF.2012-05-25-22.NEfbhLN3", header=F)
> # Drop the two header rows (#Version and #Fields)
> sample_data <- sample_data[-1:-2,]
> View(sample_data)
> # V9 is the ninth field, sc-status; geom_bar() counts rows per discrete status code
> m <- ggplot(sample_data, aes(x = factor(V9)))
> m + geom_bar() + scale_y_log10() + xlab('Error Codes') + ylab('log(Frequency)')
![Page 13: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/13.jpg)
Need a Lot of Memory?
![Page 14: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/14.jpg)
OpenRefine Running on an EC2 Instance
![Page 15: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/15.jpg)
[Diagram: Web, CRM, DB, Logs (OLTP) → ETL → Data Warehouse (OLAP) → Analyst]
![Page 16: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/16.jpg)
![Page 17: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/17.jpg)
![Page 18: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/18.jpg)
![Page 19: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/19.jpg)
Log Shipping
(Swedish public domain photo taken in 1918)
![Page 20: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/20.jpg)
“Poor Man’s Log Shipping”
![Page 21: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/21.jpg)
![Page 22: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/22.jpg)
Embedding Poor-man Invisible Pixel
http://www.poor-man-analytics.com/__track.gif?idt=5.1.5&idc=5&utmn=1532897343&utmhn=www.douban.com&utmcs=UTF-8&utmsr=1440x900&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.3%20r181&utmdt=%E8%B1%86%E7%93%A3&utmhid=571356425&utmr=-&utmp=%2F&utmac=UA-7019765-1&utmcc=__utma%3D30149280.1785629903.1314674330.1315290610.1315452707.10%3B%2B__utmz%3D30149280.1315452707.10.7.utmcsr%3Dbiaodianfu.com%7Cutmccn%3D(referral)%7Cutmcmd%3Dreferral%7Cutmcct%3D%2Fpoor-man-analytics-architecture.html%3B%2B__utmv%3D30149280.162%3B&utmu=qBM~
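The pixel request carries its payload in the query string, so every hit lands in the access log as a URL that can be decoded later. A minimal Python sketch of that decoding step follows. It uses a shortened version of the URL above with only a few of its parameters, and it makes no claim about what each parameter means.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened version of the slide's URL; only a few parameters are kept.
pixel_url = (
    "http://www.poor-man-analytics.com/__track.gif"
    "?idt=5.1.5&idc=5&utmhn=www.douban.com&utmsr=1440x900&utmul=en-us&utmp=%2F"
)

parts = urlsplit(pixel_url)
# parse_qs returns a list per key and percent-decodes values; keep the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)
print(params["utmhn"], params["utmsr"], params["utmp"])
```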
![Page 23: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/23.jpg)
![Page 24: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/24.jpg)
![Page 25: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/25.jpg)
Open Source Frameworks
• Fluentd
• Flume
• Scribe
• Chukwa
• …

Fluentd ASCII diagrams:

    Input            Output
+--------------------------------------------+
|                                            |
|  Web Apps  ---+                 +--> File  |
|               |                 |          |
|               +-->           ---+          |
|  /var/log  ------>  Fluentd  ------> Mail  |
|               +-->           ---+          |
|               |                 |          |
|  Apache    ---+                 +--> S3    |
|                                            |
+--------------------------------------------+

Web Server
+---------+
| Fluentd -------+
+---------+      |
                 |
Proxy Server     |
+---------+      +--> +---------+
| Fluentd ----------> | Fluentd |
+---------+      +--> +---------+
                 |
Database Server  |
+---------+      |
| Fluentd -------+
+---------+
![Page 26: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/26.jpg)
Use Amazon Kinesis to Ship Your Logs
New
![Page 27: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/27.jpg)
![Page 28: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/28.jpg)
Aggregation with S3DistCp
• Aggregated
• Even-size
• Compressed
![Page 29: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/29.jpg)
S3DistCp on EMR Job Sample
./elastic-mapreduce --jobflow j-3GY8JC4179IOK --jar \
/home/hadoop/lib/emr-s3distcp-1.0.jar \
--args \
'--src,s3://myawsbucket/cf,\
--dest,s3://myoutputbucket/aggregate,\
--groupBy,.*XABCD12345678.([0-9]+-[0-9]+-[0-9]+-[0-9]+).*,\
--targetSize,128,\
--outputCodec,lzo,\
--deleteOnSuccess'
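The `--groupBy` expression decides which input files are concatenated: files whose capture group yields the same value end up in the same output group. The Python sketch below imitates only that grouping decision, using the regular expression from the command above. The object keys are hypothetical, and whole-key matching is an assumption about how the pattern is applied.

```python
import re
from collections import defaultdict

# Same expression as the --groupBy argument; the capture group is date plus hour.
GROUP_BY = re.compile(r".*XABCD12345678.([0-9]+-[0-9]+-[0-9]+-[0-9]+).*")

# Hypothetical object keys under the source prefix.
keys = [
    "cf/XABCD12345678.2012-05-25-22.NEfbhLN3.gz",
    "cf/XABCD12345678.2012-05-25-22.a1B2c3D4.gz",
    "cf/XABCD12345678.2012-05-25-23.Zz9Yy8Xx.gz",
    "cf/README.txt",
]

groups = defaultdict(list)
for key in keys:
    match = GROUP_BY.fullmatch(key)  # assumption: the pattern must match the whole key
    if match:
        groups[match.group(1)].append(key)
    # Keys that do not match the pattern are left out of every group.

for hour, members in sorted(groups.items()):
    print(hour, len(members))
```

Here the two files for hour 22 fall into one group and the file for hour 23 into another, so many small hourly files can be merged toward the requested target size.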
![Page 30: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/30.jpg)
![Page 31: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/31.jpg)
![Page 32: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/32.jpg)
Pig for Access Logs Analysis
RAW_LOG = LOAD 's3://myoutputbucket/aggregate/' AS (ts:chararray, url:chararray…);
LOGS_BASE_F = FILTER RAW_LOG BY url MATCHES '^GET /__track.*$';
LOGS_BASE_F_W_PARAM = FOREACH LOGS_BASE_F GENERATE
url,
DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z') as dt,
SUBSTRING(DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z') ,0, 10 ) as day,
…
status,
REGEX_EXTRACT(url, '^GET /([^\\?]+)', 1) AS action: chararray,
REGEX_EXTRACT(url, 'idt=([^&]+)', 1) AS idt: chararray,
REGEX_EXTRACT(url, 'idc=([^&]+)', 1) AS idc: chararray;
I1 = FILTER LOGS_BASE_F_W_PARAM by action == 'clic' or action == 'display';
LOGS_SHORT = FOREACH I1 GENERATE uuid, action, dt, day, ida, idas, act, idp, idcmp ,idc;
G1 = GROUP LOGS_SHORT BY (uuid,idc);
store G1 into 's3://mybucket/sessions/';
Stages and their shell equivalents:
• Load and Filter (cat / grep)
• Parse (awk)
• Store (>)
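To make the load, filter, parse, and group stages concrete, here is a small Python sketch that applies the same regular expressions as the Pig script to a few in-memory request strings. It is an illustration only: the request strings and the helper `first_group` are hypothetical, the `action` filter from the script is left out, and rows are grouped by `idc` alone because `uuid` is not derivable from this sample.

```python
import re
from collections import defaultdict

# Same patterns as the FILTER and REGEX_EXTRACT calls in the Pig script.
TRACK = re.compile(r"^GET /__track.*$")
ACTION = re.compile(r"^GET /([^\?]+)")
IDT = re.compile(r"idt=([^&]+)")
IDC = re.compile(r"idc=([^&]+)")


def first_group(pattern, text):
    """Return the first capture group, or None when the pattern is not found."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical request strings standing in for the loaded url column.
requests = [
    "GET /__track.gif?idt=5.1.5&idc=5",
    "GET /__track.gif?idt=5.1.6&idc=5",
    "GET /index.html",
    "GET /__track.gif?idt=5.1.5&idc=7",
]

by_idc = defaultdict(list)
for url in requests:
    if not TRACK.match(url):  # filter step (grep)
        continue
    row = {                    # parse step (awk)
        "action": first_group(ACTION, url),
        "idt": first_group(IDT, url),
        "idc": first_group(IDC, url),
    }
    by_idc[row["idc"]].append(row)  # group step

for idc, rows in sorted(by_idc.items()):
    print(idc, [row["idt"] for row in rows])
```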
![Page 33: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/33.jpg)
Pig vs. Hive
• Pig is geared toward sequentially transforming data
– ETL
– Shell-style scripting at scale (from local mode to any scale)
• Hive is for querying data
– Data analysis / HQL
– Some transformation, typically as a means to an end (e.g., temporary tables)
![Page 34: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/34.jpg)
Monitoring Pig
https://github.com/netflix/lipstick
![Page 35: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/35.jpg)
Another Monitoring Tool
https://github.com/twitter/ambrose
![Page 36: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/36.jpg)
Optimize Your EMR Cluster
![Page 37: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/37.jpg)
Monitor Your EMR Cluster
![Page 38: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/38.jpg)
Bootstrap Actions
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia
![Page 39: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/39.jpg)
Management Console
![Page 40: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/40.jpg)
![Page 41: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/41.jpg)
Customer Tools
Gathering information about EMR jobs from multiple sources and presenting it in a textual and graphical view
github.com/Hi-Media/EmrMonitoring
![Page 42: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/42.jpg)
Completed Job View
![Page 43: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/43.jpg)
![Page 44: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/44.jpg)
![Page 45: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/45.jpg)
Spot Bidding Strategies
• Most savings
• Not paying more
• Fewer interruptions
![Page 46: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/46.jpg)
Jeff Bezos (early Amazon days)
![Page 47: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/47.jpg)
Data Sources
Queries
Value
![Page 48: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/48.jpg)
More Trends to Consider

| Transactional Processing | Analytical Processing |
| --- | --- |
| Transactional context | Global context |
| Latency | Throughput |
| Indexed access | Full table scans |
| Random IO | Sequential IO |
| Disk seek times | Disk transfer rate |
![Page 49: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/49.jpg)
![Page 50: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/50.jpg)
![Page 51: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/51.jpg)
COPY into Amazon Redshift
create table cf_logs
( d date, t char(8), edge char(4), bytes int, cip varchar(15),
verb char(3), distro varchar(MAX), object varchar(MAX), status int,
Referer varchar(MAX), agent varchar(MAX), qs varchar(MAX) );
copy cf_logs from 's3://big-data/logs/E123ABCDEF/'
credentials 'aws_access_key_id=<key_id>;aws_secret_access_key=<secret_key>'
IGNOREHEADER 2
GZIP
DELIMITER '\t'
DATEFORMAT 'YYYY-MM-DD';
![Page 52: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/52.jpg)
COPY into Amazon Redshift with AWS Data Pipeline
![Page 53: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/53.jpg)
Time for Data Visualization
Charles Minard's flow map of Napoleon's March (1869)
![Page 54: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/54.jpg)
![Page 55: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/55.jpg)
Choose Your Favorite Visualization Tool
Tableau (Windows instance)
R
Jaspersoft
QlikView
MicroStrategy
SiSense
…
![Page 56: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/56.jpg)
![Page 57: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/57.jpg)
![Page 58: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/58.jpg)
![Page 59: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/59.jpg)
Snapshot before Delete
![Page 60: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/60.jpg)
Unload Data from Amazon Redshift
unload ('select * from cf_logs where d between \'2013-11-03\' and \'2013-11-10\'')
to 's3://mybucket/unload_cf_logs_week_46'
credentials 'aws_access_key_id=<key_id>;aws_secret_access_key=<secret_key>'
delimiter as '\t'
GZIP;
![Page 61: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/61.jpg)
Reference Architecture
![Page 62: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/62.jpg)
Partner Services
Loggly
Splunk
Stratalux (Logstash)
…
Loggly AWS Marketplace Page
![Page 63: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/63.jpg)
What Else Can You Do with Log Analysis?
![Page 64: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/64.jpg)
![Page 65: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/65.jpg)
Finally, a Small Warning
Abraham Wald (1902-1950)
![Page 66: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/66.jpg)
A B C
![Page 67: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/67.jpg)
![Page 68: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/68.jpg)
Would You Like to Know More?
Further reading
• http://aws.amazon.com/architecture
• http://aws.amazon.com/articles
• http://aws.typepad.com
re:Invent sessions
• DAT205 - Amazon Redshift in Action: Enterprise, Big Data, and SaaS
• DAT305 - Getting Maximum Performance from Amazon Redshift
• BDT301 - Scaling your Analytics with Amazon Elastic MapReduce
![Page 69: Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AWS re:Invent 2013](https://reader033.vdocument.in/reader033/viewer/2022052820/54c6b2164a7959526c8b4614/html5/thumbnails/69.jpg)