© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. ARC 306: Lumberjacking on AWS Cutting Through Logs to Find What Matters Guy Ernest, Solutions Architecture November 15, 2013

Page 1: ARC 306: Lumberjacking on AWS Cutting Through Logs to Find …awsmedia.s3.amazonaws.com/ARC306.pdf · 2013-11-21 · The Challenge “…Foursquare streams hundreds of millions of

ARC 306: Lumberjacking on AWS

Cutting Through Logs to Find What Matters

Guy Ernest, Solutions Architecture

November 15, 2013

Progress Is Not Evenly Distributed

1980              Today        Change
$14,000,000/TB    $30/TB       450,000 ÷
100 MB            3 TB         30,000 X
4 MB/s            200 MB/s     50 X
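
The three ratios on this slide can be checked with simple arithmetic, and they hint at why a single disk is no longer enough: capacity grew far faster than transfer rate, so reading a whole disk end to end now takes much longer. A minimal sketch in Python, assuming decimal units (1 TB = 1,000,000 MB); the variable names are illustrative only:

```python
# Figures taken from the slide; decimal units are an assumption.
MB_PER_TB = 1_000_000

cost_1980, cost_today = 14_000_000, 30            # $/TB
capacity_1980_mb, capacity_today_mb = 100, 3 * MB_PER_TB
rate_1980, rate_today = 4, 200                    # MB/s

cost_ratio = cost_1980 / cost_today               # the slide rounds this to 450,000
capacity_ratio = capacity_today_mb / capacity_1980_mb
rate_ratio = rate_today / rate_1980

# Time to read an entire disk sequentially, in seconds.
full_scan_1980 = capacity_1980_mb / rate_1980
full_scan_today = capacity_today_mb / rate_today

print(f"cost fell ~{cost_ratio:,.0f}x, capacity grew {capacity_ratio:,.0f}x, "
      f"transfer rate grew {rate_ratio:,.0f}x")
print(f"full scan: {full_scan_1980:.0f} s in 1980 vs "
      f"{full_scan_today / 3600:.1f} h today")
```

Under these assumptions a full sequential read goes from seconds to hours, which is the gap that spreading data over many disks is meant to close.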

Solution: More Spindles

(Photo by Kheel Center, Cornell University)

Case Study – Foursquare

The Challenge

“…Foursquare streams hundreds of millions of application logs each day. The company relies on analytics to report on its daily usage, evaluate new offerings, and perform long-term trend analysis—and with millions of new check-ins each day, the workload is only growing…”

“Real” Project Requirements Example

Cost Analysis
Data transfer
  • By date/time
  • By edge location
  • By date/time within an edge location
  • By top X URLs
  • By HTTP vs. HTTPS

Marketing
Top URLs
  • As-is count
  • By content type
  • By edge location
  • By edge location and content type
Requests served
  • By edge location
Revenue
  • By edge location
Top games
  • By age
  • By income
  • By gender

Operations
Error rates
  • By top X URLs
  • By edge location
  • By edge location and content type

Revenue
Top games
  • By revenue
  • By edge location and revenue
Top ads
  • That lead to a game purchase

Viable Business

[Chart. Axes: # Users, $ Money. Series: Operation Costs, Revenues.]

Available Data Sources

Metric                                              Sources
Data transfer by date/time                          CloudFront logs
Data transfer by edge location                      CloudFront logs
Data transfer by date/time within an edge location  CloudFront logs
Data transfer by top X URLs                         CloudFront logs, web server logs
Data transfer by HTTP vs. HTTPS                     CloudFront logs
Top URLs                                            CloudFront logs, web server logs
Top URLs by content type                            CloudFront logs
Top URLs by edge location                           CloudFront logs
Top URLs by edge location and content type          CloudFront logs
Error rates by top X URLs                           CloudFront logs, web server logs
Error rate by edge location                         CloudFront logs
Error rate by edge location and content type        CloudFront logs
Requests served by edge location                    CloudFront logs
Revenue by edge location                            CloudFront logs, OrdersDB, app server logs
Top games segmented by age                          CloudFront logs, user profile
Top games segmented by income                       CloudFront logs, user profile
Top games segmented by gender                       CloudFront logs, user profile
Top games by revenue                                CloudFront logs, OrdersDB
Top games by edge location and revenue              CloudFront logs, OrdersDB
Top game revenue segmented by age                   CloudFront logs, OrdersDB, user profile

CloudFront Access Log Format

#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query
2012-05-25 22:01:30 AMS1 4448 94.212.249.78 GET d1234567890213.cloudfront.net /YT0KthT/F5SOWdDPqNqQF07tiTOXqJMpfD\
dlb3LMwv3/jP3/CINm/yDSy0MsRcWJN/Simutrans.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20M\
SIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625181
2012-05-25 22:01:30 AMS1 4952 94.212.249.78 GET d1234567890213.cloudfront.net /66IG584/CPCxY0P44BGb5ZOd3qSUrauL05\
0LOvFwaMj/eH/caw/Blob Wars-Blob And Conquer.exe 200 http://AtRJw2kxg0EMW.com/kZetr/YCb6AM9N2xt2 Mozilla/5.0%20(compatible;%20M\
SIE%209.0;%20Windows%20NT%206.1;%20WOW64;%20Trident/5.0) uid=100&oid=108625184
2012-05-25 22:01:30 AMS1 4556 78.8.5.135 GET d1234567890213.cloudfront.net /SwlufjC/xEjH3BRbXMXwmFWqzKt7od6tlW\
R3e13LhmH/V3eF/lo6g/AstroMenace.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;%20pl)%2\
0Presto/2.10.229%20Version/11.60 uid=100&oid=108625189
2012-05-25 22:01:30 AMS1 47172 78.8.5.135 GET d1234567890213.cloudfront.net /Di1cXoN/TskldkSHcgkvZXQEmv5vOVR25X\
5UTisFkRq/pQa/wCjUXZb/Z1HRuGlo/Kroz.exe 200 http://AtRJw2kxg0EMW.com/AC1vg/1727EWfb7fPt Opera/9.80%20(Windows%20NT%205.1;%20U;\
%20pl)%20Presto/2.10.229%20Version/11.60 uid=100&oid=108625206
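
To make the format concrete, here is a minimal Python sketch that turns such a file into one dictionary per request, keyed by the names in the `#Fields` header. It assumes the values are tab-separated (consistent with the `DELIMITER '\t'` used for the Redshift load later in this deck); the function name `parse_access_log` and the shortened sample values are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def parse_access_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per log record, keyed by the names in the #Fields header."""
    fields: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            # Assumption: values are tab-separated, so embedded spaces survive.
            yield dict(zip(fields, line.split("\t")))


# Hypothetical sample: header lines as on the slide, one shortened record.
sample = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) "
    "cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query",
    "\t".join([
        "2012-05-25", "22:01:30", "AMS1", "4448", "94.212.249.78", "GET",
        "d1234567890213.cloudfront.net", "/example/Simutrans.exe", "200",
        "http://example.com/", "Mozilla/5.0", "uid=100&oid=108625181",
    ]),
]

records = list(parse_access_log(sample))
print(records[0]["x-edge-location"], records[0]["sc-status"], records[0]["sc-bytes"])
```

Reading the field names from the header rather than hard-coding them keeps the sketch usable if the field list differs between files.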

Sample Your Data with R

> library(ggplot2)
> sample_data <- read.delim("SampleFiles/E123ABCDEF.2012-05-25-22.NEfbhLN3", header = FALSE)
> sample_data <- sample_data[-1:-2, ]  # drop the #Version and #Fields rows
> View(sample_data)
> m <- ggplot(sample_data, aes(x = factor(V9)))  # V9 is sc-status
> m + geom_bar() + scale_y_log10() + xlab('Error Codes') + ylab('log(Frequency)')  # geom_bar counts a discrete x

Need a Lot of Memory?

OpenRefine Running on an EC2 Instance

[Diagram labels: Web, CRM, DB, Logs (OLTP sources); ETL; Data Warehouse (OLAP); Analyst]

Log Shipping

(Swedish public domain photo taken in 1918)

“Poor Man’s Log Shipping”

Embedding a Poor Man’s Invisible Pixel

http://www.poor-man-analytics.com/__track.gif?idt=5.1.5&idc=5&utmn=1532897343&utmhn=www.douban.com&utmcs=UTF-8&utmsr=1440x900&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.3%20r181&utmdt=%E8%B1%86%E7%93%A3&utmhid=571356425&utmr=-&utmp=%2F&utmac=UA-7019765-1&utmcc=__utma%3D30149280.1785629903.1314674330.1315290610.1315452707.10%3B%2B__utmz%3D30149280.1315452707.10.7.utmcsr%3Dbiaodianfu.com%7Cutmccn%3D(referral)%7Cutmcmd%3Dreferral%7Cutmcct%3D%2Fpoor-man-analytics-architecture.html%3B%2B__utmv%3D30149280.162%3B&utmu=qBM~
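
The value of the pixel request is in its query string: every parameter becomes a field in the access log. As a rough illustration, the Python standard library can split such a URL into its parts. The URL below is an abbreviated copy of the one above, with only some parameters kept; what each parameter means is not stated in the deck, so the sketch only decodes them.

```python
from urllib.parse import urlsplit, parse_qs

# Abbreviated version of the tracking URL above (only some parameters kept).
url = ("http://www.poor-man-analytics.com/__track.gif"
       "?idt=5.1.5&idc=5&utmhn=www.douban.com&utmcs=UTF-8"
       "&utmsr=1440x900&utmdt=%E8%B1%86%E7%93%A3&utmr=-&utmp=%2F")

parts = urlsplit(url)
# parse_qs percent-decodes values and returns a list per key; keep the first.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)  # the pixel being requested
print(params["utmhn"], params["utmsr"], params["utmp"])
```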

Open Source Frameworks

• Fluentd
• Flume
• Scribe
• Chukwa

Fluentd ASCII diagrams:

Input                                   Output
+--------------------------------------------+
|                                            |
|  Web Apps  ---+                 +--> File  |
|               |                 |          |
|               +-->           ---+          |
|  /var/log  ------>  Fluentd  ------> Mail  |
|               +-->           ---+          |
|               |                 |          |
|  Apache    ---+                 +--> S3    |
|                                            |
+--------------------------------------------+

Web Server
+---------+
| Fluentd -------+
+---------+      |
                 |
Proxy Server     |
+---------+      +--> +---------+
| Fluentd ----------> | Fluentd |
+---------+      +--> +---------+
                 |
Database Server  |
+---------+      |
| Fluentd -------+
+---------+

Use Amazon Kinesis to Ship Your Logs

New

Aggregation with S3DistCp

• Aggregated
• Even-size
• Compressed

S3DistCp on EMR Job Sample

./elastic-mapreduce --jobflow j-3GY8JC4179IOK --jar \
/home/hadoop/lib/emr-s3distcp-1.0.jar \
--args \
'--src,s3://myawsbucket/cf,\
--dest,s3://myoutputbucket/aggregate,\
--groupBy,.*XABCD12345678.([0-9]+-[0-9]+-[0-9]+-[0-9]+).*,\
--targetSize,128,\
--outputCodec,lzo,\
--deleteOnSuccess'
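
The `--groupBy` argument is the part of this job that decides which small files end up in the same output file: files whose names yield the same value for the parenthesized group are combined. The Python sketch below applies the same pattern to a few object keys to show the grouping. The keys and their random suffixes are made up, and S3DistCp itself evaluates the pattern with Java regular expressions, so treat this as an illustration of the idea rather than of the tool's exact behavior.

```python
import re
from collections import defaultdict

# Same pattern as the --groupBy argument in the job above.
GROUP_BY = re.compile(r".*XABCD12345678.([0-9]+-[0-9]+-[0-9]+-[0-9]+).*")

# Hypothetical object keys; the random suffixes are made up.
keys = [
    "cf/XABCD12345678.2012-05-25-22.NEfbhLN3.gz",
    "cf/XABCD12345678.2012-05-25-22.a1B2c3D4.gz",
    "cf/XABCD12345678.2012-05-25-23.Zz9Yy8Xx.gz",
    "cf/README.txt",
]

groups = defaultdict(list)
for key in keys:
    match = GROUP_BY.fullmatch(key)
    if match:  # in this sketch, keys that do not match are simply left out
        groups[match.group(1)].append(key)

for hour, members in sorted(groups.items()):
    print(hour, len(members))
```

Here the captured value is the date and hour in the file name, so each hour of logs would be aggregated into its own output.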

Pig for Access Logs Analysis

-- Load and filter (cat / grep)
RAW_LOG = LOAD 's3://myoutputbucket/aggregate/' AS (ts:chararray, url:chararray…);
LOGS_BASE_F = FILTER RAW_LOG BY url MATCHES '^GET /__track.*$';

-- Parse (awk)
LOGS_BASE_F_W_PARAM = FOREACH LOGS_BASE_F GENERATE
  url,
  DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z') AS dt,
  SUBSTRING(DATE_TIME(ts, 'dd/MMM/yyyy:HH:mm:ss Z'), 0, 10) AS day,
  status,
  REGEX_EXTRACT(url, '^GET /([^\\?]+)', 1) AS action: chararray,
  REGEX_EXTRACT(url, 'idt=([^&]+)', 1) AS idt: chararray,
  REGEX_EXTRACT(url, 'idc=([^&]+)', 1) AS idc: chararray;
I1 = FILTER LOGS_BASE_F_W_PARAM BY action == 'clic' OR action == 'display';
LOGS_SHORT = FOREACH I1 GENERATE uuid, action, dt, day, ida, idas, act, idp, idcmp, idc;
G1 = GROUP LOGS_SHORT BY (uuid, idc);

-- Store (>)
STORE G1 INTO 's3://mybucket/sessions/';
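
The parse step is easier to follow on a handful of rows. The Python sketch below mirrors the regular-expression extraction, the filter on `action`, and the grouping by `(uuid, idc)` on in-memory strings. The request strings and the `uuid` query parameter are assumptions made for illustration; the slide does not show where `uuid` comes from, and Pig's `REGEX_EXTRACT` may differ in edge cases from Python's `re.search`.

```python
import re
from collections import defaultdict

# Hypothetical request strings; the uuid parameter is an assumption.
requests = [
    "GET /display?idt=5.1.5&idc=5&uuid=u-1",
    "GET /clic?idt=5.1.5&idc=5&uuid=u-1",
    "GET /display?idt=5.1.5&idc=7&uuid=u-2",
    "GET /favicon.ico",
]


def extract(pattern: str, text: str):
    """Return capture group 1, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


sessions = defaultdict(list)
for url in requests:
    action = extract(r"^GET /([^\?]+)", url)
    if action not in ("clic", "display"):   # same filter as relation I1
        continue
    idc = extract(r"idc=([^&]+)", url)
    uuid = extract(r"uuid=([^&]+)", url)
    sessions[(uuid, idc)].append(action)    # like GROUP ... BY (uuid, idc)

print(dict(sessions))
```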

Pig vs. Hive

• Pig is geared toward sequentially transforming data
  – ETL
  – Shell at scale (from local mode to any scale)
• Hive is for querying data
  – Data analysis / HQL
  – Some transformation, typically as a means to an end (e.g., temporary tables)

Monitoring Pig

https://github.com/netflix/lipstick

Another Monitoring Tool

https://github.com/twitter/ambrose

Optimize Your EMR Cluster

Monitor Your EMR Cluster

Bootstrap Actions

--bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia

Management Console

Customer Tools

Gathers information about EMR jobs from multiple sources and presents it in textual and graphical views

github.com/Hi-Media/EmrMonitoring

Completed Job View

Spot Bidding Strategies

• Most savings
• Not paying more
• Fewer interruptions

Jeff Bezos (early Amazon days)

Data Sources

Queries

Value

More Trends to Consider

Transactional Processing    Analytical Processing
Transactional context       Global context
Latency                     Throughput
Indexed access              Full table scans
Random IO                   Sequential IO
Disk seek times             Disk transfer rate
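
The last two rows can be made concrete with a back-of-the-envelope comparison. In the Python sketch below, the 200 MB/s transfer rate is the "Today" figure from the earlier slide; the 10 ms seek time, the 100 GB table size, and the one million lookups are assumptions chosen only to show the shape of the trade-off.

```python
# Assumptions (not from the deck): 10 ms per random seek, a 100 GB table,
# and one million indexed lookups. The 200 MB/s rate is the "Today" figure
# from the earlier slide.
SEEK_SECONDS = 0.010
TRANSFER_MB_PER_S = 200
TABLE_MB = 100_000
LOOKUPS = 1_000_000

# Transactional pattern: each lookup pays roughly one disk seek.
random_io_seconds = LOOKUPS * SEEK_SECONDS

# Analytical pattern: one sequential pass over the whole table.
full_scan_seconds = TABLE_MB / TRANSFER_MB_PER_S

print(f"{LOOKUPS:,} random lookups: ~{random_io_seconds:,.0f} s")
print(f"full scan of {TABLE_MB / 1000:.0f} GB: ~{full_scan_seconds:,.0f} s")
```

With these numbers, touching many rows through an index is dominated by seek time, while a full scan is bounded by transfer rate, which is why the two workloads favor different storage layouts.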

COPY into Amazon Redshift

create table cf_logs
( d date, t char(8), edge char(4), bytes int, cip varchar(15),
verb char(3), distro varchar(MAX), object varchar(MAX), status int,
Referer varchar(MAX), agent varchar(MAX), qs varchar(MAX) );

copy cf_logs from 's3://big-data/logs/E123ABCDEF/'
credentials 'aws_access_key_id=<key_id>;aws_secret_access_key=<secret_key>'
IGNOREHEADER 2
GZIP
DELIMITER '\t'
DATEFORMAT 'YYYY-MM-DD';

COPY into Amazon Redshift with AWS Data Pipeline

Time for Data Visualization

Charles Minard's flow map of Napoleon's March (1869)

Choose Your Favorite Visualization Tool

Tableau (Windows instance)

R

Jaspersoft

QlikView

MicroStrategy

SiSense

Snapshot before Delete

Unload Data from Amazon Redshift

unload ('select * from cf_logs where d between ''2013-11-03'' and ''2013-11-10''')
to 's3://mybucket/unload_cf_logs_week_46'
credentials 'aws_access_key_id=<key_id>;aws_secret_access_key=<secret_key>'
delimiter as '\t'
GZIP;

Reference Architecture

Partner Services

Loggly

Splunk

Stratalux (Logstash)

Loggly AWS Marketplace Page

What Else Can You Do with Log Analysis?

Finally, a Small Warning

Abraham Wald (1902-1950)

Would You Like to Know More?

Further reading
• http://aws.amazon.com/architecture
• http://aws.amazon.com/articles
• http://aws.typepad.com

re:Invent sessions
• DAT205 - Amazon Redshift in Action: Enterprise, Big Data, and SaaS
• DAT305 - Getting Maximum Performance from Amazon Redshift
• BDT301 - Scaling your Analytics with Amazon Elastic MapReduce
