Post on 19-Apr-2018

Spell: Streaming Parsing of System Event Logs

Min Du, Feifei Li

School of Computing,

University of Utah

Background

System Event Log

Exists practically on every computer system!

Example entries:

Started service A on port 80

Started service B on port 90

Started service C on port 100

Executor updated: app-1 is now LOADING

Executor updated: app-2 is now LOADING

TaskSetManager: Starting task 0 in stage 2

TaskSetManager: Starting task 1 in stage 5

……

System Event Log → Structured Data (message/event type, log key, ……)

printf("Started service %s on port %d", x, y);

Message types for the entries above:

Started service * on port *

Executor updated: * is now LOADING

TaskSetManager: Starting task * in stage *

……
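Given a message type, the variable parts of an entry can be read back out as structured fields. The following is a minimal sketch, assuming each `*` stands for one parameter and using a regular expression for the matching. The helper names `template_to_regex` and `extract_params` are hypothetical, and this is only an illustration of the structured output, not how Spell derives message types.

```python
import re

def template_to_regex(template):
    """Compile a message type such as 'Started service * on port *' to a regex.

    Assumption: every '*' in the template marks exactly one parameter slot.
    """
    constant_parts = [re.escape(part) for part in template.split("*")]
    return re.compile("^" + "(.+?)".join(constant_parts) + "$")

def extract_params(template, entry):
    """Return the parameter values of entry, or None if it does not fit the template."""
    match = template_to_regex(template).match(entry)
    return list(match.groups()) if match else None

if __name__ == "__main__":
    print(extract_params("Started service * on port *", "Started service A on port 80"))
    print(extract_params("Executor updated: * is now LOADING",
                         "Executor updated: app-1 is now LOADING"))
```

An entry that does not fit the template yields `None`, so the same helper can be used to test which message type an entry belongs to.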

Log analysis: Structured Data → Anomaly Detection

Message count vector: Xu’SOSP09, Lou’ATC10, Lin’ICSE16, etc.

Build workflow model: Lou’KDD10, Beschastnikh’ICSE14, Yu’ASPLOS16, etc.

Log parsing: System Event Log → Structured Data

Use source code as template to parse logs: Xu’SOSP09

Problem: What if we don’t have source code?

Directly parse from raw system logs: Makanju’KDD09, Fu’ICDM09, Tang’ICDM10, Tang’CIKM11, etc.

Problem: Offline batched processing, some very slow.

Our approach

Spell, a structured Streaming Parser for Event Logs using an LCS (longest common subsequence) based approach.

Example:

Two log entries:

Temperature (41C) exceeds warning threshold

Temperature (42C, 43C) exceeds warning threshold

LCS:

Temperature * exceeds warning threshold

Naturally a message type!

printf("Temperature %s exceeds warning threshold")
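The temperature example can be reproduced with a textbook dynamic-programming LCS over whitespace-separated tokens. This is a minimal sketch under two assumptions that the slides do not state: entries are tokenized by splitting on whitespace, and adjacent non-matching tokens collapse into a single `*`. The function names `lcs` and `to_template` are hypothetical.

```python
def lcs(a, b):
    """Longest common subsequence of two token lists (classic quadratic DP)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover one LCS.
    out, i, j = [], len(a), len(b)
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def to_template(entry_tokens, common):
    """Keep the tokens of `common`; replace each run of other tokens with one '*'."""
    out, k = [], 0
    for tok in entry_tokens:
        if k < len(common) and tok == common[k]:
            out.append(tok)
            k += 1
        elif not out or out[-1] != "*":
            out.append("*")
    return " ".join(out)

if __name__ == "__main__":
    e1 = "Temperature (41C) exceeds warning threshold".split()
    e2 = "Temperature (42C, 43C) exceeds warning threshold".split()
    common = lcs(e1, e2)
    print(to_template(e1, common))
    print(to_template(e2, common))
```

Both entries map to the same template, which is why the LCS can serve as the message type.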

SPELL – Basic workflow

Add each new log entry into LCSMap in a streaming fashion; update an existing message type if

length(LCS) > 0.5 * length(new log entry)

New log entry: Temperature (41C) exceeds warning threshold

LCSMap:

LCSObject
LCSseq: Temperature (41C) exceeds warning threshold
lineIds: {0}
paramPos: {empty}

New log entry: Temperature (43C) exceeds warning threshold

LCSMap:

LCSObject
LCSseq: Temperature * exceeds warning threshold
lineIds: {0, 1}
paramPos: {1}

New log entry: Command has completed successfully

LCSMap:

LCSObject
LCSseq: Temperature * exceeds warning threshold
lineIds: {0, 1}
paramPos: {1}

LCSObject
LCSseq: Command has completed successfully
lineIds: {2}
paramPos: {empty}

New log entry: ……
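The walkthrough above can be sketched as a small streaming structure. This is an illustrative sketch rather than the authors' implementation: the class and method names (`LCSMap.insert`, `merge`, `gap_flags`) are hypothetical, entries are assumed to be whitespace-tokenized, and picking the candidate with the longest LCS when several pass the threshold is an assumption.

```python
from dataclasses import dataclass, field

def lcs(a, b):
    """Longest common subsequence of two token lists via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    out, i, j = [], len(a), len(b)
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

@dataclass
class LCSObject:
    lcs_seq: list                                  # tokens; "*" marks a parameter
    line_ids: list = field(default_factory=list)

    @property
    def param_pos(self):
        return [i for i, tok in enumerate(self.lcs_seq) if tok == "*"]

def gap_flags(seq, common):
    """flags[k] is True if seq has extra tokens in the slot before common[k]."""
    flags, k = [False] * (len(common) + 1), 0
    for tok in seq:
        if k < len(common) and tok == common[k]:
            k += 1
        else:
            flags[k] = True
    return flags

def merge(old_seq, entry, common):
    """New template: common tokens, with '*' wherever either input had extras."""
    old_gaps, new_gaps = gap_flags(old_seq, common), gap_flags(entry, common)
    out = []
    for k in range(len(common) + 1):
        if old_gaps[k] or new_gaps[k]:
            out.append("*")
        if k < len(common):
            out.append(common[k])
    return out

class LCSMap:
    def __init__(self, threshold=0.5):
        self.objects, self.threshold, self.next_line_id = [], threshold, 0

    def insert(self, line):
        entry = line.split()
        line_id, self.next_line_id = self.next_line_id, self.next_line_id + 1
        best, best_common = None, []
        for obj in self.objects:
            constants = [t for t in obj.lcs_seq if t != "*"]
            common = lcs(constants, entry)
            # Rule from the slide: length(LCS) > 0.5 * length(new log entry).
            if len(common) > self.threshold * len(entry) and len(common) > len(best_common):
                best, best_common = obj, common
        if best is None:
            best = LCSObject(lcs_seq=entry)        # no match: new message type
            self.objects.append(best)
        else:
            best.lcs_seq = merge(best.lcs_seq, entry, best_common)
        best.line_ids.append(line_id)
        return best
```

Feeding the three entries from the walkthrough into `LCSMap.insert` in order yields two objects whose `lcs_seq`, `line_ids`, and `param_pos` agree with the states shown above.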

SPELL – Improvement on efficiency

To compute the LCS of two log entries, each of length O(n):

Naïve way: Dynamic Programming

Time complexity:

To compare a log entry with an existing message type: O(n²)

To compare a new log entry with O(m) existing message types: O(mn²)

Can we do better?

Observation. For a complex system,

number of log entries: millions

number of message types: hundreds

For example:

Blue Gene/L log: 4,457,719 log entries, 394 message types

Hadoop log used in Xu’SOSP09: 11,197,705 log entries, only 29 message types

For a majority of new log entries, their message types already exist in LCSMap!

Improvement 1: Prefix Tree

Existing message types:

A B C

A C D

A D

E F

Prefix tree:

ROOT
├── A
│   ├── B
│   │   └── C
│   ├── C
│   │   └── D
│   └── D
└── E
    └── F

New log entry: A B P C

Parameter: P

Time complexity: O(n) for each log entry

Problem:

[Figure: prefix tree under ROOT containing the paths A B C and D A, plus nodes E and F]

New log entry: D A P B C

Matches D A

Should be: A B C
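Both the prefix-tree lookup and its failure case can be sketched with a plain trie over tokens. This is a sketch under assumptions: the names `TrieNode`, `build_trie`, and `trie_match` are hypothetical, and the walk simply treats any token that is not a child of the current node as a parameter, which is one reading of the slides rather than a confirmed detail.

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # token -> TrieNode
        self.message_type = None  # set when a message type ends at this node

def build_trie(message_types):
    """Insert each message type (a list of tokens) into a prefix tree."""
    root = TrieNode()
    for tokens in message_types:
        node = root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.message_type = tuple(tokens)
    return root

def trie_match(root, entry):
    """Walk the tree once over entry: O(n) dictionary lookups for n tokens.

    Returns (message type reached or None, tokens skipped as parameters).
    """
    node, params = root, []
    for tok in entry:
        child = node.children.get(tok)
        if child is not None:
            node = child          # token continues a known message type
        else:
            params.append(tok)    # otherwise treat it as a parameter
    return node.message_type, params

if __name__ == "__main__":
    tree = build_trie([list("ABC"), list("ACD"), list("AD"), list("EF")])
    print(trie_match(tree, "A B P C".split()))

    # Failure case: the walk commits to the first matching branch (D A)
    # and never reaches the longer subsequence A B C.
    problem_tree = build_trie([list("ABC"), list("DA")])
    print(trie_match(problem_tree, "D A P B C".split()))
```

The second call shows why the prefix tree alone is not enough: it is fast, but it can settle on a shorter message type than the best available one.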

Improvement 2: Simple Loop

Compare each message type with the new log entry, using two pointers: Pm over the message type and Pl over the log entry.

Message types: [ A B C ], [ A E F ], [ D A ]

New log entry: [ D A P B C ]

Matched length:

[ A B C ]: 3

[ A E F ]: N/A

[ D A ]: 2

Time complexity: O(mn), where m is the number of message types and n is the log entry length.

For the remaining log entries, compare each one with every message type using simple DP.
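The simple loop amounts to a two-pointer subsequence check per message type. This is a minimal sketch, with hypothetical function names; applying the 0.5 length threshold from the basic workflow at this step is an assumption, as is preferring the longest matching type.

```python
def matched_length(message_type, entry):
    """Return len(message_type) if it is a subsequence of entry, else None.

    pm points into the message type; the for-loop variable plays the role of
    the pointer into the log entry and advances on every step.
    """
    pm = 0
    for tok in entry:
        if pm < len(message_type) and tok == message_type[pm]:
            pm += 1
    return pm if pm == len(message_type) else None

def simple_loop(message_types, entry, threshold=0.5):
    """Scan all m message types once: O(mn) token comparisons in total."""
    best = None
    for mt in message_types:
        n = matched_length(mt, entry)
        if n is None or n <= threshold * len(entry):
            continue
        if best is None or n > len(best):
            best = mt
    return best

if __name__ == "__main__":
    types = [list("ABC"), list("AEF"), list("DA")]
    entry = "D A P B C".split()
    for mt in types:
        print(mt, matched_length(mt, entry))
    print(simple_loop(types, entry))
```

On the slide's example this reports matched lengths 3, N/A, and 2 for the three message types, and selects A B C, the answer the prefix tree missed.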

Evaluation

Methods to compare:

IPLoM (Makanju’KDD09): Partition log file using 3-step heuristics (log entry length, etc.)

CLP (Fu’ICDM09): Cluster similar logs together based on weighted edit distance

Log dataset:

Log type              Count        Message type ground truth
Los Alamos HPC log    433,490      Available online
BlueGene/L log        4,747,963    Available online

Evaluation - Efficiency

[Figure: two panels, x-axis: log size (× 10^5), one for the Los Alamos log and one for the Blue Gene log]

Evaluation - Effectiveness

[Figure: two panels, x-axis: log size (× 10^5), one for the Los Alamos log and one for the Blue Gene log]

Conclusion

Spell:

A streaming system event log parser

Using LCS

Prefix tree and simple loop to improve efficiency

Outperforms offline methods on large system log datasets

Thank you

mind@cs.utah.edu
