Improvement of Log Pattern Extracting Algorithm Using Text Similarity
ZHAO Yining
Computer Network Information Center, Chinese Academy of Sciences
in HPBDC18, 2018/05/21
Content
- CNGrid & LARGE
- Why Log Patterns & Extracting Algorithm
- Algorithm of Identical Word Rate
- Text Similarity Based Approach
  - Improved Extracting Formation & LCS
  - Experiment Result
- Modified Log Comparing Model
- Summary & Future Work
CNGrid & LARGE
- China National HPC Environment
  - 2 Operating Centers (Beijing/Hefei)
  - 19 Sites (200 PF + 162 PB)
  - Portal with Micro-Service Architecture
  - Application oriented Global Scheduling & Predicting
  - Resource Evaluation Standard & Comprehensive Evaluation Index
CNGrid & LARGE
- Log Analyzing fRamework in Grid Environment (LARGE)
Log Patterns & Extracting Algorithm
- We want to be alerted for logs in certain patterns, but…
  - there are too many logs for a human to read
  - we need to summarize patterns before defining alert rules
- Set of log patterns in our context:
  - patterns are different from each other
  - patterns cover all logs in the original set
  - patterns are significantly fewer than the original logs
- The process of using log patterns
  - filter and remove frequent normal logs
  - use log pattern extraction algorithms to get the set of patterns
  - manually check the set and pick out abnormal patterns
  - define rules to generate alerts for these patterns
Algorithm of Identical Word Rate
- Algorithm of identical word rate (IWR): a straightforward way
  - identical words
    - 2 words that are identical
    - and in the same position in the 2 original logs
  - identical word rate
    - (number of identical words) / (total words)
    - predefined threshold t
    - if IWR is greater than t, the two logs are in one pattern
- Process of the IWR algorithm
  - set threshold t and an initially empty pattern set P
  - for each new incoming log, compute IWR with each pattern in P
  - if a pattern matches, skip to the next log; if none matches, add the log to P
- Significant limitation
  - logs with different lengths have an IWR of ZERO!
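The IWR process above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample logs, and the threshold value are assumptions, logs are split on whitespace, and "total words" is taken to be the shared length of the two logs.

```python
def identical_word_rate(log_a: str, log_b: str) -> float:
    """Fraction of positions at which the two logs hold the same word."""
    words_a, words_b = log_a.split(), log_b.split()
    # As the slide notes, logs of different lengths get an IWR of zero.
    if len(words_a) != len(words_b) or not words_a:
        return 0.0
    identical = sum(1 for a, b in zip(words_a, words_b) if a == b)
    return identical / len(words_a)


def extract_patterns_iwr(logs, t):
    """Add a log to P when no stored pattern has an IWR greater than t."""
    patterns = []  # the pattern set P, initially empty
    for log in logs:
        if not any(identical_word_rate(log, p) > t for p in patterns):
            patterns.append(log)
    return patterns


# Hypothetical sample logs, for illustration only.
logs = [
    "session opened for user alice",
    "session opened for user bob",
    "session opened for user carol by root",
    "disk sda1 is full",
]
patterns = extract_patterns_iwr(logs, t=0.6)
```

With these sample logs the third line becomes its own pattern even though it shares its first four words with the first one, which is the limitation named on the slide.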
Text Similarity Based Approach (1)
- Using text similarity to resolve the problem
  - S = P × O
  - S: similarity, P: proportion of common words, O: order factor
- For two logs l1 and l2, L1 and L2 are their word sets respectively
  - define P: P(l1, l2) = (|L1 ∩ L2| × 2) / (|L1| + |L2|)
  - define O: O(l1, l2) = SeqSim(l1, l2) / |L1 ∩ L2|
  - hence S: S(l1, l2) = (SeqSim(l1, l2) × 2) / (|L1| + |L2|)
- By this, logs of different lengths can be compared
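The step from P and O to S is a cancellation of the common-word count, written out here for clarity (it assumes |L1 ∩ L2| is not zero, so that O is defined):

```latex
\begin{align*}
S(l_1, l_2) &= P(l_1, l_2) \times O(l_1, l_2) \\
            &= \frac{2\,|L_1 \cap L_2|}{|L_1| + |L_2|}
               \times \frac{\mathrm{SeqSim}(l_1, l_2)}{|L_1 \cap L_2|} \\
            &= \frac{2\,\mathrm{SeqSim}(l_1, l_2)}{|L_1| + |L_2|}
\end{align*}
```

As a check with made-up numbers: for |L1| = 4, |L2| = 6, |L1 ∩ L2| = 3 and SeqSim = 2, P = 6/10 = 0.6 and O = 2/3, so S = 0.4, which equals (2 × 2) / (4 + 6).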
Text Similarity Based Approach (2)
- Using Longest Common Subsequence (LCS) to define SeqSim(l1, l2)
  - S(l1, l2) = (|LCS(l1, l2)| × 2) / (|L1| + |L2|)
  - same pattern if S(l1, l2) ≥ t, where t is the predefined threshold
- The process of the improved log pattern extracting algorithm
  - set the threshold value t and set the initial log pattern set P to be an empty set
  - for a new log l appearing from the input log set L, compute Si(l, pi) between l and every pi ∈ P using an LCS algorithm
  - if there is no Si(l, pi) ≥ t, add l to P
  - after all logs in L have been checked, return P
- Increases the time cost of a single comparison
  - but reduces the total number of comparisons
  - can be offset by choosing a better LCS algorithm
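A minimal Python sketch of this improved process is given below. It is an illustration under stated assumptions rather than the authors' code: the function names, sample logs, and threshold are hypothetical, the LCS is the textbook dynamic-programming version (the slides do not say which LCS algorithm was used), and |L1| and |L2| are counted as the number of whitespace-separated words in each log.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(log_a, log_b):
    """S(l1, l2) = 2 * |LCS(l1, l2)| / (|L1| + |L2|), lengths counted in words."""
    words_a, words_b = log_a.split(), log_b.split()
    total = len(words_a) + len(words_b)
    if total == 0:
        return 0.0
    return 2 * lcs_length(words_a, words_b) / total


def extract_patterns_lcs(logs, t):
    """Add a log to P when no stored pattern reaches similarity t."""
    patterns = []  # the pattern set P, initially empty
    for log in logs:
        if not any(similarity(log, p) >= t for p in patterns):
            patterns.append(log)
    return patterns


# Hypothetical sample logs, for illustration only.
logs = [
    "session opened for user alice",
    "session opened for user bob",
    "session opened for user carol by root",
    "disk sda1 is full",
]
patterns = extract_patterns_lcs(logs, t=0.6)
```

Here the seven-word third log scores 8/12 against the five-word first log, so with t = 0.6 it joins that pattern instead of creating a new one. The DP costs time proportional to the product of the two word counts, which is the per-comparison overhead the slide mentions.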
Text Similarity Based Approach (3)
- Experiment result
  - numbers of extracted patterns
Text Similarity Based Approach (3)
- Experiment result
  - time costs of candidate algorithms (in milliseconds)
Modified Pattern Comparing Model (1)
- The original model has a high time cost when searching patterns
  - it has to visit patterns one by one until the matching one is met
- Use a hashmap to accelerate the matching
  - divide the pattern set into subsets by initial words
  - skip the majority of patterns, which sit in irrelevant subsets
- Matching process:
  1. get the initial word of the log
  2. hash the word
  3. find the desired subset in the hashmap
  4. compare with the patterns in the subset
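The four matching steps can be sketched with a Python dictionary, whose key lookup performs the hashing. The class name `PatternIndex`, its methods, the `comparisons` counter, and the sample patterns are all assumptions made for illustration, and the comparison function is a compact LCS-based similarity standing in for whatever comparator the real system uses.

```python
from collections import defaultdict


def lcs_similarity(log_a, log_b):
    """2 * |LCS| / (total words), computed over whitespace-separated words."""
    a, b = log_a.split(), log_b.split()
    if not a and not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return 2 * prev[-1] / (len(a) + len(b))


class PatternIndex:
    """Pattern set split into subsets keyed by each pattern's initial word."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.subsets = defaultdict(list)  # initial word -> patterns starting with it
        self.comparisons = 0  # number of similarity computations, for illustration

    def add(self, pattern):
        words = pattern.split()
        if words:
            self.subsets[words[0]].append(pattern)

    def find(self, log):
        words = log.split()
        if not words:
            return None
        # Steps 1-3: take the initial word; the dict lookup hashes it and
        # returns the matching subset (or an empty list if there is none).
        for pattern in self.subsets.get(words[0], []):
            # Step 4: compare only with the patterns in that subset.
            self.comparisons += 1
            if lcs_similarity(log, pattern) >= self.threshold:
                return pattern
        return None


index = PatternIndex(threshold=0.6)
for p in ["session opened for user alice", "disk sda1 is full",
          "connection closed by peer"]:
    index.add(p)
match = index.find("session opened for user bob")
```

In this example only one of the three stored patterns is compared with the incoming log; the other two are skipped because their initial words differ.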
Modified Pattern Comparing Model (2)
- This approach cannot deal with patterns with unfixed initials
  - so we build a separate unfixed pattern set
- In the real system, we split the pattern set into 4 parts:
  - fixed alert pattern set
  - unfixed alert pattern set
  - fixed normal pattern set
  - unfixed normal pattern set
- When a new log comes, it is compared against the 4 sets in turn to decide the processing method
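One way to read the four-set lookup is sketched below. The slides do not show the real data structures, so everything here is an assumption: the class `PatternStore`, the labels "alert", "normal" and "unknown", the choice to keep fixed sets as dictionaries keyed by initial word and unfixed sets as plain lists scanned linearly, and the LCS-based comparator.

```python
def lcs_similarity(log_a, log_b):
    """2 * |LCS| / (total words), computed over whitespace-separated words."""
    a, b = log_a.split(), log_b.split()
    if not a and not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return 2 * prev[-1] / (len(a) + len(b))


class PatternStore:
    """Four pattern sets: fixed/unfixed initials, each for alert and normal."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.fixed = {"alert": {}, "normal": {}}    # kind -> {initial word: patterns}
        self.unfixed = {"alert": [], "normal": []}  # kind -> patterns

    def add(self, pattern, kind, fixed_initial=True):
        words = pattern.split()
        if not words:
            raise ValueError("empty pattern")
        if fixed_initial:
            self.fixed[kind].setdefault(words[0], []).append(pattern)
        else:
            self.unfixed[kind].append(pattern)

    def classify(self, log):
        words = log.split()
        if not words:
            return "unknown"
        # Order of checks: fixed alert, unfixed alert, fixed normal, unfixed normal.
        for kind in ("alert", "normal"):
            for group in (self.fixed[kind].get(words[0], []), self.unfixed[kind]):
                if any(lcs_similarity(log, p) >= self.threshold for p in group):
                    return kind
        return "unknown"


# Hypothetical patterns, for illustration only.
store = PatternStore(threshold=0.6)
store.add("authentication failure for user root", "alert")
store.add("session opened for user alice", "normal")
store.add("alice logged in from console", "normal", fixed_initial=False)

results = [
    store.classify("authentication failure for user admin"),
    store.classify("bob logged in from console"),
    store.classify("kernel panic not syncing"),
]
```

The second log starts with a word that no fixed subset knows, so it is only caught by the linear scan of the unfixed normal set; a log that matches nothing is labeled "unknown" here and could be treated as a candidate new pattern.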
Modified Pattern Comparing Model (3)
- Real time cost comparison between original & modified models
[Figure: four charts, one per log type (cron, maillog, secure, messages), each comparing the time cost in milliseconds of the original model and the modified model]
Summary & Future Work
- Log patterns: used to build log recognition
- The IWR algorithm is not capable of matching logs of different lengths
- Use the idea of text similarity and LCS to improve the algorithm
- Modify the log comparing model to accelerate the process
- Future work: log pattern based analyses in CNGrid
  - log pattern associations
  - log flow feature modeling