dus integration observability

DUS Integration observability & Diagnostics: Are we doing the right thing to improve? Notes from a meeting with RCS, CA CS, CA O&M and DUS integration, and a meeting with TN and IELL.

Upload: zulnoorain-ali-ghumman

Post on 29-Sep-2015


TRANSCRIPT

DUS Integration observability & Diagnostics

Are we doing the right thing to improve?

Notes from a meeting with RCS, CA CS, CA O&M and DUS integration, and a meeting with TN and IELL.

DUS Integration observability | Ericsson Internal | 6/0363-10/FCP 130 0532 Uen, Rev PA2 | 2014-10-29
2014-10-26 | 6/0363-10/FCP 130 0532 Uen, Rev PA1

Minutes

Meeting: 10/10-14
Participants:
- DUS Integration: Anders Högberg, Björn Forsberg
- CA: Martin Rydar, Irene Strömbäck, Anders nerud
- RCS: Stefan Dahl, Maja Drmac

Meeting: 21/10-14
- TN: Alexandru Andrei, Gunnar Hallendal, Mirjana Dalarsson and more
- IELL: Per Sydhoff, Henrik man, and more
- DUS Integration: Björn Forsberg

Problem

DUS Integration has had, and continues to have, problems observing what happens inside a DUS during integration.
- What is missing?
- What is planned in CA and design to improve?
- Do we need to change priorities in the backlog?
- Do we need to add something new to the backlog?

DUS Integration Experience

Goal: WRAT and LRAT traffic on DUS52 (simple traffic case)
- Step 1: black work
- Step 2: make it white, include CI test cases

Experience
- Observability is difficult
- Hard to troubleshoot
  - Force a crash and look at the dump
  - Lack of traces and counters. Only normal cases are implemented; exception cases are not handled
  - BBI is worst (time shortage)
- EMCA PMD decoding has been an issue
- BBI integration: 40 corrections were required to get LRAT standing up
- BBI fault coverage in DPT is too low
- Wind River 6 has resulted in new issues in CS PMD decoding
  - Symbol info missing; soon fixed, but has been troublesome
- Mini core dump and Preempt RT (Linutronix) exist, but are still too big
- LTTng from EfficiOS
- Lack of support level agreement with key suppliers of debug functionality
  - SLA towards Wind River is not enough
- Current WoW is not sharp enough

Ideas
- Quality, number of implemented trace points

CS Roadmap
- LTTng kernel trace
- Linux MUM replacer (memory leakage)
- NTP sync to LTTng to keep time stamps correct
- Charmon for Linux (to show cache misses, TLB faults etc.)
- HW trace support (ARM SW, branch history), using CoreSight
- PMD snapshot as part of Bb restart
- Backtrace improvement from Wind River
- CS exploratory test

O&M Roadmap
- Use of baseband time in both CS and BBI environments. Study needed?
- Streaming trace to disk or client; diagnostics access to radio (not critical for integration)
- Clarify Log DR and usage

BBI
- HiCAP trace work ongoing. Dependent on TN

xRAT
- Context Based Tracing for WCDMA. Work ongoing.

Transport (TN & IELL)
- Internal observability

CI improvements
- JCAT test suites with the possibility to choose trace level
- APRCA TTF: why does it take time?
- Target-specific shared libraries for PMD per DUS type

Improve internal transport observability

DUS integration experience 1(2)

- Often only normal cases are implemented.
  - Abnormal and exception cases are not handled, i.e. there is no information that they occur
- Poor observability
  - Lack of implemented traces
  - Mitigation today: force a crash and look at the dump
  - Unstable tracing, traces disappear
  - Lack of counters
- EMCA PMD decoding has been an issue. Working now
- The change to Wind River 6 has messed up AXM PMD decoding
  - E.g. missing symbol files. Almost OK now

DUS integration experience 2(2)

- Low test coverage in BBI software.
  - >40 BBI corrections required to get LRAT standing up
- Poor functionality to observe internal user plane transport
  - Need to better understand and visualize the ports on the internal switch and how they are used.
  - Port mirroring? Or other means to find stray packets and look at ongoing activities.
  - PM counters are not so useful. It is the ADK counters we need, during execution, using commands to read them.
- Fault correction takes too long
  - E.g. intermittent watchdog fault in CS and crash when using commands in colish.

CS experience

- The support from and cooperation with partners supplying debug functionality is not good enough
  - Post Mortem Dump (PMD), mini core dump from Linutronix
    - Mini core dumps are too big, hard to change
  - LTTng (EfficiOS)
  - We have an SLA (Support Level Agreement) only with Wind River, not with the subcontractors we work with in reality.
- Backtrace functionality in LTTng is being improved. Soon available

CS & O&M WP backlog 1(2)
SF Debug support, SF RBS troubleshooting

Ongoing WPs
- LTTng kernel trace
- HiCAP trace (BBI) (ongoing?)
- Context Based Tracing (WRAT)

WPs in backlog
- Diagnostics mechanism (streaming trace) (CS & CAT)
  - Mostly CS and some EMCLI work (MoShell)

CS has exploratory testing of LTTng ongoing to improve usability and find issues early

CS & O&M CA backlog 2(2)
SF Debug support, SF RBS troubleshooting

Ongoing studies
- Linux MUM replacer (memory leakage)
- Charmon for Linux (to show cache misses, TLB faults etc.) (prio 3)
- HW trace support (ARM SW, branch history), using CoreSight (prio 4)
- All studies above include prototyping, will be used for E/// internal troubleshooting, and can arrive after LFD.
- PMD snapshot as part of Bb restart. CS has no capability to LFD. Bb restart study will delay this.
- Diagnostics access to radio (halted due to CAT constraint)

CA backlog
- NTP sync to LTTng to keep time stamps correct
- Common time base in EMCA and AXM traces using baseband time
- Clarify Log DR and usage
- Above items are not secured for 15B
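To illustrate why a common time base matters for the EMCA and AXM traces: once both sources report timestamps on the same clock, their records can be interleaved into a single timeline. The following is only a minimal Python sketch under stated assumptions. The record layout, the trace messages, the fixed clock offset and the function names are all hypothetical and are not taken from the DUS trace tooling.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (timestamp in seconds, source, message).
TraceRecord = Tuple[float, str, str]


def shift(stream: Iterable[TraceRecord], offset_s: float) -> Iterator[TraceRecord]:
    """Move a stream onto the common time base by adding a known clock offset."""
    for timestamp, source, message in stream:
        yield (timestamp + offset_s, source, message)


def merge_traces(*streams: Iterable[TraceRecord]) -> Iterator[TraceRecord]:
    """Interleave streams by timestamp; each stream must already be sorted."""
    return heapq.merge(*streams, key=lambda record: record[0])


# Made-up example records. The AXM clock is assumed to run 100 s behind.
emca = [(100.0, "EMCA", "cell setup request"), (100.5, "EMCA", "cell setup done")]
axm = [(0.2, "AXM", "request forwarded"), (0.7, "AXM", "link up")]

timeline = list(merge_traces(emca, shift(axm, 100.0)))
for record in timeline:
    print(record)
```

Without the offset correction the AXM records would all sort before the EMCA records, which is the kind of misleading ordering a shared baseband time is meant to avoid.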

TRP observability today

- TN has used a BBI loop in EMCA
- TN uses a debug port to redirect data in AXM (intended for EMCA)
- A sniffer (Wireshark) on the closest switch outside has also been used by TN to look at user plane traffic
- Internal ADK counters are used for observability.
  - Accessible through COLI (colish) in real time using commands
  - Not stored for later analysis.
- DUS integration has a list of useful TN and IELL commands. Available observability features are known.
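Since the counters are readable in real time but not stored, one possible workaround is to sample them periodically and write timestamped snapshots with deltas to a log for later analysis. The sketch below is a generic Python illustration, not an existing tool. The `read_counters` callable stands in for whatever command actually reads the ADK counters, and the counter names are made up.

```python
import io
import json
import time
from typing import Callable, Dict, Optional, TextIO


def snapshot_counters(read_counters: Callable[[], Dict[str, int]],
                      out: TextIO, samples: int, interval_s: float = 1.0) -> None:
    """Write one JSON line per sample with absolute values and deltas."""
    previous: Optional[Dict[str, int]] = None
    for i in range(samples):
        current = dict(read_counters())
        if previous is None:
            delta: Dict[str, int] = {}
        else:
            delta = {name: value - previous.get(name, 0)
                     for name, value in current.items()}
        out.write(json.dumps({"time": time.time(),
                              "counters": current,
                              "delta": delta}) + "\n")
        previous = current
        if i + 1 < samples:
            time.sleep(interval_s)


# Stand-in reader with made-up counter names and values.
_fake_values = iter([{"rx_packets": 10, "rx_dropped": 0},
                     {"rx_packets": 25, "rx_dropped": 2}])


def fake_reader() -> Dict[str, int]:
    return next(_fake_values)


log = io.StringIO()
snapshot_counters(fake_reader, log, samples=2, interval_s=0.0)
records = [json.loads(line) for line in log.getvalue().splitlines()]
print(records[1]["delta"])
```

Writing to any text stream keeps the sketch independent of where the log ends up (file, pipe or test buffer).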

TN/IELL experience in DUS integration

- Colish is crashing, which blocks usage of test commands.
  - TR issued by CS to Wind River. Hard to get a quick response.
- Hard-coded MAC addresses have caused startup sync issues.
  - The new MACI (the correct solution for G2) will improve this when it is introduced, as the startup sequence will be steered by a server-client relationship.
- Many faults have been endian related.

TN/IELL backlog

In IELL (IPT) backlog (ongoing)
- IPT COLI cmd (ongoing, together with TN)
  - Print PCEP local data
  - Print PCEP level statistics

In TN backlog
- Script in code to collect global counters using ESI action hook
- Make LOCO (Appl simulator) useful for debugging

Possible items for backlog
- Endian fault troubleshooting. How do we check for faults like this?
- Circular fault event buffer for transport faults, using COLI cmd
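On the question of how to check for endian faults, one simple approach is to send a known test pattern and see whether the received bytes only make sense when read in the opposite byte order. The sketch below is a generic Python illustration of that idea and not DUS code. The header layout and the function names are hypothetical.

```python
import struct

# Hypothetical header layout for illustration only: a 16-bit type
# followed by a 32-bit length, both in network (big-endian) byte order.
HEADER_FORMAT_NETWORK = "!HI"


def pack_header(msg_type: int, length: int) -> bytes:
    """Pack the header explicitly in network byte order."""
    return struct.pack(HEADER_FORMAT_NETWORK, msg_type, length)


def looks_byte_swapped(raw: bytes, expected_type: int, expected_length: int) -> bool:
    """Return True if the bytes match the expected values only when read
    as little-endian, which points at an endian fault in the sender."""
    expected = (expected_type, expected_length)
    as_big = struct.unpack("!HI", raw)
    as_little = struct.unpack("<HI", raw)
    return as_big != expected and as_little == expected


good = pack_header(0x0102, 0x0A0B0C0D)
bad = struct.pack("<HI", 0x0102, 0x0A0B0C0D)  # simulated endian fault

print(good.hex())
print(looks_byte_swapped(good, 0x0102, 0x0A0B0C0D))
print(looks_byte_swapped(bad, 0x0102, 0x0A0B0C0D))
```

Test patterns with distinct bytes in every position (such as 0x0A0B0C0D) are useful here, since symmetric values like 0x00000000 look the same in either byte order and hide the fault.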

Conclusion

- The main problem for Integration is the missing implementation of exceptions and abnormal cases, in combination with unstable tracing.
  - Lack of traces, missing traces
- The introduction of Wind River 6 at the same time as DUS integration has caused extra delay.
- Lack of internal transport observability
- BBI's own testing (DPT) leaves too many faults to the next step
- Difficult to be in main track when the use case is half ready. WoW missing
- Would anything in the WP and CA backlog have helped in the integration if it had been available?
  - Yes, but not enough
- Is the backlog content or prio wrong?
  - No, it looks fairly OK
- Is something missing?
  - More internal transport observability for troubleshooting

Action points wk44 1(2)

CS
- Fast action on HOT faults
  - Crashing colish (blocking ADK counter commands)
  - Unstable tracing (traces disappear)
  - Watchdog fault
- Daily participation in follow-up meeting
- Ensure fast and efficient Wind River support, including subcontractors (EfficiOS, Linutronix)
- Finalize backtrace functionality
- Implement diagnostics mechanism for trace streaming to disk and client

Action points wk44 2(2)

TN
- Develop LOCO test tool

IPT
- PCEP test commands

BBI
- Reduced fault slip-through, short-term branch?
- Finalize HiCAP trace feature

DUS Integration
- Have a clear list of requested support with responsible project/team
  - Followed up at project meetings
- Make sure that known test loops are used
  - BBI loop in EMCA
  - Sniffer (Wireshark) for outgoing IP traffic
  - Redirect to debug port

LRAT / WRAT
- Implement true MACI from IELL (not urgent)
- Improved tracing/protection on exceptions and abnormal use cases?

CA
- Maintain focus on debug support and RBS troubleshooting studies, even if they are not secured to LFD.

Way forward 42

- IELL and TN should look at the need for internal observability and test commands together with DUS integration.
- BBI should analyze why so many BBI faults have been found and consider possible improvements in local test coverage
- The E/// connection and SLA towards Wind River subcontractors in the troubleshooting area have to improve.
  - Very difficult right now
- Make an RCA on why fault correction takes so long
  - E.g. watchdog fault in CS
- Keep the baseline as stable as possible during integration

Wish list

- Endian fault troubleshooting. Tool or method to find endian faults in the code.
- Circular fault event buffer for transport faults, using COLI cmd
- An updated WoW that makes it easier to protect integration activities from Main Track impact.
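The circular fault event buffer on the wish list can be pictured as a fixed-size buffer that always keeps the most recent faults and overwrites the oldest ones. The following is a minimal Python sketch of the concept only. The class, field names and example fault texts are hypothetical, and a real implementation on target would be read out through a COLI command rather than a method call.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class FaultEvent:
    timestamp: float
    source: str
    description: str


class FaultEventBuffer:
    """Keeps the newest `capacity` fault events and counts overwritten ones."""

    def __init__(self, capacity: int) -> None:
        self._events: Deque[FaultEvent] = deque(maxlen=capacity)
        self.dropped = 0

    def record(self, source: str, description: str,
               timestamp: Optional[float] = None) -> None:
        # A full deque with maxlen silently discards its oldest entry on append.
        if len(self._events) == self._events.maxlen:
            self.dropped += 1
        when = time.time() if timestamp is None else timestamp
        self._events.append(FaultEvent(when, source, description))

    def dump(self) -> List[FaultEvent]:
        """Return the stored events, oldest first."""
        return list(self._events)


# Made-up fault events for illustration.
buffer = FaultEventBuffer(capacity=3)
for n in range(5):
    buffer.record("transport", f"fault {n}", timestamp=float(n))

for event in buffer.dump():
    print(event.timestamp, event.source, event.description)
print("dropped:", buffer.dropped)
```

Counting the overwritten events lets whoever reads the dump see whether the buffer was large enough to capture the whole fault sequence.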
