Pushing Up Performance for Everyone Matt Mathis 7-Dec-99

Upload: gabriel-cadwell

Post on 01-Apr-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Pushing Up Performance for Everyone Matt Mathis 7-Dec-99

Pushing Up Performance for Everyone

Matt Mathis

7-Dec-99

Page 2

Why do so few people get good network performance?

• Context and history

• Architectural origins

• Approaches

Page 3

The Wizard Gap

[Figure: data rate (Mb/s, logarithmic axis from 0.1 to 1000) plotted against year, with series labeled Expert and Default]

Page 4

Past Performance Evolution

• Wizards wrote standards
  – Standard TCP could not go fast (1988)

• Wizards enhanced systems
  – Stock systems could not go fast (1995)

• Gurus tune systems (today)
  – Fast TCP is present
  – Badly mistuned by default

Page 5

Ongoing Performance Evolution

• More disciples tune and debug (tomorrow)
  – All netadmins and sysadmins?

• Systems are tuned by default (future)
  – Web100...

• Debugging will become “easy” (?)

Page 6

Architecture

• The Good news
  – TCP hides the net from the application

• The Bad news
  – TCP hides the net, including ALL bugs everywhere

• The only legal symptom is less than expected performance

Page 8

You get poor performance if:

– The application is inefficient
– TCP is buggy
– TCP is mistuned
– The path is buggy
– The path is congested
– Routing is suboptimal

Especially on a long path.
– Think: weakest link of an invisible chain

Page 9

Closing the Wizard gap

• Share the expertise
  – Train more disciples

• Require less expertise
  – Systems should tune themselves

• Better observability
  – Focused and efficient debugging

• Documentation
  – Show that the world is improving

Page 10

Share the expertise

• Joint Techs meetings

• TCP Tuning
  – In-depth presentation by Matt Mathis

• DAST Application tutorials
  – See: dast.nlanr.net

Page 11

Require less expertise

• TCP Autotuning
  – Presentation by Matt Mathis

• Web100
  – Presentation by Basil Irwin

• Online TCP debugging resources
  – See: http://www.ncne.nlanr.net/TCP

Page 12

Better Observability (Instrumentation)

• Network Instrumentation and Visualization
  – Presentation by Mark Gates

• Trace Analysis and Auto-Diagnosis
  – Presentation by Kathy Benninger

• Better TCP instrumentation (Web100)
  – Just ask TCP why it is slow

Page 13

Better Observability (Debugging methods)

• Sweden - Pittsburgh path
  – Presentation by Greg Miller & Jerry Sobieski

• iPerf tool
  – Presentation by Mark Gates

• Existing tools and tool repositories
  – See: http://www.ncne.nlanr.net/tools

• Still insufficient

Page 14

Better Observability (Measurement)

• Measurements from Seattle I2 Meeting
  – Presentation by Matt Zekauskas

• Advanced Research and Engineering Atlas
  – Presentation by John Jamison

• Many distributed measurement efforts
  – AMP, Surveyor, NIMI, etc.

Page 15

Documentation

• vBNS stats and measurement
  – Tutorial by Rick Wilder

• NLANR MOAT vBNS traffic on NAI
  – See: moat.nlanr.net

• Many benchmark efforts
  – Surveyor, AMP, NIMI, Web100...

• HPC host census(?)

Page 16

Conclusion

• We need to find every bug that TCP hides
  – Now and always

• We need to eliminate all irrelevant controls
  – Autotune TCP (and RED, etc.)

Page 17

Debugging flowchart

• http://www.ncne.nlanr.net/TCP/debugging

• Look at a trace and click to study symptoms

• Ongoing evolution

Page 18

Testrig kit

• "Foolproof" TCP diagnosis starter kit with:
  – Simple diagnostic application
  – TCP trace collection tools
  – Visualization tools
  – Pointer to the debugging flowchart

• With wrapper scripts around everything

Page 19

TCP Debugging In-depth

• Draft done at CAIDA this summer

• Future NCNE On-site
  – 1, 2.5 and 5 hour versions

• Basis for the debugging flowchart

• Update from flowchart as it evolves

• Interactive - Uses magicpoint/xplot

Page 20

Trace Analysis and Auto-Diagnosis (TAAD)

• Scan GigaPop traffic for mistuned TCP connections
  – that fail to meet the model

    rate = (MSS/RTT) * (C/sqrt(p))

• Running prototype

• Use to direct other resources
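
The model on this slide can be evaluated directly to flag connections that fall short of it. The following is a minimal Python sketch, not the TAAD prototype itself: the function names, the default C = sqrt(3/2) (a value commonly quoted for periodic loss without delayed ACKs), and the 50% tolerance are all assumptions made for illustration.

```python
import math


def model_rate_bps(mss_bytes, rtt_s, loss_prob, c=math.sqrt(3.0 / 2.0)):
    """Evaluate rate = (MSS/RTT) * (C/sqrt(p)) and return bits per second.

    mss_bytes: maximum segment size in bytes
    rtt_s:     round-trip time in seconds
    loss_prob: packet loss probability p, with 0 < p <= 1
    c:         model constant (assumed default, not given on the slide)
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("MSS and RTT must be positive")
    if not 0 < loss_prob <= 1:
        raise ValueError("loss probability must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))


def is_underperforming(observed_bps, mss_bytes, rtt_s, loss_prob, tolerance=0.5):
    """Return True if the observed rate is below tolerance * model rate.

    The tolerance is a hypothetical threshold chosen for this sketch.
    """
    return observed_bps < tolerance * model_rate_bps(mss_bytes, rtt_s, loss_prob)


if __name__ == "__main__":
    # Example path: 1460-byte MSS, 70 ms RTT, loss probability 1e-4.
    rate = model_rate_bps(1460, 0.07, 1e-4)
    print(f"model rate: {rate / 1e6:.1f} Mb/s")
    print("2 Mb/s flow flagged:", is_underperforming(2e6, 1460, 0.07, 1e-4))
```

For that example path the model predicts roughly 20 Mb/s, so a connection observed at 2 Mb/s would be a candidate for closer inspection.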

Page 21

Autotuning

• Make TCP “do the right thing” by default

• No unneeded user controls
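
The slide does not describe how autotuning works, so the following Python sketch only illustrates why a fixed default buffer is a problem: a TCP connection cannot send faster than one window per round trip, so the buffer needed to fill a path is its bandwidth-delay product. The function names and the example path figures are assumptions chosen for illustration.

```python
def bandwidth_delay_product_bytes(bottleneck_bps, rtt_s):
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    if bottleneck_bps <= 0 or rtt_s <= 0:
        raise ValueError("rate and RTT must be positive")
    return round(bottleneck_bps * rtt_s / 8)


def window_limited_rate_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    if window_bytes <= 0 or rtt_s <= 0:
        raise ValueError("window and RTT must be positive")
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.07  # assumed 70 ms round-trip time
    # 65535 bytes is the largest window TCP can advertise without window scaling.
    print(f"64 KB window: {window_limited_rate_bps(65535, rtt) / 1e6:.1f} Mb/s at most")
    # Assumed 622 Mb/s bottleneck, roughly an OC-12 line rate.
    print(f"buffer needed: {bandwidth_delay_product_bytes(622e6, rtt) / 1e6:.1f} MB")
```

On such a path an untuned 64 KB window caps throughput at a few Mb/s, while filling the link needs a buffer of several megabytes. Sizing that buffer automatically is the kind of control an autotuned TCP would take out of the user's hands.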

Page 22

Generate data points (AMP)

• Nearly 100 systems already

• Kernel TCP bug
  – Need to upgrade to FreeBSD 3.3

• Easy to create 100x1 data points

• Can create 100x100 data points

• Opportunity for NIMI

Page 23

Generate OC-12 data points

• Max Okumoto working at PSC for SDSC

• Will start tuning selected paths

Page 24

HPC Host Census

• Use existing data from MCI OC-Xmon

• Patterned after HWB big flow detection

• Measure the number of fast hosts

• Words needed to generalize to all of JET