Demystify the Impact of E2E Latency on Mobile QoE
Dr. Jie Hui ([email protected]), Principal Engineer
QoE Lab, T-Mobile U.S.A.
Jun 25, 2013
Acknowledgements to T-Mobile team and contributors: Kevin Lau, Himesh Bagley, Jeffery Smith, Pablo Tapia, Jason Chang, Aaron Drake, Antoine Tran
2 T-Mobile Proprietary and Confidential
Introduction of Me
15 years of experience in mobile networking R&D; passionate about Quality of User Experience (QoE) and E2E cross-layer optimizations
• QoS over UMTS, ZTE, 3G R&D
• QoS over Wi-Fi, NCSU, ECE
• QoS over Wi-Max, Intel Labs, Wireless Network Lab
• QoE over Cellular Network, T-Mobile, QoE Lab
Agenda
• Mobile trends and challenges
• Latency mysteries
• Call for actions
Mobile Trends and Challenges
Aggressive Mobile Data Growth
“2013 Internet Trends”, KPCB
“The State of the Internet”, Q4 2012, Akamai
Mobile voice traffic stays steady
Mobile data traffic grows exponentially
Faster Mobile Network Speed
“The State of the Internet”, Q4 2011/ 2012, Akamai
- Faster access technologies: GSM, EDGE, UMTS, HSPA, HSPA+, LTE
- Faster mobile connection speeds
[Figure: Average Speed, connection speed (Mbps), 2011 Q4 vs. 2012 Q4, for US-1, US-2, US-3]
[Figure: Peak Speed, connection speed (Mbps), 2011 Q4 vs. 2012 Q4, for US-1, US-2, US-3]
Faster Mobile Device Processing
http://www.qualcomm.com/snapdragon/processors/800-600-400-200/specs
Snapdragon platform examples
Is The Mobile Experience Faster?
Mobile web 30% faster
Desktop web 5.7% faster
“Building faster mobile websites”, Ilya Grigorik, Google
“Is the web getting faster?”, Google Analytics Blog
People expect the mobile experience to be as fast as desktop
Great, But There Are Still Challenges
“Is the web getting faster?”, Google Analytics Blog
Anomalies and long tail
Latency Mysteries
Mobile Browsing QoE Investigations
Mystery 1 – Long DNS Lookup Time
• Aggregated average DNS lookup (network side) = 222 ms
• Curious Georges looked into the distribution and outliers (10 s+) from the device side
[Figure: distribution of DNS lookup time; x-axis: DNS Lookup Time (ms), y-axis: Frequency]
Root Cause: Cross-layer Analysis
[Figure: timeline of radio state (Cell_PCH, Cell_FACH, Cell_DCH) with labeled packet events: DNS query, DNS response, corrupted DNS response]
Data Stall #1: The radio network controller corrupts downlink packets because keys go out of sync after a radio state transition (Cell_PCH → Cell_FACH)
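The cross-layer analysis above lines up packet events with radio states on a shared clock. A minimal sketch of that alignment, assuming a radio-state log and a packet trace that are already time-synchronized; the log format, the function names `state_at` and `label_events`, and all timestamps below are hypothetical and are not taken from the tooling used in the study:

```python
from bisect import bisect_right

# Hypothetical radio-state log: (start_time_s, state), sorted by start time.
RADIO_LOG = [
    (0.0, "Cell_PCH"),
    (30.0, "Cell_FACH"),
    (35.0, "Cell_DCH"),
    (50.0, "Cell_FACH"),
]


def state_at(radio_log, t):
    """Return the radio state in effect at time t (seconds)."""
    starts = [start for start, _ in radio_log]
    idx = bisect_right(starts, t) - 1
    if idx < 0:
        raise ValueError("timestamp precedes the radio log")
    return radio_log[idx][1]


def label_events(radio_log, events):
    """Attach the radio state to each (time_s, kind) packet event."""
    return [(t, kind, state_at(radio_log, t)) for t, kind in events]


# Hypothetical packet trace (illustrative values only).
EVENTS = [
    (29.5, "DNS query"),
    (30.2, "Corrupted DNS response"),
    (36.0, "DNS response"),
]

labeled = label_events(RADIO_LOG, EVENTS)
for t, kind, state in labeled:
    print(f"{t:6.1f}s  {state:10s}  {kind}")
```

In a labeled trace of this kind, corrupted responses that cluster just after a Cell_PCH to Cell_FACH transition would stand out.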
Improvements After Fix
[Figure: DNS Lookup Time (s) Before vs. After DataStall #1 Fix, by percentile (25%, 50%, 75%, 90%, 95%, 99%) and average; legend: Before, After]
3.7 s before vs. 0.7 s after
Average saving of 3 s [*] in the erroneous scenario, i.e. when the DNS lookup is initiated from the Cell_PCH state
[*] Note: results provided by device testing of 200+ samples.
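The before/after comparison reports percentiles as well as the average, because a few multi-second outliers can dominate the mean. A minimal sketch of such a summary, assuming nearest-rank percentiles (the slide does not say which percentile method was used); `summarize` and the sample values are hypothetical and are not the measured data:

```python
import statistics

PERCENTILES = (25, 50, 75, 90, 95, 99)


def summarize(samples):
    """Return nearest-rank percentiles and the mean of a list of times."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    summary = {}
    for p in PERCENTILES:
        # Nearest-rank method: rank = ceil(p * n / 100), in integer arithmetic.
        rank = max(1, -(-p * n // 100))
        summary[f"{p}%"] = ordered[rank - 1]
    summary["Average"] = statistics.fmean(ordered)
    return summary


# Synthetic lookup times in seconds (illustrative only).
before = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 20.0]
after = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]

for label, data in (("Before", before), ("After", after)):
    print(label, summarize(data))
```

In the synthetic "before" set the median is unaffected by the single 20 s outlier, while the average and the 95th and 99th percentiles expose it.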
Mystery 2 – Long TCP Connect Time
Delay up to 11 s
[Figure: TCP Connect Time (s) Before vs. After DataStall #2 Fix, by percentile (25%, 50%, 75%, 90%, 95%, 99%) and average; legend: Before, After]
0.95 s before vs. 0.8 s after: slight improvement
Data Stall #3
Root Cause: The radio network controller (RNC) scheduler unnecessarily buffers DL packets for up to 11 s due to a bug in the credit-based scheduling algorithm between NodeB and RNC, causing UL RLC retransmissions and RESETs.
[*] Note: results provided by device testing of 2000+ samples.
Mystery 3 – TCP Connect Time Still Long
[Figure: TCP Connect Time (s) Before DataStall #3 Fix, by percentile (25%, 50%, 75%, 90%, 95%, 99%) and average; legend: Before, After]
Root Cause: The radio network controller UL scheduler drops UL packets during channel switching from DCH to FACH; the same problem happens during HS/R99 and DCH/FACH switching
Waiting for fix
[*] Note: results provided by device testing of 2000+ samples.
Mystery 4 – Long HTTP Response Time
Keystroke HTTP response time
[Figure: Google Search Keystroke Response Time Correlation with Radio States; time axis 14:36:58 to 14:48:29; series: RadioState (Cell_DCH, Cell_FACH), Total RTT UE, Total RTT RAN, Total RTT Core, Delta UE-RAN, Delta RAN-Core]
High latency during low power state (Cell_FACH)
82% spent over the air
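The delta series in the figure split the device-side round-trip time by network segment. A minimal sketch of that decomposition, assuming each RTT is measured from the named vantage point (UE, RAN, core) toward the server, so that the differences isolate each segment; this interpretation, the function name `decompose_rtt`, and the numbers are assumptions for illustration, not the measured values:

```python
def decompose_rtt(rtt_ue, rtt_ran, rtt_core):
    """Split a device-side RTT (seconds) into per-segment contributions."""
    if rtt_ue <= 0 or not (rtt_ue >= rtt_ran >= rtt_core >= 0):
        raise ValueError("expected rtt_ue >= rtt_ran >= rtt_core >= 0, rtt_ue > 0")
    delta_ue_ran = rtt_ue - rtt_ran      # time spent over the air
    delta_ran_core = rtt_ran - rtt_core  # time spent between RAN and core
    return {
        "delta_ue_ran": delta_ue_ran,
        "delta_ran_core": delta_ran_core,
        "air_share": delta_ue_ran / rtt_ue,
    }


# Hypothetical RTTs in seconds (illustrative only).
result = decompose_rtt(rtt_ue=2.0, rtt_ran=0.5, rtt_core=0.25)
print(f"{result['air_share']:.0%} of the RTT is spent over the air")
```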
Cross-Layer Analysis After Search Traffic Change
[Figure: Keystroke HTTP Response Time Correlation with Radio States; time axis 11:52:48 to 11:57:07; series: RadioState, HTTP response time]
Keystroke HTTP requests and responses mostly happen in the high power state (Cell_DCH) on a dedicated channel due to the large packet size (~1200 B), and therefore experience a high data rate and low latency.
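The tuning above relies on the radio state being traffic dependent. A toy sketch of that idea, assuming the network promotes a device from Cell_FACH to Cell_DCH once the pending payload exceeds a buffer threshold; the 512-byte threshold, the function name `expected_state`, and the 200-byte payload are hypothetical, and real thresholds are operator-configured:

```python
# Hypothetical promotion threshold; not a value from the source.
HYPOTHETICAL_DCH_THRESHOLD_BYTES = 512


def expected_state(payload_bytes, threshold_bytes=HYPOTHETICAL_DCH_THRESHOLD_BYTES):
    """Guess which radio state a payload of this size would be served in."""
    if payload_bytes < 0:
        raise ValueError("payload size must be non-negative")
    return "Cell_DCH" if payload_bytes > threshold_bytes else "Cell_FACH"


# A small keystroke request vs. a ~1200 B request as described above.
for size in (200, 1200):
    print(f"{size:5d} B -> {expected_state(size)}")
```

Under this assumption, a request padded or batched to roughly 1200 B would be served on the dedicated channel, while a small request would stay on the shared low power channel.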
[Figure: Google Search Keystroke Response Time (s) Before vs. After Traffic Tuning, by percentile; legend: Before, After]
1.6 s before vs. 0.4 s after
Low HTTP response time
Misperception
Front End Back End
3G, 4G speed facial illustrations from http://www.intomobile.com/2012/04/18/tmobile-and-att-top-network-speed-tests/
Demystify
- Shared channel
- Power limitation
- Traffic dependent: you have the power to tune your traffic to be radio aware!
It is an elastic pipe!
Perhaps The Biggest Mystery Of All…
Call for Actions
Inspirations - Fast Web Evolution and Open Innovations
http://www.evolutionoftheweb.com
More Open Innovations & Inter-disciplinary R&D
CS (application, web development)
EE (telecommunication engineering)
[Diagram labels: Application Layer, Link Layer, CS, Cross Layer, EE]
Area 1: Transient State Performance and Resilience
Problem:
– Automatically characterize and instrument transient state mobile application performance
– Packet loss and retransmissions observed during transient states [*]
– Cross-layer analysis and optimizations are still not fully explored!
[Diagram labels: Cell_DCH (Connected Mode); Cell_FACH, Cell/URA_PCH/ (Idle Mode); Handover/Cell-Reselection; Link Failure; Mobility]
[*] “Discovering RRC Transient States, Behavior and Performance Implications in Cellular Networks”, Sanae Rosen, Haokun Luo, Z. Morley Mao, Jie Hui, Aaron Drake, Kevin Lau, submitted to IMC 2013
Area 2: Interactive Traffic Modeling and Scheduling
Problem:
– 3GPP radio scheduling is designed mostly with the Full Buffer Traffic model
– Higher layer scheduling (RNC, SGW) is not considered
“Traffic models impact on OFDMA scheduling design”, P Ameigeiras et al. EURASIP Journal on Wireless Communications and Networking 2012
[Diagram labels: Scheduler; NodeB, eNodeB; RNC; SGSN (MME); GGSN (SGW, PGW); Proxy; Over the air]
Evaluation assumption: NodeB or eNB queues always full of packets
Quality of User Experience (QoE) Lab
Unique approach
– Measure end user experiences from the device perspective
– Seek success by improving not only averages but also the spread and long tail
– Leverage key industry and academic partners for joint R&D and Real User Measurement (RUM)
– Analyze cross functionally to achieve cross layer intelligence and automation
[Diagram: Time Synchronized Cross Layer Analysis across Applications, O/S, Modem/AP, Radio Access Network, Core Network, Service Provider]