active/idle toggling with 0baseactive/idle … active/idle toggling with 0baseactive/idle toggling...
TRANSCRIPT
1
Active/Idle Toggling with 0BASEActive/Idle Toggling with 0BASEActive/Idle Toggling with 0BASEActive/Idle Toggling with 0BASE----x x x x for Energy Efficient Ethernetfor Energy Efficient Ethernetfor Energy Efficient Ethernetfor Energy Efficient Ethernet
November 2007IEEE 802.3az Task Force
Presenter: Robert HaysIntel Corporation
Contributors: Dave Chalupsky, Eric Mann, James Tsai, Aviad Wertheimer
2
AgendaAgendaAgendaAgenda
• GbE Controller Power & Energy Consumption
• Active/Idle Toggling with 0BASE-x Proposal
• Average Power Curves
• State Transitions
• Implications to 802.3 Specifications
• PHY Considerations
• Recommendations
3
Power & Energy GlossaryPower & Energy GlossaryPower & Energy GlossaryPower & Energy Glossary
• Operating PowerOperating PowerOperating PowerOperating Power – (Watts) The rate at which electrical energy is delivered to a circuit or system
– Peak PowerPeak PowerPeak PowerPeak Power – (Watts) Maximum operating power of a system
– Idle PowerIdle PowerIdle PowerIdle Power – (Watts) Operating power of a system at rest
• Energy Consumption Energy Consumption Energy Consumption Energy Consumption – (Joules) Aggregate power consumed by a system over a period of time.
– Energy Efficiency Energy Efficiency Energy Efficiency Energy Efficiency –––– (Joules/bit) Energy required to complete a unit of work. E.g. energy required to transmit/receive each bit of data.
– Average PowerAverage PowerAverage PowerAverage Power – (Watts) Energy consumed divided by the period measured
Peak Power
power
Idle Power
t0 t1
Average Power = Energy Consumption / (t1-t0)
Energy Consumption = ∫ power(t) dtt1
t0
Operating Power
4
GbE Controller Power MeasurementsGbE Controller Power MeasurementsGbE Controller Power MeasurementsGbE Controller Power Measurements
Observations:• 100M power is ~40% of 1000M power at 10% performance• 100M & 1000M Idle power savings (vs. Active) is 17-35%
– Savings come from PCIe L0s/L1 idle-state usage and turning off some circuits
• 10M Idle power savings is 60%– Additional savings from 10BASE-T idle signaling
• “No Link” is the lowest-power mode with the device still “on” (D0 state)– Savings come from turning off all unessential digital & analog circuits
Test Information:• Intel® 82573L Gigabit Ethernet
controller, including:– 10/100/1000 PHY, MAC, buffers,
PCI-Express x1 host interface– Typical client PC NIC device– .13µm fab process
• Idle = No traffic, 0 Mbps• Active = Line-rate bi-directional traffic• No Link = Cable removed (D0 state)
Single-Port PCI-e 10/100/1000 Controller
(MAC+PHY) Power Measurements
1010
53 314194
1217
483504
0
200
400
600
800
1000
1200
1400
No Link 10 100 1000
Link Rate (Mbps)
Avera
ge P
ow
er
(mW
)
Idle Active
Source: Intel labs
5
GbE Controller Energy ConsumptionGbE Controller Energy ConsumptionGbE Controller Energy ConsumptionGbE Controller Energy Consumption
Observations:
• 1000M transmission is 4x more energy efficient (J/bit) than 100Mbecause it sends the same amount of data in 1/10th the time
• 1000M is 40x more energy efficient than 10M
• An “ideal” EEE solution would transmit data with minimal energy and return to low-power idle between packet bursts– An Idle state would need to be defined by 802.3 for higher-speed PHYs
* Note: Bidirectional traffic. 10MB transferred in each direction.
Energy Required to Transmit 10MB
0.00
1.00
2.00
3.00
4.00
5.00
6.00
7.00
8.00
9.00
10 100 1000
Link Rate (Mbps)
Tra
nsfe
r T
ime (
s)
0.00
1.00
2.00
3.00
4.00
5.00
6.00
7.00
8.00
9.00
En
erg
y
Co
nsu
mp
tio
n (
J)
Transfer Time (s) Energy Consumption (J)
Source: Intel labs
*
6
Proposed Proposed Proposed Proposed ““““0BASE0BASE0BASE0BASE----xxxx”””” IdleIdleIdleIdle
• 0BASE-x is a “quiet” line idle that consumes minimum power– ‘0’ = zero data rate, ‘x’ = Variable for any supported PHY type
• 0BASE-x idle power expectations for a GbE device:– “No Link”≤ 0BASE-x ≤ 10BASE-T Idle (e.g. 53mW ≤ 0BASE-x ≤ 194mW)
– Est. it to be closer to “No Link”
• 0BASE-x could take form of:
– A newly defined idle signal or…
– Reduced-voltage 10BASE-T idle
• 0BASE-x variations are likely
– Support for varying PHY types with unique timing requirements
– Possibly multiple sleep levels with differing resume latencies
Single-Port PCI-e 10/100/1000 Controller
(MAC+PHY) Power Measurements
1010
53 314194
1217
483504
0
200
400
600
800
1000
1200
1400
No Link 10 100 1000
Link Rate (Mbps)
Avera
ge P
ow
er
(mW
)
Idle Active
Source: Intel labs
7
Active/Idle Toggling with 0BASEActive/Idle Toggling with 0BASEActive/Idle Toggling with 0BASEActive/Idle Toggling with 0BASE----x Conceptx Conceptx Conceptx Concept
• Principle: Transmit data at fastest rate then return to idle– Energy savings come from power cycling between active/idle states
• Active/Idle toggling could be used instead of PHY rate shifting– Offers the best energy efficiency on links with lower utilization
– Integrates well with existing PC power management schemes (e.g. ACPI)
– Clock & power gating (on/off) is easier than rate shifting
• Asymmetrical operation would provide even better energy efficiency– Each direction could enter active & idle states independently
– Most end-node traffic is heavily weighted toward either send or receive
– Tx & Rx data paths already operate independently above the PHY
Idle
Active
8
Behavioral ComparisonBehavioral ComparisonBehavioral ComparisonBehavioral Comparison
0BASE-x Idle
GbE Active
time
Idle IntervalActiveDelay
Active/Idle Toggling with 0BASEActive/Idle Toggling with 0BASEActive/Idle Toggling with 0BASEActive/Idle Toggling with 0BASE----x:x:x:x:
IdleDelay
power
GbE
time
Up-ShiftDelay
PHY Rate Shifting:PHY Rate Shifting:PHY Rate Shifting:PHY Rate Shifting:
Down-ShiftDelay
100Mb
power
Off (No Link)
On (GbE)
time
IPG
Typical NIC Today (On or Off):Typical NIC Today (On or Off):Typical NIC Today (On or Off):Typical NIC Today (On or Off):
Packets
power
Lo
we
r E
ne
rgy C
on
su
mp
tio
n
9
Conceptual Average Power vs. BW UtilizationConceptual Average Power vs. BW UtilizationConceptual Average Power vs. BW UtilizationConceptual Average Power vs. BW Utilization
• FastFastFastFast----StartStartStartStart offers course-grain power regulation via 10x PHY choices
• SubsetSubsetSubsetSubset----PHYPHYPHYPHY allows finer-grain power steps utilizing new PHY modes
• Active/Idle TogglingActive/Idle TogglingActive/Idle TogglingActive/Idle Toggling with 0BASE-x allows smooth power averaging and lowest energy consumption on underutilized connections
10 100 1000 10,000
1
6
2
3
4
5
Ave
rag
e P
ow
er (W
)
Bandwidth Utilization (Mbps)
Fast-Start
1G Subset-PHY
10G Subset-PHY
Active/Idle Toggling
10
Average Power vs. BW Utilization SimulationAverage Power vs. BW Utilization SimulationAverage Power vs. BW Utilization SimulationAverage Power vs. BW Utilization Simulation
•Active (Resume) Delay = 10us•Idle (Sleep) Wait Period = 10us•Idle (Sleep) Delay = 1us
Input Assumptions:•Traffic Input = Trace_VOIP_*.txt•1000Mbps Active Power = 1217mW•Idle Power = 65mW
Avg Power vs. Bandwidth Utilization Using Active/Idle Toggling
0
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
0.01% 0.10% 1.00% 10.00% 100.00%
Bandwidth Utilization
Av
era
ge
Po
we
r (m
W)
(1Gb/s)(100Mb/s)(10Mb/s)(1Mb/s)(100kb/s)
1000Mbps (1217mw)
Idle(65mw)
Source: Intel labs
11
Comparison to Existing GbE Controller PowerComparison to Existing GbE Controller PowerComparison to Existing GbE Controller PowerComparison to Existing GbE Controller Power
Key (Active / Idle Power Assumptions):Active/Idle Toggling Simulation (1217mW / 65mW)1000BASE-T Measurements (1217mW / 1010mW)100BASE-TX Measurements (483mW / 314mW)10BASE-T Measurements (504mW / 194mW)
Avg Power vs. Bandwidth Utilization Using Active/Idle Toggling
0
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
0.01% 0.10% 1.00% 10.00% 100.00%
Bandwidth Utilization
Av
era
ge
Po
we
r (m
W)
(1Gb/s)(100Mb/s)(10Mb/s)(1Mb/s)(100kb/s)
Source: Intel labs
12
Simulation Model ContributionSimulation Model ContributionSimulation Model ContributionSimulation Model Contribution
• Intel will contribute the ‘Average Power vs. Bandwidth Utilization’ simulation model to the IEEE 802.3az Task Force
• The “C” program source code and sample traffic pattern trace files will be posted on the EEE Tools web page:
– http://grouper.ieee.org/groups/802/3/az/public/tools/index.html
13
Active/Idle State TransitionsActive/Idle State TransitionsActive/Idle State TransitionsActive/Idle State Transitions
Tx_Idle Tx_Active
Rx_Active Rx_Idle
Link PartnerReceive data path goes to Idle when link partner has completed sending data
Rx_Idle
Link PartnerReceive data path resumes to Active when link partner wants to send data
Rx_Active
System Policy Manager (e.g. LAN Driver)
Transmit data path goes to Idle when there is no data to send
Tx_Idle
System Policy Manager (e.g. LAN Driver)
Transmit data path resumes to Active when the system wants to send data
Tx_Active
Transition InitiatorTransition InitiatorTransition InitiatorTransition InitiatorDescriptionDescriptionDescriptionDescriptionTransitionTransitionTransitionTransition
14
Computer Networking System Block DiagramComputer Networking System Block DiagramComputer Networking System Block DiagramComputer Networking System Block Diagram
Hybrid
DSP
MAC/PHY Interface
RxTx
Media-Dependant Interface
(e.g. Cat6 cable)
MAC/PHY Interface(e.g SGMII, XAUI)
Host Interface(e.g. PCI-Express)
I/O Memory Buffer
Memory Bus(e.g. DDR2)
CPU Interface(e.g. FSB)
Processor
Chipset
System Memory
Controller
PHY
MAC/PHY Interface
Ethernet MAC
Filters, Queues, & Protocol Offloads
Data Buffers
I/O Bus
I/O Bus
Memory Controller
CPU Core(s)
Software
OS Kernel
LAN Driver
15
Hybrid
DSP
MAC/PHY Interface
RxTx
Media-Dependant Interface
(e.g. Cat6 cable)
MAC/PHY Interface(e.g SGMII, XAUI)
Host Interface(e.g. PCI-Express)
I/O Memory Buffer
Memory Bus(e.g. DDR2)
CPU Interface(e.g. FSB)
Processor
Chipset
System Memory
Controller
PHY
MAC/PHY Interface
Ethernet MAC
Filters, Queues, & Protocol Offloads
Data Buffers
I/O Bus
I/O Bus
Memory Controller
CPU Core(s)
Software
OS Kernel
LAN Driver
Tx_ActiveTx_ActiveTx_ActiveTx_Active TransitionTransitionTransitionTransition1. LAN Driver manages Tx toggling
policy. Example: It monitors I/O Memory Buffer for data to reach a fill threshold, then initiates Tx_Active
2. LAN Driver tells Controller when to enter Tx_Active
3. Controller enters Tx_Active and begins buffering data
4. Controller tells PHY to resume Tx_Active and then sends data to PHY
5. Local PHY enters Tx_Active
6. Local PHY sends it’s link partner a “start-transmit”frame and begins transmitting data after a specified delay
Key:Vendor Dependent802.3az Specified
16
Hybrid
DSP
MAC/PHY Interface
RxTx
Media-Dependant Interface
(e.g. Cat6 cable)
MAC/PHY Interface(e.g SGMII, XAUI)
Host Interface(e.g. PCI-Express)
I/O Memory Buffer
Memory Bus(e.g. DDR2)
CPU Interface(e.g. FSB)
Processor
Chipset
System Memory
Controller
PHY
MAC/PHY Interface
Ethernet MAC
Filters, Queues, & Protocol Offloads
Data Buffers
I/O Bus
I/O Bus
Memory Controller
CPU Core(s)
Software
OS Kernel
LAN Driver
Tx_IdleTx_IdleTx_IdleTx_Idle TransitionTransitionTransitionTransition1. LAN Driver manages the Tx
toggling policy. Example: Itmonitors I/O Memory Buffer for it to empty, then initiates Tx_Idle
2. LAN Driver tells Controller when to enter Tx_Idle
3. Controller tells PHY to enter Tx_Idle
4. Controller enters Tx_Idle
5. Local PHY stops transmitting and enters Tx_Idle
Key:Vendor Dependent802.3az Specified
17
Hybrid
DSP
MAC/PHY Interface
RxTx
Media-Dependant Interface
(e.g. Cat6 cable)
MAC/PHY Interface(e.g SGMII, XAUI)
Host Interface(e.g. PCI-Express)
I/O Memory Buffer
Memory Bus(e.g. DDR2)
CPU Interface(e.g. FSB)
Processor
Chipset
System Memory
Controller
PHY
MAC/PHY Interface
Ethernet MAC
Filters, Queues, & Protocol Offloads
Data Buffers
I/O Bus
I/O Bus
Memory Controller
CPU Core(s)
Software
OS Kernel
LAN Driver
Rx_ActiveRx_ActiveRx_ActiveRx_Active TransitionTransitionTransitionTransition
1. Local PHY receives “start-transmit” frame from it’s link partner
2. PHY tells controller to enter Rx_Active
3. PHY enters Rx_Active and begins receiving data
4. Controller enters Rx_Activeand receives data
5. Controller places data into I/O Memory Buffer
Key:Vendor Dependent802.3az Specified
18
Hybrid
DSP
MAC/PHY Interface
RxTx
Media-Dependant Interface
(e.g. Cat6 cable)
MAC/PHY Interface(e.g SGMII, XAUI)
Host Interface(e.g. PCI-Express)
I/O Memory Buffer
Memory Bus(e.g. DDR2)
CPU Interface(e.g. FSB)
Processor
Chipset
System Memory
Controller
PHY
MAC/PHY Interface
Ethernet MAC
Filters, Queues, & Protocol Offloads
Data Buffers
I/O Bus
I/O Bus
Memory Controller
CPU Core(s)
Software
OS Kernel
LAN Driver
Rx_IdleRx_IdleRx_IdleRx_Idle TransitionTransitionTransitionTransition
1. Local PHY detects Idle signal and enters Rx_Idle
2. PHY tells controller to enter Rx_Idle
3. Controller enters Rx_Idle
Key:Vendor Dependent802.3az Specified
19
Hybrid
DSP
MAC/PHY Interface
RxTx
Media-Dependant Interface
(e.g. Cat6 cable)
MAC/PHY Interface(e.g SGMII, XAUI)
Host Interface(e.g. PCI-Express)
I/O Memory Buffer
Memory Bus(e.g. DDR2)
CPU Interface(e.g. FSB)
Processor
Chipset
System Memory
Controller
PHY
MAC/PHY Interface
Ethernet MAC
Filters, Queues, & Protocol Offloads
Data Buffers
I/O Bus
I/O Bus
Memory Controller
CPU Core(s)
Software
OS Kernel
LAN Driver
Implications to IEEE 802.3 SpecificationsImplications to IEEE 802.3 SpecificationsImplications to IEEE 802.3 SpecificationsImplications to IEEE 802.3 Specifications
Vendor dependent (unspecified):•Tx state transition control & policy•System interface & data movement
802.3 specifications:•0BASE-x signaling for each PHY type•Active/idle transition signals•Transition timing requirements•MAC/PHY interface control•Control/status registers for active & idle states•Auto-negotiation capability registers•Clause 30 MIB implications•Error detection & recovery
20
0BASE0BASE0BASE0BASE----x PHY Considerations*x PHY Considerations*x PHY Considerations*x PHY Considerations*
1. Idle power consumption– How low can Idle get for each PHY type?
2. Transition delays– How quickly can the PHY resume active operation?
3. Timing recovery– How to maintain link status and clock sync?
4. Asymmetric operation– How to support transmission one way while the other is idle?
5. Implementation cost & complexity– How would 0BASE-x compare to Fast-Start or Subset-PHY?
*Note: These considerations are addressed in zimmerman_01_1107.pdf
21
Recommendations to 802.3az Task ForceRecommendations to 802.3az Task ForceRecommendations to 802.3az Task ForceRecommendations to 802.3az Task Force
• Define a 0BASE-x Idle state for each supported PHY type in the IEEE 802.3az Objectives
• Consider Active/Idle Toggling with 0BASE-x as an alternative to PHY Rate Shifting for Energy Efficient Ethernet
22