Networking and Distributed Systems Research at MSR India
Chandu Thekkath
Managing Director MSR-India
Outline
• Last-mile access in the developing world
• How do you provide ubiquitous WiFi-like access in the presence of infrastructural constraints?
• Security in the cloud
• How do you execute a program securely in the cloud with an untrusted cloud provider?
• Job scheduling in the cloud
• How do you manage the compute and storage resources in a cloud in real time with competing jobs?
Last-mile Access Project Greenspaces.
Krishna Kant Chintalapudi, Deeparnab Chakrabarty, Bozidar Radunovic, Ramachandran Ramjee, Vidya Natampally, Apurv Bhartia (Meraki/Cisco)
Enabling unlicensed wireless wide area networks has
the potential for immense societal and economic
impact on developing nations by providing
broadband connectivity to billions
Imagine… Ubiquitous and affordable WiFi access
• Suppose that
• There were WiFi APs everywhere
• Mobile devices (phones, laptops) could be equipped with such WiFi
• Campus-wide coverage in educational institutions
• Outdoor rural broadband coverage
• Cheap urban outdoor broadband coverage
•Current WiFi solutions are typically very limited in reach
•Outdoor broadband access is either not affordable or not available in many countries
• Minimal or no rural wireless broadband coverage
• Sub-gigahertz frequencies enable long-distance wireless connectivity
•Multiple challenges: technology, manufacturers, governmental policies
Today’s Developing World : Broadband Wireless Access
Government, Spectrum Policy and Regulatory Bodies
Manufacturers
R&D , Technology and Standards Bodies
Success Depends on Synergy of Three Entities
Policy and Regulation : White Spaces Ruling in US (2008)
Observation
• TV spectrum is severely under-utilized in the US and Europe
FCC (US) Ruling 2008
• Unused 6 MHz channels can be used
• Provides a database lookup into which channels can be used
Extreme Spectrum Leakage Regulations
• 35 dB (4000 times) higher decay requirement over Industry standards (WiFi/LTE) to protect adjacent TV channels
• Increases manufacturing cost by 65% and also increases power consumption significantly
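As a quick check on the decibel figure above, a power ratio in dB converts to a linear factor as 10^(dB/10). The sketch below only illustrates that standard conversion; the function name is ours. Under this conversion 35 dB is a factor of about 3,160 and 36 dB about 3,980, so the "4000 times" on the slide reads as a rounded figure.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a decibel value to a linear power ratio: 10 ** (dB / 10)."""
    return 10 ** (db / 10)


for db in (30, 35, 36):
    # Show how a few decibel values map to linear factors.
    print(f"{db} dB -> about {db_to_power_ratio(db):.0f}x")
```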
Over-Conservative Spectrum Database
• New York, Los Angeles: 0 channels
• Chicago, Seattle, etc.: about 20-30 MHz
Project Green Spaces : An Opportunity to Leap Frog
MG Road, Bangalore, India
• South Africa: UHF is 80% unused[1]
• Venezuela: 66%-96% unused [2]
• Malaysia: about 50MHz used [3]
• Developing nations are different
• Spectrum availability in urban and rural areas is abundant
[1] Table 14.2, TV White Spaces: A Pragmatic Approach, Dec 2013
[2] Spectrum occupancy at UHF TV band for cognitive radio applications, IEEE RFM, 2011
[3] http://www.nsf.gov/mps/ast/ears/1212340MacMullan_pres.pdf
• Create a new set of regulations and a standard for developing nations
• Reflects the fact that there are few or no active incumbents
• Make it attractive for OEMs to manufacture WiFi-like end-consumer devices
• Spectrum mask similar to WiFi or LTE and no harsher
• Abundant contiguous spectrum, no strict need for a database
• Huge potential market in developing nations
[Figure labels: Cellular, FM, TV]
Long Range is a Double Edged Sword
• For cost effectiveness in rural and developing economies
• Need fewer access points and long-range sub-GHz transmissions
• Much Higher Inter-Access Point Interference
• Access Points are located high up and face little or no obstacles
• Access Points have higher transmit power than clients
• Interference from 700-2500m (Based on measurements we did in Cambridge)
Sharing in Time - Carrier Sense Multiple Access - CSMA
[Figure: an access point (AP) and two clients (C1, C2)]
• Listen Before Transmit (FCC 2.4 GHz Regulation)
• Random Wait Before Transmit: Provides fair access for each device
• Share: Each device gets a minimum of a 1/N share, where N is the number of devices
• Completely decentralized
Sharing in Frequency - Frequency (Channel) Selection
• Suppose there are three WiFi networks each with an AP and its clients
• Each network gets 1/3 share
• Now suppose there are three channels
• Each network can take a different channel
• More capacity for each network
[Figure: three WiFi networks, each with an access point (AP1, AP2, AP3) and its clients]
Frequency Selection is Much Harder than Time Sharing
No centralized coordinator
• Each AP decides on its own : which channel is the best to operate on?
• No Global View : AP in one channel cannot sense on other channels
• None specified by the WiFi Standard
Existing Proprietary Frequency Selection Mechanisms
• Periodically, the AP and its clients scan all channels
• Snoop traffic to determine the number of other devices sharing, amount of traffic, etc.
• The AP aggregates all these snooping measurements, ranks all channels, and determines the “best” channel
• The AP then asks all its clients to use this channel
• Examples: White-Fi
Problems with Channel Measurement Based Approaches
Measurement Overhead
• Scanning all channels and aggregating measurements from all clients can be a significant overhead
• To amortize this overhead, scanning is infrequent, so these schemes cannot adapt quickly to traffic dynamics
Wireless Effects that are Hard to Quantify
• Not all interference is bad, but it is hard to distinguish harmless from harmful interference, which leads to over-conservative estimates
• The wireless channel itself may be inferior, leading to packet losses
Oscillations
• If everyone goes to the least congested channel, the congestion bottleneck will simply shift!
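The oscillation problem can be illustrated with a deliberately worst-case toy model. The sketch assumes every AP scans at the same moment, acts on the same snapshot, and breaks ties toward the lowest channel index; none of these details come from the slides, and the function names are hypothetical.

```python
from collections import Counter


def greedy_round(assignment, num_channels):
    """Every AP moves to the channel that looked least loaded in the last scan."""
    load = Counter(assignment)  # channels with no APs count as load 0
    best = min(range(num_channels), key=lambda ch: (load[ch], ch))
    return [best] * len(assignment)


def simulate(num_aps=6, num_channels=3, rounds=4):
    """Return the channel assignment after each synchronized re-selection round."""
    assignment = [0] * num_aps  # all APs start on channel 0
    history = [assignment]
    for _ in range(rounds):
        assignment = greedy_round(assignment, num_channels)
        history.append(assignment)
    return history


for step, channels in enumerate(simulate()):
    print(step, channels)
```

In this toy setup the whole population moves together, so the crowded channel changes from round to round while the congestion itself never goes away.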
IQ-Hopping : A Fundamentally New Approach
•Does not perform channel measurement
• No measurement overhead
•Adapts to changing conditions
• Adaptation takes a second or a few seconds
•End-to-end solution
• Accounts for all measurable and immeasurable effects
•Provable guarantees similar to a centralized scheme
•Very simple to implement
Ineffective Time Quantum (IQ)-Hopping
IQ-Hopping Algorithm
1. Select a random channel
2. Generate deadline $\tau = \mathrm{Exp}(\alpha)$
3. Track $\tau_{\text{wasted}}$, the ineffective time
4. If $\tau_{\text{wasted}} > \tau$, go to 1.
• Ineffective time = time when you have packets to transmit but can’t
• Waiting to gain access because others are using the channel
• Packet loss due to collisions or bad wireless conditions
• Packet loss due to interference from hidden sources
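The four steps can be sketched as a small slotted simulation. This is a minimal sketch under assumptions the slides do not state: every AP always has packets to send, exactly one AP on a shared channel transmits per slot while the others wait, collisions and hidden sources are ignored, and α is treated as the rate of the exponential distribution (mean deadline 1/α). The function name, parameters, and default values are hypothetical.

```python
import random


def iq_hopping(num_aps=10, num_channels=10, alpha=5.0, slot=0.01,
               max_time=600.0, seed=0):
    """Return (settle_time, hops, channels); settle_time is None if never settled."""
    rng = random.Random(seed)
    channel = [0] * num_aps                          # all APs start on channel 0
    deadline = [rng.expovariate(alpha) for _ in range(num_aps)]  # step 2
    wasted = [0.0] * num_aps                         # step 3: ineffective time
    hops = 0
    t = 0.0
    while t < max_time:
        occupants = {}
        for ap, ch in enumerate(channel):
            occupants.setdefault(ch, []).append(ap)
        if all(len(aps) == 1 for aps in occupants.values()):
            return t, hops, channel                  # every AP has a private channel
        for aps in occupants.values():
            if len(aps) == 1:
                continue                             # sole user: no ineffective time
            winner = rng.choice(aps)                 # one AP transmits in this slot
            for ap in aps:
                if ap != winner:
                    wasted[ap] += slot               # the others wait for access
        for ap in range(num_aps):
            if wasted[ap] > deadline[ap]:            # step 4: budget used up, hop
                channel[ap] = rng.randrange(num_channels)    # step 1 again
                deadline[ap] = rng.expovariate(alpha)
                wasted[ap] = 0.0
                hops += 1
        t += slot
    return None, hops, channel


settle_time, hops, channels = iq_hopping()
print(settle_time, hops, sorted(channels))
```

An AP that is alone on its channel accrues no ineffective time in this model, so it never hops again unless another AP lands on its channel.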
IQ-Hopping – Surprisingly Similar to Random Hopping
Random-Hopping
• Select a random channel
• Generate a deadline $\tau = \mathrm{Exp}(\alpha)$
• After $\tau$ seconds, select a new random channel
IQ-Hopping
• Select a random channel
• Generate a deadline $\tau = \mathrm{Exp}(\alpha)$
• After $\tau$ seconds of ineffective time, select a new random channel
• Random hopping with time ticking only when it is being wasted!
• Random hopping is never used in practice since it is extremely inefficient
Example of Its Working (10 APs, 10 Channels)
• All APs start on channel 0 initially
• IQ-Hopping: all APs settle on a unique channel within 10 seconds
• Random-Hopping: APs keep hopping constantly; average aggregate capacity is 61%
What if the Number of Channels is Fewer?
• The APs keep hopping around.
• Let $x_i(c)$ be a node’s aggregate throughput when there are $c$ channels
• Normalized throughput at $c$ channels = $x_i(c) / x_i(10)$
• Jain’s Fairness Index = $\left(\sum_{i=1}^{10} x_i(c)\right)^2 / \left(10 \sum_{i=1}^{10} x_i(c)^2\right)$
• Provides full and fair utilization even when the number of channels is fewer
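Jain's fairness index is the standard ratio (Σx)² / (n·Σx²): it equals 1 when all nodes get the same throughput and 1/n when a single node gets everything. A minimal sketch, with made-up throughput values rather than numbers from the talk:

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        raise ValueError("at least one throughput must be non-zero")
    return total * total / (len(throughputs) * squares)


# Hypothetical per-node throughputs for 10 nodes (illustrative values only).
equal_share = [5.0] * 10
one_node_hogs = [50.0] + [0.0] * 9

print(jain_index(equal_share))    # perfectly fair allocation
print(jain_index(one_node_hogs))  # one node takes all the capacity
```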
Theorems on Optimality and Convergence
• $K$ = number of channels; $N$ = number of nodes in a collision domain
• Theorem 1: For $K \ge N$, within an expected number of hops $K \ln \frac{K}{K-N+1}$, IQ-Hopping converges to a state where each node has its own private channel. As $\frac{K}{N}$ increases, the expected number of hops tends to $N$.
In English: If there are sufficient channels, each AP will find a unique channel quickly
• Theorem 2: When $K < N$, for any channel, the number of nodes utilizing that channel converges to the stationary distribution $1 + \mathrm{Binomial}(N-K, \frac{1}{K})$. In particular, this implies that for any channel, with high probability (say $\ge 99.99\%$), the number of nodes transmitting in that channel is within $\frac{N}{K} \pm 6\sqrt{\frac{N}{K}}$.
In English: If not, all nodes will keep hopping, but the nodes will be evenly distributed across the channels
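To get a feel for the numbers in the two theorems, the sketch below evaluates them for a few values. It assumes the Theorem 1 expression reads K·ln(K / (K − N + 1)), which is the reading consistent with the hop count approaching roughly N when K is much larger than N, and that the Theorem 2 range is N/K ± 6·√(N/K). Both readings are reconstructions of the slide text, and the function names are ours.

```python
import math


def expected_hops(K, N):
    """Theorem 1 expression, read as K * ln(K / (K - N + 1)); requires K >= N."""
    if K < N:
        raise ValueError("Theorem 1 applies only when K >= N")
    return K * math.log(K / (K - N + 1))


def nodes_per_channel_range(K, N):
    """Theorem 2 range, read as N/K +/- 6 * sqrt(N/K); intended for K < N."""
    mean = N / K
    spread = 6 * math.sqrt(mean)
    return mean - spread, mean + spread


for K in (10, 20, 100, 1000):
    print(K, round(expected_hops(K, 10), 2))   # N = 10 nodes

print(nodes_per_channel_range(K=10, N=1000))   # many more nodes than channels
```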
The Cloud Computing Landscape
•Computing has evolved from mainframes, to PCs, to a system of devices and cloud
•Big-data is an essential characteristic of cloud systems
•Programming cloud systems gives rise to interesting new challenges
• Data parallelism
• Data security
• Performance estimation and scheduling
Cloud Security The Gryffindor Approach
Sriram Rajamani, Manuel Costa, Ramarathnam Venkatesan, Kapil Vaswani
Cloud providers would like to say the following…
“No one, and this includes <Microsoft/Amazon/Google>, the government, and hackers, can access the data without the customer’s permission”
Approach:
Encrypt all data!
Challenges:
How do we manage keys?
How can we do computation?
The Value Proposition
•Customers’ data is safe in the cloud, even from:
• Hackers who exploit OS vulnerabilities to break into the cloud (e.g., Azure)
• Malicious cloud employees
• Government agencies who try to strong-arm the cloud provider
•Service-Level Agreements (SLAs), easy to hard:
• Level 1: Encryption at rest and in transit
• Level 2: Encryption at rest and in transit, with key protection
• Level 3: Complete encryption: at rest, in transit, and during computation
Our Goal
•Ensure confidentiality and integrity of data in the cloud
• Protect data during computation
• Good performance for general-purpose workloads
• Small Trusted Computing Base (TCB) (e.g., operating system out of the TCB)
•Context
• Trusted hardware will be commonplace (Intel SGX, FPGAs, HSMs, etc.)
• Virtual Secure Mode (VSM) enables hypervisor-based implementation
World view
[Figure: a service made up of trusted components (T1, T2, T3) and untrusted components (U1, U2, U3, U4)]
•Every service contains trusted and untrusted components
•Data is encrypted in untrusted components
• Keys are available only in “trusted red rectangles” inside trusted components
•Trust model?
• How to design/implement the “trusted components”?
• How do untrusted and trusted components interact securely?
• How to get end-to-end security guarantees?
Trust Model
[Figure: three stacks, each showing Hardware, Hypervisor, Operating System, and Apps, captioned "TCB Today", "TCB with Gryffindor (using hypervisor)", and "TCB with Gryffindor (using only hardware)". TCB = Trusted Computing Base, indicated by red-dotted rectangles.]
Intel Software Guard Extensions (SGX)
[Figure: SGX CPU; labels: cores, cache, private key, system memory, encrypted data]
• Provides isolated execution of user-mode code in enclaves
• Hardware guarantees isolation and integrity of code and data inside the enclave, without trusting the host OS
• Remote attestation enables establishing trust with application code running inside an SGX enclave in an untrusted environment
[Figure: an app split into an untrusted part and a trusted part; flow labels: Create Enclave, Call Trusted Func., Execute, Return (etc.), Call Gate; privileged system code (OS, VMM, BIOS, SMM, …); sample values "SSN: 999-84-2611" and "m8U3bcV#zP49Q"]
1. App is built with trusted and untrusted parts
2. App runs and creates an enclave, which is placed in trusted memory
3. Trusted function is called; code running inside the enclave sees data in the clear; external access to data is denied
4. Function returns; enclave data remains in trusted memory
Gryffindor: Software Layers
• Applications and services (Hadoop, Key Manager, Cosmos, C# Apps): re-factored and re-designed to provide confidentiality and integrity
• Runtime: CLR, crypto libraries, memory management, etc.
• Secure hardware or software with small TCB: SGX, FPGA, TZ, VSM
Application: Trusted Hadoop
[Figure: data and code flow from Azure Storage into an Azure Map/Reduce node, where map() and reduce() run within the Hadoop framework (untrusted), and output goes back to Azure Storage; marked "Level 3"]
• Protect the data analytics functions and data
• We also guarantee integrity: protocol ensures correctness of the results
Trusted Hadoop Performance
[Figure: bar chart of relative runtime for IoVolumes and Options, Baseline vs. Trusted Hadoop]
• Two types of applications (numbers from our SGX emulator):
• IoVolumes: processes logs of a large cluster for billing (light computation)
• Options: calculates prices for stock options (heavy computation)
• Runtime overhead ranges from 0% to 24%
Application: Trusted SQL
[Figure: query processing with SQL Azure, showing the three statements below; marked "Level 3"]

SELECT SUM(Balance)
FROM Accounts
WHERE Branch = 'Seattle'

INSERT INTO @TempTable
SELECT Balance
FROM Accounts
WHERE Branch = '343fe32435c342'

SELECT SUM(Decrypt(k, Balance))
FROM @TempTable
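The split between the three statements can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the system described in the talk: the slides do not say how the encrypted branch literal is derived or where Decrypt runs, so the sketch uses an HMAC as a deterministic equality token and a toy XOR pad that is NOT secure and stands in for real encryption. All names and data are hypothetical.

```python
import hashlib
import hmac
import os


def token(key: bytes, value: str) -> str:
    """Deterministic token so the untrusted side can test equality (assumption)."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:14]


def encrypt(key: bytes, amount: int) -> bytes:
    """Toy cipher for illustration only: XOR with a hash-derived pad. Not secure."""
    nonce = os.urandom(8)
    pad = hashlib.sha256(key + nonce).digest()[:8]
    body = bytes(a ^ b for a, b in zip(amount.to_bytes(8, "big"), pad))
    return nonce + body


def decrypt(key: bytes, blob: bytes) -> int:
    """Inverse of the toy cipher above."""
    nonce, body = blob[:8], blob[8:]
    pad = hashlib.sha256(key + nonce).digest()[:8]
    return int.from_bytes(bytes(a ^ b for a, b in zip(body, pad)), "big")


def untrusted_filter(table, branch_token):
    """Runs without the key: selects encrypted balances by matching tokens."""
    return [enc for tok, enc in table if tok == branch_token]


def trusted_sum(key, temp_table):
    """Runs where the key is available: decrypts and aggregates."""
    return sum(decrypt(key, enc) for enc in temp_table)


key = b"demo-key-held-only-by-trusted-code"
accounts = [("Seattle", 100), ("Seattle", 250), ("Boston", 75)]   # made-up rows
encrypted_table = [(token(key, b), encrypt(key, bal)) for b, bal in accounts]

temp_table = untrusted_filter(encrypted_table, token(key, "Seattle"))
print(trusted_sum(key, temp_table))
```

The untrusted filter sees only tokens and ciphertext; the key is used only by the code that builds the table and computes the sum.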
What is new from a research perspective?
How to construct systems that run in an untrusted execution environment and reason about end-to-end security
• Felix Schuster, Manuel Costa, Cedric Fournet, Christos Gkantsidis, Marcus Peinado, Gloria Mainar-Ruiz, and Mark Russinovich, VC3: Trustworthy Data Analytics in the Cloud, To appear in IEEE S&P (Oakland) 2015
• Rohit Sinha, Sriram Rajamani, Sanjit Seshia, Kapil Vaswani, A Moat For Secure Enclaves, under submission
• Andrew Baumann, Marcus Peinado, and Galen Hunt, Shielding Applications from an Untrusted Cloud with Haven, OSDI 2014
• Arvind Arasu, Spyros Blanas, Ken Eguro, Raghav Kaushik, Donald Kossmann, Ravi Ramamurthy, and Ramarathnam Venkatesan, Orthogonal Security With Cipherbase, 6th Biennial Conference on Innovative Data Systems Research (CIDR'13), 2013
Open research questions
•Language design
• How can we design a language where the programmer marks secrets and the compiler splits the program into trusted and untrusted parts?
• Support secure multiparty computation
•Verification
• Layered verification: (1) verify the secure hardware; (2) verify the region runtime
• Ensure isolation between user code and region runtime (using compiler)
• Refinement to reduce size of TCB
•Preventing leakage through side channels
•Debugging, monitoring and diagnostics