solros: a data-centric operating system architecture for heterogeneous...

53
Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing Changwoo Min, Woonhak Kang, Mohan Kumar, Sanidhya Kashyap, Steffen Maass, Heeseung Jo, Taesoo Kim Virginia Tech, eBay, Georgia Tech, Chonbuk National University April 26, 2018 Changwoo Min Solros: Data-Centric OS April 26, 2018 1 / 21

Upload: others

Post on 30-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Solros: A Data-Centric Operating System Architecturefor Heterogeneous Computing

Changwoo Min, Woonhak Kang, Mohan Kumar, Sanidhya Kashyap,Steffen Maass, Heeseung Jo, Taesoo Kim

Virginia Tech, eBay, Georgia Tech, Chonbuk National University

April 26, 2018

Changwoo Min Solros: Data-Centric OS April 26, 2018 1 / 21

Page 2: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Cambrian Explosion of Processor Architecture

Changwoo Min Solros: Data-Centric OS April 26, 2018 2 / 21

Specialization of general-purpose processors

Page 3: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Cambrian Explosion of Processor Architecture

Changwoo Min Solros: Data-Centric OS April 26, 2018 2 / 21

Specialization of general-purpose processors

Generalization of co-processors

Page 4: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Cambrian Explosion of Processor Architecture

Changwoo Min Solros: Data-Centric OS April 26, 2018 2 / 21

Specialization of general-purpose processors

Generalization of co-processors

Specialization of co-processors

Page 5: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Blazingly fast IO Devices

Changwoo Min Solros: Data-Centric OS April 26, 2018 3 / 21

Blazingly fast storage/memory

Page 6: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Blazingly fast IO Devices

Changwoo Min Solros: Data-Centric OS April 26, 2018 3 / 21

Blazingly fast storage/memory

Blazingly fast network

Page 7: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Blazingly fast IO Devices

Changwoo Min Solros: Data-Centric OS April 26, 2018 3 / 21

Blazingly fast storage/memory

Blazingly fast network

How to exploit the full potential of such hardware devices without pain?

System-wide performance

Ease of programming

Page 8: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Outline

1 Heterogeneous Computing Architectures

2 Solros: Split-Kernel ApproachSolros ArchitectureOperating System Services

3 Evaluation

Changwoo Min Solros: Data-Centric OS April 26, 2018 4 / 21

Page 9: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Host-Centric Approach

Host OS controls co-processors and IO devices

Examples: OpenCL, CUDA

Application

Host processor

OS

Mem Core

I/O deviceSSD / NIC

Co-processorApplication

Mem Core controldata

Changwoo Min Solros: Data-Centric OS April 26, 2018 5 / 21

Page 10: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Host-Centric Approach

Host OS controls co-processors and IO devices

Examples: OpenCL, CUDA

Application

Host processor

OS

Mem Core

I/O deviceSSD / NIC

Co-processorApplication

Mem Core

controldata

Changwoo Min Solros: Data-Centric OS April 26, 2018 5 / 21

Page 11: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Host-Centric Approach

Host OS controls co-processors and IO devices

Examples: OpenCL, CUDA

Application

Host processor

OS

Mem Core

I/O deviceSSD / NIC

Co-processorApplication

Mem Core

controldata

Changwoo Min Solros: Data-Centric OS April 26, 2018 5 / 21

Page 12: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Host-Centric Approach

Host OS controls co-processors and IO devices

Examples: OpenCL, CUDA

Application

Host processor

OS

Mem Core

I/O deviceSSD / NIC

Co-processorApplication

Mem Core

③ controldata

Changwoo Min Solros: Data-Centric OS April 26, 2018 5 / 21

Page 13: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Host-Centric Approach

Host OS controls co-processors and IO devices

Examples: OpenCL, CUDA

Application

Host processor

OS

Mem Core

I/O deviceSSD / NIC

Co-processorApplication

Mem Core

③ controldata

Problem

Redundant data communicationComplex to program and hard to optimize

Changwoo Min Solros: Data-Centric OS April 26, 2018 5 / 21

Page 14: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Coprocessor-Centric Architecture

Co-processors control IO devices

Examples: Xeon Phi (Linux), GPUfs [ASPLOS13], GPUNet [OSDI14]

Application

Host processor

Mem Core

I/O deviceSSD / NIC

Application

Co-processor

OS

Mem Core

OS

controldata

Changwoo Min Solros: Data-Centric OS April 26, 2018 6 / 21

Page 15: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Coprocessor-Centric Architecture

Co-processors control IO devices

Examples: Xeon Phi (Linux), GPUfs [ASPLOS13], GPUNet [OSDI14]

Application

Host processor

Mem Core

I/O deviceSSD / NIC

Application

Co-processor

OS

Mem Core

OS

controldata

Changwoo Min Solros: Data-Centric OS April 26, 2018 6 / 21

Page 16: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Coprocessor-Centric Architecture

Co-processors control IO devices

Examples: Xeon Phi (Linux), GPUfs [ASPLOS13], GPUNet [OSDI14]

Application

Host processor

Mem Core

I/O deviceSSD / NIC

Application

Co-processor

OS

Mem Core

OS

controldata

Changwoo Min Solros: Data-Centric OS April 26, 2018 6 / 21

Page 17: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Coprocessor-Centric Architecture

Co-processors control IO devices

Examples: Xeon Phi (Linux), GPUfs [ASPLOS13], GPUNet [OSDI14]

Application

Host processor

Mem Core

I/O deviceSSD / NIC

Application

Co-processor

OS

Mem Core

OS

controldata

Problem

Significant effort required for porting IO stack to co-processorNot completely exploiting powerful host processors

Changwoo Min Solros: Data-Centric OS April 26, 2018 6 / 21

Page 18: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Outline

1 Heterogeneous Computing Architectures

2 Solros: Split-Kernel ApproachSolros ArchitectureOperating System Services

3 Evaluation

Changwoo Min Solros: Data-Centric OS April 26, 2018 7 / 21

Page 19: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Solros Goal

Ease of programming

Best use of processor architecture

System-wide optimization

Changwoo Min Solros: Data-Centric OS April 26, 2018 8 / 21

Page 20: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Solros Goal

Ease of programming

Best use of processor architecture

System-wide optimization

Challenge

Co-processor needs IO abstraction

IO stacks is branch-divergent and difficult to parallelize

It needs system-wide information

Changwoo Min Solros: Data-Centric OS April 26, 2018 8 / 21

Page 21: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Solros Architecture

Split-Kernel Architecture

Data-plane OS

Runs on a co-processorProvides IO abstractionDelegates actual IO operations to a control-plane OS

Control-plane OS

Runs on a host processorRuns actual IO stackPerforms system-wide coordination

Changwoo Min Solros: Data-Centric OS April 26, 2018 9 / 21

Page 22: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Solros Architecture

Control-plane OS: actual OS service + system-wide coordination

Data-plane OS: thin communication layer to host processor

controldata

Application

Host processor

OS proxy

Mem Core

I/O deviceSSD / NIC

Application

Co-processor

OS stub

Core Mem

Policy

Changwoo Min Solros: Data-Centric OS April 26, 2018 10 / 21

Page 23: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Solros Architecture

Control-plane OS: actual OS service + system-wide coordination

Data-plane OS: thin communication layer to host processor

controldata

Application

Host processor

OS proxy

Mem Core

I/O deviceSSD / NIC

Application

Co-processor

OS stub

Core Mem

① Policy

Changwoo Min Solros: Data-Centric OS April 26, 2018 10 / 21

Page 24: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Solros Architecture

Control-plane OS: actual OS service + system-wide coordination

Data-plane OS: thin communication layer to host processor

controldata

Application

Host processor

OS proxy

Mem Core

I/O deviceSSD / NIC

Application

Co-processor

OS stub

Core Mem

Policy

Changwoo Min Solros: Data-Centric OS April 26, 2018 10 / 21

Page 25: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Solros Architecture

Control-plane OS: actual OS service + system-wide coordination

Data-plane OS: thin communication layer to host processor

controldata

Application

Host processor

OS proxy

Mem Core

I/O deviceSSD / NIC

Application

Co-processor

OS stub

Core Mem

②③

Policy

Changwoo Min Solros: Data-Centric OS April 26, 2018 10 / 21

Page 26: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Solros Architecture

Control-plane OS: actual OS service + system-wide coordination

Data-plane OS: thin communication layer to host processor

controldata

Application

Host processor

OS proxy

Mem Core

I/O deviceSSD / NIC

Application

Co-processor

OS stub

Core Mem

②③

Policy

Co-processor has OS abstraction with minimal effort

Best use of each of the fat and lean processors

Efficient global coordination among devices (policy)

Changwoo Min Solros: Data-Centric OS April 26, 2018 10 / 21

Page 27: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Operating System Services

1 Transport service

2 Filesystem service

3 Network service

Changwoo Min Solros: Data-Centric OS April 26, 2018 11 / 21

Page 28: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Operating System Services

1 Transport service

2 Filesystem service

3 Network service

Changwoo Min Solros: Data-Centric OS April 26, 2018 12 / 21

Page 29: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Transport Service

High performance data transfer among devices are challenging:

Uniform data transfer among devices

High contention in massively-parallel co-processor

Asymmetric performance between host processor and co-processor

Changwoo Min Solros: Data-Centric OS April 26, 2018 13 / 21

Page 30: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Transport Service

High performance data transfer among devices are challenging:

Uniform data transfer among devices

High contention in massively-parallel co-processor

Asymmetric performance between host processor and co-processor

Our approach

Uniform data transfer ⇒ system-mapped PCIe window

High contention ⇒ combining, replication, interleaving, etc.

Asymmetric performance ⇒ flexibly configurable (host DMA enginevs. co-processor DMA engine)

Changwoo Min Solros: Data-Centric OS April 26, 2018 13 / 21

Page 31: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Transport Service

High performance data transfer among devices are challenging:

Uniform data transfer among devices

High contention in massively-parallel co-processor

Asymmetric performance between host processor and co-processor

Our approach

Uniform data transfer ⇒ system-mapped PCIe window

High contention ⇒ combining, replication, interleaving, etc.

Asymmetric performance ⇒ flexibly configurable (host DMA enginevs. co-processor DMA engine)

See details in the paper

Changwoo Min Solros: Data-Centric OS April 26, 2018 13 / 21

Page 32: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub①

File system proxy

File system

PCIe

controldata

Zero-copy of data between co-processor memory and SSDMinimal data transfer

Changwoo Min Solros: Data-Centric OS April 26, 2018 14 / 21

Page 33: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub①

File system proxy

File system

PCIe

controldata

Zero-copy of data between co-processor memory and SSDMinimal data transfer

Changwoo Min Solros: Data-Centric OS April 26, 2018 14 / 21

Page 34: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub① ③

File system proxy

File system

PCIe

controldata

Zero-copy of data between co-processor memory and SSDMinimal data transfer

Changwoo Min Solros: Data-Centric OS April 26, 2018 14 / 21

Page 35: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub① ③

File system proxy

File system

PCIe

controldata

Zero-copy of data between co-processor memory and SSDMinimal data transfer

Changwoo Min Solros: Data-Centric OS April 26, 2018 14 / 21

Page 36: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub① ③

File system proxy

File system

PCIe

⑤ controldata

Zero-copy of data between co-processor memory and SSDMinimal data transfer

Changwoo Min Solros: Data-Centric OS April 26, 2018 14 / 21

Page 37: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub①

File system proxy

File system

PCIe

controldata

Buffer cache

Reduce storage IO by leveraging shared buffer cache among co-processorsAvoid performance anomaly of peer-to-peer communication over PCIe bus

Changwoo Min Solros: Data-Centric OS April 26, 2018 15 / 21

Page 38: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub① ③

File system proxy

Buffer cache

File system

PCIe

controldata

Reduce storage IO by leveraging shared buffer cache among co-processorsAvoid performance anomaly of peer-to-peer communication over PCIe bus

Changwoo Min Solros: Data-Centric OS April 26, 2018 15 / 21

Page 39: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub① ③

File system proxy

Buffer cache

File system

PCIe

controldata

Reduce storage IO by leveraging shared buffer cache among co-processorsAvoid performance anomaly of peer-to-peer communication over PCIe bus

Changwoo Min Solros: Data-Centric OS April 26, 2018 15 / 21

Page 40: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub① ③

File system proxy

Buffer cache

File system

PCIe

controldata

Reduce storage IO by leveraging shared buffer cache among co-processorsAvoid performance anomaly of peer-to-peer communication over PCIe bus

Changwoo Min Solros: Data-Centric OS April 26, 2018 15 / 21

Page 41: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Filesystem Service

Peer-to-peer operation

Buffered operation

Host processor

SSD

DMA engine

Application

Co-processor

File system stub① ③

File system proxy

Buffer cache

File system

PCIe

controldata

④⑥

Reduce storage IO by leveraging shared buffer cache among co-processorsAvoid performance anomaly of peer-to-peer communication over PCIe bus

Changwoo Min Solros: Data-Centric OS April 26, 2018 15 / 21

Page 42: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Implementation

Host: 2-socket Xeon processor (12 cores each)

Co-processor: 4 Xeon Phi (KNC, 61 cores, Linux, PCIe Gen 3x16)

Storage device: 4 NVMe SSD

NIC: 100 Gbps Ethernet

ModuleLines of code

Added lines Deleted lines

Transport service 1,035 365

File system ServiceStub 5,957 2,073Proxy 2,338 124

Network ServiceStub 2,921 79Proxy 5,609 34

NVMe device driver 924 25SCIF kernel module 60 14

Total 18,844 2,714

Changwoo Min Solros: Data-Centric OS April 26, 2018 16 / 21

Page 43: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Evaluation

Questions:

Performance of Solros services

Impact on real-world applications

Changwoo Min Solros: Data-Centric OS April 26, 2018 17 / 21

Page 44: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Performance of Solros Services

0

0.5

1

1.5

2

2.5

32KB

64KB

128KB

256KB

512KB

1MB

2MB

4MB

0

20

40

60

80

100

10 100 1000

GB

/se

c

(a) file random read on SSD

Phi-SolrosPhi-Linux

Per

cen

tag

eo

fre

qu

ests

(%)

Latency(usec)

(b) TCP latency: 64-byte message

Phi-SolrosPhi-Linux

File IO performance: 19x faster than the stock Linux on Xeon PhiTCP latency (99 percentile): 7x shorter than the stock Linux on Xeon Phi

Changwoo Min Solros: Data-Centric OS April 26, 2018 18 / 21

Page 45: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Performance of Solros Services

0

2

4

6

Phi-Linux Phi-Solros0

2

4

6

8

10

Phi-Linux Phi-Solros

La

ten

cy(m

sec)

(a) file random read on SSD

StorageBlock/Transport

File system

(b) TCP latency: 64-byte message

Proxy/TransportNetwork stack

Significant performance gain in data transportRunning IO stack on co-processor is slower

Changwoo Min Solros: Data-Centric OS April 26, 2018 19 / 21

Page 46: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Real-world Application - Image Search

Image search engine is running on Xeon Phi

Image database is on NVMe SSD (shared read-only)

Image search queries are from network

0

100

200

300

400

0

100

200

0 100 200 300

Ela

pse

dT

ime

(sec

)

Phi-SolrosPhi-Linux

Ba

nd

wid

th(M

B/

s)

Time(sec)

Phi-SolrosPhi-Linux

Solros performs 2x faster than stock Linux on Xeon Phi

Changwoo Min Solros: Data-Centric OS April 26, 2018 20 / 21

Page 47: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Conclusion

Solros, a new operating system architecture for co-processors and fastIO devices

Control-plane, data-plane architecture allow:

Supporting high-level OS abstraction on co-processorEfficient global coordination among devicesNear ideal IO performance from co-processor

We will release source code soon

Changwoo Min Solros: Data-Centric OS April 26, 2018 21 / 21

Page 48: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Image Search - Scalability

Increase the number of Xeon Phi to 4

0

1

2

3

4

1 2 3 4

Nor

ma

lize

dsp

eed

up

The number of Xeon Phi co-processors

Solros load balancing mechanism achieves near linear scaling

Changwoo Min Solros: Data-Centric OS April 26, 2018 22 / 21

Page 49: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Related work

Control-plane/data-plane OS: Arrakis [OSDI’14], IX [OSDI’14]

OS for heterogeneous systems: Helios [SOSP]09], M3 [ASPLOS’16],Hydra [ASPLOS’08]

IO support for GPU: PTask [SOSP’11], GPUfs [ASPLOS’13], GPUnet[OSDI’14]

Changwoo Min Solros: Data-Centric OS April 26, 2018 23 / 21

Page 50: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Transport service

Host's physical address space

Host RAM PCIe Window mmio

DRAM

NICDRAM mmio

Xeon PhiDRAM mmio

NVMe

Host's virtualaddress space

App 1 App 2

...

mmioDevice's physical

address sapce

mapped to

(across devices)

mapped to

(in-host)

Changwoo Min Solros: Data-Centric OS April 26, 2018 24 / 21

Page 51: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Network Service (TCP)

Outbound operation

Inbound operation

Host processor

NIC

Application

Co-processor

TCP stub

TCP proxy

Load balancer

TCP stack

PCIe

controldata

A load balancer on a host distributes incoming TCP connections to one ofleast-loaded co-processors.

See details in the paper.

Changwoo Min Solros: Data-Centric OS April 26, 2018 25 / 21

Page 52: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Discussion

Hardware support other than Xeon Phi

Two atomic instructions: transport serviceMMU: isolation among co-processor applications

Scalability of control-plane OS

Limited by scalability of OS service, PCIe interconnect, andperformance of IO devices

Changwoo Min Solros: Data-Centric OS April 26, 2018 26 / 21

Page 53: Solros: A Data-Centric Operating System Architecture for Heterogeneous Computingsolros-slides.pdf · 2019. 7. 17. · Solros: A Data-Centric Operating System Architecture for Heterogeneous

Real-world Application - Text Search

CLucene text indexing engine running on Xeon Phi

Text data is on NVMe SSD

0

500

1000

1500

2000

2500

3000

0.0

0.5

1.0

1.5

0 50 100 150

Ela

pse

dT

ime

(sec

)

Phi-SolrosPhi-Linux

Ba

nd

wid

th(G

B/

s)

Time(sec)

Phi-SolrosPhi-Linux (virtio)

Solros performs 19x faster than stock Linux (ext4/virtio) on Xeon Phi

Changwoo Min Solros: Data-Centric OS April 26, 2018 27 / 21