[231] The simplicity of cluster apps with Circuit
The simplicity of implementing a job scheduler with Circuit
DEVIEW 2015, Seoul, South Korea
Petar Maymounkov
Circuit: Light-weight cluster OS
● Real-time API to see and control:
  ○ Hosts, processes, containers
● System never fails
  ○ API endpoint on every host
  ○ Robust master-less, peer-to-peer membership protocol
API: Model of cluster
[Diagram: processes Proc 1 through Proc 8 running on Host 1 through Host 4]
API: Abstraction
[Diagram: the same processes Proc 1 through Proc 8 and hosts Host 1 through Host 4, grouped under Host 1, Host 2, etc.]
Cluster DOM operations:
● Traversal
● Manipulation
● Notifications
API: Command-line example
[Diagram: Host A runs mysql and hdfs; Host B runs apache/php and memcache. Anchor names shown: fs, db, cache, http. Host IDs shown: X85ec139ad, X68a30645c7]
$ circuit ls -l /...
server /X85ec139ad
server /X85ec139ad/fs
proc /X85ec139ad/db
server /X68a30645c
proc /X68a30645c/cache
docker /X68a30645c/http
API: Command-line example
$ circuit mkproc /X85ec/test <<EOF
{"Path":"/sbin/lscpu"}
EOF
$ circuit stdout /X85ec/test
Architecture: x86_64
etc.
$ circuit wait /X85ec/db
$ circuit stderr /X68a3
etc.
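The JSON document piped to `circuit mkproc` above can also be produced from a program. The sketch below is only an illustration: `procSpec` and `encodeSpec` are hypothetical names, only the `Path` field appears on the slide, and `Args` is an assumed extra field that is omitted when empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// procSpec mirrors the JSON document passed to "circuit mkproc" on the slide.
// Only Path appears in the source; Args is an assumed extra field.
type procSpec struct {
	Path string   `json:"Path"`
	Args []string `json:"Args,omitempty"`
}

// encodeSpec renders the spec as a single-line JSON string.
func encodeSpec(s procSpec) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	out, err := encodeSpec(procSpec{Path: "/sbin/lscpu"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The resulting string could then be written to the standard input of the command-line tool, as the heredoc does on the slide.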
System architecture: Boot individual hosts
[Diagram: Host A, Host B and Host C, each running a circuit server]
$ circuit start
System architecture: Discovery
[Diagram: Host A, Host B and Host C, each running a circuit server]
$ circuit start --discover=228.8.8.8:8822
System architecture: API endpoints
[Diagram: Host A, Host B and Host C, each running a circuit server that exposes an API endpoint]
$ circuit start
circuit://127.0.0.1:7822/17880/Q413a079318a275ca
System architecture: Client connections
[Diagram: Host A and Host B, each running a circuit server with a colocated circuit client; one label reads "Eng Notebook"]
$ circuit start
circuit://127.0.0.1:7822/17880/Q413a079318a275ca
App-to-circuit connection never fails, because of colocation.
Go: Entrypoint
import . "github.com/gocircuit/circuit/client"

func Dial(circuitAddr string, crypto []byte) *Client

func DialDiscover(udpMulticast string, crypto []byte) *Client
Go: Error handling
● Physical errors are panics
● Application errors are returned values

func Dial(circuitAddr string, crypto []byte) *Client
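Because physical errors surface as panics, a caller that prefers ordinary error values can wrap the call in a deferred `recover`. The following is a minimal sketch of that pattern using only the standard library; `dialFunc`, `safeDial` and `flakyDial` are hypothetical stand-ins and not part of the circuit client.

```go
package main

import (
	"errors"
	"fmt"
)

// dialFunc stands in for a call such as Dial that panics on physical errors.
// It is a hypothetical placeholder, not the circuit client API.
type dialFunc func(addr string) string

// safeDial runs dial and converts a panic into an ordinary error value.
func safeDial(dial dialFunc, addr string) (conn string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("physical error: %v", r)
		}
	}()
	return dial(addr), nil
}

// flakyDial is a hypothetical dialer that panics when the address is empty.
func flakyDial(addr string) string {
	if addr == "" {
		panic(errors.New("no route to circuit"))
	}
	return "connected to " + addr
}

func main() {
	if _, err := safeDial(flakyDial, ""); err != nil {
		fmt.Println("recovered:", err)
	}
	conn, _ := safeDial(flakyDial, "127.0.0.1:7822")
	fmt.Println(conn)
}
```

The named result `err` is what lets the deferred function report the recovered panic to the caller.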
Go: Anchor hierarchy
[Diagram: tree of anchors rooted at /. Paths shown: /X114cd3eba3423596, /Xfea8b5b798f2fc09, /X45af9c2b9c9ab78b, /X114cd3eba3423596/db, /X114cd3eba3423596/jobs, /X45af9c2b9c9ab78b/nginx, /X45af9c2b9c9ab78b/nodejs, /X45af9c2b9c9ab78b/dns. Object labels shown: Server (three times), Container (twice), Process, DNS]
An anchor is a node in a unified namespace
An anchor is like a “directory”. It can have child anchors
An anchor is like a “variable”. It can be empty or hold an object (server, process, container, etc.)
Client connection is the root anchor
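The directory-and-variable analogy can be made concrete with a toy in-memory tree. This is only a sketch of the idea, not the circuit implementation: the `anchor` type and its `Set` method are hypothetical, and `View` here returns sorted child names rather than a map.

```go
package main

import (
	"fmt"
	"sort"
)

// anchor is a toy, in-memory stand-in for a circuit anchor.
type anchor struct {
	children map[string]*anchor
	value    interface{} // nil means the anchor is empty
}

func newAnchor() *anchor { return &anchor{children: map[string]*anchor{}} }

// Walk descends along path, creating missing anchors on the way.
func (a *anchor) Walk(path []string) *anchor {
	cur := a
	for _, name := range path {
		next, ok := cur.children[name]
		if !ok {
			next = newAnchor()
			cur.children[name] = next
		}
		cur = next
	}
	return cur
}

// View lists the names of the child anchors in sorted order.
func (a *anchor) View() []string {
	names := make([]string, 0, len(a.children))
	for name := range a.children {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Set is a hypothetical helper that stores an object in the anchor.
func (a *anchor) Set(v interface{}) { a.value = v }

// Get returns the stored object, or nil if the anchor is empty.
func (a *anchor) Get() interface{} { return a.value }

// Scrub empties the anchor but keeps its children.
func (a *anchor) Scrub() { a.value = nil }

func main() {
	root := newAnchor() // plays the role of the client connection
	root.Walk([]string{"X114cd3eba3423596", "db"}).Set("process")
	root.Walk([]string{"X114cd3eba3423596", "jobs"})
	fmt.Println(root.Walk([]string{"X114cd3eba3423596"}).View())
	fmt.Println(root.Walk([]string{"X114cd3eba3423596", "db"}).Get())
}
```

Walking to a path that does not exist yet simply creates empty anchors, which matches the idea that an anchor can be empty or hold an object.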
Go: Anchor API
[Diagram: root anchor / with three server anchors: /Xfea8b5b798f2fc09, /X45af9c2b9c9ab78b, /X114cd3eba3423596]
type Anchor interface {
	Walk(path []string) Anchor
	View() map[string]Anchor
	MakeProc(Spec) (Proc, error)
	MakeDocker(Spec) (Container, error)
	Get() interface{}
	Scrub()
}

type Proc interface {
	Peek() (State, error)
	Wait() error
	…
}
Go: Anchor residence
[Diagram: the anchor tree rooted at /, divided by residence. Anchors /X114cd3eba3423596, /X114cd3eba3423596/db and /X114cd3eba3423596/jobs are marked "Residing on host: /X114cd3eba3423596". Anchors /X45af9c2b9c9ab78b, /X45af9c2b9c9ab78b/nginx, /X45af9c2b9c9ab78b/nodejs and /X45af9c2b9c9ab78b/dns are marked "Residing on host: /X45af9c2b9c9ab78b". Object labels shown: Server (twice), Container (twice), Process, DNS]
Scheduler: Design spec
● User specifies:
  ○ Maximum number of jobs per host
  ○ Address of circuit server to connect to
● HTTP API server:
  ○ Add job by name and command spec
  ○ Show status
Scheduler: Service architecture
[Diagram: Host A, Host B and Host C each run a circuit server; the scheduler process (sched) exposes an HTTP API; Job1, Job2 and Job3 run on the hosts]
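Scheduler: State and logic
[Diagram: events from USER (Add job, Show status) and from CIRCUIT (Job exited, Host joined/left) reach the Controller, which holds the State and calls schedule, which starts jobs (Start job)]
State:
● Live hosts and the jobs they are running: H1 (J1, J2, J3), H3 (J4, J7), H5 (J8)
● Jobs waiting to be run: Pend (J5, J6)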
Scheduler: Main
import (
	"flag"
	"log"
	"net/http"

	"github.com/gocircuit/circuit/client"
)

var flagAddr = flag.String("addr", "", "Circuit to connect to")
var flagMaxJobs = flag.Int("maxjob", 2, "Max jobs per host")

func main() {
	flag.Parse()
	// Physical errors, such as a failed connection, surface as panics.
	defer func() {
		if r := recover(); r != nil {
			log.Fatalf("Could not connect to circuit: %v", r)
		}
	}()
	conn := client.Dial(*flagAddr, nil)
	controller := NewController(conn, *flagMaxJobs)
	… // Link controller methods to HTTP request handlers.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
Scheduler: Controller state
type Controller struct {
	client         *client.Client
	maxJobsPerHost int
	sync.Mutex     // Protects state fields.
	jobName        map[string]struct{}
	worker         map[string]*worker
	pending        []*job
}

type worker struct {
	name string
	job  []*job
}

type job struct {
	name string
	cmd  client.Cmd
}
Scheduler: Start controller
func NewController(conn *client.Client, maxjob int) *Controller {
	c := &Controller{…} // Initialize fields.
	// Subscribe to notifications of hosts joining and leaving.
	c.subscribeToJoin()
	c.subscribeToLeave()
	return c
}
Scheduler: Subscribe to host join/leave events
func (c *Controller) subscribeToJoin() {
	a := c.client.Walk([]string{c.client.ServerID(), "controller", "join"})
	a.Scrub()
	onJoin, err := a.MakeOnJoin()
	if err != nil {
		log.Fatalf("Another controller running on this circuit server.")
	}
	go func() {
		for {
			x, ok := onJoin.Consume()
			if !ok {
				log.Fatal("Circuit disappeared.")
			}
			c.workerJoined(x.(string)) // Update state.
		}
	}()
}
Place subscription on the server the scheduler is connected to.
Pick a namespace for the scheduler service.
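The consume loop above follows a common pattern: block for the next event and stop when the source reports that it is gone. The sketch below models that pattern with a plain channel; `subscription` and `collectJoins` are hypothetical names, and the channel only imitates the `Consume` behaviour shown on the slide.

```go
package main

import "fmt"

// subscription is a hypothetical stand-in for the object returned by MakeOnJoin.
type subscription struct {
	events chan interface{}
}

// Consume blocks for the next event; ok is false once the source is gone.
func (s *subscription) Consume() (interface{}, bool) {
	v, ok := <-s.events
	return v, ok
}

// collectJoins consumes events until the subscription ends and returns the
// host names seen, ignoring events that are not strings.
func collectJoins(s *subscription) []string {
	var hosts []string
	for {
		x, ok := s.Consume()
		if !ok {
			return hosts
		}
		if name, isString := x.(string); isString {
			hosts = append(hosts, name)
		}
	}
}

func main() {
	s := &subscription{events: make(chan interface{}, 2)}
	s.events <- "/X114cd3eba3423596"
	s.events <- "/X45af9c2b9c9ab78b"
	close(s.events) // simulates the circuit disappearing
	fmt.Println(collectJoins(s))
}
```

Unlike the slide, this version uses a checked type assertion, so an unexpected event type is skipped instead of causing a panic.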
Scheduler: Handle host join/leave
func (c *Controller) workerJoined(name string) {
	c.Lock()
	defer c.Unlock()
	… // Update state structure. Add worker to map with no jobs.
	go c.schedule()
}

func (c *Controller) workerLeft(name string) {
	c.Lock()
	defer c.Unlock()
	… // Update state structure. Remove worker from map.
	go c.schedule()
}
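Scheduler: So far ...
[Diagram: the state-and-logic diagram again. USER events (Add job, Show status) and CIRCUIT events (Job exited, Host joined/left) reach the Controller, which holds the State and calls schedule (Start job)]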
Scheduler: User requests
func (c *Controller) AddJob(name string, cmd client.Cmd) {
	c.Lock()
	defer c.Unlock()
	… // Update state structure. Add job to pending queue.
	go c.schedule()
}

func (c *Controller) Status() string {
	c.Lock()
	defer c.Unlock()
	… // Print out state to string.
}
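Scheduler: So far ...
[Diagram: the state-and-logic diagram again. USER events (Add job, Show status) and CIRCUIT events (Job exited, Host joined/left) reach the Controller, which holds the State and calls schedule (Start job)]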
Scheduler: Controller state
func (c *Controller) schedule() {
	c.Lock()
	defer c.Unlock()
	// Compute job-to-worker matching.
	var matches []*match = …
	for _, m := range matches {
		… // Mark job as running in worker.
		go c.runJob(m.job, m.worker)
	}
}

type match struct {
	*job    // Job from pending
	*worker // Worker below capacity
}
[Diagram: live hosts and the jobs they are running, plus the Pend queue of jobs waiting to be run. Labels shown, in order: H1, J1, J2, H3, J4, H5, J8, Pend, Ja, Jb, J9, J6, J3, J7]
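The slide leaves the matching computation elided. One simple possibility, shown here purely as an illustration and not as the talk's actual algorithm, is a greedy pass that hands pending jobs to any worker below the per-host limit. The names `assignment` and `matchJobs` are hypothetical, and jobs and workers are reduced to strings.

```go
package main

import (
	"fmt"
	"sort"
)

// assignment pairs a pending job with the worker chosen to run it.
type assignment struct {
	job, worker string
}

// matchJobs greedily assigns pending jobs to workers that are below maxPerHost.
// running maps a worker name to the number of jobs it currently runs.
// It returns the assignments and the jobs that remain pending.
func matchJobs(pending []string, running map[string]int, maxPerHost int) ([]assignment, []string) {
	// Visit workers in sorted order so the result is deterministic.
	workers := make([]string, 0, len(running))
	for w := range running {
		workers = append(workers, w)
	}
	sort.Strings(workers)

	// Copy the load so the caller's map is left untouched.
	load := make(map[string]int, len(running))
	for w, n := range running {
		load[w] = n
	}

	var out []assignment
	next := 0 // index of the first job not yet assigned
	for _, w := range workers {
		for load[w] < maxPerHost && next < len(pending) {
			out = append(out, assignment{job: pending[next], worker: w})
			load[w]++
			next++
		}
	}
	return out, pending[next:]
}

func main() {
	running := map[string]int{"H1": 2, "H3": 1, "H5": 1}
	matched, left := matchJobs([]string{"Ja", "Jb", "J9"}, running, 2)
	fmt.Println(matched, left)
}
```

A greedy pass fills the first workers before later ones; a scheduler that wants an even spread could instead always pick the least-loaded worker.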
Scheduler: Run a job
func (c *Controller) runJob(job *job, worker *worker) {
	defer func() {
		if r := recover(); r != nil { // Worker died before job completed.
			… // Put job back on pending queue.
		}
	}()
	jobAnchor := c.client.Walk([]string{worker.name, "job", job.name})
	proc, err := jobAnchor.MakeProc(job.cmd)
	… // Handle error, another process already running.
	proc.Stdin().Close()
	go func() { // Drain Stdout. Do the same for Stderr.
		defer func() { recover() }() // In case of worker failure.
		io.Copy(drain{}, proc.Stdout())
	}()
	_, err = proc.Wait()
	… // Mark complete or put back on pending queue.
}
Place process on desired host. Pick namespace for jobs.
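Scheduler: Demo.
[Diagram: the state-and-logic diagram again. USER events (Add job, Show status) and CIRCUIT events (Job exited, Host joined/left) reach the Controller, which holds the State and calls schedule (Start job)]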
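Recursive processes: Execution tree
[Diagram: Host A, Host B and Host C running master, sched1, sched2 and Job1 through Job5, with two users. The corresponding execution tree has master at the root, sched1 and sched2 below it, and Job1 through Job5 as leaves]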
Universal distribution
[Diagram: one binary shown in the roles master, sched, job1 and job2, packaged together with a circuit server and deployed on Host A through Host D next to other infrastructure such as mysql]
One Go binary that takes on different roles
Package binary & circuit in an executable container image
Customer runs container alongside other infrastructure with isolation
The vision forward
● Easy (only?) way to share any cloud system
  ○ “Ship with Circuit included” vs “Hadoop required”?
  ○ Circuit binary + Your binary = Arbitrarily complex cloud
  ○ Like Erlang/OTP but language agnostic