
Linux Cluster Architecture

Linux Users Group, October 5th, 2002
Copyright © 2002 Alexander Vrenios

by Alex Vrenios
mailto://[email protected]

(Shameless Plug)


Overview:

• Why would anyone want to build a cluster system?

• Computer Architecture Review: UPs through Clusters

• Gathering the PC computer hardware (on the cheap!)

• Connecting the node computers into a local area network

• Configuring relevant Linux OS files for internetworking

• Client-Services and sockets make PCs work as a team

• The design of our simple master-slave cluster server

• Internal and external performance monitoring and tuning


Why would anyone want to build a cluster system?

• Hobbyists: It’s a new and interesting pathway to experience; and how many of your friends have a cluster server anyway?

• Professionals: Sophisticated systems are often developed in parallel, meaning the hardware won’t be ready when you want to test your software. Having a test bed will get you past the hardware-independent bugs, and put you in a position to polish your product when the platform is finally ready.

• Managers: This is all bleeding-edge stuff; you’ll want to prepare for the issues your people might face and the questions they might ask. Experience gives you the insight you’ll need.

• Academics: Analyze data from a live system, instead of questionable and potentially over-simplified simulation output.


Computer Architecture Review:

• Uniprocessor or UP

[Diagram: a CPU (instruction processor, arithmetic and logic) and RAM (instructions and data) joined by a data bus, with input and output I/O ports.]

SISD*: Single instruction, single data stream.

The typical PC is a uniprocessor.

* Flynn proposed this taxonomy - some other configurations follow…


• Array or Vector Processor

[Diagram: a controller sends instructions over an instruction bus to CPU0, CPU1, . . . CPUN; each CPU works on its own data, A[0], B[0] through A[n], B[n].]

SIMD: Single instruction, multiple data streams.

(ILLIAC IV, IBM 390, DSPs, etc.)

• Pipeline Processor

[Diagram: a CPU containing a pipe (confluent instruction execution), joined by a data bus to RAM holding instructions and data.]

MISD: Multiple instruction, single data stream?

(Some say there is no MISD.)

• Multiprocessor or MP

[Diagram: CPU0, CPU1, . . . CPUN share one RAM, holding instructions and data, over a data bus.]

MIMD: Multiple instruction, multiple data streams.

Mainframe, Workstation, etc. (Mostly for the very wealthy!)


• The MIMD is so interesting that it gets its own taxonomy:

MULTICOMPUTER (Loosely-Coupled)

NORMA: No (hardware) Remote Memory Access (rare distinction)

[Diagram: CPU0, CPU1, . . . CPUN, each with its own instructions and data, joined by a local area network.]

Older Beowulf Clusters, Distributed Shared Memory Systems (IVY), and some modern-day cluster computers.

PCs, whose “personality” may be molded by their software!

NUMA: Non-uniform memory access

[Diagram: CPU0, CPU1, . . . CPUN, each with its own instructions and data, joined by a high-speed back plane.]

VME Chassis, some Beowulf Clusters, and many embedded processor systems.

Plug-in boards, for example, each with a CPU and some local memory.

MP

UMA: Uniform Memory Access

[Diagram: CPU0, CPU1, . . . CPUN share one RAM, holding instructions and data, over a data bus.]

Tightly-coupled multiprocessor. All CPUs access instructions and data at the same transfer rate.


Gathering PC Computer Hardware:

• Small computer stores (Renaissance Computer, e.g.)

• Newspaper and club and organization newsletter ads

• Family, friends and neighbors (closets, garage sales)

• Large corporations? (hospitals, Am Exp, Mot, etc.)

• Computer salvage outlets:

ASU Salvage

[Map: street labels University, Rio Salado, and Pima 101.]


Connecting the Node PCs into a LAN:

[Diagram: the chaos.org nodes alpha (10.0.0.1), beta (10.0.0.2), . . . omega (10.0.0.5), each cabled to a hub. Labels: IP address, network interface, cables.]

[Diagram: PC rear view showing built-in and add-on network interface ports (10/100 Base T) with an RJ-45 jack, next to RJ-45 plugs and hub ports.]


Network Block Diagram:

CHAOS: CHeap Array of Outmoded Systems

[Diagram: the multicomputer server is nine nodes on one Ethernet, with an NFS label:
  - eight ‘386 nodes running Linux rh4.2, with 10MB, 12MB, 12MB, 13MB, 13MB, 13MB, 16MB, and 13MB of RAM
  - one 32MB ‘486 (p75 Equiv.) running Linux rh4.2
Also on the Ethernet: an interactive client (client queries & responses), an external monitor, and a desktop PC (real-time performance).]


Configuring a Linux Network – Local User Files:

/ (the root directory)
/home
/home/chief          .rhosts (others)
/home/chief/bin      pgm
/home/chief/src      pgm.c, makefile
/home/chief/inc      pgm.h

alpha:/home/chief/src> make pgm

makefile:

pgm:
	gcc -I../inc/ -o ../bin/pgm pgm.c


Configuring a Linux Network – Remote User Files:

• Network File System: the illusion of locality via remote-mount points

[Diagram: alpha (10.0.0.1, NFS server, /dev/hda) and omega (10.0.0.5, NFS client, /dev/hda) are joined through the chaos.org hub.
  - On alpha: /, /home, /home/chief, /home/chief/bin, /home/chief/src, /home/chief/inc
  - On omega: /, /home, /home/chief
Label: adduser.]


Configuring a Linux Network:

• File /etc/hosts.equiv on every cluster node:

alpha.chaos.org chief
. . .
omega.chaos.org chief

• File /home/chief/.rhosts in the user’s home directory:

alpha.chaos.org chief
. . .
omega.chaos.org chief

• Test access using rsh, a remote shell command:

alpha:/home/chief> rsh omega

> Note that this, and what follows, may lead to a SECURITY leak!


Configuring a Linux Network (continued):

• First, file /etc/hosts belongs on all the network nodes:

127.0.0.1   localhost   localhost.chaos.org
10.0.0.1    alpha       alpha.chaos.org
. . .
10.0.0.5    omega       omega.chaos.org

• Server: file /etc/exports on 10.0.0.1, the NFS server named alpha:

/home (rw)

• Clients: file /etc/fstab on each cluster node except the server named alpha:

/dev/hda1     swap          swap   defaults   0 0
/dev/hda2     /             ext2   defaults   1 1
alpha:/home   /home         nfs    rw         0 0
/dev/fd0      /mnt/floppy   ext2   noauto     0 0
none          /proc         proc   defaults   0 0


Internetworking Services – Operation:

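[Diagram: on the local machine, the keyboard input "rcat myfile remote" runs rcat, which reaches the rcatd service on the remote machine through a UDP or TCP socket over the network. The remote side uses cat on myfile, and the result comes back as screen output on the local machine.]

Local Machine> rcat myfile remote
First line in myfile
Second line, etc.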


Internetworking services – Configuration (inetd):

• Add a line to file /etc/services on each remote-server node:

rcatd 5000/udp # remote-cat UDP service on port 5000

• Add a line to file /etc/inetd.conf on each remote-server node:

rcatd dgram udp wait chief /home/chief/bin/rcatd

(The first field, rcatd, refers to the entry in services.)

• [Reconfiguration if necessary] omega:/root> killall -HUP inetd

Sequence of events:

1. Client process sends a UDP packet to server’s port 5000
2. Daemon (inetd) starts process at /home/chief/bin/rcatd
3. Service reads incoming UDP packet data from “keyboard” (its standard input)
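The request and reply exchange behind rcat can be sketched in a few lines. This is a minimal illustration in Python, not the book's code: the function names `serve_once`, `rcat`, and `demo` are hypothetical, the server here binds its own socket in a thread instead of being started by inetd, and it uses a free loopback port where the slide uses port 5000.

```python
import os
import socket
import tempfile
import threading


def serve_once(sock: socket.socket) -> None:
    """Answer one request: the datagram names a file, the reply is its content."""
    data, client = sock.recvfrom(1024)
    try:
        with open(data.decode(), "rb") as f:
            reply = f.read(60000)  # keep the reply within one datagram
    except OSError as err:
        reply = f"rcatd: {err}".encode()
    sock.sendto(reply, client)


def rcat(filename: str, host: str, port: int, timeout: float = 2.0) -> str:
    """Client side: send the file name, wait for one reply datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(filename.encode(), (host, port))
        reply, _ = s.recvfrom(65535)
    return reply.decode()


def demo() -> str:
    """Run server and client on the loopback interface and return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("First line in myfile\nSecond line, etc.\n")
    try:
        return rcat(f.name, "127.0.0.1", port)
    finally:
        worker.join()
        server.close()
        os.remove(f.name)


if __name__ == "__main__":
    print(demo(), end="")
```

Under inetd the service would not create a socket at all; it would read the datagram from its standard input, which is step 3 in the sequence above.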


Internetworking Services – Configuration (xinetd):

• File /etc/xinetd.d/rcatd on each (xinetd) remote-server node:

service rcatd
{
    port        = 5000
    socket_type = dgram
    protocol    = udp
    wait        = yes
    user        = chief
    server      = /home/chief/bin/rcatd
    only_from   = 10.0.0.0
    disable     = no
}

(rcatd refers to the name of the service; only_from = 10.0.0.0 means 10.0.0.*.)

• [Reconfiguration] omega:/root> /etc/rc.d/init.d/xinetd restart
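To make the address filter concrete, here is a small Python sketch of the check the slide describes. It takes the slide's reading at face value, that is, 10.0.0.* treated as the network 10.0.0.0/24. It does not reimplement xinetd's own matching rules, and the name `allowed` is hypothetical.

```python
import ipaddress

# Assumption: the slide's "10.0.0.*" is modelled as the /24 network below.
CLUSTER_NET = ipaddress.ip_network("10.0.0.0/24")


def allowed(addr: str) -> bool:
    """Return True if addr falls inside the cluster's address range."""
    return ipaddress.ip_address(addr) in CLUSTER_NET


if __name__ == "__main__":
    for host in ("10.0.0.1", "10.0.0.5", "10.0.1.5", "192.168.1.1"):
        print(host, allowed(host))
```

All five cluster nodes (10.0.0.1 through 10.0.0.5) pass this check, while a desktop on another network would be refused.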


Distributed Systems C-Language Skills:

[Diagram: four panels titled SUBTASKING, INTERNETWORKING, NETWORK SERVICES, and SIGNAL HANDLING. Each shows a main process and a subtask. Other labels: sockets, shared memory, network, remote process, remote service, inetd, SIGALRM, SIGCHLD.]

Many examples in the book!


Master-Slave Cluster Server - Initialization:

Broadcast starts slave tasks…

[Diagram: the master on alpha; slaves on beta, gamma, and delta.]

Master starts a local subtask, one for each registering remote slave.

Local subtasks contact slaves…

[Diagram: the master on alpha with local subtasks s1, s2, and s3; slaves on beta, gamma, and delta; a perform task beside the master and each slave.]

All tasks start perform subtasks.


Master-Slave Cluster Server - Operation:

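Perform tasks send performance info to monitor…

[Diagram: the master on alpha with local subtasks s1, s2, and s3; slaves on beta, gamma, and delta; a perform task beside the master and each slave. A client sends a query and receives a response; a monitor collects the performance info.]

Incoming queries are processed by the first available slave, via its subtask.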
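The "first available slave" rule can be sketched with a shared queue: each local subtask takes the next query as soon as its slave is free. This Python sketch is only an analogy for the design on the slide. Threads stand in for the master's subtasks, the upper-casing step stands in for the work a remote slave would do, and `run_master` is a hypothetical name.

```python
import queue
import threading


def run_master(queries, slave_names):
    """Hand each query to whichever subtask is free first; return the answers."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def subtask(slave):
        while True:
            query = pending.get()
            if query is None:  # sentinel: no more work for this subtask
                return
            answer = query.upper()  # stand-in for the remote slave's work
            with lock:
                results[query] = (slave, answer)

    threads = [threading.Thread(target=subtask, args=(name,))
               for name in slave_names]
    for t in threads:
        t.start()
    for q in queries:
        pending.put(q)
    for _ in threads:
        pending.put(None)  # one sentinel per subtask
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    answers = run_master(["q1", "q2", "q3", "q4", "q5"],
                         ["beta", "gamma", "delta"])
    for query, (slave, answer) in sorted(answers.items()):
        print(query, "->", answer, "via", slave)
```

Which slave answers a given query varies from run to run, which is the point: no slave is assigned work in advance.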


Real-Time Performance Monitoring – Internal:

• Resource utilization reporting via the /proc pseudo-files*:

- CPU Utilization in /proc/stat – Running Jiffy Counts in each State

cpu 1256 0 1566 565277
(fields, in order: user, nice, system, idle)

- Disk Reads and Writes in /proc/stat – Running I/O Counts

disk_rio 1270 0 0 0
disk_wio 1337 0 0 0
(the first count on each line is for /dev/hda)

* Note that the exact meaning and content of proc files can be OS release dependent.
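Because the jiffy counts are running totals, a utilization figure comes from the difference between two samples. The Python sketch below assumes the four-field cpu line shown on the slide (user, nice, system, idle); the second sample line in the usage example is invented for illustration, and the function names are hypothetical.

```python
def parse_cpu(line: str):
    """Split a 'cpu user nice system idle' line into four integer counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not a cpu line: " + line)
    user, nice, system, idle = (int(x) for x in fields[1:5])
    return user, nice, system, idle


def cpu_utilization(before: str, after: str) -> float:
    """Percent of jiffies spent outside the idle state between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu(before), parse_cpu(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3]
    return 100.0 * (total - idle) / total


if __name__ == "__main__":
    first = "cpu 1256 0 1566 565277"   # the sample from the slide
    second = "cpu 1266 0 1576 565357"  # hypothetical later sample
    print(cpu_utilization(first, second))  # prints 20.0
```

On a live system the two lines would be read from /proc/stat a fixed interval apart; newer kernels add more fields after idle, which this sketch ignores.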


Real-Time Performance Monitoring – Internal (continued):

• Resource utilization reporting (continued):

- Memory Utilization in /proc/meminfo – Current Values

Mem: 14942208 13713408 1228800 . . .
(fields, in order: total, used, free)

- Packets Sent and Received in /proc/net/dev – Running I/O Counts

lo: 80 0 0 0 0 80 0 0 0 0
eth0: 115 0 0 0 0 68 0 0 0 0
(received counts come first, transmitted counts second)
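The memory line turns into a percentage directly, since these are current values and not running counts. A minimal Python sketch, assuming the old-style "Mem: total used free" layout shown on the slide (later kernels format /proc/meminfo differently); `memory_used_percent` is a hypothetical name.

```python
def memory_used_percent(line: str) -> float:
    """Percent of memory in use, from a 'Mem: total used free ...' line."""
    fields = line.split()
    if not fields or fields[0] != "Mem:":
        raise ValueError("not a Mem: line: " + line)
    total, used = int(fields[1]), int(fields[2])
    return 100.0 * used / total


if __name__ == "__main__":
    sample = "Mem: 14942208 13713408 1228800"  # the values from the slide
    print(round(memory_used_percent(sample), 1))  # prints 91.8
```

The slide's numbers are self-consistent: 13713408 used plus 1228800 free equals the 14942208 total.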


Real-Time Performance Monitoring – monitor/perform(s):

NEAR REAL-TIME CLUSTER PERFORMANCE STATISTICS

                        10Base2
+----ALPHA-----+           |           +-----BETA-----+
|  Cpu    Mem  |           |           |  Cpu    Mem  |
|   7%    94%  |Rcvd     0 | 21    Rcvd|  28%    40%  |
|  Rio    Wio  +-----------+-----------+  Rio    Wio  |
|   1      0   |Sent    12 |  1    Sent|   0      1   |
+---10.0.0.1---+           |           +---10.0.0.2---+
                           |
+----GAMMA-----+           |           +----DELTA-----+
|  Cpu    Mem  |           |           |  Cpu    Mem  |
|   2%    75%  |Rcvd     2 |  0    Rcvd|   5%    56%  |
|  Rio    Wio  +-----------+-----------+  Rio    Wio  |
|   4      0   |Sent     0 | 10    Sent|   3      0   |
+---10.0.0.3---+           |           +---10.0.0.4---+

                       chaos.org

              - Overall Network Loading -
                      23 Pkts/sec


Real-Time Performance Monitoring – External (displayed):

• Resource utilization reporting via a custom client process:

RESPONSE    |                   OBSERVATIONS
TIME (msec) |        10        20        30        40        50
------------+----+----+----+----+----+----+----+----+----+----+
  1    10   |
 11    20   |
 21    30   |************************
 31    40   |************************
 41    50   |**
 51    60   |
 61    70   |
 71    80   |
 81    90   |
 91   100   |

50 Total Observations

Average = 30 milliseconds

…so what if you’re not happy with this level of performance?
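A display like the one above needs only a bucket count per 10-millisecond range. The Python sketch below shows one way a client might build it from measured response times in whole milliseconds. The sample data is invented to match the bar lengths on the slide, and the names `histogram` and `render` are hypothetical.

```python
def histogram(times_ms, width=10, top=100):
    """Count observations per bucket: 1-10, 11-20, ... up to top."""
    counts = [0] * (top // width)
    for t in times_ms:
        index = (int(t) - 1) // width          # 1..10 -> 0, 11..20 -> 1, ...
        index = max(0, min(index, len(counts) - 1))  # clamp out-of-range values
        counts[index] += 1
    return counts


def render(counts, width=10):
    """Draw one row per bucket, one asterisk per observation."""
    rows = []
    for i, n in enumerate(counts):
        low, high = i * width + 1, (i + 1) * width
        rows.append(f"{low:3d} {high:5d}   |{'*' * n}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical sample: 24 + 24 + 2 observations, as in the display above.
    sample = [25] * 24 + [35] * 24 + [45] * 2
    print(render(histogram(sample)))
    print(len(sample), "Total Observations")
```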


Performance Tuning – Defining Execution Phases:

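[Diagram: a client query and its response pass over Ethernet between the client, the MASTER with its subtasks S1 and S2, and Slave 1 and Slave 2. Other labels: Shared, STP Table, DB, transit times. The execution phases are numbered 1 through 8.]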


Performance Tuning – Execution Phase Times:

[Bar chart: Initial MSI Phase Times. X axis: execution phases 1 through 8, with three time distributions (Expon, Pulse, Sweep). Y axis: average time (seconds), 0.00 to 0.02. Annotations: "Leave the file open?", "Not a SW issue.", "Found a bug!"]


Performance Tuning – Final Times:

[Bar chart: Final MSI Phase Times. X axis: execution phases 1 through 8, with three time distributions (Expon, Pulse, Sweep). Y axis: average time (seconds), 0.00 to 0.02. Annotations: "Dramatic reduction!", "About a 10% improvement (not too bad)".]

See book for further details on statistical distributions.


Performance Tuning:

• The proof is in the pudding!

RESPONSE    |                   OBSERVATIONS
TIME (msec) |        10        20        30        40        50
------------+----+----+----+----+----+----+----+----+----+----+
  1    10   |
 11    20   |*******
 21    30   |*********************************
 31    40   |********
 41    50   |**
 51    60   |
 61    70   |
 71    80   |
 81    90   |
 91   100   |

50 Total Observations

Average = 25 milliseconds = 17% improvement!
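The 17% figure follows from the two averages: the 5 milliseconds saved, taken as a share of the original 30. As a quick check in Python (the function name is hypothetical):

```python
def improvement_percent(before_ms: float, after_ms: float) -> float:
    """Reduction in average response time as a percentage of the original."""
    return 100.0 * (before_ms - after_ms) / before_ms


if __name__ == "__main__":
    print(round(improvement_percent(30, 25)))  # prints 17
```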


Further Details are in the Book:

• Download all the source code for free:

http://www.samspublishing.com

- Search on “Linux cluster architecture” or “Vrenios”

- Click on the “Downloads” link in the book description

1. Individual chapter examples are in zip files

2. A complete user chief environment is in a tar.gz file

• Book Signings:

Sep 8th    Borders Chandler, Sunday @ 2pm
Sep 15th   Borders Arrowhead, Sunday @ 2pm
Oct 25th   Barnes & Noble Arrowhead, Friday @ 7pm


References:

Distributed Operating Systems, Andrew S. Tanenbaum (of MINIX and AMOEBA fame!), Prentice Hall, 1995

Unix Distributed Programming, Chris Brown, Prentice Hall, 1994

Advanced Programming in the UNIX Environment, W. Richard Stevens, Addison-Wesley, 1992

“CHAOS: A CHeap Array of Outmoded Systems,” Alex Vrenios, LinuxGazette.com, October 1998

“CHAOS Part 2,” Alex Vrenios, LinuxGazette.com, December 1998

Linux Programming White Papers, Rusling, et al., Coriolis Open, 1999


You’ve been a terrific audience!

Any questions?

[Chart: Manifold Server - Final Performance Results. X axis: number of remote workers, 1 through 7. Y axis: time for 10 queries (seconds), 0 to 20.]

Hurry out and buy this book!