Virtualization with Solaris
Based on Solaris 10 10/08
Bart Muijzer
Systems Solutions Architect
Operating Systems Ambassador
SUN Microsystems Nederland BV
A bit about me
• Did “Hogere Informatica Opleiding” in Enschede
• UNIX addict since 1985
• 1991 – 1998: UNIX Sysadmin, Utrecht University
• 1998 – 1999: UNIX Specialist, AZU
• 1999 – today: SUN Microsystems
  > Techie functions
  > OS Ambassador since 2002
• Married, 3 kids (that I teach Solaris)
• http://bartmu.hyves.nl
Agenda
Overall
• Virtualization with SUN
• Solaris, OpenSolaris... what's up?
• BREAK
• Solaris
  > selected features for developers
  > ... with some hands-on!
• How to engage
• Sorry... no talk on Java ;-)
Agenda – Part 1
Virtualization with SUN
• Overview
• SUN offerings
Server Virtualization
Situation Today
[Diagram: six separate servers, one each for Web, Email, DNS, App, DB and Dir]
• One application per server
• Increasing operational and management costs
• Average utilization rate is 5%-20%
Server Virtualization
It's not easy but it helps...
“What the entire art of [server] virtualization comes down to is moving the OS to a place where it should not be, and running around like your head is on fire trying to fix all the problems that come up. There are a lot of problems, and they happen quite often, so the performance loss is nothing specific, but more of a death by 1000 cuts.”
The Inquirer, February 2005
Server Virtualization Approaches
[Diagram: four approaches side by side, from multiple OSes (Hard Partitions, Virtual Machines) to a single OS (OS Virtualization, Resource Mgmt.), with arrows labeled “Trend to flexibility” and “Trend to isolation”]
• Hard Partitions (e.g. Identity Server, App Server, Database)
  > Very High RAS
  > Very Scalable
  > Mature Technology
  > Ability to run different OS versions
• Virtual Machines (e.g. Mail Server, Web Server, File Server)
  > Ability to live migrate an OS
  > Ability to run different OS versions and types
  > De-couples OS and HW versions
• OS Virtualization (e.g. Calendar Server, Database, Web Server)
  > Very scalable and low overhead
  > Single OS to manage
  > Cleanly divides system and application administration
  > Fine grained resource management
• Resource Mgmt. (e.g. SunRay Server, App Server, Database)
  > Very scalable and low overhead
  > Single OS to manage
  > Fine grained resource management
Hard Partitions
[Diagram: one server split into partitions, each with its own OS and application: Identity Server, App Server, Database]
• Isolation all the way into the hardware
• Only as granular as the hardware allows
• Only on certain hardware
Virtual Machines
[Diagram: one server hosting several virtual machines, each with its own OS and application: Mail Server, Web Server, File Server]
• Allows different OS versions and types
• Extra overhead for the Hypervisor
• Available on many platforms
Virtualization Types
• Full Virtualization
  > Thick Hypervisor
  > OS does not know it runs in a VM
  > Enables running legacy OS-es
• Paravirtualization
  > Thin Hypervisor
  > OS needs to know it runs on top of a VM
  > Can't run legacy OS-es
• Both can use hardware assistance
  > Intel-VT, AMD-V, SPARC CMT
Logical Domains
Virtual Machines for SPARC
[Diagram: one server hosting several domains, each with its own OS and application: Mail Server, Web Server, File Server]
• Stable interface (sun4v)
• Firmware (upgrade)
• CMT processor (T1000/T2000)
OS Virtualization (1)
Solaris zones
[Diagram: one server and one OS hosting many containers, e.g. Calendar Server, Database, Web Server]
• Resource and namespace isolation
• Very scalable
• Available on all platforms
Server Virtualization
Solaris Zones
• Single Solaris instance
  > Appearance of many OS instances
  > Minimal performance impact
[Diagram: zones on top of an OS, with CPUs and memory underneath]
Zone Properties
• Can have own IP stack
• Tightly linked with Solaris' Resource Management capabilities
  > Same controls for global and local zone
• Upgrade tools know about local zones
• Can be branded:
  > Linux
  > Solaris 8
• Attach, Detach, Clone, Migrate
• Configurable privileges
Branded Zones
• Available for
  > SPARC: Solaris 8 and Solaris 9 (userland only)
  > x86: Linux
• Linux:
  > RedHat Enterprise Linux 3, and CentOS
    > 32-bit only
  > Only for Solaris 10 x86
  > NOT running a Linux kernel
  > Needs Linux CD (and hence valid RTU)
Example: S8C - Upgrade in Phases
Using Containers to help migration to Solaris 10
Phase I: Deploy H/W, Deploy Solaris 8 Container
[Diagram: the Solaris 8 server db27.foo.com (NIS name service, root PW: db27, local tools & scripts, database application) next to a Solaris 8 Migration Container (BrandZ) with the same identity, hosted in a Solaris 10 global zone on T2000/T5120/T5220 or OPL hardware; other labels on the Solaris 10 side: Solaris 10 Container, ZFS, DTrace, FMA]
Example: S8MA - Upgrade in Phases
Phase II: Application Redeploy
[Diagram: the database application shown both in the Solaris 8 Migration Container (BrandZ) and in a Solaris 10 Container, both with identity db27.foo.com (NIS name service, root PW: db27, local tools & scripts), hosted in a Solaris 10 global zone on T2000/T5120/T5220 or OPL hardware; other labels: ZFS, DTrace, FMA]
OS Virtualization (2)
Resource Management
[Diagram: one server and one OS running several applications, e.g. SunRay Server, App Server, Database]
• Resource controls only
• Very scalable
• Available on all platforms
OS Virtualization (3)
Solaris Containers
[Diagram: one server and one OS hosting several applications, e.g. Calendar Server, Database, Web Server]
• Resource and namespace isolation with Resource Controls
• Very scalable
OS Virtualization (4)
BrandZ, SCLA, S8C and S9C
• BrandZ is an extension to Zones technology
  > Enables Solaris Containers to assume different OS personalities a.k.a. “Brands”
• Solaris Containers for Linux Applications build on BrandZ to provide Linux-branded Containers
  > Ideal for Linux consolidation and development as well as migration to Solaris
• Solaris 8 Containers, Solaris 9 Containers build on BrandZ to provide {S8, S9}-branded Containers
  > Migrate S8 or S9 servers onto S10
Hosted Virtualization
What it is
• Virtualization runs on top of some Operating System
• Examples:
  > Sun xVM VirtualBox (www.virtualbox.org)
  > VMWare Workstation
  > Microsoft Virtual Server
  > User-mode Linux (UML)
    > ./linux
  > Virtuozzo
• No doubt, there is more out there...
Server Virtualization
Solutions from Sun
[Diagram: the four approaches from multiple OSes to a single OS, with arrows labeled “Trend to flexibility” and “Trend to isolation”, and the matching products]
• Hard Partitions (e.g. Identity Server, App Server, Database)
  > Dynamic System Domains
• Virtual Machines (e.g. Mail Server, Web Server, File Server)
  > Logical Domains
  > Xen
  > VMware
  > Microsoft Virtual Server
• OS Virtualization (e.g. Calendar Server, Database, Web Server)
  > Solaris Containers (Zones + SRM)
  > Solaris Containers for Linux Applications
  > Solaris Trusted Extensions
• Resource Mgmt. (e.g. SunRay Server, App Server, Database)
  > Solaris Resource Manager (SRM)
Hybrid Solutions
• Hard Partitions & OS Virtualization: Dynamic System Domains with Solaris Containers
  > Combine high RAS and proven robustness with flexible application environments
  > Both can scale all the way up to 144 way systems
  > Incur no extra overhead for Virtualization
• Virtual Machines & OS Virtualization: LDoms/Xen/VMware/MSVS with Solaris Containers
  > Combine flexibility of OS version and type with secure application environments
  > Live migration allows for off-loading a system in production for repair or DR
[Diagram: example workloads in both hybrid setups: Database, Mail Server, Web Server, File Server]
What's Up ??
Why the OS Matters
[Diagram: layers labeled Applications, Infrastructure Services, Hardware, Operating System and Support, annotated with “What You Care About”, “What You Depend On” and “What Makes the Difference”]
What You Worry About
• Data Overload
• Intrusions
• Costs
• Management Overload
• Level of Service
Solaris and Open Source
• Innovate through Sharing
• Goal: allow external collaborations during the development of Solaris
• Model:
  > Release source code every ~2 weeks
  > Create a “way into SUN” for external contributions
  > Apply the Solaris Quality Process
• Solaris is Open Source, therefore:
  > Common Development and Distribution License (CDDL)
  > Compilers and other tools are free
Solaris and Open Development
[Diagram: release timeline with these labels: S9 and S10, with FCS and updates u1 to u8; Nevada (Open Sourced parts of Solaris); SXCE (binary distro of Nevada); Nevada + IPS + Installer = Indiana, with releases 2008.05, 2008.11 and 2009.04; “Today” at build b103; Solaris.Next; Further Dev; other distributions: SchilliX, Belenix, MartUX mBE, Nexenta OS, MilaX]
OpenSolaris
• Sun expects
  > Help with device drivers
  > Help with security fixes
  > Larger footprint
  > More ISV support
  > More self-help discussions
  > More customers not choosing Linux (or Microsoft)
  > A better sense for the future direction Solaris should take
  > Credibility – we've delivered what we've promised
• Sun does NOT expect
  > The community to do our work
  > Customers to run their business on the code base; they should run on the product
BREAK
Agenda – Part 2
Solaris (aimed at Software Developers)
• Selected Solaris Features
  > ZFS
  > DTrace
  > Predictive Self Healing / FMA
  > Zones
  > Resource Management
  > Containers
• ... with some hands-on!!!
ZFS: no more of...
# format
... (long interactive session omitted)

# metadb -a -f disk1:slice0 disk2:slice0

# metainit d10 1 1 disk1:slice1
d10: Concat/Stripe is setup
# metainit d11 1 1 disk2:slice1
d11: Concat/Stripe is setup
# metainit d20 -m d10
d20: Mirror is setup
# metattach d20 d11
d20: submirror d11 is attached

# metainit d12 1 1 disk1:slice2
d12: Concat/Stripe is setup
# metainit d13 1 1 disk2:slice2
d13: Concat/Stripe is setup
# metainit d21 -m d12
d21: Mirror is setup
# metattach d21 d13
d21: submirror d13 is attached

# metainit d14 1 1 disk1:slice3
d14: Concat/Stripe is setup
# metainit d15 1 1 disk2:slice3
d15: Concat/Stripe is setup
# metainit d22 -m d14
d22: Mirror is setup
# metattach d22 d15
d22: submirror d15 is attached

# newfs /dev/md/rdsk/d20
newfs: construct a new file system /dev/md/rdsk/d20: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d20 /export/home/ann
# vi /etc/vfstab
... while in 'vi', type this exactly:
/dev/md/dsk/d20 /dev/md/rdsk/d20 /export/home/ann ufs 2 yes -

# newfs /dev/md/rdsk/d21
newfs: construct a new file system /dev/md/rdsk/d21: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d21 /export/home/bob
# vi /etc/vfstab
... while in 'vi', type this exactly:
/dev/md/dsk/d21 /dev/md/rdsk/d21 /export/home/bob ufs 2 yes -

# newfs /dev/md/rdsk/d22
newfs: construct a new file system /dev/md/rdsk/d22: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d22 /export/home/sue
# vi /etc/vfstab
... while in 'vi', type this exactly:
/dev/md/dsk/d22 /dev/md/rdsk/d22 /export/home/sue ufs 2 yes -

# format
... (long interactive session omitted)
# metattach d12 disk3:slice1
d12: component is attached
# metattach d13 disk4:slice1
d13: component is attached
# metattach d21
# growfs -M /export/home/bob /dev/md/rdsk/d21
/dev/md/rdsk/d21:
... (many pages of 'superblock backup' output omitted)
Traditional Filesystem Administration
Filesystem Admin – The ZFS way
ZFS Administration
• Create a storage pool named “home”
  # zpool create home mirror c0t3d0 c0t4d0
  # zfs set mountpoint=/export/home home
• Create filesystems “pieter”, “clemens”, “bartm”
  # zfs create home/pieter
  # zfs create home/clemens
  # zfs create home/bartm
• Later, add space to the “home” pool
  # zpool add home mirror c0t8d0 c0t9d0
ZFS Goodies
• snapshot, clone, rollback
  # zfs snapshot tank/home@yesterday
  # zfs clone tank/home@yesterday tank/home-yesterday
  # ls ~/.zfs/home-yesterday
  # zfs rollback -r tank/home@yesterday
• replicate (incremental)
  # zfs send tank/home@yesterday |
      ssh rhost zfs receive rpool/home@yesterday
  # zfs send -i home tank/home@yesterday |
      ssh rhost zfs receive rpool/home
Demo: ZFS
PSH: Fault Management Architecture
• Predictive Self Healing components:
  > Fault Management Architecture (FMA)
  > Service Management Facility (SMF)
• FMA is based on
  > Error events, which are dispatched to
  > Diagnosis agents, that generate
  > Fault events, handled by
  > Agents that take proactive action
• Available for CPU, mem, I/O bus
• Agents interact with DR, RM, ...
• See: http://www.sun.com/bigadmin/content/selfheal/
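The event flow on this slide (error events, diagnosis, fault events, response agents) can be pictured with a toy model. This is only a sketch under loud assumptions: the function names `diagnose` and `respond`, the threshold rule and the "offline" action are hypothetical illustrations and are not FMA's real interfaces or diagnosis logic.

```python
from collections import Counter
from typing import Iterable, List


def diagnose(error_events: Iterable[str], threshold: int = 3) -> List[str]:
    """Hypothetical diagnosis agent: turn error events into fault events.

    Each error event is just the name of the component that reported it.
    A fault event is generated once, when a component reaches `threshold`
    errors (an assumed rule chosen for illustration only).
    """
    counts: Counter = Counter()
    faults: List[str] = []
    for component in error_events:
        counts[component] += 1
        if counts[component] == threshold:
            faults.append(component)
    return faults


def respond(fault_events: Iterable[str]) -> List[str]:
    """Hypothetical response agent: decide on a proactive action per fault."""
    return [f"offline {component}" for component in fault_events]
```

For example, `respond(diagnose(["cpu0", "mem1", "cpu0", "cpu0"]))` yields one action for `cpu0` and none for `mem1`, which is the "act before the component fails hard" idea behind the slide.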
FMA for X64 - example
PSH: Service Management Facility
• Other part of Predictive Self Healing
• Manage running services
  > Replace ancient “rc files”
• Maintain:
  > Dependencies
  > Snapshots
  > Status
• Functions: enable, disable, rollback, restart
• See:
  > http://www.sun.com/bigadmin/content/selfheal/smf-quickstart.html
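Why explicit dependencies beat numbered “rc files” can be shown with a small ordering sketch. It is not SMF code: the service names are made up, and Python's standard `graphlib` module (Python 3.9+) stands in for whatever the real service manager does internally.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def start_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every service starts after its dependencies.

    `dependencies` maps a service name to the set of services it needs.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical services, for illustration only.
EXAMPLE = {
    "network": set(),
    "filesystem": set(),
    "web": {"network", "filesystem"},
    "app": {"web"},
}
```

With `EXAMPLE`, `start_order` always places `network` and `filesystem` before `web`, and `web` before `app`; no hand-maintained start numbers are needed.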
Tracing and Debugging
A real-life example
• Application: konsole (The X Terminal emulator of KDE)
• Problem: konsole becomes unresponsive (hangs) after hitting ^C
• Others: no source code available
Tracing and Debugging
A real-life example
• Analysis:
konsole normally sits in a loop calling poll() with a number of fd's, amongst which is an fd that points to /devices/pseudo/clone@0:ptm
After hitting ^C, konsole still runs, but calls to pollsys() no longer contain the fd that has opened /devices/pseudo/clone@0:ptm. So it looks like konsole never gets any more input from its child process (the shell in this case).
Tracing and debugging
Using truss(1) – trace system calls and signals
truss konsole from another window when hitting ^C.
# truss -tpollsys,read -vpollsys,read -p `pgrep konsole`
/1: pollsys(0x08074110, 6, 0x080466A8, 0x00000000) = 1
/1: fd=3 ev=POLLIN rev=0
/1: fd=9 ev=POLLIN rev=0
/1: fd=8 ev=POLLIN rev=0
/1: fd=5 ev=POLLIN rev=0
/1: fd=11 ev=POLLIN rev=POLLIN
/1: fd=15 ev=POLLIN rev=0
/1: timeout: 0.928000000 sec
/1: read(11, 0x0814A990, 0) = 0
Tracing and debugging
Using truss(1)
fd 11 is subsequently dropped off the list of fds that konsole wants to watch:
/1: pollsys(0x08074110, 5, 0x080466A8, 0x00000000) = 1
/1: fd=3 ev=POLLIN rev=0
/1: fd=9 ev=POLLIN rev=0
/1: fd=8 ev=POLLIN rev=POLLIN
/1: fd=5 ev=POLLIN rev=0
/1: fd=15 ev=POLLIN rev=0
/1: timeout: 0.928000000 sec
Tracing and debugging
Using truss(1)
It looks like konsole treats the 0-byte read as an EOF, which it shouldn't since, on STREAMS-based implementations, poll() (and its system implementation pollsys()) can return POLLIN revents even when there are 0 bytes available, see man poll(2):
POLLIN Data other than high priority data may be read without blocking. For STREAMS, this flag is set in revents even if the message is of zero length.
Tracing and debugging
Using truss(1)
So, if the assumption is correct, and konsole treats the zero-read as EOF, it should be changed to look for POLLHUP revents instead.
The reason this shows up on Solaris and not other unices is probably because this is the only STREAMS-based pseudo tty implementation that you're running on.
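The suggested fix can be sketched as a small decision function. This is not konsole's code (its source was not available in this investigation); the name `classify_read` and the three result labels are assumptions made for illustration. It only encodes the rule from the slides: a zero-length read with plain POLLIN is not EOF, a hang-up is.

```python
import select


def classify_read(revents: int, nbytes: int) -> str:
    """Interpret a poll() result together with the length of the read().

    revents: the revents bitmask that poll() reported for the fd.
    nbytes:  how many bytes the following read() returned.
    Returns "data", "retry" or "eof" (labels are illustrative).
    """
    if nbytes > 0:
        return "data"  # real input arrived, process it
    if revents & (select.POLLHUP | select.POLLERR | select.POLLNVAL):
        return "eof"   # peer hung up or the fd is unusable: stop watching
    # POLLIN with a zero-length message (possible on STREAMS): keep the fd
    # in the poll set instead of dropping it.
    return "retry"
```

Applied to the truss output above, `classify_read(select.POLLIN, 0)` gives "retry", so fd 11 would stay in the list passed to the next poll() call.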
Tracing and debugging
Developer's reaction
“The story of how we got to this point deserves a blog entry of its own -- maybe I'll write one in the train when next traveling -- because it shows off all the fancy debugging tools that are available on the platform. [...]
Having tools at hand so you can ask questions like 'what are all the FDs passed in to select() in the Qt event loop?' with no recompiles is a godsend here. Or 'what are all the stack traces leading to QSocketNotifier::setEnabled in this running konsole?' Those are powerful tools, a tale for some other time.”
http://www.fruitsalad.org/people/adridg/bobulate/index.php?/archives/638-Incorporating-post-4.1.0-fixes-in-OpenSolaris.html
DTrace – Dynamic Tracing
What is causing all the cross calls?
  The X server
What are the X servers doing?
  They're mapping and unmapping /dev/null
Why are they doing that?
  They're creating and destroying pixmaps
Who's asking them to do that?
  Several instances of a stock-ticker application
How often is each stock-ticker making this request?
  100 times per second
Why is the application doing that?
  It was written by 10000 monkeys at 10000 keyboards
DTrace
• Improved system observability
  > Better debugging and performance tuning
  > Complete view from Java thread to kernel
• Dynamic instrumentation
  > Enables continuous “black box” recording
• Examine live systems and crash dumps
  > Reduce time-to-resolution
DTrace (2)
DTrace Framework
[Diagram: boxes labeled C, including dtrace(1M), on the User side; the DTrace framework with boxes labeled P on the Kernel side]
DTrace
• Structure:
syscall::open:entry
/execname == "ls"/
{
    printf("Opened file: %s\n", copyinstr(arg0));
}

provider:module:function:name
/predicate/
{
    action;
    action;
}
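To make the four-part probe name concrete, here is a small parsing sketch. It is not part of DTrace; `ProbeDescription` and `parse_probe` are hypothetical helper names, and the sketch only handles the fully written four-field form used on the slide.

```python
from typing import NamedTuple


class ProbeDescription(NamedTuple):
    provider: str
    module: str
    function: str
    name: str


def parse_probe(description: str) -> ProbeDescription:
    """Split 'provider:module:function:name' into its four fields.

    An empty field (as in 'syscall::open:entry') is kept as an empty
    string; in a probe description it matches anything.
    """
    parts = description.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields: {description!r}")
    return ProbeDescription(*parts)
```

For the example on the slide, `parse_probe("syscall::open:entry")` has provider `syscall`, an empty module, function `open` and name `entry`.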
Demo: Dtrace
Resource Management
• Resource set
  > Partitions of the hardware resources
  > Can be: CPU, memory or SWAP
• Resource pools
  > Logical partitions of different resource sets
  > Multiple pools can link to the same set
  > Dynamic: resources are (re)allocated to meet demand and objectives
• Projects
  > Workload labels linked to a Resource Pool
  > Enables processes running in a project to have specific resource sets
  > Mechanism of “shares” to assign right amount of CPUs to workloads
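The “shares” mechanism is proportional: when projects compete for the same CPUs, each one is entitled to its shares divided by the total shares of the competing projects. A minimal arithmetic sketch follows; the function name and the project names are made up, and the share values 10 and 50 are an assumption borrowed from the labels in the Projects diagram.

```python
from typing import Dict


def cpu_entitlement(shares: Dict[str, int]) -> Dict[str, float]:
    """Return each project's CPU fraction when all projects are busy.

    `shares` maps a project name to its number of shares. Only projects
    that are actively competing for CPU should be passed in.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total number of shares must be positive")
    return {project: count / total for project, count in shares.items()}
```

With `cpu_entitlement({"batch": 10, "db": 50})` the `db` project is entitled to five sixths of the CPUs and `batch` to one sixth. If `db` goes idle, `batch` may use everything, since shares only matter under contention.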
Resource sets
[Diagram: hardware resources CPU, Memory and SWAP grouped into resource sets]
Resource Pools
[Diagram: the same hardware resources with two Resource Pools on top]
Projects
[Diagram: the same hardware resources and Resource Pools, with projects on top, two of them labeled Project[10] and Project[50]]
Zones and Resource Mgt
Solaris Zones + Solaris Resource Manager = Solaris Containers
S10 Resource Management
[Diagram: hardware resources CPU, Memory and SWAP, two Resource Pools, three Projects and two ZONEs]
Next Steps
1. Get Solaris: sun.com/solaris/get
2. Get Data Sheets and White Papers: sun.com/solaris/reference_materials
3. Get Trained: sun.com/solaris/freetraining | Learning Paths: sun.com/training/solaris
4. Get Started with Solaris Learning Centers: sun.com/solaris/teachme
5. Get Current: sun.com/solaris/move | bigadmin.com/apps | bigadmin.com/hcl
6. Get Involved: opensolaris.{org,com} | bigadmin.com | developers.sun.com/solaris
SAI – Sun Academic Initiative
http://www.sun.com/solutions/landing/industry/education/sai/index.xml
• Collaborative relationship with educational institutions.
• Schools become authorized to deliver training on Sun technologies to their students, faculty, and staff.
• Access to free Web-based training and curricula, including courses in the latest Java and Solaris technologies.
• Fontys is already participating
• Campus Ambassador Program
  > http://developers.sun.com/students/community/map.jsp
Q & A
Bart Muijzer
Virtualization with Solaris