solaris 10 - dps.uibk.ac.attf/lehre/ss07/bs/vorlesungen/solaris vortrag...sun microsystems gesmbh...
Sun Microsystems GesmbH, Wienerbergstrasse 3/VII, A-1101 Wien
Solaris 10
DI Gerald Hartl, Account Manager for Education and Research
Agenda
• Short Solaris 10 Overview
• Introduction to Solaris Internals
• Memory
• File System
• Q&A
Solaris 10 Innovations Overview
• Highest Availability with Predictive Self-Healing
• Maximum Security based on Trusted Solaris
• Optimal Monitoring with DTrace
• Secure and Effective Consolidation with Solaris Containers
• Extreme Performance
... and over 600 projects
Solaris 10: Same Ideas about Consolidation
[Figure: three containers (Container 1: Web-Server, Container 2: App-Server, Container 3: Database) on an eight-core chip (Core #1 to Core #8) with shared L2 cache, memory, PCI-E I/O and a 134 GB/s interconnect]
Containers and UltraSPARC/OpenSPARC T1: Blade Shelf on a Chip
• Network consolidation on chip
  > Higher performance (chip bandwidth)
• Containers can be assigned to cores
  > Optimized resource utilization
• Sandbox for applications
OS Virtualisation Trends
Hardware Consolidation
• More OS instances
  > More administration required
• Strong separation
• Higher costs (HW or license)
[Figure: spectrum of approaches, from stronger separation to more flexibility: Hardware Partitions (Dynamic System Domains), Virtual Machines (VMware), OS Virtualisation (Solaris Container: Zones + SRM), Resource Management (Solaris Resource Manager, SRM)]
OS Consolidation
• Only one OS instance
  > Simple administration
• Less separation (HW)
• More flexibility
ZFS: The Ultimate Filesystem
• Extreme reliability
  > No data without checksums
  > Self-healing data store
• Simple administration
  > A single command instead of scripts
  > Includes volume manager
• Highest capacity
  > 128-bit filesystem
• High performance
• Add-on modules available
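The "no data without checksums" and "self-healing" points can be illustrated with a small C sketch. It assumes a Fletcher-4 style checksum (one of the checksum algorithms ZFS supports: four 64-bit running sums over 32-bit words) and a hypothetical two-way mirror; the names fletcher4, cksum_t, mirror_read and BLKSZ are illustrative, not ZFS's actual code. As in ZFS, the expected checksum is kept apart from the data block (ZFS keeps it in the parent block pointer), so a read can tell which mirror copy is bad and rewrite it from the good one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLKSZ 16                     /* hypothetical block size in bytes */

typedef struct { uint64_t word[4]; } cksum_t;

/* Fletcher-4 style checksum: four 64-bit running sums over the block
   read as 32-bit words in native byte order. */
static cksum_t fletcher4(const unsigned char *buf, size_t len)
{
    uint64_t a = 0, b = 0, c = 0, d = 0;

    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);   /* avoids unaligned access */
        a += w;
        b += a;
        c += b;
        d += c;
    }
    cksum_t ck = { { a, b, c, d } };
    return ck;
}

static int cksum_equal(cksum_t x, cksum_t y)
{
    return memcmp(x.word, y.word, sizeof x.word) == 0;
}

/* Read a block from a hypothetical two-way mirror, given the checksum stored
   apart from the data. Returns the index of the copy that verified and
   rewrites the other copy if it fails verification; returns -1 if neither
   copy is good. */
static int mirror_read(unsigned char copy[2][BLKSZ], cksum_t expected,
                       unsigned char out[BLKSZ])
{
    for (int i = 0; i < 2; i++) {
        if (!cksum_equal(fletcher4(copy[i], BLKSZ), expected))
            continue;
        int other = 1 - i;
        if (!cksum_equal(fletcher4(copy[other], BLKSZ), expected))
            memcpy(copy[other], copy[i], BLKSZ);   /* self-healing repair */
        memcpy(out, copy[i], BLKSZ);
        return i;
    }
    return -1;
}
```

A read that finds copy 0 corrupted returns the data from copy 1 and repairs copy 0; only when both copies fail verification does it report an error (-1).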
The ZFS Idea
• Volume manager and filesystem
  > Reduce complexity
  > Simple administration
  > Increase resource utilization
• Innovative architecture
  > No filesystem check required
  > Mirroring, snapshots, RAID-Z, compression, ...
• Available for testing:
[Figure: a server with four ZFS filesystems (ZFS 1 to ZFS 4) drawing on one ZFS storage pool built from disks c0t0d0, c0t0d1 and c0t2d0]
In the Past
# format
... (long interactive session omitted)
# metadb -a -f disk1:slice0 disk2:slice0
# metainit d10 1 1 disk1:slice1
d10: Concat/Stripe is setup
# metainit d11 1 1 disk2:slice1
d11: Concat/Stripe is setup
# metainit d20 -m d10
d20: Mirror is setup
# metattach d20 d11
d20: submirror d11 is attached
# metainit d12 1 1 disk1:slice2
d12: Concat/Stripe is setup
# metainit d13 1 1 disk2:slice2
d13: Concat/Stripe is setup
# metainit d21 -m d12
d21: Mirror is setup
# metattach d21 d13
d21: submirror d13 is attached
# metainit d14 1 1 disk1:slice3
d14: Concat/Stripe is setup
# metainit d15 1 1 disk2:slice3
d15: Concat/Stripe is setup
# metainit d22 -m d14
d22: Mirror is setup
# metattach d22 d15
d22: submirror d15 is attached
# newfs /dev/md/rdsk/d20
newfs: construct a new file system /dev/md/rdsk/d20: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d20 /export/home/ann
# vi /etc/vfstab
... while in 'vi', type this exactly:
/dev/md/dsk/d20 /dev/md/rdsk/d20 /export/home/ann ufs 2 yes -
# newfs /dev/md/rdsk/d21
newfs: construct a new file system /dev/md/rdsk/d21: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d21 /export/home/bob
# vi /etc/vfstab
... while in 'vi', type this exactly:
/dev/md/dsk/d21 /dev/md/rdsk/d21 /export/home/bob ufs 2 yes -
# newfs /dev/md/rdsk/d22
newfs: construct a new file system /dev/md/rdsk/d22: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d22 /export/home/sue
# vi /etc/vfstab
... while in 'vi', type this exactly:
/dev/md/dsk/d22 /dev/md/rdsk/d22 /export/home/sue ufs 2 yes -
# format
... (long interactive session omitted)
# metattach d12 disk3:slice1
d12: component is attached
# metattach d13 disk4:slice1
d13: component is attached
# metattach d21
# growfs -M /export/home/bob /dev/md/rdsk/d21
/dev/md/rdsk/d21:
... (many pages of 'superblock backup' output omitted)
With ZFS
• Create a storage pool named “home”
  # zpool create home mirror c0t0d0 c0t1d0
• Create a filesystem each for “ann”, “bob” and “sue”
  # zfs create home/ann
  # zfs create home/bob
  # zfs create home/sue
• Add a new mirrored pair of disks to the pool
  # zpool add home mirror c1t0d0 c1t1d0
Global Thread Priorities
Source: Solaris Internals, page 18
Lightweight Process (LWP): the kernel-visible execution context for a user thread
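In Solaris 10 each user thread created through the POSIX threads API is backed by its own LWP (the 1:1 threading model), so the kernel schedules these threads independently. A minimal sketch using only the portable pthread API; parallel_sum, worker and NWORKERS are hypothetical names, and nothing Solaris-specific is called.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4   /* hypothetical number of threads (and thus LWPs) */

typedef struct {
    long from, to;   /* half-open range [from, to) */
    long sum;
} work_t;

static void *worker(void *arg)
{
    work_t *w = arg;
    w->sum = 0;
    for (long i = w->from; i < w->to; i++)
        w->sum += i;
    return NULL;
}

/* Sum 0 .. n-1 with NWORKERS threads; returns -1 if not all threads
   could be created (the ones that were created are still joined). */
static long parallel_sum(long n)
{
    pthread_t tid[NWORKERS];
    work_t work[NWORKERS];
    long total = 0;
    int created;

    assert(n >= 0);
    for (created = 0; created < NWORKERS; created++) {
        work[created].from = n * created / NWORKERS;
        work[created].to = n * (created + 1) / NWORKERS;
        if (pthread_create(&tid[created], NULL, worker, &work[created]) != 0)
            break;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        total += work[i].sum;
    }
    return created == NWORKERS ? total : -1;
}
```

While such a program runs, `prstat -L` on Solaris lists one line per LWP of the process.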
Processor Abstractions
• CPU partitions
• Processor sets
• Resource pools
• Locality groups (lgroups, MPO)
  > Memory Placement Optimization, since Solaris 9
Source: Solaris Internals, page 162
Virtual Address Spaces
• Executable text: binary, read-only with execute permissions
• Executable data: mapped read/write/private
• Heap space: memory allocated by malloc()
• Process stack: anonymous memory, mapped read/write
Source: Solaris Internals, page 457
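To see these segments from inside a process, a sketch can take the address of one object from each: a function (text), an initialized global (data), a malloc() block (heap) and a local variable (stack). It assumes nothing about their relative order, which varies with platform, data model and address space layout randomization; segment_addrs_t, sample_segments, text_marker and initialized_data are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int initialized_data = 42;            /* executable data segment */

static int text_marker(void) { return 0; }   /* lives in executable text */

typedef struct {
    uintptr_t text, data, heap, stack;
} segment_addrs_t;

/* Sample one address from each segment; heap is 0 if malloc() fails.
   Only the addresses are recorded, no ordering is assumed. */
static segment_addrs_t sample_segments(void)
{
    int on_stack = 0;                        /* process stack */
    int *on_heap = malloc(sizeof *on_heap);  /* heap, via malloc() */
    segment_addrs_t s;

    s.text  = (uintptr_t)text_marker;
    s.data  = (uintptr_t)&initialized_data;
    s.heap  = (uintptr_t)on_heap;
    s.stack = (uintptr_t)&on_stack;

    free(on_heap);
    return s;
}
```

On Solaris, running pmap(1) against the process shows the same segments together with their permissions.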
The Stack
Solaris Version | Maximum Heap Size | Notes
Solaris x86, 32-bit mode | 2 GBytes by default | Boot option kernelbase can be moved to allow a larger process address space
Solaris x64, 64-bit mode | 16 EBytes | Virtually unlimited
SPARC, 64-bit mode | 16 TBytes on UltraSPARC I/II, 16 EBytes on later processors | Virtually unlimited
Source: Solaris Internals, page 462
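The 16 TBytes and 16 EBytes figures are powers of two. Assuming the limit is set by the number of usable virtual address bits (44 on UltraSPARC I/II, 64 with full 64-bit addressing), the arithmetic is 2^44 bytes = 16 TBytes and 2^64 bytes = 16 EBytes; the whole 32-bit space is 2^32 bytes = 4 GBytes, of which the 2 GBytes default heap limit is a part. A small helper (addr_space_units is a hypothetical name) shifts by the difference in exponents instead of computing 2^64 directly, which would not fit in a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Size of an address space with `bits` virtual address bits, in units of
   2^unit_shift bytes (30 = GBytes, 40 = TBytes, 60 = EBytes).
   Shifting by (bits - unit_shift) avoids computing 2^64 directly,
   which does not fit in uint64_t. */
static uint64_t addr_space_units(unsigned bits, unsigned unit_shift)
{
    assert(bits >= unit_shift && bits - unit_shift < 64);
    return UINT64_C(1) << (bits - unit_shift);
}
```

For example, addr_space_units(44, 40) is 16 (TBytes), addr_space_units(64, 60) is 16 (EBytes) and addr_space_units(32, 30) is 4 (GBytes).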
Tracing the VM System
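sol10# ./vm.d -p <pid>
sol10# more vm.d
:::BEGIN
{
        start = timestamp;      /* reference point, in nanoseconds */
}

syscall:::
/$target == pid/
{
        /* microseconds since the script started */
        trace((timestamp - start) / 1000);
}

/* entry and return of the listed kernel VM functions */
::add_physmem:,
::sptcreate:,
...
::sptdestroy:,
::va_to_pfn:
/$target == pid/
{
        trace((timestamp - start) / 1000);
}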
Source: Solaris Internals, page 466
Tracing the VM System
0  => munmap                          31940
     -> as_unmap                      31990
       -> as_findseg                  32060
       <- as_findseg                  32090
       -> segvn_unmap                 32110
         -> segvn_lockop              32170
         <- segvn_lockop              32190
         -> hat_unload_callback       32210
           -> page_get_pagesize       32360
           <- page_get_pagesize       32370
           -> hat_page_setattr        32390
           <- hat_page_setattr        32400
           -> free_vp_pages           32470
             -> page_share_cnt        32520
               -> hat_page_getshare   32550
               <- hat_page_getshare   32560
             <- page_share_cnt        32580
           <- free_vp_pages           32590
         <- hat_unload_callback       32610
         -> seg_free                  32630
           -> as_removeseg            32650
           <- as_removeseg            32700
           -> segvn_free              3272...
Source: Solaris Internals, page 466
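vm.d needs a target process. A hypothetical target (map_touch_unmap is an illustrative name) maps zero-filled private memory the traditional Solaris way, by mapping /dev/zero, touches every byte and unmaps it again; the munmap() call is the system call at the top of the trace above. The kernel functions that appear depend on the Solaris release, so the flow should be similar, not necessarily identical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical vm.d target: map len bytes of zero-filled private memory
   via /dev/zero, touch every byte, then unmap it.
   Returns 0 on success, -1 on failure. */
static int map_touch_unmap(size_t len)
{
    assert(len > 0);

    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xA5, len);          /* fault in (copy-on-write) every page */
    return munmap(p, len);         /* the call traced as "=> munmap" */
}
```

Calling this in a loop, a few megabytes at a time, keeps the process busy in the VM system long enough to attach vm.d to it.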
The vnode Segment seg_vn
Source: Solaris Internals, page 481
• Executable text
• Executable data
• Heap and stack (anonymous memory)
• Shared libraries
• Mapped files
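Mapped files are one of the seg_vn consumers listed above. A minimal POSIX sketch, assuming a writable /tmp: it writes a string to a temporary file, maps the file with MAP_SHARED and reads the bytes back through the mapping; on Solaris the page faults on such a mapping are handled by seg_vn and resolved from the file's vnode. read_via_mmap is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L   /* POSIX.1-2008: mkstemp(), mmap() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stores msg in a temporary file, maps the file
   read-only with MAP_SHARED and copies the mapped bytes to out
   (NUL-terminated). Returns 0 on success, -1 on any failure. */
static int read_via_mmap(const char *msg, char *out, size_t outsz)
{
    char path[] = "/tmp/segvnXXXXXX";
    size_t len = strlen(msg);
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                      /* file disappears once fd is closed */

    /* An empty file cannot be mapped; out must also hold the NUL. */
    if (len == 0 || len >= outsz || write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                         /* the mapping keeps the file alive */
    if (p == MAP_FAILED)
        return -1;

    memcpy(out, p, len);               /* reads fault pages in from the file */
    out[len] = '\0';
    return munmap(p, len);
}
```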