it2204: systems administration i 7. device management

IT2204: Systems Administration I

7. Device Management


Managing desktops

Managing many desktops (workstations)

Best done by automation for supported platforms, thus automating the process of managing many PCs.

− Main duties for workstations
□ Loading of system and software
□ Updating system software and applications
□ Configuring network parameters

− All three must be right
□ Initial load must be consistent across all machines
□ Quick updates
□ Network configuration to be managed centrally


Managing many workstations

− Best done by automation for supported platforms


Machine life cycle

Five states and several transitions exist.

There is a need to plan for the different states and transitions.


Machine life cycle

• Machine states

New
− A new machine

Clean
− OS installed, but not yet configured for environment.

Configured
− Set up (configured) correctly for the operating environment.

Unknown
− Misconfigured, broken, newly discovered, outdated configuration, etc.

Off
− Retired computer


Machine life cycle

• State transitions

Build
− Transition from new to clean states.
− Set up hardware and install OS.

Initialize
− Configure for environment; often part of build.

Update
− Install new software.
− Patch old software.
− Change configurations.


Machine life cycle

• State transitions

Entropy
− Undisciplined/unmanageable changes to configurations
− Major environment changes
− Unexplained problems

Debug
− Going back to correct configured state

Rebuild
− Machine rebuilding
□ Possibly due to a major OS revision
□ Drastic changes to be made such that simple updates make no sense
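
The life cycle can be pictured as a small state machine. The sketch below is illustrative only. The build, initialize, update, entropy and debug arrows follow the slides. The slides do not say where "rebuild" leads or how a machine reaches Off, so the rebuild targets and the "retire" transition are assumptions, as are the names State, TRANSITIONS, apply and run.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    CLEAN = "clean"
    CONFIGURED = "configured"
    UNKNOWN = "unknown"
    OFF = "off"


# (current state, transition) -> next state.
# The "rebuild" targets are an assumption: rebuilding is taken to return
# the machine to the clean state so that it can be initialized again.
TRANSITIONS = {
    (State.NEW, "build"): State.CLEAN,
    (State.CLEAN, "initialize"): State.CONFIGURED,
    (State.CONFIGURED, "update"): State.CONFIGURED,
    (State.CONFIGURED, "entropy"): State.UNKNOWN,
    (State.UNKNOWN, "debug"): State.CONFIGURED,
    (State.UNKNOWN, "rebuild"): State.CLEAN,
    (State.CONFIGURED, "rebuild"): State.CLEAN,
}


def apply(state, transition):
    """Return the next state, or raise ValueError if the move is not allowed."""
    # "retire" is a hypothetical name for the move into the Off state.
    if transition == "retire" and state is not State.OFF:
        return State.OFF
    try:
        return TRANSITIONS[(state, transition)]
    except KeyError:
        raise ValueError(
            f"cannot {transition} a machine in state {state.value}"
        ) from None


def run(transitions, start=State.NEW):
    """Apply a sequence of transitions and return the final state."""
    state = start
    for transition in transitions:
        state = apply(state, transition)
    return state
```

Calling run(["build", "initialize", "entropy", "debug"]) walks a new machine to Configured, lets it drift to Unknown, and brings it back. Planning for each state amounts to deciding who or what performs each arrow.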


Automated Installations

• Advantages

Saves time/money
− Boot the computer, then go do something else.

Ensures consistency
− No chance of entering wrong input during install.
− Avoids user requests due to mistakes in configuration
− What works on one desktop, works on all.

Allows for fast system recovery
− Rebuild system with auto-install vs. slow tapes.


Automated Installations

Full automation is better than partial automation
− Eliminates prompts in installation scripts
− Can send a notification when the installation is complete.

Partial automation (better than none)
− Needs proper documentation for consistency


Vendor Installations

• Weaknesses

Need to always reload the OS on new machines (you may have your own OS preference)

− You need to configure the host for your environment

− Eventually you’ll reload the OS on a desktop, leaving you with two platforms to support: the vendor OS install and your OS install.

− Vendors change their OS images from time to time, so systems bought today have a different OS from systems bought a few months ago.


Your own Installations

Little trust in the vendor's pre-installed OS

− Makes adding new apps to your clean install easier, as you are installing for your own environment. No need to contact the vendor for installation help.

− There may be a need for special apps or add-ons

− You may eventually need a re-install
□ This may be different from that of the vendor
□ Need to make sure required drivers and software are available


System and Application Updates

At times the software we install may have bugs and security loopholes.

New applications are also launched now and again.

We may need to update or upgrade our software. Updates should be automated too. Automation systems include:

− Solaris autopatch
− Windows
− Linux package updaters, e.g. yum, apt

• Updates can be retrieved from the online software update centers of the different OS vendors.

• A software update provides bug fixes for features that aren't working right and minor software enhancements, and sometimes includes new drivers.

• Some software updates are free, and are sometimes called a patch because the update is installed over software you're already using and it isn't a full software installation.


System and Application Updates

• A software upgrade requires a purchase of a new version of your software, usually at a lower price than you would pay if you bought the software for the first time.

• Some companies will offer a free upgrade to the latest version if you only recently bought your software, so be sure to register the software when you install it so you know whether you qualify for free upgrades.



• A patch is a software update comprised of code inserted (or patched) into the code of an executable program. Typically, a patch is installed into an existing software program. Patches are often temporary fixes between full releases of a software package.

• Patches may do any of the following:
– Fix a software bug
– Install new drivers
– Address new security vulnerabilities
– Address software stability issues
– Upgrade the software



• A patch is designed to update a computer program or its supporting data, to fix or improve it. This includes fixing security vulnerabilities and other bugs, and improving the usability or performance.

• Though meant to fix problems, poorly designed patches can sometimes introduce new problems.


Network configuration

Done over the network, typically using DHCP

− Eliminates time wastage and manual error

− More secure – only authorized systems have access

− Centralized control makes updates and changes a lot easier (e.g. new DNS server)


Managing Servers

Different from workstations
− Serves many users
− Requires reliability and high uptime (a server should always be on)
− Requires tight security
− Often expected to last longer
− More expensive
− Typically has a different OS configuration from desktops
− Deployed within a data center
− Often has maintenance contracts
− Has backup systems
− Has better remote access


Server HW

Buy server HW for servers

− Specialized hardware
− Offers more internal space
− Offers more CPU performance
− Offers high performance I/O (both disk and network)
− Has more upgrade options
− Can be rack mountable

For reliability, use known vendors


Vendor server product lines

• The typical vendor has three product lines:

1. Home

2. Business

3. Server



Home
− Absolute cheapest purchase price
− Original equipment manufacturer (OEM) components change often
− Focuses on being the lowest cost at the outset. Consumers make purchasing decisions based on price. (Any add-on features can be considered later and purchased for a premium cost.)



Business
− Longer life, reduced total cost of ownership
− Fewer component changes
− Concentrates on the total cost of ownership. Businesses tend to keep their computers for a longer time than the average home consumer. Therefore, the manufacturers have to keep a large pool of spare parts for maintenance of these computers.
− Usually higher quality components than those of the home line computers.



Server
− Lowest cost per performance metric
− Easier to service components and design
− The server line tends to focus on the lowest total cost of ownership. For example, a server is designed with higher quality components that will last much longer than those of the home or business lines. A server is also designed to process a higher workload than a home line computer.


Maintenance contracts

Vendors have a variety of service contracts
− Customer-purchased spare parts get replaced when they get used up

How to select a maintenance contract?

− Think of needs
□ Non-critical hosts: next-day or two-day response time is likely reasonable, or perhaps no contract
□ Large groups of similar hosts: use spares approach
□ Controlled model: only use a small set of distinct technologies so that few spare part kits are needed.


Maintenance contracts

How to select a maintenance contract?

− Think of needs
□ Critical host: stock failure-prone and interchangeable parts (power supplies, hard drives); get same-day contract
□ Large variety of models from same vendor: sufficiently large sites may opt for a contract with an on-site technician


Data backups

Servers often have critical data that must be backed up

− Client data often backed up on server

− Think of separate administrative network
□ Keep bandwidth-hungry backup jobs off of the production network
□ Provide alternate access during network problems
□ May require additional NICs, cabling, switches, etc.


Servers in the data center

Servers should be located in data centers

− Data centers provide
□ Proper power (enough power, conditioned, UPS, maybe generator)
□ Fire protection/suppression
□ Networking
□ Sufficient air conditioning (climate controlled)
□ Physical security


Remote administration

Data centers are expensive and may be distant from admin office

Servers should not require physical presence at a console

− Typical solution is a console server
□ Eliminates need for keyboard and screen
□ Can see booting, can send special keystrokes
□ Access to console server can be remote (e.g., ssh, rdesktop)
− Power cycling provided by remote-access power strips
− Media insertion & hardware servicing are still problems

RAID: Redundant Array of Independent Disks

RAID

− A system whereby two or more disks are physically linked together to form a single logical, large capacity storage device that offers a number of advantages over conventional hard disk storage devices

− Makes many smaller disks appear as one large disk to a server


Why RAID?

Superior performance and system reliability

Improved resilience through increased redundancy

Lower costs


Why RAID? Performance

Parallelism, the ability to access multiple disks at the same time, allows data to be written to or read from an array faster than would be possible with a single drive.

− Typically used in large file servers, transaction or application servers, where data accessibility is critical, and fault tolerance is required.

− Today, RAID is also being used in desktop environments for CAD, multimedia editing and playback where higher transfer rates are needed.

Performance is increased because the server has more "spindles" to read from or write to when data is accessed from a drive


Why RAID? More resilience

Redundancy keeps a copy of the data available in the storage array when a drive fails.

Failure of the whole array can be prevented by swapping in a new drive for the failed one without turning the system off.

The RAID performance depends on the number of drives used in the array.


Why RAID? More resilience

Redundancy is achieved by either writing the same data to multiple drives (mirroring), or collecting data (parity data) across the array, such that the failure of one or more disks in the array does not result in data loss.

A failed disk may be replaced by a new one, and the lost data reconstructed from the remaining data and the parity data.


Why RAID? Cheaper

Since the main principle of RAID is to provide the same or greater storage capacity from several drives as from a single large drive, there can be a significant price difference in favor of the array.

When self-repairing configurations are used that do not need humans to replace drives, storage could become cheaper and more reliable.


RAID Techniques

• Two principal techniques employed:

1. Mirroring
− The first implementation of RAID, typically requiring two individual drives of similar capacity. One drive is the active drive and the secondary drive is the mirror.
− The technique provides a simple form of redundancy for data by automatically writing data to the mirror drive when it is written to the active drive.
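
A toy model of mirroring, for illustration only. Each "disk" is just a Python dict from block number to data, and the class name Mirror and its methods are hypothetical, not part of any real RAID tool.

```python
class Mirror:
    """Toy mirrored pair: each 'disk' maps a block number to bytes."""

    def __init__(self):
        self.disks = [{}, {}]           # index 0 = active drive, 1 = mirror
        self.failed = [False, False]

    def write(self, block, data):
        # Every write goes to all drives that are still healthy.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def fail(self, index):
        # Simulate a drive failure: its contents are no longer usable.
        self.failed[index] = True
        self.disks[index] = {}

    def read(self, block):
        # Serve the read from the first drive that is still healthy.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block]
        raise IOError("both drives have failed")
```

After m = Mirror(), m.write(7, b"payroll") and m.fail(0), a call to m.read(7) still returns the data from the surviving mirror.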


RAID Techniques

• Two principal techniques employed:

2. Striping
− This technique provides increased performance. It is a method of mapping data across the physical drives in an array to create a large virtual drive.
− The data is subdivided into consecutive segments or stripes that are written sequentially across the drives in the array.
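
The mapping from the virtual drive to the physical drives can be sketched with simple round-robin arithmetic. This is a minimal illustration, assuming fixed-size stripes handed out to the drives in turn. The function names locate and stripe are made up for the example.

```python
def locate(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


def stripe(data, num_disks, stripe_size):
    """Split data into stripes and deal them out across the disks in turn."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), stripe_size):
        disk, _offset = locate(start // stripe_size, num_disks)
        disks[disk].append(data[start:start + stripe_size])
    return disks
```

With two disks and a stripe size of 2 bytes, stripe(b"ABCDEFGH", 2, 2) places AB and EF on the first disk and CD and GH on the second, so consecutive stripes can be read from different drives at the same time.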


RAID levels

Several levels of RAID are used

RAID 0 – “Disk Striping”
− It is technically not a RAID level since it provides no fault tolerance. Data is written in blocks across multiple drives, so one drive can be writing or reading a block while the next is seeking the next block.

RAID 0: Striping


RAID levels

− Advantages: higher access rate, and full utilization of the array capacity, easy to implement.

− Disadvantage: there is no fault tolerance - if one drive fails, the entire contents of the array become inaccessible.

− Ideal for non-critical systems.


RAID levels: RAID 1

RAID 1 – “Disk Mirroring”
− Provides redundancy by storing data twice. Data is written to both the data disk (active) and a mirror disk(s).
− If one fails, the controller uses either the data drive or the mirror drive for data recovery and continues operation.

RAID 1: Mirroring


RAID levels: RAID 1

− Advantage: it provides the best protection of data since the array management software will simply direct all application requests to surviving disk when one disk fails.

− Disadvantages: no improvement in data access speed, and higher cost, since twice the number of drives is required.

− Ideal for mission-critical systems


RAID levels: RAID 3

RAID 3
− Data blocks are subdivided (striped) and written in parallel on two or more drives. An additional drive stores parity information for error correction/recovery. A minimum of 3 disks is needed for a RAID 3 array.
− Since parity is used, a RAID 3 stripe set can withstand a single disk failure without losing data or access to data.
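
Parity of this kind is normally computed with bitwise XOR: the parity block is the XOR of the data blocks, and any one missing block is the XOR of everything that survives. The sketch below shows the idea on in-memory byte strings. The function names and the sample bytes are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block stored on the extra drive."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks, parity_block):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity_block])


# Three data drives and one parity drive (sample bytes are made up).
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = make_parity(data)
# Pretend the second drive failed and reconstruct its block.
recovered = rebuild([data[0], data[2]], parity)
```

Here recovered equals the lost block data[1]. Losing two blocks at once leaves too little information, which is why a single parity drive covers only a single disk failure.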

RAID 3


RAID levels: RAID 3

− Advantages: it provides high throughput (both read and write) for large data transfers; and disk failures do not significantly slow down throughput.

− Disadvantages: this technology is fairly complex and too resource intensive to be done in software; performance is slower for random, small I/O operations.


RAID levels: RAID 5

RAID 5 – “Data striping with parity”
− The most common secure RAID level.
− Similar to RAID-3 except that data are transferred to disks by independent read and write operations (not in parallel).
− The written data chunks are also larger.
− Instead of a dedicated parity disk, parity information is spread across all the drives. A minimum of 3 disks is needed for a RAID 5 array.

RAID 5 – “Data striping with parity”


RAID levels: RAID 5

RAID 5 – “Data striping with parity”

− A RAID 5 array can withstand a single disk failure without losing data or access to data. Although RAID 5 can be achieved in software, a hardware controller is recommended. Often extra cache memory is used on these controllers to improve the write performance.
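
How parity is spread across the drives can be pictured by rotating the parity position from one stripe to the next. The rotation below is just one simple scheme chosen for illustration. Real controllers and operating systems use several different layouts, and the function names are hypothetical.

```python
def parity_disk(stripe, num_disks):
    """Disk that holds parity for a given stripe (one simple rotation)."""
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes, num_disks):
    """Return one row per stripe: 'P' for parity, 'Dn' for data chunk n."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        row = []
        chunk_in_stripe = 0
        for disk in range(num_disks):
            if disk == p:
                row.append("P")
            else:
                row.append(f"D{stripe * (num_disks - 1) + chunk_in_stripe}")
                chunk_in_stripe += 1
        rows.append(row)
    return rows
```

For three disks, layout(3, 3) gives the rows D0 D1 P, then D2 P D3, then P D4 D5. Every disk carries some parity, so no single drive becomes the bottleneck for parity writes.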


RAID levels: RAID 5

− Advantages: read data transactions are very fast, while write data transactions are somewhat slower (due to the parity that has to be calculated).

− Disadvantages: Disk failures have an effect on throughput, although this is still acceptable; like RAID 3, this is complex technology.

− RAID 5 is a good all-round system that combines efficient storage with excellent security and decent performance.

− It is ideal for file and application servers.


RAID levels

• RAID 0 and RAID 1 combinations

Combines the advantages (and disadvantages) of RAID 0 and RAID 1 in one single system.
− It provides security by mirroring all data on a secondary set of disks (disk 3 and 4 in the drawing) while using striping across each set of disks to speed up data transfers.


RAID levels

• RAID 0 and RAID 1 combinations

RAID 1+0 (or 10) is a mirrored data set (RAID 1) which is then striped (RAID 0), hence the "1+0" name. A RAID 10 array requires a minimum of two drives, but is more commonly implemented with 4 drives to take advantage of speed benefits.

RAID 0+1 (or 01) is a striped data set (RAID 0) which is then mirrored (RAID 1). A RAID 0+1 array requires a minimum of four drives: two to hold the striped data, plus another two to mirror the first pair.
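
The trade-off between the levels covered so far can be summarised with a little arithmetic. The helper below is a rough sketch, assuming identical disks and, for RAID 10, two-way mirrored pairs. The function name and the figures in the usage note are hypothetical.

```python
def raid_summary(level, num_disks, disk_size_gb):
    """Return (usable capacity in GB, disk failures always survived).

    Assumes all disks are the same size; RAID 10 assumes two-way mirrors.
    """
    if level == "0":
        if num_disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return num_disks * disk_size_gb, 0          # all space, no redundancy
    if level == "1":
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_size_gb, num_disks - 1          # every disk holds a copy
    if level == "5":
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (num_disks - 1) * disk_size_gb, 1    # one disk's worth of parity
    if level == "10":
        if num_disks < 2 or num_disks % 2:
            raise ValueError("RAID 10 needs an even number of disks")
        return (num_disks // 2) * disk_size_gb, 1   # half the space is mirrors
    raise ValueError(f"unsupported level: {level}")
```

With four 1000 GB disks, raid_summary("5", 4, 1000) gives 3000 GB usable and raid_summary("10", 4, 1000) gives 2000 GB usable. Both are guaranteed to survive one disk failure. RAID 10 may survive a second failure if it hits a different mirrored pair, but that is not guaranteed.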


RAID Implementations

Data distribution across multiple drives can be managed either by dedicated hardware or by software.

When done in software the software may be part of the OS or it may be part of the firmware and drivers supplied with the card.



• Operating system based (Software RAID)

• Software implementations are now provided by many OSs.

• A software layer sits above the disk device drivers and provides an abstraction layer between the logical drives (RAIDs) and physical drives.

• The most commonly supported levels are RAID 0 and RAID 1, followed by RAID 1+0, RAID 0+1, and RAID 5.



• Software RAID

Apple's Mac OS X Server supports RAID 0, RAID 1, RAID 5 and RAID 1+0.

FreeBSD supports RAID 0, RAID 1, RAID 3, and RAID 5 and all layerings of the above via GEOM (the main storage framework for the FreeBSD OS) modules. It also supports RAID 0, RAID 1, RAID-Z, and RAID-Z2 (similar to RAID 5 and RAID 6 respectively), plus nested combinations of those, via ZFS, Sun's file system and logical volume manager.

Linux supports RAID 0, RAID 1, RAID 4, RAID 5, RAID 6 and all layerings of the above.



• Software RAID

Microsoft's server OSs support 3 RAID levels: RAID 0, RAID 1, and RAID 5. Some of the Microsoft desktop OSs support RAID, such as Windows XP Professional, which supports RAID level 0 in addition to spanning multiple disks, but only if using dynamic disks and volumes. Windows XP supports RAID 0, 1, and 5 with a simple file patch. RAID functionality in Windows is slower than hardware RAID, but allows a RAID array to be moved to another machine with no compatibility issues.

NetBSD supports RAID 0, RAID 1, RAID 4 and RAID 5 (and any nested combination of those like 1+0) via its software implementation, named RAIDframe.


RAID Implementations

• Software RAID

OpenBSD aims to support RAID 0, RAID 1, RAID 4 and RAID 5 via its software implementation softraid.

OpenSolaris and Solaris 10 support RAID 0, RAID 1, RAID 5 (or the similar “RAID Z” found only on ZFS), and RAID 6 (and any nested combination of those like 1+0) via ZFS, and can now boot from a ZFS volume on both x86 and UltraSPARC (a Sun microprocessor).

− Through SVM, Solaris 10 and earlier versions support RAID 1 for the boot filesystem, and add RAID 0 and RAID 5 support (and various nested combinations) for data drives.



• Hardware RAID

Hardware RAID controllers use different, proprietary disk layouts, so it is not usually possible to span controllers from different manufacturers.

They do not require processor resources, the BIOS can boot from them, and tighter integration with the device driver may offer better error handling.


RAID Implementations

• Hardware RAID

A hardware implementation of RAID requires at least a special-purpose RAID controller. On a desktop system this may be a PCI expansion card, a PCI-e expansion card, or built into the motherboard. Controllers supporting most types of drive may be used – IDE/ATA, SATA, SCSI, SSA, Fibre Channel, sometimes even a combination.

The controller and disks may be in a stand-alone disk enclosure, rather than inside a computer.

The enclosure may be directly attached to a computer, or connected via SAN.



• Hardware RAID

Most hardware implementations provide a read/write cache, which, depending on the I/O workload, will improve performance. In most systems the write cache is non-volatile (battery-protected), so pending writes are not lost on a power failure.

Hardware implementations provide guaranteed performance, add no overhead to the local CPU complex and can support many operating systems, as the controller simply presents a logical disk to the operating system.



• Reading Assignment

What are the advantages and disadvantages of SW RAID?

What are the advantages and disadvantages of HW RAID?


RAID and Backup!

• RAID is no substitute for back-up!

All RAID levels except RAID 0 offer protection from a single drive failure. A RAID 6 system even survives 2 disks dying simultaneously. For complete security there is a need to back up the data from a RAID system.

− A back-up comes in handy if all drives fail simultaneously because of a power spike.

− It is a safeguard if the storage system gets stolen.

− Back-ups can be kept off-site at a different location. This can come in handy if a natural disaster or fire destroys your workplace.


RAID and Backup!

• RAID is no substitute for back-up!

− The most important reason to back up multiple generations of data is user error. If someone accidentally deletes some important data and this goes unnoticed for several hours, days or weeks, a good set of back-ups ensures you can still retrieve those files.

− Read about existing backup strategies


Redundant Power supplies

• Power supplies are the 2nd most failure-prone part

• Ideally, servers should have redundant power supplies (RPSs)
– The server will still operate if one power supply fails
– Should have separate power cords
– Should draw power from different sources (e.g., separate UPS)


Hot-swap components

• Redundant components should be hot-swappable

– New components can be added without downtime

– Failed components can be replaced without outage

• Hot-swap components increase cost

– But consider cost of downtime

• Always check

– Does OS fully support hot-swapping components?
– What parts are not hot-swappable?
– How long/severe is the service interruption?


But servers are expensive!

• Is there an alternative?

• Server appliances

– Dedicated-purpose, already optimized
– Examples: file servers, web servers, email, DNS, routers, etc.

• Many inexpensive workstations

– Common approach for web services
□ Google, Hotmail, Yahoo, etc.
– Use full redundancy to counter unreliability
– Can be useful (but need to consider total costs, e.g., support and maintenance, not just purchase price)
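
A back-of-the-envelope view of how redundancy counters unreliability: if each machine is up with probability a, and failures are independent (a strong assumption that rarely holds exactly), the chance that at least one of n machines is up is 1 − (1 − a)^n. The sketch below computes this. The function names and the figures used are hypothetical.

```python
def combined_availability(single, copies):
    """Probability that at least one of `copies` independent machines is up."""
    return 1 - (1 - single) ** copies


def machines_needed(single, target):
    """Smallest number of redundant machines that reaches the target."""
    if not 0 < single <= 1 or not 0 <= target < 1:
        raise ValueError("need 0 < single <= 1 and 0 <= target < 1")
    copies = 1
    while combined_availability(single, copies) < target:
        copies += 1
    return copies
```

For example, two machines that are each up 90% of the time give about 99% combined availability, and machines_needed(0.9, 0.995) returns 3. The purchase price of the extra machines is only part of the total cost, as noted above.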


Managing Services

• Services distinguish a structured computing environment from a bunch of standalone computers

• Larger groups are typically linked by shared services that ease communication and optimize resources

• Typical environments have many services

– DNS, email, authentication, networking, printing

– Remote access, license servers, DHCP, software repositories, backup services, Internet access, file service


Managing Services

• Providing a service means
– Not just putting together hardware and software
– Making service reliable
– Scaling the service
– Monitoring, maintaining, and supporting the service


Provide Good Solid Services

• Get customer requirements
– Reason for service
□ How service will be used
□ Features needed vs. desired
□ Level of reliability required
□ Justifies budget level

– Define a service level agreement (SLA)
□ Enumerates services
□ Defines level of support provided
□ Response time commitments for various kinds of problems

– Estimate satisfaction from demos or small usability trials



• Get operational requirements

– What other services does it depend on?□ Only services/systems built to same standards or higher

□ Integration with existing authentication or directory services?

– How will the service be administered?

– Will the service scale for growth in usage or data?

– How is it upgraded? Will it require touching each desktop?

– Consider high-availability or redundant hardware

– Consider network impact and performance for remote users

• Revisit budget after considering operational concerns



• Consider an open architecture
– e.g. open protocols and open file formats
– Proprietary protocols and formats can be changed, which may cause dependent systems/vendors to become incompatible

– Beware of vendors who “embrace and extend” so that claims can be made for standards support, while not providing customer interoperability

– Open protocols allow different parties to select client vs. server portions separately

– Open protocols change slowly, typically in upward compatible ways, giving maximum product choice

– No need for protocol gateways (another system/service)



• Favor simplicity over complexity

KISS: Keep It Simple, Stupid

– Simple systems are more reliable, easier to maintain, and less expensive

– Typically a features vs. reliability trade-off

• Take advantage of vendor relationships
– Provide recommendations for standard services
– Let multiple vendors compete for your business
– Understand where the product is going
– Attempt to favor vendors who develop natively on your platform (not port to it)



• Machine independence
– Clients should access services using generic names
□ e.g., www, calendar, pop, imap, etc.
– Moving services to different machines becomes invisible to users
– Consider (at the start) what it will take to move the service to a new machine

• Supportive environment
– Data center provides power, AC, security, networking
– Only rely on systems/services also found in data center (within protected environment)



• Reliability
– Build on reliable hardware
– Exploit redundancy when available
□ Plug redundant power supply into different UPS on different circuit
– Components of service should be tightly coupled
□ Reduce single points of failure
– e.g., all on same power circuit, network switch, etc.
□ Includes dependent services
– e.g., authentication, authorization, DNS, etc.



• Reliability
– Make service as simple as possible
– Independent services on separate machines when possible, such as a DNS service independent of a mail service
□ But put multiple parts of a single service together



• Restrict access
– Customers should not need physical access to servers
□ Fewer people
□ Eliminate any unnecessary services on server (security)

• Centralization and standards
– Building a service = centralizing management of service
– May be desirable to standardize the service and centralize within the organization as well
□ Makes support easier, reducing training costs
□ Eliminates redundant resources


Provide Good Solid Services

• Performance

– If a complicated service is deployed, but slow, it is unsuccessful

– Need to build in ability to scale
□ Can't afford to build servers for service every year
□ Need to understand how the service can be split across multiple machines if needed

– Estimate capacity required for production (and get room for growth)

– First impression of user base is very difficult to correct

– When choosing hardware, consider whether service is Disk I/O, memory, or network bound



• Monitoring
– Helpdesk or front-line support must be automatically alerted to problems
– Customers that notice major problems before sysadmins are getting poor service
– Need to monitor for capacity planning as well
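
A minimal sketch of automatic alerting on capacity, assuming readings are expressed as fractions between 0 and 1. The metric names, the thresholds and the function names are hypothetical. Only shutil.disk_usage is a real standard-library call.

```python
import shutil


def usage_fraction(path="/"):
    """Fraction of the filesystem holding `path` that is currently in use."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def check(readings, thresholds):
    """Return an alert message for every reading at or above its threshold."""
    alerts = []
    for name, value in readings.items():
        limit = thresholds.get(name)
        if limit is not None and value >= limit:
            alerts.append(f"{name} at {value:.0%} (limit {limit:.0%})")
    return alerts
```

A scheduled job could call check({"disk": usage_fraction("/")}, {"disk": 0.90}) and forward any messages to the helpdesk. Logging the same readings over time gives the trend data that capacity planning needs.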

• Service roll-out
– First impressions
□ Have all documentation available
□ Helpdesk fully trained
□ Use slow roll-out (helps clients adjust to service)
