Posted on 15-Jan-2016
Network Management
Brian Bramer, Department of Computing Sciences
De Montfort University, Leicester, UK
1.1 Why is network management needed?
1. Networks encapsulate a corporate asset
The computer system hardware and software and the information stored can form a large percentage of the company's assets.
This would be in terms of the capital cost of the hardware and software and the less easily quantified value of the information stored, eg lists of customers and contacts, sales data and forecasts, designs of products (circuit diagrams, plans, etc).
Organisations traditionally (over)manage tangible (hardware) assets
The company's hardware assets appear on the inventory and the shareholders would expect these to be managed and operated efficiently. This could range from a few staff required to manage a PC network to 100+ for a large mainframe system.
The data represents a crucial asset:
(a) integrity and security: data must be protected from unauthorised access, theft, and even deliberate corruption. The sale of a company's data to a competitor could cause its failure and bankruptcy.
(b) accessibility: authorised people require access to information when and where they require it.
The traditional mainframe environment was physically centralised, simplifying management and security.
The move to distributed network systems has distributed the problems but not the responsibility (staff have to manage systems at remote sites, often in other countries).
1.2 What is needed?
1. Operational control for operational decisions
The day-to-day operation of the system in terms of fault finding and reporting (hardware and software), backing up file systems, mounting new software, etc.
2. Administrative control for tactical decisions
In a competitive environment the company must keep up to date in terms of its operation and
products.
Computer systems play an important role in this, eg what new technology will become the industry
standard over the next few years?
3. Performance analysis for tactical & strategic decisions
To enable future planning, detailed performance analysis of the current system is required, eg does the current system have any severe problems, and what will happen to the overall system if a particular area is expanded or enhanced (eg will a server or network segment become overloaded)?
1.3 What is required to support this?
Provision of adequate information
Raw data on the performance of the system in terms of usage, data flows, operational costs, etc.
Tools to support analysis of this information
Masses of raw data are of no use to management; they need easy-to-read tables, diagrams, graphs, pie charts, etc.
Procedures to implement resultant decisions
If the current system is overloaded or expansion is planned, who is responsible for what?
2 Design Criteria for Networks
Any network design will be subject to constraints and required levels of performance.
These will normally fall into the following categories:
Cost - Availability and Reliability - Throughput - Response - Security
Any design will be a trade-off between these criteria, with the particular applications which the
network must support deciding their relative priorities.
2.1 Cost
All networks are expensive, both in terms of equipment (hardware and software) and
personnel, but some are enormously expensive.
The first category of cost is for the network hardware.
This is typically a small proportion of the overall cost but will be high relative to the cost of the
processing devices which it connects, eg a small office network to connect 16 PCs @ around £500 to £1500 each will require an interface board in
each PC (@ around £50 to £150) and a dedicated file server (@ around £3000 to £5000).
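The arithmetic behind the small-office example can be checked in a few lines. This is a minimal sketch that uses only the price ranges quoted above; the constant and function names are illustrative, and it compares the network hardware against the cost of the PCs it connects, not against the overall cost (which also includes links and staff).

```python
# Price ranges (low, high) in pounds, taken from the small-office example.
NUM_PCS = 16
PC_PRICE = (500, 1500)
INTERFACE_BOARD_PRICE = (50, 150)
FILE_SERVER_PRICE = (3000, 5000)


def network_hardware_cost(board_price: int, server_price: int, num_pcs: int = NUM_PCS) -> int:
    """One interface board per PC plus one dedicated file server."""
    return num_pcs * board_price + server_price


def pc_cost(pc_price: int, num_pcs: int = NUM_PCS) -> int:
    """Cost of the processing devices being connected."""
    return num_pcs * pc_price


low_net = network_hardware_cost(INTERFACE_BOARD_PRICE[0], FILE_SERVER_PRICE[0])
high_net = network_hardware_cost(INTERFACE_BOARD_PRICE[1], FILE_SERVER_PRICE[1])
low_pcs, high_pcs = pc_cost(PC_PRICE[0]), pc_cost(PC_PRICE[1])

print(f"Network hardware: £{low_net} to £{high_net}")
print(f"PCs:              £{low_pcs} to £{high_pcs}")
# Cheapest network against the dearest PCs, then dearest network against the cheapest PCs.
print(f"Network as % of PC spend: {100 * low_net / high_pcs:.1f}% to {100 * high_net / low_pcs:.1f}%")
```

On these figures the network hardware comes to £3800 to £7400, roughly 16% to 93% of the £8000 to £24000 spent on the PCs themselves.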
The second cost category is for the communication links.
Leased telephone lines are obviously expensive and their costs have not fallen in the way that
hardware has.
With local area networks the cost of the cabling itself is not high but the costs of physically
installing it are.
Wireless networks are an alternative that avoids cabling, but they raise questions of security.
A final category of cost which many organisations forget is that for support staff.
A network does not run itself but needs to be managed - support staff are needed to maintain
the configuration of the network, monitor performance, track faults and so forth.
2.2 Availability and Reliability
Availability measures the time for which services on the network are accessible to a user, expressed as a percentage of the time they should be available.
Reliability measures how far the network preserves the integrity of network data in the light
of faults.
These can always be improved (although 100% can never be guaranteed) at greater cost.
This is achieved by increasing the redundancy in the network, eg by adding duplicate comms links
to provide alternate routes, duplicating equipment, etc.
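Availability, and the effect redundancy has on it, can be put into numbers. This minimal sketch assumes that a duplicated component keeps the service up whenever at least one copy is working, that failures of the copies are independent and that switchover is instantaneous; the function names and the 87.6-hour downtime figure are hypothetical.

```python
def availability(uptime_hours: float, scheduled_hours: float) -> float:
    """Fraction of the scheduled service time for which the service was accessible."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return uptime_hours / scheduled_hours


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` duplicated components, any one of which can carry the service.

    Assumes independent failures and instant switchover; common-mode failures
    (shared power supply, same cable duct) make real figures worse.
    """
    return 1 - (1 - single) ** copies


# Hypothetical comms link that was down for 87.6 hours of an 8760-hour year.
link = availability(8760 - 87.6, 8760)
print(f"single link: {link:.2%}")
print(f"duplicated:  {redundant_availability(link, 2):.4%}")
```

A single link at about 99% rises to about 99.99% when duplicated, and adding further copies pushes the figure closer to, but never to, 100%, at a cost for every copy.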
Redundancy is also used in communication protocols to provide checks on the integrity of data.
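As an illustration of redundancy used for integrity checking, the sketch below has a sender append a 32-bit cyclic redundancy check (the same kind of check Ethernet adds to every frame) and a receiver recompute it. It uses Python's standard `zlib.crc32`; the function names and the 4-byte big-endian trailer layout are assumptions for this example, not a particular protocol's frame format.

```python
import zlib


def add_crc(payload: bytes) -> bytes:
    """Sender: append a 4-byte CRC-32 computed over the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Receiver: recompute the CRC over the payload and compare it with the trailer."""
    if len(frame) < 4:
        return False
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


frame = add_crc(b"customer list, page 1")
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit "in transit"
print(crc_ok(frame), crc_ok(damaged))
```

The 4 extra bytes carry no user data; they are exactly the kind of overhead that makes useful throughput lower than the raw data rate.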
2.3 Throughput
Throughput measures the capacity of the network for sustained transfer of USEFUL data;
this measure is often crucial to the success of transaction processing systems such as the Stock
Exchange share dealing system.
It is not the same as the network data rate;
an Ethernet local area network may have a raw transmission rate of 10 Mbps but the 'application' data which it can carry in a second will be very
much less.
Many network protocols add a significant overhead to the data (the redundancy mentioned
above) and may operate on a basis which requires the transmission of control messages,
retransmission of messages damaged by errors, etc.
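A best-case estimate shows how much of a 10 Mbps Ethernet's raw rate is left for application data once per-frame overhead is counted. This sketch assumes TCP over IPv4 with no options (40 bytes of headers, an assumption not made in these notes), standard Ethernet framing (8-byte preamble and start delimiter, 14-byte header, 4-byte frame check sequence, 12-byte inter-frame gap, 46-byte minimum payload) and no collisions, acknowledgements or retransmissions, so real figures will be lower still.

```python
RAW_RATE_BPS = 10_000_000      # 10 Mbps Ethernet
PREAMBLE_AND_SFD = 8           # bytes on the wire before each frame
ETHERNET_HEADER = 14           # destination MAC, source MAC, EtherType
FCS = 4                        # frame check sequence (CRC-32)
INTERFRAME_GAP = 12            # idle time between frames, in byte times
MIN_PAYLOAD, MAX_PAYLOAD = 46, 1500
IP_TCP_HEADERS = 40            # assumption: IPv4 + TCP, no options


def best_case_goodput_bps(app_bytes_per_frame: int) -> float:
    """Application data rate in bits per second with frames sent back to back, error free."""
    if not 1 <= app_bytes_per_frame <= MAX_PAYLOAD - IP_TCP_HEADERS:
        raise ValueError("application data must fit in one frame")
    # Short payloads are padded up to the Ethernet minimum.
    payload = max(app_bytes_per_frame + IP_TCP_HEADERS, MIN_PAYLOAD)
    wire_bytes = PREAMBLE_AND_SFD + ETHERNET_HEADER + payload + FCS + INTERFRAME_GAP
    return RAW_RATE_BPS * app_bytes_per_frame / wire_bytes


for size in (1, 64, 512, 1460):
    print(f"{size:5d} bytes/frame -> {best_case_goodput_bps(size) / 1e6:.2f} Mbps")
```

Even with full-size frames only about 9.5 Mbps of the 10 Mbps is application data; with one-byte messages (eg single keystrokes) it falls to about 0.12 Mbps.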
A number of network protocols behave very badly as the load increases, eg they may stop transmitting ANY useful data above a certain load.
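A classic textbook illustration of this collapse (not taken from these notes) is the ALOHA random-access protocol. Under the idealised assumption that all attempted transmissions, including retransmissions, form a Poisson stream with a mean of G frames per frame time, the useful throughput S rises with load only up to a point and then falls towards zero. The sketch evaluates the standard formulas for the pure and slotted variants.

```python
import math


def pure_aloha_throughput(offered_load: float) -> float:
    """Successful frames per frame time, S = G * exp(-2G), for offered load G."""
    return offered_load * math.exp(-2 * offered_load)


def slotted_aloha_throughput(offered_load: float) -> float:
    """Slotted variant, S = G * exp(-G): the vulnerable period shrinks to one slot."""
    return offered_load * math.exp(-offered_load)


for g in (0.25, 0.5, 1.0, 2.0, 5.0):
    print(f"G={g:4.2f}  pure S={pure_aloha_throughput(g):.3f}  "
          f"slotted S={slotted_aloha_throughput(g):.3f}")
```

Pure ALOHA peaks at G = 0.5 with only about 18% of the channel carrying successful frames; at five times the channel capacity in offered load almost nothing gets through.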
2.4 Response time
Response time is the criterion which is the biggest bugbear for most network designers since, after availability, it is usually the aspect to which users are most sensitive.
Response time problems are caused by queuing for network facilities - processor, line capacity or (most often) file access - and increase with load.
The problem can be improved at a cost by increasing the speed of the scarcest resources
but it is usually very difficult to predict exactly how much capacity will be required.
The mathematical techniques which exist require a large number of idealised assumptions about
the load; these idealised conditions are unlikely to be met in practice.
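The simplest of these queueing models, the M/M/1 queue, shows why response time rises so sharply with load. It assumes Poisson arrivals, exponentially distributed service times, a single server and an unlimited queue, exactly the kind of idealised assumptions described above, and gives a mean response time of 1/(μ − λ) for service rate μ and arrival rate λ. The file server handling 100 requests per second is a hypothetical example.

```python
def mm1_mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (queueing + service) for an M/M/1 queue: T = 1 / (mu - lambda).

    Assumes Poisson arrivals, exponential service times, one server and an
    unlimited queue; idealised conditions that are rarely met in practice.
    """
    if service_rate <= 0 or arrival_rate < 0:
        raise ValueError("service_rate must be positive and arrival_rate non-negative")
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical file server able to handle 100 requests per second.
for load in (10, 50, 80, 90, 95, 99):
    t = mm1_mean_response_time(load, 100)
    print(f"utilisation {load:3d}%  mean response {t * 1000:7.1f} ms")
```

In this model, going from 50% to 99% utilisation takes the mean response from 20 ms to 1 s, which is why a server that looks comfortably loaded can become a bottleneck after only a modest expansion.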
2.5 Security
Security is concerned with preventing unauthorised access to the network, eg by unauthorised users trying to gain access or by passive eavesdropping on the traffic flowing through the network.
The first is usually implemented by means of physical (badges, toggles) or logical (password)
keys.
The second can be achieved by encryption of all data transmitted around the network and/or
shielding the transmission medium.
Data access is normally controlled via a hierarchical series of permissions for the different
file operations.
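A hierarchy of permissions can be modelled as ordered levels, where holding a level implies every level below it. The sketch below is a hypothetical scheme: the level names, the operation-to-level mapping and the `GRANTS` table are invented for illustration and do not describe any particular network operating system.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical permission levels; each level includes every level below it."""
    NONE = 0
    READ = 1
    WRITE = 2
    DELETE = 3
    CONTROL = 4   # may change other users' permissions


# Minimum level each file operation requires (illustrative mapping).
REQUIRED = {
    "open": Access.READ,
    "copy": Access.READ,
    "modify": Access.WRITE,
    "delete": Access.DELETE,
    "grant": Access.CONTROL,
}

# Hypothetical grants: (user, directory) -> level; anything not listed gets NONE.
GRANTS = {
    ("alice", "/sales"): Access.WRITE,
    ("bob", "/sales"): Access.READ,
}


def permitted(user: str, directory: str, operation: str) -> bool:
    """True if the user's level on the directory meets the operation's minimum."""
    level = GRANTS.get((user, directory), Access.NONE)
    return level >= REQUIRED[operation]


print(permitted("alice", "/sales", "modify"), permitted("bob", "/sales", "modify"))
```

Because the levels are ordered, a single comparison answers each request, and granting a user a higher level automatically grants the lower ones.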
3 Network Management Elements
3.1 Configuration Management
1 What have we got where (hardware, software, information, users)?
2 What is it used for? Who has access to it?
3 What is its status? Operational, being upgraded, faulty, etc.
3.2 Fault Management
1 Detection of fault conditions
2 Diagnosis of problem
3 Recovery of service
4 Progressing fault clearance
3.4 Performance Analysis
1 Monitoring service levels
2 Identifying potential faults
3 Forward capacity planning
Ideally all the above form part of an integrated package.
4 Configuration Management
4.1 Why is it needed?
1. Fault detection and identification
How are faults detected?
Who checks reported faults?
What is wrong?
2. Reconfiguration for fault isolation & recovery
In a commercial organisation a fault could cause serious disruption and loss of revenue.
Is it possible to reconfigure the system while the fault is being repaired, eg using mirrored disks and/or servers, alternative network routes, etc.?
3. Facilitating change
The system will need to change due to changes in technology, new software and in response to changes in organisational requirements, eg:
personnel changes - training, backup and replacement of staff
upgrades to hardware or software
system expansion due to new organisational requirements
How is this managed without causing chaos and staff resentment?
4. Supplier performance analysis
How are the hardware and software performing, eg are particular makes of disk giving problems?
What is the quality of the maintenance, eg response time, quality of components, capability of the supplier's staff, does the fault reappear, etc?
4.2 What is involved?
A formalised knowledge of all components, their location and status:
(a) inventory
type of equipment and function
identification, eg network address, serial number, etc.
supplier information, eg who to contact when faulty
(b) topography
physical cable paths
bridges, gateways, routers
hubs, network connections and taps
cable type and capacity, eg bandwidth
(c) status
current and historical status, eg processor type, RAM and disk size, etc.
upgradability, eg free memory and bus slots, physical capacity, etc.
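The inventory and status records listed above amount to a simple data structure. The sketch below is one hypothetical layout for a single component's record (all field names and sample values are invented for illustration); the point it shows is that status changes are appended to a history rather than overwritten, so both current and historical status remain available.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class InventoryItem:
    """One component in a configuration database (field names are illustrative)."""
    equipment_type: str          # eg "file server", "router"
    function: str
    network_address: str
    serial_number: str
    supplier_contact: str        # who to contact when it is faulty
    location: str
    free_slots: int = 0          # upgradability
    status: str = "operational"
    history: List[Tuple[date, str, str]] = field(default_factory=list)

    def set_status(self, new_status: str, when: date) -> None:
        """Record the change as (date, old status, new status), then update."""
        self.history.append((when, self.status, new_status))
        self.status = new_status


server = InventoryItem(
    equipment_type="file server",
    function="departmental file store",
    network_address="10.0.1.5",
    serial_number="FS-0042",
    supplier_contact="maintenance desk",
    location="Room 2.14",
    free_slots=2,
)
server.set_status("faulty", date(2016, 1, 15))
server.set_status("operational", date(2016, 1, 16))
print(server.status, server.history)
```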
5 Fault Management
5.1 Why is it needed?
To support rapid response to problems in order to maximise:
1 user satisfaction with the response to real or perceived faults
2 company productivity
5.2 What is involved?
1 Fault detection (signal alarm)
(a) faults reported by users, eg "crash" or degradation of service level
(b) a large network needs continuous monitoring, ideally with automated probes: programs which monitor network activity and error levels, server workload, disk errors, etc.; this could be done from a remote central site, typically by raising an alarm signal on a network map
2 Problem diagnosis (accept alarm)
(a) check it is a real fault
(b) localise problem and isolate from network
(c) classify and prioritise, eg is a server down?
(d) identify responsibility for repair, eg on-site, supplier, maintenance company, etc.
3 Problem recovery (clear alarm)
reconfigure network if possible, eg using mirrors, moving work to other systems, etc.
(i) automatic rerouting
(ii) hot spare switched in on line, eg mirror disks and servers
(iii) cold spare: install a replacement
this implies duplication of facilities, which is expensive, and applies to both hardware and software backup
4 Progressing faults: typically based on a trouble ticket
(a) identifies component & symptom
(b) raised when fault detected
(c) status of open tickets reported periodically
(d) status updated periodically
(e) identify cause and correction
(f) closed when problem corrected
Clearly one needs to maintain a record of the fault history of the various system components, eg do you have a 'Friday afternoon' component with a poor history?
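The trouble-ticket steps (a) to (f), and the fault history built from the tickets, can be sketched as a small record type. This is a hypothetical design: the class, field and function names are invented, and a real system would persist tickets in a database rather than in memory. The comments tie each part back to the lettered steps.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

_next_id = count(1)


@dataclass
class TroubleTicket:
    component: str                 # (a) identifies the component ...
    symptom: str                   # ... and the symptom
    ticket_id: int = field(default_factory=lambda: next(_next_id))
    updates: List[str] = field(default_factory=list)
    cause: Optional[str] = None
    correction: Optional[str] = None
    closed: bool = False

    def update(self, note: str) -> None:
        """(d) periodic status update; not allowed once the ticket is closed."""
        if self.closed:
            raise ValueError(f"ticket {self.ticket_id} is already closed")
        self.updates.append(note)

    def close(self, cause: str, correction: str) -> None:
        """(e) record cause and correction, then (f) close the ticket."""
        self.cause, self.correction = cause, correction
        self.closed = True


def open_tickets(tickets: List[TroubleTicket]) -> List[TroubleTicket]:
    """(c) the periodic report of tickets still open."""
    return [t for t in tickets if not t.closed]


def fault_history(tickets: List[TroubleTicket]) -> Counter:
    """Number of tickets ever raised per component, to spot poor performers."""
    return Counter(t.component for t in tickets)


tickets = [
    TroubleTicket("disk-3", "read errors"),   # (b) raised when fault detected
    TroubleTicket("hub-1", "no link light"),
    TroubleTicket("disk-3", "server crash"),
]
tickets[0].update("engineer on site")
tickets[0].close("bad sectors", "disk replaced under warranty")
print(len(open_tickets(tickets)), fault_history(tickets).most_common(1))
```

Keeping closed tickets, rather than deleting them, is what makes the fault history per component possible.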
5.3 What tools are needed?
Fault management is very expensive in personnel.
Maximise automated support, eg equipment that monitors the level of faults, etc. and reports when a threshold is exceeded.
Most networks provide data collection tools & probes to support this
Most equipment supports loopback tests, which enable individual pieces of equipment to be tested when one is not sure what is wrong.
A number of third-party suppliers serve WANs, PC LANs, etc., giving a wide choice of suppliers of services and equipment.
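Threshold-based reporting, as mentioned above for automated support, can be as simple as counting errors in a sliding time window. The sketch below is a hypothetical monitor (class name, threshold and window are invented); it assumes error timestamps, in seconds, arrive in non-decreasing order.

```python
from collections import deque


class ErrorRateAlarm:
    """Signal an alarm when more than `threshold` errors occur within `window` seconds."""

    def __init__(self, threshold: int, window: float) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.threshold = threshold
        self.window = window
        self._times = deque()

    def record_error(self, timestamp: float) -> bool:
        """Log one error; return True if the alarm should be signalled."""
        self._times.append(timestamp)
        # Discard errors that have slid out of the time window.
        while self._times[0] <= timestamp - self.window:
            self._times.popleft()
        return len(self._times) > self.threshold


alarm = ErrorRateAlarm(threshold=3, window=60.0)
results = [alarm.record_error(t) for t in (0, 10, 20, 30, 100)]
print(results)
```

The fourth error within a minute raises the alarm; by t = 100 the earlier errors have aged out and the alarm condition clears. Choosing the threshold and window is the real management decision: too low and staff chase noise, too high and faults go unreported.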
6 Security & Accounting Management
6.1 Why is it needed?
The company's data is a valuable asset and it is important to:
Minimise risk of inadvertent or malicious damage, eg by incompetent operators, disgruntled staff,
etc.
Prevent theft or unauthorised disclosure
Minimise or prevent loss of information when a fault occurs, eg a disk crash
Adequately account for consumption of resources: computer systems are expensive to purchase
and operate - they must be properly costed and paid for by users.
6.2 What is involved?
Physical access controls on network, eg key operated doors to access terminals, locks
on keyboards, etc.
Logical access controls on network & data paths, eg passwords at various security levels within the
system.
Filters on the import of files, eg users not allowed to import files from floppy disk or via the internet.
Audit, ie keeping track of what users are doing
Systematic backup of storage
ie continuous/daily/weekly backup either locally to magnetic tape or to remote sites, and care of backups (in fireproof safes, copies in other buildings, etc.), together with the use of mirrored disks/servers, etc.
Bookkeeping for accounting management
1 storage used
2 peripheral use, eg printers, plotters, etc.
3 cpu usage
4 communications transmission capacity usage
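The four bookkeeping items can be turned into a charge per user once a tariff is agreed. The sketch below assumes a hypothetical tariff and hypothetical units (MB-days of storage, printed pages, CPU seconds, megabytes sent); it uses `Decimal` so that money sums do not pick up binary floating-point rounding. It does not address the harder problem of allocating shared resources fairly.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class UsageRecord:
    """One user's consumption over an accounting period (hypothetical units)."""
    user: str
    storage_mb_days: int       # 1 storage used
    pages_printed: int         # 2 peripheral use
    cpu_seconds: int           # 3 cpu usage
    megabytes_sent: int        # 4 communications transmission capacity usage


# Hypothetical tariff in pounds per unit.
TARIFF = {
    "storage_mb_days": Decimal("0.001"),
    "pages_printed": Decimal("0.05"),
    "cpu_seconds": Decimal("0.002"),
    "megabytes_sent": Decimal("0.01"),
}


def charge(record: UsageRecord) -> Decimal:
    """Sum usage times rate over every tariff item, rounded to whole pence."""
    total = sum((getattr(record, item) * rate for item, rate in TARIFF.items()), Decimal("0"))
    return total.quantize(Decimal("0.01"))


print(charge(UsageRecord("alice", storage_mb_days=30_000, pages_printed=200,
                         cpu_seconds=5_000, megabytes_sent=1_000)))
```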
6.3 What tools are needed?
Variety of hardware and software access controls, eg badges, passwords etc.
Virus checkers and disinfectants
Automatic spy monitors to check usage, ie programs that monitor what is being run or
accessed and by whom.
Usage loggers: logging is easy to do at node level, but usage is remarkably difficult to allocate fairly
7 Performance Analysis
7.1 Why is it needed?
Optimise (not maximise) resource utilization
Proactive fault analysis, ie find a fault before it becomes catastrophic
Support forward capacity planning
7.2 What does it involve?
1 Monitoring objective system performance criteria
throughput, ie number of programs running, amount of data transferred, etc.
response, eg typical response to a query on a database
availability, ie how much system down time?
2 Identifying potential problem areas:
high error incidence on a cable segment, eg possibly damaged cable, bad connector.
abnormal level of retries on a connection
abnormal level of transfer errors from a disk
3 Identifying bottlenecks
low disk space, ie need more disk space or move data to another server
excessive collisions on an Ethernet segment, eg need another bridge, move some terminals to
another segment, etc.
high reject level from a bridge
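Deciding what counts as an "abnormal" level of errors or retries needs a baseline. One simple, swapped-in technique (not a method from these notes) flags a reading that exceeds the historical mean by more than k standard deviations. The segment names, hourly error counts and k = 3 below are hypothetical.

```python
import statistics
from typing import Dict, List


def abnormal(history: List[float], current: float, k: float = 3.0) -> bool:
    """True if `current` exceeds the historical mean by more than k standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two past observations")
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    return current > mean + k * spread


def segments_to_investigate(baselines: Dict[str, List[float]],
                            latest: Dict[str, float]) -> List[str]:
    """Segments whose latest error count is abnormal relative to their own history."""
    return [seg for seg, errors in latest.items() if abnormal(baselines[seg], errors)]


# Hypothetical hourly CRC-error counts per Ethernet segment.
baselines = {
    "segment-A": [12, 15, 11, 14, 13, 12, 16, 14],
    "segment-B": [3, 4, 2, 5, 3, 4, 3, 2],
}
latest = {"segment-A": 17, "segment-B": 40}
print(segments_to_investigate(baselines, latest))
```

Comparing each segment with its own history means a normally noisy segment is not flagged, while a sudden jump on a normally quiet one (possibly a damaged cable or bad connector) is.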
7.3 What tools are needed?
Monitors similar to those for fault management.