network management brian bramer department of computing sciences demontfort university leicester uk

Network Management

Brian Bramer, Department of Computing Sciences

DeMontfort University, Leicester UK

1.1 Why is network management needed?

1. Networks encapsulate a corporate asset

The computer system hardware and software and the information stored can form a large percentage of the company's assets.

This would be in terms of the capital cost of the hardware and software and the value, less easy to quantify, of the information stored, eg lists of customers and contacts, sales data and forecasts, designs of products (circuit diagrams, plans, etc).

Organisations traditionally (over)manage tangible (hardware) assets

The company's hardware assets appear on the inventory and the shareholders would expect these to be managed and operated efficiently. This could range from a few staff required to manage a PC network to 100+ for a large mainframe system.

The data represents a crucial asset:

(a) integrity and security: data must be protected from unauthorised access, theft, and even deliberate corruption. The sale of a company's data to a competitor could cause its failure and bankruptcy.

(b) accessibility: authorised people require access to information when and where they require it.

The traditional mainframe environment was physically centralised, simplifying management and security.

The move to distributed network systems has distributed the problems but not the responsibility (staff have to manage systems at remote sites, often in other countries).

1.2 What is needed?

1. Operational control for operational decisions

The day to day operation of the system in terms of fault finding and reporting (hardware and

software), backing up file systems, mounting new software, etc.

2. Administrative control for tactical decisions

In a competitive environment the company must keep up to date in terms of its operation and products.

Computer systems play an important role in this, eg what new technology will become the industry standard over the next few years?

3. Performance analysis for tactical & strategic decisions

To enable future planning, detailed performance analysis of the current system is required, eg has the current system any severe problems, what will happen to the overall system if a particular area is expanded or enhanced (eg will a server or network segment become overloaded), etc.

1.3 What is required to support this?

Provision of adequate information

Raw data on the performance of the system in terms of usage, data flows, operational costs, etc.

Tools to support analysis of this information

Masses of raw data are no use to management; they need easy to read tables, diagrams, graphs, pie charts, etc.

Procedures to implement resultant decisions

The current system is overloaded or expansion is planned; who is responsible for what?

2 Design Criteria for Networks

Any network design will be subject to constraints and required levels of performance.

These will normally fall into the following categories:

Cost - Availability and Reliability - Throughput - Response - Security

Any design will be a trade-off between these criteria, with the particular applications which the network must support deciding their relative priorities.

2.1 Cost

All networks are expensive, both in terms of equipment (hardware and software) and personnel, but some are enormously expensive.

The first category of cost is for the network hardware.

This is typically a small proportion of the overall cost but will be high relative to the cost of the processing devices which it connects, eg a small office network to connect 16 PCs @ around £500 to £1500 each will require an interface board in each PC (@ around £50 to £150) and a dedicated file server (@ around £3000 to £5000).
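The arithmetic behind the small office example can be worked through as a short sketch. It uses only the approximate price ranges quoted above; the function names are illustrative, and the "low" and "high" cases simply pair the lowest prices together and the highest prices together.

```python
# Worked cost arithmetic for the 16-PC office network example.
# All prices are the approximate ranges quoted in the text (pounds sterling).
NUM_PCS = 16
PC_COST = (500, 1500)         # cost per PC (low, high)
INTERFACE_COST = (50, 150)    # network interface board per PC (low, high)
SERVER_COST = (3000, 5000)    # one dedicated file server (low, high)

def network_hardware_cost(bound):
    """Interface boards plus file server; bound is 0 for low prices, 1 for high."""
    return NUM_PCS * INTERFACE_COST[bound] + SERVER_COST[bound]

def pc_cost(bound):
    """Total cost of the PCs being connected."""
    return NUM_PCS * PC_COST[bound]

for label, bound in (("low", 0), ("high", 1)):
    net = network_hardware_cost(bound)
    pcs = pc_cost(bound)
    print(f"{label}: network hardware GBP {net}, PCs GBP {pcs}, "
          f"ratio {net / pcs:.2f}")
```

On these figures the network hardware costs between about a third and a half as much as the PCs it connects, which is what "high relative to the cost of the processing devices" amounts to here.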

The second cost category is for the communication links.

Leased telephone lines are obviously expensive and their costs have not fallen in the way that hardware costs have.

With local area networks the cost of the cabling itself is not high but the costs of physically installing it are.

Alternative wireless networks - security?

A final category of cost which many organisations forget is that for support staff.

A network does not run itself but needs to be managed - support staff are needed to maintain the configuration of the network, monitor performance, track faults and so forth.

2.2 Availability and Reliability

Availability measures the time that services on the network are accessible to a user as a percentage of the time they should be available.

Reliability measures how far the network preserves the integrity of network data in the presence of faults.

These can always be improved (although 100% can never be guaranteed) at greater cost.

This is achieved by increasing the redundancy in the network, eg by adding duplicate comms links to provide alternate routes, duplicating equipment, etc.
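A minimal sketch of how availability is calculated, and of how duplicating a link improves it. The figures (5 hours down in a 500-hour scheduled period) are hypothetical, and the redundancy formula assumes that the duplicates fail independently, which real links sharing a duct or a power supply may not.

```python
def availability(scheduled_hours, downtime_hours):
    """Percentage of the scheduled time for which the service was accessible."""
    return 100.0 * (scheduled_hours - downtime_hours) / scheduled_hours

def parallel_availability(*fractions):
    """Availability (as a fraction 0..1) of redundant components in parallel.

    Assumes independent failures: the service is lost only when every
    duplicate is down at the same time.
    """
    all_down = 1.0
    for a in fractions:
        all_down *= (1.0 - a)
    return 1.0 - all_down

# Hypothetical link: down for 5 hours in a 500-hour scheduled period.
single = availability(500, 5) / 100.0
# Two such links providing alternate routes.
duplicated = parallel_availability(single, single)
print(f"single link: {single:.4f}, duplicated: {duplicated:.4f}")
```

Under these assumptions one link gives 99% availability and two give about 99.99%: better, at twice the link cost, but still not 100%.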

Redundancy is also used in communication protocols to provide checks on the integrity of data.
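As an illustration of redundancy used as an integrity check, the sketch below appends a CRC-32 value to each message and recomputes it on receipt. The choice of CRC-32 and the function names are assumptions made for the example; the notes do not name a particular check.

```python
import zlib

def add_check(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (the redundant information)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def verify(frame: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the one carried in the frame."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

frame = add_check(b"sales forecast")
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]   # one bit flipped in transit
print(verify(frame), verify(damaged))            # True False
```

The four extra bytes carry no application data, which is why such checks also count against throughput.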

2.3 Throughput

Throughput measures the capacity of the network for sustained transfer of USEFUL data; this measure is often crucial to the success of transaction processing systems such as the Stock Exchange share dealing system.

It is not the same as the network data rate; an Ethernet local area network may have a raw transmission rate of 10 Mbps but the 'application' data which it can carry in a second will be very much less.

Many network protocols add a significant overhead to the data (the redundancy mentioned above) and may operate on a basis which requires the transmission of control messages, retransmission of messages damaged by errors, etc.
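The gap between raw data rate and useful throughput can be estimated with a rough sketch. The per-frame figures are the standard Ethernet framing overheads; the 40 bytes of higher-layer headers assume TCP/IP without options, which is an assumption for illustration only. Collisions, control messages and retransmissions are ignored, so this is a best case.

```python
# Per-frame overhead on Ethernet, in bytes.
PREAMBLE = 8          # preamble and start-of-frame delimiter
MAC_HEADER = 14       # destination address, source address, type/length
FCS = 4               # frame check sequence
INTERFRAME_GAP = 12   # mandatory idle time between frames, in byte times
MIN_PAYLOAD = 46      # shorter payloads are padded up to this size

def useful_rate_mbps(app_bytes, raw_mbps=10.0, higher_layer_headers=40):
    """Best-case application data rate when every frame carries app_bytes.

    higher_layer_headers=40 assumes TCP/IP headers without options
    (an assumption; the notes name no higher-layer protocol).
    """
    payload = max(app_bytes + higher_layer_headers, MIN_PAYLOAD)
    on_wire = PREAMBLE + MAC_HEADER + payload + FCS + INTERFRAME_GAP
    return raw_mbps * app_bytes / on_wire

for size in (1, 64, 1460):
    print(f"{size:5d} application bytes per frame -> "
          f"{useful_rate_mbps(size):.2f} Mbps useful")
```

With full-size frames the overhead is small, but an application that sends a few bytes at a time (keystrokes, short transactions) gets only a small fraction of the 10 Mbps, even before contention for the medium is taken into account.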

A number of network protocols are very badly behaved when the load level increases, eg they may stop transmitting ANY useful data above a certain load.
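One textbook model that shows this kind of collapse is slotted ALOHA, where the fraction of slots carrying a successful frame is S = G * exp(-G) for an offered load of G frames per slot. It is used here purely as an illustration; the notes do not say which protocols they have in mind.

```python
import math

def slotted_aloha_throughput(offered_load):
    """Fraction of slots carrying a successful frame: S = G * exp(-G)."""
    return offered_load * math.exp(-offered_load)

for g in (0.25, 0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"offered load {g:4.2f} -> "
          f"useful throughput {slotted_aloha_throughput(g):.3f}")
```

In this model useful throughput peaks at about 0.37 when the offered load is 1 and then falls towards zero as more stations try to transmit, because almost every slot is lost to collisions.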

2.4 Response time

Response time is the criterion which is the biggest bugbear for most network designers since, after availability, it is usually the aspect to which users are most sensitive.

Response time problems are caused by queuing for network facilities - processor, line capacity or (most often) file access - and increase with load.

The problem can be improved at a cost by increasing the speed of the scarcest resources but it is usually very difficult to predict exactly how much capacity will be required.

The mathematical techniques which exist require a large number of idealised assumptions about the load; these idealised conditions are unlikely to be met in practice.
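The simplest of these mathematical techniques is the single-server M/M/1 queue, sketched below as an example. It assumes random (Poisson) arrivals, exponentially distributed service times and one server working first-come first-served, which are exactly the kind of idealised assumptions referred to above. The service rate of 100 requests per second is hypothetical.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time in the system (queueing plus service) for an M/M/1 queue.

    Rates are in requests per second. The formula 1 / (mu - lambda) holds
    only while the arrival rate is below the service rate; beyond that
    the queue grows without bound.
    """
    if arrival_rate >= service_rate:
        raise ValueError("offered load exceeds capacity")
    return 1.0 / (service_rate - arrival_rate)

SERVICE_RATE = 100.0   # hypothetical file server: 100 requests per second
for load in (0.5, 0.8, 0.9, 0.95, 0.99):
    t = mm1_response_time(load * SERVICE_RATE, SERVICE_RATE)
    print(f"utilisation {load:.0%}: mean response {t * 1000:.0f} ms")
```

Even this idealised model shows why response time is hard to manage: the mean response rises from 20 ms at 50% utilisation to about 1 second at 99%, so a modest increase in load near saturation has a disproportionate effect.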

2.5 Security

Security is concerned with preventing unauthorised access to the network, eg by unauthorised users trying to gain access or by passive eavesdropping on the traffic flowing through the network.

Protection against the first is usually implemented by means of physical (badges, toggles) or logical (password) keys.

Protection against the second can be achieved by encryption of all data transmitted around the network and/or shielding the transmission medium.

Data access is normally controlled via a hierarchical series of permissions for the different file operations.
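One common form of such a hierarchy is the owner / group / everyone-else scheme, sketched below. The class, the operation names and the users are hypothetical; real systems differ in the levels and operations they support.

```python
from dataclasses import dataclass, field

@dataclass
class FileEntry:
    owner: str
    group: str
    # Operations allowed at each level, checked from most to least specific.
    owner_ops: set = field(default_factory=lambda: {"read", "write"})
    group_ops: set = field(default_factory=lambda: {"read"})
    other_ops: set = field(default_factory=set)

def is_allowed(entry, user, user_groups, operation):
    """Return True if `user` may perform `operation` on the file."""
    if user == entry.owner:
        return operation in entry.owner_ops
    if entry.group in user_groups:
        return operation in entry.group_ops
    return operation in entry.other_ops

designs = FileEntry(owner="alice", group="engineering")
print(is_allowed(designs, "alice", {"engineering"}, "write"))   # True
print(is_allowed(designs, "bob", {"engineering"}, "write"))     # False
print(is_allowed(designs, "carol", {"sales"}, "read"))          # False
```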

3 Network Management Elements

3.1 Configuration Management

1 What have we got where (hardware, software, information, users)?

2 What is it used for? Who has access to it?

3 What is its status? Operational, being upgraded, faulty, etc.

3.2 Fault Management

1 Detection of fault conditions

2 Diagnosis of problem

3 Recovery of service

4 Progressing fault clearance

3.4 Performance Analysis

1 Monitoring service levels

2 Identifying potential faults

3 Forward capacity planning

Ideally all the above form part of an integrated package.

4 Configuration Management

4.1 Why is it needed?

1. Fault detection and identification

How are faults detected?

Who checks reported faults?

What is wrong?

2 Reconfiguration for fault isolation & recovery

In a commercial organisation a fault could cause serious disruption and loss of revenue.

Is it possible to reconfigure the system while the fault is being repaired, eg mirrored disks and/or servers, alternative network routes, etc?

3 Facilitating change

The system will need to change due to changes in technology, new software and in response to changes in organisational requirements, eg:

personnel changes - training, backup and replacement of staff

upgrades to hardware or software

system expansion due to new organisational requirements

How is this managed without causing chaos and staff resentment?

4 Supplier performance analysis

How are the hardware and software performing, eg are particular makes of disk giving problems?

What is the quality of the maintenance, eg response time, quality of components, capability of supplier's staff, does the fault reappear, etc?

4.2 What is involved?

A formalised knowledge of all components, their location and status:

(a) inventory

type of equipment and function

identification, eg network address, serial number, etc.

supplier information, eg who to contact when faulty

(b) topography

physical cable paths

bridges, gateways, routers

hubs, network connections and taps

cable type and capacity, eg bandwidth

(c) status

current and historical status, eg processor type, RAM and disk size, etc.

upgradability, eg free memory and bus slots, physical capacity, etc.
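The formalised knowledge described above could be held as one record per component. The sketch below is one possible shape for such a record; the field names, addresses, serial numbers and supplier are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    OPERATIONAL = "operational"
    BEING_UPGRADED = "being upgraded"
    FAULTY = "faulty"

@dataclass
class Component:
    # (a) inventory
    kind: str                  # type of equipment and function
    network_address: str
    serial_number: str
    supplier_contact: str      # who to contact when faulty
    # (b) topography
    location: str
    connected_to: list = field(default_factory=list)
    # (c) status: the current value plus a history of earlier ones
    status: Status = Status.OPERATIONAL
    history: list = field(default_factory=list)

    def set_status(self, new_status):
        """Record the old status before switching to the new one."""
        self.history.append(self.status)
        self.status = new_status

def find_by_status(inventory, status):
    """All components currently in the given status."""
    return [c for c in inventory if c.status is status]

inventory = [
    Component("file server", "192.0.2.10", "FS-0001", "Acme Ltd helpdesk", "room 1.12"),
    Component("bridge", "192.0.2.1", "BR-0007", "Acme Ltd helpdesk", "basement riser"),
]
inventory[0].set_status(Status.FAULTY)
print([c.kind for c in find_by_status(inventory, Status.FAULTY)])   # ['file server']
```

Keeping the history alongside the current status is what later allows questions such as "which components keep failing?" to be answered.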

5 Fault Management


5.1 Why is it needed?

To support rapid response to problems in order to maximise:

1 user satisfaction in the face of real or perceived faults

2 company productivity


5.2 What is involved?

1 Fault detection (signal alarm)

(a) faults reported by users, eg "crash" or degradation of service level

(b) a large network needs continuous monitoring ideally with automated probes:

programs which monitor network activity and error levels, server workload, disk errors, etc.

could be done from remote central site typically by raising alarm signal on network map

2 Problem diagnosis (accept alarm)

(a) check it is a real fault

(b) localise problem and isolate from network

(c) classify and prioritise, eg is a server down?

(d) identify responsibility for repair, eg on-site, supplier, maintenance company, etc.

3 Problem recovery (clear alarm)

reconfigure network if possible, eg using mirrors, moving work to other systems, etc.

(i) automatic rerouting

(ii) hot spare switched in on line, eg mirror disks and servers

(iii) cold spare - install replacement

implies duplication of facilities - expensive

applies to both hardware & software backup

4 Progressing faults: typically based on trouble ticket

(a) identifies component & symptom

(b) raised when fault detected

(c) status of open tickets reported periodically

(d) status updated periodically

(e) identify cause and correction

(f) closed when problem corrected

Clearly one needs to maintain a record of the fault history of the various system components, eg have you a 'Friday afternoon' component with a poor history?
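The trouble ticket steps (a) to (f) and the fault history can be sketched as a small data structure. This is an illustrative outline, not a description of any particular ticketing product; component names, symptoms and causes are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class TroubleTicket:
    component: str             # (a) identifies the component ...
    symptom: str               # ... and the symptom
    updates: list = field(default_factory=list)   # (d) periodic status updates
    cause: str = ""            # (e) filled in once identified
    correction: str = ""
    is_open: bool = True       # (b) a ticket exists from the moment of detection

    def update(self, note):
        self.updates.append(note)

    def close(self, cause, correction):
        """(e) and (f): record cause and correction, then close the ticket."""
        self.cause = cause
        self.correction = correction
        self.is_open = False

def open_tickets(tickets):
    """(c) the tickets that a periodic status report would list."""
    return [t for t in tickets if t.is_open]

def fault_history(tickets):
    """Number of tickets ever raised against each component."""
    return Counter(t.component for t in tickets)

tickets = [
    TroubleTicket("disk-3", "transfer errors"),
    TroubleTicket("bridge-1", "high reject level"),
    TroubleTicket("disk-3", "will not spin up"),
]
tickets[0].update("supplier contacted")
tickets[0].close(cause="worn bearing", correction="drive replaced")
print(len(open_tickets(tickets)), fault_history(tickets).most_common(1))
```

Counting tickets per component, including closed ones, is a simple way to spot a component with a poor history.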

5.3 What tools are needed?

Fault management is very expensive in personnel

Maximise automated support - equipment monitors level of faults, etc. and reports when a threshold is exceeded.

Most networks provide data collection tools & probes to support this

Most equipment supports loopback tests - enables testing of individual pieces of equipment if one is not sure what is wrong.

Number of third party suppliers for WANs, PC LANs, etc. - gives a wide choice of suppliers of services and equipment.
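Reporting when a threshold is exceeded is straightforward to automate once the readings are being collected. The sketch below assumes the readings arrive as a dictionary of metric name to value; the metric names and limits are hypothetical.

```python
def check_thresholds(readings, thresholds):
    """Return an alarm message for every reading above its threshold.

    Readings with no configured threshold are ignored.
    """
    alarms = []
    for name, value in sorted(readings.items()):
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            alarms.append(f"ALARM {name}: {value} exceeds {limit}")
    return alarms

# Hypothetical limits and one set of collected readings.
THRESHOLDS = {"disk_errors_per_hour": 5, "segment_error_rate_pct": 1.0}
readings = {
    "disk_errors_per_hour": 12,
    "segment_error_rate_pct": 0.2,
    "server_load": 0.7,
}
for alarm in check_thresholds(readings, THRESHOLDS):
    print(alarm)   # ALARM disk_errors_per_hour: 12 exceeds 5
```

Choosing the limits is the hard part: too low and staff are flooded with alarms, too high and faults are found by users first.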

6 Security & Accounting Management


6.1 Why is it needed?

The company's data is a valuable asset and it is important to:

Minimise risk of inadvertent or malicious damage, eg by incompetent operators, disgruntled staff, etc.

Prevent theft or unauthorised disclosure

Minimise or prevent loss of information when a fault occurs, eg a disk crash

Adequately account for consumption of resources: computer systems are expensive to purchase

and operate - they must be properly costed and paid for by users.


6.2 What is involved?

Physical access controls on network, eg key operated doors to access terminals, locks on keyboards, etc.

Logical access controls on network & data paths, eg passwords at various security levels within the system.

Filters on the import of files, eg users not allowed to import files from floppy disk or via the internet.

Audit, ie maintaining track of what users are doing

Systematic backup of storage, ie continuous/daily/weekly backup either locally to magnetic tape or to remote sites, care of backups (in fire proof safes, copies in other buildings, etc.) - also use of mirrored disks/servers, etc.

Bookkeeping for accounting management

1 storage used

2 peripheral use, eg printers, plotters, etc.

3 cpu usage

4 communications transmission capacity usage
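The four bookkeeping items can be turned into charges with a simple tariff. The sketch below is illustrative only: the rates, units, resource names and departments are invented, since the notes give none.

```python
from collections import defaultdict

# Hypothetical tariff in pence per unit; the notes give no rates.
TARIFF = {
    "storage_mb": 2,        # 1 storage used
    "pages_printed": 5,     # 2 peripheral use
    "cpu_seconds": 1,       # 3 cpu usage
    "mb_transferred": 3,    # 4 communications capacity usage
}

def charges(usage_records):
    """Total charge per user from (user, resource, amount) records."""
    totals = defaultdict(int)
    for user, resource, amount in usage_records:
        totals[user] += TARIFF[resource] * amount
    return dict(totals)

records = [
    ("sales", "storage_mb", 100),
    ("sales", "pages_printed", 40),
    ("design", "cpu_seconds", 600),
    ("design", "mb_transferred", 50),
]
print(charges(records))   # {'sales': 400, 'design': 750}
```

The arithmetic is trivial; deciding which user a given piece of shared usage belongs to is where the difficulty lies.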


6.3 What tools are needed?

Variety of hardware and software access controls, eg badges, passwords, etc.

Virus checkers and disinfectants

Automatic spy monitors to check usage, ie programs that monitor what is being run or accessed and by whom.

Usage loggers - easy to do at a node level - remarkably difficult to allocate fairly

7 Performance Analysis


7.1 Why is it needed?

Optimise (not maximise) resource utilization

Proactive fault analysis, ie find a fault before it becomes catastrophic

Support forward capacity planning

7.2 What does it involve?

1 Monitoring objective system performance criteria

throughput, ie number of programs running, amount of data transferred, etc.

response, eg typical response to a query on a database

availability, ie how much system down time?

2 Identifying potential problem areas:

high error incidence on a cable segment, eg possibly damaged cable, bad connector.

abnormal level of retries on a connection

abnormal level of transfer errors from a disk

3 Identifying bottlenecks

low disk space, ie need more disk space or move data to another server

excessive collisions on an Ethernet segment, eg need another bridge, move some terminals to another segment, etc.

high reject level from a bridge
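For a bottleneck such as low disk space, monitoring data can feed forward capacity planning directly. The sketch below fits a straight line to daily usage samples by least squares and extrapolates to the disk's capacity. Linear growth is an assumption, and the sample figures are hypothetical.

```python
def days_until_full(daily_usage, capacity):
    """Estimate how many days after the last sample the disk will be full.

    Fits a least-squares straight line to equally spaced daily samples
    (same units as capacity). Returns None if there are fewer than two
    samples or usage is not growing.
    """
    n = len(daily_usage)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    return day_full - (n - 1)

# Hypothetical samples: usage grows by 2 units a day on a 100-unit disk.
print(days_until_full([60, 62, 64, 66, 68], 100))   # 16.0
```

An estimate like this gives warning in time to order more disk or move data to another server, rather than discovering the problem when the disk fills.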


7.3 What tools are needed?

Monitors similar to those for fault management.
