preventing serversickness

Prevent Server Sickness Becoming a Pandemic! Gabriella Davis The Turtle Partnership [email protected] twitter: gabturtle

DESCRIPTION

Preventing Server Sickness Becoming A Pandemic - Benelux March 2013

TRANSCRIPT

Page 1: Preventing serversickness

Prevent Server Sickness Becoming a Pandemic!

Gabriella Davis
The Turtle Partnership

[email protected]
twitter: gabturtle

Page 2: Preventing serversickness

Fixing Your Server

What causes server sickness
Tools to spot sickness
Getting Your Server Back to Full Health

Page 3: Preventing serversickness

Server Sickness


Page 4: Preventing serversickness

Server Sickness

The problem with Domino
How does a server get sick?
–Vulnerabilities
–Aging Configurations
–Bad Habits
–Developers Gone Wild

Page 5: Preventing serversickness

The Problem With Domino

“My Server Is Running Fine”
Server Stability
–Often despite our best efforts
Tasks that just run
–even without being properly configured

Page 6: Preventing serversickness

Vulnerabilities

Start with the OS
–patch levels
–unnecessary processes with exposed ports
–disk and data security
Then the hardware
–It’s all about disk performance
–Using a SAN? Is the SAN configured for Domino?
–Transaction logs configured?

Page 7: Preventing serversickness

Vulnerabilities

Security
–ACLs
• -Default- and Anonymous
• LocalDomainServers
HTTP vs HTTPS
LDAP
DIIOP
Sametime

Page 8: Preventing serversickness

Aging Configurations

What can give you problems over time
–Database sizes
–More users
–More tasks and features

Page 9: Preventing serversickness

Bad Habits

What are your users doing?
–what features are they using
–how are they using them
• are they creating repeating 10yr appointments, for instance
• are they copying themselves on emails
Password quality for HTTP passwords

Page 10: Preventing serversickness

Giving Developers Power

Allowing development to dictate replication and agent scheduling
The curse of XPages code that has not been production tested
Demands for “LDAP” or “DIIOP” for an application to work

Page 11: Preventing serversickness

Tools to Spot Sickness


Page 12: Preventing serversickness

Tools to Spot Sickness

Understanding Priorities
DDM Probes and Event Analysis
Statistics
Catalog.nsf
QoS - new with Domino 9
Enhanced Fault Reporting - new with Domino 9

Page 13: Preventing serversickness

Understanding Priorities

Server role
–What do you want from your server
–What are statistics telling you
Warning Levels
–Is it safe to ignore ‘Warning (Low)’ and focus on ‘Fatal’ or ‘Failure’

Page 14: Preventing serversickness

Bringing Problems to You

Event Handlers, Event Generators, Statistics, Fault Reports and DDM Probes - where to start
Setting Statistic Thresholds
Choosing and configuring probes
Reviewing Faults
Setting up QoS behaviour
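A statistic threshold is, at heart, a comparison between a reported value and a limit you choose. The following is a minimal Python sketch of that idea, not Domino's own event generator. The limits are invented for illustration, and in practice the readings would come from the server's statistics rather than a hard-coded dictionary.

```python
def breached(stats, thresholds):
    """Return the names of statistics whose value exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if name in stats and stats[name] > limit
    )

# Hypothetical readings and limits, for illustration only.
sample_stats = {"Mail.Waiting": 250, "Mail.Dead": 0}
sample_limits = {"Mail.Waiting": 100, "Mail.Dead": 5}

print(breached(sample_stats, sample_limits))
```

Keeping the limits in one place makes it easier to review them when the server role changes.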

Page 15: Preventing serversickness

Bringing Problems To You

Why we set up collection hierarchies for DDM
–and how
Daily and Weekly DDM reviews
–What to look out for

Page 16: Preventing serversickness

Probes for Mail Servers

Security - Weekly
Directory Performance
Critical mail routes
Mail ‘Slack’

Page 17: Preventing serversickness

Probes for Application Servers

Agent run times
–agent cpu usage
Security and Web Configuration

Page 18: Preventing serversickness

Probes for Struggling Servers

OS level
–disk performance (beware of reported SAN problems)
–memory
–network

Page 19: Preventing serversickness

What to look for

Fatal problems
Persistent Warnings
Peak activity behaviour
–uptick in problems at 9am, 1pm etc
Repetitive low level ‘annoyances’
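One way to spot an uptick at particular times of day is to count reported events per hour and flag the hours that stand well above the rest. This is a rough Python sketch under assumptions: the timestamps are made up, and the "twice the mean" cut-off is an arbitrary choice, not a DDM setting.

```python
from collections import Counter
from datetime import datetime

def events_per_hour(timestamps):
    """Count events by hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)

def peak_hours(timestamps, factor=2.0):
    """Hours whose count is at least `factor` times the mean over hours that had events."""
    counts = events_per_hour(timestamps)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(hour for hour, n in counts.items() if n >= factor * mean)

# Hypothetical event times: six at 9am, one each at 11am, 1pm and 3pm.
sample_events = [datetime(2013, 3, 1, h, 0) for h in [9] * 6 + [11, 13, 15]]
print(peak_hours(sample_events))
```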

Page 20: Preventing serversickness

Catalog.nsf

Not every database is immediately visible but they are all there (just hidden with selection formulae)
It’s a good place to start looking for multiple replicas
It’s a good place to find ACL issues
Replicates around your domain and updates overnight
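Looking for multiple replicas amounts to grouping catalog entries by replica ID. The Python sketch below assumes the entries have already been exported as (server, path, replica ID) tuples, and it reads "multiple replicas" as more than one replica of the same database on a single server. The server names, paths and IDs are hypothetical.

```python
from collections import defaultdict

def multiple_replicas(entries):
    """Map (server, replica ID) to the file paths where one server holds more than one replica."""
    by_key = defaultdict(list)
    for server, path, replica_id in entries:
        by_key[(server, replica_id)].append(path)
    return {key: sorted(paths) for key, paths in by_key.items() if len(paths) > 1}

# Hypothetical catalog export, for illustration only.
sample_entries = [
    ("Mail01/Acme", "mail/jdoe.nsf", "85257B3A:004F1C22"),
    ("Mail01/Acme", "archive/jdoe_copy.nsf", "85257B3A:004F1C22"),
    ("Mail02/Acme", "mail/jdoe.nsf", "85257B3A:004F1C22"),
    ("Mail01/Acme", "names.nsf", "85257B3A:00112233"),
]
print(multiple_replicas(sample_entries))
```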

Page 21: Preventing serversickness

QoS - Quality of Service

Monitors server health and performance
Monitors application behavior, stability and hangs
Restarts Domino if it thinks there are memory issues or an application is hung
Shuts down Domino if a clean shutdown doesn’t happen and the server hangs
Controlled via notes.ini settings and dcontroller.ini
Requires Domino to be running under the Java Controller
• nserver -jc

Page 22: Preventing serversickness

QoS Configuration

Starting Domino under Java Controller should create a dcontroller.ini file
QOS_Enable=1
In Notes.Ini
• QOS_ProbeInterval (defaults to 1 min)
• QOS_ProbeTimeout (defaults to 5 mins)
• QOS_ShutDown_Timeout
• QOS_Apps_Timeout
• QOS_Shutdown_Timeout
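When reviewing a server it can help to pull every QOS_ line out of an ini file in one pass. The Python sketch below does that for any text in name=value form. The sample text is hypothetical, uses only setting names from the slide, and makes no claim about which file each setting belongs in or what values suit your server.

```python
def read_qos_settings(ini_text):
    """Collect QOS_* settings from ini-style 'name=value' lines."""
    settings = {}
    for line in ini_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().upper().startswith("QOS_"):
            settings[name.strip()] = value.strip()
    return settings

# Hypothetical ini content, for illustration only.
sample_ini = """\
ServerTasks=Update,Replica,Router
QOS_Enable=1
QOS_ProbeInterval=1
QOS_ProbeTimeout=5
"""
print(read_qos_settings(sample_ini))
```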

Page 23: Preventing serversickness

QOS - Potential Problems

QOS doesn’t support passwords on server IDs; the restart will pause at the password entry screen
QOS timeouts being too low
Don’t enable QOS on servers without transaction logging

Page 24: Preventing serversickness

Enhanced Fault Reporting

Fault Reporting Database - lndfr.nsf
Expanded to include a by Disposition view
–all faults when analyzed have a disposition value that categorises as
• Problem
• Possible Problem (possibly actionable)
• Possible Problem (likely NOT actionable)
• Informational
• Unknown (investigate)

Page 25: Preventing serversickness

Possible Problem - Actionable

Out Of Memory: Represents a crash in which the Java virtual machine (JVM) ran out of a memory resource such as heap space.
Launched Notes multiple times: Indicates that the user quickly launched multiple instances of the Notes client.
Possible hang: Indicates that the Notes client was manually terminated while it appeared to be doing useful work.
User Kill: Indicates that the user manually terminated the client while it appeared to be waiting for input or network timeout.

Page 26: Preventing serversickness

Back to Full Health

Getting Control
–Mail, Databases and ECLs
–SMTP
–Agent Scheduling
–Directories
–Adminp
–LDAP
–Tasks and Internet Site Documents
Domino Configuration Tuner

Page 27: Preventing serversickness

Back to Full Health

Getting Control
–Mail, Databases and ECLs
–SMTP
–Agent Scheduling
–Directories
–Adminp
–LDAP
–Tasks and Internet Site Documents
Domino Configuration Tuner

Page 28: Preventing serversickness

Getting Control - Mail and Databases

Setting ACLs at directory level (Editor)
Lock down ECLs via Policies
Introducing quotas alongside server based archiving
Consider archiving files to a dedicated server
Upgrade to 8 and enable OOO router instead of agents
Disable forwarding rules set up by users
Use message tracking and mail rules very sparingly
Disable on the fly searching of non indexed databases

Page 29: Preventing serversickness

Database Management Tools
DBMT Server Command
• runs copy-style compact operations
• purges deletion stubs
• expires soft deleted entries
• updates views
• reorganizes folders
• merges full-text indexes
• updates unread lists
• ensures that critical views are created for failover
–Replaces Updall
• Load updall -nodbmt tells updall to run but not perform the functions that DBMT already does

Page 30: Preventing serversickness

DBMT Parameters

-compactThreads
-updallThreads
-ftiThreads
-timeLimit refers to compact timeout for DBMT
-range starttime stoptime
-compactNdays (run Compact every x days)
-ftiNdays (run FT Index every x days)
-force d (day Sunday =1) fixup if compact fails for consecutive day
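To see how these parameters combine into one console command, the Python sketch below assembles a command string from whichever options you pass in. It only builds text and does not talk to a server. The option values, the target folder, and the placement of the target at the end of the command are assumptions for illustration, so check them against the DBMT documentation before use.

```python
def dbmt_command(target, **options):
    """Build a 'load dbmt' console command string from option names and values."""
    parts = ["load dbmt"]
    for name, value in options.items():
        parts.append(f"-{name} {value}")
    parts.append(target)  # assumed position of the target folder
    return " ".join(parts)

# Hypothetical values: 4 threads each, compact and full-text index weekly.
print(dbmt_command("mail", compactThreads=4, updallThreads=4, compactNdays=7, ftiNdays=7))
```

Generating the command from named values makes it harder to mistype an option when the same schedule is applied to several servers.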

Page 31: Preventing serversickness

Getting Control - SMTP

Restrict relaying to specific ip addresses not network ranges
Beware of allowing authenticated relaying and opening up to dictionary attacks
Restrict rights to send to internal groups from internet addresses
Don’t accept mail for local part matches
Configure your server for HTML mail not plain text

Page 32: Preventing serversickness

Getting Control - SMTP (more)

Don’t allow all connecting hosts to deliver mail inbound; if you use a service, restrict to those hosts
Use services / tools to spot attacks such as
–persistent attempts to mass deliver within a time period
–continual failures by a host to deliver to a correct address
Move responsibility for that first line of defense away from native Domino

Page 33: Preventing serversickness

Getting Control - Agent Scheduling

When are agents set to run
–amgr_newmaileventdelay
–amgr_newmailagentmininterval
If you’re using OOO agents how often are they scheduled
Do users have private agents running
–Sh Agents [DBName]
• All shared and private agents in a database
Who has rights to run agents

Page 34: Preventing serversickness

Getting Control - Directories

Avoid adding additional views to the Domino Directory
The risk of allowing local replicas with Author rights
Directory Assistance
–Sh xdir

Page 35: Preventing serversickness

Getting Control - Adminp

Purge old documents
Requests awaiting approval
Tell adminp process NEW not ALL

Page 36: Preventing serversickness

Getting Control - LDAP

Allowing anonymous access to query LDAP
Authenticating LDAP queries
Extended Directory Catalog used by LDAP
Relying on DNS
Not configuring the LDAP task correctly to allow large searches with no timeouts
Maintaining schema.nsf

Page 37: Preventing serversickness

Getting Control - Tasks and Program Documents

Disable tasks you don’t need
Schedule overnight tasks so they don’t overlap
–and don’t conflict with backups
Use program documents so you can review and manage easily
–sh config servertasksat*
Keeping templates on every server
Using compact -B
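Checking that overnight tasks and backups do not collide is a plain interval-overlap test once the start and end times are written down. The Python sketch below assumes you have listed each task's window by hand. The task names and times are hypothetical and are not read from program documents.

```python
from datetime import datetime
from itertools import combinations

def overlapping(windows):
    """Return pairs of task names whose (start, end) windows overlap."""
    clashes = []
    for (a, a_start, a_end), (b, b_start, b_end) in combinations(windows, 2):
        # Two windows overlap when each starts before the other ends.
        if a_start < b_end and b_start < a_end:
            clashes.append((a, b))
    return clashes

# Hypothetical overnight schedule, for illustration only.
schedule = [
    ("compact", datetime(2013, 3, 2, 1, 0), datetime(2013, 3, 2, 3, 0)),
    ("backup", datetime(2013, 3, 2, 2, 30), datetime(2013, 3, 2, 4, 0)),
    ("updall", datetime(2013, 3, 2, 4, 0), datetime(2013, 3, 2, 5, 0)),
]
print(overlapping(schedule))
```

A task that starts exactly when another ends is not counted as a clash here; tighten the comparison if you want a gap between tasks.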

Page 38: Preventing serversickness

Getting Control - Internet Site Documents

Web Configuration means TCPIP tasks are configured in the server document and are server wide
–often enabled by default
Internet site documents require you to opt in for TCPIP services
–configured by hostname

Page 39: Preventing serversickness

Domino Configuration Tuner

Domino Configuration Tuner is an analysis tool based on a set of pre-configured best practice/worst practice rules
The Rules are shipped by IBM with the Lotus installs and are updated via a public update site
Makes recommendations on configuration changes to enhance performance and security and reduce TCO

Page 40: Preventing serversickness

How does it work?

Installed and run via the Domino Configuration Tuner database
Updated by online template updates and rule updates
DCT rules and results are held in a local database and will require a restart of the client for changes to take effect
Scans
–Server documents
–notes.ini settings
–advanced database properties
Intended to scan servers in a single domain

Page 41: Preventing serversickness

How does it work?

Creates reports on each scanned server based on the rules you select
Each report contains
–issues
–recommendations for adjustments
–links to supporting documentation

Page 42: Preventing serversickness

Pre-requisites

v8 Notes client (standard or basic) or administrator
dct.nsf database and dct.ntf template
servers 7.x or higher

Page 43: Preventing serversickness

Setup

DCT.NSF
StdDominoConfigTuner Template (dct.ntf)
ID must have reader access to names.nsf
ID must have ‘View Administrator’ rights
Requires no server or domain changes

Page 44: Preventing serversickness

View Administrator Rights

Server Document
Security Tab
View Administrator is a subset of ‘Administrator’ rights
Think of it as ‘Show’ not ‘Tell’ rights
–Sh users - YES
–tell http refresh - NO

Page 45: Preventing serversickness

DCT Preferences

List of all rules
Review rule, description and supporting documentation
All rules are enabled by default for all scans
Enable and Disable rules

Page 46: Preventing serversickness

DCT Updates

Connects to the IBM site to download
–must have outbound connectivity

Page 47: Preventing serversickness

DCT Updates

Click ‘check for updates’
Connects to an external IBM site to identify any template or rule updates

Page 48: Preventing serversickness

DCT Updates


Accept license and updates download

It’s not possible to selectively download

Page 49: Preventing serversickness

DCT Updates - Finished


“Successful” screen will notify you to restart your client

You may need to do 2 client restarts before DCT can be used

Page 50: Preventing serversickness

Running the tuner


First select the servers in your current domain you want to run against

The list of servers is retrieved from the domain of the home server identified in your location document

Change locations to scan a different domain

Page 51: Preventing serversickness

Running the tuner

You can manually type in the full hierarchical names of any other servers you want to scan as part of this analysis
Separate multiple server names with commas, semicolons or new lines
You can only scan servers you can reach so you need a connection document to any you list
–or the server needs to be available via your passthru server in your location
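If you keep your server list in a text file, the separator rule above is easy to mirror before pasting names into the tuner. This Python sketch is only an illustration of splitting on commas, semicolons or new lines. It is not DCT's own parser, and the server names are hypothetical.

```python
import re

def parse_server_names(text):
    """Split on commas, semicolons or new lines and drop empty entries."""
    return [name.strip() for name in re.split(r"[,;\n]", text) if name.strip()]

# Hypothetical hierarchical server names, for illustration only.
sample_list = "Mail01/Acme, App01/Acme;Hub01/Acme\nDir01/Acme\n"
print(parse_server_names(sample_list))
```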

Page 52: Preventing serversickness

Understanding the Results

Summary results
Issues by criticality

Page 53: Preventing serversickness

Understanding the Results

Summary results
Servers that failed to scan
–reason why scan failed

Page 54: Preventing serversickness

Understanding the Results

Summary results
Detailed list of rules evaluated

Page 55: Preventing serversickness

Understanding the Results

View the current report
Select ‘change’ to view a different report

Page 56: Preventing serversickness

Understanding the Results

Filter results to make analysis easier
–by server
–by specific rules
–by severity

Page 57: Preventing serversickness

Understanding the results

Categorised results of recommendations
Sorted by criticality and then by server name
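If you export recommendations for your own tracking, the same ordering (criticality first, then server name) can be reproduced with a two-part sort key. The Python sketch below assumes each result is a dictionary. The criticality labels "High", "Medium" and "Low" and the server names are placeholders, not DCT's actual values.

```python
def sort_results(results, order=("High", "Medium", "Low")):
    """Sort by criticality (in the given order), then by server name."""
    rank = {label: position for position, label in enumerate(order)}
    return sorted(results, key=lambda r: (rank[r["criticality"]], r["server"]))

# Hypothetical exported results, for illustration only.
sample_results = [
    {"server": "Mail02/Acme", "criticality": "Low"},
    {"server": "App01/Acme", "criticality": "High"},
    {"server": "Mail01/Acme", "criticality": "High"},
    {"server": "Hub01/Acme", "criticality": "Medium"},
]
for row in sort_results(sample_results):
    print(row["criticality"], row["server"])
```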

Page 58: Preventing serversickness

Understanding the results


Each recommendation comes with an explanation so you can evaluate on a result by result basis if you want to make the change

Page 59: Preventing serversickness

Understanding the results

Each recommendation is provided with a link to supporting best / worst practice documentation

Page 60: Preventing serversickness

Working with Rules


Disabling and enabling rules can be done through the ‘Preferences’

Page 61: Preventing serversickness

Working with Rules


Selecting a rule shows the description and links to the best / worst practice documentation

Page 62: Preventing serversickness

Making Changes

Advanced Database Properties
–assigned en masse via Domino Admin
notes.ini settings
–assigned via the command set config xxx=x
–shown via the command sh config xxx
Many recommendations refer to ‘some databases’ but don’t specify which ones - check which ones will be affected

Page 63: Preventing serversickness

Resources

Domino Configuration Tuner blog
–http://www.bleedyellow.com/blogs/DCT/
–details and explanations of new rules published each month

Page 64: Preventing serversickness

Summary
• No matter how well your servers are configured they will continue to degrade in performance over time unless you pro-actively monitor and fix
• Many of the server performance issues will be seen first by your users before they filter down to you
• Make reviewing your server configuration using DDM probes followed by a DCT analysis part of every server upgrade
• Enable probes that are specific to the server role: Mail and Directory probes on Mail servers and Agent probes on Application servers
• Use Security and Database probes configured in DDM to stay on top of any low level warnings that could cause larger problems in the future
• Don’t over configure your servers to monitor everything or you’ll be looking for a needle in a haystack. Ask your servers to tell you only what you need to be aware of immediately
• Use the built in tools (DCT, Statistics, DDM, Catalog, Activity Trends) to monitor your servers and gain a good understanding of what their ‘normal’ behaviour is so you can more easily spot when something goes wrong

Page 65: Preventing serversickness

Questions

How to contact me:
Gabriella Davis
[email protected]
twitter: gabturtle