Preventing Server Sickness Becoming A Pandemic - Benelux March 2013
Prevent Server Sickness Becoming a Pandemic!
Gabriella Davis
The Turtle Partnership
[email protected]: gabturtle
Fixing Your Server
What causes server sickness
Tools to spot sickness
Getting Your Server Back to Full Health
Server Sickness
The problem with Domino
How does a server get sick?
– Vulnerabilities
– Aging Configurations
– Bad Habits
– Developers Gone Wild
The Problem With Domino
“My Server Is Running Fine”
Server Stability
– Often despite our best efforts
Tasks that just run
– even without being properly configured
Vulnerabilities
Start with the OS
– patch levels
– unnecessary processes with exposed ports
– disk and data security
Then the hardware
– It’s all about disk performance
– Using a SAN? Is the SAN configured for Domino?
– Transaction logs configured?
Security
– ACLs
• -Default- and Anonymous
• LocalDomainServers
HTTP vs HTTPS
LDAP
DIIOP
Sametime
Aging Configurations
What can give you problems over time
– Database sizes
– More users
– More tasks and features
Bad Habits
What are your users doing?
– what features are they using
– how are they using them
• are they creating repeating 10-year appointments, for instance
• are they copying themselves on emails
Password quality for HTTP passwords
Giving Developers Power
Allowing development to dictate replication and agent scheduling
The curse of XPages code that has not been production tested
Demands for “LDAP” or “DIIOP” for an application to work
Tools to Spot Sickness
Understanding Priorities
DDM Probes and Event Analysis
Statistics
Catalog.nsf
QoS - new with Domino 9
Enhanced Fault Reporting - new with Domino 9
Understanding Priorities
Server role
– What do you want from your server
– What are statistics telling you
Warning Levels
– Is it safe to ignore ‘Warning (Low)’ and focus on ‘Fatal’ or ‘Failure’?
Bringing Problems to You
Event Handlers, Event Generators, Statistics, Fault Reports and DDM Probes - where to start
Setting Statistic Thresholds
Choosing and configuring probes
Reviewing Faults
Setting up QoS behaviour
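As an illustration of statistic thresholds, here is a minimal Python sketch that compares a snapshot of statistics against limits. The limit values and the sample snapshot are hypothetical, and Domino itself does this through statistic event generators rather than a script; the point is only to show the "tell me when a value crosses a line" idea.

```python
# Hypothetical limits; tune them to each server's role.
THRESHOLDS = {
    "Mail.Dead": 10,
    "Mail.Waiting": 200,
}

def breached(stats, thresholds=THRESHOLDS):
    """Return (name, value, limit) for every statistic above its limit."""
    return sorted(
        (name, stats[name], limit)
        for name, limit in thresholds.items()
        if name in stats and stats[name] > limit
    )

# Made-up snapshot of statistic values.
sample = {"Mail.Dead": 25, "Mail.Waiting": 40}
for name, value, limit in breached(sample):
    print(f"{name} is {value}, limit is {limit}")
```

Keeping the limits in one table makes it easy to hold a different set per server role.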
Why we set up collection hierarchies for DDM
– and how
Daily and Weekly DDM reviews
– What to look out for
Probes for Mail Servers
Security - Weekly
Directory Performance
Critical mail routes
Mail ‘Slack’
Probes for Application Servers
Agent run times
– agent CPU usage
Security and Web Configuration
Probes for Struggling Servers
OS level
– disk performance (beware of reported SAN problems)
– memory
– network
What to look for
Fatal problems
Persistent Warnings
Peak activity behaviour
– uptick in problems at 9am, 1pm etc
Repetitive low level ‘annoyances’
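One way to spot the peak-activity pattern is to bucket event times by hour. The Python sketch below assumes you have exported event timestamps as ISO strings; the export format and the "twice the average" cut-off are assumptions for illustration, not anything DDM provides.

```python
from collections import Counter
from datetime import datetime

def events_per_hour(timestamps):
    """Count events per hour of day from ISO-formatted timestamps."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)

def peak_hours(timestamps, factor=2.0):
    """Hours whose count is at least `factor` times the mean over active hours."""
    counts = events_per_hour(timestamps)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(hour for hour, n in counts.items() if n >= factor * mean)

# Made-up sample: a cluster just after 9am plus two stray events.
sample = [f"2013-03-04T09:{m:02d}:00" for m in range(6)]
sample += ["2013-03-04T11:15:00", "2013-03-04T15:40:00"]
print(peak_hours(sample))
```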
Catalog.nsf
Not every database is immediately visible but they are all there (just hidden with selection formulae)
It’s a good place to start looking for multiple replicas
It’s a good place to find ACL issues
Replicates around your domain and updates overnight
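To make the multiple-replica check concrete, here is a small Python sketch that groups catalog entries by server and replica ID. It assumes the entries have already been exported as (server, replica ID, file path) tuples; the export step, server names and replica IDs are all hypothetical.

```python
from collections import defaultdict

def multiple_replicas(entries):
    """Map (server, replica_id) to file paths, keeping only pairs with 2+ files."""
    grouped = defaultdict(list)
    for server, replica_id, path in entries:
        grouped[(server, replica_id)].append(path)
    return {key: sorted(paths) for key, paths in grouped.items() if len(paths) > 1}

# Made-up entries: the same replica appears twice on Mail01/Acme.
entries = [
    ("Mail01/Acme", "80257B2A:004C1D3E", "mail/jsmith.nsf"),
    ("Mail01/Acme", "80257B2A:004C1D3E", "backup/jsmith.nsf"),
    ("Mail02/Acme", "80257B2A:004C1D3E", "mail/jsmith.nsf"),
]
print(multiple_replicas(entries))
```

A replica on a second server is expected; two files with the same replica ID on one server is the case worth investigating.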
QoS - Quality of Service
Monitors server health and performance
Monitors application behavior, stability and hangs
Restarts Domino if it thinks there are memory issues or an application is hung
Shuts down Domino if a clean shutdown doesn’t happen and the server hangs
Controlled via notes.ini settings and dcontroller.ini
Requires Domino to be running under the Java Controller
• nserver -jc
QoS Configuration
Starting Domino under Java Controller should create a dcontroller.ini file
QOS_Enable=1
In Notes.Ini
• QOS_ProbeInterval (defaults to 1 min)
• QOS_ProbeTimeout (defaults to 5 mins)
• QOS_ShutDown_Timeout
• QOS_Apps_Timeout
• QOS_Shutdown_Timeout
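A quick way to review these values is to read them out of a copy of notes.ini. The Python sketch below is illustrative only: it assumes the values are whole minutes (the slide gives the defaults in minutes), and the "timeout should be longer than the interval" check is a hypothetical sanity rule, not a documented Domino requirement.

```python
# Defaults as stated on the slide, assumed here to be in minutes.
DEFAULTS = {"qos_probeinterval": 1, "qos_probetimeout": 5}

def read_ini(text):
    """Parse KEY=VALUE lines into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip().lower()] = value.strip()
    return settings

def qos_probe_settings(text):
    """Return the probe interval and timeout, falling back to the defaults."""
    ini = read_ini(text)
    return {key: int(ini.get(key, default)) for key, default in DEFAULTS.items()}

def qos_warnings(text):
    """Hypothetical check: a timeout no longer than the interval looks too low."""
    s = qos_probe_settings(text)
    if s["qos_probetimeout"] <= s["qos_probeinterval"]:
        return ["QOS_ProbeTimeout is not longer than QOS_ProbeInterval"]
    return []

print(qos_warnings("QOS_ProbeInterval=5\nQOS_ProbeTimeout=2\n"))
```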
QOS - Potential Problems
QOS doesn’t support passwords on server IDs; the restart will pause at the password entry screen
QOS timeouts being too low
Don’t enable QOS on servers without transaction logging
Enhanced Fault Reporting
Fault Reporting Database - lndfr.nsf
Expanded to include a by Disposition view
– all faults when analyzed have a disposition value that categorises as
• Problem
• Possible Problem (possibly actionable)
• Possible Problem (likely NOT actionable)
• Informational
• Unknown (investigate)
Possible Problem - Actionable
Out Of Memory: Represents a crash in which the Java virtual machine (JVM) ran out of a memory resource such as heap space.
Launched Notes multiple times: Indicates that the user quickly launched multiple instances of the Notes client.
Possible hang: Indicates that the Notes client was manually terminated while it appeared to be doing useful work.
User Kill: Indicates that the user manually terminated the client while it appeared to be waiting for input or network timeout.
Back to Full Health
Getting Control
– Mail, Databases and ECLs
– SMTP
– Agent Scheduling
– Directories
– Adminp
– LDAP
– Tasks and Internet Site Documents
Domino Configuration Tuner
Getting Control - Mail and Databases
Setting ACLs at directory level (Editor)
Lock down ECLs via Policies
Introducing quotas alongside server based archiving
Consider archiving files to a dedicated server
Upgrade to 8 and enable OOO router instead of agents
Disable forwarding rules set up by users
Use message tracking and mail rules very sparingly
Disable on the fly searching of non indexed databases
Database Management Tools
DBMT Server Command
• runs copy-style compact operations
• purges deletion stubs
• expires soft deleted entries
• updates views
• reorganizes folders
• merges full-text indexes
• updates unread lists
• ensures that critical views are created for failover
– Replaces Updall
• Load updall -nodbmt tells updall to run but not perform the functions that DBMT already does
DBMT Parameters
-compactThreads
-updallThreads
-ftiThreads
-timeLimit refers to compact timeout for DBMT
-range starttime stoptime
-compactNdays (run Compact every x days)
-ftiNdays (run FT Index every x days)
-force d (day, Sunday = 1): fixup if compact fails for consecutive days
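To show how the parameters combine on one console line, here is a hedged Python sketch that assembles a DBMT command string. The switch names come from the slide; the thread counts, time limit and day intervals are made-up values, and the start and stop times for -range are passed through untouched because the expected time format is not given here. Check the result against the Domino documentation before using it in a program document.

```python
def dbmt_command(compact_threads, updall_threads, fti_threads,
                 time_limit=None, time_range=None,
                 compact_ndays=None, fti_ndays=None):
    """Build a 'load dbmt' console command from the switches listed above."""
    parts = [
        "load dbmt",
        f"-compactThreads {compact_threads}",
        f"-updallThreads {updall_threads}",
        f"-ftiThreads {fti_threads}",
    ]
    if time_limit is not None:
        parts.append(f"-timeLimit {time_limit}")
    if time_range is not None:
        start, stop = time_range  # passed through as given
        parts.append(f"-range {start} {stop}")
    if compact_ndays is not None:
        parts.append(f"-compactNdays {compact_ndays}")
    if fti_ndays is not None:
        parts.append(f"-ftiNdays {fti_ndays}")
    return " ".join(parts)

# Made-up values for illustration.
print(dbmt_command(4, 4, 2, time_limit=120, compact_ndays=7))
```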
Getting Control - SMTP
Restrict relaying to specific IP addresses not network ranges
Beware of allowing authenticated relaying and opening up to dictionary attacks
Restrict rights to send to internal groups from internet addresses
Don’t accept mail for local part matches
Configure your server for HTML mail not plain text
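As a sketch of the "specific addresses, not ranges" rule, the Python below flags any relay entry that covers more than one address. It assumes the allowed-relay list has been written out in plain or CIDR notation, which is not how the Domino configuration document stores it; the addresses are documentation examples.

```python
import ipaddress

def relay_ranges(entries):
    """Return the entries that cover more than one IP address."""
    flagged = []
    for entry in entries:
        network = ipaddress.ip_network(entry, strict=False)
        if network.num_addresses > 1:
            flagged.append(entry)
    return flagged

# Made-up relay list: two single hosts and one whole /24.
allowed = ["192.0.2.10", "192.0.2.0/24", "198.51.100.7/32"]
print(relay_ranges(allowed))
```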
Getting Control - SMTP (more)
Don’t allow all connecting hosts to deliver mail inbound; if you use a service, restrict to those hosts
Use services / tools to spot attacks such as
– persistent attempts to mass deliver within a time period
– continual failures by a host to deliver to a correct address
Move responsibility for that first line of defense away from native Domino
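The kind of check such a service performs can be sketched as a sliding window over delivery failures. This Python example is illustrative only: the input format (time in seconds, connecting host), the limit and the window length are all assumptions, and a real gateway product does far more.

```python
from collections import defaultdict, deque

def noisy_hosts(failures, limit=5, window=60):
    """Hosts with `limit` or more failures inside any `window` seconds.

    failures: iterable of (seconds, host) pairs in time order.
    """
    recent = defaultdict(deque)
    flagged = set()
    for ts, host in failures:
        queue = recent[host]
        queue.append(ts)
        # Drop failures that have fallen out of the window.
        while ts - queue[0] >= window:
            queue.popleft()
        if len(queue) >= limit:
            flagged.add(host)
    return sorted(flagged)

# Made-up log: host "a" fails three times in 20 seconds, host "b" is slow and steady.
log = [(0, "a"), (10, "a"), (20, "a"), (100, "b"), (200, "b"), (300, "b")]
print(noisy_hosts(log, limit=3, window=60))
```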
Getting Control - Agent Scheduling
When are agents set to run
– amgr_newmaileventdelay
– amgr_newmailagentmininterval
If you’re using OOO agents how often are they scheduled
Do users have private agents running
– Sh Agents [DBName]
• All shared and private agents in a database
Who has rights to run agents
Getting Control - Directories
Avoid adding additional views to the Domino Directory
The risk of allowing local replicas with Author rights
Directory Assistance
– Sh xdir
Getting Control - Adminp
Purge old documents
Requests awaiting approval
Tell adminp process NEW not ALL
Getting Control - LDAP
Allowing anonymous access to query LDAP
Authenticating LDAP queries
Extended Directory Catalog used by LDAP
Relying on DNS
Not configuring the LDAP task correctly to allow large searches with no timeouts
Maintaining schema.nsf
Getting Control - Tasks and Program Documents
Disable tasks you don’t need
Schedule overnight tasks so they don’t overlap
– and don’t conflict with backups
Use program documents so you can review and manage easily
– sh config servertasksat*
Keeping templates on every server
Using compact -B
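A simple way to check an overnight schedule for overlaps is to compare every pair of time windows. The Python sketch below assumes you have noted each task's start and expected end time by hand; the task names and times are made up, and windows that run past midnight are not handled.

```python
from itertools import combinations

def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def overlaps(schedule):
    """Return pairs of task names whose windows overlap.

    schedule: {name: (start 'HH:MM', end 'HH:MM')}, no window crossing midnight.
    """
    clashes = []
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(sorted(schedule.items()), 2):
        if to_minutes(a_start) < to_minutes(b_end) and to_minutes(b_start) < to_minutes(a_end):
            clashes.append((a, b))
    return clashes

# Made-up schedule: compact starts before the backup has finished.
schedule = {
    "backup": ("01:00", "03:00"),
    "compact": ("02:30", "04:00"),
    "updall": ("04:00", "05:00"),
}
print(overlaps(schedule))
```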
Getting Control - Internet Site Documents
Web Configuration means TCPIP tasks are configured in the server document and are server wide
– often enabled by default
Internet site documents require you to opt in for TCPIP services
– configured by hostname
Domino Configuration Tuner
Domino Configuration Tuner is an analysis tool based on a set of pre-configured best practice/worst practice rules
The Rules are shipped by IBM with the Lotus installs and are updated via a public update site
Makes recommendations on configuration changes to enhance performance and security and reduce TCO
How does it work?
Run and installed via the Domino Configuration Tuner database
Updated by online template updates and rule updates
DCT rules and results are held in a local database and will require a restart of the client for changes to take effect
Scans
– Server documents
– notes.ini settings
– advanced database properties
Intended to scan servers in a single domain
Creates reports on each scanned server based on the rules you select
Each report contains
– issues
– recommendations for adjustments
– links to supporting documentation
Pre-requisites
v8 Notes client (standard or basic) or administrator
dct.nsf database and dct.ntf template
servers 7.x or higher
Setup
DCT.NSF
StdDominoConfigTuner Template (dct.ntf)
ID must have reader access to names.nsf
ID must have ‘View Administrator’ rights
Requires no server or domain changes
View Administrator Rights
Server Document
Security Tab
View Administrator is a subset of ‘Administrator’ rights
Think of it as ‘Show’ not ‘Tell’ rights
– Sh users - YES
– tell http refresh - NO
DCT Preferences
List of all rules
Review rule, description and supporting documentation
All rules are enabled by default for all scans
Enable and Disable rules
DCT Updates
Connects to the IBM site to download
– must have outbound connectivity
Click ‘check for updates’
Connects to an external IBM site to identify any template or rule updates
Accept license and updates download
It’s not possible to selectively download
DCT Updates - Finished
“Successful” screen will notify you to restart your client
You may need to do 2 client restarts before DCT can be used
Running the tuner
First select the servers in your current domain you want to run against
The list of servers is retrieved from the domain of the home server identified in your location document
Change locations to scan a different domain
You can manually type in the full hierarchical names of any other servers you want to scan as part of this analysis
Separate multiple server names with commas, semicolons or new lines
You can only scan servers you can reach so you need a connection document to any you list
– or the server needs to be available via your passthru server in your location
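The separator rule is easy to mirror if you keep your server list in a text file. This Python sketch is not DCT's own parser, just an illustration of splitting on commas, semicolons and new lines; the server names are made up.

```python
import re

def parse_server_names(text):
    """Split a server list on commas, semicolons or new lines, dropping blanks."""
    names = [part.strip() for part in re.split(r"[,;\n]+", text)]
    return [name for name in names if name]

# Made-up hierarchical names using all three separators.
raw = "Mail01/Acme, Hub01/Acme;App01/Acme\nApp02/Acme"
print(parse_server_names(raw))
```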
Understanding the Results
Summary results
Issues by criticality
Servers that failed to scan
– reason why scan failed
Detailed list of rules evaluated
View the current report
Select ‘change’ to view a different report
Filter results to make analysis easier
– by server
– by specific rules
– by severity
Categorised results of recommendations
Sorted by criticality and then by server name
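If you copy results out of DCT for your own tracking, the same ordering is simple to reproduce. The Python sketch below uses hypothetical severity labels, rule names and server names; DCT's own labels may differ.

```python
# Hypothetical severity labels, most critical first.
SEVERITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}

def sort_results(results):
    """Sort by criticality first, then by server name."""
    return sorted(
        results,
        key=lambda r: (SEVERITY_ORDER[r["severity"]], r["server"].lower()),
    )

# Made-up results.
results = [
    {"server": "Mail02/Acme", "severity": "Low", "rule": "example rule A"},
    {"server": "Mail01/Acme", "severity": "High", "rule": "example rule B"},
    {"server": "App01/Acme", "severity": "High", "rule": "example rule C"},
]
for r in sort_results(results):
    print(r["severity"], r["server"], r["rule"])
```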
Each recommendation comes with an explanation so you can evaluate, result by result, whether you want to make the change
Each recommendation is provided with a link to best / worst practice supporting documentation
Working with Rules
Disabling and enabling rules can be done through the ‘Preferences’
Selecting a rule shows the description and links to the best / worst practice documentation
Making Changes
Advanced Database Properties
– assigned en masse via Domino Admin
notes.ini settings
– assigned via the command set config xxx=x
– shown via the command sh config xxx
Many recommendations refer to ‘some databases’ but don’t specify which ones - check which ones will be affected
Resources
Domino Configuration Tuner blog
– http://www.bleedyellow.com/blogs/DCT/
– details and explanations of new rules published each month
Summary
• No matter how well your servers are configured they will continue to degrade in performance over time unless you proactively monitor and fix
• Many of the server performance issues will be seen first by your users before they filter down to you
• Make reviewing your server configuration using DDM probes followed by a DCT analysis part of every server upgrade
• Enable probes that are specific to the server role: Mail and Directory probes on Mail servers and Agent probes on Application servers
• Use Security and Database probes configured in DDM to stay on top of any low level warnings that could cause larger problems in the future
• Don’t over configure your servers to monitor everything or you’ll be looking for a needle in a haystack. Ask your servers to tell you only what you need to be aware of immediately
• Use the built in tools (DCT, Statistics, DDM, Catalog, Activity Trends) to monitor your servers and gain a good understanding of their ‘normal’ behaviour so you can more easily spot when something goes wrong
Questions
How to contact me:Gabriella [email protected]: gabturtle