Maintaining Large Vista Installations
Amy Edwards, Ezra Freelove, & George Hernandez
July 12,...
Agenda
• Comparisons
• Who is USG
• Automation
• Monitoring
• Maintenance
• More Tricks
• Questions?
Informal Poll – Number of Nodes
• All production clusters, now:
– 1-10
– 11-20
– 21-50
– 50-70
– 70+
• Ours in bold
• All production clusters, by December:
– 1-10
– 11-20
– 21-50
– 50-70
– 70+
Informal Poll – Number of DB Instances
Including secondary and non-production
• 1-2
• 3-6
• 7-10
• 10+
• Ours in bold
GeorgiaVIEW Project
• University System of Georgia (USG)
• Vista 3.0.7
• Host 32 institutions & multiple consortial programs
• >150,000 active students
– Active means 100+ actions
• >11,000 active sections per term
Issues
• Handling performance issues
• Capacity planning
• Upgrades
• Replication
• JMS sensitivity
• Integration
Automation
• Rolling restarts
– Managed nodes restarted weekly, except the JMS node
• Log cleanup to preserve disk space
• Error reporting
– Application, tracking, vulnerabilities
• Thread dumps
• Sync admin node with backup
• LDIS batch integration
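The “log cleanup to preserve space” job can be as simple as deleting logs older than a retention window. This is a minimal Python sketch, not the presenters’ script (the slides do not show it); the `*.log` pattern and the 14-day window are hypothetical defaults.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def is_stale(mtime, now, max_age_days):
    """True if a file last modified at `mtime` is older than `max_age_days` at time `now`."""
    return now - mtime > max_age_days * SECONDS_PER_DAY


def find_stale_logs(log_dir, pattern="*.log", max_age_days=14, now=None):
    """List files under log_dir (searched recursively) matching `pattern` that are past the window."""
    now = time.time() if now is None else now
    return sorted(
        p for p in Path(log_dir).rglob(pattern)
        if p.is_file() and is_stale(p.stat().st_mtime, now, max_age_days)
    )


def cleanup_logs(log_dir, pattern="*.log", max_age_days=14, dry_run=True):
    """Delete stale logs, or only print them when dry_run is True; return the affected paths."""
    stale = find_stale_logs(log_dir, pattern, max_age_days)
    for path in stale:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return stale
```

A cron job could call `cleanup_logs("/path/to/logs", dry_run=False)` once a dry run lists only the expected files; the path is a placeholder.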
Monitoring
• Nagios
– http://www.nagios.org/
– Sends alerts
• Stats
– Custom AJAX web app
– Watch changes over time
• AWStats
– http://www.awstats.org/
Nagios Monitors
• OS / Hardware
– Load
– Temperature
– Free space
• Database
– Tablespace free space
– Listener
– Oracle processes
• Application
– Direct-login
– WebLogic processes
– Java MBeans:
   • Default/Primary Pending Requests Current Count
   • Java Heap Current
   • JDBC Waiting for Connection Current Count
   • Multicast Messages Lost
   • Primary count
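Each monitor above corresponds to a Nagios plugin: a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Below is a minimal Python sketch of the “Free space” check. The thresholds of 20% and 10% free are hypothetical, since the slides do not give USG’s values. The stock check_disk plugin already covers this case, so the sketch only illustrates the convention.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_free_space(free_pct, warn_pct, crit_pct):
    """Map a free-space percentage to a Nagios state (less free space is worse)."""
    if free_pct <= crit_pct:
        return CRITICAL
    if free_pct <= warn_pct:
        return WARNING
    return OK


def check_free_space(path="/", warn_pct=20.0, crit_pct=10.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero total size"
    free_pct = 100.0 * usage.free / usage.total
    state = evaluate_free_space(free_pct, warn_pct, crit_pct)
    # Text before '|' appears in alerts; text after it is performance data
    # in the form label=value[UOM];warn;crit;min;max.
    line = (f"DISK {STATUS_NAMES[state]} - {free_pct:.1f}% free on {path} "
            f"| free_pct={free_pct:.1f}%;{warn_pct};{crit_pct};0;100")
    return state, line
```

To run it as a plugin, call `check_free_space()`, print the returned line, and pass the returned code to `sys.exit`.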
Stats
• Short- and long-term analysis
– 21 months of data
• Graphs all collected Nagios data
• Flexible creation of reports
• Built with AJAX
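The slides do not say how Stats collects the Nagios data it graphs. One common source is plugin performance data, the text after `|` in a plugin’s output, formatted as `label=value[UOM];[warn];[crit];[min];[max]`. The following minimal Python parser assumes Stats (or a similar tool) consumes that format. That assumption is not stated in the talk.

```python
import re
from typing import NamedTuple, Optional

# label (optionally single-quoted) = numeric value + unit, then up to four
# ';'-separated fields: warn;crit;min;max (any of them may be empty).
PERF_RE = re.compile(
    r"(?:'(?P<qlabel>[^']+)'|(?P<label>[^'=\s]+))"
    r"=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[a-zA-Z%]*)"
    r"(?P<rest>(?:;[^;\s]*){0,4})"
)


class PerfValue(NamedTuple):
    label: str
    value: float
    uom: str
    warn: Optional[str]      # kept as text: thresholds may be ranges like "10:20"
    crit: Optional[str]
    minimum: Optional[float]
    maximum: Optional[float]


def parse_perfdata(perfdata):
    """Parse a Nagios performance-data string into a list of PerfValue records."""
    results = []
    for m in PERF_RE.finditer(perfdata):
        fields = m.group("rest").split(";")[1:]  # drop the empty piece before the first ';'
        fields += [""] * (4 - len(fields))       # pad missing trailing fields
        warn, crit, lo, hi = (f or None for f in fields)
        results.append(PerfValue(
            label=m.group("qlabel") or m.group("label"),
            value=float(m.group("value")),
            uom=m.group("uom"),
            warn=warn,
            crit=crit,
            minimum=float(lo) if lo else None,
            maximum=float(hi) if hi else None,
        ))
    return results
```

Each parsed value, stamped with the check time, is a natural row for a time-series store that short- and long-term graphs can draw from.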
AWStats
• Records data from web server logs
• Custom script grabs data from webserver.log files
• Runs daily
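A cluster writes one webserver.log per node, and AWStats processes records in chronological order. The sketch below is a minimal Python version of the merge step a custom script might perform. It assumes, hypothetically, that each node’s log is already time-ordered and uses the NCSA common/combined format with a bracketed timestamp. AWStats also ships its own logresolvemerge.pl tool for this job.

```python
import heapq
import re
from datetime import datetime

# Bracketed timestamp of an NCSA common/combined log line,
# e.g. [10/Oct/2000:13:55:36 -0700]. The log format is an assumption.
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def timestamp(line):
    """Return the line's timestamp as an aware datetime, or None if it has none."""
    m = TS_RE.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")


def _timed(lines):
    """Yield (timestamp, line) pairs, skipping lines without a parseable timestamp."""
    for line in lines:
        ts = timestamp(line)
        if ts is not None:
            yield ts, line


def merge_node_logs(node_logs):
    """Merge per-node iterables of time-ordered log lines into one chronological stream."""
    merged = heapq.merge(*(_timed(lines) for lines in node_logs))
    return (line for _, line in merged)
```

Because `heapq.merge` consumes its inputs lazily, the merged stream can be written straight to the file AWStats reads without loading every node’s log into memory.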
JMS Node
• Provides special services
– Mail, LC creation, chat
• Failure or migration of the JMS node hinders usage
• Services do not migrate well
– Allow targeted migration
– Others: pin JMS to a specific node
Integration
• Batched LDIS data files
• Cron runs nightly
• Files broken up by:
– Type
– “Reasonable” number of records
• Done on Inst node
– Issues with import can kill the node
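Splitting the nightly LDIS feed keeps each import small, which matters when a bad import can take down the Inst node. This minimal Python sketch of the split assumes records have already been parsed into dicts with a `type` field. The field name and the default batch size are hypothetical, since the slides only say “reasonable”.

```python
from collections import defaultdict


def batch_records(records, batch_size=500, type_key="type"):
    """Group records by type, then split each group into batches of at most batch_size.

    Returns a list of (record_type, batch_number, records) tuples, with types in
    first-seen order and batch numbers starting at 1 within each type.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    by_type = defaultdict(list)  # dict subclass, so insertion order is preserved
    for rec in records:
        by_type[rec[type_key]].append(rec)
    batches = []
    for rtype, recs in by_type.items():
        for start in range(0, len(recs), batch_size):
            batches.append((rtype, start // batch_size + 1, recs[start:start + batch_size]))
    return batches
```

If the import needs types in a particular order, sort the returned batches before writing each one to its own file.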
Touching Nodes
• ssh & dsh
– Touch groups of nodes at once
– Useful for:
   • Installs
   • Gathering logs
   • Locating a session
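dsh handles this fan-out directly, and the sketch below only illustrates the idea. It is a minimal Python version that runs one command on a group of nodes in parallel and collects each node’s exit status and output. It assumes key-based ssh logins are already configured, and the host names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command):
    """Build an ssh command line; BatchMode=yes makes ssh fail instead of prompting."""
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_subprocess(argv, timeout=60):
    """Run argv and return (exit_code, combined stdout and stderr)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 124, "timed out"  # 124 borrows GNU timeout's convention
    except OSError as exc:
        return 127, str(exc)     # 127 borrows the shell's "command not found" code
    return proc.returncode, proc.stdout + proc.stderr


def run_on_nodes(hosts, command, runner=run_subprocess, max_workers=8):
    """Run `command` on every host in parallel; return {host: (exit_code, output)}."""
    hosts = list(hosts)  # iterated twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda h: runner(ssh_argv(h, command)), hosts)
        return dict(zip(hosts, results))
```

For example, `run_on_nodes(["node01", "node02"], "uptime")` returns one result per node. The `runner` parameter lets a stub stand in for ssh when testing.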
Maintenance Page
• Hosted on the opposite F5 load balancer
• Two versions
– Scheduled maintenance
– Unscheduled outage
• In an F5 outage, move DNS to the other F5 so the message still appears
Installs and Upgrades
• Silent install scripts
• Test in both development environments
– Create against a small database
– Measure time to complete against a full-size copy of production
• Install to production
Powerlinks and Custom Development
• Test in development
• Try to break
• Pilot in production
• Release to all
Want More?
• To view my resources and references for this presentation, visit www.scholar.com
• Simply click “Advanced Search” and search by ezrafreelove and tag: ‘bbworld07’
Contact Information
• Ezra Freelove
• Amy Edwards
• George Hernandez