OpenStack Trove in Production at HP - Trove Day 2014
Posted on 29-Jun-2015
August 19, 2014
OpenStack Trove Day
Vipul Sabhaya, Software Development Lead, HP Cloud
Trove in Production at HP
tesora.com 2
What is this about?
• Trove
• How to deploy Trove with HA
• How we do configuration management
• Monitoring Trove
• Operating Trove
8/19/14
tesora.com 3
Trove
• Database as a Service
  • MySQL
  • MongoDB
  • Cassandra
  • Postgres
  • …
• Integrated OpenStack project as of the Icehouse release
Architecture
Which Cloud?
• Trove depends on other services only through their APIs
• Deploy on the overcloud (bare metal)?
• Deploy in-cloud (VMs)?
HA Trove
• HA overcloud
  • Availability zones
• HA Trove control plane
  • Control plane spread across availability zones
  • Galera cluster
  • RabbitMQ cluster
  • Multiple Trove API, TaskManager, and Conductor instances
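Spreading the control plane across availability zones is easy to erode over time as hosts are replaced. A minimal audit sketch, assuming a hypothetical inventory of (service, host, availability_zone) tuples exported from your own CMDB or Salt grains, flags any service that runs in fewer zones than required:

```python
from collections import defaultdict


def services_lacking_az_spread(placements, min_zones=2):
    """Return control plane services whose instances span fewer than min_zones AZs.

    placements is an iterable of (service, host, availability_zone) tuples.
    """
    zones_by_service = defaultdict(set)
    for service, _host, zone in placements:
        zones_by_service[service].add(zone)
    return sorted(
        service
        for service, zones in zones_by_service.items()
        if len(zones) < min_zones
    )


# Hypothetical inventory: hosts ctl1..ctl3 in zones az1..az3.
inventory = [
    ("trove-api", "ctl1", "az1"),
    ("trove-api", "ctl2", "az2"),
    ("trove-taskmanager", "ctl1", "az1"),
    ("trove-conductor", "ctl1", "az1"),
    ("trove-conductor", "ctl3", "az3"),
    ("rabbitmq", "ctl1", "az1"),
    ("rabbitmq", "ctl2", "az2"),
    ("rabbitmq", "ctl3", "az3"),
]

print(services_lacking_az_spread(inventory))  # ['trove-taskmanager']
```

The same check applies unchanged to Galera and RabbitMQ cluster members; raising min_zones to 3 is a stricter target for quorum-based clusters.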
How did we get here?
• SaltStack
  • Salt-based Trove deployment: https://github.com/saurabhsurana/trove-installer/tree/master/saltstack
  • Salt-based OpenStack deployment: https://github.com/EntropyWorks/salt-openstack
Configuration Management
• Helps define and control:
  • Packages and dependencies to be installed
  • Configuration files to be copied
  • Users and groups
• Gives a reproducible state of the infrastructure
• Applies the Salt highstate to Trove-managed VMs on first boot
Remote Execution
• No SSH needed
• Can control the infrastructure from a single machine
• Can define user- and resource-level access
• Especially useful for Trove, to help manage DB instances
trove-api.sls

trove:
  user.present:
    - name: trove

trove-package:
  pip.installed:
    - name: trove
    - require:
      - user: trove

/etc/trove/trove.conf:
  file.managed:
    - source: salt://trove/api/trove.conf
    - template: jinja
    - user: trove
    - require:
      - pip: trove-package
      - user: trove

trove-api:
  service:
    - running
    - enable: True
    - watch:
      - pip: trove-package
      - file: /etc/trove/trove.conf
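The requisites in trove-api.sls determine the order in which Salt applies the states: the user first, then the package, then the config file, then the service. The sketch below models that ordering as a dependency graph with Python's graphlib (3.9+); it is an illustration of the requisite semantics, not how Salt is implemented. It relies on the documented rule that "watch" implies "require", and adds the extra "watch" behavior: restarting the service when a watched state reports changes. The function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Requisites from trove-api.sls, as state ID -> IDs it must run after.
# "watch" implies "require", so watched states are listed here too.
requisites = {
    "trove": set(),                                           # user.present
    "trove-package": {"trove"},                               # pip.installed
    "/etc/trove/trove.conf": {"trove-package", "trove"},      # file.managed
    "trove-api": {"trove-package", "/etc/trove/trove.conf"},  # service.running
}

# The extra behavior of "watch": restart the service when a watched state changed.
watches = {"trove-api": {"trove-package", "/etc/trove/trove.conf"}}


def execution_order(reqs):
    """Return one valid order in which the states can be applied."""
    return list(TopologicalSorter(reqs).static_order())


def services_to_restart(watch_map, changed_states):
    """Return services that watch at least one state in changed_states."""
    return sorted(svc for svc, watched in watch_map.items() if watched & changed_states)


print(execution_order(requisites))
# ['trove', 'trove-package', '/etc/trove/trove.conf', 'trove-api']
print(services_to_restart(watches, {"/etc/trove/trove.conf"}))  # ['trove-api']
```

Because each state depends on the one before it, only one valid order exists here; a pushed change to trove.conf alone is enough to restart trove-api.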
trove.conf

# Number of child processes to run
trove_api_workers = {{ pillar['trove_worker_threads'] }}

# AMQP connection info
rabbit_password = {{ pillar['trove_rabbit_password'] }}
rabbit_hosts = {{ pillar['trove_rabbit_hosts'] }}
rabbit_userid = {{ pillar['trove_rabbit_userid'] }}

sql_connection = {{ pillar['trove_mysql_connection'] }}

{% if not pillar['devstack_setup'] %}
# Update service and instance task statuses if the instance fails to become active
update_status_on_fail = True

# How long to wait for the guest agent to become active (in seconds; default is 300)
usage_sleep_time = 30
usage_timeout = {{ salt['pillar.get']('trove_guestagent_active_timeout', 600) }}
{% endif %}

# Path to the extensions
api_extensions_path = {{ pillar['trove_path'] }}/extensions/routes
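The template mixes two lookup styles: pillar['...'] expects the key to exist, while salt['pillar.get'](key, default) falls back to a default (600 here). A missing required key either breaks the render or yields an empty setting, so it can pay to check pillar data before pushing it. A minimal sketch, assuming a flat pillar dictionary and hypothetical helper names and values:

```python
# Pillar keys that trove.conf reads with pillar['...'].
REQUIRED_PILLAR_KEYS = (
    "trove_worker_threads",
    "trove_rabbit_password",
    "trove_rabbit_hosts",
    "trove_rabbit_userid",
    "trove_mysql_connection",
    "devstack_setup",
    "trove_path",
)


def missing_pillar_keys(pillar):
    """Return required keys absent from a (flat) pillar dictionary."""
    return [key for key in REQUIRED_PILLAR_KEYS if key not in pillar]


def guestagent_active_timeout(pillar):
    """Mirror salt['pillar.get']('trove_guestagent_active_timeout', 600) for a flat key."""
    return pillar.get("trove_guestagent_active_timeout", 600)


# Hypothetical pillar data for one control plane node.
pillar = {
    "trove_worker_threads": 4,
    "trove_rabbit_hosts": "10.0.0.11:5672,10.0.0.12:5672",
    "trove_rabbit_userid": "trove",
    "trove_mysql_connection": "mysql://trove:secret@10.0.0.20/trove",
    "devstack_setup": False,
    "trove_path": "/opt/stack/trove",
}

print(missing_pillar_keys(pillar))        # ['trove_rabbit_password']
print(guestagent_active_timeout(pillar))  # 600
```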
Trove @ HP Helion
• Image-based deploys
  • TripleO
  • Trove Heat templates
  • Trove image elements
• Salt Cloud / Nova wrapper -> Salt Master -> Trove
• Seed -> Undercloud -> Overcloud -> Heat -> Trove
Operations - SaltStack
• Most of our DBaaS operations are based on SaltStack
• HA deployment of Salt masters
• Control access to the infrastructure with SaltStack
• Control access to customer instances
  • To help debug issues
  • But protect the data and access to the MySQL database
• Each Trove guest instance becomes a Salt minion
Trove Upgrades
• Trove datastores must remain usable during all upgrades
• Upgrades usually involve downtime
• RPC versioning
• Upgrade sequence that we follow:
  • Upgrade all the guest agents first (trove service)
  • Upgrade the TaskManager and Conductor
  • Upgrade the API servers
  • If a new RPC method is introduced, it must be available on the guest before an API operation that uses it is performed
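The upgrade sequence and the RPC rule can be written down as two small checks: which components to upgrade next, and whether a caller may use a method the guest might not have yet. This is a sketch under stated assumptions: the component names are hypothetical, and the (major, minor) version scheme is a common RPC versioning convention, not necessarily the one Trove used at the time.

```python
# The upgrade sequence from the slide, as ordered stages.
UPGRADE_STAGES = [
    ["guestagent"],
    ["taskmanager", "conductor"],
    ["api"],
]


def next_stage(upgraded):
    """Return the components still to upgrade in the earliest unfinished stage, or None."""
    for stage in UPGRADE_STAGES:
        remaining = [c for c in stage if c not in upgraded]
        if remaining:
            return remaining
    return None


def can_call(method_version, guest_version):
    """True if a guest advertising guest_version implements a method added in
    method_version. Versions are hypothetical (major, minor) tuples: the major
    must match and the guest's minor must be at least the method's."""
    same_major = guest_version[0] == method_version[0]
    return same_major and guest_version[1] >= method_version[1]


print(next_stage({"guestagent"}))  # ['taskmanager', 'conductor']
print(can_call((1, 3), (1, 2)))    # False: the guest has not been upgraded yet
```

Upgrading guests first is what makes can_call true for every new method by the time the upgraded API servers start issuing it.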
Security of key Trove components
• Use SSL for:
  • Trove API
  • RabbitMQ
• Security groups
  • Database: only control plane components need access
  • RabbitMQ: the control plane and all guest agents need access, but restrict it to an IP range wherever possible
• Use separate DB and RabbitMQ credentials for each service
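The access rules above can be expressed as allowed source ranges per backend, plus a check that no two services share a credential. A minimal sketch with Python's ipaddress module; the network ranges, service names, and user IDs are hypothetical placeholders for your own security group rules and pillar data:

```python
import ipaddress
from collections import defaultdict

# Hypothetical address ranges; substitute your own control plane and guest networks.
CONTROL_PLANE = ipaddress.ip_network("10.0.0.0/24")
GUEST_NETWORK = ipaddress.ip_network("10.20.0.0/16")

# Database: control plane only. RabbitMQ: control plane plus the guest range.
DB_ALLOWED = [CONTROL_PLANE]
RABBIT_ALLOWED = [CONTROL_PLANE, GUEST_NETWORK]


def may_reach(source_ip, allowed_networks):
    """True if source_ip falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowed_networks)


def shared_credentials(credentials):
    """Map each user ID used by more than one service to those services."""
    users = defaultdict(list)
    for service, userid in credentials.items():
        users[userid].append(service)
    return {u: sorted(s) for u, s in users.items() if len(s) > 1}


print(may_reach("10.20.7.9", DB_ALLOWED))      # False
print(may_reach("10.20.7.9", RABBIT_ALLOWED))  # True
```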
Monitoring of Trove Service / Instances
• Trove doesn’t ship with monitoring
• Upstart scripts respawn Trove services
• Monitor Trove API ports with Nagios
• Monitor RabbitMQ and DB connectivity from control plane nodes
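Monitoring an API port with Nagios needs only a plugin that prints one status line and exits with the standard codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The stock check_tcp plugin does this job; the sketch below shows the same idea in Python. The script name is hypothetical, and 8779 is assumed as the Trove API's default listen port; adjust it to your deployment.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Return (exit_code, message) for a TCP connect check, Nagios style."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} is accepting connections"
    except OSError as exc:  # refused, unreachable, or timed out
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"


def main(argv):
    """Usage: check_trove_api.py HOST [PORT]; PORT defaults to 8779."""
    if not argv:
        print("UNKNOWN - usage: check_trove_api.py HOST [PORT]")
        return UNKNOWN
    port = int(argv[1]) if len(argv) > 1 else 8779
    code, message = check_tcp(argv[0], port)
    print(message)
    return code
```

Run as a script, it would end with `sys.exit(main(sys.argv[1:]))` so Nagios sees the exit code. A TCP connect only proves the port is open; an HTTP request to the API is a stronger check.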
Monitoring of key Trove components
• RabbitMQ
  • Number of queues
  • Number of sockets used
  • Number of established connections
  • Cluster status
  • Failed access attempts
• Database
  • Standard MySQL monitoring
  • Cluster status
  • Slow query log
  • error.log for unauthorized/failed access attempts
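Once the RabbitMQ figures are collected (for example from the RabbitMQ management plugin's HTTP API, with failed logins counted from the logs), turning them into an alert is a threshold comparison. A sketch with hypothetical metric names and (warning, critical) thresholds; a cluster with fewer running nodes than expected is treated as critical:

```python
# Hypothetical (warning, critical) thresholds; tune them for your deployment.
THRESHOLDS = {
    "queues": (5000, 10000),
    "sockets_used": (800, 950),
    "established_connections": (2000, 4000),
    "failed_logins": (10, 50),
}


def evaluate(metrics, expected_nodes, thresholds=THRESHOLDS):
    """Return (worst Nagios status, list of reasons) for one RabbitMQ cluster.

    metrics maps metric names to current values; "running_nodes" is compared
    with expected_nodes because a shrunken or partitioned cluster is critical.
    """
    worst, reasons = 0, []
    running = metrics.get("running_nodes", expected_nodes)
    if running < expected_nodes:
        worst = 2
        reasons.append(f"running_nodes={running}/{expected_nodes}")
    for name, (warn, crit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            continue
        if value >= crit:
            status = 2
        elif value >= warn:
            status = 1
        else:
            continue
        worst = max(worst, status)
        reasons.append(f"{name}={value}")
    return worst, reasons


print(evaluate({"queues": 120, "sockets_used": 900, "running_nodes": 3}, 3))
# (1, ['sockets_used=900'])
```

The same structure works for the database checks, with Galera cluster size in place of running_nodes.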
Monitoring of key Trove components
• Trove guest agent heartbeat status
• Trove instance audit (catch failed instances to help identify service issues)
• Connectivity to Trove instances from outside
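Heartbeat monitoring and the instance audit both reduce to comparing timestamps against a threshold. A minimal sketch, assuming you can read each guest's last heartbeat time and each instance's status and creation time from Trove's records; the 90-second and 15-minute limits and the instance IDs are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def stale_heartbeats(last_heartbeat, now, max_age=timedelta(seconds=90)):
    """Return instance IDs whose last guest agent heartbeat is older than max_age."""
    return sorted(iid for iid, seen in last_heartbeat.items() if now - seen > max_age)


def audit_instances(instances, now, max_build=timedelta(minutes=15)):
    """Return instance IDs in ERROR, or stuck in BUILD longer than max_build.

    instances maps instance ID -> (status, created_at).
    """
    flagged = []
    for iid, (status, created) in instances.items():
        if status == "ERROR" or (status == "BUILD" and now - created > max_build):
            flagged.append(iid)
    return sorted(flagged)


now = datetime(2014, 8, 19, 12, 0, tzinfo=timezone.utc)
beats = {"db-1": now - timedelta(seconds=30), "db-2": now - timedelta(minutes=5)}
print(stale_heartbeats(beats, now))  # ['db-2']
```

A spike in flagged instances across many tenants at once usually points at a service problem (RabbitMQ, Nova, quotas) rather than at individual guests.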
What did we learn?
OpenStack Trove: RabbitMQ
• RabbitMQ
  • Raise the default socket descriptor limit (the default will run out pretty soon)
  • The number of queues and sockets will keep growing if you don't enable heartbeats on RabbitMQ connections
  • Monitoring is key when running a RabbitMQ cluster configured with mirrored queues
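Whether the descriptor limit will "blow up" is a back-of-the-envelope calculation: every guest agent and control plane process holds AMQP connections, and each connection costs the broker a socket. A sketch of that arithmetic, with hypothetical fleet sizes and connection counts; 1024 is used as an example because it is a common default soft open-file limit on Linux:

```python
def expected_connections(guests, services, conns_per_guest, conns_per_service):
    """Estimate AMQP connections the broker must hold open at once."""
    return guests * conns_per_guest + services * conns_per_service


def within_fd_budget(connections, fd_limit, headroom=0.2):
    """True if the connections leave at least `headroom` of the fd limit free."""
    return connections <= fd_limit * (1 - headroom)


# Hypothetical fleet: 500 guests with 2 connections each, 30 control plane
# service processes with 10 connections each.
conns = expected_connections(500, 30, 2, 10)
print(conns)                           # 1300
print(within_fd_budget(conns, 1024))   # False
print(within_fd_budget(conns, 65536))  # True
```

The estimate only holds if dead connections are cleaned up; without heartbeats, connections from guests that disappeared linger, and the real count keeps climbing past it.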
OpenStack Trove
• Guest agent heartbeats (service status notifications) should be monitored for failure
• Upgrading the guest agent is tricky on xsmall instances
• A quota mismatch between Trove and Nova is the biggest cause of instance failures
• Resource mismatch between Trove and Nova
• Schedule jobs to correct things
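A scheduled job can catch quota mismatches before they turn into failed instances. The sketch assumes that each Trove instance is booted as a Nova server counted against the same tenant's Nova quota, and uses Nova's "instances", "cores", and "ram" quota names, where -1 means unlimited; the function name and tenant values are hypothetical:

```python
def quota_mismatches(trove_instance_quota, flavor, nova_quota):
    """Return Nova quota keys that a tenant's full Trove quota would exceed.

    flavor holds the "vcpus" and "ram" (MB) of the flavor used for the
    estimate. The result maps key -> (needed, allowed).
    """
    needed = {
        "instances": trove_instance_quota,
        "cores": trove_instance_quota * flavor["vcpus"],
        "ram": trove_instance_quota * flavor["ram"],
    }
    return {
        key: (amount, nova_quota[key])
        for key, amount in needed.items()
        if nova_quota[key] != -1 and amount > nova_quota[key]
    }


# Hypothetical tenant: Trove allows 10 instances of a 2 vCPU / 4 GB flavor.
print(quota_mismatches(10, {"vcpus": 2, "ram": 4096},
                       {"instances": 10, "cores": 20, "ram": 20480}))
# {'ram': (40960, 20480)}
```

Run per tenant on a schedule, a non-empty result means either Trove's quota should be lowered or Nova's raised before customers hit the limit mid-provisioning.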
Thank you