Sharing Best Practices in Setting Up and Operating OpenStack CI Loops
Post on 20-Jul-2015
Operating OpenStack Xen CI Loops
Stefano Stabellini, Bob Ball, Anthony Perard
BoF – Sharing Best Practices
20/05/2015
© 2014 Citrix. Confidential.
About this BoF
Brief introduction to XenProject CI
Comparing two CI environments we have set up (XenServer vs libvirt+Xen CI)
Talk through some of the issues we have encountered, and what we see as good
practices coming out of those issues
We aren’t claiming to have “the answers” – we hope our experience will prompt
discussion.
Why Xen?
Xen is a type-1 hypervisor
Small footprint (less than 100K LOC)
GPLv2
Powers the largest public clouds in production (>50% of vendors listed in the Gartner 2014 Cloud Magic
Quadrant, more in terms of hosts)
Progress and Goals
Goals:
• Make Xen a great platform for OpenStack production deployments
• Make Xen a great platform for OpenStack development and hacking
01/2015: Xen via libvirt still in Group C
A tale of two CIs
XenServer CI
• 15 months old
• Built before many current tools were available
• Single use devstack Virtual Machines
• Custom process to watch Gerrit
• Custom process to trigger jobs
• Heavily-modified upstream components
• Uploading logs to swift
Libvirt+Xen CI
• 3 months old
• Fork of Ramy Asselin’s puppet scripts
• Single use devstack Virtual Machines
• Zuul watches Gerrit stream
• Jenkins triggers jobs
• Uploading logs to swift
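As a rough illustration of the "watch Gerrit, trigger jobs" step that both CIs perform (with a custom process in one case, Zuul and Jenkins in the other), the sketch below filters lines in the JSON format emitted by Gerrit's stream-events for new patchsets on watched projects. It is not the code of either CI: the function names and the WATCHED_PROJECTS set are hypothetical, and reading the stream itself (normally over SSH) is left out.

```python
import json

# Hypothetical set of projects this CI reacts to; not taken from the slides.
WATCHED_PROJECTS = {"openstack/nova", "openstack/tempest"}


def parse_event(line):
    """Return details of a patchset-created event, or None for anything else."""
    try:
        event = json.loads(line)
    except ValueError:
        # Not valid JSON (e.g. a truncated line): ignore rather than crash the watcher.
        return None
    if not isinstance(event, dict) or event.get("type") != "patchset-created":
        return None
    change = event.get("change", {})
    patchset = event.get("patchSet", {})
    return {
        "project": change.get("project"),
        "change": str(change.get("number")),
        "patchset": str(patchset.get("number")),
        "ref": patchset.get("ref"),
    }


def jobs_to_trigger(lines, watched=WATCHED_PROJECTS):
    """Yield one job description per new patchset on a watched project."""
    for line in lines:
        info = parse_event(line)
        if info is not None and info["project"] in watched:
            yield info
```

Each yielded dictionary carries the ref a single-use devstack VM would need to check out. Zuul covers this step, plus dependency and pipeline handling, without custom code.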
XenServer CI: Major components
Components (from diagram): Gerrit, Xenapi-os-testing, Devstack-Gate, Citrix-openstack-ci, Nodepool, project-config
XenServer CI: The Good, The Bad and The Ugly
The Good
• Single-use VMs
• Easy access statistics
• Trivial to disable
• Email monitoring
• Swift upload
The Bad
• Failure reproduction
• Swift upload
• Custom orchestration
• Tempest exclusion list
• Single cloud
The Ugly
• Constant rebasing
• Forked upstream repos
• Comment format
• Single point of failure
• Inconsistent reliability
Libvirt+Xen CI: Major components
Nodepool
Components (from diagram): Nodepool, Gerrit, Zuul, Jenkins, Devstack-Gate, Gearman, JJB, os-ext-testing
Libvirt+Xen CI: The Good, The Bad and The Ugly
The Good
• Single-use VMs
• Based on upstream
• Swift upload
• Highly reliable
The Bad
• Steep learning curve
• Swift upload
• Backport upstream Xen fixes
• Changes to puppet scripts
The Ugly
• No monitoring
• Single Cloud
• Swift upload
• No pre-prod env
Libvirt+Xen CI: Upgrades and backport
No “hacks” required
Libvirt 1.2.14 with:
• f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
• 894d2ff libxl: acquire a job when destroying a domain
• 6dfec1e libxl: drop virDomainObj lock when destroying a domain
Xen 4.4 (ubuntu package) with:
• 9369988 libxl: event handling: Break out ao_work_outstanding
• f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
• 4783c99 libxl: In domain death search, start search at first domid we want
• 188e9c5 libxl: Domain destroy: fork
Our mistakes
Well, some of them…
Not regularly attending the Third Party meetings - XenServer CI predated them
Too many forks - although some have been merged back already
Assumed creating our own environment was easier
Incorrect assumptions about devstack-gate flags – Use the Source, Luke.
Insufficient isolation between the CI environments – cloud credentials
Too many CPUs / Not enough RAM – Devstack is hungry
Using Microsoft Outlook – Insufficient Filtering
Our Suggestions
(Not necessarily best practices)
Participate in the Third Party meetings
Participate in the Third Party WG meetings
Nodes: Single use, orchestrated by Nodepool. Preferably in an OpenStack cloud.
Orchestration: Third-party CI puppet scripts.
Logs: Served from Swift. We now have > 1TB logs.
Projects: Almost everything(!) - Minimal suggestion: Add Tempest and Devstack.
Coverage: Disable tests to improve pass rate(!).
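One way to keep the "disable tests" suggestion visible and reviewable is to hold exclusions in a plain list of regular expressions and report how much of the suite they remove. The sketch below is only an illustration under that assumption: the function names, the file layout (one pattern per line, "#" for comments) and the test IDs in the usage example are hypothetical, not the format used by either CI described here.

```python
import re


def load_patterns(lines):
    """Read exclusion patterns, skipping blank lines and '#' comments."""
    patterns = []
    for line in lines:
        text = line.strip()
        if text and not text.startswith("#"):
            patterns.append(text)
    return patterns


def select_tests(test_ids, exclusion_patterns):
    """Return the test IDs that match none of the exclusion patterns."""
    compiled = [re.compile(p) for p in exclusion_patterns]
    return [t for t in test_ids if not any(rx.search(t) for rx in compiled)]


def excluded_fraction(test_ids, exclusion_patterns):
    """Fraction of the suite removed by the exclusion list (0.0 for an empty suite)."""
    if not test_ids:
        return 0.0
    kept = select_tests(test_ids, exclusion_patterns)
    return 1 - len(kept) / len(test_ids)
```

For example, with four illustrative test IDs and the single pattern `test_resize`, `excluded_fraction` reports the share of tests no longer run. Logging that number on every run makes it harder for the exclusion list to grow unnoticed.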
Our Questions
Voting: Simplifying the disable/enable loop
Coverage: Disabling individual tests – Where is the line?
Monitoring: Need a solution both for us and for sharing results more widely
Enforcement: Several cases of cores ignoring valid fails – breaking the CI
Platform vs Driver testing: Testing of all the things
Shared orchestration: Wouldn’t it be nice… No one can cover all weekends and timezones