Status Update of COLO Project: Xiaowei Yang, Huawei and Will Auld, Intel
DESCRIPTION
We presented the idea of COarse-grain LOck-stepping (COLO) virtual machines for non-stop service at last year's Xen Summit. We have made significant progress in the past year and have submitted the patch series to the community. It is a good time for us to present the latest status to the community and call for participation.
TRANSCRIPT
Status of COLO Project
Eddie Dong*, Xiaowei Yang#
*Intel Open Source Technology Center
#Huawei Technology Co.
Key Contributors: Jianshan Lai, Congyang Wen, Tao Hong
Notices and Disclaimers
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel may make changes to specifications, product descriptions, and plans at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
All dates provided are subject to change without notice.
Intel and Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2013, Intel Corporation. All rights reserved.
Agenda
Background
Status
Performance
Call for action
What is COLO?
COarse-grain LOck-stepping Virtual Machines for Non-stop Service
Solution for Client / Server application without application awareness
Dual VM based high availability solution
Relaxed constraints for higher performance
Replicated network
Copy client request to both PVM/SVM
Compare response packets from PVM and SVM with compare module
When both are the same, the response is sent to the client
When they are not the same, sync PVM and SVM and then send the response
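The compare-and-release flow above can be sketched as follows. This is a minimal illustration, not the COLO compare module itself: the function name, the `sync_vms` and `send_to_client` callbacks, and the choice to release the PVM's copy of the response are all assumptions made for the example.

```python
from typing import Callable


def release_response(
    pvm_response: bytes,
    svm_response: bytes,
    sync_vms: Callable[[], None],
    send_to_client: Callable[[bytes], None],
) -> bool:
    """Compare the two response packets and release one to the client.

    Returns True when the responses differed and a sync was triggered.
    All names here are hypothetical; only the flow follows the slide.
    """
    diverged = pvm_response != svm_response
    if diverged:
        # Responses differ: bring the SVM back in line with the PVM first.
        sync_vms()
    # Assumption: the PVM's response is the one sent to the client.
    send_to_client(pvm_response)
    return diverged
```

With identical responses the packet is released immediately; a sync happens only when the two VMs produce different output, which is what ties the synchronization rate to the application's behaviour.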
Non-Stop Service with VM Replication
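[Figure: Primary node (Hardware, VMM, PVM running OS and APPs) and Secondary node (Hardware, VMM, SVM running OS and APPs), connected through Network and Storage. VM Replication runs from Primary to Secondary; on Hardware Failure the service Fails Over to the Secondary. Compare w/ Remus.]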
Problems with existing approaches
Instruction level lock-stepping
Excessive overhead from maintaining the exact machine state
Memory access in an MP guest is non-deterministic
Periodic Check-pointing
Extra network latency
Excessive VM checkpoint overhead
Relaxed constraints help
Relaxing constraints tends to lower the rate of synchronization
Periodic check-pointing defines the rate of synchronization
Tying the rate of synchronization to dissimilar responses ties it to the application characteristics
In most cases this lowers the rate compared to the periodic method
Architecture of COLO
COarse-grain LOck-stepping Virtual Machine for Non-stop Service
Current Status
Patches for Xen are sent to the mailing list
Academic paper published at ACM Symposium on Cloud Computing (SOCC’13)
Refer to “COLO: COarse-grained LOck-stepping Virtual Machines for Non-stop Service” for details
http://www.socc2013.org/home/program
Industry announcement
Huawei FusionSphere uses COLO
http://enterprise.huawei.com/ilink/enenterprise/about/news/news-list/HW_308817?KeyTemps=
TCP/IP optimization
Per-Connection Comparison (no modification to TCP/IP)
Coarse-grain TCP Timestamp
Coarse-grain TCP Notification Window Size
Deterministic Algorithm to segment application data
Deterministic Algorithm to generate Initial Seq Number
Deterministic Algorithm to generate ID (IP packet header)
Immediate Acknowledgement
Use a separate packet to send FIN
…
EXAMPLE: Coarse-grain TCP Notification Window Size
Coarse-grain window size rules:
if original window < 256
round down to the nearest power of 2
else
mask the 8 least significant bits
For example:
1. original window size = 172 (10101100b)
set window size to 128 (10000000b)
2. original window size = 283 (100011011b)
set window size to 256 (100000000b)
3. original window size = 789 (1100010101b)
set window size to 768 (1100000000b)
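The window rule above is small enough to write out directly. The sketch below is an illustration in Python rather than the kernel patch itself; the function name is hypothetical, and returning 0 for a zero window is an assumption, since the slide does not cover that case.

```python
def coarse_grain_window(window: int) -> int:
    """Apply the coarse-grain window rule from the slide.

    Below 256: round down to the nearest power of 2.
    Otherwise: clear (mask off) the 8 least significant bits.
    """
    if window <= 0:
        # Assumption: a zero window stays zero (not specified on the slide).
        return 0
    if window < 256:
        # The highest set bit is the nearest power of 2 at or below window.
        return 1 << (window.bit_length() - 1)
    return window & ~0xFF


for original in (172, 283, 789):
    print(original, "->", coarse_grain_window(original))
```

This reproduces the three examples on the slide (172 to 128, 283 to 256, 789 to 768). Coarsening the value makes it much more likely that the PVM and SVM advertise the same window even when their buffer occupancy differs slightly.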
EXAMPLE: Deterministic segmentation
Application data to send at T1 and T2
App data1 (Time point 1): 3000 B
App data2 (Time point 2): 2000 B
Method 1: Find the latest unsent skb and append app data2 to the unused tail of its payload
1360 B | 1360 B | 280 B + 1080 B | 920 B
Method 2: Find the latest unsent skb (skb == NULL) and use a new skb to send app data2
1360 B | 1360 B | 280 B | 1360 B | 640 B
COLO deterministic method: do NOT check the latest unsent skb; always use a new skb to send app data2
Figure legend: TCP/IP packet header
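The difference between the segmentation methods can be reproduced with a small model. This is only a sketch: the function name and the `append_to_tail` flag are hypothetical, and the 1360-byte maximum payload is taken from the segment sizes shown in the figure, not from any stated configuration.

```python
from typing import List


def segment(writes: List[int], mss: int = 1360,
            append_to_tail: bool = False) -> List[int]:
    """Split successive application writes into segment payload sizes.

    append_to_tail=True models Method 1, where a later write fills the
    unused tail of a still-unsent segment. append_to_tail=False models
    the COLO deterministic method, where every write starts a new segment.
    """
    segments: List[int] = []
    for size in writes:
        remaining = size
        if append_to_tail and segments and segments[-1] < mss:
            # Top up the last (assumed still unsent) segment first.
            fill = min(mss - segments[-1], remaining)
            segments[-1] += fill
            remaining -= fill
        while remaining > 0:
            chunk = min(mss, remaining)
            segments.append(chunk)
            remaining -= chunk
    return segments


print(segment([3000, 2000], append_to_tail=True))   # Method 1
print(segment([3000, 2000], append_to_tail=False))  # COLO deterministic
```

In a real stack, whether the tail segment is still unsent depends on timing, so Method 1 can produce different packet boundaries on the PVM and the SVM. Always starting a new segment removes that timing dependence, at the cost of one short packet.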
Storage process
Write
Pnode
DM sends the Write request (offset, len, data) to PVM cache in Snode
DM calls block driver to write to storage
Snode
DM saves Write request in SVM cache
Read
Snode
From SVM cache, or storage otherwise
Pnode
From storage
Checkpoint
DM calls block driver to flush PVM cache
Failover
DM calls block driver to flush SVM cache
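The storage flow on the secondary node (write, read, checkpoint, failover) can be sketched with two in-memory caches in front of the disk. The class and method names below are hypothetical, the disk is modelled as a dictionary keyed by offset, and discarding the other cache at checkpoint and failover is an assumption that the slide does not state.

```python
from typing import Dict, Optional


class SecondaryDisk:
    """Toy model of the Snode storage path; not the actual DM code."""

    def __init__(self) -> None:
        self.storage: Dict[int, bytes] = {}
        self.pvm_cache: Dict[int, bytes] = {}  # writes forwarded from Pnode
        self.svm_cache: Dict[int, bytes] = {}  # writes issued by the SVM

    def pvm_write(self, offset: int, data: bytes) -> None:
        # Pnode's DM forwards each PVM write to the PVM cache in Snode.
        self.pvm_cache[offset] = data

    def svm_write(self, offset: int, data: bytes) -> None:
        # SVM writes stay in the SVM cache and do not reach storage yet.
        self.svm_cache[offset] = data

    def svm_read(self, offset: int) -> Optional[bytes]:
        # Read from the SVM cache, or from storage otherwise.
        if offset in self.svm_cache:
            return self.svm_cache[offset]
        return self.storage.get(offset)

    def checkpoint(self) -> None:
        # Flush the PVM cache to storage.
        self.storage.update(self.pvm_cache)
        self.pvm_cache.clear()
        self.svm_cache.clear()  # assumption: speculative SVM writes are dropped

    def failover(self) -> None:
        # Flush the SVM cache to storage; the SVM takes over.
        self.storage.update(self.svm_cache)
        self.svm_cache.clear()
        self.pvm_cache.clear()  # assumption: unflushed PVM writes are dropped
```

Keeping both caches off the disk until a checkpoint or failover means the secondary's storage always reflects a state that one of the two VMs can continue from.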
Memory sync
One of the most time-consuming steps
Asynchronously send dirty memory while the PVM/SVM are running
Less dirty memory transmission during VM checkpoint
Less CPU pressure and latency
Critical for the case where VM checkpoints happen very rarely
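The effect of asynchronous memory sync can be illustrated with a toy dirty-page tracker. Everything here is hypothetical (class name, method names, page frame numbers as plain integers); it only models the bookkeeping, not how Xen tracks or transfers dirty pages.

```python
from typing import List, Set


class DirtyPageTracker:
    """Toy model: pages sent in the background need not be sent at checkpoint."""

    def __init__(self) -> None:
        self.dirty: Set[int] = set()

    def mark_dirty(self, pfn: int) -> None:
        # A page written after it was sent becomes dirty again.
        self.dirty.add(pfn)

    def background_send(self, max_pages: int) -> List[int]:
        # While the VMs keep running, send a bounded batch of dirty pages.
        batch = sorted(self.dirty)[:max_pages]
        self.dirty.difference_update(batch)
        return batch

    def checkpoint(self) -> List[int]:
        # At checkpoint only the pages that are still dirty are transferred.
        remaining = sorted(self.dirty)
        self.dirty.clear()
        return remaining


tracker = DirtyPageTracker()
for pfn in range(10):
    tracker.mark_dirty(pfn)
tracker.background_send(6)      # pages 0..5 leave while the VMs run
tracker.mark_dirty(2)           # page 2 is written again afterwards
print(tracker.checkpoint())     # only the still-dirty pages remain
```

When checkpoints are rare, a large amount of dirty memory accumulates between them, so draining it in the background keeps the pause at checkpoint time short.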
Faster VBD/VIF frontend/backend suspend/resume
Old method:
Communication between frontend and backend goes through xenstored, which is inefficient
New method:
Use an event channel to speed up frontend/backend communication
*Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance
tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and
functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to
assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Web Server Performance - Web Bench
Source: Intel
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
Web Server Performance - Web Bench (MP)
PostgreSQL Performance - Pgbench
PostgreSQL Performance - Pgbench (MP)
Upstream
Initial patch series has been posted
More comments are welcome
Depends on the readiness of Remus on top of xl
COLO reuses Remus for VM checkpoint and heartbeat
Next Steps and Call for Action
Works well with an HVM Linux guest + PV driver
Windows guest support is under development
Need more participants and faster turnaround on upstreaming