FLESSR USECASE 2 – FLEXIBLE, ON-DEMAND DATA MANAGEMENT AND STORAGE PROVISION
HARDWARE AND PLATFORM SETUP
GENERAL SETUP
The University of Reading platform comprises the following:
• Ubuntu 10.10 Cloud Server running Eucalyptus 2.0.1 on:
  o One Dell 2950 as combined Cluster Controller (CC), Cloud Controller (CLC), Storage Controller (SC) and Walrus.
  o Two Dell R610 as Node Controllers (NCs) and shared storage, supporting up to 48 VM instances.
Gigabit Ethernet is used between the servers and for external connections.
Figure 1: Use Case 2 Cloud Architecture
STORAGE ARCHITECTURE
For the purposes of this Usecase the storage infrastructure is kept within the servers. The two Node Controllers had a relatively large amount of fast local disk available (1TB RAID10 SAS per Node), more than enough to provide the capacity and throughput required during development.
The Storage Controller service controls two repositories, one for “buckets” and another for “volumes”. Buckets are areas for users to store Virtual Machine images. Volumes are the containers for user-defined disks, also known as Elastic Block Storage (EBS). These storage pools are considered local to the SC, with VM images being copied to and cached on the NCs when a new VM is requested. EBS volumes always remain on the SC and are dynamically shared via iSCSI or ATA over Ethernet (ATAoE) to the NC that requires access to that particular volume.
Storage was shared via NFS and kept to a simple design of one repository per Node. Because it is always accessed centrally, the EBS storage carries the largest overhead; it would be the primary candidate for relocation to a SAN in a production environment.
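As an illustration of this design, a minimal NFS export of the kind described might look as follows; the repository path and subnet are assumptions for illustration, not values taken from the actual deployment.

# Hypothetical /etc/exports entry on each Node Controller, exposing its local
# repository to the other cloud servers over the private gigabit network.
/var/lib/eucalyptus/storage  192.168.10.0/24(rw,sync,no_root_squash,no_subtree_check)

# Corresponding mount from another server:
mount -t nfs nc1:/var/lib/eucalyptus/storage /mnt/nc1-storage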
PROBLEMS AND SOLUTIONS
SERVER INSTALLATION
General installation was very straightforward, with the customised Ubuntu Cloud installer automating a large portion of the Eucalyptus installation. The most recent distribution (Ubuntu 11.04) has addressed or documented some of the issues that were noted during the development of this Usecase, notably the requirement that LVM not be used for the server disk management.
ISCSI
The default network storage technology used to provide EBS to the NCs is iSCSI, with ATA over Ethernet provided as an alternative. A significant difference between the two technologies is the level they operate at. iSCSI functions at the IP level and is therefore capable of being used across multiple network boundaries. ATAoE is at the lower, Ethernet, level and cannot be exposed further than the local network subnet.
The Usecase platform used the default, iSCSI. The platform was often unreliable during heavy EBS IO. This was traced back to the Ubuntu iSCSI settings allowing access to only 6 devices before the command queue saturated and timeouts triggered. Eucalyptus is very sensitive to disk timeouts and so rapidly dropped devices that were being throttled. This issue only arose during testing of large GlusterFS filesystems that used more than 6 EBS volumes. Although settings were changed to address this, there was not enough time to determine what would be required to achieve a stable setup. Unfortunately the side effect was total corruption of any volumes in use at the time and the need to destroy all instances associated with those volumes, as the Eucalyptus state database became out of sync with the actual system status.
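The report does not record exactly which settings were changed; the following is a sketch of the kind of open-iscsi tuning involved, with values chosen purely for illustration.

# /etc/iscsi/iscsid.conf (open-iscsi) -- illustrative values only
node.session.cmds_max = 1024                     # deeper command queue per session
node.session.queue_depth = 128                   # per-device queue depth
node.session.timeo.replacement_timeout = 120     # tolerate slow paths before failing IO

# Re-log-in to the targets (or restart open-iscsi) for the new settings to apply:
service open-iscsi restart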
SERVER CONNECTIVITY
The Eucalyptus system relies heavily on Ethernet connectivity. All data travels through the front-end server roles. A user never has a direct connection to their virtual machine, nor to EBS volumes. This Usecase highlights a scenario that is stressful for the network, by providing the means for many users to move large amounts of data through the front end to their respective hosted EBS volumes.
Data written to EBS travels via the Storage Controller to the Node Controller (the VM instance) and then back to the EBS volume, hosted on the Storage Controller. If clustered or replicated filesystems are used on the EBS volumes, the traffic between nodes in the system is multiplied further per disk transaction.
Figure 2: Example of data flow for an EBS IO request.
ISSUES DURING DEVELOPMENT
The Eucalyptus and UEC documentation presents ideal steps and concepts, but makes very little effort either to explain why things are done that way or to describe how to address issues when they arise. Although it was possible to work around them, these issues served to highlight that, despite being on its second major release, Eucalyptus and the related technologies (KVM, VirtIO) suffer from a lack of maturity.
A large amount of time was spent finding and understanding these issues - the documentation for Eucalyptus available on either the official website or within the UEC wiki is sparse at best. Searching the UEC bug tracker ultimately provided references for most of the issues that arose during the Usecase.
EBS VOLUME MOUNTING
When connecting an EBS volume to an instance, one would typically elect to use a device name within the /dev/sd[a-z] range. Because UEC uses VirtIO-attached disks by default, Eucalyptus silently fails unless a device in the range /dev/vd[a-z] is used.
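With euca2ools, for example, the attach request must name a VirtIO device; the volume, instance and zone identifiers below are placeholders.

# Create a 10GB volume and attach it to a running instance using a /dev/vdX name.
euca-create-volume -s 10 -z reading-cluster
euca-attach-volume -i i-4A7D0912 -d /dev/vdb vol-5F2A0631   # /dev/vdb, not /dev/sdb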
Additional issues were also encountered, such as devices not persisting cleanly across reboots and EBS volumes becoming orphaned. In the latter scenario, destroying the machine instance is the only means of freeing the resource. See https://bugs.launchpad.net/qemu/+bug/432154.
GLUSTERFS AND DNS
Instances reside behind an IPTables firewall hosted on the Cluster Controller, which is used to dynamically route a user's connection from the service's external address to the correct Node Controller behind it. By default, instances have no concept of their external IP address and hostname, instead using private addresses. Manual steps were required to ensure that both external clients and the instances used mutual DNS names that resolved to the respective IPs. This allowed GlusterFS configurations to refer to a hostname common across all parts of the filesystem setup.
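A minimal sketch of those manual steps, using an illustrative hostname and documentation addresses rather than the project's real ones:

# Inside the instance, which only sees its private address, map the shared
# name to the private IP:
echo "10.0.3.21    storage-vm1.flessr.example" >> /etc/hosts

# On the external client, the same name resolves to the public IP allocated
# by the Cluster Controller:
echo "192.0.2.10   storage-vm1.flessr.example" >> /etc/hosts

# GlusterFS configuration on both sides can then refer to
# storage-vm1.flessr.example consistently.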
STUCK CREDENTIALS - NO IMAGE DEPLOYMENT
The Amazon EC2 API makes extensive use of public key encryption for authentication. The Eucalyptus user management interface generates a zip file containing the certificates assigned to the user, so that one can easily set up service access.
Account keypairs are used by Eucalyptus to manage image deployment. Uploaded images are encrypted by the account used to import them, then decrypted on a Node Controller the first time the image is used to run an instance. Downloading a new credential archive from the Cloud Controller web interface generates a new keypair. The old keypair is not referenced when decrypting during deployment, resulting in a failure to run a VM. See https://bugs.launchpad.net/eucalyptus/+bug/644482.
ZEEL/I
Zeel/I allows us to develop against a single API without being concerned about variations in cloud provider APIs and management methods. Development against the API is done in Java. The documentation is at an early stage, but once we were familiar with how the API functioned, development was quite rapid.
DEVELOPMENT / API
PROS
• Single point of management and login.
• Cloud provider API abstraction.
  o Support of a heterogeneous cloud.
• The concept of regions and resource costing allows for scoping of datacentre and type of server resource granted to a user.
CONS
• The API is predominantly focussed on instance management rather than EBS management, at times needing additions to the API to create functional parity between the instance and disk management capabilities.
• Eucalyptus providers don't currently report as much status information as would be desired for fine-grained control of EBS volumes.
• Can only use one keypair at a time when interacting with an instance. If an instance is deployed with a keypair not generated by Zeel/I, Zeel/I is incapable of accessing that instance, as it is missing the private half of the keypair (Zeel/I has no means of retrieving a private key from the cloud provider). A workaround was to manually embed an existing public key in the image prior to or during deployment.
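A sketch of that workaround, baking an existing public key into the image before it is bundled; the image filename, key and paths are illustrative.

# Loop-mount the image and append an existing public key so instances built
# from it can be reached with a key Zeel/I does not manage.
mkdir -p /mnt/img
mount -o loop storage-vm.img /mnt/img
mkdir -p /mnt/img/root/.ssh
cat ~/.ssh/id_rsa.pub >> /mnt/img/root/.ssh/authorized_keys
chmod 600 /mnt/img/root/.ssh/authorized_keys
umount /mnt/img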
ID MANAGEMENT
Zeel/I strives to remove the need to track multiple credentials and certificates when making use of the cloud regions it has access to. A UK e-Science certificate is associated with the user's per-cloud certificates, and from then on only the e-Science certificate is required to interact with all services. Although this currently requires the disclosure of the private keys one owns, this is a step in the right direction for simple cloud use.
HYBRIDFOX
Hybridfox is an extension to Firefox that greatly eases the use of a cloud system that implements the Amazon EC2 API, of which Eucalyptus is one. Although the final Usecase programs use Zeel/I, Hybridfox was very useful for directly interacting with the various cloud regions used in the trial without having to resort to a command line. See http://code.google.com/p/hybridfox/.
GENERAL USE
Very simple. Some quirks but mostly addressed by the documentation.
ID MANAGEMENT
Build a list of Region endpoints and your accounts in them and you can easily move between systems. Administrative tasks are available if the extension detects that your account has been granted such rights.
Keypairs can be easily generated on a cloud and exported to a file for logging into instances.
VS ZEEL/I
• No costing estimation.
• Requires knowledge of images provided and how to deploy and interact with them.
  o Zeel/I removes the bulk of this from the user and places it in the hands of the developer.
• Easy to see and manage resources.
• Doesn't provide remote command or file upload/download mechanisms.
PERFORMANCE
IOPERF DATA
Statistics were generated against a replicated GlusterFS volume spread across two instances and two EBS volumes. The filesystem was ext4. The computer running the test was running Linux with a 1Gb Ethernet connection to the Cloud Controller.
Command used to generate the data was:
iozone -a -R -b /home/vis07rmt/glusterio.xls -g 64m -e -c
All graphs show a summary of throughput for different record sizes (amount of data written/read per IO) for a number of different overall file sizes, ranging from 128KB to 64MB. See Appendix A for more.
Figure: Random Write – throughput (KB/s) by file size (KB) and record size (KB).
HOW HARDWARE / INFRASTRUCTURE AFFECTS THIS
As noted above, the platform was often unreliable during heavy EBS IO, though this issue only arose during testing of large GlusterFS filesystems that used more than 6 EBS volumes.
EBS - FROM REQUEST TO READY:
USER PROVIDES:
LOCATION
• (Remove specifics from user as much as possible)
• local/external vs. Cloud name such as “Reading FleSSR Cloud”.
DISK SPEC
• Size
• Duration of ownership (Costing NYI)
ACCESS METHODS
• SCP
• WebDAV
• FTP
• GlusterFS
SHOW COST OF CURRENT CHOICE TO USER
• Requires functioning policies and costs associated with duration and type – not available during development so everything “free”
LAUNCH A NEW STORAGE VM INSTANCE
• Check user has a 'default' keypair
  o Generate if not
• Check security group allows access to Zeel/I
  o Generate if not
• Restrict location based on user choice
• Restrict Box based on location AMI / manifest
  o (Probably the same ID in all clouds)
• Reserve and Provision VM
  o Wait for / verify running state
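For illustration, the equivalent manual steps with euca2ools would look roughly as follows; in the Usecase these actions are driven through Zeel/I, and the image ID and instance type here are placeholders.

# Ensure a 'default' keypair exists, then launch the storage VM image.
euca-add-keypair default > default.priv && chmod 600 default.priv
euca-run-instances -k default -t m1.small emi-12AB34CD
euca-describe-instances        # poll until the instance reports "running"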
SET SECURITY GROUP FOR ACCESS METHODS
• Check for / add EBS group
  o Populate group with rules
• Check for / add default keypair
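A sketch of such a group and its rules using euca2ools; the group name and the choice of ports (matching the access methods listed earlier) are assumptions.

# Create the group and open the ports used by the storage access methods.
euca-add-group -d "Storage VM access" ebs-storage
euca-authorize -P tcp -p 22 -s 0.0.0.0/0 ebs-storage      # SCP / SFTP
euca-authorize -P tcp -p 80 -s 0.0.0.0/0 ebs-storage      # WebDAV over HTTP
euca-authorize -P tcp -p 21 -s 0.0.0.0/0 ebs-storage      # FTP
euca-authorize -P tcp -p 24007 -s 0.0.0.0/0 ebs-storage   # GlusterFS daemon (bricks use further ports)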
CREATE NEW EBS DISK PER REQUEST
• Restrict based on location
• Set size
• Allocate disk
  o Wait for / verify
ATTACH EBS TO NEW VM
• Set parameters - /dev/vda
• Request attach
  o Wait for and verify
  o Check from within VM for hotplug event (via ControlAgent) – no success state reporting provided by the cloud, so it must be a manual check
FORMAT + MOUNT EBS
• Send commands via ControlAgent
  o Partition - whole disk
  o Format - ext4
  o Mount - /ebs
    § Verify at each stage via ExecResult
• Scripts baked into the custom image cover these steps; they only need to be invoked via ControlAgent (see the sketch below).
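A minimal sketch of such a baked-in script, assuming the EBS volume appears as /dev/vda as set above; the partition layout and paths are illustrative.

#!/bin/sh
# Prepare the newly attached EBS volume and mount it at /ebs.
set -e
parted -s /dev/vda mklabel msdos mkpart primary ext4 1MiB 100%   # partition - whole disk
mkfs.ext4 -q /dev/vda1                                           # format - ext4
mkdir -p /ebs
mount /dev/vda1 /ebs                                             # mount - /ebs
echo "EBS ready"   # each stage's exit status is checked via ExecResult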
CONFIGURE VM FOR REQUESTED ACCESS TYPES
• Generate user credentials unique to this instance
  o Store on local disk for later retrieval and display via UI
  o Set home dir to mount point
• SCP / SFTP
  o "Just works"
• WebDAV
  o Generate .htpasswd based on earlier credentials
  o Start httpd
• FTP
• GlusterFS
  o Modify hosts to reflect external DNS
  o Start glusterd
  o Create volume
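A sketch of the WebDAV and GlusterFS configuration steps as they might be sent through ControlAgent; the username, password variable, hostname and volume name are illustrative, and the glusterd init script name varies between packages.

# WebDAV: password file from the generated credentials, then start Apache.
htpasswd -bc /etc/apache2/webdav.htpasswd ebsuser "$GENERATED_PASSWORD"
service apache2 start

# GlusterFS: answer to the external DNS name, start the daemon and export
# the EBS mount point as a single-brick volume.
echo "192.0.2.10   storage-vm1.flessr.example" >> /etc/hosts
service glusterd start        # may be named glusterfs-server on Ubuntu
gluster volume create ebsvol storage-vm1.flessr.example:/ebs
gluster volume start ebsvol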
PROVIDE FEEDBACK TO USER - ACCESS ENDPOINT URLS / CREDENTIALS
• Show credentials as generated earlier
• Show connection methods - Hostname or URL
CUSTOMISE "STORAGE VM":
INCLUDE:
• SCP / SFTP
• WebDAV
• FTP
• rsync (for cloud migration)
• (GlusterFS)
PRE-CONFIGURE AS MUCH AS POSSIBLE
RE-‐BUNDLE IMAGE FOR UEC
DETERMINE POST-‐INSTANTIATION CHANGES
CREATE SCRIPT / COMMAND LIST TO CARRY OUT CHANGES
GLUSTER
Staff at Environmental Systems Science Centre (ESSC) at the University of Reading tested the Gluster distributed storage system (http://www.gluster.com) on the Eucalyptus infrastructure.
Many researchers at the ESSC have data sets that are hundreds of gigabytes (GB) in size, and the total amount of data stored on ESSC's storage servers currently exceeds 100 terabytes (TB). This capacity has been built up over several years, by the acquisition of computer hardware as and when research funds were available. One result of this piecemeal expansion is that large data sets frequently span two or more separate servers, and ESSC has implemented a distributed storage system called GlusterFS to manage the data effectively. In ESSC's GlusterFS storage cluster the servers are deployed as replicated pairs, and each pair is divided up into a number of storage units called bricks. A volume typically consists of two or more replicated bricks, with files evenly distributed between the bricks on several servers. Each volume is exported via NFS and also via the GlusterFS native client, which is best used for applications involving a high level of concurrent data access. The replication feature allows all the files in each volume to remain accessible in the event of the failure or scheduled maintenance of one server in each pair. Most of the data is the output of computer models running on local or remote compute servers and clusters, but there is also a significant amount of observational data and products of data analysis and processing utilities.
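To illustrate this layout, a replicated volume of the kind described might be created and mounted as follows; the hostnames, brick paths and volume name are illustrative, not ESSC's actual configuration.

# Two-way replicated volume across a server pair, spread over two bricks each.
gluster volume create researchdata replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 server1:/bricks/b2 server2:/bricks/b2
gluster volume start researchdata

# Native (FUSE) client mount, preferred for highly concurrent access:
mount -t glusterfs server1:/researchdata /data/researchdata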
In order to make effective use of FleSSR's storage facilities, ESSC would need to be able to aggregate the capacity of several Elastic Block Storage (EBS) blocks to create large enough FleSSR storage volumes, and to have the flexibility to increase their size according to demand. An increasing number of GlusterFS users are using commercial cloud storage infrastructure such as Amazon Web Services (AWS), and recent versions of the software have enhanced features specifically for this type of usage. It was therefore sensible for GlusterFS to be involved in ESSC's FleSSR storage tests. A long-term aspiration is to extend ESSC's existing storage cluster into the FleSSR cloud, for example to cater for sudden or short-term increases in demand for storage. To test the effectiveness of FleSSR for storing ESSC-type research data, GlusterFS volumes were set up on the Reading and Eduserv FleSSR clouds. The main concerns at the outset were performance and reliability, especially given the physical distance between ESSC and Eduserv's data centre in Swindon.
GlusterFS was not part of the original FleSSR project plan, and the software developed during the project did not include features for automating the GlusterFS deployment process. Nevertheless, deploying GlusterFS was relatively straightforward, assisted by the availability of Debian class installation packages. The GlusterFS instances were deployed using the HybridFox browser plugin for Eucalyptus, each instance being attached to a single 100GB EBS block. This was the largest EBS block size available on the Reading and Eduserv clouds during the project, but there is no reason why this block size could not be increased in the future, using servers with cheaper SATA drives for example, to make the management of volumes several TB in size a practical and affordable proposition.
Two 500GB volumes were created, one at Reading and one at Eduserv, each consisting of five EBS blocks. GlusterFS replication was not tested, because all the EBS blocks in each volume were physically located on the same server. Although it was beyond the scope of the FleSSR project, there are several potential uses of replication between two or more clouds or cloud providers, a technique that is sometimes known as geo-replication. For example, geo-replication could be used to safeguard against data loss or temporary unavailability involving one cloud, or to allow two or more groups of users a large distance apart to share the same data.
The early tests of the Reading and Eduserv storage volumes were positive. The Eucalyptus cloud security features were flexible enough to allow both volumes to be mounted in ESSC's hierarchical network file system, enabling users to access the FleSSR data in the same way as data stored on ESSC's local storage cluster and conventional file servers. The Reading FleSSR volume was indistinguishable from the local storage volumes in terms of performance, with write speeds of ~30MB/sec measured on average for a range of file sizes up to several GB per file. Access to the Eduserv volume was much slower, as expected, with typical write speeds of around 5MB/sec. This level of performance would not rule out direct access to the data by computer models and data analysis utilities, but it is more likely that such volumes would be more useful as archive repositories of infrequently used data. Unfortunately, the unreliability of the prototype cloud infrastructure prevented a more thorough evaluation of the effectiveness of the FleSSR storage facilities by ESSC researchers. Despite several debugging attempts by system administrators, followed by repeated GlusterFS volume deployments (aided by shell scripts designed to automate parts of the process), the system could not withstand sustained data transfer for more than a few hours at a time, and the 500GB volumes never came close to being filled to capacity. However, the basic data transfer tests that were carried out suggested that FleSSR cloud storage could be used effectively by researchers at ESSC to supplement, or even replace, the local storage cluster.
APPENDIX A. IOZONE EBS THROUGHPUT DATA
Tests were carried out on a 1Gb Ethernet connection to the Front End against a 2-brick GlusterFS volume. The GlusterFS volume was mounted on the client machine using a FUSE mountpoint.
Command used to generate the data was:
iozone -a -R -b /home/vis07rmt/glusterio.xls -g 64m -e -c
All graphs show a summary of throughput for different record sizes (amount of data written/read per IO) for a number of different overall file sizes, ranging from 128KB to 64MB.
Figure: Writer – throughput (KB/s) by file size (KB) and record size (KB).
Figure: Re-Writer – throughput (KB/s) by file size (KB) and record size (KB).
Figure: Reader – throughput (KB/s) by file size (KB) and record size (KB).
Figure: Re-Reader – throughput (KB/s) by file size (KB) and record size (KB).
Figure: Random Read – throughput (KB/s) by file size (KB) and record size (KB).
Figure: Random Write – throughput (KB/s) by file size (KB) and record size (KB).