![Page 1: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/1.jpg)
Efficiently Sharing Common Data
HTCondor Week 2015
Zach Miller (zmiller@cs.wisc.edu)
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison
![Page 2: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/2.jpg)
› Input files are never reused
  › From one job to the next
  › Multiple slots on same machine
› Input files are transferred serially from the machine where the job was submitted
› This results in the submit machine often transferring multiple copies of the same file simultaneously (bad!), sometimes to the same machine (even worse!).
“Problems” with HTCondor
2
![Page 3: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/3.jpg)
› Enter the HTCache!
› Runs on the execute machine
› Runs under the condor_master just like any other daemon
› One daemon serves all users of that machine
› Runs with same privilege as the startd
HTCache
3
![Page 4: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/4.jpg)
› Cache is on disk
› Persists across restarts
› Configurable size
› Configurable cache replacement policy
HTCache
4
![Page 5: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/5.jpg)
› The cache is shared
› All slots use same local cache
› Even if user is different (data is data!)
› Thus, the HTCache needs the ability to write files into a job’s sandbox as the user that will run the job
HTCache
5
![Page 6: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/6.jpg)
› Instead of fetching files from the shadow, the job instructs the HTCache to put specific files into the sandbox
› If the file is in the cache, the HTCache COPIES the file into the sandbox
› Each slot gets its own copy, in case the job decides to modify it. (As opposed to hard or soft-linking the file into the sandbox)
Preparing Job Sandbox
6
![Page 7: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/7.jpg)
› If the file is not in the cache, the HTCache fetches the file directly into the sandbox and then possibly adds it to the cache
› Wait… possibly?
Preparing Job Sandbox
7
![Page 8: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/8.jpg)
› Yes, possibly.
› Obvious case: File is larger than cache
› Larger question: which files are the best to keep?
› Cache policy is one of those things where one solution rarely works best in all cases
Cache Policy
8
![Page 9: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/9.jpg)
› There are 10 problems in Computer Science:
  › Caching
  › Levels of Indirection
  › Off-by-one errors
› Allow flexible caching by adding a level of indirection. Don’t use size, time, etc., but rather the “value” of a file.
Cache Policy
9
![Page 10: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/10.jpg)
› How do we determine the value?
› Another trick: punt to the admin!
› The cache policy is implemented as a plugin, using a dynamically loaded library:
double valuationFun(long size, long age, int stickiness, long uses,
                    long bytes_seeded, long time_since_seed) {
    return (stickiness - age) * size;
}
Cache Policy
10
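To make the plugin idea concrete, here is a minimal sketch of what a different admin-written policy might look like. It reuses the parameter list from the slide, but everything else is an assumption for illustration: the name `stickyValuation`, the idea that `age` is measured in seconds, the one-hour decay constant, and the convention that a higher return value means "more worth keeping". The real plugin interface may differ.

```c
#include <assert.h>

/* Hypothetical cache policy with the same parameter list as the slide's
 * valuationFun. Assumptions (not from the talk): age is in seconds,
 * stickiness is the number of queued jobs that still need the file,
 * and a higher return value means "more worth keeping". */
double stickyValuation(long size, long age, int stickiness, long uses,
                       long bytes_seeded, long time_since_seed)
{
    assert(size >= 0 && age >= 0 && uses >= 0);

    /* Seeding statistics are ignored in this sketch. */
    (void)bytes_seeded;
    (void)time_since_seed;

    /* Each queued or past use stands for one transfer of `size` bytes
     * that a cached copy can save. */
    double bytes_saved = (double)size * ((double)stickiness + (double)uses);

    /* Discount files that have sat in the cache for a long time:
     * the value halves once the file is one hour (3600 s) old. */
    double decay = 3600.0 / (3600.0 + (double)age);

    return bytes_saved * decay;
}
```

Under this sketch, two files of equal size and age are ranked purely by how many jobs want them, so a file needed by 1000 queued jobs outranks one needed by 50.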
![Page 11: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/11.jpg)
› The plugin determines the “value” of a file using the input parameters:
  › File size
  › Time file entered cache
  › Time last accessed
  › Number of hits
  › “Stickiness” (This is a hint provided by the submit node… more on that later)
Cache Policy
11
![Page 12: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/12.jpg)
› When deciding whether or not to cache a file, the HTCache considers all files currently in the cache, plus the file under consideration
› Computes the “value” of each file
› Finds the “maximum value cache” that fits in the allocated size
› May or may NOT include the file just fetched
Cache Policy
12
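The selection step described above can be sketched as a small subset search. The talk does not say which algorithm the HTCache uses, so this is only an illustration under stated assumptions: `struct cache_entry` and `best_cache_subset` are hypothetical names, values are assumed to come from the policy plugin, and the exhaustive search is chosen for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>

/* One candidate file: its size in bytes and the value the policy gave it. */
struct cache_entry {
    long size;
    double value;
};

/* Exhaustive search for the subset of files with the highest total value
 * whose total size fits in `capacity`. Returns a bitmask in which bit i is
 * set when files[i] should be kept. Only practical for small n (n <= 20
 * here); a real implementation would likely use a smarter algorithm. */
unsigned long best_cache_subset(const struct cache_entry *files, size_t n,
                                long capacity)
{
    assert(n <= 20);

    unsigned long best_mask = 0;
    double best_value = 0.0;

    for (unsigned long mask = 0; mask < (1UL << n); mask++) {
        long total_size = 0;
        double total_value = 0.0;

        /* Add up the size and value of every file selected by this mask. */
        for (size_t i = 0; i < n; i++) {
            if (mask & (1UL << i)) {
                total_size += files[i].size;
                total_value += files[i].value;
            }
        }
        /* Keep the best subset seen so far that still fits. */
        if (total_size <= capacity && total_value > best_value) {
            best_value = total_value;
            best_mask = mask;
        }
    }
    return best_mask;
}
```

If `files[0]` is a file already in the cache and `files[1]` is the file just fetched, a result with bit 1 clear means the new file is delivered to the sandbox but not added to the cache.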
![Page 13: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/13.jpg)
› There is a submit-side component as well, although it has a slightly different role
  › Does not have a dedicated disk cache
  › Instead, serves all files requested by jobs
  › Periodically scans the queue, counts the number of jobs that use each input file, and broadcasts this “stickiness” value to all HTCache daemons
Submit Node HTCache
13
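The counting part of the queue scan can be sketched in a few lines. This is a simplified illustration, not the actual daemon code: `count_stickiness` is a hypothetical name, each job is assumed to have exactly one input file, and the broadcast to the execute-side daemons is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many queued jobs list `file` as their input.
 * job_inputs[i] is the input file name of job i; one input file per job
 * is a simplifying assumption for this sketch. */
int count_stickiness(const char *const *job_inputs, size_t n_jobs,
                     const char *file)
{
    assert(job_inputs != NULL && file != NULL);

    int count = 0;
    for (size_t i = 0; i < n_jobs; i++) {
        if (strcmp(job_inputs[i], file) == 0) {
            count++;
        }
    }
    return count;
}
```

The resulting count is what the slides call the file's “stickiness”: the more queued jobs still need a file, the higher the number.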
![Page 14: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/14.jpg)
› Suppose I have a cluster of 25 eight-core machines
› I have a 1GB input file common to all my jobs (a common scenario for, say, BLAST)
› I submit 1000 jobs
› Old way: Each time a job starts up, it transfers the 1GB file to the sandbox (1TB in total)
Example
14
![Page 15: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/15.jpg)
› New way: Each of the 25 machines gets the file once, shares it among all 8 slots, and it persists across jobs
› Naïve calculation: 25GB transfer (as opposed to 1TB).
› Of course, this ignores competition for the cache.
Example
15
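The naïve numbers in this example work out as follows, assuming decimal units (1 TB = 1000 GB) and that every machine keeps the file cached for the whole run:

```latex
\[
\text{old way: } 1000 \text{ jobs} \times 1\,\text{GB} = 1000\,\text{GB} = 1\,\text{TB}
\]
\[
\text{new way: } 25 \text{ machines} \times 1\,\text{GB} = 25\,\text{GB}
\]
\[
\frac{1000\,\text{GB}}{25\,\text{GB}} = 40
\]
```

So in the best case the submit machine sends 40 times less data, and each cached copy serves the 8 slots on its machine.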
![Page 16: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/16.jpg)
› This is where “stickiness” helps
› If I submit a separate batch of 50 jobs using a different 1GB input, the HTCache can look at the stickiness and decide not to evict the first 1GB file, since 1000 jobs are scheduled to use it as opposed to 50
› It’s possible to write a cache policy tailored to your cluster’s particular workload
Example
16
![Page 17: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/17.jpg)
› This already has huge advantages.
› Even if the cache does nothing useful and makes all the wrong choices, it can do NO WORSE than the existing method of transferring the file every time.
› A huge advantage: Multiple slots share same cache! (And this advantage grows as number of cores grows)
› Massively reduces network load on Schedd
Success!
17
![Page 18: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/18.jpg)
HTCache Results
18
![Page 19: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/19.jpg)
› Although the load is reduced, the Schedd is still the single source for all input files
However…
19
![Page 20: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/20.jpg)
› What if there was a way to get the files from somewhere else?
› Maybe even bits of the files from multiple different sources?
› Peer-to-peer?
› We already have an HTCache deployed on all the execute nodes…
However…
20
![Page 21: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/21.jpg)
BitTorrent
21
![Page 22: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/22.jpg)
› The HTCache running on the submit node acts as a SeedServer
› It always has all pieces of files that may be read. If you recall, it is not managing a cache, only serving the already existing files in place.
› When a job is submitted, input files are then automatically added to the seed server
Submit Node w/ BitTorrent
22
![Page 23: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/23.jpg)
› The HTCache uses BitTorrent to retrieve the file directly into the sandbox first.
› Optionally adds the file to its own cache
› Thus, BitTorrent is used to transfer files even if they won't end up in the cache
Execute Node w/ BitTorrent
23
![Page 24: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/24.jpg)
Putting It All Together
24
![Page 25: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/25.jpg)
Putting It All Together
25
![Page 26: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/26.jpg)
› “GradStudent-ware”
  › Was done as a class project
  › Doesn’t yet meet the exceedingly high standards for committing into our main code repository.
› BitTorrent traffic is completely independent from HTCondor. As such, it doesn’t work with the shared_port daemon
Project Status
26
![Page 27: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/27.jpg)
› Obvious statement of the year: Caching is good!
› Runner-up: Using peer-to-peer file transfer can be faster than one-to-many file transfer!
› However, the nature of scientific workloads and multi-core machines creates an environment where these are especially advantageous
Conclusion
27
![Page 28: Efficiently Sharing Common Data HTCondor Week 2015 Zach Miller (zmiller@cs.wisc.edu) Center for High Throughput Computing Department of Computer Sciences](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d825503460f94a681e4/html5/thumbnails/28.jpg)
› Thank you!
› Questions? Comments?
› Ask now, talk to me at lunch, or email me at zmiller@cs.wisc.edu
Conclusion
28