CS 744: Google File System
TRANSCRIPT
CS 744: GOOGLE FILE SYSTEM
Shivaram Venkataraman, Fall 2021

Hello!
ANNOUNCEMENTS
- Assignment 1 out later today
- Group submission form
- Anybody on the waitlist?
OUTLINE
1. Brief history
2. GFS
3. Discussion
4. What happened next?
HISTORY OF DISTRIBUTED FILE SYSTEMS
SUN NFS (~1980s)

Figure: several clients connect to a central file server over RPC; the server sits on top of a local FS. A client call such as read("a.txt", 0, 4096) becomes an RPC; the server reads the requested bytes from disk and returns the data.
/dev/sda1 on /, /dev/sdb1 on /backups, NFS on /home

Figure: a directory tree rooted at / containing backups (bak1, bak2, bak3), home (tyler/537/{p1, p2}, .bashrc), etc, and bin. The NFS mount is transparent to the user; NFS provided POSIX semantics.
CACHING

The client cache records the time when a data block was fetched (t1).
Before using the data block, the client does a STAT request to the server:
- gets the last-modified timestamp for this file (t2) (not the block…)
- compares it to the cache timestamp
- refetches the data block if the file changed since the fetch (t2 > t1)

Figure: client-side caching, with STAT requests to the server to validate cache entries.
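The validation protocol above can be sketched in a few lines. This is a minimal illustration, not real NFS code: the `server` object with `stat` and `read` methods is a hypothetical stand-in for the server's RPCs.

```python
import time

# Sketch of NFS-style timestamp cache validation (illustrative only).
class ClientCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # filename -> (fetch_time t1, data_block)

    def read_block(self, filename):
        if filename in self.cache:
            t1, block = self.cache[filename]
            t2 = self.server.stat(filename)  # last-modified time for the file
            if t2 <= t1:
                return block                 # cache entry still valid
        # Cache miss or stale entry: refetch and record the fetch time.
        block = self.server.read(filename)
        self.cache[filename] = (time.time(), block)
        return block
```

Note that STAT returns the modification time of the whole file, so any write to the file invalidates every cached block of it.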
ANDREW FILE SYSTEM

- Design for scale
- Whole-file caching
- Callbacks from server

When you open a file in AFS, the entire file is fetched and cached at the client.
WORKLOAD PATTERNS (1991)
"
" "
f.les
2=44
OCEANSTORE / PAST (late 1990s to early 2000s)

- Wide-area storage systems
- Fully decentralized
- Built on distributed hash tables (DHTs)
GFS: WHY?

- Lots of data → very large files
- Fault tolerance
- Workload patterns:
  - primarily appends
  - latency was not a concern; maximize bandwidth use
GFS: WHY?

- Components with failures
- Files are huge!
- Applications are different
GFS: WORKLOAD ASSUMPTIONS
“Modest” number of large files
Two kinds of reads: large streaming reads and small random reads
Writes: many large, sequential writes; few random writes
High bandwidth more important than low latency
GFS: DESIGN
- Single master for metadata
- Chunkservers for storing data
- No POSIX API!
- No caches!

Figure: to read part of a file (e.g., a.txt), the client asks the master M1 which chunkservers hold the relevant chunk, then fetches the data directly from a chunkserver. Files are split into fixed-size chunks.
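The two-step read path (metadata from the master, data from a chunkserver) can be sketched as below. This is a simplified illustration under assumed interfaces; the `lookup` and `read_chunk` names are made up, not the real GFS RPCs.

```python
# Sketch of the GFS read path: the master serves only metadata,
# the bytes come directly from a chunkserver.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed-size chunks

def gfs_read(master, chunkservers, filename, offset, length):
    """Read `length` bytes of `filename` at `offset` (single-chunk case)."""
    chunk_index = offset // CHUNK_SIZE
    # Metadata request to the master: chunk handle + replica locations.
    handle, replica_ids = master.lookup(filename, chunk_index)
    # Data request goes directly to a chunkserver, not through the master.
    server = chunkservers[replica_ids[0]]
    return server.read_chunk(handle, offset % CHUNK_SIZE, length)
```

Keeping the master off the data path is what lets a single metadata server scale to many clients.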
CHUNK SIZE TRADE-OFFS
Client → Master: smaller chunks mean more metadata requests to the master
Client → Chunkserver: too small ⇒ the client must open connections to many chunkservers; too large ⇒ hotspots at chunkservers
Metadata: smaller chunks ⇒ more metadata at the master

GFS chose 64 MB; HDFS uses ~128 MB.
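The metadata side of this trade-off is easy to see with back-of-the-envelope arithmetic. The ~64 bytes of master metadata per chunk is an assumed figure for illustration (the paper reports "less than 64 bytes" per chunk), not a measured constant.

```python
# Sketch: master metadata grows inversely with chunk size.
BYTES_PER_CHUNK_META = 64  # assumed per-chunk metadata cost

def master_metadata_bytes(total_data_bytes, chunk_size_bytes):
    chunks = -(-total_data_bytes // chunk_size_bytes)  # ceiling division
    return chunks * BYTES_PER_CHUNK_META

PB = 10 ** 15
MB = 10 ** 6
# 1 PB of data: 64 MB chunks vs 64 KB chunks
print(master_metadata_bytes(PB, 64 * MB))    # 10**9 bytes (~1 GB, fits in RAM)
print(master_metadata_bytes(PB, 64 * 1000))  # 10**12 bytes (~1 TB, does not)
```

A thousand-fold smaller chunk means a thousand-fold more master state, which is why GFS keeps chunks large.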
GFS: REPLICATION
- 3-way replication to handle faults
- Primary replica for each chunk
- Chain replication (consistency)
- Decouple data and control flow
- Data flow: pipelining, network-aware

Figure: the master gives a lease to one replica, which becomes the primary; mutations are applied to all replicas using the pushed data.
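The decoupled data/control flow above can be sketched as a toy model. This is a simplification under assumptions: real GFS pipelines the data push hop by hop and the primary batches mutations; here each step is a plain loop and the class names are invented.

```python
# Toy sketch of a GFS-style write: data flow first, control flow second.
class Replica:
    def __init__(self):
        self.staged = {}   # data_id -> bytes, pushed but not yet applied
        self.chunk = []    # applied mutations, in primary-assigned order

    def apply(self, seq, data_id):
        self.chunk.append((seq, self.staged.pop(data_id)))

def write(client_data, data_id, primary, secondaries):
    # 1. Data flow: push the bytes along the replica chain.
    for r in [primary] + secondaries:
        r.staged[data_id] = client_data
    # 2. Control flow: the primary assigns a serial number; all replicas
    #    apply mutations in that order, keeping the copies consistent.
    seq = len(primary.chunk)
    for r in [primary] + secondaries:
        r.apply(seq, data_id)
    return seq
```

Because ordering comes only from the primary, concurrent clients see the same mutation order at every replica.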
RECORD APPENDS

Write: the client specifies the offset
Record Append: GFS chooses the offset; consistency is at-least-once and atomic

Example: write(a.txt, offset, "abcd") places data at a client-chosen offset. With record append (e.g., appending a record like "Wisconsin"), the appended record will appear at least once in the chunk, and the entire record appears (atomicity).
MASTER OPERATIONS
- No “directory” inode! Simplifies locking
- Replica placement considerations: load, disk utilization, failure probability
- Implementing deletes: rename the file, then garbage collection
FAULT TOLERANCE
- Chunk replication with 3 replicas
- Master:
  - replication of log and checkpoint
  - shadow master
- Data integrity using checksum blocks
DISCUSSION
https://forms.gle/YpDcxPncdqnZ7JXG6
GFS SOCIAL NETWORK

You are building a new social networking application. The operations you will need to perform are:
(a) add a new friend id for a given user
(b) generate a histogram of the number of friends per user
How will you do this using GFS as your storage system?

Notes: append (user id, friend id) records to a log file, avoiding random writes and many small files; generate the histogram by streaming through the file.
Example: an append-only log of (user, friend) pairs such as (u1, u3), (u1, u5), (u2, u1), ... The histogram needs the records grouped by user.
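The discussion answer above can be sketched in a few lines: friend additions become appends to a single log, and the histogram is one streaming pass with grouping. The in-memory list stands in for a GFS file opened with record append; function names are made up for illustration.

```python
from collections import Counter

# Sketch of the GFS-friendly design: append-only writes, streaming reads.
def add_friend(log, user, friend):
    log.append((user, friend))      # record append: no random writes

def friends_histogram(log):
    """Map friend-count -> number of users with that many friends."""
    per_user = Counter(user for user, _ in log)   # group by user
    return Counter(per_user.values())
```

Appending to one large file matches the GFS workload assumptions (few large files, sequential appends, streaming reads) far better than one small file per user.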
GFS EVAL

List your takeaways from “Table 3: Performance metrics”

Notes: appends to a file all go to the same chunkserver; replication lowers write throughput.
WHAT HAPPENED NEXT
Keynote at PDSW-DISCS 2017: 2nd Joint International Workshop On Parallel Data Storage & Data Intensive Scalable Computing Systems
GFS EVOLUTION

Motivation:
- GFS master
  - one machine not large enough for a large FS
  - single bottleneck for metadata operations (data path is offloaded)
  - fault tolerant, but not HA
- Lack of predictable performance
  - no guarantees on latency
  - (GFS problem: one slow chunkserver → slow writes)
GFS EVOLUTION
GFS master replaced by Colossus
Metadata stored in BigTable

Recursive structure? If metadata is ~1/10000 the size of the data:
100 PB data → 10 TB metadata
10 TB metadata → 1 GB metametadata
1 GB metametadata → 100 KB meta...
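The recursion above can be checked with a few lines of arithmetic. The 1/10000 ratio and the 1 GB "fits on one machine" threshold are the slide's illustrative assumptions, not Colossus internals.

```python
# Sketch: each metadata level is ~1/10000 the size of the level below,
# so a couple of levels shrink 100 PB of data to single-machine scale.
RATIO = 10_000

def metadata_levels(data_bytes, threshold=1 << 30):
    """Successive metadata sizes until one fits under ~1 GB."""
    levels = []
    size = data_bytes
    while size > threshold:
        size //= RATIO
        levels.append(size)
    return levels

PB = 10 ** 15
print(metadata_levels(100 * PB))  # [10**13, 10**9]: 10 TB, then 1 GB
```

Two levels of indirection suffice here, which is why storing the metadata itself in a distributed table (BigTable) bottoms out quickly.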
GFS EVOLUTION
Need for Efficient Storage
Rebalance old, cold data
Distribute newly written data evenly across disks
Manage both SSDs and hard disks
HETEROGENEOUS STORAGE

F4 (Facebook), blob stores, key-value stores
NEXT STEPS
- Assignment 1 out tonight!
- Next up: MapReduce, Spark