NETWORK PROGRAMMING
INTRO TO DISTRIBUTED SYSTEMS
FALL 2013
L1-Intro
Dongsu Han
Some material taken from publicly available lecture slides including Srini Seshan’s and David Anderson’s
Today’s Lecture
  Administrivia
  What is a distributed system and what does it do?
  Walking through an example
  Overview of the topics covered in this course
Instructors
  Instructor: 한동수 (Dongsu Han)
    [email protected], N1 814
    Office hours: Wednesday 1-3pm
  Teaching Assistant: 정은영
    [email protected], N1 820
Course Goals
Become familiar with the principles and practice of distributed systems
Understand the challenges and common techniques in distributed systems design
Learn how to write distributed applications that use the network
  How does Dropbox work?
  How does a Content Distribution Network work?
Course Format
  ~30 lectures
  References (no single textbook):
    Distributed Systems: Principles and Paradigms
    Distributed Systems: Concepts and Design (CDK)
    Computer Networks: A Systems Approach
  Exams: Midterm and Final
  Programming assignments
    5 to 6 assignments
    Loosely tied to lecture materials (start early)
About Programming Assignments
  Low-level systems programming (in C)
  Must be robust; error handling must be rock solid
  Handle concurrency
  Understand the system’s failure modes
  Interfaces specified by documented protocols
  1 or 2 TA-led hands-on sessions on programming/debugging
Grading
  10% late penalty per day
    Can’t be more than 3 days late
    Exceptions: documented medical/personal emergency.
  Two “late points” to use over the entire course (up to one point for each assignment)
  Regrade requests must be made in writing within a week of the original grading.
Grading
  Weight assignment
    20% for Midterm exam
    25% for Final exam
    45% for Homework/programming assignment
    10% for class participation
  You MUST demonstrate competence in both projects and tests to pass the course
Collaboration
  Working together is important
    Discuss course material
    Work on problem debugging
  Programming assignments must be your own work
    Partial credit (points)
    Will run plagiarism detection on source code
    Copy-and-pasted code will be severely penalized
    Implication: You will fail this course if you copy someone else’s code.
Why do I need this course?
“Everything” is distributed these days.
  Web, Google, Dropbox, KakaoTalk, YouTube, calendar, email, Facebook, CAIS, the cloud,…
“Everything” relies on distributed systems
  Learn how they really work.
  Learn how to design systems that scale.
Why do I need this course?
Enables new things
  Search engine
  Facebook (Social Networking Systems)
  Dropbox
Makes existing things more efficient
  Scale Facebook for the next billion users
  Scale CAIS to work well even when everyone tries to access it
  Make computer graphics rendering faster using clusters
Examples of Scale
Updates/Posts
  Twitter: The record is 25,088 tweets per second (when Castle in the Sky was broadcast in Japan)
Searches
  Google: 5,134,000,000 searches per day.
Network operations
  10 Gbps == 14,880,952 packets per second (@ 64 bytes)
  Fast key-value store: 50~70 Mops/sec
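The 14,880,952 packets-per-second figure can be reproduced with simple framing arithmetic. The sketch below assumes standard Ethernet on-the-wire overhead of 20 bytes per frame (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap), which the slide does not spell out; the function and constant names are illustrative only.

```python
# Assumed per-frame overhead on the wire for standard Ethernet
# (not stated on the slide).
PREAMBLE_BYTES = 7
START_FRAME_DELIMITER_BYTES = 1
INTERFRAME_GAP_BYTES = 12


def max_packets_per_second(link_bps, frame_bytes):
    """Upper bound on frames per second for a given link speed and frame size."""
    wire_bytes = (frame_bytes + PREAMBLE_BYTES
                  + START_FRAME_DELIMITER_BYTES + INTERFRAME_GAP_BYTES)
    # Integer division: a partially transmitted frame does not count.
    return link_bps // (wire_bytes * 8)


# 64-byte frames on a 10 Gbps link occupy 84 bytes (672 bits) each.
print(max_packets_per_second(10_000_000_000, 64))  # prints 14880952
```

Under these assumptions, each minimum-size frame costs 672 bits of link time, so the packet rate rather than the byte rate becomes the bottleneck for small packets.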
Examples of Scale
Akamai running 105,000 servers in more than 1,900 networks
  Number of networks on the Internet: 45,000 (2013/8/26)
Microsoft: 1 million servers
Google envisions 10 million servers.
In-class Activity
Form a group of 3
Questions to answer:
  Name a few distributed systems you know of.
  Pick one and draw its components as best as you can.
  Think about how many servers there are or how many requests it handles per second. (Make some assumptions)
  Describe each component in ~2 sentences.
No right or wrong answer.
10 mins. 1 or 2 teams will present.
Today’s Lecture
  Administrivia
  What is a distributed system and what does it do?
  Walking through an example
  Overview of the topics covered in this course
Remember IP...
From: 128.2.185.33
To: 66.233.169.103
<packet contents>
hosts.txt
  www.google.com 66.233.169.103
  www.cmu.edu 128.2.185.33
  www.cs.cmu.edu 128.2.56.91
  www.areyouawake.com 66.93.60.192
  ...
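A flat hosts.txt table amounts to a single lookup dictionary that every host must hold a full copy of. The sketch below parses the entries from the slide, keeping the slide's column order (name first, then address); `parse_hosts` is a hypothetical helper name, not part of any real resolver.

```python
# Entries copied from the slide, in the slide's "name address" order.
HOSTS_TXT = """\
www.google.com 66.233.169.103
www.cmu.edu 128.2.185.33
www.cs.cmu.edu 128.2.56.91
www.areyouawake.com 66.93.60.192
"""


def parse_hosts(text):
    """Build a name -> address table from whitespace-separated lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            name, address = fields
            table[name] = address
    return table


hosts = parse_hosts(HOSTS_TXT)
print(hosts["www.google.com"])
```

The lookup itself is trivial; the difficulty is keeping one central file current and distributing it to every machine, which is the problem the next slide addresses.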
Domain Name System
(Figure: the local DNS server resolves www.google.com step by step)
  Local DNS server -> . DNS server: who is www.google.com?
    . DNS server: ask the .com guy... (here’s his IP)
  Local DNS server -> .com DNS server: who is www.google.com?
    .com DNS server: ask the google.com guy... (IP)
  Local DNS server -> google.com DNS server: who is www.google.com?
    google.com DNS server: www.google.com is 66.233.169.103
  Local DNS server returns: 66.233.169.103
Decentralized - admins update own domains without coordinating with other domains
Scalable - used for hundreds of millions of domains
Robust - handles load and failures well
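The referral chain in the DNS walkthrough above can be simulated in a few lines. This is a toy, in-memory sketch rather than a real resolver: the server labels (`root`, `com-server`, `google-server`) and the `SERVERS` table are hypothetical stand-ins, and only the final address comes from the slide.

```python
# Toy model: each simulated server either refers the query to another
# server (by name suffix) or holds the final answer. Labels are hypothetical.
SERVERS = {
    "root": {"referral": {".com": "com-server"}},
    "com-server": {"referral": {"google.com": "google-server"}},
    "google-server": {"answer": {"www.google.com": "66.233.169.103"}},
}


def resolve(name, server="root"):
    """Follow referrals until some server answers; return (address, servers asked)."""
    trace = []
    while True:
        trace.append(server)
        zone = SERVERS[server]
        if name in zone.get("answer", {}):
            return zone["answer"][name], trace
        for suffix, next_server in zone.get("referral", {}).items():
            if name.endswith(suffix):
                server = next_server  # "ask the other guy"
                break
        else:
            # No answer and no matching referral on this server.
            raise KeyError(name)


address, trace = resolve("www.google.com")
print(address, trace)
```

Each server only needs to know about its own part of the namespace, which is what lets administrators update their own domains without coordinating with others.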
But there’s more...
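(Figure: the google.com DNS server chooses which address to return)
  128.2.53.5 -> google.com DNS server: who is www.google.com?
  google.com DNS server considers:
    Which Google datacenter is 128.2.53.5 closest to?
    Is it too busy?
  google.com DNS server -> 128.2.53.5: 66.233.169.99
  Search!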
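The two questions the google.com DNS server asks (which datacenter is closest, and is it too busy) suggest a simple selection rule. The sketch below is purely illustrative: the distance and load numbers, the 0.8 load threshold, and the two 192.0.2.x addresses (from a range reserved for documentation) are made-up assumptions, and real systems use far richer signals.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    address: str
    distance_ms: float  # hypothetical network distance to the client
    load: float         # hypothetical utilization between 0.0 and 1.0


def pick_datacenter(datacenters, max_load=0.8):
    """Prefer the closest datacenter that is not too busy."""
    candidates = [dc for dc in datacenters if dc.load <= max_load]
    if not candidates:
        # Everything is busy: fall back to the closest one anyway.
        candidates = list(datacenters)
    return min(candidates, key=lambda dc: dc.distance_ms)


datacenters = [
    Datacenter("66.233.169.99", distance_ms=12.0, load=0.40),
    Datacenter("192.0.2.10", distance_ms=5.0, load=0.95),   # closest but overloaded
    Datacenter("192.0.2.20", distance_ms=80.0, load=0.10),  # idle but far away
]
print(pick_datacenter(datacenters).address)
```

With these made-up numbers, the nearest datacenter is skipped because it is overloaded, and the next closest one is returned.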
How do you index the web?
  1. Get a copy of the web.
  2. Build an index.
  3. Profit (insert advertisements)
There are over 60 trillion individual web pages
Hundreds of millions of websites
How do you index the web?
  Crawling -- download those web pages
  Indexing -- harness 10s of thousands of machines to do it
  Profiting -- we leave that to you.
“Data-Intensive Computing”
Storing: Google File System
  doc1,2,3,..n
  Data is split into chunks: i1 i2 i3 i4 ...
  Replicate: Handle load
  (Figure: three copies of the chunks i1 i2 i3 i4 ...)
  GFS distributed filesystem
    Replicated
    Consistent
    Fast
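Splitting data into chunks and replicating each chunk can be sketched as follows. This is not how GFS actually places data: the round-robin placement, the replica count of 3, and the server names are assumptions chosen only to make the idea concrete.

```python
def split_into_chunks(data, chunk_size):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks, servers, replicas=3):
    """Assign each chunk to `replicas` distinct servers, round-robin."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for chunk_id in range(num_chunks):
        placement[chunk_id] = [
            servers[(chunk_id + r) % len(servers)] for r in range(replicas)
        ]
    return placement


chunks = split_into_chunks(b"abcdefghij", 4)
placement = place_replicas(len(chunks), ["s0", "s1", "s2", "s3"])
print(chunks)
print(placement)
```

Because every chunk lives on several servers, reads can be spread across the copies to handle load, and losing one server does not lose the data.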
Indexing
doc1
  hello hadoop
  goodbye hadoop
  hello you
  hello me
doc2
  hello world
  goodbye world
Inverted index
  goodbye  doc1=>1 doc2=>1
  hadoop   doc1=>2
  hello    doc1=>3 doc2=>1
  me       doc1=>1
  world    doc2=>2
  you      doc1=>1
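The inverted index for the two example documents can be rebuilt with a short single-machine sketch. The function name `build_inverted_index` is illustrative; the document contents are taken from the slide.

```python
from collections import Counter, defaultdict

# The two example documents from the slide.
DOCS = {
    "doc1": "hello hadoop goodbye hadoop hello you hello me",
    "doc2": "hello world goodbye world",
}


def build_inverted_index(docs):
    """Map each word to {document id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(text.split()).items():
            index[word][doc_id] = count
    return dict(index)


index = build_inverted_index(DOCS)
for word in sorted(index):
    print(word, index[word])
```

A query for a word then becomes a single dictionary lookup instead of a scan over every document.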
MapReduce / Hadoop
(Figure: Storage (Doc 1~n) -> Data Chunks -> Computers: Data Transformation -> Sort by key (e.g., hello, you) -> Data Aggregation -> Storage)
MapReduce / Hadoop
(Figure: Storage -> Data Chunks -> Computers: Data Transformation -> Sort -> Data Aggregation -> Storage)
Why? Hiding details of programming 10,000 machines!
Programmer writes two simple functions:
  map (data item) -> list(tmp values)
  reduce ( list(tmp values)) -> list(out values)
MapReduce system balances load, handles failures, starts job, collects results, etc.
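The map, sort, and reduce stages can be imitated on a single machine to show what the programmer writes versus what the framework does. This is a toy sketch, not Hadoop's or Google's API: `run_mapreduce` is a hypothetical stand-in for the framework, and it does none of the load balancing or failure handling a real system provides.

```python
from itertools import groupby


def map_fn(document):
    """map(data item) -> list of (key, tmp value) pairs."""
    return [(word, 1) for word in document.split()]


def reduce_fn(word, counts):
    """reduce(key, list of tmp values) -> list of output values."""
    return [(word, sum(counts))]


def run_mapreduce(documents, map_fn, reduce_fn):
    """Single-process stand-in for the framework: transform, sort by key, aggregate."""
    intermediate = []
    for document in documents:
        intermediate.extend(map_fn(document))
    intermediate.sort(key=lambda pair: pair[0])
    output = []
    for key, group in groupby(intermediate, key=lambda pair: pair[0]):
        output.extend(reduce_fn(key, [value for _, value in group]))
    return output


documents = [
    "hello hadoop goodbye hadoop hello you hello me",
    "hello world goodbye world",
]
result = run_mapreduce(documents, map_fn, reduce_fn)
print(result)
```

Only `map_fn` and `reduce_fn` are application code here; in a real deployment everything inside `run_mapreduce` would be spread across many machines by the system.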
All that...
  Hundreds of DNS servers
  Protocols on protocols on protocols
  Distributed network of Internet routers to get packets around the globe
  Hundreds of thousands of servers
  ... to find Ryu Hyun-jin in under 0.2 seconds
Today’s Lecture
  Administrivia
  What is a distributed system and what does it do?
  Walking through an example
  Overview of the topics covered in this course
What is a Distributed System?
A distributed system is: “A collection of independent computers that appears to its users as a single coherent system”
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." – Leslie Lamport
Distributed Systems
The middleware layer extends over multiple machines, and offers each application the same interface.
What does it do?
Hide complexity from programmers/users
Hide the fact that its processes and resources are physically distributed across multiple machines.
Transparency in a Distributed System
How?
The middleware layer extends over multiple machines, and offers each application the same interface.
What we will learn
Learn principles in distributed systems design.
Distributed systems differ from traditional software because components are dispersed.
Many assumptions break for this reason.
We will study important aspects that we must consider in dealing with distributed systems.
Challenges (1/2)
Heterogeneity
  Networks, computer hardware, OS, programming languages, different implementations
Openness
  Different implementations or extensions can be added (e.g., Firefox, Internet Explorer).
  Key interfaces are published.
Security
  Confidentiality, integrity, and availability
  E-doctor (you don’t want someone else to see your record)
Challenges (2/2)
Scalability
  Does the system remain effective with a significant increase in # of users?
Failure handling
  Detection, masking, tolerating failure, recovery
Concurrency
  Need synchronization when accessing shared resources.
Transparency
Quality of service
  Video quality may suffer when the network is overloaded.
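The concurrency challenge listed above (synchronization when accessing shared resources) can be illustrated on one machine with threads and a lock. This is a minimal sketch using Python's standard `threading` module; the `SharedCounter` class and `run` helper are hypothetical names, and a distributed system would need coordination across machines rather than an in-process lock.

```python
import threading


class SharedCounter:
    """A shared resource whose updates are guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write must not interleave with other threads.
        with self._lock:
            self.value += 1


def run(num_threads=8, increments=10_000):
    """Have several threads increment one counter; return the final value."""
    counter = SharedCounter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


print(run())
```

With the lock in place, the final value always equals the number of threads times the number of increments; without it, concurrent updates may be lost.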