Welcome to CIS 455 / 555 – Internet and Web Systems Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems January 13, 2010



Welcome to CIS 455 / 555 – Internet and Web Systems

Zachary G. Ives
University of Pennsylvania

CIS 455 / 555 – Internet and Web Systems

January 13, 2010


What this Course Is About

How do we build services like Google, Akamai, iTunes, Facebook, eBay, …? What are the principles behind them?
  (This is NOT a course on building Web sites!)
  How do “cloud computing,” P2P, and Web services relate?

The main themes of the course:
  Distributed systems concepts, with emphasis on data, scalability, and interoperability (including “the cloud”)
  Data representation fundamentals, with emphasis on XML
  Information retrieval concepts, including ranking and indexing

It’s a course that involves building software using the principles learned, evaluating it, and programming in teams


How Does this Relate to Other CIS Courses?

CIS 330/550:
  Data representation and management
  Relational querying with SQL; XML querying with XQuery
  DBMS-backed web sites
  455/555 focuses on data with respect to interoperability

CIS 350/573: software engineering and mashups

CIS 505: focuses on distributed systems and algorithms
  CIS 505 is less project-oriented than CIS 555
  CIS 555 covers Web services, cloud architectures in more detail


Some Things We’ll Look at

What are the principles behind building systems that work on the Internet?

How do these relate to many of today’s hot technologies?
  Web servers, DHTML, Servlets, JSP, …
  XML Web services
  Peer-to-peer
  Application servers
  Cloud computing environments
  Content distribution networks
  Web search
  Mash-ups
  The cloud
  …


Staff

Instructor: Zack Ives, zives@cis
  Office: 576 Levine North
  Office hours Th 3:30-4:30 (and by arrangement)

TA: Katie Gibson, gibsonk@seas
  Office hours TBA

Discussion group: [email protected] http://groups.google.com/cis-455-555-spring10


Textbooks

Distributed Systems: Principles and Paradigms, 2nd ed., Tanenbaum and van Steen
  We’ll read from the book ~50% of the time

Frequent supplementary handouts
  Excerpts from several books
  Many recent research papers

Your first one, which you should read by Wed: http://research.microsoft.com/en-us/um/people/blampson/33-Hints/Acrobat.pdf

(linked off the CIS 555 “Schedule & Slides” page)

Send me mail if it’s difficult for you to find a way of printing the paper yourself


Prerequisites, Workload, etc.

Necessary skills:
  Ability to code in Java: there is a substantial implementation project
  Good debugging skills – this will be the biggest time sink!
  The ability to work as a team with classmates (towards the end)
  A willingness to learn how to read API documentation
  Some exposure to threads and concurrent programming
  A willingness to “push the envelope”

Workload:
  Several programming/debugging-based homework assignments
  A substantial term project with experimental evaluation and a report
  Two midterms

Payoff:
  Lots of practical development and debugging experience
  A good working knowledge of the fundamentals behind scalable systems
  A working “academic clone of Google,” hosted on Amazon EC2!

WARNING: this course should be considered 1.5 CU!


A Disclaimer…

This remains a “bleeding edge” course!
  Goal 0: an understanding of scalable distributed data-centric systems
  Goal 1: a look under the covers of today’s hottest topics – in lectures and in projects
  Goal 2: a level of comfort in managing large, complex software development with others’ code

Part of this means doing a substantial implementation project
  As in the real world: learning APIs, dealing with inadequate tools
  Most of you will find this a struggle! You’ll spend many hours debugging!

We will be using some immature technology
  Not everything has been tested and validated ahead of time
    e.g., this will be the first year we are using Amazon Elastic Compute Cloud
  We’ll do the best we can to smooth over the bugs

We hope it will be a fun course, though… And an interesting one!


A Bit of Context for the Course


What Exactly Is the Web?

The Web consists of HTTP servers that publish HTML, XML, and a few other content types
  These are hyperlinked via URLs (a subset of URIs)
  Plus there are a huge number of web clients

The Web is built on a number of Internet protocols: DNS, TCP, IP

Other Internet services use other protocols
  SMTP, IMAP, POP, AIM, FTP, …
  Streaming media, music swapping protocols, …

Web services, custom applications may actually also use HTTP in ways it wasn’t designed for
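
To make the "URLs are a subset of URIs" point concrete, here is a minimal Java sketch using the standard java.net.URI class. The class name UrlPartsSketch, its helper methods, and the example addresses are illustrative assumptions, not part of the course material.

```java
import java.net.URI;

class UrlPartsSketch {
    // Splits a URL into "scheme|host|port|path"; the port is -1 when the URL does not state one.
    static String describe(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + "|" + uri.getHost() + "|" + uri.getPort() + "|" + uri.getPath();
    }

    // A URL says where a resource lives, so it carries a host; other URIs (e.g. URNs) do not.
    static boolean hasLocation(String uri) {
        return URI.create(uri).getHost() != null;
    }

    public static void main(String[] args) {
        System.out.println(describe("http://www.example.com:8080/index.html"));
        System.out.println(hasLocation("urn:isbn:0451450523"));
    }
}
```

The host part is what DNS resolves, and the port identifies the TCP endpoint the HTTP client connects to, which is one way to see how the Web sits on top of the protocols listed above.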


The Internet is Built in Layers

[Figure: layered diagram of the Internet stack. Layers as listed: Link, IP, Transport, Session, Middleware, Your Application. Labels in the diagram: WiFi, ZigBee, Ethernet, WiMax; IPv4, IPv6, unicast, (multicast); TCP (session-based); UDP (sessionless); lightweight streaming, etc.; SSH, FTP, HTTP, IM, P2P, …; Web Services, distributed transactions, …]


What Is an Internet System?

Not just a web server or web application…
An application built over the Internet, whose functionality is distributed across more than one machine
  Typically, at least in a client-server or server-to-server fashion, but may have many more participants
  Typically, data and/or code must be exchanged in distributed fashion for the functioning of the application
  Often, the data must be partitioned, replicated, translated, etc. (“shards” in Google-speak)
  Often, the code is written in multiple different environments, languages, etc.
  Often, there are concerns about handling failures, firewalls, attacks, …


Why Are Internet System Topics Interesting?

Understanding what’s underneath today’s Web
  How does it work?
  What are its shortcomings?
  What are its strengths?

Understanding distributed algorithms
Using the right approach when designing new protocols and web systems
Being able to anticipate what’s actually possible in the future


Example: Web Search, a Cloud Service

[Figure: web search architecture. Components: clients, Search Interface Servers, Index Servers, Crawlers, Web Pages. Edge labels: queries; HTML forms; results; query results; pages; keywords + locations. Note on the diagram: uses a model of document/word similarity to rank matches.]
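
The "keywords + locations" data that feeds the index servers can be pictured as an inverted index. The Java sketch below is a toy version under stated assumptions: the class name InvertedIndexSketch is hypothetical, the "location" is reduced to a document id, and the ranking is a plain occurrence count rather than whatever similarity model a real search engine (or the course project) would use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InvertedIndexSketch {
    // keyword -> (document id -> number of occurrences of the keyword in that document)
    private final Map<String, Map<String, Integer>> index = new HashMap<>();

    // What a crawler would hand to the index: a page's id and its text.
    void addDocument(String docId, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            index.computeIfAbsent(word, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Returns matching document ids, highest toy score first (ties broken by id).
    List<String> search(String query) {
        Map<String, Integer> scores = new HashMap<>();
        for (String word : query.toLowerCase().split("\\W+")) {
            Map<String, Integer> postings = index.getOrDefault(word, Collections.emptyMap());
            for (Map.Entry<String, Integer> e : postings.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> result = new ArrayList<>(scores.keySet());
        result.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return result;
    }
}
```

For example, after indexing "d1" with the text "web search web" and "d2" with "web systems", a search for "web" would list d1 before d2 because d1 contains the word twice.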


Example: Social Networking (Facebook / Twitter), a Cloud Service

[Figure: social networking architecture. Components: clients, User Page Servers, Recommender, Users & entities. Edge labels: clicks; pages & notifications; suggestions; common properties, usage logs, …; updates, posts.]


Example: Information Integration

[Figure: information integration architecture. Components: clients, Mediator System, XML sources, relational sources, HTML sources. Edge labels: queries; results in “mediated schema”; XQuery + XPath over XML; XML; SQL; ODBC results; HTTP POST; HTML. Note on the diagram: maps all data into a single format and virtual schema.]


Example: SETI@home

[Figure: SETI@home architecture. Components: Problem Partitioning, Data Aggregation, clients. Edge labels: new sub-problems; computed subresults. Note on the diagram: breaks computation into many parts and distributes them to the clients.]
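
The partition / compute / aggregate pattern can be sketched in a few lines of Java. This is only an illustration: the class name WorkUnitSketch is hypothetical, the "computation" is a placeholder sum of squares, and the clients are simulated by a plain loop instead of remote machines.

```java
import java.util.ArrayList;
import java.util.List;

class WorkUnitSketch {
    // Problem partitioning: split the range [1, n] into at most `parts` work units {lo, hi}.
    static List<int[]> partition(int n, int parts) {
        List<int[]> units = new ArrayList<>();
        int chunk = (n + parts - 1) / parts;   // ceiling division
        for (int lo = 1; lo <= n; lo += chunk) {
            units.add(new int[] {lo, Math.min(n, lo + chunk - 1)});
        }
        return units;
    }

    // What one client would compute for its work unit (placeholder workload).
    static long compute(int[] unit) {
        long sum = 0;
        for (int i = unit[0]; i <= unit[1]; i++) sum += (long) i * i;
        return sum;
    }

    // Data aggregation: combine the subresults returned by the clients.
    static long aggregate(List<Long> subresults) {
        long total = 0;
        for (long s : subresults) total += s;
        return total;
    }

    public static void main(String[] args) {
        List<Long> subresults = new ArrayList<>();
        for (int[] unit : partition(10, 3)) subresults.add(compute(unit));
        System.out.println(aggregate(subresults)); // prints 385
    }
}
```

The pattern works well here because the work units are independent: no client needs another client's subresult, so the only coordination is handing out units and collecting answers.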


Example: P2P File Sharing

[Figure: P2P file sharing. Four clients exchange request and data messages. Note on the diagram: processes name-based requests for data; each node can make requests, forward requests, return data.]
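
A rough Java sketch of a node that can make requests, forward requests, and return data is shown below. It assumes a simple flooding-style lookup inside a single process; the class PeerSketch, its fields, and the file name used in the example are hypothetical, and real P2P systems add networking, timeouts, and smarter routing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PeerSketch {
    final String id;
    final Map<String, String> localData = new HashMap<>();   // name -> data held by this node
    final List<PeerSketch> neighbors = new ArrayList<>();

    PeerSketch(String id) { this.id = id; }

    static void connect(PeerSketch a, PeerSketch b) {
        a.neighbors.add(b);
        b.neighbors.add(a);
    }

    // Make a name-based request starting at this node; returns null if no node has the data.
    String request(String name) {
        return request(name, new HashSet<>());
    }

    private String request(String name, Set<PeerSketch> visited) {
        if (!visited.add(this)) return null;           // already asked this node: avoid loops
        String data = localData.get(name);
        if (data != null) return data;                 // return data
        for (PeerSketch neighbor : neighbors) {        // otherwise forward the request
            String found = neighbor.request(name, visited);
            if (found != null) return found;
        }
        return null;
    }
}
```

With three nodes connected in a chain a, b, c, where only c holds "song.mp3", a call to a.request("song.mp3") is forwarded through b and the data comes back along the same path.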


What are the Hard Problems?

Disclaimer: most of the hard problems AREN’T solved (or solvable) – and there often isn’t any single BEST solution

Much of systems design is about finding the right compromise for each specific problem

We can divide them into:
  Scalability
  Availability / reliability
  Consistency
  Interoperability
  Location and resource discovery


Scalability

How do we support a large number of clients or requests?

Distribute work!

Challenges:
  Coordination – takes significant overhead in the general case
  Load balancing – avoid having bottlenecks

Parts of the solution:
  Client-server, multi-tier, P2P architectures
  Restricted programming models, e.g., MapReduce
  Data partitioning, replication, remote procedure calls, …
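
As one small illustration of data partitioning, a key can be mapped to a server by hashing it. The sketch below assumes Java's built-in String.hashCode() as the hash function and a fixed number of servers; the class and method names are hypothetical.

```java
class HashPartitionSketch {
    // Maps a key to a server index in [0, numServers).
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int partitionFor(String key, int numServers) {
        return Math.floorMod(key.hashCode(), numServers);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alice", "bob", "carol"}) {
            System.out.println(key + " -> server " + partitionFor(key, 4));
        }
    }
}
```

Every node can compute the mapping locally, so no coordination is needed to find a key's home. The cost is that changing numServers remaps most keys, and an uneven key distribution can still leave one server as a bottleneck.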


Availability/Reliability

How do we ensure the system is “up” when we want it to be, and doing the “right” thing?

Replication and redundancy
Security measures against attacks
Ability to undo/redo

Challenges:
  Keeping things consistent
  Performance vs. security
  Acknowledgments

Parts of the solution:
  Data partitioning, replication, …
  Logging, transactions, …
  Redundant hardware, multiple sites, …
  Quorum and consensus algorithms
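
To give a feel for quorum schemes, the Java sketch below simulates N replicas in memory. It relies on the standard observation that if a write reaches W replicas and a read consults R replicas with R + W > N, the two sets must share at least one replica. The class QuorumSketch and the choice of "first W" and "last R" replicas (the worst case for overlap) are assumptions made for illustration.

```java
class QuorumSketch {
    private final int[] versions;    // version number stored at each replica
    private final String[] values;   // value stored at each replica

    QuorumSketch(int n) {
        versions = new int[n];
        values = new String[n];
    }

    // True when any read set of size r must intersect any write set of size w.
    static boolean overlaps(int n, int r, int w) {
        return r + w > n;
    }

    // Writes the value to the first w replicas only (the rest stay stale).
    void write(String value, int version, int w) {
        for (int i = 0; i < w; i++) {
            versions[i] = version;
            values[i] = value;
        }
    }

    // Reads the last r replicas and returns the value carrying the highest version.
    String read(int r) {
        int best = -1;
        String result = null;
        for (int i = versions.length - r; i < versions.length; i++) {
            if (versions[i] > best) {
                best = versions[i];
                result = values[i];
            }
        }
        return result;
    }
}
```

With N = 3 and W = 2, a read with R = 2 sees the new value, while a read with R = 1 can land on the one stale replica and miss it.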


Consistency / Consensus

Replication, distribution, and failures make it difficult to keep a unified, consistent view of the world – how do we combat this?

Locking, concurrency control, and invalidation schemes
Clock synchronization

Challenges:
  Locking has huge performance overhead
  Network partitions, disconnected operation

Parts of the solution:
  Optimistic concurrency control, 2-phase locking
  Distributed clock sync
  Conflict resolvers
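
Optimistic concurrency control can be sketched with a version number: a client reads without holding a lock, and its update is accepted only if nobody else committed in the meantime. The class VersionedCell below is a hypothetical single-process illustration, not a distributed implementation.

```java
class VersionedCell {
    private int version = 0;
    private String value = "";

    synchronized int version() { return version; }
    synchronized String value() { return value; }

    // Commits newValue only if no other commit happened since readVersion was observed.
    // On failure the caller is expected to re-read and retry (or resolve the conflict).
    synchronized boolean commit(int readVersion, String newValue) {
        if (readVersion != version) return false;
        value = newValue;
        version++;
        return true;
    }
}
```

If two clients both read version 0, the first commit succeeds and bumps the version to 1; the second commit is rejected because it was based on stale data. The short synchronized sections only protect the cell itself, so no lock is held while a client is thinking about its update.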


Interoperability

How do we coordinate the efforts of components that have different data formats and/or source languages, and are on different machines?

Standardization!

Challenges:
  Everything has a different semantics!

Parts of the solution:
  Standard data formats: XML, XML schemas
  “Schema mediation” and data translation
  Remote procedure calls: CORBA, XML-RPC, …
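
As a small example of why a standard data format helps, any component with an XML parser can read a message regardless of what language produced it. The Java sketch below uses the JDK's built-in DOM parser; the class name XmlExchangeSketch and the <course>/<title> element names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlExchangeSketch {
    // Extracts the text of the first <title> element from an XML message.
    static String titleOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("title").item(0).getTextContent();
        } catch (Exception e) {
            // Covers malformed XML as well as a missing <title> element.
            throw new IllegalArgumentException("could not read <title> from message", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleOf("<course><title>Internet and Web Systems</title></course>"));
    }
}
```

Parsing solves only the syntax half of the problem: both sides still have to agree on what "title" means, which is where schemas and schema mediation come in.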


Location & Resource Discovery

How do you find what you’re looking for?

Naming
Declarative queries over standard schemas
Advertisements

Challenges:
  Naming has implicit semantics
  What do you do when you don’t know what to call something?

Parts of the solution:
  Directory systems – DNS, LDAP, etc.
  Resource discovery and advertising protocols
  Overlay networks, sharding schemes
  Standardized schemas
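
One well-known sharding scheme used in overlay networks is consistent hashing: nodes and keys are placed on the same ring of hash values, and a key belongs to the first node at or after its position. The sketch below is a simplified illustration that assumes String.hashCode() as the hash function and omits virtual nodes; the class HashRingSketch is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class HashRingSketch {
    // Position on the ring -> node name, kept sorted by position.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) { ring.put(node.hashCode(), node); }

    void removeNode(String node) { ring.remove(node.hashCode()); }

    // Finds the node responsible for a key: the first node at or after the key's
    // position, wrapping around to the start of the ring if there is none.
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes in the ring");
        Map.Entry<Integer, String> owner = ring.ceilingEntry(key.hashCode());
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }
}
```

The appeal of this scheme is that removing a node only moves the keys that node owned; keys owned by the other nodes stay where they are, so lookups for them keep resolving to the same place.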


Our First Focus: Single Machines, aka Servers

How do you handle large numbers of concurrent users?
  Processes
  Threads
  Events
  Hybrids (e.g., thread pools)
  Staged architectures
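
Of the options listed, the thread-pool hybrid is easy to sketch with java.util.concurrent. In this illustration the class ThreadPoolServerSketch is hypothetical, requests are plain strings rather than sockets, and "handling" a request just builds a placeholder response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadPoolServerSketch {
    // Handles every request on a fixed-size pool and returns responses in request order.
    static List<String> handleAll(List<String> requests, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String request : requests) {
                Callable<String> handler = () -> "handled " + request;  // placeholder work
                pending.add(pool.submit(handler));
            }
            List<String> responses = new ArrayList<>();
            for (Future<String> future : pending) {
                responses.add(future.get());   // blocks until that request is done
            }
            return responses;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size caps the number of threads no matter how many requests arrive, which is the main difference from a thread-per-request design: extra requests wait in the pool's queue instead of each spawning a new thread.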


Next Time (Wed due to MLK Day)…

We’ll look under the covers of an HTTP server
  Key ideas in building scalable systems
  Principles of HTTP and web servers
  Management of concurrent sessions

To read by next Wednesday:
  Lampson and Saltzer paper
    http://research.microsoft.com/en-us/um/people/blampson/33-Hints/Acrobat.pdf
  Tanenbaum Ch. 3.1
  If necessary: Review Tanenbaum “Modern OS,” Ch. 2.3 or a similar OS book on interprocess communication