NBA 600: Day 21 Open Source Software 13 April 2004 Daniel Huttenlocher


NBA 600: Day 21 Open Source Software

13 April 2004

Daniel Huttenlocher

2

Today’s Class

Gmail
Open source software
Reminders
– Short paper #5 handed out today, due Tuesday 27th (change)
– Final project presentations 5/4, papers 5/11
  • Email me topics by today

3

What is Open Source?

Open source software is distributed in human readable form without charge
– Subject to a license that encourages or requires similar terms for derivative works
– Machine executable form readily created from this source form by knowledgeable people
Phrase open source widely used
– One of earliest such projects was Stallman’s GNU project, termed “Free Software”
  • Free Software Foundation (FSF)
  • Philosophical objections to term open source

4

Anatomy of Open Source (OSS)

A license governing the software
– Several licenses approved by Open Source Initiative (OSI)
  • A major such license known as GPL
A group of experts responsible for changes to the software
– Level of formality varies greatly
A broader community of software developers contributing to the work
Tools for remote collaboration and for distribution of the software

5

Open Source Licensing

Protect rights to do following with software
– Free distribution and redistribution of source and executable
  • Use the software
  • Study and learn from the code
– Creation of derivative works
  • Improve the software
  • Extend the software
– Integrity of reputation
  • Protect the good name of the software
Most projects adopt existing licenses
– Some large projects have their own

6

GPL

GPL is a license that limits the right to use a piece of software
– The code is copyrighted and then the rights are governed by that license
  • As provided in copyright law, holder can control how copies are made
Permits copies and derivative works to be made and distributed freely provided
– The GPL governs all such copies and works
– The source is always available at cost of distribution
– Other terms intended to preserve free nature

7

Effect of GPL

GPL is often called a “viral” license
– Applies to all copies and derivative works
  • Can only include GPL’ed code in a product if the product is governed by GPL
– A limited form of GPL applies to code libraries
  • A library is code that performs some basic function that might be employed in many different software products
    − E.g., ability to send data over the Internet
  • LGPL applies only to the library and not products using that library
GPL and related licenses are also referred to as copyleft
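The propagation rule on this slide can be sketched as a small check. This is a deliberately simplified model of the two bullets above (GPL code requires a GPL product; an LGPL library places no requirement on the product), not a statement of what the licenses legally require. The names `Component`, `Product`, and `license_conflicts` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field

# Hypothetical license labels used only in this sketch.
GPL, LGPL, PROPRIETARY = "GPL", "LGPL", "proprietary"


@dataclass
class Component:
    name: str
    license: str


@dataclass
class Product:
    name: str
    license: str
    components: list = field(default_factory=list)


def license_conflicts(product):
    """Return the names of included components that conflict with the
    product's license under the simplified slide rule:
    - a GPL component is only allowed in a GPL product ("viral" effect)
    - an LGPL library never conflicts (the LGPL covers only the library)
    """
    return [
        c.name
        for c in product.components
        if c.license == GPL and product.license != GPL
    ]


# Usage: a closed source product that links an LGPL library and GPL code.
net_lib = Component("netlib", LGPL)   # e.g., sends data over the Internet
parser = Component("parser", GPL)
editor = Product("editor", PROPRIETARY, [net_lib, parser])
print(license_conflicts(editor))
```

Under this model only the GPL component is reported; relicensing the whole product under the GPL clears the list, which is the "viral" effect the slide describes.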

8

Where OSS Has Succeeded

There are many open source projects
– SourceForge.net, a leading site for open source development, lists nearly 80,000
– A small number of open source projects are highly successful
  • Provide widely used alternative to traditional commercial or “closed source” software
– Many more projects are used by small communities of experts
The most successful projects tend to be infrastructure software, not applications

9

Overview of Systems Software

Understanding OSS requires some knowledge of system-level architecture
– Much infrastructure software has developed into common components
– Together components provide functionality that is visible to users
  • Sometimes boundaries visible to users such as Windows operating system versus Office Tools
  • Often boundaries not visible as in operating system (OS) kernel versus device drivers
– Component boundaries not fixed
  • E.g., Web browser separate from OS or not

10

Systems Software Components

Operating System (OS) Kernel [Resource Allocation]
Device Drivers [Control External Input/Output]
System Libraries or API’s [Common Functions, E.g., Rendering Text]
Application Software [E.g., Word Processor]
Server Software [E.g., Web Server]

OS

11

Client-Server Architecture

Client machines run desktop (user) applications such as Web browser, mail
Server machines run server applications such as Web server, mail server
[Diagram: client and server machines connected by a TCP/IP network (e.g., LAN, Internet)]

12

Basic Web Server Architecture

Web browser on client machine (A)
– Sends requests to server based on user actions
– Displays results based on response from server
Web server on server machine (B)
– Sends response to client based on requests
– Response often generated from database info
[Diagram: client A (Windows OS, IE Web Browser) sends an HTTP request over the network to server B (Linux OS, Apache Web Server), which returns an HTML response]
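The request/response exchange between A and B can be sketched without a real network by passing the HTTP text between two functions. This is a minimal illustration, not how IE or Apache are implemented; the function names, the `PAGES` table, and the host name `example.org` are assumptions made for the example.

```python
# Hypothetical content the server can return, keyed by request path.
PAGES = {"/": "<html><body>Hello from server B</body></html>"}


def build_request(path, host):
    """Client side (A): format a minimal HTTP/1.1 GET request."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"


def handle_request(raw_request):
    """Server side (B): read the request line and build an HTML response."""
    request_line = raw_request.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ")
    if method != "GET":
        status = "405 Method Not Allowed"
        body = "<html><body>Method not allowed</body></html>"
    elif path in PAGES:
        status, body = "200 OK", PAGES[path]
    else:
        status, body = "404 Not Found", "<html><body>Not found</body></html>"
    return (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode())}\r\n"
        "\r\n"
        f"{body}"
    )


def parse_response(raw_response):
    """Client side (A): split the response into status code and HTML body."""
    head, _, body = raw_response.partition("\r\n\r\n")
    status_code = int(head.split("\r\n", 1)[0].split(" ")[1])
    return status_code, body


# Usage: one round trip, with the "network" replaced by a function call.
status, html = parse_response(handle_request(build_request("/", "example.org")))
print(status, html)
```

In a real deployment the two strings would travel over a TCP connection; everything else about the exchange has the same shape.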

13

“Three Tier” Server Architecture

Separates server-side processing onto two separate machines
– Application logic that constructs response to client
– Database server that stores information
Often now hear of “n tier” architecture
[Diagram: HTTP request and HTML response travel over the network to a Web/App Server, which is backed by a Database Server]
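The split between application logic and database can be sketched in one process. Here an in-memory SQLite database stands in for the separate database server, and a plain function stands in for the Web/App server; the table name, sample row, and function names are hypothetical.

```python
import sqlite3
from html import escape


def make_database():
    """Database tier: in-memory SQLite standing in for a database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY, title TEXT)")
    # Hypothetical sample data for the illustration.
    conn.execute("INSERT INTO pages VALUES (?, ?)", ("home", "Welcome"))
    conn.commit()
    return conn


def render_page(conn, slug):
    """Application tier: query the database tier and construct the HTML
    response that would be sent back to the client."""
    row = conn.execute(
        "SELECT title FROM pages WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return "<html><body>Unknown page</body></html>"
    return f"<html><body><h1>{escape(row[0])}</h1></body></html>"


# Usage: the client tier would receive this string as the HTML response.
db = make_database()
print(render_page(db, "home"))
```

Keeping `render_page` ignorant of how the data is stored is what allows the database to move to its own machine, which is the point of the extra tier.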

14

Open Source Successes

More on server than client side
– More expert users of servers
– Server operating systems
  • Linux
– Certain server software
  • Web
  • Mail
  • DNS
– So far less so with other server software
  • Database (MySQL rising, less easy to measure)
Software for technical users

15

Apache a Major Web Server

Surveys of Web server software
– Netcraft polls nearly 50M host names
– Port80 polls hosts at Fortune 1000 companies

16

Linux is #2 Server OS

Netcraft web server survey in 2001
– About 50% Windows, 30% Linux
  • Solaris Unix about 7%
  • BSD Unix about 6%
– Includes sites such as Google and CNN
  • Google is one site but runs about 100,000 servers
Surveys of OS sales put Linux around 25%
– For server class machines
– For desktops negligible
– But hard to compare Windows and Linux sales
  • Windows often pre-installed
  • One Linux distribution often copied

17

Internet Email Architecture

Mail client connects to mail server
– E.g., Outlook to Exchange, Eudora to POP
Mail servers connect to each other for delivery
– Using protocol called SMTP
[Diagram: Outlook (Windows OS) connects to Exchange/IIS (Windows OS) over a proprietary protocol; Eudora (Windows OS) connects to Sendmail (Solaris OS) over POP; the two mail servers exchange mail across the network using SMTP]
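The three hops in this architecture (client to server, server to server, server to client) can be modeled with a toy class. This sketch does not speak SMTP or POP; method calls stand in for those protocols, and the `MailServer` class, the shared `directory` dict, and the `.example` addresses are all assumptions made for the illustration.

```python
class MailServer:
    """Toy mail server: accepts mail from clients, relays it to the
    recipient's server, and holds mailboxes until a client collects them."""

    def __init__(self, domain, directory):
        self.domain = domain
        self.directory = directory  # domain -> MailServer; stands in for DNS
        self.mailboxes = {}         # user name -> list of (sender, body)
        directory[domain] = self

    def submit(self, sender, recipient, body):
        """Client-to-server hop (e.g., Outlook handing mail to Exchange)."""
        domain = recipient.partition("@")[2]
        if domain == self.domain:
            self.deliver(sender, recipient, body)
        else:
            # Server-to-server hop; real servers use SMTP here.
            # An unknown domain raises KeyError in this sketch.
            self.directory[domain].deliver(sender, recipient, body)

    def deliver(self, sender, recipient, body):
        """Store the message in the recipient's mailbox on this server."""
        user = recipient.partition("@")[0]
        self.mailboxes.setdefault(user, []).append((sender, body))

    def retrieve(self, user):
        """Server-to-client hop (e.g., Eudora fetching over POP).
        Returns the waiting messages and empties the mailbox."""
        return self.mailboxes.pop(user, [])


# Usage: alice's server relays to bob's server; bob's client then fetches.
directory = {}
exchange = MailServer("a.example", directory)
sendmail = MailServer("b.example", directory)
exchange.submit("alice@a.example", "bob@b.example", "See you in class")
print(sendmail.retrieve("bob"))
```

The sending client never talks to the recipient's server directly, which is why the server-to-server protocol can be standardized while client protocols differ.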

18

Internet DNS Architecture

Recall domain name service (DNS) lookup precedes any request over Internet
– Maps name of host to its IP address
  • E.g., www.cornell.edu to 132.236.218.15
– Client machine caches local copy
– If no valid cached data, contacts DNS server run by ISP
  • E.g., Cornell, AOL, etc.
– If that server does not have a valid copy, it contacts a root server
[Diagram: client, ISP DNS server, and root server connected by the network]
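The lookup chain described above (client cache, then ISP server, then root server) can be sketched with one small class used at each level. This follows the slide's simplified picture, in which the root server holds the answer; the `Resolver` class, its `ttl_seconds` parameter, and the 300-second default are assumptions made for the illustration.

```python
import time


class Resolver:
    """One level of the lookup chain: answer from local records or a
    still-valid cache entry, otherwise ask the next server upstream."""

    def __init__(self, records=None, upstream=None, ttl_seconds=300):
        self.records = dict(records or {})  # data this server holds itself
        self.upstream = upstream            # next Resolver to ask, or None
        self.ttl_seconds = ttl_seconds      # how long a cached answer is valid
        self.cache = {}                     # name -> (ip, expiry time)

    def lookup(self, name, now=None):
        now = time.time() if now is None else now
        if name in self.records:
            return self.records[name]
        cached = self.cache.get(name)
        if cached and cached[1] > now:      # valid cached copy
            return cached[0]
        if self.upstream is None:
            raise KeyError(name)            # nobody knows this name
        ip = self.upstream.lookup(name, now)
        self.cache[name] = (ip, now + self.ttl_seconds)
        return ip


# Usage: client -> ISP server -> root server, using the slide's example.
root = Resolver(records={"www.cornell.edu": "132.236.218.15"})
isp = Resolver(upstream=root)
client = Resolver(upstream=isp)
print(client.lookup("www.cornell.edu"))
```

After the first lookup both the client and the ISP server hold a cached copy, so repeat lookups inside the validity window never reach the root.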

19

Internet Infrastructure Largely OSS

Sendmail is the main mail delivery server
– Distributed under BSD license (OSI approved)
– Credentia 2003 survey found 39% of Internet email servers running open source sendmail
  • Commercial Microsoft Exchange next with 17%
Bind is the main Internet domain name server
– Resolves names to IP addresses
– A 2004 survey found 80% of domain name servers running open source bind
  • About 16% running commercial MSDNS

20

Rise of OSS

Server software such as sendmail and bind predates much commercial software
– Grew out of research ARPAnet, precursor to commercial Internet
Apache was best performing Web server
– Got early lead in fast rise of Web
Linux fills broad need for a cross-platform Unix OS
– Solaris, HP-UX, AIX are single vendor Unix
– Especially Unix on Intel-based servers

21

OSS Projects

OSS license is just one aspect
Reliable, widely used software requires more than just a licensing scheme
– Plenty of open source projects have failed or have had limited impact
  • Including large ones such as Mozilla
Internet had big impact on how OSS distributed teams work
– Enabling qualitatively larger more dynamic groups and more feedback from users
  • Torvalds led way with Linux kernel development

22

Cathedral and the Bazaar

Eric Raymond’s book on Open Source
– Involved with GNU project since mid 1980’s
OSS had been successful with smaller or less complex software
– Tools such as those produced by GNU
– C++ compiler while large was very well understood technology
– Although emacs text editor was large and complex Stallman had written initial version
Complex software seemed to require more
– Crafting a “Cathedral” using commercial development or individual super-star(s)

23

Linux Kernel Project

Torvalds’ development of the Linux kernel involved many people
– Made active use of the Internet
  • To coordinate contributions of many developers
  • To interact with large number of users
  • Often these were the same people
– Torvalds’ “principles”
  • Release early, release often
  • Delegate anything you can
  • Solicit input from anyone
– Resembled a cacophonous bazaar
  • Yet resulted in stable, quality, complex software

24

Observations About Open Source

Based on the Linux project and his own experiences Raymond concludes
– Every good work of (open source) software starts by scratching a developer's personal itch
– Plan to throw one away; you will, anyhow (Fred Brooks, The Mythical Man-Month)
– When you lose interest in a program, your last duty is handing it off to a competent successor
– Release early, release often, listen to users
– Treating users as co-developers is your least-hassle route to rapid code improvement and effective debugging

25

What Drives Open Source Projects

Many people wonder why developers contribute to open source projects
– Not maximizing any economic utility function
– Maximizing intangible of ego satisfaction and reputation among peers
  • Self-reinforcing: more “interesting” and visible project and better developers attract more
– Volunteer activities that work this way are not uncommon
Software development is largely an intellectual/creative pursuit

26

Applicability of Open Source

Developers need to find the problem being solved exciting and personally useful
– Tends to work better for
  • Systems rather than applications software
  • Widely used software (impact on the world)
– Less appropriate to applications, particularly esoteric non-systems ones
Suggests competition from OSS primarily issue for vendors of systems software
– Web servers already true – Apache
– OS becoming true – Linux
– Databases? – MySQL, Postgres less significant

27

Involving Many People in Software

Get as many users as possible, early on
– Good for proprietary development also, but requires mindset change from “cathedral”
Get as many developers as possible in your user base
– They will be curious about things that don’t work for them and suggest fixes or extensions
  • Requires source to be readily available
Recognize that code usually needs to be rewritten until it is easily understood
– Like other creative works, revision and sometimes outright replacement

28

Sustainability Crucial

Many successful open source projects have “key personalities”
– Stallman for GNU, Torvalds for Linux, Behlendorf for Apache
If OSS is to be viable alternative to commercial software need longevity
– Users need to know that they don’t depend on one person for continued success
– Different models
  • Apache Software Foundation
  • Commercial commitment to Linux

29

Sustainability of Linux

Large companies such as IBM have made substantial commitment to Linux
– Ported to mainframe hardware
– Selling Intel based servers
– Promising support to customers
Most commercial users of Linux need this kind of support
– Google may not need it because they have so much expertise
  • But still don’t want to be doing OS development
– Really took off after such commitments made

30

Sustainability of Apache

Apache is much smaller and simpler than GNU/Linux
Perceived as less mission critical and easier to change than OS
– Less support from large computer companies needed or available
Original authors less directly involved now
Set up foundation that controls Apache software and name
– Has some employees as well as many volunteers