nba 600: day 21 open source software - cornell...
TRANSCRIPT
2
Today’s Class
Gmail
Open source software
Reminders
– Short paper #5 handed out today, due Tuesday 27th (change)
– Final project presentations 5/4, papers 5/11
• Email me topics by today
3
What is Open Source?
Open source software is distributed in human readable form without charge
– Subject to a license that encourages or requires similar terms for derivative works
– Machine executable form readily created from this source form by knowledgeable people
Phrase open source widely used
– One of earliest such projects was Stallman’s GNU project, termed “Free Software”
• Free Software Foundation (FSF)
• Philosophical objections to term open source
4
Anatomy of Open Source (OSS)
A license governing the software
– Several licenses approved by Open Source Initiative (OSI)
• A major such license known as GPL
A group of experts responsible for changes to the software
– Level of formality varies greatly
A broader community of software developers contributing to the work
Tools for remote collaboration and for distribution of the software
5
Open Source Licensing
Protect rights to do following with software
– Free distribution and redistribution of source and executable
• Use the software
• Study and learn from the code
– Creation of derivative works
• Improve the software
• Extend the software
– Integrity of reputation
• Protect the good name of the software
Most projects adopt existing licenses
– Some large projects have their own
6
GPL
GPL is a license that limits the right to use a piece of software
– The code is copyrighted and then the rights are governed by that license
• As provided in copyright law, holder can control how copies are made
Permits copies and derivative works to be made and distributed freely provided
– The GPL governs all such copies and works
– The source is always available at cost of distribution
– Other terms intended to preserve free nature
7
Effect of GPL
GPL is often called a “viral” license
– Applies to all copies and derivative works
• Can only include GPL’ed code in a product if the product is governed by GPL
– A limited form of GPL applies to code libraries
• A library is code that performs some basic function that might be employed in many different software products
− E.g., ability to send data over the Internet
• LGPL applies only to the library and not products using that library
GPL and related licenses are also referred to as copyleft
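The propagation rule on this slide can be sketched as a small check. This is only an illustration of the slide's simplified rule, not legal guidance; the names `Component` and `product_must_be_gpl` and the sample products are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    license: str  # e.g. "GPL", "LGPL", "proprietary"


def product_must_be_gpl(components):
    """Return True if, under the slide's simplified rule, the whole
    product has to be governed by the GPL."""
    for component in components:
        if component.license == "GPL":
            # GPL'ed code is included, so the GPL applies to the product.
            return True
        # An LGPL library does not impose its terms on products using it.
    return False


# Hypothetical products, for illustration only.
editor = [Component("ui", "proprietary"), Component("parser", "GPL")]
browser = [Component("ui", "proprietary"), Component("netlib", "LGPL")]

print(product_must_be_gpl(editor))
print(product_must_be_gpl(browser))
```

The first product includes GPL'ed code and so falls under the GPL; the second only uses an LGPL library and does not.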
8
Where OSS Has Succeeded
There are many open source projects
– SourceForge.net, a leading site for open source development, lists nearly 80,000
– A small number of open source projects are highly successful
• Provide widely used alternative to traditional commercial or “closed source” software
– Many more projects are used by small communities of experts
The most successful projects tend to be infrastructure software, not applications
9
Overview of Systems Software
Understanding OSS requires some knowledge of system-level architecture
– Much infrastructure software has developed into common components
– Together components provide functionality that is visible to users
• Sometimes boundaries visible to users such as Windows operating system versus Office Tools
• Often boundaries not visible as in operating system (OS) kernel versus device drivers
– Component boundaries not fixed
• E.g., Web browser separate from OS or not
10
Systems Software Components
Operating System (OS) Kernel [Resource Allocation]
Device Drivers [Control External Input/Output]
System Libraries or APIs [Common Functions, E.g., Rendering Text]
Application Software [E.g., Word Processor]
Server Software [E.g., Web Server]
OS
11
Client-Server Architecture
Client machines run desktop (user) applications such as Web browser, mail
Server machines run server applications such as Web server, mail server
[Diagram: TCP/IP network (e.g., LAN, Internet)]
12
Basic Web Server Architecture
Web browser on client machine (A)
– Sends requests to server based on user actions
– Displays results based on response from server
Web server on server machine (B)
– Sends response to client based on requests
– Response often generated from database info
[Diagram: client A (Windows OS, IE Web browser) sends an HTTP request over the network to server B (Linux OS, Apache Web server), which returns an HTML response]
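The request/response exchange between A and B can be tried on a single machine. The following is a minimal sketch using Python's standard library rather than Apache or IE; the handler name `HelloHandler`, the helper `fetch_once`, and the path `/index.html` are assumptions made for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side (B): builds an HTML response for each HTTP GET request."""

    def do_GET(self):
        body = f"<html><body><p>You asked for {self.path}</p></body></html>"
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the output short


def fetch_once(path="/index.html"):
    """Client side (A): send one HTTP request and return (status, html)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, html = fetch_once()
print(status)
print(html)
```

Here both roles run in one process; in the slide's picture they are separate machines connected by the network.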
13
“Three Tier” Server Architecture
Separates server-side processing onto two separate machines
– Application logic that constructs response to client
– Database server that stores information
Often now hear of “n tier” architecture
[Diagram: HTTP request and HTML response travel over the network to and from the Web/app server, which is backed by a database server]
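The split between application logic and data storage can be sketched in a few lines. This is an assumption-laden illustration: both tiers run in one process, an in-memory SQLite database stands in for the database server, and the `products` table, its contents, and the function names are hypothetical.

```python
import sqlite3
from html import escape


def make_database():
    """Database tier: stores the information (here, in-memory SQLite)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (name TEXT PRIMARY KEY, description TEXT)")
    db.execute("INSERT INTO products VALUES (?, ?)", ("widget", "A sample item"))
    db.commit()
    return db


def handle_request(db, name):
    """Application tier: constructs the HTML response from database info."""
    row = db.execute(
        "SELECT description FROM products WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return "<html><body><p>Not found</p></body></html>"
    return f"<html><body><h1>{escape(name)}</h1><p>{escape(row[0])}</p></body></html>"


db = make_database()
print(handle_request(db, "widget"))
print(handle_request(db, "missing"))
```

In a real three-tier deployment the query would cross the network to a separate database machine; the division of responsibilities is the same.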
14
Open Source Successes
More on server than client side
– More expert users of servers
– Server operating systems
• Linux
– Certain server software
• Web
• Mail
• DNS
– So far less so with other server software
• Database (MySQL rising, less easy to measure)
Software for technical users
15
Apache a Major Web Server
Surveys of Web server software
– Netcraft polls nearly 50M host names
– Port80 polls hosts at Fortune 1000 companies
16
Linux is #2 Server OS
Netcraft web server survey in 2001
– About 50% Windows, 30% Linux
• Solaris Unix about 7%
• BSD Unix about 6%
– Includes sites such as Google and CNN
• Google is one site but runs about 100,000 servers
Surveys of OS sales put Linux around 25%
– For server class machines
– For desktops negligible
– But hard to compare Windows and Linux sales
• Windows often pre-installed
• One Linux distribution often copied
17
Internet Email Architecture
Mail client connects to mail server
– E.g., Outlook to Exchange, Eudora to POP
Mail servers connect to each other for delivery
– Using protocol called SMTP
[Diagram: Outlook (Windows OS) to Exchange/IIS (Windows OS) via a proprietary protocol; Eudora (Windows OS) to the Solaris OS/Sendmail machine via POP; the mail servers linked over the network via SMTP]
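The two kinds of connection on this slide (client to server, and server to server) can be modeled without any real network. The sketch below is a toy simulation, not an SMTP or POP implementation; the `MailServer` class, its methods, and the `a.example` / `b.example` domains are hypothetical.

```python
class MailServer:
    """Toy mail server: holds mailboxes and relays mail to other servers."""

    def __init__(self, domain, network):
        self.domain = domain
        self.network = network  # dict of domain -> MailServer, stands in for the Internet
        self.mailboxes = {}     # user name -> list of stored messages
        network[domain] = self

    def submit(self, sender, recipient, text):
        """Called by a mail client to hand a message to its own server."""
        user, domain = recipient.split("@", 1)
        if domain == self.domain:
            self.accept(user, sender, text)
        else:
            # Server-to-server delivery; on the real Internet this hop uses SMTP.
            self.network[domain].accept(user, sender, text)

    def accept(self, user, sender, text):
        """Store an incoming message in the recipient's mailbox."""
        self.mailboxes.setdefault(user, []).append(f"From: {sender}\n\n{text}")

    def retrieve(self, user):
        """Called by the recipient's mail client (e.g., over POP) to fetch mail."""
        return self.mailboxes.pop(user, [])


network = {}
server_a = MailServer("a.example", network)
server_b = MailServer("b.example", network)

server_a.submit("alice@a.example", "bob@b.example", "Hello Bob")
inbox = server_b.retrieve("bob")
print(inbox)
```

The sender's client only ever talks to its own server; delivery to the other domain is the servers' job, which is why server software such as sendmail matters so much for Internet mail.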
18
Internet DNS Architecture
Recall domain name service (DNS) lookup precedes any request over Internet
– Maps name of host to its IP address
• E.g., www.cornell.edu to 132.236.218.15
– Client machine caches local copy
– If no valid cached data, contacts DNS server run by ISP
• E.g., Cornell, AOL, etc.
– If that server does not have a valid copy, it contacts a root server
[Diagram: network]
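The lookup chain above (client cache, then ISP server, then root server) can be sketched as follows. This follows the slide's simplified picture, in which the root server answers directly; the `Resolver` class, the 300-second lifetime for cached copies, and the explicit `now` argument are assumptions made so the example runs without a network.

```python
import time


class Resolver:
    """A client cache or name server holding name -> (address, expiry) records."""

    def __init__(self, name, upstream=None, records=None):
        self.name = name
        self.upstream = upstream  # next resolver to ask, or None
        self.cache = {}
        for host, address in (records or {}).items():
            self.cache[host] = (address, float("inf"))  # never expires

    def lookup(self, host, ttl=300, now=None):
        """Return (address, name of the resolver that supplied the answer)."""
        now = time.time() if now is None else now
        entry = self.cache.get(host)
        if entry and entry[1] > now:
            return entry[0], self.name  # valid cached copy
        if self.upstream is None:
            raise KeyError(host)
        address, source = self.upstream.lookup(host, ttl, now)
        self.cache[host] = (address, now + ttl)  # cache for later requests
        return address, source


root = Resolver("root", records={"www.cornell.edu": "132.236.218.15"})
isp = Resolver("isp", upstream=root)
client = Resolver("client", upstream=isp)
other_client = Resolver("other client", upstream=isp)

first = client.lookup("www.cornell.edu", now=0)         # nothing cached yet
second = client.lookup("www.cornell.edu", now=10)       # client cache is valid
third = other_client.lookup("www.cornell.edu", now=20)  # ISP cache is valid
print(first, second, third)
```

The first request travels all the way to the root, the second is served from the client's own cache, and a different client of the same ISP is served from the ISP's cache.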
19
Internet Infrastructure Largely OSS
Sendmail is the main mail delivery server
– Distributed under BSD license (OSI approved)
– Credentia 2003 survey found 39% of Internet email servers running open source sendmail
• Commercial Microsoft Exchange next with 17%
Bind is the main Internet domain name server
– Resolves names to IP addresses
– A 2004 survey found 80% of domain name servers running open source bind
• About 16% running commercial MSDNS
20
Rise of OSS
Server software such as sendmail and bind predates much commercial software
– Grew out of research ARPAnet, precursor to commercial Internet
Apache was best performing Web server
– Got early lead in fast rise of Web
Linux fills broad need for a cross-platform Unix OS
– Solaris, HP-UX, AIX are single vendor Unix
– Especially Unix on Intel-based servers
21
OSS Projects
OSS license is just one aspect
Reliable, widely used software requires more than just a licensing scheme
– Plenty of open source projects have failed or have had limited impact
• Including large ones such as Mozilla
Internet had big impact on how OSS distributed teams work
– Enabling qualitatively larger, more dynamic groups and more feedback from users
• Torvalds led way with Linux kernel development
22
Cathedral and the Bazaar
Eric Raymond’s book on Open Source
– Involved with GNU project since mid 1980s
OSS had been successful with smaller or less complex software
– Tools such as those produced by GNU
– C++ compiler, while large, was very well understood technology
– Although emacs text editor was large and complex, Stallman had written initial version
Complex software seemed to require more
– Crafting a “Cathedral” using commercial development or individual super-star(s)
23
Linux Kernel Project
Torvalds’ development of the Linux kernel involved many people
– Made active use of the Internet
• To coordinate contributions of many developers
• To interact with large number of users
• Often these were the same people
– Torvalds’ “principles”
• Release early, release often
• Delegate anything you can
• Solicit input from anyone
– Resembled a cacophonous bazaar
• Yet resulted in stable, quality, complex software
24
Observations About Open Source
Based on the Linux project and his own experiences Raymond concludes
– Every good work of (open source) software starts by scratching a developer's personal itch
– Plan to throw one away; you will, anyhow (Fred Brooks, The Mythical Man-Month)
– When you lose interest in a program, your last duty is handing it off to a competent successor
– Release early, release often, listen to users
– Treating users as co-developers is your least-hassle route to rapid code improvement and effective debugging
25
What Drives Open Source Projects
Many people wonder why developers contribute to open source projects
– Not maximizing any economic utility function
– Maximizing intangible of ego satisfaction and reputation among peers
• Self-reinforcing: more “interesting” and visible project and better developers attract more
– Volunteer activities that work this way are not uncommon
Software development is largely an intellectual/creative pursuit
26
Applicability of Open Source
Developers need to find the problem being solved exciting and personally useful
– Tends to work better for
• Systems rather than applications software
• Widely used software (impact on the world)
– Less appropriate to applications, particularly esoteric non-systems ones
Suggests competition from OSS primarily issue for vendors of systems software
– Web servers already true – Apache
– OS becoming true – Linux
– Databases? – MySQL, Postgres less significant
27
Involving Many People in Software
Get as many users as possible, early on
– Good for proprietary development also, but requires mindset change from “cathedral”
Get as many developers as possible in your user base
– They will be curious about things that don’t work for them and suggest fixes or extensions
• Requires source to be readily available
Recognize that code usually needs to be rewritten until it is easily understood
– Like other creative works, revision and sometimes outright replacement
28
Sustainability Crucial
Many successful open source projects have “key personalities”
– Stallman for GNU, Torvalds for Linux, Behlendorf for Apache
If OSS is to be viable alternative to commercial software need longevity
– Users need to know that they don’t depend on one person for continued success
– Different models
• Apache Software Foundation
• Commercial commitment to Linux
29
Sustainability of Linux
Large companies such as IBM have made substantial commitment to Linux
– Ported to mainframe hardware
– Selling Intel based servers
– Promising support to customers
Most commercial users of Linux need this kind of support
– Google may not need it because they have so much expertise
• But still don’t want to be doing OS development
– Really took off after such commitments made
30
Sustainability of Apache
Apache is much smaller and simpler than GNU/Linux
Perceived as less mission critical and easier to change than OS
– Less support from large computer companies needed or available
Original authors less directly involved now
Set up foundation that controls Apache software and name
– Has some employees as well as many volunteers