P1:KAE
9780521881036pre CUNY1180/Deek 0521 88103 6 October 1, 2007 17:23
Open Source
From the Internet's infrastructure to operating systems like GNU/Linux, the open
source movement comprises some of the greatest accomplishments in computing over
the past quarter century. Its story embraces technological advances, unprecedented
global collaboration, and remarkable tools for facilitating distributed development.
The evolution of the Internet enabled an enormous expansion of open development,
allowing developers to exchange information and ideas without regard to constraints of
space, time, or national boundary. The movement has had widespread impact on
education and government, as well as historic, cultural, and commercial repercussions.
Part I discusses key open source applications, platforms, and technologies used in open
development. Part II explores social issues ranging from demographics and psychology
to legal and economic matters. Part III discusses the Free Software Foundation, open
source in the public sector (government and education), and future prospects.
Fadi P. Deek received his Ph.D. in computer and information science from the New
Jersey Institute of Technology (NJIT). He is Dean of the College of Science and
Liberal Arts and Professor of Information Systems, Information Technology, and
Mathematical Sciences at NJIT, where he began his academic career as a Teaching
Assistant in 1985. He is also a member of the Graduate Faculty – Rutgers University
Ph.D. Program in Management.
James A. M. McHugh received his Ph.D. in applied mathematics from the Courant
Institute of Mathematical Sciences, New York University. During the course of his
career, he has been a Member of Technical Staff at Bell Telephone Laboratories (Wave
Propagation Laboratory), Director of the Ph.D. program in computer science at NJIT,
Acting Chair of the Computer and Information Science Department at NJIT, and
Director of the Program in Information Technology. He is currently a tenured Full
Professor in the Computer Science Department at NJIT.
Open Source
Technology and Policy
FADI P. DEEK
New Jersey Institute of Technology
JAMES A. M. McHUGH
New Jersey Institute of Technology
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521881036

© Fadi P. Deek and James A. M. McHugh 2008

This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.

First published in print format 2007

ISBN-13 978-0-511-36775-5 eBook (NetLibrary)
ISBN-10 0-511-36775-9 eBook (NetLibrary)
ISBN-13 978-0-521-88103-6 hardback
ISBN-10 0-521-88103-X hardback
ISBN-13 978-0-521-70741-1 paperback
ISBN-10 0-521-70741-2 paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
To my children,
Matthew, Andrew, and Rebecca
Fadi P. Deek

To my parents, Anne and Peter
To my family, Alice, Pete, and Jimmy
and to my sister, Anne Marie
James A. M. McHugh
Contents
Preface page ix
Acknowledgments xi
1. Introduction 1
1.1 Why Open Source 2
1.2 Preview 11
Section One: Open Source – Internet Applications,
Platforms, and Technologies
2. Open Source Internet Application Projects 21
2.1 The WWW and the Apache Web Server 23
2.2 The Browsers 37
2.3 Fetchmail 50
2.4 The Dual License Business Model 61
2.5 The P’s in LAMP 70
2.6 BitTorrent 77
2.7 BIND 78
3. The Open Source Platform 80
3.1 Operating Systems 81
3.2 Windowing Systems and Desktops 99
3.3 GIMP 111
4. Technologies Underlying Open Source Development 119
4.1 Overview of CVS 120
4.2 CVS Commands 124
4.3 Other Version Control Systems 143
4.4 Open Source Software Development Hosting Facilities
and Directories 151
Section Two: Social, Psychological, Legal, and
Economic Aspects of Open Source
5. Demographics, Sociology, and Psychology of Open Source
Development 159
5.1 Scale of Open Source Development 160
5.2 Demographics and Statistical Profile of Participants 162
5.3 Motivation of Participants 164
5.4 Group Size and Communication 166
5.5 Social Psychology and Open Source 168
5.6 Cognitive Psychology and Open Source 181
5.7 Group Problem Solving and Productivity 190
5.8 Process Gains and Losses in Groups 197
5.9 The Collaborative Medium 206
6. Legal Issues in Open Source 222
6.1 Copyrights 223
6.2 Patents 228
6.3 Contracts and Licenses 232
6.4 Proprietary Licenses and Trade Secrets 236
6.5 OSI – The Open Source Initiative 243
6.6 The GPL and Related Issues 250
7. The Economics of Open Source 265
7.1 Standard Economic Effects 266
7.2 Open Source Business Models 272
7.3 Open Source and Commoditization 281
7.4 Economic Motivations for Participation 285
Section Three: Free Software: The Movement, the
Public Sector, and the Future
8. The GNU Project 297
8.1 The GNU Project 297
8.2 The Free Software Foundation 302
9. Open Source in the Public Sector 309
9.1 Open Source in Government and Globally 310
9.2 Open Source in Education 316
10. The Future of the Open Source Movement 325
Glossary 336
Subject Index 351
Author Index 366
Preface
The story of free and open software is a scientific adventure, packed with
extraordinary, larger-than-life characters and epic achievements. From infra-
structure for the Internet to operating systems like Linux, this movement
involves some of the great accomplishments in computing over the past quarter
century. The story encompasses technological advances, global software collaboration on an unprecedented scale, and remarkable software tools for facilitating
distributed development. It involves innovative business models, voluntary and
corporate participation, and intriguing legal questions. Its achievements have
had widespread impact in education and government, as well as historic cultural and commercial consequences. Some of its attainments occurred before
the Internet's rise, but it was the Internet's emergence that knitted together the
scientific bards of the open source community. It let them exchange their innovations and interact almost without regard to constraints of space, time, or national
boundary. Our story recounts the tales of major open community projects: Web
browsers that fueled and popularized the Internet, the long dominant Apache
Web server, the multifarious development of Unix, the near-mythical rise of
Linux, desktop environments like GNOME, fundamental systems like those
provided by the Free Software Foundation’s GNU project, infrastructure like
the X Window System, and more. We will encounter creative, driven scientists
who are often bold, colorful entrepreneurs or eloquent scientific spokesmen.
The story is not without its conflicts, both internal and external to the move-
ment. Indeed the free software movement is perceived by some as a threat to
the billions in revenue generated by proprietary firms and their products, or
conversely as a development methodology that is limited in its ability to adequately identify consumer needs. Much of this tale is available on the Internet
because of the way the community conducts its business, making it a uniquely
accessible tale. As free and open software continues to increasingly permeate
our private and professional lives, we believe this story will intrigue a wide
audience of computer science students and practitioners, IT managers, policy-
makers in government and education, and others who want to learn about the
fabled, ongoing legacy of transparent software development.
Acknowledgments
Many people helped us during the process of writing and publishing this book.
Although it is impossible to know all of them by name, we offer a word of
appreciation and gratitude to all who have contributed to this project. In particular, we thank the anonymous reviewers who read the proposal for the text
and carefully examined the manuscript during the earlier stages of the process.
They provided excellent recommendations and offered superb suggestions for
improving the accuracy and completeness of the presented material.

Heather Bergman, Computer Science Editor at Cambridge University Press,
deserves enormous praise for her professionalism and competence. Heather
responded promptly to our initial inquiry and provided excellent insight and
guidance throughout the remaining stages. Her extraordinary efforts were
instrumental in getting this book into the hands of its readers.
1
Introduction
The open source movement is a worldwide attempt to promote an open style
of software development more aligned with the accepted intellectual style of
science than the proprietary modes of invention that have been characteristic
of modern business. The idea – or vision – is to keep the scientific advances
created by software development openly available for everyone to understand
and improve upon. Perhaps even more so than in the conventional scientific
paradigm, the very process of creation in open source is highly transparent
throughout. Its products and processes can be continuously, almost instan-
taneously scrutinized over the Internet, even retrospectively. Its peer review
process is even more open than that of traditional science. But most of all, its
discoveries are not kept secret, and it leaves anyone, anywhere, at any time free to
build on its discoveries and creations.
Open source is transparent. The source code itself is viewable and available
to study and comprehend. The code can be changed and then redistributed to
share the changes and improvements. It can be executed for any purpose without
discrimination. Its process of development is largely open, with the evolution
of free and open systems typically preserved in repositories accessible via the
Internet, including archives of debates on the design and implementation of the
systems and the opinions of observers about proposed changes. Open source
differs vastly from proprietary code, where all these transparencies are generally
lacking. Proprietary code is developed largely in private, although its requirements
are developed with input from its prospective constituencies. Its source code is generally
not disclosed and is typically distributed under the shield of binary executables.
Its use is controlled by proprietary software licensing restrictions. The right to
copy the program executables is restricted, and the user is generally forbidden
from attempting to modify, and certainly from redistributing, the code or possible
improvements. In most respects, the two modalities of program development
are polar opposites, though this is not to say there are not many areas where the
commercial and open communities have cooperated.
Throughout this book, we will typically use the term open source in a
generic sense, encompassing free software as referred to by the Free Soft-
ware Foundation (FSF) and open source software as referred to by the Open
Source Initiative (OSI) organization. The alternative composite terms FLOSS
(for Free/Libre/Open Source Software) or FOSS are often used in a European
context. The two organizations, the FSF and the OSI, represent the two streams
of the free or open source movement. Free software is an intentionally evocative
term, a rallying cry as it were, used by the FSF and intended to resonate with
the values of freedom: user and developer freedom. The FSF's General Public
License (GPL) is its gold standard for free licenses. It has the distinctive characteristic of preventing software licensed under it from being redistributed in
a closed, proprietary distribution. Its motto might be considered as "share and
share alike." However, the FSF also recognizes many other software licenses as
free as long as they let the user run a program for any purpose, access its source
code, modify the code if desired, and freely redistribute the modifications. The
OSI, on the other hand, defines ten criteria for calling a license open source. Like
the FSF's conditions for free software (though not the GPL), the OSI criteria
do not require the software or modifications to be freely redistributed, allowing licenses that let changes be distributed in proprietary distributions. While
the GPL is the free license preferred by the FSF, licenses like the (new) BSD
or MIT license are more characteristic of the OSI approach, though the GPL
is also an OSI-certified license. Much of the time we will not be concerned
about the differences between the various kinds of free or open source licenses,
though these differences can be very important and have major implications for
users and developers (see, for example, Rosen, 2005). When necessary, we will make
appropriate distinctions, typically referring to whether certain free software is
GPL-licensed or is under a specific OSI-certified license. We will elaborate on
software licenses in the chapter on legal issues. For convenience we will also
refer at times to "open software" and "open development" in the same way.
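The difference between the two licensing streams can be reduced, for illustration only, to a single question: may modified versions be redistributed in a closed, proprietary distribution? The following minimal sketch is our own simplification, not an official classification; the table and function names are assumptions made for the example.

```python
# Illustrative simplification of the distinction discussed above;
# real license analysis is far more nuanced than a lookup table.

# True: modified versions may be folded into a closed, proprietary
# distribution; False: redistribution must remain open ("share and
# share alike").
ALLOWS_PROPRIETARY_REDISTRIBUTION = {
    "GPL": False,      # the FSF's copyleft gold standard
    "New BSD": True,   # permissive, characteristic of the OSI approach
    "MIT": True,       # permissive
}

def can_close_source(license_name):
    """Can derivatives of code under this license be made proprietary?"""
    return ALLOWS_PROPRIETARY_REDISTRIBUTION[license_name]

print(can_close_source("GPL"))  # False
print(can_close_source("MIT"))  # True
```

The sketch captures only the single axis discussed in the text; actual licenses differ along many other dimensions (patent grants, attribution requirements, compatibility), which the chapter on legal issues takes up.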
We will begin our exploration by considering the rationale for open source,
highlighting some of its putative or demonstrable characteristics, its advantages,
and opportunities it provides. We will then overview what we will cover in the
rest of the book.
1.1 Why Open Source
Before we embark on our detailed examination of open source, we will briefly
explore some markers for comparing open and proprietary products. A proper
comparison of their relative merits would be a massively complex, possibly
infeasible undertaking. There are many perspectives that would have to be
considered, as well as an immense range of products, operating in diverse
settings, under different constraints, and with varied missions. Unequivocal data
from unbiased sources would have to be obtained for an objective comparative
evaluation, but this is hard to come by. Even for a single pair of open and
proprietary products it is often difficult to come to clear conclusions about
relative merits, except for the case of obviously dominant systems like Web
servers (Apache). What this section modestly attempts is to set forth some of
the parameters or metrics that can help structure a comparative analysis. The
issues introduced here are elaborated on throughout the book.
Open source systems and applications often appear to offer significant benefits vis-à-vis proprietary systems. Consider some of the metrics on which they
compete. First of all, open source products are usually free of direct cost. They are
often superior in terms of portability. You can modify the code because you
can see it and it’s allowed by the licensing requirements, though there are
different licensing venues. The products may arguably be both more secure
and more reliable than systems developed in a proprietary environment. Open
products also often offer hardware advantages, with frequently leaner platform
requirements. Newer versions can be obtained for free. The development
process also exhibits potential macroeconomic advantages. These include the
innately antimonopolistic character of open source development and its the-
oretically greater efficiency because of its arguable reduction of duplicated
effort. The open source paradigm itself has obvious educational benefits for
students because of the accessibility of open code and the development pro-
cess' transparent exposure of high-quality software practice. The products and
processes lend themselves in principle to internationalization and localization,
though this is apparently not always well-achieved in practice. There are other
metrics that can be considered as well, including issues of quality of vendor
support, documentation, development efficiency, and so on. We will highlight
some of these dimensions of comparison. A useful source of information on
these issues is provided by the ongoing review in Wheeler (2005), a detailed
discussion that, albeit avowedly sympathetic to the open source movement,
makes an effort to be balanced in its analysis of the relative merits of open and
proprietary software.
1.1.1 Usefulness, Cost, and Convenience
Does the open source model create useful software products in a timely fashion
at a reasonable cost that are easy to learn to use? In terms of utility, consider
that open source has been instrumental in transforming the use of computing
in society. Most of the Internet's infrastructure and the vastly successful Linux
operating system are products of open source style development. There are
increasingly appealing open desktop environments like GNOME and KDE.
Furthermore, many of these products like the early Web servers and browsers
as well as Linux were developed quite rapidly and burst on the market. Fire-
fox is a recent example. It is of course hard to beat the direct price of open
source products since they are usually free. The zero purchase cost is especially
attractive when the software product involved has already been commoditized.
Commoditization occurs when one product is pretty much like another or at
least good enough for the needs it serves. In such cases, it does not pay to
pay more. An open source program like the Apache Web server does not even
have to be best of breed to attract considerable market share; it just has to be
cheap enough and good enough for the purpose it serves. Open source is also
not only freely available but is free to update with new versions, which are
typically available for free download on the same basis as the original. For
most users, the license restrictions on open products are not a factor, though
they may be relevant to software developers or major users who want to modify the products. Of course, to be useful, products have to be usable. Here the
situation is evolving. Historically, many open source products have been in the
category of Internet infrastructure tools or software used by system administrators. For such system applications, the canons of usability are less demanding
because the users are software experts. For ordinary users, we observe that,
at least in the past, interface usability has not been recognized as a strong
suit of open source. Open source advocate Eric Raymond observed that the
design of desktops and applications is a problem of “ergonomic design and
interface psychology, and hackers have historically been poor at it" (Raymond,
1999). Ease of installation is one aspect of open applications where usability
is being addressed, such as for the vendor-provided GNU/Linux distributions
or, at a much simpler level, installers for software like the bundled AMP package (Apache, MySQL, Perl, PHP). (We use GNU/Linux here to refer to the
combination of GNU utilities and the Linux kernel, though the briefer desig-
nation Linux is more common.) Another element in usability is user support.
There is for-charge vendor-based support for many open source products, just
as there is for proprietary products. Arguments have been made on both sides about
which is better. Major proprietary software developers may have more financial
resources to expend on “documentation, customer support and product train-
ing than do open source providers” (Hahn, 2002), but open source products
by definition can have very wide networks of volunteer support. Furthermore,
since the packages are not proprietary, the user is not locked into a particular
vendor.
1.1.2 Performance Characteristics
Does open source provide products that are fast, secure, reliable, and portable?
The overview in Wheeler (2005) modestly states that GNU/Linux is often either
superior or at least competitive in performance with Windows on the same
hardware environment. However, the same review emphasizes the sensitiv-
ity of performance to circumstances. Although proprietary developers benefit
from financial resources that enable them to produce high-quality software, the
transparent character of open source is uniquely suited to the requirements of
security and reliability.
In terms of security, open source code is widely considered to be highly
effective for mission-critical functions, precisely because its code can be publicly scrutinized for security defects. It allows users the opportunity to security-enhance their own systems, possibly with the help of an open source consultant,
rather than being locked into a system purchased from a proprietary vendor
(Cowan, 2003). In contrast, for example, Hoepman and Jacobs (2007) describe
how the exposure of the code for a proprietary voting system revealed serious
security flaws. Open accessibility is also necessary for government security
agencies that have to audit software before using it to ensure its operation is
transparent (Stoltz, 1999). Though security agencies can make special arrangements with proprietary distributors to gain access to proprietary code, this access
is automatically available for open source. Open source products also have a
uniquely broad peer review process that lends itself to detection of defects during
development, increasing reliability. Not only are the changes to software proposed by developers scrutinized by project maintainers, but also any bystander
observing the development can comment on defects, propose implementation
suggestions, and critique the work of contributors. One of the most well-known
aphorisms of the open source movement, "Given enough eyeballs, all bugs are
shallow" (Raymond, 1998), identifies an advantage that may translate into more
reliable software. In open source, "All the world's a stage," with open source
developers very public actors on that stage. The internal exposure and review
of open source occurs not just when an application is being developed and
improvements are reviewed by project developers and maintainers, but for the
entire life cycle of the product because its code is always open. These theoretical
benefits of open source appear to be verified by data. For example, a significant
empirical study described in Reasoning Inc. (2003) indicates that free MySQL
had six times fewer defects than comparable proprietary databases (Tong, 2004).
A legendary acknowledgment of Linux reliability was presented in the famous
Microsoft Halloween documents (Valloppillil, 1998), which described Linux as
having a failure rate two to five times lower than commercial Unix systems.
The open source Linux platform is the most widely ported operating sys-
tem. It is dominant on servers, workstations, and supercomputers and is widely
used in embedded systems like digital appliances. In fact, its portability is
directly related to the design decisions that enabled the distributed open style
of development under which Linux was built in the first place. Its software
organization allowed architect Linus Torvalds to manage core kernel development while other distributed programmers could work independently on so-called kernel modules (Torvalds, 1999). This structure helped keep hardware-specific code like device drivers out of the core kernel, keeping the core highly
portable (Torvalds, 1999). Another key reason why Linux is portable is that
the GNU GCC compiler itself is ported to most "major chip architectures"
(Torvalds, 1999, p. 107). Ironically, it is the open source Wine software that
lets proprietary Windows applications run portably on Linux. Of course, there
are open source clones of Windows products like MS Office that work on
Windows platforms. A secondary consideration related to portability is soft-
ware localization and the related notion of internationalization. Localization
refers to the ability to represent a system using a native language. This can
involve the language a system interface is expressed in, character sets, or even
syntactical effects like tokenization (since different human languages are broken up differently, which can impact the identification of search tokens). It
may be nontrivial for a proprietary package that is likely to have been devel-
oped by a foreign corporation to be localized, since the corporate developer
may only be interested in major language groupings. It is at least more nat-
ural for open software to be localized because the source code is exposed
and there may be local open developers interested in the adaptation. Interna-
tionalization is a different concept where products are designed in the first
place so that they can be readily adapted, making subsequent localization
easier. Internationalization should be more likely to be on the radar screen
in an open source framework because the development model itself is inter-
national and predisposed to be alert to such concerns. However, Feller and
Fitzgerald (2002), who are sympathetic to free software, critique it with respect
to internationalization and localization, contrasting what appears to be, for
example, the superior acceptability of the Microsoft IIS server versus Apache
on these metrics. They suggest the root of the problem is that these char-
acteristics are harder to “achieve if they are not factored into the original
design” (p. 113). Generally, open source seems to have an advantage in sup-
porting the customization of applications over proprietary code, because its
code is accessible and modification of the code is allowed by the software
license.
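The tokenization point above can be made concrete with a brief sketch. The sample strings and the naive splitting rule are illustrative assumptions, not examples from the book: whitespace-based splitting recovers words in English but finds no boundaries in languages written without spaces between words.

```python
def whitespace_tokenize(text):
    """Naive tokenizer: split on runs of whitespace."""
    return text.split()

# English words are separated by spaces, so naive splitting works.
english_tokens = whitespace_tokenize("free software licenses")

# Chinese is written without spaces between words, so the same rule
# yields a single undivided token ("free software" in Chinese).
chinese_tokens = whitespace_tokenize("自由软件")

print(english_tokens)       # ['free', 'software', 'licenses']
print(len(chinese_tokens))  # 1
```

A localized search feature built on such a tokenizer would silently fail for the second language, which is why tokenization rules must be factored into a design up front rather than bolted on during localization.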
1.1.3 Forward-looking Effects
Is open source innovative or imitative? The answer is a little of both. On the
one hand, open source products are often developed by imitating the functionality of existing proprietary products, "following the taillights" as the saying
goes. This is what the GNOME project does for desktop environments, just as
Apple and Microsoft took off on the graphical environments developed at Xerox
PARC in the early 1980s. However, open development has also been incredibly
innovative in developing products for the Internet environment, from infras-
tructure software like code implementing the TCP/IP protocols, the Apache
Web server, the early browsers at CERN and NCSA that led to the explosion
of commercial interest in the Internet to hugely successful peer-to-peer file
distribution software like BitTorrent. Much of the innovation in computing has
traditionally emerged from academic and governmental research organizations.
The open source model provides a singularly appropriate outlet for deploying
these innovations: in a certain sense, it keeps these works public.
In contrast, Microsoft, the preeminent proprietary developer, is claimed by
many in the open community to have a limited record of innovation. A typical
contention is illustrated in the claim by the FSF’s Moglen that “Microsoft’s
strategy as a business was to find innovative ideas elsewhere in the software
marketplace, buy them up and either suppress them or incorporate them in its
proprietary product" (Moglen, 1999). Certainly a number of Microsoft's signature products have been reimplementations of existing software (Wheeler,
2006) or acquisitions that were possibly subsequently improved on. These
include QDOS (later MS-DOS) from Seattle Computer in 1980 (Conner, 1998),
FrontPage from Vermeer in 1996 (Microsoft Press Release, 1996), PowerPoint
from Forethought in 1987 (Parker, 2001), and Cooper's Tripod in 1988, subsequently
developed at Microsoft into Visual Basic (Cooper, 1996). In a sense,
these small independent companies recognized opportunities that Microsoft
subsequently appropriated. For other examples, see McMillan (2006). On the
other hand, other analysts counter that a scenario where free software dominated development could seriously undermine innovation. Thus Zittrain (2004)
critically observes that “no one can readily monopolize derivatives to popular
free software," which is a precondition to recouping the investments needed to
improve the original works; see also Carroll (2004).
Comparisons with proprietary accomplishments aside, the track record on
balance suggests that the open source paradigm encourages invention. The availability of source code lets capable users play with the code, which is a return
to a venerable practice in the history of invention: tinkering (Wheeler, 2005).
The public nature of Internet-based open development provides computer science students everywhere with an ever-available set of world-class examples of
software practice. The communities around open source projects offer unique
environments for learning. Indeed, the opportunity to learn is one of the most
frequently cited motivations for participating in such development. The model
demonstrably embodies a participatory worldwide engine of invention.
1.1.4 Economic Impact
Free and open software is an important and established feature of the commercial development landscape. Granted, no open source company has evolved to
anything like the economic status of proprietary powerhouses like Microsoft;
nonetheless, the use of open source, especially as supporting infrastructure
for proprietary products, is a widely used and essential element of the busi-
ness strategies of major companies from IBM to Apple and Oracle. Software
companies traditionally rely at least partly on closed, proprietary code to maintain their market dominance. Open source, on the other hand, tends to undermine monopoly, the likelihood of monopolistic dominance being reduced to the
extent that major software infrastructure systems and applications are open. The
largest proprietary software distributors are U.S. corporations – a factor that is
increasingly encouraging counterbalancing nationalistic responses abroad. For
example, foreign governments are more than ever disposed to encourage a policy preference for open source platforms like Linux. The platforms' openness
reduces their dependency on proprietary, foreign-produced code, helps nurture
the local pool of software expertise, and prevents lock-in to proprietary distributors and a largely English-only mode where local languages may not even be
supported. Software is a core component of governmental operation and infrastructure, so dependency on extranational entities is perceived as a security risk
and a cession of control to foreign agency.
At the macroeconomic level, open source development arguably reduces duplication of effort. Open code is available to all and acts as a public repository of software solutions to a broad range of problems, as well as of best practices in programming. It has been estimated that 75% of code is written for specific organizational tasks and not shared or publicly distributed for reuse (Stoltz, 1999). The open availability of such source code throughout the economy would reduce the need to develop applications from scratch. Just as software libraries and objects are software engineering paradigms for facilitating software reuse, at a much grander scale the open source movement proposes to preserve entire ecosystems of software, open for reuse, extension, and modification. It has traditionally been perceived that “open source software is often
geared toward information technology specialists, to whom the availability of source code can be a real asset, (while) proprietary software is often aimed at less sophisticated users” (Hahn, 2002). Although this observation could be refined, generally a major appeal of open source has been that its code availability makes it easier for firms to customize the software for internal applications. Such in-house customization is completely compatible with all open source licenses and is extremely significant since most software is actually developed or custom-designed rather than packaged (Beesen, 2002). As a process, open source can also reduce the development and/or maintenance risks associated with software development even when done by private, for-profit companies.
For example, consider code that has been developed internally for a company. It may often have little or no external sales value to the organization, even though it provides a useful internal service. In “The Magic Cauldron,” Raymond (1999) recounts the example of a distributed print-spooler written for an in-house corporate network. There was a good chance the life cycle of the code would be longer than the tenure of its original programmers. In this case, distributing the code as open source created the possibility of establishing an open community of interest in the software. This is useful to the company that owns the code since it reduces the risk of maintenance complications when the original developers depart. With any luck, it may connect the software to a persistent pool of experts who become familiar with the software and who can keep it up to date for their own purposes. More generally, open development can utilize developers from multiple organizations, splitting development risks and costs among the participants. In fact, while much open source code has traditionally been developed with a strong volunteer pool, there has also been extensive industrial support for open development. Linux development is a prime example. Though initially developed under the leadership of Linus Torvalds using a purely volunteer model, most current Linux code contributions are made by professional developers who are employees of for-profit corporations.
References
Beesen, J. (2002). What Good is Free Software? In: Government Policy toward Open Source Software, R.W. Hahn (editor). Brookings Institution Press, Washington, DC.
Carroll, J. (2004). Open Source vs. Proprietary: Both Have Advantages. ZDNet Australia. http://opinion.zdnet.co.uk/comment/0,1000002138,39155570,00.htm. Accessed June 17, 2007.
Conner, D. (1998). Father of DOS Still Having Fun at Microsoft. Microsoft MicroNews, April 10. http://www.patersontech.com/Dos/Micronews/paterson04_10_98.htm. Accessed December 20, 2006.
Cooper, A. (1996). Why I Am Called “the Father of Visual Basic.” Cooper Interaction Design. http://www.cooper.com/alan/father_of_vb.html. Accessed December 20, 2006.
Cowan, C. (2003). Software Security for Open-Source Systems. IEEE Security and Privacy, 1, 38–45.
Feller, J. and Fitzgerald, B. (2002). Understanding Open Source Software Development. Addison-Wesley, Pearson Education Ltd., London.
Hahn, R. (2002). Government Policy toward Open Source Software: An Overview. In: Government Policy toward Open Source Software, R.W. Hahn (editor). Brookings Institution Press, Washington, DC.
Hoepman, J.H. and Jacobs, B. (2007). Increased Security through Open Source. Communications of the ACM, 50(1), 79–83.
McMillan, A. (2006). Microsoft “Innovation.” http://www.mcmillan.cx/innovation.html. Accessed December 20, 2006.
Microsoft Press Release. (1996). Microsoft Acquires Vermeer Technologies Inc., January 16. http://www.microsoft.com/presspass/press/1996/jan96/vrmeerpr.mspx. Accessed December 20, 2006.
Moglen, E. (1999). Anarchism Triumphant: Free Software and the Death of Copyright. First Monday, 4(8). http://www.firstmonday.org/issues/issue4_8/moglen/index.html. Accessed January 5, 2007.
Parker, I. (2001). Absolute PowerPoint – Can a Software Package Edit Our Thoughts? New Yorker, May 28. http://www.physics.ohio-state.edu/~wilkins/group/powerpt.html. Accessed December 20, 2006.
Raymond, E. (1999). The Revenge of the Hackers. In: Open Sources: Voices from the Open Source Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA, 207–219.
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.firstmonday.dk/issues/issue3_3/raymond/index.html. Accessed December 3, 2006.
Reasoning Inc. (2003). How Open Source and Commercial Software Compare: MySQL White Paper, MySQL 4.0.16. http://www.reasoning.com/downloads.html. Accessed November 29, 2006.
Rosen, L. (2005). Open Source Licensing: Software Freedom and Intellectual Property Law. Prentice Hall, Upper Saddle River, NJ.
Raymond, E.S. (1999). The Magic Cauldron. http://www.catb.org/esr/writings/magic-cauldron/. Accessed November 29, 2006.
Stoltz, M. (1999). The Case for Government Promotion of Open Source Software. NetAction White Paper. http://www.netaction.org/opensrc/oss-report.html. Accessed November 29, 2006.
Tong, T. (2004). Free/Open Source Software in Education. United Nations Development Programme's Asia-Pacific Information Programme, Malaysia.
Torvalds, L. (1999). The Linux Edge. In: Open Sources: Voices from the Open Source Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA, 101–112.
Valloppillil, V. (1998). Open Source Software: A (New?) Development Methodology. The Halloween Documents. http://www.opensource.org/halloween/. Accessed November 29, 2006.
Wheeler, D. (2005). Microsoft the Innovator? http://www.dwheeler.com/innovation/microsoft.html. Accessed November 29, 2006.
Wheeler, D. (2006). Why Open Source Software/Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers! http://www.dwheeler.com/oss_fs_why.html. Accessed November 29, 2006.
Zittrain, J. (2004). Normative Principles for Evaluating Free and Proprietary Software. University of Chicago Law Review, 71(1), 265–287.
1.2 Preview
We will view the panorama of open source development through a number of different lenses: brief descriptive studies of prominent projects, the enabling technologies of the process, its social characteristics, legal issues, its status as a movement, business venues, and its public and educational roles. These perspectives are interconnected. For example, technological issues affect how the development process works. In fact, the technological tools developed by open source projects have at the same time enabled its growth. The paradigm has been self-hosting and self-expanding, with open systems like the Concurrent Versions System (CVS) and the Internet vastly extending the scale on which open development takes place. Our case studies of open projects will reveal its various social, economic, legal, and technical dimensions. We shall see how its legal matrix affects its business models, while social and psychological issues are in turn affected by the technological medium. Though we will separate out these various factors, the following chapters will also continually merge these influences. The software projects we consider are intended to familiarize the reader with the people, processes, and accomplishments of free and open development, focusing on Internet applications and free software platforms. The enabling technologies of open development include the fascinating versioning systems, both centralized and distributed, that make enormous open projects feasible. Such novel modes of collaboration invariably pose new questions about the social structures involved and their effect on how people interact, as well as the psychological and cognitive phenomena that arise in the new medium. Open development is significantly dependent on a legal infrastructure as well as on a technological one, so we will examine basic legal concepts, including licensing arrangements and the challenge of software patents. Social phenomena like open development do not just happen; they depend on effective leadership to articulate and advance the movement. In the case of free and open software, we shall see how the FSF and the complementary OSI have played that role. The long-term success of a software paradigm
requires that it be economically viable. This has been accomplished in free software in different ways, from businesses based purely on open source to hybrid arrangements more closely aligned with proprietary strategies. Beyond the private sector, we consider the public sector of education and government and how they capitalize on open source or affect its social role. We will close our treatment by briefly considering likely future developments, in a world where information technology has become one of the central engines of commerce and culture.
Section One of the book covers key open source Internet applications and platforms, and surveys technologies used in distributed collaborative open development. Section Two addresses social issues ranging from the demographics of participants to legal issues and business/economic models. Section Three highlights the role of the Free Software Foundation in the movement, the relation of open source to the public sector in government and education, and future prospects. A glimpse of the topics covered by the remaining chapters follows.
Chapter 2 recounts some classic stories of open development related to the Internet, like Berners-Lee's groundbreaking work on the Web at CERN, the development of the NCSA HTTP Web server and Mosaic browser, the Apache project, and more. These case studies represent remarkable achievements in the history of business and technology. They serve to introduce the reader unfamiliar with the world of open source to some of its signature projects, ideas, processes, and people. The projects we describe have brought about a social and communications revolution that has transformed society. The story of these achievements is instructive in many ways: for learning how the open source process works, what some of its major attainments have been, who some of the pioneering figures in the field are, how projects have been managed, how people have approached development in this context, what motivations have led people to initiate and participate in such projects, and some of the models for commercialization. We consider the servers and browsers that fueled the Internet's expansion, programming languages like Perl and PHP and the MySQL database so prominent in Internet applications, newer systems like BitTorrent, Firefox, and others. We also review the Fetchmail project that became famous as an exemplar of Internet-based, collaborative, bazaar-style development because of a widely influential essay.
Chapter 3 explores the open source platform, by which we mean the open operating systems and desktops that provide the infrastructure for user interaction with a computer system. The root operating system model for open source was Unix. Legal and proprietary issues associated with Unix led to the development of the fundamentally important free software GNU project, the aim of which was to create a complete and self-contained free platform that
would allow anyone to do all their software development in a free software environment. The flagship Linux operating system evolved out of a port of a Unix variant to a personal computer environment and then burgeoned into the centerpiece project of the open software movement. The Linux and free Unix-like platforms in turn needed a high-quality desktop-style interface, and it was out of this imperative that the two major open desktops, GNOME and KDE, emerged, which in turn depended on the fundamental functionality provided by the X Window System. This chapter recounts these epic developments in the history of computing, describing the people, projects, and associated technical and legal issues.
Chapter 4 overviews the key technologies used to manage open source projects, with a special emphasis on CVS. The free software movement emerged in the early 1980s, at a time when the ARPANET network with its several hundred hosts was well established and moving toward becoming the Internet. The ARPANET allowed exchanges like e-mail and FTP, technologies that significantly facilitated distributed collaboration, though the Internet was to greatly amplify this. The TCP/IP protocols that enabled the Internet became the ARPANET standard on January 1, 1983, about the same time the flagship open source GNU project was announced by free software leader and advocate Richard Stallman. By the late 1980s the NSFNET backbone network merged with the ARPANET to form the emerging worldwide Internet. The exponential spread of the Internet catalyzed further proliferation of open development. The specific communications technologies used in open source projects have historically tended to be relatively lean: e-mail, mailing lists, newsgroups, and later on Web sites, Internet Relay Chat, and forums. Major open source projects like Linux in the early 1990s still relied on e-mail, newsgroups, and FTP downloads to communicate. Newsgroups provided a means to broadcast ideas to targeted interest groups whose members might like to participate in a development project. Usenet categories acted like electronic bulletin boards that allowed newsgroup participants to post e-mail-like messages, like the famous comp.os.minix newsgroup on Usenet used by Linus Torvalds to initiate the development of Linux. A powerful collaborative development tool that greatly facilitated managing distributed software development emerged during the late 1980s and early 1990s: the versioning system. Versioning systems are software tools that allow multiple developers to work on projects concurrently and keep track of changes made to the code. This chapter describes in some detail how CVS works. To appreciate what it does, it is necessary to have a sense of its commands, their syntax, and their outputs or effects, and so we examine these closely. We also consider newer versioning tools like the decentralized system BitKeeper that played a significant role in the Linux project for a period of time, its free successor Git, and the Subversion system. Open source development has also been facilitated by the software hosting facilities that help distributed collaborators manage their projects and provide source code repositories. We describe some of the services they provide and the major Web sites.
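The central problem a versioning system solves, namely reconciling concurrent changes by different developers to the same file, can be sketched in a few lines of Python. This is only a conceptual illustration, not how CVS is implemented; the function and the sample revisions below are invented for the sketch:

```python
import difflib

def three_way_merge(base, ours, theirs):
    """Naive line-based merge of two concurrent edits of 'base'.

    Keeps a line changed by only one side; flags a conflict when both
    sides changed the same line differently. (Real tools like CVS first
    align revisions with diffs so insertions and deletions are handled;
    zip() here assumes the revisions have equal length.)
    """
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:            # both agree (or neither changed it)
            merged.append(o)
        elif o == b:          # only 'theirs' changed this line
            merged.append(t)
        elif t == b:          # only 'ours' changed this line
            merged.append(o)
        else:                 # both changed it: a conflict
            merged.append(f"<<< CONFLICT: {o!r} vs {t!r} >>>")
    return merged

base  = ["def greet():", "    msg = 'hello'", "    return msg"]
alice = ["def greet():", "    msg = 'hello, world'", "    return msg"]
bob   = ["def greet():", "    msg = 'hello'", "    return msg.upper()"]

merged = three_way_merge(base, alice, bob)

# What a repository actually records is a delta between revisions:
delta = list(difflib.unified_diff(base, alice, "r1.1", "r1.2", lineterm=""))
```

Here the merge keeps Alice's message and Bob's return statement; had both edited the same line, the tool would leave a conflict marker for a human to resolve, which is exactly what CVS does.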
There are many demographic, social, psychological, cognitive, process, and media characteristics that affect open source development. Chapter 5 overviews some of these. It also introduces a variety of concepts from the social sciences that can be brought to bear on the open source phenomenon to help provide a framework for understanding this new style of human, scientific, and commercial interaction. We first of all consider the basic demographics of the phenomenon, such as the number and scale of projects under development, the kinds of software that tend to be addressed, population characteristics and motivations for developers and community participants, and how participants interact. We survey relevant concepts from social psychology, including the notions of norms and roles, the factors that affect group interactions like compliance, internalization, and identification, normative influences, the impact of power relationships, and group cohesion. Ideas like these from the field of social psychology help provide conceptual tools for understanding open development. Other useful abstractions come from cognitive psychology, like the well-recognized cognitive biases that affect group interactions and problem solving. Social psychology also provides models for understanding the productivity of collaborative groups in terms of what are called process losses and gains, as well as organizational effects that affect productivity. The impact of the collaborative medium on group interactions is worth understanding, so we briefly describe some of the classic research on the effect of the communications medium on interaction. Like the field of social psychology, media research offers a rich array of concepts and a point of departure for understanding and analyzing distributed collaboration. Potentially useful concepts range from the effect of so-called common ground, coupling, and incentive structures, to the use of social cues in communication, the richness of informational exchanges, and temporal effects in collaboration. We introduce the basic concepts and illustrate their relevance to open collaboration.
The open source movement is critically affected by legal issues related to intellectual property. Intellectual property includes creations like copyrighted works, patented inventions, and proprietary software. The objective of Chapter 6 is to survey the related legal issues in a way that is informative for understanding their impact on free and open development. In addition to copyright and patent, we will touch on topics like software patents, licenses and contracts, trademarks, reverse engineering, the notion of reciprocity in licensing, and derivative works in software. The legal and business mechanisms to protect intellectual property
are intended to address what is usually considered to be its core problem: how to protect creations in order to provide incentives for innovators. Traditionally such protection has been accomplished through exclusion. For example, you cannot distribute a copyrighted work for your own profit without the authorization of the copyright owner. The FSF's GPL that lies at the heart of the free software movement takes a very different attitude to copyright, focusing not on how to invoke copyright to exclude others from using your work, but on how to apply it to preserve the free and open distribution of your work, particularly when modified. We describe the GPL and the rationales for its conditions. We also consider the OSI and the motivations for its licensing criteria. The OSI, cofounded by Eric Raymond and Bruce Perens in 1998, was established to represent what was believed to be a more pragmatic approach to open development than that championed by the FSF. The OSI reflected the experience of the stream of the free software movement that preferred licenses like the BSD and MIT licenses, which appeared more attractive for commercial applications. It reflected the attitude of developers like McKusick of the BSD project and Gettys of the X Window System. We describe some of the OSI-certified software licenses, including the increasingly important Mozilla Public License. We also briefly address license enforcement and international issues, and the status and conditions of the next version of the GPL: GPLv3.
Chapter 7 examines economic concepts relevant to open source development, the basic business models for open products, the impact of software commoditization, and economic models for why individuals participate in open development. Some of the relevant economic concepts include vendor lock-in, network effects (or externalities), the total cost of use of software, the impact of licensing on business models, complementary products, and the potential for customizability of open versus proprietary products. The basic open business models we describe include dual licensing, consultation on open source products, provision of open source software distributions and related services, and the important hybrid models like the use of open source for in-house development or horizontally in synergistic combination with proprietary products, such as in IBM's involvement with Apache and Linux. We also examine software commoditization, a key economic phenomenon that concerns the extent to which a product's function has become commoditized (routine or standard) over time. Commoditization deeply affects the competitive landscape for proprietary products. We will present some of the explanations that have been put forth to understand the role of this factor in open development and its implications for the future. Finally, observers of the open source scene have long been intrigued by whether developers participate for psychological, social, or other reasons. We will consider some of the economic models that have been
offered to explain why developers are motivated to work on these projects. One model, based on empirical data from the Apache project, uses an effect called signaling to explain why individuals find it economically useful to volunteer for open source projects. Another model proposes that international differences in economic conditions alter the opportunity cost of developer participation, which in turn explains the relative participation rates for different geographic regions.
The chapter on legal issues recounts the establishment and motivation for the OSI in 1998 and Chris Peterson's coinage of the open source designation as an alternative to what was thought to be the more ideologically weighted phrase free software. The OSI represents one main stream of the open software movement. Of course, the stream of the movement represented by the FSF and the GNU project had already been formally active since the mid-1980s. The FSF and its principals, particularly Richard Stallman, initiated the free software concept, defined its terms, vigorously and boldly publicized its motivations and objectives, established the core GNU project, and led advocacy for the free software movement. They have been instrumental in its burgeoning success. Chapter 8 goes into some detail to describe the origin and technical objectives of the GNU project, which represents one of the major technical triumphs of the free software movement. It also elaborates on the philosophical principles espoused by the FSF, as well as some of the roles and services the FSF provides.
Chapter 9 considers the role of open source in the public sector which, in the form of government and education, has been critical to the creation, development, funding, deployment, and promotion of open software. The public sector continues to offer well-suited opportunities for using and encouraging open source, in domains ranging from technological infrastructure to national security, educational use, administrative systems, and so on, both domestically and internationally. Open source has characteristics that naturally suit many of these areas. Consider merely the role of the public sector in supporting the maintenance and evolution of technological infrastructure for society, an area in which open software has proven extremely successful. The government has also historically played an extensive role in promoting innovation in science and technology. For example, the federal government was the leader in funding the development of the Internet with its myriad underlying open software components. Thus public investment in open development has paid off dramatically in the past and can be expected to continue to do so in the future. The transparency of open source makes it especially interesting in national security applications. Indeed, this is an increasingly recognized asset in international use where proprietary software may be considered, legitimately or not, as suspect. Not only do governmental agencies benefit as users of open
source, government and educational institutions also play a role in promoting its expanded use. Government decisions, whether legislative or policy-driven in character, can significantly affect the expansion of open software use in government and by the public. For example, nationalistic concerns about the economic autonomy of local software industries or about national security have made open source increasingly attractive in the international arena. Lastly, we will address at some length the uses and advantages of open source in education, including its unique role in computer science education.
We conclude our book in Chapter 10 with what, we believe, are the likely scenarios for the prospective roles of open and proprietary software. Our interpretation is a balanced one. On the one hand, the open source paradigm seems likely to continue its advance toward worldwide preeminence in computer software infrastructure, not only in the network and its associated utilities, but also in operating systems, desktop environments, and standard office utilities. Significantly, the most familiar and routine applications seem likely to become commoditized and open source, resulting in pervasive public recognition of the movement. The software products whose current dominance seems likely to decline because of this transformation include significant parts of the current Microsoft environment, from operating systems to office software. However, despite a dramatic expansion in the recognition and use of open source, this in no way means that open software will be dominant in software applications. To the contrary, the various dual modalities that have already evolved are likely to persist, with robust open and proprietary sectors each growing and prevailing in different market domains. While some existing proprietary systems may see portions of their markets overtaken by open source replacements, proprietary applications and hybrid modes of commercial development should continue to strengthen. Specialized proprietary killer-apps serving mega-industries are likely to continue to dominate their markets, as will distributed network services built on open infrastructures that have been vertically enhanced with proprietary functionalities. Mixed application modes like those reflected in the WAMP stack (with Windows used in place of Linux in the LAMP stack) and the strategically significant Wine project that allows Windows applications to run on Linux environments will also be important. The nondistributed, in-house commercial development that has historically represented the preponderance of software development seems likely to remain undisclosed, either for competitive advantage or by default, but this software is increasingly being built using open source components – a trend that is already well established. The hybrid models that have emerged, as reflected in various industrial/community cooperative arrangements like those involving the Apache Foundation, the X Window System, and Linux, and
based on industrial support for open projects under various licensing arrangements, seem certain to strengthen even further. They represent an essential strategy for spreading the risks and costs of software development and providing an effective complementary set of platforms and utilities for proprietary products.
SECTION ONE
Open Source – Internet Applications,
Platforms, and Technologies
2
Open Source Internet Application Projects
This chapter describes a number of open source applications related to the Internet that are intended to introduce the reader unfamiliar with the world of open development to some of its signature projects, ideas, processes, and people. These projects represent remarkable achievements in the history of technology and business. They brought about a social and communications revolution that transformed society, culture, commerce, technology, and even science. The story of these classic developments, as well as those in the next chapter, is instructive in many ways: for learning how the open source process works, what some of its major accomplishments have been, who some of the pioneering figures in the field are, how projects have been managed, how people have approached development in this context, what motivations have led people to initiate and participate in such projects, and what business models have been used to commercialize the associated products.
Web servers and Web browsers are at the heart of the Internet, and free software has been prominent on both the server and browser ends. Thus the first open source project we will investigate is a server, the National Center for Supercomputing Applications (NCSA) Web server developed by Rob McCool in the early 1990s. His work had in turn been motivated by the then recent creation by Tim Berners-Lee of the basic tools and concepts for a World Wide Web (WWW), including the invention of the first Web server and browser, HTML (the Hypertext Markup Language), and HTTP (the Hypertext Transfer Protocol). For various reasons, McCool's server project subsequently forked, leading to the development of the Apache Web server. It is instructive and exciting to understand the dynamics of such projects, the contexts in which they arise, and the motivations of their developers. In particular, we will examine in some detail how the Apache project emerged, its organizational processes, and what its development was like. Complementary to Web
servers, the introduction of easily used Web browsers had an extraordinary impact on Web use, and thereby a revolutionary effect on business, technology, and society at large. The Mosaic, Netscape, and more recently the Firefox browser projects that we will discuss even shared some of the same development context. The success of the Mosaic browser project was especially spectacular. In fact, it was instrumental in catalyzing the historic Internet commercial revolution. Mosaic's developer Marc Andreessen later moved on to Netscape, where he created, along with a powerhouse team of developers, the Netscape browser that trumped all competition in the browser field for several years. But Netscape's stunning success proved to be temporary. After its initial triumph, a combination of Microsoft's bundling strategies for Internet Explorer (IE) and the latter's slow but steady improvement eventually won the day over Netscape. Things lay dormant in the browser area for a while until Firefox, a descendant of the Netscape Mozilla browser, came back to challenge IE, as we shall describe.
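The request–response exchange at the center of this server and browser story can be demonstrated with Python's standard library alone. The toy handler below merely stands in for a Web server (it is nothing like NCSA httpd or Apache), while http.client plays the browser's role of issuing an HTTP GET and reading back an HTML page:

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer any GET request with a minimal HTML document.
        body = b"<html><body><h1>Hello, Web</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

# Serve on an ephemeral local port in a background thread.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "browser" side: connect, send GET /, read the response.
conn = HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/")
response = conn.getresponse()
status, page = response.status, response.read().decode()
conn.close()
server.shutdown()
```

Everything a full browser or server adds (caching, rendering, TLS, concurrency) is layered over this same exchange of a status line, headers, and body.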
The process of computer-supported, distributed collaborative software development is relatively new. Although elements of it have been around for decades, the kind of development seen in Linux was novel. Eric Raymond wrote a famous essay on Linux-like development in which he recounted the story of his own Fetchmail project, an e-mail utility. Although Fetchmail is far less significant as an open source product than other projects that we review, it has come to have a mythical pedagogical status in the field because Raymond used its development – which he intentionally modeled on that of Linux – as an exemplar of how distributed open development works and why people develop software this way. Raymond's viewpoints were published in his widely influential essay (Raymond, 1998) that characterized open development as akin to a bazaar style of development, in contrast to the cathedral style of development classically described in Fred Brooks' famed The Mythical Man-Month (twentieth anniversary edition in 1995). We will describe Fetchmail's development in some detail because of its pedagogical significance.
We conclude the chapter with a variety of other important Internet-related
open applications. A number of these are free software products that have
been commercialized using the so-called dual licensing model. These are worth understanding, first of all because licensing issues are important in open development, and secondly because there is an enduring need for viable business
strategies that let creators commercially benefit from open software. The first
of these dual licensed projects that we will consider is the MySQL database
system. MySQL is prominent as the M in the LAMP Web architecture, where
it defines the backend database of a three-tier environment whose other com-
ponents are Linux, Apache, Perl, PHP, and Python. Linux is considered in
Chapter 3. Perl and PHP are considered here. We describe the influential role
of Perl and its widely used open source module collection CPAN, as well as the server-side scripting language PHP that has its own rather interesting model for commercialization. We also briefly consider Berkeley DB and Sendmail (which serves a substantial portion of all Internet sites). Both of these are dual licensed free software. Additional business models for free software are examined in Chapter 7. The peer-to-peer Internet utility BitTorrent is a more recent open source creation that exploits the interconnectedness of the Internet network in a novel way and is intellectually intriguing to understand. BitTorrent has, in a few short years, come to dominate the market for transferring popular, large files over the Internet. We complete the chapter with a brief look at the fundamental BIND utility that underlies the domain name system for the Internet, which makes symbolic Web names possible. The tale of BIND represents a story with an unexpected and ironic business outcome.
2.1 The WWW and the Apache Web Server
The story of the Apache Web server is a classic tale of open development. It has its roots in the fundamental ideas for the WWW conceived and preliminarily implemented by Tim Berners-Lee at a European research laboratory. Soon afterward, these applications were taken up by students at an American university, where Berners-Lee's Web browser and server were dramatically improved upon and extended as the NCSA Web server and the Mosaic browser. The NCSA server project would in turn be adopted and its design greatly revised by a new distributed development team. The resulting Apache server's entry into the marketplace was rapid and enduring.
2.1.1 WWW Development at CERN
We begin by highlighting the origins of the Web revolution. The idea for the WWW was originated by physicist Berners-Lee at the CERN physics laboratory in Switzerland when he proposed the creation of a global hypertext system in 1989. The idea for such a system had been germinating in Berners-Lee's mind for almost a decade and he had even made a personal prototype of it in the early 1980s. His proposal was to allow networked access to distributed documents, including the use of hyperlinks. As an MIT Web page on the inventor says,

Berners-Lee's vision was to create a comprehensive collection of information in word, sound and image, each discretely identified by UDIs and interconnected by hypertext links, and to use the Internet to provide universal access to that collection of information (http://web.mit.edu/invent/iow/berners-lee.html).
Berners-Lee implemented the first Web server and a text-oriented Web browser and made them available on the Web in 1991 for the NeXT operating system. In fact, he not only developed the server and browser, but also invented HTTP, HTML, and the initial URI version of what would later become URLs (uniform resource locators). His HTTP protocol was designed to retrieve HTML documents over a network, especially via hyperlinks. He designed HTML for his project by creating a simplified version of an SGML DTD he used at CERN, which had been intended for designing documentation. He introduced a new hyperlink anchor tag <a> that would allow distributed access to documents and be central to the WWW paradigm (Berglund et al., 2004). Berners-Lee kept his prototype implementations simple and widely publicized his ideas on the www-talk mailing list started at CERN in 1991. He named his browser WorldWideWeb and called his Web server httpd (Berners-Lee, 2006). The server ran as a Unix background process (or daemon), continually waiting for incoming HTTP requests which it would handle.
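The core exchange that early daemon handled can be sketched in a few lines. The following Python fragment is purely illustrative (it is not Berners-Lee's code, and the document paths are examples): it formats a minimal HTTP/1.0 GET request of the kind the server waited for, and a minimal HTML reply whose anchor tag carries a hyperlink.

```python
# Illustrative sketch of the early Web's request/response exchange:
# an HTTP GET for a document, answered by HTML containing an <a> anchor.
# Host and paths are examples, not a real deployment.

def build_get_request(path: str, host: str) -> str:
    """Format a minimal HTTP/1.0 GET request for a document."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"

def build_html_reply(title: str, link_url: str, link_text: str) -> str:
    """Format a minimal HTML document whose <a> tag links to another document."""
    return (
        "<html><head><title>" + title + "</title></head>"
        '<body><p>See <a href="' + link_url + '">' + link_text + "</a>.</p>"
        "</body></html>"
    )

request = build_get_request("/hypertext/WWW/TheProject.html", "info.cern.ch")
reply = build_html_reply("The WWW Project", "http://info.cern.ch/", "the project page")
print(request.splitlines()[0])  # the request line a daemon would parse
```

The anchor tag is the piece Berners-Lee added to his SGML-derived markup; everything else is plumbing for moving the document across the network.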
At about the same time, Berners-Lee became familiar with the free software movement. Indeed, the Free Software Foundation's Richard Stallman gave a talk at CERN in mid-1991. Berners-Lee recognized that the free software community offered the prospect of a plenitude of programmer volunteers who could develop his work further, so he began promoting the development of Web browser software as suitable projects for university students (Kesan and Shah, 2002)! He had his own programmer gather the software components he had developed into a C library named libwww, which became the basis for future Web applications. Berners-Lee's initial inclination was to release the libwww contents under the Free Software Foundation's GPL license. However, there were concerns at the time that corporations would be hesitant to use the Web if they thought they could be subjected to licensing problems, so he decided to release it as public domain instead, which was, in any case, the usual policy at CERN. By the fall of 1992, his suggestions about useful student projects would indeed be taken up at the University of Illinois at Urbana–Champaign. In 1994, Berners-Lee founded and became director of the W3C (World Wide Web Consortium), which develops and maintains standards for the WWW. For further information, see his book on his original design and ultimate objective for the Web (Berners-Lee and Fischetti, 2000).
2.1.2 Web Development at NCSA
The NCSA was one of the hubs for U.S. research on the Internet. It produced
major improvements in Berners-Lee's Web server and browser, in the form of the NCSA Web server (which spawned the later Apache Web server) and the
Mosaic Web browser. We will discuss the NCSA server project and its successor, the still preeminent Apache Web server, in this section. The subsequent section will consider the Mosaic browser and its equally famous descendants, which even include Microsoft's own IE.
Like many open source projects, the now pervasive Apache Web server
originated in the creativity and drive of youthful computer science students.
One of them was Rob McCool, an undergraduate computer science major at
the University of Illinois and a system administrator for the NCSA. McCool
and his colleague Marc Andreessen at NCSA had become fascinated by the
developments at CERN. Andreessen was working on a new Web browser (the Mosaic browser) and thought the CERN server was too "large and cumbersome" (McCool et al., 1999). He asked McCool to take a look at the server code. After doing so, McCool thought he could simplify its implementation and improve its performance, relying on his system administration experience. Of course, this kind of response is exactly what Web founder Berners-Lee had hoped for when he had widely advertised and promoted his work. Since Andreessen was developing the new browser, McCool concentrated on developing the server. The result was the much improved NCSA httpd server.
While McCool was developing the improved httpd daemon, Andreessen
came up with a uniform way of addressing Web resources based on the URL
(Andreessen, 1993). This was a critical development. Up to this point, the
Web had been primarily viewed as a system for hypertext-based retrieval. With Andreessen's idea, McCool could develop a standardized way for the Web server and browser to pass data back and forth using extended HTML tags called forms in what was later to become the familiar Common Gateway Interface or CGI. As a consequence of this, their extended HTML and HTTP Web protocols "transcended their original conception to become the basis of general interactive, distributed, client-server information systems" (Gaines and Shaw, 1996). The client and server could now engage in a dynamic interaction, with the server interpreting the form inputs from the client and dynamically adapting its responses in a feedback cycle of client-server interactions. Gaines and Shaw (1996) nicely describe this innovation as enabling the client to "transmit structured information from the user back to an arbitrary application gatewayed through the server. The server could then process that information and generate an HTML document which it sent back as a reply. This document could itself contain forms for further interaction with the user, thus supporting a sequence of client-server transactions."
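The feedback cycle Gaines and Shaw describe can be sketched concretely. The following Python fragment is a simplified stand-in for a CGI program (the field name "query" and the action path are invented for illustration): it decodes form-encoded input from the client and replies with an HTML document that itself contains a form for the next round trip.

```python
# Sketch of the CGI-style feedback cycle: decode a form submission,
# then generate an HTML reply that carries a form for the next exchange.
# Field name "query" and the action path are hypothetical examples.
from urllib.parse import parse_qs

def handle_form_post(body: str) -> str:
    """Decode form-encoded data and produce the next page, form included."""
    fields = parse_qs(body)              # "query=apache" -> {"query": ["apache"]}
    query = fields.get("query", [""])[0]
    return (
        "<html><body>"
        f"<p>Results for: {query}</p>"
        '<form method="POST" action="/cgi-bin/search">'
        '<input name="query"><input type="submit">'
        "</form></body></html>"
    )

page = handle_form_post("query=apache+httpd")
```

Each reply page carrying its own form is what turns one-shot document retrieval into the "sequence of client-server transactions" quoted above.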
In traditional open development style, McCool kept his server project posted
on a Web site and encouraged users to improve it by proposing their own
modifications. At Andreessen’s recommendation, the software was released
under a very unrestrictive open software license (essentially public domain) that basically let developers do whatever they wanted with the source code, just like the Berners-Lee/CERN approach. The open character of this licensing decision would later significantly expedite the evolution of the NCSA httpd server into the Apache server (see McCool et al., 1999; Apache.pdf, 2006). For a period of time, McCool's NCSA httpd daemon was the most popular Web server on the Internet. Indeed, the Netcraft survey (netcraft.com) gave it almost 60% of the server market share by mid-1995, surpassing the market penetration of the CERN server, which by then stood at only 20%. Although Netcraft surveyed fewer than 20,000 servers at the time, there were already millions of host computers on the Internet (Zakon, 1993/2006). The Apache server that developed out of the NCSA server would be even more pervasively deployed.
2.1.3 The Apache Fork
As commonly happens in open source projects, the original developers moved on, in this case to work at Netscape, creating a leadership vacuum in the NCSA httpd project. After an interim, by early 1995, an interest group of Web site administrators or "Webmasters" took over the development of the server. The Webmasters were motivated by a mix of personal and professional needs, especially doing their jobs better. Brian Behlendorf, a computer scientist recently out of Berkeley, was one of them. He was developing the HotWired site for Wired magazine for his consulting company and had to solve a practical problem: the HotWired site needed password authentication on a large scale. Behlendorf provided it by writing a patch to the httpd server to incorporate this functionality at the required scale (Leonard, 1997). By this point, there were a large number of patches for the httpd code that had been posted to its development mailing list, but which, since McCool's departure from NCSA, had gone unintegrated because there was no one at NCSA in charge of the project. Using these patches was time-consuming: the patches had to be individually downloaded and manually applied to the NCSA base code, an increasingly cumbersome process. In response to this unsatisfactory situation, Behlendorf and his band established a group of eight distributed developers, including himself, Roy Fielding, Rob Hartill, Rob Thau, and several others, and defined a new project mailing list: new-httpd. For a while after its inauguration, McCool participated in the new mailing list, even though he was now at Netscape working on a new proprietary Web server. Netscape did not consider the free source Apache project as competitive with its own system, so initially there appeared to be no conflict of interest. McCool was able to explain the intricacies of the httpd daemon's
code to the new group, a considerable advantage to the project. However, after Apache's release, it quickly became clear from the Netcraft market share analyses that Apache would be a major competitor to the proprietary Netscape server McCool was involved with. Thus McCool once again removed himself from participation (McCool et al., 1999).
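The patch-trading workflow described above, in which changes circulated as textual diffs against a shared base that each administrator applied by hand, can be illustrated with Python's difflib as a stand-in for the diff tool of the era. The C snippets and function names below are invented for illustration, not taken from the httpd sources.

```python
# Illustration of the patch-trading workflow: a change circulated as a
# unified diff against the shared base code. difflib stands in for
# diff(1); the C lines and identifiers are hypothetical examples.
import difflib

base = [
    "check_auth(request);\n",
    "serve_document(request);\n",
]
patched = [
    "check_auth_large_scale(request);  /* scalable password check */\n",
    "serve_document(request);\n",
]

patch = list(difflib.unified_diff(base, patched,
                                  fromfile="httpd.c", tofile="httpd.c.patched"))
print("".join(patch))
```

Each recipient then had to apply such hunks to a local copy of the base code by hand, which is exactly the chore that made an integrated, jointly maintained tree so attractive.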
Since the NCSA httpd daemon served as the point of departure for the new project, the new server's development can be thought of as a fork in the development of the original httpd project. The new group added a number of fixes which it then released as "a patchy" server. Eventually, they recognized they had to revise the code into a completely rewritten software architecture that was developed by Rob Thau by mid-1995. Thau called his design Shambhala. Shambhala utilized a modular code structure and incorporated an extensible Application Programming Interface (API). The modular design allowed the developers to work independently on different modules, a capability critical to a distributed software development project (Apache.pdf, 2006). By the summer of 1995 the group had added a virtual hosting capability that allowed ISPs to host thousands of Web sites on a single Apache server. This innovation represented a highly important capability lacking in the competing Netscape and Microsoft Web servers. After considerable further developmental machinations, the "Apache" version 1.0 was released at the end of 1995 together with its documentation. The thesis by Osterlie (2003) provides a detailed technical history of the development based on the original e-mail archives of the project.
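Virtual hosting of the kind added that summer survives in today's Apache configuration. The fragment below is illustrative only: the directive syntax is that of the modern httpd, not the 1995 server, and the hostnames and paths are hypothetical.

```apacheconf
# Illustrative name-based virtual hosting in modern Apache syntax.
# Two sites share one server process and one IP address; hostnames
# and document roots are hypothetical examples.
<VirtualHost *:80>
    ServerName www.example-one.org
    DocumentRoot "/var/www/site-one"
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example-two.org
    DocumentRoot "/var/www/site-two"
</VirtualHost>
```

The server selects a block by matching the request's Host header against ServerName, which is what lets an ISP stack thousands of sites on a single Apache instance.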
Although the appellation Apache is allegedly associated with the customary open source diff-and-patch techniques used during its development, whence it could be thought of as "a patchy" Web server, the FAQ on the server's Web site says it is eponymous for the American Indian tribe of the same name, "known for their skill in warfare ... and endurance." Within a few years the Apache server dominated the Web server market. By late 1996, according to Netcraft.com, Apache already had 40% of the market share, by 2000 it was about 65%, and by mid-2005 it was over 70%, with Microsoft's IIS lagging far behind at around 20% of market penetration for years. More recent statistics from Netcraft credit Apache with about 60% of the Web server market versus 30% for Microsoft IIS.
The review of the Apache project by McCool et al. (1999) gives an inside look at the project. Notably, the major developers were not hobbyist hackers but either computer science students, PhDs, or professional software developers. All
of them had other regular jobs in addition to their voluntary Apache involve-
ment. Their developer community had the advantage of being an enjoyable
atmosphere. Since the development occurred in a geographically distributed
context, it was inconvenient if not infeasible to have physical meetings. The
circumstances also precluded relying on synchronous communication because members had different work schedules. The volunteers had full-time job commitments elsewhere and so could not predictably dedicate large portions of their time to the project. Consequently, not only was the workspace decentralized, the uncoordinated work schedules necessitated asynchronous communication. E-mail lists followed naturally as the obvious means for communicating. Mockus et al. (2002) observe how the Apache development "began with a conscious attempt to solve the process issues first, before development even started, because it was clear from the very beginning that a geographically distributed set of volunteers, without any traditional organizational ties, would require a unique development process in order to make decisions." Their procedures for decision making and coordinating the project had to reflect its asynchronous, distributed, volunteer, and shared leadership character, so the team "needed to determine group consensus, without using synchronous communication, and in a way that would interfere as little as possible with the project progress" (Fielding, 1999, p. 42).
The organizational model they chose was quite simple: voting on decisions was done through e-mail, decisions were made on the basis of a voting consensus, and the source code (by 1996) was administered under the Concurrent Versions System (CVS). The core developers for Apache, a relatively small group originally of fewer than ten members, were the votes that really counted. Any mailing list member could express an opinion but "only votes cast by the Apache Group members were considered binding" (McCool et al., 1999). In order to commit a patch to the CVS repository, there had to be at least three positive votes and no negative votes. For other issues, there had to be at least three positive votes, and the positive votes had to constitute a majority. A significant tactical advantage of this approach was that the process required only partial participation, enabling the project to proceed without hindrance, even though at any given point in time only a few core developers might be active. Despite such partial participation, the voting protocol ensured that development progress still reflected and required a reasonable level of peer review and approval. Because negative votes acted as vetoes in the case of repository changes, such votes were expected to be used infrequently and required an explanation. One acceptable rationale for a veto might be to reject a proposed change because it was thought that it would interfere with the system's support for a major supported platform (McCool et al., 1999). Another acceptable rationale for a veto was to keep the system simple and prevent an explosion of features. A priori, it might appear that development deadlocks would occur frequently under such a voting system, but the knowledge-base characteristics of the core developer group tended to prevent this. Each of the group members tended to represent disjoint technical
perspectives and so they primarily enforced “design criteria” relevant to their
own expertise (McCool et al., 1999). Of course, problems could occur when development was rapid but the availability of CVS kept the process simple and reversible. Relatively routine changes could be committed to the repository first and then retroactively confirmed, since any patch could be easily undone. Although participants outside the core group were restricted in their voting rights, McCool's review confirms the benefits derived from the feedback obtained from users via newsgroups and e-mail.
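The two voting rules described above are simple enough to state as code. The sketch below is our own modeling of the rules as McCool et al. (1999) report them (the function names are invented, and abstentions are simply ignored): a patch needs at least three positive votes and zero vetoes, while other issues need at least three positive votes forming a majority of the votes cast.

```python
# Sketch of the Apache Group's e-mail voting rules (McCool et al., 1999).
# Function names are our own; abstentions are ignored in this model.

def patch_approved(plus_ones: int, vetoes: int) -> bool:
    """A CVS commit requires at least three positive votes and no vetoes."""
    return plus_ones >= 3 and vetoes == 0

def issue_approved(plus_ones: int, minus_ones: int) -> bool:
    """Other issues require at least three positive votes forming a majority."""
    return plus_ones >= 3 and plus_ones > minus_ones

assert patch_approved(3, 0)
assert not patch_approved(5, 1)   # a single veto blocks a repository change
assert issue_approved(4, 2)
assert not issue_approved(2, 0)   # too few positive votes cast
```

The asymmetry is the point: vetoes make repository changes conservative, while the simple majority rule keeps ordinary decisions from deadlocking.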
The Apache Group that guided the project had eight founding members and by the time of the study by Mockus et al. (2002) had grown to twenty-five members, though for most of the development period there were only half that many. Refer to http://httpd.apache.org/contributors/ for a current list of Apache contributors, their backgrounds, and technical contributions to the project. The core developers were not quite synonymous with this group but included those group members active at a given point in time and those about to be eligible for membership in the group, again adding up to about eight members in total. The Apache Group members could vote on code changes and also had CVS commit access. In fact, strictly speaking, any member of the developer group could commit code to any part of the server, with the group votes primarily used for code changes that might have an impact on other developers (Mockus et al., 2002).
Apache’s pragmatic organizational and process model was in the spirit of
the Internet Engineering Task Force (IETF) philosophy of requiring “rough
consensusand working code” (see suchas Bradner (1999) and Moody (2001)).
Thismotto was coined by DaveClark, Chief Protocol Architect forthe Internet
during the 1980s and one of the leaders in the development of the Internet. In
alegendary presentation in 1992, Clark had urged an assembled IETF audi-
ence to remember a central feature of the successful procedure by which the
IETF established standards, namely “We reject: kings, presidents, and voting.
We believe in: rough consensus and running code” (Clark, 1992). In the IETF,
theexpression rough consensus meant80–90% agreement, reflecting a process
wherein “a proposal must answer to criticisms, butneed not be held up if sup-
ported by a vast majority of the group” (Russell, 2006,p.55).The condition
about running code meant that a party behind a proposed IETF standard was
requiredto provide“multiple actual andinteroperable implementationsof a pro-
posedstandard (which) must existand be demonstrated before theproposal can
be advanced along the standards track” (Russell, 2006,p.55).The pragmatic,
informalIETF process stood in stark contrast to the laborious ISO approach to
developingstandards, a process that entailed having a theoretical specification
prior to implementation of standards. The IETF approach and Clark’s stirring
phrase represented an important "bureaucratic innovation," a way of doing things that "captured the technical and political values of Internet engineers during a crucial period in the Internet's growth" (Russell, 2006, p. 48). Free software advocate Lawrence Lessig (1999, p. 4) described it as "a manifesto that will define our generation." Although its circumstances and process were not identical to Apache's, the IETF's simple pragmatism reflected the same spirit that let productive, creative work get done efficiently, with appropriate oversight, but minimal bureaucratic overhead.
By 1998, the project had been so remarkably successful that IBM asked to
join the Apache Group, a choice that made corporate sense for IBM since its
corporate focus had become providing services rather than marketing software. The Apache Group decided to admit the IBM developers subject to the group's normal meritocratic requirements. The group intended the relationship with IBM to serve as a model for future industrial liaisons (McCool et al., 1999). As of this writing a significant majority of the members of the Apache Software Foundation appear to be similarly industrially affiliated (over 80%) based on the member list at http://www.apache.org/foundation/members.html (accessed January 5, 2007).
Apache Development Process
Mockus et al. (2002) provide a detailed analysis of the processes, project development patterns, and statistics for the Apache project. The generic development process applied by a core developer was as follows:

• identify a problem or a desired functionality;
• attempt to involve a volunteer in the resolution of the problem;
• test a putative solution in a local CVS source copy;
• submit the tested code to the group to review; and
• on approval, commit the code to the repository (preferably as a single commit) and document the change.
New work efforts were identified in several ways: via the developer mailing list, the Apache USENET groups, and the BUGDB reporting system (Mockus et al., 2002). The developer mailing list was the most important vehicle for identifying changes. It was the key tool for discussing fixes for problems and new features and was given the highest priority by the developers, receiving "the attention of all active developers" for the simple reason that these messages were most likely to come from other active developers and so were deemed "more likely to contain sufficient information to analyze the request or contain a patch to solve the problem" (Mockus et al., 2002). Screening processes were used for the other sources. The Apache BUGDB bug-reporting tool was actually not
directly used by most developers, partly because of annoying idiosyncrasies
in the tool. Instead, a few developers filtered the BUGDB information and
forwarded entries thought to be worthwhile to the developer mailing list. The
Apache USENET groups were also used less than one might expect because
they were considered “noisy.” Once again, volunteers filtered the USENET
information, forwarding significant problems or useful enhancements to the
developer mailing list.
Once a problem was identified, the next issue was "who would do the work?" A typical practice was for the core developers associated with the code for the affected part of the system, having either developed it or spent considerable time maintaining it, to take responsibility for the change. This attitude reflects an implicit kind of code ownership (Mockus et al., 2002). Correlative to this cultural practice, new developers would tend to focus on developing new features (whence features that had no prior putative "owner") or to focus on parts of the server that were not actively being worked on by their previous maintainer (and so no longer had a current "owner"). These practices were deferred to by other developers. As a rule, established activity and expertise in an area were the default guidelines. In reality, the actual practice of the developers was more flexible. Indeed, the data analysis provided by Mockus et al. (2002) suggests that the Apache group's core developers had sufficient respect for the expertise of the other core developers that they contributed widely to one another's modules according to development needs. Thus the notion of code ownership was in reality "more a matter of recognition of expertise than one of strictly enforced ability to make commits to partitions of the code base" (Mockus et al., 2002).
Regarding solutions to problems, typically several alternatives were first identified. These were then forwarded by the volunteer developer, self-charged with the problem, to the developer mailing list for preliminary feedback and evaluation prior to developing the actual solution. The prototype solution selected was subsequently refined and implemented by the originating developer and then tested on his local CVS copy before being committed to the repository. The CVS commit itself could be done in two ways: using a commit-then-review process that was typically applied in development versions of the system, versus a post-for-review-first process in which the patch was posted to the developer mailing list for prior review and approval before committing it, as would normally be done if it were a stable release being modified (Mockus et al., 2002). In either case, the modifications, including both the patch and the CVS commit log, would be automatically sent to the developer mailing list. Not only did the core developers review all such changes as posted as a matter of standard practice, but the changes were also open to review by anyone who followed the developer mailing list. The Apache Group determined when a new
stable release of the product was to be distributed. An experienced core developer who volunteered to act as the release manager would, as part of that role, identify any critical open problems and shepherd their resolution, including changes proposed from outside the core developer group. The release manager also controlled access to the repository at this stage, so any development code that was supposed to be frozen was indeed left alone.
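The two commit paths just described can be reduced to a single rule of thumb. The sketch below is our own modeling, not project code (the enum and function name are invented): review precedes commit only when the target is a stable release.

```python
# Sketch of the two commit paths reported by Mockus et al. (2002):
# development code may be committed first and reviewed afterward, while
# stable-release changes are posted for review before commit.
# The enum and function are our own modeling, not project artifacts.
from enum import Enum

class Branch(Enum):
    DEVELOPMENT = "development"
    STABLE_RELEASE = "stable release"

def review_precedes_commit(branch: Branch) -> bool:
    """Return True when a patch must be reviewed on the list before commit."""
    return branch is Branch.STABLE_RELEASE

assert not review_precedes_commit(Branch.DEVELOPMENT)   # commit-then-review
assert review_precedes_commit(Branch.STABLE_RELEASE)    # review-then-commit
```

Because CVS made any commit easy to revert, the cheaper commit-then-review path was safe for development code; the stricter path was reserved for releases where mistakes were costly.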
The development group achieved effective coordination in a variety of ways. A key software architecture requirement was that the basic server functionality was intentionally kept limited in scope, with peripheral projects providing added functionality by interfacing with the core server through well-defined interfaces. Thus the software architecture itself automatically helped ensure proper coordination, without significant additional effort required by the developer group, since the interface itself enforced the necessary coordination. External developers who wanted to add functionality to the core Apache server were thereby accommodated by a "stable, asymmetrically-controlled interface" (Mockus et al., 2002). The presence of this API has been a key feature in the success of Apache since it greatly facilitates expanding the system's functionality by the addition of new modules. On the other hand, coordination of development within the core area was handled effectively by the simple means described previously, informally supported by the small core group's intimate knowledge of the expertise of their own members. The relative absence of formal mechanisms for approval or permission to commit code made the process speedy but maintained high quality. Bug reporting and repair were also simple in terms of coordination. For example, bug reporting was done independently by volunteers. It entailed no dependencies that could lead to coordination conflicts, since these reports themselves did not change code, though they could lead to changes in code. Similarly, most bug fixes themselves were relatively independent of one another, with the primary effort expended in tracking down the bug, so that once again coordination among members was not a major issue.
Statistical Profile of Apache Development
Well-informed and detailed empirical studies of projects on the scale of Apache are uncommon. Therefore, it is instructive to elaborate on the statistical analysis and interpretations provided in Mockus et al. (2002). The credibility of their analysis is bolstered by the extensive commercial software development experience of its authors and the intimate familiarity of second author Roy Fielding with Apache. The study analyzes and compares the Apache and Netscape Mozilla projects based on data derived from sources like the developer e-mail lists, CVS archives, bug-reporting systems, and extensive interviews
with project participants. We will focus on the results for the Apache project.
(Another worthwhile study of the structure of open projects is by Holck and Jorgensen (2005), which compares the Mozilla and FreeBSD projects. It pays special attention to how the projects handle releases and contributions as well as their internal testing environments.)
The Apache server had about 80,000 lines of source code by 2000 (Wheeler, 2000), with approximately 400 people contributing code through 2001 (the time frame examined in Mockus et al. (2002)). The Mockus study distinguishes two kinds of Apache contributions:

• code fixes made in response to reported problems
• code submissions intended to implement new system functionality

Rounded numbers are used in the following statistical summaries for clarity. The summary statistics for Apache code contributions are as follows:
• Two hundred people contributed to 700 code fixes.
• Two hundred fifty people contributed to 7,000 code submissions.

The summary error statistics are as follows:

• Three thousand people submitted 4,000 problem reports, most triggering no change to the code base, because they either lacked detail or the defect had been fixed or was insignificant.
• Four hundred fifty people submitted the 600 bug reports that led to actual changes to the code.
The 15 most productive developers made 85% of implementation changes,
though for defect repair these top 15 developers made only 65% of the code
fixes. A narrow pool of contributors dominated code submissions, with only 4
developers per 100 code submissions versus 25 developers per 100 code fixes.
Thus “almost all new functionality is implemented and maintained by the core
group” (Mockus et al., 2002, p. 322).
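The concentration claim above is simple arithmetic on the summary counts. A quick sketch recomputes developers per 100 changes from the rounded figures quoted earlier; since the inputs are rounded, the results only approximate the cited 4 and 25:

```javascript
// Recompute developers-per-100-changes from the rounded Apache counts
// quoted above (Mockus et al., 2002). The inputs are rounded, so the
// results only approximate the cited "4 per 100" and "25 per 100".
const codeFixes = { contributors: 200, changes: 700 };
const codeSubmissions = { contributors: 250, changes: 7000 };

// Developers per 100 changes: a rough measure of how concentrated the work is.
const perHundred = ({ contributors, changes }) => (100 * contributors) / changes;

console.log(perHundred(codeSubmissions).toFixed(1)); // prints 3.6
console.log(perHundred(codeFixes).toFixed(1));       // prints 28.6
```

The order-of-magnitude gap between the two ratios is the point: new functionality was far more concentrated in the core group than defect repair was.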
The Apache core developers compared favorably with those in reference
commercial projects, showing considerably higher levels of productivity and
handling more modification requests than commercial developers despite the
part-time, voluntary nature of their participation. The problem reporting
responsibilities usually handled by test and customer support teams in proprietary
projects were managed in Apache by thousands of volunteers. While the 15
most productive developers submitted only 5% of the 4,000 problem reports,
there were over 2,500 mostly noncore developers who each submitted at least
one problem report, thus dispersing the traditional role of system tester over
many participants. The response time for problem reports was striking: half the
problems reported were solved in a day, 75% in a month, and 90% in 4 months,
a testimony to the efficiency of the organization of the project and the talent of
the volunteers. Of course, the data used in such studies is invariably subject to
interpretation. For example, metrics like the productivity of groups can be affected
by the procedures used to attribute credit, while response rates reported could
be affected by details like when bug reports were officially entered into the
tracking system.
The social and motivational framework under which the developers oper-
ated was an important element in the success of the Apache project. The
meritocratic process that enables candidate developers to achieve core developer
status requires persistence, demonstrated responsibility to the established core
team, and exceptionally high technical capability. The motivational structure also
differs significantly from commercial environments, where both the project worked
on and its component tasks are assigned by management, not freely chosen by the
developer. From this viewpoint, it seems unsurprising that the passionate, vol-
untary interest of the project developers should be a strong factor contributing
to its success. The stakeholder base for Apache is now sufficiently broad that
changes to the system must be conservatively vetted, so services to end users
are not disrupted. For this reason, Ye et al. (2005) characterize it as now being
a service-oriented open source project.
The Mockus et al. (2002) study makes several tentative conjectures about
the development characteristics of open projects based on their data for Apache
and Netscape Mozilla development (prior to 2001). For example, they suggest
that for projects of Apache’s size (as opposed to the much larger Netscape
Mozilla project), a small core of developers creates most of the code and func-
tionality and so is able to coordinate its efforts in a straightforward way
even when several developers are working on overlapping parts of the code
base. In contrast, in larger development projects like Netscape Mozilla, stricter
practices for code ownership, work group separation, and CVS commit author-
ity have to be enforced to balance the potential for disorder against excessive
communication requirements. Another simple pattern is that the sizes of the par-
ticipant categories appear to differ significantly: the number of core developers
is smaller by an order of magnitude than the number of participants who submit
bug fixes, which in turn is smaller by an order of magnitude than the number
of participants who report problems and defects. The defect density for these
open source projects was lower than that of the compared proprietary projects
that had been only feature tested. However, the study urges caution in the
interpretation of this result since it does not address postrelease bug density and
may partly reflect the fact that the developers in such projects tend to have strong
domain expertise as end users of the product being developed. The study concluded
that the open projects considered exhibited “very rapid responses to customer
problems” (Mockus et al., 2002).
Reusing Open Source Creations
One of the key objectives of the open source movement is to build a reusable
public commons of software that is universally available and widely applica-
ble. Brian Behlendorf of the Apache (and later Subversion) project has some
valuable insights about how to apply the creations of open development to
circumstances beyond those originally envisioned. He identifies some general
conditions he believes are necessary for other applications to benefit from open
products and libraries when they are applied not just in environments they were
originally designed for but in updated versions of those environments or when
they have to be integrated with other applications (Anderson, 2004). There are
three key ingredients that have to come together to effectively support the reuse
of open software:
1. access to the source code,
2. access to the context in which the code was developed, and
3. access to the communities that developed and use the code.
One might call this the 3AC model of what open source history has taught
us about software reuse. The availability of the code for a program or soft-
ware library is the first essential ingredient, not just the availability of stable
APIs like in a COTS (Commercial Off-the-Shelf) environment. Open source
obviously provides the source code. Source code is required for effective reuse
of software because any new application or infrastructure context, like a new
operating system, will necessitate understanding the code because embedding
software components in new contexts will “inevitably ... trigger some defect
that the original developers didn’t know existed” (Anderson, 2004). However,
if you are trying to improve the code, you also need access to the context of
its development. In open source projects this can be obtained from a variety
of sources including e-mail archives and snapshots from the development tree
that provide the history of the project and its development artifacts. This way
you can find out if the problems you identify or questions you have in mind
have already been asked and answered. Finally, in order to understand how the
software was built and why it was designed the way it was, you also need to be
able to interact with the community of people who developed the product, as
well as the community of other users who may also be trying to reuse it. This
kind of community contact information is also available in open source that
has mailing lists for developers and users, as well as project announcements
that can be scavenged for information about how the project developed. That’s
how you get a truly universal library of reusable software: open code, the
developmental context, and the community that made the software and uses it.
References
Anderson, T. (2004). Behlendorf on Open Source. Interview with Brian Behlendorf.
http://www.itwriting.com/behlendorf1.php. Accessed November 29, 2006.
Andreessen, M. (1993). NCSA Mosaic Technical Summary. NCSA, University of Illi-
nois. Accessed via Google Scholar, November 29, 2006.
Apache.pdf. (2006). World Wide Web. http://www.governingwithcode.org. Accessed
January 10, 2007.
Berglund, Y., Morrison, A., Wilson, R., and Wynne, M. (2004). An Investigation
into Free eBooks. Oxford University. http://ahds.ac.uk/litlangling/ebooks/report/
FreeEbooks.html. Accessed December 16, 2006.
Berners-Lee, T. (2006). Frequently Asked Questions. www.w3.org/People/Berners-
Lee/FAQ.html. Accessed January 10, 2007.
Berners-Lee, T. and Fischetti, M. (2000). Weaving the Web – The Original Design and
Ultimate Destiny of the World Wide Web by Its Inventor. Harper, San Francisco.
Bradner, S. (1999). The Internet Engineering Task Force. In: Open Sources: Voices
from the Open Source Revolution, M. Stone, S. Ockman, and C. DiBona (editors).
O’Reilly Media, Sebastopol, CA, 47–52.
Clark, D. (1992). A Cloudy Crystal Ball: Visions of the Future. Plenary presentation at
the 24th meeting of the Internet Engineering Task Force, Cambridge, MA, July 13–17,
1992. Slides from this presentation are available at: http://ietf20.isoc.org/videos/
future_ietf_92.pdf. Accessed January 10, 2007.
Fielding, R.T. (1999). Shared Leadership in the Apache Project. Communications of the
ACM, 42(4), 42–43.
Gaines, B. and Shaw, M. (1996). Implementing the Learning Web. In: Proceedings of
EDMEDIA ’96: World Conference on Educational Multimedia and Hypermedia.
Association for the Advancement of Computing in Education, Charlottesville, VA.
http://pages.cpsc.ucalgary.ca/gaines/reports/LW/EM96Tools/index.html.
Accessed November 29, 2006.
Holck, J. and Jorgensen, N. (2005). Do Not Check in on Red: Control Meets Anarchy in
Two Open Source Projects. In: Free/Open Source Software Development, S. Koch
(editor). Idea Group Publishing, Hershey, PA, 1–26.
Kesan, J. and Shah, R. (2002). Shaping Code. http://opensource.mit.edu/shah.pdf.
Accessed November 29, 2006.
Leonard, A. (1997). Apache’s Free Software Warriors. Salon Magazine. http://
archive.salon.com/21st/feature/1997/11/cov_20feature.html. Accessed
November 29, 2006.
Lessig, L. (1999). Code and Other Laws of Cyberspace. Basic Books, New York.
McCool, R., Fielding, R.T., and Behlendorf, B. (1999). How the Web Was Won. http://
www.linux-mag.com/1999-06/apache_01.html. Accessed November 29, 2006.
Mockus, A., Fielding, R.T., and Herbsleb, J.D. (2002). Two Case Studies of Open Source
Development: Apache and Mozilla. ACM Transactions on Software Engineering
and Methodology, 11(3), 309–346.
Moody, G. (2001). Rebel Code. Penguin Press, New York.
Osterlie, T. (2003). The User-Developer Convergence: Innovation and Software Systems
Development in the Apache Project. Master’s Thesis, Norwegian University of
Science and Technology.
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.
firstmonday.dk/issues/issue3_3/raymond/index.html. Ongoing version: http://
www.catb.org/esr/writings/cathedral-bazaar/. Accessed December 3, 2006.
Russell, A. (2006). “Rough Consensus and Running Code” and the Internet-OSI Stan-
dards War. IEEE Annals of the History of Computing, 28(3), 48–61.
Wheeler, D. (2000). Estimating Linux’s Size. http://www.dwheeler.com/sloc/redhat71-
v1/redhat71sloc.html. Accessed November 29, 2006.
Ye, Y., Nakakoji, K., Yamamoto, Y., and Kishida, K. (2005). The Co-Evolution of
Systems and Communities. In: Free/Open Source Software Development, S. Koch
(editor). Idea Group Publishing, Hershey, PA, 59–83.
Zakon, R. (1993/2006). Hobbes’ Internet Timeline v8.2. http://www.zakon.org/robert/
internet/timeline/. Accessed January 5, 2007.
2.2 The Browsers
Browsers have played a critical role in the Internet’s incredibly rapid expan-
sion. They represent the face of the Internet for most users and the key means
for accessing its capabilities. Three open source browsers have been most
prominent: Mosaic, Netscape Navigator, and Firefox. The proprietary Internet
Explorer browser, which is based on Mosaic, coevolved and still dominates
the market. The development of these browsers is an intriguing but archetypal
tale of open source development. It combines elements of academic provenance,
proprietary code, open source code and licenses, technological innovations, cor-
porate battles for market share, creative software distribution and marketing,
open technology standards, and open community bases of volunteer developers
and users. The story starts with the revolutionary Mosaic browser at the begin-
ning of the Internet revolution, moves through the development of Netscape’s
corporately sponsored browser and its browser war with Internet Explorer, and
finally on to Netscape’s free descendant, Firefox.
2.2.1 Mosaic
The famed Mosaic Web browser was instrumental in creating the Internet boom.
Mosaic was developed at the NCSA starting in 1993. The University of Illi-
nois student Marc Andreessen (who was the lead agent in the initiative) and
NCSA full-time employee and brilliant programmer Eric Bina were the chief
developers. Andreessen wanted to make a simple, intuitive navigational tool
that would let ordinary users explore the new WWW more easily and let them
browse through the data available on the Web. Andreessen and Bina (1994)
identified three key design decisions. The tool had to be easy to use, like a word
processing Graphical User Interface (GUI) application. It had to be kept simple
by divorcing page editing from presentation. (The original Berners-Lee browser
had included publication features that complicated its use.) The software also
had to accommodate images in such a way that both text and embedded images
could appear in the same HTML page or browser window. For this, Andreessen
had to introduce an HTML image tag, even though the standards for such a tag
had not yet been settled. Significantly, Mosaic also introduced forms that users
could fill out. It took a short six weeks to write the original program of 9,000
lines of code (Wagner, 2002). Mosaic transcended the capabilities of previous
text-oriented tools like FTP for accessing information on the Internet. Instead,
it replaced them with a multimedia GUI tool for displaying content, including
the appeal of clickable hyperlinks. Mosaic was initially available for Unix but
was quickly ported to PCs and Macs. It rapidly became the killer app for Web
access of the mid-1990s.
Mosaic’s success was not merely a technical accomplishment. Andreessen’s
management of the project was nurturing and attentive. He was an activist
communicator and listener, one of the top participants in www-talk in
1993 (NCSAmosaic.pdf, 2006). According to Web founder Berners-Lee,
Andreessen’s skills in “customer relations” were decisive in the enhancement
of Mosaic: “You’d send him a bug [problem] report and then two hours later
he’d mail you a fix” (quoted in Gillies and Cailliau (2000, p. 240)). Mosaic’s
popularity had a remarkable effect: it caused an explosion in Web traffic. Each
increase in traffic in turn had a feedback effect, attracting more content to the
Internet, which in turn increased traffic even further. Mosaic had over 2 million
downloads in its first year, and by mid-1995 it was used on over 80% of the
computers that were connected to the Internet. An article in the New York Times
by John Markoff (1993) appreciated the implications of the new software for the
Internet. The article ballyhooed the killer app status of Mosaic. However, it did
not allude to the software’s developers by name but only to the NCSA director
Larry Smarr. This slight reflected the institutional provenance of the tool and the
attitude of NCSA: Mosaic was a product of NCSA, not of individuals, and the
University of Illinois expected it to stay that way. We refer the interested reader
to Gillies and Cailliau (2000) and NCSAmosaic.pdf (2006) for more details.
The Mosaic license was open but not GPL’d and had different provisions
for commercial versus noncommercial users. Refer to http://www.socs.uts.
edu.au/MosaicDocs-old/copyright.html (accessed January 10, 2007) for the
full terms of the NCSA Mosaic license. The browser was free of charge for
noncommercial use, which meant academic, research, or internal business pur-
poses, with the source code provided for the Unix version. Noncommercial
licensees were allowed to not only develop the software but redistribute deriva-
tive works. These redistributions were subject to a proviso: the derivative prod-
ucts had to be identified as different from the original Mosaic code and there was
to be no charge for the derivative product. The terms for commercial licensees
were different; for commercial distribution of a modified product, license terms
had to be separately negotiated with NCSA. NCSA assigned all the commer-
cial rights for Mosaic to Spyglass in late 1994 (Raggett et al., 1998). By 1995,
Microsoft had licensed Mosaic as the basis for its own early browser Internet
Explorer, but by that point Netscape Navigator dominated the browser market.
Ironically, however, to this day the Help > About tab on the once again dom-
inant Internet Explorer has as its first entry “based on NCSA Mosaic. NCSA
Mosaic(TM); was developed at the National Center for Supercomputing Appli-
cations at the University of Illinois at Urbana–Champaign.”
Beyond the revolutionary impact of its functionality on the growth of the
Internet, the Mosaic browser also expedited the Web’s expansion because of the
public access it provided to HTML, which was essentially an open technology.
Mosaic inherited the View Source capability of Tim Berners-Lee’s browser.
This had a significant side effect since it allowed anyone to see the HTML code
for a page and imitate it. As Tim O’Reilly (2000) astutely observed, this simple
capability was “absolutely key to the explosive spread of the Web. Barriers to
entry for ‘amateurs’ were low, because anyone could look ‘over the shoulder’
of anyone else producing a web page.”
2.2.2 Netscape
Software talent is portable. Given the uncompromising, albeit by the book, insti-
tutional arrogation of Mosaic by the University of Illinois, there was no point
in Andreessen staying with NCSA. After graduating in 1993, he soon became
one of the founders of the new Netscape Corporation at the invitation of the
legendary Jim Clark, founder of Silicon Graphics. Netscape was Andreessen’s
next spectacular success.
Andreessen was now more than ever a man with a mission. At Netscape,
he led a team of former students from NCSA, with the mission “to develop an
independent browser better than Mosaic, i.e. Netscape Navigator.” They knew
that the new browser’s code had to be completely independent of the original
Mosaic browser in order to avoid future legal conflicts with NCSA. As it turned
out, a settlement with the University of Illinois amounting to $3 million had
to be made in any case (Berners-Lee, 1999). The internal code name for the
first Netscape browser was Mozilla, a feisty pun combining the words Mosaic
(the browser) and Godzilla (the movie monster) that was intended to connote
an application that would kill the then dominant Mosaic browser in terms of
popularity. (The page at http://sillydog.org/netscape/kb/netscapemozilla.php,
accessed January 10, 2007, provides a helpful description of the sometimes
confusing use of the name Mozilla.) The development team worked feverishly.
As one member of the group put it, “a lot of times, people were there straight
forty-eight hours, just coding. I’ve never seen anything like it. ... But they were
driven by this vision [of beating the original Mosaic]” (Reid, 1997). The sense
of pride and victory is even more pungent in a well-known postmortem by
team member Jamie Zawinski (1999) that would come later, after Netscape’s
unhappy browser war with Internet Explorer:
... we were out to change the world. And we did that. Without us, the change
probably would have happened anyway ... But we were the ones who actually did
it. When you see URLs on grocery bags, on billboards, on the sides of trucks, at the
end of movie credits just after the studio logos – that was us, we did that. We put
the Internet in the hands of normal people. We kick-started a new communications
medium. We changed the world.
Netscape’s pricing policy was based on a quest for ubiquity. Andreessen’s
belief was that if they dominated market share, the profits would follow from
side effects. According to Reid (1997), Andreessen thought,
That was the way to get the company jump-started, because that just gives you
essentially a broad platform to build off of. It’s basically a Microsoft lesson, right?
If you get ubiquity, you have a lot of options, a lot of ways to benefit from that. You
can get paid by the product you are ubiquitous on, but you can also get paid on
products that benefit as a result. One of the fundamental lessons is that market share
now equals revenue later, and if you don’t have market share now, you are not
going to have revenue later. Another fundamental lesson is that whoever gets the
volume does win in the end. Just plain wins.
Netscape bet on the side effects of browser momentum. It basically gave
the browser away. However, it sold the baseline and commercial servers it
developed, originally pricing them at $1,500 and $5,000, respectively. The free
browser was an intentional business marketing strategy designed to make the
product ubiquitous so that profits could then be made off symbiotic effects like
advertising and selling servers (Reid, 1997). In principle, only academic use of
the browser was free and all others were supposed to pay $39.00. But in practice,
copies were just downloaded for free during an unenforced trial period. How-
ever, although the product was effectively free of charge, it was not in any sense
free software or open source. The original Netscape software license was pro-
prietary (http://www.sc.ucl.ac.be/misc/LICENSE.html, accessed January 10,
2007). It also explicitly prohibited disassembly, decompiling or any reverse
engineering of the binary distribution, or the creation of any derivative works.
Netscape’s strategy paid off handsomely and quickly. Originally posted for
download on October 13, 1994, Netscape quickly dominated the browser mar-
ket. This downloaded distribution of the product was itself a very important
Netscape innovation and accelerated its spread. The aggressive introduction of
new HTML tags by the Netscape developers was also seductive to Web design-
ers who rapidly incorporated them into their Web pages (Raggett et al., 1998;
Griffin, 2000). Since Netscape was the only browser that could read the new
tags, the Web page designers would include a note on their page that it was best
viewed in Netscape. They would then provide a link to where the free download
could be obtained, so Netscape spread like a virus. A major technical advan-
tage of the initial Netscape browser over Mosaic was that Netscape displayed
images as they were received from embedded HTTP requests, rather than wait-
ing for all the images referred to in a retrieved HTML page to be downloaded
before the browser rendered them. It also introduced innovations like cookies
and even more importantly the new scripting language JavaScript, which was
specifically designed for the browser environment and made pages much more
dynamic (Andreessen, 1998; Eich, 1998). Thus, brash technology meshed with
attractiveness, pricing, and distribution to make Netscape a juggernaut. The
company went public on August 9, 1995, and Andreessen and Clark became
millionaires and billionaires, respectively. By 1996 Netscape had penetrated
75% of the market. It was eventually bought by AOL for $10 billion.
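To make the cookie innovation concrete: a Netscape-style cookie is just a name=value pair that the browser stores and echoes back to the server on later requests, with stored pairs exposed as a single “name=value; name2=value2” string. A minimal sketch of parsing that string (the parseCookies helper and the sample values are illustrative assumptions, not part of any browser API):

```javascript
// Minimal sketch: parse a "name=value; name2=value2" cookie string into
// an object. parseCookies is a hypothetical helper; in a real browser the
// raw string would come from document.cookie.
function parseCookies(cookieString) {
  const jar = {};
  for (const pair of cookieString.split("; ")) {
    const eq = pair.indexOf("=");
    if (eq > 0) jar[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return jar;
}

const jar = parseCookies("user=alice; theme=dark"); // sample values
console.log(jar.user);  // prints alice
console.log(jar.theme); // prints dark
```

State like this, echoed back on each request, is what first let servers recognize returning visitors, which is why the text ranks cookies among Netscape’s influential innovations.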
Given this initial success, how did it happen that within a few years of
Netscape’s triumphant conquest of the browser market, Internet Explorer, the
proprietary Microsoft browser, which was deeply rooted in Mosaic, became
the dominant browser? It was really Microsoft’s deep pockets that got the
better of Netscape in the so-called browser wars. Indeed, Microsoft called
its marketing campaigns jihads (Lohr, 1999). Microsoft destroyed Netscape’s
browser market by piggybacking on the pervasive use of its operating system
on PCs. It bundled Internet Explorer for free with every copy of Windows sold,
despite the fact that it had cost hundreds of millions of dollars to develop.
With its huge cash reservoirs, Microsoft was able to fund development that
incrementally improved IE until step by step it became equivalent in features
and reliability to Netscape. As time went by, the attraction of downloading
Netscape vanished, as the products became comparable. Netscape became a
victim of redundancy.
Many of Microsoft’s practices were viewed as monopolistic and predatory,
resulting in its being prosecuted by the federal government for illegally manip-
ulating the software market. Government prosecutor David Boies claimed that
Microsoft was trying to leverage its de facto monopoly in Windows to increase
its market share for browsers and stifle competition (Lea, 1998). A settlement,
which many considered as a mere slap on the wrist to Microsoft, was reached
with the Justice Department in late 2001 (Kollar-Kotelly, 2002). In any case, the
original Netscape Navigator browser’s market share had fallen steadily. From
a peak of over 80% in 1996, it dropped to 70% in 1997, 50% in 1998, 20% in
1999, to a little over 10% in 2000. Microsoft’s IE rose in tandem as Netscape
fell, almost saturating the market by 2002, prior to Firefox’s emergence.
In a last-ditch effort to rebound, Netscape decided that perhaps the proprietary
IE dragon could be beaten by a reformed, more open Netscape. So early in 1998
Netscape responded to Internet Explorer by going open source – sort of. The
company stated that it was strongly influenced in this strategy by the ideas
expressed in Raymond’s famous “The Cathedral and the Bazaar” paper (1998).
Refer to the e-mail from Netscape to Raymond in the latter’s epilogue to his
paper, updated as per its revision history in 1998 or later. Netscape thought
it could recover from the marketing debacle inflicted by the newly updated
releases of Internet Explorer by exploiting the benefits of open source-style
collaborative development. So it decided to release the browser source code as
open source.
The new release was done under the terms of the Mozilla Public License
(MPL). The sponsor was the newly established Mozilla Organization whose
mission would be to develop open source Internet software products. The intent
of the MPL license and the Mozilla Organization was to promote open source as
a means of encouraging innovation. Consistent with the general sense of copy-
left, distributed modifications to any preexisting source code files obtained under
an MPL open source license also had to be disclosed under the terms of the MPL
license. However, completely new source code files, which a licensee developed,
were not restricted or covered by any of the terms of the MPL. Furthermore, this
remained the case even when the additions or changes were referenced by mod-
ifications made in the MPL-licensed section of the source code. In comparison
with some existing open source licenses, the MPL license had “more copyleft
(characteristics) than the BSD family of licenses, which have no copyleft at all,
but less than the LGPL or the GPL” licenses (http://www.mozilla.org/MPL/mpl-
faq.html). The Netscape browser itself (post-1998) contained both types of files,
closed and open. It included proprietary (closed source) files that were not sub-
ject to the MPL conditions and were available only in binary. But the release
also included MPL files from the Mozilla project, which were now open source.
Although Netscape’s market share still declined, out of its ashes would come
something new and vital. The versions of Netscape released after 2000 con-
tained a new browser engine named Gecko, which was responsible for rendering
and laying out the content of Web pages. This was released under the MPL
license and was open source. But, the open source releases of Netscape were
notvery successful, partly because of a complicated distribution package. The
Netscape project was finally shut down by then owner AOL in 2003. How-
ever, a small, nonprofit, independent, open source development organization
called the Mozilla Foundation, largely self-funded through contributions, was
set up by AOL to independently continue browser development. The purpose
ofthe foundation was to provideorganizational, legal, and financial supportfor
the Mozilla open source software project. Its mission was to preserve choice
and promote innovation on the Internet (mozilla.org/foundation/). Out of this
matrix, the Firefox browser would rise phoenix-like and open source from the
ashesof Netscape. Thus“a descendant of NetscapeNavigator (was) nowpoised
toavenge Netscape’s defeat at the hands of Microsoft” (McHugh, 2005).
2.2.3 Firefox
The Mozilla Foundation development team that produced Firefox began by
refocusing on the basic needs of a browser user. It scrapped the overly complex
Netscape development plans and set itself the limited objective of making a
simple but effective, user-oriented browser. The team took the available core
code from the Netscape project and used that as a basis for a more streamlined
browser they thought would be attractive. In the process they modified the
original Netscape Gecko browser layout engine to create a browser that was
also significantly faster. The eventual outcome was Firefox, a cross-platform
open source browser released at the end of 2004 by the Mozilla Foundation
that has proven explosively popular. Firefox is now multiply licensed under
the GPL, LGPL, or MPL at the developer’s choice. It also has an End User
License Agreement that has some copyright and trademark restrictions for the
downloaded binaries needed by ordinary users.
Firefox has been a true mass-market success. It is unique as an open source
application because the number of its direct end users is potentially in the hun-
dreds of millions. Previous successful open source applications like Linux and
Apache had been intended for technically proficient users and addressed (at least
initially in the case of Linux) a smaller end-user market, while desktop environ-
ments like GNOME and KDE are intended for a Linux environment. Firefox’s
market advantages include being portable to Windows, Linux, and Apple.
This increases its potential audience vis-à-vis Internet Explorer. It also closely
adheres to the W3C standards that Internet Explorer has viewed as optional. Like
the original Netscape browser, Firefox burst onto the browser scene, quickly
capturing tens of millions of downloads: 10 million in its first month, 25 million
within 100 days of publication, and 100 million in less than a year. Fire-
fox 1.5 had 2 million downloads within 2 days of publication in November
2005. It rapidly gained prominence in the browser market, capturing by some
estimates 25% of the market (w3schools.com/browsers/browsers_stats.asp,
accessed December 6, 2006) within a year or so of its initial release, though
sources like the Net Applications Market Share survey show significantly lower
penetration, under 15% in late 2006 (http://marketshare.hitslink.com, accessed
December 6, 2006).
Microsoft’s complacency with regard to the security of Internet Explorer
serendipitously helped Firefox’s debut. In June 2004, a Russian criminal orga-
nization distributed Internet malware called Download.ject that exploited a
composite weakness jointly involving Windows IIS servers and a security vul-
nerability in Internet Explorer. Ironically, the exploited security shortcoming
in Internet Explorer was tied precisely to its tight integration with the Win-
dows operating system. This integration provided certain software advantages
to the browser but also allowed hackers to leverage their attacks (Delio, 2004).
Although the attack was countered within a few days, its occurrence highlighted
IE security holes and was widely reported in the news. US-CERT (the US Com-
puter Emergency Readiness Team) advised federal agencies at the time to use
browsers other than Internet Explorer in order to mitigate their security risks
(Delio, 2004). The negative publicity about IE vulnerabilities occurred pre-
cisely when the first stable version of Firefox appeared. This played right into
one of Firefox’s purported strengths, not just in usability but also in security,
thereby helping to establish Firefox’s appeal.
The (Mozilla) Firefox project was started by Blake Ross. Blake had been a contractor for Netscape from age 15 and already had extensive experience in debugging the Mozilla browser. The precocious Ross had become dissatisfied with the project’s direction and its feature bloat. He envisioned instead a simpler, easy-to-use browser, so he initiated the Firefox project in 2002. Experienced Netscape developer Dave Hyatt partnered with Ross, bringing with him a deep knowledge of the critical Mozilla code base. Ben Goodger was engaged to participate because of his Web site’s “thorough critique of the Mozilla browser” (Connor, 2006b). He subsequently became lead Firefox engineer when Ross enrolled in Stanford at age 19. Firefox was released in late 2004 under Goodger, who was also instrumental in the platform’s important add-on architecture (Mook, 2004). Although its development depended on the extensive Netscape code base, it was an “extremely small team of committed programmers” who developed Firefox (Krishnamurthy, 2005a). The core project group currently has six members: the aforementioned Ross, Hyatt, and Goodger, as well as Brian Ryner, Vladimir Vukicevic, and Mike Connor.
Key factors in the success of Firefox included its user design criteria, the characteristics and policies of its development team, and its unique, open community-based marketing strategy.
The project’s central design principle was “keep it simple.” Ross has used family imagery to describe the design criteria for determining which features to include in the browser. They would ask the following questions about putative features (Ross, 2005b):
Does this help mom use the web? If the answer was no, the next question was: does this help mom’s teenage son use the web? If the answer was still no, the feature was either excised entirely or (occasionally) relegated to config file access only. Otherwise, it was often moved into an isolated realm that was outside of mom’s reach but not her son’s, like the preferences window.
In the same spirit, Ross describes Firefox as being about “serving users” and contends that a window of opportunity for Firefox’s development had opened because Microsoft had irresponsibly abandoned Internet Explorer, leaving “for dead a browser that hundreds of millions of people rely on” (Ross, 2006).
The Firefox development team structure was intentionally lean, even elitist. The FAQ in the inaugural manifesto for the project explained why the development team was small by identifying the kinds of problems that handicapped progress on the original Mozilla project under Netscape after 2000: “Factors such as marketing teams, compatibility constraints, and multiple UI designers pulling in different directions have plagued the main Mozilla trunk development. We feel that fewer dependencies, faster innovation, and more freedom to experiment will lead to a better end product” (blakeross.com/firefox/README-1.1.html, accessed December 6, 2006).
The lead developers wanted to keep the development group’s structure simple, not just the browser’s design. According to the manifesto, CVS access was “restricted to a very small team. We’ll grow as needed, based on reputation and meritorious hacks” (README-1.1.html). Thus in typical open source style, admission was meritocratic. To the question “how do I get involved,” the blunt answer was “by invitation. This is a meritocracy – those who gain the respect of those in the group will be invited to join the group.” As far as getting help from participants who wanted to chime in about bugs they had detected, the FAQ was equally blunt. To the question “where do I file bugs,” the answer was “you don’t. We are not soliciting input at this time. See Q2.” Of course the project was open, so you could get a copy of the source code from the Mozilla CVS tree. Despite these restrictions, the list of credited participants in the 1.0.4 version included about 80 individuals, which is a significant base of recognized contributors. You can refer to the Help > About Firefox button in the browser for the current credits list. Subsequent documents elaborated on how to participate
in the project in a more nuanced and inclusive way but without changing the underlying tough standards (Ross, 2005a).
The study by Krishnamurthy (2005a) describes the project as a “closed-door open source project,” a characterization not intended to be pejorative. It analyzes the logistic and organizational motivations for and consequences of enforcing tight standards for participating in development. Overly restrictive control of entry to participation in an open project can have negative ramifications for the long-term well-being of the project. Indeed, Firefox developer Mike Connor complained vocally at one point that “in nearly three years we haven’t built up a community of hackers, and now I think we’re in trouble. Of the six people who can actually review in Firefox, four are AWOL, and one doesn’t do a lot of reviews” (Connor, 2006a). However, a subsequent blog post by Connor described the ongoing commitment of the major Firefox developers and the number of participants in Mozilla platform projects more positively, including the presence of many corporate-sponsored “hackers” (Connor, 2006b).
Although the core development team was small and initially not solicitous to potential code contributors, the project made an intensive effort to create an open community support base of users and boosters. The site http://www.mozilla.org/ is used to support product distribution. A marketing site at www.spreadfirefox.com was set up where volunteers were organized to “spread the word” about Firefox in various ways, a key objective of the promotional campaign being to get end users to switch from Internet Explorer. The site www.defendthefox.com was established to put “pressure on sites that were incompatible with Firefox. Users could visit it and identify web sites that did not display appropriately when Firefox was used as the browser” (Krishnamurthy, 2005b). Although Firefox was open source, the notion that large numbers of developers would be participating in its development was mistaken; the participants were primarily involved in its promotion. The combination of a complacent competitor (Internet Explorer), an energized open volunteer force organized under an effective leader, and an innovative product was instrumental in the rapid success of Firefox (Krishnamurthy, 2005b). It also benefited from strong public recognition, like being named PC World Product of the Year 2005.
There are a number of characteristics on which Firefox has been claimed to be superior and perceptions that have helped make it popular, including having better security, the availability of many user-developed extensions, portability, compliance with Web standards, as well as accessibility and performance advantages. We will briefly examine these claims.
Open source Firefox is arguably more secure than proprietary Internet Explorer. For example, the independent computer security tracking firm Secunia’s (Secunia.com) vulnerability reports for 2003–2006 identify almost 90
security advisories for IE versus slightly more than 30 for Firefox. Furthermore, about 15% of the IE advisories were rated as extremely critical versus only 3% for Firefox. Other related security statistics from Secunia note that as of June 2006, more than 20 of over 100 IE advisories were unpatched, with one or more of these listed as highly critical. In contrast, only 4 of the 33 Secunia advisories for Firefox were unpatched and were listed as less critical. It must be kept in mind that these security vulnerabilities fluctuate over time and there are details to the advisories that make the interpretation of the statistics ambiguous, but Firefox seems to have a security edge over Internet Explorer, at least at the present time. Aside from the Firefox security model, the fact that the browser is less closely bound to the operating system than Internet Explorer, its lack of support for known IE security exposures like ActiveX, and the public accessibility of an open source product like Firefox to ongoing scrutiny of its source code for bugs and vulnerabilities arguably bolster its security.
A significant feature of Firefox is that it allows so-called extensions to provide extra functionality. According to Firefox Help, “extensions are small add-ons to Firefox that change existing browser functionality or add new functionality.” The Firefox site contains many user-developed extensions, like the NoScript extension that uses whitelist-based preemptive blocking to allow Javascript and other plug-ins “only for trusted domains of your choice” (https://addons.mozilla.org). The extensions are easy to install and uninstall. Individuals can develop their own extensions using languages like Javascript and C++. Refer to http://developer.mozilla.org/ for a tutorial on how to build an XPCOM (Cross-Platform Component Object Model) component for Firefox. This feature helps recruit talent to further develop the product. The extension model has two important advantages. Not providing such functionalities as default features helps keep the core product lean and unbloated. It also provides an excellent venue for capitalizing on the talent and creativity of the open community. Creative developers can design and implement new add-ons. Users interested in the new functionality can easily incorporate it in their own browser. This provides the advantage of feature flexibility without feature bloat and lets users custom-tailor their own set of features. Firefox also provides a variety of accessibility features that facilitate its use by the aged and visually impaired.
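The whitelist idea behind an extension like NoScript can be illustrated with a small sketch. This is not NoScript’s actual code or API; the function and domain names here are hypothetical, and real extensions of this era hooked into the browser through XPCOM and XUL rather than standalone functions.

```javascript
// Illustrative sketch of whitelist-based preemptive blocking, the
// approach NoScript takes: scripts run only for explicitly trusted
// domains. (Hypothetical names; not NoScript's real implementation.)
const trustedDomains = new Set(["mozilla.org", "example.com"]);

function isTrusted(hostname) {
  // Trust an exact match or any subdomain of a whitelisted domain.
  for (const d of trustedDomains) {
    if (hostname === d || hostname.endsWith("." + d)) return true;
  }
  return false;
}

function shouldRunScripts(pageUrl) {
  // Default-deny: any page not on the whitelist has scripting blocked.
  return isTrusted(new URL(pageUrl).hostname);
}
```

The security benefit of this design comes from the default-deny stance: rather than trying to enumerate malicious sites, the extension blocks everything the user has not explicitly trusted.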
The relative performance of browsers in terms of speed is not easy to judge, and speed is only one aspect of performance. A fast browser compromised by serious security weaknesses is not a better browser. Useful Web sites like howtocreate.co.uk/browserSpeed.html (accessed December 6, 2006) present a mixed picture of various speed-related metrics for browsers for performance characteristics, like time to cold-start the browser, warm-start time (time to restart the browser after it has been closed), caching-retrieval speed, script speed,
and rendering tables, for browsers such as Firefox, Internet Explorer, and Safari. These statistics do not uniformly favor any one of the browsers.
HTML and Javascript
We conclude our discussion of browsers with some brief remarks about HTML and Javascript, tools that are central features of the Internet experience. Both HTML (a markup language) and Javascript (a client-side scripting language that acts as the API for an HTML document’s Document Object Model) have accessible source code. Therefore, in this rudimentary sense, they are not “closed source.” Of course, neither are they “open source” in any strict sense of the term, since, other than the visibility of the code, none of the other features of open source software come into play, from licensing characteristics, to modification and redistribution rights, to open development processes. The HTML and Javascript contents have implicit and possibly explicit copyrights and so infringement by copying may be an issue, but there are no license agreements involved in their access. Some purveyors of commercial Javascript/HTML applications do have licenses specifically for developer use, but these are not open software licenses. Despite the absence of licensing and other free software attributes, the innate visibility of the code for these components is noteworthy (see also Zittrain, 2004). O’Reilly (2004) observed, as we noted previously, that the simple “View Source” capability inherited by browsers from Berners-Lee’s original browser had the effect of reducing “barriers to entry for amateurs” and was “absolutely key to the explosive spread of the Web” because one could easily imitate the code of others.
References
Andreessen, M. (1998). Innovators of the Net: Brendan Eich and Javascript. http://cgi.netscape.com/columns/techvision/innovators_be.html. Accessed January 10, 2007.
Andreessen, M. and Bina, E. (1994). NCSA Mosaic: A Global Hypermedia System. Internet Research, 4(1), 7–17.
Berners-Lee, T. (1999). Weaving the Web. Harper, San Francisco.
Connor, M. (2006a). Myths and Clarifications. March 4. http://steelgryphon.com/blog/?p=37. Accessed December 6, 2006.
Connor, M. (2006b). Myths and Clarifications. March 11. http://steelgryphon.com/blog/?p=39. Accessed December 6, 2006.
Delio, M. (2004). Mozilla Feeds on Rival’s Woes. http://www.wired.com/news/infostructure/0,1377,64065,00.html. Accessed November 29, 2006.
Eich, B. (1998). Making Web Pages Come Alive. http://cgi.netscape.com/columns/techvision/innovators_be.html. Accessed January 10, 2007.
Gillies, J. and Cailliau, R. (2000). How the Web Was Born. Oxford University Press, Oxford.
Griffin, S. (2000). Internet Pioneers: Marc Andreessen. http://www.ibiblio.org/pioneers/andreesen.html. Accessed January 10, 2007.
Kollar-Kotelly, C. (2002). United States of America v. Microsoft Corporation. Civil Action No. 98–1232 (CKK). Final Judgment. http://www.usdoj.gov/atr/cases/f200400/200457.htm. Accessed January 10, 2007.
Krishnamurthy, S. (2005a). About Closed-Door Free/Libre/Open Source (FLOSS) Projects: Lessons from the Mozilla Firefox Developer Recruitment Approach. European Journal for the Informatics Professional, 6(3), 28–32. http://www.upgrade-cepis.org/issues/2005/3/up6-3Krishnamurthy.pdf. Accessed January 10, 2007.
Krishnamurthy, S. (2005b). The Launching of Mozilla Firefox: A Case Study in Community-Led Marketing. http://opensource.mit.edu/papers/sandeep2.pdf. Accessed November 29, 2006.
Lea, G. (1998). Prosecution Says Gates Led Plan to Crush Netscape. October 20. http://www.theregister.co.uk/1998/10/20/prosecution_says_gates_led_plan/. Accessed January 10, 2007.
Lohr, S. (1999). The Prosecution Almost Rests: Government Paints Microsoft as Monopolist and Bully. January 8. The NY Times on the Web. http://query.nytimes.com/gst/fullpage.html?sec=technology&res=9C03E6DD113EF93BA35752C0A96F958260&n=Top%2fReference%2fTimes%20Topics%2fSubjects%2fA%2fAntitrust%20Actions%20and%20Laws. Accessed January 10, 2007.
Markoff, J. (1993). A Free and Simple Computer Link. December 8. http://www.nytimes.com/library/tech/reference/120893markoff.html. Accessed January 10, 2007.
McHugh, J. (2005). The Firefox Explosion. Wired Magazine, Issue 13.02. http://www.wired.com/wired/archive/13.02/firefox.html. Accessed November 29, 2006.
Mook, N. (2004). Firefox Architect Talks IE, Future Plans. Interview with Blake Ross. November 29. http://www.betanews.com/article/Firefox_Architect_Talks_IE_Future_Plans/1101740041. Accessed December 6, 2006.
NCSAmosaic.pdf. (2006). World Wide Web. http://www.governingwithcode.org. Accessed January 10, 2007.
O’Reilly, T. (2000). Open Source: The Model for Collaboration in the Age of the Internet. O’Reilly Network. http://www.oreillynet.com/pub/a/network/2000/04/13/CFPkeynote.html?page=1. Accessed November 29, 2006.
Raggett, D., Lam, J., Alexander, I., and Kmiec, K. (1998). Raggett on HTML 4. Addison-Wesley Longman, Reading, MA.
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.firstmonday.dk/issues/issue3_3/raymond/index.html. Ongoing version: http://www.catb.org/esr/writings/cathedral-bazaar/. Accessed December 3, 2006.
Reid, R.H. (1997). Architects of the Web: 1,000 Days That Built the Future of Business. John Wiley & Sons, New York.
Ross, B. (2005a). Developer Recruitment in Firefox. January 25. http://blakeross.com/. Accessed December 6, 2006.
Ross, B. (2005b). The Firefox Religion. January 22. http://blakeross.com/. Accessed December 6, 2006.
Ross, B. (2006). How to Hear without Listening. June 6. http://blakeross.com/. Accessed December 6, 2006.
Wagner, D. (2002). “Marc Andreessen,” Jones Telecommunications and Multimedia Encyclopedia. Jones International. See also: http://www.thocp.net/biographies/andreesen_marc.htm. Accessed January 10, 2007.
Zawinski, J. (1999). Resignation and Postmortem. http://www.jwz.org/gruntle/nomo.html. Accessed November 29, 2006.
Zittrain, J. (2004). Normative Principles for Evaluating Free and Proprietary Software. University of Chicago Law Review, 71(1), 265.
2.3 Fetchmail
Eric Raymond, a well-known open source advocate, published an essay in 1998 about open source development. The essay was called “The Cathedral and The Bazaar” (Raymond, 1998). It famously contrasted the traditional model of software development with the new paradigm introduced by Linus Torvalds for Linux. Raymond compared the Linux style of development to a Bazaar. In contrast, Brooks’ classic book on software development The Mythical Man-Month (Brooks, 1995) had compared system design to building a Cathedral, a centralized understanding of design and project management. Raymond’s essay recounts the story of his own open source development project, Fetchmail, a mail utility he developed in the early 1990s. He intentionally modeled his development of the mail utility on how Linus Torvalds had handled the development of Linux. Fetchmail is now a common utility on Unix-like systems for retrieving e-mail from remote mail servers. According to the description on its project home page, it is currently a “full-featured, robust, well-documented remote-mail retrieval and forwarding utility intended to be used over on-demand TCP/IP links (such as SLIP or PPP connections). It supports every remote-mail protocol now in use on the Internet” (http://fetchmail.berlios.de/, accessed January 12, 2007).
Although Fetchmail is a notable project, it pales in scope and significance to many other open source projects. Efforts like the X Window System are orders of magnitude larger and far more fundamental in their application but receive less coverage. However, Fetchmail had a bard in Eric Raymond, and his essay has been widely influential in the open source movement. It aphoristically articulated Torvalds’ development methodology at a critical point in time and took on the status of an almost mythological description of Internet-based open source development. It also introduced the term bazaar as an image for the open style of collaboration.
Raymond structures his tale as a series of object lessons in open source design, development, and management that he learned from the Linux process and applied to his own project. The story began in 1993 when Raymond needed
a mail client that would retrieve his e-mail when he dialed up on his intermittent connection from home. Applications like this were already available and typically used a client-side application based on the POP (or POP3) Post Office Protocol. However, the first clients he tried did not handle e-mail replies properly, whence came his first Linux-derived lesson or moral: every good work of software starts by scratching a developer’s personal itch. This is a now famous aphorism in the literature on the motivations of open source developers. This motivation contrasts sharply with the workaday world of most programmers who “spend their days grinding away for pay at programs they neither need nor love. But not in the Linux world – which may explain why the average quality of software originated in the Linux community is so high” (Raymond, 1998). The lessons extend on from there and are both interesting and instructive.
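The POP protocol that these early clients spoke is a simple line-oriented command exchange. The sketch below only assembles the client-side command sequence of a typical retrieve-and-delete POP3 session, using the commands defined in RFC 1939; it is an illustration of the protocol’s shape, not Fetchmail’s code, and the function name and credentials are invented for the example.

```javascript
// Build the client-side command sequence of a simple POP3 session:
// authenticate, check the mailbox, fetch and delete each message, quit.
// (RFC 1939 commands; the session logic is deliberately simplified.)
function pop3Commands(user, pass, messageCount) {
  const cmds = ["USER " + user, "PASS " + pass, "STAT"];
  for (let i = 1; i <= messageCount; i++) {
    cmds.push("RETR " + i); // download message i
    cmds.push("DELE " + i); // mark it for deletion on the server
  }
  cmds.push("QUIT"); // commit deletions and close the session
  return cmds;
}
```

Each command is sent as one text line over the TCP connection, with the server answering +OK or -ERR; for a two-message mailbox this yields USER, PASS, STAT, RETR 1, DELE 1, RETR 2, DELE 2, QUIT.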
A defining characteristic of open source is that it lets you build on what went before. It lets you start from somewhere, not from nowhere. It is a lot easier to develop an application if you start with a development base. Linus did that with Linux. Raymond did it with his more humble application, Fetchmail. After Raymond recognized his need for an application, he did not just start off programming it ex nihilo. That would have violated what Raymond (1998) called the second lesson of open source development: “Good programmers know what to write. Great ones know what to rewrite (and reuse).” People typically think of code reuse in the context of general software engineering or object-oriented style, class/library-based implementation. But reuse is actually a quintessential characteristic and advantage of open source development. When only proprietary software is available, the source code for previous applications that a developer wants to improve or modify is, by definition, undisclosed. If the source code is not disclosed, it cannot be easily reused or modified, at least without a great deal of reverse engineering effort, which may even be a violation of the software’s licensing requirements. If a proprietary program has an API, it can be embedded in a larger application, on an as-is basis, but the hidden source itself could not be adapted. Exactly the opposite is the case in the open source world, where the source code is always disclosed by definition. Since there is plenty of disclosed source code around, it would be foolish not to try to reuse it as a point of departure for any related new development. Even if the modification is eventually thrown away or completely rewritten, it nonetheless provides an initial scaffolding for the application. Raymond did this for his e-mail client, exactly as Linus had done when he initiated Linux. Linus had not started with his own design. He started by reusing and modifying the existing Minix open source software developed by Tanenbaum. In Raymond’s case, he “went looking for an existing POP utility that was reasonably well coded, to use as a development base” (Raymond, 1998), eventually settling on an open source
e-mail client called Fetchpop. He did this intentionally, explicitly in imitation of Linus’ approach to development. Following standard open source practice, Raymond modified Fetchpop and submitted his changes to the software owner, who accepted them and released it as an updated version.
Another principle of development is “reuse,” and then reuse and rebuild again if appropriate. Fred Brooks had opined that a software developer should “plan to throw one away; you will anyhow” (Brooks, 1995). This is partly an unavoidable cognitive constraint. To really understand a problem, you have to try to solve the problem. After you’ve solved it once, then you have a better appreciation of what the actual problem was in the first place. The next time around, your solution can then be based on a more informed understanding of the issues. With this in mind, Raymond anticipated that his first solution might be only a temporary draft. So when the opportunity for improvement presented itself, he seized it. He came across another open source e-mail client by Carl Harris called Popclient. After studying it, he recognized that it was better coded than his own solution, and he sent some patches to Harris for consideration. However, as it turned out, Harris was no longer interested in the project. But he gladly ceded ownership of the software to Raymond, who took on the role of maintainer for the Popclient project in mid-1996. This episode illustrated another principle in the open source code of conduct: “When you lose interest in a program, your last duty to it is to hand it off to a competent successor” (Raymond, 1998). Responsible open source fathers don’t leave their children to be unattended orphans.
Open source development has not always been distributed collaborative development, which Raymond calls bazaar style development. He describes the Linux community as resembling “a great babbling bazaar of differing agendas and approaches . . . out of which a coherent and stable system could seemingly emerge only by a succession of miracles” (Raymond, 1998). He contrasts this with one of the longest standing open source projects, the GNU project, which had developed software the old-fashioned way, using a closed management approach with a centralized team and slow software releases. With exceptions like the GNU Emacs Lisp Library, GNU was not developed along the lines of the Linux model. Indeed, consider the sluggishness of the development of the GNU GCC Compiler, done in the traditional manner, versus the rapid development that occurred when the GCC project was bifurcated into two streams: the regular GCC development mode and a parallel “bazaar” mode of development à la Linux for what was called EGCS (Experimental GNU Compiler System) beginning in 1997. The difference in the rates of progress of the two projects was striking. The new bazaar development style for EGCS dramatically outpaced the conventional mode used for the GCC project, so much so that by
1999 the original GCC project was sunset and development was placed under the EGCS project, which almost amounted to a controlled experiment on the relative effectiveness of the bazaar and conventional methods.
In the open source and Unix tradition, users tend to be simultaneously users and developers, often expert developers or hackers. As expected, Raymond’s adopted Popclient project came with its own user base. So once again in conscious imitation of the Linux development model, he recognized that this community of interest was an enormous asset and that “given a bit of encouragement, your users will diagnose problems, suggest fixes, and help improve the code far more quickly than you could unaided” (Raymond, 1998). The development principle he followed was that “treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging” (Raymond, 1998). This process of successfully engaging the user-developer base was exactly what Linus had done so well with Linux.
The use of early and frequent software releases was another quintessential characteristic of the Linux development process. This kept the user base engaged and stimulated. Linus’ approach ran contrary to the conventional thinking about development. Traditionally, people believed that releasing premature, buggy versions of software would turn users off. Of course, in the case of system software like Linux and a user-developer base of dedicated, skilled hackers, this logic did not apply. The Linux development principle was “Release early. Release often. And listen to your customers” (Raymond, 1998). Granted that frequent releases were characteristic in the Unix tradition, Linus went far beyond this. He “cultivated his base of co-developers and leveraged the Internet for collaboration” (Raymond, 1998) to such an extent and so effectively that he scaled up the frequent release practice by an order of magnitude over what had ever been done previously. Releases sometimes came out at the unbelievable rate of more than once a day. It was no accident that the initiation of the Linux project and the burgeoning growth of the Internet were coincident, because the Internet provided both the distributed talent pool and the social interconnectivity necessary for this kind of development to happen. Raymond’s Fetchmail project intentionally followed Linus’ modus operandi, with releases almost always arriving at most at 10-day intervals, and sometimes even once a day à la Linux.
This community-participation-driven process unsurprisingly required a lot of people skills to manage properly. Again per Linus’ practice, Raymond cultivated his own beta list of tester supporters. The total number of participants in his project increased linearly from about 100 initially to around 1,500 over a 5-year period, with user-developers reaching a peak of about 300 and eventually stabilizing at around 250. During the same period the number of lines of code
grew from under 10,000 to nearly 50,000. As with many open source projects, there are excellent development statistics. For Fetchmail, see the historical and statistical overview at http://www.catb.org/esr/fetchmail/history.html. These user-developers had to be kept engaged, just as Linus had to keep his user-developers interested. Their egos had to be stroked by being adequately recognized for their contributions, and even given rapid satisfaction via the speedy releases incorporating new patches. Raymond added anyone who contacted him about Fetchmail to his beta list. Normally beta testing, where a product is given exposure to real-world users outside the development organization, would be the last round of testing of a product before its commercial release. But in the Linux model, beta testing is dispersed to the user-developers over many beta style releases, prior to the release of a stable tested product for more general users. Raymond would make “chatty announcements” to the group to keep them engaged, and he listened closely to his beta testers. As a result, from the onset he received high-quality bug reports and suggestions. He summarized the attitude with the observation that “if you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource” (Raymond, 1998). This not only requires a lot of energy and commitment on the part of the project owner, it also means the leader has to have good interpersonal and communication skills. The interpersonal skills are needed to attract people to the project and keep them happy with what’s happening. The communication skills are essential because communicating what is happening in the project is a large part of what goes on. Technical skill is a given, but personality or management skill is invariably a prominent element in these projects.
The user-developer base is critical to spotting and fixing bugs. Linus observed that the bug resolution process in Linux was typically twofold. Someone would find a bug. Someone else would understand how to fix it. An explanation for the rapidity of the debugging process is summarized in the famous adage: “Given enough eyeballs, all bugs are shallow” (Raymond, 1998). The bazaar development model appeared to parallelize debugging with a multitude of users stressing the behavior of the system in different ways. Given enough such beta testers and codevelopers in the open source support group, problems could be “characterized quickly and the fix (would be) obvious to someone.” Furthermore, the patch “contributions (were) received not from a random sample, but from people who (were) interested enough to use the software, learn about how it works, attempt to find solutions to problems they encounter(ed), and actually produce an apparently reasonable fix. Anyone who passes all these filters is highly likely to have something useful to contribute” (Raymond, 1998). On the basis of this phenomenon, not only were recognized bugs quickly resolved
P1:KAE
9780521881036c02 CUNY1180/Deek 0521 88103 6 October 1, 2007 17:16
2.3 Fetchmail 55
in Linux development, but the overall system was also relatively unbuggy, as even the
Halloween documents from Microsoft observed.
Debugging in an open source environment is extremely different from debugging
in a proprietary environment. After discussions with open developers, Raymond
analyzed in detail how the debugging process works in open source. The
key characteristic is source-code awareness. Users who do not have access to
source code tend to supply more superficial reports of bugs. They provide not
only less background information but also less “reliable recipe(s) for reproducing
the bug” (Raymond, 1998). In a closed source environment, the user-tester
is on the outside of the application looking in, in contrast to the developer
who is on the inside looking out and trying to understand what the bug report
submitted by a user-observer means. The situation is completely different in
an open source context, where the “tester and developer (are able) to develop
a shared representation grounded in the actual source code and to communicate
effectively about it” (Raymond, 1998). He observes that “most bugs, most
of the time, are easily nailed given even an incomplete but suggestive characterization
of their error conditions at source-code level (italics added). When
someone among your beta-testers can point out, ‘there’s a boundary problem
in line nnn’, or even merely ‘under conditions X, Y, and Z, this variable rolls
over’, a quick look at the offending code often suffices to pin down the exact
mode of failure and generate a fix” (Raymond, 1998).
The leader of an open source development project does not necessarily have
to be a great designer himself, but he does have to be able to recognize a great
design when someone else comes up with one. At least this is one of Raymond’s
interpretations of the Linux development process. It certainly reflects
what occurred in his own project. By a certain point, he had gone through modifications
of two preexisting open source applications: Fetchpop, where he had
participated briefly as a contributor, and Popclient, where he had taken over
as the owner and maintainer from the previous project owner. Indeed, he says
that the “biggest single payoff I got from consciously trying to emulate Linus’
methods” happened when a “user gave me this terrific idea – all I had to do
was understand the implications” (Raymond, 1998). The incident that precipitated
the revelation occurred when Harry Hochheiser sent him some code for
forwarding mail to the client SMTP port. The code made Raymond realize that
he had been trying to solve the wrong problem and that he should completely
redesign Fetchmail as what is called a Mail Transport Agent: a program that
moves mail from one machine to another. The Linux lessons he was emulating
at this point were twofold: the second best thing to having a good idea yourself
is “recognizing good ideas from your users,” and it is often the case that “the
most striking and innovative solutions come from realizing that your concept
of the problem was wrong” (Raymond, 1998). The code of the redesigned software
turned out to be both better and simpler than what he had before. At this
point it was proper to rename the project. He called it Fetchmail. Fetchmail
was now a tool that any Unix developer with a PPP (Point-to-Point Protocol)
mail connection would need, potentially a category killer that fills a need so
thoroughly that alternatives are not needed. In order to advance it to the level
of a truly great tool, Raymond listened to his users again and added some more
key features, like what is called multidrop support (which turns out to be useful
for handling mailing lists) and support for 8-bit MIME.
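The Mail Transport Agent role described above can be made concrete with a small sketch. The function below is hypothetical (the name and the simplified dialogue are ours, not Fetchmail’s actual code); it builds the SMTP command sequence an agent might use to hand a retrieved message to the local SMTP listener, which is the redesign Hochheiser’s code suggested. It assumes, for simplicity, that the From header holds a bare address.

```python
from email import message_from_string

def smtp_forward_dialogue(raw_message: str, local_user: str) -> list[str]:
    """Sketch the SMTP commands for forwarding a retrieved message
    to the local SMTP port, as a Mail Transport Agent would."""
    headers = message_from_string(raw_message)
    sender = headers.get("From", "")  # simplification: a bare address
    return [
        "HELO localhost",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{local_user}>",
        "DATA",
        raw_message + "\r\n.",  # message body, terminated by a lone dot
        "QUIT",
    ]
```

In the real protocol each command waits for a numeric server reply, and Fetchmail additionally handles multidrop delivery and error recovery, all of which this sketch omits.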
Raymond also elaborates cogently on the key preconditions for a bazaar style
of development to be possible in the first place. These include programmatic,
legal, and communication requirements. The programmatic requirements were
particular to a project, while the legal and communications infrastructure were
generic requirements for the entire phenomenon.
The Linux-style design process does not begin in a vacuum. Programmatically,
in open source development there has to be something to put on the table
before you can start having participants improve, test, debug, add features, and
so on to the product. Linus, for example, began Linux with a promising preliminary
system, which in turn had been precipitated by Tanenbaum’s earlier Minix
kernel. The same was true for Raymond’s Fetchmail, which, like Linux, had a
“strong, attractive basic design(s)” before it went public. Although the bazaar
style of development works well for testing, debugging, code improving, and
program design, one cannot actually originate a product in this forum. First of
all, there has to be a program to put on display that runs! There cannot be just a
proposal for an idea. In open source, code talks. Secondly, the running program
has to have enough appeal that it can “convince potential co-developers that it
can be evolved into something really neat in the foreseeable future” (Raymond,
1998). It may have bugs, lack key features, and have poor documentation, but
it must run and have promise. Remember that “attention is still a nonrenewable
resource” and that the interest of potential participants as well as “your
reputation is on the line” (Fogel and Bar, 2003).
Another precondition for bazaar-style open development is the existence
of an appropriate legal framework. The nondisclosure agreements of the proprietary
Unix environment would prevent this kind of freewheeling process.
Explicitly formulated and widely recognized free software principles lay the
ground for a legal milieu people can understand and depend on.
Prior to free software and Linux, the open development environment was
not only legally impeded but geographically handicapped as well. The field
already knew from extensive experience with the multidecade effort in Unix
that great software projects exploit “the attention and brainpower of entire
communities” even though coding itself remains relatively solitary. So collaborative
distributed development was a recognized model – and after all, it was
the classic way in which science had always advanced. But remote collaboration
had remained clumsy, and an effective communications infrastructure that
developers could work in was needed. The Internet had to emerge to transcend
the geographically bound developer communities at institutions like Bell Labs,
MIT, and Berkeley that did foster free interactions between individuals and
groups of highly skilled, but largely collocated, codevelopers. With the WWW
emerging, the collaborative approach represented by these traditional groups
could be detached from its geographic matrix and could even be exponentially
larger in terms of the number of people involved. At that point, one merely
needed a developer who knew “how to create an open, evolutionary context in
which feedback exploring the design space, code contributions, bug-spotting,
and other improvements come from hundreds (perhaps thousands) of people”
(Raymond, 1998).
Linux emerged when these enabling conditions were all in place. The Linux
project represented a conscious decision by Torvalds to use “the entire world as
its talent pool” (Raymond, 1998). Before the Internet and the WWW, that would
have been essentially unworkable and certainly not expeditious. Without the
legal apparatus of free and open software, the culture of development would not
have had a conceptual framework within which to develop its process. But once
these were in place, things just happened naturally. Linux, and its intentional
imitators like Fetchmail, soon followed.
Traditional project management has well-identified, legitimate concerns:
how are resources acquired, people motivated, work checked for quality, innovation
nurtured, and so on. These issues do not disappear just because the
development model changes. Raymond describes how the project management
concerns of the traditional managerial model of software development
are addressed or obviated in the bazaar model of development. Let us assume
the basic traditional project management goals are defining the project goals,
making sure details are attended to, motivating people to do what may be boring,
organizing people to maximize productivity, and marshaling resources for
the project. How are these objectives met in open source development?
To begin with, consider human resources. In open projects like Linux, the
developers were, at least initially, volunteers, self-selected on the basis of their
interest, though subsequently they may have been paid corporate employees.
Certainly at the higher levels of participation, they had meritorious development
skills, arguably typically at the 95th percentile level. Thus, these participants
brought their own resources to the project, though the project leadership had to
be effective enough to attract them in the first place and then retain them. The
open process also appears to be able to organize people very effectively despite
the distributed environment. Because participants tend to be self-selected, they
come equipped with motivation, in possible contrast to corporate organizations
based on paid employees who might rather be doing something else or
at least working on a different project. The monitoring provided in a conventional
managerial environment is implemented radically differently in an open
source setting, where it is replaced by widespread peer and expert review by project
leaders, maintainers, committers, or beta testers. In fact, in open source “decentralized
peer review trumps all the conventional methods for trying to ensure
that details don’t get skipped” (Raymond, 1998). Finally, consider the initial
definition of the project, an issue that is also directly related to the question of
innovativeness. Open projects like Linux have been criticized for chasing the
taillights of other extant projects. This is indeed one of the design techniques
that has been used in the free software movement, where part of the historical
mission has been to recreate successful platforms in free implementations
(see, for example, Bezroukov (1999) on the Halloween-I document). However, open
projects do not always imitate. For example, Scacchi (2004, p. 61) describes
how, in the creation of open requirements for game software, the specifications
“emerge as a by-product of community discourse about what its software should
or shouldn’t do . . . and solidify into retrospective software requirements.” On
the other hand, the conventional corporate model has a questionable record
of defining systems properly, it being widely believed that half to three quarters
of such developments are either aborted before completion or rejected by
users. Creative ideas ultimately come from individuals in any case, so what is
needed is an environment that recognizes and fosters such ideas, which the open
source model seems to do quite well. Furthermore, historically, universities and
research organizations have often been the source of software innovation, rather
than corporate environments.
We conclude with some comments about the bazaar metaphor and Linus’
law. To begin with, let us note that though Raymond’s seminal bazaar metaphor
is striking, every metaphor has its limitations. The imagery resonates with the
myriad of voices heard in an open development and has an appealing romantic
cachet. The term also probably resonates with Raymond’s personal libertarian
beliefs, with their eschewal of centralized control. But it lacks adequate reference
to an element essential in such development: the required dynamic, competent,
core leadership with its cathedral-like element. Of course, Raymond’s
essay clearly acknowledges this, but the bazaar metaphor does not adequately
capture it. Feller and Fitzgerald (2002, p. 160) point out that many of the
most important open source projects, from Linux and Apache to GNOME and
FreeBSD, are in fact highly structured, with a cadre of proven developers with
expertise acknowledged in the development community (p. 166). Raymond
himself underscores that a project must begin with an attractive base design,
which more often than not comes from one or perhaps a few individuals. Contributions
from a broader contributor pool may subsequently radically redefine
the original vision or prototype, but all along there is either a single individual
like Linus Torvalds or a small coherent group that adjudicates and vets these
contributions, integrating them in a disciplined way and generally steering the
ship (Feller and Fitzgerald, 2002, p. 171). The bazaar can also be a source
of distraction. Participation by “well meaning . . . (but) dangerously half clued
people with opinions – not code, opinions” (Cox, 1998) may proliferate as in
a bazaar, but this does not advance the ship’s voyage. Such caveats aside, it is
also indisputable that some of the unknown voices emanating from the bazaar
may ultimately prove invaluable, even if this is not obvious at first. As Alan
Cox observes, there are “plenty of people who given a little help and a bit of
confidence boosting will become one with the best” (Cox, 1998). The bazaar
can also provide helpful resources from unexpected quarters. For example, Cox
advises that “when you hear ‘I’d love to help but I can’t program’, you hear a
documenter. When they say ‘But English is not my first language’ you have a
documenter and translator for another language” (Cox, 1998).
We next comment on the reliability achieved by open processes. Raymond’s
statement of Linus’ law, “with enough eyeballs, all bugs are shallow,”
focuses on parallel oversight as one key to the reliability of open source. The
implication is that bugs are detected rapidly. One might ask: are the products
actually more reliable, and is the reliability due to the fact that there are many
overseers? The record of performance for open source systems generally supports
the thesis that open development is often remarkably effective in terms
of the reliability of its products. There are also specific studies, like the analysis
of the comparative reliability of MySQL mentioned in Chapter 1. Even
the Halloween memos from Microsoft refer to Microsoft’s own internal studies
on Linux that accentuate its record of reliability. The “eyeballs” effect is
presumably part of the explanation for this reliability.
Research by Payne (1999, 2002), which compares security flaws in open and
closed systems, suggests that other causes may be at work as well: a mixture
of practices appears to explain the reliability/security differences for the systems
studied. The study examined the security performance of three Unix-like
systems in an effort to understand the relation between the security characteristics
of the systems, their open or closed source status, and their specific development
processes. The systems considered were OpenBSD, the open source
Debian GNU/Linux distribution, and the closed source Sun Solaris system.
Granted the myriad uncertainties intrinsic to any such study, Payne concluded
that in terms of metrics like identified security vulnerabilities, OpenBSD was
the most secure of the three, followed at a considerable distance by Debian and
Solaris, with a slight edge given to Debian over Solaris. The results suggest an
overall security advantage for open source systems, though the relative closeness
of the Debian and Solaris evaluations implies that open source status per
se is not the decisive driving factor. Indeed, as it turns out, there was a major
difference in the development processes for the systems that likely explains
much of the outcome. Namely, the “OpenBSD source code is regularly and
purposely examined with the explicit intention of finding and fixing security
holes . . . by programmers with the necessary background and security expertise
to make a significant impact” (Payne, 2002). In other words, the factor that
produced superior security may actually have been the focused auditing of the
code by specialists during development, though open source status appears to
be a supplemental factor.
Another perspective on the reliability or security benefits of open source is
provided by Witten et al. (2001). Their analysis is guarded about the general
ability of code reviews to detect security flaws regardless of the mode of development.
However, they observe that the proprietary development model simply
obliges users to “trust the source code and review process, the intentions and
capabilities of developers to build safe systems, and the developer’s compiler”
and to “forfeit opportunities for improving the security of their systems” (Witten
et al., 2001, p. 61). They also underscore the important role that open compilers,
whose operation is itself transparent, play in instilling confidence about
what a system does. Incidentally, they observe how security-enhancing open
compilers like Immunix StackGuard, a gcc extension (see also Cowan, 1998),
can add so-called canaries to executables that can “defeat many buffer overflow
attacks” (Witten et al., 2001, p. 58). From this perspective, Linus’ law is about
more than just parallel oversight. It is a recognition of the inherent advantages
of transparency: open source code, a process of open development, the ability
to change code, giving the user control of the product, open oversight by many
community observers, and even the transparency and confidence provided by
open compilers.
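The canary mechanism mentioned above can be illustrated with a toy model. The sketch below mimics in Python what StackGuard arranges at the machine level: a known value placed just past a buffer is checked for corruption after an unchecked copy. The layout, constants, and function names are illustrative only, not StackGuard’s actual implementation.

```python
CANARY = b"\xde\xad\xbe\xef"  # known value guarding what follows the buffer
BUF_SIZE = 8

def new_frame() -> bytearray:
    # buffer followed by the canary, mimicking a stack frame layout
    return bytearray(BUF_SIZE) + bytearray(CANARY)

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    # like C's strcpy: no bounds check, so long input spills past the buffer
    frame[:len(data)] = data

def canary_intact(frame: bytearray) -> bool:
    # the check a protected function performs before returning
    return bytes(frame[BUF_SIZE:BUF_SIZE + len(CANARY)]) == CANARY
```

An in-bounds copy leaves the canary intact; an overlong one overwrites it, so the program can abort rather than return through a corrupted stack frame.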
References
Bezroukov, N. (1999). A Second Look at the Cathedral and Bazaar. First Monday,
4(12). http://www.firstmonday.org/issues/issue4_12/bezroukov/. Accessed January
5, 2007.
Brooks, F.P. (1995). The Mythical Man-Month – Essays on Software Engineering, 20th
Anniversary Edition, Addison-Wesley Longman, Reading, MA.
Cowan, C. (1998). Automatic Detection and Prevention of Buffer-Overflow Attacks.
In: Proceedings of the 7th USENIX Security Symposium, USENIX, San Diego,
63–78.
Cox, A. (1998). Cathedrals, Bazaars and the Town Council. http://slashdot.org/
features/98/10/13/1423253.shtml. Accessed December 6, 2006.
Feller, J. and Fitzgerald, B. (2002). Understanding Open Source Software Development.
Addison-Wesley, Pearson Education Ltd., London.
Fogel, K. and Bar, M. (2003). Open Source Development with CVS, 3rd edition.
Paraglyph Press. http://cvsbook.red-bean.com/.
Payne, C. (1999). Security through Design as a Paradigm for Systems Development.
Murdoch University, Perth, Western Australia.
Payne, C. (2002). On the Security of Open Source Software. Information Systems, 12(1),
61–78.
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.
firstmonday.dk/issues/issue3_3/raymond/index.html. Ongoing version: http://
www.catb.org/esr/writings/cathedral-bazaar/. Accessed December 3, 2006.
Scacchi, W. (2004). Free and Open Source Development Practices in the Game Community.
IEEE Software, 21(1), 59–66.
Witten, B., Landwehr, C., and Caloyannides, M. (2001). Does Open Source Improve
System Security? IEEE Software, 18(5), 57–61.
2.4 The Dual License Business Model
A software product can be offered under different licenses depending, for example,
on how the software is to be used. This applies to proprietary and open
source products and provides a basis for a viable business model. Karels (2003)
examines the different commercial models for open products. The Sendmail
and MySQL products described later are representative. They have their feet
planted firmly in two worlds, the commercial one and the open community one.
On the one hand, the business model provides “extensions or a professional version
under a commercial license” for the product (Karels, 2003). At the same
time, the company that markets the product continues its management of the
open version. A key distinction in the dual license model is whether the free and
commercial products are identical. For the companies and products we discuss,
the breakout is as follows:
1. Open and proprietary code different: Sendmail, Inc.
2. Open and proprietary code same: MySQL AB, Berkeley DB, Qt
But with the proper license, proprietary enhancements can be done, for
example, for MySQL. Licensing duality can serve a company in a number of
ways. It continues the operation of the open user-developer base. It also promotes
goodwill for the company with the user-developer base. It maintains and
improves acceptance for the open source base version. The continued sponsored
development of the open version simultaneously helps maintain and expand the
market for the commercial product. With products like Sendmail, the proprietary
enhancements may include security improvements, such as e-mail virus
checking. Its distributions may typically provide “configuration and management
tools, higher-performance or higher-capacity versions” (Karels, 2003) to
supplement the root product in order to make a more attractive, commercially
viable product. The company’s product can thus end up incorporating both
open and commercially licensed software. From the customer’s point of view,
the product is now much like a traditional software product that is licensed and
paid for. One ongoing challenge for the distributor is to “maintain differentiation
between the free and commercial versions” since the commercial product
competes with its open fork, at least when the commercial version is different.
In order for the open version to retain market share, its functionality has to be
maintained and upgraded. In order for the commercial version to avoid competition
from evolving open variants, it has to continue to provide “sufficient
additional value to induce customers to buy it and to reduce the likelihood of a
free knockoff of the added components” (Karels, 2003). The commercial version
also has to provide all the accoutrements associated with a conventional
software provider, such as support, training, and product documentation.
Products that can be successfully marketed under a dual licensing framework
tend to have what are called strong network effects; that is, the benefit or value
of using a copy of the software tends to depend on how many other people also
use the software. For example, e-mail is not of much value if you have no one
to e-mail; conversely, its value is greater the more people you can reach. For
such products, the broader the user base, the more valuable the product. In the
dual license model, the free, open license serves the key role of establishing
a wide user base by helping to popularize the product with users (Valimaki,
2005). This popularized base helps make the product a branded entity, which
is extremely valuable for marketing purposes, especially for IT organizations
that are converting to open applications (Moczar, 2005). These effects in turn
make the proprietary license more worth buying for the relatively limited group
of potential commercial developers. It also makes it more attractive for them
to create a commercial derivative under the proprietary license because, for
example, the product will have an established audience of users. There is another
economic reason for allowing mixed license options like this: the open source
version can be used to build up a coterie of independent developers, bug spotters,
and so on – free contributors who can help enhance the product in one way
or another, benefiting the commercial company. As we will see later when
we discuss configuration management tools, Larry McVoy created a dual faux
free license for BitKeeper, especially for use on the Linux kernel. This had
the express purpose of building BitKeeper’s proprietary market share and also
obtaining highly useful input about bugs in the product from the Linux kernel
developers who used it. This “dual licensing” business strategy worked very
well until it had to be discontinued because of the open community controversies
about the “free” license version.
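The network-effects argument above is often formalized (this is our gloss, not a claim from the sources cited) by counting the communication links a user base supports, as in Metcalfe’s law:

```python
def pairwise_links(users: int) -> int:
    # number of distinct pairs of users who can exchange mail: n(n-1)/2
    return users * (users - 1) // 2

# value grows much faster than the user base itself:
# 2 users -> 1 link, 10 users -> 45, 100 users -> 4950
growth = [pairwise_links(n) for n in (2, 10, 100)]
```

The superlinear growth is why a free license that merely widens the user base can raise the value of the paid, proprietary license for everyone else.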
References
Karels, M. (2003). Commercializing Open Source Software. ACM Queue, 1(5), 46–55.
Moczar, L. (2005). The Economics of Commercial Open Source. http://pascal.case.
unibz.it/handle/2038/501. Accessed November 29, 2006.
Valimaki, M. (2005). The Rise of Open Source Licensing: A Challenge to the Use of
Intellectual Property in the Software Industry. Turre Publishing, Helsinki, Finland.
2.4.1 Sendmail
The open source product Sendmail is a pervasive Internet e-mail application.
However, its story is much less well known than Fetchmail’s because
of the unique influence of Raymond’s (1998) Cathedral and Bazaar article, which
ensconced Fetchmail’s development as a canonical story of the practices and
principles of open development, even though Fetchmail is for a far more limited
use than Sendmail. Sendmail is worth discussing, not only because it carries
most of the world’s e-mail traffic, but because it represents another major open
source application that eventually morphed into dual open and commercial
versions.

The Sendmail project was started as an open project at UC Berkeley in 1981
by Eric Allman, who has also maintained the project since that time as an open
development. Allman had previously authored the ARPANET mail application
delivermail in 1979, which was included with the Berkeley Software Distribution
(BSD) and which BSD would subsequently replace with Sendmail. Sendmail
is a Mail Transfer Agent or MTA. As such, its purpose is to reliably transfer
e-mail from one host to another, unlike mail user agents like Pine or Outlook
that are used by end users to compose mail. The software operates on
Unix-like systems, though there is also a Windows version available. The open
source version of Sendmail is licensed under an OSI-approved BSD-like license
(see http://directory.fsf.org for verification), as well as, since 1998, under both
generic and custom commercial licenses. Sendmail serves a significant percentage
of all Internet sites, though there appears to be a decline over time (Weiss,
2004). It represents a de facto Internet infrastructure standard like TCP/IP, Perl,
and Apache.
Naturally, and healthily, the free software movement has never been about not
making money. The time frame in which time spent, effort, and talent are fungible
with compensation may be elastic, but it is finite. People eventually have
to cash in on their products or their expertise in one way or another, or switch to
another line of work where they can make a living. This is obviously appropriate
and expected since their involvement will almost always have been long, intense,
and valuable. Thus for Sendmail, as with many open source projects, the project
originator and leader Allman eventually took the project on a commercial route,
establishing Sendmail Inc. in 1998 in order to develop a commercial version of
the software (see http://www.sendmail.org/ and the complementary commercial
site http://www.sendmail.com). The expanded commercialized version of
the product is different from the open source version and offers many features
not available in the open version. For example, it provides a GUI interface that
significantly facilitates the installation and configuration of the software, in contrast
to the open source version, which is well known to be extremely complicated
to install. The commercial product also incorporates proprietary components
that are combined with the open source components. According to the Sendmail
Inc. Web site, their commercial products provide a “clear advantage over
open source implementations” in a variety of ways, from security and technical
support to enhanced features. Sendmail Inc. and the project maintainer
Allman still manage the development of the open source project, but they now
use its development process to also support the continued innovation of both
the open and the commercial versions of the product. After the Sendmail incorporation,
the licensing arrangement was updated to reflect the dual open source
and commercial offerings. The original license for the pure open source application
remained essentially the same, a few remarks about trademarks and such
aside. For redistribution as part of a commercial product, a commercial license
is required.
References
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.
firstmonday.dk/issues/issue3_3/raymond/index.html. Ongoing version: http://
www.catb.org/esr/writings/cathedral-bazaar/. Accessed December 3, 2006.
Weiss, A. (2004). Has Sendmail Kept Pace in the MTA Race? http://www.serverwatch.
com/stypes/servers/article.php/16059_3331691. Accessed December 1, 2006.
2.4.2 MySQL – Open Source and Dual Licensing
MySQL (pronounced My S-Q-L) is the widely used open source relational
database system that provides fast, multiuser database service and is capable
of handling mission-critical applications with heavy loads. It is suitable for
both Web server environments and embedded database applications. MySQL
is famous in open source applications as the M in the LAMP software architecture.
The company that owns MySQL provides a dual licensing model for
distribution that permits both free and proprietary redistribution. One of the
notable characteristics of MySQL as an open source project is that virtually all
of its development is done by the company that owns the product copyright. This
model helps keep its license ownership pure, ensuring that its proprietary license
option remains undisturbed. (We will not consider the other major open source
database PostgreSQL (pronounced postgres Q-L), which is licensed under the
BSD license. The BSD’s terms are so flexible that the dual licensing model we
are considering does not seem to come into play unless proprietary extensions
are developed or value-added services are provided.)
MySQL was initially developed by Axmark and Widenius starting in 1995
and first released in 1996. They intended it to serve their personal need for an
SQL interface to a Web-accessible database. Widenius recalls that their motivation
for releasing it as an open source product was “because we believed that
we had created something good and thought that someone else could probably
have some use for it. We became inspired and continued to work on this because
of the very good feedback we got from people that tried MySQL and loved it”
(Codewalkers, 2002). The trademarked product MySQL is now distributed by
the commercial company MySQL AB, founded by the original developers in
2001. The company owns the copyright to MySQL.
All the core developers who continue the development work on MySQL
work for MySQL AB, even though they are distributed around the world. The
complexity of the product is one factor inhibiting third-party involvement
in the project (Valimaki, 2005). Independent volunteer contributors can
propose patches but, if the patches prove to be acceptable, these are generally
reimplemented by the company’s core developers. This process helps ensure
that the company’s copyright ownership of the entire product is never clouded
or diluted (Valimaki, 2005). The code revisions ensure that GPL’d code created
by an external developer, who is de facto its copyright owner, is not included in
the system, so that the overall system can still be licensed under an alternative
or dual proprietary license that is not the GPL. The business and legal model is
further described later. Sometimes code contributions are accepted under a shared
copyright with the contributor. Despite the strict handling of patch proposals by
external contributors, a significant number of the company’s employees were
actually recruited through the volunteer user-developer route. Indeed, according
to Hyatt (2006), of MySQL AB’s 300+ full-time employees, 50 were
originally open source community volunteers for MySQL.
P1:KAE
9780521881036c02 CUNY1180/Deek 0521 88103 6 October 1, 2007 17:16
66 2 Open Source Internet Application Projects
MySQL was originally distributed for Unix-like environments under a free-
of-charge and free open source license that allowed free redistribution under the
usual copyleft restriction (according to which any modifications had to be redis-
tributable under the same terms as the original license). On the other hand,
MySQL was originally distributed in Windows environments only as so-called
shareware that allowed copying and redistribution of the product
but did not permit modification and in fact required users to pay a license fee
to use the software after an initial free trial period. This was changed to the
standard GPL for all platforms after 2000 (Valimaki, 2005).
The MySQL AB distribution uses the dual licensing business model. The
same technical product is distributed both as a free GPL-licensed package and
under different licensing terms for the purpose of proprietary development.
Refer to http://www.mysql.com/company/legal/licensing/ (accessed January
10, 2007) for the legal terms of the license. Some of the basic points to keep in
mind follow. If you embed MySQL in a GPL'd application, then that application
has to be distributed as GPL by the requirements of the GPL license. However,
the MySQL proprietary license allows commercial developers or companies to
modify MySQL and integrate it with their own proprietary products and sell the
resulting system as a proprietary closed source system. The license to do this
requires a fee ranging up to $5,000 per year for the company's high-end server
(as of 2005). Thus if you purchase MySQL under this commercial license, then
you do not have to comply with the terms of the GNU General Public License,
of course only in so far as it applies to MySQL.
You cannot in any case infringe on the trademarked MySQL name in any
derivative product you create, an issue that arose in the dispute between MySQL
and NuSphere (MySQL News Announcement, 2001). The commercial license
naturally provides product support from MySQL AB, as well as product war-
ranties and responsibilities. These are lacking in the identical but free GPL'd
copy of the product, which is offered only on an as-is basis. This proprietary
form of the license is required even if you sell a commercial product that
merely requires the user to download a copy of MySQL, or if you include a
copy of MySQL, or include MySQL drivers in a proprietary application! Most
of MySQL AB's income derives from the fees for the proprietary license, with
additional revenues from training and consultancy services. The income from
these services and fees adds up to a viable composite open source/proprietary
business model. As per Valimaki (2005), most of the company's income comes
from embedded commercial applications.
The preservation of copyright ownership is a key element in the continued
viability of a dual license model like MySQL AB's. In particular, the licensor
must have undisputed rights to the software in order to be able to charge for the
software, distribute it under different licenses, or modify its licensing policies
(Valimaki, 2005). The generic open source development framework, where
development is fully collaborative and distributed, tends to undermine or diffuse
copyright ownership since there are many contributors. For example, there
could be "hidden liabilities in code contributions from unknown third parties"
(Valimaki, 2005). Thus, maintaining the ability to dual license with a proprietary
option mandates that the developer of any new or modified code for the system
must ensure that he has exclusive copyright to the work – whence the cautious
behavior by MySQL AB with respect to how development contributions are
handled.
MySQL AB appears to apply a rather strict interpretation of what the condi-
tions of the GPL mean in terms of its own legal rights (Valimaki, 2005, p. 137).
For example, consider a hypothetical case where someone develops a client
for MySQL. The client software might not even be bound either statically or
dynamically with any of the MySQL modules. The client software could just
use a user-developed GUI to open a command prompt to which it could send
dynamically generated commands for MySQL based on inputs and events at
the user interface. The client would thus act just like an ordinary user, except
that the commands it would tell MySQL to execute would be generated via the
graphical interface based on user input, rather than being directly formulated
and requested by the user. However, since the composite application requires
the MySQL database to operate, it would, at least according to MySQL AB's
interpretation of the GPL, constitute a derivative work of MySQL and so be sub-
ject to GPL restrictions on the distribution of derivative works if they are used
for proprietary redistribution; that is, the client, even though it used no MySQL
code, would be considered a derivative of MySQL according to MySQL AB.
As per Valimaki (2005, p. 137), it seems that "the company regards all clients as
derivative works and in order to even use a client with other terms than GPL the
developer of the client would need to buy a proprietary license from MySQL
AB" and that in general "if one needs their database in order to run the client,
then one is basically also distributing MySQL database and GPL becomes bind-
ing," though this does not appear to be supported by either the standard GPL
interpretation or the copyright law on derivative works (Valimaki, 2005).
Ironically, in the tangled Web of legal interactions that emerge between cor-
porate actors, MySQL AB was itself involved in mid-2006 in potential legal
uncertainties vis-à-vis the Oracle Corporation and one of its own components.
The situation concerned MySQL AB's use of the open source InnoDB storage
engine, a key component that was critical to MySQL's handling of transactions.
The InnoDB component ensures that MySQL is ACID compliant. (Recall that
the well-known ACID rules for database integrity mean transactions have to
satisfy the following behaviors: atomicity: no partial transaction execution –
it's all or nothing in terms of transaction execution; consistency: transactions
must maintain data consistency – they cannot introduce contradictions among
the data in the database; isolation: concurrent transactions cannot mutually
interfere; durability: committed transactions cannot be lost – for example, they
must be preserved by backups and transaction logs.) Oracle acquired the Inn-
oDB storage engine and in 2006, it bought out Sleepycat, which makes the
Berkeley DB storage engine also used by MySQL. The InnoDB storage engine
was effectively a plug-in for MySQL, so alternative substitutions would be fea-
sible in the event that MySQL's continued use of the storage engine became
problematic. Additionally, the InnoDB engine is also available under the GPL
license. Nonetheless, such developments illustrate the strategic uncertainties
that even carefully managed dual licensed products may be subject to (Kirk,
2005).
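The atomicity rule in particular can be illustrated with a minimal sketch. Python's built-in sqlite3 module stands in for MySQL/InnoDB here (an analogy only; the table and account values are invented for the example):

```python
import sqlite3

# In-memory database purely for illustration; a real system would use MySQL/InnoDB.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
con.commit()

try:
    with con:  # opens a transaction; commits on success, rolls back on error
        con.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
        raise RuntimeError("simulated crash mid-transfer")
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back, so no money vanished.
balances = dict(con.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```

The simulated crash occurs after the debit but before any credit; an ACID-compliant engine discards the half-finished transaction rather than leaving the data inconsistent.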
References
Codewalkers. (2002). Interview with Michael Widenius. http://codewalkers.com/interviews/Monty_Widenius.html. Accessed November 29, 2006.
Hyatt, J. (2006). MySQL: Workers in 25 Countries with No HQ. http://money.cnn.com/2006/05/31/magazines/fortune/mysql_greatteams_fortune/. Accessed November 29, 2006.
Kirk, J. (2005). MySQL AB to Counter Oracle Buy of Innobase. ComputerWorld, November 23. http://www.computerworld.com.au/index.php/id;1423768456. Accessed February 11, 2007.
MySQL News Announcement. (2001). FAQ on MySQL vs. NuSphere Dispute. http://www.mysql.com/news-and-events/news/article_75.html. Accessed November 29, 2006.
Valimaki, M. (2005). The Rise of Open Source Licensing: A Challenge to the Use of Intellectual Property in the Software Industry. Turre Publishing, Helsinki, Finland.
2.4.3 Sleepycat Software and TrollTech
Sleepycat Software and TrollTech are two other examples of prominent and
successful companies that sell open source products using a dual licensing
business model.
BerkeleyDB
Sleepycat Software owns, sells, and develops a very famous database system
called BerkeleyDB. The software for this system was originally developed as
part of the Berkeley rewrite of the AT&T proprietary code in the BSD Unix
distribution, a rewrite done by programmers Keith Bostic, Margo Seltzer, and
Mike Olson. The software was first released in 1991 under the BSD license.
Recall that the BSD license allows proprietary modification and redistribution
of software with no payments required to the original copyright owners. The
original software became widely used and embedded in a number of proprietary
products.
Berkeley DB is not an SQL database. Queries to the Berkeley DB are done
through its own specific API. The system supports many commercial and open
applications, ranging from major Web sites to cell phones, and is one of the stor-
age engines available for MySQL. The software currently has over 200 million
deployments (Martens, 2005). Berkeley DB is a C library that runs in the same
process as an application, considerably reducing interprocess communication
delays. It stores data as key/value pairs, allowing data records and
keys to be up to 4 GB in length, with tables up to 256 TB. The Sleepycat Web
site describes the Berkeley DB library as "designed to run in a completely unat-
tended fashion, so all runtime administration is programmatically controlled
by the application, not by a human administrator. It has been designed to be
simple, fast, small and reliable" (sleepycat.com). We refer the reader to the
article by Sleepycat CTO Margo Seltzer (2005) for a commanding analysis of
the opportunities and challenges in database system design that go far beyond
the traditional relational model. In response to the demand for Berkeley DB and
some needs for improvements in the software, its developers founded Sleep-
ycat, further significantly developed the product, and subsequently released
it under a dual licensing model in 1997 (Valimaki, 2005). The license model
used by Sleepycat is like that of MySQL. Versions earlier than 2.0 were avail-
able under the BSD, but later versions are dual licensed. The noncommercial
license is OSI certified. However, the commercial license requires payment
for proprietary, closed source redistribution of derivatives. About 75% of the
company's revenues come from such license fees. Similarly to the MySQL
AB model, Berkeley DB software development is basically internal, with any
external code contributions reimplemented by the company's developers. This
is motivated, just as in the case of MySQL, not only by the desire to keep
ownership pure but also because of the complexity of the product.
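The embedded, in-process key/value model described above can be sketched with Python's standard-library dbm.dumb module, used here purely as an analogy (it is not Berkeley DB, and the keys and values are invented):

```python
import os
import tempfile
import dbm.dumb  # pure-Python key/value store; a stand-in for Berkeley DB's C API

path = os.path.join(tempfile.mkdtemp(), "store")

# Like Berkeley DB, the store runs inside the application process:
# no separate database server, no SQL query layer.
with dbm.dumb.open(path, "c") as db:
    db[b"user:42"] = b"alice"  # store a key/value pair
    db[b"user:43"] = b"bob"

# Reopen and retrieve directly by key through the library API, not a query.
with dbm.dumb.open(path, "r") as db:
    value = db[b"user:42"]
print(value)  # b'alice'
```

Because every lookup is a library call within the application's own process, the interprocess round trips of a client-server database simply do not occur, which is the performance point made in the text.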
The Qt Graphics Library
Another company that has long used a dual license model is TrollTech. Troll-
Tech develops and markets a C++ class library of GUI modules called Qt (pro-
nounced "cute" by its designers), which was eventually adopted by and played
an important role in the open source KDE project. Qt is cross-platform, support-
ing Unix-like, Windows, and Macintosh environments, and provides program-
mers with an extensive collection of so-called widgets. It is an extremely widely
used open GUI development library. Qt was first publicly released in 1995, orig-
inally under a restrictive license that made the source available but prohibited
the free redistribution of modifications. The use of Qt in the GPL'd KDE desktop
environment caused a well-known licensing controversy. Eventually TrollTech
was pressured by the open source community to release its product not merely
as open source but under the GPL, despite the initial aversion of the company
founder toward the GPL because of doubts about the implications of the GPL
in a dual license context (Valimaki, 2005). As it turned out, the approach was
successful and increased the popularity of the product. The company's sales
derive largely from its licensing fees. Proprietary development of the product
requires that it be purchased under a commercial license. The free version helps
maintain the open source user base. An educational version of the product that
integrates with Microsoft's Visual Studio .NET is available for Windows.
References
Martens, C. (2005). Sleepycat to Extend Paw to Asia. InfoWorld. http://infoworld.com/article/05/06/22/HNsleepycat_1.html. Accessed November 29, 2006.
Seltzer, M. (2005). Beyond Relational Databases. ACM Queue, 3(3), 50–58.
Valimaki, M. (2005). The Rise of Open Source Licensing: A Challenge to the Use of Intellectual Property in the Software Industry. Turre Publishing, Helsinki, Finland.
2.5 The P’s in LAMP
The first three letters of the ubiquitous LAMP open source software stack
stand for Linux, Apache, and MySQL. The last letter P refers to the scripting
language used and encompasses the powerful programming language Perl, the
scripting language PHP, and the application language Python. These are all open
source (unlike, for example, Java, whose major implementations are proprietary
even if programs written in it are open). Perl comes with an immense open
library of Perl modules called CPAN. We will focus our discussion on PHP and
Perl. Concerning Python, we only note that it is a widely ported open source
programming language invented by Guido van Rossum in 1990 and used for
both Web and applications development, such as in BitTorrent and Google, and
sometimes for scripting.
2.5.1 PHP Server-Side Scripting
PHP is a server-side scripting language embedded in HTML pages. It typi-
cally interfaces with a background database, commonly MySQL, as in LAMP
environments. Thus, it allows the creation of data-based, dynamic Web pages.
The name PHP is a recursive acronym like GNU, standing for PHP Hypertext
Preprocessor, though it originally referred to Personal Home Page tools. The
Netcraft survey indicates that PHP is the most widely deployed server-side
scripting language, with about one-third of all Internet sites surveyed having
PHP installed by early 2004.
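The pattern just described, server-side code that pulls rows from a database and emits HTML for the browser, can be sketched as follows. Python and sqlite3 stand in for PHP and MySQL, and the table and page are invented, so this is an analogy rather than PHP itself:

```python
import sqlite3
from string import Template

# A stand-in for the MySQL back end of a LAMP stack.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts (title TEXT)")
con.executemany("INSERT INTO posts VALUES (?)", [("Hello",), ("LAMP",)])

# Server-side templating: the page is generated per request from live data,
# and only the finished markup is sent to the browser.
page = Template("<html><body><ul>$items</ul></body></html>")
items = "".join(f"<li>{title}</li>" for (title,) in con.execute("SELECT title FROM posts"))
html = page.substitute(items=items)
print(html)
```

The browser receives only the final HTML; the scripting logic and database queries remain on the server, which is why PHP source is invisible to ordinary site visitors.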
PHP is another instructive tale of open source development. It illustrates
some of the ways in which open source projects originate, the influence of their
initial developers and the impact of new developers, the degree of open source
commitment, and attitudes toward commercialization. In the case of PHP, the
original developer Rasmus Lerdorf was later joined by a group of other major
core developers. Lerdorf has stayed with the open source project but did not
join in its commercialization – except in the indirect sense that he is currently
involved as a development engineer in its vertical application internally within
Yahoo. In fact, Lerdorf seems to believe that the greatest monetary potential
for open source lies in major vertical applications like Yahoo rather than in
support companies (like Zend in the case of PHP) (Schneider, 2003). Some of
the core PHP developers formed a commercial company named Zend, which
sells a number of PHP products, including an encoder designed to protect the
intellectual property represented by custom PHP scripts by encrypting them!
Programmers are often autodidacts: open source helps that happen. Rasmus
Lerdorf in fact describes himself as a "crappy coder" who thinks coding is a
"mind-numbing tedious endeavor" and "never took any programming courses
at school" (Schneider, 2003). He has an engineering degree from the University
of Waterloo. But despite an absence of formal credentials in computer science
and the self-deprecatory characterization of his interests, he had been quite a bit
of a hacker since his youth, hacking the CERN and NCSA server code soon after
the latter's distribution (Schneider, 2003). He was a self-taught Unix, Xenix,
and Linux fan and learned what he needed from looking at the open source code
they provided. To quote from the Schneider interview (2003), "What I like is
solving problems, which unfortunately often requires that I do a bit of coding.
I will steal and borrow as much existing code as I can and write as little 'glue'
code as possible to make it all work together. That's pretty much what PHP is."
Lerdorf started developing PHP in 1994–1995 with a simple and personal
motivation in mind, the classic Raymond "scratch an itch" model. He wanted to
know how many people were looking at his resume, since he had included a URL
for his resume in letters he had written to prospective employers (Schneider,
2003). He used a Perl CGI script to log visits to his resume page and to collect
information about the visitors. To impress the prospective employers, he let
visitors see his logging information (Schneider, 2003). People who visited the
page soon became interested in using the tools, so Lerdorf gave the code away
in typical open source fashion, setting up a PHP mailing list to share the code,
bug reports, and fixes. He officially announced the availability of the initial
set of PHP tools (Version 1.0) in mid-1995, saying that "the tools are in the
public domain distributed under the GNU Public License. Yes, that means
they are free!" (see reference link under Schneider (2003)). Admittedly, there
are a lot of ambiguities in that statement, from public domain to free, but
the intent is clear. His own predisposition to open source was partly related
to money. As Lerdorf observes, "I don't think I was ever really 'hooked' by
a 'movement'. When you don't have the money to buy SCO Unix and you
can download something that works and even find people who can help you
get it up and running, how can you beat that?" (Yank, 2002). Lerdorf worked
intensively on the PHP code for several years. Being the leader and originator
of a popular open source project is not a lark. Like Linus, he personally went
through all the contributed patches during that period, usually rewriting the code
before committing it. He estimates that he wrote 99% of the code at that time.
Lerdorf's involvement with open source has continued in a proprietary context at
Yahoo. Unlike many organizations, Yahoo has what Lerdorf describes as a "long
tradition of using open source" (like FreeBSD) for use in their own extremely
complex and demanding infrastructure (Schneider, 2003). Andrei Zmievski is
currently listed as the PHP project administrator and owner on freshmeat.net
and the principal developer of PHP since 1999. PHP 4 is licensed under the
GPL. There are over 800 contributors currently involved in its development
(available at http://www.php.net).
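Lerdorf's original itch, a script that counts and logs visits to a page, might look roughly like the following. The sketch is in Python rather than his original Perl, and all names and values are invented:

```python
import datetime

# Minimal sketch of a visit logger, the kind of "scratch an itch" tool PHP
# grew out of. A real script would append to a file or database per request.
LOG = []

def log_visit(remote_addr: str, user_agent: str) -> int:
    """Record one page hit and return the running total of visits."""
    LOG.append((datetime.datetime.now(datetime.timezone.utc), remote_addr, user_agent))
    return len(LOG)

count = log_visit("192.0.2.7", "Mozilla/1.0")
count = log_visit("192.0.2.8", "Lynx/2.3")
print(f"This resume has been viewed {count} times.")  # the visible counter
```

Exposing the running total on the page itself is what let Lerdorf show prospective employers how often his resume was being read.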
Open source developments can go through generational changes as the prod-
uct or its implementation evolves in response to outside events or the insights of
new participants. This happened with PHP. Computer scientists Zeev Suraski
and Andi Gutmans became involved with PHP in mid-1997 as part of a Web
programming project at the Technion in Israel. An odd syntax error in PHP
led them to look at the source code for the language. They were surprised to
see that the program used a line-by-line parsing technique, which they rec-
ognized could be dramatically improved upon. After a few months of intense
development effort, they had recoded enough of the PHP source to convince
Lerdorf to discontinue the earlier version of PHP and base further work on
their new code. This led to a successful collaboration between Lerdorf and a
new extended group of seven PHP core developers. Lerdorf has observed that
"this was probably the most crucial moment during the development of PHP.
The project would have died at that point if it had remained a one-man effort
and it could easily have died if the newly assembled group of strangers could
not figure out how to work together towards a common goal. We somehow
managed to juggle our egos and other personal events and the project grew"
(Lerdorf, 2004). Another major overhaul followed. At the time, PHP still used
an approach in which the code was executed as it was parsed. In order to handle
the much larger applications that people were using PHP for, the developers had
to make yet another major change. This once again led to redesigning and reim-
plementing the PHP engine from scratch (Suraski, 2000). The new compilation
engine, which used a compile-first-then-execute approach, was separable
from PHP 4. It was given its own name, the Zend engine (combining Zeev +
Andi). It is licensed under an Apache-style as-is license. The company does
not dual license the Zend engine but provides support services and additional
products for a fee.
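The difference matters because a compile-first engine parses a script once and can then run the resulting intermediate form repeatedly. Python's own bytecode step illustrates the idea (an analogy only; the Zend engine's internals are not shown here, and the script is invented):

```python
# Sketch of the "compile first, then execute" approach PHP 4's Zend engine
# adopted, using Python's bytecode pipeline as an analogy.
source = "total = sum(range(10))"

code_obj = compile(source, "<script>", "exec")  # parse once into bytecode

# The compiled form can now be executed repeatedly without reparsing the
# source, which is what made larger applications feasible.
ns = {}
for _ in range(3):
    exec(code_obj, ns)
print(ns["total"])  # 45
```

Under the older execute-as-you-parse scheme, every statement had to be re-analyzed on each execution, a cost that grows painfully with program size.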
Although the PHP language processor is open source and the scripting lan-
guage programs are human-readable text (on the server side), its major com-
mercial distributor Zend ironically has tools for hiding source code that is
written in PHP – just like in the old-fashioned proprietary model! Gutmans and
Suraski, along with another core developer Doron Gerstel, founded Zend in
1999. It is a major distributor of PHP-related products and services. From an
open source point of view, a particularly interesting product is the Zend Encoder.
The Zend Web site describes the encoder as "the recognized industry standard in
PHP intellectual property protection" – italics added (http://www.zend.com/).
The encoder lets companies distribute their applications written in PHP "with-
out revealing the source code." This protects the companies against copyright
infringement as well as from reverse engineering since the distributed code is
"both obfuscated and encoded" (http://www.zend.com/). As the site's selling
points describe it, this approach allows "Professional Service Providers (to) rely
on the Zend Encoder to deliver their exclusive and commercial PHP applications
to customers without revealing their valuable intellectual property. By protect-
ing their PHP applications, these and other enterprises expand distribution and
increase revenue" (http://www.zend.com/). It is important to understand that it
is not the source code for the PHP compiler that is hidden, but the PHP scripts
that development companies write for various applications and that are run in
a PHP environment. Furthermore, the application source code
is not being hidden from the end users (browser users), since they would never
have seen the code in the first place: the PHP scripts are executed on the
server and only their results are sent to the client, so there's nothing to hide from
the client. The code is being hidden from purchasers of the code who want to
run it on their own Web servers. In any case, this model represents an inter-
esting marriage of open source products and proprietary code. The encoding
is accomplished by converting the "plain-text PHP scripts into a platform-
independent binary format known as Zend Intermediate Code. These encoded
binary files are the ones that are distributed (to prospective users) instead of the
human-readable PHP files. The performance of the encoded PHP application is
completely unaffected!" (http://www.zend.com/). None of this turns out to be
open license related, since the proprietary nature of the distributions that Zend
is targeting is not about the PHP environment itself, which is GPL'd, but about
scripts written using PHP. Nonetheless, the thinking is not on the same page
as traditional open source distribution, where disclosure of source is viewed as
beneficial.
There are striking arguments to be made about the cost-effectiveness of
open tools like PHP and open platforms. Who better to hear them from than
an open source star and proprietary developer like Rasmus Lerdorf? In an
interview with Sharon Machlis (2002), Lerdorf did an interesting back-of-the-
envelope calculation about the relative cost benefits of open applications versus
proprietary tools like Microsoft's. In response to the hypothetical question of
why one would choose PHP over (say) Microsoft's ASP, he estimated that (at
that time) the ASP solution entailed (roughly): $4,000 for a Windows server,
$6,000 for an Internet security and application server on a per CPU basis,
$20,000 for an SQL Enterprise Edition Server per CPU, and about $3,000 per
developer for an MSDN subscription, at a final cost of over $40,000 per CPU.
In contrast, you could build an equivalent open source environment that did the
same thing based on Linux, Apache + SSL, PHP, PostgreSQL, and the Web
proxy Squid, for free. The price comparisons become even more dramatic when
multiple CPUs are involved. Granted, if you have an existing Microsoft shop in
place, then the open source solution does have a learning curve attached to it
that translates into additional costs. However, especially in the case where one
is starting from scratch, the PHP and free environment is economically very
attractive.
References
Lerdorf, R. (2004). Do You PHP? http://www.oracle.com/technology/pub/articles/php_experts/rasmus_php.html. Accessed November 29, 2006.
Machlis, S. (2002). PHP Creator Rasmus Lerdorf. http://www.computerworld.com/softwaretopics/software/appdev/story/0,10801,67864,00.html. Accessed November 29, 2006.
Schneider, J. (2003). Interview: PHP Founder Rasmus Lerdorf on Relinquishing Control. http://www.midwestbusiness.com/news/viewnews.asp?newsletterID=4577. Accessed November 29, 2006.
Suraski, Z. (2000). Under the Hood of PHP4. http://www.zend.com/zend/art/under-php4-hood.php. Accessed November 29, 2006.
Yank, K. (2002). Interview – PHP's Creator, Rasmus Lerdorf. http://www.sitepoint.com/article/phps-creator-rasmus-lerdorf. Accessed November 29, 2006.
2.5.2 Perl and CPAN
According to the perl.org Web site, the open source programming language
Perl, together with its largely open library of supporting perl modules CPAN,
is a "stable, cross-platform programming language . . . used for mission-critical
projects in the public and private sectors and . . . widely used to program web
applications" (italics added). Perl 1.0 was initially released by its designer Larry
Wall about 1987, with the much revised Perl 5 version debuting in 1994. Perl
is a procedural language like C. It is also implemented with a combination of
C and some Perl modules. Perl has some of the characteristics of Unix shell
programming and is influenced by Unix tools like awk and sed. Although Perl
was originally designed for text manipulation, it has become widely used in
many systems applications, particularly to glue systems together. It was also
the original technology used to produce dynamic Web pages using CGI. Its
diverse applicability has given it a reputation as a system administrator's Swiss
army knife. In fact, hacker Henry Spencer comically called Perl a Swiss army
chainsaw (see the Jargon File at http://catb.org/jargon/html/index.html), while
for similar reasons others call it "the duct tape of the Internet." The Perl 5 ver-
sion allowed the use of modules to extend the language. Like the Linux module
approach, the Perl 5 module structure "allows continued development of the
language without actually changing the core language" according to developer
Wall (Richardson, 1999). This version has been under continuous develop-
ment since its release. Perl is highly portable and also has binary distributions
like ActivePerl, which is commonly used for Windows environments. Perl and
CPAN modules are pervasive in financial applications, including long-standing
early use at the Federal Reserve Board and more recently in many bioinformat-
ics applications. Indeed, a classic application of Perl as an intersystem glue was
its application in integrating differently formatted data from multiple genome
sequencing databases during the Human Genome Project (Stein, 1996).
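The kind of glue work described, reconciling differently formatted data sources into one uniform layout, can be sketched as follows. Python stands in for Perl here, and the two record formats are invented for illustration:

```python
import csv
import io
import json

# Two data sources describing the same kind of entity in incompatible formats,
# the situation Perl glue scripts were written to resolve.
csv_source = "gene,length\nBRCA2,84193\n"
json_source = '[{"name": "TP53", "bp": 19149}]'

# Normalize both into a single schema: {"gene": ..., "length": ...}.
records = []
for row in csv.DictReader(io.StringIO(csv_source)):
    records.append({"gene": row["gene"], "length": int(row["length"])})
for item in json.loads(json_source):
    records.append({"gene": item["name"], "length": item["bp"]})

print(records)  # one uniform record layout from two incompatible inputs
```

Each source keeps its native format; the glue script owns the translation, which is why such scripts could be written quickly and thrown away just as quickly.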
The essay by Larry Wall (1999), the inimitable creator of Perl, is worth
reading for a philosophical discourse on the design and purpose of Perl. Wall
discourses, among other things, on why the linguistic complexity of Perl is
needed in order to handle the complexity of messy real-world problems. Wall's
academic background is interesting. Although he had worked full time for his
college computer center, his graduate education was actually in linguistics (of
the human language kind) and he intended it to be in preparation for doing
biblical translations. The interview with Wall in Richardson (1999) describes
what Wall calls the "postmodern" motivation behind Perl's design.
Perl is open source and GPL compatible since it can be licensed using either
the GPL or the so-called Artistic License, an alternative combination called the
disjunctive license for Perl. The GPL option in this disjunction is what makes
Perl GPL compatible. The Free Software Foundation considers the Artistic
License option for Perl to be vague and problematic in its wording, and so it
recommends the GPL. On the other hand, the perl.com Web site characterized
the Perl license as the Artistic license, which it describes as "a kinder and gentler
version of the GNU license – one that doesn't infect your work if you care to
borrow from Perl or package up pieces of it as part of a commercial product"
(perl.com). This last is a very important distinction since it allows the code for
Perl or Perl modules to be modified and embedded in proprietary products.
A very significant part of Perl's power comes from CPAN, which stands for
Comprehensive Perl Archive Network, an immense library of Perl modules that
is far more extensive than the Java class libraries or those available for either PHP
or Python. The CPAN collection located at cpan.org was started about 1994,
enabled by the module architecture provided by Perl 5. It currently lists over
5,000 authors and over 10,000 modules. The search engine at search.cpan.org
helps programmers sort through the large number of modules available. The
modules are human-readable Perl code, so they are naturally accessible source.
However, a limited number of Perl binaries are also available, but these are not
stored on the CPAN site. While the Perl language itself is distributed as GPL
(or Artistic), the modules written in Perl on the CPAN site do not require any
particular license. However, the CPAN FAQ does indicate that most, though
not all, of the modules available are in fact licensed under either the GPL or
the Artistic license. Contributors do not have to include a license, but the site
recommends it. With the limited exception of some shareware and commercial
software for Perl IDEs, SDKs, and editors as indicated on the binary ports page
of the Web site, the site stipulates that it strongly disapproves of any software
for the site that is not free software, at least in the sense of free of charge.
References
Richardson, M. (1999). Larry Wall, the Guru of Perl. 1999-05-01. Linux Journal. http://www.linuxjournal.com/article/3394. Accessed November 29, 2006.
Stein, L. (1996). How Perl Saved the Human Genome Project. The Perl Journal, 1(2). 2001 version archived at Dr. Dobb's Portal: www.ddj.com/dept/architect/184410424. Accessed November 29, 2006. Also via: http://scholar.google.com/scholar?hl=en&lr=&q=cache:vg2KokmwJNUJ:science.bard.edu/cutler/classes/bioinfo/notes/perlsave.pdf+++%22The+Perl+Journal%22+stein. Accessed November 29, 2006.
Wall, L. (1999). Diligence, Patience, and Humility. In: Open Sources: Voices from the Open Source Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA, 127–148.
2.6 BitTorrent
BitTorrent is a next-generation P2P Internet utility. It was created by Bram
Cohen in 2002 and has become extremely widely used, particularly for sharing
popular multimedia files. We discuss it here for two reasons. It represents a next-
generation type of Internet service, called a Web 2.0 service by O'Reilly (2005).
It also facilitates the largest peer-to-peer network according to estimates done
by CacheLogic. BitTorrent works by exploiting the interconnectivity provided
by the Internet, avoiding the bottlenecks that occur if every user tries to get an
entire copy of a file from a single source, as is done in the client-server model of
data exchange. It also differs from conventional peer-to-peer networks, where
exchanges are limited to a single pair of uploaders and downloaders at any given
time. Under the BitTorrent protocol, a central server called a tracker coordinates
the file exchanges between peers. The tracker does not require knowledge of
the file contents and so can work with a minimum of bandwidth, allowing
it to coordinate many peers. Files are thought of as comprising disjoint pieces
called fragments. Initially, a source or seed server that contains the entire file
distributes the fragments to a set of peers. Each peer in a pool or so-called
swarm of peers will at a given point have some of the fragments from the
complete file but lack some others. These missing fragments are supplied by
being exchanged transparently among the peers in a many-peer-to-many-peer
fashion. The exchange protocol used by BitTorrent exhibits a fundamental,
remarkable, and paradoxical advantage: the more people who want to have
access to a file, the more readily individual users can acquire a complete copy
of the file, since there will be more partners to exchange fragments with. Thus
BitTorrent is an example of what has been called a Web 2.0 service. It exhibits
a new network externality principle, namely, that "the service gets better the
more people use it" (O'Reilly, 2005). In contrast to earlier Internet services
like Akamai, BitTorrent can be thought of as having a BYOB (Bring Your Own
Bottle) approach. Once again, O'Reilly expresses it succinctly:
...every BitTorrent consumer brings his own resources to the party. There's an
implicit "architecture of participation," a built-in ethic of cooperation, in which the
service acts primarily as an intelligent broker, connecting the edges to each other
and harnessing the power of the users themselves.
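The fragment-exchange mechanism described above can be illustrated with a toy simulation. This is only a sketch of the idea: the peer names, piece counts, and round structure are invented for illustration, and the real protocol adds trackers, tit-for-tat choking, and piece-selection strategies that are omitted here.

```python
import random

def simulate_swarm(num_pieces, peer_names, rounds=100):
    """Toy model of BitTorrent-style exchange: a seed holds the whole
    file; each peer starts with a random quarter of the pieces and, in
    each round, fetches one missing piece from any participant that
    holds it (many-peer-to-many-peer exchange)."""
    pieces = set(range(num_pieces))
    have = {"seed": set(pieces)}
    for name in peer_names:
        have[name] = set(random.sample(sorted(pieces), num_pieces // 4))
    for _ in range(rounds):
        for name in peer_names:
            missing = pieces - have[name]
            if not missing:
                continue
            wanted = missing.pop()
            # any other participant (seed or peer) may supply the piece
            if any(wanted in held for other, held in have.items() if other != name):
                have[name].add(wanted)
    # True when every peer has assembled the complete file
    return all(have[name] == pieces for name in peer_names)

print(simulate_swarm(20, ["alice", "bob", "carol"]))  # True
```

Because the seed remains available, each peer gains a missing piece every round; the point of the real protocol is that, once fragments spread through the swarm, peers can keep supplying one another even though the seed's bandwidth stays constant.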
Developer Cohen’s software also avoids the so-called leeching effect that
occurs in P2P exchangesunder which people greedily download files but self-
ishly refuse to share their data by uploading. The BitTorrent protocol rules
requirethat downloadersof fragmentsalso haveto upload fragments.In Cohen’s
program,the more auser shares hisfiles, the faster thetorrent of fragmentsfrom
other users downloads to his computer. This reflects a motto Cohen had printed
on T-shirts: "Give and ye shall receive" (Thompson, 2005). It's a "share and
share alike" principle. BitTorrent now carries an enormous share of world Inter-
net traffic, currently one-third according to the estimate by CacheLogic, though
some aspects of the estimate have been disputed. The BitTorrent license is a
custom variation of the Jabber license. It reserves the BitTorrent™ name as a
trademark, which prevents the name from drifting into generic usage. Although
the BitTorrent source code is open source, its license does not appear to be OSI-
certified or a free software license, partly because of some relicensing
restrictions, although the license definition seems to be GPL-like in character.
The BitTorrent company is now acting as a licensed distributor for movie
videos, an important new open source business model.
References
O’Reilly, T. (2005). What is Web 2.0 Design Patterns and Business Models for the
Next Generation of Software. http://www.oreillynet.com/pub/a/oreilly/tim/news/
2005/09/30/what-is-web-20.html.Accessed November 29, 2006.
Thompson,C. (2005). The BitTorrentEffect. Wired.com,Issue 13.01. http://wired.com/
wired/archive/13.01/bittorrent.html.Accessed November 29, 2006.
2.7 BIND
BIND is a pervasive and fundamental Internet infrastructure utility. From an
open source business model point of view, BIND is instructive precisely because
it led to such an unexpected business model. The model did not benefit the
original developers and was not based on the software itself. Instead, it was
based indirectly on information services that were enabled by the software.
The acronym BIND stands for Berkeley Internet Name Domain. BIND is
an Internet directory service. Its basic function is to implement domain name
services by translating symbolic host domain names into numeric IP addresses,
using distributed name servers. The DNS (Domain Name System) environ-
ment is an enormous operation. It relies on domain name data stored across
billions of resource records distributed over millions of files called zones (Sala-
mon, 1998/2004). The zones are kept on what are called authoritative servers,
which are distributed over the Internet. Authoritative servers handle DNS name
requests for zones they have data on and request information from other servers
otherwise. Large name servers may have tens of thousands of zones. We will
not delve further into how the system works.
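The name-to-address translation that BIND performs can be exercised from any program through the system's resolver, which queries DNS name servers (commonly running BIND). A minimal Python sketch, using only the standard library; the hostname is just an example:

```python
import socket

def resolve(hostname):
    """Ask the system resolver to translate a symbolic host name
    into its IPv4 addresses, as DNS name servers do."""
    try:
        # gethostbyname_ex returns (canonical name, aliases, addresses)
        _, _, addresses = socket.gethostbyname_ex(hostname)
        return addresses
    except socket.gaierror:
        return []  # the name does not resolve

print(resolve("localhost"))  # typically ['127.0.0.1'], answered locally
```

A resolver first consults local sources (such as /etc/hosts) and then the distributed hierarchy of authoritative servers described above, so the caller never needs to know where the zone data actually lives.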
The general idea of using symbolic names for network communications
was originally introduced to support e-mail on the ARPANET, long before the
Internet, in fact going back to the mid-1970s. As network traffic increased,
the initial implementation approaches had to be completely revised (see RFC
805 from 1982, as well as RFC 881, etc.). The original version of the new
BIND software was written in the early 1980s by graduate students at UC
Berkeley as part of the development of Berkeley Unix under a DARPA grant.
Its current version, BIND 9, is much more secure than earlier BIND versions and
is outsourced by the Internet Software Consortium to the Nominum Corporation
for continued development and maintenance.
The business opportunity that emerged from BIND is interesting because
it turned out to be neither the software itself nor the service and marketing of
the software that was profitable. Instead, the profit potential lay in the service
opportunity provided by the provision of the domain names. It is highly ironic
that the project maintainer for BIND, which is arguably "the single most mission
critical program on the Internet," had "scraped by for decades on donations and
consulting fees," while the business based on the registration of domain names
that was in turn based on BIND thrived (O'Reilly, 2004). As O'Reilly observed,

...domain name registration – an information service based on the software –
became a business generating hundreds of millions of dollars a year, a virtual
monopoly for Network Solutions, which was handed the business on government
contract before anyone realized just how valuable it would be. The...opportunity
of the DNS was not a software opportunity at all, but the service of managing the
namespace used by the software. By a historical accident, the business model
became separated from the software.
References
O’Reilly, T. (2004). Open Source Paradigm Shift. http://tim.oreilly.com/articles/
paradigmshift
0504.html.Accessed November 29, 2006.
Salamon,A. (1998/2004). DNS Overviewand General References. http://www.dns.net/
dnsrd/docs/whatis.html.Accessed January 10, 2007.
3
The Open Source Platform
We use the term open source platform to refer to the combination of open
operating systems and desktops, support environments like GNU, and under-
lying frameworks like the X Window System, which together provide a matrix
for user interaction with a computer system. The provision of such an open
infrastructure for computing has been one of the hallmark objectives of the
free software movement. The GNU project sponsored by the Free Software
Foundation (FSF) had as its ultimate objective the creation of a self-contained
free software platform that would allow computer scientists to accomplish all
their software development in a free environment uninhibited by proprietary
restrictions. This chapter describes these epic achievements in the history of
computing, including the people involved and the technical and legal issues that
affected the development. We shall also examine the important free desktop
application GIMP, which is intended as a free replacement for Adobe Photo-
shop. We shall reserve the discussion of the GNU project itself to a later
chapter.
The root system that serves as the reference model for open source operating
systems is Unix, whose creation and evolution we shall briefly describe. Over
time, legal and proprietary issues associated with Unix opened the door to Linux
as the signature open source operating system, though major free versions
of Unix continued under the BSD (Berkeley Software Distribution) aegis.
The Linux operating system, which became the flagship open source project,
evolved out of a simple port of Unix to a personal computer environment,
but it burgeoned rapidly into the centerpiece project of the movement. To be
competitive with proprietary platforms in the mass market, the Linux and free
Unix-like platforms in turn required high-quality desktop-style interfaces. It
was out of this necessity that the two major open desktops, GNOME and KDE,
emerged. Underlying the development of these desktops was the extensive,
longstanding development effort represented by the X Window System, an
open source project begun in the early 1980s at MIT that provided the basic
underlying windowing capabilities for Unix-like systems.
3.1 Operating Systems
This first section addresses the creation of Unix and its variants and the emer-
gence of Linux.
3.1.1 Unix
The Unix operating system was developed at AT&T's Bell Telephone Labora-
tories (BTL) during the 1970s. An essential factor in the historic importance of
Unix lay in the fact that it was the first operating system written in a high-level
language (C), although this was not initially so. This approach had the critical
advantage that it made Unix highly portable to different hardware platforms.
Much of the development work on Unix involved distributed effort between
the initial developers at AT&T's BTL and computer scientists at universities,
especially the development group at the University of California (UC) Berkeley that
added extensive capabilities to the original AT&T Unix system during the 1980s.
Remember that the network communications infrastructure that would greatly
facilitate such distributed collaboration was at this time only in its infancy. While
Unix began its life in the context of an industrial research lab, it progressed with
the backing of major academic and government (DARPA) involvement.
From the viewpoint of distributed collaboration and open source develop-
ment, the invention and development of Unix illustrates the substantial benefits
that can accrue from open development, as well as the disadvantages for inno-
vation that can arise from proprietary restrictions in licensing. The Unix story
also illustrates how distributed collaboration was already feasible prior to the
Internet communications structure, but also how it could be done more effec-
tively once even more advanced networked communications became available
(which was partly because of Unix itself).
Initially the AT&T Unix source code was freely and informally exchanged.
This was a common practice at the time and significantly helped researchers
in different organizations in their tinkering with the source code, fixing bugs,
and adding features. Eventually, however, licensing restrictions by AT&T, con-
siderable charges for the software to other commercial organizations, and legal
conflicts between AT&T and UC Berkeley handicapped the development
of Unix as an open source system, at least temporarily during the early 1990s.
The early 1990s was also precisely the time when Linux had started to emerge
and quickly seize center stage as the most popular Unix-like system, offered
with open source, licensed under the General Public License (GPL), and free
of charge.
Necessity is the mother of invention – or at least discomfort is. The devel-
opment of Unix at Bell Labs started when Ken Thompson and Dennis Ritchie
wrote an operating system in assembly language for a DEC PDP-7. The project
was their reaction to Bell Labs' withdrawal from the Multics time-sharing oper-
ating system project with MIT and General Electric. Even though the Multics
project had problems (like system bloating), it was an important and innovative
system, and the two programmers had become accustomed to it. Its unavailabil-
ity and replacement by an inferior, older system frustrated them (Scott, 1988).
In response, they decided to design a new, simple operating system to run on
their DEC machine. Interestingly, the idea was not to develop a BTL corporate
product but just to design and implement a usable and simple operating sys-
tem that the two developers could comfortably use. In addition to the core of
the operating system, the environment they developed included a file system,
a command interpreter, some utilities, a text editor, and a formatting program.
Since it provided the functionality for a basic office automation system, Thomp-
son and Ritchie persuaded the legal department at Bell Labs to be the first users
and rewrote their system for a PDP-11 for the department.
In 1973, a decisive if not revolutionary development occurred: the operating
system was rewritten in C. The C language was a new high-level programming
language that Ritchie had just invented and which was, among
other things, intended to be useful for writing software that would ordinarily
have been written in assembly language, like operating systems. This extremely
innovative approach meant that Unix could now be much more easily updated
or ported to other machines. In fact, within a few years, Unix had been ported to
a number of different computer platforms – something that had never been done
before. The use of a higher level language for operating system implementation
was a visionary development because, prior to this, operating systems had always
been closely tied to the assembly language of their native hardware. The high-
level language implementation made the code for the operating system "much
easier to understand and to modify" (Ritchie and Thompson, 1974), which
was a key cognitive advantage in collaborative development. Furthermore, as
Raymond (1997) observed:
If Unix could present the same face, the same capabilities, on machines of many
different types, it could serve as a common software environment for all of them.
No longer would users have to pay for complete new designs of software every time
a machine went obsolete. Hackers could carry around software toolkits between
different machines, rather than having to re-invent the equivalents of fire and the
wheel every time.
The Unix environment was also elegantly and simply designed as a toolkit of
simple programs that could easily interact. Allegedly, the terse character of the
Unix and C commands was just an artifact of the fact that the teletype machines
that communicated with the PDP were quite slow: so the shorter the commands
(and the error messages), the more convenient it was for the user!
The professional dissemination of their work by Ritchie and Thompson also
strongly affected the rapid deployment of Unix. At the end of 1973, they gave
a report on Unix at the Fourth ACM Symposium on Operating Systems Principles,
which was later published in the Communications of the ACM (Ritchie and Thompson,
1974). As Tom Van Vleck observed, this report still "remains one of the best
and clearest pieces of writing in the computer field" (Van Vleck, 1995). The
ACM symposium presentation caught the eye of a Berkeley researcher who
subsequently persuaded his home department to buy a DEC on which to install
the new system. This initiated Berkeley's heavy historic involvement in Unix
development, an involvement that was further extended when Ken Thompson
went to UC Berkeley as a visiting professor during 1976 (Berkeley was his
alma mater). The article's publication precipitated further deployment.
The system was deployed widely and rapidly, particularly in universities. By
1977, there were more than 500 sites running Unix. Given the legal restrictions
that the AT&T monopoly operated under, because of the so-called 1956 consent
decree with the U.S. Department of Justice, AT&T appeared to be prohibited
from commercially marketing and supporting computer software (Garfinkel and
Spafford, 1996). Thus, de facto, software was not considered a profit center
for the company. During this time, the source code for Unix, not merely the
binary code, was made available by AT&T to universities and the government,
as well as to commercial firms. However, the distribution was under the terms of
an AT&T license and an associated nondisclosure agreement that was intended
to control the release of the Unix source code. Thus, although the source code
was open in a certain sense, it was strictly speaking not supposed to be disclosed,
except to other license recipients who already had a copy of the code. University
groups could receive a tape of the complete source code for the system for
about $400, which was the cost of the materials and their distribution, though
the educational license itself was free.
The distribution of Unix as an operating system that was widely used in
major educational environments, and its use as part of their education by many of the
top computer science students in the country, had many side benefits. For exam-
ple, it meant – going forward – that there would be within a few years literally
thousands of Unix-savvy users and developers emerging from the best research
universities who would further contribute to the successful dispersion, develop-
ment, and entrenchment of Unix. A Unix culture developed that was strongly
dependent on having access to the C source code for the system, the code that
also simultaneously served as the documentation for the system programs. This
access to the source code greatly stimulated innovation. Programmers could
experiment with the system, play with the code, and fix bugs, an advantage for
the development of Unix that would have been nonexistent if the distributions
of the Unix source and system utilities had only been available in binary form.
By 1978, the Berkeley Computer Systems Research Group, including stu-
dents of Ken Thompson at Berkeley, were making add-ons for Unix and dis-
tributing them, always with both the binary executable and the C source code
included, basically for the cost of shipping and materials, as so-called Berkeley
Software Distributions of Unix – but only as long as the recipient had a valid
Unix source license from AT&T. The license under which the BSD application
code itself was distributed was very liberal. It allowed the BSD-licensed open
source code, or modifications to it, to be incorporated into closed, proprietary
software whose code could then be kept undisclosed.
Legal issues invariably impact commercializable science. An important legal
development occurred in 1979 when AT&T released the 7th version of Unix.
By that point AT&T was in a legal position to sell software, so it decided that
it was now going to commercialize Unix, no longer distributing it freely and
no longer disclosing its source code. Companies like IBM and DEC could
receive the AT&T source-code licenses for a charge and sometimes even the
right to develop proprietary systems that used the trademark Unix, like DEC's
Ultrix. Almost inevitably the companies created differentiated versions of Unix,
leading eventually to a proliferation of incompatible proprietary versions. The
UC Berkeley group responded to AT&T's action at the end of 1979 by making
its next BSD release (3BSD) a complete operating system, forking the Unix
development (Edwards, 2003). Berkeley, which was now strongly supported by
DARPA, especially because of the success of the virtual memory implementation
in 3BSD, would now come to rival Bell Labs as a center of Unix development.
Indeed, it was BSD Unix that was selected by DARPA as the base system for
the TCP/IP protocol that would underlie the Internet. During the early 1980s the
Berkeley Computer Systems Research Group introduced improvements to Unix
that increased the popularity of its distributions with universities, especially
because of its improved networking capabilities.
The networking capabilities provided by BSD 4 had what might be described
as a meta-effect on software development. BSD version 4.2 was released in
1983 and was much more popular with Unix vendors for several years than
AT&T's commercial Unix System V version (McKusick, 1999). But version 4
didn't merely improve the capabilities of an operating system. It fundamentally
altered the very way in which collaboration on software development could
be done because it provided an infrastructure for digitally transmitting not
only communication messages but also large amounts of source code among
remotely located developers. This created a shared workspace where the actual
production artifacts could be worked on in common.
However, AT&T did not stand still with its own Unix versions. The AT&T
System V variants, starting with the first release in 1983 and continuing to
release 4 of System V in 1989 (also designated SVR4), eventually incorporated
many of the improvements to Unix that had been developed at Berkeley (Scott,
1988). Indeed, release 4 of System V had over a million installations. However,
these commercial releases no longer included source code. System V was the
first release that AT&T actually supported (because it was now a commercial
product) and ultimately became the preferred choice for hardware vendors,
partly because its operating system interfaces followed certain formal stan-
dards better (Wheeler, 2003). While the Unix license fees that AT&T charged
universities were nominal, those for commercial firms had ranged as high as
a quarter of a million dollars, though AT&T lowered the costs of the commercial
license with the release of System V. Many private companies developed their
own private variations (so-called flavors) of Unix based on SVR4 under license
from AT&T. Eventually, AT&T sold its rights to Unix to Novell after release 4
of System V in the early 1990s.
BSD Unix evolved both technically and legally toward free, open source
status, divorced from AT&T restrictions. Throughout the 1980s, Berkeley's
Computer Systems Research Group extensively redeveloped Unix, enhancing
it – and rewriting or excising almost every piece of the AT&T Unix code. The
BSD distributions would ultimately be open source and not constrained by the
AT&T licensing restrictions. Indeed, by 1991, a BSD system (originally Net/2)
that was almost free of any of the AT&T source code was released and
freely redistributed. However, in response to this action by Berkeley, AT&T,
concerned that its licensing income would be undermined, sued Berkeley in
1992 for violating its licensing agreement with AT&T. Later Berkeley counter-
sued AT&T for not giving Berkeley adequate credit for the extensive BSD code
that AT&T had used in its own System V! The dispute was settled by 1994. An
acceptable, free version of Unix called 4.4BSD-Lite was released soon after the
settlement. All infringements of AT&T code had been removed from this code,
even though they had been relatively minuscule in any case.
The legal entanglements for BSD Unix caused a delay that created a key
window of opportunity for the fledgling Linux platform. Indeed, the timing of
the legal dispute was significantly disruptive for BSD Unix because it was pre-
cisely during this several-year period, during which UC Berkeley was stymied
by litigation with AT&T, that Linux emerged and rapidly gained in popularity.
That said, it should be noted, in the case of Linux, that half of the utilities that
come packaged with Linux in reality come from the BSD distribution – and of
course Linux itself in turn depends heavily on the free or open tools developed
by the GNU project (McKusick, 1999).
The BSD open source versions that forked from 4.4BSD-Lite, and which
were essentially free/open Unix-like clones, included four systems: OpenBSD,
NetBSD, BSDI, and most significantly FreeBSD. These versions were all
licensed under the BSD license (at least the kernel code and most new code),
which unlike the GPL permits both binary and source code redistribution. It
includes the right to make derivative works that can be taken proprietary, as
long as credit is given for the code done by the Berkeley group. OpenBSD
is the second most popular of these free operating systems, after FreeBSD. It
has recognized, empirically verified, strong security performance, as we briefly
elaborated on in Chapter 2. NetBSD was designed with the intention of being
portable to almost any processor. BSDI was the first commercial version of Unix
for the widespread Intel platform (Wheeler, 2003). FreeBSD is the most popu-
lar of all the free operating systems, after Linux. Unlike Linux, it is developed
under a single Concurrent Versions System (CVS) revision tree. Additionally,
FreeBSD is a "complete operating system (kernel and userland)" and has the
advantage that both the "kernel and provided utilities are under the control
of the same release engineering team, (so) there is less likelihood of library
incompatibilities" (Lavigne, 2005). It is considered to have high-quality net-
work and security characteristics. Its Web site describes it as providing "robust
network services, even under the heaviest of loads, and uses memory efficiently
to maintain good response times for hundreds, or even thousands, of simulta-
neous user processes" (http://www.freebsd.org/about.html, accessed January
5, 2005). Yahoo uses FreeBSD for its servers, as does the server survey Web
site NetCraft. It is considered an elegantly simple system that installs easily
on x86-compatible PCs and a number of other architectures. FreeBSD is also
considered binary compatible with Linux, in the sense that commercial appli-
cations that are distributed as binaries for Linux generally also run on FreeBSD,
including software like Matlab and Acrobat.

For more information on the Unix operating system and its history, we refer
the interested reader to Raymond (2004) and Ritchie (1984).
Open Standards for Unix-like Operating Systems
Standards are extremely important in engineering. And, as Andrew Tanenbaum
quipped: "The nice thing about standards is that there are so many of them to
choose from" (Tanenbaum, 1981)! Standards can be defined as openly avail-
able and agreed-upon specifications. For example, there are international stan-
dards for HTML, XML, SQL, Unicode, and many other hardware and software
systems, sanctioned by a variety of standards organizations like the W3C con-
sortium or the International Organization for Standardization (ISO). Publicly
available standards are important for software development because they define
criteria around which software products or services can be built. They ensure
that software implementers are working toward an agreed-upon shared target.
They help guarantee the compatibility and interoperability of products from
different manufacturers or developers.
Standards help control emergent chaos: something that was definitely hap-
pening in the development of Unix-like systems. The closed versions of Unix
developed by AT&T and eventually the various Unix vendors naturally tended
to diverge increasingly over time, partly because the code bases were pro-
prietary and the individual hardware vendors' needs were specialized. Such
divergences may have the advantage of allowing useful specialized versions of
a system to emerge, tailored to specific hardware architectures, but they also
make it increasingly difficult for software developers to develop application
programs that work in these divergent environments. Divergence also increases the learn-
ing difficulties of users who work or migrate between the different variations.
Establishing accepted, widely recognized standards is a key way of guarding
against the deleterious effects of the proliferation of such mutating clones of an
original root system.
Operating systems exhibit the scale of complexity that necessitates standards.
In the context of operating systems, standards can help establish uniform user
views of a system as well as uniform system calls for application programs.
Two related standards for Unix (which have basically merged) are the POSIX
standard and the Single Unix Specification, both of which were initiated in
the mid-1980s as a result of the proliferation of proprietary Unix-like systems.
Standards do not entail disclosing source code as in the open source model,
but they do at least ensure a degree of portability for applications and users,
and mitigate the decreasing interoperability that tends to arise when
closed source systems evolve, mutate, and diverge. Thus, while open source
helps keep divergence under control by making system internals transparent and
reproducible, open standards attempt to help control divergence by maintaining
coherence in the external user and programming interfaces of systems.
POSIX refers to a set of standards, defined by the IEEE and recognized by
the ISO, which is intended to standardize the Application Program Interface
for programs running on Unix-like operating systems. POSIX is an acronym for
Portable Operating System Interface, with the appended X due to the Unix
connection. The name was proposed (apparently humorously) by Stallman, who
is prominent for his role in the Free Software Foundation and movement. POSIX
was an effort by a consortium of vendors to establish a single standard for Unix,
making it simpler to port applications across different hardware platforms. The
user interface would look the same on different platforms, and programs that
ran on one POSIX system would also run on another. In other words, the user
interface would be portable, as would the Application Programmer Interface,
rather than the operating system.
The POSIX standards include a compliance suite called the Conformance
Test Suite. Actually, the term compliance is weaker than the stronger term
conformance, which implies that a system supports the POSIX standards in their
entirety. The POSIX standards address both user and programming software
interfaces. For example, the Korn Shell is established as the standard user
command-line interface, as are an extensive set of user commands and utilities
like the command for listing files (ls). These standards fall under what is called
POSIX.2. The standards also define the C programming interface for system
calls, including those for I/O services, files, and processes, under what is called
POSIX.1. The POSIX standards were later integrated into the so-called Single
Unix Specification, which had originated at about the same time as the POSIX
standards. The Single Unix Specification is the legal definition of the Unix
system under the Unix trademark owned by the Open Group. The Open Group
makes the standards freely available on the Web and provides test tools and
certification for the standards.
References
Edwards, K. (2003). Technological Innovation in the Software Industry: Open Source
Development. Ph.D. Thesis, Technical University of Denmark.
Garfinkel, S. and Spafford, G. (1996). Practical Unix and Internet Security. O'Reilly
Media, Sebastopol, CA.
Lavigne, D. (2005). FreeBSD: An Open Source Alternative to Linux. http://www.
freebsd.org/doc/en_US.ISO8859-1/articles/linux-comparison/article.html.
Accessed February 10, 2007.
McKusick, M. (1999). Twenty Years of Berkeley Unix: From AT&T-Owned to Freely
Redistributable. In: Open Sources: Voices from the Open Source Revolution, M.
Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA,
31–46.
Raymond, E. (2004). The Art of UNIX Programming. Addison-Wesley Professional
Computing Series. Pearson Education Inc. Also: Revision 1.0, September 19, 2003.
http://www.faqs.org/docs/artu/. Accessed January 10, 2007.
Raymond, E.S. (1997). A Brief History of Hackerdom. http://www.catb.org/esr/
writings/cathedral-bazaar/hacker-history/. Accessed November 29, 2006.
Ritchie, D. (1984). The Evolution of the UNIX Time-Sharing System. Bell System
Technical Journal, 63(8), 1–11. Also: http://cm.bell-labs.com/cm/cs/who/dmr/
hist.pdf. Accessed January 10, 2007.
Ritchie, D. and Thompson, K. (1974). The UNIX Time-Sharing System. Communications
of the ACM, 17(7), 365–375. Revised version of paper presented at: Fourth
ACM Symposium on Operating System Principles, IBM Watson Research Center,
Yorktown Heights, New York, October 15–17, 1973.
Scott, G. (1988). A Look at UNIX. U-M Computing News. University of Michigan
Computing Newsletter, 3(7).
Tanenbaum, A. (1981). Computer Networks, 2nd edition. Prentice Hall, Englewood
Cliffs, NJ.
Van Vleck, T. (1995). Unix and Multics. http://multicians.org/unix.html. Accessed
January 10, 2007.
Wheeler, D. (2003). Secure Programming for Linux and Unix HOWTO. http://www.
dwheeler.com/secure-programs. Accessed November 29, 2006.
3.1.2 Linux
Linux is the defining, triumphant mythic project of open source. It illustrates
perfectly the paradigm of open development and the variegated motivations
that make people initiate or participate in these projects. It led to unexpected,
unprecedented, explosive system development and deployment. It represents the
metamorphosis of an initially modest project, overseen by a single individual,
into a global megaproject.
Linux was the realization of a youthful computer science student's dream of
creating an operating system he would like and that would serve his personal
purposes. It began as a response to limitations in the Minix PC implementation
of Unix. As described previously, Unix had originally been freely and widely
distributed at universities and research facilities, but by 1990 it had become
both expensive and restricted by a proprietary AT&T license. An inexpensive
Unix clone named Minix, distributed with its source code, which could run on
PCs and used no AT&T code in its kernel, compilers, or utilities, had been
developed by Professor Andrew Tanenbaum for use in teaching operating systems
courses. In 1991, Linus Torvalds, then an undergraduate student at the
University of Helsinki, got a new PC, his first, based on an Intel 386
processor. The only available operating systems for the PC were DOS, which
lacked multitasking, and Minix. Linus bought a copy of Minix and tried it on
his PC, but was dissatisfied with its performance. For example, it lacked
important features like a terminal emulator
that would let him connect to his school’s computer. A terminal emulator is a
program that runs on a PC and lets it interact with a remote, multiuser server.
This is different from a command-line interpreter or shell. Terminal emulators
were frequently used to let a PC user log on to a remote computer to execute
programs available on the remote machine. The familiar Telnet program is a
terminal emulator that works over a TCP/IP network and lets the PC running
it interact with the remote server program (SSH would now be used). Commands
entered through the Telnet prompt are transmitted over the network and
executed as if they had been directly entered on the remote machine's console.
Linus implemented his own terminal emulator separate from Minix and also
developed additional, Minix-independent programs for saving and transferring
files.
This was the beginning of the Linux operating system. The system's name is
an elision of the developer's first name, Linus, and Unix (the operating
system it is modeled on). Linux is said to be a Unix-like operating system in
the sense that its system interfaces or system calls are the same as those of
Unix, so programs that work in a Unix environment will also work in a Linux
environment. It would be worthwhile for the reader to look up a table of
Linux system calls and identify the function of some of the major calls to
get a sense of what is involved.
The post that would be heard round the world arrived in late August 1991.
Linus posted the note on the Usenet newsgroup comp.os.minix (category:
Computers > Operating Systems > Minix), a newsgroup of which Linus was a
member, dedicated to discussion of the Minix operating system. He announced
that he was developing a free operating system for 386(486) AT clones. These
networked communication groups, which had first become available in the
1980s, would be key enabling infrastructure for the kind of distributed,
collaborative development Linux followed. The group provided a forum for
Linus to tell people what he wanted to do and to attract their interest. The
posted message asked if anyone in the newsgroup had ideas to propose for
additional features for his system. The original post follows:
From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: What would you like to see most in minix?
Summary: small poll for my new operating system
Message-ID: <1991Aug25.205708.9541@klaava.Helsinki.FI>
Date: 25 Aug 91 20:57:08 GMT
Organization: University of Helsinki
Hello everybody out there using minix –
I'm doing a (free) operating system (just a hobby, won't be big and professional
like gnu) for 386(486) AT clones. This has been brewing since april, and is starting
to get ready. I'd like any feedback on things people like/dislike in minix, as my OS
resembles it somewhat (same physical layout of the file-system (due to practical
reasons) among other things).
I've currently ported bash(1.08) and gcc(1.40), and things seem to work. This
implies that I'll get something practical within a few months, and I'd like to know
what features most people would want. Any suggestions are welcome, but I won't
promise I'll implement them :-)
Linus (torvalds@kruuna.helsinki.fi)
PS. Yes – it's free of any minix code, and it has a multi-threaded fs. It is NOT
portable (uses 386 task switching etc), and it probably never will support anything
other than AT-harddisks, as that's all I have :-(.
Though Linux would become a dragon-slayer of a project, the initial post was
a “modest proposal” indeed, yet it does convey the sense of the development.
It was motivated by personal need and interest. It was to be a Unix-like
system. It was to be free, with a pure pedigree: just as Minix contained none
of the proprietary AT&T Unix code, Linux too would contain none of the Minix
code. Linus wanted suggestions on additional useful features and
enhancements. A few basic Unix programs had already been implemented on a
specific processor, but the scale was small and the system was not even
planned to be “ported” (adapted and implemented) on other machines. A little
later, Linus posted another engaging e-mail to the Minix newsgroup:
From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: Free minix-like kernel sources for 386-AT
Message-ID: <1991Oct5.054106.4647@klaava.Helsinki.FI>
Date: 5 Oct 91 05:41:06 GMT
Organization: University of Helsinki
Do you pine for the nice days of minix-1.1, when men were men and wrote their
own device drivers? Are you without a nice project and just dying to cut your teeth
on an OS you can try to modify for your needs? Are you finding it frustrating when
everything works on minix? No more all-nighters to get a nifty program working?
Then this post might be just for you :-)
The rest of the post describes Linus' goal of building a stand-alone operating
system independent of Minix. One has to smile at the unpretentious and
enthusiastic tone of the e-mail. The interest in making a “nifty program”
sort of says it all.
Linux would soon become a model of Internet-based development, which itself
relied on networked communication and networked file sharing. Linus
encouraged interested readers to download the source code he had written and
made available on an FTP server. He wanted the source to be easily available
over FTP and inexpensive (Yamagata, 1997). Thus, in addition to letting
collaborators communicate with one another, the networked environment
provided the means for the rapid dissemination of revisions to the system.
Potential contributors were asked to download the code, play with the system
developed so far, report any corrections, and contribute code. The patch
program had already been introduced some years earlier, so the exchange of
code changes was relatively simple.
Releases were to follow quickly as a matter of development strategy, and
Unix functionality was soon matched. Within a month of his October 1991
announcement, ten people had installed the first version on their own
machines. Within two months, 30 people had contributed a few hundred error
reports or contributed utilities and drivers. When the comp.os.linux
newsgroup was subsequently established, it became one of the top five most
read newsgroups (Edwards, 2003). Later in 1991, Linus distributed what he
called version 0.12. It was initially distributed under a license that
forbade charging for distributions. By January 1992, this was changed and
Linux was distributed under the GNU GPL. This was done partly for logistic
reasons (so people could charge for making disk copies available) but
primarily out of Linus' appreciation for the GPL-licensed GNU tools that he
had grown up on and was using to create Linux. Linus gave credit to three
people for their contributions in a December release. By the release of
development version 0.13, most of the patches had been written by people
other than himself (Moon and Sproul, 2000). From that point on, Linux
developed quickly, partly as a result of Linus' release-early, release-often
policy. Indeed, within a year and a half, Linus had released 90 updated
versions (!) of the original software, prior to the first user version 1.0
in 1994. By the end of 1993, Linux had developed sufficiently to serve as a
replacement for Unix. Version 1.0 of the Linux kernel was released in March
1994.
Linux is just the kernel of an operating system. For example, the
command-line interpreter or shell that runs on top of the Linux kernel was
not developed by Linus but came from previously existing free software. There
were of course other key free components used in the composite GNU/Linux
system, like the GNU C compiler developed by the FSF in the 1980s. Successive
Linux versions substantially modified the initial kernel. Linus advocated the
then unconventional use of a monolithic kernel rather than a so-called
microkernel. Microkernels are small kernels that embed only hardware-related
components in the kernel and use message passing to communicate between the
kernel and the separate outer layers of the operating system. This structure
makes microkernel designs portable but also generally slower than monolithic
kernels because of the increased interprocess communication they entail.
Monolithic kernels, on the other hand, integrate the outer layers into the
kernel, which makes them faster. Linux's design uses modules that can be
linked to the kernel at runtime in order to achieve the advantages offered by
the microkernel approach (Bovet and Cesati, 2003). A fascinating historical
exchange about kernel design
occurred in the heated but instructive debate between Minix's Tanenbaum and
Linux's Torvalds in their controversial early 1992 newsgroup discussions.
Refer to Tanenbaum's provocative “Linux is obsolete” thread in the
comp.os.minix newsgroup and Linus' equally provocative response (see also
DiBona et al., 1999, Appendix A).
Bugs in operating systems can be difficult, unpredictable critters, but
Linux's massive collaborative environment was almost ideally suited for these
circumstances. Since operating systems are subject to temporal and real-time
concurrent effects, improvements in the system implementation tend to focus
on the need to remedy bugs – as well as on the need to develop new device
drivers as additional peripherals appear (Wirzenius, 2003). Typical operating
system bugs might occur only rarely or intermittently or be highly
context-dependent. Bugs can be time-dependent or reflect anomalies that occur
only in some complicated context in a concurrent-user, multitasking
environment. The huge number of individuals involved in developing Linux,
especially as informed users, greatly facilitated both exposing and fixing
such bugs, which would have been much more difficult to detect in a more
systematic development approach. The operative diagnostic principle or tactic
from Linus' viewpoint is expressed in his well-known aphorism that “given
enough eyeballs, all bugs are shallow.” On the other hand, there have also
been critiques of Linux's development. For example, one empirical study of
the growth in coupling over successive versions of Linux concluded that
“unless Linux is restructured with a bare minimum of common coupling, the
dependencies induced by common coupling will, at some future date, make Linux
exceedingly hard to maintain without inducing regression faults,” though this
outcome was thought to be avoidable if care were taken to introduce no
additional coupling instances (Schach et al., 2002).
The modular design of Linux's architecture facilitated code development
just as the collaborative framework facilitated bug handling. For example,
the code for device drivers currently constitutes the preponderance of the
Linux source code, in contrast to the much smaller body of code for core
operating system tasks like multitasking. The drivers interface with the
operating system kernel through well-defined interfaces (Wirzenius, 2003).
Thus, modular device drivers are easy to write without the programmer having
a comprehensive grasp of the entire system. This separable kind of structure
is extremely important from a distributed development point of view. It lets
different individuals and groups address the development of drivers
independently of one another, something that is essential given the minimally
synchronized and distributed nature of the open development model. In fact,
the overall structure of the kernel that Linus designed was highly modular.
This
is a highly desirable characteristic of an open source software architecture,
because it is essential for decomposing development tasks into independent
pieces that can be worked on separately and in parallel, with only relatively
limited organizational coordination required. Furthermore, this structure
also allows so-called redundant development (Moon and Sproul, 2000), where
more than one individual or group can simultaneously try to solve a problem,
with the best or earliest outcome ultimately selected for inclusion in the
system.
Linux was portable and functional, and it turned out to be surprisingly
reliable. As we have noted, over its first several years of development,
enough features were added to Linux for it to become competitive as an
alternative to Unix. Its portability was directly related to the design
decision to base Linux on a monolithic core kernel, with hardware-specific
code like device drivers handled by so-called kernel modules. This decision
was in turn directly related to enabling the distributed style of Linux
development. The structure also allowed Linus Torvalds to focus on managing
core kernel development, while others could work independently on kernel
modules (Torvalds, 1999b). Several years after the version 1.0 release of the
system in 1994, Linux was ported to processors other than the originally
targeted 386/486 family, including the Motorola 68000, the Sun SPARC, the
VAX, and eventually many others. Its reliability quickly became superior to
that of Unix. Indeed, Microsoft program manager Valloppillil (1998), in the
first of the now famous confidential Microsoft “Halloween” memos, reported
that the Linux failure rate was two to five times lower than that of
commercially available versions of Unix, according to performance analyses
done internally by Microsoft itself.
The scale of the project continued to grow. The size of the distributed team
of developers expanded almost exponentially. Despite this, the organizational
paradigm remained lean in the extreme. Already by mid-1995, over 15,000
people had submitted contributions to the main Linux newsgroups and mailing
lists (Moon and Sproul, 2000). A decade later, by 2005, there would be almost
700 Linux user groups spread worldwide (http://lugww.counter.li.org/,
accessed January 5, 2007). The 1994 version 1.0 release, which had already
been comparable in functionality to Unix, encompassed 175,000 lines of source
code. By the time version 2.0 was released in 1996, the system had 780,000
lines of source code. The 1998 version 2.1.110 release had a million and a
half lines of code (LOC), of which 30% consisted of code for the kernel and
file system, about 50% for device drivers, and about 20% was hardware
architecture-specific (Moon and Sproul, 2000). The amazing thing was that, to
quote Moon and Sproul (2000): “No (software) architecture group developed
the design; no management team approved the plan, budget, and schedule; no
HR group hired the programmers; no facilities group assigned the office
space. Instead, volunteers from all over the world contributed code,
documentation, and technical support over the Internet just because they
wanted to.” It was an unprecedented tour de force of large-scale distributed
development and led to a blockbuster system.
What motivated such an army of dedicated participants? Many of them
shared the same kind of motivation as Linus: they wanted to add features to
the system so that it could do something useful for their personal benefit.
People also wanted to be known for the good code they developed. Initially,
Linus provided personal acknowledgments for individuals who made significant
contributions. Already by the version 1.0 release in 1994, Linus personally
acknowledged the work of over 80 people (Moon and Sproul, 2000). This version
also began the practice of including a credits file with the source code that
identified the major contributors and the roles they had played. It was up to
the contributors themselves to ask Linus to be included in the credits file.
This kind of reputational reward was another motivation for continued
developer participation in a voluntary context like that of Linux.
Not all participants were equal, however. Linux kernel developer Andrew
Morton, lead maintainer for the Linux production kernel at the time, observed
in 2004 (at the Forum on Technology and Innovation) that of the 38,000 most
recent patches to the Linux kernel, made by roughly 1,000 developers, some
37,000 – about 97% – came from a subset of 100 developers who were paid by
their companies to work on Linux. It is worth perusing the Linux credits
file. For example, you might try to observe any notable demographic patterns,
like the country of origin of participants, their industrial or academic
affiliations based on their e-mail addresses, the apparent sex of
participants, and the like.
Decisions other than technical ones were key to Linux. Managerial
innovativeness was central to its successful development. Technical and
managerial issues could very well intertwine. For example, after the original
system was written for the Intel 386 and then ported in 1993 to the Motorola
68000, it became clear to Linus that he had to redesign the kernel
architecture so that a greater portion of the kernel could serve different
processor architectures. The new architectural design not only made the
kernel code far more easily portable but also more modular. Organizationally,
this allowed different parts of the kernel to be developed in parallel
(Torvalds, 1999a, b) and with less coordination, which was highly
advantageous in the distributed development environment. The way in which
software releases were handled was also shaped by market effects. A simple
but important managerial/marketing decision in this connection was the use of
a dual track for releases. The dual
track differentiated between stable releases, which could be used confidently
by people who merely wanted the operating system as a platform for their
applications work, and development releases, which were less stable, still
under development, and included the newest feature additions. This kept two
potentially disparate audiences happy: the developers had flexibility, and
the end users had certainty. The distinction between developer and stable
releases also supported the “release-early, release-often” policy that
facilitated rapid development. The release-numbering system reflected the
categorization and is worth understanding. Odd-numbered release series such
as 2.3 (or its subtree members like 2.3.1 and 2.3.2) corresponded to
developer or experimental releases. Stable releases had an even-numbered
second digit, like 2.0 or 2.2. Once a stable release was announced, a new
developer series would start with the next higher (odd) number (such as 2.3
in the present case). Amazingly, there were almost 600 releases of all kinds
between the 0.01 release in 1991 that started it all and the 2.3 release in
1999 (Moon and Sproul, 2000).
Though development was distributed and team-based, the project retained its
singular leadership. While Linus displayed a somewhat self-deprecating and
mild-mannered leadership or management style, it was ultimately he who called
the shots. He decided which patches were accepted and which additional
features were incorporated, announced all releases, and, at least in the
beginning of the project, reviewed all contributions personally and
communicated by e-mail with every contributor (Moon and Sproul, 2000). If it
is true that enough eyeballs make all bugs shallow, it also appears to be
true that in the Linux world there was a single governing pair of eyes
overseeing and ensuring the quality and integral vision of the overall
process. So one might ask again: is it a cathedral (a design vision defined
by a single mind) or a bazaar?
The choice of the GPL has been decisive to the developmental integrity of
Linux because it is instrumental in preventing the divergence of evolving
versions of the system. In contrast, we have seen how proprietary pressures
in the development of Unix systems encouraged the divergence of Unix
mutations, though the POSIX standards also act against this. This
centralizing control provided by the GPL for Linux was well articulated by
Robert Young (1999) of Red Hat in a well-known essay, where he argued that
unlike proprietary development: “In Linux the pressures are the reverse. If
one Linux supplier adopts an innovation that becomes popular in the market,
the other Linux vendors will immediately adopt that innovation. This is
because they have access to the source code of the innovation and it comes
under a license that allows them to use it.” Thus, open source creates
“unifying pressure to conform to a common reference point – in effect an open
standard – and it removes the intellectual property barriers that would
otherwise inhibit this convergence” (Young, 1999).
This is a compelling argument not only for the stability of Linux but also for
the merits of the GPL in rapid, innovative system development.
Writing code may be a relatively solitary endeavor, but the development of
Linux was an interactive social act. We have remarked on the organizational
structure of Linux development, the motivations of its participants, and the
personal characteristics of its unique leader. It is also worthwhile to
describe some characteristics of the social matrix in which the project
operated; for more detail see the discussion in Moon and Sproul (2000). To
begin with, Linus' participation in the Usenet newsgroup comp.os.minix
preceded his original announcement to the community of his Linux adventure.
This was a large online community, with about 40,000 members by 1992. The
group that would develop Linux was a self-selected subset that sat on top of
this basic infrastructure, which in turn sat on top of an e-mail and network
structure. Of course, by word of e-mail the Linux group would quickly spread
beyond the initial Minix newsgroup.
Communities like those that developed Linux exhibit a sociological
infrastructure. This includes their group communication structure and the
roles ascribed to the different members. In Linux development, group
communication was handled via Usenet groups and various Linux mailing lists.
Within months of the initial project announcement, the original single
mailing list (designated Linux-activists) had 400 members. At present there
are hundreds of such mailing lists targeted at different Linux distributions
and issues. The comp.os.linux newsgroup was formed by mid-1992. Within a few
years there were literally hundreds of such Linux-related newsgroups
(linuxlinks.com/Links/USENET). The Linux-activists mailing list was the first
list for Linux kernel developers, but others followed. It is worth looking up
the basic information concerning the Linux kernel mailing list at
www.tux.org. Check some of the entries in the site's hyperlink index to
understand how the process works. If you are considering becoming a
participant in one of the lists, be careful which list you subscribe to and
consider the advice in the FAQ (frequently asked questions) at the site,
which warns:
Think again before you subscribe. Do you really want to get that much traffic in
your mailbox? Are you so concerned about Linux kernel development that you will
patch your kernel once a week, suffer through the oopses, bugs and the resulting
time and energy losses? Are you ready to join the Order of the Great Penguin, and
be called a “Linux geek” for the rest of your life? Maybe you're better off reading
the weekly “Kernel Traffic” summary at http://www.kerneltraffic.org/.
The kernel mailing list is the central organizational tool for coordinating
kernel developers. Moon and Sproul (2000) observe that: “Feature freezes,
code freezes, and new releases are announced on this list. Bug reports are
submitted
to this list. Programmers who want their code to be included in the kernel
submit it to this list. Other programmers can then download it, test it
within their own environment, suggest changes back to the author, or endorse
it.” Messages sent to the list are automatically resent to everyone on the
list. The e-mail traffic is enormous, with thousands of developers posting
hundreds of thousands of messages over time. As of 2005, a member of the
kernel list could receive almost 10,000 messages per month, so digest
summaries of messages are appropriate to look at, at least initially. The
modular architecture of Linux also affects the communications profile of the
development, since the architecture partitions the developers into smaller
groups. This way, intensive collaboration occurs not across a broad
population of developers but within smaller sets of developers.
The technical roles of the major participants are divided into so-called
credited developers and maintainers. Credited developers are those who have
made substantial code contributions and are listed in the Linux credits file
(such as http://www.kernel.org/pub/linux/kernel/CREDITS, accessed January
10, 2007). There are also major contributors who for various personal reasons
prefer to keep a low profile and do not appear on the credits list. There
were about 400 credited Linux kernel developers by 2000. Maintainers are
responsible for individual kernel modules. The maintainers “review
linux-kernel mailing list submissions (bug reports, bug fixes, new features)
relevant to their modules, build them into larger patches, and submit the
larger patches back to the list and to Torvalds directly” (Moon and Sproul,
2000). These are people whose judgment and expertise are sufficiently trusted
by Linus, in areas of the kernel where he himself is not the primary
developer, that he will give close attention to their recommendations and
tend to approve their decisions. The credited developers and maintainers
dominate the message traffic on the linux-kernel mailing list. Typically,
1/50th of the developers generate 50% of the traffic; of this, perhaps 30%
is from credited developers and about 20% from maintainers. The norms for
how to behave with respect to the mailing lists are specified in a detailed
FAQ document that is maintained by about 20 contributors. For example, refer
to the http://www.tux.org/lkml/ document (accessed January 10, 2007) to get a
sense of the range of knowledge and behaviors that are part of the defining
norms of the developer community. It tells you everything from “What is a
feature freeze?” and “How to apply a patch?” to “What kind of question can I
ask on the list?” Documents like these are important in a distributed,
cross-cultural context because they allow participants to understand what is
expected of them and what their responsibilities are. In the absence of
face-to-face interactions, the delineation of such explicit norms of conduct
is critical to effective, largely text-based, remote communication.
References
Bovet, D.P. and Cesati, M. (2003). Understanding the Linux Kernel, 2nd edition.
O'Reilly Media, Sebastopol, CA.
DiBona, C., Ockman, S., and Stone, M. (1999). The Tanenbaum-Torvalds Debate. In:
Appendix A of Open Sources: Voices from the Open Source Revolution, M. Stone,
S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA.
Edwards, K. (2003). Technological Innovation in the Software Industry: Open Source
Development. Ph.D. Thesis, Technical University of Denmark.
Moon, J.Y. and Sproul, L. (2000). Essence of Distributed Work: The Case of the
Linux Kernel. http://www.firstmonday.dk/issues/issue5_11/moon/index.html.
Accessed December 3, 2006.
Schach, S., Jin, B., Wright, D., Heller, G., and Offutt, A. (2002). Maintainability
of the Linux Kernel. IEE Proceedings – Software, 149(1), 18–23.
Torvalds, L. (1999a). The Linux Edge. Communications of the ACM, 42(4), 38–39.
Torvalds, L. (1999b). The Linux Edge. In: Open Sources: Voices from the Open Source
Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media,
Sebastopol, CA, 101–111.
Valloppillil, V. (1998). Open Source Software: A (New?) Development Methodology
(August 11, 1998). http://www.opensource.org/halloween/. Accessed January 20,
2007.
Wirzenius, L. (2003). Linux: The Big Picture. PC Update Online. http://www.melbpc.
org.au/pcupdate/2305/2305article3.htm. Accessed November 29, 2006.
Yamagata, H. (1997). The Pragmatist of Free Software: Linus Torvalds Interview.
http://kde.sw.com.sg/food/linus.html. Accessed November 29, 2006.
Young, R. (1999). Giving It Away. In: Open Sources: Voices from the Open Source
Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media,
Sebastopol, CA, 113–125.
3.2 Windowing Systems and Desktops
By the early 1970s, computer scientists at the famed Xerox PARC research
facility were vigorously pursuing ideas proposed by the legendary Douglas
Engelbart, inventor of the mouse and prescient computer engineer whose seminal
work had propelled interest in the development of effective bitmapped graphics
and graphical user interfaces years ahead of its time (Engelbart, 1962).
Engelbart’s work eventually led to the development of the Smalltalk graphical
environment released on the Xerox Star computer in 1981. Many of the engineers
who worked at Xerox later migrated to Apple, which released the relatively
low-cost Macintosh graphical computer in 1984. Microsoft released systems
like Windows 2.0 with a similar “look and feel” to Apple by 1987, a similarity
for which Microsoft would be unsuccessfully sued by Apple (Reimer, 2005),
though see the review of the intellectual property issues involved in Myers
(1995). The provision of open source windowing and desktop environments
for Unix began in the early 1980s with the initiation of the X Window System
project. By the mid-1990s the GNOME and KDE projects to create convenient
free desktop environments for ordinary users, with GUI interfaces similar
to Windows and Mac OS, were begun. This section describes the enormous
efforts that have gone into these major open source projects: X, GNOME,
and KDE.
3.2.1 The X Window System
The X Window System (also called X or X11 after the version which appeared
in 1987) lets programmers develop GUIs for bitmap displays on Unix and other
platforms which do not come with windowing capabilities. It was developed for
a Unix environment beginning at MIT in 1984 in a joint collaboration between
MIT, DEC, and IBM and licensed under the permissive MIT/X open source
license by 1985. It is considered to be “one of the first very large-scale free
software projects” (X Window System, 2006), done in the context of the budding
Internet with extensive use of open mailing lists. The system lets programmers
draw windows and interact with the mouse and keyboard. It also provides what
is called network transparency, meaning that applications on one machine can
remotely display graphics on another machine with a different architecture and
operating system. For example, X allows a computationally intensive program
executing on a Unix workstation to display graphics on a Windows desktop (X
Window System, 2006). X now serves as the basis for both remote and local
graphic interfaces on Linux and almost all Unix-like systems, as well as for
Mac OS X (whose Darwin core derives in part from FreeBSD). KDE and GNOME, the
most popular free desktops, are higher level layers that run on top of X11. In
the case of KDE on Unix, for example, KDE applications sit on top of the KDE
libraries and Qt, which in turn run on X11 running on top of Unix (KDE, 2006).
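The network transparency just described rests on the X convention that a client locates its server through the DISPLAY environment variable, whose value names a host and a display number. The following Python sketch is purely illustrative (it is not part of any X library); it shows how a DISPLAY string such as remote.example.com:1.0 maps to the TCP endpoint an X client would contact, using the standard rule that display n listens on port 6000 + n:

```python
def parse_display(display):
    """Split an X DISPLAY string like 'host:1.0' into (host, tcp_port).

    X servers traditionally listen on TCP port 6000 + display number;
    an empty host part denotes a local (Unix-socket) connection.
    """
    host, _, rest = display.partition(":")
    number = rest.split(".")[0] or "0"   # drop the optional screen suffix
    return host, 6000 + int(number)

# A client on one machine directs its output to an X server on another
# simply by pointing DISPLAY at it:
print(parse_display("remote.example.com:1.0"))  # ('remote.example.com', 6001)
print(parse_display(":0"))                      # ('', 6000), i.e., local display
```

This is why a program running on a Unix workstation can draw on any machine running an X server: the rendering requests simply travel over that network connection.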
The X Window System is a large application with an impressive code base.
For example, X11 had over 2,500 modules by 1994. In an overview of the size
of a Linux distribution (Red Hat Linux 6.2), Wheeler (2000) observed that the
X Window server was the next largest component in the distribution after
the Linux kernel (a significant proportion of which was device-dependent). It
occupied almost a million-and-a-half SLOC. X was followed in size by the gcc
compiler, debugger, and Emacs, each about half the size of X. In comparison,
the important Apache project weighs in at under 100,000 SLOC. Despite this,
X also works perfectly well on compact digital devices like IBM’s Linux watch
or PDAs and has a minimal footprint that is “currently just over 1 megabyte of
code (uncompressed), excluding toolkits that are typically much larger” (Gettys,
2003).
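Wheeler’s figures come from his sloccount tool, which counts logical source lines. A much-simplified, hypothetical version of the underlying idea — counting non-blank, non-comment physical lines in C-like source — can be sketched as follows:

```python
def count_sloc(source):
    """Count physical source lines of code in C-like text: lines that are
    neither blank nor purely comments.  (Wheeler's published numbers use
    sloccount's more careful logical-line counting; this is only a rough
    illustration of the metric.)
    """
    sloc = 0
    in_block_comment = False
    for line in source.splitlines():
        stripped = line.strip()
        if in_block_comment:
            if "*/" in stripped:
                in_block_comment = False
            continue
        if not stripped or stripped.startswith("//"):
            continue
        if stripped.startswith("/*"):
            if "*/" not in stripped:
                in_block_comment = True
            continue
        sloc += 1
    return sloc

example = """\
/* header comment */
#include <stdio.h>

int main(void) {
    return 0;   // trivial
}
"""
print(count_sloc(example))  # 4
```

Applied to an entire source tree, counts like this are what allow the kernel, the X server, gcc, and Apache to be compared by size.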
The system was used on Unix workstations produced by major vendors
like AT&T, Sun, HP, and DEC (Reimer, 2005). The X11 version released in
1987 intentionally reflected a more hardware-neutral design. To maintain the
coherent evolution of the system, a group of vendors established the nonprofit
MIT X Consortium in 1987. The project was directed by X cofounder Bob
Scheifler. The consortium proposed to develop X “in a neutral atmosphere
inclusive of commercial and educational interests” (X Window System, 2006),
with the objective of establishing “the X Window System as an industry-wide
graphics windowing standard” (Bucken, 1988). IBM joined the consortium in
1988. Over time, commercial influence on the project increased. There were
also ongoing philosophical and pragmatic differences between the FSF and
the X project. In fact, Stallman (1998) had described the X Consortium (and
its successor the Open Group) as “the chief opponent of copyleft,” copyleft
being one of the defining characteristics of the GPL, even though the X license
is GPL-compatible in the usual sense that X can be integrated with software
licensed under the GPL. The FSF’s concern is the familiar one that commercial
vendors can develop extensive proprietary customizations of systems like the
X reference implementation, which they could then make dominant because of
the resources they can plow into proprietary development, relying on the liberal
terms of the MIT/X license.
A notable organizational change occurred in 2003–2004. The XFree86
project had started in 1992 as a port of X to IBM PC compatibles. It had
over time become the most popular and technically progressive version of X.
However, by 2003 there was growing discontent among its developer community,
caused partly by the difficulty of obtaining CVS commit access. On top of
this, in 2004, the XFree86 project adopted a GPL-incompatible license that
contained a condition similar to the original BSD advertising clause. The change
was supposed to provide more credit for developers, but it had been done in the
face of strong community opposition, including from preeminent developers
like Jim Gettys, cofounder of the X project. Gettys opposed the change because
it made the license GPL-incompatible. Stallman (2004) observed that although
the general intention of the new license requirement did “not conflict with the
GPL,” there were some specific details of the licensing requirement that did.
In combination with existing discontent about the difficulty of getting CVS
commit access, the new GPL incompatibility had almost immediate disruptive
consequences. The project forked, with the formation of the new X.Org foundation
in 2004. X.Org rapidly attracted almost all the XFree86 developers to its
GPL-compatible fork. The newly formed organization places a much greater
emphasis on individual participation. In fact, Gettys (2003) notably observed
that “X.org is in the process of reconstituting its governance from an industry
consortium to an organization in which individuals, both at a personal level and
as part of work they do for their companies have voice, working as part of the
larger freedesktop.org and free standards community” (italics added). X.Org
now provides the canonical reference implementation for the system, which
remains “almost completely compatible with the original 1987 protocol” (X
Window System, 2006).
References
Bucken, M. (1988). IBM Backs X Windows. Software Magazine, March 15. http://findarticles.com/p/articles/mi_m0SMG/is_n4_v8/ai_6297250. Accessed December 3, 2006.
Engelbart, D.C. (1962). Augmenting Human Intellect: A Conceptual Framework. Stanford Research Institute, Menlo Park, CA. http://www.invisiblerevolution.net/engelbart/full_62_paper_augm_hum_int.html. Accessed December 3, 2006.
Gettys, J. (2003). Open Source Desktop Technology Road Map. HP Labs, Version 1.14. http://people.freedesktop.org/jg/roadmap.html. Accessed December 6, 2006.
KDE. (2006). K Desktop Environment: Developer’s View. http://www.kde.org/whatiskde/devview.php. Accessed January 10, 2007.
Myers, J. (1995). Casenote, Apple v. Microsoft: Virtual Identity in the GUI Wars. Richmond Journal of Law and Technology, 1(5). http://law.richmond.edu/jolt/pastIssues.asp. Accessed December 6, 2006.
Reimer, J. (2005). A History of the GUI. http://arstechnica.com/articles/paedia/gui.ars. Accessed December 3, 2006.
Stallman, R. (1998). The X-Window Trap. Updated Version. http://www.gnu.org/philosophy/x.html. Accessed December 3, 2006.
Stallman, R. (2004). GPL-Incompatible License. http://www.xfree86.org/pipermail/forum/2004-February/003974.html. Accessed December 3, 2006.
Wheeler, D. (2000). Estimating Linux’s Size. Updated 2004. http://www.dwheeler.com/sloc/redhat62-v1/redhat62sloc.html. Accessed December 1, 2006.
X Window System. (2006). Top-Rated Wikipedia Article. http://en.wikipedia.org/wiki/X_Window_System. Accessed December 3, 2006.
3.2.2 Open Desktop Environments – GNOME
The objective of the GNOME Project is to create a free, General Public Licensed
desktop environment for Unix-like systems like Linux. This ambition has long
been fundamental to the vision of free development for the simple reason that
providing an effective, free GUI desktop interface for Linux or other free
operating systems is necessary for them to realistically compete in the mass
market with Windows and Apple environments. Aside from Linux itself, no other
open source project is so complex and massive in scale as GNOME, and none so
overtly challenges the existing, established, proprietary platforms. The acronym
GNOME stands for GNU Network Object Model Environment. It is the official
GNU desktop. In addition to the user desktop, GNOME also encompasses a
variety of standard applications and a comprehensive development environment
used to develop applications for GNOME or further develop the GNOME
platform itself.
The idea for the GNOME project was initiated in 1996 by Miguel de Icaza.
De Icaza, a recent computer science graduate who was the maintainer for the
GIMP project, released (along with Federico Mena) a primitive version (0.10)
of a GUI infrastructure for Unix in 1997. The development language used was C
(de Icaza, 2000). There was a significant free licensing controversy behind
the motivation for developing GNOME. There already existed by that time
another free desktop project called KDE, but there were licensing controversies
associated with KDE. One of its key components, the Qt toolkit library discussed
in Chapter 2, did not use an acceptable free software license. To avoid this kind
of problem, the GNOME developers selected, instead of Qt, the GIMP open
source image processing toolkit GTK+. They believed this software would
serve as an acceptable LGPL basis for GNOME. The GNU LGPL permitted
any applications written for GNOME to use any kind of software license, free
or not, although of course the core GNOME applications themselves were to
be licensed under the GPL. The first major release of GNOME was version 1.0
in 1999. It was included as part of the Red Hat Linux distribution. This release
turned out to be very buggy but was improved in a later release that year.
There are different models for how founders continue a long-term relationship
with an open source project. For example, they may maintain license ownership
and start a company that uses a dual open/proprietary track for licenses.
In the case of de Icaza, after several years of development work on GNOME, he
relinquished his role and founded the for-profit Ximian Corporation in 2000 as
a provider of GNOME-related services. In order to ensure the continued
independence of the GNOME project, the GNOME Foundation was established
later that year. Its Board members make decisions on the future of GNOME,
using volunteer committees of developers and release teams to schedule planning
and future releases. The not-for-profit GNOME Foundation, its industrial
partners, and a volunteer base of contributors cooperate to ensure that the
project progresses. The GNOME Foundation’s mandate has been defined as creating
“a computing platform for use by the general public that is completely free
software” (GNOME Foundation, 2000; German, 2003).
GNOME is unambiguously free in lineage and license – and it’s big. Following
free software traditions, the development tools that were used to create
GNOME are all free software. They include the customary GNU software
development tools (gcc compiler, Emacs editor, etc.), the Concurrent
Versions System (CVS) for project configuration management, and the Bugzilla
bug-tracking server software developed by the Mozilla Foundation and available
from http://www.bugzilla.org/. These development tools were of course
themselves the product of lengthy development during the 1980s–1990s by the
GNU project and the broader free software community. The project’s code base
is extensive and increasingly reliable. It is now a very large system with about
two million LOC and 500 developers in various categories (German, 2003). As
Koch and Schneider (2002) observe, the project’s CVS repository actually
records roughly six million LOC added and four million lines deleted over its
history. As already noted, the GNOME desktop has been envisioned as an
essential component if the free GNU/Linux environment is to compete in the
popular market with Windows and Apple. That vision is emerging as a reality.
Thus GNOME represents one of the culminating accomplishments of the free
software movement.
GNOME has three major architectural components: the GUI desktop environment,
a set of tools and libraries that can interact with the environment, and
a collection of office software tools. The scale and organizational structure of
the project reflect these components. The GNOME project architecture consists
of four main software categories, with roughly 45 major modules and a
large number of noncore applications. The categories comprise libraries (GUI,
CORBA, XML, etc., about 20 in total), core applications (about 16: mail
clients, word processors, spreadsheets, etc.), application programs, and several
dozen noncore applications (German, 2003). The modules, as is typical in such a
large project, are relatively loosely coupled, so they can be developed mostly
independently of one another. When modules become unwieldy in size, they
are subdivided as appropriate into independent submodules. The modular
organization is key to the success of the project because it keeps the
organizational structure manageable. Basically, a relatively small number of
developers can work independently on each module.
While most open source projects do not, in fact, have large numbers of
developers, GNOME does, with over 500 contributors having write access to
the project repository (German, 2003), though as usual a smaller number of
activist developers dominate. Koch and Schneider (2002) describe a pattern of
participation for GNOME that is somewhat different from that found by Mockus
et al. (2002) in their statistical review of the Apache project. Though GNOME’s
development still emanated from a relatively small number of highly activist
developers, the distribution is definitely flatter than Apache’s. For example,
while for Apache the top 15 developers wrote about 90% of the code, for
GNOME the top 15 developers wrote only 50% of the code, and to reach 80% of
the GNOME code, the top 50 developers have to be considered. At a more local
level, consider the case of the GNOME e-mail client (Evolution). According
to its development log statistics, five developers, out of roughly 200 total for
the client, were responsible for half the modifications; 20 developers accounted
for 80% of the development transactions; while a total of 55 developers of
the 200 accounted for 95% of the transactions (German, 2003). This skewed
pattern of contribution is not untypical. Refer to the useful libresoft Web site
page http://libresoft.urjc.es/Results/index.html for CVS statistics for GNOME,
as well as for many other open source projects.
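Concentration figures like these – the top 15 developers writing 50% of the code, or 20 developers making 80% of the commits – are straightforward to compute from repository logs. The sketch below uses hypothetical commit counts (not GNOME’s actual data) to illustrate the calculation:

```python
def top_n_for_share(commits_per_dev, share):
    """Return how many of the most active developers are needed for their
    combined commits to first reach the given fraction of all commits."""
    counts = sorted(commits_per_dev, reverse=True)
    target = share * sum(counts)
    running = 0
    for n, c in enumerate(counts, start=1):
        running += c
        if running >= target:
            return n
    return len(counts)

# Hypothetical distribution: a few very active developers and a long
# tail of occasional contributors (invented numbers, not GNOME data).
commits = [500, 300, 200, 100] + [10] * 50
print(top_n_for_share(commits, 0.5))   # 2 of 54 developers produce half the commits
print(top_n_for_share(commits, 0.8))   # 22 developers are needed to reach 80%
```

The flatter GNOME distribution simply means this function returns a larger n for a given share than it would for a project like Apache.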
The user environment that had to be created for GNOME was well-defined:
it was simply a matter of “chasing tail-lights” to develop it. Requirements
engineering for GNOME, like that for other open source projects, did not follow
the conventional proprietary development cycle approach. It used a more generic
and implicitly defined approach as described in German (2003). The underlying
objective was that GNOME was to be free software, providing a well-designed,
stable desktop model, comparable to Windows and Apple’s, in order for Linux
to be competitive in the mass-market PC environment. The nature of the core
applications that needed to be developed was already well-defined. Indeed, the
most prominent reference applications were from the market competition to be
challenged. For example, Windows MS Excel was the reference spreadsheet
application. It was to be matched by the GNOME gnumeric tool. Similarly, the
e-mail client Microsoft Outlook and the multifunction Lotus Notes were to be
replaced by the GNOME Evolution tool. An anecdote by de Icaza reflects both
the informality and effectiveness of this reference model tactic in the design
of a simple calendar tool:

I proposed to Federico to write a calendar application in 10 days (because Federico
would never show up on weekends to the ICN at UNAM to work on GNOME ;-).
The first day we looked at OpenWindows calendar, that day we read all the relevant
standard documents that were required to implement the calendar, and started
hacking. Ten days later we did meet our deadline and we had implemented
GnomeCal (de Icaza, 2000).
Requirements also emerged from the discussions that occurred in the mail-
ing lists. Prototypes like the initial GNOME version 0.1 developed by Icaza
also served to define features. Ultimately, it was the project leader and the
maintainers who decided on and prioritized requirements. While fundamental
disagreements regarding such requirements could lead to forks, this did not
happen in the case of GNOME.
GNOME’s collaborative development model relies heavily on private companies.
Indeed, much of GNOME’s continued development is staffed by
employees of for-profit companies. However, the project itself is vendor-neutral.
The maintainers of most of GNOME’s important modules are actually employees
of for-profit corporations like Ximian, Red Hat, and Sun. This arrangement
helps guarantee the stable development of the project since essential tasks are
less subject to fluctuations at the volunteer level. The paid employee
contributors tend to handle design, coordination, testing, documentation, and
bug fixing, as opposed to bug identification (German, 2003). Thus, for the
Evolution client, about 70% of the CVS commits come from the top 10% of the
contributors, all of whom are employees of Ximian. Similarly, Sun has
extensively supported the so-called GNOME accessibility framework which
addresses usability issues including use by disabled individuals. Though paid
corporate employees play a major role, the volunteer participants are also
pervasive, particularly as beta testers, bug discoverers, and documenters. The
volunteers are especially important in the area of internationalization – an
aspect that requires native language experts and is supported by individuals
motivated by a desire to see GNOME supported in their own language.
Interestingly, despite the role of voluntarism it also appears to be the case
that a career path strategy is often followed or at least attempted by
volunteers. Thus, most of the paid workers had started off as project
volunteers and later moved from being enthusiastic hobbyists to paid
employees of the major corporate sponsors (German, 2003).
Communication among the project participants is kept simple. It is handled
in a standard manner, using relatively lean media over the Internet
communication channel, supplemented by traditional mechanisms like conferences
and Web sites. Mailing lists are used extensively for end users as well as for
individual development components. Bimonthly summaries are e-mailed on
the GNOME mailing list describing major current work, including the most
active modules and developers during the report period. These are the forums
in which decisions about a module’s development are made. Project Web sites
contain information categorized according to type of participants, from items
for developers and bug reports to volunteer promotional efforts. An annual
conference called GUADEC brings developers together and is organized by the
GNOME Foundation. IRC or Internet Relay Chat (irc.gnome.org) provides an
informal means of instantaneous communication. (Incidentally, the Web site
Freenode.net provides IRC network services for many free software projects
including GNU.) Of course, the CVS repository for the project effectively
coordinates the development of the overall project. A limited number of
developers have write access to the repository, having gained the privilege
over time by producing patches that maintainers have come to recognize as
trustworthy, a tried and true path in open development. Initially, patches
have to be submitted by the developers to the maintainers as diffs, at least
until the developer has attained a recognized trustworthy status with the
maintainer. Rarely, it may happen that a developer applies a patch to the
repository that is subsequently rejected by the maintainer. Such an outcome
can be disputed by appealing to the broader community, but these kinds of
events are infrequent (German, 2003).
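The diff-based workflow described above can be illustrated with Python’s standard difflib module; a unified diff of this kind is the sort of patch a contributor would mail to a module maintainer for review (the file contents here are invented for the example):

```python
import difflib

# Old and new versions of a hypothetical source file, as lists of lines.
old = ["def greet(name):\n",
       "    print('Hello ' + name)\n"]
new = ["def greet(name):\n",
       "    print('Hello, ' + name + '!')\n"]

# The contributor mails this unified diff to the maintainer, who reviews
# it and, if acceptable, applies it to the repository copy.
patch = "".join(difflib.unified_diff(old, new,
                                     fromfile="a/greet.py",
                                     tofile="b/greet.py"))
print(patch)
```

The maintainer applies the patch with a tool like patch(1); only after a record of trustworthy submissions does a contributor gain direct commit access.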
References
De Icaza, M. (2000). The Story of the GNOME Project. http://primates.ximian.com/miguel/gnome-history.html. Accessed November 29, 2006.
German, D.M. (2003). GNOME, a Case of Open Source Global Software Development. In: International Conference on Software Engineering, Portland, Oregon.
GNOME Foundation. (2000). GNOME Foundation Charter Draft 0.61. http://foundation.gnome.org/charter.html. Accessed November 29, 2006.
Koch, S. and Schneider, G. (2002). Effort, Co-operation, and Co-ordination in an Open Source Software Project: GNOME. Information Systems Journal, 12(1), 27–42.
Mockus, A., Fielding, R.T., and Herbsleb, J.D. (2002). Two Case Studies of Open Source Development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, 11(3), 309–346.
3.2.3 Open Desktop Environments – KDE
The acronym KDE stands for – believe it or not – Kool Desktop Environment.
The KDE Web site cogently expresses the vision and motivation for the
project: “UNIX did not address the needs of the average computer user. . . . It
is our hope that the combination UNIX/KDE will finally bring the same open,
reliable, stable and monopoly-free computing to the average computer user
that scientist and computing professionals world-wide have enjoyed for years”
(www.kde.org). GNOME and KDE to some extent competitively occupy the
same niche in the Linux environment. But they are now both recognized for
the advances they made in achieving the OSS goal of creating a comprehensive
and popularly accessible free source platform. In 2005, USENIX gave the
two most prominent developers of GNOME and KDE, de Icaza and Ettrich, its
STUG award for their work in developing a friendly GUI interface for open
desktops, saying that: “With the development of user friendly GUIs, both de
Icaza and Ettrich are credited with overcoming a significant obstacle in the
proliferation of open source. . . . Their efforts have significantly contributed to
the growing popularity of the open source desktop among the general public”
(http://www.usenix.org/about/newsroom/press/archive/stug05.html, accessed
January 10, 2007).
The theme of product development by a sole inspired youth repeats itself
in KDE. The KDE project was started in 1996 by Matthias Ettrich, a 24-year-old
computer science student at the University of Tübingen. Ettrich had been
first exposed to free software development via the GNU project and Linux.
Actually it was more than mere exposure. Ettrich wrote the first version of the
open source product LyX, which uses the open source system LaTeX, built on
Don Knuth’s typesetting system TeX, to produce high-quality document output.
Ettrich has said that “this positive and successful experience
of initiating a little self-sustaining free software community made me brave
enough to start the KDE project later” (FOSDEM, 2005).
Ettrich announced his KDE proposal in a now well-known e-mail. The
objective is reminiscent of the attitude that Blake Ross had toward Firefox’s
potential audience. Ettrich wanted to define and implement:

A GUI for end users
The idea is NOT to create a GUI for the complete UNIX-System or the System-
Administrator. For that purpose the UNIX-CLI with thousands of tools and
scripting languages is much better. The idea is to create a GUI for an ENDUSER.
Somebody who wants to browse the web with Linux, write some letters and play
some nice games.

The e-mail was posted at the hacker’s favorite mid-morning hour: October
14, 1996, 3:00 a.m., to the Linux newsgroup de.comp.os.linux.misc. Refer to
the KDE organization’s http://www.kde.org/documentation/posting.txt for the
full text.
In low-key, good-humored style reminiscent of Linus Torvalds, Ettrich
continued:

IMHO a GUI should offer a complete, graphical environment. It should allow a
user to do his everyday tasks with it, like starting applications, reading mail,
configuring his desktop, editing some files, delete some files, look at some pictures,
etc. All parts must fit together and work together.
. . . So one of the major goals is to provide a modern and common look & feel for all
the applications. And this is exactly the reason, why this project is different from
elder attempts.

“IMHO” is the deferential “In My Humble Opinion” acronym derived from
Usenet custom.
The inaugural e-mail refers prominently to the author’s intention to use the
Qt C++ GUI widget library for the planned implementation of the project.
Eventually, this use of the Qt toolkit would lead to free licensing concerns
regarding KDE. These concerns would be significant in motivating the development
of the competing GNOME project. The X referred to later in the e-mail is the X
Window System for Unix, which provided the basic toolkit for implementing
a window, mouse, and keyboard GUI. Motif is the classic toolkit from the 1980s
for making GUIs on Unix systems. Incidentally, the misspellings are from the
original, reflecting the relaxed tone of the e-mail and perhaps the difference
in language. The e-mail continues as follows:

Since a few weeks a really great new widget library is available free in source and
price for free software development. Check out http://www.troll.no
The stuff is called “Qt” and is really a revolution in programming X. It’s an almost
complete, fully C++ Widget-library that implementes a slightly improved Motif
look and feel, or, switchable during startup, Window95.
The fact that it is done by a company (Troll Tech) is IMO a great advantage. We
have the sources and a superb library, they have beta testers. But they also spend
their WHOLE TIME in improving the library. They also give great support. That
means, Qt is also interesting for commercial applications. A real alternative to the
terrible Motif :) But the greatest pro for Qt is the way how it is programmed. It’s
really a very easy-to-use powerfull C++-library.
It is clear from the post that Ettrich was unaware that there might be licensing
complications with the Qt toolkit. Originally Qt appears to have been
proprietary to Trolltech. Actually, there were both free and proprietary licenses
available, with the proprietary licenses only required if you were intending
to release as closed source a product you developed using Qt. The free
license, however, was not quite free. For example, the version described at
http://www.kde.org/whatiskde/qt.php (accessed January 10, 2007) requires that
“If you want to make improvements to Qt you need to send your improvements
to Troll Tech. You can not simply distribute the modified version of Qt yourself,”
which was contrary to the GPL. There was much legal wrangling on this
issue between the KDE developers and the FSF. Finally, in 2000, Trolltech –
for which Ettrich then worked – announced that it would license Qt under the
GNU GPL. This satisfied reservations among proponents of the Free Software
Movement. Per the KDE Web site it is now the case that “Each and every line of
KDE code is made available under the LGPL/GPL. This means that everyone is
free to modify and distribute KDE source code. This implies in particular that
KDE is available free of charge to anyone and will always be free of charge to
anyone.”
The Qt licensing issue was a political cause célèbre among certain open
source advocates, but it does not seem to have been a consideration for
users selecting between KDE and GNOME. They were primarily concerned
about the functionality of the systems (Compton, 2005). Already by 1998,
Red Hat had chosen KDE to be their standard graphical interface for their
Linux distributions. Currently, major Linux distributions tend to include both
KDE and GNOME, with some companies like Sun or Caldera preferring one to
the other. A port of KDE to run on a Windows environment is the mission of
the KDE on Cygwin project (http://kde-cygwin.sourceforge.net/).
The demographic profile of the KDE participants is fairly standard. KDE
has about 1,000 developers worldwide, mainly from Europe, having originated
in Germany. It consists mostly of males aged 20–30 years old, many
of whom are students or are employed in IT (Brand, 2004). The KDE
Web site is interesting and quite well-organized. Refer to the organization’s
“Virtual Gallery of Developers” at http://www.kde.org/people/gallery.php for
biographies of the major developers, with academic, professional, and personal
remarks. About two-thirds of the participants are developers, the remainder
being involved in documentation, translation (about 50 languages are currently
represented), and other activities. According to the survey by Brand (Chance,
2005), the work efforts of individual contributors vary from a quarter-of-an-hour
to half-a-day, per day, with an average of two to three hours per day. In general,
as we have noted previously, open development processes are visible and
extensively documented (Nichols and Twidale, 2003) in a way that proprietary,
closed source, corporate developments cannot be, almost in principle. The
mailing lists and CVS repository that are the key communications tools establish
an incredibly detailed, time-stamped record of development with readily
available machine-readable statistics. For example, the libresoft Web site
mentioned previously, particularly the statistics link
http://libresoft.urjc.es/Results/index.html (accessed January 10, 2007), is an
excellent resource for detailed data on many open source projects including
not only KDE but also other important projects like GNOME, Apache, FreeBSD,
OpenBSD, XFree86, and Mozilla, with plentiful data about CVS commits, module
activity, and so on. The site also contains detailed data on committers and
their contributions.
KDE started its development at a propitious moment in the evolution of open software platforms. The first version was both timely and critical because it helped advertise the product at a time when Linux was rapidly growing. There was as yet no easy-to-use desktop interface available for Linux, so the product filled an unoccupied market niche. The initial success of the project was also bolstered because the project creators were able to recruit developers from another open source project they had connections with (Chance, 2005). Further successful development, and possibly even the competition with the GNOME project, helped advertise the project even more, leading to additional developer recruiting. The C++ implementation of KDE (vs. C for GNOME) eased the enhancement of the system's core libraries, again arguably contributing to its success. Though KDE was initiated in 1996, most developers joined between 1999 and 2002 (Brand, 2004).
Influence inside the KDE project is, as usual, determined by work-based reputations. Reputations are based on experience and contributions, but friendly and cooperative behavior is an asset. Admission to the KDE core team requires a reputation based on "outstanding contributions over a considerable period of time" (http://www.kde.org/). The kde-core-devel mailing list is where decisions are made, but the process is informal and unlike the centralized "benevolent dictatorship" approach characteristic of Linux development. The
norm tends to be that "whoever does the work has the final decision" (Chance, 2005). Lead architects and maintainers who are authorized to speak for the community are responsible for moving the platform forward. Ettrich has observed that the relatively anarchical structure of the KDE organization makes it hard to do things, commenting that "unless you have a captain," then, even with all the right ideas, "whether we are able to realize them against our own resistance is a different matter" (FOSDEM, 2005). These challenges reflect the classic tension between the Cathedral and the Bazaar: it is hard to do without strong, authoritative leadership in guiding the direction of large projects. The conflicts that have arisen derive mainly from differences concerning the future direction of the project. Secondary sources of conflict include interpersonal reactions to things like refusals to accept patches or ignored contributions. There are also the traditional conflicts between end users and developers. These typically result from a disjuncture between the technical orientation of the developers and the end users' preference for stability and ease of use. A usability group (http://usability.kde.org/) has emerged that attempts to mediate between the two viewpoints, but its standing is still of limited importance (Chance, 2005). Like GNOME, KDE has placed a strong emphasis on accessibility for individuals with disabilities. In terms of future developments, Ettrich himself underscores usability as one of his "top 3 favorite focus areas for KDE" (FOSDEM, 2005).
References
Brand, A. (2004). Structure of KDE Project. PELM Project, Goethe University, Frankfurt.
Chance, T. (2005). The Social Structure of Open Source Development. Interview with Andreas Brand in NewsForge. http://programming.newsforge.com/article.pl?sid=05/01/25/1859253. Accessed November 29, 2006.
Compton, J. (2005). GNOME vs. KDE in Open Source Desktops. http://www.developer.com/tech/article.php/629891. Accessed January 20, 2007.
FOSDEM. (2005). Interview with Matthias Ettrich KDE. http://archive.fosdem.org/2005/index/interviews/interviews_ettrich.html. Accessed January 10, 2007.
Nichols, D. and Twidale, M. (2003). The Usability of Open Source. First Monday, 8(1). http://www.firstmonday.dk/issues/issue8_1/nichols/index.html. Accessed December 3, 2006.
3.3 GIMP
GIMP is a free software image manipulation tool intended to compete with
Adobe Photoshop. We include it in this chapter on open source platforms
because it is an important desktop application (not an Internet-related system
like those considered in Chapter 2) and because its toolkit is used in the GNOME desktop. Imaging tools like GIMP are of increasing importance in industrial and medical applications as well as gaming and entertainment technology. The story of GIMP is important for understanding the record of accomplishment of open development for several reasons. Its originators were, prototypically, computer science undergraduates at Berkeley who had themselves been weaned on open source products. Out of personal curiosity they wanted to develop a product that incidentally, but only incidentally, would serve an important need in the open source platform. Their product imitated and challenged a dominant proprietary software tool for an end-user application, unlike most previous free programs. Legal questions about the licensing characteristics for some components of the system created a controversy within the free software movement. The software architecture represented by its plug-in system strongly impacted the success of the project by making it easier for developers to participate. The reaction and involvement of end users of the program was exceptionally important in making GIMP successful because its value and effectiveness could only be demonstrated by its ability to handle sophisticated artistic techniques. Consequently, the tool's development demanded an understanding of how it was to be used that could easily transcend the understanding of most of the actual developers of the system. In other words, the end users represented a parallel but divergent form of sophistication to the program developers. Management challenges arose with the fairly abrupt departure of the originating undergraduates for industrial positions on completion of their undergraduate careers and the replacement of the original leadership with a team of co-responsible developers. Like the other open source products we have examined, the story of GIMP can help us understand how successful open source projects are born and survive.
GIMP, an acronym for the "GNU Image Manipulation Program," is intended to stand as the free software counterpart to Adobe Photoshop and is an official part of the GNU software development project. Coming out beginning in 1996, GIMP was one of the first major free software products for an end-user application, as opposed to most of the GNU projects, which were oriented toward use by programmers. It provides standard digital graphics functions and can be used, for example, to make graphics or logos, edit and layer images, convert image formats, make animated images, and so on. According to its Freshmeat project description, GIMP is "suitable for such tasks as photo retouching, image composition and image authoring. It can be used as a simple paint program, an expert quality photo retouching program, an online batch processing system, a mass production image renderer, an image format converter, etc." (from "The Gimp – Default Branch" description on www.freshmeat.net).
Class projects at UC Berkeley have a way of making a big splash. GIMP was developed by Peter Mattis and Spencer Kimball in August 1995, initially for a class project for a computer science course when they were undergraduates. Mattis "wanted to make a webpage" (Hackvn, 1999), and as a result they decided it would be interesting to design a pixel-based imaging program. Following open source development custom, Mattis posted the following question on comp.os.linux.x > Image Manipulation Program Features in July 1995, at the canonical hacker time of 3:00 a.m.:

Suppose someone decided to write a graphical image manipulation
program (akin to photoshop). Out of curiousity (and maybe something
else), I have a few (2) questions:
What kind of features should it have? (tools, selections, filters, etc.)
What file formats should it support? (jpeg, gif, tiff, etc.)?
Thanks in advance,
Peter Mattis

At this point, neither Mattis nor Kimball had anything but a cursory familiarity with image manipulation tools (Hackvn, 1999). However, within six months Mattis and Kimball – working alone, not as part of a free-wheeling bazaar format – had released a beta version of GIMP as open source. The announcement was made at 4:00 a.m. on November 21, 1995, on comp.windows.x.apps > ANNOUNCE: The GIMP. The style of the release announcement is worth noting for the specificity and clarity of its statement of the project functionality and requirements. We provide it in some detail as an illustration of how these announcements are heralded:
The GIMP: the General Image Manipulation Program

The GIMP is designed to provide an intuitive graphical interface to a
variety of image editing operations. Here is a list of the GIMP's major
features:

Image viewing
- Supports 8, 15, 16 and 24 bit color.
- Ordered and Floyd-Steinberg dithering for 8 bit displays.
- View images as rgb color, grayscale or indexed color.
- Simultaneously edit multiple images.
- Zoom and pan in real-time.
- GIF, JPEG, PNG, TIFF and XPM support.

Image editing
- Selection tools including rectangle, ellipse, free, fuzzy, bezier and intelligent.
- Transformation tools including rotate, scale, shear and flip.
- Painting tools including bucket, brush, airbrush, clone, convolve, blend and text.
- Effects filters (such as blur, edge detect).
- Channel & color operations (such as add, composite, decompose).
- Plug-ins which allow for the easy addition of new file formats and new effect filters.
- Multiple undo/redo. ...

The GIMP has been tested (and developed) on the following operating
systems: Linux 1.2.13, Solaris 2.4, HPUX 9.05, SGI IRIX.
Currently, the biggest restriction to running the GIMP is the Motif
requirement. We will release a statically linked binary for several
systems soon (including Linux).

URLs
http://www.csua.berkeley.edu/gimp
ftp://ftp.csua.berkeley.edu/pub/gimp
mailto:g...@soda.csua.berkeley.edu

Brought to you by
Spencer Kimball (spen...@soda.csua.berkeley.edu)
Peter Mattis (p...@soda.csua.berkeley.edu)

NOTE
This software is currently a beta release. This means that we haven't implemented all of the features we think are required for a full, unqualified release. There are undoubtedly bugs we haven't found yet just waiting to surface given the right conditions. If you run across one of these, please send mail to g...@soda.csua.berkeley.edu with precise details on how it can be reliably reproduced.
The first public release (version 0.54) actually came in January 1996.
Plug-ins played an important role in the expansion of GIMP. The two solo developers had provided a powerful and functional product with important features like a uniform plug-in system, "so developers could make separate programs to add to GIMP without breaking anything in the main distribution" (Burgess, 2003). Spencer noted that "The plug-in architecture of the Gimp had a tremendous impact on its success, especially in the early stages of development (version 0.54). It allowed interested developers to add the functionality they desired without having to dig into the Gimp core" (Hackvn, 1999).
Plug-ins are also very important to GIMP because of its competition with Photoshop. In fact, Adobe Photoshop plug-ins can run under GIMP via pspi (Photoshop Plug-in Interface), a GIMP plug-in that hosts third-party Photoshop plug-ins. Pspi was developed for Windows in 2001 and
for Linux in 2006 (http://www.gimp.org/tml/gimp/win32/pspi.html). Pspi acts as an intermediary between GIMP and Photoshop plug-ins, which are implemented as dlls. According to pspi developer Tor Lillqvist, "The question was 'How would you load and call code in a Windows DLL on Linux'" (http://www.spinics.net/lists/gimpwin/msg04517.html). As described by Willis (2006), to the plug-in pspi appears to be a "full, running copy of Photoshop. It provides the hooks into the menus and functions of Photoshop that the plugin expects to see, and connects them to the GIMP's extension and menu system." This is actually extremely significant for the attractiveness of the Linux platform itself. Professional graphics artists strongly prefer Photoshop under Windows, one reason being the availability of third-party plug-ins. The availability of pspi for Linux changes this. There are a few ironies in this story. A software bridge like pspi is made possible in the first place by the Adobe policy of encouraging the development of third-party plug-ins through the use of its software development kit. Thus, Adobe's (natural and logical) plug-in policy, designed to increase its own marketability, can by the same token increase its competition's marketability. Furthermore, compiling the pspi source requires the Adobe development kit, so you need the kit to create the executable for pspi. Once this is done, however, the executable itself is of course freely redistributable, as pspi is in the first place. Oddly, up until Photoshop 6 Adobe gave the software development kit away for free, but it now requires specific approval. Thus, in a certain sense, an original compilation of pspi for use in GIMP would implicitly require such approval by Adobe. In any case, the point is moot because downloadable pspi binaries are available for multiple platforms (Willis, 2006). The pspi development illustrates the complicated interplay among technical development issues, software architecture choices, legal issues, high-end graphics users' expectations, and the sometimes-unintended consequences of corporate policies like those that encourage external development.
Perhaps unsurprisingly, licensing issues have also affected GIMP's development. The initial GIMP toolkit for building widgets was based on the proprietary Motif widget library. A widget (which is shorthand for "windows gadget") can be defined as a "standardized on-screen representation of a control that may be manipulated by the user" (redhat.com glossary), examples being scroll bars, menus, buttons, sliders, and text boxes. Widgets can be thought of as the basic building blocks of graphical interfaces and are constructed using toolkit programs. Because the Motif widget library was proprietary, an open source widget library called GTK (standing for GIMP toolkit) was developed in order to remain fully consistent with the principles of the free software movement. There was also another, more personal, professional motivation for replacing the Motif
library. In addition to the developers thinking that the Motif toolkit was "bloated and inflexible" (Hackvn, 1999), Mattis personally "was dissatisfied with Motif and wanted to see what it took to write a UI toolkit" for his own edification. The resulting GTK toolkit (eventually enhanced to GTK+) was licensed under the LGPL, so it could be used even by developers of proprietary software (www.gtk.org). GTK was provided with the 1996 release (Bunks, 2000). Subsequent to that release, weaknesses in the beta version of the system, like poor memory management, were resolved. There were also improvements like the use of layer-based images, based on what the developers saw used in Photoshop 3.0. Another beta version was released in early 1997.
By June 1997, Kimball and Mattis had released version 0.99.10 with further improvements, including the updated GTK+ library. That final undergraduate version represented a huge effort. Kimball remarked that he had "spent the better part of two years on Gimp, typically at the expense of other pressing obligations (school, work, life)" and that "probably 95 to 98 percent of the code in 0.99.10 was written by Pete or myself" (Hackvn, 1999). They both share the copyright on the entire project, though in point of fact Kimball concentrated on GIMP and Mattis on the GTK. They never got to release version 1.0 – they graduated from college in June 1997. The authors say that developing GIMP was largely a matter of duty to the cause of free software. For Spencer Kimball, his GIMP development work had been partly his payment on what he felt was a debt of honor, as he said in an interview: "From the first line of source code to the last, GIMP was always my 'dues' paid to the free software movement. After using emacs, gcc, Linux, etc., I really felt that I owed a debt to the community which had, to a large degree, shaped my computing development" (Hackvn, 1999). Similar feelings were expressed by Mattis about having "done his duty" for free software (Hackvn, 1999).
Transitions can be bumpy in open source. Since the model is significantly volunteer-driven, you cannot just go out and hire new talent or leadership (granted, the increasing participation of commercially supported open source developers modifies this). Management problems set in at GIMP upon the graduation of its principals from college because of the vacuum caused by their departure. Spencer and Mattis had moved on. They were holding down real jobs and could no longer put time into the project (Burgess, 2003). Most problematically, "there was no defined successor to S&P, and they neglected to tell anyone they were leaving," according to Burgess (2003). Turnover at even traditional technical companies, where the average time between job changes is two years, is a significant negative factor in productivity. This impact is likely exacerbated in the free software community, where "the rate of
turnover for both volunteer and full-time contributors is probably higher and the resulting losses to productivity and momentum are probably more severe. New developers have the source code, but usually they can't rely upon local experts for assistance with their learning curves" (Hackvn, 1999). Thus, the GIMP project faced management hurdles that can handicap any project and certainly apply to open source development as well. But a new development model soon emerged – a team of members with designated responsibilities for managing releases, making bug fixes, etc. There was no single team leader, and project decisions were made through the #gimp Internet Relay Chat channel. The initial effort was "focused almost exclusively on stability" (quote from Spencer in Hackvn (1999)). As for the viability of the learning curve for the volunteers, even without the guidance of the original pair, Spencer approvingly observed that "I'm not sure how long it took the new maintainers to learn their way around the code, but judging by the stability of the product, they seem to be doing quite well" (Hackvn, 1999). By mid-1998 the first stable version was released. GIMP was ported to Windows by Tor Lillqvist in 1997. A binary installer developed by Jernej Simoncic greatly simplified installation on Windows. By 2004, after several years of development, a stable release supported not only on Unix but also on Mac OS X and Windows was announced. The www.gimp.org Web site now lists almost 200 developers involved in the project beyond the founders Kimball and Mattis.
For products like Adobe Photoshop and GIMP, high-end specialized users set the standards for the product. Indeed, the requirements for an advanced image-processing product are probably better understood in many respects by its end users, particularly its professional graphic-artist users, than by its developers. The effective development and application of the product entails the development of sophisticated artistic techniques and demands an understanding of how the tool is to be used that completely surpasses the understanding of most of the actual developers of the system. Thus, as already observed, GIMP's end users represented a parallel but divergent form of sophistication to the program developers. An impassioned base of such users was decisive in establishing the recognition and acceptance of GIMP. Their positive response and word-of-mouth publicity helped spread the word about the product and define its evolution. The Linux Penguin logo was famously made using GIMP. The now-celebrated Penguin was designed by Larry Ewing in 1996, using the early 0.54 beta version of GIMP (Mears, 2003). In what would become common expert-user practice, Ewing also set up a Web page that briefly described how he used GIMP to make the logo (Ewing, 1996). The whole episode became the first major exposure that GIMP received (Burgess, 2003). The how-to site Ewing set up was the
first of many. As Burgess observed regarding GIMP, "what differentiated this program from many others is that a lot of sites sprung up on how to use the program ... showing off artwork and sharing techniques" (Burgess, 2003).
The bottom line is: how do GIMP and Adobe Photoshop compare? It is a confusing issue because very different potential user audiences are involved, and so the level of functionality needed varies from the mundane to the esoteric. For many applications, GIMP appears to be perfectly suitable. GIMP was initially awkward to install on Windows, but the current download installer is fast and effective. The basic GIMP interface is highly professional. One hears very different comparative evaluations from different sources, and it is not really clear how objective the evaluators are. Overall, GIMP's performance appears not to match that of Photoshop. Photoshop's interface is more intuitive; GIMP is less easy to use, an important distinction for a casual user. The proliferation of separate windows is not always well received. The quality of the tools in GIMP is arguably uneven. However, GIMP is free of charge and cross-platform. But for a professional graphics artist, or for a significant application where the graphics output is key to the successful outcome of a costly mission, the charge for the commercial product would likely not be an issue.
References
Bunks, C. (2000). Grokking the GIMP. New Riders Pub. Also: http://gimp-savvy.com/BOOK/index.html?node1.html. Accessed November 29, 2006.
Burgess, S. (2003). A Brief History of GIMP. http://www.gimp.org/about/ancient_history.html. Accessed November 29, 2006.
Ewing, L. (1996). Penguin Tutorial. http://www.isc.tamu.edu/~lewing/linux/notes.html. Accessed January 10, 2007.
Hackvn, S. (1999). Interview with Spencer Kimball and Peter Mattis. Linux World, January 1999. http://www.linuxworld.com/linuxworld/lw-1999-01/lw-01-gimp.html. Accessed January 21, 2004.
Mears, J. (2003). What's the Story with the Linux Penguin? December 26. http://www.pcworld.com/article/id,113881-page,1/article.html. Accessed January 10, 2007.
Willis, N. (2006). Running Photoshop Plugins in the GIMP, Even under Linux. April 10. http://applications.linux.com/article.pl?sid=06/04/05/1828238&tid=39. Accessed November 29, 2006.
4
Technologies Underlying Open Source Development
The free software movement emerged in the early 1980s at a time when the ARPANET network with its several hundred hosts was well-established and moving toward becoming the Internet. The ARPANET already allowed exchanges like e-mail and FTP, technologies that significantly facilitated distributed collaboration, though the Internet was to amplify this ability immensely. The TCP/IP protocols that enabled the Internet became the ARPANET standard on January 1, 1983. As a point of reference, recall that the flagship open source GNU project was announced by Richard Stallman in early 1983. By the late 1980s the NSFNet backbone network merged with the ARPANET to form the emerging worldwide Internet. The exponential spread of the Internet catalyzed further proliferation of open source development. This chapter will describe some of the underlying enabling technologies of the open source paradigm, other than the Internet itself, with an emphasis on the centralized Concurrent Versions System (CVS) as well as the newer decentralized BitKeeper and Git systems that are used to manage the complexities of distributed open development. We also briefly discuss some of the well-known Web sites used to host and publicize open projects and some of the services they provide.
The specific communications technologies used in open source projects have historically tended to be relatively lean: e-mail, mailing lists, newsgroups, and later on Web sites, Internet Relay Chat, and forums. Most current activity takes place on e-mail mailing lists and Web sites (Feller and Fitzgerald, 2002). The mailing lists allow many-to-many dialogs and can provide searchable Web-based archives just like Usenet. Major open source projects like Linux in the early 1990s still relied on e-mail, newsgroups, and FTP downloads to communicate. Since the code that had to be exchanged could be voluminous, some means was required for reducing the amount of information transmitted and for clarifying the nature of suggested changes to the code. The patch
program created by Larry Wall served this purpose. Newsgroups provided a means to broadcast ideas to targeted interest groups whose members might like to participate in a development project. The Usenet categories acted like electronic bulletin boards that allowed newsgroup participants to post e-mail-like messages – like the famous comp.os.minix newsgroup on Usenet used by Linus Torvalds to initiate the development of Linux. Another powerful collaborative tool, developed beginning in the late 1980s, that would greatly facilitate managing distributed software development was the versioning or configuration management system. It is this topic that will be the focus of our attention in this chapter.
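The division of labor that diff and patch enabled can be sketched with standard tools; the file names below are invented for illustration, not taken from any real project:

```shell
# A maintainer's copy of a file, and a contributor's corrected copy.
printf 'Hello wrold\n' > hello.txt.orig       # original, with a typo
printf 'Hello world\n' > hello.txt.fixed      # contributor's fix

# The contributor generates a unified diff: a compact description of
# only the changed lines, suitable for posting to a mailing list.
diff -u hello.txt.orig hello.txt.fixed > fix.patch

# The maintainer applies the patch to a pristine copy of the original.
cp hello.txt.orig hello.txt
patch hello.txt < fix.patch

cmp -s hello.txt hello.txt.fixed && echo "patch applied cleanly"
```

Only fix.patch need travel over the network; for a large source tree, mailing a few-line diff instead of whole files is the saving that made e-mail-based development practical.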
Versioning systems are software tools that allow multiple developers to work on projects concurrently and keep track of changes made to the code. The first such system was the Revision Control System (RCS), written in the early 1980s by Walter Tichy of Purdue. It used diffs to keep track of changes just like later systems, but was limited to single files. The first system that could handle entire projects was written by Dick Grune in 1986 with a modest objective in mind: he simply wanted to be able to work asynchronously with his students on a compiler project. Grune implemented his system using shell scripts that interacted with RCS, and eventually it evolved into the most widely used versioning system, the open source Concurrent Versions System, commonly known as CVS. Brian Berliner initiated the C implementation of CVS in mid-1989 by translating the original shell scripts into C. Later contributors improved the system, noteworthy being Jim Kingdon's remote CVS implementation in 1993 that "enabled real use of CVS by the open source community" (STUG award announcement for 2003, http://www.usenix.org/about/stug.html).
4.1 Overview of CVS
CVS has been crucial to open source development because it lets distributed software developers access a shared repository of the source code for a project and permits concurrent changes to the code base. It also allows merging the changes into an updated version of the project on the repository and monitoring for potential conflicts that may occur because of the concurrent accesses. Remarkably, at any point during a project's development, any previous version of the project can be easily accessed, so CVS also serves as a complete record of the history of all earlier versions of the project and of all the changes to the project's code. It thus acts like what has been metaphorically called a time machine. We will overview the concepts and techniques that underlie
CVS (and similar systems) and illustrate its use in some detail, with examples selected from the comprehensive treatment of CVS by Fogel and Bar (2003).
CVS, which is available for download from www.nongnu.org/cvs, is the most widely used version control tool. It is distributed as open source under the General Public License (GPL). It is an award-winning tool; its major developers received the STUG (Software Tools User Group) award in 2003, in which it was identified as "the essential enabling technology of distributed development" (STUG award announcement for 2003; http://www.usenix.org/about/stug.html). As Fogel and Bar (2003, p. 10) observe, "CVS became the free software world's first choice for revision control because there's a close match ... between the way CVS encourages a project to be run and the way free projects actually do run."
CVS serves two basic functions. On the one hand it keeps a complete historical digest of all actions (patches) against a project, and on the other hand it facilitates distributed developer collaboration (Fogel and Bar, 2003). As an example, consider the following scenario. Suppose a user reports a bug in the last public release of a CVS project and a developer wants to locate the bug and fix it. Assuming the project has evolved since the previous release, the developer really needs an earlier version of the project, not its current development state. Recapturing that earlier state is easy with CVS because it automatically retains the entire development tree of the project. Furthermore, CVS also allows the earlier version, once the bug is repaired, to be easily reintegrated with the new current state of the project. Of course, it is worth stating that the kind of development management that CVS does had already been possible before the deployment of CVS. The advantage is that CVS makes it much easier to do, which is a critical factor particularly in a volunteer environment. As Fogel and Bar (2003, p. 11) observe: "[I]t reduces the overhead in running a volunteer-friendly project by giving the general public easy access to the sources and by offering features designed specifically to aid the generation of patches to the source."
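As a sketch of this "time machine" behavior, the following local session creates a repository, tags a release, commits later work, and then winds a file back to the released state. The repository path, module name, and tag are invented for the demonstration, and the commands assume a cvs client is installed (the first line skips the demo if it is not):

```shell
# Guard: this sketch needs the cvs client.
command -v cvs >/dev/null 2>&1 || { echo "cvs not installed"; exit 0; }

REPO=$PWD/cvsdemo-repo
cvs -d "$REPO" init                               # create an empty repository

mkdir import-src && echo "release 1.0 code" > import-src/app.txt
(cd import-src && cvs -d "$REPO" import -m "1.0 release" proj vendor rel_1_0)

cvs -d "$REPO" checkout proj                      # working copy of the module
echo "new development code" > proj/app.txt
(cd proj && cvs commit -m "post-release development" app.txt)

# A bug is reported against release 1.0: wind the working copy back
(cd proj && cvs update -r rel_1_0 app.txt)
cat proj/app.txt
```

After the final update, app.txt again holds the released text; a later `cvs update -A` clears the sticky tag and returns the working copy to the latest development state.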
CVS is a client-server system under which software projects are stored in a so-called repository on a central server that serves content to possibly remote clients. Its client-side manifestations let multiple developers remotely and concurrently check out the latest version of a project from the repository. They can then modify the source code on the client(s) as they see fit, and thereafter commit any changes they have made to their working copy back to the central repository in a coordinated manner, assuming they have the write privileges to do so. This is called a copy-modify-merge development cycle.
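As a sketch of this cycle, a session against a hypothetical project named myproject might look like the following (the command names are real CVS commands; the file name and log message are invented for illustration):

```shell
$ cvs checkout myproject             # copy: obtain a private working copy
$ cd myproject
$ vi hello.c                         # modify: edit the source locally
$ cvs update                         # merge: fold in changes others committed
$ cvs commit -m "Fix reported bug"   # publish the merged result
```

Note that commit succeeds only if the working copy is up to date with the repository; otherwise CVS asks the developer to run update first and merge.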
P1:JYD
9780521881036c04 CUNY1180/Deek 0521 88103 6 October 1, 2007 17:3
122 4Technologies Underlying Open Source Development
Prior to CVS, versioning tools followed a lock-modify-unlock model for file changes. Only one developer could have access to a particular file at a time; other developers had to wait until the file being modified was released. This kind of solo, mutually exclusive access requires considerable coordination. If the developers are collocated, or know each other well and can contact each other quickly if a lockout is handicapping their work, or if the group of developers is small so that concurrent accesses are infrequent, then the coordination may be manageable. But in a large, geographically and temporally distributed group of developers, the overhead of coordinating becomes onerous and annoying – a problematic issue in what may be a preponderantly volunteer community. This concurrent access is one way in which the copy-modify-merge model of CVS smoothes the interactions in a distributed development. The impact of conflicts in CVS also appears to be less than might be expected in any case. Berliner, one of the key creators of CVS, indicated that in his own personal experience actual conflicts are usually not particularly problematic: “conflicts that occur when the same object has been modified by someone else are quite rare” and that if they do occur “the changes made by the other developer are usually easily resolved” (Berliner, 1990).
Diff and Patch
The CVS development tree is not stored explicitly. Under CVS, earlier versions of the project under development are maintained only implicitly, with just the differences between successive versions kept – a technique that is called delta compression. The CVS system lets a developer make changes, track changes made by other developers by viewing a log of changes, access arbitrary earlier versions of the project on the basis, for example, of a date or revision number, and initiate new branches of the project. The system can automatically integrate developer changes into the project master copy on the repository or into any working copies that are currently checked out by any developers, using a combination of its update and commit processes. The distributed character of the project’s developers, who are working on the project at different times and places, benefits greatly from this kind of concurrent access, with no developer having exclusive access to the repository files; otherwise the project could not be collaborated on as effectively. Changes are generally only committed after testing is complete, so the master copy stays runnable. The committed changes are accompanied by developer log messages that explain the change. Conflicts caused by a developer who is concurrently changing a part of the project that has already been changed by another developer are detected automatically when the developer attempts to commit the change. These conflicts must then be resolved manually before the changes can be committed to the repository.
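When CVS does detect such a conflict, it marks the overlapping region directly in the developer’s working copy with conflict markers. A hypothetical fragment of a file hello.c might then look like this (the revision number 1.6 is made up for illustration):

```text
<<<<<<< hello.c
    printf("Hello, world!\n");   /* your uncommitted local change */
=======
    printf("Hello, CVS!\n");     /* the change already in the repository */
>>>>>>> 1.6
```

The developer edits the file to keep the desired text, removes the markers, and only then can the commit proceed.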
The basic programs (commands, utilities, files) required for such a versioning system include the following:
1. the diff command,
2. the patch command, and
3. the patch file.
The diff command is a Unix command that identifies and outputs the differences between a pair of text files on a line-by-line basis. It indicates (depending on the format selected) whether lines have been added, deleted, or changed, with unchanged shared lines not output, except as context. These are the only four possible editorial states.
Conceptually, diff takes a pair of files A and B and creates a file C representing their “difference.” The output file is usually called a patch file because of its use in collaborative development, where the difference represents a “software patch” that is scheduled to be made to a current version of a program. Modifications to projects may be submitted as patches to project developers (or maintainers) who can evaluate the submitted code. The core developers can then decide whether a suggested patch should be rejected, or accepted and committed to the source repository, to which only the developers have write access. The so-called unified difference format for the diff command is especially useful in open source development because it lets project maintainers more readily recognize and understand the code changes being submitted. For example, the unified format includes surrounding lines that have not been changed as context, making it easier to recognize what contents have been changed and where. Then, a judgment is required before the changes are committed to the project repository.
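The shape of the unified format can be illustrated without any CVS machinery at all: Python’s standard difflib module produces the same style of output. The file names and contents below are invented for illustration:

```python
import difflib

# Hypothetical "before" and "after" contents of a small text file.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "beta\n", "delta\n", "gamma\n"]

# unified_diff prefixes removed lines with "-", added lines with "+",
# and includes unchanged neighboring lines as context (no prefix).
diff = "".join(difflib.unified_diff(old, new, fromfile="A", tofile="B"))
print(diff)
```

Running this prints the header lines for A and B, a hunk header "@@ -1,3 +1,4 @@", the unchanged context lines, and the single added line "+delta" – exactly the kind of compact, reviewable change a project maintainer would evaluate.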
The diff command works in combination with the patch command to enact changes (Fountain, 2002). The Unix patch command uses the textual differences between an original file A and a revised file B, as summarized in a diff file C, to update file A to reflect the changes introduced in B. For example, in a collaborative development context, if B is an updated version of the downloaded source code in A, then:

diff A B > C

creates the patch file C as the difference of A and B. Then the command:

patch A < C

could be used to apply the patch C to update A, so it corresponds to the revision B.
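The round trip can be sketched in a shell session. The file names A, B, and C follow the text above; the file contents are invented for illustration:

```shell
# Create an "original" A and a "revised" B (contents are made up).
printf 'alpha\nbeta\ngamma\n'        > A
printf 'alpha\nbeta\ndelta\ngamma\n' > B
# diff exits with status 1 when the files differ, so tolerate that status.
diff -u A B > C || true
# Apply the patch file C to bring A up to date with B.
patch A < C
cmp -s A B && echo 'A now matches B'
```

After the patch is applied, A is byte-for-byte identical to B, even though only the small file C had to be transmitted.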
The complementary diff and patch commands are extremely useful because they allow source code changes, in the form of a relatively small patch file like C (instead of the entire new version B), to be submitted, for example, by e-mail. After this submission, the small patch changes can be scrutinized by project maintainers before they are integrated into the development repository.
These commands are considered the crucial underlying elements in versioning systems, regardless of whether they are used explicitly or wrapped up in a tool. CVS C-implementer Berliner characterizes the patch program as the “indispensable tool for applying a diff file to an original” (Berliner, 1990). The patch program was invented by Larry Wall (creator of Perl) in 1985.
4.2 CVS Commands
Note: The following discussion is based on the well-known introduction to CVS by Karl Fogel and Moshe Bar (2003), specifically the “Tour of CVS” in their Chapter 2 – though their treatment is far more detailed, overall about 370 pages for the entire text in the current PDF. The present overview gives only a glimpse of CVS and is intended as a bird’s-eye view of how it works. We will use a number of examples from the Fogel and Bar (2003) tour, which we will reference carefully to facilitate ready access to the original treatise. We also intersperse the examples with contextual comments about the role of CVS. The interested reader should see Fogel and Bar (2003) for a comprehensive, in-depth treatment. We next illustrate some of the key CVS commands.
4.2.1 Platforms and Clients
Naturally, in order to execute the cvs program it must have been installed on your machine in the first place. CVS comes with most Linux distributions, so in that case you do not have to install it. Otherwise, you can build CVS from the source code provided at sites like the Free Software Foundation (FSF)’s FTP site. The stable releases of the software are those with a single decimal point in their release version number. Unix-like platforms are obviously the most widely used for CVS development. The well-known documentation manual for the CVS system is called the Cederqvist Manual, named after its original author, who wrote the first version in 1992 (Cederqvist et al., 2003). (Incidentally, dates are interesting in these matters because they help correlate noteworthy technological developments related to open source. For example, CVS debuted around 1986, but the key C version by Berliner did not come out until 1989. Linus Torvalds posted his original Linux announcement in August 1991.)
The cvs executable, once installed, automatically allows you to use it as a client to connect to remote CVS repositories. If you want to create a repository on your own machine, you use the cvs init command, identifying the location of the new repository (for example, a directory such as /usr/local/newrepos) with the -d option. If you then add an appropriate Unix users group, then any of the users can create an independent new project using the cvs import command. Refer to Fogel and Bar (2003) for detailed information about where to get source code, compilation, commands, administration, etc.
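As a sketch, creating a repository and importing a new project might look like the following session (the repository path, project name, and tags are illustrative; import requires a log message plus vendor and release tags):

```shell
$ cvs -d /usr/local/newrepos init                # create an empty repository
$ cd myproject                                   # a directory of sources to add
$ cvs -d /usr/local/newrepos import -m "initial import" myproject vendortag start
```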
There are also Windows versions of CVS available. Currently these can only connect as clients to repositories on remote machines or serve repositories on their own local machine. They cannot provide repository service to remote machines. The Windows version is typically available as prebuilt binary executables. A free Windows program called WinCVS, distributed under the GPL, provides a CVS client that only lets you connect to a remote CVS repository server. However, it does not let you serve a repository from your own machine, even locally. WinCVS is available as a binary distribution with relatively easy installation and configuration instructions. The WinCVS client lets you make a working project copy from a remote repository, to which you can subsequently commit changes, update, or synchronize vis-à-vis the repository, etc.
4.2.2 Command Format
The CVS interface is command-line oriented. Both command options and global options can be specified. Command (or local) options only affect the particular command and are given to the right of the command itself. Global options affect the overall CVS environment independently of the current command and are given to the left of the command. The format to execute a command is

cvs -global-options command -command-options

For example, the statement:

cvs -Q update -p

runs the update command (Fogel and Bar, 2003, p. 27). The token cvs is of course the name of the CVS executable. The -Q tells the CVS program to operate in quiet mode, meaning there is no diagnostic output except when the command fails. The -p command option directs the results of the command to standard output. The repository being referenced may be local or remote, but in either case a working copy must have already been checked out. We consider the semantics of the illustrated update command in an upcoming section, but first we address a basic question: how do you get a copy of the project to work on in the first place?
4.2.3 Checking Out a Project From a Repository
The transparency of online open source projects is truly amazing. Using CVS, anyone on the Internet can get a copy of the most current version of a project. While only core developers can make changes to the master copy of a project in the repository, CVS allows anyone to retrieve a copy of the project, as well as to keep any of their own modifications conveniently synchronized vis-à-vis the repository. This is a major paradigm shift. It is the polar opposite of how things are done in a proprietary approach, where access to the source is prohibited to anyone outside the project loop. In open source under CVS, everyone connected to the Internet has instantaneous access to the real-time version of the source as well as to its development history: what was done, where it was done, by whom it was done, and when it was done! Of course, the same technology could also be used in a proprietary development model for use by a proprietary development team only, or by individuals or small teams working on a private project.
Remember the nature of the CVS distribution model. There is a single master copy of the project that some CVS system maintains centrally in a repository. Anyone who wants to look at the source code for a project in that repository, whether just to read it or to modify it, has to get his or her own separate working copy of the project from the repository. Given that a project named myproject already exists, a person checks out a working copy of the project with the command (Fogel and Bar, 2003, p. 32):

cvs checkout myproject
Of course, before you can actually check out a working copy of a project, you first have to tell your CVS system or client where the repository you expect to check out from is located. If the repository were stored locally on your own machine, you could execute the cvs program with the -d (for directory) option and just give the local path to the repository. A typical Unix example, assuming the repository is located at /usr/local/cvs (Fogel and Bar, 2003, p. 27), would be

cvs -d /usr/local/cvs command

To avoid having to type the -d option and repository path each time, you can set the environment variable CVSROOT to point to the repository. In the rest of this overview we will assume this has already been done (Fogel and Bar, 2003, p. 30).
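In a Bourne-style shell, for example, that one-time setup might look like this (the repository path matches the earlier local example):

```shell
$ CVSROOT=/usr/local/cvs
$ export CVSROOT
$ cvs checkout myproject    # no -d option needed now
```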
If the repository were located on a remote server reached over the Internet, you would use an access method. The method may allow unauthenticated access to the repository, but it is also possible to have password-authenticated access to the server (via an access method called pserver). Authenticated access requires not only a username,