P1:KAE
9780521881036pre CUNY1180/Deek 0521 88103 6 October 1, 2007 17:23
Open Source
From the Internet's infrastructure to operating systems like GNU/Linux, the open
source movement comprises some of the greatest accomplishments in computing over
the past quarter century. Its story embraces technological advances, unprecedented
global collaboration, and remarkable tools for facilitating distributed development.
The evolution of the Internet enabled an enormous expansion of open development,
allowing developers to exchange information and ideas without regard to constraints of
space, time, or national boundary. The movement has had widespread impact on
education and government, as well as historic, cultural, and commercial repercussions.
Part I discusses key open source applications, platforms, and technologies used in open
development. Part II explores social issues ranging from demographics and psychology
to legal and economic matters. Part III discusses the Free Software Foundation, open
source in the public sector (government and education), and future prospects.
Fadi P. Deek received his Ph.D. in computer and information science from the New
Jersey Institute of Technology (NJIT). He is Dean of the College of Science and
Liberal Arts and Professor of Information Systems, Information Technology, and
Mathematical Sciences at NJIT, where he began his academic career as a Teaching
Assistant in 1985. He is also a member of the Graduate Faculty – Rutgers University
Ph.D. Program in Management.
James A. M. McHugh received his Ph.D. in applied mathematics from the Courant
Institute of Mathematical Sciences, New York University. During the course of his
career, he has been a Member of Technical Staff at Bell Telephone Laboratories (Wave
Propagation Laboratory), Director of the Ph.D. program in computer science at NJIT,
Acting Chair of the Computer and Information Science Department at NJIT, and
Director of the Program in Information Technology. He is currently a tenured Full
Professor in the Computer Science Department at NJIT.
Open Source
Technology and Policy
FADI P. DEEK
New Jersey Institute of Technology
JAMES A. M. McHUGH
New Jersey Institute of Technology
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521881036

© Fadi P. Deek and James A. M. McHugh 2008

This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.

First published in print format 2007

ISBN-13 978-0-511-36775-5 eBook (NetLibrary)
ISBN-10 0-511-36775-9 eBook (NetLibrary)
ISBN-13 978-0-521-88103-6 hardback
ISBN-10 0-521-88103-X hardback
ISBN-13 978-0-521-70741-1 paperback
ISBN-10 0-521-70741-2 paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
To my children,
Matthew, Andrew, and Rebecca
Fadi P. Deek

To my parents, Anne and Peter
To my family, Alice, Pete, and Jimmy
and to my sister, Anne Marie
James A. M. McHugh
Contents
Preface page ix
Acknowledgments xi
1. Introduction 1
1.1 Why Open Source 2
1.2 Preview 11
Section One: Open Source – Internet Applications,
Platforms, and Technologies
2. Open Source Internet Application Projects 21
2.1 The WWW and the Apache Web Server 23
2.2 The Browsers 37
2.3 Fetchmail 50
2.4 The Dual License Business Model 61
2.5 The P’s in LAMP 70
2.6 BitTorrent 77
2.7 BIND 78
3. The Open Source Platform 80
3.1 Operating Systems 81
3.2 Windowing Systems and Desktops 99
3.3 GIMP 111
4. Technologies Underlying Open Source Development 119
4.1 Overview of CVS 120
4.2 CVS Commands 124
4.3 Other Version Control Systems 143
4.4 Open Source Software Development Hosting Facilities
and Directories 151
Section Two: Social, Psychological, Legal, and
Economic Aspects of Open Source
5. Demographics, Sociology, and Psychology of Open Source
Development 159
5.1 Scale of Open Source Development 160
5.2 Demographics and Statistical Profile of Participants 162
5.3 Motivation of Participants 164
5.4 Group Size and Communication 166
5.5 Social Psychology and Open Source 168
5.6 Cognitive Psychology and Open Source 181
5.7 Group Problem Solving and Productivity 190
5.8 Process Gains and Losses in Groups 197
5.9 The Collaborative Medium 206
6. Legal Issues in Open Source 222
6.1 Copyrights 223
6.2 Patents 228
6.3 Contracts and Licenses 232
6.4 Proprietary Licenses and Trade Secrets 236
6.5 OSI – The Open Source Initiative 243
6.6 The GPL and Related Issues 250
7. The Economics of Open Source 265
7.1 Standard Economic Effects 266
7.2 Open Source Business Models 272
7.3 Open Source and Commoditization 281
7.4 Economic Motivations for Participation 285
Section Three: Free Software: The Movement, the
Public Sector, and the Future
8. The GNU Project 297
8.1 The GNU Project 297
8.2 The Free Software Foundation 302
9. Open Source in the Public Sector 309
9.1 Open Source in Government and Globally 310
9.2 Open Source in Education 316
10. The Future of the Open Source Movement 325
Glossary 336
Subject Index 351
Author Index 366
Preface
The story of free and open software is a scientific adventure, packed with
extraordinary, larger-than-life characters and epic achievements. From infra-
structure for the Internet to operating systems like Linux, this movement
involves some of the great accomplishments in computing over the past quarter
century. The story encompasses technological advances, global software collaboration on an unprecedented scale, and remarkable software tools for facilitating
distributed development. It involves innovative business models, voluntary and
corporate participation, and intriguing legal questions. Its achievements have
had widespread impact in education and government, as well as historic cultural and commercial consequences. Some of its attainments occurred before
the Internet's rise, but it was the Internet's emergence that knitted together the
scientific bards of the open source community. It let them exchange their innovations and interact almost without regard to constraints of space, time, or national
boundary. Our story recounts the tales of major open community projects: Web
browsers that fueled and popularized the Internet, the long dominant Apache
Web server, the multifarious development of Unix, the near-mythical rise of
Linux, desktop environments like GNOME, fundamental systems like those
provided by the Free Software Foundation’s GNU project, infrastructure like
the X Window System, and more. We will encounter creative, driven scientists
who are often bold, colorful entrepreneurs or eloquent scientific spokesmen.
The story is not without its conflicts, both internal and external to the move-
ment. Indeed the free software movement is perceived by some as a threat to
the billions in revenue generated by proprietary firms and their products, or
conversely as a development methodology that is limited in its ability to adequately identify consumer needs. Much of this tale is available on the Internet
because of the way the community conducts its business, making it a uniquely
accessible tale. As free and open software continues to increasingly permeate
our private and professional lives, we believe this story will intrigue a wide
audience of computer science students and practitioners, IT managers, policy-
makers in government and education, and others who want to learn about the
fabled, ongoing legacy of transparent software development.
Acknowledgments
Many people helped us during the process of writing and publishing this book.
Although it is impossible to know all of them by name, we offer a word of
appreciation and gratitude to all who have contributed to this project. In particular, we thank the anonymous reviewers who read the proposal for the text
and carefully examined the manuscript during the earlier stages of the process.
They provided excellent recommendations and offered superb suggestions for
improving the accuracy and completeness of the presented material.

Heather Bergman, Computer Science Editor at Cambridge University Press,
deserves enormous praise for her professionalism and competence. Heather
responded promptly to our initial inquiry and provided excellent insight and
guidance throughout the remaining stages. Her extraordinary efforts were
instrumental in getting this book into the hands of its readers.
1
Introduction
The open source movement is a worldwide attempt to promote an open style
of software development more aligned with the accepted intellectual style of
science than the proprietary modes of invention that have been characteristic
of modern business. The idea – or vision – is to keep the scientific advances
created by software development openly available for everyone to understand
and improve upon. Perhaps even more so than in the conventional scientific
paradigm, the very process of creation in open source is highly transparent
throughout. Its products and processes can be continuously, almost instan-
taneously scrutinized over the Internet, even retrospectively. Its peer review
process is even more open than that of traditional science. But most of all, its
discoveries are not kept secret, and it leaves anyone, anywhere, at any time free to
build on its discoveries and creations.
Open source is transparent. The source code itself is viewable and available
to study and comprehend. The code can be changed and then redistributed to
share the changes and improvements. It can be executed for any purpose without
discrimination. Its process of development is largely open, with the evolution
of free and open systems typically preserved in repositories accessible via the
Internet, including archives of debates on the design and implementation of the
systems and the opinions of observers about proposed changes. Open source
differs vastly from proprietary code, where all these transparencies are generally
lacking. Proprietary code is developed largely in private, although its requirements
are developed with input from its prospective constituencies. Its source code is generally
not disclosed and is typically distributed under the shield of binary executables.
Its use is controlled by proprietary software licensing restrictions. The right to
copy the program executables is restricted, and the user is generally forbidden
from attempting to modify, and certainly from redistributing, the code or possible
improvements. In most respects, the two modalities of program development
are polar opposites, though this is not to say there are not many areas where the
commercial and open communities have cooperated.
Throughout this book, we will typically use the term open source in a
generic sense, encompassing free software as referred to by the Free Soft-
ware Foundation (FSF) and open source software as referred to by the Open
Source Initiative (OSI) organization. The alternative composite terms FLOSS
(for Free/Libre/Open Source Software) or FOSS are often used in a European
context. The two organizations, the FSF and the OSI, represent the two streams
of the free or open source movement. Free software is an intentionally evocative
term, a rallying cry as it were, used by the FSF and intended to resonate with
the values of freedom: user and developer freedom. The FSF's General Public
License (GPL) is its gold standard for free licenses. It has the distinctive characteristic of preventing software licensed under it from being redistributed in
a closed, proprietary distribution. Its motto might be considered as "share and
share alike." However, the FSF also recognizes many other software licenses as
free as long as they let the user run a program for any purpose, access its source
code, modify the code if desired, and freely redistribute the modifications. The
OSI, on the other hand, defines ten criteria for calling a license open source. Like
the FSF's conditions for free software (though not the GPL), the OSI criteria
do not require the software or modifications to be freely redistributed, allowing licenses that let changes be distributed in proprietary distributions. While
the GPL is the free license preferred by the FSF, licenses like the (new) BSD
or MIT license are more characteristic of the OSI approach, though the GPL
is also an OSI-certified license. Much of the time we will not be concerned
about the differences between the various kinds of free or open source licenses,
though these differences can be very important and have major implications for
users and developers (see, for example, Rosen, 2005). When necessary, we will make
appropriate distinctions, typically referring to whether certain free software is
GPL-licensed or is under a specific OSI-certified license. We will elaborate on
software licenses in the chapter on legal issues. For convenience we will also
refer at times to "open software" and "open development" in the same way.
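The difference between the two licensing streams can be reduced, for illustration only, to a single question: may modified versions be redistributed in a closed, proprietary distribution? The following minimal sketch is our own simplification, not an official classification; the table and function names are assumptions made for the example.

```python
# Illustrative simplification of the distinction discussed above;
# real license analysis is far more nuanced than a lookup table.

# True: modified versions may be folded into a closed, proprietary
# distribution; False: redistribution must remain open ("share and
# share alike").
ALLOWS_PROPRIETARY_REDISTRIBUTION = {
    "GPL": False,      # the FSF's copyleft gold standard
    "New BSD": True,   # permissive, characteristic of the OSI approach
    "MIT": True,       # permissive
}

def can_close_source(license_name):
    """Can derivatives of code under this license be made proprietary?"""
    return ALLOWS_PROPRIETARY_REDISTRIBUTION[license_name]

print(can_close_source("GPL"))  # False
print(can_close_source("MIT"))  # True
```

The sketch captures only the single axis discussed in the text; actual licenses differ along many other dimensions (patent grants, attribution requirements, compatibility), which the chapter on legal issues takes up.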
We will begin our exploration by considering the rationale for open source,
highlighting some of its putative or demonstrable characteristics, its advantages,
and opportunities it provides. We will then overview what we will cover in the
rest of the book.
1.1 Why Open Source
Before we embark on our detailed examination of open source, we will briefly
explore some markers for comparing open and proprietary products. A proper
comparison of their relative merits would be a massively complex, possibly
infeasible undertaking. There are many perspectives that would have to be
considered, as well as an immense range of products, operating in diverse
settings, under different constraints, and with varied missions. Unequivocal data
from unbiased sources would have to be obtained for an objective comparative
evaluation, but this is hard to come by. Even for a single pair of open and
proprietary products it is often difficult to come to clear conclusions about
relative merits, except for the case of obviously dominant systems like Web
servers (Apache). What this section modestly attempts is to set forth some of
the parameters or metrics that can help structure a comparative analysis. The
issues introduced here are elaborated on throughout the book.
Open source systems and applications often appear to offer significant benefits vis-à-vis proprietary systems. Consider some of the metrics on which they
compete. First of all, open source products are usually free of direct cost. They are
often superior in terms of portability. You can modify the code because you
can see it and it’s allowed by the licensing requirements, though there are
different licensing venues. The products may arguably be both more secure
and more reliable than systems developed in a proprietary environment. Open
products also often offer hardware advantages, with frequently leaner platform
requirements. Newer versions can be obtained for free. The development
process also exhibits potential macroeconomic advantages. These include the
innately antimonopolistic character of open source development and its the-
oretically greater efficiency because of its arguable reduction of duplicated
effort. The open source paradigm itself has obvious educational benefits for
students because of the accessibility of open code and the development pro-
cess' transparent exposure of high-quality software practice. The products and
processes lend themselves in principle to internationalization and localization,
though this is apparently not always well-achieved in practice. There are other
metrics that can be considered as well, including issues of quality of vendor
support, documentation, development efficiency, and so on. We will highlight
some of these dimensions of comparison. A useful source of information on
these issues is provided by the ongoing review in Wheeler (2005), a detailed
discussion that, albeit avowedly sympathetic to the open source movement,
makes an effort to be balanced in its analysis of the relative merits of open and
proprietary software.
1.1.1 Usefulness, Cost, and Convenience
Does the open source model create useful software products in a timely fashion
at a reasonable cost that are easy to learn to use? In terms of utility, consider
that open source has been instrumental in transforming the use of computing
in society. Most of the Internet's infrastructure and the vastly successful Linux
operating system are products of open source style development. There are
increasingly appealing open desktop environments like GNOME and KDE.
Furthermore, many of these products like the early Web servers and browsers
as well as Linux were developed quite rapidly and burst on the market. Fire-
fox is a recent example. It is of course hard to beat the direct price of open
source products since they are usually free. The zero purchase cost is especially
attractive when the software product involved has already been commoditized.
Commoditization occurs when one product is pretty much like another or at
least good enough for the needs it serves. In such cases, it does not pay to
pay more. An open source program like the Apache Web server does not even
have to be best of breed to attract considerable market share; it just has to be
cheap enough and good enough for the purpose it serves. Open source is also
not only freely available but is free to update with new versions, which are
typically available for free download on the same basis as the original. For
most users, the license restrictions on open products are not a factor, though
they may be relevant to software developers or major users who want to modify the products. Of course, to be useful, products have to be usable. Here the
situation is evolving. Historically, many open source products have been in the
category of Internet infrastructure tools or software used by system administrators. For such system applications, the canons of usability are less demanding
because the users are software experts. For ordinary users, we observe that,
at least in the past, interface usability has not been recognized as a strong
suit of open source. Open source advocate Eric Raymond observed that the
design of desktops and applications is a problem of “ergonomic design and
interface psychology, and hackers have historically been poor at it" (Raymond,
1999). Ease of installation is one aspect of open applications where usability
is being addressed, such as for the vendor-provided GNU/Linux distributions
or, at a much simpler level, installers for software like the bundled AMP package (Apache, MySQL, Perl, PHP). (We use GNU/Linux here to refer to the
combination of GNU utilities and the Linux kernel, though the briefer desig-
nation Linux is more common.) Another element in usability is user support.
There is for-charge vendor-based support for many open source products, just
as there is for proprietary products. Arguments have been made on both sides about
which is better. Major proprietary software developers may have more financial
resources to expend on “documentation, customer support and product train-
ing than do open source providers” (Hahn, 2002), but open source products
by definition can have very wide networks of volunteer support. Furthermore,
since the packages are not proprietary, the user is not locked into a particular
vendor.
1.1.2 Performance Characteristics
Does open source provide products that are fast, secure, reliable, and portable?
The overview in Wheeler (2005) modestly states that GNU/Linux is often either
superior or at least competitive in performance with Windows on the same
hardware environment. However, the same review emphasizes the sensitiv-
ity of performance to circumstances. Although proprietary developers benefit
from financial resources that enable them to produce high-quality software, the
transparent character of open source is uniquely suited to the requirements of
security and reliability.
In terms of security, open source code is widely considered to be highly
effective for mission-critical functions, precisely because its code can be publicly scrutinized for security defects. It allows users the opportunity to security-enhance their own systems, possibly with the help of an open source consultant,
rather than being locked into a system purchased from a proprietary vendor
(Cowan, 2003). In contrast, for example, Hoepman and Jacobs (2007) describe
how the exposure of the code for a proprietary voting system revealed serious
security flaws. Open accessibility is also necessary for government security
agencies that have to audit software before using it to ensure its operation is
transparent (Stoltz, 1999). Though security agencies can make special arrangements with proprietary distributors to gain access to proprietary code, this access
is automatically available for open source. Open source products also have a
uniquely broad peer review process that lends itself to detection of defects during
development, increasing reliability. Not only are the changes to software proposed by developers scrutinized by project maintainers, but also any bystander
observing the development can comment on defects, propose implementation
suggestions, and critique the work of contributors. One of the most well-known
aphorisms of the open source movement, "Given enough eyeballs, all bugs are
shallow" (Raymond, 1998), identifies an advantage that may translate into more
reliable software. In open source, "All the world's a stage," with open source
developers very public actors on that stage. The internal exposure and review
of open source occurs not just when an application is being developed and
improvements are reviewed by project developers and maintainers, but for the
entire life cycle of the product because its code is always open. These theoretical
benefits of open source appear to be verified by data. For example, a significant
empirical study described in Reasoning Inc. (2003) indicates that free MySQL
had six times fewer defects than comparable proprietary databases (Tong, 2004).
A legendary acknowledgment of Linux reliability was presented in the famous
Microsoft Halloween documents (Valloppillil, 1998), which described Linux as
having a failure rate two to five times lower than commercial Unix systems.
The open source Linux platform is the most widely ported operating sys-
tem. It is dominant on servers, workstations, and supercomputers and is widely
used in embedded systems like digital appliances. In fact, its portability is
directly related to the design decisions that enabled the distributed open style
of development under which Linux was built in the first place. Its software
organization allowed architect Linus Torvalds to manage core kernel development while other distributed programmers could work independently on so-called kernel modules (Torvalds, 1999). This structure helped keep hardware-specific code like device drivers out of the core kernel, keeping the core highly
portable (Torvalds, 1999). Another key reason why Linux is portable is that
the GNU GCC compiler itself is ported to most "major chip architectures"
(Torvalds, 1999, p. 107). Ironically, it is the open source Wine software that
lets proprietary Windows applications run portably on Linux. Of course, there
are open source clones of Windows products like MS Office that work on
Windows platforms. A secondary consideration related to portability is soft-
ware localization and the related notion of internationalization. Localization
refers to the ability to represent a system using a native language. This can
involve the language a system interface is expressed in, character sets, or even
syntactical effects like tokenization (since different human languages are broken up differently, which can impact the identification of search tokens). It
may be nontrivial for a proprietary package that is likely to have been devel-
oped by a foreign corporation to be localized, since the corporate developer
may only be interested in major language groupings. It is at least more nat-
ural for open software to be localized because the source code is exposed
and there may be local open developers interested in the adaptation. Interna-
tionalization is a different concept where products are designed in the first
place so that they can be readily adapted, making subsequent localization
easier. Internationalization should be more likely to be on the radar screen
in an open source framework because the development model itself is inter-
national and predisposed to be alert to such concerns. However, Feller and
Fitzgerald (2002), who are sympathetic to free software, critique it with respect
to internationalization and localization, contrasting what appears to be, for
example, the superior acceptability of the Microsoft IIS server versus Apache
on these metrics. They suggest the root of the problem is that these char-
acteristics are harder to “achieve if they are not factored into the original
design” (p. 113). Generally, open source seems to have an advantage in sup-
porting the customization of applications over proprietary code, because its
code is accessible and modification of the code is allowed by the software
license.
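The tokenization point above can be made concrete with a brief sketch. The sample strings and the naive splitting rule are illustrative assumptions, not examples from the book: whitespace-based splitting recovers words in English but finds no boundaries in languages written without spaces between words.

```python
def whitespace_tokenize(text):
    """Naive tokenizer: split on runs of whitespace."""
    return text.split()

# English words are separated by spaces, so naive splitting works.
english_tokens = whitespace_tokenize("free software licenses")

# Chinese is written without spaces between words, so the same rule
# yields a single undivided token ("free software" in Chinese).
chinese_tokens = whitespace_tokenize("自由软件")

print(english_tokens)       # ['free', 'software', 'licenses']
print(len(chinese_tokens))  # 1
```

A localized search feature built on such a tokenizer would silently fail for the second language, which is why tokenization rules must be factored into a design up front rather than bolted on during localization.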
1.1.3 Forward-looking Effects
Is open source innovative or imitative? The answer is a little of both. On the
one hand, open source products are often developed by imitating the functionality of existing proprietary products, "following the taillights" as the saying
goes. This is what the GNOME project does for desktop environments, just as
Apple and Microsoft took off on the graphical environments developed at Xerox
PARC in the early 1980s. However, open development has also been incredibly
innovative in developing products for the Internet environment, from infras-
tructure software like code implementing the TCP/IP protocols, the Apache
Web server, the early browsers at CERN and NCSA that led to the explosion
of commercial interest in the Internet to hugely successful peer-to-peer file
distribution software like BitTorrent. Much of the innovation in computing has
traditionally emerged from academic and governmental research organizations.
The open source model provides a singularly appropriate outlet for deploying
these innovations: in a certain sense, it keeps these works public.
In contrast, Microsoft, the preeminent proprietary developer, is claimed by
many in the open community to have a limited record of innovation. A typical
contention is illustrated in the claim by the FSF’s Moglen that “Microsoft’s
strategy as a business was to find innovative ideas elsewhere in the software
marketplace, buy them up and either suppress them or incorporate them in its
proprietary product" (Moglen, 1999). Certainly a number of Microsoft's signature products have been reimplementations of existing software (Wheeler,
2006) or acquisitions that were possibly subsequently improved on. These
include QDOS (later MS-DOS) from Seattle Computer in 1980 (Conner, 1998),
FrontPage from Vermeer in 1996 (Microsoft Press Release, 1996), PowerPoint
from Forethought in 1987 (Parker, 2001), and Cooper's Tripod in 1988, subsequently
developed at Microsoft into Visual Basic (Cooper, 1996). In a sense,
these small independent companies recognized opportunities that Microsoft
subsequently appropriated. For other examples, see McMillan (2006). On the
other hand, other analysts counter that a scenario where free software dominated development could seriously undermine innovation. Thus Zittrain (2004)
critically observes that “no one can readily monopolize derivatives to popular
free software," which is a precondition to recouping the investments needed to
improve the original works; see also Carroll (2004).
Comparisons with proprietary accomplishments aside, the track record on
balance suggests that the open source paradigm encourages invention. The availability of source code lets capable users play with the code, which is a return
to a venerable practice in the history of invention: tinkering (Wheeler, 2005).
The public nature of Internet-based open development provides computer science students everywhere with an ever-available set of world-class examples of
software practice. The communities around open source projects offer unique
environments for learning. Indeed, the opportunity to learn is one of the most
frequently cited motivations for participating in such development. The model
demonstrably embodies a participatory worldwide engine of invention.
1.1.4 Economic Impact
Free and open software is an important and established feature of the commercial development landscape. Granted, no open source company has evolved to
anything like the economic status of proprietary powerhouses like Microsoft;
nonetheless, the use of open source, especially as supporting infrastructure
for proprietary products, is a widely used and essential element of the busi-
ness strategies of major companies from IBM to Apple and Oracle. Software
companies traditionally rely at least partly on closed, proprietary code to maintain their market dominance. Open source, on the other hand, tends to undermine monopoly, the likelihood of monopolistic dominance being reduced to the
extent that major software infrastructure systems and applications are open. The
largest proprietary software distributors are U.S. corporations – a factor that is
increasingly encouraging counterbalancing nationalistic responses abroad. For
example, foreign governments are more than ever disposed to encourage a policy preference for open source platforms like Linux. The platforms' openness
reduces their dependency on proprietary, foreign-produced code, helps nurture
the local pool of software expertise, and prevents lock-in to proprietary distributors and a largely English-only mode where local languages may not even be
supported. Software is a core component of governmental operation and infrastructure, so dependency on extranational entities is perceived as a security risk
and a cession of control to foreign agency.
At the macroeconomic level, open source development arguably reduces duplication of effort. Open code is available to all and acts as a public repository of software solutions to a broad range of problems, as well as of best practices in programming. It has been estimated that 75% of code is written for specific organizational tasks and not shared or publicly distributed for reuse (Stoltz, 1999). The open availability of such source code throughout the economy would reduce the need to develop applications from scratch. Just as software libraries and objects are software engineering paradigms for facilitating software reuse, at a much grander scale the open source movement proposes to preserve entire ecosystems of software, open for reuse, extension, and modification. It has traditionally been perceived that “open source software is often
geared toward information technology specialists, to whom the availability of source code can be a real asset, (while) proprietary software is often aimed at less sophisticated users” (Hahn, 2002). Although this observation could be refined, generally a major appeal of open source has been that its code availability makes it easier for firms to customize the software for internal applications. Such in-house customization is completely compatible with all open source licenses and is extremely significant since most software is actually developed or custom-designed rather than packaged (Beesen, 2002). As a process, open source can also reduce the development and/or maintenance risks associated with software development even when done by private, for-profit companies.
For example, consider code that has been developed internally for a company. It may often have little or no external sales value to the organization, even though it provides a useful internal service. In “The Magic Cauldron,” Raymond (1999) recounts the example of a distributed print-spooler written for an in-house corporate network. There was a good chance the life cycle of the code would be longer than the tenure of its original programmers. In this case, distributing the code as open source created the possibility of establishing an open community of interest in the software. This is useful to the company that owns the code since it reduces the risk of maintenance complications when the original developers depart. With any luck, it may connect the software to a persistent pool of experts who become familiar with the software and who can keep it up to date for their own purposes. More generally, open development can utilize developers from multiple organizations, splitting development risks and costs among the participants. In fact, while much open source code has traditionally been developed with a strong volunteer pool, there has also been extensive industrial support for open development. Linux development is a prime example. Though initially developed under the leadership of Linus Torvalds using a purely volunteer model, most current Linux code contributions are made by professional developers who are employees of for-profit corporations.
References
Beesen, J. (2002). What Good is Free Software? In: Government Policy toward Open Source Software, R.W. Hahn (editor). Brookings Institution Press, Washington, DC.
Carroll, J. (2004). Open Source vs. Proprietary: Both Have Advantages. ZDNet Australia. http://opinion.zdnet.co.uk/comment/0,1000002138,39155570,00.htm. Accessed June 17, 2007.
Conner, D. (1998). Father of DOS Still Having Fun at Microsoft. Microsoft MicroNews, April 10. http://www.patersontech.com/Dos/Micronews/paterson04_10_98.htm. Accessed December 20, 2006.
Cooper, A. (1996). Why I Am Called “the Father of Visual Basic.” Cooper Interaction Design. http://www.cooper.com/alan/father_of_vb.html. Accessed December 20, 2006.
Cowan, C. (2003). Software Security for Open-Source Systems. IEEE Security and Privacy, 1, 38–45.
Feller, J. and Fitzgerald, B. (2002). Understanding Open Source Software Development. Addison-Wesley, Pearson Education Ltd., London.
Hahn, R. (2002). Government Policy toward Open Source Software: An Overview. In: Government Policy toward Open Source Software, R.W. Hahn (editor). Brookings Institution Press, Washington, DC.
Hoepman, J.H. and Jacobs, B. (2007). Increased Security through Open Source. Communications of the ACM, 50(1), 79–83.
McMillan, A. (2006). Microsoft “Innovation.” http://www.mcmillan.cx/innovation.html. Accessed December 20, 2006.
Microsoft Press Release. (1996). Microsoft Acquires Vermeer Technologies Inc., January 16. http://www.microsoft.com/presspass/press/1996/jan96/vrmeerpr.mspx. Accessed December 20, 2006.
Moglen, E. (1999). Anarchism Triumphant: Free Software and the Death of Copyright. First Monday, 4(8). http://www.firstmonday.org/issues/issue4_8/moglen/index.html. Accessed January 5, 2007.
Parker, I. (2001). Absolute PowerPoint – Can a Software Package Edit Our Thoughts? New Yorker, May 28. http://www.physics.ohio-state.edu/~wilkins/group/powerpt.html. Accessed December 20, 2006.
Raymond, E. (1999). The Revenge of the Hackers. In: Open Sources: Voices from the Open Source Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA, 207–219.
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.firstmonday.dk/issues/issue3_3/raymond/index.html. Accessed December 3, 2006.
Reasoning Inc. (2003). How Open Source and Commercial Software Compare: MySQL White Paper, MySQL 4.0.16. http://www.reasoning.com/downloads.html. Accessed November 29, 2006.
Rosen, L. (2005). Open Source Licensing: Software Freedom and Intellectual Property Law. Prentice Hall, Upper Saddle River, NJ.
Raymond, E.S. (1999). The Magic Cauldron. http://www.catb.org/esr/writings/magic-cauldron/. Accessed November 29, 2006.
Stoltz, M. (1999). The Case for Government Promotion of Open Source Software. NetAction White Paper. http://www.netaction.org/opensrc/oss-report.html. Accessed November 29, 2006.
Tong, T. (2004). Free/Open Source Software in Education. United Nations Development Programme's Asia-Pacific Information Programme, Malaysia.
Torvalds, L. (1999). The Linux Edge. In: Open Sources: Voices from the Open Source Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA, 101–112.
Valloppillil, V. (1998). Open Source Software: A (New?) Development Methodology. The Halloween Documents. http://www.opensource.org/halloween/. Accessed November 29, 2006.
Wheeler, D. (2005). Microsoft the Innovator? http://www.dwheeler.com/innovation/microsoft.html. Accessed November 29, 2006.
Wheeler, D. (2006). Why Open Source Software/Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers! http://www.dwheeler.com/oss_fs_why.html. Accessed November 29, 2006.
Zittrain, J. (2004). Normative Principles for Evaluating Free and Proprietary Software. University of Chicago Law Review, 71(1), 265–287.
1.2 Preview
We will view the panorama of open source development through a number of different lenses: brief descriptive studies of prominent projects, the enabling technologies of the process, its social characteristics, legal issues, its status as a movement, business venues, and its public and educational roles. These perspectives are interconnected. For example, technological issues affect how the development process works. In fact, the technological tools developed by open source projects have at the same time enabled its growth. The paradigm has been self-hosting and self-expanding, with open systems like the Concurrent Versions System (CVS) and the Internet vastly extending the scale on which open development takes place. Our case studies of open projects will reveal its various social, economic, legal, and technical dimensions. We shall see how its legal matrix affects its business models, while social and psychological issues are in turn affected by the technological medium. Though we will separate out these various factors, the following chapters will also continually merge these influences. The software projects we consider are intended to familiarize the reader with the people, processes, and accomplishments of free and open development, focusing on Internet applications and free software platforms. The enabling technologies of open development include the fascinating versioning systems, both centralized and distributed, that make enormous open projects feasible. Such novel modes of collaboration invariably pose new questions about the social structures involved and their effect on how people interact, as well as the psychological and cognitive phenomena that arise in the new medium. Open development is significantly dependent on a legal infrastructure as well as on a technological one, so we will examine basic legal concepts, including licensing arrangements and the challenge of software patents. Social phenomena like open development do not just happen; they depend on effective leadership to articulate and advance the movement. In the case of free and open software, we shall see how the FSF and the complementary OSI have played that role. The long-term success of a software paradigm
requires that it be economically viable. This has been accomplished in free software in different ways, from businesses based purely on open source to hybrid arrangements more closely aligned with proprietary strategies. Beyond the private sector, we consider the public sector of education and government and how they capitalize on open source or affect its social role. We will close our treatment by briefly considering likely future developments, in a world where information technology has become one of the central engines of commerce and culture.
Section One of the book covers key open source Internet applications and platforms, and surveys technologies used in distributed collaborative open development. Section Two addresses social issues ranging from the demographics of participants to legal issues and business/economic models. Section Three highlights the role of the Free Software Foundation in the movement, the relation of open source to the public sector in government and education, and future prospects. A glimpse of the topics covered by the remaining chapters follows.
Chapter 2 recounts some classic stories of open development related to the Internet, like Berners-Lee's groundbreaking work on the Web at CERN, the development of the NCSA HTTP Web server and Mosaic browser, the Apache project, and more. These case studies represent remarkable achievements in the history of business and technology. They serve to introduce the reader unfamiliar with the world of open source to some of its signature projects, ideas, processes, and people. The projects we describe have brought about a social and communications revolution that has transformed society. The story of these achievements is instructive in many ways: for learning how the open source process works, what some of its major attainments have been, who some of the pioneering figures in the field are, how projects have been managed, how people have approached development in this context, what motivations have led people to initiate and participate in such projects, and some of the models for commercialization. We consider the servers and browsers that fueled the Internet's expansion, programming languages like Perl and PHP and the MySQL database so prominent in Internet applications, newer systems like BitTorrent, Firefox, and others. We also review the Fetchmail project that became famous as an exemplar of Internet-based, collaborative, bazaar-style development because of a widely influential essay.
Chapter 3 explores the open source platform, by which we mean the open operating systems and desktops that provide the infrastructure for user interaction with a computer system. The root operating system model for open source was Unix. Legal and proprietary issues associated with Unix led to the development of the fundamentally important free software GNU project, the aim of which was to create a complete and self-contained free platform that
would allow anyone to do all their software development in a free software environment. The flagship Linux operating system evolved out of a port of a Unix variant to a personal computer environment and then burgeoned into the centerpiece project of the open software movement. The Linux and free Unix-like platforms in turn needed a high-quality desktop-style interface, and it was out of this imperative that the two major open desktops, GNOME and KDE, emerged, which in turn depended on the fundamental functionality provided by the X Window System. This chapter recounts these epic developments in the history of computing, describing the people, projects, and associated technical and legal issues.
Chapter 4 overviews the key technologies used to manage open source projects, with a special emphasis on CVS. The free software movement emerged in the early 1980s, at a time when the ARPANET network with its several hundred hosts was well established and moving toward becoming the Internet. The ARPANET allowed exchanges like e-mail and FTP, technologies that significantly facilitated distributed collaboration, though the Internet was to greatly amplify this. The TCP/IP protocols that enabled the Internet became the ARPANET standard on January 1, 1983, about the same time the flagship open source GNU project was announced by free software leader and advocate Richard Stallman. By the late 1980s the NSFNET backbone network merged with the ARPANET to form the emerging worldwide Internet. The exponential spread of the Internet catalyzed further proliferation of open development. The specific communications technologies used in open source projects have historically tended to be relatively lean: e-mail, mailing lists, newsgroups, and later on Web sites, Internet Relay Chat, and forums. Major open source projects like Linux in the early 1990s still relied on e-mail, newsgroups, and FTP downloads to communicate. Newsgroups provided a means to broadcast ideas to targeted interest groups whose members might like to participate in a development project. Usenet categories acted like electronic bulletin boards that allowed newsgroup participants to post e-mail-like messages, like the famous comp.os.minix newsgroup on Usenet used by Linus Torvalds to initiate the development of Linux. A powerful collaborative development tool that greatly facilitated managing distributed software development emerged during the late 1980s and early 1990s: the versioning system. Versioning systems are software tools that allow multiple developers to work on projects concurrently and keep track of changes made to the code. This chapter describes in some detail how CVS works. To appreciate what it does, it is necessary to have a sense of its commands, their syntax, and their outputs or effects, and so we examine these closely. We also consider newer versioning tools like the decentralized system BitKeeper that played a significant role in the Linux project for a period of time, its free successor Git, and the Subversion system. Open source development has also been facilitated by the software hosting facilities that help distributed collaborators manage their projects and provide source code repositories. We describe some of the services they provide and the major Web sites.
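The central problem a versioning system solves, namely reconciling concurrent changes by different developers to the same file, can be sketched in a few lines of Python. This is only a conceptual illustration, not how CVS is implemented; the function and the sample revisions below are invented for the sketch:

```python
import difflib

def three_way_merge(base, ours, theirs):
    """Naive line-based merge of two concurrent edits of 'base'.

    Keeps a line changed by only one side; flags a conflict when both
    sides changed the same line differently. (Real tools like CVS first
    align revisions with diffs so insertions and deletions are handled;
    zip() here assumes the revisions have equal length.)
    """
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:            # both agree (or neither changed it)
            merged.append(o)
        elif o == b:          # only 'theirs' changed this line
            merged.append(t)
        elif t == b:          # only 'ours' changed this line
            merged.append(o)
        else:                 # both changed it: a conflict
            merged.append(f"<<< CONFLICT: {o!r} vs {t!r} >>>")
    return merged

base  = ["def greet():", "    msg = 'hello'", "    return msg"]
alice = ["def greet():", "    msg = 'hello, world'", "    return msg"]
bob   = ["def greet():", "    msg = 'hello'", "    return msg.upper()"]

merged = three_way_merge(base, alice, bob)

# What a repository actually records is a delta between revisions:
delta = list(difflib.unified_diff(base, alice, "r1.1", "r1.2", lineterm=""))
```

Here the merge keeps Alice's message and Bob's return statement; had both edited the same line, the tool would leave a conflict marker for a human to resolve, which is exactly what CVS does.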
There are many demographic, social, psychological, cognitive, process, and media characteristics that affect open source development. Chapter 5 overviews some of these. It also introduces a variety of concepts from the social sciences that can be brought to bear on the open source phenomenon to help provide a framework for understanding this new style of human, scientific, and commercial interaction. We first of all consider the basic demographics of the phenomenon, such as the number and scale of projects under development, the kinds of software that tend to be addressed, population characteristics and motivations for developers and community participants, and how participants interact. We survey relevant concepts from social psychology, including the notions of norms and roles, the factors that affect group interactions like compliance, internalization, and identification, normative influences, the impact of power relationships, and group cohesion. Ideas like these from the field of social psychology help provide conceptual tools for understanding open development. Other useful abstractions come from cognitive psychology, like the well-recognized cognitive biases that affect group interactions and problem solving. Social psychology also provides models for understanding the productivity of collaborative groups in terms of what are called process losses and gains, as well as organizational effects that affect productivity. The impact of the collaborative medium on group interactions is worth understanding, so we briefly describe some of the classic research on the effect of the communications medium on interaction. Like the field of social psychology, media research offers a rich array of concepts and a point of departure for understanding and analyzing distributed collaboration. Potentially useful concepts range from the effect of so-called common ground, coupling, and incentive structures, to the use of social cues in communication, the richness of informational exchanges, and temporal effects in collaboration. We introduce the basic concepts and illustrate their relevance to open collaboration.
The open source movement is critically affected by legal issues related to intellectual property. Intellectual property includes creations like copyrighted works, patented inventions, and proprietary software. The objective of Chapter 6 is to survey the related legal issues in a way that is informative for understanding their impact on free and open development. In addition to copyright and patent, we will touch on topics like software patents, licenses and contracts, trademarks, reverse engineering, the notion of reciprocity in licensing, and derivative works in software. The legal and business mechanisms to protect intellectual property
are intended to address what is usually considered to be its core problem: how to protect creations in order to provide incentives for innovators. Traditionally such protection has been accomplished through exclusion. For example, you cannot distribute a copyrighted work for your own profit without the authorization of the copyright owner. The FSF's GPL that lies at the heart of the free software movement takes a very different attitude to copyright, focusing not on how to invoke copyright to exclude others from using your work, but on how to apply it to preserve the free and open distribution of your work, particularly when modified. We describe the GPL and the rationales for its conditions. We also consider the OSI and the motivations for its licensing criteria. The OSI, cofounded by Eric Raymond and Bruce Perens in 1998, was established to represent what was believed to be a more pragmatic approach to open development than that championed by the FSF. The OSI reflected the experience of the stream of the free software movement that preferred licenses like the BSD and MIT licenses, which appeared more attractive for commercial applications. It reflected the attitude of developers like McKusick of the BSD project and Gettys of the X Window System. We describe some of the OSI-certified software licenses, including the increasingly important Mozilla Public License. We also briefly address license enforcement and international issues, and the status and conditions of the next version of the GPL: GPLv3.
Chapter 7 examines economic concepts relevant to open source development, the basic business models for open products, the impact of software commoditization, and economic models for why individuals participate in open development. Some of the relevant economic concepts include vendor lock-in, network effects (or externalities), the total cost of use of software, the impact of licensing on business models, complementary products, and the potential for customizability of open versus proprietary products. The basic open business models we describe include dual licensing, consultation on open source products, provision of open source software distributions and related services, and the important hybrid models like the use of open source for in-house development or horizontally in synergistic combination with proprietary products, such as in IBM's involvement with Apache and Linux. We also examine software commoditization, a key economic phenomenon that concerns the extent to which a product's function has become commoditized (routine or standard) over time. Commoditization deeply affects the competitive landscape for proprietary products. We will present some of the explanations that have been put forth to understand the role of this factor in open development and its implications for the future. Finally, observers of the open source scene have long been intrigued by whether developers participate for psychological, social, or other reasons. We will consider some of the economic models that have been
offered to explain why developers are motivated to work on these projects. One model, based on empirical data from the Apache project, uses an effect called signaling to explain why individuals find it economically useful to volunteer for open source projects. Another model proposes that international differences in economic conditions alter the opportunity cost of developer participation, which in turn explains the relative participation rates for different geographic regions.
The chapter on legal issues recounts the establishment and motivation for the OSI in 1998 and Chris Peterson's coinage of the open source designation as an alternative to what was thought to be the more ideologically weighted phrase free software. The OSI represents one main stream of the open software movement. Of course, the stream of the movement represented by the FSF and the GNU project had already been formally active since the mid-1980s. The FSF and its principals, particularly Richard Stallman, initiated the free software concept, defined its terms, vigorously and boldly publicized its motivations and objectives, established the core GNU project, and led advocacy for the free software movement. They have been instrumental in its burgeoning success. Chapter 8 goes into some detail to describe the origin and technical objectives of the GNU project, which represents one of the major technical triumphs of the free software movement. It also elaborates on the philosophical principles espoused by the FSF, as well as some of the roles and services the FSF provides.
Chapter 9 considers the role of open source in the public sector which, in the form of government and education, has been critical to the creation, development, funding, deployment, and promotion of open software. The public sector continues to offer well-suited opportunities for using and encouraging open source, in domains ranging from technological infrastructure to national security, educational use, administrative systems, and so on, both domestically and internationally. Open source has characteristics that naturally suit many of these areas. Consider merely the role of the public sector in supporting the maintenance and evolution of technological infrastructure for society, an area in which open software has proven extremely successful. The government has also historically played an extensive role in promoting innovation in science and technology. For example, the federal government was the leader in funding the development of the Internet with its myriad underlying open software components. Thus public investment in open development has paid off dramatically in the past and can be expected to continue to do so in the future. The transparency of open source makes it especially interesting in national security applications. Indeed, this is an increasingly recognized asset in international use where proprietary software may be considered, legitimately or not, as suspect. Not only do governmental agencies benefit as users of open
source, government and educational institutions also play a role in promoting its expanded use. Government decisions, whether legislative or policy-driven in character, can significantly affect the expansion of open software use in government and by the public. For example, nationalistic concerns about the economic autonomy of local software industries or about national security have made open source increasingly attractive in the international arena. Lastly, we will address at some length the uses and advantages of open source in education, including its unique role in computer science education.
We conclude our book in Chapter 10 with what, we believe, are the likely scenarios for the prospective roles of open and proprietary software. Our interpretation is a balanced one. On the one hand, the open source paradigm seems likely to continue its advance toward worldwide preeminence in computer software infrastructure, not only in the network and its associated utilities, but also in operating systems, desktop environments, and standard office utilities. Significantly, the most familiar and routine applications seem likely to become commoditized and open source, resulting in pervasive public recognition of the movement. The software products whose current dominance seems likely to decline because of this transformation include significant parts of the current Microsoft environment, from operating systems to office software. However, despite a dramatic expansion in the recognition and use of open source, this in no way means that open software will be dominant in software applications. To the contrary, the various dual modalities that have already evolved are likely to persist, with robust open and proprietary sectors each growing and prevailing in different market domains. While some existing proprietary systems may see portions of their markets overtaken by open source replacements, proprietary applications and hybrid modes of commercial development should continue to strengthen. Specialized proprietary killer-apps serving mega-industries are likely to continue to dominate their markets, as will distributed network services built on open infrastructures that have been vertically enhanced with proprietary functionalities. Mixed application modes like those reflected in the WAMP stack (with Windows used in place of Linux in the LAMP stack) and the strategically significant Wine project that allows Windows applications to run on Linux environments will also be important. The nondistributed, in-house commercial development that has historically represented the preponderance of software development seems likely to remain undisclosed, either for competitive advantage or by default, but this software is increasingly being built using open source components – a trend that is already well established. The hybrid models that have emerged, as reflected in various industrial/community cooperative arrangements like those involving the Apache Foundation, the X Window System, and Linux, and
based on industrial support for open projects under various licensing arrangements, seem certain to strengthen even further. They represent an essential strategy for spreading the risks and costs of software development and providing an effective complementary set of platforms and utilities for proprietary products.
SECTION ONE
Open Source – Internet Applications,
Platforms, and Technologies
2
Open Source Internet Application Projects
This chapter describes a number of open source applications related to the Internet that are intended to introduce the reader unfamiliar with the world of open development to some of its signature projects, ideas, processes, and people. These projects represent remarkable achievements in the history of technology and business. They brought about a social and communications revolution that transformed society, culture, commerce, technology, and even science. The story of these classic developments, as well as those in the next chapter, is instructive in many ways: for learning how the open source process works, what some of its major accomplishments have been, who some of the pioneering figures in the field are, how projects have been managed, how people have approached development in this context, what motivations have led people to initiate and participate in such projects, and what business models have been used to commercialize the associated products.
Web servers and Web browsers are at the heart of the Internet, and free software has been prominent on both the server and browser ends. Thus the first open source project we will investigate is a server, the National Center for Supercomputing Applications (NCSA) Web server developed by Rob McCool in the early 1990s. His work had in turn been motivated by the then recent creation by Tim Berners-Lee of the basic tools and concepts for a World Wide Web (WWW), including the invention of the first Web server and browser, HTML (the Hypertext Markup Language), and HTTP (the Hypertext Transfer Protocol). For various reasons, McCool's server project subsequently forked, leading to the development of the Apache Web server. It is instructive and exciting to understand the dynamics of such projects, the contexts in which they arise, and the motivations of their developers. In particular, we will examine in some detail how the Apache project emerged, its organizational processes, and what its development was like. Complementary to Web
servers, the introduction of easily used Web browsers had an extraordinary impact on Web use, and thereby a revolutionary effect on business, technology, and society at large. The Mosaic, Netscape, and more recently the Firefox browser projects that we will discuss even shared some of the same development context. The success of the Mosaic browser project was especially spectacular. In fact, it was instrumental in catalyzing the historic Internet commercial revolution. Mosaic's developer Marc Andreessen later moved on to Netscape, where he created, along with a powerhouse team of developers, the Netscape browser that trumped all competition in the browser field for several years. But Netscape's stunning success proved to be temporary. After its initial triumph, a combination of Microsoft's bundling strategies for Internet Explorer (IE) and the latter's slow but steady improvement eventually won the day over Netscape. Things lay dormant in the browser area for a while until Firefox, a descendant of the Netscape Mozilla browser, came back to challenge IE, as we shall describe.
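The request–response exchange at the center of this server and browser story can be demonstrated with Python's standard library alone. The toy handler below merely stands in for a Web server (it is nothing like NCSA httpd or Apache), while http.client plays the browser's role of issuing an HTTP GET and reading back an HTML page:

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer any GET request with a minimal HTML document.
        body = b"<html><body><h1>Hello, Web</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

# Serve on an ephemeral local port in a background thread.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "browser" side: connect, send GET /, read the response.
conn = HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/")
response = conn.getresponse()
status, page = response.status, response.read().decode()
conn.close()
server.shutdown()
```

Everything a full browser or server adds (caching, rendering, TLS, concurrency) is layered over this same exchange of a status line, headers, and body.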
The process of computer-supported, distributed collaborative software development is relatively new. Although elements of it have been around for decades, the kind of development seen in Linux was novel. Eric Raymond wrote a famous essay on Linux-like development in which he recounted the story of his own Fetchmail project, an e-mail utility. Although Fetchmail is far less significant as an open source product than other projects that we review, it has come to have a mythical pedagogical status in the field because Raymond used its development – which he intentionally modeled on that of Linux – as an exemplar of how distributed open development works and why people develop software this way. Raymond's viewpoints were published in his widely influential essay (Raymond, 1998) that characterized open development as akin to a bazaar style of development, in contrast to the cathedral style of development classically described in Fred Brooks' famed The Mythical Man-Month (twentieth anniversary edition in 1995). We will describe Fetchmail's development in some detail because of its pedagogical significance.
We conclude the chapter with a variety of other important Internet-related
open applications. A number of these are free software products that have
been commercialized using the so-called dual licensing model. These are worth understanding, first of all because licensing issues are important in open development, and secondly because there is an enduring need for viable business
strategies that let creators commercially benefit from open software. The first
of these dual licensed projects that we will consider is the MySQL database
system. MySQL is prominent as the M in the LAMP Web architecture, where
it defines the backend database of a three-tier environment whose other com-
ponents are Linux, Apache, Perl, PHP, and Python. Linux is considered in
Chapter 3. Perl and PHP are considered here. We describe the influential role
of Perl and its widely used open source module collection CPAN, as well as the server-side scripting language PHP that has its own rather interesting model for commercialization. We also briefly consider Berkeley DB and Sendmail (which serves a substantial portion of all Internet sites). Both of these are dual licensed free software. Additional business models for free software are examined in Chapter 7. The peer-to-peer Internet utility BitTorrent is a more recent open source creation that exploits the interconnectedness of the Internet network in a novel way and is intellectually intriguing to understand. BitTorrent has, in a few short years, come to dominate the market for transferring popular, large files over the Internet. We complete the chapter with a brief look at the fundamental BIND utility that underlies the domain name system for the Internet, which makes symbolic Web names possible. The tale of BIND represents a story with an unexpected and ironic business outcome.
2.1 The WWW and the Apache Web Server
The story of the Apache Web server is a classic tale of open development. It has its roots in the fundamental ideas for the WWW conceived and preliminarily implemented by Tim Berners-Lee at a European research laboratory. Soon afterward, these applications were taken up by students at an American university, where Berners-Lee's Web browser and server were dramatically improved upon and extended as the NCSA Web server and the Mosaic browser. The NCSA server project would in turn be adopted and its design greatly revised by a new distributed development team. The resulting Apache server's entry into the marketplace was rapid and enduring.
2.1.1 WWW Development at CERN
We begin by highlighting the origins of the Web revolution. The idea for the WWW was originated by physicist Berners-Lee at the CERN physics laboratory in Switzerland when he proposed the creation of a global hypertext system in 1989. The idea for such a system had been germinating in Berners-Lee's mind for almost a decade and he had even made a personal prototype of it in the early 1980s. His proposal was to allow networked access to distributed documents, including the use of hyperlinks. As an MIT Web page on the inventor says,

Berners-Lee's vision was to create a comprehensive collection of information in word, sound and image, each discretely identified by UDIs and interconnected by hypertext links, and to use the Internet to provide universal access to that collection of information (http://web.mit.edu/invent/iow/berners-lee.html).
Berners-Lee implemented the first Web server and a text-oriented Web browser and made them available on the Web in 1991 for the NeXT operating system. In fact, he not only developed the server and browser, but also invented HTTP, HTML, and the initial URI version of what would later become URLs (uniform resource locators). His HTTP protocol was designed to retrieve HTML documents over a network, especially via hyperlinks. He designed HTML for his project by creating a simplified version of an SGML DTD he used at CERN, which had been intended for designing documentation. He introduced a new hyperlink anchor tag <a> that would allow distributed access to documents and be central to the WWW paradigm (Berglund et al., 2004). Berners-Lee kept his prototype implementations simple and widely publicized his ideas on the www-talk mailing list started at CERN in 1991. He named his browser WorldWideWeb and called his Web server httpd (Berners-Lee, 2006). The server ran as a Unix background process (or daemon), continually waiting for incoming HTTP requests which it would handle.
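The core exchange that early daemon handled can be sketched in a few lines. The following Python fragment is purely illustrative (it is not Berners-Lee's code, and the document paths are examples): it formats a minimal HTTP/1.0 GET request of the kind the server waited for, and a minimal HTML reply whose anchor tag carries a hyperlink.

```python
# Illustrative sketch of the early Web's request/response exchange:
# an HTTP GET for a document, answered by HTML containing an <a> anchor.
# Host and paths are examples, not a real deployment.

def build_get_request(path: str, host: str) -> str:
    """Format a minimal HTTP/1.0 GET request for a document."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"

def build_html_reply(title: str, link_url: str, link_text: str) -> str:
    """Format a minimal HTML document whose <a> tag links to another document."""
    return (
        "<html><head><title>" + title + "</title></head>"
        '<body><p>See <a href="' + link_url + '">' + link_text + "</a>.</p>"
        "</body></html>"
    )

request = build_get_request("/hypertext/WWW/TheProject.html", "info.cern.ch")
reply = build_html_reply("The WWW Project", "http://info.cern.ch/", "the project page")
print(request.splitlines()[0])  # the request line a daemon would parse
```

The anchor tag is the piece Berners-Lee added to his SGML-derived markup; everything else is plumbing for moving the document across the network.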
At about the same time, Berners-Lee became familiar with the free software movement. Indeed, the Free Software Foundation's Richard Stallman gave a talk at CERN in mid-1991. Berners-Lee recognized that the free software community offered the prospect of a plenitude of programmer volunteers who could develop his work further, so he began promoting the development of Web browser software as suitable projects for university students (Kesan and Shah, 2002)! He had his own programmer gather the software components he had developed into a C library named libwww, which became the basis for future Web applications. Berners-Lee's initial inclination was to release the libwww contents under the Free Software Foundation's GPL license. However, there were concerns at the time that corporations would be hesitant to use the Web if they thought they could be subjected to licensing problems, so he decided to release it as public domain instead, which was, in any case, the usual policy at CERN. By the fall of 1992, his suggestions about useful student projects would indeed be taken up at the University of Illinois at Urbana–Champaign. In 1994, Berners-Lee founded and became director of the W3C (World Wide Web Consortium), which develops and maintains standards for the WWW. For further information, see his book on his original design and ultimate objective for the Web (Berners-Lee and Fischetti, 2000).
2.1.2 Web Development at NCSA
The NCSA was one of the hubs for U.S. research on the Internet. It produced
major improvements in Berners-Lee's Web server and browser, in the form of the NCSA Web server (which spawned the later Apache Web server) and the
Mosaic Web browser. We will discuss the NCSA server project and its successor, the still preeminent Apache Web server, in this section. The subsequent section will consider the Mosaic browser and its equally famous descendants, which even include Microsoft's own IE.
Like many open source projects, the now pervasive Apache Web server
originated in the creativity and drive of youthful computer science students.
One of them was Rob McCool, an undergraduate computer science major at
the University of Illinois and a system administrator for the NCSA. McCool
and his colleague Marc Andreessen at NCSA had become fascinated by the
developments at CERN. Andreessen was working on a new Web browser (the Mosaic browser) and thought the CERN server was too "large and cumbersome" (McCool et al., 1999). He asked McCool to take a look at the server code. After doing so, McCool thought he could simplify its implementation and improve its performance, relying on his system administration experience. Of course, this kind of response is exactly what Web founder Berners-Lee had hoped for when he had widely advertised and promoted his work. Since Andreessen was developing the new browser, McCool concentrated on developing the server. The result was the much improved NCSA httpd server.
While McCool was developing the improved httpd daemon, Andreessen
came up with a uniform way of addressing Web resources based on the URL
(Andreessen, 1993). This was a critical development. Up to this point, the
Web had been primarily viewed as a system for hypertext-based retrieval. With Andreessen's idea, McCool could develop a standardized way for the Web server and browser to pass data back and forth using extended HTML tags called forms in what was later to become the familiar Common Gateway Interface or CGI. As a consequence of this, their extended HTML and HTTP Web protocols "transcended their original conception to become the basis of general interactive, distributed, client-server information systems" (Gaines and Shaw, 1996). The client and server could now engage in a dynamic interaction, with the server interpreting the form inputs from the client and dynamically adapting its responses in a feedback cycle of client-server interactions. Gaines and Shaw (1996) nicely describe this innovation as enabling the client to "transmit structured information from the user back to an arbitrary application gatewayed through the server. The server could then process that information and generate an HTML document which it sent back as a reply. This document could itself contain forms for further interaction with the user, thus supporting a sequence of client-server transactions."
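The feedback cycle Gaines and Shaw describe can be sketched concretely. The following Python fragment is a simplified stand-in for a CGI program (the field name "query" and the action path are invented for illustration): it decodes form-encoded input from the client and replies with an HTML document that itself contains a form for the next round trip.

```python
# Sketch of the CGI-style feedback cycle: decode a form submission,
# then generate an HTML reply that carries a form for the next exchange.
# Field name "query" and the action path are hypothetical examples.
from urllib.parse import parse_qs

def handle_form_post(body: str) -> str:
    """Decode form-encoded data and produce the next page, form included."""
    fields = parse_qs(body)              # "query=apache" -> {"query": ["apache"]}
    query = fields.get("query", [""])[0]
    return (
        "<html><body>"
        f"<p>Results for: {query}</p>"
        '<form method="POST" action="/cgi-bin/search">'
        '<input name="query"><input type="submit">'
        "</form></body></html>"
    )

page = handle_form_post("query=apache+httpd")
```

Each reply page carrying its own form is what turns one-shot document retrieval into the "sequence of client-server transactions" quoted above.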
In traditional open development style, McCool kept his server project posted
on a Web site and encouraged users to improve it by proposing their own
modifications. At Andreessen’s recommendation, the software was released
under a very unrestrictive open software license (essentially public domain) that basically let developers do whatever they wanted with the source code, just like the Berners-Lee/CERN approach. The open character of this licensing decision would later significantly expedite the evolution of the NCSA httpd server into the Apache server (see McCool et al., 1999; Apache.pdf, 2006). For a period of time, McCool's NCSA httpd daemon was the most popular Web server on the Internet. Indeed, the Netcraft survey (netcraft.com) gave it almost 60% of the server market share by mid-1995, surpassing the market penetration of the CERN server, which by then stood at only 20%. Although Netcraft surveyed fewer than 20,000 servers at the time, there were already millions of host computers on the Internet (Zakon, 1993/2006). The Apache server that developed out of the NCSA server would be even more pervasively deployed.
2.1.3 The Apache Fork
As commonly happens in open source projects, the original developers moved on, in this case to work at Netscape, creating a leadership vacuum in the NCSA httpd project. After an interim, by early 1995, an interest group of Web site administrators or "Webmasters" took over the development of the server. The Webmasters were motivated by a mix of personal and professional needs, especially doing their jobs better. Brian Behlendorf, a computer scientist recently out of Berkeley, was one of them. He was developing the HotWired site for Wired magazine for his consulting company and had to solve a practical problem: the HotWired site needed password authentication on a large scale. Behlendorf provided it by writing a patch to the httpd server to incorporate this functionality at the required scale (Leonard, 1997). By this point, there were a large number of patches for the httpd code that had been posted to its development mailing list, but which, since McCool's departure from NCSA, had gone unintegrated because there was no one at NCSA in charge of the project. Using these patches was time-consuming: the patches had to be individually downloaded and manually applied to the NCSA base code, an increasingly cumbersome process. In response to this unsatisfactory situation, Behlendorf and his band established a group of eight distributed developers, including himself, Roy Fielding, Rob Hartill, Rob Thau, and several others, and defined a new project mailing list: new-httpd. For a while after its inauguration, McCool participated in the new mailing list, even though he was now at Netscape working on a new proprietary Web server. Netscape did not consider the free source Apache project as competitive with its own system, so initially there appeared to be no conflict of interest. McCool was able to explain the intricacies of the httpd daemon's
code to the new group, a considerable advantage to the project. However, after Apache's release, it quickly became clear from the Netcraft market share analyses that Apache would be a major competitor to the proprietary Netscape server McCool was involved with. Thus McCool once again removed himself from participation (McCool et al., 1999).
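The patch-trading workflow described above, in which changes circulated as textual diffs against a shared base that each administrator applied by hand, can be illustrated with Python's difflib as a stand-in for the diff tool of the era. The C snippets and function names below are invented for illustration, not taken from the httpd sources.

```python
# Illustration of the patch-trading workflow: a change circulated as a
# unified diff against the shared base code. difflib stands in for
# diff(1); the C lines and identifiers are hypothetical examples.
import difflib

base = [
    "check_auth(request);\n",
    "serve_document(request);\n",
]
patched = [
    "check_auth_large_scale(request);  /* scalable password check */\n",
    "serve_document(request);\n",
]

patch = list(difflib.unified_diff(base, patched,
                                  fromfile="httpd.c", tofile="httpd.c.patched"))
print("".join(patch))
```

Each recipient then had to apply such hunks to a local copy of the base code by hand, which is exactly the chore that made an integrated, jointly maintained tree so attractive.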
Since the NCSA httpd daemon served as the point of departure for the new project, the new server's development can be thought of as a fork in the development of the original httpd project. The new group added a number of fixes which it then released as "a patchy" server. Eventually, they recognized they had to revise the code into a completely rewritten software architecture that was developed by Rob Thau by mid-1995. Thau called his design Shambhala. Shambhala utilized a modular code structure and incorporated an extensible Application Programming Interface (API). The modular design allowed the developers to work independently on different modules, a capability critical to a distributed software development project (Apache.pdf, 2006). By the summer of 1995 the group had added a virtual hosting capability that allowed ISPs to host thousands of Web sites on a single Apache server. This innovation represented a highly important capability lacking in the competing Netscape and Microsoft Web servers. After considerable further developmental machinations, the "Apache" version 1.0 was released at the end of 1995 together with its documentation. The thesis by Osterlie (2003) provides a detailed technical history of the development based on the original e-mail archives of the project.
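Virtual hosting of the kind added that summer survives in today's Apache configuration. The fragment below is illustrative only: the directive syntax is that of the modern httpd, not the 1995 server, and the hostnames and paths are hypothetical.

```apacheconf
# Illustrative name-based virtual hosting in modern Apache syntax.
# Two sites share one server process and one IP address; hostnames
# and document roots are hypothetical examples.
<VirtualHost *:80>
    ServerName www.example-one.org
    DocumentRoot "/var/www/site-one"
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example-two.org
    DocumentRoot "/var/www/site-two"
</VirtualHost>
```

The server selects a block by matching the request's Host header against ServerName, which is what lets an ISP stack thousands of sites on a single Apache instance.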
Although the appellation Apache is allegedly associated with the customary open source diff-and-patch techniques used during its development, whence it could be thought of as "a patchy" Web server, the FAQ on the server's Web site says it is eponymous for the American Indian tribe of the same name, "known for their skill in warfare ... and endurance." Within a few years the Apache server dominated the Web server market. By late 1996, according to Netcraft.com, Apache already had 40% of the market share, by 2000 it was about 65%, and by mid-2005 it was over 70%, with Microsoft's IIS lagging far behind at around 20% of market penetration for years. More recent statistics from Netcraft credit Apache with about 60% of the Web server market versus 30% for Microsoft IIS.
The review of the Apache project by McCool et al. (1999) gives an inside look at the project. Notably, the major developers were not hobbyist hackers but either computer science students, PhDs, or professional software developers. All
of them had other regular jobs in addition to their voluntary Apache involve-
ment. Their developer community had the advantage of being an enjoyable
atmosphere. Since the development occurred in a geographically distributed
context, it was inconvenient if not infeasible to have physical meetings. The
circumstances also precluded relying on synchronous communication because members had different work schedules. The volunteers had full-time job commitments elsewhere and so could not predictably dedicate large portions of their time to the project. Consequently, not only was the workspace decentralized, the uncoordinated work schedules necessitated asynchronous communication. E-mail lists followed naturally as the obvious means for communicating. Mockus et al. (2002) observe how the Apache development "began with a conscious attempt to solve the process issues first, before development even started, because it was clear from the very beginning that a geographically distributed set of volunteers, without any traditional organizational ties, would require a unique development process in order to make decisions." Their procedures for decision making and coordinating the project had to reflect its asynchronous, distributed, volunteer, and shared leadership character, so the team "needed to determine group consensus, without using synchronous communication, and in a way that would interfere as little as possible with the project progress" (Fielding, 1999, p. 42).
The organizational model they chose was quite simple: voting on decisions was done through e-mail, decisions were made on the basis of a voting consensus, and the source code (by 1996) was administered under the Concurrent Versions System (CVS). The core developers for Apache, a relatively small group originally of fewer than ten members, were the votes that really counted. Any mailing list member could express an opinion but "only votes cast by the Apache Group members were considered binding" (McCool et al., 1999). In order to commit a patch to the CVS repository, there had to be at least three positive votes and no negative votes. For other issues, there had to be at least three positive votes, and the positive votes had to constitute a majority. A significant tactical advantage of this approach was that the process required only partial participation, enabling the project to proceed without hindrance, even though at any given point in time only a few core developers might be active. Despite such partial participation, the voting protocol ensured that development progress still reflected and required a reasonable level of peer review and approval. Because negative votes acted as vetoes in the case of repository changes, such votes were expected to be used infrequently and required an explanation. One acceptable rationale for a veto might be to reject a proposed change because it was thought that it would interfere with the system's support for a major supported platform (McCool et al., 1999). Another acceptable rationale for a veto was to keep the system simple and prevent an explosion of features. A priori, it might appear that development deadlocks would occur frequently under such a voting system, but the knowledge-base characteristics of the core developer group tended to prevent this. Each of the group members tended to represent disjoint technical
perspectives and so they primarily enforced “design criteria” relevant to their
own expertise (McCool et al., 1999). Of course, problems could occur when development was rapid but the availability of CVS kept the process simple and reversible. Relatively routine changes could be committed to the repository first and then retroactively confirmed, since any patch could be easily undone. Although participants outside the core group were restricted in their voting rights, McCool's review confirms the benefits derived from the feedback obtained from users via newsgroups and e-mail.
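The two voting rules described above are simple enough to state as code. The sketch below is our own modeling of the rules as McCool et al. (1999) report them (the function names are invented, and abstentions are simply ignored): a patch needs at least three positive votes and zero vetoes, while other issues need at least three positive votes forming a majority of the votes cast.

```python
# Sketch of the Apache Group's e-mail voting rules (McCool et al., 1999).
# Function names are our own; abstentions are ignored in this model.

def patch_approved(plus_ones: int, vetoes: int) -> bool:
    """A CVS commit requires at least three positive votes and no vetoes."""
    return plus_ones >= 3 and vetoes == 0

def issue_approved(plus_ones: int, minus_ones: int) -> bool:
    """Other issues require at least three positive votes forming a majority."""
    return plus_ones >= 3 and plus_ones > minus_ones

assert patch_approved(3, 0)
assert not patch_approved(5, 1)   # a single veto blocks a repository change
assert issue_approved(4, 2)
assert not issue_approved(2, 0)   # too few positive votes cast
```

The asymmetry is the point: vetoes make repository changes conservative, while the simple majority rule keeps ordinary decisions from deadlocking.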
The Apache Group that guided the project had eight founding members and by the time of the study by Mockus et al. (2002) had grown to twenty-five members, though for most of the development period there were only half that many. Refer to http://httpd.apache.org/contributors/ for a current list of Apache contributors, their backgrounds, and technical contributions to the project. The core developers were not quite synonymous with this group but included those group members active at a given point in time and those about to be eligible for membership in the group, again adding up to about eight members in total. The Apache Group members could vote on code changes and also had CVS commit access. In fact, strictly speaking, any member of the developer group could commit code to any part of the server, with the group votes primarily used for code changes that might have an impact on other developers (Mockus et al., 2002).
Apache’s pragmatic organizational and process model was in the spirit of
the Internet Engineering Task Force (IETF) philosophy of requiring “rough
consensusand working code” (see suchas Bradner (1999) and Moody (2001)).
Thismotto was coined by DaveClark, Chief Protocol Architect forthe Internet
during the 1980s and one of the leaders in the development of the Internet. In
alegendary presentation in 1992, Clark had urged an assembled IETF audi-
ence to remember a central feature of the successful procedure by which the
IETF established standards, namely “We reject: kings, presidents, and voting.
We believe in: rough consensus and running code” (Clark, 1992). In the IETF,
theexpression rough consensus meant80–90% agreement, reflecting a process
wherein “a proposal must answer to criticisms, butneed not be held up if sup-
ported by a vast majority of the group” (Russell, 2006,p.55).The condition
about running code meant that a party behind a proposed IETF standard was
requiredto provide“multiple actual andinteroperable implementationsof a pro-
posedstandard (which) must existand be demonstrated before theproposal can
be advanced along the standards track” (Russell, 2006,p.55).The pragmatic,
informalIETF process stood in stark contrast to the laborious ISO approach to
developingstandards, a process that entailed having a theoretical specification
prior to implementation of standards. The IETF approach and Clark’s stirring
phrase represented an important "bureaucratic innovation," a way of doing things that "captured the technical and political values of Internet engineers during a crucial period in the Internet's growth" (Russell, 2006, p. 48). Free software advocate Lawrence Lessig (1999, p. 4) described it as "a manifesto that will define our generation." Although its circumstances and process were not identical to Apache's, the IETF's simple pragmatism reflected the same spirit that let productive, creative work get done efficiently, with appropriate oversight, but minimal bureaucratic overhead.
By 1998, the project had been so remarkably successful that IBM asked to
join the Apache Group, a choice that made corporate sense for IBM since its
corporate focus had become providing services rather than marketing software. The Apache Group decided to admit the IBM developers subject to the group's normal meritocratic requirements. The group intended the relationship with IBM to serve as a model for future industrial liaisons (McCool et al., 1999). As of this writing a significant majority of the members of the Apache Software Foundation appear to be similarly industrially affiliated (over 80%) based on the member list at http://www.apache.org/foundation/members.html (accessed January 5, 2007).
Apache Development Process
Mockus et al. (2002) provide a detailed analysis of the processes, project development patterns, and statistics for the Apache project. The generic development process applied by a core developer was as follows:

• identify a problem or a desired functionality;
• attempt to involve a volunteer in the resolution of the problem;
• test a putative solution in a local CVS source copy;
• submit the tested code to the group to review; and
• on approval, commit the code to the repository (preferably as a single commit) and document the change.
New work efforts were identified in several ways: via the developer mailing list, the Apache USENET groups, and the BUGDB reporting system (Mockus et al., 2002). The developer mailing list was the most important vehicle for identifying changes. It was the key tool for discussing fixes for problems and new features and was given the highest priority by the developers, receiving "the attention of all active developers" for the simple reason that these messages were most likely to come from other active developers and so were deemed "more likely to contain sufficient information to analyze the request or contain a patch to solve the problem" (Mockus et al., 2002). Screening processes were used for the other sources. The Apache BUGDB bug-reporting tool was actually not
directly used by most developers, partly because of annoying idiosyncrasies
in the tool. Instead, a few developers filtered the BUGDB information and
forwarded entries thought to be worthwhile to the developer mailing list. The
Apache USENET groups were also used less than one might expect because
they were considered “noisy.” Once again, volunteers filtered the USENET
information, forwarding significant problems or useful enhancements to the
developer mailing list.
Once a problem was identified, the next issue was "who would do the work?" A typical practice was for the core developers associated with the code for the affected part of the system, having either developed it or spent considerable time maintaining it, to take responsibility for the change. This attitude reflects an implicit kind of code ownership (Mockus et al., 2002). Correlative to this cultural practice, new developers would tend to focus on developing new features (whence features that had no prior putative "owner") or to focus on parts of the server that were not actively being worked on by their previous maintainer (and so no longer had a current "owner"). These practices were deferred to by other developers. As a rule, established activity and expertise in an area were the default guidelines. In reality, the actual practice of the developers was more flexible. Indeed, the data analysis provided by Mockus et al. (2002) suggests that the Apache group's core developers had sufficient respect for the expertise of the other core developers that they contributed widely to one another's modules according to development needs. Thus the notion of code ownership was in reality "more a matter of recognition of expertise than one of strictly enforced ability to make commits to partitions of the code base" (Mockus et al., 2002).
Regarding solutions to problems, typically several alternatives were first identified. These were then forwarded by the volunteer developer, self-charged with the problem, to the developer mailing list for preliminary feedback and evaluation prior to developing the actual solution. The prototype solution selected was subsequently refined and implemented by the originating developer and then tested on his local CVS copy before being committed to the repository. The CVS commit itself could be done in two ways: using a commit-then-review process that was typically applied in development versions of the system, versus a post-for-review-first process in which the patch was posted to the developer mailing list for prior review and approval before committing it, as would normally be done if it were a stable release being modified (Mockus et al., 2002). In either case, the modifications, including both the patch and the CVS commit log, would be automatically sent to the developer mailing list. Not only did the core developers review all such changes as posted as a matter of standard practice, but the changes were also open to review by anyone who followed the developer mailing list. The Apache Group determined when a new
stable release of the product was to be distributed. An experienced core developer who volunteered to act as the release manager would, as part of that role, identify any critical open problems and shepherd their resolution, including changes proposed from outside the core developer group. The release manager also controlled access to the repository at this stage, so any development code that was supposed to be frozen was indeed left alone.
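The two commit paths just described can be reduced to a single rule of thumb. The sketch below is our own modeling, not project code (the enum and function name are invented): review precedes commit only when the target is a stable release.

```python
# Sketch of the two commit paths reported by Mockus et al. (2002):
# development code may be committed first and reviewed afterward, while
# stable-release changes are posted for review before commit.
# The enum and function are our own modeling, not project artifacts.
from enum import Enum

class Branch(Enum):
    DEVELOPMENT = "development"
    STABLE_RELEASE = "stable release"

def review_precedes_commit(branch: Branch) -> bool:
    """Return True when a patch must be reviewed on the list before commit."""
    return branch is Branch.STABLE_RELEASE

assert not review_precedes_commit(Branch.DEVELOPMENT)   # commit-then-review
assert review_precedes_commit(Branch.STABLE_RELEASE)    # review-then-commit
```

Because CVS made any commit easy to revert, the cheaper commit-then-review path was safe for development code; the stricter path was reserved for releases where mistakes were costly.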
The development group achieved effective coordination in a variety of ways. A key software architecture requirement was that the basic server functionality was intentionally kept limited in scope, with peripheral projects providing added functionality by interfacing with the core server through well-defined interfaces. Thus the software architecture itself automatically helped ensure proper coordination, without significant additional effort required by the developer group, since the interface itself enforced the necessary coordination. External developers who wanted to add functionality to the core Apache server were thereby accommodated by a "stable, asymmetrically-controlled interface" (Mockus et al., 2002). The presence of this API has been a key feature in the success of Apache since it greatly facilitates expanding the system's functionality by the addition of new modules. On the other hand, coordination of development within the core area was handled effectively by the simple means described previously, informally supported by the small core group's intimate knowledge of the expertise of their own members. The relative absence of formal mechanisms for approval or permission to commit code made the process speedy but maintained high quality. Bug reporting and repair were also simple in terms of coordination. For example, bug reporting was done independently by volunteers. It entailed no dependencies that could lead to coordination conflicts, since these reports themselves did not change code, though they could lead to changes in code. Similarly, most bug fixes themselves were relatively independent of one another, with the primary effort expended in tracking down the bug, so that once again coordination among members was not a major issue.
Statistical Profile of Apache Development
Well-informed and detailed empirical studies of projects on the scale of Apache are uncommon. Therefore, it is instructive to elaborate on the statistical analysis and interpretations provided in Mockus et al. (2002). The credibility of their analysis is bolstered by the extensive commercial software development experience of its authors and the intimate familiarity of second author Roy Fielding with Apache. The study analyzes and compares the Apache and Netscape Mozilla projects based on data derived from sources like the developer e-mail lists, CVS archives, bug-reporting systems, and extensive interviews
with project participants. We will focus on the results for the Apache project.
(Another worthwhile study of the structure of open projects is by Holck and Jorgensen (2005), which compares the Mozilla and FreeBSD projects. It pays special attention to how the projects handle releases and contributions as well as their internal testing environments.)
The Apache server had about 80,000 lines of source code by 2000 (Wheeler, 2000), with approximately 400 people contributing code through 2001 (the time frame examined in Mockus et al. (2002)). The Mockus study distinguishes two kinds of Apache contributions:

• code fixes made in response to reported problems
• code submissions intended to implement new system functionality

Rounded numbers are used in the following statistical summaries for clarity. The summary statistics for Apache code contributions are as follows:
• Two hundred people contributed to 700 code fixes.
• Two hundred fifty people contributed to 7,000 code submissions.

The summary error statistics are as follows:

• Three thousand people submitted 4,000 problem reports, most triggering no change to the code base, because they either lacked detail or the defect had been fixed or was insignificant.
• Four hundred fifty people submitted the 600 bug reports that led to actual changes to the code.
The 15 most productive developers made 85% of implementation changes,
though for defect repair these top 15 developers made only 65% of the code
fixes. A narrow pool of contributors dominated code submissions, with only 4
developers per 100 code submissions versus 25 developers per 100 code fixes.
Thus “almost all new functionality is implemented and maintained by the core
group” (Mockus et al., 2002, p. 322).
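The concentration claim above is simple arithmetic on the summary counts. A quick sketch recomputes developers per 100 changes from the rounded figures quoted earlier; since the inputs are rounded, the results only approximate the cited 4 and 25:

```javascript
// Recompute developers-per-100-changes from the rounded Apache counts
// quoted above (Mockus et al., 2002). The inputs are rounded, so the
// results only approximate the cited "4 per 100" and "25 per 100".
const codeFixes = { contributors: 200, changes: 700 };
const codeSubmissions = { contributors: 250, changes: 7000 };

// Developers per 100 changes: a rough measure of how concentrated the work is.
const perHundred = ({ contributors, changes }) => (100 * contributors) / changes;

console.log(perHundred(codeSubmissions).toFixed(1)); // prints 3.6
console.log(perHundred(codeFixes).toFixed(1));       // prints 28.6
```

The order-of-magnitude gap between the two ratios is the point: new functionality was far more concentrated in the core group than defect repair was.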
The Apache core developers compared favorably with those in reference
commercial projects, showing considerably higher levels of productivity and
handling more modification requests than commercial developers despite the
part-time, voluntary nature of their participation. The problem reporting
responsibilities usually handled by test and customer support teams in proprietary
projects were managed in Apache by thousands of volunteers. While the 15
most productive developers submitted only 5% of the 4,000 problem reports,
there were over 2,500 mostly noncore developers who each submitted at least
one problem report, thus dispersing the traditional role of system tester over
many participants. The response time for problem reports was striking: half the
problems reported were solved in a day, 75% in a month, and 90% in 4 months,
a testimony to the efficiency of the organization of the project and the talent of
the volunteers. Of course, the data used in such studies is invariably subject to
interpretation. For example, metrics like the productivity of groups can be affected
by the procedures used to attribute credit, while response rates reported could
be affected by details like when bug reports were officially entered into the
tracking system.
The social and motivational framework under which the developers oper-
ated was an important element in the success of the Apache project. The
meritocratic process that enables candidate developers to achieve core developer
status requires persistence, demonstrated responsibility to the established core
team, and exceptionally high technical capability. The motivational structure also
differs significantly from commercial environments, where both the project worked
on and its component tasks are assigned by management, not freely chosen by the
developer. From this viewpoint, it seems unsurprising that the passionate, vol-
untary interest of the project developers should be a strong factor contributing
to its success. The stakeholder base for Apache is now sufficiently broad that
changes to the system must be conservatively vetted, so services to end users
are not disrupted. For this reason, Ye et al. (2005) characterize it as now being
a service-oriented open source project.
The Mockus et al. (2002) study makes several tentative conjectures about
the development characteristics of open projects based on their data for Apache
and Netscape Mozilla development (prior to 2001). For example, they suggest
that for projects of Apache’s size (as opposed to the much larger Netscape
Mozilla project), a small core of developers creates most of the code and func-
tionality and so is able to coordinate its efforts in a straightforward way
even when several developers are working on overlapping parts of the code
base. In contrast, in larger development projects like Netscape Mozilla, stricter
practices for code ownership, work group separation, and CVS commit author-
ity have to be enforced to balance the potential for disorder against excessive
communication requirements. Another simple pattern is that the sizes of the par-
ticipant categories appear to differ significantly: the number of core developers
is smaller by an order of magnitude than the number of participants who submit
bug fixes, which in turn is smaller by an order of magnitude than the number
of participants who report problems and defects. The defect density for these
open source projects was lower than that of the compared proprietary projects
that had been only feature tested. However, the study urges caution in the
interpretation of this result since it does not address postrelease bug density and
may partly reflect the fact that the developers in such projects tend to have strong
domain expertise as end users of the product being developed. The study concluded
that the open projects considered exhibited “very rapid responses to customer
problems” (Mockus et al., 2002).
Reusing Open Source Creations
One of the key objectives of the open source movement is to build a reusable
public commons of software that is universally available and widely applica-
ble. Brian Behlendorf of the Apache (and later Subversion) project has some
valuable insights about how to apply the creations of open development to
circumstances beyond those originally envisioned. He identifies some general
conditions he believes are necessary for other applications to benefit from open
products and libraries when they are applied not just in environments they were
originally designed for but in updated versions of those environments or when
they have to be integrated with other applications (Anderson, 2004). There are
three key ingredients that have to come together to effectively support the reuse
of open software:
1. access to the source code,
2. access to the context in which the code was developed, and
3. access to the communities that developed and use the code.
One might call this the 3AC model of what open source history has taught
us about software reuse. The availability of the code for a program or soft-
ware library is the first essential ingredient, not just the availability of stable
APIs like in a COTS (Commercial Off-the-Shelf) environment. Open source
obviously provides the source code. Source code is required for effective reuse
of software because any new application or infrastructure context, like a new
operating system, will necessitate understanding the code because embedding
software components in new contexts will “inevitably ... trigger some defect
that the original developers didn’t know existed” (Anderson, 2004). However,
if you are trying to improve the code, you also need access to the context of
its development. In open source projects this can be obtained from a variety
of sources including e-mail archives and snapshots from the development tree
that provide the history of the project and its development artifacts. This way
you can find out if the problems you identify or questions you have in mind
have already been asked and answered. Finally, in order to understand how the
software was built and why it was designed the way it was, you also need to be
able to interact with the community of people who developed the product, as
well as the community of other users who may also be trying to reuse it. This
kind of community contact information is also available in open source that
has mailing lists for developers and users, as well as project announcements
that can be scavenged for information about how the project developed. That’s
how you get a truly universal library of reusable software: open code, the
developmental context, and the community that made the software and uses it.
References
Anderson, T. (2004). Behlendorf on Open Source. Interview with Brian Behlendorf.
http://www.itwriting.com/behlendorf1.php. Accessed November 29, 2006.
Andreessen, M. (1993). NCSA Mosaic Technical Summary. NCSA, University of Illi-
nois. Accessed via Google Scholar, November 29, 2006.
Apache.pdf. (2006). World Wide Web. http://www.governingwithcode.org. Accessed
January 10, 2007.
Berglund, Y., Morrison, A., Wilson, R., and Wynne, M. (2004). An Investigation
into Free eBooks. Oxford University. http://ahds.ac.uk/litlangling/ebooks/report/
FreeEbooks.html. Accessed December 16, 2006.
Berners-Lee, T. (2006). Frequently Asked Questions. www.w3.org/People/Berners-
Lee/FAQ.html. Accessed January 10, 2007.
Berners-Lee, T. and Fischetti, M. (2000). Weaving the Web – The Original Design and
Ultimate Destiny of the World Wide Web by Its Inventor. Harper, San Francisco.
Bradner, S. (1999). The Internet Engineering Task Force. In: Open Sources: Voices
from the Open Source Revolution, M. Stone, S. Ockman, and C. DiBona (editors).
O’Reilly Media, Sebastopol, CA, 47–52.
Clark, D. (1992). A Cloudy Crystal Ball: Visions of the Future. Plenary presentation at
the 24th meeting of the Internet Engineering Task Force, Cambridge, MA, July 13–17,
1992. Slides from this presentation are available at: http://ietf20.isoc.org/videos/
future_ietf_92.pdf. Accessed January 10, 2007.
Fielding, R.T. (1999). Shared Leadership in the Apache Project. Communications of the
ACM, 42(4), 42–43.
Gaines, B. and Shaw, M. (1996). Implementing the Learning Web. In: Proceedings of
EDMEDIA ’96: World Conference on Educational Multimedia and Hypermedia.
Association for the Advancement of Computing in Education, Charlottesville, VA.
http://pages.cpsc.ucalgary.ca/gaines/reports/LW/EM96Tools/index.html.
Accessed November 29, 2006.
Holck, J. and Jorgensen, N. (2005). Do Not Check in on Red: Control Meets Anarchy in
Two Open Source Projects. In: Free/Open Source Software Development, S. Koch
(editor). Idea Group Publishing, Hershey, PA, 1–26.
Kesan, J. and Shah, R. (2002). Shaping Code. http://opensource.mit.edu/shah.pdf.
Accessed November 29, 2006.
Leonard, A. (1997). Apache’s Free Software Warriors. Salon Magazine. http://
archive.salon.com/21st/feature/1997/11/cov_20feature.html. Accessed
November 29, 2006.
Lessig, L. (1999). Code and Other Laws of Cyberspace. Basic Books, New York.
McCool, R., Fielding, R.T., and Behlendorf, B. (1999). How the Web Was Won. http://
www.linux-mag.com/1999-06/apache_01.html. Accessed November 29, 2006.
Mockus, A., Fielding, R.T., and Herbsleb, J.D. (2002). Two Case Studies of Open Source
Development: Apache and Mozilla. ACM Transactions on Software Engineering
and Methodology, 11(3), 309–346.
Moody, G. (2001). Rebel Code. Penguin Press, New York.
Osterlie, T. (2003). The User-Developer Convergence: Innovation and Software Systems
Development in the Apache Project. Master’s Thesis, Norwegian University of
Science and Technology.
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.
firstmonday.dk/issues/issue3_3/raymond/index.html. Ongoing version: http://
www.catb.org/esr/writings/cathedral-bazaar/. Accessed December 3, 2006.
Russell, A. (2006). “Rough Consensus and Running Code” and the Internet-OSI Stan-
dards War. IEEE Annals of the History of Computing, 28(3), 48–61.
Wheeler, D. (2000). Estimating Linux’s Size. http://www.dwheeler.com/sloc/redhat71-
v1/redhat71sloc.html. Accessed November 29, 2006.
Ye, Y., Nakakoji, K., Yamamoto, Y., and Kishida, K. (2005). The Co-Evolution of
Systems and Communities. In: Free/Open Source Software Development, S. Koch
(editor). Idea Group Publishing, Hershey, PA, 59–83.
Zakon, R. (1993/2006). Hobbes’ Internet Timeline v8.2. http://www.zakon.org/robert/
internet/timeline/. Accessed January 5, 2007.
2.2 The Browsers
Browsers have played a critical role in the Internet’s incredibly rapid expan-
sion. They represent the face of the Internet for most users and the key means
for accessing its capabilities. Three open source browsers have been most
prominent: Mosaic, Netscape Navigator, and Firefox. The proprietary Internet
Explorer browser, which is based on Mosaic, coevolved and still dominates
the market. The development of these browsers is an intriguing but archetypal
tale of open source development. It combines elements of academic provenance,
proprietary code, open source code and licenses, technological innovations, cor-
porate battles for market share, creative software distribution and marketing,
open technology standards, and open community bases of volunteer developers
and users. The story starts with the revolutionary Mosaic browser at the begin-
ning of the Internet revolution, moves through the development of Netscape’s
corporately sponsored browser and its browser war with Internet Explorer, and
finally on to Netscape’s free descendant, Firefox.
2.2.1 Mosaic
The famed Mosaic Web browser was instrumental in creating the Internet boom.
Mosaic was developed at the NCSA starting in 1993. The University of Illi-
nois student Marc Andreessen (who was the lead agent in the initiative) and
NCSA full-time employee and brilliant programmer Eric Bina were the chief
developers. Andreessen wanted to make a simple, intuitive navigational tool
that would let ordinary users explore the new WWW more easily and let them
browse through the data available on the Web. Andreessen and Bina (1994)
identified three key design decisions. The tool had to be easy to use, like a word
processing Graphical User Interface (GUI) application. It had to be kept simple
by divorcing page editing from presentation. (The original Berners-Lee browser
had included publication features that complicated its use.) The software also
had to accommodate images in such a way that both text and embedded images
could appear in the same HTML page or browser window. For this, Andreessen
had to introduce an HTML image tag, even though the standards for such a tag
had not yet been settled. Significantly, Mosaic also introduced forms that users
could fill out. It took a short six weeks to write the original program of 9,000
lines of code (Wagner, 2002). Mosaic transcended the capabilities of previous
text-oriented tools like FTP for accessing information on the Internet. Instead,
it replaced them with a multimedia GUI tool for displaying content, including
the appeal of clickable hyperlinks. Mosaic was initially available for Unix but
was quickly ported to PCs and Macs. It rapidly became the killer app for Web
access of the mid-1990s.
Mosaic’s success was not merely a technical accomplishment. Andreessen’s
management of the project was nurturing and attentive. He was an activist
communicator and listener, one of the top participants in www-talk in
1993 (NCSAmosaic.pdf, 2006). According to Web founder Berners-Lee,
Andreessen’s skills in “customer relations” were decisive in the enhancement
of Mosaic: “You’d send him a bug [problem] report and then two hours later
he’d mail you a fix” (quoted in Gillies and Cailliau (2000, p. 240)). Mosaic’s
popularity had a remarkable effect: it caused an explosion in Web traffic. Each
increase in traffic in turn had a feedback effect, attracting more content to the
Internet, which in turn increased traffic even further. Mosaic had over 2 million
downloads in its first year, and by mid-1995 it was used on over 80% of the
computers that were connected to the Internet. An article in the New York Times
by John Markoff (1993) appreciated the implications of the new software for the
Internet. The article ballyhooed the killer app status of Mosaic. However, it did
not allude to the software’s developers by name but only to the NCSA director
Larry Smarr. This slight reflected the institutional provenance of the tool and the
attitude of NCSA: Mosaic was a product of NCSA, not of individuals, and the
University of Illinois expected it to stay that way. We refer the interested reader
to Gillies and Cailliau (2000) and NCSAmosaic.pdf (2006) for more details.
The Mosaic license was open but not GPL’d and had different provisions
for commercial versus noncommercial users. Refer to http://www.socs.uts.
edu.au/MosaicDocs-old/copyright.html (accessed January 10, 2007) for the
full terms of the NCSA Mosaic license. The browser was free of charge for
noncommercial use, which meant academic, research, or internal business pur-
poses, with the source code provided for the Unix version. Noncommercial
licensees were allowed to not only develop the software but redistribute deriva-
tive works. These redistributions were subject to a proviso: the derivative prod-
ucts had to be identified as different from the original Mosaic code and there was
to be no charge for the derivative product. The terms for commercial licensees
were different; for commercial distribution of a modified product, license terms
had to be separately negotiated with NCSA. NCSA assigned all the commer-
cial rights for Mosaic to Spyglass in late 1994 (Raggett et al., 1998). By 1995,
Microsoft had licensed Mosaic as the basis for its own early browser Internet
Explorer, but by that point Netscape Navigator dominated the browser market.
Ironically, however, to this day the Help > About tab on the once again dom-
inant Internet Explorer has as its first entry “based on NCSA Mosaic. NCSA
Mosaic(TM); was developed at the National Center for Supercomputing Appli-
cations at the University of Illinois at Urbana–Champaign.”
Beyond the revolutionary impact of its functionality on the growth of the
Internet, the Mosaic browser also expedited the Web’s expansion because of the
public access it provided to HTML, which was essentially an open technology.
Mosaic inherited the View Source capability of Tim Berners-Lee’s browser.
This had a significant side effect since it allowed anyone to see the HTML code
for a page and imitate it. As Tim O’Reilly (2000) astutely observed, this simple
capability was “absolutely key to the explosive spread of the Web. Barriers to
entry for ‘amateurs’ were low, because anyone could look ‘over the shoulder’
of anyone else producing a web page.”
2.2.2 Netscape
Software talent is portable. Given the uncompromising, albeit by the book, insti-
tutional arrogation of Mosaic by the University of Illinois, there was no point
in Andreessen staying with NCSA. After graduating in 1993, he soon became
one of the founders of the new Netscape Corporation at the invitation of the
legendary Jim Clark, founder of Silicon Graphics. Netscape was Andreessen’s
next spectacular success.
Andreessen was now more than ever a man with a mission. At Netscape,
he led a team of former students from NCSA, with the mission “to develop an
independent browser better than Mosaic, i.e. Netscape Navigator.” They knew
that the new browser’s code had to be completely independent of the original
Mosaic browser in order to avoid future legal conflicts with NCSA. As it turned
out, a settlement with the University of Illinois amounting to $3 million had
to be made in any case (Berners-Lee, 1999). The internal code name for the
first Netscape browser was Mozilla, a feisty pun combining the words Mosaic
(the browser) and Godzilla (the movie monster) that was intended to connote
an application that would kill the then dominant Mosaic browser in terms of
popularity. (The page at http://sillydog.org/netscape/kb/netscapemozilla.php,
accessed January 10, 2007, provides a helpful description of the sometimes
confusing use of the name Mozilla.) The development team worked feverishly.
As one member of the group put it, “a lot of times, people were there straight
forty-eight hours, just coding. I’ve never seen anything like it. ... But they were
driven by this vision [of beating the original Mosaic]” (Reid, 1997). The sense
of pride and victory is even more pungent in a well-known postmortem by
team member Jamie Zawinski (1999) that would come later, after Netscape’s
unhappy browser war with Internet Explorer:
... we were out to change the world. And we did that. Without us, the change
probably would have happened anyway ... But we were the ones who actually did
it. When you see URLs on grocery bags, on billboards, on the sides of trucks, at the
end of movie credits just after the studio logos – that was us, we did that. We put
the Internet in the hands of normal people. We kick-started a new communications
medium. We changed the world.
Netscape’s pricing policy was based on a quest for ubiquity. Andreessen’s
belief was that if they dominated market share, the profits would follow from
side effects. According to Reid (1997), Andreessen thought,
That was the way to get the company jump-started, because that just gives you
essentially a broad platform to build off of. It’s basically a Microsoft lesson, right?
If you get ubiquity, you have a lot of options, a lot of ways to benefit from that. You
can get paid by the product you are ubiquitous on, but you can also get paid on
products that benefit as a result. One of the fundamental lessons is that market share
now equals revenue later, and if you don’t have market share now, you are not
going to have revenue later. Another fundamental lesson is that whoever gets the
volume does win in the end. Just plain wins.
Netscape bet on the side effects of browser momentum. It basically gave
the browser away. However, it sold the baseline and commercial servers it
developed, originally pricing them at $1,500 and $5,000, respectively. The free
browser was an intentional business marketing strategy designed to make the
product ubiquitous so that profits could then be made off symbiotic effects like
advertising and selling servers (Reid, 1997). In principle, only academic use of
the browser was free and all others were supposed to pay $39.00. But in practice,
copies were just downloaded for free during an unenforced trial period. How-
ever, although the product was effectively free of charge, it was not in any sense
free software or open source. The original Netscape software license was pro-
prietary (http://www.sc.ucl.ac.be/misc/LICENSE.html, accessed January 10,
2007). It also explicitly prohibited disassembly, decompiling or any reverse
engineering of the binary distribution, or the creation of any derivative works.
Netscape’s strategy paid off handsomely and quickly. Originally posted for
download on October 13, 1994, Netscape quickly dominated the browser mar-
ket. This downloaded distribution of the product was itself a very important
Netscape innovation and accelerated its spread. The aggressive introduction of
new HTML tags by the Netscape developers was also seductive to Web design-
ers who rapidly incorporated them into their Web pages (Raggett et al., 1998;
Griffin, 2000). Since Netscape was the only browser that could read the new
tags, the Web page designers would include a note on their page that it was best
viewed in Netscape. They would then provide a link to where the free download
could be obtained, so Netscape spread like a virus. A major technical advan-
tage of the initial Netscape browser over Mosaic was that Netscape displayed
images as they were received from embedded HTTP requests, rather than wait-
ing for all the images referred to in a retrieved HTML page to be downloaded
before the browser rendered them. It also introduced innovations like cookies
and even more importantly the new scripting language JavaScript, which was
specifically designed for the browser environment and made pages much more
dynamic (Andreessen, 1998; Eich, 1998). Thus, brash technology meshed with
attractiveness, pricing, and distribution to make Netscape a juggernaut. The
company went public on August 9, 1995, and Andreessen and Clark became
millionaires and billionaires, respectively. By 1996 Netscape had penetrated
75% of the market. It was eventually bought by AOL for $10 billion.
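To make the cookie innovation concrete: a Netscape-style cookie is just a name=value pair that the browser stores and echoes back to the server on later requests, with stored pairs exposed as a single “name=value; name2=value2” string. A minimal sketch of parsing that string (the parseCookies helper and the sample values are illustrative assumptions, not part of any browser API):

```javascript
// Minimal sketch: parse a "name=value; name2=value2" cookie string into
// an object. parseCookies is a hypothetical helper; in a real browser the
// raw string would come from document.cookie.
function parseCookies(cookieString) {
  const jar = {};
  for (const pair of cookieString.split("; ")) {
    const eq = pair.indexOf("=");
    if (eq > 0) jar[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return jar;
}

const jar = parseCookies("user=alice; theme=dark"); // sample values
console.log(jar.user);  // prints alice
console.log(jar.theme); // prints dark
```

State like this, echoed back on each request, is what first let servers recognize returning visitors, which is why the text ranks cookies among Netscape’s influential innovations.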
Given this initial success, how did it happen that within a few years of
Netscape’s triumphant conquest of the browser market, Internet Explorer, the
proprietary Microsoft browser, which was deeply rooted in Mosaic, became
the dominant browser? It was really Microsoft’s deep pockets that got the
better of Netscape in the so-called browser wars. Indeed, Microsoft called
its marketing campaigns jihads (Lohr, 1999). Microsoft destroyed Netscape’s
browser market by piggybacking on the pervasive use of its operating system
on PCs. It bundled Internet Explorer for free with every copy of Windows sold,
despite the fact that it had cost hundreds of millions of dollars to develop.
With its huge cash reservoirs, Microsoft was able to fund development that
incrementally improved IE until step by step it became equivalent in features
and reliability to Netscape. As time went by, the attraction of downloading
Netscape vanished, as the products became comparable. Netscape became a
victim of redundancy.
Many of Microsoft’s practices were viewed as monopolistic and predatory,
resulting in its being prosecuted by the federal government for illegally manip-
ulating the software market. Government prosecutor David Boies claimed that
Microsoft was trying to leverage its de facto monopoly in Windows to increase
its market share for browsers and stifle competition (Lea, 1998). A settlement,
which many considered as a mere slap on the wrist to Microsoft, was reached
with the Justice Department in late 2001 (Kollar-Kotelly, 2002). In any case, the
original Netscape Navigator browser’s market share had fallen steadily. From
a peak of over 80% in 1996, it dropped to 70% in 1997, 50% in 1998, 20% in
1999, to a little over 10% in 2000. Microsoft’s IE rose in tandem as Netscape
fell, almost saturating the market by 2002, prior to Firefox’s emergence.
In a last-ditch effort to rebound, Netscape decided that perhaps the proprietary
IE dragon could be beaten by a reformed, more open Netscape. So early in 1998
Netscape responded to Internet Explorer by going open source – sort of. The
company stated that it was strongly influenced in this strategy by the ideas
expressed in Raymond’s famous “The Cathedral and the Bazaar” paper (1998).
Refer to the e-mail from Netscape to Raymond in the latter’s epilogue to his
paper, updated as per its revision history in 1998 or later. Netscape thought
it could recover from the marketing debacle inflicted by the newly updated
releases of Internet Explorer by exploiting the benefits of open source-style
collaborative development. So it decided to release the browser source code as
open source.
The new release was done under the terms of the Mozilla Public License
(MPL). The sponsor was the newly established Mozilla Organization whose
mission would be to develop open source Internet software products. The intent
of the MPL license and the Mozilla Organization was to promote open source as
a means of encouraging innovation. Consistent with the general sense of copy-
left, distributed modifications to any preexisting source code files obtained under
an MPL open source license also had to be disclosed under the terms of the MPL
license. However, completely new source code files, which a licensee developed,
were not restricted or covered by any of the terms of the MPL. Furthermore, this
remained the case even when the additions or changes were referenced by mod-
ifications made in the MPL-licensed section of the source code. In comparison
with some existing open source licenses, the MPL license had “more copyleft
(characteristics) than the BSD family of licenses, which have no copyleft at all,
but less than the LGPL or the GPL” licenses (http://www.mozilla.org/MPL/mpl-
faq.html). The Netscape browser itself (post-1998) contained both types of files,
closed and open. It included proprietary (closed source) files that were not sub-
ject to the MPL conditions and were available only in binary. But the release
also included MPL files from the Mozilla project, which were now open source.
Although Netscape’s market share still declined, out of its ashes would come
something new and vital. The versions of Netscape released after 2000 con-
tained a new browser engine named Gecko, which was responsible for rendering
and laying out the content of Web pages. This was released under the MPL
license and was open source. But, the open source releases of Netscape were
notvery successful, partly because of a complicated distribution package. The
Netscape project was finally shut down by then owner AOL in 2003. How-
ever, a small, nonprofit, independent, open source development organization
called the Mozilla Foundation, largely self-funded through contributions, was
set up by AOL to independently continue browser development. The purpose
ofthe foundation was to provideorganizational, legal, and financial supportfor
the Mozilla open source software project. Its mission was to preserve choice
and promote innovation on the Internet (mozilla.org/foundation/). Out of this
matrix, the Firefox browser would rise phoenix-like and open source from the
ashesof Netscape. Thus“a descendant of NetscapeNavigator (was) nowpoised
toavenge Netscape’s defeat at the hands of Microsoft” (McHugh, 2005).
2.2.3 Firefox
The Mozilla Foundation development team that produced Firefox began by
refocusing on the basic needs of a browser user. It scrapped the overly complex
Netscape development plans and set itself the limited objective of making a
simple but effective, user-oriented browser. The team took the available core
code from the Netscape project and used that as a basis for a more streamlined
browser they thought would be attractive. In the process they modified the
original Netscape Gecko browser layout engine to create a browser that was
also significantly faster. The eventual outcome was Firefox, a cross-platform
open source browser released at the end of 2004 by the Mozilla Foundation
that has proven explosively popular. Firefox is now multiply licensed under
the GPL, LGPL, or MPL at the developer’s choice. It also has an End User
License Agreement that has some copyright and trademark restrictions for the
downloaded binaries needed by ordinary users.
Firefox has been a true mass-market success. It is unique as an open source
application because the number of its direct end users is potentially in the hun-
dreds of millions. Previous successful open source applications like Linux and
Apache had been intended for technically proficient users and addressed (at least
initially in the case of Linux) a smaller end-user market, while desktop environ-
ments like GNOME and KDE are intended for a Linux environment. Firefox’s
market advantages include being portable to Windows, Linux, and Apple.
This increases its potential audience vis-à-vis Internet Explorer. It also closely
adheres to the W3C standards that Internet Explorer has viewed as optional. Like
the original Netscape browser, Firefox burst onto the browser scene, quickly
capturing tens of millions of downloads: 10 million in its first month, 25 million
within 100 days of publication, and 100 million in less than a year. Fire-
fox 1.5 had 2 million downloads within 2 days of publication in November
2005. It rapidly gained prominence in the browser market, capturing by some
estimates 25% of the market (w3schools.com/browsers/browsers_stats.asp,
accessed December 6, 2006) within a year or so of its initial release, though
sources like the Net Applications Market Share survey show significantly lower
penetration, under 15% in late 2006 (http://marketshare.hitslink.com, accessed
December 6, 2006).
Microsoft’s complacency with regard to the security of Internet Explorer
serendipitously helped Firefox’s debut. In June 2004, a Russian criminal orga-
nization distributed Internet malware called Download.ject that exploited a
composite weakness jointly involving Windows IIS servers and a security vul-
nerability in Internet Explorer. Ironically, the exploited security shortcoming
in Internet Explorer was tied precisely to its tight integration with the Win-
dows operating system. This integration provided certain software advantages
to the browser but also allowed hackers to leverage their attacks (Delio, 2004).
Although the attack was countered within a few days, its occurrence highlighted
IE security holes and was widely reported in the news. US-CERT (the US Com-
puter Emergency Readiness Team) advised federal agencies at the time to use
browsers other than Internet Explorer in order to mitigate their security risks
(Delio, 2004). The negative publicity about IE vulnerabilities occurred pre-
cisely when the first stable version of Firefox appeared. This played right into
one of Firefox’s purported strengths, not just in usability but also in security,
thereby helping to establish Firefox’s appeal.
The (Mozilla) Firefox project was started by Blake Ross. Blake had been a contractor for Netscape from age 15 and already had extensive experience in debugging the Mozilla browser. The precocious Ross had become dissatisfied with the project’s direction and its feature bloat. He envisioned instead a simpler, easy-to-use browser, so he initiated the Firefox project in 2002. Experienced Netscape developer Dave Hyatt partnered with Ross, bringing with him a deep knowledge of the critical Mozilla code base. Ben Goodger was engaged to participate because of his Web site’s “thorough critique of the Mozilla browser” (Connor, 2006b). He subsequently became lead Firefox engineer when Ross enrolled in Stanford at age 19. Firefox was released in late 2004 under Goodger, who was also instrumental in the platform’s important add-on architecture (Mook, 2004). Although its development depended on the extensive Netscape code base, it was an “extremely small team of committed programmers” who developed Firefox (Krishnamurthy, 2005a). The core project group currently has six members: the aforementioned Ross, Hyatt, and Goodger, as well as Brian Ryner, Vladimir Vukicevic, and Mike Connor.
Key factors in the success of Firefox included its user design criteria, the characteristics and policies of its development team, and its unique, open community-based marketing strategy.
The project’s central design principle was “keep it simple.” Ross has used family imagery to describe the design criteria for determining which features to include in the browser. They would ask the following questions about putative features (Ross, 2005b):
Does this help mom use the web? If the answer was no, the next question was: does this help mom’s teenage son use the web? If the answer was still no, the feature was either excised entirely or (occasionally) relegated to config file access only. Otherwise, it was often moved into an isolated realm that was outside of mom’s reach but not her son’s, like the preferences window.
In the same spirit, Ross describes Firefox as being about “serving users” and contends that a window of opportunity for Firefox’s development had opened because Microsoft had irresponsibly abandoned Internet Explorer, leaving “for dead a browser that hundreds of millions of people rely on” (Ross, 2006).
The Firefox development team structure was intentionally lean, even elitist. The FAQ in the inaugural manifesto for the project explained why the development team was small by identifying the kinds of problems that handicapped progress on the original Mozilla project under Netscape after 2000: “Factors such as marketing teams, compatibility constraints, and multiple UI designers pulling in different directions have plagued the main Mozilla trunk development. We feel that fewer dependencies, faster innovation, and more freedom to experiment will lead to a better end product” (blakeross.com/firefox/README-1.1.html, accessed December 6, 2006).
The lead developers wanted to keep the development group’s structure simple, not just the browser’s design. According to the manifesto, CVS access was “restricted to a very small team. We’ll grow as needed, based on reputation and meritorious hacks” (README-1.1.html). Thus in typical open source style, admission was meritocratic. To the question “how do I get involved,” the blunt answer was “by invitation. This is a meritocracy – those who gain the respect of those in the group will be invited to join the group.” As far as getting help from participants who wanted to chime in about bugs they had detected, the FAQ was equally blunt. To the question “where do I file bugs,” the answer was “you don’t. We are not soliciting input at this time. See Q2.” Of course the project was open, so you could get a copy of the source code from the Mozilla CVS tree. Despite these restrictions, the list of credited participants in the 1.0.4 version included about 80 individuals, which is a significant base of recognized contributors. You can refer to the Help > About Firefox button in the browser for the current credits list. Subsequent documents elaborated on how to participate
in the project in a more nuanced and inclusive way but without changing the underlying tough standards (Ross, 2005a).
The study by Krishnamurthy (2005a) describes the project as a “closed-door open source project,” a characterization not intended to be pejorative. It analyzes the logistic and organizational motivations for and consequences of enforcing tight standards for participating in development. Overly restrictive control of entry to participation in an open project can have negative ramifications for the long-term well-being of the project. Indeed, Firefox developer Mike Connor complained vocally at one point that “in nearly three years we haven’t built up a community of hackers, and now I think we’re in trouble. Of the six people who can actually review in Firefox, four are AWOL, and one doesn’t do a lot of reviews” (Connor, 2006a). However, a subsequent blog post by Connor described the ongoing commitment of the major Firefox developers and the number of participants in Mozilla platform projects more positively, including the presence of many corporate-sponsored “hackers” (Connor, 2006b).
Although the core development team was small and initially not solicitous to potential code contributors, the project made an intensive effort to create an open community support base of users and boosters. The site http://www.mozilla.org/ is used to support product distribution. A marketing site at www.spreadfirefox.com was set up where volunteers were organized to “spread the word” about Firefox in various ways, a key objective of the promotional campaign being to get end users to switch from Internet Explorer. The site www.defendthefox.com was established to put “pressure on sites that were incompatible with Firefox. Users could visit it and identify web sites that did not display appropriately when Firefox was used as the browser” (Krishnamurthy, 2005b). Although Firefox was open source, the notion that large numbers of developers would be participating in its development was mistaken; the participants were primarily involved in its promotion. The combination of a complacent competitor (Internet Explorer), an energized open volunteer force organized under an effective leader, and an innovative product was instrumental in the rapid success of Firefox (Krishnamurthy, 2005b). It also benefited from strong public recognition, like being named PC World Product of the Year 2005.
There are a number of characteristics on which Firefox has been claimed to be superior and perceptions that have helped make it popular, including having better security, the availability of many user-developed extensions, portability, compliance with Web standards, as well as accessibility and performance advantages. We will briefly examine these claims.
Open source Firefox is arguably more secure than proprietary Internet Explorer. For example, the independent computer security tracking firm Secunia’s (Secunia.com) vulnerability reports for 2003–2006 identify almost 90
security advisories for IE versus slightly more than 30 for Firefox. Furthermore, about 15% of the IE advisories were rated as extremely critical versus only 3% for Firefox. Other related security statistics from Secunia note that as of June 2006, more than 20 of over 100 IE advisories were unpatched, with one or more of these listed as highly critical. In contrast, only 4 of the 33 Secunia advisories for Firefox were unpatched and were listed as less critical. It must be kept in mind that these security vulnerabilities fluctuate over time and there are details to the advisories that make the interpretation of the statistics ambiguous, but Firefox seems to have a security edge over Internet Explorer, at least at the present time. Aside from the Firefox security model, the fact that the browser is less closely bound to the operating system than Internet Explorer, its lack of support for known IE security exposures like ActiveX, and the public accessibility of an open source product like Firefox to ongoing scrutiny of its source code for bugs and vulnerabilities arguably bolster its security.
A significant feature of Firefox is that it allows so-called extensions to provide extra functionality. According to Firefox Help, “extensions are small add-ons to Firefox that change existing browser functionality or add new functionality.” The Firefox site contains many user-developed extensions, like the NoScript extension that uses whitelist-based preemptive blocking to allow Javascript and other plug-ins “only for trusted domains of your choice” (https://addons.mozilla.org). The extensions are easy to install and uninstall. Individuals can develop their own extensions using languages like Javascript and C++. Refer to http://developer.mozilla.org/ for a tutorial on how to build an XPCOM (Cross-Platform Component Object Model) component for Firefox. This feature helps recruit talent to further develop the product. The extension model has two important advantages. Not providing such functionalities as default features helps keep the core product lean and unbloated. It also provides an excellent venue for capitalizing on the talent and creativity of the open community. Creative developers can design and implement new add-ons. Users interested in the new functionality can easily incorporate it in their own browser. This provides the advantage of feature flexibility without feature bloat and lets users custom-tailor their own set of features. Firefox also provides a variety of accessibility features that facilitate its use by the aged and visually impaired.
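The whitelist idea behind an extension like NoScript can be illustrated with a small sketch. This is not NoScript’s actual code or API; the function and domain names here are hypothetical, and real extensions of this era hooked into the browser through XPCOM and XUL rather than standalone functions.

```javascript
// Illustrative sketch of whitelist-based preemptive blocking, the
// approach NoScript takes: scripts run only for explicitly trusted
// domains. (Hypothetical names; not NoScript's real implementation.)
const trustedDomains = new Set(["mozilla.org", "example.com"]);

function isTrusted(hostname) {
  // Trust an exact match or any subdomain of a whitelisted domain.
  for (const d of trustedDomains) {
    if (hostname === d || hostname.endsWith("." + d)) return true;
  }
  return false;
}

function shouldRunScripts(pageUrl) {
  // Default-deny: any page not on the whitelist has scripting blocked.
  return isTrusted(new URL(pageUrl).hostname);
}
```

The security benefit of this design comes from the default-deny stance: rather than trying to enumerate malicious sites, the extension blocks everything the user has not explicitly trusted.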
The relative performance of browsers in terms of speed is not easy to judge, and speed is only one aspect of performance. A fast browser compromised by serious security weaknesses is not a better browser. Useful Web sites like howtocreate.co.uk/browserSpeed.html (accessed December 6, 2006) present a mixed picture of various speed-related metrics for browsers for performance characteristics, like time to cold-start the browser, warm-start time (time to restart the browser after it has been closed), caching-retrieval speed, script speed,
and rendering tables, for browsers such as Firefox, Internet Explorer, and Safari. These statistics do not uniformly favor any one of the browsers.
HTML and Javascript
We conclude our discussion of browsers with some brief remarks about HTML and Javascript, tools that are central features of the Internet experience. Both HTML (a markup language) and Javascript (a client-side scripting language that acts as the API for an HTML document’s Document Object Model) have accessible source code. Therefore, in this rudimentary sense, they are not “closed source.” Of course, neither are they “open source” in any strict sense of the term, since, other than the visibility of the code, none of the other features of open source software come into play, from licensing characteristics, to modification and redistribution rights, to open development processes. The HTML and Javascript contents have implicit and possibly explicit copyrights and so infringement by copying may be an issue, but there are no license agreements involved in their access. Some purveyors of commercial Javascript/HTML applications do have licenses specifically for developer use, but these are not open software licenses. Despite the absence of licensing and other free software attributes, the innate visibility of the code for these components is noteworthy (see also Zittrain, 2004). O’Reilly (2004) observed, as we noted previously, that the simple “View Source” capability inherited by browsers from Berners-Lee’s original browser had the effect of reducing “barriers to entry for amateurs” and was “absolutely key to the explosive spread of the Web” because one could easily imitate the code of others.
References
Andreessen, M. (1998). Innovators of the Net: Brendan Eich and Javascript. http://cgi.netscape.com/columns/techvision/innovators_be.html. Accessed January 10, 2007.
Andreessen, M. and Bina, E. (1994). NCSA Mosaic: A Global Hypermedia System. Internet Research, 4(1), 7–17.
Berners-Lee, T. (1999). Weaving the Web. Harper, San Francisco.
Connor, M. (2006a). Myths and Clarifications. March 4. http://steelgryphon.com/blog/?p=37. Accessed December 6, 2006.
Connor, M. (2006b). Myths and Clarifications. March 11. http://steelgryphon.com/blog/?p=39. Accessed December 6, 2006.
Delio, M. (2004). Mozilla Feeds on Rival’s Woes. http://www.wired.com/news/infostructure/0,1377,64065,00.html. Accessed November 29, 2006.
Eich, B. (1998). Making Web Pages Come Alive. http://cgi.netscape.com/columns/techvision/innovators_be.html. Accessed January 10, 2007.
Gillies, J. and Cailliau, R. (2000). How the Web Was Born. Oxford University Press, Oxford.
Griffin, S. (2000). Internet Pioneers: Marc Andreessen. http://www.ibiblio.org/pioneers/andreesen.html. Accessed January 10, 2007.
Kollar-Kotelly, C. (2002). United States of America v. Microsoft Corporation. Civil Action No. 98–1232 (CKK). Final Judgment. http://www.usdoj.gov/atr/cases/f200400/200457.htm. Accessed January 10, 2007.
Krishnamurthy, S. (2005a). About Closed-Door Free/Libre/Open Source (FLOSS) Projects: Lessons from the Mozilla Firefox Developer Recruitment Approach. European Journal for the Informatics Professional, 6(3), 28–32. http://www.upgrade-cepis.org/issues/2005/3/up6-3Krishnamurthy.pdf. Accessed January 10, 2007.
Krishnamurthy, S. (2005b). The Launching of Mozilla Firefox: A Case Study in Community-Led Marketing. http://opensource.mit.edu/papers/sandeep2.pdf. Accessed November 29, 2006.
Lea, G. (1998). Prosecution Says Gates Led Plan to Crush Netscape. October 20. http://www.theregister.co.uk/1998/10/20/prosecution_says_gates_led_plan/. Accessed January 10, 2007.
Lohr, S. (1999). The Prosecution Almost Rests: Government Paints Microsoft as Monopolist and Bully. January 8. The NY Times on the Web. http://query.nytimes.com/gst/fullpage.html?sec=technology&res=9C03E6DD113EF93BA35752C0A96F958260&n=Top%2fReference%2fTimes%20Topics%2fSubjects%2fA%2fAntitrust%20Actions%20and%20Laws. Accessed January 10, 2007.
Markoff, J. (1993). A Free and Simple Computer Link. December 8. http://www.nytimes.com/library/tech/reference/120893markoff.html. Accessed January 10, 2007.
McHugh, J. (2005). The Firefox Explosion. Wired Magazine, Issue 13.02. http://www.wired.com/wired/archive/13.02/firefox.html. Accessed November 29, 2006.
Mook, N. (2004). Firefox Architect Talks IE, Future Plans. Interview with Blake Ross. November 29. http://www.betanews.com/article/Firefox_Architect_Talks_IE_Future_Plans/1101740041. Accessed December 6, 2006.
NCSAmosaic.pdf. (2006). World Wide Web. http://www.governingwithcode.org. Accessed January 10, 2007.
O’Reilly, T. (2000). Open Source: The Model for Collaboration in the Age of the Internet. O’Reilly Network. http://www.oreillynet.com/pub/a/network/2000/04/13/CFPkeynote.html?page=1. Accessed November 29, 2006.
Raggett, D., Lam, J., Alexander, I., and Kmiec, K. (1998). Raggett on HTML 4. Addison-Wesley Longman, Reading, MA.
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.firstmonday.dk/issues/issue3_3/raymond/index.html. Ongoing version: http://www.catb.org/esr/writings/cathedral-bazaar/. Accessed December 3, 2006.
Reid, R.H. (1997). Architects of the Web: 1,000 Days That Built the Future of Business. John Wiley & Sons, New York.
Ross, B. (2005a). Developer Recruitment in Firefox. January 25. http://blakeross.com/. Accessed December 6, 2006.
Ross, B. (2005b). The Firefox Religion. January 22. http://blakeross.com/. Accessed December 6, 2006.
Ross, B. (2006). How to Hear without Listening. June 6. http://blakeross.com/. Accessed December 6, 2006.
Wagner, D. (2002). “Marc Andreessen,” Jones Telecommunications and Multimedia Encyclopedia. Jones International. See also: http://www.thocp.net/biographies/andreesen_marc.htm. Accessed January 10, 2007.
Zawinski, J. (1999). Resignation and Postmortem. http://www.jwz.org/gruntle/nomo.html. Accessed November 29, 2006.
Zittrain, J. (2004). Normative Principles for Evaluating Free and Proprietary Software. University of Chicago Law Review, 71(1), 265.
2.3 Fetchmail
Eric Raymond, a well-known open source advocate, published an essay in 1998 about open source development. The essay was called “The Cathedral and The Bazaar” (Raymond, 1998). It famously contrasted the traditional model of software development with the new paradigm introduced by Linus Torvalds for Linux. Raymond compared the Linux style of development to a Bazaar. In contrast, Brooks’ classic book on software development The Mythical Man-Month (Brooks, 1995) had compared system design to building a Cathedral, a centralized understanding of design and project management. Raymond’s essay recounts the story of his own open source development project, Fetchmail, a mail utility he developed in the early 1990s. He intentionally modeled his development of the mail utility on how Linus Torvalds had handled the development of Linux. Fetchmail is now a common utility on Unix-like systems for retrieving e-mail from remote mail servers. According to the description on its project home page, it is currently a “full-featured, robust, well-documented remote-mail retrieval and forwarding utility intended to be used over on-demand TCP/IP links (such as SLIP or PPP connections). It supports every remote-mail protocol now in use on the Internet” (http://fetchmail.berlios.de/, accessed January 12, 2007).
Although Fetchmail is a notable project, it pales in scope and significance to many other open source projects. Efforts like the X Window System are orders of magnitude larger and far more fundamental in their application but receive less coverage. However, Fetchmail had a bard in Eric Raymond, and his essay has been widely influential in the open source movement. It aphoristically articulated Torvalds’ development methodology at a critical point in time and took on the status of an almost mythological description of Internet-based open source development. It also introduced the term bazaar as an image for the open style of collaboration.
Raymond structures his tale as a series of object lessons in open source design, development, and management that he learned from the Linux process and applied to his own project. The story began in 1993 when Raymond needed
a mail client that would retrieve his e-mail when he dialed up on his intermittent connection from home. Applications like this were already available and typically used a client-side application based on the POP (or POP3) Post Office Protocol. However, the first clients he tried did not handle e-mail replies properly, whence came his first Linux-derived lesson or moral: every good work of software starts by scratching a developer’s personal itch. This is a now famous aphorism in the literature on the motivations of open source developers. This motivation contrasts sharply with the workaday world of most programmers who “spend their days grinding away for pay at programs they neither need nor love. But not in the Linux world – which may explain why the average quality of software originated in the Linux community is so high” (Raymond, 1998). The lessons extend on from there and are both interesting and instructive.
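The POP protocol that these early clients spoke is a simple line-oriented command exchange. The sketch below only assembles the client-side command sequence of a typical retrieve-and-delete POP3 session, using the commands defined in RFC 1939; it is an illustration of the protocol’s shape, not Fetchmail’s code, and the function name and credentials are invented for the example.

```javascript
// Build the client-side command sequence of a simple POP3 session:
// authenticate, check the mailbox, fetch and delete each message, quit.
// (RFC 1939 commands; the session logic is deliberately simplified.)
function pop3Commands(user, pass, messageCount) {
  const cmds = ["USER " + user, "PASS " + pass, "STAT"];
  for (let i = 1; i <= messageCount; i++) {
    cmds.push("RETR " + i); // download message i
    cmds.push("DELE " + i); // mark it for deletion on the server
  }
  cmds.push("QUIT"); // commit deletions and close the session
  return cmds;
}
```

Each command is sent as one text line over the TCP connection, with the server answering +OK or -ERR; for a two-message mailbox this yields USER, PASS, STAT, RETR 1, DELE 1, RETR 2, DELE 2, QUIT.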
A defining characteristic of open source is that it lets you build on what went before. It lets you start from somewhere, not from nowhere. It is a lot easier to develop an application if you start with a development base. Linus did that with Linux. Raymond did it with his more humble application, Fetchmail. After Raymond recognized his need for an application, he did not just start off programming it ex nihilo. That would have violated what Raymond (1998) called the second lesson of open source development: “Good programmers know what to write. Great ones know what to rewrite (and reuse).” People typically think of code reuse in the context of general software engineering or object-oriented style, class/library-based implementation. But reuse is actually a quintessential characteristic and advantage of open source development. When only proprietary software is available, the source code for previous applications that a developer wants to improve or modify is, by definition, undisclosed. If the source code is not disclosed, it cannot be easily reused or modified, at least without a great deal of reverse engineering effort, which may even be a violation of the software’s licensing requirements. If a proprietary program has an API, it can be embedded in a larger application, on an as-is basis, but the hidden source itself could not be adapted. Exactly the opposite is the case in the open source world, where the source code is always disclosed by definition. Since there is plenty of disclosed source code around, it would be foolish not to try to reuse it as a point of departure for any related new development. Even if the modification is eventually thrown away or completely rewritten, it nonetheless provides an initial scaffolding for the application. Raymond did this for his e-mail client, exactly as Linus had done when he initiated Linux. Linus had not started with his own design. He started by reusing and modifying the existing Minix open source software developed by Tanenbaum. In Raymond’s case, he “went looking for an existing POP utility that was reasonably well coded, to use as a development base” (Raymond, 1998), eventually settling on an open source
e-mail client called Fetchpop. He did this intentionally, explicitly in imitation of Linus’ approach to development. Following standard open source practice, Raymond modified Fetchpop and submitted his changes to the software owner, who accepted them and released it as an updated version.
Another principle of development is “reuse,” and then reuse and rebuild again if appropriate. Fred Brooks had opined that a software developer should “plan to throw one away; you will anyhow” (Brooks, 1995). This is partly an unavoidable cognitive constraint. To really understand a problem, you have to try to solve the problem. After you’ve solved it once, then you have a better appreciation of what the actual problem was in the first place. The next time around, your solution can then be based on a more informed understanding of the issues. With this in mind, Raymond anticipated that his first solution might be only a temporary draft. So when the opportunity for improvement presented itself, he seized it. He came across another open source e-mail client by Carl Harris called Popclient. After studying it, he recognized that it was better coded than his own solution, and he sent some patches to Harris for consideration. However, as it turned out, Harris was no longer interested in the project. But he gladly ceded ownership of the software to Raymond, who took on the role of maintainer for the Popclient project in mid-1996. This episode illustrated another principle in the open source code of conduct: “When you lose interest in a program, your last duty to it is to hand it off to a competent successor” (Raymond, 1998). Responsible open source fathers don’t leave their children to be unattended orphans.
Open source development has not always been distributed collaborative development, which Raymond calls bazaar style development. He describes the Linux community as resembling “a great babbling bazaar of differing agendas and approaches . . . out of which a coherent and stable system could seemingly emerge only by a succession of miracles” (Raymond, 1998). He contrasts this with one of the longest standing open source projects, the GNU project, which had developed software the old-fashioned way, using a closed management approach with a centralized team and slow software releases. With exceptions like the GNU Emacs Lisp Library, GNU was not developed along the lines of the Linux model. Indeed, consider the sluggishness of the development of the GNU GCC Compiler, done in the traditional manner, versus the rapid development that occurred when the GCC project was bifurcated into two streams: the regular GCC development mode and a parallel “bazaar” mode of development à la Linux for what was called EGCS (Experimental GNU Compiler System) beginning in 1997. The difference in the rates of progress of the two projects was striking. The new bazaar development style for EGCS dramatically outpaced the conventional mode used for the GCC project, so much so that by
1999 the original GCC project was sunset and development was placed under the EGCS project, which almost amounted to a controlled experiment on the relative effectiveness of the bazaar and conventional methods.
In the open source and Unix tradition, users tend to be simultaneously users and developers, often expert developers or hackers. As expected, Raymond’s adopted Popclient project came with its own user base. So once again in conscious imitation of the Linux development model, he recognized that this community of interest was an enormous asset and that “given a bit of encouragement, your users will diagnose problems, suggest fixes, and help improve the code far more quickly than you could unaided” (Raymond, 1998). The development principle he followed was that “treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging” (Raymond, 1998). This process of successfully engaging the user-developer base was exactly what Linus had done so well with Linux.
The use of early and frequent software releases was another quintessential characteristic of the Linux development process. This kept the user base engaged and stimulated. Linus’ approach ran contrary to the conventional thinking about development. Traditionally, people believed that releasing premature, buggy versions of software would turn users off. Of course, in the case of system software like Linux and a user-developer base of dedicated, skilled hackers, this logic did not apply. The Linux development principle was “Release early. Release often. And listen to your customers” (Raymond, 1998). Granted that frequent releases were characteristic in the Unix tradition, Linus went far beyond this. He “cultivated his base of co-developers and leveraged the Internet for collaboration” (Raymond, 1998) to such an extent and so effectively that he scaled up the frequent release practice by an order of magnitude over what had ever been done previously. Releases sometimes came out at the unbelievable rate of more than once a day. It was no accident that the initiation of the Linux project and the burgeoning growth of the Internet were coincident, because the Internet provided both the distributed talent pool and the social interconnectivity necessary for this kind of development to happen. Raymond’s Fetchmail project intentionally followed Linus’ modus operandi, with releases almost always arriving at most at 10-day intervals, and sometimes even once a day à la Linux.
This community-participation-driven process unsurprisingly required a lot of people skills to manage properly. Again per Linus’ practice, Raymond cultivated his own beta list of tester supporters. The total number of participants in his project increased linearly from about 100 initially to around 1,500 over a 5-year period, with user-developers reaching a peak of about 300 and eventually stabilizing at around 250. During the same period the number of lines of code
grew from under 10,000 to nearly 50,000. As with many open source projects, there are excellent development statistics. For Fetchmail, see the historical and statistical overview at http://www.catb.org/esr/fetchmail/history.html. These user-developers had to be kept engaged, just as Linus had to keep his user-developers interested. Their egos had to be stroked by being adequately recognized for their contributions, and even given rapid satisfaction via the speedy releases incorporating new patches. Raymond added anyone who contacted him about Fetchmail to his beta list. Normally beta testing, where a product is given exposure to real-world users outside the development organization, would be the last round of testing of a product before its commercial release. But in the Linux model, beta testing is dispersed to the user-developers over many beta style releases, prior to the release of a stable tested product for more general users. Raymond would make “chatty announcements” to the group to keep them engaged, and he listened closely to his beta testers. As a result, from the onset he received high-quality bug reports and suggestions. He summarized the attitude with the observation that “if you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource” (Raymond, 1998). This not only requires a lot of energy and commitment on the part of the project owner, it also means the leader has to have good interpersonal and communication skills. The interpersonal skills are needed to attract people to the project and keep them happy with what’s happening. The communication skills are essential because communicating what is happening in the project is a large part of what goes on. Technical skill is a given, but personality or management skill is invariably a prominent element in these projects.
The user-developer base is critical to spotting and fixing bugs. Linus observed that the bug resolution process in Linux was typically twofold. Someone would find a bug. Someone else would understand how to fix it. An explanation for the rapidity of the debugging process is summarized in the famous adage: “Given enough eyeballs, all bugs are shallow” (Raymond, 1998). The bazaar development model appeared to parallelize debugging with a multitude of users stressing the behavior of the system in different ways. Given enough such beta testers and codevelopers in the open source support group, problems could be “characterized quickly and the fix (would be) obvious to someone.” Furthermore, the patch “contributions (were) received not from a random sample, but from people who (were) interested enough to use the software, learn about how it works, attempt to find solutions to problems they encounter(ed), and actually produce an apparently reasonable fix. Anyone who passes all these filters is highly likely to have something useful to contribute” (Raymond, 1998). On the basis of this phenomenon, not only were recognized bugs quickly resolved
P1:KAE
9780521881036c02 CUNY1180/Deek 0521 88103 6 October 1, 2007 17:16
2.3 Fetchmail 55
in Linux development, but the overall system was also relatively unbuggy, as even the
Halloween documents from Microsoft observed.
Debugging in an open source environment is extremely different from debugging
in a proprietary environment. After discussions with open developers, Raymond
analyzed in detail how the debugging process works in open source. The
key characteristic is source-code awareness. Users who do not have access to
source code tend to supply more superficial reports of bugs. They provide not
only less background information but also less “reliable recipe(s) for reproducing
the bug” (Raymond, 1998). In a closed source environment, the user-tester
is on the outside of the application looking in, in contrast to the developer
who is on the inside looking out and trying to understand what the bug report
submitted by a user-observer means. The situation is completely different in
an open source context, where the “tester and developer (are able) to develop
a shared representation grounded in the actual source code and to communicate
effectively about it” (Raymond, 1998). He observes that “most bugs, most
of the time, are easily nailed given even an incomplete but suggestive characterization
of their error conditions at source-code level (italics added). When
someone among your beta-testers can point out, ‘there’s a boundary problem
in line nnn’, or even merely ‘under conditions X, Y, and Z, this variable rolls
over’, a quick look at the offending code often suffices to pin down the exact
mode of failure and generate a fix” (Raymond, 1998).
The leader of an open source development project does not necessarily have
to be a great designer himself, but he does have to be able to recognize a great
design when someone else comes up with one. At least this is one of Raymond’s
interpretations of the Linux development process. It certainly reflects
what occurred in his own project. By a certain point, he had gone through modifications
of two preexisting open source applications: Fetchpop, where he had
participated briefly as a contributor, and Popclient, where he had taken over
as the owner and maintainer from the previous project owner. Indeed, he says
that the “biggest single payoff I got from consciously trying to emulate Linus’
methods” happened when a “user gave me this terrific idea – all I had to do
was understand the implications” (Raymond, 1998). The incident that precipitated
the revelation occurred when Harry Hochheiser sent him some code for
forwarding mail to the client SMTP port. The code made Raymond realize that
he had been trying to solve the wrong problem and that he should completely
redesign Fetchmail as what is called a Mail Transport Agent: a program that
moves mail from one machine to another. The Linux lessons he was emulating
at this point were twofold: the second best thing to having a good idea yourself
is “recognizing good ideas from your users,” and it is often the case that “the
most striking and innovative solutions come from realizing that your concept
of the problem was wrong” (Raymond, 1998). The code of the redesigned software
turned out to be both better and simpler than what he had before. At this
point it was proper to rename the project. He called it Fetchmail. Fetchmail
was now a tool that any Unix developer with a PPP (Point-to-Point Protocol)
mail connection would need, potentially a category killer that fills a need so
thoroughly that alternatives are not needed. In order to advance it to the level
of a truly great tool, Raymond listened to his users again and added some more
key features, like what is called multidrop support (which turns out to be useful
for handling mailing lists) and support for 8-bit MIME.
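The Mail Transport Agent role described above can be made concrete with a small sketch. The function below is hypothetical (the name and the simplified dialogue are ours, not Fetchmail’s actual code); it builds the SMTP command sequence an agent might use to hand a retrieved message to the local SMTP listener, which is the redesign Hochheiser’s code suggested. It assumes, for simplicity, that the From header holds a bare address.

```python
from email import message_from_string

def smtp_forward_dialogue(raw_message: str, local_user: str) -> list[str]:
    """Sketch the SMTP commands for forwarding a retrieved message
    to the local SMTP port, as a Mail Transport Agent would."""
    headers = message_from_string(raw_message)
    sender = headers.get("From", "")  # simplification: a bare address
    return [
        "HELO localhost",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{local_user}>",
        "DATA",
        raw_message + "\r\n.",  # message body, terminated by a lone dot
        "QUIT",
    ]
```

In the real protocol each command waits for a numeric server reply, and Fetchmail additionally handles multidrop delivery and error recovery, all of which this sketch omits.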
Raymond also elaborates cogently on the key preconditions for a bazaar style
of development to be possible in the first place. These include programmatic,
legal, and communication requirements. The programmatic requirements were
particular to a project, while the legal and communications infrastructure were
generic requirements for the entire phenomenon.
The Linux-style design process does not begin in a vacuum. Programmatically,
in open source development there has to be something to put on the table
before you can start having participants improve, test, debug, add features, and
so on to the product. Linus, for example, began Linux with a promising preliminary
system, which in turn had been precipitated by Tanenbaum’s earlier Minix
kernel. The same was true for Raymond’s Fetchmail, which, like Linux, had a
“strong, attractive basic design(s)” before it went public. Although the bazaar
style of development works well for testing, debugging, code improving, and
program design, one cannot actually originate a product in this forum. First of
all, there has to be a program to put on display that runs! There cannot be just a
proposal for an idea. In open source, code talks. Secondly, the running program
has to have enough appeal that it can “convince potential co-developers that it
can be evolved into something really neat in the foreseeable future” (Raymond,
1998). It may have bugs, lack key features, and have poor documentation, but
it must run and have promise. Remember that “attention is still a nonrenewable
resource” and that the interest of potential participants as well as “your
reputation is on the line” (Fogel and Bar, 2003).
Another precondition for bazaar-style open development is the existence
of an appropriate legal framework. The nondisclosure agreements of the proprietary
Unix environment would prevent this kind of freewheeling process.
Explicitly formulated and widely recognized free software principles lay the
ground for a legal milieu people can understand and depend on.
Prior to free software and Linux, the open development environment was
not only legally impeded but geographically handicapped as well. The field
already knew from extensive experience with the multidecade effort in Unix
that great software projects exploit “the attention and brainpower of entire
communities” even though coding itself remains relatively solitary. So collaborative
distributed development was a recognized model – and after all, it was
the classic way in which science had always advanced. But remote collaboration
had remained clumsy, and an effective communications infrastructure that
developers could work in was needed. The Internet had to emerge to transcend
the geographically bound developer communities at institutions like Bell Labs,
MIT, and Berkeley that did foster free interactions between individuals and
groups of highly skilled, but largely collocated, codevelopers. With the WWW
emerging, the collaborative approach represented by these traditional groups
could be detached from its geographic matrix and could even be exponentially
larger in terms of the number of people involved. At that point, one merely
needed a developer who knew “how to create an open, evolutionary context in
which feedback exploring the design space, code contributions, bug-spotting,
and other improvements come from hundreds (perhaps thousands) of people”
(Raymond, 1998).
Linux emerged when these enabling conditions were all in place. The Linux
project represented a conscious decision by Torvalds to use “the entire world as
its talent pool” (Raymond, 1998). Before the Internet and the WWW, that would
have been essentially unworkable and certainly not expeditious. Without the
legal apparatus of free and open software, the culture of development would not
have had a conceptual framework within which to develop its process. But once
these were in place, things just happened naturally. Linux, and its intentional
imitators like Fetchmail, soon followed.
Traditional project management has well-identified, legitimate concerns:
how are resources acquired, people motivated, work checked for quality, innovation
nurtured, and so on. These issues do not disappear just because the
development model changes. Raymond describes how the project management
concerns of the traditional managerial model of software development
are addressed or obviated in the bazaar model of development. Let us assume
the basic traditional project management goals are defining the project goals,
making sure details are attended to, motivating people to do what may be boring,
organizing people to maximize productivity, and marshaling resources for
the project. How are these objectives met in open source development?
To begin with, consider human resources. In open projects like Linux, the
developers were, at least initially, volunteers, self-selected on the basis of their
interest, though subsequently they may have been paid corporate employees.
Certainly at the higher levels of participation, they had meritorious development
skills, arguably typically at the 95th percentile level. Thus, these participants
brought their own resources to the project, though the project leadership had to
be effective enough to attract them in the first place and then retain them. The
open process also appears to be able to organize people very effectively despite
the distributed environment. Because participants tend to be self-selected, they
come equipped with motivation, in possible contrast to corporate organizations
based on paid employees who might rather be doing something else or
at least working on a different project. The monitoring provided in a conventional
managerial environment is implemented radically differently in an open
source setting, where it is replaced by widespread peer and expert review by project
leaders, maintainers, committers, or beta testers. In fact, in open source “decentralized
peer review trumps all the conventional methods for trying to ensure
that details don’t get skipped” (Raymond, 1998). Finally, consider the initial
definition of the project, an issue that is also directly related to the question of
innovativeness. Open projects like Linux have been criticized for chasing the
taillights of other extant projects. This is indeed one of the design techniques
that has been used in the free software movement, where part of the historical
mission has been to recreate successful platforms in free implementations
(see, for example, Bezroukov (1999) on the Halloween-I document). However, open
projects do not always imitate. For example, Scacchi (2004, p. 61) describes
how, in the creation of open requirements for game software, the specifications
“emerge as a by-product of community discourse about what its software should
or shouldn’t do . . . and solidify into retrospective software requirements.” On
the other hand, the conventional corporate model has a questionable record
of defining systems properly, it being widely believed that half to three quarters
of such developments are either aborted before completion or rejected by
users. Creative ideas ultimately come from individuals in any case, so what is
needed is an environment that recognizes and fosters such ideas, which the open
source model seems to do quite well. Furthermore, historically, universities and
research organizations have often been the source of software innovation, rather
than corporate environments.
We conclude with some comments about the bazaar metaphor and Linus’
law. To begin with, let us note that though Raymond’s seminal bazaar metaphor
is striking, every metaphor has its limitations. The imagery resonates with the
myriad of voices heard in an open development and has an appealing romantic
cachet. The term also probably resonates with Raymond’s personal libertarian
beliefs, with their eschewal of centralized control. But it lacks adequate reference
to an element essential in such development: the required dynamic, competent,
core leadership with its cathedral-like element. Of course, Raymond’s
essay clearly acknowledges this, but the bazaar metaphor does not adequately
capture it. Feller and Fitzgerald (2002, p. 160) point out that many of the
most important open source projects, from Linux and Apache to GNOME and
FreeBSD, are in fact highly structured, with a cadre of proven developers with
expertise acknowledged in the development community (p. 166). Raymond
himself underscores that a project must begin with an attractive base design,
which more often than not comes from one or perhaps a few individuals. Contributions
from a broader contributor pool may subsequently radically redefine
the original vision or prototype, but all along there is either a single individual
like Linus Torvalds or a small coherent group that adjudicates and vets these
contributions, integrating them in a disciplined way and generally steering the
ship (Feller and Fitzgerald, 2002, p. 171). The bazaar can also be a source
of distraction. Participation by “well meaning . . . (but) dangerously half clued
people with opinions – not code, opinions” (Cox, 1998) may proliferate as in
a bazaar, but this does not advance the ship’s voyage. Such caveats aside, it is
also indisputable that some of the unknown voices emanating from the bazaar
may ultimately prove invaluable, even if this is not obvious at first. As Alan
Cox observes, there are “plenty of people who given a little help and a bit of
confidence boosting will become one with the best” (Cox, 1998). The bazaar
can also provide helpful resources from unexpected quarters. For example, Cox
advises that “when you hear ‘I’d love to help but I can’t program’, you hear a
documenter. When they say ‘But English is not my first language’ you have a
documenter and translator for another language” (Cox, 1998).
We next comment on the reliability achieved by open processes. Raymond’s
statement of Linus’ law, “with enough eyeballs, all bugs are shallow,”
focuses on parallel oversight as one key to the reliability of open source. The
implication is that bugs are detected rapidly. One might ask: are the products
actually more reliable, and is the reliability due to the fact that there are many
overseers? The record of performance for open source systems generally supports
the thesis that open development is often remarkably effective in terms
of the reliability of its products. There are also specific studies, like the analysis
of the comparative reliability of MySQL mentioned in Chapter 1. Even
the Halloween memos from Microsoft refer to Microsoft’s own internal studies
on Linux that accentuate its record of reliability. The “eyeballs” effect is
presumably part of the explanation for this reliability.
Research by Payne (1999, 2002), which compares security flaws in open and
closed systems, suggests that other causes may be at work as well: a mixture
of practices appears to explain the reliability/security differences for the systems
studied. The study examined the security performance of three Unix-like
systems in an effort to understand the relation between the security characteristics
of the systems, their open or closed source status, and their specific development
processes. The systems considered were OpenBSD, the open source
Debian GNU/Linux distribution, and the closed source Sun Solaris system.
Granted the myriad uncertainties intrinsic to any such study, Payne concluded
that in terms of metrics like identified security vulnerabilities, OpenBSD was
the most secure of the three, followed at a considerable distance by Debian and
Solaris, with a slight edge given to Debian over Solaris. The results suggest an
overall security advantage for open source systems, though the relative closeness
of the Debian and Solaris evaluations implies that open source status per
se is not the decisive driving factor. Indeed, as it turns out, there was a major
difference in the development processes for the systems that likely explains
much of the outcome. Namely, the “OpenBSD source code is regularly and
purposely examined with the explicit intention of finding and fixing security
holes . . . by programmers with the necessary background and security expertise
to make a significant impact” (Payne, 2002). In other words, the factor that
produced superior security may actually have been the focused auditing of the
code by specialists during development, though open source status appears to
be a supplemental factor.
Another perspective on the reliability or security benefits of open source is
provided by Witten et al. (2001). Their analysis is guarded about the general
ability of code reviews to detect security flaws regardless of the mode of development.
However, they observe that the proprietary development model simply
obliges users to “trust the source code and review process, the intentions and
capabilities of developers to build safe systems, and the developer’s compiler”
and to “forfeit opportunities for improving the security of their systems” (Witten
et al., 2001, p. 61). They also underscore the important role that open compilers,
whose operation is itself transparent, play in instilling confidence about
what a system does. Incidentally, they observe how security-enhancing open
compilers like Immunix StackGuard, a gcc extension (see also Cowan, 1998),
can add so-called canaries to executables that can “defeat many buffer overflow
attacks” (Witten et al., 2001, p. 58). From this perspective, Linus’ law is about
more than just parallel oversight. It is a recognition of the inherent advantages
of transparency: open source code, a process of open development, the ability
to change code, giving the user control of the product, open oversight by many
community observers, and even the transparency and confidence provided by
open compilers.
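The canary mechanism mentioned above can be illustrated with a toy model. The sketch below mimics in Python what StackGuard arranges at the machine level: a known value placed just past a buffer is checked for corruption after an unchecked copy. The layout, constants, and function names are illustrative only, not StackGuard’s actual implementation.

```python
CANARY = b"\xde\xad\xbe\xef"  # known value guarding what follows the buffer
BUF_SIZE = 8

def new_frame() -> bytearray:
    # buffer followed by the canary, mimicking a stack frame layout
    return bytearray(BUF_SIZE) + bytearray(CANARY)

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    # like C's strcpy: no bounds check, so long input spills past the buffer
    frame[:len(data)] = data

def canary_intact(frame: bytearray) -> bool:
    # the check a protected function performs before returning
    return bytes(frame[BUF_SIZE:BUF_SIZE + len(CANARY)]) == CANARY
```

An in-bounds copy leaves the canary intact; an overlong one overwrites it, so the program can abort rather than return through a corrupted stack frame.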
References
Bezroukov, N. (1999). A Second Look at the Cathedral and Bazaar. First Monday,
4(12). http://www.firstmonday.org/issues/issue4_12/bezroukov/. Accessed January
5, 2007.
Brooks, F.P. (1995). The Mythical Man-Month – Essays on Software Engineering, 20th
Anniversary Edition, Addison-Wesley Longman, Reading, MA.
Cowan, C. (1998). Automatic Detection and Prevention of Buffer-Overflow Attacks.
In: Proceedings of the 7th USENIX Security Symposium, USENIX, San Diego,
63–78.
Cox, A. (1998). Cathedrals, Bazaars and the Town Council. http://slashdot.org/
features/98/10/13/1423253.shtml. Accessed December 6, 2006.
Feller, J. and Fitzgerald, B. (2002). Understanding Open Source Software Development.
Addison-Wesley, Pearson Education Ltd., London.
Fogel, K. and Bar, M. (2003). Open Source Development with CVS, 3rd edition.
Paraglyph Press. http://cvsbook.red-bean.com/.
Payne, C. (1999). Security through Design as a Paradigm for Systems Development.
Murdoch University, Perth, Western Australia.
Payne, C. (2002). On the Security of Open Source Software. Information Systems, 12(1),
61–78.
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.
firstmonday.dk/issues/issue3_3/raymond/index.html. Ongoing version: http://
www.catb.org/esr/writings/cathedral-bazaar/. Accessed December 3, 2006.
Scacchi, W. (2004). Free and Open Source Development Practices in the Game Community.
IEEE Software, 21(1), 59–66.
Witten, B., Landwehr, C., and Caloyannides, M. (2001). Does Open Source Improve
System Security? IEEE Software, 18(5), 57–61.
2.4 The Dual License Business Model
A software product can be offered under different licenses depending, for example,
on how the software is to be used. This applies to proprietary and open
source products and provides a basis for a viable business model. Karels (2003)
examines the different commercial models for open products. The Sendmail
and MySQL products described later are representative. They have their feet
planted firmly in two worlds, the commercial one and the open community one.
On the one hand, the business model provides “extensions or a professional version
under a commercial license” for the product (Karels, 2003). At the same
time, the company that markets the product continues its management of the
open version. A key distinction in the dual license model is whether the free and
commercial products are identical. For the companies and products we discuss,
the breakout is as follows:
1. Open and proprietary code different: Sendmail, Inc.
2. Open and proprietary code same: MySQL AB, Berkeley DB, Qt
But with the proper license, proprietary enhancements can be done, for
example, for MySQL. Licensing duality can serve a company in a number of
ways. It continues the operation of the open user-developer base. It also promotes
goodwill for the company with the user-developer base. It maintains and
improves acceptance for the open source base version. The continued sponsored
development of the open version simultaneously helps maintain and expand the
market for the commercial product. With products like Sendmail, the proprietary
enhancements may include security improvements, such as e-mail virus
checking. Its distributions may typically provide “configuration and management
tools, higher-performance or higher-capacity versions” (Karels, 2003) to
supplement the root product in order to make a more attractive, commercially
viable product. The company’s product can thus end up incorporating both
open and commercially licensed software. From the customer’s point of view,
the product is now much like a traditional software product that is licensed and
paid for. One ongoing challenge for the distributor is to “maintain differentiation
between the free and commercial versions” since the commercial product
competes with its open fork, at least when the commercial version is different.
In order for the open version to retain market share, its functionality has to be
maintained and upgraded. In order for the commercial version to avoid competition
from evolving open variants, it has to continue to provide “sufficient
additional value to induce customers to buy it and to reduce the likelihood of a
free knockoff of the added components” (Karels, 2003). The commercial version
also has to provide all the accoutrements associated with a conventional
software provider, such as support, training, and product documentation.
Products that can be successfully marketed under a dual licensing framework
tend to have what are called strong network effects; that is, the benefit or value
of using a copy of the software tends to depend on how many other people also
use the software. For example, e-mail is not of much value if you have no one
to e-mail; conversely, its value is greater the more people you can reach. For
such products, the broader the user base, the more valuable the product. In the
dual license model, the free, open license serves the key role of establishing
a wide user base by helping to popularize the product with users (Valimaki,
2005). This popularized base helps make the product a branded entity, which
is extremely valuable for marketing purposes, especially for IT organizations
that are converting to open applications (Moczar, 2005). These effects in turn
make the proprietary license more worth buying for the relatively limited group
of potential commercial developers. It also makes it more attractive for them
to create a commercial derivative under the proprietary license because, for
example, the product will have an established audience of users. There is another
economic reason for allowing mixed license options like this: the open source
version can be used to build up a coterie of independent developers, bug spotters,
and so on – free contributors who can help enhance the product in one way
or another, benefiting the commercial company. As we will see later when
we discuss configuration management tools, Larry McVoy created a dual faux
free license for BitKeeper, especially for use on the Linux kernel. This had
the express purpose of building BitKeeper’s proprietary market share and also
obtaining highly useful input about bugs in the product from the Linux kernel
developers who used it. This “dual licensing” business strategy worked very
well until it had to be discontinued because of the open community controversies
about the “free” license version.
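The network-effects argument above is often formalized (this is our gloss, not a claim from the sources cited) by counting the communication links a user base supports, as in Metcalfe’s law:

```python
def pairwise_links(users: int) -> int:
    # number of distinct pairs of users who can exchange mail: n(n-1)/2
    return users * (users - 1) // 2

# value grows much faster than the user base itself:
# 2 users -> 1 link, 10 users -> 45, 100 users -> 4950
growth = [pairwise_links(n) for n in (2, 10, 100)]
```

The superlinear growth is why a free license that merely widens the user base can raise the value of the paid, proprietary license for everyone else.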
References
Karels, M. (2003). Commercializing Open Source Software. ACM Queue, 1(5), 46–55.
Moczar, L. (2005). The Economics of Commercial Open Source. http://pascal.case.
unibz.it/handle/2038/501. Accessed November 29, 2006.
Valimaki, M. (2005). The Rise of Open Source Licensing: A Challenge to the Use of
Intellectual Property in the Software Industry. Turre Publishing, Helsinki, Finland.
2.4.1 Sendmail
The open source product Sendmail is a pervasive Internet e-mail application.
However, its story is much less well known than Fetchmail’s because
of the unique influence of Raymond’s (1998) Cathedral and Bazaar article, which
ensconced Fetchmail’s development as a canonical story of the practices and
principles of open development, even though Fetchmail is for a far more limited
use than Sendmail. Sendmail is worth discussing, not only because it carries
most of the world’s e-mail traffic, but because it represents another major open
source application that eventually morphed into dual open and commercial
versions.

The Sendmail project was started as an open project at UC Berkeley in 1981
by Eric Allman, who has also maintained the project since that time as an open
development. Allman had previously authored the ARPANET mail application
delivermail in 1979, which was included with the Berkeley Software Distribution
(BSD) and which BSD would subsequently replace with Sendmail. Sendmail
is a Mail Transfer Agent or MTA. As such, its purpose is to reliably transfer
e-mail from one host to another, unlike mail user agents like Pine or Outlook
that are used by end users to compose mail. The software operates on
Unix-like systems, though there is also a Windows version available. The open
source version of Sendmail is licensed under an OSI-approved BSD-like license
(see http://directory.fsf.org for verification), as well as, since 1998, under both
generic and custom commercial licenses. Sendmail serves a significant percentage
of all Internet sites, though there appears to be a decline over time (Weiss,
2004). It represents a de facto Internet infrastructure standard like TCP/IP, Perl,
and Apache.
Naturally, and healthily, the free software movement has never been about not
making money. The time frame in which time spent, effort, and talent are fungible
with compensation may be elastic, but it is finite. People eventually have
to cash in on their products or their expertise in one way or another, or switch to
another line of work where they can make a living. This is obviously appropriate
and expected since their involvement will almost always have been long, intense,
and valuable. Thus for Sendmail, as with many open source projects, the project
originator and leader Allman eventually took the project on a commercial route,
establishing Sendmail Inc. in 1998 in order to develop a commercial version of
the software (see http://www.sendmail.org/ and the complementary commercial
site http://www.sendmail.com). The expanded commercialized version of
the product is different from the open source version and offers many features
not available in the open version. For example, it provides a GUI interface that
significantly facilitates the installation and configuration of the software, in contrast
to the open source version, which is well known to be extremely complicated
to install. The commercial product also incorporates proprietary components
that are combined with the open source components. According to the Sendmail
Inc. Web site, their commercial products provide a “clear advantage over
open source implementations” in a variety of ways, from security and technical
support to enhanced features. Sendmail Inc. and the project maintainer
Allman still manage the development of the open source project, but they now
use its development process to also support the continued innovation of both
the open and the commercial versions of the product. After the Sendmail incorporation,
the licensing arrangement was updated to reflect the dual open source
and commercial offerings. The original license for the pure open source application
remained essentially the same, a few remarks about trademarks and such
aside. For redistribution as part of a commercial product, a commercial license
is required.
References
Raymond, E.S. (1998). The Cathedral and the Bazaar. First Monday, 3(3). http://www.
firstmonday.dk/issues/issue3_3/raymond/index.html. Ongoing version: http://
www.catb.org/esr/writings/cathedral-bazaar/. Accessed December 3, 2006.
Weiss, A. (2004). Has Sendmail Kept Pace in the MTA Race? http://www.serverwatch.
com/stypes/servers/article.php/16059_3331691. Accessed December 1, 2006.
2.4.2 MySQL – Open Source and Dual Licensing
MySQL (pronounced My S-Q-L) is the widely used open source relational
database system that provides fast, multiuser database service and is capable
of handling mission-critical applications with heavy loads. It is suitable for
both Web server environments and embedded database applications. MySQL
is famous in open source applications as the M in the LAMP software architecture.
The company that owns MySQL provides a dual licensing model for
distribution that permits both free and proprietary redistribution. One of the
notable characteristics of MySQL as an open source project is that virtually all
of its development is done by the company that owns the product copyright. This
model helps keep its license ownership pure, ensuring that its proprietary license
option remains undisturbed. (We will not consider the other major open source
database PostgreSQL (pronounced postgres Q-L), which is licensed under the
BSD license. The BSD’s terms are so flexible that the dual licensing model we
are considering does not seem to come into play unless proprietary extensions
are developed or value-added services are provided.)
MySQL was initially developed by Axmark and Widenius starting in 1995
and first released in 1996. They intended it to serve their personal need for an
SQL interface to a Web-accessible database. Widenius recalls that their motivation
for releasing it as an open source product was “because we believed that
we had created something good and thought that someone else could probably
have some use for it. We became inspired and continued to work on this because
of the very good feedback we got from people that tried MySQL and loved it”
(Codewalkers, 2002). The trademarked product MySQL is now distributed by
the commercial company MySQL AB, founded by the original developers in
2001. The company owns the copyright to MySQL.
All the core developers who continue the development work on MySQL
work for MySQL AB, even though they are distributed around the world. The
complexity of the product is one factor inhibiting third-party involvement
in the project (Valimaki, 2005). Independent volunteer contributors can
propose patches but, if the patches prove to be acceptable, these are generally
reimplemented by the company’s core developers. This process helps ensure
that the company’s copyright ownership of the entire product is never clouded
or diluted (Valimaki, 2005). The code revisions ensure that GPL’d code created
by an external developer, who is de facto its copyright owner, is not included in
the system, so that the overall system can still be licensed under an alternative
or dual proprietary license that is not the GPL. The business and legal model is
further described later. Sometimes code contributions are accepted under a shared
copyright with the contributor. Despite the strict handling of patch proposals by
external contributors, a significant number of the company’s employees were
actually recruited through the volunteer user-developer route. Indeed, according
to Hyatt (2006), of MySQL AB’s 300+ full-time employees, 50 were
originally open source community volunteers for MySQL.
P1:KAE
9780521881036c02 CUNY1180/Deek 0521 88103 6 October 1, 2007 17:16
66 2 Open Source Internet Application Projects
MySQL was originally distributed for Unix-like environments under a free-
of-charge and free open source license that allowed free redistribution under the
usual copyleft restriction (according to which any modifications had to be redis-
tributable under the same terms as the original license). On the other hand,
MySQL was originally distributed in Windows environments only as so-called
shareware that allowed copying and redistribution of the product
but did not permit modification and in fact required users to pay a license fee
to use the software after an initial free trial period. This was changed to the
standard GPL for all platforms after 2000 (Valimaki, 2005).
The MySQL AB distribution uses the dual licensing business model. The
same technical product is distributed both as a free GPL-licensed package and
under different licensing terms for the purpose of proprietary development.
Refer to http://www.mysql.com/company/legal/licensing/ (accessed January
10, 2007) for the legal terms of the license. Some of the basic points to keep in
mind follow. If you embed MySQL in a GPL'd application, then that application
has to be distributed as GPL by the requirements of the GPL license. However,
the MySQL proprietary license allows commercial developers or companies to
modify MySQL and integrate it with their own proprietary products and sell the
resulting system as a proprietary closed source system. The license to do this
requires a fee ranging up to $5,000 per year for the company's high-end server
(as of 2005). Thus if you purchase MySQL under this commercial license, then
you do not have to comply with the terms of the GNU General Public License,
of course only in so far as it applies to MySQL.
You cannot in any case infringe on the trademarked MySQL name in any
derivative product you create, an issue that arose in the dispute between MySQL
and NuSphere (MySQL News Announcement, 2001). The commercial license
naturally provides product support from MySQL AB, as well as product war-
ranties and responsibilities. These are lacking in the identical but free GPL'd
copy of the product, which is offered only on an as-is basis. This proprietary
form of the license is required even if you sell a commercial product that
merely requires the user to download a copy of MySQL, or if you include a
copy of MySQL, or include MySQL drivers in a proprietary application! Most
of MySQL AB's income derives from the fees for the proprietary license, with
additional revenues from training and consultancy services. The income from
these services and fees adds up to a viable composite open source/proprietary
business model. As per Valimaki (2005), most of the company's income comes
from embedded commercial applications.
The preservation of copyright ownership is a key element in the continued
viability of a dual license model like MySQL AB's. In particular, the licensor
must have undisputed rights to the software in order to be able to charge for the
software, distribute it under different licenses, or modify its licensing policies
(Valimaki, 2005). The generic open source development framework, where
development is fully collaborative and distributed, tends to undermine or diffuse
copyright ownership since there are many contributors. For example, there
could be "hidden liabilities in code contributions from unknown third parties"
(Valimaki, 2005). Thus, maintaining the ability to dual license with a proprietary
option mandates that the developer of any new or modified code for the system
must ensure that he has exclusive copyright to the work – whence the cautious
behavior by MySQL AB with respect to how development contributions are
handled.
MySQL AB appears to apply a rather strict interpretation of what the condi-
tions of the GPL mean in terms of its own legal rights (Valimaki, 2005, p. 137).
For example, consider a hypothetical case where someone develops a client
for MySQL. The client software might not even be bound either statically or
dynamically with any of the MySQL modules. The client software could just
use a user-developed GUI to open a command prompt to which it could send
dynamically generated commands for MySQL based on inputs and events at
the user interface. The client would thus act just like an ordinary user, except
that the commands it would tell MySQL to execute would be generated via the
graphical interface based on user input, rather than being directly formulated
and requested by the user. However, since the composite application requires
the MySQL database to operate, it would, at least according to MySQL AB's
interpretation of the GPL, constitute a derivative work of MySQL and so be sub-
ject to GPL restrictions on the distribution of derivative works if they are used
for proprietary redistribution; that is, the client, even though it used no MySQL
code, would be considered a derivative of MySQL according to MySQL AB.
As per Valimaki (2005, p. 137), it seems that "the company regards all clients as
derivative works and in order to even use a client with other terms than GPL the
developer of the client would need to buy a proprietary license from MySQL
AB" and that in general "if one needs their database in order to run the client,
then one is basically also distributing MySQL database and GPL becomes bind-
ing," though this does not appear to be supported by either the standard GPL
interpretation or the copyright law on derivative works (Valimaki, 2005).
Ironically, in the tangled Web of legal interactions that emerge between cor-
porate actors, MySQL AB was itself involved in mid-2006 in potential legal
uncertainties vis-à-vis the Oracle Corporation and one of its own components.
The situation concerned MySQL AB's use of the open source InnoDB storage
engine, a key component that was critical to MySQL's handling of transactions.
The InnoDB component ensures that MySQL is ACID compliant. (Recall that
the well-known ACID rules for database integrity mean transactions have to
satisfy the following behaviors: atomicity: no partial transaction execution –
it's all or nothing in terms of transaction execution; consistency: transactions
must maintain data consistency – they cannot introduce contradictions among
the data in the database; isolation: concurrent transactions cannot mutually
interfere; durability: committed transactions cannot be lost – for example, they
must be preserved by backups and transaction logs.) Oracle acquired the Inn-
oDB storage engine and in 2006, it bought out Sleepycat, which makes the
Berkeley DB storage engine also used by MySQL. The InnoDB storage engine
was effectively a plug-in for MySQL, so alternative substitutions would be fea-
sible in the event that MySQL's continued use of the storage engine became
problematic. Additionally, the InnoDB engine is also available under the GPL
license. Nonetheless, such developments illustrate the strategic uncertainties
that even carefully managed dual licensed products may be subject to (Kirk,
2005).
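The atomicity rule in particular can be illustrated with a minimal sketch. Python's built-in sqlite3 module stands in for MySQL/InnoDB here (an analogy only; the table and account values are invented for the example):

```python
import sqlite3

# In-memory database purely for illustration; a real system would use MySQL/InnoDB.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
con.commit()

try:
    with con:  # opens a transaction; commits on success, rolls back on error
        con.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
        raise RuntimeError("simulated crash mid-transfer")
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back, so no money vanished.
balances = dict(con.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```

The simulated crash occurs after the debit but before any credit; an ACID-compliant engine discards the half-finished transaction rather than leaving the data inconsistent.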
References
Codewalkers. (2002). Interview with Michael Widenius. http://codewalkers.com/interviews/Monty_Widenius.html. Accessed November 29, 2006.
Hyatt, J. (2006). MySQL: Workers in 25 Countries with No HQ. http://money.cnn.com/2006/05/31/magazines/fortune/mysql_greatteams_fortune/. Accessed November 29, 2006.
Kirk, J. (2005). MySQL AB to Counter Oracle Buy of Innobase. ComputerWorld, November 23. http://www.computerworld.com.au/index.php/id;1423768456. Accessed February 11, 2007.
MySQL News Announcement. (2001). FAQ on MySQL vs. NuSphere Dispute. http://www.mysql.com/news-and-events/news/article_75.html. Accessed November 29, 2006.
Valimaki, M. (2005). The Rise of Open Source Licensing: A Challenge to the Use of Intellectual Property in the Software Industry. Turre Publishing, Helsinki, Finland.
2.4.3 Sleepycat Software and TrollTech
Sleepycat Software and TrollTech are two other examples of prominent and
successful companies that sell open source products using a dual licensing
business model.
BerkeleyDB
Sleepycat Software owns, sells, and develops a very famous database system
called BerkeleyDB. The software for this system was originally developed as
part of the Berkeley rewrite of the AT&T proprietary code in the BSD Unix
distribution, a rewrite done by programmers Keith Bostic, Margo Seltzer, and
Mike Olson. The software was first released in 1991 under the BSD license.
Recall that the BSD license allows proprietary modification and redistribution
of software with no payments required to the original copyright owners. The
original software became widely used and embedded in a number of proprietary
products.
Berkeley DB is not an SQL database. Queries to the Berkeley DB are done
through its own specific API. The system supports many commercial and open
applications, ranging from major Web sites to cell phones, and is one of the stor-
age engines available for MySQL. The software currently has over 200 million
deployments (Martens, 2005). Berkeley DB is a C library that runs in the same
process as an application, considerably reducing interprocess communication
delays. It stores data as key/value pairs, allowing data records and
keys to be up to 4 GB in length, with tables up to 256 TB. The Sleepycat Web
site describes the Berkeley DB library as "designed to run in a completely unat-
tended fashion, so all runtime administration is programmatically controlled
by the application, not by a human administrator. It has been designed to be
simple, fast, small and reliable" (sleepycat.com). We refer the reader to the
article by Sleepycat CTO Margo Seltzer (2005) for a commanding analysis of
the opportunities and challenges in database system design that go far beyond
the traditional relational model. In response to the demand for Berkeley DB and
some needs for improvements in the software, its developers founded Sleep-
ycat, further significantly developed the product, and subsequently released
it under a dual licensing model in 1997 (Valimaki, 2005). The license model
used by Sleepycat is like that of MySQL. Versions earlier than 2.0 were avail-
able under the BSD, but later versions are dual licensed. The noncommercial
license is OSI certified. However, the commercial license requires payment
for proprietary, closed source redistribution of derivatives. About 75% of the
company's revenues come from such license fees. Similarly to the MySQL
AB model, Berkeley DB software development is basically internal, with any
external code contributions reimplemented by the company's developers. This
is motivated, just as in the case of MySQL, not only by the desire to keep
ownership pure but also because of the complexity of the product.
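The embedded, in-process key/value model described above can be sketched with Python's standard-library dbm.dumb module, used here purely as an analogy (it is not Berkeley DB, and the keys and values are invented):

```python
import os
import tempfile
import dbm.dumb  # pure-Python key/value store; a stand-in for Berkeley DB's C API

path = os.path.join(tempfile.mkdtemp(), "store")

# Like Berkeley DB, the store runs inside the application process:
# no separate database server, no SQL query layer.
with dbm.dumb.open(path, "c") as db:
    db[b"user:42"] = b"alice"  # store a key/value pair
    db[b"user:43"] = b"bob"

# Reopen and retrieve directly by key through the library API, not a query.
with dbm.dumb.open(path, "r") as db:
    value = db[b"user:42"]
print(value)  # b'alice'
```

Because every lookup is a library call within the application's own process, the interprocess round trips of a client-server database simply do not occur, which is the performance point made in the text.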
The Qt Graphics Library
Another company that has long used a dual license model is TrollTech. Troll-
Tech develops and markets a C++ class library of GUI modules called Qt (pro-
nounced "cute" by its designers), which was eventually adopted by and played
an important role in the open source KDE project. Qt is cross-platform, support-
ing Unix-like, Windows, and Macintosh environments, and provides program-
mers with an extensive collection of so-called widgets. It is an extremely widely
used open GUI development library. Qt was first publicly released in 1995, orig-
inally under a restrictive license that made the source available but prohibited
the free redistribution of modifications. The use of Qt in the GPL'd KDE desktop
environment caused a well-known licensing controversy. Eventually TrollTech
was pressured by the open source community to release its product not merely
as open source but under the GPL, despite the initial aversion of the company
founder toward the GPL because of doubts about the implications of the GPL
in a dual license context (Valimaki, 2005). As it turned out, the approach was
successful and increased the popularity of the product. The company's sales
derive largely from its licensing fees. Proprietary development of the product
requires that it be purchased under a commercial license. The free version helps
maintain the open source user base. An educational version of the product that
integrates with Microsoft's Visual Studio .NET is available for Windows.
References
Martens, C. (2005). Sleepycat to Extend Paw to Asia. InfoWorld. http://infoworld.com/article/05/06/22/HNsleepycat_1.html. Accessed November 29, 2006.
Seltzer, M. (2005). Beyond Relational Databases. ACM Queue, 3(3), 50–58.
Valimaki, M. (2005). The Rise of Open Source Licensing: A Challenge to the Use of Intellectual Property in the Software Industry. Turre Publishing, Helsinki, Finland.
2.5 The P’s in LAMP
The first three letters of the ubiquitous LAMP open source software stack
stand for Linux, Apache, and MySQL. The last letter P refers to the scripting
language used and encompasses the powerful programming language Perl, the
scripting language PHP, and the application language Python. These are all open
source (unlike, for example, Java, whose major implementations are proprietary
even if programs written in it are open). Perl comes with an immense open
library of Perl modules called CPAN. We will focus our discussion on PHP and
Perl. Concerning Python, we only note that it is a widely ported open source
programming language invented by Guido van Rossum in 1990 and used for
both Web and applications development, such as in BitTorrent and Google, and
sometimes for scripting.
2.5.1 PHP Server-Side Scripting
PHP is a server-side scripting language embedded in HTML pages. It typi-
cally interfaces with a background database, commonly MySQL, as in LAMP
environments. Thus, it allows the creation of data-based, dynamic Web pages.
The name PHP is a recursive acronym like GNU, standing for PHP Hypertext
Preprocessor, though it originally referred to Personal Home Page tools. The
Netcraft survey indicates that PHP is the most widely deployed server-side
scripting language, with about one-third of all Internet sites surveyed having
PHP installed by early 2004.
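The pattern just described, server-side code that pulls rows from a database and emits HTML for the browser, can be sketched as follows. Python and sqlite3 stand in for PHP and MySQL, and the table and page are invented, so this is an analogy rather than PHP itself:

```python
import sqlite3
from string import Template

# A stand-in for the MySQL back end of a LAMP stack.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts (title TEXT)")
con.executemany("INSERT INTO posts VALUES (?)", [("Hello",), ("LAMP",)])

# Server-side templating: the page is generated per request from live data,
# and only the finished markup is sent to the browser.
page = Template("<html><body><ul>$items</ul></body></html>")
items = "".join(f"<li>{title}</li>" for (title,) in con.execute("SELECT title FROM posts"))
html = page.substitute(items=items)
print(html)
```

The browser receives only the final HTML; the scripting logic and database queries remain on the server, which is why PHP source is invisible to ordinary site visitors.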
PHP is another instructive tale of open source development. It illustrates
some of the ways in which open source projects originate, the influence of their
initial developers and the impact of new developers, the degree of open source
commitment, and attitudes toward commercialization. In the case of PHP, the
original developer Rasmus Lerdorf was later joined by a group of other major
core developers. Lerdorf has stayed with the open source project but did not
join in its commercialization – except in the indirect sense that he is currently
involved as a development engineer in its vertical application internally within
Yahoo. In fact, Lerdorf seems to believe that the greatest monetary potential
for open source lies in major vertical applications like Yahoo rather than in
support companies (like Zend in the case of PHP) (Schneider, 2003). Some of
the core PHP developers formed a commercial company named Zend, which
sells a number of PHP products, including an encoder designed to protect the
intellectual property represented by custom PHP scripts by encrypting them!
Programmers are often autodidacts: open source helps that happen. Rasmus
Lerdorf in fact describes himself as a "crappy coder" who thinks coding is a
"mind-numbing tedious endeavor" and "never took any programming courses
at school" (Schneider, 2003). He has an engineering degree from the University
of Waterloo. But despite an absence of formal credentials in computer science
and the self-deprecatory characterization of his interests, he had been quite a bit
of a hacker since his youth, hacking the CERN and NCSA server code soon after
the latter's distribution (Schneider, 2003). He was a self-taught Unix, Xenix,
and Linux fan and learned what he needed from looking at the open source code
they provided. To quote from the Schneider interview (2003), "What I like is
solving problems, which unfortunately often requires that I do a bit of coding.
I will steal and borrow as much existing code as I can and write as little 'glue'
code as possible to make it all work together. That's pretty much what PHP is."
Lerdorf started developing PHP in 1994–1995 with a simple and personal
motivation in mind, the classic Raymond "scratch an itch" model. He wanted to
know how many people were looking at his resume, since he had included a URL
for his resume in letters he had written to prospective employers (Schneider,
2003). He used a Perl CGI script to log visits to his resume page and to collect
information about the visitors. To impress the prospective employers, he let
visitors see his logging information (Schneider, 2003). People who visited the
page soon became interested in using the tools, so Lerdorf gave the code away
in typical open source fashion, setting up a PHP mailing list to share the code,
bug reports, and fixes. He officially announced the availability of the initial
set of PHP tools (Version 1.0) in mid-1995, saying that "the tools are in the
public domain distributed under the GNU Public License. Yes, that means
they are free!" (see reference link under Schneider (2003)). Admittedly, there
are a lot of ambiguities in that statement, from public domain to free, but
the intent is clear. His own predisposition to open source was partly related
to money. As Lerdorf observes, "I don't think I was ever really 'hooked' by
a 'movement'. When you don't have the money to buy SCO Unix and you
can download something that works and even find people who can help you
get it up and running, how can you beat that?" (Yank, 2002). Lerdorf worked
intensively on the PHP code for several years. Being the leader and originator
of a popular open source project is not a lark. Like Linus, he personally went
through all the contributed patches during that period, usually rewriting the code
before committing it. He estimates that he wrote 99% of the code at that time.
Lerdorf's involvement with open source has continued in a proprietary context at
Yahoo. Unlike many organizations, Yahoo has what Lerdorf describes as a "long
tradition of using open source" (like FreeBSD) for use in their own extremely
complex and demanding infrastructure (Schneider, 2003). Andrei Zmievski is
currently listed as the PHP project administrator and owner on freshmeat.net
and the principal developer of PHP since 1999. PHP 4 is licensed under the
GPL. There are over 800 contributors currently involved in its development
(available at http://www.php.net).
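Lerdorf's original itch, a script that counts and logs visits to a page, might look roughly like the following. The sketch is in Python rather than his original Perl, and all names and values are invented:

```python
import datetime

# Minimal sketch of a visit logger, the kind of "scratch an itch" tool PHP
# grew out of. A real script would append to a file or database per request.
LOG = []

def log_visit(remote_addr: str, user_agent: str) -> int:
    """Record one page hit and return the running total of visits."""
    LOG.append((datetime.datetime.now(datetime.timezone.utc), remote_addr, user_agent))
    return len(LOG)

count = log_visit("192.0.2.7", "Mozilla/1.0")
count = log_visit("192.0.2.8", "Lynx/2.3")
print(f"This resume has been viewed {count} times.")  # the visible counter
```

Exposing the running total on the page itself is what let Lerdorf show prospective employers how often his resume was being read.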
Open source developments can go through generational changes as the prod-
uct or its implementation evolves in response to outside events or the insights of
new participants. This happened with PHP. Computer scientists Zeev Suraski
and Andi Gutmans became involved with PHP in mid-1997 as part of a Web
programming project at the Technion in Israel. An odd syntax error in PHP
led them to look at the source code for the language. They were surprised to
see that the program used a line-by-line parsing technique, which they rec-
ognized could be dramatically improved upon. After a few months of intense
development effort, they had recoded enough of the PHP source to convince
Lerdorf to discontinue the earlier version of PHP and base further work on
their new code. This led to a successful collaboration between Lerdorf and a
new extended group of seven PHP core developers. Lerdorf has observed that
"this was probably the most crucial moment during the development of PHP.
The project would have died at that point if it had remained a one-man effort
and it could easily have died if the newly assembled group of strangers could
not figure out how to work together towards a common goal. We somehow
managed to juggle our egos and other personal events and the project grew"
(Lerdorf, 2004). Another major overhaul followed. At the time, PHP still used
an approach in which the code was executed as it was parsed. In order to handle
the much larger applications that people were using PHP for, the developers had
to make yet another major change. This once again led to redesigning and reim-
plementing the PHP engine from scratch (Suraski, 2000). The new compilation
engine, which used a compile-first-then-execute approach, was separable
from PHP 4. It was given its own name, the Zend engine (combining Zeev +
Andi). It is licensed under an Apache-style as-is license. The company does
not dual license the Zend engine but provides support services and additional
products for a fee.
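The difference matters because a compile-first engine parses a script once and can then run the resulting intermediate form repeatedly. Python's own bytecode step illustrates the idea (an analogy only; the Zend engine's internals are not shown here, and the script is invented):

```python
# Sketch of the "compile first, then execute" approach PHP 4's Zend engine
# adopted, using Python's bytecode pipeline as an analogy.
source = "total = sum(range(10))"

code_obj = compile(source, "<script>", "exec")  # parse once into bytecode

# The compiled form can now be executed repeatedly without reparsing the
# source, which is what made larger applications feasible.
ns = {}
for _ in range(3):
    exec(code_obj, ns)
print(ns["total"])  # 45
```

Under the older execute-as-you-parse scheme, every statement had to be re-analyzed on each execution, a cost that grows painfully with program size.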
Although the PHP language processor is open source and the scripting lan-
guage programs are human-readable text (on the server side), its major com-
mercial distributor Zend ironically has tools for hiding source code that is
written in PHP – just like in the old-fashioned proprietary model! Gutmans and
Suraski, along with another core developer Doron Gerstel, founded Zend in
1999. It is a major distributor of PHP-related products and services. From an
open source point of view, a particularly interesting product is the Zend Encoder.
The Zend Web site describes the encoder as "the recognized industry standard in
PHP intellectual property protection" – italics added (http://www.zend.com/).
The encoder lets companies distribute their applications written in PHP "with-
out revealing the source code." This protects the companies against copyright
infringement as well as from reverse engineering since the distributed code is
"both obfuscated and encoded" (http://www.zend.com/). As the site's selling
points describe it, this approach allows "Professional Service Providers (to) rely
on the Zend Encoder to deliver their exclusive and commercial PHP applications
to customers without revealing their valuable intellectual property. By protect-
ing their PHP applications, these and other enterprises expand distribution and
increase revenue" (http://www.zend.com/). It is important to understand that it
is not the source code for the PHP compiler that is hidden, but the PHP scripts
that development companies write for various applications and that are run in
a PHP environment. Furthermore, the application source code
is not being hidden from the end users (browser users), since they would never
have seen the code in the first place: the PHP scripts are executed on the
server and only their results are sent to the client, so there's nothing to hide from
the client. The code is being hidden from purchasers of the code who want to
run it on their own Web servers. In any case, this model represents an inter-
esting marriage of open source products and proprietary code. The encoding
is accomplished by converting the "plain-text PHP scripts into a platform-
independent binary format known as Zend Intermediate Code. These encoded
binary files are the ones that are distributed (to prospective users) instead of the
human-readable PHP files. The performance of the encoded PHP application is
completely unaffected!" (http://www.zend.com/). None of this turns out to be
open license related, since the proprietary nature of the distributions that Zend
is targeting is not about the PHP environment itself, which is GPL'd, but about
scripts written using PHP. Nonetheless, the thinking is not on the same page
as traditional open source distribution, where disclosure of source is viewed as
beneficial.
There are striking arguments to be made about the cost-effectiveness of
open tools like PHP and open platforms. Who better to hear them from than
an open source star and proprietary developer like Rasmus Lerdorf? In an
interview with Sharon Machlis (2002), Lerdorf did an interesting back-of-the-
envelope calculation about the relative cost benefits of open applications versus
proprietary tools like Microsoft's. In response to the hypothetical question of
why one would choose PHP over (say) Microsoft's ASP, he estimated that (at
that time) the ASP solution entailed (roughly): $4,000 for a Windows server,
$6,000 for an Internet security and application server on a per CPU basis,
$20,000 for an SQL Enterprise Edition Server per CPU, and about $3,000 per
developer for an MSDN subscription, at a final cost of over $40,000 per CPU.
In contrast, you could build an equivalent open source environment that did the
same thing based on Linux, Apache + SSL, PHP, PostgreSQL, and the Web
proxy Squid, for free. The price comparisons become even more dramatic when
multiple CPUs are involved. Granted, if you have an existing Microsoft shop in
place, then the open source solution does have a learning curve attached to it
that translates into additional costs. However, especially in the case where one
is starting from scratch, the PHP and free environment is economically very
attractive.
References
Lerdorf, R. (2004). Do You PHP? http://www.oracle.com/technology/pub/articles/php_experts/rasmus_php.html. Accessed November 29, 2006.
Machlis, S. (2002). PHP Creator Rasmus Lerdorf. http://www.computerworld.com/softwaretopics/software/appdev/story/0,10801,67864,00.html. Accessed November 29, 2006.
Schneider, J. (2003). Interview: PHP Founder Rasmus Lerdorf on Relinquishing Control. http://www.midwestbusiness.com/news/viewnews.asp?newsletterID=4577. Accessed November 29, 2006.
Suraski, Z. (2000). Under the Hood of PHP4. http://www.zend.com/zend/art/under-php4-hood.php. Accessed November 29, 2006.
Yank, K. (2002). Interview – PHP's Creator, Rasmus Lerdorf. http://www.sitepoint.com/article/phps-creator-rasmus-lerdorf. Accessed November 29, 2006.
2.5.2 Perl and CPAN
According to the perl.org Web site, the open source programming language
Perl, together with its largely open library of supporting perl modules CPAN,
is a "stable, cross-platform programming language . . . used for mission-critical
projects in the public and private sectors and . . . widely used to program web
applications" (italics added). Perl 1.0 was initially released by its designer Larry
Wall about 1987, with the much revised Perl 5 version debuting in 1994. Perl
is a procedural language like C. It is also implemented with a combination of
C and some Perl modules. Perl has some of the characteristics of Unix shell
programming and is influenced by Unix tools like awk and sed. Although Perl
was originally designed for text manipulation, it has become widely used in
many systems applications, particularly to glue systems together. It was also
the original technology used to produce dynamic Web pages using CGI. Its
diverse applicability has given it a reputation as a system administrator's Swiss
army knife. In fact, hacker Henry Spencer comically called Perl a Swiss army
chainsaw (see the Jargon File at http://catb.org/jargon/html/index.html), while
for similar reasons others call it "the duct tape of the Internet." The Perl 5 ver-
sion allowed the use of modules to extend the language. Like the Linux module
approach, the Perl 5 module structure "allows continued development of the
language without actually changing the core language" according to developer
Wall (Richardson, 1999). This version has been under continuous develop-
ment since its release. Perl is highly portable and also has binary distributions
like ActivePerl, which is commonly used for Windows environments. Perl and
CPAN modules are pervasive in financial applications, including long-standing
early use at the Federal Reserve Board and more recently in many bioinformat-
ics applications. Indeed, a classic application of Perl as an intersystem glue was
its application in integrating differently formatted data from multiple genome
sequencing databases during the Human Genome Project (Stein, 1996).
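The kind of glue work described, reconciling differently formatted data sources into one uniform layout, can be sketched as follows. Python stands in for Perl here, and the two record formats are invented for illustration:

```python
import csv
import io
import json

# Two data sources describing the same kind of entity in incompatible formats,
# the situation Perl glue scripts were written to resolve.
csv_source = "gene,length\nBRCA2,84193\n"
json_source = '[{"name": "TP53", "bp": 19149}]'

# Normalize both into a single schema: {"gene": ..., "length": ...}.
records = []
for row in csv.DictReader(io.StringIO(csv_source)):
    records.append({"gene": row["gene"], "length": int(row["length"])})
for item in json.loads(json_source):
    records.append({"gene": item["name"], "length": item["bp"]})

print(records)  # one uniform record layout from two incompatible inputs
```

Each source keeps its native format; the glue script owns the translation, which is why such scripts could be written quickly and thrown away just as quickly.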
The essay by Larry Wall (1999), the inimitable creator of Perl, is worth
reading for a philosophical discourse on the design and purpose of Perl. Wall
discourses, among other things, on why the linguistic complexity of Perl is
needed in order to handle the complexity of messy real-world problems. Wall's
academic background is interesting. Although he had worked full time for his
college computer center, his graduate education was actually in linguistics (of
the human language kind) and he intended it to be in preparation for doing
biblical translations. The interview with Wall in Richardson (1999) describes
what Wall calls the "postmodern" motivation behind Perl's design.
Perl is open source and GPL compatible since it can be licensed using either
the GPL or the so-called Artistic License, an alternative combination called the
disjunctive license for Perl. The GPL option in this disjunction is what makes
Perl GPL compatible. The Free Software Foundation considers the Artistic
License option for Perl to be vague and problematic in its wording, and so it
recommends the GPL. On the other hand, the perl.com Web site characterized
the Perl license as the Artistic license, which it describes as "a kinder and gentler
version of the GNU license – one that doesn't infect your work if you care to
borrow from Perl or package up pieces of it as part of a commercial product"
(perl.com). This last is a very important distinction since it allows the code for
Perl or Perl modules to be modified and embedded in proprietary products.
A very significant part of Perl's power comes from CPAN, which stands for
Comprehensive Perl Archive Network, an immense library of Perl modules that
is far more extensive than the Java class libraries or those available for either PHP
or Python. The CPAN collection located at cpan.org was started about 1994,
enabled by the module architecture provided by Perl 5. It currently lists over
5,000 authors and over 10,000 modules. The search engine at search.cpan.org
helps programmers sort through the large number of modules available. The
modules are human-readable Perl code, so they are naturally accessible source.
However, a limited number of Perl binaries are also available, but these are not
stored on the CPAN site. While the Perl language itself is distributed as GPL
(or Artistic), the modules written in Perl on the CPAN site do not require any
particular license. However, the CPAN FAQ does indicate that most, though
not all, of the modules available are in fact licensed under either the GPL or
the Artistic license. Contributors do not have to include a license, but the site
recommends it. With the limited exception of some shareware and commercial
software for Perl IDEs, SDKs, and editors as indicated on the binary ports page
of the Web site, the site stipulates that it strongly disapproves of any software
for the site that is not free software, at least in the sense of free of charge.
References
Richardson, M. (1999). Larry Wall, the Guru of Perl. 1999-05-01. Linux Journal. http://www.linuxjournal.com/article/3394. Accessed November 29, 2006.
Stein, L. (1996). How Perl Saved the Human Genome Project. The Perl Journal, 1(2). 2001 version archived at Dr. Dobb's Portal: www.ddj.com/dept/architect/184410424. Accessed November 29, 2006. Also via: http://scholar.google.com/scholar?hl=en&lr=&q=cache:vg2KokmwJNUJ:science.bard.edu/cutler/classes/bioinfo/notes/perlsave.pdf+++%22The+Perl+Journal%22+stein. Accessed November 29, 2006.
Wall, L. (1999). Diligence, Patience, and Humility. In: Open Sources: Voices from the Open Source Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA, 127–148.
2.6 BitTorrent
BitTorrent is a next-generation P2P Internet utility. It was created by Bram
Cohen in 2002 and has become extremely widely used, particularly for sharing
popular multimedia files. We discuss it here for two reasons. It represents a next-
generation type of Internet service, called a Web 2.0 service by O'Reilly (2005).
It also facilitates the largest peer-to-peer network according to estimates done
by CacheLogic. BitTorrent works by exploiting the interconnectivity provided
by the Internet, avoiding the bottlenecks that occur if every user tries to get an
entire copy of a file from a single source, as is done in the client-server model of
data exchange. It also differs from conventional peer-to-peer networks, where
exchanges are limited to a single pair of uploaders and downloaders at any given
time. Under the BitTorrent protocol, a central server called a tracker coordinates
the file exchanges between peers. The tracker does not require knowledge of
the file contents and so can work with a minimum of bandwidth, allowing
it to coordinate many peers. Files are thought of as comprising disjoint pieces
called fragments. Initially, a source or seed server that contains the entire file
distributes the fragments to a set of peers. Each peer in a pool or so-called
swarm of peers will at a given point have some of the fragments from the
complete file but lack some others. These missing fragments are supplied by
being exchanged transparently among the peers in a many-peer-to-many-peer
fashion. The exchange protocol used by BitTorrent exhibits a fundamental,
remarkable, and paradoxical advantage: the more people who want to have
access to a file, the more readily individual users can acquire a complete copy
of the file, since there will be more partners to exchange fragments with. Thus
BitTorrent is an example of what has been called a Web 2.0 service. It exhibits
a new network externality principle, namely, that "the service gets better the
more people use it" (O'Reilly, 2005). In contrast to earlier Internet services
like Akamai, BitTorrent can be thought of as having a BYOB (Bring Your Own
Bottle) approach. Once again, O'Reilly expresses it succinctly:
...every BitTorrent consumer brings his own resources to the party. There's an
implicit "architecture of participation," a built-in ethic of cooperation, in which the
service acts primarily as an intelligent broker, connecting the edges to each other
and harnessing the power of the users themselves.
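The fragment-exchange mechanism described above can be illustrated with a toy simulation. This is only a sketch of the idea: the peer names, piece counts, and round structure are invented for illustration, and the real protocol adds trackers, tit-for-tat choking, and piece-selection strategies that are omitted here.

```python
import random

def simulate_swarm(num_pieces, peer_names, rounds=100):
    """Toy model of BitTorrent-style exchange: a seed holds the whole
    file; each peer starts with a random quarter of the pieces and, in
    each round, fetches one missing piece from any participant that
    holds it (many-peer-to-many-peer exchange)."""
    pieces = set(range(num_pieces))
    have = {"seed": set(pieces)}
    for name in peer_names:
        have[name] = set(random.sample(sorted(pieces), num_pieces // 4))
    for _ in range(rounds):
        for name in peer_names:
            missing = pieces - have[name]
            if not missing:
                continue
            wanted = missing.pop()
            # any other participant (seed or peer) may supply the piece
            if any(wanted in held for other, held in have.items() if other != name):
                have[name].add(wanted)
    # True when every peer has assembled the complete file
    return all(have[name] == pieces for name in peer_names)

print(simulate_swarm(20, ["alice", "bob", "carol"]))  # True
```

Because the seed remains available, each peer gains a missing piece every round; the point of the real protocol is that, once fragments spread through the swarm, peers can keep supplying one another even though the seed's bandwidth stays constant.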
Developer Cohen’s software also avoids the so-called leeching effect that
occurs in P2P exchangesunder which people greedily download files but self-
ishly refuse to share their data by uploading. The BitTorrent protocol rules
requirethat downloadersof fragmentsalso haveto upload fragments.In Cohen’s
program,the more auser shares hisfiles, the faster thetorrent of fragmentsfrom
other users downloads to his computer. This reflects a motto Cohen had printed
on T-shirts: "Give and ye shall receive" (Thompson, 2005). It's a "share and
share alike" principle. BitTorrent now carries an enormous share of world Inter-
net traffic, currently one-third according to the estimate by CacheLogic, though
some aspects of the estimate have been disputed. The BitTorrent license is a
custom variation of the Jabber license. It reserves the BitTorrent™ name as a
trademark, which prevents the name from drifting into generic usage. Although
the BitTorrent source code is open source, its license does not appear to be OSI-
certified or a free software license, partly because of some relicensing
restrictions, although the license definition seems to be GPL-like in character.
The BitTorrent company is now acting as a licensed distributor for movie
videos, an important new open source business model.
References
O’Reilly, T. (2005). What is Web 2.0 Design Patterns and Business Models for the
Next Generation of Software. http://www.oreillynet.com/pub/a/oreilly/tim/news/
2005/09/30/what-is-web-20.html.Accessed November 29, 2006.
Thompson,C. (2005). The BitTorrentEffect. Wired.com,Issue 13.01. http://wired.com/
wired/archive/13.01/bittorrent.html.Accessed November 29, 2006.
2.7 BIND
BIND is a pervasive and fundamental Internet infrastructure utility. From an
open source business model point of view, BIND is instructive precisely because
it led to such an unexpected business model. The model did not benefit the
original developers and was not based on the software itself. Instead, it was
based indirectly on information services that were enabled by the software.
The acronym BIND stands for Berkeley Internet Name Domain. BIND is
an Internet directory service. Its basic function is to implement domain name
services by translating symbolic host domain names into numeric IP addresses,
using distributed name servers. The DNS (Domain Name System) environ-
ment is an enormous operation. It relies on domain name data stored across
billions of resource records distributed over millions of files called zones (Sala-
mon, 1998/2004). The zones are kept on what are called authoritative servers,
which are distributed over the Internet. Authoritative servers handle DNS name
requests for zones they have data on and request information from other servers
otherwise. Large name servers may have tens of thousands of zones. We will
not delve further into how the system works.
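The name-to-address translation that BIND performs can be exercised from any program through the system's resolver, which queries DNS name servers (commonly running BIND). A minimal Python sketch, using only the standard library; the hostname is just an example:

```python
import socket

def resolve(hostname):
    """Ask the system resolver to translate a symbolic host name
    into its IPv4 addresses, as DNS name servers do."""
    try:
        # gethostbyname_ex returns (canonical name, aliases, addresses)
        _, _, addresses = socket.gethostbyname_ex(hostname)
        return addresses
    except socket.gaierror:
        return []  # the name does not resolve

print(resolve("localhost"))  # typically ['127.0.0.1'], answered locally
```

A resolver first consults local sources (such as /etc/hosts) and then the distributed hierarchy of authoritative servers described above, so the caller never needs to know where the zone data actually lives.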
The general idea of using symbolic names for network communications
was originally introduced to support e-mail on the ARPANET, long before the
Internet, in fact going back to the mid-1970s. As network traffic increased,
the initial implementation approaches had to be completely revised (see RFC
805 from 1982, as well as RFC 881, etc.). The original version of the new
BIND software was written in the early 1980s by graduate students at UC
Berkeley as part of the development of Berkeley Unix under a DARPA grant.
Its current version, BIND 9, is much more secure than earlier BIND versions and
is outsourced by the Internet Software Consortium to the Nominum Corporation
for continued development and maintenance.
The business opportunity that emerged from BIND is interesting because
it turned out to be neither the software itself nor the service and marketing of
the software that was profitable. Instead, the profit potential lay in the service
opportunity provided by the provision of the domain names. It is highly ironic
that the project maintainer for BIND, which is arguably "the single most mission
critical program on the Internet," had "scraped by for decades on donations and
consulting fees," while the business based on the registration of domain names
that was in turn based on BIND thrived (O'Reilly, 2004). As O'Reilly observed,

...domain name registration – an information service based on the software –
became a business generating hundreds of millions of dollars a year, a virtual
monopoly for Network Solutions, which was handed the business on government
contract before anyone realized just how valuable it would be. The...opportunity
of the DNS was not a software opportunity at all, but the service of managing the
namespace used by the software. By a historical accident, the business model
became separated from the software.
References
O’Reilly, T. (2004). Open Source Paradigm Shift. http://tim.oreilly.com/articles/
paradigmshift
0504.html.Accessed November 29, 2006.
Salamon,A. (1998/2004). DNS Overviewand General References. http://www.dns.net/
dnsrd/docs/whatis.html.Accessed January 10, 2007.
3
The Open Source Platform
We use the term open source platform to refer to the combination of open
operating systems and desktops, support environments like GNU, and under-
lying frameworks like the X Window System, which together provide a matrix
for user interaction with a computer system. The provision of such an open
infrastructure for computing has been one of the hallmark objectives of the
free software movement. The GNU project sponsored by the Free Software
Foundation (FSF) had as its ultimate objective the creation of a self-contained
free software platform that would allow computer scientists to accomplish all
their software development in a free environment uninhibited by proprietary
restrictions. This chapter describes these epic achievements in the history of
computing, including the people involved and the technical and legal issues that
affected the development. We shall also examine the important free desktop
application GIMP, which is intended as a free replacement for Adobe Photo-
shop. We shall reserve the discussion of the GNU project itself to a later
chapter.
The root system that serves as the reference model for open source operating
systems is Unix, whose creation and evolution we shall briefly describe. Over
time, legal and proprietary issues associated with Unix opened the door to Linux
as the signature open source operating system, though major free versions
of Unix continued under the BSD (Berkeley Software Distribution) aegis.
The Linux operating system, which became the flagship open source project,
evolved out of a simple port of Unix to a personal computer environment,
but it burgeoned rapidly into the centerpiece project of the movement. To be
competitive with proprietary platforms in the mass market, the Linux and free
Unix-like platforms in turn required high-quality desktop-style interfaces. It
was out of this necessity that the two major open desktops, GNOME and KDE,
emerged. Underlying the development of these desktops was the extensive,
longstanding development effort represented by the X Window System, an
open source project begun in the early 1980s at MIT that provided the basic
underlying windowing capabilities for Unix-like systems.
3.1 Operating Systems
This first section addresses the creation of Unix and its variants and the emer-
gence of Linux.
3.1.1 Unix
The Unix operating system was developed at AT&T's Bell Telephone Labora-
tories (BTL) during the 1970s. An essential factor in the historic importance of
Unix lay in the fact that it was the first operating system written in a high-level
language (C), although this was not initially so. This approach had the critical
advantage that it made Unix highly portable to different hardware platforms.
Much of the development work on Unix involved distributed effort between
the initial developers at AT&T's BTL and computer scientists at universities,
especially the development group at the University of California (UC) Berkeley that
added extensive capabilities to the original AT&T Unix system during the 1980s.
Remember that the network communications infrastructure that would greatly
facilitate such distributed collaboration was at this time only in its infancy. While
Unix began its life in the context of an industrial research lab, it progressed with
the backing of major academic and government (DARPA) involvement.
From the viewpoint of distributed collaboration and open source develop-
ment, the invention and development of Unix illustrates the substantial benefits
that can accrue from open development, as well as the disadvantages for inno-
vation that can arise from proprietary restrictions in licensing. The Unix story
also illustrates how distributed collaboration was already feasible prior to the
Internet communications structure, but also how it could be done more effec-
tively once even more advanced networked communications became available
(which was partly because of Unix itself).
Initially the AT&T Unix source code was freely and informally exchanged.
This was a common practice at the time and significantly helped researchers
in different organizations in their tinkering with the source code, fixing bugs,
and adding features. Eventually, however, licensing restrictions by AT&T, con-
siderable charges for the software to other commercial organizations, and legal
conflicts between AT&T and UC Berkeley handicapped the development
of Unix as an open source system, at least temporarily during the early 1990s.
The early 1990s was also precisely the time when Linux had started to emerge
and quickly seize center stage as the most popular Unix-like system, offered
with open source, licensed under the General Public License (GPL), and free
of charge.
Necessity is the mother of invention – or at least discomfort is. The devel-
opment of Unix at Bell Labs started when Ken Thompson and Dennis Ritchie
wrote an operating system in assembly language for a DEC PDP-7. The project
was their reaction to Bell Labs' withdrawal from the Multics time-sharing oper-
ating system project with MIT and General Electric. Even though the Multics
project had problems (like system bloating), it was an important and innovative
system, and the two programmers had become accustomed to it. Its unavailabil-
ity and replacement by an inferior, older system frustrated them (Scott, 1988).
In response, they decided to design a new, simple operating system to run on
their DEC machine. Interestingly, the idea was not to develop a BTL corporate
product but just to design and implement a usable and simple operating sys-
tem that the two developers could comfortably use. In addition to the core of
the operating system, the environment they developed included a file system,
a command interpreter, some utilities, a text editor, and a formatting program.
Since it provided the functionality for a basic office automation system, Thomp-
son and Ritchie persuaded the legal department at Bell Labs to be the first users
and rewrote their system for a PDP-11 for the department.
In 1973, a decisive if not revolutionary development occurred: the operating
system was rewritten in C. The C language was a new high-level programming
language that Ritchie had just invented and which was, among
other things, intended to be useful for writing software that would ordinarily
have been written in assembly language, like operating systems. This extremely
innovative approach meant that Unix could now be much more easily updated
or ported to other machines. In fact, within a few years, Unix had been ported to
a number of different computer platforms – something that had never been done
before. The use of a higher level language for operating system implementation
was a visionary development because, prior to this, operating systems had always
been closely tied to the assembly language of their native hardware. The high-
level language implementation made the code for the operating system "much
easier to understand and to modify" (Ritchie and Thompson, 1974), which
was a key cognitive advantage in collaborative development. Furthermore, as
Raymond (1997) observed:
If Unix could present the same face, the same capabilities, on machines of many
different types, it could serve as a common software environment for all of them.
No longer would users have to pay for complete new designs of software every time
a machine went obsolete. Hackers could carry around software toolkits between
different machines, rather than having to re-invent the equivalents of fire and the
wheel every time.
The Unix environment was also elegantly and simply designed as a toolkit of
simple programs that could easily interact. Allegedly, the terse character of the
Unix and C commands was just an artifact of the fact that the teletype machines
that communicated with the PDP were quite slow: so the shorter the commands
(and the error messages), the more convenient it was for the user!
The professional dissemination of their work by Ritchie and Thompson also
strongly affected the rapid deployment of Unix. At the end of 1973, they gave
a report on Unix at the Fourth ACM Symposium on Operating Systems Principles,
which was later published in the Communications of the ACM (Ritchie and Thompson,
1974). As Tom Van Vleck observed, this report still "remains one of the best
and clearest pieces of writing in the computer field" (Van Vleck, 1995). The
ACM symposium presentation caught the eye of a Berkeley researcher who
subsequently persuaded his home department to buy a DEC on which to install
the new system. This initiated Berkeley's heavy historic involvement in Unix
development, an involvement that was further extended when Ken Thompson
went to UC Berkeley as a visiting professor during 1976 (Berkeley was his
alma mater). The article's publication precipitated further deployment.
The system was deployed widely and rapidly, particularly in universities. By
1977, there were more than 500 sites running Unix. Given the legal restrictions
that the AT&T monopoly operated under, because of the so-called 1956 consent
decree with the U.S. Department of Justice, AT&T appeared to be prohibited
from commercially marketing and supporting computer software (Garfinkel and
Spafford, 1996). Thus, de facto, software was not considered a profit center
for the company. During this time, the source code for Unix, not merely the
binary code, was made available by AT&T to universities and the government,
as well as to commercial firms. However, the distribution was under the terms of
an AT&T license and an associated nondisclosure agreement that was intended
to control the release of the Unix source code. Thus, although the source code
was open in a certain sense, it was strictly speaking not supposed to be disclosed,
except to other license recipients who already had a copy of the code. University
groups could receive a tape of the complete source code for the system for
about $400, which was the cost of the materials and their distribution, though
the educational license itself was free.
The distribution of Unix as an operating system that was widely used in
major educational environments, and its use as part of their education by many of the
top computer science students in the country, had many side benefits. For exam-
ple, it meant – going forward – that there would be within a few years literally
thousands of Unix-savvy users and developers emerging from the best research
universities who would further contribute to the successful dispersion, develop-
ment, and entrenchment of Unix. A Unix culture developed that was strongly
dependent on having access to the C source code for the system, the code that
also simultaneously served as the documentation for the system programs. This
access to the source code greatly stimulated innovation. Programmers could
experiment with the system, play with the code, and fix bugs, an advantage for
the development of Unix that would have been nonexistent if the distributions
of the Unix source and system utilities had only been available in binary form.
By 1978, the Berkeley Computer Systems Research Group, including stu-
dents of Ken Thompson at Berkeley, were making add-ons for Unix and dis-
tributing them, always with both the binary executable and the C source code
included, basically for the cost of shipping and materials, as so-called Berkeley
Software Distributions of Unix – but only as long as the recipient had a valid
Unix source license from AT&T. The license under which the BSD application
code itself was distributed was very liberal. It allowed the BSD-licensed open
source code, or modifications to it, to be incorporated into closed, proprietary
software whose code could then be kept undisclosed.
Legal issues invariably impact commercializable science. An important legal
development occurred in 1979 when AT&T released the 7th version of Unix.
By that point AT&T was in a legal position to sell software, so it decided that
it was now going to commercialize Unix, no longer distributing it freely and
no longer disclosing its source code. Companies like IBM and DEC could
receive the AT&T source-code licenses for a charge and sometimes even the
right to develop proprietary systems that used the trademark Unix, like DEC's
Ultrix. Almost inevitably the companies created differentiated versions of Unix,
leading eventually to a proliferation of incompatible proprietary versions. The
UC Berkeley group responded to AT&T's action at the end of 1979 by making
its next BSD release (3BSD) a complete operating system, forking the Unix
development (Edwards, 2003). Berkeley, which was now strongly supported by
DARPA, especially because of the success of the virtual memory implementation
in 3BSD, would now come to rival Bell Labs as a center of Unix development.
Indeed, it was BSD Unix that was selected by DARPA as the base system for
the TCP/IP protocol that would underlie the Internet. During the early 1980s the
Berkeley Computer Systems Research Group introduced improvements to Unix
that increased the popularity of its distributions with universities, especially
because of its improved networking capabilities.
The networking capabilities provided by BSD 4 had what might be described
as a meta-effect on software development. BSD version 4.2 was released in
1983 and was much more popular with Unix vendors for several years than
AT&T's commercial Unix System V version (McKusick, 1999). But version 4
didn't merely improve the capabilities of an operating system. It fundamentally
altered the very way in which collaboration on software development could
be done because it provided an infrastructure for digitally transmitting not
only communication messages but also large amounts of source code among
remotely located developers. This created a shared workspace where the actual
production artifacts could be worked on in common.
However, AT&T did not stand still with its own Unix versions. The AT&T
System V variants, starting with the first release in 1983 and continuing to
release 4 of System V in 1989 (also designated SVR4), eventually incorporated
many of the improvements to Unix that had been developed at Berkeley (Scott,
1988). Indeed, release 4 of System V had over a million installations. However,
these commercial releases no longer included source code. System V was the
first release that AT&T actually supported (because it was now a commercial
product) and ultimately became the preferred choice for hardware vendors,
partly because its operating system interfaces followed certain formal stan-
dards better (Wheeler, 2003). While the Unix license fees that AT&T charged
universities were nominal, those for commercial firms had ranged as high as
a quarter of a million dollars, though AT&T lowered the costs of the commercial
license with the release of System V. Many private companies developed their
own private variations (so-called flavors) of Unix based on SVR4 under license
from AT&T. Eventually, AT&T sold its rights to Unix to Novell after release 4
of System V in the early 1990s.
BSD Unix evolved both technically and legally toward free, open source
status, divorced from AT&T restrictions. Throughout the 1980s, Berkeley's
Computer Systems Research Group extensively redeveloped Unix, enhancing
it – and rewriting or excising almost every piece of the AT&T Unix code. The
BSD distributions would ultimately be open source and not constrained by the
AT&T licensing restrictions. Indeed, by 1991, a BSD system (originally Net/2)
that was almost free of any of the AT&T source code was released and
freely redistributed. However, in response to this action by Berkeley, AT&T,
concerned that its licensing income would be undermined, sued Berkeley in
1992 for violating its licensing agreement with AT&T. Later Berkeley counter-
sued AT&T for not giving Berkeley adequate credit for the extensive BSD code
that AT&T had used in its own System V! The dispute was settled by 1994. An
acceptable, free version of Unix called 4.4BSD-Lite was released soon after the
settlement. All infringements of AT&T code had been removed from this code,
even though they had been relatively minuscule in any case.
The legal entanglements for BSD Unix caused a delay that created a key
window of opportunity for the fledgling Linux platform. Indeed, the timing of
the legal dispute was significantly disruptive for BSD Unix because it was pre-
cisely during this several-year period, during which UC Berkeley was stymied
by litigation with AT&T, that Linux emerged and rapidly gained in popularity.
That said, it should be noted, in the case of Linux, that half of the utilities that
come packaged with Linux in reality come from the BSD distribution – and of
course Linux itself in turn depends heavily on the free or open tools developed
by the GNU project (McKusick, 1999).
The BSD open source versions that forked from 4.4BSD-Lite, and which
were essentially free/open Unix-like clones, included four systems: OpenBSD,
NetBSD, BSDI, and most significantly FreeBSD. These versions were all
licensed under the BSD license (at least the kernel code and most new code),
which unlike the GPL permits both binary and source code redistribution. It
includes the right to make derivative works that can be taken proprietary, as
long as credit is given for the code done by the Berkeley group. OpenBSD
is the second most popular of these free operating systems, after FreeBSD. It
has recognized, empirically verified, strong security performance, as we briefly
elaborated on in Chapter 2. NetBSD was designed with the intention of being
portable to almost any processor. BSDI was the first commercial version of Unix
for the widespread Intel platform (Wheeler, 2003). FreeBSD is the most popu-
lar of all the free operating systems, after Linux. Unlike Linux, it is developed
under a single Concurrent Versions System (CVS) revision tree. Additionally,
FreeBSD is a "complete operating system (kernel and userland)" and has the
advantage that both the "kernel and provided utilities are under the control
of the same release engineering team, (so) there is less likelihood of library
incompatibilities" (Lavigne, 2005). It is considered to have high-quality net-
work and security characteristics. Its Web site describes it as providing "robust
network services, even under the heaviest of loads, and uses memory efficiently
to maintain good response times for hundreds, or even thousands, of simulta-
neous user processes" (http://www.freebsd.org/about.html, accessed January
5, 2005). Yahoo uses FreeBSD for its servers, as does the server survey Web
site NetCraft. It is considered an elegantly simple system that installs easily
on x86-compatible PCs and a number of other architectures. FreeBSD is also
considered binary compatible with Linux, in the sense that commercial appli-
cations that are distributed as binaries for Linux generally also run on FreeBSD,
including software like Matlab and Acrobat.

For more information on the Unix operating system and its history, we refer
the interested reader to Raymond (2004) and Ritchie (1984).
Open Standards for Unix-like Operating Systems
Standards are extremely important in engineering. And, as Andrew Tanenbaum
quipped: "The nice thing about standards is that there are so many of them to
choose from" (Tanenbaum, 1981)! Standards can be defined as openly avail-
able and agreed-upon specifications. For example, there are international stan-
dards for HTML, XML, SQL, Unicode, and many other hardware and software
systems, sanctioned by a variety of standards organizations like the W3C con-
sortium or the International Organization for Standardization (ISO). Publicly
available standards are important for software development because they define
criteria around which software products or services can be built. They ensure
that software implementers are working toward an agreed-upon shared target.
They help guarantee the compatibility and interoperability of products from
different manufacturers or developers.
Standards help control emergent chaos: something that was definitely hap-
pening in the development of Unix-like systems. The closed versions of Unix
developed by AT&T and eventually the various Unix vendors naturally tended
to diverge increasingly over time, partly because the code bases were pro-
prietary and the individual hardware vendors' needs were specialized. Such
divergences may have the advantage of allowing useful specialized versions of
a system to emerge, tailored to specific hardware architectures, but they also
make it increasingly difficult for software developers to develop application
programs that work in these divergent environments. Divergence also increases the learn-
ing difficulties of users who work or migrate between the different variations.
Establishing accepted, widely recognized standards is a key way of guarding
against the deleterious effects of the proliferation of such mutating clones of an
original root system.
Operating systems exhibit the scale of complexity that necessitates standards.
In the context of operating systems, standards can help establish uniform user
views of a system as well as uniform system calls for application programs.
Two related standards for Unix (which have basically merged) are the POSIX
standard and the Single Unix Specification, both of which were initiated in
the mid-1980s as a result of the proliferation of proprietary Unix-like systems.
Standards do not entail disclosing source code as in the open source model,
but they do at least ensure a degree of portability for applications and users,
and mitigate the decreasing interoperability that tends to arise when
closed source systems evolve, mutate, and diverge. Thus, while open source
helps keep divergence under control by making system internals transparent and
reproducible, open standards attempt to help control divergence by maintaining
coherence in the external user and programming interfaces of systems.
POSIX refers to a set of standards, defined by the IEEE and recognized by
the ISO, which is intended to standardize the Application Program Interface
for programs running on Unix-like operating systems. POSIX is an acronym for
Portable Operating System Interface, with the appended X due to the Unix
connection. The name was proposed (apparently humorously) by Stallman, who
is prominent for his role in the Free Software Foundation and movement. POSIX
was an effort by a consortium of vendors to establish a single standard for Unix,
making it simpler to port applications across different hardware platforms. The
user interface would look the same on different platforms, and programs that
ran on one POSIX system would also run on another. In other words, the user
interface would be portable, as would the Application Programmer Interface,
rather than the operating system.
The POSIX standards include a compliance suite called the Conformance
Test Suite. Actually, the term compliance is weaker than the stronger term
conformance, which implies that a system supports the POSIX standards in their
entirety. The POSIX standards address both user and programming software
interfaces. For example, the Korn Shell is established as the standard user
command-line interface, as are an extensive set of user commands and utilities
like the command for listing files (ls). These standards fall under what is called
POSIX.2. The standards also define the C programming interface for system
calls, including those for I/O services, files, and processes, under what is called
POSIX.1. The POSIX standards were later integrated into the so-called Single
Unix Specification, which had originated at about the same time as the POSIX
standards. The Single Unix Specification is the legal definition of the Unix
system under the Unix trademark owned by the Open Group. The Open Group
makes the standards freely available on the Web and provides test tools and
certification for the standards.
References
Edwards, K. (2003). Technological Innovation in the Software Industry: Open Source
Development. Ph.D. Thesis, Technical University of Denmark.
Garfinkel, S. and Spafford, G. (1996). Practical Unix and Internet Security. O'Reilly
Media, Sebastopol, CA.
Lavigne, D. (2005). FreeBSD: An Open Source Alternative to Linux. http://www.
freebsd.org/doc/en_US.ISO8859-1/articles/linux-comparison/article.html.
Accessed February 10, 2007.
McKusick, M. (1999). Twenty Years of Berkeley Unix: From AT&T-Owned to Freely
Redistributable. In: Open Sources: Voices from the Open Source Revolution, M.
Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA,
31–46.
Raymond, E. (2004). The Art of UNIX Programming. Addison-Wesley Professional
Computing Series. Pearson Education Inc. Also: Revision 1.0, September 19, 2003.
http://www.faqs.org/docs/artu/. Accessed January 10, 2007.
Raymond, E.S. (1997). A Brief History of Hackerdom. http://www.catb.org/esr/
writings/cathedral-bazaar/hacker-history/. Accessed November 29, 2006.
Ritchie, D. (1984). The Evolution of the UNIX Time-Sharing System. Bell System
Technical Journal, 63(8), 1–11. Also: http://cm.bell-labs.com/cm/cs/who/dmr/
hist.pdf. Accessed January 10, 2007.
Ritchie, D. and Thompson, K. (1974). The UNIX Time-Sharing System. Communications
of the ACM, 17(7), 365–375. Revised version of paper presented at: Fourth
ACM Symposium on Operating System Principles, IBM Watson Research Center,
Yorktown Heights, New York, October 15–17, 1973.
Scott, G. (1988). A Look at UNIX. U-M Computing News. University of Michigan
Computing Newsletter, 3(7).
Tanenbaum, A. (1981). Computer Networks, 2nd edition. Prentice Hall, Englewood
Cliffs, NJ.
Van Vleck, T. (1995). Unix and Multics. http://multicians.org/unix.html. Accessed
January 10, 2007.
Wheeler, D. (2003). Secure Programming for Linux and Unix HOWTO. http://www.
dwheeler.com/secure-programs. Accessed November 29, 2006.
3.1.2 Linux
Linux is the defining, triumphant mythic project of open source. It illustrates
perfectly the paradigm of open development and the variegated motivations
that make people initiate or participate in these projects. It led to unexpected,
unprecedented, explosive system development and deployment. It represents the
metamorphosis of an initially modest project, overseen by a single individual,
into a global megaproject.
Linux was the realization of a youthful computer science student's dream of
creating an operating system he would like and that would serve his personal
purposes. It began as a response to limitations in the Minix PC implementation
of Unix. As described previously, Unix had originally been freely and widely
distributed at universities and research facilities, but by 1990 it had become
both expensive and restricted by a proprietary AT&T license. An inexpensive
Unix clone named Minix, distributed with its source code, which could run on
PCs and used no AT&T code in its kernel, compilers, or utilities, had been
developed by Professor Andrew Tanenbaum for use in teaching operating systems
courses. In 1991, Linus Torvalds, then an undergraduate student at the
University of Helsinki, got a new PC, his first, based on an Intel 386
processor. The only available operating systems for the PC were DOS, which
lacked multitasking, and Minix. Linus bought a copy of Minix and tried it on
his PC, but was dissatisfied with its performance. For example, it lacked
important features like a terminal emulator
that would let him connect to his school’s computer. A terminal emulator is a
program that runs on a PC and lets it interact with a remote, multiuser server.
This is different from a command-line interpreter or shell. Terminal emulators
were frequently used to let a PC user log on to a remote computer to execute
programs available on the remote machine. The familiar Telnet program is a
terminal emulator that works over a TCP/IP network and lets the PC running
it interact with the remote server program (SSH would now be used). Commands
entered through the Telnet prompt are transmitted over the network and
executed as if they had been directly entered on the remote machine's console.
Linus implemented his own terminal emulator separate from Minix and also
developed additional, Minix-independent programs for saving and transferring
files.
This was the beginning of the Linux operating system. The system's name is
an elision of the developer's first name, Linus, and Unix (the operating
system it is modeled on). Linux is said to be a Unix-like operating system in
the sense that its system interfaces or system calls are the same as those of
Unix, so programs that work in a Unix environment will also work in a Linux
environment. It would be worthwhile for the reader to look up a table of
Linux system calls and identify the function of some of the major calls to
get a sense of what is involved.
The post that would be heard round the world arrived in late August 1991.
Linus posted the note on the Usenet newsgroup comp.os.minix (category:
Computers > Operating Systems > Minix), a newsgroup of which Linus was a
member, dedicated to discussion of the Minix operating system. He announced
that he was developing a free operating system for 386(486) AT clones. These
networked communication groups, which had first become available in the
1980s, would be key enabling infrastructure for the kind of distributed,
collaborative development Linux followed. The group provided a forum for
Linus to tell people what he wanted to do and to attract their interest. The
posted message asked if anyone in the newsgroup had ideas to propose for
additional features for his system. The original post follows:
From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: What would you like to see most in minix?
Summary: small poll for my new operating system
Message-ID: <1991Aug25.205708.9541@klaava.Helsinki.FI>
Date: 25 Aug 91 20:57:08 GMT
Organization: University of Helsinki
Hello everybody out there using minix –
I'm doing a (free) operating system (just a hobby, won't be big and professional
like gnu) for 386(486) AT clones. This has been brewing since april, and is starting
to get ready. I'd like any feedback on things people like/dislike in minix, as my OS
resembles it somewhat (same physical layout of the file-system (due to practical
reasons) among other things).
I've currently ported bash(1.08) and gcc(1.40), and things seem to work. This
implies that I'll get something practical within a few months, and I'd like to know
what features most people would want. Any suggestions are welcome, but I won't
promise I'll implement them :-)
Linus (torvalds@kruuna.helsinki.fi)
PS. Yes – it's free of any minix code, and it has a multi-threaded fs. It is NOT
portable (uses 386 task switching etc), and it probably never will support anything
other than AT-harddisks, as that's all I have :-(.
Though Linux would become a dragon-slayer of a project, the initial post was
a “modest proposal” indeed, yet it does convey the sense of the development.
It was motivated by personal need and interest. It was to be a Unix-like
system. It was to be free, with a pure pedigree: just as Minix contained none
of the proprietary AT&T Unix code, Linux too would contain none of the Minix
code. Linus wanted suggestions on additional useful features and
enhancements. A few basic Unix programs had already been implemented on a
specific processor, but the scale was small and the system was not even
planned to be “ported” (adapted and implemented) on other machines. A little
later, Linus posted another engaging e-mail to the Minix newsgroup:
From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: Free minix-like kernel sources for 386-AT
Message-ID: <1991Oct5.054106.4647@klaava.Helsinki.FI>
Date: 5 Oct 91 05:41:06 GMT
Organization: University of Helsinki
Do you pine for the nice days of minix-1.1, when men were men and wrote their
own device drivers? Are you without a nice project and just dying to cut your teeth
on an OS you can try to modify for your needs? Are you finding it frustrating when
everything works on minix? No more all-nighters to get a nifty program working?
Then this post might be just for you :-)
The rest of the post describes Linus' goal of building a stand-alone operating
system independent of Minix. One has to smile at the unpretentious and
enthusiastic tone of the e-mail. The interest in making a “nifty program”
sort of says it all.
Linux would soon become a model of Internet-based development, which itself
relied on networked communication and networked file sharing. Linus
encouraged interested readers to download the source code he had written and
made available on an FTP server. He wanted the source to be easily available
over FTP and inexpensive (Yamagata, 1997). Thus, in addition to letting
collaborators communicate with one another, the networked environment
provided the means for the rapid dissemination of revisions to the system.
Potential contributors were asked to download the code, play with the system
developed so far, report any corrections, and contribute code. The patch
program had already been introduced some years earlier, so the exchange of
code changes was relatively simple.
Releases were to follow quickly as a matter of development strategy, and
Unix functionality was soon matched. Within a month of his October 1991
announcement, ten people had installed the first version on their own
machines. Within two months, 30 people had contributed a few hundred error
reports or contributed utilities and drivers. When the comp.os.linux
newsgroup was subsequently established, it became one of the top five most
read newsgroups (Edwards, 2003). Later in 1991, Linus distributed what he
called version 0.12. It was initially distributed under a license that
forbade charging for distributions. By January 1992, this was changed and
Linux was distributed under the GNU GPL. This was done partly for logistic
reasons (so people could charge for making disk copies available) but
primarily out of Linus' appreciation for the GPL-licensed GNU tools that he
had grown up on and was using to create Linux. Linus gave credit to three
people for their contributions in a December release. By the release of
development version 0.13, most of the patches had been written by people
other than himself (Moon and Sproul, 2000). From that point on, Linux
developed quickly, partly as a result of Linus' release-early, release-often
policy. Indeed, within a year and a half, Linus had released 90 updated
versions (!) of the original software, prior to the first user version 1.0
in 1994. By the end of 1993, Linux had developed sufficiently to serve as a
replacement for Unix. Version 1.0 of the Linux kernel was released in March
1994.
Linux is just the kernel of an operating system. For example, the
command-line interpreter or shell that runs on top of the Linux kernel was
not developed by Linus but came from previously existing free software. There
were of course other key free components used in the composite GNU/Linux
system, like the GNU C compiler developed by the FSF in the 1980s. Successive
Linux versions substantially modified the initial kernel. Linus advocated the
then unconventional use of a monolithic kernel rather than a so-called
microkernel. Microkernels are small kernels that embed only hardware-related
components in the kernel and use message passing to communicate between the
kernel and the separate outer layers of the operating system. This structure
makes microkernel designs portable but also generally slower than monolithic
kernels because of the increased interprocess communication they entail.
Monolithic kernels, on the other hand, integrate the outer layers into the
kernel, which makes them faster. Linux's design uses modules that can be
linked to the kernel at runtime in order to achieve the advantages offered by
the microkernel approach (Bovet and Cesati, 2003). A fascinating historical
exchange about kernel design
occurred in the heated but instructive debate between Minix's Tanenbaum and
Linux's Torvalds in their controversial early 1992 newsgroup discussions.
Refer to Tanenbaum's provocative “Linux is obsolete” thread in the
comp.os.minix newsgroup and Linus' equally provocative response (see also
DiBona et al., 1999, Appendix A).
Bugs in operating systems can be difficult, unpredictable critters, but
Linux's massive collaborative environment was almost ideally suited for these
circumstances. Since operating systems are subject to temporal and real-time
concurrent effects, improvements in the system implementation tend to focus
on the need to remedy bugs – as well as on the need to develop new device
drivers as additional peripherals appear (Wirzenius, 2003). Typical operating
system bugs might occur only rarely or intermittently or be highly
context-dependent. Bugs can be time-dependent or reflect anomalies that occur
only in some complicated context in a concurrent-user, multitasking
environment. The huge number of individuals involved in developing Linux,
especially as informed users, greatly facilitated both exposing and fixing
such bugs, which would have been much more difficult to detect in a more
systematic development approach. The operative diagnostic principle or tactic
from Linus' viewpoint is expressed in his well-known aphorism that “given
enough eyeballs, all bugs are shallow.” On the other hand, there have also
been critiques of Linux's development. For example, one empirical study of
the growth in coupling over successive versions of Linux concluded that
“unless Linux is restructured with a bare minimum of common coupling, the
dependencies induced by common coupling will, at some future date, make Linux
exceedingly hard to maintain without inducing regression faults,” though this
outcome was thought to be avoidable if care were taken to introduce no
additional coupling instances (Schach et al., 2002).
The modular design of Linux's architecture facilitated code development
just as the collaborative framework facilitated bug handling. For example,
the code for device drivers currently constitutes the preponderance of the
Linux source code, in contrast to the much smaller body of code for core
operating system tasks like multitasking. The drivers interface with the
operating system kernel through well-defined interfaces (Wirzenius, 2003).
Thus, modular device drivers are easy to write without the programmer having
a comprehensive grasp of the entire system. This separable kind of structure
is extremely important from a distributed development point of view. It lets
different individuals and groups address the development of drivers
independently of one another, something that is essential given the minimally
synchronized and distributed nature of the open development model. In fact,
the overall structure of the kernel that Linus designed was highly modular.
This
is a highly desirable characteristic of an open source software architecture,
because it is essential for decomposing development tasks into independent
pieces that can be worked on separately and in parallel, with only relatively
limited organizational coordination required. Furthermore, this structure
also allows so-called redundant development (Moon and Sproul, 2000), where
more than one individual or group can simultaneously try to solve a problem,
with the best or earliest outcome ultimately selected for inclusion in the
system.
Linux was portable and functional, and it turned out to be surprisingly
reliable. As we have noted, over its first several years of development,
enough features were added to Linux for it to become competitive as an
alternative to Unix. Its portability was directly related to the design
decision to base Linux on a monolithic core kernel, with hardware-specific
code like device drivers handled by so-called kernel modules. This decision
was in turn directly related to enabling the distributed style of Linux
development. The structure also allowed Linus Torvalds to focus on managing
core kernel development, while others could work independently on kernel
modules (Torvalds, 1999b). Several years after the version 1.0 release of the
system in 1994, Linux was ported to processors other than the originally
targeted 386/486 family, including the Motorola 68000, the Sun SPARC, the
VAX, and eventually many others. Its reliability quickly became superior to
that of Unix. Indeed, Microsoft program manager Valloppillil (1998), in the
first of the now famous confidential Microsoft “Halloween” memos, reported
that the Linux failure rate was two to five times lower than that of
commercially available versions of Unix, according to performance analyses
done internally by Microsoft itself.
The scale of the project continued to grow. The size of the distributed team
of developers expanded almost exponentially. Despite this, the organizational
paradigm remained lean in the extreme. Already by mid-1995, over 15,000
people had submitted contributions to the main Linux newsgroups and mailing
lists (Moon and Sproul, 2000). A decade later, by 2005, there would be almost
700 Linux user groups spread worldwide (http://lugww.counter.li.org/,
accessed January 5, 2007). The 1994 version 1.0 release, which had already
been comparable in functionality to Unix, encompassed 175,000 lines of source
code. By the time version 2.0 was released in 1996, the system had 780,000
lines of source code. The 1998 version 2.1.110 release had a million and a
half lines of code (LOC), of which 30% consisted of code for the kernel and
file system, about 50% for device drivers, and about 20% was hardware
architecture-specific (Moon and Sproul, 2000). The amazing thing was that, to
quote Moon and Sproul (2000): “No (software) architecture group developed
the design; no management team approved the plan, budget, and schedule; no
HR group hired the programmers; no facilities group assigned the office
space. Instead, volunteers from all over the world contributed code,
documentation, and technical support over the Internet just because they
wanted to.” It was an unprecedented tour de force of large-scale distributed
development and led to a blockbuster system.
What motivated such an army of dedicated participants? Many of them
shared the same kind of motivation as Linus: they wanted to add features to
the system so that it could do something useful for their personal benefit.
People also wanted to be known for the good code they developed. Initially,
Linus provided personal acknowledgments for individuals who made significant
contributions. Already by the version 1.0 release in 1994, Linus personally
acknowledged the work of over 80 people (Moon and Sproul, 2000). This version
also began the practice of including a credits file with the source code that
identified the major contributors and the roles they had played. It was up to
the contributors themselves to ask Linus to be included in the credits file.
This kind of reputational reward was another motivation for continued
developer participation in a voluntary context like that of Linux.
Not all participants were equal, however. Linux kernel developer Andrew
Morton, lead maintainer for the Linux production kernel at the time, observed
in 2004 (at the Forum on Technology and Innovation) that of the 38,000 most
recent patches to the Linux kernel, made by roughly 1,000 developers, some
37,000 – about 97% – came from a subset of 100 developers who were paid by
their companies to work on Linux. It is worth perusing the Linux credits
file. For example, you might try to observe any notable demographic patterns,
like the country of origin of participants, their industrial or academic
affiliations based on their e-mail addresses, the apparent sex of
participants, and the like.
Decisions other than technical ones were key to Linux. Managerial
innovativeness was central to its successful development. Technical and
managerial issues could very well intertwine. For example, after the original
system was written for the Intel 386 and then ported in 1993 to the Motorola
68000, it became clear to Linus that he had to redesign the kernel
architecture so that a greater portion of the kernel could serve different
processor architectures. The new architectural design not only made the
kernel code far more easily portable but also more modular. Organizationally,
this allowed different parts of the kernel to be developed in parallel
(Torvalds, 1999a, b) and with less coordination, which was highly
advantageous in the distributed development environment. The way in which
software releases were handled was also shaped by market effects. A simple
but important managerial/marketing decision in this connection was the use of
a dual track for releases. The dual
track differentiated between stable releases, which could be used confidently
by people who merely wanted the operating system as a platform for their
applications work, and development releases, which were less stable, still
under development, and included the newest feature additions. This kept two
potentially disparate audiences happy: the developers had flexibility, and
the end users had certainty. The distinction between developer and stable
releases also supported the “release-early, release-often” policy that
facilitated rapid development. The release-numbering system reflected the
categorization and is worth understanding. Odd-numbered release series such
as 2.3 (or its subtree members like 2.3.1 and 2.3.2) corresponded to
developer or experimental releases. Stable releases had an even-numbered
second digit, like 2.0 or 2.2. Once a stable release was announced, a new
developer series would start with the next higher (odd) number (such as 2.3
in the present case). Amazingly, there were almost 600 releases of all kinds
between the 0.01 release in 1991 that started it all and the 2.3 release in
1999 (Moon and Sproul, 2000).
Though development was distributed and team-based, the project retained its
singular leadership. While Linus displayed a somewhat self-deprecating and
mild-mannered leadership or management style, it was ultimately he who called
the shots. He decided which patches were accepted and which additional
features were incorporated, announced all releases, and, at least in the
beginning of the project, reviewed all contributions personally and
communicated by e-mail with every contributor (Moon and Sproul, 2000). If it
is true that enough eyeballs make all bugs shallow, it also appears to be
true that in the Linux world there was a single governing pair of eyes
overseeing and ensuring the quality and integral vision of the overall
process. So one might ask again: is it a cathedral (a design vision defined
by a single mind) or a bazaar?
The choice of the GPL has been decisive to the developmental integrity of
Linux because it is instrumental in preventing the divergence of evolving
versions of the system. In contrast, we have seen how proprietary pressures
in the development of Unix systems encouraged the divergence of Unix
mutations, though the POSIX standards also act against this. This
centralizing control provided by the GPL for Linux was well articulated by
Robert Young (1999) of Red Hat in a well-known essay, where he argued that
unlike proprietary development: “In Linux the pressures are the reverse. If
one Linux supplier adopts an innovation that becomes popular in the market,
the other Linux vendors will immediately adopt that innovation. This is
because they have access to the source code of the innovation and it comes
under a license that allows them to use it.” Thus, open source creates
“unifying pressure to conform to a common reference point – in effect an open
standard – and it removes the intellectual property barriers that would
otherwise inhibit this convergence” (Young, 1999).
This is a compelling argument not only for the stability of Linux but also for
the merits of the GPL in rapid, innovative system development.
Writing code may be a relatively solitary endeavor, but the development of
Linux was an interactive social act. We have remarked on the organizational
structure of Linux development, the motivations of its participants, and the
personal characteristics of its unique leader. It is also worthwhile to
describe some characteristics of the social matrix in which the project
operated; for more detail see the discussion in Moon and Sproul (2000). To
begin with, Linus' participation in the Usenet newsgroup comp.os.minix
preceded his original announcement to the community of his Linux adventure.
This was a large online community, with about 40,000 members by 1992. The
group that would develop Linux was a self-selected subset that sat on top of
this basic infrastructure, which in turn sat on top of an e-mail and network
structure. Of course, by word of e-mail the Linux group would quickly spread
beyond the initial Minix newsgroup.
Communities like those that developed Linux exhibit a sociological
infrastructure. This includes their group communication structure and the
roles ascribed to the different members. In Linux development, group
communication was handled via Usenet groups and various Linux mailing lists.
Within months of the initial project announcement, the original single
mailing list (designated Linux-activists) had 400 members. At present there
are hundreds of such mailing lists targeted at different Linux distributions
and issues. The comp.os.linux newsgroup was formed by mid-1992. Within a few
years there were literally hundreds of such Linux-related newsgroups
(linuxlinks.com/Links/USENET). The Linux-activists mailing list was the first
list for Linux kernel developers, but others followed. It is worth looking up
the basic information concerning the Linux kernel mailing list at
www.tux.org. Check some of the entries in the site's hyperlink index to
understand how the process works. If you are considering becoming a
participant in one of the lists, be careful which list you subscribe to and
consider the advice in the FAQ (frequently asked questions) at the site,
which warns:
Think again before you subscribe. Do you really want to get that much traffic in
your mailbox? Are you so concerned about Linux kernel development that you will
patch your kernel once a week, suffer through the oopses, bugs and the resulting
time and energy losses? Are you ready to join the Order of the Great Penguin, and
be called a “Linux geek” for the rest of your life? Maybe you're better off reading
the weekly “Kernel Traffic” summary at http://www.kerneltraffic.org/.
The kernel mailing list is the central organizational tool for coordinating
kernel developers. Moon and Sproul (2000) observe that: “Feature freezes,
code freezes, and new releases are announced on this list. Bug reports are
submitted
to this list. Programmers who want their code to be included in the kernel
submit it to this list. Other programmers can then download it, test it
within their own environment, suggest changes back to the author, or endorse
it.” Messages sent to the list are automatically resent to everyone on the
list. The e-mail traffic is enormous, with thousands of developers posting
hundreds of thousands of messages over time. As of 2005, a member of the
kernel list could receive almost 10,000 messages per month, so digest
summaries of messages are appropriate to look at, at least initially. The
modular architecture of Linux also affects the communications profile of the
development, since the architecture partitions the developers into smaller
groups. This way, intensive collaboration occurs not across a broad
population of developers but within smaller sets of developers.
The technical roles of the major participants are divided into so-called
credited developers and maintainers. Credited developers are those who have
made substantial code contributions and are listed in the Linux credits file
(such as http://www.kernel.org/pub/linux/kernel/CREDITS, accessed January
10, 2007). There are also major contributors who for various personal reasons
prefer to keep a low profile and do not appear on the credits list. There
were about 400 credited Linux kernel developers by 2000. Maintainers are
responsible for individual kernel modules. The maintainers “review
linux-kernel mailing list submissions (bug reports, bug fixes, new features)
relevant to their modules, build them into larger patches, and submit the
larger patches back to the list and to Torvalds directly” (Moon and Sproul,
2000). These are people whose judgment and expertise are sufficiently trusted
by Linus, in areas of the kernel where he himself is not the primary
developer, that he will give close attention to their recommendations and
tend to approve their decisions. The credited developers and maintainers
dominate the message traffic on the linux-kernel mailing list. Typically,
1/50th of the developers generate 50% of the traffic; of this, perhaps 30%
is from credited developers and about 20% from maintainers. The norms for
how to behave with respect to the mailing lists are specified in a detailed
FAQ document that is maintained by about 20 contributors. For example, refer
to the http://www.tux.org/lkml/ document (accessed January 10, 2007) to get a
sense of the range of knowledge and behaviors that are part of the defining
norms of the developer community. It tells you everything from “What is a
feature freeze?” and “How to apply a patch?” to “What kind of question can I
ask on the list?” Documents like these are important in a distributed,
cross-cultural context because they allow participants to understand what is
expected of them and what their responsibilities are. In the absence of
face-to-face interactions, the delineation of such explicit norms of conduct
is critical to effective, largely text-based, remote communication.
References
Bovet, D.P. and Cesati, M. (2003). Understanding the Linux Kernel, 2nd edition.
O'Reilly Media, Sebastopol, CA.
DiBona, C., Ockman, S., and Stone, M. (1999). The Tanenbaum-Torvalds Debate. In:
Appendix A of Open Sources: Voices from the Open Source Revolution, M. Stone,
S. Ockman, and C. DiBona (editors). O'Reilly Media, Sebastopol, CA.
Edwards, K. (2003). Technological Innovation in the Software Industry: Open Source
Development. Ph.D. Thesis, Technical University of Denmark.
Moon, J.Y. and Sproul, L. (2000). Essence of Distributed Work: The Case of the
Linux Kernel. http://www.firstmonday.dk/issues/issue5_11/moon/index.html.
Accessed December 3, 2006.
Schach, S., Jin, B., Wright, D., Heller, G., and Offutt, A. (2002). Maintainability
of the Linux Kernel. IEE Proceedings – Software, 149(1), 18–23.
Torvalds, L. (1999a). The Linux Edge. Communications of the ACM, 42(4), 38–39.
Torvalds, L. (1999b). The Linux Edge. In: Open Sources: Voices from the Open Source
Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media,
Sebastopol, CA, 101–111.
Valloppillil, V. (1998). Open Source Software: A (New?) Development Methodology
(August 11, 1998). http://www.opensource.org/halloween/. Accessed January 20,
2007.
Wirzenius, L. (2003). Linux: The Big Picture. PC Update Online. http://www.melbpc.
org.au/pcupdate/2305/2305article3.htm. Accessed November 29, 2006.
Yamagata, H. (1997). The Pragmatist of Free Software: Linus Torvalds Interview.
http://kde.sw.com.sg/food/linus.html. Accessed November 29, 2006.
Young, R. (1999). Giving It Away. In: Open Sources: Voices from the Open Source
Revolution, M. Stone, S. Ockman, and C. DiBona (editors). O'Reilly Media,
Sebastopol, CA, 113–125.
3.2 Windowing Systems and Desktops
By the early 1970s, computer scientists at the famed Xerox PARC research
facility were vigorously pursuing ideas proposed by the legendary Douglas
Engelbart, inventor of the mouse and prescient computer engineer whose seminal
work had propelled interest in the development of effective bitmapped graphics
and graphical user interfaces years ahead of its time (Engelbart, 1962).
Engelbart’s work eventually led to the development of the Smalltalk graphical
environment released on the Xerox Star computer in 1981. Many of the engineers
who worked at Xerox later migrated to Apple, which released the relatively
low-cost Macintosh graphical computer in 1984. Microsoft released systems
like Windows 2.0 with a similar “look and feel” to Apple by 1987, a similarity
for which Microsoft would be unsuccessfully sued by Apple (Reimer, 2005),
though see the review of the intellectual property issues involved in Myers
(1995). The provision of open source windowing and desktop environments
for Unix began in the early 1980s with the initiation of the X Window System
project. By the mid-1990s the GNOME and KDE projects to create convenient
free desktop environments for ordinary users, with GUI interfaces similar
to Windows and Mac OS, were begun. This section describes the enormous
efforts that have gone into these major open source projects: X, GNOME,
and KDE.
3.2.1 The X Window System
The X Window System (also called X or X11 after the version which appeared
in 1987) lets programmers develop GUIs for bitmap displays on Unix and other
platforms which do not come with windowing capabilities. It was developed for
a Unix environment beginning at MIT in 1984 in a joint collaboration between
MIT, DEC, and IBM and licensed under the permissive MIT/X open source
license by 1985. It is considered to be “one of the first very large-scale free
software projects” (X Window System, 2006), done in the context of the budding
Internet with extensive use of open mailing lists. The system lets programmers
draw windows and interact with the mouse and keyboard. It also provides what
is called network transparency, meaning that applications on one machine can
remotely display graphics on another machine with a different architecture and
operating system. For example, X allows a computationally intensive program
executing on a Unix workstation to display graphics on a Windows desktop (X
Window System, 2006). X now serves as the basis for both remote and local
graphic interfaces on Linux and almost all Unix-like systems, as well as for
Mac OS X (whose Darwin core derives in part from FreeBSD). KDE and GNOME, the
most popular free desktops, are higher level layers that run on top of X11. In
the case of KDE on Unix, for example, KDE applications sit on top of the KDE
libraries and Qt, which in turn run on X11 running on top of Unix (KDE, 2006).
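The network transparency just described rests on the X convention that a client locates its server through the DISPLAY environment variable, whose value names a host and a display number. The following Python sketch is purely illustrative (it is not part of any X library); it shows how a DISPLAY string such as remote.example.com:1.0 maps to the TCP endpoint an X client would contact, using the standard rule that display n listens on port 6000 + n:

```python
def parse_display(display):
    """Split an X DISPLAY string like 'host:1.0' into (host, tcp_port).

    X servers traditionally listen on TCP port 6000 + display number;
    an empty host part denotes a local (Unix-socket) connection.
    """
    host, _, rest = display.partition(":")
    number = rest.split(".")[0] or "0"   # drop the optional screen suffix
    return host, 6000 + int(number)

# A client on one machine directs its output to an X server on another
# simply by pointing DISPLAY at it:
print(parse_display("remote.example.com:1.0"))  # ('remote.example.com', 6001)
print(parse_display(":0"))                      # ('', 6000), i.e., local display
```

This is why a program running on a Unix workstation can draw on any machine running an X server: the rendering requests simply travel over that network connection.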
The X Window System is a large application with an impressive code base.
For example, X11 had over 2,500 modules by 1994. In an overview of the size
of a Linux distribution (Red Hat Linux 6.2), Wheeler (2000) observed that the
X Window server was the next largest component in the distribution after
the Linux kernel (a significant proportion of which was device-dependent). It
occupied almost a million-and-a-half SLOC. X was followed in size by the gcc
compiler, debugger, and Emacs, each about half the size of X. In comparison,
the important Apache project weighs in at under 100,000 SLOC. Despite this,
X also works perfectly well on compact digital devices like IBM’s Linux watch
or PDAs and has a minimal footprint that is “currently just over 1 megabyte of
code (uncompressed), excluding toolkits that are typically much larger” (Gettys,
2003).
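Wheeler’s figures come from his sloccount tool, which counts logical source lines. A much-simplified, hypothetical version of the underlying idea — counting non-blank, non-comment physical lines in C-like source — can be sketched as follows:

```python
def count_sloc(source):
    """Count physical source lines of code in C-like text: lines that are
    neither blank nor purely comments.  (Wheeler's published numbers use
    sloccount's more careful logical-line counting; this is only a rough
    illustration of the metric.)
    """
    sloc = 0
    in_block_comment = False
    for line in source.splitlines():
        stripped = line.strip()
        if in_block_comment:
            if "*/" in stripped:
                in_block_comment = False
            continue
        if not stripped or stripped.startswith("//"):
            continue
        if stripped.startswith("/*"):
            if "*/" not in stripped:
                in_block_comment = True
            continue
        sloc += 1
    return sloc

example = """\
/* header comment */
#include <stdio.h>

int main(void) {
    return 0;   // trivial
}
"""
print(count_sloc(example))  # 4
```

Applied to an entire source tree, counts like this are what allow the kernel, the X server, gcc, and Apache to be compared by size.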
The system was used on Unix workstations produced by major vendors
like AT&T, Sun, HP, and DEC (Reimer, 2005). The X11 version released in
1987 intentionally reflected a more hardware-neutral design. To maintain the
coherent evolution of the system, a group of vendors established the nonprofit
MIT X Consortium in 1987. The project was directed by X cofounder Bob
Scheifler. The consortium proposed to develop X “in a neutral atmosphere
inclusive of commercial and educational interests” (X Window System, 2006),
with the objective of establishing “the X Window System as an industry-wide
graphics windowing standard” (Bucken, 1988). IBM joined the consortium in
1988. Over time, commercial influence on the project increased. There were
also ongoing philosophical and pragmatic differences between the FSF and
the X project. In fact, Stallman (1998) had described the X Consortium (and
its successor the Open Group) as “the chief opponent of copyleft,” copyleft
being one of the defining characteristics of the GPL, even though the X license
is GPL-compatible in the usual sense that X can be integrated with software
licensed under the GPL. The FSF’s concern is the familiar one that commercial
vendors can develop extensive proprietary customizations of systems like the
X reference implementation, which they could then make dominant because of
the resources they can plow into proprietary development, relying on the liberal
terms of the MIT/X license.
A notable organizational change occurred in 2003–2004. The XFree86
project had started in 1992 as a port of X to IBM PC compatibles. It had
over time become the most popular and technically progressive version of X.
However, by 2003 there was growing discontent among its developer community,
caused partly by the difficulty of obtaining CVS commit access. On top of
this, in 2004, the XFree86 project adopted a GPL-incompatible license that
contained a condition similar to the original BSD advertising clause. The change
was supposed to provide more credit for developers, but it had been done in the
face of strong community opposition, including from preeminent developers
like Jim Gettys, cofounder of the X project. Gettys opposed the change because
it made the license GPL-incompatible. Stallman (2004) observed that although
the general intention of the new license requirement did “not conflict with the
GPL,” there were some specific details of the licensing requirement that did.
In combination with existing discontent about the difficulty of getting CVS
commit access, the new GPL incompatibility had almost immediate disruptive
consequences. The project forked, with the formation of the new X.Org foundation
in 2004. X.Org rapidly attracted almost all the XFree86 developers to its
GPL-compatible fork. The newly formed organization places a much greater
emphasis on individual participation. In fact, Gettys (2003) notably observed
that “X.org is in the process of reconstituting its governance from an industry
consortium to an organization in which individuals, both at a personal level and
as part of work they do for their companies have voice, working as part of the
larger freedesktop.org and free standards community” (italics added). X.Org
now provides the canonical reference implementation for the system, which
remains “almost completely compatible with the original 1987 protocol” (X
Window System, 2006).
References
Bucken, M. (1988). IBM Backs X Windows. Software Magazine, March 15. http://findarticles.com/p/articles/mi_m0SMG/is_n4_v8/ai_6297250. Accessed December 3, 2006.
Engelbart, D.C. (1962). Augmenting Human Intellect: A Conceptual Framework. Stanford Research Institute, Menlo Park, CA. http://www.invisiblerevolution.net/engelbart/full_62_paper_augm_hum_int.html. Accessed December 3, 2006.
Gettys, J. (2003). Open Source Desktop Technology Road Map. HP Labs, Version 1.14. http://people.freedesktop.org/jg/roadmap.html. Accessed December 6, 2006.
KDE. (2006). K Desktop Environment: Developer’s View. http://www.kde.org/whatiskde/devview.php. Accessed January 10, 2007.
Myers, J. (1995). Casenote, Apple v. Microsoft: Virtual Identity in the GUI Wars. Richmond Journal of Law and Technology, 1(5). http://law.richmond.edu/jolt/pastIssues.asp. Accessed December 6, 2006.
Reimer, J. (2005). A History of the GUI. http://arstechnica.com/articles/paedia/gui.ars. Accessed December 3, 2006.
Stallman, R. (1998). The X-Window Trap. Updated Version. http://www.gnu.org/philosophy/x.html. Accessed December 3, 2006.
Stallman, R. (2004). GPL-Incompatible License. http://www.xfree86.org/pipermail/forum/2004-February/003974.html. Accessed December 3, 2006.
Wheeler, D. (2000). Estimating Linux’s Size. Updated 2004. http://www.dwheeler.com/sloc/redhat62-v1/redhat62sloc.html. Accessed December 1, 2006.
X Window System. (2006). Top-Rated Wikipedia Article. http://en.wikipedia.org/wiki/X_Window_System. Accessed December 3, 2006.
3.2.2 Open Desktop Environments – GNOME
The objective of the GNOME Project is to create a free, General Public Licensed
desktop environment for Unix-like systems like Linux. This ambition has long
been fundamental to the vision of free development for the simple reason that
providing an effective, free GUI desktop interface for Linux or other free
operating systems is necessary for them to realistically compete in the mass
market with Windows and Apple environments. Aside from Linux itself, no other
open source project is so complex and massive in scale as GNOME, and none so
overtly challenges the existing, established, proprietary platforms. The acronym
GNOME stands for GNU Network Object Model Environment. It is the official
GNU desktop. In addition to the user desktop, GNOME also encompasses a
variety of standard applications and a comprehensive development environment
used to develop applications for GNOME or further develop the GNOME
platform itself.
The idea for the GNOME project was initiated in 1996 by Miguel de Icaza.
De Icaza, a recent computer science graduate who was the maintainer for the
GIMP project, released (along with Federico Mena) a primitive version (0.10)
of a GUI infrastructure for Unix in 1997. The development language used was C
(de Icaza, 2000). There was a significant free licensing controversy behind
the motivation for developing GNOME. There already existed by that time
another free desktop project called KDE, but there were licensing controversies
associated with KDE. One of its key components, the Qt toolkit library discussed
in Chapter 2, did not use an acceptable free software license. To avoid this kind
of problem, the GNOME developers selected, instead of Qt, the GIMP open
source image processing toolkit GTK+. They believed this software would
serve as an acceptable LGPL basis for GNOME. The GNU LGPL permitted
any applications written for GNOME to use any kind of software license, free
or not, although of course the core GNOME applications themselves were to
be licensed under the GPL. The first major release of GNOME was version 1.0
in 1999. It was included as part of the Red Hat Linux distribution. This release
turned out to be very buggy but was improved in a later release that year.
There are different models for how founders continue a long-term relationship
with an open source project. For example, they may maintain license ownership
and start a company that uses a dual open/proprietary track for licenses.
In the case of de Icaza, after several years of development work on GNOME, he
relinquished his role and founded the for-profit Ximian Corporation in 2000 as
a provider of GNOME-related services. In order to ensure the continued
independence of the GNOME project, the GNOME Foundation was established
later that year. Its Board members make decisions on the future of GNOME,
using volunteer committees of developers and release teams to schedule planning
and future releases. The not-for-profit GNOME Foundation, its industrial
partners, and a volunteer base of contributors cooperate to ensure that the
project progresses. The GNOME Foundation’s mandate has been defined as creating
“a computing platform for use by the general public that is completely free
software” (GNOME Foundation, 2000; German, 2003).
GNOME is unambiguously free in lineage and license – and it’s big. Following
free software traditions, the development tools that were used to create
GNOME are all free software. They include the customary GNU software
development tools (gcc compiler, Emacs editor, etc.), the Concurrent
Versions System (CVS) for project configuration management, and the Bugzilla
bug-tracking server software developed by the Mozilla Foundation and available
from http://www.bugzilla.org/. These development tools were of course
themselves the product of lengthy development during the 1980s–1990s by the
GNU project and the broader free software community. The project’s code base
is extensive and increasingly reliable. It is now a very large system with about
two million LOC and 500 developers in various categories (German, 2003). As
Koch and Schneider (2002) observe, the project’s CVS repository actually
records roughly six million LOC added and four million lines deleted over its
history. As already noted, the GNOME desktop has been envisioned as an
essential component if the free GNU/Linux environment is to compete in the
popular market with Windows and Apple. That vision is emerging as a reality.
Thus GNOME represents one of the culminating accomplishments of the free
software movement.
GNOME has three major architectural components: the GUI desktop environment,
a set of tools and libraries that can interact with the environment, and
a collection of office software tools. The scale and organizational structure of
the project reflect these components. The GNOME project architecture consists
of four main software categories, with roughly 45 major modules and a
large number of noncore applications. The categories comprise libraries (GUI,
CORBA, XML, etc., about 20 in total), core applications (about 16: mail
clients, word processors, spreadsheets, etc.), application programs, and several
dozen noncore applications (German, 2003). The modules, as is typical in such a
large project, are relatively loosely coupled, so they can be developed mostly
independently of one another. When modules become unwieldy in size, they
are subdivided as appropriate into independent submodules. The modular
organization is key to the success of the project because it keeps the
organizational structure manageable. Basically, a relatively small number of
developers can work independently on each module.
While most open source projects do not, in fact, have large numbers of
developers, GNOME does, with over 500 contributors having write access to
the project repository (German, 2003), though as usual a smaller number of
activist developers dominate. Koch and Schneider (2002) describe a pattern of
participation for GNOME that is somewhat different from that found by Mockus
et al. (2002) in their statistical review of the Apache project. Though GNOME’s
development still emanated from a relatively small number of highly activist
developers, the distribution is definitely flatter than Apache’s. For example,
while for Apache the top 15 developers wrote about 90% of the code, for
GNOME the top 15 developers wrote only 50% of the code, and to reach 80% of
the GNOME code, the top 50 developers have to be considered. At a more local
level, consider the case of the GNOME e-mail client (Evolution). According
to its development log statistics, five developers, out of roughly 200 total for
the client, were responsible for half the modifications; 20 developers accounted
for 80% of the development transactions; while a total of 55 developers of
the 200 accounted for 95% of the transactions (German, 2003). This skewed
pattern of contribution is not untypical. Refer to the useful libresoft Web site
page http://libresoft.urjc.es/Results/index.html for CVS statistics for GNOME,
as well as for many other open source projects.
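Concentration figures like these – the top 15 developers writing 50% of the code, or 20 developers making 80% of the commits – are straightforward to compute from repository logs. The sketch below uses hypothetical commit counts (not GNOME’s actual data) to illustrate the calculation:

```python
def top_n_for_share(commits_per_dev, share):
    """Return how many of the most active developers are needed for their
    combined commits to first reach the given fraction of all commits."""
    counts = sorted(commits_per_dev, reverse=True)
    target = share * sum(counts)
    running = 0
    for n, c in enumerate(counts, start=1):
        running += c
        if running >= target:
            return n
    return len(counts)

# Hypothetical distribution: a few very active developers and a long
# tail of occasional contributors (invented numbers, not GNOME data).
commits = [500, 300, 200, 100] + [10] * 50
print(top_n_for_share(commits, 0.5))   # 2 of 54 developers produce half the commits
print(top_n_for_share(commits, 0.8))   # 22 developers are needed to reach 80%
```

The flatter GNOME distribution simply means this function returns a larger n for a given share than it would for a project like Apache.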
The user environment that had to be created for GNOME was well-defined:
it was simply a matter of “chasing tail-lights” to develop it. Requirements
engineering for GNOME, like that for other open source projects, did not follow
the conventional proprietary development cycle approach. It used a more generic
and implicitly defined approach as described in German (2003). The underlying
objective was that GNOME was to be free software, providing a well-designed,
stable desktop model, comparable to Windows and Apple’s, in order for Linux
to be competitive in the mass-market PC environment. The nature of the core
applications that needed to be developed was already well-defined. Indeed, the
most prominent reference applications were from the market competition to be
challenged. For example, Windows MS Excel was the reference spreadsheet
application. It was to be matched by the GNOME gnumeric tool. Similarly, the
e-mail client Microsoft Outlook and the multifunction Lotus Notes were to be
replaced by the GNOME Evolution tool. An anecdote by de Icaza reflects both
the informality and effectiveness of this reference model tactic in the design
of a simple calendar tool:

I proposed to Federico to write a calendar application in 10 days (because Federico
would never show up on weekends to the ICN at UNAM to work on GNOME ;-).
The first day we looked at OpenWindows calendar, that day we read all the relevant
standard documents that were required to implement the calendar, and started
hacking. Ten days later we did meet our deadline and we had implemented
GnomeCal (de Icaza, 2000).
Requirements also emerged from the discussions that occurred in the mail-
ing lists. Prototypes like the initial GNOME version 0.1 developed by Icaza
also served to define features. Ultimately, it was the project leader and the
maintainers who decided on and prioritized requirements. While fundamental
disagreements regarding such requirements could lead to forks, this did not
happen in the case of GNOME.
GNOME’s collaborative development model relies heavily on private companies.
Indeed, much of GNOME’s continued development is staffed by
employees of for-profit companies. However, the project itself is vendor-neutral.
The maintainers of most of GNOME’s important modules are actually employees
of for-profit corporations like Ximian, Red Hat, and Sun. This arrangement
helps guarantee the stable development of the project since essential tasks are
less subject to fluctuations at the volunteer level. The paid employee
contributors tend to handle design, coordination, testing, documentation, and
bug fixing, as opposed to bug identification (German, 2003). Thus, for the
Evolution client, about 70% of the CVS commits come from the top 10% of the
contributors, all of whom are employees of Ximian. Similarly, Sun has
extensively supported the so-called GNOME accessibility framework which
addresses usability issues including use by disabled individuals. Though paid
corporate employees play a major role, the volunteer participants are also
pervasive, particularly as beta testers, bug discoverers, and documenters. The
volunteers are especially important in the area of internationalization – an
aspect that requires native language experts and is supported by individuals
motivated by a desire to see GNOME supported in their own language.
Interestingly, despite the role of voluntarism it also appears to be the case
that a career path strategy is often followed or at least attempted by
volunteers. Thus, most of the paid workers had started off as project
volunteers and later moved from being enthusiastic hobbyists to paid
employees of the major corporate sponsors (German, 2003).
Communication among the project participants is kept simple. It is handled
in a standard manner, using relatively lean media over the Internet
communication channel, supplemented by traditional mechanisms like conferences
and Web sites. Mailing lists are used extensively for end users as well as for
individual development components. Bimonthly summaries are e-mailed on
the GNOME mailing list describing major current work, including the most
active modules and developers during the report period. These are the forums
in which decisions about a module’s development are made. Project Web sites
contain information categorized according to type of participants, from items
for developers and bug reports to volunteer promotional efforts. An annual
conference called GUADEC brings developers together and is organized by the
GNOME Foundation. IRC or Internet Relay Chat (irc.gnome.org) provides an
informal means of instantaneous communication. (Incidentally, the Web site
Freenode.net provides IRC network services for many free software projects
including GNU.) Of course, the CVS repository for the project effectively
coordinates the development of the overall project. A limited number of
developers have write access to the repository, having gained the privilege
over time by producing patches that maintainers have come to recognize as
trustworthy, a tried and true path in open development. Initially, patches
have to be submitted by the developers to the maintainers as diffs, at least
until the developer has attained a recognized trustworthy status with the
maintainer. Rarely, it may happen that a developer applies a patch to the
repository that is subsequently rejected by the maintainer. Such an outcome
can be disputed by appealing to the broader community, but these kinds of
events are infrequent (German, 2003).
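The diff-based workflow described above can be illustrated with Python’s standard difflib module; a unified diff of this kind is the sort of patch a contributor would mail to a module maintainer for review (the file contents here are invented for the example):

```python
import difflib

# Old and new versions of a hypothetical source file, as lists of lines.
old = ["def greet(name):\n",
       "    print('Hello ' + name)\n"]
new = ["def greet(name):\n",
       "    print('Hello, ' + name + '!')\n"]

# The contributor mails this unified diff to the maintainer, who reviews
# it and, if acceptable, applies it to the repository copy.
patch = "".join(difflib.unified_diff(old, new,
                                     fromfile="a/greet.py",
                                     tofile="b/greet.py"))
print(patch)
```

The maintainer applies the patch with a tool like patch(1); only after a record of trustworthy submissions does a contributor gain direct commit access.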
References
De Icaza, M. (2000). The Story of the GNOME Project. http://primates.ximian.com/miguel/gnome-history.html. Accessed November 29, 2006.
German, D.M. (2003). GNOME, a Case of Open Source Global Software Development. In: International Conference on Software Engineering, Portland, Oregon.
GNOME Foundation. (2000). GNOME Foundation Charter Draft 0.61. http://foundation.gnome.org/charter.html. Accessed November 29, 2006.
Koch, S. and Schneider, G. (2002). Effort, Co-operation, and Co-ordination in an Open Source Software Project: GNOME. Information Systems Journal, 12(1), 27–42.
Mockus, A., Fielding, R.T., and Herbsleb, J.D. (2002). Two Case Studies of Open Source Development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, 11(3), 309–346.
3.2.3 Open Desktop Environments – KDE
The acronym KDE stands for – believe it or not – Kool Desktop Environment.
The KDE Web site cogently expresses the vision and motivation for the
project: “UNIX did not address the needs of the average computer user. . . . It
is our hope that the combination UNIX/KDE will finally bring the same open,
reliable, stable and monopoly-free computing to the average computer user
that scientist and computing professionals world-wide have enjoyed for years”
(www.kde.org). GNOME and KDE to some extent competitively occupy the
same niche in the Linux environment. But they are now both recognized for
the advances they made in achieving the OSS goal of creating a comprehensive
and popularly accessible free source platform. In 2005, USENIX gave the
two most prominent developers of GNOME and KDE, de Icaza and Ettrich, its
STUG award for their work in developing a friendly GUI interface for open
desktops, saying that: “With the development of user friendly GUIs, both de
Icaza and Ettrich are credited with overcoming a significant obstacle in the
proliferation of open source. . . . Their efforts have significantly contributed to
the growing popularity of the open source desktop among the general public”
(http://www.usenix.org/about/newsroom/press/archive/stug05.html, accessed
January 10, 2007).
The theme of product development by a sole inspired youth repeats itself
in KDE. The KDE project was started in 1996 by Matthias Ettrich, a 24-year-old
computer science student at the University of Tübingen. Ettrich had been
first exposed to free software development via the GNU project and Linux.
Actually it was more than mere exposure. Ettrich wrote the first version of the
open source product LyX, which uses the open source system LaTeX, built on
Don Knuth’s typesetting system TeX, to produce high-quality document output.
Ettrich has said that “this positive and successful experience
of initiating a little self-sustaining free software community made me brave
enough to start the KDE project later” (FOSDEM, 2005).
Ettrich announced his KDE proposal in a now well-known e-mail. The
objective is reminiscent of the attitude that Blake Ross had toward Firefox’s
potential audience. Ettrich wanted to define and implement:

A GUI for end users
The idea is NOT to create a GUI for the complete UNIX-System or the System-
Administrator. For that purpose the UNIX-CLI with thousands of tools and
scripting languages is much better. The idea is to create a GUI for an ENDUSER.
Somebody who wants to browse the web with Linux, write some letters and play
some nice games.

The e-mail was posted at the hacker’s favorite mid-morning hour: October
14, 1996, 3:00 a.m., to the Linux newsgroup de.comp.os.linux.misc. Refer to
the KDE organization’s http://www.kde.org/documentation/posting.txt for the
full text.
In low-key, good-humored style reminiscent of Linus Torvalds, Ettrich
continued:

IMHO a GUI should offer a complete, graphical environment. It should allow a
user to do his everyday tasks with it, like starting applications, reading mail,
configuring his desktop, editing some files, delete some files, look at some pictures,
etc. All parts must fit together and work together.
. . . So one of the major goals is to provide a modern and common look & feel for all
the applications. And this is exactly the reason, why this project is different from
elder attempts.

“IMHO” is the deferential “In My Humble Opinion” acronym derived from
Usenet custom.
The inaugural e-mail refers prominently to the author’s intention to use the
Qt C++ GUI widget library for the planned implementation of the project.
Eventually, this use of the Qt toolkit would lead to free licensing concerns
regarding KDE. These concerns would be significant in motivating the development
of the competing GNOME project. The X referred to later in the e-mail is the X
Window System for Unix, which provided the basic toolkit for implementing
a window, mouse, and keyboard GUI. Motif is the classic toolkit from the 1980s
for making GUIs on Unix systems. Incidentally, the misspellings are from the
original, reflecting the relaxed tone of the e-mail and perhaps the difference
in language. The e-mail continues as follows:

Since a few weeks a really great new widget library is available free in source and
price for free software development. Check out http://www.troll.no
The stuff is called “Qt” and is really a revolution in programming X. It’s an almost
complete, fully C++ Widget-library that implementes a slightly improved Motif
look and feel, or, switchable during startup, Window95.
The fact that it is done by a company (Troll Tech) is IMO a great advantage. We
have the sources and a superb library, they have beta testers. But they also spend
their WHOLE TIME in improving the library. They also give great support. That
means, Qt is also interesting for commercial applications. A real alternative to the
terrible Motif :) But the greatest pro for Qt is the way how it is programmed. It’s
really a very easy-to-use powerfull C++-library.
It is clear from the post that Ettrich was unaware that there might be licensing
complications with the Qt toolkit. Originally Qt appears to have been
proprietary to Trolltech. Actually, there were both free and proprietary licenses
available, with the proprietary licenses only required if you were intending
to release as closed source a product you developed using Qt. The free
license, however, was not quite free. For example, the version described at
http://www.kde.org/whatiskde/qt.php (accessed January 10, 2007) requires that
“If you want to make improvements to Qt you need to send your improvements
to Troll Tech. You can not simply distribute the modified version of Qt yourself,”
which was contrary to the GPL. There was much legal wrangling on this
issue between the KDE developers and the FSF. Finally, in 2000, Trolltech –
for which Ettrich then worked – announced that it would license Qt under the
GNU GPL. This satisfied reservations among proponents of the Free Software
Movement. Per the KDE Web site it is now the case that “Each and every line of
KDE code is made available under the LGPL/GPL. This means that everyone is
free to modify and distribute KDE source code. This implies in particular that
KDE is available free of charge to anyone and will always be free of charge to
anyone.”
The Qt licensing issue was a political cause célèbre among certain open
source advocates, but it does not seem to have been a consideration for
users selecting between KDE and GNOME. They were primarily concerned
about the functionality of the systems (Compton, 2005). Already by 1998,
Red Hat had chosen KDE to be their standard graphical interface for their
Linux distributions. Currently, major Linux distributions tend to include both
KDE and GNOME, with some companies like Sun or Caldera preferring one to
the other. A port of KDE to run on a Windows environment is the mission of
the KDE on Cygwin project (http://kde-cygwin.sourceforge.net/).
The demographic profile of the KDE participants is fairly standard. KDE
has about 1,000 developers worldwide, mainly from Europe, having originated
in Germany. It consists mostly of males aged 20–30 years old, many
of whom are students or are employed in IT (Brand, 2004). The KDE
Web site is interesting and quite well-organized. Refer to the organization’s
“Virtual Gallery of Developers” at http://www.kde.org/people/gallery.php for
biographies of the major developers, with academic, professional, and personal
remarks. About two-thirds of the participants are developers, the remainder
being involved in documentation, translation (about 50 languages are currently
represented), and other activities. According to the survey by Brand (Chance,
2005), the work efforts of individual contributors vary from a quarter-of-an-hour
to half-a-day, per day, with an average of two to three hours per day. In general,
as we have noted previously, open development processes are visible and
extensively documented (Nichols and Twidale, 2003) in a way that proprietary,
closed source, corporate developments cannot be, almost in principle. The
mailing lists and CVS repository that are the key communications tools establish
an incredibly detailed, time-stamped record of development with readily
available machine-readable statistics. For example, the libresoft Web site
mentioned previously, particularly the statistics link
http://libresoft.urjc.es/Results/index.html (accessed January 10, 2007), is an
excellent resource for detailed data on many open source projects including
not only KDE but also other important projects like GNOME, Apache, FreeBSD,
OpenBSD, XFree86, and Mozilla, with plentiful data about CVS commits, module
activity, and so on. The site also contains detailed data on committers and
their contributions.
KDE started its development at a propitious moment in the evolution of open software platforms. The first version was both timely and critical because it helped advertise the product at a time when Linux was rapidly growing. There was as yet no easy-to-use desktop interface available for Linux, so the product filled an unoccupied market niche. The initial success of the project was also bolstered because the project creators were able to recruit developers from another open source project they had connections with (Chance, 2005). Further successful development, and possibly even the competition with the GNOME project, helped advertise the project even more, leading to additional developer recruiting. The C++ implementation of KDE (vs. C for GNOME) eased the enhancement of the system's core libraries, again arguably contributing to its success. Though KDE was initiated in 1996, most developers joined between 1999 and 2002 (Brand, 2004).
Influence inside the KDE project is, as usual, determined by work-based reputations. Reputations are based on experience and contributions, but friendly and cooperative behavior is an asset. Admission to the KDE core team requires a reputation based on "outstanding contributions over a considerable period of time" (http://www.kde.org/). The kde-core-devel mailing list is where decisions are made, but the process is informal and unlike the centralized "benevolent dictatorship" approach characteristic of Linux development. The
norm tends to be that "whoever does the work has the final decision" (Chance, 2005). Lead architects and maintainers who are authorized to speak for the community are responsible for moving the platform forward. Ettrich has observed that the relatively anarchical structure of the KDE organization makes it hard to do things, commenting that "unless you have a captain," then, even with all the right ideas, "whether we are able to realize them against our own resistance is a different matter" (FOSDEM, 2005). These challenges reflect the classic tension between the Cathedral and the Bazaar: it is hard to do without strong, authoritative leadership in guiding the direction of large projects. The conflicts that have arisen derive mainly from differences concerning the future direction of the project. Secondary sources of conflict include interpersonal reactions to things like refusals to accept patches or ignored contributions. There are also the traditional conflicts between end users and developers. These typically result from a disjuncture between the technical orientation of the developers and the end users' preference for stability and ease of use. A usability group (http://usability.kde.org/) has emerged that attempts to mediate between the two viewpoints, but its standing is still of limited importance (Chance, 2005). Like GNOME, KDE has placed a strong emphasis on accessibility for individuals with disabilities. In terms of future developments, Ettrich himself underscores usability as one of his "top 3 favorite focus areas for KDE" (FOSDEM, 2005).
References
Brand, A. (2004). Structure of KDE Project. PELM Project, Goethe University, Frankfurt.
Chance, T. (2005). The Social Structure of Open Source Development. Interview with Andreas Brand in NewsForge. http://programming.newsforge.com/article.pl?sid=05/01/25/1859253. Accessed November 29, 2006.
Compton, J. (2005). GNOME vs. KDE in Open Source Desktops. http://www.developer.com/tech/article.php/629891. Accessed January 20, 2007.
FOSDEM. (2005). Interview with Matthias Ettrich KDE. http://archive.fosdem.org/2005/index/interviews/interviews_ettrich.html. Accessed January 10, 2007.
Nichols, D. and Twidale, M. (2003). The Usability of Open Source. First Monday, 8(1). http://www.firstmonday.dk/issues/issue8_1/nichols/index.html. Accessed December 3, 2006.
3.3 GIMP
GIMP is a free software image manipulation tool intended to compete with
Adobe Photoshop. We include it in this chapter on open source platforms
because it is an important desktop application (not an Internet-related system
like those considered in Chapter 2) and because its toolkit is used in the GNOME desktop. Imaging tools like GIMP are of increasing importance in industrial and medical applications as well as gaming and entertainment technology. The story of GIMP is important for understanding the record of accomplishment of open development for several reasons. Its originators were, prototypically, computer science undergraduates at Berkeley who had themselves been weaned on open source products. Out of personal curiosity they wanted to develop a product that incidentally, but only incidentally, would serve an important need in the open source platform. Their product imitated and challenged a dominant proprietary software tool for an end-user application, unlike most previous free programs. Legal questions about the licensing characteristics for some components of the system created a controversy within the free software movement. The software architecture represented by its plug-in system strongly impacted the success of the project by making it easier for developers to participate. The reaction and involvement of end users of the program was exceptionally important in making GIMP successful because its value and effectiveness could only be demonstrated by its ability to handle sophisticated artistic techniques. Consequently, the tool's development demanded an understanding of how it was to be used that could easily transcend the understanding of most of the actual developers of the system. In other words, the end users represented a parallel but divergent form of sophistication to the program developers. Management challenges arose with the fairly abrupt departure of the originating undergraduates for industrial positions on completion of their undergraduate careers and the replacement of the original leadership with a team of co-responsible developers. Like the other open source products we have examined, the story of GIMP can help us understand how successful open source projects are born and survive.
GIMP, an acronym for the "GNU Image Manipulation Program," is intended to stand as the free software counterpart to Adobe Photoshop and is an official part of the GNU software development project. Coming out beginning in 1996, GIMP was one of the first major free software products for an end-user application, as opposed to most of the GNU projects, which were oriented toward use by programmers. It provides standard digital graphics functions and can be used, for example, to make graphics or logos, edit and layer images, convert image formats, make animated images, and so on. According to its Freshmeat project description, GIMP is "suitable for such tasks as photo retouching, image composition and image authoring. It can be used as a simple paint program, an expert quality photo retouching program, an online batch processing system, a mass production image renderer, an image format converter, etc." (from "The Gimp – Default Branch" description on www.freshmeat.net).
Class projects at UC Berkeley have a way of making a big splash. GIMP was developed by Peter Mattis and Spencer Kimball in August 1995, initially for a class project for a computer science course when they were undergraduates. Mattis "wanted to make a webpage" (Hackvn, 1999), and as a result they decided it would be interesting to design a pixel-based imaging program. Following open source development custom, Mattis posted the following question on comp.os.linux.x > Image Manipulation Program Features in July 1995, at the canonical hacker time of 3:00 a.m.:

Suppose someone decided to write a graphical image manipulation
program (akin to photoshop). Out of curiousity (and maybe something
else), I have a few (2) questions:
What kind of features should it have? (tools, selections, filters, etc.)
What file formats should it support? (jpeg, gif, tiff, etc.)?
Thanks in advance,
Peter Mattis

At this point, neither Mattis nor Kimball had anything but a cursory familiarity with image manipulation tools (Hackvn, 1999). However, within six months Mattis and Kimball – working alone, not as part of a free-wheeling bazaar format – had released a beta version of GIMP as open source. The announcement was made at 4:00 a.m. on November 21, 1995, on comp.windows.x.apps > ANNOUNCE: The GIMP. The style of the release announcement is worth noting for the specificity and clarity of its statement of the project functionality and requirements. We provide it in some detail as an illustration of how these announcements are heralded:
The GIMP: the General Image Manipulation Program

The GIMP is designed to provide an intuitive graphical interface to a
variety of image editing operations. Here is a list of the GIMP's major
features:

Image viewing
- Supports 8, 15, 16 and 24 bit color.
- Ordered and Floyd-Steinberg dithering for 8 bit displays.
- View images as rgb color, grayscale or indexed color.
- Simultaneously edit multiple images.
- Zoom and pan in real-time.
- GIF, JPEG, PNG, TIFF and XPM support.

Image editing
- Selection tools including rectangle, ellipse, free, fuzzy, bezier and intelligent.
- Transformation tools including rotate, scale, shear and flip.
- Painting tools including bucket, brush, airbrush, clone, convolve, blend and text.
- Effects filters (such as blur, edge detect).
- Channel & color operations (such as add, composite, decompose).
- Plug-ins which allow for the easy addition of new file formats and new effect filters.
- Multiple undo/redo. ...

The GIMP has been tested (and developed) on the following operating
systems: Linux 1.2.13, Solaris 2.4, HPUX 9.05, SGI IRIX.
Currently, the biggest restriction to running the GIMP is the Motif
requirement. We will release a statically linked binary for several
systems soon (including Linux).

URLs
http://www.csua.berkeley.edu/gimp
ftp://ftp.csua.berkeley.edu/pub/gimp
mailto:g...@soda.csua.berkeley.edu

Brought to you by
Spencer Kimball (spen...@soda.csua.berkeley.edu)
Peter Mattis (p...@soda.csua.berkeley.edu)

NOTE
This software is currently a beta release. This means that we haven't implemented all of the features we think are required for a full, unqualified release. There are undoubtedly bugs we haven't found yet just waiting to surface given the right conditions. If you run across one of these, please send mail to g...@soda.csua.berkeley.edu with precise details on how it can be reliably reproduced.
The first public release (version 0.54) actually came in January 1996.
Plug-ins played an important role in the expansion of GIMP. The two solo developers had provided a powerful and functional product with important features like a uniform plug-in system, "so developers could make separate programs to add to GIMP without breaking anything in the main distribution" (Burgess, 2003). Spencer noted that "The plug-in architecture of the Gimp had a tremendous impact on its success, especially in the early stages of development (version 0.54). It allowed interested developers to add the functionality they desired without having to dig into the Gimp core" (Hackvn, 1999).
Plug-ins are also very important to GIMP because of its competition with Photoshop. In fact, Adobe Photoshop plug-ins can run under GIMP via pspi (Photoshop Plug-in Interface), a GIMP plug-in that hosts third-party Photoshop plug-ins. Pspi was developed for Windows in 2001 and
for Linux in 2006 (http://www.gimp.org/tml/gimp/win32/pspi.html). Pspi acts as an intermediary between GIMP and Photoshop plug-ins, which are implemented as dlls. According to pspi developer Tor Lillqvist, "The question was 'How would you load and call code in a Windows DLL on Linux'" (http://www.spinics.net/lists/gimpwin/msg04517.html). As described by Willis (2006), to the plug-in pspi appears to be a "full, running copy of Photoshop. It provides the hooks into the menus and functions of Photoshop that the plugin expects to see, and connects them to the GIMP's extension and menu system." This is actually extremely significant for the attractiveness of the Linux platform itself. Professional graphics artists strongly prefer Photoshop under Windows, one reason being the availability of third-party plug-ins. The availability of pspi for Linux changes this. There are a few ironies in this story. A software bridge like pspi is made possible in the first place by the Adobe policy of encouraging the development of third-party plug-ins through the use of its software development kit. Thus, Adobe's (natural and logical) plug-in policy, designed to increase its own marketability, can by the same token increase its competition's marketability. Furthermore, compiling the pspi source requires the Adobe development kit, so you need the kit to create the executable for pspi. Once this is done, however, the executable itself is of course freely redistributable, as pspi is in the first place. Oddly, up until Photoshop 6 Adobe gave the software development kit away for free, but it now requires specific approval. Thus, in a certain sense, an original compilation of pspi for use in GIMP would implicitly require such approval by Adobe. In any case, the point is moot because downloadable pspi binaries are available for multiple platforms (Willis, 2006). The pspi development illustrates the complicated interplay among technical development issues, software architecture choices, legal issues, high-end graphics users' expectations, and the sometimes-unintended consequences of corporate policies like those that encourage external development.
Perhaps unsurprisingly, licensing issues have also affected GIMP's development. The initial GIMP toolkit for building widgets was based on the proprietary Motif widget library. A widget (which is shorthand for "windows gadget") can be defined as a "standardized on-screen representation of a control that may be manipulated by the user" (redhat.com glossary), examples being scroll bars, menus, buttons, sliders, and text boxes. Widgets can be thought of as the basic building blocks of graphical interfaces and are constructed using toolkit programs. Because the Motif widget library was proprietary, an open source widget library called GTK (standing for GIMP toolkit) was developed in order to remain fully consistent with the principles of the free software movement. There was also another, more personal, professional motivation for replacing the Motif
library. In addition to the developers thinking that the Motif toolkit was "bloated and inflexible" (Hackvn, 1999), Mattis personally "was dissatisfied with Motif and wanted to see what it took to write a UI toolkit" for his own edification. The resulting GTK toolkit (eventually enhanced to GTK+) was licensed under the LGPL, so it could be used even by developers of proprietary software (www.gtk.org). GTK was provided with the 1996 release (Bunks, 2000). Subsequent to that release, weaknesses in the beta version of the system, like poor memory management, were resolved. There were also improvements like the use of layer-based images, based on what the developers saw used in Photoshop 3.0. Another beta version was released in early 1997.
By June 1997, Kimball and Mattis had released version 0.99.10 with further improvements, including the updated GTK+ library. That final undergraduate version represented a huge effort. Kimball remarked that he had "spent the better part of two years on Gimp, typically at the expense of other pressing obligations (school, work, life)" and that "probably 95 to 98 percent of the code in 0.99.10 was written by Pete or myself" (Hackvn, 1999). They both share the copyright on the entire project, though in point of fact Kimball concentrated on GIMP and Mattis on the GTK. They never got to release version 1.0 – they graduated from college in June 1997. The authors say that developing GIMP was largely a matter of duty to the cause of free software. For Spencer Kimball, his GIMP development work had been partly his payment on what he felt was a debt of honor, as he said in an interview: "From the first line of source code to the last, GIMP was always my 'dues' paid to the free software movement. After using emacs, gcc, Linux, etc., I really felt that I owed a debt to the community which had, to a large degree, shaped my computing development" (Hackvn, 1999). Similar feelings were expressed by Mattis about having "done his duty" for free software (Hackvn, 1999).
Transitions can be bumpy in open source. Since the model is significantly volunteer-driven, you cannot just go out and hire new talent or leadership (granted, the increasing participation of commercially supported open source developers modifies this). Management problems set in at GIMP upon the graduation of its principals from college because of the vacuum caused by their departure. Spencer and Mattis had moved on. They were holding down real jobs and could no longer put time into the project (Burgess, 2003). Most problematically, "there was no defined successor to S&P, and they neglected to tell anyone they were leaving," according to Burgess (2003). Turnover at even traditional technical companies, where the average time between job changes is two years, is a significant negative factor in productivity. This impact is likely exacerbated in the free software community, where "the rate of
turnover for both volunteer and full-time contributors is probably higher and the resulting losses to productivity and momentum are probably more severe. New developers have the source code, but usually they can't rely upon local experts for assistance with their learning curves" (Hackvn, 1999). Thus, the GIMP project faced management hurdles that can handicap any project and certainly apply to open source development as well. But a new development model soon emerged – a team of members with designated responsibilities for managing releases, making bug fixes, etc. There was no single team leader, and project decisions were made through the #gimp Internet Relay Chat channel. The initial effort was "focused almost exclusively on stability" (quote from Spencer in Hackvn (1999)). As for the viability of the learning curve for the volunteers, even without the guidance of the original pair, Spencer approvingly observed that "I'm not sure how long it took the new maintainers to learn their way around the code, but judging by the stability of the product, they seem to be doing quite well" (Hackvn, 1999). By mid-1998 the first stable version was released. GIMP was ported to Windows by Tor Lillqvist in 1997. A binary installer developed by Jernej Simoncic greatly simplified installation on Windows. By 2004, after several years of development, a stable release supported not only on Unix but also on Mac OS X and Windows was announced. The www.gimp.org Web site now lists almost 200 developers involved in the project beyond the founders Kimball and Mattis.
For products like Adobe Photoshop and GIMP, high-end specialized users set the standards for the product. Indeed, the requirements for an advanced image-processing product are probably better understood in many respects by its end users, particularly its professional graphic-artist users, than by its developers. The effective development and application of the product entails the development of sophisticated artistic techniques and demands an understanding of how the tool is to be used that completely surpasses the understanding of most of the actual developers of the system. Thus, as already observed, GIMP's end users represented a parallel but divergent form of sophistication to the program developers. An impassioned base of such users was decisive in establishing the recognition and acceptance of GIMP. Their positive response and word-of-mouth publicity helped spread the word about the product and define its evolution. The Linux Penguin logo was famously made using GIMP. The now-celebrated Penguin was designed by Larry Ewing in 1996, using the early 0.54 beta version of GIMP (Mears, 2003). In what would become common expert-user practice, Ewing also set up a Web page that briefly described how he used GIMP to make the logo (Ewing, 1996). The whole episode became the first major exposure that GIMP received (Burgess, 2003). The how-to site Ewing set up was the
first of many. As Burgess observed regarding GIMP, "what differentiated this program from many others is that a lot of sites sprung up on how to use the program ... showing off artwork and sharing techniques" (Burgess, 2003).
The bottom line is: how do GIMP and Adobe Photoshop compare? It is a confusing issue because very different potential user audiences are involved, and so the level of functionality needed varies from the mundane to the esoteric. For many applications, GIMP appears to be perfectly suitable. GIMP was initially awkward to install on Windows, but the current download installer is fast and effective. The basic GIMP interface is highly professional. One hears very different comparative evaluations from different sources, and it is not really clear how objective the evaluators are. Overall, GIMP's performance appears not to match that of Photoshop. Photoshop's interface is more intuitive; GIMP is less easy to use, an important distinction for a casual user. The proliferation of separate windows is not always well received. The quality of the tools in GIMP is arguably uneven. However, GIMP is free of charge and cross-platform. But for a professional graphics artist, or for a significant application where the graphics output is key to the successful outcome of a costly mission, the charge for the commercial product would likely not be an issue.
References
Bunks, C. (2000). Grokking the GIMP. New Riders Pub. Also: http://gimp-savvy.com/BOOK/index.html?node1.html. Accessed November 29, 2006.
Burgess, S. (2003). A Brief History of GIMP. http://www.gimp.org/about/ancient_history.html. Accessed November 29, 2006.
Ewing, L. (1996). Penguin Tutorial. http://www.isc.tamu.edu/~lewing/linux/notes.html. Accessed January 10, 2007.
Hackvn, S. (1999). Interview with Spencer Kimball and Peter Mattis. Linux World, January 1999. http://www.linuxworld.com/linuxworld/lw-1999-01/lw-01-gimp.html. Accessed January 21, 2004.
Mears, J. (2003). What's the Story with the Linux Penguin? December 26. http://www.pcworld.com/article/id,113881-page,1/article.html. Accessed January 10, 2007.
Willis, N. (2006). Running Photoshop Plugins in the GIMP, Even under Linux. April 10. http://applications.linux.com/article.pl?sid=06/04/05/1828238&tid=39. Accessed November 29, 2006.
4
Technologies Underlying Open Source Development
The free software movement emerged in the early 1980s at a time when the ARPANET network with its several hundred hosts was well-established and moving toward becoming the Internet. The ARPANET already allowed exchanges like e-mail and FTP, technologies that significantly facilitated distributed collaboration, though the Internet was to amplify this ability immensely. The TCP/IP protocols that enabled the Internet became the ARPANET standard on January 1, 1983. As a point of reference, recall that the flagship open source GNU project was announced by Richard Stallman in early 1983. By the late 1980s the NSFNet backbone network merged with the ARPANET to form the emerging worldwide Internet. The exponential spread of the Internet catalyzed further proliferation of open source development. This chapter will describe some of the underlying enabling technologies of the open source paradigm, other than the Internet itself, with an emphasis on the centralized Concurrent Versions System (CVS) as well as the newer decentralized BitKeeper and Git systems that are used to manage the complexities of distributed open development. We also briefly discuss some of the well-known Web sites used to host and publicize open projects and some of the services they provide.
The specific communications technologies used in open source projects have historically tended to be relatively lean: e-mail, mailing lists, newsgroups, and later on Web sites, Internet Relay Chat, and forums. Most current activity takes place on e-mail mailing lists and Web sites (Feller and Fitzgerald, 2002). The mailing lists allow many-to-many dialogs and can provide searchable Web-based archives just like Usenet. Major open source projects like Linux in the early 1990s still relied on e-mail, newsgroups, and FTP downloads to communicate. Since the code that had to be exchanged could be voluminous, some means was required for reducing the amount of information transmitted and for clarifying the nature of suggested changes to the code. The patch
program created by Larry Wall served this purpose. Newsgroups provided a means to broadcast ideas to targeted interest groups whose members might like to participate in a development project. The Usenet categories acted like electronic bulletin boards that allowed newsgroup participants to post e-mail-like messages – like the famous comp.os.minix newsgroup on Usenet used by Linus Torvalds to initiate the development of Linux. Another powerful collaborative tool, developed beginning in the late 1980s, that would greatly facilitate managing distributed software development was the versioning or configuration management system. It is this topic that will be the focus of our attention in this chapter.
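The division of labor that diff and patch enabled can be sketched with standard tools; the file names below are invented for illustration, not taken from any real project:

```shell
# A maintainer's copy of a file, and a contributor's corrected copy.
printf 'Hello wrold\n' > hello.txt.orig       # original, with a typo
printf 'Hello world\n' > hello.txt.fixed      # contributor's fix

# The contributor generates a unified diff: a compact description of
# only the changed lines, suitable for posting to a mailing list.
diff -u hello.txt.orig hello.txt.fixed > fix.patch

# The maintainer applies the patch to a pristine copy of the original.
cp hello.txt.orig hello.txt
patch hello.txt < fix.patch

cmp -s hello.txt hello.txt.fixed && echo "patch applied cleanly"
```

Only fix.patch need travel over the network; for a large source tree, mailing a few-line diff instead of whole files is the saving that made e-mail-based development practical.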
Versioning systems are software tools that allow multiple developers to work on projects concurrently and keep track of changes made to the code. The first such system was the Revision Control System (RCS), written in the early 1980s by Walter Tichy of Purdue. It used diffs to keep track of changes just like later systems, but was limited to single files. The first system that could handle entire projects was written by Dick Grune in 1986 with a modest objective in mind: he simply wanted to be able to work asynchronously with his students on a compiler project. Grune implemented his system using shell scripts that interacted with RCS, and eventually it evolved into the most widely used versioning system, the open source Concurrent Versions System, commonly known as CVS. Brian Berliner initiated the C implementation of CVS in mid-1989 by translating the original shell scripts into C. Later contributors improved the system, noteworthy being Jim Kingdon's remote CVS implementation in 1993 that "enabled real use of CVS by the open source community" (STUG award announcement for 2003, http://www.usenix.org/about/stug.html).
4.1 Overview of CVS
CVS has been crucial to open source development because it lets distributed software developers access a shared repository of the source code for a project and permits concurrent changes to the code base. It also allows merging the changes into an updated version of the project on the repository and monitoring for potential conflicts that may occur because of the concurrent accesses. Remarkably, at any point during a project's development, any previous version of the project can be easily accessed, so CVS also serves as a complete record of the history of all earlier versions of the project and of all the changes to the project's code. It thus acts like what has been metaphorically called a time machine. We will overview the concepts and techniques that underlie
CVS (and similar systems) and illustrate its use in some detail, with examples selected from the comprehensive treatment of CVS by Fogel and Bar (2003).
CVS, which is available for download from www.nongnu.org/cvs, is the most widely used version control tool. It is distributed as open source under the General Public License (GPL). It is an award-winning tool; its major developers received the STUG (Software Tools User Group) award in 2003, in which it was identified as "the essential enabling technology of distributed development" (STUG award announcement for 2003; http://www.usenix.org/about/stug.html). As Fogel and Bar (2003, p. 10) observe, "CVS became the free software world's first choice for revision control because there's a close match ... between the way CVS encourages a project to be run and the way free projects actually do run."
CVS serves two basic functions. On the one hand it keeps a complete historical digest of all actions (patches) against a project, and on the other hand it facilitates distributed developer collaboration (Fogel and Bar, 2003). As an example, consider the following scenario. Suppose a user reports a bug in the last public release of a CVS project and a developer wants to locate the bug and fix it. Assuming the project has evolved since the previous release, the developer really needs an earlier version of the project, not its current development state. Recapturing that earlier state is easy with CVS because it automatically retains the entire development tree of the project. Furthermore, CVS also allows the earlier version, once the bug is repaired, to be easily reintegrated with the new current state of the project. Of course, it is worth stating that the kind of development management that CVS does had already been possible before the deployment of CVS. The advantage is that CVS makes it much easier to do, which is a critical factor particularly in a volunteer environment. As Fogel and Bar (2003, p. 11) observe: "[I]t reduces the overhead in running a volunteer-friendly project by giving the general public easy access to the sources and by offering features designed specifically to aid the generation of patches to the source."
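As a sketch of this "time machine" behavior, the following local session creates a repository, tags a release, commits later work, and then winds a file back to the released state. The repository path, module name, and tag are invented for the demonstration, and the commands assume a cvs client is installed (the first line skips the demo if it is not):

```shell
# Guard: this sketch needs the cvs client.
command -v cvs >/dev/null 2>&1 || { echo "cvs not installed"; exit 0; }

REPO=$PWD/cvsdemo-repo
cvs -d "$REPO" init                               # create an empty repository

mkdir import-src && echo "release 1.0 code" > import-src/app.txt
(cd import-src && cvs -d "$REPO" import -m "1.0 release" proj vendor rel_1_0)

cvs -d "$REPO" checkout proj                      # working copy of the module
echo "new development code" > proj/app.txt
(cd proj && cvs commit -m "post-release development" app.txt)

# A bug is reported against release 1.0: wind the working copy back
(cd proj && cvs update -r rel_1_0 app.txt)
cat proj/app.txt
```

After the final update, app.txt again holds the released text; a later `cvs update -A` clears the sticky tag and returns the working copy to the latest development state.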
CVS is a client-server system under which software projects are stored in a so-called repository on a central server that serves content to possibly remote clients. Its client-side manifestations let multiple developers remotely and concurrently check out the latest version of a project from the repository. They can then modify the source code on the client(s) as they see fit, and thereafter commit any changes they have made to their working copy back to the central repository in a coordinated manner, assuming they have the write privileges to do so. This is called a copy-modify-merge development cycle.
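As a sketch of this cycle, a session against a hypothetical project named myproject might look like the following (the command names are real CVS commands; the file name and log message are invented for illustration):

```shell
$ cvs checkout myproject             # copy: obtain a private working copy
$ cd myproject
$ vi hello.c                         # modify: edit the source locally
$ cvs update                         # merge: fold in changes others committed
$ cvs commit -m "Fix reported bug"   # publish the merged result
```

Note that commit succeeds only if the working copy is up to date with the repository; otherwise CVS asks the developer to run update first and merge.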
P1:JYD
9780521881036c04 CUNY1180/Deek 0521 88103 6 October 1, 2007 17:3
122 4Technologies Underlying Open Source Development
Prior to CVS, versioning tools followed a lock-modify-unlock model for file changes. Only one developer could have access to a particular file at a time; other developers had to wait until the file being modified was released. This kind of solo, mutually exclusive access requires considerable coordination. If the developers are collocated, or know each other well and can contact each other quickly if a lockout is handicapping their work, or if the group of developers is small so that concurrent accesses are infrequent, then the coordination may be manageable. But in a large, geographically and temporally distributed group of developers, the overhead of coordinating becomes onerous and annoying – a problematic issue in what may be a preponderantly volunteer community. This concurrent access is one way in which the copy-modify-merge model of CVS smoothes the interactions in a distributed development. The impact of conflicts in CVS also appears to be less than might be expected in any case. Berliner, one of the key creators of CVS, indicated that in his own personal experience actual conflicts are usually not particularly problematic: “conflicts that occur when the same object has been modified by someone else are quite rare” and that if they do occur “the changes made by the other developer are usually easily resolved” (Berliner, 1990).
Diff and Patch
The CVS development tree is not stored explicitly. Under CVS, earlier versions of the project under development are maintained only implicitly, with just the differences between successive versions kept – a technique that is called delta compression. The CVS system lets a developer make changes, track changes made by other developers by viewing a log of changes, access arbitrary earlier versions of the project on the basis, for example, of a date or revision number, and initiate new branches of the project. The system can automatically integrate developer changes into the project master copy on the repository or into any working copies that are currently checked out by any developers, using a combination of its update and commit processes. The distributed character of the project’s developers, who are working on the project at different times and places, benefits greatly from this kind of concurrent access, with no developer having exclusive access to the repository files; otherwise the project could not be collaborated on as effectively. Changes are generally only committed after testing is complete, so the master copy stays runnable. The committed changes are accompanied by developer log messages that explain the change. Conflicts caused by a developer who is concurrently changing a part of the project that has already been changed by another developer are detected automatically when the developer attempts to commit the change. These conflicts must then be resolved manually before the changes can be committed to the repository.
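When CVS does detect such a conflict, it marks the overlapping region directly in the developer’s working copy with conflict markers. A hypothetical fragment of a file hello.c might then look like this (the revision number 1.6 is made up for illustration):

```text
<<<<<<< hello.c
    printf("Hello, world!\n");   /* your uncommitted local change */
=======
    printf("Hello, CVS!\n");     /* the change already in the repository */
>>>>>>> 1.6
```

The developer edits the file to keep the desired text, removes the markers, and only then can the commit proceed.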
The basic programs (commands, utilities, files) required for such a versioning system include the following:
1. the diff command,
2. the patch command, and
3. the patch file.
The diff command is a Unix command that identifies and outputs the differences between a pair of text files on a line-by-line basis. It indicates (depending on the format selected) whether lines have been added, deleted, or changed, with unchanged shared lines not output, except as context. These are the only four possible editorial states.
Conceptually, diff takes a pair of files A and B and creates a file C representing their “difference.” The output file is usually called a patch file because of its use in collaborative development, where the difference represents a “software patch” that is scheduled to be made to a current version of a program. Modifications to projects may be submitted as patches to project developers (or maintainers) who can evaluate the submitted code. The core developers can then decide whether a suggested patch should be rejected, or accepted and committed to the source repository, to which only the developers have write access. The so-called unified difference format for the diff command is especially useful in open source development because it lets project maintainers more readily recognize and understand the code changes being submitted. For example, the unified format includes surrounding lines that have not been changed as context, making it easier to recognize what contents have been changed and where. Then, a judgment is required before the changes are committed to the project repository.
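The shape of the unified format can be illustrated without any CVS machinery at all: Python’s standard difflib module produces the same style of output. The file names and contents below are invented for illustration:

```python
import difflib

# Hypothetical "before" and "after" contents of a small text file.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "beta\n", "delta\n", "gamma\n"]

# unified_diff prefixes removed lines with "-", added lines with "+",
# and includes unchanged neighboring lines as context (no prefix).
diff = "".join(difflib.unified_diff(old, new, fromfile="A", tofile="B"))
print(diff)
```

Running this prints the header lines for A and B, a hunk header "@@ -1,3 +1,4 @@", the unchanged context lines, and the single added line "+delta" – exactly the kind of compact, reviewable change a project maintainer would evaluate.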
The diff command works in combination with the patch command to enact changes (Fountain, 2002). The Unix patch command uses the textual differences between an original file A and a revised file B, as summarized in a diff file C, to update file A to reflect the changes introduced in B. For example, in a collaborative development context, if B is an updated version of the downloaded source code in A, then:

diff A B > C

creates the patch file C as the difference of A and B. Then the command:

patch A < C

could be used to apply the patch C to update A, so it corresponds to the revision B.
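The round trip can be sketched in a shell session. The file names A, B, and C follow the text above; the file contents are invented for illustration:

```shell
# Create an "original" A and a "revised" B (contents are made up).
printf 'alpha\nbeta\ngamma\n'        > A
printf 'alpha\nbeta\ndelta\ngamma\n' > B
# diff exits with status 1 when the files differ, so tolerate that status.
diff -u A B > C || true
# Apply the patch file C to bring A up to date with B.
patch A < C
cmp -s A B && echo 'A now matches B'
```

After the patch is applied, A is byte-for-byte identical to B, even though only the small file C had to be transmitted.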
The complementary diff and patch commands are extremely useful because they allow source code changes, in the form of a relatively small patch file like C (instead of the entire new version B), to be submitted, for example, by e-mail. After this submission, the small patch changes can be scrutinized by project maintainers before they are integrated into the development repository.
These commands are considered the crucial underlying elements in versioning systems, regardless of whether they are used explicitly or wrapped up in a tool. CVS C-implementer Berliner characterizes the patch program as the “indispensable tool for applying a diff file to an original” (Berliner, 1990). The patch program was invented by Larry Wall (creator of Perl) in 1985.
4.2 CVS Commands
Note: The following discussion is based on the well-known introduction to CVS by Karl Fogel and Moshe Bar (2003), specifically the “Tour of CVS” in their Chapter 2 – though their treatment is far more detailed, overall about 370 pages for the entire text in the current PDF. The present overview gives only a glimpse of CVS and is intended as a bird’s-eye view of how it works. We will use a number of examples from the Fogel and Bar (2003) tour, which we will reference carefully to facilitate ready access to the original treatise. We also intersperse the examples with contextual comments about the role of CVS. The interested reader should see Fogel and Bar (2003) for a comprehensive, in-depth treatment. We next illustrate some of the key CVS commands.
4.2.1 Platforms and Clients
Naturally, in order to execute the cvs program it must have been installed on your machine in the first place. CVS comes with most Linux distributions, so in that case you do not have to install it. Otherwise, you can build CVS from the source code provided at sites like the Free Software Foundation (FSF)’s FTP site. The stable releases of the software are those with a single decimal point in their release version number. Unix-like platforms are obviously the most widely used for CVS development. The well-known documentation manual for the CVS system is called the Cederqvist Manual, named after its original author, who wrote the first version in 1992 (Cederqvist et al., 2003). (Incidentally, dates are interesting in these matters because they help correlate noteworthy technological developments related to open source. For example, CVS debuted around 1986, but the key C version by Berliner did not come out until 1989. Linus Torvalds posted his original Linux announcement in August 1991.)
The cvs executable, once installed, automatically allows you to use it as a client to connect to remote CVS repositories. If you want to create a repository on your own machine, you use the cvs init command, identifying the location of the new repository (for example, a directory such as /usr/local/newrepos) with the -d option. If you then add an appropriate Unix users group, then any of the users can create an independent new project using the cvs import command. Refer to Fogel and Bar (2003) for detailed information about where to get source code, compilation, commands, administration, etc.
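As a sketch, creating a repository and importing a new project might look like the following session (the repository path, project name, and tags are illustrative; import requires a log message plus vendor and release tags):

```shell
$ cvs -d /usr/local/newrepos init                # create an empty repository
$ cd myproject                                   # a directory of sources to add
$ cvs -d /usr/local/newrepos import -m "initial import" myproject vendortag start
```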
There are also Windows versions of CVS available. Currently these can only connect as clients to repositories on remote machines or serve repositories on their own local machine. They cannot provide repository service to remote machines. The Windows version is typically available as prebuilt binary executables. A free Windows program called WinCVS, distributed under the GPL, provides a CVS client that only lets you connect to a remote CVS repository server. However, it does not let you serve a repository from your own machine, even locally. WinCVS is available as a binary distribution with relatively easy installation and configuration instructions. The WinCVS client lets you make a working project copy from a remote repository, to which you can subsequently commit changes, update, or synchronize vis-à-vis the repository, etc.
4.2.2 Command Format
The CVS interface is command-line oriented. Both command options and global options can be specified. Command (or local) options only affect the particular command and are given to the right of the command itself. Global options affect the overall CVS environment independently of the current command and are given to the left of the command. The format to execute a command is

cvs -global-options command -command-options

For example, the statement:

cvs -Q update -p

runs the update command (Fogel and Bar, 2003, p. 27). The token cvs is of course the name of the CVS executable. The -Q tells the CVS program to operate in quiet mode, meaning there is no diagnostic output except when the command fails. The -p command option directs the results of the command to standard output. The repository being referenced may be local or remote, but in either case a working copy must have already been checked out. We consider the semantics of the illustrated update command in an upcoming section, but first we address a basic question: how do you get a copy of the project to work on in the first place?
4.2.3 Checking Out a Project From a Repository
The transparency of online open source projects is truly amazing. Using CVS, anyone on the Internet can get a copy of the most current version of a project. While only core developers can make changes to the master copy of a project in the repository, CVS allows anyone to retrieve a copy of the project, as well as to keep any of their own modifications conveniently synchronized vis-à-vis the repository. This is a major paradigm shift. It is the polar opposite of how things are done in a proprietary approach, where access to the source is prohibited to anyone outside the project loop. In open source under CVS, everyone connected to the Internet has instantaneous access to the real-time version of the source as well as to its development history: what was done, where it was done, by whom it was done, and when it was done! Of course, the same technology could also be used in a proprietary development model for use by a proprietary development team only, or by individuals or small teams working on a private project.
Remember the nature of the CVS distribution model. There is a single master copy of the project that some CVS system maintains centrally in a repository. Anyone who wants to look at the source code for a project in that repository, whether just to read it or to modify it, has to get his or her own separate working copy of the project from the repository. Given that a project named myproject already exists, a person checks out a working copy of the project with the command (Fogel and Bar, 2003, p. 32):

cvs checkout myproject
Of course, before you can actually check out a working copy of a project, you first have to tell your CVS system or client where the repository you expect to check out from is located. If the repository were stored locally on your own machine, you could execute the cvs program with the -d (for directory) option and just give the local path to the repository. A typical Unix example, assuming the repository is located at /usr/local/cvs (Fogel and Bar, 2003, p. 27), would be

cvs -d /usr/local/cvs command

To avoid having to type the -d option and repository path each time, you can set the environment variable CVSROOT to point to the repository. In the rest of this overview we will assume this has already been done (Fogel and Bar, 2003, p. 30).
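In a Bourne-style shell, for example, that one-time setup might look like this (the repository path matches the earlier local example):

```shell
$ CVSROOT=/usr/local/cvs
$ export CVSROOT
$ cvs checkout myproject    # no -d option needed now
```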
If the repository were located on a remote server reached over the Internet, you would use an access method. The method may allow unauthenticated access to the repository, but it is also possible to have password-authenticated access to the server (via an access method called pserver). Authenticated access requires not only a username,