Principles of Distributed Database Systems
M. Tamer Özsu • Patrick Valduriez
Third Edition
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer, software,
or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are
subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Springer New York Dordrecht Heidelberg London
M. Tamer Özsu
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario
Canada N2L 3G1
Tamer.Ozsu@uwaterloo.ca

Patrick Valduriez
INRIA
LIRMM
161 rue Ada
34392 Montpellier Cedex
France
Patrick.Valduriez@inria.fr

ISBN 978-1-4419-8833-1        e-ISBN 978-1-4419-8834-8
DOI 10.1007/978-1-4419-8834-8
Library of Congress Control Number: 2011922491
© Springer Science+Business Media, LLC 2011
This book was previously published by: Pearson Education, Inc.
To my family
and my parents
M.T.Ö.

To Esther, my daughters Anna, Juliette and
Sarah, and my parents
P.V.
Preface
It has been almost twenty years since the first edition of this book appeared, and ten
years since we released the second edition. As one can imagine, in a fast-changing
area such as this, there have been significant changes in the intervening period.
Distributed data management went from a potentially significant technology to one
that is commonplace. The advent of the Internet and the World Wide Web has
certainly changed the way we typically look at distribution. The emergence in recent
years of different forms of distributed computing, exemplified by data streams and
cloud computing, has regenerated interest in distributed data management. Thus, it
was time for a major revision of the material.
We started to work on this edition five years ago, and it has taken quite a while to
complete the work. The end result, however, is a book that has been heavily revised –
while we maintained and updated the core chapters, we have also added new ones.
The major changes are the following:
1. Database integration and querying is now treated in much more detail, re-
flecting the attention these topics have received in the community in the
past decade. Chapter 4 focuses on the integration process, while Chapter 9
discusses querying over multidatabase systems.
2. The previous editions had only a brief discussion of data replication protocols.
This topic is now covered in a separate chapter (Chapter 13) where we provide
an in-depth discussion of the protocols and how they can be integrated with
transaction management.
3. Peer-to-peer data management is discussed in depth in Chapter 16. These
systems have become an important and interesting architectural alternative to
classical distributed database systems. Although the early distributed database
system architectures followed the peer-to-peer paradigm, the modern incarnation
of these systems has fundamentally different characteristics, so they
deserve in-depth discussion in a chapter of their own.
4. Web data management is discussed in Chapter 17. This is a difficult topic
to cover since there is no unifying framework. We discuss various aspects
of the topic ranging from web models to search engines to distributed XML
processing.
5. Earlier editions contained a chapter where we discussed "recent issues" at the
time. In this edition, we again have a similar chapter (Chapter 18) where we
cover stream data management and cloud computing. These topics are still
in flux and are subjects of considerable ongoing research. We highlight the
issues and the potential research directions.
The resulting manuscript strikes a balance between our two objectives, namely to
address new and emerging issues, and to maintain the main characteristics of the book
in addressing the principles of distributed data management.
The organization of the book can be divided into two major parts. The first part
covers the fundamental principles of distributed data management and consists of
Chapters 1 to 14. Chapter 2 in this part covers the background and can be skipped if
the students already have sufficient knowledge of relational database concepts
and computer network technology. The only part of this chapter that is essential
is Example 2.3, which introduces the running example that we use throughout much
of the book. The second part covers more advanced topics and includes Chapters
15–18. What one covers in a course depends very much on its duration and the course
objectives. If the course aims to discuss the fundamental techniques, then it might
cover Chapters 1, 3, 5, 6–8, and 10–12. An extended coverage would include, in addition
to the above, Chapters 4, 9, and 13. Courses that have time to cover more material
can selectively pick one or more of Chapters 15–18 from the second part.
Many colleagues have assisted with this edition of the book. S. Keshav (Univer-
sity of Waterloo) has read and provided many suggestions to update the sections
on computer networks. Renée Miller (University of Toronto) and Erhard Rahm
(University of Leipzig) read an early draft of Chapter 4 and provided many com-
ments. Alon Halevy (Google) answered a number of questions about this chapter
and provided a draft copy of his upcoming book on this topic, as well as reading
and providing feedback on Chapter 9; Avigdor Gal (Technion) also reviewed and
critiqued this chapter very thoroughly. Matthias Jarke and Xiang Li (University of
Aachen), Gottfried Vossen (University of Muenster), and Erhard Rahm and Andreas
Thor (University of Leipzig) contributed exercises to this chapter. Hubert Naacke
(University of Paris 6) contributed to the section on heterogeneous cost modeling
and Fabio Porto (LNCC, Petropolis) to the section on adaptive query processing of
Chapter 9. Data replication (Chapter 13) could not have been written without the
assistance of Gustavo Alonso (ETH Zürich) and Bettina Kemme (McGill University).
Tamer spent four months in Spring 2006 visiting Gustavo, where work on this chapter
began and involved many long discussions. Bettina read multiple iterations of this
chapter over the next year, criticizing everything and pointing out better ways of
explaining the material. Esther Pacitti (University of Montpellier) also contributed to
this chapter, both by reviewing it and by providing background material; she also
contributed to the section on replication in database clusters in Chapter 14. Ricardo
Jimenez-Peris also contributed to that chapter in the section on fault-tolerance in
database clusters. Khuzaima Daudjee (University of Waterloo) read and provided
comments on this chapter as well. Chapter 15 on Distributed Object Database Man-
agement was reviewed by Serge Abiteboul (INRIA), who provided an important critique
of the material and suggestions for its improvement. Peer-to-peer data management
(Chapter 16) owes a lot to discussions with Beng Chin Ooi (National University
of Singapore) during the four months Tamer was visiting NUS in the fall of 2006.
The section of Chapter 16 on query processing in P2P systems uses material from
the PhD work of Reza Akbarinia (INRIA) and Wenceslao Palma (PUC-Valparaiso,
Chile), while the section on replication uses material from the PhD work of Vidal
Martins (PUCPR, Curitiba). The distributed XML processing section of Chapter 17
uses material from the PhD work of Ning Zhang (Facebook) and Patrick Kling at
the University of Waterloo, and Ying Zhang at CWI. All three of them also read
the material and provided significant feedback. Victor Muntés i Mulero (Universitat
Politècnica de Catalunya) contributed to the exercises in that chapter. Özgür Ulusoy
(Bilkent University) provided comments and corrections on Chapters 16 and 17.
The data stream management section of Chapter 18 draws from the PhD work of Lukasz
Golab (AT&T Labs-Research) and Yingying Tao at the University of Waterloo.
Walid Aref (Purdue University) and Avigdor Gal (Technion) used the draft of the
book in their courses, which was very helpful in debugging certain parts. We thank
them, as well as the many colleagues who had helped out with the first two editions,
for all their assistance. We have not always followed their advice, and, needless
to say, the resulting problems and errors are ours. Students in two courses at the
University of Waterloo (Web Data Management in Winter 2005, and Internet-Scale
Data Distribution in Fall 2005) wrote surveys as part of their coursework that were
very helpful in structuring some chapters. Tamer taught courses at ETH Zürich
(PDDBS – Parallel and Distributed Databases in Spring 2006) and at NUS (CS5225 –
Parallel and Distributed Database Systems in Fall 2010) using parts of this edition.
We thank the students in all these courses for their contributions and their patience as
they had to deal with chapters that were works in progress – the material got cleaned
up considerably as a result of these teaching experiences.
You will note that the publisher of the third edition of the book is different from that
of the first two editions. Pearson, our previous publisher, decided not to be involved
with the third edition. Springer subsequently showed considerable interest in the
book. We would like to thank Susan Lagerstrom-Fife and Jennifer Evans of Springer
for their lightning-fast decision to publish the book, and Jennifer Mauer for a ton
of hand-holding during the conversion process. We would also like to thank Tracy
Dunkelberger of Pearson, who shepherded the reversal of the copyright to us without
delay.
As in earlier editions, we will have presentation slides that can be used to teach
from the book, as well as solutions to most of the exercises. These will be available
from Springer to instructors who adopt the book, and there will be a link to them
from the book's site at springer.com.
Finally, we would be very interested to hear your comments and suggestions
regarding the material. We welcome any feedback, but we would particularly like to
receive feedback on the following aspects:
1. any errors that may have remained despite our best efforts (although we hope
there are not many);
2. any topics that should no longer be included and any topics that should be
added or expanded; and
3. any exercises that you may have designed that you would like to be included
in the book.
M. Tamer Özsu (Tamer.Ozsu@uwaterloo.ca)
Patrick Valduriez (Patrick.Valduriez@inria.fr)

November 2010
Contents
1 Introduction . . . . . . . . 1
1.1 Distributed Data Processing . . . . . . . . 2
1.2 What is a Distributed Database System? . . . . . . . . 3
1.3 Data Delivery Alternatives . . . . . . . . 5
1.4 Promises of DDBSs . . . . . . . . 7
1.4.1 Transparent Management of Distributed and Replicated Data . . . . . . . . 7
1.4.2 Reliability Through Distributed Transactions . . . . . . . . 12
1.4.3 Improved Performance . . . . . . . . 14
1.4.4 Easier System Expansion . . . . . . . . 15
1.5 Complications Introduced by Distribution . . . . . . . . 16
1.6 Design Issues . . . . . . . . 16
1.6.1 Distributed Database Design . . . . . . . . 17
1.6.2 Distributed Directory Management . . . . . . . . 17
1.6.3 Distributed Query Processing . . . . . . . . 17
1.6.4 Distributed Concurrency Control . . . . . . . . 18
1.6.5 Distributed Deadlock Management . . . . . . . . 18
1.6.6 Reliability of Distributed DBMS . . . . . . . . 18
1.6.7 Replication . . . . . . . . 19
1.6.8 Relationship among Problems . . . . . . . . 19
1.6.9 Additional Issues . . . . . . . . 20
1.7 Distributed DBMS Architecture . . . . . . . . 21
1.7.1 ANSI/SPARC Architecture . . . . . . . . 21
1.7.2 A Generic Centralized DBMS Architecture . . . . . . . . 23
1.7.3 Architectural Models for Distributed DBMSs . . . . . . . . 25
1.7.4 Autonomy . . . . . . . . 25
1.7.5 Distribution . . . . . . . . 27
1.7.6 Heterogeneity . . . . . . . . 27
1.7.7 Architectural Alternatives . . . . . . . . 28
1.7.8 Client/Server Systems . . . . . . . . 28
1.7.9 Peer-to-Peer Systems . . . . . . . . 32
1.7.10 Multidatabase System Architecture . . . . . . . . 35
1.8 Bibliographic Notes . . . . . . . . 38
2 Background . . . . . . . . 41
2.1 Overview of Relational DBMS . . . . . . . . 41
2.1.1 Relational Database Concepts . . . . . . . . 41
2.1.2 Normalization . . . . . . . . 43
2.1.3 Relational Data Languages . . . . . . . . 45
2.2 Review of Computer Networks . . . . . . . . 58
2.2.1 Types of Networks . . . . . . . . 60
2.2.2 Communication Schemes . . . . . . . . 63
2.2.3 Data Communication Concepts . . . . . . . . 65
2.2.4 Communication Protocols . . . . . . . . 67
2.3 Bibliographic Notes . . . . . . . . 70
3 Distributed Database Design . . . . . . . . 71
3.1 Top-Down Design Process . . . . . . . . 73
3.2 Distribution Design Issues . . . . . . . . 75
3.2.1 Reasons for Fragmentation . . . . . . . . 75
3.2.2 Fragmentation Alternatives . . . . . . . . 76
3.2.3 Degree of Fragmentation . . . . . . . . 77
3.2.4 Correctness Rules of Fragmentation . . . . . . . . 79
3.2.5 Allocation Alternatives . . . . . . . . 79
3.2.6 Information Requirements . . . . . . . . 80
3.3 Fragmentation . . . . . . . . 81
3.3.1 Horizontal Fragmentation . . . . . . . . 81
3.3.2 Vertical Fragmentation . . . . . . . . 98
3.3.3 Hybrid Fragmentation . . . . . . . . 112
3.4 Allocation . . . . . . . . 113
3.4.1 Allocation Problem . . . . . . . . 114
3.4.2 Information Requirements . . . . . . . . 116
3.4.3 Allocation Model . . . . . . . . 118
3.4.4 Solution Methods . . . . . . . . 121
3.5 Data Directory . . . . . . . . 122
3.6 Conclusion . . . . . . . . 123
3.7 Bibliographic Notes . . . . . . . . 125
4 Database Integration . . . . . . . . 131
4.1 Bottom-Up Design Methodology . . . . . . . . 133
4.2 Schema Matching . . . . . . . . 137
4.2.1 Schema Heterogeneity . . . . . . . . 140
4.2.2 Linguistic Matching Approaches . . . . . . . . 141
4.2.3 Constraint-based Matching Approaches . . . . . . . . 143
4.2.4 Learning-based Matching . . . . . . . . 145
4.2.5 Combined Matching Approaches . . . . . . . . 146
4.3 Schema Integration . . . . . . . . 147
4.4 Schema Mapping . . . . . . . . 149
4.4.1 Mapping Creation . . . . . . . . 150
4.4.2 Mapping Maintenance . . . . . . . . 155
4.5 Data Cleaning . . . . . . . . 157
4.6 Conclusion . . . . . . . . 159
4.7 Bibliographic Notes . . . . . . . . 160
5 Data and Access Control . . . . . . . . 171
5.1 View Management . . . . . . . . 172
5.1.1 Views in Centralized DBMSs . . . . . . . . 172
5.1.2 Views in Distributed DBMSs . . . . . . . . 175
5.1.3 Maintenance of Materialized Views . . . . . . . . 177
5.2 Data Security . . . . . . . . 180
5.2.1 Discretionary Access Control . . . . . . . . 181
5.2.2 Multilevel Access Control . . . . . . . . 183
5.2.3 Distributed Access Control . . . . . . . . 185
5.3 Semantic Integrity Control . . . . . . . . 187
5.3.1 Centralized Semantic Integrity Control . . . . . . . . 189
5.3.2 Distributed Semantic Integrity Control . . . . . . . . 194
5.4 Conclusion . . . . . . . . 200
5.5 Bibliographic Notes . . . . . . . . 201
6 Overview of Query Processing . . . . . . . . 205
6.1 Query Processing Problem . . . . . . . . 206
6.2 Objectives of Query Processing . . . . . . . . 209
6.3 Complexity of Relational Algebra Operations . . . . . . . . 210
6.4 Characterization of Query Processors . . . . . . . . 211
6.4.1 Languages . . . . . . . . 212
6.4.2 Types of Optimization . . . . . . . . 212
6.4.3 Optimization Timing . . . . . . . . 213
6.4.4 Statistics . . . . . . . . 213
6.4.5 Decision Sites . . . . . . . . 214
6.4.6 Exploitation of the Network Topology . . . . . . . . 214
6.4.7 Exploitation of Replicated Fragments . . . . . . . . 215
6.4.8 Use of Semijoins . . . . . . . . 215
6.5 Layers of Query Processing . . . . . . . . 215
6.5.1 Query Decomposition . . . . . . . . 216
6.5.2 Data Localization . . . . . . . . 217
6.5.3 Global Query Optimization . . . . . . . . 218
6.5.4 Distributed Query Execution . . . . . . . . 219
6.6 Conclusion . . . . . . . . 219
6.7 Bibliographic Notes . . . . . . . . 220
7 Query Decomposition and Data Localization . . . . . . . . 221
7.1 Query Decomposition . . . . . . . . 222
7.1.1 Normalization . . . . . . . . 222
7.1.2 Analysis . . . . . . . . 223
7.1.3 Elimination of Redundancy . . . . . . . . 226
7.1.4 Rewriting . . . . . . . . 227
7.2 Localization of Distributed Data . . . . . . . . 231
7.2.1 Reduction for Primary Horizontal Fragmentation . . . . . . . . 232
7.2.2 Reduction for Vertical Fragmentation . . . . . . . . 235
7.2.3 Reduction for Derived Fragmentation . . . . . . . . 237
7.2.4 Reduction for Hybrid Fragmentation . . . . . . . . 238
7.3 Conclusion . . . . . . . . 241
7.4 Bibliographic Notes . . . . . . . . 241
8 Optimization of Distributed Queries . . . . . . . . 245
8.1 Query Optimization . . . . . . . . 246
8.1.1 Search Space . . . . . . . . 246
8.1.2 Search Strategy . . . . . . . . 248
8.1.3 Distributed Cost Model . . . . . . . . 249
8.2 Centralized Query Optimization . . . . . . . . 257
8.2.1 Dynamic Query Optimization . . . . . . . . 257
8.2.2 Static Query Optimization . . . . . . . . 261
8.2.3 Hybrid Query Optimization . . . . . . . . 265
8.3 Join Ordering in Distributed Queries . . . . . . . . 267
8.3.1 Join Ordering . . . . . . . . 267
8.3.2 Semijoin Based Algorithms . . . . . . . . 269
8.3.3 Join versus Semijoin . . . . . . . . 272
8.4 Distributed Query Optimization . . . . . . . . 273
8.4.1 Dynamic Approach . . . . . . . . 274
8.4.2 Static Approach . . . . . . . . 277
8.4.3 Semijoin-based Approach . . . . . . . . 281
8.4.4 Hybrid Approach . . . . . . . . 286
8.5 Conclusion . . . . . . . . 290
8.6 Bibliographic Notes . . . . . . . . 292
9 Multidatabase Query Processing . . . . . . . . 297
9.1 Issues in Multidatabase Query Processing . . . . . . . . 298
9.2 Multidatabase Query Processing Architecture . . . . . . . . 299
9.3 Query Rewriting Using Views . . . . . . . . 301
9.3.1 Datalog Terminology . . . . . . . . 301
9.3.2 Rewriting in GAV . . . . . . . . 302
9.3.3 Rewriting in LAV . . . . . . . . 304
9.4 Query Optimization and Execution . . . . . . . . 307
9.4.1 Heterogeneous Cost Modeling . . . . . . . . 307
9.4.2 Heterogeneous Query Optimization . . . . . . . . 314
9.4.3 Adaptive Query Processing . . . . . . . . 320
9.5 Query Translation and Execution . . . . . . . . 327
9.6 Conclusion . . . . . . . . 330
9.7 Bibliographic Notes . . . . . . . . 331
10 Introduction to Transaction Management . . . . . . . . 335
10.1 Definition of a Transaction . . . . . . . . 337
10.1.1 Termination Conditions of Transactions . . . . . . . . 339
10.1.2 Characterization of Transactions . . . . . . . . 340
10.1.3 Formalization of the Transaction Concept . . . . . . . . 341
10.2 Properties of Transactions . . . . . . . . 344
10.2.1 Atomicity . . . . . . . . 344
10.2.2 Consistency . . . . . . . . 345
10.2.3 Isolation . . . . . . . . 346
10.2.4 Durability . . . . . . . . 349
10.3 Types of Transactions . . . . . . . . 349
10.3.1 Flat Transactions . . . . . . . . 351
10.3.2 Nested Transactions . . . . . . . . 352
10.3.3 Workflows . . . . . . . . 353
10.4 Architecture Revisited . . . . . . . . 356
10.5 Conclusion . . . . . . . . 357
10.6 Bibliographic Notes . . . . . . . . 358
11 Distributed Concurrency Control . . . . . . . . 361
11.1 Serializability Theory . . . . . . . . 362
11.2 Taxonomy of Concurrency Control Mechanisms . . . . . . . . 367
11.3 Locking-Based Concurrency Control Algorithms . . . . . . . . 369
11.3.1 Centralized 2PL . . . . . . . . 373
11.3.2 Distributed 2PL . . . . . . . . 374
11.4 Timestamp-Based Concurrency Control Algorithms . . . . . . . . 377
11.4.1 Basic TO Algorithm . . . . . . . . 378
11.4.2 Conservative TO Algorithm . . . . . . . . 381
11.4.3 Multiversion TO Algorithm . . . . . . . . 383
11.5 Optimistic Concurrency Control Algorithms . . . . . . . . 384
11.6 Deadlock Management . . . . . . . . 387
11.6.1 Deadlock Prevention . . . . . . . . 389
11.6.2 Deadlock Avoidance . . . . . . . . 390
11.6.3 Deadlock Detection and Resolution . . . . . . . . 391
11.7 "Relaxed" Concurrency Control . . . . . . . . 394
11.7.1 Non-Serializable Histories . . . . . . . . 395
11.7.2 Nested Distributed Transactions . . . . . . . . 396
11.8 Conclusion . . . . . . . . 398
11.9 Bibliographic Notes . . . . . . . . 401
12 Distributed DBMS Reliability . . . . . . . . 405
12.1 Reliability Concepts and Measures . . . . . . . . 406
12.1.1 System, State, and Failure . . . . . . . . 406
12.1.2 Reliability and Availability . . . . . . . . 408
12.1.3 Mean Time between Failures/Mean Time to Repair . . . . . . . . 409
12.2 Failures in Distributed DBMS . . . . . . . . 410
12.2.1 Transaction Failures . . . . . . . . 411
12.2.2 Site (System) Failures . . . . . . . . 411
12.2.3 Media Failures . . . . . . . . 412
12.2.4 Communication Failures . . . . . . . . 412
12.3 Local Reliability Protocols . . . . . . . . 413
12.3.1 Architectural Considerations . . . . . . . . 413
12.3.2 Recovery Information . . . . . . . . 416
12.3.3 Execution of LRM Commands . . . . . . . . 420
12.3.4 Checkpointing . . . . . . . . 425
12.3.5 Handling Media Failures . . . . . . . . 426
12.4 Distributed Reliability Protocols . . . . . . . . 427
12.4.1 Components of Distributed Reliability Protocols . . . . . . . . 428
12.4.2 Two-Phase Commit Protocol . . . . . . . . 428
12.4.3 Variations of 2PC . . . . . . . . 434
12.5 Dealing with Site Failures . . . . . . . . 436
12.5.1 Termination and Recovery Protocols for 2PC . . . . . . . . 437
12.5.2 Three-Phase Commit Protocol . . . . . . . . 443
12.6 Network Partitioning . . . . . . . . 448
12.6.1 Centralized Protocols . . . . . . . . 450
12.6.2 Voting-based Protocols . . . . . . . . 450
12.7 Architectural Considerations . . . . . . . . 453
12.8 Conclusion . . . . . . . . 454
12.9 Bibliographic Notes . . . . . . . . 455
13 Data Replication 459
13.1 Consistency of Replicated Databases 461
13.1.1 Mutual Consistency 461
13.1.2 Mutual Consistency versus Transaction Consistency 463
13.2 Update Management Strategies 465
13.2.1 Eager Update Propagation 465
13.2.2 Lazy Update Propagation 466
13.2.3 Centralized Techniques 466
13.2.4 Distributed Techniques 467
13.3 Replication Protocols 468
13.3.1 Eager Centralized Protocols 468
13.3.2 Eager Distributed Protocols 474
13.3.3 Lazy Centralized Protocols 475
13.3.4 Lazy Distributed Protocols 480
13.4 Group Communication 482
13.5 Replication and Failures 485
13.5.1 Failures and Lazy Replication 485
13.5.2 Failures and Eager Replication 486
13.6 Replication Mediator Service 489
13.7 Conclusion 491
13.8 Bibliographic Notes 493
14 Parallel Database Systems 497
14.1 Parallel Database System Architectures 498
14.1.1 Objectives 498
14.1.2 Functional Architecture 501
14.1.3 Parallel DBMS Architectures 502
14.2 Parallel Data Placement 508
14.3 Parallel Query Processing 512
14.3.1 Query Parallelism 513
14.3.2 Parallel Algorithms for Data Processing 515
14.3.3 Parallel Query Optimization 521
14.4 Load Balancing 525
14.4.1 Parallel Execution Problems 525
14.4.2 Intra-Operator Load Balancing 527
14.4.3 Inter-Operator Load Balancing 529
14.4.4 Intra-Query Load Balancing 530
14.5 Database Clusters 534
14.5.1 Database Cluster Architecture 535
14.5.2 Replication 537
14.5.3 Load Balancing 540
14.5.4 Query Processing 542
14.5.5 Fault-tolerance 545
14.6 Conclusion 546
14.7 Bibliographic Notes 547
15 Distributed Object Database Management 551
15.1 Fundamental Object Concepts and Object Models 553
15.1.1 Object 553
15.1.2 Types and Classes 556
15.1.3 Composition (Aggregation) 557
15.1.4 Subclassing and Inheritance 558
15.2 Object Distribution Design 560
15.2.1 Horizontal Class Partitioning 561
15.2.2 Vertical Class Partitioning 563
15.2.3 Path Partitioning 563
15.2.4 Class Partitioning Algorithms 564
15.2.5 Allocation 565
15.2.6 Replication 565
15.3 Architectural Issues 566
15.3.1 Alternative Client/Server Architectures 567
15.3.2 Cache Consistency 572
15.4 Object Management 574
15.4.1 Object Identifier Management 574
15.4.2 Pointer Swizzling 576
15.4.3 Object Migration 577
15.5 Distributed Object Storage 578
15.6 Object Query Processing 582
15.6.1 Object Query Processor Architectures 583
15.6.2 Query Processing Issues 584
15.6.3 Query Execution 589
15.7 Transaction Management 593
15.7.1 Correctness Criteria 594
15.7.2 Transaction Models and Object Structures 596
15.7.3 Transaction Management in Object DBMSs 596
15.7.4 Transactions as Objects 605
15.8 Conclusion 606
15.9 Bibliographic Notes 607
16 Peer-to-Peer Data Management 611
16.1 Infrastructure 614
16.1.1 Unstructured P2P Networks 615
16.1.2 Structured P2P Networks 618
16.1.3 Super-peer P2P Networks 622
16.1.4 Comparison of P2P Networks 624
16.2 Schema Mapping in P2P Systems 624
16.2.1 Pairwise Schema Mapping 625
16.2.2 Mapping based on Machine Learning Techniques 626
16.2.3 Common Agreement Mapping 626
16.2.4 Schema Mapping using IR Techniques 627
16.3 Querying Over P2P Systems 628
16.3.1 Top-k Queries 628
16.3.2 Join Queries 640
16.3.3 Range Queries 642
16.4 Replica Consistency 645
16.4.1 Basic Support in DHTs 646
16.4.2 Data Currency in DHTs 648
16.4.3 Replica Reconciliation 649
16.5 Conclusion 653
16.6 Bibliographic Notes 653
17 Web Data Management 657
17.1 Web Graph Management 658
17.1.1 Compressing Web Graphs 660
17.1.2 Storing Web Graphs as S-Nodes 661
17.2 Web Search 663
17.2.1 Web Crawling 664
17.2.2 Indexing 667
17.2.3 Ranking and Link Analysis 668
17.2.4 Evaluation of Keyword Search 669
17.3 Web Querying 670
17.3.1 Semistructured Data Approach 671
17.3.2 Web Query Language Approach 676
17.3.3 Question Answering 681
17.3.4 Searching and Querying the Hidden Web 685
17.4 Distributed XML Processing 689
17.4.1 Overview of XML 691
17.4.2 XML Query Processing Techniques 699
17.4.3 Fragmenting XML Data 703
17.4.4 Optimizing Distributed XML Processing 710
17.5 Conclusion 718
17.6 Bibliographic Notes 719
18 Current Issues: Streaming Data and Cloud Computing 723
18.1 Data Stream Management 723
18.1.1 Stream Data Models 725
18.1.2 Stream Query Languages 727
18.1.3 Streaming Operators and their Implementation 732
18.1.4 Query Processing 734
18.1.5 DSMS Query Optimization 738
18.1.6 Load Shedding and Approximation 739
18.1.7 Multi-Query Optimization 740
18.1.8 Stream Mining 741
18.2 Cloud Data Management 744
18.2.1 Taxonomy of Clouds 745
18.2.2 Grid Computing 748
18.2.3 Cloud Architectures 751
18.2.4 Data Management in the Cloud 753
18.3 Conclusion 760
18.4 Bibliographic Notes 762
References 765
Index 833
Chapter 1
Introduction
Distributed database system (DDBS) technology is the union of what appear to
be two diametrically opposed approaches to data processing: database system and
computer network technologies. Database systems have taken us from a paradigm
of data processing in which each application defined and maintained its own data
(Figure 1.1) to one in which the data are defined and administered centrally (Figure
1.2). This new orientation results in data independence, whereby the application
programs are immune to changes in the logical or physical organization of the data,
and vice versa.
One of the major motivations behind the use of database systems is the desire
to integrate the operational data of an enterprise and to provide centralized, thus
controlled access to that data. The technology of computer networks, on the other
hand, promotes a mode of work that goes against all centralization efforts. At first
glance it might be difficult to understand how these two contrasting approaches can
possibly be synthesized to produce a technology that is more powerful and more
promising than either one alone. The key to this understanding is the realization
[Figure: three programs, each carrying its own data description, accessing overlapping files (FILE 1, FILE 2, FILE 3) with redundant data]
Fig. 1.1 Traditional File Processing
[Figure: several programs sharing one data description and data manipulation layer over a single database]
Fig. 1.2 Database Processing
that the most important objective of the database technology is integration, not
centralization. It is important to realize that neither of these terms necessarily
implies the other. It is possible to achieve integration without centralization,
and that is exactly what the distributed database technology attempts to achieve.
In this chapter we define the fundamental concepts and set the framework for
discussing distributed databases. We start by examining distributed systems in general
in order to clarify the role of database technology within distributed data processing,
and then move on to topics that are more directly related to DDBS.
1.1 Distributed Data Processing
The term distributed processing (or distributed computing) is hard to define precisely.
Obviously, some degree of distributed processing goes on in any computer system,
even on single-processor computers where the central processing unit (CPU) and
input/output (I/O) functions are separated and overlapped. This separation and overlap
can be considered as one form of distributed processing. The widespread emergence
of parallel computers has further complicated the picture, since the distinction
between distributed computing systems and some forms of parallel computers is rather
vague.
In this book we define distributed processing in such a way that it leads to a
definition of a distributed database system. The working definition we use for a
distributed computing system states that it is a number of autonomous processing
elements (not necessarily homogeneous) that are interconnected by a computer
network and that cooperate in performing their assigned tasks. The “processing
element” referred to in this definition is a computing device that can execute a
program on its own. This definition is similar to those given in distributed systems
textbooks (e.g., [Tanenbaum and van Steen, 2002] and [Coulouris et al., 2001]).
A fundamental question that needs to be asked is: What is being distributed?
One of the things that might be distributed is the processing logic. In fact, the
definition of a distributed computing system given above implicitly assumes that the
processing logic or processing elements are distributed. Another possible distribution
is according to function. Various functions of a computer system could be delegated
to various pieces of hardware or software. A third possible mode of distribution is
according to data. Data used by a number of applications may be distributed to a
number of processing sites. Finally, control can be distributed. The control of the
execution of various tasks might be distributed instead of being performed by one
computer system. From the viewpoint of distributed database systems, these modes
of distribution are all necessary and important. In the following sections we talk
about these in more detail.
Another reasonable question to ask at this point is: Why do we distribute at all?
The classical answers to this question indicate that distributed processing better
corresponds to the organizational structure of today’s widely distributed enterprises,
and that such a system is more reliable and more responsive. More importantly,
many of the current applications of computer technology are inherently distributed.
Web-based applications, electronic commerce over the Internet, multimedia
applications such as news-on-demand or medical imaging, and manufacturing control
systems are all examples of such applications.
From a more global perspective, however, it can be stated that the fundamental
reason behind distributed processing is to be better able to cope with the large-scale
data management problems that we face today, by using a variation of the well-known
divide-and-conquer rule. If the necessary software support for distributed processing
can be developed, it might be possible to solve these complicated problems simply
by dividing them into smaller pieces and assigning them to different software groups,
which work on different computers and produce a system that runs on multiple
processing elements but can work efficiently toward the execution of a common task.
Distributed database systems should also be viewed within this framework and
treated as tools that could make distributed processing easier and more efficient. It is
reasonable to draw an analogy between what distributed databases might offer to the
data processing world and what the database technology has already provided. There
is no doubt that the development of general-purpose, adaptable, efficient distributed
database systems has aided greatly in the task of developing distributed software.
1.2 What is a Distributed Database System?
We define a distributed database as a collection of multiple, logically interrelated
databases distributed over a computer network. A distributed database management
system (distributed DBMS) is then defined as the software system that permits the
management of the distributed database and makes the distribution transparent to the
users. Sometimes “distributed database system” (DDBS) is used to refer jointly to
the distributed database and the distributed DBMS. The two important terms in these
definitions are “logically interrelated” and “distributed over a computer network.”
They help eliminate certain cases that have sometimes been accepted to represent a
DDBS.
A DDBS is not a “collection of files” that can be individually stored at each
node of a computer network. To form a DDBS, files should not only be logically
related, but there should be structure among the files, and access should be via
a common interface. We should note that there has been much recent activity in
providing DBMS functionality over semi-structured data that are stored in files on
the Internet (such as Web pages). In light of this activity, the above requirement
may seem unnecessarily strict. Nevertheless, it is important to make a distinction
between a DDBS where this requirement is met, and more general distributed data
management systems that provide a “DBMS-like” access to data. In various chapters
of this book, we will expand our discussion to cover these more general systems.
It has sometimes been assumed that the physical distribution of data is not the
most significant issue. The proponents of this view would therefore feel comfortable
in labeling as a distributed database a number of (related) databases that reside in the
same computer system. However, the physical distribution of data is important. It
creates problems that are not encountered when the databases reside in the same
computer. These difficulties are discussed in Section 1.5. Note that physical distribution
does not necessarily imply that the computer systems be geographically far apart;
they could actually be in the same room. It simply implies that the communication
between them is done over a network instead of through shared memory or shared
disk (as would be the case with multiprocessor systems), with the network as the only
shared resource.
This suggests that multiprocessor systems should not be considered as DDBSs.
Although shared-nothing multiprocessors, where each processor node has its own
primary and secondary memory, and may also have its own peripherals, are quite
similar to the distributed environment that we focus on, there are differences. The
fundamental difference is the mode of operation. A multiprocessor system design
is rather symmetrical, consisting of a number of identical processor and memory
components, and controlled by one or more copies of the same operating system
that is responsible for a strict control of the task assignment to each processor. This
is not true in distributed computing systems, where heterogeneity of the operating
system as well as the hardware is quite common. Database systems that run over
multiprocessor systems are called parallel database systems and are discussed in
Chapter 14.
A DDBS is also not a system where, despite the existence of a network, the
database resides at only one node of the network (Figure 1.3). In this case, the
problems of database management are no different than the problems encountered in
a centralized database environment (shortly, we will discuss client/server DBMSs
which relax this requirement to a certain extent). The database is centrally managed
by one computer system (site 2 in Figure 1.3) and all the requests are routed to
that site. The only additional consideration has to do with transmission delays. It
is obvious that the existence of a computer network or a collection of “files” is not
sufficient to form a distributed database system. What we are interested in is an
environment where data are distributed among a number of sites (Figure 1.4).
[Figure: five sites connected by a communication network; the database resides at a single site (site 2)]
Fig. 1.3 Central Database on a Network
[Figure: five sites connected by a communication network, each holding part of the database]
Fig. 1.4 DDBS Environment
1.3 Data Delivery Alternatives
In distributed databases, data are “delivered” from the sites where they are stored to
where the query is posed. We characterize the data delivery alternatives along three
orthogonal dimensions: delivery modes, frequency, and communication methods. The
combinations of alternatives along each of these dimensions (that we discuss next)
provide a rich design space.
The alternative delivery modes are pull-only, push-only, and hybrid. In the pull-only
mode of data delivery, the transfer of data from servers to clients is initiated
by a client pull. When a client request is received at a server, the server responds by
locating the requested information. The main characteristic of pull-based delivery is
that new data items, or updates to existing data items, arrive at the server without
notification to clients, unless clients explicitly poll the server. Also, in pull-based
mode, servers must be interrupted continuously to deal with requests from
clients. Furthermore, the information that clients can obtain from a server is limited
to when and what clients know to ask for. Conventional DBMSs offer primarily
pull-based data delivery.
In the push-only mode of data delivery, the transfer of data from servers to clients
is initiated by a server push in the absence of any specific request from clients.
The main difficulty of the push-based approach is in deciding which data would be
of common interest, and when to send them to clients; alternatives are periodic,
irregular, or conditional. Thus, the usefulness of server push depends heavily upon
the accuracy of a server to predict the needs of clients. In push-based mode, servers
disseminate information either to an unbounded set of clients (random broadcast)
who can listen to a medium, or to a selective set of clients (multicast) who belong to
some categories of recipients that may receive the data.
The hybrid mode of data delivery combines the client-pull and server-push
mechanisms. The continuous (or continual) query approach (e.g., [Liu et al., 1996], [Terry
et al., 1992], [Chen et al., 2000], [Pandey et al., 2003]) presents one possible way of
combining the pull and push modes: namely, the transfer of information from servers
to clients is first initiated by a client pull (by posing the query), and the subsequent
transfer of updated information to clients is initiated by a server push.
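The three modes can be sketched with a toy in-memory server. All names here (the class, its methods, the callback scheme) are invented purely for illustration and are not drawn from any real middleware:

```python
# Toy sketch of pull, push, and hybrid (continuous-query) delivery.
# All names (DeliveryServer, poll, subscribe) are illustrative only.

class DeliveryServer:
    def __init__(self):
        self.data = {}            # item -> current value
        self.subscribers = {}     # item -> list of client callbacks

    def poll(self, item):
        # Pull-only: transfer happens only when a client asks.
        return self.data.get(item)

    def subscribe(self, item, callback):
        # Registers client interest; used by push and hybrid modes.
        self.subscribers.setdefault(item, []).append(callback)

    def update(self, item, value):
        # On update, push the new value to all registered clients.
        self.data[item] = value
        for cb in self.subscribers.get(item, []):
            cb(item, value)

# Hybrid (continuous query): an initial client pull, then server pushes.
server = DeliveryServer()
server.update("IBM", 100)

received = []
initial = server.poll("IBM")                        # client pull
server.subscribe("IBM", lambda i, v: received.append(v))
server.update("IBM", 105)                           # subsequent server push
```

The client obtains the current value (100) by pulling once, and the later update (105) reaches it without a second request, which is exactly the pull-then-push pattern of the continuous query approach.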
There are three typical frequency measurements that can be used to classify the
regularity of data delivery. They are periodic, conditional, and ad-hoc or irregular.
In periodic delivery, data are sent from the server to clients at regular intervals.
The intervals can be defined by system default or by clients using their profiles. Both
pull and push can be performed in periodic fashion. Periodic delivery is carried out
on a regular and pre-specified repeating schedule. A client request for IBM’s stock
price every week is an example of a periodic pull. An example of periodic push is
when an application can send out stock price listings on a regular basis, say every
morning. Periodic push is particularly useful for situations in which clients might not
be available at all times, or might be unable to react to what has been sent, such as in
the mobile setting where clients can become disconnected.
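Periodic pull can be sketched as a simple polling loop; in this sketch a tick counter stands in for a real clock, and the function and variable names are illustrative only:

```python
# Periodic pull: the client polls the server on a fixed schedule.
# A tick counter stands in for wall-clock time; names are illustrative.

def periodic_pull(server_read, ticks, period):
    """Poll server_read(t) every `period` ticks; return observed values."""
    observed = []
    for t in range(ticks):
        if t % period == 0:          # the schedule fires
            observed.append(server_read(t))
    return observed

# A toy "server" whose value changes over time.
prices = {0: 100, 1: 101, 2: 103, 3: 99, 4: 98, 5: 102}
observed = periodic_pull(lambda t: prices[t], ticks=6, period=2)
# The client sees only the values at ticks 0, 2, and 4; the
# intermediate changes at ticks 1, 3, and 5 are missed.
```

This illustrates the limitation noted above: with pull, what the client learns is bounded by when it knows to ask.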
In conditional delivery, data are sent from servers whenever certain conditions
installed by clients in their profiles are satisfied. Such conditions can be as simple
as a given time span or as complicated as event-condition-action rules. Conditional
delivery is mostly used in the hybrid or push-only delivery systems. Using conditional
push, data are sent out according to a pre-specified condition, rather than
any particular repeating schedule. An application that sends out stock prices only
when they change is an example of conditional push. An application that sends out a
balance statement only when the total balance is 5% below the pre-defined balance
threshold is an example of hybrid conditional push. Conditional push assumes that
changes are critical to the clients, and that clients are always listening and need to
respond to what is being sent. Hybrid conditional push further assumes that missing
some update information is not crucial to the clients.
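The balance-threshold example can be expressed as a condition installed in a client profile. The representation below (a condition as a plain predicate function) is an illustrative sketch, not a real profile language:

```python
# Conditional push: data are sent only when a client-installed
# condition holds, not on any fixed schedule. Names are illustrative.

def make_threshold_condition(threshold, margin=0.05):
    # Fires when the balance drops 5% (or more) below the threshold.
    return lambda balance: balance <= threshold * (1 - margin)

notifications = []

def notify(balance):
    # Stand-in for pushing a balance statement to the client.
    notifications.append(balance)

condition = make_threshold_condition(threshold=1000)

for balance in [1100, 980, 960, 940]:   # successive account balances
    if condition(balance):              # the server checks the condition...
        notify(balance)                 # ...and pushes only when it holds
```

Only the balance of 940 is at least 5% below the threshold of 1000, so only one statement is pushed; the earlier changes go unannounced, matching the assumption that missing some updates is acceptable.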
Ad-hoc delivery is irregular and is performed mostly in a pure pull-based system.
Data are pulled from servers to clients in an ad-hoc fashion whenever clients request
it. In contrast, periodic pull arises when a client uses polling to obtain data from
servers based on a regular period (schedule).
The third component of the design space of information delivery alternatives is the
communication method. These methods determine the various ways in which servers
and clients communicate for delivering information to clients. The alternatives are
unicast and one-to-many. In unicast, the communication from a server to a client
is one-to-one: the server sends data to one client using a particular delivery mode
with some frequency. In one-to-many, as the name implies, the server sends data
to a number of clients. Note that we are not referring here to a specific protocol;
one-to-many communication may use a multicast or broadcast protocol.
We should note that this characterization is subject to considerable debate. It is
not clear that every point in the design space is meaningful. Furthermore,
specification of alternatives such as conditional and periodic (which may make sense) is
difficult. However, it serves as a first-order characterization of the complexity of
emerging distributed data management systems. For the most part, in this book, we
are concerned with pull-only, ad hoc data delivery systems, although examples of
other approaches are discussed in some chapters.
1.4 Promises of DDBSs
Many advantages of DDBSs have been cited in literature, ranging from sociological
reasons for decentralization [D’Oliviera, 1977] to better economics. All of these can
be distilled to four fundamentals which may also be viewed as promises of DDBS
technology: transparent management of distributed and replicated data, reliable
access to data through distributed transactions, improved performance, and easier
system expansion. In this section we discuss these promises and, in the process,
introduce many of the concepts that we will study in subsequent chapters.
1.4.1 Transparent Management of Distributed and Replicated Data
Transparency refers to separation of the higher-level semantics of a system from
lower-level implementation issues. In other words, a transparent system “hides” the
implementation details from users. The advantage of a fully transparent DBMS is the
high level of support that it provides for the development of complex applications. It
is obvious that we would like to make all DBMSs (centralized or distributed) fully
transparent.
Let us start our discussion with an example. Consider an engineering firm that
has offices in Boston, Waterloo, Paris, and San Francisco. They run projects at
each of these sites and would like to maintain a database of their employees, the
projects, and other related data. Assuming that the database is relational, we can store
this information in two relations: EMP(ENO, ENAME, TITLE)¹ and PROJ(PNO,
PNAME, BUDGET). We also introduce a third relation to store salary information:
SAL(TITLE, AMT) and a fourth relation ASG which indicates which employees
have been assigned to which projects, for what duration, and with what responsibility:
ASG(ENO, PNO, RESP, DUR). If all of this data were stored in a centralized DBMS,
and we wanted to find out the names and salaries of employees who worked on a project for
more than 12 months, we would specify this using the following SQL query:
SELECT ENAME, AMT
FROM EMP, ASG, SAL
WHERE ASG.DUR > 12
AND EMP.ENO = ASG.ENO
AND SAL.TITLE = EMP.TITLE
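This query can be tried out against a centralized store. The following sketch runs it in an in-memory SQLite database; the sample rows are invented purely to make the example executable:

```python
# Running the chapter's query on a centralized store; the sample
# rows below are invented purely to make the example executable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
CREATE TABLE PROJ (PNO TEXT PRIMARY KEY, PNAME TEXT, BUDGET REAL);
CREATE TABLE SAL (TITLE TEXT PRIMARY KEY, AMT REAL);
CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER);

INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'),
                       ('E2', 'M. Smith', 'Syst. Anal.');
INSERT INTO SAL VALUES ('Elect. Eng.', 40000), ('Syst. Anal.', 34000);
INSERT INTO ASG VALUES ('E1', 'P1', 'Manager', 12),
                       ('E2', 'P1', 'Analyst', 24);
""")

rows = conn.execute("""
    SELECT ENAME, AMT
    FROM EMP, ASG, SAL
    WHERE ASG.DUR > 12
      AND EMP.ENO = ASG.ENO
      AND SAL.TITLE = EMP.TITLE
""").fetchall()
```

With these sample rows, only the second employee has an assignment longer than 12 months, so only that name and salary are returned.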
However, given the distributed nature of this firm’s business, it is preferable, under
these circumstances, to localize data such that data about the employees in the Waterloo
office are stored in Waterloo, those in the Boston office are stored in Boston, and
so forth. The same applies to the project and salary information. Thus, what we
are engaged in is a process where we partition each of the relations and store each
partition at a different site. This is known as fragmentation and we discuss it further
below and in detail in Chapter 3.
Furthermore, it may be preferable to duplicate some of this data at other sites
for performance and reliability reasons. The result is a distributed database which
is fragmented and replicated (Figure 1.5). Fully transparent access means that the
users can still pose the query as specified above, without paying any attention to
the fragmentation, location, or replication of data, and let the system worry about
resolving these issues.
For a system to adequately deal with this type of query over a distributed, fragmented and replicated database, it needs to be able to deal with a number of different types of transparencies. We discuss these in this section.
1.4.1.1 Data Independence
Data independence is a fundamental form of transparency that we look for within a DBMS. It is also the only type that is important within the context of a centralized DBMS. It refers to the immunity of user applications to changes in the definition and organization of data, and vice versa.

As is well known, data definition occurs at two levels. At one level the logical structure of the data is specified, and at the other level its physical structure. The former is commonly known as the schema definition, whereas the latter is referred to as the physical data description. We can therefore talk about two types of data independence: logical data independence and physical data independence. Logical data independence refers to the immunity of user applications to changes in the logical structure (i.e., schema) of the database. Physical data independence, on the other hand, deals with hiding the details of the storage structure from user applications. When a user application is written, it should not be concerned with the details of physical data organization. Therefore, the user application should not need to be modified when data organization changes occur due to performance considerations.

[Figure 1.5 A Distributed Application: the Boston, Waterloo, Paris and San Francisco sites connected by a communication network, each storing employee and project data of several offices (e.g., Boston employees, Paris employees and Boston projects at one site; Waterloo employees, Waterloo projects and Paris projects at another; San Francisco employees and San Francisco projects; Paris employees, Boston employees, Paris projects and Boston projects).]

¹ We discuss relational systems in Chapter 2 (Section 2.1) where we develop this example further. For the time being, it is sufficient to note that this nomenclature indicates that we have just defined a relation with three attributes: ENO (which is the key, identified by underlining), ENAME and TITLE.
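Physical data independence, in particular, can be illustrated with a small sketch (Python is used purely for illustration; the `HeapStorage` and `HashStorage` classes and the sample records are hypothetical): the application accesses EMP only through a stable logical interface, so the physical organization can change, say from an unordered file to a hash index, without any change to application code.

```python
# Hypothetical sketch: physical data independence.
# The application only calls find_by_eno(); the storage layout may change freely.

class HeapStorage:
    """Unordered collection of records; lookup by linear scan."""
    def __init__(self, rows):
        self._rows = list(rows)

    def find_by_eno(self, eno):
        return next((r for r in self._rows if r["ENO"] == eno), None)

class HashStorage:
    """Same logical contents, reorganized as a hash index on ENO."""
    def __init__(self, rows):
        self._index = {r["ENO"]: r for r in rows}

    def find_by_eno(self, eno):
        return self._index.get(eno)

emp = [{"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."}]

# The application code below is identical for both physical organizations:
for storage in (HeapStorage(emp), HashStorage(emp)):
    record = storage.find_by_eno("E1")
    assert record["ENAME"] == "J. Doe"
```

The same application loop runs unmodified against either organization, which is precisely the immunity that physical data independence promises.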
1.4.1.2 Network Transparency
In centralized database systems, the only available resource that needs to be shielded from the user is the data (i.e., the storage system). In a distributed database environment, however, there is a second resource that needs to be managed in much the same manner: the network. Preferably, the user should be protected from the operational details of the network, possibly even hiding the existence of the network. Then there would be no difference between database applications that would run on a centralized database and those that would run on a distributed database. This type of transparency is referred to as network transparency or distribution transparency.

One can consider network transparency from the viewpoint of either the services provided or the data. From the former perspective, it is desirable to have a uniform means by which services are accessed. From a DBMS perspective, distribution transparency requires that users do not have to specify where data are located.

Sometimes two types of distribution transparency are identified: location transparency and naming transparency. Location transparency refers to the fact that the
command used to perform a task is independent of both the location of the data and the system on which an operation is carried out. Naming transparency means that a unique name is provided for each object in the database. In the absence of naming transparency, users are required to embed the location name (or an identifier) as part of the object name.
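The difference can be made concrete with a minimal sketch (Python for illustration only; the catalog contents and site-qualified names such as "boston:EMP1" are hypothetical): without naming transparency the user must write a site-qualified name, whereas with it a global catalog resolves a location-independent name on the user's behalf.

```python
# Hypothetical sketch: naming and location transparency via a global catalog.
# Without it, the user embeds the site in the name ("boston:EMP1");
# with it, the system resolves the logical name "EMP" itself.

catalog = {
    "EMP": ["boston:EMP1", "waterloo:EMP2", "paris:EMP3", "sf:EMP4"],
}

def resolve(logical_name):
    """Map a location-independent name to its site-qualified fragments."""
    return catalog[logical_name]

# The user only ever mentions "EMP"; the sites come from the catalog.
assert resolve("EMP")[0] == "boston:EMP1"
```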
1.4.1.3 Replication Transparency
The issue of replicating data within a distributed database is introduced in Chapter 3 and discussed in detail in Chapter 13. At this point, let us just mention that for performance, reliability, and availability reasons, it is usually desirable to be able to distribute data in a replicated fashion across the machines on a network. Such replication helps performance since diverse and conflicting user requirements can be more easily accommodated. For example, data that are commonly accessed by one user can be placed on that user’s local machine as well as on the machine of another user with the same access requirements. This increases the locality of reference. Furthermore, if one of the machines fails, a copy of the data is still available on another machine on the network. Of course, this is a very simple-minded description of the situation. In fact, the decision as to whether to replicate or not, and how many copies of any database object to have, depends to a considerable degree on user applications. We will discuss these in later chapters.
Assuming that data are replicated, the transparency issue is whether the users should be aware of the existence of copies or whether the system should handle the management of copies and the user should act as if there is a single copy of the data (note that we are not referring to the placement of copies, only their existence). From a user’s perspective the answer is obvious. It is preferable not to be involved with handling copies and having to specify the fact that a certain action can and/or should be taken on multiple copies. From a systems point of view, however, the answer is not that simple. As we will see in Chapter 11, when the responsibility of specifying that an action needs to be executed on multiple copies is delegated to the user, it makes transaction management simpler for distributed DBMSs. On the other hand, doing so inevitably results in the loss of some flexibility. It is not the system that decides whether or not to have copies and how many copies to have, but the user application. Any change in these decisions because of various considerations definitely affects the user application and, therefore, reduces data independence considerably. Given these considerations, it is desirable that replication transparency be provided as a standard feature of DBMSs. Remember that replication transparency refers only to the existence of replicas, not to their actual location. Note also that distributing these replicas across the network in a transparent manner is the domain of network transparency.
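A minimal sketch of the system-side view (Python for illustration; the two-site `replicas` dictionary and the `update`/`read` helpers are hypothetical): the user issues a single logical write, and the system, not the user application, applies it to every copy, so a read at any site sees the same value.

```python
# Hypothetical sketch: replication transparency.
# The user calls update() once; the system propagates the write to all copies.

replicas = {"boston": {}, "paris": {}}

def update(key, value):
    """System-side write: applied to every copy, transparently to the user."""
    for site_copy in replicas.values():
        site_copy[key] = value

def read(site, key):
    """A read at any site observes the same value."""
    return replicas[site][key]

update("E1", "J. Doe")
assert read("boston", "E1") == read("paris", "E1")
```

If replication were not transparent, the loop over copies would appear inside every user application instead of inside the system.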
1.4.1.4 Fragmentation Transparency
The final form of transparency that needs to be addressed within the context of a distributed database system is that of fragmentation transparency. In Chapter 3 we discuss and justify the fact that it is commonly desirable to divide each database relation into smaller fragments and treat each fragment as a separate database object (i.e., another relation). This is commonly done for reasons of performance, availability, and reliability. Furthermore, fragmentation can reduce the negative effects of replication. Each replica is not the full relation but only a subset of it; thus less space is required and fewer data items need be managed.

There are two general types of fragmentation alternatives. In one case, called horizontal fragmentation, a relation is partitioned into a set of sub-relations each of which has a subset of the tuples (rows) of the original relation. The second alternative is vertical fragmentation where each sub-relation is defined on a subset of the attributes (columns) of the original relation.

When database objects are fragmented, we have to deal with the problem of handling user queries that are specified on entire relations but have to be executed on sub-relations. In other words, the issue is one of finding a query processing strategy based on the fragments rather than the relations, even though the queries are specified on the latter. Typically, this requires a translation from what is called a global query to several fragment queries. Since the fundamental issue of dealing with fragmentation transparency is one of query processing, we defer the discussion of techniques by which this translation can be performed until Chapter 7.
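The global-query-to-fragment-queries translation can be sketched as follows (Python for illustration; the fragment contents and the `select_global` helper are hypothetical, and real systems generate site-local queries rather than Python predicates): a selection posed on the whole ASG relation is evaluated as one fragment query per horizontal fragment, and the partial results are unioned.

```python
# Hypothetical sketch: a global query over horizontally fragmented ASG.
# The user queries one logical relation; the system visits each fragment.

asg_fragments = {
    "boston":   [{"ENO": "E1", "PNO": "P1", "DUR": 18}],
    "waterloo": [{"ENO": "E2", "PNO": "P2", "DUR": 6},
                 {"ENO": "E3", "PNO": "P3", "DUR": 24}],
}

def select_global(predicate):
    """Translate a global selection into one fragment query per site."""
    result = []
    for fragment in asg_fragments.values():
        # Fragment query: the same selection, evaluated on one fragment.
        result.extend(row for row in fragment if predicate(row))
    return result  # union of the fragment-query results

long_assignments = select_global(lambda row: row["DUR"] > 12)
assert {row["ENO"] for row in long_assignments} == {"E1", "E3"}
```

With fragmentation transparency, the user sees only the global call; the per-fragment loop belongs to the system.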
1.4.1.5 Who Should Provide Transparency?
In previous sections we discussed various possible forms of transparency within a distributed computing environment. Obviously, to provide easy and efficient access by novice users to the services of the DBMS, one would want to have full transparency, involving all the various types that we discussed. Nevertheless, the level of transparency is inevitably a compromise between ease of use and the difficulty and overhead cost of providing high levels of transparency. For example, Gray argues that full transparency makes the management of distributed data very difficult and claims that “applications coded with transparent access to geographically distributed databases have: poor manageability, poor modularity, and poor message performance” [Gray, 1989]. He proposes a remote procedure call mechanism between the requestor users and the server DBMSs whereby the users would direct their queries to a specific DBMS. This is indeed the approach commonly taken by client/server systems that we discuss shortly.

What has not yet been discussed is who is responsible for providing these services. It is possible to identify three distinct layers at which the transparency services can be provided. It is quite common to treat these as mutually exclusive means of providing the service, although it is more appropriate to view them as complementary.
We could leave the responsibility of providing transparent access to data resources to the access layer. The transparency features can be built into the user language, which then translates the requested services into required operations. In other words, the compiler or the interpreter takes over the task and no transparent service is provided to the implementer of the compiler or the interpreter.
The second layer at which transparency can be provided is the operating system level. State-of-the-art operating systems provide some level of transparency to system users. For example, the device drivers within the operating system handle the details of getting each piece of peripheral equipment to do what is requested. The typical computer user, or even an application programmer, does not normally write device drivers to interact with individual peripheral equipment; that operation is transparent to the user.
Providing transparent access to resources at the operating system level can obviously be extended to the distributed environment, where the management of the network resource is taken over by the distributed operating system or the middleware if the distributed DBMS is implemented over one. There are two potential problems with this approach. The first is that not all commercially available distributed operating systems provide a reasonable level of transparency in network management. The second problem is that some applications do not wish to be shielded from the details of distribution and need to access them for specific performance tuning.
The third layer at which transparency can be supported is within the DBMS. The transparency and support for database functions provided to the DBMS designers by an underlying operating system is generally minimal and typically limited to very fundamental operations for performing certain tasks. It is the responsibility of the DBMS to make all the necessary translations from the operating system to the higher-level user interface. This mode of operation is the most common method today. There are, however, various problems associated with leaving the task of providing full transparency to the DBMS. These have to do with the interaction of the operating system with the distributed DBMS and are discussed throughout this book.
A hierarchy of these transparencies is shown in Figure 1.6. It is not always easy to delineate clearly the levels of transparency, but such a figure serves an important instructional purpose even if it is not fully correct. To complete the picture we have added a “language transparency” layer, although it is not discussed in this chapter. With this generic layer, users have high-level access to the data (e.g., fourth-generation languages, graphical user interfaces, natural language access).
1.4.2 Reliability Through Distributed Transactions
Distributed DBMSs are intended to improve reliability since they have replicated components and thereby eliminate single points of failure. The failure of a single site, or the failure of a communication link which makes one or more sites unreachable, is not sufficient to bring down the entire system. In the case of a distributed database, this means that some of the data may be unreachable, but with proper care, users
[Figure 1.6 Layers of Transparency: Data at the core, surrounded successively by Data Independence, Network Transparency, Replication Transparency, Fragmentation Transparency, and Language Transparency.]
may be permitted to access other parts of the distributed database. The “proper care” comes in the form of support for distributed transactions and application protocols. We discuss transactions and transaction processing in detail in Chapters 10 through 12.

A transaction is a basic unit of consistent and reliable computing, consisting of a sequence of database operations executed as an atomic action. It transforms a consistent database state to another consistent database state even when a number of such transactions are executed concurrently (sometimes called concurrency transparency), and even when failures occur (also called failure atomicity). Therefore, a DBMS that provides full transaction support guarantees that concurrent execution of user transactions will not violate database consistency in the face of system failures as long as each transaction is correct, i.e., obeys the integrity rules specified on the database.
Let us give an example of a transaction based on the engineering firm example that we introduced earlier. Assume that there is an application that updates the salaries of all the employees by 10%. It is desirable to encapsulate the query (or the program code) that accomplishes this task within transaction boundaries. For example, if a system failure occurs half-way through the execution of this program, we would like the DBMS to be able to determine, upon recovery, where it left off and continue with its operation (or start all over again). This is the topic of failure atomicity. Alternatively, if some other user runs a query calculating the average salaries of the employees in this firm while the original update action is going on, the calculated result will be in error. Therefore we would like the system to be able to synchronize the concurrent execution of these two programs. To encapsulate a query (or a program code) within transactional boundaries, it is sufficient to declare the begin of the transaction and its end:
Begin transaction SALARY UPDATE
begin
    EXEC SQL UPDATE PAY
        SET SAL = SAL * 1.1
end.
Distributed transactions execute at a number of sites at which they access the local database. The above transaction, for example, will execute in Boston, Waterloo, Paris and San Francisco since the data are distributed at these sites. With full support for distributed transactions, user applications can access a single logical image of the database and rely on the distributed DBMS to ensure that their requests will be executed correctly no matter what happens in the system. “Correctly” means that user applications do not need to be concerned with coordinating their accesses to individual local databases nor do they need to worry about the possibility of site or communication link failures during the execution of their transactions. This illustrates the link between distributed transactions and transparency, since both involve issues related to distributed naming and directory management, among other things.
Providing transaction support requires the implementation of distributed concurrency control (Chapter 11) and distributed reliability (Chapter 12) protocols, in particular two-phase commit (2PC) and distributed recovery protocols, which are significantly more complicated than their centralized counterparts. Supporting replicas requires the implementation of replica control protocols that enforce a specified semantics of accessing them (Chapter 13).
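The core idea of 2PC mentioned above can be sketched in a few lines (Python for illustration only; the participants are modeled as hypothetical callables, and real protocols additionally handle logging, timeouts, and recovery): the coordinator collects votes from all participating sites and commits only if every vote is “yes”.

```python
# Hypothetical sketch of the two-phase commit (2PC) idea: unanimous "yes"
# votes in phase 1 are required for a commit decision in phase 2.

def two_phase_commit(participants):
    # Phase 1: ask every site to prepare; each returns True ("yes") or False ("no").
    votes = [site() for site in participants]
    # Phase 2: commit only on unanimity, otherwise abort everywhere.
    return "commit" if all(votes) else "abort"

always_yes = lambda: True
always_no = lambda: False

assert two_phase_commit([always_yes, always_yes]) == "commit"
assert two_phase_commit([always_yes, always_no]) == "abort"
```

A single “no” vote (or an unreachable site, which a real coordinator treats as “no” after a timeout) forces a global abort, which is what preserves atomicity across sites.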
1.4.3 Improved Performance
The case for the improved performance of distributed DBMSs is typically made based on two points. First, a distributed DBMS fragments the conceptual database, enabling data to be stored in close proximity to its points of use (also called data localization). This has two potential advantages:

1. Since each site handles only a portion of the database, contention for CPU and I/O services is not as severe as for centralized databases.

2. Localization reduces remote access delays that are usually involved in wide area networks (for example, the minimum round-trip message propagation delay in satellite-based systems is about 1 second).

Most distributed DBMSs are structured to gain maximum benefit from data localization. Full benefits of reduced contention and reduced communication overhead can be obtained only by a proper fragmentation and distribution of the database.
This point relates to the overhead of distributed computing if the data have to reside at remote sites and one has to access them by remote communication. The argument is that it is better, in these circumstances, to distribute the data management functionality to where the data are located rather than moving large amounts of data. This has lately become a topic of contention. Some argue that with the widespread use of high-speed, high-capacity networks, distributing data and data management functions no longer makes sense and that it may be much simpler to store data at a central site and access it (by downloading) over high-speed networks. This argument, while appealing, misses the point of distributed databases. First of all, in most of today’s applications, data are distributed; what may be open for debate is how and where we process them. Second, and more important, this argument does not distinguish between bandwidth (the capacity of the computer links) and latency (how long it takes for data to be transmitted). Latency is inherent in distributed environments and there are physical limits to how fast we can send data over computer networks. As indicated above, for example, satellite links take about half a second to transmit data between two ground stations. This is a function of the distance of the satellites from the earth and there is nothing that we can do to improve that performance. For some applications, this might constitute an unacceptable delay.
The second point is that the inherent parallelism of distributed systems may be exploited for inter-query and intra-query parallelism. Inter-query parallelism results from the ability to execute multiple queries at the same time, while intra-query parallelism is achieved by breaking up a single query into a number of subqueries, each of which is executed at a different site, accessing a different part of the distributed database.

If the user access to the distributed database consisted only of querying (i.e., read-only access), then provision of inter-query and intra-query parallelism would imply that as much of the database as possible should be replicated. However, since most database accesses are not read-only, the mixing of read and update operations requires the implementation of elaborate concurrency control and commit protocols.
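Intra-query parallelism can be sketched as follows (Python for illustration; the fragment contents are hypothetical, and threads stand in for sites): a single aggregate query is broken into one subquery per fragment, the subqueries run concurrently, and their partial results are combined.

```python
# Hypothetical sketch: intra-query parallelism.
# One global aggregate becomes concurrent per-fragment subqueries.

from concurrent.futures import ThreadPoolExecutor

fragments = {
    "boston": [10, 20],
    "paris":  [5, 15, 25],
}

def subquery(rows):
    """Partial aggregate computed at one site."""
    return sum(rows)

def parallel_sum():
    # Each fragment's subquery runs concurrently (threads model remote sites).
    with ThreadPoolExecutor() as pool:
        partials = pool.map(subquery, fragments.values())
    return sum(partials)  # combine the partial results

assert parallel_sum() == 75
```

The same decomposition-and-combine pattern underlies distributed evaluation of joins and other operators, though their combining steps are more involved than a sum.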
1.4.4 Easier System Expansion
In a distributed environment, it is much easier to accommodate increasing database sizes. Major system overhauls are seldom necessary; expansion can usually be handled by adding processing and storage power to the network. Obviously, it may not be possible to obtain a linear increase in “power,” since this also depends on the overhead of distribution. However, significant improvements are still possible.

One aspect of easier system expansion is economics. It normally costs much less to put together a system of “smaller” computers with the equivalent power of a single big machine. In earlier times, it was commonly believed that it would be possible to purchase a computer with fourfold power if one spent twice as much. This was known as Grosch’s law. With the advent of microcomputers and workstations, and their price/performance characteristics, this law is considered invalid.

This should not be interpreted to mean that mainframes are dead; this is not the point that we are making here. Indeed, in recent years, we have observed a resurgence in the world-wide sale of mainframes. The point is that for many applications, it is more economical to put together a distributed computer system (whether composed of mainframes or workstations) with sufficient power than it is to establish a single, centralized system to run these tasks. In fact, the latter may not even be feasible these days.
1.5 Complications Introduced by Distribution
The problems encountered in database systems take on additional complexity in a distributed environment, even though the basic underlying principles are the same. Furthermore, this additional complexity gives rise to new problems influenced mainly by three factors.

First, data may be replicated in a distributed environment. A distributed database can be designed so that the entire database, or portions of it, reside at different sites of a computer network. It is not essential that every site on the network contain the database; it is only essential that there be more than one site where the database resides. The possible duplication of data items is mainly due to reliability and efficiency considerations. Consequently, the distributed database system is responsible for (1) choosing one of the stored copies of the requested data for access in case of retrievals, and (2) making sure that the effect of an update is reflected on each and every copy of that data item.

Second, if some sites fail (e.g., by either hardware or software malfunction), or if some communication links fail (making some of the sites unreachable) while an update is being executed, the system must make sure that the effects will be reflected on the data residing at the failing or unreachable sites as soon as the system can recover from the failure.

The third point is that since each site cannot have instantaneous information on the actions currently being carried out at the other sites, the synchronization of transactions on multiple sites is considerably harder than for a centralized system.
These difficulties point to a number of potential problems with distributed DBMSs: the inherent complexity of building distributed applications; the increased cost of replicating resources and, more importantly, of managing distribution; the devolution of control to many centers and the difficulty of reaching agreements; and the exacerbated security concerns (the secure communication channel problem). These are well-known problems in distributed systems in general, and, in this book, we discuss their manifestations within the context of distributed DBMSs and how they can be addressed.
1.6 Design Issues
In Section 1.4, we discussed the promises of distributed DBMS technology, highlighting the challenges that need to be overcome in order to realize them. In this section we build on this discussion by presenting the design issues that arise in building a distributed DBMS. These issues will occupy much of the remainder of this book.
1.6.1 Distributed Database Design
The question that is being addressed is how the database and the applications that run against it should be placed across the sites. There are two basic alternatives to placing data: partitioned (or non-replicated) and replicated. In the partitioned scheme the database is divided into a number of disjoint partitions each of which is placed at a different site. Replicated designs can be either fully replicated (also called fully duplicated) where the entire database is stored at each site, or partially replicated (or partially duplicated) where each partition of the database is stored at more than one site, but not at all the sites. The two fundamental design issues are fragmentation, the separation of the database into partitions called fragments, and distribution, the optimum distribution of fragments.

The research in this area mostly involves mathematical programming in order to minimize the combined cost of storing the database, processing transactions against it, and message communication among sites. The general problem is NP-hard. Therefore, the proposed solutions are based on heuristics. Distributed database design is the topic of Chapter 3.
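To make the heuristic flavor concrete, here is a deliberately simple sketch (Python for illustration only; the access-frequency matrix is invented, and real allocation models weigh storage, transaction, and communication costs together): a greedy rule for the non-replicated case places each fragment at the site that accesses it most often, minimizing remote accesses fragment by fragment.

```python
# Hypothetical sketch: greedy, non-replicated fragment allocation.
# access_freq[fragment][site] = accesses per period (assumed workload data).

access_freq = {
    "EMP1":  {"boston": 90, "paris": 10},
    "PROJ1": {"boston": 20, "paris": 70},
}

def greedy_allocate(freq):
    """Place each fragment at its most frequent accessor."""
    return {frag: max(sites, key=sites.get) for frag, sites in freq.items()}

assert greedy_allocate(access_freq) == {"EMP1": "boston", "PROJ1": "paris"}
```

This per-fragment rule ignores interactions between fragments (e.g., joins that touch two fragments at once), which is exactly why the general problem is hard and the literature resorts to richer heuristics.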
1.6.2 Distributed Directory Management
A directory contains information (such as descriptions and locations) about data items in the database. Problems related to directory management are similar in nature to the database placement problem discussed in the preceding section. A directory may be global to the entire DDBS or local to each site; it can be centralized at one site or distributed over several sites; there can be a single copy or multiple copies. We briefly discuss these issues in Chapter 3.
1.6.3 Distributed Query Processing
Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations. The problem is how to decide on a strategy for executing each query over the network in the most cost-effective way, however cost is defined. The factors to be considered are the distribution of data, communication costs, and lack of sufficient locally-available information. The objective is to optimize where the inherent parallelism is used to improve the performance of executing the transaction, subject to the above-mentioned constraints. The problem is NP-hard in nature, and the approaches are usually heuristic. Distributed query processing is discussed in detail in Chapters 6 through 8.
1.6.4 Distributed Concurrency Control
Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of the database is maintained. It is, without any doubt, one of the most extensively studied problems in the DDBS field. The concurrency control problem in a distributed context is somewhat different than in a centralized framework. One not only has to worry about the integrity of a single database, but also about the consistency of multiple copies of the database. The condition that requires all the values of multiple copies of every data item to converge to the same value is called mutual consistency.

The alternative solutions are too numerous to discuss here, so we examine them in detail in Chapter 11. Let us only mention that the two general classes are pessimistic, synchronizing the execution of user requests before the execution starts, and optimistic, executing the requests and then checking if the execution has compromised the consistency of the database. Two fundamental primitives that can be used with both approaches are locking, which is based on the mutual exclusion of accesses to data items, and timestamping, where the transaction executions are ordered based on timestamps. There are variations of these schemes as well as hybrid algorithms that attempt to combine the two basic mechanisms.
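The timestamping primitive can be sketched in miniature (Python for illustration only; this is a bare-bones version of the basic timestamp-ordering test, with the read/write distinction and restart logic omitted): each transaction carries a timestamp, and an operation on a data item is rejected, forcing the transaction to restart, if a younger transaction has already accessed that item.

```python
# Hypothetical sketch: basic timestamp ordering.
# An access is accepted only if no younger transaction got there first.

last_access_ts = {}  # data item -> largest transaction timestamp seen so far

def access(item, ts):
    """Accept accesses in timestamp order; reject late arrivals."""
    if ts < last_access_ts.get(item, 0):
        return "reject"  # the transaction must restart with a new timestamp
    last_access_ts[item] = ts
    return "accept"

assert access("x", 5) == "accept"
assert access("x", 3) == "reject"   # an older transaction arrives too late
assert access("x", 7) == "accept"
```

Because conflicting operations are forced into timestamp order rather than made to wait for locks, schemes of this kind never deadlock, which is the trade-off alluded to in the next section.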
1.6.5 Distributed Deadlock Management
The deadlock problem in DDBSs is similar in nature to that encountered in operating systems. The competition among users for access to a set of resources (data, in this case) can result in a deadlock if the synchronization mechanism is based on locking. The well-known alternatives of prevention, avoidance, and detection/recovery also apply to DDBSs. Deadlock management is covered in Chapter 11.
1.6.6 Reliability of Distributed DBMS
We mentioned earlier that one of the potential advantages of distributed systems is improved reliability and availability. This, however, is not a feature that comes automatically. It is important that mechanisms be provided to ensure the consistency of the database as well as to detect failures and recover from them. The implication for DDBSs is that when a failure occurs and various sites become either inoperable or inaccessible, the databases at the operational sites remain consistent and up to date. Furthermore, when the computer system or network recovers from the failure, the DDBSs should be able to recover and bring the databases at the failed sites up-to-date. This may be especially difficult in the case of network partitioning, where the sites are divided into two or more groups with no communication among them. Distributed reliability protocols are the topic of Chapter 12.
[Figure 1.7 Relationship Among Research Issues: distributed DB design, directory management, query processing, concurrency control, deadlock management, reliability, and replication.]
1.6.7 Replication
If the distributed database is (partially or fully) replicated, it is necessary to implement protocols that ensure the consistency of the replicas, i.e., that copies of the same data item have the same value. These protocols can be eager in that they force the updates to be applied to all the replicas before the transaction completes, or they may be lazy so that the transaction updates one copy (called the master) from which updates are propagated to the others after the transaction completes. We discuss replication protocols in Chapter 13.
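The contrast between the two styles can be sketched as follows (Python for illustration only; the two-copy setup and the `eager_write`/`lazy_write` helpers are hypothetical): eager propagation updates every copy before the transaction completes, whereas lazy propagation updates only the master and refreshes the other copies afterwards, leaving them briefly stale.

```python
# Hypothetical sketch: eager vs. lazy replica control.

copies = {"master": {}, "replica": {}}
pending = []  # deferred propagation queue for the lazy scheme

def eager_write(key, value):
    """Eager: all copies are updated before the transaction completes."""
    for copy in copies.values():
        copy[key] = value

def lazy_write(key, value):
    """Lazy: only the master is updated now; others are refreshed later."""
    copies["master"][key] = value
    pending.append((key, value))

def propagate():
    """Post-commit propagation from the master to the other copies."""
    while pending:
        key, value = pending.pop(0)
        copies["replica"][key] = value

eager_write("a", 1)
assert copies["replica"]["a"] == 1          # immediately consistent

lazy_write("b", 2)
assert "b" not in copies["replica"]         # replica is stale until propagation
propagate()
assert copies["replica"]["b"] == 2
```

The sketch exposes the trade-off: eager schemes pay the propagation cost inside the transaction, while lazy schemes commit faster but allow a window in which replicas disagree.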
1.6.8 Relationship among Problems
Naturally, these problems are not isolated from one another. Each problem is affected by the solutions found for the others, and in turn affects the set of feasible solutions for them. In this section we discuss how they are related.

The relationship among the components is shown in Figure 1.7. The design of distributed databases affects many areas. It affects directory management, because the definition of fragments and their placement determine the contents of the directory (or directories) as well as the strategies that may be employed to manage them. The same information (i.e., fragment structure and placement) is used by the query processor to determine the query evaluation strategy. On the other hand, the access and usage patterns that are determined by the query processor are used as inputs to the data distribution and fragmentation algorithms. Similarly, directory placement and contents influence the processing of queries.
The replication of fragments when they are distributed affects the concurrency
control strategies that might be employed. As we willstudy in Chapter 11, some
concurrency control algorithms cannot be easily used with replicated databases.
Similarly,usage and access patterns to the database will influence the concurrency
control algorithms. If the environmentis update intensive, the necessary precautions
arequite different from those in a query-only environment.
Thereis a strong relationship among the concurrency control problem, the dead-
lock management problem, and reliability issues. This is to be expected, since to-
getherthey are usually called the transaction management problem. The concurrency
controlalgorithm that is employed will determine whether or not a separate deadlock
management facilityis required. If a locking-based algorithm is used, deadlocks will
occur,whereas they will not if timestamping is the chosen alternative.
Reliability mechanisms involve both local recovery techniques and distributed reliability protocols. In that sense, they both influence the choice of the concurrency control techniques and are built on top of them. Techniques to provide reliability also make use of data placement information, since the existence of duplicate copies of the data serves as a safeguard to maintain reliable operation.
Finally, the need for replication protocols arises if data distribution involves replicas. As indicated above, there is a strong relationship between replication protocols and concurrency control techniques, since both deal with the consistency of data, but from different perspectives. Furthermore, the replication protocols influence distributed reliability techniques such as commit protocols. In fact, it is sometimes suggested (wrongly, in our view) that replication protocols can be used instead of implementing distributed commit protocols.
1.6.9 Additional Issues
The above design issues cover what may be called "traditional" distributed database systems. The environment has changed significantly since these topics started to be investigated, posing additional challenges and opportunities.
One of the important developments has been the move towards "looser" federation among data sources, which may also be heterogeneous. As we discuss in the next section, this has given rise to the development of multidatabase systems (also called federated databases and data integration systems) that require re-investigation of some of the fundamental database techniques. These systems constitute an important part of today's distributed environment. We discuss database design issues in multidatabase systems (i.e., database integration) in Chapter 4 and the query processing challenges in Chapter 9.
The growth of the Internet as a fundamental networking platform has raised important questions about the assumptions underlying distributed database systems. Two issues are of particular concern to us. One is the re-emergence of peer-to-peer computing, and the other is the development and growth of the World Wide Web (web for short). Both of these aim at improving data sharing, but take different
approaches and pose different data management challenges. We discuss peer-to-peer data management in Chapter 16 and web data management in Chapter 17. We should note that peer-to-peer is not a new concept in distributed databases, as we discuss in the next section. However, its new incarnation has significant differences from the earlier versions. In Chapter 16, it is these new versions that we focus on.
Finally, as noted earlier, there is a strong relationship between distributed databases and parallel databases. Although the former assumes each site to be a single logical computer, most of these installations are, in fact, parallel clusters. Thus, while most of the book focuses on issues that arise in managing data distributed across these sites, interesting data management issues exist within a single logical site that may be a parallel system. We discuss these issues in Chapter 14.
1.7 Distributed DBMS Architecture
The architecture of a system defines its structure. This means that the components of the system are identified, the function of each component is specified, and the interrelationships and interactions among these components are defined. The specification of the architecture of a system requires identification of the various modules, with their interfaces and interrelationships, in terms of the data and control flow through the system.
In this section we develop three "reference" architectures² for a distributed DBMS:
client/server systems, peer-to-peer distributed DBMSs, and multidatabase systems. These are "idealized" views of a DBMS in that many of the commercially available systems may deviate from them; however, the architectures will serve as a reasonable framework within which the issues related to distributed DBMSs can be discussed. We first start with a brief presentation of the "ANSI/SPARC architecture", which is a datalogical approach to defining a DBMS architecture – it focuses on the different user classes and roles and their varying views on data. This architecture is helpful in putting certain concepts we have discussed so far in their proper perspective. We then have a short discussion of a generic architecture of a centralized DBMS, which we subsequently extend to identify the set of alternative architectures for a distributed DBMS. Within this characterization, we focus on the three alternatives that we identified above.
1.7.1 ANSI/SPARC Architecture
In late 1972, the Computer and Information Processing Committee (X3) of the American National Standards Institute (ANSI) established a Study Group on Database
² A reference architecture is commonly created by standards developers to clearly define the interfaces that need to be standardized.
Fig. 1.8 The ANSI/SPARC Architecture (users access external views defined by an external schema; the external schema maps to the conceptual schema, which defines the conceptual view; the conceptual schema in turn maps to the internal schema, which defines the internal view)
Management Systems under the auspices of its Standards Planning and Requirements Committee (SPARC). The mission of the study group was to study the feasibility of setting up standards in this area, as well as determining which aspects should be standardized if it was feasible. The study group issued its interim report in 1975 [ANSI/SPARC, 1975], and its final report in 1977 [Tsichritzis and Klug, 1978]. The architectural framework proposed in these reports came to be known as the "ANSI/SPARC architecture," its full title being "ANSI/X3/SPARC DBMS Framework." The study group proposed that the interfaces be standardized, and defined an architectural framework that contained 43 interfaces, 14 of which would deal with the physical storage subsystem of the computer and therefore would not be considered essential parts of the DBMS architecture.
A simplified version of the ANSI/SPARC architecture is depicted in Figure 1.8. There are three views of data: the external view, which is that of the end user, who might be a programmer; the internal view, that of the system or machine; and the conceptual view, that of the enterprise. For each of these views, an appropriate schema definition is required.
At the lowest level of the architecture is the internal view, which deals with the physical definition and organization of data. The location of data on different storage devices and the access mechanisms used to reach and manipulate data are the issues dealt with at this level. At the other extreme is the external view, which is concerned with how users view the database. An individual user's view represents the portion of the database that will be accessed by that user as well as the relationships that the user would like to see among the data. A view can be shared among a number of users, with the collection of user views making up the external schema. In between these two ends is the conceptual schema, which is an abstract definition of the database. It is the "real world" view of the enterprise being modeled in the database [Yormark, 1977]. As such, it is supposed to represent the data and the relationships among data without considering the requirements of individual applications or the restrictions of the physical storage media. In reality, however, it is not possible to ignore these
requirements completely, due to performance reasons. The transformation between these three levels is accomplished by mappings that specify how a definition at one level can be obtained from a definition at another level.
This perspective is important, because it provides the basis for data independence that we discussed earlier. The separation of the external schemas from the conceptual schema enables logical data independence, while the separation of the conceptual schema from the internal schema allows physical data independence.
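A small sketch can illustrate physical data independence. The EMP relation, attribute names, and storage classes below are illustrative choices, not part of the ANSI/SPARC proposal; the point is only that an external view defined against the conceptual schema is untouched by a change of internal storage structure:

```python
# Physical data independence sketch: the external view is defined against the
# conceptual schema only, so swapping the internal storage organization
# (heap file vs. hash index) does not change what the user sees.

CONCEPTUAL = ("eno", "ename", "title")          # conceptual schema of EMP

class ListStorage:                              # internal alternative 1: heap file
    def __init__(self, rows):
        self.rows = list(rows)
    def scan(self):
        return iter(self.rows)

class IndexedStorage:                           # internal alternative 2: hash on eno
    def __init__(self, rows):
        self.index = {r[0]: r for r in rows}
    def scan(self):
        return iter(self.index.values())

def external_view(storage):
    """External schema: expose ename only (hides eno and title)."""
    return sorted(r[CONCEPTUAL.index("ename")] for r in storage.scan())

rows = [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal.")]
# Same external view regardless of the internal organization:
assert external_view(ListStorage(rows)) == external_view(IndexedStorage(rows))
```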
1.7.2 A Generic Centralized DBMS Architecture
A DBMS is a reentrant program shared by multiple processes (transactions) that run database programs. When running on a general-purpose computer, a DBMS is interfaced with two other components: the communication subsystem and the operating system. The communication subsystem permits interfacing the DBMS with other subsystems in order to communicate with applications. For example, the terminal monitor needs to communicate with the DBMS to run interactive transactions. The operating system provides the interface between the DBMS and computer resources (processor, memory, disk drives, etc.).
The functions performed by a DBMS can be layered as in Figure 1.9, where the arrows indicate the direction of the data and control flow. Taking a top-down approach, the layers are the interface, control, compilation, execution, data access, and consistency management.
The interface layer manages the interface to the applications. There can be several interfaces such as, in the case of relational DBMSs discussed in Chapter 2, SQL embedded in a host language such as C, and QBE (Query-by-Example). Database application programs are executed against external views of the database. For an application, a view is useful in representing its particular perception of the database (shared by many applications). A view in relational DBMSs is a virtual relation derived from base relations by applying relational algebra operations.³ These concepts are defined more precisely in Chapter 2, but they are usually covered in undergraduate database courses, so we expect many readers to be familiar with them. View management consists of translating the user query from external data to conceptual data.
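As a minimal illustration of this translation, the sketch below expands a query posed against a virtual relation into an equivalent query on a base relation; the relation, view, and attribute names are hypothetical:

```python
# View management as query rewriting: a query posed against the virtual
# relation SYSAN is expanded into a query over the base relation EMP.

EMP = [("E1", "J. Doe", "Elect. Eng."),
       ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Syst. Anal.")]

# View definition: SYSAN = project[eno, ename](select[title = 'Syst. Anal.'](EMP))
def view_sysan(emp):
    return [(eno, ename) for (eno, ename, title) in emp
            if title == "Syst. Anal."]

# User query on the view: project ename from SYSAN. After expansion it runs
# directly on EMP -- no materialized SYSAN relation is ever needed.
def user_query(emp):
    return [ename for (_eno, ename) in view_sysan(emp)]

print(user_query(EMP))   # ['M. Smith', 'A. Lee']
```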
The control layer controls the query by adding semantic integrity predicates and authorization predicates. Semantic integrity constraints and authorizations are usually specified in a declarative language, as discussed in Chapter 5. The output of this layer is an enriched query in the high-level language accepted by the interface.
The query processing (or compilation) layer maps the query into an optimized sequence of lower-level operations. This layer is concerned with performance. It
³ Note that this does not mean that the real-world views are, or should be, specified in relational algebra. On the contrary, they are specified by some high-level data language such as SQL. The translation from one of these languages to relational algebra is now well understood, and the effects of the view definition can be specified in terms of relational algebra operations.
Fig. 1.9 Functional Layers of a Centralized DBMS (interface: user interfaces and view management; control: semantic integrity control and authorization checking; compilation: query decomposition and optimization, and access plan management; execution: access plan execution control and algebra operation execution; data access: buffer management and access methods; consistency: concurrency control and logging; queries flow downward from relational calculus through relational algebra to retrieval/update requests on the database, and results flow back up to the applications)
decomposes the query into a tree of algebra operations and tries to find the "optimal" ordering of the operations. The result is stored in an access plan. The output of this layer is a query expressed in lower-level code (algebra operations).
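The classic example of such reordering is pushing a selection below a join. The sketch below, with made-up relations and sizes, shows that the reordered plan produces the same answer while generating smaller intermediate results:

```python
# Query compilation sketch: pushing a selection below a join shrinks the
# intermediate result without changing the answer. Relations are illustrative.

EMP = [("E%d" % i, "P%d" % (i % 3)) for i in range(9)]      # (eno, pno): 9 tuples
PROJ = [("P%d" % i, "Name%d" % i) for i in range(3)]        # (pno, pname): 3 tuples

def join(left, right):
    """Nested-loop join on left.pno = right.pno."""
    return [(l, r) for l in left for r in right if l[1] == r[0]]

# Naive plan: join first (9-tuple intermediate), then select pno = 'P0'.
naive = [t for t in join(EMP, PROJ) if t[0][1] == "P0"]

# Optimized plan: select on each operand first, then join the reduced inputs.
emp0  = [e for e in EMP  if e[1] == "P0"]
proj0 = [p for p in PROJ if p[0] == "P0"]
optimized = join(emp0, proj0)

assert sorted(naive) == sorted(optimized)   # same answer, smaller intermediates
print(len(join(EMP, PROJ)), "vs", len(emp0) * len(proj0))   # 9 vs 3
```

A real optimizer makes this choice by estimating operand sizes from catalog statistics; the principle, evaluating the most selective operations first, is the same.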
The execution layer directs the execution of the access plans, including transaction management (commit, restart) and synchronization of algebra operations. It interprets the relational operations by calling the data access layer through the retrieval and update requests.
The data access layer manages the data structures that implement the files, indices, etc. It also manages the buffers by caching the most frequently accessed data. Careful use of this layer minimizes the access to disks to get or write data.
Finally, the consistency layer manages concurrency control and logging for update requests. This layer allows transaction, system, and media recovery after failure.
1.7.3 Architectural Models for Distributed DBMSs
Wenow consider the possible ways in which a distributed DBMS may be architected.
Weuse a classification (Figure 1.10) that organizes the systems as characterized
with respect to (1) the autonomy of local systems, (2) their distribution, and (3) their
heterogeneity.
Fig. 1.10 DBMS Implementation Alternatives (a design space with three axes, autonomy, distribution, and heterogeneity, in which client/server systems, peer-to-peer DDBSs, and multidatabase systems occupy different points)
1.7.4 Autonomy
Autonomy, in this context, refers to the distribution of control, not of data. It indicates the degree to which individual DBMSs can operate independently. Autonomy is a function of a number of factors such as whether the component systems (i.e., individual DBMSs) exchange information, whether they can independently execute transactions, and whether one is allowed to modify them. Requirements of an autonomous system have been specified as follows [Gligor and Popescu-Zeletin, 1986]:
1. The local operations of the individual DBMSs are not affected by their participation in the distributed system.
2. The manner in which the individual DBMSs process queries and optimize them should not be affected by the execution of global queries that access multiple databases.
3. System consistency or operation should not be compromised when individual DBMSs join or leave the distributed system.
On the other hand, the dimensions of autonomy can be specified as follows [Du and Elmagarmid, 1989]:
1. Design autonomy: Individual DBMSs are free to use the data models and transaction management techniques that they prefer.
2. Communication autonomy: Each of the individual DBMSs is free to make its own decision as to what type of information it wants to provide to the other DBMSs or to the software that controls their global execution.
3. Execution autonomy: Each DBMS can execute the transactions that are submitted to it in any way that it wants to.
We will use a classification that covers the important aspects of these features. One alternative is tight integration, where a single image of the entire database is available to any user who wants to share the information, which may reside in multiple databases. From the users' perspective, the data are logically integrated in one database. In these tightly-integrated systems, the data managers are implemented so that one of them is in control of the processing of each user request even if that request is serviced by more than one data manager. The data managers do not typically operate as independent DBMSs even though they usually have the functionality to do so.
Next we identify semiautonomous systems that consist of DBMSs that can (and usually do) operate independently, but have decided to participate in a federation to make their local data sharable. Each of these DBMSs determines what parts of its own database it will make accessible to users of other DBMSs. They are not fully autonomous systems because they need to be modified to enable them to exchange information with one another.
The last alternative that we consider is total isolation, where the individual systems are stand-alone DBMSs that know neither of the existence of other DBMSs nor how to communicate with them. In such systems, the processing of user transactions that access multiple databases is especially difficult since there is no global control over the execution of individual DBMSs.
It is important to note at this point that the three alternatives that we consider for autonomous systems are not the only possibilities. We simply highlight the three most popular ones.
1.7.5 Distribution
Whereas autonomy refers to the distribution (or decentralization) of control, the
distributiondimension of the taxonomydeals with data. Of course, weare considering
the physicaldistribution of data over multiple sites; as we discussed earlier, the user
sees the data as one logical pool. There are a numberof ways DBMSs havebeen
distributed.We abstract these alternatives into two classes: client/server distribution
andpeer-to-peer distribution (or full distribution). Together with the non-distributed
option,the taxonomy identifies three alternative architectures.
The client/server distribution concentrates data management duties at servers while the clients focus on providing the application environment including the user interface. The communication duties are shared between the client machines and servers. Client/server DBMSs represent a practical compromise to distributing functionality. There are a variety of ways of structuring them, each providing a different level of distribution. With respect to the framework, we abstract these differences and leave that discussion to Section 1.7.8, which we devote to client/server DBMS architectures. What is important at this point is that the sites on a network are distinguished as "clients" and "servers" and their functionality is different.
In peer-to-peer systems, there is no distinction of client machines versus servers. Each machine has full DBMS functionality and can communicate with other machines to execute queries and transactions. Most of the very early work on distributed database systems has assumed a peer-to-peer architecture. Therefore, our main focus in this book is on peer-to-peer systems (also called fully distributed), even though many of the techniques carry over to client/server systems as well.
1.7.6 Heterogeneity
Heterogeneity may occur in various forms in distributed systems, ranging from hardware heterogeneity and differences in networking protocols to variations in data managers. The important ones from the perspective of this book relate to data models, query languages, and transaction management protocols. Representing data with different modeling tools creates heterogeneity because of the inherent expressive powers and limitations of individual data models. Heterogeneity in query languages not only involves the use of completely different data access paradigms in different data models (set-at-a-time access in relational systems versus record-at-a-time access in some object-oriented systems), but also covers differences in languages even when the individual systems use the same data model. Although SQL is now the standard relational query language, there are many different implementations and every vendor's language has a slightly different flavor (sometimes even different semantics, producing different results).
1.7.7 Architectural Alternatives
The distribution of databases, their possible heterogeneity, and their autonomy are orthogonal issues. Consequently, following the above characterization, there are 18 different possible architectures. Not all of these architectural alternatives that form the design space are meaningful. Furthermore, not all are relevant from the perspective of this book.
In Figure 1.10, we have identified three alternative architectures that are the focus of this book and that we discuss in more detail in the next three subsections: (A0, D1, H0), which corresponds to client/server distributed DBMSs; (A0, D2, H0), which is a peer-to-peer distributed DBMS; and (A2, D2, H1), which represents a (peer-to-peer) distributed, heterogeneous multidatabase system. Note that we discuss the heterogeneity issues within the context of one system architecture, although the issue arises in other models as well.
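The arithmetic behind the 18 alternatives can be checked directly. In the sketch below the axis values follow the (A, D, H) codes cited above: A0–A2 for autonomy, D0–D2 for distribution, and H0–H1 for heterogeneity (the exact meaning assigned to each unnamed code, e.g. D0 for no distribution, is our reading of the taxonomy):

```python
# The taxonomy's design space: 3 autonomy values x 3 distribution values
# x 2 heterogeneity values = 18 candidate architectures.
from itertools import product

space = list(product(["A0", "A1", "A2"],    # tight / semiautonomous / isolated
                     ["D0", "D1", "D2"],    # none / client-server / peer-to-peer
                     ["H0", "H1"]))         # homogeneous / heterogeneous
assert len(space) == 18

# The three points this chapter singles out:
book_focus = [("A0", "D1", "H0"),   # client/server distributed DBMS
              ("A0", "D2", "H0"),   # peer-to-peer distributed DBMS
              ("A2", "D2", "H1")]   # multidatabase system
assert all(point in space for point in book_focus)
```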
1.7.8 Client/Server Systems
Client/server DBMSs entered the computing scene at the beginning of the 1990s and have made a significant impact on both the DBMS technology and the way we do computing. The general idea is very simple and elegant: distinguish the functionality that needs to be provided and divide these functions into two classes: server functions and client functions. This provides a two-level architecture which makes it easier to manage the complexity of modern DBMSs and the complexity of distribution.
As with any highly popular term, client/server has been much abused and has come to mean different things. If one takes a process-centric view, then any process that requests the services of another process is its client and vice versa. However, it is important to note that "client/server computing" and "client/server DBMS," as used in our context, do not refer to processes, but to actual machines. Thus, we focus on what software should run on the client machines and what software should run on the server machine.
Put this way, the issue is clearer and we can begin to study the differences in client and server functionality. The functionality allocation between clients and servers differs in different types of distributed DBMSs (e.g., relational versus object-oriented). In relational systems, the server does most of the data management work. This means that all of query processing and optimization, transaction management, and storage management is done at the server. The client, in addition to the application and the user interface, has a DBMS client module that is responsible for managing the data that is cached at the client and (sometimes) managing the transaction locks that may have been cached as well. It is also possible to place consistency checking of user queries at the client side, but this is not common since it requires the replication of the system catalog at the client machines. Of course, there is operating system and communication software that runs on both the client and the server, but we only focus on the DBMS-related functionality. This architecture, depicted in Figure 1.11,
Fig. 1.11 Client/Server Reference Architecture (the client runs the user interface, the application program, the client DBMS module, communication software, and the operating system; the server runs, on top of its operating system, communication software, the semantic data controller, query optimizer, transaction manager, recovery manager, and runtime support processor over the database; SQL queries flow from client to server and the result relation flows back)
is quite common in relational systems where the communication between the clients and the server(s) is at the level of SQL statements. In other words, the client passes SQL queries to the server without trying to understand or optimize them. The server does most of the work and returns the result relation to the client.
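This SQL-shipping division of labor can be sketched in a few lines. Here the stdlib sqlite3 module stands in for the server-side DBMS, and the class names, table, and data are illustrative:

```python
# Multiple client / single server sketch: clients ship SQL strings verbatim;
# all parsing, optimization, and execution happen at the server, which
# returns the result relation. sqlite3 plays the role of the server DBMS.
import sqlite3

class Server:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE emp (eno TEXT, ename TEXT)")
        self.db.executemany("INSERT INTO emp VALUES (?, ?)",
                            [("E1", "J. Doe"), ("E2", "M. Smith")])

    def execute(self, sql):
        # Server-side entry point: parse, optimize, execute, return results.
        return self.db.execute(sql).fetchall()

class Client:
    def __init__(self, server):
        self.server = server

    def query(self, sql):
        # No client-side parsing or optimization: the SQL passes through.
        return self.server.execute(sql)

server = Server()
c1, c2 = Client(server), Client(server)     # multiple clients, one server
print(c1.query("SELECT ename FROM emp ORDER BY eno"))
# [('J. Doe',), ('M. Smith',)]
```

A real client module would add result caching and connection management, but the shape of the interaction, an SQL string out and a result relation back, is exactly this.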
There are a number of different types of client/server architecture. The simplest is the case where there is only one server which is accessed by multiple clients. We call this multiple client/single server. From a data management perspective, this is not much different from centralized databases since the database is stored on only one machine (the server) that also hosts the software to manage it. However, there are some (important) differences from centralized systems in the way transactions are executed and caches are managed. We do not consider such issues at this point. A more sophisticated client/server architecture is one where there are multiple servers in the system (the so-called multiple client/multiple server approach). In this case, two alternative management strategies are possible: either each client manages its own connection to the appropriate server or each client knows of only its "home server" which then communicates with other servers as required. The former approach simplifies server code, but loads the client machines with additional responsibilities. This leads to what has been called "heavy client" systems. The latter approach, on
the other hand, concentrates the data management functionality at the servers. Thus, the transparency of data access is provided at the server interface, leading to "light clients."
From a datalogical perspective, client/server DBMSs provide the same view of data as do the peer-to-peer systems that we discuss next. That is, they give the user the appearance of a logically single database, while at the physical level data may be distributed. Thus the primary distinction between client/server systems and peer-to-peer ones is not in the level of transparency that is provided to the users and applications, but in the architectural paradigm that is used to realize this level of transparency.
Client/server can be naturally extended to provide for a more efficient function distribution on different kinds of servers: client servers run the user interface (e.g., web servers), application servers run application programs, and database servers run database management functions. This leads to the present trend in three-tier distributed system architectures, where sites are organized as specialized servers rather than as general-purpose computers.
The original idea, which is to offload the database management functions to a special server, dates back to the early 1970s [Canaday et al., 1974]. At the time, the computer on which the database system ran was called the database machine, database computer, or backend computer, while the computer that ran the applications was called the host computer. More recent terms for these are the database server and application server, respectively. Figure 1.12 illustrates a simple view of the database server approach, with application servers connected to one database server via a communication network.
The database server approach, as an extension of the classical client/server architecture, has several potential advantages. First, the single focus on data management makes possible the development of specific techniques for increasing data reliability and availability, e.g., using parallelism. Second, the overall performance of database management can be significantly enhanced by the tight integration of the database system and a dedicated database operating system. Finally, a database server can also exploit recent hardware architectures, such as multiprocessors or clusters of PC servers, to enhance both performance and data availability.
Although these advantages are significant, they can be offset by the overhead introduced by the additional communication between the application and the data servers. This is an issue, of course, in classical client/server systems as well, but in this case there is an additional layer of communication to worry about. The communication cost can be amortized only if the server interface is sufficiently high level to allow the expression of complex queries involving intensive data processing.
The application server approach (indeed, an n-tier distributed approach) can be extended by the introduction of multiple database servers and multiple application servers (Figure 1.13), as can be done in classical client/server architectures. In this case, it is typically the case that each application server is dedicated to one or a few applications, while database servers operate in the multiple server fashion discussed above.
Fig. 1.12 Database Server Approach (clients connect over a network to an application server, which in turn connects over a network to the database server)
Fig. 1.13 Distributed Database Servers (clients connect over a network to multiple application servers, which in turn connect over a network to multiple database servers)