ALAN DIX, JANET FINLAY,
GREGORY D. ABOWD, RUSSELL BEALE
THIRD EDITION
HUMAN–COMPUTER
INTERACTION
Much has changed since the first edition of
Human–Computer Interaction was published. Ubiquitous
computing and rich sensor-filled environments are
finding their way out of the laboratory, not just into
movies but also into our workplaces and homes. The
computer has broken out of its plastic and glass
bounds, providing us with networked societies where
personal computing devices, from mobile phones to
smart cards, fill our pockets and electronic devices
surround us at home and at work. The web, too, has grown
from a largely academic network into the hub of
business and everyday life. As the distinctions between
the physical and the digital, and between work and
leisure, start to break down, human–computer
interaction is also changing radically.
The excitement of these changes is captured in this new
edition, which also looks forward to other emerging
technologies. However, the book is firmly rooted in
strong principles and models independent of the
passing technologies of the day: these foundations will
be the means by which today’s students will
understand tomorrow’s technology.
The third edition of Human–Computer Interaction can be
used for introductory and advanced courses on HCI,
Interaction Design, Usability or Interactive Systems
Design. It will also prove an invaluable reference for
professionals wishing to design usable computing
devices.
Accompanying the text is a comprehensive website
containing a broad range of material for instructors,
students and practitioners, a full text search facility for
the book, links to many sites of additional interest and
much more: go to www.hcibook.com
Alan Dix is Professor in the Department of Computing, Lancaster University, UK. Janet Finlay is
Professor in the School of Computing, Leeds Metropolitan University, UK. Gregory D. Abowd is
Associate Professor in the College of Computing and GVU Center at Georgia Tech, USA.
Russell Beale is a lecturer in the School of Computer Science, University of
Birmingham, UK.
Cover illustration by Peter Gudynas
New to this edition:
• A revised structure, reflecting the growth of HCI as a discipline, separates out basic material suitable for introductory courses from more detailed models and theories.
• New chapter on interaction design adds material on scenarios and basic navigation design.
• New chapter on universal design, substantially extending the coverage of this material in the book.
• Updated and extended treatment of socio/contextual issues.
• Extended and new material on novel interaction, including updated ubicomp material, designing experience, physical sensors and a new chapter on rich interaction.
• Updated material about the web, including dynamic content.
• Relaunched website including case studies, WAP access and search.
www.pearson-books.com
Human–Computer Interaction
We work with leading authors to develop the
strongest educational materials in computing,
bringing cutting-edge thinking and best learning
practice to a global market.
Under a range of well-known imprints, including
Prentice Hall, we craft high quality print and
electronic publications which help readers to
understand and apply their content, whether
studying or at work.
To find out more about the complete range of our
publishing, please visit us on the world wide web at:
www.pearsoned.co.uk
Human–Computer Interaction
Third Edition
Alan Dix, Lancaster University
Janet Finlay, Leeds Metropolitan University
Gregory D. Abowd, Georgia Institute of Technology
Russell Beale, University of Birmingham
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England
and Associated Companies throughout the world
Visit us on the world wide web at:
www.pearsoned.co.uk
First published 1993
Second edition published 1998
Third edition published 2004
© Prentice-Hall Europe 1993, 1998
© Pearson Education Limited 2004
The rights of Alan Dix, Janet E. Finlay, Gregory D. Abowd and Russell Beale
to be identified as authors of this work have been asserted by them in
accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored
in a retrieval system, or transmitted in any form or by any means, electronic,
mechanical, photocopying, recording or otherwise, without either the prior
written permission of the publisher or a licence permitting restricted copying
in the United Kingdom issued by the Copyright Licensing Agency Ltd,
90 Tottenham Court Road, London W1T 4LP.
All trademarks used herein are the property of their respective owners. The use
of any trademark in this text does not vest in the author or publisher any trademark
ownership rights in such trademarks, nor does the use of such trademarks imply any
affiliation with or endorsement of this book by such owners.
ISBN-13: 978-0-13-046109-4
ISBN-10: 0-13-046109-1
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
10 9 8 7 6 5 4 3
10 09 08 07 06
Typeset in 10/12½pt Minion by 35
Printed and bound by Scotprint, Haddington
BRIEF CONTENTS
Guided tour xiv
Foreword xvi
Preface to the third edition xix
Publisher’s acknowledgements xxiii
Introduction 1
Part 1 FOUNDATIONS 9
Chapter 1 The human 11
Chapter 2 The computer 59
Chapter 3 The interaction 123
Chapter 4 Paradigms 164
Part 2 DESIGN PROCESS 189
Chapter 5 Interaction design basics 191
Chapter 6 HCI in the software process 225
Chapter 7 Design rules 258
Chapter 8 Implementation support 289
Chapter 9 Evaluation techniques 318
Chapter 10 Universal design 365
Chapter 11 User support 395
Part 3 MODELS AND THEORIES 417
Chapter 12 Cognitive models 419
Chapter 13 Socio-organizational issues and stakeholder requirements 450
vi Brief Contents
Chapter 14 Communication and collaboration models 475
Chapter 15 Task analysis 510
Chapter 16 Dialog notations and design 544
Chapter 17 Models of the system 594
Chapter 18 Modeling rich interaction 629
Part 4 OUTSIDE THE BOX 661
Chapter 19 Groupware 663
Chapter 20 Ubiquitous computing and augmented realities 716
Chapter 21 Hypertext, multimedia and the world wide web 748
References 791
Index 817
CONTENTS
Guided tour xiv
Foreword xvi
Preface to the third edition xix
Publisher’s acknowledgements xxiii
Introduction 1
Part 1 FOUNDATIONS 9
Chapter 1 The human 11
1.1 Introduction 12
1.2 Input–output channels 13
Design Focus: Getting noticed 16
Design Focus: Where’s the middle? 22
1.3 Human memory 27
Design Focus: Cashing in 30
Design Focus: 7 ± 2 revisited 32
1.4 Thinking: reasoning and problem solving 39
Design Focus: Human error and false memories 49
1.5 Emotion 51
1.6 Individual differences 52
1.7 Psychology and the design of interactive systems 53
1.8 Summary 55
Exercises 56
Recommended reading 57
Chapter 2 The computer 59
2.1 Introduction 60
2.2 Text entry devices 63
Design Focus: Numeric keypads 67
2.3 Positioning, pointing and drawing 71
viii Contents
2.4 Display devices 78
Design Focus: Hermes: a situated display 86
2.5 Devices for virtual reality and 3D interaction 87
2.6 Physical controls, sensors and special devices 91
Design Focus: Feeling the road 94
Design Focus: Smart-Its – making using sensors easy 96
2.7 Paper: printing and scanning 97
Design Focus: Readability of text 101
2.8 Memory 107
2.9 Processing and networks 114
Design Focus: The myth of the infinitely fast machine 116
2.10 Summary 120
Exercises 121
Recommended reading 122
Chapter 3 The interaction 123
3.1 Introduction 124
3.2 Models of interaction 124
Design Focus: Video recorder 130
3.3 Frameworks and HCI 130
3.4 Ergonomics 131
Design Focus: Industrial interfaces 133
3.5 Interaction styles 136
Design Focus: Navigation in 3D and 2D 144
3.6 Elements of the WIMP interface 145
Design Focus: Learning toolbars 151
3.7 Interactivity 152
3.8 The context of the interaction 154
Design Focus: Half the picture? 155
3.9 Experience, engagement and fun 156
3.10 Summary 160
Exercises 161
Recommended reading 162
Chapter 4 Paradigms 164
4.1 Introduction 165
4.2 Paradigms for interaction 165
4.3 Summary 185
Exercises 186
Recommended reading 187
Contents ix
Part 2 DESIGN PROCESS 189
Chapter 5 Interaction design basics 191
5.1 Introduction 192
5.2 What is design? 193
5.3 The process of design 195
5.4 User focus 197
Design Focus: Cultural probes 200
5.5 Scenarios 201
5.6 Navigation design 203
Design Focus: Beware the big button trap 206
Design Focus: Modes 207
5.7 Screen design and layout 211
Design Focus: Alignment and layout matter 214
Design Focus: Checking screen colors 219
5.8 Iteration and prototyping 220
5.9 Summary 222
Exercises 223
Recommended reading 224
Chapter 6 HCI in the software process 225
6.1 Introduction 226
6.2 The software life cycle 226
6.3 Usability engineering 237
6.4 Iterative design and prototyping 241
Design Focus: Prototyping in practice 245
6.5 Design rationale 248
6.6 Summary 256
Exercises 257
Recommended reading 257
Chapter 7 Design rules 258
7.1 Introduction 259
7.2 Principles to support usability 260
7.3 Standards 275
7.4 Guidelines 277
7.5 Golden rules and heuristics 282
7.6 HCI patterns 284
7.7 Summary 286
Exercises 287
Recommended reading 288
x Contents
Chapter 8 Implementation support 289
8.1 Introduction 290
8.2 Elements of windowing systems 291
8.3 Programming the application 296
Design Focus: Going with the grain 301
8.4 Using toolkits 302
Design Focus: Java and AWT 304
8.5 User interface management systems 306
8.6 Summary 313
Exercises 314
Recommended reading 316
Chapter 9 Evaluation techniques 318
9.1 What is evaluation? 319
9.2 Goals of evaluation 319
9.3 Evaluation through expert analysis 320
9.4 Evaluation through user participation 327
9.5 Choosing an evaluation method 357
9.6 Summary 362
Exercises 363
Recommended reading 364
Chapter 10 Universal design 365
10.1 Introduction 366
10.2 Universal design principles 366
10.3 Multi-modal interaction 368
Design Focus: Designing websites for screen readers 374
Design Focus: Choosing the right kind of speech 375
Design Focus: Apple Newton 381
10.4 Designing for diversity 384
Design Focus: Mathematics for the blind 386
10.5 Summary 393
Exercises 393
Recommended reading 394
Chapter 11 User support 395
11.1 Introduction 396
11.2 Requirements of user support 397
11.3 Approaches to user support 399
11.4 Adaptive help systems 404
Design Focus: It’s good to talk – help from real people 405
11.5 Designing user support systems 412
11.6 Summary 414
Exercises 415
Recommended reading 416
Contents xi
Part 3 MODELS AND THEORIES 417
Chapter 12 Cognitive models 419
12.1 Introduction 420
12.2 Goal and task hierarchies 421
Design Focus: GOMS saves money 424
12.3 Linguistic models 430
12.4 The challenge of display-based systems 434
12.5 Physical and device models 436
12.6 Cognitive architectures 443
12.7 Summary 447
Exercises 448
Recommended reading 448
Chapter 13 Socio-organizational issues and stakeholder requirements 450
13.1 Introduction 451
13.2 Organizational issues 451
Design Focus: Implementing workflow in Lotus Notes 457
13.3 Capturing requirements 458
Design Focus: Tomorrow’s hospital – using participatory design 468
13.4 Summary 472
Exercises 473
Recommended reading 474
Chapter 14 Communication and collaboration models 475
14.1 Introduction 476
14.2 Face-to-face communication 476
Design Focus: Looking real – Avatar Conference 481
14.3 Conversation 483
14.4 Text-based communication 495
14.5 Group working 504
14.6 Summary 507
Exercises 508
Recommended reading 509
Chapter 15 Task analysis 510
15.1 Introduction 511
15.2 Differences between task analysis and other techniques 511
15.3 Task decomposition 512
15.4 Knowledge-based analysis 519
15.5 Entity–relationship-based techniques 525
15.6 Sources of information and data collection 532
15.7 Uses of task analysis 538
xii Contents
15.8 Summary 541
Exercises 542
Recommended reading 543
Chapter 16 Dialog notations and design 544
16.1 What is dialog? 545
16.2 Dialog design notations 547
16.3 Diagrammatic notations 548
Design Focus: Using STNs in prototyping 551
Design Focus: Digital watch – documentation and analysis 563
16.4 Textual dialog notations 565
16.5 Dialog semantics 573
16.6 Dialog analysis and design 582
16.7 Summary 589
Exercises 591
Recommended reading 592
Chapter 17 Models of the system 594
17.1 Introduction 595
17.2 Standard formalisms 595
17.3 Interaction models 608
17.4 Continuous behavior 618
17.5 Summary 624
Exercises 625
Recommended reading 627
Chapter 18 Modeling rich interaction 629
18.1 Introduction 630
18.2 Status–event analysis 631
18.3 Rich contexts 639
18.4 Low intention and sensor-based interaction 649
Design Focus: Designing a car courtesy light 655
18.5 Summary 657
Exercises 658
Recommended reading 659
Part 4 OUTSIDE THE BOX 661
Chapter 19 Groupware 663
19.1 Introduction 664
19.2 Groupware systems 664
Contents xiii
19.3 Computer-mediated communication 667
Design Focus: SMS in action 673
19.4 Meeting and decision support systems 679
19.5 Shared applications and artifacts 685
19.6 Frameworks for groupware 691
Design Focus: TOWER – workspace awareness 701
19.7 Implementing synchronous groupware 702
19.8 Summary 713
Exercises 714
Recommended reading 715
Chapter 20 Ubiquitous computing and augmented realities 716
20.1 Introduction 717
20.2 Ubiquitous computing applications research 717
Design Focus: Ambient Wood – augmenting the physical 723
Design Focus: Classroom 2000/eClass – deploying and evaluating ubicomp 727
Design Focus: Shared experience 732
20.3 Virtual and augmented reality 733
Design Focus: Applications of augmented reality 737
20.4 Information and data visualization 738
Design Focus: Getting the size right 740
20.5 Summary 745
Exercises 746
Recommended reading 746
Chapter 21 Hypertext, multimedia and the world wide web 748
21.1 Introduction 749
21.2 Understanding hypertext 749
21.3 Finding things 761
21.4 Web technology and issues 768
21.5 Static web content 771
21.6 Dynamic web content 778
21.7 Summary 787
Exercises 788
Recommended reading 788
References 791
Index 817
GUIDED TOUR
PART 2 DESIGN PROCESS
In this part, we concentrate on how design practice
addresses the critical feature of an interactive system –
usability from the human perspective. The chapters in
this part promote the purposeful design of more usable
interactive systems. We begin in Chapter 5 by introducing
the key elements in the interaction design process. These
elements are then expanded in later chapters.
Chapter 6 discusses the design process in more detail,
specifically focusing on the place of user-centered design
within a software engineering framework. Chapter 7 high-
lights the range of design rules that can help us to specify
usable interactive systems, including abstract principles,
guidelines and other design representations.
In Chapter 8, we provide an overview of implementa-
tion support for the programmer of an interactive system.
Chapter 9 is concerned with the techniques used to evalu-
ate the interactive system to see if it satisfies user needs.
Chapter 10 discusses how to design a system to be univer-
sally accessible, regardless of age, gender, cultural background
or ability. In Chapter 11 we discuss the provision of user
support in the form of help systems and documentation.
MODELING RICH
INTERACTION
OVERVIEW
We operate within an ecology of people, physical artifacts
and electronic systems, and this rich ecology has recently
become more complex as electronic devices invade the
workplace and our day-to-day lives. We need methods
to deal with these rich interactions.
• Status–event analysis is a semi-formal, easy to apply technique that:
  – classifies phenomena as event or status
  – embodies naïve psychology
  – highlights feedback problems in interfaces.
• Aspects of rich environments can be incorporated into methods such as task analysis:
  – other people
  – information requirements
  – triggers for tasks
  – modeling artifacts
  – placeholders in task sequences.
• New sensor-based systems do not require explicit interaction; this means:
  – new cognitive and interaction models
  – new design methods
  – new system architectures.
19.3 Computer-mediated communication
CuSeeMe
Special-purpose video conferencing is still relatively expensive, but low-fidelity desktop
video conferencing is now within the reach of many users of desktop computers. Digital video
cameras are now inexpensive and easily obtainable. They often come with pre-packaged video
phone or video conferencing software. However, the system that has really popularized
video conferencing is a web-based tool. CuSeeMe works over the internet, allowing participants
across the world with only a basic digital video camera to see and talk to one another. The
software is usually public domain (although there are commercial versions) and the services allowing
connection are often free. The limited bandwidth available over long-distance internet links means
that video quality and frame rates are low and periodic image break-up may occur. In fact, it is
sound break-up which is more problematic. After all, we can talk to one another quite easily with-
out seeing one another, but find it very difficult over a noisy phone line. Often participants may
see one another’s video image, but actually discuss using a synchronous text-based ‘talk’ program.
CuSeeMe – video conferencing on the internet. Source: Courtesy of Geoff Ellis
Chapter 12 Cognitive models
Worked exercise Do a keystroke-level analysis for opening up an application in a visual desktop interface using
a mouse as the pointing device, comparing at least two different methods for performing the
task. Repeat the exercise using a trackball. Consider how the analysis would differ for various
positions of the trackball relative to the keyboard and for other pointing devices.
Answer We provide a keystroke-level analysis for three different methods for launching an
application on a visual desktop. These methods are analyzed for a conventional one-
button mouse, a trackball mounted away from the keyboard and one mounted close to
the keyboard. The main distinction between the two trackballs is that the second one
does not require an explicit repositioning of the hands, that is there is no time required
for homing the hands between the pointing device and the keyboard.
Method 1 Double clicking on application icon
(Trackball 1: mounted away from the keyboard; Trackball 2: mounted close to the keyboard)

Steps                    Operator     Mouse    Trackball 1   Trackball 2
1. Move hand to mouse    H[mouse]     0.400    0.400         0.000
2. Mouse to icon         P[to icon]   0.664    1.113         1.113
3. Double click          2B[click]    0.400    0.400         0.400
4. Return to keyboard    H[kbd]       0.400    0.400         0.000
Total times                           1.864    2.313         1.513
Method 2 Using an accelerator key
Steps                    Operator     Mouse    Trackball 1   Trackball 2
1. Move hand to mouse    H[mouse]     0.400    0.400         0.000
2. Mouse to icon         P[to icon]   0.664    1.113         1.113
3. Click to select       B[click]     0.200    0.200         0.200
4. Pause                 M            1.350    1.350         1.350
5. Return to keyboard    H[kbd]       0.400    0.400         0.000
6. Press accelerator     K            0.200    0.200         0.200
Total times                           3.214    3.663         2.763
Method 3 Using a menu
Steps                    Operator     Mouse    Trackball 1   Trackball 2
1. Move hand to mouse    H[mouse]     0.400    0.400         0.000
2. Mouse to icon         P[to icon]   0.664    1.113         1.113
3. Click to select       B[click]     0.200    0.200         0.200
4. Pause                 M            1.350    1.350         1.350
5. Mouse to file menu    P            0.664    1.113         1.113
6. Pop-up menu           B[down]      0.100    0.100         0.100
7. Drag to open          P[drag]      0.713    1.248         1.248
8. Release mouse         B[up]        0.100    0.100         0.100
9. Return to keyboard    H[kbd]       0.400    0.400         0.000
Total times                           4.591    6.024         5.224
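A keystroke-level analysis like the one above is just a sum of standard operator times over a sequence of steps. The following is a minimal sketch of that calculation in Python; the function and variable names are illustrative, not from the book, and the operator times are those quoted for the mouse condition in the tables above.

```python
# Keystroke-Level Model (KLM) operator times in seconds, as used in the
# worked exercise above (one-button mouse condition).
mouse_times = {
    "H": 0.400,   # homing hand between keyboard and pointing device
    "P": 0.664,   # pointing to a target with the mouse
    "B": 0.200,   # single button press
    "2B": 0.400,  # double click (two button presses)
    "M": 1.350,   # mental preparation
    "K": 0.200,   # keystroke
}

def klm_total(steps, times):
    """Sum the operator times for a sequence of KLM steps."""
    return round(sum(times[op] for op in steps), 3)

# Method 1: H (to mouse), P (to icon), double click, H (back to keyboard)
print(klm_total(["H", "P", "2B", "H"], mouse_times))        # 1.864, as in the table

# Method 2: H, P, click, mental pause, H, accelerator key
print(klm_total(["H", "P", "B", "M", "H", "K"], mouse_times))  # 3.214, as in the table
```

The trackball columns follow by swapping in the corresponding P times (and H = 0 for the keyboard-mounted trackball).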
The part structure separates out introductory and more
advanced material, with each part opener giving a simple
description of what its constituent chapters cover
Bullet points at the start of each chapter highlight the
core coverage
Worked exercises within chapters provide step-by-step
guidelines to demonstrate problem-solving techniques
Boxed asides contain descriptions of particular tasks or
technologies for additional interest, experimentation
and discussion
Chapter 20 Ubiquitous computing and augmented realities
within these environments. Much of our understanding of work has developed from
Fordist and Taylorist principles on the structuring of activities and tasks. Evaluation
within HCI reflects these roots and is often predicated on notions of task and the
measurement of performance and efficiency in meeting these goals and tasks.
However, it is not clear that these measures can apply universally across activities
when we move away from structured and paid work to other activities. For example,
DESIGN FOCUS
Shared experience
You are in the Mackintosh Interpretation Centre in an arts center in Glasgow, Scotland. You notice a
man wearing black wandering around looking at the exhibits and then occasionally at a small PDA he is
holding. As you get closer he appears to be talking to himself, but then you realize he is simply talking
into a head-mounted microphone. ‘Some people can never stop using their mobile phone’, you think.
As you are looking at one exhibit, he comes across and suddenly cranes forward to look more closely,
getting right in front of you. ‘How rude’, you think.
The visitor is taking part in the City project – a mixed-reality experience. He is talking to two other
people at remote sites, one who has a desktop VR view of the exhibition and the other just a website.
However, they can all see representations of each other. The visitor is being tracked by ultrasound and
he appears in the VR world. Also, the web user’s current page locates her in a particular part of the
virtual exhibition. All of the users see a map of the exhibition showing where they all are.
You might think that in such an experiment the person actually in the museum would take the lead, but
in fact real groups using this system seemed to have equal roles and really had a sense of shared experi-
ence despite their very different means of seeing the exhibition.
See the book website for a full case study: /e3/casestudy/city/
City project: physical presence, VR interfaces and web interface. Source: Courtesy of
Matthew Chalmers, note: City is an Equator project
RECOMMENDED READING
J. Carroll, editor, HCI Models, Theories, and Frameworks: Toward an Interdisciplinary
Science, Morgan Kaufmann, 2003.
See chapters by Perry on distributed cognition, Monk on common ground and
Kraut on social psychology.
L. A. Suchman, Plans and Situated Actions: The Problem of Human–Machine
Communication, Cambridge University Press, 1987.
This book popularized ethnography within HCI. It puts forward the viewpoint
that most actions are not pre-planned, but situated within the context in which
they occur. The principal domain of the book is the design of help for a photo-
copier. This is itself a single-user task, but the methodology applied is based on
both ethnographic and conversational analysis. The book includes several chap-
ters discussing the contextual nature of language and analysis of conversation
transcripts.
T. Winograd and F. Flores, Understanding Computers and Cognition: A New
Foundation for Design, Addison-Wesley, 1986.
Like Suchman, this book emphasizes the contextual nature of language and the
weakness of traditional artificial intelligence research. It includes an account of
speech act theory as applied to Coordinator. Many people disagree with the
authors’ use of speech act theory, but, whether by application or reaction, this
work has been highly influential.
S. Greenberg, editor, Computer-supported Cooperative Work and Groupware,
Academic Press, 1991.
The contents of this collection originally made up two special issues of the
International Journal of Man–Machine Studies. In addition, the book contains
Greenberg’s extensive annotated bibliography of CSCW, a major entry point for
any research into the field. Updated versions of the bibliography can be obtained
from the Department of Computer Science, University of Calgary, Calgary,
Alberta, Canada.
Communications of the ACM, Vol. 34, No. 12, special issue on ‘collaborative com-
puting’, December 1991.
Several issues of the journal Interacting with Computers from late 1992 through early
1993 have a special emphasis on CSCW.
Computer-Supported Cooperative Work is a journal dedicated to CSCW. See also back
issues of the journal Collaborative Computing. This ran independently for a while,
but has now merged with Computer-Supported Cooperative Work.
See also the recommended reading list for Chapter 19, especially the conference
proceedings.
SUMMARY
Universal design is about designing systems that are accessible by all users in all
circumstances, taking account of human diversity in disabilities, age and culture.
Universal design helps everyone – for example, designing a system so that it can be
used by someone who is deaf or hard of hearing will benefit other people working in
noisy environments or without audio facilities. Designing to be accessible to screen-
reading systems will make websites better for mobile users and older browsers.
Multi-modal systems provide access to system information and functionality
through a range of different input and output channels, exploiting redundancy.
Such systems will enable users with sensory, physical or cognitive impairments to
make use of the channels that they can use most effectively. But all users benefit
from multi-modal systems that utilize more of our senses in an involving interactive
experience.
For any design choice we should ask ourselves whether our decision is excluding
someone and whether there are any potential confusions or misunderstandings in
our choice.
EXERCISES
10.1 Is multi-modality always a good thing? Justify your answer.
10.2 What are (i) auditory icons and (ii) earcons? How can they be used to benefit both visually
impaired and sighted users?
10.3 Research your country’s legislation relating to accessibility of technology for disabled people.
What are the implications of this to your future career in computing?
10.4 Take your university website or another site of your choice and assess it for accessibility using
Bobby. How would you recommend improving the site?
10.5 How could systems be made more accessible to older users?
10.6 Interview either (i) a person you know over 65 or (ii) a child you know under 16 about their
experience, attitude and expectations of computers. What factors would you take into account
if you were designing a website aimed at this person?
10.7 Use the screen reader simulation available at www.webaim.org/simulations/screenreader to
experience something of what it is like to access the web using a screen reader. Can you find
the answers to the test questions on the site?
Annotated further reading encourages readers to
research topics in depth
Design Focus mini case studies highlight practical
applications of HCI concepts
Frequent links to the
book website for
further information
Chapter summaries reinforce student learning.
Exercises at the end of chapters can be used by
teachers or individuals to test understanding
FOREWORD
Human–computer interaction is a difficult endeavor with glorious rewards.
Designing interactive computer systems to be effective, efficient, easy, and enjoyable to
use is important, so that people and society may realize the benefits of computation-
based devices. The subtle weave of constraints and their trade-offs – human,
machine, algorithmic, task, social, aesthetic, and economic – generates the difficulty.
The reward is the creation of digital libraries where scholars can find and turn the
pages of virtual medieval manuscripts thousands of miles away; medical instruments
that allow a surgical team to conceptualize, locate, and monitor a complex neuro-
surgical operation; virtual worlds for entertainment and social interaction, respon-
sive and efficient government services, from online license renewal to the analysis of
parliamentary testimony; or smart telephones that know where they are and under-
stand limited speech. Interaction designers create interaction in virtual worlds and
embed interaction in physical worlds.
Human–computer interaction is a specialty in many fields, and is therefore multi-
disciplinary, but it has an intrinsic relationship as a subfield to computer science.
Most interactive computing systems are for some human purpose and interact with
humans in human contexts. The notion that computer science is the study of algo-
rithms has virtue as an attempt to bring foundational rigor, but can lead to ignoring
constraints foundational to the design of successful interactive computer systems.
A lesson repeatedly learned in engineering is that a major source of failure is the
narrow optimization of a design that does not take sufficient account of contextual
factors. Human users and their contexts are major components of the design
problem that cannot be wished away simply because they are complex to address. In
fact, the largest part of program code in most interactive systems deals with user
interaction. Inadequate attention to users and task context not only leads to bad user
interfaces, it puts entire systems at risk.
The problem is how to take into account the human and contextual part of a sys-
tem with anything like the rigor with which other parts of the system are understood
and designed – how to go beyond fuzzy platitudes like ‘know the user’ that are true,
but do not give a method for doing or a test for having done. This is difficult to do,
but inescapable, and, in fact, capable of progress. Over the years, the need to take
into account human aspects of technical systems has led to the creation of new fields
of study: applied psychology, industrial engineering, ergonomics, human factors,
Foreword xvii
man–machine systems. Human–computer interaction is the latest of these, more
complex in some ways because of the breadth of user populations and applications,
the reach into cognitive and social constraints, and the emphasis on interaction. The
experiences with other human-technical disciplines lead to a set of conclusions about
how a discipline of human–computer interaction should be organized if it is to be
successful.
First, design is where the action is. An effective discipline of human–computer
interaction cannot be based largely on ‘usability analysis’, important though that
may be. Usability analysis happens too late; there are too few degrees of freedom; and
most importantly, it is not generative. Design thrives on understanding constraints,
on insight into the design space, and on deep knowledge of the materials of the
design, that is, the user, the task, and the machine. The classic landmark designs in
human–computer interaction, such as the Xerox Star and the Apple Lisa/Macintosh,
were not created from usability analysis (although usability analysis had important
roles), but from generative principles applied by user interface designers who
had control of the design and implementation.
Second, although the notion of ‘user-centered design’ gets much press, we should
really be emphasizing ‘task-centered design’. Understanding the purpose and con-
text of a system is key to allocating functions between people and machines and to
designing their interaction. It is only in deciding what a human–machine system
should do and the constraints on this goal that the human and technical issues can
be resolved. The need for task-centered design brings forward the need for methods
of task analysis as a central part of system design.
Third, human–computer interaction needs to be structured to include both
analytic and implementation methods together in the same discipline and taught
together as part of the core. Practitioners of the discipline who can only evaluate, but
not design and build are under a handicap. Builders who cannot reason analytically
about the systems they build or who do not understand the human information pro-
cessing or social contexts of their designs are under a handicap. Of course, there will
be specialists in one or another part of human–computer interaction, but for there
to be a successful field, there must be a common core.
Finally, what makes a discipline is a set of methods for doing something. A field
must have results that can be taught and used by people other than their originators
to do something. Historically, a field naturally evolves from a set of point results to
a set of techniques to a set of facts, general abstractions, methods, and theories. In
fact, for a field to be cumulative, there must be compaction of knowledge by crunch-
ing the results down into methods and theories; otherwise the field becomes fad-
driven and a collection of an almost unteachably large set of weak results. The most
useful methods and theories are generative theories: from some task analysis it is
possible to compute some insightful property that constrains the design space of a
system. In a formula: task analysis, approximation, and calculation. For example,
we can predict that if a graphics system cannot update the display faster than 10
times/second then the illusion of animation will begin to break down. This con-
straint worked backwards has architectural implications for how to guarantee the
needed display rate under variable computational load. It can be designed against.
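Card’s formula – task analysis, approximation, and calculation – can be sketched in code. The following Python fragment is purely illustrative (the task names, priorities and millisecond costs are invented, not drawn from any real system); it shows how the 10 updates/second constraint, worked backwards, becomes a per-frame time budget that an architecture can be designed against under variable load.

```python
# Below roughly 10 updates/second the illusion of animation breaks down,
# so each frame has a budget of at most 100 ms.
MIN_UPDATE_RATE_HZ = 10
FRAME_BUDGET_MS = 1000 / MIN_UPDATE_RATE_HZ  # 100 ms per frame

def plan_frame(tasks, budget_ms=FRAME_BUDGET_MS):
    """Schedule rendering tasks (highest priority first) whose estimated
    costs fit within the frame budget; the rest are deferred or degraded.
    Each task is a (name, priority, estimated_cost_ms) tuple."""
    scheduled, deferred = [], []
    remaining = budget_ms
    for name, _priority, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        if cost <= remaining:
            scheduled.append(name)
            remaining -= cost
        else:
            deferred.append(name)
    return scheduled, deferred

# Hypothetical workload for one frame (names and costs are assumptions):
tasks = [
    ("redraw cursor", 3, 5),
    ("animate drag outline", 2, 40),
    ("full anti-aliased repaint", 1, 80),
]
scheduled, deferred = plan_frame(tasks)
# The 80 ms repaint does not fit after the 5 + 40 ms higher-priority
# tasks, so it is deferred to keep the update rate above 10 Hz.
```

The point is not the scheduler itself but the direction of reasoning: a property of human perception is approximated as a number, and that number then constrains the software architecture.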
This textbook, by Alan Dix, Janet Finlay, Gregory Abowd, and Russell Beale,
represents how far human–computer interaction has come in developing and
organizing technical results for the design and understanding of interactive
systems. Remarkably, by the light of their text, it is pretty far, satisfying all the just-
enumerated conclusions. This book makes an argument that by now there are many
teachable results in human–computer interaction by weight alone! It makes an argu-
ment that these results form a cumulative discipline by its structure, with sections
that organize the results systematically, characterizing human, machine, interaction,
and the design process. There are analytic models, but also code implementation
examples. It is no surprise that methods of task analysis play a prominent role in
the text as do theories to help in the design of the interaction. Usability evaluation
methods are integrated in their proper niche within the larger framework.
In short, the codification of the field of human–computer interaction in this
text is now starting to look like other subfields of computer science. Students by
studying the text can learn how to understand and build interactive systems.
Human–computer interaction as represented by the text fits together with other
parts of computer science. Moreover, human–computer interaction as presented is
a challenge problem for advancing theory in cognitive science, design, business, or
social-technical systems. Given where the field was just a few short years ago, the
creation of this text is a monumental achievement. The way is open to reap the
glorious rewards of interactive systems through a markedly less difficult endeavor,
both for designer and for user.
Stuart K. Card
Palo Alto Research Center, Palo Alto, California
PREFACE TO THE THIRD EDITION
It is ten years since the first edition of this book was published and much has
changed. Ubiquitous computing and rich sensor-filled environments are finding
their way out of the laboratory, not just into films and fiction, but also into our
workplaces and homes. Now the computer really has broken its bounds of plastic
and glass: we live in networked societies where personal computing devices from
mobile phones to smart cards fill our pockets, and electronic devices surround us at
home and at work. The web too has grown from a largely academic network into the
hub of business and everyday lives. As the distinctions between physical and digital,
work and leisure start to break down, human–computer interaction is also radically
changing.
We have tried to capture some of the excitement of these changes in this revised
edition, including issues of physical devices in Chapters 2 and 3, discussion of
web interfaces in Chapter 21, ubiquitous computing in Chapters 4 and 20, and new
models and paradigms for interaction in these new environments in Chapters 17 and
18. We have reflected aspects of the shift in use of technology from work to leisure
in the analysis of user experience in Chapter 3, and in several of the boxed examples
and case studies in the text. This new edition of Human–Computer Interaction is not
just tracking these changes but looking ahead at emerging areas.
However, it is also rooted in strong principles and models that are not dependent
on the passing technologies of the day. We are excited both by the challenges of the
new and by the established foundations, as it is these foundations that will be the
means by which today’s students understand tomorrow’s technology. So we make no
apology for continuing the focus of previous editions on the theoretical and con-
ceptual models that underpin our discipline. As the use of technology has changed,
these models have expanded. In particular, the insular individual focus of early
work is increasingly giving way to include the social and physical context. This is
reflected in the expanded treatment of social and organizational analysis, including
ethnography, in Chapter 13, and the analysis of artifacts in the physical environment
in Chapter 18.
STRUCTURE
The structure of the new edition has been completely revised. This in part reflects the
growth of the area: ten years ago HCI was as often as not a minority optional sub-
ject, and the original edition was written to capture the core material for a standard
course. Today HCI is much expanded: some areas (like CSCW) are fully fledged dis-
ciplines in their own right, and HCI is studied from a range of perspectives and at
different levels of detail. We have therefore separated basic material suitable for intro-
ductory courses into the first two parts, including a new chapter on interaction
design, which adds new material on scenarios and navigation design and provides an
overview suitable for a first course. In addition, we have included a new chapter on
universal design, to reflect the growing emphasis on design that is inclusive of all,
regardless of ability, age or cultural background. More advanced material focussing
on different HCI models and theories is presented in Part 3, with extended cover-
age of social and contextual models and rich interaction. It is intended that these
sections will be suitable for more advanced HCI courses at undergraduate and
postgraduate level, as well as for researchers new to the field. Detailed coverage of the
particular domains of web applications, ubiquitous computing and CSCW is given
in Part 4.
New to this edition is a full color plate section. Images flagged with a camera icon
in the text can be found in color in the plate section.
WEBSITE AND SUPPORT MATERIALS
We have always believed that support materials are an essential part of a textbook of
this kind. These are designed to supplement and enhance the printed book – phys-
ical and digital integration in practice. Since the first edition we have had exercises,
mini-case studies and presentation slides for all chapters available electronically.
For the second edition these were incorporated into a website including links and
an online search facility that acts as an exhaustive index to the book and mini-
encyclopedia of HCI. For visually disabled readers, access to a full online electronic
text has also been available. The website is continuing to develop, and for the third
edition provides all these features plus more, including WAP search, multi-choice
questions, and extended case study material (see also color plate section). We will use
the book website to bring you new exercises, information and other things, so do
visit us at www.hcibook.com (also available via www.booksites.net/dix). Throughout
the book you will find shorthand web references of the form /e3/a-page-url/. Just
prepend http://www.hcibook.com to find further information. To assist users of the
second edition, a mapping between the structures of the old and new editions is
available on the web at: http://www.hcibook.com/e3/contents/map2e/
STYLISTIC CONVENTION
As with all books, we have had to make some global decisions regarding style and
terminology. Specifically, in a book in which the central characters are ‘the user’
and ‘the designer’, it is difficult to avoid the singular pronoun. We therefore use the
pronoun ‘he’ when discussing the user and ‘she’ when referring to the designer. In
other cases we use ‘she’ as a generic term. This should not be taken to imply anything
about the composition of any actual population.
Similarly, we have adopted the convention of referring to the field of ‘Human–
Computer Interaction’ and the notion of ‘human–computer interaction’. In many
cases we will also use the abbreviation HCI.
ACKNOWLEDGEMENTS
In a book of this size, written by multiple authors, there will always be myriad
people behind the scenes who have aided, supported and abetted our efforts. We
would like to thank all those who provided information, pictures and software that
have enhanced the quality of the final product. In particular, we are indebted to
Wendy Mackay for the photograph of EVA; Wendy Hall and her colleagues at the
University of Southampton for the screen shot of Microcosm; Saul Greenberg for
the reactive keyboard; Alistair Edwards for Soundtrack; Christina Engelbart for the
photographs of the early chord keyset and mouse; Geoff Ellis for the screen shot of
Devina and himself using CuSeeMe; Steve Benford for images of the Internet Foyer;
and Tony Renshaw who provided photographs of the eye tracking equipment.
Thanks too to Simon Shum for information on design rationale, Robert Ward who
gave us material on psycho-physiology, and Elizabeth Mynatt and Tom Rodden who
worked with Gregory on material adapted in Chapter 20. Several of the boxed case
studies are based on the work of multi-institution projects, and we are grateful
to all those from the project teams of CASCO, thePooch, SMART-ITS, TOWER,
AVATAR-Conference and TEAM-HOS for boxes and case studies based on their
work; and also to the EQUATOR project from which we drew material for the boxes
on cultural probes, ‘Ambient Wood’ and ‘City’. We would also like to thank all the
reviewers and survey respondents whose feedback helped us to select our subject
matter and improve our coverage; and our colleagues at our respective institutions
and beyond who offered insight, encouragement and tolerance throughout the revi-
sion. We are indebted to all those who have contributed to the production process
at Pearson Education and elsewhere, especially Keith Mansfield, Anita Atkinson,
Lynette Miller, Sheila Chatten and Robert Chaundy.
Personal thanks must go to Fiona, Esther, Miriam, Rachel, Tina, Meghan, Aidan
and Blaise, who have all endured ‘The Book’ well beyond the call of duty and over
many years, and Bruno and ‘the girls’ who continue to make their own inimitable
contribution.
Finally we all owe huge thanks to Fiona for her continued deep personal support
and for tireless proofreading, checking of figures, and keeping us all moving. We
would never have got beyond the first edition without her.
The efforts of all of these have meant that the book is better than it would other-
wise have been. Where it could still be better, we take full responsibility.
PUBLISHER’S ACKNOWLEDGEMENTS
We are grateful to the following for permission to reproduce copyright material:
Figure p. 2, Figures 3.14, 3.15, 3.16 and 5.13 and Exercise 8.4 screen shots reprinted
by permission from Apple Computer, Inc.; Figure 2.11 reprinted by permission of
Keith Cheverst; Figure 3.13 from The WebBook and Web Forager: An information
workspace for the world-wide web in CHI Conference Proceedings, © 1996 ACM, Inc.,
reprinted by permission (Card, S. K., Robertson, G. G. and York, W. 1996); Figures
3.9, 3.19, 5.5, Chapter 14, Design Focus: Looking real – Avatar Conference screen
shots, Figures 21.3, 21.10, 21.11 screen shot frames reprinted by permission from
Microsoft Corporation; Tables 6.2 and 6.3 adapted from Usability engineering: our
experience and evolution in Handbook for Human–Computer Interaction edited by
M. Helander, Copyright 1988, with permission from Elsevier (Whiteside, J., Bennett,
J. and Holtzblatt, K. 1988); Figure 7.1 adapted from The alternate reality kit – an
animated environment for creating interactive simulations in Proceedings of
Workshop on Visual Languages, © 1986 IEEE, reprinted by permission of IEEE
(Smith, R. B. 1986); Figure 7.2 from Guidelines for designing user interface software
in MITRE Corporation Report MTR-9420, reprinted by permission of The MITRE
Corporation (Smith, S. L. and Mosier, J. N. 1986); Figure 7.3 reprinted by permis-
sion of Jenifer Tidwell; Figures 8.6 and 8.9 from Xview Programming Manual,
Volume 7 of The X Window System, reprinted by permission of O’Reilly and
Associates, Inc. (Heller, D. 1990); Figure 9.8 screen shot reprinted by permission of
Dr. R. D. Ward; Figure 10.2 after Earcons and icons: their structure and common
design principles in Human-Computer Interaction, 4(1), published and reprinted
by permission of Lawrence Erlbaum Associates, Inc. (Blattner, M., Sumikawa, D. and
Greenberg, R. 1989); Figure 10.5 reprinted by permission of Alistair D. N. Edwards;
Figure 10.7 reprinted by permission of Saul Greenberg; Figure 11.2 screen shot
reprinted by permission of Macromedia, Inc.; Table 12.1 adapted from The
Psychology of Human Computer Interaction, published and reprinted by permission
of Lawrence Erlbaum Associates, Inc. (Card, S. K., Moran, T. P. and Newell, A.
1983); Table 12.2 after Table in A comparison of input devices in elemental
pointing and dragging tasks in Reaching through technology – CHI’91 Conference
Proceedings, Human Factors in Computing Systems, April, edited by S. P. Robertson,
G. M. Olson and J. S. Olson, © 1991 ACM, Inc., reprinted by permission (Mackenzie,
I. S., Sellen, A. and Buxton, W. 1991); Figure 14.1 from Understanding Computers
and Cognition: A New Foundation for Design, published by Addison-Wesley,
reprinted by permission of Pearson Education, Inc. (Winograd, T. and Flores, F.
1986); Figure 14.5 from Theories of multi-party interaction. Technical report, Social
and Computer Sciences Research Group, University of Surrey and Queen Mary and
Westfield Colleges, University of London, reprinted by permission of Nigel Gilbert
(Hewitt, B., Gilbert, N., Jirotka, M. and Wilbur, S. 1990); Figure 14.6 from Dialogue
processes in computer-mediated communication: a study of letters in the com system.
Technical report, Linköping Studies in Arts and Sciences, reprinted by permission of
Kerstin Severinson Eklundh (Eklundh, K. S. 1986); Chapter 14, Design Focus:
Looking real – Avatar Conference, screen shots reprinted by permission of
AVATAR-Conference project team; Figure 16.17 screen shot reprinted by permis-
sion of Harold Thimbleby; Figure 17.5 based on Verifying the behaviour of virtual
world objects in DSV-IS 2000 Interactive Systems: Design, Specification and
Verification. LNCS 1946, edited by P. Palanque and F. Paternò, published and
reprinted by permission of Springer-Verlag GmbH & Co. KG (Willans, J. S. and
Harrison, M. D. 2001); Figure 18.4 icons reprinted by permission of Fabio Paternò;
Chapter 19, p.675 CuSeeMe screen shot reprinted by permission of Geoff Ellis;
Chapter 19, Design Focus: TOWER – workspace awareness, screen shots reprinted
by permission of Wolfgang Prinz; Figure 20.1 reprinted by permission of Mitsubishi
Electric Research Laboratories, Inc.; Figure 20.4 (right) reprinted by permission of
Sony Computer Science Laboratories, Inc; Figure 20.9 from Cone trees: animated 3D
visualisation of hierarchical information in Proceedings of the CHI’91 Conference on
Human Factors in Computing Systems, © 1991 ACM, Inc., reprinted by permission
(Robertson, G. G., Card, S. K., and Mackinlay, J. D. 1991); Figure 20.10 from
Lifelines: visualising personal histories in Proceedings of CHI’96, © 1996 ACM, Inc.,
reprinted by permission (Plaisant, C., Milash, B., Rose, A., Widoff, S. and
Shneiderman, B. 1996); Figure 20.11 from Browsing anatomical image databases: a
case study of the Visible Human in CHI’96 Conference Companion, © 1996 ACM,
Inc., reprinted by permission (North, C. and Korn, F. 1996); Figure 20.12 from
Externalising abstract mathematical models in Proceedings of CHI’96, © 1996 ACM,
Inc., reprinted by permission (Tweedie, L., Spence, R., Dawkes, H. and Su, H. 1996);
Figure 21.2 from The impact of Utility and Time on Distributed Information
Retrieval in People and Computers XII: Proceedings of HCI’97, edited by H.
Thimbleby, B. O’Conaill and P. Thomas, published and reprinted by permission
of Springer-Verlag GmbH & Co. KG (Johnson, C. W. 1997); Figure 21.4 screen
shot reprinted by permission of the Departments of Electronics and Computer
Science and History at the University of Southampton; Figure 21.6 Netscape browser
window © 2002 Netscape Communications Corporation. Used with permission.
Netscape has not authorized, sponsored, endorsed, or approved this publication and
is not responsible for its content.
We are grateful to the following for permission to reproduce photographs:
Chapter 1, p. 50, Popperfoto.com; Chapter 2, p. 65, PCD Maltron Ltd; Figure 2.2
Electrolux; Figures 2.6 and 19.6 photos courtesy of Douglas Engelbart and Bootstrap
Institute; Figure 2.8 (left) British Sky Broadcasting Limited; Figure 2.13 (bottom
right) Sony (UK) Ltd; Chapter 2, Design Focus: Feeling the Road, BMW AG;
Chapter 2, Design Focus: Smart-Its – making using sensors easy, Hans Gellersen;
Figures 4.1 (right) and 20.2 (left) Palo Alto Research Center; Figure 4.2 and 20.3
(left) François Guimbretière; Figure 4.3 (bottom left) Franklin Electronic Publishers;
Figure 5.2 (top plate and middle plate) Kingston Museum and Heritage Service,
(bottom plate) V&A Images, The Victoria and Albert Museum, London; Chapter 5,
Design Focus: Cultural probes, William W. Gaver, Anthony Boucher, Sarah
Pennington and Brendan Walker, Equator IRC, Royal College of Art; Chapter 6,
p. 245, from The 1984 Olympic Message System: a test of behavioural principles of
system design in Communications of the ACM, 30(9), © 1987 ACM, Inc., reprinted
by permission (Gould, J. D., Boies, S. J., Levy, S., Richards, J. T. and Schoonard, J.
1987); Figures 9.5 and 9.6 J. A. Renshaw; Figure 9.7 Dr. R. D. Ward; Figure 10.3
SensAble Technologies; Chapter 13, Design Focus: Tomorrow’s hospital – using
participatory design, Professor J. Artur Vale Serrano; Chapter 18, p. 650, Michael
Beigl; Chapter 19, p. 678, Steve Benford, The Mixed Reality Laboratory, University
of Nottingham; Chapter 19, Design Focus: SMS in action, Mark Rouncefield; Figure
20.2 (right) Ken Hinckley; Figure 20.3 (right) MIT Media Lab; Figure 20.4 (left)
from Interacting with paper on the digital desk in Communications of the ACM,
36(7), © 1993 ACM, Inc., reprinted by permission (Wellner, P. 1993); Chapter 20,
p. 726, Peter Phillips; Chapter 20, Design Focus: Ambient wood – augmenting the
physical, Yvonne Rogers; Chapter 20, Design Focus: Shared experience, Matthew
Chalmers.
We are grateful to the following for permission to reproduce text extracts:
Pearson Education, Inc. Publishing as Pearson Addison Wesley for an extract
adapted from Designing the User Interface: Strategies for Effective Human–Computer
Interaction 3/e by B. Shneiderman © 1998, Pearson Education, Inc; Perseus Books
Group for an extract adapted from The Design of Everyday Things by D. Norman,
1998; and Wiley Publishing, Inc. for extracts adapted from ‘Heuristic Evaluation’ by
Jakob Nielsen and Robert L. Mack published in Usability Inspection Methods © 1994
Wiley Publishing, Inc.; IEEE for permission to base Chapter 20 on ‘The human ex-
perience’ by Gregory Abowd, Elizabeth Mynatt and Tom Rodden which appeared
in IEEE Pervasive Computing Magazine, Special Inaugural Issue on Reaching for
Weiser’s Vision, Vol. 1, Issue 1, pp. 48–58, Jan–March 2002. © 2002 IEEE.
In some instances we have been unable to trace the owners of copyright material, and
we would appreciate any information that would enable us to do so.
INTRODUCTION
WHY HUMAN–COMPUTER INTERACTION?
In the first edition of this book we wrote the following:
This is the authors’ second attempt at writing this introduction. Our first attempt
fell victim to a design quirk coupled with an innocent, though weary and less than
attentive, user. The word-processing package we originally used to write this intro-
duction is menu based. Menu items are grouped to reflect their function. The ‘save’
and ‘delete’ options, both of which are correctly classified as file-level operations, are
consequently adjacent items in the menu. With a cursor controlled by a trackball it
is all too easy for the hand to slip, inadvertently selecting delete instead of save. Of
course, the delete option, being well thought out, pops up a confirmation box allow-
ing the user to cancel a mistaken command. Unfortunately, the save option produces
a very similar confirmation box – it was only as we hit the ‘Confirm’ button that we
noticed the word ‘delete’ at the top...
Happily this word processor no longer has a delete option in its menu, but unfortu-
nately, similar problems to this are still an all too common occurrence. Errors such
as these, resulting from poor design choices, happen every day. Perhaps they are not
catastrophic: after all nobody’s life is endangered nor is there environmental damage
(unless the designer happens to be nearby or you break something in frustration!).
However, when you lose several hours’ work with no written notes or backup and
a publisher’s deadline already a week past, ‘catastrophe’ is certainly the word that
springs to mind.
Why is it then that when computers are marketed as ‘user friendly’ and ‘easy to
use’, simple mistakes like this can still occur? Did the designer of the word processor
actually try to use it with the trackball, or was it just that she was so expert with the
system that the mistake never arose? We hazard a guess that no one tried to use it
when tired and under pressure. But these criticisms are not levied only on the design-
ers of traditional computer software. More and more, our everyday lives involve pro-
grammed devices that do not sit on our desk, and these devices are just as unusable.
Exactly how many VCR designers understand the universal difficulty people have
trying to set their machines to record a television program? Do car radio designers
actually think it is safe to use so many knobs and displays that the driver has to
divert attention away from the road completely in order to tune the radio or adjust
the volume?
Computers and related devices have to be designed with an understanding that
people with specific tasks in mind will want to use them in a way that is seamless with
respect to their everyday work. To do this, those who design these systems need to
know how to think in terms of the eventual users’ tasks and how to translate that
knowledge into an executable system. But there is a problem with trying to teach the
notion of designing computers for people. All designers are people and, most prob-
ably, they are users as well. Isn’t it therefore intuitive to design for the user? Why
does it need to be taught when we all know what a good interface looks like? As a
result, the study of human–computer interaction (HCI) tends to come late in the
designer’s training, if at all. The scenario with which we started shows that this is a
mistaken view; it is not at all intuitive or easy to design consistent, robust systems
that will cope with all manner of user carelessness.

DESIGN FOCUS
Things don’t change
It would be nice to think that problems like those described at the start of the Introduction would
never happen now. Think again! Look at the MacOS X ‘dock’ below. It is a fast launch point for applica-
tions; folders and files can be dragged there for instant access; and also, at the right-hand side, there
sits the trash can. Imagine what happens as you try to drag a file into one of the folders. If your finger
accidentally slips whilst the icon is over the trash can – oops!
Happily this is not quite as easy in reality as it looks in the screen shot, since the icons in the dock con-
stantly move around as you try to drag a file into it. This is to make room for the file in case you want
to place it in the dock. However, it means you have to concentrate very hard when dragging a file over
the dock. We assume this is not a deliberate feature, but it does have the beneficial side effect that
users are less likely to throw away a file by accident – whew!
In fact it is quite fun to watch a new user trying to throw away a file. The trash can keeps moving as if
it didn’t want the file in it. Experienced users evolve coping strategies. One user always drags files into
the trash from the right-hand side as then the icons in the dock don’t move around. So two lessons:
- designs don’t always get better
- but at least users are clever.
Screen shot reprinted by permission from Apple Computer, Inc.

The interface is not something
that can be plugged in at the last minute; its design should be developed integrally
with the rest of the system. It should not just present a ‘pretty face’, but should sup-
port the tasks that people actually want to do, and forgive the careless mistakes. We
therefore need to consider how HCI fits into the design process.
Designing usable systems is not simply a matter of altruism towards the eventual
user, or even marketing; it is increasingly a matter of law. National health and safety
standards constrain employers to provide their workforce with usable computer sys-
tems: not just safe but usable. For example, EC Directive 90/270/EEC, which has been
incorporated into member countries’ legislation, requires employers to ensure the
following when designing, selecting, commissioning or modifying software:
- that it is suitable for the task
- that it is easy to use and, where appropriate, adaptable to the user’s knowledge
and experience
- that it provides feedback on performance
- that it displays information in a format and at a pace that is adapted to the user
- that it conforms to the ‘principles of software ergonomics’.
Designers and employers can no longer afford to ignore the user.
WHAT IS HCI?
The term human–computer interaction has only been in widespread use since the early
1980s, but has its roots in more established disciplines. Systematic study of human
performance began in earnest at the beginning of the last century in factories, with
an emphasis on manual tasks. The Second World War provided the impetus for
studying the interaction between humans and machines, as each side strove to pro-
duce more effective weapons systems. This led to a wave of interest in the area among
researchers, and the formation of the Ergonomics Research Society in 1949. Tradi-
tionally, ergonomists have been concerned primarily with the physical characteristics
of machines and systems, and how these affect user performance. Human Factors
incorporates these issues, and more cognitive issues as well. The terms are often used
interchangeably, with Ergonomics being the preferred term in the United Kingdom
and Human Factors in the English-speaking parts of North America. Both of these
disciplines are concerned with user performance in the context of any system, whether
computer, mechanical or manual. As computer use became more widespread, an
increasing number of researchers specialized in studying the interaction between
people and computers, concerning themselves with the physical, psychological and
theoretical aspects of this process. This research originally went under the name man–
machine interaction, but this became human–computer interaction in recognition of
the particular interest in computers and the composition of the user population!
Another strand of research that has influenced the development of HCI is infor-
mation science and technology. Again the former is an old discipline, pre-dating the
introduction of technology, and is concerned with the management and manipulation
of information within an organization. The introduction of technology has had a
profound effect on the way that information can be stored, accessed and utilized
and, consequently, a significant effect on the organization and work environment.
Systems analysis has traditionally concerned itself with the influence of technology
in the workplace, and fitting the technology to the requirements and constraints of
the job. These issues are also the concern of HCI.
HCI draws on many disciplines, as we shall see, but it is in computer science and
systems design that it must be accepted as a central concern. For all the other discip-
lines it can be a specialism, albeit one that provides crucial input; for systems design
it is an essential part of the design process. From this perspective, HCI involves the
design, implementation and evaluation of interactive systems in the context of the
user’s task and work.
However, when we talk about human–computer interaction, we do not necessarily
envisage a single user with a desktop computer. By user we may mean an individual
user, a group of users working together, or a sequence of users in an organization,
each dealing with some part of the task or process. The user is whoever is trying to
get the job done using the technology. By computer we mean any technology ranging
from the general desktop computer to a large-scale computer system, a process
control system or an embedded system. The system may include non-computerized
parts, including other people. By interaction we mean any communication between
a user and computer, be it direct or indirect. Direct interaction involves a dialog
with feedback and control throughout performance of the task. Indirect interaction
may involve batch processing or intelligent sensors controlling the environment.
The important thing is that the user is interacting with the computer in order to
accomplish something.
WHO IS INVOLVED IN HCI?
HCI is undoubtedly a multi-disciplinary subject. The ideal designer of an interactive
system would have expertise in a range of topics: psychology and cognitive science
to give her knowledge of the user’s perceptual, cognitive and problem-solving
skills; ergonomics for the user’s physical capabilities; sociology to help her under-
stand the wider context of the interaction; computer science and engineering to
be able to build the necessary technology; business to be able to market it; graphic
design to produce an effective interface presentation; technical writing to produce
the manuals, and so it goes on. There is obviously too much expertise here to be held
by one person (or indeed four!), perhaps even too much for the average design team.
Indeed, although HCI is recognized as an interdisciplinary subject, in practice peo-
ple tend to take a strong stance on one side or another. However, it is not possible to
design effective interactive systems from one discipline in isolation. Input is needed
from all sides. For example, a beautifully designed graphic display may be unusable
if it ignores dialog constraints or the psychological limitations of the user.
In this book we want to encourage the multi-disciplinary view of HCI but we too
have our ‘stance’, as computer scientists. We are interested in answering a particular
question. How do principles and methods from each of these contributing dis-
ciplines in HCI help us to design better systems? In this we must be pragmatists
rather than theorists: we want to know how to apply the theory to the problem
rather than just acquire a deep understanding of the theory. Our goal, then, is to be
multi-disciplinary but practical. We concentrate particularly on computer science,
psychology and cognitive science as core subjects, and on their application to design;
other disciplines are consulted to provide input where relevant.
THEORY AND HCI
Unfortunately for us, there is no general and unified theory of HCI that we can
present. Indeed, it may be impossible ever to derive one; it is certainly out of our
reach today. However, there is an underlying principle that forms the basis of our
own views on HCI, and it is captured in our claim that people use computers to
accomplish work. This outlines the three major issues of concern: the people, the
computers and the tasks that are performed. The system must support the user’s
task, which gives us a fourth focus, usability: if the system forces the user to adopt an
unacceptable mode of work then it is not usable.
There are, however, those who would dismiss our concentration on the task,
saying that we do not even know enough about a theory of human tasks to support
them in design. There is a good argument here (to which we return in Chapter 15).
However, we can live with this confusion about what real tasks are because our
understanding of tasks at the moment is sufficient to give us direction in design. The
user’s current tasks are studied and then supported by computers, which can in
turn affect the nature of the original task and cause it to evolve. To illustrate, word
processing has made it easy to manipulate paragraphs and reorder documents,
allowing writers a completely new freedom that has affected writing styles. No longer
is it vital to plan and construct text in an ordered fashion, since free-flowing prose
can easily be restructured at a later date. This evolution of task in turn affects the
design of the ideal system. However, we see this evolution as providing a motivating
force behind the system development cycle, rather than a refutation of the whole idea
of supportive design.
This word ‘task’ or the focus on accomplishing ‘work’ is also problematic when we
think of areas such as domestic appliances, consumer electronics and e-commerce.
There are three ‘use’ words that must all be true for a product to be successful; it
must be:
useful – accomplish what is required: play music, cook dinner, format a document;
usable – do it easily and naturally, without danger of error, etc.;
used – make people want to use it, be attractive, engaging, fun, etc.
The last of these has not been a major factor until recently in HCI, but issues of
motivation, enjoyment and experience are increasingly important. We are certainly
even further from having a unified theory of experience than of task.
The question of whether HCI, or more importantly the design of interactive sys-
tems and the user interface in particular, is a science or a craft discipline is an inter-
esting one. Does it involve artistic skill and fortuitous insight or reasoned methodical
science? Here we can draw an analogy with architecture. The most impressive struc-
tures, the most beautiful buildings, the innovative and imaginative creations that
provide aesthetic pleasure, all require inventive inspiration in design and a sense of
artistry, and in this sense the discipline is a craft. However, these structures also have
to be able to stand up to fulfill their purpose successfully, and to be able to do this
the architect has to use science. So it is for HCI: beautiful and/or novel interfaces are
artistically pleasing and capable of fulfilling the tasks required – a marriage of art and
science into a successful whole. We want to reuse lessons learned from the past about
how to achieve good results and avoid bad ones. For this we require both craft and
science. Innovative ideas lead to more usable systems, but in order to maximize the
potential benefit from the ideas, we need to understand not only that they work, but
how and why they work. This scientific rationalization allows us to reuse related con-
cepts in similar situations, in much the same way that architects can produce a bridge
and know that it will stand, since it is based upon tried and tested principles.
The craft–science tension becomes even more difficult when we consider novel
systems. Their increasing complexity means that our personal ideas of good and bad
are no longer enough; for a complex system to be well designed we need to rely on
something more than simply our intuition. Designers may be able to think about
how one user would want to act, but how about groups? And what about new media?
Our ideas of how best to share workloads or present video information are open to
debate and question even in non-computing situations, and the incorporation of one
version of good design into a computer system is quite likely to be unlike anyone
else’s version. Different people work in different ways, whilst different media color
the nature of the interaction; both can dramatically change the very nature of the
original task. In order to assist designers, it is unrealistic to assume that they can rely
on artistic skill and perfect insight to develop usable systems. Instead we have to pro-
vide them with an understanding of the concepts involved, a scientific view of the
reasons why certain things are successful whilst others are not, and then allow their
creative nature to feed off this information: creative flow, underpinned with science;
or maybe scientific method, accelerated by artistic insight. The truth is that HCI is
required to be both a craft and a science in order to be successful.
HCI IN THE CURRICULUM
If HCI involves both craft and science then it must, in part at least, be taught.
Imagination and skill may be qualities innate in the designer or developed through
experience, but the underlying theory must be learned. In the past, when computers
were used primarily by expert specialists, concentration on the interface was a lux-
ury that was often relinquished. Now designers cannot afford to ignore the interface
in favour of the functionality of their systems: the two are too closely intertwined. If
the interface is poor, the functionality is obscured; if it is well designed, it will allow
the system’s functionality to support the user’s task.
Increasingly, therefore, computer science educators cannot afford to ignore HCI.
We would go as far as to claim that HCI should be integrated into every computer
science or software engineering course, either as a recurring feature of other modules
or, preferably, as a module itself. It should not be viewed as an ‘optional extra’
(although, of course, more advanced HCI options can complement a basic core
course). This view is shared by the ACM SIGCHI curriculum development group,
who propose a curriculum for such a core course [9]. The topics included in this
book, although developed without reference to this curriculum, cover the main
emphases of it, and include enough detail and coverage to support specialized
options as well.
In courses other than computer science, HCI may well be an option specializing
in a particular area, such as cognitive modeling or task analysis. Selected use of the
relevant chapters of this book can also support such a course.
HCI must be taken seriously by designers and educators if the requirement for
additional complexity in the system is to be matched by increased clarity and usabil-
ity in the interface. In this book we demonstrate how this can be done in practice.
DESIGN FOCUS
Quick fixes
You should expect to spend both time and money on interface design, just as you would with other
parts of a system. So in one sense there are no quick fixes. However, a few simple steps can make a
dramatic improvement.
Think ‘user’
Probably 90% of the value of any interface design technique is that it forces the designer to remember
that someone (and in particular someone else) will use the system under construction.
Try it out
Of course, many designers will build a system that they find easy and pleasant to use, and they find
it incomprehensible that anyone else could have trouble with it. Simply sitting someone down with
an early version of an interface (without the designer prompting them at each step!) is enormously
valuable. Professional usability laboratories will have video equipment, one-way mirrors and other
sophisticated monitors, but a notebook and pencil and a home-video camera will suffice (more about
evaluation in Chapter 9).
Involve the users
Where possible, the eventual users should be involved in the design process. They have vital know-
ledge and will soon find flaws. A mechanical syringe was once being developed and a prototype was
demonstrated to hospital staff. Happily they quickly noticed the potentially fatal flaw in its interface.
The doses were entered via a numeric keypad: an accidental keypress and the dose could be out by a
factor of 10! The production version had individual increment/decrement buttons for each digit (more
about participatory design in Chapter 13).
Iterate
People are complicated, so you won’t get it right first time. Programming an interface can be a very
difficult and time-consuming business. So, the result becomes precious and the builder will want
to defend it and minimize changes. Making early prototypes less precious and easier to throw away is
crucial. Happily there are now many interface builder tools that aid this process. For example, mock-
ups can be quickly constructed using HyperCard on the Apple Macintosh or Visual Basic on the PC.
For visual and layout decisions, paper designs and simple models can be used (more about iterative
design in Chapter 5).
Figure 0.1 Automatic syringe: setting the dose to 1372. The effect of one key slip before and after
user involvement
PART 1
FOUNDATIONS
In this part we introduce the fundamental components of
an interactive system: the human user, the computer system
itself and the nature of the interactive process. We then
present a view of the history of interactive systems by look-
ing at key interaction paradigms that have been significant.
Chapter 1 discusses the psychological and physiological
attributes of the user, providing us with a basic overview of
the capabilities and limitations that affect our ability to use
computer systems. It is only when we have an understand-
ing of the user at this level that we can understand what
makes for successful designs. Chapter 2 considers the
computer in a similar way. Input and output devices are
described and explained and the effect that their individual
characteristics have on the interaction highlighted. The
computational power and memory of the computer is
another important component in determining what can be
achieved in the interaction, whilst due attention is also paid
to paper output since this forms one of the major uses
of computers and users’ tasks today. Having approached
interaction from both the human and the computer side,
we then turn our attention to the dialog between them
in Chapter 3, where we look at models of interaction. In
Chapter 4 we take a historical perspective on the evolution
of interactive systems and how they have increased the
usability of computers in general.
1
THE HUMAN
OVERVIEW
• Humans are limited in their capacity to process
information. This has important implications for design.
• Information is received and responses given via a
number of input and output channels:
– visual channel
– auditory channel
– haptic channel
– movement.
• Information is stored in memory:
– sensory memory
– short-term (working) memory
– long-term memory.
• Information is processed and applied:
– reasoning
– problem solving
– skill acquisition
– error.
• Emotion influences human capabilities.
• Users share common capabilities but are individuals
with differences, which should not be ignored.
1.1 INTRODUCTION
This chapter is the first of four in which we introduce some of the ‘foundations’ of
HCI. We start with the human, the central character in any discussion of interactive
systems. The human, the user, is, after all, the one whom computer systems are de-
signed to assist. The requirements of the user should therefore be our first priority.
In this chapter we will look at areas of human psychology coming under the general
banner of cognitive psychology. This may seem a far cry from designing and building
interactive computer systems, but it is not. In order to design something for some-
one, we need to understand their capabilities and limitations. We need to know if
there are things that they will find difficult or, even, impossible. It will also help us to
know what people find easy and how we can help them by encouraging these things.
We will look at aspects of cognitive psychology which have a bearing on the use of com-
puter systems: how humans perceive the world around them, how they store and
process information and solve problems, and how they physically manipulate objects.
We have already said that we will restrict our study to those things that are relev-
ant to HCI. One way to structure this discussion is to think of the user in a way that
highlights these aspects. In other words, to think of a simplified model of what is
actually going on. Many models have been proposed and it is useful to consider one of
the most influential in passing, to understand the context of the discussion that is to
follow. In 1983, Card, Moran and Newell [56] described the Model Human Processor,
which is a simplified view of the human processing involved in interacting with
computer systems. The model comprises three subsystems: the perceptual system,
handling sensory stimulus from the outside world, the motor system, which controls
actions, and the cognitive system, which provides the processing needed to connect
the two. Each of these subsystems has its own processor and memory, although
obviously the complexity of these varies depending on the complexity of the tasks
the subsystem has to perform. The model also includes a number of principles of
operation which dictate the behavior of the systems under certain conditions.
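The structure just described can be sketched in code. The cycle times below are the typical values commonly quoted for Card, Moran and Newell's three processors (each in fact has a wide empirical range), and the simple reaction-time calculation is only an illustration of how the model is used, not part of its formal definition:

```python
# Typical processor cycle times from the Model Human Processor
# (Card, Moran and Newell, 1983). Each value is a rough midpoint
# of a wide empirical range, so treat them as illustrative.
PERCEPTUAL_CYCLE_MS = 100  # perceptual system: sensing a stimulus
COGNITIVE_CYCLE_MS = 70    # cognitive system: selecting a response
MOTOR_CYCLE_MS = 70        # motor system: executing the action

def simple_reaction_time_ms(cognitive_cycles: int = 1) -> int:
    """Estimate the time to respond to a single expected stimulus:
    one perceptual cycle, one or more cognitive cycles, one motor cycle."""
    return (PERCEPTUAL_CYCLE_MS
            + cognitive_cycles * COGNITIVE_CYCLE_MS
            + MOTOR_CYCLE_MS)

# Pressing a key as soon as a light comes on: about 240 ms.
print(simple_reaction_time_ms())
```

With these figures, a reaction that needs more deliberation simply adds further cognitive cycles.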
We will use the analogy of the user as an information processing system, but in
our model make the analogy closer to that of a conventional computer system.
Information comes in, is stored and processed, and information is passed out. We
will therefore discuss three components of this system: input–output, memory and
processing. In the human, we are dealing with an intelligent information-processing
system, and processing therefore includes problem solving, learning, and, con-
sequently, making mistakes. This model is obviously a simplification of the real
situation, since memory and processing are required at all levels, as we have seen in
the Model Human Processor. However, it is convenient as a way of grasping how
information is handled by the human system. The human, unlike the computer, is
also influenced by external factors such as the social and organizational environ-
ment, and we need to be aware of these influences as well. We will ignore such
factors for now and concentrate on the human’s information processing capabilities
only. We will return to social and organizational influences in Chapter 3 and, in
more detail, in Chapter 13.
In this chapter, we will first look at the human’s input–output channels, the senses
and responders or effectors. This will involve some low-level processing. Secondly,
we will consider human memory and how it works. We will then think about how
humans perform complex problem solving, how they learn and acquire skills, and
why they make mistakes. Finally, we will discuss how these things can help us in the
design of computer systems.
1.2 INPUT–OUTPUT CHANNELS
A person’s interaction with the outside world occurs through information being
received and sent: input and output. In an interaction with a computer the user
receives information that is output by the computer, and responds by providing
input to the computer – the user’s output becomes the computer’s input and vice
versa. Consequently the use of the terms input and output may lead to confusion so
we shall blur the distinction somewhat and concentrate on the channels involved.
This blurring is appropriate since, although a particular channel may have a primary
role as input or output in the interaction, it is more than likely that it is also used in
the other role. For example, sight may be used primarily in receiving information
from the computer, but it can also be used to provide information to the computer,
for example by fixating on a particular screen point when using an eyegaze system.
Input in the human occurs mainly through the senses and output through the
motor control of the effectors. There are five major senses: sight, hearing, touch, taste
and smell. Of these, the first three are the most important to HCI. Taste and smell
do not currently play a significant role in HCI, and it is not clear whether they could
be exploited at all in general computer systems, although they could have a role to
play in more specialized systems (smells to give warning of malfunction, for example)
or in augmented reality systems. However, vision, hearing and touch are central.
Similarly there are a number of effectors, including the limbs, fingers, eyes, head
and vocal system. In the interaction with the computer, the fingers play the primary
role, through typing or mouse control, with some use of voice, and eye, head and
body position.
Imagine using a personal computer (PC) with a mouse and a keyboard. The appli-
cation you are using has a graphical interface, with menus, icons and windows. In
your interaction with this system you receive information primarily by sight, from
what appears on the screen. However, you may also receive information by ear: for
example, the computer may ‘beep’ at you if you make a mistake or to draw attention
to something, or there may be a voice commentary in a multimedia presentation.
Touch plays a part too in that you will feel the keys moving (also hearing the ‘click’)
or the orientation of the mouse, which provides vital feedback about what you have
done. You yourself send information to the computer using your hands, either
by hitting keys or moving the mouse. Sight and hearing do not play a direct role
in sending information in this example, although they may be used to receive
information from a third source (for example, a book, or the words of another per-
son) which is then transmitted to the computer.
In this section we will look at the main elements of such an interaction, first con-
sidering the role and limitations of the three primary senses and going on to consider
motor control.
1.2.1 Vision
Human vision is a highly complex activity with a range of physical and perceptual
limitations, yet it is the primary source of information for the average person.
We can roughly divide visual perception into two stages: the physical reception of
the stimulus from the outside world, and the processing and interpretation of that
stimulus. On the one hand the physical properties of the eye and the visual system
mean that there are certain things that cannot be seen by the human; on the other
the interpretative capabilities of visual processing allow images to be constructed
from incomplete information. We need to understand both stages as both influence
what can and cannot be perceived visually by a human being, which in turn directly
affects the way that we design computer systems. We will begin by looking at the
eye as a physical receptor, and then go on to consider the processing involved in
basic vision.
The human eye
Vision begins with light. The eye is a mechanism for receiving light and transform-
ing it into electrical energy. Light is reflected from objects in the world and their
image is focussed upside down on the back of the eye. The receptors in the eye
transform it into electrical signals which are passed to the brain.
The eye has a number of important components (see Figure 1.1) which we will
look at in more detail. The cornea and lens at the front of the eye focus the light into
a sharp image on the back of the eye, the retina. The retina is light sensitive and con-
tains two types of photoreceptor: rods and cones.
Rods are highly sensitive to light and therefore allow us to see under a low level of
illumination. However, they are unable to resolve fine detail and are subject to light
saturation. This is the reason for the temporary blindness we get when moving from
a darkened room into sunlight: the rods have been active and are saturated by the
sudden light. The cones do not operate either as they are suppressed by the rods. We
are therefore temporarily unable to see at all. There are approximately 120 million
rods per eye which are mainly situated towards the edges of the retina. Rods there-
fore dominate peripheral vision.
Cones are the second type of receptor in the eye. They are less sensitive to light
than the rods and can therefore tolerate more light. There are three types of cone,
each sensitive to a different wavelength of light. This allows color vision. The eye has
approximately 6 million cones, mainly concentrated on the fovea, a small area of the
retina on which images are fixated.
Although the retina is mainly covered with photoreceptors there is one blind spot
where the optic nerve enters the eye. The blind spot has no rods or cones, yet our visual
system compensates for this so that in normal circumstances we are unaware of it.
The retina also has specialized nerve cells called ganglion cells. There are two types:
X-cells, which are concentrated in the fovea and are responsible for the early detec-
tion of pattern; and Y-cells which are more widely distributed in the retina and are
responsible for the early detection of movement. The distribution of these cells
means that, while we may not be able to detect changes in pattern in peripheral
vision, we can perceive movement.
Visual perception
Understanding the basic construction of the eye goes some way to explaining the
physical mechanisms of vision but visual perception is more than this. The informa-
tion received by the visual apparatus must be filtered and passed to processing ele-
ments which allow us to recognize coherent scenes, disambiguate relative distances
and differentiate color. We will consider some of the capabilities and limitations of
visual processing later, but first we will look a little more closely at how we perceive
size and depth, brightness and color, each of which is crucial to the design of effective
visual interfaces.
Figure 1.1 The human eye
Perceiving size and depth Imagine you are standing on a hilltop. Beside you on the
summit you can see rocks, sheep and a small tree. On the hillside is a farmhouse with
outbuildings and farm vehicles. Someone is on the track, walking toward the
summit. Below in the valley is a small market town.
Even in describing such a scene the notions of size and distance predominate. Our
visual system is easily able to interpret the images which it receives to take account
of these things. We can identify similar objects regardless of the fact that they appear
to us to be of vastly different sizes. In fact, we can use this information to judge
distances.
So how does the eye perceive size, depth and relative distances? To understand this
we must consider how the image appears on the retina. As we noted in the previous
section, reflected light from the object forms an upside-down image on the retina.
The size of that image is specified as a visual angle. Figure 1.2 illustrates how the
visual angle is calculated.
If we were to draw a line from the top of the object to a central point on the front
of the eye and a second line from the bottom of the object to the same point, the
visual angle of the object is the angle between these two lines. Visual angle is affected
by both the size of the object and its distance from the eye. Therefore if two objects
are at the same distance, the larger one will have the larger visual angle. Similarly,
if two objects of the same size are placed at different distances from the eye, the
DESIGN FOCUS
Getting noticed
The extensive knowledge about the human visual system can be brought to bear in practical design. For
example, our ability to read or distinguish falls off inversely as the distance from our point of focus
increases. This is due to the fact that the cones are packed more densely towards the center of our
visual field. You can see this in the following image. Fixate on the dot in the center. The letters on the
left should all be equally readable; those on the right should become progressively harder to read.
This loss of discrimination sets limits on the amount that can be seen or read without moving one’s
eyes. A user concentrating on the middle of the screen cannot be expected to read help text on the
bottom line.
However, although our ability to discriminate static text diminishes, the rods, which are concentrated
more in the outer parts of our visual field, are very sensitive to changes; hence we see movement well
at the edge of our vision. So if you want a user to see an error message at the bottom of the screen it
had better be flashing! On the other hand clever moving icons, however impressive they are, will be
distracting even when the user is not looking directly at them.
furthest one will have the smaller visual angle. The visual angle indicates how much
of the field of view is taken by the object. The visual angle measurement is given in
either degrees or minutes of arc, where 1 degree is equivalent to 60 minutes of arc,
and 1 minute of arc to 60 seconds of arc.
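The geometry described above is straightforward to compute. The following sketch derives the visual angle from an object's size and distance and converts degrees to minutes of arc; the sizes and distances used are illustrative examples, not figures from the text:

```python
import math

def visual_angle_degrees(size: float, distance: float) -> float:
    """Visual angle subtended by an object of the given size viewed
    face-on at the given distance (same units for both). Lines from
    the top and bottom of the object to the eye meet at an angle of
    2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

def minutes_of_arc(degrees: float) -> float:
    return degrees * 60  # 1 degree = 60 minutes of arc

# A 1 cm character viewed from 60 cm subtends just under 1 degree;
# doubling the viewing distance roughly halves the visual angle.
near = visual_angle_degrees(1.0, 60.0)
far = visual_angle_degrees(1.0, 120.0)
```

As the text notes, for two objects of the same size the nearer one always has the larger visual angle.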
So how does an object’s visual angle affect our perception of its size? First, if
the visual angle of an object is too small we will be unable to perceive it at all. Visual
acuity is the ability of a person to perceive fine detail. A number of measurements
have been established to test visual acuity, most of which are included in standard
eye tests. For example, a person with normal vision can detect a single line if it has a
visual angle of 0.5 seconds of arc. Spaces between lines can be detected at 30 seconds
to 1 minute of visual arc. These represent the limits of human visual acuity.
Assuming that we can perceive the object, does its visual angle affect our per-
ception of its size? Given that the visual angle of an object is reduced as it gets
further away, we might expect that we would perceive the object as smaller. In fact,
our perception of an object’s size remains constant even if its visual angle changes.
So a person’s height is perceived as constant even if they move further from you.
This is the law of size constancy, and it indicates that our perception of size relies on
factors other than the visual angle.
One of these factors is our perception of depth. If we return to the hilltop scene
there are a number of cues which we can use to determine the relative positions and
distances of the objects which we see. If objects overlap, the object which is partially
covered is perceived to be in the background, and therefore further away. Similarly,
the size and height of the object in our field of view provides a cue to its distance.
Figure 1.2 Visual angle
A third cue is familiarity: if we expect an object to be of a certain size then we can
judge its distance accordingly. This has been exploited for humour in advertising:
one advertisement for beer shows a man walking away from a bottle in the fore-
ground. As he walks, he bumps into the bottle, which is in fact a giant one in the
background!
Perceiving brightness A second aspect of visual perception is the perception of
brightness. Brightness is in fact a subjective reaction to levels of light. It is affected by
luminance which is the amount of light emitted by an object. The luminance of an
object is dependent on the amount of light falling on the object’s surface and its
reflective properties. Luminance is a physical characteristic and can be measured
using a photometer. Contrast is related to luminance: it is a function of the luminance
of an object and the luminance of its background.
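Contrast has several competing definitions; one common choice, Michelson contrast, expresses the luminance difference between an object and its background as a fraction of their sum. The formula choice here is ours, not the book's, and the luminance values are illustrative:

```python
def michelson_contrast(l_object: float, l_background: float) -> float:
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin).
    Ranges from 0 (object and background identical) to 1
    (one of the two luminances is zero)."""
    l_max = max(l_object, l_background)
    l_min = min(l_object, l_background)
    return (l_max - l_min) / (l_max + l_min)

# Dark text (5 cd/m^2) on a bright background (100 cd/m^2):
c = michelson_contrast(5.0, 100.0)  # about 0.90
```

Note that the measure is symmetric: dark text on a light background and light text on a dark background of the same two luminances give the same contrast.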
Although brightness is a subjective response, it can be described in terms of the
amount of luminance that gives a just noticeable difference in brightness. However,
the visual system itself also compensates for changes in brightness. In dim lighting,
the rods predominate. Since there are fewer rods on the fovea, objects in low
lighting can be seen less easily when fixated upon, and are more visible in peripheral
vision. In normal lighting, the cones take over.
Visual acuity increases with increased luminance. This may be an argument
for using high display luminance. However, as luminance increases, flicker also
increases. The eye perceives a light that is switched on and off rapidly as constantly
on, but if the switching rate is below about 50 Hz the light is perceived to
flicker. At high luminance, flicker can be perceived at over 50 Hz. Flicker is also
more noticeable in peripheral vision. This means that the larger the display (and
consequently the more peripheral vision that it occupies), the more it will appear
to flicker.
Perceiving color A third factor that we need to consider is perception of color.
Color is usually regarded as being made up of three components: hue, intensity and
saturation. Hue is determined by the spectral wavelength of the light. Blues have short
wavelengths, greens medium and reds long. Approximately 150 different hues can be
discriminated by the average person. Intensity is the brightness of the color, and
saturation is the amount of whiteness in the color. By varying these two, we can
perceive in the region of 7 million different colors. However, the number of colors
that can be identified by an individual without training is far fewer (in the region
of 10).
The eye perceives color because the cones are sensitive to light of different wave-
lengths. There are three different types of cone, each sensitive to a different color
(blue, green and red). Color vision is best in the fovea, and worst at the periphery
where rods predominate. It should also be noted that only 3–4% of the fovea is
occupied by cones which are sensitive to blue light, making blue acuity lower.
Finally, we should remember that around 8% of males and 1% of females suffer
from color blindness, most commonly being unable to discriminate between red and
green.
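The hue–intensity–saturation description above corresponds closely to the HSV (hue, saturation, value) model used in computer graphics, where ‘value’ plays the role of intensity and saturation falls as whiteness is mixed in. Python's standard colorsys module performs the conversion; the colors chosen are simply examples:

```python
import colorsys

# Pure red: hue 0.0 (the red end of the scale), fully saturated,
# full value.
hue, sat, val = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)

# Mixing in white (giving a pink) keeps the hue and value but lowers
# the saturation - in the book's terms, the amount of whiteness rises.
hue2, sat2, val2 = colorsys.rgb_to_hsv(1.0, 0.5, 0.5)

# Dimming the pure red instead lowers the value (intensity) while
# hue and saturation are unchanged.
hue3, sat3, val3 = colorsys.rgb_to_hsv(0.5, 0.0, 0.0)
```

Varying saturation and value independently of hue in this way is what yields the millions of distinguishable colors mentioned above.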
The capabilities and limitations of visual processing
In considering the way in which we perceive images we have already encountered
some of the capabilities and limitations of the human visual processing system.
However, we have concentrated largely on low-level perception. Visual processing
involves the transformation and interpretation of a complete image, from the light
that is thrown onto the retina. As we have already noted, our expectations affect the
way an image is perceived. For example, if we know that an object is a particular size,
we will perceive it as that size no matter how far it is from us.
Visual processing compensates for the movement of the image on the retina
which occurs as we move around and as the object which we see moves. Although
the retinal image is moving, the image that we perceive is stable. Similarly, color and
brightness of objects are perceived as constant, in spite of changes in luminance.
This ability to interpret and exploit our expectations can be used to resolve ambi-
guity. For example, consider the image shown in Figure 1.3. What do you perceive?
Now consider Figure 1.4 and Figure 1.5. The context in which the object appears
allows our expectations to clearly disambiguate the interpretation of the object, as
either a B or a 13.

Figure 1.3 An ambiguous shape?
Figure 1.4 ABC

20 Chapter 1 ■ The human
However, it can also create optical illusions. For example, consider Figure 1.6.
Which line is longer? Most people when presented with this will say that the top
line is longer than the bottom. In fact, the two lines are the same length. This may be
due to a false application of the law of size constancy: the top line appears like a
concave edge, the bottom like a convex edge. The former therefore seems further away
than the latter and so is scaled to appear larger. A similar illusion is the Ponzo
illusion (Figure 1.7). Here the top line appears longer, owing to the distance effect,
although both lines are the same length. These illusions demonstrate that our per-
ception of size is not completely reliable.
Another illusion created by our expectations compensating for what we see is the
proof-reading illusion. Read the text in Figure 1.8 quickly. What does it say? Most
people reading this rapidly will read it correctly, although closer inspection shows
that the word ‘the’ is repeated in the second and third lines.
These are just a few examples of how the visual system compensates, and some-
times overcompensates, to allow us to perceive the world around us.
Figure 1.5 12 13 14
Figure 1.6 The Muller–Lyer illusion – which line is longer?
Figure 1.7 The Ponzo illusion – are these the same size?
Figure 1.8 Is this text correct?
Reading
So far we have concentrated on the perception of images in general. However,
the perception and processing of text is a special case that is important to interface
design, which invariably requires some textual display. We will therefore end
this section by looking at reading. There are several stages in the reading process.
First, the visual pattern of the word on the page is perceived. It is then decoded
with reference to an internal representation of language. The final stages of lan-
guage processing include syntactic and semantic analysis and operate on phrases or
sentences.
We are most concerned with the first two stages of this process and how they
influence interface design. During reading, the eye makes jerky movements called
saccades followed by fixations. Perception occurs during the fixation periods, which
account for approximately 94% of the time elapsed. The eye moves backwards over
the text as well as forwards, in what are known as regressions. If the text is complex
there will be more regressions.
Adults read approximately 250 words a minute. It is unlikely that words are
scanned serially, character by character, since experiments have shown that words can
be recognized as quickly as single characters. Instead, familiar words are recognized
using word shape. This means that removing the word shape clues (for example, by
capitalizing words) is detrimental to reading speed and accuracy.
The speed at which text can be read is a measure of its legibility. Experiments have
shown that standard font sizes of 9 to 12 points are equally legible, given pro-
portional spacing between lines [346]. Similarly line lengths of between 2.3 and 5.2
inches (58 and 132 mm) are equally legible. However, there is evidence that reading
from a computer screen is slower than from a book [244]. This is thought to be
due to a number of factors including a longer line length, fewer words to a page,
orientation and the familiarity of the medium of the page. These factors can of
course be reduced by careful design of textual interfaces.

DESIGN FOCUS
Where’s the middle?

Optical illusions highlight the differences between the way things are and the way we perceive them –
and in interface design we need to be aware that we will not always perceive things exactly as they are.
The way that objects are composed together will affect the way we perceive them, and we do not
perceive geometric shapes exactly as they are drawn. For example, we tend to magnify horizontal lines
and reduce vertical ones, so a square needs to be slightly increased in height to appear square and lines
will appear thicker if horizontal rather than vertical.

Optical illusions also affect page symmetry. We tend to see the center of a page as being a little above
the actual center – so if a page is arranged symmetrically around the actual center, we will see it as too
low down. In graphic design this is known as the optical center – and bottom page margins tend to be
increased by 50% to compensate.
A final word about the use of contrast in visual display: a negative contrast (dark
characters on a light screen) provides higher luminance and, therefore, increased
acuity, than a positive contrast. This will in turn increase legibility. However, it will
also be more prone to flicker. Experimental evidence suggests that in practice negat-
ive contrast displays are preferred and result in more accurate performance [30].
1.2.2 Hearing
The sense of hearing is often considered secondary to sight, but we tend to under-
estimate the amount of information that we receive through our ears. Close your eyes
for a moment and listen. What sounds can you hear? Where are they coming from?
What is making them? As I sit at my desk I can hear cars passing on the road outside,
machinery working on a site nearby, the drone of a plane overhead and bird song.
But I can also tell where the sounds are coming from, and estimate how far away they
are. So from the sounds I hear I can tell that a car is passing on a particular road near
my house, and which direction it is traveling in. I know that building work is in
progress in a particular location, and that a certain type of bird is perched in the tree
in my garden.
The auditory system can convey a lot of information about our environment. But
how does it work?
The human ear
Just as vision begins with light, hearing begins with vibrations in the air or sound
waves. The ear receives these vibrations and transmits them, through various stages,
to the auditory nerves. The ear comprises three sections, commonly known as the
outer ear, middle ear and inner ear.
The outer ear is the visible part of the ear. It has two parts: the pinna, which is
the structure that is attached to the sides of the head, and the auditory canal, along
which sound waves are passed to the middle ear. The outer ear serves two purposes.
First, it protects the sensitive middle ear from damage. The auditory canal contains
wax which prevents dust, dirt and over-inquisitive insects reaching the middle ear.
It also maintains the middle ear at a constant temperature. Secondly, the pinna and
auditory canal serve to amplify some sounds.
The middle ear is a small cavity connected to the outer ear by the tympanic
membrane, or ear drum, and to the inner ear by the oval window. Within the cavity are the
ossicles, the smallest bones in the body. Sound waves pass along the auditory canal
and vibrate the ear drum which in turn vibrates the ossicles, which transmit the
vibrations to the cochlea, and so into the inner ear. This ‘relay’ is required because,
unlike the air-filled outer and middle ears, the inner ear is filled with a denser
cochlear liquid. If passed directly from the air to the liquid, the transmission of the
sound waves would be poor. By transmitting them via the ossicles the sound waves
are concentrated and amplified.
The waves are passed into the liquid-filled cochlea in the inner ear. Within
the cochlea are delicate hair cells or cilia that bend because of the vibrations in the
cochlear liquid and release a chemical transmitter which causes impulses in the
auditory nerve.
Processing sound
As we have seen, sound is changes or vibrations in air pressure. It has a number of
characteristics which we can differentiate. Pitch is the frequency of the sound. A low
frequency produces a low pitch; a high frequency, a high pitch. Loudness is
proportional to the amplitude of the sound; the frequency remains constant. Timbre relates
to the type of the sound: sounds may have the same pitch and loudness but be made
by different instruments and so vary in timbre. We can also identify a sound’s loca-
tion, since the two ears receive slightly different sounds, owing to the time difference
between the sound reaching the two ears and the reduction in intensity caused by the
sound waves reflecting from the head.
The human ear can hear frequencies from about 20 Hz to 15 kHz. It can distin-
guish frequency changes of less than 1.5 Hz at low frequencies but is less accurate at
high frequencies. Different frequencies trigger activity in neurons in different parts
of the auditory system, and cause different rates of firing of nerve impulses.
The auditory system performs some filtering of the sounds received, allowing us
to ignore background noise and concentrate on important information. We are
selective in our hearing, as illustrated by the cocktail party effect, where we can pick
out our name spoken across a crowded noisy room. However, if sounds are too loud,
or frequencies too similar, we are unable to differentiate sound.
As we have seen, sound can convey a remarkable amount of information. It is
rarely used to its potential in interface design, usually being confined to warning
sounds and notifications. The exception is multimedia, which may include music,
voice commentary and sound effects. However, the ear can differentiate quite subtle
sound changes and can recognize familiar sounds without concentrating attention
on the sound source. This suggests that sound could be used more extensively in
interface design, to convey information about the system state, for example. This is
discussed in more detail in Chapter 10.
Worked exercise: Suggest ideas for an interface which uses the properties of sound effectively.

Answer: You might approach this exercise by considering how sound could be added to an
application with which you are familiar. Use your imagination. This is also a good subject for
a literature survey (starting with the references in Chapter 10).
Speech sounds can obviously be used to convey information. This is useful not only for
the visually impaired but also for any application where the user’s attention has to be
divided (for example, power plant control, flight control, etc.). Uses of non-speech
sounds include the following:
■ Attention – to attract the user’s attention to a critical situation or to the end of a
process, for example.
■ Status information – continuous background sounds can be used to convey status
information. For example, monitoring the progress of a process (without the need
for visual attention).
■ Confirmation – a sound associated with an action to confirm that the action has
been carried out. For example, associating a sound with deleting a file.
■ Navigation – using changing sound to indicate where the user is in a system. For
example, what about sound to support navigation in hypertext?
1.2.3 Touch
The third and last of the senses that we will consider is touch or haptic perception.
Although this sense is often viewed as less important than sight or hearing, imagine
life without it. Touch provides us with vital information about our environment.
It tells us when we touch something hot or cold, and can therefore act as a warning. It
also provides us with feedback when we attempt to lift an object, for example. Con-
sider the act of picking up a glass of water. If we could only see the glass and not
feel when our hand made contact with it or feel its shape, the speed and accuracy of
the action would be reduced. This is the experience of users of certain virtual reality
games: they can see the computer-generated objects which they need to manipulate
but they have no physical sensation of touching them. Watching such users can be
an informative and amusing experience! Touch is therefore an important means of
feedback, and this is no less so in using computer systems. Feeling buttons depress is
an important part of the task of pressing the button. Also, we should be aware that,
although for the average person, haptic perception is a secondary source of informa-
tion, for those whose other senses are impaired, it may be vitally important. For such
users, interfaces such as braille may be the primary source of information in the
interaction. We should not therefore underestimate the importance of touch.
The apparatus of touch differs from that of sight and hearing in that it is not local-
ized. We receive stimuli through the skin. The skin contains three types of sensory
receptor: thermoreceptors respond to heat and cold, nociceptors respond to intense
pressure, heat and pain, and mechanoreceptors respond to pressure. It is the last of
these that we are concerned with in relation to human–computer interaction.
There are two kinds of mechanoreceptor, which respond to different types of
pressure. Rapidly adapting mechanoreceptors respond to immediate pressure as the
skin is indented. These receptors also react more quickly with increased pressure.
However, they stop responding if continuous pressure is applied. Slowly adapting
mechanoreceptors respond to continuously applied pressure.
Although the whole of the body contains such receptors, some areas have greater
sensitivity or acuity than others. It is possible to measure the acuity of different areas
of the body using the two-point threshold test. Take two pencils, held so their tips are
about 12 mm apart. Touch the points to your thumb and see if you can feel two
points. If you cannot, move the points a little further apart. When you can feel two
points, measure the distance between them. The greater the distance, the lower the
sensitivity. You can repeat this test on different parts of your body. You should find
26 Chapter 1 n The human
that the measure on the forearm is around 10 times that of the finger or thumb. The
fingers and thumbs have the highest acuity.
A second aspect of haptic perception is kinesthesis: awareness of the position of
the body and limbs. This is due to receptors in the joints. Again there are three
types: rapidly adapting, which respond when a limb is moved in a particular direc-
tion; slowly adapting, which respond to both movement and static position; and
positional receptors, which only respond when a limb is in a static position. This
perception affects both comfort and performance. For example, for a touch typist,
awareness of the relative positions of the fingers and feedback from the keyboard are
very important.
1.2.4 Movement
Before leaving this section on the human’s input–output channels, we need to
consider motor control and how the way we move affects our interaction with com-
puters. A simple action such as hitting a button in response to a question involves
a number of processing stages. The stimulus (of the question) is received through
the sensory receptors and transmitted to the brain. The question is processed and a
valid response generated. The brain then tells the appropriate muscles to respond.
Each of these stages takes time, which can be roughly divided into reaction time and
movement time.
Movement time is dependent largely on the physical characteristics of the subjects:
their age and fitness, for example. Reaction time varies according to the sensory
channel through which the stimulus is received. A person can react to an auditory
signal in approximately 150 ms, to a visual signal in 200 ms and to pain in 700 ms.
However, a combined signal will result in the quickest response. Factors such as skill
or practice can reduce reaction time, and fatigue can increase it.

Handling the goods

E-commerce has become very successful in some areas of sales, such as travel services,
books and CDs, and food. However, in some retail areas, such as clothes shopping, e-commerce
has been less successful. Why?

When buying train and airline tickets and, to some extent, books and food, the experience of
shopping is less important than the convenience. So, as long as we know what we want, we are happy
to shop online. With clothes, the experience of shopping is far more important. We need to be
able to handle the goods, feel the texture of the material, check the weight to test quality. Even if
we know that something will fit us we still want to be able to handle it before buying.

Research into haptic interaction (see Chapter 2 and Chapter 10) is looking at ways of solving this
problem. By using special force feedback and tactile hardware, users are able to feel surfaces
and shape. For example, a demonstration environment called TouchCity allows people to walk
around a virtual shopping mall, pick up products and feel their texture and weight. A key problem
with the commercial use of such an application, however, is that the haptic experience requires
expensive hardware not yet available to the average e-shopper. However, in future, such immersive
e-commerce experiences are likely to be the norm. (See www.novint.com/)
A second measure of motor skill is accuracy. One question that we should ask is
whether speed of reaction results in reduced accuracy. This is dependent on the task
and the user. In some cases, demanding faster reactions reduces accuracy. This
is the premise behind many arcade and video games where less skilled users fail at
levels of play that require faster responses. However, for skilled operators this is not
necessarily the case. Studies of keyboard operators have shown that, although the
faster operators were up to twice as fast as the others, the slower ones made 10 times
the errors.
Speed and accuracy of movement are important considerations in the design
of interactive systems, primarily in terms of the time taken to move to a particular
target on a screen. The target may be a button, a menu item or an icon, for example.
The time taken to hit a target is a function of the size of the target and the distance
that has to be moved. This is formalized in Fitts’ law [135]. There are many vari-
ations of this formula, which have varying constants, but they are all very similar.
One common form is
Movement time = a + b log2(distance/size + 1)
where a and b are empirically determined constants.
This affects the type of target we design. Since users will find it more difficult
to manipulate small objects, targets should generally be as large as possible and
the distance to be moved as small as possible. This has led to suggestions that pie-
chart-shaped menus are preferable to lists since all options are equidistant. However,
the trade-off is increased use of screen estate, so the choice may not be so simple.
If lists are used, the most frequently used options can be placed closest to the user’s
start point (for example, at the top of the menu). The implications of Fitts’ law in
design are discussed in more detail in Chapter 12.
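As an illustrative sketch (not taken from the book), the form of Fitts’ law given above can be turned into a small calculation. The constants a and b here are arbitrary example values: real values are device- and user-dependent and must be determined empirically.

```python
import math

def movement_time(distance, size, a=0.1, b=0.1):
    """Predicted time (in seconds) to hit a target, using the common
    Fitts' law form MT = a + b * log2(distance/size + 1).
    The constants a and b must be found by experiment for a given
    device and user population; the defaults here are illustrative only."""
    return a + b * math.log2(distance / size + 1)

# A large, nearby target is predicted to be quicker to hit than a
# small, distant one:
small_far = movement_time(distance=400, size=10)
large_near = movement_time(distance=200, size=40)
print(f"small/far: {small_far:.2f}s  large/near: {large_near:.2f}s")
```

The sketch makes the design trade-off concrete: doubling target size or halving distance reduces the predicted movement time by the same amount, since only the ratio distance/size matters.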
1.3 HUMAN MEMORY
Have you ever played the memory game? The idea is that each player has to recount
a list of objects and add one more to the end. There are many variations but the
objects are all loosely related: ‘I went to the market and bought a lemon, some
oranges, bacon. . .’ or ‘I went to the zoo and saw monkeys, and lions, and tigers . . .’
and so on. As the list grows objects are missed out or recalled in the wrong order and
so people are eliminated from the game. The winner is the person remaining at the
end. Such games rely on our ability to store and retrieve information, even seemingly
arbitrary items. This is the job of our memory system.
Indeed, much of our everyday activity relies on memory. As well as storing all our
factual knowledge, our memory contains our knowledge of actions or procedures.
It allows us to repeat actions, to use language, and to use new information received
via our senses. It also gives us our sense of identity, by preserving information from
our past experiences.
But how does our memory work? How do we remember arbitrary lists such as
those generated in the memory game? Why do some people remember more easily
than others? And what happens when we forget?
In order to answer questions such as these, we need to understand some of the
capabilities and limitations of human memory. Memory is the second part of our
model of the human as an information-processing system. However, as we noted
earlier, such a division is simplistic since, as we shall see, memory is associated with
each level of processing. Bearing this in mind, we will consider the way in which
memory is structured and the activities that take place within the system.
It is generally agreed that there are three types of memory or memory function:
sensory buffers, short-term memory or working memory, and long-term memory. There
is some disagreement as to whether these are three separate systems or different
functions of the same system. We will not concern ourselves here with the details
of this debate, which is discussed in detail by Baddeley [21], but will indicate the
evidence used by both sides as we go along. For our purposes, it is sufficient to note
three separate types of memory. These memories interact, with information being
processed and passed between memory stores, as shown in Figure 1.9.
1.3.1 Sensory memory
The sensory memories act as buffers for stimuli received through the senses. A
sensory memory exists for each sensory channel: iconic memory for visual stimuli,
echoic memory for aural stimuli and haptic memory for touch. These memories are
constantly overwritten by new information coming in on these channels.
We can demonstrate the existence of iconic memory by moving a finger in front
of the eye. Can you see it in more than one place at once? This indicates a persistence
of the image after the stimulus has been removed. A similar effect is noticed most
vividly at firework displays where moving sparklers leave a persistent image.
Information remains in iconic memory very briefly, in the order of 0.5 seconds.
Similarly, the existence of echoic memory is evidenced by our ability to ascertain
the direction from which a sound originates. This is due to information being
received by both ears. However, since this information is received at different times,
we must store the stimulus in the meantime. Echoic memory allows brief ‘play-back’
of information. Have you ever had someone ask you a question when you are
reading? You ask them to repeat the question, only to realize that you know what was
asked after all. This experience, too, is evidence of the existence of echoic memory.

Figure 1.9 A model of the structure of memory
Information is passed from sensory memory into short-term memory by atten-
tion, thereby filtering the stimuli to only those which are of interest at a given time.
Attention is the concentration of the mind on one out of a number of competing
stimuli or thoughts. It is clear that we are able to focus our attention selectively,
choosing to attend to one thing rather than another. This is due to the limited capa-
city of our sensory and mental processes. If we did not selectively attend to the
stimuli coming into our senses, we would be overloaded. We can choose which stimuli
to attend to, and this choice is governed to an extent by our arousal, our level of
interest or need. This explains the cocktail party phenomenon mentioned earlier:
we can attend to one conversation over the background noise, but we may choose
to switch our attention to a conversation across the room if we hear our name
mentioned. Information received by sensory memories is quickly passed into a more
permanent memory store, or overwritten and lost.
1.3.2 Short-term memory
Short-term memory or working memory acts as a ‘scratch-pad’ for temporary recall
of information. It is used to store information which is only required fleetingly. For
example, calculate the multiplication 35 × 6 in your head. The chances are that you
will have done this calculation in stages, perhaps 5 × 6 and then 30 × 6 and added
the results; or you may have used the fact that 6 = 2 × 3 and calculated 2 × 35 = 70
followed by 3 × 70. To perform calculations such as this we need to store the
intermediate stages for use later. Or consider reading. In order to comprehend this
sentence you need to hold in your mind the beginning of the sentence as you read
the rest. Both of these tasks use short-term memory.
Short-term memory can be accessed rapidly, in the order of 70 ms. However, it
also decays rapidly, meaning that information can only be held there temporarily, in
the order of 200 ms.
Short-term memory also has a limited capacity. There are two basic methods for
measuring memory capacity. The first involves determining the length of a sequence
which can be remembered in order. The second allows items to be freely recalled in
any order. Using the first measure, the average person can remember 7 ± 2 digits.
This was established in experiments by Miller [234]. Try it. Look at the following
number sequence:
265397620853
Now write down as much of the sequence as you can remember. Did you get it all
right? If not, how many digits could you remember? If you remembered between five
and nine digits your digit span is average.
Now try the following sequence:
44 113 245 8920
Did you recall that more easily? Here the digits are grouped or chunked. A
generalization of the 7 ± 2 rule is that we can remember 7 ± 2 chunks of information.
Therefore chunking information can increase the short-term memory capacity. The
limited capacity of short-term memory produces a subconscious desire to create
chunks, and so optimize the use of the memory. The successful formation of a chunk
is known as closure. This process can be generalized to account for the desire to com-
plete or close tasks held in short-term memory. If a subject fails to do this or is pre-
vented from doing so by interference, the subject is liable to lose track of what she is
doing and make consequent errors.
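The effect of chunking described above can be mimicked in a short sketch (a hypothetical illustration, not from the book): code that presents digits to people can group them so that a reader holds a few chunks rather than many individual items. The `chunk` helper below is an assumed name, not a standard function.

```python
def chunk(digits, size=3):
    """Split a digit string into groups of `size` characters, so that a
    reader holds a few chunks in short-term memory rather than many
    individual digits."""
    return " ".join(digits[i:i + size] for i in range(0, len(digits), size))

# The twelve-digit sequence from the text, presented as four chunks:
print(chunk("265397620853"))  # 265 397 620 853
```

The same idea underlies the conventional grouping of telephone numbers, credit card numbers and postcodes in user interfaces.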
DESIGN FOCUS
Cashing in
Closure gives us a nice ‘done it’ feeling when we complete some part of a task. At this point our minds have
a tendency to flush short-term memory in order to get on with the next job. Early automatic teller
machines (ATMs) gave the customer money before returning their bank card. On receiving the money
the customer would reach closure and hence often forget to take the card. Modern ATMs return the
card first!
The sequence of chunks given above also makes use of pattern abstraction: it is
written in the form of a UK telephone number which makes it easier to remember.
We may even recognize the first sets of digits as the international code for the UK
and the dialing code for Leeds – chunks of information. Patterns can be useful as aids
to memory. For example, most people would have difficulty remembering the fol-
lowing sequence of chunks:
HEC ATR ANU PTH ETR EET
However, if you notice that by moving the last character to the first position, you get
the statement ‘the cat ran up the tree’, the sequence is easy to recall.
In experiments where subjects were able to recall words freely, evidence shows that
recall of the last words presented is better than recall of those in the middle [296].
This is known as the recency effect. However, if the subject is asked to perform
another task between presentation and recall (for example, counting backwards) the
recency effect is eliminated. The recall of the other words is unaffected. This suggests
that short-term memory recall is damaged by interference of other information.
However, the fact that this interference does not affect recall of earlier items provides
some evidence for the existence of separate long-term and short-term memories. The
early items are held in a long-term store which is unaffected by the recency effect.
Interference does not necessarily impair recall in short-term memory. Baddeley asked
subjects to remember six-digit numbers and attend to sentence processing at the same
time [21]. They were asked to answer questions on sentences, such as ‘A precedes B:
AB is true or false?’. Surprisingly, this did not result in interference, suggesting that
in fact short-term memory is not a unitary system but is made up of a number of
components, including a visual channel and an articulatory channel. The task of sen-
tence processing used the visual channel, while the task of remembering digits used
the articulatory channel, so interference only occurs if tasks utilize the same channel.
These findings led Baddeley to propose a model of working memory that incorp-
orated a number of elements together with a central processing executive. This is
illustrated in Figure 1.10.
Figure 1.10 A more detailed model of short-term memory
1.3.3 Long-term memory
If short-term memory is our working memory or ‘scratch-pad’, long-term memory
is our main resource. Here we store factual information, experiential knowledge,
procedural rules of behavior – in fact, everything that we ‘know’. It differs from
short-term memory in a number of significant ways. First, it has a huge, if not unlim-
ited, capacity. Secondly, it has a relatively slow access time of approximately a tenth
of a second. Thirdly, forgetting occurs more slowly in long-term memory, if at all.
These distinctions provide further evidence of a memory structure with several parts.
Long-term memory is intended for the long-term storage of information.
Information is placed there from working memory through rehearsal. Unlike work-
ing memory there is little decay: long-term recall after minutes is the same as that
after hours or days.
Long-term memory structure
There are two types of long-term memory: episodic memory and semantic memory.
Episodic memory represents our memory of events and experiences in a serial form.
It is from this memory that we can reconstruct the actual events that took place at a
given point in our lives. Semantic memory, on the other hand, is a structured record
of facts, concepts and skills that we have acquired. The information in semantic
memory is derived from that in our episodic memory, such that we can learn new
facts or concepts from our experiences.
Semantic memory is structured in some way to allow access to information,
representation of relationships between pieces of information, and inference. One
model for the way in which semantic memory is structured is as a network. Items are
associated to each other in classes, and may inherit attributes from parent classes.
This model is known as a semantic network. As an example, our knowledge about
dogs may be stored in a network such as that shown in Figure 1.11.

DESIGN FOCUS
7 ± 2 revisited

When we looked at short-term memory, we noted the general rule that people can hold 7 ± 2 items
or chunks of information in short-term memory. It is a principle that people tend to remember but it
can be misapplied. For example, it is often suggested that this means that lists, menus and other groups
of items should be designed to be no more than 7 items long. But use of menus and lists of course has
little to do with short-term memory – they are available in the environment as cues and so do not need
to be remembered.

On the other hand, the 7 ± 2 rule would apply in command line interfaces. Imagine a scenario where a
UNIX user looks up a command in the manual. Perhaps the command has a number of parameters or
options, to be applied in a particular order, and it is going to be applied to several files that have long
path names. The user then has to hold the command, its parameters and the file path names in short-
term memory while he types them in. Here we could say that the task may cause problems if the
number of items or chunks in the command line string is more than 7.
Specific breed attributes may be stored with each given breed, yet general dog
information is stored at a higher level. This allows us to generalize about specific
cases. For instance, we may not have been told that the sheepdog Shadow has four
legs and a tail, but we can infer this information from our general knowledge about
sheepdogs and dogs in general. Note also that there are connections within the net-
work which link into other domains of knowledge, for example cartoon characters.
This illustrates how our knowledge is organized by association.
The viability of semantic networks as a model of memory organization has been
demonstrated by Collins and Quillian [74]. Subjects were asked questions about
different properties of related objects and their reaction times were measured. The
types of question asked (taking examples from our own network) were ‘Can a collie
breathe?’, ‘Is a beagle a hound?’ and ‘Does a hound track?’ In spite of the fact that the
answers to such questions may seem obvious, subjects took longer to answer ques-
tions such as ‘Can a collie breathe?’ than ones such as ‘Does a hound track?’ The
reason for this, it is suggested, is that in the former case subjects had to search fur-
ther through the memory hierarchy to find the answer, since information is stored
at its most abstract level.
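The network model and the Collins and Quillian result can be sketched together in a few lines of Python. The network fragment below is loosely based on Figure 1.11, but the exact nodes and properties are illustrative assumptions, and the hop count simply stands in for the longer search times observed in the experiment:

```python
# A sketch of a semantic network with property inheritance. Properties
# are stored at the most abstract level that applies; a lookup climbs
# the "is_a" links until it finds the property.

NETWORK = {
    "animal":   {"is_a": None,       "props": {"breathes": True}},
    "dog":      {"is_a": "animal",   "props": {"legs": 4, "has_tail": True}},
    "hound":    {"is_a": "dog",      "props": {"tracks": True}},
    "beagle":   {"is_a": "hound",    "props": {"size": "small"}},
    "sheepdog": {"is_a": "dog",      "props": {"herds": True}},
    "collie":   {"is_a": "sheepdog", "props": {}},
}

def lookup(node, prop):
    """Return (value, hops): the hop count models the extra search
    needed for questions like 'Can a collie breathe?'."""
    hops = 0
    while node is not None:
        if prop in NETWORK[node]["props"]:
            return NETWORK[node]["props"][prop], hops
        node = NETWORK[node]["is_a"]
        hops += 1
    return None, hops

print(lookup("hound", "tracks"))     # answered locally: (True, 0)
print(lookup("collie", "breathes"))  # three levels up: (True, 3)
```

Inference about Shadow the sheepdog works the same way: four legs and a tail are found at the 'dog' node rather than being stored with every breed.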
A number of other memory structures have been proposed to explain how we
represent and store different types of knowledge. Each of these represents a different
Figure 1.11 Long-term memory may store information in a semantic network
34 Chapter 1 The human
aspect of knowledge and, as such, the models can be viewed as complementary rather
than mutually exclusive. Semantic networks represent the associations and relation-
ships between single items in memory. However, they do not allow us to model the
representation of more complex objects or events, which are perhaps composed of
a number of items or activities. Structured representations such as frames and scripts
organize information into data structures. Slots in these structures allow attribute
values to be added. Frame slots may contain default, fixed or variable information.
A frame is instantiated when the slots are filled with appropriate values. Frames
and scripts can be linked together in networks to represent hierarchical structured
knowledge.
Returning to the ‘dog’ domain, a frame-based representation of the knowledge
may look something like Figure 1.12. The fixed slots are those for which the attribute
value is set, default slots represent the usual attribute value, although this may be
overridden in particular instantiations (for example, the Basenji does not bark), and
variable slots can be filled with particular values in a given instance. Slots can also
contain procedural knowledge. Actions or operations can be associated with a slot
and performed, for example, whenever the value of the slot is changed.
Frames extend semantic nets to include structured, hierarchical information. They
represent knowledge items in a way which makes explicit the relative importance of
each piece of information.
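A frame with fixed, default and variable slots can be sketched as follows. The slot names and the use of plain dictionaries are illustrative assumptions, not the notation of Figure 1.12:

```python
# A sketch of a generic DOG frame: fixed slots are always true, default
# slots may be overridden in an instance (the Basenji does not bark),
# and variable slots are filled when the frame is instantiated.

DOG_FRAME = {
    "fixed":    {"legs": 4, "mammal": True},
    "default":  {"barks": True, "diet": "carnivorous"},
    "variable": ["name", "breed", "size"],
}

def instantiate(frame, overrides=None, **variables):
    """Fill a frame: copy fixed slots, apply defaults (possibly
    overridden), then fill the variable slots with supplied values."""
    instance = dict(frame["fixed"])
    instance.update(frame["default"])
    instance.update(overrides or {})
    for slot in frame["variable"]:
        instance[slot] = variables.get(slot)
    return instance

shadow = instantiate(DOG_FRAME, name="Shadow", breed="sheepdog", size="medium")
basenji = instantiate(DOG_FRAME, overrides={"barks": False},
                      name="Bongo", breed="basenji", size="small")
print(shadow["barks"], basenji["barks"])  # True False
```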
Scripts attempt to model the representation of stereotypical knowledge about situ-
ations. Consider the following sentence:
John took his dog to the surgery. After seeing the vet, he left.
From our knowledge of the activities of dog owners and vets, we may fill in a
substantial amount of detail. The animal was ill. The vet examined and treated the
animal. John paid for the treatment before leaving. We are less likely to assume the
alternative reading of the sentence, that John took an instant dislike to the vet on
sight and did not stay long enough to talk to him!
Figure 1.12 A frame-based representation of knowledge
A script represents this default or stereotypical information, allowing us to inter-
pret partial descriptions or cues fully. A script comprises a number of elements,
which, like slots, can be filled with appropriate information:
Entry conditions Conditions that must be satisfied for the script to be activated.
Result Conditions that will be true after the script is terminated.
Props Objects involved in the events described in the script.
Roles Actions performed by particular participants.
Scenes The sequences of events that occur.
Tracks A variation on the general pattern representing an alternative scenario.
An example script for going to the vet is shown in Figure 1.13.
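The six script elements listed above can be captured in a small data structure. The field contents below are illustrative guesses at a vet-visit script, not the contents of Figure 1.13:

```python
# A sketch of a script as a structured record: each element listed in
# the text becomes a field that can be filled with information.

from dataclasses import dataclass, field

@dataclass
class Script:
    entry_conditions: list   # must hold for the script to be activated
    result: list             # true after the script terminates
    props: list              # objects involved in the events
    roles: list              # actions performed by particular participants
    scenes: list             # the sequences of events that occur
    tracks: dict = field(default_factory=dict)  # alternative scenarios

vet_script = Script(
    entry_conditions=["dog is ill", "surgery is open"],
    result=["dog is better", "owner has paid"],
    props=["examination table", "medicine"],
    roles=["vet examines dog", "owner pays"],
    scenes=["arrive at surgery", "wait", "examination", "pay and leave"],
    tracks={"emergency": ["arrive", "immediate examination"]},
)
```

Reading the sentence about John then amounts to activating this script and filling in the unstated scenes (examination, payment) by default.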
A final type of knowledge representation which we hold in memory is the repre-
sentation of procedural knowledge, our knowledge of how to do something. A com-
mon model for this is the production system. Condition–action rules are stored
in long-term memory. Information coming into short-term memory can match a
condition in one of these rules and result in the action being executed. For example,
a pair of production rules might be
IF dog is wagging tail
THEN pat dog
IF dog is growling
THEN run away
If we then meet a growling dog, the condition in the second rule is matched, and we
respond by turning tail and running. (Not to be recommended by the way!)
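A minimal production system along these lines can be sketched directly: condition–action rules fire whenever their condition matches the current contents of working (short-term) memory.

```python
# Condition-action rules held in "long-term memory"; incoming facts in
# short-term memory that match a condition trigger the paired action.

RULES = [
    ("dog is wagging tail", "pat dog"),
    ("dog is growling", "run away"),
]

def react(working_memory):
    """Return the actions of every rule whose condition is present."""
    return [action for condition, action in RULES
            if condition in working_memory]

print(react({"dog is growling"}))      # ['run away']
print(react({"dog is wagging tail"}))  # ['pat dog']
```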
Figure 1.13 A script for visiting the vet
Long-term memory processes
So much for the structure of memory, but what about the processes which it uses?
There are three main activities related to long-term memory: storage or remember-
ing of information, forgetting and information retrieval. We shall consider each of
these in turn.
First, how does information get into long-term memory and how can we improve
this process? Information from short-term memory is stored in long-term memory by
rehearsal. The repeated exposure to a stimulus or the rehearsal of a piece of informa-
tion transfers it into long-term memory.
This process can be optimized in a number of ways. Ebbinghaus performed
numerous experiments on memory, using himself as a subject [117]. In these experi-
ments he tested his ability to learn and repeat nonsense syllables, comparing his
recall minutes, hours and days after the learning process. He discovered that the
amount learned was directly proportional to the amount of time spent learning.
This is known as the total time hypothesis. However, experiments by Baddeley and
others suggest that learning time is most effective if it is distributed over time [22].
For example, in an experiment in which Post Office workers were taught to type,
those whose training period was divided into weekly sessions of one hour performed
better than those who spent two or four hours a week learning (although the former
obviously took more weeks to complete their training). This is known as the distribu-
tion of practice effect.
However, repetition is not enough to learn information well. If information is
not meaningful it is more difficult to remember. This is illustrated by the fact that
it is more difficult to remember a set of words representing concepts than a set of
words representing objects. Try it. First try to remember the words in list A and test
yourself.
List A: Faith Age Cold Tenet Quiet Logic Idea Value Past Large
Now try list B.
List B: Boat Tree Cat Child Rug Plate Church Gun Flame Head
The second list was probably easier to remember than the first since you could
visualize the objects in the second list.
Sentences are easier still to memorize. Bartlett performed experiments on remembering meaningful information (as opposed to the meaningless material Ebbinghaus
used) [28]. In one such experiment he got subjects to learn a story about an unfamiliar culture and then retell it. He found that subjects would retell the story
replacing unfamiliar words and concepts with words which were meaningful to
them. Stories were effectively translated into the subject’s own culture. This is related
to the semantic structuring of long-term memory: if information is meaningful and
familiar, it can be related to existing structures and more easily incorporated into
memory.
So if structure, familiarity and concreteness help us in learning information, what
causes us to lose this information, to forget? There are two main theories of forget-
ting: decay and interference. The first theory suggests that the information held in
long-term memory may eventually be forgotten. Ebbinghaus concluded from his
experiments with nonsense syllables that information in memory decayed logarith-
mically, that is that it was lost rapidly to begin with, and then more slowly. Jost’s law,
which follows from this, states that if two memory traces are equally strong at a given
time the older one will be more durable.
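Jost's law can be illustrated with a toy decay model. The hyperbolic decay function below is purely an assumption for illustration (the text says only that loss is rapid at first, then slower); under any such decelerating decay, two traces that are equally strong now diverge in favour of the older one.

```python
# Toy illustration of Jost's law: trace strength falls off quickly at
# first and more slowly later. An older trace that has already survived
# its steep early decay now loses strength more slowly than a new one.

def strength(initial, age):
    """Hyperbolic decay of a memory trace (illustrative assumption)."""
    return initial / (1.0 + age)

now = 10
old = strength(11.0, now - 0)   # trace formed at t = 0
new = strength(2.0, now - 9)    # trace formed at t = 9
assert abs(old - new) < 1e-9    # equally strong now...

later = 20
print(strength(11.0, later - 0) > strength(2.0, later - 9))  # True: older is more durable
```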
The second theory is that information is lost from memory through interference.
If we acquire new information it causes the loss of old information. This is termed
retroactive interference. A common example of this is the fact that if you change telephone numbers, learning your new number makes it more difficult to remember
your old number. This is because the new association masks the old. However, some-
times the old memory trace breaks through and interferes with new information.
This is called proactive inhibition. An example of this is when you find yourself driv-
ing to your old house rather than your new one.
Forgetting is also affected by emotional factors. In experiments, subjects given
emotive words and non-emotive words found the former harder to remember in
the short term but easier in the long term. Indeed, this observation tallies with our
experience of selective memory. We tend to remember positive information rather
than negative (hence nostalgia for the ‘good old days’), and highly emotive events
rather than mundane.
Memorable or secure?
As online activities become more widespread, people are having to remember more and
more access information, such as passwords and security checks. The average active internet user
may have separate passwords and user names for several email accounts, mailing lists, e-shopping
sites, e-banking, online auctions and more! Remembering these passwords is not easy.
From a security perspective it is important that passwords are random. Words and names are very
easy to crack, hence the recommendation that passwords are frequently changed and constructed
from random strings of letters and numbers. But in reality these are the hardest things for people to
commit to memory. Hence many people will use the same password for all their online activities
(rarely if ever changing it) and will choose a word or a name that is easy for them to remember,
in spite of the obviously increased security risks. Security here is in conflict with memorability!
A solution to this is to construct a nonsense password out of letters or numbers that will have
meaning to you but will not make up a word in a dictionary (e.g. initials of names, numbers from
significant dates or postcodes, and so on). Then what is remembered is the meaningful rule for
constructing the password, and not a meaningless string of alphanumeric characters.
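As a concrete example of such a rule, one might take the initial letter of each word in a memorable phrase and append digits from a significant date. The rule below is purely illustrative, not a recommendation from the text:

```python
# A rule-based password: what is remembered is the meaningful rule
# (phrase + year), not the meaningless string it produces.

def password_from_rule(phrase, year):
    """Initial letters of each word, plus the last two digits of a year."""
    initials = "".join(word[0] for word in phrase.split())
    return initials + str(year % 100)

print(password_from_rule("my first dog was a beagle", 1987))  # 'mfdwab87'
```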
It is debatable whether we ever actually forget anything or whether it just becomes
increasingly difficult to access certain items from memory. This question is in some
ways moot since it is impossible to prove that we do forget: appearing to have for-
gotten something may just be caused by not being able to retrieve it! However, there
is evidence to suggest that we may not lose information completely from long-term
memory. First, proactive inhibition demonstrates the recovery of old information
even after it has been ‘lost’ by interference. Secondly, there is the ‘tip of the tongue’
experience, which indicates that some information is present but cannot be satisfac-
torily accessed. Thirdly, information may not be recalled but may be recognized, or
may be recalled only with prompting.
This leads us to the third process of memory: information retrieval. Here we need
to distinguish between two types of information retrieval, recall and recognition. In
recall the information is reproduced from memory. In recognition, the presentation
of the information provides the knowledge that the information has been seen
before. Recognition is the less complex cognitive activity since the information is
provided as a cue.
However, recall can be assisted by the provision of retrieval cues, which enable
the subject quickly to access the information in memory. One such cue is the use of
categories. In an experiment subjects were asked to recall lists of words, some of
which were organized into categories and some of which were randomly organized.
The words that were related to a category were easier to recall than the others [38].
Recall is even more successful if subjects are allowed to categorize their own lists of
words during learning. For example, consider the following list of words:
child red plane dog friend blood cold tree big angry
Now make up a story that links the words using as vivid imagery as possible. Now try
to recall as many of the words as you can. Did you find this easier than the previous
experiment where the words were unrelated?
The use of vivid imagery is a common cue to help people remember information.
It is known that people often visualize a scene that is described to them. They can
then answer questions based on their visualization. Indeed, subjects given a descrip-
tion of a scene often embellish it with additional information. Consider the follow-
ing description and imagine the scene:
The engines roared above the noise of the crowd. Even in the blistering heat people
rose to their feet and waved their hands in excitement. The flag fell and they were off.
Within seconds the car had pulled away from the pack and was careering round the
bend at a desperate pace. Its wheels momentarily left the ground as it cornered.
Coming down the straight the sun glinted on its shimmering paint. The driver gripped
the wheel with fierce concentration. Sweat lay in fine drops on his brow.
Without looking back to the passage, what color is the car?
If you could answer that question you have visualized the scene, including the
car’s color. In fact, the color of the car is not mentioned in the description
at all.
1.4 Thinking: reasoning and problem solving 39
1.4 THINKING: REASONING AND PROBLEM SOLVING
We have considered how information finds its way into and out of the human
system and how it is stored. Finally, we come to look at how it is processed and
manipulated. This is perhaps the area which is most complex and which separates
Improve your memory
Many people can perform astonishing feats of memory: recalling the sequence of cards in a
pack (or multiple packs – up to six have been reported), or recounting π to 1000 decimal places,
for example. There are also adverts to ‘Improve Your Memory’ (usually leading to success, or
wealth, or other such inducement), and so the question arises: can you improve your memory
abilities? The answer is yes; this exercise shows you one technique.
Look at the list below of numbers and associated words:
1 bun      6 sticks
2 shoe     7 heaven
3 tree     8 gate
4 door     9 wine
5 hive    10 hen
Notice that the words sound similar to the numbers. Now think about the words one at a time
and visualize them, in as much detail as possible. For example, for ‘1’, think of a large, sticky iced
bun, the base spiralling round and round, with raisins in it, covered in sweet, white, gooey icing.
Now do the rest, using as much visualization as you can muster: imagine how things would look,
smell, taste, sound, and so on.
This is your reference list, and you need to know it off by heart.
Having learnt it, look at a pile of at least a dozen odd items collected together by a colleague. The
task is to look at the collection of objects for only 30 seconds, and then list as many as possible
without making a mistake or viewing the collection again. Most people can manage between five
and eight items, if they do not know any memory-enhancing techniques like the following.
Mentally pick one (say, for example, a paper clip), and call it number one. Now visualize it inter-
acting with the bun. It can get stuck into the icing on the top of the bun, and make your fingers all
gooey and sticky when you try to remove it. If you ate the bun without noticing, you’d get a
crunched tooth when you bit into it – imagine how that would feel. When you’ve really got a
graphic scenario developed, move on to the next item, call it number two, and again visualize it
interacting with the reference item, shoe. Continue down your list, until you have done 10 things.
This should take you about the 30 seconds allowed. Then hide the collection and try and recall the
numbers in order, the associated reference word, and then the image associated with that word.
You should find that you can recall the 10 associated items practically every time. The technique
can be easily extended by extending your reference list.
humans from other information-processing systems, both artificial and natural.
Although it is clear that animals receive and store information, there is little evid-
ence to suggest that they can use it in quite the same way as humans. Similarly,
artificial intelligence has produced machines which can see (albeit in a limited way)
and store information. But their ability to use that information is limited to small
domains.
Humans, on the other hand, are able to use information to reason and solve
problems, and indeed do so even when the information is partial or unavailable. Human thought is conscious and self-aware: while we may not always be
able to identify the processes we use, we can identify the products of these processes,
our thoughts. In addition, we are able to think about things of which we have
no experience, and solve problems which we have never seen before. How is this
done?
Thinking can require different amounts of knowledge. Some thinking activities
are very directed and the knowledge required is constrained. Others require vast
amounts of knowledge from different domains. For example, performing a subtrac-
tion calculation requires a relatively small amount of knowledge, from a constrained
domain, whereas understanding newspaper headlines demands knowledge of pol-
itics, social structures, public figures and world events.
In this section we will consider two categories of thinking: reasoning and problem
solving. In practice these are not distinct since the activity of solving a problem may
well involve reasoning and vice versa. However, the distinction is a common one and
is helpful in clarifying the processes involved.
1.4.1 Reasoning
Reasoning is the process by which we use the knowledge we have to draw conclusions
or infer something new about the domain of interest. There are a number of different types of reasoning: deductive, inductive and abductive. We use each of these types
of reasoning in everyday life, but they differ in significant ways.
Deductive reasoning
Deductive reasoning derives the logically necessary conclusion from the given pre-
mises. For example,
If it is Friday then she will go to work
It is Friday
Therefore she will go to work.
It is important to note that this is the logical conclusion from the premises; it does
not necessarily have to correspond to our notion of truth. So, for example,
If it is raining then the ground is dry
It is raining
Therefore the ground is dry.
is a perfectly valid deduction, even though it conflicts with our knowledge of what is
true in the world.
Deductive reasoning is therefore often misapplied. Given the premises
Some people are babies
Some babies cry
many people will infer that ‘Some people cry’. This is in fact an invalid deduction
since we are not told that all babies are people. It is therefore logically possible that
the babies who cry are those who are not people.
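The invalidity of this inference can be checked by brute force: it is enough to exhibit one 'world' in which both premises are true but the conclusion is false. A sketch:

```python
# A two-individual world in which 'some people are babies' and 'some
# babies cry' both hold, yet 'some people cry' is false: the only
# crying baby is the one that is not a person.

world = [
    {"person": True,  "baby": True,  "cries": False},
    {"person": False, "baby": True,  "cries": True},
]

some_people_are_babies = any(x["person"] and x["baby"] for x in world)
some_babies_cry        = any(x["baby"] and x["cries"] for x in world)
some_people_cry        = any(x["person"] and x["cries"] for x in world)

print(some_people_are_babies, some_babies_cry, some_people_cry)  # True True False
```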
It is at this point, where truth and validity clash, that human deduction is poorest.
One explanation for this is that people bring their world knowledge into the reason-
ing process. There is good reason for this. It allows us to take short cuts which make
dialog and interaction between people informative but efficient. We assume a certain
amount of shared knowledge in our dealings with each other, which in turn allows
us to interpret the inferences and deductions implied by others. If validity rather
than truth was preferred, all premises would have to be made explicit.
Inductive reasoning
Induction is generalizing from cases we have seen to infer information about cases
we have not seen. For example, if every elephant we have ever seen has a trunk, we
infer that all elephants have trunks. Of course, this inference is unreliable and cannot
be proved to be true; it can only be proved to be false. We can disprove the inference
simply by producing an elephant without a trunk. However, we can never prove it
true because, no matter how many elephants with trunks we have seen or are known
to exist, the next one we see may be trunkless. The best that we can do is gather evid-
ence to support our inductive inference.
In spite of its unreliability, induction is a useful process, which we use constantly
in learning about our environment. We can never see all the elephants that have ever
lived or will ever live, but we have certain knowledge about elephants which we are
prepared to trust for all practical purposes, which has largely been inferred by induc-
tion. Even if we saw an elephant without a trunk, we would be unlikely to move from
our position that ‘All elephants have trunks’, since we are better at using positive
than negative evidence. This is illustrated in an experiment first devised by Wason
[365]. You are presented with four cards as in Figure 1.14. Each card has a number
on one side and a letter on the other. Which cards would you need to pick up to test
the truth of the statement ‘If a card has a vowel on one side it has an even number
on the other’?
A common response to this (was it yours?) is to check the E and the 4. However,
this uses only positive evidence. In fact, to test the truth of the statement we need to
check negative evidence: if we can find a card which has an odd number on one side
and a vowel on the other we have disproved the statement. We must therefore check
E and 7. (It does not matter what is on the other side of the other cards: the state-
ment does not say that all even numbers have vowels, just that all vowels have even
numbers.)
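The logic of the card choice can be made explicit in code: a card is worth turning over only if some hidden face could falsify the rule. This sketch (the helper function name is ours) reproduces the E-and-7 answer:

```python
# Wason's selection task: which cards could falsify
# 'if a card has a vowel on one side it has an even number on the other'?

VOWELS = set("AEIOU")

def could_falsify(visible):
    """Would some hidden face make this card violate the rule?"""
    if visible.isalpha():
        # A visible vowel is falsified by a hidden odd number;
        # a consonant can never violate the rule.
        return visible in VOWELS
    # A visible odd number is falsified by a hidden vowel;
    # an even number can never violate the rule.
    return int(visible) % 2 == 1

cards = ["E", "K", "4", "7"]
print([c for c in cards if could_falsify(c)])  # ['E', '7']
```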
Abductive reasoning
The third type of reasoning is abduction. Abduction reasons from a fact to the action
or state that caused it. This is the method we use to derive explanations for the events
we observe. For example, suppose we know that Sam always drives too fast when she
has been drinking. If we see Sam driving too fast we may infer that she has been
drinking. Of course, this too is unreliable since there may be another reason why she
is driving fast: she may have been called to an emergency, for example.
In spite of its unreliability, it is clear that people do infer explanations in this way,
and hold onto them until they have evidence to support an alternative theory or
explanation. This can lead to problems in using interactive systems. If an event
always follows an action, the user will infer that the event is caused by the action
unless evidence to the contrary is made available. If, in fact, the event and the action
are unrelated, confusion and even error often result.
Figure 1.14 Wason’s cards
Filling the gaps
Look again at Wason’s cards in Figure 1.14. In the text we say that you only need to check
the E and the 7. This is correct, but only because we very carefully stated in the text that ‘each
card has a number on one side and a letter on the other’. If the problem were stated without that
condition then the K would also need to be examined in case it has a vowel on the other side. In
fact, when the problem is so stated, even the most careful subjects ignore this possibility. Why?
Because the nature of the problem implicitly suggests that each card has a number on one side and
a letter on the other.
This is similar to the embellishment of the story at the end of Section 1.3.3. In fact, we constantly
fill in gaps in the evidence that reaches us through our senses. Although this can lead to errors in
our reasoning it is also essential for us to function. In the real world we rarely have all the evid-
ence necessary for logical deductions and at all levels of perception and reasoning we fill in details
in order to allow higher levels of reasoning to work.
1.4.2 Problem solving
If reasoning is a means of inferring new information from what is already known,
problem solving is the process of finding a solution to an unfamiliar task, using the
knowledge we have. Human problem solving is characterized by the ability to adapt
the information we have to deal with new situations. However, often solutions seem
to be original and creative. There are a number of different views of how people
solve problems. The earliest, dating back to the first half of the twentieth century, is
the Gestalt view that problem solving involves both reuse of knowledge and insight.
This has been largely superseded but the questions it was trying to address remain
and its influence can be seen in later research. A second major theory, proposed in
the 1970s by Newell and Simon, was the problem space theory, which takes the view
that the mind is a limited information processor. Later variations on this drew on the
earlier theory and attempted to reinterpret Gestalt theory in terms of information-
processing theories. We will look briefly at each of these views.
Gestalt theory
Gestalt psychologists were answering the claim, made by behaviorists, that prob-
lem solving is a matter of reproducing known responses or trial and error. This
explanation was considered by the Gestalt school to be insufficient to account for
human problem-solving behavior. Instead, they claimed, problem solving is both pro-
ductive and reproductive. Reproductive problem solving draws on previous experi-
ence as the behaviorists claimed, but productive problem solving involves insight and
restructuring of the problem. Indeed, reproductive problem solving could be a hind-
rance to finding a solution, since a person may ‘fixate’ on the known aspects of the
problem and so be unable to see novel interpretations that might lead to a solution.
Gestalt psychologists backed up their claims with experimental evidence. Kohler
provided evidence of apparent insight being demonstrated by apes, which he
observed joining sticks together in order to reach food outside their cages [202].
However, this was difficult to verify since the apes had once been wild and so could
have been using previous knowledge.
Other experiments observed human problem-solving behavior. One well-known
example of this is Maier’s pendulum problem [224]. The problem was this: the
subjects were in a room with two pieces of string hanging from the ceiling. Also in
the room were other objects including pliers, poles and extensions. The task set was
to tie the pieces of string together. However, they were too far apart to catch hold
of both at once. Although various solutions were proposed by subjects, few chose
to use the weight of the pliers as a pendulum to ‘swing’ the strings together. How-
ever, when the experimenter brushed against the string, setting it in motion, this
solution presented itself to subjects. Maier interpreted this as an example of produc-
tive restructuring. The movement of the string had given insight and allowed the
subjects to see the problem in a new way. The experiment also illustrates fixation:
subjects were initially unable to see beyond their view of the role or use of a pair
of pliers.
Although Gestalt theory is attractive in terms of its description of human problem
solving, it does not provide sufficient evidence or structure to support its theories.
It does not explain when restructuring occurs or what insight is, for example. How-
ever, the move away from behaviorist theories was helpful in paving the way for the
information-processing theory that was to follow.
Problem space theory
Newell and Simon proposed that problem solving centers on the problem space. The
problem space comprises problem states, and problem solving involves generating
these states using legal state transition operators. The problem has an initial state
and a goal state and people use the operators to move from the former to the latter.
Such problem spaces may be huge, and so heuristics are employed to select appro-
priate operators to reach the goal. One such heuristic is means–ends analysis. In
means–ends analysis the initial state is compared with the goal state and an oper-
ator chosen to reduce the difference between the two. For example, imagine you are
reorganizing your office and you want to move your desk from the north wall of the
room to the window. Your initial state is that the desk is at the north wall. The goal
state is that the desk is by the window. The main difference between these two is the
location of your desk. You have a number of operators which you can apply to mov-
ing things: you can carry them or push them or drag them, etc. However, you know
that to carry something it must be light and that your desk is heavy. You therefore
have a new subgoal: to make the desk light. Your operators for this may involve
removing drawers, and so on.
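The desk example can be sketched as a tiny means–ends analyser: states are sets of facts, operators have preconditions and effects, and an unmet precondition becomes a subgoal. The operator names and facts here are illustrative assumptions.

```python
# Means-ends analysis in miniature: to achieve a goal, pick an operator
# whose effects include it; any unmet precondition becomes a subgoal.

OPERATORS = {
    "carry_desk":     {"pre": {"desk is light"}, "add": {"desk at window"},
                       "remove": {"desk at north wall"}},
    "remove_drawers": {"pre": set(),             "add": {"desk is light"},
                       "remove": set()},
}

def achieve(goal, state, trace):
    if goal in state:
        return state
    # choose an operator that reduces the difference to the goal
    name, op = next((n, o) for n, o in OPERATORS.items() if goal in o["add"])
    for subgoal in op["pre"] - state:   # unmet preconditions become subgoals
        state = achieve(subgoal, state, trace)
    trace.append(name)
    return (state | op["add"]) - op["remove"]

trace = []
state = achieve("desk at window", {"desk at north wall"}, trace)
print(trace)  # ['remove_drawers', 'carry_desk']
```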
An important feature of Newell and Simon’s model is that it operates within the
constraints of the human processing system, and so searching the problem space is
limited by the capacity of short-term memory, and the speed at which information
can be retrieved. Within the problem space framework, experience allows us to solve
problems more easily since we can structure the problem space appropriately and
choose operators efficiently.
Newell and Simon's theory, and their General Problem Solver model which is based
on it, have largely been applied to problem solving in well-defined domains, for
example solving puzzles. These problems may be unfamiliar but the knowledge that
is required to solve them is present in the statement of the problem and the expected
solution is clear. In real-world problems finding the knowledge required to solve
the problem may be part of the problem, or specifying the goal may be difficult.
Problems such as these require significant domain knowledge: for example, to solve
a programming problem you need knowledge of the language and the domain in
which the program operates. In this instance specifying the goal clearly may be a
significant part of solving the problem.
However, the problem space framework provides a clear theory of problem
solving, which can be extended, as we shall see when we look at skill acquisition in
the next section, to deal with knowledge-intensive problem solving. First we will look
briefly at the use of analogy in problem solving.
Worked exercise Identify the goals and operators involved in the problem ‘delete the second paragraph of the
document’ on a word processor. Now use a word processor to delete a paragraph and note
your actions, goals and subgoals. How well did they match your earlier description?
Answer Assume you have a document open and you are at some arbitrary position within it.
You also need to decide which operators are available and what their preconditions and
results are. Based on an imaginary word processor we assume the following operators
(you may wish to use your own WP package):
Operator            Precondition                   Result
delete_paragraph    Cursor at start of paragraph   Paragraph deleted
move_to_paragraph   Cursor anywhere in document    Cursor moves to start of next
                                                   paragraph (no effect if there
                                                   is no next paragraph)
move_to_start       Cursor anywhere in document    Cursor at start of document
Goal: delete second paragraph in document
Looking at the operators, an obvious one to resolve this goal is delete_paragraph, which
has the precondition ‘cursor at start of paragraph’. We therefore have a new subgoal:
move_to_paragraph. Its precondition is ‘cursor anywhere in document’ (which we can
meet), but we want the second paragraph so we must initially be in the first.
We set up a new subgoal, move_to_start, with precondition ‘cursor anywhere in docu-
ment’ and result ‘cursor at start of document’. We can then apply move_to_paragraph
and finally delete_paragraph.
We assume some knowledge here (that the second paragraph is the paragraph after the
first one).
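The subgoaling above can also be expressed as search through the problem space. The following is an illustrative sketch (not from the exercise itself): the state encoding and paragraph names are our own simplification, while the operator names and preconditions follow the table above. A breadth-first search recovers exactly the plan derived in the answer.

```python
from collections import deque

# State = (paragraphs, cursor paragraph index, cursor at start of paragraph?).
START = (("p1", "p2", "p3"), 2, False)   # cursor somewhere in the last paragraph

def move_to_start(paras, cur, at_start):
    return (paras, 0, True)              # precondition: cursor anywhere

def move_to_paragraph(paras, cur, at_start):
    if cur + 1 < len(paras):             # precondition: a next paragraph exists
        return (paras, cur + 1, True)
    return None

def delete_paragraph(paras, cur, at_start):
    if at_start:                         # precondition: cursor at start of paragraph
        rest = paras[:cur] + paras[cur + 1:]
        return (rest, min(cur, len(rest) - 1), True)
    return None

OPERATORS = [move_to_start, move_to_paragraph, delete_paragraph]

def plan(goal_test):
    """Breadth-first search through the problem space; returns operator names."""
    frontier = deque([(START, [])])
    seen = {START}
    while frontier:
        state, path = frontier.popleft()
        if goal_test(state):
            return path
        for op in OPERATORS:
            nxt = op(*state)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [op.__name__]))
    return None

print(plan(lambda s: s[0] == ("p1", "p3")))
# -> ['move_to_start', 'move_to_paragraph', 'delete_paragraph']
```

The search finds the same three-step plan as the hand derivation: because the cursor starts at an arbitrary position, move_to_start must come first.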
Analogy in problem solving
A third element of problem solving is the use of analogy. Here we are interested in
how people solve novel problems. One suggestion is that this is done by mapping
knowledge relating to a similar known domain to the new problem – called analo-
gical mapping. Similarities between the known domain and the new one are noted
and operators from the known domain are transferred to the new one.
This process has been investigated using analogous stories. Gick and Holyoak
[149] gave subjects the following problem:
A doctor is treating a malignant tumor. In order to destroy it he needs to blast
it with high-intensity rays. However, these will also destroy the healthy tissue sur-
rounding the tumor. If he lessens the rays’ intensity the tumor will remain. How does
he destroy the tumor?
The solution to this problem is to fire low-intensity rays from different directions
converging on the tumor. That way, the healthy tissue receives harmless low-
intensity rays while the tumor receives the rays combined, making a high-intensity
dose. The investigators found that only 10% of subjects reached this solution with-
out help. However, this rose to 80% when they were given this analogous story and
told that it may help them:
A general is attacking a fortress. He can’t send all his men in together as the roads are
mined to explode if large numbers of men cross them. He therefore splits his men into
small groups and sends them in on separate roads.
In spite of this, it seems that people often miss analogous information, unless it is
semantically close to the problem domain. When subjects were not told to use the
story, many failed to see the analogy. However, the number spotting the analogy rose
when the story was made semantically close to the problem, for example a general
using rays to destroy a castle.
The use of analogy is reminiscent of the Gestalt view of productive restructuring
and insight. Old knowledge is used to solve a new problem.
1.4.3 Skill acquisition
All of the problem solving that we have considered so far has concentrated on
handling unfamiliar problems. However, for much of the time, the problems that
we face are not completely new. Instead, we gradually acquire skill in a particular
domain area. But how is such skill acquired and what difference does it make to our
problem-solving performance? We can gain insight into how skilled behavior works,
and how skills are acquired, by considering the difference between novice and expert
behavior in given domains.
Chess: of human and artificial intelligence
A few years ago, Deep Blue, a chess-playing computer, beat Garry Kasparov, the world’s top
Grand Master, in a full tournament. This was the long-awaited breakthrough for the artificial
intelligence (AI) community, who have traditionally seen chess as the ultimate test of their art.
However, despite the fact that computer chess programs can play at Grand Master level against
human players, this does not mean they play in the same way. For each move played, Deep Blue
investigated many millions of alternative moves and counter-moves. In contrast, a human chess
player will only consider a few dozen. But, if the human player is good, these will usually be the
right few dozen. The ability to spot patterns allows a human to address a problem with far less
effort than a brute force approach. In chess, the number of moves is such that, in the end, brute
force applied fast enough has overcome human pattern-matching skill. In Go, which has far more
possible moves, computer programs do not even reach a good club level of play. Many models of the
mental processes have been heavily influenced by computation. It is worth remembering that
although there are similarities, computer ‘intelligence’ is very different from that of humans.
A commonly studied domain is chess playing. It is particularly suitable since it
lends itself easily to representation in terms of problem space theory. The initial state
is the opening board position; the goal state is one player checkmating the other;
operators to move states are legal moves of chess. It is therefore possible to examine
skilled behavior within the context of the problem space theory of problem solving.
Studies of chess players by DeGroot, Chase and Simon, among others, produced
some interesting observations [64, 65, 88, 89]. In all the experiments the behavior of
chess masters was compared with less experienced chess players. The first observa-
tion was that players did not consider large numbers of moves in choosing their
move, nor did they look ahead more than six moves (often far fewer). Masters con-
sidered no more alternatives than the less experienced, but they took less time to
make a decision and produced better moves.
So what makes the difference between skilled and less skilled behavior in chess?
It appears that chess masters remember board configurations and good moves
associated with them. When given actual board positions to remember, masters
are much better at reconstructing the board than the less experienced. However,
when given random configurations (which were unfamiliar), the groups of players
were equally bad at reconstructing the positions. It seems therefore that expert
players ‘chunk’ the board configuration in order to hold it in short-term memory.
Expert players use larger chunks than the less experienced and can therefore re-
member more detail.
This behavior is also seen among skilled computer programmers. They can also
reconstruct programs more effectively than novices since they have the structures
available to build appropriate chunks. They acquire plans representing code to solve
particular problems. When that problem is encountered in a new domain or new
program they will recall that particular plan and reuse it.
Another observed difference between skilled and less skilled problem solving is
in the way that different problems are grouped. Novices tend to group problems
according to superficial characteristics such as the objects or features common to
both. Experts, on the other hand, demonstrate a deeper understanding of the prob-
lems and group them according to underlying conceptual similarities which may not
be at all obvious from the problem descriptions.
Each of these differences stems from a better encoding of knowledge in the expert:
information structures are fine tuned at a deep level to enable efficient and accurate
retrieval. But how does this happen? How is skill such as this acquired? One model
of skill acquisition is Anderson’s ACT* model [14]. ACT* identifies three basic
levels of skill:
1. The learner uses general-purpose rules which interpret facts about a problem.
This is slow and demanding on memory access.
2. The learner develops rules specific to the task.
3. The rules are tuned to speed up performance.
General mechanisms are provided to account for the transitions between these
levels. For example, proceduralization is a mechanism to move from the first to the
second. It removes the parts of the rule which demand memory access and replaces
variables with specific values. Generalization, on the other hand, is a mechanism
which moves from the second level to the third. It generalizes from the specific cases
to general properties of those cases. Commonalities between rules are condensed to
produce a general-purpose rule.
These are best illustrated by example. Imagine you are learning to cook. Initially
you may have a general rule to tell you how long a dish needs to be in the oven, and
a number of explicit representations of dishes in memory. You can instantiate the
rule by retrieving information from memory.
IF cook[type, ingredients, time]
THEN
cook for: time
cook[casserole, [chicken,carrots,potatoes], 2 hours]
cook[casserole, [beef,dumplings,carrots], 2 hours]
cook[cake, [flour,sugar,butter,eggs], 45 mins]
Gradually your knowledge becomes proceduralized and you have specific rules for
each case:
IF type is casserole
AND ingredients are [chicken,carrots,potatoes]
THEN
cook for: 2 hours
IF type is casserole
AND ingredients are [beef,dumplings,carrots]
THEN
cook for: 2 hours
IF type is cake
AND ingredients are [flour,sugar,butter,eggs]
THEN
cook for: 45 mins
Finally, you may generalize from these rules to produce general-purpose rules, which
exploit their commonalities:
IF type is casserole
AND ingredients are ANYTHING
THEN
cook for: 2 hours
The first stage uses knowledge extensively. The second stage relies upon known
procedures. The third stage represents skilled behavior. Such behavior may in fact
become automatic and as such be difficult to make explicit. For example, think of
an activity at which you are skilled, perhaps driving a car or riding a bike. Try to
describe to someone the exact procedure which you go through to do this. You will
find this quite difficult. In fact experts tend to have to rehearse their actions mentally
in order to identify exactly what they do. Such skilled behavior is efficient but may
cause errors when the context of the activity changes.
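The three levels can also be illustrated in code. This is our own sketch of the idea, not ACT*’s actual production-system notation: level 1 interprets declarative facts through one general rule, level 2 compiles each fact into its own rule, and level 3 condenses common rules.

```python
# Level 1: a general-purpose rule interprets facts held in declarative memory
# (slow, demanding on memory access).
FACTS = {
    ("casserole", ("chicken", "carrots", "potatoes")): "2 hours",
    ("casserole", ("beef", "dumplings", "carrots")): "2 hours",
    ("cake", ("flour", "sugar", "butter", "eggs")): "45 mins",
}

def cook_interpretive(dish, ingredients):
    return FACTS[(dish, tuple(ingredients))]      # retrieve cook[...] from memory

# Level 2: proceduralization compiles each fact into a task-specific rule,
# removing the memory lookup and replacing variables with specific values.
def make_specific_rule(dish, ingredients, time):
    def rule(d, i):
        return time if (d, tuple(i)) == (dish, ingredients) else None
    return rule

specific_rules = [make_specific_rule(d, i, t) for (d, i), t in FACTS.items()]

def cook_procedural(dish, ingredients):
    for rule in specific_rules:
        result = rule(dish, ingredients)
        if result is not None:
            return result

# Level 3: generalization condenses the common casserole rules into one
# ("ingredients are ANYTHING").
def cook_general(dish, ingredients):
    if dish == "casserole":
        return "2 hours"
    if dish == "cake":
        return "45 mins"

print(cook_procedural("cake", ["flour", "sugar", "butter", "eggs"]))  # -> 45 mins
print(cook_general("casserole", ["lamb", "leeks"]))                   # -> 2 hours
```

Note how the generalized rule succeeds on a casserole it has never seen, while the interpretive and proceduralized versions only cover the stored cases.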
1.4.4 Errors and mental models
Human capability for interpreting and manipulating information is quite impres-
sive. However, we do make mistakes. Some are trivial, resulting in no more than
temporary inconvenience or annoyance. Others may be more serious, requiring
substantial effort to correct. Occasionally an error may have catastrophic effects, as
we see when ‘human error’ results in a plane crash or nuclear plant leak.
Why do we make mistakes and can we avoid them? In order to answer the latter
part of the question we must first look at what is going on when we make an error.
There are several different types of error. As we saw in the last section some errors
result from changes in the context of skilled behavior. If a pattern of behavior has
become automatic and we change some aspect of it, the more familiar pattern may
break through and cause an error. A familiar example of this is where we intend to
stop at the shop on the way home from work but in fact drive past. Here, the activ-
ity of driving home is the more familiar and overrides the less familiar intention.
Other errors result from an incorrect understanding, or model, of a situation or
system. People build their own theories to understand the causal behavior of sys-
tems. These have been termed mental models. They have a number of characteristics.
Mental models are often partial: the person does not have a full understanding of the
working of the whole system. They are unstable and are subject to change. They can
be internally inconsistent, since the person may not have worked through the logical
consequences of their beliefs. They are often unscientific and may be based on super-
stition rather than evidence. Often they are based on an incorrect interpretation of
the evidence.
DESIGN FOCUS
Human error and false memories
In the second edition of this book, one of the authors added the following story:
During the Second World War a new cockpit design was introduced for Spitfires. The pilots were trained
and flew successfully during training, but would unaccountably bail out when engaged in dog fights. The new
design had exchanged the positions of the gun trigger and ejector controls. In the heat of battle the old
responses resurfaced and the pilots ejected. Human error, yes, but the designer’s error, not the pilot’s.
It is a good story, but after the book was published we got several emails saying ‘Spitfires didn’t have
ejector seats’. It was Kai-Mikael Jää-Aro who was able to find what may have been the original to the
story (and incidentally inform us what model of Spitfire was in our photo and who the pilot was!). He
pointed us to and translated the story of Sierra 44, an S35E Draken reconnaissance aircraft.¹ The full
story involves just about every perceptual and cognitive error imaginable, but the point that links to
the (false) Spitfire story is that in the Draken the red buttons for releasing the fuel ‘drop’ tanks and for
the canopy release differed only in very small writing. In an emergency (burning fuel tanks) the pilot
accidentally released the canopy and so ended up flying home cabriolet style.
There is a second story of human error here – the author’s memory. When the book was written he
could not recall where he had come across the story but was convinced it was to do with a Spitfire. It
may be that he had been told the story by someone else who had got it mixed up, but it is as likely that
he simply remembered the rough outline of the story and then ‘reconstructed’ the rest. In fact that is
exactly how our memories work. Our brains do not bother to lay down every little detail, but when
we ‘remember’ we rebuild what the incident ‘must have been’ using our world knowledge. This process
is completely unconscious and can lead to what are known as false memories. This is particularly
problematic in witness statements in criminal trials as early questioning by police or lawyers can
unintentionally lead to witnesses being sure they have seen things that they have not. Numerous
controlled psychological experiments have demonstrated this effect, which furthermore is strongly
influenced by biasing factors such as the race of supposed criminals.
To save his blushes we have not said here which author’s failing memory was responsible for the Spitfire
story, but you can read more on this story and also find who it was on the book website at:
/e3/online/spitfire/
Photograph courtesy of popperfoto.com
1. Pej Kristoffersson, 1984. Sigurd 44 – Historien om hur man gör bort sig så att det märks, Flygrevyn 2/1984, pp. 44–6.

Assuming a person builds a mental model of the system being dealt with, errors
may occur if the actual operation differs from the mental model. For example, on
one occasion we were staying in a hotel in Germany, attending a conference. In the
lobby of the hotel was a lift. Beside the lift door was a button. Our model of the
system, based on previous experience of lifts, was that the button would call the lift. We
pressed the button and the lobby light went out! In fact the button was a light switch
and the lift button was on the inside rim of the lift, hidden from view.
Although both the light switch and the lift button were inconsistent with our men-
tal models of these controls, we would probably have managed if they had been
encountered separately. If there had been no button beside the lift we would have
looked more closely and found the one on the inner rim. But since the light switch
reflected our model of a lift button we looked no further. During our stay we
observed many more new guests making the same error.
This illustrates the importance of a correct mental model and the dangers of
ignoring conventions. There are certain conventions that we use to interpret the
world and ideally designs should support these. If these are to be violated, explicit
support must be given to enable us to form a correct mental model. A label on the
button saying ‘light switch’ would have been sufficient.
1.5 EMOTION
So far in this chapter we have concentrated on human perceptual and cognitive abil-
ities. But human experience is far more complex than this. Our emotional response
to situations affects how we perform. For example, positive emotions enable us to
think more creatively, to solve complex problems, whereas negative emotion pushes
us into narrow, focussed thinking. A problem that may be easy to solve when we are
relaxed, will become difficult if we are frustrated or afraid.
Psychologists have studied emotional response for decades and there are many
theories as to what is happening when we feel an emotion and why such a response
occurs. More than a century ago, William James proposed what has become known
as the James–Lange theory (Lange was a contemporary of James whose theories
were similar): that emotion was the interpretation of a physiological response, rather
than the other way around. So while we may feel that we respond to an emotion,
James contended that we respond physiologically to a stimulus and interpret that as
emotion:
Common sense says, we lose our fortune, are sorry and weep; we meet a bear, are
frightened and run; we are insulted by a rival, are angry and strike. The hypothesis
here...is that we feel sorry because we cry, angry because we strike, afraid because we
tremble.
(W. James, Principles of Psychology, page 449. Henry Holt, New York, 1890.)
Others, however, disagree. Cannon [54a], for example, argued that our physio-
logical processes are in fact too slow to account for our emotional reactions, and that
the physiological responses for some emotional states are too similar (e.g. anger
and fear), yet they can be easily distinguished. Studies using drugs that stimulate
broadly the same physiological responses as anger or fear seem to support this:
participants reported physical symptoms but not the emotion,
which suggests that emotional response is more than a recognition of physiological
changes.
Schachter and Singer [312a] proposed a third interpretation: that emotion results
from a person evaluating physical responses in the light of the whole situation. So
whereas the same physiological response can result from a range of different situ-
ations, the emotion that is felt is based on a cognitive evaluation of the circumstance
and will depend on what the person attributes this to. So the same physiological
response of a pounding heart will be interpreted as excitement if we are in a com-
petition and fear if we find ourselves under attack.
Whatever the exact process, what is clear is that emotion involves both physical
and cognitive events. Our body responds biologically to an external stimulus and we
interpret that in some way as a particular emotion. That biological response – known
as affect – changes the way we deal with different situations, and this has an impact
on the way we interact with computer systems. As Donald Norman says:
Negative affect can make it harder to do even easy tasks; positive affect can make it
easier to do difficult tasks.
(D. A. Norman, Emotion and design: attractive things work better.
Interactions Magazine, ix(4): 36–42, 2002.)
So what are the implications of this for design? It suggests that in situations of
stress, people will be less able to cope with complex problem solving or managing
difficult interfaces, whereas if people are relaxed they will be more forgiving of
limitations in the design. This does not give us an excuse to design bad interfaces
but does suggest that if we build interfaces that promote positive responses – for
example by using aesthetics or reward – then they are likely to be more successful.
1.6 INDIVIDUAL DIFFERENCES
In this chapter we have been discussing humans in general. We have made the
assumption that everyone has similar capabilities and limitations and that we
can therefore make generalizations. To an extent this is true: the psychological
principles and properties that we have discussed apply to the majority of people.
Notwithstanding this, we should remember that, although we share processes in
common, humans, and therefore users, are not all the same. We should be aware of
individual differences so that we can account for them as far as possible within our
designs. These differences may be long term, such as sex, physical capabilities and
intellectual capabilities. Others are shorter term and include the effect of stress
or fatigue on the user. Still others change through time, such as age.
These differences should be taken into account in our designs. It is useful to
consider, for any design decision, if there are likely to be users within the target
group who will be adversely affected by our decision. At the extremes a decision may
exclude a section of the user population. For example, the current emphasis on visual
interfaces excludes those who are visually impaired, unless the design also makes use
of the other sensory channels. On a more mundane level, designs should allow for
users who are under pressure, feeling ill or distracted by other concerns: they should
not push users to their perceptual or cognitive limits.
We will consider the issues of universal accessibility in more detail in Chapter 10.
1.7 PSYCHOLOGY AND THE DESIGN OF INTERACTIVE SYSTEMS
So far we have looked briefly at the way in which humans receive, process and
store information, solve problems and acquire skill. But how can we apply what we
have learned to designing interactive systems? Sometimes, straightforward conclu-
sions can be drawn. For example, we can deduce that recognition is easier than recall
and allow users to select commands from a set (such as a menu) rather than input
them directly. However, in the majority of cases, application is not so obvious
or simple. In fact, it may be dangerous, leading us to make generalizations which are
not valid. In order to apply a psychological principle or result properly in design,
we need to understand its context, both in terms of where it fits in the wider field
of psychology and in terms of the details of the actual experiments, the measures
used and the subjects involved, for example. This may appear daunting, particularly
to the novice designer who wants to acknowledge the relevance of cognitive psy-
chology but does not have the background to derive appropriate conclusions.
Fortunately, principles and results from research in psychology have been distilled
into guidelines for design, models to support design and techniques for evaluating
design. Parts 2 and 3 of this book include discussion of a range of guidelines,
models and techniques, based on cognitive psychology, which can be used to support
the design process.
1.7.1 Guidelines
Throughout this chapter we have discussed the strengths and weaknesses of human
cognitive and perceptual processes but, for the most part, we have avoided attempt-
ing to apply these directly to design. This is because such an attempt could only
be partial and simplistic, and may give the impression that this is all psychology
has to offer.
However, general design principles and guidelines can be and have been derived
from the theories we have discussed. Some of these are relatively straightforward:
for instance, recall is assisted by the provision of retrieval cues so interfaces should
incorporate recognizable cues wherever possible. Others are more complex and con-
text dependent. In Chapter 7 we discuss principles and guidelines further, many of
which are derived from psychological theory. The interested reader is also referred to
Gardiner and Christie [140] which illustrates how guidelines can be derived from
psychological theory.
1.7.2 Models to support design
As well as guidelines and principles, psychological theory has led to the development
of analytic and predictive models of user behavior. Some of these include a specific
model of human problem solving, others of physical activity, and others attempt
a more comprehensive view of cognition. Some predict how a typical computer
user would behave in a given situation, others analyze why particular user behavior
occurred. All are based on cognitive theory. We discuss these models in detail in
Chapter 12.
1.7.3 Techniques for evaluation
In addition to providing us with a wealth of theoretical understanding of the human
user, psychology also provides a range of empirical techniques which we can employ
to evaluate our designs and our systems. In order to use these effectively we need to
understand the scope and benefits of each method. Chapter 9 provides an overview
of these techniques and an indication of the circumstances under which each should
be used.
Worked exercise Produce a semantic network of the main information in this chapter.
Answer This network is potentially huge so it is probably unnecessary to devise the whole thing!
Be selective. One helpful way to tackle the exercise is to approach it in both a top-down
and a bottom-up manner. Top-down will give you a general overview of topics and how
they relate; bottom-up can fill in the details of a particular field. These can then be
‘glued’ together to build up the whole picture. You may be able to tackle this problem
in a group, each taking one part of it. We will not provide the full network here but will
give examples of the level of detail anticipated for the overview and the detailed
versions. In the overview we have not included labels on the arcs for clarity.
[Figure: top-down view of the network]
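As a sketch of what a detailed (bottom-up) fragment might contain, a semantic network can be written down as labeled arcs between nodes. The node and relation names below are our own choices, drawn from this chapter’s discussion of memory:

```python
# A fragment of the chapter's content as labeled (node, relation, node) arcs.
ARCS = [
    ("the human", "includes", "memory"),
    ("the human", "includes", "problem solving"),
    ("memory", "has_part", "sensory memory"),
    ("memory", "has_part", "short-term memory"),
    ("memory", "has_part", "long-term memory"),
    ("long-term memory", "has_part", "episodic memory"),
    ("long-term memory", "has_part", "semantic memory"),
    ("short-term memory", "has_property", "limited capacity"),
]

def related(node, relation):
    """Nodes reached from `node` along arcs labeled `relation`."""
    return [b for a, r, b in ARCS if a == node and r == relation]

print(related("memory", "has_part"))
# -> ['sensory memory', 'short-term memory', 'long-term memory']
```

Listing the arcs explicitly is one way to build up your own network fragment before drawing it, whether working alone or dividing the topics among a group.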
1.8 SUMMARY
In this chapter we have considered the human as an information processor, re-
ceiving inputs from the world, storing, manipulating and using information, and
reacting to the information received. Information is received through the senses,
particularly, in the case of computer use, through sight, hearing and touch. It is
stored in memory, either temporarily in sensory or working memory, or perman-
ently in long-term memory. It can then be used in reasoning and problem solving.
Recurrent familiar situations allow people to acquire skills in a particular domain, as
their information structures become better defined. However, this can also lead to
error, if the context changes.
Human perception and cognition are complex and sophisticated but they are not
without their limitations. We have considered some of these limitations in this chap-
ter. An understanding of the capabilities and limitations of the human as informa-
tion processor can help us to design interactive systems which support the former
and compensate for the latter. The principles, guidelines and models which can be
derived from cognitive psychology and the techniques which it provides are invalu-
able tools for the designer of interactive systems.
EXERCISES
1.1 Devise experiments to test the properties of (i) short-term memory, (ii) long-term
memory, using the experiments described in this chapter to help you. Try out your experiments
on your friends. Are your results consistent with the properties described in this chapter?
1.2 Observe skilled and novice operators in a familiar domain, for example touch and ‘hunt-and-peck’
typists, expert and novice game players, or expert and novice users of a computer application.
What differences can you discern between their behaviors?
1.3 From what you have learned about cognitive psychology devise appropriate guidelines for use by
interface designers. You may find it helpful to group these under key headings, for example visual
perception, memory, problem solving, etc., although some may overlap such groupings.
1.4 What are mental models, and why are they important in interface design?
1.5 What can a system designer do to minimize the memory load of the user?
1.6 Human short-term memory has a limited span. This is a series of experiments to determine what
that span is. (You will need some other people to take part in these experiments with you – they
do not need to be studying the course – try it with a group of friends.)
(a) Kim’s game
Divide into groups. Each group gathers together an assortment of objects – pens, pencils, paper-
clips, books, sticky notes, etc. The stranger the object, the better! You need a large number of
them – at least 12 to 15. Place them in some compact arrangement on a table, so that all items
are visible. Then, swap with another group for 30 seconds only and look at their pile. Return to
your table, and on your own try to write down all the items in the other group’s pile.
Compare your list with what they actually have in their pile. Compare the number of things you
remembered with how the rest of your group did. Now think introspectively: what helped you
remember certain things? Did you recognize things in their pile that you had in yours? Did that
help? Do not pack the things away just yet.
Calculate the average score for your group. Compare that with the averages from the other
group(s).
Questions: What conclusions can you draw from this experiment? What does this indicate
about the capacity of short-term memory? What does it indicate that helps improve the capa-
city of short-term memory?
(b) ‘I went to market...’
In your group, one person starts off with ‘I went to market and I bought a fish’ (or some other
produce, or whatever!). The next person continues ‘I went to market and I bought a fish and
I bought a bread roll as well’. The process continues, with each person adding some item to
the list each time. Keep going around the group until you cannot remember the list accurately.
Make a note of the first time someone gets it wrong, and then record the number of items
that you can successfully remember. Some of you will find it hard to remember more than a few,
others will fare much better. Do this a few more times with different lists, and then calculate your
average score, and your group’s average score.
Questions: What does this tell you about short-term memory? What do you do that helps you
remember? What do you estimate is the typical capacity of human short-term memory? Is this a
good test for short-term memory?
(c) Improving your memory
Try experiment 1.6(a) again, using the techniques on page 39.
Has your recall ability improved? Has your group’s average improved? What does this show you
about memory?
1.7 Locate one source (through the library or the web) that reports on empirical evidence on human
limitations. Provide a full reference to the source. In one paragraph, summarize what the result of
the research states in terms of a physical human limitation.
In a separate paragraph, write your thoughts on how you think this evidence on human
capabilities impacts interactive system design.
RECOMMENDED READING
E. B. Goldstein, Sensation and Perception, 6th edition, Wadsworth, 2001.
A textbook covering human senses and perception in detail. Easy to read with
many home experiments to illustrate the points made.
A. Baddeley, Human Memory: Theory and Practice, revised edition, Allyn & Bacon, 1997.
The latest and most complete of Baddeley’s texts on memory. Provides up-to-date
discussion on the different views of memory structure as well as a detailed survey
of experimental work on memory.
M. W. Eysenck and M. T. Keane, Cognitive Psychology: A Student’s Handbook, 4th
edition, Psychology Press, 2000.
A comprehensive and readable textbook giving more detail on cognitive psychology,
including memory, problem solving and skill acquisition.
S. K. Card, T. P. Moran and A. Newell, The Psychology of Human–Computer
Interaction, Lawrence Erlbaum Associates, 1983.
A classic text looking at the human as an information processor in interaction
with the computer. Develops and describes the Model Human Processor in detail.
A. Newell and H. Simon, Human Problem Solving, Prentice Hall, 1972.
Describes the problem space view of problem solving in more detail.
M. M. Gardiner and B. Christie, editors, Applying Cognitive Psychology to
User-Interface Design, John Wiley, 1987.
A collection of essays on the implications of different aspects of cognitive psychology
to interface design. Includes memory, thinking, language and skill acquisition.
Provides detailed guidelines for applying psychological principles in design practice.
A. Monk, editor, Fundamentals of Human Computer Interaction, Academic Press, 1985.
A good collection of articles giving brief coverage of aspects of human psychology
including perception, memory, thinking and reading. Also contains articles on
experimental design which provide useful introductions.
ACT-R site. Website of resources and examples of the use of the cognitive
architecture ACT-R, which is the latest development of Anderson’s ACT model,
http://act-r.psy.cmu.edu/
THE COMPUTER
OVERVIEW
A computer system comprises various elements, each of which affects the
user of the system.
- Input devices for interactive use, allowing text entry, drawing and
  selection from the screen:
  - text entry: traditional keyboard, phone text entry, speech and handwriting
  - pointing: principally the mouse, but also touchpad, stylus and others
  - 3D interaction devices.
- Output display devices for interactive use:
  - different types of screen mostly using some form of bitmap display
  - large displays and situated displays for shared and public use
  - digital paper may be usable in the near future.
- Virtual reality systems and 3D visualization which have special interaction
  and display devices.
- Various devices in the physical world:
  - physical controls and dedicated displays
  - sound, smell and haptic feedback
  - sensors for nearly everything including movement, temperature, bio-signs.
- Paper output and input: the paperless office and the less-paper office:
  - different types of printers and their characteristics, character styles
    and fonts
  - scanners and optical character recognition.
- Memory:
  - short-term memory: RAM
  - long-term memory: magnetic and optical disks
  - capacity limitations related to document and video storage
  - access methods as they limit or help the user.
- Processing:
  - the effects when systems run too slow or too fast, the myth of the
    infinitely fast machine
  - limitations on processing speed
  - networks and their impact on system performance.
60 Chapter 2 The computer
2.1 INTRODUCTION
In order to understand how humans interact with computers, we need to have an
understanding of both parties in the interaction. The previous chapter explored
aspects of human capabilities and behavior of which we need to be aware in the
context of human–computer interaction; this chapter considers the computer and
associated input–output devices and investigates how the technology influences the
nature of the interaction and style of the interface.
We will concentrate principally on the traditional computer but we will also look
at devices that take us beyond the closed world of keyboard, mouse and screen. As
well as giving us lessons about more traditional systems, these are increasingly
becoming important application areas in HCI.
Exercise: how many computers?
In a group or class do a quick survey:
- How many computers do you have in your home?
- How many computers do you normally carry with you in your pockets or bags?
Collate the answers and see who the techno-freaks are!
Discuss your answers.
After doing this look at /e3/online/how-many-computers/
When we interact with computers, what are we trying to achieve? Consider what
happens when we interact with each other – we are either passing information to
other people, or receiving information from them. Often, the information we receive
is in response to the information that we have recently imparted to them, and we
may then respond to that. Interaction is therefore a process of information transfer.
Relating this to the electronic computer, the same principles hold: interaction is a
process of information transfer, from the user to the computer and from the com-
puter to the user.
The first part of this chapter concentrates on the transference of information from
the user to the computer and back. We begin by considering a current typical com-
puter interface and the devices it employs, largely variants of keyboard for text entry
(Section 2.2), mouse for positioning (Section 2.3) and screen for displaying output
(Section 2.4).
Then we move on to consider devices that go beyond the keyboard, mouse and
screen: entering deeper into the electronic world with virtual reality and 3D interaction
2.1 Introduction 61
(Section 2.5) and outside the electronic world looking at more physical interactions
(Section 2.6).
In addition to direct input and output, information is passed to and fro via
paper documents. This is dealt with in Section 2.7, which describes printers and
scanners. Although not requiring the same degree of user interaction as a mouse
or keyboard, these are an important means of input and output for many current
applications.
We then consider the computer itself, its processor and memory devices and
the networks that link them together. We note how the technology drives and
empowers the interface. The details of computer processing should largely be irrelev-
ant to the end-user, but the interface designer needs to be aware of the limitations
of storage capacity and computational power; it is no good designing on paper a
marvellous new interface, only to find it needs a Cray to run. Software designers
often have high-end machines on which to develop applications, and it is easy to
forget what a more typical configuration feels like.
Before looking at these devices and technology in detail we’ll take a quick
bird’s-eye view of the way computer systems are changing.
2.1.1 A typical computer system
Consider a typical computer setup as shown in Figure 2.1. There is the computer
‘box’ itself, a keyboard, a mouse and a color screen. The screen layout is shown
alongside it. If we examine the interface, we can see how its various characteristics
are related to the devices used. The details of the interface itself, its underlying prin-
ciples and design, are discussed in more depth in Chapter 3. As we shall see there are
variants on these basic devices. Some of this variation is driven by different hardware
configurations: desktop use, laptop computers, PDAs (personal digital assistants).
Partly the diversity of devices reflects the fact that there are many different types of
Figure 2.1 A typical computer system
data that may have to be entered into and obtained from a system, and there are also
many different types of user, each with their own unique requirements.
2.1.2 Levels of interaction – batch processing
In the early days of computing, information was entered into the computer in a
large mass – batch data entry. There was minimal interaction with the machine: the
user would simply dump a pile of punched cards onto a reader, press the start
button, and then return a few hours later. This still continues today although now
with pre-prepared electronic files or possibly machine-read forms. It is clearly the
most appropriate mode for certain kinds of application, for example printing pay
checks or entering the results from a questionnaire.
With batch processing the interactions take place over hours or days. In contrast
the typical desktop computer system has interactions taking seconds or fractions of
a second (or with slow web pages sometimes minutes!). The field of Human–
Computer Interaction largely grew due to this change in interactive pace. It is easy to
assume that faster means better, but some of the paper-based technology discussed
in Section 2.7 suggests that sometimes slower paced interaction may be better.
2.1.3 Richer interaction – everywhere, everywhen
Computers are coming out of the box! Information appliances are putting internet
access or dedicated systems onto the fridge, microwave and washing machine: to
automate shopping, give you email in your kitchen or simply call for maintenance
when needed. We carry with us WAP phones and smartcards, have security systems
that monitor us and web cams that show our homes to the world. Is Figure 2.1 really
the typical computer system or is it really more like Figure 2.2?
Figure 2.2 A typical computer system? Photo courtesy Electrolux
2.2 Text entry devices 63
2.2 TEXT ENTRY DEVICES
Whether writing a book like this, producing an office memo, sending a thank you
letter after your birthday, or simply sending an email to a friend, entering text is
one of our main activities when using the computer. The most obvious means of
text entry is the plain keyboard, but there are several variations on this: different
keyboard layouts, ‘chord’ keyboards that use combinations of fingers to enter let-
ters, and phone key pads. Handwriting and speech recognition offer more radical
alternatives.
2.2.1 The alphanumeric keyboard
The keyboard is still one of the most common input devices in use today. It is used
for entering textual data and commands. The vast majority of keyboards have a stand-
ardized layout, and are known by the first six letters of the top row of alphabetical
keys, QWERTY. There are alternative designs which have some advantages over the
QWERTY layout, but these have not been able to overcome the vast technological
inertia of the QWERTY keyboard. These alternatives are of two forms: 26 key layouts
and chord keyboards. A 26 key layout rearranges the order of the alphabetic keys,
putting the most commonly used letters under the strongest fingers, or adopting
simpler practices. In addition to QWERTY, we will discuss two 26 key layouts,
alphabetic and DVORAK, and chord keyboards.
The QWERTY keyboard
The layout of the digits and letters on a QWERTY keyboard is fixed (see Figure 2.3),
but non-alphanumeric keys vary between keyboards. For example, there is a differ-
ence between key assignments on British and American keyboards (in particular,
above the 3 on the UK keyboard is the pound sign £, whilst on the US keyboard
there is a dollar sign $). The standard layout is also subject to variation in the place-
ment of brackets, backslashes and suchlike. In addition different national keyboards
include accented letters and the traditional French layout places the main letters in
different locations – the top line starts AZERTY.
Figure 2.3 The standard QWERTY keyboard
The QWERTY arrangement of keys is not optimal for typing, however. The
reason for the layout of the keyboard in this fashion can be traced back to the days
of mechanical typewriters. Hitting a key caused an arm to shoot towards the carriage,
imprinting the letter on the arm's head onto the ribbon and hence onto the paper. If two
arms flew towards the paper in quick succession from nearly the same angle,
they would often jam – the solution to this was to set out the keys so that common
combinations of consecutive letters were placed at different ends of the keyboard,
which meant that the arms would usually move from alternate sides. One appealing
story relating to the key layout is that it was also important for a salesman to be able
to type the word ‘typewriter’ quickly in order to impress potential customers: the
letters are all on the top row!
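The top-row claim is easy to verify for yourself. The short Python check below is purely illustrative (the function name and second example word are our own):

```python
TOP_ROW = set("qwertyuiop")

def on_top_row(word: str) -> bool:
    """True if every letter of the word lies on the QWERTY top row."""
    return set(word.lower()) <= TOP_ROW

print(on_top_row("typewriter"))  # True: t, y, p, e, w, r, i are all top-row keys
print(on_top_row("keyboard"))    # False
```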
The electric typewriter and now the computer keyboard are not subject to the
original mechanical constraints, but the QWERTY keyboard remains the dominant
layout. The reason for this is social – the vast base of trained typists would be reluct-
ant to relearn their craft, whilst the management is not prepared to accept an initial
lowering of performance whilst the new skills are gained. There is also a large invest-
ment in current keyboards, which would all have to be either replaced at great cost,
or phased out, with the subsequent requirement for people to be proficient on both
keyboards. As whole populations have become keyboard users this technological
inertia has probably become impossible to change.
How keyboards work
Current keyboards work by a keypress closing a connection, causing a character code to
be sent to the computer. The connection is usually via a lead, but wireless systems also exist. One
aspect of keyboards that is important to users is the ‘feel’ of the keys. Some keyboards require a
very hard press to operate the key, much like a manual typewriter, whilst others are featherlight.
The distance that the keys travel also affects the tactile nature of the keyboard. The keyboards that
are currently used on most notebook computers are ‘half-travel’ keyboards, where the keys travel
only a small distance before activating their connection; such a keyboard can feel dead to begin
with, but such qualitative judgments often change as people become more used to using it. By mak-
ing the actual keys thinner, and allowing them a much reduced travel, a lot of vertical space can be
saved on the keyboard, thereby making the machine slimmer than would otherwise be possible.
Some keyboards are even made of touch-sensitive buttons, which require a light touch and
practically no travel; they often appear as a sheet of plastic with the buttons printed on them.
Such keyboards are often found on shop tills, though the keys are not QWERTY, but specific to
the task. Being fully sealed, they have the advantage of being easily cleaned and resistant to dirty
environments, but have little feel, and are not popular with trained touch-typists. Feedback is
important even at this level of human–computer interaction! With the recent increase of repetit-
ive strain injury (RSI) to users’ fingers, and the increased responsibilities of employers in these
circumstances, it may be that such designs will enjoy a resurgence in the near future. RSI in fingers
is caused by the tendons that control the movement of the fingers becoming inflamed owing to
overuse and making repeated unnatural movements.
Ease of learning – alphabetic keyboard
One of the most obvious layouts to be produced is the alphabetic keyboard, in which
the letters are arranged alphabetically across the keyboard. It might be expected that
such a layout would make it quicker for untrained typists to use, but this is not the
case. Studies have shown that this keyboard is not faster for properly trained typists,
as we may expect, since there is no inherent advantage to this layout. And even for
novice or occasional users, the alphabetic layout appears to make very little differ-
ence to the speed of typing. These keyboards are used in some pocket electronic per-
sonal organizers, perhaps because the layout looks simpler to use than the QWERTY
one. Also, it dissuades people from attempting to use their touch-typing skills on a
very small keyboard and hence avoids criticisms of difficulty of use!
Ergonomics of use – DVORAK keyboard and split designs
The DVORAK keyboard uses a similar layout of keys to the QWERTY system, but
assigns the letters to different keys. Based upon an analysis of typing, the keyboard is
designed to help people reach faster typing speeds. It is biased towards right-handed
people, in that 56% of keystrokes are made with the right hand. The layout of the
keys also attempts to ensure that the majority of keystrokes alternate between hands,
thereby increasing the potential speed. By keeping the most commonly used keys on
the home, or middle, row, 70% of keystrokes are made without the typist having
to stretch far, thereby reducing fatigue and increasing keying speed. The layout also
There are a variety of specially shaped keyboards to relieve the strain of typing or to allow
people to type with some injury (e.g. RSI) or disability. These may slope the keys towards the
hands to improve the ergonomic position, be designed for single-handed use, or for no hands at
all. Some use bespoke key layouts to reduce strain of finger movements. The keyboard illustrated
is produced by PCD Maltron Ltd. for left-handed use. See www.maltron.com/
Source: www.maltron.com, reproduced courtesy of PCD Maltron Ltd.
aims to minimize the number of keystrokes made with the weak fingers. Many
of these requirements are in conflict, and the DVORAK keyboard represents one
possible solution. Experiments have shown that there is a speed improvement of
between 10 and 15%, coupled with a reduction in user fatigue due to the increased
ergonomic layout of the keyboard [230].
Other aspects of keyboard design have been altered apart from the layout of the
keys. A number of more ergonomic designs have appeared, in which the basic tilted
planar base of the keyboard is altered. Moderate designs curve the plane of the key-
board, making it concave, whilst more extreme ones split the keys into those for the
left and right hand and curve both halves separately. Often in these the keys are also
moved to bring them all within easy reach, to minimize movement between keys.
Such designs are supposed to aid comfort and reduce RSI by minimizing effort, but
have had practically no impact on the majority of systems sold.
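The home-row figures can be explored informally. The sketch below, under the assumption that a short English sample is representative enough, compares the fraction of letter keystrokes that fall on the home row under each layout (the sample text is arbitrary, and shift or punctuation keystrokes are ignored):

```python
QWERTY_HOME = set("asdfghjkl")
DVORAK_HOME = set("aoeuidhtns")

def home_row_fraction(text: str, home_row: set) -> float:
    """Fraction of letter keystrokes in the text that sit on the home row."""
    letters = [c for c in text.lower() if c.isalpha()]
    return sum(c in home_row for c in letters) / len(letters)

sample = ("the quick brown fox jumps over the lazy dog and then "
          "a short passage of ordinary english text for analysis")

# DVORAK puts the vowels and the commonest consonants on the home row,
# so its fraction comes out far higher than QWERTY's on typical English.
print(f"QWERTY home row: {home_row_fraction(sample, QWERTY_HOME):.0%}")
print(f"DVORAK home row: {home_row_fraction(sample, DVORAK_HOME):.0%}")
```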
2.2.2 Chord keyboards
Chord keyboards are significantly different from normal alphanumeric keyboards.
Only a few keys, four or five, are used (see Figure 2.4) and letters are produced by
pressing one or more of the keys at once. For example, in the Microwriter, the pat-
tern of multiple keypresses is chosen to reflect the actual letter shape.
Such keyboards have a number of advantages. They are extremely compact:
simply reducing the size of a conventional keyboard makes the keys too small and
close together, with a correspondingly large increase in the difficulty of using it. The
Figure 2.4 A very early chord keyboard (left) and its lettercodes (right)
learning time for the keyboard is supposed to be fairly short – of the order of a few
hours – but social resistance is still high. Moreover, they are capable of fast typing
speeds in the hands (or rather hand!) of a competent user. Chord keyboards can
also be used where only one-handed operation is possible, in cramped and confined
conditions.
Lack of familiarity means that these are unlikely ever to be a mainstream form of
text entry, but they do have applications in niche areas. In particular, courtroom
stenographers use a special form of two-handed chord keyboard and associated
shorthand to enter text at full spoken speed. Also it may be that the compact size and
one-handed operation will find a place in the growing wearables market.
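The chord idea can be sketched in a few lines. The chord table below is invented for illustration, not the Microwriter's actual letter-shape patterns:

```python
# Hypothetical chord table: each letter is a simultaneous combination of
# five keys numbered 0-4. Five keys give 2**5 - 1 = 31 non-empty
# combinations: more than enough for the alphabet.
CHORDS = {
    frozenset({0}): "a",
    frozenset({1}): "e",
    frozenset({0, 1}): "t",
    frozenset({2}): "n",
    frozenset({0, 2}): "s",
}

def decode(chord_sequence):
    """Translate a sequence of simultaneous keypress sets into text."""
    return "".join(CHORDS.get(frozenset(ch), "?") for ch in chord_sequence)

print(decode([{0, 1}, {1}, {0, 2}, {0, 1}]))  # → "test"
```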
DESIGN FOCUS
Numeric keypads
Alphanumeric keyboards (as the name suggests) include numbers as well as letters. In the QWERTY
layout these are in a line across the top of the keyboard, but in most larger keyboards there is also a
separate number pad to allow faster entry of digits. Number keypads occur in other contexts too,
including calculators, telephones and ATM cash dispensers. Many people are unaware that there are
two different layouts for numeric keypads: the calculator style that has ‘123’ on the bottom and the
telephone style that has ‘123’ at the top.
It is a demonstration of the amazing adaptability of humans that we move between these two styles
with such ease. However, if you need to include a numeric keypad in a device you must consider which
is most appropriate for your potential users. For example, computer keyboards use calculator-style
layout, as they are primarily used for entering numbers for calculations.
One of the authors was caught out by this once when he forgot the PIN number of his cash card. He
half remembered the digits, but also his fingers knew where to type, so he ‘practiced’ on his calculator.
Unfortunately ATMs use telephone-style layout!
(Figure: calculator, ATM and telephone keypads)
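The confusion in the anecdote can be reproduced mechanically. This sketch maps remembered finger positions on one layout onto the digits the other layout puts in those positions (a simplification that ignores the 0 key; the example PIN is made up):

```python
CALCULATOR = ["789", "456", "123"]  # '123' on the bottom row
TELEPHONE  = ["123", "456", "789"]  # '123' on the top row

def translate(digits: str, learned_on, typed_on) -> str:
    """What you actually enter when your fingers learned key positions on
    one layout but the device in front of you uses the other."""
    pos = {d: (r, c) for r, row in enumerate(learned_on)
                     for c, d in enumerate(row)}
    return "".join(typed_on[pos[d][0]][pos[d][1]] for d in digits)

# A PIN 'practiced' on a calculator comes out differently at the ATM:
print(translate("1485", CALCULATOR, TELEPHONE))  # → "7425"
```

Only the middle row ('456') survives the swap unchanged, which is why the mistake is so easy to make.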
2.2.3 Phone pad and T9 entry
With mobile phones being used for SMS text messaging (see Chapter 19) and WAP
(see Chapter 21), the phone keypad has become an important form of text input.
Unfortunately a phone only has digits 0–9, not a full alphanumeric keyboard.
To overcome this for text input the numeric keys are usually pressed several times
– Figure 2.5 shows a typical mapping of digits to letters. For example, the 3 key has
‘def’ on it. If you press the key once you get a ‘d’, if you press 3 twice you get an ‘e’,
if you press it three times you get an ‘f’. The main number-to-letter mapping is stand-
ard, but punctuation and accented letters differ between phones. Also there needs to
be a way for the phone to distinguish, say, the ‘dd’ from ‘e’. On some phones you
need to pause for a short period between successive letters using the same key, for
others you press an additional key (e.g. ‘#’).
Most phones have at least two modes for the numeric buttons: one where the keys
mean the digits (for example when entering a phone number) and one where they
mean letters (for example when typing an SMS message). Some have additional
modes to make entering accented characters easier. Also a special mode or setting is
needed for capital letters although many phones use rules to reduce this, for ex-
ample automatically capitalizing the initial letter in a message and letters following
full stops, question marks and exclamation marks.
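The multi-tap scheme just described can be captured in a small decoder. Here a space stands for the pause that separates two letters entered on the same key; real phones vary in how they mark that pause:

```python
KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

def multitap(taps: str) -> str:
    """Decode a multi-tap key sequence: pressing a key n times selects the
    nth letter on it; a space marks the pause between same-key letters."""
    out = []
    for group in taps.split(" "):
        i = 0
        while i < len(group):
            key = group[i]
            n = 1
            while i + n < len(group) and group[i + n] == key:
                n += 1  # count the run of repeated presses of this key
            letters = KEYS[key]
            out.append(letters[(n - 1) % len(letters)])
            i += n
    return "".join(out)

print(multitap("4433555 555666"))  # → "hello"
```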
This is all very laborious and, as we will see in Chapter 19, experienced mobile
phone users make use of a highly developed shorthand to reduce the number of
keystrokes. If you watch a teenager or other experienced txt-er, you will see they
Figure 2.5 Mobile phone keypad. Source: Photograph by Alan Dix (Ericsson phone)
Typical key mapping:
1 – space, comma, etc. (varies)
2 – a b c
3 – d e f
4 – g h i
5 – j k l
6 – m n o
7 – p q r s
8 – t u v
9 – w x y z
0 – +, &, etc.
often develop great typing speed holding the phone in one hand and using only
their thumb. As these skills spread through society it may be that future devices
use this as a means of small format text input. For those who never develop this
physical dexterity some phones have tiny plug-in keyboards, or come with fold-out
keyboards.
Another technical solution to the problem is the T9 algorithm. This uses a large
dictionary to disambiguate words by simply typing the relevant letters once. For
example, ‘3926753’ becomes ‘example’ as there is only one word with letters that
match (alternatives like ‘ewbosld’ that also match are not real words). Where there
are ambiguities such as ‘26’, which could be an ‘am’ or an ‘an’, the phone gives a
series of options to choose from.
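A minimal sketch of the dictionary lookup at the heart of T9 follows. The tiny word list is a stand-in for the large dictionary a real phone ships with, and real T9 also orders ambiguous alternatives by word frequency:

```python
KEY_OF = {c: d for d, letters in {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}.items() for c in letters}

def to_digits(word: str) -> str:
    """The key sequence a word requires: one press per letter."""
    return "".join(KEY_OF[c] for c in word.lower())

DICTIONARY = ["example", "am", "an", "hello"]

def t9(digits: str):
    """All dictionary words whose key sequence matches the digits typed."""
    return [w for w in DICTIONARY if to_digits(w) == digits]

print(t9("3926753"))  # → ['example']
print(t9("26"))       # → ['am', 'an']  (ambiguous: the user must choose)
```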
2.2.4 Handwriting recognition
Handwriting is a common and familiar activity, and is therefore attractive as a
method of text entry. If we were able to write as we would when we use paper, but
with the computer taking this form of input and converting it to text, we can see that
it is an intuitive and simple way of interacting with the computer. However, there are
a number of disadvantages with handwriting recognition. Current technology is
still fairly inaccurate and so makes a significant number of mistakes in recognizing
letters, though it has improved rapidly. Moreover, individual differences in hand-
writing are enormous, and make the recognition process even more difficult. The
most significant information in handwriting is not in the letter shape itself but in the
stroke information – the way in which the letter is drawn. This means that devices
which support handwriting recognition must capture the stroke information, not
just the final character shape. Because of this, online recognition is far easier than
reading handwritten text on paper. Further complications arise because letters
within words are shaped and often drawn very differently depending on the actual
word; the context can help determine the letter’s identity, but is often unable to pro-
vide enough information. Handwriting recognition is covered in more detail later in
the book, in Chapter 10. More serious in many ways is the limitation on speed; it is
difficult to write at more than 25 words a minute, which is no more than half the
speed of a decent typist.
The different nature of handwriting means that we may find it more useful in
situations where a keyboard-based approach would have its own problems. Such
situations will invariably result in completely new systems being designed around
the handwriting recognizer as the predominant mode of textual input, and these
may bear very little resemblance to the typical system. Pen-based systems that use
handwriting recognition are actively marketed in the mobile computing market,
especially for smaller pocket organizers. Such machines are typically used for taking
notes and jotting down and sketching ideas, as well as acting as a diary, address book
and organizer. Using handwriting recognition has many advantages over using a
keyboard. A pen-based system can be small and yet still accurate and easy to use,
whereas small keys become very tiring, or even impossible, to use accurately. Also the
pen-based approach does not have to be altered when we move from jotting down
text to sketching diagrams; pen-based input is highly appropriate for this also.
Some organizer designs have dispensed with a keyboard completely. With such
systems one must consider all sorts of other ways to interact with the system that are
not character based. For example, we may decide to use gesture recognition, rather
than commands, to tell the system what to do, for example drawing a line through a
word in order to delete it. The important point is that a different input device that
was initially considered simply as an alternative to the keyboard opens up a whole
host of alternative interface designs and different possibilities for interaction.
Signature authentication
Handwriting recognition is difficult principally because of the great differences between dif-
ferent people’s handwriting. These differences can be used to advantage in signature authentication
where the purpose is to identify the user rather than read the signature. Again this is far easier
when we have stroke information as people tend to produce signatures which look slightly differ-
ent from one another in detail, but are formed in a similar fashion. Furthermore, a forger who has
a copy of a person’s signature may be able to copy the appearance of the signature, but will not
be able to reproduce the pattern of strokes.
2.2.5 Speech recognition
Speech recognition is a promising area of text entry, but it has been promising for a
number of years and is still only used in very limited situations. There is a natural
enthusiasm for being able to talk to the machine and have it respond to commands,
since this form of interaction is one with which we are very familiar. Successful
recognition rates of over 97% have been reported, but since this represents one let-
ter in error in approximately every 30, or one spelling mistake every six or so words,
this is stoll unacceptible (sic)! Note also that this performance is usually quoted only
for a restricted vocabulary of command words. Trying to extend such systems to the
level of understanding natural language, with its inherent vagueness, imprecision
and pauses, opens up many more problems that have not been satisfactorily solved
even for keyboard-entered natural language. Moreover, since every person speaks
differently, the system has to be trained and tuned to each new speaker, or its per-
formance decreases. Strong accents, a cold or emotion can also cause recognition
problems, as can background noise. This leads us on to the question of practicality
within an office environment: not only may the background level of noise cause
errors, but if everyone in an open-plan office were to talk to their machine, the level
of noise would dramatically increase, with associated difficulties. Confidentiality
would also be harder to maintain.
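The arithmetic behind those figures is worth making explicit. The five-letters-per-word average is an assumption about typical English text:

```python
accuracy = 0.97            # per-letter recognition rate quoted above
letters_per_word = 5       # assumed average English word length

errors_per_letter = 1 - accuracy
letters_per_error = 1 / errors_per_letter
words_per_error = letters_per_error / letters_per_word

print(f"one letter wrong every {letters_per_error:.0f} letters")  # ~33
print(f"one misspelt word every {words_per_error:.1f} words")     # ~6.7
```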
Despite its problems, speech technology has found niche markets: telephone
information systems, access for the disabled, in hands-occupied situations (especially
2.3 Positioning, pointing and drawing 71
military) and for those suffering RSI. This is discussed in greater detail in Chapter 10,
but we can see that it offers three possibilities. The first is as an alternative text entry
device to replace the keyboard within an environment and using software originally
designed for keyboard use. The second is to redesign a system, taking full advantage
of the benefits of the technique whilst minimizing the potential problems. Finally, it
can be used in areas where keyboard-based input is impractical or impossible. It is in
the latter, more radical areas that speech technology is currently achieving success.
2.3 POSITIONING, POINTING AND DRAWING
Central to most modern computing systems is the ability to point at something on
the screen and thereby manipulate it, or perform some function. There has been a
long history of such devices, in particular in computer-aided design (CAD), where
positioning and drawing are the major activities. Pointing devices allow the user to
point, position and select items, either directly or by manipulating a pointer on the
screen. Many pointing devices can also be used for free-hand drawing although the
skill of drawing with a mouse is very different from using a pencil. The mouse is still
most common for desktop computers, but is facing challenges as laptop and hand-
held computing increase their market share. Indeed, these words are being typed on
a laptop with a touchpad and no mouse.
2.3.1 The mouse
The mouse has become a major component of the majority of desktop computer sys-
tems sold today, and is the little box with the tail connecting it to the machine in our
basic computer system picture (Figure 2.1). It is a small, palm-sized box housing a
weighted ball – as the box is moved over the tabletop, the ball is rolled by the table
and so rotates inside the housing. This rotation is detected by small rollers that are
in contact with the ball, and these adjust the values of potentiometers. If you remove
the ball occasionally to clear dust you may be able to see these rollers. The changing
values of these potentiometers can be directly related to changes in position of the
ball. The potentiometers are aligned in different directions so that they can detect
both horizontal and vertical motion. The relative motion information is passed
to the computer via a wire attached to the box, or in some cases using wireless or
infrared, and moves a pointer on the screen, called the cursor. The whole arrange-
ment tends to look rodent-like, with the box acting as the body and the wire as the
tail; hence the term ‘mouse’. In addition to detecting motion, the mouse has typically
one, two or three buttons on top. These are used to indicate selection or to initiate
action. Single-button mice tend to have similar functionality to multi-button mice,
and achieve this by instituting different operations for a single and a double button
click. A ‘double-click’ is when the button is pressed twice in rapid succession. Multi-
button mice tend to allocate one operation to each particular button.
The mouse operates in a planar fashion, moving around the desktop, and is an
indirect input device, since a transformation is required to map from the horizontal
nature of the desktop to the vertical alignment of the screen. Left–right motion is
directly mapped, whilst up–down on the screen is achieved by moving the mouse
away–towards the user. The mouse only provides information on the relative move-
ment of the ball within the housing: it can be physically lifted up from the desktop
and replaced in a different position without moving the cursor. This offers the
advantage that less physical space is required for the mouse, but suffers from being
less intuitive for novice users. Since the mouse sits on the desk, moving it about is
easy and users suffer little arm fatigue, although the indirect nature of the medium
can lead to problems with hand–eye coordination. However, a major advantage of
the mouse is that the cursor itself is small, and it can be easily manipulated without
obscuring the display.
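The relative nature of the device can be captured in a few lines. The screen dimensions and the clamp-at-the-edge behavior below are illustrative assumptions:

```python
def move_cursor(pos, deltas, width=1024, height=768):
    """Accumulate relative (dx, dy) motion reports into a cursor position,
    clamped to the screen: a relative device like the mouse only ever
    reports movement, never an absolute location."""
    x, y = pos
    for dx, dy in deltas:
        x = min(max(x + dx, 0), width - 1)
        y = min(max(y + dy, 0), height - 1)
    return (x, y)

# Lifting the mouse and putting it down elsewhere produces no deltas,
# so the cursor stays put; only rolling the ball moves it.
print(move_cursor((100, 100), [(5, 0), (0, -3), (5, 0)]))  # → (110, 97)
```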
The mouse was developed around 1964 by Douglas C. Engelbart, and a photo-
graph of the first prototype is shown in Figure 2.6. This used two wheels that
slid across the desktop and transmitted x,y coordinates to the computer. The housing was carved in wood, and has been damaged, exposing one of the wheels. The
original design actually offers a few advantages over today’s more sleek versions:
by tilting it so that only one wheel is in contact with the desk, pure vertical or hori-
zontal motion can be obtained. Also, the problem of getting the cursor across the
large screens that are often used today can be solved by flicking your wrist to get
the horizontal wheel spinning. The mouse pointer then races across the screen with
no further effort on your part, until you stop it at its destination by dropping the
mouse down onto the desktop.
Figure 2.6 The first mouse. Photograph courtesy of Douglas Engelbart and
Bootstrap Institute
2.3 Positioning, pointing and drawing 73
Although most mice are hand operated, not all are – there have been experiments
with a device called the footmouse. As the name implies, it is a foot-operated device,
although more akin to an isometric joystick than a mouse. The cursor is moved by
foot pressure on one side or the other of a pad. This allows one to dedicate hands to
the keyboard. A rare device, the footmouse has not found common acceptance!
Interestingly, foot pedals are used heavily in musical instruments including pianos,
electric guitars, organs and drums and also in mechanical equipment including cars,
cranes, sewing machines and industrial controls. So it is clear that in principle this is
a good idea. Two things seem to have limited their use in computer equipment
(except simulators and games). One is the practicality of having foot controls in the
work environment: pedals under a desk may be operated accidentally, laptops with
foot pedals would be plain awkward. The second issue is the kind of control being
exercised. Pedals in physical interfaces are used predominantly to control one or
more single-dimensional analog controls. It may be that in more specialized interfaces
appropriate foot-operated controls could be more commonly and effectively used.
2.3.2 Touchpad
Touchpads are touch-sensitive tablets usually around 2–3 inches (50–75 mm)
square. They were first used extensively in Apple Powerbook portable computers but
are now used in many other notebook computers and can be obtained separately to
replace the mouse on the desktop. They are operated by stroking a finger over their
surface, rather like using a simulated trackball. The feel is very different from other
input devices, but as with all devices users quickly get used to the action and become
proficient.
Because they are small it may require several strokes to move the cursor across the
screen. This can be improved by using acceleration settings in the software linking
the trackpad movement to the screen movement. Rather than having a fixed ratio of
pad distance to screen distance, this varies with the speed of movement.

Optical mice

Optical mice work differently from mechanical mice. A light-emitting diode emits a weak
red light from the base of the mouse. This is reflected off a special pad with a metallic grid-like
pattern upon which the mouse has to sit, and the fluctuations in reflected intensity as the mouse
is moved over the gridlines are recorded by a sensor in the base of the mouse and translated into
relative x, y motion. Some optical mice do not require special mats, just an appropriate surface,
and use the natural texture of the surface to detect movement. The optical mouse is less
susceptible to dust and dirt than the mechanical one in that its mechanism is less likely to become
blocked up. However, for those that rely on a special mat, if the mat is not properly aligned,
movement of the mouse may become erratic – especially difficult if you are working with
someone and pass the mouse back and forth between you.

If the finger moves slowly over the pad then the pad movements map to small distances on the
screen. If the finger is moving quickly the same distance on the touchpad moves the
cursor a long distance. For example, on the trackpad being used when writing this
section a very slow movement of the finger from one side of the trackpad to the other
moves the cursor less than 10% of the width of the screen. However, if the finger is
moved very rapidly from side to side, the cursor moves the whole width of the screen.
In fact, this form of acceleration setting is also used in other indirect positioning
devices including the mouse. Fine settings of this sort of parameter make a great
difference to the ‘feel’ of the device.
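The variable-gain behavior described above can be sketched in a few lines of Python. The constants here are purely illustrative inventions, not those of any real driver: the point is only that the pixels-per-millimetre gain grows with finger speed, up to some cap.

```python
def cursor_delta(pad_delta_mm, speed_mm_per_s,
                 base_gain=4.0, accel=0.02, max_gain=40.0):
    """Map a touchpad movement to a cursor movement in pixels.

    The gain (pixels per millimetre of finger travel) grows with the
    speed of the finger, so slow strokes give fine control while fast
    flicks cross the whole screen.  All constants are illustrative.
    """
    gain = min(base_gain + accel * speed_mm_per_s, max_gain)
    return pad_delta_mm * gain

# A slow stroke across a 60 mm pad moves the cursor only a short way...
slow = cursor_delta(60, speed_mm_per_s=20)      # about 264 pixels
# ...while a fast flick over the same distance is capped at the top gain.
fast = cursor_delta(60, speed_mm_per_s=2000)    # 60 * 40 = 2400 pixels
```

Tuning `base_gain`, `accel` and `max_gain` is exactly the ‘fine settings’ adjustment mentioned above: small changes to these numbers noticeably alter the feel of the device.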
2.3.3 Trackball and thumbwheel
The trackball is really just an upside-down mouse! A weighted ball faces upwards and
is rotated inside a static housing, the motion being detected in the same way as for
a mechanical mouse, and the relative motion of the ball moves the cursor. Because
of this, the trackball requires no additional space in which to operate, and is there-
fore a very compact device. It is an indirect device, and requires separate buttons
for selection. It is fairly accurate, but is hard to draw with, as long movements are
difficult. Trackballs now appear in a wide variety of sizes, the most usual being about
the same as a golf ball, with a number of larger and smaller devices available. The size
and ‘feel’ of the trackball itself affords significant differences in the usability of the
device: its weight, rolling resistance and texture all contribute to the overall effect.
Some of the smaller devices have been used in notebook and portable computers,
but more commonly trackpads or nipples are used. They are often sold as altern-
atives to mice on desktop computers, especially for RSI sufferers. They are also
heavily used in video games where their highly responsive behavior, including being
able to spin the ball, is ideally suited to the demands of play.
Thumbwheels are different in that they have two orthogonal dials to control the
cursor position. Such a device is very cheap, but slow, and it is difficult to manipu-
late the cursor in any way other than horizontally or vertically. This limitation can
sometimes be a useful constraint in the right application. For instance, in CAD the
designer is almost always concerned with exact verticals and horizontals, and a
device that provides such constraints is very useful, which accounts for the appear-
ance of thumbwheels in CAD systems. Another successful application for such a
device has been in a drawing game such as Etch-a-Sketch in which straight lines can
be created on a simple screen, since the predominance of straight lines in simple
drawings means that the motion restrictions are an advantage rather than a handi-
cap. However, if you were to try to write your signature using a thumbwheel, the
limitations would be all too apparent. The appropriateness of the device depends on
the task to be performed.
Although two-axis thumbwheels are not heavily used in mainstream applications,
single thumbwheels are often included on a standard mouse in order to offer an
alternative means to scroll documents. Normally scrolling requires you to grab the
scroll bar with the mouse cursor and drag it down. For large documents it is hard to
be accurate and in addition the mouse dragging is done holding a finger down which
adds to hand strain. In contrast the small scroll wheel allows comparatively intuitive
and fast scrolling, simply rotating the wheel to move the page.
2.3.4 Joystick and keyboard nipple
The joystick is an indirect input device, taking up very little space. Consisting of a
small palm-sized box with a stick or shaped grip sticking up from it, the joystick is a
simple device with which movements of the stick cause a corresponding movement
of the screen cursor. There are two types of joystick: the absolute and the isometric.
In the absolute joystick, movement is the important characteristic, since the position
of the joystick in the base corresponds to the position of the cursor on the screen.
In the isometric joystick, the pressure on the stick corresponds to the velocity of
the cursor, and when released, the stick returns to its usual upright centered position.
This type of joystick is also called the velocity-controlled joystick, for obvious
reasons. The buttons are usually placed on the top of the stick, or on the front like a
trigger. Joysticks are inexpensive and fairly robust, and for this reason they are often
found in computer games. Another reason for their dominance of the games market
is their relative familiarity to users, and their likeness to aircraft joysticks: aircraft are
a favorite basis for games, leading to familiarity with the joystick that can be used
for more obscure entertainment ideas.
A smaller device but with the same basic characteristics is used on many laptop
computers to control the cursor. Some older systems had a variant of this called the
keymouse, which was a single key. More commonly a small rubber nipple projects from
the center of the keyboard and acts as a tiny isometric joystick. It is usually difficult
for novices to use, but this seems to be related to fine adjustment of the speed set-
tings. Like the joystick the nipple controls the rate of movement across the screen
and is thus less direct than a mouse or stylus.
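The rate-control mapping shared by the isometric joystick and the nipple can be illustrated with a short Python sketch. The gain constant is an invented, illustrative value: the essential idea is that applied force sets cursor *velocity*, so cursor position is the integral of force over time.

```python
def integrate_rate_control(force_samples, gain=250.0, dt=0.01):
    """Velocity (rate) control: stick force sets cursor speed, so the
    cursor position is the running integral of force over time.  The
    gain (pixels per second per unit force) is purely illustrative.
    """
    position = 0.0
    for force in force_samples:
        position += gain * force * dt   # velocity * timestep
    return position

# Holding a steady half-unit force for one second (100 samples at 10 ms)
# drifts the cursor about 125 pixels; releasing the stick stops it dead.
drift = integrate_rate_control([0.5] * 100)
```

This is why speed settings matter so much for novices: the gain determines how violently the cursor responds to small, unpracticed pressures.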
2.3.5 Touch-sensitive screens (touchscreens)
Touchscreens are another method of allowing the user to point and select objects
on the screen, but they are much more direct than the mouse, as they detect the
presence of the user’s finger, or a stylus, on the screen itself. They work in one of
a number of different ways: by the finger (or stylus) interrupting a matrix of light
beams, or by capacitance changes on a grid overlaying the screen, or by ultrasonic
reflections. Because the user indicates exactly which item is required by pointing to
it, no mapping is required and therefore this is a direct device.
The touchscreen is very fast, and requires no specialized pointing device. It is
especially good for selecting items from menus displayed on the screen. Because
the screen acts as an input device as well as an output device, there is no separate
hardware to become damaged or destroyed by dirt; this makes touchscreens suitable
for use in hostile environments. They are also relatively intuitive to use and have
been used successfully as an interface to information systems for the general public.
They suffer from a number of disadvantages, however. Using the finger to point is
not always suitable, as it can leave greasy marks on the screen, and, being a fairly
blunt instrument, it is quite inaccurate. This means that the selection of small
regions is very difficult, as is accurate drawing. Moreover, lifting the arm to point to
a vertical screen is very tiring, and also means that the screen has to be within about
a meter of the user to enable it to be reached, which can make it too close for com-
fort. Research has shown that the optimal angle for a touchscreen is about 15 degrees
up from the horizontal.
2.3.6 Stylus and light pen
For more accurate positioning (and to avoid greasy screens), systems with touch-
sensitive surfaces often employ a stylus. Instead of pointing at the screen directly a
small pen-like plastic stick is used to point and draw on the screen. Styluses are
particularly popular on PDAs, but are also used in some laptop computers.
An older technology that is used in the same way is the light pen. The pen is con-
nected to the screen by a cable and, in operation, is held to the screen and detects
a burst of light from the screen phosphor during the display scan. The light pen
can therefore address individual pixels and so is much more accurate than the
touchscreen.
Both stylus and light pen can be used for fine selection and drawing, but both
can be tiring to use on upright displays and are harder to take up and put down
when used together with a keyboard. Interestingly some users of PDAs with fold-out
keyboards learn to hold the stylus outwards between their fingers so that they
can type whilst holding it. As it is unattached the stylus can easily get lost, but a
closed pen can be used in emergencies.
Stylus, light pen and touchscreen are all very direct in that the relationship
between the device and the thing selected is immediate. In contrast, mouse, touch-
pad, joystick and trackball all have to map movements on the desk to cursor move-
ment on the screen.
However, the direct devices suffer from the problem that, in use, the act of point-
ing actually obscures the display, making it harder to use, especially if complex
detailed selections or movements are required in rapid succession. This means that
screen designs have to take into account where the user’s hand will be. For example,
you may want to place menus at the bottom of the screen rather than the top. Also
you may want to offer alternative layouts for right-handed and left-handed users.
2.3.7 Digitizing tablet
The digitizing tablet is a more specialized device typically used for freehand drawing,
but may also be used as a mouse substitute. Some highly accurate tablets, usually
using a puck (a mouse-like device), are used in special applications such as digitizing
information for maps.
The tablet provides positional information by measuring the position of some
device on a special pad, or tablet, and can work in a number of ways. The resistive
tablet detects point contact between two separated conducting sheets. It has
advantages in that it can be operated without a specialized stylus – a pen or the user’s finger
is sufficient. The magnetic tablet detects current pulses in a magnetic field using a
small loop coil housed in a special pen. There are also capacitative and electrostatic
tablets that work in a similar way. The sonic tablet is similar to the above but requires
no special surface. An ultrasonic pulse is emitted by a special pen which is detected
by two or more microphones which then triangulate the pen position. This device
can be adapted to provide 3D input, if required.
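The triangulation step for the sonic tablet can be sketched as follows. This is illustrative geometry only (a real tablet must also calibrate for the speed of sound and sensor latency): with two microphones at known positions, the pen lies at the intersection of two circles whose radii are the time-of-flight distances.

```python
import math

def pen_position(d1, d2, mic_separation):
    """Triangulate a sonic-tablet pen from two time-of-flight distances.

    Microphone 1 sits at the origin and microphone 2 at
    (mic_separation, 0); d1 and d2 are the pen's distances from each,
    derived from the ultrasonic pulse's travel time.  Intersecting the
    two circles yields the pen position, taking the solution in front
    of the microphones.
    """
    x = (d1 ** 2 - d2 ** 2 + mic_separation ** 2) / (2 * mic_separation)
    y_squared = d1 ** 2 - x ** 2
    if y_squared < 0:
        raise ValueError("distances are inconsistent")
    return x, math.sqrt(y_squared)
```

Adding a third microphone out of the plane turns the same circle-intersection idea into sphere intersection, which is how the device can be adapted for 3D input.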
Digitizing tablets are capable of high resolution, and are available in a range of
sizes. Sampling rates vary, affecting the resolution of cursor movement, which gets
progressively finer as the sampling rate increases. The digitizing tablet can be used to
detect relative motion or absolute motion, but is an indirect device since there is a
mapping from the plane of operation of the tablet to the screen. It can also be used
for text input; if supported by character recognition software, handwriting can be
interpreted. Problems with digitizing tablets are that they require a large amount of
desk space, and may be awkward to use if displaced to one side by the keyboard.
2.3.8 Eyegaze
Eyegaze systems allow you to control the computer by simply looking at it! Some sys-
tems require you to wear special glasses or a small head-mounted box, others are
built into the screen or sit as a small box below the screen. A low-power laser is shone
into the eye and is reflected off the retina. The reflection changes as the angle of the
eye alters, and by tracking the reflected beam the eyegaze system can determine the
direction in which the eye is looking. The system needs to be calibrated, typically by
staring at a series of dots on the screen, but thereafter can be used to move the screen
cursor or for other more specialized uses. Eyegaze is a very fast and accurate device,
but the more accurate versions can be expensive. It is fine for selection but not for
drawing since the eye does not move in smooth lines. Also in real applications it
can be difficult to distinguish deliberately gazing at something and accidentally
glancing at it.
Such systems have been used in military applications, notably for guiding air-to-
air missiles to their targets, but are starting to find more peaceable uses, for disabled
users and for workers in environments where it is impossible for them to use their
hands. The rarity of the eyegaze is due partly to its novelty and partly to its expense,
and it is usually found only in certain domain-specific applications. Within HCI it is
particularly useful as part of evaluation as one is able to trace exactly where the user
is looking [81]. As prices drop and the technology becomes less intrusive we may see
more applications using eyegaze, especially in virtual reality and augmented reality
areas (see Chapter 20).
2.3.9 Cursor keys and discrete positioning
All of the devices we have discussed are capable of giving near continuous 2D
positioning, with varying degrees of accuracy. For many applications we are only
interested in positioning within a sequential list such as a menu or amongst 2D cells
as in a spreadsheet. Even for moving within text discrete up/down left/right keys can
sometimes be preferable to using a mouse.
Cursor keys are available on most keyboards. Four keys on the keyboard are used
to control the cursor, one each for up, down, left and right. There is no standardized
layout for the keys. Some layouts are shown in Figure 2.7, but the most common now
is the inverted ‘T’.
Cursor keys used to be more heavily used in character-based systems before
windows and mice were the norm. However, when logging into remote machines
such as web servers, the interface is often a virtual character-based terminal within a
telnet window. In such applications it is common to find yourself in a 1970s world
of text editors controlled sometimes using cursor keys and sometimes by more
arcane combinations of control keys!
Small devices such as mobile phones, personal entertainment and television
remote controls often require discrete control, either dedicated to a particular func-
tion such as volume, or for use as general menu selection. Figure 2.8 shows examples
of these. The satellite TV remote control has dedicated ‘+/–’ buttons for controlling
volume and stepping between channels. It also has a central cursor pad that is used
for on-screen menus. The mobile phone has a single central joystick-like device.
This can be pushed left/right, up/down to navigate within the small 3 × 3 array of
graphical icons as well as select from text menus.
2.4 DISPLAY DEVICES
The vast majority of interactive computer systems would be unthinkable without
some sort of display screen, although systems without one do exist, usually
in specialized applications only. Thinking beyond the traditional, systems such as
cars, hi-fis and security alarms all have different outputs from those expressible on a
screen, but in the personal computer and workstation market, screens are pervasive.
Figure 2.7 Various cursor key layouts
In this section, we discuss the standard computer display in detail, looking at the
properties of bitmap screens, at different screen technologies, at large and situated
displays, and at a new technology, ‘digital paper’.
2.4.1 Bitmap displays – resolution and color
Virtually all computer displays are based on some sort of bitmap. That is, the display
is made up of vast numbers of colored dots or pixels in a rectangular grid. These pixels
may be limited to black and white (for example, the small display on many TV
remote controls), grayscale, or full color.
Figure 2.8 Satellite TV remote control and mobile phone. Source: Photograph left by Alan Dix with
permission from British Sky Broadcasting Limited, photograph right by Alan Dix (Ericsson phone)
The color or, for monochrome screens, the intensity at each pixel is held by the
computer’s video card. One bit per pixel can store on/off information, and hence only
black and white (the term ‘bitmap’ dates from such displays). More bits per pixel
give rise to more color or intensity possibilities. For example, 8 bits/pixel give rise to
2^8 = 256 possible colors at any one time. The set of colors makes up what is called the
colormap, and the colormap can be altered at any time to produce a different set of
colors. The system is therefore capable of actually displaying many more than the
number of colors in the colormap, but not simultaneously. Most desktop computers
now use 24 or 32 bits per pixel which allows virtually unlimited colors, but devices such
as mobile phones and PDAs are often still monochrome or have limited color range.
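The arithmetic relating bits per pixel to the number of colors, and to the video memory a frame occupies, is simple enough to check directly. A small Python sketch (the function names are our own, purely for illustration):

```python
def colors(bits_per_pixel):
    """Number of distinct values one pixel can take at a given depth."""
    return 2 ** bits_per_pixel

def framebuffer_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one full frame at that depth."""
    return width * height * bits_per_pixel // 8

colors(1)                          # 2  -- black and white
colors(8)                          # 256 colormap entries
colors(24)                         # 16,777,216 -- effectively unlimited
framebuffer_bytes(1024, 768, 8)    # 786,432 bytes, under a megabyte
```

The last line shows why early video cards used colormaps at all: an 8-bit indexed frame needs a third of the memory of a 24-bit true-color one at the same resolution.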
As well as the number of colors that can be displayed at each pixel, the other measure
that is important is the resolution of the screen. Actually the word ‘resolution’ is used
in a confused (and confusing!) way for screens. There are two numbers to consider:
- the total number of pixels: in standard computer displays this is always in a 4:3
ratio, perhaps 1024 pixels across by 768 down, or 1600 × 1200; for PDAs this will
be more in the order of a few hundred pixels in each direction.
- the density of pixels: this is measured in pixels per inch. Unlike printers (see
Section 2.7 below), this density varies little, between 72 and 96 pixels per inch.
To add to the confusion, a monitor, liquid crystal display (LCD) screen or other
display device will quote its maximum resolution, but the computer may actually
give it less than this. For example, the screen may be a 1200 × 900 resolution with 96
pixels per inch, but the computer only sends it 800 × 600. In the case of a cathode ray
tube (CRT) this typically will mean that the image is stretched over the screen sur-
face giving a lower density of 64 pixels per inch. An LCD screen cannot change its
pixel size so it would keep 96 pixels per inch and simply not use all its screen space,
adding a black border instead. Some LCD projectors will try to stretch or reduce
what they are given, but this may mean that one pixel gets stretched to two, or two
pixels get ‘squashed’ into one, giving rise to display ‘artifacts’ such as thin lines dis-
appearing, or uniform lines becoming alternately thick or thin.
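The CRT stretching figure quoted above is easy to verify. A two-line Python sketch of the calculation (using the dimensions from the example in the text):

```python
def effective_ppi(native_px, native_ppi, sent_px):
    """Density of the displayed image when a CRT stretches a smaller
    picture across its full physical width."""
    screen_inches = native_px / native_ppi   # physical width of the screen
    return sent_px / screen_inches

# A 1200-pixel-wide screen at 96 ppi is 12.5 inches across; stretching
# an 800-pixel image over it gives the 64 ppi quoted in the text.
effective_ppi(1200, 96, 800)   # 64.0 pixels per inch
```

An LCD, by contrast, has fixed physical pixels, which is why it must letterbox with a black border (or resample, with the artifacts described above) rather than change its density.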
Although horizontal and vertical lines can be drawn perfectly on bitmap screens,
and lines at 45 degrees reproduce reasonably well, lines at any other angle and curves
have ‘jaggies’, rough edges caused by the attempt to approximate the line with pixels.
When using a single color jaggies are inevitable. Similar effects are seen in bitmap
fonts. The problem of jaggies can be reduced by using high-resolution screens, or by
a technique known as anti-aliasing. Anti-aliasing softens the edges of line segments,
blurring the discontinuity and making the jaggies less obvious.
Look at the two images in Figure 2.9 with your eyes slightly screwed up. See how
the second anti-aliased line looks better. Of course, screen resolution is much higher,
but the same principle holds true. The reason this works is because our brains are
constantly ‘improving’ what we see in the world: processing and manipulating the
raw sensations of the rods and cones in our eyes and turning them into something
meaningful. Often our vision is blurred because of poor light, things being out of
focus, or defects in our vision. Our brain compensates and tidies up blurred images.
By deliberately blurring the image, anti-aliasing triggers this processing in our brain
and we appear to see a smooth line at an angle.
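The principle behind anti-aliasing can be shown with a toy calculation. Real renderers integrate the actual area of the pixel covered by the line; the linear falloff below is our own simplification, just to illustrate how intermediate gray levels arise near the edge of a line.

```python
import math

def aa_intensity(px, py, x0, y0, x1, y1):
    """Toy coverage-style anti-aliasing for a one-pixel-wide line.

    A pixel's ink level falls off linearly with its perpendicular
    distance from the ideal line, softening the edge instead of
    stepping it.
    """
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    # perpendicular distance from the pixel centre to the line
    dist = abs(dy * (px - x0) - dx * (py - y0)) / length
    return max(0.0, 1.0 - dist)   # 1 = on the line, fading to 0

aa_intensity(2, 1, 0, 0, 10, 5)   # 1.0 -- pixel centre lies on the line
aa_intensity(2, 2, 0, 0, 10, 5)   # about 0.11 -- a pale 'halo' pixel
```

It is exactly these pale halo pixels that the brain's tidying-up machinery fuses into the impression of a smooth line.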
2.4.2 Technologies
Cathode ray tube
The cathode ray tube is the television-like computer screen still most common as
we write this, but rapidly being displaced by flat LCD screens. It works in a similar
way to a standard television screen. A stream of electrons is emitted from an electron
gun, which is then focussed and directed by magnetic fields. As the beam hits the
phosphor-coated screen, the phosphor is excited by the electrons and glows (see
Figure 2.10). The electron beam is scanned from left to right, and then flicked back
to rescan the next line, from top to bottom. This is repeated at about 30 Hz (that is,
30 frames a second), although higher scan rates are sometimes used to
reduce the flicker on the screen. Another way of reducing flicker is to use interlacing,
in which the odd lines on the screen are all scanned first, followed by the even lines.
Using a high-persistence phosphor, which glows for a longer time when excited, also
reduces flicker, but causes image smearing especially if there is significant animation.
Black and white screens are able to display grayscale by varying the intensity of the
electron beam; color is achieved using more complex means. Three electron guns
are used, one each to hit red, green and blue phosphors. Combining these colors can
Figure 2.9 Magnified anti-aliased lines
Figure 2.10 CRT screen
produce many others, including white, when they are all fully on. These three phosphor
dots are focussed to make a single point using a shadow mask, which is imprecise and
gives color screens a lower resolution than equivalent monochrome screens.
An alternative approach to producing color on the screen is to use beam penetra-
tion. A special phosphor glows a different color depending on the intensity of the
beam hitting it.
The CRT is a cheap display device and has fast enough response times for rapid
animation coupled with a high color capability. Note that animation does not neces-
sarily mean little creatures and figures running about on the screen, but refers in
a more general sense to the use of motion in displays: moving the cursor, opening
windows, indicating processor-intensive calculations, or whatever. As screen resolu-
tion increases, however, the price rises. Because of the electron gun and focussing
components behind the screen, CRTs are fairly bulky, though recent innovations
have led to flatter displays in which the electron gun is not placed so that it fires
directly at the screen, but fires parallel to the screen plane with the resulting beam
bent through 90 degrees to hit the screen.
Health hazards of CRT displays
Most people who habitually use computers are aware that screens can often cause eyestrain
and fatigue; this is usually due to flicker, poor legibility or low contrast. There have also been many
concerns relating to the emission of radiation from screens. These can be categorized as follows:
- X-rays, which are largely absorbed by the screen (but not at the rear!)
- ultraviolet and infrared radiation from phosphors, at insignificant levels
- radio frequency emissions, plus ultrasound (approximately 16 kHz)
- electrostatic field which leaks out through the tube to the user. The intensity is
dependent on distance and humidity. This can cause rashes in the user
- electromagnetic fields (50 Hz to 0.5 MHz) which create induction currents in conductive
materials, including the human body. Two types of effects are attributed to this: in the visual
system, a high incidence of cataracts in visual display unit (VDU) operators, and concern over
reproductive disorders (miscarriages and birth defects).
Research into the potentially harmful effect of these emissions is generally inconclusive, in that it
is difficult to determine precisely what the causes of illness are, and many health scares have been
the result of misinformed media opinion rather than scientific fact. However, users who are preg-
nant ought to take especial care and observe simple precautions. Generally, there are a number of
common-sense things that can be done to relieve strain and minimize any risk. These include
- not sitting too close to the screen
- not using very small fonts
- not looking at the screen for a long time without a break
- working in well-lit surroundings
- not placing the screen directly in front of a bright window.
Liquid crystal display
If you have used a personal organizer or notebook computer, you will have seen
the light, flat plastic screens. These displays utilize liquid crystal technology and are
smaller, lighter and consume far less power than traditional CRTs. These are also
commonly referred to as flat-panel displays. They have no radiation problems asso-
ciated with them, and are matrix addressable, which means that individual pixels can
be accessed without the need for scanning.
Similar in principle to the digital watch, a thin layer of liquid crystal is sandwiched
between two glass plates. The top plate is transparent and polarized, whilst the bot-
tom plate is reflective. External light passes through the top plate and is polarized,
which means that it only oscillates in one direction. This then passes through the
crystal, reflects off the bottom plate and back to the eye, and so that cell looks white.
When a voltage is applied to the crystal, via the conducting glass plates, the crystal
twists. This causes it to turn the plane of polarization of the incoming light, rotating
it so that it cannot return through the top plate, making the activated cell look black.
The LCD requires refreshing at the usual rates, but the relatively slow response of the
crystal means that flicker is not usually noticeable. The low intensity of the light
emitted from the screen, coupled with the reduced flicker, means that the LCD is less
tiring to use than a standard CRT, with reduced eyestrain.
This different technology can be used to replace the standard screen on a desktop
computer, and this is now common. However, the particular characteristics of com-
pactness, light weight and low power consumption have meant that these screens
have created a large niche in the computer market by monopolizing the notebook
and portable computer systems side. The advent of these screens allowed small, light
computers to be built, and created a large market that did not previously exist. Such
computers, riding on the back of the technological wave, have opened up a different
way of working for many people, who now have access to computers when away
from the office, whether out on business or at home. Working in a different location
on a smaller machine with different software obviously represents a different style
of interaction and so once again we can see that differences in devices may alter
the human–computer interaction considerably. The growing notebook computer
market fed back into an investment in developing LCD screen technology, with
supertwisted crystals increasing the viewing angle dramatically. Response times have
also improved so that LCD screens are now used in personal DVD players and even
in home television.
When the second edition of this book was being written the majority of LCD
screens were black and white or grayscale. We wrote then ‘it will be interesting to see
whether color LCD screens supersede grayscale by the time the third edition of this
book is prepared’. Of course, this is precisely the case. Our expectation is that by the
time we produce the next edition LCD monitors will have taken over from CRT
monitors completely.
Special displays
There are a number of other display technologies used in niche markets. The one
you are most likely to see is the gas plasma display, which is used in large screens
(see Section 2.4.3 below).
The random scan display, also known as the directed beam refresh, or vector display,
works differently from the bitmap display, also known as raster scan, that we dis-
cussed in Section 2.4.1. Instead of scanning the whole screen sequentially and hori-
zontally, the random scan draws the lines to be displayed directly. By updating the
screen at at least 30 Hz to reduce flicker, the direct drawing of lines at any angle
means that jaggies are not created, and higher resolutions are possible, up to
4096 × 4096 pixels. Color on such displays is achieved using beam penetration technology,
and is generally of a poorer quality. Eyestrain and fatigue are still a problem, and
these displays are more expensive than raster scan ones, so they are now only used in
niche applications.
The direct view storage tube is used extensively as the display for an analog
storage oscilloscope, which is probably the only place that these displays are used in
any great numbers. They are similar in operation to the random scan CRT but the
image is maintained by flood guns which have the advantage of producing a stable
display with no flicker. The screen image can be incrementally updated but not
selectively erased; removing items has to be done by redrawing the new image on
a completely erased screen. The screens have a high resolution, typically about
4096 × 3120 pixels, but suffer from low contrast, low brightness and a difficulty in
displaying color.
2.4.3 Large displays and situated displays
Displays are no longer just things you have on your desktop or laptop. In Chapter 19
we will discuss meeting room environments that often depend on large shared
screens. You may have attended lectures where the slides are projected from a com-
puter onto a large screen. In shops and garages large screen adverts assault us from
all sides.
There are several types of large screen display. Some use gas plasma technology
to create large flat bitmap displays. These behave just like a normal screen except
they are big and usually have the HDTV (high definition television) wide screen
format which has an aspect ratio of 16:9 instead of the 4:3 on traditional TV and
monitors.
Where very large screen areas are required, several smaller screens, either LCD or
CRT, can be placed together in a video wall. These can display separate images, or a
single TV or computer image can be split up by software or hardware so that each
screen displays a portion of the whole and the result is an enormous image. This
is the technique often used in large concerts to display the artists or video images
during the performance.
Possibly the large display you are most likely to have encountered is some sort of
projector. There are two variants of these. In very large lecture theatres, especially
older ones, you see projectors with large red, green and blue lenses. These each scan
light across the screen to build a full color image. In smaller lecture theatres and in
small meetings you are likely to see LCD projectors. Usually the size of a large book,
these are like ordinary slide projectors except that where the slide would be there is
a small LCD screen instead. The light from the projector passes through the tiny
screen and is then focussed by the lens onto the screen.
The disadvantage of projected displays is that the presenter’s shadow can often fall
across the screen. Sometimes this is avoided in fixed lecture halls by using back pro-
jection. In a small room behind the screen of the lecture theatre there is a projector
producing a right/left reversed image. The screen itself is a semi-frosted glass so that
the image projected on the back can be seen in the lecture theatre. Because there are
limits on how wide an angle the projector can manage without distortion, the size of
the image is limited by the depth of the projection room behind, so these are less
heavily used than front projection.
As well as for lectures and meetings, display screens can be used in various public
places to offer information, link spaces or act as message areas. These are often called
situated displays as they take their meaning from the location in which they are
situated. These may be large screens where several people are expected to view or
interact simultaneously, or they may be very small. Figure 2.11 shows an example
of a small experimental situated display mounted by an office door to act as an
electronic sticky note [70].
Figure 2.11 Situated door display. Source: Courtesy of Keith Cheverst
2.4.4 Digital paper
A new form of ‘display’ that is still in its infancy is the various forms of digital paper.
These are thin flexible materials that can be written to electronically, just like a com-
puter screen, but which keep their contents even when removed from any electrical
supply.
There are various technologies being investigated for this. One involves the whole
surface being covered with tiny spheres, black one side, white the other. Electronics
embedded into the material allow each tiny sphere to be rotated to make it black
or white. When the electronic signal is removed the ball stays in its last orientation.
A different technique has tiny tubes laid side by side. In each tube is light-absorbing
liquid and a small reflective sphere. The sphere can be made to move to the top sur-
face or away from it making the pixel white or black. Again the sphere stays in its last
position once the electronic signal is removed.
Probably the first uses of these will be for large banners that can be reprogrammed
or slowly animated. This is an ideal application, as it does not require very rapid
updates and does not require the pixels to be small. As the technology matures, the
aim is to have programmable sheets of paper that you attach to your computer to get
a ‘soft’ printout that can later be changed. Perhaps one day you may be able to have
a ‘soft’ book that appears just like a current book with soft pages that can be turned
and skimmed, but where the contents and cover can be changed when you decide to
download a new book from the net!
DESIGN FOCUS
Hermes: a situated display
Office doors are often used as a noticeboard with messages from the occupant such as ‘just gone
out’ or ‘timetable for the week’ and from visitors ‘missed you, call when you get back’. The Hermes
system is an electronic door display that offers some of the functions of sticky notes on a door [70].
Figure 2.11(i) shows an installed Hermes device fixed just beside the door, including the socket to
use a Java iButton to authenticate the occupant. The occupant can leave messages that others can read
(Figure 2.11(ii)) and people coming to the door can leave messages for the occupant. Electronic notes
are smaller than paper ones, but because they are electronic they can be read remotely using a web
interface (Figure 2.11(iii)), or added by SMS (see Chapter 19, Section 19.3.2).
The fact that it is situated – by a person’s door – is very important. It establishes a context, ‘Alan’s
door’, and influences the way the system is used. For example, the idea of anonymous messages left on
the door, where the visitor has had to be physically present, feels different from, say, anonymous emails.
See the book website for the full case study: /e3/casestudy/hermes/
2.5 DEVICES FOR VIRTUAL REALITY AND 3D INTERACTION
Virtual reality (VR) systems and various forms of 3D visualization are discussed in
detail in Chapter 20. These require you to navigate and interact in a three-dimensional
space. Sometimes these use the ordinary controls and displays of a desktop computer
system, but there are also special devices used both to move and interact with 3D
objects and to enable you to see a 3D environment.
2.5.1 Positioning in 3D space
Virtual reality systems present a 3D virtual world. Users need to navigate through
these spaces and manipulate the virtual objects they find there. Navigation is not
simply a matter of moving to a particular location, but also of choosing a particular
orientation. In addition, when you grab an object in real space, you don’t simply
move it around, but also twist and turn it, for example when opening a door. Thus
the move from mice to 3D devices usually involves a change from two degrees of
freedom to six degrees of freedom, not just three.
Cockpit and virtual controls
Helicopter and aircraft pilots already have to navigate in real space. Many arcade
games and also more serious applications use controls modeled on an aircraft
cockpit to ‘fly’ through virtual space. However, helicopter pilots are very skilled and
it takes a lot of practice for users to be able to work easily in such environments.
In many PC games and desktop virtual reality (where the output is shown on
an ordinary computer screen), the controls are themselves virtual. This may be a
simulated form of the cockpit controls or more prosaic up/down left/right buttons.
The user manipulates these virtual controls using an ordinary mouse (or other 2D
device). Note that this means there are two levels of indirection. It is a tribute to the
flexibility of the human mind that people can not only use such systems but also
rapidly become proficient.
The 3D mouse
There are a variety of devices that act as 3D versions of a mouse. Rather than just
moving the mouse on a tabletop, you can pick it up, move it in three dimensions,
rotate the mouse and tip it forward and backward. The 3D mouse has a full six
degrees of freedom as its position can be tracked (three degrees), and also its
up/down angle (called pitch), its left/right orientation (called yaw) and the amount
it is twisted about its own axis (called roll) (see Figure 2.12). Various sensors are used
to track the mouse position and orientation: magnetic coils, ultrasound or even
mechanical joints where the mouse is mounted rather like an angle-poise lamp.
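The three orientation angles can be combined into a single rotation that the application applies to the tracked object. This sketch is our own illustration: axis assignments and rotation order are conventions that vary between devices, so the ones below (pitch about x, yaw about y, roll about z, applied in that order) are assumptions.

```python
import math

# Sketch (our illustration): a 3D mouse reports six degrees of freedom --
# position (x, y, z) plus pitch, yaw and roll. The three angles can be
# composed into one 3 x 3 rotation matrix. Conventions assumed here:
# pitch about x, yaw about y, roll about z, applied in that order.

def rotation_matrix(pitch, yaw, roll):
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]    # pitch: rotation about x
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]    # yaw: rotation about y
    rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]    # roll: rotation about z

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(rz, matmul(ry, rx))

def apply(m, v):
    """Rotate a 3D point v by rotation matrix m."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
```

For example, a quarter-turn of yaw alone swings a point on the x axis round to the z axis, which is the behavior a user would expect when twisting the device left or right.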
With the 3D mouse, and indeed most 3D positioning devices, users may experi-
ence strain from having to hold the mouse in the air for a long period. Putting the
3D mouse down may even be treated as an action in the virtual environment, that is,
taking a nose dive.
Dataglove
One of the mainstays of high-end VR systems (see Chapter 20), the dataglove is a 3D
input device. Consisting of a lycra glove with optical fibers laid along the fingers, it
detects the joint angles of the fingers and thumb. As the fingers are bent, the fiber
optic cable bends too; increasing bend causes more light to leak from the fiber, and
the reduction in intensity is detected by the glove and related to the degree of bend
in the joint. Attached to the top of the glove are two sensors that use ultrasound to
determine 3D positional information as well as the angle of roll, that is the degree of
wrist rotation. Such rich multi-dimensional input is currently a solution in search
of a problem, in that most of the applications in use do not require such a compre-
hensive form of data input, whilst those that do cannot afford it. However, the avail-
ability of cheaper versions of the dataglove will encourage the development of more
complex systems that are able to utilize the full power of the dataglove as an input
device. There are a number of potential uses for this technology to assist disabled
people, but cost remains the limiting factor at present.
The dataglove has the advantage that it is very easy to use, and is potentially very
powerful and expressive (it can provide 10 joint angles, plus the 3D spatial informa-
tion and degree of wrist rotation, 50 times a second). It suffers from extreme
expense, and the fact that it is difficult to use in conjunction with a keyboard.
However, such a limitation is shortsighted; one can imagine a keyboard drawn onto
a desk, with software detecting hand positions and interpreting whether the virtual
keys had been hit or not. The potential for the dataglove is vast; gesture recognition
and sign language interpretation are two obvious areas that are the focus of active
research, whilst less obvious applications are evolving all the time.
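The bend-sensing principle described above can be sketched in a few lines. This is not a real dataglove API; the linear calibration between a "straight" and a "fully bent" intensity reading is our simplifying assumption, chosen only to show how light leakage maps to a joint angle.

```python
# Sketch (assumptions ours, not a real dataglove API): the glove measures the
# light intensity surviving each optical fiber; more bend leaks more light.
# A simple per-joint calibration maps intensity between two recorded extremes
# (finger straight, finger fully bent) linearly onto a joint angle.

def joint_angle(intensity, straight_level, bent_level, max_angle=90.0):
    """Map a raw light intensity reading to an estimated joint angle in degrees.

    straight_level: intensity recorded with the joint straight (0 degrees)
    bent_level:     intensity recorded with the joint fully bent (max_angle)
    """
    if straight_level == bent_level:
        raise ValueError("calibration levels must differ")
    fraction = (straight_level - intensity) / (straight_level - bent_level)
    fraction = min(1.0, max(0.0, fraction))   # clamp to the calibrated range
    return fraction * max_angle

# The text notes the glove can report 10 joint angles about 50 times a second;
# a frame of readings would simply apply the mapping to each joint.
def frame_angles(readings, calibration):
    return [joint_angle(r, s, b) for r, (s, b) in zip(readings, calibration)]
```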
Figure 2.12 Pitch, yaw and roll
Virtual reality helmets
The helmets or goggles worn in some VR systems have two purposes: (i) they display
the 3D world to each eye and (ii) they allow the user’s head position to be tracked.
We will discuss the former later when we consider output devices. The head tracking
is used primarily to feed into the output side. As the user’s head moves around the
user ought to see different parts of the scene. However, some systems also use the
user’s head direction to determine the direction of movement within the space and
even which objects to manipulate (rather like the eyegaze systems). You can think of
this rather like leading a horse in reverse. If you want a horse to go in a particular
direction, you use the reins to pull its head in the desired direction and the horse
follows its head.
Whole-body tracking
Some VR systems aim to be immersive, that is to make the users feel as if they are
really in the virtual world. In the real world it is possible (although not usually wise)
to walk without looking in the direction you are going. If you are driving down
the road and glance at something on the roadside you do not want the car to do a
sudden 90-degree turn! Some VR systems therefore attempt to track different kinds
of body movement. Some arcade games have a motorbike body on which you can
lean into curves. More strangely, small trampolines have been wired up so that the
user can control movement in virtual space by putting weight on different parts of
the trampoline. The user can literally surf through virtual space. In the extreme the
movement of the whole body may be tracked using devices similar to the dataglove,
or using image-processing techniques. In the latter, white spots are stuck at various
points of the user’s body and the position of these tracked using two or more cam-
eras, allowing the location of every joint to be mapped. Although the last of these
sounds a little constraining for the fashion conscious it does point the way to less
intrusive tracking techniques.
2.5.2 3D displays
Just as the 3D images used in VR have led to new forms of input device, they
also require more sophisticated outputs. Desktop VR is delivered using a standard
computer screen and a 3D impression is produced by using effects such as shadows,
occlusion (where one object covers another) and perspective. This can be very
effective and you can even view 3D images over the world wide web using a VRML
(virtual reality modeling language) enabled browser.
Seeing in 3D
Our eyes use many cues to perceive depth in the real world (see also Chapter 1). It is
in fact quite remarkable as each eye sees only a flattened form of the world, like
a photograph. One important effect is stereoscopic vision (or simply stereo vision).
Because each eye is looking at an object from a slightly different angle each sees a
different image and our brain is able to use this to assess the relative distance of dif-
ferent objects. In desktop VR this stereoscopic effect is absent. However, various
devices exist to deliver true stereoscopic images.
The start point of any stereoscopic device is the generation of images from differ-
ent perspectives. As the computer is generating images for the virtual world anyway,
this just means working out the right positions and angles corresponding to the typ-
ical distance between eyes on a human face. If this distance is too far from the natural
one, the user will be presented with a giant’s or gnat’s eye view of the world!
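Computing the two camera positions is straightforward. The sketch below is our own illustration; the 0.063 m interpupillary distance is an assumed typical adult value, and the `eye_positions` helper is hypothetical.

```python
# Sketch of stereo pair generation (our illustration): render the scene twice
# from two camera positions separated by the interpupillary distance (IPD).
# The eyes are offset along the camera's "right" vector, perpendicular to the
# viewing direction. Using the wrong separation scales the apparent world:
# too large gives a giant's-eye view, too small a gnat's-eye view.

IPD = 0.063  # metres; an assumed typical adult value

def eye_positions(head, right, ipd=IPD):
    """head: (x, y, z) point between the eyes; right: unit vector to the
    viewer's right. Returns (left_eye, right_eye) camera positions."""
    half = ipd / 2.0
    left_eye = tuple(h - half * r for h, r in zip(head, right))
    right_eye = tuple(h + half * r for h, r in zip(head, right))
    return left_eye, right_eye
```

Each returned position is then used as a separate camera for one eye's image; everything else about the rendering is identical for the two views.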
Different techniques are then used to ensure that each eye sees the appropriate
image. One method is to have two small screens fitted to a pair of goggles. A differ-
ent image is then shown to each eye. These devices are currently still quite cumber-
some and the popular image of VR is of a user with head encased in a helmet with
something like a pair of inverted binoculars sticking out in front. However, smaller
and lighter LCDs are now making it possible to reduce the devices towards the size
and weight of ordinary spectacles.
An alternative method is to have a pair of special spectacles connected so that each
eye can be blanked out by timed electrical signals. If this is synchronized with the
frame rate of a computer monitor, each eye sees alternate images. Similar techniques
use polarized filters in front of the monitor and spectacles with different polarized
lenses. These techniques are both effectively using similar methods to the red–green
3D spectacles given away in some breakfast cereals. Indeed, these red–green spectacles
have been used in experiments in wide-scale 3D television broadcasts. However,
the quality of the 3D image from the polarized and blanked eye spectacles is sub-
stantially better.
The ideal would be to be able to look at a special 3D screen and see 3D images just
as one does with a hologram – 3D television just like in all the best sci-fi movies!
But there is no good solution to this yet. One method is to inscribe the screen with
small vertical grooves forming hundreds of prisms. Each eye then sees only alternate
dots on the screen allowing a stereo image at half the normal horizontal resolution.
However, these screens have very narrow viewing angles, and are not ready yet for
family viewing.
In fact, getting stereo images is not the whole story. Not only do our eyes see dif-
ferent things, but each eye also focusses on the current object of interest (small mus-
cles change the shape of the lens of the eye). The images presented to the
eye are generated at some fixed focus, often with effectively infinite depth of field.
This can be confusing and tiring. There has been some progress recently on using
lasers to detect the focal depth of each eye and adjust the images correspondingly,
similar to the technology used for eye tracking. However, this is not currently used
extensively.
VR motion sickness
We all get annoyed when computers take a long time to change the screen, pop up
a window, or play a digital movie. However, with VR the effects of poor display
performance can be more serious. In real life when we move our head the image our
eyes see changes accordingly. VR systems produce the same effect by using sensors in
the goggles or helmet and then using the position of the head to determine the right
image to show. If the system is slow in producing these images a lag develops
between the user moving his head and the scene changing. If this delay is more than
a hundred milliseconds or so the feeling becomes disorienting. The effect is very
similar to that of being at sea. You stand on the deck looking out to sea, the boat
gently rocking below you. Tiny channels in your ears detect the movement telling
your brain that you are moving; your eyes see the horizon moving in one direction
and the boat in another. Your brain gets confused and you get sick. Users of VR can
experience similar nausea and few can stand it for more than a short while. In fact,
keeping laboratories sanitary has been a major push in improving VR technology.
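The roughly 100 millisecond threshold mentioned above suggests a simple monitoring check. The structure below is our own sketch (the function names are hypothetical); only the threshold comes from the text.

```python
# Sketch (threshold from the text, structure ours): a VR system can log the
# time a head movement is sensed and the time the corresponding frame is
# displayed, then flag lags beyond the roughly 100 ms point at which the
# delay becomes disorienting.

DISORIENTING_LAG_S = 0.100  # ~100 milliseconds, as suggested in the text

def lag_ok(sensed_at, displayed_at, limit=DISORIENTING_LAG_S):
    """True if the motion-to-display delay is within the tolerable limit."""
    return (displayed_at - sensed_at) <= limit

def worst_lag(samples):
    """samples: list of (sensed_at, displayed_at) timestamp pairs in seconds."""
    return max(d - s for s, d in samples)
```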
Simulators and VR caves
Because of the problems of delivering a full 3D environment via head-mounted
displays, some virtual reality systems work by putting the user within an environ-
ment where the virtual world is displayed upon it. The most obvious examples of this
are large flight simulators – you go inside a mock-up of an aircraft cockpit and the
scenes you would see through the windows are projected onto the virtual windows.
In motorbike or skiing simulators in video arcades large screens are positioned to fill
the main part of your visual field. You can still look over your shoulder and see your
friends, but while you are engaged in the game it surrounds you.
More general-purpose rooms called caves have large displays positioned all
around the user, or several back projectors. In these systems the user can look all
around and see the virtual world surrounding them.
2.6 PHYSICAL CONTROLS, SENSORS AND SPECIAL DEVICES
As we have discussed, computers are coming out of the box. The mouse, keyboard
and screen of the traditional computer system are not relevant or possible in
applications that now employ computers such as interactive TV, in-car navigation
systems or personal entertainment. These devices may have special displays, may use
sound, touch and smell as well as visual displays, may have dedicated controls and
may sense the environment or your own bio-signs.
2.6.1 Special displays
Apart from the CRT screen there are a number of visual outputs utilized in com-
plex systems, especially in embedded systems. These can take the form of analog
representations of numerical values, such as dials, gauges or lights to signify a certain
system state. Flashing light-emitting diodes (LEDs) are used on the back of some
computers to signify the processor state, whilst gauges and dials are found in process
control systems. Once you start in this mode of thinking, you can contemplate
numerous visual outputs that are unrelated to the screen. One visual display that has
found a specialized niche is the head-up display that is used in aircraft. The pilot is
fully occupied looking forward and finds it difficult to look around the cockpit to get
information. There are many different things that need to be known, ranging from
data from tactical systems to navigational information and aircraft status indicators.
The head-up display projects a subset of this information into the pilot’s line
of vision so that the information is directly in front of her eyes. This obviates the
need for large banks of information to be scanned with the corresponding lack
of attention to what is happening outside, and makes the pilot’s job easier. Less
important information is usually presented on a smaller number of dials and gauges
in the cockpit to avoid cluttering the head-up display, and these can be monitored
less often, during times of low stress.
2.6.2 Sound output
Another mode of output that we should consider is that of auditory signals. Often
designed to be used in conjunction with screen displays, auditory outputs are poorly
understood: we do not yet know how to utilize sound in a sensible way to achieve
maximum effect and information transference. We have discussed speech previ-
ously, but other sounds such as beeps, bongs, clanks, whistles and whirrs are all used
to varying effect. As well as conveying system output, sounds offer an important level
of feedback in interactive systems. Keyboards can be set to emit a click each time
a key is pressed, and this appears to speed up interactive performance. Telephone
keypads often sound different tones when the keys are pressed; the presence of a sound
signifies that the key has been successfully pressed, whilst the actual tone provides
some information about the particular key that was pressed. The advantage of audit-
ory feedback is evident when we consider a simple device such as a doorbell. If we
press it and hear nothing, we are left undecided. Should we press it again, in case
we did not do it right the first time, or did it ring but we did not hear it? And if we
press it again but it actually did ring, will the people in the house think we are very
rude, ringing insistently? We feel awkward and a little stressed. If we were using a
computer system instead of a doorbell and were faced with a similar problem, we
would not enjoy the interaction and would not perform as well. Yet it is a simple
problem that could be easily rectified by a better initial design, using sound. Chap-
ter 10 will discuss the use of the auditory channel in more detail.
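The telephone keypad example uses dual-tone multi-frequency (DTMF) signalling: each key sounds one frequency from its row and one from its column, so the sound both confirms the press and identifies the key. The frequencies below are the standard DTMF values; the code itself is just an illustrative sketch.

```python
# Standard DTMF frequencies: each keypad key sounds its row tone plus its
# column tone simultaneously, so hearing a tone confirms the press and the
# particular pair identifies which key it was.

ROW_HZ = [697, 770, 852, 941]
COL_HZ = [1209, 1336, 1477]
KEYS = ["123", "456", "789", "*0#"]

def dtmf_tones(key):
    """Return the (row_hz, col_hz) frequency pair for a keypad key."""
    for r, row in enumerate(KEYS):
        c = row.find(key)
        if c != -1:
            return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a keypad key: {key!r}")
```

Pressing ‘5’, for instance, sounds 770 Hz and 1336 Hz together, a pair no other key produces.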
2.6.3 Touch, feel and smell
Our other senses are used less in normal computer applications, but you may have
played computer games where the joystick or artificial steering wheel vibrated, per-
haps when a car was about to go off the track. In some VR applications, such as the
use in medical domains to ‘practice’ surgical procedures, the feel of an instrument
moving through different tissue types is very important. The devices used to emulate
these procedures have force feedback, giving different amounts of resistance depend-
ing on the state of the virtual operation. These various forms of force, resistance and
texture that influence our physical senses are called haptic devices.
Haptic devices are not limited to virtual environments, but are used in specialist
interfaces in the real world too. Electronic braille displays either have pins that rise
or fall to give different patterns, or may involve small vibration pins. Force feedback
has been used in the design of in-car controls.
In fact, the car gives a very good example of the power of tactile feedback. If
you drive over a small bump in the road the car is sent slightly off course; however,
the chances are that you will correct yourself before you are consciously aware of the
bump. Within your body you have reactions that push back slightly against pressure
to keep your limbs where you ‘want’ them, or move your limbs out of the way when
you brush against something unexpected. These responses occur in your lower
brain and are very fast, not involving any conscious effort. So, haptic devices can
access very fast responses, but these responses are not fully controlled. This can be
used effectively in design, but of course also with caution.
Texture is more difficult as it depends on small changes between neighboring
points on the skin. Also, most of our senses notice change rather than fixed stimuli,
so we usually feel textures when we move our fingers over a surface, not just when
resting on it. Technology for this is just beginning to become available.
There is evidence that smell is one of the strongest cues to memory. Various
historical recreations such as the Jorvik Centre in York, England, use smells to create
a feeling of immersion in their static displays of past life. Some arcade games also
generate smells, for example, burning rubber as your racing car skids on the track.
These examples both use a fixed smell in a particular location. There have been
several attempts to produce devices to allow smells to be recreated dynamically in
response to games or even internet sites. The technical difficulty is that our noses do
not have a small set of basic smells that are mixed (like salt/sweet/sour/bitter/savoury
on our tongue), but instead there are thousands of different types of receptor
responding to different chemicals in the air. The general pattern of devices to gener-
ate smells is to have a large repertoire of tiny scent-containing capsules that are
released in varying amounts on demand – rather like a printer cartridge with
hundreds of ink colors! So far there appears to be no mass market for these devices,
but they may eventually develop from niche markets.
Smell is a complex multi-dimensional sense and has a peculiar ability to trigger
memory, but cannot be changed rapidly. These qualities may prove valuable in areas
where a general sense of location and awareness is desirable. For example, a project
at the Massachusetts Institute of Technology explored the use of a small battery
of scent generators which may be particularly valuable for ambient displays and
background awareness [198, 161].
2.6.4 Physical controls
Look at Figure 2.13. In it you can see the controls for a microwave, a washing
machine and a personal MiniDisc player. See how they each use very different phys-
ical devices: the microwave has a flat plastic sheet with soft buttons, the washing
machine large switches and knobs, and the MiniDisc has small buttons and an inter-
esting multi-function end.
A desktop computer system has to serve many functions and so has generic keys
and controls that can be used for a variety of purposes. In contrast, these dedicated
control panels have been designed for a particular device and for a single use. This is
why they differ so much.
Looking first at the microwave, it has a flat plastic control panel. The buttons on
the panel are pressed and ‘give’ slightly. The choice of the smooth panel is probably
partly for visual design – it looks streamlined! However, there are also good prac-
tical reasons. The microwave is used in the kitchen whilst cooking, with hands that
may be greasy or have food on them. The smooth controls have no gaps where food
can accumulate and clog buttons, so it can easily be kept clean and hygienic.
When using the washing machine you are handling dirty clothes, which may be
grubby, but not to the same extent, so the smooth easy-clean panel is less important
(although some washing machines do have smooth panels). It has several major
DESIGN FOCUS
Feeling the road
In the BMW 7 Series you will find a single haptic feedback control for many of the functions that would
normally have dedicated controls. It uses technology developed by Immersion Corporation who are
also behind the force feedback found in many medical and entertainment haptic devices. The iDrive
control slides backwards and forwards and rotates to give access to various menus and lists of options.
The haptic feedback allows the user to feel ‘clicks’ appropriate to the number of items in the various
menu lists.
See: www.immersion.com/ and www.bmw.com/ Picture courtesy of BMW AG
settings and the large buttons act both as control and display. Also the dials for
dryer timer and the washing program act both as a means to set the desired time or
program and to display the current state whilst the wash is in progress.
Finally, the MiniDisc controller needs to be small and unobtrusive. It has tiny
buttons, but the end control is most interesting. It twists from side to side and
also can be pulled and twisted. This means the same control can be used for two
different purposes. This form of multi-function control is common in small
devices.
We discussed the immediacy of haptic feedback and these lessons are also import-
ant at the level of creating physical devices; do keys, dials, etc., feel as if they have
been pressed or turned? Getting the right level of resistance can make the device
work naturally, give you feedback that you have done something, or let you know
that you are controlling something. Where for some reason this is not possible,
something has to be done to prevent the user getting confused, perhaps pressing but-
tons twice; for example, the smooth control panel of the microwave in Figure 2.13
offers no tactile feedback, but beeps for each keypress. We will discuss these design
issues further when we look at user experience in Chapter 3 (Section 3.9).
Figure 2.13 Physical controls on microwave, washing machine and MiniDisc. Source: Photograph bottom
right by Alan Dix with permission from Sony (UK)
Whereas texture is difficult to generate, it is easy to build into materials. This can
make a difference to the ease of use of a device. For example, a touchpad is smooth,
but a keyboard nipple is usually rubbery. If they were the other way round it would
be hard to drag your finger across the touchpad or to operate the nipple without
slipping. Texture can also be used to disambiguate. For example, most keyboards
have a small raised dot on the ‘home’ keys for touch typists and some calculators and
phones do the same on the ‘5’ key. This is especially useful in applications when the
eyes are elsewhere.
2.6.5 Environment and bio-sensing
In a public washroom there are often no controls for the wash basins, you simply put
your hands underneath and (hope that) the water flows. Similarly when you open
the door of a car, the courtesy light turns on. The washbasin is controlled by a small
infrared sensor that is triggered when your hands are in the basin (although it is
sometimes hard to find the ‘sweet spot’ where this happens!). The courtesy lights are
triggered by a small switch in the car door.

DESIGN FOCUS
Smart-Its – making using sensors easy

Building systems with physical sensors is no easy task. You need a soldering iron, plenty of experience
in electronics, and even more patience. Although some issues are unique to each sensor or project,
many of the basic building blocks are similar – connecting simple microprocessors to memory and
networks, connecting various standard sensors such as temperature, tilt, etc.
The Smart-Its project has made this job easier by creating a collection of components and an
architecture for adding new sensors. There are a number of basic Smart-It boards – the photo on
the left shows a microprocessor with wireless connectivity. Onto these boards are plugged a variety of
modules – in the center is a sensor board including temperature and light, and on the right is a power
controller.
See: www.smart-its.org/ Source: Courtesy of Hans Gellersen
Although we are not always conscious of them, there are many sensors in our
environment – controlling automatic doors, energy saving lights, etc. and devices
monitoring our behavior such as security tags in shops. The vision of ubiquitous
computing (see Chapters 4 and 20) suggests that our world will be filled with such
devices. Certainly the gap between science fiction and day-to-day life is narrow;
for example, in the film Minority Report (20th Century Fox) iris scanners identify
each passer-by to feed them dedicated advertisements, but you can buy just such an
iris scanner as a security add-on for your home computer.
There are many different sensors available to measure virtually anything: temperature,
movement (ultrasound, infrared, etc.), location (GPS, the global positioning system,
in mobile devices) and weight (pressure sensors). In addition, audio and video
information can be analyzed to identify individuals and to detect what they are doing.
This all sounds Big Brother-like, but such sensing is also used in ordinary applications, such as the
washbasin.
Sensors can also be used to capture physiological signs such as body temperature,
unconscious reactions such as blink rate, or unconscious aspects of activities such
as typing rate, vocabulary shifts (e.g. modal verbs). For example, in a speech-based
game, Tsukahara and Ward use gaps in speech and prosody (patterns of rhythm,
pitch and loudness in speech) to infer the user’s emotional state and thus the nature
of acceptable responses [350], and Allanson discusses a variety of physiological
sensors to create ‘electrophysiological interactive computer systems’ [12].
2.7 PAPER: PRINTING AND SCANNING
Some years ago, a recurrent theme of information technology was the paperless office.
In the paperless office, documents would be produced, dispatched, read and filed
online. The only time electronic information would be committed to paper would be
when it went out of the office to ordinary customers, or to other firms who were lag-
gards in this technological race. This vision was fuelled by rocketing property prices,
and the realization that the floor space for a wastepaper basket could cost thousands
in rent each year. Some years on, many traditional paper files are now online, but the
desire for the completely paperless office has faded. Offices still have wastepaper bas-
kets, and extra floor space is needed for the special computer tables to house 14-inch
color monitors.
In this section, we will look at some of the available technology that exists to get
information to and from paper. We will look first at printing, the basic technology,
and issues raised by it. We will then go on to discuss the movement from paper back
into electronic media. Although the paperless office is no longer seen as the goal, the
less-paper office is perhaps closer, now that the technologies for moving between
media are better.
2.7.1 Printing
If anything, computer systems have made it easier to produce paper documents. It is
so easy to run off many copies of a letter (or book) in order to get it looking ‘just
right’. Older printers had a fixed set of characters available on a printhead. These var-
ied from the traditional line printer to golf-ball and daisy-wheel printers. To change
a typeface or the size of type meant changing the printhead, and was an awkward,
and frequently messy, job, but for many years the daisy-wheel printer was the only
means of producing high-quality output at an affordable price. However, the drop in
the price of laser printers coupled with the availability of other cheap high-quality
printers means that daisy-wheels are fast becoming a rarity.
All of the popular printing technologies, like screens, build the image on the paper
as a series of dots. This enables, in theory, any character set or graphic to be printed,
limited only by the resolution of the dots. This resolution is measured in dots per inch
(dpi). Imagine a sheet of graph paper, and building up an image by putting dots at
the intersection of each line. The number of lines per inch in each direction is the
resolution in dpi. For some mechanical printers this is slightly confused: the dots
printed may be bigger than the gaps, neighboring printheads may not be able to
print simultaneously and may be offset relative to one another (a diamond-shaped
rather than rectangular grid). These differences do not make too much difference to
the user, but mean that, given two printers at the same nominal resolution, the output
of one looks better than that of the other, because it has managed the physical
constraints better.

Common types of dot-based printers

Dot-matrix printers
These use an inked ribbon, like a typewriter, but instead of a single character-shaped head striking
the paper, a line of pins is used, each of which can strike the ribbon and hence dot the paper.
Horizontal resolution can be varied by altering the speed of the head across the paper, and vertical
resolution can be improved by sending the head twice across the paper at a slightly different
position. So, dot-matrix printers can produce fast draft-quality output or slower ‘letter’-quality
output. They are cheap to run, but cannot compete with the quality of ink-jet and laser printers for
general office and home printing. They are now only used for bulk printing, or where carbon paper
is required for payslips, check printing, etc.

Ink-jet and bubble-jet printers
These operate by sending tiny blobs of ink from the printhead to the paper. The ink is squirted at
pressure from an ink-jet, whereas bubble-jets use heat to create a bubble. Both are quite quiet in
operation. The ink from the bubble-jet (being a bubble rather than a droplet) dries more quickly
than the ink-jet and so is less likely to smear. Both approach laser quality, but the bubble-jet dots
tend to be more accurately positioned and of a less broken shape.

Laser printers
These use similar technology to a photocopier: ‘dots’ of electrostatic charge are deposited on a
drum, which then picks up toner (black powder). This is then rolled onto the paper and cured by
heat. The curing is why laser printed documents come out warm, and the electrostatic charge is
why they smell of ozone! In addition, some toner can be highly toxic if inhaled, but this is more a
problem for full-time maintenance workers than end-users changing the occasional toner cartridge.
Laser printers give nearly typeset-quality output, with top-end printers used by desktop publishing
firms. Indeed, many books are nowadays produced using laser printers. The authors of this book
have produced camera-ready copy for other books on 300 and 600 dpi laser printers, although this
book required higher quality and the first edition was typeset at 1200 dpi onto special bromide
paper.
The most common types of dot-based printers are dot-matrix printers, ink-jet
printers and laser printers. These are listed roughly in order of increasing resolution
and quality, where dot-matrix printers typically have a resolution of 80–120 dpi ris-
ing to about 300–600 dpi for ink-jet printers and 600–2400 dpi for laser printers. By
varying the quantity of ink and quality of paper, ink-jet printers can be used to print
photo-quality prints from digital photographs.
Printing in the workplace
Although ink-jet and laser printers have the lion’s share of the office and home printer mar-
ket, there are many more specialist applications that require different technology.
Most shop tills use dot-matrix printing where the arrangement is often very clever, with one print-
head serving several purposes. The till will usually print one till roll which stays within the machine,
recording all transactions for audit purposes. An identical receipt is printed for the customer.
In addition, many will print onto the customer’s own check or produce a credit card slip for the
customer to sign. Sometimes the multiple copies are produced using two or more layers of paper
where the top layer receives the ink and the lower layers use pressure-sensitive paper – not
possible using ink-jet or laser technology. Alternatively, a single printhead may move back and
forth over several small paper rolls within the same machine, as well as moving over the slot for
the customer’s own check.
As any printer owner will tell you, office printers are troublesome, especially as they age. Dif-
ferent printing technology is therefore needed in harsh environments or where a low level of
supervision is required. Thermal printers use special heat-sensitive paper that changes color when
heated. The printhead simply heats the paper where it wants a dot. Often only one line of dots
is produced per pass, in contrast to dot-matrix and ink-jet printers, which have several pins or
jets in parallel. The image is then produced using several passes per line, achieving a resolution
similar to a dot-matrix. Thermal paper is relatively expensive and not particularly nice to look
at, but thermal printers are mechanically simple and require little maintenance (no ink or toner
splashing about). Thermal printers are used in niche applications, for example industrial equipment,
some portable printers, and fax machines (although many now use plain paper).
As well as resolution, printers vary in speed and cost. Typically, office-quality ink-
jet or laser printers produce between four and eight pages per minute. Dot-matrix
printers are more often rated in characters per second (cps), and typical speeds may
be 200 cps for draft and 50 cps for letter-quality print. In practice, this means no
more than a page or so per minute. These are maximum speeds for simple text, and
printers may operate much more slowly for graphics.
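The cps-to-pages arithmetic can be checked with a quick calculation. This is a sketch: the figure of roughly 3000 characters per page is an assumption for illustration, not a number from the text.

```python
# Rough printing-time estimate for a dot-matrix printer.
# Assumption: a full page of text is about 50 lines of 60 characters.
chars_per_page = 50 * 60

draft_cps = 200    # draft-quality speed quoted in the text
letter_cps = 50    # letter-quality speed quoted in the text

draft_seconds = chars_per_page / draft_cps     # 15 s per page
letter_seconds = chars_per_page / letter_cps   # 60 s per page

print(f"draft:  {60 / draft_seconds:.1f} pages/min")   # 4.0 pages/min
print(f"letter: {60 / letter_seconds:.1f} pages/min")  # 1.0 pages/min
```

At letter quality this works out to a page a minute, consistent with the ‘no more than a page or so per minute’ figure above.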
Color ink-jet printers are substantially cheaper than even monochrome laser
printers. However, the recurrent costs of consumables may easily dominate this
initial cost. Both jet and laser printers have special-purpose parts (print cartridges,
toner, print drums), which need to be replaced every few thousand sheets; and they
must also use high-grade paper. It may be more difficult to find suitable grades of
recycled paper for laser printers.
2.7.2 Fonts and page description languages
Some printers can act in a mode whereby any characters sent to them (encoded in
ASCII, see Section 2.8.5) are printed, typewriter style, in a single font. Another case,
simple in theory, is when you have a bitmap picture and want to print it. The dots
to print are sent to the printer, and no further interpretation is needed. However, in
practice, it is rarely so simple.
Many printed documents are far more complex – they incorporate text in many
different fonts and many sizes, often italicized, emboldened and underlined. Within
the text you will find line drawings, digitized photographs and pictures generated
from ‘paint’ packages, including the ubiquitous ‘clip art’. Sometimes the computer
does all the work, converting the page image into a bitmap of the right size to be sent
to the printer. Alternatively, a description of the page may be sent to the printer.
At the simplest level, this will include commands to set the print position on the
page, and change the font size.
More sophisticated printers can accept a page description language, the most com-
mon of which is PostScript. This is a form of programming language for printing. It
includes some standard programming constructs, but also some special ones: paths
for drawing lines and curves, sophisticated character and font handling and scaled
bitmaps. The idea is that the description of a page is far smaller than the associated
bitmap, reducing the time taken to send the page to the printer. A bitmap version
of an A4 laser printer page at 300 dpi takes 8 Mbytes; to send this down a standard
serial printer cable would take 10 minutes! However, a computer in the printer has
to interpret the PostScript program to print the page; this is typically faster than 10
minutes, but is still the limiting factor for many print jobs.
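The 8 Mbyte and 10 minute figures above can be reconstructed under stated assumptions: one byte per dot, and a serial line running at 115.2 kbit/s (the text does not give the line speed, so that rate is an assumption for illustration).

```python
# Reconstructing the bitmap-size and transfer-time figures for an
# A4 page at 300 dpi sent down a serial printer cable.
dpi = 300
width_in, height_in = 8.27, 11.69    # A4 page (210 x 297 mm) in inches

dots = round(width_in * dpi) * round(height_in * dpi)
bitmap_bytes = dots                  # assumption: one byte per dot

serial_bps = 115_200                 # assumed serial line rate, bits/s
seconds = bitmap_bytes * 8 / serial_bps

print(f"{bitmap_bytes / 1e6:.1f} Mbytes")  # about 8.7 Mbytes
print(f"{seconds / 60:.1f} minutes")       # about 10 minutes
```

Under these assumptions the page image is a little under 9 Mbytes and takes roughly ten minutes to transfer, matching the figures in the text.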
Text is printed in a font with a particular size and shape. The size of a font is
measured in points (pt). The point is a printer’s measure and is about 1/72 of an
inch. The point size of the font is related to its height: a 12 point font has about
six lines per inch. The shape of a font is determined by its font name, for example
Times Roman, Courier or Helvetica. Times Roman font is similar to the type of
many newspapers, such as The Times, whereas Courier has a typewritten shape.
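The relationship between point size and line density is simple arithmetic. This sketch assumes type set solid, with no extra leading between lines, so real documents have slightly fewer lines per inch.

```python
# 1 point = 1/72 inch, so a font set solid at n points
# gives 72/n lines per inch.
def lines_per_inch(point_size: float) -> float:
    return 72 / point_size

print(lines_per_inch(12))  # 6.0 - "about six lines per inch"
print(lines_per_inch(10))  # 7.2 - the size this book is set in
```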
Some fonts, such as Courier, are fixed pitch, that is each character has the
same width. The alternative is a variable-pitched font, such as Times Roman or
Gill Sans, where some characters, such as the ‘m’, are wider than others, such as
the ‘i’. Another characteristic of fonts is whether they are serif or sans-serif. A serif
font has fine, short cross-lines at the ends of the strokes, imitating those found on cut
stone lettering. A sans-serif font has square-ended strokes. In addition, there are
special fonts looking like Gothic lettering or cursive script, and fonts of Greek letters
and special mathematical symbols.
This book is set in 10 point Minion font using PostScript. Minion is a variable-
pitched serif font. Figure 2.14 shows examples of different fonts.
Figure 2.14 Examples of different fonts, including a mathematics font: αβξ±π∈∀∞⊥≠ℵ∂√∃
DESIGN FOCUS
Readability of text
There is a substantial body of knowledge about the readability of text, both on screen and on paper.
An MSc student visited a local software company and, on being shown some of their systems, remarked
on the fact that they were using upper case throughout their displays. At that stage she had only com-
pleted part of an HCI course but she had read Chapter 1 of this book and already knew that WORDS
WRITTEN IN BLOCK CAPITALS take longer to read than those in lower case. Recall that this is largely
because of the clues given by word shapes and is the principle behind ‘look and say’ methods of teach-
ing children to read. The company immediately recognized the value of the advice and she instantly rose
in their esteem!
However, as with many interface design guidelines there are caveats. Although lower-case words are
easier to read, individual letters and nonsense words are clearer in upper case. For example, one writes
flight numbers as ‘BA793’ rather than ‘ba793’. This is particularly important when naming keys to press
(for example, ‘Press Q to quit’) as keyboards have upper-case legends.
Font shapes can also make a difference; for printed text, serif fonts make it easier to run one’s eye
along a line of text. However, they usually reproduce less well on screen where the resolution is
poorer.
2.7.3 Screen and page
A common requirement of word processors and desktop publishing software is that
what you see is what you get (see also Chapters 4 and 17), which is often called by its
acronym WYSIWYG (pronounced whizz-ee-wig). This means that the appearance
of the document on the screen should be the same as its eventual appearance on
the printed page. In so far as this means that, for example, centered text is displayed
centered on the screen, this is reasonable. However, this should not cloud the fact
that screen and paper are very different media.
A typical screen resolution is about 72 dpi compared with a laser printer at over
600 dpi. Some packages can show magnified versions of the document in order to
help in this. Most screens use an additive color model using red, green and blue light,
whereas printers use a subtractive color model with cyan, magenta, yellow and black
inks, so conversions have to be made. In addition, the sizes and aspect ratios are very
different. An A4 page is about 11 inches tall by 8 wide (297 × 210 mm), whereas a
screen is often of similar dimensions, but wider than it is tall.
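The additive-to-subtractive conversion can be sketched as follows. This is the naive RGB-to-CMY formula only; real printer drivers also perform black generation (the K in CMYK) and apply device color profiles.

```python
# Naive conversion from additive RGB (screen light) to subtractive
# CMY (printer inks), values in the range 0.0-1.0. Each ink absorbs
# the light that its complementary primary would emit.
def rgb_to_cmy(r: float, g: float, b: float) -> tuple:
    return (1 - r, 1 - g, 1 - b)

print(rgb_to_cmy(1, 0, 0))  # pure red  -> (0, 1, 1): magenta + yellow ink
print(rgb_to_cmy(1, 1, 1))  # white     -> (0, 0, 0): no ink at all
```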
These differences cause problems when designing software. Should you try to
make the screen image as close to the paper as possible, or should you try to make
the best of each? One approach to this would be to print only what could be dis-
played, but that would waste the extra resolution of the printer. On the other
hand, one can try to make the screen as much like paper as possible, which
is the intention behind the standard use of black text on a white background,
rotatable A4 displays, and tablet PCs. This is a laudable aim, but cannot get rid of
all the problems.
A particular problem lies with fonts. Imagine we have a line of ‘m’s, each having a
width of 0.15 inch (4 mm). If we display them on a 72 dpi screen, then we can make
the screen character 10 or 11 dots wide, in which case the screen version will be
narrower or wider than the printed version. Alternatively, we can place each screen
character as near as possible to where the printed characters would lie, in which case
the ‘m’s on the screen would have different spaces between them: ‘mm mm mm mm
m’. The latter looks horrible on the screen, so most software chooses the former
approach. This means that text that aligns on screen may not do so on printing.
Some systems use a uniform representation for screen and printer, using the same
font descriptions and even, in the case of the NeXT operating system, PostScript
for screen display as well as printer output (also PDF with Mac OS X). However,
this simply exports the problem from the application program to the operating
system.
The differences between screen and printer mean that different forms of graphic
design are needed for each. For example, headings and changes in emphasis are made
using font style and size on paper, but using color, brightness and line boxes on
screen. This is not usually a problem for the display of the user’s own documents as
the aim is to give the user as good an impression of the printed page as possible, given
the limitations. However, if one is designing parallel paper and screen forms, then
one has to trade off consistency between the two representations with clarity in each.
An overall similar layout, but with different forms of presentation for details, may be
appropriate.
2.7.4 Scanners and optical character recognition
Printers take electronic documents and put them on paper – scanners reverse this
process. They start by turning the image into a bitmap, but with the aid of optical
character recognition can convert the page right back into text. The image to be con-
verted may be printed, but may also be a photograph or hand-drawn picture.
There are two main kinds of scanner: flat-bed and hand-held. With a flat-bed
scanner, the page is placed on a flat glass plate and the whole page is converted
into a bitmap. A variant of the flat-bed is where sheets to be scanned are pulled
through the machine, common in multi-function devices (printer/fax/copier). Many
flat-bed scanners allow a small pile of sheets to be placed in a feed tray so that
they can all be scanned without user intervention. Hand-held scanners are pulled
over the image by hand. As the head passes over an area it is read in, yielding
a bitmap strip. A roller ensures that the scanner knows how fast it is
being pulled and thus how big the image is. The scanner is typically only 3 or 4 inches
(80 or 100 mm) wide and may even be the size of a large pen (mainly used for
scanning individual lines of text). This means at least two or three strips must be
‘glued’ together by software to make a whole page image, quite a difficult process
as the strips will overlap and may not be completely parallel to one another, as
well as suffering from problems of different brightness and contrast. However,
for desktop publishing small images such as photographs are quite common, and
as long as one direction is less than the width of the scanner, they can be read in
one pass.
Scanners work by shining a beam of light at the page and then recording the intens-
ity and color of the reflection. Like photocopiers, the color of the light that is shone
means that some colors may appear darker than others on a monochrome scanner.
For example, if the light is pure red, then a red image will reflect the light completely
and thus not appear on the scanned image.
Like printers, scanners differ in resolution, commonly between 600 and 2400 dpi,
and like printers the quoted resolution needs careful interpretation. Many have a
lower resolution scanhead but digitally interpolate additional pixels – the same is
true for some digital cameras. Monochrome scanners are typically only found in
multi-function devices, but color scanners usually have monochrome modes for
black and white or grayscale copying. Scanners will usually return up to 256 levels
of gray or RGB (red, green, blue) color. If a pure monochrome image is required
(for instance, from a printed page), then the software can threshold the grayscale
image; that is, turn all pixels darker than some particular value black, and the rest white.
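Thresholding is easy to express in code. A minimal sketch, over a grayscale image held as a list of rows with 0 as black and 255 as white:

```python
# Threshold a grayscale image: pixels darker than the cut-off
# become black (0), everything else becomes white (255).
def threshold(image, cutoff=128):
    return [[0 if px < cutoff else 255 for px in row] for row in image]

page = [[250,  30, 200],
        [ 10, 140, 255]]
print(threshold(page))
# [[255, 0, 255], [0, 255, 255]]
```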
Scanners are used extensively in desktop publishing (DTP) for reading in hand-
drawn pictures and photographs. This means that cut and paste can be performed
electronically rather than with real glue. In addition, the images can be rotated,
scaled and otherwise transformed, using a variety of image manipulation software
tools. Such tools are becoming increasingly powerful, allowing complex image trans-
formations to be easily achieved; these range from color correction, through the
merging of multiple images to the application of edge-detection and special effects
filters. The use of multiple layers allows photomontage effects that would be imposs-
ible with traditional photographic or paper techniques. Even where a scanned image
is simply going to be printed back out as part of a larger publication, some process-
ing typically has to be performed to match the scanned colors with those produced
during printing. For film photographs there are also special film scanners that can
scan photographic negatives or color slides. Of course, if the photographs are digital
no scanning is necessary.
Another application area is in document storage and retrieval systems, where
paper documents are scanned and stored on computer rather than (or sometimes as
well as) in a filing cabinet. The costs of maintaining paper records are enormous, and
electronic storage can be cheaper, more reliable and more flexible. Storing a bitmap
image is neither most useful (in terms of access methods), nor space efficient (as we
will see later), so scanning may be combined with optical character recognition to
obtain the text rather than the page image of the document.
Optical character recognition (OCR) is the process whereby the computer can
‘read’ the characters on the page. It is only comparatively recently that print could be
reliably read, since the wide variety of typefaces and print sizes makes this more
difficult than one would imagine – it is not simply a matter of matching a character
shape to the image on the page. In fact, OCR is rather a misnomer nowadays as,
although the document is optically scanned, the OCR software itself operates on the
bitmap image. Current software can recognize ‘unseen’ fonts and can even produce
output in word-processing formats, preserving super- and subscripts, centering,
italics and so on.
Another important area is electronic publishing for multimedia and the world
wide web. Whereas in desktop publishing the scanned image usually ends up (after
editing) back on paper, in electronic publishing the scanned image is destined to be
viewed on screen. These images may be used simply as digital photographs or may
be made active, whereby clicking on some portion of the image causes pertinent
information to be displayed (see Chapter 3 for more on the point-and-click style
of interaction). One big problem when using electronic images is the plethora of
formats for storing graphics (see Section 2.8.5). Another problem is the fact that
different computers can display different numbers of colors and that the appearance
of the same image on different monitors can be very different.
The importance of electronic publishing and also the ease of electronically manip-
ulating images for printing have made the digital camera increasingly popular.
Rather than capturing an image on film, a digital camera has a small light-sensitive
chip that can directly record an image into memory.
Worked exercise What input and output devices would you use for the following systems? For each, compare
and contrast alternatives, and if appropriate indicate why the conventional keyboard, mouse
and CRT screen may be less suitable.
(a) portable word processor
(b) tourist information system
(c) tractor-mounted crop-spraying controller
(d) air traffic control system
(e) worldwide personal communications system
(f) digital cartographic system.

Paper-based interaction

Paper is principally seen as an output medium. You type in some text, format it, print it and
read it. The idea of the paperless office was to remove the paper from the write–read loop entirely,
but it didn’t fundamentally challenge its place in the cycle as an output medium. However, this view of
paper as output has changed as OCR technology has improved and scanners have become commonplace.
Workers at Xerox Palo Alto Research Center (also known as Xerox PARC) capitalized on this by
using paper as a medium of interaction with computer systems [195]. A special identifying mark is
printed onto forms and similar output. The printed forms may have check boxes or areas for writ-
ing numbers or (in block capitals!) words. The form can then be scanned back in. The system reads
the identifying mark and thereby knows what sort of paper form it is dealing with. It doesn’t have
to use OCR on the printed text of the form as it printed it, but can detect the check boxes that
have been filled in and even recognize the text that has been written. The identifying mark the
researchers used is composed of backward and forward slashes, ‘\’ and ‘/’, and is called a glyph.
An alternative would have been to use bar codes, but the slashes were found to fax and scan
more reliably. The research version of this system was known as XAX, but it is now marketed as
Xerox PaperWorks.
One application of this technology is mail order catalogs. The order form is printed with a glyph.
When completed, forms can simply be collected into bundles and scanned in batches, generating
orders automatically. If the customer faxes an order the fax-receiving software recognizes the
glyph and the order is processed without ever being handled at the company end. Such a paper
user interface may involve no screens or keyboards whatsoever.
Some types of paper now have identifying marks micro-printed like a form of textured water-
mark. This can be used both to identify the piece of paper (as the glyph does), and to identify the
location on the paper. If this book were printed on such paper it would be possible to point at
a word or diagram with a special pen-like device and have it work out what page you are on
and where you are pointing and thus take you to appropriate web materials...perhaps the fourth
edition...
It is paradoxical that Xerox PARC, where much of the driving work behind current ‘mouse and
window’ computer interfaces began, has also developed this totally non-screen and non-mouse
paradigm. However, the common principle behind each is the novel and appropriate use of differ-
ent media for graceful interaction.
Answer In the later exercise on basic architecture (see Section 2.8.6), we focus on ‘typical’
systems, whereas here the emphasis is on the diversity of different devices needed for
specialized purposes. You can ‘collect’ devices – watch out for shop tills, bank tellers,
taxi meters, lift buttons, domestic appliances, etc.
(a) Portable word processor
The determining factors are size, weight and battery power. However, remember
the purpose: this is a word processor not an address book or even a data entry
device.
(i) LCD screen – low-power requirement
(ii) trackball or stylus for pointing
(iii) real keyboard – you can’t word process without a reasonable keyboard and
stylus handwriting recognition is not good enough
(iv) small, low-power bubble-jet printer – although not always necessary, this
makes the package stand alone. It is probably not so necessary that the printer
has a large battery capacity as printing can probably wait until a power point is
found.
(b) Tourist information system
This is likely to be in a public place. Most users will only visit the system once, so
the information and mode of interaction must be immediately obvious.
(i) touchscreen only – easy and direct interaction for first-time users (see also
Chapter 3)
(ii) NO mice or styluses – in a public place they wouldn’t stay long!
(c) Tractor-mounted crop-spraying controller
A hostile environment with plenty of mud and chemicals. Requires numerical input
for flow rates, etc., but probably no text
(i) touch-sensitive keypad – ordinary keypads would get blocked up
(ii) small dedicated LED display (LCDs often can’t be read in sunlight and large
screens are fragile)
(iii) again no mice or styluses – they would get lost.
(d) Air traffic control system
The emphasis is on immediately available information and rapid interaction. The
controller cannot afford to spend time searching for information; all frequently used
information must be readily available.
(i) several specialized displays – including overlays of electronic information on
radar
(ii) light pen or stylus – high-precision direct interaction
(iii) keyboard – for occasional text input, but consider making it fold out of the way.
(e) Worldwide personal communications system
Basically a super mobile phone! If it is to be kept on hand all the time it must be
very light and pocket sized. However, to be a ‘communications’ system one would
imagine that it should also act as a personal address/telephone book, etc.
(i) standard telephone keypad – the most frequent use
(ii) small dedicated LCD display – low power, specialized functions
(iii) possibly stylus for interaction – it allows relatively rich interaction with the
address book software, but little space
(iv) a ‘docking’ facility – the system itself will be too small for a full-sized key-
board(!), but you won’t want to enter in all your addresses and telephone num-
bers by stylus!
(f) Digital cartographic system
This calls for very high-precision input and output facilities. It is similar to CAD in
terms of the screen facilities and printing, but in addition will require specialized
data capture.
(i) large high-resolution color VDU (20 inch or bigger) – these tend to be enor-
mously big (from back to front). LCD screens, although promising far thinner
displays in the long term, cannot at present be made large enough
(ii) digitizing tablet – for tracing data on existing paper maps. It could also double
up as a pointing device for some interaction
(iii) possibly thumbwheels – for detailed pointing and positioning tasks
(iv) large-format printer – indeed very large: an A2 or A1 plotter at minimum.
2.8 MEMORY
Like human memory, we can think of the computer’s memory as operating at dif-
ferent levels, with those that have the faster access typically having less capacity. By
analogy with the human memory, we can group these into short-term and long-term
memories (STM and LTM), but the analogy is rather weak – the capacity of the com-
puter’s STM is a lot more than seven items! The different levels of computer mem-
ory are more commonly called primary and secondary storage.
The details of computer memory are not in themselves of direct interest to the
user interface designer. However, the limitations in capacity and access methods are
important constraints on the sort of interface that can be designed. After some fairly
basic information, we will put the raw memory capacity into perspective with the
sort of information which can be stored, as well as again seeing how advances in
technology offer more scope for the designer to produce more effective interfaces. In
particular, we will see how the capacity of typical memory copes with video images as
these are becoming important as part of multimedia applications (see Chapter 21).
2.8.1 RAM and short-term memory (STM)
At the lowest level of computer memory are the registers on the computer chip, but
these have little impact on the user except in so far as they affect the general speed of
108 Chapter 2 n The computer
the computer. Most currently active information is held in silicon-chip random
access memory (RAM). Different forms of RAM differ as to their precise access times,
power consumption and characteristics. Typical access times are of the order of
10 nanoseconds, that is a hundred-millionth of a second, and information can be
accessed at a rate of around 100 Mbytes (million bytes) per second. Typical storage
in modern personal computers is between 64 and 256 Mbytes.
Most RAM is volatile, that is its contents are lost when the power is turned off.
However, many computers have a small amount of non-volatile RAM, which retains its
contents, perhaps with the aid of a small battery. This may be used to store setup
information in a large computer, but in a pocket organizer will be the whole mem-
ory. Non-volatile RAM is more expensive so is only used where necessary, but with
many notebook computers using very low-power static RAM, the divide is shrink-
ing. By strict analogy, non-volatile RAM ought to be classed as LTM, but the import-
ant thing we want to emphasize is the gulf between STM and LTM in a traditional
computer system.
In PDAs the distinctions become more confused as the battery power means that
the system is never completely off, so RAM memory effectively lasts for ever. Some
also use flash memory, which is a form of silicon memory that sits between fixed
content ROM (read-only memory) chips and normal RAM. Flash memory is relat-
ively slow to write, but once written retains its content even with no power whatso-
ever. These are sometimes called silicon disks on PDAs. Digital cameras typically
store photographs in some form of flash media and small flash-based devices are
used to plug into a laptop or desktop’s USB port to transfer data.
2.8.2 Disks and long-term memory (LTM)
For most computer users the LTM consists of disks, possibly with small tapes for
backup. The existence of backups, and appropriate software to generate and retrieve
them, is an important area for user security. However, we will deal mainly with those
forms of storage that impact the interactive computer user.
There are two main kinds of technology used in disks: magnetic disks and optical
disks. The most common storage media, floppy disks and hard (or fixed) disks,
are coated with magnetic material, like that found on an audio tape, on which the
information is stored. Typical capacities of floppy disks lie between 300 kbytes and
1.4 Mbytes, but as they are removable, you can have as many as you have room for
on your desk. Hard disks may store from under 40 Mbytes to several gigabytes
(Gbytes), that is several thousand million bytes. With disks there are two access times
to consider, the time taken to find the right track on the disk, and the time to read
the track. The former dominates random reads, and is typically of the order of 10 ms
for hard disks. The transfer rate once the track is found is then very high, perhaps
several hundred kilobytes per second. Various forms of large removable media are
also available, fitting somewhere between floppy disks and removable hard disks, and
are especially important for multimedia storage.
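These two components of access time can be put into a simple cost model. The sketch below is illustrative only: the 10 ms seek and the transfer rate are the rough figures quoted above, not measurements of any particular drive.

```python
def disk_read_time(size_bytes, seek_s=0.010, transfer_bytes_per_s=300_000):
    """Rough time to read one contiguous chunk from a hard disk:
    one seek to locate the track, then a sequential transfer."""
    return seek_s + size_bytes / transfer_bytes_per_s

# A small random read is dominated by the seek ...
print(disk_read_time(3_000))        # 0.02 s: half seek, half transfer
# ... a large sequential read by the transfer rate.
print(disk_read_time(3_000_000))    # 10.01 s: the seek is negligible
```

This is why the seek time dominates random reads, while long sequential transfers run close to the raw transfer rate.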
Optical disks use laser light to read and (sometimes) write the information on the
disk. There are various high capacity specialist optical devices, but the most common
is the CD-ROM, using the same technology as audio compact discs. CD-ROMs have
a capacity of around 650 megabytes, but cannot be written to at all. They are useful
for published material such as online reference books, multimedia and software
distribution. Recordable CDs are a form of WORM device (write-once read-many)
and are more flexible in that information can be written, but (as the name suggests)
only once at any location – more like a piece of paper than a blackboard. They are
obviously very useful for backups and for producing very secure audit information.
Finally, there are fully rewritable optical disks, but the rewrite time is typically much
slower than the read time, so they are still primarily for archival not dynamic storage.
Many CD-ROM reader/writers can also read DVD format, originally developed for
storing movies. Optical media are more robust than magnetic disks and so it is easier
to use a jukebox arrangement, whereby many optical disks can be brought online
automatically as required. This can give an online capacity of many hundreds of
gigabytes. However, as magnetic disk capacities have grown faster than the fixed standard
of CD-ROMs, some massive capacity stores are moving to large disk arrays.
2.8.3 Understanding speed and capacity
So what effect do the various capacities and speeds have on the user? Thinking of our
typical personal computer system, we can summarize some typical capacities as in
Table 2.1.
We think first of documents. This book is about 320,000 words, or about 2
Mbytes, so it would hardly make a dent in 256 Mbytes of RAM. (This size – 2 Mbytes
– is unformatted and without illustrations; the actual size of the full data files is an
order of magnitude bigger, but still well within the capacity of main memory.) To
take a more popular work, the Bible would use about 4.5 Mbytes. This would still
consume only 2% of main memory, and disappear on a hard disk. However, it might
look tight on a smaller PDA. This makes the memory look not too bad, so long as
you do not intend to put your entire library online. However, many word processors
come with a dictionary and thesaurus, and there is no standard way to use the same
one with several products. Together with help files and the program itself, it is not
Table 2.1 Typical capacities of different storage media

                   STM small/fast    LTM large/slower
Media:             RAM               Hard disk
Capacity:          256 Mbytes        100 Gbytes
Access time:       10 ns             7 ms
Transfer rate:     100 Mbyte/s       30 Mbyte/s
unusual to find each application consuming tens or even hundreds of megabytes of
disk space – it is not at all difficult to fill a few gigabytes of disk!
Similarly, although 256 Mbytes of RAM are enough to hold most (but not all) sin-
gle programs, windowed systems will run several applications simultaneously, soon
using up many megabytes. Operating systems handle this by paging unused bits of
programs out of RAM onto disk, or even swapping the entire program onto disk.
This makes little difference to the logical functioning of the program, but has a
significant effect on interaction. If you select a window, and the relevant application
happens to be currently swapped out onto the disk, it has to be swapped back in. The
delay this causes can be considerable, and is both noticeable and annoying on many
systems.
The delays due to swapping are a symptom of the von Neumann bottleneck
between disk and main memory. There is plenty of information in the memory, but
it is not where it is wanted, in the machine’s RAM. The path between them is limited
by the transfer rate of the disk and is too slow. Swapping due to the operating system
may be difficult to avoid, but for an interactive system designer some of these prob-
lems can be avoided by thinking carefully about where information is stored and
when it is transferred. For example, the program can be lazy about information
transfer. Imagine the user wants to look at a document. Rather than reading in the
whole thing before letting the user continue, just enough is read in for the first page
to be displayed, and the rest is read during idle moments.
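This lazy strategy can be sketched with a Python generator; the page size here is an arbitrary illustration, not a figure from any real system:

```python
def lazy_pages(path, page_size=4096):
    """Yield a document one 'page' at a time, so the first page can be
    displayed before the rest of the file has been read from disk."""
    with open(path, "rb") as f:
        while True:
            page = f.read(page_size)
            if not page:
                return
            yield page
```

The interface shows the first yielded page immediately and pulls in the remaining ones during idle moments, rather than blocking on a full read of the file.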
Returning to documents, if they are scanned as bitmaps (and not read using
OCR), then the capacity of our system looks even less impressive. Say an 11 × 8 inch
(297 × 210 mm) page is scanned with an 8 bit grayscale (256 levels) setting at 1200 dpi.
The image contains about one billion bits, that is about 128 Mbyte. So, our 100 Gbyte
disk could store 800 pages – just OK for this book, but not for the Bible.
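The arithmetic behind these figures can be checked directly (the page size, resolution and disk capacity are the example values used above):

```python
dpi = 1200                       # scanning resolution
bits_per_pixel = 8               # 8 bit grayscale, 256 levels
pixels = (8 * dpi) * (11 * dpi)  # an 11 x 8 inch page

bits = pixels * bits_per_pixel                 # about one billion bits
mbytes = bits / 8 / 1_000_000                  # roughly 127 Mbyte per page
pages = (100 * 1_000_000_000) // (bits // 8)   # ~800 pages on a 100 Gbyte disk
```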
If we turn to video, things are even worse. Imagine we want to store moving
video using 12 bits for each pixel (4 bits for each primary color giving 16 levels of
brightness), each frame is 512 × 512 pixels, and we store at 25 frames per second.
Technological change and storage capacity
Most of the changes in this book since the first and second editions have been additions
where new developments have come along. However, this portion has had to be scrutinized line
by line as the storage capacities of high-end machines when this book was first published in 1993
looked ridiculous as we revised it in 1997 and then again in 2003. One of our aims in this chapter
was to give readers a concrete feel for the capacities and computational possibilities in standard
computers. However, the pace of advances in this area means that it becomes out of date almost
as fast as it is written! This is also a problem for design; it is easy to build a system that is sensible
given a particular level of technology, but becomes meaningless later. The solution is either to issue
ever more frequent updates and new versions, or to exercise a bit of foresight. . .
This is by no means a high-quality image, but each frame requires approximately
400 kbytes giving 10 Mbytes per second. Our disk will manage about three hours
of video – one good movie. Lowering our sights to still photographs, good digital
cameras usually take 2 to 4 megapixels at 24 bit color; that is 10 Mbytes of raw
uncompressed image – you’d not get all your holiday snaps into main memory!
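Again the numbers can be reproduced in a few lines, using the example frame format above:

```python
frame_bytes = 512 * 512 * 12 // 8        # 12 bits per pixel -> ~400 kbytes/frame
bytes_per_second = frame_bytes * 25      # 25 frames per second -> ~10 Mbytes/s
hours = 100e9 / bytes_per_second / 3600  # a 100 Gbyte disk holds ~3 hours
```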
2.8.4 Compression
In fact, things are not quite so bad, since compression techniques can be used to
reduce the amount of storage required for text, bitmaps and video. All of these things
are highly redundant. Consider text for a moment. In English, we know that if we use
the letter ‘q’ then ‘u’ is almost bound to follow. At the level of words, some words
like ‘the’ and ‘and’ appear frequently in text in general, and for any particular work
one can find other common terms (this book mentions ‘user’ and ‘computer’ rather
frequently). Similarly, in a bitmap, if one bit is white, there is a good chance the next
will be as well. Compression algorithms take advantage of this redundancy. For
example, Huffman encoding gives short codes to frequent words [182], and run-
length encoding represents long runs of the same value by length value pairs. Text
can easily be reduced by a factor of five and bitmaps often compress to 1% of their
original size.
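Run-length encoding is simple enough to sketch in full; this is a minimal illustration of the idea, not a production codec:

```python
def rle_encode(values):
    """Represent runs of identical values as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((v, 1))               # start a new run
    return runs

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

# A mostly-white scanline (1 = white) collapses to three pairs:
line = [1] * 60 + [0] * 3 + [1] * 37
print(rle_encode(line))   # [(1, 60), (0, 3), (1, 37)]
```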
For video, in addition to compressing each frame, we can take advantage of the
fact that successive frames are often similar. We can compute the difference between
successive frames and then store only this – compressed, of course. More sophistic-
ated algorithms detect when the camera pans and use this information also. These
differencing methods fail when the scene changes, and so the process periodically has
to restart and send a new, complete (but compressed) image. For storage purposes
this is not a problem, but when used for transmission over telephone lines or net-
works it can mean glitches in the video as the system catches up.
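The core of the differencing idea, stripped of all real compression detail, is just per-pixel subtraction. A toy sketch, treating frames as flat lists of pixel values:

```python
def frame_diff(prev, curr):
    """Difference between successive frames: near-zero wherever the scene
    is static, so it compresses far better than the raw frame."""
    return [c - p for p, c in zip(prev, curr)]

def apply_diff(prev, diff):
    """Reconstruct the next frame from the previous one plus the difference."""
    return [p + d for p, d in zip(prev, diff)]
```

When the scene changes, the difference is as large as a full frame, which is why the encoder must periodically restart with a complete keyframe.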
With these reductions it is certainly possible to store low-quality video at
64 kbyte/s; that is, each gigabyte of disk holds over four hours of highly compressed
video. However, it still makes the humble video cassette look very good value.
Probably the leading edge of video still and photographic compression is fractal
compression. Fractals have been popularized by the images of the Mandelbrot set (that
swirling pattern of computer-generated colors seen on many T-shirts and posters).
Fractals refer to any image that contains parts which, when suitably scaled, are sim-
ilar to the whole. If we look at an image, it is possible to find parts which are approx-
imately self-similar, and these parts can be stored as a fractal with only a few numeric
parameters. Fractal compression is especially good for textured features, which cause
problems for other compression techniques. The decompression of the image can
be performed to any degree of accuracy, from a very rough soft-focus image, to
one more detailed than the original. The former is very useful as one can produce
poor-quality output quickly, and better quality given more time. The latter is rather
remarkable – the fractal compression actually fills in details that are not in the
original. These details are not accurate, but look convincing!
2.8.5 Storage format and standards
The most common data types stored by interactive programs are text and bitmap
images, with increasing use of video and audio, and this subsection looks at the
ridiculous range of file storage standards. We will consider database retrieval in the
next subsection.
The basic standard for text storage is the ASCII (American standard code for
information interchange) character codes, which assign to each standard printable
character and several control characters an internationally recognized 7 bit code
(decimal values 0–127), which can therefore be stored in an 8 bit byte, or be
transmitted as 8 bits including parity. Many systems extend the codes to the values 128–255,
including line-drawing characters, mathematical symbols and international letters such
as ‘æ’. There is a 16 bit extension, the UNICODE standard, which has enough room
for a much larger range of characters including the Japanese Kanji character set.
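These code ranges are easy to verify in any language with Unicode strings; a few Python checks (illustrative only, not part of the standards themselves):

```python
# ASCII fits in 7 bits ...
assert ord("A") == 65 and ord("A") < 2**7
# ... the 'extended' codes occupy 128-255 ...
assert 128 <= ord("æ") <= 255
# ... and a Kanji character needs the larger Unicode space.
assert ord("日") > 255

# In the UTF-8 encoding, ASCII still takes one byte, Kanji three:
assert len("A".encode("utf-8")) == 1
assert len("日".encode("utf-8")) == 3
```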
As we have already discussed, modern documents consist of more than just characters.
The text is in different fonts and includes formatting information such as centering,
page headers and footers. On the whole, the storage of formatted text is vendor specific,
since virtually every application has its own file format. This is not helped by the fact
that many suppliers attempt to keep their file formats secret, or update them fre-
quently to stop others’ products being compatible. With the exception of bare ASCII,
the most common shared format is rich text format (RTF), which encodes formatting
information including style sheets. However, even where an application will import
or export RTF, it may represent a cut-down version of the full document style.
RTF regards the document as formatted text, that is it concentrates on the appear-
ance. Documents can also be regarded as structured objects: this book has chapters
containing sections, subsections, ..., paragraphs, sentences, words and characters. There
are ISO standards for document structure and interchange, which in theory could be
used for transfer between packages and sites, but these are rarely used in practice.
Just as the PostScript language is used to describe the printed page, SGML (standard
generalized markup language) can be used to store structured text in a reasonably
extensible way. You can define your own structures (the definition itself in SGML),
and produce documents according to them. XML (extensible markup language), a
lightweight version of SGML, is now used extensively for web-based applications.
For bitmap storage the range of formats is seemingly unending. The stored image
needs to record the size of the image, the number of bits per pixel, possibly a color
map, as well as the bits of the image itself. In addition, an icon may have a ‘hot-spot’
for use as a cursor. If you think of all the ways of encoding these features, or leaving
them implicit, and then consider all the combinations of these different encodings,
you can see why there are problems. And all this before we have even considered
the effects of compression! There is, in fact, a whole software industry producing
packages that convert from one format to another.
Given the range of storage standards (or rather lack of standards), there is no easy
advice as to which is best, but if you are writing a new word processor and are about
to decide how to store the document on disk, think, just for a moment, before
defining yet another format.
2.8.6 Methods of access
Standard database access is by special key fields with an associated index. The user
has to know the key before the system can find the information. A telephone direct-
ory is a good example of this. You can find out someone’s telephone number if you
know their name (the key), but you cannot find the name given the number. This is
evident in the interface of many computer systems. So often, when you contact an
organization, they can only help you if you give your customer number, or last order
number. The usability of the system is seriously impaired by a shortsighted reliance
on a single key and index. In fact, most database systems will allow multiple keys and
indices, allowing you to find a record given partial information. So these problems
are avoidable with only slight foresight.
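The multiple-index idea is trivially demonstrated with two dictionaries over the same records (the names and numbers here are made up for illustration):

```python
# One record set, two indices: a lookup works from either field.
subscribers = [("Smith", "01524 65201"), ("Jones", "0113 812000")]

by_name = {name: number for name, number in subscribers}
by_number = {number: name for name, number in subscribers}

print(by_name["Smith"])          # forward lookup, as in a printed directory
print(by_number["0113 812000"])  # the reverse lookup a single index forbids
```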
There are valid reasons for not indexing on too many items. Adding extra indices
adds to the size of the database, so one has to balance ease of use against storage cost.
However, with ever-increasing disk sizes, this is not a good excuse for all but extreme
examples. Unfortunately, brought up on lectures about algorithmic efficiency, it is
easy for computer scientists to be stingy with storage. Another, more valid, reason
for restricting the fields you index is privacy and security. For example, telephone
companies will typically hold an online index that, given a telephone number, would
return the name and address of the subscriber, but to protect the privacy of their cus-
tomers, this information is not divulged to the general public.
It is often said that dictionaries are only useful for people who can spell. Bad
spellers do not know what a word looks like so cannot look it up to find out. Not only
in spelling packages, but in general, an application can help the user by matching
badly spelt versions of keywords. One example of this is do what I mean (DWIM)
used in several of Xerox PARC’s experimental programming environments. If a
command name is misspelt the system prompts the user with a close correct name.
Menu-based systems make this less of an issue, but one can easily imagine doing
the same with, say, file selection. Another important instance of this principle is
Soundex, a way of indexing words, especially names. Given a key, Soundex finds
those words which sound similar. For example, given McCloud, it would find
MacCleod. These are all examples of forgiving systems, and in general one should aim
to accommodate the user’s mistakes. Again, there are exceptions to this: you do not
want a bank’s automated teller machine (ATM) to give money when the PIN num-
ber is almost correct!
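A simplified version of the classic Soundex algorithm shows how 'sounds like' matching works (this sketch omits the special handling of 'h' and 'w' found in the full algorithm):

```python
def soundex(name):
    """Simplified Soundex: first letter plus up to three digits coding
    the remaining consonants; similar-sounding names share a code."""
    codes = {}
    for digit, letters in [("1", "bfpv"), ("2", "cgjkqsxz"),
                           ("3", "dt"), ("4", "l"), ("5", "mn"), ("6", "r")]:
        for ch in letters:
            codes[ch] = digit
    name = name.lower()
    digits = [codes.get(ch, "") for ch in name]
    out, prev = [], digits[0]
    for d in digits[1:]:
        if d and d != prev:   # skip vowels, collapse repeated codes
            out.append(d)
        prev = d
    return (name[0].upper() + "".join(out) + "000")[:4]

print(soundex("McCloud"), soundex("MacCleod"))   # M243 M243
```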
Not all databases allow long passages of text to be stored in records, perhaps set-
ting a maximum length for text strings, or demanding the length be fixed in advance.
Where this is the case, the database seriously restricts interface applications where
text forms an important part. At the other extreme, free text retrieval systems are
centered on unformatted, unstructured text. These systems work by keeping an index
of every word in every document, and so you can ask ‘give me all documents with
the words “human” and “computer” in them’. Programs, such as versions of the
UNIX ‘grep’ command, give some of the same facilities by quickly scanning a list of
files for a certain word, but are much slower. On the web, free text search is of course
the standard way to find things using search engines.
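An inverted index of this kind takes only a few lines of code (a toy sketch with made-up documents):

```python
def build_index(documents):
    """Map every word to the set of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

docs = {
    "d1": "the human user",
    "d2": "the computer",
    "d3": "human computer interaction",
}
index = build_index(docs)
# 'give me all documents with the words "human" and "computer" in them'
print(index["human"] & index["computer"])   # {'d3'}
```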