ALAN DIX, JANET FINLAY,
GREGORY D. ABOWD, RUSSELL BEALE
THIRD EDITION
HUMAN–COMPUTER
INTERACTION
Much has changed since the first edition of
Human–Computer Interaction was published. Ubiquitous
computing and rich sensor-filled environments are
finding their way out of the laboratory, not just into
movies but also into our workplaces and homes. The
computer has broken out of its plastic and glass
bounds, providing us with networked societies where
personal computing devices, from mobile phones to
smart cards, fill our pockets and electronic devices
surround us at home and at work. The web, too, has grown
from a largely academic network into the hub of
business and everyday life. As the distinctions between
the physical and the digital, and between work and
leisure, start to break down, human–computer
interaction is also changing radically.
The excitement of these changes is captured in this new
edition, which also looks forward to other emerging
technologies. However, the book is firmly rooted in
strong principles and models independent of the
passing technologies of the day: these foundations will
be the means by which today’s students will
understand tomorrow’s technology.
The third edition of Human–Computer Interaction can be
used for introductory and advanced courses on HCI,
Interaction Design, Usability or Interactive Systems
Design. It will also prove an invaluable reference for
professionals wishing to design usable computing
devices.
Accompanying the text is a comprehensive website
containing a broad range of material for instructors,
students and practitioners, a full text search facility for
the book, links to many sites of additional interest and
much more: go to www.hcibook.com
Alan Dix is Professor in the Department of Computing, Lancaster University, UK. Janet Finlay is
Professor in the School of Computing, Leeds Metropolitan University, UK. Gregory D. Abowd is
Associate Professor in the College of Computing and GVU Center at Georgia Tech, USA.
Russell Beale is a lecturer in the School of Computer Science, University of
Birmingham, UK.
Cover illustration by Peter Gudynas
New to this edition:
• A revised structure, reflecting the growth of HCI as a discipline, separates out basic material suitable for introductory courses from more detailed models and theories.
• New chapter on interaction design adds material on scenarios and basic navigation design.
• New chapter on universal design, substantially extending the coverage of this material in the book.
• Updated and extended treatment of socio/contextual issues.
• Extended and new material on novel interaction, including updated ubicomp material, designing experience, physical sensors and a new chapter on rich interaction.
• Updated material about the web, including dynamic content.
• Relaunched website including case studies, WAP access and search.
www.pearson-books.com
Human–Computer Interaction
We work with leading authors to develop the
strongest educational materials in computing,
bringing cutting-edge thinking and best learning
practice to a global market.
Under a range of well-known imprints, including
Prentice Hall, we craft high quality print and
electronic publications which help readers to
understand and apply their content, whether
studying or at work.
To find out more about the complete range of our
publishing, please visit us on the world wide web at:
www.pearsoned.co.uk
Human–Computer Interaction
Third Edition
Alan Dix, Lancaster University
Janet Finlay, Leeds Metropolitan University
Gregory D. Abowd, Georgia Institute of Technology
Russell Beale, University of Birmingham
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England
and Associated Companies throughout the world
Visit us on the world wide web at:
www.pearsoned.co.uk
First published 1993
Second edition published 1998
Third edition published 2004
© Prentice-Hall Europe 1993, 1998
© Pearson Education Limited 2004
The rights of Alan Dix, Janet E. Finlay, Gregory D. Abowd and Russell Beale
to be identified as authors of this work have been asserted by them in
accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored
in a retrieval system, or transmitted in any form or by any means, electronic,
mechanical, photocopying, recording or otherwise, without either the prior
written permission of the publisher or a licence permitting restricted copying
in the United Kingdom issued by the Copyright Licensing Agency Ltd,
90 Tottenham Court Road, London W1T 4LP.
All trademarks used herein are the property of their respective owners. The use
of any trademark in this text does not vest in the author or publisher any trademark
ownership rights in such trademarks, nor does the use of such trademarks imply any
affiliation with or endorsement of this book by such owners.
ISBN-13: 978-0-13-046109-4
ISBN-10: 0-13-046109-1
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
10 9 8 7 6 5 4 3
10 09 08 07 06
Typeset in 10/12½pt Minion by 35
Printed and bound by Scotprint, Haddington
BRIEF CONTENTS
Guided tour xiv
Foreword xvi
Preface to the third edition xix
Publisher’s acknowledgements xxiii
Introduction 1
Part 1 FOUNDATIONS 9
Chapter 1 The human 11
Chapter 2 The computer 59
Chapter 3 The interaction 123
Chapter 4 Paradigms 164
Part 2 DESIGN PROCESS 189
Chapter 5 Interaction design basics 191
Chapter 6 HCI in the software process 225
Chapter 7 Design rules 258
Chapter 8 Implementation support 289
Chapter 9 Evaluation techniques 318
Chapter 10 Universal design 365
Chapter 11 User support 395
Part 3 MODELS AND THEORIES 417
Chapter 12 Cognitive models 419
Chapter 13 Socio-organizational issues and stakeholder requirements 450
vi Brief Contents
Chapter 14 Communication and collaboration models 475
Chapter 15 Task analysis 510
Chapter 16 Dialog notations and design 544
Chapter 17 Models of the system 594
Chapter 18 Modeling rich interaction 629
Part 4 OUTSIDE THE BOX 661
Chapter 19 Groupware 663
Chapter 20 Ubiquitous computing and augmented realities 716
Chapter 21 Hypertext, multimedia and the world wide web 748
References 791
Index 817
CONTENTS
Guided tour xiv
Foreword xvi
Preface to the third edition xix
Publisher’s acknowledgements xxiii
Introduction 1
Part 1 FOUNDATIONS 9
Chapter 1 The human 11
1.1 Introduction 12
1.2 Input–output channels 13
Design Focus: Getting noticed 16
Design Focus: Where’s the middle? 22
1.3 Human memory 27
Design Focus: Cashing in 30
Design Focus: 7 ± 2 revisited 32
1.4 Thinking: reasoning and problem solving 39
Design Focus: Human error and false memories 49
1.5 Emotion 51
1.6 Individual differences 52
1.7 Psychology and the design of interactive systems 53
1.8 Summary 55
Exercises 56
Recommended reading 57
Chapter 2 The computer 59
2.1 Introduction 60
2.2 Text entry devices 63
Design Focus: Numeric keypads 67
2.3 Positioning, pointing and drawing 71
viii Contents
2.4 Display devices 78
Design Focus: Hermes: a situated display 86
2.5 Devices for virtual reality and 3D interaction 87
2.6 Physical controls, sensors and special devices 91
Design Focus: Feeling the road 94
Design Focus: Smart-Its – making using sensors easy 96
2.7 Paper: printing and scanning 97
Design Focus: Readability of text 101
2.8 Memory 107
2.9 Processing and networks 114
Design Focus: The myth of the infinitely fast machine 116
2.10 Summary 120
Exercises 121
Recommended reading 122
Chapter 3 The interaction 123
3.1 Introduction 124
3.2 Models of interaction 124
Design Focus: Video recorder 130
3.3 Frameworks and HCI 130
3.4 Ergonomics 131
Design Focus: Industrial interfaces 133
3.5 Interaction styles 136
Design Focus: Navigation in 3D and 2D 144
3.6 Elements of the WIMP interface 145
Design Focus: Learning toolbars 151
3.7 Interactivity 152
3.8 The context of the interaction 154
Design Focus: Half the picture? 155
3.9 Experience, engagement and fun 156
3.10 Summary 160
Exercises 161
Recommended reading 162
Chapter 4 Paradigms 164
4.1 Introduction 165
4.2 Paradigms for interaction 165
4.3 Summary 185
Exercises 186
Recommended reading 187
Contents ix
Part 2 DESIGN PROCESS 189
Chapter 5 Interaction design basics 191
5.1 Introduction 192
5.2 What is design? 193
5.3 The process of design 195
5.4 User focus 197
Design Focus: Cultural probes 200
5.5 Scenarios 201
5.6 Navigation design 203
Design Focus: Beware the big button trap 206
Design Focus: Modes 207
5.7 Screen design and layout 211
Design Focus: Alignment and layout matter 214
Design Focus: Checking screen colors 219
5.8 Iteration and prototyping 220
5.9 Summary 222
Exercises 223
Recommended reading 224
Chapter 6 HCI in the software process 225
6.1 Introduction 226
6.2 The software life cycle 226
6.3 Usability engineering 237
6.4 Iterative design and prototyping 241
Design Focus: Prototyping in practice 245
6.5 Design rationale 248
6.6 Summary 256
Exercises 257
Recommended reading 257
Chapter 7 Design rules 258
7.1 Introduction 259
7.2 Principles to support usability 260
7.3 Standards 275
7.4 Guidelines 277
7.5 Golden rules and heuristics 282
7.6 HCI patterns 284
7.7 Summary 286
Exercises 287
Recommended reading 288
x Contents
Chapter 8 Implementation support 289
8.1 Introduction 290
8.2 Elements of windowing systems 291
8.3 Programming the application 296
Design Focus: Going with the grain 301
8.4 Using toolkits 302
Design Focus: Java and AWT 304
8.5 User interface management systems 306
8.6 Summary 313
Exercises 314
Recommended reading 316
Chapter 9 Evaluation techniques 318
9.1 What is evaluation? 319
9.2 Goals of evaluation 319
9.3 Evaluation through expert analysis 320
9.4 Evaluation through user participation 327
9.5 Choosing an evaluation method 357
9.6 Summary 362
Exercises 363
Recommended reading 364
Chapter 10 Universal design 365
10.1 Introduction 366
10.2 Universal design principles 366
10.3 Multi-modal interaction 368
Design Focus: Designing websites for screen readers 374
Design Focus: Choosing the right kind of speech 375
Design Focus: Apple Newton 381
10.4 Designing for diversity 384
Design Focus: Mathematics for the blind 386
10.5 Summary 393
Exercises 393
Recommended reading 394
Chapter 11 User support 395
11.1 Introduction 396
11.2 Requirements of user support 397
11.3 Approaches to user support 399
11.4 Adaptive help systems 404
Design Focus: It’s good to talk – help from real people 405
11.5 Designing user support systems 412
11.6 Summary 414
Exercises 415
Recommended reading 416
Contents xi
Part 3 MODELS AND THEORIES 417
Chapter 12 Cognitive models 419
12.1 Introduction 420
12.2 Goal and task hierarchies 421
Design Focus: GOMS saves money 424
12.3 Linguistic models 430
12.4 The challenge of display-based systems 434
12.5 Physical and device models 436
12.6 Cognitive architectures 443
12.7 Summary 447
Exercises 448
Recommended reading 448
Chapter 13 Socio-organizational issues and stakeholder requirements 450
13.1 Introduction 451
13.2 Organizational issues 451
Design Focus: Implementing workflow in Lotus Notes 457
13.3 Capturing requirements 458
Design Focus: Tomorrow’s hospital – using participatory design 468
13.4 Summary 472
Exercises 473
Recommended reading 474
Chapter 14 Communication and collaboration models 475
14.1 Introduction 476
14.2 Face-to-face communication 476
Design Focus: Looking real – Avatar Conference 481
14.3 Conversation 483
14.4 Text-based communication 495
14.5 Group working 504
14.6 Summary 507
Exercises 508
Recommended reading 509
Chapter 15 Task analysis 510
15.1 Introduction 511
15.2 Differences between task analysis and other techniques 511
15.3 Task decomposition 512
15.4 Knowledge-based analysis 519
15.5 Entity–relationship-based techniques 525
15.6 Sources of information and data collection 532
15.7 Uses of task analysis 538
xii Contents
15.8 Summary 541
Exercises 542
Recommended reading 543
Chapter 16 Dialog notations and design 544
16.1 What is dialog? 545
16.2 Dialog design notations 547
16.3 Diagrammatic notations 548
Design Focus: Using STNs in prototyping 551
Design Focus: Digital watch – documentation and analysis 563
16.4 Textual dialog notations 565
16.5 Dialog semantics 573
16.6 Dialog analysis and design 582
16.7 Summary 589
Exercises 591
Recommended reading 592
Chapter 17 Models of the system 594
17.1 Introduction 595
17.2 Standard formalisms 595
17.3 Interaction models 608
17.4 Continuous behavior 618
17.5 Summary 624
Exercises 625
Recommended reading 627
Chapter 18 Modeling rich interaction 629
18.1 Introduction 630
18.2 Status–event analysis 631
18.3 Rich contexts 639
18.4 Low intention and sensor-based interaction 649
Design Focus: Designing a car courtesy light 655
18.5 Summary 657
Exercises 658
Recommended reading 659
Part 4 OUTSIDE THE BOX 661
Chapter 19 Groupware 663
19.1 Introduction 664
19.2 Groupware systems 664
Contents xiii
19.3 Computer-mediated communication 667
Design Focus: SMS in action 673
19.4 Meeting and decision support systems 679
19.5 Shared applications and artifacts 685
19.6 Frameworks for groupware 691
Design Focus: TOWER – workspace awareness 701
19.7 Implementing synchronous groupware 702
19.8 Summary 713
Exercises 714
Recommended reading 715
Chapter 20 Ubiquitous computing and augmented realities 716
20.1 Introduction 717
20.2 Ubiquitous computing applications research 717
Design Focus: Ambient Wood – augmenting the physical 723
Design Focus: Classroom 2000/eClass – deploying and evaluating ubicomp 727
Design Focus: Shared experience 732
20.3 Virtual and augmented reality 733
Design Focus: Applications of augmented reality 737
20.4 Information and data visualization 738
Design Focus: Getting the size right 740
20.5 Summary 745
Exercises 746
Recommended reading 746
Chapter 21 Hypertext, multimedia and the world wide web 748
21.1 Introduction 749
21.2 Understanding hypertext 749
21.3 Finding things 761
21.4 Web technology and issues 768
21.5 Static web content 771
21.6 Dynamic web content 778
21.7 Summary 787
Exercises 788
Recommended reading 788
References 791
Index 817
GUIDED TOUR
PART 2 DESIGN PROCESS
In this part, we concentrate on how design practice
addresses the critical feature of an interactive system –
usability from the human perspective. The chapters in
this part promote the purposeful design of more usable
interactive systems. We begin in Chapter 5 by introducing
the key elements in the interaction design process. These
elements are then expanded in later chapters.
Chapter 6 discusses the design process in more detail,
specifically focusing on the place of user-centered design
within a software engineering framework. Chapter 7 high-
lights the range of design rules that can help us to specify
usable interactive systems, including abstract principles,
guidelines and other design representations.
In Chapter 8, we provide an overview of implementa-
tion support for the programmer of an interactive system.
Chapter 9 is concerned with the techniques used to evalu-
ate the interactive system to see if it satisfies user needs.
Chapter 10 discusses how to design a system to be univer-
sally accessible, regardless of age, gender, cultural background
or ability. In Chapter 11 we discuss the provision of user
support in the form of help systems and documentation.
MODELING RICH
INTERACTION
OVERVIEW
We operate within an ecology of people, physical artifacts
and electronic systems, and this rich ecology has recently
become more complex as electronic devices invade the
workplace and our day-to-day lives. We need methods
to deal with these rich interactions.
• Status–event analysis is a semi-formal, easy to apply technique that:
  – classifies phenomena as event or status
  – embodies naïve psychology
  – highlights feedback problems in interfaces.
• Aspects of rich environments can be incorporated into methods such as task analysis:
  – other people
  – information requirements
  – triggers for tasks
  – modeling artifacts
  – placeholders in task sequences.
• New sensor-based systems do not require explicit interaction; this means:
  – new cognitive and interaction models
  – new design methods
  – new system architectures.
19.3 Computer-mediated communication
CuSeeMe
Special-purpose video conferencing is still relatively expensive, but low-fidelity desktop
video conferencing is now within the reach of many users of desktop computers. Digital video
cameras are now inexpensive and easily obtainable. They often come with pre-packaged video
phone or video conferencing software. However, the system that has really popularized
video conferencing is a web-based tool. CuSeeMe works over the internet, allowing participants
across the world with only a basic digital video camera to see and talk to one another. The
software is usually public domain (although there are commercial versions) and the services allowing
connection are often free. The limited bandwidth available over long-distance internet links means
that video quality and frame rates are low and periodic image break-up may occur. In fact, it is
sound break-up which is more problematic. After all, we can talk to one another quite easily with-
out seeing one another, but find it very difficult over a noisy phone line. Often participants may
see one another’s video image, but actually discuss using a synchronous text-based ‘talk’ program.
CuSeeMe – video conferencing on the internet. Source: Courtesy of Geoff Ellis
Chapter 12 Cognitive models
Worked exercise Do a keystroke-level analysis for opening up an application in a visual desktop interface using
a mouse as the pointing device, comparing at least two different methods for performing the
task. Repeat the exercise using a trackball. Consider how the analysis would differ for various
positions of the trackball relative to the keyboard and for other pointing devices.
Answer We provide a keystroke-level analysis for three different methods for launching an
application on a visual desktop. These methods are analyzed for a conventional one-
button mouse, a trackball mounted away from the keyboard and one mounted close to
the keyboard. The main distinction between the two trackballs is that the second one
does not require an explicit repositioning of the hands, that is there is no time required
for homing the hands between the pointing device and the keyboard.
Method 1 Double clicking on application icon
(Trackball 1: mounted away from the keyboard; Trackball 2: mounted close to the keyboard)

Steps                    Operator     Mouse    Trackball 1   Trackball 2
1. Move hand to mouse    H[mouse]     0.400    0.400         0.000
2. Mouse to icon         P[to icon]   0.664    1.113         1.113
3. Double click          2B[click]    0.400    0.400         0.400
4. Return to keyboard    H[kbd]       0.400    0.400         0.000
Total times                           1.864    2.313         1.513
Method 2 Using an accelerator key
Steps                    Operator     Mouse    Trackball 1   Trackball 2
1. Move hand to mouse    H[mouse]     0.400    0.400         0.000
2. Mouse to icon         P[to icon]   0.664    1.113         1.113
3. Click to select       B[click]     0.200    0.200         0.200
4. Pause                 M            1.350    1.350         1.350
5. Return to keyboard    H[kbd]       0.400    0.400         0.000
6. Press accelerator     K            0.200    0.200         0.200
Total times                           3.214    3.663         2.763
Method 3 Using a menu
Steps                    Operator     Mouse    Trackball 1   Trackball 2
1. Move hand to mouse    H[mouse]     0.400    0.400         0.000
2. Mouse to icon         P[to icon]   0.664    1.113         1.113
3. Click to select       B[click]     0.200    0.200         0.200
4. Pause                 M            1.350    1.350         1.350
5. Mouse to file menu    P            0.664    1.113         1.113
6. Pop-up menu           B[down]      0.100    0.100         0.100
7. Drag to open          P[drag]      0.713    1.248         1.248
8. Release mouse         B[up]        0.100    0.100         0.100
9. Return to keyboard    H[kbd]       0.400    0.400         0.000
Total times                           4.591    6.024         5.224
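A keystroke-level analysis like the one above is just a sum of standard operator times over a sequence of steps. The following is a minimal sketch of that calculation in Python; the function and variable names are illustrative, not from the book, and the operator times are those quoted for the mouse condition in the tables above.

```python
# Keystroke-Level Model (KLM) operator times in seconds, as used in the
# worked exercise above (one-button mouse condition).
mouse_times = {
    "H": 0.400,   # homing hand between keyboard and pointing device
    "P": 0.664,   # pointing to a target with the mouse
    "B": 0.200,   # single button press
    "2B": 0.400,  # double click (two button presses)
    "M": 1.350,   # mental preparation
    "K": 0.200,   # keystroke
}

def klm_total(steps, times):
    """Sum the operator times for a sequence of KLM steps."""
    return round(sum(times[op] for op in steps), 3)

# Method 1: H (to mouse), P (to icon), double click, H (back to keyboard)
print(klm_total(["H", "P", "2B", "H"], mouse_times))        # 1.864, as in the table

# Method 2: H, P, click, mental pause, H, accelerator key
print(klm_total(["H", "P", "B", "M", "H", "K"], mouse_times))  # 3.214, as in the table
```

The trackball columns follow by swapping in the corresponding P times (and H = 0 for the keyboard-mounted trackball).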
The part structure separates out introductory and more
advanced material, with each part opener giving a simple
description of what its constituent chapters cover
Bullet points at the start of each chapter highlight the
core coverage
Worked exercises within chapters provide step-by-step
guidelines to demonstrate problem-solving techniques
Boxed asides contain descriptions of particular tasks or
technologies for additional interest, experimentation
and discussion
Chapter 20 Ubiquitous computing and augmented realities
within these environments. Much of our understanding of work has developed from
Fordist and Taylorist principles on the structuring of activities and tasks. Evaluation
within HCI reflects these roots and is often predicated on notions of task and the
measurement of performance and efficiency in meeting these goals and tasks.
However, it is not clear that these measures can apply universally across activities
when we move away from structured and paid work to other activities. For example,
DESIGN FOCUS
Shared experience
You are in the Mackintosh Interpretation Centre in an arts center in Glasgow, Scotland. You notice a
man wearing black wandering around looking at the exhibits and then occasionally at a small PDA he is
holding. As you get closer he appears to be talking to himself, but then you realize he is simply talking
into a head-mounted microphone. ‘Some people can never stop using their mobile phone’, you think.
As you are looking at one exhibit, he comes across and suddenly cranes forward to look more closely,
getting right in front of you. ‘How rude’, you think.
The visitor is taking part in the City project – a mixed-reality experience. He is talking to two other
people at remote sites, one who has a desktop VR view of the exhibition and the other just a website.
However, they can all see representations of each other. The visitor is being tracked by ultrasound and
he appears in the VR world. Also, the web user’s current page locates her in a particular part of the
virtual exhibition. All of the users see a map of the exhibition showing where they all are.
You might think that in such an experiment the person actually in the museum would take the lead, but
in fact real groups using this system seemed to have equal roles and really had a sense of shared experi-
ence despite their very different means of seeing the exhibition.
See the book website for a full case study: /e3/casestudy/city/
City project: physical presence, VR interfaces and web interface. Source: Courtesy of
Matthew Chalmers, note: City is an Equator project
RECOMMENDED READING
J. Carroll, editor, HCI Models, Theories, and Frameworks: Toward an Interdisciplinary
Science, Morgan Kaufmann, 2003.
See chapters by Perry on distributed cognition, Monk on common ground and
Kraut on social psychology.
L. A. Suchman, Plans and Situated Actions: The Problem of Human–Machine
Communication, Cambridge University Press, 1987.
This book popularized ethnography within HCI. It puts forward the viewpoint
that most actions are not pre-planned, but situated within the context in which
they occur. The principal domain of the book is the design of help for a photo-
copier. This is itself a single-user task, but the methodology applied is based on
both ethnographic and conversational analysis. The book includes several chap-
ters discussing the contextual nature of language and analysis of conversation
transcripts.
T. Winograd and F. Flores, Understanding Computers and Cognition: A New
Foundation for Design, Addison-Wesley, 1986.
Like Suchman, this book emphasizes the contextual nature of language and the
weakness of traditional artificial intelligence research. It includes an account of
speech act theory as applied to Coordinator. Many people disagree with the
authors’ use of speech act theory, but, whether by application or reaction, this
work has been highly influential.
S. Greenberg, editor, Computer-supported Cooperative Work and Groupware,
Academic Press, 1991.
The contents of this collection originally made up two special issues of the
International Journal of Man–Machine Studies. In addition, the book contains
Greenberg’s extensive annotated bibliography of CSCW, a major entry point for
any research into the field. Updated versions of the bibliography can be obtained
from the Department of Computer Science, University of Calgary, Calgary,
Alberta, Canada.
Communications of the ACM, Vol. 34, No. 12, special issue on ‘collaborative com-
puting’, December 1991.
Several issues of the journal Interacting with Computers from late 1992 through early
1993 have a special emphasis on CSCW.
Computer-Supported Cooperative Work is a journal dedicated to CSCW. See also back
issues of the journal Collaborative Computing. This ran independently for a while,
but has now merged with Computer-Supported Cooperative Work.
See also the recommended reading list for Chapter 19, especially the conference
proceedings.
SUMMARY
Universal design is about designing systems that are accessible by all users in all
circumstances, taking account of human diversity in disabilities, age and culture.
Universal design helps everyone – for example, designing a system so that it can be
used by someone who is deaf or hard of hearing will benefit other people working in
noisy environments or without audio facilities. Designing to be accessible to screen-
reading systems will make websites better for mobile users and older browsers.
Multi-modal systems provide access to system information and functionality
through a range of different input and output channels, exploiting redundancy.
Such systems will enable users with sensory, physical or cognitive impairments to
make use of the channels that they can use most effectively. But all users benefit
from multi-modal systems that utilize more of our senses in an involving interactive
experience.
For any design choice we should ask ourselves whether our decision is excluding
someone and whether there are any potential confusions or misunderstandings in
our choice.
EXERCISES
10.1 Is multi-modality always a good thing? Justify your answer.
10.2 What are (i) auditory icons and (ii) earcons? How can they be used to benefit both visually
impaired and sighted users?
10.3 Research your country’s legislation relating to accessibility of technology for disabled people.
What are the implications of this to your future career in computing?
10.4 Take your university website or another site of your choice and assess it for accessibility using
Bobby. How would you recommend improving the site?
10.5 How could systems be made more accessible to older users?
10.6 Interview either (i) a person you know over 65 or (ii) a child you know under 16 about their
experience, attitude and expectations of computers. What factors would you take into account
if you were designing a website aimed at this person?
10.7 Use the screen reader simulation available at www.webaim.org/simulations/screenreader to
experience something of what it is like to access the web using a screen reader. Can you find
the answers to the test questions on the site?
Annotated further reading encourages readers to
research topics in depth
Design Focus mini case studies highlight practical
applications of HCI concepts
Frequent links to the
book website for
further information
Chapter summaries reinforce student learning.
Exercises at the end of chapters can be used by
teachers or individuals to test understanding
FOREWORD
Human–computer interaction is a difficult endeavor with glorious rewards.
Designing interactive computer systems to be effective, efficient, easy, and enjoyable to
use is important, so that people and society may realize the benefits of computation-
based devices. The subtle weave of constraints and their trade-offs – human,
machine, algorithmic, task, social, aesthetic, and economic – generates the difficulty.
The reward is the creation of digital libraries where scholars can find and turn the
pages of virtual medieval manuscripts thousands of miles away; medical instruments
that allow a surgical team to conceptualize, locate, and monitor a complex neuro-
surgical operation; virtual worlds for entertainment and social interaction, respon-
sive and efficient government services, from online license renewal to the analysis of
parliamentary testimony; or smart telephones that know where they are and under-
stand limited speech. Interaction designers create interaction in virtual worlds and
embed interaction in physical worlds.
Human–computer interaction is a specialty in many fields, and is therefore multi-
disciplinary, but it has an intrinsic relationship as a subfield to computer science.
Most interactive computing systems are for some human purpose and interact with
humans in human contexts. The notion that computer science is the study of algo-
rithms has virtue as an attempt to bring foundational rigor, but can lead to ignoring
constraints foundational to the design of successful interactive computer systems.
A lesson repeatedly learned in engineering is that a major source of failure is the
narrow optimization of a design that does not take sufficient account of contextual
factors. Human users and their contexts are major components of the design
problem that cannot be wished away simply because they are complex to address. In
fact, the largest part of program code in most interactive systems deals with user
interaction. Inadequate attention to users and task context not only leads to bad user
interfaces, it puts entire systems at risk.
The problem is how to take into account the human and contextual part of a sys-
tem with anything like the rigor with which other parts of the system are understood
and designed – how to go beyond fuzzy platitudes like ‘know the user’ that are true,
but do not give a method for doing or a test for having done. This is difficult to do,
but inescapable, and, in fact, capable of progress. Over the years, the need to take
into account human aspects of technical systems has led to the creation of new fields
of study: applied psychology, industrial engineering, ergonomics, human factors,
Foreword xvii
man–machine systems. Human–computer interaction is the latest of these, more
complex in some ways because of the breadth of user populations and applications,
the reach into cognitive and social constraints, and the emphasis on interaction. The
experiences with other human-technical disciplines lead to a set of conclusions about
how a discipline of human–computer interaction should be organized if it is to be
successful.
First, design is where the action is. An effective discipline of human–computer
interaction cannot be based largely on ‘usability analysis’, important though that
may be. Usability analysis happens too late; there are too few degrees of freedom; and
most importantly, it is not generative. Design thrives on understanding constraints,
on insight into the design space, and on deep knowledge of the materials of the
design, that is, the user, the task, and the machine. The classic landmark designs in
human–computer interaction, such as the Xerox Star and the Apple Lisa/Macintosh,
were not created from usability analysis (although usability analysis had important
roles), but from generative principles applied by user interface designers who
had control of the design and implementation.
Second, although the notion of ‘user-centered design’ gets much press, we should
really be emphasizing ‘task-centered design’. Understanding the purpose and con-
text of a system is key to allocating functions between people and machines and to
designing their interaction. It is only in deciding what a human–machine system
should do and the constraints on this goal that the human and technical issues can
be resolved. The need for task-centered design brings forward the need for methods
of task analysis as a central part of system design.
Third, human–computer interaction needs to be structured to include both
analytic and implementation methods together in the same discipline and taught
together as part of the core. Practitioners of the discipline who can only evaluate, but
not design and build are under a handicap. Builders who cannot reason analytically
about the systems they build or who do not understand the human information pro-
cessing or social contexts of their designs are under a handicap. Of course, there will
be specialists in one or another part of human–computer interaction, but for there
to be a successful field, there must be a common core.
Finally, what makes a discipline is a set of methods for doing something. A field
must have results that can be taught and used by people other than their originators
to do something. Historically, a field naturally evolves from a set of point results to
a set of techniques to a set of facts, general abstractions, methods, and theories. In
fact, for a field to be cumulative, there must be compaction of knowledge by crunch-
ing the results down into methods and theories; otherwise the field becomes fad-
driven and a collection of an almost unteachably large set of weak results. The most
useful methods and theories are generative theories: from some task analysis it is
possible to compute some insightful property that constrains the design space of a
system. In a formula: task analysis, approximation, and calculation. For example,
we can predict that if a graphics system cannot update the display faster than 10
times/second then the illusion of animation will begin to break down. This con-
straint worked backwards has architectural implications for how to guarantee the
needed display rate under variable computational load. It can be designed against.
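Card’s formula – task analysis, approximation, and calculation – can be sketched in code. The following Python fragment is purely illustrative (the task names, priorities and millisecond costs are invented, not drawn from any real system); it shows how the 10 updates/second constraint, worked backwards, becomes a per-frame time budget that an architecture can be designed against under variable load.

```python
# Below roughly 10 updates/second the illusion of animation breaks down,
# so each frame has a budget of at most 100 ms.
MIN_UPDATE_RATE_HZ = 10
FRAME_BUDGET_MS = 1000 / MIN_UPDATE_RATE_HZ  # 100 ms per frame

def plan_frame(tasks, budget_ms=FRAME_BUDGET_MS):
    """Schedule rendering tasks (highest priority first) whose estimated
    costs fit within the frame budget; the rest are deferred or degraded.
    Each task is a (name, priority, estimated_cost_ms) tuple."""
    scheduled, deferred = [], []
    remaining = budget_ms
    for name, _priority, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        if cost <= remaining:
            scheduled.append(name)
            remaining -= cost
        else:
            deferred.append(name)
    return scheduled, deferred

# Hypothetical workload for one frame (names and costs are assumptions):
tasks = [
    ("redraw cursor", 3, 5),
    ("animate drag outline", 2, 40),
    ("full anti-aliased repaint", 1, 80),
]
scheduled, deferred = plan_frame(tasks)
# The 80 ms repaint does not fit after the 5 + 40 ms higher-priority
# tasks, so it is deferred to keep the update rate above 10 Hz.
```

The point is not the scheduler itself but the direction of reasoning: a property of human perception is approximated as a number, and that number then constrains the software architecture.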
This textbook, by Alan Dix, Janet Finlay, Gregory Abowd, and Russell Beale,
represents how far human–computer interaction has come in developing and
organizing technical results for the design and understanding of interactive
systems. Remarkably, by the light of their text, it is pretty far, satisfying all the just-
enumerated conclusions. This book makes an argument that by now there are many
teachable results in human–computer interaction by weight alone! It makes an argu-
ment that these results form a cumulative discipline by its structure, with sections
that organize the results systematically, characterizing human, machine, interaction,
and the design process. There are analytic models, but also code implementation
examples. It is no surprise that methods of task analysis play a prominent role in
the text as do theories to help in the design of the interaction. Usability evaluation
methods are integrated in their proper niche within the larger framework.
In short, the codification of the field of human–computer interaction in this
text is now starting to look like other subfields of computer science. Students by
studying the text can learn how to understand and build interactive systems.
Human–computer interaction as represented by the text fits together with other
parts of computer science. Moreover, human–computer interaction as presented is
a challenge problem for advancing theory in cognitive science, design, business, or
social-technical systems. Given where the field was just a few short years ago, the
creation of this text is a monumental achievement. The way is open to reap the
glorious rewards of interactive systems through a markedly less difficult endeavor,
both for designer and for user.
Stuart K. Card
Palo Alto Research Center, Palo Alto, California
PREFACE TO THE THIRD EDITION
It is ten years since the first edition of this book was published and much has
changed. Ubiquitous computing and rich sensor-filled environments are finding
their way out of the laboratory, not just into films and fiction, but also into our
workplaces and homes. Now the computer really has broken its bounds of plastic
and glass: we live in networked societies where personal computing devices from
mobile phones to smart cards fill our pockets, and electronic devices surround us at
home and at work. The web too has grown from a largely academic network into the
hub of business and everyday lives. As the distinctions between physical and digital,
work and leisure start to break down, human–computer interaction is also radically
changing.
We have tried to capture some of the excitement of these changes in this revised
edition, including issues of physical devices in Chapters 2 and 3, discussion of
web interfaces in Chapter 21, ubiquitous computing in Chapters 4 and 20, and new
models and paradigms for interaction in these new environments in Chapters 17 and
18. We have reflected aspects of the shift in use of technology from work to leisure
in the analysis of user experience in Chapter 3, and in several of the boxed examples
and case studies in the text. This new edition of Human–Computer Interaction is not
just tracking these changes but looking ahead at emerging areas.
However, it is also rooted in strong principles and models that are not dependent
on the passing technologies of the day. We are excited both by the challenges of the
new and by the established foundations, as it is these foundations that will be the
means by which today’s students understand tomorrow’s technology. So we make no
apology for continuing the focus of previous editions on the theoretical and con-
ceptual models that underpin our discipline. As the use of technology has changed,
these models have expanded. In particular, the insular individual focus of early
work is increasingly giving way to include the social and physical context. This is
reflected in the expanded treatment of social and organizational analysis, including
ethnography, in Chapter 13, and the analysis of artifacts in the physical environment
in Chapter 18.
STRUCTURE
The structure of the new edition has been completely revised. This in part reflects the
growth of the area: ten years ago HCI was as often as not a minority optional sub-
ject, and the original edition was written to capture the core material for a standard
course. Today HCI is much expanded: some areas (like CSCW) are fully fledged dis-
ciplines in their own right, and HCI is studied from a range of perspectives and at
different levels of detail. We have therefore separated basic material suitable for intro-
ductory courses into the first two parts, including a new chapter on interaction
design, which adds new material on scenarios and navigation design and provides an
overview suitable for a first course. In addition, we have included a new chapter on
universal design, to reflect the growing emphasis on design that is inclusive of all,
regardless of ability, age or cultural background. More advanced material focussing
on different HCI models and theories is presented in Part 3, with extended cover-
age of social and contextual models and rich interaction. It is intended that these
sections will be suitable for more advanced HCI courses at undergraduate and
postgraduate level, as well as for researchers new to the field. Detailed coverage of the
particular domains of web applications, ubiquitous computing and CSCW is given
in Part 4.
New to this edition is a full color plate section. Images flagged with a camera icon
in the text can be found in color in the plate section.
WEBSITE AND SUPPORT MATERIALS
We have always believed that support materials are an essential part of a textbook of
this kind. These are designed to supplement and enhance the printed book – phys-
ical and digital integration in practice. Since the first edition we have had exercises,
mini-case studies and presentation slides for all chapters available electronically.
For the second edition these were incorporated into a website including links and
an online search facility that acts as an exhaustive index to the book and mini-
encyclopedia of HCI. For visually disabled readers, access to a full online electronic
text has also been available. The website is continuing to develop, and for the third
edition provides all these features plus more, including WAP search, multi-choice
questions, and extended case study material (see also color plate section). We will use
the book website to bring you new exercises, information and other things, so do
visit us at www.hcibook.com (also available via www.booksites.net/dix). Throughout
the book you will find shorthand web references of the form /e3/a-page-url/. Just
prepend http://www.hcibook.com to find further information. To assist users of the
second edition, a mapping between the structures of the old and new editions is
available on the web at: http://www.hcibook.com/e3/contents/map2e/
STYLISTIC CONVENTION
As with all books, we have had to make some global decisions regarding style and
terminology. Specifically, in a book in which the central characters are ‘the user’
and ‘the designer’, it is difficult to avoid the singular pronoun. We therefore use the
pronoun ‘he’ when discussing the user and ‘she’ when referring to the designer. In
other cases we use ‘she’ as a generic term. This should not be taken to imply anything
about the composition of any actual population.
Similarly, we have adopted the convention of referring to the field of ‘Human–
Computer Interaction’ and the notion of ‘human–computer interaction’. In many
cases we will also use the abbreviation HCI.
ACKNOWLEDGEMENTS
In a book of this size, written by multiple authors, there will always be myriad
people behind the scenes who have aided, supported and abetted our efforts. We
would like to thank all those who provided information, pictures and software that
have enhanced the quality of the final product. In particular, we are indebted to
Wendy Mackay for the photograph of EVA; Wendy Hall and her colleagues at the
University of Southampton for the screen shot of Microcosm; Saul Greenberg for
the reactive keyboard; Alistair Edwards for Soundtrack; Christina Engelbart for the
photographs of the early chord keyset and mouse; Geoff Ellis for the screen shot of
Devina and himself using CuSeeMe; Steve Benford for images of the Internet Foyer;
and Tony Renshaw who provided photographs of the eye tracking equipment.
Thanks too to Simon Shum for information on design rationale, Robert Ward who
gave us material on psycho-physiology, and Elizabeth Mynatt and Tom Rodden who
worked with Gregory on material adapted in Chapter 20. Several of the boxed case
studies are based on the work of multi-institution projects, and we are grateful
to all those from the project teams of CASCO, thePooch, SMART-ITS, TOWER,
AVATAR-Conference and TEAM-HOS for boxes and case studies based on their
work; and also to the EQUATOR project from which we drew material for the boxes
on cultural probes, ‘Ambient Wood’ and ‘City’. We would also like to thank all the
reviewers and survey respondents whose feedback helped us to select our subject
matter and improve our coverage; and our colleagues at our respective institutions
and beyond who offered insight, encouragement and tolerance throughout the revi-
sion. We are indebted to all those who have contributed to the production process
at Pearson Education and elsewhere, especially Keith Mansfield, Anita Atkinson,
Lynette Miller, Sheila Chatten and Robert Chaundy.
Personal thanks must go to Fiona, Esther, Miriam, Rachel, Tina, Meghan, Aidan
and Blaise, who have all endured ‘The Book’ well beyond the call of duty and over
many years, and Bruno and ‘the girls’ who continue to make their own inimitable
contribution.
Finally we all owe huge thanks to Fiona for her continued deep personal support
and for tireless proofreading, checking of figures, and keeping us all moving. We
would never have got beyond the first edition without her.
The efforts of all of these have meant that the book is better than it would other-
wise have been. Where it could still be better, we take full responsibility.
PUBLISHER’S ACKNOWLEDGEMENTS
We are grateful to the following for permission to reproduce copyright material:
Figure p. 2, Figures 3.14, 3.15, 3.16 and 5.13 and Exercise 8.4 screen shots reprinted
by permission from Apple Computer, Inc.; Figure 2.11 reprinted by permission of
Keith Cheverst; Figure 3.13 from The WebBook and Web Forager: An information
workspace for the world-wide web in CHI Conference Proceedings, © 1996 ACM, Inc.,
reprinted by permission (Card, S. K., Robertson, G. G. and York, W. 1996); Figures
3.9, 3.19, 5.5, Chapter 14, Design Focus: Looking real – Avatar Conference screen
shots, Figures 21.3, 21.10, 21.11 screen shot frames reprinted by permission from
Microsoft Corporation; Tables 6.2 and 6.3 adapted from Usability engineering: our
experience and evolution in Handbook for Human–Computer Interaction edited by
M. Helander, Copyright 1988, with permission from Elsevier (Whiteside, J., Bennett,
J. and Holtzblatt, K. 1988); Figure 7.1 adapted from The alternate reality kit – an
animated environment for creating interactive simulations in Proceedings of
Workshop on Visual Languages, © 1986 IEEE, reprinted by permission of IEEE
(Smith, R. B. 1986); Figure 7.2 from Guidelines for designing user interface software
in MITRE Corporation Report MTR-9420, reprinted by permission of The MITRE
Corporation (Smith, S. L. and Mosier, J. N. 1986); Figure 7.3 reprinted by permis-
sion of Jenifer Tidwell; Figures 8.6 and 8.9 from Xview Programming Manual,
Volume 7 of The X Window System, reprinted by permission of O’Reilly and
Associates, Inc. (Heller, D. 1990); Figure 9.8 screen shot reprinted by permission of
Dr. R. D. Ward; Figure 10.2 after Earcons and icons: their structure and common
design principles in Human-Computer Interaction, 4(1), published and reprinted
by permission of Lawrence Erlbaum Associates, Inc. (Blattner, M., Sumikawa, D. and
Greenberg, R. 1989); Figure 10.5 reprinted by permission of Alistair D. N. Edwards;
Figure 10.7 reprinted by permission of Saul Greenberg; Figure 11.2 screen shot
reprinted by permission of Macromedia, Inc.; Table 12.1 adapted from The
Psychology of Human Computer Interaction, published and reprinted by permission
of Lawrence Erlbaum Associates, Inc. (Card, S. K., Moran, T. P. and Newell, A.
1983); Table 12.2 after Table in A comparison of input devices in elemental
pointing and dragging tasks in Reaching through technology – CHI’91 Conference
Proceedings, Human Factors in Computing Systems, April, edited by S. P. Robertson,
G. M. Olson and J. S. Olson, © 1991 ACM, Inc., reprinted by permission (Mackenzie,
I. S., Sellen, A. and Buxton, W. 1991); Figure 14.1 from Understanding Computers
and Cognition: A New Foundation for Design, published by Addison-Wesley,
reprinted by permission of Pearson Education, Inc. (Winograd, T. and Flores, F.
1986); Figure 14.5 from Theories of multi-party interaction. Technical report, Social
and Computer Sciences Research Group, University of Surrey and Queen Mary and
Westfield Colleges, University of London, reprinted by permission of Nigel Gilbert
(Hewitt, B., Gilbert, N., Jirotka, M. and Wilbur, S. 1990); Figure 14.6 from Dialogue
processes in computer-mediated communication: a study of letters in the com system.
Technical report, Linköping Studies in Arts and Sciences, reprinted by permission of
Kerstin Severinson Eklundh (Eklundh, K. S. 1986); Chapter 14, Design Focus:
Looking real – Avatar Conference, screen shots reprinted by permission of
AVATAR-Conference project team; Figure 16.17 screen shot reprinted by permis-
sion of Harold Thimbleby; Figure 17.5 based on Verifying the behaviour of virtual
world objects in DSV-IS 2000 Interactive Systems: Design, Specification and
Verification. LNCS 1946, edited by P. Palanque and F. Paternò, published and
reprinted by permission of Springer-Verlag GmbH & Co. KG (Willans, J. S. and
Harrison, M. D. 2001); Figure 18.4 icons reprinted by permission of Fabio Paternò;
Chapter 19, p.675 CuSeeMe screen shot reprinted by permission of Geoff Ellis;
Chapter 19, Design Focus: TOWER – workspace awareness, screen shots reprinted
by permission of Wolfgang Prinz; Figure 20.1 reprinted by permission of Mitsubishi
Electric Research Laboratories, Inc.; Figure 20.4 (right) reprinted by permission of
Sony Computer Science Laboratories, Inc; Figure 20.9 from Cone trees: animated 3D
visualisation of hierarchical information in Proceedings of the CHI’91 Conference on
Human Factors in Computing Systems, © 1991 ACM, Inc., reprinted by permission
(Robertson, G. G., Card, S. K., and Mackinlay, J. D. 1991); Figure 20.10 from
Lifelines: visualising personal histories in Proceedings of CHI’96, © 1996 ACM, Inc.,
reprinted by permission (Plaisant, C., Milash, B., Rose, A., Widoff, S. and
Shneiderman, B. 1996); Figure 20.11 from Browsing anatomical image databases: a
case study of the Visible Human in CHI’96 Conference Companion, © 1996 ACM,
Inc., reprinted by permission (North, C. and Korn, F. 1996); Figure 20.12 from
Externalising abstract mathematical models in Proceedings of CHI’96, © 1996 ACM,
Inc., reprinted by permission (Tweedie, L., Spence, R., Dawkes, H. and Su, H. 1996);
Figure 21.2 from The impact of Utility and Time on Distributed Information
Retrieval in People and Computers XII: Proceedings of HCI’97, edited by H.
Thimbleby, B. O’Conaill and P. Thomas, published and reprinted by permission
of Springer-Verlag GmbH & Co. KG (Johnson, C. W. 1997); Figure 21.4 screen
shot reprinted by permission of the Departments of Electronics and Computer
Science and History at the University of Southampton; Figure 21.6 Netscape browser
window © 2002 Netscape Communications Corporation. Used with permission.
Netscape has not authorized, sponsored, endorsed, or approved this publication and
is not responsible for its content.
We are grateful to the following for permission to reproduce photographs:
Chapter 1, p. 50, Popperfoto.com; Chapter 2, p. 65, PCD Maltron Ltd; Figure 2.2
Electrolux; Figures 2.6 and 19.6 photos courtesy of Douglas Engelbart and Bootstrap
Institute; Figure 2.8 (left) British Sky Broadcasting Limited; Figure 2.13 (bottom
right) Sony (UK) Ltd; Chapter 2, Design Focus: Feeling the Road, BMW AG;
Chapter 2, Design Focus: Smart-Its – making using sensors easy, Hans Gellersen;
Figures 4.1 (right) and 20.2 (left) Palo Alto Research Center; Figure 4.2 and 20.3
(left) François Guimbretière; Figure 4.3 (bottom left) Franklin Electronic Publishers;
Figure 5.2 (top plate and middle plate) Kingston Museum and Heritage Service,
(bottom plate) V&A Images, The Victoria and Albert Museum, London; Chapter 5,
Design Focus: Cultural probes, William W. Gaver, Anthony Boucher, Sarah
Pennington and Brendan Walker, Equator IRC, Royal College of Art; Chapter 6,
p. 245, from The 1984 Olympic Message System: a test of behavioural principles of
system design in Communications of the ACM, 30(9), © 1987 ACM, Inc., reprinted
by permission (Gould, J. D., Boies, S. J., Levy, S., Richards, J. T. and Schoonard, J.
1987); Figures 9.5 and 9.6 J. A. Renshaw; Figure 9.7 Dr. R. D. Ward; Figure 10.3
SensAble Technologies; Chapter 13, Design Focus: Tomorrow’s hospital – using
participatory design, Professor J. Artur Vale Serrano; Chapter 18, p. 650, Michael
Beigl; Chapter 19, p. 678, Steve Benford, The Mixed Reality Laboratory, University
of Nottingham; Chapter 19, Design Focus: SMS in action, Mark Rouncefield; Figure
20.2 (right) Ken Hinckley; Figure 20.3 (right) MIT Media Lab; Figure 20.4 (left)
from Interacting with paper on the digital desk in Communications of the ACM,
36(7), © 1993 ACM, Inc., reprinted by permission (Wellner, P. 1993); Chapter 20,
p. 726, Peter Phillips; Chapter 20, Design Focus: Ambient wood – augmenting the
physical, Yvonne Rogers; Chapter 20, Design Focus: Shared experience, Matthew
Chalmers.
We are grateful to the following for permission to reproduce text extracts:
Pearson Education, Inc. Publishing as Pearson Addison Wesley for an extract
adapted from Designing the User Interface: Strategies for Effective Human–Computer
Interaction 3/e by B. Shneiderman © 1998, Pearson Education, Inc; Perseus Books
Group for an extract adapted from The Design of Everyday Things by D. Norman,
1998; and Wiley Publishing, Inc. for extracts adapted from ‘Heuristic Evaluation’ by
Jakob Nielsen and Robert L. Mack published in Usability Inspection Methods © 1994
Wiley Publishing, Inc.; IEEE for permission to base Chapter 20 on ‘The human ex-
perience’ by Gregory Abowd, Elizabeth Mynatt and Tom Rodden which appeared
in IEEE Pervasive Computing Magazine, Special Inaugural Issue on Reaching for
Weiser’s Vision, Vol. 1, Issue 1, pp. 48–58, Jan–March 2002. © 2002 IEEE.
In some instances we have been unable to trace the owners of copyright material, and
we would appreciate any information that would enable us to do so.
INTRODUCTION
WHY HUMAN–COMPUTER INTERACTION?
In the first edition of this book we wrote the following:
This is the authors’ second attempt at writing this introduction. Our first attempt
fell victim to a design quirk coupled with an innocent, though weary and less than
attentive, user. The word-processing package we originally used to write this intro-
duction is menu based. Menu items are grouped to reflect their function. The ‘save’
and ‘delete’ options, both of which are correctly classified as file-level operations, are
consequently adjacent items in the menu. With a cursor controlled by a trackball it
is all too easy for the hand to slip, inadvertently selecting delete instead of save. Of
course, the delete option, being well thought out, pops up a confirmation box allow-
ing the user to cancel a mistaken command. Unfortunately, the save option produces
a very similar confirmation box – it was only as we hit the ‘Confirm’ button that we
noticed the word ‘delete’ at the top...
Happily this word processor no longer has a delete option in its menu, but unfortu-
nately, similar problems to this are still an all too common occurrence. Errors such
as these, resulting from poor design choices, happen every day. Perhaps they are not
catastrophic: after all nobody’s life is endangered nor is there environmental damage
(unless the designer happens to be nearby or you break something in frustration!).
However, when you lose several hours’ work with no written notes or backup and
a publisher’s deadline already a week past, ‘catastrophe’ is certainly the word that
springs to mind.
Why is it then that when computers are marketed as ‘user friendly’ and ‘easy to
use’, simple mistakes like this can still occur? Did the designer of the word processor
actually try to use it with the trackball, or was it just that she was so expert with the
system that the mistake never arose? We hazard a guess that no one tried to use it
when tired and under pressure. But these criticisms are not levied only on the design-
ers of traditional computer software. More and more, our everyday lives involve pro-
grammed devices that do not sit on our desk, and these devices are just as unusable.
Exactly how many VCR designers understand the universal difficulty people have
trying to set their machines to record a television program? Do car radio designers
actually think it is safe to use so many knobs and displays that the driver has to
divert attention away from the road completely in order to tune the radio or adjust
the volume?
Computers and related devices have to be designed with an understanding that
people with specific tasks in mind will want to use them in a way that is seamless with
respect to their everyday work. To do this, those who design these systems need to
know how to think in terms of the eventual users’ tasks and how to translate that
knowledge into an executable system. But there is a problem with trying to teach the
notion of designing computers for people. All designers are people and, most prob-
ably, they are users as well. Isn’t it therefore intuitive to design for the user? Why
does it need to be taught when we all know what a good interface looks like? As a
result, the study of human–computer interaction (HCI) tends to come late in the
designer’s training, if at all. The scenario with which we started shows that this is a
mistaken view; it is not at all intuitive or easy to design consistent, robust systems
that will cope with all manner of user carelessness.

DESIGN FOCUS
Things don’t change
It would be nice to think that problems like those described at the start of the Introduction would
never happen now. Think again! Look at the MacOS X ‘dock’ below. It is a fast launch point for applica-
tions; folders and files can be dragged there for instant access; and also, at the right-hand side, there
sits the trash can. Imagine what happens as you try to drag a file into one of the folders. If your finger
accidentally slips whilst the icon is over the trash can – oops!
Happily this is not quite as easy in reality as it looks in the screen shot, since the icons in the dock con-
stantly move around as you try to drag a file into it. This is to make room for the file in case you want
to place it in the dock. However, it means you have to concentrate very hard when dragging a file over
the dock. We assume this is not a deliberate feature, but it does have the beneficial side effect that
users are less likely to throw away a file by accident – whew!
In fact it is quite fun to watch a new user trying to throw away a file. The trash can keeps moving as if
it didn’t want the file in it. Experienced users evolve coping strategies. One user always drags files into
the trash from the right-hand side as then the icons in the dock don’t move around. So two lessons:
- designs don’t always get better
- but at least users are clever.
Screen shot reprinted by permission from Apple Computer, Inc.

The interface is not something
that can be plugged in at the last minute; its design should be developed integrally
with the rest of the system. It should not just present a ‘pretty face’, but should sup-
port the tasks that people actually want to do, and forgive the careless mistakes. We
therefore need to consider how HCI fits into the design process.
Designing usable systems is not simply a matter of altruism towards the eventual
user, or even marketing; it is increasingly a matter of law. National health and safety
standards constrain employers to provide their workforce with usable computer sys-
tems: not just safe but usable. For example, EC Directive 90/270/EEC, which has been
incorporated into member countries’ legislation, requires employers to ensure the
following when designing, selecting, commissioning or modifying software:
- that it is suitable for the task
- that it is easy to use and, where appropriate, adaptable to the user’s knowledge
and experience
- that it provides feedback on performance
- that it displays information in a format and at a pace that is adapted to the user
- that it conforms to the ‘principles of software ergonomics’.
Designers and employers can no longer afford to ignore the user.
WHAT IS HCI?
The term human–computer interaction has only been in widespread use since the early
1980s, but has its roots in more established disciplines. Systematic study of human
performance began in earnest at the beginning of the last century in factories, with
an emphasis on manual tasks. The Second World War provided the impetus for
studying the interaction between humans and machines, as each side strove to pro-
duce more effective weapons systems. This led to a wave of interest in the area among
researchers, and the formation of the Ergonomics Research Society in 1949. Tradi-
tionally, ergonomists have been concerned primarily with the physical characteristics
of machines and systems, and how these affect user performance. Human Factors
incorporates these issues, and more cognitive issues as well. The terms are often used
interchangeably, with Ergonomics being the preferred term in the United Kingdom
and Human Factors in the English-speaking parts of North America. Both of these
disciplines are concerned with user performance in the context of any system, whether
computer, mechanical or manual. As computer use became more widespread, an
increasing number of researchers specialized in studying the interaction between
people and computers, concerning themselves with the physical, psychological and
theoretical aspects of this process. This research originally went under the name man–
machine interaction, but this became human–computer interaction in recognition of
the particular interest in computers and the composition of the user population!
Another strand of research that has influenced the development of HCI is infor-
mation science and technology. Again the former is an old discipline, pre-dating the
introduction of technology, and is concerned with the management and manipulation
of information within an organization. The introduction of technology has had a
profound effect on the way that information can be stored, accessed and utilized
and, consequently, a significant effect on the organization and work environment.
Systems analysis has traditionally concerned itself with the influence of technology
in the workplace, and fitting the technology to the requirements and constraints of
the job. These issues are also the concern of HCI.
HCI draws on many disciplines, as we shall see, but it is in computer science and
systems design that it must be accepted as a central concern. For all the other discip-
lines it can be a specialism, albeit one that provides crucial input; for systems design
it is an essential part of the design process. From this perspective, HCI involves the
design, implementation and evaluation of interactive systems in the context of the
user’s task and work.
However, when we talk about human–computer interaction, we do not necessarily
envisage a single user with a desktop computer. By user we may mean an individual
user, a group of users working together, or a sequence of users in an organization,
each dealing with some part of the task or process. The user is whoever is trying to
get the job done using the technology. By computer we mean any technology ranging
from the general desktop computer to a large-scale computer system, a process
control system or an embedded system. The system may include non-computerized
parts, including other people. By interaction we mean any communication between
a user and computer, be it direct or indirect. Direct interaction involves a dialog
with feedback and control throughout performance of the task. Indirect interaction
may involve batch processing or intelligent sensors controlling the environment.
The important thing is that the user is interacting with the computer in order to
accomplish something.
WHO IS INVOLVED IN HCI?
HCI is undoubtedly a multi-disciplinary subject. The ideal designer of an interactive
system would have expertise in a range of topics: psychology and cognitive science
to give her knowledge of the user’s perceptual, cognitive and problem-solving
skills; ergonomics for the user’s physical capabilities; sociology to help her under-
stand the wider context of the interaction; computer science and engineering to
be able to build the necessary technology; business to be able to market it; graphic
design to produce an effective interface presentation; technical writing to produce
the manuals, and so it goes on. There is obviously too much expertise here to be held
by one person (or indeed four!), perhaps even too much for the average design team.
Indeed, although HCI is recognized as an interdisciplinary subject, in practice peo-
ple tend to take a strong stance on one side or another. However, it is not possible to
design effective interactive systems from one discipline in isolation. Input is needed
from all sides. For example, a beautifully designed graphic display may be unusable
if it ignores dialog constraints or the psychological limitations of the user.
In this book we want to encourage the multi-disciplinary view of HCI but we too
have our ‘stance’, as computer scientists. We are interested in answering a particular
question. How do principles and methods from each of these contributing dis-
ciplines in HCI help us to design better systems? In this we must be pragmatists
rather than theorists: we want to know how to apply the theory to the problem
rather than just acquire a deep understanding of the theory. Our goal, then, is to be
multi-disciplinary but practical. We concentrate particularly on computer science,
psychology and cognitive science as core subjects, and on their application to design;
other disciplines are consulted to provide input where relevant.
THEORY AND HCI
Unfortunately for us, there is no general and unified theory of HCI that we can
present. Indeed, it may be impossible ever to derive one; it is certainly out of our
reach today. However, there is an underlying principle that forms the basis of our
own views on HCI, and it is captured in our claim that people use computers to
accomplish work. This outlines the three major issues of concern: the people, the
computers and the tasks that are performed. The system must support the user’s
task, which gives us a fourth focus, usability: if the system forces the user to adopt an
unacceptable mode of work then it is not usable.
There are, however, those who would dismiss our concentration on the task,
saying that we do not even know enough about a theory of human tasks to support
them in design. There is a good argument here (to which we return in Chapter 15).
However, we can live with this confusion about what real tasks are because our
understanding of tasks at the moment is sufficient to give us direction in design. The
user’s current tasks are studied and then supported by computers, which can in
turn affect the nature of the original task and cause it to evolve. To illustrate, word
processing has made it easy to manipulate paragraphs and reorder documents,
allowing writers a completely new freedom that has affected writing styles. No longer
is it vital to plan and construct text in an ordered fashion, since free-flowing prose
can easily be restructured at a later date. This evolution of task in turn affects the
design of the ideal system. However, we see this evolution as providing a motivating
force behind the system development cycle, rather than a refutation of the whole idea
of supportive design.
This word ‘task’ or the focus on accomplishing ‘work’ is also problematic when we
think of areas such as domestic appliances, consumer electronics and e-commerce.
There are three ‘use’ words that must all be true for a product to be successful; it
must be:
useful – accomplish what is required: play music, cook dinner, format a document;
usable – do it easily and naturally, without danger of error, etc.;
used – make people want to use it, be attractive, engaging, fun, etc.
The last of these has not been a major factor until recently in HCI, but issues of
motivation, enjoyment and experience are increasingly important. We are certainly
even further from having a unified theory of experience than of task.
The question of whether HCI, or more importantly the design of interactive sys-
tems and the user interface in particular, is a science or a craft discipline is an inter-
esting one. Does it involve artistic skill and fortuitous insight or reasoned methodical
science? Here we can draw an analogy with architecture. The most impressive struc-
tures, the most beautiful buildings, the innovative and imaginative creations that
provide aesthetic pleasure, all require inventive inspiration in design and a sense of
artistry, and in this sense the discipline is a craft. However, these structures also have
to be able to stand up to fulfill their purpose successfully, and to be able to do this
the architect has to use science. So it is for HCI: beautiful and/or novel interfaces are
artistically pleasing and capable of fulfilling the tasks required – a marriage of art and
science into a successful whole. We want to reuse lessons learned from the past about
how to achieve good results and avoid bad ones. For this we require both craft and
science. Innovative ideas lead to more usable systems, but in order to maximize the
potential benefit from the ideas, we need to understand not only that they work, but
how and why they work. This scientific rationalization allows us to reuse related con-
cepts in similar situations, in much the same way that architects can produce a bridge
and know that it will stand, since it is based upon tried and tested principles.
The craft–science tension becomes even more difficult when we consider novel
systems. Their increasing complexity means that our personal ideas of good and bad
are no longer enough; for a complex system to be well designed we need to rely on
something more than simply our intuition. Designers may be able to think about
how one user would want to act, but how about groups? And what about new media?
Our ideas of how best to share workloads or present video information are open to
debate and question even in non-computing situations, and the incorporation of one
version of good design into a computer system is quite likely to be unlike anyone
else’s version. Different people work in different ways, whilst different media color
the nature of the interaction; both can dramatically change the very nature of the
original task. In order to assist designers, it is unrealistic to assume that they can rely
on artistic skill and perfect insight to develop usable systems. Instead we have to pro-
vide them with an understanding of the concepts involved, a scientific view of the
reasons why certain things are successful whilst others are not, and then allow their
creative nature to feed off this information: creative flow, underpinned with science;
or maybe scientific method, accelerated by artistic insight. The truth is that HCI is
required to be both a craft and a science in order to be successful.
HCI IN THE CURRICULUM
If HCI involves both craft and science then it must, in part at least, be taught.
Imagination and skill may be qualities innate in the designer or developed through
experience, but the underlying theory must be learned. In the past, when computers
were used primarily by expert specialists, concentration on the interface was a lux-
ury that was often relinquished. Now designers cannot afford to ignore the interface
in favour of the functionality of their systems: the two are too closely intertwined. If
the interface is poor, the functionality is obscured; if it is well designed, it will allow
the system’s functionality to support the user’s task.
Increasingly, therefore, computer science educators cannot afford to ignore HCI.
We would go as far as to claim that HCI should be integrated into every computer
science or software engineering course, either as a recurring feature of other modules
or, preferably, as a module itself. It should not be viewed as an ‘optional extra’
(although, of course, more advanced HCI options can complement a basic core
course). This view is shared by the ACM SIGCHI curriculum development group,
who propose a curriculum for such a core course [9]. The topics included in this
book, although developed without reference to this curriculum, cover the main
emphases of it, and include enough detail and coverage to support specialized
options as well.
In courses other than computer science, HCI may well be an option specializing
in a particular area, such as cognitive modeling or task analysis. Selected use of the
relevant chapters of this book can also support such a course.
HCI must be taken seriously by designers and educators if the requirement for
additional complexity in the system is to be matched by increased clarity and usabil-
ity in the interface. In this book we demonstrate how this can be done in practice.
DESIGN FOCUS
Quick fixes
You should expect to spend both time and money on interface design, just as you would with other
parts of a system. So in one sense there are no quick fixes. However, a few simple steps can make a
dramatic improvement.
Think ‘user’
Probably 90% of the value of any interface design technique is that it forces the designer to remember
that someone (and in particular someone else) will use the system under construction.
Try it out
Of course, many designers will build a system that they find easy and pleasant to use, and they find
it incomprehensible that anyone else could have trouble with it. Simply sitting someone down with
an early version of an interface (without the designer prompting them at each step!) is enormously
valuable. Professional usability laboratories will have video equipment, one-way mirrors and other
sophisticated monitors, but a notebook and pencil and a home-video camera will suffice (more about
evaluation in Chapter 9).
Involve the users
Where possible, the eventual users should be involved in the design process. They have vital know-
ledge and will soon find flaws. A mechanical syringe was once being developed and a prototype was
demonstrated to hospital staff. Happily they quickly noticed the potentially fatal flaw in its interface.
The doses were entered via a numeric keypad: an accidental keypress and the dose could be out by a
factor of 10! The production version had individual increment/decrement buttons for each digit (more
about participatory design in Chapter 13).
Iterate
People are complicated, so you won’t get it right first time. Programming an interface can be a very
difficult and time-consuming business. So, the result becomes precious and the builder will want
to defend it and minimize changes. Making early prototypes less precious and easier to throw away is
crucial. Happily there are now many interface builder tools that aid this process. For example, mock-
ups can be quickly constructed using HyperCard on the Apple Macintosh or Visual Basic on the PC.
For visual and layout decisions, paper designs and simple models can be used (more about iterative
design in Chapter 5).
Figure 0.1 Automatic syringe: setting the dose to 1372. The effect of one key slip before and after
user involvement
PART 1
FOUNDATIONS
In this part we introduce the fundamental components of
an interactive system: the human user, the computer system
itself and the nature of the interactive process. We then
present a view of the history of interactive systems by look-
ing at key interaction paradigms that have been significant.
Chapter 1 discusses the psychological and physiological
attributes of the user, providing us with a basic overview of
the capabilities and limitations that affect our ability to use
computer systems. It is only when we have an understand-
ing of the user at this level that we can understand what
makes for successful designs. Chapter 2 considers the
computer in a similar way. Input and output devices are
described and explained and the effect that their individual
characteristics have on the interaction highlighted. The
computational power and memory of the computer is
another important component in determining what can be
achieved in the interaction, whilst due attention is also paid
to paper output since this forms one of the major uses
of computers and users’ tasks today. Having approached
interaction from both the human and the computer side,
we then turn our attention to the dialog between them
in Chapter 3, where we look at models of interaction. In
Chapter 4 we take a historical perspective on the evolution
of interactive systems and how they have increased the
usability of computers in general.
1
THE HUMAN
OVERVIEW
• Humans are limited in their capacity to process
information. This has important implications for design.
• Information is received and responses given via a
number of input and output channels:
– visual channel
– auditory channel
– haptic channel
– movement.
• Information is stored in memory:
– sensory memory
– short-term (working) memory
– long-term memory.
• Information is processed and applied:
– reasoning
– problem solving
– skill acquisition
– error.
• Emotion influences human capabilities.
• Users share common capabilities but are individuals
with differences, which should not be ignored.
1.1 INTRODUCTION
This chapter is the first of four in which we introduce some of the ‘foundations’ of
HCI. We start with the human, the central character in any discussion of interactive
systems. The human, the user, is, after all, the one whom computer systems are de-
signed to assist. The requirements of the user should therefore be our first priority.
In this chapter we will look at areas of human psychology coming under the general
banner of cognitive psychology. This may seem a far cry from designing and building
interactive computer systems, but it is not. In order to design something for some-
one, we need to understand their capabilities and limitations. We need to know if
there are things that they will find difficult or, even, impossible. It will also help us to
know what people find easy and how we can help them by encouraging these things.
We will look at aspects of cognitive psychology which have a bearing on the use of com-
puter systems: how humans perceive the world around them, how they store and
process information and solve problems, and how they physically manipulate objects.
We have already said that we will restrict our study to those things that are relev-
ant to HCI. One way to structure this discussion is to think of the user in a way that
highlights these aspects. In other words, to think of a simplified model of what is
actually going on. Many models have been proposed and it is useful to consider one of
the most influential in passing, to understand the context of the discussion that is to
follow. In 1983, Card, Moran and Newell [56] described the Model Human Processor,
which is a simplified view of the human processing involved in interacting with
computer systems. The model comprises three subsystems: the perceptual system,
handling sensory stimulus from the outside world, the motor system, which controls
actions, and the cognitive system, which provides the processing needed to connect
the two. Each of these subsystems has its own processor and memory, although
obviously the complexity of these varies depending on the complexity of the tasks
the subsystem has to perform. The model also includes a number of principles of
operation which dictate the behavior of the systems under certain conditions.
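The structure just described can be sketched in code. The cycle times below are the typical values commonly quoted for Card, Moran and Newell's three processors (each in fact has a wide empirical range), and the simple reaction-time calculation is only an illustration of how the model is used, not part of its formal definition:

```python
# Typical processor cycle times from the Model Human Processor
# (Card, Moran and Newell, 1983). Each value is a rough midpoint
# of a wide empirical range, so treat them as illustrative.
PERCEPTUAL_CYCLE_MS = 100  # perceptual system: sensing a stimulus
COGNITIVE_CYCLE_MS = 70    # cognitive system: selecting a response
MOTOR_CYCLE_MS = 70        # motor system: executing the action

def simple_reaction_time_ms(cognitive_cycles: int = 1) -> int:
    """Estimate the time to respond to a single expected stimulus:
    one perceptual cycle, one or more cognitive cycles, one motor cycle."""
    return (PERCEPTUAL_CYCLE_MS
            + cognitive_cycles * COGNITIVE_CYCLE_MS
            + MOTOR_CYCLE_MS)

# Pressing a key as soon as a light comes on: about 240 ms.
print(simple_reaction_time_ms())
```

With these figures, a reaction that needs more deliberation simply adds further cognitive cycles.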
We will use the analogy of the user as an information processing system, but in
our model make the analogy closer to that of a conventional computer system.
Information comes in, is stored and processed, and information is passed out. We
will therefore discuss three components of this system: input–output, memory and
processing. In the human, we are dealing with an intelligent information-processing
system, and processing therefore includes problem solving, learning, and, con-
sequently, making mistakes. This model is obviously a simplification of the real
situation, since memory and processing are required at all levels, as we have seen in
the Model Human Processor. However, it is convenient as a way of grasping how
information is handled by the human system. The human, unlike the computer, is
also influenced by external factors such as the social and organizational environ-
ment, and we need to be aware of these influences as well. We will ignore such
factors for now and concentrate on the human’s information processing capabilities
only. We will return to social and organizational influences in Chapter 3 and, in
more detail, in Chapter 13.
In this chapter, we will first look at the human’s input–output channels, the senses
and responders or effectors. This will involve some low-level processing. Secondly,
we will consider human memory and how it works. We will then think about how
humans perform complex problem solving, how they learn and acquire skills, and
why they make mistakes. Finally, we will discuss how these things can help us in the
design of computer systems.
1.2 INPUT–OUTPUT CHANNELS
A person’s interaction with the outside world occurs through information being
received and sent: input and output. In an interaction with a computer the user
receives information that is output by the computer, and responds by providing
input to the computer – the user’s output becomes the computer’s input and vice
versa. Consequently the use of the terms input and output may lead to confusion so
we shall blur the distinction somewhat and concentrate on the channels involved.
This blurring is appropriate since, although a particular channel may have a primary
role as input or output in the interaction, it is more than likely that it is also used in
the other role. For example, sight may be used primarily in receiving information
from the computer, but it can also be used to provide information to the computer,
for example by fixating on a particular screen point when using an eyegaze system.
Input in the human occurs mainly through the senses and output through the
motor control of the effectors. There are five major senses: sight, hearing, touch, taste
and smell. Of these, the first three are the most important to HCI. Taste and smell
do not currently play a significant role in HCI, and it is not clear whether they could
be exploited at all in general computer systems, although they could have a role to
play in more specialized systems (smells to give warning of malfunction, for example)
or in augmented reality systems. However, vision, hearing and touch are central.
Similarly there are a number of effectors, including the limbs, fingers, eyes, head
and vocal system. In the interaction with the computer, the fingers play the primary
role, through typing or mouse control, with some use of voice, and eye, head and
body position.
Imagine using a personal computer (PC) with a mouse and a keyboard. The appli-
cation you are using has a graphical interface, with menus, icons and windows. In
your interaction with this system you receive information primarily by sight, from
what appears on the screen. However, you may also receive information by ear: for
example, the computer may ‘beep’ at you if you make a mistake or to draw attention
to something, or there may be a voice commentary in a multimedia presentation.
Touch plays a part too in that you will feel the keys moving (also hearing the ‘click’)
or the orientation of the mouse, which provides vital feedback about what you have
done. You yourself send information to the computer using your hands, either
by hitting keys or moving the mouse. Sight and hearing do not play a direct role
in sending information in this example, although they may be used to receive
information from a third source (for example, a book, or the words of another per-
son) which is then transmitted to the computer.
In this section we will look at the main elements of such an interaction, first con-
sidering the role and limitations of the three primary senses and going on to consider
motor control.
1.2.1 Vision
Human vision is a highly complex activity with a range of physical and perceptual
limitations, yet it is the primary source of information for the average person.
We can roughly divide visual perception into two stages: the physical reception of
the stimulus from the outside world, and the processing and interpretation of that
stimulus. On the one hand the physical properties of the eye and the visual system
mean that there are certain things that cannot be seen by the human; on the other
the interpretative capabilities of visual processing allow images to be constructed
from incomplete information. We need to understand both stages as both influence
what can and cannot be perceived visually by a human being, which in turn directly
affects the way that we design computer systems. We will begin by looking at the
eye as a physical receptor, and then go on to consider the processing involved in
basic vision.
The human eye
Vision begins with light. The eye is a mechanism for receiving light and transform-
ing it into electrical energy. Light is reflected from objects in the world and their
image is focussed upside down on the back of the eye. The receptors in the eye
transform it into electrical signals which are passed to the brain.
The eye has a number of important components (see Figure 1.1) which we will
look at in more detail. The cornea and lens at the front of the eye focus the light into
a sharp image on the back of the eye, the retina. The retina is light sensitive and con-
tains two types of photoreceptor: rods and cones.
Rods are highly sensitive to light and therefore allow us to see under a low level of
illumination. However, they are unable to resolve fine detail and are subject to light
saturation. This is the reason for the temporary blindness we get when moving from
a darkened room into sunlight: the rods have been active and are saturated by the
sudden light. The cones do not operate either as they are suppressed by the rods. We
are therefore temporarily unable to see at all. There are approximately 120 million
rods per eye which are mainly situated towards the edges of the retina. Rods there-
fore dominate peripheral vision.
Cones are the second type of receptor in the eye. They are less sensitive to light
than the rods and can therefore tolerate more light. There are three types of cone,
each sensitive to a different wavelength of light. This allows color vision. The eye has
approximately 6 million cones, mainly concentrated on the fovea, a small area of the
retina on which images are fixated.
Although the retina is mainly covered with photoreceptors there is one blind spot
where the optic nerve enters the eye. The blind spot has no rods or cones, yet our visual
system compensates for this so that in normal circumstances we are unaware of it.
The retina also has specialized nerve cells called ganglion cells. There are two types:
X-cells, which are concentrated in the fovea and are responsible for the early detec-
tion of pattern; and Y-cells which are more widely distributed in the retina and are
responsible for the early detection of movement. The distribution of these cells
means that, while we may not be able to detect changes in pattern in peripheral
vision, we can perceive movement.
Visual perception
Understanding the basic construction of the eye goes some way to explaining the
physical mechanisms of vision but visual perception is more than this. The informa-
tion received by the visual apparatus must be filtered and passed to processing ele-
ments which allow us to recognize coherent scenes, disambiguate relative distances
and differentiate color. We will consider some of the capabilities and limitations of
visual processing later, but first we will look a little more closely at how we perceive
size and depth, brightness and color, each of which is crucial to the design of effective
visual interfaces.
Figure 1.1 The human eye
Perceiving size and depth Imagine you are standing on a hilltop. Beside you on the
summit you can see rocks, sheep and a small tree. On the hillside is a farmhouse with
outbuildings and farm vehicles. Someone is on the track, walking toward the
summit. Below in the valley is a small market town.
Even in describing such a scene the notions of size and distance predominate. Our
visual system is easily able to interpret the images which it receives to take account
of these things. We can identify similar objects regardless of the fact that they appear
to us to be of vastly different sizes. In fact, we can use this information to judge
distances.
So how does the eye perceive size, depth and relative distances? To understand this
we must consider how the image appears on the retina. As we noted in the previous
section, reflected light from the object forms an upside-down image on the retina.
The size of that image is specified as a visual angle. Figure 1.2 illustrates how the
visual angle is calculated.
If we were to draw a line from the top of the object to a central point on the front
of the eye and a second line from the bottom of the object to the same point, the
visual angle of the object is the angle between these two lines. Visual angle is affected
by both the size of the object and its distance from the eye. Therefore if two objects
are at the same distance, the larger one will have the larger visual angle. Similarly,
if two objects of the same size are placed at different distances from the eye, the
DESIGN FOCUS
Getting noticed
The extensive knowledge about the human visual system can be brought to bear in practical design. For
example, our ability to read or distinguish falls off inversely as the distance from our point of focus
increases. This is due to the fact that the cones are packed more densely towards the center of our
visual field. You can see this in the following image. Fixate on the dot in the center. The letters on the
left should all be equally readable; those on the right should become progressively harder to read.
This loss of discrimination sets limits on the amount that can be seen or read without moving one’s
eyes. A user concentrating on the middle of the screen cannot be expected to read help text on the
bottom line.
However, although our ability to discriminate static text diminishes, the rods, which are concentrated
more in the outer parts of our visual field, are very sensitive to changes; hence we see movement well
at the edge of our vision. So if you want a user to see an error message at the bottom of the screen it
had better be flashing! On the other hand clever moving icons, however impressive they are, will be
distracting even when the user is not looking directly at them.
furthest one will have the smaller visual angle. The visual angle indicates how much
of the field of view is taken by the object. The visual angle measurement is given in
either degrees or minutes of arc, where 1 degree is equivalent to 60 minutes of arc,
and 1 minute of arc to 60 seconds of arc.
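The geometry described above is straightforward to compute. The following sketch derives the visual angle from an object's size and distance and converts degrees to minutes of arc; the sizes and distances used are illustrative examples, not figures from the text:

```python
import math

def visual_angle_degrees(size: float, distance: float) -> float:
    """Visual angle subtended by an object of the given size viewed
    face-on at the given distance (same units for both). Lines from
    the top and bottom of the object to the eye meet at an angle of
    2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

def minutes_of_arc(degrees: float) -> float:
    return degrees * 60  # 1 degree = 60 minutes of arc

# A 1 cm character viewed from 60 cm subtends just under 1 degree;
# doubling the viewing distance roughly halves the visual angle.
near = visual_angle_degrees(1.0, 60.0)
far = visual_angle_degrees(1.0, 120.0)
```

As the text notes, for two objects of the same size the nearer one always has the larger visual angle.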
So how does an object’s visual angle affect our perception of its size? First, if
the visual angle of an object is too small we will be unable to perceive it at all. Visual
acuity is the ability of a person to perceive fine detail. A number of measurements
have been established to test visual acuity, most of which are included in standard
eye tests. For example, a person with normal vision can detect a single line if it has a
visual angle of 0.5 seconds of arc. Spaces between lines can be detected at 30 seconds
to 1 minute of visual arc. These represent the limits of human visual acuity.
Assuming that we can perceive the object, does its visual angle affect our per-
ception of its size? Given that the visual angle of an object is reduced as it gets
further away, we might expect that we would perceive the object as smaller. In fact,
our perception of an object’s size remains constant even if its visual angle changes.
So a person’s height is perceived as constant even if they move further from you.
This is the law of size constancy, and it indicates that our perception of size relies on
factors other than the visual angle.
One of these factors is our perception of depth. If we return to the hilltop scene
there are a number of cues which we can use to determine the relative positions and
distances of the objects which we see. If objects overlap, the object which is partially
covered is perceived to be in the background, and therefore further away. Similarly,
the size and height of the object in our field of view provides a cue to its distance.
Figure 1.2 Visual angle
A third cue is familiarity: if we expect an object to be of a certain size then we can
judge its distance accordingly. This has been exploited for humour in advertising:
one advertisement for beer shows a man walking away from a bottle in the fore-
ground. As he walks, he bumps into the bottle, which is in fact a giant one in the
background!
Perceiving brightness A second aspect of visual perception is the perception of
brightness. Brightness is in fact a subjective reaction to levels of light. It is affected by
luminance which is the amount of light emitted by an object. The luminance of an
object is dependent on the amount of light falling on the object’s surface and its
reflective properties. Luminance is a physical characteristic and can be measured
using a photometer. Contrast is related to luminance: it is a function of the luminance
of an object and the luminance of its background.
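Contrast has several competing definitions; one common choice, Michelson contrast, expresses the luminance difference between an object and its background as a fraction of their sum. The formula choice here is ours, not the book's, and the luminance values are illustrative:

```python
def michelson_contrast(l_object: float, l_background: float) -> float:
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin).
    Ranges from 0 (object and background identical) to 1
    (one of the two luminances is zero)."""
    l_max = max(l_object, l_background)
    l_min = min(l_object, l_background)
    return (l_max - l_min) / (l_max + l_min)

# Dark text (5 cd/m^2) on a bright background (100 cd/m^2):
c = michelson_contrast(5.0, 100.0)  # about 0.90
```

Note that the measure is symmetric: dark text on a light background and light text on a dark background of the same two luminances give the same contrast.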
Although brightness is a subjective response, it can be described in terms of the
amount of luminance that gives a just noticeable difference in brightness. However,
the visual system itself also compensates for changes in brightness. In dim lighting,
the rods predominate. Since there are fewer rods on the fovea, objects in low
lighting can be seen less easily when fixated upon, and are more visible in peripheral
vision. In normal lighting, the cones take over.
Visual acuity increases with increased luminance. This may be an argument
for using high display luminance. However, as luminance increases, flicker also
increases. The eye perceives a light that is switched on and off rapidly as constantly
on, but if the switching rate is below about 50 Hz the light is perceived to
flicker. At high luminance, flicker can be perceived at over 50 Hz. Flicker is also
more noticeable in peripheral vision. This means that the larger the display (and
consequently the more peripheral vision that it occupies), the more it will appear
to flicker.
Perceiving color A third factor that we need to consider is perception of color.
Color is usually regarded as being made up of three components: hue, intensity and
saturation. Hue is determined by the spectral wavelength of the light. Blues have short
wavelengths, greens medium and reds long. Approximately 150 different hues can be
discriminated by the average person. Intensity is the brightness of the color, and
saturation is the amount of whiteness in the color. By varying these two, we can
perceive in the region of 7 million different colors. However, the number of colors
that can be identified by an individual without training is far fewer (in the region
of 10).
The eye perceives color because the cones are sensitive to light of different wave-
lengths. There are three different types of cone, each sensitive to a different color
(blue, green and red). Color vision is best in the fovea, and worst at the periphery
where rods predominate. It should also be noted that only 3–4% of the fovea is
occupied by cones which are sensitive to blue light, making blue acuity lower.
Finally, we should remember that around 8% of males and 1% of females suffer
from color blindness, most commonly being unable to discriminate between red and
green.
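The hue–intensity–saturation description above corresponds closely to the HSV (hue, saturation, value) model used in computer graphics, where ‘value’ plays the role of intensity and saturation falls as whiteness is mixed in. Python's standard colorsys module performs the conversion; the colors chosen are simply examples:

```python
import colorsys

# Pure red: hue 0.0 (the red end of the scale), fully saturated,
# full value.
hue, sat, val = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)

# Mixing in white (giving a pink) keeps the hue and value but lowers
# the saturation - in the book's terms, the amount of whiteness rises.
hue2, sat2, val2 = colorsys.rgb_to_hsv(1.0, 0.5, 0.5)

# Dimming the pure red instead lowers the value (intensity) while
# hue and saturation are unchanged.
hue3, sat3, val3 = colorsys.rgb_to_hsv(0.5, 0.0, 0.0)
```

Varying saturation and value independently of hue in this way is what yields the millions of distinguishable colors mentioned above.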
The capabilities and limitations of visual processing
In considering the way in which we perceive images we have already encountered
some of the capabilities and limitations of the human visual processing system.
However, we have concentrated largely on low-level perception. Visual processing
involves the transformation and interpretation of a complete image, from the light
that is thrown onto the retina. As we have already noted, our expectations affect the
way an image is perceived. For example, if we know that an object is a particular size,
we will perceive it as that size no matter how far it is from us.
Visual processing compensates for the movement of the image on the retina
which occurs as we move around and as the object which we see moves. Although
the retinal image is moving, the image that we perceive is stable. Similarly, color and
brightness of objects are perceived as constant, in spite of changes in luminance.
This ability to interpret and exploit our expectations can be used to resolve ambi-
guity. For example, consider the image shown in Figure 1.3. What do you perceive?
Now consider Figure 1.4 and Figure 1.5. The context in which the object appears
allows our expectations to clearly disambiguate the interpretation of the object, as
either a B or a 13.

Figure 1.3 An ambiguous shape?
Figure 1.4 ABC

20 Chapter 1 ■ The human
However, it can also create optical illusions. For example, consider Figure 1.6.
Which line is longer? Most people when presented with this will say that the top
line is longer than the bottom. In fact, the two lines are the same length. This may be
due to a false application of the law of size constancy: the top line appears like a
concave edge, the bottom like a convex edge. The former therefore seems further away
than the latter and so is scaled to appear larger. A similar illusion is the Ponzo
illusion (Figure 1.7). Here the top line appears longer, owing to the distance effect,
although both lines are the same length. These illusions demonstrate that our per-
ception of size is not completely reliable.
Another illusion created by our expectations compensating for what we see is the
proof-reading illusion. Read the text in Figure 1.8 quickly. What does it say? Most
people reading this rapidly will read it correctly, although closer inspection shows
that the word ‘the’ is repeated in the second and third lines.
These are just a few examples of how the visual system compensates, and some-
times overcompensates, to allow us to perceive the world around us.
Figure 1.5 12 13 14
Figure 1.6 The Muller–Lyer illusion – which line is longer?
Figure 1.7 The Ponzo illusion – are these the same size?
Figure 1.8 Is this text correct?
Reading
So far we have concentrated on the perception of images in general. However,
the perception and processing of text is a special case that is important to interface
design, which invariably requires some textual display. We will therefore end
this section by looking at reading. There are several stages in the reading process.
First, the visual pattern of the word on the page is perceived. It is then decoded
with reference to an internal representation of language. The final stages of lan-
guage processing include syntactic and semantic analysis and operate on phrases or
sentences.
We are most concerned with the first two stages of this process and how they
influence interface design. During reading, the eye makes jerky movements called
saccades followed by fixations. Perception occurs during the fixation periods, which
account for approximately 94% of the time elapsed. The eye moves backwards over
the text as well as forwards, in what are known as regressions. If the text is complex
there will be more regressions.
Adults read approximately 250 words a minute. It is unlikely that words are
scanned serially, character by character, since experiments have shown that words can
be recognized as quickly as single characters. Instead, familiar words are recognized
using word shape. This means that removing the word shape clues (for example, by
capitalizing words) is detrimental to reading speed and accuracy.
The speed at which text can be read is a measure of its legibility. Experiments have
shown that standard font sizes of 9 to 12 points are equally legible, given pro-
portional spacing between lines [346]. Similarly line lengths of between 2.3 and 5.2
inches (58 and 132 mm) are equally legible. However, there is evidence that reading
from a computer screen is slower than from a book [244]. This is thought to be
due to a number of factors including a longer line length, fewer words to a page,
orientation and the familiarity of the medium of the page. These factors can of
course be reduced by careful design of textual interfaces.

DESIGN FOCUS
Where’s the middle?

Optical illusions highlight the differences between the way things are and the way we perceive them –
and in interface design we need to be aware that we will not always perceive things exactly as they are.
The way that objects are composed together will affect the way we perceive them, and we do not
perceive geometric shapes exactly as they are drawn. For example, we tend to magnify horizontal lines
and reduce vertical ones, so a square needs to be slightly increased in height to appear square and lines
will appear thicker if horizontal rather than vertical.

Optical illusions also affect page symmetry. We tend to see the center of a page as being a little above
the actual center – so if a page is arranged symmetrically around the actual center, we will see it as too
low down. In graphic design this is known as the optical center – and bottom page margins tend to be
increased by 50% to compensate.
A final word about the use of contrast in visual display: a negative contrast (dark
characters on a light screen) provides higher luminance and, therefore, increased
acuity, than a positive contrast. This will in turn increase legibility. However, it will
also be more prone to flicker. Experimental evidence suggests that in practice negat-
ive contrast displays are preferred and result in more accurate performance [30].
1.2.2 Hearing
The sense of hearing is often considered secondary to sight, but we tend to under-
estimate the amount of information that we receive through our ears. Close your eyes
for a moment and listen. What sounds can you hear? Where are they coming from?
What is making them? As I sit at my desk I can hear cars passing on the road outside,
machinery working on a site nearby, the drone of a plane overhead and bird song.
But I can also tell where the sounds are coming from, and estimate how far away they
are. So from the sounds I hear I can tell that a car is passing on a particular road near
my house, and which direction it is traveling in. I know that building work is in
progress in a particular location, and that a certain type of bird is perched in the tree
in my garden.
The auditory system can convey a lot of information about our environment. But
how does it work?
The human ear
Just as vision begins with light, hearing begins with vibrations in the air or sound
waves. The ear receives these vibrations and transmits them, through various stages,
to the auditory nerves. The ear comprises three sections, commonly known as the
outer ear, middle ear and inner ear.
The outer ear is the visible part of the ear. It has two parts: the pinna, which is
the structure that is attached to the sides of the head, and the auditory canal, along
which sound waves are passed to the middle ear. The outer ear serves two purposes.
First, it protects the sensitive middle ear from damage. The auditory canal contains
wax which prevents dust, dirt and over-inquisitive insects reaching the middle ear.
It also maintains the middle ear at a constant temperature. Secondly, the pinna and
auditory canal serve to amplify some sounds.
The middle ear is a small cavity connected to the outer ear by the tympanic
membrane, or ear drum, and to the inner ear by the oval window. Within the cavity are the
ossicles, the smallest bones in the body. Sound waves pass along the auditory canal
and vibrate the ear drum which in turn vibrates the ossicles, which transmit the
vibrations to the cochlea, and so into the inner ear. This ‘relay’ is required because,
unlike the air-filled outer and middle ears, the inner ear is filled with a denser
cochlear liquid. If passed directly from the air to the liquid, the transmission of the
sound waves would be poor. By transmitting them via the ossicles the sound waves
are concentrated and amplified.
The waves are passed into the liquid-filled cochlea in the inner ear. Within
the cochlea are delicate hair cells or cilia that bend because of the vibrations in the
cochlear liquid and release a chemical transmitter which causes impulses in the
auditory nerve.
Processing sound
As we have seen, sound is changes or vibrations in air pressure. It has a number of
characteristics which we can differentiate. Pitch is the frequency of the sound. A low
frequency produces a low pitch; a high frequency, a high pitch. Loudness is
proportional to the amplitude of the sound; the frequency remains constant. Timbre relates
to the type of the sound: sounds may have the same pitch and loudness but be made
by different instruments and so vary in timbre. We can also identify a sound’s loca-
tion, since the two ears receive slightly different sounds, owing to the time difference
between the sound reaching the two ears and the reduction in intensity caused by the
sound waves reflecting from the head.
The human ear can hear frequencies from about 20 Hz to 15 kHz. It can distin-
guish frequency changes of less than 1.5 Hz at low frequencies but is less accurate at
high frequencies. Different frequencies trigger activity in neurons in different parts
of the auditory system, and cause different rates of firing of nerve impulses.
The auditory system performs some filtering of the sounds received, allowing us
to ignore background noise and concentrate on important information. We are
selective in our hearing, as illustrated by the cocktail party effect, where we can pick
out our name spoken across a crowded noisy room. However, if sounds are too loud,
or frequencies too similar, we are unable to differentiate sound.
As we have seen, sound can convey a remarkable amount of information. It is
rarely used to its potential in interface design, usually being confined to warning
sounds and notifications. The exception is multimedia, which may include music,
voice commentary and sound effects. However, the ear can differentiate quite subtle
sound changes and can recognize familiar sounds without concentrating attention
on the sound source. This suggests that sound could be used more extensively in
interface design, to convey information about the system state, for example. This is
discussed in more detail in Chapter 10.
Worked exercise: Suggest ideas for an interface which uses the properties of sound effectively.

Answer: You might approach this exercise by considering how sound could be added to an
application with which you are familiar. Use your imagination. This is also a good subject for
a literature survey (starting with the references in Chapter 10).
Speech sounds can obviously be used to convey information. This is useful not only for
the visually impaired but also for any application where the user’s attention has to be
divided (for example, power plant control, flight control, etc.). Uses of non-speech
sounds include the following:
■ Attention – to attract the user’s attention to a critical situation or to the end of a
process, for example.
■ Status information – continuous background sounds can be used to convey status
information. For example, monitoring the progress of a process (without the need
for visual attention).
■ Confirmation – a sound associated with an action to confirm that the action has
been carried out. For example, associating a sound with deleting a file.
■ Navigation – using changing sound to indicate where the user is in a system. For
example, what about sound to support navigation in hypertext?
1.2.3 Touch
The third and last of the senses that we will consider is touch or haptic perception.
Although this sense is often viewed as less important than sight or hearing, imagine
life without it. Touch provides us with vital information about our environment.
It tells us when we touch something hot or cold, and can therefore act as a warning. It
also provides us with feedback when we attempt to lift an object, for example. Con-
sider the act of picking up a glass of water. If we could only see the glass and not
feel when our hand made contact with it or feel its shape, the speed and accuracy of
the action would be reduced. This is the experience of users of certain virtual reality
games: they can see the computer-generated objects which they need to manipulate
but they have no physical sensation of touching them. Watching such users can be
an informative and amusing experience! Touch is therefore an important means of
feedback, and this is no less so in using computer systems. Feeling buttons depress is
an important part of the task of pressing the button. Also, we should be aware that,
although for the average person, haptic perception is a secondary source of informa-
tion, for those whose other senses are impaired, it may be vitally important. For such
users, interfaces such as braille may be the primary source of information in the
interaction. We should not therefore underestimate the importance of touch.
The apparatus of touch differs from that of sight and hearing in that it is not local-
ized. We receive stimuli through the skin. The skin contains three types of sensory
receptor: thermoreceptors respond to heat and cold, nociceptors respond to intense
pressure, heat and pain, and mechanoreceptors respond to pressure. It is the last of
these that we are concerned with in relation to human–computer interaction.
There are two kinds of mechanoreceptor, which respond to different types of
pressure. Rapidly adapting mechanoreceptors respond to immediate pressure as the
skin is indented. These receptors also react more quickly with increased pressure.
However, they stop responding if continuous pressure is applied. Slowly adapting
mechanoreceptors respond to continuously applied pressure.
Although the whole of the body contains such receptors, some areas have greater
sensitivity or acuity than others. It is possible to measure the acuity of different areas
of the body using the two-point threshold test. Take two pencils, held so their tips are
about 12 mm apart. Touch the points to your thumb and see if you can feel two
points. If you cannot, move the points a little further apart. When you can feel two
points, measure the distance between them. The greater the distance, the lower the
sensitivity. You can repeat this test on different parts of your body. You should find
26 Chapter 1 n The human
that the measure on the forearm is around 10 times that of the finger or thumb. The
fingers and thumbs have the highest acuity.
A second aspect of haptic perception is kinesthesis: awareness of the position of
the body and limbs. This is due to receptors in the joints. Again there are three
types: rapidly adapting, which respond when a limb is moved in a particular direc-
tion; slowly adapting, which respond to both movement and static position; and
positional receptors, which only respond when a limb is in a static position. This
perception affects both comfort and performance. For example, for a touch typist,
awareness of the relative positions of the fingers and feedback from the keyboard are
very important.
1.2.4 Movement
Before leaving this section on the human’s input–output channels, we need to
consider motor control and how the way we move affects our interaction with com-
puters. A simple action such as hitting a button in response to a question involves
a number of processing stages. The stimulus (of the question) is received through
the sensory receptors and transmitted to the brain. The question is processed and a
valid response generated. The brain then tells the appropriate muscles to respond.
Each of these stages takes time, which can be roughly divided into reaction time and
movement time.
Movement time is dependent largely on the physical characteristics of the subjects:
their age and fitness, for example. Reaction time varies according to the sensory
channel through which the stimulus is received. A person can react to an auditory
signal in approximately 150 ms, to a visual signal in 200 ms and to pain in 700 ms.
However, a combined signal will result in the quickest response. Factors such as skill
or practice can reduce reaction time, and fatigue can increase it.

Handling the goods

E-commerce has become very successful in some areas of sales, such as travel services,
books and CDs, and food. However, in some retail areas, such as clothes shopping, e-commerce
has been less successful. Why?

When buying train and airline tickets and, to some extent, books and food, the experience of
shopping is less important than the convenience. So, as long as we know what we want, we are happy
to shop online. With clothes, the experience of shopping is far more important. We need to be
able to handle the goods, feel the texture of the material, check the weight to test quality. Even if
we know that something will fit us we still want to be able to handle it before buying.

Research into haptic interaction (see Chapter 2 and Chapter 10) is looking at ways of solving this
problem. By using special force feedback and tactile hardware, users are able to feel surfaces
and shape. For example, a demonstration environment called TouchCity allows people to walk
around a virtual shopping mall, pick up products and feel their texture and weight. A key problem
with the commercial use of such an application, however, is that the haptic experience requires
expensive hardware not yet available to the average e-shopper. However, in future, such immersive
e-commerce experiences are likely to be the norm. (See www.novint.com/)
A second measure of motor skill is accuracy. One question that we should ask is
whether speed of reaction results in reduced accuracy. This is dependent on the task
and the user. In some cases, demanding faster reactions reduces accuracy. This
is the premise behind many arcade and video games where less skilled users fail at
levels of play that require faster responses. However, for skilled operators this is not
necessarily the case. Studies of keyboard operators have shown that, although the
faster operators were up to twice as fast as the others, the slower ones made 10 times
the errors.
Speed and accuracy of movement are important considerations in the design
of interactive systems, primarily in terms of the time taken to move to a particular
target on a screen. The target may be a button, a menu item or an icon, for example.
The time taken to hit a target is a function of the size of the target and the distance
that has to be moved. This is formalized in Fitts’ law [135]. There are many vari-
ations of this formula, which have varying constants, but they are all very similar.
One common form is
Movement time = a + b log2(distance/size + 1)
where a and b are empirically determined constants.
This affects the type of target we design. Since users will find it more difficult
to manipulate small objects, targets should generally be as large as possible and
the distance to be moved as small as possible. This has led to suggestions that pie-
chart-shaped menus are preferable to lists since all options are equidistant. However,
the trade-off is increased use of screen estate, so the choice may not be so simple.
If lists are used, the most frequently used options can be placed closest to the user’s
start point (for example, at the top of the menu). The implications of Fitts’ law in
design are discussed in more detail in Chapter 12.
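As an illustrative sketch (not taken from the book), the form of Fitts’ law given above can be turned into a small calculation. The constants a and b here are arbitrary example values: real values are device- and user-dependent and must be determined empirically.

```python
import math

def movement_time(distance, size, a=0.1, b=0.1):
    """Predicted time (in seconds) to hit a target, using the common
    Fitts' law form MT = a + b * log2(distance/size + 1).
    The constants a and b must be found by experiment for a given
    device and user population; the defaults here are illustrative only."""
    return a + b * math.log2(distance / size + 1)

# A large, nearby target is predicted to be quicker to hit than a
# small, distant one:
small_far = movement_time(distance=400, size=10)
large_near = movement_time(distance=200, size=40)
print(f"small/far: {small_far:.2f}s  large/near: {large_near:.2f}s")
```

The sketch makes the design trade-off concrete: doubling target size or halving distance reduces the predicted movement time by the same amount, since only the ratio distance/size matters.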
1.3 HUMAN MEMORY
Have you ever played the memory game? The idea is that each player has to recount
a list of objects and add one more to the end. There are many variations but the
objects are all loosely related: ‘I went to the market and bought a lemon, some
oranges, bacon. . .’ or ‘I went to the zoo and saw monkeys, and lions, and tigers . . .’
and so on. As the list grows objects are missed out or recalled in the wrong order and
so people are eliminated from the game. The winner is the person remaining at the
end. Such games rely on our ability to store and retrieve information, even seemingly
arbitrary items. This is the job of our memory system.
Indeed, much of our everyday activity relies on memory. As well as storing all our
factual knowledge, our memory contains our knowledge of actions or procedures.
It allows us to repeat actions, to use language, and to use new information received
via our senses. It also gives us our sense of identity, by preserving information from
our past experiences.
But how does our memory work? How do we remember arbitrary lists such as
those generated in the memory game? Why do some people remember more easily
than others? And what happens when we forget?
In order to answer questions such as these, we need to understand some of the
capabilities and limitations of human memory. Memory is the second part of our
model of the human as an information-processing system. However, as we noted
earlier, such a division is simplistic since, as we shall see, memory is associated with
each level of processing. Bearing this in mind, we will consider the way in which
memory is structured and the activities that take place within the system.
It is generally agreed that there are three types of memory or memory function:
sensory buffers, short-term memory or working memory, and long-term memory. There
is some disagreement as to whether these are three separate systems or different
functions of the same system. We will not concern ourselves here with the details
of this debate, which is discussed in detail by Baddeley [21], but will indicate the
evidence used by both sides as we go along. For our purposes, it is sufficient to note
three separate types of memory. These memories interact, with information being
processed and passed between memory stores, as shown in Figure 1.9.
1.3.1 Sensory memory
The sensory memories act as buffers for stimuli received through the senses. A
sensory memory exists for each sensory channel: iconic memory for visual stimuli,
echoic memory for aural stimuli and haptic memory for touch. These memories are
constantly overwritten by new information coming in on these channels.
We can demonstrate the existence of iconic memory by moving a finger in front
of the eye. Can you see it in more than one place at once? This indicates a persistence
of the image after the stimulus has been removed. A similar effect is noticed most
vividly at firework displays where moving sparklers leave a persistent image.
Information remains in iconic memory very briefly, in the order of 0.5 seconds.
Similarly, the existence of echoic memory is evidenced by our ability to ascertain
the direction from which a sound originates. This is due to information being
received by both ears. However, since this information is received at different times,
we must store the stimulus in the meantime. Echoic memory allows brief ‘play-back’
of information. Have you ever had someone ask you a question when you are
reading? You ask them to repeat the question, only to realize that you know what was
asked after all. This experience, too, is evidence of the existence of echoic memory.

Figure 1.9 A model of the structure of memory
Information is passed from sensory memory into short-term memory by atten-
tion, thereby filtering the stimuli to only those which are of interest at a given time.
Attention is the concentration of the mind on one out of a number of competing
stimuli or thoughts. It is clear that we are able to focus our attention selectively,
choosing to attend to one thing rather than another. This is due to the limited capa-
city of our sensory and mental processes. If we did not selectively attend to the
stimuli coming into our senses, we would be overloaded. We can choose which stimuli
to attend to, and this choice is governed to an extent by our arousal, our level of
interest or need. This explains the cocktail party phenomenon mentioned earlier:
we can attend to one conversation over the background noise, but we may choose
to switch our attention to a conversation across the room if we hear our name
mentioned. Information received by sensory memories is quickly passed into a more
permanent memory store, or overwritten and lost.
1.3.2 Short-term memory
Short-term memory or working memory acts as a ‘scratch-pad’ for temporary recall
of information. It is used to store information which is only required fleetingly. For
example, calculate the multiplication 35 × 6 in your head. The chances are that you
will have done this calculation in stages, perhaps 5 × 6 and then 30 × 6 and added
the results; or you may have used the fact that 6 = 2 × 3 and calculated 2 × 35 = 70
followed by 3 × 70. To perform calculations such as this we need to store the
intermediate stages for use later. Or consider reading. In order to comprehend this
sentence you need to hold in your mind the beginning of the sentence as you read
the rest. Both of these tasks use short-term memory.
Short-term memory can be accessed rapidly, in the order of 70 ms. However, it
also decays rapidly, meaning that information can only be held there temporarily, in
the order of 200 ms.
Short-term memory also has a limited capacity. There are two basic methods for
measuring memory capacity. The first involves determining the length of a sequence
which can be remembered in order. The second allows items to be freely recalled in
any order. Using the first measure, the average person can remember 7 ± 2 digits.
This was established in experiments by Miller [234]. Try it. Look at the following
number sequence:
265397620853
Now write down as much of the sequence as you can remember. Did you get it all
right? If not, how many digits could you remember? If you remembered between five
and nine digits your digit span is average.
Now try the following sequence:
44 113 245 8920
Did you recall that more easily? Here the digits are grouped or chunked. A
generalization of the 7 ± 2 rule is that we can remember 7 ± 2 chunks of information.
Therefore chunking information can increase the short-term memory capacity. The
limited capacity of short-term memory produces a subconscious desire to create
chunks, and so optimize the use of the memory. The successful formation of a chunk
is known as closure. This process can be generalized to account for the desire to com-
plete or close tasks held in short-term memory. If a subject fails to do this or is pre-
vented from doing so by interference, the subject is liable to lose track of what she is
doing and make consequent errors.
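The effect of chunking described above can be mimicked in a short sketch (a hypothetical illustration, not from the book): code that presents digits to people can group them so that a reader holds a few chunks rather than many individual items. The `chunk` helper below is an assumed name, not a standard function.

```python
def chunk(digits, size=3):
    """Split a digit string into groups of `size` characters, so that a
    reader holds a few chunks in short-term memory rather than many
    individual digits."""
    return " ".join(digits[i:i + size] for i in range(0, len(digits), size))

# The twelve-digit sequence from the text, presented as four chunks:
print(chunk("265397620853"))  # 265 397 620 853
```

The same idea underlies the conventional grouping of telephone numbers, credit card numbers and postcodes in user interfaces.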
DESIGN FOCUS
Cashing in
Closure gives us a nice ‘done it’ feeling when we complete some part of a task. At this point our minds have
a tendency to flush short-term memory in order to get on with the next job. Early automatic teller
machines (ATMs) gave the customer money before returning their bank card. On receiving the money
the customer would reach closure and hence often forget to take the card. Modern ATMs return the
card first!
The sequence of chunks given above also makes use of pattern abstraction: it is
written in the form of a UK telephone number which makes it easier to remember.
We may even recognize the first sets of digits as the international code for the UK
and the dialing code for Leeds – chunks of information. Patterns can be useful as aids
to memory. For example, most people would have difficulty remembering the fol-
lowing sequence of chunks:
HEC ATR ANU PTH ETR EET
However, if you notice that by moving the last character to the first position, you get
the statement ‘the cat ran up the tree’, the sequence is easy to recall.
In experiments where subjects were able to recall words freely, evidence shows that
recall of the last words presented is better than recall of those in the middle [296].
This is known as the recency effect. However, if the subject is asked to perform
another task between presentation and recall (for example, counting backwards) the
recency effect is eliminated. The recall of the other words is unaffected. This suggests
that short-term memory recall is damaged by interference of other information.
However, the fact that this interference does not affect recall of earlier items provides
some evidence for the existence of separate long-term and short-term memories. The
early items are held in a long-term store which is unaffected by the recency effect.
Interference does not necessarily impair recall in short-term memory. Baddeley asked
subjects to remember six-digit numbers and attend to sentence processing at the same
time [21]. They were asked to answer questions on sentences, such as ‘A precedes B:
AB is true or false?’. Surprisingly, this did not result in interference, suggesting that
in fact short-term memory is not a unitary system but is made up of a number of
components, including a visual channel and an articulatory channel. The task of sen-
tence processing used the visual channel, while the task of remembering digits used
the articulatory channel, so interference only occurs if tasks utilize the same channel.
These findings led Baddeley to propose a model of working memory that incorp-
orated a number of elements together with a central processing executive. This is
illustrated in Figure 1.10.
Figure 1.10 A more detailed model of short-term memory
1.3.3 Long-term memory
If short-term memory is our working memory or ‘scratch-pad’, long-term memory
is our main resource. Here we store factual information, experiential knowledge,
procedural rules of behavior – in fact, everything that we ‘know’. It differs from
short-term memory in a number of significant ways. First, it has a huge, if not unlim-
ited, capacity. Secondly, it has a relatively slow access time of approximately a tenth
of a second. Thirdly, forgetting occurs more slowly in long-term memory, if at all.
These distinctions provide further evidence of a memory structure with several parts.
Long-term memory is intended for the long-term storage of information.
Information is placed there from working memory through rehearsal. Unlike work-
ing memory there is little decay: long-term recall after minutes is the same as that
after hours or days.
Long-term memory structure
There are two types of long-term memory: episodic memory and semantic memory.
Episodic memory represents our memory of events and experiences in a serial form.
It is from this memory that we can reconstruct the actual events that took place at a
given point in our lives. Semantic memory, on the other hand, is a structured record
of facts, concepts and skills that we have acquired. The information in semantic
memory is derived from that in our episodic memory, such that we can learn new
facts or concepts from our experiences.
Semantic memory is structured in some way to allow access to information,
representation of relationships between pieces of information, and inference. One
model for the way in which semantic memory is structured is as a network. Items are
associated to each other in classes, and may inherit attributes from parent classes.
This model is known as a semantic network. As an example, our knowledge about
dogs may be stored in a network such as that shown in Figure 1.11.

DESIGN FOCUS
7 ± 2 revisited

When we looked at short-term memory, we noted the general rule that people can hold 7 ± 2 items
or chunks of information in short-term memory. It is a principle that people tend to remember but it
can be misapplied. For example, it is often suggested that this means that lists, menus and other groups
of items should be designed to be no more than 7 items long. But use of menus and lists of course has
little to do with short-term memory – they are available in the environment as cues and so do not need
to be remembered.

On the other hand, the 7 ± 2 rule would apply in command line interfaces. Imagine a scenario where a
UNIX user looks up a command in the manual. Perhaps the command has a number of parameters or
options, to be applied in a particular order, and it is going to be applied to several files that have long
path names. The user then has to hold the command, its parameters and the file path names in short-
term memory while he types them in. Here we could say that the task may cause problems if the
number of items or chunks in the command line string is more than 7.
Specific breed attributes may be stored with each given breed, yet general dog
information is stored at a higher level. This allows us to generalize about specific
cases. For instance, we may not have been told that the sheepdog Shadow has four
legs and a tail, but we can infer this information from our general knowledge about
sheepdogs and dogs in general. Note also that there are connections within the net-
work which link into other domains of knowledge, for example cartoon characters.
This illustrates how our knowledge is organized by association.
The viability of semantic networks as a model of memory organization has been
demonstrated by Collins and Quillian [74]. Subjects were asked questions about
different properties of related objects and their reaction times were measured. The
types of question asked (taking examples from our own network) were ‘Can a collie
breathe?’, ‘Is a beagle a hound?’ and ‘Does a hound track?’ In spite of the fact that the
answers to such questions may seem obvious, subjects took longer to answer ques-
tions such as ‘Can a collie breathe?’ than ones such as ‘Does a hound track?’ The
reason for this, it is suggested, is that in the former case subjects had to search fur-
ther through the memory hierarchy to find the answer, since information is stored
at its most abstract level.
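The network model and the Collins and Quillian result can be sketched together in a few lines of Python. The network fragment below is loosely based on Figure 1.11, but the exact nodes and properties are illustrative assumptions, and the hop count simply stands in for the longer search times observed in the experiment:

```python
# A sketch of a semantic network with property inheritance. Properties
# are stored at the most abstract level that applies; a lookup climbs
# the "is_a" links until it finds the property.

NETWORK = {
    "animal":   {"is_a": None,       "props": {"breathes": True}},
    "dog":      {"is_a": "animal",   "props": {"legs": 4, "has_tail": True}},
    "hound":    {"is_a": "dog",      "props": {"tracks": True}},
    "beagle":   {"is_a": "hound",    "props": {"size": "small"}},
    "sheepdog": {"is_a": "dog",      "props": {"herds": True}},
    "collie":   {"is_a": "sheepdog", "props": {}},
}

def lookup(node, prop):
    """Return (value, hops): the hop count models the extra search
    needed for questions like 'Can a collie breathe?'."""
    hops = 0
    while node is not None:
        if prop in NETWORK[node]["props"]:
            return NETWORK[node]["props"][prop], hops
        node = NETWORK[node]["is_a"]
        hops += 1
    return None, hops

print(lookup("hound", "tracks"))     # answered locally: (True, 0)
print(lookup("collie", "breathes"))  # three levels up: (True, 3)
```

Inference about Shadow the sheepdog works the same way: four legs and a tail are found at the 'dog' node rather than being stored with every breed.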
A number of other memory structures have been proposed to explain how we
represent and store different types of knowledge. Each of these represents a different
Figure 1.11 Long-term memory may store information in a semantic network
34 Chapter 1 The human
aspect of knowledge and, as such, the models can be viewed as complementary rather
than mutually exclusive. Semantic networks represent the associations and relation-
ships between single items in memory. However, they do not allow us to model the
representation of more complex objects or events, which are perhaps composed of
a number of items or activities. Structured representations such as frames and scripts
organize information into data structures. Slots in these structures allow attribute
values to be added. Frame slots may contain default, fixed or variable information.
A frame is instantiated when the slots are filled with appropriate values. Frames
and scripts can be linked together in networks to represent hierarchical structured
knowledge.
Returning to the ‘dog’ domain, a frame-based representation of the knowledge
may look something like Figure 1.12. The fixed slots are those for which the attribute
value is set, default slots represent the usual attribute value, although this may be
overridden in particular instantiations (for example, the Basenji does not bark), and
variable slots can be filled with particular values in a given instance. Slots can also
contain procedural knowledge. Actions or operations can be associated with a slot
and performed, for example, whenever the value of the slot is changed.
Frames extend semantic nets to include structured, hierarchical information. They
represent knowledge items in a way which makes explicit the relative importance of
each piece of information.
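A frame with fixed, default and variable slots can be sketched as follows. The slot names and the use of plain dictionaries are illustrative assumptions, not the notation of Figure 1.12:

```python
# A sketch of a generic DOG frame: fixed slots are always true, default
# slots may be overridden in an instance (the Basenji does not bark),
# and variable slots are filled when the frame is instantiated.

DOG_FRAME = {
    "fixed":    {"legs": 4, "mammal": True},
    "default":  {"barks": True, "diet": "carnivorous"},
    "variable": ["name", "breed", "size"],
}

def instantiate(frame, overrides=None, **variables):
    """Fill a frame: copy fixed slots, apply defaults (possibly
    overridden), then fill the variable slots with supplied values."""
    instance = dict(frame["fixed"])
    instance.update(frame["default"])
    instance.update(overrides or {})
    for slot in frame["variable"]:
        instance[slot] = variables.get(slot)
    return instance

shadow = instantiate(DOG_FRAME, name="Shadow", breed="sheepdog", size="medium")
basenji = instantiate(DOG_FRAME, overrides={"barks": False},
                      name="Bongo", breed="basenji", size="small")
print(shadow["barks"], basenji["barks"])  # True False
```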
Scripts attempt to model the representation of stereotypical knowledge about situ-
ations. Consider the following sentence:
John took his dog to the surgery. After seeing the vet, he left.
From our knowledge of the activities of dog owners and vets, we may fill in a
substantial amount of detail. The animal was ill. The vet examined and treated the
animal. John paid for the treatment before leaving. We are less likely to assume the
alternative reading of the sentence, that John took an instant dislike to the vet on
sight and did not stay long enough to talk to him!
Figure 1.12 A frame-based representation of knowledge
A script represents this default or stereotypical information, allowing us to inter-
pret partial descriptions or cues fully. A script comprises a number of elements,
which, like slots, can be filled with appropriate information:
Entry conditions Conditions that must be satisfied for the script to be activated.
Result Conditions that will be true after the script is terminated.
Props Objects involved in the events described in the script.
Roles Actions performed by particular participants.
Scenes The sequences of events that occur.
Tracks A variation on the general pattern representing an alternative scenario.
An example script for going to the vet is shown in Figure 1.13.
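The six script elements listed above can be captured in a small data structure. The field contents below are illustrative guesses at a vet-visit script, not the contents of Figure 1.13:

```python
# A sketch of a script as a structured record: each element listed in
# the text becomes a field that can be filled with information.

from dataclasses import dataclass, field

@dataclass
class Script:
    entry_conditions: list   # must hold for the script to be activated
    result: list             # true after the script terminates
    props: list              # objects involved in the events
    roles: list              # actions performed by particular participants
    scenes: list             # the sequences of events that occur
    tracks: dict = field(default_factory=dict)  # alternative scenarios

vet_script = Script(
    entry_conditions=["dog is ill", "surgery is open"],
    result=["dog is better", "owner has paid"],
    props=["examination table", "medicine"],
    roles=["vet examines dog", "owner pays"],
    scenes=["arrive at surgery", "wait", "examination", "pay and leave"],
    tracks={"emergency": ["arrive", "immediate examination"]},
)
```

Reading the sentence about John then amounts to activating this script and filling in the unstated scenes (examination, payment) by default.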
A final type of knowledge representation which we hold in memory is the repre-
sentation of procedural knowledge, our knowledge of how to do something. A com-
mon model for this is the production system. Condition–action rules are stored
in long-term memory. Information coming into short-term memory can match a
condition in one of these rules and result in the action being executed. For example,
a pair of production rules might be
IF dog is wagging tail
THEN pat dog
IF dog is growling
THEN run away
If we then meet a growling dog, the condition in the second rule is matched, and we
respond by turning tail and running. (Not to be recommended by the way!)
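A minimal production system along these lines can be sketched directly: condition–action rules fire whenever their condition matches the current contents of working (short-term) memory.

```python
# Condition-action rules held in "long-term memory"; incoming facts in
# short-term memory that match a condition trigger the paired action.

RULES = [
    ("dog is wagging tail", "pat dog"),
    ("dog is growling", "run away"),
]

def react(working_memory):
    """Return the actions of every rule whose condition is present."""
    return [action for condition, action in RULES
            if condition in working_memory]

print(react({"dog is growling"}))      # ['run away']
print(react({"dog is wagging tail"}))  # ['pat dog']
```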
Figure 1.13 A script for visiting the vet
Long-term memory processes
So much for the structure of memory, but what about the processes which it uses?
There are three main activities related to long-term memory: storage or remember-
ing of information, forgetting and information retrieval. We shall consider each of
these in turn.
First, how does information get into long-term memory and how can we improve
this process? Information from short-term memory is stored in long-term memory by
rehearsal. The repeated exposure to a stimulus or the rehearsal of a piece of informa-
tion transfers it into long-term memory.
This process can be optimized in a number of ways. Ebbinghaus performed
numerous experiments on memory, using himself as a subject [117]. In these experi-
ments he tested his ability to learn and repeat nonsense syllables, comparing his
recall minutes, hours and days after the learning process. He discovered that the
amount learned was directly proportional to the amount of time spent learning.
This is known as the total time hypothesis. However, experiments by Baddeley and
others suggest that learning time is most effective if it is distributed over time [22].
For example, in an experiment in which Post Office workers were taught to type,
those whose training period was divided into weekly sessions of one hour performed
better than those who spent two or four hours a week learning (although the former
obviously took more weeks to complete their training). This is known as the distribu-
tion of practice effect.
However, repetition is not enough to learn information well. If information is
not meaningful it is more difficult to remember. This is illustrated by the fact that
it is more difficult to remember a set of words representing concepts than a set of
words representing objects. Try it. First try to remember the words in list A and test
yourself.
List A: Faith Age Cold Tenet Quiet Logic Idea Value Past Large
Now try list B.
List B: Boat Tree Cat Child Rug Plate Church Gun Flame Head
The second list was probably easier to remember than the first since you could
visualize the objects in the second list.
Sentences are easier still to memorize. Bartlett performed experiments on remembering meaningful information (as opposed to the meaningless material Ebbinghaus
used) [28]. In one such experiment he got subjects to learn a story about an unfamiliar culture and then retell it. He found that subjects would retell the story
replacing unfamiliar words and concepts with words which were meaningful to
them. Stories were effectively translated into the subject’s own culture. This is related
to the semantic structuring of long-term memory: if information is meaningful and
familiar, it can be related to existing structures and more easily incorporated into
memory.
So if structure, familiarity and concreteness help us in learning information, what
causes us to lose this information, to forget? There are two main theories of forget-
ting: decay and interference. The first theory suggests that the information held in
long-term memory may eventually be forgotten. Ebbinghaus concluded from his
experiments with nonsense syllables that information in memory decayed logarith-
mically, that is that it was lost rapidly to begin with, and then more slowly. Jost’s law,
which follows from this, states that if two memory traces are equally strong at a given
time the older one will be more durable.
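Jost's law can be illustrated with a toy decay model. The hyperbolic decay function below is purely an assumption for illustration (the text says only that loss is rapid at first, then slower); under any such decelerating decay, two traces that are equally strong now diverge in favour of the older one.

```python
# Toy illustration of Jost's law: trace strength falls off quickly at
# first and more slowly later. An older trace that has already survived
# its steep early decay now loses strength more slowly than a new one.

def strength(initial, age):
    """Hyperbolic decay of a memory trace (illustrative assumption)."""
    return initial / (1.0 + age)

now = 10
old = strength(11.0, now - 0)   # trace formed at t = 0
new = strength(2.0, now - 9)    # trace formed at t = 9
assert abs(old - new) < 1e-9    # equally strong now...

later = 20
print(strength(11.0, later - 0) > strength(2.0, later - 9))  # True: older is more durable
```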
The second theory is that information is lost from memory through interference.
If we acquire new information it causes the loss of old information. This is termed
retroactive interference. A common example of this is the fact that if you change telephone numbers, learning your new number makes it more difficult to remember
your old number. This is because the new association masks the old. However, some-
times the old memory trace breaks through and interferes with new information.
This is called proactive inhibition. An example of this is when you find yourself driv-
ing to your old house rather than your new one.
Forgetting is also affected by emotional factors. In experiments, subjects given
emotive words and non-emotive words found the former harder to remember in
the short term but easier in the long term. Indeed, this observation tallies with our
experience of selective memory. We tend to remember positive information rather
than negative (hence nostalgia for the ‘good old days’), and highly emotive events
rather than mundane.
Memorable or secure?
As online activities become more widespread, people are having to remember more and
more access information, such as passwords and security checks. The average active internet user
may have separate passwords and user names for several email accounts, mailing lists, e-shopping
sites, e-banking, online auctions and more! Remembering these passwords is not easy.
From a security perspective it is important that passwords are random. Words and names are very
easy to crack, hence the recommendation that passwords are frequently changed and constructed
from random strings of letters and numbers. But in reality these are the hardest things for people to
commit to memory. Hence many people will use the same password for all their online activities
(rarely if ever changing it) and will choose a word or a name that is easy for them to remember,
in spite of the obviously increased security risks. Security here is in conflict with memorability!
A solution to this is to construct a nonsense password out of letters or numbers that will have
meaning to you but will not make up a word in a dictionary (e.g. initials of names, numbers from
significant dates or postcodes, and so on). Then what is remembered is the meaningful rule for
constructing the password, and not a meaningless string of alphanumeric characters.
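As a concrete example of such a rule, one might take the initial letter of each word in a memorable phrase and append digits from a significant date. The rule below is purely illustrative, not a recommendation from the text:

```python
# A rule-based password: what is remembered is the meaningful rule
# (phrase + year), not the meaningless string it produces.

def password_from_rule(phrase, year):
    """Initial letters of each word, plus the last two digits of a year."""
    initials = "".join(word[0] for word in phrase.split())
    return initials + str(year % 100)

print(password_from_rule("my first dog was a beagle", 1987))  # 'mfdwab87'
```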
It is debatable whether we ever actually forget anything or whether it just becomes
increasingly difficult to access certain items from memory. This question is in some
ways moot since it is impossible to prove that we do forget: appearing to have for-
gotten something may just be caused by not being able to retrieve it! However, there
is evidence to suggest that we may not lose information completely from long-term
memory. First, proactive inhibition demonstrates the recovery of old information
even after it has been ‘lost’ by interference. Secondly, there is the ‘tip of the tongue’
experience, which indicates that some information is present but cannot be satisfac-
torily accessed. Thirdly, information may not be recalled but may be recognized, or
may be recalled only with prompting.
This leads us to the third process of memory: information retrieval. Here we need
to distinguish between two types of information retrieval, recall and recognition. In
recall the information is reproduced from memory. In recognition, the presentation
of the information provides the knowledge that the information has been seen
before. Recognition is the less complex cognitive activity since the information is
provided as a cue.
However, recall can be assisted by the provision of retrieval cues, which enable
the subject quickly to access the information in memory. One such cue is the use of
categories. In an experiment subjects were asked to recall lists of words, some of
which were organized into categories and some of which were randomly organized.
The words that were related to a category were easier to recall than the others [38].
Recall is even more successful if subjects are allowed to categorize their own lists of
words during learning. For example, consider the following list of words:
child red plane dog friend blood cold tree big angry
Now make up a story that links the words using as vivid imagery as possible. Now try
to recall as many of the words as you can. Did you find this easier than the previous
experiment where the words were unrelated?
The use of vivid imagery is a common cue to help people remember information.
It is known that people often visualize a scene that is described to them. They can
then answer questions based on their visualization. Indeed, subjects given a descrip-
tion of a scene often embellish it with additional information. Consider the follow-
ing description and imagine the scene:
The engines roared above the noise of the crowd. Even in the blistering heat people
rose to their feet and waved their hands in excitement. The flag fell and they were off.
Within seconds the car had pulled away from the pack and was careering round the
bend at a desperate pace. Its wheels momentarily left the ground as it cornered.
Coming down the straight the sun glinted on its shimmering paint. The driver gripped
the wheel with fierce concentration. Sweat lay in fine drops on his brow.
Without looking back to the passage, what color is the car?
If you could answer that question you have visualized the scene, including the
car’s color. In fact, the color of the car is not mentioned in the description
at all.
1.4 Thinking: reasoning and problem solving 39
1.4 THINKING: REASONING AND PROBLEM SOLVING
We have considered how information finds its way into and out of the human
system and how it is stored. Finally, we come to look at how it is processed and
manipulated. This is perhaps the area which is most complex and which separates
Improve your memory
Many people can perform astonishing feats of memory: recalling the sequence of cards in a
pack (or multiple packs – up to six have been reported), or recounting π to 1000 decimal places,
for example. There are also adverts to ‘Improve Your Memory’ (usually leading to success, or
wealth, or other such inducement), and so the question arises: can you improve your memory
abilities? The answer is yes; this exercise shows you one technique.
Look at the list below of numbers and associated words:
1 bun      6 sticks
2 shoe     7 heaven
3 tree     8 gate
4 door     9 wine
5 hive    10 hen
Notice that the words sound similar to the numbers. Now think about the words one at a time
and visualize them, in as much detail as possible. For example, for ‘1’, think of a large, sticky iced
bun, the base spiralling round and round, with raisins in it, covered in sweet, white, gooey icing.
Now do the rest, using as much visualization as you can muster: imagine how things would look,
smell, taste, sound, and so on.
This is your reference list, and you need to know it off by heart.
Having learnt it, look at a pile of at least a dozen odd items collected together by a colleague. The
task is to look at the collection of objects for only 30 seconds, and then list as many as possible
without making a mistake or viewing the collection again. Most people can manage between five
and eight items, if they do not know any memory-enhancing techniques like the following.
Mentally pick one (say, for example, a paper clip), and call it number one. Now visualize it inter-
acting with the bun. It can get stuck into the icing on the top of the bun, and make your fingers all
gooey and sticky when you try to remove it. If you ate the bun without noticing, you’d get a
crunched tooth when you bit into it – imagine how that would feel. When you’ve really got a
graphic scenario developed, move on to the next item, call it number two, and again visualize it
interacting with the reference item, shoe. Continue down your list, until you have done 10 things.
This should take you about the 30 seconds allowed. Then hide the collection and try and recall the
numbers in order, the associated reference word, and then the image associated with that word.
You should find that you can recall the 10 associated items practically every time. The technique
can be easily extended by extending your reference list.
humans from other information-processing systems, both artificial and natural.
Although it is clear that animals receive and store information, there is little evid-
ence to suggest that they can use it in quite the same way as humans. Similarly,
artificial intelligence has produced machines which can see (albeit in a limited way)
and store information. But their ability to use that information is limited to small
domains.
Humans, on the other hand, are able to use information to reason and solve
problems, and indeed do so even when the information is partial or unavailable. Human thought is conscious and self-aware: while we may not always be
able to identify the processes we use, we can identify the products of these processes,
our thoughts. In addition, we are able to think about things of which we have
no experience, and solve problems which we have never seen before. How is this
done?
Thinking can require different amounts of knowledge. Some thinking activities
are very directed and the knowledge required is constrained. Others require vast
amounts of knowledge from different domains. For example, performing a subtrac-
tion calculation requires a relatively small amount of knowledge, from a constrained
domain, whereas understanding newspaper headlines demands knowledge of pol-
itics, social structures, public figures and world events.
In this section we will consider two categories of thinking: reasoning and problem
solving. In practice these are not distinct since the activity of solving a problem may
well involve reasoning and vice versa. However, the distinction is a common one and
is helpful in clarifying the processes involved.
1.4.1 Reasoning
Reasoning is the process by which we use the knowledge we have to draw conclusions
or infer something new about the domain of interest. There are a number of different types of reasoning: deductive, inductive and abductive. We use each of these types
of reasoning in everyday life, but they differ in significant ways.
Deductive reasoning
Deductive reasoning derives the logically necessary conclusion from the given pre-
mises. For example,
If it is Friday then she will go to work
It is Friday
Therefore she will go to work.
It is important to note that this is the logical conclusion from the premises; it does
not necessarily have to correspond to our notion of truth. So, for example,
If it is raining then the ground is dry
It is raining
Therefore the ground is dry.
is a perfectly valid deduction, even though it conflicts with our knowledge of what is
true in the world.
Deductive reasoning is therefore often misapplied. Given the premises
Some people are babies
Some babies cry
many people will infer that ‘Some people cry’. This is in fact an invalid deduction
since we are not told that all babies are people. It is therefore logically possible that
the babies who cry are those who are not people.
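The invalidity of this inference can be checked by brute force: it is enough to exhibit one 'world' in which both premises are true but the conclusion is false. A sketch:

```python
# A two-individual world in which 'some people are babies' and 'some
# babies cry' both hold, yet 'some people cry' is false: the only
# crying baby is the one that is not a person.

world = [
    {"person": True,  "baby": True,  "cries": False},
    {"person": False, "baby": True,  "cries": True},
]

some_people_are_babies = any(x["person"] and x["baby"] for x in world)
some_babies_cry        = any(x["baby"] and x["cries"] for x in world)
some_people_cry        = any(x["person"] and x["cries"] for x in world)

print(some_people_are_babies, some_babies_cry, some_people_cry)  # True True False
```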
It is at this point, where truth and validity clash, that human deduction is poorest.
One explanation for this is that people bring their world knowledge into the reason-
ing process. There is good reason for this. It allows us to take short cuts which make
dialog and interaction between people informative but efficient. We assume a certain
amount of shared knowledge in our dealings with each other, which in turn allows
us to interpret the inferences and deductions implied by others. If validity rather
than truth was preferred, all premises would have to be made explicit.
Inductive reasoning
Induction is generalizing from cases we have seen to infer information about cases
we have not seen. For example, if every elephant we have ever seen has a trunk, we
infer that all elephants have trunks. Of course, this inference is unreliable and cannot
be proved to be true; it can only be proved to be false. We can disprove the inference
simply by producing an elephant without a trunk. However, we can never prove it
true because, no matter how many elephants with trunks we have seen or are known
to exist, the next one we see may be trunkless. The best that we can do is gather evid-
ence to support our inductive inference.
In spite of its unreliability, induction is a useful process, which we use constantly
in learning about our environment. We can never see all the elephants that have ever
lived or will ever live, but we have certain knowledge about elephants which we are
prepared to trust for all practical purposes, which has largely been inferred by induc-
tion. Even if we saw an elephant without a trunk, we would be unlikely to move from
our position that ‘All elephants have trunks’, since we are better at using positive
than negative evidence. This is illustrated in an experiment first devised by Wason
[365]. You are presented with four cards as in Figure 1.14. Each card has a number
on one side and a letter on the other. Which cards would you need to pick up to test
the truth of the statement ‘If a card has a vowel on one side it has an even number
on the other’?
A common response to this (was it yours?) is to check the E and the 4. However,
this uses only positive evidence. In fact, to test the truth of the statement we need to
check negative evidence: if we can find a card which has an odd number on one side
and a vowel on the other we have disproved the statement. We must therefore check
E and 7. (It does not matter what is on the other side of the other cards: the state-
ment does not say that all even numbers have vowels, just that all vowels have even
numbers.)
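The logic of the card choice can be made explicit in code: a card is worth turning over only if some hidden face could falsify the rule. This sketch (the helper function name is ours) reproduces the E-and-7 answer:

```python
# Wason's selection task: which cards could falsify
# 'if a card has a vowel on one side it has an even number on the other'?

VOWELS = set("AEIOU")

def could_falsify(visible):
    """Would some hidden face make this card violate the rule?"""
    if visible.isalpha():
        # A visible vowel is falsified by a hidden odd number;
        # a consonant can never violate the rule.
        return visible in VOWELS
    # A visible odd number is falsified by a hidden vowel;
    # an even number can never violate the rule.
    return int(visible) % 2 == 1

cards = ["E", "K", "4", "7"]
print([c for c in cards if could_falsify(c)])  # ['E', '7']
```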
Abductive reasoning
The third type of reasoning is abduction. Abduction reasons from a fact to the action
or state that caused it. This is the method we use to derive explanations for the events
we observe. For example, suppose we know that Sam always drives too fast when she
has been drinking. If we see Sam driving too fast we may infer that she has been
drinking. Of course, this too is unreliable since there may be another reason why she
is driving fast: she may have been called to an emergency, for example.
In spite of its unreliability, it is clear that people do infer explanations in this way,
and hold onto them until they have evidence to support an alternative theory or
explanation. This can lead to problems in using interactive systems. If an event
always follows an action, the user will infer that the event is caused by the action
unless evidence to the contrary is made available. If, in fact, the event and the action
are unrelated, confusion and even error often result.
Figure 1.14 Wason’s cards
Filling the gaps
Look again at Wason’s cards in Figure 1.14. In the text we say that you only need to check
the E and the 7. This is correct, but only because we very carefully stated in the text that ‘each
card has a number on one side and a letter on the other’. If the problem were stated without that
condition then the K would also need to be examined in case it has a vowel on the other side. In
fact, when the problem is so stated, even the most careful subjects ignore this possibility. Why?
Because the nature of the problem implicitly suggests that each card has a number on one side and
a letter on the other.
This is similar to the embellishment of the story at the end of Section 1.3.3. In fact, we constantly
fill in gaps in the evidence that reaches us through our senses. Although this can lead to errors in
our reasoning it is also essential for us to function. In the real world we rarely have all the evid-
ence necessary for logical deductions and at all levels of perception and reasoning we fill in details
in order to allow higher levels of reasoning to work.
1.4.2 Problem solving
If reasoning is a means of inferring new information from what is already known,
problem solving is the process of finding a solution to an unfamiliar task, using the
knowledge we have. Human problem solving is characterized by the ability to adapt
the information we have to deal with new situations. However, often solutions seem
to be original and creative. There are a number of different views of how people
solve problems. The earliest, dating back to the first half of the twentieth century, is
the Gestalt view that problem solving involves both reuse of knowledge and insight.
This has been largely superseded but the questions it was trying to address remain
and its influence can be seen in later research. A second major theory, proposed in
the 1970s by Newell and Simon, was the problem space theory, which takes the view
that the mind is a limited information processor. Later variations on this drew on the
earlier theory and attempted to reinterpret Gestalt theory in terms of information-
processing theories. We will look briefly at each of these views.
Gestalt theory
Gestalt psychologists were answering the claim, made by behaviorists, that prob-
lem solving is a matter of reproducing known responses or trial and error. This
explanation was considered by the Gestalt school to be insufficient to account for
human problem-solving behavior. Instead, they claimed, problem solving is both pro-
ductive and reproductive. Reproductive problem solving draws on previous experi-
ence as the behaviorists claimed, but productive problem solving involves insight and
restructuring of the problem. Indeed, reproductive problem solving could be a hind-
rance to finding a solution, since a person may ‘fixate’ on the known aspects of the
problem and so be unable to see novel interpretations that might lead to a solution.
Gestalt psychologists backed up their claims with experimental evidence. Kohler
provided evidence of apparent insight being demonstrated by apes, which he
observed joining sticks together in order to reach food outside their cages [202].
However, this was difficult to verify since the apes had once been wild and so could
have been using previous knowledge.
Other experiments observed human problem-solving behavior. One well-known
example of this is Maier’s pendulum problem [224]. The problem was this: the
subjects were in a room with two pieces of string hanging from the ceiling. Also in
the room were other objects including pliers, poles and extensions. The task set was
to tie the pieces of string together. However, they were too far apart to catch hold
of both at once. Although various solutions were proposed by subjects, few chose
to use the weight of the pliers as a pendulum to ‘swing’ the strings together. How-
ever, when the experimenter brushed against the string, setting it in motion, this
solution presented itself to subjects. Maier interpreted this as an example of produc-
tive restructuring. The movement of the string had given insight and allowed the
subjects to see the problem in a new way. The experiment also illustrates fixation:
subjects were initially unable to see beyond their view of the role or use of a pair
of pliers.
Although Gestalt theory is attractive in terms of its description of human problem
solving, it does not provide sufficient evidence or structure to support its theories.
It does not explain when restructuring occurs or what insight is, for example. How-
ever, the move away from behaviorist theories was helpful in paving the way for the
information-processing theory that was to follow.
Problem space theory
Newell and Simon proposed that problem solving centers on the problem space. The
problem space comprises problem states, and problem solving involves generating
these states using legal state transition operators. The problem has an initial state
and a goal state and people use the operators to move from the former to the latter.
Such problem spaces may be huge, and so heuristics are employed to select appro-
priate operators to reach the goal. One such heuristic is means–ends analysis. In
means–ends analysis the initial state is compared with the goal state and an oper-
ator chosen to reduce the difference between the two. For example, imagine you are
reorganizing your office and you want to move your desk from the north wall of the
room to the window. Your initial state is that the desk is at the north wall. The goal
state is that the desk is by the window. The main difference between these two is the
location of your desk. You have a number of operators which you can apply to mov-
ing things: you can carry them or push them or drag them, etc. However, you know
that to carry something it must be light and that your desk is heavy. You therefore
have a new subgoal: to make the desk light. Your operators for this may involve
removing drawers, and so on.
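The desk example can be sketched as a tiny means–ends analyser: states are sets of facts, operators have preconditions and effects, and an unmet precondition becomes a subgoal. The operator names and facts here are illustrative assumptions.

```python
# Means-ends analysis in miniature: to achieve a goal, pick an operator
# whose effects include it; any unmet precondition becomes a subgoal.

OPERATORS = {
    "carry_desk":     {"pre": {"desk is light"}, "add": {"desk at window"},
                       "remove": {"desk at north wall"}},
    "remove_drawers": {"pre": set(),             "add": {"desk is light"},
                       "remove": set()},
}

def achieve(goal, state, trace):
    if goal in state:
        return state
    # choose an operator that reduces the difference to the goal
    name, op = next((n, o) for n, o in OPERATORS.items() if goal in o["add"])
    for subgoal in op["pre"] - state:   # unmet preconditions become subgoals
        state = achieve(subgoal, state, trace)
    trace.append(name)
    return (state | op["add"]) - op["remove"]

trace = []
state = achieve("desk at window", {"desk at north wall"}, trace)
print(trace)  # ['remove_drawers', 'carry_desk']
```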
An important feature of Newell and Simon’s model is that it operates within the
constraints of the human processing system, and so searching the problem space is
limited by the capacity of short-term memory, and the speed at which information
can be retrieved. Within the problem space framework, experience allows us to solve
problems more easily since we can structure the problem space appropriately and
choose operators efficiently.
Newell and Simon's theory, and their General Problem Solver model which is based
on it, have largely been applied to problem solving in well-defined domains, for
example solving puzzles. These problems may be unfamiliar but the knowledge that
is required to solve them is present in the statement of the problem and the expected
solution is clear. In real-world problems finding the knowledge required to solve
the problem may be part of the problem, or specifying the goal may be difficult.
Problems such as these require significant domain knowledge: for example, to solve
a programming problem you need knowledge of the language and the domain in
which the program operates. In this instance specifying the goal clearly may be a
significant part of solving the problem.
However, the problem space framework provides a clear theory of problem
solving, which can be extended, as we shall see when we look at skill acquisition in
the next section, to deal with knowledge-intensive problem solving. First we will look
briefly at the use of analogy in problem solving.
Worked exercise Identify the goals and operators involved in the problem ‘delete the second paragraph of the
document’ on a word processor. Now use a word processor to delete a paragraph and note
your actions, goals and subgoals. How well did they match your earlier description?
Answer Assume you have a document open and you are at some arbitrary position within it.
You also need to decide which operators are available and what their preconditions and
results are. Based on an imaginary word processor we assume the following operators
(you may wish to use your own WP package):
Operator            Precondition                   Result
delete_paragraph    Cursor at start of paragraph   Paragraph deleted
move_to_paragraph   Cursor anywhere in document    Cursor moves to start of next
                                                   paragraph (no effect if there
                                                   is no next paragraph)
move_to_start       Cursor anywhere in document    Cursor at start of document
Goal: delete second paragraph in document
Looking at the operators, an obvious one to resolve this goal is delete_paragraph, which
has the precondition ‘cursor at start of paragraph’. We therefore have a new subgoal:
move_to_paragraph. Its precondition is ‘cursor anywhere in document’ (which we can
meet), but we want the second paragraph so we must initially be in the first.
We set up a new subgoal, move_to_start, with precondition ‘cursor anywhere in docu-
ment’ and result ‘cursor at start of document’. We can then apply move_to_paragraph
and finally delete_paragraph.
We assume some knowledge here (that the second paragraph is the paragraph after the
first one).
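The subgoaling above can also be expressed as search through the problem space. The following is an illustrative sketch (not from the exercise itself): the state encoding and paragraph names are our own simplification, while the operator names and preconditions follow the table above. A breadth-first search recovers exactly the plan derived in the answer.

```python
from collections import deque

# State = (paragraphs, cursor paragraph index, cursor at start of paragraph?).
START = (("p1", "p2", "p3"), 2, False)   # cursor somewhere in the last paragraph

def move_to_start(paras, cur, at_start):
    return (paras, 0, True)              # precondition: cursor anywhere

def move_to_paragraph(paras, cur, at_start):
    if cur + 1 < len(paras):             # precondition: a next paragraph exists
        return (paras, cur + 1, True)
    return None

def delete_paragraph(paras, cur, at_start):
    if at_start:                         # precondition: cursor at start of paragraph
        rest = paras[:cur] + paras[cur + 1:]
        return (rest, min(cur, len(rest) - 1), True)
    return None

OPERATORS = [move_to_start, move_to_paragraph, delete_paragraph]

def plan(goal_test):
    """Breadth-first search through the problem space; returns operator names."""
    frontier = deque([(START, [])])
    seen = {START}
    while frontier:
        state, path = frontier.popleft()
        if goal_test(state):
            return path
        for op in OPERATORS:
            nxt = op(*state)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [op.__name__]))
    return None

print(plan(lambda s: s[0] == ("p1", "p3")))
# -> ['move_to_start', 'move_to_paragraph', 'delete_paragraph']
```

The search finds the same three-step plan as the hand derivation: because the cursor starts at an arbitrary position, move_to_start must come first.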
Analogy in problem solving
A third element of problem solving is the use of analogy. Here we are interested in
how people solve novel problems. One suggestion is that this is done by mapping
knowledge relating to a similar known domain to the new problem – called analo-
gical mapping. Similarities between the known domain and the new one are noted
and operators from the known domain are transferred to the new one.
This process has been investigated using analogous stories. Gick and Holyoak
[149] gave subjects the following problem:
A doctor is treating a malignant tumor. In order to destroy it he needs to blast
it with high-intensity rays. However, these will also destroy the healthy tissue sur-
rounding the tumor. If he lessens the rays’ intensity the tumor will remain. How does
he destroy the tumor?
The solution to this problem is to fire low-intensity rays from different directions
converging on the tumor. That way, the healthy tissue receives harmless low-
intensity rays while the tumor receives the rays combined, making a high-intensity
dose. The investigators found that only 10% of subjects reached this solution with-
out help. However, this rose to 80% when they were given this analogous story and
told that it may help them:
A general is attacking a fortress. He can’t send all his men in together as the roads are
mined to explode if large numbers of men cross them. He therefore splits his men into
small groups and sends them in on separate roads.
In spite of this, it seems that people often miss analogous information, unless it is
semantically close to the problem domain. When subjects were not told to use the
story, many failed to see the analogy. However, the number spotting the analogy rose
when the story was made semantically close to the problem, for example a general
using rays to destroy a castle.
The use of analogy is reminiscent of the Gestalt view of productive restructuring
and insight. Old knowledge is used to solve a new problem.
1.4.3 Skill acquisition
All of the problem solving that we have considered so far has concentrated on
handling unfamiliar problems. However, for much of the time, the problems that
we face are not completely new. Instead, we gradually acquire skill in a particular
domain area. But how is such skill acquired and what difference does it make to our
problem-solving performance? We can gain insight into how skilled behavior works,
and how skills are acquired, by considering the difference between novice and expert
behavior in given domains.
Chess: of human and artificial intelligence
A few years ago, Deep Blue, a chess-playing computer, beat Garry Kasparov, the world’s top
Grand Master, in a full tournament. This was the long-awaited breakthrough for the artificial
intelligence (AI) community, who have traditionally seen chess as the ultimate test of their art.
However, despite the fact that computer chess programs can play at Grand Master level against
human players, this does not mean they play in the same way. For each move played, Deep Blue
investigated many millions of alternative moves and counter-moves. In contrast, a human chess
player will only consider a few dozen. But, if the human player is good, these will usually be the
right few dozen. The ability to spot patterns allows a human to address a problem with far less
effort than a brute force approach. In chess, the number of moves is such that, in the end, brute
force applied fast enough has overcome human pattern-matching skill. In Go, which has far more
possible moves, computer programs do not even reach a good club level of play. Many models of the
mental processes have been heavily influenced by computation. It is worth remembering that
although there are similarities, computer ‘intelligence’ is very different from that of humans.
A commonly studied domain is chess playing. It is particularly suitable since it
lends itself easily to representation in terms of problem space theory. The initial state
is the opening board position; the goal state is one player checkmating the other;
operators to move states are legal moves of chess. It is therefore possible to examine
skilled behavior within the context of the problem space theory of problem solving.
Studies of chess players by DeGroot, Chase and Simon, among others, produced
some interesting observations [64, 65, 88, 89]. In all the experiments the behavior of
chess masters was compared with less experienced chess players. The first observa-
tion was that players did not consider large numbers of moves in choosing their
move, nor did they look ahead more than six moves (often far fewer). Masters con-
sidered no more alternatives than the less experienced, but they took less time to
make a decision and produced better moves.
So what makes the difference between skilled and less skilled behavior in chess?
It appears that chess masters remember board configurations and good moves
associated with them. When given actual board positions to remember, masters
are much better at reconstructing the board than the less experienced. However,
when given random configurations (which were unfamiliar), the groups of players
were equally bad at reconstructing the positions. It seems therefore that expert
players ‘chunk’ the board configuration in order to hold it in short-term memory.
Expert players use larger chunks than the less experienced and can therefore re-
member more detail.
This behavior is also seen among skilled computer programmers. They can also
reconstruct programs more effectively than novices since they have the structures
available to build appropriate chunks. They acquire plans representing code to solve
particular problems. When that problem is encountered in a new domain or new
program they will recall that particular plan and reuse it.
Another observed difference between skilled and less skilled problem solving is
in the way that different problems are grouped. Novices tend to group problems
according to superficial characteristics such as the objects or features common to
both. Experts, on the other hand, demonstrate a deeper understanding of the prob-
lems and group them according to underlying conceptual similarities which may not
be at all obvious from the problem descriptions.
Each of these differences stems from a better encoding of knowledge in the expert:
information structures are fine tuned at a deep level to enable efficient and accurate
retrieval. But how does this happen? How is skill such as this acquired? One model
of skill acquisition is Anderson’s ACT* model [14]. ACT* identifies three basic
levels of skill:
1. The learner uses general-purpose rules which interpret facts about a problem.
This is slow and demanding on memory access.
2. The learner develops rules specific to the task.
3. The rules are tuned to speed up performance.
General mechanisms are provided to account for the transitions between these
levels. For example, proceduralization is a mechanism to move from the first to the
second. It removes the parts of the rule which demand memory access and replaces
variables with specific values. Generalization, on the other hand, is a mechanism
which moves from the second level to the third. It generalizes from the specific cases
to general properties of those cases. Commonalities between rules are condensed to
produce a general-purpose rule.
These are best illustrated by example. Imagine you are learning to cook. Initially
you may have a general rule to tell you how long a dish needs to be in the oven, and
a number of explicit representations of dishes in memory. You can instantiate the
rule by retrieving information from memory.
IF cook[type, ingredients, time]
THEN
cook for: time
cook[casserole, [chicken,carrots,potatoes], 2 hours]
cook[casserole, [beef,dumplings,carrots], 2 hours]
cook[cake, [flour,sugar,butter,eggs], 45 mins]
Gradually your knowledge becomes proceduralized and you have specific rules for
each case:
IF type is casserole
AND ingredients are [chicken,carrots,potatoes]
THEN
cook for: 2 hours
IF type is casserole
AND ingredients are [beef,dumplings,carrots]
THEN
cook for: 2 hours
IF type is cake
AND ingredients are [flour,sugar,butter,eggs]
THEN
cook for: 45 mins
Finally, you may generalize from these rules to produce general-purpose rules, which
exploit their commonalities:
IF type is casserole
AND ingredients are ANYTHING
THEN
cook for: 2 hours
The first stage uses knowledge extensively. The second stage relies upon known
procedures. The third stage represents skilled behavior. Such behavior may in fact
become automatic and as such be difficult to make explicit. For example, think of
an activity at which you are skilled, perhaps driving a car or riding a bike. Try to
describe to someone the exact procedure which you go through to do this. You will
find this quite difficult. In fact experts tend to have to rehearse their actions mentally
in order to identify exactly what they do. Such skilled behavior is efficient but may
cause errors when the context of the activity changes.
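The three levels can also be illustrated in code. This is our own sketch of the idea, not ACT*’s actual production-system notation: level 1 interprets declarative facts through one general rule, level 2 compiles each fact into its own rule, and level 3 condenses common rules.

```python
# Level 1: a general-purpose rule interprets facts held in declarative memory
# (slow, demanding on memory access).
FACTS = {
    ("casserole", ("chicken", "carrots", "potatoes")): "2 hours",
    ("casserole", ("beef", "dumplings", "carrots")): "2 hours",
    ("cake", ("flour", "sugar", "butter", "eggs")): "45 mins",
}

def cook_interpretive(dish, ingredients):
    return FACTS[(dish, tuple(ingredients))]      # retrieve cook[...] from memory

# Level 2: proceduralization compiles each fact into a task-specific rule,
# removing the memory lookup and replacing variables with specific values.
def make_specific_rule(dish, ingredients, time):
    def rule(d, i):
        return time if (d, tuple(i)) == (dish, ingredients) else None
    return rule

specific_rules = [make_specific_rule(d, i, t) for (d, i), t in FACTS.items()]

def cook_procedural(dish, ingredients):
    for rule in specific_rules:
        result = rule(dish, ingredients)
        if result is not None:
            return result

# Level 3: generalization condenses the common casserole rules into one
# ("ingredients are ANYTHING").
def cook_general(dish, ingredients):
    if dish == "casserole":
        return "2 hours"
    if dish == "cake":
        return "45 mins"

print(cook_procedural("cake", ["flour", "sugar", "butter", "eggs"]))  # -> 45 mins
print(cook_general("casserole", ["lamb", "leeks"]))                   # -> 2 hours
```

Note how the generalized rule succeeds on a casserole it has never seen, while the interpretive and proceduralized versions only cover the stored cases.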
1.4.4 Errors and mental models
Human capability for interpreting and manipulating information is quite impres-
sive. However, we do make mistakes. Some are trivial, resulting in no more than
temporary inconvenience or annoyance. Others may be more serious, requiring
substantial effort to correct. Occasionally an error may have catastrophic effects, as
we see when ‘human error’ results in a plane crash or nuclear plant leak.
Why do we make mistakes and can we avoid them? In order to answer the latter
part of the question we must first look at what is going on when we make an error.
There are several different types of error. As we saw in the last section some errors
result from changes in the context of skilled behavior. If a pattern of behavior has
become automatic and we change some aspect of it, the more familiar pattern may
break through and cause an error. A familiar example of this is where we intend to
stop at the shop on the way home from work but in fact drive past. Here, the activ-
ity of driving home is the more familiar and overrides the less familiar intention.
Other errors result from an incorrect understanding, or model, of a situation or
system. People build their own theories to understand the causal behavior of sys-
tems. These have been termed mental models. They have a number of characteristics.
Mental models are often partial: the person does not have a full understanding of the
working of the whole system. They are unstable and are subject to change. They can
be internally inconsistent, since the person may not have worked through the logical
consequences of their beliefs. They are often unscientific and may be based on super-
stition rather than evidence. Often they are based on an incorrect interpretation of
the evidence.
DESIGN FOCUS
Human error and false memories
In the second edition of this book, one of the authors added the following story:
During the Second World War a new cockpit design was introduced for Spitfires. The pilots were trained
and flew successfully during training, but would unaccountably bail out when engaged in dog fights. The new
design had exchanged the positions of the gun trigger and ejector controls. In the heat of battle the old
responses resurfaced and the pilots ejected. Human error, yes, but the designer’s error, not the pilot’s.
It is a good story, but after the book was published we got several emails saying ‘Spitfires didn’t have
ejector seats’. It was Kai-Mikael Jää-Aro who was able to find what may have been the original to the
story (and incidentally inform us what model of Spitfire was in our photo and who the pilot was!). He
pointed us to and translated the story of Sierra 44, an S35E Draken reconnaissance aircraft.¹ The full
story involves just about every perceptual and cognitive error imaginable, but the point that links to
the (false) Spitfire story is that in the Draken the red buttons for releasing the fuel ‘drop’ tanks and for
the canopy release differed only in very small writing. In an emergency (burning fuel tanks) the pilot
accidentally released the canopy and so ended up flying home cabriolet style.
There is a second story of human error here – the author’s memory. When the book was written he
could not recall where he had come across the story but was convinced it was to do with a Spitfire. It
may be that he had been told the story by someone else who had got it mixed up, but it is as likely that
he simply remembered the rough outline of the story and then ‘reconstructed’ the rest. In fact that is
exactly how our memories work. Our brains do not bother to lay down every little detail, but when
we ‘remember’ we rebuild what the incident ‘must have been’ using our world knowledge. This process
is completely unconscious and can lead to what are known as false memories. This is particularly
problematic in witness statements in criminal trials as early questioning by police or lawyers can
unintentionally lead to witnesses being sure they have seen things that they have not. Numerous
controlled psychological experiments have demonstrated this effect, which furthermore is strongly
influenced by biasing factors such as the race of supposed criminals.
To save his blushes we have not said here which author’s failing memory was responsible for the Spitfire
story, but you can read more on this story and also find who it was on the book website at:
/e3/online/spitfire/
Photograph courtesy of popperfoto.com
1. Pej Kristoffersson, 1984. Sigurd 44 – Historien om hur man gör bort sig så att det märks, Flygrevyn 2/1984, pp. 44–6.

Assuming a person builds a mental model of the system being dealt with, errors
may occur if the actual operation differs from the mental model. For example, on
one occasion we were staying in a hotel in Germany, attending a conference. In the
lobby of the hotel was a lift. Beside the lift door was a button. Our model of the
system, based on previous experience of lifts, was that the button would call the lift. We
pressed the button and the lobby light went out! In fact the button was a light switch
and the lift button was on the inside rim of the lift, hidden from view.
Although both the light switch and the lift button were inconsistent with our men-
tal models of these controls, we would probably have managed if they had been
encountered separately. If there had been no button beside the lift we would have
looked more closely and found the one on the inner rim. But since the light switch
reflected our model of a lift button we looked no further. During our stay we
observed many more new guests making the same error.
This illustrates the importance of a correct mental model and the dangers of
ignoring conventions. There are certain conventions that we use to interpret the
world and ideally designs should support these. If these are to be violated, explicit
support must be given to enable us to form a correct mental model. A label on the
button saying ‘light switch’ would have been sufficient.
1.5 EMOTION
So far in this chapter we have concentrated on human perceptual and cognitive abil-
ities. But human experience is far more complex than this. Our emotional response
to situations affects how we perform. For example, positive emotions enable us to
think more creatively, to solve complex problems, whereas negative emotion pushes
us into narrow, focussed thinking. A problem that may be easy to solve when we are
relaxed, will become difficult if we are frustrated or afraid.
Psychologists have studied emotional response for decades and there are many
theories as to what is happening when we feel an emotion and why such a response
occurs. More than a century ago, William James proposed what has become known
as the James–Lange theory (Lange was a contemporary of James whose theories
were similar): that emotion was the interpretation of a physiological response, rather
than the other way around. So while we may feel that we respond to an emotion,
James contended that we respond physiologically to a stimulus and interpret that as
emotion:
Common sense says, we lose our fortune, are sorry and weep; we meet a bear, are
frightened and run; we are insulted by a rival, are angry and strike. The hypothesis
here...is that we feel sorry because we cry, angry because we strike, afraid because we
tremble.
(W. James, Principles of Psychology, page 449. Henry Holt, New York, 1890.)
Others, however, disagree. Cannon [54a], for example, argued that our physio-
logical processes are in fact too slow to account for our emotional reactions, and that
the physiological responses for some emotional states are too similar (e.g. anger
and fear), yet they can be easily distinguished. Studies using drugs that stimulate
broadly the same physiological responses as anger or fear seem to support this:
participants reported physical symptoms but not the emotion,
which suggests that emotional response is more than a recognition of physiological
changes.
Schachter and Singer [312a] proposed a third interpretation: that emotion results
from a person evaluating physical responses in the light of the whole situation. So
whereas the same physiological response can result from a range of different situ-
ations, the emotion that is felt is based on a cognitive evaluation of the circumstance
and will depend on what the person attributes this to. So the same physiological
response of a pounding heart will be interpreted as excitement if we are in a com-
petition and fear if we find ourselves under attack.
Whatever the exact process, what is clear is that emotion involves both physical
and cognitive events. Our body responds biologically to an external stimulus and we
interpret that in some way as a particular emotion. That biological response – known
as affect – changes the way we deal with different situations, and this has an impact
on the way we interact with computer systems. As Donald Norman says:
Negative affect can make it harder to do even easy tasks; positive affect can make it
easier to do difficult tasks.
(D. A. Norman, Emotion and design: attractive things work better.
Interactions Magazine, ix(4): 36–42, 2002.)
So what are the implications of this for design? It suggests that in situations of
stress, people will be less able to cope with complex problem solving or managing
difficult interfaces, whereas if people are relaxed they will be more forgiving of
limitations in the design. This does not give us an excuse to design bad interfaces
but does suggest that if we build interfaces that promote positive responses – for
example by using aesthetics or reward – then they are likely to be more successful.
1.6 INDIVIDUAL DIFFERENCES
In this chapter we have been discussing humans in general. We have made the
assumption that everyone has similar capabilities and limitations and that we
can therefore make generalizations. To an extent this is true: the psychological
principles and properties that we have discussed apply to the majority of people.
Notwithstanding this, we should remember that, although we share processes in
common, humans, and therefore users, are not all the same. We should be aware of
individual differences so that we can account for them as far as possible within our
designs. These differences may be long term, such as sex, physical capabilities and
intellectual capabilities. Others are shorter term and include the effect of stress
or fatigue on the user. Still others change through time, such as age.
These differences should be taken into account in our designs. It is useful to
consider, for any design decision, if there are likely to be users within the target
group who will be adversely affected by our decision. At the extremes a decision may
exclude a section of the user population. For example, the current emphasis on visual
interfaces excludes those who are visually impaired, unless the design also makes use
of the other sensory channels. On a more mundane level, designs should allow for
users who are under pressure, feeling ill or distracted by other concerns: they should
not push users to their perceptual or cognitive limits.
We will consider the issues of universal accessibility in more detail in Chapter 10.
1.7 PSYCHOLOGY AND THE DESIGN OF INTERACTIVE SYSTEMS
So far we have looked briefly at the way in which humans receive, process and
store information, solve problems and acquire skill. But how can we apply what we
have learned to designing interactive systems? Sometimes, straightforward conclu-
sions can be drawn. For example, we can deduce that recognition is easier than recall
and allow users to select commands from a set (such as a menu) rather than input
them directly. However, in the majority of cases, application is not so obvious
or simple. In fact, it may be dangerous, leading us to make generalizations which are
not valid. In order to apply a psychological principle or result properly in design,
we need to understand its context, both in terms of where it fits in the wider field
of psychology and in terms of the details of the actual experiments, the measures
used and the subjects involved, for example. This may appear daunting, particularly
to the novice designer who wants to acknowledge the relevance of cognitive psy-
chology but does not have the background to derive appropriate conclusions.
Fortunately, principles and results from research in psychology have been distilled
into guidelines for design, models to support design and techniques for evaluating
design. Parts 2 and 3 of this book include discussion of a range of guidelines,
models and techniques, based on cognitive psychology, which can be used to support
the design process.
1.7.1 Guidelines
Throughout this chapter we have discussed the strengths and weaknesses of human
cognitive and perceptual processes but, for the most part, we have avoided attempt-
ing to apply these directly to design. This is because such an attempt could only
be partial and simplistic, and may give the impression that this is all psychology
has to offer.
However, general design principles and guidelines can be and have been derived
from the theories we have discussed. Some of these are relatively straightforward:
for instance, recall is assisted by the provision of retrieval cues so interfaces should
incorporate recognizable cues wherever possible. Others are more complex and con-
text dependent. In Chapter 7 we discuss principles and guidelines further, many of
which are derived from psychological theory. The interested reader is also referred to
Gardiner and Christie [140] which illustrates how guidelines can be derived from
psychological theory.
1.7.2 Models to support design
As well as guidelines and principles, psychological theory has led to the development
of analytic and predictive models of user behavior. Some of these include a specific
model of human problem solving, others of physical activity, and others attempt
a more comprehensive view of cognition. Some predict how a typical computer
user would behave in a given situation, others analyze why particular user behavior
occurred. All are based on cognitive theory. We discuss these models in detail in
Chapter 12.
1.7.3 Techniques for evaluation
In addition to providing us with a wealth of theoretical understanding of the human
user, psychology also provides a range of empirical techniques which we can employ
to evaluate our designs and our systems. In order to use these effectively we need to
understand the scope and benefits of each method. Chapter 9 provides an overview
of these techniques and an indication of the circumstances under which each should
be used.
Worked exercise Produce a semantic network of the main information in this chapter.
Answer This network is potentially huge so it is probably unnecessary to devise the whole thing!
Be selective. One helpful way to tackle the exercise is to approach it in both a top-down
and a bottom-up manner. Top-down will give you a general overview of topics and how
they relate; bottom-up can fill in the details of a particular field. These can then be
‘glued’ together to build up the whole picture. You may be able to tackle this problem
in a group, each taking one part of it. We will not provide the full network here but will
give examples of the level of detail anticipated for the overview and the detailed
versions. In the overview we have not included labels on the arcs for clarity.
[Figure: top-down view of the network]
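As a sketch of what a detailed (bottom-up) fragment might contain, a semantic network can be written down as labeled arcs between nodes. The node and relation names below are our own choices, drawn from this chapter’s discussion of memory:

```python
# A fragment of the chapter's content as labeled (node, relation, node) arcs.
ARCS = [
    ("the human", "includes", "memory"),
    ("the human", "includes", "problem solving"),
    ("memory", "has_part", "sensory memory"),
    ("memory", "has_part", "short-term memory"),
    ("memory", "has_part", "long-term memory"),
    ("long-term memory", "has_part", "episodic memory"),
    ("long-term memory", "has_part", "semantic memory"),
    ("short-term memory", "has_property", "limited capacity"),
]

def related(node, relation):
    """Nodes reached from `node` along arcs labeled `relation`."""
    return [b for a, r, b in ARCS if a == node and r == relation]

print(related("memory", "has_part"))
# -> ['sensory memory', 'short-term memory', 'long-term memory']
```

Listing the arcs explicitly is one way to build up your own network fragment before drawing it, whether working alone or dividing the topics among a group.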
1.8 SUMMARY
In this chapter we have considered the human as an information processor, re-
ceiving inputs from the world, storing, manipulating and using information, and
reacting to the information received. Information is received through the senses,
particularly, in the case of computer use, through sight, hearing and touch. It is
stored in memory, either temporarily in sensory or working memory, or perman-
ently in long-term memory. It can then be used in reasoning and problem solving.
Recurrent familiar situations allow people to acquire skills in a particular domain, as
their information structures become better defined. However, this can also lead to
error, if the context changes.
Human perception and cognition are complex and sophisticated but they are not
without their limitations. We have considered some of these limitations in this chap-
ter. An understanding of the capabilities and limitations of the human as informa-
tion processor can help us to design interactive systems which support the former
and compensate for the latter. The principles, guidelines and models which can be
derived from cognitive psychology and the techniques which it provides are invalu-
able tools for the designer of interactive systems.
EXERCISES
1.1 Devise experiments to test the properties of (i) short-term memory, (ii) long-term
memory, using the experiments described in this chapter to help you. Try out your experiments
on your friends. Are your results consistent with the properties described in this chapter?
1.2 Observe skilled and novice operators in a familiar domain, for example touch and ‘hunt-and-peck’
typists, expert and novice game players, or expert and novice users of a computer application.
What differences can you discern between their behaviors?
1.3 From what you have learned about cognitive psychology devise appropriate guidelines for use by
interface designers. You may find it helpful to group these under key headings, for example visual
perception, memory, problem solving, etc., although some may overlap such groupings.
1.4 What are mental models, and why are they important in interface design?
1.5 What can a system designer do to minimize the memory load of the user?
1.6 Human short-term memory has a limited span. This is a series of experiments to determine what
that span is. (You will need some other people to take part in these experiments with you – they
do not need to be studying the course – try it with a group of friends.)
(a) Kim’s game
Divide into groups. Each group gathers together an assortment of objects – pens, pencils, paper-
clips, books, sticky notes, etc. The stranger the object, the better! You need a large number of
them – at least 12 to 15. Place them in some compact arrangement on a table, so that all items
are visible. Then, swap with another group for 30 seconds only and look at their pile. Return to
your table, and on your own try to write down all the items in the other group’s pile.
Compare your list with what they actually have in their pile. Compare the number of things you
remembered with how the rest of your group did. Now think introspectively: what helped you
remember certain things? Did you recognize things in their pile that you had in yours? Did that
help? Do not pack the things away just yet.
Calculate the average score for your group. Compare that with the averages from the other
group(s).
Questions: What conclusions can you draw from this experiment? What does this indicate
about the capacity of short-term memory? What does it indicate that helps improve the capa-
city of short-term memory?
(b) ‘I went to market...’
In your group, one person starts off with ‘I went to market and I bought a fish’ (or some other
produce, or whatever!). The next person continues ‘I went to market and I bought a fish and
I bought a bread roll as well’. The process continues, with each person adding some item to
the list each time. Keep going around the group until you cannot remember the list accurately.
Make a note of the first time someone gets it wrong, and then record the number of items
that you can successfully remember. Some of you will find it hard to remember more than a few,
others will fare much better. Do this a few more times with different lists, and then calculate your
average score, and your group’s average score.
Questions: What does this tell you about short-term memory? What do you do that helps you
remember? What do you estimate is the typical capacity of human short-term memory? Is this a
good test for short-term memory?
(c) Improving your memory
Try experiment 1.6(a) again, using the techniques on page 39.
Has your recall ability improved? Has your group’s average improved? What does this show you
about memory?
1.7 Locate one source (through the library or the web) that reports on empirical evidence on human
limitations. Provide a full reference to the source. In one paragraph, summarize what the result of
the research states in terms of a physical human limitation.
In a separate paragraph, write your thoughts on how you think this evidence on human
capabilities impacts interactive system design.
RECOMMENDED READING
E. B. Goldstein, Sensation and Perception, 6th edition, Wadsworth, 2001.
A textbook covering human senses and perception in detail. Easy to read with
many home experiments to illustrate the points made.
A. Baddeley, Human Memory: Theory and Practice, revised edition, Allyn & Bacon, 1997.
The latest and most complete of Baddeley’s texts on memory. Provides up-to-date
discussion on the different views of memory structure as well as a detailed survey
of experimental work on memory.
M. W. Eysenck and M. T. Keane, Cognitive Psychology: A Student’s Handbook, 4th
edition, Psychology Press, 2000.
A comprehensive and readable textbook giving more detail on cognitive psychology,
including memory, problem solving and skill acquisition.
S. K. Card, T. P. Moran and A. Newell, The Psychology of Human–Computer
Interaction, Lawrence Erlbaum Associates, 1983.
A classic text looking at the human as an information processor in interaction
with the computer. Develops and describes the Model Human Processor in detail.
A. Newell and H. Simon, Human Problem Solving, Prentice Hall, 1972.
Describes the problem space view of problem solving in more detail.
M. M. Gardiner and B. Christie, editors, Applying Cognitive Psychology to
User-Interface Design, John Wiley, 1987.
A collection of essays on the implications of different aspects of cognitive psychology
to interface design. Includes memory, thinking, language and skill acquisition.
Provides detailed guidelines for applying psychological principles in design practice.
A. Monk, editor, Fundamentals of Human Computer Interaction, Academic Press, 1985.
A good collection of articles giving brief coverage of aspects of human psychology
including perception, memory, thinking and reading. Also contains articles on
experimental design which provide useful introductions.
ACT-R site. Website of resources and examples of the use of the cognitive
architecture ACT-R, which is the latest development of Anderson’s ACT model,
http://act-r.psy.cmu.edu/
THE COMPUTER
OVERVIEW
A computer system comprises various elements, each of which affects the
user of the system.
- Input devices for interactive use, allowing text entry, drawing and
  selection from the screen:
  - text entry: traditional keyboard, phone text entry, speech and handwriting
  - pointing: principally the mouse, but also touchpad, stylus and others
  - 3D interaction devices.
- Output display devices for interactive use:
  - different types of screen mostly using some form of bitmap display
  - large displays and situated displays for shared and public use
  - digital paper may be usable in the near future.
- Virtual reality systems and 3D visualization which have special interaction
  and display devices.
- Various devices in the physical world:
  - physical controls and dedicated displays
  - sound, smell and haptic feedback
  - sensors for nearly everything including movement, temperature, bio-signs.
- Paper output and input: the paperless office and the less-paper office:
  - different types of printers and their characteristics, character styles
    and fonts
  - scanners and optical character recognition.
- Memory:
  - short-term memory: RAM
  - long-term memory: magnetic and optical disks
  - capacity limitations related to document and video storage
  - access methods as they limit or help the user.
- Processing:
  - the effects when systems run too slow or too fast, the myth of the
    infinitely fast machine
  - limitations on processing speed
  - networks and their impact on system performance.
60 Chapter 2 The computer
2.1 INTRODUCTION
In order to understand how humans interact with computers, we need to have an
understanding of both parties in the interaction. The previous chapter explored
aspects of human capabilities and behavior of which we need to be aware in the
context of human–computer interaction; this chapter considers the computer and
associated input–output devices and investigates how the technology influences the
nature of the interaction and style of the interface.
We will concentrate principally on the traditional computer but we will also look
at devices that take us beyond the closed world of keyboard, mouse and screen. As
well as giving us lessons about more traditional systems, these are increasingly
becoming important application areas in HCI.
Exercise: how many computers?
In a group or class do a quick survey:
- How many computers do you have in your home?
- How many computers do you normally carry with you in your pockets or bags?
Collate the answers and see who the techno-freaks are!
Discuss your answers.
After doing this look at /e3/online/how-many-computers/
When we interact with computers, what are we trying to achieve? Consider what
happens when we interact with each other – we are either passing information to
other people, or receiving information from them. Often, the information we receive
is in response to the information that we have recently imparted to them, and we
may then respond to that. Interaction is therefore a process of information transfer.
Relating this to the electronic computer, the same principles hold: interaction is a
process of information transfer, from the user to the computer and from the com-
puter to the user.
The first part of this chapter concentrates on the transference of information from
the user to the computer and back. We begin by considering a current typical com-
puter interface and the devices it employs, largely variants of keyboard for text entry
(Section 2.2), mouse for positioning (Section 2.3) and screen for displaying output
(Section 2.4).
Then we move on to consider devices that go beyond the keyboard, mouse and
screen: entering deeper into the electronic world with virtual reality and 3D interaction
2.1 Introduction 61
(Section 2.5) and outside the electronic world looking at more physical interactions
(Section 2.6).
In addition to direct input and output, information is passed to and fro via
paper documents. This is dealt with in Section 2.7, which describes printers and
scanners. Although not requiring the same degree of user interaction as a mouse
or keyboard, these are an important means of input and output for many current
applications.
We then consider the computer itself, its processor and memory devices and
the networks that link them together. We note how the technology drives and
empowers the interface. The details of computer processing should largely be irrelev-
ant to the end-user, but the interface designer needs to be aware of the limitations
of storage capacity and computational power; it is no good designing on paper a
marvellous new interface, only to find it needs a Cray to run. Software designers
often have high-end machines on which to develop applications, and it is easy to
forget what a more typical configuration feels like.
Before looking at these devices and technology in detail we’ll take a quick
bird’s-eye view of the way computer systems are changing.
2.1.1 A typical computer system
Consider a typical computer setup as shown in Figure 2.1. There is the computer
‘box’ itself, a keyboard, a mouse and a color screen. The screen layout is shown
alongside it. If we examine the interface, we can see how its various characteristics
are related to the devices used. The details of the interface itself, its underlying prin-
ciples and design, are discussed in more depth in Chapter 3. As we shall see there are
variants on these basic devices. Some of this variation is driven by different hardware
configurations: desktop use, laptop computers, PDAs (personal digital assistants).
Partly the diversity of devices reflects the fact that there are many different types of
Figure 2.1 A typical computer system
data that may have to be entered into and obtained from a system, and there are also
many different types of user, each with their own unique requirements.
2.1.2 Levels of interaction – batch processing
In the early days of computing, information was entered into the computer in a
large mass – batch data entry. There was minimal interaction with the machine: the
user would simply dump a pile of punched cards onto a reader, press the start
button, and then return a few hours later. This still continues today although now
with pre-prepared electronic files or possibly machine-read forms. It is clearly the
most appropriate mode for certain kinds of application, for example printing pay
checks or entering the results from a questionnaire.
With batch processing the interactions take place over hours or days. In contrast
the typical desktop computer system has interactions taking seconds or fractions of
a second (or with slow web pages sometimes minutes!). The field of Human–
Computer Interaction largely grew due to this change in interactive pace. It is easy to
assume that faster means better, but some of the paper-based technology discussed
in Section 2.7 suggests that sometimes slower paced interaction may be better.
2.1.3 Richer interaction – everywhere, everywhen
Computers are coming out of the box! Information appliances are putting internet
access or dedicated systems onto the fridge, microwave and washing machine: to
automate shopping, give you email in your kitchen or simply call for maintenance
when needed. We carry with us WAP phones and smartcards, have security systems
that monitor us and web cams that show our homes to the world. Is Figure 2.1 really
the typical computer system or is it really more like Figure 2.2?
Figure 2.2 A typical computer system? Photo courtesy Electrolux
2.2 Text entry devices 63
2.2 TEXT ENTRY DEVICES
Whether writing a book like this, producing an office memo, sending a thank you
letter after your birthday, or simply sending an email to a friend, entering text is
one of our main activities when using the computer. The most obvious means of
text entry is the plain keyboard, but there are several variations on this: different
keyboard layouts, ‘chord’ keyboards that use combinations of fingers to enter let-
ters, and phone key pads. Handwriting and speech recognition offer more radical
alternatives.
2.2.1 The alphanumeric keyboard
The keyboard is still one of the most common input devices in use today. It is used
for entering textual data and commands. The vast majority of keyboards have a stand-
ardized layout, and are known by the first six letters of the top row of alphabetical
keys, QWERTY. There are alternative designs which have some advantages over the
QWERTY layout, but these have not been able to overcome the vast technological
inertia of the QWERTY keyboard. These alternatives are of two forms: 26 key layouts
and chord keyboards. A 26 key layout rearranges the order of the alphabetic keys,
putting the most commonly used letters under the strongest fingers, or adopting
simpler practices. In addition to QWERTY, we will discuss two 26 key layouts,
alphabetic and DVORAK, and chord keyboards.
The QWERTY keyboard
The layout of the digits and letters on a QWERTY keyboard is fixed (see Figure 2.3),
but non-alphanumeric keys vary between keyboards. For example, there is a differ-
ence between key assignments on British and American keyboards (in particular,
above the 3 on the UK keyboard is the pound sign £, whilst on the US keyboard
there is a dollar sign $). The standard layout is also subject to variation in the place-
ment of brackets, backslashes and suchlike. In addition different national keyboards
include accented letters and the traditional French layout places the main letters in
different locations – the top line starts AZERTY.
Figure 2.3 The standard QWERTY keyboard
The QWERTY arrangement of keys is not optimal for typing, however. The
reason for the layout of the keyboard in this fashion can be traced back to the days
of mechanical typewriters. Hitting a key caused an arm to shoot towards the carriage,
imprinting the letter on the arm's head onto the ribbon and hence onto the paper. If two
arms flew towards the paper in quick succession from nearly the same angle,
they would often jam – the solution to this was to set out the keys so that common
combinations of consecutive letters were placed at different ends of the keyboard,
which meant that the arms would usually move from alternate sides. One appealing
story relating to the key layout is that it was also important for a salesman to be able
to type the word ‘typewriter’ quickly in order to impress potential customers: the
letters are all on the top row!
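The top-row claim is easy to verify for yourself. The short Python check below is purely illustrative (the function name and second example word are our own):

```python
TOP_ROW = set("qwertyuiop")

def on_top_row(word: str) -> bool:
    """True if every letter of the word lies on the QWERTY top row."""
    return set(word.lower()) <= TOP_ROW

print(on_top_row("typewriter"))  # True: t, y, p, e, w, r, i are all top-row keys
print(on_top_row("keyboard"))    # False
```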
The electric typewriter and now the computer keyboard are not subject to the
original mechanical constraints, but the QWERTY keyboard remains the dominant
layout. The reason for this is social – the vast base of trained typists would be reluct-
ant to relearn their craft, whilst the management is not prepared to accept an initial
lowering of performance whilst the new skills are gained. There is also a large invest-
ment in current keyboards, which would all have to be either replaced at great cost,
or phased out, with the subsequent requirement for people to be proficient on both
keyboards. As whole populations have become keyboard users this technological
inertia has probably become impossible to change.
How keyboards work
Current keyboards work by a keypress closing a connection, causing a character code to
be sent to the computer. The connection is usually via a lead, but wireless systems also exist. One
aspect of keyboards that is important to users is the ‘feel’ of the keys. Some keyboards require a
very hard press to operate the key, much like a manual typewriter, whilst others are featherlight.
The distance that the keys travel also affects the tactile nature of the keyboard. The keyboards that
are currently used on most notebook computers are ‘half-travel’ keyboards, where the keys travel
only a small distance before activating their connection; such a keyboard can feel dead to begin
with, but such qualitative judgments often change as people become more used to using it. By mak-
ing the actual keys thinner, and allowing them a much reduced travel, a lot of vertical space can be
saved on the keyboard, thereby making the machine slimmer than would otherwise be possible.
Some keyboards are even made of touch-sensitive buttons, which require a light touch and
practically no travel; they often appear as a sheet of plastic with the buttons printed on them.
Such keyboards are often found on shop tills, though the keys are not QWERTY, but specific to
the task. Being fully sealed, they have the advantage of being easily cleaned and resistant to dirty
environments, but have little feel, and are not popular with trained touch-typists. Feedback is
important even at this level of human–computer interaction! With the recent increase of repetit-
ive strain injury (RSI) to users’ fingers, and the increased responsibilities of employers in these
circumstances, it may be that such designs will enjoy a resurgence in the near future. RSI in fingers
is caused by the tendons that control the movement of the fingers becoming inflamed owing to
overuse and making repeated unnatural movements.
Ease of learning – alphabetic keyboard
One of the most obvious layouts to be produced is the alphabetic keyboard, in which
the letters are arranged alphabetically across the keyboard. It might be expected that
such a layout would make it quicker for untrained typists to use, but this is not the
case. Studies have shown that this keyboard is not faster for properly trained typists,
as we may expect, since there is no inherent advantage to this layout. And even for
novice or occasional users, the alphabetic layout appears to make very little differ-
ence to the speed of typing. These keyboards are used in some pocket electronic per-
sonal organizers, perhaps because the layout looks simpler to use than the QWERTY
one. Also, it dissuades people from attempting to use their touch-typing skills on a
very small keyboard and hence avoids criticisms of difficulty of use!
Ergonomics of use – DVORAK keyboard and split designs
The DVORAK keyboard uses a similar layout of keys to the QWERTY system, but
assigns the letters to different keys. Based upon an analysis of typing, the keyboard is
designed to help people reach faster typing speeds. It is biased towards right-handed
people, in that 56% of keystrokes are made with the right hand. The layout of the
keys also attempts to ensure that the majority of keystrokes alternate between hands,
thereby increasing the potential speed. By keeping the most commonly used keys on
the home, or middle, row, 70% of keystrokes are made without the typist having
to stretch far, thereby reducing fatigue and increasing keying speed. The layout also
There are a variety of specially shaped keyboards to relieve the strain of typing or to allow
people to type with some injury (e.g. RSI) or disability. These may slope the keys towards the
hands to improve the ergonomic position, be designed for single-handed use, or for no hands at
all. Some use bespoke key layouts to reduce strain of finger movements. The keyboard illustrated
is produced by PCD Maltron Ltd. for left-handed use. See www.maltron.com/
Source: www.maltron.com, reproduced courtesy of PCD Maltron Ltd.
aims to minimize the number of keystrokes made with the weak fingers. Many
of these requirements are in conflict, and the DVORAK keyboard represents one
possible solution. Experiments have shown that there is a speed improvement of
between 10 and 15%, coupled with a reduction in user fatigue due to the increased
ergonomic layout of the keyboard [230].
Other aspects of keyboard design have been altered apart from the layout of the
keys. A number of more ergonomic designs have appeared, in which the basic tilted
planar base of the keyboard is altered. Moderate designs curve the plane of the key-
board, making it concave, whilst more extreme ones split the keys into those for the
left and right hand and curve both halves separately. Often in these the keys are also
moved to bring them all within easy reach, to minimize movement between keys.
Such designs are supposed to aid comfort and reduce RSI by minimizing effort, but
have had practically no impact on the majority of systems sold.
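The home-row figures can be explored informally. The sketch below, under the assumption that a short English sample is representative enough, compares the fraction of letter keystrokes that fall on the home row under each layout (the sample text is arbitrary, and shift or punctuation keystrokes are ignored):

```python
QWERTY_HOME = set("asdfghjkl")
DVORAK_HOME = set("aoeuidhtns")

def home_row_fraction(text: str, home_row: set) -> float:
    """Fraction of letter keystrokes in the text that sit on the home row."""
    letters = [c for c in text.lower() if c.isalpha()]
    return sum(c in home_row for c in letters) / len(letters)

sample = ("the quick brown fox jumps over the lazy dog and then "
          "a short passage of ordinary english text for analysis")

# DVORAK puts the vowels and the commonest consonants on the home row,
# so its fraction comes out far higher than QWERTY's on typical English.
print(f"QWERTY home row: {home_row_fraction(sample, QWERTY_HOME):.0%}")
print(f"DVORAK home row: {home_row_fraction(sample, DVORAK_HOME):.0%}")
```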
2.2.2 Chord keyboards
Chord keyboards are significantly different from normal alphanumeric keyboards.
Only a few keys, four or five, are used (see Figure 2.4) and letters are produced by
pressing one or more of the keys at once. For example, in the Microwriter, the pat-
tern of multiple keypresses is chosen to reflect the actual letter shape.
Such keyboards have a number of advantages. They are extremely compact:
simply reducing the size of a conventional keyboard makes the keys too small and
close together, with a correspondingly large increase in the difficulty of using it. The
Figure 2.4 A very early chord keyboard (left) and its lettercodes (right)
learning time for the keyboard is supposed to be fairly short – of the order of a few
hours – but social resistance is still high. Moreover, they are capable of fast typing
speeds in the hands (or rather hand!) of a competent user. Chord keyboards can
also be used where only one-handed operation is possible, in cramped and confined
conditions.
Lack of familiarity means that these are unlikely ever to be a mainstream form of
text entry, but they do have applications in niche areas. In particular, courtroom
stenographers use a special form of two-handed chord keyboard and associated
shorthand to enter text at full spoken speed. Also it may be that the compact size and
one-handed operation will find a place in the growing wearables market.
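The chord idea can be sketched in a few lines. The chord table below is invented for illustration, not the Microwriter's actual letter-shape patterns:

```python
# Hypothetical chord table: each letter is a simultaneous combination of
# five keys numbered 0-4. Five keys give 2**5 - 1 = 31 non-empty
# combinations: more than enough for the alphabet.
CHORDS = {
    frozenset({0}): "a",
    frozenset({1}): "e",
    frozenset({0, 1}): "t",
    frozenset({2}): "n",
    frozenset({0, 2}): "s",
}

def decode(chord_sequence):
    """Translate a sequence of simultaneous keypress sets into text."""
    return "".join(CHORDS.get(frozenset(ch), "?") for ch in chord_sequence)

print(decode([{0, 1}, {1}, {0, 2}, {0, 1}]))  # → "test"
```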
DESIGN FOCUS
Numeric keypads
Alphanumeric keyboards (as the name suggests) include numbers as well as letters. In the QWERTY
layout these are in a line across the top of the keyboard, but in most larger keyboards there is also a
separate number pad to allow faster entry of digits. Number keypads occur in other contexts too,
including calculators, telephones and ATM cash dispensers. Many people are unaware that there are
two different layouts for numeric keypads: the calculator style that has ‘123’ on the bottom and the
telephone style that has ‘123’ at the top.
It is a demonstration of the amazing adaptability of humans that we move between these two styles
with such ease. However, if you need to include a numeric keypad in a device you must consider which
is most appropriate for your potential users. For example, computer keyboards use calculator-style
layout, as they are primarily used for entering numbers for calculations.
One of the authors was caught out by this once when he forgot the PIN number of his cash card. He
half remembered the digits, but also his fingers knew where to type, so he ‘practiced’ on his calculator.
Unfortunately ATMs use telephone-style layout!
(Figure: calculator, ATM and telephone keypads)
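The confusion in the anecdote can be reproduced mechanically. This sketch maps remembered finger positions on one layout onto the digits the other layout puts in those positions (a simplification that ignores the 0 key; the example PIN is made up):

```python
CALCULATOR = ["789", "456", "123"]  # '123' on the bottom row
TELEPHONE  = ["123", "456", "789"]  # '123' on the top row

def translate(digits: str, learned_on, typed_on) -> str:
    """What you actually enter when your fingers learned key positions on
    one layout but the device in front of you uses the other."""
    pos = {d: (r, c) for r, row in enumerate(learned_on)
                     for c, d in enumerate(row)}
    return "".join(typed_on[pos[d][0]][pos[d][1]] for d in digits)

# A PIN 'practiced' on a calculator comes out differently at the ATM:
print(translate("1485", CALCULATOR, TELEPHONE))  # → "7425"
```

Only the middle row ('456') survives the swap unchanged, which is why the mistake is so easy to make.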
2.2.3 Phone pad and T9 entry
With mobile phones being used for SMS text messaging (see Chapter 19) and WAP
(see Chapter 21), the phone keypad has become an important form of text input.
Unfortunately a phone only has digits 0–9, not a full alphanumeric keyboard.
To overcome this for text input the numeric keys are usually pressed several times
– Figure 2.5 shows a typical mapping of digits to letters. For example, the 3 key has
‘def’ on it. If you press the key once you get a ‘d’, if you press 3 twice you get an ‘e’,
if you press it three times you get an ‘f’. The main number-to-letter mapping is stand-
ard, but punctuation and accented letters differ between phones. Also there needs to
be a way for the phone to distinguish, say, the ‘dd’ from ‘e’. On some phones you
need to pause for a short period between successive letters using the same key, for
others you press an additional key (e.g. ‘#’).
Most phones have at least two modes for the numeric buttons: one where the keys
mean the digits (for example when entering a phone number) and one where they
mean letters (for example when typing an SMS message). Some have additional
modes to make entering accented characters easier. Also a special mode or setting is
needed for capital letters although many phones use rules to reduce this, for ex-
ample automatically capitalizing the initial letter in a message and letters following
full stops, question marks and exclamation marks.
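The multi-tap scheme just described can be captured in a small decoder. Here a space stands for the pause that separates two letters entered on the same key; real phones vary in how they mark that pause:

```python
KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

def multitap(taps: str) -> str:
    """Decode a multi-tap key sequence: pressing a key n times selects the
    nth letter on it; a space marks the pause between same-key letters."""
    out = []
    for group in taps.split(" "):
        i = 0
        while i < len(group):
            key = group[i]
            n = 1
            while i + n < len(group) and group[i + n] == key:
                n += 1  # count the run of repeated presses of this key
            letters = KEYS[key]
            out.append(letters[(n - 1) % len(letters)])
            i += n
    return "".join(out)

print(multitap("4433555 555666"))  # → "hello"
```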
This is all very laborious and, as we will see in Chapter 19, experienced mobile
phone users make use of a highly developed shorthand to reduce the number of
keystrokes. If you watch a teenager or other experienced txt-er, you will see they
Figure 2.5 Mobile phone keypad. Source: Photograph by Alan Dix (Ericsson phone)
Typical key mapping:
1 – space, comma, etc. (varies)
2 – a b c
3 – d e f
4 – g h i
5 – j k l
6 – m n o
7 – p q r s
8 – t u v
9 – w x y z
0 – +, &, etc.
often develop great typing speed holding the phone in one hand and using only
their thumb. As these skills spread through society it may be that future devices
use this as a means of small format text input. For those who never develop this
physical dexterity some phones have tiny plug-in keyboards, or come with fold-out
keyboards.
Another technical solution to the problem is the T9 algorithm. This uses a large
dictionary to disambiguate words by simply typing the relevant letters once. For
example, ‘3926753’ becomes ‘example’ as there is only one word with letters that
match (alternatives like ‘ewbosld’ that also match are not real words). Where there
are ambiguities such as ‘26’, which could be an ‘am’ or an ‘an’, the phone gives a
series of options to choose from.
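A minimal sketch of the dictionary lookup at the heart of T9 follows. The tiny word list is a stand-in for the large dictionary a real phone ships with, and real T9 also orders ambiguous alternatives by word frequency:

```python
KEY_OF = {c: d for d, letters in {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}.items() for c in letters}

def to_digits(word: str) -> str:
    """The key sequence a word requires: one press per letter."""
    return "".join(KEY_OF[c] for c in word.lower())

DICTIONARY = ["example", "am", "an", "hello"]

def t9(digits: str):
    """All dictionary words whose key sequence matches the digits typed."""
    return [w for w in DICTIONARY if to_digits(w) == digits]

print(t9("3926753"))  # → ['example']
print(t9("26"))       # → ['am', 'an']  (ambiguous: the user must choose)
```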
2.2.4 Handwriting recognition
Handwriting is a common and familiar activity, and is therefore attractive as a
method of text entry. If we were able to write as we would when we use paper, but
with the computer taking this form of input and converting it to text, we can see that
it is an intuitive and simple way of interacting with the computer. However, there are
a number of disadvantages with handwriting recognition. Current technology is
still fairly inaccurate and so makes a significant number of mistakes in recognizing
letters, though it has improved rapidly. Moreover, individual differences in hand-
writing are enormous, and make the recognition process even more difficult. The
most significant information in handwriting is not in the letter shape itself but in the
stroke information – the way in which the letter is drawn. This means that devices
which support handwriting recognition must capture the stroke information, not
just the final character shape. Because of this, online recognition is far easier than
reading handwritten text on paper. Further complications arise because letters
within words are shaped and often drawn very differently depending on the actual
word; the context can help determine the letter’s identity, but is often unable to pro-
vide enough information. Handwriting recognition is covered in more detail later in
the book, in Chapter 10. More serious in many ways is the limitation on speed; it is
difficult to write at more than 25 words a minute, which is no more than half the
speed of a decent typist.
The different nature of handwriting means that we may find it more useful in
situations where a keyboard-based approach would have its own problems. Such
situations will invariably result in completely new systems being designed around
the handwriting recognizer as the predominant mode of textual input, and these
may bear very little resemblance to the typical system. Pen-based systems that use
handwriting recognition are actively marketed in the mobile computing market,
especially for smaller pocket organizers. Such machines are typically used for taking
notes and jotting down and sketching ideas, as well as acting as a diary, address book
and organizer. Using handwriting recognition has many advantages over using a
keyboard. A pen-based system can be small and yet still accurate and easy to use,
whereas small keys become very tiring, or even impossible, to use accurately. Also the
pen-based approach does not have to be altered when we move from jotting down
text to sketching diagrams; pen-based input is highly appropriate for this also.
Some organizer designs have dispensed with a keyboard completely. With such
systems one must consider all sorts of other ways to interact with the system that are
not character based. For example, we may decide to use gesture recognition, rather
than commands, to tell the system what to do, for example drawing a line through a
word in order to delete it. The important point is that a different input device that
was initially considered simply as an alternative to the keyboard opens up a whole
host of alternative interface designs and different possibilities for interaction.
Signature authentication
Handwriting recognition is difficult principally because of the great differences between dif-
ferent people’s handwriting. These differences can be used to advantage in signature authentication
where the purpose is to identify the user rather than read the signature. Again this is far easier
when we have stroke information as people tend to produce signatures which look slightly differ-
ent from one another in detail, but are formed in a similar fashion. Furthermore, a forger who has
a copy of a person’s signature may be able to copy the appearance of the signature, but will not
be able to reproduce the pattern of strokes.
2.2.5 Speech recognition
Speech recognition is a promising area of text entry, but it has been promising for a
number of years and is still only used in very limited situations. There is a natural
enthusiasm for being able to talk to the machine and have it respond to commands,
since this form of interaction is one with which we are very familiar. Successful
recognition rates of over 97% have been reported, but since this represents one let-
ter in error in approximately every 30, or one spelling mistake every six or so words,
this is stoll unacceptible (sic)! Note also that this performance is usually quoted only
for a restricted vocabulary of command words. Trying to extend such systems to the
level of understanding natural language, with its inherent vagueness, imprecision
and pauses, opens up many more problems that have not been satisfactorily solved
even for keyboard-entered natural language. Moreover, since every person speaks
differently, the system has to be trained and tuned to each new speaker, or its per-
formance decreases. Strong accents, a cold or emotion can also cause recognition
problems, as can background noise. This leads us on to the question of practicality
within an office environment: not only may the background level of noise cause
errors, but if everyone in an open-plan office were to talk to their machine, the level
of noise would dramatically increase, with associated difficulties. Confidentiality
would also be harder to maintain.
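The arithmetic behind those figures is worth making explicit. The five-letters-per-word average is an assumption about typical English text:

```python
accuracy = 0.97            # per-letter recognition rate quoted above
letters_per_word = 5       # assumed average English word length

errors_per_letter = 1 - accuracy
letters_per_error = 1 / errors_per_letter
words_per_error = letters_per_error / letters_per_word

print(f"one letter wrong every {letters_per_error:.0f} letters")  # ~33
print(f"one misspelt word every {words_per_error:.1f} words")     # ~6.7
```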
Despite its problems, speech technology has found niche markets: telephone
information systems, access for the disabled, in hands-occupied situations (especially
2.3 Positioning, pointing and drawing 71
military) and for those suffering RSI. This is discussed in greater detail in Chapter 10,
but we can see that it offers three possibilities. The first is as an alternative text entry
device to replace the keyboard within an environment and using software originally
designed for keyboard use. The second is to redesign a system, taking full advantage
of the benefits of the technique whilst minimizing the potential problems. Finally, it
can be used in areas where keyboard-based input is impractical or impossible. It is in
the latter, more radical areas that speech technology is currently achieving success.
2.3 POSITIONING, POINTING AND DRAWING
Central to most modern computing systems is the ability to point at something on
the screen and thereby manipulate it, or perform some function. There has been a
long history of such devices, in particular in computer-aided design (CAD), where
positioning and drawing are the major activities. Pointing devices allow the user to
point, position and select items, either directly or by manipulating a pointer on the
screen. Many pointing devices can also be used for free-hand drawing although the
skill of drawing with a mouse is very different from using a pencil. The mouse is still
most common for desktop computers, but is facing challenges as laptop and hand-
held computing increase their market share. Indeed, these words are being typed on
a laptop with a touchpad and no mouse.
2.3.1 The mouse
The mouse has become a major component of the majority of desktop computer sys-
tems sold today, and is the little box with the tail connecting it to the machine in our
basic computer system picture (Figure 2.1). It is a small, palm-sized box housing a
weighted ball – as the box is moved over the tabletop, the ball is rolled by the table
and so rotates inside the housing. This rotation is detected by small rollers that are
in contact with the ball, and these adjust the values of potentiometers. If you remove
the ball occasionally to clear dust you may be able to see these rollers. The changing
values of these potentiometers can be directly related to changes in position of the
ball. The potentiometers are aligned in different directions so that they can detect
both horizontal and vertical motion. The relative motion information is passed
to the computer via a wire attached to the box, or in some cases using wireless or
infrared, and moves a pointer on the screen, called the cursor. The whole arrange-
ment tends to look rodent-like, with the box acting as the body and the wire as the
tail; hence the term ‘mouse’. In addition to detecting motion, the mouse has typically
one, two or three buttons on top. These are used to indicate selection or to initiate
action. Single-button mice tend to have similar functionality to multi-button mice,
and achieve this by instituting different operations for a single and a double button
click. A ‘double-click’ is when the button is pressed twice in rapid succession. Multi-
button mice tend to allocate one operation to each particular button.
The mouse operates in a planar fashion, moving around the desktop, and is an
indirect input device, since a transformation is required to map from the horizontal
nature of the desktop to the vertical alignment of the screen. Left–right motion is
directly mapped, whilst up–down on the screen is achieved by moving the mouse
away–towards the user. The mouse only provides information on the relative move-
ment of the ball within the housing: it can be physically lifted up from the desktop
and replaced in a different position without moving the cursor. This offers the
advantage that less physical space is required for the mouse, but suffers from being
less intuitive for novice users. Since the mouse sits on the desk, moving it about is
easy and users suffer little arm fatigue, although the indirect nature of the medium
can lead to problems with hand–eye coordination. However, a major advantage of
the mouse is that the cursor itself is small, and it can be easily manipulated without
obscuring the display.
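The relative nature of the device can be captured in a few lines. The screen dimensions and the clamp-at-the-edge behavior below are illustrative assumptions:

```python
def move_cursor(pos, deltas, width=1024, height=768):
    """Accumulate relative (dx, dy) motion reports into a cursor position,
    clamped to the screen: a relative device like the mouse only ever
    reports movement, never an absolute location."""
    x, y = pos
    for dx, dy in deltas:
        x = min(max(x + dx, 0), width - 1)
        y = min(max(y + dy, 0), height - 1)
    return (x, y)

# Lifting the mouse and putting it down elsewhere produces no deltas,
# so the cursor stays put; only rolling the ball moves it.
print(move_cursor((100, 100), [(5, 0), (0, -3), (5, 0)]))  # → (110, 97)
```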
The mouse was developed around 1964 by Douglas C. Engelbart, and a photo-
graph of the first prototype is shown in Figure 2.6. This used two wheels that
slid across the desktop and transmitted x,y coordinates to the computer. The housing was carved in wood, and has been damaged, exposing one of the wheels. The
original design actually offers a few advantages over today’s more sleek versions:
by tilting it so that only one wheel is in contact with the desk, pure vertical or hori-
zontal motion can be obtained. Also, the problem of getting the cursor across the
large screens that are often used today can be solved by flicking your wrist to get
the horizontal wheel spinning. The mouse pointer then races across the screen with
no further effort on your part, until you stop it at its destination by dropping the
mouse down onto the desktop.
Figure 2.6 The first mouse. Photograph courtesy of Douglas Engelbart and
Bootstrap Institute
2.3 Positioning, pointing and drawing 73
Although most mice are hand operated, not all are – there have been experiments
with a device called the footmouse. As the name implies, it is a foot-operated device,
although more akin to an isometric joystick than a mouse. The cursor is moved by
foot pressure on one side or the other of a pad. This allows one to dedicate hands to
the keyboard. A rare device, the footmouse has not found common acceptance!
Interestingly, foot pedals are used heavily in musical instruments including pianos,
electric guitars, organs and drums and also in mechanical equipment including cars,
cranes, sewing machines and industrial controls. So it is clear that in principle this is
a good idea. Two things seem to have limited their use in computer equipment
(except simulators and games). One is the practicality of having foot controls in the
work environment: pedals under a desk may be operated accidentally, laptops with
foot pedals would be plain awkward. The second issue is the kind of control being
exercised. Pedals in physical interfaces are used predominantly to control one or
more single-dimensional analog controls. It may be that in more specialized interfaces
appropriate foot-operated controls could be more commonly and effectively used.
2.3.2 Touchpad
Touchpads are touch-sensitive tablets usually around 2–3 inches (50–75 mm)
square. They were first used extensively in Apple Powerbook portable computers but
are now used in many other notebook computers and can be obtained separately to
replace the mouse on the desktop. They are operated by stroking a finger over their
surface, rather like using a simulated trackball. The feel is very different from other
input devices, but as with all devices users quickly get used to the action and become
proficient.
Because they are small it may require several strokes to move the cursor across the
screen. This can be improved by using acceleration settings in the software linking
the trackpad movement to the screen movement. Rather than having a fixed ratio of
pad distance to screen distance, this varies with the speed of movement.

Optical mice

Optical mice work differently from mechanical mice. A light-emitting diode emits a weak
red light from the base of the mouse. This is reflected off a special pad with a metallic grid-like
pattern upon which the mouse has to sit, and the fluctuations in reflected intensity as the mouse
is moved over the gridlines are recorded by a sensor in the base of the mouse and translated into
relative x, y motion. Some optical mice do not require special mats, just an appropriate surface,
and use the natural texture of the surface to detect movement. The optical mouse is less
susceptible to dust and dirt than the mechanical one in that its mechanism is less likely to become
blocked up. However, for those that rely on a special mat, if the mat is not properly aligned,
movement of the mouse may become erratic – especially difficult if you are working with
someone and pass the mouse back and forth between you.

If the finger moves slowly over the pad then the pad movements map to small distances on the
screen. If the finger is moving quickly the same distance on the touchpad moves the
cursor a long distance. For example, on the trackpad being used when writing this
section a very slow movement of the finger from one side of the trackpad to the other
moves the cursor less than 10% of the width of the screen. However, if the finger is
moved very rapidly from side to side, the cursor moves the whole width of the screen.
In fact, this form of acceleration setting is also used in other indirect positioning
devices including the mouse. Fine settings of this sort of parameter make a great
difference to the ‘feel’ of the device.
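The variable-gain behavior described above can be sketched in a few lines of Python. The constants here are purely illustrative inventions, not those of any real driver: the point is only that the pixels-per-millimetre gain grows with finger speed, up to some cap.

```python
def cursor_delta(pad_delta_mm, speed_mm_per_s,
                 base_gain=4.0, accel=0.02, max_gain=40.0):
    """Map a touchpad movement to a cursor movement in pixels.

    The gain (pixels per millimetre of finger travel) grows with the
    speed of the finger, so slow strokes give fine control while fast
    flicks cross the whole screen.  All constants are illustrative.
    """
    gain = min(base_gain + accel * speed_mm_per_s, max_gain)
    return pad_delta_mm * gain

# A slow stroke across a 60 mm pad moves the cursor only a short way...
slow = cursor_delta(60, speed_mm_per_s=20)      # about 264 pixels
# ...while a fast flick over the same distance is capped at the top gain.
fast = cursor_delta(60, speed_mm_per_s=2000)    # 60 * 40 = 2400 pixels
```

Tuning `base_gain`, `accel` and `max_gain` is exactly the ‘fine settings’ adjustment mentioned above: small changes to these numbers noticeably alter the feel of the device.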
2.3.3 Trackball and thumbwheel
The trackball is really just an upside-down mouse! A weighted ball faces upwards and
is rotated inside a static housing, the motion being detected in the same way as for
a mechanical mouse, and the relative motion of the ball moves the cursor. Because
of this, the trackball requires no additional space in which to operate, and is there-
fore a very compact device. It is an indirect device, and requires separate buttons
for selection. It is fairly accurate, but is hard to draw with, as long movements are
difficult. Trackballs now appear in a wide variety of sizes, the most usual being about
the same as a golf ball, with a number of larger and smaller devices available. The size
and ‘feel’ of the trackball itself affords significant differences in the usability of the
device: its weight, rolling resistance and texture all contribute to the overall effect.
Some of the smaller devices have been used in notebook and portable computers,
but more commonly trackpads or nipples are used. They are often sold as altern-
atives to mice on desktop computers, especially for RSI sufferers. They are also
heavily used in video games where their highly responsive behavior, including being
able to spin the ball, is ideally suited to the demands of play.
Thumbwheels are different in that they have two orthogonal dials to control the
cursor position. Such a device is very cheap, but slow, and it is difficult to manipu-
late the cursor in any way other than horizontally or vertically. This limitation can
sometimes be a useful constraint in the right application. For instance, in CAD the
designer is almost always concerned with exact verticals and horizontals, and a
device that provides such constraints is very useful, which accounts for the appear-
ance of thumbwheels in CAD systems. Another successful application for such a
device has been in a drawing game such as Etch-a-Sketch in which straight lines can
be created on a simple screen, since the predominance of straight lines in simple
drawings means that the motion restrictions are an advantage rather than a handi-
cap. However, if you were to try to write your signature using a thumbwheel, the
limitations would be all too apparent. The appropriateness of the device depends on
the task to be performed.
Although two-axis thumbwheels are not heavily used in mainstream applications,
single thumbwheels are often included on a standard mouse in order to offer an
alternative means to scroll documents. Normally scrolling requires you to grab the
scroll bar with the mouse cursor and drag it down. For large documents it is hard to
be accurate and in addition the mouse dragging is done holding a finger down which
adds to hand strain. In contrast the small scroll wheel allows comparatively intuitive
and fast scrolling, simply rotating the wheel to move the page.
2.3.4 Joystick and keyboard nipple
The joystick is an indirect input device, taking up very little space. Consisting of a
small palm-sized box with a stick or shaped grip sticking up from it, the joystick is a
simple device with which movements of the stick cause a corresponding movement
of the screen cursor. There are two types of joystick: the absolute and the isometric.
In the absolute joystick, movement is the important characteristic, since the position
of the joystick in the base corresponds to the position of the cursor on the screen.
In the isometric joystick, the pressure on the stick corresponds to the velocity of
the cursor, and when released, the stick returns to its usual upright centered position.
This type of joystick is also called the velocity-controlled joystick, for obvious
reasons. The buttons are usually placed on the top of the stick, or on the front like a
trigger. Joysticks are inexpensive and fairly robust, and for this reason they are often
found in computer games. Another reason for their dominance of the games market
is their relative familiarity to users, and their likeness to aircraft joysticks: aircraft are
a favorite basis for games, leading to familiarity with the joystick that can be used
for more obscure entertainment ideas.
A smaller device but with the same basic characteristics is used on many laptop
computers to control the cursor. Some older systems had a variant of this called the
keymouse, which was a single key. More commonly a small rubber nipple projects from
the center of the keyboard and acts as a tiny isometric joystick. It is usually difficult
for novices to use, but this seems to be related to fine adjustment of the speed set-
tings. Like the joystick the nipple controls the rate of movement across the screen
and is thus less direct than a mouse or stylus.
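The rate-control mapping shared by the isometric joystick and the nipple can be illustrated with a short Python sketch. The gain constant is an invented, illustrative value: the essential idea is that applied force sets cursor *velocity*, so cursor position is the integral of force over time.

```python
def integrate_rate_control(force_samples, gain=250.0, dt=0.01):
    """Velocity (rate) control: stick force sets cursor speed, so the
    cursor position is the running integral of force over time.  The
    gain (pixels per second per unit force) is purely illustrative.
    """
    position = 0.0
    for force in force_samples:
        position += gain * force * dt   # velocity * timestep
    return position

# Holding a steady half-unit force for one second (100 samples at 10 ms)
# drifts the cursor about 125 pixels; releasing the stick stops it dead.
drift = integrate_rate_control([0.5] * 100)
```

This is why speed settings matter so much for novices: the gain determines how violently the cursor responds to small, unpracticed pressures.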
2.3.5 Touch-sensitive screens (touchscreens)
Touchscreens are another method of allowing the user to point and select objects
on the screen, but they are much more direct than the mouse, as they detect the
presence of the user’s finger, or a stylus, on the screen itself. They work in one of
a number of different ways: by the finger (or stylus) interrupting a matrix of light
beams, or by capacitance changes on a grid overlaying the screen, or by ultrasonic
reflections. Because the user indicates exactly which item is required by pointing to
it, no mapping is required and therefore this is a direct device.
The touchscreen is very fast, and requires no specialized pointing device. It is
especially good for selecting items from menus displayed on the screen. Because
the screen acts as an input device as well as an output device, there is no separate
hardware to become damaged or destroyed by dirt; this makes touchscreens suitable
for use in hostile environments. They are also relatively intuitive to use and have
been used successfully as an interface to information systems for the general public.
They suffer from a number of disadvantages, however. Using the finger to point is
not always suitable, as it can leave greasy marks on the screen, and, being a fairly
blunt instrument, it is quite inaccurate. This means that the selection of small
regions is very difficult, as is accurate drawing. Moreover, lifting the arm to point to
a vertical screen is very tiring, and also means that the screen has to be within about
a meter of the user to enable it to be reached, which can make it too close for com-
fort. Research has shown that the optimal angle for a touchscreen is about 15 degrees
up from the horizontal.
2.3.6 Stylus and light pen
For more accurate positioning (and to avoid greasy screens), systems with touch-
sensitive surfaces often employ a stylus. Instead of pointing at the screen directly a
small pen-like plastic stick is used to point and draw on the screen. Styluses are
particularly popular on PDAs, but are also used in some laptop computers.
An older technology that is used in the same way is the light pen. The pen is con-
nected to the screen by a cable and, in operation, is held to the screen and detects
a burst of light from the screen phosphor during the display scan. The light pen
can therefore address individual pixels and so is much more accurate than the
touchscreen.
Both stylus and light pen can be used for fine selection and drawing, but both
can be tiring to use on upright displays and are harder to take up and put down
when used together with a keyboard. Interestingly some users of PDAs with fold-out
keyboards learn to hold the stylus outwards between their fingers so that they
can type whilst holding it. As it is unattached the stylus can easily get lost, but a
closed pen can be used in emergencies.
Stylus, light pen and touchscreen are all very direct in that the relationship
between the device and the thing selected is immediate. In contrast, mouse, touch-
pad, joystick and trackball all have to map movements on the desk to cursor move-
ment on the screen.
However, the direct devices suffer from the problem that, in use, the act of point-
ing actually obscures the display, making it harder to use, especially if complex
detailed selections or movements are required in rapid succession. This means that
screen designs have to take into account where the user’s hand will be. For example,
you may want to place menus at the bottom of the screen rather than the top. Also
you may want to offer alternative layouts for right-handed and left-handed users.
2.3.7 Digitizing tablet
The digitizing tablet is a more specialized device typically used for freehand drawing,
but may also be used as a mouse substitute. Some highly accurate tablets, usually
using a puck (a mouse-like device), are used in special applications such as digitizing
information for maps.
The tablet provides positional information by measuring the position of some
device on a special pad, or tablet, and can work in a number of ways. The resistive
tablet detects point contact between two separated conducting sheets. It has
advantages in that it can be operated without a specialized stylus – a pen or the user’s finger
is sufficient. The magnetic tablet detects current pulses in a magnetic field using a
small loop coil housed in a special pen. There are also capacitative and electrostatic
tablets that work in a similar way. The sonic tablet is similar to the above but requires
no special surface. An ultrasonic pulse is emitted by a special pen which is detected
by two or more microphones which then triangulate the pen position. This device
can be adapted to provide 3D input, if required.
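The triangulation step for the sonic tablet can be sketched as follows. This is illustrative geometry only (a real tablet must also calibrate for the speed of sound and sensor latency): with two microphones at known positions, the pen lies at the intersection of two circles whose radii are the time-of-flight distances.

```python
import math

def pen_position(d1, d2, mic_separation):
    """Triangulate a sonic-tablet pen from two time-of-flight distances.

    Microphone 1 sits at the origin and microphone 2 at
    (mic_separation, 0); d1 and d2 are the pen's distances from each,
    derived from the ultrasonic pulse's travel time.  Intersecting the
    two circles yields the pen position, taking the solution in front
    of the microphones.
    """
    x = (d1 ** 2 - d2 ** 2 + mic_separation ** 2) / (2 * mic_separation)
    y_squared = d1 ** 2 - x ** 2
    if y_squared < 0:
        raise ValueError("distances are inconsistent")
    return x, math.sqrt(y_squared)
```

Adding a third microphone out of the plane turns the same circle-intersection idea into sphere intersection, which is how the device can be adapted for 3D input.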
Digitizing tablets are capable of high resolution, and are available in a range of
sizes. Sampling rates vary, affecting the resolution of cursor movement, which gets
progressively finer as the sampling rate increases. The digitizing tablet can be used to
detect relative motion or absolute motion, but is an indirect device since there is a
mapping from the plane of operation of the tablet to the screen. It can also be used
for text input; if supported by character recognition software, handwriting can be
interpreted. Problems with digitizing tablets are that they require a large amount of
desk space, and may be awkward to use if displaced to one side by the keyboard.
2.3.8 Eyegaze
Eyegaze systems allow you to control the computer by simply looking at it! Some sys-
tems require you to wear special glasses or a small head-mounted box, others are
built into the screen or sit as a small box below the screen. A low-power laser is shone
into the eye and is reflected off the retina. The reflection changes as the angle of the
eye alters, and by tracking the reflected beam the eyegaze system can determine the
direction in which the eye is looking. The system needs to be calibrated, typically by
staring at a series of dots on the screen, but thereafter can be used to move the screen
cursor or for other more specialized uses. Eyegaze is a very fast and accurate device,
but the more accurate versions can be expensive. It is fine for selection but not for
drawing since the eye does not move in smooth lines. Also in real applications it
can be difficult to distinguish deliberately gazing at something and accidentally
glancing at it.
Such systems have been used in military applications, notably for guiding air-to-
air missiles to their targets, but are starting to find more peaceable uses, for disabled
users and for workers in environments where it is impossible for them to use their
hands. The rarity of the eyegaze is due partly to its novelty and partly to its expense,
and it is usually found only in certain domain-specific applications. Within HCI it is
particularly useful as part of evaluation as one is able to trace exactly where the user
is looking [81]. As prices drop and the technology becomes less intrusive we may see
more applications using eyegaze, especially in virtual reality and augmented reality
areas (see Chapter 20).
2.3.9 Cursor keys and discrete positioning
All of the devices we have discussed are capable of giving near continuous 2D
positioning, with varying degrees of accuracy. For many applications we are only
interested in positioning within a sequential list such as a menu or amongst 2D cells
as in a spreadsheet. Even for moving within text discrete up/down left/right keys can
sometimes be preferable to using a mouse.
Cursor keys are available on most keyboards. Four keys on the keyboard are used
to control the cursor, one each for up, down, left and right. There is no standardized
layout for the keys. Some layouts are shown in Figure 2.7, but the most common now
is the inverted ‘T’.
Cursor keys used to be more heavily used in character-based systems before
windows and mice were the norm. However, when logging into remote machines
such as web servers, the interface is often a virtual character-based terminal within a
telnet window. In such applications it is common to find yourself in a 1970s world
of text editors controlled sometimes using cursor keys and sometimes by more
arcane combinations of control keys!
Small devices such as mobile phones, personal entertainment and television
remote controls often require discrete control, either dedicated to a particular func-
tion such as volume, or for use as general menu selection. Figure 2.8 shows examples
of these. The satellite TV remote control has dedicated ‘+/–’ buttons for controlling
volume and stepping between channels. It also has a central cursor pad that is used
for on-screen menus. The mobile phone has a single central joystick-like device.
This can be pushed left/right, up/down to navigate within the small 3 × 3 array of
graphical icons as well as select from text menus.
2.4 DISPLAY DEVICES
The vast majority of interactive computer systems would be unthinkable without
some sort of display screen, although systems without one do exist, usually
in specialized applications only. Thinking beyond the traditional, systems such as
cars, hi-fis and security alarms all have different outputs from those expressible on a
screen, but in the personal computer and workstation market, screens are pervasive.
Figure 2.7 Various cursor key layouts
In this section, we discuss the standard computer display in detail, looking at the
properties of bitmap screens, at different screen technologies, at large and situated
displays, and at a new technology, ‘digital paper’.
2.4.1 Bitmap displays – resolution and color
Virtually all computer displays are based on some sort of bitmap. That is, the display
is made up of vast numbers of colored dots or pixels in a rectangular grid. These pixels
may be limited to black and white (for example, the small display on many TV
remote controls), grayscale, or full color.
Figure 2.8 Satellite TV remote control and mobile phone. Source: Photograph left by Alan Dix with
permission from British Sky Broadcasting Limited, photograph right by Alan Dix (Ericsson phone)
The color or, for monochrome screens, the intensity at each pixel is held by the
computer’s video card. One bit per pixel can store on/off information, and hence only
black and white (the term ‘bitmap’ dates from such displays). More bits per pixel
give rise to more color or intensity possibilities. For example, 8 bits/pixel give rise to
2^8 = 256 possible colors at any one time. The set of colors makes up what is called the
colormap, and the colormap can be altered at any time to produce a different set of
colors. The system is therefore capable of actually displaying many more than the
number of colors in the colormap, but not simultaneously. Most desktop computers
now use 24 or 32 bits per pixel which allows virtually unlimited colors, but devices such
as mobile phones and PDAs are often still monochrome or have limited color range.
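The arithmetic relating bits per pixel to the number of colors, and to the video memory a frame occupies, is simple enough to check directly. A small Python sketch (the function names are our own, purely for illustration):

```python
def colors(bits_per_pixel):
    """Number of distinct values one pixel can take at a given depth."""
    return 2 ** bits_per_pixel

def framebuffer_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one full frame at that depth."""
    return width * height * bits_per_pixel // 8

colors(1)                          # 2  -- black and white
colors(8)                          # 256 colormap entries
colors(24)                         # 16,777,216 -- effectively unlimited
framebuffer_bytes(1024, 768, 8)    # 786,432 bytes, under a megabyte
```

The last line shows why early video cards used colormaps at all: an 8-bit indexed frame needs a third of the memory of a 24-bit true-color one at the same resolution.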
As well as the number of colors that can be displayed at each pixel, the other measure
that is important is the resolution of the screen. Actually the word ‘resolution’ is used
in a confused (and confusing!) way for screens. There are two numbers to consider:
- the total number of pixels: in standard computer displays this is always in a 4:3
ratio, perhaps 1024 pixels across by 768 down, or 1600 × 1200; for PDAs this will
be more in the order of a few hundred pixels in each direction.
- the density of pixels: this is measured in pixels per inch. Unlike printers (see
Section 2.7 below), this density varies little, between 72 and 96 pixels per inch.
To add to the confusion, a monitor, liquid crystal display (LCD) screen or other
display device will quote its maximum resolution, but the computer may actually
give it less than this. For example, the screen may be a 1200 × 900 resolution with 96
pixels per inch, but the computer only sends it 800 × 600. In the case of a cathode ray
tube (CRT) this typically will mean that the image is stretched over the screen sur-
face giving a lower density of 64 pixels per inch. An LCD screen cannot change its
pixel size so it would keep 96 pixels per inch and simply not use all its screen space,
adding a black border instead. Some LCD projectors will try to stretch or reduce
what they are given, but this may mean that one pixel gets stretched to two, or two
pixels get ‘squashed’ into one, giving rise to display ‘artifacts’ such as thin lines dis-
appearing, or uniform lines becoming alternately thick or thin.
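The CRT stretching figure quoted above is easy to verify. A two-line Python sketch of the calculation (using the dimensions from the example in the text):

```python
def effective_ppi(native_px, native_ppi, sent_px):
    """Density of the displayed image when a CRT stretches a smaller
    picture across its full physical width."""
    screen_inches = native_px / native_ppi   # physical width of the screen
    return sent_px / screen_inches

# A 1200-pixel-wide screen at 96 ppi is 12.5 inches across; stretching
# an 800-pixel image over it gives the 64 ppi quoted in the text.
effective_ppi(1200, 96, 800)   # 64.0 pixels per inch
```

An LCD, by contrast, has fixed physical pixels, which is why it must letterbox with a black border (or resample, with the artifacts described above) rather than change its density.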
Although horizontal and vertical lines can be drawn perfectly on bitmap screens,
and lines at 45 degrees reproduce reasonably well, lines at any other angle and curves
have ‘jaggies’, rough edges caused by the attempt to approximate the line with pixels.
When using a single color jaggies are inevitable. Similar effects are seen in bitmap
fonts. The problem of jaggies can be reduced by using high-resolution screens, or by
a technique known as anti-aliasing. Anti-aliasing softens the edges of line segments,
blurring the discontinuity and making the jaggies less obvious.
Look at the two images in Figure 2.9 with your eyes slightly screwed up. See how
the second anti-aliased line looks better. Of course, screen resolution is much higher,
but the same principle holds true. The reason this works is because our brains are
constantly ‘improving’ what we see in the world: processing and manipulating the
raw sensations of the rods and cones in our eyes and turning them into something
meaningful. Often our vision is blurred because of poor light, things being out of
focus, or defects in our vision. Our brain compensates and tidies up blurred images.
By deliberately blurring the image, anti-aliasing triggers this processing in our brain
and we appear to see a smooth line at an angle.
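The principle behind anti-aliasing can be shown with a toy calculation. Real renderers integrate the actual area of the pixel covered by the line; the linear falloff below is our own simplification, just to illustrate how intermediate gray levels arise near the edge of a line.

```python
import math

def aa_intensity(px, py, x0, y0, x1, y1):
    """Toy coverage-style anti-aliasing for a one-pixel-wide line.

    A pixel's ink level falls off linearly with its perpendicular
    distance from the ideal line, softening the edge instead of
    stepping it.
    """
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    # perpendicular distance from the pixel centre to the line
    dist = abs(dy * (px - x0) - dx * (py - y0)) / length
    return max(0.0, 1.0 - dist)   # 1 = on the line, fading to 0

aa_intensity(2, 1, 0, 0, 10, 5)   # 1.0 -- pixel centre lies on the line
aa_intensity(2, 2, 0, 0, 10, 5)   # about 0.11 -- a pale 'halo' pixel
```

It is exactly these pale halo pixels that the brain's tidying-up machinery fuses into the impression of a smooth line.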
2.4.2 Technologies
Cathode ray tube
The cathode ray tube is the television-like computer screen still most common as
we write this, but rapidly being displaced by flat LCD screens. It works in a similar
way to a standard television screen. A stream of electrons is emitted from an electron
gun, which is then focussed and directed by magnetic fields. As the beam hits the
phosphor-coated screen, the phosphor is excited by the electrons and glows (see
Figure 2.10). The electron beam is scanned from left to right, and then flicked back
to rescan the next line, from top to bottom. This is repeated at about 30 Hz (that is,
30 frames a second), although higher scan rates are sometimes used to
reduce the flicker on the screen. Another way of reducing flicker is to use interlacing,
in which the odd lines on the screen are all scanned first, followed by the even lines.
Using a high-persistence phosphor, which glows for a longer time when excited, also
reduces flicker, but causes image smearing especially if there is significant animation.
Black and white screens are able to display grayscale by varying the intensity of the
electron beam; color is achieved using more complex means. Three electron guns
are used, one each to hit red, green and blue phosphors. Combining these colors can
Figure 2.9 Magnified anti-aliased lines
Figure 2.10 CRT screen
produce many others, including white, when they are all fully on. These three phosphor
dots are focussed to make a single point using a shadow mask, which is imprecise and
gives color screens a lower resolution than equivalent monochrome screens.
An alternative approach to producing color on the screen is to use beam penetra-
tion. A special phosphor glows a different color depending on the intensity of the
beam hitting it.
The CRT is a cheap display device and has fast enough response times for rapid
animation coupled with a high color capability. Note that animation does not neces-
sarily mean little creatures and figures running about on the screen, but refers in
a more general sense to the use of motion in displays: moving the cursor, opening
windows, indicating processor-intensive calculations, or whatever. As screen resolu-
tion increases, however, the price rises. Because of the electron gun and focussing
components behind the screen, CRTs are fairly bulky, though recent innovations
have led to flatter displays in which the electron gun is not placed so that it fires
directly at the screen, but fires parallel to the screen plane with the resulting beam
bent through 90 degrees to hit the screen.
Health hazards of CRT displays
Most people who habitually use computers are aware that screens can often cause eyestrain
and fatigue; this is usually due to flicker, poor legibility or low contrast. There have also been many
concerns relating to the emission of radiation from screens. These can be categorized as follows:
- X-rays, which are largely absorbed by the screen (but not at the rear!)
- ultraviolet and infrared radiation from phosphors, at insignificant levels
- radio frequency emissions, plus ultrasound (approximately 16 kHz)
- electrostatic field which leaks out through the tube to the user. The intensity is
dependent on distance and humidity. This can cause rashes in the user
- electromagnetic fields (50 Hz to 0.5 MHz) which create induction currents in conductive
materials, including the human body. Two types of effects are attributed to this: in the visual
system, a high incidence of cataracts in visual display unit (VDU) operators, and concern over
reproductive disorders (miscarriages and birth defects).
Research into the potentially harmful effect of these emissions is generally inconclusive, in that it
is difficult to determine precisely what the causes of illness are, and many health scares have been
the result of misinformed media opinion rather than scientific fact. However, users who are preg-
nant ought to take especial care and observe simple precautions. Generally, there are a number of
common-sense things that can be done to relieve strain and minimize any risk. These include
- not sitting too close to the screen
- not using very small fonts
- not looking at the screen for a long time without a break
- working in well-lit surroundings
- not placing the screen directly in front of a bright window.
Liquid crystal display
If you have used a personal organizer or notebook computer, you will have seen
the light, flat plastic screens. These displays utilize liquid crystal technology and are
smaller, lighter and consume far less power than traditional CRTs. These are also
commonly referred to as flat-panel displays. They have no radiation problems asso-
ciated with them, and are matrix addressable, which means that individual pixels can
be accessed without the need for scanning.
Similar in principle to the digital watch, a thin layer of liquid crystal is sandwiched
between two glass plates. The top plate is transparent and polarized, whilst the bot-
tom plate is reflective. External light passes through the top plate and is polarized,
which means that it only oscillates in one direction. This then passes through the
crystal, reflects off the bottom plate and back to the eye, and so that cell looks white.
When a voltage is applied to the crystal, via the conducting glass plates, the crystal
twists. This causes it to turn the plane of polarization of the incoming light, rotating
it so that it cannot return through the top plate, making the activated cell look black.
The LCD requires refreshing at the usual rates, but the relatively slow response of the
crystal means that flicker is not usually noticeable. The low intensity of the light
emitted from the screen, coupled with the reduced flicker, means that the LCD is less
tiring to use than a standard CRT, with reduced eyestrain.
This different technology can be used to replace the standard screen on a desktop
computer, and this is now common. However, the particular characteristics of com-
pactness, light weight and low power consumption have meant that these screens
have created a large niche in the computer market by monopolizing the notebook
and portable computer systems side. The advent of these screens allowed small, light
computers to be built, and created a large market that did not previously exist. Such
computers, riding on the back of the technological wave, have opened up a different
way of working for many people, who now have access to computers when away
from the office, whether out on business or at home. Working in a different location
on a smaller machine with different software obviously represents a different style
of interaction and so once again we can see that differences in devices may alter
the human–computer interaction considerably. The growing notebook computer
market fed back into an investment in developing LCD screen technology, with
supertwisted crystals increasing the viewing angle dramatically. Response times have
also improved so that LCD screens are now used in personal DVD players and even
in home television.
When the second edition of this book was being written the majority of LCD
screens were black and white or grayscale. We wrote then ‘it will be interesting to see
whether color LCD screens supersede grayscale by the time the third edition of this
book is prepared’. Of course, this is precisely the case. Our expectation is that by the
time we produce the next edition LCD monitors will have taken over from CRT
monitors completely.
Special displays
There are a number of other display technologies used in niche markets. The one
you are most likely to see is the gas plasma display, which is used in large screens
(see Section 2.4.3 below).
The random scan display, also known as the directed beam refresh, or vector display,
works differently from the bitmap display, also known as raster scan, that we dis-
cussed in Section 2.4.1. Instead of scanning the whole screen sequentially and hori-
zontally, the random scan draws the lines to be displayed directly. By updating the
screen at at least 30 Hz to reduce flicker, the direct drawing of lines at any angle
means that jaggies are not created, and higher resolutions are possible, up to
4096 × 4096 pixels. Color on such displays is achieved using beam penetration technology,
and is generally of a poorer quality. Eyestrain and fatigue are still a problem, and
these displays are more expensive than raster scan ones, so they are now only used in
niche applications.
The direct view storage tube is used extensively as the display for an analog
storage oscilloscope, which is probably the only place that these displays are used in
any great numbers. They are similar in operation to the random scan CRT but the
image is maintained by flood guns which have the advantage of producing a stable
display with no flicker. The screen image can be incrementally updated but not
selectively erased; removing items has to be done by redrawing the new image on
a completely erased screen. The screens have a high resolution, typically about
4096 × 3120 pixels, but suffer from low contrast, low brightness and a difficulty in
displaying color.
2.4.3 Large displays and situated displays
Displays are no longer just things you have on your desktop or laptop. In Chapter 19
we will discuss meeting room environments that often depend on large shared
screens. You may have attended lectures where the slides are projected from a com-
puter onto a large screen. In shops and garages large screen adverts assault us from
all sides.
There are several types of large screen display. Some use gas plasma technology
to create large flat bitmap displays. These behave just like a normal screen except
they are big and usually have the HDTV (high definition television) wide screen
format which has an aspect ratio of 16:9 instead of the 4:3 on traditional TV and
monitors.
Where very large screen areas are required, several smaller screens, either LCD or
CRT, can be placed together in a video wall. These can display separate images, or a
single TV or computer image can be split up by software or hardware so that each
screen displays a portion of the whole and the result is an enormous image. This
is the technique often used in large concerts to display the artists or video images
during the performance.
Possibly the large display you are most likely to have encountered is some sort of
projector. There are two variants of these. In very large lecture theatres, especially
older ones, you see projectors with large red, green and blue lenses. These each scan
light across the screen to build a full color image. In smaller lecture theatres and in
small meetings you are likely to see LCD projectors. Usually the size of a large book,
these are like ordinary slide projectors except that where the slide would be there is
a small LCD screen instead. The light from the projector passes through the tiny
screen and is then focussed by the lens onto the screen.
The disadvantage of projected displays is that the presenter’s shadow can often fall
across the screen. Sometimes this is avoided in fixed lecture halls by using back pro-
jection. In a small room behind the screen of the lecture theatre there is a projector
producing a right/left reversed image. The screen itself is a semi-frosted glass so that
the image projected on the back can be seen in the lecture theatre. Because there are
limits on how wide an angle the projector can manage without distortion, the size of
the image is limited by the depth of the projection room behind, so these are less
heavily used than front projection.
As well as for lectures and meetings, display screens can be used in various public
places to offer information, link spaces or act as message areas. These are often called
situated displays as they take their meaning from the location in which they are
situated. These may be large screens where several people are expected to view or
interact simultaneously, or they may be very small. Figure 2.11 shows an example
of a small experimental situated display mounted by an office door to act as an
electronic sticky note [70].
Figure 2.11 Situated door display. Source: Courtesy of Keith Cheverst
2.4.4 Digital paper
A new form of ‘display’ that is still in its infancy is the various forms of digital paper.
These are thin flexible materials that can be written to electronically, just like a com-
puter screen, but which keep their contents even when removed from any electrical
supply.
There are various technologies being investigated for this. One involves the whole
surface being covered with tiny spheres, black one side, white the other. Electronics
embedded into the material allow each tiny sphere to be rotated to make it black
or white. When the electronic signal is removed the ball stays in its last orientation.
A different technique has tiny tubes laid side by side. In each tube is light-absorbing
liquid and a small reflective sphere. The sphere can be made to move to the top sur-
face or away from it making the pixel white or black. Again the sphere stays in its last
position once the electronic signal is removed.
Probably the first uses of these will be for large banners that can be reprogrammed
or slowly animated. This is an ideal application, as it does not require very rapid
updates and does not require the pixels to be small. As the technology matures, the
aim is to have programmable sheets of paper that you attach to your computer to get
a ‘soft’ printout that can later be changed. Perhaps one day you may be able to have
a ‘soft’ book that appears just like a current book with soft pages that can be turned
and skimmed, but where the contents and cover can be changed when you decide to
download a new book from the net!
DESIGN FOCUS
Hermes: a situated display
Office doors are often used as a noticeboard with messages from the occupant such as ‘just gone
out’ or ‘timetable for the week’ and from visitors ‘missed you, call when you get back’. The Hermes
system is an electronic door display that offers some of the functions of sticky notes on a door [70].
Figure 2.11(i) shows an installed Hermes device fixed just beside the door, including the socket to
use a Java iButton to authenticate the occupant. The occupant can leave messages that others can read
(Figure 2.11(ii)) and people coming to the door can leave messages for the occupant. Electronic notes
are smaller than paper ones, but because they are electronic they can be read remotely using a web
interface (Figure 2.11(iii)), or added by SMS (see Chapter 19, Section 19.3.2).
The fact that it is situated – by a person’s door – is very important. It establishes a context, ‘Alan’s
door’, and influences the way the system is used. For example, the idea of anonymous messages left on
the door, where the visitor has had to be physically present, feels different from, say, anonymous emails.
See the book website for the full case study: /e3/casestudy/hermes/
2.5 DEVICES FOR VIRTUAL REALITY AND 3D INTERACTION
Virtual reality (VR) systems and various forms of 3D visualization are discussed in
detail in Chapter 20. These require you to navigate and interact in a three-dimensional
space. Sometimes these use the ordinary controls and displays of a desktop computer
system, but there are also special devices used both to move and interact with 3D
objects and to enable you to see a 3D environment.
2.5.1 Positioning in 3D space
Virtual reality systems present a 3D virtual world. Users need to navigate through
these spaces and manipulate the virtual objects they find there. Navigation is not
simply a matter of moving to a particular location, but also of choosing a particular
orientation. In addition, when you grab an object in real space, you don’t simply
move it around, but also twist and turn it, for example when opening a door. Thus
the move from mice to 3D devices usually involves a change from two degrees of
freedom to six degrees of freedom, not just three.
Cockpit and virtual controls
Helicopter and aircraft pilots already have to navigate in real space. Many arcade
games and also more serious applications use controls modeled on an aircraft
cockpit to ‘fly’ through virtual space. However, helicopter pilots are very skilled and
it takes a lot of practice for users to be able to work easily in such environments.
In many PC games and desktop virtual reality (where the output is shown on
an ordinary computer screen), the controls are themselves virtual. This may be a
simulated form of the cockpit controls or more prosaic up/down left/right buttons.
The user manipulates these virtual controls using an ordinary mouse (or other 2D
device). Note that this means there are two levels of indirection. It is a tribute to the
flexibility of the human mind that people can not only use such systems but also
rapidly become proficient.
The 3D mouse
There are a variety of devices that act as 3D versions of a mouse. Rather than just
moving the mouse on a tabletop, you can pick it up, move it in three dimensions,
rotate the mouse and tip it forward and backward. The 3D mouse has a full six
degrees of freedom as its position can be tracked (three degrees), and also its
up/down angle (called pitch), its left/right orientation (called yaw) and the amount
it is twisted about its own axis (called roll) (see Figure 2.12). Various sensors are used
to track the mouse position and orientation: magnetic coils, ultrasound or even
mechanical joints where the mouse is mounted rather like an angle-poise lamp.
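The three orientation angles can be combined into a single rotation that the application applies to the tracked object. This sketch is our own illustration: axis assignments and rotation order are conventions that vary between devices, so the ones below (pitch about x, yaw about y, roll about z, applied in that order) are assumptions.

```python
import math

# Sketch (our illustration): a 3D mouse reports six degrees of freedom --
# position (x, y, z) plus pitch, yaw and roll. The three angles can be
# composed into one 3 x 3 rotation matrix. Conventions assumed here:
# pitch about x, yaw about y, roll about z, applied in that order.

def rotation_matrix(pitch, yaw, roll):
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]    # pitch: rotation about x
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]    # yaw: rotation about y
    rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]    # roll: rotation about z

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(rz, matmul(ry, rx))

def apply(m, v):
    """Rotate a 3D point v by rotation matrix m."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
```

For example, a quarter-turn of yaw alone swings a point on the x axis round to the z axis, which is the behavior a user would expect when twisting the device left or right.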
With the 3D mouse, and indeed most 3D positioning devices, users may experi-
ence strain from having to hold the mouse in the air for a long period. Putting the
3D mouse down may even be treated as an action in the virtual environment, that is,
taking a nose dive.
Dataglove
One of the mainstays of high-end VR systems (see Chapter 20), the dataglove is a 3D
input device. Consisting of a lycra glove with optical fibers laid along the fingers, it
detects the joint angles of the fingers and thumb. As the fingers are bent, the fiber
optic cable bends too; increasing bend causes more light to leak from the fiber, and
the reduction in intensity is detected by the glove and related to the degree of bend
in the joint. Attached to the top of the glove are two sensors that use ultrasound to
determine 3D positional information as well as the angle of roll, that is the degree of
wrist rotation. Such rich multi-dimensional input is currently a solution in search
of a problem, in that most of the applications in use do not require such a compre-
hensive form of data input, whilst those that do cannot afford it. However, the avail-
ability of cheaper versions of the dataglove will encourage the development of more
complex systems that are able to utilize the full power of the dataglove as an input
device. There are a number of potential uses for this technology to assist disabled
people, but cost remains the limiting factor at present.
The dataglove has the advantage that it is very easy to use, and is potentially very
powerful and expressive (it can provide 10 joint angles, plus the 3D spatial informa-
tion and degree of wrist rotation, 50 times a second). It suffers from extreme
expense, and the fact that it is difficult to use in conjunction with a keyboard.
However, such a limitation is shortsighted; one can imagine a keyboard drawn onto
a desk, with software detecting hand positions and interpreting whether the virtual
keys had been hit or not. The potential for the dataglove is vast; gesture recognition
and sign language interpretation are two obvious areas that are the focus of active
research, whilst less obvious applications are evolving all the time.
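The bend-sensing principle described above can be sketched in a few lines. This is not a real dataglove API; the linear calibration between a "straight" and a "fully bent" intensity reading is our simplifying assumption, chosen only to show how light leakage maps to a joint angle.

```python
# Sketch (assumptions ours, not a real dataglove API): the glove measures the
# light intensity surviving each optical fiber; more bend leaks more light.
# A simple per-joint calibration maps intensity between two recorded extremes
# (finger straight, finger fully bent) linearly onto a joint angle.

def joint_angle(intensity, straight_level, bent_level, max_angle=90.0):
    """Map a raw light intensity reading to an estimated joint angle in degrees.

    straight_level: intensity recorded with the joint straight (0 degrees)
    bent_level:     intensity recorded with the joint fully bent (max_angle)
    """
    if straight_level == bent_level:
        raise ValueError("calibration levels must differ")
    fraction = (straight_level - intensity) / (straight_level - bent_level)
    fraction = min(1.0, max(0.0, fraction))   # clamp to the calibrated range
    return fraction * max_angle

# The text notes the glove can report 10 joint angles about 50 times a second;
# a frame of readings would simply apply the mapping to each joint.
def frame_angles(readings, calibration):
    return [joint_angle(r, s, b) for r, (s, b) in zip(readings, calibration)]
```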
Figure 2.12 Pitch, yaw and roll
Virtual reality helmets
The helmets or goggles worn in some VR systems have two purposes: (i) they display
the 3D world to each eye and (ii) they allow the user’s head position to be tracked.
We will discuss the former later when we consider output devices. The head tracking
is used primarily to feed into the output side. As the user’s head moves around the
user ought to see different parts of the scene. However, some systems also use the
user’s head direction to determine the direction of movement within the space and
even which objects to manipulate (rather like the eyegaze systems). You can think of
this rather like leading a horse in reverse. If you want a horse to go in a particular
direction, you use the reins to pull its head in the desired direction and the horse
follows its head.
Whole-body tracking
Some VR systems aim to be immersive, that is to make the users feel as if they are
really in the virtual world. In the real world it is possible (although not usually wise)
to walk without looking in the direction you are going. If you are driving down
the road and glance at something on the roadside you do not want the car to do a
sudden 90-degree turn! Some VR systems therefore attempt to track different kinds
of body movement. Some arcade games have a motorbike body on which you can
lean into curves. More strangely, small trampolines have been wired up so that the
user can control movement in virtual space by putting weight on different parts of
the trampoline. The user can literally surf through virtual space. In the extreme the
movement of the whole body may be tracked using devices similar to the dataglove,
or using image-processing techniques. In the latter, white spots are stuck at various
points of the user’s body and the position of these tracked using two or more cam-
eras, allowing the location of every joint to be mapped. Although the last of these
sounds a little constraining for the fashion conscious it does point the way to less
intrusive tracking techniques.
2.5.2 3D displays
Just as the 3D images used in VR have led to new forms of input device, they
also require more sophisticated outputs. Desktop VR is delivered using a standard
computer screen and a 3D impression is produced by using effects such as shadows,
occlusion (where one object covers another) and perspective. This can be very
effective and you can even view 3D images over the world wide web using a VRML
(virtual reality modeling language) enabled browser.
Seeing in 3D
Our eyes use many cues to perceive depth in the real world (see also Chapter 1). It is
in fact quite remarkable as each eye sees only a flattened form of the world, like
a photograph. One important effect is stereoscopic vision (or simply stereo vision).
Because each eye is looking at an object from a slightly different angle each sees a
different image and our brain is able to use this to assess the relative distance of dif-
ferent objects. In desktop VR this stereoscopic effect is absent. However, various
devices exist to deliver true stereoscopic images.
The start point of any stereoscopic device is the generation of images from differ-
ent perspectives. As the computer is generating images for the virtual world anyway,
this just means working out the right positions and angles corresponding to the typ-
ical distance between eyes on a human face. If this distance is too far from the natural
one, the user will be presented with a giant’s or gnat’s eye view of the world!
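Computing the two camera positions is straightforward. The sketch below is our own illustration; the 0.063 m interpupillary distance is an assumed typical adult value, and the `eye_positions` helper is hypothetical.

```python
# Sketch of stereo pair generation (our illustration): render the scene twice
# from two camera positions separated by the interpupillary distance (IPD).
# The eyes are offset along the camera's "right" vector, perpendicular to the
# viewing direction. Using the wrong separation scales the apparent world:
# too large gives a giant's-eye view, too small a gnat's-eye view.

IPD = 0.063  # metres; an assumed typical adult value

def eye_positions(head, right, ipd=IPD):
    """head: (x, y, z) point between the eyes; right: unit vector to the
    viewer's right. Returns (left_eye, right_eye) camera positions."""
    half = ipd / 2.0
    left_eye = tuple(h - half * r for h, r in zip(head, right))
    right_eye = tuple(h + half * r for h, r in zip(head, right))
    return left_eye, right_eye
```

Each returned position is then used as a separate camera for one eye's image; everything else about the rendering is identical for the two views.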
Different techniques are then used to ensure that each eye sees the appropriate
image. One method is to have two small screens fitted to a pair of goggles. A differ-
ent image is then shown to each eye. These devices are currently still quite cumber-
some and the popular image of VR is of a user with head encased in a helmet with
something like a pair of inverted binoculars sticking out in front. However, smaller
and lighter LCDs are now making it possible to reduce the devices towards the size
and weight of ordinary spectacles.
An alternative method is to have a pair of special spectacles connected so that each
eye can be blanked out by timed electrical signals. If this is synchronized with the
frame rate of a computer monitor, each eye sees alternate images. Similar techniques
use polarized filters in front of the monitor and spectacles with different polarized
lenses. These techniques are both effectively using similar methods to the red–green
3D spectacles given away in some breakfast cereals. Indeed, these red–green spectacles
have been used in experiments in wide-scale 3D television broadcasts. However,
the quality of the 3D image from the polarized and blanked eye spectacles is sub-
stantially better.
The ideal would be to be able to look at a special 3D screen and see 3D images just
as one does with a hologram – 3D television just like in all the best sci-fi movies!
But there is no good solution to this yet. One method is to inscribe the screen with
small vertical grooves forming hundreds of prisms. Each eye then sees only alternate
dots on the screen allowing a stereo image at half the normal horizontal resolution.
However, these screens have very narrow viewing angles, and are not ready yet for
family viewing.
In fact, getting stereo images is not the whole story. Not only do our eyes see dif-
ferent things, but each eye also focusses on the current object of interest (small mus-
cles change the shape of the lens of the eye). The images presented to the
eye are generated at some fixed focus, often with effectively infinite depth of field.
This can be confusing and tiring. There has been some progress recently on using
lasers to detect the focal depth of each eye and adjust the images correspondingly,
similar to the technology used for eye tracking. However, this is not currently used
extensively.
VR motion sickness
We all get annoyed when computers take a long time to change the screen, pop up
a window, or play a digital movie. However, with VR the effects of poor display
performance can be more serious. In real life when we move our head the image our
eyes see changes accordingly. VR systems produce the same effect by using sensors in
the goggles or helmet and then using the position of the head to determine the right
image to show. If the system is slow in producing these images a lag develops
between the user moving his head and the scene changing. If this delay is more than
a hundred milliseconds or so the feeling becomes disorienting. The effect is very
similar to that of being at sea. You stand on the deck looking out to sea, the boat
gently rocking below you. Tiny channels in your ears detect the movement telling
your brain that you are moving; your eyes see the horizon moving in one direction
and the boat in another. Your brain gets confused and you get sick. Users of VR can
experience similar nausea and few can stand it for more than a short while. In fact,
keeping laboratories sanitary has been a major push in improving VR technology.
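The roughly 100 millisecond threshold mentioned above suggests a simple monitoring check. The structure below is our own sketch (the function names are hypothetical); only the threshold comes from the text.

```python
# Sketch (threshold from the text, structure ours): a VR system can log the
# time a head movement is sensed and the time the corresponding frame is
# displayed, then flag lags beyond the roughly 100 ms point at which the
# delay becomes disorienting.

DISORIENTING_LAG_S = 0.100  # ~100 milliseconds, as suggested in the text

def lag_ok(sensed_at, displayed_at, limit=DISORIENTING_LAG_S):
    """True if the motion-to-display delay is within the tolerable limit."""
    return (displayed_at - sensed_at) <= limit

def worst_lag(samples):
    """samples: list of (sensed_at, displayed_at) timestamp pairs in seconds."""
    return max(d - s for s, d in samples)
```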
Simulators and VR caves
Because of the problems of delivering a full 3D environment via head-mounted
displays, some virtual reality systems work by putting the user within an environ-
ment where the virtual world is displayed upon it. The most obvious examples of this
are large flight simulators – you go inside a mock-up of an aircraft cockpit and the
scenes you would see through the windows are projected onto the virtual windows.
In motorbike or skiing simulators in video arcades large screens are positioned to fill
the main part of your visual field. You can still look over your shoulder and see your
friends, but while you are engaged in the game it surrounds you.
More general-purpose rooms called caves have large displays positioned all
around the user, or several back projectors. In these systems the user can look all
around and see the virtual world surrounding them.
2.6 PHYSICAL CONTROLS, SENSORS AND SPECIAL DEVICES
As we have discussed, computers are coming out of the box. The mouse, keyboard
and screen of the traditional computer system are not relevant or possible in
applications that now employ computers such as interactive TV, in-car navigation
systems or personal entertainment. These devices may have special displays, may use
sound, touch and smell as well as visual displays, may have dedicated controls and
may sense the environment or your own bio-signs.
2.6.1 Special displays
Apart from the CRT screen there are a number of visual outputs utilized in com-
plex systems, especially in embedded systems. These can take the form of analog
representations of numerical values, such as dials, gauges or lights to signify a certain
system state. Flashing light-emitting diodes (LEDs) are used on the back of some
computers to signify the processor state, whilst gauges and dials are found in process
control systems. Once you start in this mode of thinking, you can contemplate
numerous visual outputs that are unrelated to the screen. One visual display that has
found a specialized niche is the head-up display that is used in aircraft. The pilot is
fully occupied looking forward and finds it difficult to look around the cockpit to get
information. There are many different things that need to be known, ranging from
data from tactical systems to navigational information and aircraft status indicators.
The head-up display projects a subset of this information into the pilot’s line
of vision so that the information is directly in front of her eyes. This obviates the
need for large banks of information to be scanned with the corresponding lack
of attention to what is happening outside, and makes the pilot’s job easier. Less
important information is usually presented on a smaller number of dials and gauges
in the cockpit to avoid cluttering the head-up display, and these can be monitored
less often, during times of low stress.
2.6.2 Sound output
Another mode of output that we should consider is that of auditory signals. Often
designed to be used in conjunction with screen displays, auditory outputs are poorly
understood: we do not yet know how to utilize sound in a sensible way to achieve
maximum effect and information transference. We have discussed speech previ-
ously, but other sounds such as beeps, bongs, clanks, whistles and whirrs are all used
to varying effect. As well as conveying system output, sounds offer an important level
of feedback in interactive systems. Keyboards can be set to emit a click each time
a key is pressed, and this appears to speed up interactive performance. Telephone
keypads often sound different tones when the keys are pressed; the presence of a sound
signifies that the key has been successfully pressed, whilst the actual tone provides
some information about the particular key that was pressed. The advantage of audit-
ory feedback is evident when we consider a simple device such as a doorbell. If we
press it and hear nothing, we are left undecided. Should we press it again, in case
we did not do it right the first time, or did it ring but we did not hear it? And if we
press it again but it actually did ring, will the people in the house think we are very
rude, ringing insistently? We feel awkward and a little stressed. If we were using a
computer system instead of a doorbell and were faced with a similar problem, we
would not enjoy the interaction and would not perform as well. Yet it is a simple
problem that could be easily rectified by a better initial design, using sound. Chap-
ter 10 will discuss the use of the auditory channel in more detail.
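The telephone keypad example uses dual-tone multi-frequency (DTMF) signalling: each key sounds one frequency from its row and one from its column, so the sound both confirms the press and identifies the key. The frequencies below are the standard DTMF values; the code itself is just an illustrative sketch.

```python
# Standard DTMF frequencies: each keypad key sounds its row tone plus its
# column tone simultaneously, so hearing a tone confirms the press and the
# particular pair identifies which key it was.

ROW_HZ = [697, 770, 852, 941]
COL_HZ = [1209, 1336, 1477]
KEYS = ["123", "456", "789", "*0#"]

def dtmf_tones(key):
    """Return the (row_hz, col_hz) frequency pair for a keypad key."""
    for r, row in enumerate(KEYS):
        c = row.find(key)
        if c != -1:
            return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a keypad key: {key!r}")
```

Pressing ‘5’, for instance, sounds 770 Hz and 1336 Hz together, a pair no other key produces.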
2.6.3 Touch, feel and smell
Our other senses are used less in normal computer applications, but you may have
played computer games where the joystick or artificial steering wheel vibrated, per-
haps when a car was about to go off the track. In some VR applications, such as the
use in medical domains to ‘practice’ surgical procedures, the feel of an instrument
moving through different tissue types is very important. The devices used to emulate
these procedures have force feedback, giving different amounts of resistance depend-
ing on the state of the virtual operation. These various forms of force, resistance and
texture that influence our physical senses are called haptic devices.
Haptic devices are not limited to virtual environments, but are used in specialist
interfaces in the real world too. Electronic braille displays either have pins that rise
or fall to give different patterns, or may involve small vibration pins. Force feedback
has been used in the design of in-car controls.
In fact, the car gives a very good example of the power of tactile feedback. If
you drive over a small bump in the road the car is sent slightly off course; however,
the chances are that you will correct yourself before you are consciously aware of the
bump. Within your body you have reactions that push back slightly against pressure
to keep your limbs where you ‘want’ them, or move your limbs out of the way when
you brush against something unexpected. These responses occur in your lower
brain and are very fast, not involving any conscious effort. So, haptic devices can
access very fast responses, but these responses are not fully controlled. This can be
used effectively in design, but of course also with caution.
Texture is more difficult as it depends on small changes between neighboring
points on the skin. Also, most of our senses notice change rather than fixed stimuli,
so we usually feel textures when we move our fingers over a surface, not just when
resting on it. Technology for this is just beginning to become available.
There is evidence that smell is one of the strongest cues to memory. Various
historical recreations such as the Jorvik Centre in York, England, use smells to create
a feeling of immersion in their static displays of past life. Some arcade games also
generate smells, for example, burning rubber as your racing car skids on the track.
These examples both use a fixed smell in a particular location. There have been
several attempts to produce devices to allow smells to be recreated dynamically in
response to games or even internet sites. The technical difficulty is that our noses do
not have a small set of basic smells that are mixed (like salt/sweet/sour/bitter/savoury
on our tongue), but instead there are thousands of different types of receptor
responding to different chemicals in the air. The general pattern of devices to gener-
ate smells is to have a large repertoire of tiny scent-containing capsules that are
released in varying amounts on demand – rather like a printer cartridge with
hundreds of ink colors! So far there appears to be no mass market for these devices,
but they may eventually develop from niche markets.
Smell is a complex multi-dimensional sense and has a peculiar ability to trigger
memory, but cannot be changed rapidly. These qualities may prove valuable in areas
where a general sense of location and awareness is desirable. For example, a project
at the Massachusetts Institute of Technology explored the use of a small battery
of scent generators which may be particularly valuable for ambient displays and
background awareness [198, 161].
2.6.4 Physical controls
Look at Figure 2.13. In it you can see the controls for a microwave, a washing
machine and a personal MiniDisc player. See how they each use very different phys-
ical devices: the microwave has a flat plastic sheet with soft buttons, the washing
machine large switches and knobs, and the MiniDisc has small buttons and an inter-
esting multi-function end.
A desktop computer system has to serve many functions and so has generic keys
and controls that can be used for a variety of purposes. In contrast, these dedicated
control panels have been designed for a particular device and for a single use. This is
why they differ so much.
Looking first at the microwave, it has a flat plastic control panel. The buttons on
the panel are pressed and ‘give’ slightly. The choice of the smooth panel is probably
partly for visual design – it looks streamlined! However, there are also good prac-
tical reasons. The microwave is used in the kitchen whilst cooking, with hands that
may be greasy or have food on them. The smooth controls have no gaps where food
can accumulate and clog buttons, so it can easily be kept clean and hygienic.
When using the washing machine you are handling dirty clothes, which may be
grubby, but not to the same extent, so the smooth easy-clean panel is less important
(although some washing machines do have smooth panels). It has several major
DESIGN FOCUS
Feeling the road
In the BMW 7 Series you will find a single haptic feedback control for many of the functions that would
normally have dedicated controls. It uses technology developed by Immersion Corporation who are
also behind the force feedback found in many medical and entertainment haptic devices. The iDrive
control slides backwards and forwards and rotates to give access to various menus and lists of options.
The haptic feedback allows the user to feel ‘clicks’ appropriate to the number of items in the various
menu lists.
See: www.immersion.com/ and www.bmw.com/ Picture courtesy of BMW AG
settings and the large buttons act both as control and display. Also the dials for
dryer timer and the washing program act both as a means to set the desired time or
program and to display the current state whilst the wash is in progress.
Finally, the MiniDisc controller needs to be small and unobtrusive. It has tiny
buttons, but the end control is most interesting. It twists from side to side and
also can be pulled and twisted. This means the same control can be used for two
different purposes. This form of multi-function control is common in small
devices.
We discussed the immediacy of haptic feedback and these lessons are also import-
ant at the level of creating physical devices; do keys, dials, etc., feel as if they have
been pressed or turned? Getting the right level of resistance can make the device
work naturally, give you feedback that you have done something, or let you know
that you are controlling something. Where for some reason this is not possible,
something has to be done to prevent the user getting confused, perhaps pressing but-
tons twice; for example, the smooth control panel of the microwave in Figure 2.13
offers no tactile feedback, but beeps for each keypress. We will discuss these design
issues further when we look at user experience in Chapter 3 (Section 3.9).
Figure 2.13 Physical controls on microwave, washing machine and MiniDisc. Source: Photograph bottom
right by Alan Dix with permission from Sony (UK)
Whereas texture is difficult to generate, it is easy to build into materials. This can
make a difference to the ease of use of a device. For example, a touchpad is smooth,
but a keyboard nipple is usually rubbery. If they were the other way round it would
be hard to drag your finger across the touchpad or to operate the nipple without
slipping. Texture can also be used to disambiguate. For example, most keyboards
have a small raised dot on the ‘home’ keys for touch typists and some calculators and
phones do the same on the ‘5’ key. This is especially useful in applications when the
eyes are elsewhere.
2.6.5 Environment and bio-sensing
In a public washroom there are often no controls for the wash basins, you simply put
your hands underneath and (hope that) the water flows. Similarly when you open
the door of a car, the courtesy light turns on. The washbasin is controlled by a small
infrared sensor that is triggered when your hands are in the basin (although it is
sometimes hard to find the ‘sweet spot’ where this happens!). The courtesy lights are
triggered by a small switch in the car door.

DESIGN FOCUS
Smart-Its – making using sensors easy

Building systems with physical sensors is no easy task. You need a soldering iron, plenty of experience
in electronics, and even more patience. Although some issues are unique to each sensor or project,
many of the basic building blocks are similar – connecting simple microprocessors to memory and
networks, connecting various standard sensors such as temperature, tilt, etc.
The Smart-Its project has made this job easier by creating a collection of components and an
architecture for adding new sensors. There are a number of basic Smart-It boards – the photo on
the left shows a microprocessor with wireless connectivity. Onto these boards are plugged a variety of
modules – in the center is a sensor board including temperature and light, and on the right is a power
controller.
See: www.smart-its.org/ Source: Courtesy of Hans Gellersen
Although we are not always conscious of them, there are many sensors in our
environment – controlling automatic doors, energy saving lights, etc. and devices
monitoring our behavior such as security tags in shops. The vision of ubiquitous
computing (see Chapters 4 and 20) suggests that our world will be filled with such
devices. Certainly the gap between science fiction and day-to-day life is narrow;
for example, in the film Minority Report (20th Century Fox) iris scanners identify
each passer-by to feed them dedicated advertisements, but you can buy just such an
iris scanner as a security add-on for your home computer.
There are many different sensors available to measure virtually anything: temperature,
movement (ultrasound, infrared, etc.), location (GPS, the global positioning system,
in mobile devices) and weight (pressure sensors). In addition, audio and video
information can be analyzed to identify individuals and to detect what they are doing.
This all sounds Big Brother-like, but such sensing is also used in ordinary applications, such as the
washbasin.
Sensors can also be used to capture physiological signs such as body temperature,
unconscious reactions such as blink rate, or unconscious aspects of activities such
as typing rate, vocabulary shifts (e.g. modal verbs). For example, in a speech-based
game, Tsukahara and Ward use gaps in speech and prosody (patterns of rhythm,
pitch and loudness in speech) to infer the user’s emotional state and thus the nature
of acceptable responses [350], and Allanson discusses a variety of physiological
sensors to create ‘electrophysiological interactive computer systems’ [12].
2.7 PAPER: PRINTING AND SCANNING
Some years ago, a recurrent theme of information technology was the paperless office.
In the paperless office, documents would be produced, dispatched, read and filed
online. The only time electronic information would be committed to paper would be
when it went out of the office to ordinary customers, or to other firms who were lag-
gards in this technological race. This vision was fuelled by rocketing property prices,
and the realization that the floor space for a wastepaper basket could cost thousands
in rent each year. Some years on, many traditional paper files are now online, but the
desire for the completely paperless office has faded. Offices still have wastepaper bas-
kets, and extra floor space is needed for the special computer tables to house 14-inch
color monitors.
In this section, we will look at some of the available technology that exists to get
information to and from paper. We will look first at printing, the basic technology,
and issues raised by it. We will then go on to discuss the movement from paper back
into electronic media. Although the paperless office is no longer seen as the goal, the
less-paper office is perhaps closer, now that the technologies for moving between
media are better.
2.7.1 Printing
If anything, computer systems have made it easier to produce paper documents. It is
so easy to run off many copies of a letter (or book) in order to get it looking ‘just
right’. Older printers had a fixed set of characters available on a printhead. These var-
ied from the traditional line printer to golf-ball and daisy-wheel printers. To change
a typeface or the size of type meant changing the printhead, and was an awkward,
and frequently messy, job, but for many years the daisy-wheel printer was the only
means of producing high-quality output at an affordable price. However, the drop in
the price of laser printers coupled with the availability of other cheap high-quality
printers means that daisy-wheels are fast becoming a rarity.
All of the popular printing technologies, like screens, build the image on the paper
as a series of dots. This enables, in theory, any character set or graphic to be printed,
limited only by the resolution of the dots. This resolution is measured in dots per inch
(dpi). Imagine a sheet of graph paper, and building up an image by putting dots at
the intersection of each line. The number of lines per inch in each direction is the
resolution in dpi. For some mechanical printers this is slightly confused: the dots
printed may be bigger than the gaps, neighboring printheads may not be able to
print simultaneously and may be offset relative to one another (a diamond-shaped
rather than rectangular grid). These differences do not make too much difference to
the user, but mean that, given two printers at the same nominal resolution, the output
of one looks better than that of the other, because it has managed the physical
constraints better.

Common types of dot-based printers

Dot-matrix printers
These use an inked ribbon, like a typewriter, but instead of a single character-shaped head striking
the paper, a line of pins is used, each of which can strike the ribbon and hence dot the paper.
Horizontal resolution can be varied by altering the speed of the head across the paper, and vertical
resolution can be improved by sending the head twice across the paper at a slightly different
position. So, dot-matrix printers can produce fast draft-quality output or slower ‘letter’-quality
output. They are cheap to run, but cannot compete with the quality of ink-jet and laser printers for
general office and home printing. They are now only used for bulk printing, or where carbon paper
is required for payslips, check printing, etc.

Ink-jet and bubble-jet printers
These operate by sending tiny blobs of ink from the printhead to the paper. The ink is squirted at
pressure from an ink-jet, whereas bubble-jets use heat to create a bubble. Both are quite quiet in
operation. The ink from the bubble-jet (being a bubble rather than a droplet) dries more quickly
than the ink-jet and so is less likely to smear. Both approach laser quality, but the bubble-jet dots
tend to be more accurately positioned and of a less broken shape.

Laser printers
These use similar technology to a photocopier: ‘dots’ of electrostatic charge are deposited on a
drum, which then picks up toner (black powder). This is then rolled onto the paper and cured by
heat. The curing is why laser printed documents come out warm, and the electrostatic charge is
why they smell of ozone! In addition, some toner can be highly toxic if inhaled, but this is more a
problem for full-time maintenance workers than end-users changing the occasional toner cartridge.
Laser printers give nearly typeset-quality output, with top-end printers used by desktop publishing
firms. Indeed, many books are nowadays produced using laser printers. The authors of this book
have produced camera-ready copy for other books on 300 and 600 dpi laser printers, although this
book required higher quality and the first edition was typeset at 1200 dpi onto special bromide
paper.
The most common types of dot-based printers are dot-matrix printers, ink-jet
printers and laser printers. These are listed roughly in order of increasing resolution
and quality, where dot-matrix printers typically have a resolution of 80–120 dpi ris-
ing to about 300–600 dpi for ink-jet printers and 600–2400 dpi for laser printers. By
varying the quantity of ink and quality of paper, ink-jet printers can be used to print
photo-quality prints from digital photographs.
Printing in the workplace
Although ink-jet and laser printers have the lion’s share of the office and home printer mar-
ket, there are many more specialist applications that require different technology.
Most shop tills use dot-matrix printing where the arrangement is often very clever, with one print-
head serving several purposes. The till will usually print one till roll which stays within the machine,
recording all transactions for audit purposes. An identical receipt is printed for the customer.
In addition, many will print onto the customer’s own check or produce a credit card slip for the
customer to sign. Sometimes the multiple copies are produced using two or more layers of paper
where the top layer receives the ink and the lower layers use pressure-sensitive paper – not
possible using ink-jet or laser technology. Alternatively, a single printhead may move back and
forth over several small paper rolls within the same machine, as well as moving over the slot for
the customer’s own check.
As any printer owner will tell you, office printers are troublesome, especially as they age. Dif-
ferent printing technology is therefore needed in harsh environments or where a low level of
supervision is required. Thermal printers use special heat-sensitive paper that changes color when
heated. The printhead simply heats the paper where it wants a dot. Often only one line of dots
is produced per pass, in contrast to dot-matrix and ink-jet printers, which have several pins or
jets in parallel. The image is then produced using several passes per line, achieving a resolution
similar to a dot-matrix. Thermal paper is relatively expensive and not particularly nice to look
at, but thermal printers are mechanically simple and require little maintenance (no ink or toner
splashing about). Thermal printers are used in niche applications, for example industrial equipment,
some portable printers, and fax machines (although many now use plain paper).
As well as resolution, printers vary in speed and cost. Typically, office-quality ink-
jet or laser printers produce between four and eight pages per minute. Dot-matrix
printers are more often rated in characters per second (cps), and typical speeds may
be 200 cps for draft and 50 cps for letter-quality print. In practice, this means no
more than a page or so per minute. These are maximum speeds for simple text, and
printers may operate much more slowly for graphics.
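The cps-to-pages arithmetic can be checked with a quick calculation. This is a sketch: the figure of roughly 3000 characters per page is an assumption for illustration, not a number from the text.

```python
# Rough printing-time estimate for a dot-matrix printer.
# Assumption: a full page of text is about 50 lines of 60 characters.
chars_per_page = 50 * 60

draft_cps = 200    # draft-quality speed quoted in the text
letter_cps = 50    # letter-quality speed quoted in the text

draft_seconds = chars_per_page / draft_cps     # 15 s per page
letter_seconds = chars_per_page / letter_cps   # 60 s per page

print(f"draft:  {60 / draft_seconds:.1f} pages/min")   # 4.0 pages/min
print(f"letter: {60 / letter_seconds:.1f} pages/min")  # 1.0 pages/min
```

At letter quality this works out to a page a minute, consistent with the ‘no more than a page or so per minute’ figure above.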
Color ink-jet printers are substantially cheaper than even monochrome laser
printers. However, the recurrent costs of consumables may easily dominate this
initial cost. Both jet and laser printers have special-purpose parts (print cartridges,
toner, print drums), which need to be replaced every few thousand sheets; and they
must also use high-grade paper. It may be more difficult to find suitable grades of
recycled paper for laser printers.
2.7.2 Fonts and page description languages
Some printers can act in a mode whereby any characters sent to them (encoded in
ASCII, see Section 2.8.5) are printed, typewriter style, in a single font. Another case,
simple in theory, is when you have a bitmap picture and want to print it. The dots
to print are sent to the printer, and no further interpretation is needed. However, in
practice, it is rarely so simple.
Many printed documents are far more complex – they incorporate text in many
different fonts and many sizes, often italicized, emboldened and underlined. Within
the text you will find line drawings, digitized photographs and pictures generated
from ‘paint’ packages, including the ubiquitous ‘clip art’. Sometimes the computer
does all the work, converting the page image into a bitmap of the right size to be sent
to the printer. Alternatively, a description of the page may be sent to the printer.
At the simplest level, this will include commands to set the print position on the
page, and change the font size.
More sophisticated printers can accept a page description language, the most com-
mon of which is PostScript. This is a form of programming language for printing. It
includes some standard programming constructs, but also some special ones: paths
for drawing lines and curves, sophisticated character and font handling and scaled
bitmaps. The idea is that the description of a page is far smaller than the associated
bitmap, reducing the time taken to send the page to the printer. A bitmap version
of an A4 laser printer page at 300 dpi takes 8 Mbytes; to send this down a standard
serial printer cable would take 10 minutes! However, a computer in the printer has
to interpret the PostScript program to print the page; this is typically faster than 10
minutes, but is still the limiting factor for many print jobs.
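The 8 Mbyte and 10 minute figures above can be reconstructed under stated assumptions: one byte per dot, and a serial line running at 115.2 kbit/s (the text does not give the line speed, so that rate is an assumption for illustration).

```python
# Reconstructing the bitmap-size and transfer-time figures for an
# A4 page at 300 dpi sent down a serial printer cable.
dpi = 300
width_in, height_in = 8.27, 11.69    # A4 page (210 x 297 mm) in inches

dots = round(width_in * dpi) * round(height_in * dpi)
bitmap_bytes = dots                  # assumption: one byte per dot

serial_bps = 115_200                 # assumed serial line rate, bits/s
seconds = bitmap_bytes * 8 / serial_bps

print(f"{bitmap_bytes / 1e6:.1f} Mbytes")  # about 8.7 Mbytes
print(f"{seconds / 60:.1f} minutes")       # about 10 minutes
```

Under these assumptions the page image is a little under 9 Mbytes and takes roughly ten minutes to transfer, matching the figures in the text.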
Text is printed in a font with a particular size and shape. The size of a font is
measured in points (pt). The point is a printer’s measure and is about 1/72 of an
inch. The point size of the font is related to its height: a 12 point font has about
six lines per inch. The shape of a font is determined by its font name, for example
Times Roman, Courier or Helvetica. Times Roman font is similar to the type of
many newspapers, such as The Times, whereas Courier has a typewritten shape.
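The relationship between point size and line density is simple arithmetic. This sketch assumes type set solid, with no extra leading between lines, so real documents have slightly fewer lines per inch.

```python
# 1 point = 1/72 inch, so a font set solid at n points
# gives 72/n lines per inch.
def lines_per_inch(point_size: float) -> float:
    return 72 / point_size

print(lines_per_inch(12))  # 6.0 - "about six lines per inch"
print(lines_per_inch(10))  # 7.2 - the size this book is set in
```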
Some fonts, such as Courier, are fixed pitch, that is each character has the
same width. The alternative is a variable-pitched font, such as Times Roman or
Gill Sans, where some characters, such as the ‘m’, are wider than others, such as
the ‘i’. Another characteristic of fonts is whether they are serif or sans-serif. A serif
font has fine, short cross-lines at the ends of the strokes, imitating those found on cut
stone lettering. A sans-serif font has square-ended strokes. In addition, there are
special fonts looking like Gothic lettering or cursive script, and fonts of Greek letters
and special mathematical symbols.
This book is set in 10 point Minion font using PostScript. Minion is a variable-
pitched serif font. Figure 2.14 shows examples of different fonts.
Figure 2.14 Examples of different fonts, including a mathematics font: αβξ±π∈∀∞⊥≠ℵ∂√∃
DESIGN FOCUS
Readability of text
There is a substantial body of knowledge about the readability of text, both on screen and on paper.
An MSc student visited a local software company and, on being shown some of their systems, remarked
on the fact that they were using upper case throughout their displays. At that stage she had only com-
pleted part of an HCI course but she had read Chapter 1 of this book and already knew that WORDS
WRITTEN IN BLOCK CAPITALS take longer to read than those in lower case. Recall that this is largely
because of the clues given by word shapes and is the principle behind ‘look and say’ methods of teach-
ing children to read. The company immediately recognized the value of the advice and she instantly rose
in their esteem!
However, as with many interface design guidelines there are caveats. Although lower-case words are
easier to read, individual letters and nonsense words are clearer in upper case. For example, one writes
flight numbers as ‘BA793’ rather than ‘ba793’. This is particularly important when naming keys to press
(for example, ‘Press Q to quit’) as keyboards have upper-case legends.
Font shapes can also make a difference; for printed text, serif fonts make it easier to run one’s eye
along a line of text. However, they usually reproduce less well on screen where the resolution is
poorer.
2.7.3 Screen and page
A common requirement of word processors and desktop publishing software is that
what you see is what you get (see also Chapters 4 and 17), which is often called by its
acronym WYSIWYG (pronounced whizz-ee-wig). This means that the appearance
of the document on the screen should be the same as its eventual appearance on
the printed page. In so far as this means that, for example, centered text is displayed
centered on the screen, this is reasonable. However, this should not cloud the fact
that screen and paper are very different media.
A typical screen resolution is about 72 dpi compared with a laser printer at over
600 dpi. Some packages can show magnified versions of the document in order to
help in this. Most screens use an additive color model using red, green and blue light,
whereas printers use a subtractive color model with cyan, magenta, yellow and black
inks, so conversions have to be made. In addition, the sizes and aspect ratios are very
different. An A4 page is about 11 inches tall by 8 wide (297 × 210 mm), whereas a
screen is often of similar dimensions, but wider than it is tall.
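The additive-to-subtractive conversion can be sketched as follows. This is the naive RGB-to-CMY formula only; real printer drivers also perform black generation (the K in CMYK) and apply device color profiles.

```python
# Naive conversion from additive RGB (screen light) to subtractive
# CMY (printer inks), values in the range 0.0-1.0. Each ink absorbs
# the light that its complementary primary would emit.
def rgb_to_cmy(r: float, g: float, b: float) -> tuple:
    return (1 - r, 1 - g, 1 - b)

print(rgb_to_cmy(1, 0, 0))  # pure red  -> (0, 1, 1): magenta + yellow ink
print(rgb_to_cmy(1, 1, 1))  # white     -> (0, 0, 0): no ink at all
```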
These differences cause problems when designing software. Should you try to
make the screen image as close to the paper as possible, or should you try to make
the best of each? One approach to this would be to print only what could be dis-
played, but that would waste the extra resolution of the printer. On the other
hand, one can try to make the screen as much like paper as possible, which
is the intention behind the standard use of black text on a white background,
rotatable A4 displays, and tablet PCs. This is a laudable aim, but cannot get rid of
all the problems.
A particular problem lies with fonts. Imagine we have a line of ‘m’s, each having a
width of 0.15 inch (4 mm). If we display them on a 72 dpi screen, then we can make
the screen character 10 or 11 dots wide, in which case the screen version will be
narrower or wider than the printed version. Alternatively, we can place each screen
character as near as possible to where the printed characters would lie, in which case
the ‘m’s on the screen would have different spaces between them: ‘mm mm mm mm
m’. The latter looks horrible on the screen, so most software chooses the former
approach. This means that text that aligns on screen may not do so on printing.
Some systems use a uniform representation for screen and printer, using the same
font descriptions and even, in the case of the NeXT operating system, PostScript
for screen display as well as printer output (also PDF with Mac OS X). However,
this simply exports the problem from the application program to the operating
system.
The differences between screen and printer mean that different forms of graphic
design are needed for each. For example, headings and changes in emphasis are made
using font style and size on paper, but using color, brightness and line boxes on
screen. This is not usually a problem for the display of the user’s own documents as
the aim is to give the user as good an impression of the printed page as possible, given
the limitations. However, if one is designing parallel paper and screen forms, then
one has to trade off consistency between the two representations with clarity in each.
An overall similar layout, but with different forms of presentation for details, may be
appropriate.
2.7.4 Scanners and optical character recognition
Printers take electronic documents and put them on paper – scanners reverse this
process. They start by turning the image into a bitmap, but with the aid of optical
character recognition can convert the page right back into text. The image to be con-
verted may be printed, but may also be a photograph or hand-drawn picture.
There are two main kinds of scanner: flat-bed and hand-held. With a flat-bed
scanner, the page is placed on a flat glass plate and the whole page is converted
into a bitmap. A variant of the flat-bed is where sheets to be scanned are pulled
through the machine, common in multi-function devices (printer/fax/copier). Many
flat-bed scanners allow a small pile of sheets to be placed in a feed tray so that
they can all be scanned without user intervention. Hand-held scanners are pulled
over the image by hand. As the head passes over an area it is read in, yielding
a bitmap strip. A roller ensures that the scanner knows how fast it is
being pulled and thus how big the image is. The scanner is typically only 3 or 4 inches
(80 or 100 mm) wide and may even be the size of a large pen (mainly used for
scanning individual lines of text). This means at least two or three strips must be
‘glued’ together by software to make a whole page image, quite a difficult process
as the strips will overlap and may not be completely parallel to one another, as
well as suffering from problems of different brightness and contrast. However,
for desktop publishing small images such as photographs are quite common, and
as long as one direction is less than the width of the scanner, they can be read in
one pass.
Scanners work by shining a beam of light at the page and then recording the intens-
ity and color of the reflection. Like photocopiers, the color of the light that is shone
means that some colors may appear darker than others on a monochrome scanner.
For example, if the light is pure red, then a red image will reflect the light completely
and thus not appear on the scanned image.
Like printers, scanners differ in resolution, commonly between 600 and 2400 dpi,
and like printers the quoted resolution needs careful interpretation. Many have a
lower resolution scanhead but digitally interpolate additional pixels – the same is
true for some digital cameras. Monochrome scanners are typically only found in
multi-function devices, but color scanners usually have monochrome modes for
black and white or grayscale copying. Scanners will usually return up to 256 levels
of gray or RGB (red, green, blue) color. If a pure monochrome image is required
(for instance, from a printed page), then the software can threshold the grayscale
image; that is, turn all pixels darker than some particular value black, and the rest white.
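Thresholding is easy to express in code. A minimal sketch, over a grayscale image held as a list of rows with 0 as black and 255 as white:

```python
# Threshold a grayscale image: pixels darker than the cut-off
# become black (0), everything else becomes white (255).
def threshold(image, cutoff=128):
    return [[0 if px < cutoff else 255 for px in row] for row in image]

page = [[250,  30, 200],
        [ 10, 140, 255]]
print(threshold(page))
# [[255, 0, 255], [0, 255, 255]]
```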
Scanners are used extensively in desktop publishing (DTP) for reading in hand-
drawn pictures and photographs. This means that cut and paste can be performed
electronically rather than with real glue. In addition, the images can be rotated,
scaled and otherwise transformed, using a variety of image manipulation software
tools. Such tools are becoming increasingly powerful, allowing complex image trans-
formations to be easily achieved; these range from color correction, through the
merging of multiple images to the application of edge-detection and special effects
filters. The use of multiple layers allows photomontage effects that would be imposs-
ible with traditional photographic or paper techniques. Even where a scanned image
is simply going to be printed back out as part of a larger publication, some process-
ing typically has to be performed to match the scanned colors with those produced
during printing. For film photographs there are also special film scanners that can
scan photographic negatives or color slides. Of course, if the photographs are digital
no scanning is necessary.
Another application area is in document storage and retrieval systems, where
paper documents are scanned and stored on computer rather than (or sometimes as
well as) in a filing cabinet. The costs of maintaining paper records are enormous, and
electronic storage can be cheaper, more reliable and more flexible. Storing a bitmap
image is neither most useful (in terms of access methods), nor space efficient (as we
will see later), so scanning may be combined with optical character recognition to
obtain the text rather than the page image of the document.
Optical character recognition (OCR) is the process whereby the computer can
‘read’ the characters on the page. It is only comparatively recently that print could be
reliably read, since the wide variety of typefaces and print sizes makes this more
difficult than one would imagine – it is not simply a matter of matching a character
shape to the image on the page. In fact, OCR is rather a misnomer nowadays as,
although the document is optically scanned, the OCR software itself operates on the
bitmap image. Current software can recognize ‘unseen’ fonts and can even produce
output in word-processing formats, preserving super- and subscripts, centering,
italics and so on.
Another important area is electronic publishing for multimedia and the world
wide web. Whereas in desktop publishing the scanned image usually ends up (after
editing) back on paper, in electronic publishing the scanned image is destined to be
viewed on screen. These images may be used simply as digital photographs or may
be made active, whereby clicking on some portion of the image causes pertinent
information to be displayed (see Chapter 3 for more on the point-and-click style
of interaction). One big problem when using electronic images is the plethora of
formats for storing graphics (see Section 2.8.5). Another problem is the fact that
different computers can display different numbers of colors and that the appearance
of the same image on different monitors can be very different.
The importance of electronic publishing and also the ease of electronically manip-
ulating images for printing have made the digital camera increasingly popular.
Rather than capturing an image on film, a digital camera has a small light-sensitive
chip that can directly record an image into memory.
Worked exercise What input and output devices would you use for the following systems? For each, compare
and contrast alternatives, and if appropriate indicate why the conventional keyboard, mouse
and CRT screen may be less suitable.
(a) portable word processor
(b) tourist information system
(c) tractor-mounted crop-spraying controller
(d) air traffic control system
(e) worldwide personal communications system
(f) digital cartographic system.

Paper-based interaction

Paper is principally seen as an output medium. You type in some text, format it, print it and
read it. The idea of the paperless office was to remove the paper from the write–read loop entirely,
but it didn’t fundamentally challenge its place in the cycle as an output medium. However, this view of
paper as output has changed as OCR technology has improved and scanners have become commonplace.
Workers at Xerox Palo Alto Research Center (also known as Xerox PARC) capitalized on this by
using paper as a medium of interaction with computer systems [195]. A special identifying mark is
printed onto forms and similar output. The printed forms may have check boxes or areas for writ-
ing numbers or (in block capitals!) words. The form can then be scanned back in. The system reads
the identifying mark and thereby knows what sort of paper form it is dealing with. It doesn’t have
to use OCR on the printed text of the form as it printed it, but can detect the check boxes that
have been filled in and even recognize the text that has been written. The identifying mark the
researchers used is composed of backward and forward slashes, ‘\’ and ‘/’, and is called a glyph.
An alternative would have been to use bar codes, but the slashes were found to fax and scan
more reliably. The research version of this system was known as XAX, but it is now marketed as
Xerox PaperWorks.
One application of this technology is mail order catalogs. The order form is printed with a glyph.
When completed, forms can simply be collected into bundles and scanned in batches, generating
orders automatically. If the customer faxes an order the fax-receiving software recognizes the
glyph and the order is processed without ever being handled at the company end. Such a paper
user interface may involve no screens or keyboards whatsoever.
Some types of paper now have identifying marks micro-printed like a form of textured water-
mark. This can be used both to identify the piece of paper (as the glyph does), and to identify the
location on the paper. If this book were printed on such paper it would be possible to point at
a word or diagram with a special pen-like device and have it work out what page you are on
and where you are pointing and thus take you to appropriate web materials...perhaps the fourth
edition...
It is paradoxical that Xerox PARC, where much of the driving work behind current ‘mouse and
window’ computer interfaces began, has also developed this totally non-screen and non-mouse
paradigm. However, the common principle behind each is the novel and appropriate use of differ-
ent media for graceful interaction.
Answer In the later exercise on basic architecture (see Section 2.8.6), we focus on ‘typical’
systems, whereas here the emphasis is on the diversity of different devices needed for
specialized purposes. You can ‘collect’ devices – watch out for shop tills, bank tellers,
taxi meters, lift buttons, domestic appliances, etc.
(a) Portable word processor
The determining factors are size, weight and battery power. However, remember
the purpose: this is a word processor not an address book or even a data entry
device.
(i) LCD screen – low-power requirement
(ii) trackball or stylus for pointing
(iii) real keyboard – you can’t word process without a reasonable keyboard and
stylus handwriting recognition is not good enough
(iv) small, low-power bubble-jet printer – although not always necessary, this
makes the package stand alone. It is probably not so necessary that the printer
has a large battery capacity as printing can probably wait until a power point is
found.
(b) Tourist information system
This is likely to be in a public place. Most users will only visit the system once, so
the information and mode of interaction must be immediately obvious.
(i) touchscreen only – easy and direct interaction for first-time users (see also
Chapter 3)
(ii) NO mice or styluses – in a public place they wouldn’t stay long!
(c) Tractor-mounted crop-spraying controller
A hostile environment with plenty of mud and chemicals. Requires numerical input
for flow rates, etc., but probably no text
(i) touch-sensitive keypad – ordinary keypads would get blocked up
(ii) small dedicated LED display (LCDs often can’t be read in sunlight and large
screens are fragile)
(iii) again no mice or styluses – they would get lost.
(d) Air traffic control system
The emphasis is on immediately available information and rapid interaction. The
controller cannot afford to spend time searching for information; all frequently used
information must be readily available.
(i) several specialized displays – including overlays of electronic information on
radar
(ii) light pen or stylus – high-precision direct interaction
(iii) keyboard – for occasional text input, but consider making it fold out of the way.
(e) Worldwide personal communications system
Basically a super mobile phone! If it is to be kept on hand all the time it must be
very light and pocket sized. However, to be a ‘communications’ system one would
imagine that it should also act as a personal address/telephone book, etc.
(i) standard telephone keypad – the most frequent use
(ii) small dedicated LCD display – low power, specialized functions
(iii) possibly stylus for interaction – it allows relatively rich interaction with the
address book software, but little space
(iv) a ‘docking’ facility – the system itself will be too small for a full-sized key-
board(!), but you won’t want to enter in all your addresses and telephone num-
bers by stylus!
(f) Digital cartographic system
This calls for very high-precision input and output facilities. It is similar to CAD in
terms of the screen facilities and printing, but in addition will require specialized
data capture.
(i) large high-resolution color VDU (20 inch or bigger) – these tend to be enor-
mously big (from back to front). LCD screens, although promising far thinner
displays in the long term, cannot at present be made large enough
(ii) digitizing tablet – for tracing data on existing paper maps. It could also double
up as a pointing device for some interaction
(iii) possibly thumbwheels – for detailed pointing and positioning tasks
(iv) large-format printer – indeed very large: an A2 or A1 plotter at minimum.
2.8 MEMORY
Like human memory, we can think of the computer’s memory as operating at dif-
ferent levels, with those that have the faster access typically having less capacity. By
analogy with the human memory, we can group these into short-term and long-term
memories (STM and LTM), but the analogy is rather weak – the capacity of the com-
puter’s STM is a lot more than seven items! The different levels of computer mem-
ory are more commonly called primary and secondary storage.
The details of computer memory are not in themselves of direct interest to the
user interface designer. However, the limitations in capacity and access methods are
important constraints on the sort of interface that can be designed. After some fairly
basic information, we will put the raw memory capacity into perspective with the
sort of information which can be stored, as well as again seeing how advances in
technology offer more scope for the designer to produce more effective interfaces. In
particular, we will see how the capacity of typical memory copes with video images as
these are becoming important as part of multimedia applications (see Chapter 21).
2.8.1 RAM and short-term memory (STM)
At the lowest level of computer memory are the registers on the computer chip, but
these have little impact on the user except in so far as they affect the general speed of
108 Chapter 2 n The computer
the computer. Most currently active information is held in silicon-chip random
access memory (RAM). Different forms of RAM differ as to their precise access times,
power consumption and characteristics. Typical access times are of the order of
10 nanoseconds, that is a hundred-millionth of a second, and information can be
accessed at a rate of around 100 Mbytes (million bytes) per second. Typical storage
in modern personal computers is between 64 and 256 Mbytes.
Most RAM is volatile, that is its contents are lost when the power is turned off.
However, many computers have a small amount of non-volatile RAM, which retains its
contents, perhaps with the aid of a small battery. This may be used to store setup
information in a large computer, but in a pocket organizer will be the whole mem-
ory. Non-volatile RAM is more expensive so is only used where necessary, but with
many notebook computers using very low-power static RAM, the divide is shrink-
ing. By strict analogy, non-volatile RAM ought to be classed as LTM, but the import-
ant thing we want to emphasize is the gulf between STM and LTM in a traditional
computer system.
In PDAs the distinctions become more confused as the battery power means that
the system is never completely off, so RAM memory effectively lasts for ever. Some
also use flash memory, which is a form of silicon memory that sits between fixed
content ROM (read-only memory) chips and normal RAM. Flash memory is relat-
ively slow to write, but once written retains its content even with no power whatso-
ever. These are sometimes called silicon disks on PDAs. Digital cameras typically
store photographs in some form of flash media and small flash-based devices are
used to plug into a laptop or desktop’s USB port to transfer data.
2.8.2 Disks and long-term memory (LTM)
For most computer users the LTM consists of disks, possibly with small tapes for
backup. The existence of backups, and appropriate software to generate and retrieve
them, is an important area for user security. However, we will deal mainly with those
forms of storage that impact the interactive computer user.
There are two main kinds of technology used in disks: magnetic disks and optical
disks. The most common storage media, floppy disks and hard (or fixed) disks,
are coated with magnetic material, like that found on an audio tape, on which the
information is stored. Typical capacities of floppy disks lie between 300 kbytes and
1.4 Mbytes, but as they are removable, you can have as many as you have room for
on your desk. Hard disks may store from under 40 Mbytes to several gigabytes
(Gbytes), that is several thousand million bytes. With disks there are two access times
to consider, the time taken to find the right track on the disk, and the time to read
the track. The former dominates random reads, and is typically of the order of 10 ms
for hard disks. The transfer rate once the track is found is then very high, perhaps
several hundred kilobytes per second. Various forms of large removable media are
also available, fitting somewhere between floppy disks and removable hard disks, and
are especially important for multimedia storage.
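These two components of access time can be put into a simple cost model. The sketch below is illustrative only: the 10 ms seek and the transfer rate are the rough figures quoted above, not measurements of any particular drive.

```python
def disk_read_time(size_bytes, seek_s=0.010, transfer_bytes_per_s=300_000):
    """Rough time to read one contiguous chunk from a hard disk:
    one seek to locate the track, then a sequential transfer."""
    return seek_s + size_bytes / transfer_bytes_per_s

# A small random read is dominated by the seek ...
print(disk_read_time(3_000))        # 0.02 s: half seek, half transfer
# ... a large sequential read by the transfer rate.
print(disk_read_time(3_000_000))    # 10.01 s: the seek is negligible
```

This is why the seek time dominates random reads, while long sequential transfers run close to the raw transfer rate.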
Optical disks use laser light to read and (sometimes) write the information on the
disk. There are various high capacity specialist optical devices, but the most common
is the CD-ROM, using the same technology as audio compact discs. CD-ROMs have
a capacity of around 650 megabytes, but cannot be written to at all. They are useful
for published material such as online reference books, multimedia and software
distribution. Recordable CDs are a form of WORM device (write-once read-many)
and are more flexible in that information can be written, but (as the name suggests)
only once at any location – more like a piece of paper than a blackboard. They are
obviously very useful for backups and for producing very secure audit information.
Finally, there are fully rewritable optical disks, but the rewrite time is typically much
slower than the read time, so they are still primarily for archival not dynamic storage.
Many CD-ROM reader/writers can also read DVD format, originally developed for
storing movies. Optical media are more robust than magnetic disks and so it is easier
to use a jukebox arrangement, whereby many optical disks can be brought online
automatically as required. This can give an online capacity of many hundreds of
gigabytes. However, as magnetic disk capacities have grown faster than the fixed standard
of CD-ROMs, some massive capacity stores are moving to large disk arrays.
2.8.3 Understanding speed and capacity
So what effect do the various capacities and speeds have on the user? Thinking of our
typical personal computer system, we can summarize some typical capacities as in
Table 2.1.
We think first of documents. This book is about 320,000 words, or about 2
Mbytes, so it would hardly make a dent in 256 Mbytes of RAM. (This size – 2 Mbytes
– is unformatted and without illustrations; the actual size of the full data files is an
order of magnitude bigger, but still well within the capacity of main memory.) To
take a more popular work, the Bible would use about 4.5 Mbytes. This would still
consume only 2% of main memory, and disappear on a hard disk. However, it might
look tight on a smaller PDA. This makes the memory look not too bad, so long as
you do not intend to put your entire library online. However, many word processors
come with a dictionary and thesaurus, and there is no standard way to use the same
one with several products. Together with help files and the program itself, it is not
Table 2.1 Typical capacities of different storage media

                   STM small/fast    LTM large/slower
Media:             RAM               Hard disk
Capacity:          256 Mbytes        100 Gbytes
Access time:       10 ns             7 ms
Transfer rate:     100 Mbyte/s       30 Mbyte/s
unusual to find each application consuming tens or even hundreds of megabytes of
disk space – it is not at all difficult to fill a few gigabytes of disk!
Similarly, although 256 Mbytes of RAM are enough to hold most (but not all) sin-
gle programs, windowed systems will run several applications simultaneously, soon
using up many megabytes. Operating systems handle this by paging unused bits of
programs out of RAM onto disk, or even swapping the entire program onto disk.
This makes little difference to the logical functioning of the program, but has a
significant effect on interaction. If you select a window, and the relevant application
happens to be currently swapped out onto the disk, it has to be swapped back in. The
delay this causes can be considerable, and is both noticeable and annoying on many
systems.
The delays due to swapping are a symptom of the von Neumann bottleneck
between disk and main memory. There is plenty of information in the memory, but
it is not where it is wanted, in the machine’s RAM. The path between them is limited
by the transfer rate of the disk and is too slow. Swapping due to the operating system
may be difficult to avoid, but for an interactive system designer some of these prob-
lems can be avoided by thinking carefully about where information is stored and
when it is transferred. For example, the program can be lazy about information
transfer. Imagine the user wants to look at a document. Rather than reading in the
whole thing before letting the user continue, just enough is read in for the first page
to be displayed, and the rest is read during idle moments.
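This lazy strategy can be sketched with a Python generator; the page size here is an arbitrary illustration, not a figure from any real system:

```python
def lazy_pages(path, page_size=4096):
    """Yield a document one 'page' at a time, so the first page can be
    displayed before the rest of the file has been read from disk."""
    with open(path, "rb") as f:
        while True:
            page = f.read(page_size)
            if not page:
                return
            yield page
```

The interface shows the first yielded page immediately and pulls in the remaining ones during idle moments, rather than blocking on a full read of the file.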
Returning to documents, if they are scanned as bitmaps (and not read using
OCR), then the capacity of our system looks even less impressive. Say an 11 × 8 inch
(297 × 210 mm) page is scanned with an 8 bit grayscale (256 levels) setting at 1200 dpi.
The image contains about one billion bits, that is about 128 Mbyte. So, our 100 Gbyte
disk could store 800 pages – just OK for this book, but not for the Bible.
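The arithmetic behind these figures can be checked directly (the page size, resolution and disk capacity are the example values used above):

```python
dpi = 1200                       # scanning resolution
bits_per_pixel = 8               # 8 bit grayscale, 256 levels
pixels = (8 * dpi) * (11 * dpi)  # an 11 x 8 inch page

bits = pixels * bits_per_pixel                 # about one billion bits
mbytes = bits / 8 / 1_000_000                  # roughly 127 Mbyte per page
pages = (100 * 1_000_000_000) // (bits // 8)   # ~800 pages on a 100 Gbyte disk
```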
If we turn to video, things are even worse. Imagine we want to store moving
video using 12 bits for each pixel (4 bits for each primary color giving 16 levels of
brightness), each frame is 512 × 512 pixels, and we store at 25 frames per second.
Technological change and storage capacity
Most of the changes in this book since the first and second editions have been additions
where new developments have come along. However, this portion has had to be scrutinized line
by line as the storage capacities of high-end machines when this book was first published in 1993
looked ridiculous as we revised it in 1997 and then again in 2003. One of our aims in this chapter
was to give readers a concrete feel for the capacities and computational possibilities in standard
computers. However, the pace of advances in this area means that it becomes out of date almost
as fast as it is written! This is also a problem for design; it is easy to build a system that is sensible
given a particular level of technology, but becomes meaningless later. The solution is either to issue
ever more frequent updates and new versions, or to exercise a bit of foresight. . .
This is by no means a high-quality image, but each frame requires approximately
400 kbytes giving 10 Mbytes per second. Our disk will manage about three hours
of video – one good movie. Lowering our sights to still photographs, good digital
cameras usually take 2 to 4 megapixels at 24 bit color; that is 10 Mbytes of raw
uncompressed image – you’d not get all your holiday snaps into main memory!
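Again the numbers can be reproduced in a few lines, using the example frame format above:

```python
frame_bytes = 512 * 512 * 12 // 8        # 12 bits per pixel -> ~400 kbytes/frame
bytes_per_second = frame_bytes * 25      # 25 frames per second -> ~10 Mbytes/s
hours = 100e9 / bytes_per_second / 3600  # a 100 Gbyte disk holds ~3 hours
```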
2.8.4 Compression
In fact, things are not quite so bad, since compression techniques can be used to
reduce the amount of storage required for text, bitmaps and video. All of these things
are highly redundant. Consider text for a moment. In English, we know that if we use
the letter ‘q’ then ‘u’ is almost bound to follow. At the level of words, some words
like ‘the’ and ‘and’ appear frequently in text in general, and for any particular work
one can find other common terms (this book mentions ‘user’ and ‘computer’ rather
frequently). Similarly, in a bitmap, if one bit is white, there is a good chance the next
will be as well. Compression algorithms take advantage of this redundancy. For
example, Huffman encoding gives short codes to frequent words [182], and run-
length encoding represents long runs of the same value by length value pairs. Text
can easily be reduced by a factor of five and bitmaps often compress to 1% of their
original size.
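Run-length encoding is simple enough to sketch in full; this is a minimal illustration of the idea, not a production codec:

```python
def rle_encode(values):
    """Represent runs of identical values as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((v, 1))               # start a new run
    return runs

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

# A mostly-white scanline (1 = white) collapses to three pairs:
line = [1] * 60 + [0] * 3 + [1] * 37
print(rle_encode(line))   # [(1, 60), (0, 3), (1, 37)]
```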
For video, in addition to compressing each frame, we can take advantage of the
fact that successive frames are often similar. We can compute the difference between
successive frames and then store only this – compressed, of course. More sophistic-
ated algorithms detect when the camera pans and use this information also. These
differencing methods fail when the scene changes, and so the process periodically has
to restart and send a new, complete (but compressed) image. For storage purposes
this is not a problem, but when used for transmission over telephone lines or net-
works it can mean glitches in the video as the system catches up.
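The core of the differencing idea, stripped of all real compression detail, is just per-pixel subtraction. A toy sketch, treating frames as flat lists of pixel values:

```python
def frame_diff(prev, curr):
    """Difference between successive frames: near-zero wherever the scene
    is static, so it compresses far better than the raw frame."""
    return [c - p for p, c in zip(prev, curr)]

def apply_diff(prev, diff):
    """Reconstruct the next frame from the previous one plus the difference."""
    return [p + d for p, d in zip(prev, diff)]
```

When the scene changes, the difference is as large as a full frame, which is why the encoder must periodically restart with a complete keyframe.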
With these reductions it is certainly possible to store low-quality video at
64 kbyte/s; that is, each gigabyte of disk holds over four hours of highly compressed
video. However, it still makes the humble video cassette look very good value.
Probably the leading edge of video still and photographic compression is fractal
compression. Fractals have been popularized by the images of the Mandelbrot set (that
swirling pattern of computer-generated colors seen on many T-shirts and posters).
Fractals refer to any image that contains parts which, when suitably scaled, are sim-
ilar to the whole. If we look at an image, it is possible to find parts which are approx-
imately self-similar, and these parts can be stored as a fractal with only a few numeric
parameters. Fractal compression is especially good for textured features, which cause
problems for other compression techniques. The decompression of the image can
be performed to any degree of accuracy, from a very rough soft-focus image, to
one more detailed than the original. The former is very useful as one can produce
poor-quality output quickly, and better quality given more time. The latter is rather
remarkable – the fractal compression actually fills in details that are not in the
original. These details are not accurate, but look convincing!
2.8.5 Storage format and standards
The most common data types stored by interactive programs are text and bitmap
images, with increasing use of video and audio, and this subsection looks at the
ridiculous range of file storage standards. We will consider database retrieval in the
next subsection.
The basic standard for text storage is the ASCII (American standard code for
information interchange) character codes, which assign to each standard printable
character and several control characters an internationally recognized 7 bit code
(decimal values 0–127), which can therefore be stored in an 8 bit byte, or be
transmitted as 8 bits including parity. Many systems extend the codes to the values 128–255,
including line-drawing characters, mathematical symbols and international letters such
as ‘æ’. There is a 16 bit extension, the UNICODE standard, which has enough room
for a much larger range of characters including the Japanese Kanji character set.
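These code ranges are easy to verify in any language with Unicode strings; a few Python checks (illustrative only, not part of the standards themselves):

```python
# ASCII fits in 7 bits ...
assert ord("A") == 65 and ord("A") < 2**7
# ... the 'extended' codes occupy 128-255 ...
assert 128 <= ord("æ") <= 255
# ... and a Kanji character needs the larger Unicode space.
assert ord("日") > 255

# In the UTF-8 encoding, ASCII still takes one byte, Kanji three:
assert len("A".encode("utf-8")) == 1
assert len("日".encode("utf-8")) == 3
```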
As we have already discussed, modern documents consist of more than just characters.
The text is in different fonts and includes formatting information such as centering,
page headers and footers. On the whole, the storage of formatted text is vendor specific,
since virtually every application has its own file format. This is not helped by the fact
that many suppliers attempt to keep their file formats secret, or update them fre-
quently to stop others’ products being compatible. With the exception of bare ASCII,
the most common shared format is rich text format (RTF), which encodes formatting
information including style sheets. However, even where an application will import
or export RTF, it may represent a cut-down version of the full document style.
RTF regards the document as formatted text, that is it concentrates on the appear-
ance. Documents can also be regarded as structured objects: this book has chapters
containing sections, subsections, ..., paragraphs, sentences, words and characters. There
are ISO standards for document structure and interchange, which in theory could be
used for transfer between packages and sites, but these are rarely used in practice.
Just as the PostScript language is used to describe the printed page, SGML (standard
generalized markup language) can be used to store structured text in a reasonably
extensible way. You can define your own structures (the definition itself in SGML),
and produce documents according to them. XML (extensible markup language), a
lightweight version of SGML, is now used extensively for web-based applications.
For bitmap storage the range of formats is seemingly unending. The stored image
needs to record the size of the image, the number of bits per pixel, possibly a color
map, as well as the bits of the image itself. In addition, an icon may have a ‘hot-spot’
for use as a cursor. If you think of all the ways of encoding these features, or leaving
them implicit, and then consider all the combinations of these different encodings,
you can see why there are problems. And all this before we have even considered
the effects of compression! There is, in fact, a whole software industry producing
packages that convert from one format to another.
Given the range of storage standards (or rather lack of standards), there is no easy
advice as to which is best, but if you are writing a new word processor and are about
to decide how to store the document on disk, think, just for a moment, before
defining yet another format.
2.8.6 Methods of access
Standard database access is by special key fields with an associated index. The user
has to know the key before the system can find the information. A telephone direct-
ory is a good example of this. You can find out someone’s telephone number if you
know their name (the key), but you cannot find the name given the number. This is
evident in the interface of many computer systems. So often, when you contact an
organization, they can only help you if you give your customer number, or last order
number. The usability of the system is seriously impaired by a shortsighted reliance
on a single key and index. In fact, most database systems will allow multiple keys and
indices, allowing you to find a record given partial information. So these problems
are avoidable with only slight foresight.
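The multiple-index idea is trivially demonstrated with two dictionaries over the same records (the names and numbers here are made up for illustration):

```python
# One record set, two indices: a lookup works from either field.
subscribers = [("Smith", "01524 65201"), ("Jones", "0113 812000")]

by_name = {name: number for name, number in subscribers}
by_number = {number: name for name, number in subscribers}

print(by_name["Smith"])          # forward lookup, as in a printed directory
print(by_number["0113 812000"])  # the reverse lookup a single index forbids
```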
There are valid reasons for not indexing on too many items. Adding extra indices
adds to the size of the database, so one has to balance ease of use against storage cost.
However, with ever-increasing disk sizes, this is not a good excuse for all but extreme
examples. Unfortunately, brought up on lectures about algorithmic efficiency, it is
easy for computer scientists to be stingy with storage. Another, more valid, reason
for restricting the fields you index is privacy and security. For example, telephone
companies will typically hold an online index that, given a telephone number, would
return the name and address of the subscriber, but to protect the privacy of their cus-
tomers, this information is not divulged to the general public.
It is often said that dictionaries are only useful for people who can spell. Bad
spellers do not know what a word looks like so cannot look it up to find out. Not only
in spelling packages, but in general, an application can help the user by matching
badly spelt versions of keywords. One example of this is do what I mean (DWIM)
used in several of Xerox PARC’s experimental programming environments. If a
command name is misspelt the system prompts the user with a close correct name.
Menu-based systems make this less of an issue, but one can easily imagine doing
the same with, say, file selection. Another important instance of this principle is
Soundex, a way of indexing words, especially names. Given a key, Soundex finds
those words which sound similar. For example, given McCloud, it would find
MacCleod. These are all examples of forgiving systems, and in general one should aim
to accommodate the user’s mistakes. Again, there are exceptions to this: you do not
want a bank’s automated teller machine (ATM) to give money when the PIN num-
ber is almost correct!
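A simplified version of the classic Soundex algorithm shows how 'sounds like' matching works (this sketch omits the special handling of 'h' and 'w' found in the full algorithm):

```python
def soundex(name):
    """Simplified Soundex: first letter plus up to three digits coding
    the remaining consonants; similar-sounding names share a code."""
    codes = {}
    for digit, letters in [("1", "bfpv"), ("2", "cgjkqsxz"),
                           ("3", "dt"), ("4", "l"), ("5", "mn"), ("6", "r")]:
        for ch in letters:
            codes[ch] = digit
    name = name.lower()
    digits = [codes.get(ch, "") for ch in name]
    out, prev = [], digits[0]
    for d in digits[1:]:
        if d and d != prev:   # skip vowels, collapse repeated codes
            out.append(d)
        prev = d
    return (name[0].upper() + "".join(out) + "000")[:4]

print(soundex("McCloud"), soundex("MacCleod"))   # M243 M243
```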
Not all databases allow long passages of text to be stored in records, perhaps set-
ting a maximum length for text strings, or demanding the length be fixed in advance.
Where this is the case, the database seriously restricts interface applications where
text forms an important part. At the other extreme, free text retrieval systems are
centered on unformatted, unstructured text. These systems work by keeping an index
of every word in every document, and so you can ask ‘give me all documents with
the words “human” and “computer” in them’. Programs, such as versions of the
UNIX ‘grep’ command, give some of the same facilities by quickly scanning a list of
files for a certain word, but are much slower. On the web, free text search is of course
the standard way to find things using search engines.
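An inverted index of this kind takes only a few lines of code (a toy sketch with made-up documents):

```python
def build_index(documents):
    """Map every word to the set of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

docs = {
    "d1": "the human user",
    "d2": "the computer",
    "d3": "human computer interaction",
}
index = build_index(docs)
# 'give me all documents with the words "human" and "computer" in them'
print(index["human"] & index["computer"])   # {'d3'}
```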