Part F - Design

Usability Engineering

Introduce the Discipline of Usability Engineering
Analyze the User
Describe the Elements of a Task Analysis
Define the Metrics used to Evaluate Usability
Describe the process of determining the usability of a system
Introduce the elements of the process and the options available to the evaluator

"Usability engineering addresses the effectiveness, efficiency and satisfaction with which specified users can achieve specified goals in particular environments" (ISO DIS 9241-11)

User | Task Analysis | Metrics | Specifications | Methodology | Experiments | Participatory | Inspection | Exercises

The need for usability engineering is widely recognized.  Usability engineering turns a poorly designed product into a well-designed one. 

"[Usability engineering is] a process whereby the usability of a product is specified quantitatively and in advance.  Then as the product itself, or early baselevels or prototypes of the product are built, it is demonstrated that they do indeed reach the planned levels of usability" [Tyldesley 1990]

Usability engineering tries to make a product suit the task for which the product was conceived.  This involves not only the functionality of the product but also the ergonomic considerations that make the product easy to use.  There are three parts to a usability study:

  • the user
  • the task
  • the metrics


Usability engineers are trained to make usability measurements and decisions, and to follow an elaborate process in making them. 

To turn the evaluation of any system into a usability engineering process, the usability engineer creates a usability specification. 

After the usability engineer has made the measurements with respect to the specification, s/he takes the results as a guide for the rest of the usability process, which includes:

  • locating poorly designed aspects of the system
  • re-designing them to meet the users' needs in an improved way
  • re-evaluating the usability of the new design

[Figure: the usability process]


Design rules help increase the usability of software products.  They include:

  • Principles
  • Standards
  • Guidelines
  • Patterns

Design principles are the most abstract rules.  They are derived from observing successful systems and discovering why they were successful. 


Design principles can be broken into 3 categories:

  • Learnability - how easy to learn
  • Flexibility - the ways a user can communicate with the system
  • Robustness - features supporting successful achievement of goals


Learnability principles address the features of a system which allow a novice user to:

  • understand the system
  • use the system at an initial level
  • master the system to use it efficiently with a higher level of proficiency


Predictability involves being able to predict the effect of an action based on how the system has performed in the past.  The user should be able to observe the system for a period of time and then predict the effect of actions. 

The goal is to achieve predictability with minimal exposure to the system.  In the worst case, the user would have to see every operation performed to be able to predict the effect of an action.  In the best case, observing a few operations is sufficient to allow someone to predict future operations. 


Synthesizability is the ability of the user to assess the effect of past operations on the state of the system.  Predictability cannot be achieved if the effect of operations cannot be readily observed.  One of the hallmarks of synthesizability is immediate feedback. 

For example: an early version of the Macintosh Finder allowed the user to create new folders, which were placed at the end of the list of files.  If there were enough files to fill the window, the new folder was placed out of sight.  This caused the user to assume the operation had failed and repeat it multiple times, resulting in many unnamed folders in the directory.


Familiarity refers to how familiar the user is with concepts similar to those of the system.  This is often aided by the system being a metaphor for something the user is familiar with in the real world.  For example, the word processor is similar to a typewriter.  Therefore, the user's familiarity with the typewriter will aid him or her in understanding the word processor.


Users can extend their understanding of the system by generalizing the concepts they already understand.  For example, a drawing program allows one to constrain an ellipse to be a circle by holding down the shift key while drawing.  The user can extend this to conclude that a rectangle can be constrained to be a square by holding down the shift key.

Generality is related to consistency. 
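The shift-key example above can be sketched in code.  This is a hypothetical Python illustration (not from the course notes): one constraining rule that generalizes across shapes, so learning it for ellipses predicts its behavior for rectangles.

```python
# Hypothetical sketch: one "constrain" rule generalizes across shapes.
# A user who learns that Shift constrains an ellipse to a circle can
# predict that the same modifier constrains a rectangle to a square.

def constrain_to_equal_sides(width, height, shift_held):
    """Return the (width, height) actually drawn.

    With Shift held, both dimensions snap to the smaller of the two,
    so an ellipse becomes a circle and a rectangle becomes a square.
    """
    if shift_held:
        side = min(width, height)
        return side, side
    return width, height

print(constrain_to_equal_sides(80, 50, shift_held=True))   # (50, 50)
print(constrain_to_equal_sides(80, 50, shift_held=False))  # (80, 50)
```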


Consistency is the most often mentioned principle of interface design.  It means that similar actions should have similar effects in similar situations.  Consistency reduces the load on the user's memory since there are fewer rules to remember. 

Input Consistency

For example:

  • the arrow keys are used to move around in many text editors
  • some editors use regular keys arranged in the same pattern on keyboards which do not have arrow keys
  • using e, s, d, x as arrow substitutes, which have the same relative positions
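A minimal Python sketch (hypothetical, not from the notes) of this kind of input consistency: the substitute keys map onto the same logical moves as the arrow keys, so both layouts have the same effect.

```python
# Hypothetical sketch: substitute keys resolve to the same cursor moves
# as the arrow keys, preserving their relative positions on the keyboard.

ARROW_KEYS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

# e/s/d/x form the same diamond pattern as the arrow cluster.
SUBSTITUTES = {"e": "up", "x": "down", "s": "left", "d": "right"}

def move_for(key):
    """Resolve either an arrow key or its substitute to a cursor delta."""
    name = SUBSTITUTES.get(key, key)
    return ARROW_KEYS[name]

# Consistent input: the substitute has the same effect as the arrow key.
print(move_for("e") == move_for("up"))  # True
```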

Output Consistency

The control panel for an aircraft color codes the messages on the panel:

  • urgent (red) - requires immediate action
  • warning (amber) - requires attention soon
  • advisory (green) - for information only, no action required


Flexibility refers to the number of ways the user can exchange information with the system.  There are several underlying principles which contribute to flexibility:

Dialog Initiative

Dialog initiative indicates who starts a dialog - the system or the user.

  • system preemptive
  • user preemptive

System Preemptive

The system initiates all dialog and the user simply responds.  This happens when the system displays a modal dialog so that the user must respond before any other operation can be performed.


In a system preemptive dialog the user's actions are controlled by the system and greatly restricted.  We want to maximize the user's ability to preempt the system and minimize the system's ability to preempt the user. 

Consider filling in an address form.  In a system preemptive dialog:

  • user selects country
  • system presents list of states or provinces
  • user selects province
  • system presents list of cities
  • user selects city
  • system prompts for address

The user must

  • enter data in a specific order
  • type everything directly, without cutting and pasting from elsewhere
  • do exactly what the system demands

User Preemptive

This model provides much greater flexibility for the user.  Most users prefer this form of dialog initiative.

  • user asks to enter address
  • system displays address form
  • user enters data in any order desired
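The user-preemptive flow above can be sketched as a form model that accepts fields in any order and only intervenes at submission time.  This is a hypothetical Python illustration; the field names are assumptions for the example.

```python
# Hypothetical sketch of a user-preemptive address form: the user may
# fill the fields in any order (or paste values in), and the system only
# checks completeness when the user chooses to submit.

REQUIRED = ("country", "province", "city", "street")

class AddressForm:
    def __init__(self):
        self.fields = {}

    def enter(self, field, value):
        """Accept any field, at any time, in any order."""
        self.fields[field] = value

    def submit(self):
        """The system intervenes only here, at the user's request."""
        missing = [f for f in REQUIRED if f not in self.fields]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return dict(self.fields)

form = AddressForm()
form.enter("city", "Halifax")        # city before country: allowed
form.enter("country", "Canada")
form.enter("street", "123 Main St")
form.enter("province", "Nova Scotia")
print(form.submit())
```

A system-preemptive version would instead force the country → province → city → street order and reject any other sequence.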

There are situations where it is better for a dialog to be system preemptive. 

Consider a text editor where two users can edit at the same time.  To avoid problems, the system locks paragraphs so that only one user can change a paragraph at a time.  If this was not controlled by the system, a lot of confusion would result. 


A multi-threaded dialog allows the user to interact with different parts of the system at the same time.  This is common in windowing systems which allow a user to switch between running programs and interact with the one selected.  The same thing can be done for an application by providing multi-threading.

Multi-modality is related to multi-threading.  It means that different communication channels can be used simultaneously.  A bell can sound when invalid input is supplied.  The user can also be provided with multiple ways to perform a task. 

A window might be opened by

  • clicking an icon
  • pressing a keyboard shortcut
  • saying "open window"

Task Migratability

Task migratability means that control of a task can be transferred from the user to the system and vice versa.  Consider the case of checking the spelling in a document.  The user can look each word up in a dictionary.  The system can perform the same task automatically. 

Task migratability can enhance safety. 

  • In an aircraft, routine flying can be relegated to the auto pilot
  • in an emergency, it is vital that the functionality be transferred to a human pilot
  • without this migratability, safety would be compromised


Substitutivity means that one form of information representation can be substituted for an equivalent form. 

Consider entering a distance:

  • as a value in default units - 3.8
  • as a value in specific units - 3.8 cm
  • as a calculation (5 - 1.2) cm
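The three input forms above can be sketched as a small parser that reduces each substitutable representation to the same internal quantity.  This is a hypothetical Python illustration; the unit table and default unit are assumptions.

```python
# Hypothetical sketch of substitutable input forms for a distance:
# a bare value ("3.8"), a value with units ("3.8 cm"), or a small
# calculation ("(5 - 1.2) cm") all yield the same internal quantity.

import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    """Safely evaluate a small arithmetic expression tree."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")

UNITS = {"cm": 1.0, "mm": 0.1, "m": 100.0}  # internal unit: centimetres

def parse_distance(text, default_unit="cm"):
    parts = text.rsplit(None, 1)
    if len(parts) == 2 and parts[1] in UNITS:
        expr, unit = parts
    else:
        expr, unit = text, default_unit
    return _eval(ast.parse(expr, mode="eval").body) * UNITS[unit]

print(parse_distance("3.8"))          # 3.8  (default units)
print(parse_distance("3.8 cm"))       # 3.8  (explicit units)
print(parse_distance("(5 - 1.2) cm")) # 3.8  (calculation)
```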

There are also equivalent forms of output that can be substituted.

Consider the display of the temperature

  • as a thermometer displaying the current temperature
  • as a chart showing temperatures over a period
  • as a graph showing temperatures over a period

Equal opportunity blurs the distinction between input and output at the interface:

  • an analog dial which shows a value but whose hand can be moved to adjust the value
  • a cell in a spreadsheet whose value can be changed and it will cause other values to be recalculated


Customizability allows the interface to be modified by the system or the user.


Adaptability refers to the user's ability to adjust the form of input and output.  The user plays an explicit role.  Adaptability can take two forms:

  • lexical customization - the user can only change the layout of on-screen buttons or rename commands - the overall structure of the interface remains the same
  • programmability - the user can program the interface
    • the UNIX shell can be programmed
    • VB/Java script can be used to program many applications
    • macros can be recorded to create new commands with simple logic in them
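The macro-recording form of programmability can be sketched as follows.  This is a hypothetical Python illustration of the idea, not any particular application's macro facility.

```python
# Hypothetical sketch of macro-style programmability: the user records a
# sequence of existing commands and replays it as a single new command.

class MacroRecorder:
    def __init__(self):
        self.macros = {}
        self._recording = None

    def start(self, name):
        self._recording = (name, [])

    def run(self, command, *args):
        """Execute a command; remember it if a macro is being recorded."""
        if self._recording:
            self._recording[1].append((command, args))
        return command(*args)

    def stop(self):
        name, steps = self._recording
        self.macros[name] = steps
        self._recording = None

    def play(self, name):
        """Replay the recorded steps as one user-defined command."""
        return [cmd(*args) for cmd, args in self.macros[name]]

rec = MacroRecorder()
rec.start("tidy")
rec.run(str.upper, "hello")
rec.run(str.strip, "  world  ")
rec.stop()
print(rec.play("tidy"))  # ['HELLO', 'world']
```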


Adaptivity refers to the automatic customization of the interface by the system.  The user plays an implicit role. 

Adaptivity allows the system to adjust the interface to the perceived needs of the user.  For example, the system could monitor the actions of the user to decide whether to provide an interface for a novice or an expert. 


The robustness of an interaction is any feature of the system that supports the successful completion of the goal.


Observability allows the user to evaluate the internal state of the system by interacting with the interface.

Observability is supported by

  • browsability
  • defaults
  • reachability
  • persistence
  • operation visibility


Browsability allows the user to explore the current state of the system from the user interface.  Typically, the user interface does not display all of the information about the system.  What it should do is display all of the state information that is relevant to the current task. 

Often, there is more information than can be displayed at once.  This means that the user must be provided with some means of navigating the information to explore and find what is needed.  The browsing itself should not affect the state of the system. 


The use of default values can reduce

  • the amount of information that a user has to remember
  • the number of physical actions necessary to provide a value

There are two types of default values:

  • static - the values never change
  • dynamic - values are calculated based on the state of the system
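The two kinds of default value can be sketched in code.  This is a hypothetical Python illustration; the field names are assumptions.  Storing each default as a callable lets the dynamic one be recomputed from system state every time the form is built.

```python
# Hypothetical sketch contrasting the two kinds of default values: a
# static default never changes, while a dynamic default is computed from
# the state of the system each time the form is shown.

import datetime

FIELD_DEFAULTS = {
    "country": lambda: "Canada",                           # static
    "date":    lambda: datetime.date.today().isoformat(),  # dynamic
}

def new_form():
    """Build a fresh form, evaluating each default at display time."""
    return {field: default() for field, default in FIELD_DEFAULTS.items()}

form = new_form()
print(form["country"])  # always "Canada"
print(form["date"])     # today's date, recalculated for each new form
```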


Reachability refers to the user's ability to navigate through the observable system states.  A state is reachable if it can be reached from every other state in one way or another.  It is important that all states be reachable.


Persistence has to do with how long an act of communication between the user and the system persists.  If the system plays a sound to alert the user, it is gone in an instant.  If the user went for coffee when the alert sounded, the user would have missed the alert.  If the alert is displayed on the screen, it will persist for a much longer time and there is less chance that it will be missed or forgotten.


Recoverability refers to the ability to recover from a mistake and still be able to achieve a goal. 

Recovery can occur in either of two directions:

  • Forward
    • the current state is accepted and it is possible to work to the final goal from that state
    • this might be the only possible technique if an action has irreversible side effects, such as printing
  • Backward
    • this allows you to undo one or more actions to return to a previous state
    • you can then move in a different direction from the previous state to correct the error

The principle of commensurate difficulty states that if it is difficult to undo an operation, it should have been difficult to perform the operation in the first place.
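Backward recovery is usually implemented with an undo stack of previous states.  A minimal hypothetical Python sketch:

```python
# Hypothetical sketch of backward error recovery: an undo stack of
# previous states lets the user return to an earlier point and then
# proceed in a different direction.

class UndoableText:
    def __init__(self, text=""):
        self.text = text
        self._history = []

    def edit(self, new_text):
        self._history.append(self.text)  # save state before changing it
        self.text = new_text

    def undo(self):
        """Backward recovery: restore the most recent saved state."""
        if self._history:
            self.text = self._history.pop()

doc = UndoableText("draft")
doc.edit("draft v2")
doc.edit("draft v2 oops")
doc.undo()           # reverse the mistaken action
print(doc.text)      # "draft v2"
```

Forward recovery, by contrast, would keep "draft v2 oops" and edit onward from it, which is the only option once a side effect (such as printing) cannot be reversed.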


Responsiveness is the rate of communication between the system and the user.  Response time is the duration needed for the system to display a state change to the user.  Ideally, response time should be close to zero, so that the user perceives it as instantaneous.  Stability, the constancy of response time across similar tasks, is more important than absolute response time.

Task Conformance

Task conformance means that the necessary operations must not only be provided but also be provided in a way that they are convenient to use for a variety of tasks. 

Task completeness means that all operations necessary to perform the task are provided. 

The User

The first step in a usability study is to identify who the end user will be.  Once identified, we determine

  • their level of expertise
  • what they will assume about the system
  • the environment within which they will use the system

It is often helpful to be able to group users into classes.  Let us consider two axes for user classification:

  • how they use the system
  • their level of expertise

System Usage

The types of usage include:

  • direct
    • make direct use of the system to carry out their duties
    • e.g. data entry clerk or secretary who uses a word processor
  • indirect
    • ask other people to use the system on their behalf
    • e.g. a passenger who asks a travel agent to check the availability of a flight
  • remote
    • depend upon the services the system offers but do not use the system itself
    • e.g. bank customers depend upon the bank systems for account statements
  • support
    • support the system
    • e.g. help desk staff, administrators, technicians

Two further types that overlap with these are:

  • mandatory
    • must use the system as part of their job
  • discretionary
    • do not have to use the system as part of their job
    • make infrequent use of the system
    • have less familiarity with the system than mandatory users


Categories of expertise are:

  • novices
  • intermediate users
  • experts

We decompose these further into:

  • mandatory
  • discretionary
  • intermittent


Novice Users

Novice users:
  • have little or no experience
  • may be hesitant to use a system
  • need feedback that they are doing things in the right way
  • prefer to progress at their own speed
  • require a system robust enough to deal with a user who does not know what they are doing
  • need guidance through processes
  • need availability of human help
  • need lists of common questions and answers to those questions
  • need a system where their actions do not have side effects
  • need simple interfaces with complex areas hidden unless explicitly requested


Intermediate and Expert Users

Intermediate and expert users:
  • require less help
  • perform many operations for the first time
  • make extensive use of help systems
  • need accelerator keys for faster alternatives to menus
  • need to be able to customize their typed commands to make the interface look like others they know

Intermittent Users

Intermittent Users:

  • use the system occasionally with possibly long periods of absence
  • display both novice and expert characteristics
  • often remember general concepts but not low-level details
  • need access to good manuals and help facilities

Gathering Information

To identify the typical user we gather information through:

  • formal and informal discussions
  • an expert on the design team
  • questionnaires
  • observations
  • interviews

Formal and Informal Discussions

Formal discussion with users:
  • reveals unknown details about how users work and their environment
  • reveals insights into which designs would work for users
  • makes users feel part of the design team
  • causes users to want the project to succeed, whereas leaving them out often makes them hope it will fail

Informal discussion gives us an appreciation of the workplace.  Before designing a system, it is important to know the environment in which the system will be used.  For example, secretaries have said that they did not want audible alerts when they make a mistake because they do not want others around them to know.  Discussions with the boss should reveal

  • how the boss thinks the job is done
  • how the job used to be done
  • how the boss wants the job to be done

Subsequent discussions with the worker would reveal

  • how the job is really done

Expert on the Design Team

For expertise, we include an end user in the design team:

  • this expert user provides valuable insight into the needs of the users in general
  • if this expert user spends too much time on the team, they become a designer and stop being an external source
  • rotating expert users on the design team can ensure that they remain representative


Questionnaires are good for subjective data.  They produce large amounts of information, which can take considerable time to analyze.  Questionnaires can be either interviewer administered or self-administered:

  • Interviewer administered
    • very time consuming
    • interviewer explains the meaning of the questions
    • different interviewers must treat subjects in the same way
  • Self-administered
    • much faster than interviewer administered
    • questions have to be clear to avoid misinterpretation
    • open to some groups answering in a particular way if they have an axe to grind

Questions can be:

  • open
    • gain broader information by allowing respondents to answer in any way they choose
    • produce so much data that it may be difficult or impossible to interpret
  • closed
    • restrict answers to a few possibilities
    • can distort the answers by suggesting responses that the respondent would not have thought of on their own

Questionnaires must be prototyped and tested before being administered.  This should reveal:

  • possible misinterpretations of questions
  • additional material that must be included to understand the question
  • imprecise wording of questions
  • questions that bias the respondent - e.g. ticking extra boxes to seem more competent than they are

The different types of questionnaires include:

  • yes/no or one answer from a finite set of stated responses
  • checklists - provide room for multiple responses - the user does not have to remember which aspects of the system they have been using
  • scalar - ask for one answer on a numerical scale - the middle position, if any, should allow the user to register no opinion
  • Likert scale - gathers the strength of an opinion
  • rank - requests a relative ranking of the stated responses


Observation can be a very useful tool in finding out how people actually use a system.  A major danger with observation is that users alter their behavior when they know they are being observed.  This is called the Hawthorne effect, a term coined by Henry Landsberger when analyzing studies of electrical workers at the Hawthorne Works: productivity increased while the workers were being observed, even as light levels were reduced to the point where the workers could barely see, and slumped when the study concluded and the lights returned to their original levels. 

There are two ways to circumvent the Hawthorne effect:

  • expose the users to the observer for a long enough time that they start to ignore the observer
  • use video to record the workers without an actual observer being present

In most cases, acclimatizing the users to the observer produces the best results.  The use of video recording is less effective than originally anticipated.  Video takes a long time to set up and often records far more information than necessary.  This makes it less effective than just recording a couple of minutes of crucial activity.  Users' behavior changes as much by the act of video recording as by being observed directly by a human. 

Another form of observation is activity logging.  This involves logging what the user does and how long it takes them to do it.  The logging can be done by the user (although this affects their performance), by an observer, or by the software itself.  Using an observer is effective, but the user will have to become acclimatized to their presence.  If the system does the logging, the user must be informed of this for ethical reasons, and by law in some areas.  A period of acclimatization is also necessary for system-based logging.
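Software-based activity logging can be sketched very simply.  This is a hypothetical Python illustration (the action names are made up): each action is timestamped so the evaluator can later reconstruct what was done and how long each step took.

```python
# Hypothetical sketch of software-based activity logging: each user
# action is recorded with a timestamp so the evaluator can later see
# what was done and how long each step took.

import time

class ActivityLog:
    def __init__(self):
        self.entries = []  # list of (timestamp, action) pairs

    def record(self, action):
        self.entries.append((time.monotonic(), action))

    def durations(self):
        """Seconds elapsed between consecutive logged actions."""
        times = [t for t, _ in self.entries]
        return [b - a for a, b in zip(times, times[1:])]

log = ActivityLog()
log.record("open file")
log.record("edit paragraph")
log.record("save file")
print(len(log.durations()))  # 2 intervals between 3 actions
```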


Interviews can range from open-ended questions to closed well-defined questions, with every possible variation in between:

  • Open-ended
    • useful when first starting interviews
    • useful when the interviewer does not know what to ask
    • lets the interviewer refine and direct the questions as the interview proceeds
    • often uncovers new facts
  • Closed
    • the interviewer directs the interview
    • obtains a limited range of responses
    • the interview time is limited
  • Mixed
    • uses closed questions but allows the interviewee to add additional thoughts
    • the interviewer can add questions or steer the interview in new directions
    • this is often the most effective interview style

Task Analysis

In general, a task is some human activity performed to achieve a goal.  The result of a task analysis is an understanding of what a system should do to enable the achievement of the goal.  Good design is possible once we understand the user's tasks and how the user performs those tasks. 


A task consists of

  • input
  • output
  • a process that converts the input into the output

[Figure: elements of a task]

We decompose a task into a hierarchy of subtasks.  To do so, we ask a series of questions about the input:

  • what information is needed to perform the task?
  • what are the characteristics of the information sources? (reliable, wrong format, etc.)
  • what affects the availability of the information?
  • what errors might occur? (the goal is to design a system that can avoid the errors)
  • what initiates the task? (often the initiation of a task depends upon the completion of a previous task)

We follow these questions by questions about the output:

  • what are the performance requirements?
  • what happens to the output? (this determines whether the output is sent to another process or needs to be formatted)
  • how does the user get feedback on the progress of the task?

Finally, we ask questions about the transformation itself:

  • what decisions are made by the entity performing the transformation?
  • what are the strategies for decision making and how do we incorporate them into the system?
  • what skills are needed for the task? (users of the system will need to be trained and kept up to date with these skills)
  • what interruptions can occur and when can they occur?
  • how often and when is the task performed?
  • does the task depend upon another task?
  • what is the normal and maximal workload? (this allows us to design the system to deal with this load)
  • can the user control the task workload? (in some cases the user can delay a task or control the flow of data to a task)

We describe the result of a task analysis in the form of a graph of tasks and subtasks. 

[Figure: graph of a task set]


How users perform tasks depends upon their cultural background.  To appreciate the role of culture, we turn to ethnography. 

Ethnography is the immersion into a culture to determine how people of that culture work and think.  Members of the design team join the users in field studies for prolonged periods and literally become one of the users in an effort to understand the cultural influences at work. 


Once we understand the user and the task, we can design and build a solution.  The question arises: "How good is the solution?"  The answer is not obvious, since there is no simple way to measure the goodness of something.  One solution is to give users a questionnaire and have them rate aspects of the product from 1 to 10.  The problem with this approach is that ideas of what makes a good product differ.  For example, consider the question "How hot would you like your meal: 1, 2 or 3 chillies?"  What is mild to one person may be extremely hot to another. 

The ISO 9241 definition provides an initial framework for measuring quality:

  • efficient
  • effective
  • satisfying

To these we add:

  • learnability
  • error rates


To judge effectiveness, we measure:

  • the success to failure ratio in completing the task
    • this provides an overall measure of effectiveness
  • the frequency of use of various commands or features
    • this shows how often the features are used
    • this exposes the techniques employed to solve the task
    • this reveals if there might be more efficient techniques
  • the user problems
    • this provides an indication of what problems the users experience and how severe those problems are
  • the quality of the results or output
    • this provides an overall measure of effectiveness


Efficiency is the amount of time it takes to complete the task or the amount of work performed per unit time.  To judge efficiency, we measure:

  • the time required to perform the task
  • the number of actions required for the task
  • the time spent consulting documentation
  • the time spent dealing with errors


Measuring users' satisfaction can be difficult.  The best way is to get the users to rate their satisfaction on a scale.  This is a noisy measurement, but a larger sample size should make the data meaningful. 



Learnability measures how easy it is to learn to use a software system.  This is important since the shorter the learning time, the less costly it is to train new users.  We measure learnability in terms of the time it takes a new user to acquire the skills to complete the tasks involved.

Most users never learn all of the capabilities of a system.  All we measure is how long it takes the user to acquire the most essential skills.  Some systems use metaphors that take advantage of the user's knowledge of similar tools.  These systems claim to be zero knowledge systems since novices can use them immediately without any prior knowledge. 

Other ways to measure learnability include:

  • the frequency of error messages
  • the frequency of a particular error message
  • the frequency of online help usage
  • the frequency of help requests on a particular topic

Error Rates

Error rates are a classic measure of efficiency.  Logically, the fewer the errors, the more useful work is being accomplished.  We categorize errors by type and severity:

  • type
    • an intentional error is one where the user intended to perform the action but the action was wrong, usually due to a misunderstanding of the software
    • a slip is an unintentional error, such as clicking the wrong button; a slip is not a problem with understanding the software
  • severity
    • minor - easy to spot and correct
    • major - easy to spot but harder to correct
    • fatal - prevents the task from being completed
    • catastrophic - has other side effects such as loss of data or affects other applications
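The taxonomy above can be written down as code so that logged errors can be tallied by type and severity.  A hypothetical Python sketch:

```python
# Hypothetical sketch of the error taxonomy as code, so that errors
# logged during an evaluation can be tallied by type and severity.

from enum import Enum

class ErrorType(Enum):
    INTENTIONAL = "intended action, wrong outcome"
    SLIP = "unintended action, e.g. wrong button"

class Severity(Enum):
    MINOR = 1         # easy to spot and correct
    MAJOR = 2         # easy to spot but harder to correct
    FATAL = 3         # prevents the task from being completed
    CATASTROPHIC = 4  # side effects such as loss of data

errors = [
    (ErrorType.SLIP, Severity.MINOR),
    (ErrorType.INTENTIONAL, Severity.MAJOR),
    (ErrorType.SLIP, Severity.MINOR),
]

slips = sum(1 for t, _ in errors if t is ErrorType.SLIP)
print(f"{slips} of {len(errors)} errors were slips")  # 2 of 3
```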


The usability specification states:

  • to which set of users the measurement applies
  • which attributes are being measured
  • what are the preconditions for the measurement
  • what are the criteria for success
  • how are the criteria to be measured

The specification also sets out the measures of usability.  These can include:

  • worst case - the level of performance that would render the system unacceptable to the user
  • lowest acceptable level - lowest acceptable performance
  • planned case - the level at which the system is expected to operate
  • now level - the performance of the existing system
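One line of such a specification can be sketched as a record holding the four levels for a single measured attribute, plus a check against an observed value.  This is a hypothetical Python illustration; the attribute, numbers, and verdict labels are assumptions (lower is taken to be better, as for task time).

```python
# Hypothetical sketch of one row of a usability specification: the four
# levels for a measured attribute, with a check against a measurement.
# Lower is better here (e.g. task completion time in seconds).

from dataclasses import dataclass

@dataclass
class UsabilitySpec:
    attribute: str
    worst_case: float         # beyond this, the system is unacceptable
    lowest_acceptable: float  # lowest acceptable performance
    planned: float            # level the system is expected to reach
    now: float                # performance of the existing system

    def verdict(self, measured):
        if measured > self.worst_case:
            return "unacceptable"
        if measured > self.lowest_acceptable:
            return "below target"
        return "meets plan" if measured <= self.planned else "acceptable"

spec = UsabilitySpec("task time (s)", worst_case=120,
                     lowest_acceptable=90, planned=60, now=100)
print(spec.verdict(75))   # acceptable
print(spec.verdict(50))   # meets plan
```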


We use a checklist to make sure that the usability specification is complete.  The checklist simply serves to remind us of items we might have forgotten.  The specification need not include all of the items on the checklist. 

The specification checklist:

  • time required to complete a task
  • fraction of a task completed
  • fraction of a task completed per unit of time
  • ratio of success to failure
  • time spent dealing with errors
  • frequency of use of documentation
  • time spent using documentation
  • ratio of favourable user comments
  • number of repetitions or failed commands
  • number of good features recalled by the user
  • number of commands which were not used


Methodology describes how we evaluate the usability of a system.  There are two distinct types of evaluation:

  • analytical - paper and pencil used to evaluate tasks and goals
  • empirical - user performance is evaluated to judge the usability

All evaluation methods share the same basic structure and requirements: 

  • identify the target group
  • recruit users
  • establish the task
  • perform the evaluation
  • interpret the results and report the findings

Target Group

The target group will usually be the same as that identified during the process of requirements gathering.  Some experimental designs might require targeting different groups, so be sure to check.

Recruiting users can take a long time.  We must recruit users from the right target group.  We should recruit more users than necessary, as problems might arise rendering some users unsuitable or unavailable.  We should avoid recruiting users who have been involved in the design process, as they might be biased.  Researchers have found that 3 users can give an accurate opinion that reflects that of the larger user community. 

Task Identification

We will measure the task that the user will perform.  In the initial stages of evaluation, it is better to use very specific tasks.  In later stages, we can use more broad-based tasks.  We must

  • specify the task as a detailed set of steps to be performed
  • test the steps in the task to show that they are correct and can be performed
  • check the statement of the task to make sure that the instructions are clearly written
  • decide how much instruction is appropriate to give the user before they perform the task

Prepare the User

Before conducting the evaluation, we need to

  • introduce the user to the system
  • give the user a written introduction to the system to ensure that a walk-through does not miss anything
  • set up some method of recording the evaluation (observation or video)
  • prepare and test any questionnaires
  • write instructions for conducting the evaluation, particularly if more than one person will be performing the evaluation

Perform the Evaluation

The first step in performing the evaluation is the introduction:

  • welcome the users and make them feel comfortable
  • tell the users about the process and what is expected of them
  • make sure that the users know that the system is being tested, NOT the users
  • let the users know that the result, whether negative or positive, will be used to improve the performance of the system
  • make sure that the users understand that if things go wrong, it is not their fault
  • familiarize the users with any observer, questionnaire, or recording equipment being used
  • let the users know that they can quit at any time
  • at the end, ask any required questions of the users
  • finally, thank the users for their time and input

Code of Conduct

As evaluators, we should follow a code of conduct with respect to the users. 

  • We are working with people, not objects
  • Explain
    • what is expected of the user
    • the purpose of the experiment
    • that the user can leave at any time
    • that it is the system being tested
    • that the results are confidential
    • what the results will be used for
  • make sure the user is comfortable
  • do nothing to embarrass or distress the user
  • get the user to agree, in writing, to the guidelines

Report the Findings

Before reporting the findings, we review the evaluation process itself.  We list any problems that occurred during the process and examine them to find causes and solutions.


Experiments

HCI is based on cognitive psychology.  Cognitive psychology draws its experimental methods from the scientific method:

  • induction - form a hypothesis from existing data
  • deduction - make predictions based upon the hypothesis
  • observation - gather data to prove or disprove the hypothesis
  • verification - test the predictions against further observations


The Hypothesis

We state a hypothesis to explain the observations that we have made. 

  • the hypothesis is our testable statement
  • we identify the independent or test variable
  • we design our experiments to test the validity of our hypothesis under changes to our independent variable
  • we also design our experiments to rule out other possible causes for the observed experimental results
  • we also form a null hypothesis; showing the null hypothesis to be false supports our hypothesis

For example:

  • "The choice of background and text color affects the user's ability to check a document"
  • the independent or test variable is the background and text color
  • the background and text color can be manipulated by the researcher
  • the dependent variable is the user's performance in checking the document
  • we test the hypothesis by varying the text and background colors while measuring the user's performance
  • the performance should differ significantly to indicate that the independent variable affected it
  • we use statistical methods to determine whether the difference in performance is significant
  • we use statistical tables to determine whether our results are significant
  • our null hypothesis is "The choice of background and text color does not affect the user's ability to check a document"
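The significance test for an experiment like this can be sketched with a two-sample t statistic.  The completion times below are invented for illustration, and the critical value comes from a standard t table:

```python
from statistics import mean, variance

# Hypothetical task-completion times (seconds) for checking a document
# under two colour schemes; the data are invented for illustration.
black_on_white = [41, 38, 45, 40, 39, 43, 42, 44]
blue_on_grey   = [52, 49, 55, 47, 53, 50, 56, 48]

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = variance(a), variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

t = welch_t(black_on_white, blue_on_grey)
# For these samples (about 13 degrees of freedom), a t table gives
# |t| > 2.16 as significant at the 5% level.
print(f"t = {t:.2f}, significant: {abs(t) > 2.16}")
```

If |t| exceeds the tabled critical value, we reject the null hypothesis that color has no effect on checking performance.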


Designing the Experiment

The major problem in designing the experiment is to ensure that the independent variable is the only factor affecting the dependent variable.  Several techniques help; the choice depends upon the nature of the experiment.  The techniques include:

  • control groups
  • subject selection

Control Groups

One way to ensure that the independent variable is affecting the result is to use a control group.  For example, a control group is used in drug testing where the group is given a placebo to nullify any psychological effects.  We can use this approach in HCI where we give the control group the original interface and tell the users that improvements have been made. 

Subject Selection

One group is often better at performing a task than another group.  We can use a process called matching to ensure that the two groups are comparable.  We identify the attributes that subjects in both groups must have and make sure that they are present in each group.  Possible attributes to match include

  • both groups have equal ratios of men and women
  • the ages of both groups are similar
  • the occupations of both groups are similar

We refer to differences between the groups as confounding variables (they can alter the test results).
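A matching check can be sketched as a comparison of group attributes.  The subject records and the age tolerance below are invented for illustration:

```python
from statistics import mean

# Hypothetical subject records for two groups.
group_a = [{"sex": "F", "age": 34}, {"sex": "M", "age": 29},
           {"sex": "F", "age": 41}, {"sex": "M", "age": 36}]
group_b = [{"sex": "F", "age": 33}, {"sex": "M", "age": 31},
           {"sex": "F", "age": 40}, {"sex": "M", "age": 35}]

def matched(a, b, max_age_gap=3):
    """True if the groups have the same sex ratio and similar mean ages."""
    same_sex_ratio = (sum(s["sex"] == "F" for s in a) ==
                      sum(s["sex"] == "F" for s in b))
    similar_age = abs(mean(s["age"] for s in a) -
                      mean(s["age"] for s in b)) <= max_age_gap
    return same_sex_ratio and similar_age

print(matched(group_a, group_b))
```

An unmatched attribute that survives into the experiment becomes a confounding variable.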

One way to eliminate differences between individuals is to perform the experiment on the same individual.  We call this related measures design.  We measure the user's performance

  • alone without feedback
  • in the presence of a researcher who provides feedback
and determine if the feedback helps.

Related measure design suffers from potential problems:

  • the order effect - the order in which the user performs the tasks affects the results
  • the practice effect - the performance improves with practice
  • the fatigue effect - the performance decreases with time
The way to overcome these problems is to use a technique called counterbalancing.  One half of the subjects perform the experiment in one order and the other half perform the experiment in the opposite order.
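Counterbalancing can be sketched as an alternating assignment of condition orders.  The condition names and subject labels are invented for illustration:

```python
# Counterbalancing: half the subjects perform the conditions in one
# order, the other half in the reverse order.
conditions = ["without_feedback", "with_feedback"]

def assign_orders(subjects):
    orders = {}
    for i, subject in enumerate(subjects):
        # alternate the order to balance practice and fatigue effects
        orders[subject] = conditions if i % 2 == 0 else conditions[::-1]
    return orders

orders = assign_orders(["s1", "s2", "s3", "s4"])
for subject, order in orders.items():
    print(subject, order)
```

Because each order is performed by half the subjects, any order, practice, or fatigue effect contributes equally to both conditions.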


Problems with Experiments

There are many known problems with experiments:

  • we cannot prove a hypothesis: no matter how many experiments support it, one case to the contrary disproves it
  • experiments in social science cannot remove extraneous variables as well as those in the natural sciences can, so the results cannot be trusted as much
  • experiments test the hypothesis of the researcher and ignore any input from the subjects
  • experiments are often scaled-down versions of real-world problems, and we cannot be sure that the results will apply if scaled up to the real problem
  • experiments are usually conducted in a laboratory, and this setting might affect the results.  One solution is to conduct the experiment in the workplace to eliminate differences in setting
  • some experiments are contrived, have little to do with reality, and their results might not apply to the real world


Think Aloud

Think aloud is a technique in which we ask the user to think aloud while performing a task.  This lets us determine:

  • how the user approaches the task
  • the user's model of the system
  • why the user makes the decisions that s/he does

Think aloud requires the researcher to take notes or to record the session.  Some users will be uncomfortable thinking aloud or might be embarrassed.  Thinking aloud might also alter some users' performance.

Cooperative Evaluation

Cooperative evaluation is a variation on think aloud in which we encourage the user to think of themselves as participating in the evaluation rather than just being a subject.  We cooperate with the user to produce an evaluation of the system: 

  • The user makes comments which s/he thinks will aid in the evaluation
  • We ask the user for clarification of a comment
  • The user asks us for help in a poorly understood part so that the evaluation can continue

This technique produces a large amount of information which we then need to analyze.


Some practical guidelines for these techniques:

  • 5 users are enough
  • the task should be specific (use the drawing package to draw a circle)
  • the results should be recorded in some way
  • the results should be broken down into unexpected behavior and user comments

Wizard of Oz

The Wizard of Oz is a technique whereby we evaluate a system without building it: 

  • a human, possibly located remotely, acts as the system
  • the user interacts with the human as though it were the system
  • we use it when the system is very difficult to prototype
  • without it, providing early feedback on the design would be very difficult


Logging

Logging all of the actions performed by the user is a useful technique for evaluating how they use the system and how long they take to complete their tasks.  Logging can take the form of

  • the user keeping a diary
  • the system recording the user's actions automatically

If logging is automatic, we must make users aware that they are being logged, both for ethical reasons and, in some jurisdictions, for legal ones.  Users who know that they are being logged might change their behavior and take some time to revert to their normal behavior.  Logging tends to generate large amounts of data.
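Automatic logging can be sketched as a timestamped record of actions.  The ActionLogger class and the action names below are hypothetical:

```python
import time

class ActionLogger:
    """Records each user action with a timestamp (a hypothetical sketch)."""

    def __init__(self):
        self.entries = []

    def log(self, action):
        # record the action with the time at which it occurred
        self.entries.append((time.time(), action))

    def task_duration(self):
        # time from the first logged action to the last
        return self.entries[-1][0] - self.entries[0][0]

# the user must be told that their actions are being logged
logger = ActionLogger()
logger.log("open_document")
logger.log("run_spell_check")
logger.log("save_document")
print(f"{len(logger.entries)} actions in {logger.task_duration():.3f} s")
```

The timestamps give task-completion times, and the action sequence shows how the user actually worked through the task.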

Inspection Methods

Inspection methods are an alternative to user-based evaluation techniques, which can be both time-consuming and expensive.  In an inspection method, an HCI practitioner evaluates the usability of a product without involving users. 

Inspection methods include:

  • guideline review
  • pluralistic walkthrough
  • consistency inspection
  • standards inspection
  • cognitive walkthrough
  • formal usability inspection
  • feature inspection
  • expert appraisal
  • property checklists
  • heuristic evaluation

Guideline Review

Many practitioners have formulated sets of guidelines over the years, often to ensure

  • consistency of interfaces
  • similarity to other interfaces developed by the same company
  • consistency of all interfaces for a particular platform
The usability engineer should examine the guidelines before checking that they have been followed.

Pluralistic Walkthrough

A pluralistic walkthrough involves:

  • users
  • developers
  • usability experts
Each participant brings their own expertise to the walkthrough.  Each makes an evaluation based upon heuristics.  All meet to reach a final evaluation.

Consistency Inspection

Consistency inspections are carried out by experts.  They inspect controls on all aspects of the system and compare them for consistency.

Standards Inspection

Standards inspections check that a product conforms to specified standards.  Standards differ from guidelines in that standards are adopted by the developer or software house.  Sometimes standards exist for human factors reasons.  Other standards exist simply to have things done consistently.  Often, standards represent compromises rather than best solutions. 

Standards can originate from many sources:

  • internal developer standards
  • internal client standards
  • external industry standards
  • external national standards
  • external international standards

Cognitive Walkthroughs

In a cognitive walkthrough, an expert goes through a task pretending to be a user.  The goal is to identify any difficulties that a user might encounter.  For this to work, the expert must have a good knowledge of the user and be able to act accurately like the user.  A walkthrough is only as good as the expert's knowledge of the user and the task.

Formal Usability Inspection

Formal usability inspections are carried out by a team as part of the software testing process.  Normally, 3 testers are sufficient to form the team.  The team may use any of the guidelines or heuristics in its evaluation.

Feature Inspection

Feature inspections focus on:

  • what features and functionality exist
  • how features are related to one another
  • whether the features and functionality meet the requirements for the system

Expert Appraisal

An expert appraisal is an appraisal of the system by an expert using one of the other techniques:

  • many experts use a cognitive walkthrough
  • other experts use their own expertise
An expert appraisal is cheaper than involving users, but the evaluation is only as good as the expert. 

Heuristic Evaluation

Heuristics are broad-based rules designed to ensure a usable interface.  Many HCI experts have produced lists of heuristics.  Two popular lists are those of

  • Nielsen
  • Shneiderman

Nielsen's Heuristics

  • consistency - systems should be consistent
  • feedback - the user should be provided with feedback about the state and actions of the system
  • clear exits - the user should be able to exit easily from any part of the system, especially if they got there by accident
  • shortcuts - there should be accelerators for experts which could be hidden from novices
  • good error messages - should be easily understandable
  • prevent errors - systems should try to prevent errors
  • help and documentation - should be easy to use and search

Shneiderman's Heuristics

  • consistency - strive for it
  • shortcuts - provide for frequent users
  • feedback - offer informative feedback
  • dialogs that yield closure - actions should have a beginning, middle and end; the user should know when an operation is complete so that s/he can move on
  • simple error handling - errors should be designed out of the system, incorrect commands should not harm the system
  • easy reversal of actions
  • internal locus of control - users should initiate actions, not the system
  • reduce short term memory load - reduce the amount that the user has to remember

Property Checklists

Property checklists take the form of high-level goals broken down into checkable attributes.  These checklists are similar to heuristics.  We go through the checklist and tick off where the system meets each attribute.  Researchers have produced long checklists.  Once the checklist has been prepared, unskilled personnel can perform the evaluation.
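Ticking off a checklist can be sketched as a simple tally.  The attributes below are invented for illustration:

```python
# Hypothetical property checklist: tick off the attributes the system meets.
checklist = {
    "menus are consistent across screens": True,
    "every action gives informative feedback": True,
    "every screen has a clear exit": False,
    "destructive actions can be undone": True,
}

# tally the attributes that are met and report the coverage
met = sum(checklist.values())
print(f"{met}/{len(checklist)} attributes met ({met / len(checklist):.0%})")
```

Because the judgment is reduced to yes/no ticks, the evaluation itself needs little skill once the checklist has been prepared.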