Thinking Aloud Test Report
Human-Computer Interaction SS 2024
Group GT-XX
Harald Roth
Christian Traum
Thomas Gelb
Sabine Schwarz
Thinking Aloud Test of the Web Site
example.com
Report of XXth May 2024
{My instructions and comments are contained inside curly brackets. Remove them before you hand in your work!}
{You must use your own words. Do not copy material from the web, from colleagues from previous years, or from anywhere else. Do not generate text using AI-based tools. You are allowed to copy your own words from your own TA Plan.}
1 Executive Summary
{Executive summary of the main results from the usability test, aimed at higher management (between 300 and 500 words). Write in paragraphs, not bulleted lists.}
{Your client's manager will not read the whole report, but only the executive summary, and wants to know how the test was done and what the main findings were.}
{The description of procedure, methodology, and setup should take up no more than 25% of the executive summary. Most of the executive summary should summarise the main findings.}
2 Introduction
{Short description of the web site to be tested.}
{State whether the tests were run in English or German. Adapt my text below accordingly.}
The web site is in German and was tested in German with German-speaking test users.
The web site is available in both German and English. The English version was tested with English-speaking test users.
3 Test Procedure
3.1 Test Methodology
{Describe what a TA Test is and how it is done. Write between 300 and 500 words of your own. Replace my sample text below with your own.}
A thinking aloud test is ... ... can be found in Keith Andrews' course notes [And2023]. The thinking aloud method is described in detail by Barnum [Bar2020].
{Cite at least two more references of your own in this section and include them in the References section below. Use only indirect quotations (paraphrasing), no direct quotations. You may remove or retain my sample reference citations as you wish.}
3.2 User Profiles
{Describe the kinds of user the site is trying to attract.}
{Group these users into categories according to their characteristics.}
{Describe the goals and typical tasks for each of these user groups.}
{Describe the user group you actually tested, i.e. the one you were notified to use.}
3.3 Test Users
{Describe the selection process for test users, to ensure that they come from the user group chosen for testing.}
{Collect all the data from the background questionnaires, including any domain-specific questions, and enter it into the table below. Use the fictitious first name aliases of your test users.}
Table 1 gives an overview of the test users who participated in the study. First name aliases are used throughout this report to protect their identity.
| Test User | TP1 (Pilot) | TP2 | TP3 | TP4 | TP5 |
|---|---|---|---|---|---|
| Alias | “Stuart” | “Silvia” | “Sally” | “Martin” | “John” |
| Date of Test | | | | | |
| Time of Test | | | | | |
| General Information | | | | | |
| Gender | man | woman | woman | man | man |
| Age | 23 | 27 | 30 | 32 | 25 |
| Occupation | Student | Web developer | Student | Test Engineer | Electrician |
| Education | | | | | |
| Highest Level | A Levels | Bachelor | A Levels | Master | A Levels |
| Area of Study | studying law | Computer Science | studying economics | Computer Science | - |
| ... | | | | | |
| Domain-Specific Questions | | | | | |
| ... | | | | | |
| Previous Usability Tests | | | | | |
| As Test Person | | | | | |
| In Test Team | | | | | |
| Type of Test | | | | | |
3.4 Test Environment
{Include one or two photos of the room and test environment (JPEG). Anonymise anyone who is identifiable in the photos by pixelating or blurring their faces.}
The test room is shown in Figure 1.
The environment used for the thinking aloud tests is shown in Table 2.
{Fill out the table with the exact details used at the time of the TA tests.}
| Device | Dell Precision 5510, 32 GB RAM |
|---|---|
| OS and Version | Windows 10 Pro 64-bit v22H2 EN |
| Screen Size | 21″ TFT |
| Screen Resolution | 1920×1080 |
| Web Browser and Version | Chrome 111.0.5563.111 |
| Internet Connection | A1 LTE 4G Hotspot |
| Download Speed | 30 Mbps |
| Screen Recording Software | OBS Studio 25.0.8 |
| Recording Resolution | 1920×1080 |
| Video Editing Software | Lossless Cut 3.59.1 Win |
| Video Transcoding Software | not required |
{Include a screenshot (PNG) of the browser About window to document the exact browser version used in the test.}
The exact web browser version used for the tests is shown in Figure 2.
{Explain that ad blockers were not used (uninstalled, disabled).}
{Explain that use of cookies was left up to each test user.}
{Explain how faces were blurred or pixelated in the screen recordings.}
{Describe the external video equipment (tripod, video camera, microphone, mirror) which was used to record the entire test.}
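Clips of individual findings are later cut from the full session capture videos at the elapsed-time stamps listed in the findings tables. As an illustrative sketch only (the 30-second duration and the use of ffmpeg are assumptions for illustration, not part of the tooling listed above), such a timestamp can be converted to seconds and passed to a command-line cutter:

```python
def to_seconds(timestamp: str) -> int:
    """Convert an elapsed-time stamp such as '01:24:36' to seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def clip_command(session_video, start, duration, clip_name):
    """Build an ffmpeg command which stream-copies a clip (no re-encoding)
    out of a full session capture video."""
    return ["ffmpeg", "-ss", str(to_seconds(start)), "-i", session_video,
            "-t", str(duration), "-c", "copy", clip_name]

# Hypothetical example: a 30-second clip of TP3's finding at 01:24:36.
cmd = clip_command("tp3.mp4", "01:24:36", 30, "p01-tp3-reg-clear.mp4")
print(" ".join(cmd))
```

In practice, a GUI tool such as Lossless Cut achieves the same stream-copy extraction interactively.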
3.5 Training
{Describe any interface training given to each user. For example, a special input device, window manager, or software.}
{Describe any training on thinking aloud given to each user. How did the facilitator demonstrate the thinking aloud technique? Did the user practice it?}
3.6 Tasks
{Enter the tasks which were actually used, i.e. the ones you were notified to use.}
{If you tested in German, list the tasks in German and provide English translations.}
The internal task list used by the test team is shown in Table 3.
| Task No. | Description | Prerequisites | Completion Criteria | Max. Time | Possible Solution Path |
|---|---|---|---|---|---|
| 1 | [First Impressions] and spend a few minutes looking around. | Web browser opened at google.com | User indicates they have finished looking around or time has elapsed. The facilitator then asks the user: | 3 minutes. | |
| 2 | [Motivational task] {Very easy so that users have a feeling of success. Will not necessarily be analysed as part of the test.} | | | 2 minutes. | Home → Contact |
| 3 | [Fairly easy] | | | 2 minutes. | |
| 4 | [Medium difficulty] | | | 5 minutes. | About Us → Jobs → ... |
| 5 | [More involved] | | | 10 minutes. | |
The task descriptions given to the users during the test are shown in Table 4. The actual task slips are included in Appendix A.5.
| Task No. | Description |
|---|---|
| 1 | The task description given to the user. |
| 2 | ... |
| 3 | |
| 4 | |
| 5 | |
3.7 Interview Questions
{If you tested in German, list the questions in German and provide English translations.}
The following questions were asked of each user immediately after the final task. Summaries of the interviews can be found in Section 4.7.
- Opening Question

  {The first question should always be "How was it?", an open question to encourage the user to speak freely.}

  - "How was it?" ["Wie war's?"]

  {Then let the user speak, and encourage them, until they dry up and stop of their own accord.}

- Standard Questions

  {Next, you ask a series of standard questions prepared in advance for every test user. More general questions should come first, more specific questions later. At least four standard questions should have been prepared for this section in the TA Plan.}

  - "Did anything strike you as particularly good?"
    ["Gab es etwas, das besonders gut oder positiv aufgefallen ist?"]
  - "Did anything strike you as particularly bad?"
    ["Gab es etwas, das besonders schlecht war?"]
  - "..."

- Individual Questions

  After the standard questions above, any individual questions which arose during the test with each particular test user were asked.
3.8 Feedback Questionnaire
After the interview, the user was asked to fill out the feedback questionnaire given in Appendix A.6. The feedback questionnaire was given on paper. An overview of the results can be seen in Section 4.8.
3.9 Data Collection
{Explain that demographic, behavioural, and attitudinal data collected during the study are associated with a TPid and first name alias. Test users are only referred to by these in this report.}
{Explain that personal data (real name, signed consent form, and the full video recordings) collected from the test persons are kept separately and will be deleted after one year.}
{Explain that any faces in the video clips used to illustrate findings in the report have been blurred or pixelated.}
4 Results
4.1 Task Completion
A summary of how many users completed each task and whether any assistance was given is shown in Table 5.
{Task completion is often a binary measure, whereby 0 indicates failure and 1 indicates success. Sometimes, a percentage measure of success is used. Explain which measure you are using.}
| | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 |
|---|---|---|---|---|---|
| TP1 | 1 | 0 | 1 | 1 | 1 |
| TP2 | 1 | 1* | 1 | 1 | 1 |
| TP3 | 1 | 1 | 1 | 1 | 1 |
| TP4 | 1 | 1 | 1 | 1 | 1 |
| TP5 | 1 | 0 | 0 | 1 | 1 |
| Total | 5 | 3 | 4 | 5 | 5 |
| % | 100 | 60 | 80 | 100 | 100 |
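The totals and percentages in Table 5 follow mechanically from the per-user scores. As a small sketch (binary scores copied from the table, with TP2's assisted success on Task 2 counted as a success):

```python
# Task completion per test user (1 = success, 0 = failure), from Table 5.
# TP2's Task 2 success with assistance (marked 1*) is counted as 1.
completion = {
    "TP1": [1, 0, 1, 1, 1],
    "TP2": [1, 1, 1, 1, 1],
    "TP3": [1, 1, 1, 1, 1],
    "TP4": [1, 1, 1, 1, 1],
    "TP5": [1, 0, 0, 1, 1],
}

num_users = len(completion)
for task in range(5):
    total = sum(scores[task] for scores in completion.values())
    print(f"Task {task + 1}: {total}/{num_users} ({100 * total // num_users}%)")
```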
4.2 First Impressions
Tables 6a, 6b, and 6c show a summary of the responses of the test users to the questions asked after the first task.
{Summarise in English the responses of the test users to the three questions asked after the first task.}
| TP1 | An online shop called... |
|---|---|
| TP2 | |
| TP3 | |
| TP4 | |
| TP5 | |

| TP1 | Geeks and nerds. |
|---|---|
| TP2 | |
| TP3 | |
| TP4 | |
| TP5 | |

| TP1 | Hardware components... |
|---|---|
| TP2 | |
| TP3 | |
| TP4 | |
| TP5 | |
{Never invent content. If you forgot to ask about the first impressions, say so.}
4.3 Top Three Positive Findings
The three most positive findings according to their average (mean) positivity ratings are described in more detail below. The positivity rating scheme used to rank positive findings is shown in Table 7.
| Positivity | Meaning |
|---|---|
| 4 | Extremely Positive |
| 3 | Major Positive |
| 2 | Minor Positive |
| 1 | Cosmetic Positive |
| 0 | Not a Positive |
{Describe the top three positive findings which emerged during the test, in descending order of mean positivity (most positive first).}
{One instance from every user who experienced a finding should be extracted as a video clip and included in the mini-table for that finding, each linked to the corresponding video clip. List the corresponding timestamp(s), which refer to elapsed time from the start of the corresponding test user's full session capture video.}
{For each finding, include at least one paragraph of text in the description. Choose the best video clip from those available for the finding to embed as a figure in the description.}
P01. Registration Page Well Laid Out
| Title: | Registration Page Well Laid Out |
|---|---|
| Description: | The registration page is well laid out and clear. |
| Video Clip(s): | p01-tp1-reg-clear.mp4, p01-tp3-reg-clear.mp4, p01-tp4-reg-clear.mp4 |
| Timestamp(s): | TP1 00:01:15, TP3 01:24:36, TP4 01:26:40 |
| Location (How Reproducible?): | Home → Register |
| Mean Positivity: | 4.00 |
The registration page was praised as well laid out by three test users... as can be seen in Figure 3.
P02. etc.
4.4 List of All Positive Findings
{Aggregated list of all positive findings observed in the test, in descending order of mean positivity.}
{For every test user who experienced a positive finding, a video clip should be extracted from the session capture video and its name listed in the table cell (linked to the video clip itself). The timestamp of each video clip in the full session capture video should be listed in the neighbouring cell.}
Table 8 shows a list of all the positive findings which emerged from the test, sorted in decreasing order of average (mean) positivity, i.e. the most positive are at the top of the table. Positivity ratings were assigned by members of the test team; their name codes are shown in Table 9.
| No. | Title | Description | Video Clip(s) | Timestamp(s) | Location (How Reproducible?) | HR | CT | TG | SS | Mean Positivity |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Registration Page Well Laid Out | The registration page is well laid out and clear... | p01-tp1-reg-clear.mp4 p01-tp3-reg-clear.mp4 p01-tp4-reg-clear.mp4 | TP1 00:01:15, TP3 01:24:36, TP4 01:26:40 | Home → Register | 4 | 4 | 4 | 4 | 4.00 |
| ... | | | | | Any product page, link "Shipping" | | | | | |
| ... | {in descending order of mean positivity until... } | | | | ... | | | | | |
| 11 | News Section Up-to-Date | The entries in the news section are up-to-date... | p11-tp2-news-current.mp4 | TP2 00:12:48 | something.com/news | 1 | 0 | 1 | 1 | 0.75 |
| Code | Meaning |
|---|---|
| HR | Harald Roth |
| CT | Christian Traum |
| TG | Thomas Gelb |
| SS | Sabine Schwarz |
4.5 Top Five Problems
The five most serious problems according to their average (mean) severity ratings are described in more detail below. Problem number 1 is the problem (negative finding) with the highest mean severity. The severity rating scheme used to rank problems is shown in Table 10.
| Severity | Meaning |
|---|---|
| 4 | Catastrophic problem |
| 3 | Serious problem |
| 2 | Minor problem |
| 1 | Cosmetic problem |
| 0 | Not a problem |
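Each mean severity is the arithmetic mean of the four evaluators' individual ratings, reported to two decimal places. As a sketch, using the ratings for problem N01 from Table 11:

```python
# Severity ratings for problem N01 ("Unclear when Free Delivery Applies")
# from the four evaluators HR, CT, TG, and SS, as listed in Table 11.
ratings = {"HR": 2, "CT": 4, "TG": 4, "SS": 4}

mean_severity = sum(ratings.values()) / len(ratings)
print(f"Mean severity: {mean_severity:.2f}")  # Mean severity: 3.50
```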
{Describe the top five problems (negative findings) which emerged during the test, in descending order of mean severity.}
{One instance from every user who experienced a finding should be extracted as a video clip. In the mini-table, include the names of all the video clip(s) for that finding, linked to the corresponding video clip. List the corresponding timestamp(s), which refer to elapsed time from the start of the corresponding test user's full session capture video.}
{For each problem, choose the best video clip from those available for the problem to embed as a figure. Include at least one paragraph of text describing the problem itself. Include a further paragraph of text describing way(s) the problem could potentially be addressed.}
N01. Unclear when Free Delivery Applies
| Title: | Unclear when Free Delivery Applies |
|---|---|
| Description: | Contradictory information: “free delivery” or “free delivery above €200”. |
| Video Clip(s): | n01-tp4-unclear-when-delivery-free.mp4, n01-tp5-unclear-when-delivery-free.mp4 |
| Timestamp(s): | TP4 00:03:00, TP5 00:01:50 |
| Location (How Reproducible?): | Home → Shipping → Delivery |
| Mean Severity: | 3.50 |
Two TPs discovered the inconsistency regarding when free delivery applies... as can be seen in Figure X.
To solve this problem, state clearly and in one place, when free delivery applies...
N02. Shipping Link not to Shipping Information
| Title: | Shipping Link not to Shipping Information |
|---|---|
| Description: | When the shipping link was clicked, products were shown. The shipping link should go to shipping information. |
| Video Clip(s): | n02-tp2-shipping.mp4 |
| Timestamp(s): | TP2 00:06:04 |
| Location (How Reproducible?): | Any product page, link “Shipping”. |
| Mean Severity: | 3.25 |
From a product page, the shipping link does not actually go to information about shipping... as can be seen in Figure X+1.
When the shipping link is clicked, it should actually go to information about shipping...
N03. etc.
4.6 List of All Problems Found
{Aggregated list of all problems observed in the test, in descending order of mean severity.}
{For every test user who experienced a problem, a video clip (MP4 H.264 AAC) should be extracted from the session capture video and its name listed in the table cell (linked to the video clip itself). The timestamp of each video clip in the full session capture video should be listed in the neighbouring cell.}
Table 11 shows a list of all the problems observed in the test, sorted in decreasing order of average (mean) severity, i.e. the most severe are at the top of the table. Severity ratings were assigned by members of the test team; their name codes are shown in Table 12.
| No. | Title | Description | Video Clip(s) | Timestamp(s) | Location (How Reproducible?) | HR | CT | TG | SS | Mean Severity |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Unclear when Free Delivery Applies | Contradictory information: “free delivery” or “free delivery above €200”. | n01-tp4-unclear-when-delivery-free.mp4 n01-tp5-unclear-when-delivery-free.mp4 | TP4 00:03:00, TP5 00:01:50 | Home → Shipping → Delivery | 2 | 4 | 4 | 4 | 3.50 |
| ... | | | | | Any product page, link “Shipping” | | | | | |
| ... | {in descending order of mean severity until... } | | | | ... | | | | | |
| 21 | No Catalogue Name | Unsure which catalogue, since no name is displayed. | n21-tp2-cat-name.mp4 | TP2 00:12:48 | example.com/en/page.bto?page=catalog | 1 | 0 | 1 | 1 | 0.75 |
| Code | Meaning |
|---|---|
| HR | Harald Roth |
| CT | Christian Traum |
| TG | Thomas Gelb |
| SS | Sabine Schwarz |
4.7 Interviews
{For each test user, summarise in English the main points made by the user in their interview.}
TP1 “Stuart” appreciated the simplicity of the registration process, ...
TP2 “Silvia” stated immediately that the...
TP3 “Sally” had to be encouraged to talk out loud, but was very forthcoming in her interview. She stressed that...
etc.
{Never invent content. If you forgot to do the interviews, say so.}
4.8 Feedback Questionnaires
{A tabular summary of the responses to the feedback questionnaire, with the mean and standard deviation. Make sure to include the extra domain-specific questions.}
{The neutral scale in the original feedback questionnaire is replaced by a weighted scale between 6 (best) and 0 (worst). The users' responses are then entered into the columns on the right.}
{The mean and standard deviation are calculated for each question and entered to 2 decimal places in the columns on the far right.}
{So that the reader can see the overall results at a glance, the point on the scale closest to the mean is highlighted through the style sheet.}
{If you tested in German, use English versions of the questions and answers in this table.}
Table 13 shows a summary of the ratings given by users in the feedback questionnaire at the end of the test. The neutral scale in the original feedback questionnaire has been mapped to a weighted scale between 6 (best) and 0 (worst). The numbers in bold indicate the (rounded) mean rating.
The original blank questionnaire can be found in Section A.6. Scans of the questionnaires as completed by each test user can be found in Section B.2.
| No. | Question | | Scale | | TP1 | TP2 | TP3 | TP4 | TP5 | Mean | Std Dev |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. | Getting to the right part of the site. | Very easy | 6 5 4 3 2 1 0 | Very hard | 5 | 5 | 3 | 2 | 4 | 3.80 | 1.30 |
| 2. | Quality of information. | Very good | 6 5 4 3 2 1 0 | Very poor | 3 | 3 | 2 | 1 | 2 | 2.20 | 0.84 |
| 3. | It is easy to read the text. | Very easy | 6 5 4 3 2 1 0 | Very hard | 1 | 1 | 1 | 1 | 1 | 1.00 | 0.00 |
| etc. | ... | Very satisfied | 6 5 4 3 2 1 0 | Very unsatisfied | | | | | | | |
| 12. | How likely are you to return to this site later? | Definitely | 6 5 4 3 2 1 0 | Never | 1 | 1 | 1 | 1 | 1 | 1.00 | 0.00 |
| etc. | ... | Very satisfied | 6 5 4 3 2 1 0 | Very unsatisfied | | | | | | | |
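The means and standard deviations in Table 13 can be reproduced with Python's statistics module. Note that the tabulated values correspond to the sample standard deviation (n−1 denominator); for example, for question 1:

```python
from statistics import mean, stdev

# Responses to question 1 ("Getting to the right part of the site"),
# one value per test user TP1-TP5, on the weighted 6 (best) to 0 (worst) scale.
responses = [5, 5, 3, 2, 4]

print(f"Mean: {mean(responses):.2f}")      # Mean: 3.80
print(f"Std Dev: {stdev(responses):.2f}")  # Std Dev: 1.30
```

Had the population standard deviation (statistics.pstdev) been intended instead, question 1 would read 1.17 rather than 1.30.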
References
{References to related work and related studies. Include at least two more references of your own. Do not include references to Wikipedia (or copies of Wikipedia). You may remove or retain my sample references as you wish. All references listed here must be cited somewhere in the document.}
- [And2023] Keith Andrews; Human-Computer Interaction: Course Notes; 13 Mar 2023. http://courses.isds.tugraz.at/hci/hci.pdf
- [Bar2020] Carol M. Barnum; Usability Testing Essentials; 2nd Edition, Morgan Kaufmann, 2020. ISBN 0128169427.
A Test Team Materials
{Include links to the various materials you actually used, depending on whether you ran your tests in English or German. Materials in German will typically have a -de suffix.}
A.1 Checklist
The following checklist was used for the tests: checklist.html.
A.2 Orientation Script
The following orientation script was used for the tests: orient.html.
A.3 Consent Form
The following blank consent form was used in the tests: consent-ta.pdf.
A.4 Background Questionnaire
The following blank background questionnaire was used in the tests: background.html.
A.5 Task Slips
The following task slips were presented to the test users: external-tasks-ta.pdf.
A.6 Feedback Questionnaire
The following blank feedback questionnaire was used in the tests: feedback.html.
A.7 Presentation Slides
The slides used for the presentation are: ta-slides.pdf.
B User Materials
The following materials were filled out for each test user.
B.1 Completed Background Questionnaires
{For each test user, include a good quality PDF scan (in colour if applicable) of the completed background questionnaire. A photograph is not acceptable!}
For each test user, the completed background questionnaire was scanned as PDF:
B.2 Completed Feedback Questionnaires
{For each test user, include a good quality PDF scan (in colour if applicable) of the completed feedback questionnaire. A photograph is not acceptable!}
For each test user, the completed feedback questionnaire was scanned as PDF:
C Protected Materials
The mapping of real names to aliases, the signed consent forms, external videos, and session capture videos all contain identifiable personal data of the test users. These files are handed in separately on a USB stick. They are not to be published and are to be deleted after one year.
C.1 User Aliases
The mapping of test users' aliases to their real names is documented in the file users.html.
C.2 Signed Consent Forms
For each test user, the signed consent form was scanned as PDF:
- tp1-consent.pdf
- tp2-consent.pdf
- tp3-consent.pdf
- tp4-consent.pdf
- tp5-consent.pdf
C.3 External Videos
For each test user, the entire test was captured with a tripod-mounted external video camera in the following files:
- TP1 “Stuart”: tp1-ext.mp4
- TP2 “Silvia”: tp2-ext.mp4
- ...
- TP5 “John”: tp5-ext.mp4
C.4 Session Capture Videos
For each test user, the entire test session on the computer was captured in the following files:
- TP1 “Stuart”: tp1.mp4
- TP2 “Silvia”: tp2.mp4
- ...
- TP5 “John”: tp5.mp4