Thinking Aloud Test Report
Human-Computer Interaction SS 2024
Group GT-XX
Harald Roth
Christian Traum
Thomas Gelb
Sabine Schwarz
Thinking Aloud Test of the Web Site
example.com
Report of XXth May 2024
{My instructions and comments are contained inside curly brackets. Remove them before you hand in your work!}
{You must use your own words. Do not copy material from the web, from colleagues from previous years, or from anywhere else. Do not generate text using AI-based tools. You are allowed to copy your own words from your own TA Plan.}
1 Executive Summary
{Executive summary of the main results from the usability test, aimed at higher management (between 300 and 500 words). Write in paragraphs, not bulleted lists.}
{Your client's manager will not read the whole report, but only the executive summary, and wants to know how the test was done and what the main findings were.}
{The description of procedure, methodology, and setup should take up no more than 25% of the executive summary. Most of the executive summary should summarise the main findings.}
2 Introduction
{Short description of the web site to be tested.}
{State whether the tests were run in English or German. Adapt my text below accordingly.}
The web site is in German and was tested in German with German-speaking test users.
The web site is available in both German and English. The English version was tested with English-speaking test users.
3 Test Procedure
3.1 Test Methodology
{Describe what a TA Test is and how it is done. Write between 300 and 500 words of your own. Replace my sample text below with your own.}
A thinking aloud test is ... ... can be found in Keith Andrews' course notes [And2023]. The thinking aloud method is described in detail by Barnum [Bar2020].
{Cite at least two more references of your own in this section and include them in the References section below. Use only indirect quotations (paraphrasing), no direct quotations. You may remove or retain my sample reference citations as you wish.}
3.2 User Profiles
{Describe the kinds of user the site is trying to attract.}
{Group these users into categories according to their characteristics.}
{Describe the goals and typical tasks for each of these user groups.}
{Describe the user group you actually tested, i.e. the one you were notified to use.}
3.3 Test Users
{Describe the selection process for test users, to ensure that they come from the user group chosen for testing.}
{Collect all the data from the background questionnaires, including any domain-specific questions, and enter it into the table below. Use the fictitious first name aliases of your test users.}
Table 1 gives an overview of the test users who participated in the study. First name aliases are used throughout this report to protect their identity.
| Test User | TP1 (Pilot) | TP2 | TP3 | TP4 | TP5 |
|---|---|---|---|---|---|
| Alias | “Stuart” | “Silvia” | “Sally” | “Martin” | “John” |
| Date of Test | | | | | |
| Time of Test | | | | | |
| General Information | | | | | |
| Gender | man | woman | woman | man | man |
| Age | 23 | 27 | 30 | 32 | 25 |
| Occupation | Student | Web developer | Student | Test Engineer | Electrician |
| Education | | | | | |
| Highest Level | A Levels | Bachelor | A Levels | Master | A Levels |
| Area of Study | studying law | Computer Science | studying economics | Computer Science | - |
| ... | | | | | |
| Domain-Specific Questions | | | | | |
| ... | | | | | |
| Previous Usability Tests | | | | | |
| As Test Person | | | | | |
| In Test Team | | | | | |
| Type of Test | | | | | |
3.4 Test Environment
{Include one or two photos of the room and test environment (JPEG). Anonymise anyone who is identifiable in the photos by pixelating or blurring their faces.}
The test room is shown in Figure 1.
The environment used for the thinking aloud tests is shown in Table 2.
{Fill out the table with the exact details used at the time of the TA tests.}
| Device | Dell Precision 5510, 32 GB RAM |
|---|---|
| OS and Version | Windows 10 Pro 64-bit v22H2 EN |
| Screen Size | 21″ TFT |
| Screen Resolution | 1920×1080 |
| Web Browser and Version | Chrome 111.0.5563.111 |
| Internet Connection | A1 LTE 4G Hotspot |
| Download Speed | 30 Mbps |
| Screen Recording Software | OBS Studio 25.0.8 |
| Recording Resolution | 1920×1080 |
| Video Editing Software | Lossless Cut 3.59.1 Win |
| Video Transcoding Software | not required |
{Include a screenshot (PNG) of the browser About window to document the exact browser version used in the test.}
The exact web browser version used for the tests is shown in Figure 2.
{Explain that ad blockers were not used (uninstalled, disabled).}
{Explain that use of cookies was left up to each test user.}
{Explain how faces were blurred or pixelated in the screen recordings.}
{Describe the external video equipment (tripod, video camera, microphone, mirror) which was used to record the entire test.}
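Clips of individual findings are later cut from the full session capture videos at the elapsed-time stamps listed in the findings tables. As an illustrative sketch only (the 30-second duration and the use of ffmpeg are assumptions for illustration, not part of the tooling listed above), such a timestamp can be converted to seconds and passed to a command-line cutter:

```python
def to_seconds(timestamp: str) -> int:
    """Convert an elapsed-time stamp such as '01:24:36' to seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def clip_command(session_video, start, duration, clip_name):
    """Build an ffmpeg command which stream-copies a clip (no re-encoding)
    out of a full session capture video."""
    return ["ffmpeg", "-ss", str(to_seconds(start)), "-i", session_video,
            "-t", str(duration), "-c", "copy", clip_name]

# Hypothetical example: a 30-second clip of TP3's finding at 01:24:36.
cmd = clip_command("tp3.mp4", "01:24:36", 30, "p01-tp3-reg-clear.mp4")
print(" ".join(cmd))
```

In practice, a GUI tool such as Lossless Cut achieves the same stream-copy extraction interactively.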
3.5 Training
{Describe any interface training given to each user. For example, a special input device, window manager, or software.}
{Describe any training on thinking aloud given to each user. How did the facilitator demonstrate the thinking aloud technique? Did the user practice it?}
3.6 Tasks
{Enter the tasks which were actually used, i.e. the ones you were notified to use.}
{If you tested in German, list the tasks in German and provide English translations.}
The internal task list used by the test team is shown in Table 3.
| Task No. | Description | Prerequisites | Completion Criteria | Max. Time | Possible Solution Path |
|---|---|---|---|---|---|
| 1 | [First Impressions] and spend a few minutes looking around. | Web browser opened at google.com | User indicates they have finished looking around or time has elapsed. The facilitator then asks the user: | 3 minutes. | |
| 2 | [Motivational task] {Very easy so that users have a feeling of success. Will not necessarily be analysed as part of the test.} | | | 2 minutes. | Home → Contact |
| 3 | [Fairly easy] | | | 2 minutes. | |
| 4 | [Medium difficulty] | | | 5 minutes. | About Us → Jobs → ... |
| 5 | [More involved] | | | 10 minutes. | |
The task descriptions given to the users during the test are shown in Table 4. The actual task slips are included in Appendix A.5.
| Task No. | Description |
|---|---|
| 1 | The task description given to the user. |
| 2 | ... |
| 3 | |
| 4 | |
| 5 | |
3.7 Interview Questions
{If you tested in German, list the questions in German and provide English translations.}
The following questions were asked of each user immediately after the final task. Summaries of the interviews can be found in Section 4.7.
- Opening Question

  {The first question should always be "How was it?", an open question to encourage the user to speak freely.}

  - "How was it?" ["Wie war's?"]

  {Then let the user speak, and encourage them, until they dry up and stop of their own accord.}

- Standard Questions

  {Next, you ask a series of standard questions prepared in advance for every test user. More general questions should come first, more specific questions later. At least four standard questions should have been prepared for this section in the TA Plan.}

  - "Did anything strike you as particularly good?"
    ["Gab es etwas, das besonders gut oder positiv aufgefallen ist?"]
  - "Did anything strike you as particularly bad?"
    ["Gab es etwas, das besonders schlecht war?"]
  - "..."

- Individual Questions

  After the standard questions above, any individual questions which arose during the test with each particular test user were asked.
3.8 Feedback Questionnaire
After the interview, the user was asked to fill out the feedback questionnaire given in Appendix A.6. The feedback questionnaire was given on paper. An overview of the results can be seen in Section 4.8.
3.9 Data Collection
{Explain that demographic, behavioural, and attitudinal data collected during the study are associated with a TPid and first name alias. Test users are only referred to by these in this report.}
{Explain that personal data (real name, signed consent form, and the full video recordings) collected from the test persons are kept separately and will be deleted after one year.}
{Explain that any faces in the video clips used to illustrate findings in the report have been blurred or pixelated.}
4 Results
4.1 Task Completion
A summary of how many users completed each task and whether any assistance was given is shown in Table 5.
{Task completion is often a binary measure, whereby 0 indicates failure and 1 indicates success. Sometimes, a percentage measure of success is used. Explain which measure you are using.}
| | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 |
|---|---|---|---|---|---|
| TP1 | 1 | 0 | 1 | 1 | 1 |
| TP2 | 1 | 1* | 1 | 1 | 1 |
| TP3 | 1 | 1 | 1 | 1 | 1 |
| TP4 | 1 | 1 | 1 | 1 | 1 |
| TP5 | 1 | 0 | 0 | 1 | 1 |
| Total | 5 | 3 | 4 | 5 | 5 |
| % | 100 | 60 | 80 | 100 | 100 |
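The totals and percentages in Table 5 follow mechanically from the per-user scores. As a small sketch (binary scores copied from the table, with TP2's assisted success on Task 2 counted as a success):

```python
# Task completion per test user (1 = success, 0 = failure), from Table 5.
# TP2's Task 2 success with assistance (marked 1*) is counted as 1.
completion = {
    "TP1": [1, 0, 1, 1, 1],
    "TP2": [1, 1, 1, 1, 1],
    "TP3": [1, 1, 1, 1, 1],
    "TP4": [1, 1, 1, 1, 1],
    "TP5": [1, 0, 0, 1, 1],
}

num_users = len(completion)
for task in range(5):
    total = sum(scores[task] for scores in completion.values())
    print(f"Task {task + 1}: {total}/{num_users} ({100 * total // num_users}%)")
```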
4.2 First Impressions
Tables 6a, 6b, and 6c show a summary of the responses of the test users to the questions asked after the first task.
{Summarise in English the responses of the test users to the three questions asked after the first task.}
| TP1 | An online shop called... |
|---|---|
| TP2 | |
| TP3 | |
| TP4 | |
| TP5 | |

| TP1 | Geeks and nerds. |
|---|---|
| TP2 | |
| TP3 | |
| TP4 | |
| TP5 | |

| TP1 | Hardware components... |
|---|---|
| TP2 | |
| TP3 | |
| TP4 | |
| TP5 | |
{Never invent content. If you forgot to ask about the first impressions, say so.}
4.3 Top Three Positive Findings
The three most positive findings according to their average (mean) positivity ratings are described in more detail below. The positivity rating scheme used to rank positive findings is shown in Table 7.
| Positivity | Meaning |
|---|---|
| 4 | Extremely Positive |
| 3 | Major Positive |
| 2 | Minor Positive |
| 1 | Cosmetic Positive |
| 0 | Not a Positive |
{Describe the top three positive findings which emerged during the test, in descending order of mean positivity (most positive first).}
{One instance from every user who experienced a finding should be extracted as a video clip and included in the mini-table for that finding, each linked to the corresponding video clip. List the corresponding timestamp(s), which refer to elapsed time from the start of the corresponding test user's full session capture video.}
{For each finding, include at least one paragraph of text in the description. Choose the best video clip from those available for the finding to embed as a figure in the description.}
P01. Registration Page Well Laid Out
| Title: | Registration Page Well Laid Out |
|---|---|
| Description: | The registration page is well laid out and clear. |
| Video Clip(s): | p01-tp1-reg-clear.mp4, p01-tp3-reg-clear.mp4, p01-tp4-reg-clear.mp4 |
| Timestamp(s): | TP1 00:01:15, TP3 01:24:36, TP4 01:26:40 |
| Location (How Reproducible?): | Home → Register |
| Mean Positivity: | 4.00 |
The registration page was praised as well laid out by three test users... as can be seen in Figure 3.
P02. etc.
4.4 List of All Positive Findings
{Aggregated list of all positive findings observed in the test, in descending order of mean positivity.}
{For every test user who experienced a positive finding, a video clip should be extracted from the session capture video and its name listed in the table cell (linked to the video clip itself). The timestamp of each video clip in the full session capture video should be listed in the neighbouring cell.}
Table 8 shows a list of all the positive findings which emerged from the test, sorted in decreasing order of average (mean) positivity, i.e. the most positive are at the top of the table. Positivity ratings were assigned by members of the test team; their name codes are shown in Table 9.
| No. | Title | Description | Video Clip(s) | Timestamp(s) | Location (How Reproducible?) | HR | CT | TG | SS | Mean Positivity |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Registration Page Well Laid Out | The registration page is well laid out and clear... | p01-tp1-reg-clear.mp4 p01-tp3-reg-clear.mp4 p01-tp4-reg-clear.mp4 | TP1 00:01:15, TP3 01:24:36, TP4 01:26:40 | Home → Register | 4 | 4 | 4 | 4 | 4.00 |
| ... | | | | | Any product page, link "Shipping" | | | | | |
| ... | {in descending order of mean positivity until... } | | | | ... | | | | | |
| 11 | News Section Up-to-Date | The entries in the news section are up-to-date... | p11-tp2-news-current.mp4 | TP2 00:12:48 | something.com/news | 1 | 0 | 1 | 1 | 0.75 |
| Code | Meaning |
|---|---|
| HR | Harald Roth |
| CT | Christian Traum |
| TG | Thomas Gelb |
| SS | Sabine Schwarz |
4.5 Top Five Problems
The five most serious problems according to their average (mean) severity ratings are described in more detail below. Problem number 1 is the problem (negative finding) with the highest mean severity. The severity rating scheme used to rank problems is shown in Table 10.
| Severity | Meaning |
|---|---|
| 4 | Catastrophic problem |
| 3 | Serious problem |
| 2 | Minor problem |
| 1 | Cosmetic problem |
| 0 | Not a problem |
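Each mean severity is the arithmetic mean of the four evaluators' individual ratings, reported to two decimal places. As a sketch, using the ratings for problem N01 from Table 11:

```python
# Severity ratings for problem N01 ("Unclear when Free Delivery Applies")
# from the four evaluators HR, CT, TG, and SS, as listed in Table 11.
ratings = {"HR": 2, "CT": 4, "TG": 4, "SS": 4}

mean_severity = sum(ratings.values()) / len(ratings)
print(f"Mean severity: {mean_severity:.2f}")  # Mean severity: 3.50
```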
{Describe the top five problems (negative findings) which emerged during the test, in descending order of mean severity.}
{One instance from every user who experienced a finding should be extracted as a video clip. In the mini-table, include the names of all the video clip(s) for that finding, linked to the corresponding video clip. List the corresponding timestamp(s), which refer to elapsed time from the start of the corresponding test user's full session capture video.}
{For each problem, choose the best video clip from those available for the problem to embed as a figure. Include at least one paragraph of text describing the problem itself. Include a further paragraph of text describing way(s) the problem could potentially be addressed.}
N01. Unclear when Free Delivery Applies
| Title: | Unclear when Free Delivery Applies |
|---|---|
| Description: | Contradictory information: “free delivery” or “free delivery above €200”. |
| Video Clip(s): | n01-tp4-unclear-when-delivery-free.mp4, n01-tp5-unclear-when-delivery-free.mp4 |
| Timestamp(s): | TP4 00:03:00, TP5 00:01:50 |
| Location (How Reproducible?): | Home → Shipping → Delivery |
| Mean Severity: | 3.50 |
Two TPs discovered the inconsistency regarding when free delivery applies... as can be seen in Figure X.
To solve this problem, state clearly and in one place, when free delivery applies...
N02. Shipping Link not to Shipping Information
| Title: | Shipping Link not to Shipping Information |
|---|---|
| Description: | When the shipping link was clicked, products were shown. The shipping link should go to shipping information. |
| Video Clip(s): | n02-tp2-shipping.mp4 |
| Timestamp(s): | TP2 00:06:04 |
| Location (How Reproducible?): | Any product page, link “Shipping”. |
| Mean Severity: | 3.25 |
From a product page, the shipping link does not actually go to information about shipping... as can be seen in Figure X+1.
When the shipping link is clicked, it should actually go to information about shipping...
N03. etc.
4.6 List of All Problems Found
{Aggregated list of all problems observed in the test, in descending order of mean severity.}
{For every test user who experienced a problem, a video clip (MP4 H.264 AAC) should be extracted from the session capture video and its name listed in the table cell (linked to the video clip itself). The timestamp of each video clip in the full session capture video should be listed in the neighbouring cell.}
Table 11 shows a list of all the problems observed in the test, sorted in decreasing order of average (mean) severity, i.e. the most severe are at the top of the table. Severity ratings were assigned by members of the test team; their name codes are shown in Table 12.
| No. | Title | Description | Video Clip(s) | Timestamp(s) | Location (How Reproducible?) | HR | CT | TG | SS | Mean Severity |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Unclear when Free Delivery Applies | Contradictory information: “free delivery” or “free delivery above €200”. | n01-tp4-unclear-when-delivery-free.mp4 n01-tp5-unclear-when-delivery-free.mp4 | TP4 00:03:00, TP5 00:01:50 | Home → Shipping → Delivery | 2 | 4 | 4 | 4 | 3.50 |
| ... | | | | | Any product page, link “Shipping” | | | | | |
| ... | {in descending order of mean severity until... } | | | | ... | | | | | |
| 21 | No Catalogue Name | Unsure which catalogue, since no name is displayed. | n21-tp2-cat-name.mp4 | TP2 00:12:48 | example.com/en/page.bto?page=catalog | 1 | 0 | 1 | 1 | 0.75 |
| Code | Meaning |
|---|---|
| HR | Harald Roth |
| CT | Christian Traum |
| TG | Thomas Gelb |
| SS | Sabine Schwarz |
4.7 Interviews
{For each test user, summarise in English the main points made by the user in their interview.}
TP1 “Stuart” appreciated the simplicity of the registration process, ...
TP2 “Silvia” stated immediately that the...
TP3 “Sally” had to be encouraged to talk out loud, but was very forthcoming in her interview. She stressed that...
etc.
{Never invent content. If you forgot to do the interviews, say so.}
4.8 Feedback Questionnaires
{A tabular summary of the responses to the feedback questionnaire, with the mean and standard deviation. Make sure to include the extra domain-specific questions.}
{The neutral scale in the original feedback questionnaire is replaced by a weighted scale between 6 (best) and 0 (worst). The users' responses are then entered into the columns on the right.}
{The mean and standard deviation are calculated for each question and entered to 2 decimal places in the columns on the far right.}
{So that the reader can see the overall results at a glance, the point on the scale closest to the mean is highlighted through the style sheet.}
{If you tested in German, use English versions of the questions and answers in this table.}
Table 13 shows a summary of the ratings given by users in the feedback questionnaire at the end of the test. The neutral scale in the original feedback questionnaire has been mapped to a weighted scale between 6 (best) and 0 (worst). The numbers in bold indicate the (rounded) mean rating.
The original blank questionnaire can be found in Section A.6. Scans of the questionnaires as completed by each test user can be found in Section B.2.
| No. | Question | | Scale | | TP1 | TP2 | TP3 | TP4 | TP5 | Mean | Std Dev |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. | Getting to the right part of the site. | Very easy | 6 5 4 3 2 1 0 | Very hard | 5 | 5 | 3 | 2 | 4 | 3.80 | 1.30 |
| 2. | Quality of information. | Very good | 6 5 4 3 2 1 0 | Very poor | 3 | 3 | 2 | 1 | 2 | 2.20 | 0.84 |
| 3. | It is easy to read the text. | Very easy | 6 5 4 3 2 1 0 | Very hard | 1 | 1 | 1 | 1 | 1 | 1.00 | 0.00 |
| etc. | ... | Very satisfied | 6 5 4 3 2 1 0 | Very unsatisfied | | | | | | | |
| 12. | How likely are you to return to this site later? | Definitely | 6 5 4 3 2 1 0 | Never | 1 | 1 | 1 | 1 | 1 | 1.00 | 0.00 |
| etc. | ... | Very satisfied | 6 5 4 3 2 1 0 | Very unsatisfied | | | | | | | |
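The means and standard deviations in Table 13 can be reproduced with Python's statistics module. Note that the tabulated values correspond to the sample standard deviation (n−1 denominator); for example, for question 1:

```python
from statistics import mean, stdev

# Responses to question 1 ("Getting to the right part of the site"),
# one value per test user TP1-TP5, on the weighted 6 (best) to 0 (worst) scale.
responses = [5, 5, 3, 2, 4]

print(f"Mean: {mean(responses):.2f}")      # Mean: 3.80
print(f"Std Dev: {stdev(responses):.2f}")  # Std Dev: 1.30
```

Had the population standard deviation (statistics.pstdev) been intended instead, question 1 would read 1.17 rather than 1.30.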
References
{References to related work and related studies. Include at least two more references of your own. Do not include references to Wikipedia (or copies of Wikipedia). You may remove or retain my sample references as you wish. All references listed here must be cited somewhere in the document.}
- [And2023] Keith Andrews; Human-Computer Interaction: Course Notes; 13 Mar 2023. http://courses.isds.tugraz.at/hci/hci.pdf
- [Bar2020] Carol M. Barnum; Usability Testing Essentials; 2nd Edition, Morgan Kaufmann, 2020. ISBN 0128169427.
A Test Team Materials
{Include links to the various materials you actually used, depending on whether you ran your tests in English or German. Materials in German will typically have a -de suffix.}
A.1 Checklist
The following checklist was used for the tests: checklist.html.
A.2 Orientation Script
The following orientation script was used for the tests: orient.html.
A.3 Consent Form
The following blank consent form was used in the tests: consent-ta.pdf.
A.4 Background Questionnaire
The following blank background questionnaire was used in the tests: background.html.
A.5 Task Slips
The following task slips were presented to the test users: external-tasks-ta.pdf.
A.6 Feedback Questionnaire
The following blank feedback questionnaire was used in the tests: feedback.html.
A.7 Presentation Slides
The slides used for the presentation are: ta-slides.pdf.
B User Materials
The following materials were filled out for each test user.
B.1 Completed Background Questionnaires
{For each test user, include a good quality PDF scan (in colour if applicable) of the completed background questionnaire. A photograph is not acceptable!}
For each test user, the completed background questionnaire was scanned as PDF:
B.2 Completed Feedback Questionnaires
{For each test user, include a good quality PDF scan (in colour if applicable) of the completed feedback questionnaire. A photograph is not acceptable!}
For each test user, the completed feedback questionnaire was scanned as PDF:
C Protected Materials
The mapping of real names to aliases, the signed consent forms, external videos, and session capture videos all contain identifiable personal data of the test users. These files are handed in separately on a USB stick. They are not to be published and are to be deleted after one year.
C.1 User Aliases
The mapping of test users' aliases to their real names is documented in the file users.html.
C.2 Signed Consent Forms
For each test user, the signed consent form was scanned as PDF:
- tp1-consent.pdf
- tp2-consent.pdf
- tp3-consent.pdf
- tp4-consent.pdf
- tp5-consent.pdf
C.3 External Videos
For each test user, the entire test was captured with a tripod-mounted external video camera in the following files:
- TP1 “Stuart”: tp1-ext.mp4
- TP2 “Silvia”: tp2-ext.mp4
- ...
- TP5 “John”: tp5-ext.mp4
C.4 Session Capture Videos
For each test user, the entire test session on the computer was captured in the following files:
- TP1 “Stuart”: tp1.mp4
- TP2 “Silvia”: tp2.mp4
- ...
- TP5 “John”: tp5.mp4