The pilot test showed that users could finish the experiment tasks correctly. However, there were still a few issues:
- There were some bugs in our interface, such as buttons that did not work. We corrected these so that they worked in the formal experiment.
- Some of the instructions in our protocol were not clear. When we created the instructions for the two alternative tests, we decided to state them more specifically (e.g. instead of ‘create a tree’, we can have ‘add your grandpa, add your grandma, add your …’).
- The questionnaire had some duplicate questions, and some questions were not clear. We deleted some of them.
The pilot test gave us some idea of what the real experiment would look like. Our users took 15 minutes on average to do the pilot test, which is also what we expected the average for the real experiment to be.
Number of participants: 2 (one female, one male)
The following criteria had to pass for the pilot to be marked as a success:
Task | Expected Outcome |
Create a family tree with both designs | OK |
Add family members | OK |
View the tree | OK |
View a member’s profile | OK |
Time | Two time ranges, one per design; Design B is expected to take less time to complete: OK |
Errors | Two error counts, one per design: OK |
Experiment Abstract
The experiment was conducted with N = 10 participants. Each participant was asked to systematically work through each design to create a family tree. The task instructions intentionally specified the name of each family member, to reduce the noise that could arise from user errors and from the time participants might otherwise take to remember their family members. For each participant, the following data were measured:
- Time to complete the task using design A
- Time to complete the task using design B
- The number of encountered errors using design A and design B
Participant ID | Design Type | Time | Error |
1 | A | 9 | 0 |
1 | B | 8 | 1 |
2 | A | 7.5 | 0 |
2 | B | 6.5 | 0 |
3 | A | 3.9 | 0 |
3 | B | 3.6 | 1 |
4 | A | 3.5 | 1 |
4 | B | 5 | 2 |
5 | A | 10.47 | 0 |
5 | B | 6 | 2 |
6 | A | 3.5 | 0 |
6 | B | 5.6 | 1 |
7 | A | 9 | 1 |
7 | B | 6 | 0 |
8 | A | 15 | 2 |
8 | B | 9 | 0 |
9 | A | 5.1 | 0 |
9 | B | 8.5 | 0 |
10 | A | 3.1 | 0 |
10 | B | 4.5 | 0 |
The data were then combined into a table to be analyzed in R. The total number of errors made with each design was as follows:
Design A errors: 4
Design B errors: 7
This indicated that design B was more error-prone.
However, the mean completion time for design B was lower than for design A, which indicated that the task takes less time to complete with design B:
Mean completion time, design A: 7.007
Mean completion time, design B: 6.270
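The totals and means reported above can be re-derived directly from the results table. The sketch below is a minimal Python illustration and not the R analysis used in the study. The names `ROWS`, `summarize`, and `SUMMARY` are hypothetical, and the time column is treated as a plain number because the table does not state its unit.

```python
from statistics import mean

# (participant_id, design, time, errors), copied from the results table.
# The table gives no unit for the time column, so none is assumed here.
ROWS = [
    (1, "A", 9.0, 0), (1, "B", 8.0, 1),
    (2, "A", 7.5, 0), (2, "B", 6.5, 0),
    (3, "A", 3.9, 0), (3, "B", 3.6, 1),
    (4, "A", 3.5, 1), (4, "B", 5.0, 2),
    (5, "A", 10.47, 0), (5, "B", 6.0, 2),
    (6, "A", 3.5, 0), (6, "B", 5.6, 1),
    (7, "A", 9.0, 1), (7, "B", 6.0, 0),
    (8, "A", 15.0, 2), (8, "B", 9.0, 0),
    (9, "A", 5.1, 0), (9, "B", 8.5, 0),
    (10, "A", 3.1, 0), (10, "B", 4.5, 0),
]


def summarize(rows):
    """Return per-design participant count, total errors, and mean time."""
    summary = {}
    for design in sorted({row[1] for row in rows}):
        times = [row[2] for row in rows if row[1] == design]
        errors = [row[3] for row in rows if row[1] == design]
        summary[design] = {
            "n": len(times),
            "total_errors": sum(errors),
            "mean_time": round(mean(times), 3),
        }
    return summary


SUMMARY = summarize(ROWS)
for design, stats in SUMMARY.items():
    print(design, stats)
```

Run on the table as given, this yields 4 errors and a mean time of 7.007 for design A, and 7 errors and a mean time of 6.27 for design B, over 10 rows per design.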