Update 4b: Experiment Design

Participants:

We plan to draw our participants from the general population. The inclusion criteria are as follows:

  • must be able to use a computer and follow along with an educational video;
  • must not be an expert in the subjects of the video tutorials used in the study.

We plan to recruit participants through word of mouth (convenience sampling) and a call for participants at UBC. We expect to recruit and run the experiment with 8 participants, two for each of the four combinations in our counterbalancing scheme.

Conditions:

In this experiment, we will compare participants' performance on our prototype with their performance on YouTube. We will examine how quickly users can perform tasks on each interface: how quickly they can find annotations (timestamped user remarks) and how quickly they can complete an entire task described in a video tutorial. In addition, we will look at users' willingness to use each system and their preference between the two.

Tasks:

On each video and interface, participants will be asked to perform the following tasks in the given order:

  1. Complete the entire task described in the video
  2. Find a specific high visibility annotation
  3. Find a specific low visibility annotation
  4. Complete a questionnaire, including Likert scales indicating their preferences

Design:

To test the speed of finding annotations, we will use a 2×2 (annotation visibility × interface type) within-subjects factorial design. Annotation visibility has two levels, high and low, and interface type has two levels: YouTube (presented to participants as System Red) and our interface (System Blue). High visibility means the annotation sits in an immediately visible position in our system's list of annotations, without scrolling; low visibility means the annotation can only be found by scrolling through the list. To test the time taken to complete an entire task described in a video, we will use a paired t-test to compare the two interface types.

We will use a counterbalancing method to eliminate order effects. Participants will interact with both interfaces, each with a different video. For example, a user might be assigned the first video on our system followed by the second video on YouTube. There are four possible combinations, displayed in the table below:

Table 1: Counterbalancing method for our experiment

Combination | First Scenario      | Second Scenario
1           | YouTube, Video 1    | Our System, Video 2
2           | YouTube, Video 2    | Our System, Video 1
3           | Our System, Video 1 | YouTube, Video 2
4           | Our System, Video 2 | YouTube, Video 1
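
Since each of the 8 participants is assigned one of these four combinations, each combination is used by exactly two participants. A minimal sketch of this assignment in Python follows; the function and variable names are our own illustration, not part of the system:

```python
import itertools

# The two interfaces and two videos under comparison (labels from Table 1).
INTERFACES = ["YouTube", "Our System"]
VIDEOS = ["Video 1", "Video 2"]

def counterbalanced_combinations():
    """Enumerate the four (first scenario, second scenario) orderings.

    Each participant uses both interfaces and watches both videos,
    never seeing the same video twice, matching Table 1.
    """
    combos = []
    for first_interface, first_video in itertools.product(INTERFACES, VIDEOS):
        second_interface = next(i for i in INTERFACES if i != first_interface)
        second_video = next(v for v in VIDEOS if v != first_video)
        combos.append(((first_interface, first_video),
                       (second_interface, second_video)))
    return combos

# With 8 participants, each of the 4 combinations is used exactly twice.
combos = counterbalanced_combinations()
for participant in range(8):
    first, second = combos[participant % 4]
    print(f"Participant {participant + 1}: {first} then {second}")
```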

We plan to counterbalance this way because a user cannot watch the same video twice due to learning effects. After completing a tutorial, a user would become familiar with the steps and anticipate what should be done next, biasing our results. Thus, we are using two different knot-tying videos, chosen based on:

  • Similarity of content. We wanted the videos to be similar enough in content and length to be comparable without introducing video as a factor, but different enough to eliminate significant learning effects.
  • Number of comments. The videos should have a similar number of comments.
  • How easily segmentable they are. We want to use videos that have logically segmentable sections, so that the segments can potentially be useful to users.
  • How likely it is for participants to already know how to complete the prescribed task. We want to use tutorial videos that users will probably be unfamiliar with, to address the potential confounding factor of participant expertise.
  • How complicated the video is. We want the users to be able to complete the tutorial without struggling greatly, but also make sure the task to be completed is non-trivial.
  • How lengthy the video is. Since participants will be watching two videos, we do not want to bore them with long videos.

The videos chosen are as follows:

(Video 1) How to Tie the Celtic Tree of Life Knot by TIAT: https://www.youtube.com/watch?v=scU4wbNrDHg

(Video 2) How to Tie a Big Celtic Heart Knot by TIAT: https://www.youtube.com/watch?v=tfPTJdCKzVw

For the YouTube interface, the participant will be directed to the corresponding video hosted on YouTube. For our developed interface, the participant will interact with the interface on a local machine. The comments (non-timestamped remarks) and annotations for our developed system will be imported from the same video's YouTube page. Each imported remark will be randomly assigned in our system to be either a comment or an annotation, with a 50% chance of each. We decided on 50% because there is no precedent for a system like this that would give us more accurate data on how comments and annotations are actually distributed. Similarly, we assume that annotations' timestamps are uniformly distributed across the video's length. To make a fair comparison between the two interfaces, all comments will be sorted from most recent to least recent. For our developed interface, the video will be segmented manually beforehand, at places where the video pauses visually or audibly for more than one second, or where it transitions in some way (e.g. a screen transition).
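
To make the import step concrete, here is a minimal sketch of the 50/50 random split and uniform timestamp placement, assuming the remarks have already been scraped from YouTube; the `assign_remarks` function and its field names are hypothetical:

```python
import random

def assign_remarks(imported_remarks, video_duration_s, seed=0):
    """Randomly split imported YouTube remarks into comments and annotations.

    Each remark has a 50% chance of becoming a timestamped annotation;
    annotation timestamps are drawn uniformly across the video's length.
    A fixed seed keeps the split identical for every participant.
    """
    rng = random.Random(seed)
    comments, annotations = [], []
    for text in imported_remarks:
        if rng.random() < 0.5:
            annotations.append({"text": text,
                                "timestamp_s": rng.uniform(0, video_duration_s)})
        else:
            comments.append({"text": text})
    return comments, annotations

comments, annotations = assign_remarks(
    ["Great tutorial!", "This step is tricky", "Thanks, this helped"],
    video_duration_s=600,  # e.g. a 10-minute video
)
```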

Procedure:

  1. Participants will be informed of the goals of the research experiment and consent will be reviewed.
  2. Participants will be asked about their familiarity with the tutorial subject to exclude participants who are experts in the subject.
  3. On the first interface and video combination, as determined by counterbalancing, the participant will be asked to perform the experimental tasks in the order given. Tasks 1 to 3 will be timed and recorded by the researcher using a stopwatch (see the timing sketch after this list). The researcher will also take any relevant notes while the participant works on each task, especially when the participant interacts with the segments feature of our developed interface. To determine when a task has been completed for timing purposes, participants will be asked to tell the researcher when they have finished; a task will only be accepted once it has been completed correctly.
  4. Repeat step 3 using the other interface and video combination.
  5. Have the participants fill in a questionnaire covering their demographic information and their preferences between the two interfaces they used.
  6. Ask if the participants have any remaining questions before concluding the study.
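
If we want a scripted alternative to the stopwatch in step 3, a minimal timing log could look like the sketch below; the task list and `time_tasks` function are our own illustration, operated by the researcher on a laptop:

```python
import time

TASKS = [
    "Complete entire task described in video",
    "Find high visibility annotation",
    "Find low visibility annotation",
]

def time_tasks(participant_id, condition):
    """Record per-task completion times. The researcher presses Enter to
    start a task and again when the participant indicates completion."""
    rows = []
    for task in TASKS:
        input(f"Press Enter to START: {task}")
        start = time.monotonic()
        input("Press Enter when the participant indicates completion...")
        elapsed = time.monotonic() - start
        rows.append((participant_id, condition, task, round(elapsed, 2)))
        print(f"  {task}: {elapsed:.1f} s")
    return rows

rows = time_tasks("P01", "Our System, Video 1")
```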

Apparatus:

  • We plan on conducting this experiment in a quiet environment, such as ICICS x360.
  • The participant will use one of the computers available in the lab room to perform the tasks required while the researcher observes from the side.
  • Cell phone timers will be used as stopwatches to record the time it takes a participant to complete each task.
  • During the experiment, notes will be taken by team members on either a laptop or on paper.

Hypotheses:

Speed:
H1. Finding a specified annotation is faster using our system than using YouTube for high visibility annotations.
H2. Finding a specified annotation is no slower using our system than using YouTube for low visibility annotations.
H3. Completing an entire task prescribed in a video is no slower on our system than on YouTube.

User Preference:
H4. Users will prefer our system's comment and annotation features over YouTube's.
H5. Users will not have a preference towards either system overall.

Priority of Hypotheses:

  • H4 and H3 are most important since they have the most direct and tangible implications for design: H3 concerns the overall usefulness of the system, and H4 concerns users' willingness to use it.
  • H1 is important since it tests one of the big potential advantages of our system; however, its scope and applicability are more limited than H3's.
  • H2 is reasonably important since it examines the potential tradeoff of having annotations and comments in separate sections.
  • H5 is least important since it is dependent on a comparison of a fully functional interface and a still-in-development interface. At this stage, it would be beneficial to get a sense of users’ overall opinion, but it is important to recognize that this may change as our interface is developed.

Planned Analysis:

For our statistical analysis, we will use a two-factor repeated-measures ANOVA (2 interface types × 2 annotation visibility levels) on the time it takes participants to find specific annotations in our system compared to YouTube's interface. A two-tailed paired t-test will be used to compare the completion time of an entire task between the two interfaces. To measure users' preference of interface type, we will compute descriptive statistics on the Likert scale data collected from each participant's questionnaire.
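
A sketch of this analysis in Python is below, using statsmodels' AnovaRM for the 2×2 within-subjects ANOVA and SciPy's paired t-test; the CSV file names and column layout are assumptions about how we would record the results:

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Long-format annotation-finding times: one row per participant x condition,
# with columns participant, interface ("YouTube"/"Ours"), visibility
# ("high"/"low"), and find_time_s (seconds to locate the annotation).
df = pd.read_csv("annotation_times.csv")  # hypothetical results file

# 2x2 repeated-measures ANOVA: interface type x annotation visibility.
anova = AnovaRM(df, depvar="find_time_s", subject="participant",
                within=["interface", "visibility"]).fit()
print(anova)

# Two-tailed paired t-test on whole-task completion times per interface.
tasks = pd.read_csv("task_times.csv")  # hypothetical: participant, interface, task_time_s
wide = tasks.pivot(index="participant", columns="interface", values="task_time_s")
t_stat, p_value = stats.ttest_rel(wide["YouTube"], wide["Ours"])
print(f"Paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")

# Descriptive statistics for the Likert preference ratings.
likert = pd.read_csv("likert.csv")  # hypothetical: participant, interface, rating
print(likert.groupby("interface")["rating"].describe())
```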

Expected Limitations:

There are various issues that we expect to be limitations in our experiment, including:

  • The type of video we are testing. Since we are only using one specific type of educational video in our experiment, we may miss out on some interactions users have with different types of video.
  • Breadth of comparison. Our experiment will only test our system against YouTube. We are not accounting for differences that may exist between our system and other popular video platforms, such as Khan Academy.
  • Comment/Annotation placement. The way that “existing” comments/annotations are placed in our system is predetermined. We are assuming an equal chance for a user to post an annotation or comment, and that annotations will be uniformly distributed by time. Since there is no precedent for a similar system, we cannot determine the validity of this assumption.
  • Video Segmentation. We are segmenting the videos based on our own judgement whereas the fully functional system would automatically segment based on user input. This may limit the validity of the video segments that have been chosen.
