Author Archives: FedericaLuraschi

Milestone IV – Blog Update #7

Blog Update #7a – Final conclusions and recommendations:
Overall, we believe the project drove our interface toward a sound design concept based on the elements we uncovered in our field study. However, our experiment could have yielded stronger results to support our hypotheses and validate the prototype interface. We have identified several aspects of the experiment that require revision in order to remedy the statistically weak results we observed. Nevertheless, the project produced interesting findings that could help guide future work in the field. We gained insight into the user mentality and into which information types users consider crucial for the map and chat aspects of the interface when planning trip activities, giving us clear connections between the affordances our intended users look for and the key elements of the interface. We believe that improving the prototype's robustness and overcoming its technical limitations would greatly help take our research to the next level.
As for recommendations on future research and on validating our interface, we believe some minor adjustments are needed, but the overall approach has been validated. As previous research in the field suggests, our hypotheses and overall tests are on the right track. However, it is important to extend the capabilities of our medium-fidelity prototype and revise certain aspects of the experiment. Overcoming the technical issues with chatbot behavior and intelligence, application freezes, and dynamic updates would help produce more accurate data. Researching the factors relevant to expertise may also yield information that aids the experimental design.
Blog Update #7b – Reflection on your design and evaluation process:
Upon reflecting on the design process, we believe the most significant way the design concept and the actual interface changed under the influence of users was making the map the main focus of the UI. Previously, we had wanted to use only the chat for everything. However, users pointed out the importance of map information, which shifted our focus toward a proper map implementation. One of the biggest surprises was that some users almost exclusively used the map, even when the chat was also available. Another was that one user spent a long time with the chat because they wanted to test the limits of the AI. For the most part, the methods we chose (a think-aloud task followed by semi-structured interview questions) were very effective, as were the ways we recorded data: coding sheets, screen capture, camera, and logs. However, it may have been interesting to account for trip-planning expertise by looking at pair/group planning and adjusting the task execution to test for that dynamic. A particularly unhelpful method was audio recording; since the notes we took were very detailed, the recordings seemed superfluous.

Milestone III – Blog Update #4

Blog Update #4a – Revised goal(s) of experiment:

  • On the level of content curation, the goal is to evaluate how our app compares to Google Maps, TripAdvisor, and other online travel resources, both in terms of the satisfaction and frustration the user feels during use and in terms of the time taken to complete a task on our app versus the other tools.
  • More specifically, we want to determine in what ways our app is better or worse than the currently available resources for travelling.

Blog Update #4b Experiment method: detailed description of the following components:

  • Participants – For our experiment, we plan to have a minimum of 5 participants. These participants should range from non-computer-science students, who are more likely to believe that the concierge bot is a true AI system, to computer science students, who may be more skeptical of the technology. The participants should also be people who have experience travelling, since they are the people who would actually use our system. To recruit participants, we plan to send out the questionnaire we used for our field study, which asks general questions about whether a person has travelled. If they have travel experience, we plan to recruit them for our experiment.
  • Conditions – For our experiment, we plan to ask participants to complete a task using our interface and to complete the same task using Google and the online resources available through it. Although the task itself will be the same (e.g. find a museum in the city), the city will be Tokyo for our interface and Seoul for Google. This ensures the two tasks are independent of each other, so the user cannot apply knowledge from the first task to the second. Furthermore, we plan to randomize whether a participant tests our interface first or Google first, again to keep the results as accurate and independent as possible. We chose Tokyo and Seoul because they are similar enough to keep the two tasks consistent while still producing independent results.
  • Tasks – As mentioned, we are comparing our interface with Google (and online resources reachable through Google, such as TripAdvisor or Google Maps) with regard to collecting information about a travel destination. The participant will be given a task matching the interface they are using: with our interface, they will be asked to find a museum to visit in Tokyo; with Google, a museum in Seoul. The task is focused and specific enough to direct and guide the user while producing useful data, yet gives the participant full freedom to show us their process for selecting a museum in a new city.
  • Design – Regarding our experimental design, participants will be given a phone on which to complete the required tasks. The phone will provide access to Google and will have our prototype installed. Depending on whether the participant is using our interface or Google, they will need to find a museum to visit in Tokyo or Seoul, respectively. As mentioned, which task they complete first will be randomized. Because our app features a chat-bot, we will be on the other end of the chat answering any questions they ask while completing the task. The level of frustration, the level of satisfaction, and the time taken to complete the task will be measured and compared across both interfaces. We plan to time the process and, through observation, note any difficulties the participant has in either case.
  • Procedure – In randomized order, the participant will complete either step 1 or step 2 first:
  1. Participant is given the task of searching for a museum in Tokyo using our interface
  2. Participant is given the task of searching for a museum in Seoul using Google
  3. Participant is asked interview questions regarding frustration levels and overall satisfaction with both processes
  • Apparatus – 
    • Phone with our prototype or access to Google (including other online resources)
    • Remote person to answer participant’s chat questions
    • Interview Questions
    • Script to run identical process for all participants
  • Independent and dependent variables – The data collected for statistical analysis will come from the post-experiment interview questions, which will rate confusion, frustration, and preference across the two interfaces. We will also time task completion on both interfaces. The dependent variables are the time spent on task completion, user satisfaction ratings and preference, and the number and severity of confusing points encountered. The independent variable is the interface type (prototype, or Google with access to other online resources). Individual differences in experience with trip-planning tools will be offset by randomizing, within participants, the order in which the two interfaces are used.
  • Hypotheses – stated in terms of the independent and dependent variables:

H0 – There is no significant difference between the two interfaces in terms of time spent, satisfaction/preference, and points of confusion for the users

H1 – Users will be more satisfied with the prototype and prefer it over using Google (including other online resources) for carrying out the task

H2 – Users will experience significantly fewer points of confusion when using the prototype when compared to using Google (including other online resources)

H3 – Users will spend significantly less time completing the task with the prototype than with Google (including other online resources)

  • Planned statistical analyses

We plan to run t-tests on the collected data to test the validity of the hypotheses. The data used will differ depending on which hypothesis is being tested; it will be gathered from the post-experiment interview questions and from the recorded task-completion times.
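Since each participant completes the task on both interfaces, a paired (within-subjects) t-test fits our design. A minimal sketch of the calculation, using hypothetical completion times in seconds (all numbers below are illustrative, not collected data):

```python
import math
from statistics import mean, stdev

# Hypothetical task-completion times in seconds, one pair per
# participant (same participant order in both lists).
prototype_times = [182, 205, 167, 240, 198]
google_times = [230, 228, 210, 265, 221]

def paired_t(a, b):
    """Paired t-statistic: mean of the per-participant differences
    divided by its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

t_stat = paired_t(prototype_times, google_times)
print(f"t = {t_stat:.3f}, df = {len(prototype_times) - 1}")
```

The resulting t-statistic would be compared against the critical value for n − 1 degrees of freedom to decide whether to reject H0; in practice a library routine such as SciPy's `ttest_rel` would also report the p-value directly.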

  • Expected limitations of the planned experiment

Our prototype features a map and a chat-bot. The map in our prototype is hard-coded and cannot be updated dynamically in response to user requests. Although we plan to cover the cases necessary for the user to complete the task, the lack of dynamic mapping may limit the information a user can request or receive from the app, and therefore our results. Another limitation is that the chat-bot is not the intelligent AI system we are letting the user believe it is, but rather a human. As a result, the responses the user receives may be delayed and may not be as thorough as those of a properly implemented AI system. Also, since most of the data collected for statistical analysis will come from the post-experiment interview questions, there is a chance they may not fully quantify certain aspects of our interface.

Blog Update #4c – Supplemental experiment materials: