
Milestone IV – Blog Update #7

Blog Update #7a – Final conclusions and recommendations:
Overall, we believe that the project drove our interface towards a sound design concept based on the elements we uncovered in our field study. However, our experiment could have yielded better results to support our hypotheses and validate the prototype interface. We have identified several aspects of the experiment that require further revision in order to remedy the statistically weak results we observed. Nevertheless, the project produced interesting information that could help guide future work in the field. We gained insight into the user mentality and into which information types are considered crucial for the map and chat aspects of the interface when planning trip activities, clarifying the affordances that connect the key elements our intended users look for. We believe that improving the prototype's robustness and overcoming its technical limitations would greatly help take our research to the next level.
As for recommendations on future research to validate our interface, we believe only minor adjustments are needed; the overall approach appears sound. As previous research in the field suggests, our hypotheses and overall tests are on the right track. However, it is important to extend the capabilities of our medium-fidelity prototype and revise certain aspects of the experiment. Overcoming technical issues with chatbot behavior/intelligence, application freezes, and dynamic updates would help produce more accurate data. Researching the factors relevant to trip-planning expertise may also yield information that aids the experimental design.
Blog Update #7b – Reflection on your design and evaluation process:
Upon reflecting on the design process, we believe the most significant way in which the design concept and actual interface changed under the influence of users was making the map the main focus of the UI. Previously, we had wanted to use the chat for everything. However, users pointed out the importance of map information, which shifted our focus to include a proper map implementation. One of the biggest surprises was that some users almost exclusively used the map, even when the chat was also available. Another was that one user spent a long time in the chat because they wanted to test the limits of the AI. For the most part, the methods we chose (a think-aloud task followed by semi-structured interview questions) were very effective. The same goes for the way we recorded data using coding sheets, screen capture, a camera, and logs. However, it might have been interesting to account for trip-planning expertise by looking at pair/group planning and adjusting the task execution to test for that dynamic. A particularly unhelpful method was audio recording: since the notes we took were very detailed, the recordings seemed superfluous.

Milestone IV – Blog Update #6

Blog Update #6a – Pilot Test:
In our pilot test, we found that more preset/canned responses were necessary to cover the range of questions the user could ask in the chat. Our previous repertoire of responses only covered basic greetings, museum information, and travel directions. We expanded it to include other topics such as public transportation, pricing, restaurants, etc. We felt that this would help compensate for the limitation of a static map view with hard-coded markers. We also found that participants may respond better if the chat interface is not entirely AI. On our TA's recommendation, we decided to change the concierge to a human/bot hybrid. As a result, whether the participants we recruited could be easily convinced that we were using a full AI was no longer a concern. Furthermore, we decided to move our experiment to a setting that emulates the bustling and active environment of a hotel in order to help the participant immerse in the task scenarios. We hoped that this would generate more accurate results than a lab setting, and decided to designate areas in the NEST and nearby cafés for holding the experiment.
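To keep the wizard's replies consistent across participants, the expanded repertoire can be organized as a simple keyword-to-reply table. Below is a minimal TypeScript sketch of that idea; the keywords and reply wording are illustrative placeholders, not the exact scripted responses from our pilot.

```typescript
// Hypothetical canned-response table for the concierge; topics mirror the ones
// added after the pilot (transit, pricing, restaurants), but the wording is
// illustrative only.
const cannedResponses: { keywords: string[]; reply: string }[] = [
  { keywords: ["hello", "hi"], reply: "Hi! I'm your concierge. How can I help with your trip today?" },
  { keywords: ["museum"], reply: "The national museum is open 9am-5pm; I've marked it on your map." },
  { keywords: ["direction", "get to"], reply: "From your hotel, take the metro two stops north and walk five minutes." },
  { keywords: ["transit", "subway", "bus"], reply: "A one-day transit pass covers all subway and bus lines." },
  { keywords: ["price", "cost", "ticket"], reply: "Admission is about $10, with a student discount available." },
  { keywords: ["restaurant", "food", "eat"], reply: "There are several well-rated restaurants within a block of the museum." },
];

// Return the first scripted reply whose keywords appear in the user's message,
// falling back to a generic prompt when nothing matches.
function lookupReply(message: string): string {
  const text = message.toLowerCase();
  const match = cannedResponses.find(r => r.keywords.some(k => text.includes(k)));
  return match ? match.reply : "Could you tell me a bit more about what you're looking for?";
}
```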

 

Blog Update #6b – Experiment Abstract:
We investigated the use of a chatbot acting as a responsive concierge for users planning trip activities on the go. Our expectation was that this service would provide customized, well-informed responses to users and help improve efficiency. Our experiment compared two interfaces, our Locl Bot prototype and traditional resources such as Google Maps, with 5 participants in order to evaluate the pros and cons of each interface via the task of finding a museum to visit in a foreign city. The results collected from time spent, satisfaction, and frustration ratings showed no significant difference between our prototype and the traditional alternative. This was likely due to statistically weak data from the small sample size and various technical issues that may have affected the results. In the future, we hope to overcome these limitations with design improvements and to validate our hypotheses.

 

Blog Update #6c – Revised Supplementary Experiment Materials:

Milestone III – Blog Update #5

#5a – Rationale of Medium Fidelity Prototyping Approach

  1. Our prototype is more vertical than horizontal, because the chat and map are crucial to testing it. Since making a dynamic chat and map is time-consuming, we decided to reduce the prototype's horizontal features.
  2. The chat is Wizard-of-Oz’d by having a human pretend to be a bot. The bot’s responses will be scripted so that we are consistent throughout. From a technical standpoint, we have already created a chat and map interface using ZenDesk (a customer service platform) and the Google Maps API. We will tell the subject that we are using a Google voice API to parse their chat, and then reveal afterwards that the bot is a human.
  3. Functionality is prioritized over appearance.
  4. The prototype is designed to be used on a phone.
  5. For our prototype, we are using the Google Maps API and ZenDesk.

 

#5b – Prototype Demonstrations

Figure 1: The prototype consists of a map screen that covers the entire interface with basic controls. The map is built using the Google Maps API, allowing it to be dragged, zoomed in and out, clicked for details, and switched to Street View. At the bottom-left corner is our chat button, which was kept small to keep it unobtrusive, since we found that most users tend to focus mainly on the map.
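For reference, a screen like this can be assembled with only a few Google Maps JavaScript API calls. The sketch below is illustrative rather than our actual prototype code; the element id, the Tokyo-area coordinates, and the openChat() handler are placeholder assumptions, and it presumes the Maps API script (and its TypeScript typings) has already been loaded.

```typescript
// Placeholder: in the real prototype this would open the chat window.
function openChat(): void {
  /* show the chat panel */
}

// Full-screen map with basic controls and a small chat button pinned to the
// bottom-left corner, added as a custom map control.
function initMap(): void {
  const map = new google.maps.Map(document.getElementById("map") as HTMLElement, {
    center: { lat: 35.6762, lng: 139.6503 }, // central Tokyo (illustrative)
    zoom: 13,
    streetViewControl: true, // lets the user switch to Street View
  });

  const chatButton = document.createElement("button");
  chatButton.textContent = "Chat";
  chatButton.addEventListener("click", openChat);
  map.controls[google.maps.ControlPosition.LEFT_BOTTOM].push(chatButton);
}
```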

Figure 2: The chat window allows the user to communicate with our bot. This screen shows the startup greeting and appearance of the chat window. The top right button shrinks the window so that the user may return to the map view at any time and continue exploring.

Figure 3: This shows an example of the conversation flow one might expect when the user interacts with the bot. Depending on what information the user requests, the chatbot responds accordingly, makes any necessary updates to the map view, and directs the user’s attention to them.

Figure 4: This screen shows the map view updated according to the conversation between the chatbot and the user. The information presented here can be personalized to the user’s requests, depending on how far they explore.
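The map-update step behind Figures 3 and 4 boils down to dropping a marker for the recommended place and panning to it. The helper below is a sketch under assumed names: the Place shape, showRecommendation(), and the example museum coordinates are illustrative, not part of the prototype’s real code.

```typescript
// Hypothetical shape for a place the bot recommends in chat.
interface Place {
  name: string;
  position: google.maps.LatLngLiteral;
}

// When the bot recommends a place, mark it on the map and move the viewport
// there so the user's attention is drawn to the update.
function showRecommendation(map: google.maps.Map, place: Place): google.maps.Marker {
  const marker = new google.maps.Marker({
    map,
    position: place.position,
    title: place.name,
  });
  map.panTo(place.position); // recenter on the new marker
  map.setZoom(15);           // zoom in enough to show the surrounding area
  return marker;
}

// Example call after the bot suggests a museum in the conversation:
// showRecommendation(map, { name: "Tokyo National Museum", position: { lat: 35.7188, lng: 139.7765 } });
```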

Figure 5: A typical way to end the conversation with the bot after gathering the information required to plan a visit to a point of interest on the go. The amount of information needed varies from user to user, but our bot should provide enough dynamic feedback to accommodate most experience levels.

Milestone III – Blog Update #4

Blog Update #4a – Revised goal(s) of experiment:

  • On the level of content curation, the goal is to evaluate how our app compares to Google Maps, TripAdvisor, and other online travel resources. The comparison covers the satisfaction and frustration the user may feel during use, as well as the time taken to complete a task on our app versus the other tools.
  • More specifically, we want to determine in what ways our app is better or worse than the currently available travel resources.

Blog Update #4b – Experiment method: detailed description of the following components:

  • Participants – For our experiment, we are planning to have a minimum of 5 participants. These participants should range from non-computer-science students, who are more likely to believe that the concierge bot is a true AI system, to computer science students, who may be more skeptical of the technology. The participants should also be people who have experience travelling, since they are the people who would actually use our system. To recruit participants, we plan on sending out the questionnaire we used for our field study, which asks for general information about whether a person has travelled. If they have travel experience, we plan on recruiting them for our experiment.
  • Conditions – For our experiment, we plan on asking participants to complete a task using our interface and to complete the same task using Google and the online resources available through it. Although the task itself will be the same (e.g. find a museum in the city), the city will be Tokyo for our interface and Seoul for Google. This is to make sure that the two tasks are independent of each other and the user can’t use previous knowledge to complete the second task. Furthermore, we plan on randomizing whether the participant tests our interface first or Google and other resources first, again to make the results as accurate and independent as possible. We chose Tokyo and Seoul because they are similar enough to keep the two tasks consistent while still producing independent results.
  • Tasks – As mentioned, we are comparing our interface with Google (and online resources that can be found through Google, such as TripAdvisor or Google Maps) with regard to collecting information about a travel destination. More specifically, the participant will be given a task matching the interface they are using. When using our interface, the user will be asked to find a museum to visit in Tokyo; when using Google, a museum in Seoul. The task is focused and specific enough to guide the user and produce useful data, yet gives the participant full freedom to show us their process for selecting a museum in a new city.
  • Design – Participants will be given a phone on which to complete the required tasks; it provides access to Google and has our prototype installed. Depending on whether the participant is using our interface or Google, they will need to find a museum to visit in Tokyo or Seoul, respectively. As mentioned, the task they complete first will be randomized (a short sketch of this order assignment appears after the hypotheses below). Because our app features a chatbot, we will be on the other end of the chat answering any questions they ask in order to complete the task. Frustration, satisfaction, and time taken to complete the task will be measured and compared across both interfaces. We plan on timing the process and, through observation, noting any difficulties the participant has in either case.
  • Procedure – In randomized order, the participant will complete either step 1 or step 2 first:
  1. The participant is given the task of searching for a museum in Tokyo using our interface
  2. The participant is given the task of searching for a museum in Seoul using Google
  3. The participant is asked interview questions regarding frustration levels and overall satisfaction with both processes
  • Apparatus –
    • Phone with our prototype and access to Google (including other online resources)
    • Remote person to answer the participant’s chat questions
    • Interview questions
    • Script to run an identical process for all participants
  • Independent and dependent variables – The data collected for statistical analysis will come from the post-experiment interview questions, which will rate confusion, frustration, and preference between the two interfaces. We will also time task completion on both interfaces. The dependent variables are the time spent on task completion, user satisfaction ratings and preference, and the number and severity of confusing points encountered. The independent variable is the interface type (prototype, or Google with access to other online resources). Individual differences in experience with trip-planning tools will be offset by randomizing, within participants, the order in which the two interfaces are used.
  • Hypotheses – stated in terms of the independent and dependent variables:

H0 – There is no significant difference between the two interfaces in terms of time spent, satisfaction/preference, and points of confusion for the users

H1 – Users will be more satisfied with the prototype and prefer it over using Google (including other online resources) for carrying out the task

H2 – Users will experience significantly fewer points of confusion when using the prototype than when using Google (including other online resources)

H3 – Users will spend significantly less time completing the task with the prototype than with Google (including other online resources)
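As noted in the Conditions and Design sections, the order of the two conditions is randomized within participants. A minimal sketch of that assignment, with made-up condition labels, might look like this:

```typescript
// Each participant completes both tasks; a coin flip decides whether the
// prototype (Tokyo) or Google (Seoul) condition comes first. Labels are
// illustrative only.
type Condition = "prototype-tokyo" | "google-seoul";

function assignOrder(): [Condition, Condition] {
  return Math.random() < 0.5
    ? ["prototype-tokyo", "google-seoul"]
    : ["google-seoul", "prototype-tokyo"];
}

// Example: draw an ordering for each of the 5 participants.
const orders = Array.from({ length: 5 }, assignOrder);
```

With only five participants, strictly alternating the two orders would keep the split more balanced than a pure coin flip, which is a variation worth considering.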

  • Planned statistical analyses

We plan on running t-tests on the collected data to test the validity of the hypotheses. Depending on which hypothesis we are testing, the data used will differ. Ratings will be gathered from the post-experiment interview questions, and task times from our timing of each session.
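Because every participant uses both interfaces, the natural form is a paired (within-subjects) t-test. The sketch below works through the calculation on hypothetical time-on-task numbers; the data, variable names, and the choice of a paired test are illustrative assumptions rather than our recorded results.

```typescript
// Paired t-test on time-on-task (seconds) for the 5 participants.
// The values below are hypothetical placeholders, not real data.
const prototypeTimes = [210, 185, 240, 200, 195];
const googleTimes = [230, 200, 235, 225, 210];

function pairedTTest(a: number[], b: number[]): { t: number; df: number } {
  const n = a.length;
  const diffs = a.map((x, i) => x - b[i]); // per-participant differences
  const mean = diffs.reduce((sum, d) => sum + d, 0) / n;
  const variance = diffs.reduce((sum, d) => sum + (d - mean) ** 2, 0) / (n - 1);
  const t = mean / Math.sqrt(variance / n); // t statistic for the mean difference
  return { t, df: n - 1 };
}

const { t, df } = pairedTTest(prototypeTimes, googleTimes);
// With df = 4, the two-tailed critical value at alpha = 0.05 is about 2.776;
// if |t| falls below it, H0 cannot be rejected for this measure.
console.log(`t = ${t.toFixed(2)}, df = ${df}`);
```

The same calculation applies to the satisfaction and frustration ratings, with each participant’s two ratings forming a pair.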

  • Expected limitations of the planned experiment

Our prototype features a map and a chatbot. The map in our prototype is hard-coded and cannot be updated dynamically in response to user requests. Although we plan on covering the cases necessary for the user to complete the task, the lack of dynamic mapping may limit the information that a user can request or receive from the app and, therefore, our results. Another limitation is that the chatbot is not the intelligent AI system we let the user believe it is, but rather a human. Given this, the responses the user receives may be delayed and may not be as thorough as those of a properly implemented AI system. Also, most of the data collected for statistical analysis will come from the post-experiment interview questions, so there is a chance it may not fully quantify certain aspects of our interface.

Blog Update #4c – Supplemental experiment materials: