Chatbot Pilot Evaluation: A Quantitative and Qualitative ‘Exit’ Study

The ‘exit study’, conducted upon completion of the chatbot experiment, combined quantitative and qualitative methods: an online survey (the ‘exit questionnaire’) sent to all chatbot participants, and online focus (discussion) groups held with a subset of participants (two focus groups of 4-7 participants per pilot location).

The focus group topic guide and the survey questionnaires were designed with input from most of the other work packages in the consortium, such as the technical and ethics partners.

Focus group findings

To open the discussion, focus group participants were asked to think of three words to describe their chatbot experience; the word cloud below highlights the overall positive tone at a glance.
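For illustration, a word cloud of this kind can be generated from the pooled three-word responses with a few lines of Python using the open-source wordcloud package; the responses and the output file name below are placeholders, not the pilot data.

    # Minimal sketch: a word cloud built from participants' three-word
    # descriptions. The responses listed here are hypothetical placeholders.
    from wordcloud import WordCloud

    responses = [
        "fun easy helpful",
        "useful simple friendly",
        "interesting new helpful",
    ]

    # Pool the responses into one text; WordCloud sizes words by frequency.
    wc = WordCloud(width=800, height=400, background_color="white")
    wc.generate(" ".join(responses))
    wc.to_file("exit_study_wordcloud.png")  # hypothetical output file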

The topic guide covered the following themes: positive and negative features of the chatbot, badges and messages, future developments, contextual factors, and privacy and diversity issues.

The main currents of opinion

Participants felt the screen was too crowded. They were concerned about the inability to engage in a conversation with someone who had answered their question, and suggested that all answers be made available to all users. They were also mystified by the incentives scheme (badges and messages), whose logic had not been properly explained in advance. Finally, privacy was not perceived as an issue, since all users belonged to the same student community.

Survey findings

The online questionnaire included a set of closed-ended questions with quantified response alternatives and a final open-text question for any additional comments. The survey aimed to assess various aspects of the user experience: performance and effort expectancy, hedonic motivation and behavioural intentions, contextual factors (location and time), and reception of incentives.

Albeit with some variation across countries, between 71 and 85% of students agreed or strongly agreed that Ask for Help was easy to use and that they had the necessary resources and knowledge to use it.
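As an aside on how such figures are typically derived, the sketch below shows a ‘top two box’ calculation (the share of ‘agree’ or ‘strongly agree’ responses) per country using Python and pandas; the column names and responses are invented for illustration and are not the pilot dataset.

    import pandas as pd

    # Hypothetical extract of exit-questionnaire responses on a 5-point scale.
    df = pd.DataFrame({
        "country": ["UK", "UK", "DK", "DK", "DK"],
        "easy_to_use": ["Agree", "Strongly agree", "Neutral",
                        "Agree", "Strongly agree"],
    })

    # 'Top two box' agreement: share of respondents answering
    # 'Agree' or 'Strongly agree', per country, in percent.
    agreement = (
        df["easy_to_use"]
        .isin(["Agree", "Strongly agree"])
        .groupby(df["country"])
        .mean()
        .mul(100)
        .round(1)
    )
    print(agreement)  # DK 66.7, UK 100.0 for this toy data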

Additional encouraging findings emerged from the survey analysis: an overwhelming majority (88%) of participants reported being interested in the experience. Notably, a higher percentage of participants thought the app was useful for providing help to others (86%) than for reaching out for help (73%). On the other hand, participants reported feeling equally pleased, overall, to provide (87%) and to receive (87%) answers. Agreement about feeling comfortable using the chatbot was particularly high overall (85-89%). Considering the unsophisticated, early version of the app being tested, it is a satisfactory result that over half of testers would continue to use the app beyond the experiment stage.

The analysis of questions: content analysis

A content analysis was conducted on chatbot question logs with the aims of i) classifying questions into categories based on their apparent communication goal, ii) presenting example questions for each category, and iii) identifying relevant domains within each category.

The LSE (London School of Economics) and AAU (Aalborg University) chatbot interaction data, both in English, were analysed and used to develop and test a coding frame, which could then be applied to other countries’ samples.

Seven question categories were identified:

Requests for information; getting to know the chatbot community; initiating a connection; asking for suggestions; sharing opinions and experiences; academic dilemmas; sensitive and personal topics.
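Purely as an illustrative sketch (not the coding frame used in the study), a keyword-based first pass over the question logs could be automated along the following lines in Python; every keyword and the example question are hypothetical, and the study’s own coding was based on the frame described above rather than on this snippet.

    # Illustrative sketch only: a keyword-based first pass for assigning
    # logged questions to the seven categories above. The keywords and the
    # example question are hypothetical, not the study's actual coding frame.
    CODING_FRAME = {
        "requests for information": ["where can i", "when is", "how do i"],
        "getting to know the community": ["favourite", "do you like", "anyone here"],
        "initiating a connection": ["meet up", "coffee", "join me"],
        "asking for suggestions": ["recommend", "suggest", "any tips"],
        "sharing opinions and experiences": ["what do you think", "your experience"],
        "academic dilemmas": ["exam", "essay", "module", "dissertation"],
        "sensitive and personal topics": ["married", "kids", "religion"],
    }

    def classify(question: str) -> str:
        """Return the first category whose keywords appear in the question."""
        q = question.lower()
        for category, keywords in CODING_FRAME.items():
            if any(k in q for k in keywords):
                return category
        return "uncoded"

    print(classify("Can anyone recommend a quiet place to study?"))
    # -> 'asking for suggestions'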

Sensitive and personal questions such as “Do you want to get married in the future? Do you want kids?” were crucial in shaping the ‘ethics of technology’ guidance for the design of future pilots; such questions highlight the subjectivity involved in labelling a question as sensitive or personal, which can vary widely depending on the cultural or political context and on the backgrounds of the requester and the receiver.

Looking at the LSE and AAU distributions, three main question categories stand out: a third of LSE’s and over half of AAU’s questions sought to get to know the community, at least one in five questions asked for suggestions, and about 15% enquired about others’ opinions and experiences. A noticeably higher share of academic questions was found in the LSE sample, and a relatively small number of explicitly sensitive or personal questions was coded in both locations.

Interestingly, all LSE connection questions were aimed at ‘anyone’, as were around 80% of the community, opinions-and-experiences, and suggestions questions. Among academic questions, half were targeted at ‘anyone’ and 44% at ‘someone similar’; ‘similar’ seemed to be interpreted as students in the same course year, discipline or degree level. The ‘ask different’ option was most popular for personal and sensitive questions but was, overall, the least preferred respondent profile.
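Distributions of this kind can be read off a row-normalised cross-tabulation of coded category against chosen respondent profile; the short pandas sketch below illustrates the calculation on invented data, not the pilot logs.

    import pandas as pd

    # Hypothetical coded question log: one row per question, with its coded
    # category and the respondent profile chosen by the asker.
    coded = pd.DataFrame({
        "category": ["connection", "community", "academic",
                     "academic", "suggestions"],
        "target": ["anyone", "anyone", "someone similar",
                   "anyone", "anyone"],
    })

    # Row-normalised cross-tabulation: share of each respondent profile
    # within each question category, in percent.
    shares = pd.crosstab(coded["category"], coded["target"],
                         normalize="index") * 100
    print(shares.round(1))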

In conclusion, here are some quotes from the survey and focus groups: