This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication on https://humanfactors.jmir.org, as well as this copyright and license information must be included.
The rising adoption of telehealth provides new opportunities for more effective and equitable health care information mediums. The ability of chatbots to provide a conversational, personal, and comprehensible avenue for learning about health care information makes them a promising tool for addressing health care inequity as health care trends continue toward web-based and remote processes. Although chatbots have been studied in the health care domain for their efficacy in smoking cessation, diet recommendation, and other assistive applications, few studies have examined how specific design characteristics influence the effectiveness of chatbots in providing health information.
Our objective was to investigate the influence of different design considerations on the effectiveness of an educational health care chatbot.
A 2×3 between-subjects study was performed with 2 independent variables: a chatbot’s complexity of responses (eg, technical or nontechnical language) and the presented qualifications of the chatbot’s persona (eg, doctor, nurse, or nursing student). Regression models were used to evaluate the impact of these variables on 3 outcome measures: effectiveness, usability, and trust. A qualitative transcript review was also conducted to examine how participants engaged with the chatbot.
Analysis of 71 participants found that participants who received technical language responses were significantly more likely to be in the high effectiveness group, which had higher improvements in test scores (odds ratio [OR] 2.73, 95% CI 1.05-7.41; P=.04).
Given their increasing popularity, it is vital that we consider how chatbots are designed and implemented. This study showed that a chatbot’s persona and language complexity are 2 design considerations that influence the ability of chatbots to successfully provide health care information.
As health care technology advances, internet usage increases, and cultural norms shift (eg, in response to the COVID-19 pandemic), people are receiving more health care information from virtual mediums (eg, telehealth) than ever before.
The potential benefits that chatbots can provide have led to their implementation in a variety of health care contexts, including diet recommendation, smoking cessation, and other assistive applications.
In this study, participants were tasked with interacting with the chatbot to seek information about blood pressure. The experiment was a 2×3 between-subjects design, in which the chatbot with which the participants interacted differed in the complexity of its responses (either technical or nontechnical language) and the presented qualifications of its persona (either Doctor, Nurse, or Nursing Student).
This study was reviewed and approved by the Clemson University Institutional Review Board (IRB2019-411).
The most common purpose of chatbots in health care has been to provide education and training for specific conditions (eg, mental health, type 2 diabetes, breast cancer, hypertension, asthma, pain monitoring, and language impairment).
To differentiate between the complexity of the responses (technical vs nontechnical), we assessed the reading difficulty of each chatbot response using the Microsoft Word Reading Assessment feature. This feature uses the Flesch-Kincaid readability test, which determines a text’s Flesch reading ease and its Flesch-Kincaid grade level. The Flesch-Kincaid assessments have been used to assess technical manuals, legal documents, and insurance policies.
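The two readability scores mentioned above follow the standard published Flesch formulas, which depend only on average sentence length and average syllables per word. A minimal sketch (not the Microsoft Word implementation; syllable counts are supplied externally, since accurate syllabification requires a dictionary):

```python
import re

def flesch_kincaid(text, total_syllables):
    """Compute Flesch reading ease and Flesch-Kincaid grade level.

    total_syllables: syllable count for the text, determined externally.
    Higher reading ease means easier text; grade level approximates the
    US school grade needed to understand the text.
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = max(1, len(re.findall(r"[A-Za-z']+", text)))
    asl = words / sentences           # average sentence length (words/sentence)
    asw = total_syllables / words     # average syllables per word
    reading_ease = 206.835 - 1.015 * asl - 84.6 * asw
    grade_level = 0.39 * asl + 11.8 * asw - 15.59
    return reading_ease, grade_level
```

Short monosyllabic sentences score very high on reading ease (easy), while long sentences with polysyllabic words score low and map to higher grade levels.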
The persona that the chatbot represented consisted of 3 possible naming structures (ie, Doctor, Nurse, or Nursing Student). Each chatbot persona was named Sarah, with only the salutation changing between conditions (eg, “Dr Sarah,” “Nurse Sarah,” or “Nursing Student Sarah”). This was done to avoid any implicit bias based on different names. Each persona introduced itself at the start of the chatbot engagement. For example, “Hello, my name is Dr Sarah. I’m here to help you learn about blood pressure today. You can ask questions about understanding blood pressure, learning how to manage or prevent high blood pressure, who is affected, and more. What is your first question?” Following this initial engagement, the persona identifier prefixed each response to the participant.
Participants were recruited from Clemson University; they were required to be between the ages of 18 and 26 years and able to read, write, and speak English. Participants received US $10 in compensation for 30 minutes of their time at the end of the session. This age range was chosen so that the participant population would likely have a similar (nominal) level of knowledge about blood pressure.
Following informed consent procedures, participants completed a demographic survey, and then an experimenter assessed the participants’ health literacy using the Short Assessment of Health Literacy—English.
Participants’ perceived usability of the chatbot was measured via the Post-Study System Usability Questionnaire (PSSUQ).
Initially, 74 students participated in the study; however, 3 participants’ data were removed from the analysis: 2 due to incomplete data collection and 1 because the participant did not engage in the task (ie, did not ask blood pressure–related questions throughout the experiment). Of the remaining 71 participants, 43 (60.6%) self-identified as female, 30 (42.3%) were graduate students, and 41 (57.7%) were undergraduate students. The average age of the participants was 21.87 (SD 2.58) years. The demographic results are presented below.
Characteristics of study participants (N=71).

Variables | Values
Age (years), mean (SD) | 21.87 (2.58)
Gender, n (%) |
  Male | 28 (39.4)
  Female | 43 (60.6)
Race, n (%) |
  Caucasian | 49 (69.0)
  African American | 8 (11.3)
  Asian | 14 (19.7)
Student status, n (%) |
  Undergraduate | 41 (57.7)
  Graduate | 30 (42.3)
The average usability score was relatively high (mean 6.00, SD 0.63), indicating high perceived usability of the system. A linear model was constructed to predict usability scores from the independent factors and resulted in residuals that were significantly nonnormal (Shapiro-Wilk test: W=0.959).
Linear regression model predicting usability of the chatbot.
Coefficients | Estimate | SE | P value
Intercept | 39.5 | 1.98 | <.001 |
Response complexity (technical language) | –2.34 | 1.59 | .15 |
Chatbot persona (“Doctor”) | –3.32 | 1.97 | .10 |
Chatbot persona (“Nursing Student”) | –4.52 | 1.96 | .02 |
Gender (male) | –3.38 | 1.70 | .049 |
Student status (undergraduate) | 7.05 | 2.27 | .03 |
Only 9 of 71 (12.7%) participants reported not trusting the chatbot. In a binary logistic regression model predicting trust, health literacy was the only significant predictor (OR 2.04, 95% CI 1.11-4.00; P=.03).
Binary logistic regression model predicting trust in the chatbot.
Coefficients | ORa (95% CI) | P value
Intercept | <0.001 (<0.001-1.51) | .07 |
Response complexity (technical language) | 0.80 (0.17-3.58) | .77 |
Chatbot persona (“Doctor”) | 0.86 (0.14-4.95) | .87 |
Chatbot persona (“Nursing Student”) | 1.94 (0.27-17.8) | .52 |
Health literacy score | 2.04 (1.11-4.00) | .03 |
aOR: odds ratio.
The median difference between pretest and posttest scores was an improvement of 4 questions; thus, a median split separated participants into a “high effectiveness” group (improvement of 4 or more; n=37) and a “low effectiveness” group (improvement of less than 4; n=34). In a binary logistic regression predicting effectiveness, response complexity was a significant predictor: participants who received technical language responses were more likely to be in the high effectiveness group (OR 2.73, 95% CI 1.05-7.41; P=.04).
Binary logistic regression model predicting effectiveness of the chatbot.
Coefficients | ORa (95% CI) | P value
Intercept | 0.87 (0.33-2.25) | .76 |
Response complexity (technical language) | 2.73 (1.05-7.41) | .04 |
Chatbot persona (“Doctor”) | 0.84 (0.25-2.72) | .76 |
Chatbot persona (“Nursing Student”) | 0.52 (0.15-1.69) | .28 |
aOR: odds ratio.
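For readers unfamiliar with the odds ratios reported in the tables above, a minimal sketch of how an OR and its Wald 95% CI can be computed from a 2×2 table of condition versus outcome group. The counts below are hypothetical, for illustration only, and are not the study’s data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    The CI is computed on the log-odds scale and exponentiated back.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical counts: 20 of 30 technical-language participants in the
# high effectiveness group vs 15 of 40 nontechnical-language participants.
or_, lower, upper = odds_ratio_ci(20, 10, 15, 25)
```

A CI that excludes 1.0 corresponds to a significant effect at the 5% level, which is how the significant technical-language OR in the table above can be read.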
Analysis of the chatbot conversation transcripts revealed that all 71 participants followed the general knowledge–seeking task. However, elements of how participants interacted with the chatbot varied. Only about half of the participants (35/71, 49.3%) asked at least one question using the singular “I” form, often concerning prevention for themselves (eg, “How can I prevent high blood pressure from occurring?”). Of these participants, most (25/35, 71.4%) asked more than one question using the singular “I” form. Generally, the “I” questions could be answered with generic responses, but occasionally participants would ask questions such as “Am I at risk?” which the chatbot, given its pattern-matching structure, was not able to answer explicitly for each participant. Only one participant asked the chatbot about assisting others: “How can I help someone with high blood pressure?” When participants received an “I don’t know” response from the chatbot, they generally reverted to general knowledge seeking with questions like “What is blood pressure?” or “Who is affected most?”
A handful of participants (5/71, 7%) used scenarios at some point in their dialogue to learn about specific factors that could put them at risk of high blood pressure. The scenarios were generally self-centric, in that the participants wanted to know if their specific life circumstances or choices could affect their blood pressure.
Additionally, the way in which participants interacted with the chatbot’s persona (Doctor, Nurse, or Nursing Student Sarah) varied. When participants first entered the chat, they received a welcome message from Sarah. Only 4 of 71 (5.6%) participants responded with a greeting or addressed Sarah personally (eg, “Hello Nursing Student Sarah, what a strange name. I am Graduate Student (redacted),” or “Hi Sarah!”). One additional participant thanked Sarah at one point in their session (“Thanks for helping me Nurse Sarah”), while 2 others simply said “Thanks” at the very end of the session. Two of the participants who addressed Sarah at the beginning also addressed her again later in the session or made generic conversation-like comments (eg, “You too, Nursing Student Sarah”). Still other participants said things like “Interesting,” “Okay,” and “That’s scary” upon finding information they did not know or were fascinated by.
The way participants used grammar or shorthand in their conversation with the chatbot was also evaluated. Most participants asked questions in a format similar to “What is high blood pressure?” although even these varied greatly in grammar: some participants used capitalization and question marks, whereas others did not. Other participants preferred statements such as “how to prevent blood pressure” or “symptoms of high blood pressure,” and one entry was as simple as “high blood pressure.” Overall, the ways participants formatted their questions and expected to input text and receive corresponding information varied widely, suggesting 2 modes of interaction: conversing with the chatbot or emulating a search engine.
I am 25 year old [sic] and my mother and father both have high blood pressure. What are the odds that I get high blood pressure?
What if I work out but eat unhealthy [sic]
For a young woman age [sic] 18, what is the likelihood of developing high blood pressure?
Has [sic] stress in college aged kids started an increase in hypertension in younger people [sic]
Chatbots are growing in use across the internet, not only for consumer products and websites but also within health care settings. This paper described an exploratory study investigating how the design of a chatbot might impact its perceived trust, usability, and effectiveness in a health information search setting. The chatbot’s language was based on previous health care research demonstrating that patients’ understanding of health information changes with language style and structure.
A key limitation was the relative homogeneity of the participants in this study; participants were of similar ages (18-26 years) and education levels. Although this age range was selected to support a more homogeneous group of participants without direct experience and knowledge associated with blood pressure, it does limit the generalizability of the study. Technical language responses may have been more effective because all of the participants were college students with relatively high health literacy, and thus, simplifying the responses may only have served as a detriment. In other populations with lower health literacy, nontechnical language may be more effective. Future work should more closely reflect the wider population’s ages, experiences, and health literacies in evaluating the usefulness of chatbots in health care applications. Additionally, future work should evaluate how users’ identities and their intersectionality influence their interactions with chatbots to account for potential cultural and other biases that may be embedded in a chatbot’s design.
Health literacy and its impacts on chatbot language, trust, and usability need to be further studied. This study found that health literacy had an impact on trust in the chatbot, which was to be expected based on previous research.
Another limitation is the simple persona used in this chatbot. This persona was not found to significantly impact effectiveness or trust. This may be because the persona used in this study was simple, and therefore, potentially unengaging; it included only a name and title, did not have a picture or other visual stimuli, and did not engage in any personalized dialogue (eg, asking the participant questions). This is supported by the qualitative transcript review, which found that most participants did not acknowledge Sarah (the chatbot’s persona), and few responded to the greeting, addressed Sarah at some other point in the dialogue, or thanked Sarah. Overall, most of the participants did not appear to engage with Sarah beyond its use as a chatbot to deliver information, suggesting that some participants used the chatbot as more of a conventional search engine than a conversational agent. Future studies should examine other ways of representing personas to evaluate whether personas in general are useful in this context. Other representations could include additional visual stimuli like pictures or avatar images. As representations transform into 3D or virtual agents, the required characteristics need to change as well and follow other design patterns.
Neither language nor persona had a significant effect on trust in our study. This could be in part because trust is difficult to measure and quantify.
Lastly, although the experimental setting attempted to replicate a health care website with a chatbot, the setting was a static website with a simulated chatbot. The responses were not truly determined by an artificial agent but were instead accomplished with preconstructed responses resembling a messenger type system via a Wizard of Oz study. This replication may have impacted the results, as the responses were simulated by an experimenter and not by the technology. Since the responses were given by a person, there is a possibility for variability in how the experimenter responded. Along with the experimenter’s possible variability, there was variability in what questions participants asked and how participants asked those questions.
With increased internet use in everyday life, the ways in which people obtain health care information are changing. It is important to continue to develop proper health care websites with information that can be personalized for users based on influential factors, such as age, gender, identity, and health literacy.
Health care chatbots and telehealth medicine are also on the rise, not only over the last decade but particularly in response to the COVID-19 pandemic, with doctors and patients communicating virtually via video, email, and chat. Chatbots may be effective for these particular cases.
OR: odds ratio
PSSUQ: Post-Study System Usability Questionnaire
None declared.