This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication on https://humanfactors.jmir.org, as well as this copyright and license information must be included.
With the rapid advancement of artificial intelligence (AI) technologies, AI-powered chatbots, such as Chat Generative Pretrained Transformer (ChatGPT), have emerged as potential tools for various applications, including health care. However, ChatGPT is not specifically designed for health care purposes, and its use for self-diagnosis raises concerns about the potential risks and benefits of its adoption. Users are increasingly inclined to use ChatGPT for self-diagnosis, necessitating a deeper understanding of the factors driving this trend.
This study aims to investigate the factors influencing users’ perception of decision-making processes and intentions to use ChatGPT for self-diagnosis and to explore the implications of these findings for the safe and effective integration of AI chatbots in health care.
A cross-sectional survey design was used, and data were collected from 607 participants. The relationships between performance expectancy, risk-reward appraisal, decision-making, and intention to use ChatGPT for self-diagnosis were analyzed using partial least squares structural equation modeling (PLS-SEM).
Most respondents were willing to use ChatGPT for self-diagnosis (n=476, 78.4%). The model demonstrated satisfactory explanatory power, accounting for 52.4% of the variance in decision-making and 38.1% in the intent to use ChatGPT for self-diagnosis. The results supported all 3 hypotheses: The higher performance expectancy of ChatGPT (β=.547, 95% CI 0.474-0.620) and positive risk-reward appraisals (β=.245, 95% CI 0.161-0.325) were positively associated with the improved perception of decision-making outcomes among users, and enhanced perception of decision-making processes involving ChatGPT positively impacted users’ intentions to use the technology for self-diagnosis (β=.565, 95% CI 0.498-0.628).
Our research investigated factors influencing users’ intentions to use ChatGPT for self-diagnosis and health-related purposes. Even though the technology is not specifically designed for health care, people are inclined to use ChatGPT in health care contexts. Instead of solely focusing on discouraging its use for health care purposes, we advocate for improving the technology and adapting it for suitable health care applications. Our study highlights the importance of collaboration among AI developers, health care providers, and policy makers in ensuring AI chatbots’ safe and responsible use in health care. By understanding users’ expectations and decision-making processes, we can develop AI chatbots, such as ChatGPT, that are tailored to human needs, providing reliable and verified health information sources. This approach not only enhances health care accessibility but also improves health literacy and awareness. As the field of AI chatbots in health care continues to evolve, future research should explore the long-term effects of using AI chatbots for self-diagnosis and investigate their potential integration with other digital health interventions to optimize patient care and outcomes. In doing so, we can ensure that AI chatbots, including ChatGPT, are designed and implemented to safeguard users’ well-being and support positive health outcomes in health care settings.
The digital age has witnessed an unprecedented surge in technological innovation, shaping the essence of human-computer interaction. As the world progresses toward a future encompassing artificial intelligence (AI), advanced conversational AI models, such as Chat Generative Pretrained Transformer (ChatGPT), a cutting-edge model developed by OpenAI, have come to the forefront of academic discussion. This paradigm-shifting technology has revolutionized our interactions with machines and introduced profound implications across multiple disciplines. By harnessing the power of machine learning, ChatGPT transcends the limitations of traditional chatbots, yielding increasingly humanlike conversational capabilities. The technology has demonstrated remarkable capabilities, such as understanding context, generating coherent text, and adapting to various natural language processing (NLP) tasks, including but not limited to language translation, answering of questions, and text generation [
The literature has demonstrated the potential of AI-based chatbots, such as ChatGPT, to revolutionize patient care and service delivery [
However, ChatGPT is not specifically trained on medical literature. It is therefore crucial to understand the intended purpose of ChatGPT and to acknowledge the negative consequences of using it otherwise. The use of ChatGPT in health care has raised concerns about the accuracy and reliability of the information provided, patient privacy and data security, bias, accountability, and the ethical ramifications of using such powerful language models. A study also emphasized the importance of strong cybersecurity measures to protect patient data and privacy when using ChatGPT in health care settings [
Although ChatGPT has proven to be a remarkable technological achievement, its application in self-diagnosis poses significant risks that must be acknowledged. Many people have turned to the internet for self-diagnosis. Depending on the user’s health literacy, the validity of the source, and the accuracy of information interpretation, web-based self-diagnosis has produced both positive and negative consequences. Like most consumer-facing screen-based technologies, ChatGPT is open to misinterpretation and misuse, necessitating a careful approach to implementation. This matters because misuse (using it for tasks it is not designed for) can erode user trust, resulting in the underuse of the technology [
We must understand that anyone with a computer and an internet connection can use ChatGPT. Individuals with minimal to no health and technology literacy may not realize the limitations of ChatGPT and its intended use. User characteristics, the intricacy of medical information, and the unique nature of individual patient cases underscore the potential for misinterpretation. Inaccurate or incomplete information provided by ChatGPT may result in misguided self-diagnosis or exacerbation of existing conditions. From a cognitive human factors standpoint, misalignment between AI-generated information and users’ mental models can lead to erroneous decision-making and unfavorable health outcomes. The possibility of ChatGPT being misused for self-diagnosis is a significant concern. To counteract this, assessing the potential misuse of ChatGPT from a human factors standpoint is essential.
Addressing the concerns associated with AI chatbots in health care, this study aims to (1) explore users’ intentions to use ChatGPT for self-diagnosis and (2) gain a deeper understanding of the factors influencing their decision-making processes. By providing novel insights into the implications of AI chatbot adoption in health care settings, we intend to inform the development of guidelines, policies, and interventions that promote the responsible and effective use of AI technologies in health care. The originality of this study stems from its focus on users’ decision-making processes and intentions when using AI chatbots for self-diagnosis, an area of research that remains relatively unexplored in the context of health care applications.
As illustrated in
By pinpointing the factors contributing to users’ decision-making processes and intentions to use ChatGPT, regulators and health care professionals can develop informed policies and recommendations to ensure AI’s safe and ethical deployment in health care [
Conceptual framework illustrating the effect of performance expectancy and risk-reward appraisal on the perception of decision-making (directly) and intent to use ChatGPT for self-diagnosis (indirectly). ChatGPT: Chat Generative Pretrained Transformer; H: hypothesis.
The conceptual framework (see
In this study, performance expectancy is operationally defined as the extent to which an individual anticipates that using ChatGPT will augment their capacity to accomplish tasks, attain objectives, and alleviate workload proficiently and efficiently. This latent construct encapsulates the user’s perceptions regarding the advantages, effectiveness, and overall satisfaction derived from their interaction with ChatGPT.
The construct of decision-making was operationally defined as the extent to which an individual perceives ChatGPT as a valuable tool for assisting them in making informed, timely, and effective choices by providing relevant recommendations and insights. This latent construct encompasses the user’s belief in ChatGPT’s ability to contribute positively to their decision-making process and their willingness to act on the recommendations generated by the technology.
Similarly, the risk-reward appraisal construct can be operationally defined as the extent to which an individual perceives the advantages of using ChatGPT as surpassing any potential adverse consequences or risks associated with its use. This latent construct captures the user’s evaluation of the trade-offs between the positive outcomes derived from ChatGPT and the potential hazards or drawbacks that may arise from its implementation.
The following are the hypotheses tested in this study.
The higher performance expectancy of ChatGPT is positively associated with improved user decision-making outcomes. Hypothesis 1 (H1) is grounded in established theories, such as the technology acceptance model (TAM) and the unified theory of acceptance and use of technology (UTAUT) [
Although this is the first study to explore the impact of the perceived effectiveness of ChatGPT on decision-making, H1 can be justified by drawing on several studies that have explored the relationship between performance expectancy and technology acceptance, usage, or decision-making in other domains. For instance, Al-Emran et al [
A positive risk-reward appraisal of ChatGPT is associated with enhanced user decision-making outcomes. Hypothesis 2 (H2) is based on established psychological and decision-making theories, such as prospect theory and protection motivation theory (PMT), as well as the concept of trust in technology [
Prospect theory posits that individuals evaluate potential gains and losses during decision-making processes and that their choices are influenced by the perceived risks and rewards associated with each alternative [
In technology adoption, risk perception can significantly affect decision-making. Although there may not be studies directly examining the relationship between risk-reward appraisal and decision-making in the context of ChatGPT, several studies have explored the impact of risk perception and trust in technology on decision-making and technology adoption in other domains. For instance, a study examined the interplay between trust, perceived risk, and TAM in the context of consumer acceptance of electronic commerce (e-commerce) [
A positive perception of ChatGPT’s role in enhancing decision-making processes is associated with an increased intention among users to use the technology for self-diagnosis. Hypothesis 3 (H3) can be substantiated by drawing upon well-established theories, such as TAM, UTAUT, and research on trust in technology [
The study, classified as a flex protocol type, received approval from the Institutional Review Board (IRB) of West Virginia University (IRB protocol number 2302725983). Informed consent was obtained from participants. The data gathered through the online survey were securely stored on Centiment’s platform and remained accessible exclusively to the research team.
Statement 1: “ChatGPT can help me achieve my goals.” This item gauges the user’s conviction regarding ChatGPT’s capability to facilitate the attainment of their desired objectives within the context of their tasks.
Statement 2: “ChatGPT can reduce my workload.” This item appraises the user’s perception of ChatGPT’s potential to mitigate the burden of task completion by streamlining processes and increasing efficiency.
Statement 3: “I was successful in achieving what I wanted to accomplish with ChatGPT.” This item measures the user’s perception of the degree to which their interaction with ChatGPT has led to the realization of intended outcomes, reflecting the efficacy of the technology in practical applications.
Statement 4: “I am satisfied with ChatGPT.” This item examines the user’s overall contentment with the performance of ChatGPT, capturing their appraisal of its utility and effectiveness in addressing their needs and expectations.
In addition, 2 statements were developed to form the latent construct
Statement 1: “ChatGPT helps me make informed and timely decisions.” This item evaluates the user’s perception of ChatGPT’s capacity to provide pertinent information, insights, and guidance, which in turn facilitates well-informed and timely decision-making processes.
Statement 2: “I am willing to make decisions based on the recommendations provided by ChatGPT.” This item measures the user’s trust in the recommendations offered by ChatGPT and their readiness to incorporate those suggestions into their decision-making processes.
Statements used in the survey.
Factor | Questions
Performance expectancy (PE) | To what extent do you agree with the following: ChatGPTa can help me achieve my goals (PE1); ChatGPT can reduce my workload (PE2); I was successful in achieving what I wanted to accomplish with ChatGPT (PE3); I am satisfied with ChatGPT (PE4).
Perception of decision-making (DM) | To what extent do you agree with the following: ChatGPT helps me make informed and timely decisions (DM1); I am willing to make decisions based on the recommendations provided by ChatGPT (DM2).
Risk-reward appraisal (RRA) | To what extent do you agree with the following: The benefits of using ChatGPT outweigh any potential risks (RRA).
Intent to use (IU) | To what extent do you agree with the following: I am willing to use ChatGPT for self-diagnosis purposes (IU).
aChatGPT: Chat Generative Pretrained Transformer.
Furthermore, 1 statement measured the extent to which an individual perceives the advantages of using ChatGPT as surpassing any potential adverse consequences or risks associated with its use: “The benefits of using ChatGPT outweigh any potential risks.”
Lastly, users’ willingness to use ChatGPT for self-diagnosis was captured using 1 statement: “I am willing to use ChatGPT for self-diagnosis purposes.”
All the items were assessed using a 4-point Likert scale, allowing participants to indicate their level of agreement with each statement, ranging from “strongly disagree” to “strongly agree.”
Note that we used a forced Likert scale in this study. By omitting a neutral midpoint option, a forced Likert scale requires respondents to articulate a definitive opinion or preference, thereby generating data that are more decisive and unequivocal [
The data collection for this study took place in February 2023, using an online survey administered through Centiment, a reputable service provider for survey deployment and data gathering [
Upon obtaining informed consent from participants, they were directed to the survey, which contained questions designed to measure the constructs under investigation. The survey used forced 4-point Likert scale questions to elicit decisive responses from participants, thus reducing potential biases. Additionally, the survey incorporated a checking question to verify that respondents thoroughly read all questions before providing their answers, further ensuring data quality. Upon completing the data collection process, the team meticulously reviewed and processed the data to ascertain their quality and accuracy before advancing to subsequent data analysis.
The data analysis for this study consisted of 2 primary stages: descriptive statistics and partial least squares structural equation modeling (PLS-SEM). Descriptive statistics were calculated for all survey questions to provide an overview of the responses’ central tendency, dispersion, and distribution. These statistics offered initial insights into the participants’ attitudes and perceptions regarding the constructs under investigation.
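As a brief illustration, the descriptive statistics reported in the Results (mean, SD, and kurtosis for each 4-point Likert item) can be computed with a short stdlib-only script. The responses below are hypothetical, not the study's raw data:

```python
import statistics

def describe(responses):
    """Mean, sample SD, and excess kurtosis for 4-point Likert responses.

    Illustrative sketch; the input list here is hypothetical.
    """
    n = len(responses)
    mean = statistics.fmean(responses)
    sd = statistics.stdev(responses)  # sample SD (n - 1 denominator)
    # Excess kurtosis (Fisher definition): fourth standardized moment minus 3
    m2 = sum((x - mean) ** 2 for x in responses) / n
    m4 = sum((x - mean) ** 4 for x in responses) / n
    kurtosis = m4 / m2 ** 2 - 3
    return mean, sd, kurtosis

# Hypothetical responses coded 1 = strongly disagree ... 4 = strongly agree
sample = [4, 3, 3, 4, 2, 3, 4, 3, 1, 3]
mean, sd, kurt = describe(sample)
print(f"mean={mean:.2f}, SD={sd:.2f}, excess kurtosis={kurt:.2f}")
```

Note that negative excess kurtosis (as for the intent-to-use item in the Results) indicates a flatter-than-normal response distribution.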
Following the descriptive analysis, the research team used PLS-SEM to examine the relationships between the latent constructs. PLS-SEM is a powerful multivariate analysis technique that allows researchers to estimate complex cause-effect relationships between latent constructs and their indicators [
After confirming the measurement model’s adequacy, we evaluated the structural model to test our hypotheses. This analysis included assessing the path coefficients, significance levels, and determination coefficients (
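The 95% CIs reported for the path coefficients come from resampling: PLS-SEM software typically bootstraps the sample and takes percentiles of the re-estimated coefficients. The sketch below shows that scheme for a single standardized path using synthetic data (the variable names and data are illustrative, not the study's):

```python
import random
import statistics

def std_path(x, y):
    """Standardized path coefficient for a single predictor
    (equals the Pearson correlation of x and y)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the standardized path coefficient."""
    rng = random.Random(seed)
    n = len(x)
    boots = []
    for _ in range(n_boot):
        # Resample cases with replacement and re-estimate the path
        idx = [rng.randrange(n) for _ in range(n)]
        boots.append(std_path([x[i] for i in idx], [y[i] for i in idx]))
    boots.sort()
    return boots[int(alpha / 2 * n_boot)], boots[int((1 - alpha / 2) * n_boot) - 1]

# Synthetic standardized scores (illustrative only, not the study's data)
gen = random.Random(0)
x = [gen.gauss(0, 1) for _ in range(100)]
y = [0.6 * xi + gen.gauss(0, 0.8) for xi in x]
lo, hi = bootstrap_ci(x, y)
print(f"path = {std_path(x, y):.3f}, 95% CI {lo:.3f}-{hi:.3f}")
```

In the full model, each structural path is re-estimated on every bootstrap sample in the same way, which is how the CIs in the effects table are obtained.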
A total of 607 individuals participated in the study, providing comprehensive responses to the questionnaire.
Our study investigated the relationships between performance expectancy, risk-reward appraisal, decision-making, and the intent to use ChatGPT for self-diagnosis. The
Regarding reliability, the performance expectancy construct had a Cronbach α coefficient of .783, a rhoC of 0.860, and an AVE of 0.606. The decision-making construct exhibited a Cronbach α coefficient of .668, a rhoC of 0.858, and an AVE of 0.751. The rhoA values were similar to Cronbach α values for each construct, further supporting the reliability of the constructs. Moreover, the AVE values for all constructs exceeded the recommended threshold of 0.5, indicating adequate convergent validity.
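The reliability metrics above follow standard formulas: Cronbach α from the item covariances, and composite reliability (rhoC) and AVE from the standardized outer loadings. A minimal sketch, using hypothetical loadings rather than the study's actual estimates:

```python
import statistics

def cronbach_alpha(items):
    """Cronbach alpha; items is a list of per-item response lists."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]
    item_var_sum = sum(statistics.pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / statistics.pvariance(totals))

def composite_reliability(loadings):
    """rhoC from standardized outer loadings: (sum L)^2 / ((sum L)^2 + sum(1 - L^2))."""
    s = sum(loadings)
    err = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + err)

def ave(loadings):
    """Average variance extracted: mean squared standardized loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical standardized loadings for a 4-item construct (not the study's)
lam = [0.78, 0.80, 0.75, 0.78]
print(f"rhoC = {composite_reliability(lam):.3f}, AVE = {ave(lam):.3f}")
```

The AVE threshold of 0.5 mentioned above corresponds to the construct explaining, on average, at least half of its indicators' variance.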
Descriptive statistics of study variables.
Question | Mean (SD) | Kurtosis |
PE1a | 3.24 (0.77) | 0.694 |
PE2 | 3.22 (0.78) | 0.285 |
PE3 | 3.20 (0.74) | 0.290 |
PE4 | 3.24 (0.76) | 0.403 |
DM1b | 3.25 (0.78) | 0.633 |
DM2 | 3.13 (0.81) | 0.028 |
RRAc | 3.20 (0.80) | 0.371 |
IUd | 3.09 (0.85) | –0.180 |
aPE: performance expectancy.
bDM: decision-making.
cRRA: risk-reward appraisal.
dIU: intent to use.
Illustration of the proportion of ChatGPT use frequency, respondents’ purpose of using ChatGPT, their familiarity with the technology, their perception of ChatGPT’s persuasiveness, and their education level. ChatGPT: Chat Generative Pretrained Transformer.
Our findings provide empirical support for all 3 hypotheses. We discovered that higher performance expectancy and positive risk-reward appraisals of ChatGPT are positively associated with improved decision-making outcomes among users. Additionally, enhanced decision-making processes involving ChatGPT positively impact users’ intention to use the technology for self-diagnosis. The results, including standardized β coefficients, SDs,
PLS-SEM analysis results elucidated the significant associations among the study variables. The direct effects indicated a significant positive association between performance expectancy and decision-making (β=.547, 95% CI 0.474-0.620), between risk-reward appraisal and decision-making (β=.245, 95% CI 0.161-0.325), and between decision-making and intent to use (β=.565, 95% CI 0.498-0.628).
The “total effects of study variables” section of
Standardized direct, indirect, and total effects.
Effects | β (SD) | t value | 95% CI
Direct effects
Performance expectancy → decision-making | .547 (.037) | 14.715 | 0.474-0.620
Risk-reward appraisal → decision-making | .245 (.042) | 5.850 | 0.161-0.325
Decision-making → intent to use | .565 (.033) | 16.928 | 0.498-0.628
Indirect effects
Performance expectancy → decision-making → intent to use | .309 (.028) | 10.911 | 0.255-0.366
Risk-reward appraisal → decision-making → intent to use | .138 (.026) | 5.191 | 0.087-0.191
Total effects
Performance expectancy → decision-making | .547 (.037) | N/Aa | 0.474-0.620
Performance expectancy → intent to use | .309 (.028) | N/A | 0.255-0.366
Risk-reward appraisal → decision-making | .245 (.042) | N/A | 0.161-0.325
Risk-reward appraisal → intent to use | .138 (.027) | N/A | 0.087-0.191
aN/A: not applicable.
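As a consistency check on the table above, each indirect effect in a standardized path model equals the product of its component direct paths:

```python
# Standardized direct paths taken from the effects table
pe_dm = 0.547   # performance expectancy -> decision-making
rra_dm = 0.245  # risk-reward appraisal -> decision-making
dm_iu = 0.565   # decision-making -> intent to use

# Indirect (mediated) effects on intent to use
pe_iu_indirect = pe_dm * dm_iu    # matches the reported .309
rra_iu_indirect = rra_dm * dm_iu  # matches the reported .138
print(round(pe_iu_indirect, 3), round(rra_iu_indirect, 3))  # 0.309 0.138
```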
We acknowledge that conversational AI systems, such as ChatGPT, can be crucial in health care by providing numerous possibilities to elevate patient care, optimize medical workflows, and augment the overall health care experience. In this study, we highlighted specific obstacles that must be tackled for secure and efficient implementation of ChatGPT. Although our study is the first to investigate the effects of perceived ChatGPT effectiveness and risk-reward appraisal on decision-making, its validity is supported by numerous studies examining the relationships among performance expectancy, technology acceptance, usage, risk-reward appraisal, and decision-making in other domains.
The findings of our study contribute to the growing body of literature on AI chatbots in health care and their potential applications, particularly in the context of self-diagnosis. In recent years, research has increasingly focused on developing and evaluating AI chatbots for various health care purposes [
Our research builds on earlier studies investigating the factors affecting the adoption of AI chatbots in health care [
By focusing on ChatGPT, our study contributes to the broader conversation on AI chatbots’ ethical and societal implications in health care. The increasing popularity of AI chatbots, such as ChatGPT, for self-diagnosis raises important questions about the responsibilities of AI developers, health care providers, and policy makers in ensuring such technologies’ safe and responsible use. Our findings highlight the need for a collaborative, interdisciplinary approach to addressing these challenges, involving stakeholders from various sectors, including AI development, health care, policy, and ethics.
The implications of our findings from a policy and pragmatic standpoint suggest a need for proactive preparation and policy alteration concerning the use of ChatGPT for self-diagnosis in health care.
Human behavior has consistently demonstrated a tendency to repurpose technology for purposes beyond its original design, even when aware of the potential risks or drawbacks. In the context of our study, people are inclined to use ChatGPT, a technology not specifically designed for health care applications, for self-diagnosis, as they perceive it to be useful and easy to use. Similarly, as evidenced by our prior study on internet use and mental health, people often turn to online sources for self-diagnosis and health information, despite the potential negative impact on mental health [
In both cases, people’s behavior can be understood by observing the balance between perceived benefits, ease of use, and potential risks. The desire to reduce uncertainty and the convenience of technology may outweigh the awareness of potential drawbacks or misuse. This highlights the need to develop and regulate technologies like ChatGPT and online health information sources to meet health care applications’ unique requirements and ethical considerations, ensuring that they are user-friendly and trustworthy and minimize negative impacts on users’ health and well-being.
First, policy makers and health care stakeholders should collaborate to establish guidelines and ethical standards for using ChatGPT in health care settings [
Second, tailoring the chatbot to better address medical inquiries and concerns would give users more reliable and valuable information to inform their decision-making processes. In addition, incorporating evidence-based medicine, reliable sources, and expert opinions into the chatbot’s knowledge base can further improve its credibility and usefulness in health care [
Third, educating and informing users about the appropriate use of ChatGPT for self-diagnosis and its limitations are essential. Public health campaigns and educational materials should emphasize the importance of consulting health care professionals for accurate diagnosis and treatment, while highlighting the potential benefits of using chatbots as an adjunct tool for health information and decision-making support. A feedback mechanism could be proposed to ensure shared understanding and improve user awareness. This mechanism would involve users providing feedback on their experience with ChatGPT, including the accuracy and relevance of the information received and any concerns or misconceptions they may have encountered. Health care professionals could also be involved in this process, sharing their perspectives on the chatbot’s performance and suggesting improvements to enhance its reliability and user-friendliness. The feedback collected would then be used to refine ChatGPT’s algorithms, knowledge base, and user interface, ensuring it remains current with the latest medical knowledge and best practices. This iterative process would foster continuous improvement of the chatbot’s performance and promote greater awareness and understanding among users about the appropriate use of ChatGPT and its limitations in the context of self-diagnosis. Additionally, educational resources, such as tutorials and guidelines, could supplement the feedback mechanism to guide users in interacting with ChatGPT effectively and responsibly. By implementing a feedback mechanism and providing educational support, users can better perceive ChatGPT’s capabilities and limitations, ultimately promoting responsible and effective use of AI chatbots in health care settings.
Lastly, continuous monitoring and evaluation of ChatGPT’s use in health care should be conducted to assess its impact on health care outcomes and decision-making. This will enable policy makers and health care providers to make informed decisions about the potential benefits, risks, and practical applications of ChatGPT in health care settings.
Our study has limitations that warrant consideration. First, we did not control for potential confounding factors, such as age, medical condition, health literacy, previous experience with comparable technologies, or demographic characteristics, which might significantly influence users’ intentions to use ChatGPT for self-diagnosis. The results among younger and healthier populations could differ substantially from those among older populations with existing medical conditions. Younger individuals may be more inclined to use AI chatbots because of their familiarity with technology, whereas older individuals or those with medical conditions may seek additional reassurance or support for managing their health concerns.
Second, the cross-sectional survey design constrained our capacity to examine the evolving nature of users’ interactions with AI chatbots. Moreover, relying on self-reported measurements may introduce various biases, including social desirability, recall, or imprecise reporting. Self-report measures obtained through surveys inherently capture users’ perceptions rather than objective reality. Although the participants’ subjective experiences can provide valuable insights, there may be discrepancies between these perceptions and the actual situation. Furthermore, the cross-sectional design of the study limited our ability to draw causal inferences over time. Future research could use a triangulation approach to mitigate these limitations, incorporating objective measures and longitudinal data collection to provide a more comprehensive understanding of the phenomenon under investigation. Lastly, focusing on ChatGPT, which is not specifically intended for health care applications, may limit the applicability of the findings to other AI chatbots explicitly designed for health care purposes.
To address these limitations, future research should consider using longitudinal data, stratifying the sample by age group and medical condition, and accounting for potential confounding factors, such as participants’ familiarity with AI technology, prior experiences with chatbots, and demographic information. Various methodologies could provide additional insights, including monitoring chatbot usage and conducting qualitative interviews to assess trust and user behavior. Enhancing the data collection frequency and guaranteeing participant anonymity may also help reduce biases. By addressing these constraints, future research can contribute to a more comprehensive understanding of AI chatbot adoption in health care settings and enable more targeted interventions to optimize patient care and outcomes across diverse populations and health statuses.
In conclusion, our research investigated the factors influencing users’ intentions to use ChatGPT for self-diagnosis, a purpose for which the technology is not specifically designed. The study aimed to explore the implications of these factors for the safe and effective integration of AI chatbots in health care settings. By examining performance expectancy, risk-reward appraisal, and decision-making processes, our findings contribute to the growing body of literature on AI chatbots in health care and provide insights into AI chatbot adoption in health care contexts.
The clinical message of this study is to emphasize the importance of ongoing collaboration among AI developers, health care providers, and policy makers in ensuring the safe and responsible use of AI chatbots in health care. Addressing users’ expectations, risk-reward appraisal, and decision-making processes can help develop AI chatbots tailored to human needs and preferences, providing consumers with reliable and verified sources for health-related information. This approach can not only enhance health care accessibility but also improve health literacy and awareness among the public.
As the field of AI chatbots in health care continues to evolve, future research should further investigate the long-term effects of using AI chatbots for self-diagnosis and explore the potential integration of AI chatbots with other digital health interventions to optimize patient care and outcomes. In doing so, we can better understand the implications of AI chatbot usage in health care settings and ensure that these technologies are designed and implemented to safeguard users’ well-being and support positive health outcomes.
AI: artificial intelligence
AVE: average variance extracted
ChatGPT: Chat Generative Pretrained Transformer
e-commerce: electronic commerce
PLS-SEM: partial least squares structural equation modeling
PMT: protection motivation theory
TAM: technology acceptance model
UTAUT: unified theory of acceptance and use of technology
AC, the lead researcher, was responsible for the study’s conceptualization, the survey’s development, figure illustration, data collection, and manuscript writing. YS, the student author, contributed to the study by conducting data analysis and providing input during manuscript writing. Both authors equally collaborated throughout the research process, ensuring the rigor and accuracy of the study, and approved the final version of the manuscript for submission.
None declared.