Abstract
Background: The exponential growth in computing power and the increasing digitization of information have substantially advanced the machine learning (ML) research field. However, ML algorithms are often considered “black boxes,” and this fosters distrust. In medical domains, in which mistakes can result in fatal outcomes, practitioners may be especially reluctant to trust ML algorithms.
Objective: The aim of this study is to explore the effect of user-interface design features on intensivists’ trust in an ML-based clinical decision support system.
Methods: A total of 47 physicians from critical care specialties were presented with 3 patient cases of bacteremia in the setting of an ML-based simulation system. Three conditions of the simulation were tested according to combinations of information relevancy and interactivity. Participants’ trust in the system was assessed by their agreement with the system’s prediction and a postexperiment questionnaire. Linear regression models were applied to measure the effects.
Results: Participants’ agreement with the system’s prediction did not differ according to the experimental conditions. However, in the postexperiment questionnaire, higher information relevancy ratings and interactivity ratings were associated with higher perceived trust in the system (P<.001 for both). The explicit visual presentation of the features of the ML algorithm on the user interface resulted in lower trust among the participants (P=.05).
Conclusions: Information relevancy and interactivity features should be considered in the design of the user interface of ML-based clinical decision support systems to enhance intensivists’ trust. This study sheds light on the connection between information relevancy, interactivity, and trust in human-ML interaction, specifically in the intensive care unit environment.
doi:10.2196/56924
Introduction
Overview
In the intensive care unit (ICU), intensivists make an extremely high number of decisions. For example, McKenzie et al [ ] found that approximately 100 decisions are made every morning round. According to Ward et al [ ], despite the continual increase in the number of ICUs, the number of intensivists remains about the same, resulting in an extremely high workload. The high rate of decision-making, together with the continuous overload, prompts the need for decision support tools.

Although machine learning (ML) algorithms and systems serving the medical community are continually increasing, their adoption into routine health care practice is not guaranteed [ ]. One reason is the complexity of the algorithms, which often leads to clinicians' lack of trust in such systems [ ]. A multidisciplinary approach may enhance trust by considering the human factor, the technological aspect, and the interaction between them [ ]. This study examined 2 human-automation interaction features that emphasize the importance of the human factor in the design of ML-based clinical decision support systems (CDSSs).

Clinical Decision Support Systems
To date, many CDSSs are categorized as "expert systems"—systems that try to imitate the way an ideal physician would think. These systems generate conclusions based on sets of rules [ ]. In contrast, ML algorithms approach problems in the opposite way—they generate rules from historical data [ , ]. ML algorithms are currently being developed in almost every field of medicine and, in many instances, are already providing equal or even greater accuracy than physicians (eg, [ - ]). However, though ML CDSSs can enhance the quality of care, the adoption of such systems in all medical fields, and specifically in critical care, remains low [ ].

In contrast to expert systems, ML algorithms are complex, and understanding and explaining the reasoning underlying them is often impossible [ ]. Thus, ML algorithms are frequently considered black box algorithms. This fosters physicians' distrust and skepticism of ML systems [ ] and has been suggested as a major cause of the low rates of adoption and acceptance of these systems within the medical community [ ]. Wrong decisions made by intensivists can result in severe and even fatal outcomes. Thus, they may be reluctant to share their decision-making responsibilities with black box CDSSs that they do not understand [ ].

Interpretable ML
As ML algorithms are developed to serve humans, human interaction with them must be considered. One approach to move from a "black box" to a "clear box" [ ] lies in the growing field of interpretable ML [ - ]. Miller [ ] offered an approach that combines artificial intelligence, social science, and human-computer interaction (HCI). He referred to "human-agent interaction" as the intersection of these 3 domains, including it as part of the interpretable ML field. Impressive work has been performed on interpretable ML in the HCI community (eg, [ - ]). Unfortunately, the ML community and the HCI community do not always work together [ ]. This results in poor usability of many interpretable ML algorithms [ ], yet it opens an opportunity for HCI and interaction design researchers to seek means of enhancing trust in ML CDSSs [ ].

Human-Automation Trust
Parasuraman and Riley [ ] defined automation as a technology that executes "a function that was previously carried out by a human." This wide definition covers all kinds of machines, computers, and applications of artificial intelligence. Human-automation trust is a well-studied subject (eg, [ - ]). In the context of human social interactions, trust can be defined as "the willingness to be vulnerable to the actions of another person" [ ]. Research has shown that humans perceive computers as social actors and may interact with them as they would with each other [ - ]. The interaction between humans and automated systems, or, in the context of this study, intensivists and black box algorithms, has also been shown to be substantially influenced by trust [ ].

Although human-automation trust is being researched by many disciplines, no dominant model or approach has been established for its measurement. However, a well-accepted conclusion is that trust is not a standalone construct, but rather multidimensional [ ]. In this study, we used the definition of Lee and See [ ] for human-automation trust: "an attitude that an agent will help achieve an individual's goals in a situation characterized by uncertainty and vulnerability." This definition corresponds well with the interaction between intensivists and ML CDSSs, as the ICU environment is characterized by high levels of both uncertainty and vulnerability.

According to Madsen and Gregor [ ], human-computer trust comprises 2 main dimensions—cognition-based trust (CBT) and affect-based trust (ABT). CBT is based on the user's intellectual perceptions of the system's characteristics, while ABT is based on the user's emotional responses to the system. The 2 dimensions can be further subdivided: CBT comprises the understandability of the system and the technical competence of the system, whereas ABT comprises faith, personal attachment, and reliability. Madsen and Gregor [ ] note that reliability was also found to influence CBT, although its influence on ABT is stronger. The researchers suggested a questionnaire for measuring trust, which we implemented in this study ( ).

Aim
The primary aim of this study was to investigate the influence of user interface (UI) design elements on intensivists' trust in ML-based CDSSs ("black-box"–based algorithms). Of the many UI elements that can be modified, the 2 chosen for comparison were information relevancy and interface interactivity.
The literature is abundant regarding information relevancy, interactivity, and trust, as well as the influence of the 2 former factors on the latter. However, to the best of our knowledge, no research has assessed connections between information relevancy, interactivity, and trust in the context of human-ML interaction, specifically in the context of the ICU environment.
Hypothesis 1: Information Relevancy
Information relevancy concerns the degree to which users perceive that the information content of a system meets their needs [ ]. This factor was found to positively influence user satisfaction with websites [ , ] and users' trust in health infomediaries [ ]. Relevant information has been found to be an attribute that is more crucial for users than usability and convenient use of the system [ ]. Considering the above, our hypothesis is as follows:

- Higher levels of information relevancy will lead to higher levels of trust in the system. [H1]
Hypothesis 2: Interface Interactivity
Interactivity can be defined in various ways. For this study, we used a common definition by Steuer [ ]—"the extent to which users can participate in modifying the format and content of a mediated environment in real time." Interactivity is considered to strongly influence users' experiences during the interaction [ ] and is key to the success of e-commerce websites [ - ]. Interactivity was found to increase users' trust in websites in general, and specifically in e-commerce and mobile commerce [ , ], as well as brand loyalty [ ]. Although most of the literature on interactivity has focused on e-commerce trust and intentions to use websites, we expected greater interface interactivity to positively influence the interaction between ML CDSSs and intensivists, and to enhance their trust. Considering the above, our hypothesis is as follows:

- Higher levels of interface interactivity will lead to higher levels of trust in the system. [H2]
Methods
Overview
To test the hypotheses, a laboratory experiment with 3 conditions was designed. This enabled testing the effects of information relevancy and interactivity on intensivists' trust in a simulated ML-based bacteremia prediction system. Bacteremia is a common phenomenon in ICUs that clinicians need to identify and respond to [ ]. Thus, a decision support system that assists clinicians in identifying this condition can serve as a good reference for generalizing and deriving implications for the UI design of many ML-based CDSSs. Each experimental condition was characterized by a different UI design. The effects were measured with both a behavioral measure (the participants' decisions, which were captured by the simulation software) and a postexperiment questionnaire that captured their perceived understanding of the system.

Participants
The participants were 47 physicians (female: n=14; male: n=33) from critical care specialties at 5 tertiary hospitals in Israel. They were recruited through a convenience sample of on-duty physicians and were free to withdraw from the study at any time. The experiment was conducted over 1 month, between the first and second COVID-19 lockdowns in Israel. All the participants were compensated with a gift card (US $15), and no exclusion criteria were applied other than being a critical care physician.
Ethical Considerations
This research complied with the American Psychological Association Code of Ethics and was approved by the institutional review board at Ben-Gurion University of the Negev (21-12-19). Informed consent was obtained from each participant.
Experimental Design
To test the hypotheses, a 2×2 (relevant/nonrelevant × interactive/noninteractive) between-subjects fractional factorial experiment was designed. The experiment included 3 conditions (as shown in the table below). Participants were randomly assigned to 1 of the 3 conditions (15‐16 participants per condition); the duration of their performance was not limited. A total of 3 clinical cases of patients who were hospitalized in an ICU with medical conditions implying bacteremia onset were extracted. The presentations of these cases were designed by 3 experienced intensivists to provide accurate context.

Information relevancy | Noninteractive | Interactive
Nonrelevant information | 1 | —a
Relevant information | 2 | 3

aNot tested.
Apparatus and Stimuli
A total of 3 UIs, 1 for each experimental condition, were designed using Axure RP software (version 9.1; Axure Software Solutions, Inc). The interfaces imitated an ML-based bacteremia prediction system. The system, which at the time of the study was still in its development stage, provides a prediction and a list of the main features that were significant for the prediction algorithm. The right section of all the interfaces presented similar time-series charts. The charts included trends over time for the 10 clinical measures that are most related to bacteremia prediction. The information that was presented in the left section was manipulated to match the 3 conditions. An example of an interface (condition 2) is shown in the accompanying figures.

The information relevancy level was set by the type of clinical measurements that were presented in a table in the left section of the interface. For the relevant information conditions, the information presented in the table comprised the current values of the same clinical measures that clinicians usually use to assess a patient's condition. In addition, the normal range of each measure was presented. In the nonrelevant information condition, the information presented in the table comprised the values of the 10 features that were ranked as most important by the bacteremia ML prediction algorithm for making the prediction. Although these features were most significant for the prediction algorithm, they were not usually used by clinicians and, therefore, were considered nonrelevant.

The interface interactivity level was set by the type of interaction that the participants had with the UI. In the interactive condition, the participants were required to enter values of the patient's current clinical measures (the values provided in the written clinical case) before they could explore the other charts and information. Entering and copying values to and from the patient record is a common task clinicians perform in a subset of the IT systems in the ICU. In the noninteractive conditions, the information about the patients appeared right away, and the participants could only explore the information and ask the system for its prediction.

The fourth combination, nonrelevant information and interactivity, was not tested because, in the nonrelevant information condition, the information that was presented consisted of the features of the algorithm. Thus, including the algorithm features in the clinical case and entering them into the UI would have seemed unrealistic.
Procedure
The participants were introduced to the purpose of the study and received an explanation about the ML bacteremia prediction system. They were then introduced to the simulation software, with the UI corresponding to the condition to which they were assigned. The participants were asked to first read the clinical case and only then to explore the UI. After exploring the UI, they could click on the "calculate algorithm result" button to receive the algorithm's prediction. The predictions that were presented to the participants were accurate. Participants in the interactive condition had to enter the values of the patient's current clinical measures before the system calculated the algorithm result. All the participants were asked to handle the information as though they were taking the described patient under their care, with the information provided being all that was available to them.
After the algorithm presented its prediction, the participants could continue to explore the UI and the information presented, and then indicate whether or not they agreed with the algorithm's prediction. After answering this question, they proceeded through the same procedure with 2 additional clinical cases. To avoid order bias, the presentation order of the cases was counterbalanced across participants. The number of times participants agreed with the system's prediction represents their reaction to the system.
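To illustrate the counterbalancing idea, the sketch below rotates the presentation order of the 3 clinical cases across participants; the case labels, participant count, and rotation scheme are illustrative assumptions rather than the study's actual assignment procedure.

```python
from itertools import permutations

cases = ["case_A", "case_B", "case_C"]  # hypothetical case labels

# All 6 possible orders of the 3 cases; cycling through them across
# participants balances the position in which each case is presented.
orders = list(permutations(cases))
for participant_id in range(12):  # hypothetical number of participants
    order = orders[participant_id % len(orders)]
    print(participant_id, order)
```

Any scheme that presents each case in each serial position about equally often would serve the same purpose of reducing order bias.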
Postexperiment Questionnaires
After completing the 3 clinical cases, the participants answered 2 demographic questions about their experience and gender and 3 questionnaires about their trust in the system, the interactivity of the system, and the information relevancy of the system. The postexperiment questionnaires measured perceived understanding of the system. These consisted of the AIMQ (AIM quality) questionnaire [ ] to measure information relevancy, 7 items from an interactivity questionnaire [ ] that assessed interactivity, and 14 items from a questionnaire that assessed trust [ ]. The latter questionnaire covered the CBT subdimensions of understandability, technical competence, and reliability from the human-computer trust questionnaire. All the questionnaires used a 7-point Likert scale (1=low and 7=high). See the table below for the entire list of variables. To control for possible variance, the gender and years of experience of the participants were recorded. These analyses were performed because studies have shown a significant impact of gender [ , , ] and years of experience [ , ] on the interaction of humans with automation, and a consequent influence on the development of human-automation trust. The questionnaire items are provided in the multimedia appendix.

Construct | Scale | How it was measured
Years of experience | Continuous | Demographics
Gender | Nominal | Demographics
UI level of information relevancy | Binary | By design
UI level of interactivity | Binary | By design
Information relevancy rating | Discrete (1-7) | AIMQ questionnaire [ ]
Interactivity rating | Discrete (1-7) | McMillan and Hwang [ ]
Understandability | Discrete (1-7) | HCT; Madsen and Gregor [ ]
Technical competence | Discrete (1-7) | HCT; Madsen and Gregor [ ]
Reliability | Discrete (1-7) | HCT; Madsen and Gregor [ ]
Cognition-based trust | Discrete (1-7) | HCT; Madsen and Gregor [ ]
Agreement with the system | Discrete (0‐3) | Simulation software
UI: user interface.
AIMQ: AIM quality.
HCT: human-computer trust.
Data Analysis
To measure the participants’ immediate reaction to the system, they were grouped by the number of times they agreed with the system’s prediction, and these groups were compared on their information relevancy ratings. Because of the unequal group sizes, the Welch test was used for these comparisons.
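As a rough illustration of this comparison, Welch's t test can be computed with scipy, which drops the equal-variance assumption of the standard t test; the data frame, column names, and values below are hypothetical and are not the study's analysis code.

```python
import pandas as pd
from scipy import stats

# Hypothetical data: one row per participant, with the number of times the
# participant agreed with the system (0-3) and their mean information
# relevancy rating (1-7 Likert scale).
df = pd.DataFrame({
    "agreements":       [3, 3, 0, 3, 0, 3, 0, 3, 3, 0, 3, 0, 3],
    "relevancy_rating": [6.1, 5.8, 3.2, 6.5, 4.0, 5.9, 3.8, 6.2, 6.7, 4.4, 5.5, 3.6, 6.0],
})

# Compare those who agreed with all 3 predictions against those who never
# agreed; equal_var=False requests Welch's t test, which tolerates unequal
# group sizes and variances.
agreed_all = df.loc[df["agreements"] == 3, "relevancy_rating"]
agreed_none = df.loc[df["agreements"] == 0, "relevancy_rating"]
t_stat, p_value = stats.ttest_ind(agreed_all, agreed_none, equal_var=False)
print(f"Welch t = {t_stat:.3f}, two-tailed P = {p_value:.3f}")
```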
A linear regression model was used to assess the influence of several variables on trust as a single construct (cognition-based trust). Although the 2 study hypotheses aimed to identify the main effects of information relevancy and interactivity on trust, variables 1‐6 (years of experience, gender, UI level of information relevancy, UI level of interactivity, information relevancy rating, and interactivity rating) were included in the model to control for possible variance. Interactions of gender and years of experience with all the other variables were assessed.

Three additional linear regression models were used to assess the effects on each of the CBT subdimensions. Variables 1‐6 were included in these models to control for possible variance, and interactions of gender and years of experience with all the other variables were assessed.
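A minimal sketch of this modeling approach is shown below, assuming a hypothetical participants.csv file and illustrative column names; the interaction terms shown are a subset for brevity, and this is not the study's actual code. The same formula can be refit with each CBT subdimension (understandability, technical competence, reliability) as the dependent variable.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical participant-level data; column names are illustrative.
# ui_relevant / ui_interactive encode the condition (0/1, "by design");
# relevancy_rating / interactivity_rating are questionnaire scores (1-7);
# cbt is the cognition-based trust score used as the dependent variable.
data = pd.read_csv("participants.csv")  # hypothetical file

# Main effects of variables 1-6, plus example interactions of gender and
# years of experience with other predictors.
model = smf.ols(
    "cbt ~ experience_years + C(gender) + ui_relevant + ui_interactive"
    " + relevancy_rating + interactivity_rating"
    " + C(gender):relevancy_rating + experience_years:ui_interactive",
    data=data,
).fit()
print(model.summary())  # coefficients, P values, adjusted R-squared
```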
Results
Participants’ Agreement With the System’s Prediction
The experimental conditions (variables 3 and 4 in the variables table) were not found to be associated with the participants’ trust in the system. However, participants’ responses to the postexperiment questionnaires revealed significant findings. Overall, the higher the participants rated information relevancy, the more frequently they agreed with the system’s prediction. Information relevancy was rated significantly higher among those who agreed with the system’s prediction 3 times than among those who did not agree at all (t11=–3.924, 2-tailed; P=.05). No other comparisons between the groups were significant. Participants’ agreement with the system’s prediction did not differ according to their experience, gender, or the interactivity ratings of the system.

Trust as a Single Construct
The significant main effect for the UI level of information relevancy revealed that relevant information resulted in higher perceived trust (β=2.684; P=.05). Higher information relevancy ratings (β=.824; P<.001) and higher interactivity ratings (β=.613; P<.001) were associated with higher perceived trust in the system. A significant interaction between UI level of interactivity and years of experience (β=–.056; P=.05) revealed lower trust ratings among experienced participants with higher interactivity ratings. The adjusted R2 of the regression model was 0.5296.
CBT Subdimensions
A significant main effect of the UI level of information relevancy on technical competence was observed (β=4.5; P<.001). In addition, across all the models, significant main effects of information relevancy ratings and interactivity ratings were observed. The statistical measures are summarized in the table below. No other significant main or interaction effects were observed across the subdimensions.

CBT subdimension | β (information relevancy ratings) | P (information relevancy ratings) | β (interactivity ratings) | P (interactivity ratings)
Technical competence | 1.18 | <.001 | .6 | <.001
Understandability | .4 | <.001 | .53 | <.001
Reliability | .72 | <.001 | .53 | <.001
Discussion
Principal Findings
Trust is difficult to measure. Participants’ agreement with the system’s prediction did not differ according to the experimental conditions. However, in the postexperiment questionnaire, higher information relevancy ratings and interactivity ratings were associated with higher perceived trust in the system, and the explicit visual presentation of the features of the ML algorithm on the user interface resulted in lower trust among the participants.
Information Relevancy
The results of our experiment revealed that information relevancy plays an important role in operators’ trust in ML-based systems. Two different but complementary questions were addressed: (1) to what extent does relevant information enhance intensivists’ trust in ML-based CDSSs? and (2) what type of information do intensivists consider to be relevant? The answer to the first question is derived directly from the results—perceived relevant information is important and affects various aspects of the operators’ trust in the system. This finding supports the first hypothesis and corroborates studies from diverse domains, which found that information relevancy substantially influences users’ trust in technological systems [ , , ].

Regarding the second question, discerning the type of information that intensivists consider relevant is more complicated. As hypothesized, providing detailed information about the algorithm’s features decreased the participants’ trust in the system. A possible explanation for the decreased trust is that the participants found the detailed information about the ML algorithm confusing and irrelevant. Accordingly, the information about the ML algorithm may have supported the participants’ belief that they were dealing with a black box algorithm, and this, in turn, may have fostered distrust of the system [ ].

Across all the CBT subdimensions assessed (understandability, technical competence, and reliability), the greater the relevancy of the information presented in the UI, according to the participants, the higher their trust. This concurs with the analysis of trust as a standalone construct and thus supports the first hypothesis.
The understandability and reliability ratings were not found to differ significantly between the information relevancy conditions. This suggests that the presentation of ML features did not significantly decrease the participants’ ratings of understandability and reliability. However, ratings of technical competence did differ between the information relevancy conditions. This could indicate a stronger effect on trust through the technical competence subdimension than through understandability and reliability.
Interactivity
The participants’ trust ratings were not found to differ significantly between conditions. However, trust ratings increased as participants’ perception of UI interactivity increased. This finding supports the second hypothesis and is in line with a meta-analysis by Yang and Shen [ ], which concluded that perceived interactivity was much more effective than objective interactivity.

Two possibilities may explain the gap between participants’ perceptions of interactivity and the actual UI level of interactivity. First, within the 2 interactivity levels, the objective gap between the different conditions may not have been strong enough. The less interactive condition also forced 2-way communication between the participants and the UI. Possibly, the initial user engagement did not add enough interactivity to render a noticeable difference. Alternatively, the participants may not have perceived the increased interactivity. Second, although entering and copying values to and from the patient record is a common task clinicians must perform in a subset of the IT systems in the ICU, participants may have found manually entering the patient’s clinical measures dull or redundant. This could have reduced participants’ opinion of the system and led to lower trust ratings.
Although perceptions of greater UI interactivity were associated with higher trust ratings, it is arguable whether extreme levels of interactivity are always preferable. Kalet et al [ ] investigated the influence of different interactivity levels in a computer-assisted instruction system on medical students’ performance. They found that a mid-range UI level of interactivity maximized improvements in the performance of clinical skills. Yang and Shen [ ] found that extremely high levels of website interactivity were less effective than moderate levels. However, pinpointing the exact amount of moderate interactivity, universally or specifically for a domain, is challenging. Furthermore, treating interactivity as a continuous variable and fitting it into a linear regression model could lead to measurement and interpretation errors. According to Yang and Shen [ ], interactivity should be considered a curvilinear variable, with the peak at the center of the curve and not at the edges. When an interactivity variable is fitted in a linear regression model, it is treated as linear, but this is not always the case, and this approach may fail to capture the real influence of different levels of interactivity.

Across the 3 CBT subdimensions examined (understandability, technical competence, and reliability), the more interactive the participants perceived the UI to be, the higher their trust. This was precisely the situation when trust was analyzed as a standalone construct. Beyond this, the interactivity levels examined were not found to differ between the CBT subdimensions. Notably, a separate linear regression model was fit for each subdimension. Although the results showed that the more interactive the UI, the higher the ratings for each subdimension, moderate levels of interactivity may have had a greater effect on those subdimensions.
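If future analyses wish to allow for such a curvilinear relationship rather than assume linearity, one option, sketched below with hypothetical data and column names (not the study's analysis), is to add a quadratic interactivity term to the regression so that an inverted-U pattern can be detected.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical participant-level Likert means (1-7); column names are illustrative.
data = pd.DataFrame({
    "cbt":                  [4.1, 5.2, 6.0, 5.8, 4.9, 3.7, 5.5, 6.1, 4.4, 5.0],
    "relevancy_rating":     [4.0, 5.5, 6.2, 5.9, 5.1, 3.5, 5.4, 6.4, 4.2, 5.0],
    "interactivity_rating": [2.0, 4.5, 5.0, 4.8, 3.9, 1.8, 4.2, 5.3, 2.5, 3.6],
})

# The squared interactivity term lets the model capture a peak at moderate
# interactivity; a negative quadratic coefficient would be consistent with the
# curvilinear (inverted-U) pattern described by Yang and Shen.
curvilinear_model = smf.ols(
    "cbt ~ relevancy_rating + interactivity_rating + I(interactivity_rating ** 2)",
    data=data,
).fit()
print(curvilinear_model.params)
```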
Finally, the literature is scant regarding correlations between experience and interactivity, and additional research is needed to elaborate on the significant negative interaction across years of experience and interactivity ratings.
Limitations and Future Research
Some limitations of this study represent opportunities for future research. First, the study design, limited resources, and the period in which the study was conducted (between the first and second waves of the COVID-19 pandemic) posed limitations on participant recruitment. The limited sample size dictated a design with only 2 levels of each variable. Future research should explore advanced and more realistic UI interactions and different information types. Second, although Madsen and Gregor’s [ ] approach was used to analyze trust, the ABT dimensions were not explored. Such investigation is needed to obtain a wider view of the relations between trust and its subdimensions, both cognition-based and affect-based. Third, due to time limitations, the study did not evaluate participants’ attitudes and changes in trust in the system over time. Finally, the study was performed in a simulation environment, using a specific interface design and case studies rather than real-time data from patients. Investigating clinician collaboration with a variety of interface designs within real-world information systems used in diverse health care settings could yield a deeper understanding for future interface design.

Conclusions
Developing ML algorithms is only the first step toward improving medical treatment. To increase acceptance and trust of ML-based CDSSs, and expand their use, a broader and more multidisciplinary approach (eg, user-centered design) should be taken. This approach needs to be specifically evaluated in the health care work environment, considering its unique challenges and professional personnel. A better understanding of means to increase intensivists’ trust in ML-based CDSSs may open new opportunities for user-centered design and improved decision-making processes in the ICU.
Human factors studies, like this one, highlight the importance of understanding the effects of specific UI features when designing ML-based CDSSs and other “artificial intelligence” systems. This study focused on the effects of 2 UI features related to intensivists’ trust in ML-based CDSSs. We demonstrated that the level of relevancy of the information that is presented in the UI and the interactivity level of the UI can play major roles when designing ML-based CDSSs. However, to enhance trust in these systems, more UI features should be investigated.
A wide point of view on trust should be maintained. In this study, trust as a standalone construct was influenced significantly by the different information relevancy levels in the tested conditions. Of the CBT subdimensions, only technical competence was influenced in the same way. These findings emphasize the need to analyze trust from different perspectives. For the research community and system designers, this may promote a broad understanding of means to enhance and foster trust in ML-based CDSSs, as well as in other “artificial intelligence” systems.
Conflicts of Interest
None declared.
The questionnaire.
DOCX File, 20 KB

References
- McKenzie MS, Auriemma CL, Olenik J, Cooney E, Gabler NB, Halpern SD. An observational study of decision making by medical intensivists. Crit Care Med. Aug 2015;43(8):1660-1668. [CrossRef] [Medline]
- Ward NS, Afessa B, Kleinpell R, et al. Intensivist/patient ratios in closed ICUs: a statement from the Society of Critical Care Medicine Taskforce on ICU Staffing. Crit Care Med. Feb 2013;41(2):638-645. [CrossRef] [Medline]
- Sharma M, Savage C, Nair M, Larsson I, Svedberg P, Nygren JM. Artificial intelligence applications in health care practice: scoping review. J Med Internet Res. Oct 5, 2022;24(10):e40238. [CrossRef] [Medline]
- Poon AIF, Sung JJY. Opening the black box of AI-medicine. J Gastroenterol Hepatol. Mar 2021;36(3):581-584. [CrossRef] [Medline]
- Liberman-Pincu E, Bitan Y. Fule—functionality, usability, look-and-feel and evaluation novel user-centered product design methodology—illustrated in the case of an autonomous medical device. Appl Sci. Feb 2021;11(3):985. [CrossRef]
- Obermeyer Z, Emanuel EJ. Predicting the future - big data, machine learning, and clinical medicine. N Engl J Med. Sep 29, 2016;375(13):1216-1219. [CrossRef] [Medline]
- Mohammed M, Pathan ASK. Automatic Defense Against Zero-Day Polymorphic Worms in Communication Networks. Auerbach Publications; 2013.
- Islam MM, Nasrin T, Walther BA, Wu CC, Yang HC, Li YC. Prediction of sepsis patients using machine learning approach: a meta-analysis. Comput Methods Programs Biomed. Mar 2019;170(January):1-9. [CrossRef] [Medline]
- Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. Feb 2, 2017;542(7639):115-118. [CrossRef] [Medline]
- Beck AH, Sangoi AR, Leung S, et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci Transl Med. Nov 9, 2011;3(108):108ra113. [CrossRef] [Medline]
- Sanchez-Pinto LN, Luo Y, Churpek MM. Big data and data science in critical care. Chest. Nov 2018;154(5):1239-1248. [CrossRef] [Medline]
- Choy G, Khalilzadeh O, Michalski M, et al. Current applications and future impact of machine learning in radiology. Radiology. Aug 2018;288(2):318-328. [CrossRef] [Medline]
- Stewart J, Sprivulis P, Dwivedi G. Artificial intelligence and machine learning in emergency medicine. Emerg Med Australas. Dec 2018;30(6):870-874. [CrossRef] [Medline]
- Wu M, Hughes MC, Parbhoo S, Zazzi M, Roth V, Doshi-Velez F. Beyond sparsity: tree regularization of deep models for interpretability. Presented at: AAAI’18: AAAI Conference on Artificial Intelligence; Feb 2-7, 2018; New Orleans, LA. [CrossRef]
- Bitan Y, Patterson ES. Unique challenges in user interface design for medical devices that use predictive algorithms. Proc Int Symp Hum Factors Ergon Health Care. Sep 2020;9(1):265-266. [CrossRef]
- Narayanan M, Chen E, He J, Kim B, Gershman S, Doshi-Velez F. How do humans understand explanations from machine learning systems? an evaluation of the human-interpretability of explanation. arXiv. Preprint posted online on Feb 2, 2018. URL: https://arxiv.org/abs/1802.00682 [Accessed 2024-07-25] [CrossRef]
- Doshi-Velez F, Kim B. Towards a rigorous science of interpretable machine learning. arXiv. Preprint posted online on Feb 28, 2017. URL: https://arxiv.org/abs/1702.08608 [Accessed 2024-07-25] [CrossRef]
- Du M, Liu N, Hu X. Techniques for interpretable machine learning. Commun ACM. Dec 20, 2019;63(1):68-77. [CrossRef]
- Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. May 2019;1(5):206-215. [CrossRef] [Medline]
- Miller T. Explanation in artificial intelligence: insights from the social sciences. Artificial Intelligence. Feb 2019;267:1-38. [CrossRef]
- Kay M, Kola T, Hullman JR, Munson SA. When (ish) is my bus? User-centered visualizations of uncertainty in everyday, mobile predictive systems. Presented at: CHI’16: CHI Conference on Human Factors in Computing Systems; May 7-12, 2016:5092-5103; San Jose, CA. [CrossRef]
- Kulesza T, Stumpf S, Burnett M, Kwan I. Tell me more? The effects of mental model soundness on personalizing an intelligent agent. Presented at: CHI ’12: CHI Conference on Human Factors in Computing Systems; May 5-10, 2012:1-10; Austin, TX. [CrossRef]
- Stumpf S, Rajaram V, Li L, et al. Interacting meaningfully with machine learning systems: three experiments. Int J Hum Comput Stud. Aug 2009;67(8):639-662. [CrossRef]
- Yang R, Newman MW. Learning from a learning thermostat: lessons for intelligent systems for the home. Presented at: UbiComp ’13: The 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing; Sep 8-12, 2013:93-102; Zurich, Switzerland. [CrossRef]
- Xu W, Dainoff MJ, Ge L, Gao Z. Transitioning to human interaction with AI systems: new challenges and opportunities for HCI professionals to enable human-centered AI. Int J Hum-Comput Interact. Feb 7, 2023;39(3):494-518. [CrossRef]
- Abdul A, Vermeulen J, Wang D, Lim BY, Kankanhalli M. Trends and trajectories for explainable, accountable and intelligible systems. Presented at: CHI '18: CHI Conference on Human Factors in Computing Systems; Apr 21-26, 2018:1-18; Montreal, QC. [CrossRef]
- Parasuraman R, Riley V. Humans and automation: use, misuse, disuse, abuse. Hum Factors. Jun 1997;39(2):230-253. [CrossRef]
- Hoff KA, Bashir M. Trust in automation: integrating empirical evidence on factors that influence trust. Hum Factors. May 2015;57(3):407-434. [CrossRef] [Medline]
- Lee JD, See KA. Trust in automation: designing for appropriate reliance. Hum Factors. Feb 2004;46(1):50-80. [CrossRef]
- Madsen M, Gregor S. Measuring human-computer trust. Proc Elev Australas Conf Inf Syst. 2000;53:6-8. URL: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=b8eda9593fbcb63b7ced1866853d9622737533a2 [Accessed 2024-07-25]
- Ghazizadeh M, Lee JD, Boyle LN. Extending the technology acceptance model to assess automation. Cogn Tech Work. Mar 2012;14(1):39-49. [CrossRef]
- Jian JY, Bisantz AM, Drury CG. Foundations for an empirically determined scale of trust in automated systems. Int J Cogn Ergon. Mar 2000;4(1):53-71. [CrossRef]
- Hengstler M, Enkel E, Duelli S. Applied artificial intelligence and trust-the case of autonomous vehicles and medical assistance devices. Technol Forecast Soc Change. Apr 2016;105:105-120. [CrossRef]
- Sheridan TB. Extending three existing models to analysis of trust in automation: signal detection, statistical parameter estimation, and model-based control. Hum Factors. Nov 2019;61(7):1162-1170. [CrossRef] [Medline]
- Mayer RC, Davis JH, Schoorman FD. An integrative model of organizational trust. Acad Manag Rev. Jul 1995;20(3):709. [CrossRef]
- Cassell J, Sullivan J, Prevost S, Churchill E. Embodied Conversational Agents. MIT Press; 2000.
- Paiva A. Affective interactions: towards a new generation of computer interfaces. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer; 1999. [CrossRef]
- Pickering MJ, Garrod S. Toward a mechanistic psychology of dialogue. Behav Brain Sci. Apr 2004;27(2):169-190. [CrossRef] [Medline]
- Muylle S, Moenaert R, Despontin M. The conceptualization and empirical validation of web site user satisfaction. Inf Manag. May 2004;41(5):543-560. [CrossRef]
- Park YA, Gretzel U. Success factors for destination marketing web sites: a qualitative meta-analysis. J Travel Res. Aug 2007;46(1):46-63. [CrossRef]
- Song J, Zahedi FM. Trust in health infomediaries. Decis Support Syst. Mar 2007;43(2):390-407. [CrossRef]
- Tsakonas G, Papatheodorou C. Analysing and evaluating usefulness and usability in electronic information services. J Inf Sci. Oct 2006;32(5):400-419. [CrossRef]
- Steuer J. Defining virtual reality: dimensions determining telepresence. J Commun. Dec 1992;42(4):73-93. [CrossRef]
- McMillan SJ, Hwang JS. Measures of perceived interactivity: an exploration of the role of direction of communication, user control, and time in shaping perceptions of interactivity. J Advert. Oct 2002;31(3):29-42. [CrossRef]
- Bezjian-Avery A, Calder B, Iacobucci D. New media interactive advertising vs traditional advertising. J Advert Res. 1998;38(4):23-32.
- Hoffman DL, Novak TP. Marketing in hypermedia computer-mediated environments: conceptual foundations. J Mark. Jul 1996;60(3):50-68. [CrossRef]
- Sorrell M, Salama E, Levin M, et al. The future of interactive marketing. Harvard Business Review. 1996. URL: https://hbr.org/1996/11/the-future-of-interactive-marketing [Accessed 2024-07-25]
- Cyr D, Head M, Ivanov A. Perceived Interactivity leading to e-loyalty: development of a model for cognitive–affective user responses. Int J Hum Comput Stud. Oct 2009;67(10):850-869. [CrossRef]
- Lee T. The impact of perceptions of interactivity on customer trust and transaction intentions in mobile commerce. J Electron Commer Res. 2005;6(3):165-180. URL: http://www.jecr.org/sites/default/files/06_3_p01.pdf [Accessed 2024-07-25]
- Vallés J, León C, Alvarez-Lerma F. Nosocomial bacteremia in critically ill patients: a multicenter study evaluating epidemiology and prognosis. Spanish collaborative group for infections in intensive care units of Sociedad Espanola de Medicina Intensiva Y Unidades Coronarias (SEMIUC). Clin Infect Dis. Mar 1997;24(3):387-395. [CrossRef] [Medline]
- Lee YW, Strong DM, Kahn BK, Wang RY. AIMQ: a methodology for information quality assessment. Inf Manage. Dec 2002;40(2):133-146. [CrossRef]
- Schaefer KE, Chen JYC, Szalma JL, Hancock PA. A meta-analysis of factors influencing the development of trust in automation: implications for understanding autonomy in future systems. Hum Factors. May 2016;58(3):377-400. [CrossRef] [Medline]
- Stanton B, Jensen T. Trust and artificial intelligence. National Institute of Standards and Technology; 2021. URL: https://www.nist.gov/publications/trust-and-artificial-intelligence-draft [Accessed 2024-07-25]
- Nicolaou AI, McKnight DH. Perceived information quality in data exchanges: effects on risk, trust, and intention to use. Inf Syst Res. Dec 2006;17(4):332-351. [CrossRef]
- Zhou T. An empirical examination of initial trust in mobile banking. Internet Res. Aug 12, 2011;21(5):527-540. [CrossRef]
- Yang F, Shen F. Effects of web interactivity: a meta-analysis. Commun Res. Jul 2018;45(5):635-658. [CrossRef]
- Kalet AL, Song HS, Sarpel U, et al. Just enough, but not too much Interactivity leads to better clinical skills performance after a computer assisted learning module. Med Teach. 2012;34(10):833-839. [CrossRef] [Medline]
Abbreviations
ABT: affect-based trust
AIMQ: AIM quality
CBT: cognition-based trust
CDSS: clinical decision support system
HCI: human-computer interaction
ICU: intensive care unit
ML: machine learning
UI: user interface
Edited by Avishek Choudhury; submitted 30.01.24; peer-reviewed by Adeola Bamgboje-Ayodele, Liz Herrle, Martin Sedlmayr, Robert Marshall, Suptendra Sarbadhikari; final revised version received 09.05.24; accepted 24.05.24; published 01.08.24.
Copyright© Omer Katzburg, Michael Roimi, Amit Frenkel, Roy Ilan, Yuval Bitan. Originally published in JMIR Human Factors (https://humanfactors.jmir.org), 1.8.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication on https://humanfactors.jmir.org, as well as this copyright and license information must be included.