Original Paper
Abstract
Background: Artificial intelligence (AI), such as machine learning (ML), shows great promise for improving clinical decision-making in cardiac diseases by outperforming traditional statistical models. However, few AI-based tools have been implemented in cardiology clinics because of the sociotechnical challenges of transitioning from algorithm development to real-world implementation.
Objective: This study explored how an ML-based tool for predicting ventricular tachycardia and ventricular fibrillation (VT/VF) could support clinical decision-making in the remote monitoring of patients with an implantable cardioverter defibrillator (ICD).
Methods: Seven experienced electrophysiologists participated in a near-live feasibility and qualitative study, which included walkthroughs of 5 blinded retrospective patient cases, use of the prediction tool, and questionnaires and interview questions. All sessions were video recorded, and sessions evaluating the prediction tool were transcribed verbatim. Data were analyzed through an inductive qualitative approach based on grounded theory.
Results: The prediction tool was found to have potential for supporting decision-making in ICD remote monitoring by providing reassurance, increasing confidence, acting as a second opinion, reducing information search time, and enabling delegation of decisions to nurses and technicians. However, the prediction tool did not lead to changes in clinical action and was found less useful in cases where the quality of data was poor or when VT/VF predictions were found to be irrelevant for evaluating the patient.
Conclusions: When transitioning from AI development to testing its feasibility for clinical implementation, the following must be considered: expectations must be aligned with the intended use of AI; trust in the prediction tool is likely to emerge from real-world use; and AI accuracy is relational, depending on available information and local workflows. Addressing the sociotechnical gap between the development and implementation of ML-based clinical decision-support tools in cardiac care is essential for successful adoption. We suggest including clinical end-users, clinical contexts, and workflows throughout an overall iterative approach to design, development, and implementation.
doi:10.2196/26964
Introduction
Ventricular tachycardia and ventricular fibrillation (VT/VF) are potentially lethal cardiac arrhythmias, which constitute a growing challenge to health care systems worldwide [ ]. The development of implantable cardioverter defibrillators (ICDs) has led to major advances in the prevention of death from VT/VF [ ]. ICDs are implantable devices used in patients at increased risk of sudden cardiac death; they monitor the heart rhythm continuously to detect and treat VT/VF. In recent years, remote monitoring has become the standard of care for ICD patients [ ], and follow-ups are based on transmission of data from the implanted device through the patient's home monitoring box. This has reduced the number of in-office follow-ups [ , ] and increased survival rates [ ] due to improved early detection of arrhythmias [ ]. However, the number of ICD implants is increasing worldwide, posing a workload challenge for electrophysiologists and technicians assessing data from incoming transmissions in remote monitoring centers [ - ]. There is a growing need for decision-making tools that can support and reduce data-intensive remote follow-ups, and while current systems can detect and treat VT/VF arrhythmias as they occur, tools for predicting arrhythmias before their onset are lacking [ ].

Artificial intelligence (AI), such as machine learning (ML), shows great promise for improving clinical decision-making in cardiac diseases by outperforming traditional statistical models [ , ], and recent examples include promising models for the prediction of heart disease and heart failure [ - ], as well as cardiac arrhythmias, such as ventricular arrhythmia [ ], atrial fibrillation [ ], and electrical storm [ ]. There are positive attitudes and high expectations among physicians that AI will improve future patient care in fields where data are collected continuously, such as cardiology [ , ].

However, few ML-based outcome prediction algorithms have been implemented in cardiology clinics because of the challenges of transitioning from algorithm development to real-world implementation. While studies of medical AI-based tools that undergo prospective clinical validation are emerging [ - ], there is a general lack of understanding of how AI may support achieving clinical effectiveness and improve patient care in real-life settings [ , ]. Scholars have argued that ML-based patient outcome prediction models are yet to prove their worth to human clinicians [ ]. Prediction accuracy by itself can be impressive in the lab; however, this does not always translate to better treatment, and researchers have stressed the need to find ways to make human and AI predictions complement each other, ensuring actionability in clinical practice [ - ]. Going from research and development environments to hospital or clinical contexts is considered a challenging task that has been named "the last mile" of implementing medical AI-based tools [ , ], and there is a call for research on how end-users find AI-based user interfaces useful in practice [ - ], as well as studies that report on the sociotechnical challenges of deploying AI-based tools in complex clinical environments [ , , , - ].

This study addresses the sociotechnical gap between the development and implementation of a clinical decision-support tool based on ML for the prediction of VT/VF in remote monitoring of ICD patients. The aim of this study was to explore the feasibility and clinician preimplementation perspectives of using a prediction tool for improved workflows. Therefore, this study does not provide algorithmic validation per se but instead answers questions about the clinical feasibility and workflow integration of a decision-support tool based on ML.
Methods
Understanding Needs and Co-design of the Prediction Tool
This study was conducted at the remote monitoring center at Rigshospitalet, Copenhagen University Hospital, Denmark, which is a large tertiary hospital covering all aspects of treatment in cardiology and is among the largest centers in Europe, with more than 4000 patients with cardiac implantable electronic devices in remote follow-up. The study was organized in 3 stages. In the first stage, fieldwork observations in the remote monitoring clinic were conducted to understand both the clinical workflow and workload [ ]. This was followed by 3 co-design workshops with an electrophysiologist (PKJ) and 5 co-design workshops with a consultant cardiologist (SZD) focusing on feature engineering and sketching the user interface. In stage 2, the AI algorithm was developed, and in stage 3, a near-live feasibility and qualitative interview study was conducted. The study was reviewed by the Danish National Board of Health and the Danish National Committee on Health Research Ethics, and authorized by The Capital Region of Denmark.

Development of the AI Algorithm
A prediction tool based on the random forest ML method was developed to improve support for clinical decision-making in ICD remote monitoring; it consisted of an algorithm predicting the risk of VT/VF within 30 days. The prediction tool was designed to show the alarm status (yes/no), the risk probability (%), and a ranking of the 5 most and least important parameters for the prediction, using the LIME technique [ ]. The design and development of the tool were informed by previous fieldwork studies of current practices [ , ], as well as early results from using ML to predict electrical storm, a severe form of cardiac arrhythmia [ ].

The data set used for developing the algorithm consisted of 11,921 transmissions from 1251 patients with an ICD or a cardiac resynchronization therapy defibrillator (CRT-D), followed over a 4-year period from 2015 to 2019 at Rigshospitalet. The data set contained 74,149 arrhythmia episodes, each characterized by 7 variables, such as the type of arrhythmia (VT, VF, supraventricular tachycardia, atrial fibrillation, etc), ICD treatment of the arrhythmia, duration of the episode, and maximum heart rate reached during the episode.

The random forest ML method [ ] was selected for algorithm development because it provided optimal results when considering the tradeoffs between model performance and explainability. Several other classifier methods (supervised, unsupervised, and deep learning methods) were evaluated through development and testing, including KNeighborsClassifier [ ], GradientBoostingClassifier [ ], AdaBoostClassifier [ ], support vector classifier [ ], and long short-term memory (LSTM) [ ]. The deep learning method, LSTM, provided poorer performance and poorer explainability, possibly due to the nature of the data (ie, time series data with considerable time between events, making time series modeling difficult). The remaining methods provided broadly similar performance: KNeighborsClassifier and the support vector classifier performed worst, while the decision tree-based methods performed best. GradientBoostingClassifier produced optimal F1 and recall scores; however, random forest provided the highest accuracy and precision scores, which led to the choice of random forest for developing the first version of the algorithm to be evaluated with end-users in this study.

The algorithm was tested on 2342 of the 11,921 transmissions. The transmission data were stratified and grouped into training and test sets, meaning that the prevalence of the positive condition was the same in both sets (stratified) and that no patient had data in both sets (grouped). The algorithm achieved an accuracy of 0.96, with a positive predictive value of 0.67 and a negative predictive value of 0.97. The probability threshold for raising an alarm was set to 0.28, the value with an optimal tradeoff between negative and positive predictive outcomes.

Feature engineering was carried out in collaboration between 2 data scientists (MKHH and CV) and a consultant cardiologist (SZD) during 5 co-design workshops. A total of 48 features (referred to as parameters when discussed with the study participants) were developed, following 2 main principles: aggregating episodes by day and building a historic snapshot of the days leading up to the arrhythmic event. To provide clinical end-users with algorithm explainability, the LIME technique [ ] was used to show the top 5 features that increase or decrease the likelihood of a VT/VF arrhythmic event occurring within the coming 30 days.
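To make the 2 feature-engineering principles concrete, the following is a minimal sketch in Python (pandas) on a hypothetical episode table; the column names, episode types, and 7-day window are illustrative assumptions, not the study's actual 48 features.

```python
# Sketch of the 2 principles: (1) aggregate episodes by day, and
# (2) build a historic snapshot of the days leading up to an event.
import pandas as pd

# Hypothetical arrhythmia episode log (illustrative columns only).
episodes = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "date": pd.to_datetime(["2018-03-01", "2018-03-01",
                            "2018-03-04", "2018-03-02"]),
    "episode_type": ["VT", "VT-NS", "AF", "VF"],
})

# Principle 1: daily episode counts per patient and episode type.
daily = pd.crosstab([episodes["patient_id"], episodes["date"]],
                    episodes["episode_type"])

# Principle 2: a rolling snapshot per patient (here, episode counts over
# the preceding 7 days), so each patient-day carries the history leading
# up to it.
history = (
    daily.reset_index()
         .set_index("date")
         .groupby("patient_id")
         .rolling("7D", min_periods=1)
         .sum()
)
print(history)
```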
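Similarly, the evaluation setup described above (a stratified, grouped split, a random forest classifier, the 0.28 alarm threshold, and LIME top-5 explanations) could be sketched as follows with scikit-learn and the open-source lime package; the synthetic data, placeholder parameter names, and hyperparameters are assumptions, so the printed scores will not reproduce the reported values.

```python
# Minimal sketch of the training/evaluation pipeline on synthetic data.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(0)
n_transmissions, n_features = 11921, 48           # sizes reported above
X = rng.normal(size=(n_transmissions, n_features))
y = rng.binomial(1, 0.1, size=n_transmissions)    # 1 = VT/VF within 30 days
patients = rng.integers(0, 1251, size=n_transmissions)

# Stratified, grouped split: positive prevalence is preserved in both
# sets, and no patient contributes data to both.
train_idx, test_idx = next(
    StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
    .split(X, y, groups=patients)
)

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(X[train_idx], y[train_idx])

# Raise an alarm when the predicted 30-day VT/VF probability reaches the
# chosen operating point (0.28 in the study).
proba = model.predict_proba(X[test_idx])[:, 1]
tn, fp, fn, tp = confusion_matrix(y[test_idx], proba >= 0.28).ravel()
print(f"accuracy={(tp + tn) / len(test_idx):.2f} "
      f"PPV={tp / (tp + fp):.2f} NPV={tn / (tn + fn):.2f}")

# LIME explanation for a single transmission: the top 5 parameters that
# push the predicted risk up or down, as shown in the tool's interface.
explainer = LimeTabularExplainer(
    X[train_idx],
    feature_names=[f"param_{i}" for i in range(n_features)],  # placeholders
    class_names=["no VT/VF", "VT/VF within 30 days"],
    mode="classification",
)
explanation = explainer.explain_instance(
    X[test_idx][0], model.predict_proba, num_features=5
)
for param, weight in explanation.as_list():
    print(f"{param}: {'raises' if weight > 0 else 'lowers'} risk ({weight:+.3f})")
```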
Study Participants and Case Selection
Seven medical doctors specialized in electrophysiology (ie, cardiologists treating patients with cardiac arrhythmia) were selected for participation from a convenience sample. Participants included 6 males and 1 female (average age, 52 years; average work experience as an electrophysiologist, 13 years).

A selection of 5 retrospective patient cases was used to evaluate the feasibility of the prediction tool's ability to support clinical decision-making. The cases included high and low risk probabilities, true positives and true negatives, and 2 cases with atrial fibrillation (AF) as the primary episode type. Patient cases were retrieved 19 to 27 months back in time, blinded, and presented as paper printouts with a summary of each patient's clinical history along with reports from the electronic health record (list of diagnoses, progress notes from the cardiology department, latest blood tests, and list of medications) and screenshots of relevant ICD transmission data, including device type, battery status, device programming and settings, time of implantation, latest diagnostic information about the transmission, frequency of arrhythmias, heart rate, device therapy, and assessment of physical activity.

Participant characteristics.

| Participant | Sex | Age (years) | Title | Years since obtaining specialist certification in cardiology |
|---|---|---|---|---|
| 1 | Female | 52 | Consultant cardiologist, MD, PhD | 11 |
| 2 | Male | 61 | Professor, consultant cardiologist, MD, DMSc | 23 |
| 3 | Male | 55 | Consultant cardiologist, MD, PhD | 14 |
| 4 | Male | 43 | Cardiologist, MD, PhD | 2 |
| 5 | Male | 62 | Consultant cardiologist, MD, DMSc | 28 |
| 6 | Male | 44 | Cardiologist, MD, PhD | 2 |
| 7 | Male | 47 | Consultant cardiologist, MD, DMSc | 9 |
Overview of the 5 retrospective patient cases (current ICDa transmission data and prediction tool output).

| Case number | Patient summary | Transmission type | Primary episode type | ICDa treatment | Transmission summary | 30-day VTb/VFc risk probability (%) | Alarm raised (prediction outcome) |
|---|---|---|---|---|---|---|---|
| 1 | Male, age 63 years, ischemic heart failure, left ventricular assist device | Automated | VT/VF | ATPd | 3 VT/VF; 36 sensing episodes; 217 VT-NSe | 58.6 | Yes (true positive) |
| 2 | Female, age 67 years, dilated cardiomyopathy | Automated | VT/VF | Shock | 1 VT/VF; 1 VT-NS; 20 min of AFf since the last transmission | 14.4 | No (true negative) |
| 3 | Female, age 40 years, dilated cardiomyopathy | Automated | VT/VF | Shock | 2 VT/VF; 4 VT-NS | 35.4 | Yes (true positive) |
| 4 | Male, age 61 years, ischemic heart failure | Patient initiated | AF | None | 12 hours of AF since the last transmission | 1.2 | No (true negative) |
| 5 | Male, age 73 years, ischemic heart failure | Automated | AF | None | 14 hours of AF since the last session; 26 VT-NS | 7.8 | No (true negative) |
aICD: implantable cardioverter defibrillator.
bVT: ventricular tachycardia.
cVF: ventricular fibrillation.
dATP: antitachycardia pacing.
eVT-NS: nonsustained ventricular tachycardia.
fAF: atrial fibrillation.
Data Collection
A combined feasibility and qualitative interview study was undertaken based on a retrospective case study design. The primary aim of the study was to address the following 4 main questions about the feasibility of the prediction tool using quantitative measures: Does use of the tool lead to a change in clinical action? Does it support decision-making? Is visualizing parameters useful? Can it reduce time spent? The secondary aim was to understand the electrophysiologists' immediate reactions to using the prediction tool, including qualifying the quantitative feasibility measures against qualitative dimensions based on interviews. Electrophysiologists were invited to conduct a "near-live" clinical simulation of decision-making based on walkthroughs of the 5 patient cases with and without the prediction tool. Two structured questionnaires based on a 5-point Likert scale were designed to capture electrophysiologists' decisions on action without the prediction tool and their experiences of the feasibility of the prediction tool. A semistructured interview guide was designed based on the framework of Bowen et al for feasibility studies [ ] to cover open-ended questions about the electrophysiologists' overall experiences of using the prediction tool. Ten questions in the following 4 areas of inquiry were posed: acceptability, demand, adoption, and implementation.

The "near-live" case walkthroughs were performed with inspiration from Li et al [ ] and were facilitated by the authors SM, MKHH, and TOA. First, the electrophysiologist was onboarded with a presentation of the study objectives, the intended use of the prediction tool, the algorithm development (data set, ML model, and results), and the outline of the feasibility and qualitative study processes, and time was provided to resolve open questions. Second, the electrophysiologist was provided with a patient case and asked to do a walkthrough of the case material to reach a decision on clinical action, similar to normal clinical practice, and was asked to answer the first questionnaire and explain the reasoning behind the decision on clinical action. Third, the electrophysiologist received the prediction tool on paper and was asked to answer the second questionnaire evaluating the effects of the prediction tool and to share his/her immediate reactions. Fourth, after ending all patient case walkthroughs, the electrophysiologist was interviewed about his/her experience of the feasibility of the prediction tool. The total time for observations and interviews was 12.5 hours, with an average of 1 hour 47 minutes per electrophysiologist. Case walkthroughs and interviews were audio and video recorded, and sections with electrophysiologists' responses to the questionnaires and the open-ended interview were transcribed verbatim.

Data Analysis
Data from electrophysiologists' reactions in the interview study were analyzed using an inductive qualitative approach based on grounded theory [ ]. A 2-step iterative coding process was applied, beginning with line-by-line coding to support initial analytic decisions about the data. Action codes were developed using gerunds (a noun form of a verb) to make explicit what electrophysiologists were doing during case walkthroughs and what meaning they derived (eg, "being confirmed," "building trust," and "using to prioritize"). This was done to preserve focus on action and the situated processes of electrophysiologists' decision-making, and to turn thematic descriptions into analytical insights in later stages of the analysis. Focused coding was then carried out by iteratively sorting and synthesizing line-by-line codes into themes and subthemes related to the research questions and by constructing key insights. This process allowed frequently reappearing initial codes to be compared across large amounts of data and refined into more general and analytically incisive findings (eg, "predictions can serve as a second opinion" and "decision-making workload is reduced when trust in the prediction tool is established"). The entire process was carried out iteratively in collaboration between SM and TOA using the qualitative data analysis software NVivo 12 (QSR International).

Results
Feasibility of the Prediction Tool in Clinical Practice
Does the Prediction Tool Change Clinical Decisions?
Overall, the electrophysiologists did not change their decisions on clinical action when presented with the 30-day VT/VF arrhythmia prediction. However, several electrophysiologists found that the prediction tool was helpful (Quote 1) and increased their confidence in their choice of clinical action, and that the predictions could help prioritize patients (Quote 2) and determine what action to take in relation to the local circumstances at the clinic (Quote 3).

| Question and answer | Total (N=35), n (%) | Case 1 (N=7), n (%) | Case 2 (N=7), n (%) | Case 3 (N=7), n (%) | Case 4 (N=7), n (%) | Case 5 (N=7), n (%) |
|---|---|---|---|---|---|---|
| Q1: The prediction tool made me change my decision on clinical action | | | | | | |
| Yes | 1 (3) | 1 (14) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| No | 34 (97) | 6 (86) | 7 (100) | 7 (100) | 7 (100) | 7 (100) |
| Q1a: I will contact the patient | | | | | | |
| Strongly disagree/disagree | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Neither agree nor disagree | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| Agree/strongly agree | 35 (100) | 7 (100) | 7 (100) | 7 (100) | 7 (100) | 7 (100) |
| Q1b: I will book a procedure or reschedule an existing procedure | | | | | | |
| Strongly disagree/disagree | 3 (9) | 0 (0) | 1 (14) | 1 (14) | 0 (0) | 1 (14) |
| Neither agree nor disagree | 6 (17) | 1 (14) | 0 (0) | 0 (0) | 4 (57) | 1 (14) |
| Agree/strongly agree | 26 (74) | 6 (86) | 6 (86) | 6 (86) | 3 (43) | 5 (71) |
| Q1c: I will do something else | | | | | | |
| Strongly disagree/disagree | 19 (54) | 5 (71) | 3 (43) | 5 (71) | 2 (29) | 4 (57) |
| Neither agree nor disagree | 5 (14) | 1 (14) | 1 (14) | 1 (14) | 2 (29) | 0 (0) |
| Agree/strongly agree | 11 (31) | 1 (14) | 3 (43) | 1 (14) | 3 (43) | 3 (43) |
| Q2: The prediction tool supported my decision-making | | | | | | |
| Strongly disagree/disagree | 8 (23) | 2 (29) | 1 (14) | 0 (0) | 3 (43) | 2 (29) |
| Neither agree nor disagree | 4 (11) | 1 (14) | 0 (0) | 1 (14) | 1 (14) | 1 (14) |
| Agree/strongly agree | 23 (66) | 4 (57) | 6 (86) | 6 (86) | 3 (43) | 4 (57) |
| Q3: The prediction tool's visualization of parameters supported my decision-making | | | | | | |
| Strongly disagree/disagree | 11 (31) | 3 (43) | 1 (14) | 2 (29) | 3 (43) | 2 (29) |
| Neither agree nor disagree | 3 (9) | 1 (14) | 0 (0) | 0 (0) | 1 (14) | 1 (14) |
| Agree/strongly agree | 21 (60) | 3 (43) | 6 (86) | 5 (71) | 3 (43) | 4 (57) |
| Q4: The prediction tool can help me reach a decision faster | | | | | | |
| Strongly disagree/disagree | 13 (37) | 4 (57) | 2 (29) | 2 (29) | 2 (29) | 3 (43) |
| Neither agree nor disagree | 4 (11) | 1 (14) | 0 (0) | 0 (0) | 3 (43) | 0 (0) |
| Agree/strongly agree | 18 (51) | 2 (29) | 5 (71) | 5 (71) | 2 (29) | 4 (57) |
Themes, insights, and illustrative quotes describing the feasibility of the prediction tool.
Taking Action
Insights: The prediction tool led to no change in clinical action; the prediction tool can increase confidence in clinical action; the prediction tool can help prioritize clinical action and patients; and being confirmed supports decision-making.
- Quote 1: Well, it hasn’t changed my current decision, but the basis is much better, and I can easily see that it has helped me. [Case 3, Electrophysiologist #7]
- Quote 2: If you are in a busy situation where many transmissions have arrived and the technician and I have to maneuver and prioritize, there is no doubt that we will concentrate on those with high-risk predictions. [Electrophysiologist #5]
- Quote 3: This [tool prediction] is something that might make me react a little more aggressively. […] Now I've been told that he's actually more likely to get an episode within the next month than he's not getting an episode […] if our program is fully booked, both today and tomorrow, and the day after tomorrow, but on Friday we have a time. Then I kind of have to make a trade off if I really want to spare him a shock. Which may turn into a lot of shocks. [Case 1, Electrophysiologist #2]
Decision-Making
Insights: The prediction tool's predictions served as a second opinion; the prediction tool supported gathering of thoughts; the overall presentation of the prediction tool needs to be easily translatable to clinical relevance; and being confirmed supports decision-making.
- Quote 4: So, I agree with the conclusion, it was also my feeling that I would be a little worried about this patient. [Case 3, Electrophysiologist #6]
- Quote 5: But then if it is you now have to convince some [other electrophysiologists] that they should ablate her, then instead of saying that I think so, you can argue that the algorithm thinks so too. So, in that way you can say that you can get an extra view of it. [Case 3, Electrophysiologist #3]
- Quote 6: In that way, the algorithm can be a support because it helps to gather thoughts about things that play a role in whether a person gets a new arrhythmia. [Case 3, Electrophysiologist #5]
- Quote 7: Yes, I think again that if you present 58.6% then it expresses an accuracy that you may have difficulty navigating with. I know it from other areas in the medical world, the thing about expressing something with a decimal number, it expresses an accuracy for which there may be no evidence at all […] I have a hard time relating to the number […] it’s problematic to translate that into something clinically relevant. [Case 1, Electrophysiologist #5]
- Quote 8: I agree with what the alarm tells me, but I don’t think it has helped me very much right here. [Case 5, Electrophysiologist #5]
Visualization
Insights: The prediction tool should provide actionable parameters; showing parameters enables confirmation and agreement; showing parameters enables in-situ validation of algorithmic inputs and the prediction tool result; the prediction tool performs only as well as the data it bases its predictions on; transparency about the algorithmic data input helped raise confidence and trust; and showing important parameters is more important than showing the output probability.
- Quote 9: To list what counts for and what counts against, makes really good sense. That’s also how it works in my head. [Case 3, Electrophysiologist #7]
- Quote 10: I think it's super good, I actually think it's really pedagogical, I like it. Because, in reality this is how it confirms the result. It’s basically the same empirical data that you have in your mind: You say “okay, is this a case where we have to do something?” It sums up some assumptions that you have made yourself, and in that way, I actually think you are confirmed more than if you have a green or red light. [Electrophysiologist #4]
- Quote 11: It's very nice to see that the algorithm reacts on the same parameters that I've discovered myself ... So it's nice to see that I agree with it. You could say that it’s supporting and it's safe to know, that it also says there was something here. [Case 3, Electrophysiologist #1]
- Quote 12: What's happening here is that the ICD detects that the patient has VT, and then the prediction tool bases its predictions on that. But it’s not entirely correct, because the device has recently been re-programmed to sense everything. [Case 1, Electrophysiologist #6]
- Quote 13: This case has nothing to do with risk of VT/VF [...] it's the second thing I look at. No, here I won’t use it [the prediction tool]. [Case 4, Electrophysiologist #1]
Time Saving
Insights: The prediction tool can speed up decision-making when trust is established; the prediction tool can reduce workload when trust is established; the prediction tool can reduce information search time when no or low risk is predicted; the prediction tool can substitute for patient input; and showing important parameters enables delegation of work to technicians and nurses.
- Quote 14: It will give me a much better basis for decision making and I actually think it will save me a lot of time. Just like with all other new technology based on machine learning: the first 2 months I sit and read through to see what I have, but in month 3, I will look at the output alone. Because then I trust that it has pulled out what is appropriate, and then it starts saving me all the work I did in the beginning. But for everyone, it is that there is a phase for you personally to find out if this brings you further. […] I really think I would have come to the decision faster if I had seen this first. [Case 3, Electrophysiologist #7]
- Quote 15: I might reach a decision faster with this system if I can’t get a hold of the patient i.e., if the patient does not pick up the phone. Then it could well be that I look at the alarm and say “well, yes okay there is low risk.” [Case 2, Electrophysiologist #6]
- Quote 16: You could make a scenario where the technician first looks at it [the prediction tool] and says ... okay there are those parameters and there is electrical storm, so we call in the patient and the doctor does not have to look at the transmission. That would support our workflow. [Case 1, Electrophysiologist #4]
In Which Cases Does the Prediction Tool Support Clinical Decision-Making?
In 23 (66%) of the case walkthroughs, the electrophysiologists agreed that the prediction tool supported their decision-making, whereas in 8 (23%) of the walkthroughs, they disagreed. The prediction tool was found particularly supportive in patient cases 2 and 3, where 6 (86%) of the electrophysiologists agreed, and where the tool was found to assist decision-making by confirming the electrophysiologists' clinical evaluations and expectations of an increasing risk of VT/VF (Quote 4). In contrast, when the electrophysiologists were focused on predicting arrhythmias other than VT/VF, the prediction tool was deemed less useful, and answers were more heterogeneous (Case 4 and Case 5). Some electrophysiologists said that the predictions served as a second opinion (Quote 5) and that the prediction tool was helpful for collecting arguments that supported the electrophysiologists when trying to "gather thoughts" about potential VT/VF occurrences (Quote 6). Nevertheless, some electrophysiologists found that showing the probability score as a percentage with decimals created uncertainty, and the naming of parameters was sometimes found difficult to interpret (Quote 7).

Is Visualization of Important Parameters Useful?
The prediction tool's visualization of the most important parameters in the prediction of increased or decreased probability of VT/VF arrhythmia was found useful when the electrophysiologists agreed with the parameters presented. In patient cases 2 and 3, 6 (86%) and 5 (71%) of the electrophysiologists, respectively, agreed that showing important parameters supported their decision-making. However, agreement was lower when the parameters represented poor data quality (Quote 12), for example, in patient Case 1 (43% agreed, 43% disagreed), or when the electrophysiologists were focused on predicting arrhythmias other than what the prediction tool was designed for (Case 4 and Case 5; Quote 13).

In general, the presentation of important parameters provided explainability and supported decision-making by resembling the clinical interpretation process of what counts for or against the occurrence of VT/VF (Quote 9). Several electrophysiologists found that the visualization of important parameters created more confidence in the prediction tool than the probability score alone, as the tool summed up many of the same assumptions that the electrophysiologists already had (Quote 10). Listing the algorithm's important parameters also enabled electrophysiologists to perform in-situ validations of the prediction tool's predictions by interpreting the data against the patient case (Quote 11). However, in some cases, the electrophysiologists found that the parameters were based on wrong data from the ICD transmission. In those cases, the visualization enabled electrophysiologists to check whether the prediction tool based its predictions on wrong or poor-quality data and to decide whether to trust the predictions (Quote 12).

Does the Prediction Tool Reduce Time for Decision-Making?
The electrophysiologists found that the prediction tool could enable a reduction in time for decision-making in cases where they trusted the predictions. Moreover, 5 (71%) of the electrophysiologists agreed that the prediction tool can help reach a decision faster (Case 2 and Case 3). However, agreement was lower (29% in Case 1 and 57% in Case 5) when predictions were found to be uncertain or less useful for handling patients.
Several of the electrophysiologists expressed that once they become familiar with the system, they expect the AI tool to speed up decision-making and reduce the diagnostic workload. This indicates that establishing trust in AI predictions is essential. One of the electrophysiologists explained how time can be saved once personal trust in the prediction tool has developed (Quote 14).

Across all cases, several electrophysiologists found that the probability score and the presentation of important parameters can reduce information search time. Typically, electrophysiologists must retrieve valuable information by clicking through multiple webpages in the ICD manufacturer's web-based system, which the prediction tool summarizes in a table. Some electrophysiologists also speculated that the tool could support decision-making when patient input is inaccessible, such as when a patient does not answer the phone (Quote 15). Other electrophysiologists considered that the tool could support workflow and reduce unnecessary time consumption for electrophysiologists by delegating decision-making to the technician (Quote 16).

Clinician Preimplementation Perspectives of the Prediction Tool: Acceptability, Adoption, Demand, and Implementation
Acceptability
Acceptability of the prediction tool was high when patient cases concerned VT/VF, as the risk predictions were found to be relevant. However, several electrophysiologists had expected that the prediction tool would bring new and groundbreaking insights (Quote 1) to support or challenge their decisions on which action to take. In cases where the task-technology fit was lower (Case 1, Case 4, and Case 5), acceptability was also lower (Quote 2). For some of the electrophysiologists, the prediction tool was therefore considered "nice to have" rather than "need to have" (Quote 3), while most of the electrophysiologists recognized the potential of the prediction tool. Some of the electrophysiologists considered the tool useful for standardizing decision-making across the electrophysiologist team by avoiding individual influences from recent experiences and thus harmonizing individual treatment (Quote 4).

Adoption
There was consensus that high precision is important for adoption of the prediction tool to happen. Several of the electrophysiologists emphasized that the positive or negative predictive value should be as unambiguous as possible, showing either low or high risk when the alarm is raised (Quote 5). Other electrophysiologists emphasized that false positives or negatives hinder adoption, citing the example of OptiVol (an early warning alarm for fluid-related decompensation), which the electrophysiologist team decided not to use due to too many false positives (Quote 6). Several electrophysiologists explained that acceptance and clinical adoption are decided collectively based on team experiences from real-world use (Quote 7) and from experiencing that the prediction tool actually confirms decisions in everyday clinical practice (Quote 8). Adoption can also be achieved by building trust in the tool through validation studies. Participants explained that trust is a precondition for adoption, which can be achieved by documenting effects in a randomized clinical trial and through algorithm validation in peer-reviewed journals (Quote 9).

Demand
Several of the electrophysiologists emphasized that there is a high demand for workflow support in the remote monitoring of cardiac device patients. They found the prediction tool useful for supporting more efficient prioritization and identification of important patient cases (Quote 10). Others described the demand for screening support among the increasing number of nonspecialized hospitals where fewer electrophysiologists work. For example, the prediction tool could support technicians in doing the initial prioritization work more effectively and efficiently; it could decrease electrophysiologists' patient information search time when cases are handed over from technicians; and it could function as "data help" by enabling junior doctors to get a form of senior help by consulting the tool (Quote 11).

Implementation
To ensure successful implementation, some electrophysiologists described how remote monitoring clinics may want to adjust the threshold of the prediction tool to fit local workflows and prioritization rules. For example, technicians and electrophysiologists should be able to configure the prediction tool and decide on related actions, such as "no need to take action" or "need to contact the patient." Relatedly, several electrophysiologists explained that indications of low-risk patients are especially useful in supporting clinicians in handling low-risk transmissions (Quote 12). Moreover, the electrophysiologists explained that the intention to use the prediction tool depends on easy access, as well as how well it presents data and alleviates the need for clicking through several web pages in remote monitoring systems (Quote 13). Some electrophysiologists added that it is practical that the algorithm uses data already available in remote monitoring systems, which are used daily for decision-making in the clinic. Knowing the data creates transparency and enables in-situ validation of the correctness of the probability score, thereby increasing the likelihood of success with implementation of the prediction tool (Quote 14).
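To illustrate the kind of local configurability requested here, the following is a short, hypothetical sketch; the cutoff values, action labels, and class design are illustrative assumptions rather than features of the studied tool.

```python
# Hypothetical clinic-level policy mapping a predicted 30-day VT/VF risk
# probability to locally agreed workflow actions.
from dataclasses import dataclass

@dataclass
class ClinicPolicy:
    low_risk_cutoff: float = 0.10   # below this: no need to take action
    alarm_threshold: float = 0.28   # at/above this: contact the patient

    def action(self, risk_probability: float) -> str:
        if risk_probability >= self.alarm_threshold:
            return "need to contact the patient"
        if risk_probability < self.low_risk_cutoff:
            return "no need to take action"
        return "review at next scheduled follow-up"

# A clinic with stricter prioritization rules simply adjusts the cutoffs.
policy = ClinicPolicy(low_risk_cutoff=0.05, alarm_threshold=0.20)
print(policy.action(0.586))  # -> "need to contact the patient"
```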
Themes, insights, and illustrative quotes describing clinicians' preimplementation perspectives of the prediction tool.

Acceptability
Insights: Expectations that the prediction tool would bring new and more groundbreaking insights; overall usefulness of the prediction tool is "nice to have"; a clear purpose is decisive for acceptability; intention of use is tied to the prediction tool's task-technology fit; harmonizing individual treatment; and avoiding being influenced by recent experiences and reducing individual bias.
- Quote 1: […] it confirms the assessment you make, and that's fine, but it's not something groundbreaking, and that's okay too. [Interview, Electrophysiologist #5]
- Quote 2: […] I'm a little disappointed with the alarm, because for the cases I have looked at, the alarm has not given me much. […] if you had some cases with some ‘meat’ on such as a couple of treatment requiring VTs, I actually do think that getting a number, a risk score, will enable to better estimating the problem. I think that can be valuable […] I just think the cases were wrong. If you want to show that this algorithm gives value and then bring 2 cases with AF problems, which the algorithm does not handle, then that’s not optimal. [Interview, Electrophysiologist #2]
- Quote 3: It's always nice when something is supportive, I would say, but isn't it a “nice to have” and not a “need to have”? [Interview, Electrophysiologist #4]
- Quote 4: 20 years of experience or not. Perhaps, the advantage of the algorithm is that it is not influenced by what the individual clinician has experienced within the last month, and in this way helping to make more uniform conclusions. [Interview, Electrophysiologist #5]
Adoption
Insights: High precision is important; false positives hinder adoption; using the prediction tool and getting confirmation in real life creates trust and enables adoption; the clinical team needs to decide on use; a randomized clinical trial is a precondition for acceptability; and algorithm validation supports trust.
- Quote 5: If you want to come out with this, it must be something with a positive predictive value that is really good, so that you don’t get a lot of nonsense that you can’t use. The alarm should only be raised when there really is something. [Interview, Electrophysiologist #3]
- Quote 6: It needs to be easily accessible and we [team of electrophysiologists] have to agree that we trust it [the prediction tool]. We just have to say that yes it looks right. For example, the Optivol alarm had too many false positives, which gave a lot of extra work and everything, and we actually chose not to use it because there were too many sources of error, and you only really discover that when you work with it [new algorithms]. [Interview, Electrophysiologist #1]
- Quote 7: I would say that it [prediction tool] would be an instrument that would have to be accepted in our group and then you would find it valuable when we all agree to take the red alarms first, and in that way use it to prioritize a bit. [Interview, Electrophysiologist #5]
- Quote 8: I just think I should see that it confirms our decisions in enough cases - then I would feel comfortable about colleagues leaning on it […] There is something about trying it out, you know how it is. [Interview, Electrophysiologist #4]
- Quote 9: Published studies of the algorithm would increase confidence yes, because then you know that someone with an understanding of making these models have said that it looks okay; someone externally who have validated it. [Electrophysiologist #1]
Demand
Insights: Supporting better workflows; demand for prioritization and identification of important patient cases; increasing demand for screening tools in nonspecialist hospitals; demand for decreasing electrophysiologist’s information search time; supporting nurses and technicians to do prioritization work; and “data help” that enables junior doctors to get senior help.
- Quote 10: When transmissions come in, it’s almost an unsorted list of transmissions […] The list is unprocessed, so with the algorithm it takes it a step further by nuancing what comes into CareLink [Medtronic’s remote monitoring dashboard] with some semi-quantitative markings. And, if it is reliable, then it would be valuable. Partly because you don’t overlook anything, and partly because you are confirmed that we must take these patients first, because we have experience that there can be trouble here. [Interview, Electrophysiologist #5]
- Quote 11: You could say that in this way, the young doctor can do without getting senior help by actually getting data help. [Interview, Electrophysiologist #2]
Implementation
Insights: Demand for adjusting the threshold to local prioritization rules; clinics need to be able to configure the cutoff and threshold; making the prediction tool easily accessible and integrated in the list of transmissions supports the workflow; intention of use is dependent on easy access; and it is practical that the algorithm uses data that are already familiar to the clinicians.
- Quote 12: Electrophysiologists don’t bother to hear about it if it is below a certain percentage […]. We have adopted some rules, e.g. if you have a patient and she has got a shock, and gets rare therapies and it goes over, then we don’t need to hear about it because we think there is not a big risk. You might well imagine that introducing this alarm will support handling low-risk transmissions. [Interview, Electrophysiologist #3]
- Quote 13: If it’s easily presented and you don’t have to go in and look through 4 pages and such and if it was on the front page and brought up “number of episodes” and information like that - if you could easily retrieve the information [from the prediction tool] or if it was printed on the list of transmissions we are working on, then it would also be a great help. [Interview, Electrophysiologist #1]
- Quote 14: What one would emphasize, is that the algorithm uses the same data that the clinician uses i.e. it’s the same data, just integrated according to a formula that clinicians do not currently have available. [Interview, Electrophysiologist #5]
Discussion
Tackling the Sociotechnical Challenges of ML-Based Tools in Health Care
In bridging the sociotechnical gap between the development of ML-based tools and their clinical implementation, this study explored the feasibility and clinical perspectives of using a prediction tool for improved workflows in ICD remote monitoring. We found that the feasibility of the ML-based tool is promising when the intended use of the tool is aligned with expectations, that is, when it provides support for decision-making, visualizes useful information, and reduces time spent. The results also show that an actionable prediction tool is one that presents the reasons behind the algorithm's output, such as in this study, by highlighting important data to be used for clinical evaluation and enabling clinicians to assess the algorithm's outcome against their own evaluation [ , ].

However, the current prediction tool did not lead to a change in clinical action, suggesting that ML and explainability techniques do not outperform the evaluations of specialized and experienced electrophysiologists but at best confirm and support the interpretation of complex ICD device information, along with a promise of a less time-consuming clinical workflow.
The contribution of this paper lies in the implications of the qualitative results suggesting that clinical end-users, clinical contexts, and workflows must be included throughout an overall iterative approach to design, development, and implementation. In the following sections, we will discuss the qualitative results concerning the sociotechnical challenges and implementation of ML-based tools for clinical decision support.
Expectations Need to Align With the Intended Use of AI
In cases where misalignment emerged between the electrophysiologists' expectations and the intended use, the prediction tool was considered less useful and at best "nice to have" for clinical decision-making. For example, in cases where the ICD transmissions revolved around other types of arrhythmias than what the prediction tool was designed for, and in cases where the electrophysiologists expected the prediction tool to be capable of outperforming their own evaluation, disappointment was expressed about the performance of the underlying AI algorithm. This aligns with recent studies that reported on physicians' high expectations of and attitudes toward medical AI [ , , , ]. The challenge of managing expectations has been addressed by a growing number of studies aimed at providing an explanation of algorithmic decisions at the time of inference [ ] and by developing user interfaces with expectation adjustment techniques [ ]. Recently, researchers focused on the early human-AI onboarding process of pathologists and found that presenting a global view of a prediction tool and its capabilities, limitations, and biases is key to the formation of initial impressions and appropriate mental models [ ]. This suggests that the development of so-called explainability in the user interface is important, but communicating the intended use of the prediction tool is imperative for acceptance in the clinic. To achieve alignment of expectations, training programs for clinicians are critical when implementing medical AI tools.
Trust is another key factor for user acceptance and adoption of AI technologies. Trust is typically considered an issue of creating transparent and understandable algorithmic behavior, as opposed to seeing the prediction tool as a black box [ , , ]. Extensive research on explainable AI has suggested various approaches to achieving transparency [ ], yet experimental studies on whether these approaches achieve their intended effects in the real world are only just starting to emerge [ , , , ]. In this study, the electrophysiologists requested large-scale algorithm validation and prospective evaluations from clinical trials. However, an important observation was that trust in the prediction tool may only emerge from continuous use of the tool and from experiencing confirmation of individual evaluations in collaboration with the tool. There was general agreement among the electrophysiologists that visualization of the most important predictive parameters helped raise confidence and trust over time, and that adoption of the prediction tool would hinge on a collective decision among the team of electrophysiologists. Recent experimental studies have reported similar findings [ , ] and have demonstrated that adding an AI prediction tool to the clinical evaluation can increase clinician confidence [ ]. The implication of understanding trust as emerging from real-world use is that when deploying medical AI in clinical settings, trust needs to be built bottom-up through weeks or months of trialing the new tool for clinicians to experience convincing reassurance. Therefore, initial implementation processes may benefit from simultaneous calibration and adaptation of the tool to establish a human-AI partnership, allowing the local team of clinicians to decide collectively how they choose to trust and use the tool in the clinic.
While AI algorithms have been validated and shown to have similar or higher accuracy than humans, recent studies of AI deployment in clinical settings report that professional autonomy, workflow, and local sociotechnical factors affect how accuracy is perceived and used in clinical practice [ , , - , - ]. Bruun et al [ ] found that overall performance was positively impacted among clinicians using an AI prediction tool for assessing progression in early-stage dementia and that clinicians' professional autonomy affects the use of medical AI in situated clinical practice. Additionally, the study by Beede et al [ ] of an ML-based (deep learning) system used in clinics for the detection of diabetic eye disease indicated that several socioenvironmental factors, such as busy screening procedures, poor lighting conditions, and consideration of patient burden, affect how AI accuracy is perceived in clinical screening practices. Similarly, we found that high accuracy becomes relative to the electrophysiologist's evaluation of available information, the local circumstances, and the consequences that AI predictions have for taking action. For example, several electrophysiologists argued that an AI prediction needs to be considered against patient-reported symptoms and that a full patient schedule may affect how the AI prediction is acted upon in practice. Moreover, in several cases, the electrophysiologists found the visualization of important parameters more useful than the prediction score itself. This indicates that AI accuracy needs to be understood as relational and dependent on available information and local workflows, which supports the vision of establishing a human-AI symbiosis that combines the predictive abilities of both the clinician and the AI prediction algorithm [ , ]. Finally, the wish for better visualization of data parameters over prediction accuracy indicates that the development of medical AI assistants needs to be carried out as close as possible to implementation in clinical practice with clinical end-users, through iterative approaches [ , , ] that can bridge the "AI chasm" [ ] between scientifically sound algorithms and their use in meaningful real-world clinical applications.
The findings in this study are limited by the small number of study participants and patient cases. One electrophysiologist (PKJ) participated in the co-design workshops, resulting in potential positive bias. Patient cases were selected to represent diversity in prediction capabilities rather than the distribution in clinical practice, which may weaken the generalizability of the results. Only cases where the prediction tool provided true-positive and true-negative prediction outcomes were used, which means that the clinical feasibility of ML in cases with false-positive and false-negative outcomes [ ] has not been explored. Future studies are needed to assess the implications of false prediction outcomes, as well as to conduct algorithmic validation similar to recent related studies [ - ]. Limitations also involve data availability; that is, the data set used may entail algorithmic bias [ ]. Moreover, the study participants may have been more positive toward innovative AI technology, since all of them were from a tertiary university hospital and constituted a rather homogenous group of highly specialized physicians. The AI studied also has limitations, because only the random forest ML-based algorithm was evaluated with electrophysiologists. These types of methods are commonly used in medical applications [ , , ] because of their high classification accuracy and capabilities for handling data with imbalanced classes [ ], while providing easily accessible, if limited, global intelligibility through the visualization and ranking of parameter importance [ ]. This work will benefit from being validated in a large-scale multicenter study with higher diversity in participating electrophysiologists and workflows. It will be imperative to conduct prospective clinical trials evaluating the algorithm against standard care with regard to workload, cost-effectiveness, and hard clinical endpoints.

Conclusions
This study shows that a tool based on ML for the prediction of VT/VF in remote monitoring of ICD patients has the potential to support electrophysiologists’ decision-making. While the prediction tool was regarded as “nice to have” rather than “need to have” in its current form, the tool demonstrated potential for supporting clinical decision-making, as it provided reassurance, increased confidence, and indicated the potential for reducing information search time, as well as enabled delegation of decisions to nurses and technicians. The findings also indicate that trust in the prediction tool, acceptable data quality, and clearly defined intended use are decisive for end-user acceptance and that adoption hinges on successful clinical implementation. This suggests that clinical end-users’ sociotechnical contexts and workflows need to be taken into consideration early on and continuously throughout a participatory design process to address the sociotechnical gap between the development and implementation of medical AI in cardiac care.
Acknowledgments
We wish to thank the participating electrophysiologists at the Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark. The research and technology developed (working title: SafeHeart) were supported by the European Data Pitch Innovation Program H2020–732506 and led by TOA.
Conflicts of Interest
TOA is a co-founder of Vital Beats, which has commercial interests in the technology under investigation. MKHH and CV were full-time employees of Vital Beats. JHS, SZD, MCHL, and SM are affiliated with Vital Beats as advisors or independent researchers. The authors have no other conflicts of interest to disclose.
Questionnaire used before the electrophysiologists are presented with the prediction tool results.
PDF File (Adobe PDF File), 79 KB
Questionnaire after the electrophysiologists have received the prediction tool results.
PDF File (Adobe PDF File), 103 KB
The semistructured interview guide.
PDF File (Adobe PDF File), 53 KB

References
- Al-Khatib SM, Stevenson WG, Ackerman MJ, Bryant WJ, Callans DJ, Curtis AB, et al. 2017 AHA/ACC/HRS Guideline for Management of Patients With Ventricular Arrhythmias and the Prevention of Sudden Cardiac Death: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines and the Heart Rhythm Society. J Am Coll Cardiol 2018 Oct 02;72(14):e91-e220 [FREE Full text] [CrossRef] [Medline]
- Nanthakumar K, Epstein AE, Kay GN, Plumb VJ, Lee DS. Prophylactic implantable cardioverter-defibrillator therapy in patients with left ventricular systolic dysfunction: a pooled analysis of 10 primary prevention trials. J Am Coll Cardiol 2004 Dec 07;44(11):2166-2172 [FREE Full text] [CrossRef] [Medline]
- Varma N, Ricci RP. Telemedicine and cardiac implants: what is the benefit? Eur Heart J 2013 Jul 04;34(25):1885-1895 [FREE Full text] [CrossRef] [Medline]
- Mabo P, Victor F, Bazin P, Ahres S, Babuty D, Da Costa A, COMPAS Trial Investigators. A randomized trial of long-term remote monitoring of pacemaker recipients (the COMPAS trial). Eur Heart J 2012 May;33(9):1105-1111 [FREE Full text] [CrossRef] [Medline]
- Landolina M, Perego GB, Lunati M, Curnis A, Guenzati G, Vicentini A, et al. Remote monitoring reduces healthcare use and improves quality of care in heart failure patients with implantable defibrillators: the evolution of management strategies of heart failure patients with implantable defibrillators (EVOLVO) study. Circulation 2012 Jun 19;125(24):2985-2992 [FREE Full text] [CrossRef] [Medline]
- Saxon LA, Hayes DL, Gilliam FR, Heidenreich PA, Day J, Seth M, et al. Long-term outcome after ICD and CRT implantation and influence of remote device follow-up: the ALTITUDE survival study. Circulation 2010 Dec 07;122(23):2359-2367 [FREE Full text] [CrossRef] [Medline]
- Crossley GH, Boyle A, Vitense H, Chang Y, Mead RH, CONNECT Investigators. The CONNECT (Clinical Evaluation of Remote Notification to Reduce Time to Clinical Decision) trial: the value of wireless remote monitoring with automatic clinician alerts. J Am Coll Cardiol 2011 Mar 08;57(10):1181-1189 [FREE Full text] [CrossRef] [Medline]
- Cronin EM, Ching EA, Varma N, Martin DO, Wilkoff BL, Lindsay BD. Remote monitoring of cardiovascular devices: a time and activity analysis. Heart Rhythm 2012 Dec;9(12):1947-1951 [FREE Full text] [CrossRef] [Medline]
- Ricci RP, Morichelli L, D'Onofrio A, Calò L, Vaccari D, Zanotto G, et al. Manpower and outpatient clinic workload for remote monitoring of patients with cardiac implantable electronic devices: data from the HomeGuide Registry. J Cardiovasc Electrophysiol 2014 Nov 28;25(11):1216-1223 [FREE Full text] [CrossRef] [Medline]
- Andersen TO, Nielsen KD, Moll J, Svendsen JH. Unpacking telemonitoring work: Workload and telephone calls to patients in implanted cardiac device care. Int J Med Inform 2019 Sep;129:381-387 [FREE Full text] [CrossRef] [Medline]
- Gustafson PE. Implementing Remote Monitoring of Cardiac Implantable Electronic Devices: The Clinical Experience from One Center in Sweden. The Journal of Innovations in Cardiac Rhythm Management 2014;5:1733-1739 [FREE Full text] [CrossRef]
- Patel B, Sengupta P. Machine learning for predicting cardiac events: what does the future hold? Expert Rev Cardiovasc Ther 2020 Feb 23;18(2):77-84 [FREE Full text] [CrossRef] [Medline]
- Crowley RJ, Tan YJ, Ioannidis JPA. Empirical assessment of bias in machine learning diagnostic test accuracy studies. J Am Med Inform Assoc 2020 Jul 01;27(7):1092-1101 [FREE Full text] [CrossRef] [Medline]
- Lagu T, Pekow PS, Shieh M, Stefan M, Pack QR, Kashef MA, et al. Validation and Comparison of Seven Mortality Prediction Models for Hospitalized Patients With Acute Decompensated Heart Failure. Circ Heart Fail 2016 Aug;9(8):e002912 [FREE Full text] [CrossRef]
- Pan Y, Fu M, Cheng B, Tao X, Guo J. Enhanced Deep Learning Assisted Convolutional Neural Network for Heart Disease Prediction on the Internet of Medical Things Platform. IEEE Access 2020;8:189503-189512 [FREE Full text] [CrossRef]
- Eltrass AS, Tayel MB, Ammar AI. A new automated CNN deep learning approach for identification of ECG congestive heart failure and arrhythmia using constant-Q non-stationary Gabor transform. Biomed Signal Process Control 2021 Mar;65:102326 [FREE Full text] [CrossRef]
- Hammad M, Iliyasu AM, Subasi A, Ho ESL, El-Latif AAA. A Multitier Deep Learning Model for Arrhythmia Detection. IEEE Trans Instrum Meas 2021;70:1-9 [FREE Full text] [CrossRef]
- Shouval R, Hadanny A, Shlomo N, Iakobishvili Z, Unger R, Zahger D, et al. Machine learning for prediction of 30-day mortality after ST elevation myocardial infarction: An Acute Coronary Syndrome Israeli Survey data mining study. Int J Cardiol 2017 Nov 01;246:7-13 [FREE Full text] [CrossRef] [Medline]
- Karnan H, Natarajan S, Manivel R. Human machine interfacing technique for diagnosis of ventricular arrhythmia using supervisory machine learning algorithms. Concurrency Computat Pract Exper 2018 Oct 19;33(4):e5001 [FREE Full text] [CrossRef]
- Attia ZI, Noseworthy PA, Lopez-Jimenez F, Asirvatham SJ, Deshmukh AJ, Gersh BJ, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 2019 Sep 07;394(10201):861-867 [FREE Full text] [CrossRef] [Medline]
- Shakibfar S, Krause O, Lund-Andersen C, Aranda A, Moll J, Andersen TO, et al. Predicting electrical storms by remote monitoring of implantable cardioverter-defibrillator patients using machine learning. Europace 2019 Feb 01;21(2):268-274 [FREE Full text] [CrossRef] [Medline]
- Oh S, Kim JH, Choi S, Lee HJ, Hong J, Kwon SH. Physician Confidence in Artificial Intelligence: An Online Mobile Survey. J Med Internet Res 2019 Mar 25;21(3):e12422 [FREE Full text] [CrossRef] [Medline]
- Maassen O, Fritsch S, Palm J, Deffge S, Kunze J, Marx G, et al. Future Medical Artificial Intelligence Application Requirements and Expectations of Physicians in German University Hospitals: Web-Based Survey. J Med Internet Res 2021 Mar 05;23(3):e26646 [FREE Full text] [CrossRef] [Medline]
- Bruun M, Frederiksen KS, Rhodius-Meester HFM, Baroni M, Gjerum L, Koikkalainen J, et al. Impact of a clinical decision support tool on prediction of progression in early-stage dementia: a prospective validation study. Alzheimers Res Ther 2019 Mar 20;11(1):25 [FREE Full text] [CrossRef] [Medline]
- Manz C, Chivers C, Liu M, Regli SB, Changolkar S, Evans CN, et al. Prospective validation of a machine learning algorithm to predict short-term mortality among outpatients with cancer. J Clin Oncol 2020 May 20;38(15_suppl):2009 [FREE Full text] [CrossRef]
- Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med 2018;1:39 [FREE Full text] [CrossRef] [Medline]
- Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA 2016 Dec 13;316(22):2402-2410 [FREE Full text] [CrossRef] [Medline]
- Kang J, Morin O, Hong JC. Closing the Gap Between Machine Learning and Clinical Cancer Care-First Steps Into a Larger World. JAMA Oncol 2020 Nov 01;6(11):1731-1732 [FREE Full text] [CrossRef] [Medline]
- Beede E, Baylor E, Hersch F, Iurchenko A, Wilcox L, Ruamviboonsuk P, et al. A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy. In: CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 2020 Presented at: 2020 CHI Conference on Human Factors in Computing Systems; April 25-30, 2020; Honolulu, HI p. 1-12. [CrossRef]
- Chen JH, Asch SM. Machine Learning and Prediction in Medicine - Beyond the Peak of Inflated Expectations. N Engl J Med 2017 Jun 29;376(26):2507-2509 [FREE Full text] [CrossRef] [Medline]
- Lindsell CJ, Stead WW, Johnson KB. Action-Informed Artificial Intelligence-Matching the Algorithm to the Problem. JAMA 2020 Jun 02;323(21):2141-2142. [CrossRef] [Medline]
- Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019 Jan;25(1):44-56 [FREE Full text] [CrossRef] [Medline]
- Lee J. Is Artificial Intelligence Better Than Human Clinicians in Predicting Patient Outcomes? J Med Internet Res 2020 Aug 26;22(8):e19918 [FREE Full text] [CrossRef] [Medline]
- Cabitza F, Campagner A, Balsano C. Bridging the "last mile" gap between AI implementation and operation: "data awareness" that matters. Ann Transl Med 2020 Apr;8(7):501 [FREE Full text] [CrossRef] [Medline]
- Coiera E. The Last Mile: Where Artificial Intelligence Meets Reality. J Med Internet Res 2019 Nov 08;21(11):e16323 [FREE Full text] [CrossRef] [Medline]
- Abdul A, Vermeulen J, Wang D, Lim BY, Kankanhalli M. Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda. In: CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 2018 Presented at: 2018 CHI Conference on Human Factors in Computing Systems; April 21-26, 2018; Montreal, QC, Canada p. 1-18. [CrossRef]
- Wolf CT. Explainability scenarios: towards scenario-based XAI design. In: IUI '19: Proceedings of the 24th International Conference on Intelligent User Interfaces. 2019 Presented at: 24th International Conference on Intelligent User Interfaces; March 17-20, 2019; Marina del Ray, CA p. 252-257. [CrossRef]
- Miller T. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 2019 Feb;267:1-38 [FREE Full text] [CrossRef]
- Poursabzi-Sangdeh F, Goldstein DG, Hofman JM, Vaughan JW, Wallach H. Manipulating and Measuring Model Interpretability. In: CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 2021 Presented at: 2021 CHI Conference on Human Factors in Computing Systems; May 8-13, 2021; Yokohama, Japan p. 1-52. [CrossRef]
- Coiera EW. Artificial intelligence in medicine: the challenges ahead. J Am Med Inform Assoc 1996;3(6):363-366 [FREE Full text] [CrossRef] [Medline]
- Keane PA, Topol EJ. With an eye to AI and autonomous diagnosis. NPJ Digit Med 2018;1:40 [FREE Full text] [CrossRef] [Medline]
- Quiroz JC, Laranjo L, Kocaballi AB, Berkovsky S, Rezazadegan D, Coiera E. Challenges of developing a digital scribe to reduce clinical documentation burden. NPJ Digit Med 2019 Nov 22;2(1):114 [FREE Full text] [CrossRef] [Medline]
- Kocaballi AB, Ijaz K, Laranjo L, Quiroz JC, Rezazadegan D, Tong HL, et al. Envisioning an artificial intelligence documentation assistant for future primary care consultations: A co-design study with general practitioners. J Am Med Inform Assoc 2020 Nov 01;27(11):1695-1704 [FREE Full text] [CrossRef] [Medline]
- Petkovic D, Altman R, Wong M, Vigil A. Improving the explainability of Random Forest classifier - user centered approach. Pac Symp Biocomput 2018;23:204-215 [FREE Full text] [Medline]
- Benda NC, Das LT, Abramson EL, Blackburn K, Thoman A, Kaushal R, et al. "How did you get to this number?" Stakeholder needs for implementing predictive analytics: a pre-implementation qualitative study. J Am Med Inform Assoc 2020 May 01;27(5):709-716 [FREE Full text] [CrossRef] [Medline]
- Sendak M, Elish MC, Gao M, Futoma J, Ratliff W, Nichols M, et al. "The human body is a black box": supporting clinical decision-making with deep learning. In: FAT '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 2020 Presented at: 2020 Conference on Fairness, Accountability, and Transparency; January 27-30, 2020; Barcelona, Spain p. 99-109. [CrossRef]
- Elish MC, Watkins EA. Repairing Innovation: A Study of Integrating AI in Clinical Care. Data Society. 2020. URL: https://datasociety.net/wp-content/uploads/2020/09/Repairing-Innovation-DataSociety-20200930-1.pdf [accessed 2021-11-18]
- Andersen TO, Nunes F, Wilcox L, Kaziunas E, Matthiesen S, Magrabi F. Realizing AI in Healthcare: Challenges Appearing in the Wild. In: CHI EA '21: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. 2021 Presented at: 2021 CHI Conference on Human Factors in Computing Systems; May 8-13, 2021; Yokohama, Japan p. 1-5. [CrossRef]
- Ginestra JC, Giannini HM, Schweickert WD, Meadows L, Lynch MJ, Pavan K, et al. Clinician Perception of a Machine Learning–Based Early Warning System Designed to Predict Severe Sepsis and Septic Shock. Crit Care Med 2019;47(11):1477-1484 [FREE Full text] [CrossRef]
- Barda AJ, Horvat CM, Hochheiser H. A qualitative research framework for the design of user-centered displays of explanations for machine learning model predictions in healthcare. BMC Med Inform Decis Mak 2020 Oct 08;20(1):257 [FREE Full text] [CrossRef] [Medline]
- Romero-Brufau S, Wyatt KD, Boyum P, Mickelson M, Moore M, Cognetta-Rieke C. A lesson in implementation: A pre-post study of providers' experience with artificial intelligence-based clinical decision support. Int J Med Inform 2020 May;137:104072 [FREE Full text] [CrossRef] [Medline]
- Sandhu S, Lin AL, Brajer N, Sperling J, Ratliff W, Bedoya AD, et al. Integrating a Machine Learning System Into Clinical Workflows: Qualitative Study. J Med Internet Res 2020 Nov 19;22(11):e22421 [FREE Full text] [CrossRef] [Medline]
- Wang D, Wang L, Zhang Z, Wang D, Zhu H, Gao Y, et al. “Brilliant AI Doctor” in Rural Clinics: Challenges in AI-Powered Clinical Decision Support System Deployment. In: CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 2021 Presented at: 2021 CHI Conference on Human Factors in Computing Systems; May 8-13, 2021; Yokohama, Japan p. 1-18. [CrossRef]
- Yang Q, Steinfeld A, Zimmerman J. Unremarkable AI: Fitting Intelligent Decision Support into Critical, Clinical Decision-Making Processes. In: CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019 Presented at: 2019 CHI Conference on Human Factors in Computing Systems; May 4-9, 2019; Glasgow, Scotland p. 1-11. [CrossRef]
- Ribeiro MT, Singh S, Guestrin C. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In: KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016 Presented at: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 13-17, 2016; San Francisco, CA p. 1135-1144. [CrossRef]
- Andersen T, Bjørn P, Kensing F, Moll J. Designing for collaborative interpretation in telemonitoring: re-introducing patients as diagnostic agents. Int J Med Inform 2011 Aug;80(8):e112-e126 [FREE Full text] [CrossRef] [Medline]
- Altman NS. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. Am Stat 1992 Aug;46(3):175-185 [FREE Full text] [CrossRef]
- Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat 2001 Oct 1;29(5):1189-1232 [FREE Full text] [CrossRef]
- Freund Y, Schapire R. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence 1999 Sep;14(5):771-780 [FREE Full text]
- Chang CC, Lin CJ. LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2011 Apr;2(3):1-27 [FREE Full text] [CrossRef]
- Karim F, Majumdar S, Darabi H, Chen S. LSTM Fully Convolutional Networks for Time Series Classification. IEEE Access 2018;6:1662-1669 [FREE Full text] [CrossRef]
- Bowen DJ, Kreuter M, Spring B, Cofta-Woerpel L, Linnan L, Weiner D, et al. How we design feasibility studies. Am J Prev Med 2009 May;36(5):452-457 [FREE Full text] [CrossRef] [Medline]
- Li AC, Kannry JL, Kushniruk A, Chrimes D, McGinn TG, Edonyabo D, et al. Integrating usability testing and think-aloud protocol analysis with "near-live" clinical simulations in evaluating clinical decision support. Int J Med Inform 2012 Nov;81(11):761-772. [CrossRef] [Medline]
- Charmaz K. Constructing Grounded Theory: A Practical Guide through Qualitative Analysis. London: Sage Publications; 2006.
- Abdullah R, Fakieh B. Health Care Employees' Perceptions of the Use of Artificial Intelligence Applications: Survey Study. J Med Internet Res 2020 May 14;22(5):e17620 [FREE Full text] [CrossRef] [Medline]
- Kocielnik R, Amershi S, Bennett PN. Will You Accept an Imperfect AI? Exploring Designs for Adjusting End-user Expectations of AI Systems. In: CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019 Presented at: 2019 CHI Conference on Human Factors in Computing Systems; May 4-9, 2019; Glasgow, Scotland p. 1-14. [CrossRef]
- Cai CJ, Winter S, Steiner D, Wilcox L, Terry M. "Hello AI": Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-Making. Proc ACM Hum-Comput Interact 2019 Nov 07;3(CSCW):1-24. [CrossRef]
- LaRosa E, Danks D. Impacts on Trust of Healthcare AI. In: AIES '18: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. 2018 Presented at: 2018 AAAI/ACM Conference on AI, Ethics, and Society; February 2-3, 2018; New Orleans, LA p. 210-215. [CrossRef]
- Yin M, Wortman Vaughan J, Wallach H. Understanding the Effect of Accuracy on Trust in Machine Learning Models. In: CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019 Presented at: 2019 CHI Conference on Human Factors in Computing Systems; May 4-9, 2019; Glasgow, Scotland p. 1-12. [CrossRef]
- Amann J, Blasimme A, Vayena E, Frey D, Madai VI, Precise4Q consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak 2020 Nov 30;20(1):310 [FREE Full text] [CrossRef] [Medline]
- Maniatopoulos G, Procter R, Llewellyn S, Harvey G, Boyd A. Moving beyond local practice: reconfiguring the adoption of a breast cancer diagnostic technology. Soc Sci Med 2015 Apr;131:98-106 [FREE Full text] [CrossRef] [Medline]
- Wang D, Yang Q, Abdul A, Lim BY. Designing Theory-Driven User-Centric Explainable AI. In: CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019 Presented at: 2019 CHI Conference on Human Factors in Computing Systems; May 4-9, 2019; Glasgow, Scotland p. 1-15. [CrossRef]
- Gajowniczek K, Grzegorczyk I, Ząbkowski T. Reducing False Arrhythmia Alarms Using Different Methods of Probability and Class Assignment in Random Forest Learning Methods. Sensors (Basel) 2019 Apr 02;19(7):1588 [FREE Full text] [CrossRef] [Medline]
- Azar AT, Elshazly HI, Hassanien AE, Elkorany AM. A random forest classifier for lymph diseases. Comput Methods Programs Biomed 2014 Feb;113(2):465-473 [FREE Full text] [CrossRef] [Medline]
- Khalilia M, Chakraborty S, Popescu M. Predicting disease risks from highly imbalanced data using random forest. BMC Med Inform Decis Mak 2011 Jul 29;11(1):51 [FREE Full text] [CrossRef] [Medline]
Abbreviations
AI: artificial intelligence
ICD: implantable cardioverter defibrillator
LSTM: long short-term memory
ML: machine learning
VF: ventricular fibrillation
VT: ventricular tachycardia
Edited by A Kushniruk; submitted 05.01.21; peer-reviewed by T Canares, V Montmirail; comments to author 02.03.21; revised version received 23.03.21; accepted 11.10.21; published 26.11.21
Copyright © Stina Matthiesen, Søren Zöga Diederichsen, Mikkel Klitzing Hartmann Hansen, Christina Villumsen, Mats Christian Højbjerg Lassen, Peter Karl Jacobsen, Niels Risum, Bo Gregers Winkel, Berit T Philbert, Jesper Hastrup Svendsen, Tariq Osman Andersen. Originally published in JMIR Human Factors (https://humanfactors.jmir.org), 26.11.2021.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication on https://humanfactors.jmir.org, as well as this copyright and license information must be included.