
Search Results (1 to 10 of 284 Results)



Authors’ Reply: Citation Accuracy Challenges Posed by Large Language Models

We appreciate the thoughtful critique of our manuscript “Perceptions and earliest experiences of medical students and faculty with ChatGPT in medical education: qualitative study” [1] by Zhao and Zhang [2]. Concerns over the generation of hallucinated citations by large language models (LLMs), such as OpenAI’s ChatGPT, Google’s Gemini, and Hangzhou-based DeepSeek, warrant exploring advanced and novel methodologies to ensure citation accuracy and overall output integrity [3].

Mohamad-Hani Temsah, Ayman Al-Eyadhy, Amr Jamal, Khalid Alhasan, Khalid H Malki

JMIR Med Educ 2025;11:e73698

Citation Accuracy Challenges Posed by Large Language Models

Large language models (LLMs) such as DeepSeek, ChatGPT, and ChatGLM have significant limitations in generating citations, raising concerns about the quality and reliability of academic research. These models tend to produce citations that are correctly formatted but fictional in content, misleading users and undermining academic rigor.

Manlin Zhang, Tianyu Zhao

JMIR Med Educ 2025;11:e72998

Authors’ Reply: The Importance of Comparing New Technologies (AI) to Existing Tools for Patient Education on Common Dermatologic Conditions: A Commentary

Juels Parker commented on our study comparing the sufficiency of ChatGPT, Google Bard, and Bing artificial intelligence (AI) in generating patient-facing responses to questions about five dermatological diagnoses [1,2]. He highlights an important need to compare AI to existing patient education tools, such as handouts, peer-reviewed articles, and patient-centered websites. We agree that AI is not a benign entity and that many resources besides AI exist for patients to learn about their conditions [3,4].

Courtney Chau, Hao Feng, Gabriela Cobos, Joyce Park

JMIR Dermatol 2025;8:e72540

The Importance of Comparing New Technologies (AI) to Existing Tools for Patient Education on Common Dermatologic Conditions: A Commentary

Reference 1: The comparative sufficiency of ChatGPT, Google Bard, and Bing AI in answering diagnosis… Theme Issue (2023): AI and ChatGPT in Dermatology

Parker Juels

JMIR Dermatol 2025;8:e71768

Evaluating the Diagnostic Accuracy of ChatGPT-4 Omni and ChatGPT-4 Turbo in Identifying Melanoma: Comparative Study

There has been a rapid popularization of the LLM ChatGPT for home-based medical inquiries [3]. Minimal research exists on ChatGPT’s accuracy in detecting melanoma. Given that patients are increasingly presenting internet-derived diagnostics during cancer consultations, it is imperative to understand the capabilities of commonly used AI engines, such as ChatGPT [4].

Samantha S. Sattler, Nitin Chetla, Matthew Chen, Tamer Rajai Hage, Joseph Chang, William Young Guo, Jeremy Hugh

JMIR Dermatol 2025;8:e67551

The Impact of ChatGPT Exposure on User Interactions With a Motivational Interviewing Chatbot: Quasi-Experimental Study

To determine the extent of exposure to ChatGPT, we included an additional short survey, referred to as the ChatGPT survey, in the 1-week-later survey for each participant in MIBot (version 5.2A). It contained 8 new questions designed to evaluate the participant’s knowledge and use of ChatGPT prior to engaging with MIBot (version 5.2A). The full ChatGPT survey can be found in Multimedia Appendix 1.

Jiading Zhu, Alec Dong, Cindy Wang, Scott Veldhuizen, Mohamed Abdelwahab, Andrew Brown, Peter Selby, Jonathan Rose

JMIR Form Res 2025;9:e56973

Assessing the Diagnostic Accuracy of ChatGPT-4 in Identifying Diverse Skin Lesions Against Squamous and Basal Cell Carcinoma

We assessed the ability of ChatGPT to distinguish images of SCC and BCC from other lesions. OpenAI’s application programming interface was used to query ChatGPT-4 Omni (ChatGPT-4o) to assess its performance in classifying 200 dermatoscopic images each of SCC, BCC, benign keratosis, and melanocytic nevi, and 150 images of actinic keratosis from the HAM10000 database [7]. Images were verified using histopathology (>50%), follow-up examination, expert consensus, or in vivo confocal microscopy.

Nitin Chetla, Matthew Chen, Joseph Chang, Aaron Smith, Tamer Rajai Hage, Romil Patel, Alana Gardner, Bridget Bryer

JMIR Dermatol 2025;8:e67299

Performance of Plug-In Augmented ChatGPT and Its Ability to Quantify Uncertainty: Simulation Study on the German Medical Board Examination

While it is not the user (eg, researchers or clinicians) but the manufacturer (eg, OpenAI for the ChatGPT models) who assigns an intended medical use, which itself comes with further regulatory requirements, the clinical use of the currently available, mostly all-purpose LLMs remains challenging.

Julian Madrid, Philipp Diehl, Mischa Selig, Bernd Rolauffs, Felix Patricius Hans, Hans-Jörg Busch, Tobias Scheef, Leo Benning

JMIR Med Educ 2025;11:e58375

Exploring Biases of Large Language Models in the Field of Mental Health: Comparative Questionnaire Study of the Effect of Gender and Sexual Orientation in Anorexia Nervosa and Bulimia Nervosa Case Vignettes

For example, ChatGPT-3.5 performed poorly in diagnosing an infectious disease known to be widely underdiagnosed [26]. Furthermore, ChatGPT-3.5 made different treatment recommendations based on insurance status, which might introduce health disparities [27]. When generating clinical cases, ChatGPT-4 failed to create cases that depicted demographic diversity and relied on stereotypes when choosing gender or ethnicity [28].

Rebekka Schnepper, Noa Roemmel, Rainer Schaefert, Lena Lambrecht-Walzinger, Gunther Meinlschmidt

JMIR Ment Health 2025;12:e57986