This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication on http://humanfactors.jmir.org, as well as this copyright and license information must be included.
Mental health apps tend to be narrow in function, focusing mostly on tracking, management, or psychoeducation. It is unclear what capability such apps have to facilitate change in users, particularly in terms of learning key constructs relating to behavioral interventions. Thought Challenger (CBITs, Chicago) is a skill-building app that engages users in cognitive restructuring, a core component of cognitive therapy (CT) for depression.
The purpose of this study was to evaluate the learnability and learning performance of users following initial use of Thought Challenger.
Twenty adults completed in-lab usability testing of Thought Challenger, which comprised two interactions with the app. Learnability was measured via completion times, error rates, and psychologist ratings of user entries in the app; learning performance was measured via a test of CT knowledge and skills. Nonparametric tests were conducted to evaluate differences between individuals with no or mild depression and those with moderate to severe depression, as well as differences in completion times and in pre- and posttest scores.
Across the two interactions, the majority of completion times were found to be acceptable (5 min or less), with minimal errors (1.2%, 10/840) and successful completion of CT thought records. Furthermore, CT knowledge and skills significantly improved after the initial use of Thought Challenger (
The learning objectives for Thought Challenger during initial uses were successfully met in an evaluation with likely end users. The findings therefore suggest that apps are capable of providing users with opportunities for learning of intervention skills.
Commercially available mental health apps have been rapidly emerging over recent years, and demand for them is high [
Most apps with a focus on mental health are designed with a narrow functionality, focusing primarily on providing information to users as a way to enhance learning about their mental health symptoms or their management [
One such skills-based app is Thought Challenger, an app currently available through the Google Play Store [
The focus of CT is on educating patients about the impact of their thoughts on their mood while demonstrating how identifying, appraising, and modifying thoughts can lead to ultimate symptom reduction [
The effectiveness of behavioral health intervention apps to achieve proximal goals purported to lead to ultimate symptom change is rarely evaluated. Apps are most often evaluated using randomized controlled trials; many researchers, however, have noted the limitations of these trials in the evaluation of mobile app technologies [
It is also important to evaluate how well a user will learn a depression intervention skill through the use of an app, without first reviewing any instructions. The evaluation of learning without instruction is important, given that most users are unlikely to engage with instructions or help materials before use, despite the likely benefits of doing so [
Despite the growth in skills-based apps for mental health, the efficacy of such apps in promoting skills-based learning through their use is unknown. Furthermore, it has recently been documented that mental health providers may have concerns about the credibility and risk associated with treatment provided via mobile phone apps [
We first describe Thought Challenger, then the framework for the evaluation of the app, and finally the specific procedures of the usability testing.
Thought Challenger, currently available through the Google Play Store, was informed by CT. It was specifically designed to aid users in engaging in the CT-based technique of thought restructuring. This process involves identifying thought distortions, which are unhelpful or erroneous thoughts that occur automatically but cause distress or mood changes in a person. Following the identification of such thought distortions, thought restructuring involves asking oneself questions to help challenge this distorted thought and to come up with a more helpful alternative thought [
Thought Challenger has two functions: challenge and review. The challenge feature is a tool designed to help restructure each thought through 5 steps: (1) “Catch It”: enter a recent maladaptive thought; (2) “Check It”: reflective questions are posed regarding the thought; (3) “Choose a Distortion”: identify in which type of cognitive distortion the thought likely falls; (4) Consider reflective questions tailored to the chosen type of distortion; and (5) “Change It”: enter a more adaptive thought. Within steps 1 and 5, Thought Challenger provides examples of possible maladaptive and adaptive thoughts, which users may select and use in their interaction with the thought restructuring tool. Thought Challenger also provides a review function so that users can see their past entries of all thoughts, listed by automatic thought, rational response, distortion, and date and time of interaction.
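The five-step challenge flow and the fields listed on the review screen can be sketched as a simple data model. This is a hypothetical illustration, not the app's actual code: the class, field names, and distortion labels below are assumptions chosen to mirror the description above.

```python
# Hypothetical sketch of a Thought Challenger thought record; field names
# and the distortion list are illustrative assumptions, not the app's code.
from dataclasses import dataclass, field
from datetime import datetime

# Illustrative subset of common CT cognitive-distortion labels.
DISTORTIONS = {
    "all-or-nothing", "catastrophizing", "mind reading", "overgeneralization",
}

@dataclass
class ThoughtRecord:
    automatic_thought: str   # step 1, "Catch It": a recent maladaptive thought
    distortion: str          # step 3, "Choose a Distortion"
    rational_response: str   # step 5, "Change It": a more adaptive thought
    created_at: datetime = field(default_factory=datetime.now)

    def __post_init__(self):
        if self.distortion not in DISTORTIONS:
            raise ValueError(f"unknown distortion: {self.distortion}")

def review(records):
    """Mirror the review screen: past entries listed by automatic thought,
    rational response, distortion, and date/time of interaction."""
    return [(r.automatic_thought, r.rational_response, r.distortion,
             r.created_at) for r in sorted(records, key=lambda r: r.created_at)]
```

Steps 2 and 4 (the reflective questions) are conversational prompts rather than stored fields, so they are omitted from this sketch.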
Attributes are usability features that measure different usability qualities of technology products [
Usability attributes and their application to learning evaluation.
Qualifier | Learnability | Learning performance |
Description | Level of ease through which a user gains proficiency | Actual impact on performance of a task/acquisition of knowledge |
Tasks for testing | Complete two attempts at using the Thought Challenger tool | Complete a pre- and posttest of cognitive therapy knowledge and skills |
Measurement via | Time to complete interactions | Scores on pre- and posttest |
Learning objectives | Identify how user interacts without instruction or didactic material | Measure change in knowledge of cognitive therapy skills and concepts following initial use |
Learnability is defined as the level of ease through which a user gains proficiency with an app [
Learning performance is an attribute of usability relating to the actual impact of a technology on the performance of a task or acquisition of knowledge, such as the ability of a technology to aid in increasing capabilities to complete assignments in a classroom [
Participants were recruited from July to August 2015 via Web-based postings in the Chicago area of the United States, resulting in a sample of 20 adults. Inclusion criteria required that participants were at least 18 years of age, able to attend an in-lab testing session, and able to speak and read in English. As depression is a condition that is frequently chronic, characterized by patterns of remissions and relapses [
Participants were invited to a laboratory room located within Northwestern University’s Feinberg School of Medicine and were accompanied by a moderator, who provided guidance and noted participants’ actions throughout the testing session. Before the testing of Thought Challenger, participants engaged in a card-sorting task to identify the barriers to the use of apps for depression [
Traditional data collection methodologies, which have been successfully used in other evaluations of apps [
Study data were collected and managed using REDCap (Research Electronic Data Capture) tools hosted at Northwestern University [
At screening, participants provided demographic information (ie, gender, race/ethnicity, age, education, and employment status) and completed the PHQ-9 and the CT Tool Knowledge and Skill Pretest [
The PHQ-9 is a 9-item self-report instrument measuring depressive symptomology with scores ranging from 0 to 27 [
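The PHQ-9's conventional severity bands, and the cutoff of 10 used to group participants in this study, can be expressed as a small helper. The function names are illustrative assumptions, not part of the study materials.

```python
# Illustrative helpers (hypothetical names) encoding the conventional
# PHQ-9 severity bands and the >= 10 grouping cutoff used in this study.

def phq9_severity(score):
    """Map a PHQ-9 total (0-27) to its conventional severity band."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    for cutoff, label in [(4, "minimal"), (9, "mild"), (14, "moderate"),
                          (19, "moderately severe"), (27, "severe")]:
        if score <= cutoff:
            return label

def above_referral_threshold(score):
    """The study grouped participants by the common cutoff of >= 10."""
    return score >= 10
```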
The thought record entries in Thought Challenger were collected to measure success of users in Thought Challenger tool use, that is, identifying how accurately users engaged in thought restructuring on the app. Following the completion of all testing sessions, doctoral-level clinical psychologists blindly rated participants’ entries of maladaptive thoughts, assignment of type of cognitive distortion, and entries of alternative thoughts across their two interactions with the tool (such that each complete entry was rated by 2 separate psychologists). The psychologists were instructed to evaluate the entries as if they were thought records, a tool typically administered via paper, handed out in face-to-face CT to enable the practice of thought restructuring [
Given the small sample size and anticipated non-normal distribution (ie, participants ranging from no depressive symptoms to severe symptoms), nonparametric tests were conducted to analyze quantitative usability testing data. Wilcoxon signed-rank tests were used to compare time to completion of the tool interaction on the first and second attempts, as well as scores before and after the interaction with Thought Challenger. To ensure that there were no significant differences between the participants recruited with PHQ-9 scores above and below 10, Mann-Whitney U tests were conducted.
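For illustration, the test statistics behind these two nonparametric procedures can be computed from scratch. The helper functions and sample data below are hypothetical; a real analysis would use a statistics package that also yields p values.

```python
# Minimal stdlib sketch of the two nonparametric statistics used here:
# the Wilcoxon signed-rank W (paired) and the Mann-Whitney U (two-group).
# Function names and data are illustrative, not from the study's analysis code.

def ranks(values):
    """1-based average ranks, with ties sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def wilcoxon_w(before, after):
    """Signed-rank statistic: sum of ranks of positive differences
    (zero differences dropped, per the standard procedure)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    rk = ranks([abs(d) for d in diffs])
    return sum(r for d, r in zip(diffs, rk) if d > 0)

def mann_whitney_u(x, y):
    """U for sample x: count of pairs where x beats y (+0.5 for ties)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)
```

The resulting W or U statistic is then compared against its null distribution (or a normal approximation) to obtain the p values reported in the Results.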
Usability testing sample characteristics.
Demographic | PHQ-9a<10 | PHQ-9≥10 | Total |
Female, n (%) | 7 (63.6) | 8 (88.9) | 15 (75) |
Age in years, mean (standard deviation) | 34.5 (10.3) | 40.6 (14.0) | 37.2 (12.2) |
African American, n (%) | 4 (36.4) | 1 (16.7) | 5 (25) |
Asian, n (%) | 2 (18.1) | 0 (0) | 2 (10) |
Hispanic white, n (%) | 1 (9.1) | 0 (0) | 1 (5) |
Non-Hispanic white, n (%) | 5 (45.5) | 8 (88.9) | 13 (65) |
PHQ-9, mean (standard deviation) | 3.8 (3.2) | 14.4 (5.8) | 8.6 (7.0) |
History of depression, n (%) | 2 (18.2) | 7 (77.8) | 9 (45) |
History of anxiety, n (%) | 2 (18.2) | 5 (55.6) | 7 (35) |
aPHQ-9: Patient Health Questionnaire-9.
Tool interaction completion times, median (interquartile range).
Time point | PHQ-9a<10 | PHQ-9≥10 | Total |
Time 1 | 4:13 (4:01) | 3:57 (7:30) | 4:05 (4:04) |
Time 2 | 2:08 (1:11) | 3:57 (3:40) | 2:34 (2:00) |
aPHQ-9: Patient Health Questionnaire-9.
A total of 10 errors occurred across participants' two interactions with the Thought Challenger tool. On the first attempt at the Thought Challenger challenge interaction, 9 mistakes were made, relating to attempts to interact with the Thought Challenger word cloud on the home screen (ie, clicking on the word cloud rather than a button), selecting “Review” rather than “Challenge” to begin challenging a thought, and uncertainty about how to proceed with the remaining challenge steps after first entering a maladaptive thought (eg, “I entered my thought in like it said, now what?”). No slips or fatal errors occurred for any participants across the first interaction.
On the second interaction with the Thought Challenger challenge tool, one fatal error occurred: because of frustration saturation (ie, “I don’t want to start all over again and re-enter everything.”), the user was unable to complete the task even with provided instruction and guidance. This fatal error occurred when the user clicked “cancel” while entering data into the challenge tool; Thought Challenger returned the user to the home screen without saving the entered data and without warning that the data would be lost. This is an example of violating the usability heuristic of error prevention, which guides designers to reduce or eliminate conditions that are likely to lead to errors in interactions [
The total error rate for all initial interactions with the Thought Challenger tool was calculated as 10 (errors)/(21 [error opportunities] x 2 [number of interactions] x 20 [participants])=.012. The error rate on initial interactions with Thought Challenger’s tool was therefore 1.2%.
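The error-rate arithmetic above can be reproduced directly; the variable names below are ours, and the denominator follows the construction given in the text.

```python
# Error-rate calculation from the text: observed errors divided by
# total error opportunities across both interactions for all participants.
errors = 10
error_opportunities = 21  # opportunities per interaction, per the text
interactions = 2
participants = 20

rate = errors / (error_opportunities * interactions * participants)
print(round(rate * 100, 1))  # prints 1.2 (percent), ie, 10/840
```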
The majority of tool entries were rated as appropriate by doctoral-level psychologists, with 75% (30/40) success in entries of a maladaptive thought, 51% (20/39) success in choice of type of thought distortion, and 74% (29/39) success in the entry of an adaptive thought. Consistent with findings from face-to-face treatment, the overall rate of success met the benchmark of 63% or greater [
To identify learning performance of users following use of Thought Challenger, all participants completed a pre- and posttest of CT skills and knowledge.
Cognitive therapy pre- and posttest scores, median (interquartile range).
Time point | PHQ-9a<10 | PHQ-9≥10 | Total |
Pretest | 26.0 (11.0) | 29.0 (5.5) | 28.5 (11.3) |
Posttest | 29.0 (6.0) | 32.0 (10.0) | 31.0 (6.8) |
aPHQ-9: Patient Health Questionnaire-9.
No significant differences between the two groups above and below the threshold for a referral to psychotherapy were identified in completion times or in performance on the pre- and posttest of CT skills and knowledge (
This study aimed to evaluate CT learning during initial interactions with a publicly deployed, skills-based app for depression [
Thought Challenger met the evaluated learning objectives: users created entries in the tool that met the standard of accurately reflecting CT thought records at a rate of about 75%. This exceeded the benchmark of 63% of patients who were able to accurately complete such records as between-session homework throughout treatment [
Thought Challenger was able to impact learning without requiring users to read or engage with didactic content. This is in contrast to most currently available mental health apps, which focus on providing information about symptoms and/or their management (ie, inform) [
Although Thought Challenger met the criteria for learnability and learning performance established for this study, the evaluation indicated opportunities for improvement of the app. First, a fatal error occurred (ie, an error that prevented the user from completing the task even with provided instruction/guidance) [
There are several limitations and caveats that should be considered in interpreting these results. First, this was an evaluation of learnability and learning performance of Thought Challenger following initial use. It is unclear how these results would apply to long-term use, knowledge, skill application, or symptom reduction. Furthermore, as an evaluation of learning, there was no opportunity for comparison to other apps that promote learning (eg, different skills and psychoeducation only). Second, this study examined Thought Challenger in the context of users with symptom severity ranging from absent to severe depression, with the majority in the mild depressive range. It is unclear how these findings extend to users with other psychiatric or medical comorbidities. Third, while in-lab sessions were chosen over field-testing for multiple reasons, it is possible that the presence of a session moderator impacted user confidence or performance in a way that might have differed from field use. Finally, because of geographical limitations, the sample comprised urban and primarily younger users; it is unclear how well these findings extend to users in differing geographical locations and demographic groups.
This study employed usability methodology [
To the best of our knowledge, this is the first use of usability testing methods to evaluate learning in an app intended to help users learn and practice an intervention skill. Future research is needed to explore the role of learning in such apps and how to continue to improve skills-based learning, particularly in users with depression. This will promote improved design and dissemination of such apps. There has been some noted skepticism among clinicians about the efficacy of mental health apps. However, the findings from this study suggest that users can learn to complete a therapeutic intervention skill effectively through the use of a mobile tool alone, without engaging in didactic content.
Center for Behavioral Intervention Technologies
cognitive therapy
Cognitive Therapy Awareness Scale
interquartile range
institutional review board
Patient Health Questionnaire-9
Research Electronic Data Capture
We are grateful for support from the United States National Institutes of Health, including R01 MH100482 (PI: Mohr); K08 MH102336 (PI: Schueller); and F31 MH106321 (PI: Stiles-Shields). This project was also supported by NIH/NCRR Colorado CTSI Grant Number UL1 RR025780. The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies. We would especially like to thank Drs Ellen Koucky, Kristina Pecora, and Kate N Tomasino for their assistance with this project.
None declared.