Background: The assessment of usability is a complex process that involves several steps and procedures. It is important to standardize the evaluation and reporting of usability procedures across studies to guide researchers, facilitate comparisons across studies, and promote high-quality usability studies. The first step to standardizing is to have an overview of how usability study procedures are reported across the literature.
Objective: This scoping review of reviews aims to synthesize the procedures reported for the different steps of the process of conducting a user-centered usability assessment of digital solutions relevant for older adults and to identify potential gaps in the present reporting of procedures. The secondary aim is to identify any principles or frameworks guiding this assessment in view of a standardized approach.
Methods: This is a scoping review of reviews. A 5-stage scoping review methodology was used to identify and describe relevant literature published between 2009 and 2020 as follows: identify the research question, identify relevant studies, select studies for review, chart data from selected literature, and summarize and report results. The search was conducted in 5 electronic databases: PubMed, ACM Digital Library, IEEE, Scopus, and Web of Science. Reviews that met the inclusion criteria (reporting on user-centered usability evaluation procedures for any digital solution that could be relevant for older adults and published in English) were identified, and data were extracted for further analysis regarding study evaluators, study participants, methods and techniques, tasks, and test environment.
Results: A total of 3958 articles were identified. After a detailed screening, 20 reviews matched the eligibility criteria. The characteristics of the study evaluators and participants and task procedures were only briefly and differently reported. The methods and techniques used for the assessment of usability are the topics that were most commonly and comprehensively reported in the reviews, whereas the test environment was seldom and poorly characterized.
Conclusions: A lack of a detailed description of several steps of the process of assessing usability and no evidence on good practices of performing it suggests that there is a need for a consensus framework on the assessment of user-centered usability evaluation. Such a consensus would inform researchers and allow standardization of procedures, which are likely to result in improved study quality and reporting, increased sensitivity of the usability assessment, and improved comparability across studies and digital solutions. Our findings also highlight the need to investigate whether different ways of assessing usability are more sensitive than others. These findings need to be considered in light of review limitations.
Digital solutions, defined as any set of technologies, systems, and mobile apps that are available on a digital device such as an iPad, a laptop, or a smartphone, have become popular in different areas, namely to optimize and personalize health care provision, to promote healthy lifestyles (eg, physical activity), to minimize loneliness and social exclusion by promoting social, religious, civic, and political participation, or to improve safety, independence, and confidence.
The accelerated aging of the population imposes several challenges on the health care and social systems. Owing to the higher rates of disease and morbidity, digital solutions have been noted as a valid contributor to help reach a high number of individuals at lower costs. However, developing digital solutions adjusted to older adults presents specific challenges related to age and disease, such as loss of visual and hearing acuity or changes in fine motricity. These need to be considered so that the technology matches the users’ needs and characteristics and, ultimately, its use results in an added value in daily life. To guarantee that a digital solution is fully adjusted to its users, a robust evaluation process must be considered. One of the key attributes of digital solutions that require careful attention and evaluation is usability.
Usability is part of the user experience, that is, the total usage phenomenon, and is defined as the measure by which a product can be used by specific users to achieve specific goals with effectiveness, efficiency, and satisfaction in a specific context of use. Effectiveness refers to the degree of accuracy and completeness with which users achieve certain goals in a given environment, efficiency relates the accuracy and completeness of the goals achieved to the resources used, and satisfaction is defined as the comfort and acceptance of the use of a system. Furthermore, the level of usability obtained depends on the specific circumstances in which the product is used. The usage context includes users, tasks, equipment (hardware and software), and the physical and social environment, as all of these factors can influence the usability of digital solutions. In other words, usability is the ability of a product to be understood, learned, and used and to be attractive to the user when used under specific conditions. This definition reinforces the idea that a product has no intrinsic usability and only the ability to be used under specific conditions. Good usability reduces task execution times, errors, and learning times; improves user satisfaction; and leads to improved product acceptability and reliability.
Usability evaluation is an important part of the overall development of user interaction mechanisms, which consists of iterative cycles of design, prototyping, and validation. Ideally, usability evaluation must be present at all development stages and must be iterative to enable a continuous evolution of the quality of the product or service. The literature describes several models, methods, and techniques to ensure that usability issues are considered during the development process. The selection of these models, methods, and techniques depends on the development stage of digital solutions and available resources. Certain models of usability assessment rely on usability experts, whereas others rely on end users (user-centered usability assessment). The former are known as the analytical models and involve the inspection of the digital solution by experts to assess the various aspects of user interaction against an established set of principles of interface design and usability. The latter are known as the empirical models and involve gathering the perspective of users. They are key to highly usable digital solutions because they ensure that the digital solutions meet the users’ needs and requirements, that is, that they are adapted to the body and mind of their user in a given context. This perspective is gathered using different methods (eg, test and inquiry) and techniques (eg, interviews, think-aloud, and observation), which are usually combined. Both models are essential in the development process of digital solutions and provide complementary information. This review focuses on the users’ assessment of usability.
Usability assessment involving users is a complex task, and the use of only one method (eg, test or inquiry) may not be comprehensive enough to thoroughly consider all relevant issues associated with a given product or service. In addition, different methods have different strengths and weaknesses and provide information on different aspects of the digital solution. Nevertheless, it is important to standardize the evaluation and reporting of usability procedures across studies. This will guide researchers, facilitate comparisons across studies, promote high-quality usability studies, which would be more likely to identify usability problems, and provide relevant data that contribute to highly usable solutions. The first step to standardizing is to provide an overview of how user-centered usability evaluation procedures are reported in the literature.
This scoping review of reviews aims to synthesize the procedures used or reported for the different steps of the process of conducting a user-centered usability assessment of digital solutions relevant for older adults and identify potential gaps in the present reporting of procedures. The secondary aim is to identify the principles guiding this assessment.
This study followed the 5-stage scoping review methodology defined by Levac et al, based on the framework previously developed by Arksey and O’Malley. The stages include (1) identification of the research question, (2) identification of relevant studies, (3) selection of relevant studies, (4) charting the data, and (5) collating, summarizing, and reporting the results of the review. A scoping review of the literature aims to map key concepts, summarize a range of evidence, especially in complex fields, and identify gaps in the existing literature. It allows for broader perspectives in comparison with systematic reviews and, therefore, was the appropriate approach for this study, in which we aimed to cover a broad range of usability evaluation procedures and identify gaps to direct future research.
Identification of the Research Question
The research question provides a roadmap for the subsequent stages of the review. It was defined based on the analysis of the literature in the field of usability evaluation of digital solutions and the expertise of the research team. During our previous work in the field of usability evaluation, we identified a lack of consensus in the academic literature regarding the instruments, protocols, and methodologies used for assessing usability across a range of digital solutions (eg, websites, assistive technology, augmented reality). Therefore, to gain more in-depth knowledge of the practices and procedures used, the following research question was defined: What are the current practices (eg, procedures and instruments) for the user-centered assessment of the usability of digital solutions relevant (ie, that could be used and have value) for the older adult population? This broad question was subdivided into 5 research questions: (1) What are the characteristics of study evaluators reported in user-centered usability studies for digital solutions relevant to older adults? (2) What are the characteristics of study participants reported in user-centered usability studies for digital solutions relevant to older adults? (3) What are the tasks used in user-centered usability studies for digital solutions relevant to older adults? (4) What are the methods and techniques used in user-centered usability studies for digital solutions relevant to older adults? and (5) Where (ie, the environment) do user-centered usability evaluations take place?
Identification of Relevant Studies
The search expression "usability OR user experience" was used in the electronic search carried out in PubMed, ACM Digital Library, IEEE, Scopus, and Web of Science. The search expression did not include older adults as we did not want to limit the inclusion of reviews to those specifically mentioning older adults. Databases were searched for English language reviews published between January 1, 2009, and January 23, 2020. The lower limit of 2009 was established because 2007 was the year in which the European Commission launched the Ambient Assisted Living Joint Programme, a transnational funding program exclusively focused on the research and development of digital solutions directed at older adults. Therefore, we searched for reviews from 2009 onward, which would cover the primary studies published after 2007.
Selection of Relevant Studies
All references were imported into Mendeley software (Elsevier, North-Holland) through which duplicates were removed. The first 300 abstracts were screened by 3 reviewers (HC, AS, and NR). Differences in judgment were used to refine the inclusion and exclusion criteria and were discussed until consensus was reached. This first phase of screening also served to build a common understanding of the inclusion and exclusion criteria. Screening of the remaining abstracts was performed by 1 reviewer (HC). Similarly, the first 10 full articles were screened by 2 reviewers (HC and AS), and differences in judgment were discussed until consensus was reached. If consensus was difficult to attain, a third reviewer who is a senior reviewer and an expert on usability (NR) was consulted. The remaining full papers were independently screened by one of these 3 reviewers.
To be included in this scoping review, studies had to report on user-centered usability procedures or methods of evaluation for any type of digital solution that could be relevant for older adults and that was (1) published in English; (2) a review, either systematic, scoping, or narrative review; (3) addressing and synthesizing evidence on any of the steps or methodologies used for usability assessment; and (4) addressing usability in general or for a specific digital solution that was considered relevant (this was a subjective judgment made by the authors of the review) to older adults or those caring for older adults, such as informal caregivers, family members, or health care professionals.
Studies were excluded if they (1) were grossly unrelated to the study topic (eg, chemistry field); (2) targeted children or younger age groups (eg, digital solutions for children with diabetes); (3) addressed usability for nondigital solutions (eg, buildings) or digital solutions assessed as not of interest for older adults or those caring for them (eg, moodle and eLearning solutions); or (4) addressed the usability of digital solutions for caregivers of older adults without involving interaction with, or feedback from, older persons or those caring for them.
Charting the Data and Collating, Summarizing, and Reporting the Results
The data extraction tool was developed using an iterative team process. The preliminary data extraction categories were derived from our research questions. The following data were extracted from each review: authors, year of publication, purpose/aim of the study, and the number of studies included in the review. Further extraction, analysis, and reporting of results were guided by the framework proposed by Ellsworth et al for reporting usability evaluations, and the following operational definitions were used for this review:
- Study evaluators, that is, the individuals who conducted the usability evaluation.
- Participants, that is, the individuals who were asked to evaluate the usability of a product or service.
- Tasks, that is, the activities that participants were asked to perform when evaluating the usability of a product or service.
- Methods and techniques: methods refer to the set of techniques used to perform formative user-centered usability evaluation of a certain type at any stage of the product or service development. Usability evaluation techniques refer to a set of procedures used to perform a usability evaluation and collect data of a certain type. For this review, methods and techniques of usability evaluation were categorized and defined as presented in the table below (adapted from Martins et al). Usability assessment usually requires the combination of more than one method, can be conducted remotely (ie, evaluators are separated in space from users) or in the presence of the participants, and can be synchronous (ie, occur at the time of the participants’ interaction with the system) or asynchronous.
- The test environment, that is, the environment where the evaluation of usability takes place: (1) laboratory or controlled conditions, usually a transversal assessment, or (2) in a real context, that is, the usability assessment is carried out in the same context and circumstances where the end product or service is expected to be used, which is usually a longitudinal assessment.
Details on the characteristics of each of these components of the usability assessment were extracted.
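The charting categories listed above can be pictured as one record per review. The following is a minimal sketch only: the class name, field names, and the example record are hypothetical and are not the authors' data extraction tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReviewChart:
    """One charted review. All names here are illustrative assumptions."""
    authors: str
    year: int
    purpose: str
    n_included_studies: Optional[int]    # None when the review does not state it
    evaluators: Optional[str] = None     # who conducted the usability evaluation
    participants: Optional[str] = None   # who was asked to evaluate usability
    tasks: Optional[str] = None          # activities participants performed
    techniques: List[str] = field(default_factory=list)
    environment: Optional[str] = None    # laboratory or real context

    def unreported(self) -> List[str]:
        """Return the charting categories for which nothing was extracted."""
        gaps = [name for name in ("evaluators", "participants", "tasks", "environment")
                if getattr(self, name) is None]
        if not self.techniques:
            gaps.append("techniques")
        return gaps


# Hypothetical record, not one of the 20 included reviews.
example = ReviewChart(
    authors="Hypothetical et al",
    year=2015,
    purpose="Illustrative purpose statement",
    n_included_studies=22,
    participants="sample size and age range reported",
    techniques=["interview", "scale/questionnaire"],
)
print(example.unreported())
```

Keeping unreported categories as explicit empty values, rather than omitting them, makes reporting gaps countable across reviews, which is the kind of gap analysis this review describes.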
|Method (with definition) and technique for data collection||Definition|
|Test: involves observing users while they perform predefined tasks and consists of collecting mostly quantitative data; the test is centered on the interaction of the user with the technology|
|Performance evaluation||Evaluated by recording elements related to the execution of a particular task (eg, execution time, success or failure, number of errors, eye-tracking, and automated usability evaluation or logfiles or web usage analysis or app-use generated data or sensor data)|
|Observation||Attentive visualization and systematic recording of a particular phenomenon, including people, artifacts, environments, behaviors, and interactions. Observation can be direct, when the researcher is present during the task execution, or indirect, when the task is observed through other means such as video recording|
|Think-aloud||Users are invited to talk about what they see, do, think, or feel as they interact with the system or service|
|Inquiry: provide valuable, subjective, and usually qualitative information on the users’ opinions and expectations|
|Focus groups||Involves a small number of people in an informal discussion|
|Interviews||Involves a one-to-one interaction to gather opinions, attitudes, perceptions, and experiences|
|Scales/questionnaires||Collects data on characteristics, thoughts, feelings, perceptions, behaviors, or attitudes, measuring either one (scale) or several (questionnaire) dimensions of usability. It is important to distinguish whether instruments were validated|
|Diary studies||Users record events related to their experience in the context of daily activity and later share them with the evaluators|
|Card sorting||It involves participants using logic while sorting content or cards into categories or groups that make sense to them, given the information they are provided with|
The study selection process followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram. A total of 3958 articles were identified from the 5 electronic databases. Of these, 1298 were eliminated because they were duplicates or did not have the author’s name. The remaining 2660 records were screened based on title and abstract, and 2509 were excluded because they were not reviews (66/2660, 2.48%) or were out of scope (2443/2660, 91.84%). A total of 151 full texts were read for further analysis. Of these, 115 manuscripts were excluded because they were not related to usability, 3 articles were not found, and 13 reported on the assessment of usability by experts. Therefore, 20 reviews were included in this scoping review of reviews. Of these, 19 were systematic reviews and 1 was a narrative review. The table below presents the main characteristics of the included reviews (study, purpose, and number of included studies).
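The counts in this selection flow can be checked with simple arithmetic. The sketch below only re-derives the numbers stated in the paragraph above; the variable names are illustrative.

```python
# Counts as stated in the text.
identified = 3958
removed_duplicate_or_no_author = 1298
excluded_not_review = 66
excluded_out_of_scope = 2443
excluded_full_text = {
    "not related to usability": 115,
    "not found": 3,
    "usability assessed by experts": 13,
}

# Derived counts at each stage of the flow.
screened = identified - removed_duplicate_or_no_author
full_text = screened - (excluded_not_review + excluded_out_of_scope)
included = full_text - sum(excluded_full_text.values())


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


print(screened, full_text, included)
print(pct(excluded_not_review, screened), pct(excluded_out_of_scope, screened))
```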
|Study||Purpose of the review||Number of studies included in the review|
|Ellsworth et al (2017) ||Review methods employed for usability testing on electronic health records||120|
|Allison et al (2019) ||Review methodologies and techniques to evaluate websites; provide a framework of the appropriate website attributes that could be applied to any future website evaluations||69|
|Azad-Khaneghah et al (2020) ||Review the rating scales used to evaluate usability and quality of mobile health applications||87|
|Baharuddin et al (2013) ||Propose a set of usability dimensions that should be considered for designing and evaluating mobile applications||Not referred|
|Bastien (2010) ||List test procedures and define and develop tools to help conduct user tests||Not referred (narrative review)|
|Bhutkar et al (2013) ||List the most commonly applied usability evaluation methods and related emerging trends||30|
|Cavalcanti et al (2018) ||Understand which methods and user assessment approaches are commonly used in motor rehabilitation studies that use augmented reality applications||32|
|Fernandez et al (2012) ||Analyze the usability evaluation methods that have proven to be the most effective in the web domain||18|
|Fernandez et al (2011) ||Analyze the usability evaluation methods that have been employed to evaluate web applications over the last 14 years||206|
|Fu et al (2017) ||Assess the usability of diabetes mobile apps developed for adults with type 2 diabetes||7|
|Hussain et al (2014) ||Review the relevant and appropriate usability dimensions and measurements for banking applications||49|
|Inal et al (2020) ||Analyze how usability is being addressed and measured in mobile health interventions for mental health problems||42|
|Klaassen et al (2016) ||Analyze if usability methods are equally employed for different end-user groups and applications||127|
|Lim et al (2019) ||Identify, study, and analyze existing usability metrics, methods, techniques, and areas in mobile augmented reality learning||72|
|Narasimha et al (2017) ||Analyzing the characteristics of usability-related studies conducted using geriatric participants and the subsequent usability challenges identified||16|
|Shah and Chiew (2019) ||Identify, analyze, and synthesize the usability features and assessment approaches of pain management mobile applications targeted at the evaluation studies||27|
|Simor et al (2016) ||Analyze usability evaluation methods used for gesture-based games, considering devices with the motion-sensing capability||10|
|Sousa and Lopez (2017) ||Identify psychometrically tested questionnaires that measure the usability of eHealth tools||35|
|Yen and Bakken (2012) ||Review and categorize health information technology usability study methods, and to provide practical guidance on health information technology usability evaluation||346|
|Zapata et al (2015) ||Review a set of selected papers that perform a usability evaluation of mobile health–related mobile apps||22|
Only 4 of the 20 (20%) included reviews briefly mentioned any characteristic of the evaluators' profile. One of the reviews reported that 1 of its 32 included articles mentioned that the person who performed the usability assessment was a blind evaluator. One review stated that several studies (exact numbers not provided) used graduate students both as evaluators to perform usability inspections and as participants in experimental sessions (eg, think-aloud protocol, remote user testing), whereas another review reported that usability evaluations were conducted by researchers. In the review by Ellsworth et al, 29% (35/120) of the included articles described the study evaluators responsible for designing and carrying out the usability evaluation, but the characteristics reported in the primary studies were not provided.
Half of the reviews included in this scoping review did not refer to the characteristics of the participants included in the primary studies reviewed. Of the 10 (50%) reviews that reported on any of the participants' characteristics, 4 reported mean age or age range, 4 reported the gender of participants, 8 reported the sample size, and 7 reported on other characteristics of participants by describing them as healthy participants or as having a specific clinical condition. Nevertheless, the 4 (20%) reviews that reported the age of the participants also reported that not all primary studies detailed such information. Similar findings were reported for gender and sample size. No reference to sample size calculation or rationale for deciding on sample size was provided. Other characteristics of participants mentioned were being healthy, having a specific clinical condition, belonging to a specific occupational group (health care providers or students), and previous experience with mobile devices. presents a description of the information provided within the included reviews.
Only 2 of the 20 (10%) included reviews referred to the tasks that participants were asked to perform for the usability evaluation. Simor et al reviewed usability evaluation methods for gesture-based games and reported that the games and, consequently, the usability evaluation of each study had different aims, target populations, interfaces, and details, but that the protocol used was presented in the majority of the studies. Zapata et al performed a systematic review on mobile health apps and reported that 17 of the 22 primary studies included reported the number of tasks performed by the users. The number of tasks ranged between 1 and 25.
Methods and Techniques
Of the 20 reviews included, only 3 (15%) did not refer to the methods and techniques of usability used. Among the inquiry methods, questionnaires/scales (15/20, 75%) and interviews (12/20, 60%) were most commonly reported. Among the test methods, performance evaluation (9/20, 45%) and think-aloud (6/20, 30%) were the most commonly reported techniques. Of the 20 reviews, 6 (30%) reported on combinations of techniques, mentioning a total of 22 different combinations of 4, 3, or 2 techniques. Most combinations include at least one technique from each method, which indicates that a multimethod approach was used. Among scales/questionnaires, which constitute the technique most often reported, the most common usability assessment scales were the System Usability Scale and the Post-Study System Usability Questionnaire. The other scales/questionnaires include the Questionnaire for User Interaction Satisfaction; the Software Usability Measurement Inventory; the Usefulness, Satisfaction, and Ease of use Questionnaire; the Computer System Usability Questionnaire; the After-Scenario Questionnaire; the Perceived Usefulness and Ease of Use; the IsoMetrics usability inventory; the Health Information Technology Usability Evaluation Scale; the user Mobile Application Rating Scale; the IBM ease of use; and the ISO 9241-11 Questionnaire. In addition, several reviews reported the use of nonvalidated questionnaires. One review reported that 26% of the included studies used a remote assessment of usability, in which participants are in an uncontrolled environment.
|Study||Performance evaluation (n=9)||Observation (n=3)||Think-aloud (n=6)||Focus group (n=3)||Interview (n=12)||Scales or questionnaires (n=15)||Diary studies (n=1)||Card sorting (n=1)|
|Allison et al (2019) ||✓a||—b||—||—||—||✓||—||—|
|Azad-Khaneghah et al (2020) ||—||—||—||—||—||✓||—||—|
|Bastien (2010) ||—||—||—||—||✓||—||✓||—|
|Bhutkar et al (2013) ||✓||—||✓||—||✓||—||—||—|
|Cavalcanti et al (2018) ||✓||—||✓||—||—||✓||—||—|
|Ellsworth et al (2017) ||—||—||—||✓||✓||✓||—||✓|
|Fernandez et al (2012) ||✓||—||✓||—||✓||✓||—||—|
|Fernandez et al (2011) ||✓||—||✓||✓||✓||✓||—||—|
|Fu et al (2017) ||✓||—||—||—||—||✓||—||—|
|Klaassen et al (2016) ||✓||✓||—||—||✓||✓||—||—|
|Lim et al (2019) ||✓||—||—||—||✓||✓||—||—|
|Narasimha et al (2017) ||—||—||—||—||✓||✓||—||—|
|Shah and Chiew (2019) ||—||✓||—||—||✓||✓||—||—|
|Simor et al (2016) ||—||—||—||—||✓||✓||—||—|
|Sousa and Lopez (2017) ||—||—||—||—||—||✓||—||—|
|Yen and Bakken (2012) ||—||✓||✓||✓||✓||✓||—||—|
|Zapata et al (2015) ||✓||—||✓||—||✓||✓||—||—|
aReported in the review.
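The column totals in the table above can be re-derived from its rows. The sketch below transcribes each row as a string of 1s (technique reported) and 0s (not reported) in the column order of the table. The encoding and variable names are illustrative assumptions, and the percentages use the 20 included reviews as the denominator, as in the text.

```python
TECHNIQUES = [
    "Performance evaluation", "Observation", "Think-aloud", "Focus group",
    "Interview", "Scales or questionnaires", "Diary studies", "Card sorting",
]

# One string per review, one character per technique, in TECHNIQUES order.
ROWS = {
    "Allison et al (2019)":        "10000100",
    "Azad-Khaneghah et al (2020)": "00000100",
    "Bastien (2010)":              "00001010",
    "Bhutkar et al (2013)":        "10101000",
    "Cavalcanti et al (2018)":     "10100100",
    "Ellsworth et al (2017)":      "00011101",
    "Fernandez et al (2012)":      "10101100",
    "Fernandez et al (2011)":      "10111100",
    "Fu et al (2017)":             "10000100",
    "Klaassen et al (2016)":       "11001100",
    "Lim et al (2019)":            "10001100",
    "Narasimha et al (2017)":      "00001100",
    "Shah and Chiew (2019)":       "01001100",
    "Simor et al (2016)":          "00001100",
    "Sousa and Lopez (2017)":      "00000100",
    "Yen and Bakken (2012)":       "01111100",
    "Zapata et al (2015)":         "10101100",
}
TOTAL_REVIEWS = 20  # the 3 reviews absent from the table reported no techniques

# Number of reviews reporting each technique, and the share of all 20 reviews.
counts = {
    name: sum(int(marks[i]) for marks in ROWS.values())
    for i, name in enumerate(TECHNIQUES)
}
shares = {name: 100 * n / TOTAL_REVIEWS for name, n in counts.items()}

for name in TECHNIQUES:
    print(f"{name}: {counts[name]}/{TOTAL_REVIEWS} ({shares[name]:.0f}%)")
```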
|Cavalcanti et al (2018) ||Fu et al (2017) ||Inal et al (2020) ||Shah & Chiew (2019) ||Simor et al (2016) ||Zapata et al (2015) |
|Observation + performance evaluation + think-aloud + scale/questionnaire||✓a||N/Ab||N/A||N/A||N/A||N/A||✓|
|Observation + performance evaluation + scale/questionnaire + interview||✓||N/A||N/A||N/A||N/A||N/A||✓|
|Observation + scale/questionnaire+ interview + diary studies||N/A||N/A||✓||N/A||N/A||N/A||✓|
|Performance evaluation + think-aloud + scale/questionnaire + interview||N/A||N/A||✓||N/A||N/A||N/A||✓|
|Observation + performance evaluation + think-aloud + interview||N/A||N/A||✓||N/A||N/A||N/A||✓|
|Performance evaluation + scale/questionnaire + interview||✓||N/A||✓||✓||N/A||N/A||✓|
|Performance evaluation + scale/questionnaire + focus group||N/A||N/A||✓||N/A||N/A||N/A||✓|
|Performance evaluation + scale/questionnaire + observation||✓||N/A||✓||N/A||N/A||N/A||✓|
|Performance evaluation + observation||N/A||N/A||✓||N/A||✓||N/A||N/A|
|Think-aloud + scale/questionnaire + interview||N/A||N/A||✓||✓||N/A||N/A||✓|
|Think-aloud + scale/questionnaire + interview||N/A||✓||N/A||N/A||N/A||N/A||✓|
|Scale/questionnaire + interview + focus group||N/A||N/A||✓||✓||N/A||N/A||N/A|
|Observation + scale/questionnaire + interview||✓||N/A||✓||✓||N/A||N/A||✓|
|Observation + scale/questionnaire||✓||✓||✓||N/A||N/A||N/A||✓|
|Observation + interview||N/A||N/A||N/A||✓||N/A||N/A||✓|
|Performance evaluation + observation||✓||N/A||N/A||N/A||N/A||N/A||N/A|
|Performance evaluation + scale/questionnaire||✓||N/A||✓||N/A||N/A||✓||✓|
|Think-aloud + scale/questionnaire||N/A||N/A||✓||N/A||N/A||N/A||✓|
|Scale/questionnaire + interview||✓||N/A||✓||✓||N/A||✓||N/A|
|Scale/questionnaire + diary studies||N/A||N/A||✓||N/A||N/A||N/A||N/A|
|Interview + focus group||N/A||N/A||✓||N/A||N/A||N/A||N/A|
aReported in the review.
bN/A: not applicable.
Of the 20 reviews, 2 (10%) reported on the environment where the usability assessment of the included studies took place. In the review by Bhutkar et al, of the 17 studies that reported on the test environment, 8 were conducted in hospitals, 5 in intensive care units, and 4 in laboratories. In addition, 31 of the 42 studies reviewed by Inal et al, which focused on mobile health interventions for mental health problems, reported having conducted their usability testing in the natural environment of the participants, with the technology deployed in the everyday environment of the intended users or their representatives. The review of Ellsworth et al did not provide data on the test environment; however, the test environment was an inclusion criterion, as the authors stated that they included studies that tested the usability of hospital and clinic electronic health records in inpatient, outpatient, emergency department, or operating room settings.
This scoping review of reviews aims to synthesize the procedures used or reported for the different steps of the process of conducting a user-centered usability assessment of digital solutions relevant for older adults, to identify gaps in the literature, and to identify the best practices for each of the different steps. The results suggest that the characteristics of study evaluators and participants and task procedures are only briefly reported, and no agreement seems to exist on what should be reported. The methods and techniques used for the assessment of user-centered usability are the topics most commonly and comprehensively reported in the reviews, whereas the test environment is seldom and poorly characterized. Despite our aim of searching for reviews reporting on digital solutions relevant for older adults, only one of the included reviews specifically targeted older adults. This suggests that studies using older adults are scarce and that the findings of this scoping review also apply to usability studies with adults.
Our findings are in line with the review of Ellsworth et al, who reported that several of the included studies described the participants but not the individual who conducted the usability assessment (study evaluator). Aspects with the potential to influence the results of the usability assessment include the evaluator's level of expertise and domain experience and whether the evaluator is external to the team developing the product or service being assessed or is part of that team and, therefore, potentially has a conflict of interest. These aspects should be reported by the authors. Most of the techniques are complex procedures of usability assessment; some of them depend on the interaction between the participant and the study evaluator and, therefore, require experience and knowledge to be applied effectively.
The characteristics of the study participants most commonly reported across reviews were age and sex. However, these seem insufficient for the reader to judge the degree of similarity between the sample and the target end users. Educational or digital literacy levels are likely to influence how the participant perceives the usability of the system. For example, different subgroups of older adults may perceive the usability of the same system differently. Therefore, a detailed characterization of physical, emotional, cognitive, and digital skills is needed for an appropriate interpretation of the results of the usability evaluation in certain subgroups of older adults. Furthermore, a detailed characterization of health conditions might also be relevant. These aspects will also inform whether the sample used is representative of the end users. The use of nonrepresentative users and, therefore, the failure to consider their needs and preferences may result in products with low usability. In general, the sample sizes are small, and no rationale for the size of the sample is provided. The appropriate sample size for usability studies is a matter of debate, with some authors arguing that 4 or 5 participants are enough to identify approximately 80%-85% of usability problems, whereas others report that with these numbers of participants only 35% of usability problems are detected. The type of interfaces, the tasks performed by the participants, the context of use, and the state of technology development may explain the differences between studies. Furthermore, it is worth noting the definition of usability as the extent to which a product can be used by specific users to achieve specific goals with effectiveness, efficiency, and satisfaction. Conceivably, small sample sizes may be enough to detect usability problems but may be insufficient to provide a broader view of usability more in line with the present definition.
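The disagreement over sample size can be illustrated with the cumulative detection model commonly used in this debate, in which the expected share of problems found by n participants is 1 - (1 - p)^n, where p is the probability that a single participant reveals a given problem. The sketch below is only an illustration under that model's assumptions (independent participants, equal p for every problem); the two values of p are assumed for the example and are not taken from the included reviews.

```python
import math


def proportion_found(n_participants: int, p_detect: float) -> float:
    """Expected share of usability problems found by n participants,
    assuming each participant independently reveals any given problem
    with the same probability p_detect."""
    return 1.0 - (1.0 - p_detect) ** n_participants


def participants_needed(target: float, p_detect: float) -> int:
    """Smallest n whose expected share of problems found reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_detect))


# Illustrative (assumed) detection probabilities: a relatively high one
# and a low one, e.g. for a complex interface or heterogeneous users.
for p in (0.31, 0.08):
    share = proportion_found(5, p)
    needed = participants_needed(0.80, p)
    print(f"p={p:.2f}: 5 participants find {share:.0%}; "
          f"{needed} participants needed for 80%")
```

With p = 0.31, 5 participants are expected to reveal about 84% of problems, whereas with p = 0.08 the same 5 participants reveal about 34% and roughly 20 participants would be needed to reach 80%. Under this model, the conflicting figures reported in the literature are compatible with differences in how easily problems surface, which supports asking authors to justify the sample size for their specific product and users.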
Only 2 reviews reported on the tasks that participants were asked to perform to assess the usability of the product or service, and both concluded that, in general, studies reported on the protocol of the tasks used. Tasks vary depending on several factors, such as study aims, target population, interfaces, methods, and techniques used for usability assessment. Nevertheless, the definition or selection of tasks that participants should perform should mirror the future use of the product or service. No principles were found to guide the selection of tasks. For example, it is not clear whether there should be a minimum set of tasks to be performed, whether tasks should require single or multiple steps, or whether there should be a minimum amount of time that each participant needs to spend using the product or service.
The methods and techniques used for the assessment of usability have been consistently reported, and most reviews have found that a combination of methods and/or techniques is usually used, in line with recommendations. Different methods and techniques have different strengths and limitations and, therefore, their combination is more likely to provide a comprehensive view of usability problems. For example, scales and questionnaires are easy to use and useful for gathering self-reported data about the user’s perception but might have limited value in indicating which aspects of the system need to be targeted for improvement. Scales and questionnaires should be valid, but a few reviews have reported the use of scales and questionnaires that are unlikely to have been validated. Although there might be reasons to develop or adapt a scale or questionnaire, this process must be followed by evidence of its validity. Interviews and observations are recommended when the number of participants is small because both generate large amounts of data that are time-consuming to analyze. Nevertheless, interviews can be useful to understand the reasoning of the user when facing a problem, and observation gives an insight into the moment when a problem occurs. It is argued that think-aloud protocols may result in the loss of focus on the tasks being performed, whereas user performance is easy to assess, particularly in cases where the system automatically records the performance indicators, but might provide limited information if used alone. The most frequent multimethod combination described in the literature is the combination of test and inquiry methods; however, we found no information in the included reviews regarding which combination of techniques is the most sensitive and whether this could vary depending on the development stage of the product or service being evaluated.
Furthermore, the combination of techniques should allow for the assessment of effectiveness, efficiency, and satisfaction, as these are all part of usability.
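As an illustration of how a combined test and inquiry protocol can cover all 3 components, the sketch below summarizes logged task attempts into one indicator per component. It is a minimal example under stated assumptions: the record fields, the choice of completion rate for effectiveness, mean time on completed tasks for efficiency, and a 1-5 posttask rating for satisfaction are hypothetical choices made for this example, not indicators prescribed by the included reviews.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    participant: str
    task: str
    completed: bool      # performance data (test method)
    seconds: float       # performance data (test method)
    satisfaction: int    # hypothetical 1-5 posttask rating (inquiry method)


def summarize(attempts: list[TaskAttempt]) -> dict[str, float]:
    """Return one illustrative indicator per usability component."""
    if not attempts:
        raise ValueError("at least one task attempt is required")
    done = [a for a in attempts if a.completed]
    return {
        # Effectiveness: share of attempts completed successfully.
        "effectiveness": len(done) / len(attempts),
        # Efficiency: mean time on task for completed attempts only.
        "efficiency_s": mean(a.seconds for a in done) if done else float("nan"),
        # Satisfaction: mean self-reported rating over all attempts.
        "satisfaction": mean(a.satisfaction for a in attempts),
    }


attempts = [
    TaskAttempt("P1", "book appointment", True, 40.0, 4),
    TaskAttempt("P2", "book appointment", False, 90.0, 2),
    TaskAttempt("P3", "book appointment", True, 60.0, 5),
    TaskAttempt("P4", "book appointment", True, 50.0, 3),
]
summary = summarize(attempts)
print(summary)
```

Reporting the 3 indicators side by side makes it visible when a study covers only one component, for example performance data without any measure of satisfaction.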
Only 2 reviews reported on the test environment, but both indicated that most included studies reported conducting usability testing in a real context. Nevertheless, we found no indication of how long the usability assessment should last, that is, how long participants should be allowed to use the product or service before assessing it, or of whether conducting the usability assessment in a real context means that the product or service was used in the circumstances in which it is expected to be used.
Recommendations and Future Research
Conducting rigorous experiments on user-centered usability is likely to increase the sensitivity of these experiments, that is, their ability to detect usability issues. Developing a consensus framework is likely to improve the quality of studies on usability evaluation and their reporting, improve the comparability of usability results across studies, help consumers and producers of digital solutions to identify the best products, improve the efficiency of the process of usability evaluation, and facilitate further research on the impact of usability on other outcomes, such as health-related outcomes. The guide below presents a list of parameters that we believe should be considered when planning and reporting user-centered usability studies. These parameters provide guidance while also being flexible enough to accommodate study differences regarding aspects such as study participants or the digital solution being assessed. At present, we are working on a Delphi study aiming to establish an international consensus on user-centered usability evaluation procedures.
A proposed guide of aspects to consider when designing and reporting a user-centered usability evaluation study.
Study evaluators:
- Provide a rationale for sample size
- Experience with usability evaluation with users (if none, plan training)
- Establish clear inclusion and exclusion criteria (age, gender, educational level, and academic background)
- Clarify whether internal or external to product development
Study participants:
- Provide a rationale for sample size
- Define clear inclusion and exclusion criteria
- Define sampling methods (probability/nonprobability) and setting of recruitment
Methods and techniques:
- Provide a rationale for the combination of methods and techniques
- Define equipment needed
- Select valid and reliable instruments of assessment
Tasks:
- Define the number
- Provide a detailed description of tasks
- Develop a participant script
Test environment:
- Identify and justify the choice (lab test or field test or both; remote test or face to face)
- Identify facilities and material needed
- Ensure the existence of an observation room and recording room
- Ensure the proper functioning of all equipment necessary for the test evaluation
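One way to put such a guide to work is to encode it as a checklist against which a draft study report can be checked before submission. The sketch below is a hypothetical illustration only: the item keys paraphrase the guide, and their grouping under the 5 topics charted in this review (study evaluators, study participants, methods and techniques, tasks, and test environment) is an assumption made for the example.

```python
# Hypothetical item keys paraphrasing the guide; the grouping into the
# 5 topics charted in this review is an assumption of this sketch.
CHECKLIST: dict[str, list[str]] = {
    "study_evaluators": [
        "sample_size_rationale", "experience",
        "eligibility_criteria", "internal_or_external",
    ],
    "study_participants": [
        "sample_size_rationale", "eligibility_criteria",
        "sampling_and_recruitment",
    ],
    "methods_and_techniques": [
        "combination_rationale", "equipment", "valid_reliable_instruments",
    ],
    "tasks": ["number", "description", "participant_script"],
    "test_environment": [
        "setting_choice", "facilities_and_material",
        "observation_and_recording", "equipment_check",
    ],
}


def missing_items(report: dict[str, dict[str, str]]) -> list[str]:
    """Return 'category.item' keys that the report omits or leaves empty."""
    missing = []
    for category, items in CHECKLIST.items():
        provided = report.get(category, {})
        for item in items:
            if not provided.get(item, "").strip():
                missing.append(f"{category}.{item}")
    return missing


# A partially completed draft report (invented content for illustration).
draft = {
    "study_participants": {
        "sample_size_rationale": "12 users, based on pilot detection rates",
        "eligibility_criteria": "aged 65 years or older, no prior use",
    },
    "tasks": {
        "number": "5",
        "description": "see protocol",
        "participant_script": "appendix",
    },
}
gaps = missing_items(draft)
print(f"{len(gaps)} items still to report")
```

A structure of this kind keeps the guide flexible: a study can justify leaving an item empty, but the omission becomes explicit rather than silent.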
Limitations of This Scoping Review
Some limitations are directly related to the typology of this review, such as the absence of both an assessment of the quality of the included reviews and a quantitative summary of findings. Usability is also a topic on which a large number of publications are published as conference proceedings, and such publications were not specifically searched (selection bias). Nevertheless, the reviews published in journals, which account for most of the included reviews, are likely to be more comprehensive, as conference proceedings tend to impose lower word counts. Abstract and full-text screening were performed first by 3 and 2 authors, respectively, and after a common understanding was built, only 1 reviewer screened the remaining abstracts and full papers. Although we believe that this did not have a major impact on the results, having only 1 person screening for inclusion might have increased the possibility of error and of not including a potentially relevant study. The decision on whether a manuscript addressed a product or technology that could be of use for older adults was a subjective judgment made by the authors and could have biased the results toward the field of health. Finally, no cross-checking of the primary studies included in each review was made and, therefore, the same primary studies could have been included in more than one review.
In summary, we found a lack of a detailed description of several steps of the process of assessing the usability of digital solutions and no evidence on good practices. These findings suggest the need for a consensus framework on the assessment of usability that informs researchers and allows standardization of procedures. Furthermore, they highlight the need to investigate whether some techniques of assessing usability are more sensitive than others in detecting usability issues.
Conflicts of Interest
Summary details of participant profile and sample size (sometimes percentages do not add up to 100%, as only partial information was provided in the review). DOCX File, 31 KB
- Beschorner B, Woodward L. Engaging teachers in a digital learner-centered approach to support understanding foundational literacy. In: Karchmer-Klein R, Pytash KE, editors. Effective Practices in Online Teacher Preparation for Literacy Educators. Pennsylvania: IGI Global; 2020:284-306.
- Wagner F, Basran J, Bello-Haas VD. A review of monitoring technology for use with older adults. J Geriatr Phys Ther 2012;35(1):28-34.
- Simões P, Silva AG, Amaral J, Queirós A, Rocha NP, Rodrigues M. Features, behavioral change techniques, and quality of the most popular mobile apps to measure physical activity: systematic search in app stores. JMIR Mhealth Uhealth 2018 Oct 26;6(10):e11281.
- Silva AG, Simões P, Queirós A, Rodrigues M, Rocha NP. Mobile apps to quantify aspects of physical activity: a systematic review on its reliability and validity. J Med Syst 2020 Jan 8;44(2):51.
- Gilson A, Dodds D, Kaur A, Potteiger M, Ii JH. Using computer tablets to improve moods for older adults with dementia and interactions with their caregivers: pilot intervention study. JMIR Form Res 2019 Sep 3;3(3):e14530.
- Leone C, Lim JS, Stern A, Charles J, Black S, Baecker R. Communication technology adoption among older adult veterans: the interplay of social and cognitive factors. Aging Ment Health 2018 Dec;22(12):1666-1677.
- Rogers WA, Mitzner TL. Envisioning the future for older adults: autonomy, health, well-being, and social connectedness with technology support. Futures 2017 Mar;87:133-139.
- Fortin M, Hudon C, Haggerty J, van den Akker M, Almirall J. Prevalence estimates of multimorbidity: a comparative study of two sources. BMC Health Serv Res 2010 May 6;10:111.
- Fortin M, Bravo G, Hudon C, Vanasse A, Lapointe L. Prevalence of multimorbidity among adults seen in family practice. Ann Fam Med 2005;3(3):223-228.
- de la Torre-Díez I, López-Coronado M, Vaca C, Aguado JS, de Castro C. Cost-utility and cost-effectiveness studies of telemedicine, electronic, and mobile health systems in the literature: a systematic review. Telemed J E Health 2015 Feb;21(2):81-85.
- Hajek A, Jens-Oliver B, Kai-Uwe S, Matschinger H, Brenner H, Holleczek B, et al. Frailty and healthcare costs-longitudinal results of a prospective cohort study. Age Ageing 2018 Mar 1;47(2):233-241.
- Cheung JT, Yu R, Wu Z, Wong SY, Woo J. Geriatric syndromes, multimorbidity, and disability overlap and increase healthcare use among older Chinese. BMC Geriatr 2018 Jun 25;18(1):147.
- Herndon JH, Hwang R, Bozic KH. Healthcare technology and technology assessment. Eur Spine J 2007 Aug;16(8):1293-1302.
- Pyla PS, Hartson R. Agile UX Design for a Quality User Experience. Massachusetts: Morgan Kaufmann Publishers; 2019.
- ISO 9241-11:2018(en) Ergonomics of Human-system Interaction — Part 11: Usability: Definitions and Concepts. ISO. 2018. URL: https://www.iso.org/obp/ui/#iso:std:iso:9241:-11:ed-2:v1:en [accessed 2020-10-01]
- Berntsen NO, Dybkjær L. Multimodal Usability. London, UK: Springer; 2010.
- Nunes IL. Ergonomics and usability – key factors in knowledge society. Enterp Work Innov Stud 2006;2:87-94.
- Martins A, Queirós A, Silva AG, Rocha NP. Usability evaluation methods: a systematic review. In: Saeed S, Bajwa IS, Mahmood Z, editors. Human Factors in Software Development and Design. Pennsylvania: IGI Global; 2014.
- Morrissey K. A review of 'universal methods of design: 100 ways to research complex problems, develop innovative ideas, and design effective solutions'. Visitor Studies 2014 Sep 25;17(2):222-224.
- Dix A, Finlay G, Abowd G, Beale R. Human-Computer Interaction. New Jersey, US: Prentice Hall; 2004.
- da Costa RP, Canedo ED, de Sousa RT, de Oliveira RA, Villalba LJ. Set of usability heuristics for quality assessment of mobile applications on smartphones. IEEE Access 2019;7:116145-116161.
- Dix A. In: Finlay G, Abowd G, Beale R, editors. Human-Computer Interaction. New Jersey, US: Prentice Hall; 2004.
- Bernsen N. In: Dybkjær L, editor. Multimodal Usability. London, UK: Springer; 2010.
- Martins AI, Queirós A, Rocha NP, Santos BS. Avaliação de usabilidade: uma revisão sistemática da literatura. Iber J Inf Syst Technol 2013 Jun 1;11:31-44.
- Queirós A, Silva A, Alvarelhão J, Rocha NP, Teixeira A. Usability, accessibility and ambient-assisted living: a systematic literature review. Univ Access Inf Soc 2013 Oct 5;14(1):57-66.
- Levac D, Colquhoun H, O'Brien KK. Scoping studies: advancing the methodology. Implement Sci 2010 Sep 20;5:69.
- Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol 2005 Feb;8(1):19-32.
- Ambient Assisted Living. AAL Programme. 2007. URL: http://www.aal-europe.eu/ [accessed 2020-09-30]
- Ellsworth MA, Dziadzko M, O'Horo JC, Farrell AM, Zhang J, Herasevich V. An appraisal of published usability evaluations of electronic health records via systematic review. J Am Med Inform Assoc 2017 Jan;24(1):218-226.
- Martins AI. In: Queirós A, Silva AG, Rocha NP, editors. Usability Evaluation Methods: A Systematic Review, Human Factors. Pennsylvania: IGI Global; 2014.
- Allison R, Hayes C, McNulty CA, Young V. A comprehensive framework to evaluate websites: literature review and development of goodweb. JMIR Form Res 2019 Oct 24;3(4):e14372.
- Azad-Khaneghah P, Neubauer N, Cruz AM, Liu L. Mobile health app usability and quality rating scales: a systematic review. Disabil Rehabil Assist Technol 2020 Jan 8:1-10.
- Baharuddin R, Singh D, Razali R. Usability dimensions for mobile applications-a review. Res J Appl Sci Eng Technol 2013 Feb 21;11(9):2225-2231.
- Bastien JC. Usability testing: a review of some methodological and technical aspects of the method. Int J Med Inform 2010 Apr;79(4):e18-e23.
- Bhutkar G, Konkani A, Katre D, Ray GG. A review: healthcare usability evaluation methods. Biomed Instrum Technol 2013;Suppl:45-53.
- Cavalcanti VC, Santana MI, Gama AE, Correia WF. Usability assessments for augmented reality motor rehabilitation solutions: a systematic review. Int J Comput Games Technol 2018 Nov 1;2018:1-18.
- Fernandez A, Abrahão S, Insfran E. A Systematic Review on the Effectiveness of Web Usability Evaluation Methods. In: 16th International Conference on Evaluation & Assessment in Software Engineering. 2012 Presented at: EASE'12; June 1-7, 2012; Ciudad Real.
- Fernandez A, Insfran E, Abrahão S. Usability evaluation methods for the web: a systematic mapping study. Inf Softw Technol 2011 Aug;53(8):789-817.
- Fu H, McMahon SK, Gross CR, Adam TJ, Wyman JF. Usability and clinical efficacy of diabetes mobile applications for adults with type 2 diabetes: a systematic review. Diabetes Res Clin Pract 2017 Sep;131:70-81.
- Hussain A, Abubakar HI, Hashim B. Evaluating Mobile Banking Application: Usability Dimensions and Measurements. In: 6th International Conference on Information Technology and Multimedia. 2014 Presented at: CITM'14; May 20-23, 2014; Putrajaya.
- Inal Y, Wake JD, Guribye F, Nordgreen T. Usability evaluations of mobile mental health technologies: systematic review. J Med Internet Res 2020 Jan 6;22(1):e15337.
- Klaassen B, van Beijnum BJ, Hermens HJ. Usability in telemedicine systems-a literature survey. Int J Med Inform 2016 Sep;93:57-69.
- Lim KC, Selamat A, Alias RA, Krejcar O, Fujita H. Usability measures in mobile-based augmented reality learning applications: a systematic review. Appl Sci 2019 Jul 5;9(13):2718.
- Narasimha S, Madathil KC, Agnisarman S, Rogers H, Welch B, Ashok A, et al. Designing telemedicine systems for geriatric patients: a review of the usability studies. Telemed J E Health 2017 Jun;23(6):459-472.
- Shah U, Chiew T. A systematic literature review of the design approach and usability evaluation of the pain management mobile applications. Symmetry 2019 Mar 19;11(3):400.
- Simor FW, Brum MR, Schmidt JD, Rieder R, de Marchi AC. Usability evaluation methods for gesture-based games: a systematic review. JMIR Serious Games 2016 Oct 4;4(2):e17.
- Sousa V, Lopez KD. Towards usable E-health. A systematic review of usability questionnaires. Appl Clin Inform 2017 May 10;8(2):470-490.
- Po-Yin Y, Bakken S. Review of health information technology usability study methodologies. J Am Med Inform Assoc 2012;19(3):413-422.
- Zapata BC, Fernández-Alemán JL, Idri A, Toval A. Empirical studies on usability of mHealth apps: a systematic literature review. J Med Syst 2015 Feb;39(2):1.
- Nielsen J, Landauer TK. A Mathematical Model of the Finding of Usability Problems. In: Conference on Human Factors in Computing Systems. 1993 Presented at: CHI'93; July 6-9, 1993; Amsterdam.
- Virzi RA. Streamlining the design process: running fewer subjects. Proc Hum Factors Soc Annu Meet 2016 Aug 6;34(4):291-294.
- Virzi RA. Refining the test phase of usability evaluation: how many subjects is enough? Hum Factors 2016 Nov 23;34(4):457-468.
- Spool J, Schroeder W. Testing Web Sites: Five Users Is Nowhere Near Enough. In: Human Factors in Computing Systems. 2001 Presented at: HFCS'01; August 5-7, 2001; Seattle.
- Martins AI, Queirós A, Silva AG, Rocha NP. ICF Based Usability Scale: Evaluating Usability According to the Evaluators' Perspective About the Users' Performance. In: 7th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion. 2016 Presented at: SDTE'16; March 2-3, 2016; Vila Real.
- Munn Z, Peters MD, Stern C, Tufanaru C, McArthur A, Aromataris E. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med Res Methodol 2018 Nov 19;18(1):143.
Edited by G Eysenbach; submitted 23.07.20; peer-reviewed by S Olsen, T Risling, A Jaiswal; comments to author 16.09.20; revised version received 29.09.20; accepted 29.10.20; published 13.01.21
Copyright
©Anabela G Silva, Hilma Caravau, Ana Martins, Ana Margarida Pisco Almeida, Telmo Silva, Óscar Ribeiro, Gonçalo Santinha, Nelson P Rocha. Originally published in JMIR Human Factors (http://humanfactors.jmir.org), 13.01.2021.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication on http://humanfactors.jmir.org, as well as this copyright and license information must be included.