This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication on https://humanfactors.jmir.org, as well as this copyright and license information must be included.
The objectives of the study were to evaluate the overall usability of the user interface of the feasibility tool, identify critical usability issues, assess how comprehensible the underlying ontology was to operate, and analyze user feedback on additional functionalities. From these, recommendations for quality-of-use optimization, focusing on more intuitive usability, were derived.
To achieve the study goal, an exploratory usability test consisting of 2 main parts was conducted. In the first part, the thinking aloud method (test participants express their thoughts aloud throughout their use of the tool) was complemented by a quantitative questionnaire. In the second part, the interview method was combined with supplementary mock-ups to collect users’ opinions on possible additional features.
The study cohort rated global usability of the feasibility tool based on the System Usability Scale with a good score of 81.25. The tasks assigned posed certain challenges. No participant was able to solve all tasks correctly. A detailed analysis showed that this was mostly because of minor issues. This impression was confirmed by the recorded statements, which described the tool as intuitive and user friendly. The feedback also provided useful insights regarding which critical usability problems occur and need to be addressed promptly.
The findings indicate that the prototype of the
The past decade has seen various projects aimed at making medical data and biological samples available for research. On a national level, the German Biobank Node (GBN) [
The ABIDE project benefits from previous work using the infrastructure of the DICs established within the MII. In addition, the ABIDE project takes advantage of the experience gained from the German Biobank Alliance (GBA) [
The feasibility tool v1 at the time allowed a simple querying of data elements based on the COVID-19–specific German Corona Consensus Data Set (GECCO) [
Nonetheless, some usability problems were uncovered. Among the points noted were a need for clearer visualization of the subdivision of inclusion and exclusion criteria, a uniform display of linking using Boolean operators, and the ability to search for synonyms. In addition, users wanted a function to save a created query and continue editing it later, or to archive sent queries together with the results. These and other functionalities became additional requirements for the improved version of the feasibility tool developed in the ABIDE project (hereinafter referred to as feasibility tool v2). A further technical development goal of the ABIDE project was to integrate the temporal restriction of criteria, the grouping of criteria, and the representation of their temporal relationship to each other. Another priority was the extension of the searchable data set by expanding the underlying ontology to the entire core data set of the MII [
In this way, the entire patient collective of the participating university hospitals can be considered in future study cohorts by means of appropriate feasibility queries. Furthermore, the integration of the feasibility tool v2 into the German Research Data Portal for Health (Forschungsdatenportal für Gesundheit [FDPG]) [
The planned implementation was tested during development using a simulation prototype and supplementary mock-ups. On the basis of the feedback, a revised version of the feasibility tool (v3) will be created, which can then serve the development team as a reference for the final programming.
The objectives of the study were to (1) evaluate the overall usability of the user interface, (2) identify critical usability issues, (3) assess how comprehensible the underlying ontology was to operate, and (4) analyze user feedback on additional functionalities. From these, recommendations for quality-of-use optimization, focusing on more intuitive usability, were derived.
We conducted an exploratory usability test consisting of 2 main parts:
1. The thinking aloud method, in which test participants express their thoughts aloud throughout their use of the tool, was complemented by a quantitative questionnaire.
2. The interview method was combined with supplementary mock-ups to collect users’ opinions on possible additional features.
The participants tested the feasibility tool v2 on the web from their workplace. Neither randomization into intervention and control groups nor blinding took place.
The ethics committee at the Friedrich-Alexander-Universität Erlangen-Nürnberg approved the study (21-420-S).
The focus of the study was on the primary user group of the feasibility tool v2: researchers who have a research question and require a cohort with specific patient data or available biospecimens to address it. Professionals with a biobanking background and IT specialists with a research background were also recruited as a secondary user group because it can be assumed that they will also use the tool (eg, to process internal queries). Recruitment was initiated and coordinated by the ABIDE project management, and potential study participants were contacted through project staff at each site. One prerequisite was that the participants had no prior experience with the tool to be tested, which prevented an overlap with those who tested the first prototype. In accordance with the requirements of the study protocol, a sufficient number of individuals were approached to achieve the sample size of at least 14 volunteers.
The feasibility tool v2 was evaluated in January 2022. Compared with the first release, this version includes the core MII data set in addition to the GECCO. This enhancement means that the FDPG can ultimately serve as a central point of contact for people who want to check the Germany-wide availability of data and biospecimens from affiliated university hospitals to answer their research questions. In alignment with study protocols, in which exclusion and inclusion criteria are usually formulated for research questions, the interface of the feasibility tool v2 was designed to be structured accordingly (
Criteria that are relevant for the study or should be avoided can be searched for in the respective areas using either a free-text search or a category tree (
After the initial selection of the criterion, a pop-up window opens offering the possibility to add further restrictions (
In addition to criterion-specific restrictions (eg, specification of a value range or the localization of the biospecimen), a temporal constraint is possible. The possibility to link the selected criteria using Boolean operators is offered as soon as the criteria have been finally added to the query. Once the desired query has been formulated, it can be executed (
As soon as the search query is processed, the result is displayed in the upper area under
Search interface of the feasibility tool v2 of the
Search options via free-text search or category tree.
Pop-up window with the possibility to specify selected criteria.
The query used to initiate the search process.
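The query logic described above (inclusion and exclusion criteria, optional value-range and temporal restrictions, and Boolean linking) can be illustrated with a minimal sketch. This is not the data model or API of the feasibility tool: the class `Criterion`, the functions `is_eligible` and `count_cohort`, the criterion codes, and the toy patient records are all hypothetical and chosen for illustration only. The sketch assumes that inclusion criteria are organized as OR-groups that are combined with AND and that a single matching exclusion criterion removes a patient from the cohort.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical patient model: {criterion_code: [(value, observation_date), ...]}
Patient = dict[str, list[tuple[Optional[float], date]]]


@dataclass(frozen=True)
class Criterion:
    """One search criterion with optional value-range and temporal restrictions."""
    code: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    not_before: Optional[date] = None
    not_after: Optional[date] = None

    def matches(self, patient: Patient) -> bool:
        # A criterion matches if at least one record satisfies every restriction.
        for value, when in patient.get(self.code, []):
            if self.min_value is not None and (value is None or value < self.min_value):
                continue
            if self.max_value is not None and (value is None or value > self.max_value):
                continue
            if self.not_before is not None and when < self.not_before:
                continue
            if self.not_after is not None and when > self.not_after:
                continue
            return True
        return False


def is_eligible(patient: Patient,
                inclusion: list[list[Criterion]],
                exclusion: list[Criterion]) -> bool:
    # Assumption: OR within each inclusion group, AND across groups.
    included = all(any(c.matches(patient) for c in group) for group in inclusion)
    # Assumption: any matching exclusion criterion excludes the patient.
    excluded = any(c.matches(patient) for c in exclusion)
    return included and not excluded


def count_cohort(patients: list[Patient],
                 inclusion: list[list[Criterion]],
                 exclusion: list[Criterion]) -> int:
    return sum(is_eligible(p, inclusion, exclusion) for p in patients)


# Toy data with invented codes and dates.
patients: list[Patient] = [
    {"hypertension": [(None, date(2021, 3, 1))], "body_weight_kg": [(82.0, date(2021, 3, 1))]},
    {"hypertension": [(None, date(2019, 1, 1))], "body_weight_kg": [(70.0, date(2019, 1, 1))]},
    {"hypertension": [(None, date(2021, 6, 1))], "body_weight_kg": [(90.0, date(2021, 6, 1))],
     "pregnancy": [(None, date(2021, 6, 2))]},
]
inclusion = [
    [Criterion("hypertension", not_before=date(2020, 1, 1))],
    [Criterion("body_weight_kg", min_value=60.0, max_value=100.0)],
]
exclusion = [Criterion("pregnancy")]

cohort_size = count_cohort(patients, inclusion, exclusion)
```

In this toy data set, only the first patient satisfies both inclusion groups without matching the exclusion criterion: the second patient fails the temporal restriction, and the third is excluded.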
Interested participants were enrolled in the study after being recruited and provided with detailed information, including an informed consent form and a privacy statement. Upon receipt of the signed forms, an appointment was made to conduct the evaluation, which lasted approximately 60 minutes. After a brief welcome session and an overview of the study, the participants had to solve 3 tasks as part of an exploratory usability walk-through. The test leader recorded the test procedure and the participants’ comments in a structured protocol. After the participants had completed the test tasks, we collected information regarding usability, demographic aspects, expertise, and so on, using the web-based survey tool SoSci Survey [
For backup reasons, the entire session was captured on Zoom (Zoom Video Communications, Inc) [
The evaluation team compiled 2 test tasks themselves to cover as much of the range of functions of the feasibility tool v2 as possible. Care was taken to ensure that these tasks reflected realistic requests and varied in their degree of complexity. In addition, a third task was formulated based on a real-world request submitted during an MII workshop. While carrying out the tasks, the test participants were encouraged to express their thoughts aloud according to the thinking aloud method [
After completing the tasks, the test participants were asked to describe their immediate impression of the feasibility tool v2 and, in particular, to list positive and negative design aspects as well as make suggestions for improvement. Subsequently, they were asked to assess the usability of the query tool using the System Usability Scale (SUS). According to Brooke [
A final interview block [
Mock-up for the additional feature groups.
Mock-up for the additional feature Time Linkage.
The corresponding interaction path was demonstrated to the test participants by the test leader for illustration purposes. On the basis of these mock-ups, the participants were asked to assess whether they perceived the approach as intuitive and, if not, what navigation path they would have expected. For the representation of
After the test sessions, the task processing protocols were checked for completeness, supplemented if necessary, and electronically documented. All positive and negative aspects of the tool were extracted from the protocols. Three usability and ontology experts categorized the problems separately as usability related or ontology related. The consensus decision was documented in a list. In cases where a sharp distinction between usability-related problem and ontology-related problem was not possible, these were grouped in a separate cluster. The negative aspects were additionally rated by 2 experts using the severity scale developed by Nielsen et al [
The correctness of task processing was both evaluated globally and differentiated for the respective task steps using a self-developed scoring system (
Regarding the SUS, we applied a quantitative evaluation using the scoring method formulated by Brooke [
Analogous to the thinking aloud protocols, the feedback obtained during the interviews regarding the additional functionalities was recorded and documented electronically. The statements were subjected to a descriptive qualitative content analysis.
The study cohort consisted of 22 test participants from 14 ABIDE partners. This corresponds to 92% (22/24) of the potential participants approached and thus comfortably exceeds the planned sample size of 14 test participants. The majority of the study cohort was composed of the younger age groups
Detailed characteristics of the study cohort.
Variables | Values, n (%)
25 to 34 | 8 (36)
35 to 44 | 11 (50)
45 to 54 | 2 (9)
55 to 64 | 1 (5)

Male | 9 (41)
Female | 13 (59)

Researcher | 7 (32)
Professional with biobanking background | 4 (18)
IT professional with research background | 8 (36)
Other | 2 (9)
Not specified | 1 (5)

Work experience (years), mean (SD) | 4.65 (5.34)

None | 10 (45)
Some | 12 (55)

No | 13 (59)
Yes | 9 (41)

Medium | 9 (41)
High | 13 (59)

Very low | 2 (9)
Rather low | 5 (23)
Medium | 8 (36)
Rather high | 7 (32)
The effectiveness analysis (completeness and accuracy) showed that no participant managed to solve all the tasks correctly (in the sense of matching the model solution). Task 1a was successfully completed by half of the test participants (11/22, 50%). Task 1b displayed the best performance with a success rate of 100%. In task 2, of the 22 participants, 14 (64%) obtained the correct result. By contrast, only 1 (5%) of the 22 participants was able to solve task 3.
The accuracy analysis of the partial steps that had to be processed within the assignments based on the scoring system is presented in
Task success according to the scoring system.
Task | Maximum possible score | Mean score achieved (SD) | Accuracy rate, % |
Task 1a | 8 | 7.23 (1.28) | 90.37 |
Task 1b | 1 | 1.00 (0.00) | 100 |
Task 2 | 8 | 7.64 (0.48) | 95.50 |
Task 3 | 5 | 3.32 (0.87) | 66.40 |
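The accuracy rates in the table follow from dividing the mean score achieved by the maximum possible score. The short sketch below recomputes them from the tabulated means; the function name `accuracy_rate` and the dictionary `TASK_SCORES` are illustrative and not part of the study materials. Because the means in the table are already rounded, the recomputed value for task 1a (90.375) can differ from the reported 90.37 in the last digit.

```python
def accuracy_rate(mean_score: float, max_score: float) -> float:
    """Return the mean score as a percentage of the maximum possible score."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return 100.0 * mean_score / max_score


# (mean score achieved, maximum possible score), taken from the table above
TASK_SCORES = {
    "Task 1a": (7.23, 8),
    "Task 1b": (1.00, 1),
    "Task 2": (7.64, 8),
    "Task 3": (3.32, 5),
}

rates = {task: accuracy_rate(mean, maximum) for task, (mean, maximum) in TASK_SCORES.items()}

for task, rate in rates.items():
    print(f"{task}: {rate:.2f}%")
```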
Of the maximum possible 8 points in task 1a and task 2, participants obtained an average of 7.23 (SD 1.28) points and 7.64 (SD 0.48) points, respectively. This corresponds to a success rate of 90.37% and 95.50%, respectively. Task 3 could only be completed correctly with an accuracy of 66.40%. With a maximum of 5 possible points, this corresponds to an average of 3.32 (SD 0.87) points scored. In task 1a, the major source of error was the choice of diagnosis (8/22, 36%). Instead of choosing “Essential (primary) hypertension,” participants often selected another characteristic containing the term “hypertension” (eg, “Hypertension [hypertensive disease]”). The same potential for error was present in task 3 for both criteria (“Vancomycin” [selected by 18, 82% of the 22 participants] and “treated in intensive care” [selected by 7, 32% of the 22 participants]) being searched. Less frequently, errors occurred because of an incorrect
System Usability Scale item and mean (SD) values
I think that I would like to use this query tool frequently: 4.6 (0.5)
I found the query tool unnecessarily complex: 1.6 (1.0)
I thought that the query tool was easy to use: 4.3 (0.8)
I think that I would need the support of a technical person to be able to use this query tool: 1.9 (1.0)
I found the various functions in this query tool were well integrated: 4.1 (0.7)
I thought there was too much inconsistency in this query tool: 1.8 (0.9)
I would imagine that most people would learn to use this query tool very quickly: 4.3 (0.8)
I found the query tool very cumbersome to use: 1.5 (1.0)
I felt very confident using the query tool: 3.8 (0.9)
I needed to learn a lot of things before I could get going with this query tool: 1.8 (0.7)
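As a plausibility check, the global SUS score can be recomputed from the item means listed above using the standard SUS scoring rule: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. Because this rule is linear, applying it to the item means yields the same result as averaging the individual scores. The function name `sus_score` is illustrative, and the sketch assumes that the items are listed in the standard SUS order.

```python
def sus_score(responses: list[float]) -> float:
    """Compute a SUS score (0 to 100) from 10 item responses on a 1-to-5 scale."""
    if len(responses) != 10:
        raise ValueError("the SUS has exactly 10 items")
    total = 0.0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("responses must lie between 1 and 5")
        if item_number % 2 == 1:
            total += response - 1   # positively worded item
        else:
            total += 5 - response   # negatively worded item
    return total * 2.5


# Item means in the order listed above
ITEM_MEANS = [4.6, 1.6, 4.3, 1.9, 4.1, 1.8, 4.3, 1.5, 3.8, 1.8]

global_sus = sus_score(ITEM_MEANS)
print(f"SUS score from item means: {global_sus:.2f}")
```

With the rounded item means above, the calculation gives 81.25, which agrees with the global SUS score reported for the study cohort.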
The mean SUS score of test participants classified as
The evaluation of the findability of criteria—based on the questions formulated in addition to the SUS scores—by the study participants indicates that the search for criteria was perceived as easy. Participants found that searching via the category tree tended to be more difficult than via the free-text search. More than half of the participants (14/22, 64%) had the impression that they could easily find the relevant criteria to solve the test tasks.
Rating of the additional items regarding the findability of criteria.
The analysis of the thinking aloud protocol revealed that the majority of the participants (13/22, 59%) assessed the user interface of the feasibility tool v2 as simple to use and intuitive. Searching for criteria using the free-text search was frequently emphasized as a helpful feature. Moreover, the clarity of the user interface and visual separation of the inclusion and exclusion criteria were highlighted as particularly positive. The switch button that makes it easy to change
In addition to the positive aspects, 39 usability problems were identified and classified using the severity scale developed by Nielsen et al [
One of the 5 usability catastrophes was that the free-text search bar was hard to locate because the free-text input fields are grayed out, which suggests that they are inactive. In addition, the identification of relevant criteria in the results list of the free-text search was partly perceived as difficult, first because the code type was not labeled and second because the criteria path could not be traced. Furthermore, the restriction of the time period with the operator
The usability catastrophes and major usability problems are visualized in
The study participants assessed the orientation at the upper level of the category tree as good. In addition, it was observed that the orientation at lower levels was perceived as comprehensible by the test participants if they had background knowledge about the criteria. Overall, most of the participants (14/22, 64%) found it quite easy to identify relevant criteria as shown in
The mixed use of German and English terms—predetermined by the MII core data set—was perceived as cumbersome by some test participants and led to comprehension problems. The sorting of the criteria in the category tree was criticized at several points, and preferred alternatives were suggested; for example, some of the participants (4/22, 18%) wanted the criteria to be ordered alphabetically, whereas others (2/22, 9%) preferred sorting by relevance. Furthermore, criteria with the designation
With regard to the additional features presented in the supplementary mock-ups, the interview analysis revealed that the implementation of the group function was considered successful and intuitive by almost all of the test participants (21/22, 95%). However, it was also pointed out that the
The option to link subgroups within a group in terms of temporal dependencies was perceived as rather complex. In principle, the function is considered useful because questions with temporal dependencies occur frequently, especially in the oncology field. However, the presented implementation of the function was still perceived as not very intuitive. Possible improvements could involve providing (1) a stronger emphasis of the button
The discussion on the depth of criteria nesting provided a heterogeneous picture. Regarding the intuitive approach, the recorded solutions varied from the entry of individual criteria and the formation of groups to the desired possibility of assigning criteria directly to other related criteria (eg, assigning the criterion
The rationale for this work was to simultaneously develop and assess the feasibility tool v2 regarding usability and to evaluate the comprehensibility of the underlying ontology with regard to the findability of criteria.
Thinking aloud tests are an established method for formative evaluations to identify usability problems and their causes early in the development process and have been applied several times in the clinical field for usability evaluation of query builders [
In addition, we conducted user interviews, which are fundamentally well suited to elicit user desires and insights and have been applied several times for usability evaluations [
Usability tests were conducted with a sample of 22 participants. This number is sufficient from the point of view of conducting (1) the thinking aloud test, which requires a minimum of 3 to 5 test participants [
Overall, the combination of methods allowed us to obtain a very diverse picture of user views and identify important usability issues that would need to be addressed in the next iteration. Furthermore, this combination of methods was easy to apply without the need for any special application knowledge and could be performed within a reasonable amount of time to obtain ideas for further developments very promptly.
The evaluation of the usability of the feasibility tool v2 indicated a good degree of user-friendliness. The quantitative evaluation of the SUS questionnaire also confirmed the impression gathered through user feedback. In comparison with the previous version of the prototype, it can be stated that the critical usability problems identified in the evaluation by Sedlmayr et al [
Compared with similar tools, the feasibility tool v2 performs relatively well. With a SUS score of 81.25, the feasibility tool v2 performed better than the query tools Informatics for Integrating Biology and the Bedside (i2b2) [
Although the group function is technically and graphically rather easy to implement, the temporal link is more complex. Methods for technical as well as graphical implementation can already be found in the literature [
We also discovered that the underlying ontology has a crucial impact on the usability and acceptance of a feasibility tool. This was particularly evident in the direct comparison between the extended version with the comprehensive MII core data set [
Despite the efforts we made to apply a real-world approach to the study design to obtain meaningful results for the subsequent development steps, our work includes some limitations. First, it should be mentioned that test participants were recruited for the study at sites that were ABIDE project partners. Nevertheless, care was taken to ensure that the participants were not directly involved in the project work so that they could provide an unbiased evaluation. Another aspect that could have contributed to selection bias is the fact that the study was conducted via Zoom. This method saves time and resources, but it lacks the advantages offered by a standardized test environment, although the literature shows that remote testing can be expected to produce results similar to those of laboratory testing and is an equally good method for usability testing [
Another limitation is that a prototype was evaluated. On the one hand, this had the consequence that neither test data nor real data were connected; thus, no realistic results could be provided after the query was sent. As this is only a small aspect, and the focus was on the general usability of the tool, it can be assumed that this factor is negligible. On the other hand, because the prototype did not contain all functionalities, the envisaged additional functions could only be presented in the form of mock-ups. In this way, the analysis of the navigation path and usability was limited. However, because the evaluation took place during development, we see it as an advantage that the planned implementation could first be tested using the mock-ups before any programming work was done. According to the feedback, a revised version of the mock-ups can now be created, which can then serve as a reference for the development team.
We would like to point out that the different ways of presenting scenarios (2 tasks in tabular format and 1 in free-text format), the test execution time, and the fatigue state of the participants at the time of testing could have influenced the results. However, we conducted an exploratory study with a focus on collecting suggestions for improvement for the next iteration and not a classical experiment, in which a confounder analysis is common.
The exclusive use of the SUS as a standardized questionnaire could be perceived as an additional limitation. Although the SUS has been used previously to evaluate ontologies, it had to be adapted for this purpose [
The findings from the evaluation indicate that the investigated prototype of the feasibility tool v2 has good usability. The global SUS score of 81.25 can be rated as
Test tasks of the usability evaluation of the Aligning Biobanking and Data Integration Centers Efficiently feasibility tool v2.
Questionnaires of the usability evaluation of the Aligning Biobanking and Data Integration Centers Efficiently feasibility tool v2.
Scoring system for the assessment of the correctness of task processing.
Usability catastrophes and major usability problems (visualization and optimization).
ABIDE: Aligning Biobanking and Data Integration Centers Efficiently
CODEX: COVID-19 Data Exchange Platform
DIC: data integration center
FDPG: German Research Data Portal for Health (Forschungsdatenportal für Gesundheit)
FHIR: Fast Healthcare Interoperability Resources
GBA: German Biobank Alliance
GBN: German Biobank Node
GECCO: German Corona Consensus Data Set
i2b2: Informatics for Integrating Biology and the Bedside
ICD-10: International Classification of Diseases, Tenth Revision
ICD-O-3: International Classification of Diseases for Oncology, version 3
LOINC: Logical Observation Identifiers Names and Codes
MII: Medical Informatics Initiative (Medizininformatik-Initiative)
OHDSI: Observational Health Data Sciences and Informatics
SUS: System Usability Scale
The authors would like to thank the staff at all participating Aligning Biobanking and Data Integration Centers Efficiently (ABIDE) locations: Aachen, Augsburg, Berlin, Dresden, Erlangen, Frankfurt, Freiburg, Gießen, Göttingen, Halle, Heidelberg, Jena, Lübeck, and Tübingen. Special thanks go to all scientists and researchers who participated in this study and gave us valuable insight into the usability of the feasibility tool v2 through the thoughts they expressed aloud and their interview responses. The authors would also like to thank Björn Kroll, who was significantly involved in the development of the first version of the prototype, and Caroline Glathe, who assisted with taking notes during the evaluation and provided her feedback on the mock-ups. The study was conducted as part of the ABIDE project, which is funded by the German Federal Ministry of Education and Research (FKZ 01ZZ2061A, 01ZZ2061I, 01ZZ2061E, and 01ZZ2061D).
CS wrote the first version of the manuscript. MZ and CS planned and conducted the usability study, which was supervised by HUP and BS. JG was the team leader. TK played a leading role in the development of the graphical user interface, and LR led the work on the ontology. CS and HUP were responsible for the recruitment of the test participants. MZ and CS analyzed all thinking aloud and interview protocols as well as the recorded screen videos. All authors read the first version of the manuscript and provided valuable suggestions for changes.
None declared.