This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication on http://humanfactors.jmir.org, as well as this copyright and license information must be included.
Little is known about the acceptance and usability of computerized adaptive tests (CATs) among patients with rheumatoid arthritis (RA). The main difference between completing a CAT and a traditional questionnaire concerns item presentation. A CAT presents only one item at a time on the screen, and skipping forward or backward to review and change answers already given is often not possible.
The objective of this study was to examine how patients with RA experience a Web-based CAT for fatigue.
In individual sessions, participants filled in the CAT while thinking aloud, and were subsequently interviewed about their experience with the new instrument. The technology acceptance model (TAM) was used to structure the results.
The participants were 15 patients with RA. They perceived the CAT as clear, brief, and easy to use. They were positive about answering one question per screen, the changing response options, layout, progress bar, and item number. Of the participants, 40% (6/15) also mentioned that they experienced completing the CAT as useful and pleasant, and that they liked the adaptive test mechanism. However, some participants noted that not all items were applicable to everybody, and that the wordings of questions within the severity dimension were often similar.
Participants perceived the “CAT Fatigue RA” as easy to use, and also its usefulness was expressed. A 2.0 version has been improved according to the participants’ comments, and is currently being used in a validation study before it will be implemented in daily clinical practice. Our results give a first indication that CAT methodology may outperform traditional questionnaires not merely on measurement precision, but also on usability and acceptance valuation.
The use of Web-based technology to monitor disease course and quality of life of patients will increase tremendously in the future due to demand for greater transparency in health care and innovations in the use of Web-based measurement technology. At least for patient reported outcome measures, patients themselves will directly use this technology, and information on technology acceptance is therefore of crucial importance to estimate the benefits and long-term effectiveness of innovative technology in health care [
Rheumatoid arthritis (RA) is a chronic auto-immune disease that is characterized by inflammation of the joints [
In a CAT, items are successively selected from a large item bank, based on the patient’s previous answer. Measurement is thus tailored to the individual level, leading to greater measurement precision, with need of fewer items than traditional questionnaires [
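The selection principle described above can be illustrated with a minimal sketch. It is not the algorithm of the CAT Fatigue RA, which is built on multidimensional IRT; for readability it assumes a unidimensional two-parameter logistic model with yes/no items, a hypothetical four-item bank, and a simple grid-based estimate of the fatigue level. All item names and parameter values are invented for illustration.

```python
import math

# Hypothetical item bank: (item id, discrimination a, location b).
# Values are invented; the real item bank is not reproduced here.
ITEM_BANK = [
    ("item_1", 1.2, -1.0),
    ("item_2", 0.8, 0.0),
    ("item_3", 1.5, 0.5),
    ("item_4", 1.0, 1.5),
]


def prob_endorse(theta, a, b):
    """Probability of endorsing an item under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of one item at the current fatigue estimate."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, administered):
    """Pick the not-yet-administered item that is most informative at theta."""
    candidates = [item for item in ITEM_BANK if item[0] not in administered]
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item_information(theta, item[1], item[2]))
    return best[0]


def estimate_theta(responses):
    """Posterior mean of theta on a grid, assuming a standard normal prior.

    responses is a list of (a, b, answer) tuples with answer 1 (yes) or 0 (no).
    """
    grid = [step / 10.0 for step in range(-40, 41)]
    weights = []
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)
        for a, b, answer in responses:
            p = prob_endorse(theta, a, b)
            weight *= p if answer == 1 else 1.0 - p
        weights.append(weight)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


# Start at the prior mean, present the most informative item, then update.
first_item = select_next_item(0.0, set())
theta_after_yes = estimate_theta([(1.5, 0.5, 1)])
second_item = select_next_item(theta_after_yes, {first_item})
```

Each answer shifts the estimate, and the next item is chosen to be most informative at the updated estimate, which is why fewer items are needed than in a fixed-length questionnaire.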
The CAT Fatigue RA has been constructed with multidimensional IRT [
However, relatively little is known about how patients experience the use of CATs in the measurement of patient reported outcomes (PROs). A few previous studies have shown that the overall user acceptance was quite high. Participants mainly expressed criticism on layout issues [
We were especially interested in whether patients would face any problems while filling in the CAT, and whether they would perceive it as a useful instrument. These aspects of usability are properly included in the Technology Acceptance Model (TAM) [
The TAM explains user acceptance of new technology by two main determinants: (1) perceived usefulness (PU), and (2) perceived ease of use (PEOU). PU refers to the degree to which a person believes a system to be worth using, for example, advantageous. PEOU refers to the degree to which a person believes that using a system does not cost much effort. Davis [
For the actual use of the CAT Fatigue RA, it is important that patients will not face any difficulties during its completion. The CAT is an Internet application that is intended to be used for PRO-monitoring in daily clinical practice and for research purposes. Patients gain access via their personal accounts of the Web-based Rheumatology Online Monitor Application (ROMA) that is used by many Dutch rheumatology units. Usually patients complete questionnaires on the Internet at home before their consultation at the rheumatology outpatient department. If a patient perceives filling in the CAT as difficult, not useful, or not enjoyable, the risk of drop out will be high.
Although many patients are already used to computer-based questionnaires, the completion of a CAT differs from filling in traditional fixed-length questionnaires. In a traditional questionnaire, patients see all questions immediately, and they have the opportunity to reread and to change answers. In a CAT, in contrast, only one item at a time is presented on the screen, and often patients cannot skip forward or back [
In this study, patients filled in the CAT Fatigue RA in individual sessions while thinking aloud, and were subsequently interviewed on their experiences with the new instrument. Think aloud is a highly recommended method used to identify possible problems in measurement tools [
Technology acceptance model.
Participants were selected from a sample of patients that had participated in a previous study [
Except for one appointment at a participant’s home, all sessions took place at the university. After receiving information on the study (eg, not the person, but the application will be tested), participants signed an informed consent and filled in some background questions. Then participants filled in the CAT while thinking aloud. In case a person forgot to articulate his or her thoughts, the researcher reminded him or her to do so. Finally, a brief interview on the CAT took place. The think aloud sessions and the interviews were recorded on audiotape. The travel costs for the participants were refunded.
Participants answered background questions (gender, age, education, and work status), and gave disease-specific information (disease duration, comorbidity, and numerical rating scales [NRSs] for global health, pain, and fatigue). The NRSs had eleven points (ranging from 0 to 10) and the following anchors: very good/very poor, no pain/unbearable pain, and no fatigue/totally exhausted.
The CAT item bank consists of 196 items and three dimensions; severity (13 items, example,
Screenshot of the CAT Fatigue RA.
The method of think aloud is the most prominent user-based usability method [
After completing the CAT Fatigue RA, participants were asked about their experience with, and opinions on, the new measurement instrument, according to the interview scheme shown in
How did you experience completing the instrument?
What do you think about the successive administration of only one item per screen?
Did you notice that the response formats changed? What do you think about that?
How well could you read the questions? What do you think about letter size, colors, etc?
Did you notice the progress bar? What do you think about it?
What do you think about the length of the test/the number of questions?
Do you have any further comments about the CAT, did you notice anything else?
The audio material (think aloud sessions and interviews) was transcribed verbatim. The interview material was sorted per interview question. The comments from the think aloud part were sorted per participant. To thoroughly analyze the data, a code scheme was developed in a combination of bottom-up (search for meaningful units in the transcripts) and top down (guided by TAM) methods by reading the transcripts in detail [
Six men and nine women diagnosed with RA participated. Mean age was 56.13 years (SD 10.82) and mean disease duration was 12.40 years (SD 7.18). An overview of the participants and further patient characteristics are shown in
The results of the think aloud sessions and the interviews will be reported in terms of the TAM and illustrated by quotes.
Overview of participants.
Participant | Gender | Age, years | Education | Work | Disease duration, years | Comorbidity | NRS health | NRS pain | NRS fatigue |
1 | F | 45 | High | Full-time | 24 | Yes | 3 | 2 | 7 |
2 | F | 49 | High | Full-time | 15 | Yes | 7 | 5 | 6 |
3 | F | 46 | High | Part-time | 20 | No | 7 | 7 | 7 |
4 | F | 66 | Moderate | Retired | 6 | No | 4 | 6 | 7 |
5 | F | 69 | Moderate | Retired | 23 | No | 2 | 1 | 3 |
6 | M | 59 | High | Part-time | 10 | No | 6 | 4 | 3 |
7 | M | 54 | Moderate | Disabled | 10 | No | 5 | 5 | 8 |
8 | F | 63 | Low | Household | 5 | Yes | 4 | 4 | 5 |
9 | F | 62 | Moderate | Retired | 13 | No | 5 | 6 | 9 |
10 | M | 71 | Low | Retired | 3 | No | 0 | 0 | 1 |
11 | F | 60 | High | Full-time | 23 | Yes | 2 | 1 | 6 |
12 | F | 63 | Low | Household | 7 | No | 7 | 8 | 8 |
13 | M | 60 | Low | Full-time | 11 | No | 3 | 2 | 5 |
14 | M | 40 | Moderate | Disabled | 12 | No | 6 | 6 | 7 |
15 | M | 35 | Moderate | Disabled | 4 | No | 3 | 6 | 5 |
Mean | | 56.13 | | | 12.40 | | 4.27 | 4.20 | 5.80 |
SD | | 10.82 | | | 7.18 | | 2.12 | 2.46 | 2.18 |
Gender: M=male, F=female
Education, High: more than 14 years of education
Education, Moderate: 13-14 years of education
Education, Low: 12 or less years of education
NRS health: 0 = very good and 10 = very poor
NRS pain: 0 = no pain and 10 = unbearable pain
NRS fatigue: 0 = no fatigue and 10 = totally exhausted
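The summary rows of the participant overview can be recomputed from the individual rows. The short sketch below does this with Python's standard library, assuming that the reported SD is the sample standard deviation (divisor n-1); the variable and function names are chosen for illustration only.

```python
import statistics

# Values copied from the participant overview, in participant order 1-15.
age = [45, 49, 46, 66, 69, 59, 54, 63, 62, 71, 60, 63, 60, 40, 35]
disease_duration = [24, 15, 20, 6, 23, 10, 10, 5, 13, 3, 23, 7, 11, 12, 4]
nrs_health = [3, 7, 7, 4, 2, 6, 5, 4, 5, 0, 2, 7, 3, 6, 3]
nrs_pain = [2, 5, 7, 6, 1, 4, 5, 4, 6, 0, 1, 8, 2, 6, 6]
nrs_fatigue = [7, 6, 7, 7, 3, 3, 8, 5, 9, 1, 6, 8, 5, 7, 5]


def describe(values):
    """Return (mean, sample SD), both rounded to two decimals."""
    return round(statistics.mean(values), 2), round(statistics.stdev(values), 2)


summary = {
    "age": describe(age),
    "disease_duration": describe(disease_duration),
    "nrs_health": describe(nrs_health),
    "nrs_pain": describe(nrs_pain),
    "nrs_fatigue": describe(nrs_fatigue),
}
```

Under the sample SD assumption, the recomputed values agree with the Mean and SD rows of the table.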
Of the participants, 80% (12 out of 15) said that they experienced the CAT as clear and/or easy to complete, and 87% (13 out of 15) regarded it as advantageous to fill in only one question per screen, as it improved clarity. They found this presentation of items clear and well organized, making it easier to concentrate on the question and to be really engaged in answering it. It was argued that too many questions at the same time can be overwhelming or cluttered, and that with more simultaneous questions, people tend to look ahead at the next question.
Quite clear, good. Yes, because of course you shouldn’t let yourself be tempted to read all the questions as quickly as possible because that is a mistake people often make, that they just immediately do everything, and then they maybe do not give a truly orientated answer.
Only two of the 15 participants reported that the presentation of only one item per screen did not really matter to them.
However, concerns also emerged related to this way of item presentation. In two thirds of the sessions, the CAT selected three or four of the following items of the dimension severity, Item 2,
All but one participant recognized that not all items had the same response options. Of the participants, 40% (6 out of 15) said that they did not mind, that it was no problem, and that it did not distract them. Two thirds of the participants mentioned that it was advantageous that not all items had the same response options. In this way, items and response options match well with each other, which improves clarity. It was also argued that changing response options prevents people from always giving the same answer. Only one participant mentioned that it can be difficult to switch from one response format to another; however, this participant also reported having learned to fill in this kind of question without thinking about it too long. All participants described the readability of the questions as clear, good, or comfortable to look at.
Regarding test-length, all participants were positive. They reported that the CAT was quick to complete, and they experienced the CAT as a clear and brief instrument, also in comparison to other measurement instruments. Moreover, for 40% of the participants (6 out of 15), the number of questions turned out smaller than they had expected. In general, participants described the test-length of the CAT as great, clearly better than expected, brief, or to the point.
Of the participants, 40% (6 out of 15) declared that they regarded the CAT as useful; for example, one participant considered the CAT to be a nice questionnaire with relevant questions.
(...) they are relevant questions, they are also much more focussed and clear questions, so I think it is a nice questionnaire (...) it really goes into fatigue and in a good way. So yes, I found it surprising, a surprising thing to do. Then it gives me more the notion that it is worthwhile to complete. You can enter, I was tired lately, the last 7 days, yes, the last month, but that says so little about fatigue.
One person reflected on the adaptive testing mechanism.
What I really noticed was that if I had completed a question, that the computer sometimes took longer to get to the next question, and then I think, yes, that is logical, because then it is choosing the next question after all. (...) they are going to select which question fits with the answer to your previous question. I found it quite pleasant this way.
This person was of the opinion that the questions were useful and had good response options. Of the participants, 20% (3 out of 15) criticized that not all questions were applicable to each patient (eg, being too fatigued to do voluntary work).
Completing the CAT, and the successive administration of only one item per screen, was experienced as pleasant, nice, great, excellent, or positive by two thirds of the participants. Furthermore, 40% (6 out of 15) described the progress bar as pleasant, great, useful, or comforting. Of the participants, 53% (8 out of 15) liked the possibility of estimating their progress in completing the CAT; mostly, the progress bar was recognized immediately.
One participant described it as pleasant that it did not take a lot of time to fill in the CAT. One third of the participants (5 out of 15) said that they were glad about research into fatigue, and liked to support it through their participation. Two participants noted that it had been pleasant to fill in the CAT.
Please continue this because it is very nice. (...) I found it very nice after many years’ experience with ROMA and especially with the paper questionnaires. In the past I occasionally completed one of those things every two months. And then you think, aaaach, you really get to take such a pile of homework with you. So, no, it was very nice.
Two other participants also reported enjoying the idea that the CAT is testing adaptively.
To provide an overview of the different topics that are inherent to the use of a CAT in relation to those that are also inherent to Internet questionnaires in general, we summarized the main results in
Usability topics and their specificity to CAT.
Topic | Participants, N=15, n (%) | CAT specific/Internet fatigue measurement |
Clear and/or easy to complete | 12 (80) | Concerns CAT and Internet measurement |
 | | CAT specific |
Advantageous | 13 (87) | |
Did not matter | 2 (13) | |
 | | CAT specific |
Confusion | 7 (70) | |
No comment | 3 (30) | |
 | | CAT specific |
Positive opinion | 15 (100) | |
Good readability | 15 (100) | Specific to Internet measurement |
Good test-length | 15 (100) | Concerns CAT and Internet measurement |
Usefulness | 6 (40) | Concerns CAT and Internet measurement |
Criticism about not applicable items | 3 (20) | CAT specific |
Enjoyment | 14 (93) | Concerns CAT and Internet measurement |
This study investigated the usability of the first version of the CAT Fatigue RA in a sample of its end users. Overall, the CAT was positively evaluated. It was described as easy to use, clear, and brief. Some participants also reported perceiving the CAT as a useful instrument, and appreciated the idea of the adaptive test mechanism. Participants reported pleasure while filling in the CAT. However, usability problems were also identified regarding similarity between items and the general applicability of some items.
Several elements are important for acceptance of new technology and actual use of a system. The perceived ease of use of the CAT was supported by this study. All participants described reading the questions as clear and good, and were positive about test-length. They said that the CAT was quick and easy to complete. Moreover, it was argued that changing response options prevents people from always giving the same answer. Nearly all participants appreciated the successive presentation of one item on the screen at a time, as it improves clarity and makes it easier to concentrate on the question.
The item presentation in the CAT gave participants less control during completion than they would have had while filling in a traditional questionnaire. Since there is no opportunity to skip forward or backwards, it is impossible to see all questions at the same time, answer them in a flexible order, or review and change already given answers [
Regarding item formulation, participants reported that four items in the severity dimension were formulated in a very similar way. As they could not skip back within the CAT, they wondered whether the CAT presented items twice. To prevent a person from feeling confused by these items while filling in the CAT, the first version of the CAT was adapted. A brief introduction has now been included before the start of the instrument, informing patients that some items may seem similar. This should prevent people from becoming distracted from filling in the CAT attentively, or from feeling uncomfortable because they feel unable to answer in a consistent way. Another solution to this usability problem might be a more sophisticated algorithm that is able to recognize similar items, and consequently would avoid presenting them within one administration. However, usability considerations might then conflict with the selection of the best item in psychometric terms.
Nearly half of the participants reported perceiving the CAT as a useful instrument. They emphasized that the CAT contained relevant and clear questions that cover patients’ fatigue experience. Furthermore, participants liked the idea of the adaptive test mechanism, and of receiving items matched to their individual level of fatigue. Since participants were not explicitly asked about the usefulness of the CAT, this result is of special interest. A higher percentage of participants would probably have supported the usefulness of the CAT if a precise question about this topic had been included in the interview scheme.
However, some participants mentioned that not all questions were applicable to every patient. As a consequence, the response option “not applicable” was added to six items in the next version (eg, items about the impact of fatigue on work, cooking, or driving the car). When the “not applicable” option is chosen, the CAT receives no information for the fatigue estimation through this item, and will select the next optimal item for that particular patient as a substitute. In general, a comparable method might also be used to enable a skip forward function in a CAT. This could be useful in situations where it is adequate to give patients the possibility to skip questions they do not want to answer, for example, regarding private information. However, then an adequate way would be needed to communicate the option to skip items to patients without stimulating them to actually do this. Otherwise, too much loss of information might be the consequence. In the CAT Fatigue RA, a skip forward option does not seem necessary since the item pool has carefully been developed with a Delphi approach [
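The handling of a “not applicable” answer described above can be sketched as follows. This is an illustration under simplifying assumptions, not the implementation of the CAT Fatigue RA: it uses a unidimensional two-parameter logistic model, a hypothetical three-item bank with invented parameters, and hypothetical function names. The point it shows is that a “not applicable” answer marks the item as presented without adding anything to the scored responses, so the fatigue estimate is unchanged and the next most informative item is offered as a substitute.

```python
import math

NOT_APPLICABLE = "not applicable"

# Hypothetical item bank: item id -> (discrimination a, location b).
ITEM_BANK = {
    "work": (1.5, 0.5),
    "cooking": (1.2, -1.0),
    "driving": (0.8, 0.0),
}


def item_information(theta, a, b):
    """Fisher information of one item at the current fatigue estimate."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def next_item(theta, presented):
    """Most informative item that has not been presented yet."""
    candidates = [item for item in ITEM_BANK if item not in presented]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item_information(theta, *ITEM_BANK[item]))


def record_answer(item_id, answer, presented, scored):
    """Mark the item as presented; only applicable answers are kept for scoring."""
    presented.add(item_id)
    if answer != NOT_APPLICABLE:
        scored[item_id] = answer


presented, scored = set(), {}
theta = 0.0  # current fatigue estimate, unchanged by a "not applicable" answer
first = next_item(theta, presented)
record_answer(first, NOT_APPLICABLE, presented, scored)
substitute = next_item(theta, presented)
```

Because the skipped item is never presented again and contributes no information, the same mechanism could in principle support a general skip function, with the loss of information noted in the text as its cost.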
Technology acceptance is also related to perceived enjoyment. Participants perceived completing the CAT as a pleasant experience. They enjoyed answering successively one item on the screen at a time, and liked the progress bar, as it informed them on their completion progress. Other positive remarks explicitly referred to the idea that the CAT is testing adaptively.
The TAM turned out to be an adequate guideline to study the usability of the CAT Fatigue RA. Most participants reported perceiving the CAT as easy to use, and nearly half of the participants expressed that they perceived the CAT as useful. Perceived usefulness is of special importance for acceptance and use of new technology, and might be partly explained by the perceived ease of use [
This usability test provided important insights for further research with CATs. Similarly formulated items and items that might not be applicable to each participant are typical issues that may be faced when implementing CAT technology in practice. From a theoretical viewpoint, it is beneficial to include as many items as possible in the CAT item bank. Items that are similar to each other may also be useful, as they can be selected to measure very precisely at a certain level of fatigue. However, for the user, this rationale is not always clear, and may lead to usability problems. The same applies to an item bank with items that have no “not applicable” options. CAT was originally developed for educational and assessment purposes, where “not applicable” options are not appropriate. Adopting this technology in the health care context poses new usability questions.
This study has shown that the technology of CAT was well accepted by those who are intended to use it. The method of thinking aloud in combination with a subsequent interview on the participants’ experiences with the CAT has proven to be effective in uncovering usability problems, and thereby provided the opportunity to further improve the CAT. However, it cannot be ruled out that only patients who were already relatively familiar with using the computer registered for the study. A small group of patients without computer experience [
A possible field for future research on the CAT Fatigue RA is the development of a CAT version with a flexible stopping rule that ends the item administration when a certain measurement precision is reached before 20 items have been administered. This could lead to even more efficient measurement. However, the realization is challenging because the standard error on the separate dimensions does not always decrease monotonically as in a unidimensional CAT. Such nonmonotonic progress of the standard error is inherent to the multidimensional CAT algorithm, which takes information from all three dimensions into account at the same time. Future research should make clear which possibilities are available for our CAT regarding a flexible stopping rule.
To conclude, the CAT Fatigue RA turned out to be perceived as an easy and useful measurement instrument that was also enjoyed by participants. This study provided insight into usability problems, leading to adaptations to the CAT. Moreover, participants described usability aspects in which the CAT exceeds traditional questionnaires. Further questions concerning the usability of CAT methodology are related to the attractiveness of adaptive measurement in the long run. It is possible that the initial enthusiasm for this innovative measurement instrument will decrease when patients use the CAT on a regular basis and also for different purposes. However, there is a good chance that CATs will remain attractive since patients receive different items each time, which prevents boredom and predictability. On the other hand, it is also conceivable that those different items provoke scepticism about the comparability of CAT scores of repeated measures and/or between persons. These topics have to be examined in detail in future research.
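To make the difficulty concrete, the sketch below shows one conceivable form of a flexible stopping rule. It is purely illustrative and not a rule proposed or tested for the CAT Fatigue RA: the precision target of 0.35, the `patience` safeguard (the target must hold for several consecutive items), the dimension labels other than severity, and the standard error values are all hypothetical. Only the upper limit of 20 items is taken from the text.

```python
def precision_reached(se_by_dimension, se_target):
    """True when the standard error of every dimension is at or below the target."""
    return all(se <= se_target for se in se_by_dimension.values())


def stopping_point(se_history, se_target=0.35, max_items=20, patience=2):
    """Return the number of items after which administration would stop.

    se_history[i] holds the standard errors per dimension after item i + 1.
    Because standard errors may rise again in a multidimensional CAT, the
    target must hold for `patience` consecutive items before stopping.
    """
    streak = 0
    for n_items, se_by_dimension in enumerate(se_history, start=1):
        if precision_reached(se_by_dimension, se_target):
            streak += 1
        else:
            streak = 0
        if streak >= patience or n_items >= max_items:
            return n_items
    return len(se_history)


# Invented trajectory: the second dimension's SE rises again after item 4.
se_history = [
    {"severity": 0.60, "dimension_2": 0.70, "dimension_3": 0.80},
    {"severity": 0.40, "dimension_2": 0.50, "dimension_3": 0.60},
    {"severity": 0.34, "dimension_2": 0.33, "dimension_3": 0.35},
    {"severity": 0.33, "dimension_2": 0.36, "dimension_3": 0.34},
    {"severity": 0.32, "dimension_2": 0.34, "dimension_3": 0.33},
    {"severity": 0.31, "dimension_2": 0.33, "dimension_3": 0.32},
]

stop_naive = stopping_point(se_history, patience=1)
stop_guarded = stopping_point(se_history, patience=2)
```

In this invented trajectory, a rule that stops at the first moment the target is met would end after item 3, although the precision on one dimension is lost again at item 4. The guarded variant continues until the target has held twice in a row, which illustrates the trade-off between efficiency and stable precision that such a rule would have to settle.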
CAT: computer adaptive testing/computerized adaptive test
IRT: item response theory
NRS: numerical rating scale
PEOU: perceived ease of use
PRO: patient reported outcomes
PU: perceived usefulness
RA: rheumatoid arthritis
ROMA: Rheumatology Online Monitor Application
TAM: technology acceptance model
The Dutch Arthritis Foundation (Reumafonds) financed this study. The authors thank the participants for their contribution to this study. The authors also want to thank the two patient research partners, who were involved in the construction of the CAT, for their support with the interpretation of the results.
None declared.