Published in Vol 11 (2024)

Assessing the Relationship Between Digital Trail Making Test Performance and IT Task Performance: Empirical Study

Original Paper

1Tech3lab, HEC Montréal, Montréal, QC, Canada

2Faculty of Health Sciences, Hokkaido University, Sapporo, Hokkaido, Japan

Corresponding Author:

Jared Boasen, PhD


HEC Montréal

5540 Av Louis-Colin

Montréal, QC, H3T 1T7


Phone: 1 514 340 6000


Background: Cognitive functional ability affects the accessibility of IT and is thus something that should be controlled for in user experience (UX) research. However, many cognitive function assessment batteries are long and complex, making them impractical for use in conventional experimental time frames. Therefore, there is a need for a short and reliable cognitive assessment that has discriminant validity for cognitive functions needed for general IT tasks. One potential candidate is the Trail Making Test (TMT).

Objective: This study investigated the usefulness of a digital TMT as a cognitive profiling tool in IT-related UX research by assessing its predictive validity on general IT task performance and exploring its discriminant validity according to discrete cognitive functions required to perform the IT task.

Methods: A digital TMT (parts A and B) named Axon was administered to 27 healthy participants, followed by administration of 5 IT tasks in the form of CAPTCHAs (Completely Automated Public Turing tests to Tell Computers and Humans Apart). The discrete cognitive functions required to perform each CAPTCHA were rated by trained evaluators. To further explain and cross-validate our results, the original TMT and 2 psychological assessments of visuomotor and short-term memory function were administered.

Results: Axon A and B were administrable in less than 5 minutes, and overall performance was significantly predictive of general IT task performance (F5,19=6.352; P=.001; Λ=0.374). This result was driven by performance on Axon B (F5,19=3.382; P=.02; Λ=0.529), particularly for IT tasks involving the combination of executive processing with visual object and pattern recognition. Furthermore, Axon was cross-validated with the original TMT (Pcorr=.001 and Pcorr=.017 for A and B, respectively) and visuomotor and short-term memory tasks.

Conclusions: The results demonstrate that variance in IT task performance among an age-homogenous neurotypical population can be related to intersubject variance in cognitive function as assessed by Axon. Although Axon’s predictive validity seemed stronger for tasks involving the combination of executive function with visual object and pattern recognition, these cognitive functions are arguably relevant to the majority of IT interfaces. Considering its short administration time and remote implementability, the Axon digital TMT demonstrates the potential to be a useful cognitive profiling tool for IT-based UX research.

JMIR Hum Factors 2024;11:e49992



Cognitive functional ability is a fundamental factor widely recognized to influence IT usability [1-3]. The classical approach to control for cognitive functional ability is to target participants according to general demographics based on age, education, or other factors [4,5]. However, this approach intrinsically precludes the ability to control for or assess how cognitive functional ability impacts IT usability in individual users, thereby limiting the extent to which insight can be gained within a demographic or for an individual. Moreover, this approach is incongruent with the rapid advancement of IT toward products that adapt to individual user characteristics, which necessitates a more granular understanding of individual cognitive abilities [6-8].

To obtain a granular characterization of individual cognitive function, research has hitherto typically used cognitive assessment batteries [9-11]. Dumont et al [12] used the National Institutes of Health Toolbox, a battery of cognitive tests that can be completed in 40 minutes [13], to develop a cognitive analysis grid for drawing statistical parallels between the cognitive demands of an information systems interface and the performance of a user. Other batteries have also been used, such as the Kit of Factor-Referenced Cognitive Tests [10], which was used by Wagner et al [1] to study the impact of age on website usability and by Allen [14] to study how users’ cognitive abilities combine with specific information system functionalities to create system usability. This battery is typically administered in 144 minutes [15]. Another approach for assessing individual cognitive ability is to use clinically administered tests such as the Montreal Cognitive Assessment (MoCA) or the Mini-Mental State Examination (MMSE). Although typically used in medical settings to evaluate cognitive impairment in patients with neurological disorders [9,11], MoCA and MMSE have reportedly been used to measure the cognitive abilities of participants in human-computer interaction experiments [3,16-18]. However, while detailed and accurate, these cognitive assessment batteries are too lengthy to practically administer during typical user experience (UX) testing time frames [19,20]. Furthermore, while clinically administered tests such as MoCA and MMSE are comparatively shorter than other assessment batteries, they require a trained administrator to administer and score the test [3]. This level of expertise may not always be available, particularly in UX research settings where mostly nonclinically trained research personnel are conducting the experiments.

Correspondingly, there have been calls from across health, UX, and IT domains for a more practical yet accurate means of assessing cognitive function [12,21,22]. One solution would be to identify a short test with reduced scope that nevertheless targets cognitive functions important for using IT. Based on research conducted to understand the impact of cognitive functions on the use of technology by older people [23,24], and on existing models of cognitive architecture in human-computer interaction [25], we identified 5 key cognitive functions important for IT use: visual perception, motor function, executive function, inhibitory control, and working memory. Visual perception is important for finding relevant information cues on a web page [23]. Motor functions are involved in tasks such as data entry using the keyboard and navigation using the mouse or another tool to perform a digital task [26]. Executive functions come into play in order to make decisions and prioritize action [23]. Inhibitory control, also called “response inhibition” [27], is the functional ability to inhibit or override motor commands or other executive processing, such as when an external stimulus interferes with goal-driven behavior as in a task-switching situation [28,29]. Finally, short-term or working memory capacity may be important in IT task performance, for example, for remembering options or system output at a later stage [23].

One potential preexisting cognitive assessment candidate that targets these cognitive functions related to IT use is the Trail Making Test (TMT). First developed for the Army Individual Test Battery [30], the TMT is one of the most widely used instruments in neuropsychological assessment as an indicator of cognitive processing speed and executive functioning [31-35]. Many studies have been conducted to determine which cognitive abilities are engaged during the completion of this 2-part test (TMT-A and TMT-B). After a comprehensive review of the literature on the topic, Sánchez-Cubillo et al [36] explored the contributions of certain cognitive functions and found that part A of the TMT (TMT-A) mainly requires visual-perceptual abilities, and that part B (TMT-B) reflects primarily working memory, executive function, and task-switching ability. Finally, although its contribution in the TMT has been questioned by the study of Sánchez-Cubillo et al [36], it is interesting to note that psychomotor ability has been mentioned numerous times as one of the abilities required for both parts of the TMT (Groff and Hubble [37] in both parts, Schear and Sato [38], Gaudino et al [39], and Crowe [40] in part B). The primary objective of this study was to test the validity of using the TMT as a cognitive profiling tool to predict or explain the variance in IT task performance. With an interest in a practical tool for cognitive profile assessments in UX testing of digital artifacts, we chose to use a digital version of the TMT. To further support and explain our results, we additionally cross-validated the digital TMT with the original TMT, a visual search task assessing visuomotor processing [41,42], and a hidden path learning task assessing visuomotor-processing speed, spatial working memory, and error-monitoring ability [43]. 
We had two hypotheses: (1) TMT times would be predictive of general IT task performance and (2) the predictive power of the TMT would be stronger for tasks requiring cognitive functions that are congruent with those assessed by the TMT.


To test our hypotheses, we conducted a laboratory experiment with 27 healthy participants (12 men and 15 women), between 18 and 36 (mean 24, SD 4.22) years of age, who were mostly university students (n=22, 85%).

Ethical Considerations

Written informed consent was obtained from all subjects via a signed form at the beginning of the experiment. This project was approved by our institution’s research ethics committee (#2021-4108). A monetary compensation of CAD $25 (US $18.35) was provided to each subject upon completion of the experiment. Data from 1 subject were lost due to technical issues, thus leaving data from 26 participants available for analysis. All data were anonymized prior to analysis and stored in encrypted servers only accessible by authorized researchers.

IT Tasks

Two types of general IT tasks were used in the experiment. One type of IT task was based on CAPTCHA (Completely Automated Public Turing Tests to Tell Computers and Humans Apart). This type of Turing test is widely used in IT to ensure the cybersecurity of many internet services, as it prevents a number of attacks from automated programs (often referred to as bots) by distinguishing legitimate users from computer bots while requiring minimal effort by the human user [44]. Four CAPTCHAs were based on typical existing CAPTCHAs and included Google reCAPTCHA (Google), pictogram recognition (PicRec), numerical recognition (NumRec), and text recognition (Text). A fifth task was taken from Raven’s Progressive Matrices (RPM) and presented in a CAPTCHA format. RPM are a collection of widely used standardized intelligence tests consisting of analogy problems in which a matrix of geometric figures is presented with 1 entry missing, and the correct missing entry must be selected from a set of answer choices [45]. A 3×3 RPM was selected because it was considered to offer the best trade-off between cognitive effort and the time required to complete it. The final 5 IT tasks, shown in Figure 1, were embedded in a Qualtrics questionnaire. For this study, we targeted IT task completion time, measured as the time from the display of each task to when subjects responded and pressed the “next” button, based on 30 fps screen recordings.

Figure 1. The 5 information technology tasks. (A) Text-based Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHA): subjects had to type the 2 words in an input field below the text image. (B) Pictogram recognition CAPTCHA: subjects had to recognize and click on the image showing the 2 dice with the same pictogram on the top face. (C) Google reCAPTCHA: subjects had to recognize and click on the images showing the bicycles. (D) Number recognition CAPTCHA: subjects had to recognize and click on the image showing dice summing to 14 on the top faces (numerals and dots combined). (E) Raven’s Progressive Matrix: subjects had to click from among the 8 proposed images the one which most appropriately fit in the missing corner of the basic matrix.

The other type of IT task was a website design evaluation to assess perceived usability using Aladwani and Palvia’s [46] user-perceived web quality measurement scale. Screenshots of the home pages of the following 5 websites were used: Vignerons d’Exception [47], Renaud-Bray [48], LesPAC [49], [50], and [51]. One website was presented subsequent to each CAPTCHA. Participants were told that the website evaluation was the primary task of the experiment and that the CAPTCHAs were present as a security measure to access our database housing the website screenshots. However, the website evaluations were actually dummy tasks, and participant responses were not analyzed. The IT tasks really targeted and analyzed in this study were the CAPTCHAs.

Cognitive Function Characterization of CAPTCHAs

CAPTCHAs were chosen as our general IT tasks principally because they are ubiquitous in IT and because they are often distinguishable from one another according to task-specific demands such as math, 3D orientation, text recognition, and visual search, suggesting that they require different underlying cognitive processing. However, few studies have examined the specific cognitive functions that CAPTCHAs engage. Therefore, we formed a panel of 11 trained, nonexpert evaluators to rate the selected CAPTCHAs on a 5-point agreement scale according to the 5 cognitive functions mentioned in the Introduction section, which have been deemed relevant to IT tasks and the TMT: visuospatial perception, motor function, executive function, inhibitory control, and working memory. The evaluation scores permitted each CAPTCHA to be assigned a rank according to the extent to which the cognitive functions required to perform it overlapped with those of the TMT. In order of highest to lowest alignment, the rankings were as follows: (1) RPM, (2) NumRec, (3) PicRec, (4) Google, and (5) Text, as shown in Table 1. For details of how this evaluation was conducted and how the process was validated, see Multimedia Appendix 1.

Table 1. Convergence ranks of IT tasks with the TMT^a.

| IT task | RPM^b (E) | NumRec^c (D) | PicRec^d (B) | Google (C) | Text (A) |
| --- | --- | --- | --- | --- | --- |
| Executive function, mean (SD) | 5.00 (0.00) | 4.91 (0.30) | 4.45 (1.04) | 4.45 (1.04) | 3.82 (1.4) |
| Visual object recognition, mean (SD) | 4.09 (0.70) | 4.27 (0.65) | 4.64 (0.67) | 4.82 (0.60) | 4.18 (1.17) |
| Visual pattern recognition, mean (SD) | 4.91 (0.30) | 4.45 (0.93) | 4.64 (0.67) | 3.82 (0.98) | 4.64 (0.67) |
| Working memory, mean (SD) | 4.18 (0.60) | 3.91 (1.38) | 2.91 (1.51) | 2.45 (1.21) | 2.73 (1.27) |
| Evaluation score for reliable convergent dimensions, mean (SD) | 4.55 (0.48) | 4.39 (0.42) | 4.16 (0.84) | 3.89 (1.04) | 3.84 (0.81) |
| Convergence rank with TMT following the evaluation^e | 1 | 2 | 3 | 4 | 5 |

^a TMT: Trail Making Test.

^b RPM: Raven's Progressive Matrices.

^c NumRec: numerical recognition.

^d PicRec: pictogram recognition.

^e Based on the average evaluation scores of IT tasks on the reliable cognitive dimensions considered convergent with the TMT. A, B, C, D, and E refer to the labels of the IT tasks presented in Figure 1.
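The ranking in Table 1 can be illustrated with a short sketch. It assumes that the overall evaluation score is the unweighted mean of the 4 dimension means listed in the table, an assumption that reproduces the reported overall scores to rounding; the procedure actually used is described in Multimedia Appendix 1, and the function name below is hypothetical.

```python
from statistics import mean

# Mean evaluator ratings per task, copied from Table 1.
# Value order: executive function, visual object recognition,
# visual pattern recognition, working memory.
RATINGS = {
    "RPM": [5.00, 4.09, 4.91, 4.18],
    "NumRec": [4.91, 4.27, 4.45, 3.91],
    "PicRec": [4.45, 4.64, 4.64, 2.91],
    "Google": [4.45, 4.82, 3.82, 2.45],
    "Text": [3.82, 4.18, 4.64, 2.73],
}


def convergence_ranks(ratings):
    """Rank tasks by mean rating, highest first (rank 1 = most convergent).

    Assumption: the overall score is the unweighted mean of the
    dimension means; this is an illustration, not the authors' code.
    """
    scores = {task: mean(values) for task, values in ratings.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {task: rank for rank, task in enumerate(ordered, start=1)}


ranks = convergence_ranks(RATINGS)
print(ranks)
```

Under this assumption, the resulting order matches the convergence ranks reported in the last row of the table.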

Digital TMT

Because we were interested in cognitive assessment for UX testing of IT and because it was convenient to present all the tasks on the same device, we chose to use a digital version of the TMT called “Axon” (Language Research Development Group). This version emulates the original TMT as an iPad app, allowing the user to draw the trail on the touch screen with 1 finger. The 2 parts (A and B) of the TMT were completed, each with 25 circles to connect. Axon TMT was designed with a canvas generation algorithm, meaning that the test canvas differed for each subject and for each of TMT A and B. As shown in Figure 2, both tests were presented in full screen on the iPad with 25 circles of 1-cm diameter placed randomly on the digital canvas in a homogeneous way. The rules of Axon were identical to those of the original TMT, as outlined by Bowie and Harvey [52]. Participants had to connect the circles in ascending order: from 1 to 25 for part A and from 1 to 13 for part B, alternating numbers and letters in ascending order (ie, 1, A, 2, B, 3, C, etc). Errors such as lifting the finger off the screen, crossing trails, or connecting a wrong circle resulted in the line for the latest segment being automatically erased, and subjects had to return to the last successfully reached circle in order to continue. The measures chosen for this study were the completion times for each of the 2 parts of the test, from the moment the layout was displayed until the last circle was reached. These measures were exported from the app after the completion of the study and used in our statistical analyses.
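The connection order described above can be written out explicitly. The following is a minimal sketch of the target sequences only, not the Axon implementation; the function names are hypothetical.

```python
from string import ascii_uppercase


def part_a_sequence(n_circles=25):
    """Part A: numbers in ascending order (1, 2, ..., 25)."""
    return [str(i) for i in range(1, n_circles + 1)]


def part_b_sequence(n_circles=25):
    """Part B: alternate ascending numbers and letters (1, A, 2, B, ...)."""
    labels = []
    for i in range(n_circles):
        if i % 2 == 0:
            labels.append(str(i // 2 + 1))         # even positions: 1, 2, 3, ...
        else:
            labels.append(ascii_uppercase[i // 2])  # odd positions: A, B, C, ...
    return labels


print(part_a_sequence()[:5])
print(part_b_sequence()[:6])
```

With 25 circles, part B runs from 1 to 13 on the number side and from A to L on the letter side, as stated in the Figure 2 caption.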

Figure 2. Screenshots of Axon A and Axon B. Subjects had to draw to connect the circles in ascending order (from 1 to 25 for part A and from 1 to 13 and A to L for part B, alternating numbers and letters) on a single line, without crossing paths or lifting their finger from the screen. In case of errors in drawing, the app automatically guided subjects back to the last correct circle from which they continued the test.

Cross-Validation of the Digital TMT


To better support and explain our results, we cross-validated Axon with the original TMT and a working memory and a visual search task.

Original TMT

The original TMT was administered as outlined by Bowie and Harvey [52] at the end of the study. The practice step was skipped in the interest of time and with the knowledge that the subject had already performed the digital TMT earlier in the study.

Hidden Path Learning Task

To cross-validate Axon’s ability to measure working memory and spatial ability, we administered a hidden maze learning task, based on the Groton Maze Learning Test developed by Pietrzak et al [43]. Our task was called the “hidden path learning” task and was based on a 10 × 10 grid. Five trials were administered on the iPad via the Cognition Lab platform (BeriSoft, Inc), following guidelines similar to those of the Groton Maze Learning Test [43]. The hidden path learning task particularly targets working memory, as the user has to call on it to navigate between tiles and remember any errors they may have made before [53,54]. Correspondingly, working memory ability is associated with the extent to which completion time decreases over trials, revealing a learning curve. Thus, the metrics used for these analyses were the differences between the completion times of consecutive trials on the task. A depiction of the hidden path learning task is shown in Figure 3 (left). Measures were automatically collected on the Cognition Lab server.
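The learning-curve metric can be sketched as follows. The completion times in the example are hypothetical values for a single participant and are not study data; the function name is likewise an assumption.

```python
def consecutive_differences(trial_times):
    """Return time(trial k+1) - time(trial k) for each consecutive pair.

    Negative values mean the later trial was completed faster,
    which is the expected pattern when learning occurs.
    """
    return [later - earlier
            for earlier, later in zip(trial_times, trial_times[1:])]


# Hypothetical completion times (seconds) for 5 trials of one participant.
times = [60.0, 30.0, 25.0, 21.0, 19.5]
diffs = consecutive_differences(times)
print(diffs)  # 4 differences: (2-1), (3-2), (4-3), (5-4)
```

Five trials yield 4 differences per participant, which correspond to the 4 dependent measures entered into the analysis of this task.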

Figure 3. Cross-validation tasks. In the hidden path learning 10×10 matrix (left), subjects had to go from the yellow starting point to the green end point 1 tile at a time. In the visual search task (right), there were 6 items, with 1 target and 5 distractors. In the I+N sequence (shown), participants had to touch “Yes” at the bottom-right if they saw the target, “No” at the bottom-left otherwise.

Visual Search Task

To cross-validate Axon’s ability to measure visuomotor function, we administered a visual search task on the Cognition Lab platform (BeriSoft, Inc). This task was based on the work by Treisman and Gelade [42] and involved finding a target among distractors. Participants had to touch the right side of the screen when they saw the target, the left side otherwise, therefore involving visual and psychomotor response ability. Three stimuli configurations were used, with 3 distractor sets. Configurations were displayed with 24 trials for each stimulus, leading to a total of 72 trials. For each trial, 3, 6, or 9 symbols were displayed (letters or shapes), with even and randomized distribution among each stimulus sequence. A depiction of this task is shown in Figure 3 (right). Again, measures were automatically collected on the Cognition Lab server. Reaction times were used for the present analyses.


Upon arrival and after signing the informed consent form, subjects were asked to sit on a chair facing the iPad Air (fourth generation) running on iPadOS 15.3 (Apple Inc) placed on a desk and to adjust the chair’s height so that they were comfortable using the iPad and were within the camera recording frame. The experimental setup is presented in Figure 4. They were asked to move the chair closer or further away to maintain an approximate distance of 70 (±10) cm between their eyes and the iPad screen to give enough space for hand movement during the tasks. The camera was fixed independently from the iPad to avoid unwanted movements on the video when the participant pressed the screen while doing the tasks. After a presentation of the study and the tools used, the participants were asked to complete the 2 parts of the TMT (A and then B) on the Axon app. Task instructions were given in a protocol format to ensure that all participants received the same instructions and that the data would be comparable. Participants were verbally and visually guided through the rules of the TMT using a tutorial embedded in the app.

Figure 4. Experimental setup diagram. The subject was seated at a chair in front of a desk where the iPad Air 4 was placed. A Logitech C920 camera was independently fixed to the desk via a camera stand and duct tape.

After completing parts A and B on the Axon app, participants were administered the hidden path learning and the visual search tasks. Then, participants commenced the IT task portion of the experiment. As previously mentioned, participants were told that the primary objective was to evaluate 5 interfaces of more or less popular websites, each interface being on a secure server accessible only after the completion of a CAPTCHA. Thus, subjects completed a CAPTCHA, observed a web interface for a few minutes, and then completed the user-perceived web quality measurement scale [46]. This sequence was repeated 5 times, with the tasks presented in random order, each preceded by a distinct CAPTCHA. At the end of the study, for ethical reasons, subjects were told orally that they were in fact being evaluated on their performance on the CAPTCHAs.

Statistical Analyses

To test the ability of the Axon TMT to predict performance on the 5 CAPTCHA IT tasks, a repeated-measures multivariate analysis of covariance (RM MANCOVA) was performed with Axon A completion time and Axon B completion time as independent predictors and the completion time for each of the 5 IT tasks as the dependent covariates.

To further interpret our results, we tested the relationship between Axon TMT completion times and visuomotor function by performing an RM MANCOVA with Axon A and Axon B times as independent predictors and the mean reaction time of each of the 3 visual search tasks (the shape of an arrow as a target among the triangle shapes as distractors, the letter T as a target among the letters I and N as distractors, and the letter T as a target among the letters I and Z as distractors) as the dependent covariates. In addition, we tested the relationship between Axon TMT completion times and working memory function by performing an RM MANCOVA with Axon A and Axon B times as independent predictors and the difference between the completion time of each consecutive trial on the hidden path learning task as the dependent covariates. Finally, we cross-validated the relationship between the Axon TMT and the original TMT using 2 Pearson correlation tests, 1 each for tests A and B.

For all RM MANCOVAs performed in the analysis, omnibus results and multivariate results for each independent predictor are reported. In the case of significant multivariate results, simple main effects based on parameter estimates are reported for dependent covariates, which were significantly predicted by Axon.

All statistical analyses were conducted using the IBM SPSS Statistics software (version; IBM Corp) with a threshold for statistical significance set at P≤.05, using the Bonferroni correction to adjust for multiple comparisons.
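For readers unfamiliar with the adjustment, the arithmetic of a Bonferroni correction is sketched below. This is not the SPSS procedure itself; the raw P values are hypothetical, and treating the 5 values as one family of comparisons is an assumption made for illustration. Because a probability cannot exceed 1, the adjusted value is capped at 1.

```python
def bonferroni(p_values):
    """Multiply each raw P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values for a family of 5 comparisons.
raw = [0.004, 0.010, 0.640, 0.200, 0.030]
adjusted = bonferroni(raw)
print(adjusted)
```

An adjusted value is then compared against the same significance threshold as an unadjusted one.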

Axon TMT Cross-Validation

Axon Versus Original TMT

The mean scores of Axon A and B were 48.04 (SD 25.80) and 56.88 (SD 25.53) seconds, respectively. The mean scores on the original TMT A and B were 29.22 (SD 12.26) and 51.62 (SD 19.07) seconds, respectively. Pearson correlation tests revealed that Axon is highly correlated with TMT results, with a significant positive correlation between Axon A and TMT A (r=0.688; Pcorr=.001) and a significant positive correlation between Axon B and TMT B (r=0.505; Pcorr=.017).
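As an illustration of the statistic used here, the Pearson correlation coefficient can be computed as in the sketch below. The paired times in the example are hypothetical and are not study data; the analysis itself was run in SPSS.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired completion times (seconds): digital vs paper test.
digital = [35.0, 42.0, 58.0, 61.0, 80.0]
paper = [22.0, 25.0, 30.0, 36.0, 41.0]
print(round(pearson_r(digital, paper), 3))
```

A positive coefficient indicates that participants who were slower on one version of the test also tended to be slower on the other.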

Axon TMT Versus Hidden Path Learning

The mean (SD) differences in consecutive trial times were (2–1) –29.87 (17.70) seconds, (3–2) –5.48 (6.01) seconds, (4–3) –4.30 (4.80) seconds, and (5–4) –1.50 (4.25) seconds. The omnibus test of the RM MANCOVA revealed that Axon A and Axon B combined significantly explained the variance in the decrease in completion times across consecutive trials (F4,20=4.119; P=.01; Λ=0.548). However, multivariate results revealed that the decrease in completion times across trials was not predicted by Axon A (F4,20=1.923; P=.15; Λ=0.722) or Axon B (F4,20=1.106; P=.38; Λ=0.819) alone. Thus, a predictive relationship appears to exist between Axon and working memory in the hidden path learning task as a function of Axon A and B combined.

Axon TMT Versus Visual Search

Mean (SD) reaction times for the T among letters I and N, T among letters I and Z, and arrow among triangles were 0.80 (0.14) seconds, 0.78 (0.15) seconds, and 0.68 (0.14) seconds, respectively. The omnibus test of the RM MANCOVA revealed that Axon A and Axon B combined significantly explained the variance in visuomotor function assessed with reaction time to the 3 stimuli in the visual search task (F3,21=3.125; P=.048; Λ=0.691). Multivariate results revealed that this result was driven mainly by Axon A (F3,21=3.220; P=.043; Λ=0.685) rather than Axon B (F3,21=0.502; P=.69; Λ=0.933). Parameter estimates revealed that Axon A was marginally significantly predictive of reaction times to the letter T among letters I and N stimulus (β=3.573; t21=2.767; Pcorr=.055) and significantly predictive of reaction times to the letter T among letters I and Z (β=4.353; t21=3.156; Pcorr=.02) and arrow among triangles (β=3.725; t21=3.158; Pcorr=.02) stimuli.

Axon TMT Predicts Overall IT Performance

The primary hypothesis assumed that there was a positive predictive relationship between TMT performance and IT task performance. The omnibus test of the RM MANCOVA revealed that Axon A and Axon B combined significantly explained the variance in IT task performance (F5,19=6.352; P=.001; Λ=0.374), thereby supporting the primary hypothesis. Multivariate results revealed that this effect was driven by performance on Axon B (F5,19=3.382; P=.02; Λ=0.529). Figure 5 shows the distribution of Axon completion times in relation to IT task completion times.
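The relationship between the reported Wilks Λ and F values can be checked with a short sketch. It assumes the standard exact conversion for an effect with 1 hypothesis degree of freedom, F = ((1 − Λ)/Λ) × (df2/df1); this formula is not stated in the paper, and the function name is hypothetical. The Axon B value Λ=0.529 is taken from the Abstract.

```python
def wilks_to_f(wilks_lambda, df_num, df_den):
    """Convert Wilks' lambda to F, assuming 1 hypothesis degree of freedom.

    df_num is the number of dependent measures and df_den is the
    denominator degrees of freedom reported with the F statistic.
    """
    if not 0 < wilks_lambda <= 1:
        raise ValueError("Wilks' lambda must be in (0, 1]")
    return (1 - wilks_lambda) / wilks_lambda * df_den / df_num


print(round(wilks_to_f(0.374, 5, 19), 3))  # omnibus test, reported F=6.352
print(round(wilks_to_f(0.529, 5, 19), 3))  # Axon B, reported F=3.382
```

Under this assumption, both conversions agree with the reported F values to within the rounding of Λ.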

Figure 5. Distribution of Axon A and B completion times in relation to the completion times of the 5 IT tasks (N=26). Axon B trendlines and parameter estimates (β and P) show the relationship between Axon B and IT task performance. The number in the upper right corner of each plot area is the hypothesized convergence rank (Table 1). IT: information technology; NumRec: numerical recognition; PicRec: pictogram recognition; RPM: Raven’s Progressive Matrices. Letters A through E refer to the labels used for each task in Figure 1.

Axon TMT Better Predicts Performance on Convergent IT Tasks

The second hypothesis assumed that the predictive relationship between TMT performance and IT task performance would be stronger if the cognitive abilities involved in the performance were congruent. To test this hypothesis, we analyzed the parameter estimates for the multivariate results of Axon B. These revealed that Axon B was significantly predictive of IT task E (RPM task; β=.785; t19=3.240; Pcorr=.018) and IT task B (PicRec task; β=.260; t19=2.824; Pcorr=.048). However, IT task D (number recognition task), which was rated the second most congruent task with Axon, was not significantly predicted by Axon B (β=.150; t19=0.479; Pcorr>.99). Our secondary hypothesis is therefore partially supported. These results are shown in Figure 5, where the effects of individual factors of Axon B on performance on IT tasks are represented (β and P values).

Principal Findings

Cognitive functional ability may well affect task performance in UX and other research experimentation, leading to variance in performance measures among the target population and confounding the effects of experimental factors. Although detailed cognitive assessment batteries exist and can be used to control intersubject differences in cognitive abilities [12], they are not time efficient and are thus impractical to implement within typical experimental time frames. This study tested the validity of using the Axon TMT, which takes only a few minutes to administer, to predict or explain the variance in IT task performance in an age-homogeneous subject population.

The mean age of the subject population of this sample was 24 (SD 4.22) years. This is typical of many research studies, UX related or otherwise, relying on student recruitment through the parent institution [55-57]. Despite the relatively low SD of age, the SD in Axon TMT scores was broad, at 25.80 (mean 48.04) and 25.53 (mean 56.88) seconds, respectively, for Axon A and B, suggesting a large distribution of cognitive functional abilities among this age-homogeneous neurotypical population. Notably, the means and SDs for the Axon TMT, particularly for Axon A, were higher than what is typically reported in the literature for neurotypical subjects in this age bracket [58-60]. This may be due to the fact that, unlike in the implementation of the paper-based TMT, subjects did not practice a mini version of the test before performing Axon A or B. Thus, some portion of the time taken to complete the test must be attributable to familiarization with task demands. This would also explain why the mean scores for Axon B, whose task demands are similar to Axon A in many respects, are closer to typically reported TMT B means. Nevertheless, for the purposes of this study, it is not absolute Axon TMT scores that are important. Rather, it is the relative distribution of the variance in Axon scores and their correlation to other metrics that is essential. To that end, both Axon A and B significantly correlated with their respective paper-based TMT counterparts and showed a combined predictive validity toward working memory via the hidden path learning task. Furthermore, it was Axon A, not B, which was the predominant driver of the significant correlation with visual search performance. This is logical, as the visual search task does not involve working memory–related processing [42,61]. Instead, it requires an emphasis on target identification, cognitive control, and motor output, precisely the dominant cognitive functions involved in TMT A [36,39,40].
Thus, far from being problematic, implementing Axon A and B without a preliminary minitest for practice was time-efficient and yielded a reliable distribution of scores, which could be cross-correlated with expected cognitive functions.

This cross-validation lends credibility to our observation that Axon A and B combined were significantly predictive of IT task performance, supporting our primary hypothesis. Interestingly, for the IT tasks chosen, it was Axon B that appeared as the stronger driver of predictive validity, suggesting that it may be more powerful in capturing the executive decision-making involved in an ecologically valid IT task. Moreover, simple main effects tests revealed that Axon B significantly correlated with 2 out of the top 3 tasks ranked as requiring cognitive functions congruent with those of the TMT, thereby partially supporting our secondary hypothesis. Contrary to our expectations, the NumRec task, which had the second-highest congruence rank, was not significantly correlated with Axon B. We speculate that the confound here relates to the underlying mathematical operations involved in solving that CAPTCHA. Although raters classified this as executive decision-making, it certainly can be said that neither TMT A nor TMT B requires arithmetic. Therefore, there must be cognitive processes involved that are simply not recruited during the performance of the TMT and that our ranking system was not granular enough to capture, hence explaining the lack of correlation between the NumRec task and Axon B. Meanwhile, Axon B was most strongly correlated with the RPM and PicRec tasks, suggesting that it is well suited for tasks involving visual pattern and object recognition in combination with higher-order executive processing to orient this visual information. These kinds of processing are arguably crucial for interface navigation, virtual reality, gaming, or using simulators, which are extremely common IT tasks investigated in UX research [62-64]. Thus, while Axon does appear to be better aligned with IT tasks involving convergent cognitive processing, such tasks may well comprise a major proportion of those studied in UX research.

A few further points are worth emphasizing. First, the complete administration of Axon took less than 5 minutes, far shorter than the strategy used by Dumont et al [12] or any other cognitive assessment that we are aware of. Second, considering Axon’s ability to differentiate individuals within an age-homogeneous neurotypical population, it would likely perform even better among populations where a larger variance in cognitive function would be expected, such as older adults, children, stroke survivors, or other individuals with atypical cognitive function. This is important because understanding how to design appropriate and accessible IT for these populations has become a topic of increasing concern in UX research [65-67]. Third, Axon is suitable for remotely moderated experimentation, a popular strategy since the COVID-19 pandemic [68] and one that mitigates subject recruitment challenges for all population types. Finally, current advances in technology, particularly in the field of artificial intelligence, are trending toward a more personalized and user-centric approach that adapts technology to individual user characteristics such as preferences and interests [8,69,70]. Part of this personalization could be to tailor technology according to the cognitive abilities of users. Axon could potentially facilitate this advancement by serving as a quick and reliable metric for training such adaptive algorithms.


This study has some limitations that should be acknowledged. First, because the Axon app generates a new TMT canvas algorithmically for every test instance, the Axon A and B canvas layouts were not constant across subjects. This means that some of the variance in Axon A and B times is intrinsically attributable to factors such as differences in the straight-line drawing path length of the test or the extent of visual interference between drawing segments. On the other hand, the fact that Axon A and B were significantly cross-validated with the original TMT and the visual search and hidden path learning tasks, in spite of canvas layout differences between participants, suggests that the variance these differences cause is small and does not detract from the use of Axon as a cognitive profiling tool in UX testing. Second, this study tested the predictive validity of Axon on simple and discrete IT tasks. This was necessary as a proof of concept for our hypotheses. However, readers should use caution when generalizing the present results. Further research is needed to investigate the extent to which Axon retains predictive validity for more complex IT tasks in different contexts and across various user demographics, including neuroatypical and cognitively impaired users.
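To make the first limitation concrete, the following is a minimal sketch of how two layout properties of a generated canvas could be quantified: the straight-line drawing path length and the number of times drawing segments cross one another. It is not Axon's canvas-generation or scoring code, which is not described here. The function names, the coordinate representation, and the use of segment crossings as a crude proxy for visual interference are all assumptions made for illustration.

```python
import math
from itertools import combinations

# A target is an (x, y) position on the canvas, listed in the order
# the participant must connect them (hypothetical representation).
Point = tuple[float, float]


def path_length(targets: list[Point]) -> float:
    """Sum of straight-line distances between consecutive targets."""
    return sum(math.dist(a, b) for a, b in zip(targets, targets[1:]))


def _orientation(p: Point, q: Point, r: Point) -> float:
    """Signed area test: >0 counterclockwise, <0 clockwise, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segment ab properly crosses segment cd.

    Touching endpoints and collinear overlaps are not counted.
    """
    return (
        _orientation(a, b, c) * _orientation(a, b, d) < 0
        and _orientation(c, d, a) * _orientation(c, d, b) < 0
    )


def crossing_count(targets: list[Point]) -> int:
    """Number of pairs of non-adjacent drawing segments that cross."""
    segments = list(zip(targets, targets[1:]))
    count = 0
    for (i, (a, b)), (j, (c, d)) in combinations(enumerate(segments), 2):
        # Adjacent segments share an endpoint and cannot properly cross.
        if j - i > 1 and segments_cross(a, b, c, d):
            count += 1
    return count


# Two made-up layouts with the same four target positions but a
# different connection order (illustrative values, not study data).
layout_a = [(0, 0), (3, 4), (3, 0), (0, 4)]
layout_b = [(0, 0), (3, 0), (3, 4), (0, 4)]

for name, layout in (("A", layout_a), ("B", layout_b)):
    print(name, path_length(layout), crossing_count(layout))
```

Descriptors such as these could, in principle, be recorded per test instance and entered as covariates to check how much of the variance in completion time is attributable to layout rather than to the participant.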


This study tested the ability of the Axon digital TMT to predict performance on discrete IT tasks. The results indicate that variance in IT task performance among an age-homogeneous neurotypical population can be related to intersubject variance in cognitive function as assessed by Axon. Although the findings suggest that Axon’s predictive validity may be strongest for IT tasks that combine decision-making with visual object and pattern recognition, these types of cognitive processing would arguably be relevant to the majority of IT interfaces. Considering its short administration time and remote implementability, the Axon digital TMT has the potential to be a useful cognitive profiling tool for IT-based UX research.


The authors would like to thank David Brieugne, Salima Tazi, Xavier Côté, and all the operational staff at the Tech3Lab for their assistance in the execution of this study. This study was made possible thanks to funding from NSERC (ALLRP 571020-21), Prompt (176_Sénécal-BNC, CAE, CN, SRC_2021.02), and Mitacs (IT18280).

Conflicts of Interest

None declared.

Multimedia Appendix 1

Evaluation of CAPTCHAs–congruent cognitive functions in the information technology tasks.

DOCX File, 17 KB

  1. Wagner N, Hassanein K, Head M. The impact of age on website usability. Computers in Human Behavior. Aug 2014;37:270-282. [CrossRef]
  2. Karahoca D, Karahoca A, Güngör A. Assessing effectiveness of the cognitive abilities and individual differences on e-learning portal usability evaluation. New York, NY, United States. Association for Computing Machinery; 2008. Presented at: Proceedings of the 9th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing; June 12-13, 2008; Gabrovo, Bulgaria. [CrossRef]
  3. Martins AI, Silva AG, Pais J, Cruz VT, Rocha NP. The impact of users' cognitive function on evaluator perceptions of usability. Sci Rep. Aug 12, 2022;12(1):13753. [FREE Full text] [CrossRef] [Medline]
  4. Aykin NM, Aykin T. Individual differences in human-computer interaction. Comput Ind Eng. 1991;20(3):373-379. [CrossRef]
  5. Lewis JR. Usability testing. In: Salvendy G, editor. Handbook of Human Factors and Ergonomics, Fourth Edition. Hoboken, NJ. John Wiley & Sons; 2012:1267-1312.
  6. Ji H, Yun Y, Lee S, Kim K, Lim H. An adaptable UI/UX considering user’s cognitive and behavior information in distributed environment. Cluster Comput. 2017;21(1):1045-1058. [CrossRef]
  7. Andreessen LM, Gerjets P, Meurers D, Zander TO. Toward neuroadaptive support technologies for improving digital reading: a passive BCI-based assessment of mental workload imposed by text difficulty and presentation speed during reading. User Model User-Adap Inter. 2020;31(1):75-104. [FREE Full text] [CrossRef]
  8. Gajos KZ, Weld DS, Wobbrock JO. Automatically generating personalized user interfaces with Supple. AI. 2010;174(12-13):910-950. [FREE Full text] [CrossRef]
  9. Folstein MF, Folstein SE, McHugh PR. "Mini-mental state": a practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12(3):189-198. [CrossRef] [Medline]
  10. Ekstrom RB, French JW, Harman HH, Dermen D. Manual for Kit of Factor-Referenced Cognitive Tests, 1976. Princeton, NJ. Education Testing Service; 1976.
  11. Nasreddine ZS, Phillips NA, Bédirian V, Charbonneau S, Whitehead V, Collin I, et al. The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment. J Am Geriatr Soc. 2005;53(4):695-699. [CrossRef] [Medline]
  12. Dumont L, Chénier-Leduc G, de Guise É, de Guinea AO, Sénécal S, Léger PM. Using a cognitive analysis grid to inform information systems design. In: Davis FD, Riedl R, vom Brocke J, Léger PM, Randolph AB, editors. Information Systems and Neuroscience: Gmunden Retreat on NeuroIS 2015. Cham. Springer International Publishing; 2015:193-199.
  13. Weintraub S, Dikmen SS, Heaton RK, Tulsky DS, Zelazo PD, Bauer PJ, et al. Cognition assessment using the NIH Toolbox. Neurology. Mar 12, 2013;80(11 Suppl 3):S54-S64. [FREE Full text] [CrossRef] [Medline]
  14. Allen B. Cognitive abilities and information system usability. Inf Process Manage. 1994;30(2):177-191. [CrossRef]
  15. Wothke W, Curran LT, Augustin JW, Guerrero C, Bock RD, Fairbank BA, et al. Factor Analytic Examination of the Armed Services Vocational Aptitude Battery (ASVAB) and the KIT of Factor-Referenced Tests. San Antonio, TX. Operational Technologies Corp; 1991. URL: [accessed 2024-04-26]
  16. Oliveira J, Gamito P, Alghazzawi DM, Fardoun HM, Rosa PJ, Sousa T, et al. Performance on naturalistic virtual reality tasks depends on global cognitive functioning as assessed via traditional neurocognitive tests. Appl Neuropsychol Adult. 2018;25(6):555-561. [CrossRef] [Medline]
  17. Czaja SJ, Charness N, Fisk AD, Hertzog C, Nair SN, Rogers WA, et al. Factors predicting the use of technology: findings from the Center for Research and Education on Aging and Technology Enhancement (CREATE). Psychol Aging. 2006;21(2):333-352. [FREE Full text] [CrossRef] [Medline]
  18. Khalili-Mahani N, Assadi A, Li K, Mirgholami M, Rivard ME, Benali H, et al. Reflective and reflexive stress responses of older adults to three gaming experiences in relation to their cognitive abilities: mixed methods crossover study. JMIR Ment Health. 2020;7(3):e12388. [FREE Full text] [CrossRef] [Medline]
  19. Schatz R, Egger-Lampl S, Masuch K. The impact of test duration on user fatigue and reliability of subjective quality ratings. J Audio Eng Soc. 2012;60(1):63-73.
  20. Lallemand C, Gronier G. UX design methods: 30 fundamental methods to design for optimal experiences. Paris, France. Eyrolles; 2018.
  21. Isaacs E, Oates J, ILSI Europe a.i.s.b.l. Nutrition and cognition: assessing cognitive abilities in children and young people. Eur J Nutr. 2008;47(Suppl 3):4-24. [CrossRef] [Medline]
  22. Robbins TW, Sahakian BJ. Computer methods of assessment of cognitive function. In: Copeland JRM, Abou-Saleh MT, Blazer DG, editors. Principles and Practice of Geriatric Psychiatry. Chichester, West Sussex, England. Wiley; 2002:147-151.
  23. Slegers K, Van Boxtel MPJ, Jolles J. The efficiency of using everyday technological devices by older adults: the role of cognitive functions. Ageing Soc. 2009;29(2):309-325. [CrossRef]
  24. Chan MY, Haber S, Drew LM, Park DC. Training older adults to use tablet computers: does it enhance cognitive function? Gerontologist. 2016;56(3):475-484. [FREE Full text] [CrossRef] [Medline]
  25. Kieras DE, Meyer DE. An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Hum–Comput Int. 1997;12(4):391-438. [CrossRef]
  26. Czaja SJ, Sharit J, Nair S, Rubert M. Understanding sources of user variability in computer-based data entry performance. Behav Inf Technol. 1998;17(5):282-293. [CrossRef]
  27. Miyake A, Friedman NP, Emerson MJ, Witzki AH, Howerter A, Wager TD. The unity and diversity of executive functions and their contributions to complex "Frontal Lobe" tasks: a latent variable analysis. Cogn Psychol. 2000;41(1):49-100. [CrossRef] [Medline]
  28. Kiesel A, Steinhauser M, Wendt M, Falkenstein M, Jost K, Philipp AM, et al. Control and interference in task switching—a review. Psychol Bull. 2010;136(5):849-874. [CrossRef] [Medline]
  29. Gade M, Schuch S, Druey M, Koch I. Inhibitory control in task switching. In: Grange J, Houghton G, editors. Task Switching and Cognitive Control. New York, NY. Oxford University Press; 2014:137-159.
  30. US Army. Army individual test battery. In: Manual of Directions and Scoring. Washington, DC. War Department, Adjutant General’s Office; 1944.
  31. Lezak MD. Executive functions and motor performance. In: Neuropsychological Assessment. New York. Oxford University Press; 1995:650-685.
  32. Mitrushina M, Boone KB, Razani J, D'Elia L. Handbook of Normative Data for Neuropsychological Assessment, Second Edition. New York, NY. Oxford University Press; 2005.
  33. Reitan RM. Validity of the trail making test as an indicator of organic brain damage. Percept Mot Skills. 2016;8(3):271-276. [CrossRef]
  34. Reitan RM, Wolfson D. Conventional intelligence measurements and neuropsychological concepts of adaptive abilities. J Clin Psychol. 1992;48(4):521-529. [CrossRef] [Medline]
  35. Strauss E, Sherman EMS, Spreen O. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. Oxford. Oxford University Press; 2006.
  36. Sánchez-Cubillo I, Periáñez JA, Adrover-Roig D, Rodríguez-Sánchez JM, Ríos-Lago M, Tirapu J, et al. Construct validity of the trail making test: role of task-switching, working memory, inhibition/interference control, and visuomotor abilities. J Int Neuropsychol Soc. 2009;15(3):438-450. [CrossRef] [Medline]
  37. Groff MG, Hubble LM. A factor analytic investigation of the Trail Making Test. Clin Neuropsychol. 1981;3(4):11-13. [CrossRef]
  38. Schear JM, Sato SD. Effects of visual acuity and visual motor speed and dexterity on cognitive test performance. Arch Clin Neuropsychol. 1989;4(1):25-32. [CrossRef]
  39. Gaudino EA, Geisler MW, Squires NK. Construct validity in the trail making test: what makes Part B harder? J Clin Exp Neuropsychol. 1995;17(4):529-535. [CrossRef] [Medline]
  40. Crowe SF. The differential contribution of mental tracking, cognitive flexibility, visual search, and motor speed to performance on parts A and B of the trail making test. J Clin Psychol. 1998;54(5):585-591. [CrossRef]
  41. Meegan DV, Tipper SP. Visual search and target-directed action. J Exp Psychol Hum Percept Perform. 1999;25(5):1347-1362. [CrossRef]
  42. Treisman AM, Gelade G. A feature-integration theory of attention. Cogn Psychol. 1980;12(1):97-136. [CrossRef] [Medline]
  43. Pietrzak RH, Maruff P, Mayes LC, Roman SA, Sosa JA, Snyder PJ. An examination of the construct validity and factor structure of the Groton Maze Learning Test, a new measure of spatial working memory, learning efficiency, and error monitoring. Arch Clin Neuropsychol. 2008;23(4):433-445. [FREE Full text] [CrossRef] [Medline]
  44. Gossweiler R, Kamvar M, Baluja S. What's up CAPTCHA? A CAPTCHA based on image orientation. New York, NY. Association for Computing Machinery; 2009. Presented at: Proceedings of the 18th international conference on World wide web; April 20-24, 2009:841-850; Madrid, Spain. [CrossRef]
  45. Kunda M, McGreggor K, Goel AK. A computational model for solving problems from the Raven's progressive matrices intelligence test using iconic visual representations. Cogn Syst Res. 2013;22-23:47-66. [CrossRef]
  46. Aladwani AM, Palvia PC. Developing and validating an instrument for measuring user-perceived web quality. Inf Manag. 2002;39(6):467-476. [CrossRef]
  47. Vignerons d’Exception. URL: [accessed 2024-06-03]
  48. Renaud-Bray. URL: [accessed 2024-06-03]
  49. LesPAC. URL: [accessed 2024-06-05]
  50. Lufa Farms. URL: [accessed 2024-06-03]
  51. Craigslist. URL: [accessed 2024-06-03]
  52. Bowie CR, Harvey PD. Administration and interpretation of the Trail Making Test. Nat Protoc. 2006;1(5):2277-2281. [CrossRef] [Medline]
  53. Thomas E, Snyder PJ, Pietrzak RH, Maruff P. Behavior at the choice point: decision making in hidden pathway maze learning. Neuropsychol Rev. 2014;24(4):514-536. [CrossRef] [Medline]
  54. Gould MC, Perrin FAC. A comparison of the factors involved in the maze learning of human adults and children. J Exp Psychol. 1916;1(2):122-154. [CrossRef]
  55. Cuvillier M, Léger P, Sénécal S. Studies on the reliability, validity and sensitivity of single-item measurement scales in user experience. HEC Montréal. 2021:57-93. [CrossRef]
  56. Passalacqua M, Léger PM, Nacke LE, Fredette M, Labonté-Lemoyne E, Lin X, et al. Playing in the backstore: interface gamification increases warehousing workforce engagement. Ind Manag Data Syst. 2020;120(7):1309-1330. [CrossRef]
  57. Bracken MR, Mazur-Mosiewicz A, Glazek K. Trail making test: comparison of paper-and-pencil and electronic versions. Appl Neuropsychol Adult. 2019;26(6):522-532. [CrossRef] [Medline]
  58. Tombaugh TN. Trail making test A and B: normative data stratified by age and education. Arch Clin Neuropsychol. 2004;19(2):203-214. [FREE Full text] [CrossRef] [Medline]
  59. Giovagnoli AR, Del Pesce M, Mascheroni S, Simoncelli M, Laiacona M, Capitani E. Trail making test: normative values from 287 normal adult controls. Ital J Neurol Sci. 1996;17(4):305-309. [CrossRef] [Medline]
  60. Arango-Lasprilla JC, Rivera D, Aguayo A, Rodríguez W, Garza MT, Saracho CP, et al. Trail making test: normative data for the Latin American Spanish speaking adult population. NeuroRehabilitation. 2015;37(4):639-661. [FREE Full text] [CrossRef] [Medline]
  61. Shiffrin RM, Schneider W. Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychol Rev. 1977;84(2):127-190. [CrossRef]
  62. Karran AJ, Demazure T, Leger P, Labonte-LeMoyne E, Senecal S, Fredette M, et al. Toward a hybrid passive BCI for the modulation of sustained attention using EEG and fNIRS. Front Hum Neurosci. 2019;13:393. [FREE Full text] [CrossRef] [Medline]
  63. Guertin-Lahoud S, Coursaris CK, Sénécal S, Léger PM. User experience evaluation in shared interactive virtual reality. Cyberpsychol Behav Soc Netw. 2023;26(4):263-272. [FREE Full text] [CrossRef] [Medline]
  64. Courtemanche F, Labonté-LeMoyne E, Léger PM, Fredette M, Senecal S, Cameron A, et al. Texting while walking: an expensive switch cost. Accid Anal Prev. 2019;127:1-8. [CrossRef] [Medline]
  65. Giroux F, Couture L, Lasbareille C, Boasen J, Stagg CJ, Fleming MK, et al. Usability evaluation of assistive technology for ICT accessibility: lessons learned with stroke patients and able-bodied participants experiencing a motor dysfunction simulation. In: Davis FD, Riedl R, vom Brocke J, Léger PM, Randolph AB, Müller-Putz GR, editors. Information Systems and Neuroscience: NeuroIS Retreat 2022. Cham. Springer International Publishing; 2022:349-359.
  66. Zolyomi A, Snyder J. Social-emotional-sensory design map for affective computing informed by neurodivergent experiences. Proc ACM Hum-Comput Interact. 2021;5(CSCW1):1-37. [FREE Full text] [CrossRef]
  67. Demmans Epp C, McEwen R, Campigotto R, Moffatt K. Information practices and user interfaces: student use of an iOS application in special education. Educ Inf Technol. 2015;21(5):1433-1456. [CrossRef]
  68. Vasseur A, Léger PM, Courtemanche F, Labonte-Lemoyne E, Georges V, Valiquette A, et al. Distributed remote psychophysiological data collection for UX evaluation: a pilot project. In: Human-Computer Interaction. Theory, Methods and Tools: Thematic Area, HCI 2021, Held as Part of the 23rd HCI International Conference, HCII 2021, Virtual Event, July 24–29, 2021, Proceedings, Part I. Cham, Switzerland. Springer Nature; 2021.
  69. Amershi S, Weld D, Vorvoreanu M, Fourney A, Nushi B, Collisson P, et al. Guidelines for human-AI interaction. New York, NY, United States. Association for Computing Machinery; 2019. Presented at: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems; May 4-9, 2019:1-13; Glasgow, Scotland, UK. [CrossRef]
  70. Hussain J, Ul Hassan A, Muhammad Bilal HS, Ali R, Afzal M, Hussain S, et al. Model-based adaptive user interface based on context and user experience evaluation. J Multimodal User Interfaces. 2018;12(1):1-16. [CrossRef]

CAPTCHA: Completely Automated Public Turing tests to tell Computers and Humans Apart
MMSE: Mini-Mental State Examination
MoCA: Montreal Cognitive Assessment
NumRec: numerical recognition
PicRec: pictogram recognition
RM MANCOVA: repeated-measures multivariate analysis of covariance
RPM: Raven’s Progressive Matrices
TMT: Trail Making Test
UX: user experience

Edited by A Kushniruk; submitted 22.06.23; peer-reviewed by L Loeb, C Baxter; comments to author 20.01.24; revised version received 13.02.24; accepted 20.02.24; published 14.06.24.


©Tanguy Depauw, Jared Boasen, Pierre-Majorique Léger, Sylvain Sénécal. Originally published in JMIR Human Factors, 14.06.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.