This is Part 2 of a 4-part article on NIH’s Effort Preference claim. Part 1 can be found here: https://thoughtsaboutme.com/2024/06/10/the-nih-intramural-me-study-lies-damn-lies-and-statistics-part-1/
In this Part 2 of my 4-part series, I am analyzing the EEfRT data to show that they do not support the claim that ME patients’ symptoms are caused by dysfunctional effort discounting (overestimating effort and underestimating rewards and capacity), which is what NIH calls an altered Effort Preference. The authors included a graph, Figure 3a, the main illustration of the false Effort Preference claim, which completely misrepresents the EEfRT data and, in short, presents an entirely false picture of the EEfRT results. In addition, they failed to exclude patients who were physically unable to complete hard tasks at anywhere near the levels needed for the EEfRT data to be valid. Moreover, the authors failed to report—other than their false conclusion—their analysis of a metric that is typically at the heart of the EEfRT analysis: the assessment of whether a group difference in probability sensitivity (typically due to game optimization strategies) is responsible for the lower proportion or number of hard-task choices by patients. Furthermore, based on the data reported by the authors, patients performed better on the EEfRT than controls did, which the authors concealed by not sharing the relevant analysis (virtual rewards obtained). I will also show that the recorded EEfRT data is unreliable as at least some of it is false. Finally, I will identify a large number of careless mistakes in the paper with respect to the EEfRT, demonstrating that NIH’s work on ME was phoned in.
This post is the longest in the series and requires a fair amount of stick-to-it-iveness, both because of its length and because of the complexity of the issues and details discussed. I realize that this will, unfortunately, be beyond the limits of many ME patients, but I decided not to divide it into smaller parts because the issues are interconnected and because a single post makes it easier to share with, and report to, the appropriate authorities and other interested parties the main reasons why this study should be urgently investigated and retracted.
In order to follow along, it is important to understand the EEfRT game rules as well as the alleged findings, so I will begin by explaining those.
Modified EEfRT Game Rules
The modified EEfRT as used by the investigators is a multi-trial test in which participants complete a series of repeated button-pressing trials with the goal of winning as much virtual money as possible. On each trial, participants were asked to choose between a hard and an easy task. A hard task involved pressing a button 98 times in 21 seconds using the non-dominant pinky finger; an easy task required pressing a button 30 times in seven seconds with the dominant index finger. During each task, every button press gradually filled a white bar with red, indicating progress toward completion of the task.
Participants were told that they would win the virtual money allocated to each trial if they raised the bar to the top within the time allowed by pressing the button quickly enough and if the trial was a win trial. Participants were not guaranteed to win the reward if they completed the task; if a trial was a no-win trial, participants did not win the allocated amount even if they successfully, i.e., timely, completed the task. Before choosing a hard versus an easy task, participants were informed of the probability of a trial being a win trial and of the potential reward value, the winnable virtual dollar amount, of each trial.

Based on that information, participants decided whether to choose the hard or the easy task for each trial. There were three levels of reward probability: 12% probability, 50% probability and 88% probability. The specified probability level for each trial was the same for hard or easy tasks, and there were equal proportions of each probability level across the test. Each easy task was eligible to win $1; hard tasks were eligible to win between $1.24 and $4.12. The levels of probability of reward attainment and the reward magnitude for hard-task choices were presented in the same order for each participant.
Participants were told at the beginning of the EEfRT that, at the end of the test, they would get to take home the actual dollar amounts from two of their winning trials, randomly chosen by the computer program. The minimum amount a participant could take home was $2, and the maximum amount was $8.24.
Each trial started with a one-second blank computer screen, which was followed by a choice period of five seconds during which participants were informed of the probability of receiving a reward and the reward value assigned to that trial. If participants did not select an easy or hard task during the five-second choice period, a task was randomly assigned to them by the computer. After choosing a hard or easy task, there was another one-second blank screen before the task began. After the time for a task was up, participants received on-screen feedback regarding their successful completion of the task and whether and how much money they won. Participants continued to choose and attempt to complete hard or easy tasks for a total of 15 minutes. The button pressing for the hard task took three times as long as the button pressing for the easy task, but with the pre-task items (blank screens and choice time) and the post-task items (feedback on completion and amount of virtual money won, if any), hard tasks ultimately took about twice as long as easy tasks.
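For readers who prefer to see the mechanics in one place, below is a minimal sketch, in Python, of a single trial as described above. It is illustrative only: the win/no-win draw, the handling of the press count, and the constants are my own simplifications of the published game rules, not NIH’s actual task code.

```python
import random

PROBABILITIES = (0.12, 0.50, 0.88)  # win-trial probability levels used in the modified EEfRT
EASY_REWARD = 1.00                  # every easy task is eligible to win $1

def run_trial(choose_hard: bool, hard_reward: float, win_probability: float,
              presses_made: int) -> float:
    """Return the virtual reward recorded for one trial ($0 if nothing is won).

    hard_reward      -- potential hard-task reward shown in the choice period ($1.24-$4.12)
    win_probability  -- one of PROBABILITIES, shown in the choice period
    presses_made     -- button presses the participant managed within the time limit
    """
    # Completion requirements as described above: 98 presses in 21 s (hard)
    # versus 30 presses in 7 s (easy); here we only check the press count.
    required_presses = 98 if choose_hard else 30
    completed = presses_made >= required_presses

    # A reward is recorded only if the task was completed AND the trial turns
    # out to be a win trial, per the rules participants were given.
    is_win_trial = random.random() < win_probability
    reward_value = hard_reward if choose_hard else EASY_REWARD
    return reward_value if (completed and is_win_trial) else 0.0

# Example: a completed hard task on an 88%-probability trial worth $4.12.
print(run_trial(choose_hard=True, hard_reward=4.12, win_probability=0.88, presses_made=98))
```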
(I use the terms hard tasks and hard trials interchangeably because once a participant chose a hard task for a given trial, that trial became a hard trial.)
The Study’s Alleged Findings
Below is a summary of the authors’ Effort Preference claims:
1. Hard-Task Choices. This is the main Effort Preference claim made in the paper. According to the authors, ME patients chose significantly fewer hard tasks than controls (p=0.04). After applying some statistical legerdemain to the raw data—including eliminating the data of a poorly performing control who chose the lowest number of hard tasks among controls by a large margin, on par with the lowest-performing ME patient, who was not excluded—the authors claim that the probability of choosing hard tasks was significantly higher in controls compared with ME patients at the beginning of and throughout the test (p=0.04). The proportion of hard-task selections by ME patients was used as a correlate for what NIH calls Effort Preference, the decision to avoid the harder task, which the authors claim indicates an altered so-called Effort Preference in ME patients (Figure 3a).
The authors further claim that this result cannot be explained by fatigue sensitivity because there was no group difference in the decline over time regarding the ratio of the hard-task selections (p=0.53), by reward sensitivity because both groups increased their ratio of hard-task choices at the same rate with increasing reward value (p=0.07), or by probability sensitivity because there was no group difference in participants’ ratio of hard-task choices based on the probability of a trial being a win trial (p=0.43).
2. Button-Pressing Rate. ME patients demonstrated a significant decline in button-pressing rate over time while performing easy tasks (Figure 3b, p=0.003). Because such decline was not observed during hard tasks (Figure 3b), the authors concluded that the decline was not due to fatigue.
3. Completion Rate. ME patients were less likely to complete hard tasks than controls “by an immense magnitude” (p=0.0001) but not less likely than controls to complete easy tasks (p>0.05).
4. Pacing During Easy Tasks. Because the decline over time in the button-press speed of ME patients for easy tasks (see 2 above) did not result in a group difference with respect to the probability of ME patients’ completion rate for easy tasks (see 3 above), the authors concluded that patients “reduced their mechanical effort while maintaining performance on the easy tasks,” i.e., that ME patients were pacing during the easy tasks.
Hard-Task Choices
The metric underlying the authors’ Effort Preference claim is the proportion of hard-task choices the two groups made. The investigators used the ratio of hard-task choices as a correlate for an alleged misperception by patients as to their abilities or what the authors call an altered Effort Preference.
Exclusion of Control F and Improper Inclusion of Patients Too Sick for the EEfRT
According to the Figure 3a spreadsheet (attached to the paper in the Source Data file), the investigators determined that the EEfRT data of control F was invalid and excluded that data from their analysis. It is possible that I overlooked it—this is a long paper with multiple sizable attachments that do not cross-reference each other well or at all—but I was unable to determine what led to the alleged invalidity of that individual’s data. As far as I could tell, the fact that this data was excluded is not mentioned in the paper, let alone explained. Other EEfRT studies discuss why certain participants’ data were excluded, if any, as is customary in scientific papers. Moreover, the NIH paper itself explains why individuals were excluded with respect to other testing (see Supplementary Results, page 18, “Sex-based differences in Gene Expression were Validated in other Data Sets”).
Excluding the data of control F was certainly convenient for the authors since that individual chose by far the fewest total hard tasks in the control group, on par with the ME patient who chose the fewest total hard tasks, whose data were not excluded. Was the exclusion of this particular control what the authors needed to get over the statistical-significance hurdle, which they barely did with a p value of 0.04? Controls chose an average of 19.25 hard trials per control, but when you include control F, that number goes down to 18.65.
In addition, five of the fifteen ME patients who participated in the EEfRT—one third of the patient group—were physically unable to complete hard tasks at an acceptable rate, as evidenced by an extremely low completion rate for their hard tasks (each far less than 50%). They completed hard tasks at a combined rate of less than 16% whereas controls completed hard tasks at a rate of more than 96%. The authors themselves did not mention those percentages but found, based on the group’s actual performance, that patients were less likely to complete hard tasks compared to controls “by an immense magnitude” (p<0.0001). Had the authors excluded the data from patients who were unable to complete at least 50% of the hard trials, as EEfRT validity requires (discussed in detail below under “Confounding Factors and Validity Issues of the EEfRT—Physical Inability of ME Patients to Complete Hard Tasks”), the average number of hard tasks chosen per patient would have gone up from 16.6 to 17.3.
Four of the six patients who had the most physical difficulty completing hard tasks at a rate acceptable for EEfRT validity purposes (the sixth completed barely more than 50% of hard tasks) chose the fewest hard tasks in the patient group. This is an indicator that their physical struggle to complete the hard tasks directly impacted their hard-task choices. Therefore, their decisions whether to choose hard or easy tasks reflected their physical limitations, not a misunderstanding of what they were capable of performing or disrupted effort discounting.
Consequently, the question is whether there would still be any statistical significance with respect to the ratio of hard tasks chosen by the two groups if control F had not been excluded and if the five patients who struggled to complete hard tasks had been excluded. Had the authors done this, the average number of hard tasks chosen by patients would have been 17.3, and the average number of hard tasks chosen by controls would have been 18.65—a tiny difference.
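To put that difference in perspective, here is the back-of-the-envelope arithmetic using only the summary numbers quoted above; expressing the gap as a share of the roughly 50 trials each participant faced is my own simplification, purely for illustration.

```python
# Group means for total hard-task choices, as quoted above.
patients_mean = 17.3    # patients, after excluding the five unable to complete hard tasks
controls_mean = 18.65   # controls, with control F's data left in

difference = controls_mean - patients_mean
print(f"Difference in mean hard-task choices: {difference:.2f}")   # ~1.35 tasks
print(f"As a share of roughly 50 trials: {difference / 50:.1%}")   # ~2.7% of trials
```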
Example Graphs
The investigators created example graphs for their interpretation of what various outcomes—effort sensitivity, fatigue sensitivity, and reward sensitivity—would look like (see Supplementary Figures S5b-d below).
Fatigue sensitivity. They theorized that if a difference in fatigue sensitivity between the groups had been the reason for patients having chosen fewer hard tasks than controls, i.e., if the ratio of hard-task choices by patients had decreased over time at a rate greater than that of controls, that would indicate that abnormal fatigue sensitivity rather than an issue of false perception of effort explains the fewer hard-task choices by patients. The authors showed a simplified version of what fatigue sensitivity would look like in Supplementary Figure S5c below, where the ratio of hard-task choices declines in one group, the one with increased fatigue sensitivity, as participants complete more trials.
Reward sensitivity. If, on the other hand, patients did not value rewards properly, i.e., did not increase their hard-task choices at a rate comparable to controls as reward values increased, then patients would demonstrate diminished reward sensitivity, i.e., an issue with effort discounting (or an altered Effort Preference). That is, as shown in Supplementary Figure S5d below, in the absence of appropriate reward sensitivity, the proportion of hard-task choices would not rise as reward values increase.
Effort sensitivity. Finally, if patients had had an “aversion” to effort, there would have been a group difference in effort sensitivity or Effort Preference, i.e., patients would have selected a lower percentage of hard-task choices at the beginning of and throughout the entire task, illustrated by two parallel lines with the control group sitting higher on the y-axis than the patient group (see Supplementary Figure S5b below). This, according to the authors, would demonstrate that disrupted effort discounting (or an altered Effort Preference) explains the lower rate of hard-task choices by patients.
Supplementary Figures S5b-d:

The Alleged Outcomes
The authors compared the ratio of hard-task choices made by the two groups and claim that (1) controls chose more hard tasks than patients (p=0.04) and (2) the probability of choosing hard tasks is significantly higher in controls than in patients “at the start of and throughout” the EEfRT (p=0.04). They refer to Figure 3a for both claims.
Figure 3a (The same outcome is depicted in Supplementary Figure S5e.):
However, Figure 3a (or the basically identical Supplementary Figure S5e) tells us nothing about the first finding (actual number of hard-task choices made by groups per trial) because it only depicts the probability of choosing the hard task. The actual hard-task choices and the probability of choosing the hard task are not interchangeable, and the same graph obviously cannot illustrate two separate findings with different parameters, but that is exactly how that graph is used in the NIH paper. There is no graph in the paper depicting the actual hard-task choices or the proportion of hard-task choices. (The authors seem to refer to the number of hard tasks chosen and the proportion of hard-task choices interchangeably.)
With respect to the second finding (probability of choosing the hard task) depicted by Figure 3a, that approach grossly distorts the data. The use of estimating techniques is not appropriate when one has the actual data regarding the proportion or number of hard-task choices that were made for each trial, which was the case here. In other words, one cannot convert the proportion or number of actual hard-task choices into a probability of hard-task choices with respect to EEfRT choices that have already been made.
To the degree that the authors’ claim regarding the probability of choosing hard tasks might have been forward-looking, that would be preposterous. The probability of what? The probability of the same 15 ME patients choosing the same number of hard tasks on re-testing, the probability of a different group of 15 ME patients choosing the same number of hard tasks, or the probability of the entire ME patient population choosing the same number of hard tasks? Surely, the data of 15 patients (some of whom likely did not have ME) cannot tell us anything about what the probability of other ME patients choosing the hard tasks on the EEfRT would be.
A scenario in which the use of estimating techniques might be appropriate is when the researchers have only a few data points and need to fill in the likelihood of the hard-task choices on the non-measured data points. That is not the case here as the investigators collected data for each trial. The authors did not explain why they converted the actual proportion or number of hard-task choices into the probability of choosing the hard task.
No group difference in fatigue sensitivity. Because Supplementary Figure S5e (see below) does not resemble Supplementary Figure S5c (see above), the investigators concluded that there was no group difference in fatigue sensitivity:
“Two-way interactions showed no group differences in response to task-related fatigue….”
and
“Lack of interaction indicates similar fatigue sensitivity.”
(Remember that Supplementary Figure S5e (below) is basically the same graph as Figure 3a (above).)
In other words, the percentage of hard-task choices decreased at a similar rate by group as the number of trials increased, indicating that a difference in fatigue sensitivity does not explain the fact that, overall, patients made hard-task choices at a lower rate than controls.
What the authors omitted from their fatigue-sensitivity analysis is that patients were likely in a so-called adrenaline surge during the 15 minutes of the EEfRT. Adrenaline surges allow ME patients to temporarily display higher functionality due to bursts of false, unsustainable energy (possibly driven by adrenaline) when patients are unable to pace, such as for medical appointments, emergencies, important tasks, participating in the first NIH intramural ME study in decades, etc. Those adrenaline spikes often result in so-called crashes when patients’ systems do not correctly down-regulate, and they are, therefore, not a reflection of patients’ true or safe capacity. Despite describing this symptom in ME in one of the nested NIH studies on post-exertional malaise (PEM) done as part of the phenotyping study and published years before the phenotyping paper, the phenotyping investigators proceeded with the EEfRT inquiry without taking the impact of adrenaline surges into account.
No group difference in reward sensitivity. Because Supplementary Figure S5f (see below) does not look like Supplementary Figure S5d (see above), the authors concluded that there was no group difference in reward sensitivity:
“Two-way interactions showed no group differences in response to … reward value….”
and
“Lack of interaction indicates similar reward sensitivity.”
In other words, the percentage of hard-task choices by the two groups increased at a similar rate as reward values increased, indicating that a difference in reward sensitivity does not explain the fact that, overall, patients made hard-task choices at a lower rate than controls. Consequently, there is nothing wrong with how patients valued rewards or with their effort discounting with respect to rewards.
Supplementary Figures S5e-f:

Because they found no group difference in fatigue or reward sensitivities, the authors concluded that only an “unfavorable” effort sensitivity, i.e., an altered Effort Preference, in ME patients can explain the difference between groups in the proportion of hard tasks chosen. That interpretation is incomplete—having left out probability sensitivity—and, therefore, incorrect.
Group Difference with Respect to Probability Sensitivity
The authors decided not to include a graph analyzing whether a group difference in probability sensitivity, choices made based on the probability of a trial being a winning trial, was the reason for the difference in the proportion of hard-task choices between groups. I will discuss this in detail below (under “Game Optimization Strategy”), but in essence, there was a group difference with respect to low- and medium-probability trials only but not with respect to high-probability trials, which is evidence of patients having made strategic, i.e., smart, choices in line with the EEfRT instructions to win as much virtual money as possible. Therefore, there is nothing wrong with how patients incorporated into their hard-task decisions the probability of trials being win trials.
(Please be careful not to confuse the probability of choosing the hard task (graphed in Figure 3a and Supplementary Figure S5e) with the probability of a trial being a win trial in accordance with the EEfRT game instructions; those are completely different aspects of the EEfRT testing.)
Statistical Legerdemain
Before I address the failure of NIH to include the probability-sensitivity analysis (under “Game Optimization Strategy”), let’s focus on Figure 3a (identical to Supplementary Figure S5e), which is at the heart of the Effort Preference claim.
Figure 3a:
This graph certainly makes it look as though patients chose fewer hard tasks than controls from the outset and throughout the test, doesn’t it? However, Figure 3a is the result of the investigators’ application of some statistical legerdemain to the actual data. The actual raw data as a percentage of hard tasks chosen by each group per trial when no statistical measures are applied paints a completely different picture.
I created the graph below depicting the percentage of hard-task choices made by the groups per trial, which is based on the data underlying Figure 3a, provided by NIH in the corresponding spreadsheet attached to the paper. (I excluded the data from control F, which the investigators had determined to be invalid.) What actually happened during the EEfRT looks nothing like Figure 3a or what the researchers want us to believe in terms of the hard-task choices patients made when compared to controls.

(The above graph reflects a minor correction to the one I originally published.)
To allow the reader to get a sense of the actual numbers of hard tasks chosen by the groups, I also created the following graph depicting the total number of hard tasks chosen by group per trial. Because there were 16 controls and 15 patients, I excluded (in addition to control F whose data was allegedly invalid) an additional control to arrive at the same number of ME patients and controls (15). For that I chose control Q, who had selected 19 hard-task trials. As mentioned, the average number of hard-task choices per control was 19.25, so this graph is slanted slightly in the authors’ favor. There is little difference between the two graphs I generated, which is to be expected because control Q performed in the middle of the pack in the control group, but in any event, the two groups performed almost identically in both graphs.
The following area graph shows a bit more clearly just how slim the group-difference margins are. Basically, the area to which the black arrow points—a total of four out of 50 trials—is the main difference in terms of number of hard-task choices between the groups and, therefore, the basis for NIH’s claim that a dysfunctional Effort Preference, i.e., an alleged misunderstanding by patients as to their true capacity, defines ME.
It is easy to see why the authors chose not to generate a visual of what actually happened during the EEfRT and instead resorted to manipulating the data with statistical tools until they arrived at a figure that fit their desired outcome (Figure 3a and Supplementary Figure S5e). That manipulated figure allowed them to make it look as though patients chose significantly fewer hard tasks for every single trial throughout the EEfRT, while a straightforward graph of the raw data shows clearly that their Effort Preference claim has no legs.
Let’s look more closely at the data in relation to the paper’s claim that the difference in hard-task choices persisted from the beginning of and throughout the EEfRT. Dr. Nicholas Madian, psychologist and NIH postdoctoral fellow who was apparently responsible for the implementation of the EEfRT, repeated, during the NIH Symposium (at 2:29:22), the paper’s false claim. When discussing fatigue sensitivity, he said, “We did again see a difference at baseline, which persisted throughout the task, indicating differences in effort discounting.” That is categorically false.
At the start. Contrary to NIH’s assertion, controls did not choose more hard tasks at the start of the EEfRT. The exact breakdown of the total number of actual hard tasks chosen by each group during the first four trials (including all patients and all controls except control F, whose data were excluded by the authors) is as follows:
Out of the first four trials, ME patients and controls chose the exact same number of hard tasks per participant. For the very first trial, arguably “the start” of the EEfRT, patients chose twice as many hard trials as controls. Contrary to what the paper and Madian claimed, controls did not choose more hard tasks than ME patients at the start of the EEfRT.
Throughout. NIH’s claim that controls chose more hard tasks throughout the entire EEfRT, with Figure 3a giving the impression that they chose more hard tasks in every trial, is false, too. For 34% of the trials, ME patients chose hard tasks at a higher rate than controls. For another 2% of trials, both groups chose the same percentage of hard tasks. During an additional 14% of trials, both groups’ hard-task choices were nearly identical. This is entirely contrary to the impression the authors give, i.e., that controls chose more hard tasks in every single trial.
What happened? If the authors had used bar graphs to depict the group difference in hard-task choices as other EEfRT studies typically do, they could have shown that the patient group undeniably chose slightly fewer hard tasks and a slightly lower ratio of hard-task trials than the control group. However, in order to demonstrate that fatigue was not a factor for that group difference—something that was clearly a priority for the authors given NIH’s pathological, habitual framing of ME as fatigue, which ironically threatened to get in the way of the altered-Effort Preference claim—the actual hard-task choices would have had to be charted by trial, in order to show that both groups fatigued at a similar rate, thereby ruling out a difference in fatigue sensitivity as a factor. The resulting graph would have looked like my graphs above, i.e., clearly undermining their claim. Graphing the raw data by trial would also have made it impossible to claim that patients chose fewer hard tasks at the beginning of and throughout the EEfRT because it would have been obvious that this was not the case. The only path to alleging that patients underestimate what their bodies can perform was to use statistical manipulation that would result in a smooth graph (Figure 3a) that allowed for the false claim that patients chose fewer hard tasks from the start and on every single trial throughout the EEfRT, neatly supporting the Effort Preference claim. The only problem is that Figure 3a is contradicted by the data.
Random Assignment of Tasks
The data is even weaker than I have demonstrated with my graphs because the random assignment of tasks (hard versus easy) when no choice was made in the five-second choice period skewed the number of hard-task choices in favor of controls due to the fact that, in the case of controls, more than half of the randomly assigned tasks were hard tasks (57%). By contrast, for ME patients, only one third of the randomly assigned tasks were hard tasks (33%). This is further amplified by the fact that ME patients had tasks randomly assigned to them more than twice as often (15 times) as controls (seven times).
Such uncontrolled differences undermine the conclusions, especially given the tiny cohort sizes. Of course, it is preposterous to include in the measure of hard-task choices those trials where no choice was made. Other EEfRT studies (for example, this one) excluded trials for which participants failed to make a choice within the choice period. That does not entirely cure the problem as participants might have chosen the hard task for those trials, and this becomes particularly problematic when the number of times this happens differs by group in a meaningful way, as it did here, but it is preferable to including the data as the NIH authors have done. The number of these instances is not dramatic, but when the margins of group difference are as slim as they were here with respect to hard-task choices, the impact of even just a few cases can easily change the outcome.
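To illustrate the exclusion approach those other studies took, here is a hedged sketch of how randomly assigned trials could be dropped before computing the proportion of hard-task choices. The column names (“Choice Time,” “Trial Difficulty,” “Participant”) are my guesses at how the Figure 3a spreadsheet might be organized, not NIH’s actual headers.

```python
import pandas as pd

def hard_task_proportion(df: pd.DataFrame) -> pd.Series:
    """Proportion of hard-task choices per participant, ignoring trials
    where no choice was made within the five-second choice window."""
    # Assumption: a choice time at or above five seconds marks a randomly assigned task.
    chosen = df[df["Choice Time"] < 5.0]
    is_hard = chosen["Trial Difficulty"].eq("Hard")
    return is_hard.groupby(chosen["Participant"]).mean()

# Usage sketch (file name and sheet layout are assumptions):
# df = pd.read_excel("source_data_figure_3a.xlsx")
# print(hard_task_proportion(df))
```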
Misrepresentation of Hard-Task Choices as the Relevant EEfRT Measure
It is indisputable that controls chose slightly more total hard tasks than ME patients on the EEfRT. However, that is irrelevant for two reasons: (1) the correct measure of who performed better in the EEfRT is the amount of virtual rewards won, which is not determined by the proportion or number of hard tasks chosen (discussed below under “Game Optimization Strategy”) and (2) the reason for the difference in the proportion of hard-task choices was a group difference in probability sensitivity, which means that patients made strategic choices in order to win the game in accordance with the instructions (discussed below under “Game Optimization Strategy—Probability Sensitivity”), leading them to choose fewer hard tasks. In fact, selecting fewer hard tasks resulted in ME patients winning more virtual money than controls based on the reported data, demonstrating that, to the extent that the EEfRT tells us anything about motivation or effort discounting, patients exhibited superior motivation and effort discounting compared to controls, contrary to NIH’s claim (discussed below under “ME Patients Performed Better on EEfRT”). Both points completely gut NIH’s claim of disrupted effort discounting or an altered Effort Preference in ME patients.
Let me begin by addressing the impact of a game optimization strategy and the group difference in probability sensitivity.
Game Optimization Strategy
A major validity issue of the EEfRT is the confounding factor of game optimization strategies used by participants. After all, the EEfRT is a game with varying reward and probability levels. In their discussion of the EEfRT, the NIH authors assert the following:
“The primary measure of the EEfRT task is Proportion of Hard Task Choices (effort preference). This behavioral measure is the ratio of the number of times the hard task was selected compared to the number of times the easy task was selected. This metric is used to estimate effort preference, the decision to avoid the harder task when decision-making is unsupervised and reward values and probabilities of receiving a reward are standardized.”
The assertion that the proportion of hard-task choices is the primary measure of the EEfRT is demonstrably false. Based on the EEfRT instructions, it is improper to use the EEfRT as a measure of motivation—or an alleged false perception of effort as NIH has done. In the structure of the EEfRT, always choosing the hard task over the easy task (even choosing the hard task a large majority of the time) is not optimal if one is trying to receive the maximum reward. The use of rewards is designed to be the motivating factor, and winning as much money as possible is the goal of the test. For example, one EEfRT paper clearly and simply states the following (other EEfRT papers contain similar language):
“The goal of the EEfRT is to win as much money as possible by completing easy or hard tasks.”
Hence, merely looking at the relative proportion of hard versus easy tasks is not the correct way to assess results of the EEfRT because that approach would most definitely not lead to a maximization of rewards. If the instructions had been to choose as many hard tasks as possible, then the proportion of hard tasks chosen would be the primary outcome measure, but that is not the case in EEfRT studies and was not the case in the NIH study.
The optimal approach to increase one’s chances of receiving the maximum virtual rewards is more complex. It involves reviewing the parameters of the test to determine the best reward strategy. The key elements for each trial are the hard versus easy choice, the probability of the trial being a win trial (one where a reward is earned upon successful completion of the task), and the amount of the potential reward. It is also critical to note that the easy-task choice requires only about half the time of the hard-task choice. Therefore, making the easy-task choice allows for more trials to be completed within the 15 minutes allocated for the EEfRT and also increases the ability to choose potentially higher-reward/higher-probability hard tasks later in the test. Hence, the optimal strategy would generally dictate making the easy-task choice for low-probability/low-reward trials and the hard-task choice for high-probability/high-reward trials. With respect to 50% probability trials, the hard task would be optimal only in the case of very high reward levels and possibly not even then.
For example, if a trial has a 12% probability of being a win trial and the hard-task reward is $1.24 (remember, an easy-task choice always has a reward value of $1), it is clear that a participant should make the easy-task choice. In such a trial, she has the same chance (12%) of obtaining about the same reward ($1 versus $1.24) and can complete another easy-task trial, for a chance at another dollar, in roughly the time it would take to complete the single hard-task trial. That would result in winning $2 by choosing two easy trials as opposed to $1.24 by choosing one hard trial, which would take about the same amount of time as the two easy trials. Moreover, the second easy trial might have a higher probability of being a winning trial. That same logic holds for most low-probability trials, particularly when the potential hard-task reward is less than $2 or $3, which is the case for five or 11, respectively, out of the 18 different potential hard-task reward values. Of course, any trial with a 12% probability is unlikely to yield any reward at all, so choosing the hard task is not at all a compelling choice. In other words, why choose the hard task if there is only a 1 in 8 chance of winning?
If a participant chose only hard-task trials, she would complete fewer than 32 trials in the 15 minutes allowed for the modified EEfRT test (we do not know how long the feedback period is, so it is not possible to pin this down exactly), a third of which would carry only a 12% probability of winning, versus 64 trials if she chose only easy tasks. Choosing only easy tasks is also not the correct strategy, as that would leave the significantly higher rewards from hard tasks on the table.
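To make the arithmetic behind this strategy concrete, the sketch below compares the expected virtual winnings per second for a hard versus an easy choice. The trial durations are assumptions used only for illustration, based on the observation above that a hard trial takes roughly twice as long as an easy one overall.

```python
# Rough per-trial durations, in seconds (assumptions for illustration only;
# the game rules above imply a hard trial takes roughly twice as long overall).
HARD_TRIAL_SECONDS = 30.0
EASY_TRIAL_SECONDS = 15.0
EASY_REWARD = 1.00

def expected_reward_per_second(choose_hard: bool, win_probability: float,
                               hard_reward: float) -> float:
    """Expected virtual dollars earned per second for one choice,
    assuming the button-pressing itself is completed successfully."""
    if choose_hard:
        return win_probability * hard_reward / HARD_TRIAL_SECONDS
    return win_probability * EASY_REWARD / EASY_TRIAL_SECONDS

# The 12% / $1.24 example from the text: the easy choice is clearly better.
print(expected_reward_per_second(True, 0.12, 1.24))   # ~0.0050 $/s for the hard task
print(expected_reward_per_second(False, 0.12, 1.24))  # ~0.0080 $/s for the easy task

# At 88% probability and the maximum $4.12 reward, the hard choice wins out.
print(expected_reward_per_second(True, 0.88, 4.12))   # ~0.121 $/s
print(expected_reward_per_second(False, 0.88, 4.12))  # ~0.059 $/s
```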
As a result, various EEfRT studies have excluded “inflexible” participants, those who chose only easy or only hard tasks. For example, one EEfRT study “removed from analyses participants who made only hard selections across all reward levels. This removed from analysis subjects who had no room to demonstrate increases in effort allocation.”
Numerous prior EEfRT studies have cautioned about the confounding nature of game optimization strategies. For example, a 2022 study examining the reliability and validity of the EEfRT (EEfRT reliability and validity study) concluded that:
“the original EEfRT comes with a major downside: At least some participants understand that choosing the hard task is often lowering the possible overall monetary gain as the hard task takes almost 3 times as long as the easy task and the overall duration of the task is fixed. Hence, at least some participants’ choices are partly based on a strategic decision and less on approach motivation per se.”
The same study found that:
“the percentage of hard-task-choices within the original EEfRT did not correlate with participants’ motivation to win money in any trial category”
and that
“[t]he original EEfRT has been shown to be partly related to individual strategic behavior, which is not related to participants’ actual approach motivation.”
According to an EEfRT study in schizophrenia, the failure to “put forth the mental effort required to develop a systematic allocation strategy with regard to reward and probability information” is a sign of lack of motivation itself. Conversely, employing a strategy on EEfRT testing, as patients very effectively did, is evidence of strong motivation.
Prior EEfRT studies urged future EEfRT researchers to inquire with participants about their potential use of strategies in an attempt to limit their confounding impact as well as to modify the EEfRT to remove optimization strategies from the equation:
“Therefore, future studies should counteract these limitations by (1) systematically asking participants about their strategies while playing the EEfRT and/or (2) optimizing the EEfRT, such that the only valid strategy for participants to increase their rewards is to increase their effort allocation.”
Nevertheless, the NIH investigators chose not to do so.
Obviously, the optimal strategy involves a mix of hard and easy-task trials, mostly depending on probability of winning. In fact, as noted above, a prior EEfRT study excluded from analysis “inflexible” participants because they obviously did not take the instructions for, and the goal of, the EEfRT into account in their choices.
A computer could, no doubt, calculate the precise optimization strategy in each case, but this is not so easy for EEfRT participants during the brief time they are given to choose a task (five seconds). The bottom line is that, to the extent that the EEfRT has any validity, the key result measure is the amount of rewards earned during the trials and definitely not the percentage of hard-task choices.
(In some EEfRT studies, as was the case with the NIH version of the EEfRT, the ultimate reward actually paid out to participants is based on two win trials randomly selected at the end of the EEfRT. Since the rewards in the hard-task trials are higher than in the easy-task trials, the random rewards at the end make it slightly more optimal, on the margin, to select a hard task. However, those randomly selected trials are not likely to influence participants’ choices because the selection among the win trials at the end of the test is random and, therefore, unpredictable and because there is no feedback for them due to the reward delay (temporal discounting) whereas there is immediate reward feedback for each trial, which will influence participants’ strategy as they go through the trials and make the choice for each of them.)
Probability Sensitivity
As explained, utilizing a game optimization strategy can impact the choices made with respect to hard versus easy tasks based on the probability of a trial being a winning trial. The NIH authors claim that the EEfRT measures only effort (number or proportion of hard-task choices), the potential impact of fatigue, and the potential impact of reward sensitivity.

That is an incomplete and, therefore, false statement to the extent that the implication is that the EEfRT does not assess probability sensitivity. The EEfRT does measure probability sensitivity, which is typically analyzed in EEfRT studies in addition to effort, fatigue, and reward sensitivities. For example, an EEfRT paper on anhedonia in Major Depressive Disorder expressly states that “[p]robability is manipulated in the EEfRT.”
Not only that, but probability sensitivity is far and away the most important factor in choosing between a hard and an easy task because the probability of getting any reward at the 12% probability level (1 out of 8) is 1/7th of the probability of winning a reward at the 88% probability level (7 out of 8). On the other hand, the reward values vary only by a factor of about four (between $1 and $4.12). The relevant inquiry with respect to probability sensitivity is whether, as part of a game optimization strategy, patients chose a lower percentage of hard tasks than controls when the probability of winning was only 12% or 50% as opposed to 88%. A group difference with respect to probability sensitivity would explain the difference in the proportion of hard-task choices or effort between the groups, and there would be no basis for any conclusion that ME manifests as a disrupted perception of effort or a misjudging of ability (i.e., Effort Preference).
Sharing the analysis of the data not only with respect to reward levels (which the authors did) but also with respect to probability levels (which the authors did not include in their paper) is typically at the core of EEfRT studies. In line with their false assertion that the EEfRT does not assess probability sensitivity, the authors shared graphs that depict their findings only with respect to fatigue sensitivity and reward sensitivity (see above under “Hard-Task Choices”) and, unlike other EEfRT studies, the NIH investigators did not include any graphs for their analysis of probability sensitivity. Instead, they merely claimed that there was no group difference with respect to probability sensitivity:
“Two-way interactions showed no group differences in responses to … reward probability (ROR = 0.50 [0.09, 2.77], p = 0.43)….”
That is false.
The heavily redacted Peer Review File, which includes the comments of some but not all peer reviewers (and the corresponding answers by NIH), shows that one of the reviewers (we do not know the names of the reviewers other than Dr. Anthony Komaroff’s, and we do not know who made this particular comment) raised the issue of proper analysis of the EEfRT data. The reviewer’s question was redacted in its entirety. Based on NIH’s answer, the reviewer seems to have suggested the use of a different statistical tool, the Cooper 2019 approach, rather than the tool the NIH investigators chose. In their response to the reviewer comment, NIH rejected the Cooper 2019 approach, which, according to NIH, apparently would have been able to reveal any group difference regarding probability sensitivity. Below is NIH’s explanation for their choice:
“[The Cooper 2019 approach] is designed to dissect out how participants are making their decisions (i.e. which aspects of the task are being weighed in making decisions about hard/easy task selection). Use of the Cooper 2019 approach would help determine the contribution of individual aspects of the task to the performance outcome, such as how subjects integrate reward, effort, and probability to guide decision-making. As our data did not show differences in reward sensitivity and probability sensitivity by group, this approach seems unlikely to provide information regarding the primary outcome.” [emphasis added]
In other words, the NIH investigators’ rationale for rejecting the Cooper 2019 approach was that there was allegedly no difference in probability sensitivity between groups, as the paper itself also asserts. Again, that is incorrect. There was a group difference with respect to probability sensitivity.
I generated the following graph based on the raw data attached to the paper. The graph shows the relative percentage of hard-task choices between the two groups at the three probability levels. As you can see, there is a substantial group difference in probability sensitivity at the lower probability levels (12% and 50%) and essentially the same probability sensitivity between groups at the 88% probability level.
(I again excluded the data from control F, which was deemed to be invalid by the investigators.)
It is true that patients overall chose hard tasks at a lower rate than controls, but almost all of the difference in hard-task choices occurred in the 12% and 50% probability trials where an optimal strategy is most impactful and dictates significantly fewer hard-task choices if one is looking to maximize the total virtual rewards in accordance with the EEfRT instructions. In contrast, at the 88% probability level, a hard-task choice is optimal except for trials with very low potential reward values.
At the 12% probability level, controls chose about 46% more hard tasks than patients; at the 50% probability level, controls chose about 40% more hard tasks than patients; and at the 88% probability level, controls chose less than four percent (4%) more hard tasks than patients. There is clearly a statistically significant group difference with respect to choosing hard tasks between the lower probability levels (12% and 50%) and the high probability level (88%).
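The percentages above come from tallying the raw choices at each probability level. A hedged sketch of that tally is below; as before, the column names and group coding are my assumptions about the Figure 3a spreadsheet layout, not NIH’s actual headers.

```python
import pandas as pd

def hard_choice_rate_by_probability(df: pd.DataFrame) -> pd.DataFrame:
    """Share of hard-task choices per group at each win-probability level."""
    is_hard = df["Trial Difficulty"].eq("Hard")
    return (is_hard.groupby([df["Group"], df["Probability of Reward"]])
                   .mean()
                   .unstack("Probability of Reward"))

# Usage sketch (file name, group labels, and the exclusion of control F are assumptions):
# df = pd.read_excel("source_data_figure_3a.xlsx")
# df = df[df["Participant"] != "Control F"]
# print(hard_choice_rate_by_probability(df))
```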
In essence, ME patients seem to have made strategic decisions about when to choose hard trials in order to win as much virtual money as possible in accordance with the EEfRT instructions. That was the right decision as confirmed by the fact that patients indeed won more virtual money than controls in accordance with the reported data, which I will discuss in the next section. This difference in probability sensitivity—and not disrupted effort discounting or an altered Effort Preference—explains the group difference with respect to choosing hard tasks.
Based on NIH’s Own Data, ME Patients Performed Better on EEfRT
With the understanding that the goal of the EEfRT is to maximize virtual winnings and that the participants were instructed accordingly, let’s look at which group actually performed better as measured by the virtual rewards won by both groups.
ME patients received, on average, more virtual rewards during their win trials ($58.13 per patient) than controls ($56.71 per control) and, therefore, out-performed controls on the EEfRT despite choosing fewer hard-task trials. ME patients’ selection of fewer hard choices was the better strategy as demonstrated by the ultimate results. Maybe this difference is not statistically significant—the margins are slim—but the fact remains that ME patients did better on EEfRT testing according to the data the authors reported, so if there was no statistically significant group difference, the authors should have reported that their effort inquiry did not yield anything. Instead, they ignored the most relevant EEfRT outcome altogether, and what they did report is entirely contrary to the actual result of the EEfRT, allowing them to make their spurious Effort Preference claim.
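For completeness, here is a hedged sketch of how per-group winnings can be tallied from the raw data. I am assuming the spreadsheet records a reward value and a granted/not-granted flag for each trial; the column names are my own guesses, and practice trials would also need to be dropped, which is not shown.

```python
import pandas as pd

def mean_winnings_per_participant(df: pd.DataFrame) -> pd.Series:
    """Average total virtual rewards won per participant, split by group."""
    # Assumption: "Reward Granted" flags whether the trial's reward was credited
    # to that participant, and "Value of Reward" holds the dollar amount.
    won = df["Value of Reward"].where(df["Reward Granted"].astype(bool), 0.0)
    per_participant = won.groupby([df["Group"], df["Participant"]]).sum()
    return per_participant.groupby("Group").mean()
```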
I created the following graph to illustrate those results.
(I again excluded the data from control F, which was deemed to be invalid by the investigators.)
The investigators’ use of the EEfRT to try to show that bodies of ME patients are able to perform at a higher level than patients think, due to dysfunctional effort discounting, was a complete failure. The data actually shows the opposite: that patients, despite being gravely ill and physically and cognitively limited, performed better than controls on the EEfRT by winning more virtual rewards than controls and that there is nothing wrong with the conscious or unconscious motivation of patients, with their effort discounting, or with their Effort Preference.
False Recording of EEfRT Data
The spreadsheet with the raw EEfRT data (Figure 3a spreadsheet in the Source Data File, attached to the paper) shows that there was a serious data recording or data entry issue with respect to the granting of rewards. In the case of 79 trials (including practice trials, excluding control F), the spreadsheet recorded the granting of rewards despite those trials not having been completed. This is contrary to the EEfRT game rules and what the participants had been told at the beginning of the task. Essentially, rewards were granted that had not been earned.
This happened more frequently in the case of the patient group, but with the proper exclusions (see details below under “Confounding Factors and Validity Issues of the EEfRT—Physical Inability of ME Patients to Complete Hard Tasks”), the two groups performed essentially the same (with less than a 1% difference), so these results are basically a tie despite patients’ severe limitations. This demonstrates that patients did not overestimate effort, underestimate rewards, or underestimate their capacity and do not have an altered Effort Preference even when one deducts, for both groups, the rewards that were granted improperly, i.e., granted despite being unearned.
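Below is a hedged sketch of the consistency check that surfaces these rows; the column names are, again, my assumptions about the Figure 3a spreadsheet, not NIH’s actual headers.

```python
import pandas as pd

def improperly_granted_rewards(df: pd.DataFrame) -> pd.DataFrame:
    """Rows where a reward was recorded as granted even though the trial
    was not successfully completed -- which the game rules do not allow."""
    granted = df["Reward Granted"].astype(bool)
    completed = df["Successful Completion"].astype(bool)
    return df[granted & ~completed]

# Usage sketch:
# df = pd.read_excel("source_data_figure_3a.xlsx")
# bad = improperly_granted_rewards(df[df["Participant"] != "Control F"])
# print(len(bad))   # the post counts 79 such trials, practice trials included
```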
Some examples of falsely recorded reward data are depicted below. Note the highlighted entries in the last two columns depicted below.
This is what Dr. Walter Koroshetz, Director of the National Institute of Neurological Disorders and Stroke (NINDS), called “data as pristine as you can get” (at 02:16) during the May 2, 2024 NIH Symposium on the study. There is a tell in how this study is being presented by NIH. Whenever the authors or NIH bureaucrats or surrogates use superlative statements—for example, that the patient cohort is exceptionally “clean,” that the data is “pristine,” that this is the “best study ever done,” that this is a groundbreaking study that, for the first time, has found any biomedical abnormalities in ME, etc.—they are trying, by sheer repetitive brute force and appeals to NIH’s authority, to convince the public that there is no reason to scrutinize the study in any way. If only they repeat these self-aggrandizing assertions often enough, they think, people might just believe that they are true, and, unfortunately, that sometimes works with new patients or patients who have not followed ME politics or the history and science of the disease. Every time we let NIH or CDC or FDA placate us with obvious falsehoods, another decade is lost for patients.
Given the significant number of instances where reward data was falsely recorded (56.03% of uncompleted trials) and the extremely large number of data points entered into the Figure 3a spreadsheet (1621 rows and 15 columns for a total of 24,315 entries), it is highly unlikely that these would have been the only mistakes made in the process of recording or entering the data in the spreadsheet. It is also possible that the computer administered the EEfRT incorrectly. In any event, in light of these mistakes, none of NIH’s EEfRT data can reasonably be relied upon.
(Added June 13, 2024: A different interpretation of the data in column K (“Reward Granted”) has been suggested. Under that theory, NIH, instead of falsely entering the data, might have mislabeled column K and recorded in that column whether or not a particular trial was a win trial, i.e., a trial for which rewards were earned by every participant who successfully completed that trial as opposed to whether a reward was granted to the participant identified in column A only.
If NIH, indeed, mislabeled that column, that would be a significant issue. Words have meaning, and if NIH gets to, after the fact, just say, “Oh, no, we meant something entirely different than what we said,” then none of their findings have any significance. Maybe Neurally Mediated Hypotension really means brain damage?
Moreover, interpreting the title NIH used for column K (“Reward Granted”) to mean Win Trial is entirely unreasonable and utterly indefensible in the universe on this side of the Looking Glass. The data entered in the spreadsheet for Figure 3a is clearly data specific to each participant: age, sex, valid data, trial number, trial difficulty, choice time, recorded presses, successful completion, completion time, button press rate. The only exceptions are where it is clear that it is data that relates to a specific trial that is the same for every participant: required presses, value of reward, and probability of reward; that is not the case with respect to “Reward Granted,” which clearly relates to the participant, not the trial. Therefore, the only possible interpretation of the words “Reward Granted” is that a reward was, indeed, granted to that individual for that trial.
Moreover, if the entry in column K applied to more than one individual, it would presumably be titled “Rewards Granted” (plural), as it would apply to all trials with the same number, rather than carrying the singular title it has. Of course, a more accurate title, in any event, would be “Win Trial” if NIH indeed meant to capture the win-trial status of a trial.
Either way (entry error or labeling error), NIH made a significant mistake, raising the question of how many other mistakes were made in recording and/or labeling the EEfRT data.
In any event, this alternative interpretation of column K does not change the fact that the groups essentially performed the same regarding the virtual rewards won (with the proper exclusion of those patients who were unable to complete hard tasks at a valid rate or at all). There was no significant group difference (less than 1%) in virtual rewards earned and, therefore, no basis for the claim that there is disrupted effort discounting or an altered Effort Preference in ME patients.)
Button-Pressing Speed over Time
With respect to button-pressing speed, ME patients started stronger on the easy tasks than controls, which does not jibe with the authors’ claim that patients lacked motivation or that their effort discounting is dysfunctional. Similarly, it is worth noting that the button-press rate for the hard tasks actually increased for ME patients throughout the EEfRT, which again contradicts the claim of an impaired preference with respect to exertion.
Figure 3b:

Confounding Factors and Validity Issues of the EEfRT
There are significant limitations, potential confounding factors, and validity concerns discussed by many prior EEfRT study authors, including the creators of the EEfRT. The EEfRT is a highly problematic measure that is vulnerable to distortion even in the hands of the most unbiased and benevolent researchers, which is not the situation we find ourselves in.
Impact of Task Properties and Administration
The authors of the EEfRT reliability and validity paper point out that “[s]eemingly small differences in task properties and administration could have a great impact on task behavior.” This is highly relevant given the obvious bias of at least some of the NIH investigators, who set out to “confirm,” one way or another, their pre-conceived notions about ME.
As an example of the potential impact of task administration, numerous EEfRT studies excluded participants who were taking benzodiazepines, a medication that can confound EEfRT outcomes. Not so the NIH investigators, who included ME patients who were, at the time of the NIH study, taking benzodiazepines. At least one EEfRT study found that the proportion of hard-task choices can be affected by poor sleep quality; ME patients, of course, suffer from non-restorative sleep and other sleep dysfunction, making the EEfRT an inappropriate test for them. These are just two examples of how easily EEfRT results are skewed.
Measuring Motivation?
EEfRT results are subject to a wide range of confounding factors, with the result that the test may end up measuring something other than motivation, the willingness to exert effort, or the valuing of rewards. For example, the EEfRT is not capable of distinguishing choices made based on motivation as opposed to personality.
As the authors of one EEfRT study state, “there is currently a lack of clarity about the specific drivers of motivated action during this task.” Another EEfRT paper on Bipolar I Disorder acknowledges that “it was not possible to adequately disentangle whether willingness to expend high levels of effort for reward can be considered a mechanism driving ambition.”
The authors of the EEfRT reliability and validity study state:
“[P]revious studies indicate that effort allocation within the EEfRT can be manipulated by a wide range of factors, ranging from mood inductions over neurophysiological manipulations to the influence of reduced motivation, or the intake of caffeine. So how does a person decide whether to increase effort to potentially gain a greater monetary reward within the EEfRT? Our mixed pattern of results shows that there is no simple answer to this question. Especially the impact of reward attributes hints at a complex pattern behind participants’ decisions and at the importance of individual reward evaluation.”
I discussed the impact of the mentioned reward attributes in more detail under “Game Optimization Strategy” above.
Misrepresentation of EEfRT Scope
The authors state, “Motivation was assessed using the Effort-Expenditure for Rewards Task….” In other words, the authors purport that the EEfRT is capable of measuring any and all motivation or general motivation. That is incorrect. There are various types of motivation, and the EEfRT does not, for example, purport to measure intrinsic motivation, which is driven by interests, passions, personal values, instincts, etc. and is difficult, if not impossible, to assess. Instead, the EEfRT is limited to measuring reward-based motivation, i.e., a specific form of extrinsic motivation (another being recognition, for example, which, too, the EEfRT is incapable of assessing). It is right there in the name: Effort Expenditure for *Rewards* Task. Moreover, the rewards for which the EEfRT can possibly claim any validity at all are immediate, not delayed, rewards. In addition, the only type of reward that the NIH investigators have any data on is very small monetary gambling rewards.
Consequently, any EEfRT claims by the authors should have been limited to a small subset of motivation. In essence, the NIH investigators misrepresented what the EEfRT is designed to assess and capable of assessing. There is no discussion of NIH’s results-oriented, overreaching broadening of the interpretation of the EEfRT outcomes. Overgeneralizing findings is an absolute no-no for any ethical scientist.
Moreover, it is not reasonable to draw any conclusions about the general motivation (or the effort discounting) of extremely sick patients, living on the margins and finding themselves in a daily existential struggle, merely based on their choices relating to a relatively meaningless social reward, such as to win a button-pressing game, while going through a grueling protocol that was guaranteed to cause significant physical fall-out. Aside from the special circumstances of ME patients, people value money differently, and it is not supportable to conclude that somebody has a problem with motivation in general (or, in the case of the NIH study, suffers from a false perception of effort and their own abilities) if that person assigns less importance to monetary gain.
The authors of the EEfRT reliability and validity study agree:
“It is reasonable to assume that the evaluation of potential benefits and costs can differ greatly between participants. … A potentially important factor is the type of reward and how much a person values this reward.”
Low Statistical Power and Risk of False Positives
The group difference in the proportion of hard tasks chosen in the NIH study is mostly minor, which is reflected in the weak p-value of 0.04 (with a generally accepted p-value threshold of <0.05 for statistical significance). Many prior EEfRT studies, almost all of which had larger cohorts than the NIH study (16 controls and 15 ME patients), some dramatically so, nevertheless noted their small sample sizes as a limitation and cautioned about low statistical power and the attendant risk of false positives.
The more far-reaching the implications of the interpretation of the EEfRT data are, the more problematic an exceedingly small cohort size is. One can hardly imagine a smaller cohort than the one in the NIH study short of a single case study. To extrapolate from the choice of hard tasks made by 15 ME patients on one occasion to tens of millions of ME patients worldwide by declaring that the ME phenotype is defined by an altered Effort Preference would be embarrassing to any serious scientist.
It is as though one used the result of 15 coin tosses to demonstrate the probability of the result of any coin toss being heads. I tried that one time only and got 6 heads and 9 tails. Does that mean that the probability of a random coin toss landing heads is 6 out of 15 or 40%? Let’s be sure to tell the NFL captain calling the coin toss at the Super Bowl to call tails.
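To see just how little 15 observations pin down a proportion, here is a minimal sketch (my own illustration, not part of the paper) that computes a 95% Wilson score confidence interval for 6 heads out of 15 tosses:

import math

def wilson_ci(successes, n, z=1.96):
    # 95% Wilson score confidence interval for a binomial proportion
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

low, high = wilson_ci(6, 15)
print(f"point estimate 40%, 95% CI {low:.0%} to {high:.0%}")  # roughly 20% to 64%

The interval comfortably contains 50%: 15 tosses cannot distinguish a fair coin from a heavily biased one, and cohorts of 15 and 16 leave comparably wide uncertainty around any group proportion.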
With respect to the NIH study, there has been a clear pattern of the investigators trying to defend their inability to replicate well-established abnormalities in ME, raising important questions of cohort selection (for both the patient and the control cohorts). Whenever the authors are confronted with uncomfortable questions—for example, their failure to find POTS in ME patients—they are quick to point out, as an excuse, that this was an exploratory, hypothesis-generating study with a small cohort. And yet, the small cohort size was no deterrent for the authors to claim that ME is defined by patients underestimating their capacity. Heads they win; tails we lose.
Physical Inability of ME Patients to Complete Hard Tasks
Based on the actual number of hard tasks completed by the two groups, controls were more likely to complete hard tasks than ME patients “by an immense magnitude.” This translated into a respectable p-value of 0.0001.
I determined, based on the Figure 3a raw data, the percentage of completion of hard-task trials by group. Controls completed hard tasks at a rate of 96.43% (in line with the original EEfRT study) while ME patients completed hard tasks at a rate of 67.07%. That is a dramatic difference. This inability of ME patients to complete hard trials at anywhere near the same rate as controls is a strong indicator that the patient group struggled physically to complete hard tasks.
The authors carefully avoided including an illustration of this highly relevant group difference, so I created the following graph:
(I excluded the data from control F, which was deemed to be invalid by the investigators.)
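The computation behind these completion percentages is straightforward. Here is a minimal sketch of it (the file and column names are hypothetical, since the published trial-level data may be organized differently):

import pandas as pd

# Hypothetical file and column names for the Figure 3a trial-level data
trials = pd.read_csv("figure3a_raw_data.csv")           # one row per trial
trials = trials[trials["participant"] != "Control F"]   # excluded as invalid
hard = trials[trials["task"] == "hard"]
completion_pct = hard.groupby("group")["completed"].mean() * 100
print(completion_pct.round(2))   # per my re-analysis: controls ~96.43, patients ~67.07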
For starters, the authors misstate how one determines if trial completion was an issue:
“The three-way interaction of participant diagnosis, trial number, and task difficulty was evaluated in order to determine whether participants’ abilities to complete the easy and hard tasks differed between diagnostic group.” [emphasis added]
That is false. The number of trials comes into play for this analysis only if all one is looking at is fatigue (NIH did not find a group difference with respect to fatigue sensitivity; see above under “Hard-Task Choices”). However, the physical ability to complete the trials can be impacted by issues other than fatigue, and the trial number is entirely irrelevant for assessing whether participants were unable to complete trials at a reasonable rate, and at a rate reasonably equal to that of the control group, due to health issues other than fatigue, for example, due to individual motoric ability. All that is required is comparing the percentage of trials that were completed by each group. NIH’s pathological insistence that ME is basically nothing more than fatigue quite literally allowed the investigators to misreport the EEfRT results.
Prior EEfRT studies. The creators of the EEfRT cautioned other researchers with respect to the importance, for the validity of the outcomes, of the ability to complete the EEfRT tasks:
“An important requirement for the EEfRT is that it measure individual differences in motivation for rewards, rather than individual differences in ability or fatigue. The task was specifically designed to require a meaningful difference in effort between hard and easy-task choices while still being simple enough to ensure that all subjects were capable of completing either task, and that subjects would not reach a point of exhaustion. Two manipulation checks were used to ensure that neither ability nor fatigue shaped our results. First, we examined the completion rate across all trials for each subject, and found that all subjects completed between 96%-100% of trials. This suggests that all subjects were readily able to complete both the hard and easy tasks throughout the experiment. …”
That first EEfRT study’s participants’ completion rates of between 96% and 100% have been typical in subsequent EEfRT trials. One study resulted in a completion rate of only 88% because it included a large number of older adults, with a mean age of 73 years, in order to study age-related differences; the mean age of ME patients in the NIH study, however, was less than 38 years, and in any event, the two groups were matched for age.
Various subsequent EEfRT studies agreed with the need to control for motoric ability as it has been “shown to strongly impact performance” on the EEfRT. For example, an EEfRT paper on depression states that unequal completion rates between groups indicate issues of “psychomotor retardation.”
The authors of an EEfRT paper on schizophrenia determined the maximum number of button presses for both easy and hard tasks for each individual by instructing participants to press the button as many times as possible and setting the button-pressing requirement for hard trials at 90% of the maximum rate in order “to control for nonspecific differences in motoric ability between groups, [sic] and to assure that each individual had the capacity to complete the trials.” Those authors stressed the importance of controlling for motor function:
“This control is of critical importance as most of our inferences on incentive motivational systems depend on instrumental responses (Salamone et al., 2007). Hence any individual differences in motor ability will bias the caliber to which the instrumental response is executed, which may in turn be incorrectly interpreted as decreases in motivated behaviour.”
Another study also calibrated the button-pressing requirement for hard-task trials to 90% of the participants’ maximum keypress speed and excluded participants who completed less than 50% of their trials, which resulted in a completion rate of 97.5% for the hard trials and 98.6% for the easy trials.
In another EEfRT study on schizophrenia, the investigators adjusted the required button-pressing number for hard trials to 85% of the individually calibrated number of button presses in order to control for motor-speed and dexterity differences.
An EEfRT paper on binge eating excluded those participants who did not at least complete 50% of their trials.
Yet another study recognized the strong impact of motoric ability on EEfRT outcomes and performed “a pre-analysis” to probe “whether higher motoric ability is associated with greater average number of clicks throughout the actual task and should therefore be statistically controlled for in the analysis.” Their preliminary analysis “revealed a large impact of participants’ individual motoric abilities on the number of clicks they exerted” and that the groups differed in their motoric abilities as measured via the motoric trials. Since those differences do not reflect actual motivation, the authors concluded that “not including this factor could have distorted the results.” Even after considering the issue of differences in motoric ability and attempting to control for it by including motoric trials, the authors were concerned that “the large impact of motoric abilities may still be considered a possible downside of this task. Future studies should address this limitation.”
One study “individually calibrated” the number of presses required for the hard and easy tasks for each participant and set the number of button presses for the hard task at 85% of the participants’ maximum and the number for the easy task at one third of that.
Another study also determined the maximum number of button presses and required participants to execute 70% of their maximum button-press number determined during calibration for the easy task and 90% for the hard task.
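To make the calibration logic used by these studies concrete, here is a minimal sketch (the exact fractions varied from study to study, as described above):

def calibrated_requirements(max_presses, hard_fraction=0.90, easy_fraction=1/3):
    # Per-participant button-press requirements derived from an individual
    # calibration phase; the fractions differ across the studies cited above
    hard_required = round(max_presses * hard_fraction)
    easy_required = round(hard_required * easy_fraction)
    return easy_required, hard_required

# Example: a participant whose calibrated maximum is 100 presses
print(calibrated_requirements(100))   # (30, 90) under the 90%/one-third scheme

The point of this step is simply that the difficulty of the hard task is anchored to what each participant can physically do, so that group differences in choices cannot be driven by group differences in motoric ability.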
These are merely examples and not meant to be a complete list of how the prior EEfRT studies addressed the issue of difficulties with trial completion. It is obvious that other EEfRT researchers were acutely aware of the attendant validity issue and were trying to control for it.
It is important to remember that none of the prior EEfRT studies were done on individuals with organic diseases. They were either done on healthy participants or participants with primary mental-health issues. Those individuals were much less likely to be as physically limited as ME patients; nevertheless, the authors of those studies recognized the need to protect the validity of their data.
NIH investigators. Despite having the benefit of prior EEfRT studies that educated them on the importance of this issue and on how to design a modified version of the EEfRT for the special needs of ME patients, the NIH investigators did not calibrate the participants’ maximum button-press rate, even though they could easily have done so. Nor did they exclude the data of the patients who were unable to complete hard tasks at a reasonable rate, even after finding an “immense” (their own word) group difference with respect to the ability to complete hard tasks. In other words, the NIH investigators knew that they had to control for motoric ability but did not do so. They also knew that, in light of the dramatic group difference in the ability to complete hard tasks, their results were not a valid measure of decision-making or effort discounting, yet they chose to publish the EEfRT outcomes regardless and to claim that there is something wrong with ME patients’ perception of their physical abilities. There is no discussion of this validity issue in the paper.
All NIH did was purport to control for fatigue and call it a day. ME, of course, frequently manifests with altered motor coordination and speed as well as with impaired dexterity and reaction time. The fact that patients performed noticeably worse than controls on pegboard testing, particularly with the dominant hand (Supplementary Data 13), which was part of the cognitive testing, corroborates that. That test requires motor function and physical exertion and is, therefore, another indication that patients’ physical ability to complete tasks involving motoric ability was impaired. Of course, the ability to participate in the EEfRT can be impaired for reasons other than impaired motor function, and the investigators completely ignored other ME symptoms that likely impacted the button pressing of patients, for example: non-restorative sleep (as mentioned above), benzodiazepine use (as mentioned above), various types of pain, POTS, Neurally Mediated Hypotension, nausea, dizziness, sensitivity to light and noise, etc.
The completion data for the hard trials leaves no doubt that the EEfRT data of some patients were invalid. This is a textbook case of res ipsa loquitur: the thing speaks for itself. The EEfRT is designed to assess the investment of physical effort for monetary rewards. It is beyond the pale for researchers to administer effort testing that requires physical exertion, such as the EEfRT, in a patient population that is known to be physically unable to function at the level of healthy individuals and then, when patients indeed do not perform at the level of controls, to conclude that there is something wrong with how patients perceive effort and their own capacity to exert it.
Based on prior EEfRT studies, patients who completed less than 50% of their hard trials should have been excluded by NIH. That applies to five patients, i.e., a third of the ME group. When one excludes those five patients, the hard-task completion rate by patients jumps from 67.07% to 89.60%, which is much closer to the hard-trial completion rate of controls (96.43%), although possibly still lower than what one would expect from participants who are able to complete EEfRT tasks reliably and validly. However, since NIH did not control for patients’ inability to complete tasks, the entire EEfRT findings are invalid.
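For illustration, here is a minimal sketch of that exclusion step, assuming per-patient hard-trial counts are available (the key names are mine, not from the paper):

def hard_task_completion_rate(patients):
    # Overall hard-task completion rate across a group of participants
    completed = sum(p["hard_completed"] for p in patients)
    attempted = sum(p["hard_attempted"] for p in patients)
    return completed / attempted

def exclude_low_completers(patients, threshold=0.50):
    # Drop participants who completed fewer than `threshold` of their hard
    # trials, as at least two prior EEfRT studies did
    return [p for p in patients
            if p["hard_completed"] / p["hard_attempted"] >= threshold]

Applied to the NIH patient data, this exclusion removes five patients and raises the group’s hard-task completion rate from roughly 67% to roughly 90%, as noted above.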
NIH Symposium—Madian. A virtual-audience member asked about this issue during the recent NIH Symposium. Madian had been alerted to this question in advance and responded (at 2:50:16) as follows:
“What the [original EEfRT] paper describes is that the EEfRT was designed so that the sample of patients used within that original study could consistently complete the task. This does not mean that everyone who takes the task must be able to complete the task without issue for the administration or data to be valid or interpretable. It seems that the creators wanted to ensure that in general as many people as possible would be able to complete the task but without compromising the task’s ability to challenge participants. Furthermore, I think, it bears mentioning that although our ME participants did not complete the task at the same 96-100% rate as the participants in the original study or at the same rate as our healthy controls, they still completed the task a large majority of the time. To wrap things up, to answer the question, consistently completing the task is not a requirement for a valid EEfRT test administration, and by all accounts we believe our data is valid and is, thus, interpretable as a measure of impaired effort discounting.”
The part of the original EEfRT paper that Madian referred to is reproduced via a screenshot below:
Madian’s reply is severely misleading. First of all, referring to the initial EEfRT study fails since that study did not use the EEfRT data to completely re-define an organic disease. Instead, those authors were looking for a correlation between the level of anhedonia and decreased motivation for rewards in patients with Major Depressive Disorder. Surely, the threshold requirement for the completion rate must be stricter when the results of the EEfRT are used in as consequential a way as they were by NIH: to draw sweeping, definitive conclusions about ME.
Furthermore, the original EEfRT authors conclude “that all subjects were readily able to complete both the hard and easy tasks throughout the experiment” based on a consistently high completion rate. [emphasis added] There is no language in the original EEfRT paper to the effect that it is sufficient that “as many people as possible would be able to complete the task” contrary to what Madian claimed. In fact, it states the opposite:
“The task was specifically designed to require a meaningful difference in effort between hard and easy-task choices while still being simple enough to ensure that all subjects were capable of completing either task.” [emphasis added]
Madian himself initially conceded that consistent task completion is required but later inexplicably contradicted himself, the EEfRT creators, and many other EEfRT studies when he claimed the opposite. All EEfRT studies that have since addressed the issue have been unequivocal that a physical inability to complete the tasks renders the EEfRT results invalid. At least two studies set that threshold at a 50% completion rate.
In the NIH study, a third of the patients were unable to complete hard trials consistently or at all. One patient was unable to complete a single hard trial out of 18 attempts, and another was able to complete only two out of 21 attempted hard trials. The following table illustrates the extremely low completion rate of the five patients who were not able to complete at least 50% of hard trials.
An additional patient barely got over the 50% threshold with only seven out of 13 hard tasks completed.
Madian admitted that patients “did not complete tasks at the 96-100% rate as the participants in the original [EEfRT] study” did but said that that does not impact the validity of the NIH data because patients completed tasks “a large majority of the time.” At a combined completion rate of 67.1% for hard tasks by patients, that is demonstrably false. To illustrate how dramatic the issue is: the combined hard-task completion rate of the five patients who struggled completing hard tasks was less than 16%. That is not even close to a simple majority and a far cry from “a large majority of the time,” which Madian falsely claimed was achieved by patients.
It is difficult to believe that Madian could have gotten it so wrong by accident given that (by his own admission) he agonized over his answer in preparation for the Symposium and spent a significant amount of time on it in an alleged attempt to ensure accuracy of his response, wrote out his answer in advance, and apparently read it word for word to avoid making a mistake. Again, according to the NIH authors themselves, the groups differed with respect to the hard-task completion “by an immense magnitude” and with a stunning resulting p-value of 0.0001.
It is no coincidence that four out of the six patients who completed hard tasks at an abysmal rate chose the fewest hard tasks out of all patients. Their difficulty in completing hard tasks obviously had an impact on their number of hard-task choices.
Moreover, Madian acknowledged only the very first EEfRT study, although there are dozens of EEfRT studies by now, many addressing this very issue, and none of them agree with Madian. He also did not at all address the issues of not having calibrated the button-press requirement to patients’ individual ability or the requirement to exclude the data from those patients who clearly struggled with completing the hard trials.
The data of those patients should have been excluded: it is highly likely that their inability to complete hard tasks impacted the number of hard tasks they chose, which means that the proportion of hard tasks they chose reflects their physical inability to complete hard tasks, not their motivation or an alleged false perception of exertion capacity. As a result, the EEfRT testing did not actually measure patients’ motivation or effort discounting (i.e., Effort Preference), and it is unacceptable to re-define an entire disease based on this invalid data. NIH’s failure to exclude the data from the patients who were too impaired to complete hard tasks consistently or at all means that the entire EEfRT findings are invalid.
Sloppy Paper
The intramural paper, with its apparent absence of proof-reading, is a mess. There is so much wrong with it that one has to wonder if the authors accidentally published a draft instead of a final, carefully proofed version. Below are examples that illustrate the point.
Supplementary Figure S5
The authors generated graphical examples (Supplementary Figures S5b-d) for how the selection ratio of easy versus hard tasks by the two groups would allegedly look in three different scenarios and explained their graphs in the corresponding analysis below Supplementary Figure S5. (I discussed how to read those graphs under “Hard-Task Choices” above.)
Supplementary Figures S5b-d:
Figure S5 analysis:

I am listing issues in the Supplementary Figure S5 analysis in the order in which they appear:
There is a duplicative sentence under B.
The authors claim that the reward value is charted on the y-axis of graph d; however, the reward value is obviously represented by the x-axis.
The analysis for graph b (under (B)) is wrong. The authors claim that graph b illustrates the following scenario: “[a] difference in effort sensitivity is represented by a constant reduction in hard task choices through the entire task, with the blue group having lower effort sensitivity than the gray.” Obviously, there is no reduction at all in graph b; the lines are parallel to the x-axis throughout.
The interpretation of the graphs refers to blue and gray groups, but the arrows in the graphs are actually purple and teal.
The analysis under Supplementary Figures S5e-f falsely specifies the sample size of the control group as 17. This is likely because the data for control F were excluded from the EEfRT analysis after the fact without the sample-size number under Supplementary Figures S5e-f being adjusted accordingly.
Supplementary Figures S5e-f:

There is no legend for Supplementary Figures S5e-f indicating that the control group is graphed in blue and the patient group in red.
A careful scientist would have included a unit for the reward value for Supplementary Figure S5f.
Supplementary Figure S5e shows data for a trial 0. There can be, and was, no such thing as a trial 0. The same issue exists for the nearly identical Figure 3a.
Figure 3a:
When you take a look at the graphical examples in Supplementary Figures S5b-d above, you will notice that the y-axes are labeled “Hard/Easy Task Choice Ratio.” Those graphical examples are meant to relate to the actual EEfRT data allegedly depicted in Supplementary Figures S5e-f; however, the y-axes of Supplementary Figures S5e-f are labeled “Probability of Choosing Hard Task,” even though the y-axes of all five figures should be identical. Madian corrected this inconsistency in his Symposium presentation slide by changing the designation of the y-axes for Supplementary Figures S5b-d (see below), but the inconsistently designated y-axes of Supplementary Figures S5b-d remain in the actual paper.
Throughout the paper, the authors play fast and loose with the concepts of number of hard-task choices, proportion of hard-task choices, and the probability of choosing the hard tasks, which they seem to use interchangeably as though they are the same. That obfuscation goes a long way in the authors’ attempt to make the EEfRT findings close to impenetrable.

Supplementary Figure S5a:
Supplementary Figure S5a shows the sequence of steps for the EEfRT. The steps are represented by computer screens. The blank screens before and after the choice screen are not depicted. Moreover, the first screen shows that if the participant were to choose the hard task, he or she might win $2; that was not one of the options in this study. In addition, the last screen, which shows the participant’s actual reward, indicates that he or she won $2.37, which was not the potential reward for that particular trial according to the first screen and was not an option for any of the trials in the study. It is not possible to win more than the potential reward value indicated at the time a participant makes his or her choice for a trial.

That’s an astonishing number of mistakes to make in a single paragraph and the corresponding graphs/illustrations.
Figure 3b
For another example of the sloppy drafting, take a look at the two graphs below (Figure 3b). The button-press rate for the easy tasks is illustrated in the graph on the left, and the button-press rate for the hard tasks is captured in the graph on the right. Nevertheless, the authors claim the reverse. This error is a powerful demonstration of NIH’s up-is-downism or left-is-rightism in this case.

Range of total winnings
The paper incorrectly states that the maximum total amount a participant could potentially take home is $8.42. That is false. The correct amount is $8.24: the highest reward level is $4.12, and participants take home the winnings from two randomly selected trials (2 × $4.12 = $8.24).
It is highly likely that there are many more mistakes in this paper given the large number I found by focusing merely on the EEfRT analysis.
In Part 3 of this 4-part series, I will address the EEfRT as a psychological measure, NIH’s desperate attempts to justify their EEfRT conclusions, the agency’s history of falsely reducing ME to fatigue and its pivot to Effort Preference, the investigators’ denial of established CPET science, and the ongoing inquiry by NIH into Effort Preference.
***
Open Access: I shared quotes, data, images from the paper “Deep phenotyping of post-infectious myalgic encephalomyelitis/chronic fatigue syndrome” under the Creative Commons license, a copy of which can be found here. I indicated how I re-analyzed the data.