Abstract
Many health systems now use cost-effectiveness analysis to decide which interventions and programmes to fund. A key issue for such decision making is how to measure health outcomes from interventions to reflect changes in both health-related quality of life and life expectancy. For some decision makers, including the National Institute for Health and Clinical Excellence in the UK, the quality-adjusted life-year (QALY) is central to health measurement. This article describes the concept of the QALY, its derivation, and its strengths and weaknesses.
Given the inevitability of limited resources in health care, it is increasingly recognised that decisions have to be taken about which programmes and interventions are the best value for money in terms of the relation between their cost and the health outcomes that flow from them. Indeed, a number of healthcare systems now require the manufacturers of new technologies (mainly pharmaceuticals) to demonstrate the cost-effectiveness of their products before they can be funded. This is particularly the case where there is a long tradition of health technology assessment, as in the UK, Australia, Canada, The Netherlands and the Scandinavian countries.1 For example, in the UK an independent agency, the National Institute for Health and Clinical Excellence (NICE), undertakes technology appraisals at the request of the Department of Health. Because the resulting guidance is mandatory, these appraisals effectively represent national policy decisions on whether the appraised technologies should be made available in the National Health Service (NHS).2
To do their job, NICE and other international decision makers require information about the cost-effectiveness of interventions, that is, the extent to which they result in a net increase in health outcomes. Cost-effectiveness analysis involves estimating the change in cost from using a new technology instead of one or more comparators, and then comparing this change in cost to the change in health outcomes from switching from the old to the new technology. Cost, in this context, refers to the full cost implication of a technology, not just its price (it includes, for example, the cost of any adverse events resulting from its use and the cost of monitoring, such as measuring the INR in patients on anticoagulants). It may also include the cost of any change in productivity as a result of a patient being away from paid employment, but this would depend on whether a healthcare system or societal perspective is adopted for the analysis.
If the new technology is less costly and more effective than its comparators, then it is said to be “dominant” and unequivocally more cost-effective than the comparators. This is infrequent but entirely feasible; for example, the use of trans-vaginal tape rather than open surgery for women with stress urinary incontinence.3 However, most new technologies impose an additional cost on the system at the same time as improving health outcomes relative to their comparators. In this situation, the new technology could be considered cost-effective if the health gain it generates is greater than the health decrement that will be experienced as a result of fewer interventions being undertaken in other clinical areas to “free up” resources to fund the new technology (this is the economists’ concept of opportunity cost).
If this sort of cost-effectiveness analysis is to be used to support decision making, then exactly how we measure health outcomes needs to be addressed and understood by all concerned, including neurologists. This is why the quality-adjusted life-year (QALY) has become widely used—indeed it is now formally required as part of NICE’s methodological guidelines for health technology assessment.2
Here we will describe the concept of the QALY, the usual methods for obtaining quality weights, how to use the EuroQol (EQ)-5D instrument to calculate QALYs, and the main advantages and limitations of the QALY approach. Additional details may be found in Drummond et al (2005)3 and the other references listed below.5–24
WHY DO WE NEED QALYS?
Cost-effectiveness analysis has been used for some years in health care. Early examples expressed effects in terms of disease-specific outcomes such as blood pressure reduction (mm Hg),5 number of cases of deep-venous thrombosis detected,6 asthma episode-free days,7 and years of life gained.8 Nowadays, there are a number of clinical scenarios where cost-effectiveness analysis using QALYs is the preferred analytic option:
When health-related quality of life (HRQoL) is the important outcome (for example, in comparing alternative interventions for the treatment of multiple sclerosis, the primary interest is focused on how well the different treatments improve the patient’s physical function, social function and psychological well-being rather than just relapse rate).
When the intervention affects both morbidity and mortality and a common unit of outcome is needed that combines both (for example, to capture the excess mortality and morbidity associated with different treatments for Parkinson’s disease). This is particularly important when morbidity and mortality outcomes move in opposite directions—for example, a new cancer therapy which increases survival but with toxicities which affect HRQoL. In this context an overall measure of benefit on a single scale is required to assess cost-effectiveness.
When the new technology is more costly and more effective than standard care, because one can compare the added QALYs with the opportunity costs—that is, the QALYs lost from programmes which have to be reduced or eliminated to fund the new technology. Assessing this opportunity cost of finding the additional resources to provide the new technology is a complicated empirical task that is commonly resolved using an external reference. For example, the UK decision makers’ “cost-effectiveness threshold” is based on an administrative “rule of thumb” largely reflecting earlier decisions about the funding of new technologies. Interventions with incremental cost-effectiveness ratios (that is, their additional cost compared to their additional QALYs) in the region of £20,000 to £30,000 per additional QALY have been considered to provide value for money in the NHS.9
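The decision rule described above (dominance, and otherwise a comparison of the incremental cost-effectiveness ratio with a threshold) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assess`, the example figures, and the use of the lower bound of £20,000 per QALY as the default threshold are choices made here, not part of any NICE procedure.

```python
def assess(delta_cost, delta_qaly, threshold=20_000.0):
    """Classify a new technology against its comparator.

    delta_cost: additional cost of the new technology (GBP)
    delta_qaly: additional QALYs it generates
    threshold:  cost per QALY regarded as value for money (assumed default)
    Returns a (verdict, icer) pair; icer is None when no ratio is needed.
    """
    if delta_cost < 0 and delta_qaly > 0:
        return "dominant", None        # cheaper and more effective
    if delta_cost > 0 and delta_qaly < 0:
        return "dominated", None       # dearer and less effective
    if delta_cost > 0 and delta_qaly > 0:
        icer = delta_cost / delta_qaly  # additional cost per additional QALY
        verdict = "within threshold" if icer <= threshold else "above threshold"
        return verdict, icer
    return "not covered by this sketch", None


# Hypothetical example: 0.3 extra QALYs at an extra cost of 4,500 GBP.
verdict, icer = assess(4_500.0, 0.3)
print(verdict, round(icer))
```

Cases where the new technology is both cheaper and less effective, or where either difference is zero, need separate judgment and are deliberately left out of the sketch.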
In summary, the advantage of the QALY, the most well known health outcome measure used in cost-effectiveness analysis, is that it can simultaneously capture gains in terms of morbidity (HRQoL gains) and mortality (quantity gains) and combine these into a single generic measure. Moreover, this combination is based on the relative desirability of the different outcomes (for example, it can incorporate individuals’ preferences when confronted with trade-offs such as a modest increase in survival but a reduction of HRQoL during treatment, or no increase in survival but maintenance of the same HRQoL during the remaining life of the patient, a typical situation for many cancer treatments).
WHAT EXACTLY IS A QALY?
The QALY was designed with the purpose of economic evaluation in mind and first popularised by a landmark paper from Harvard University, published in the New England Journal of Medicine in 1977.10 Weinstein and Stason described the QALY as a composite measure of outcome for use in healthcare economic evaluation studies, with measured or judged HRQoL weights for health states (on a 0–1 scale) with which to adjust survival times.
By plotting the HRQoL weight (also called utility score) for each healthcare intervention against the year in which the health outcome is obtained, a profile can be constructed comparing the consequences of one intervention versus another. If the two groups start with exactly the same baseline utility, the area between the two curves represents the QALYs gained (otherwise, the area between the two curves must be adjusted to account for any baseline difference using multiple regression methods11). A simple example is displayed in figure 1.
From figure 1, consider an individual suffering from a particular form of cancer. Without treatment the patient would expect to live for six months without symptoms, then experience 12 months with progressive disease before dying. With treatment, the patient would experience three months of toxicity while treatment was being given, then 12 months without symptoms, followed by six months of progressive disease before dying. For the sake of illustration, assume that the HRQoL weights for the various health states, as displayed on the y-axis, are as follows: cancer without symptoms, 0.9; treatment with toxicity, 0.8; progressive disease, 0.7. Without the intervention the individual’s HRQoL would deteriorate according to the lower path and the person would die at 18 months. With the intervention, after an initial period of deterioration associated with treatment (for example, a three-month cycle of chemotherapy), the person would deteriorate more slowly, live longer and die at 21 months. The two paths cross each other to reflect the initial QALY loss in the short term associated with chemotherapy (area A) in order to achieve a QALY gain in the longer term. The QALYs gained from the intervention can be divided into two areas, B and C. B is the amount of QALY gained due to HRQoL improvement (the gain in HRQoL during the time that the patient would have been alive without treatment) and C is the amount of QALY gained due to the quantity improvement (that is, the amount of life extension weighted by HRQoL). The QALY calculation based on this figure is as follows:
Without treatment the estimated QALYs are: 0.5 year at 0.9 + 1 year at 0.7 = 1.15 QALYs. With treatment the estimated QALYs are: 0.25 year at 0.8 + 1 year at 0.9 + 0.5 year at 0.7 = 1.45 QALYs. Therefore, 0.3 QALYs are gained with treatment.
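The same arithmetic can be reproduced as a short Python sketch, treating each path as a list of (duration in years, HRQoL weight) pairs. The function and variable names are illustrative assumptions; the durations and weights are those of the example above.

```python
def qalys(profile):
    """Sum duration (years) x HRQoL weight over consecutive health states."""
    return sum(years * weight for years, weight in profile)


# (years, weight) pairs taken from the worked example in the text.
without_treatment = [(0.5, 0.9), (1.0, 0.7)]
with_treatment = [(0.25, 0.8), (1.0, 0.9), (0.5, 0.7)]

gain = qalys(with_treatment) - qalys(without_treatment)
print(round(qalys(without_treatment), 2),
      round(qalys(with_treatment), 2),
      round(gain, 2))
# prints: 1.15 1.45 0.3
```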
Other situations can be handled; for instance, the paths may be identical for a long time after the intervention and only diverge in the distant future (that is, the treatment has a prophylactic or preventive effect experienced only in the long term). With each path having a QALY value associated with it, in decision modelling the expected QALY is calculated as the sum of the QALYs for each pathway weighted by their respective probabilities.
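The expected QALY calculation used in decision modelling can be sketched as a probability-weighted sum. In the Python illustration below, the function name and the pathway probabilities (0.6 and 0.4) are hypothetical, chosen only to show the mechanics; the two QALY values are borrowed from the example above.

```python
def expected_qalys(pathways):
    """Probability-weighted sum of QALYs over mutually exclusive pathways.

    pathways: list of (probability, qalys) pairs whose probabilities sum to 1.
    """
    total_probability = sum(p for p, _ in pathways)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("pathway probabilities must sum to 1")
    return sum(p * q for p, q in pathways)


# Hypothetical: 60% of patients follow a 1.45-QALY path, 40% a 1.15-QALY path.
print(round(expected_qalys([(0.6, 1.45), (0.4, 1.15)]), 2))
```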
METHODS FOR ESTIMATING HRQOL WEIGHTS
To use the QALY concept, one needs estimates of the weights that represent the HRQoL of each of the health states under consideration (as in the example above). Two points—full health (1) and death (0)—act as “anchors” for the interval scale of QALY weights (an interval scale is where the same change—for example, an improvement of 0.2—means the same no matter what part of the scale is being considered, like the standard scales for measuring temperature). HRQoL weights can also have negative values to reflect conditions worse than death.
The three most widely used techniques to measure the preferences of individuals for health outcomes, and so derive QALY weights, are the rating or visual analogue scale, the time trade-off, and the standard gamble (table). They correspond to two main approaches:
QALY as a health status index: using subjective weights such as those derived from a rating or a visual analogue scale.
QALY as utility-weighted index: these methods are based on utility theory, and so we talk about the resulting scores as utilities (which are equivalent to weights). A key aspect of the measurement process is whether the outcomes in the question that the subject is asked to choose are certain (for example, time trade-off) or uncertain (for example, standard gamble).
Scores from a rating (visual analogue) scale provide an indication of the ordinal rankings of the health outcomes, and some indication of the intensity of those preferences (fig 2). However, when compared with preferences measured by the time trade-off or the standard gamble methods, a rating scale may not act as an interval scale of preferences.13, 14 Nonetheless, rating scales can still be used as a warm-up exercise before measuring preferences by some other technique, and their resulting scores can be converted to utilities.13
The time trade-off and standard gamble are preference scales for choice-based measurement:
In the time trade-off method the individual is presented with a choice between living the rest of their life (t) in a given health state i (for example, on dialysis) or a shorter period of time (x) living in perfect health. Time x is varied until the respondent is indifferent between the two alternatives, at which point the required preference score for state i is x/t (fig 3). The logic is that the higher the value the individual places on state i, the longer time x would need to be for him or her to be indifferent between the two options.
In the standard gamble method individuals are required to make a choice between the certainty of remaining in a given state (guaranteed state) and an alternative with two possible outcomes: perfect health (with probability p) and death (with probability 1-p). The probability of enjoying perfect health, p, is varied until the respondent is indifferent between the two alternatives, at which point the required preference score for state i for time t is simply p (fig 4). The underlying logic here is that the higher the value the individual places on state i the higher the probability of a successful outcome (perfect health) will be required from the gamble for the individual to be indifferent between the certain outcome and the gamble.
The two different methods produce different results: the standard gamble generally produces higher utility values than the time trade-off method, reflecting many people’s risk aversion.3
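The scoring rules of the two choice-based methods reduce to simple formulas once the indifference point has been found. The Python sketch below shows only that final scoring step, for states valued between death (0) and full health (1); the function names and the example responses are hypothetical, and the iterative questioning needed to reach indifference is not modelled.

```python
def time_trade_off(x, t):
    """Utility of state i when x years in perfect health are judged
    equivalent to t years in state i (0 <= x <= t, t > 0)."""
    if t <= 0 or not 0 <= x <= t:
        raise ValueError("require t > 0 and 0 <= x <= t")
    return x / t


def standard_gamble(p):
    """Utility of state i when the respondent is indifferent between state i
    for certain and a gamble giving perfect health with probability p
    (death with probability 1 - p)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability between 0 and 1")
    return p


# Hypothetical respondent: 6 years in perfect health ~ 10 years on dialysis,
# and indifference in the gamble at p = 0.7.
print(time_trade_off(6, 10), standard_gamble(0.7))
```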
ARE ALL QALYS THE SAME?
There will inevitably be differences between QALYs calculated in different ways.
HRQoL weights may be based on different methods for measuring preferences (for example, visual analogue and time trade-off).
Whose preferences are being evaluated is a key issue. Respondents may be members of the general public, clinical experts, or patients—all of these groups have their own interests and personal knowledge of the health states under consideration, which will have a direct impact on their valuations. A recent comprehensive review of HRQoL data showed considerable variation in the weights assessed by different authors for the same health state,14 with 51% using direct elicitation methods, the rest based on subjective expert judgment. A second study has recently reviewed the methods used to obtain HRQoL weights in the assessments carried out for NICE.15 There was striking variation: data from patients were used in 33% of analyses, from the general public in 22%, and from clinicians in 9%, the rest of respondents being the authors themselves and other proxies.
The size and representativeness of the sample of individuals providing preference data (whether these be patients or the public) may influence the estimated weights.
Many studies describe health states to individuals who will then indicate their preferences for that state using one of the above methods. Different approaches to describing health states can generate different HRQoL weights—for example, the use of disease-specific versus generic descriptions.
There is a case for greater transparency and consistency in reporting the methods used to obtain HRQoL weights to facilitate the comparability of results. In the UK, NICE requires UK-population preference values elicited using choice-based methods (such as time trade-off and standard gamble).2
GENERIC PREFERENCE-BASED SYSTEMS: THE EQ-5D
Measuring preferences for health outcomes, as described above, is a time-consuming and complex task. A common alternative is to use one of the existing pre-scored standardised generic instruments, such as the EQ-5D.
After a description of the different health states under consideration (“profiles”) the different systems present a standardised set of HRQoL weights based on public preferences. The EQ-5D will be described here in some detail.
In 1987 the EuroQol Group, a consortium of researchers in Western Europe, set themselves the task of developing a standardised generic instrument for valuing health-related quality of life, for use in economic evaluation. The EQ-5D was developed as a single index value for health status with five domains: mobility, self-care, usual activity, pain/discomfort and anxiety/depression. Each domain has three levels (no problem, some problems and major problems) which, together with the state of unconsciousness and the state of death, define 245 possible health states (3^5 = 243 profiles, plus dead and unconscious; fig 5). If someone is in good health, they would tick the first box for each domain (health state 11111); if they have moderate pain but no other health problems they would have health state 11121; and if they had severe problems on all dimensions they would be in health state 33333.
The conversion of the 245 health states into a single index value was based on preferences derived from the time trade-off method in a representative sample of over 3000 adults in the UK.20–22 Today, the available population value sets produced for the EQ-5D can be applied in 13 countries (and there is a generic “European value set”), and the forms have been translated into 60 official languages to appropriately reflect cultural differences. Further details on the EQ-5D can be obtained from the website of the EuroQol Group (http://www.euroqol.org).
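The way the five-digit profiles are built, and why there are 245 states in total, can be sketched in Python. The names below are illustrative assumptions, and the sketch deliberately does not assign index values: those come from a published population value set, which is not reproduced here.

```python
from itertools import product

DIMENSIONS = ("mobility", "self-care", "usual activity",
              "pain/discomfort", "anxiety/depression")
LEVELS = ("no problem", "some problems", "major problems")

# Every combination of level 1-3 across the five domains: 3^5 = 243 profiles.
profiles = ["".join(digits) for digits in product("123", repeat=len(DIMENSIONS))]
all_states = profiles + ["unconscious", "dead"]


def describe(profile):
    """Translate a five-digit profile such as '11121' into words."""
    return {dim: LEVELS[int(digit) - 1] for dim, digit in zip(DIMENSIONS, profile)}


print(len(profiles), len(all_states))
print(describe("11121")["pain/discomfort"])
```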
SOME CRITICISMS OF THE QALY
The QALY concept is not without controversy. The main argument against it is to do with equity concerns, derived from the fact that QALYs represent an aggregation of preferences and trade-offs that individuals hold for their health. In aggregating, the usual assumption is that a QALY’s worth of health gain represents the same value whoever receives it, so other patient characteristics such as starting level of health, age or ethnicity are not taken into account directly. Indeed, NICE explicitly states that its equity position is that “an additional QALY has the same weight regardless of the other characteristics of the individuals receiving the health benefit”.2 This equity position may be at odds with those of decision makers and the public. Research is underway to build other equity positions into QALYs but these have yet to be widely used in decision making.23, 24
QALYs involve a number of key assumptions about individual preferences, in particular, the assumption of “constant proportional trade-off” (that is, 10 years in the EQ-5D health state 22222 (valued at 0.5) is equivalent to five years in health state 11111 (valued at 1.0)). Also the assumption of “additive independence” in preferences (that is, five years in health state 12221 followed by eight years in health state 11111 is equivalent to the reverse of eight years in health state 11111 followed by five years in health state 12221, not taking into account any time preference); in other words, that individuals have no preference about the ordering of their health states (good followed by bad is valued the same as bad followed by good).
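Both assumptions follow directly from the way QALYs are summed, which a short Python sketch can make concrete. The weight of 0.8 given to state 12221 is hypothetical (the text does not value it), as are the function and variable names; no time preference (discounting) is applied.

```python
import math


def total_qalys(profile):
    """Sum years x HRQoL weight over a sequence of health states."""
    return sum(years * weight for years, weight in profile)


# Constant proportional trade-off: 10 years at 0.5 equals 5 years at 1.0.
cpt_holds = math.isclose(total_qalys([(10, 0.5)]), total_qalys([(5, 1.0)]))

# Additive independence: the order of the states does not change the total.
W_12221 = 0.8  # hypothetical weight for state 12221
bad_then_good = [(5, W_12221), (8, 1.0)]
good_then_bad = [(8, 1.0), (5, W_12221)]
order_irrelevant = math.isclose(total_qalys(bad_then_good),
                                total_qalys(good_then_bad))

print(cpt_holds, order_irrelevant)
```

A respondent who would in fact prefer the good years first, or who values time in a state non-proportionally, holds preferences that this additive calculation cannot represent.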
Good examples of the use of QALYs in economic evaluation in the area of neurology are available in the literature.25–27
In addition there are some more practical issues with QALYs:
It is difficult to collect HRQoL information from certain types of patients—for example, children, those with mental health problems and the severely ill. However, this is the case for all measures of health which use patient-reported outcomes rather than clinical measures (such as blood pressure or deep venous thrombosis).
It is debated how sensitive the QALY is to changes in a patient’s health, and there are essentially two perspectives on this. The first is that there are clinically important differences which are also important to the patient and that, if the QALY does not reflect these, it is inadequate. The second perspective is that if these differences are not reflected in the QALY, it seems they are not important enough to show up when individuals are asked to make trade-offs between length of life and health-related quality of life. Many economists would take the latter perspective but this would depend on there being an adequate measurement of HRQoL that covered all the key domains. It is a point of contention whether this is true of the domains and levels of the EQ-5D (fig 5).
SOME ALTERNATIVES TO THE QALY
A number of alternatives have been proposed but none is as popular and widely used as the QALY, mainly for practical reasons:
Healthy-years equivalents (HYEs) seek to overcome the assumptions of constant proportional trade-off and additive independence (see section above), although they do not address the equity issue. In this sense, the HYE measurement has been proposed as a theoretically superior alternative to QALYs; however, although measuring preferences over a path of health states is theoretically attractive, it is more difficult to implement in practice.
Disability-adjusted life years (DALYs) were developed by the World Health Organization with the aim of quantifying the burden of disease and injury in human populations. Although conceptually similar to QALYs, they differ in a number of ways, mainly that the disability weights in DALYs are not preferences but person trade-off scores from a panel of professionals, and that they use age weights that give extra weight to the middle-aged population because of their economic and social role in supporting and taking care of the younger and elderly members of society.
In the UK, NICE has issued methods guidance2 that argues for the use of measures like the EQ-5D that have been weighted according to the preferences of the UK population.
Practice points
The measurement of health benefits is essential for evaluating healthcare programmes.
Traditional outcome measures (clinical units such as mortality) are inadequate for the comparison of healthcare interventions producing different types of outcomes, across various disorders.
The advantage of the QALY, the most well known generic outcome measure, is that it can simultaneously capture gains (or losses) from reduced (or increased) morbidity and extended survival, and combine these into a single measure.
However, the QALY concept is not without controversy, especially regarding its equity position, which may be at odds with that of decision makers and the public.
Acknowledgments
Cathie Sudlow, Rustam Al-Shahi Salman and Steff Lewis all in Edinburgh, UK, for reviewing this article.