Regression analysis
  Steff Lewis
  Senior Research Fellow/Statistician, University of Edinburgh, Division of Clinical Neurosciences, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK


    Regression analysis describes the relation between an outcome of interest and one or more variables, known as explanatory variables. For example, figure 1 shows how height (the outcome) is related to age (the explanatory variable) in young children. Each cross on the plot represents the value for an individual child, and the dotted line is the regression line, which will be explained later.

    Figure 1

    Scatter plot of height and age in 100 children, with regression line. (Data used with permission from the Office of Population Censuses and Surveys. Social Survey Division, National Diet, Nutrition and Dental Survey of Children Aged 1 1/2 to 4 1/2 Years, 1992–1993. SN: 3481. Colchester, UK: December 1995.)

    How a regression analysis is performed depends on the type of outcome data. Three common methods are described in this article, relating to:

    • continuous outcomes (such as height): linear regression

    • binary outcomes (such as stroke/no stroke): logistic regression

    • time-to-event outcomes (such as time to death): Cox proportional hazards.
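    As a rough orientation, the three methods can be written in their usual single-variable textbook forms. The notation here is an assumption and does not come from the article: y is a continuous outcome, x the explanatory variable, p the probability of the binary outcome, h(t | x) the hazard at time t, h_0(t) an unspecified baseline hazard, and the beta terms the regression coefficients.

```latex
\begin{align*}
\text{Linear regression:}\quad & y = \beta_0 + \beta_1 x + \varepsilon \\
\text{Logistic regression:}\quad & \log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x \\
\text{Cox proportional hazards:}\quad & h(t \mid x) = h_0(t)\,\exp(\beta_1 x)
\end{align*}
```

    In each case the right-hand side is a linear combination of explanatory variables; what differs is the quantity being modelled (the outcome itself, the log odds, or the hazard).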

    Regression analysis is so commonly used that clinicians must be able to at least understand the reporting of multivariable regression in publications, even if not able to do the analysis themselves. It would also be helpful for many to be able to interpret the computer output from a multivariable regression procedure. The methods described are available in standard statistical software packages.


    Simple linear regression is used to describe the relation between one continuous outcome variable—for example, height—and another (explanatory) variable—for example, age (fig 1). The explanatory variable may be binary (for example, male, female), have several categories (for example, nationality), or be continuous (for example, age). Here it seems sensible to choose height as the outcome variable (y, vertical axis), and age the explanatory variable (x, horizontal axis) as a person’s height depends on their age, not the other way …
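    A minimal sketch of fitting a simple linear regression line by ordinary least squares, using only the Python standard library. The function name `fit_simple_linear` and the age and height values are made up for illustration; they are not the survey data shown in figure 1, and a standard statistical package would normally be used in practice.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x about its mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    # Sum of cross-products of deviations of x and y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical illustrative values, NOT the data behind figure 1.
age_years = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
height_cm = [81.0, 86.5, 90.0, 95.5, 98.0, 103.0, 106.5]

intercept, slope = fit_simple_linear(age_years, height_cm)
print(f"height = {intercept:.1f} + {slope:.1f} * age")
```

    The slope estimates the average change in the outcome (height, in cm) for each one-unit increase in the explanatory variable (age, in years), and the intercept is the predicted outcome when the explanatory variable is zero.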
