Indirect standardization

Indirect standardization is used to compare the mortality of a sub-population with that of a standard population when the age-specific mortality rates of the sub-population are not available. It applies the age-specific rates of the standard population to the age structure of the sub-population. The result is the number of deaths we would expect if the sub-population had the same age-specific mortality as the standard population.

As an example, consider the claim that golf is the most dangerous of all sports. There is some anecdotal evidence: golfers may be struck by lightning, die of stroke following prolonged exposure to the sun, be attacked by wild animals (alligators, rhinos) while searching for lost golf balls, and so on. We want to evaluate the claim that the mortality rate for golfers (the sub-population) is higher than that of the general sporting population (the standard population).

We are given the number of golfing deaths for each age group, but we do not know the age-specific mortality rates for the sub-population of golfers. This means we cannot directly compare golf-specific mortality rates with those of the standard population. We do know the age-specific mortality rates of the general sporting population, and we can apply them to the number of golfers in each age group to calculate the expected number of golfing deaths. So we are really asking: how many golfers would we expect to die if each age group of golfers had the same mortality rate as the sporting population? If the observed deaths are higher than the expected deaths, then golfers are indeed at a higher risk of death.
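In symbols, the expected count can be sketched as follows. The notation is introduced here for illustration only: $n_i$ is the number of golfers in age group $i$, $R_i$ is the mortality rate of the general sporting population in that age group, and $O_i$ is the observed number of golfing deaths.

$$ \small \text{Expected deaths} = \sum_i n_i R_i, \qquad \text{Observed deaths} = \sum_i O_i. $$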

The ratio of the observed and expected deaths is called the standardized mortality ratio (SMR).

$$ \small \text{SMR} = \frac{\text{Observed deaths}}{\text{Expected deaths}}, $$

where

$$ \small \begin{align} SMR = 1 & \quad \text{the observed mortality is the same as expected}, \\\ SMR < 1 & \quad \text{the observed mortality is lower than expected}, \\\ SMR > 1 & \quad \text{the observed mortality is higher than expected.}
\end{align} $$

Consider the following fictitious data loaded into Stata, which give the population size and number of deaths of golfers by age group for the year 2016. The last column gives the age-specific mortality rate in the general sporting population.

. list , abbrev(14) sep(6)

     +-----------------------------------------+
     |   age   pop_golf   obs_death   pop_rate |
     |-----------------------------------------|
  1. | 20-24      10000           6      .0004 |
  2. | 25-29      15000          14      .0005 |
  3. | 30-34      20000          32      .0007 |
  4. | 35-44      40000          54      .0011 |
  5. | 45-54      45000         120      .0019 |
  6. |   55+      50000         235      .0025 |
     +-----------------------------------------+
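To reproduce the example, the table above can be typed in directly. This is a minimal sketch that assumes an empty Stata session. The variable names match the listing, and the `str5` storage type for `age` is an assumption, since the original dataset's types are not shown.

```stata
* Start from an empty dataset
clear

* Enter the fictitious golfer data by hand:
* age group, golfer population, observed deaths, standard-population rate
input str5 age pop_golf obs_death pop_rate
"20-24" 10000   6 .0004
"25-29" 15000  14 .0005
"30-34" 20000  32 .0007
"35-44" 40000  54 .0011
"45-54" 45000 120 .0019
"55+"   50000 235 .0025
end

* Display the data in the same layout as above
list, abbrev(14) sep(6)
```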

The next step is to calculate the expected deaths among golfers using the mortality rates for the general sporting population.

. gen exp_death = pop_golf * pop_rate

. list , abbrev(12) sep(10)

     +-----------------------------------------------------+
     |   age   pop_golf   obs_death   pop_rate   exp_death |
     |-----------------------------------------------------|
  1. | 20-24      10000           6      .0004           4 |
  2. | 25-29      15000          14      .0005         7.5 |
  3. | 30-34      20000          32      .0007          14 |
  4. | 35-44      40000          54      .0011          44 |
  5. | 45-54      45000         120      .0019        85.5 |
  6. |   55+      50000         235      .0025         125 |
     +-----------------------------------------------------+

To calculate the SMR, we sum the observed and the expected deaths and take the ratio.

. qui sum exp_death

. scalar exp_tot = r(sum)

. dis exp_tot
280

. qui sum obs_death

. scalar obs_tot = r(sum)

. dis obs_tot
461

. scalar SMR = obs_tot/exp_tot

. dis SMR
1.6464286
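As a check on the Stata output, the same arithmetic can be written out from the table:

$$ \small \text{SMR} = \frac{6 + 14 + 32 + 54 + 120 + 235}{4 + 7.5 + 14 + 44 + 85.5 + 125} = \frac{461}{280} \approx 1.646 $$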

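The SMR of 1.646 means that about 65% more golfers died than we would expect if they had the same age-specific mortality as the general sporting population. In these fictitious data, golfers are indeed at a higher risk of death, perhaps because they are exposed to the natural elements of the golf course.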

Alain Vandormael
Senior Data Scientist, PhD, MSc