Indirect standardization
Indirect standardization compares the mortality of a sub-population with that of a standard population by applying the age-specific rates of the standard population to the sub-population. The aim is to produce a summary figure: the mortality that would be expected in the sub-population if its age-specific mortality rates were identical to those of the standard population.
As an example, consider the claim that golf is the most dangerous of all sports. There is some anecdotal evidence: golfers are likely to be struck by lightning, die of stroke following prolonged exposure to the sun, be attacked by wild animals (alligators, rhinos) while searching for lost golf balls, and so on. We want to evaluate the claim that the mortality rate for golfers (the sub-population) is higher than that of the general sporting population (the standard population).
We are given the number of golfing deaths for each age group, but we do not know the age-specific mortality rates for the sub-population of golfers. This means we cannot directly compare golf-specific mortality rates with those of the standard population. We do know the age-specific mortality rates of the general sporting population, and we can apply them to the golfing population to calculate the expected number of golfing deaths. So we are really asking: how many golfers would we expect to die if the age-specific mortality rates among golfers were identical to those of the sporting population? If the observed deaths are higher than the expected deaths, then golfers are indeed at a higher risk of death.
The ratio of the observed and expected deaths is called the standardized mortality ratio (SMR).
$$ \small \text{SMR} = \frac{\text{Observed deaths}}{\text{Expected deaths}}, $$
where
$$
\small
\begin{align}
SMR = 1 & \quad \text{the observed mortality is as expected}, \\\
SMR < 1 & \quad \text{the observed mortality is lower than expected}, \\\
SMR > 1 & \quad \text{the observed mortality is higher than expected.}
\end{align}
$$
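The ratio can also be written out in symbols. The notation here is an assumption introduced for illustration: $d_i$ is the number of observed deaths in age group $i$ of the sub-population, $n_i$ is the size of the sub-population in age group $i$, and $r_i$ is the age-specific mortality rate of the standard population.

$$
\small
\text{SMR} = \frac{\sum_i d_i}{\sum_i n_i \, r_i}
$$

The denominator is the expected number of deaths: each age group of the sub-population is multiplied by the corresponding rate of the standard population, and the products are summed.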
Consider the following fictitious data loaded into Stata, which gives the population size and deaths of golfers by age group for the year 2016. The last column gives the mortality rate in the general sporting population.
    . list , abbrev(14) sep(6)

         +-----------------------------------------+
         |   age   pop_golf   obs_death   pop_rate |
         |-----------------------------------------|
      1. | 20-24      10000           6      .0004 |
      2. | 25-29      15000          14      .0005 |
      3. | 30-34      20000          32      .0007 |
      4. | 35-44      40000          54      .0011 |
      5. | 45-54      45000         120      .0019 |
      6. |   55+      50000         235      .0025 |
         +-----------------------------------------+
The next step is to calculate the expected deaths among golfers using the mortality rates for the general sporting population.
    . gen exp_death = pop_golf * pop_rate

    . list , abbrev(12) sep(10)

         +-----------------------------------------------------+
         |   age   pop_golf   obs_death   pop_rate   exp_death |
         |-----------------------------------------------------|
      1. | 20-24      10000           6      .0004           4 |
      2. | 25-29      15000          14      .0005         7.5 |
      3. | 30-34      20000          32      .0007          14 |
      4. | 35-44      40000          54      .0011          44 |
      5. | 45-54      45000         120      .0019        85.5 |
      6. |   55+      50000         235      .0025         125 |
         +-----------------------------------------------------+
To calculate the SMR, we sum the observed and the expected deaths and take the ratio.
    . qui sum exp_death

    . scalar exp_tot = r(sum)

    . dis exp_tot
    280

    . qui sum obs_death

    . scalar obs_tot = r(sum)

    . dis obs_tot
    461

    . scalar SMR = obs_tot/exp_tot

    . dis SMR
    1.6464286
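The same calculation can be collected into a single do-file. This is a minimal sketch, assuming the fictitious data are typed in with `input` rather than loaded from a file; the variable name `smr` is introduced here for illustration and is not part of the session above.

```stata
* Enter the fictitious golfing data by hand (assumption: no data file is available)
clear
input str5 age pop_golf obs_death pop_rate
"20-24" 10000   6 .0004
"25-29" 15000  14 .0005
"30-34" 20000  32 .0007
"35-44" 40000  54 .0011
"45-54" 45000 120 .0019
"55+"   50000 235 .0025
end

* Expected deaths: golfing population times the standard population's rate
gen exp_death = pop_golf * pop_rate

* Reduce the data to one row holding the observed and expected totals
collapse (sum) obs_death exp_death

* SMR is the ratio of total observed to total expected deaths
gen smr = obs_death / exp_death
list
```

Using `collapse` instead of scalars keeps the totals and the ratio together in one row, at the cost of replacing the data in memory; wrap the block in `preserve` and `restore` if the age-specific rows are still needed afterwards.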
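The SMR of 1.646 means that about 65% more golfers died than would be expected under the age-specific mortality rates of the general sporting population. In these fictitious data, golfers are indeed at a higher risk of death, perhaps because they are exposed to the natural elements of the golf course, although the SMR alone cannot establish the cause.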