Life tables: Using survey data

Here a life table exercise using fake retrospective survey data:

You collected retrospective survey data on the age at first birth for twenty-five women who are age 50 at interview in 2009. Construct a life table where the state of interest is childlessness. Start the life table at age 15 and terminate it at exact age 50. Use five year intervals and, for speed, assume that “deaths” occur halfway through the 5-yr age interval (despite the fact that you have actual ages).Their ages (last birthday) at first birth are 29, 22, 43, 31, 26, 20, no birth, 25, 23, 30, no birth, 37, 21, 25, 28, no birth, 23 (twins), 27, 34, 25, 24, 21, 17, no birth, no birth. On the basis of this information, answer the following questions.

What is the probability that a childless woman at exact age 30 would still be childless at age 40?

First, we have to build the corresponding life table where the event of interest is childbearing. For childless women during the period I assign 50 years old. In the case of twins it is enough to specify just one event (first birth).

ages <- c(29, 22, 43, 31, 26, 20, 50, 25, 23, 30, 50, 37, 21,
25, 28, 50, 23, 27, 34, 25, 24, 21, 17, 50, 50)
# age
x <- seq(15, 49, 5)
# width of intervals
n <- rep(5, 7)
# births
b <- rep(NA, 7)

(dat <- data.frame(x, n, b))
##    x n  b
## 1 15 5 NA
## 2 20 5 NA
## 3 25 5 NA
## 4 30 5 NA
## 5 35 5 NA
## 6 40 5 NA
## 7 45 5 NA
# computing births: equivalent to counting logic values
for (i in 1:7) {
dat$b[i] <- sum(ages >= dat$x[i] & ages <= (dat$x[i + 1] -
1), na.rm = TRUE)
}

# computing lx
dat$lx <- NA
# childless women at the beginning of the period
dat$lx[1] <- length(ages)

for (i in 1:6) {
dat$lx[i + 1] <- dat$lx[i] - dat$b[i]
}

dat
##    x n b lx
## 1 15 5 1 25
## 2 20 5 7 24
## 3 25 5 7 17
## 4 30 5 3 10
## 5 35 5 1 7
## 6 40 5 1 6
## 7 45 5 0 5

So the probability a childless woman at exact age 30 would still be childless at age 40 is:

dat$l[dat$x == 40]/dat$l[dat$x == 30]
## [1] 0.6

What was the expected number of years of childlessness (prior to age 50) for a 25 year old childless woman?

We have to calculate the equivalent to life expectancy but for childlessness. For that, we need $L_x$ and $T_x$. Following the assumptions specified in the question: “deaths” occur halfway through the 5-yr age interval (despite the fact that you have actual ages), we can compute readily $L_x$:

# computing Lx assuming nax = interval/2
dat$Lx <- NA
for (i in 1:6) {
dat$Lx[i] <- dat$lx[i + 1] * dat$n[i] + (dat$n[i]/2) * dat$b[i]
}

# Lx for the last interval
dat$Lx[7] <- 5 * 5

# computing Tx
dat$Tx <- rev(cumsum(rev(dat$Lx)))
dat
##    x n b lx    Lx    Tx
## 1 15 5 1 25 122.5 420.0
## 2 20 5 7 24 102.5 297.5
## 3 25 5 7 17 67.5 195.0
## 4 30 5 3 10 42.5 127.5
## 5 35 5 1 7 32.5 85.0
## 6 40 5 1 6 27.5 52.5
## 7 45 5 0 5 25.0 25.0

The expected number of years of childlessness would be $\frac{T_{25}}{l_{25}}$

dat$T[dat$x == 25]/dat$l[dat$x == 25]
## [1] 11.5

What fraction of years between ages 15 and 49.99 were spent childless?

That would be $\frac{e_{15}}{35}$:

(dat$T[dat$x == 15]/dat$l[dat$x == 15])/35
## [1] 0.48

You observe that the parity progression ratios for this cohort take the following form. What is the TFR of this cohort?

A straightforward way to do it:

# given parity progression ratios
ppr <- c(NA, 0.8, 0.75, 0.25, 0)
# estimation PPR1
# the total number of first births was 20
# the total number of women is 25, so...
(ppr[1] <- (20/25))
## [1] 0.8
(TFR <- sum(cumprod(ppr)))
## [1] 2.04

How might your data collection method affect the accuracy of your answer to the former questions? Be specific, referencing the possible direction of bias if applicable.

In the survey we are only taking into account surviving women. We will und erestimate the probability of remaining childless in this cohort because single and childless women have higher mortality.

Assume that births happened exactly half-way through the year of age in which a woman reported a birth (e.g., a birth reported at age 34 happened at exact age 34.5). How inaccurate is the short-cut estimate of $_{5}a_{20}$ you used above?

We have to compute the average number of years lived childless according to the specification for the question.

# ages included in 5a20
(age <- seq(20, 24, 1))
## [1] 20 21 22 23 24
# years lived childless according to assumptions of the
# question
(x <- (age - 19.5))
## [1] 0.5 1.5 2.5 3.5 4.5
# births between 20 and 24
(y <- table(ages[ages >= 20 & ages <= 24]))
##
## 20 21 22 23 24
## 1 2 1 2 1
# average number years lived childless in the interval
mean(x * y)
## [1] 3.5

Thus, the short-cut estimate is inaccurate by 3.5 - (5/2) = 1.




blog comments powered by Disqus