Simple R package to define sample sizes and MOEs

I present a simple R package called sampler. The package computes sample sizes and margins of error (MOE) for proportions, as is usually needed when designing public opinion surveys. In a previous post, I showed some functions that do mostly the same thing. This new package, though, adds features that are useful when allocating a sample across strata.

Installation

# sampler is installed from GitHub, so you need devtools first:
# install.packages("devtools")
devtools::install_github("sdaza/sampler")
library(sampler)

Functions

The package contains four functions:

ssize: computes sample size.
serr: computes MOE.
astrata: assigns sample sizes to strata.
serrst: computes MOE for stratified samples.

Define sample size: ssize

ssize(.05)

## [1] 384

# design effect (deff) and response rate (rr)
ssize(.05, deff = 1.2, rr = .90)

## [1] 512

# finite population correction
ssize(.05, deff = 1.2, rr = .90, N = 1000)

## [1] 370

# warning message
ssize(.05, deff = 1.2, rr = .90, N = 100)

## n is bigger than N in some rows: n = N

## [1] 100
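
To make the arithmetic behind these numbers explicit, here is a minimal base R sketch of the textbook formula for a proportion. The function name ssize_sketch and its internals are my own illustration, not the package's code. I assume a 95% confidence level, p = 0.5 by default, and that the finite population correction is applied before inflating by deff / rr.

```r
# Hypothetical helper, NOT part of sampler: textbook sample size for a proportion.
ssize_sketch <- function(e, p = 0.5, deff = 1, rr = 1, N = NULL, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)      # about 1.96 for 95% confidence
  n <- z^2 * p * (1 - p) / e^2        # simple random sample, infinite population
  if (!is.null(N)) {
    n <- n / (1 + (n - 1) / N)        # finite population correction
  }
  n <- round(n * deff / rr)           # inflate for design effect and nonresponse
  if (!is.null(N)) n <- pmin(n, N)    # never ask for more cases than the population has
  n
}

ssize_sketch(.05)                                  # 384
ssize_sketch(.05, deff = 1.2, rr = .90)            # 512
ssize_sketch(.05, deff = 1.2, rr = .90, N = 1000)  # 370
ssize_sketch(.05, deff = 1.2, rr = .90, N = 100)   # 100 (capped at N)
```

Under these assumptions the sketch gives the same four results as ssize above, although the package may reach them by a different route.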

Define sampling error: serr

serr(384)

## [1] 0.05

serr(512, deff = 1.2, rr = .90)

## [1] 0.05

serr(370, deff = 1.2, rr = .90, N = 1000)

## [1] 0.05

# we still get an answer
serr(100, deff = 1.2, rr = .90, N = 100)

## [1] 0.0569
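
The margin of error is the same formula turned around. The sketch below is my own illustration (serr_sketch is a hypothetical name, not the package's implementation). I assume the effective sample size is n * rr / deff and that the finite population correction is (N - n_eff) / (N - 1).

```r
# Hypothetical helper, NOT part of sampler: margin of error for a proportion.
serr_sketch <- function(n, p = 0.5, deff = 1, rr = 1, N = NULL, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)                 # about 1.96 for 95% confidence
  n_eff <- n * rr / deff                         # effective number of completed cases
  fpc <- if (is.null(N)) 1 else (N - n_eff) / (N - 1)
  z * sqrt(p * (1 - p) / n_eff * fpc)
}

round(serr_sketch(384), 4)                                # 0.05
round(serr_sketch(512, deff = 1.2, rr = .90), 4)          # 0.05
round(serr_sketch(100, deff = 1.2, rr = .90, N = 100), 4) # 0.0569
```

The last call shows why serr still returns an answer when n equals N: with a 90% response rate and a design effect of 1.2, the effective sample is only 75 cases, so some sampling error remains.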

Strata allocation: astrata

These examples show how to allocate a sample across strata. See ?astrata in R for the definitions of the available allocation procedures.

# I will use data.table
library(data.table)
chile <- data.table(chile)
chile

##     reg     pob  pr
##  1:   1  328782 0.3
##  2:   2  613328 0.4
##  3:   3  308247 0.5
##  4:   4  759228 0.5
##  5:   5 1808300 0.5
##  6:   6  910577 0.6
##  7:   7 1035593 0.3
##  8:   8 2100494 0.1
##  9:   9  983499 0.2
## 10:  10  834714 0.5
## 11:  11  107334 0.5
## 12:  12  163748 0.4
## 13:  13 7228581 0.6
## 14:  14  401548 0.2
## 15:  15  235081 0.3

# proportional for a sample of 1000
chile[, aprop := astrata(1000, wp = 1, N = pob)]

# fixed (same number by stratum)
chile[, afixed := astrata(1000, wp = 0, N = pob)]

# 40% proportional, 60% fixed
chile[, a40 := astrata(1000, wp =.4, N = pob)]

# 60% proportional, 40% fixed
chile[, a60 := astrata(1000, wp =.6, N = pob)]

# square-root
chile[, aroot := astrata(1000, method = "root", N = pob)]

# neyman
chile[, aneyman := astrata(1000, method = "neyman", N = pob, p = pr)]

# standard deviation
chile[, astdev := astrata(1000, method = "stdev", N = pob, p = pr)]

# error: sample size needed to reach a given margin of error (here 11%) in each stratum
chile[, aerr := astrata(e = .11, method = "error", N = pob, p = pr)]

##     reg     pob  pr aprop afixed a40 a60 aroot aneyman astdev aerr
##  1:   1  328782 0.3    18     67  47  38    41      18     66   67
##  2:   2  613328 0.4    34     67  54  47    56      37     71   76
##  3:   3  308247 0.5    17     67  47  37    40      19     72   79
##  4:   4  759228 0.5    43     67  57  53    62      46     72   79
##  5:   5 1808300 0.5   101     67  81  87    96     110     72   79
##  6:   6  910577 0.6    51     67  61  57    68      54     71   76
##  7:   7 1035593 0.3    58     67  63  62    73      58     66   67
##  8:   8 2100494 0.1   118     67  87  98   104      77     43   29
##  9:   9  983499 0.2    55     67  62  60    71      48     58   51
## 10:  10  834714 0.5    47     67  59  55    65      51     72   79
## 11:  11  107334 0.5     6     67  43  30    23       7     72   79
## 12:  12  163748 0.4     9     67  44  32    29      10     71   76
## 13:  13 7228581 0.6   406     67 203 270   192     432     71   76
## 14:  14  401548 0.2    23     67  49  41    45      20     58   51
## 15:  15  235081 0.3    13     67  45  35    35      13     66   67
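
For readers who want to see what is going on in some of these columns, here is a base R sketch of the proportional, fixed, mixed, and Neyman rules. The helper names alloc_mix and alloc_neyman are hypothetical, and the rounding order (round the proportional and fixed parts first, then mix) is my assumption. It reproduces the aprop, afixed, a40, a60, and aneyman columns of the table, but the package may be written differently.

```r
# Hypothetical helpers, NOT sampler's code.
pob <- c(328782, 613328, 308247, 759228, 1808300, 910577, 1035593, 2100494,
         983499, 834714, 107334, 163748, 7228581, 401548, 235081)
pr  <- c(.3, .4, .5, .5, .5, .6, .3, .1, .2, .5, .5, .4, .6, .2, .3)

# wp = 1 is fully proportional, wp = 0 gives every stratum the same size
alloc_mix <- function(n, N, wp = 1) {
  prop  <- round(n * N / sum(N))        # proportional to stratum population
  fixed <- round(n / length(N))         # equal share per stratum
  round(wp * prop + (1 - wp) * fixed)   # weighted mix of the two
}

# Neyman: proportional to population times the stratum standard deviation
alloc_neyman <- function(n, N, p) {
  w <- N * sqrt(p * (1 - p))
  round(n * w / sum(w))
}

data.frame(reg = 1:15,
           aprop   = alloc_mix(1000, pob, wp = 1),
           a40     = alloc_mix(1000, pob, wp = .4),
           aneyman = alloc_neyman(1000, pob, pr))
```

Because each stratum is rounded separately, the allocated sizes do not have to add up to exactly 1000 (the fixed allocation, for example, sums to 1005).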

Getting sampling error from a stratified sample: serrst

# the second most efficient allocation
serrst(n = chile$aprop, N = chile$pob, p = chile$pr)

## [1] 0.0288

# the worst solution
serrst(n = chile$afixed, N = chile$pob, p = chile$pr)

## [1] 0.0518

serrst(n = chile$a40, N = chile$pob, p = chile$pr)

## [1] 0.0339

serrst(n = chile$a60, N = chile$pob, p = chile$pr)

## [1] 0.0311

serrst(n = chile$aroot, N = chile$pob, p = chile$pr)

## [1] 0.0339

# the most efficient allocation
serrst(n = chile$aneyman, N = chile$pob, p = chile$pr)

## [1] 0.0285

serrst(n = chile$astdev, N = chile$pob, p = chile$pr)

## [1] 0.0508

serrst(n = chile$aerr, N = chile$pob, p = chile$pr)

## [1] 0.0498
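
The total error of a stratified sample combines the stratum variances using the squared population weights. The sketch below is a hedged illustration of that standard formula, not the package's code: serrst_sketch is a hypothetical name, and I ignore the finite population correction, which barely matters with populations this large.

```r
# Hypothetical helper, NOT part of sampler: MOE of a stratified proportion.
pob <- c(328782, 613328, 308247, 759228, 1808300, 910577, 1035593, 2100494,
         983499, 834714, 107334, 163748, 7228581, 401548, 235081)
pr  <- c(.3, .4, .5, .5, .5, .6, .3, .1, .2, .5, .5, .4, .6, .2, .3)

serrst_sketch <- function(n, N, p, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  W <- N / sum(N)                        # stratum weights
  z * sqrt(sum(W^2 * p * (1 - p) / n))   # weighted sum of stratum variances, no FPC
}

# proportional allocation (the aprop column) versus 67 cases in every stratum
aprop <- c(18, 34, 17, 43, 101, 51, 58, 118, 55, 47, 6, 9, 406, 23, 13)
serrst_sketch(aprop, pob, pr)
serrst_sketch(rep(67, 15), pob, pr)
```

Both results should land close to the values printed above for the proportional and fixed allocations. The fixed allocation does worse because region 13 holds about 40% of the population but gets only 67 cases.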

Combining criteria

# get error for the 60% proportional / 40% fixed allocation in each stratum
chile[, error_a60 := serr(a60, p = pr)]

# assign sample sizes assuming a 13% error in each stratum
chile[, serr13 := astrata(e = .13, method = "error", N = pob, p = pr)]

# total error, not that good!
serrst(n = chile$serr13, N = chile$pob, p = chile$pr)

## [1] 0.0586

chile[, .(reg, pob, pr, a60, error_a60, serr13)]

##     reg     pob  pr a60 error_a60 serr13
##  1:   1  328782 0.3  38    0.1457     48
##  2:   2  613328 0.4  47    0.1401     55
##  3:   3  308247 0.5  37    0.1611     57
##  4:   4  759228 0.5  53    0.1346     57
##  5:   5 1808300 0.5  87    0.1051     57
##  6:   6  910577 0.6  57    0.1272     55
##  7:   7 1035593 0.3  62    0.1141     48
##  8:   8 2100494 0.1  98    0.0594     20
##  9:   9  983499 0.2  60    0.1012     36
## 10:  10  834714 0.5  55    0.1321     57
## 11:  11  107334 0.5  30    0.1789     57
## 12:  12  163748 0.4  32    0.1697     55
## 13:  13 7228581 0.6 270    0.0584     55
## 14:  14  401548 0.2  41    0.1224     36
## 15:  15  235081 0.3  35    0.1518     48

We can adjust a bit more:

# when error is higher than .13, use serr13
chile[, sfinal := ifelse(error_a60 > .13, serr13, a60)]

# new error by stratum
chile[, error_sfinal := serr(sfinal, p = pr)]

# total error, much better!
serrst(n = chile$sfinal, N = chile$pob, p = chile$pr)

## [1] 0.0309

# although the total sample size is now bigger
sum(chile$sfinal)

## [1] 1109

chile[, .(reg, pob, pr, sfinal, error_sfinal)]

##     reg     pob  pr sfinal error_sfinal
##  1:   1  328782 0.3     48       0.1296
##  2:   2  613328 0.4     55       0.1295
##  3:   3  308247 0.5     57       0.1298
##  4:   4  759228 0.5     57       0.1298
##  5:   5 1808300 0.5     87       0.1051
##  6:   6  910577 0.6     57       0.1272
##  7:   7 1035593 0.3     62       0.1141
##  8:   8 2100494 0.1     98       0.0594
##  9:   9  983499 0.2     60       0.1012
## 10:  10  834714 0.5     57       0.1298
## 11:  11  107334 0.5     57       0.1298
## 12:  12  163748 0.4     55       0.1295
## 13:  13 7228581 0.6    270       0.0584
## 14:  14  401548 0.2     41       0.1224
## 15:  15  235081 0.3     48       0.1296
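
The whole combination can also be written in a few lines of base R, which makes the rule easier to see. This is only a sketch under my own assumptions: moe and n_for_moe are hypothetical helpers using the simple random sample formulas without finite population correction, and a60 is copied from the table above.

```r
# Hypothetical helpers, NOT sampler's code.
pr  <- c(.3, .4, .5, .5, .5, .6, .3, .1, .2, .5, .5, .4, .6, .2, .3)
a60 <- c(38, 47, 37, 53, 87, 57, 62, 98, 60, 55, 30, 32, 270, 41, 35)

z <- qnorm(.975)                                             # 95% confidence
moe       <- function(n, p) z * sqrt(p * (1 - p) / n)        # MOE for a given n
n_for_moe <- function(e, p) round(z^2 * p * (1 - p) / e^2)   # n for a given MOE

error_a60 <- moe(a60, pr)           # error by stratum under the 60/40 allocation
serr13    <- n_for_moe(.13, pr)     # size needed for a 13% error in each stratum

# keep a60 where it is already precise enough, otherwise top up to serr13
sfinal <- ifelse(error_a60 > .13, serr13, a60)

sum(sfinal)             # 1109
max(moe(sfinal, pr))    # just under .13
```

The design choice is simple: the 60/40 allocation sets the baseline, and the 13% target acts as a floor for the small strata, at the cost of 109 extra cases.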

That’s it. A simple package to do simple calculations.