Simple R package to define sample sizes and MOEs

In this post I present a simple R package called sampler. The package computes sample sizes and margins of error (MOE) for a proportion, as is usually done when designing public opinion surveys. In a previous post, I presented some functions that do mostly the same thing. This new package adds stratum allocation and the sampling error of a stratified sample.

Installation

# install devtools first if you do not have it: install.packages("devtools")
devtools::install_github("sdaza/sampler")
library(sampler)

Functions

The package contains four functions:

Define sample size: ssize

ssize(.05)
## [1] 384
# design effect (deff) and response rate (rr)
ssize(.05, deff = 1.2, rr = .90)
## [1] 512
# finite population correction
ssize(.05, deff = 1.2, rr = .90, N = 1000)
## [1] 370
# warning message
ssize(.05, deff = 1.2, rr = .90, N = 100)
## n is bigger than N in some rows: n = N
## [1] 100
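
The four results above are consistent with the textbook sample size formula for a proportion, so here is a minimal sketch of that calculation. It is a reconstruction from the outputs, not the package source: ssize_sketch and its conf argument are hypothetical names, and I assume p = 0.5, a 95% confidence level, and that the finite population correction is applied before the deff and rr adjustments.

```r
# Hypothetical helper, NOT part of the sampler package.
# Assumptions: p = 0.5 and 95% confidence by default; fpc applied before deff and rr.
ssize_sketch <- function(e, p = 0.5, deff = 1, rr = 1, N = NULL, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)     # about 1.96 for 95% confidence
  n <- z^2 * p * (1 - p) / e^2       # simple random sample, infinite population
  if (!is.null(N)) {
    n <- n / (1 + (n - 1) / N)       # finite population correction
  }
  n <- round(n * deff / rr)          # inflate for design effect and nonresponse
  if (!is.null(N)) {
    n <- pmin(n, N)                  # the sample cannot exceed the population
  }
  n
}

ssize_sketch(.05)
ssize_sketch(.05, deff = 1.2, rr = .90)
ssize_sketch(.05, deff = 1.2, rr = .90, N = 1000)
ssize_sketch(.05, deff = 1.2, rr = .90, N = 100)
```

Under these assumptions the sketch returns 384, 512, 370 and 100, the same values as the ssize calls above. Unlike ssize, it caps n at N without printing a message.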

Define sampling error: serr

serr(384)
## [1] 0.05
serr(512, deff = 1.2, rr = .90)
## [1] 0.05
serr(370, deff = 1.2, rr = .90, N = 1000)
## [1] 0.05
# we still get an answer
serr(100, deff = 1.2, rr = .90, N = 100)
## [1] 0.0569
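
Going the other way, the margin of error for a given sample size can be sketched as below. Again this is a reconstruction that happens to match the outputs above, not the package code: serr_sketch and conf are hypothetical names, and I assume the sample size is first deflated by rr / deff and that the finite population correction is sqrt((N - n) / (N - 1)).

```r
# Hypothetical helper, NOT part of the sampler package.
# Assumptions: p = 0.5 and 95% confidence by default.
serr_sketch <- function(n, p = 0.5, deff = 1, rr = 1, N = NULL, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)             # about 1.96 for 95% confidence
  n_eff <- n * rr / deff                     # effective number of completed interviews
  e <- z * sqrt(p * (1 - p) / n_eff)         # margin of error, infinite population
  if (!is.null(N)) {
    e <- e * sqrt((N - n_eff) / (N - 1))     # finite population correction
  }
  round(e, 4)
}

serr_sketch(384)
serr_sketch(512, deff = 1.2, rr = .90)
serr_sketch(370, deff = 1.2, rr = .90, N = 1000)
serr_sketch(100, deff = 1.2, rr = .90, N = 100)
```

With these assumptions the first three calls give 0.05 and the last one gives 0.0569, as in the serr examples.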

Strata allocation: astrata

These examples show how to allocate a total sample size across strata. See ?astrata in R for definitions of the available allocation procedures.

# I will use data.table
library(data.table)
chile <- data.table(chile)
chile
##     reg     pob  pr
##  1:   1  328782 0.3
##  2:   2  613328 0.4
##  3:   3  308247 0.5
##  4:   4  759228 0.5
##  5:   5 1808300 0.5
##  6:   6  910577 0.6
##  7:   7 1035593 0.3
##  8:   8 2100494 0.1
##  9:   9  983499 0.2
## 10:  10  834714 0.5
## 11:  11  107334 0.5
## 12:  12  163748 0.4
## 13:  13 7228581 0.6
## 14:  14  401548 0.2
## 15:  15  235081 0.3
# proportional for a sample of 1000
chile[, aprop := astrata(1000, wp = 1, N = pob)]

# fixed (same number by stratum)
chile[, afixed := astrata(1000, wp = 0, N = pob)]

# 40% proportional, 60% fixed
chile[, a40 := astrata(1000, wp =.4, N = pob)]

# 60% proportional, 40% fixed
chile[, a60 := astrata(1000, wp =.6, N = pob)]

# square-root
chile[, aroot := astrata(1000, method = "root", N = pob)]

# neyman
chile[, aneyman := astrata(1000, method = "neyman", N = pob, p = pr)]

# standard deviation
chile[, astdev := astrata(1000, method = "stdev", N = pob, p = pr)]

# error
chile[, aerr := astrata(e = .11, method = "error", N = pob, p = pr)]
chile
##     reg     pob  pr aprop afixed a40 a60 aroot aneyman astdev aerr
##  1:   1  328782 0.3    18     67  47  38    41      18     66   67
##  2:   2  613328 0.4    34     67  54  47    56      37     71   76
##  3:   3  308247 0.5    17     67  47  37    40      19     72   79
##  4:   4  759228 0.5    43     67  57  53    62      46     72   79
##  5:   5 1808300 0.5   101     67  81  87    96     110     72   79
##  6:   6  910577 0.6    51     67  61  57    68      54     71   76
##  7:   7 1035593 0.3    58     67  63  62    73      58     66   67
##  8:   8 2100494 0.1   118     67  87  98   104      77     43   29
##  9:   9  983499 0.2    55     67  62  60    71      48     58   51
## 10:  10  834714 0.5    47     67  59  55    65      51     72   79
## 11:  11  107334 0.5     6     67  43  30    23       7     72   79
## 12:  12  163748 0.4     9     67  44  32    29      10     71   76
## 13:  13 7228581 0.6   406     67 203 270   192     432     71   76
## 14:  14  401548 0.2    23     67  49  41    45      20     58   51
## 15:  15  235081 0.3    13     67  45  35    35      13     66   67
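
To make the allocation rules more concrete, here is a rough sketch of four of them, written from the usual textbook definitions rather than from the package source. astrata_sketch and the method name "mixed" are hypothetical. I assume that wp blends a proportional share with an equal share per stratum, that "root" is proportional to the square root of N, that "neyman" is proportional to N * sqrt(p * (1 - p)), and that "stdev" is proportional to sqrt(p * (1 - p)). Because of rounding, results can differ by a unit or so from what astrata returns.

```r
# Hypothetical helper, NOT part of the sampler package.
astrata_sketch <- function(n, N, p = NULL, wp = 1, method = "mixed") {
  H <- length(N)                                  # number of strata
  share <- switch(method,
    # wp = 1 is fully proportional, wp = 0 gives every stratum the same size
    mixed  = wp * N / sum(N) + (1 - wp) / H,
    root   = sqrt(N) / sum(sqrt(N)),
    neyman = N * sqrt(p * (1 - p)) / sum(N * sqrt(p * (1 - p))),
    stdev  = sqrt(p * (1 - p)) / sum(sqrt(p * (1 - p))),
    stop("unknown method")
  )
  round(n * share)                                # stratum sample sizes
}

# first three regions of the chile table above
pob <- c(328782, 613328, 308247)
pr  <- c(0.3, 0.4, 0.5)
astrata_sketch(100, N = pob, wp = 0.6)
astrata_sketch(100, N = pob, p = pr, method = "neyman")
```

The "error" method is not included because it works differently: instead of splitting a fixed total, it sizes each stratum separately for a target margin of error.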

Getting sampling error from a stratified sample: serrst

# the second most efficient allocation
serrst(n = chile$aprop, N = chile$pob, p = chile$pr)
## [1] 0.0288
# the worst solution
serrst(n = chile$afixed, N = chile$pob, p = chile$pr)
## [1] 0.0518
serrst(n = chile$a40, N = chile$pob, p = chile$pr)
## [1] 0.0339
serrst(n = chile$a60, N = chile$pob, p = chile$pr)
## [1] 0.0311
serrst(n = chile$aroot, N = chile$pob, p = chile$pr)
## [1] 0.0339
# the most efficient allocation
serrst(n = chile$aneyman, N = chile$pob, p = chile$pr)
## [1] 0.0285
serrst(n = chile$astdev, N = chile$pob, p = chile$pr)
## [1] 0.0508
serrst(n = chile$aerr, N = chile$pob, p = chile$pr)
## [1] 0.0498
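
For reference, the overall margin of error of a stratified estimate combines the stratum variances using squared population weights. The sketch below follows that standard formula. It is an assumption about what serrst computes, not a copy of it: serrst_sketch and conf are hypothetical names, and I leave out the finite population correction, which barely matters when the strata are as large as the regions above.

```r
# Hypothetical helper, NOT part of the sampler package.
# Assumptions: 95% confidence, no finite population correction.
serrst_sketch <- function(n, N, p, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)       # about 1.96 for 95% confidence
  W <- N / sum(N)                      # stratum weights
  v <- sum(W^2 * p * (1 - p) / n)      # variance of the stratified proportion
  round(z * sqrt(v), 4)
}

# two equal strata of 192 cases behave like one simple random sample of 384
serrst_sketch(n = c(192, 192), N = c(1000, 1000), p = c(0.5, 0.5))
```

This formula also shows why the fixed allocation does so badly here: the largest region carries a big weight but gets only 67 cases, so its term dominates the sum.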

Combining criteria

# get the error of the 60% proportional / 40% fixed allocation for each stratum
chile[, error_a60 := serr(a60, p = pr)]

# assign sample sizes assuming a 13% error for each stratum
chile[, serr13 := astrata(e = .13, method = "error", N = pob, p = pr)]

# total error, not that good!
serrst(n = chile$serr13, N = chile$pob, p = chile$pr)
## [1] 0.0586
chile[, .(reg, pob, pr, a60, error_a60, serr13)]
##     reg     pob  pr a60 error_a60 serr13
##  1:   1  328782 0.3  38    0.1457     48
##  2:   2  613328 0.4  47    0.1401     55
##  3:   3  308247 0.5  37    0.1611     57
##  4:   4  759228 0.5  53    0.1346     57
##  5:   5 1808300 0.5  87    0.1051     57
##  6:   6  910577 0.6  57    0.1272     55
##  7:   7 1035593 0.3  62    0.1141     48
##  8:   8 2100494 0.1  98    0.0594     20
##  9:   9  983499 0.2  60    0.1012     36
## 10:  10  834714 0.5  55    0.1321     57
## 11:  11  107334 0.5  30    0.1789     57
## 12:  12  163748 0.4  32    0.1697     55
## 13:  13 7228581 0.6 270    0.0584     55
## 14:  14  401548 0.2  41    0.1224     36
## 15:  15  235081 0.3  35    0.1518     48

We can adjust a bit more:

# when error is higher than .13, use serr13
chile[, sfinal := ifelse(error_a60 > .13, serr13, a60)]

# new error by stratum
chile[, error_sfinal := serr(sfinal, p = pr)]

# total error, much better!
serrst(n = chile$sfinal, N = chile$pob, p = chile$pr)
## [1] 0.0309
# although the total sample size is now bigger
sum(chile$sfinal)
## [1] 1109
chile[, .(reg, pob, pr, sfinal, error_sfinal)]
##     reg     pob  pr sfinal error_sfinal
##  1:   1  328782 0.3     48       0.1296
##  2:   2  613328 0.4     55       0.1295
##  3:   3  308247 0.5     57       0.1298
##  4:   4  759228 0.5     57       0.1298
##  5:   5 1808300 0.5     87       0.1051
##  6:   6  910577 0.6     57       0.1272
##  7:   7 1035593 0.3     62       0.1141
##  8:   8 2100494 0.1     98       0.0594
##  9:   9  983499 0.2     60       0.1012
## 10:  10  834714 0.5     57       0.1298
## 11:  11  107334 0.5     57       0.1298
## 12:  12  163748 0.4     55       0.1295
## 13:  13 7228581 0.6    270       0.0584
## 14:  14  401548 0.2     41       0.1224
## 15:  15  235081 0.3     48       0.1296

That’s it. A simple package to do simple calculations.



