In this post I present a simple R package called sampler. The package computes sample sizes and margins of error (MOE) for a proportion, as is usually done when designing public opinion surveys. In a previous post, I presented some functions that do mostly the same thing. This new package, however, includes some new features that might be useful.

Installation
# you have to install devtools first
devtools::install_github("sdaza/sampler")
library(sampler)

Functions
The package contains four functions:

ssize: computes sample size.
serr: computes MOE.
astrata: assigns sample sizes to strata.
serrst: computes MOE for stratified samples.
Define sample size: ssize
ssize(.05)

## [1] 384

# design effect (deff) and response rate (rr)
ssize(.05, deff = 1.2, rr = .90)

## [1] 512

# finite population correction
ssize(.05, deff = 1.2, rr = .90, N = 1000)

## [1] 370

# warning message
ssize(.05, deff = 1.2, rr = .90, N = 100)

## n is bigger than N in some rows: n = N

## [1] 100
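To make the arithmetic behind these numbers concrete, here is a minimal sketch of the textbook sample size calculation for a proportion. It is not the package's source code: ssize_sketch and its conf argument are hypothetical names, and the order of the steps (finite population correction first, then the design effect and response rate adjustment) is an assumption that happens to reproduce the outputs above.

```r
# Hypothetical helper, NOT the sampler package's implementation.
ssize_sketch <- function(e, p = 0.5, deff = 1, rr = 1, N = NULL, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)      # about 1.96 for 95% confidence
  n <- z^2 * p * (1 - p) / e^2        # simple random sample, infinite population
  if (!is.null(N)) {
    n <- n / (1 + (n - 1) / N)        # finite population correction (assumed form)
  }
  n <- n * deff / rr                  # inflate for design effect and nonresponse
  if (!is.null(N)) {
    n <- pmin(n, N)                   # never sample more units than the population has
  }
  round(n)
}

ssize_sketch(.05)                                   # 384
ssize_sketch(.05, deff = 1.2, rr = .90)             # 512
ssize_sketch(.05, deff = 1.2, rr = .90, N = 1000)   # 370
ssize_sketch(.05, deff = 1.2, rr = .90, N = 100)    # 100 (capped at N)
```

Unlike ssize, this sketch caps n at N silently instead of printing a message.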

Define sampling error: serr
serr(384)

## [1] 0.05

serr(512, deff = 1.2, rr = .90)

## [1] 0.05

serr(370, deff = 1.2, rr = .90, N = 1000)

## [1] 0.05

# we still get an answer
serr(100, deff = 1.2, rr = .90, N = 100)

## [1] 0.0569
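The last result may look odd: sampling all 100 units still leaves a nonzero error. One reading that reproduces it is that the response rate and design effect shrink the sample to an effective size of 75 before the finite population correction is applied. The sketch below follows that assumption; serr_sketch and conf are hypothetical names, not the package's API.

```r
# Hypothetical helper, NOT the sampler package's implementation.
serr_sketch <- function(n, p = 0.5, deff = 1, rr = 1, N = NULL, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)        # about 1.96 for 95% confidence
  n_eff <- n * rr / deff                # effective sample size after nonresponse and deff
  v <- p * (1 - p) / n_eff              # variance of the estimated proportion
  if (!is.null(N)) {
    v <- v * (N - n_eff) / (N - 1)      # finite population correction (assumed form)
  }
  round(z * sqrt(v), 4)
}

serr_sketch(384)                                  # 0.05
serr_sketch(370, deff = 1.2, rr = .90, N = 1000)  # 0.05
serr_sketch(100, deff = 1.2, rr = .90, N = 100)   # 0.0569
```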

Strata allocation: astrata
These examples show how to allocate a sample size across strata. Look at ?astrata in R for definitions of the allocation procedures that are available.

# I will use data.table
library(data.table)
chile <- data.table(chile)
chile

##     reg     pob  pr
##  1:   1  328782 0.3
##  2:   2  613328 0.4
##  3:   3  308247 0.5
##  4:   4  759228 0.5
##  5:   5 1808300 0.5
##  6:   6  910577 0.6
##  7:   7 1035593 0.3
##  8:   8 2100494 0.1
##  9:   9  983499 0.2
## 10:  10  834714 0.5
## 11:  11  107334 0.5
## 12:  12  163748 0.4
## 13:  13 7228581 0.6
## 14:  14  401548 0.2
## 15:  15  235081 0.3

# proportional for a sample of 1000
chile[, aprop := astrata(1000, wp = 1, N = pob)]
# fixed (same number by stratum)
chile[, afixed := astrata(1000, wp = 0, N = pob)]
# 40% proportional, 60% fixed
chile[, a40 := astrata(1000, wp = .4, N = pob)]
# 60% proportional, 40% fixed
chile[, a60 := astrata(1000, wp = .6, N = pob)]
# square-root
chile[, aroot := astrata(1000, method = "root", N = pob)]
# neyman
chile[, aneyman := astrata(1000, method = "neyman", N = pob, p = pr)]
# standard deviation
chile[, astdev := astrata(1000, method = "stdev", N = pob, p = pr)]
# error
chile[, aerr := astrata(e = .11, method = "error", N = pob, p = pr)]

##     reg     pob  pr aprop afixed a40 a60 aroot aneyman astdev aerr
##  1:   1  328782 0.3    18     67  47  38    41      18     66   67
##  2:   2  613328 0.4    34     67  54  47    56      37     71   76
##  3:   3  308247 0.5    17     67  47  37    40      19     72   79
##  4:   4  759228 0.5    43     67  57  53    62      46     72   79
##  5:   5 1808300 0.5   101     67  81  87    96     110     72   79
##  6:   6  910577 0.6    51     67  61  57    68      54     71   76
##  7:   7 1035593 0.3    58     67  63  62    73      58     66   67
##  8:   8 2100494 0.1   118     67  87  98   104      77     43   29
##  9:   9  983499 0.2    55     67  62  60    71      48     58   51
## 10:  10  834714 0.5    47     67  59  55    65      51     72   79
## 11:  11  107334 0.5     6     67  43  30    23       7     72   79
## 12:  12  163748 0.4     9     67  44  32    29      10     71   76
## 13:  13 7228581 0.6   406     67 203 270   192     432     71   76
## 14:  14  401548 0.2    23     67  49  41    45      20     58   51
## 15:  15  235081 0.3    13     67  45  35    35      13     66   67
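For readers who want to see what the wp weight and the Neyman method do, here is a rough sketch of the standard formulas. alloc_mixed and alloc_neyman are hypothetical helpers, not functions from the package, and they return unrounded values. astrata returns integers, and its rounding rule is not shown in this post, so individual cells can differ by one unit.

```r
# Hypothetical helpers, NOT the sampler package's implementation.

# Blend of proportional and fixed allocation:
# wp = 1 is fully proportional, wp = 0 gives every stratum the same share.
alloc_mixed <- function(n, N, wp = 1) {
  prop  <- n * N / sum(N)                   # proportional to stratum population
  fixed <- rep(n / length(N), length(N))    # equal share per stratum
  wp * prop + (1 - wp) * fixed
}

# Neyman allocation for a proportion:
# proportional to stratum size times the stratum standard deviation.
alloc_neyman <- function(n, N, p) {
  w <- N * sqrt(p * (1 - p))
  n * w / sum(w)
}

# First three regions of the chile table, used only as an illustration
N <- c(328782, 613328, 308247)
p <- c(0.3, 0.4, 0.5)

alloc_mixed(100, N, wp = 1)
alloc_mixed(100, N, wp = 0.4)
alloc_neyman(100, N, p)
```

Both helpers always distribute exactly n units before rounding, which makes it easy to compare how much each method shifts the sample toward small or high-variance strata.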

Getting sampling error from a stratified sample: serrst
# the second most efficient allocation
serrst(n = chile$aprop, N = chile$pob, p = chile$pr)

## [1] 0.0288

# the worst solution
serrst(n = chile$afixed, N = chile$pob, p = chile$pr)

## [1] 0.0518

serrst(n = chile$a40, N = chile$pob, p = chile$pr)

## [1] 0.0339

serrst(n = chile$a60, N = chile$pob, p = chile$pr)

## [1] 0.0311

serrst(n = chile$aroot, N = chile$pob, p = chile$pr)

## [1] 0.0339

# the most efficient allocation
serrst(n = chile$aneyman, N = chile$pob, p = chile$pr)

## [1] 0.0285

serrst(n = chile$astdev, N = chile$pob, p = chile$pr)

## [1] 0.0508

serrst(n = chile$aerr, N = chile$pob, p = chile$pr)

## [1] 0.0498
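The overall error of a stratified sample combines the stratum variances, each weighted by the squared population share of its stratum. Below is a minimal sketch of that textbook formula. serrst_sketch is a hypothetical name, and the finite population correction it uses is an assumption, so its results may differ slightly from serrst.

```r
# Hypothetical helper, NOT the sampler package's implementation.
serrst_sketch <- function(n, N, p, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)          # about 1.96 for 95% confidence
  W <- N / sum(N)                         # population share of each stratum
  v_h <- p * (1 - p) / n                  # variance of the proportion within each stratum
  fpc <- (N - n) / (N - 1)                # finite population correction (assumed form)
  z * sqrt(sum(W^2 * v_h * fpc))          # MOE of the stratified estimate
}

# Two strata of equal size and equal variance, 50 interviews each
serrst_sketch(n = c(50, 50), N = c(1e6, 1e6), p = c(0.5, 0.5))
```

Because the weights are squared, a large stratum with few interviews dominates the total error, which is consistent with the fixed allocation performing worst above.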

Combining criteria
# get error for the 60% proportional / 40% fixed allocation in each stratum
chile[, error_a60 := serr(a60, p = pr)]
# assign sample sizes assuming a 13% error in each stratum
chile[, serr13 := astrata(e = .13, method = "error", N = pob, p = pr)]
# total error, not that good!
serrst(n = chile$serr13, N = chile$pob, p = chile$pr)

## [1] 0.0586

chile[, .(reg, pob, pr, a60, error_a60, serr13)]

##     reg     pob  pr a60 error_a60 serr13
##  1:   1  328782 0.3  38    0.1457     48
##  2:   2  613328 0.4  47    0.1401     55
##  3:   3  308247 0.5  37    0.1611     57
##  4:   4  759228 0.5  53    0.1346     57
##  5:   5 1808300 0.5  87    0.1051     57
##  6:   6  910577 0.6  57    0.1272     55
##  7:   7 1035593 0.3  62    0.1141     48
##  8:   8 2100494 0.1  98    0.0594     20
##  9:   9  983499 0.2  60    0.1012     36
## 10:  10  834714 0.5  55    0.1321     57
## 11:  11  107334 0.5  30    0.1789     57
## 12:  12  163748 0.4  32    0.1697     55
## 13:  13 7228581 0.6 270    0.0584     55
## 14:  14  401548 0.2  41    0.1224     36
## 15:  15  235081 0.3  35    0.1518     48

We can adjust a bit more:

# when error is higher than .13, use serr13
chile[, sfinal := ifelse(error_a60 > .13, serr13, a60)]
# new error by stratum
chile[, error_sfinal := serr(sfinal, p = pr)]
# total error, much better!
serrst(n = chile$sfinal, N = chile$pob, p = chile$pr)

## [1] 0.0309

# although the total sample size is now bigger
sum(chile$sfinal)

## [1] 1109

##     reg     pob  pr sfinal error_sfinal
##  1:   1  328782 0.3     48       0.1296
##  2:   2  613328 0.4     55       0.1295
##  3:   3  308247 0.5     57       0.1298
##  4:   4  759228 0.5     57       0.1298
##  5:   5 1808300 0.5     87       0.1051
##  6:   6  910577 0.6     57       0.1272
##  7:   7 1035593 0.3     62       0.1141
##  8:   8 2100494 0.1     98       0.0594
##  9:   9  983499 0.2     60       0.1012
## 10:  10  834714 0.5     57       0.1298
## 11:  11  107334 0.5     57       0.1298
## 12:  12  163748 0.4     55       0.1295
## 13:  13 7228581 0.6    270       0.0584
## 14:  14  401548 0.2     41       0.1224
## 15:  15  235081 0.3     48       0.1296

That's it. A simple package to do simple calculations.
