Clement Rasul: November 2017

I have often been asked how to determine sample size in statistics. I have always referred people to Cochran (1963) [1] for a complete explanation and to Yamane (1967) [2] for a simpler version of the formula. However, I discovered (a eureka moment!) that the existing published formula for sample size can be simplified even further. Here is my simple sample size formula (the Rasul Formula):

Sample Size (n) = 1 / e^2

where e is the margin of error (sampling error), expressed as a decimal proportion (for example, 0.05 for 5%).

This simplified formula assumes an infinite population (or universe), maximum variability in the proportion being estimated (p = 0.5), and a confidence level slightly greater than 95% (about 95.45%, because the Z score is rounded from 1.96 up to 2). The only input required is the margin of error. If we further assume a 5% (0.05) margin of error, the formula gives the magic value of 400 (the Rasul Sample): 1 / 0.05^2 = 1 / 0.0025 = 400. This is the sample size for an infinite population at slightly more than 95% confidence with a 5% margin of error.
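For readers who want to compute this directly, here is a minimal Python sketch of the Rasul Formula. It assumes the margin of error is passed as a decimal proportion (0.05 for 5%). The function name `rasul_sample_size` is hypothetical, and rounding the result up to the next whole sample is a common convention assumed here rather than part of the formula itself.

```python
import math


def rasul_sample_size(e: float) -> int:
    """Sample size n = 1 / e^2 for an infinite population.

    Assumes Z = 2 (a confidence level slightly above 95%) and p = 0.5.
    `e` is the margin of error as a decimal proportion, e.g. 0.05 for 5%.
    """
    if not 0 < e < 1:
        raise ValueError("e must be a proportion between 0 and 1")
    raw = 1 / e ** 2
    # Strip floating-point noise (0.05 ** 2 is not exactly 0.0025), then
    # round up to a whole number of samples (assumed convention).
    return math.ceil(round(raw, 9))


print(rasul_sample_size(0.05))  # 400
print(rasul_sample_size(0.03))  # 1112 (1111.1 rounded up)
```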

If you are asked what sample size to use when estimating a proportion and do not want to compute it, the answer is 400.

This simple formula also yields an easy way to calculate the margin of error (e) from the actual sample size (the Rasul Margin of Error Formula):

Margin of Error (e) = (1/n)^(1/2)

or

Margin of Error (e) is the square root of 1/n.

Suppose you were able to collect only 300 samples and want to know your margin of error (e). Applying the formula above gives:

e = (1/300)^(0.5)

e = (0.003333)^(0.5)

e = 0.057735 (or 6%, rounded to the nearest whole percent)
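The same calculation can be sketched in Python. The function name `rasul_margin_of_error` is hypothetical, and the sketch carries over the assumptions of the Rasul Formula (Z = 2, p = 0.5, infinite population).

```python
import math


def rasul_margin_of_error(n: int) -> float:
    """Margin of error e = sqrt(1 / n), assuming Z = 2, p = 0.5 and an
    infinite population, as in the Rasul Formula."""
    if n <= 0:
        raise ValueError("n must be a positive number of samples")
    return math.sqrt(1 / n)


e = rasul_margin_of_error(300)
print(f"{e:.6f}")  # 0.057735
print(f"{e:.0%}")  # 6%
```

Because this is simply the Rasul Formula solved for e, passing in the Rasul Sample of 400 returns the original 5% margin of error.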

This matters because, even if we plan for a 5% margin of error, actual field conditions (such as the peace and order situation) may not allow us to collect the required sample size. This formula makes it easy to calculate the margin of error actually achieved with the sample gathered.

Derivation

For those interested in how the formula was derived, here it is:

a) Cochran's sample size formula for proportions, given an infinite population (universe), is:

Sample Size (n) = (Z^2 * p * q) / e^2

where:

Z is the abscissa of the standard normal curve that cuts off an area α in the two tails (1 - α is the desired confidence level)
p is the estimated proportion of the population that has the attribute of interest
q is 1 - p
e is the desired margin of error

b) Recall that a Z score is the number of standard deviations a value lies away from the mean. The desired confidence level and the corresponding Z score are related as follows:

Confidence Level    Z score
90%                 1.65
95%                 1.96
99%                 2.58

c) Assume the widely accepted confidence level of 95%. Borrowing Taro Yamane's idea, approximate its Z score of 1.96 by rounding it up to 2. (This rounding is why the resulting confidence level is slightly above 95%, about 95.45%.)

d) Further assume the most conservative proportion, p = 50% (0.5). This is the usual assumption when the true proportion is unknown, which is most often the case, because p * (1 - p) is largest at p = 0.5 and therefore produces the largest, safest sample size. Thus p * q = 0.5 * (1 - 0.5) = 0.25.

e) Substituting Z = 2 and p * q = 0.25, the formula becomes:

n = [ 2^2 * (0.25) ]/e^2

n = [ 4 * 0.25 ]/e^2

This produces the simpler formula:

n = 1/ e^2
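The derivation steps above can be checked numerically with a short Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`; the function name `cochran_sample_size` is hypothetical and simply transcribes Cochran's formula from step a). The sketch recovers the Z scores in the table before rounding, shows the confidence level implied by Z = 2, confirms that p * q peaks at p = 0.5, and compares Z = 2 with Z = 1.96.

```python
import math
from statistics import NormalDist  # Python 3.8+


def cochran_sample_size(z: float, p: float, e: float) -> float:
    """Cochran's n = Z^2 * p * q / e^2 for an infinite population, q = 1 - p."""
    q = 1 - p
    return z ** 2 * p * q / e ** 2


std_normal = NormalDist()  # mean 0, standard deviation 1

# Step b): two-sided Z scores for the confidence levels in the table.
for level in (0.90, 0.95, 0.99):
    z = std_normal.inv_cdf((1 + level) / 2)
    print(f"{level:.0%}: Z = {z:.3f}")  # 1.645, 1.960, 2.576 before rounding

# Step c): confidence level implied by rounding Z up to 2.
print(f"Z = 2 -> {2 * std_normal.cdf(2) - 1:.2%}")  # Z = 2 -> 95.45%

# Step d): p * q is largest at p = 0.5, giving the most conservative n.
best_p = max((i / 100 for i in range(1, 100)), key=lambda p: p * (1 - p))
print(best_p, best_p * (1 - best_p))  # 0.5 0.25

# Step e): with Z = 2 and p = 0.5, Cochran's formula reduces to 1 / e^2.
e = 0.05
print(math.isclose(cochran_sample_size(2, 0.5, e), 1 / e ** 2))  # True
n_196 = cochran_sample_size(1.96, 0.5, e)
print(f"{n_196:.2f} -> {math.ceil(n_196)}")  # 384.16 -> 385
```

With Z = 1.96, Cochran's formula asks for about 385 samples at a 5% margin of error (rounding up); rounding Z to 2 is what raises this to 400.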

--------------------

References

[1] Cochran, W.G. 1963. Sampling Techniques, 2nd Edition. New York: John Wiley and Sons, Inc.

[2] Yamane, T. 1967. Statistics: An Introductory Analysis, 2nd Edition. New York: Harper and Row.

Clement Rasul

Tuesday, November 28, 2017

Simple Sample Size Formula