Let us consider the estimation of sample size for a cross-sectional study.

In order to estimate the required sample size, we need to know the following:

**p**: The prevalence of the condition/ health state. If the prevalence is 32%, it may be either used as such (32%), or in its decimal form (0.32).

**q**: i. When p is in percentage terms: (100-p)

ii. When p is in decimal terms: (1-p)

**d (or l)**: The precision of the estimate. This could either be the relative precision, or the absolute precision. This will be discussed later in this post.

**Za [Z alpha]**: The value of z from the probability tables. If the values are normally distributed, then 95% of the values will fall within about 2 (more precisely, 1.96) standard errors of the mean. The value of z corresponding to 95% is therefore 1.96 (from the standard normal variate tables).

The formula for estimating sample size is given as:

N = [(Za)^2 * p * q] / d^2

that is, “Z-alpha squared into pq; upon d-square”. Here the symbol ^ means ‘to the power of’, and * means ‘multiplied by’.

Substituting the value of Za, we get:

N = [(1.96)^2 * p * q] / d^2

We can round off the value of Za (1.96) to 2, to obtain:

N = [(2)^2 * p * q] / d^2

or, N = 4pq / d^2, that is, “4 pq by d-square”
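The formula is easy to check with a few lines of code. The following is a minimal sketch in Python, not part of any statistical package; the function name `sample_size` and the inputs (a prevalence of 32% and a precision of 5 percentage points) are assumptions chosen only for illustration. It works in percentage terms, so q = 100 - p.

```python
import math

def sample_size(p, d, z=2.0):
    """Sample size for estimating a prevalence: n = z^2 * p * q / d^2.

    p and d are both in percentage terms (e.g. p=20, d=4), so q = 100 - p.
    z defaults to 2.0, the rounded value used in the text; pass z=1.96
    for the unrounded 95% value.
    """
    if not 0 < p < 100:
        raise ValueError("p must be a percentage strictly between 0 and 100")
    if d <= 0:
        raise ValueError("d must be positive")
    q = 100 - p
    n = z ** 2 * p * q / d ** 2
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(n, 9))

# Hypothetical inputs: prevalence 32%, precision 5 percentage points.
print(sample_size(32, 5))          # with the rounded z = 2
print(sample_size(32, 5, z=1.96))  # with the unrounded z = 1.96
```

Rounding Za up to 2 gives a slightly larger (more conservative) sample size than using 1.96.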

**Example:**

I wish to conduct a cross-sectional study on awareness of Hepatitis B among school children. A literature search reveals that other investigators have reported knowledge to range from 5% to 20% among students of grades 6 through 8. What should the size of my sample be?

The formula requires us to input the value of d (precision). If the absolute precision is known, there is no problem. However, often we can only input a relative precision. Where do we get the value of relative precision from?

Typically, relative precision is taken as a proportion of ‘p’. The maximum permissible limit is 20% of ‘p’.

In the above example, if ‘p’ is 20%, then ‘d’ will be (20/100)*20= 0.2*20= 4 {Taking a relative precision of 20%}.

This means that we will be able to detect a ‘p’ (prevalence) of 18% or more {half the value of relative precision on either side of ‘p’–> +/- 2%: 18% to 22%}.

That is, by taking a relative precision of 20% of ‘p’, the study will be able to detect the true awareness level if the actual prevalence is 18% or more. If the actual prevalence is less than 18%, however, the study will be unable to detect it accurately.

Therefore, the larger the value of ‘p’ (prevalence), the larger the value of ‘d’ (precision), when the relative precision is kept fixed (say, at 20% of ‘p’). If the prevalence is 50%, ‘d’ (20% of ‘p’) would then be 0.2*50= 10 (as compared to ‘d’ = 4 when ‘p’ = 20%).

The reverse is also true: the smaller the value of ‘p’, the smaller the value of ‘d’. A smaller ‘d’ implies a larger sample size. Therefore, the choice of ‘p’ is crucial.

We can now input the values in the formula to obtain the sample size:

For the calculation we will take ‘d’ as 4. This yields:

N= (4*20*80)/ (4*4)

= 400. This sample size will enable us to detect the truth if the prevalence is between 18-22% (or more).

If we took ‘p’= 5, then the sample size would be:

N= (4*5*95)/(1*1) [‘d’= 0.2*5= 1]

= 1900. This sample size will enable us to detect the truth if the prevalence is between 4-6% (or more).
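The two calculations above can be reproduced with a short script. This is a minimal sketch, assuming Python; the helper name `sample_size_relative` is hypothetical. It takes ‘p’ in percentage terms, derives ‘d’ as a fraction of ‘p’ (20% by default), and applies N = 4pq/d^2.

```python
import math

def sample_size_relative(p, rel=0.20, z=2.0):
    """n = z^2 * p * (100 - p) / d^2, where d = rel * p and p is in percent."""
    d = rel * p                      # relative precision converted to 'd'
    n = z ** 2 * p * (100 - p) / d ** 2
    return math.ceil(round(n, 9))    # round() removes floating-point noise

# The two candidate prevalences from the Hepatitis B example.
for p in (20, 5):
    print(f"p = {p}%: d = {0.20 * p:g}, N = {sample_size_relative(p)}")
```

Running the same function over every plausible value of ‘p’ from the literature shows how quickly the requirement grows as the assumed prevalence falls.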

So should I take ‘p’= 20% or ‘p’=5%?

That depends upon:

1. The location of the original study- if you are planning to conduct the study in an urban area, use the prevalence reported by studies conducted in urban areas, and vice versa.

2. The available resources (time, manpower, money, etc.). Aim for the largest feasible sample size. The size should be adequate to yield 80% power. Do not unnecessarily increase the sample size unless the intention is to obtain greater power. If so, please mention the same in the methodology section.

3. The results of your pilot study. If you have conducted a pilot study, the prevalence obtained from that study should be taken as ‘p’. This will be much more accurate than any other external value.

**Note 1**: *If you have multiple objectives, you must calculate the required sample size for each objective, then choose the largest sample size thus obtained. This will ensure adequate power for all objectives; otherwise the study will lack power for one or more objectives.* That is, you may not be able to detect a significant result where it actually exists *because* you failed to include enough subjects to detect it.

**Note 2**: It is advisable to mention a range rather than a single value for sample size. This is standard practice in the west, but not in India. A range may be obtained by calculating the sample size for different values of ‘p’.

Sir, I am doing a study comparing diabetics with and without chronic kidney disease (CKD).

How do I calculate the sample size?

Dear Sangeeta,

That depends upon your objectives, study design, power, effect size (expected), etc.

I recommend consulting a local statistician for further inputs as it is not feasible to discuss this matter here.

I’d be glad to clarify any doubts you may have subsequent to such a consultation.

Regards,

Dr. Roopesh

dear sir..

I am using only questionnaire forms to obtain all the data for my research. Is this formula applicable to my study? How did you choose 18%? Is it an estimated value?

Dear Shazwan,

This formula can be used for the kind of studies you’ve described.

What I have described is a hypothetical situation where the prevalence is 20%.

The question any researcher has to face is, “What if my prevalence estimate is wrong?”. That is, what if the true prevalence is not 20%?

In the example, taking a relative precision of 20% yields d=4. This indicates the range of prevalence that can be captured by the study.

If we were to consider that d gives us the 95% limits of prevalence, then the range of prevalence detectable by the study is half of d on either side of the prevalence estimate (here 20%).

Thus, the range of prevalence lies from 20-2 to 20+2: 18% to 22% respectively.

In other words, the study will capture a prevalence of 18% or higher. If the true prevalence is lower than 18%, the study may not have adequate power to detect a significant difference.

I hope this clarifies the matter.

Regards,

Dr. Roopesh

If the prevalence is 2.25%, what should the sample size be? The confidence level is 95%.

Dear Sitwat,

Using the formula described above, the required sample size would be 4344 (assuming 20% relative precision).

Dr. Roopesh

I need more illustration in the difference between the relative precision and the absolute precision

Dear Mariam,

I’ve been busy during Christmas-time. Please allow a few days for a response.

Regards,

Dr. Roopesh

Dear Mariam,

I hope you have read the article I wrote in response to your query. You may find it here: https://communitymedicine4asses.wordpress.com/2014/12/30/relative-and-absolute-precision-in-sample-size-calculation/

Hope this helps.

Regards,

Dr. Roopesh

Among 64 persons, 8 were hypertensive. What will the prevalence be, with a 95% CI?

Dear Anonymous,

Apologies for the delay in responding- I somehow overlooked the query.

The prevalence is a proportion. In this case, the numerator is 8, while the denominator is 64. Thus, the prevalence is 8/64 = 0.125, that is, 12.5% (or 125 per 1000 people).

One may calculate a 95% CI for this proportion by using the free online tool at this site:

http://www.graphpad.com/quickcalcs/confInterval1/

Based on the recommended method, the 95% CI is given as: 0.06 to 0.23

http://www.graphpad.com/quickcalcs/confInterval2/
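The same interval can be computed offline. The sketch below is in Python and uses the Agresti-Coull (“modified Wald”) interval; that this matches the method the online tool recommends is an assumption on my part, and the function name `agresti_coull_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def agresti_coull_ci(cases, n, confidence=0.95):
    """Agresti-Coull ('modified Wald') interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    n_adj = n + z ** 2                       # adjusted denominator
    p_adj = (cases + z ** 2 / 2) / n_adj     # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

low, high = agresti_coull_ci(8, 64)
print(f"prevalence = {8 / 64:.3f}, 95% CI = {low:.2f} to {high:.2f}")
```

For 8 cases among 64 people, the limits round to the same 0.06 and 0.23 quoted above.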

I hope that helped.

Regards,

Dr. Roopesh

Dear Farouk,

If you plan to conduct a study on macular thickness, I assume it is with regard to a specific condition. That is, your objective would be something like this: “To determine macular thickness among individuals with/ without XYZ..”.

In order to have adequate power for the study (that is, to be able to detect a difference in macular thickness between the subjects of interest and others [or any two groups]), one needs to factor in the prevalence of the condition of interest among them.

Failure to do so may result in the study having inadequate power. Thus, no difference may be detected although there actually is a difference between the two groups.

Bottomline: The prevalence is required to estimate sample size.

The only exception to this rule is an in-vitro study or a study in lab animals, where a rule of thumb usually applies.

I hope that helps clear your doubt.

Regards,

Dr. Roopesh

Dear Roopesh,

I have doubt in using absolute and relative precision.

For example,I want to assess prevalence of oral malodour in a city, in a pilot study I got prevalence as 20%; I would like to have 95% Confidence Interval;here which precision should I consider for sample size calculation.i.e. ABSOLUTE OR RELATIVE PRECISION?

Kindly let me know, what conditions we should use absolute precision and relative precisions?

Dear Sudhakar,

I have discussed Absolute and Relative precision in another post. You may get more clarity on the concepts in that post.

To access that post, please follow the URL: https://communitymedicine4asses.wordpress.com/2014/12/30/relative-and-absolute-precision-in-sample-size-calculation/

Do let me know if you need additional assistance.

Regards,

Dr. Roopesh

Dear Dr,

I’ve been reading your post & really appreciate your concern in spreading the knowledge. I am new in research and i am conducting a research regarding incidence of an acute infection in one particular hospital. I need to ask you that keeping in mind the previous incidence of 11.2% which was from another state another hospital but same country. However, no other study has been conducted as per now in this country on the specific topic. what should be my expected sample size??

Cross-sectional study means a duration based study with minimum sample size as per calculated using this formula? right?

I will be grateful for you guidance! thanks.

Dear Omaid,

A cross-sectional study is not suitable to determine incidence- it can only help estimate the prevalence. In order to obtain incidence (new cases), one needs to undertake a longitudinal (cohort) study.

When one calculates sample size for a cross-sectional study, the usual input parameter is prevalence.

A cross-sectional study simply means a study that captures a snap-shot of the population at a point in time. It has nothing to do with sample size. The key thing is to interview each subject only once during the study duration. You may read about cross-sectional studies in my post by the same name.

In view of the above, I am unable to respond appropriately to your query on sample size at present. Maybe I will be able to help you estimate sample size if you modify the question suitably.

Hope this helped.

Regards,

Dr. Roopesh

Thanks for your guidance Dr. I studied and got the idea of study designs. I’ve changed my design to prospective observational study.

Now kindly let me know what should be the appropriate sample size to calculate the incidence by keeping in mind the prevalence of 11.2% & CI 95%?

Dear Omaid,

A prospective observational study could be longitudinal (cohort), or otherwise (cross-sectional). However, traditional teaching indicates that one can determine incidence (new cases) only via a cohort study. Are you planning to conduct a cohort study?

I will assume that you actually plan on conducting a cross-sectional study (simply because it is the easiest and least expensive). In that case, the sample size would be:

{4*11.2*(100-11.2)}/ {(11.2*0.2)^2}

This works out to be: 3978.24/ 5.0176= 793 (rounded up), or approximately 800.

Again, please note that a cross sectional study is not appropriate to obtain incidence.

As a rule of thumb, the lower the prevalence, the larger the sample size.

I hope this was useful.

Regards,

Dr. Roopesh

Dr. I have to find incidence of an acute disease in a current hospital population. So I am conducting a prospective observational study which can be termed as cohort study. Because I’ll be comparing two groups with similar characteristics out of which one will be those who develop the disease and other did not. I’m finding up the risk factors also.

So, I calculated my sample size by Denial’s Formula. And it gave me a sample size of 186 including 25% dropout rate. Is it wrong?

Denials formula is one which is given in Cochran book of Sampling Techniques. The formula is: n = Z^2 x p(1-p)/d^2

where Z is Confidence Interval = 1.96 for 95%

d (margin of error) = 0.05 for 5%

p (Prevalence) = 0.11 for 11.2%

This gives me n=151

adding 25% drop off rate it becomes 189 which is my final sample size.

HOWEVER, i will be conducting a duration based study like 4-5 months study in which i will be taking all the registered patients during that time period.

Finding out the sample size is only so that i have an idea of at least what should be the minimum number of subjects to be taken into account.

Please let me know what u think over it??

Thanks again.. u r doing a great deed, no one now a days takes out time and help people like this. Really appreciable!

Dear Omaid,

The formula is correct, but your application of the formula is not.

You need to calculate ‘d’ as a proportion of ‘p’- 5% of 11.2.

That should yield the required sample size.

Of course, the sample size will skyrocket, but it is expected with a prevalence as low as 11.2%.

First calculate the required sample size using my suggestion above, then calculate how many more you’d need assuming 25% loss to follow up.

That will give you the grand total required.

Typically, one does not compute sample size using an estimated loss to follow up of more than 20%.

Any loss to follow up more than 20% is considered unacceptably high and reflects poor patient selection or other methodological flaws.
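The two steps described above (sample size first, then the allowance for loss to follow up) can be sketched in Python as follows. The function names are hypothetical, proportions are in decimal terms, and the inflation step follows the commenter’s convention of adding a percentage of n; dividing n by (1 - loss) is another common convention and gives a somewhat larger total.

```python
import math

def sample_size_prop(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2, with p and d in decimal terms."""
    n = z ** 2 * p * (1 - p) / d ** 2
    return math.ceil(round(n, 9))

def inflate_for_loss(n, loss):
    """Add `loss` (a fraction such as 0.20) of n to allow for drop-outs."""
    return math.ceil(round(n * (1 + loss), 9))

p = 0.11                                  # value used in the question
n_abs = sample_size_prop(p, 0.05)         # d = 0.05 as an absolute precision
n_rel = sample_size_prop(p, 0.05 * p)     # d = 5% of p (relative precision)

print("absolute d:", n_abs, "->", inflate_for_loss(n_abs, 0.25))
print("relative d:", n_rel, "->", inflate_for_loss(n_rel, 0.20))
```

The absolute-precision route reproduces the 151 and 189 in the question, while taking ‘d’ as 5% of ‘p’ pushes the requirement into five figures.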

Hope this helps clarify things.

Regards,

Dr. Roopesh

if the prevalence is 30 to 50% then what will be the sample size?

Dear Fatima,

Using the formula for a cross-sectional study, and assuming an absolute precision of 10% (d= 10), the sample size would be between 84 (for p= 30%) and 100 (for p= 50%).

I would recommend that the larger sample size be used, though.

Regards,

Dr. Roopesh

CI 95% and margin of error 5%, plz tell in detail. thank you.

Dear Fatima,

Let us assume that the true population value is 100 units.

When we estimate something, there is always a chance of error. The amount of error indicates the precision of the estimate- smaller the error, more precise the estimate. The margin of error merely refers to the magnitude of error we would like to have.

If the error margin is 5%, then in the above example the estimate should be accurate to within +/-5% of 100 (the true population value).

The 95% CI indicates that if we drew samples 100 times and computed an interval from each, about 95 of those intervals would contain the true population value.

Combining the two:

The sample estimate will not differ from the true population value by more than +/-5 percent (margin of error) 95 percent of the time (confidence interval).
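A small simulation makes this concrete. The sketch below (Python; the true prevalence of 20%, the sample size of 400 and the number of repetitions are arbitrary assumptions for illustration) draws many samples, builds a simple Wald 95% interval from each, and counts how often the interval contains the true value.

```python
import math
import random

def coverage(true_p=0.20, n=400, repeats=2000, z=1.96, seed=0):
    """Share of simulated 95% intervals that contain the true proportion."""
    rng = random.Random(seed)            # fixed seed so runs are repeatable
    hits = 0
    for _ in range(repeats):
        # Draw one sample of n people; each is a case with probability true_p.
        cases = sum(rng.random() < true_p for _ in range(n))
        p_hat = cases / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / repeats

print(f"share of intervals containing the true value: {coverage():.3f}")
```

The share should come out close to 0.95, though not exactly, because the interval is an approximation and the simulation is finite.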

I hope this helps clarify your doubt.

Regards,

Dr. Roopesh

Today I actually got an understanding of margin of error and confidence interval. Thank you very much. Your way of teaching is absolute!

Dear Omaid,

Thanks for the kind words.

Regards,

Dr. Roopesh

Dear Dr. Roopesh

Hey & good-day

It’s with great pleasure that I write for you

I’m PhD student, I prepare myself to do my research, so I would like to ask you; How can I calculate sample size, if the prevalence of previous study is 30%?

Also, Is it equation satiable for quantitative variables

Dear Ammar,

The sample size calculation does not depend upon the type of variable.

However, the number of variables should not be excessive- that would decrease the power.

Please read the related article on a general rule of thumb for sample size calculation for more details.

Regards,

Dr. Roopesh

Dear Ammar,

I presume the study design is a cross sectional study. If so, the sample size would be:

233 (relative precision 20%) to 933 (relative precision 10%).

Please note that the calculation is based on the formula 4pq/ (d^2), where

p= (prevalence)=30

q=(100-p)=70

d= (precision)= either 10% or 20% of p

I would recommend the larger sample size as the possibility of having low power is reduced. However, practical considerations might dictate otherwise.

I hope this helps.

Regards,

Dr. Roopesh

Dear Dr. Roopesh

Hello and good-day

Many thanks for your information, I appreciated it. I asked before about quantitative variables, because I had read some articles talking about ” quantitative variables” so I ask you again “What’s the different?”

Also how you calculate the precision? please tell in detail. thank you.

Dear Ammar,

I have described quantitative and qualitative variables in this post:

https://communitymedicine4asses.wordpress.com/2013/01/13/types-of-data-qualitative-and-quantitative-data/

I have discussed precision in this post:

https://communitymedicine4asses.wordpress.com/2014/12/30/relative-and-absolute-precision-in-sample-size-calculation/

Please go through the above links and let me know if you continue to have queries.

Regards,

Dr. Roopesh

Dear Dr. Roopesh

Hello,

Many thanks for your guidance, but my question is still persists;

Is it equation satiable for quantitative or qualitative variables?

” Because I had read some articles talking about these ”

Dear Ammar,

I’m afraid I don’t understand what exactly you mean by the term ‘equation satiable’.

I have not come across any sample size estimation formula that is dependent upon the type of variable.

I have mentioned the elements required for sample size calculation in the articles on the topic.

Unless you specify/ clarify your question further, I’m afraid I will be unable to respond appropriately.

Regards,

Dr. Roopesh

Dear Dr. Roopesh

I’m so sorry for confusing you; I meant “equation suitable”.

Here the link talking about the sample size estimation formula that is dependent upon the type of variable;

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3775042/

Again many thanks for your guidance

Dear Ammar,

Thanks for the clarification and link. I disagree with the authors. I learnt the same formulae as:

‘Formula to calculate sample size when proportions are known;

Formula to calculate sample size when mean, SD is known’, etc.

The reason for my disagreement is that numeric variables can also be expressed as a proportion/ percentage. Therefore, to claim that the first formula (using proportions) is only applicable to qualitative data does not make sense to me.

Moreover, often, the only details available are the relative proportions, not mean/ SD. If the formula were applicable only to qualitative variables, most studies would never have seen the light of day!

You can test the veracity of my statement by searching for journal articles mentioning percentages/ proportions for outcome variables as opposed to mean/ SD.

You should find that the former far outnumber the latter; and that most of those articles describe quantitative variables.

In addition, you could seek expert opinion from someone else as well.

Do let me know what you discover- I might learn something in the process.

Regards,

Dr. Roopesh

Dear Ammar,

I haven’t heard from you since I posted my response.

I wish to clarify that I wasn’t being sarcastic when I said I could learn something through your efforts. I truly believe that every interaction is a learning experience, and that one could learn from anyone. Besides, I don’t profess to know everything about everything, either. Naturally, there is a distinct possibility of learning something through your response(s).

Regards,

Dr. Roopesh

Dear Dr. Roopesh

Good day

I apologize for delay reply to you

Thank you very much for the detailed explanation. This has been useful in my research

I will continue to search in the subject until I get to a satisfactory result

And I’ll tell you everything findings

With best regards

Ammar Elmajzoup

Dear Dr. Roopesh

As, sample size = 4PQ/L^2, Where , P= Prevalence & Q =100-P

L= Margin of error

If P = 20%,L=10%, Than Q=100-20=80%

Sample size = 4PQ\L^2

= 4*20*80\10*10 =64

Or some books have mention

L= 10 OF P =10*20\100 = 2

Than Sample size = 4PQ\L^2

= 4*20*80\4

= 1600

In that case which ans is correct.Either 1600 sample size needed or 64 sample size needed

Dear Haripal,

I would recommend the second approach.

With a sample size of 64, you may encounter issues with power. The same is unlikely with a sample size of 1600.

However, practical considerations would supervene.
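The two readings of L in the question can be compared side by side. This is a minimal sketch in Python; the function name and the `relative` flag are hypothetical, and values are in percentage terms as in the question.

```python
import math

def sample_size(p, precision, relative=False, z=2.0):
    """n = z^2 * p * (100 - p) / L^2, with p in percent.

    If relative is False, `precision` is L itself (percentage points).
    If relative is True, `precision` is a percentage of p, so L = precision% of p.
    """
    L = precision / 100 * p if relative else precision
    n = z ** 2 * p * (100 - p) / L ** 2
    return math.ceil(round(n, 9))    # round() removes floating-point noise

print("L = 10 percentage points:", sample_size(20, 10))
print("L = 10% of P:            ", sample_size(20, 10, relative=True))
```

The same “10%” therefore leads to very different sample sizes, which is why a methodology section should state whether the precision is absolute or relative.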

Regards,

Dr. Roopesh

Dear Dr. Roopesh,

You mentioned in the article that the sample size should be adequate to yield 80%. May I know which part of the formula that are related to power? For example, what value do I change if I want to increase or decrease the power?

We want to conduct a cross sectional study and from previous study the prevalence of disease is 6%. I would like to know how many samples I should include in the study so it does capture “adequate” number of people with the disease. Is the formula given a right formula to use in this context?

Many thanks and your help is much appreciated.

Regards,

Betsy

Dear Betsy,

The formula mentioned in the article is for the situation when a proportion is the parameter of study. The actual formula is:

n= (Za^2 *p*q)/ d^2

where

Za (Z alpha)= Standard Normal Deviate (Z value)

p= proportion/ prevalence of interest

q= 100-p

d= expected precision

Alpha refers to the Type I error rate. This is usually kept at 5% (ie 0.05).

The corresponding value of Z (for alpha of 0.05) is 1.96.

The formula given in the article is a simplification- as 1.96 is ~2,

Za^2 = (2^2)= 4,

yielding the formula

n= 4pq/d^2

Alpha error of 0.05 (5%) corresponds to Confidence Interval of 95%.

If you wish to alter the Confidence Interval up or down, one merely needs to change the value of Za in the formula.

For 99% CI, Za= 2.58

For 99.9% CI, Za= 3.29

The above formula does not permit alteration of power. For that, one may use the following:

n= [(Za+Zb)^2 * (p1q1+p2q2)]/ d^2

where

Za (Z alpha)= Z value for alpha error

Zb (Z beta)= Z value for beta error

p1= proportion in first group;

q1= 1-p1

p2= proportion in second group

q2= 1-p2

d= clinically meaningful difference between two groups

Zb influences the power

Za influences the Confidence Interval

Power (1 - beta) is usually set at 80%, that is, beta= 20% (Zb= 0.84)

The n obtained yields the required number for each arm/ group

With a prevalence of 6%, the sample size would be

4*6*94/ (6*0.2)^2

= 2256/1.44

= 1566.7 or 1567
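The Z values and the two-group formula can also be computed rather than looked up. The sketch below is in Python; the function names and the example proportions (20% versus 30%) are assumptions for illustration. It reads the numerator of the two-group formula as (Za+Zb)^2 multiplied by the sum p1q1 + p2q2, which is the usual form, and takes d as the difference p1 - p2.

```python
import math
from statistics import NormalDist

def z_two_sided(confidence):
    """Za for a two-sided confidence level, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def z_power(power):
    """Zb for a given power, e.g. 0.80 gives about 0.84."""
    return NormalDist().inv_cdf(power)

def n_per_group(p1, p2, confidence=0.95, power=0.80):
    """Per-group n = (Za + Zb)^2 * (p1*q1 + p2*q2) / d^2, with d = p1 - p2.

    Proportions are in decimal terms.
    """
    za, zb = z_two_sided(confidence), z_power(power)
    d = p1 - p2
    n = (za + zb) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2
    return math.ceil(n)

for level in (0.95, 0.99, 0.999):
    print(f"{level:.1%} CI: Za = {z_two_sided(level):.2f}")

# Hypothetical comparison: 20% in one group versus 30% in the other.
print("n per group at 80% power:", n_per_group(0.20, 0.30))
print("n per group at 90% power:", n_per_group(0.20, 0.30, power=0.90))
```

Raising the power or the confidence level increases Zb or Za respectively, and with it the required number in each group.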

I hope this helps.

Regards,

Dr. Roopesh

Dear Dr. Roopesh,

Many thanks for you prompt reply. Please correct me if I am wrong, if we think of the set up as hypothesis testing, the first formula is for H0: p = 6% vs H1: p not equal 6%, the second formula is for comparison between two groups?

For the first formula, is that mean that power is fixed at 80% or it has nothing to do with power at all?

The prevalence of disease of my population is 6%, I understand that I need n that is big enough so it contains some people with the disease. I was a asked to get the value of n such that the study has 80% power to capture adequate number of people with disease. Is this the same as using your first formula? Or the 80% power just can’t be used in this way?

Sorry for the long question. Your help is much appreciated.

Regards,

Betsy

Dear Betsy,

If my memory serves me right, the first formula is fixed at 80%. You could alter the denominator (precision) to improve your chances of capturing individuals with the disease.

I am travelling at present, so am unable to provide a better response.

If you will bear with me, I will provide a detailed/ more accurate response upon my return.

Regards,

Dr. Roopesh

Dear Dr. Roopesh,

Many thanks for your quick reply. I am looking forward to hear from you again.

Regards,

Betsy

Dear Betsy,

Based on my research, the first formula does not include beta, so power cannot be altered (or estimated). However, a smaller alpha yields a larger n from the formula, and a larger n increases power.

A statistician friend told me that one does not compute power for the first formula.

According to him, power computations are restricted to hypothesis testing situations (where one is dealing with two proportions).

I have managed to obtain a formula for estimating beta from alpha:

Zb = [n(p1-p0)^2/2pq]^(1/2) – Za

where

Zb = Z beta

n= sample size

p1= proportion as per Ha (alternative hypothesis)

p0= proportion as per H0 (null hypothesis)

The reference for the formula is:

Case Control Studies design, conduct, analysis by

James J Schlesselman

1982,

Chapter: Sample Size

Page: 149

One always has the option of performing post-hoc power analysis. In addition, if you have an estimate of the effect size, it may be possible to estimate power.
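The formula for Zb above can be turned into a power estimate by looking up the probability that corresponds to Zb. The following Python sketch makes two assumptions that are not stated above: that p in the formula is the average of p0 and p1 (with q = 1 - p), and that the example values (n = 291, p0 = 20%, p1 = 30%) are purely illustrative. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def power_from_n(n, p0, p1, za=1.96):
    """Power implied by Zb = sqrt(n * (p1 - p0)^2 / (2*p*q)) - Za.

    Assumption: p is taken as the average of p0 and p1, and q = 1 - p.
    Pass za=1.645 for a one-tailed test at alpha = 0.05.
    """
    p_bar = (p0 + p1) / 2
    q_bar = 1 - p_bar
    zb = math.sqrt(n * (p1 - p0) ** 2 / (2 * p_bar * q_bar)) - za
    return NormalDist().cdf(zb)      # probability corresponding to Zb

print(f"two-tailed (Za = 1.96):  {power_from_n(291, 0.20, 0.30):.2f}")
print(f"one-tailed (Za = 1.645): {power_from_n(291, 0.20, 0.30, za=1.645):.2f}")
```

For the same n, a one-tailed alternative gives higher power than a two-tailed one because Za is smaller.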

Apologies for the delay.

I hope this helps.

Regards,

Dr. Roopesh

Dear Betsy,

Please note that the alternative hypothesis is one-tailed and not two-tailed.

Regards,

Dr. Roopesh

Hello Sir,

Kindly, inform me whether I can use this formula to determine the sample size in a community based study in social science, particularly Psychology. As I have to find the prevalence of a syndrome in the population.

Can this formula be used for any other social science subject, like: Tourism, Management etc. ?

Regards

Dear Vipasha,

The formula is applicable for all cross-sectional studies regardless of the subject.

Hope that helps.

Regards,

Dr. Roopesh

Respected Sir,

Thank you so much for the quick response. It is really going to help me out a lot.

Regards and best wishes,

Vipasha Kashyap,

Doctoral Student,

Department of Psychology,

Himachal Pradesh University

Hello Sir,

I have one more query. Would it be appropriate to calculate the denominator (precision) as 12% of ‘p’. Because in my study, if I am calculating it, at 5 or 10% the sample size is coming out very large. Kindly, suggest me a reason which I could write in my methodology as an explanation for calculating the precision at 12%. I have to find the ‘p’ first by conducting a pilot study.

I shall be highly thankful to you.

Best wishes and regards,

Vipasha

Dear Vipasha,

The maximum acceptable level for precision is 20% of p. It should not be very difficult to obtain references for the same. Try looking up good epidemiology and biostatistics books. Research methodology books may also help.

Regards,

Dr. Roopesh

Hello SIR,

Thank you so much for the help once again.

Best wishes, & kind regards,

Vipasha Kashyap

Hello sir, I am Najihah. I am doing research on differences in gut microbiota between obese and lean adult Malaysians. May I know how to calculate the sample size?

Dear Najihah,

You will need to first perform a literature review to ascertain the prevalence of one or more gut microbes of interest in the specific population. Calculate sample size for each of them, then select the largest sample size as the required sample size for your study.

Regards,

Dr. Roopesh

Dear Dr Roopesh

I am carrying out a study of dry eye among diabetics in comparison to non-diabetics. My study design is a comparative cross-sectional design. I wonder if this is right and if so, what formula should I use to calculate my sample size. Thanks

Dear Anonymous,

The study design depends upon your research question.

If you are attempting to determine the prevalence of dry eye, it would be a cross-sectional study.

If you are trying to compare the two- occurrence of dry eye in diabetic vs non-diabetic, it would have to be a case-control study.

The sample size calculation depends upon the study design.

I hope this helps.

Regards,

Dr. Roopesh

Found all the information here extremely useful.

Dear Ayeola,

Thank you for the kind comment.

I hope you visit again.

Do recommend this blog to your friends as well.

Regards,

Dr. Roopesh

Reblogged this on mwebazavanessa and commented:

this is amazing.God knows I needed this.I was perplexed at the fact that I had to look at this level of Math again.

Dear Muwimmv,

Thanks for sharing and the kind comment.

Regards,

Dr. Roopesh

Dr Roopesh, Please I am finding it difficult to identify an estimated proportion to enable calculate the sample size of a purely cross sectional descriptive study on decentralization of PHC services. I am doing an all purposive sampling

Dear Uche,

The sample size will depend upon your objective(s).

I might be able to better help you if you provide a sample objective.

The sampling method will affect validity, not sample size.

Regards,

Dr. Roopesh

Dear Dr. Roopesh,

Currently, I’m doing a cross-sectional study involving measurements of parameters value (numerical) from MRI images. I’m having a problem in calculating the number of sample needed for validation process in order to calculate the inter-rater agreement using intraclass correlation coefficient (ICC)

The questions that I want to ask are :

1. Is the calculation method more or less same like the calculation for determination of sample size for my study ?

2. Is there any way that the number of sample for validation can be determined just by random assumption in case of no previous study done before ?

Regards,

Hazim

Dear Dr. Roopesh,

i am trying to conduct a study about “Frequency of diabetes in pregnant women at first antenatal visit” in our hospital. according to literature the prevalence rate is 1%. what formula should i use to calculate sample size and margin of error?

thank you,

Dear Dr. Roopesh,

please can you provide the sample size and margin of error calculation formulas for the following:

1) Clinical Randomized Control (CRT)

2) Cohort Study

3) Cross-section studies (i have got it from you above article but i am little confuse about margin of error. should i consider “d” as margin of error? )

thank you

Salman Karim

Dear Salman Karim,

I am travelling, and unable to respond to your queries at present.

Please allow some time for the same. I will respond as soon as possible.

Regards,

Dr. Roopesh

Dear Salman Karim,

The ‘d’ mentioned in the article refers to ‘relative precision’, which equates with ‘margin of error’.

However, you may find the following links useful:

http://www.dummies.com/how-to/content/how-to-calculate-the-margin-of-error-for-a-sample-.html

http://www.isixsigma.com/tools-templates/sampling-data/margin-error-and-confidence-levels-made-simple/

The formulae for Randomized Controlled Trials (RCTs) vary with the type of RCT- factorial design/ parallel design/ cross-over design; non-inferiority trial/ superiority trial/ bio-equivalence trial, so you will have to specify which trial is of interest to you.

The following URL links to an article containing formulae for a variety of epidemiological study designs:

http://www.gerontologija.lt/files/edit_files/File/pdf/2006/nr_4/2006_225_231.pdf

Hope this helps

Dr. Roopesh

Dr, my research is prevalence of diarrhoea in rural areas and using only questionnaire form. if the prevalence is 45 % what should be the sample size. the confidence interval be 95 %.

Dear Ali,

The estimated sample size would be:

(4*45*55)/(45*0.2)^2

that is:

9900/81 or 122

Once you have concluded the study, compare the prevalence obtained in your study with the estimate used in sample size calculation (45%).

You may use the following method to calculate 95% CI of the study prevalence:

http://www.dummies.com/how-to/content/how-to-determine-the-confidence-interval-for-a-pop.html

Hope this helps.

Dr. Roopesh

dear Roopesh,

I am doing comparative cross-sectional study where i am planning to use cluster sampling in choosing sample unit. I wanted to keep difference of Sd in my two study setting 0.5 , 95% CI and with design effects 3 and power 80%. I want to insight from you in my sampling technique..

hoping to hear soon

with regards,

shneha

Dear Ms. Acharya,

The calculation seems fine- except the use of design effect of 3, that is.

Such a high design effect indicates very high inter and/or intra-cluster variability.

I would recommend evaluating the necessity for such a high design effect, preferably through a pilot study.

The reason for my suggestion is simple- if you can obtain your answers with a lower sample, you have no reason to waste resources by taking a larger sample. In the specific case of your proposed study, a design effect of 3 would imply trebling the initial sample size, while lowering the design effect to 2 would mean sampling ‘only’ twice the initial sample size.

You have not mentioned the number of clusters and size of each cluster.

My recommendation would be to increase the number of clusters while reducing the size of each cluster. This will have the effect of increasing power. You may discuss this with your statistician/ epidemiologist (to learn what happens and/ or how this occurs).
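To see why smaller clusters help, here is a rough sketch using a commonly used approximation for the design effect, deff = 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intra-cluster correlation. The ICC of 0.05 and the starting sample of 100 are hypothetical values chosen only for illustration, and the function names are my own.

```python
import math

def design_effect(cluster_size, icc):
    """Approximate design effect: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

def inflate_for_clustering(n_simple_random, deff):
    """Multiply the simple-random-sample size by the design effect."""
    return math.ceil(n_simple_random * deff)

# Hypothetical values: ICC = 0.05, sample of 100 under simple random sampling.
for m in (41, 21, 11):
    deff = design_effect(m, icc=0.05)
    print(m, round(deff, 2), inflate_for_clustering(100, deff))
```

With these assumed values, clusters of 41 give a design effect of about 3 (300 subjects), clusters of 21 give about 2 (200 subjects), and clusters of 11 give about 1.5 (150 subjects). The actual ICC should come from a pilot study or published literature.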

Regards,

Roopesh

Dr,

Currently I am trying to look into family functioning and other exposures such as coping mechanisms and the prevalence of PTSD in the family, and their relation to PTSD in children.

My desired CI is 95% with power 0.8, but Dr, I am confused about which formula will be best for calculating the sample size. I hope to get your insight, Dr.

With Regards,

Shneha Acharya

Hello sir, I’m doing a research project for my studies on the association of stress with dietary intake and anthropometric measurements among undergraduates. May I know what kind of formula I should use for this?

Dear Mohanapriyah,

The formula will depend upon your study design.

Regards,

Dr. Roopesh

Sir, my study design is a cross-sectional study. What formula shall I use, sir?

Dear Someone,

You may use the formula provided in the article.

Regards,

Dr. Roopesh

Hi! I would like to apologize ahead of time if this post will be lengthy. I have been really troubled lately about the reliability of my study and I am about to have my Final defense on my paper (on March 14).

I conducted a study entitled “Situation of Drug Resistant Tuberculosis in the Municipalities of Molave and Tambulig, Zamboanga del Sur”. At the beginning of the study, I did not apply any form of sampling design. What I did was just considered the entire population and did purposive sampling. My study was also cross sectional.

Anyway, if I would have gone back and computed my sample size, how would I do it? The following are considered.

1) Total population where my samples were taken is 1,208 previously treated tuberculosis patients.

2) My objectives are the following:

a. Identify prevalence of TB symptoms among previously treated TB patients (there is no given statistics for this)

b. Identify prevalence of MDRTB among previously treated patients (there is a national incidence rate of 5.7% among patients being treated for Tuberculosis)

In the course of my study, I was able to interview 368 out of the 1,208 patients. Out of the 368, 124 turned out to be positive for symptoms. I required all 124 for testing for MDRTB but only 83 showed up and were hence tested. Out of the 83, only 1 turned out positive for MDRTB.

Can you please help? I’m sorry if I am hardly making any sense. Haha. I have been told that my research was a mess (and I believed that – having had no experience prior and having too little of a guidance doing it). Thank you so much!

Dear Kerwin,

I’m afraid I was unable to respond earlier, so this response is probably not of much benefit to you, considering that your thesis defense is over.

Firstly, you could have applied cluster sampling to obtain the sample size.

Next, I have several questions for you:

Were the patients treated for pulmonary TB only, or did you include extra-pulmonary TB as well?

What was the time frame under consideration- those who were treated within the last year/ last 2 years/ last 5 years?

Were HIV positive individuals (HIV-TB co-infection) included? This would affect the probability that one would continue to be symptomatic after treatment completion; it would also influence the risk of developing MDR-TB.

What was the treatment outcome of the subjects- In India following first line treatment, sputum+ve pulmonary TB patients may either be ‘cured’ (sputum+ve at start of treatment, but sputum-ve at end of intensive phase and end of treatment), ‘treatment completed’ (sputum+ve initially, then sputum-ve after intensive phase, but not at end of treatment), or ‘failure’ (initially sputum+ve, continues to remain sputum+ve till end of treatment)?

What was the minimum time after cure/treatment completion for subjects to be included?

Were smokers/ ex-smokers included- COPD/ post-TB Bronchiectasis excluded (“symptoms of TB”)?

What is the mortality rate due to TB in your country- did you factor that in your calculations/ estimations regarding how many would be alive/ available?

Was Diabetes considered as a major risk factor during study design/ analysis?

Hopefully the above queries will help clarify things for you.

Apologies for the delay in responding.

Regards,

Dr. Roopesh

i am doing a study on comparison of bronchial wash and biopsy with bronchial brushing and bronchial biopsy. i want to calculate my sample size.

Dear Sumida,

Please indicate the study design, research question(s) and objective(s).

Regards,

Dr. Roopesh

Dear Dr. Roopesh

I am planning a study titled ‘estimation of measles antibody levels in aged 0 to 9 months healthy children’ and Compare the titre levels between groups of each month (0-1, 1.1-2, 2.1-3, 3.1-4 etc) as well as between groups of each quartile age (0-3 months, 4-6 months, 7-9 months)

please suggest the sample size needed and which formula to use

Thanking you

Dear Govind,

Your requirements involve the use of ANOVA.

I recommend you use the free software G Power to estimate sample size.

Link to G Power download site (University of Dusseldorf):

http://www.gpower.hhu.de/en.html

Link to G Power Manual :

http://www.gpower.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPowerManual.pdf

Link to G Power tutorial:

http://www.gpower.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPowerShortTutorial.pdf

Regards,

Dr. Roopesh

Hello Dr.

I’m doing a cross sectional analytical study on physical activity and postpartum depression(PPD) in women. I’m trying to find prevalence of PPD and the association between the PPD and physical activity. what sample size formula am i expected to use

Dear Grace,

The prevalence of moderate -vigorous physical activity (MVPA) among women with postpartum depression is around 32% according to a study.

You will have to perform a detailed literature search to obtain prevalence rates from several studies.

Then, compute the sample size requirement using the lowest prevalence and highest prevalence reported in literature. That will give you a range within which your own study’s sample size should fall.

The estimation may be further refined by performing a pilot study in your area and using the prevalence thus obtained for sample size estimation.

If you want to play it safe with regards to power considerations, simply use the smallest reported prevalence for sample size estimation- it should yield the largest required sample size.
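The effect of the chosen prevalence on sample size can be sketched as follows. The prevalence values of 5% and 20% are simply the range used in the article's Hepatitis B example, not values for postpartum depression, and the sketch assumes the article's formula 4pq/d^2 with d taken as 20% of p. The function name is illustrative.

```python
import math

def n_relative(p_percent, rel=0.20, z=2.0):
    """Sample size from 4pq/d^2 (z rounded to 2), with d = rel * p, in percent."""
    q = 100.0 - p_percent
    d = rel * p_percent
    return math.ceil(z ** 2 * p_percent * q / d ** 2)

# Illustrative range of reported prevalences (percent).
reported = [5.0, 20.0]
sizes = {p: n_relative(p) for p in reported}
print(sizes)
print("plan for between", min(sizes.values()), "and", max(sizes.values()))
```

Under these assumptions the lowest prevalence (5%) requires 1900 subjects and the highest (20%) requires 400, so the smallest prevalence drives the largest sample. Note that this pattern depends on using relative precision; with a fixed absolute precision the required sample is largest when p is 50%.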

Hope this helps.

Regards,

Dr. Roopesh

PS: You may need to frame your research question carefully, as the prevalence of MVPA varies with time after delivery, as does the occurrence of postpartum depression.

Dear Gracy,

Please use the formula provided for cross-sectional studies.

Regards,

Dr. Roopesh

Dear Dr Roopesh,

I want to conduct a simple descriptive assessment on the healthcare behaviors of patients in a specific department of a public sector health facility (specifically I would like to know for eg if the patients come to this facility for 2nd opinion, if this facility is their first choice, why do they choose it, would they go to a private facility if they could afford it etc.). I am struggling a bit with: 1)sample size (as I am not sure which size of the population should I choose (is it the average number of patients admitted to this department by day, or by week ,or by month??), 2)the period over which I should be conducting the study (should I choose one day per week and question all patients coming into the department during that day say for a month, or 2; or maybe 2 days a week on a period of 1 month etc.)

Hope you would be able to help!

Thanks

Karol

Respected Sir,

Kindly suggest a few references for cluster sampling and the formula used to form the clusters (from the population of the concerned area). I want to form clusters for one of my research studies.

Regards,

Vipasha Kashyap

Dear Vipasha,

You may find the following useful:

http://www.stat.columbia.edu/~gelman/stuff_for_blog/chap20.pdf

http://ocw.jhsph.edu/courses/statmethodsforsamplesurveys/pdfs/lecture5.pdf

http://www.stat.purdue.edu/~jennings/stat522/notes/topic5.pdf

http://www.ph.ucla.edu/epi/rapidsurveys/RScourse/whostatquarterly44_98_106_1991.pdf

http://www.dental.usm.my/aos/docs/Vol_1/09_14_ayub.pdf

Hope the above helps.

Regards,

Dr. Roopesh

Respected Sir,

Hope you are doing well.

I’m really thankful to you for helping me out always. This blog is very beneficial for new researchers like me.

Kindly, suggest me a few references for the method developed by Institute for Research in Medical Statistics as an alternative to probability proportionate to size.

I ‘m also interested in knowing whether this technique can be applied in social-science research?

Regards,

Vipasha Kashyap

Dear Vipasha,

There are several alternatives for that approach.

I am unable to help you as you have not provided the specific name for the alternative of interest.

In my opinion, a technique is merely a tool. I do not see why a particular method cannot or should not be adapted for a domain it was not originally devised for.

Regards,

Dr. Roopesh

Dear Dr Roopesh,

Greetings.

I am now doing a comparison study among vegetarians and non-vegetarians.

Purpose: to compare lifestyle factors, dietary intake and physical activity among vegetarians and non-vegetarians.

In the meantime, I plan to use a cluster sampling design in choosing respondents.

My question is: should I name my study a comparison cross-sectional design or just a comparison study design?

Thank you.

Dear Qi,

A cross-sectional study is any study that engages with study subjects only once during the period of study.

Thus, each subject contributes only one set of responses/ values to the study during its tenure.

If the subjects are interviewed/ investigated on more than one occasion, the study design then changes to a longitudinal study.

In such studies, subjects contribute more than one set of values to the study- obtained at different points in time.

If your study involves interviewing subjects just once during the study period, it is a cross-sectional study. It doesn’t matter if the total time taken to interview all subjects is 1 or even 2 years, as long as each subject was interviewed only once.

How you obtain/ recruit the subjects is the purview of sampling, and does not affect the study design.

Please note that all epidemiological studies involve comparisons. Therefore, to call your study a comparison study design would be of no benefit (there is no such study design).

I wonder why you wish to use cluster sampling, though, as it is a less than robust method- unless you have a large sample size/ geographical area to cover, and desire the convenience of cluster sampling.

Hope this helped.

Regards,

Dr. Roopesh

Dear Dr. Roopesh,

I’m new in research and now I need to conduct one for my thesis. It is about the prevalence of optic neuritis in a particular hospital, also its clinical presentation (e.g visual acuity, visual field, color vision). The study design is cross sectional, with the sampling is consecutive sampling from medical records. May I know what formula I’m expected to use to calculate the sample size? Please tell in detail. Thanks in advance for your guidance.

Dear Someone,

The calculation of sample size for cross sectional studies requires the use of the formula mentioned in the above article: 4pq/ l^2

Since you have more than one objective, you will need to perform a detailed review of literature and obtain prevalence values from published studies in similar settings. Identify the setting that is most like your own, then take the lowest value of prevalence for optic neuritis. Impute the value in the formula to obtain a sample size estimate for optic neuritis.

Next, establish threshold levels for visual acuity, visual field and color vision depending upon your hypotheses- visual acuity less than a/b; etc.

Perform the same procedure as for optic neuritis- review of literature, then selection of prevalence value to estimate sample size.

Once you have calculated sample size for each objective, select the largest sample size as the required sample size for your study. That way the study will be adequately powered for all objectives.

I hope this helps.

Regards,

Dr. Roopesh

Hello sir,

At the outset, thank you so much for the valuable discussion. I am planning to do a comparative cross sectional study among obese and non-obese children to see if there is any association (of course not causal) between dental caries and obesity. How should I calculate the sample size? What data do I need to derive an appropriate sample size? My prespecified hypothesis is that obesity and dental caries can have common risk factors and hence there could be an association between dental caries and obesity.

Thanks a lot.

Dear Dhyan,

Please state your objective(s).

Regards,

Dr. Roopesh

Hello sir,

To check association between obesity as identified by BMI scores and caries experience…. so caries experience of normal weight individuals will be compared with that of obese individuals.

Dear Someone/ Dhyan,

You will need to state the study population.

In that population, you will have to determine the prevalence of obesity and caries (separately) from published literature.

You will typically have a few prevalence values for obesity in that study population, and a similar number for caries.

Calculate sample size for each value, and take the largest sample size thus obtained.

If there is a similar study, use the smaller value for prevalence of dental caries among obese subjects/ non-obese subjects to calculate the sample size.

Hope this helps.

Regards,

Dr. Roopesh

Hello sir,

In my work I am going to compare the serum levels of a certain protein in obese and non-obese adults. I am having difficulty with the sample size calculation. Please can you help?

Dear Bright S. Letsu,

You will need to tell me your study’s objectives, and the prevalence of serum protein levels as reported by other investigators.

Regards,

Dr. Roopesh

Dear Dr Roopesh! I appreciate your inputs for the researchers.

Dear Mahpara,

Thank you for the words of encouragement.

I’m glad you find this blog useful.

Do spread the word about this blog so that many others may benefit as well.

Regards,

Dr. Roopesh

Dear Dr. Roopesh,

Thank you for your post. This is indeed helpful.

I am trying to estimate the sample size of a national oral health survey.

From the previous report, the prevalence rates of dental caries and gum diseases were 90% and 85%, respectively. Using the formula you provided, I came up with n=385 for dental caries and 545 for gum diseases, with a margin of error of 3% and 95% CI.

I would like to ask, how could I make sure that the power of this study is 80% or more?

Thank you.

Qin

Dear Qin,

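The formula provided above is built around precision (margin of error) at a chosen confidence level, and contains no explicit term for power. It is often presumed to carry a power of 80%. I say presumed because I have been unable to find anything to support that assumption.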

I would recommend that you try to use the largest sample size obtained.

The only way to assure yourself of a power of 80% or more is to take a small value for ‘L’. This will cause inflation of the sample size, and increase power.

In your case, I would take a smaller margin of error to be certain of the power- provided it is feasible.
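The figures in the question, and the effect of tightening the margin of error, can be reproduced with a small sketch. It assumes the unrounded Za of 1.96 and an absolute margin of error, with p and d in decimal form; the function name `n_absolute` is my own.

```python
import math

def n_absolute(p, d, z=1.96):
    """Sample size N = z^2 * p * (1 - p) / d^2, with p and d as decimals."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

for label, p in (("dental caries", 0.90), ("gum diseases", 0.85)):
    # Compare a 3% margin of error with a tighter 2% margin.
    print(label, n_absolute(p, 0.03), n_absolute(p, 0.02))
```

With a 3% margin this gives 385 and 545, matching the values quoted in the question. Tightening the margin to 2% raises them to 865 and 1225, which shows how quickly the sample grows as the margin shrinks.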

I hope this helps.

Regards,

Dr. Roopesh

Hi Sir,

I am doing a cross-sectional study to determine the risk factors of diarrhoea in children under 5 years. If the prevalence of diarrhoea is 24.4% in one study while another study reports 28%, what should the sample size be? The confidence level is 95%. Also, I’m not sure if a cross-sectional study is the best option for my study.

Regards

Dear Ali,

You should take the lower prevalence to calculate sample size.

The appropriate study design depends upon the research question and objective(s). The way the research question is framed influences the study design.

I hope this helps.

Regards,

Dr. Roopesh

I am doing a study in a hospital where a limited number of patients are coming. Can I use finite population correction for calculating the sample size?

Dear Ms. Vani,

Thank you for your query, and apologies for the delay in responding.

Yes, you can use finite population correction. It is generally applied when the sample makes up more than 5% of the population.

The advantage of using the finite population correction is a reduction in required sample size.
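A minimal sketch of one widely used form of the correction, n = n0 / (1 + (n0 - 1) / N), is given below. Here n0 is the sample size from the usual formula and N is the size of the population. The function name and the example values (n0 = 385, N = 1000) are hypothetical and chosen only to illustrate the reduction.

```python
import math

def finite_population_correction(n0, population_size):
    """Adjust an initial sample size n0 for a finite population of size N.

    Uses n = n0 / (1 + (n0 - 1) / N); illustrative helper only.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))

print(finite_population_correction(385, 1000))       # small population
print(finite_population_correction(385, 1_000_000))  # very large population
```

With these assumed values the requirement drops from 385 to 279 for a population of 1000, while for a population of one million the correction makes no practical difference.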

Hope this helps.

Regards,

Dr. Roopesh

Asslam Alaikum dear…

I am going to start my research, which is a descriptive cross-sectional study, but I am finding it difficult to calculate the sample size. Can you guide me on this?

Dear Rashid,

The sample size calculation would depend upon your objectives.

Please go through previous questions in this thread- you will find guidance on the same.

If after that you need specific assistance, feel free to write in.

Regards,

Dr. Roopesh

Hi Doctor,

I am doing a study on gender differences in stroke. It will be a cross-sectional study, with my objectives looking for differences in risk factors and outcome between sexes. Since I am comparing both sexes I am confused on what formula to use. Please help!

Lydia

Dear Lydia,

There are a few things to consider:

1. From an epidemiological perspective: In a cross sectional study there are technically no groups to start with- just the sample. Groups are created during analysis on the basis of characteristics of interest. This is because one doesn’t know the proportion of one or other type of subject in advance. Hence, the formula for sample size calculation in a cross sectional study will be applicable. You merely need to perform a detailed literature review and ascertain the gender wise proportion of subjects with stroke who have the outcome of interest. Choose the lowest prevalence of all such outcome variables for estimation of sample size. During analysis, group subjects by the factors of interest and analyze.

2. The biostatistician’s perspective: The requirement is to compare two proportions, so use the formula for comparing two proportions (links provided below), and obtain the sample using any study design.

As you would have probably gathered, the choice of approach depends upon the perspective of the investigator. Both are acceptable. However, some may insist on adhering to only one approach out of personal preference.
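For the second approach, a rough sketch of a standard formula for comparing two independent proportions with equal group sizes is n per group = (Z(1-alpha/2) + Z(1-beta))^2 * (p1*q1 + p2*q2) / (p1 - p2)^2. The proportions used below (30% versus 20%) are hypothetical, and the function name is my own. Other calculators may use a pooled variance or a continuity correction and will give slightly different answers.

```python
import math
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of two proportions."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical: outcome seen in 30% of one sex and 20% of the other.
print(n_per_group_two_proportions(0.30, 0.20))
```

Under these assumptions about 291 subjects would be needed in each group. Unlike the precision-based formula in the article, this calculation includes power explicitly.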

Useful links:

https://select-statistics.co.uk/calculators/sample-size-calculator-two-proportions/

http://powerandsamplesize.com/Calculators/Compare-2-Proportions/2-Sample-Equality

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC137461/pdf/cc1521.pdf

http://hansheng.gsm.pku.edu.cn/pdf/2007/prop.pdf

Please note that the first and third links are to simpler resources, while the second and fourth are more technical.

I hope this helps.

Regards,

Dr. Roopesh

good afternoon sir,

i am doing a study on de-escalation of antibiotic. my primary objective is the prevalence of antibiotic de-escalation. prevalence from previous study range 32-93%. what would be the sample size for my study? should i set the study duration first or calculate the sample then only decide the study duration for patient’s enrollment? i hope you can guide me with this matter sir.

thank you.

Dear Faridah,

You need to first estimate sample size for your study using the smallest feasible prevalence value (depending upon available resources- including time).

The study duration would be determined by the time required to obtain the required sample in the study setting (hospital/ community, etc.).

If the sample size required is 200, and you can obtain only 20 patients each month, then the study duration would be 10 months.
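The same reasoning as a tiny sketch, rounding up so that a partial month still counts as a month of enrollment. The function name is illustrative only.

```python
import math

def study_duration_months(required_n, recruits_per_month):
    """Months of enrollment needed to reach the required sample size."""
    if recruits_per_month <= 0:
        raise ValueError("recruits_per_month must be positive")
    return math.ceil(required_n / recruits_per_month)

print(study_duration_months(200, 20))   # the example above
print(study_duration_months(210, 20))   # a partial month rounds up
```

The first call gives 10 months as in the example; the second gives 11, since the last 10 patients still need part of an extra month.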

I hope this helps.

Regards,

Dr. Roopesh

Yes, it really helps. Thanks a lot, sir.

My study is to assess peak expiratory flow rate in school children of Raipur city, Chhattisgarh. I am not able to calculate the sample size because my study is not a prevalence study. Many studies have been conducted in different parts of India, yet none of them have mentioned the steps or the criteria involved in the calculation of the sample size. Kindly suggest how to approach the calculation, as mine is a cross-sectional study.

Dear Smita,

Please note that a cross-sectional study is also called a prevalence study.

The formula provided in the above article will suffice to calculate sample size.

Please go through the article and get back to me if you have additional queries.

Regards,

Dr. Roopesh