Quantitative study designs: Cohort Studies


Cohort Study

Did you know that the majority of people will develop a diagnosable mental illness whilst only a minority will experience enduring mental health?  Or that groups of people at risk of having high blood pressure and other related health issues by the age of 38 can be identified in childhood?  Or that a poor credit rating can be indicative of a person’s health status?

These findings (and more) have come out of a large cohort study started in 1972 by researchers at the University of Otago in New Zealand.  This study is known as The Dunedin Study and it has followed the lives of 1037 babies born between 1 April 1972 and 31 March 1973 since their birth. The study is now in its fifth decade and has produced over 1200 publications and reports, many of which have helped inform policy makers in New Zealand and overseas.

In Introduction to Study Designs, we learnt that there are many different study design types and that these are divided into two categories:  Experimental and Observational. Cohort Studies are a type of observational study. 

What is a Cohort Study design?

  • Cohort studies are longitudinal, observational studies, which investigate predictive risk factors and health outcomes. 
  • They differ from clinical trials, in that no intervention, treatment, or exposure is administered to the participants. The factors of interest to researchers already exist in the study group under investigation.
  • Study participants are observed over a period of time. The incidence of disease in the exposed group is compared with the incidence of disease in the unexposed group.
  • Because cohort studies are observational, they can show an association (correlation) between a risk factor and a disease, but on their own they cannot establish that the risk factor causes the disease.
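The comparison of incidence between the exposed and unexposed groups can be sketched as follows. This is a minimal Python illustration with hypothetical counts, not data from any study. It assumes incidence is expressed as a rate per 1,000 person-years, which is one common convention, and compares the two groups with a rate ratio and a rate difference.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical follow-up summary for one exposure group."""
    cases: int            # new cases of the outcome during follow-up
    person_years: float   # total follow-up time contributed by the group

    def rate_per_1000(self) -> float:
        # Incidence rate expressed per 1,000 person-years
        return 1000 * self.cases / self.person_years


# Illustrative numbers only
exposed = Group(cases=24, person_years=4000)
unexposed = Group(cases=12, person_years=6000)

rate_ratio = exposed.rate_per_1000() / unexposed.rate_per_1000()
rate_difference = exposed.rate_per_1000() - unexposed.rate_per_1000()

print(f"Exposed:   {exposed.rate_per_1000():.1f} per 1,000 person-years")
print(f"Unexposed: {unexposed.rate_per_1000():.1f} per 1,000 person-years")
print(f"Rate ratio: {rate_ratio:.2f}, rate difference: {rate_difference:.1f} per 1,000 person-years")
```

A rate ratio above 1 means incidence is higher in the exposed group. As noted above, this shows an association, not proof of causation.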

Cohort studies are useful if:

  • There is a persuasive hypothesis linking an exposure to an outcome.
  • The time between exposure and outcome is not too long (a long interval adds to study costs and increases the risk of participant attrition).
  • The outcome is not too rare.

The stages of a Cohort Study

  • A cohort study starts with the selection of a group of participants (known as a ‘cohort’) sourced from the same population, who must be free of the outcome under investigation but have the potential to develop that outcome.
  • The participants should be as similar as possible, sharing common characteristics apart from their exposure status.
  • The participants are divided into two groups – the first group is the ‘exposure’ group, the second group is free of the exposure. 
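The selection stages listed above can be sketched as a simple filter. The `Person` record and its fields are hypothetical and exist only for illustration. People who already have the outcome at baseline are excluded, and the remaining eligible participants are split by exposure status. The sketch does not model the requirement that participants share other characteristics.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical baseline record for one member of the source population."""
    pid: int
    has_outcome_at_baseline: bool
    exposed: bool


def assemble_cohort(population):
    # Stage 1: keep only people who are free of the outcome at baseline
    eligible = [p for p in population if not p.has_outcome_at_baseline]
    # Stage 2: divide the eligible participants by exposure status
    exposed = [p for p in eligible if p.exposed]
    unexposed = [p for p in eligible if not p.exposed]
    return exposed, unexposed


population = [
    Person(1, has_outcome_at_baseline=False, exposed=True),
    Person(2, has_outcome_at_baseline=True, exposed=True),   # excluded: already has outcome
    Person(3, has_outcome_at_baseline=False, exposed=False),
    Person(4, has_outcome_at_baseline=False, exposed=False),
    Person(5, has_outcome_at_baseline=True, exposed=False),  # excluded: already has outcome
]
exposed, unexposed = assemble_cohort(population)
print([p.pid for p in exposed], [p.pid for p in unexposed])  # [1] [3, 4]
```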

Types of Cohort Studies

There are two types of cohort studies: prospective and retrospective.

How Cohort Studies are carried out

Figure: How cohort studies are carried out (adapted from Cohort Studies: A brief overview by Terry Shaneyfelt [video], https://www.youtube.com/watch?v=FRasHsoORj0).

Which clinical questions does this study design best answer?

What are the advantages and disadvantages of using a cohort study, and what does a strong cohort study look like?

  • The aim of the study is clearly stated.
  • It is clear how the sample population was sourced, including inclusion and exclusion criteria, with justification provided for the sample size.  The sample group accurately reflects the population from which it is drawn.
  • Losses of participants to follow-up are stated and explained.
  • The control group is clearly described, including the selection methodology, whether they were from the same sample population, whether randomised or matched to minimise bias and confounding.
  • It is clearly stated whether the study was blinded or not, i.e. whether the investigators knew which participants were in the exposed and control groups.
  • The methodology was rigorously adhered to.
  • Valid measurements (recognised by peers) and appropriate statistical tests are used.
  • The conclusions are logically drawn from the results: the study demonstrates what it says it has demonstrated.
  • The data are clearly described, including their accessibility and availability.

What are the pitfalls to look for?

  • Confounding factors within the sample groups may be difficult to identify and control for, thus influencing the results.
  • Participants may move between exposure/non-exposure categories or may not comply properly with methodology requirements.
  • Being in the study may influence participants’ behaviour.
  • Too many participants may drop out, thus rendering the results invalid.

Critical appraisal tools

The following tools can assist with the critical appraisal of a cohort study.

Critical appraisal checklist for cohort studies (JBI)

CASP appraisal checklist for cohort studies

Real World Examples

Bell, A.F., Rubin, L.H., Davis, J.M., Golding, J., Adejumo, O.A. & Carter, C.S. (2018). The birth experience and subsequent maternal caregiving attitudes and behavior: A birth cohort study. Archives of Women’s Mental Health.

Dykxhoorn, J., Hatcher, S., Roy-Gagnon, M.H., & Colman, I. (2017). Early life predictors of adolescent suicidal thoughts and adverse outcomes in two population-based cohort studies. PLoS ONE, 12(8).

Feeley, N., Hayton, B., Gold, I. & Zelkowitz, P. (2017). A comparative prospective cohort study of women following childbirth: Mothers of low birthweight infants at risk for elevated PTSD symptoms. Journal of Psychosomatic Research, 101, 24–30.

Forman, J.P., Stampfer, M.J. & Curhan, G.C. (2009). Diet and lifestyle risk factors associated with incident hypertension in women. JAMA: Journal of the American Medical Association, 302(4), 401–411.

Suarez, E. (2002). Prognosis and outcome of first-episode psychoses in Hawai’i: Results of the 15-year follow-up of the Honolulu cohort of the WHO international study of schizophrenia. ProQuest Information & Learning, Dissertation Abstracts International: Section B: The Sciences and Engineering, 63(3-B), 1577.

Young, J.T., Heffernan, E., Borschmann, R., Ogloff, J.R.P., Spittal, M.J., Kouyoumdjian, F.G., Preen, D.B., Butler, A., Brophy, L., Crilly, J. & Kinner, S.A. (2018). Dual diagnosis of mental illness and substance use disorder and injury in adults recently released from prison: a prospective cohort study. The Lancet. Public Health, 3(5), e237–e248.

References and Further Reading

Greenhalgh, T. (2014). How to Read a Paper: The Basics of Evidence-Based Medicine. John Wiley & Sons, Incorporated, Somerset, United Kingdom.

Hoffmann, T., Bennett, S., & Del Mar, C. (2017). Evidence-Based Practice Across the Health Professions (3rd ed.). Elsevier.

Song, J.W. & Chung, K.C. (2010). Observational studies: cohort and case-control studies. Plastic and Reconstructive Surgery, 126(6), 2234-42.

Mann, C.J. (2003). Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emergency Medicine Journal, 20(1), 54-60.

  • Last Updated: Feb 29, 2024 4:49 PM
  • URL: https://deakin.libguides.com/quantitative-study-designs

Cohort Studies: Design, Analysis, and Reporting

Affiliations.

  • 1 Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH.
  • 2 Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH.
  • PMID: 32658655
  • DOI: 10.1016/j.chest.2020.03.014

Cohort studies are types of observational studies in which a cohort, or a group of individuals sharing some characteristic, is followed up over time and outcomes are measured at one or more time points. Cohort studies can be classified as prospective or retrospective studies, and they have several advantages and disadvantages. This article reviews the essential characteristics of cohort studies and includes recommendations on the design, statistical analysis, and reporting of cohort studies in respiratory and critical care medicine. Tools are provided for researchers and reviewers.

Keywords: bias; cohort studies; confounding; prospective; retrospective.

Copyright © 2020 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.


Published: 13 January 2022

Cohort studies investigating the effects of exposures: key principles that impact the credibility of the results

  • Anna Miroshnychenko 1 ,
  • Dena Zeraatkar 1 , 2 ,
  • Mark R. Phillips   ORCID: orcid.org/0000-0003-0923-261X 1 ,
  • Sophie J. Bakri 3 ,
  • Lehana Thabane   ORCID: orcid.org/0000-0003-0355-9734 1 , 4 ,
  • Mohit Bhandari   ORCID: orcid.org/0000-0001-9608-4808 1 , 5 &
  • Varun Chaudhary   ORCID: orcid.org/0000-0002-9988-4146 1 , 5

for the Retina Evidence Trials InterNational Alliance (R.E.T.I.N.A.) Study Group

Eye volume 36, pages 905–906 (2022)

What are cohort studies?

Cohort studies are observational studies that follow groups of patients with different exposures forward in time and determine outcomes of interest in each exposure group or that investigate the effect of one or more participant characteristics on prognostic outcomes [ 1 ]. The focus of this editorial is on cohort studies that investigate the effects of exposures that may be associated with an increased or a decreased occurrence of the outcome of interest. Cohort studies may be prospective or retrospective in design. In prospective cohort studies, investigators enroll participants, assess exposure status, initiate follow up, and measure the outcome of interest in the future. In retrospective cohort studies, data on both the exposures and outcome of interest have been previously collected.

Purpose of cohort studies

While large well-designed randomized controlled trials (RCTs) represent the optimal design for making inferences about the effects of exposures or interventions on health outcomes, they are often not feasible to conduct, due to costs or the challenges of recruiting patients with rare conditions and following patients for sufficient durations. Further, patients included in RCTs may not be representative of patients encountered in practice, and the effectiveness of therapies in strict clinical trials may differ from their effectiveness in routine practice. In such circumstances, well-designed observational studies, which include cohort studies, can play an important role in producing evidence to guide clinical care decisions in ophthalmology. Cohort studies can also be conducted to generate hypotheses and establish questions for future RCTs.

The key difference between observational (e.g., cohort study) and experimental (e.g., RCT) study designs is that in the former the investigator does not intervene but instead “observes” and examines the relationship or association between an exposure and outcome. Examples of cohort studies in ophthalmology include evaluation of a possible association between exposure to ambient air pollution and age-related cataract [ 2 ], or assessment of the impact of eye-preserving therapies for patients with advanced retinoblastoma [ 3 ].

Key determinants of credibility (i.e., internal validity) in cohort studies

Readers considering applying evidence from cohort studies should be mindful of the following factors that affect the credibility or internal validity of cohort studies.

Factors that decrease the credibility of cohort studies

Cohort studies are at serious risk of confounding bias, so adjusting or accounting for confounding factors is a priority in these studies. Confounding occurs when the exposure of interest is associated with another factor that also influences the outcome of interest. Investigators can use various design methods (e.g., matching) and statistical methods (e.g., adjusted analyses based on regression methods) to deal with known, measured confounders. Readers should assess whether the authors accounted for known confounders of the relationship under investigation in either their design or their statistical analysis. Readers should be mindful, however, that the possibility of residual confounding caused by unknown or unmeasured confounders always remains.
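One way a statistical method can account for a known, measured confounder is sketched in Python below. It compares a crude risk ratio with a Mantel-Haenszel risk ratio pooled across strata of the confounder. Stratified pooling is used here as an illustration alongside the regression approaches mentioned above. The two strata and all counts are hypothetical, constructed so that the exposure has no effect within either stratum. Like any adjustment, this cannot address unmeasured confounders.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical 2 x 2 counts within one level of the confounder."""
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def crude_risk_ratio(strata):
    # Ignore the confounder: collapse all strata into one table
    a = sum(s.exposed_cases for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    c = sum(s.unexposed_cases for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    # RR_MH = sum(a_i * n0_i / N_i) / sum(c_i * n1_i / N_i)
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.exposed_total + s.unexposed_total
        numerator += s.exposed_cases * s.unexposed_total / n
        denominator += s.unexposed_cases * s.exposed_total / n
    return numerator / denominator


# Stratum 1 (e.g. younger): risk 0.05 in both groups
# Stratum 2 (e.g. older): risk 0.20 in both groups, and most exposed people are here
strata = [Stratum(10, 200, 40, 800), Stratum(160, 800, 40, 200)]
crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"crude RR = {crude:.3f}, MH-adjusted RR = {adjusted:.3f}")  # crude RR = 2.125, MH-adjusted RR = 1.000
```

Here the crude risk ratio of about 2.1 comes entirely from the confounder being more common among exposed participants. Within each stratum, the risk ratio is 1.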

Inappropriate selection of participants into the cohort study can result in selection bias. Selection bias occurs when selection of participants is related to both the intervention and outcome. Bias in measurement of exposure/outcome, or detection bias, can arise when outcome assessors are aware of intervention status, different methods are used to assess outcomes in the different intervention groups, and/or the exposure status is misclassified differentially or non-differentially (i.e., the probability of individuals being misclassified is different or equal between groups in a study, respectively).
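Non-differential misclassification of a binary exposure typically biases a risk ratio towards the null. The sketch below shows this for one hypothetical 2 × 2 table. It assumes the exposure measurement has the same sensitivity (0.8) and specificity (0.9) in people who do and do not develop the outcome. All values are illustrative only.

```python
def misclassify(truly_exposed, truly_unexposed, sensitivity, specificity):
    """Expected numbers classified as exposed / unexposed, given true counts."""
    classified_exposed = sensitivity * truly_exposed + (1 - specificity) * truly_unexposed
    classified_unexposed = (1 - sensitivity) * truly_exposed + specificity * truly_unexposed
    return classified_exposed, classified_unexposed


def risk_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    exposed_risk = exp_cases / (exp_cases + exp_noncases)
    unexposed_risk = unexp_cases / (unexp_cases + unexp_noncases)
    return exposed_risk / unexposed_risk


# Hypothetical true table: risk 0.20 in exposed, 0.10 in unexposed
true_rr = risk_ratio(200, 800, 100, 900)

# Same sensitivity and specificity in cases and non-cases -> non-differential
se, sp = 0.8, 0.9
obs_exp_cases, obs_unexp_cases = misclassify(200, 100, se, sp)        # cases
obs_exp_noncases, obs_unexp_noncases = misclassify(800, 900, se, sp)  # non-cases
observed_rr = risk_ratio(obs_exp_cases, obs_exp_noncases, obs_unexp_cases, obs_unexp_noncases)

print(f"True RR = {true_rr:.2f}, observed RR = {observed_rr:.2f}")
```

If sensitivity or specificity differed between cases and non-cases (differential misclassification), the bias could go in either direction.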

Missing data may also affect the credibility of cohort studies. Bias due to missing data in prospective and retrospective studies arises when follow up data are missing for individuals initially included in the study. Participants with missing outcome data may differ importantly from those with complete data (e.g., they may be healthier or may not have experienced adverse events).
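The sketch below shows how missing outcome data can distort results. It computes a complete-case risk ratio under two hypothetical loss-to-follow-up patterns. In the first, losses are equal in every cell. In the second, losses are concentrated among exposed participants who developed the outcome. The cohort and the loss proportions are invented for illustration.

```python
def risk(cases, noncases):
    return cases / (cases + noncases)


def complete_case_rr(exposed, unexposed, loss):
    """Risk ratio among participants who remain in follow-up.

    exposed / unexposed: (cases, noncases) in the full hypothetical cohort.
    loss: proportion lost to follow-up for each cell, keyed by
          ("exposed" | "unexposed", "case" | "noncase").
    """
    def observed(group, counts):
        cases, noncases = counts
        return (cases * (1 - loss[(group, "case")]),
                noncases * (1 - loss[(group, "noncase")]))

    return risk(*observed("exposed", exposed)) / risk(*observed("unexposed", unexposed))


exposed, unexposed = (200, 800), (100, 900)  # true risks 0.20 vs 0.10, true RR = 2.0

equal_loss = {(g, o): 0.10 for g in ("exposed", "unexposed") for o in ("case", "noncase")}
differential_loss = dict(equal_loss)
differential_loss[("exposed", "case")] = 0.50  # exposed cases drop out more often

rr_equal = complete_case_rr(exposed, unexposed, equal_loss)
rr_differential = complete_case_rr(exposed, unexposed, differential_loss)
print(f"Equal losses: RR = {rr_equal:.2f}; differential losses: RR = {rr_differential:.2f}")
```

With equal losses in every cell, the complete-case estimate matches the true risk ratio. When losses depend on both exposure and outcome, the estimate is biased.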

Last, credibility of a cohort study may be affected by the reporting of results. Selective reporting arises when investigators report results in such a way that the study report highlights or emphasizes evidence supporting a particular hypothesis and omits or understates evidence supporting an alternative hypothesis. Investigators may selectively report results for timepoints or measures that produced results consistent with their preconceived beliefs or results that were newsworthy, and disregard results for timepoints or measures that produced results that were inconsistent with their beliefs or considered not newsworthy. Publication bias refers to the propensity for studies with anomalous, interesting, or statistically significant results to be published at higher rates, to be published more rapidly, or to be published in journals with higher visibility.

Factors that increase the credibility of cohort studies

Three uncommon situations can sometimes make us more certain of the findings of cohort studies. In some circumstances, these situations can make us as confident of evidence from cohort studies as we would be of evidence from a rigorous RCT. First, when the observed effect is large (typically a relative risk (RR) > 2 or RR < 0.5), biases such as confounding are less likely to completely explain the observed effect. Second, we may be more certain of results when we observe a dose-response gradient: biases in non-randomized studies (e.g., confounding and errors in the classification of the exposure) are unlikely to produce spurious dose-response associations. Third, when all suspected biases are believed to act against the observed direction of effect, we can be more certain that the observed effect is not due to those biases. It is, however, difficult to anticipate with sufficient certainty the direction in which effects are likely to be biased in complex epidemiological studies. Because situations that make us more certain of the findings of cohort studies occur infrequently, cohort studies usually provide only low to very low certainty evidence [ 4 ].
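The first two situations can be checked informally, as in the sketch below. It computes risk ratios for hypothetical exposure levels relative to the lowest level and checks whether they change monotonically. It also flags any risk ratio beyond the RR > 2 or RR < 0.5 thresholds quoted above. The exposure levels and counts are invented. A monotonic pattern in point estimates is only a rough screen, not a formal test for trend.

```python
def risk_ratios_vs_reference(levels):
    """levels: list of (cases, total) ordered from lowest to highest exposure.
    The first level is used as the reference group."""
    ref_cases, ref_total = levels[0]
    ref_risk = ref_cases / ref_total
    return [(cases / total) / ref_risk for cases, total in levels]


def is_monotonic(values):
    pairs = list(zip(values, values[1:]))
    return all(b >= a for a, b in pairs) or all(b <= a for a, b in pairs)


def is_large_effect(rr):
    # Thresholds as quoted in the text: RR > 2 or RR < 0.5
    return rr > 2 or rr < 0.5


# Hypothetical cohort: none, low, moderate, high exposure
levels = [(20, 1000), (30, 1000), (45, 1000), (70, 1000)]
rrs = risk_ratios_vs_reference(levels)
print([round(rr, 2) for rr in rrs])  # [1.0, 1.5, 2.25, 3.5]
print("Dose-response gradient:", is_monotonic(rrs))
print("Large effect at highest level:", is_large_effect(rrs[-1]))
```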

Applicability (i.e., external validity) in cohort studies

If the populations, exposures, or outcomes investigated in cohort studies differ from those of interest in routine or typical settings, the evidence may not be applicable or externally valid. Such judgements depend on whether differences between studies and the question of interest would lead to an appreciable change in the direction or magnitude of effect. Generally, observational studies (e.g., cohort studies) have higher external validity than experimental studies (e.g., RCTs) [ 5 ].

Cohort studies follow a population exposed or not exposed to a potential causal agent forward in time and assess outcomes. Cohort studies are beneficial because these studies allow the investigators to observe a possible association between an exposure and outcome of interest in a population that cannot be randomly subjected to an exposure due to ethical, methodological, or feasibility limitations. Cohort studies, however, have several limitations that should be acknowledged and minimized if possible.

References

1. Barrett D, Noble H. What are cohort studies? Evid-Based Nurs. 2019;22:95–6.

2. Shin J, Lee H, Kim H. Association between exposure to ambient air pollution and age-related cataract: a nationwide population-based retrospective cohort study. Int J Environ Res Public Health. 2020;17:9231.

3. Zhou C, Wen X, Ding Y, Ding J, Jin M, Liu Z, et al. Eye-preserving therapies for advanced retinoblastoma: a multicenter cohort of 1678 patients in China. Ophthalmology. 2021;S0161-6420:00683–7.

4. Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, et al. GRADE guidelines: 4. Rating the quality of evidence-study limitations (risk of bias). J Clin Epidemiol. 2011;64:407–15.

5. Rothwell PM. External validity of randomised controlled trials: “to whom do the results of this trial apply?”. Lancet. 2005;365:82–93.

Author information

Authors and affiliations.

Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada

Anna Miroshnychenko, Dena Zeraatkar, Mark R. Phillips, Lehana Thabane, Mohit Bhandari & Varun Chaudhary

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

Dena Zeraatkar

Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA

Sophie J. Bakri

Biostatistics Unit, St. Joseph’s Healthcare-Hamilton, Hamilton, ON, Canada

Lehana Thabane

Department of Surgery, McMaster University, Hamilton, ON, Canada

Mohit Bhandari & Varun Chaudhary

Retina Consultants of Texas (Retina Consultants of America), Houston, TX, USA

Charles C. Wykoff

Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA

NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK

Sobha Sivaprasad

Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Peter Kaiser

Retinal Disorders and Ophthalmic Genetics, Stein Eye Institute, University of California, Los Angeles, CA, USA

David Sarraf

The Retina Service at Wills Eye Hospital, Philadelphia, PA, USA

Sunir J. Garg

Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Rishi P. Singh

Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA

Department of Ophthalmology, University of Bonn, Bonn, Germany

Frank G. Holz

Singapore Eye Research Institute, Singapore, Singapore

Tien Y. Wong

Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore

Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia

Robyn H. Guymer

Department of Surgery (Ophthalmology), The University of Melbourne, Melbourne, VIC, Australia


Contributions

AM was responsible for writing, critical review and feedback on manuscript. DZ was responsible for writing, critical review and feedback on manuscript. MRP was responsible for conception of idea, critical review and feedback on manuscript. SJB was responsible for critical review and feedback on manuscript. LT was responsible for critical review and feedback on manuscript. MB was responsible for conception of idea, critical review and feedback on manuscript. VC was responsible for conception of idea, critical review and feedback on manuscript.

Corresponding author

Correspondence to Varun Chaudhary .

Ethics declarations

Competing interests.

SJB: Consultant: Adverum, Allegro, Alimera, Allergan, Apellis, Eyepoint, ilumen, Kala, Genentech, Novartis, Regenexbio, Roche, Zeiss – unrelated to this study. MB: Research funds: Pendopharm, Bioventus, Acumed – unrelated to this study. VC: Advisory Board Member: Alcon, Roche, Bayer, Novartis; Grants: Bayer, Novartis – unrelated to this study. The remaining authors have nothing to disclose.

Additional information


*A list of authors and their affiliations appears at the end of the paper.


About this article

Cite this article.

Miroshnychenko, A., Zeraatkar, D., Phillips, M.R. et al. Cohort studies investigating the effects of exposures: key principles that impact the credibility of the results. Eye 36 , 905–906 (2022). https://doi.org/10.1038/s41433-021-01897-0


Received : 26 November 2021

Revised : 30 November 2021

Accepted : 06 December 2021

Published : 13 January 2022

Issue Date : May 2022

DOI : https://doi.org/10.1038/s41433-021-01897-0


Cohort Study: Definition, Designs & Examples

Julia Simkus

Editor at Simply Psychology

BA (Hons) Psychology, Princeton University

Julia Simkus is a graduate of Princeton University with a Bachelor of Arts in Psychology. She began studying for a Master's Degree in Counseling for Mental Health and Wellness in September 2023. Julia's research has been published in peer-reviewed journals.


Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


A cohort study is a type of longitudinal study where a group of individuals (cohort), often sharing a common characteristic or experience, is followed over an extended period of time to study and track outcomes, typically related to specific exposures or interventions.

In cohort studies, the participants must share a common factor or characteristic such as age, demographic, or occupation. A “cohort” is a group of subjects who share a defining characteristic.

Cohort studies are observational, so researchers will follow the subjects without manipulating any variables or interfering with their environment.

This type of study is beneficial for medical researchers, specifically in epidemiology, as scientists can use data from cohort studies to understand potential risk factors or causes of a disease.

Before any of the participants has developed the disease under investigation, medical professionals identify a cohort, observe the target participants over time, and collect data at regular intervals.

Weeks, months, or years later, depending on the duration of the study design, the researchers will examine any factors that differed between the individuals who developed the condition and those who did not.

They can then determine whether an association exists between an exposure and an outcome, and can also track disease progression and estimate relative risk.
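Relative risk can be estimated from a cohort's 2 × 2 table. The sketch below uses one common approach: a risk ratio with an approximate 95% confidence interval computed on the log scale. The source does not specify this method, so it is an assumption here. The counts are hypothetical.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(a, n1, c, n0, level=0.95):
    """Risk ratio with a log-scale (Wald) confidence interval.

    a = cases among n1 exposed participants
    c = cases among n0 unexposed participants
    """
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = risk_ratio_ci(a=30, n1=200, c=15, n0=200)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 is consistent with an association between exposure and outcome. It does not, by itself, indicate causation.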

Retrospective

  • A retrospective cohort study is a type of observational research that uses existing past data to identify two groups of individuals (those with the risk factor or exposure and those without) and then traces their outcomes through those historical records to determine the relationship.
  • In a retrospective study, the subjects have already experienced the outcome of interest or developed the disease before the study starts.
  • The researchers then look back in time to identify a cohort of subjects before they developed the disease and use existing data, such as medical records, to discover any patterns.

Prospective

A prospective cohort study is a type of longitudinal research where a group of individuals sharing a common characteristic (cohort) is followed over time to observe and measure outcomes, often to investigate the effect of suspected risk factors.

In a prospective study, the investigators will design the study, recruit subjects, and collect baseline data on all subjects before they have developed the outcomes of interest.

  • The subjects are followed and observed over a period of time to gather information and record the development of outcomes.

Advantages

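Help identify cause-and-effect relationships

Because researchers study groups of people before they develop an illness, they can establish that an exposure came before the outcome. This helps them identify potential cause-and-effect relationships between certain behaviors and the development of a disease, although an observational design alone cannot prove causation.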

Provide extensive data

Cohort studies enable researchers to study the causes of disease and to examine multiple outcomes associated with a single exposure. These studies can also reveal links between diseases and risk factors.

Enable studies of rare exposures

Cohort studies can be very useful for evaluating the effects and risks of rare or unusual exposures, such as toxic chemicals or adverse effects of drugs.

Can measure a continuously changing relationship between exposure and outcome

Because cohort studies are longitudinal, researchers can study changes in levels of exposure over time and any changes in outcome, providing a deeper understanding of the dynamic relationship between exposure and outcome.

Limitations

Time-consuming and expensive

Cohort studies usually require multiple months or years before researchers are able to identify the causes of a disease or discover significant results. Because of this, they are often more expensive than other types of studies. Retrospective studies, though, tend to be cheaper and quicker than prospective studies as the data already exists.

Require large sample sizes

Cohort studies require large sample sizes in order for any relationships or patterns to be meaningful. Researchers are unable to generate results if there is not enough data.
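Rare outcomes demand large cohorts, and a rough way to see this is to compute how many cases a cohort of a given size can be expected to produce. The sketch below assumes a constant, hypothetical risk of the outcome over follow-up and independent participants. It is an illustration, not a formal sample size calculation.

```python
import math


def expected_cases(n, risk):
    """Expected number of cases among n participants with the given risk."""
    return n * risk


def participants_needed(target_cases, risk):
    """Smallest cohort expected to yield at least target_cases cases."""
    return math.ceil(target_cases / risk)


def prob_no_cases(n, risk):
    """Probability that none of n independent participants develops the outcome."""
    return (1 - risk) ** n


risk = 0.001  # hypothetical: 1 in 1,000 over the follow-up period
print(expected_cases(1000, risk))     # about 1 case expected
print(participants_needed(50, risk))  # 50000
print(round(prob_no_cases(1000, risk), 3))
```

With this hypothetical risk, a cohort of 1,000 has roughly a one-in-three chance of observing no cases at all. Such a cohort could not support any comparison between groups.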

Prone to bias

Because of the longitudinal nature of these studies, it is common for participants to drop out and not complete the study. Loss to follow-up in cohort studies means researchers are more likely to estimate the effects of an exposure on an outcome incorrectly.

Unable to discover why or how a certain factor is associated with a disease

Cohort studies are used to study possible cause-and-effect relationships between an exposure and an outcome. However, they do not explain why the factors that affect these relationships exist. Experimental studies are required to determine why a certain factor is associated with a particular outcome.

The Framingham Heart Study

Studied the effects of diet, exercise, and medications on the development of hypertensive or arteriosclerotic cardiovascular disease, in a longitudinal population-based cohort.

The Whitehall Study

The initial prospective cohort study examined the association between employment grades and mortality rates of 17,139 male civil servants over a period of ten years, beginning in 1967. When the Whitehall Study was conducted, there was no requirement to obtain ethical approval for scientific studies of this kind.

The Nurses’ Health Study

Researched the long-term effects of nurses’ nutrition, hormones, environment, and work life on health and disease development.

The British Doctors Study

This was a prospective cohort study that ran from 1951 to 2001, investigating the association between smoking and the incidence of lung cancer.

The Black Women’s Health Study

Gathered information about the causes of health problems that affect Black women.

Millennium Cohort Study

Found evidence to show how various circumstances in the first stages of life can influence later health and development. The study began with an original sample of 18,818 cohort members.

The Danish Cohort Study of Psoriasis and Depression

Studied the association between psoriasis and the onset of depression.

The 1970 British Cohort Study

Followed the lives of around 17,000 people born in England, Scotland, and Wales in a single week of 1970.

Frequently Asked Questions

1. Are case-control studies and cohort studies the same?

While both studies are commonly used among medical professionals to study disease, they differ.

Case-control studies are performed on individuals who already have a disease (cases) and compare them with individuals who share similar characteristics but do not have the disease (controls).

In cohort studies, on the other hand, researchers identify a group before any of the subjects have developed the disease. Then after an extended period, they examine any factors that differed between the individuals who developed the condition and those who did not.

2. What is the difference between a cross-sectional study and a cohort study?

Like case-control and cohort studies, cross-sectional studies are also used in epidemiology to identify exposures and outcomes and compare the rates of diseases and symptoms of an exposed group with an unexposed group.

However, cross-sectional studies analyze information about a population at a specific point in time, while cohort studies are carried out over longer periods.

3. What is the difference between cohort and longitudinal studies?

A cohort study is a specific type of longitudinal study. Another type of longitudinal study is the panel study, which involves sampling a cross-section of individuals at specific intervals over an extended period.

Panel studies are a type of prospective study, while cohort studies can be either prospective or retrospective.

Barrett, D., & Noble, H. (2019). What are cohort studies? Evidence-Based Nursing, 22(4), 95-96.

Kandola, A. A., Osborn, D. P. J., Stubbs, B., et al. (2020). Individual and combined associations between cardiorespiratory fitness and grip strength with common mental disorders: a prospective cohort study in the UK Biobank. BMC Medicine, 18, 303. https://doi.org/10.1186/s12916-020-01782-9

Marmot, M. G., Rose, G., Shipley, M., & Hamilton, P. J. (1978). Employment grade and coronary heart disease in British civil servants. Journal of Epidemiology & Community Health, 32(4), 244-249.

Rosenberg, L., Adams-Campbell, L., & Palmer, J. R. (1995). The Black Women’s Health Study: a follow-up study for causes and preventions of illness. Journal of the American Medical Women’s Association (1972), 50(2), 56-58.

Hammoudeh, S., Gadelhaq, W., & Janahi, I. (2018, November 5). Prospective cohort studies in medical research. In R. M. Barría (Ed.), Cohort Studies in Health Sciences. IntechOpen. https://doi.org/10.5772/intechopen.76514. Available from: https://www.intechopen.com/chapters/60939

Setia, M. S. (2016). Methodology Series Module 1: Cohort Studies. Indian Journal of Dermatology, 61(1), 21–25. https://doi.org/10.4103/0019-5154.174011

Zabor, E. C., Kaizer, A. M., & Hobbs, B. P. (2020). Randomized Controlled Trials. Chest, 158(1). https://doi.org/10.1016/j.chest.2020.03.013

Further Information

  • Cohort Effect? Definition and Examples
  • Barrett, D., & Noble, H. (2019). What are cohort studies? Evidence-based nursing, 22(4), 95-96.
  • The Whitehall Studies
  • Euser, A. M., Zoccali, C., Jager, K. J., & Dekker, F. W. (2009). Cohort studies: prospective versus retrospective. Nephron Clinical Practice, 113(3), c214-c217.


13. Study design and choosing a statistical test

Sample size.


Baseline prevalence of smoking in a particular community is 30%. A clean indoor air policy goes into effect. What is the sample size required to detect a decrease in smoking prevalence of at least 2 percentage points? \(\alpha=0.05\); 90% power.

We are interested in testing the following hypothesis:

Null hypothesis:

\(H_0\colon \text{prevalence}_{(Before)}\le \text{prevalence}_{(After)}\)

Alternative hypothesis:

\(H_A\colon \text{prevalence}_{(Before)}- \text{prevalence}_{(After)}=\delta\)

Where \(\delta \gt 0\)

The resulting formula for the sample size for testing a difference in prevalence using a one-sided test, which can be applied directly to this example, is:

\(n=\dfrac{1}{d^{2}}\left [ z_{\alpha }\sqrt{\pi_{0}(1-\pi_{0})}+z_{\beta }\sqrt{\pi_{1}(1-\pi_{1})} \right ]^{2}\)

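where \(\pi_0\) is the baseline prevalence, \(\pi_1\) is the hypothesized prevalence after the policy, and \(d=\pi_0-\pi_1\) is the difference \(\delta\) from the alternative hypothesis. Replace \(z_{\alpha}\) by \(z_{\alpha/2}\) for a two-sided test.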
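A minimal Python sketch of the formula above, assuming a one-sided test by default and rounding up to whole subjects. The function name and its defaults are illustrative choices, not part of Woodward's tables; the normal quantiles come from the standard library's `statistics.NormalDist`.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_difference_in_prevalence(pi0: float, pi1: float, alpha: float = 0.05,
                               power: float = 0.90, two_sided: bool = False) -> int:
    """Sample size to detect a change in prevalence from pi0 to pi1.

    Implements n = [z_alpha*sqrt(pi0(1-pi0)) + z_beta*sqrt(pi1(1-pi1))]^2 / d^2
    with d = pi0 - pi1, rounded up to the next whole subject.
    """
    z = NormalDist().inv_cdf                 # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)                        # power = 1 - beta
    d = pi0 - pi1
    bracket = z_alpha * sqrt(pi0 * (1 - pi0)) + z_beta * sqrt(pi1 * (1 - pi1))
    return ceil(bracket ** 2 / d ** 2)


# Smoking example: baseline prevalence 30%, detect a drop to 28%.
n = n_difference_in_prevalence(0.30, 0.28)
print(n)  # 4417
```

The result can be compared with the value read from Table B.8; small differences from the table are possible because of rounding in the tabulated values.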

Take a moment to look at the table below for sample size requirements for testing the value of a single proportion with a one-sided test. Prevalence can be found along the top of the table and the percentage point difference vertically on the left. How many individuals do we need to include in our study in order to meet the above criteria?

(Tables from Woodward, M. Epidemiology Study Design and Analysis. Boca Raton: Chapman and Hall, 1999.)

Table B.8. Sample size requirements for testing the value of a single proportion

Questions to consider:

  • As the baseline prevalence (\(\pi_0\)) increases, does the sample size increase or decrease?
  • What happens to the sample size as the effect size decreases?
  • What is the minimal detectable difference if you had funds for 1,500 subjects?

Answers:

  • The largest sample sizes occur with a baseline prevalence of 0.5.
  • The smaller the effect size, the larger the sample size.
  • About a 3.6 percentage-point decrease in prevalence.

9.4 - Example 9-2: Ratios in a population-based study (relative risks, relative rates or prevalence ratios)

Example 9-2.

Suppose the rate of disease in an unexposed population is 10/100 person-years. You hypothesize an exposure has a relative risk of 2.0. How many persons must you enroll assuming half are exposed and half are unexposed to detect this increased risk? \(\alpha=0.05\) and 90% power.

Here are the hypotheses:

Null hypothesis

\(H_0\colon \text{Incidence}_{(Exposed)} \le \text{Incidence}_{(Unexposed)}\)

Alternative hypothesis

\(H_A\colon \text{Incidence}_{(Exposed)} / \text{Incidence}_{(Unexposed)}=\lambda\)

where \(\lambda \gt 1\), \(\text{Incidence}_{(Exposed)}=p(\text{Disease|Exposed})\), and \(\text{Incidence}_{(Unexposed)}=p(\text{Disease|Not Exposed})\)

and the resulting formula:

\(n=\dfrac{r+1}{r(\lambda -1)^{2}\pi^{2} }\left [ z_{\alpha }\sqrt{(r+1)p_{c}(1-p_{c})}+z_{\beta }\sqrt{\lambda \pi (1-\lambda \pi)+r\pi(1-\pi )} \right ]^{2}\)

where \(\pi=\pi_2\) is the proportion in the reference group and \(p_c\) is the common proportion over the two groups, which is estimated as:

\(p_{c}=\dfrac{\pi (r\lambda +1)}{r+1}\)

When r = 1 (equal-sized groups), the expression for \(p_c\) reduces to:

\(p_{c}=\dfrac{\pi (\lambda +1)}{2}=\dfrac{\pi_{1}+\pi_{2} }{2}\)
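The relative-risk formula can be sketched in Python as follows. This assumes a one-sided test (matching the \(z_{\alpha}\) in the formula) and treats the unexposed rate of 10 per 100 as the proportion \(\pi = 0.10\); the function name is illustrative. As in Table B.9, the result is the total for both groups combined.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_relative_risk(pi: float, lam: float, r: float = 1.0,
                    alpha: float = 0.05, power: float = 0.90) -> int:
    """Total sample size (both groups) for detecting a relative risk lam.

    pi  : proportion with the outcome in the reference (unexposed) group
    lam : hypothesized relative risk
    r   : ratio of group sizes as used in the formula (r = 1 for equal groups)
    """
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha), z(power)          # one-sided alpha
    p_c = pi * (r * lam + 1) / (r + 1)                # common proportion
    bracket = (z_alpha * sqrt((r + 1) * p_c * (1 - p_c))
               + z_beta * sqrt(lam * pi * (1 - lam * pi) + r * pi * (1 - pi)))
    n = (r + 1) / (r * (lam - 1) ** 2 * pi ** 2) * bracket ** 2
    return ceil(n)


n_rr = n_relative_risk(pi=0.10, lam=2.0)
print(n_rr)  # 433
```

With equal groups this corresponds to roughly half the total in each of the exposed and unexposed groups.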

Let's take a look at tabulated results:

Table B.9. Sample size requirements (for the two groups combined) for testing the ratio of two proportions (relative risk) with equal numbers in each group

Use the table to find the sample size for detecting a relative risk of 2 under the conditions above.

Questions to consider:

  • What happens to the sample size as the incidence rate \((\pi)\) increases?
  • What happens to the sample size as the relative risk \((\lambda)\) decreases?
  • How would you use this table to determine sample size for 'protective' effects (i.e., nutritional components or medical procedures which prevent a negative outcome), as opposed to an increased risk?
  • What is the minimal detectable relative risk if you had funds for 1000 subjects?

Answers:

  • n decreases.
  • The largest n occurs when \(\lambda\) is closest to 1.
  • Protective effects would be those with \(\lambda \lt 1\).
  • With a background rate of 10/100 and 1000 subjects, a relative risk of about 1.65 could be detected.

9.5 - Example 9-3: Odds Ratios from a case/control study

Example 9-3.

Suppose your study design is an unmatched case-control study with equal numbers of cases and controls.

If 30% of the population is exposed to a risk factor, what is the number of study subjects (assuming an equal number of cases and controls in an unmatched study design) necessary to detect a hypothesized odds ratio of 2.0? Assume 90% power and \(\alpha=0.05\).

Here are the hypotheses being tested:

\(H_0\colon \text{incidence}_{1}^* \le \text{incidence}_{2}^*\)

\(H_A\colon \text{incidence}_{1}^* / \text{incidence}_{2}^*=\lambda^*\)

\(\lambda^*\gt1\)

\(\text{incidence}_1^*=p(\text{Exposed|Case})\)

\(\text{incidence}_2^*=p(\text{Exposed|Control})\)

The resulting sample size formula is:

\(n=\dfrac{(r+1)(1+(\lambda -1)P)^{2}}{rP^{2}(P-1)^{2}(\lambda -1)^{2}}\left [ z_{\alpha}\sqrt{(r+1)p_{c}^{*}(1-p_{c}^{*})} + z_{\beta}\sqrt{\frac{\lambda P(1-P)}{\left [ 1+(\lambda-1)P \right ]^{2}}+rP(1-P)} \right ]^{2}\)

\(p_{c}^{*}=\dfrac{P}{r+1}\left ( \dfrac{r\lambda}{1+(\lambda -1)P}+1 \right )\)
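A minimal Python sketch of the unmatched case-control formula, taking the denominator as \(rP^{2}(P-1)^{2}(\lambda-1)^{2}\) (the same form used in the matched-design formula later in this section) and assuming a one-sided test. The function name is illustrative; the result is the total number of cases plus controls.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_case_control(P: float, lam: float, r: float = 1.0,
                   alpha: float = 0.05, power: float = 0.90) -> int:
    """Total sample size for an unmatched case-control study.

    P   : prevalence of the risk factor (exposure) in the population
    lam : hypothesized odds ratio
    r   : ratio of group sizes as used in the formula (r = 1 for equal groups)
    """
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha), z(power)      # one-sided alpha
    k = 1 + (lam - 1) * P                         # recurring term 1 + (lambda - 1)P
    p_c = P / (r + 1) * (r * lam / k + 1)         # common exposure proportion p_c*
    bracket = (z_alpha * sqrt((r + 1) * p_c * (1 - p_c))
               + z_beta * sqrt(lam * P * (1 - P) / k ** 2 + r * P * (1 - P)))
    n = (r + 1) * k ** 2 / (r * P ** 2 * (P - 1) ** 2 * (lam - 1) ** 2) * bracket ** 2
    return ceil(n)


n_cc = n_case_control(P=0.30, lam=2.0)
print(n_cc)  # 306
```

Note that \(\lambda P/(1+(\lambda-1)P)\) is the implied exposure prevalence among cases, so \(p_c^*\) is simply the average of the case and control exposure prevalences when r = 1.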

Table B.10. Total sample size requirements (for the two groups combined) for unmatched case-control studies with equal numbers of cases and controls

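Questions to consider:

  • What happens to the sample size as the prevalence of the risk factor (P) increases?
  • What happens to the sample size as the odds ratio (\(\lambda\)) decreases?

Answers:

  • For many values of \(\lambda\), P = 0.5 has the smallest sample size requirement.
  • The largest sample sizes occur with odds ratios closest to 1; an odds ratio of 1.1 requires a greater n than one of 0.9.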

We have considered three typical epidemiologic research designs. You might also ask these questions:

Should the number of controls match the number of cases? Should multiple controls be used for each case?

Observe the power curve below:

Power increases but at a decreasing rate as the ratio of controls/cases increases. Little additional power is gained at ratios higher than four controls/cases. There is little benefit to enrolling a greater ratio of controls to cases.

from Woodward, M. Epidemiology Study Design and Analysis. Boca Raton: Chapman and Hall, 1999, p. 265
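The diminishing return described above can be illustrated with a widely used approximation that is not given in this lesson: with r controls per case, the efficiency relative to an unlimited number of controls is about r/(r+1). The approximation works best when the odds ratio is near 1, so treat this as a sketch of the shape of the curve rather than a reproduction of Woodward's figure.

```python
def relative_efficiency(r: int) -> float:
    """Approximate efficiency of r controls per case versus unlimited controls.

    Uses the common approximation r / (r + 1); an assumption for
    illustration, most accurate when the odds ratio is close to 1.
    """
    return r / (r + 1)


for r in range(1, 7):
    # Marginal gain from adding the r-th control per case.
    gain = relative_efficiency(r) - relative_efficiency(r - 1)
    print(f"{r} controls/case: efficiency {relative_efficiency(r):.2f}, gain {gain:.2f}")
```

Going from four to five controls per case adds only about three percentage points of efficiency, which is in line with the flattening of the power curve.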

Under what circumstances would it be recommended to enroll a large number of controls compared to cases?

Perhaps the small gain in power is worthwhile if the cost of a Type II error is large and the expense of obtaining controls is minimal, such as selecting controls with covariate information from a computerized database. If you must physically locate and recruit the controls, set up clinic appointments, run diagnostic tests, and enter data, the effort of pursuing a large number of controls quickly offsets any gain; in that case a one-to-one or two-to-one ratio is more practical. The bottom line is there is little additional power beyond a four-to-one ratio.

What if there is a Limited Number of Total Subjects for Case-Control Studies?

Sometimes the total number of subjects is limited (e.g., you have limited funds and the cost associated with each case is equal to the cost associated with a control). This graph illustrates power as related to the ratio of the controls to cases.

from Woodward, M. Epidemiology Study Design and Analysis. Boca Raton: Chapman and Hall, 1999, p. 358

There is maximum power with a one-to-one ratio of controls to cases. If you are limited in the number of people that can be enrolled in a study, match cases to controls in a one-to-one fashion.
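Why an equal split maximizes power for a fixed total can be seen with a simple variance argument. The sketch below assumes, for illustration only, that each case and each control carries the same amount of information, so the variance of the case-control comparison is proportional to 1/cases + 1/controls. The budget of 400 subjects is hypothetical.

```python
def variance_factor(n_cases: int, total: int) -> float:
    """Relative variance of a case-control comparison for a fixed total size.

    Simplifying assumption: every subject contributes equal information,
    so variance is proportional to 1/cases + 1/controls.
    """
    n_controls = total - n_cases
    return 1 / n_cases + 1 / n_controls


total = 400  # hypothetical fixed budget of subjects
best = min(range(1, total), key=lambda k: variance_factor(k, total))
print(best)  # 200 cases (and 200 controls) give the smallest variance
```

Smaller variance means higher power for the same effect, so under this assumption the one-to-one allocation is optimal.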

What about Matched Case-Control Studies?

In matched case/control study designs, useful data come from only the discordant pairs of subjects. Useful information does not come from the concordant pairs of subjects. Matching of cases and controls on a confounding factor (e.g., age, sex) may increase the efficiency of a case-control study, especially when the moderator's minimal number of controls are rejected.

The sample size for matched study designs may be greater or less than the sample size required for similar unmatched designs because only the pairs discordant on exposure are included in the analysis. The proportion of discordant pairs must be estimated to derive sample size and power. The power of matched case/control study design for a given sample size may be larger or smaller than the power for an unmatched design.

Formula for sample size calculation for matched case-control study:

\(n=\dfrac{(r+1)(1+(\lambda -1)P)^{2}}{rP^{2}(P-1)^{2}(\lambda -1)^{2}}\left [ z_{\alpha}\sqrt{(r+1)p_{c}^{*}} + z_{\beta}\sqrt{\frac{\lambda P(1-P)}{\left [ 1+(\lambda-1)P \right ]^{2}}+rP(1-P)} \right ]^{2}\)

where P = prevalence of exposure in the population, \(\lambda\) = estimated relative risk, and r = ratio of cases to controls.

9.6 - Example of a Cohort Study

In Week 3 of this course, you looked at the cohort study by Maurice Zeegers et al., "Alcohol Consumption and Bladder Cancer Risk: Results from the Netherlands Cohort Study," American Journal of Epidemiology, Vol 153, No. 1, pp 38-41. We discussed potential effect modifiers vs. confounders at that time. Let's look at this study again in terms of its design as a cohort study. The study design for the original cohort and the selection of the case-cohort are detailed in van den Brandt, P.A. et al., "A large scale prospective cohort study on diet and cancer in the Netherlands," J Clinical Epidemiology (1990), Vol 43, No. 3, 285-295.

  • What evidence is there of the prospective nature of this cohort study? Answer: Subjects completed a questionnaire on baseline risk factors; 61% also provided toenail clippings (exposure data); follow-up for incident cancer then proceeds through record linkage to cancer and pathology registries.

The original cohort came from the general population of 55- to 69-year-old men and women in the Netherlands, sampled from municipal population registries. Individuals with special dietary habits (e.g., vegetarians) were over-sampled. The baseline questionnaire was sent to 340,439 subjects, and the 120,852 who completed it make up the original cohort. A sub-cohort of 5000 was randomly selected immediately after the identification of cohort members, and a case-cohort approach was used. From these 5000, a further 3500 members were randomly selected for processing questionnaires and toenail specimens, with further selection for collecting and processing dietary questionnaires. See Figure 1 in van den Brandt et al.

A nested case-control design would require waiting for cases to occur before efficiently matching controls to them, which would delay processing the questionnaires for both cases and controls. The case-cohort approach allows data to be processed while cases are still being ascertained. In the case-cohort design, the person-year experience of the whole cohort is estimated from the sub-cohort, while cases are counted across the entire cohort.
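The estimation idea in the last sentence can be sketched as a crude incidence rate: cases are counted in the whole cohort, and the cohort's person-years are estimated by scaling up the random sub-cohort. The cohort and sub-cohort sizes below come from the text; the case count and sub-cohort person-years are hypothetical. Published case-cohort analyses usually rely on weighted survival models, so this only illustrates the scaling step.

```python
def case_cohort_rate(total_cases: int, subcohort_person_years: float,
                     subcohort_size: int, cohort_size: int) -> float:
    """Crude incidence rate per 1,000 person-years from a case-cohort design.

    Cases come from the whole cohort; cohort person-years are estimated by
    scaling the sub-cohort's person-years by cohort_size / subcohort_size.
    """
    estimated_cohort_py = subcohort_person_years * cohort_size / subcohort_size
    return 1000 * total_cases / estimated_cohort_py


# Sizes from the text; 600 cases and 30,000 sub-cohort person-years are hypothetical.
rate = case_cohort_rate(total_cases=600, subcohort_person_years=30_000,
                        subcohort_size=5_000, cohort_size=120_852)
print(f"{rate:.2f} per 1,000 person-years")
```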

One strength of a well-run cohort study is the range of outcomes that can be considered. A well-characterized group followed over a long period of time provides much useful information. For example, the Framingham study has studied 3 generations and added to our understanding of the roles of obesity, HDL lipids, and hypertension in heart disease and stroke, as well as contributing an algorithm for predicting CHD risk and identifying 8 genetic loci associated with hypertension. Using sub-cohorts for specific purposes can minimize the cost and length of a study.

9.7 - Sample Size and Power for Epidemiologic Studies

One reason for performing sample size calculations in the planning phase of a study is to assure confidence in the study results and conclusions. We certainly wish to propose a study that has a chance to be scientifically meaningful.

Are there other implications, beyond a lack of confidence in the results, to an inadequately-powered study? Suppose you are reviewing grants for a funding agency. If insufficient numbers of subjects are to be enrolled for the study to have a reasonable chance of finding a statistically significant difference, should the investigator receive funds from the granting agency? Of course not. The FDA, NIH, NCI, and most other funding agencies are concerned about sample size and power in the studies they support and do not consider funding studies that would waste limited resources.

Money is not the only limited resource. What about potential study subjects? Is it ethical to enroll subjects in a study with a small probability of producing clinically meaningful results, precluding their participation in a more adequately-powered study? What about the horizon of patients not yet treated? Are there ethical implications to conducting a study in which treatment and care actually help prolong life, yet due to inadequate power, the results are unable to alter clinical practice?

Too many subjects are also problematic. If more subjects are recruited than needed, the study is prolonged. Wouldn't it be preferable to quickly disseminate the results if the treatment is worthwhile instead of continuing a study beyond the point where a significant effect is clear? Or, if the treatment proves detrimental to some, how many subjects will it take for the investigator to conclude there is a clear safety issue?

Recognizing that careful consideration of statistical power and the sample size is critical to assuring scientifically meaningful results, protection of human subjects, and good stewardship of fiscal, tissue, physical, and staff resources, let's review how power and sample size are determined.

One-Sided Hypothesis Testing

  • Null hypothesis – \(H_0\colon \text{disease frequency}_1=\text{disease frequency}_2\)
  • Alternative hypothesis – \(H_1\colon\text{disease frequency}_1 \gt \text{disease frequency}_2\)

Power is calculated with regard to a particular set of hypotheses. Often epidemiologic hypotheses compare an observed proportion or rate to a hypothesized value. The above hypotheses are one-sided, i.e., testing whether the proportion is significantly less in group 2 than in group 1. An example of two-sided hypotheses would be testing equality of proportions as the null hypothesis, using inequality of proportions as the alternative.

Possible Outcomes for Tests of Hypotheses

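When testing hypotheses, two types of error are possible: a Type I error, rejecting the null hypothesis when it is true (probability \(\alpha\)), and a Type II error, failing to reject the null hypothesis when a given alternative is true (probability \(\beta\)).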

Using the analogy of a trial, we want to make correct decisions: declare the guilty, 'guilty' and the innocent, 'innocent'. We do not wish to declare the innocent 'guilty' or the guilty 'innocent'.

Statistical Power

Power is the probability that the null hypothesis is rejected if a specific alternative hypothesis is true. \(\beta\) represents Type II error, the probability of not rejecting the null hypothesis when the given alternative is true.

\(1-\beta\) = power

The power of a study should be at least 80%, and studies are often designed to have 90-95% power to detect a particular clinical effect.

What factors affect power?

\(\alpha\), \(\beta\), effect size, variability, baseline incidence, and sample size n

\(\alpha\) is the level of significance, the probability of a Type I error. This is usually 5% or 1%, meaning the investigator is willing to accept this level of risk of declaring the null hypothesis false when it is actually true.

The effect size is the deviation from the null that the investigator wishes to be able to detect. The effect size should be clinically meaningful. It may be based on the results of prior or pilot studies. For example, a study might be powered to be able to detect a relative risk of 2 or greater.

Sometimes a standardized effect size is given, i.e., the effect size divided by the standard deviation. This is a unitless value. If power is calculated in this manner, the standardized effect size is usually between 0.1 and 0.5, with 0.5 meaning \(H_1\) is 0.5 standard deviations away from \(H_0\).

Variability may be expressed in terms of a standard deviation, or an appropriate measure of variability for the statistic. If the hypotheses are concerned with a population proportion, the value of the proportion and the sample size are used to calculate the variability. The investigator will need an estimate of the variability in order to calculate power. Reasonable estimates may be obtained from historical data, pilot study data, or a literature search.

A study may have multiple sources of variation, each accounted for in the analysis. For example, a repeated measures design will need to account for both within-subject and between-subject variability.

The baseline incidence rate is related to the effect size. If it is hypothesized that a rate has increased or decreased, the baseline rate and the effect size must both be known to calculate the power for detecting such a change.

With knowledge of the above factors, the power of a statistical test can be calculated for a given sample size. Alternatively, the required sample size for a given power can be calculated.

Power is directly related to effect size, sample size, and significance level: increasing any of the effect size, the sample size, or the significance level increases statistical power, all other factors being equal. Power is inversely related to variability, so decreasing variability increases the power of a study.
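These relationships can be checked numerically by solving the one-sided sample-size formula for a difference in prevalence (from the smoking example earlier in this lesson) for \(z_{\beta}\), giving \(z_{\beta} = \left(d\sqrt{n} - z_{\alpha}\sqrt{\pi_0(1-\pi_0)}\right)/\sqrt{\pi_1(1-\pi_1)}\) and power \(= \Phi(z_{\beta})\). The sketch below is a normal-approximation illustration with an illustrative function name, not a replacement for dedicated power software.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(pi0: float, pi1: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided test that prevalence fell from pi0 to pi1."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha)
    d = pi0 - pi1
    # Rearranged sample-size formula, solved for z_beta.
    z_beta = (d * sqrt(n) - z_alpha * sqrt(pi0 * (1 - pi0))) / sqrt(pi1 * (1 - pi1))
    return std.cdf(z_beta)  # power = 1 - beta


base = power_one_sided(0.30, 0.28, n=4417)              # about 0.90
smaller_n = power_one_sided(0.30, 0.28, n=2000)         # less data, less power
bigger_effect = power_one_sided(0.30, 0.25, n=2000)     # larger effect, more power
looser_alpha = power_one_sided(0.30, 0.28, n=2000, alpha=0.10)  # larger alpha, more power
print(f"{base:.3f} {smaller_n:.3f} {bigger_effect:.3f} {looser_alpha:.3f}")
```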

If the power of a study is relatively high and a statistically significant effect is not observed, this implies the effect, if any, is small.

Sample Size in Epidemiologic Studies

Epidemiologic studies can be population-based or non-population-based, such as case-control studies.

  • Differences in proportions (e.g., attributable risk)
  • Ratios (e.g., relative risks, relative rates, prevalence ratios)
  • Unmatched study designs
  • Multiple controls/case
  • Matched study designs


How to Write a Strong Hypothesis | Steps & Examples

Published on May 6, 2022 by Shona McCombes. Revised on November 20, 2023.

A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection.

Example: Hypothesis

Daily apple consumption leads to fewer doctor’s visits.


A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess – it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Variables in hypotheses

Hypotheses propose a relationship between two or more types of variables.

  • An independent variable is something the researcher changes or controls.
  • A dependent variable is something the researcher observes and measures.

If there are any control variables, extraneous variables, or confounding variables, be sure to jot those down as you go to minimize the chances that research bias will affect your results.

For example, in a hypothesis linking exposure to the sun with happiness, the independent variable is exposure to the sun (the assumed cause) and the dependent variable is the level of happiness (the assumed effect).


Step 1. Ask a question

Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.

Step 2. Do some preliminary research

Your initial answer to the question should be based on what is already known about the topic. Look for theories and previous studies to help you form educated assumptions about what your research will find.

At this stage, you might construct a conceptual framework to ensure that you’re embarking on a relevant topic. This can also help you identify which variables you will study and what you think the relationships are between them. Sometimes, you’ll have to operationalize more complex constructs.

Step 3. Formulate your hypothesis

Now you should have some idea of what you expect to find. Write your initial answer to the question in a clear, concise sentence.

Step 4. Refine your hypothesis

You need to make sure your hypothesis is specific and testable. There are various ways of phrasing a hypothesis, but all the terms you use should have clear definitions, and the hypothesis should contain:

  • The relevant variables
  • The specific group being studied
  • The predicted outcome of the experiment or analysis

Step 5. Phrase your hypothesis in three ways

To identify the variables, you can write a simple prediction in if…then form. The first part of the sentence states the independent variable and the second part states the dependent variable.

In academic research, hypotheses are more commonly phrased in terms of correlations or effects, where you directly state the predicted relationship between variables.

If you are comparing two groups, the hypothesis can state what difference you expect to find between them.

Step 6. Write a null hypothesis

If your research involves statistical hypothesis testing, you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as \(H_0\), while the alternative hypothesis is \(H_1\) or \(H_a\).

  • \(H_0\): The number of lectures attended by first-year students has no effect on their final exam scores.
  • \(H_1\): The number of lectures attended by first-year students has a positive effect on their final exam scores.



Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Cite this Scribbr article


McCombes, S. (2023, November 20). How to Write a Strong Hypothesis | Steps & Examples. Scribbr. Retrieved April 16, 2024, from https://www.scribbr.com/methodology/hypothesis/


Step 4: Test Hypotheses

Once investigators have narrowed down the likely source of the outbreak to a few possible foods, they test those hypotheses. Investigators can use many different methods to test their hypotheses, but most methods entail studies that compare how often (frequency) sick people in the outbreak ate certain foods to how often people not part of the outbreak ate those foods.

If eating a particular food is associated with getting sick in the outbreak, it provides evidence that the food is the likely source. Investigators can describe the strength of the association between food and illness by using statistical tests or measures, such as odds ratios and confidence intervals.
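As an illustration of the odds ratio and confidence interval mentioned above, the sketch below uses one common method, the Woolf (log-based) interval, on a 2×2 table. The counts are hypothetical, and outbreak teams may use other methods (for example, exact intervals when cells are small).

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Odds ratio with a Woolf confidence interval from a 2x2 table.

    a = ill people who ate the food,   b = ill people who did not
    c = well people who ate the food,  d = well people who did not
    All four cells must be non-zero for this simple version.
    """
    or_ = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)       # 1.96 for a 95% interval
    return or_, exp(log(or_) - z * se_log), exp(log(or_) + z * se_log)


# Hypothetical outbreak counts, for illustration only.
or_, lo, hi = odds_ratio_ci(a=30, b=10, c=20, d=40)
print(f"OR = {or_:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
```

An interval that lies entirely above 1, as here, is what would count as evidence of an association between the food and illness.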

Illness clusters

An illness cluster is when two or more people who do not live in the same household report eating at the same restaurant location, attending a common event, or shopping at the same location of a grocery store before getting sick. Investigating illness clusters can help test hypotheses about the source of an outbreak because an illness cluster suggests that the contaminated food item was served or sold at the cluster location.

Conducting epidemiologic studies within illness cluster locations can be an effective way to identify foods that are associated with illness. Case-control and cohort studies can both be used in illness cluster investigations and are especially useful when they assess associations between illness and specific food ingredients.

In some multistate outbreaks, investigators identify numerous illness clusters. In those situations, looking for common ingredients that people ate across all the illness clusters can help investigators test hypotheses, even in the absence of an epidemiologic study.

Surveys of healthy people

Investigators often compare the frequency of foods reported by sick people in a multistate outbreak to data that already exist about healthy people. The most common source for data about how often healthy people eat certain foods is the FoodNet Population Survey, a periodic survey of randomly selected residents in the FoodNet surveillance area. The most recent FoodNet Population Survey was conducted during 2018–2019 and included interviews with 38,743 adults and children. In addition to information on food exposures, the survey also includes questions on demographic characteristics, such as age, gender, race, and ethnicity. Investigators use statistical tests to determine if people in an outbreak report eating any of the suspected foods significantly more often than people in the survey. Comparing the frequency of foods reported by sick people to existing data is often faster than conducting a formal epidemiologic study.
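One simple test for this kind of comparison is an exact one-sided binomial test: treat the survey's proportion of people eating the food as the background probability and ask how surprising the number of outbreak cases who ate it would be. The counts and the 40% background figure below are hypothetical, and this is only one of several tests investigators might use.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): an exact one-sided p-value.

    Tests whether outbreak cases report a food more often than the
    background proportion p taken from a population survey.
    """
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical: 18 of 25 cases ate the food; the survey suggests 40% of people eat it.
p_value = binomial_upper_tail(k=18, n=25, p=0.40)
print(f"one-sided p = {p_value:.4f}")
```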

Epidemiologic studies

If one or more of the suspected foods under consideration are not included on the FoodNet Population Survey, investigators might need to do an epidemiologic study to determine whether consuming the food is associated with being ill. Several types of studies can be conducted during multistate foodborne outbreaks:

  • Case-control studies: Investigators collect information from sick people (cases) and people who are not sick (controls) to see whether cases ate certain foods significantly more often than controls.
  • Case-case studies: Investigators compare sick people in the outbreak to other sick people who are not part of the outbreak.
  • Cohort studies: Investigators gather data from all the people who attended an event or ate at the same restaurant and compare the frequency of illness between people who did and did not eat specific foods. When people who ate a certain food got sick significantly more often than people who did not eat it, this provides evidence that the food is the source of the outbreak.
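The cohort comparison described in the last item is usually summarized as attack rates (the proportion who became ill) among people who did and did not eat a food, and their ratio. A minimal sketch with hypothetical event counts:

```python
def attack_rate_ratio(ill_ate: int, total_ate: int,
                      ill_not_ate: int, total_not_ate: int):
    """Attack rates among those who ate / did not eat a food, and their ratio."""
    ar_ate = ill_ate / total_ate          # proportion ill among those who ate it
    ar_not = ill_not_ate / total_not_ate  # proportion ill among those who did not
    return ar_ate, ar_not, ar_ate / ar_not


# Hypothetical event: 40 of 50 who ate a dish fell ill versus 5 of 50 who did not.
ate, not_ate, ratio = attack_rate_ratio(40, 50, 5, 50)
print(f"attack rates {ate:.0%} vs {not_ate:.0%}, ratio {ratio:.1f}")
```

A ratio well above 1, supported by a significance test or confidence interval, is the pattern that points to a food as the likely source.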

Challenges of hypothesis testing

There are several reasons why hypothesis testing might not identify the likely source of an outbreak.

  • The initial investigation did not lead to a strong hypothesis to test.
  • There were too few illnesses to statistically analyze differences between sick people and people who were not part of the outbreak.
  • Sick people in the outbreak could not be reached to ask about their food exposures.
  • Certain ingredients were commonly consumed together in dishes, such as tomatoes, onions, and peppers in a salsa.

Even if investigators do not find a statistical association between a food and illness, the outbreak could still be foodborne. If the outbreak has ended, the source of the outbreak is considered unknown. If people are still getting sick, investigators keep gathering information to find the food that is causing the illnesses.


Observational Studies: Cohort and Case-Control Studies

Jae W. Song

1 Research Fellow, Section of Plastic Surgery, Department of Surgery, The University of Michigan Health System; Ann Arbor, MI

Kevin C. Chung

2 Professor of Surgery, Section of Plastic Surgery, Department of Surgery, The University of Michigan Health System; Ann Arbor, MI

Observational studies are an important category of study designs. To address some investigative questions in plastic surgery, randomized controlled trials are not always indicated or ethical to conduct. Instead, observational studies may be the next best method to address these types of questions. Well-designed observational studies have been shown to provide results similar to randomized controlled trials, challenging the belief that observational studies are second-rate. Cohort studies and case-control studies are two primary types of observational studies that aid in evaluating associations between diseases and exposures. In this review article, we describe these study designs, methodological issues, and provide examples from the plastic surgery literature.

Because of the innovative nature of the specialty, plastic surgeons are frequently confronted with a spectrum of clinical questions by patients who inquire about “best practices.” It is thus essential that plastic surgeons know how to critically appraise the literature to understand and practice evidence-based medicine (EBM) and also contribute to the effort by carrying out high-quality investigations. 1 Well-designed randomized controlled trials (RCTs) have held the pre-eminent position in the hierarchy of EBM as level I evidence (Table 1). However, RCT methodology, which was first developed for drug trials, can be difficult to apply to surgical investigations. 3 Instead, well-designed observational studies, recognized as level II or III evidence, can play an important role in deriving evidence for plastic surgery. Results from observational studies are often criticized for being vulnerable to the influence of unpredictable confounding factors. However, recent work has challenged this notion, showing comparable results between observational studies and RCTs. 4, 5 Observational studies can also complement RCTs in hypothesis generation, establishing questions for future RCTs, and defining clinical conditions.

Table 1. Levels of Evidence-Based Medicine

From Ref 1.

Observational studies fall under the category of analytic study designs, which are further sub-classified as observational or experimental study designs (Figure 1). The goal of analytic studies is to identify and evaluate causes or risk factors of diseases or health-related events. The differentiating characteristic between observational and experimental study designs is that in the latter, the groups are defined by whether or not subjects undergo an intervention. By contrast, in an observational study, the investigator does not intervene but simply “observes” and assesses the strength of the relationship between an exposure and a disease variable. 6 Three types of observational studies are cohort studies, case-control studies, and cross-sectional studies (Figure 1). Case-control and cohort studies offer a specific advantage because they measure disease occurrence and its association with an exposure across a temporal dimension (i.e., a prospective or retrospective study design). Cross-sectional studies, also known as prevalence studies, examine data on disease and exposure at a single time point (Figure 2). 6 Because the temporal relationship between disease occurrence and exposure cannot be established, cross-sectional studies cannot assess a cause-and-effect relationship. In this review, we primarily discuss cohort and case-control study designs and related methodologic issues.

Figure 1. Analytic Study Designs. Adapted with permission from Joseph Eisenberg, Ph.D.

Figure 2. Temporal Design of Observational Studies: Cross-sectional studies are known as prevalence studies and do not have an inherent temporal dimension. These studies evaluate subjects at one point in time, the present time. By contrast, cohort studies can be either retrospective (Latin-derived prefix “retro”, meaning “back, behind”) or prospective (Greek-derived prefix “pro”, meaning “before, in front of”). Retrospective studies “look back” in time, in contrast with prospective studies, which “look ahead” to examine causal associations. Case-control study designs are also retrospective and assess the history of the subject for the presence or absence of an exposure.

COHORT STUDY

The term “cohort” is derived from the Latin word cohors. Roman legions were composed of ten cohorts. During battle, each cohort, or military unit, consisting of a specific number of warriors and commanding centurions, was traceable. The word “cohort” has been adopted into epidemiology to define a set of people followed over a period of time. W.H. Frost, an epidemiologist from the early 1900s, was the first to use the word “cohort”, in his 1935 publication assessing age-specific mortality rates and tuberculosis. 7 The modern epidemiological definition of the word is a “group of people with defined characteristics who are followed up to determine incidence of, or mortality from, some specific disease, all causes of death, or some other outcome.” 7

Study Design

A well-designed cohort study can provide powerful results. In a cohort study, a study population free of the outcome or disease is first identified by the exposure or event of interest. It is then followed in time until the disease or outcome of interest occurs (Figure 3A). Because exposure is identified before the outcome, cohort studies have a temporal framework to assess causality. They therefore have the potential to provide the strongest scientific evidence among observational designs. 8 Advantages and disadvantages of a cohort study are listed in Table 2. 2, 9 Cohort studies are particularly advantageous for examining rare exposures because subjects are selected by their exposure status. Additionally, the investigator can examine multiple outcomes simultaneously. Disadvantages include the need for a large sample size and a potentially long follow-up period, which together can make the study costly.
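The basic comparison in a cohort study is incidence in the exposed group versus incidence in the unexposed group. The short Python sketch below computes the cumulative incidence in each group from a 2×2 table, then takes their ratio (the relative risk). The counts and function names are hypothetical and for illustration only. The sketch also ignores person-time and censoring, which real cohort analyses usually have to handle.

```python
from dataclasses import dataclass


@dataclass
class CohortCounts:
    """Counts from a cohort study's 2x2 table (hypothetical example values)."""
    exposed_cases: int      # exposed subjects who developed the outcome
    exposed_total: int      # all exposed subjects followed
    unexposed_cases: int    # unexposed subjects who developed the outcome
    unexposed_total: int    # all unexposed subjects followed


def cumulative_incidence(cases: int, total: int) -> float:
    """Proportion of an initially outcome-free group that develops the outcome."""
    if total <= 0:
        raise ValueError("group size must be positive")
    return cases / total


def relative_risk(c: CohortCounts) -> float:
    """Risk ratio: incidence in the exposed divided by incidence in the unexposed."""
    risk_exposed = cumulative_incidence(c.exposed_cases, c.exposed_total)
    risk_unexposed = cumulative_incidence(c.unexposed_cases, c.unexposed_total)
    return risk_exposed / risk_unexposed


if __name__ == "__main__":
    counts = CohortCounts(exposed_cases=40, exposed_total=200,
                          unexposed_cases=30, unexposed_total=300)
    print(cumulative_incidence(40, 200))  # 0.2
    print(cumulative_incidence(30, 300))  # 0.1
    print(relative_risk(counts))          # 2.0
```

A relative risk above 1 means the outcome was more frequent among the exposed. Because this is an observational design, that is an association, not proof of causation.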

Figure 3. Cohort and Case-Control Study Designs

Table 2. Advantages and Disadvantages of the Cohort Study

Cohort studies can be prospective or retrospective (Figure 2). Prospective studies are carried out from the present time into the future. Because prospective studies are designed with specific data collection methods, they have the advantage of being tailored to collect specific exposure data, which may be more complete. The disadvantage of a prospective cohort study may be the long follow-up period while waiting for events or diseases to occur. Thus, this study design is inefficient for investigating diseases with long latency periods and is vulnerable to a high rate of loss to follow-up. Prospective cohort studies are invaluable, as the landmark Framingham Heart Study (started in 1948 and still ongoing) shows. 10 In the plastic surgery literature, however, this design is generally seen as inefficient and impractical. Instead, retrospective cohort studies are better suited given their timeliness and low cost.

Retrospective cohort studies, also known as historical cohort studies, are carried out at the present time and look to the past to examine medical events or outcomes. In other words, a cohort of subjects selected based on exposure status is chosen at the present time, and outcome data (i.e., disease status, event status), which were measured in the past, are reconstructed for analysis. The primary disadvantage of this study design is the limited control the investigator has over data collection: the existing data may be incomplete, inaccurate, or inconsistently measured between subjects. 2 However, because the data are immediately available, this study design is less costly and shorter than a prospective cohort study. For example, Spear and colleagues examined the effect of obesity on complication rates after pedicled TRAM flap reconstruction by retrospectively reviewing 224 pedicled TRAM flaps in 200 patients over a 10-year period. 11 In this example, subjects who underwent pedicled TRAM flap reconstruction were selected and categorized into cohorts by their exposure status: normal/underweight, overweight, or obese. The outcomes of interest were various flap and donor site complications. The findings revealed that obese patients had a significantly higher incidence of donor site complications, multiple flap complications, and partial flap necrosis than normal/underweight or overweight patients. An advantage of the retrospective design in this study is immediate access to the data. A disadvantage is the limited control over data collection, because the data were gathered retrospectively over 10 years; for example, a limitation reported by the authors is that mastectomy flap necrosis was not uniformly recorded for all subjects. 11
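The step of assigning subjects to exposure cohorts and tallying outcomes in a design like this one can be sketched as follows. This is a hypothetical illustration, not the authors' analysis:

- the BMI cut-offs are assumed to be the standard WHO adult categories, because the text does not state the study's own definitions;
- the records are invented;
- each record is one flap, since the example analyzed 224 flaps in 200 patients.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bmi_category(bmi: float) -> str:
    """Assign an exposure cohort from BMI (kg/m^2).

    ASSUMPTION: standard WHO adult cut-offs; the cited study's own
    definitions may differ.
    """
    if bmi < 25.0:
        return "normal/underweight"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def incidence_by_cohort(records: Iterable[Tuple[float, bool]]) -> Dict[str, float]:
    """records: (bmi, had_complication) pairs, one per flap (hypothetical data)."""
    cases: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for bmi, had_complication in records:
        cohort = bmi_category(bmi)
        totals[cohort] += 1
        cases[cohort] += int(had_complication)
    # Cumulative incidence of complications within each exposure cohort
    return {cohort: cases[cohort] / totals[cohort] for cohort in totals}


if __name__ == "__main__":
    flaps = [(22.0, False), (24.5, True), (21.0, False),
             (27.0, False), (28.3, True),
             (31.0, True), (35.2, True), (33.0, False)]
    for cohort, risk in incidence_by_cohort(flaps).items():
        print(f"{cohort}: {risk:.2f}")
```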

An important distinction lies between cohort studies and case-series. The distinguishing feature between these two types of studies is the presence of a control, or unexposed, group. In contrast to epidemiological cohort studies, case-series are descriptive studies that follow one small group of subjects; in essence, they are extensions of case reports. The cases are usually drawn from the authors' own experience, generally involve a small number of patients, and, more importantly, lack a control group. 12 Studies that examine only one group of subjects are often mislabeled as “cohort studies”. Yet, unless a second comparison group serving as a control is present, these studies are case-series. The next step in strengthening an observation from a case-series is to select appropriate control groups and conduct a cohort or case-control study. Case-control studies are discussed in the following section. 9

Methodological Issues

Selection of Subjects in Cohort Studies

The hallmark of a cohort study is defining the selected group of subjects by exposure status at the start of the investigation. A critical requirement of subject selection is that both the exposed and unexposed groups are selected from the same source population (Figure 4). 9 Subjects who are not at risk of developing the outcome should be excluded from the study. The source population is determined by practical considerations, such as sampling: subjects may be sampled from a hospital, a community, or a doctor's individual practice. A subset of these subjects will be eligible for the study.
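A minimal sketch of this selection logic, assuming each candidate record carries three fields (all hypothetical names):

- a source-population label;
- an exposure flag;
- a flag for whether the outcome is already present at baseline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    subject_id: str
    source: str                    # hypothetical source-population label, e.g. "hospital_A"
    exposed: bool
    has_outcome_at_baseline: bool  # already has the outcome, so not at risk


def select_cohorts(candidates: List[Candidate],
                   source_population: str) -> Tuple[List[Candidate], List[Candidate]]:
    """Split at-risk subjects from one source population into exposed and unexposed cohorts."""
    eligible = [c for c in candidates
                if c.source == source_population and not c.has_outcome_at_baseline]
    exposed = [c for c in eligible if c.exposed]
    unexposed = [c for c in eligible if not c.exposed]
    return exposed, unexposed


if __name__ == "__main__":
    pool = [
        Candidate("s1", "hospital_A", True, False),
        Candidate("s2", "hospital_A", False, False),
        Candidate("s3", "hospital_A", True, True),   # excluded: not at risk
        Candidate("s4", "clinic_B", False, False),   # excluded: other source population
    ]
    exposed, unexposed = select_cohorts(pool, "hospital_A")
    print([c.subject_id for c in exposed], [c.subject_id for c in unexposed])
```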

Figure 4. Levels of Subject Selection. Adapted from Ref 9.

Attrition Bias (Loss to follow-up)

Because prospective cohort studies may require long follow-up periods, it is important to minimize loss to follow-up. Loss to follow-up occurs when the investigator loses contact with a subject, resulting in missing data. If too many subjects are lost to follow-up, the internal validity of the study is reduced. A general rule of thumb is that the loss to follow-up rate should not exceed 20% of the sample. 6 Any systematic differences related to the outcome or exposure between those who drop out and those who stay in the study should be examined where possible. This is done by comparing subjects who remain in the study with those who were lost to follow-up or dropped out. It is therefore important to select subjects who can be followed for the entire duration of the cohort study. Methods to minimize loss to follow-up are listed in Table 3.
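The 20% rule of thumb and the comparison of completers with dropouts can be checked with a few lines of Python. The numbers and helper names below are hypothetical. In practice the comparison would cover every baseline variable plausibly related to exposure or outcome, not only age.

```python
from statistics import mean


def loss_to_follow_up_rate(n_enrolled: int, n_lost: int) -> float:
    """Proportion of enrolled subjects with no outcome data at the end of follow-up."""
    if n_enrolled <= 0 or not 0 <= n_lost <= n_enrolled:
        raise ValueError("invalid counts")
    return n_lost / n_enrolled


def exceeds_rule_of_thumb(rate: float, threshold: float = 0.20) -> bool:
    """True if losses exceed the commonly cited 20% threshold."""
    return rate > threshold


def compare_baseline(retained: list, lost: list) -> tuple:
    """Mean of one baseline variable among retained vs. lost subjects."""
    return mean(retained), mean(lost)


if __name__ == "__main__":
    rate = loss_to_follow_up_rate(n_enrolled=500, n_lost=120)
    print(f"loss to follow-up: {rate:.0%}, exceeds 20%: {exceeds_rule_of_thumb(rate)}")
    # Hypothetical baseline ages: a large gap would suggest systematic dropout.
    print(compare_baseline([50, 52, 48, 55], [40, 38, 42, 44]))
```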

Table 3. Methods to Minimize Loss to Follow-Up

Adapted from Ref 2.

CASE-CONTROL STUDIES

Case-control studies historically grew out of interest in disease etiology. The conceptual basis of the case-control study is similar to taking a history and physical: the diseased patient is questioned and examined, and elements from this history are knitted together to reveal characteristics or factors that predisposed the patient to the disease. In fact, the practice of interviewing patients about behaviors and conditions preceding illness dates back to the Hippocratic writings of the 4th century B.C. 7

Reasons of practicality and feasibility inherent in each design typically dictate whether a cohort study or a case-control study is appropriate. The case-control design was first recognized in Janet Lane-Claypon's 1926 study of breast cancer, which found that a low fertility rate raises the risk of breast cancer. 13, 14 In the ensuing decades, case-control methodology crystallized with the landmark publication linking smoking and lung cancer in the 1950s. 15 Since then, retrospective case-control studies have become more prominent in the biomedical literature, with more rigorous methodological advances in design, execution, and analysis.

Case-control studies identify subjects by outcome status at the outset of the investigation. Outcomes of interest may be whether the subject has undergone a specific type of surgery, has experienced a complication, or has been diagnosed with a disease (Figure 3B). Once outcome status is identified and subjects with the outcome are categorized as cases, controls (subjects without the outcome but from the same source population) are selected. Data about exposure to one or several risk factors are then collected retrospectively, typically by interview, abstraction from records, or survey. Case-control studies are well suited to investigating rare outcomes or outcomes with a long latency period because subjects are selected from the outset by their outcome status. Thus, in comparison to cohort studies, case-control studies are quick, relatively inexpensive to implement, require comparatively fewer subjects, and allow multiple exposures or risk factors to be assessed for one outcome (Table 4). 2, 9

Table 4. Advantages and Disadvantages of the Case-Control Study

An example of a case-control investigation is the study by Zhang and colleagues, who examined the association of environmental and genetic factors with congenital microtia, 16 a rare condition with an estimated prevalence of 0.83 to 17.4 in 10,000. 17 They selected 121 congenital microtia cases based on clinical phenotype and 152 unaffected controls, matched by age and sex, from the same hospital during the same period. Controls were of Han Chinese origin from Jiangsu, China, the same area from which the cases were selected. This gave controls and cases the same genetic background, which matters because the study investigated an association between genetic factors and congenital microtia. To examine environmental factors, a questionnaire was administered to the mothers of both cases and controls. The authors concluded that adverse maternal health was among the main risk factors for congenital microtia, specifically maternal disease during pregnancy (OR 5.89, 95% CI 2.36-14.72), maternal toxicity exposure during pregnancy (OR 4.76, 95% CI 1.66-13.68), and resident area, such as living near industries associated with air pollution (OR 7.00, 95% CI 2.09-23.47). 16 A case-control design is the most efficient choice for this investigation, given the rarity of the disease outcome. Because congenital microtia is thought to have multifactorial causes, an additional advantage of the case-control design in this example is the ability to examine multiple exposures and risk factors.
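Odds ratios like those reported above are typically computed from a 2×2 table of exposure by case status. The sketch below uses Woolf's log-based 95% confidence interval. The cell counts are hypothetical, chosen only so that the totals match 121 cases and 152 controls. Zhang and colleagues' actual analysis may have used a different method, such as logistic regression.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-based) confidence interval.

    a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell needs a continuity correction or an exact method")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 121 cases (20 exposed), 152 controls (10 exposed).
    or_, lower, upper = odds_ratio_woolf(a=20, b=101, c=10, d=142)
    print(f"OR {or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval computed on the log scale is symmetric around ln(OR), so the geometric mean of its limits returns the point estimate. This gives a quick consistency check on reported results. For example, √(2.36 × 14.72) ≈ 5.89, which matches the first odds ratio quoted above.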

Selection of Cases

Sampling in a case-control study design begins with selecting the cases. First, it is imperative that the investigator explicitly define inclusion and exclusion criteria before selecting cases. For example, if the outcome is having a disease, specific diagnostic criteria, disease subtype, stage of disease, or degree of severity should be defined. Such criteria ensure that all the cases are homogeneous. Second, cases may be selected from a variety of sources, including hospital patients, clinic patients, or community subjects. Many communities maintain registries of patients with certain diseases, which can serve as a valuable source of cases. However, despite the methodologic convenience of these sources, validity issues may arise. For example, if cases are selected from one hospital, identified risk factors may be unique to that single hospital, which may weaken the generalizability of the study findings. Similarly, cases drawn from a hospital will most likely represent a more severe form of the disease than cases drawn from the community. 2 Finally, it is also important to select cases that are representative of cases in the target population to strengthen the study's external validity (Figure 4). Figure 5 illustrates potential reasons why cases from the original target population do or do not filter through to become available as cases (study participants) for a case-control study.

Figure 5. Levels of Case Selection. Adapted from Ref 2.

Selection of Controls

Selecting the appropriate group of controls can be one of the most demanding aspects of a case-control study. An important principle is that the controls should reflect the distribution of exposure in the source population that gave rise to the cases; in other words, both cases and controls should stem from the same source population. The investigator may also consider the control group to be an at-risk population, that is, one with the potential to develop the outcome. Because the validity of the study depends on the comparability of these two groups, cases and controls should otherwise meet the same inclusion criteria.

A case-control study that exemplifies this methodological feature is that of Chung and colleagues, who examined maternal cigarette smoking during pregnancy and the risk of newborns developing cleft lip/palate. 18 A salient feature of this study is the use of the 1996 U.S. Natality database, a population database, from which both cases and controls were selected. This database provided a large sample size to assess newborn development of cleft lip/palate (the outcome), which has a reported incidence of 1 in 1000 live births. 19 It also enabled the investigators to choose controls (i.e., healthy newborns) representative of the general population, strengthening the study's external validity. The study reported a significant relationship between maternal cigarette smoking and cleft lip/palate in the newborn (adjusted OR 1.34, 95% CI 1.36-1.76). 18

Matching is a method used in an attempt to ensure comparability between cases and controls and reduces variability and systematic differences due to background variables that are not of interest to the investigator. 8 Each case is typically individually paired with a control subject with respect to the background variables. The exposure to the risk factor of interest is then compared between the cases and the controls. This matching strategy is called individual matching. Age, sex, and race are often used to match cases and controls because they are typically strong confounders of disease. 20 Confounders are variables associated with the risk factor and may potentially be a cause of the outcome. 8 Table 5 lists several advantages and disadvantages with a matching design.
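Individual matching can be sketched as a greedy search. For each case, it picks the closest available control of the same sex within an age window. The 2-year caliper, the greedy order, and all names are assumptions for illustration. Published studies may use different matching variables, exact matching, or optimal rather than greedy algorithms.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Person:
    pid: str
    age: int
    sex: str


def match_controls(cases: List[Person], pool: List[Person],
                   caliper: int = 2) -> Dict[str, Optional[str]]:
    """Greedy 1:1 individual matching without replacement.

    Each case gets the available control of the same sex whose age is closest,
    provided the age difference is at most `caliper` years; otherwise None.
    """
    available = list(pool)
    pairs: Dict[str, Optional[str]] = {}
    for case in cases:
        candidates = [p for p in available
                      if p.sex == case.sex and abs(p.age - case.age) <= caliper]
        if not candidates:
            pairs[case.pid] = None   # unmatched case
            continue
        best = min(candidates, key=lambda p: abs(p.age - case.age))
        available.remove(best)       # each control is used at most once
        pairs[case.pid] = best.pid
    return pairs


if __name__ == "__main__":
    cases = [Person("c1", 30, "F"), Person("c2", 31, "F"), Person("c3", 60, "M")]
    pool = [Person("k1", 30, "F"), Person("k2", 33, "F"), Person("k3", 45, "M")]
    print(match_controls(cases, pool))
```

Because the algorithm is greedy, the order in which cases are processed can change which controls are assigned. Unmatched cases (here, anyone without a same-sex control within 2 years of age) must be reported, or handled by widening the caliper.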

Table 5. Advantages and Disadvantages of Using a Matching Strategy

Multiple Controls

Investigations examining rare outcomes may have a limited number of cases to select from, whereas the source population from which controls can be selected is much larger. In such scenarios, the study may be able to provide more information if multiple controls per case are selected. This method increases the “statistical power” of the investigation by increasing the sample size. The precision of the findings may improve by having up to about three or four controls per case. 21 - 23
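A commonly cited approximation illustrates the diminishing return from adding controls. Relative to a design with an unlimited number of controls, a design with k controls per case has an efficiency of about k/(k + 1). The sketch below tabulates this approximation. It assumes an exposure effect near the null and is not a full power calculation.

```python
from fractions import Fraction


def relative_efficiency(k: int) -> Fraction:
    """Approximate efficiency of k controls per case relative to unlimited controls."""
    if k < 1:
        raise ValueError("need at least one control per case")
    return Fraction(k, k + 1)


if __name__ == "__main__":
    for k in range(1, 7):
        eff = relative_efficiency(k)
        print(f"{k} control(s) per case: {float(eff):.0%} of maximum efficiency")
```

Under this approximation, efficiency rises from 50% with one control to 75% with three and 80% with four. After that, each additional control adds only a few percentage points, which is consistent with the three-to-four guideline above.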

Bias in Case-Control Studies

Evaluating exposure status can be the Achilles heel of case-control studies. Exposure information is typically collected by self-report, by interview, or from recorded information. It is therefore susceptible, respectively, to recall bias, to interviewer bias, or to gaps and errors in the recorded information. These biases decrease the internal validity of the investigation and should be carefully addressed and reduced in the study design. Recall bias occurs when cases and controls differ in how they report past exposures. The common scenario is that a subject with the disease (a case) unconsciously recalls and reports an exposure with better clarity because of the disease experience. Interviewer bias occurs when the interviewer asks leading questions or approaches cases and controls inconsistently. A good study design will implement a standardized interview in a non-judgemental atmosphere with well-trained interviewers to reduce interviewer bias. 9
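A simplified misclassification model shows how recall bias can distort an odds ratio. In the sketch below, the true odds ratio is 1, meaning there is no association. Cases recall a true exposure with higher sensitivity than controls, and no unexposed subject falsely reports exposure. All numbers are hypothetical.

```python
def observed_odds_ratio(a: float, b: float, c: float, d: float,
                        sens_cases: float, sens_controls: float) -> float:
    """Expected odds ratio after exposure under-reporting.

    a, b: truly exposed / unexposed cases; c, d: truly exposed / unexposed controls.
    sens_*: probability that a truly exposed subject reports the exposure.
    Specificity is assumed perfect (no false reports of exposure).
    """
    a_obs = a * sens_cases               # exposed cases who recall the exposure
    b_obs = b + a * (1 - sens_cases)     # exposed cases who forget are counted as unexposed
    c_obs = c * sens_controls            # exposed controls who recall the exposure
    d_obs = d + c * (1 - sens_controls)  # exposed controls who forget are counted as unexposed
    return (a_obs * d_obs) / (b_obs * c_obs)


if __name__ == "__main__":
    print(round(observed_odds_ratio(50, 50, 50, 50, 0.9, 0.6), 2))  # differential recall
    print(round(observed_odds_ratio(50, 50, 50, 50, 0.9, 0.9), 2))  # equal recall
```

With recall sensitivities of 0.9 in cases and 0.6 in controls, the expected observed odds ratio is about 1.91 despite no true association. With equal sensitivities, it stays at 1 in this example.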

The STROBE Statement: The Strengthening the Reporting of Observational Studies in Epidemiology Statement

In 2004, the first meeting of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) group took place in Bristol, UK. 24 The aim of the group was to establish guidelines on reporting observational research to improve the transparency of the methods, thereby facilitating the critical appraisal of a study's findings. A well-designed but poorly reported study is disadvantaged in contributing to the literature because the results and generalizability of the findings may be difficult to assess. To this end, a 22-item checklist was generated to enhance the reporting of observational studies across disciplines. 25, 26 This checklist is also located at the following website: www.strobe-statement.org . This statement is applicable to cohort studies, case-control studies, and cross-sectional studies. Of the 22 items, 18 are common to all three types of observational studies, and 4 are specific to each of the three study designs. To provide further guidance alongside this checklist, an “explanation and elaboration” article was published to help users better understand each item on the checklist. 27 Plastic surgery investigators should peruse this checklist before designing their study and again when writing up the report for publication. Some journals now require authors to follow the STROBE Statement. A list of participating journals can be found on this website: http://www.strobe-statement.org./index.php?id=strobe-endorsement .

Because of the limitations of carrying out RCTs in surgical investigations, observational studies are becoming more popular for investigating the relationship between exposures, such as risk factors or surgical interventions, and outcomes, such as disease states or complications. The plastic surgery community needs to recognize that well-designed observational studies can provide valid results. Investigators can then both critically appraise and appropriately design observational studies to address important clinical research questions. The investigator planning an observational study can use the STROBE statement as a tool to outline the key features of a study at the outset, and return to it at the end to enhance transparency in methodology reporting.

Acknowledgments

Supported in part by a Midcareer Investigator Award in Patient-Oriented Research (K24 AR053120) from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (to Dr. Kevin C. Chung).

None of the authors has a financial interest in any of the products, devices, or drugs mentioned in this manuscript.


  • REGISTERED REPORT
  • Open access
  • Published: 12 April 2024

Assessing causal links between age at menarche and adolescent mental health: a Mendelian randomisation study

  • Adrian Dahl Askelund 1 , 2 , 3 ,
  • Robyn E. Wootton 2 , 3 , 4 , 5 ,
  • Fartein A. Torvik 1 , 6 ,
  • Rebecca B. Lawn 7 ,
  • Helga Ask 3 , 8 ,
  • Elizabeth C. Corfield 2 , 3 ,
  • Maria C. Magnus 6 ,
  • Ted Reichborn-Kjennerud 3 , 9 ,
  • Per M. Magnus 6 ,
  • Ole A. Andreassen 10 , 11 ,
  • Camilla Stoltenberg 12 , 13 ,
  • George Davey Smith 4 ,
  • Neil M. Davies 14 , 15 , 16 ,
  • Alexandra Havdahl 2 , 3 , 4 , 8 &
  • Laurie J. Hannigan   ORCID: orcid.org/0000-0003-3123-5411 2 , 3 , 4  

BMC Medicine volume 22, Article number: 155 (2024)


The timing of puberty may have an important impact on adolescent mental health. In particular, earlier age at menarche has been associated with elevated rates of depression in adolescents. Previous research suggests that this relationship may be causal, but replication and an investigation of whether this effect extends to other mental health domains is warranted.

In this Registered Report, we triangulated evidence from different causal inference methods using a new wave of data (N = 13,398) from the Norwegian Mother, Father, and Child Cohort Study. We combined multiple regression, one- and two-sample Mendelian randomisation (MR), and negative control analyses (using pre-pubertal symptoms as outcomes) to assess the causal links between age at menarche and different domains of adolescent mental health.

Our results supported the hypothesis that earlier age at menarche is associated with elevated depressive symptoms in early adolescence based on multiple regression (β = −0.11, 95% CI [−0.12, −0.09], p (one-tailed) < 0.01). One-sample MR analyses suggested that this relationship may be causal (β = −0.07, 95% CI [−0.13, 0.00], p (one-tailed) = 0.03), but the effect was small, corresponding to just a 0.06 standard deviation increase in depressive symptoms with each earlier year of menarche. There was also some evidence of a causal relationship with depression diagnoses during adolescence based on one-sample MR (OR = 0.74, 95% CI [0.54, 1.01], p (one-tailed) = 0.03), corresponding to a 29% increase in the odds of receiving a depression diagnosis with each earlier year of menarche. Negative control and two-sample MR sensitivity analyses were broadly consistent with this pattern of results. Multivariable MR analyses accounting for the genetic overlap between age at menarche and childhood body size provided some evidence of confounding. Meanwhile, we found little consistent evidence of effects on other domains of mental health after accounting for co-occurring depression and other confounding.

Conclusions

We found evidence that age at menarche affected diagnoses of adolescent depression, but not other domains of mental health. Our findings suggest that earlier age at menarche is linked to problems in specific domains rather than adolescent mental health in general.

Peer Review reports

Early pubertal timing has been associated with problems in a wide range of adolescent mental health domains (e.g., depression [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 ], anxiety [ 7 , 14 , 15 ], conduct disorders [ 3 , 7 , 16 , 17 , 18 , 19 ], and attention-deficit hyperactivity disorder (ADHD) [ 6 ]) across different indicators of pubertal development and across sexes [ 20 ]. The consistency of associations between early timing and adolescent mental health has led to the hypothesis that early pubertal timing is a transdiagnostic risk factor for psychopathology in adolescents [ 21 ].

Despite the apparent generality of associations between early pubertal timing and adolescent mental health, the prominent rise in rates of female depression beginning during puberty [ 22 ] has led to this outcome receiving particular empirical focus [ 23 ]. The timing of puberty in females is commonly indexed using the onset of menses (menarche). Earlier age at menarche has been associated with elevated depressive symptoms in adolescents in several observational studies [ 4 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 ], but not in all [ 33 , 34 , 35 , 36 , 37 ], and also higher rates of clinical depression during adolescence [ 31 , 37 ]. However, although early pubertal maturation in females has been associated with a wide range of problems in adolescence, these associations may dissipate by adulthood [ 3 , 38 ]. A notable exception in a large prospective study was that heightened risk of depression persisted into young adulthood for early maturers, in particular, for those with a history of conduct disorder [ 3 ].

The association between pubertal timing and depression in adolescent females may be due to the biological underpinnings of reproductive maturation. The female sex hormone estradiol increases with puberty and is associated with depression [ 39 , 40 ], and hormonal contraceptive use has been associated with higher levels of depressive symptoms, especially in adolescence [ 41 ]. In fact, the stage of breast development (governed primarily by estradiol) has been found to be associated with depression independently of the timing of menarche in adolescent females from the Avon Longitudinal Study of Parents and Children (ALSPAC) [ 42 ]. Interestingly, a recent study in the same sample found that a polygenic score for age at menarche showed a potential indirect association with adolescent depressive symptoms through the stage of breast development [ 43 ]. Alongside psychosocial pathways (e.g., visible breast development leading to unwanted sexual attention at a younger age), increases in estradiol represent a plausible biological mechanism for the link between early pubertal timing and depression in females.

Despite the series of observational studies, it is unknown whether the link between age at menarche and depression represents a truly causal relationship. This is important because several robust observational associations in epidemiology have turned out not to be causal and may instead have been the result of confounding (e.g., vitamin E supplement use and cardiovascular disease [ 44 ]). In the case of associations between age at menarche and depression, body mass index (BMI) is a particularly likely candidate for confounding given the robust (and plausibly causal) links between BMI and age at menarche [ 45 ] and between BMI and depression [ 46 , 47 , 48 ]. Failure to appropriately account for potential confounding, especially by BMI [ 4 , 25 , 27 , 28 , 29 , 30 , 32 , 33 , 34 , 35 , 36 , 37 ], has been a relatively common shortcoming of the literature on this topic to date. In previous studies that explicitly controlled for BMI, the relationship was somewhat attenuated [ 26 , 49 ]. Another study found BMI to be a partial mediator of the relationship between earlier menarche and depression [ 31 ].

Mendelian randomisation (MR) is a causal inference method that can be implemented in instrumental variable analyses [ 50 , 51 ], which is particularly useful when experimental manipulation of the variable of interest is not ethical or feasible. Since hundreds of genetic variants are strongly linked with age at menarche [ 52 ], single nucleotide polymorphisms (SNPs) that are independently associated with this phenotype can be used as genetic instruments in MR analyses. The logic of MR is analogous to that of a randomised controlled trial (RCT). Unlike in an RCT design where individuals are randomly assigned to experimental groups, in MR, we use random “assignment” to genotype (ensured by the random transmission of one of two possible alleles at each genetic locus from each parent to their child at conception [ 53 ]). Specifically, these genetic variants are used as instrumental variables, serving as a genetic proxy for age at menarche.
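With summary statistics from genome-wide association studies, this logic is often implemented in two steps. First, a Wald ratio is computed for each SNP: the SNP-outcome association divided by the SNP-exposure association. Second, the ratios are combined across SNPs by inverse-variance weighting (IVW). The sketch below implements the fixed-effect IVW estimator with first-order weights. The SNP effect sizes are hypothetical, and the sketch assumes every SNP is a valid instrument (no pleiotropy). It is not a description of the exact estimator used in this study.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Causal estimate from a single SNP: SNP-outcome / SNP-exposure association."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_x: Sequence[float], beta_y: Sequence[float],
                 se_y: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted estimate and its standard error.

    Equivalent to a weighted mean of the Wald ratios with weights
    beta_x**2 / se_y**2 (first-order weights, ignoring uncertainty in beta_x).
    """
    numerator = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    denominator = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return numerator / denominator, math.sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Hypothetical per-allele associations with age at menarche in years (bx)
    # and with standardised depressive symptoms (by).
    bx = [0.10, 0.20, 0.05]
    by = [-0.010, -0.020, -0.005]
    sy = [0.01, 0.01, 0.01]
    print([round(wald_ratio(x, y), 3) for x, y in zip(bx, by)])
    estimate, se = ivw_estimate(bx, by, sy)
    print(f"IVW estimate {estimate:.3f} (SE {se:.3f})")
```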

Whereas self-reported age at menarche may be associated with several different confounders (even if precisely measured), the genetic instrument is assumed to be independent of such confounding. Both the widespread genetic influence on age at menarche [ 52 ] and the high accuracy and reliability of self-reported age at menarche [ 54 ] jointly increase the strength of the genetic instrument employed here, which serves to improve study power and minimise weak instrument bias [ 55 ]. The strength of the genetic instrument makes MR especially valuable for advancing menarche research. Provided that some important assumptions of MR hold true, we can estimate the causal effects of age at menarche on adolescent mental health.
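A small simulation illustrates why an instrument that is independent of confounding helps. In the hypothetical data below, an unmeasured confounder raises both the exposure and the outcome, so ordinary regression is biased. The instrumental variable estimate recovers the true effect. With a single instrument, that estimate is the ratio of covariances, which is equivalent to two-stage least squares. All coefficients, the sample size, and the random seed are arbitrary assumptions.

```python
import random


def simulate(n: int = 50_000, true_effect: float = -0.1, seed: int = 1):
    """Simulate instrument z, exposure x and outcome y with an unmeasured confounder u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)  # instrument, e.g. a standardised polygenic score (independent of u)
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder
        xi = 0.5 * g + 0.8 * u + rng.gauss(0.0, 1.0)           # exposure
        yi = true_effect * xi + 0.6 * u + rng.gauss(0.0, 1.0)  # outcome
        z.append(g)
        x.append(xi)
        y.append(yi)
    return z, x, y


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


z, x, y = simulate()
ols_estimate = cov(x, y) / cov(x, x)  # confounded regression slope
iv_estimate = cov(z, y) / cov(z, x)   # single-instrument IV (Wald) estimate

if __name__ == "__main__":
    print(f"OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}  (true effect -0.1)")
```

Under these settings, the confounded slope is expected to be about +0.15, the wrong sign, while the IV estimate should fall close to −0.1. The IV estimate is only trustworthy if the instrument affects the outcome solely through the exposure, which is one of the MR assumptions referred to above.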

A previous study found preliminary evidence that the relationship between age at menarche and depression in early adolescence may be causal, using MR in ALSPAC ( N  = 2404) [ 56 ]. Specifically, they found that early age at menarche resulted in more depressive symptoms at age 14 (independent of BMI), but not later in adolescence. However, this study had low power due to a modest sample size for MR. Here, we aim to replicate the 14-year analyses in adolescents from a larger birth cohort, the Norwegian Mother, Father, and Child Cohort Study (MoBa) [ 57 ]. This replication will allow for a confirmatory and higher-powered test of the hypothesis that earlier age at menarche is causally related to adolescent depression.

Beyond replicating its key finding, we will also extend the previous approach [ 56 ] in several ways. First, we will test whether the effects of earlier age at menarche extend to other domains of mental health (anxiety disorders, conduct disorder (CD), oppositional defiant disorder (ODD), and attention-deficit hyperactivity disorder (ADHD)), independent of associations with depression. Second, we will use multivariable methods to examine different confounders or mechanisms by simultaneously including genetic instruments for childhood body size, adult BMI, or estradiol in the MR model together with age at menarche. Third, in line with recommendations to triangulate evidence across approaches for robust causal inference [ 58 ], we will combine MR with negative control analyses using symptoms prior to puberty as a negative control outcome. This triangulation is particularly important in the context of replication studies, given that the same sources of bias could lead to results being replicated in another study using the same methodology [ 59 , 60 ].

A previous hypothesis-free MR phenome-wide association study identified potential causal effects of age at menarche on adult mental health [ 61 ], but these were not followed up with replication in any independent cohorts. Here, we take a confirmatory approach, testing causal hypotheses about the role of age at menarche in the aetiology of developing mental health disorders. This is important in part because a causal effect of age at menarche may help explain the sharp rise in depression rates among females from early adolescence [ 22 ]. This research might further help with identifying female adolescents at increased risk, facilitating early identification and prevention of mental health problems in adolescence and beyond.

To test our hypotheses, we make use of the Registered Report format, demonstrating its applicability to epidemiological analyses of cohort data when a new wave of data collection ensures that the exposure and outcome data have not been observed prior to the analytic choices being made. This format, combined with several sensitivity tests, will strengthen our statistical inferences by preserving false-positive rates at the specified level [ 62 ] and ultimately increase confidence in the causal conclusions that are drawn.

We addressed the following research questions: (1) To what extent is age at menarche associated with adolescent depression? (2) Does age at menarche associate with symptoms or diagnoses in other domains of mental health, independent of depression? (3) What is the evidence for a causal link between age at menarche and depression? and (4) Is there evidence of causal links between age at menarche and other domains of mental health? The specific hypotheses for each research question are listed in Table  1 .

The Norwegian Mother, Father, and Child Cohort Study (MoBa) is a population-based pregnancy cohort study conducted by the Norwegian Institute of Public Health [ 57 ]. Pregnant women and their partners were recruited from all over Norway at approximately 17 weeks of gestation between 1999 and 2008. The women consented to participation in 41% of the pregnancies. The cohort includes approximately 114,500 children, 95,200 mothers, and 75,200 fathers. The current study is based on version 12 of the quality-assured data files released for research in January 2019. We also used data from the Medical Birth Registry of Norway (MBRN).

In MoBa, phenotype data have been collected by questionnaires from early pregnancy to middle childhood, provided primarily by mothers (around weeks 17, 22, and 30 of pregnancy and at child ages 0.5, 1.5, 3, 5, and 8 years). This project also made use of an ongoing wave of data collection in adolescence (questionnaires returned at ~ 14.4 years; hereafter age 14). The 14-year data were not available to us during the preparation of the stage 1 element of the Registered Report (see Fig.  1 ).

Figure 1. Overview of the Registered Report process

Inclusion criteria and sample size

We included all MoBa females (as registered at birth in MBRN) with any available phenotype data at age 14. There were 13,398 females with 14-year data, of which 9832 were genotyped.

Self-reported age at menarche (in years) from the 14-year questionnaire was included as the main exposure. We ran the observational analyses with both a continuous and a categorical (early/average/late) variable based on reported age at menarche (see Additional file 1 for further details). Values were imputed for those who had not yet reached menarche at age 14, using information about the stage of pubertal development, as well as all the covariates and outcomes (see Additional file 1 ). We also included the self-reported stage of breast development at age 14 as an additional exposure for sensitivity analyses.

Mental health problems

Depressive symptoms were assessed through the Short Mood and Feelings Questionnaire (SMFQ; 13 items) [ 63 ]. We also computed a dichotomised version of the SMFQ (see Additional file 1 ). Anxiety symptoms were assessed through a short form of the Screen for Child Anxiety-Related Disorders (SCARED; 5 items) [ 64 ]. Behaviour problems (CD, ODD, and ADHD) were assessed with the Rating Scale for Disruptive Behaviour Disorders (RS-DBD; 34 items) [ 65 ]. All symptom outcomes were log or square root transformed due to non-normal distributions. The measures were treated as continuous, and scores were standardised to have a mean of 0 and a standard deviation of 1. Information about the psychometric properties of the scales is provided in Additional file 2 : Table S1. An overview of all variables included in the study and their processing is in Additional file 3 : Table S2.
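The outcome processing described above (a normalising transformation followed by standardisation) can be sketched as follows. Which transformation was applied to which scale is documented in Additional file 3; the choice between `sqrt` and `log` here, and the +1 offset that keeps the log defined for a score of zero, are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def transform_and_standardise(scores, transform="sqrt"):
    """Transform raw symptom scores, then rescale to mean 0 and SD 1."""
    if transform == "sqrt":
        transformed = [math.sqrt(s) for s in scores]
    elif transform == "log":
        # +1 offset (an assumption) so that a score of 0 stays defined
        transformed = [math.log(s + 1) for s in scores]
    else:
        raise ValueError("transform must be 'sqrt' or 'log'")
    m, sd = mean(transformed), stdev(transformed)
    return [(v - m) / sd for v in transformed]


z = transform_and_standardise([0, 1, 4, 9, 16])  # hypothetical raw scores
print([round(v, 3) for v in z])
```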

Psychiatric diagnoses

We linked to the “control and payment of health refunds” database (KUHR) and the Norwegian Patient Registry (NPR) to obtain psychiatric diagnoses from medical records (see Additional file 1 for diagnostic codes and further details). Individuals were classified as a “case” in the case–control analysis if they had received a relevant diagnosis in either primary (covered by KUHR) or secondary health care (covered by NPR) during adolescence (between ages 10 and 17).

We included BMI at ages 8 and 14, age at questionnaire return, maternal and paternal age, parental education and income, financial problems, parental cohabitation, parity, and maternal prenatal and postnatal depression as covariates (see Additional file 3 : Table S2).

Genotyping and quality control

In MoBa, blood samples were obtained from children at birth (from the umbilical cord) [ 66 ]. The genotyping and quality control have been described in detail elsewhere [ 67 ].

Genetic instruments for Mendelian randomisation

A recent genome-wide association study (GWAS) meta-analysis of 42 studies involving 329,345 post-pubertal women of European ancestry found 389 independent signals associated with self-reported age at menarche that reached the conventional threshold for genome-wide significance ($p < 5 \times 10^{-8}$) in the discovery sample [ 52 ]. These variants were largely replicated in a sample of 39,543 post-pubertal women from the Icelandic deCODE study, where they explained 7.4% of the variance in age at menarche. First, we restricted these genome-wide significant variants to single nucleotide polymorphisms (SNPs) by removing insertions and deletions. Then, we extracted these SNPs (where available) from the genetic data in MoBa, which did not contribute to the GWAS meta-analysis. We then clumped the genome-wide significant SNPs available in MoBa for independence (linkage disequilibrium $R^2$ = 0.001, clumping window = 10,000 kb). For the one-sample MR, we used this set of SNPs to construct a weighted genetic risk score based on published GWAS effect estimates. The score was computed as the weighted sum of the age-at-menarche-increasing alleles across the selected SNPs. Specifically, we multiplied the number of effect alleles at each SNP (0, 1, or 2; or, if imputed, the probabilities of effect alleles) by its weight (the GWAS SNP-trait association), then summed these products and divided by the total number of SNPs used. Genotyping batch and the first 20 principal components were regressed out of the genetic instruments, the latter to control for confounding by population stratification.
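A minimal sketch of the score construction described above, for one individual at a time: effect-allele dosages are multiplied by the GWAS weights, summed, and divided by the number of SNPs. The `residualise` helper only illustrates regressing out a single covariate; the actual pipeline regressed out genotyping batch and 20 principal components jointly, which requires multiple regression. Function names, dosages and weights are hypothetical.

```python
def weighted_score(dosages, weights):
    """Weighted genetic score: sum(dosage * weight) / number of SNPs.

    dosages: effect-allele counts (0, 1, 2) or imputed dosages for one person.
    weights: GWAS SNP-trait effect estimates, aligned to the same effect alleles.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must be aligned SNP by SNP")
    return sum(d * w for d, w in zip(dosages, weights)) / len(weights)


def residualise(values, covariate):
    """Residuals from a simple linear regression of values on one covariate."""
    n = len(values)
    mv, mc = sum(values) / n, sum(covariate) / n
    sxx = sum((c - mc) ** 2 for c in covariate)
    slope = sum((c - mc) * (v - mv) for c, v in zip(covariate, values)) / sxx
    return [v - mv - slope * (c - mc) for v, c in zip(values, covariate)]


# Hypothetical dosages for three people at three SNPs, and GWAS weights
people = [[2, 1, 0], [0, 1.4, 2], [1, 1, 1]]
gwas_weights = [0.10, 0.20, 0.30]
scores = [weighted_score(p, gwas_weights) for p in people]
pc1 = [0.5, -0.2, 0.1]  # hypothetical first principal component
adjusted = residualise(scores, pc1)
```

The residualised scores have mean zero and are uncorrelated with the covariate, which is the property that protects the instrument against confounding by that covariate.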

We also prepared the age at menarche summary statistics—along with summary statistics for estradiol [ 68 ], adult BMI [ 69 ], recalled childhood body size [ 69 ], major depression [ 70 ], and the 14-year symptom outcomes in MoBa—for use in two-sample MR analyses (see Additional file 4 for details [ 69 , 70 , 71 , 72 , 73 , 74 ]). We also employed Steiger filtering [ 75 ] to create another genetic instrument for age at menarche, excluding SNPs that were more predictive of depression at age 14 than age at menarche. This primarily served to prevent reverse causation and to remove potential pleiotropic pathways other than the causal pathway of interest.
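The direction check behind Steiger filtering can be sketched as below. Each SNP's variance explained is approximated from its t-statistic as r² = t²/(t² + n − 2), which is exact for a simple linear regression and only an approximation for covariate-adjusted GWAS, and a SNP is kept only if it explains more variance in the exposure than in the outcome. This sketch applies the direction criterion alone; the full Steiger approach also assesses whether the difference is statistically reliable. The SNP identifiers and summary statistics are invented; the two sample sizes are taken from the text (exposure GWAS and genotyped MoBa females).

```python
def r2_from_summary(beta, se, n):
    """Approximate variance explained by a SNP from its t-statistic."""
    t = beta / se
    return t * t / (t * t + n - 2)


def steiger_keep(snps, n_exposure, n_outcome):
    """Return ids of SNPs that explain more variance in exposure than outcome."""
    kept = []
    for snp in snps:
        r2_exposure = r2_from_summary(snp["beta_x"], snp["se_x"], n_exposure)
        r2_outcome = r2_from_summary(snp["beta_y"], snp["se_y"], n_outcome)
        if r2_exposure > r2_outcome:
            kept.append(snp["id"])
    return kept


snps = [  # invented summary statistics for two hypothetical SNPs
    {"id": "snpA", "beta_x": 0.05, "se_x": 0.005, "beta_y": 0.02, "se_y": 0.015},
    {"id": "snpB", "beta_x": 0.02, "se_x": 0.0035, "beta_y": 0.08, "se_y": 0.015},
]
print(steiger_keep(snps, n_exposure=329_345, n_outcome=9_832))
```

Here snpB is dropped because, relative to the smaller outcome sample, it explains more variance in the outcome than in the exposure, the pattern expected under reverse causation or a pleiotropic pathway.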

Statistical analysis

Observational analyses

First, we ran linear regression analyses to estimate the observational associations between age at menarche and continuous symptom outcomes, and logistic regression analyses to estimate the observational associations with diagnostic outcomes from registry data. All models were run with and without the covariates described above to obtain adjusted and unadjusted estimates; inferences were based on the adjusted estimates.

Mendelian randomisation analyses

To avoid problems related to confounding and reverse causation common to traditional observational methods, MR uses $j$ genetic variants $G_1, G_2, \ldots, G_j$ as a proxy for the exposure $X$ to estimate the causal effect of $X$ on the outcome $Y$ (see Fig. 2 for an illustrative diagram) [ 50 ]. Because the genetic variants are assumed to be independent of potential confounders $U$, the resulting estimate is assumed to be free of confounding by $U$. This assumption builds on Mendel’s first and second laws of inheritance [ 53 ]: (1) the two alleles at a locus segregate into separate gametes with equal probability (law of segregation), and (2) alleles of different genes are inherited independently of each other during gamete formation (law of independent assortment).

Figure 2. Directed acyclic graph illustrating the Mendelian randomisation design. $G_j$ is the $j$th genetic variant, with direct effect $\gamma_j$ on exposure $X$ and direct effect $\alpha_j$ on outcome $Y$; $\theta$ is the estimated causal effect of the exposure on the outcome; $\phi_j$ is the relationship between confounders $U$ and $G_j$; dotted lines represent possible violations of the MR assumptions.

MR assumptions

The three main assumptions of MR are (1) that the instrument $G_j$ is associated with the exposure $X$ (the relevance assumption); (2) that there are no unmeasured confounders $U$ of the gene–outcome association (the independence assumption); and (3) that the genetic variants $G_j$ affect the outcome $Y$ only through the exposure $X$ (the exclusion restriction). While assumption 1 can be verified empirically, assumptions 2 and 3 are empirically unverifiable (but potentially falsifiable). Because instrumental variable estimates scale the gene–outcome association by the gene–exposure association, violations of these assumptions can produce strong biases; such estimates should therefore be interpreted with care and in conjunction with other evidence [ 76 ]. We employed several sensitivity analyses that have been developed to address potential bias from violations of the MR assumptions [ 51 ].

One-sample MR

In the one-sample MR analyses of continuous symptom variables, we used two-stage least squares (2SLS) regression. In the first stage, self-reported age at menarche $X$ was regressed on the genetic variants $G_j$ to obtain predicted values. In the second stage, the outcome $Y$ was regressed on these predicted values in place of self-reported age at menarche; we refer to them hereafter as “genetically predicted age at menarche”. For binary outcomes, a logistic model was used in the second stage. We applied a post-estimation correction of the standard errors (the HC1 option in the sandwich R package [ 77 ]) for both continuous and binary outcomes. In addition to one-sample MR, we conducted two-sample MR analyses based on GWAS of the symptom outcomes in MoBa. Combining one- and two-sample MR is beneficial even when the same outcome sample is used, since any bias from weak instruments would skew the one-sample estimate towards the (confounded) observational estimate and the two-sample estimate towards the null [ 78 ]. Conducting two-sample MR also maximises the availability of sensitivity analyses to test the MR assumptions.
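A bare-bones sketch of the two stages for a continuous outcome and a single instrument $Z$ (for example, the weighted score). It returns the point estimate only: standard errors from a naive second-stage regression are not valid, because its residuals are computed from the predicted rather than the observed exposure, and the post-estimation correction applied in the study is not reproduced here. With one instrument, the 2SLS estimate equals the Wald ratio cov(Z, Y)/cov(Z, X), which the check below confirms. The data values are invented.

```python
def ols(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """2SLS point estimate with one instrument z, exposure x and outcome y."""
    a0, a1 = ols(z, x)                   # stage 1: exposure on instrument
    x_hat = [a0 + a1 * zi for zi in z]   # "genetically predicted" exposure
    _, theta = ols(x_hat, y)             # stage 2: outcome on predicted exposure
    return theta


# Invented data: instrument score, exposure (age at menarche), outcome
z = [0.1, 0.4, 0.3, 0.8, 0.6, 0.2]
x = [12.1, 12.6, 12.4, 13.2, 12.9, 12.3]
y = [0.3, 0.0, 0.1, -0.4, -0.2, 0.2]
theta = two_stage_least_squares(z, x, y)
print(round(theta, 3))
```

In practice, a dedicated instrumental variable routine handles covariates and standard errors; the sketch only shows where the "genetically predicted" exposure enters.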

Two-sample MR

In the two-sample MR analyses, only the genotype–outcome ( G-Y ) association was estimated in MoBa. For these analyses (using the TwoSampleMR package [ 71 ]), we extracted estimates of the genotype–exposure ( G-X ) association from summary-level data from the age-at-menarche GWAS [ 52 ] and computed a SNP-specific Wald estimate for each SNP as the ratio of its G-Y association to its G-X association. We tested the heterogeneity between the Wald ratios using SNP-specific Q statistics. These estimates were then combined using the inverse variance weighted (IVW) meta-analysis approach to obtain an estimate of the causal effect.
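The two-sample steps can be sketched from summary statistics alone: a Wald ratio per SNP with its first-order standard error (the SE of the G-Y association divided by the absolute G-X association), a fixed-effect IVW combination, and Cochran's Q for heterogeneity. This is a simplified illustration, not the TwoSampleMR code; that package's IVW method can report a multiplicative random-effects standard error instead, so its SEs may differ. All numbers are invented.

```python
import math


def wald_ratios(beta_x, beta_y, se_y):
    """SNP-specific Wald ratios beta_y / beta_x with first-order SEs."""
    return [(by / bx, sy / abs(bx)) for bx, by, sy in zip(beta_x, beta_y, se_y)]


def ivw(ratios):
    """Fixed-effect inverse-variance weighted estimate, its SE and Cochran's Q."""
    weights = [1.0 / se ** 2 for _, se in ratios]
    total = sum(weights)
    estimate = sum(w * r for w, (r, _) in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (r - estimate) ** 2 for w, (r, _) in zip(weights, ratios))
    return estimate, se, q  # Q is compared with chi-squared, (n SNPs - 1) df


# Invented G-X (exposure GWAS) and G-Y (outcome GWAS) summary statistics
beta_x = [0.10, 0.10, 0.20]
beta_y = [0.01, 0.03, 0.04]
se_y = [0.01, 0.01, 0.02]
est, se, q = ivw(wald_ratios(beta_x, beta_y, se_y))
print(f"IVW = {est:.3f} (SE {se:.3f}), Q = {q:.2f}")
```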

Multivariable MR and MR sensitivity analyses

We conducted a battery of sensitivity analyses across the one- and two-sample MR (described in Additional file 5 [ 79 , 80 , 81 , 82 , 83 , 84 , 85 ]) to assess the robustness of results and the impact of horizontal pleiotropy, where the genetic variants affect the outcome through pathways other than the exposure of interest. A likely source of horizontal pleiotropy is BMI, which was therefore included as an additional exposure in multivariable Mendelian randomisation (MVMR) analyses [ 83 ]. The two-sample MR sensitivity analyses employed (MR-Egger [ 79 ], MR pleiotropy residual sum and outlier (MR-PRESSO) [ 80 ], weighted median [ 81 ], and contamination mixture methods [ 82 ]) make different assumptions about horizontal pleiotropy, and we consider effects that are consistent across these approaches to be more likely causal, in line with a triangulation approach.
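Two of the listed estimators are simple enough to sketch directly. MR-Egger regresses the G-Y associations on the G-X associations with an intercept (after orienting SNPs so that every G-X association is positive), weighting by 1/SE²; a non-zero intercept suggests directional pleiotropy. The weighted median orders the SNP-specific ratio estimates and interpolates the point at which half the total weight is reached. These are minimal sketches of common definitions, not the implementations used in the study; MR-PRESSO and the contamination mixture method are not shown, and the inputs are invented.

```python
def mr_egger(beta_x, beta_y, se_y):
    """Weighted regression of beta_y on beta_x with intercept.

    SNPs are oriented so every beta_x is positive; weights are 1 / se_y**2.
    Returns (intercept, slope): slope is the causal estimate, intercept the
    average directional pleiotropic effect.
    """
    xs, ys = [], []
    for bx, by in zip(beta_x, beta_y):
        sign = 1.0 if bx >= 0 else -1.0
        xs.append(sign * bx)
        ys.append(sign * by)
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    slope = (sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
             / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs)))
    return my - slope * mx, slope


def weighted_median(ratios, weights):
    """Interpolated weighted median of SNP-specific ratio estimates."""
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    b = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    cum, s = 0.0, []
    for wi in w:
        cum += wi
        s.append((cum - 0.5 * wi) / total)  # weighted percentile of each estimate
    if 0.5 <= s[0]:
        return b[0]
    below = max(i for i in range(len(s)) if s[i] < 0.5)
    if below == len(s) - 1:
        return b[-1]
    step = (0.5 - s[below]) / (s[below + 1] - s[below])
    return b[below] + (b[below + 1] - b[below]) * step


# Invented SNPs lying exactly on beta_y = 0.01 + 0.5 * beta_x
intercept, slope = mr_egger([0.1, 0.2, 0.3], [0.06, 0.11, 0.16], [0.01, 0.02, 0.01])
wm = weighted_median([0.1, 0.2, 0.3], [1.0, 1.0, 1.0])
```

In the toy data, the non-zero intercept (0.01) is exactly the kind of directional pleiotropy that an IVW estimate through the origin would absorb into the causal estimate.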

Equivalence testing

We additionally used equivalence testing to assess whether estimated effects could be considered practically equivalent to 0. The region of practical equivalence to 0 was set, for each analysis, by pre-defining the smallest effect size of interest (SESOI). Equivalence tests were carried out with a 5% alpha. Details of the full procedure for setting the SESOI and carrying out the equivalence testing are described in Additional file 6 [ 20 , 24 , 29 , 31 , 38 , 42 , 49 , 56 , 86 , 87 , 88 , 89 , 90 ].
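Equivalence testing of this kind is commonly carried out with two one-sided tests (TOST). The sketch below uses a normal approximation for an estimate and its standard error against a symmetric region (−SESOI, +SESOI). The SESOIs and the exact procedure used in the study are given in Additional file 6, so the values here are placeholders; the study's one-sided tests for directional hypotheses are also not covered by this symmetric sketch.

```python
from statistics import NormalDist


def tost(estimate, se, sesoi, alpha=0.05):
    """Two one-sided tests against the equivalence region (-sesoi, +sesoi).

    Returns (p_value, equivalent): equivalence is declared when both
    one-sided nulls (effect <= -sesoi, effect >= +sesoi) are rejected.
    """
    std_normal = NormalDist()
    p_lower = 1.0 - std_normal.cdf((estimate + sesoi) / se)  # H0: effect <= -sesoi
    p_upper = std_normal.cdf((estimate - sesoi) / se)        # H0: effect >= +sesoi
    p = max(p_lower, p_upper)
    return p, p < alpha


print(tost(estimate=-0.01, se=0.02, sesoi=0.06))  # placeholder values
```

An estimate can therefore be both statistically different from 0 and practically equivalent to 0, which is how the results below combine significance and equivalence tests.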

Negative control analyses

Since an individual’s age at menarche cannot directly influence their mental health prior to puberty, childhood symptoms can serve as a negative control outcome in our study. Such analyses can be used to detect unmeasured confounding in the context of MR, provided that the negative control outcome is associated with confounders in a similar way to the outcome of interest [ 87 ]. Here, we estimate the causal effect of age at menarche on symptoms of depression, anxiety, CD, ODD, and ADHD measured before puberty (at 8 years). We formally compare the estimate from the main analysis with this negative control by testing whether the 14-year estimate is consistent with an effect more extreme than the lower bound of the 95% CI of the 8-year estimate.

Missingness/handling of missing data

Within the 14-year sample, we used multiple imputation (MI) to account for missing data in all variables (see Additional file 3 for the amount of missing data per variable). Importantly, some individuals reported having not yet had their first menstrual period in the 14-year questionnaire. Imputed values for age at menarche for these individuals were not allowed to be lower than 15 years. Further details on the MI and handling of outliers are presented in Additional file 1 . In addition to MI, we used inverse probability weighting to address potential bias from selective attrition out of the study over time (see Additional file 1 ).
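Estimates from multiply imputed datasets are typically combined with Rubin's rules: the pooled estimate is the mean across the m imputations, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). A minimal sketch follows; the imputation model itself (Additional file 1) is not shown, degrees-of-freedom adjustments are omitted, and the inputs are invented.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, ses):
    """Pool m per-imputation estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = mean(estimates)                 # pooled point estimate
    within = mean(se ** 2 for se in ses)    # average within-imputation variance
    between = variance(estimates)           # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


est, se = pool_rubin([-0.12, -0.10, -0.11], [0.01, 0.01, 0.01])
print(f"pooled estimate {est:.3f}, pooled SE {se:.4f}")
```

The pooled SE is larger than any single-imputation SE whenever the imputations disagree, which is how uncertainty about the missing values enters the inference.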

Power calculations

For the stage 1 submission, power analyses were conducted in R by simulation for all null hypothesis significance tests (NHSTs) and equivalence tests used to investigate each hypothesis (see summary in Table  2 , and further details in Additional file 7 ).
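Power analysis by simulation repeatedly generates data under an assumed effect, runs the planned test, and records how often the null is rejected. The sketch below does this for a one-tailed test of a negative regression slope between standardised variables, using a normal approximation to the t-test. The effect sizes, sample sizes, and number of replicates are placeholders; the study's actual simulation set-up is described in Additional file 7.

```python
import math
import random
from statistics import NormalDist


def slope_z(x, y):
    """Slope of y on x divided by its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def simulated_power(beta, n, n_sims=200, alpha=0.05, seed=2024):
    """Share of simulated datasets in which H0 (slope >= 0) is rejected."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(alpha)  # lower-tail critical value
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta * a + rng.gauss(0.0, 1.0) for a in x]
        if slope_z(x, y) < critical:
            rejections += 1
    return rejections / n_sims


print(simulated_power(beta=-0.10, n=1000))
```

Running the same function with beta = 0 estimates the false-positive rate, a useful check that the simulated test keeps its nominal alpha.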

Software and analysis code

We conducted all statistical analyses in R version 4.1.2. Data preparation and analysis code is publicly available via GitHub: https://github.com/psychgen/aam-psych-adolesc-rr .

Overview of hypothesis tests and inference criteria

Pre-specified statistical tests and inference criteria for each hypothesis are summarised in Table  2 , including the interpretation of all potential patterns of results. Further details about the main and sensitivity analyses, equivalence bounds, and inference criteria are in Additional file 6 . Note that for directional hypotheses, we pre-specified one-sided null hypothesis significance tests and equivalence tests; therefore, some reported p -values are one-tailed.

The pre-registered analyses were conducted according to the stage 1 protocol [ 91 ], and all deviations are detailed and justified in Table  3 .

Descriptive statistics

After multiple imputation (required because 7.25% had not reached menarche at age 14), the mean age at menarche was 12.69 years (SD = 1.18). To test the accuracy of the imputation, we conducted a sensitivity analysis in which an equal number of observed age at menarche values were set to missing (see Additional file 8 ). The genetic instrument for age at menarche (based on 235 independent SNPs present in MoBa) explained 6.9% of the variation in age at menarche ($R^2$ = 0.069, F = 996.4) and was associated with BMI at 8 (β = −0.08, 95% CI [−0.10, −0.05], p < 0.01) and 14 years (β = −0.10, 95% CI [−0.12, −0.08], p < 0.01). There were no notable associations with other covariates (Additional file 9 : Table S3). The mean SMFQ depressive symptom score at age 14 was 9.20 (SD = 6.56).
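Instrument strength is often summarised by the first-stage F-statistic, which can be obtained from the first-stage R², the sample size n and the number of instruments k as F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below implements that formula. The n and k behind the reported F = 996.4 are not stated here, so the example uses invented values; a conventional rule of thumb treats F < 10 as a sign of weak instruments.

```python
def first_stage_f(r2, n, k):
    """First-stage F-statistic from R-squared, sample size n and k instruments."""
    if not (0 <= r2 < 1) or n <= k + 1:
        raise ValueError("need 0 <= r2 < 1 and n > k + 1")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Invented example: a single score (k = 1) explaining 5% of variance in n = 2000
f_stat = first_stage_f(0.05, 2000, 1)
print(round(f_stat, 1))
```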

Depressive symptoms and diagnoses

Main analyses of symptom outcomes

An earlier age at menarche was observationally associated with more depressive symptoms at age 14 (β = −0.11, 95% CI [−0.12, −0.09], one-tailed p < 0.01) after adjusting for covariates and pre-pubertal symptoms (see Fig. 3). The one-sample MR analysis indicated a small causal relationship (β = −0.07, 95% CI [−0.13, 0.00], one-tailed p = 0.03), corresponding to an increase of 0.06 standard deviations in depressive symptom score per year of earlier menarche (after re-scaling the estimate). The adjusted observational estimate was consistent with an effect as extreme as our smallest effect size of interest, but the causal estimate was just within the region of practical equivalence to 0. In our negative control one-sample MR analyses, pubertal timing was not associated with depressive symptoms prior to puberty (β = −0.03, 95% CI [−0.12, 0.05], one-tailed p = 0.22), and the 14-year estimate was consistent with an effect more extreme than this.
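The re-scaling can be reproduced under the assumption (ours, but consistent with the numbers reported) that the standardised β is expressed per standard deviation of age at menarche, with SD = 1.18 years as given in the descriptive statistics:

```latex
\hat{\beta}_{\text{per year}}
  = \frac{\hat{\beta}_{\text{per SD}}}{\mathrm{SD}_{\text{menarche}}}
  = \frac{-0.07}{1.18}
  \approx -0.059
```

That is, roughly 0.06 SD higher depressive symptom scores for each year of earlier menarche.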

Figure 3. Observational and causal links between age at menarche and depression. (A) Standardised betas of age at menarche predicting adolescent depressive symptoms, based on linear regressions unadjusted and adjusted for covariates and symptoms at age 8, one-sample Mendelian randomisation (MR), and negative control MR with 8-year depressive symptoms as the outcome. (B) Standardised odds ratios of age at menarche predicting depression diagnoses during adolescence, based on logistic regressions unadjusted and adjusted for covariates, and one-sample MR. In both panels, the orange dashed line represents the smallest effect size of interest for the observational analysis; the blue dashed line represents the smallest effect size of interest for the MR. NB: 95% confidence intervals are presented to show the precision of the estimates, but all statistical tests for depression outcomes were pre-specified to be one-tailed, meaning that the visual interpretation of the CIs in relation to the point null and smallest effect sizes of interest differs from the test result (described in text) in places.

Main analyses of diagnostic outcomes

For depression diagnoses during adolescence (see Fig. 3B), we also found evidence of a small association in the observational analysis (OR = 0.78, 95% CI [0.72, 0.84], one-tailed p < 0.01) and the one-sample MR (OR = 0.74, 95% CI [0.54, 1.01], one-tailed p = 0.03). The MR estimate was equivalent to a 29% increase in the odds of receiving a depression diagnosis with each year of earlier menarche. The number of depression cases in childhood (n = 7) was too small to include this as a negative control outcome. Only the MR estimate, which was less precise than the observational estimates, was consistent with values at least as extreme as our SESOI.
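Assuming, as for the symptom outcome, that the standardised odds ratio is expressed per SD of age at menarche (SD = 1.18 years), the per-year figure follows by raising the OR to the power 1/SD and inverting it for earlier rather than later menarche:

```latex
\mathrm{OR}_{\text{per year}}
  = \mathrm{OR}_{\text{per SD}}^{1/\mathrm{SD}}
  = 0.74^{1/1.18} \approx 0.775,
\qquad
\frac{1}{0.775} \approx 1.29
```

This corresponds to about 29% higher odds of a depression diagnosis per year of earlier menarche.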

Sensitivity analyses

The two-sample MR sensitivity analyses with depressive symptoms as the outcome yielded mixed results (see Fig. 4). The IVW (β = −0.05, 95% CI [−0.11, 0.01], one-tailed p = 0.04), contamination mixture (β = −0.18, 95% CI [−0.28, −0.04], one-tailed p < 0.01), and weighted median methods (β = −0.02, 95% CI [−0.11, 0.06], one-tailed p = 0.31) gave estimates in a direction consistent with the one-sample MR. The MR-Egger estimate was positive (β = 0.08, 90% CI [−0.07, 0.23], one-tailed p = 0.14). Moreover, after Steiger filtering, the IVW estimate was attenuated (β = −0.02, 95% CI [−0.08, 0.04], one-tailed p = 0.22), whereas the contamination mixture estimate was largely unchanged (β = −0.17, 95% CI [−0.25, −0.04], one-tailed p = 0.01). The MR-Egger intercept provided little evidence of directional pleiotropy, with an intercept value of −0.006 (95% CI [−0.01, 0.00], p = 0.05). The MR PRESSO global test did not detect any outliers.

Figure 4. Two-sample MR sensitivity analyses of age at menarche and depressive symptoms. MR sensitivity analyses showing broadly consistent directions of effect (except for MR-Egger), with an earlier age at menarche related to elevated adolescent depressive symptoms. MR, Mendelian randomisation; SNP, single nucleotide polymorphism.

MVMR analyses accounting for overlap with childhood body size and adult BMI were limited by weak instruments (conditional F-statistic range 7.2–10.3). The estimate for age at menarche predicting depressive symptoms at age 14 was in the same direction as the IVW estimate (β = −0.05) but partly attenuated when accounting for childhood body size (β = −0.03, 95% CI [−0.07, 0.02], one-tailed p = 0.11) and substantially attenuated when accounting for adult BMI (β = −0.01, 95% CI [−0.05, 0.04], one-tailed p = 0.40). Modified Q-statistics indicated pleiotropy when including childhood body size ($Q_{1866}$ = 2044.3, p < 0.01) and adult BMI ($Q_{1812}$ = 1924.0, p = 0.03). When including genetically predicted estradiol as a second exposure, the estimate was only somewhat attenuated (β = −0.04, 95% CI [−0.09, −0.00], one-tailed p = 0.02). Again, there was evidence of pleiotropy ($Q_{1804}$ = 1951.0, p < 0.01). The MVMR sensitivity analyses provided broadly similar results (Additional file 10 : Table S4).

Furthermore, we incorporated the stage of breast development as an additional exposure in the observational models. A more advanced breast stage was associated with more depression diagnoses, but including it in the same model left the estimates for age at menarche largely unchanged (see Additional file 8 ). We also conducted the analyses with a categorised age at menarche exposure and the dichotomised SMFQ, as in Sequeira et al. [ 56 ], which showed consistent results (see Additional file 8 ). Finally, we incorporated inverse probability (IP) weights to account for selective attrition; these reduced much of the difference in baseline covariates between participants (n = 13,398) and non-participants (n = 41,832) at age 14 (Additional file 11 : Fig. S9). The weighted results for symptoms and diagnoses of depression were similar to the unweighted results, although with somewhat reduced precision due to the inclusion of the weights (Additional file 12 : Fig. S10).

Symptoms and diagnoses in other domains

For anxiety symptoms at age 14 (see Fig. 5), the observational analysis adjusted for covariates, concurrent depressive symptoms, and pre-pubertal anxiety symptoms showed little evidence of an association (β = −0.02, 95% CI [−0.04, −0.01], p < 0.01). The one-sample MR likewise provided no evidence of a causal relationship (β = 0.02, 95% CI [−0.05, 0.09], p = 0.64). Similarly, there was little evidence for a relationship with ADHD traits in the fully adjusted observational analysis (β = −0.02, 95% CI [−0.04, −0.01], p = 0.01) or the MR (β = 0.02, 95% CI [−0.06, 0.09], p = 0.67). There was a small adjusted observational relationship with CD symptoms (β = −0.06, 95% CI [−0.08, −0.05], p < 0.01), with which the MR estimate was consistent, although its confidence interval included 0 (β = −0.06, 95% CI [−0.13, 0.01], p = 0.08). The results for ODD symptoms showed no evidence of an observational (β = 0.01, 95% CI [−0.01, 0.03], p = 0.27) or causal relationship (β = 0.01, 95% CI [−0.06, 0.08], p = 0.80). Only the unadjusted (but not adjusted) observational and causal estimates for CD symptoms were consistent with effects outside the region of practical equivalence to 0. Negative control analyses for these outcomes were consistent with these results.

Figure 5. Observational and causal links between age at menarche and other domains. (A) Standardised betas for age at menarche predicting symptoms in other domains of mental health in adolescence, based on linear regressions unadjusted and adjusted for covariates, concurrent depressive symptoms (age 14) and pre-pubertal symptoms (age 8), one-sample Mendelian randomisation (MR), and negative control MR with 8-year symptoms as outcomes. (B) Standardised odds ratios of age at menarche predicting diagnoses in other domains of mental health during adolescence (ages 10–17), based on logistic regressions unadjusted and adjusted for covariates, adolescent (ages 10–17) depression diagnoses and childhood diagnoses for each domain (ages 0–8), and one-sample MR. In both panels, the orange dashed line represents the smallest effect size of interest for the observational analysis; the blue dashed line represents the smallest effect size of interest for the MR; 95% confidence intervals are presented. ANX, anxiety; CD, conduct disorder; ODD, oppositional defiant disorder; ADHD, attention-deficit hyperactivity disorder; DBD, disruptive behaviour disorder.

Main analyses of diagnostic outcomes

For anxiety diagnoses during adolescence (see Fig.  5 ), the observational analysis adjusting for covariates, adolescent depression, and childhood anxiety diagnoses showed a small observational association (OR = 0.86, 95% CI [0.80, 0.91], p  < 0.01), and the MR estimate was consistent, but confidence intervals included the null (OR = 0.82, 95% CI [0.63, 1.07], p  = 0.14). Similarly, age at menarche showed no evidence of an observational (OR = 0.90, 95% CI [0.80, 1.01], p  = 0.07) or causal relationship with ADHD diagnoses (OR = 0.69, 95% CI [0.47, 1.02], p  = 0.06), although the MR point estimate was more extreme than the effect found for depression diagnoses. For DBD diagnoses, there was no evidence of an observational (OR = 0.90, 95% CI [0.76, 1.07], p  = 0.24) or causal relationship (OR = 1.04, 95% CI [0.52, 2.07], p  = 0.92). Across the observational analyses, all estimates fell within our defined range of practical equivalence to 0, while MR findings were generally too imprecise to draw clear conclusions. There were too few diagnoses in childhood to conduct negative control analyses (see Table  3 ).

The two-sample MR sensitivity analyses for symptoms in other domains showed little evidence of any associations apart from CD, where all estimates were consistent with earlier age at menarche leading to more CD symptoms (Additional file 13 : Fig. S11). MVMR analyses accounting for overlap with major depression provided little evidence of causal relationships between age at menarche and 14-year symptoms of anxiety (β = −0.01, 95% CI [−0.05, 0.03], p = 0.63), CD (β = 0.00, 95% CI [−0.04, 0.04], p = 0.94), ODD (β = 0.00, 95% CI [−0.04, 0.04], p = 0.87), and ADHD (β = 0.01, 95% CI [−0.03, 0.05], p = 0.80). MVMR sensitivity analyses were sometimes imprecise but consistent with the pattern of null findings (Additional file 10 : Table S4). The MR-Egger test showed little evidence of directional pleiotropy for any of the outcomes, and the MR PRESSO global test did not detect any outliers.

Exploratory analyses

We conducted unregistered follow-up analyses of the causal effect of age at menarche on diagnostic outcomes since the MR results were imprecise. To explore whether the estimates for anxiety and ADHD were attenuated when accounting for comorbid depression and corresponding childhood diagnoses, we ran one-sample MR models with these included as covariates. This somewhat attenuated the estimate for anxiety (OR = 0.86, 95% CI [0.66, 1.13]) but not for ADHD (OR = 0.64, 95% CI [0.43, 0.97]). We also sought to explore the impact of the timing of diagnosis further, since this may have an impact on estimates. To this end, we divided the outcomes into new diagnoses in (1) preadolescence (ages 9–11), (2) early adolescence (ages 12–14), and (3) mid-late adolescence (ages 15–17). For depression, there was only evidence of a causal relationship in early (OR = 0.50, 95% CI [0.26, 0.95]) and not mid-late adolescence (OR = 0.95, 95% CI [0.64, 1.41]). The pattern for ADHD diagnoses was the same, with only a relationship in early adolescence (OR = 0.47, 95% CI [0.23, 0.95]). There were no relationships with anxiety diagnoses in either time window (see Additional file 8 ).

In this Registered Report, we assessed the causal link between age at menarche and adolescent mental health in a large, population-based cohort. In observational analyses, we found evidence of an association between earlier age at menarche and elevated depressive symptoms at age 14, which was robust to the inclusion of measured covariates, pre-pubertal symptoms, and across all sensitivity analyses. In contrast, age at menarche was not associated with symptoms in the other domains apart from CD, once depressive symptoms were accounted for. One-sample MR analyses mirrored the observational results—with evidence of a small, causal effect of earlier age at menarche on elevated depressive symptoms—but not the other outcome domains. Negative control MR analyses using pre-pubertal symptoms as outcomes corroborated this pattern of findings. The results for diagnostic outcomes were generally less precise, indicating a causal effect of earlier age at menarche on more depression diagnoses during adolescence, but not diagnoses in other domains. Taken together, the main analyses supported our hypothesis of a causal effect of earlier age at menarche on diagnoses of adolescent depression and suggest that this effect is specific to depression, rather than influencing adolescent mental health in general.

Our adjusted observational estimates, negative control analyses, and one-sample MR analyses were all consistent with small causal effects of age at menarche on symptoms and diagnoses of depression. This finding effectively replicates a previous finding in adolescents [ 56 ] and aligns with previous findings in adults [ 61 , 70 , 92 ]. For each year earlier that menarche occurred, the odds of being diagnosed with depression during adolescence (ages 10–17) increased by approximately 29%. The robustness of this effect, which received broadly consistent empirical support across different outcomes and methodologies, is striking. Nonetheless, some considerations are important for its interpretation. The effects of age at menarche on symptoms and diagnoses of depression were very small, and whether they should be considered clinically meaningful is a matter of debate. On the symptom outcome, individuals with menarche 1 year earlier would score less than half a point higher, on average, on the SMFQ. For diagnoses, the absolute risk of an adolescent depression diagnosis increases from 5.2 to 6.6% with 1 year earlier menarche. Our analyses defined a region of practical equivalence to 0 based on pre-defined smallest effect sizes of interest, and while some estimates were consistent with effects outside this region, the majority of plausible values fell within it. It should be noted that our smallest effect sizes of interest were set based on existing empirical evidence rather than on clinical change thresholds, which may be preferable [ 86 ]. Overall, the effect sizes obtained here are consistent with the small associations reported for a range of mental health and behavioural outcomes in other studies [ 20 , 36 ] and demonstrate a reduction in effect sizes when accounting for confounding.
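The link between the relative effect (about 29% higher odds per year earlier menarche) and the absolute risks quoted above (5.2% to 6.6%) runs through the odds scale. The Python sketch below reproduces that arithmetic under the assumption of a constant odds ratio per year (an effect linear on the log-odds scale); the odds ratio of 1.29 is an approximation read from the text, not a reported estimate.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def prob(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def risk_after_shift(baseline_risk: float, odds_ratio: float, years: float = 1.0) -> float:
    """Absolute risk after applying `odds_ratio` once per year for `years` years.

    Assumes the odds ratio is constant per year, i.e. the effect is linear on
    the log-odds scale.
    """
    return prob(odds(baseline_risk) * odds_ratio ** years)


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


# Approximate values taken from the text: baseline risk 5.2% and an odds
# ratio of about 1.29 per year earlier menarche.
print(round(risk_after_shift(0.052, 1.29) * 100, 1))  # 6.6
print(round(percent_change_in_odds(1.29)))             # 29
```

The same odds ratio implies a larger absolute change at a higher baseline risk. For example, applied to a hypothetical baseline of 20%, it gives about 24.4%. This is one reason why small relative effects can still matter at the population level.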

While the evidence from our main analyses of depression outcomes was relatively consistent, potential complexities in the interpretation of these relationships prompted us to perform extensive sensitivity analyses. The two-sample MR sensitivity analyses, which make different assumptions about the role of horizontal pleiotropy, provided mixed results regarding the causal relationship of age at menarche with depressive symptoms. The results based on Steiger filtering suggested that reverse causation, or other pleiotropic pathways, may be involved in the relationship between age at menarche and adolescent depressive symptoms. It should be noted that these analyses were relatively imprecise, because they required us to perform GWAS of the symptom outcomes in MoBa with a sample size (n = 9832) well below what is typically required for genomic discovery. We also conducted multivariable MR analyses, which were designed to estimate the direct effect of age at menarche while accounting for BMI. The results indicated that the association between age at menarche and depressive symptoms may be partly confounded by BMI: the estimate was in a consistent direction but partly attenuated when including childhood body size, and markedly attenuated when including adult BMI. While childhood body size is a likely confounder, post-pubertal BMI could also be on the causal pathway to depression; therefore, the attenuation after adjusting for adult BMI should not be taken primarily as evidence of confounding. These analyses were limited by weak instruments and may therefore have been biassed towards the null through violation of the relevance assumption. Nonetheless, these and previous findings [ 56 , 61 ] are in line with a role for BMI as a confounder of the link between pubertal development and adolescent depression. Future research using stronger genetic instruments would be required to quantify the extent of confounding by BMI and its potential mediating role.
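Two of the summary-statistic tools mentioned above can be sketched compactly: the fixed-effect inverse-variance-weighted (IVW) estimator and the directional comparison underlying Steiger filtering. The Python sketch below is illustrative only. It uses first-order weights, approximates variance explained from t-statistics for continuous traits, and omits the formal Steiger test of the difference in variance explained. The SNP values are invented. Dedicated tools, such as the MR-Base platform cited in the references, provide tested implementations.

```python
import math
from typing import List, Tuple


def ivw_estimate(bx: List[float], by: List[float], se_y: List[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted (IVW) estimate and its standard error.

    Equivalent to a weighted regression of SNP-outcome associations (by) on
    SNP-exposure associations (bx) through the origin, with weights 1 / se_y**2.
    """
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_y))
    return num / den, 1.0 / math.sqrt(den)


def r2_from_t(beta: float, se: float, n: int) -> float:
    """Approximate variance explained by one SNP in a continuous trait,
    using r^2 = t^2 / (t^2 + n - 2) with t = beta / se."""
    t2 = (beta / se) ** 2
    return t2 / (t2 + n - 2)


def steiger_direction(bx, se_x, n_x, by, se_y, n_y) -> List[bool]:
    """True where a SNP explains more variance in the exposure than in the
    outcome, i.e. the direction expected under exposure -> outcome.
    This is only the directional comparison, without Steiger's test."""
    return [
        r2_from_t(a, sa, n_x) > r2_from_t(b, sb, n_y)
        for a, sa, b, sb in zip(bx, se_x, by, se_y)
    ]


# Hypothetical summary statistics for four SNPs.
bx = [0.10, 0.08, 0.05, 0.02]
se_x = [0.005] * 4
by = [0.030, 0.024, 0.015, 0.030]  # the last SNP looks "outcome-first"
se_y = [0.010] * 4

keep = steiger_direction(bx, se_x, 300_000, by, se_y, 10_000)
print(keep)  # [True, True, True, False]

all_est, _ = ivw_estimate(bx, by, se_y)
kept = [i for i, k in enumerate(keep) if k]
filt_est, _ = ivw_estimate([bx[i] for i in kept], [by[i] for i in kept], [se_y[i] for i in kept])
print(round(all_est, 3), round(filt_est, 3))  # 0.325 0.3
```

The example shows why filtering can shift an estimate. A SNP whose association is stronger with the outcome than with the exposure pulls the IVW estimate away from the value implied by the other instruments.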

The pattern of results in our study suggests that age at menarche may contribute to symptom differentiation in adolescence, affecting risk for depression diagnoses specifically, rather than acting across the range of mental health outcomes included here. There were some possible exceptions, including conduct symptoms (where the adjusted observational estimate suggested a small effect, and the one-sample MR result was consistent in magnitude and direction but non-significant) and ADHD (where observational results suggested no association, but the MR point estimate—while non-significant—was more extreme than that for depression). However, we saw very little signal for anxiety, disruptive behaviour disorders, and ADHD when accounting for co-occurring depression and other confounders. Previous observational studies have shown associations with a wide range of conditions, including anxiety and behavioural conditions [ 20 , 21 ]. Notably, our unadjusted estimates for each condition—both symptoms and diagnoses—were also indicative of effects. As a result of the general attenuation of these effects after control for confounders (both explicitly in the adjusted observational estimate and implicitly in MR), we can infer that some of the previously identified associations may not be causal or that they may not remain after accounting for the effect of co-occurring depression. We recommend that future studies account for the comorbidity between mental health conditions when assessing pubertal timing effects, given the potential for condition-specific mechanisms.

Interestingly, sex differences in depression emerge and then peak during adolescence, before declining across adulthood [ 93 ]. To explain this phenomenon, future research could benefit from taking a lifespan approach to reproductive development and women’s mental health. Given some evidence of converging aetiology of depression, age at menarche and menopause [ 70 , 94 ], future studies could aim to explore their joint genetic architecture and potential shared biological underpinnings [ 95 ]. A hypothesis that remains to be tested is that the associations between depression and earlier female reproductive events (including age at menarche) may be caused by the duration of sex hormone exposure. If so, we would expect the risk of depression to increase whenever menarche occurs, and, in the longer term, those with a later menarche to eventually “catch up” with those who experienced it earlier. Indeed, evidence of this “catching up” has recently been found in ALSPAC [ 96 ]. In line with this pattern, our exploratory analyses indicated an ~ 80% increase in the odds of depression diagnoses and a ~ 90% increase in the odds of ADHD diagnoses per year of earlier menarche in early adolescence (ages 12–14), but not in the years before or after this period. This may suggest that menarche causes a transient increase in the prevalence of depression (and possibly ADHD, although this finding should be considered hypothesis-generating rather than conclusive). Future waves of data collection in MoBa could be used to corroborate this.
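The “catching up” logic can be made concrete with a toy calculation. The Python sketch below uses invented annual hazards, not estimates from MoBa or ALSPAC. It assumes a constant baseline hazard of a first depression diagnosis plus a transient increase lasting a fixed number of years from menarche. Under these assumptions, earlier menarche raises cumulative incidence at younger ages, but the difference disappears once both groups have passed through their elevated period.

```python
from typing import List

AGES = list(range(10, 18))  # ages 10-17, one hazard per year of age


def hazards(menarche_age: int, base: float, bump: float, duration: int) -> List[float]:
    """Hypothetical annual hazard of a first diagnosis: a constant baseline
    plus a transient increase for `duration` years starting at menarche."""
    return [
        base + (bump if menarche_age <= age < menarche_age + duration else 0.0)
        for age in AGES
    ]


def cumulative_incidence(h: List[float], up_to_age: int) -> float:
    """Probability of a first diagnosis by the end of `up_to_age`."""
    survival = 1.0
    for age, hz in zip(AGES, h):
        if age > up_to_age:
            break
        survival *= 1.0 - hz
    return 1.0 - survival


early = hazards(menarche_age=11, base=0.005, bump=0.01, duration=2)
late = hazards(menarche_age=13, base=0.005, bump=0.01, duration=2)
for age in (12, 13, 17):
    print(age, round(cumulative_incidence(early, age), 4), round(cumulative_incidence(late, age), 4))
```

With a constant baseline hazard, the catch-up is exact, because both groups accumulate the same set of yearly hazards by age 17. A baseline hazard that changes with age would make the convergence only approximate.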

We also conducted MVMR analyses utilising genetic variants associated with estradiol as an additional exposure [ 68 ], showing limited attenuation of the relationship between earlier age at menarche and elevated depressive symptoms when including estradiol. When including the stage of breast development as an additional predictor in observational analyses, results for depressive symptoms and diagnoses remained unchanged. This conflicts with previous analyses in ALSPAC which suggested that breast stage is driving the association with depressive symptoms rather than age at menarche [ 42 ]. To shed some light on this complex pattern of findings, future studies could directly investigate the role of estradiol and other pubertal sex hormones in mediating the effects of pubertal development, rather than relying on proxy measures.
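Multivariable MR of the kind described here (age at menarche alongside estradiol or BMI) can be estimated from summary statistics by extending the IVW regression to several exposures. The Python sketch below shows a two-exposure MVMR-IVW as weighted least squares through the origin, solved by Cramer's rule. It is a simplified illustration with invented SNP associations. It does not compute conditional F-statistics or apply the weak-instrument and pleiotropy-robust methods cited in the references, which a real analysis would need.

```python
from typing import List, Tuple


def mvmr_ivw_two_exposures(
    bx1: List[float], bx2: List[float], by: List[float], se_y: List[float]
) -> Tuple[float, float]:
    """Multivariable IVW: weighted least squares of SNP-outcome associations
    on two sets of SNP-exposure associations, no intercept, weights 1/se_y^2.

    Returns estimates of the direct effects (theta1, theta2).
    """
    w = [1.0 / s ** 2 for s in se_y]
    # Weighted normal equations: [s11 s12; s12 s22] [t1; t2] = [r1; r2]
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    r1 = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    r2 = sum(wi * b * y for wi, b, y in zip(w, bx2, by))
    det = s11 * s22 - s12 ** 2
    if det <= 1e-12 * s11 * s22:
        raise ValueError("SNP-exposure associations are (nearly) collinear")
    theta1 = (r1 * s22 - r2 * s12) / det
    theta2 = (r2 * s11 - r1 * s12) / det
    return theta1, theta2


# Hypothetical SNP associations with exposure 1 (e.g. age at menarche)
# and exposure 2 (e.g. estradiol or BMI); outcome built with effects 0.3, 0.1.
bx1 = [0.10, 0.05, 0.02, 0.08]
bx2 = [0.01, 0.06, 0.09, 0.03]
by = [0.3 * a + 0.1 * b for a, b in zip(bx1, bx2)]
print(mvmr_ivw_two_exposures(bx1, bx2, by, [0.01] * 4))  # approximately (0.3, 0.1)
```

When the second exposure's instruments are only weakly distinguishable from the first (a near-singular system), the estimates become unstable. This is the multivariable analogue of the weak-instrument problem discussed in the Limitations.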

Future studies could also test the causal relationship between pubertal timing and mental health using other genetically informed methods. For instance, co-twin control studies have found that pubertal timing effects on adolescent mental health were largely due to shared genetic influences [ 97 , 98 ]. MR can also be conducted within families, accounting for population phenomena that may bias genetic associations, such as population stratification, dynastic effects, and assortative mating [ 99 ]. Although our sample of female adolescents was too small for conducting well-powered within-family MR, consortium-based analyses could offer a solution.

Limitations

This study features a relatively large sample of genotyped adolescents and a strong genetic instrument for age at menarche. Furthermore, the Registered Report format and extensive sensitivity analyses strengthen the support for our conclusions. However, there are some important limitations to our study. Although our sample size was larger than in previous studies, the precision of MR analyses was low for less prevalent conditions, especially DBD. Another factor limiting the precision of estimates was the censoring of diagnoses in the registry data, which we proposed to handle with multiple imputation in the stage 1 protocol. However, this turned out not to be feasible, resulting in a deviation from the registered protocol (see Table 3 for a further description and justification). The accuracy of the imputation of missing age at menarche values was also lower than anticipated, but this was a minor issue since these values were known to be 15 years or higher.

As well as some issues with low precision, limitations in the MR components of our study are linked to the assumptions upon which the method rests. An advantage of our one-sample MR design is that the relevance and independence assumptions could be tested directly. Here, the genetic instrument was strongly associated with age at menarche but not with measured covariates (besides BMI and, to some extent, maternal age), providing some support for the validity of these assumptions. Furthermore, the two-sample MR sensitivity analyses provided insufficient evidence of directional horizontal pleiotropy, a particularly likely violation of the assumptions in the context of mental health outcomes [ 100 ]. However, we reiterate that the final two MR assumptions cannot be verified empirically and that violations may give biassed estimates. It is worth highlighting that the MVMR analyses including BMI may violate these assumptions. First, weak instruments may have biassed the estimates from these analyses towards the null. Second, despite broadly consistent results from the MVMR sensitivity analyses, BMI SNPs are highly pleiotropic and may have introduced bias into the MVMR. Overall, the potential for bias means that triangulation is of key importance, and results that are consistent across different methods and outcomes should be given more weight than isolated findings.

Since different methods may lead to different sources of bias, triangulation of multiple analytic approaches has been suggested as a way forward in aetiological epidemiology [ 58 ]. However, a key distinction may be between deciding which methods will be combined—and how—before conducting the analyses, or after the fact. “Prospective triangulation”, or pre-specifying a triangulation strategy and specific inference criteria such as in our Registered Report, may further increase the confidence we can have in the results.

There are also limitations to the generalisability of this study. First, MoBa is not fully representative of the general population due to non-random participation at the recruitment stage. Those less represented are the youngest women, those living alone, smokers, and women with previous stillbirths or more than two previous births [ 101 ]. However, previous research has suggested that non-random initial participation may have a limited impact on exposure-outcome associations [ 101 , 102 ]. Furthermore, selective attrition could be expected to have an important impact on our results, due to the substantial drop-out at age 14 in MoBa. Yet, our IPW analyses showed consistent results when accounting for selective attrition. Finally, this study is based on a predominantly white European cohort. Future research in cohorts from non-European countries would advance the field further.
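The inverse probability weighting (IPW) approach used to address selective attrition gives each participant who remains in the study a weight equal to the inverse of their predicted probability of remaining. In this way, groups prone to drop-out count for more. The Python sketch below assumes the retention probabilities have already been predicted (for example, from a model of drop-out on baseline characteristics). It uses an invented two-group population rather than MoBa data, and the function names are hypothetical.

```python
from typing import List, Optional


def ipw_weights(
    p_retained: List[float], retained: List[bool], stabilise: bool = True
) -> List[Optional[float]]:
    """Inverse probability weights for participants with outcome data.

    p_retained: each participant's predicted probability of remaining in the
        study (e.g. from a model of drop-out on baseline characteristics).
    retained: whether the participant actually provided outcome data.
    Drop-outs get None. Stabilised weights multiply 1/p by the overall
    retention proportion, which keeps the mean weight close to 1.
    """
    marginal = sum(retained) / len(retained)
    numerator = marginal if stabilise else 1.0
    return [numerator / p if r else None for p, r in zip(p_retained, retained)]


def weighted_mean(values: List[Optional[float]], weights: List[Optional[float]]) -> float:
    """Weighted mean over participants that have both a value and a weight."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None and w is not None]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


# Toy population: group A (retention 0.8, outcome 1.0) and group B
# (retention 0.4, outcome 3.0), ten people each; the true mean is 2.0.
p = [0.8] * 10 + [0.4] * 10
retained = [True] * 8 + [False] * 2 + [True] * 4 + [False] * 6
y = [1.0] * 8 + [None] * 2 + [3.0] * 4 + [None] * 6

naive = weighted_mean(y, [1.0 if r else None for r in retained])
ipw = weighted_mean(y, ipw_weights(p, retained))
print(round(naive, 3), round(ipw, 3))  # 1.667 2.0
```

IPW corrects attrition bias only if the retention model captures the predictors of drop-out that are also related to the outcome. Consistent weighted and unweighted results are therefore reassuring but not conclusive.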

Our findings—based on extensive analyses and hypotheses registered prior to the availability of data [ 91 ]—provided support for the hypothesis that an earlier age at menarche causally increases the risk of adolescent depression. After accounting for depression and other confounders, we found no clear evidence of this effect being present for anxiety, disruptive behaviour disorders, or ADHD. A range of sensitivity analyses corroborated our results but suggested that the causal relationship with depressive symptoms may be partly confounded by BMI and/or influenced by low-level genetic pleiotropy. In sum, although the associations of age at menarche with symptoms and diagnoses of depression are likely partly confounded, our results supported small causal relationships. Since the effects were specific rather than shared across all the mental health domains included here, the timing of menarche may contribute to the developmental differentiation of depression from other mental health conditions in adolescence.

Availability of data and materials

The MoBa data are not publicly available, as the consent given by the participants does not permit storage of individual-level data in repositories or journals. Researchers who want access to data sets for replication should submit an application to datatilgang(at)fhi.no. Access to datasets requires approval from the Regional Committee for Medical and Health Research Ethics in Norway and an agreement with MoBa.

Data preparation and analysis code for all elements of the project are publicly available on GitHub at https://github.com/psychgen/aam-psych-adolesc-rr .

Abbreviations

2SLS: Two-stage least squares

ADHD: Attention-deficit hyperactivity disorder

ALSPAC: Avon Longitudinal Study of Parents and Children

BMI: Body mass index

CD: Conduct disorder

CI: Confidence interval

GWAS: Genome-wide association study

IPW: Inverse probability weighting

IVW: Inverse variance weighted

KUHR: Control and payment of health refunds

MI: Multiple imputation

MoBa: Norwegian Mother, Father, and Child Cohort Study

MR: Mendelian randomisation

MR-Egger: Mendelian randomisation-Egger

MR-PRESSO: MR pleiotropy residual sum and outlier

MVMR: Multivariable Mendelian randomisation

NHST: Null hypothesis significance testing

NPR: Norwegian Patient Registry

ODD: Oppositional defiant disorder

RCT: Randomised controlled trial

RS-DBD: Rating Scale for Disruptive Behaviour Disorders

SCARED: Screen for Child Anxiety-Related Disorders

SESOI: Smallest effect size of interest

SMFQ: Short Mood and Feelings Questionnaire

SNPs: Single nucleotide polymorphisms

Benoit A, Lacourse E, Claes M. Pubertal timing and depressive symptoms in late adolescence: the moderating role of individual, peer, and parental factors. Dev Psychopathol. 2013;25(2):455–71.

Conley CS, Rudolph KD, Bryant FB. Explaining the longitudinal association between puberty and depression: sex differences in the mediating effects of peer stress. Dev Psychopathol. 2012;24(2):691–701.

Copeland W, Shanahan L, Miller S, Costello EJ, Angold A, Maughan B. Outcomes of early pubertal timing in young women: a prospective population-based study. Am J Psychiatry. 2010;167(10):1218–25.

Ge X, Conger RD, Elder GH Jr. Pubertal transition, stressful life events, and the emergence of gender differences in adolescent depressive symptoms. Dev Psychol. 2001;37(3):404–17.

Ge X, Kim IJ, Brody GH, Conger RD, Simons RL, Gibbons FX, et al. It’s about timing and change: pubertal transition effects on symptoms of major depression among African American youths. Dev Psychol. 2003;39(3):430–9.

Ge X, Brody GH, Conger RD, Simons RL. Pubertal maturation and African American children’s internalizing and externalizing symptoms. J Youth Adolesc. 2006;35(4):528–37.

Graber JA, Seeley JR, Brooks-Gunn J, Lewinsohn PM. Is pubertal timing associated with psychopathology in young adulthood? J Am Acad Child Adolesc Psychiatry. 2004;43(6):718–26.

Graber JA, Brooks-Gunn J, Warren MP. Pubertal effects on adjustment in girls: moving from demonstrating effects to identifying pathways. J Youth Adolesc. 2006;35(3):391–401.

Hamlat EJ, Stange JP, Abramson LY, Alloy LB. Early pubertal timing as a vulnerability to depression symptoms: differential effects of race and sex. J Abnorm Child Psychol. 2014;42(4):527–38.

Keenan K, Culbert KM, Grimm KJ, Hipwell AE, Stepp SD. Timing and tempo: exploring the complex association between pubertal development and depression in African American and European American girls. J Abnorm Psychol. 2014;123(4):725–36.

Mendle J, Harden KP, Brooks-Gunn J, Graber JA. Development’s tortoise and hare: pubertal timing, pubertal tempo, and depressive symptoms in boys and girls. Dev Psychol. 2010;46(5):1341–53.

Nadeem E, Graham S. Early puberty, peer victimization, and internalizing symptoms in ethnic minority adolescents. J Early Adolesc. 2005;25(2):197–222.

Rudolph KD, Troop-Gordon W. Personal-accentuation and contextual-amplification models of pubertal timing: predicting youth depression. Dev Psychopathol. 2010;22(2):433–51.

Blumenthal H, Leen-Feldner EW, Babson KA, Gahr JL, Trainor CD, Frala JL. Elevated social anxiety among early maturing girls. Dev Psychol. 2011;47(4):1133–40.

Deardorff J, Hayward C, Wilson KA, Bryson S, Hammer LD, Agras S. Puberty and gender interact to predict social anxiety symptoms in early adolescence. J Adolesc Health. 2007;41(1):102–4.

Bakker MP, Ormel J, Lindenberg S, Verhulst FC, Oldehinkel AJ. Generation of interpersonal stressful events: the role of poor social skills and early physical maturation in young adolescents—the TRAILS study. J Early Adolesc. 2011;31(5):633–55.

Haynie DL. Contexts of risk? Explaining the link between girls’ pubertal development and their delinquency involvement. Soc Forces. 2003;82(1):355–97.

Lynne SD, Graber JA, Nichols TR, Brooks-Gunn J, Botvin GJ. Links between pubertal timing, peer influences, and externalizing behaviors among urban students followed through middle school. J Adolesc Health. 2007;40(2):181.e7–181.e13.

Mrug S, Elliott M, Gilliland MJ, Grunbaum JA, Tortolero SR, Cuccaro P, et al. Positive parenting and early puberty in girls: protective effects against aggressive behavior. Arch Pediatr Adolesc Med. 2008;162(8):781–6.

Ullsperger JM, Nikolas MA. A meta-analytic review of the association between pubertal timing and psychopathology in adolescence: are there sex differences in risk? Psychol Bull. 2017;143(9):903–38.

Hamlat EJ, Snyder HR, Young JF, Hankin BL. Pubertal timing as a transdiagnostic risk for psychopathology in youth. Clin Psychol Sci. 2019;7(3):411–29.

Hankin BL, Abramson LY, Moffitt TE, Silva PA, McGee R, Angell KE. Development of depression from preadolescence to young adulthood: emerging gender differences in a 10-year longitudinal study. J Abnorm Psychol. 1998;107(1):128–40.

Graber JA. Pubertal timing and the development of psychopathology in adolescence and beyond. Horm Behav. 2013;64(2):262–9.

Black SR, Klein DN. Early menarcheal age and risk for later depressive symptomatology: the role of childhood depressive symptoms. J Youth Adolesc. 2012;41(9):1142–50.

Ge X, Conger RD, Elder GH Jr. Coming of age too early: pubertal influences on girls’ vulnerability to psychological distress. Child Dev. 1996;67(6):3386–400.

Joinson C, Heron J, Lewis G, Croudace T, Araya R. Timing of menarche and depressive symptoms in adolescent girls from a UK cohort. Br J Psychiatry. 2011;198(1):17–23.

Lam TH, Stewart SM, Leung GM, Lee PW, Wong JP, Ho LM, et al. Depressive symptoms among Hong Kong adolescents: relation to atypical sexual feelings and behaviors, gender dissatisfaction, pubertal timing, and family and peer relationships. Arch Sex Behav. 2004;33(5):487–96.

Kaltiala-Heino R, Kosunen E, Rimpelä M. Pubertal timing, sexual behaviour and self-reported depression in middle adolescence. J Adolesc. 2003;26(5):531–45.

Kaltiala-Heino R, Marttunen M, Rantanen P, Rimpelä M. Early puberty is associated with mental health problems in middle adolescence. Soc Sci Med. 2003;57(6):1055–64.

Rierdan J, Koff E. Depressive symptomatology among very early maturing girls. J Youth Adolesc. 1991;20(4):415–25.

Stice E, Presnell K, Bearman SK. Relation of early menarche to depression, eating disorders, substance abuse, and comorbid psychopathology among adolescent girls. Dev Psychol. 2001;37(5):608–19.

Hayward C, Gotlib IH, Schraedley PK, Litt IF. Ethnic differences in the association between pubertal status and symptoms of depression in adolescent girls. J Adolesc Health. 1999;25(2):143–9.

Carter R, Caldwell CH, Matusko N, Antonucci T, Jackson JS. Ethnicity, perceived pubertal timing, externalizing behaviors, and depressive symptoms among black adolescent girls. J Youth Adolesc. 2011;40(10):1394–406.

Martino S, Lester D. Menarche and eating disorders. Psychol Rep. 2013;113(1):315–7.

McGuire TC, McCormick KC, Koch MK, Mendle J. Pubertal maturation and trajectories of depression during early adolescence. Front Psychol. 2019;10:1362.

Smith-Woolley E, Rimfeld K, Plomin R. Weak associations between pubertal development and psychiatric and behavioral problems. Transl Psychiatry. 2017;7(4):e1098.

Toffol E, Koponen P, Luoto R, Partonen T. Pubertal timing, menstrual irregularity, and mental health: results of a population-based study. Arch Womens Ment Health. 2014;17(2):127–35.

Joinson C, Heron J, Araya R, Lewis G. Early menarche and depressive symptoms from adolescence to young adulthood in a UK cohort. J Am Acad Child Adolesc Psychiatry. 2013;52(6):591–8.

Angold A, Costello EJ, Erkanli A, Worthman CM. Pubertal changes in hormone levels and depression in girls. Psychol Med. 1999;29(5):1043–53.

Balzer BW, Duke SA, Hawke CI, Steinbeck KS. The effects of estradiol on mood and behavior in human female adolescents: a systematic review. Eur J Pediatr. 2015;174(3):289–98.

Skovlund CW, Mørch LS, Kessing LV, Lidegaard Ø. Association of hormonal contraception with depression. JAMA Psychiat. 2016;73(11):1154–62.

Joinson C, Heron J, Araya R, Paus T, Croudace T, Rubin C, et al. Association between pubertal development and depressive symptoms in girls from a UK cohort. Psychol Med. 2012;42(12):2579–89.

Horvath G, Knopik VS, Marceau K. Polygenic influences on pubertal timing and tempo and depressive symptoms in boys and girls. J Res Adolesc. 2020;30(1):78–94.

Hooper L, Ness AR, Smith GD. Antioxidant strategy for cardiovascular disease. The Lancet. 2001;357(9269):1705–6.

Bell JA, Carslake D, Wade KH, Richmond RC, Langdon RJ, Vincent EE, et al. Influence of puberty timing on adiposity and cardiometabolic traits: a Mendelian randomisation study. PLoS Med. 2018;15(8):e1002641.

Hartwig FP, Bowden J, de Mola CL, Tovo-Rodrigues L, Smith GD, Horta BL. Body mass index and psychiatric disorders: a Mendelian randomization study. Sci Rep. 2016;6(1):1–11.

Wray NR, Ripke S, Mattheisen M, Trzaskowski M, Byrne EM, Abdellaoui A, et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat Genet. 2018;50(5):668–81.

Tyrrell J, Mulugeta A, Wood AR, Zhou A, Beaumont RN, Tuke MA, et al. Using genetics to understand the causal influence of higher BMI on depression. Int J Epidemiol. 2019;48(3):834–48.

Lien L, Haavet OR, Dalgard F. Do mental health and behavioural problems of early menarche persist into late adolescence? A three year follow-up study among adolescent girls in Oslo, Norway. Soc Sci Med. 2010;71(3):529–33.

Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32(1):1–22.

Sanderson E, Glymour MM, Holmes MV, Kang H, Morrison J, Munafò MR, et al. Mendelian randomization. Nat Rev Methods Primers. 2022;2(1):1–21.

Day FR, Thompson DJ, Helgason H, Chasman DI, Finucane H, Sulem P, et al. Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk. Nat Genet. 2017;49(6):834–41.

Davey Smith G, Holmes MV, Davies NM, Ebrahim S. Mendel’s laws, Mendelian randomization and causal inference in observational data: substantive and nomenclatural issues. Eur J Epidemiol. 2020;35(2):99–111.

Lundblad MW, Jacobsen BK. The reproducibility of self-reported age at menarche: the Tromsø Study. BMC Womens Health. 2017;17(1):1–7.

Lawlor DA, Harbord RM, Sterne JA, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med. 2008;27(8):1133–63.

Sequeira ME, Lewis SJ, Bonilla C, Smith GD, Joinson C. Association of timing of menarche with depressive symptoms and depression in adolescence: Mendelian randomisation study. Br J Psychiatry. 2017;210(1):39–46.

Magnus P, Birke C, Vejrup K, Haugan A, Alsaker E, Daltveit AK, et al. Cohort profile update: the Norwegian Mother and Child Cohort Study (MoBa). Int J Epidemiol. 2016;45(2):382–8.

Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45(6):1866–86.

Munafò MR, Davey Smith G. Robust research needs many lines of evidence. Nature. 2018;553(7686):399–402.

Munafò MR, Higgins JP, Smith GD. Triangulating evidence through the inclusion of genetically informed designs. Cold Spring Harb Perspect Med. 2021;11(8):a040659.

Magnus MC, Guyatt AL, Lawn RB, Wyss AB, Trajanoska K, Küpers LK, et al. Identifying potential causal effects of age at menarche: a Mendelian randomization phenome-wide association study. BMC Med. 2020;18(1):1–17.

Munafò MR, Nosek BA, Bishop DV, Button KS, Chambers CD, Du Sert NP, et al. A manifesto for reproducible science. Nat Hum Behav. 2017;1(1):1–9.

Angold A, Costello EJ, Messer SC, Pickles A. Development of a short questionnaire for use in epidemiological studies of depression in children and adolescents. Int J Methods Psychiatr Res. 1995;5(4):237–49.

Birmaher B, Khetarpal S, Brent D, Cully M, Balach L, Kaufman J, et al. The Screen for Child Anxiety Related Emotional Disorders (SCARED): scale construction and psychometric characteristics. J Am Acad Child Adolesc Psychiatry. 1997;36(4):545–53.

Silva RR, Alpert M, Pouget E, Silva V, Trosper S, Reyes K, et al. A rating scale for disruptive behavior disorders, based on the DSM-IV item pool. Psychiatr Q. 2005;76(4):327–39.

Paltiel L, Anita H, Skjerden T, Harbak K, Bækken S, Kristin SN, et al. The biobank of the Norwegian Mother and Child Cohort Study–present status. Nor Epidemiol. 2014;24(1–2):29–35.

Corfield EC, Frei O, Shadrin AA, Rahman Z, Lin A, Athanasiu L, et al. The Norwegian Mother, Father, and Child cohort study (MoBa) genotyping data resource: MoBaPsychGen pipeline v. 1. 2022. Preprint at https://www.biorxiv.org/content/10.1101/2022.06.23.496289v3 .

Schmitz D, Ek WE, Berggren E, Höglund J, Karlsson T, Johansson Å. Genome-wide association study of estradiol levels, and the causal effect of estradiol on bone mineral density. J Clin Endocrinol Metab. 2021;106(11):e4471–86.

Richardson TG, Sanderson E, Elsworth B, Tilling K, Davey Smith G. Use of genetic variation to separate the effects of early and later life adiposity on disease risk: Mendelian randomisation study. BMJ. 2020;369:m1203.

Howard DM, Adams MJ, Clarke TK, Hafferty JD, Gibson J, Shirali M, et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci. 2019;22(3):343–52.

Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7:e34408.

Schmitz D, Ek WE, Berggren E, Höglund J, Karlsson T, Johansson Å. Genome-wide association study of estradiol levels and the causal effect of estradiol on bone mineral density. J Clin Endocrinol Metab. 2021;106(11):e4471–86.

Felix JF, Bradfield JP, Monnereau C, van der Valk RJP, Stergiakouli E, Chesi A, et al. Genome-wide association analysis identifies three new susceptibility loci for childhood body mass index. Hum Mol Genet. 2016;25(2):389–403.

Mbatchou J, Barnard L, Backman J, Marcketta A, Kosmicki JA, Ziyatdinov A, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet. 2021;53(7):1097–103.

Hemani G, Tilling K, Davey Smith G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLOS Genet. 2017;13(11):e1007081.

Hernán MA, Robins JM. Causal inference: what if. Boca Raton: Chapman & Hall/CRC; 2020.

Zeileis A, Köll S, Graham N. Various versatile variances: an object-oriented implementation of clustered covariances in R. J Stat Softw. 2020;95(1):1–36.

Inoue A, Solon G. Two-sample instrumental variables estimators. Rev Econ Stat. 2010;92(3):557–61.

Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23(R1):R89–98.

Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50(5):693–8.

Burgess S, Bowden J, Fall T, Ingelsson E, Thompson SG. Sensitivity analyses for robust causal inference from Mendelian randomization analyses with multiple genetic variants. Epidemiol Camb Mass. 2017;28(1):30–42.

Burgess S, Foley CN, Allara E, Staley JR, Howson JM. A robust and efficient method for Mendelian randomization with hundreds of genetic variants. Nat Commun. 2020;11(1):1–11.

Burgess S, Thompson SG. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am J Epidemiol. 2015;181(4):251–60.

Sanderson E, Spiller W, Bowden J. Testing and correcting for weak and pleiotropic instruments in two-sample multivariable Mendelian randomization. Stat Med. 2021;40:5434–52.

Grant AJ, Burgess S. Pleiotropy robust methods for multivariable Mendelian randomization. Stat Med. 2021;40(26):5813–30.

Lakens D, Scheel AM, Isager PM. Equivalence testing for psychological research: a tutorial. Adv Methods Pract Psychol Sci. 2018;1(2):259–69.

Sanderson E, Macdonald-Wallis C, Davey Smith G. Negative control exposure studies in the presence of measurement error: implications for attempted effect estimate calibration. Int J Epidemiol. 2018;47(2):587–96.

Simonsohn U. Small telescopes: detectability and the evaluation of replication results. Psychol Sci. 2015;26(5):559–69.

Brion MJA, Shakhbazov K, Visscher PM. Calculating statistical power in Mendelian randomization studies. Int J Epidemiol. 2013;42(5):1497–501.

Elwert F, Winship C. Endogenous selection bias: the problem of conditioning on a collider variable. Annu Rev Sociol. 2014;40(1):31–53.

Askelund AD, Wootton RE, Torvik FA, Lawn RB, Ask H, Corfield EC, et al. Assessing causal links between age at menarche and adolescent mental health: a Mendelian randomisation study [Registered Report Stage 1 Protocol]. figshare; 2022. Available from: https://springernature.figshare.com/articles/dataset/Assessing_causal_links_between_age_at_menarche_and_adolescent_mental_health_a_Mendelian_randomisation_study_Registered_Report_Stage_1_Protocol_/20101841/2

Hirtz R, Hars C, Naaresh R, Laabs BH, Antel J, Grasemann C, et al. Causal effect of age at menarche on the risk for depression: results from a two-sample multivariable Mendelian randomization study. Front Genet. 2022;13:918584.

Salk RH, Hyde JS, Abramson LY. Gender differences in depression in representative national samples: meta-analyses of diagnoses and symptoms. Psychol Bull. 2017;143(8):783–822.

Ong KK, Elks CE, Li S, Zhao JH, Luan J, Andersen LB, et al. Genetic variation in LIN28B is associated with the timing of puberty. Nat Genet. 2009;41(6):729–33.

Wei YB, Liu JJ, Villaescusa JC, Åberg E, Brené S, Wegener G, et al. Elevation of Il6 is associated with disturbed let-7 biogenesis in a genetic model of depression. Transl Psychiatry. 2016;6(8):e869.

Prince C, Joinson C, Kwong AS, Fraser A, Heron J. The relationship between timing of onset of menarche and depressive symptoms from adolescence to adulthood. Epidemiol Psychiatr Sci. 2023;32:e60.

Padrutt ER, Harper J, Schaefer JD, Nelson KM, McGue M, Iacono WG, et al. Pubertal timing and adolescent outcomes: investigating explanations for associations with a genetically informed design. J Child Psychol Psychiatry. 2023;64(8):1232–41.

Harden KP, Mendle J. Gene-environment interplay in the association between pubertal timing and delinquency in adolescent girls. J Abnorm Psychol. 2012;121(1):73–87.

Brumpton B, Sanderson E, Heilbron K, Hartwig FP, Harrison S, Vie GÅ, et al. Avoiding dynastic, assortative mating, and population stratification biases in Mendelian randomization through within-family analyses. Nat Commun. 2020;11(1):3519.

Wootton RE, Jones HJ, Sallis HM. Mendelian randomisation for psychiatry: how does it work, and what can it tell us? Mol Psychiatry. 2022;27(1):53–7.

Nilsen RM, Vollset SE, Gjessing HK, Skjaerven R, Melve KK, Schreuder P, et al. Self-selection and bias in a large prospective pregnancy cohort in Norway. Paediatr Perinat Epidemiol. 2009;23(6):597–608.

Nohr EA, Liew Z. How to investigate and adjust for selection bias in cohort studies. Acta Obstet Gynecol Scand. 2018;97(4):407–16.


Acknowledgements

We thank the Norwegian Institute of Public Health (NIPH) for generating high-quality genomic data. This research is part of the HARVEST collaboration, supported by the Research Council of Norway (#229624). We also thank deCODE Genetics and the NORMENT Centre for providing genotype data, funded by the Research Council of Norway (#223273), South-Eastern Norway Health Authority, and KG Jebsen Stiftelsen. We further thank the Center for Diabetes Research, University of Bergen, for providing genotype data and performing quality control and imputation of the data, funded by the ERC AdG project SELECTionPREDISPOSED, Stiftelsen Kristian Gerhard Jebsen, Trond Mohn Foundation, the Research Council of Norway, the Novo Nordisk Foundation, the University of Bergen, and the Western Norway Health Authorities (Helse Vest). The Norwegian Mother, Father and Child Cohort Study is supported by the Norwegian Ministry of Health and Care Services and the Ministry of Education and Research. We are grateful to all the participating families in Norway who take part in this ongoing cohort study.

This work was performed on the TSD (Tjeneste for Sensitive Data) facilities, owned by the University of Oslo, operated and developed by the TSD service group at the University of Oslo, IT Department (USIT). The computations were performed on resources provided by Sigma2—the National Infrastructure for High Performance Computing and Data Storage in Norway. Data from NPR has been used in this publication. The interpretation and reporting of these data are the sole responsibility of the authors, and no endorsement by NPR is intended nor should be inferred.

Open access funding provided by Norwegian Institute of Public Health (FHI). The Research Council of Norway supports F.A.T., C.S., E.C., H.A., T.R.-K., N.M.D., and O.A.A. (#300668; #274611; #273659, #273659, #324620, #274611, #295989, #229129; #213837; #248778; #223273; #249711). The South-Eastern Regional Health Authority supports A.D.A., R.E.W., O.A.A., A.H., L.J.H., and E.C. (#2020023, #2020024, 2017–112, #2020022, #2018058, #2021045). N.M.D. and G.D.S. work in a unit that receives support from the University of Bristol and the UK Medical Research Council (MC_UU_00011/1). O.A.A. is also supported by Stiftelsen Kristian Gerhard Jebsen and H2020 grant CoMorMent (#847776). This work was partly supported by the Research Council of Norway through its Centres of Excellence funding scheme (#262700). The funders have/had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and affiliations

Department of Psychology, University of Oslo, Oslo, Norway

Adrian Dahl Askelund & Fartein A. Torvik

Nic Waals Institute, Lovisenberg Diaconal Hospital, Oslo, Norway

Adrian Dahl Askelund, Robyn E. Wootton, Elizabeth C. Corfield, Alexandra Havdahl & Laurie J. Hannigan

PsychGen Centre for Genetic Epidemiology and Mental Health, Norwegian Institute of Public Health, Oslo, Norway

Adrian Dahl Askelund, Robyn E. Wootton, Helga Ask, Elizabeth C. Corfield, Ted Reichborn-Kjennerud, Alexandra Havdahl & Laurie J. Hannigan

MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK

Robyn E. Wootton, George Davey Smith, Alexandra Havdahl & Laurie J. Hannigan

School of Psychological Science, University of Bristol, Bristol, UK

Robyn E. Wootton

Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway

Fartein A. Torvik, Maria C. Magnus & Per M. Magnus

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, USA

Rebecca B. Lawn

Promenta Research Center, Department of Psychology, University of Oslo, Oslo, Norway

Helga Ask & Alexandra Havdahl

Institute of Clinical Medicine, University of Oslo, Oslo, Norway

Ted Reichborn-Kjennerud

NORMENT Centre, Institute of Clinical Medicine, University of Oslo, Oslo, Norway

Ole A. Andreassen

Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway

Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway

Camilla Stoltenberg

NORCE Norwegian Research Centre, Bergen, Norway

Division of Psychiatry, University College London, London, UK

Neil M. Davies

Department of Statistical Sciences, University College London, London, UK

KG Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway


Contributions

Author contributions are presented according to the CRediT (Contributor Roles Taxonomy). A.D.A: conceptualisation, methodology, formal analysis, software, visualisation, writing—original draft, writing—review and editing, and project administration. R.E.W: conceptualisation, methodology, software, and writing—review and editing. F.A.T: writing—review and editing. R.B.L: writing—review and editing. H.A: writing—review and editing and funding acquisition. E.C: data curation, software, and writing—review and editing. M.C.M: conceptualisation, methodology, and writing—review and editing. T.R.-K: writing—review and editing and funding acquisition. P.M: investigation and writing—review and editing. O.A.A: investigation and writing—review and editing. C.S.: writing—review and editing. G.D.S: writing—review and editing. N.M.D: methodology and writing—review and editing. A.H: conceptualisation, methodology, writing—review and editing, supervision, and funding acquisition. L.J.H: conceptualisation, methodology, formal analysis, software, data curation, writing—original draft, writing—review and editing, supervision, and funding acquisition. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Adrian Dahl Askelund or Laurie J. Hannigan.

Ethics declarations

Ethics approval and consent to participate

The establishment of MoBa and initial data collection was based on a licence from the Norwegian Data Protection Agency and approval from the Regional Committees for Medical and Health Research Ethics. The MoBa cohort is now based on regulations related to the Norwegian Health Registry Act. The current study was approved by the Regional Committees for Medical and Health Research Ethics (REK numbers 2016/1702). By consenting to MoBa, participants have also agreed to linkage to KUHR, NPR, and MBRN. MBRN is a national health registry containing information about all births in Norway.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Supplementary methods, including: a) categorised age at menarche, b) dichotomised depressive symptoms, c) multiple imputation, d) diagnostic codes, e) inverse probability weighting, f) psychometric properties of scales, g) definition/removal of outliers.

Additional file 2: Table S1.

With psychometric properties of 8y symptom scales (ordinal Cronbach’s alphas).

Additional file 3: Table S2.

With overview of variables: details about items, missingness, and processing.

Additional file 4.

Description of genetic instruments for two-sample MR analyses.

Additional file 5.

Description of MR sensitivity analyses, including: a) one-sample MR sensitivity analyses, b) two-sample MR sensitivity analyses, c) multivariable MR analyses.

Additional file 6.

Outline of analyses for each hypothesis, including: a) main analyses, b) negative control analyses, c) the smallest effect size of interest, d) sensitivity analyses, e) inference criteria.

Additional file 7.

Description of power analyses, including: a) projected prevalence, b) data generation, c) power calculation, d) Figs. S1-S8 showing results of power analysis for all hypotheses.

Additional file 8.

Supplementary results, including: a) imputation of age at menarche, b) breast stage as an additional exposure, c) categorised age at menarche, d) exploratory analyses.

Additional file 9: Table S3.

With associations of genetic instrument for age at menarche with the covariates.

Additional file 10: Table S4.

With results of multivariable Mendelian randomisation sensitivity analyses.

Additional file 11: Fig. S9.

With differences between participants and non-participants with IP weighting.

Additional file 12: Fig. S10.

Showing the impact of inverse probability weighting on results for depression.

Additional file 13: Fig. S11.

With 2-sample MR sensitivity analyses for other mental health domains.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Askelund, A.D., Wootton, R.E., Torvik, F.A. et al. Assessing causal links between age at menarche and adolescent mental health: a Mendelian randomisation study. BMC Med 22, 155 (2024). https://doi.org/10.1186/s12916-024-03361-8


Received: 12 January 2022

Accepted: 18 March 2024

Published: 12 April 2024

DOI: https://doi.org/10.1186/s12916-024-03361-8


  • Age at menarche

BMC Medicine

ISSN: 1741-7015


COMMENTS

  1. What Is a Cohort Study?

    A cohort study is a type of observational study that follows a group of participants over a period of time, examining how certain factors (like exposure to a given risk factor) affect their health outcomes. The individuals in the cohort have a characteristic or lived experience in common, such as birth year or geographic area.

  2. Methodology Series Module 1: Cohort Studies

    The term "cohort" refers to a group of people who have been included in a study by an event that is based on the definition decided by the researcher. For example, a cohort of people born in Mumbai in 1980 would be called a "birth cohort." Another example of a cohort would be people who smoke.

  3. LibGuides: Quantitative study designs: Cohort Studies

    There is a persuasive hypothesis linking an exposure to an outcome. ... The stages of a Cohort Study. A cohort study starts with the selection of a group of participants (known as a 'cohort') sourced from the same population, who must be free of the outcome under investigation but have the potential to develop that outcome.

  4. Overview: Cohort Study Designs

    The cohort study design is an excellent method to understand an outcome or the natural history of a disease or condition in an identified study population ( Mann, 2012; Song & Chung, 2010 ). Since participants do not have the outcome or disease at study entry, the temporal causality between exposure and outcome(s) can be assessed using this ...

  5. PDF Guidelines for reading a Cohort Study

    1. Breslow NE, Day NE. Statistical Methods in Cancer Research: II. The Design and Analysis of Cohort Studies. International Agency for Research on Cancer, 1987, Lyon. 2. Dwyer JH, Feinleib M, Lippert P, Hoffmeister H. Statistical Models for Longitudinal Studies of Health. Monographs in Epidemiology and Biostatistics.

  6. Cohort Studies: Design, Analysis, and Reporting

    Abstract. Cohort studies are types of observational studies in which a cohort, or a group of individuals sharing some characteristic, are followed up over time, and outcomes are measured at one or more time points. Cohort studies can be classified as prospective or retrospective studies, and they have several advantages and disadvantages.

  7. Cohort Study Design: An Underutilized Approach for Advancement of

    A cohort study is assumed to involve a prospective approach, ... between randomly created study groups from evaluation of an association between exposure and outcome within a cohort. The test of a null hypothesis for a study that randomly assigns participants to 2 or more different groups determines the probability ...

  8. Cohort Studies: Design, Analysis, and Reporting

    Design, Analysis, and Reporting. Cohort studies are types of observational studies in which a cohort, or a group of individuals sharing some characteristic, are followed up over time, and outcomes are measured at one or more time points. Cohort studies can be classified as prospective or retrospective studies, and they have several advantages ...

  9. Cohort study

    A cohort study is a particular form of longitudinal study that samples a cohort (a group of people who share a defining characteristic, typically those who experienced a common event in a selected period, ... failure to refute a hypothesis often strengthens confidence in it. Crucially, the cohort is identified before the appearance of the ...

  10. Cohort studies investigating the effects of exposures: key ...

    Cohort studies follow a population exposed or not exposed to a potential causal agent forward in time and assess outcomes. Cohort studies are beneficial because these studies allow the ...

  11. Cohort Study: Definition, Designs & Examples

    FAQs. A cohort study is a type of longitudinal study where a group of individuals (cohort), often sharing a common characteristic or experience, is followed over an extended period of time to study and track outcomes, typically related to specific exposures or interventions. In cohort studies, the participants must share a common factor or ...

  12. Cohort Study: Definition, Benefits & Examples

    Cohort studies are observational designs, meaning that the researchers do not manipulate experimental or environmental conditions. Instead, they collect data over time and try to understand how various factors affect the outcome. These projects can last for periods ranging from weeks to decades, depending on the research questions.

  13. 13. Study design and choosing a statistical test

    A cohort study is one in which subjects, initially disease free, are followed up over a period of time. Some will be exposed to some risk factor, for example cigarette smoking. ... For example, in a prevalence study there is no hypothesis to test, and the size of the study is determined by how accurately the investigator wants to determine the ...

  14. Designing and Conducting Analytic Studies in the Field

    In field epidemiology, prospective cohort studies also often involve a group of persons who have had a known exposure (e.g., survived the World Trade Center attack on September 11, 2001 [7]) and who are then followed to examine the risk for subsequent illnesses with long incubation or latency periods.

  15. Formulating Hypotheses for Different Study Designs

    Formulating Hypotheses for Different Study Designs. Generating a testable working hypothesis is the first step towards conducting original research. Such research may prove or disprove the proposed hypothesis. Case reports, case series, online surveys and other observational studies, clinical trials, and narrative reviews help to generate ...

  16. Cohort study: What are they, examples, and types

    Nurses' Health Study. One famous example of a cohort study is the Nurses' Health Study. This was a large, long-running analysis of female health that began in 1976. It investigated the ...

  17. Epiville: Cohort Study -- Study Design

    Answer (a) — incorrect: The proposed hypothesis implies the comparison of exposure states between diseased and non-diseased individuals. This comparison is appropriate for a case-control study. A cohort study is designed to compare outcomes between exposed and non-exposed groups.

  18. Lesson 9: Etiologic Studies (3) Cohort Study Design; Sample Size and

    A cohort study is useful for estimating the risk of disease, the incidence rate, and/or relative risks. Non-cases may be enrolled from a well-defined population, current exposure status (at \(t_0\)) determined, and the onset of disease observed in the subjects over time. Disease status at \(t_1\) can be compared to exposure status at \(t_0\).

  19. Basic Understanding of Study Types and Formulating Research Question

    A cohort study may also be performed with a single cohort (i.e. without a control group), where a group of individuals sharing a common characteristic are followed up forward in time to know the outcome of interest. ... Primary research question should never be compromised because the study hypothesis and objectives are framed based on the ...

  20. How to Write a Strong Hypothesis

    5. Phrase your hypothesis in three ways. To identify the variables, you can write a simple prediction in if…then form. The first part of the sentence states the independent variable and the second part states the dependent variable. If a first-year student starts attending more lectures, then their exam scores will improve.

  21. Step 4: Test Hypotheses

    Step 4: Test Hypotheses. Once investigators have narrowed down the likely source of the outbreak to a few possible foods, they test the hypotheses. Investigators can use many different methods to test their hypotheses, but most methods entail studies that compare how often (frequency) sick people in the outbreak ate certain foods to how often ...

  22. Observational Studies: Cohort and Case-Control Studies

    Cohort studies and case-control studies are two primary types of observational studies that aid in evaluating associations between diseases and exposures. In this review article, we describe these study designs, methodological issues, and provide examples from the plastic surgery literature. Keywords: observational studies, case-control study ...

  23. Assessing causal links between age at menarche and adolescent mental

    Here, we aim to replicate the 14-year analyses in adolescents from a larger birth cohort, the Norwegian Mother, Father, and Child Cohort Study (MoBa). This replication will allow for a confirmatory and higher-powered test of the hypothesis that earlier age at menarche is causally related to adolescent depression.
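Items 17 and 18 in the list above describe the core cohort comparison: how often the outcome occurs in the exposed group versus the unexposed group, expressed as a risk, an incidence rate, or a relative risk. The Python below is a minimal sketch of that arithmetic, assuming a simple closed cohort summarised as a 2×2 table of counts (for risks) and, separately, case counts with total person-time (for rates). The class name `CohortTable`, the function `incidence_rate_ratio`, and the example counts are hypothetical and are not taken from any study cited here. The 95% interval uses the common Wald approximation on the log risk ratio; other interval methods exist.

```python
import math
from dataclasses import dataclass


@dataclass
class CohortTable:
    """Hypothetical 2x2 counts for a closed cohort followed from t0 to t1.

    a: exposed, developed the outcome     b: exposed, did not
    c: unexposed, developed the outcome   d: unexposed, did not
    """
    a: int
    b: int
    c: int
    d: int

    def risk_exposed(self) -> float:
        # Cumulative incidence (risk) in the exposed group
        return self.a / (self.a + self.b)

    def risk_unexposed(self) -> float:
        # Cumulative incidence (risk) in the unexposed group
        return self.c / (self.c + self.d)

    def risk_ratio(self) -> float:
        # Relative risk: risk in exposed divided by risk in unexposed
        return self.risk_exposed() / self.risk_unexposed()

    def risk_ratio_ci(self, z: float = 1.96) -> tuple[float, float]:
        # Wald interval on the log scale; undefined when a or c is zero
        if self.a == 0 or self.c == 0:
            raise ValueError("Wald CI needs at least one case in each group")
        se = math.sqrt(1 / self.a - 1 / (self.a + self.b)
                       + 1 / self.c - 1 / (self.c + self.d))
        log_rr = math.log(self.risk_ratio())
        return math.exp(log_rr - z * se), math.exp(log_rr + z * se)


def incidence_rate_ratio(cases_exp: int, person_time_exp: float,
                         cases_unexp: int, person_time_unexp: float) -> float:
    # Incidence rate = new cases per unit of person-time at risk
    rate_exp = cases_exp / person_time_exp
    rate_unexp = cases_unexp / person_time_unexp
    return rate_exp / rate_unexp


if __name__ == "__main__":
    table = CohortTable(a=30, b=70, c=10, d=90)  # made-up counts
    low, high = table.risk_ratio_ci()
    print(f"RR = {table.risk_ratio():.2f}, 95% CI {low:.2f} to {high:.2f}")
    print(f"IRR = {incidence_rate_ratio(30, 1000.0, 10, 1200.0):.2f}")
```

With these made-up counts, 30% of the exposed and 10% of the unexposed develop the outcome, giving a risk ratio of about 3. The risk ratio uses everyone enrolled as the denominator, which suits a fixed follow-up period; the rate ratio uses person-time, which suits cohorts where participants are followed for different lengths of time.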