Article Contents
- Introduction
- Describing the distribution of values
- Descriptive statistics in text
- Descriptive statistics in tables
- Describing loss of participants in a study
- Comparing baseline characteristics in RCTs
- Conclusions
- Acknowledgements
- Conflicts of interest
Describing the participants in a study
R. M. Pickering, Describing the participants in a study, Age and Ageing, Volume 46, Issue 4, July 2017, Pages 576–581, https://doi.org/10.1093/ageing/afx054
This paper reviews the use of descriptive statistics to describe the participants included in a study. It discusses the practicalities of incorporating statistics in papers for publication in Age and Ageing, concisely and in ways that are easy for readers to understand and interpret.
Most papers reporting analysis of clinical data will at some point use statistics to describe the socio-demographic characteristics and medical history of the study participants. An important reason for doing this is to give readers some idea of the extent to which study findings can be generalised to their own local situation. Producing descriptive statistics is straightforward: most statistical packages produce all the statistics one could possibly desire, so a choice has to be made over which ones to present. These then have to be included in a paper in a manner that is easy for readers to assimilate. There may be constraints on the space available, and it is in any case a good idea to make statistical display as concise as possible. This article reviews the statistics that might be used to describe a sample of older people, and gives tips on how best to do this in a paper for publication in Age and Ageing. It builds on a previously published paper [ 1 ].
When a quantitative characteristic is measured in a group of subjects, the values observed are called the distribution of values. Graphical displays can show the detail of the distribution in a variety of ways, but they take up a considerable amount of space. A précis of two key features of the distribution, its centre and its spread, is usually presented using descriptive statistics. The centre of a distribution can be described by its mean or median, and the spread by its standard deviation (SD), range, or inter-quartile range (IQR). Definitions and properties of these statistics are given in statistical textbooks [ 2 ].
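As a minimal sketch, assuming Python's standard-library `statistics` module rather than any particular statistical package, the centre and spread statistics named above can be computed for a list of values as follows. The function name `describe` and the example values are hypothetical. Packages use slightly different conventions for quartiles, so an IQR computed this way may differ slightly from that reported by other software.

```python
import statistics


def describe(values):
    """Centre (mean, median) and spread (SD, range, IQR) of a list of numbers."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three quartile cut points; the "inclusive"
    # method interpolates between observed values, one of several conventions
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),  # sample SD (n - 1 denominator)
        "median": statistics.median(data),
        "range": (data[0], data[-1]),
        "iqr": (q1, q3),
    }


# Hypothetical values: one outlier pulls the mean far above the median
summary = describe([1, 2, 3, 4, 100])
print(summary)
```

Here the mean (22) is dragged well away from the median (3) by a single extreme value, while the IQR (2 to 4) is unaffected.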
Figure 1a shows an idealised symmetric distribution for a quantitative variable. The mean might be used here to describe where the centre of the distribution lies, and the SD to give an idea of how spread out values are around the centre. SDs are particularly appropriate where a symmetric distribution approximately follows the bell-shaped pattern shown in Figure 1a, which is called the normal distribution. For such a distribution approximately 95% of values observed in a sample will fall between the values two SDs below and two SDs above the mean, called the normal range. Presenting the mean and SD invites the reader to calculate the normal range and think of it as covering most of the distribution of values. Another reason for presenting the SD is that it is required in sample size calculations for approximately normally distributed outcomes, so readers can use it when planning future studies. Figure 1c shows a graphical display of approximately normally distributed real data (age at admission amongst 373 study participants): with a relatively small sample, a smooth distribution such as that in Figure 1a cannot be achieved. The mean (82.9) and SD (6.8) of the age distribution give a normal range of 69.3–96.5 years, which can be seen in Figure 1c to cover most of the ages in the sample: 14 subjects fall below 69.3 and 7 fall above 96.5, so the range actually covers 352 (94.4%) of the 373 participants, close to the anticipated 95%. For familiar measurements such as age, there is additional value in presenting the range, that is, the minimum and maximum values observed. Knowing that the study included people aged between 65 and 101 years is immediately meaningful, whereas the value of the SD is more difficult to interpret.
Figure 1. Idealised and real data distributions. (a) Symmetrical distribution. (b) Skewed distribution. (c) Dotplot (each dot representing one value) of an approximately symmetrical distribution indicating the normal range: age in years at admission (n = 373). (d) Dotplot (each dot representing one value) of a skewed distribution with outliers emphasised and indicating mean and median: hours in A&E (n = 348).
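The normal-range arithmetic for the age data in Figure 1c can be checked with a few lines of Python. This is a sketch: the function names are hypothetical, and the multiplier of 2 is the usual rounding of 1.96, the value that cuts off the central 95% of a normal distribution.

```python
def normal_range(mean, sd, k=2.0):
    """Mean +/- k SD; k = 2 approximates the central 95% of a normal distribution."""
    return mean - k * sd, mean + k * sd


def coverage(n_total, n_below, n_above):
    """Number and percentage of a sample lying inside a range."""
    inside = n_total - n_below - n_above
    return inside, 100 * inside / n_total


# Age at admission: mean 82.9, SD 6.8; 14 below and 7 above the normal range
low, high = normal_range(82.9, 6.8)
inside, pct = coverage(373, n_below=14, n_above=7)
print(f"Normal range {low:.1f} to {high:.1f} years covers {inside} of 373 ({pct:.1f}%)")
```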
When a distribution is skewed (Figure 1b), just one or two extreme values, ‘outliers’, in one tail of the distribution (the right tail in Figure 1b) pull the mean away from the obvious central value. An alternative statistic describing central location is the median, defined as the point with 50% of the sample falling above it and 50% below. Figure 1d shows real data following a skewed distribution (hours in A&E amongst 348 study participants). A few excessively long A&E stays pull the mean up to 4.9 h, compared with a median of 4.4 h; the effect would be greater if a higher proportion of subjects had long stays. The median is often recommended as the preferred statistic to describe the centre of a skewed distribution, but the mean can also be helpful. If the attribute being described takes only a limited number of values, the medians of two groups can be equal in spite of substantial differences in the tails. In these circumstances the mean is sensitive to an overall shift in distribution while the median is not. When a comparison of cost based on length of stay is to be made, presenting the means of the skewed distributions allows cost savings per subject to be calculated by applying the unit cost to the difference in means. Figure 1b suggests that the value with the highest frequency (the mode) might be a useful descriptor of the centre of a distribution. In practice this can prove awkward: depending on the precision of measurement, there may be no value occurring more than once.
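The points about medians, means and modes can be illustrated with two small groups of lengths of stay in whole days. These values, and the bed-day cost, are invented for illustration and are not data from the article. The groups share a median of 3 days, but the longer tail in group B raises its mean by a full day. The last lines show why the mode can be awkward: with values recorded to two decimal places each occurs once, so `statistics.multimode` returns every value.

```python
import statistics

# Hypothetical lengths of stay (whole days); NOT data from the article
group_a = [2, 2, 3, 3, 3, 3, 3, 4, 4]
group_b = [2, 3, 3, 3, 3, 4, 5, 6, 7]

for name, stays in (("A", group_a), ("B", group_b)):
    print(name, "median:", statistics.median(stays), "mean:", statistics.mean(stays))

# Cost difference per subject = unit cost x difference in means
COST_PER_BED_DAY = 250  # hypothetical unit cost
cost_difference_per_subject = (
    statistics.mean(group_b) - statistics.mean(group_a)
) * COST_PER_BED_DAY
print("Cost difference per subject:", cost_difference_per_subject)

# Finely measured values rarely repeat, so every value counts as a 'mode'
modes = statistics.multimode([4.12, 4.37, 5.01, 3.98])
print("Modes:", modes)
```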
It is clear from Figure 1b that no single number can adequately describe the spread of a skewed distribution, because spread is greater in one direction than the other. The range (from 1.7 to 40.3 h in A&E in our skewed example) could be used. Another possibility is the IQR (from 3.5 to 5.4 h in A&E), covering the central 50% of the distribution. The SD may be presented even though a distribution is skewed, and could be useful to readers for approximate power calculations, but the normal range derived from the mean and SD will be misleading. With mean (SD) = 4.9 (3.2), the lower limit of the normal range of hours in A&E is the impossible negative value of −1.5 h, while the upper limit of 11.3 h lies well below the extreme values shown in Figure 1d.
Descriptive statistics may be presented in text, for example [ 3 ]:
Participants’ ages ranged from 50 to 87 years ( M = 66.1, SD = 7.8) with 56% identified as female, 64% married or partnered, 23% reported being retired or not working, 55% had post-secondary and higher education, and <20% reported living alone. Over 60% of the participants identified as NZ European. The mean of net personal annual income was $34,615. The participants reported the diagnosis of an average of 2.63 (±2.07) chronic health conditions, with 50% reported having three or more chronic health conditions.
There are perhaps too many attributes (age, gender, marital status, employment status, educational level, living arrangements, nationality, personal income and number of chronic conditions) being described in the excerpt above: it would be easier to assimilate this information from a table.
Table 1. Characteristics of subjects at admission and their operations before (1998/99) and after (2000/01) implementation of a care pathway [ 4 ]. Figures are number (% of non-missing values) unless otherwise stated.
| Characteristic | 1998/99 (n = 395) | 2000/01 (n = 373) |
|---|---|---|
| Age on admission (years) | | |
| Mean (SD) | 83 (7) | 83 (7) |
| Minimum–maximum | 65–101 | 65–101 |
| Gender | | |
| Male | 90 (23%) | 90 (24%) |
| Female | 305 (77%) | 283 (76%) |
| Admission domicile | | |
| Own home | 219 (55%) | 202 (54%) |
| Sheltered accommodation | 47 (12%) | 58 (16%) |
| Residential care | 90 (23%) | 83 (22%) |
| Nursing home | 18 (5%) | 15 (4%) |
| Other ward SUHT | 7 (2%) | 2 (1%) |
| Other trust | 14 (4%) | 13 (4%) |
| Ambulation score | | |
| Bed/chair bound | 8 (2%) | 5 (1%) |
| Presence 1+ | 12 (3%) | 7 (2%) |
| 1 person | 25 (6%) | 20 (5%) |
| Unable 50 m | 145 (37%) | 138 (38%) |
| Able 50 m | 200 (51%) | 197 (54%) |
| | (n = 390) | (n = 367) |
| Time in A&E (h) | | |
| Mean (SD) | 4.9 (3.2) | 5.6 (2.4) |
| Minimum–maximum | 1.7–40.3 | 0–21.4 |
| | (n = 348) | (n = 328) |
| History of dementia | 79 (20%) | 85 (23%) |
| | (n = 395) | (n = 371) |
| Confused on admission | 124 (32%) | 125 (34%) |
| | (n = 394) | (n = 371) |
| Type of fracture | | |
| Intra-capsular | 192 (54%) | 173 (52%) |
| Extra-capsular | 165 (46%) | 161 (48%) |
| | (n = 357) | (n = 334) |
| Operation more than 48 h after ward admission | 183 (52%) | 205 (64%) |
| | (n = 354) | (n = 323) |
| Reason for delayed operation | | |
| Medical | 61 (35%) | 74 (43%) |
| Organisational | 66 (38%) | 72 (42%) |
| Both | 45 (26%) | 27 (16%) |
| | (n = 172) | (n = 173) |
| Type of operation | | |
| Thompson's hemiarthroplasty | 101 (27%) | 87 (24%) |
| Austin-Moore hemiarthroplasty | 69 (19%) | 18 (5%) |
| Dynamic screw | 162 (43%) | 165 (46%) |
| Asnis screws | 38 (11%) | 38 (11%) |
| Bipolar hemiarthroplasty | 3 (1%) | 48 (14%) |
| | (n = 373) | (n = 356) |
| Grade of surgeon | | |
| Consultant | 46 (12%) | 110 (32%) |
| SPR | 318 (86%) | 220 (63%) |
| SHO | 6 (2%) | 18 (5%) |
| | (n = 370) | (n = 348) |
| Grade of anaesthetist | | |
| Consultant | 120 (34%) | 175 (55%) |
| SPR | 99 (28%) | 52 (16%) |
| SHO | 133 (38%) | 91 (29%) |
| | (n = 352) | (n = 318) |
The distributions of the two quantitative variables in Table 1 are described by mean (SD) and range. The statistics being presented should be stated in the context of the table, here in the left-hand column, and could differ across variables. If the same statistics are presented for all the variables in a table, they can be indicated in the column headings or title. From the mean (SD) and range in each phase, we can see that the age distribution is reasonably symmetrical: the mean falls close to the centre of the range, and mean ± 2 SD (69 to 97 years) approaches the limits of the range. The distribution of hours in A&E is skewed to the right but has been summarised with the same statistics. We can see that it is skewed because the mean is much closer to the minimum than to the maximum, and, if the normal range is calculated, its upper limit falls well short of the high values in either phase. For these reasons, the normal range should not be interpreted as covering 95% of values. These conclusions, drawn from descriptive statistics alone, can be verified in Figure 1c and d.
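The informal reading of Table 1 described above can be sketched in Python. The `Summary` class and the cues computed by `skewness_cues` are hypothetical helpers, not a formal test of skewness: they locate the mean within the range (0.5 is the centre), compute mean ± 2 SD, flag a lower limit below the lowest possible value, and report how far the maximum lies beyond the upper limit. Inputs are the rounded figures from Table 1.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """One quantitative row of a table: mean (SD) and minimum-maximum."""
    label: str
    mean: float
    sd: float
    minimum: float
    maximum: float


def skewness_cues(s, lowest_possible=None):
    """Informal asymmetry cues read off mean (SD) and range; not a formal test."""
    low, high = s.mean - 2 * s.sd, s.mean + 2 * s.sd
    # 0.5 means the mean sits at the centre of the range; values near 0 or 1 suggest skew
    position = (s.mean - s.minimum) / (s.maximum - s.minimum)
    return {
        "mean_position_in_range": round(position, 2),
        "normal_range": (round(low, 1), round(high, 1)),
        # True when mean - 2 SD falls below the lowest value the variable can take
        "impossible_lower_limit": lowest_possible is not None and low < lowest_possible,
        # How far the observed maximum lies beyond mean + 2 SD
        "max_beyond_upper_limit": round(s.maximum - high, 1),
    }


# Rounded figures from Table 1
rows = [
    Summary("Age (years), 1998/99", 83, 7, 65, 101),
    Summary("Time in A&E (h), 1998/99", 4.9, 3.2, 1.7, 40.3),
    Summary("Time in A&E (h), 2000/01", 5.6, 2.4, 0, 21.4),
]
for row in rows:
    print(row.label, skewness_cues(row, lowest_possible=0))
```

For age the mean sits at the centre of the range (0.5); for hours in A&E it sits near the bottom (0.08 and 0.26), and in 1998/99 the maximum lies 29 h beyond the upper limit of the normal range.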
A choice arises when describing the distribution of an ordinal variable indicating ordered response categories, such as ambulation score in Table 1 . If the variable takes many distinct values, it can be treated as a quantitative variable and described in terms of centre and spread: ordinal variables often extend from the minimum to maximum possible values and in this case stating the range is not helpful. The meaning of the extremes should be stated in the context of the table to aid interpretation of results. Ordinal variables taking only a few distinct values are better treated as categorical variables and number (%) presented for each category. With only five categories the latter approach was adopted for ambulation score. Display as a categorical variable can be facilitated by combining infrequently occurring adjacent values.
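Combining infrequently occurring adjacent categories can be done mechanically once a minimum count is chosen. The sketch below uses a hypothetical `collapse_sparse` helper and an arbitrary threshold of 40: it merges neighbouring categories of an ordered variable until each group reaches the threshold, using the 1998/99 ambulation counts from Table 1. In practice the groupings should also make clinical sense, which no threshold rule guarantees.

```python
def collapse_sparse(categories, min_count):
    """Merge adjacent ordered categories until each merged group has >= min_count.

    `categories` is a list of (label, count) pairs in their natural order.
    A sparse group left over at the end is folded into the previous group.
    """
    merged = []
    labels, total = [], 0
    for label, count in categories:
        labels.append(label)
        total += count
        if total >= min_count:
            merged.append((", ".join(labels), total))
            labels, total = [], 0
    if labels:  # trailing categories that never reached min_count
        if merged:
            prev_label, prev_total = merged.pop()
            merged.append((prev_label + ", " + ", ".join(labels), prev_total + total))
        else:
            merged.append((", ".join(labels), total))
    return merged


ambulation_1998 = [
    ("Bed/chair bound", 8),
    ("Presence 1+", 12),
    ("1 person", 25),
    ("Unable 50 m", 145),
    ("Able 50 m", 200),
]
n = sum(count for _, count in ambulation_1998)  # participants with a recorded score
for label, count in collapse_sparse(ambulation_1998, min_count=40):
    print(f"{label}: {count} ({100 * count / n:.0f}%)")
```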
Loss of participants at successive stages of a study may also be described in text, for example [ 5 ]:
In the original study, 3,182 of 5,719 admissions were screened and 2,286 were eligible. Six hundred and ten patients were not available on the hospital units when the RA [Research Assistant] arrived to complete the CAM [Confusion Assessment Method]; 1,582 patients assented to complete the CAM and 94 patients did not assent; the CAM was not completed for 728 patients because an informant was not available to confirm an acute change and fluctuation in mental status prior to admission or enrolment. The CAM was completed for 854 patients; 375 had delirium; 278 were enroled. Of the 278 enroled patients, 172 were discharged before the follow-up assessment, 73 were still hospitalised, 8 withdrew from the study and 27 died. Of the 172 discharged patients, delirium recovery status was determined for 152, 16 withdrew from the study after discharge and 4 died.
The authors start with the 5,719 admissions and report the numbers lost at successive stages, to arrive at the analysis sample of 152. It may be easier to assimilate the detail of the process from a tabular or graphical presentation. The CONSORT guidelines [ 6 ] for reporting Randomised Controlled Trials (RCTs) recommend that the progress of participants through a trial be presented as a flow chart, and an example is shown in Figure 2. These charts are clearly helpful and are now also used in studies other than RCTs.
Figure 2. Recruitment and attrition rates in an RCT of WiiActive exercises in community-dwelling older adults [ 7 ].
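The arithmetic behind a flow chart is the difference between successive counts. As a sketch (the helper name is hypothetical), the main path through the delirium study quoted above can be tabulated in Python. "Not carried forward" covers every reason for leaving the path: for example, the 704 between eligibility and assent are the 610 patients who were not available plus the 94 who did not assent.

```python
def losses_between_stages(stages):
    """Given (stage, count) pairs in order, return (from, to, not carried forward)."""
    return [
        (prev_name, name, prev_count - count)
        for (prev_name, prev_count), (name, count) in zip(stages, stages[1:])
    ]


# Main path through the delirium study quoted above
stages = [
    ("Admissions", 5719),
    ("Screened", 3182),
    ("Eligible", 2286),
    ("Assented to CAM", 1582),
    ("CAM completed", 854),
    ("Delirium", 375),
    ("Enrolled", 278),
    ("Discharged before follow-up", 172),
    ("Recovery status determined", 152),
]
for src, dst, lost in losses_between_stages(stages):
    print(f"{src} -> {dst}: {lost} not carried forward")
```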
In addition to the loss of participants at each time point shown in a flow chart, information on specific variables may be missing even though a participant was available at the study point in question. Taking Table 1 as an example, there were 395 and 373 admissions during the 1998/99 and 2000/01 phases, respectively, as stated in the column headings, but the number of participants providing information varies considerably across the characteristics in the table. The reader should be able to establish how many cases contribute to each result, so wherever the number available is lower than the total for the phase, it is stated below the descriptive statistics. For example, ambulation score was available for only 390 of the 395 participants in the 1998/99 phase. The percentages presented for ambulation score were calculated amongst cases where information was available, and this was done for all percentages in the table, as indicated in the title. Alternatively, missing values in a categorical variable may be treated as a category in their own right, with percentages calculated using the total sample size as denominator; where there is a large amount of missing information, this may be the best way of handling the situation. Stating the numbers available allows the reader to check this point. A reason for delay ('Reason for delayed operation' in the table) applies only to participants whose operation was delayed by more than 48 h, and from the stated numbers the reader can see that a reason was not given for every delayed case (172 of 183 in 1998/99 and 173 of 205 in 2000/01).
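The two ways of handling missing values in a categorical variable give slightly different percentages. Below is a sketch using the 1998/99 ambulation counts from Table 1, where 5 of the 395 participants had no score; the `percentages` helper is hypothetical.

```python
def percentages(counts, n_missing=None):
    """Percentage in each category of a categorical variable.

    With n_missing=None the denominator is the number with a recorded value,
    as in Table 1; otherwise 'Missing' is added as a category and the
    denominator becomes the whole sample.
    """
    counts = dict(counts)  # copy so the caller's dict is not modified
    if n_missing is not None:
        counts["Missing"] = n_missing
    denominator = sum(counts.values())
    return {label: round(100 * count / denominator, 1) for label, count in counts.items()}


ambulation_counts = {"Bed/chair bound": 8, "Presence 1+": 12, "1 person": 25,
                     "Unable 50 m": 145, "Able 50 m": 200}
print(percentages(ambulation_counts))               # denominator 390
print(percentages(ambulation_counts, n_missing=5))  # denominator 395
```

With only 5 values missing the choice barely matters (51.3% versus 50.6% able to walk 50 m); with a large amount of missing information the two approaches can diverge considerably.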
In reports of RCTs, a table describing baseline characteristics in each trial arm shows whether or not randomisation was successful in producing similar groups, as well as addressing the generalisability issue. If there are differences at baseline, comparison of outcome may be confounded. Statistical tests of significance should not be used to decide whether any differences need to be taken into account [ 8 , 9 ]. If the allocation was properly randomised, we know that any differences at baseline must be due to chance. The question facing the researcher is whether or not the magnitude of a difference at baseline is sufficient to confound comparison of outcome, and this depends on the strength of the relationship between the potential confounder and the outcome, as well as on the size of the baseline difference. A statistical test for baseline differences does not address this question; furthermore, numbers may be too small for a test to detect quite large baseline differences. Statistics describing baseline characteristics are instead used to judge whether any differences are large enough to be important. If they are, additional analyses of outcome controlled for the characteristics that differ at baseline may be performed. In non-randomised studies, on the other hand, groups are likely to differ, and statistical significance tests can be used to evaluate the evidence that the process of selecting patients for each intervention results in different groups. In this situation a primary analysis controlled for many predictors of outcome would probably have been planned, and should be carried out irrespective of any differences, or lack of them, between study groups.
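Why a significance test does not answer the question can be seen from a simple sketch. Assuming, purely for illustration, a linear relationship between a baseline characteristic and the outcome that is the same in both arms, an imbalance shifts the unadjusted comparison by roughly the baseline difference multiplied by the change in outcome per unit of the characteristic. The function name and scenarios below are hypothetical: a larger baseline difference in a weak predictor can matter less than a small difference in a strong one, whatever the P-values.

```python
def approximate_shift(baseline_difference, outcome_change_per_unit):
    """Rough distortion of an unadjusted between-arm comparison of outcome.

    Assumes (hypothetically) a linear relationship between the baseline
    characteristic and the outcome, common to both arms.
    """
    return baseline_difference * outcome_change_per_unit


# Hypothetical scenarios, not from any trial
scenarios = {
    "3-year age difference, weak predictor (0.05 points/year)": (3.0, 0.05),
    "1-point baseline score difference, strong predictor (0.9 points/point)": (1.0, 0.9),
}
for label, (diff, slope) in scenarios.items():
    print(f"{label}: shift of about {approximate_shift(diff, slope):.2f} outcome points")
```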
Describing the main features of the distribution of important characteristics of the participants included in a study is the first step in most papers reporting statistical analysis. It is important in establishing the generalisability of research findings, and in the context of comparative studies, flags the need for controlled analysis. Usually space constraints limit the presentation of many descriptive statistics, and in any case, too many statistics can confuse rather than enhance insight. The attrition of subjects during a study should also be described, so that study subjects can be related to the patient base from which they were drawn.
Descriptive statistics are used to describe the participants in a study so that readers can assess the generalisability of study findings to their own clinical practice.
They need to be appropriate to the variable or participant characteristic they aim to describe, and presented in a fashion that is easy for readers to understand.
When many patient characteristics are being described, the detail of the statistics used and number of participants contributing to analysis are best incorporated in tabular presentation.
The author would like to thank Dr Helen Roberts for kindly granting permission to use data from the care pathway study [ 4 ] to produce Figure 1c and d.
None declared.
1. Pickering RM. Describing the subjects in a study. Palliat Med 2001; 15: 69–75.
2. Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall, 1991.
3. Yeung P, Breheny M. Using the capability approach to understand the determinants of subjective well-being among community-dwelling older people in New Zealand. Age Ageing 2016; 45: 292–8.
4. Roberts HC, Pickering RM, Onslow E et al. The effectiveness of implementing a care pathway for femoral neck fracture in older people: a prospective controlled before and after study. Age Ageing 2004; 33: 178–84.
5. Cole MG, McCusker JM, Bailey R et al. Partial and no recovery from delirium after hospital discharge predict increased adverse events. Age Ageing 2017; 46: 90–5.
6. Schulz KF, Altman DG, Moher D, for the CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel-group randomised trials. BMJ 2010; 340: 698–702.
7. Kwok BC, Pua YH. Effects of WiiActive exercises on fear of falling and functional outcomes in community-dwelling older adults: a randomised control trial. Age Ageing 2016; 45: 621–8.
8. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000; 355: 1064–9.
9. Altman DG. Comparability of randomized groups. Statistician 1985; 34: 125–36.
- Online ISSN 1468-2834
- Copyright © 2024 British Geriatrics Society
Application of biostatistics in research by teaching faculty and final-year postgraduate students in colleges of modern medicine: A cross-sectional study
Address for correspondence: Mrs. Alka Dilip Gore, Department of Community Medicine, Bharati Vidyapeeth Deemed University Medical College and Hospital, Sangli, 416 414, Maharashtra, India.
This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Biostatistics is well recognized as an essential tool in medical research, clinical decision making, and health management. Deficient basic biostatistical knowledge adversely affects research quality. Surveys on this issue are uncommon in the literature.
To study the use of biostatistics in research by teaching faculty and postgraduate students from colleges of modern medicine.
Settings and Design:
Cross-sectional study in colleges of modern medicine.
Materials and Methods:
A pretested proforma was used to collect information about the use of biostatistics by teaching faculty and final-year postgraduate students from colleges of modern medicine. The study period was 6 months.
Statistical Analysis:
Chi-square test, Spearman rank correlation coefficient, and multivariate analysis were used for analysis of data.
Results:
With this questionnaire, the maximum possible score for appropriate use of biostatistics in research was 20. Scores obtained by the study subjects ranged from 1 to 20, with a median of 11. Appropriate use of biostatistics was not significantly associated with sex, designation, or education (P > .05). Spearman's coefficient showed a low but significant correlation between the score and the numbers of papers presented and published (P = .002 and P < .001, respectively).
Conclusions:
The study showed that nearly half of the respondents were not using statistics appropriately in their research. There was also a lack of awareness of the need to apply statistical methods from the planning stage onwards.
Keywords: Awareness, biostatistical knowledge, PG students, teaching faculty
INTRODUCTION
Biostatistics is a branch of applied statistics and it must be taught with the focus being on its various applications in biomedical research.[ 1 ] It is an essential tool for medical research, clinical decision making, and health management.[ 2 ] Statisticians have long expressed concern about the slow uptake of statistical ideas by the medical profession and the frequent misuse of statistics when these methods are used. On the other hand, doctors have been worried about the increasing pressure to make use of techniques that they do not fully understand.[ 3 ] The biostatistical literacy of medical students is a problem all over the world.[ 1 ]
Research is an important activity not only for postgraduate (PG) medical students but for all medical professionals. Deficient basic biostatistical knowledge adversely affects research quality. Inappropriate statistical methods, techniques, and analyses waste time and money and, most importantly from the perspective of scientific ethics, do harm to science and humanity.[ 4 ] Writing on the teaching and learning of medical statistics in South Africa, Stander remarked that ‘medical practitioners were totally intimidated by the idea of statistics.’[ 5 ] Surveys on this issue are uncommon in the literature.[ 2 ]
This study was designed to identify problems in the use of biostatistics in research carried out by medical professionals in medical colleges. Its aim was to examine the use of biostatistics in research by the teaching faculty and PG students of colleges of modern medicine.
MATERIALS AND METHODS
A cross-sectional study was conducted among all teaching faculty and final-year PG students from five colleges of modern medicine in three adjacent districts of the south-western region of Maharashtra state, India, from June 2010 to November 2010. Data were collected using a pretested questionnaire: a pilot study was done to validate the questionnaire, and the proforma was modified as necessary. Permission for data collection was obtained from the deans of the respective medical colleges. Final-year PG students and teaching faculty in every department were visited and briefed about the study; proformas were distributed and the completed proformas collected.
Final-year PG students are required to complete a research project before obtaining their PG degree. They were therefore chosen for this study, as they can be expected to have a better understanding of biostatistics than junior residents. The nature of the study was explained to those willing to participate. Information was collected using a pretested self-administered questionnaire designed to elicit personal and professional characteristics and knowledge of basic biostatistics. Study subjects who were not available at the first visit were visited again and given the proforma. Those who could not be contacted despite five visits, and those who failed to return the completed proforma, were excluded from the study. About 3–5 visits were paid to each college for data collection.
Scoring was based on the responses to 20 questions on biostatistical knowledge. There were 11 closed-ended and 8 open-ended questions, and the maximum possible score was 20. Study subjects were classified into four groups according to the score obtained, as follows: <25%, 25%–50%, 50%–75%, and >75%.
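Converting a knowledge score out of 20 into the four percentage bands can be sketched as below. The band boundaries are an assumption: the text does not say whether a score of exactly 50% or 75% falls in the lower or the upper band, so this sketch (with a hypothetical `score_group` function) places both in the 50% to 75% band.

```python
def score_group(score, max_score=20):
    """Place a knowledge score in one of four percentage-of-maximum bands.

    Assumed boundaries (not stated in the source): <25%, 25% to <50%,
    50% to 75% inclusive, and >75%.
    """
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    pct = 100 * score / max_score
    if pct < 25:
        return "<25%"
    if pct < 50:
        return "25% to <50%"
    if pct <= 75:
        return "50% to 75%"
    return ">75%"


# Hypothetical scores, including the reported median of 11
for s in (4, 5, 10, 11, 15, 16):
    print(s, score_group(s))
```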
Data were analyzed by calculating percentages. The chi-square test was applied to check the association of sex, education, and designation with the score. Spearman's rank correlation coefficient was used to measure the degree of association between the score and age, teaching experience, and number of papers presented and published. Multivariate regression was used to model the relationship between the score and the independent factors found to be highly significant. The analysis was done with MS® Excel® and the trial version of SPSS® 17.
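Spearman's rank correlation coefficient is the ordinary (Pearson) correlation of the ranks, with tied values given the average of the ranks they span. A self-contained sketch follows (function names hypothetical); it returns only the coefficient, not a P value. Python 3.12 and later also provide `statistics.correlation(x, y, method="ranked")`, and SciPy offers `scipy.stats.spearmanr`, which also returns a P value.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores and numbers of papers published
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```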
Ethical consideration
The institutional ethical committee approved this study. We explained the nature and purpose of the study to the participants and assured confidentiality before obtaining voluntary informed consent.
Of the 600 proformas distributed, 310 completed proformas were returned, giving a response rate of 51.67%. Twenty-nine respondents (9.35%) did not state their designation, gender, and/or age. Among the 310 respondents, there were 46 (14.84%) professors, 43 (13.87%) associate professors, 122 (39.35%) lecturers, and 75 (24.19%) final-year PG students. The mean age of the participants was 38.3 ± 11.06 years (range: 22–70 years). There were 175 male and 130 female respondents [ Figure 1 ].
Distribution of study subjects according to gender and designation
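The percentages in this paragraph can be reproduced directly from the counts. The sketch below assumes, as the reported figures imply, that every percentage uses all 310 returned proformas as its denominator; because some respondents did not state their designation, the four designation percentages sum to 92.25%, not 100%.

```python
def percent(part: int, whole: int, digits: int = 2) -> float:
    """Percentage of `whole` represented by `part`, rounded to `digits` places."""
    return round(100 * part / whole, digits)


# Counts taken from the results paragraph.
DISTRIBUTED = 600
RETURNED = 310
DESIGNATION_COUNTS = {
    "Professor": 46,
    "Associate professor": 43,
    "Lecturer": 122,
    "Final-year PG student": 75,
}

response_rate = percent(RETURNED, DISTRIBUTED)
# Denominator is every returned proforma, including those with designation missing.
designation_pct = {k: percent(v, RETURNED) for k, v in DESIGNATION_COUNTS.items()}

if __name__ == "__main__":
    print(f"Response rate: {response_rate}%")
    for name, pct in designation_pct.items():
        print(f"{name}: {DESIGNATION_COUNTS[name]} ({pct}%)")
```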
Of the 310 respondents in the present study, 305 (98.39%) agreed that biostatistics is important for research. For 118 (38.06%) respondents biostatistics was easy to understand, while for 167 (53.87%) it was difficult. Of these latter 167 respondents, 16 (9.58%) said that all topics in biostatistics were difficult. However, 9 (56.25%) of these 16 respondents had not consulted a biostatistician for help with their research work despite facing problems with understanding biostatistics.
Two hundred and sixty-three (84.8%) respondents took the help of a statistician for data analysis, whereas 36 (11.6%) felt that such help was not necessary; 11 (3.5%) respondents did not answer this question. Only 97 (31.29%) respondents felt that statistics is required from the planning stage itself; the remaining respondents sought a statistician's help only after data collection, after collating the data in tabular form, or after analysis, for interpretation and to check the significance of the findings.
Half of the respondents (158; 50.97%) did not calculate the sample size appropriately. These respondents used either all available study subjects (27.74%) or a figure of convenience (26.13%), and some (21.94%) decided the sample size on the basis of previously published articles. Only 152 respondents (49.03%) made the effort to calculate the sample size correctly, either by using a standard formula (13.87%) or by asking a statistician for help (35.16%). Thirteen (4.19%) respondents did not answer the question on sample size calculation.
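One widely used "standard formula" gives the sample size needed to estimate a single proportion p to within a margin of error d at a given confidence level: n = z²·p(1 − p)/d², where z is the standard normal quantile (about 1.96 for 95% confidence). The study does not say which formulae its respondents used, so this is only one illustrative case; comparative studies need formulae that also take power into account. A minimal Python sketch, with hypothetical inputs:

```python
import math
from statistics import NormalDist


def sample_size_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """Sample size to estimate a proportion: n = z^2 * p * (1 - p) / d^2.

    p          -- anticipated proportion (0.5 gives the most conservative n)
    margin     -- desired margin of error d, on the proportion scale
    confidence -- two-sided confidence level used to pick z
    The result is rounded up to the next whole subject.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    print(sample_size_proportion(0.5, 0.05))  # hypothetical: p = 0.5, d = 0.05
```

With p = 0.5 and d = 0.05 at 95% confidence this gives 385 subjects, the familiar conservative figure for surveys; any anticipated p other than 0.5 gives a smaller n.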
Respondents chose various options in response to the question on the factors on which data analysis depends, namely study design, sample size, type of data, and aims and objectives. Only 124 (40%) respondents mentioned all the factors that can influence data analysis. Twelve (3.87%) respondents had no knowledge of this and answered ‘don’t know.’
The most commonly mentioned use of a test of significance was ‘to find out the association’; in general, respondents knew very little about the other uses of tests of significance. Three (0.9%) respondents had no idea at all about the uses of tests of significance, and 16 (19.4%) did not respond to the question. None of the respondents could mention all the applications of tests of significance [ Table 1 ].
Various factors for which medical professionals seek the help of the statistician
The majority of the respondents (172; 55.5%) were unaware of the different sampling techniques, and even those who claimed biostatistical knowledge could not name the sampling techniques correctly. Forty-five (14.51%) respondents gave irrelevant names, indicating that they were unaware of sampling techniques; 74 (23.87%) gave the correct names, and 191 (61.61%) could not name any technique.
Two hundred and three (65.5%) respondents reported preparing dummy tables for their research project. Two hundred and sixty-five (85.5%) respondents felt that they would need the help of a statistician for proper presentation of data, whereas the remaining respondents considered themselves capable of doing this without help.
Standard deviation (SD) is a measure of dispersion: it measures the degree of variation in the data. The majority of the respondents (197; 63.55%) gave the correct meaning of SD. Of the 310 respondents, 53 (17.1%) said that SD is a measure of central tendency, 11 (3.55%) stated that it is a measure of skewness, and 47 (15.16%) did not answer the question.
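A short sketch of what the SD measures, using a hypothetical set of scores: the SD is the square root of the average squared deviation from the mean (with n − 1 in the denominator for a sample). Shifting every value by a constant moves the mean, a measure of central tendency, but leaves the SD unchanged, which is why the SD describes spread rather than location. The hand-written function is checked against Python's standard `statistics` module.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation (n - 1 denominator), computed step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
shifted = [x + 10 for x in scores]  # same spread, different location

if __name__ == "__main__":
    print("mean:", statistics.mean(scores), "->", statistics.mean(shifted))
    print("SD:  ", round(sample_sd(scores), 3), "->", round(sample_sd(shifted), 3))
    print("library check:", round(statistics.stdev(scores), 3))
```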
In this study, we scored each respondent on the appropriate use of biostatistics; the maximum possible score was 20. Scores ranged from 1 to 20, with a median of 11. The score was not significantly associated with designation (P = .22), although professors obtained higher scores than associate professors and lecturers. PG students scored higher than MD or MS degree holders, diploma holders, and MSc holders. Female respondents scored higher than males, though the difference was not statistically significant (P = .21) [ Table 2 ].
Score of study subjects according to personal characteristics
The Spearman rank correlation coefficient was calculated between the score and age, years of teaching experience, and the number of papers presented and published. A very low, nonsignificant correlation was found between score and age. There was a low but significant correlation of the score with the number of papers presented and published (P = .002 and P < .001, respectively) [ Table 3 ].
Spearman rank correlation coefficient for score and other parameters
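Spearman's rank correlation coefficient is the Pearson correlation computed on ranks rather than raw values, with tied values given the average of the ranks they share. A minimal pure-Python sketch follows; the paired values for papers published and score are hypothetical, and the sketch computes only the coefficient, not its significance test.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend `end` over a run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: papers published and biostatistics score (out of 20).
papers = [0, 1, 1, 3, 5, 8]
scores = [9, 8, 12, 11, 14, 13]

if __name__ == "__main__":
    print(f"rho = {spearman_rho(papers, scores):.3f}")
```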
Personal and professional determinants that were significantly associated with the score (P < 0.01) were entered into a binary logistic regression, and the backward stepwise (Wald) method was used to identify the most significant factors. Education, experience in teaching undergraduates, and the number of papers published remained significant at this level. The logistic regression showed that the score depended strongly on the respondent's level of education (P = .009 for PG students and P = .01 for PhD holders) [ Table 4 ].
Logistic regression
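In a logistic regression, each coefficient b with standard error SE gives a Wald statistic (b/SE)², an odds ratio exp(b), and a two-sided p-value from the standard normal distribution; backward elimination with the Wald criterion repeatedly removes the term with the largest p-value above a chosen removal threshold. The sketch below shows only that decision rule for a single step, using hypothetical coefficients, term names, and a hypothetical removal threshold of 0.05. A real analysis refits the model after each removal, which this sketch does not do.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96, for a 95% confidence interval


def wald_summary(coef, se):
    """Wald chi-square, two-sided p-value, odds ratio and 95% CI for one coefficient."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = coef / se
    return {
        "wald_chi2": z ** 2,
        # two-sided p-value from the standard normal: P(|Z| >= |z|)
        "p_value": math.erfc(abs(z) / math.sqrt(2)),
        "odds_ratio": math.exp(coef),
        "ci_95": (math.exp(coef - Z_975 * se), math.exp(coef + Z_975 * se)),
    }


def term_to_drop(terms, removal_p=0.05):
    """One backward step: the term with the largest Wald p-value if it exceeds
    removal_p, otherwise None (every remaining term is kept)."""
    p_values = {name: wald_summary(b, se)["p_value"] for name, (b, se) in terms.items()}
    worst = max(p_values, key=p_values.get)
    return worst if p_values[worst] > removal_p else None


# Hypothetical fitted model: term -> (coefficient, standard error).
fitted = {
    "education_pg_student": (1.20, 0.45),
    "teaches_undergraduates": (0.90, 0.30),
    "age_per_decade": (0.05, 0.20),
}

if __name__ == "__main__":
    for name, (b, se) in fitted.items():
        s = wald_summary(b, se)
        print(f"{name}: OR={s['odds_ratio']:.2f}, Wald={s['wald_chi2']:.2f}, p={s['p_value']:.3f}")
    print("Drop next:", term_to_drop(fitted))
```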
In this study, only 9 (2.9%) respondents gave the correct meaning of ‘P value’; 164 (52.9%) gave an incorrect answer, and 115 (37.10%) did not respond to the question at all. Nearly two-thirds of the respondents (204; 65.81%) felt that the results of their research project need not be positive or concordant with those of the references used, while 43 respondents (13.87%) felt that the results should agree with those of the references cited. Two hundred and forty-seven (79.68%) respondents said that they wished to upgrade their knowledge, whereas 18 (5.81%) did not.
Of the 600 distributed proformas, 310 filled-in proformas were returned, a response rate of 51.67%. This is relatively high in comparison to other studies; for example, in the study by Khan et al . the response rate was only 44.7%, and in the study by Laopaiboon et al . the response rate was 40.0%.[ 6 , 7 ]
Understanding biostatistical concepts is important for reading the literature intelligently. The majority of the respondents in this study (305; 98.39%) agreed that biostatistics is important for research. Swift et al. and Windish et al. found that 79% and 95%, respectively, of the participants in their studies considered statistics important for their work.[ 8 , 9 ] In our study, 118 (38.06%) respondents found biostatistics easy to understand, but 167 (53.87%) found it a difficult subject. Windish et al. reported that 75% of their respondents did not understand all of the concepts in statistics.[ 9 ] This difference from our findings may be because they studied only residents, whereas we included teaching faculty members as well as final-year PG students. Seventy-seven (46.1%) of the respondents who found biostatistics difficult mentioned analysis, calculation, application of tests, or advanced biostatistics as complex topics; an equal number did not specify the difficult topics. Twenty-one respondents (6.77%) did not respond to the question.
Teachers of medical statistics have recommended that the focus should be on interpretation and understanding of concepts, and that mathematical formulae and calculation must be kept to a minimum.[ 10 – 12 ] Doctors engaging in research are expected to perform statistical analyses themselves or consult with a statistician right from the beginning of the research project.[ 13 ] Two hundred and sixty-three (84.8%) respondents in this study said that they took the help of the statistician for data analysis. The respondents gave various responses to the question on the stage at which they would seek a statistician's help. Doctors’ statistical training needs may have changed due to advances in information technology and the increasing emphasis on evidence-based medicine.[ 13 ]
Biostatistical methods make research scientific when they are applied from the planning stage of the research. Correct use of statistics, from the planning stage until the end of the study, yields unbiased, consistent, and efficient parameter estimates, so a statistician should be consulted at every stage of the study. Only 97 (31.29%) respondents in this study felt that statistics is required from the stage of planning the proposal; the remaining respondents felt that a statistician's help is required only after data collection is completed, after the data are tabulated, or after analysis, for interpretation and to check the significance of findings. Those who would not seek a statistician's help from the planning stage seemed to be more interested in the ‘P value.’
The respondents mentioned various reasons for not seeking a statistician's help, of which the most common were lack of awareness regarding the need for consulting a statistician from the beginning of the research and the nonavailability of a statistician at their institute. Some of the respondents mentioned that they would be capable of doing it themselves by referring to books and the internet and by discussion with colleagues. Harry Robinson et al . found in their study that students who preferred learning by self-instruction did as well or better in terms of exam grades than their colleagues taking lectures.[ 14 ]
Researchers should calculate the sample size appropriately, either themselves or with a statistician's help, using estimates from previous studies (i.e. the references or literature review), an acceptable margin of error, a chosen significance level, and adequate power of the test; some researchers, however, simply take 25, 30, 50, or 100 as the sample size without referring to other studies. In this study, half of the respondents (158; 50.97%) did not calculate the sample size appropriately. Only 152 respondents (49.03%) calculated it correctly, either by using standard formulae (13.87%) or with the help of a statistician (35.16%). The subject of the study, the characteristics of the population, the duration of the research, and its cost must all be taken into account when choosing the sampling technique. Unfortunately, regardless of what the study design demands, some researchers use simple random sampling without further thought because it is the only method they know.[ 4 ] The majority of the respondents (172; 55.5%) were unaware of the different sampling techniques, and those who said they were aware did not name the sampling techniques correctly. Two hundred and sixty-five (85.5%) respondents felt that they would need a statistician's help to present data, whereas the remaining respondents felt capable of doing it themselves.
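To contrast two of the sampling designs discussed here, the sketch below draws a simple random sample and a proportional stratified random sample from a hypothetical sampling frame of staff in three colleges (the college labels, their sizes, and the largest-remainder rounding rule are illustrative assumptions). Stratification guarantees that each college is represented in proportion to its size, which a simple random sample achieves only on average.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, seed=None):
    """Every unit in the sampling frame has the same chance of selection."""
    return random.Random(seed).sample(frame, n)


def stratified_sample(frame, stratum_of, n, seed=None):
    """Proportional stratified random sample.

    Each stratum's share of n is proportional to its size; shares are rounded
    down and the leftover places go to the strata with the largest fractional
    remainders, so the shares add up to exactly n. A simple random sample is
    then drawn independently within each stratum.
    """
    if not 0 <= n <= len(frame):
        raise ValueError("n must be between 0 and the size of the frame")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    quotas = {s: n * len(units) / len(frame) for s, units in strata.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    shortfall = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    sample = []
    for s, units in strata.items():
        sample.extend(rng.sample(units, alloc[s]))
    return sample


# Hypothetical sampling frame: (college, staff_id) pairs for three colleges.
frame = [(college, i) for college, size in {"A": 50, "B": 30, "C": 20}.items()
         for i in range(size)]

if __name__ == "__main__":
    print(sorted(simple_random_sample(frame, 10, seed=1)))
    print(sorted(stratified_sample(frame, lambda unit: unit[0], 10, seed=1)))
```

For a sample of 10 from this frame, the stratified design always takes 5, 3 and 2 staff from colleges A, B and C, whereas the simple random sample's split varies from draw to draw.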
Internal medicine residents had low scores in a test of knowledge of biostatistics, and about three-fourths of the residents surveyed indicated that they were not confident about their understanding of the statistics they encountered in medical literature. The poor knowledge of biostatistics and difficulty experienced in interpretation of study results among the residents in the study likely reflects insufficient training.[ 15 ]
The score of respondents in this study was not significantly associated with designation, although professors obtained higher scores than associate professors and lecturers. PG students scored higher than MD or MS degree holders, diploma holders, and MSc degree holders, possibly because the PG students were currently involved in research for their dissertation. Female respondents scored higher than males, but the difference was not statistically significant. Khan et al. also reported that gender had no significant effect on responses.[ 6 ] Windish et al. reported higher scores for male respondents,[ 9 ] whereas Asif et al. found that females had higher scores.[ 16 ]
The Spearman rank correlation coefficient showed that senior teaching faculty members had lower scores than younger faculty members; the seniors attributed this to not having been taught biostatistics as part of their undergraduate curriculum. Other parameters, such as years of teaching experience and the number of research papers presented and published, showed only a low degree of correlation with the score. The correlation of the score with the number of papers presented and published was low but significant. This may be because scientifically sound papers, in which appropriate statistical methods are applied, are more likely to be published than those that lack appropriate application of statistics.
Most researchers are mainly interested in deriving the P value without clearly understanding its meaning. In this study, too, only 9 (2.9%) respondents could give the correct meaning of ‘P value.’ One of the most common errors made by researchers who do not consult a statistician is that, when conducting a study similar to a previously published one, they reuse the same methods of statistical analysis and the same tests as the earlier study.[ 15 ] This reveals an indifference on the part of such researchers towards statistics, and towards research as a whole.
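A worked example may help with the definition: a P value is the probability, calculated on the assumption that the null hypothesis is true, of obtaining a result at least as extreme as the one observed. It is not the probability that the null hypothesis is true. The hypothetical example below uses an exact one-sided binomial test: if 9 of 10 patients preferred treatment A and the null hypothesis is "no preference" (probability 0.5), then P = P(X ≥ 9) = 11/1024 ≈ 0.011.

```python
from math import comb


def binomial_upper_p(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact p-value P(X >= k) for X ~ Binomial(n, p0).

    Sums the null-hypothesis probabilities of every outcome at least as
    extreme (here: at least as many successes) as the observed k.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical: 9 of 10 patients prefer treatment A; null = no preference.
    print(binomial_upper_p(9, 10))
```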
These observations show that the majority of the teaching faculty and postgraduate students do not apply biostatistical concepts in a scientific manner when conducting research. Although they are aware that the proper use of biostatistical methods is important for scientific research, they lack the required knowledge. Most respondents in the present study wished to upgrade their knowledge of biostatistics and suggested refresher training programs, workshops, Continuing Medical Education, and self-learning as means of achieving this. Many respondents were reluctant to fill in the proforma and preferred to leave it blank. Better teaching of statistics to medical students should improve their understanding of statistical concepts and reduce misconceptions among clinicians and medical researchers.[ 17 ] The poor knowledge of biostatistics, and the resulting difficulty in interpreting study results, among the subjects of the present study reflects insufficient training. Nearly one-third of the study subjects indicated that they had never received biostatistics teaching at any point in their career and suggested the need for more effective training in biostatistics in undergraduate or postgraduate education. Zuger reported similar findings.[ 18 ] In conclusion, medical professionals need to upgrade their biostatistical knowledge regularly to improve the quality of research.
Source of Support: Nil.
Conflict of Interest: None declared.
- 1. Sami W. Biostatistics education for undergraduate medical students. Biomedica. 2010;26:80–4.
- 2. Adeleye OA, Offili AN. Difficulty in understanding statistics: Medical students’ perspectives in a Nigerian University. Int J Health Res. 2009;2:233–42.
- 3. Altman D, Bland JM. Improving doctors’ understanding of statistics. Stat Soc. 1991;154:223–67.
- 4. Ercan I, Yazıcı B, Yang Y, Özkaya G, Cangur S, Ediz B, et al. Misusage of statistics in medical research. Eur J Gen Med. 2007;4:128–34.
- 5. Stander I. Teaching conceptual vs theoretical statistics to medical students. International Statistical Institute, 52nd Session. 1999. [Last accessed on 2011 Dec 29]. Available from: http://www.stat.auckland.ac.nz/~iase/publications/5/stan0219.pdf
- 6. Khan N, Mumtaz Y. Attitude of teaching faculty towards statistics at a medical university in Karachi, Pakistan. Pakmedinet. 2009;21:166–71.
- 7. Laopaiboon M, Lumbiganon P, Walter SD. Doctor's statistical literacy: A survey at Srinagarind Hospital, Khon Kaen University. J Med Assoc Thai. 1997;80:130–7.
- 8. Swift L, Miles S, Price GM, Shepstone L, Leinster SJ. Do doctors need statistics? Doctors’ use of and attitude to probability and statistics. Stat Med. 2009;28:1969–81. doi: 10.1002/sim.3608.
- 9. Windish DM, Huot SJ, Green ML. Medicine residents’ understanding of the biostatistics and results in the medical literature. JAMA. 2007;298:1010–22. doi: 10.1001/jama.298.9.1010.
- 10. Freeman JV, Collier S, Staniforth D, Smith KJ. Innovations in curriculum design: A multi-disciplinary approach to teaching statistics to undergraduate medical students. BMC Med Educ. 2008;8:28. doi: 10.1186/1472-6920-8-28.
- 11. Evans SJ. Statistics for medical students in the 1990's: How should we approach the future? Stat Med. 1990;9:1069–75. doi: 10.1002/sim.4780090913.
- 12. Campbell MJ. Statistical training for doctors in the UK. Sixth International Conference on Teaching Statistics, Cape Town, South Africa. 2002. [Last accessed on 2011 Dec 26]. Available from: http://www.stat.auckland.ac.nz/~iase/publications/1/4f3_camp.pdf
- 13. Miles S, Price GM, Swift L, Shepstone L, Leinster SJ. Statistics teaching in medical school: Opinions of practising doctors. BMC Med Educ. 2010;10:75. doi: 10.1186/1472-6920-10-75.
- 14. Robinson H, Burke R, Stahl SM. Self-instructional teaching of biostatistics for medical students. J Community Health. 1976;1:249–55. doi: 10.1007/BF01324584.
- 15. Altman DG. Poor-quality medical research: What can journals do? JAMA. 2002;287:2765–7. doi: 10.1001/jama.287.21.2765.
- 16. Asif H, Asim B, Awais SM. Importance and understanding of bio-statistics among post graduate students at King Edward Medical University Lahore – Pakistan. Annals. 2009;15:107–10.
- 17. Mahmood Z. Uses and abuses of biostatistics in medical research in Pakistan. J Pak Med Assoc. 1990;40:270–1.
- 18. Zuger A. Survey finds significant statistical insecurity: Most physicians have no confidence in their own ability to use medical statistics. J Watch Gen Med. 2007 Aug;82:939–43.
Profile | September 28, 2020
Learn Biostatistical Reporting Skills
A medical writing trailblazer teaches students how to apply the reporting standards of evidence-based medicine.
An instructor for the Medical Writing and Editing certificate discusses the importance of statistical literacy for medical writers and the chain of events that led him to write the fundamental textbook in the field.
Medical writing, as we know it today, is still a relatively new field, notes Tom Lang, who has taught Interpreting & Reporting Biostatistics for the University’s Medical Writing and Editing certificate program since it began in 1999. Lang points out that although medical writing is often defined as writing about medicine, it is a subset of technical writing and differs greatly from literary and journalistic writing in its purpose, form, and evaluation.
“The purpose of medical writing is not to get as many clicks as possible,” he says. “Nor does it need to tell an interesting story or have a unique voice. It’s functional writing designed to help people act by communicating complex science using words, tables, graphs, and images as clearly and concisely as possible.”
A Boundary Spanner at the Forefront of the Profession
Since beginning his career as a medical writer, Lang has been at the forefront of developments in the field and in advancing the profession. After starting as a technical writer at Lawrence Livermore Laboratory, where he was trained by some of the pioneers in the emerging field of technical writing, he went on to take his first steps into medical writing as the co-author of a college text on personal health.
“With an undergraduate degree in the social sciences, I’ve always been a boundary spanner,” Lang says. “I loved applying anthropology to economics and political science to sociology. So, I was a natural fit for writing about personal health because it drew on such a wide variety of disciplines.”
Biostatistical Reporting and the Need for Guidelines
As manager of medical editing services at the Cleveland Clinic in the 1990s, he and his staff edited a broad range of documents reporting basic and clinical research. Early on, however, he encountered an article that presented the results of the same statistical test in two different ways.
“Inconsistencies are the things you look for as an editor to do the job well,” Lang says. “So I went to the publication style guides for answers, but none had guidelines for reporting statistics. The lack of guidelines or requirements seemed strange, given how important statistics are to conducting and reporting research.”
Additionally, the primary books on preparing scientific articles available in the early 1990s said nothing about statistical reporting, which only made Lang more curious to find an answer. He turned next to the medical literature where, along with a few relevant short articles and letters to the editor on statistical reporting, he started finding articles by an Oxford statistician named Doug Altman, who was studying statistical errors in the medical literature.
The Foundational Text on Statistical Reporting in Medicine
With Altman’s work as inspiration, Lang began writing a manuscript that focused on how to report statistical results in the biomedical literature. From a review of some 350 studies on statistical and methodological flaws, he created the first comprehensive set of guidelines for reporting statistics in medicine.
That work turned into How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers. Now in its second edition, the book has been a best seller for the publisher, the American College of Physicians, since its publication twenty-two years ago. It has also been translated into Chinese, Japanese, and Russian.
“Shortly after the book was published, I was contacted by the University of Chicago to teach a class on reporting biostatistics for their new medical editing certificate program,” Lang adds. “Drawing on the information presented in my book, I designed an intensive three-day curriculum to introduce students to the statistical analyses they’ll need to understand and report about 80% of the clinical literature. Although there are thousands of tests, only about two dozen are common, so it’s not hard to bring students up to speed.”
The Importance of Statistical Literacy for Medical Writers
In this core course of the Medical Writing and Editing program, students develop a conceptual understanding of the most common statistical tests and procedures used in biomedical and epidemiological research. Lectures and readings address what type of statistics to expect in a given research article, how to interpret their meaning in the article, and how to identify errors and omissions. Assignments focus on the complete and proper reporting of research activities, data, and statistical analyses.
Lang emphasizes that the course is not a typical course on statistics taught by and for statisticians. It involves no calculation of statistics and no research design. The course is designed and taught by a medical writer for medical writers.
“Evidence-based medicine is literature-based medicine, and medical writers and editors help prepare that literature,” he says. “I teach students what they need to know to apply the reporting standards of evidence-based medicine. These are the skills that help advance medicine and the profession of medical writing.”
“As professional writers and editors,” he adds, “we need to persuade people that we are not just people who like to write but that our knowledge, skills, and experience make us expert writers and allow us to communicate far more effectively than can writers without advanced training. Becoming ‘statistically’ literate greatly improves the value of our services and the image of our profession.”
Biostatistics and Research Design for Clinicians
First Online: 04 May 2018
Tarsicio Uribe-Leitz, Alyssa Fitzpatrick Harlow & Adil H. Haider
The objective of this chapter is to provide the reader with a basic understanding of core statistical concepts that will aid in the translation of meaningful research results. We emphasize the importance of a well-defined study design as a strong foundation that will lead to sound evidence. First, we define foundational statistical terminology; we then discuss the steps to generating and testing hypotheses through development of a solid research question and review how to choose an appropriate study design. Second, we describe a twofold approach to data analysis that uses descriptive statistics followed by inferential statistics. Finally, we cover essentials for sound evidence, based on a systematic point-based approach to judge the quality of data and strength of recommendations produced by research studies.
Acknowledgment
Dr. Haider would like to thank the Career Development Course of the Association for Academic Surgery where he has lectured on this topic over the past several years. Most of the concepts presented here have been discussed during the course of these presentations.
Author information
Center for Surgery and Public Health (CSPH), Brigham and Women’s Hospital, Boston, MA, USA
Tarsicio Uribe-Leitz, Alyssa Fitzpatrick Harlow & Adil H. Haider
Corresponding author
Correspondence to Adil H. Haider.
Editor information
Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
Dell Medical School, University of Texas at Austin, Austin, Texas, USA
Carlos Brown
Division of Trauma Surgery, Rm C5L100, University of Southern California, Los Angeles, California, USA
Kenji Inaba
Trauma and Emergency Surgery Service, Legacy Emanuel Medical Center, Portland, Oregon, USA
Matthew J. Martin
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Uribe-Leitz, T., Harlow, A.F., Haider, A.H. (2018). Biostatistics and Research Design for Clinicians. In: Salim, A., Brown, C., Inaba, K., Martin, M. (eds) Surgical Critical Care Therapy. Springer, Cham. https://doi.org/10.1007/978-3-319-71712-8_60
DOI : https://doi.org/10.1007/978-3-319-71712-8_60
Published : 04 May 2018
Publisher Name : Springer, Cham
Print ISBN : 978-3-319-71711-1
Online ISBN : 978-3-319-71712-8
eBook Packages : Medicine Medicine (R0)
Biostatistics & Epidemiology
Publishing open access, citation guides, books on writing & research, academic honesty & plagiarism, using Ulrich's.
The Libraries support Open Access publications in a variety of ways. Two that help fund open access publications are the SOAR (Supporting Open Access Research) Fund and Open Access Agreements .
Not only do you need to cite your sources, but you need to format them according to a particular style. Your instructor will often ask you to use a certain style in your paper. Here are links for more information about the styles commonly used in the health sciences:
**NOTE: Many databases provided by the Libraries will generate citations that are not always 100% accurate! When using APA or AMA style, always check the capitalization of words in the article title and the journal title to make sure they are correct.**
- APA Style 7th Edition Reference Quick Guide an overview of formatting citations in APA 7 from the APA Style team
- APA Style - Style & Grammar Guidelines for References from APA Style website
- APA Citation Style Guide from Purdue University Guide in print: Du Bois Library: UM Reference Desk / BF76.7 .P83 2010 OR Science & Engineering Library: UM Science Reference / BF76.7 .P83 2010 / Reference
- AMA Manual of Style - 11th Edition E-book version of the latest American Medical Association style guide. (Limited to 3 users at a time.)
- AMA Style Guide from University of Washington Guide in print: Science & Engineering Library: UM Science Reference / R119 .A533 2009
- NLM Journal Abbreviations Allows you to get the NLM abbreviation for a journal title to use in AMA citation format. Can also look up abbreviation to find full journal title.
"Academic dishonesty is prohibited in all programs of the University. Academic dishonesty includes but is not limited to: cheating, fabrication, plagiarism, and facilitating dishonesty." University of Massachusetts Amherst Academic Honesty Policy and Appeal Procedure
Here are some links for more information about avoiding academic dishonesty:
- Turnitin Turnitin is a plagiarism prevention service used by many UMass professors that detects textual matches between student papers and other documents available in electronic form on the Internet, in subscription databases, and in databases of student papers.
- Academic Honesty Policy Gives specific examples of academic dishonesty and links to an overview of the policy, the full policy and helpful resources if accused of academic dishonesty.
- Avoiding Plagiarism From the Purdue University Online Writing Lab (OWL)
- Last Updated: Sep 24, 2024 12:52 AM
- URL: https://guides.library.umass.edu/bioepi
© 2022 University of Massachusetts Amherst
How to write statistical analysis section in medical research
Alok Kumar Dwivedi
Correspondence to Dr Alok Kumar Dwivedi, Department of Molecular and Translational Medicine, Division of Biostatistics & Epidemiology, Texas Tech University Health Sciences Center El Paso, El Paso, Texas, USA; [email protected]
Corresponding author.
Accepted 2022 Jun 1; Issue date 2022 Dec.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, an indication of whether changes were made, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .
Reporting of statistical analysis is essential in any clinical and translational research study. However, medical research studies sometimes report statistical analyses that are either inappropriate or insufficient to attest to the accuracy and validity of their findings and conclusions. Published works involving inaccurate statistical analyses and insufficient reporting influence the conduct of future scientific studies, including meta-analyses, as well as medical decisions. Although biostatistical practice has improved over the years owing to the involvement of statistical reviewers and collaborators in research studies, there remains room to improve the transparent reporting of the statistical analysis section. Evidence-based biostatistics practice throughout the research process is useful for generating reliable data and translating meaningful data into meaningful interpretation and decisions in medical research. Most existing research reporting guidelines do not provide guidance for reporting methods in the statistical analysis section that would help in evaluating the quality of findings and data interpretation. In this report, we highlight the global and critical steps to be reported in the statistical analysis of grants and research articles. Through several thematic frameworks, we clarify the importance of understanding study objective types, the data generation process, effect size use, the use of evidence-based biostatistical methods, and the development of statistical models. We also provide published examples of adherence or non-adherence to methodological standards related to each step in the statistical analysis, together with their implications. We believe the suggestions provided in this report can have far-reaching implications for education and for strengthening the quality of statistical reporting and biostatistical practice in medical research.
Keywords: Research; Biostatistics; Biomedical Research; Education, Medical; Medicine
Introduction
Biostatistics is the overall approach to how we realistically and feasibly execute a research idea to produce meaningful data and translate data into meaningful interpretation and decisions. In this era of evidence-based medicine and practice, basic biostatistical knowledge is essential for critically appraising research articles and implementing findings for better patient management, improved healthcare, and research planning. 1 However, such knowledge may not be sufficient for the proper execution and reporting of statistical analyses in studies. 2 3 Three things are required for statistical analyses: knowledge of the conceptual framework of variables, research design, and evidence-based application of statistical analysis with statistical software. 4 5 The conceptual framework provides possible biological and clinical pathways between independent variables and outcomes, with the role of each variable specified. The research design provides a protocol of study design and data generation process (DGP), whereas the evidence-based statistical analysis approach provides guidance for selecting and implementing approaches after evaluating data with the research design. 2 5 Ocaña-Riola 6 reported that a substantial percentage of articles from high-impact medical journals contained errors in statistical analysis or data interpretation. These errors in statistical analyses and interpretation of results not only affect the reliability of research findings but also influence medical decision-making and the planning and execution of other related studies. A survey of consulting biostatisticians in the USA reported that researchers frequently ask biostatisticians to perform inappropriate statistical analyses and to report data inappropriately. 7 This implies a need to enforce standardized reporting of the statistical analysis section in medical research, which can also help reviewers and investigators to improve the methodological standards of the study.
Biostatistical practice in medicine has been improving over the years due to continuous efforts in promoting awareness and involving expert services on biostatistics, epidemiology, and research design in clinical and translational research. 8–11 Despite these efforts, the quality of reporting of statistical analysis in research studies has often been suboptimal. 12 13 We noticed that none of the existing methods reporting documents was developed using evidence-based biostatistics (EBB) theory and practice. EBB practice implies that the selection of statistical analysis methods and the steps of results reporting and interpretation should be grounded in the evidence generated in the scientific literature and matched to the study objective type and design. 5 Previous works have not properly elucidated the importance of understanding EBB concepts and related reporting in the write-up of statistical analyses. As a result, reviewers sometimes ask authors to present data or execute analyses that do not match the study objective type. 14 We summarize the statistical analysis steps to be reported in the statistical analysis section based on a review and thematic frameworks.
We identified articles describing statistical reporting problems in medicine using different search terms ( online supplemental table 1 ). Based on these studies, we prioritized commonly reported statistical errors in analytical strategies and developed essential components to be reported in the statistical analysis section of research grants and studies. We also clarified the purpose and the overall implication of reporting each step in statistical analyses through various examples.
Although biostatistical inputs are critical for the entire research study ( online supplemental table 2 ), biostatistical consultations were mostly used for statistical analyses only. 15 Yet statistical analyses that did not match the study objective and DGP were identified as the major problem in articles submitted to high-impact medical journals. 16 In addition, multivariable analyses were often inappropriately conducted and reported in published studies. 17 18 In light of these statistical errors, we describe the reporting of the following components in the statistical analysis section of the study.
Step 1: specify study objective type and outcomes (overall approach)
The study objective type determines the role of important variables for a specified outcome in statistical analyses and the overall approach to model building and model reporting in a study. In the statistical framework, problems are classified into descriptive and inferential/analytical/confirmatory objectives. In the epidemiological framework, analytical and prognostic problems are broadly classified into association, explanatory, and predictive objectives. 19 These study objectives ( figure 1 ) may be classified into six categories in medical research: (1) exploratory, (2) association, (3) causal, (4) intervention, (5) prediction and (6) clinical decision models. 20
Comparative assessment of the development and reporting of study objective types and models. Association measures include odds ratio, risk ratio, or hazard ratio. AUC, area under the curve; C, confounder; CI, confidence interval; E, exposure; HbA1c, hemoglobin A1c; M, mediator; MFT, model fit test; MST, model specification test; PI, predictive interval; R², coefficient of determination; X, independent variable; Y, outcome.
The exploratory objective type is a specific type of determinant study and is commonly known as a risk factor or correlates study in medical research. In an exploratory study, all covariates are considered equally important for the outcome of interest. The goal of the exploratory study is to present the results of a model that gives higher accuracy after satisfying all model-related assumptions. In an association study, the investigator identifies predefined exposures of interest for the outcome; variables other than the exposures are also important for interpretation and are treated as covariates. The goal of an association study is to present the adjusted association of exposure with outcome. 20 In a causal objective study, the investigator is interested in determining the impact of exposure(s) on outcome using the conceptual framework. In this study objective, all variables should have a predefined role (exposures, confounders, mediators, covariates, and predictors) in a conceptual framework. A study with a causal objective is known as an explanatory or a confirmatory study in medical research. The goal is to present the direct or indirect effects of exposure(s) on an outcome after assessing the model’s fitness in the conceptual framework. 19 21 The objective of an interventional study is to determine the effect of an intervention on outcomes; such studies are often known as randomized or non-randomized clinical trials in medical research. In the intervention objective model, all variables other than the intervention are treated as nuisance variables for primary analyses. The goal is to present the direct effect of the intervention on the outcomes by eliminating biases. 22–24 In a predictive study, the goal is to determine an optimum set of variables that can predict the outcome, particularly in external settings.
Clinical decision models are a special case of prognostic models in which high-dimensional data at various levels are used for risk stratification, classification, and prediction. In this model, all variables are considered input features. The goal is to present a decision tool that has high accuracy in training, testing, and validation data sets. 20 25 Biostatisticians or applied researchers should clearly establish the intended study objective type before proceeding with statistical analyses. In addition, it is a good idea to prepare a conceptual model framework, regardless of study objective type, to understand the study concepts.
A study 26 showed a favorable effect of the beta-blocker intervention on survival outcome in patients with advanced human epidermal growth factor receptor 2 (HER2)-negative breast cancer without adjusting for all the potential confounding effects (age or menopausal status and Eastern Cooperative Oncology Group performance status) in primary or validation analyses, and without using a propensity score-adjusted analysis, which is an EBB-preferred method for analyzing non-randomized studies. 27 Similarly, another study aimed to develop a predictive model for the prediction of Alzheimer’s disease progression. 28 However, this study did not internally or externally validate the performance of the model, as required for a predictive objective study. In another study, 29 investigators were interested in determining an association between metabolic syndrome and hepatitis C virus. However, the authors did not clearly specify the outcome in the analysis and produced conflicting associations with different analyses. 30 Thus, the outcome should be clearly specified according to the study objective type.
Step 2: specify effect size measure according to study design (interpretation and practical value)
The study design provides information on the selection of study participants and the process of data collection conditioned on either exposure or outcome ( figure 2 ). The appropriate effect size measure, the tabular presentation of results, and the level of evidence are mostly determined by the study design. 31 32 In cohort or clinical trial designs, participants are selected based on exposure status and are followed up for the development of the outcome. These designs can provide multiple outcomes and produce incidence or incidence density, and are preferably analyzed with risk ratio (RR) or hazards models. In a case–control study, the selection of participants is conditioned on outcome status. This type of study can have only one outcome and is preferably analyzed with an odds ratio (OR) model. In a cross-sectional design, there is no selection restriction on outcomes or exposures. All data are collected simultaneously and can be analyzed with a prevalence ratio model, which is mathematically equivalent to the RR model. 33 The reporting of the effect size measure also depends on the study objective type. For example, predictive models typically require reporting of regression coefficients or weights of variables in the model instead of the association measures required in other objective types. There are agreements and disagreements between OR and RR measures. Owing to the constancy and symmetry properties of OR, some researchers prefer to use OR in studies with common events. By contrast, the collapsibility and interpretability properties of RR make it more appealing in studies with common events. 34 To avoid variable practice and interpretation issues with OR, it is recommended to use RR models in all studies except case–control and nested case–control studies, where OR approximates RR and OR models should therefore be used. Otherwise, investigators may report sufficient data to compute any ratio measure.
Biostatisticians should educate investigators on the proper interpretation of ratio measures in the light of study design and their reporting. 34 35
Effect size according to study design.
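A minimal sketch of the OR-versus-RR point above, assuming a simple cohort-style 2x2 table. The function names and counts are hypothetical and chosen only for illustration. In both scenarios the risk in the exposed group is double the risk in the unexposed group; with a rare outcome the OR nearly coincides with the RR, whereas with a common outcome the OR moves well away from it.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk (or prevalence) ratio from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from the same 2x2 layout: (a/b) / (c/d)."""
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical rare outcome (1% vs 0.5%): RR = 2.0, OR about 2.01.
    print(risk_ratio(10, 990, 5, 995), odds_ratio(10, 990, 5, 995))
    # Hypothetical common outcome (40% vs 20%): RR = 2.0, OR about 2.67.
    print(risk_ratio(40, 60, 20, 80), odds_ratio(40, 60, 20, 80))
```

Applied to cross-sectional counts, the same `risk_ratio` formula gives the prevalence ratio.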
Investigators sometimes either inappropriately label their study design 36 37 or report effect size measures not aligned with the study design, 38 39 leading to difficulty in results interpretation and evaluation of the level of evidence. The proper labeling of study design and the appropriate use of effect size measure have substantial implications for results interpretation, including the conduct of systematic review and meta-analysis. 40 A study 31 reviewed the frequency of reporting OR instead of RR in cohort studies and randomized clinical trials (RCTs) and found that one-third of the cohort studies used an OR model, whereas 5% of RCTs used an OR model. The majority of estimated ORs from these studies had a 20% or higher deviation from the corresponding RR.
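To show why ORs from cohort studies can deviate from the RR by 20% or more, the sketch below fixes a hypothetical true RR of 2 and varies the baseline (unexposed) risk. It also includes the algebraic back-conversion from OR to RR given the baseline risk. That conversion is exact for a crude 2x2 table but only approximate when applied to adjusted ORs. All inputs are illustrative assumptions.

```python
def or_from_risks(p_exposed: float, p_unexposed: float) -> float:
    """Odds ratio implied by the risks in the exposed and unexposed groups."""
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


def percent_deviation(or_value: float, rr_value: float) -> float:
    """How far the OR departs from the RR, as a percentage of the RR."""
    return 100 * (or_value - rr_value) / rr_value


def rr_from_or(or_value: float, p_unexposed: float) -> float:
    """Convert an OR back to an RR given the risk in the unexposed group.

    Exact for a crude 2x2 table; only an approximation for adjusted ORs.
    """
    return or_value / ((1 - p_unexposed) + p_unexposed * or_value)


if __name__ == "__main__":
    true_rr = 2.0  # hypothetical
    for baseline in (0.01, 0.05, 0.10, 0.20):
        implied_or = or_from_risks(true_rr * baseline, baseline)
        print(f"baseline risk {baseline:.2f}: OR = {implied_or:.2f}, "
              f"deviation = {percent_deviation(implied_or, true_rr):.1f}%")
```

With these assumed inputs, the deviation stays near 1% at a 1% baseline risk but exceeds 20% once the baseline risk reaches 20%.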
Step 3: specify study hypothesis, reporting of p values, and interval estimates (interpretation and decision)
The clinical hypothesis provides information for evaluating formal claims specified in the study objectives, while the statistical hypothesis provides information about the population parameters/statistics being used to test the formal claims. The inference about the study hypothesis is typically measured by the p value and confidence interval (CI). A smaller p value indicates stronger evidence in the data against the null hypothesis. Since the p value is a conditional probability, it cannot, on its own, tell us whether the null hypothesis should be accepted or rejected. Therefore, multiple alternative strategies to p values have been proposed to strengthen the credibility of conclusions. 41 42 Adoption of these alternative strategies is only needed in explanatory objective studies. Although exact p values are recommended to be reported in research studies, p values do not provide any information about the effect size. In contrast, the CI provides a range of plausible values for the effect size: if the study were repeated many times, about 95% of the 95% CIs so constructed would contain the true effect size. The CI can also be used to determine whether the results are statistically significant. 43 Both the p value and the 95% CI provide complementary information and thus need to be specified in the statistical analysis section. 24 44
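To illustrate reporting an effect size together with an interval estimate and an exact p value, here is a sketch of the standard large-sample (Wald) 95% CI for a risk ratio, computed on the log scale, along with the corresponding two-sided p value. The counts are hypothetical, and the normal approximation can be poor with very small cell counts.

```python
import math
from statistics import NormalDist


def rr_with_ci(a: int, b: int, c: int, d: int,
               level: float = 0.95) -> tuple[float, tuple[float, float], float]:
    """Risk ratio, Wald CI on the log scale, and two-sided p value.

    Layout: a/b = exposed with/without outcome, c/d = unexposed with/without.
    Large-sample normal approximation; not suitable for very small counts.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    lower = math.exp(math.log(rr) - z_crit * se_log_rr)
    upper = math.exp(math.log(rr) + z_crit * se_log_rr)
    z = math.log(rr) / se_log_rr  # H0: RR = 1, i.e. log RR = 0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return rr, (lower, upper), p_two_sided


if __name__ == "__main__":
    rr, (lo, hi), p = rr_with_ci(40, 60, 20, 80)  # hypothetical counts
    print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}, p = {p:.3f}")
```

Reporting all three quantities together lets readers judge both the size and the precision of the effect, not only whether it crossed a significance threshold.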
Researchers often test one or more comparisons or hypotheses. Accordingly, the side and the level of significance for considering results to be statistically significant may change. Furthermore, studies may include more than one primary outcome that requires an adjustment in the level of significance for multiplicity. All studies should provide the interval estimate of the effect size/regression coefficient in the primary analyses. Since the interpretation of data analysis depends on the study hypothesis, researchers are required to specify the level of significance along with the side (one-sided or two-sided) of the p value in the test for considering statistically significant results, adjustment of the level of significance due to multiple comparisons or multiplicity, and reporting of interval estimates of the effect size in the statistical analysis section. 45
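Here is a short sketch of two decisions the statistical analysis section should state: the sidedness of the test and the adjustment for multiple comparisons. For a symmetric test statistic, doubling a one-sided p value gives the two-sided one, provided the observed effect lies in the hypothesized direction. Holm's step-down procedure is shown as one familywise error rate adjustment; it is always at least as powerful as the Bonferroni correction. The p values are hypothetical, and the function names are illustrative rather than taken from a particular package.

```python
def two_sided_from_one_sided(p_one_sided: float) -> float:
    """Double a one-sided p value for a symmetric test statistic.

    Assumes the observed effect lies in the direction the one-sided test favoured.
    """
    return min(1.0, 2 * p_one_sided)


def holm_adjust(pvalues: list[float]) -> list[float]:
    """Holm step-down adjusted p values (controls the familywise error rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p value (k = rank + 1) is multiplied by m - rank.
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    print(two_sided_from_one_sided(0.04))   # a one-sided 0.04 becomes 0.08
    print(holm_adjust([0.01, 0.04, 0.03]))  # three hypothetical comparisons
```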
A study 46 showed a significant effect of fluoxetine on relapse rates in obsessive-compulsive disorder based on a one-sided p value of 0.04. Clearly, there was no reason for using a one-sided p value as opposed to a two-sided p value. A review of the appropriate use of multiple test correction methods in multiarm clinical trials published in major medical journals in 2012 found that over 50% of the articles did not perform multiple-testing correction. 47 Similar to controlling the familywise error rate due to multiple comparisons, adjusting the false discovery rate is also critical in studies involving multiple related outcomes. A review of RCTs for depression from six journals between 2007 and 2008 reported that only a few studies (5.8%) accounted for multiplicity in the analyses due to multiple outcomes. 48
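For studies with multiple related outcomes, false discovery rate control is usually achieved with the Benjamini-Hochberg step-up procedure. The sketch below computes BH-adjusted p values in plain Python. It is an illustrative implementation with hypothetical p values; in practice, an established routine such as R's `p.adjust(method = "BH")` would normally be used.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p values (controls the false discovery rate)."""
    m = len(pvalues)
    # Walk from the largest p value down so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p values for four related outcomes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

With these inputs, the adjusted values are 0.04, about 0.053, about 0.053, and 0.20, so only the first outcome stays below 0.05 at a 5% false discovery rate.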
Step 4: account for DGP in the statistical analysis (accuracy)
The study design also requires specifying how participants are selected and how outcomes are measured in different design settings. We refer to this specific design feature as the DGP. Understanding the DGP helps in determining appropriate modeling of the outcome distribution in statistical analyses and in setting up model premises and units of analysis. 4 The DGP ( figure 3 ) involves information on data generation and data measures, including the number of measurements after random selection, complex selection, consecutive selection, pragmatic selection, or systematic selection. Specifically, the DGP depends on the setting: a sampling setting (participants are selected using survey sampling methods, and one subject may represent multiple participants in the population), a clustered setting (participants are clustered through a recruitment setting, a hierarchical setting, or multiple hospitals), a pragmatic setting (participants are selected through mixed approaches), or a systematic review setting (participants are selected from published studies). The DGP also depends on the measurement of outcomes in an unpaired setting (measured on one occasion only in independent groups), a paired setting (measured on more than one occasion, or participants are matched on certain subject characteristics), or a mixed setting (measured on more than one occasion but with interest in comparing independent groups). It also involves information on how outcomes or exposures are generated: as quantitative or categorical variables, as quantitative values from labs or validated instruments, or from self-reported or administered tests, yielding a variety of data distributions, including individual distributions, mixed-type distributions, mixed distributions, and latent distributions.
Due to different DGPs, study data may include messy or missing data, incomplete/partial measurements, time-varying measurements, surrogate measures, latent measures, imbalances, unknown confounders, instrumental variables, correlated responses, various levels of clustering, qualitative data or mixed data outcomes, competing events, individual and higher-level variables, etc. The performance of statistical analysis, the appropriate estimation of standard errors of estimates and the subsequent computation of p values, the generalizability of findings, and the graphical display of data all rely on the DGP. Accounting for the DGP in the analyses requires proper communication between investigators and biostatisticians about each aspect of participant selection and data collection, including measurements, occasions of measurements, and instruments used in the research study.
Common features of the data generation process.
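One consequence of a clustered DGP can be quantified with the design effect for equal-sized clusters, 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intraclass correlation. Ignoring clustering overstates the information in the data, so standard errors come out too small by roughly the square root of the design effect. The sketch below assumes equal cluster sizes and a hypothetical ICC; a real analysis would model the clustering directly, for example with mixed models or generalized estimating equations.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total: int, cluster_size: float, icc: float) -> float:
    """Approximate number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical: 20 hospitals x 30 patients, within-hospital ICC of 0.05.
    deff = design_effect(30, 0.05)
    print(f"design effect = {deff:.2f}")
    print(f"effective n = {effective_sample_size(600, 30, 0.05):.0f} of 600")
```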
A study 49 compared the intake of fresh fruit and komatsuna juice with the intake of commercial vegetable juice on metabolic parameters in middle-aged men using an RCT. The study was criticized for many reasons, but primarily for using incorrect statistical methods not aligned with the study DGP. 50 Similarly, another study 51 highlighted that 80% of published studies using the Korean National Health and Nutrition Examination Survey did not incorporate the survey sampling structure in statistical analyses, producing biased estimates and inappropriate findings. Likewise, another study 52 highlighted the need for maintaining methodological standards while analyzing data from the National Inpatient Sample. A systematic review 53 of the top 25% of physiology journals found that over 50% of studies did not specify whether a paired t-test or an unpaired t-test was performed, indicating poor transparency in reporting statistical analysis according to the data type. Another study 54 also highlighted data display errors not aligned with the DGP. According to the DGP, delay in treatment initiation among patients with cancer, defined from symptom onset to treatment initiation, should be analyzed as three components: patient/primary delay, secondary delay, and tertiary delay. 55 Similarly, the number of cancerous nodes should be analyzed with count data models. 56 However, several studies did not analyze such data according to the DGP. 57 58
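The paired-versus-unpaired distinction can be seen in a toy example: the same before/after measurements give very different t statistics depending on whether the within-person pairing is respected. The data are hypothetical, and only the t statistics are computed. P values would come from a t distribution with the appropriate degrees of freedom.

```python
import math
import statistics


def paired_t(before: list[float], after: list[float]) -> float:
    """Paired t statistic: mean within-person difference over its standard error."""
    diffs = [b - a for b, a in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(x: list[float], y: list[float]) -> float:
    """Unpaired (Welch) t statistic, treating x and y as independent groups.

    Inappropriate for paired data; included only for comparison.
    """
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


if __name__ == "__main__":
    # Hypothetical systolic blood pressure in the same five patients, before and after.
    before = [140, 152, 138, 160, 145]
    after = [135, 147, 136, 151, 141]
    print(f"paired t   = {paired_t(before, after):.2f} (df = 4)")
    print(f"unpaired t = {welch_t(before, after):.2f}")
```

With these numbers, the paired t statistic is about 4.39 on 4 degrees of freedom, while the unpaired statistic is about 0.98. Between-patient variation swamps the consistent within-patient drop once the pairing is ignored.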
Step 5: apply EBB methods specific to study design features and DGP (efficiency and robustness)
The continuous development of robust statistical methods for specific problems has produced various methods for analyzing specific data types. Because multiple methods with varying performance are available for a given problem, heterogeneous practices among applied researchers have been noticed. Variable practices could also be due to a lack of consensus on statistical methods in the literature, unawareness, and the unavailability of standardized statistical guidelines. 2 5 59 However, it is sometimes difficult to tell whether a specific method was used because of its robustness, lack of awareness, lack of access to statistical software for an appropriate alternative method, an intention to produce expected results, or neglect of model diagnostics. To avoid heterogeneous practices, statistical methods should be selected and reported at each stage of data analysis according to EBB practice. 5 Since it is hard for applied researchers to optimally select statistical methodology at each step, we encourage investigators to involve biostatisticians at the very early stage in basic, clinical, population, translational, and database research. We also appeal to biostatisticians to develop guidelines, checklists, and educational tools to promote the concept of EBB. As an effort in this direction, we developed the statistical analysis and methods in biomedical research (SAMBR) guidelines for applied researchers to use EBB methods for data analysis. 5 EBB practice is essential for applying recent cutting-edge robust methodologies to yield accurate and unbiased results. The efficiency of statistical methodologies depends on the assumptions and the DGP. Therefore, investigators may attempt to specify the choice of specific models in the primary analysis according to EBB.
Although details of evidence-based preferred methods are provided in the SAMBR checklists for each study design/objective, 5 we have presented a simplified version of evidence-based preferred methods for common statistical analysis ( online supplemental table 3 ). Several examples are available in the literature where inefficient methods not according to EBB practice have been used. 31 57 60
Step 6: report variable selection method in the multivariable analysis according to study objective type (unbiased)
Multivariable analysis can be used for association, prediction, classification or risk stratification, adjustment, propensity score development, and effect size estimation. 61 Some biological, clinical, behavioral, and environmental factors may be directly associated with the outcome or may influence the relationship between exposure and outcome. Therefore, almost all health studies require multivariable analyses for accurate and unbiased interpretation of findings ( figure 1 ). Analysts should develop an adjusted model if the sample size permits. It is a misconception that the analysis of an RCT does not require adjusted analysis; analysis of an RCT may require adjustment for prognostic variables. 23 The foremost step in model building is the entry of variables after finalizing the appropriate parametric or non-parametric regression model. In exploratory model building, where no exposure is prioritized, a backward automated approach after including any variables that are significant at the 25% level in the unadjusted analysis can be used for variable selection. 62 63 In the association model, covariates selected manually based on their relevance should be included in a fully adjusted model. 63 In a causal model, clinically guided methods should be used for variable selection and adjustment. 20 In a non-randomized interventional model, efforts should be made to eliminate confounding effects through propensity score methods, and the final propensity score-adjusted multivariable model may adjust for any prognostic variables, while a randomized study should simply adjust for any prognostic variables. 27 Maintaining an adequate number of events per variable (EPV) is important to avoid overfitting in any type of modeling; therefore, screening of variables may be required in some association and explanatory studies, which may be accomplished using a backward stepwise method that needs to be clarified in the statistical analyses.
10 In a predictive study, a model with an optimum set of variables producing the highest accuracy should be used. The optimum set of variables may be screened with the random forest method or bootstrap or machine learning methods. 64 65 Different methods of variable selection and adjustments may lead to different results. The screening process of variables and their adjustments in the final multivariable model should be clearly mentioned in the statistical analysis section.
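The exploratory screening step and the events-per-variable check can be sketched as follows. This covers only the initial screen (keeping variables with an unadjusted p value below 0.25) and an EPV calculation. It does not perform the subsequent backward elimination, which requires refitting a regression model. The threshold of 10 events per variable is a widely used rule of thumb rather than a fixed requirement, and all variable names, p values, and event counts are hypothetical.

```python
def screen_candidates(univariable_p: dict[str, float], threshold: float = 0.25) -> list[str]:
    """Keep candidate covariates whose unadjusted p value is below the threshold."""
    return [name for name, p in univariable_p.items() if p < threshold]


def events_per_variable(n_events: int, n_parameters: int) -> float:
    """Events in the less frequent outcome category per estimated model parameter."""
    return n_events / n_parameters


if __name__ == "__main__":
    # Hypothetical unadjusted p values from single-variable models.
    univariable_p = {"age": 0.01, "sex": 0.40, "bmi": 0.12, "smoking": 0.20, "diabetes": 0.60}
    candidates = screen_candidates(univariable_p)
    print(candidates)  # ['age', 'bmi', 'smoking']
    n_events = 45  # hypothetical number of outcome events
    # Assumes each candidate uses one parameter; a k-level categorical needs k - 1.
    epv = events_per_variable(n_events, len(candidates))
    print(f"EPV = {epv:.1f}; meets the 10-EPV rule of thumb: {epv >= 10}")
```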
A study 66 evaluating the effect of hydroxychloroquine (HDQ) showed unfavorable events (intubation or death) in patients who received HDQ compared with those who did not (hazard ratio (HR): 2.37, 95% CI 1.84 to 3.02) in an unadjusted analysis. However, the propensity score-adjusted analyses, as appropriate for the interventional objective model, showed no significant association between HDQ use and unfavorable events (HR: 1.04, 95% CI 0.82 to 1.32), which was also confirmed in multivariable and other propensity score-adjusted analyses. This study clearly suggests that, in observational studies, results should be interpreted on the basis of a multivariable analysis whenever feasible. A recent study 10 noted that approximately 6% of multivariable analyses based on either logistic or Cox regression in medical research used an inappropriate method of variable selection. This practice was more common in studies that did not involve an expert biostatistician. Another review 61 of 316 articles from high-impact Chinese medical journals revealed that 30.7% of articles did not report the selection of variables in multivariable models. Indeed, this inappropriate practice might have been identified even more often had the articles been classified according to study objective type. 18 In RCTs, it is uncommon to report an adjusted analysis based on prognostic variables, even though an adjusted analysis may produce a more efficient estimate than an unadjusted analysis. A study assessing the effect of a preemptive intervention on developmental outcomes showed a significant effect of the intervention in reducing autism spectrum disorder symptoms. 67 However, this study was criticized by Ware 68 for not reporting the non-significant results of the unadjusted analyses. If possible, unadjusted estimates should also be reported in any study, particularly in RCTs. 23 68
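The gap between unadjusted and adjusted estimates described above can be reproduced with a toy confounding example. The sketch uses Mantel-Haenszel stratification on a single confounder (disease severity). This is a simpler adjustment than the propensity score methods discussed in the text, chosen only because it fits in a few lines. The counts are hypothetical.

```python
Stratum = tuple[int, int, int, int]  # (treated event, treated no event, untreated event, untreated no event)


def crude_rr(strata: list[Stratum]) -> float:
    """Risk ratio after pooling all strata into one 2x2 table (no adjustment)."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a / (a + b)) / (c / (c + d))


def mantel_haenszel_rr(strata: list[Stratum]) -> float:
    """Mantel-Haenszel risk ratio across strata of a confounder."""
    numerator = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical: sicker patients are both more often treated and more often harmed.
    severe = (30, 70, 6, 14)  # risk 0.30 in both arms
    mild = (2, 18, 10, 90)    # risk 0.10 in both arms
    strata = [severe, mild]
    print(f"crude RR = {crude_rr(strata):.2f}")
    print(f"MH-adjusted RR = {mantel_haenszel_rr(strata):.2f}")
```

Here the crude RR is 2.00, yet the risks are identical in both arms within each severity stratum, so the Mantel-Haenszel RR is 1.00.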
Step 7: provide evidence for exploring effect modifiers (applicability)
Any variable that modifies the effect of an exposure on the outcome is called an effect modifier or interacting variable. Exploring effect modifiers in multivariable analyses helps in (1) determining the applicability/generalizability of findings in the overall population or in a specific subpopulation, (2) generating new hypotheses, (3) explaining otherwise uninterpretable differences between unadjusted and adjusted analyses, (4) deciding whether to present a combined model or separate models for each subpopulation, and (5) explaining heterogeneity in treatment effect. Investigators often present adjusted results stratified by the presence or absence of an effect modifier. If the exposure interacts statistically or conceptually with multiple variables in the model, stratified (subgroup) findings according to each effect modifier may be presented. Otherwise, stratified analysis is less attractive: it substantially reduces the power of the study because of the smaller sample size in each stratum, and it may produce spurious significant results by inflating the type I error. 69 Therefore, in the presence of an effect modifier, a multivariable analysis including an interaction term may be presented instead of a stratified analysis. 70 Sometimes, a quantitative variable may emerge as a potential modifier of an exposure–outcome relationship. In such a situation, the quantitative variable should not be categorized unless a clinically meaningful threshold is available. In fact, categorizing quantitative variables should be avoided in the analysis unless a clinically meaningful cut-off is available or a hypothesis requires it. 71 In an exploratory objective type, any possible interaction may be explored; however, interpretation should be guided by clinical implications. 
Similarly, some objective models may include more than one exposure or intervention; in that case, the association of each exposure at each level of the other exposure should be presented through adjusted analyses, as suggested above for interaction effects. 70
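A single model with an interaction term, rather than separate models per stratum, can be sketched as follows. The data are hypothetical and noise-free so that the fitted coefficients can be checked by eye; ordinary least squares via NumPy stands in for whatever regression model a study actually uses, and the same idea carries over to logistic or Cox models with a product term. The stratum-specific exposure effects are derived from the coefficients of one model fitted on the full sample.

```python
import numpy as np

# Hypothetical, noise-free data so the fitted coefficients are easy to check.
rng = np.random.default_rng(0)
n = 200
exposure = rng.normal(size=n)          # continuous exposure
modifier = rng.integers(0, 2, size=n)  # binary effect modifier (0/1)

# True model: the exposure slope is 2.0 when modifier = 0 and 2.0 + 1.5 = 3.5 when modifier = 1
outcome = 1.0 + 2.0 * exposure + 0.5 * modifier + 1.5 * exposure * modifier

# Design matrix for ONE model with an interaction (product) term, fitted on the full sample
X = np.column_stack([np.ones(n), exposure, modifier, exposure * modifier])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
b0, b_exp, b_mod, b_int = coef

print(f"exposure effect when modifier = 0: {b_exp:.2f}")
print(f"exposure effect when modifier = 1: {b_exp + b_int:.2f}")
print(f"interaction coefficient: {b_int:.2f}")
```

With real (noisy) data, the interaction coefficient would be reported with its CI and test, and the stratum-specific effects computed from the same fitted model.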
A review of 428 MEDLINE articles on the quality of reporting of three commonly used regression models (linear, logistic, and Cox) found that only 18.5% of the published articles reported interaction analyses, 17 even though such analyses can provide substantial useful information.
Step 8: assessment of assumptions, specifically the distribution of outcome, linearity, multicollinearity, sparsity, and overfitting (reliability)
The assessment and reporting of model diagnostics are important for judging the efficiency, validity, and usefulness of a model. Model diagnostics include checking model-specific assumptions and assessing sparsity, linearity, the distribution of the outcome, multicollinearity, and overfitting. 61 72 Model-specific assumptions must be satisfied, such as normality of residuals, homoscedasticity (constant error variance), and independence of errors in linear regression; proportional hazards in Cox regression; proportional odds in ordinal logistic regression; and distributional fit in other types of continuous and count models. In addition, sparsity should be examined before selecting an appropriate model. Sparsity indicates many zero observations in the data set. 73 In the presence of sparsity, the effect size is difficult to interpret. Except for machine learning models, most parametric and semiparametric models assume a linear relationship between the independent variables and a function of the outcome (for example, the log odds in logistic regression). Linearity should be assessed using a multivariable polynomial in all model objectives. 62 Similarly, an appropriate choice of outcome distribution is required for model building in all study objective models. Multicollinearity assessment is also useful in all objective models. Assessment of EVR in multivariable analysis can be used to avoid overfitting a multivariable model. 18
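Multicollinearity is commonly screened with variance inflation factors (VIFs). The sketch below is a minimal NumPy implementation, assuming a numeric covariate matrix without an intercept column: it regresses each covariate on the others and computes VIF = 1 / (1 - R^2). The data are simulated so that x3 is almost the sum of x1 and x2; thresholds such as 5 or 10 are common rules of thumb rather than fixed standards.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2) from regressing that column on the others."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # include an intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)  # nearly a linear combination of x1 and x2
x4 = rng.normal(size=n)                   # unrelated covariate
X = np.column_stack([x1, x2, x3, x4])

for name, v in zip(["x1", "x2", "x3", "x4"], variance_inflation_factors(X)):
    print(f"{name}: VIF = {v:.1f}")
```

The three related covariates receive very large VIFs, while the unrelated x4 stays close to 1.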
Several reviews found that 73.8%–92% of articles published in MEDLINE had not assessed the diagnostics of their multivariable regression models. 17 61 72 In contrast to the monotonic, linearly increasing relationship between systolic blood pressure (SBP) and mortality established in the Framingham study, 74 Port et al 75 reported a non-linear relationship between SBP and all-cause or cardiovascular mortality by reanalyzing the Framingham data set. This study identified a different threshold for treating hypertension, illustrating the role of linearity assessment in multivariable models. Although a non-Gaussian distribution model may be required for modeling patient delay data in cancer, 55 one study analyzed patient delay data using an ordinary linear regression model. 57 An investigation of the development and reporting of predictive models in medical journals found that 53% of the articles had a lower EVR than recommended, indicating that over half of the published articles may have overfitted models. 18 Another study 76 attempted to identify anthropometric variables associated with non-insulin-dependent diabetes and found that none of them were significant after adjusting for waist circumference, age, and sex, indicating the presence of collinearity. One study described sparse data problems in published studies in detail, along with potential solutions. 73
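A simple way to see why linearity should be checked is to compare a fit with only a linear term against one that adds a squared (polynomial) term. The sketch below uses an invented U-shaped relation between SBP and a risk score, not the Framingham data. In practice, fractional polynomials or splines inside the actual regression model, compared formally (for example, with a likelihood ratio test), would be more appropriate.

```python
import numpy as np

# Hypothetical U-shaped association: the risk score is lowest near an SBP of 120 mmHg
sbp = np.linspace(90, 180, 50)
risk = 0.002 * (sbp - 120) ** 2 + 1.0


def residual_sum_of_squares(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return the RSS."""
    coefs = np.polyfit(x, y, degree)
    fitted = np.polyval(coefs, x)
    return float(np.sum((y - fitted) ** 2))


rss_linear = residual_sum_of_squares(sbp, risk, 1)
rss_quadratic = residual_sum_of_squares(sbp, risk, 2)
print(f"RSS, linear term only:      {rss_linear:.4f}")
print(f"RSS, adding a squared term: {rss_quadratic:.4f}")
```

The linear-only fit leaves a large residual error that the squared term removes, which is the signal that a straight-line assumption misrepresents the relationship.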
Step 9: report type of primary and sensitivity analyses (consistency)
Numerous considerations and assumptions are made throughout the research process that require assessment, evaluation, and validation. Some assumptions, executions, and errors made at the beginning of study data collection may not be fixable, 13 but additional information collected during the study and data processing, including the data distribution observed at the end of the study, may raise additional considerations that need to be verified in the statistical analyses. The consistency of research findings under modifications of the outcome or exposure definition, the study population, the handling of missing data, model-related assumptions, the variables and their forms, and accounting for adherence to protocol can be evaluated and reported using sensitivity analyses. 77 The purpose and type of supporting analyses need to be specified clearly in the statistical analysis section to differentiate the main findings from the supporting findings. Sensitivity analyses are different from secondary, interim, or subgroup analyses. 78 Analyses of secondary outcomes are often referred to as secondary analyses, analyses of an ongoing study are called interim analyses, and analyses according to groups defined by patient characteristics are known as subgroup analyses.
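One common sensitivity analysis for missing outcome data compares the primary complete-case estimate with best-case and worst-case imputations. The sketch below uses invented counts from a hypothetical two-arm trial in which the outcome is an unfavorable event; the function names are illustrative, and multiple imputation or other principled methods would usually be preferred in practice.

```python
# Hypothetical two-arm trial; the binary outcome is an UNFAVORABLE event.
# Each arm: (events observed, non-events observed, outcome missing)
treated = (30, 60, 10)
control = (40, 50, 10)


def risk_difference(t_events, t_n, c_events, c_n):
    """Risk of the event in the treated arm minus risk in the control arm."""
    return t_events / t_n - c_events / c_n


def complete_case(t, c):
    """Primary analysis sketch: drop participants with a missing outcome."""
    return risk_difference(t[0], t[0] + t[1], c[0], c[0] + c[1])


def extreme_case(t, c, treated_missing_are_events, control_missing_are_events):
    """Sensitivity analysis sketch: impute every missing outcome in an arm as event or non-event."""
    t_events = t[0] + (t[2] if treated_missing_are_events else 0)
    c_events = c[0] + (c[2] if control_missing_are_events else 0)
    return risk_difference(t_events, sum(t), c_events, sum(c))


print(f"Complete case:          {complete_case(treated, control):+.3f}")
# Best case for the treated arm: its missing outcomes are non-events, control's are events
print(f"Best case for treated:  {extreme_case(treated, control, False, True):+.3f}")
# Worst case for the treated arm: its missing outcomes are events, control's are non-events
print(f"Worst case for treated: {extreme_case(treated, control, True, False):+.3f}")
```

With these counts the complete-case risk difference is about -0.11, the best case gives -0.20, and the worst case gives 0.00, so a conclusion of benefit would not be robust to the missing outcomes.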
Almost all studies require some form of sensitivity analysis to validate their findings under different conditions. However, sensitivity analyses are underused in medical journals: only 18%–20.3% of studies reported any form of sensitivity analysis. 77 78 A review of nutritional trials in high-quality journals found that 17% of conclusions were inappropriately based on findings from sensitivity analyses rather than on the primary analyses. 77
Step 10: provide methods for summarizing, displaying, and interpreting data (transparency and usability)
Data presentation includes data summaries, data displays, and results from statistical model analyses. The primary purpose of the data summary is to describe the distribution of the outcome and other characteristics in the total sample and by primary exposure status or outcome status. Column-wise presentation according to exposure status should be preferred in all study designs, while row-wise presentation of the outcome should be preferred in all study designs except case–control studies. 24 32 Summary statistics should be chosen to convey as much information as possible about the data distribution, consistent with the DGP and variable type. The purpose of presenting results, primarily from regression analyses or other statistical models, is to convey their interpretation and implications. Results should be presented according to the study objective type. Accordingly, unadjusted and adjusted associations of each factor with the outcome may be reported in the determinant objective model, while unadjusted and adjusted effects of the primary exposure on the outcome may be reported in the explanatory objective model. In prognostic models, the final predictive model may be presented in a form that allows users to predict the outcome. In the exploratory objective model, the final multivariable model should be reported with R² or the area under the curve (AUC). In association and interventional models, internal validation through various sensitivity and validation analyses is critically important. In causal objective studies, the model with better fit indices (in terms of R² or AUC, Akaike information criterion, Bayesian information criterion, fit index, or root mean square error) should be finalized and reported. In the predictive objective type, model performance in terms of R² or AUC in both the training and validation data sets needs to be reported ( figure 1 ). 
20 21 Data displays serve multiple purposes: showing data distributions (bar diagrams, histograms, frequency polygons, or box plots); making comparisons (clustered bar diagrams, scatter dot plots, stacked bar diagrams, or Kaplan-Meier plots); assessing correlation or model fit (scatter plots or scatter-plot matrices); revealing clusters or patterns (heatmaps or line plots); showing the effect of predictors from fitted models (marginal effects plots, such as those produced by Stata's marginsplot); and comparing effect sizes from regression models (forest plots). Although the key purpose of data display is to highlight critical issues or findings in the study, displays should follow the DGP and variable types and should be user-friendly. 54 79 Data interpretation relies heavily on the effect size measure, along with the study design and specified hypotheses. Variables sometimes require standardization (for descriptive comparison of effect sizes among exposures or for interpreting small effect sizes), centering (for interpreting the intercept or reducing collinearity due to interaction terms), or transformation (for satisfying model-related assumptions). 80 Appropriate methods of data reporting and interpretation, aligned with the study design, study hypothesis, and effect size measure, should be specified in the statistical analysis section of research studies.
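Centering and standardization change how coefficients read without changing a model's predictions. The sketch below assumes a hypothetical fitted linear model, outcome = 50 + 0.8 × age, and re-expresses it for centered age and for age in SD units, using Python's standard-library statistics module; the ages and coefficients are invented.

```python
import statistics

# Hypothetical ages (years) and a hypothetical fitted model on the raw scale:
# outcome = 50 + 0.8 * age
ages = [62, 70, 75, 68, 81, 74]
intercept_raw, slope_raw = 50.0, 0.8

mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)  # sample standard deviation

# Centering (age_c = age - mean_age): the slope is unchanged, but the intercept now
# describes a participant of average age instead of an implausible age of 0.
intercept_centered = intercept_raw + slope_raw * mean_age

# Standardization (age_z = (age - mean_age) / sd_age): the slope now refers to a
# one-SD difference in age, putting exposures measured in different units on a common scale.
slope_per_sd = slope_raw * sd_age

for age in ages:
    raw = intercept_raw + slope_raw * age
    centered = intercept_centered + slope_raw * (age - mean_age)
    standardized = intercept_centered + slope_per_sd * (age - mean_age) / sd_age
    # All three parameterizations give the same prediction (up to floating-point rounding).
    print(f"age {age}: {raw:.3f} {centered:.3f} {standardized:.3f}")

print(f"intercept at mean age: {intercept_centered:.2f}; slope per SD of age: {slope_per_sd:.2f}")
```

Because the predictions are identical, the choice of scale is purely about which coefficients are easiest for readers to interpret, and it should be stated in the statistical analysis section.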
Published articles in reputed journals have inappropriately summarized a categorized variable with the mean and range, 81 summarized a highly skewed variable with the mean and standard deviation, 57 and treated a categorized variable as continuous in regression analyses. 82 Similarly, numerous examples of published studies with inappropriate graphical displays or interpretations not aligned with the DGP or variable types are illustrated in a book by Bland and Peacock. 83 84 One study collected qualitative MRI data but inappropriately presented them with a box-and-whisker plot. 81 Another study reported unusually high ORs for the association between high breast parenchymal enhancement and breast cancer in both premenopausal and postmenopausal women. 85 Such implausibly large estimates are suspicious and may reflect sparse data bias. 86 A poor tabular presentation, with no proper scaling or standardization of a variable, missing CIs for some variables, missing units and sample sizes, and inconsistent decimal places, can easily be seen in table 4 of a published study. 29 Some published predictive models 87 do not report the intercept or baseline survival estimates needed to apply them in clinical practice. Although direct comparison of effect sizes from the same model should be avoided when the variables have different units, 35 one study aimed to compare effect sizes across variables, yet the authors made the comparisons without standardizing the variables or using statistical tests. 88
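The problem with summarizing a highly skewed variable by its mean and SD is easy to demonstrate. The sketch below uses invented delay times, not data from the cited study, and Python's standard-library statistics module (statistics.quantiles requires Python 3.8 or later).

```python
import statistics

# Hypothetical, right-skewed patient delays in days: most are short, one is very long
delay_days = [3, 5, 6, 7, 8, 9, 10, 12, 14, 15, 20, 180]

mean = statistics.mean(delay_days)
sd = statistics.stdev(delay_days)
median = statistics.median(delay_days)
# Quartiles via the standard library's default ("exclusive") method; other software may differ slightly
q1, _, q3 = statistics.quantiles(delay_days, n=4)

print(f"mean (SD): {mean:.1f} ({sd:.1f})")
print(f"median (IQR): {median:.1f} ({q1:.1f} to {q3:.1f})")
# A mean minus one SD below zero signals that mean (SD) misdescribes this distribution
print(f"mean - 1 SD: {mean - sd:.1f}")
```

The single long delay pulls the mean far above the median of 9.5 days, and the mean minus one SD is negative, which is impossible for a delay; the median and IQR describe the typical patient better.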
A sample statistical analysis section for medical journals/research studies
Our primary study objective type was to develop a (select from figure 1 ) model to assess the relationship of risk factors (list critical variables or exposures) with outcomes (specify type from continuous/discrete/count/binary/polytomous/time-to-event). To address this objective, we conducted a (select from figure 2 or any other) study design to test the hypothesis of (equality or superiority or non-inferiority or equivalence or futility) or to develop a prediction model. Accordingly, the other variables were adjusted for or considered as (specify role of variables from confounders, covariates, or predictors or independent variables), as reflected in the conceptual framework. In the unadjusted or preliminary analyses, as per the (select from figure 3 or any other design features) DGP, (specify EBB preferred tests from online supplemental table 3 or any other appropriate tests) were used for (specify variables and types). According to the EBB practice for the outcome (specify type) and DGP of (select from figure 3 or any other), we used (select from online supplemental table 1 or specify a multivariable approach) as the primary model in the multivariable analysis. We used the (select from figure 1 ) variable selection method in the multivariable analysis and explored the interaction effects between (specify variables). Model diagnostics, including (list all applicable, including model-related assumptions, linearity, multicollinearity, overfitting, distribution of outcome, or sparsity), were assessed using (specify appropriate methods), respectively. In this exploration, we identified (specify diagnostic issues if any) and therefore developed the multivariable models using (specify potential methods used to handle diagnostic issues). The other outcomes were analyzed with (list names of multivariable approaches with respective outcomes). 
All models used the same procedure (or specify from figure 1 ) for variable selection, exploration of interaction effects, and model diagnostics using (specify statistical approaches), depending on the statistical model. As per the study design, hypothesis, and multivariable analysis, the results were summarized with effect sizes (select as appropriate or from figure 2 ) along with (specify 95% CI or other interval estimates) and considered statistically significant using (specify the side of p value or alternatives) at (specify the level of significance) because (provide reasons for choosing a significance level). We presented unadjusted and/or adjusted estimates of the primary outcome according to (list primary exposures or variables). Additional analyses were conducted for (specific reasons from step 9) using (specify methods) to validate the findings of the primary analyses. The data were summarized with (list summary measures and appropriate graphs from step 10), whereas the performance of the final multivariable model was summarized with (fit indices if applicable from step 10). We also used (list graphs), as appropriate to the DGP (specify from figure 3 ), to present the critical findings or highlight (specify data issues) using (list graphs/methods). The exposures or variables were used in (specify the form of the variables), and therefore the effect or association of (list exposures or variables) on the outcome should be interpreted in terms of changes in (specify interpretation unit) of the exposures/variables. List all other additional analyses, if performed (with full details of all models in a supplementary file, along with statistical code if possible).
Concluding remarks
We highlighted 10 essential steps to be reported in the statistical analysis section of any analytical study ( figure 4 ). Adherence to minimum reporting of these steps may encourage investigators to understand the underlying concepts and to consult biostatisticians in a timely manner, so that these concepts are applied in their studies and the methodological standards of grant proposals and research studies improve. The order of reporting information specified in this report is not mandatory; however, the analytical steps applicable to the specific study type should be clearly reported somewhere in the manuscript. Because the entire approach to statistical analysis depends on the study objective type and EBB practice, the proper execution and reporting of statistical models can be taught to the next generation of statisticians by study objective type in statistical education courses. In fact, some disciplines ( figure 5 ) are closely aligned with specific study objective types. Bioinformaticians are oriented toward determinant and prognostic models for precision medicine, while epidemiologists are oriented toward association and causal models, particularly in population-based observational and pragmatic settings. Data scientists are heavily involved in prediction and classification models in personalized medicine. Common to all these disciplines is the use of biostatistical principles and computational tools to address research questions. Sometimes, an expert in one discipline performs the work of another. 89 We strongly recommend a team science approach that includes an epidemiologist, biostatistician, data scientist, and bioinformatician, depending on the study objectives and needs. Clear reporting of data analyses according to the study objective type should be encouraged among all researchers to minimize heterogeneous practices and improve scientific quality and outcomes. 
In addition, we encourage investigators to strictly follow the transparent reporting and quality assessment guidelines appropriate to their study design ( https://www.equator-network.org/ ) to improve the overall quality of the study: for example, STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) for observational studies, CONSORT (Consolidated Standards of Reporting Trials) for clinical trials, STARD (Standards for Reporting Diagnostic Accuracy Studies) for diagnostic studies, TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis OR Diagnosis) for prediction modeling, and ARRIVE (Animal Research: Reporting of In Vivo Experiments) for preclinical studies. The steps provided in this document for writing the statistical analysis section are essentially different from those in other guidance documents, including SAMBR. 5 SAMBR provides guidance for selecting evidence-based preferred methods of statistical analysis according to different study designs, while this report recommends reporting essential information in the statistical analysis section according to the study objective type. Our suggestions in this guidance report pertain strictly to the reporting of methods in the statistical analysis section and their implications for the interpretation of results. This document does not provide guidance on reporting sample size, results, or the statistical analysis section of a meta-analysis. The examples and reviews reported in this study may be used to emphasize the concepts and related implications in medical research.
Figure 4. Summary of reporting steps, purpose, and evaluation measures in the statistical analysis section.
Figure 5. Role of interrelated disciplines according to study objective type.
Acknowledgments
The author would like to thank the reviewers for their careful review and insightful suggestions.
Contributors: AKD developed the concept and design and wrote the manuscript.
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests: AKD is a Journal of Investigative Medicine Editorial Board member. No other competing interests declared.
Provenance and peer review: Commissioned; externally peer reviewed.
Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
Data availability statement
Data sharing not applicable as no datasets generated and/or analyzed for this study.
Ethics statements
Patient consent for publication.
Not required.
- 1. Oster RA, Devick KL, Thurston SW, et al. Learning gaps among statistical competencies for clinical and translational science learners. J Clin Transl Sci 2020;5:e12. 10.1017/cts.2020.498
- 2. Sauerbrei W, Abrahamowicz M, Altman DG, et al. Strengthening analytical thinking for observational studies: the STRATOS initiative. Stat Med 2014;33:5413–32. 10.1002/sim.6265
- 3. Thiese MS, Arnold ZC, Walker SD. The misuse and abuse of statistics in biomedical research. Biochem Med 2015;25:5–11. 10.11613/BM.2015.001
- 4. Henley SS, Golden RM, Kashner TM. Statistical modeling methods: challenges and strategies. Biostatistics & Epidemiology 2020;4:105–39.
- 5. Dwivedi AK, Shukla R. Evidence‐based statistical analysis and methods in biomedical research (SAMBR) checklists according to design features. Cancer Rep 2020;3:e1211. 10.1002/cnr2.1211
- 6. Ocaña-Riola R. The use of statistics in health sciences: situation analysis and perspective. Stat Biosci 2016;8:204–19. 10.1007/s12561-015-9138-4
- 7. Wang MQ, Yan AF, Katz RV. Researcher requests for inappropriate analysis and reporting: a U.S. survey of consulting Biostatisticians. Ann Intern Med 2018;169:554–8. 10.7326/M18-1230
- 8. Zhang G, Chen JJ. Biostatistics faculty and NIH awards at U.S. medical schools. Am Stat 2015;69:34–40. 10.1080/00031305.2014.992959
- 9. Harrington D, D'Agostino RB, Gatsonis C, et al. New Guidelines for Statistical Reporting in the Journal. N Engl J Med 2019;381:285–6. 10.1056/NEJMe1906559
- 10. Nojima M, Tokunaga M, Nagamura F. Quantitative investigation of inappropriate regression model construction and the importance of medical statistics experts in observational medical research: a cross-sectional study. BMJ Open 2018;8:e021129. 10.1136/bmjopen-2017-021129
- 11. Ciolino JD, Spino C, Ambrosius WT, et al. Guidance for biostatisticians on their essential contributions to clinical and translational research protocol review. J Clin Transl Sci 2021;5:e161. 10.1017/cts.2021.814
- 12. Gosselin R-D. Insufficient transparency of statistical reporting in preclinical research: a scoping review. Sci Rep 2021;11:3335. 10.1038/s41598-021-83006-5
- 13. Brown AW, Kaiser KA, Allison DB. Issues with data and analyses: errors, underlying themes, and potential solutions. Proc Natl Acad Sci U S A 2018;115:2563–70. 10.1073/pnas.1708279115
- 14. Bacchetti P. Peer review of statistics in medical research: the other problem. BMJ 2002;324:1271–3. 10.1136/bmj.324.7348.1271
- 15. Sima AP, Rodriguez VA, Bradbrook KE, et al. Incorporating professional recommendations into a graduate-level statistical consulting laboratory: a case study. J Clin Transl Sci 2020;5:e62. 10.1017/cts.2020.527
- 16. Fernandes-Taylor S, Hyun JK, Reeder RN, et al. Common statistical and research design problems in manuscripts submitted to high-impact medical journals. BMC Res Notes 2011;4:304. 10.1186/1756-0500-4-304
- 17. Real J, Forné C, Roso-Llorach A, et al. Quality reporting of multivariable regression models in observational studies: review of a representative sample of articles published in biomedical journals. Medicine 2016;95:e3653. 10.1097/MD.0000000000003653
- 18. Bouwmeester W, Zuithoff NPA, Mallett S, et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med 2012;9:e1001221–12. 10.1371/journal.pmed.1001221
- 19. Shmueli G. To explain or to predict? Statistical Science 2010;25:289–310. 10.1214/10-STS330
- 20. Kent P, Cancelliere C, Boyle E, et al. A conceptual framework for prognostic research. BMC Med Res Methodol 2020;20:172. 10.1186/s12874-020-01050-7
- 21. Sainani KL. Explanatory versus predictive modeling. Pm R 2014;6:841–4. 10.1016/j.pmrj.2014.08.941
- 22. Baser O. Choosing propensity score matching over regression adjustment for causal inference: when, why and how it makes sense. J Med Econ 2007;10:379–91. 10.3111/13696990701646577
- 23. Kent DM, Trikalinos TA, Hill MD. Are unadjusted analyses of clinical trials inappropriately biased toward the null? Stroke 2009;40:672–3. 10.1161/STROKEAHA.108.532051
- 24. Cummings P, Rivara FP. Reporting statistical information in medical Journal articles. Arch Pediatr Adolesc Med 2003;157:321–4. 10.1001/archpedi.157.4.321
- 25. Luo W, Phung D, Tran T, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res 2016;18:e323. 10.2196/jmir.5870
- 26. Spera G, Fresco R, Fung H, et al. Beta blockers and improved progression-free survival in patients with advanced HER2 negative breast cancer: a retrospective analysis of the ROSE/TRIO-012 study. Ann Oncol 2017;28:1836–41. 10.1093/annonc/mdx264
- 27. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res 2011;46:399–424. 10.1080/00273171.2011.568786
- 28. Chen X, Zhou Y, Wang R, et al. Potential clinical value of multiparametric PET in the prediction of Alzheimer's disease progression. PLoS One 2016;11:e0154406. 10.1371/journal.pone.0154406
- 29. Shaheen M, Echeverry D, Oblad MG, et al. Hepatitis C, metabolic syndrome, and inflammatory markers: results from the Third National Health and Nutrition Examination Survey [NHANES III]. Diabetes Res Clin Pract 2007;75:320–6. 10.1016/j.diabres.2006.07.008
- 30. Rajkumar P, Dwivedi AK, Dodoo CA, et al. The association between metabolic syndrome and hepatitis C virus infection in the United States. Cancer Causes Control 2020;31:569–81. 10.1007/s10552-020-01300-5
- 31. Knol MJ, Le Cessie S, Algra A, et al. Overestimation of risk ratios by odds ratios in trials and cohort studies: alternatives to logistic regression. CMAJ 2012;184:895–9. 10.1503/cmaj.101715
- 32. Althouse AD, Raffa GM, Kormos RL. Your results, explained: clarity provided by row percentages versus column percentages. Ann Thorac Surg 2016;101:15–17. 10.1016/j.athoracsur.2015.09.015
- 33. Dwivedi AK, Mallawaarachchi I, Lee S, et al. Methods for estimating relative risk in studies of common binary outcomes. J Appl Stat 2014;41:484–500. 10.1080/02664763.2013.840772
- 34. Cummings P. The relative merits of risk ratios and odds ratios. Arch Pediatr Adolesc Med 2009;163:438–45. 10.1001/archpediatrics.2009.31
- 35. Davies HT, Crombie IK, Tavakoli M. When can odds ratios mislead? BMJ 1998;316:989–91. 10.1136/bmj.316.7136.989
- 36. Oh S, Chung J, Baek S, et al. Postoperative expressive aphasia associated with intravenous midazolam administration: a 5-year retrospective case-control study. J Int Med Res 2020;48:030006052094875. 10.1177/0300060520948751
- 37. Chen ML, Gupta A, Chatterjee A, et al. Association between unruptured intracranial aneurysms and downstream stroke. Stroke 2018;49:2029–33. 10.1161/STROKEAHA.118.021985
- 38. Sturdik I, Krajcovicova A, Jalali Y, et al. Pathophysiology and risk factors for cholelithiasis in patients with Crohn's disease. Physiol Res 2019;68:S173–82. 10.33549/physiolres.934302
- 39. Liao Y-T, Yang S-Y, Liu H-C, et al. Cardiac complications associated with short-term mortality in schizophrenia patients hospitalized for pneumonia: a nationwide case-control study. PLoS One 2013;8:e70142. 10.1371/journal.pone.0070142
- 40. Doi SA, Furuya-Kanamori L, Xu C, et al. Controversy and debate: questionable utility of the relative risk in clinical research: paper 1: a call for change to practice. J Clin Epidemiol 2022;142:271–9. 10.1016/j.jclinepi.2020.08.019
- 41. Halsey LG. The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum? Biol Lett 2019;15:20190174. 10.1098/rsbl.2019.0174
- 42. Wasserstein RL, Lazar NA. The ASA statement on p-Values: context, process, and purpose. The American Statistician 2016;70:129–33.
- 43. Page P. Beyond statistical significance: clinical interpretation of rehabilitation research literature. Int J Sports Phys Ther 2014;9:726–36.
- 44. Weinberg CR. It's time to rehabilitate the P-value. Epidemiology 2001;12:288–90. 10.1097/00001648-200105000-00004
- 45. Ou F-S, Le-Rademacher JG, Ballman KV, et al. Guidelines for statistical reporting in medical journals. J Thorac Oncol 2020;15:1722–6. 10.1016/j.jtho.2020.08.019
- 46. Romano S, Goodman W, Tamura R, et al. Long-Term treatment of obsessive-compulsive disorder after an acute response: a comparison of fluoxetine versus placebo. J Clin Psychopharmacol 2001;21:46–52. 10.1097/00004714-200102000-00009
- 47. Wason JMS, Stecher L, Mander AP. Correcting for multiple-testing in multi-arm trials: is it necessary and is it done? Trials 2014;15:364. 10.1186/1745-6215-15-364
- 48. Tyler KM, Normand S-LT, Horton NJ. The use and abuse of multiple outcomes in randomized controlled depression trials. Contemp Clin Trials 2011;32:299–304. 10.1016/j.cct.2010.12.007
- 49. Aiso I, Inoue H, Seiyama Y, et al. Compared with the intake of commercial vegetable juice, the intake of fresh fruit and komatsuna (Brassica rapa L. var. perviridis) juice mixture reduces serum cholesterol in middle-aged men: a randomized controlled pilot study. Lipids Health Dis 2014;13:102. 10.1186/1476-511X-13-102
- 50. Allison DB, Antoine LH, George BJ. Incorrect statistical method in parallel-groups RCT led to unsubstantiated conclusions. Lipids Health Dis 2016;15:77. 10.1186/s12944-016-0242-3
- 51. Kim Y, Park S, Kim N-S, et al. Inappropriate survey design analysis of the Korean National health and nutrition examination survey may produce biased results. J Prev Med Public Health 2013;46:96–104. 10.3961/jpmph.2013.46.2.96
- 52. Khera R, Angraal S, Couch T, et al. Adherence to methodological standards in research using the National inpatient sample. JAMA 2017;318:2011–8. 10.1001/jama.2017.17653
- 53. Weissgerber TL, Garcia-Valencia O, Garovic VD, et al. Why we need to report more than 'Data were Analyzed by t-tests or ANOVA'. Elife 2018;7. 10.7554/eLife.36163. [Epub ahead of print: 21 12 2018].
- 54. Weissgerber TL, Milic NM, Winham SJ, et al. Beyond bar and line graphs: time for a new data presentation paradigm. PLoS Biol 2015;13:e1002128. 10.1371/journal.pbio.1002128
- 55. Alok Kumar D, Nand DS, Suryanarayana D, et al. An epidemiological study on delay in treatment initiation of cancer patients. Health 2012;4. 10.4236/health.2012.42012 [ DOI ] [ Google Scholar ]
- 56. Dwivedi AK, Dwivedi SN, Deo S, et al. Statistical models for predicting number of involved nodes in breast cancer patients. Health 2010;2:641–51. 10.4236/health.2010.27098 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 57. Poum A, Promthet S, Duffy SW, et al. Factors associated with delayed diagnosis of breast cancer in northeast Thailand. J Epidemiol 2014;24:102–8. 10.2188/jea.JE20130090 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 58. Ravdin PM, De Laurentiis M, Vendely T, et al. Prediction of axillary lymph node status in breast cancer patients by use of prognostic indicators. J Natl Cancer Inst 1994;86:1771–5. 10.1093/jnci/86.23.1771 [ DOI ] [ PubMed ] [ Google Scholar ]
- 59. Evans RG, Su D-F. Data presentation and the use of statistical tests in biomedical journals: can we reach a consensus? Clin Exp Pharmacol Physiol 2011;38:285–6. 10.1111/j.1440-1681.2011.05508.x [ DOI ] [ PubMed ] [ Google Scholar ]
- 60. Baker D, Lidster K, Sottomayor A, et al. Two years later: journals are not yet enforcing the ARRIVE guidelines on reporting standards for pre-clinical animal studies. PLoS Biol 2014;12:e1001756. 10.1371/journal.pbio.1001756 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 61. Zhang Y-Y, Zhou X-B, Wang Q-Z, et al. Quality of reporting of multivariable logistic regression models in Chinese clinical medical journals. Medicine 2017;96:e6972. 10.1097/MD.0000000000006972 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 62. Sauerbrei W, Perperoglou A, Schmid M, et al. State of the art in selection of variables and functional forms in multivariable analysis-outstanding issues. Diagn Progn Res 2020;4:3. 10.1186/s41512-020-00074-3 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 63. Bursac Z, Gauss CH, Williams DK, et al. Purposeful selection of variables in logistic regression. Source Code Biol Med 2008;3:17. 10.1186/1751-0473-3-17 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 64. Austin PC. Bootstrap model selection had similar performance for selecting authentic and noise variables compared to backward variable elimination: a simulation study. J Clin Epidemiol 2008;61:1009–17. 10.1016/j.jclinepi.2007.11.014 [ DOI ] [ PubMed ] [ Google Scholar ]
- 65. Chen R-C, Dewi C, Huang S-W, et al. Selecting critical features for data classification based on machine learning methods. J Big Data 2020;7:52. 10.1186/s40537-020-00327-4 [ DOI ] [ Google Scholar ]
- 66. Geleris J, Sun Y, Platt J, et al. Observational study of hydroxychloroquine in hospitalized patients with Covid-19. N Engl J Med 2020;382:2411–8. 10.1056/NEJMoa2012410 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 67. Whitehouse AJO, Varcin KJ, Pillar S, et al. Effect of preemptive intervention on developmental outcomes among infants showing early signs of autism: a randomized clinical trial of outcomes to diagnosis. JAMA Pediatr 2021;175:e213298. 10.1001/jamapediatrics.2021.3298 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 68. Ware RS. Reporting both Unadjusted and adjusted estimates is essential to the interpretation of randomized clinical trial results. JAMA Pediatr 2022;176:325–6. 10.1001/jamapediatrics.2021.5544 [ DOI ] [ PubMed ] [ Google Scholar ]
- 69. Wang R, Ware JH. Detecting moderator effects using subgroup analyses. Prev Sci 2013;14:111–20. 10.1007/s11121-011-0221-x [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 70. Knol MJ, VanderWeele TJ. Recommendations for presenting analyses of effect modification and interaction. Int J Epidemiol 2012;41:514–20. 10.1093/ije/dyr218 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 71. Figueiras A, Domenech-Massons JM, Cadarso C. Regression models: calculating the confidence interval of effects in the presence of interactions. Stat Med 1998;17:2099–105. [ DOI ] [ PubMed ] [ Google Scholar ]
- 72. Ernst AF, Albers CJ. Regression assumptions in clinical psychology research practice-a systematic review of common misconceptions. PeerJ 2017;5:e3323. 10.7717/peerj.3323 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 73. Greenland S, Mansournia MA, Altman DG. Sparse data bias: a problem hiding in plain sight. BMJ 2016;352:i1981. 10.1136/bmj.i1981 [ DOI ] [ PubMed ] [ Google Scholar ]
- 74. Li C, Chen Y, Zheng Q, et al. Relationship between systolic blood pressure and all-cause mortality: a prospective study in a cohort of Chinese adults. BMC Public Health 2018;18:107. 10.1186/s12889-017-4965-5 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 75. Port S, Demer L, Jennrich R, et al. Systolic blood pressure and mortality. Lancet 2000;355:175–80. 10.1016/S0140-6736(99)07051-8 [ DOI ] [ PubMed ] [ Google Scholar ]
- 76. Wei M, Gaskill SP, Haffner SM, et al. Waist circumference as the best predictor of noninsulin dependent diabetes mellitus (NIDDM) compared to body mass index, waist/hip ratio and other anthropometric measurements in Mexican Americans--a 7-year prospective study. Obes Res 1997;5:16–23. 10.1002/j.1550-8528.1997.tb00278.x [ DOI ] [ PubMed ] [ Google Scholar ]
- 77. de Souza RJ, Eisen RB, Perera S, et al. Best (but oft-forgotten) practices: sensitivity analyses in randomized controlled trials. Am J Clin Nutr 2016;103:5–17. 10.3945/ajcn.115.121848 [ DOI ] [ PubMed ] [ Google Scholar ]
- 78. Thabane L, Mbuagbaw L, Zhang S, et al. A tutorial on sensitivity analyses in clinical trials: the what, why, when and how. BMC Med Res Methodol 2013;13:92. 10.1186/1471-2288-13-92 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 79. Kelleher C, Wagener T. Ten guidelines for effective data visualization in scientific publications. Environmental Modelling & Software 2011;26:822–7. 10.1016/j.envsoft.2010.12.006 [ DOI ] [ Google Scholar ]
- 80. Althouse AD, Below JE, Claggett BL, et al. Recommendations for statistical reporting in cardiovascular medicine: a special report from the American heart association. Circulation 2021;144:e70–91. 10.1161/CIRCULATIONAHA.121.055393 [ DOI ] [ PubMed ] [ Google Scholar ]
- 81. DeLeo MJ, Domchek SM, Kontos D, et al. Breast MRI fibroglandular volume and parenchymal enhancement in BRCA1 and BRCA2 mutation carriers before and immediately after risk-reducing salpingo-oophorectomy. AJR Am J Roentgenol 2015;204:669–73. 10.2214/AJR.13.12146 [ DOI ] [ PubMed ] [ Google Scholar ]
- 82. Dontchos BN, Rahbar H, Partridge SC, et al. Are qualitative assessments of background parenchymal enhancement, amount of Fibroglandular tissue on Mr images, and mammographic density associated with breast cancer risk? Radiology 2015;276:371–80. 10.1148/radiol.2015142304 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 83. Bland M, Peacock J. Statistical questions in evidence-based medicine, 2000. [ Google Scholar ]
- 84. Chen JC, Cooper RJ, McMullen ME, et al. Graph quality in top medical journals. Ann Emerg Med 2017;69:453–61. 10.1016/j.annemergmed.2016.08.463 [ DOI ] [ PubMed ] [ Google Scholar ]
- 85. Telegrafo M, Rella L, Stabile Ianora AA, et al. Breast MRI background parenchymal enhancement (BPE) correlates with the risk of breast cancer. Magn Reson Imaging 2016;34:173–6. 10.1016/j.mri.2015.10.014 [ DOI ] [ PubMed ] [ Google Scholar ]
- 86. Thompson CM, Mallawaarachchi I, Dwivedi DK, et al. The association of background parenchymal enhancement at breast MRI with breast cancer: a systematic review and meta-analysis. Radiology 2019;292:552–61. 10.1148/radiol.2019182441 [ DOI ] [ PubMed ] [ Google Scholar ]
- 87. Ramspek CL, Jager KJ, Dekker FW, et al. External validation of prognostic models: what, why, how, when and where? Clin Kidney J 2021;14:49–58. 10.1093/ckj/sfaa188 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- 88. Hansson O, Zetterberg H, Buchhave P, et al. Association between CSF biomarkers and incipient Alzheimer's disease in patients with mild cognitive impairment: a follow-up study. Lancet Neurol 2006;5:228–34. 10.1016/S1474-4422(06)70355-6 [ DOI ] [ PubMed ] [ Google Scholar ]
- 89. Goldstein ND, LeVasseur MT, McClure LA. On the convergence of epidemiology, biostatistics, and data science. Harv Data Sci Rev 2020;2. 10.1162/99608f92.9f0215e6. [Epub ahead of print: 30 04 2020]. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
Biostatistics articles from across Nature Portfolio
Biostatistics is the application of statistical methods in studies in biology, and encompasses the design of experiments, the collection of data from them, and the analysis and interpretation of data. The data come from a wide range of sources, including genomic studies, experiments with cells and organisms, and clinical trials.
Latest Research and Reviews
In-Clinic and Natural Gait Observations master protocol (I-CAN-GO) to validate gait using a lumbar accelerometer
- Miles Welbourn
- Paul Sheriff
Azacitidine and gemtuzumab ozogamicin as post-transplant maintenance therapy for high-risk hematologic malignancies
- Satoshi Kaito
- Yuho Najima
- Noriko Doki
Impact of COVID-19 on antibiotic usage in primary care: a retrospective analysis
- Anna Romaszko-Wojtowicz
- K. Tokarczyk-Malesa
- K. Glińska-Lewczuk
A standardized metric to enhance clinical trial design and outcome interpretation in type 1 diabetes
The use of a standardized outcome metric enhances clinical trial interpretation and cross-trial comparison. Here, the authors show the implementation of such a metric using type 1 diabetes trial data, reassess and compare results from these trials, and extend its use to define response to therapy.
- Alyssa Ylescupidez
- Henry T. Bahnson
- Carla J. Greenbaum
A novel approach to visualize clinical benefit of therapies for chronic graft versus host disease (cGvHD): the probability of being in response (PBR) applied to the REACH3 study
- Norbert Hollaender
- Ekkehard Glimm
- Robert Zeiser
Reproducibility in pharmacometrics applied in a phase III trial of BCG-vaccination for COVID-19
- Rob C. van Wijk
- Laurynas Mockeliunas
- Ulrika S. H. Simonsson
News and Comment
Reply to ‘Bayesian approaches in drug development: continuing the virtuous cycle’
- Stephen J. Ruberg
Bayesian approaches in drug development: continuing the virtuous cycle
- John Constant
Mitigating immortal-time bias: exploring osteonecrosis and survival in pediatric ALL - AALL0232 trial insights
- Shyam Srinivasan
- Swaminathan Keerthivasagam
Response to Pfirrmann et al.’s comment on How should we interpret conclusions of TKI-stopping studies
- Junren Chen
- Robert Peter Gale
Cell-free DNA chromosome copy number variations predict outcomes in plasma cell myeloma
- Wanting Qiang
The role of allogeneic haematopoietic cell transplantation as consolidation after anti-CD19 CAR-T cell therapy in adults with relapsed/refractory acute lymphoblastic leukaemia: a prospective cohort study
- Lijuan Zhou
Biostatistics & Research Design
Many of Northwestern University's Biostatistics, Epidemiology and Research Design activities take place within the Biostatistics Collaboration Center (BCC).
The BCC is housed within NUCATS and provides biostatistics expertise in all aspects of research, including proposal development, study design, data management, statistical analysis and manuscript preparation.
Specific areas of BCC expertise include statistical genetics, clinical trial design and longitudinal, multilevel and survival analysis. The BCC consists of PhD faculty and master’s-level statisticians who are committed to and experienced in collaborating with clinical and translational investigators.
Members of the BCC can help you:
- Determine study design and sample size
- Design a statistical analysis plan
- Develop study protocols
- Set up study databases
- Tailor methods, sample size and statistical analysis portions of proposals to study aims
- Analyze existing data
- Prepare statistical reports
- Author manuscripts, especially methods and results sections
- Respond to reviewer comments
In addition to collaborating on funded grants and providing free initial consultations, the BCC operates a recharge (fee-for-service) model billed at an hourly rate. For up-to-date hourly recharge fees, please visit the BCC’s Research Services and Support page.
For more information or to schedule a free initial consultation, visit the BCC website. If you have questions, please email the BCC.
Biostatistics And Research Methodology
- Introduction: statistics, biostatistics, frequency distribution.
- Measures of central tendency: mean, median, mode; pharmaceutical examples.
- Measures of dispersion: dispersion, range, standard deviation; pharmaceutical problems.
- Correlation: definition, Karl Pearson’s coefficient of correlation, multiple correlation; pharmaceutical examples.
- Regression: curve fitting by the method of least squares, fitting the lines y = a + bx and x = a + by, multiple regression, standard error of regression; pharmaceutical examples.
- Probability: definition of probability; binomial, normal and Poisson distributions and their properties; problems.
- Sample, population, large sample, small sample, null hypothesis, alternative hypothesis, sampling, essence of sampling, types of sampling, Type I error, Type II error, standard error of the mean (SEM); pharmaceutical examples.
- Parametric tests: t-test (one-sample, pooled or unpaired, and paired), ANOVA (one-way and two-way), least significant difference.
- Non-parametric tests: Wilcoxon rank sum test, Mann-Whitney U test, Kruskal-Wallis test, Friedman test.
- Introduction to research: need for research, need for design of experiments, experimental design technique, plagiarism.
- Graphs: histogram, pie chart, cubic graph, response surface plot, contour plot.
- Designing the methodology: sample size determination and power of a study, report writing and presentation of data, protocol, cohort studies, observational studies, experimental studies, designing clinical trials and their phases.
- Blocking and confounding system for two-level factorials.
- Regression modeling: hypothesis testing in simple and multiple regression models.
- Introduction to practical components of industrial and clinical trials problems: statistical analysis using Excel, SPSS, MINITAB®, Design of Experiments and R; online statistical software for the industrial and clinical trial approach.
- Design and analysis of experiments: factorial design (definition, 2² and 2³ designs, advantages of factorial design).
- Response surface methodology: central composite design, historical design, optimization techniques.
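The regression topic in this outline covers least-squares fitting of both y = a + bx (y on x) and x = a + by (x on y). The following is a minimal Python sketch, not taken from the syllabus: the dose and response values are made up for illustration, and the helper names `least_squares_line` and `pearson_r` are hypothetical. It fits both lines and checks the standard identity that the product of the two slopes equals the square of Pearson's correlation coefficient, r².

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line ys = a + b * xs (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of cross-products
    sxx = sum((x - mx) ** 2 for x in xs)                    # sum of squares of xs
    b = sxy / sxx
    a = my - b * mx  # the fitted line passes through the point of means
    return a, b


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up dose (mg) and response values, for illustration only.
dose = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

a_yx, b_yx = least_squares_line(dose, response)  # regression of y on x: y = a + bx
a_xy, b_xy = least_squares_line(response, dose)  # regression of x on y: x = a + by
r = pearson_r(dose, response)

print(f"y = {a_yx:.3f} + {b_yx:.3f}x")
print(f"x = {a_xy:.3f} + {b_xy:.3f}y")
print(f"r = {r:.4f}, b_yx * b_xy = {b_yx * b_xy:.4f}, r^2 = {r * r:.4f}")
```

Both fitted lines pass through the point of means, and they coincide only when |r| = 1; otherwise the y-on-x and x-on-y lines differ, which is why the two are taught separately.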
- Biostatistics and Research Methodology
5. REPORT WRITING AND PRESENTATION OF DATA
4+1 BA Gallatin/MS Biostatistics GPH
Almost every field of knowledge relies on data to study important questions, so statisticians and data scientists are in high demand. In fact, statisticians currently rank #2 in best business jobs and #5 in best STEM jobs, according to U.S. News & World Report.
The 4+1 BA/MS in Biostatistics allows students to earn both a bachelor of arts from NYU Gallatin and a master of science from NYU GPH in less time than it would take to complete both programs separately. The program is designed for academically strong students in almost every field of science, social science and the humanities, including mathematics, economics, psychology, sociology, biology, chemistry, physics and English, among many others. In addition, students who want a flexible skill set applicable to a variety of fields, to bolster a strong commitment to public health, can use the MS in Biostatistics to supplement their Gallatin BA with expertise in data analysis and statistical computing.
How It Works
In their sophomore year, students interested in the 4+1 in Biostatistics should speak with an adviser and complete the program application by the end of the year. Students admitted to the 4+1 BA/MS accelerate their progress toward the MS degree by earning 15 MS credits during their undergraduate program. These credits come from general electives and are double-counted, saving 15 credits and allowing students to complete the MS portion of the 4+1 in one additional year, for a total of 159 credits: 128 for the BA and an additional 31 for the MS. (Taken separately, the Gallatin BA requires 128 credits and the MS requires 46 credits, for a total of 174 credits over six years.)
4 + 1 Biostatistics Program Sequence
When planning courses, students should note the selective options outlined below. Students must take one course from each selective pair and may take the other as an elective.
Choose one of the following (3 credits):
- GPH-GU 2286 Introduction to Data Management and Statistical Computing (3)
- GPH-GU 2182 Statistical Programming in R (3)
Choose one of the following (3 credits):
- GPH-GU 2225 Psychometric Measurement & Analysis in Public Health Research & Practice (3)
- GPH-GU 2387 Survey Design, Analysis, and Reporting (3)
Choose one of the following (3 credits):
- GPH-GU 2480 Longitudinal Analysis of Public Health Data (3)
- GPH-GU 2368 Applied Survival Analysis (3)
Choose one of the following (3 credits):
- GPH-GU 2930 Epidemiology Design & Methods (3)
- GPH-GU 3225 Statistical Inference (3)
- APSTA-GE-2012 Causal Inference (3)
Students who begin their MS coursework in Fall 2024 or later should follow the sequence below. (Students who began their MS coursework before Fall 2024 should follow this sequence.)
Gallatin coursework | Gallatin coursework |
Gallatin coursework | Gallatin coursework |
GPH-GU 2106 Epidemiology (3); GPH-GU 2995 Biostatistics for Public Health (3); Gallatin coursework (12 credits) | GPH-GU 2353 Regression I: Linear Regression and Modeling (3); Gallatin coursework (13 credits) |
GPH-GU 2286 Intro to Data Management and Statistical Computing (3); GPH-GU 2182 Statistical Programming in R (3); GPH-GU 5170 Introduction to Public Health (0); Gallatin coursework (15 credits) | GPH-GU 2450 Intermediate Epidemiology (3); Gallatin coursework (16 credits) |
While an internship is not a requirement of the MS, students are strongly encouraged to work on projects that involve data during the summer. Students are also encouraged to take up to 6 credits of coursework during this time. | |
GPH-GU 2686 Thesis I: Practice and Integrative Learning Experiences (2); GPH-GU 2354 Regression II: Categorical Data Analysis (3); GPH-GU 2930 Epidemiology Design & Methods (3); GPH-GU 3225 Statistical Inference (3); APSTA-GE 2021 Causal Inference (3); GPH-GU 2363 Causal Inference: Design and Analysis (3); GPH-GU 2225 Psychometric Measurement & Analysis in Public Health Research & Practice (3); GPH-GU 2387 Survey Design, Analysis, and Reporting (3); Elective (3); Elective (3) | GPH-GU 2687 Thesis II: Practice and Integrative Learning Experiences (2); GPH-GU 2361 Research Methods in Public Health (3); GPH-GU 2480 Longitudinal Analysis of Public Health Data (3); GPH-GU 2368 Applied Survival Analysis (3); Elective (3); Elective (3) |
* Students in the Biostatistics concentration take 12 credits, 9 of which are required to be in an approved, thematic area (e.g., clinical trials, machine learning and modeling, data science) and must have statistical content. A list of possible electives may be found on the bottom half of this page.
How to Apply
Admission to this program is open to Gallatin undergraduate students who have completed 64 units toward their BA degree, with a GPA of 3.5 or higher, and have completed the equivalent of a college level precalculus course.
Interested students should contact Cameron Williams, Gallatin Sophomore Class Adviser and their faculty adviser to investigate the specifics of the program and to make sure this is the correct program for them. In the second semester of the sophomore year, students should complete the Gallatin-GPH BA-MS in Biostatistics Dual Degree Application, which includes a one-page Statement of Purpose, and which also requires the faculty adviser's approval.
Eligibility requirements:
- Must have completed a minimum of 44 credits toward the BA prior to applying.
- Must have a minimum cumulative GPA of 3.5 with no outstanding incomplete grades.
- Prior to applying for the dual degree, students must have completed a pre-calculus course (or equivalent) with a grade of "B" or better.
- Prior to applying for the dual degree, students must meet with an adviser in the School of Global Public Health.
- Must have a grade of "C" or better in any School of Global Public Health MS core courses already completed.
- Must write a one-page (single-spaced) Statement of Purpose, describing your interest in the BA-MS in Biostatistics dual degree and how you see this program fitting with your overall academic and career goals.
Still have questions? Email Cameron Williams, Gallatin Sophomore Class Adviser, or Miguel Silva, Assistant Director of Student Affairs at the School of Global Public Health.
Introduction. Rigorous scientific review of research protocols is critical to making funding decisions [1, 2] and to protecting both human and non-human research participants. Two pillars of ethical clinical and translational research are scientific validity and independent review of the proposed research. As such, the review process often emphasizes the scientific approach and ...
This document is specifically about how to report statistical results. Refer to our handout "Writing an APA Empirical (lab) Report" for details on writing a results section. Every statistical test that you report should relate directly to a hypothesis. Begin the results section by restating each hypothesis, then state whether your results ...
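As a rough illustration of the reporting convention this handout refers to, APA style is commonly described as giving the degrees of freedom, the test statistic, and an exact p-value to two or three decimals without a leading zero (since p cannot exceed 1), with very small values written as p < .001. The minimal Python sketch below assumes that convention; the function names `format_p` and `format_t_test` are hypothetical, and the numbers are made up.

```python
def format_p(p: float) -> str:
    """Format a p-value APA-style: three decimals, no leading zero, 'p < .001' floor."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.001:
        return "p < .001"
    # A p-value cannot exceed 1, so the leading zero is dropped: "0.031" -> ".031".
    return "p = " + f"{p:.3f}".lstrip("0")


def format_t_test(t: float, df: int, p: float) -> str:
    """Report a t-test result as 't(df) = t, p = ...' with t to two decimals."""
    return f"t({df}) = {t:.2f}, {format_p(p)}"


# Made-up results, not from any real study.
print(format_t_test(2.45, 38, 0.019))
print(format_t_test(-4.10, 52, 0.0002))
```

Each formatted result would then follow the restated hypothesis it tests, so the reader can see directly whether the data supported it.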
Abstract. Collecting, analyzing, and interpreting data are essential components of biomedical research and require biostatistics. Doing various statistical tests has been made easy by sophisticated computer software. It is important for the investigator and the interpreting clinician to understand the basics of biostatistics for two reasons.
INTRODUCTION. Biostatistics is a branch of applied statistics, and it must be taught with the focus on its various applications in biomedical research. It is an essential tool for medical research, clinical decision making, and health management. Statisticians have long expressed concern about the slow uptake of statistical ideas by the medical profession and the frequent misuse of ...
Medical writing, as we know it today, is still a relatively new field, notes Tom Lang, who has taught Interpreting & Reporting Biostatistics for the University's Medical Writing and Editing certificate program since it began in 1999. Lang points out that, although medical writing is often defined as writing about medicine, as a subset of technical writing, it differs greatly from literary ...
Basic Biostatistics for Clinicians. A broad understanding of the basic concepts of biostatistics is essential for surgeons. This chapter serves as a succinct guide to key biostatistics concepts to help develop sound research questions and evaluate evidence to advance surgical practice.
A well-defined research question is vital. Once we have a research question, we can ask: what is the appropriate analytical task for the research question? Let the question determine the methods: descriptive epidemiology done right (https://pubmed.ncbi.nlm.nih.gov/32814836/). Understanding the analytical task. Study design.
Writing Dissertation and Grant Proposals: Epidemiology, Preventative Medicine, and Biostatistics by Lisa Chasan-Taber. Call Number: E-book R853.P75 C48 2014. ISBN: 9781466512078. Publication Date: 2014. Qualitative and Mixed Methods in Public Health by Deborah K. Padgett. Call Number: RA440.85 .P335 2012.
important writing you will do for the paper. IMHO your reader will either be interested and continue on with your paper, or... A scholarly introduction is respectful of the literature. In my experience, the introduction is part of a paper that I will outline relatively early in the process, but will finish and repeatedly edit at the end of the
TWP Writing Studio offers free, one-on-one writing consultations to Duke students. Held annually in October/November. Limited seats for fellows, postdocs, students, and staff. aims to advance faculty writing through workshops, retreats, writing groups, and consultations. Sponsored by the Thompson Writing Program.
Results. Although biostatistical inputs are critical for the entire research study (online supplemental table 2), biostatistical consultations were mostly used for statistical analyses only [15]. Even though a mismatch between the statistical analysis and the study objective and DGP was identified as the major problem in articles submitted to high ...
Summary. This paper provides an overview of commonly used biostatistical research designs and methods. It helps the reader understand the biostatistical developments in the subject-matter areas that will be discussed in detail in the collection of chapters under the present topic. General biostatistical research strategies will be discussed.
Abstract and Figures. Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data. This review deals with all aspects of this, including the ...
Abstract. This guide for writers of research reports consists of practical suggestions for writing a report that is clear, concise, readable, and understandable. It includes suggestions for terminology and notation and for writing each section of the report—introduction, method, results, and discussion. Much of the guide consists of ...