
Statistical Papers

Statistical Papers is a forum for the presentation and critical assessment of statistical methods, encouraging discussion of methodological foundations and potential applications.

  • The Journal stresses statistical methods that have broad applications, giving special attention to those relevant to the economic and social sciences.
  • Covers all topics of modern data science, such as frequentist and Bayesian design and inference as well as statistical learning.
  • Contains original research papers (regular articles), survey articles, short communications, reports on statistical software, and book reviews.
  • High author satisfaction: 90% of authors say they are likely to publish in the journal again.
  • Werner G. Müller
  • Carsten Jentsch
  • Shuangzhe Liu
  • Ulrike Schneider


Latest issue

Volume 65, Issue 6

Latest articles

On the functional regression model and its finite-dimensional approximations.

  • José R. Berrendero
  • Alejandro Cholaquidis
  • Antonio Cuevas


Additive partial linear models with autoregressive symmetric errors and its application to the hospitalizations for respiratory diseases

  • Shu Wei Chou-Chen
  • Rodrigo A. Oliveira
  • Gilberto A. Paula


Profile quasi-maximum likelihood estimation for semiparametric varying-coefficient spatial autoregressive panel models with fixed effects

  • Ruiqin Tian
  • Miaojie Xia


Estimation of multicomponent system reliability for inverse Weibull distribution using survival signature

  • Nabakumar Jana
  • Samadrita Bera


Inference on Weibull inverted exponential distribution under progressive first-failure censoring with constant-stress partially accelerated life test

  • Abdullah Fathi
  • Al-Wageh A. Farghal
  • Ahmed A. Soliman


Journal updates

Write & submit: Overleaf LaTeX template

Journal information

  • Australian Business Deans Council (ABDC) Journal Quality List
  • Current Index to Statistics
  • Google Scholar
  • Japanese Science and Technology Agency (JST)
  • Mathematical Reviews
  • Norwegian Register for Scientific Journals and Series
  • OCLC WorldCat Discovery Service
  • Research Papers in Economics (RePEc)
  • Science Citation Index Expanded (SCIE)
  • TD Net Discovery Service
  • UGC-CARE List (India)



  • Research, Methods, Statistics

Explore the latest in research, methods, and statistics, including topics in clinical research infrastructure, design, conduct, and analysis.


This diagnostic study examines how a large language model used for title and abstract screening for systematic reviews compares in accuracy and efficiency with conventional screening methods.

  • NIH Introduces National Primary Care Research Network in US (JAMA News, July 5, 2024)

This cross-sectional study of US adults examines the geographical distribution of individuals eligible to participate in the Semaglutide Effects on Heart Disease and Stroke in Patients With Overweight or Obesity (SELECT) trial to estimate potential cardiovascular health impacts of implementing the trial findings at state and national levels.

  • Adapting to the Changing Landscape of Open Access Medical Publishing at JAMA Network Open (JAMA Network Open Opinion, July 1, 2024)
  • The Privilege of Being Editor-in-Chief (JAMA Network Open Opinion, June 24, 2024)

This cross-sectional study examines the availability of consent forms for National Institutes of Health–funded trials on ClinicalTrials.gov.

This Viewpoint discusses changes proposed by the US Department of Health and Human Services’ Office of Research Integrity that would shift control of research misconduct proceedings from institutional oversight to federal authority.

This Viewpoint discusses the need to protect participants in clinical research and the opportunity to address this issue as the World Medical Association works to revise its Declaration of Helsinki, which governs medical research ethics.

This JAMA Guide to Statistics and Methods article explains the test-negative study design, an observational study design routinely used to estimate vaccine effectiveness, and examines its use in a study that estimated the performance of messenger RNA boosters against the Omicron variant.

This survey study examines the extent to which for-profit and nonprofit or governmental clinical research sites use strategies to recruit trial participants from socioeconomically disadvantaged groups.

  • Assessing Strategies for Inclusion of Marginalized Communities in Clinical Trials—What’s the Plan? (JAMA Network Open Opinion, June 7, 2024)

This cohort study examines the association of principal investigator (PI) turnover with patient enrollment in surgical clinical trials.

This systematic review of randomized clinical trials investigating hematological malignant neoplasms examines how often patient-reported outcomes are included in results.

This Viewpoint from the FDA discusses how pragmatic clinical research—assessment that uses real-world data, often in combination with research data, after initial marketing approval—can help in evaluation of new technologies, benefit research sites in underresourced settings, and better inform regulatory decisions and clinical practice.

This Viewpoint discusses how proposed Centers for Medicare & Medicaid Services data access changes may impede health services research.

This JAMA Guide to Statistics and Methods article discusses accounting for competing risks in clinical research.

This cohort study examines the utility of adapting the Risk Analysis Index for Frailty using diagnostic codes and whether the adapted version is associated with adverse outcomes.

This survey study evaluates the feasibility and reliability of using large language models to assess risk of bias in randomized clinical trials.

This cross-sectional study compares self-reported social determinants of health needs and Social Vulnerability Index associated with the individual’s residence among members and nonmembers of a large health care system in Pennsylvania.

This comparative effectiveness study examines data from randomized clinical trials on use of nonmatched vs matched placebos in treatment of COVID-19.


Articles on Statistics


Four tips to avoid being bamboozled by political statistics and data

Renaud Foucart, Lancaster University

Does the state of the UK economy inspire confidence? An expert crunches the numbers

Michael Nower, Durham University

Vegan dog food has been hailed as the healthiest – our study shows the reality is more complicated

Alexander German, University of Liverpool and Richard Barrett-Jolley, University of Liverpool

The luck of the puck in the Stanley Cup – why chance plays such a big role in hockey

Mark Robert Rank, Arts & Sciences at Washington University in St. Louis

South Africa is short of academic statisticians: why and what can be done

Inger Fabris-Rotelli, University of Pretoria; Ansie Smit, University of Pretoria; Danielle Jade Roberts, University of KwaZulu-Natal; Daniel Maposa, University of Limpopo; Fabio Mathias Correa, University of the Free State; Michael Johan von Maltitz, University of the Free State, and Sonali Das, University of Pretoria

School results, smoking rates, shop closures? New statistics tool helps you compare local areas in the UK

Richard Harris, University of Bristol

For over a century, baseball’s scouts have been the backbone of America’s pastime – do they have a future?

H. James Gilmore, Flagler College and Tracy Halcomb, Flagler College

Social media apps have billions of ‘active users’. But what does that really mean?

Milovan Savic, Swinburne University of Technology

The ‘average’ revolutionized scientific research, but overreliance on it has led to discrimination and injury

Zachary del Rosario, Olin College of Engineering

Here’s why you should (almost) never use a pie chart for your data

Adrian Barnett, Queensland University of Technology and Victor Oguoma, The University of Queensland

20 people, 2.4 quintillion possibilities: the baffling statistics of Secret Santa

Stephen Woodcock, University of Technology Sydney

South Africa’s 2022 census missed 31% of people - big data could help in future

David Everatt, University of the Witwatersrand

From stock markets to brain scans, new research harmonises hundreds of scientific methods to understand complex systems

Ben Fulcher, University of Sydney

Tests that diagnose diseases are less reliable than you’d expect. Here’s why

Adrian Barnett, Queensland University of Technology and Nicole White, Queensland University of Technology

The order in which you acquire diseases could affect your life expectancy – new research

Rhiannon Owen, Swansea University

Bazball by the numbers: what the stats say about English cricket’s ambitious but risky change of pace

Tim Newans, Griffith University and Christopher Drovandi, Queensland University of Technology

If 1% of COVID-19 cases result in death, does that mean you have a 1% chance of dying if you catch it? A mathematician explains the difference between a population statistic and your personal risk

Joseph Stover, Gonzaga University

Best time to play Tim Hortons’ Roll up to Win? The middle of the night dramatically increases your odds

Michael Wallace, University of Waterloo

Declines in math readiness underscore the urgency of math awareness

Manil Suri, University of Maryland, Baltimore County

Robodebt not only broke the laws of the land – it also broke laws of mathematics

Noel Cressie, University of Wollongong

Related Topics

  • Coronavirus
  • Mathematics
  • Probability
  • Quick reads

Top contributors

Professor of the Practice of Data Science, Washington University in St. Louis

Professor of Statistics, Queensland University of Technology

Senior Lecturer in Statistics, University of Glasgow

Associate Professor of Mathematical Sciences, University of Technology Sydney

Professor of Biostatistics and Epidemiology, University of South Australia

Professor of Business and Sports Analytics, University of Leeds

Professor of Economics and Public Policy, ANU College of Arts and Social Sciences, Australian National University

Associate Professor, Department of Statistics and Actuarial Science, University of Waterloo

Professor of Statistics and Interim Dean, College of Science, Virginia Tech

Associate Professor, POLIS@ANU Centre for Social Policy Research, Australian National University

Professor of Bioinformatics, WEHI (Walter and Eliza Hall Institute of Medical Research)

Professor, Future Fellow and Head of Statistics at UNSW, and a Deputy Director of the Australian Centre of Excellence in Mathematical and Statistical Frontiers (ACEMS), UNSW Sydney

Biostatistician, The University of Melbourne

Emeritus Professor, La Trobe University


When you choose to publish with PLOS, your research makes an impact. Make your work accessible to all, without restrictions, and accelerate scientific discovery with options like preprints and published peer review that make your work more Open.

  • PLOS Biology
  • PLOS Climate
  • PLOS Complex Systems
  • PLOS Computational Biology
  • PLOS Digital Health
  • PLOS Genetics
  • PLOS Global Public Health
  • PLOS Medicine
  • PLOS Mental Health
  • PLOS Neglected Tropical Diseases
  • PLOS Pathogens
  • PLOS Sustainability and Transformation
  • PLOS Collections

How to Report Statistics

Ensure appropriateness and rigor, avoid flexibility, and above all never manipulate results.

In many fields, a statistical analysis forms the heart of both the methods and results sections of a manuscript. Learn how to report statistical analyses, and what other context is important for publication success and future reproducibility.

A matter of principle

First and foremost, the statistical methods employed in research must always be:

  • Appropriate for the study design
  • Rigorously reported, in sufficient detail for others to reproduce the analysis
  • Free of manipulation, selective reporting, or other forms of “spin”

Just as importantly, statistical practices must never be manipulated or misused. Misrepresenting data, selectively reporting results, or searching for patterns that can be presented as statistically significant in order to yield a conclusion that seems more worthy of attention or publication is a serious ethical violation. Although it may seem harmless, using statistics to “spin” results can prevent publication, undermine a published study, or lead to investigation and retraction.

Supporting public trust in science through transparency and consistency

Along with clear methods and transparent study design, the appropriate use of statistical methods and analyses impacts editorial evaluation and readers’ understanding and trust in science.

In 2011, the paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant” showed that “flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates” and demonstrated “how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis”.

Arguably, such problems with flexible analysis led to the “reproducibility crisis” that we read about today.

A constant principle of rigorous science

The appropriate, rigorous, and transparent use of statistics is a constant principle of rigorous, transparent, and Open Science. Aim to be thorough, even if a particular journal doesn’t require the same level of detail. Trust in science is everyone’s responsibility, and you cannot go wrong by exceeding a journal’s minimum standard of information and reporting.


Sound statistical practices

While it is hard to provide statistical guidelines that are relevant to all disciplines, types of research, and analytical techniques, adherence to rigorous and appropriate principles remains key. Here are some ways to ensure your statistics are sound.

Define your analytical methodology before you begin

Take the time to consider and develop a thorough study design that defines your line of inquiry, what you plan to do, what data you will collect, and how you will analyze it. (If you applied for research grants or ethical approval, you probably already have a plan in hand!) Refer back to your study design at key moments in the research process, and above all, stick to it.

To avoid flexibility and improve the odds of acceptance, preregister your study design with a journal

Many journals offer the option to submit a study design for peer review before research begins through a practice known as preregistration. If the editors approve your study design, you’ll receive a provisional acceptance for a future research article reporting the results. Preregistering is a great way to head off any intentional or unintentional flexibility in analysis. By declaring your analytical approach in advance, you’ll increase the credibility and reproducibility of your results and help address publication bias, too. Getting peer review feedback on your study design and analysis plan before the research has begun (when you can still make changes!) makes your research even stronger and increases your chances of publication, even if the results are negative or null. Never underestimate how much you can help increase the public’s trust in science by planning your research in this way.

Imagine replicating or extending your own work, years in the future

Imagine that you are describing your approach to statistical analysis for your future self, in exactly the same way as we have described for writing your methods section. What would you need to know to replicate or extend your own work? When you consider that you might be at a different institution, working with different colleagues, using different programs, applications, or resources, or maybe even adopting new statistical techniques that have emerged, you can gauge the level of reporting specificity that you yourself would require to redo or extend your work. Consider:

  • Which details would you need to be reminded of? 
  • What did you do to the raw data before analysis?
  • Did the purpose of the analysis change before or during the experiments?
  • What participants did you decide to exclude? 
  • What process did you adjust, during your work? 

Even if a necessary adjustment you made was not ideal, transparency is the key to ensuring this is not regarded as an issue in the future. It is far better to transparently convey any non-optimal techniques or constraints than to conceal them, which could result in reproducibility or ethical issues downstream.

Existing standards, checklists, guidelines for specific disciplines

You can apply the Open Science practices outlined above no matter what your area of expertise, but in many cases you may still need more detailed guidance specific to your own field. Many disciplines, fields, and projects have worked hard to develop guidelines and resources to help with statistics, and to identify and avoid bad statistical practices. Below, you’ll find some of the key materials.

TIP: Do you have a specific journal in mind?

Be sure to read the submission guidelines for the specific journal you are submitting to, in order to discover any journal- or field-specific policies, initiatives or tools to utilize.

Biomedical Research
The “Statistical Analyses and Methods in the Published Literature” (SAMPL) guidelines cover basic statistical reporting for research in biomedical journals.
General
While written with PLOS journals in mind, these guidelines should be applicable to most research contexts, since PLOS serves many research disciplines.
Systematic reviews & Meta-analyses
The “Preferred Reporting Items for Systematic Reviews and Meta-Analyses” (PRISMA) statement is an evidence-based minimum set of items focusing on the reporting of systematic reviews and meta-analyses that evaluate randomized trials and other types of research.
Life Sciences
The “Consistent reporting of Materials, Design, and Analysis” (MDAR) checklist was developed and tested by a cross-publisher group of editors and experts in order to establish and harmonize reporting standards in the life sciences. The checklist, which authors can use to compile their methods and editors and reviewers can use to check them, establishes a minimum set of requirements for transparent reporting and is adaptable to any discipline within the Life Sciences, covering a breadth of potentially relevant methodological items and considerations.

Articles on statistical methods and reporting

Makin, T. R., & Orban de Xivry, J. Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 2019;8:e48175. https://doi.org/10.7554/eLife.48175

Munafò, M., Nosek, B., Bishop, D., et al. A manifesto for reproducible science. Nat Hum Behav 1, 0021 (2017). https://doi.org/10.1038/s41562-016-0021

Writing tips

Your use of statistics should be rigorous, appropriate, and uncompromising in its avoidance of analytical flexibility. This is difficult, but do not compromise on the rigorous standards that credibility demands!

What to do

  • Remember that trust in science is everyone’s responsibility.
  • Keep in mind future replicability.
  • Consider preregistering your analysis plan so that it is (i) reviewed before results are collected, catching problems before they occur, and (ii) fixed in advance, ruling out analytical flexibility.
  • Follow principles, but also checklists and field- and journal-specific guidelines.
  • Consider a commitment to rigorous and transparent science a personal responsibility, not simply a matter of adhering to journal guidelines.
  • Be specific about all decisions made during the experiments that someone reproducing your work would need to know.
  • Consider a course in advanced or newly developed statistical methods if you feel you did not focus on statistics enough during your research training.

What not to do

Don’t

  • Misuse statistics to influence significance or other interpretations of results
  • Conduct your statistical analyses if you are unsure of what you are doing—seek feedback (e.g. via preregistration) from a statistical specialist first.



Biostatistics articles from across Nature Portfolio

Biostatistics is the application of statistical methods in studies in biology, and encompasses the design of experiments, the collection of data from them, and the analysis and interpretation of data. The data come from a wide range of sources, including genomic studies, experiments with cells and organisms, and clinical trials.

Latest Research and Reviews


Azacitidine and gemtuzumab ozogamicin as post-transplant maintenance therapy for high-risk hematologic malignancies

  • Satoshi Kaito
  • Yuho Najima
  • Noriko Doki


Impact of COVID-19 on antibiotic usage in primary care: a retrospective analysis

  • Anna Romaszko-Wojtowicz
  • K. Tokarczyk-Malesa
  • K. Glińska-Lewczuk


A standardized metric to enhance clinical trial design and outcome interpretation in type 1 diabetes

The use of a standardized outcome metric enhances clinical trial interpretation and cross-trial comparison. Here, the authors show the implementation of such a metric using type 1 diabetes trial data, reassess and compare results from these trials, and extend its use to define response to therapy.

  • Alyssa Ylescupidez
  • Henry T. Bahnson
  • Carla J. Greenbaum


A novel approach to visualize clinical benefit of therapies for chronic graft versus host disease (cGvHD): the probability of being in response (PBR) applied to the REACH3 study

  • Norbert Hollaender
  • Ekkehard Glimm
  • Robert Zeiser


Reproducibility in pharmacometrics applied in a phase III trial of BCG-vaccination for COVID-19

  • Rob C. van Wijk
  • Laurynas Mockeliunas
  • Ulrika S. H. Simonsson


Addressing mechanism bias in model-based impact forecasts of new tuberculosis vaccines

The complex transmission chain of tuberculosis (TB) forces mathematical modelers to make mechanistic assumptions when modelling vaccine effects. Here, authors posit a Bayesian formalism that unlocks mechanism-agnostic impact forecasts for TB vaccines.


News and Comment

Mitigating immortal-time bias: exploring osteonecrosis and survival in pediatric ALL – AALL0232 trial insights

  • Shyam Srinivasan
  • Swaminathan Keerthivasagam

Response to Pfirrmann et al.’s comment on How should we interpret conclusions of TKI-stopping studies

  • Junren Chen
  • Robert Peter Gale


Cell-free DNA chromosome copy number variations predict outcomes in plasma cell myeloma

  • Wanting Qiang


The role of allogeneic haematopoietic cell transplantation as consolidation after anti-CD19 CAR-T cell therapy in adults with relapsed/refractory acute lymphoblastic leukaemia: a prospective cohort study

  • Lijuan Zhou

Clinical trials: design, endpoints and interpretation of outcomes

  • Megan Othus
  • Mei-Jie Zhang


A SAS macro for estimating direct adjusted survival functions for time-to-event data with or without left truncation

  • Zhen-Huan Hu
  • Hai-Lin Wang



Statistics & Data: Search for Statistics in Articles


Articles published in peer-reviewed journals may use statistics to help support a hypothesis. Articles with statistics can help demonstrate how to incorporate statistics into your own academic writing. Look at the sources the author used for these statistics; they may lead you to more sources for statistics or datasets.

You can search the Library's research databases for peer-reviewed articles that use statistics to support a position or argument. You may have to try several different searches to find relevant articles.

Peer-reviewed articles can be very valuable if you are struggling to locate statistics or data from a government site or database. It's best to keep an open mind: you may not find the exact statistics you're searching for, but you might find something related or usable. It can be a lot like searching for a needle in a haystack.

Search with keywords

To start searching by keyword for articles that contain statistics, begin by selecting a subject-specific research database from the library's research home pages. You can access a particular topic from our Subject Resources page.

In the search boxes at the top of the page, type keywords that describe your topic. In one search box, use keywords for terms that can help indicate statistics, such as:

  • statistic* Using an asterisk (*) tells the database to look for words beginning with the root word you've used and any possible endings. Statistic* tells the database to look for statistic, statistics, statistical, etc.

  For example, if you're looking for statistics that show a link between preschool programs and literacy skills in young children, you could try the following keyword search in one of the Education databases:

First Search Box: preschool*

Second Search Box: literacy

Third Search Box: findings


You can put related terms in the same search box when you use OR between these terms. For example:

findings OR data OR statistic* OR result OR analysis

For more help using Boolean operators (AND, OR, NOT), explore the Library's Boolean Operators guide:


Search with Limiters

Some databases will offer a limiter that will allow you to restrict your search to statistical or data results only.

  • To begin, select a subject-specific research database from the Library website. (How do I find databases by subject?)
  • In the search boxes at the top of the page, type keywords that describe your topic.
  • Depending on the database you are using, there may be options to limit your search to articles that contain statistics. Not all databases will offer limiters for statistics. Look in the area under the search boxes for menus like Document Type, Publication Type, or Supplemental Materials, and then choose options that might indicate statistics or data.

For example, in ProQuest databases, the Document Type menu allows you to limit your search to Statistics/Data Report:



Statology

The Importance of Statistics in Research (With Examples)

The field of statistics is concerned with collecting, analyzing, interpreting, and presenting data.

In the field of research, statistics is important for the following reasons:

Reason 1: Statistics allows researchers to design studies such that the findings from the studies can be extrapolated to a larger population.

Reason 2: Statistics allows researchers to perform hypothesis tests to determine if some claim about a new drug, new procedure, new manufacturing method, etc. is true.

Reason 3: Statistics allows researchers to create confidence intervals to capture uncertainty around population estimates.

In the rest of this article, we elaborate on each of these reasons.

Reason 1: Statistics Allows Researchers to Design Studies

Researchers are often interested in answering questions about populations like:

  • What is the average weight of a certain species of bird?
  • What is the average height of a certain species of plant?
  • What percentage of citizens in a certain city support a certain law?

One way to answer these questions is to go around and collect data on every single individual in the population of interest.

However, this is typically too costly and time-consuming, which is why researchers instead take a sample of the population and use the data from the sample to draw conclusions about the population as a whole.

Figure: Taking a sample from a population

There are many different methods researchers can potentially use to obtain individuals to be in a sample. These are known as  sampling methods .

There are two classes of sampling methods:

  • Probability sampling methods: every member of the population has a known, nonzero probability of being selected for the sample (in simple random sampling, an equal probability).
  • Non-probability sampling methods: members are selected in a way that does not give every member of the population a known chance of being selected (for example, convenience sampling).

By using probability sampling methods, researchers can maximize the chances that they obtain a sample that is representative of the overall population.

This allows researchers to extrapolate the findings from the sample to the overall population.

Read more about the two classes of sampling methods here.
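To make the idea concrete, here is a minimal Python sketch of simple random sampling. The bird-weight population is simulated purely for illustration; only the use of random.sample reflects the technique described above.

```python
import random

# Hypothetical population: weights (in grams) of 10,000 birds of one species.
population = [random.gauss(52.0, 4.0) for _ in range(10_000)]

# Probability sampling: every member has an equal chance of selection.
sample = random.sample(population, k=100)

# The sample mean is used to estimate the population mean.
sample_mean = sum(sample) / len(sample)
print(f"Sample mean weight: {sample_mean:.1f} g")
```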

Reason 2: Statistics Allows Researchers to Perform Hypothesis Tests

Another way that statistics is used in research is in the form of hypothesis tests.

These are tests that researchers can use to determine whether there is a statistically significant difference between different medical procedures or treatments.

For example, suppose a scientist believes that a new drug is able to reduce blood pressure in obese patients. To test this, he measures the blood pressure of 30 patients before and after using the new drug for one month.

He then performs a paired samples t-test using the following hypotheses:

  • H0: μ_after = μ_before (the mean blood pressure is the same before and after using the drug)
  • HA: μ_after < μ_before (the mean blood pressure is lower after using the drug)

If the p-value of the test is less than some significance level (e.g. α = .05), then he can reject the null hypothesis and conclude that the new drug leads to reduced blood pressure.
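As an illustration, here is a minimal sketch of this paired-samples t-test in Python using scipy.stats.ttest_rel. The before/after blood-pressure values are simulated for illustration, not taken from any real trial.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical systolic blood pressure (mmHg) for 30 patients,
# measured before and after one month on the new drug.
before = rng.normal(150, 10, size=30)
after = before - rng.normal(5, 8, size=30)

# Paired t-test of H0: mu_after = mu_before vs. HA: mu_after < mu_before.
t_stat, p_value = stats.ttest_rel(after, before, alternative="less")

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the drug appears to reduce blood pressure.")
else:
    print("Fail to reject H0.")
```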

Note: This is just one example of a hypothesis test used in research. Other common tests include the one-sample t-test, two-sample t-test, one-way ANOVA, and two-way ANOVA.

Reason 3: Statistics Allows Researchers to Create Confidence Intervals

Another way that statistics is used in research is in the form of confidence intervals.

A confidence interval is a range of values that is likely to contain a population parameter with a certain level of confidence.

For example, suppose researchers are interested in estimating the mean weight of a certain species of turtle.

Instead of going around and weighing every single turtle in the population, researchers may instead take a simple random sample of turtles with the following information:

  • Sample size n = 25
  • Sample mean weight x̄ = 300
  • Sample standard deviation s = 18.5

Using the confidence interval for a mean formula, researchers may then construct the following 95% confidence interval:

95% Confidence Interval: 300 ± 1.96 × (18.5/√25) = [292.75, 307.25]

The researchers would then claim that they’re 95% confident that the true mean weight for this population of turtles is between 292.75 pounds and 307.25 pounds.
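The same interval can be reproduced in a few lines of Python. This sketch simply mirrors the z-based formula above (using 1.96, as the article does) with the stated sample values.

```python
import math

n = 25          # sample size
x_bar = 300.0   # sample mean weight (pounds)
s = 18.5        # sample standard deviation
z = 1.96        # critical value for 95% confidence

margin = z * (s / math.sqrt(n))
lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # [292.75, 307.25]
```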

Additional Resources

The following articles explain the importance of statistics in other fields:

  • The Importance of Statistics in Healthcare
  • The Importance of Statistics in Nursing
  • The Importance of Statistics in Business
  • The Importance of Statistics in Economics
  • The Importance of Statistics in Education



ASA Journals Online

  • Journal of the American Statistical Association
  • The American Statistician
  • Journal of Agricultural, Biological, and Environmental Statistics
  • Journal of Business & Economic Statistics
  • Journal of Computational and Graphical Statistics
  • Journal of Nonparametric Statistics
  • Statistical Analysis and Data Mining: The ASA Data Science Journal
  • Statistics in Biopharmaceutical Research
  • Technometrics

ASA open-access journals

  • Data Science in Science
  • Journal of Statistics and Data Science Education
  • Statistics and Public Policy
  • Statistics Surveys

ASA co-published journals

  • Journal of Educational and Behavioral Statistics
  • Journal of Quantitative Analysis in Sports
  • SIAM/ASA Journal on Uncertainty Quantification
  • Journal of Survey Statistics and Methodology


Descriptive Statistics | Definitions, Types, Examples

Published on July 9, 2020 by Pritha Bhandari. Revised on June 21, 2023.

Descriptive statistics summarize and organize characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.

In quantitative research , after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).

The next step is inferential statistics , which help you decide whether your data confirms or refutes your hypothesis and whether it is generalizable to a larger population.

Table of contents

  • Types of descriptive statistics
  • Frequency distribution
  • Measures of central tendency
  • Measures of variability
  • Univariate descriptive statistics
  • Bivariate descriptive statistics
  • Other interesting articles
  • Frequently asked questions about descriptive statistics

There are 3 main types of descriptive statistics:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability or dispersion concerns how spread out the values are.

Figure: Types of descriptive statistics

You can apply these to assess only one variable at a time, in univariate analysis, or to compare two or more, in bivariate and multivariate analysis.

For example, suppose you survey participants about how many times in the past year they did each of the following:

  • Go to a library
  • Watch a movie at a theater
  • Visit a national park


A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarize the frequency of every possible value of a variable in numbers or percentages. This is called a frequency distribution.

Simple frequency distribution table:

Gender Number
Male 182
Female 235
Other 27

From this table, you can see that more women than men or people with another gender identity took part in the study. In a grouped frequency distribution, you can group numerical response values and add up the number of responses for each group. You can also convert each of these numbers to percentages.

Grouped frequency distribution table:

Library visits in the past year Percent
0–4 6%
5–8 20%
9–12 42%
13–16 24%
17+ 8%
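As a sketch of how such tables can be produced programmatically, the following Python builds a simple and a grouped frequency distribution. The categorical responses match the counts above, while the numeric visit counts are hypothetical.

```python
from collections import Counter

# Simple frequency distribution: counts per category.
genders = ["Male"] * 182 + ["Female"] * 235 + ["Other"] * 27
for gender, count in Counter(genders).items():
    print(f"{gender}: {count}")

# Grouped frequency distribution: bin numeric responses, then convert to percentages.
visits = [0, 3, 7, 11, 12, 15, 9, 5, 13, 18]  # hypothetical library-visit counts
bins = [(0, 4), (5, 8), (9, 12), (13, 16), (17, float("inf"))]
for low, high in bins:
    n = sum(low <= v <= high for v in visits)
    label = f"{low}+" if high == float("inf") else f"{low}-{high}"
    print(f"{label}: {n / len(visits):.0%}")
```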

Measures of central tendency estimate the center, or average, of a data set. The mean, median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using the first 6 responses of our survey.

The mean, or M, is the most commonly used method for finding the average.

To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N.

Mean number of library visits
Data set: 15, 3, 12, 0, 24, 3
Sum of all values: 15 + 3 + 12 + 0 + 24 + 3 = 57
Total number of responses: N = 6
Mean: divide the sum of values by N to find M: 57/6 = 9.5

The median is the value that’s exactly in the middle of a data set.

To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

Median number of library visits
Ordered data set: 0, 3, 3, 12, 15, 24
Middle numbers: 3, 12
Median: find the mean of the two middle numbers: (3 + 12)/2 = 7.5

The mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.

To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.

Mode number of library visits
Ordered data set: 0, 3, 3, 12, 15, 24
Mode: find the most frequently occurring response: 3
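All three measures can be computed directly with Python's statistics module. This short sketch uses the six survey responses from the tables above.

```python
import statistics

visits = [15, 3, 12, 0, 24, 3]  # the six survey responses above

print(statistics.mean(visits))    # 9.5
print(statistics.median(visits))  # 7.5
print(statistics.mode(visits))    # 3
```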

Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

The range gives you an idea of how far apart the most extreme response scores are. To find the range, simply subtract the lowest value from the highest value. For the data set above, the range is 24 − 0 = 24.

Standard deviation

The standard deviation (s or SD) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:

  • List each score and find their mean.
  • Subtract the mean from each score to get the deviation from the mean.
  • Square each of these deviations.
  • Add up all of the squared deviations.
  • Divide the sum of the squared deviations by N – 1.
  • Find the square root of the number you found.
Raw data | Deviation from mean | Squared deviation
15 | 15 – 9.5 = 5.5 | 30.25
3 | 3 – 9.5 = –6.5 | 42.25
12 | 12 – 9.5 = 2.5 | 6.25
0 | 0 – 9.5 = –9.5 | 90.25
24 | 24 – 9.5 = 14.5 | 210.25
3 | 3 – 9.5 = –6.5 | 42.25
M = 9.5 | Sum = 0 | Sum of squares = 421.5

Step 5: 421.5/5 = 84.3

Step 6: √84.3 = 9.18

The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean.

To find the variance, simply square the standard deviation. The symbol for variance is s².
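Here is a short Python sketch that follows the six steps above for the same six responses, then confirms the results against the statistics module.

```python
import math
import statistics

visits = [15, 3, 12, 0, 24, 3]
n = len(visits)

# Steps 1-6: mean, deviations, squares, sum, divide by N - 1, square root.
mean = sum(visits) / n                                   # 9.5
squared_deviations = [(x - mean) ** 2 for x in visits]
variance = sum(squared_deviations) / (n - 1)             # 421.5 / 5 = 84.3
sd = math.sqrt(variance)                                 # 9.18

print(variance, round(sd, 2))
print(statistics.variance(visits), round(statistics.stdev(visits), 2))  # same results

# Range: highest value minus lowest value.
print(max(visits) - min(visits))  # 24
```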


Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread. Programs like SPSS and Excel can be used to easily calculate these.

Visits to the library
N 6
Mean 9.5
Median 7.5
Mode 3
Standard deviation 9.18
Variance 84.3
Range 24

If you were to only consider the mean as a measure of central tendency, your impression of the “middle” of the data set can be skewed by outliers, unlike the median or mode.

Likewise, while the range is sensitive to outliers, you should also consider the standard deviation and variance to get easily comparable measures of spread.

If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests.

Multivariate analysis is the same as bivariate analysis but with more than two variables.

Contingency table

In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read “across” the table to see how the independent and dependent variables relate to each other.

Number of visits to the library in the past year
Group 0–4 5–8 9–12 13–16 17+
Children 32 68 37 23 22
Adults 36 48 43 83 25

Interpreting a contingency table is easier when the raw data is converted to percentages. Percentages make each row comparable to the other by making it seem as if each group had only 100 observations or participants. When creating a percentage-based contingency table, you add the N for each independent variable on the end.

Visits to the library in the past year (Percentages)
Group 0–4 5–8 9–12 13–16 17+ N
Children 18% 37% 20% 13% 12% 182
Adults 15% 20% 18% 35% 11% 235

From this table, it is more clear that similar proportions of children and adults go to the library over 17 times a year. Additionally, children most commonly went to the library between 5 and 8 times, while for adults, this number was between 13 and 16.
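As an illustrative sketch, the following Python (using pandas) converts the counts table above into the percentage-based table, keeping the N for each group. The bin labels simply mirror the table columns.

```python
import pandas as pd

# Contingency table of counts (rows: groups, columns: library-visit bins).
counts = pd.DataFrame(
    {"0-4": [32, 36], "5-8": [68, 48], "9-12": [37, 43],
     "13-16": [23, 83], "17+": [22, 25]},
    index=["Children", "Adults"],
)

# Row totals (the N for each group), then each row as percentages of its total.
n = counts.sum(axis=1)
percentages = counts.div(n, axis=0).mul(100).round().astype(int)
percentages["N"] = n
print(percentages)
```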

Scatter plots

A scatter plot is a chart that shows you the relationship between two or three variables. It's a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

From your scatter plot, you see that as the number of movies seen at movie theaters increases, the number of visits to the library decreases. Based on your visual assessment of a possible linear relationship, you perform further tests of correlation and regression.

Figure: Descriptive statistics scatter plot
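A minimal matplotlib sketch of such a scatter plot follows; the data points are invented to follow the negative trend described above.

```python
import matplotlib.pyplot as plt

# Hypothetical paired observations for each participant.
movie_visits = [2, 5, 8, 10, 14, 16, 20]
library_visits = [18, 15, 12, 10, 7, 5, 2]

# One variable on the x-axis, the other on the y-axis; one point per participant.
plt.scatter(movie_visits, library_visits)
plt.xlabel("Movie theater visits in the past year")
plt.ylabel("Library visits in the past year")
plt.title("Library visits decrease as movie visits increase")
plt.show()
```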

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Statistical power
  • Pearson correlation
  • Degrees of freedom
  • Statistical significance

Methodology

  • Cluster sampling
  • Stratified sampling
  • Focus group
  • Systematic review
  • Ethnography
  • Double-Barreled Question

Research bias

  • Implicit bias
  • Publication bias
  • Cognitive bias
  • Placebo effect
  • Pygmalion effect
  • Hindsight bias
  • Overconfidence bias

Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average for each response.
  • Measures of variability show you the spread or dispersion of your dataset.
  • Univariate statistics summarize only one variable  at a time.
  • Bivariate statistics compare two variables .
  • Multivariate statistics compare more than two variables .

Cite this Scribbr article


Bhandari, P. (2023, June 21). Descriptive Statistics | Definitions, Types, Examples. Scribbr. Retrieved July 10, 2024, from https://www.scribbr.com/statistics/descriptive-statistics/



CFViSA: A comprehensive and free platform for visualization and statistics in omics-data


Recommendations

Exploratory visualization of array-based comparative genomic hybridization

Recent developments in DNA microarray technology have enabled a new and highly effective platform for performing comparative genomic hybridization (CGH) measurements. CGH measures anomalies in DNA copy number. Such copy number changes are now thought to ...

CubeViz: Exploration and Visualization of Statistical Linked Data

CubeViz is a flexible exploration and visualization platform for statistical data represented adhering to the RDF Data Cube vocabulary. If statistical data is provided adhering to the Data Cube vocabulary, CubeViz exhibits a faceted browsing widget ...

In silico analysis of motifs in promoters of Differentially Expressed Genes in rice (Oryza sativa L.) under anoxia

The aim of this study was to characterise the molecular mechanisms of transcriptional regulation of Differentially Expressed Genes (DEGs) in rice coleoptiles under anoxia by identifying motifs that are common in the promoter region of co-regulated ...

Information

Published in

Pergamon Press, Inc., United States

Author tags

  • Comprehensive platform
  • Bioinformatics
  • Visualization
  • User-friendly
  • Research-article



Reliability of U.S. Economic Data Is in Jeopardy, Study Finds

A report says new approaches and increased spending are needed to ensure that government statistics remain dependable and free of political influence.


By Ben Casselman

Federal Reserve officials use government data to help determine when to raise or lower interest rates. Congress and the White House use it to decide when to extend jobless benefits or send out stimulus payments. Investors place billions of dollars worth of bets that are tied to monthly reports on job growth, inflation and retail sales.

But a new study says the integrity of that data is in increasing jeopardy.

The report, issued on Tuesday by the American Statistical Association, concludes that government statistics are reliable right now. But that could soon change, the study warns, citing factors including shrinking budgets, falling survey response rates and the potential for political interference.

The authors — statisticians from George Mason University, the Urban Institute and other institutions — likened the statistical system to physical infrastructure like highways and bridges: vital, but often ignored until something goes wrong.

“We do identify this sort of downward spiral as a threat, and that’s what we’re trying to counter,” said Nancy Potok, who served as chief statistician of the United States from 2017 to 2019 and was one of the report’s authors. “We’re not there yet, but if we don’t do something, that threat could become a reality, and in the not-too-distant future.”

The report, “The Nation’s Data at Risk,” highlights the threats facing statistics produced across the federal government, including data on education, health, crime and demographic trends.

But the risks to economic data are particularly notable because of the attention it receives from policymakers and investors. Most of that data is based on surveys of households or businesses. And response rates to government surveys have plummeted in recent years, as they have for private polls. The response rate to the Current Population Survey — the monthly survey of about 60,000 households that is the basis for the unemployment rate and other labor force statistics — has fallen to about 70 percent in recent months, from nearly 90 percent a decade ago.



Fam Med Community Health, v.7(2); 2019

Basics of statistics for primary care research

Timothy C Guetterman

Family Medicine, University of Michigan, Michigan Medicine, Ann Arbor, Michigan, USA

The purpose of this article is to provide an accessible introduction to foundational statistical procedures and present the steps of data analysis to address research questions and meet standards for scientific rigour. It is aimed at individuals new to research with less familiarity with statistics, or anyone interested in reviewing basic statistics. After examining a brief overview of foundational statistical techniques, for example, differences between descriptive and inferential statistics, the article illustrates 10 steps in conducting statistical analysis with examples of each. The following are the general steps for statistical analysis: (1) formulate a hypothesis, (2) select an appropriate statistical test, (3) conduct a power analysis, (4) prepare data for analysis, (5) start with descriptive statistics, (6) check assumptions of tests, (7) run the analysis, (8) examine the statistical model, (9) report the results and (10) evaluate threats to validity of the statistical analysis. Researchers in family medicine and community health can follow specific steps to ensure a systematic and rigorous analysis.

Investigators in family medicine and community health often employ quantitative research to address aims that examine trends, relationships among variables or comparisons of groups (Fetters, 2019, this issue). Quantitative research involves collecting structured or closed-ended data, typically in the form of numbers, and analysing that numeric data to address research questions and test hypotheses. Research hypotheses provide a proposition about the expected outcome of research that may be assessed using a variety of methodologies, while statistical hypotheses are specific statements about propositions that can only be tested statistically. Statistical analysis requires a series of steps beginning with formulating hypotheses and selecting appropriate statistical tests. After preparing data for analysis, researchers then proceed with the actual statistical analysis and finally report and interpret the results.

Family medicine and community health researchers often limit their analyses to descriptive statistics—reporting frequencies, means and standard deviation (SD). While sometimes an appropriate stopping point, researchers may be missing opportunities for more advanced analyses. For example, knowing that patients have favourable attitudes about a treatment may be important and can be addressed with descriptive statistics. On the other hand, finding that attitudes are different (or not) between men and women and that difference is statistically significant may give even more actionable information to healthcare professionals. The latter question, about differences, can be addressed through inferential statistical tests. The purpose of this article is to provide an accessible introduction to foundational statistical procedures and present the steps of data analysis to address research questions and meet standards for scientific rigour. It is aimed at individuals new to research with less familiarity with statistics and may be helpful information when reading research or conducting peer review.

Foundational statistical techniques

Statistical analysis is a method of aggregating numeric data and drawing inferences about variables. Statistical procedures may be broadly classified into (1) statistics that describe data—descriptive statistics; and (2) statistics that make inferences about more general situations beyond the actual data set—inferential statistics.

Descriptive statistics

Descriptive statistics aggregate data that are grouped into variables to examine typical values and the spread of values for each variable in a data set. Statistics summarising typical values are referred to as measures of central tendency and include the mean, median and mode. The spread of values is represented through measures of variability, including the variance, SD and range. Together, descriptive statistics provide indicators of the distribution of data, or the frequency of values through the data set as in a histogram plot. Table 1 summarises commonly used descriptive statistics. For consistency, I use the terms independent variable and dependent variable, but in some fields and types of research such as correlational studies the preferred terms may be predictor and outcome variable. An independent variable influences, affects or predicts a dependent variable .

Category | Statistic | Description of calculation | Intent
Measures of central tendency | Mean | Total of values divided by the number of values. | Describe all responses with the average value.
Measures of central tendency | Median | Arrange all values in order and determine the halfway point. | Determine the middle value among all values, which is important when dealing with extreme outliers.
Measures of central tendency | Mode | Examine all values and determine which one appears most frequently. | Describe the most common value.
Measures of variability | Variance | Calculate the difference of each value from the mean, square this difference score, sum all of the squared difference scores and divide by the number of values minus 1. | Provide an indicator of spread.
Measures of variability | Standard deviation | Square root of variance. | Give an indicator of spread by reporting on average how much values differ from the mean.
Measures of variability | Range | The difference between the maximum and minimum value. | Give a very general indicator of spread.
Measures of variability | Frequencies | Count the number of occurrences of each value. | Provide a distribution of how many times each value occurs.
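
As a minimal sketch, the statistics in table 1 can be computed in a few lines of Python (the scores below are invented for illustration):

```python
import numpy as np
from statistics import mode

# Hypothetical sample: satisfaction scores on a 1-5 scale
scores = np.array([3, 4, 4, 5, 2, 4, 3, 5, 4, 1])

print("Mean:", np.mean(scores))             # total of values / number of values
print("Median:", np.median(scores))         # middle value when sorted
print("Mode:", mode(scores))                # most frequent value
print("Variance:", np.var(scores, ddof=1))  # sample variance (divides by n - 1)
print("SD:", np.std(scores, ddof=1))        # square root of the variance
print("Range:", np.ptp(scores))             # maximum minus minimum
```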

Inferential statistics: comparing groups with t tests and ANOVA

Inferential statistics are another broad category of techniques that go beyond describing a data set. Inferential statistics can help researchers draw conclusions from a sample to a population. 1 We can use inferential statistics to examine differences among groups and the relationships among variables. Table 2 presents a menu of common, fundamental inferential tests. Remember that even more complex statistics rely on these as a foundation.

Inferential statistics

Statistic | Intent
t tests | Compare groups to examine whether the difference in means between two groups is statistically significant.
Analysis of variance | Compare groups to examine whether differences in means among two or more groups are statistically significant.
Correlation | Examine whether there is a relationship or association between two or more variables.
Regression | Examine how one or more variables predict another variable.

The t test is used to compare two group means by determining whether group differences are likely to have occurred randomly by chance or systematically indicating a real difference. Two common forms are the independent samples t test, which compares means of two unrelated groups, such as means for a treatment group relative to a control group, and the paired samples t test, which compares means of related groups, such as the pretest and post-test scores for the same individuals before and after a treatment. A t test is essentially determining whether the difference in means between groups is larger than the variability within the groups themselves.
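
A brief sketch of both forms with scipy; the groups and scores are invented, not taken from the exemplar study:

```python
from scipy import stats

# Independent samples t test: two unrelated groups (eg, treatment vs control)
treatment = [72, 75, 78, 71, 80, 74]
control = [70, 69, 73, 68, 71, 72]
t_ind, p_ind = stats.ttest_ind(treatment, control)

# Paired samples t test: the same individuals before and after a treatment
pre = [60, 62, 58, 65, 61]
post = [66, 65, 64, 70, 63]
t_rel, p_rel = stats.ttest_rel(pre, post)

print(f"Independent: t = {t_ind:.2f}, p = {p_ind:.3f}")
print(f"Paired: t = {t_rel:.2f}, p = {p_rel:.3f}")
```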

Another fundamental set of inferential statistics falls under the general linear model and includes analysis of variance (ANOVA), correlation and regression. To determine whether group means are different, use the t test or the ANOVA. Note that the t test is limited to two groups, but the ANOVA is applicable to two or more groups. For example, an ANOVA could examine whether a primary outcome measure—dependent variable—is significantly different for groups assigned to one of three different interventions. The ANOVA result is an F statistic along with a p value or confidence interval (CI), which tells whether there is some significant difference among groups. We then need to use other statistics (eg, planned comparisons or a Bonferroni comparison, to give two possibilities) to determine which of those groups are significantly different from one another. Planned comparisons are established before conducting the analysis to contrast the groups, while other tests like the Bonferroni comparison are conducted post-hoc (ie, after analysis).
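
A sketch of a one-way ANOVA on three invented groups, with Tukey's post-hoc test (one common alternative to the Bonferroni comparison) run only if the omnibus test is significant:

```python
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical outcome scores for three intervention groups
a = [24, 27, 22, 26, 25]
b = [30, 31, 28, 33, 29]
c = [25, 24, 27, 23, 26]

f_stat, p_value = stats.f_oneway(a, b, c)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Only if the omnibus F test is significant, examine which groups differ
if p_value < 0.05:
    result = pairwise_tukeyhsd(endog=a + b + c,
                               groups=["a"] * 5 + ["b"] * 5 + ["c"] * 5,
                               alpha=0.05)
    print(result)
```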

Examining relationships using correlation and regression

The general linear model contains two other major methods of analysis, correlation and regression. Correlation reveals whether values between two variables tend to systematically change together. Correlation analysis has three general outcomes: (1) the two variables rise and fall together; (2) as values in one variable rise, the other falls; and (3) the two variables do not appear to be systematically related. To make those determinations, we use the correlation coefficient (r) and related p value or CI. First, use the p value or CI, as compared with established significance criteria (eg, p<0.05), to determine whether a relationship is even statistically significant. If it is not, stop as there is no point in looking at the coefficients. If so, move to the correlation coefficient.

A correlation coefficient provides two very important pieces of information—the strength and direction of the relationship. An r statistic can range from −1.0 to +1.0. Strength is determined by how close the value is to −1.0 or 1.0. Either extreme indicates a perfect relationship, while a value of 0 indicates no relationship. Cohen provides guidance for interpretations: 0.1 is a weak correlation, 0.3 is a medium correlation and 0.5 is a large correlation. 1 2 These interpretations must be considered in the context of the study and relative to the literature. The valence (+ or −) of the coefficient reveals the direction of the relationship. A negative coefficient means that as one value rises, the other tends to fall, and a positive coefficient means that the values of the two variables tend to rise and fall together.
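
A minimal sketch with scipy, using invented paired measurements:

```python
from scipy import stats

# Hypothetical paired measurements, eg, minutes of exercise and a wellness score
exercise = [10, 20, 30, 40, 50, 60]
wellness = [52, 55, 61, 60, 68, 71]

r, p = stats.pearsonr(exercise, wellness)
print(f"r = {r:.2f}, p = {p:.4f}")
# Check significance (p) first; only then interpret the strength and
# direction of r, eg, r of 0.5 or above is a large positive correlation.
```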

Regression adds an additional layer beyond correlation that allows predicting one value from another. Assume we are trying to predict a dependent variable (Y) from an independent variable (X). Simple linear regression gives an equation (Y = b0 + b1X) for a line that we can use to predict one value from another. The three major components of that prediction are the constant (ie, the intercept represented by b0), the systematic explanation of variation (b1), and the error, which is a residual value not accounted for in the equation 3 but available as part of our regression output. To assess a regression model (ie, model fit), examine key pieces of the regression output: (1) the F statistic and its significance to determine whether the model systematically accounts for variance in the dependent variable; (2) the R square value for a measure of how much variance in the dependent variable is accounted for by the model; (3) the significance of coefficients for each independent variable in the model; and (4) residuals to examine random error in the model. Other factors, such as outliers, are potentially important (see Field 4).
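
A simple linear regression sketch with statsmodels on invented data; the printed pieces map onto the four checks above:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical independent (X) and dependent (Y) variables
X = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
Y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.0])

X_design = sm.add_constant(X)        # adds the intercept term b0
model = sm.OLS(Y, X_design).fit()

print(f"F = {model.fvalue:.1f}, p = {model.f_pvalue:.4f}")  # overall model fit
print(f"R squared = {model.rsquared:.3f}")                  # variance explained
print("Coefficients:", model.params)       # b0 (intercept) and b1 (slope)
print("Coefficient p values:", model.pvalues)
print("Residuals:", model.resid)           # error not accounted for by the line
```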

The aforementioned inferential tests are foundational to many other advanced statistics that are beyond the scope of this article. Inferential tests rely on foundational assumptions, including that data are normally distributed, observations are independent, and generally that our dependent or outcome variable is continuous. When data do not meet these assumptions, we turn to non-parametric statistics (see Field 4 ).

A brief history of foundational statistics

Prominent statisticians Karl Pearson and Ronald A Fisher developed and popularised many of the basic statistics that remain a foundation for statistics today. Fisher’s ideas formed the basis of null hypothesis significance testing that sets a criterion for confidence or probability of an event. 4 Among his contributions, Fisher also developed the ANOVA. Pearson’s correlation coefficient provides a way to examine whether two variables are related. The correlation coefficient is denoted by r for a relationship between two variables or R for relationships among more than two variables as in multiple correlation or regression. 4 William Gosset developed the t distribution and later the t test as a way to examine whether two values of means were statistically different. 5

Statistical software

While the aforementioned statistics can be calculated manually, researchers typically use statistical software that processes data, calculates statistics and p values, and supplies a summary output from the analysis. However, the programs still require an informed researcher to run the correct analysis and interpret the output. Several available programs include SAS, Stata, SPSS and R. Try using the programs through a demonstration or trial period before deciding which one to use. It also helps to know or have access to others using the program should you have questions.

Example study

The remainder of this article presents steps in statistical analysis that apply to many techniques. A recently published study on communication skills to break bad news to a patient with cancer provides an exemplar to illustrate these steps. 6 In that study, the team examined the validity of a competence assessment of communication skills, hypothesising that after receiving training, post-test scores would be statistically improved from pretest scores on the same measure. Another analysis was to examine pretest sensitisation, tested through a hypothesis that a group randomly assigned to receive a pretest and post-test would not be significantly different from a post-test-only group. To test the hypotheses, Guetterman et al 6 examined whether mean differences were statistically significant by applying t tests and ANOVA.

Steps in statistical analysis

Statistical analysis might be considered in 10 related steps. These steps assume necessary background activities, such as conducting a literature review and writing clear research questions or aims, are already complete.

Step 1. Formulate a hypothesis to test

In statistical analysis, we test hypotheses. Therefore, it is necessary to formulate hypotheses that are testable. A hypothesis is specific, detailed and congruent with statistical procedures. A null hypothesis gives a prediction and typically uses words like ‘no difference’ or ‘no association’. 7 For example, we may hypothesise that group means on a certain measure are not significantly different and test that with an ANOVA or t test. In the exemplar study, one of the hypotheses was ‘MPathic-VR scores will improve (decreased score reflects better performance) from the preseminar test to the postseminar test based on exposure to the [breaking bad news] BBN intervention’ (p508), which was tested with a t test. 6 Hypotheses about relationships among variables could be tested with correlation and regression. Ultimately, hypotheses are driven by the purpose or aims of a study and further subdivide the purpose or aims into aspects that are specific and testable. When forming hypotheses, a concern is that having too many dependent variables leads to multiple tests of the same data set. This concern, called multiple comparisons or multiplicity, can inflate the likelihood of finding a significant relationship when none exists. Conducting fewer tests and adjusting the p value are ways to mitigate the concern.
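
Where several tests are run on the same data set, the p values can be adjusted; a sketch of a Bonferroni correction with statsmodels, on invented p values:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p values from four tests run on the same data set
p_values = [0.012, 0.049, 0.003, 0.041]

reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05,
                                         method="bonferroni")
print("Adjusted p values:", p_adjusted)  # each raw p multiplied by 4, capped at 1
print("Still significant after adjustment:", reject)
```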

Step 2. Select a test to run based on research questions or hypotheses

The statistical test must match the intended hypothesis and research question. Descriptive statistics allow us to examine trends limited to typical values, spread of values and distributions of data. ANOVAs and t tests are methods to test whether means are statistically different among groups and what those differences are. In the exemplar study, the authors used paired samples t tests for pre–post scores with the same individuals and independent t tests for differences among groups. 6

Correlation is a method to examine whether two or more variables are related to one another, and regression extends that idea by allowing us to fit a line to make predictions about one variable based on a linear relationship to another. These statistical tests alone do not determine cause and effect, but merely associations. Causal inferences can only be made with certain research designs (eg, experiments) and perhaps with advanced statistical techniques (eg, propensity score analysis). Table 3 provides guidance for determining which statistical test to use.

Choosing and interpreting statistics for studies common in primary care

I want to | Statistical choice | Independent variable | Dependent variable | How to interpret
Examine trends or distributions. | Descriptive statistics | Categorical or continuous | Categorical or continuous | Report the statistic as is to describe the data set.
Compare group means. | t tests | Categorical with two levels (ie, two groups) | Continuous | Examine the t statistic and significance level. If significant, clearly report which group mean is higher, along with the effect size.
Compare group means. | Analysis of variance | Categorical with two or more levels (ie, two or more groups) | Continuous | Examine the F statistic and significance level. If significant, clearly report which group means are significantly different and how (eg, which are higher), along with the effect size.
Examine whether variables are associated. | Correlation | Continuous | Continuous | Examine the r statistic and significance level. If significant, describe whether the correlation is positive or negative and its strength.
Gain a detailed understanding of the association of variables and use one or more variables to predict another. | Regression | Continuous or categorical; may have more than one independent variable in multiple regression | Continuous | Examine the F statistic and significance level. If significant, examine the R square for how much variance the model accounts for, and determine whether each regression coefficient is significant; if significant, discuss the coefficients.

Step 3. Conduct a power analysis to determine a sample size

Before conducting analysis, we need to ensure that we will have an adequate sample size to detect an effect. Sample size relates to the concept of power: detecting a smaller effect requires a larger sample. Sample size is determined through a power analysis. The determination of sample size is never a simple percentage of the population, but a calculated number based on the planned statistical tests, significance level and effect size. 8 I recommend using G*Power for basic power calculations, although many other options are available. In the exemplar study, the authors did not report a power analysis prior to conducting the study, but they gave a post-hoc power analysis of the actual power based on their sample size and the effect size detected. 6
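
The same basic calculation that G*Power performs can also be sketched with statsmodels; here, the per-group sample size needed to detect a medium effect (d = 0.5) at the conventional 5% significance level with 80% power:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # medium effect (Cohen's d)
                                   alpha=0.05,       # significance criterion
                                   power=0.80)       # desired power
print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64
```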

Step 4. Prepare data for analysis

Data often need cleaning and other preparation before conducting analysis. Problems requiring cleaning include values outside of an acceptable range and missing values. Any particular value could be wrong because of a data entry error or data collection problem. Visually inspecting data can reveal anomalies. For example, an age value of 200 is clearly an error, or a value of 9 on a 1–5 Likert-type scale is an error. An easy way to start inspecting data is to sort each variable by ascending values and then descending values to look for atypical values. Then, try to correct the problem by determining what the value should be. Missing values are a more complicated problem because a concern is why the value is missing. A few missing values at random is not necessarily a concern, but a pattern of missing values (eg, individuals from a specific ethnic group tend to skip a certain question) indicates a systematic missingness that could indicate a problem with the data collection instrument. Descriptive statistics are an additional way to check for errors and ensure data are ready for analysis. While not discussed in the communication assessment exemplar, the authors did prepare data for analysis and report missing values in their descriptive statistics.
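
A sketch of these checks with pandas, on an invented data set containing the kinds of errors described:

```python
import pandas as pd

# Hypothetical data with a data entry error and a missing value
df = pd.DataFrame({
    "age": [34, 200, 58, 41, 29],        # 200 is clearly an entry error
    "satisfaction": [4, 3, None, 5, 9],  # 1-5 scale, so 9 is out of range
})

# Flag values outside acceptable ranges
print(df[~df["age"].between(0, 120)])
print(df[~df["satisfaction"].between(1, 5)])  # flags the 9 (and the missing value)

# Count missing values per variable and inspect for patterns
print(df.isna().sum())

# Sorting each variable also surfaces atypical values at either end
print(df.sort_values("age"))
```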

Step 5. Always start with descriptive statistics

Before running inferential statistics, it is critical to first describe the data. Obtaining descriptive statistics is a way to check whether data are ready for further analysis. Descriptive statistics give a general sense of trends and can illuminate errors by reviewing frequencies, minimums and maximums that can indicate values outside of the accepted range. Descriptive statistics are also an important step to check whether we meet assumptions for statistical tests. In a quantitative study, descriptive statistics also inform the first table of the results that reports information about the sample, as seen in table 2 of the exemplar study. 6

Step 6. Check assumptions of statistical tests

All statistical tests rely on foundational assumptions. Although some tests are more robust to violations, checking assumptions indicates whether the test is likely to be valid for a particular data set. Foundational parametric statistics (eg, t tests, ANOVA, correlation, regression) assume independent observations and a normal linear distribution of data. In the exemplar study, the authors noted ‘Data from both groups met normality assumptions, based on the Shapiro–Wilk test’ (p508), and gave the statistics in addition to noting specific assumptions for the independent t tests around equality of variances. 6
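
Both checks noted in the exemplar can be sketched with scipy: the Shapiro–Wilk test for normality and, as one common choice for equality of variances, Levene's test (the group data are invented):

```python
from scipy import stats

group_a = [72, 75, 78, 71, 80, 74, 69, 77]
group_b = [70, 69, 73, 68, 71, 72, 74, 66]

# Shapiro-Wilk: a significant p (< 0.05) suggests the data are not normal
w, p_normal = stats.shapiro(group_a)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p_normal:.3f}")

# Levene: a significant p suggests the groups have unequal variances
stat, p_var = stats.levene(group_a, group_b)
print(f"Levene: W = {stat:.3f}, p = {p_var:.3f}")
```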

Step 7. Run the analysis

Conducting the analysis involves running whatever tests were planned. Statistics may be calculated manually or using software like SPSS, Stata, SAS or R. Statistical software provides an output with key test statistics, p values that indicate whether a result is likely systematic or random, and indicators of fit. In the exemplar study, the authors noted they used SPSS V.22. 6

Step 8. Examine how well the statistical model fits

Begin by examining whether the statistical model was significant or a good fit. For t tests, ANOVAs, correlation and regression, first examine an overall test of significance. For a t test, if the t statistic is not statistically significant (eg, p>0.05 or a CI crossing 0), we can conclude there is no significant difference between groups. The communication assessment exemplar reports significance of the t tests along with measures such as equality of variance.

For an ANOVA, if the F statistic is not statistically significant (eg, p>0.05 or a CI crossing 0), we can conclude no significant difference between groups and stop because there is no point in further examining what groups may be different. If the F statistic is significant in an ANOVA, we can then use contrasts or post-hoc tests to examine what is different. For a correlation test, if the r value is not statistically significant (eg, p>0.05 or a CI crossing 0), we can stop because there is no point in looking at the magnitude or direction of the coefficient. If it is significant, we can proceed to interpret the r. Finally, for a regression, we can examine the F statistic as an omnibus test and its significance. If it is not significant, we can stop. If it is significant, then examine the p value of each independent variable and residuals.

Step 9. Report the results of statistical analysis

When writing statistical results, always start with descriptive statistics and note whether assumptions for tests were met. When reporting inferential statistical tests, give the statistic itself (eg, an F statistic), the measure of significance (p value or CI), the effect size and a brief written interpretation of the statistical test. The interpretation, for example, could note that an intervention was not significantly different from the control or that it was associated with improvement that was statistically significant. For example, the exemplar study gives the pre–post means along with standard errors, t statistic, p value and an interpretation that postseminar means were lower, along with a reminder to the reader that lower is better. 6

When writing for a journal, follow the journal’s style. Many styles italicise non-Greek statistics (eg, the p value), but follow the particular instructions given. Remember a p value can never be 0 even though some statistical programs round the p to 0. In that case, most styles prefer to report as p<0.001.
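
A tiny hypothetical helper for that reporting convention:

```python
def format_p(p):
    """Format a p value for reporting; never report p = 0."""
    if p < 0.001:
        return "p<0.001"
    return f"p={p:.3f}"

print(format_p(0.0000004))  # p<0.001
print(format_p(0.0421))     # p=0.042
```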

Step 10. Evaluate threats to statistical conclusion validity

Shadish et al 9 provide nine threats to statistical conclusion validity in drawing inferences about the relationship between two variables; the threats can broadly apply to many statistical analyses. Although it helps to consider and anticipate these threats when designing a research study, some only arise after data collection and analysis. Threats to statistical conclusion validity appear in table 4 . 9 Pertinent threats can be dealt with to the extent possible (eg, if assumptions were not met, select another test) and should be discussed as limitations in the research report. For example, in the exemplar study, the authors noted the sample size as a limitation but reported that a post-hoc power analysis found adequate power. 6

Threats to statistical conclusion validity

Threat | Description
Low statistical power (see step 3) | The sample size is not adequate to detect an effect.
Violated assumptions of statistical tests (see step 6) | The data violate assumptions needed for the test, such as normality.
Fishing and error rates | Repeated tests of the same data (eg, multiple comparisons) increase the chances of errors in conclusions.
Unreliability of measures | Error in measurement or instruments can artificially inflate or decrease apparent relationships among variables.
Restricted range | Statistics can be biased by limited outcome values (eg, high/low only) or floor or ceiling effects in which participants’ scores are clustered around high or low values.
Unreliability of treatment implementation | In experiments, unstandardised or inconsistent implementation affects conclusions about correlation.
Extraneous variance in an experiment | The setting of a study can introduce error.
Heterogeneity of units | As participants differ within conditions, standard deviation can increase and introduce error, making it harder to detect effects.
Inaccurate effect size estimation | Outliers or incorrect effect size calculations (eg, a continuous measure for a dichotomous dependent variable) can skew measures of effect.

Key resources to learn more about statistics include Field 4 and Salkind 10 for foundational information. For advanced statistics, Hair et al 11 and Tabachnick and Fidell 12 provide detailed information on multivariate statistics. Finally, the University of California Los Angeles Institute for Digital Research and Education (stats.idre.ucla.edu/other/annotatedoutput/) provides annotated output from SPSS, SAS, Stata and Mplus for many statistical tests to help researchers read the output and understand what it means.

Researchers in family medicine and community health often conduct statistical analyses to address research questions. Following specific steps ensures a systematic and rigorous analysis. Knowledge of these essential statistical procedures will equip family medicine and community health researchers to interpret the literature, conduct peer review and carry out appropriate statistical analysis of their quantitative data.

Nevertheless, I gently remind you that the steps are interrelated, and statistics is not only a consideration at the end of data collection. When designing a quantitative study, investigators should remember that statistics is based on distributions, meaning statistics works with aggregated numerical data and relies on variance within that data to test statistical hypotheses about group differences, relationships or trends. Statistics provides a broad view, based on these distributions, which brings implications at the early design phase. In designing a quantitative study, the nature of statistics generally suggests a larger number of participants in the research (ie, a larger n) to have adequate power to detect statistical significance and draw valid conclusions. Therefore, it will likely be helpful for researchers to include a biostatistician as early as possible in the research team when designing a study.

Contributors: The sole author, TCG, is responsible for the conceptualisation, writing and preparation of this manuscript.

Funding: This study was funded by the National Institutes of Health (10.13039/100000002) and grant number 1K01LM012739.

Competing interests: None declared.

Patient consent for publication: Not required.

Provenance and peer review: Not commissioned; internally peer reviewed.


Expand community-based research to make clinical trials more diverse

By Robert Metcalf and Jeffrey Francer, July 10, 2024


Innovations in clinical trial designs and tools have the potential to unlock a new era of research that is more convenient for patients, more reflective of real-world treatment conditions, and more likely to enable participation of a diverse set of individuals. But a recent study reveals how far the U.S. is from realizing this potential: regions of the country with the worst social drivers of health are the least likely to host clinical trials.

The disconnect between need and where clinical trials are conducted is a longstanding one. But it was recently highlighted by University of Michigan researchers through an examination of demographic data for people enrolled in clinical trials for new cancer medicines. The most socially vulnerable counties were far less likely to have any nearby trial, a disparity that has worsened over time.


Research sponsors and the Food and Drug Administration can respond to this challenge by continuing to support community-based clinical trials. But the regulatory framework that governs these and other modern approaches has not kept pace with innovations in clinical trials and must be updated to enable expansion of trials into more communities.

Clinical trials are essential for establishing the safety and effectiveness of new medicines. Trial results have a greater impact when participants reflect the demographic diversity of those who could potentially benefit from the treatments being evaluated. The University of Michigan research is one more confirmation that the U.S.’s existing clinical trial infrastructure often fails to meet these goals.

Designing and implementing clinical trials is hard work. Reports over time suggest that up to 85% of clinical trials don’t meet their recruitment goals and up to 80% are delayed due to recruitment challenges.

Large-scale clinical trials are typically hosted by large research hospitals and academic institutions, most of which are in big cities. This can exclude people in rural communities from participating in clinical trials, and can present logistical challenges even for individuals who live relatively close to these centers because they may not have the economic means or schedule flexibility to attend multiple appointments.

Today’s clinical trial regulations were created for a different era, when the technology of the time necessitated that studies be conducted at a single location under the direct supervision of an investigator and staff who carried out all aspects of the trial. Participants were required to come to that location. Clinical research still largely relies on this outmoded approach, which frequently requires participants to organize their lives around the trial, and often includes traveling, finding a place to stay, and taking time off from work.

New tools and approaches developed by clinical trial sponsors, working with the FDA, should help make trials more representative. The FDA has signaled an openness to supporting trial designs that make them more accessible for participants, more reflective of real-world conditions, and enable more diverse participation. This modernization of the regulatory framework is critically needed and will contribute to healthier communities by speeding the development of new and better treatments that address unmet medical needs.

Community-based trials, also known as decentralized trials, have the potential to significantly increase participation and diversity in clinical research. By forcing a shift to this model, the Covid-19 pandemic showed just how successful these types of studies can be. To help ensure studies could continue during the pandemic, investigators, trial sponsors, and regulators worked collaboratively during the nationwide shutdown to reverse the process, bringing trials to participants rather than participants to trials.

Lilly, the company we work for, partnered with a leading decentralized research organization to bring our Covid-19 research to at-risk patients in long-term care facilities. An innovative cloud-based system helped recruit participants across multiple sites and make adjustments as needed in real time.

This model allowed Lilly to move quickly, reach more people who were traditionally underrepresented in clinical trials, and protect the health of participants and trial staff during the pandemic, all while maintaining the highest standards of scientific research, patient safety, and data integrity. To be sure, Lilly wasn’t alone in doing this: companies across the biopharmaceutical industry can share similar stories of leveraging innovative, community-based approaches to keep clinical trials running during the Covid-19 crisis.

These updated approaches shouldn’t fade away with the pandemic. Drug developers, investigators, and regulators must build on what was learned. Several key updates to the U.S. clinical trial regulatory framework will be crucial to supporting this progress:

Ease the burden on clinical trial investigators. Enabling better support from sponsor staff can create efficiencies and fill resource needs for community-based providers. Local health care professionals are essential to the success of community-based trials, but most of them do not have the resources or infrastructure to manage many of the demands of clinical studies, such as recruiting participants, providing them with logistical support, and shipping investigational products to them. Trial sponsor staff can perform tasks like these that involve limited or no contact with participants, avoiding conflicts of interest. Current regulatory rules, however, provide little guidance on what types of sponsor roles are appropriate, which creates uncertainty for sponsors that can discourage such support.

Update the role of investigators. The shift in clinical trial services to multiple care settings, such as community clinics, mobile medical units, and participants’ homes, must be accompanied by updating how clinical trial investigators provide oversight of these settings. Current regulations state that an investigator must personally conduct or supervise a trial. This requirement can create confusion for a community-based study that includes multiple care settings in numerous communities.

To better accommodate community-based trials without compromising patient safety or data integrity, FDA regulations should be updated to clarify that trial investigators may provide oversight by ensuring that study staff such as local health care providers are appropriately qualified and trained for the trial-related activities they will perform. Such assurance could include confirming proper education and qualifications and meeting state licensing requirements.

Current regulations also state that investigators may administer an investigational product only to study participants they personally supervise. Such regulations do not lend themselves to the flexibility needed to enable community-based research, where patients can receive clinical trial services in many types of settings.

Consistently support the use of digital health technologies. Wearable devices and other advances can help make trials more convenient for participants by enabling remote collection of data from them in real time as they go about their daily lives. This convenience can promote diversity by reducing the number of clinic visits needed, making it possible for people whose income, work, or travel issues would prevent multiple in-person visits to participate in trials. Yet current FDA guidance lacks clarity on what evidence is needed to validate the use of digital health technologies. A modernized approach for qualifying digital health technologies is needed. Sponsors of new drug trials are currently encouraged to use the drug development tools pathway, which was not designed for digital health technologies and can be cumbersome and complicated for this use.

It also is not clear how digital health technologies will be reviewed when multiple FDA divisions or offices are involved. Providing greater clarity on the evidence required for validation and on cross-agency standards will support acceleration of the application of digital health technologies, further enabling community-based clinical trials.

By the end of this decade, we believe that community-based clinical trials will become the norm, not the outlier. To achieve this, all clinical trial stakeholders — including the FDA, drug developers, and investigators — must work together to foster a patient-centric clinical trial culture that embraces innovation and brings trials closer to potential participants. The result will be a win for everyone.

Robert Metcalf, Ph.D., is group vice president for clinical design, delivery and analytics, China and Japan medical, for Lilly. Jeffrey Francer, J.D., is Lilly’s vice president, head of global regulatory policy and strategy.



Realtor.com Economic Research


June 2024 Monthly Housing Market Trends Report

Ralph McLaughlin

  • The number of homes actively for sale was notably higher compared with last year, growing by 36.7%, an eighth straight month of growth.
  • The total number of unsold homes, including homes that are under contract, increased by 22.4% compared with last year.
  • Home sellers were more active this June, with 6.3% more homes newly listed on the market compared with last year and an increase from May.
  • The median price of homes for sale this June remained stable compared with last year, at $445,000; however, the median price per square foot grew by 3.4%, indicating that the inventory of smaller and more affordable homes has grown in share.
  • Homes spent 45 days on the market, which is two days more than last year.
  • The share of listings that were delisted – while slightly higher than last year – has not spiked this spring, and has recently declined to 5.7% at the end of 2024H1.

According to the Realtor.com® June housing data, the market stabilized as mortgage rates also stabilized in June due to better-than-expected CPI readings. While the median list price nationwide stayed the same as last year, homes continue to see a price increase on a per-square-foot basis. The time a typical home spends on the market increased compared to last year, as the inventory of homes for sale continued to grow, but homes were still snapped up more quickly than in pre-pandemic years. Meanwhile, although sellers—who are often buyers themselves—may be a little more disgruntled this spring due to a slower market that is requiring more price adjustment than sellers faced last spring, they are not delisting their homes at any higher rate than last year. Just 6.3% of listings were delisted in the middle of June and this rate has been relatively stable since February.


However, the total count of delistings has risen by 16.1% compared to the same time last year. How can the share of delistings remain relatively stable while the count grows so rapidly? The answer is that total inventory has also grown at about the same rate as delistings, so while a growing number of sellers have taken their homes off the market this spring, the number of sellers keeping their homes on the market has grown proportionally.*
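
To see why with round, hypothetical numbers: if 100 of 1,750 listings were delisted at this time last year (about 5.7%), and both counts grow by 16% this year, then 116 of 2,030 listings are delisted, still about 5.7%. The share moves only when delistings grow faster or slower than inventory.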

The number of homes for sale improves but is still low compared with pre-pandemic levels

There were 36.7% more homes actively for sale on a typical day in June compared with the same time in 2023, marking the eighth consecutive month of annual inventory growth and an acceleration from May, which was up 35% year-over-year. The annual growth rate has increased in each of the eight consecutive months of rising inventory. While inventory this June is much improved compared with the previous three years, it is still down 32.4% compared with typical 2017 to 2019 levels. This is a slight improvement from last month’s 34.6% gap, as inventory continues to slowly grow toward normalcy.


In June, as in the previous four months, growth in the number of homes priced in the $200,000 to $350,000 range outpaced all other price categories, with inventory in this range growing by 50.0% compared with last year, surpassing even last month’s high 45.1% growth rate. This increase is again primarily fueled by a greater availability of smaller and more affordable homes in the South.

The total number of homes for sale, including homes that were under contract but not yet sold, increased by 22.4% compared with last year, growing on an annual basis for the seventh month in a row and eclipsing last month’s rate of 20.6%.


The number of homes under contract but not yet sold (pending listings) increased by 2.4%, unchanged from last month’s rate. After reports of consumer price growth flattening in May, mortgage rates fell sharply in June on expectations that the Federal Reserve will cut rates at the end of the year. Back in April we predicted that the growth in pending listings would slow; this materialized in May, and growth idled in June. However, with rates falling and inventory growing, it is possible that sales could accelerate slightly in June’s reported numbers, after they declined by 0.7% in May.


However, sellers continued to list their homes in higher numbers this June as newly listed homes were 6.3% above last year’s levels and higher than May’s figure of 5.9%. This marks the eighth month of increasing listing activity after a 17-month streak of decline. Two factors have made listing activity more sensitive to changes in mortgage rates. First, many sellers are themselves also homebuyers. Second, many potential sellers with a current mortgage have a rate well below today’s market rate, with 87% of outstanding mortgage loans at a sub-6% rate. The decrease in mortgage rates seen in June likely contributed to an increased pace of growth in listing activity. We expect selling activity to continue to normalize as rates inch their way down over the next year.


Regional and Metro Area Inventory Trends

The South and West Are Closest to Bridging the Inventory Gap

In June, all four regions saw active inventory grow over the previous year. The South saw listings grow by 48.9%, while inventory grew by 35.8% in the West, 21.5% in the Midwest, and only 12.5% in the Northeast. Compared with the typical June from 2017 to 2019 before the COVID-19 Pandemic, the South saw the smallest gap in inventory, down 17.2% compared with pre-pandemic levels, while the gap was 22.8% in the West, and much larger in the Midwest and Northeast, at 48.6% and 57.1%, respectively.


The inventory of homes for sale increased in all of the largest 50 metros compared with last year. Metros that saw the most inventory growth included Tampa (+93.1%), Orlando (+81.5%), and Denver (+77.9%).

Despite higher inventory growth compared with last year, most metros still had a lower level of inventory when compared with pre-pandemic years. Among the 50 largest metro areas, eight metros saw higher levels of inventory in June compared with typical 2017 to 2019 levels, down from 11 metros last month. The top metros that saw inventory surpass pre-pandemic levels were predominantly in the South and West and included Austin (+41.2%), Memphis (+24.9%), and San Antonio (+24.0%).

The West saw newly listed homes increase the most compared with last year

Compared with June 2023, newly listed home inventory increased the most in the West, by 9.8%, whereas new inventory grew by 7.6% in the South, 2.1% in the Northeast, and 0.7% in the Midwest. The gap in newly listed homes compared with pre-pandemic 2017 to 2019 levels was the lowest in the South, where newly listed homes were 14.9% below pre-pandemic levels. In comparison, they were down 30.2% in the West, 29.5% in the Midwest, and 32.7% in the Northeast.

In June, 41 of the 50 largest metros saw new listings increase over the previous year, up from 38 last month. However, two large metros saw more newly listed homes this June compared with the typical pace of new listings from June 2017 to 2019 before the pandemic: San Antonio (+8.9%) and Jacksonville (+11.2%). The metros that saw the largest growth in newly listed homes compared with last year included Seattle (+30.5%), San Jose (+26.5%), and San Antonio (+21.8%). 

Homes are spending more time on the market compared with last year, but less than pre-pandemic levels

The typical home spent 45 days on the market this June, which is two days more than the same time last year and one more day than last month. June marks the third month in a row where homes spent more time on the market compared with the previous year as inventory continues to grow and home sales remain sluggish. However, the time a typical home spends on the market is more than a week (8 days) less than the average June from 2017 to 2019.


Regional and metro area time on the market trends

In the South, where the growth in home inventory has been the largest, the typical home spent five more days on the market in June compared with last year, while out West homes are staying on the market three days longer. However, in the Midwest (-1 day) and Northeast (-4 days), homes are still spending less time on the market than last year.

While all regions are still seeing time on the market below pre-pandemic levels, in the West, homes are spending only one day less on the market compared with the typical June from 2017 to 2019. Time on the market was eight days less than pre-pandemic levels in the South, 10 days less in the Midwest, and 15 days less in the Northeast.

Meanwhile, time on the market decreased compared with last year in 26 of the 50 largest metro areas this June, down from 30 markets last month. It decreased the most in San Jose, Chicago, and Providence (-9 days). Time on the market increased compared with last year in 22 of the 50 largest metros, including Phoenix (+14 days), Tampa (+8 days), and Jacksonville (+7 days). Several predominantly Western markets saw homes spend more time on the market than typical 2017 to 2019 pre-pandemic timing, including Austin (+6 days), Portland (+4 days), and Oklahoma City (+1 day).

The median list price remained stable compared with last June, but the price per square foot continues to rise

The national median list price continued to increase seasonally, to $445,000 in June compared with $440,000 in May, and the median list price remained stable compared with the same time last year, when it was also $445,000. However, when a change in the mix of inventory toward smaller homes is accounted for, the typical home listed this year has increased in asking price compared with last year. The median listing price per square foot increased by 3.4% in June compared with the same time last year. Moreover, the typical listed home price has grown by 39.1% compared with June 2019, while the price per square foot grew by 52.6%.


While the percentage of homes with price reductions increased from 14.1% in June of last year to 18.3% this year, that share is only a little higher (+1.3 percentage points) than the shares seen from June 2017 to June 2019.


Regional and metro area price trends

In June, listing prices fell on a year-over-year basis in the South (-1.8%), where competitive home inventory has grown the most, but prices continued to increase in the Northeast (+5.6%), Midwest (+3.0%), and West (+1.4%) compared with the same time last year. Controlling for the mix of homes on the market by looking at price per square foot, prices in all regions showed greater growth rates of 2.6% to 7.2%. Among large metros, the median list price in Cleveland (+14.7%), Philadelphia (+11.3%), and Rochester (+9.3%) saw the biggest increases.

Meanwhile, all 50 large metropolitan areas have seen sizable price growth compared with homes listed before the pandemic. Compared with June 2019, the price per square foot growth rate in the largest 50 metros ranged from 24.4% to 81.9%. The markets where sellers saw the greatest increase in price per square foot included the New York metro area (+81.9% vs June 2019), Boston (+67.7%), and Tampa (+67.7%). Markets that saw the lowest growth included San Jose (+24.4%), Baltimore (+24.6%), and New Orleans (+25.5%).

The share of price reductions was up compared with last year in the South (+5.1 percentage points), West (+4.5 percentage points), Midwest (+2.6 percentage points), and Northeast (+2.1 percentage points). Forty-seven of the 50 largest metros saw the share of price reductions increase compared with last June, up from 46 in May. Tampa saw the greatest increase (+10.9 percentage points), followed by Jacksonville (+9.7 percentage points), and Denver (+9.7 percentage points).

June 2024 Regional Statistics

Region | Active listing count YoY | New listing count YoY | Median list price YoY | Median list price per sq ft YoY | Median days on market YoY (days) | Price-reduced share YoY (pp)
Midwest | 21.5% | 0.7% | 3.0% | 4.3% | -1 | 2.6 pp
Northeast | 12.5% | 2.1% | 5.6% | 7.2% | -4 | 2.1 pp
South | 48.9% | 7.6% | -1.8% | 2.6% | 5 | 5.1 pp
West | 35.8% | 9.8% | 1.4% | 4.8% | 3 | 4.5 pp

June 2024 Regional Statistics vs. Pre-Pandemic 2017–19

Region | Active listing count vs 2017–19 | New listing count vs 2017–19 | Median list price vs 2017–19 | Median list price per sq ft vs 2017–19 | Median days on market vs 2017–19 (days) | Price-reduced share vs 2017–19 (pp)
Midwest | -48.6% | -29.5% | 41.6% | 45.9% | -10 | -2.2 pp
Northeast | -57.1% | -32.7% | 54.1% | 65.0% | -15 | -5.9 pp
South | -17.2% | -14.9% | 35.1% | 53.2% | -8 | 4.2 pp
West | -22.8% | -29.5% | 38.7% | 48.9% | -1 | 2.4 pp

June 2024 Housing Overview of the 50 Largest Metros  

[Metro-level table omitted: the figures for the 50 largest markets (median list price, year-over-year and vs-2019 price changes, inventory and new-listing growth, days on market, and price-reduced share) lost their header row and metro labels during extraction and cannot be reliably reconstructed.]

* Note: Some metrics for the Las Vegas, Phoenix, and Rochester metro areas are under review and unavailable.

Methodology

Realtor.com housing data as of June 2024. Listings include the active inventory of existing single-family homes and condos/townhomes/row homes/co-ops for the given level of geography on Realtor.com; new construction is excluded unless listed with an MLS that provides listing data to Realtor.com. Realtor.com data history goes back to July 2016. The 50 largest U.S. metropolitan areas are defined by the Office of Management and Budget (OMB-202003).

* Note that not all listing sources report sales of a home to Realtor.com, so to calculate an accurate estimate of delisted homes we use a subset of counties where sold listings are consistently reported. As a result, the total-inventory base used to calculate delistings differs from our overall inventory estimates. While different, the national trend in delisting activity should be more accurate than estimates based on a sample that includes sources that lack, or are inconsistent in, reporting sold listings.
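
In code, that subsetting step might look like the following minimal sketch; the column names and the consistency rule here are illustrative assumptions, not Realtor.com's actual pipeline.

```python
import pandas as pd

# Hypothetical listing-level table: one row per listing that left the
# market, with its county, exit month, and exit reason ("sold", "delisted", ...).
listings = pd.read_csv("listings.csv")  # columns: county_fips, end_month, end_reason

# Keep only counties that report sold listings in every month of the window.
n_months = listings["end_month"].nunique()
months_with_sales = (
    listings[listings["end_reason"] == "sold"]
    .groupby("county_fips")["end_month"]
    .nunique()
)
consistent_counties = months_with_sales[months_with_sales == n_months].index
subset = listings[listings["county_fips"].isin(consistent_counties)]

# Delisting share within the consistent-reporting subset. The inventory base
# differs from the overall inventory estimate, but the month-to-month trend
# is the signal of interest.
delisting_share = (
    subset.groupby("end_month")["end_reason"]
    .apply(lambda s: (s == "delisted").mean())
)
print(delisting_share.tail())
```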

At least 10% of research may already be co-authored by AI

That might not be a bad thing

“Certainly, here is a possible introduction for your topic...” began a recent article in Surfaces and Interfaces, a scientific journal. Attentive readers might have wondered who exactly that bizarre opening line was addressing. They might also have wondered whether the ensuing article, on the topic of battery technology, was written by a human or a machine.

It is a question ever more readers of scientific papers are asking. Large language models (LLMs) are now more than good enough to help write a scientific paper. They can breathe life into dense scientific prose and speed up the drafting process, especially for non-native English speakers. Such use also comes with risks: LLMs are particularly susceptible to reproducing biases, for example, and can churn out vast amounts of plausible nonsense. Just how widespread an issue this is, though, has been unclear.

In a preprint posted recently on arXiv, researchers based at the University of Tübingen in Germany and Northwestern University in America provide some clarity. Their research, which has not yet been peer-reviewed, suggests that at least one in ten new scientific papers contains material produced by an LLM. That means over 100,000 such papers will be published this year alone. And that is a lower bound. In some fields, such as computer science, over 20% of research abstracts are estimated to contain LLM-generated text. Among papers from Chinese computer scientists, the figure is one in three.

Spotting LLM-generated text is not easy. Researchers have typically relied on one of two methods: detection algorithms trained to distinguish the tell-tale rhythms of machine prose from human writing, and a more straightforward hunt for suspicious words disproportionately favoured by LLMs, such as “pivotal” or “realm”. Both approaches rely on “ground truth” data: one pile of texts written by humans and one written by machines. These are surprisingly hard to collect: both human- and machine-generated text change over time, as languages evolve and models update. Moreover, researchers typically collect LLM text by prompting these models themselves, and the way they do so may differ from how scientists actually behave.

The latest research by Dmitry Kobak, at the University of Tübingen, and his colleagues shows a third way, bypassing the need for ground-truth data altogether. The team's method is inspired by demographic work on excess deaths, which allows mortality associated with an event to be ascertained by looking at differences between expected and observed death counts. Just as the excess-deaths method looks for abnormal death rates, their excess-vocabulary method looks for abnormal word use. Specifically, the researchers were looking for words that appeared in scientific abstracts significantly more often than the existing literature would predict (see chart 1). The corpus they chose to analyse consisted of the abstracts of virtually all English-language papers available on PubMed, a search engine for biomedical research, published between January 2010 and March 2024, some 14.2m in all.
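
The core of the excess-vocabulary calculation is simple enough to sketch. The toy Python below is an illustration rather than the authors' code: it treats each abstract as a bag of words, projects each word's expected document frequency by linearly extrapolating the two preceding years, and flags words whose observed frequency overshoots that expectation by more than a chosen gap. The paper's actual counterfactual projections and significance thresholds are more careful.

```python
from collections import Counter

def doc_frequency(abstracts):
    """Fraction of abstracts containing each word at least once."""
    counts = Counter()
    for text in abstracts:
        counts.update(set(text.lower().split()))
    n = len(abstracts)
    return {word: c / n for word, c in counts.items()}

def excess_vocabulary(year_minus_2, year_minus_1, target_year, min_gap=0.01):
    """Flag words whose observed frequency in the target year exceeds a
    simple linear extrapolation of the two preceding years by more than
    min_gap (here one percentage point of abstracts)."""
    q2 = doc_frequency(year_minus_2)
    q1 = doc_frequency(year_minus_1)
    observed = doc_frequency(target_year)
    flagged = {}
    for word, p in observed.items():
        a, b = q1.get(word, 0.0), q2.get(word, 0.0)
        expected = max(a + (a - b), 0.0)  # counterfactual frequency
        gap = p - expected                # "excess" usage, like excess deaths
        if gap > min_gap:
            flagged[word] = gap
    return sorted(flagged.items(), key=lambda item: -item[1])
```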

The researchers found that in most years word usage was relatively stable: in no year from 2013 to 2019 did a word increase in frequency beyond expectation by more than 1%. That changed in 2020, when “SARS”, “coronavirus”, “pandemic”, “disease”, “patients” and “severe” all exploded. (Covid-related words continued to see abnormally high usage until 2022.)

By early 2024, about a year after LLMs like ChatGPT had become widely available, a different set of words took off. Of the 774 words whose use increased significantly between 2013 and 2024, 329 took off in the first three months of 2024. Fully 280 of these were related to style rather than subject matter. Notable examples include “delves”, “potential”, “intricate”, “meticulously”, “crucial”, “significant” and “insights” (see chart 2).

The most likely reason for such increases, say the researchers, is help from LLMs. When they estimated the share of abstracts which used at least one of the excess words (omitting words which are widely used anyway), they found that at least 10% probably had LLM input. As PubMed indexes about 1.5m papers annually, that would mean that more than 150,000 papers per year are currently written with LLM assistance.
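
That lower-bound logic can be sketched in the same spirit. The function below is again an illustration, not the paper's estimator (which works with excess frequencies rather than a fixed word list): it counts the share of abstracts containing at least one marker word, so any LLM-assisted abstract that avoids every marker goes uncounted.

```python
def llm_share_lower_bound(abstracts, marker_words):
    """Share of abstracts using at least one marker word: a crude lower
    bound on LLM assistance, since LLM-assisted abstracts that avoid
    every marker word go uncounted. marker_words should already exclude
    words that are common in ordinary scientific prose."""
    markers = {w.lower() for w in marker_words}
    hits = sum(1 for text in abstracts if markers & set(text.lower().split()))
    return hits / len(abstracts)

# Hypothetical usage with a few of the style words named above:
# llm_share_lower_bound(abstracts_2024, ["delves", "meticulously", "intricate"])
```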

This seems to be more widespread in some fields than others. The researchers found that computer science had the most use, at over 20%, whereas ecology had the least, with a lower bound below 5%. There was also variation by geography: scientists from Taiwan, South Korea, Indonesia and China were the most frequent users, while those from Britain and New Zealand used them least (see chart 3). (Researchers from other English-speaking countries also deployed LLMs infrequently.) Different journals also yielded different results. Those in the Nature family, as well as other prestigious publications like Science and Cell, appear to have a low LLM-assistance rate (below 10%), while Sensors (a journal about, unimaginatively, sensors) exceeded 24%.

The excess-vocabulary method’s results are roughly consistent with those from older detection algorithms, which looked at smaller samples from more limited sources. For instance, in a preprint released in April 2024, a team at Stanford found that 17.5% of sentences in computer-science abstracts were likely to be LLM-generated. They also found a lower prevalence in Nature publications and mathematics papers (LLMs are terrible at maths). The excess vocabulary identified also fits with existing lists of suspicious words.

Such results should not be overly surprising. Researchers routinely acknowledge the use of LLMs to write papers. In one survey of 1,600 researchers conducted in September 2023, over 25% told Nature they used LLMs to write manuscripts. The largest benefit identified by the interviewees, many of whom studied or used AI in their own work, was help with editing and translation for those who did not have English as their first language. Faster and easier coding came joint second, together with the simplification of administrative tasks; summarising or trawling the scientific literature; and, tellingly, speeding up the writing of research manuscripts.

For all these benefits, using LLMs to write manuscripts is not without risks. Scientific papers rely on the precise communication of uncertainty, for example, which is an area where the capabilities of LLMs remain murky. Hallucination, whereby LLMs confidently assert fantasies, remains common, as does a tendency to regurgitate other people’s words, verbatim and without attribution.

Studies also indicate that LLMs preferentially cite other papers that are highly cited in a field, potentially reinforcing existing biases and limiting creativity. As algorithms, they also cannot be listed as authors on a paper or held accountable for the errors they introduce. Perhaps most worrying, the speed at which LLMs can churn out prose risks flooding the scientific world with low-quality publications.

Academic policies on LLM use are in flux. Some journals ban it outright. Others have changed their minds. Until November 2023, Science labelled all LLM text as plagiarism, saying: “Ultimately the product must come from—and be expressed by—the wonderful computers in our heads.” It has since amended its policy: LLM text is now permitted if detailed notes on how the models were used are provided in the methods section of papers, as well as in accompanying cover letters. Nature and Cell also allow LLM use, as long as it is acknowledged clearly.

How enforceable such policies will be is not clear. For now, no reliable method exists to flush out LLM prose. Even the excess-vocabulary method, though useful for spotting large-scale trends, cannot tell whether a specific abstract had LLM input. And researchers need only avoid certain words to evade detection altogether. As the new preprint puts it, these are challenges that must be meticulously delved into. ■

This article appeared in the Science & technology section of the print edition under the headline “Scientists, et ai”

How to Improve the Hiring Process for Disabled Candidates

  • Mason Ameri
  • Terri R. Kurtzberg

It takes more than lip service to convince disabled job applicants to apply to your company. These research-backed practices can demonstrate that you’re a truly equitable employer.

How can companies do a better job of attracting disabled people to apply for jobs and convincing them that they truly are an equitable employer? And how can job candidates feel more comfortable disclosing a need for accommodation? The authors’ research over the last five years offers a number of paths forward for both sides. First, employers can move away from legalistic boilerplate and use more heartfelt language about their commitment to DEI. But they also need to back up their words with concrete evidence, such as a personal message from the CEO; testimonials from disabled employees; statistics on the hiring, promotion, accommodation fulfillment, and retention of disabled employees; or awards recognizing the company’s accomplishments in the DEI space. The research also suggests that job candidates should emphasize their hard skills during interviews and delay the conversation about accommodations until they have built up more of a rapport with the hiring team.

Despite recent record employment gains for disabled employees in the U.S., the hiring of disabled people continues to be a pain point for both candidates and companies.

  • Mason Ameri, PhD, is an associate professor of professional practice at Rutgers Business School. He is an expert in disability employment and is a consultant and speaker on policy reform in this area to government and industry.
  • Terri R. Kurtzberg, PhD, is a professor of management and global business at Rutgers Business School. She is the author of five books, and her research is frequently quoted in the media. Dr. Kurtzberg is the recipient of multiple teaching and research awards.

COMMENTS

  1. Statistics

    Statistics articles from across Nature Portfolio. Statistics is the application of mathematical concepts to understanding and analysing large collections of data. A central tenet of statistics is ...

  2. Statistics articles within Scientific Reports

    Article 05 July 2024 | Open Access. Research on bearing fault diagnosis based on improved genetic algorithm and BP neural network. Zenghua Chen, Lingjian Zhu & Gang Xiong

  3. Introduction to Research Statistical Analysis: An Overview of the

    This article covers many statistical ideas essential to research statistical analysis. Sample size is explained through the concepts of statistical significance level and power. ... Finally, the most common statistics produced by these methods are explored. Keywords: statistical analysis, sample size, power, t-test, anova, chi-square ...

  4. The New Statistics for Better Science: Ask How Much, How Uncertain, and

    The "New Statistics" emphasizes effect sizes, confidence intervals, meta-analysis, and the use of Open Science practices. We present 3 specific ways in which a New Statistics approach can help improve scientific practice: by reducing over-confidence in small samples, by reducing confirmation bias, and by fostering more cautious judgments of consistency.

  5. Statistics

    Statistics articles within Nature. Featured. Review Article | 02 August 2023. ... Research articles News Opinion Research Analysis Careers ...

  6. Full article: Reinforcing the Impact of Statistics on Society

    A forthcoming article will be reporting on the follow-up of 8043 residents for mortality, 2000-2016; it found 16% deaths (1429) among the residents and "statistically significantly elevated lung cancer standardized mortality ratios (SMRs)" as well as for asbestosis and mesothelioma. ... Statistics, and Research, and Discussion community ...

  7. Home

    Overview. Statistical Papers is a forum for presentation and critical assessment of statistical methods encouraging the discussion of methodological foundations and potential applications. The Journal stresses statistical methods that have broad applications, giving special attention to those relevant to the economic and social sciences.

  8. Research, Methods, Statistics

    June 12, 2024. This JAMA Guide to Statistics and Methods article explains the test-negative study design, an observational study design routinely used to estimate vaccine effectiveness, and examines its use in a study that estimated the performance of messenger RNA boosters against the Omicron variant. Research, Methods, Statistics Vaccination ...

  9. Challenges and Opportunities in Statistics and Data Science: Ten

    In this article, we present ten research areas that could make statistics and data science more impactful on science and society. Focusing on these areas will help better transform data into knowledge, actionable insights and deliverables, and promote more collaboration with computer and other quantitative scientists and domain scientists ...

  10. Statistics News, Research and Analysis

    Browse Statistics news, research and analysis from The Conversation.

  11. Recommendations for accurate reporting in medical research statistics

    An important requirement for validity of medical research is sound methodology and statistics, yet this is still often overlooked by medical researchers.1,2 Based on the experience of reviewing statistics in more than 1000 manuscripts submitted to The Lancet Group of journals over the past 3 years, this Correspondence provides guidance to commonly encountered statistical deficiencies in ...

  12. Statistical Methods in Medical Research: Sage Journals

    Statistical Methods in Medical Research is a highly ranked, peer reviewed scholarly journal and is the leading vehicle for articles in all the main areas of medical statistics and therefore an essential reference for all medical statisticians. It is particularly useful for medical researchers dealing with data and provides a key resource for medical and statistical libraries, as well as ...

  13. Journal of Probability and Statistics

    Journal of Probability and Statistics publishes papers on the theory and application of probability and statistics that consider new methods and approaches to their implementation, or report significant results ... Research Article. Open access. Bayesian Estimation of the Stress‐Strength Reliability Based on Generalized Order Statistics for ...

  14. How to Report Statistics

    Although it may seem harmless, using statistics to "spin" results can prevent publication, undermine a published study, or lead to investigation and retraction. ... If the editors approve your study design, you'll receive a provisional acceptance for a future research article reporting the results. Preregistering is a great way to head ...

  15. Journal of Applied Statistics: Vol 51, No 9 (Current issue)

    Estimating changepoints in extremal dependence, applied to aviation stock prices during COVID-19 pandemic. Arnab Hazra et al. Article | Published online: 3 Jul 2024. View all latest articles. Explore the current issue of Journal of Applied Statistics, Volume 51, Issue 9, 2024.

  16. Biostatistics

    Biostatistics articles from across Nature Portfolio. Biostatistics is the application of statistical methods in studies in biology, and encompasses the design of experiments, the collection of ...

  17. Search for Statistics in Articles

    You can search the Library's research databases for peer-reviewed articles that use statistics to support a position or argument. You may have to try several different searches to find relevant articles. Peer-reviewed articles can be very valuable if you are struggling to locate statistics or data from a government site or database.

  18. The Importance of Statistics in Research (With Examples)

    In the field of research, statistics is important for the following reasons: Reason 1: Statistics allows researchers to design studies such that the findings from the studies can be extrapolated to a larger population. Reason 2: Statistics allows researchers to perform hypothesis tests to determine if some claim about a new drug, new procedure ...

  19. Research Papers / Publications

    Research. Research Papers / Publications; Research Centers; Wharton Seminars / Conferences. Previous Statistics Seminars; Related Seminars; Programs. Undergraduate Program. Undergraduate Contact Information; Undergraduate Statistics Concentration; Undergraduate Statistics Minor; Business Analytics Joint Concentration; Undergraduate Course ...

  20. Basic statistical tools in research and data analysis

    Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, an adequate knowledge of statistics and the appropriate use of statistical tests are important. An appropriate knowledge about the basic statistical methods will go a long way in improving the research designs and producing quality medical research ...

  21. Journals

    Journal of Survey Statistics and Methodology Sponsored by the ASA and American Association for Public Opinion Research, this journal's objective is to include cutting-edge scholarly articles on statistical and methodological issues for sample surveys, censuses, administrative record systems, and other related data.

  22. Descriptive Statistics

    Types of descriptive statistics. There are 3 main types of descriptive statistics: The distribution concerns the frequency of each value. The central tendency concerns the averages of the values. The variability or dispersion concerns how spread out the values are. You can apply these to assess only one variable at a time, in univariate ...

  23. CFViSA: A comprehensive and free platform for visualization and

    CFViSA integrates two omics data analysis pipelines (microbiome and transcriptome analysis) and an extensive array of 79 analysis tools spanning simple sequence processing, visualization, and statistics available for various omics data, including microbiome and transcriptome data.

  24. Reliability of U.S. Economic Data Is in Jeopardy, Study Finds

    The report, "The Nation's Data at Risk," highlights the threats facing statistics produced across the federal government, including data on education, health, crime and demographic trends.

  25. Basics of statistics for primary care research

    The following are the general steps for statistical analysis: (1) formulate a hypothesis, (2) select an appropriate statistical test, (3) conduct a power analysis, (4) prepare data for analysis, (5) start with descriptive statistics, (6) check assumptions of tests, (7) run the analysis, (8) examine the statistical model, (9) report the results ...

  26. Community-based research will make clinical trials more diverse

    Research sponsors and the Food and Drug Administration can respond to this challenge by continuing to support community-based clinical trials. But the regulatory framework that governs these and ...
