Editor's Choice: AI Tools to Improve Access to Reliable Health Information


  • Non–High-Density Lipoprotein Cholesterol Levels From Childhood to Adulthood and Cardiovascular Disease Events. Feitong Wu, PhD; David R. Jacobs, PhD; Stephen R. Daniels, MD, PhD; et al. Original Investigation.

Just Published

  • Non–HDL-C Levels From Childhood to Adulthood and CVD Events. Feitong Wu, PhD; et al. Original Investigation.
  • Manufacturer Payments to Cardiologists and Use of Devices. Sanket S. Dhruva, MD, MHS; et al. Research Letter.
  • Acetaminophen Use During Pregnancy and Children’s Risk of Autism, ADHD, and Intellectual Disability. Viktor H. Ahlqvist, PhD; et al. Original Investigation.
  • Single Ascending and Multiple-Dose Trial of Zerlasiran, a Short Interfering RNA Targeting Lipoprotein(a). Steven E. Nissen, MD; et al. Original Investigation.
  • Cardiac Function Before Sepsis and Clinical Outcomes. Stuthi Iyer, MPH; et al. Research Letter.
  • Assessing the Real-World Effectiveness of Immunizations for Respiratory Syncytial Virus. Fatimah S. Dawood, MD; et al. Viewpoint.
  • Good Enough. Lauren Rissman, MD. A Piece of My Mind.
  • Including Pregnant and Lactating Women in Clinical Research. Margaret Foster Riley, JD. Viewpoint.
  • Trading Places, Becoming One. Rafael Campo, MD, MA. Editor's Note.
  • Data Checks Before Registering Study Protocols for Health Care Database Analyses. Shirley V. Wang, PhD; et al. Viewpoint.
  • Tilt Table Testing. William P. Cheshire, MD; et al. JAMA Diagnostic Test Interpretation.
  • Does This Patient Have Alcohol Use Disorder? Evan Wood, MD, PhD; et al. The Rational Clinical Examination.
  • Systemic Lupus Erythematosus. Caroline H. Siegel, MD, MS; et al. Review.
  • Risk Assessment and Prevention of Falls in Older Community-Dwelling Adults. Cathleen S. Colón-Emeric, MD, MHS; et al. Review.
  • Guidelines on Falls Prevention in Older Adults. Peggy B. Leung, MD; et al. JAMA Clinical Guidelines Synopsis.

Latest from the USPSTF

  • USPSTF Recommendation: Primary Care Interventions to Prevent Child Maltreatment
  • USPSTF Recommendation: Screening for Speech and Language Delay and Disorders
  • USPSTF Recommendation: Screening and Preventive Interventions for Oral Health in Adults

Most Viewed

  • Brain Waves Appear to Wash Out Waste During Sleep (41,323 views)
  • Provision of Medications for Self-Managed Abortion Before and After the Dobbs Decision (30,162 views)
  • Prostate-Specific Antigen Screening and 15-Year Prostate Cancer Mortality (29,094 views)
  • Study Provides Insight Into ME/CFS (28,582 views)
  • Effect of Tirzepatide on Maintenance of Weight Reduction (28,148 views)
  • Industry Payments to US Physicians by Specialty and Product Type (26,766 views)
  • Questions Surround Blood Tests That Claim to Screen for Multiple Cancers (22,600 views)
  • Stroke Risk After COVID-19 Bivalent Vaccination in US Older Adults (22,224 views)
  • Pharmacotherapy and Mortality in Individuals With ADHD (21,879 views)
  • Acetaminophen Use During Pregnancy and Children’s Risk of Autism, ADHD, and Intellectual Disability (21,419 views)

Most Cited

  • Antibody Response to 2-Dose SARS-CoV-2 mRNA Vaccine Series in Solid Organ Transplant Recipients (717 citations)
  • Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization (652 citations)
  • Pancreatic Cancer (609 citations)
  • Updated Guidance on the Reporting of Race and Ethnicity in Medical and Science Journals (598 citations)
  • USPSTF Recommendation: Screening for Colorectal Cancer (591 citations)
  • Effect of 2 Inactivated SARS-CoV-2 Vaccines on Symptomatic COVID-19 Infection in Adults (511 citations)
  • The Leading Causes of Death in the US for 2020 (456 citations)
  • Effect of Intermediate- vs Standard-Dose Anticoagulation on Outcomes of Patients With COVID-19 (444 citations)
  • Association Between IL-6 Antagonists and Mortality Among Patients Hospitalized for COVID-19 (440 citations)
  • Association Between 3 Doses of mRNA COVID-19 Vaccine and Symptomatic Infection Caused by Omicron and Delta Variants (409 citations)

How to Write and Publish a Research Paper for a Peer-Reviewed Journal

  • Open access
  • Published: 30 April 2020
  • Volume 36, pages 909–913 (2021)

  • Clara Busse, ORCID: orcid.org/0000-0002-0178-1000
  • Ella August, ORCID: orcid.org/0000-0001-5151-1036

266k Accesses

15 Citations

718 Altmetric


Communicating research findings is an essential step in the research process. Peer-reviewed journals are often the forum for such communication, yet many researchers are never taught how to write a publishable scientific paper. In this article, we explain the basic structure of a scientific paper and describe the information that should be included in each section. We also identify common pitfalls for each section and recommend strategies to avoid them. Further, we give advice about target journal selection and authorship. In Online Resource 1, we provide an example of a high-quality scientific paper, with annotations identifying the elements we describe in this article.


Introduction

Writing a scientific paper is an important component of the research process, yet researchers often receive little formal training in scientific writing. This is especially true in low-resource settings. In this article, we explain why choosing a target journal is important, give advice about authorship, provide a basic structure for writing each section of a scientific paper, and describe common pitfalls and recommendations for each section. In Online Resource 1, we also include an annotated journal article that identifies the key elements and writing approaches that we detail here. Before you begin your research, make sure you have ethical clearance from all relevant ethical review boards.

Select a Target Journal Early in the Writing Process

We recommend that you select a “target journal” early in the writing process; a “target journal” is the journal to which you plan to submit your paper. Each journal has a set of core readers, and you should tailor your writing to this readership. For example, if you plan to submit a manuscript about vaping during pregnancy to a pregnancy-focused journal, you will need to explain what vaping is because readers of this journal may not have a background in this topic. However, if you were to submit that same article to a tobacco journal, you would not need to provide as much background information about vaping.

Information about a journal’s core readership can be found on its website, usually in a section called “About this journal” or something similar. For example, the Journal of Cancer Education presents such information on the “Aims and Scope” page of its website: https://www.springer.com/journal/13187/aims-and-scope.

Peer reviewer guidelines from your target journal are an additional resource that can help you tailor your writing to the journal and provide additional advice about crafting an effective article [1]. These are not always available, but it is worth a quick web search to find out.

Identify Author Roles Early in the Process

Early in the writing process, identify authors, determine the order of authors, and discuss the responsibilities of each author. Standard author responsibilities have been identified by the International Committee of Medical Journal Editors (ICMJE) [2]. To set clear expectations about each team member’s responsibilities and prevent errors in communication, we also suggest outlining more detailed roles, such as who will draft each section of the manuscript, write the abstract, submit the paper electronically, serve as corresponding author, and write the cover letter. After discussing these roles, formalize the agreement in writing and circulate the document to the author team for approval. We suggest creating a title page on which all authors are listed in the agreed-upon order. It may be necessary to adjust authorship roles and order during the development of the paper. If a new author order is agreed upon, be sure to update the title page in the manuscript draft.

When multiple papers will result from a single study, authors should discuss who will author each paper. Additionally, authors should agree on a deadline for each paper, and the lead author should take responsibility for producing an initial draft by this deadline.

Structure of the Introduction Section

The introduction section should be approximately three to five paragraphs in length. Look at examples from your target journal to decide the appropriate length. This section should include the elements shown in Fig. 1. Begin with a general context, narrowing to the specific focus of the paper. Include five main elements: why your research is important, what is already known about the topic, the “gap” or what is not yet known about the topic, why it is important to learn the new information that your research adds, and the specific research aim(s) that your paper addresses. Your research aim should address the gap you identified. Be sure to add enough background information to enable readers to understand your study. Table 1 provides common introduction section pitfalls and recommendations for addressing them.

Figure 1. The main elements of the introduction section of an original research article. Often, the elements overlap.

Methods Section

The purpose of the methods section is twofold: to explain how the study was done in enough detail to enable its replication and to provide enough contextual detail to enable readers to understand and interpret the results. In general, the essential elements of a methods section are the following: a description of the setting and participants, the study design and timing, the recruitment and sampling, the data collection process, the dataset, the dependent and independent variables, the covariates, the analytic approach for each research objective, and the ethical approval. The hallmark of an exemplary methods section is the justification of why each method was used. Table 2 provides common methods section pitfalls and recommendations for addressing them.

Results Section

The focus of the results section should be associations, or lack thereof, rather than statistical tests. Two considerations should guide your writing here. First, the results should present answers to each part of the research aim. Second, return to the methods section to ensure that the analysis and variables for each result have been explained.

Begin the results section by describing the number of participants in the final sample and details such as the number who were approached to participate, the proportion who were eligible and who enrolled, and the number of participants who dropped out. The next part of the results should describe the participant characteristics. After that, you may organize your results by aim or present the most exciting results first. Do not forget to report your non-significant associations. These are still findings.

Tables and figures capture the reader’s attention and efficiently communicate your main findings [3]. Each table and figure should have a clear message and should complement, rather than repeat, the text. Tables and figures should communicate all salient details necessary for a reader to understand the findings without consulting the text. Include information on comparisons and tests, as well as information about the sample and timing of the study in the title, legend, or in a footnote. Note that figures are often more visually interesting than tables, so if it is feasible to make a figure, make a figure. To avoid confusing the reader, either avoid abbreviations in tables and figures, or define them in a footnote. Note that there should not be citations in the results section and you should not interpret results here. Table 3 provides common results section pitfalls and recommendations for addressing them.

Discussion Section

In contrast to the introduction, which narrows from a general context to a specific aim, the discussion should take the form of a right-side-up triangle, beginning with the interpretation of your results and moving to general implications (Fig. 2). This section typically begins with a restatement of the main findings, which can usually be accomplished in a few carefully crafted sentences.

Figure 2. Major elements of the discussion section of an original research article. Often, the elements overlap.

Next, interpret the meaning or explain the significance of your results, lifting the reader’s gaze from the study’s specific findings to more general applications. Then, compare these study findings with other research. Are these findings in agreement or disagreement with those from other studies? Does this study impart additional nuance to well-accepted theories? Situate your findings within the broader context of scientific literature, then explain the pathways or mechanisms that might give rise to, or explain, the results.

Journals vary in how they handle strengths and limitations: some expect them to be discussed in paragraphs embedded within the discussion section, while others mandate a separate section heading. Keep in mind that every study has strengths and limitations. Candidly reporting yours helps readers to correctly interpret your research findings.

The next element of the discussion is a summary of the potential impacts and applications of the research. Should these results be used to optimally design an intervention? Does the work have implications for clinical protocols or public policy? These considerations will help the reader to further grasp the possible impacts of the presented work.

Finally, the discussion should conclude with specific suggestions for future work. Here, you have an opportunity to illuminate specific gaps in the literature that compel further study. Avoid the phrase “future research is necessary” because the recommendation is too general to be helpful to readers. Instead, provide substantive and specific recommendations for future studies. Table 4 provides common discussion section pitfalls and recommendations for addressing them.

Follow the Journal’s Author Guidelines

After you select a target journal, locate the journal’s author guidelines and use them to format your manuscript and references. Author guidelines will often (but not always) include instructions for titles, cover letters, and other components of a manuscript submission. Read the guidelines carefully. If you do not follow the guidelines, your article will be sent back to you.

Finally, do not submit your paper to more than one journal at a time. Even if this is not explicitly stated in the author guidelines of your target journal, it is considered inappropriate and unprofessional.

Your title should invite readers to continue reading beyond the first page [4, 5]. It should be informative and interesting. Consider describing the independent and dependent variables, the population and setting, the study design, the timing, and even the main result in your title. Because the focus of the paper can change as you write and revise, we recommend you wait until you have finished writing your paper before composing the title.

Be sure that the title is useful for potential readers searching for your topic. The keywords you select should complement those in your title to maximize the likelihood that a researcher will find your paper through a database search. Avoid abbreviations in your title unless they are very well known, such as SNP, because people searching for your topic are more likely to use the complete term than an abbreviation.

After you have written a complete draft, use the checklist in Fig. 3, which summarizes the key points offered in this article, to guide your revisions and editing. Additional resources are available on writing the abstract and citing references [5]. When you feel that your work is ready, ask one or two trusted colleagues to read it and provide informal feedback.

Figure 3. Checklist for manuscript quality.

References

1. Michalek AM (2014) Down the rabbit hole…advice to reviewers. J Cancer Educ 29:4–5

2. International Committee of Medical Journal Editors. Defining the role of authors and contributors: who is an author? http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authosrs-and-contributors.html. Accessed 15 January 2020

3. Vetto JT (2014) Short and sweet: a short course on concise medical writing. J Cancer Educ 29(1):194–195

4. Mensh B, Kording K (2017) Ten simple rules for structuring papers. PLoS Comput Biol. https://doi.org/10.1371/journal.pcbi.1005619

5. Lang TA (2017) Writing a better research article. J Public Health Emerg. https://doi.org/10.21037/jphe.2017.11.06

Acknowledgments

Ella August is grateful to the Sustainable Sciences Institute for mentoring her in training researchers on writing and publishing their research.

Code Availability

Not applicable.

Author information

Authors and Affiliations

Department of Maternal and Child Health, University of North Carolina Gillings School of Global Public Health, 135 Dauer Dr, 27599, Chapel Hill, NC, USA

Clara Busse & Ella August

Department of Epidemiology, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI, 48109-2029, USA

Ella August


Corresponding author

Correspondence to Ella August.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

(PDF 362 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Busse, C., August, E. How to Write and Publish a Research Paper for a Peer-Reviewed Journal. J Canc Educ 36, 909–913 (2021). https://doi.org/10.1007/s13187-020-01751-z


Published : 30 April 2020

Issue Date : October 2021

DOI : https://doi.org/10.1007/s13187-020-01751-z

  • Manuscripts
  • Scientific writing
  • Corrections

Search Help

Get the most out of Google Scholar with some helpful tips on searches, email alerts, citation export, and more.

Finding recent papers

Your search results are normally sorted by relevance, not by date. To find newer articles, try the following options in the left sidebar:

  • click "Since Year" to show only recently published papers, sorted by relevance;
  • click "Sort by date" to show just the new additions, sorted by date;
  • click the envelope icon to have new results periodically delivered by email.

Locating the full text of an article

Abstracts are freely available for most of the articles. Alas, reading the entire article may require a subscription. Here're a few things to try:

  • click a library link, e.g., "FindIt@Harvard", to the right of the search result;
  • click a link labeled [PDF] to the right of the search result;
  • click "All versions" under the search result and check out the alternative sources;
  • click "Related articles" or "Cited by" under the search result to explore similar articles.

If you're affiliated with a university, but don't see links such as "FindIt@Harvard", please check with your local library about the best way to access their online subscriptions. You may need to search from a computer on campus, or to configure your browser to use a library proxy.

Getting better answers

If you're new to the subject, it may be helpful to pick up the terminology from secondary sources. E.g., a Wikipedia article for "overweight" might suggest a Scholar search for "pediatric hyperalimentation".

If the search results are too specific for your needs, check out what they're citing in their "References" sections. Referenced works are often more general in nature.

Similarly, if the search results are too basic for you, click "Cited by" to see newer papers that referenced them. These newer papers will often be more specific.

Explore! There's rarely a single answer to a research question. Click "Related articles" or "Cited by" to see closely related work, or search for the author's name and see what else they have written.

Searching Google Scholar

Use the "author:" operator, e.g., author:"d knuth" or author:"donald e knuth".

Put the paper's title in quotations: "A History of the China Sea".

You'll often get better results if you search only recent articles, but still sort them by relevance, not by date. E.g., click "Since 2018" in the left sidebar of the search results page.

To see the absolutely newest articles first, click "Sort by date" in the sidebar. If you use this feature a lot, you may also find it useful to set up email alerts to have new results automatically sent to you.

Note: On smaller screens that don't show the sidebar, these options are available in the dropdown menu labelled "Year" right below the search button.

To search case law, select the "Case law" option on the homepage or in the side drawer on the search results page.

The "Related articles" link finds documents similar to the given search result.

Advanced search is in the side drawer. The advanced search window lets you search in the author, title, and publication fields, as well as limit your search results by date.

To search specific courts, select the "Case law" option and do a keyword search over all jurisdictions. Then, click the "Select courts" link in the left sidebar on the search results page.

Tip: To quickly search a frequently used selection of courts, bookmark a search results page with the desired selection.

Access to articles

For each Scholar search result, we try to find a version of the article that you can read. These access links are labelled [PDF] or [HTML] and appear to the right of the search result. For example:

A paper that you need to read

Access links cover a wide variety of ways in which articles may be available to you: articles that your library subscribes to, open access articles, free-to-read articles from publishers, preprints, articles in repositories, etc.

When you are on a campus network, access links automatically include your library subscriptions and direct you to subscribed versions of articles. On-campus access links cover subscriptions from primary publishers as well as aggregators.

Off-campus access

Off-campus access links let you take your library subscriptions with you when you are at home or traveling. You can read subscribed articles when you are off-campus just as easily as when you are on-campus. Off-campus access links work by recording your subscriptions when you visit Scholar while on-campus, and looking up the recorded subscriptions later when you are off-campus.

We use the recorded subscriptions to provide you with the same subscribed access links as you see on campus. We also indicate your subscription access to participating publishers so that they can allow you to read the full-text of these articles without logging in or using a proxy. The recorded subscription information expires after 30 days and is automatically deleted.

In addition to Google Scholar search results, off-campus access links can also appear on articles from publishers participating in the off-campus subscription access program. Look for links labeled [PDF] or [HTML] on the right-hand side of article pages.

Anne Author, John Doe, Jane Smith, Someone Else

In this fascinating paper, we investigate various topics that would be of interest to you. We also describe new methods relevant to your project, and attempt to address several questions which you would also like to know the answer to. Lastly, we analyze …

You can disable off-campus access links on the Scholar settings page . Disabling off-campus access links will turn off recording of your library subscriptions. It will also turn off indicating subscription access to participating publishers. Once off-campus access links are disabled, you may need to identify and configure an alternate mechanism (e.g., an institutional proxy or VPN) to access your library subscriptions while off-campus.

Email Alerts

Do a search for the topic of interest, e.g., "M Theory"; click the envelope icon in the sidebar of the search results page; enter your email address, and click "Create alert". We'll then periodically email you newly published papers that match your search criteria.

You don't need a Google account to create an alert; you can enter any email address of your choice. If the email address isn't a Google account or doesn't match your Google account, then we'll email you a verification link, which you'll need to click to start receiving alerts.

To be notified of new citations to your own articles, create a public profile, which is free and quick to do. Once you get to the homepage with your photo, click "Follow" next to your name, select "New citations to my articles", and click "Done". We will then email you when we find new articles that cite yours.

To be notified of new citations to one particular paper, search for the title of your paper, e.g., "Anti de Sitter space and holography"; click on the "Cited by" link at the bottom of the search result; and then click on the envelope icon in the left sidebar of the search results page.

To follow new papers by a colleague, first do a search for your colleague's name, and see if they have a Scholar profile. If they do, click on it, click the "Follow" button next to their name, select "New articles by this author", and click "Done".

If they don't have a profile, do a search by author, e.g., [author:s-hawking], and click on the mighty envelope in the left sidebar of the search results page. If you find that several different people share the same name, you may need to add co-author names or topical keywords to limit results to the author you wish to follow.

We send the alerts right after we add new papers to Google Scholar. This usually happens several times a week, except that our search robots meticulously observe holidays.

There's a link to cancel the alert at the bottom of every notification email.

If you created alerts using a Google account, you can manage them all here . If you're not using a Google account, you'll need to unsubscribe from the individual alerts and subscribe to the new ones.

Google Scholar library

Google Scholar library is your personal collection of articles. You can save articles right off the search page, organize them by adding labels, and use the power of Scholar search to quickly find just the one you want, at any time and from anywhere. You decide what goes into your library, and we’ll keep the links up to date.

You get all the goodies that come with Scholar search results: links to PDF and to your university's subscriptions, formatted citations, citing articles, and more!

Library help

Find the article you want to add in Google Scholar and click the “Save” button under the search result.

Click “My library” at the top of the page or in the side drawer to view all articles in your library. To search the full text of these articles, enter your query as usual in the search box.

Find the article you want to remove, and then click the “Delete” button under it.

  • To add a label to an article, find the article in your library, click the “Label” button under it, select the label you want to apply, and click “Done”.
  • To view all the articles with a specific label, click the label name in the left sidebar of your library page.
  • To remove a label from an article, click the “Label” button under it, deselect the label you want to remove, and click “Done”.
  • To add, edit, or delete labels, click “Manage labels” in the left column of your library page.

Only you can see the articles in your library. If you create a Scholar profile and make it public, then the articles in your public profile (and only those articles) will be visible to everyone.

Your profile contains all the articles you have written yourself. It’s a way to present your work to others, as well as to keep track of citations to it. Your library is a way to organize the articles that you’d like to read or cite, not necessarily the ones you’ve written.

Citation Export

Click the "Cite" button under the search result and then select your bibliography manager at the bottom of the popup. We currently support BibTeX, EndNote, RefMan, and RefWorks.

Please respect our robots.txt when you access Google Scholar using automated software. As the wearers of crawler's shoes and webmaster's hat, we cannot recommend adherence to web standards highly enough.

Sorry, we're unable to provide bulk access. You'll need to make an arrangement directly with the source of the data you're interested in. Keep in mind that a lot of the records in Google Scholar come from commercial subscription services.

Sorry, we can only show up to 1,000 results for any particular search query. Try a different query to get more results.

Content Coverage

Google Scholar includes journal and conference papers, theses and dissertations, academic books, pre-prints, abstracts, technical reports and other scholarly literature from all broad areas of research. You'll find works from a wide variety of academic publishers, professional societies and university repositories, as well as scholarly articles available anywhere across the web. Google Scholar also includes court opinions and patents.

We index research articles and abstracts from most major academic publishers and repositories worldwide, including both free and subscription sources. To check current coverage of a specific source in Google Scholar, search for a sample of their article titles in quotes.

While we try to be comprehensive, it isn't possible to guarantee uninterrupted coverage of any particular source. We index articles from sources all over the web and link to these websites in our search results. If one of these websites becomes unavailable to our search robots or to a large number of web users, we have to remove it from Google Scholar until it becomes available again.

Our meticulous search robots generally try to index every paper from every website they visit, including most major sources and also many lesser known ones.

That said, Google Scholar is primarily a search of academic papers. Shorter articles, such as book reviews, news sections, editorials, announcements and letters, may or may not be included. Untitled documents and documents without authors are usually not included. Website URLs that aren't available to our search robots or to the majority of web users are, obviously, not included either. Nor do we include websites that require you to sign up for an account, install a browser plugin, watch four colorful ads, and turn around three times and say coo-coo before you can read the listing of titles scanned at 10 DPI... You get the idea, we cover academic papers from sensible websites.

That's usually because we index many of these papers from other websites, such as the websites of their primary publishers. The "site:" operator currently only searches the primary version of each paper.

It could also be that the papers are located on examplejournals.gov, not on example.gov. Please make sure you're searching for the "right" website.

That said, the best way to check coverage of a specific source is to search for a sample of their papers using the title of the paper.

Ahem, we index papers, not journals. You should also ask about our coverage of universities, research groups, proteins, seminal breakthroughs, and other dimensions that are of interest to users. All such questions are best answered by searching for a statistical sample of papers that has the property of interest - journal, author, protein, etc. Many coverage comparisons are available if you search for [allintitle:"google scholar"], but some of them are more statistically valid than others.

Currently, Google Scholar allows you to search and read published opinions of US state appellate and supreme court cases since 1950, US federal district, appellate, tax and bankruptcy courts since 1923, and US Supreme Court cases since 1791. In addition, it includes citations for cases cited by indexed opinions or journal articles, which allows you to find influential cases (usually older or international) that are not yet online or publicly available.

Legal opinions in Google Scholar are provided for informational purposes only and should not be relied on as a substitute for legal advice from a licensed lawyer. Google does not warrant that the information is complete or accurate.

We normally add new papers several times a week. However, updates to existing records take 6-9 months to a year or longer, because in order to update our records, we need to first recrawl them from the source website. For many larger websites, the speed at which we can update their records is limited by the crawl rate that they allow.

Inclusion and Corrections

We apologize, and we assure you the error was unintentional. Automated extraction of information from articles in diverse fields can be tricky, so an error sometimes sneaks through.

Please write to the owner of the website where the erroneous search result is coming from, and encourage them to provide correct bibliographic data to us, as described in the technical guidelines. Once the data is corrected on their website, it usually takes 6-9 months to a year or longer for it to be updated in Google Scholar. We appreciate your help and your patience.

If you can't find your papers when you search for them by title and by author, please refer your publisher to our technical guidelines.

You can also deposit your papers into your institutional repository or put their PDF versions on your personal website, but please follow your publisher's requirements when you do so. See our technical guidelines for more details on the inclusion process.

We normally add new papers several times a week; however, it might take us some time to crawl larger websites, and corrections to already included papers can take 6-9 months to a year or longer.

Google Scholar generally reflects the state of the web as it is currently visible to our search robots and to the majority of users. When you're searching for relevant papers to read, you wouldn't want it any other way!

If your citation counts have gone down, chances are that your paper or the papers that cite it have disappeared from the web entirely, have become unavailable to our search robots, or, perhaps, have been reformatted in a way that made it difficult for our automated software to identify their bibliographic data and references. If you wish to correct this, you'll need to identify the specific documents with indexing problems and ask your publisher to fix them. Please refer to the technical guidelines.

Please do let us know. Please include the URL for the opinion, the corrected information, and a source where we can verify the correction.

We're only able to make corrections to court opinions that are hosted on our own website. For corrections to academic papers, books, dissertations and other third-party material, click on the search result in question and contact the owner of the website where the document came from. For corrections to books from Google Book Search, click on the book's title and locate the link to provide feedback at the bottom of the book's page.

General Questions

These are articles which other scholarly articles have referred to, but which we haven't found online. To exclude them from your search results, uncheck the "include citations" box on the left sidebar.

First, click on links labeled [PDF] or [HTML] to the right of the search result's title. Also, check out the "All versions" link at the bottom of the search result.

Second, if you're affiliated with a university, using a computer on campus will often let you access your library's online subscriptions. Look for links labeled with your library's name to the right of the search result's title. Also, see if there's a link to the full text on the publisher's page with the abstract.

Keep in mind that final published versions are often only available to subscribers, and that some articles are not available online at all. Good luck!

Technically, your web browser remembers your settings in a "cookie" on your computer's disk, and sends this cookie to our website along with every search. Check that your browser isn't configured to discard our cookies. Also, check if disabling various proxies or overly helpful privacy settings does the trick. Either way, your settings are stored on your computer, not on our servers, so a long hard look at your browser's preferences or internet options should help cure the machine's forgetfulness.

Not even close. That phrase is our acknowledgement that much of scholarly research involves building on what others have already discovered. It's taken from Sir Isaac Newton's famous quote, "If I have seen further, it is by standing on the shoulders of giants."


Impossible? Let’s see.

Whether we're shaping the future of sustainability, or optimizing algorithms, or even exploring epidemiological studies, Google Research strives to continuously progress science, advance society, and improve the lives of billions of people.

Advancing the state of the art

Our teams advance the state of the art through research, systems engineering, and collaboration across Google. We publish hundreds of research papers each year across a wide range of domains, sharing our latest developments in order to collaboratively progress computing and science.


Our research drives real-world change

Med-PaLM 2

Improving our LLM designed for the medical domain

  • Large language models encode clinical knowledge Publication
  • Towards Expert-Level Medical Question Answering with Large Language Models Publication
  • Our latest health AI research updates Article
  • Med-PaLM 2, our expert-level medical LLM Video

Project Contrails

A cost-effective and scalable way AI is helping to mitigate aviation’s climate impact

  • A human-labeled Landsat-8 contrails dataset Dataset
  • Can Google AI make flying more sustainable? Video
  • Estimates of broadband upwelling irradiance from GOES-16 ABI Publication
  • How AI is helping airlines mitigate the climate impact of contrails Blog

See our impact across other projects

Open Buildings

Project Relate

Flood Forecasting

We work across domains

Our vast breadth of work covers AI/ML foundations, responsible human-centric technology, science & societal impact, computing paradigms, and algorithms & optimization. Our research teams impact technology used by people all over the world.

One research paper started it all

The research we do today becomes the Google of the future. Google itself began with a research paper, published in 1998, that became the foundation of Google Search. Our ongoing research over the past 25 years has transformed not only the company but also how people interact with the world and its information.

Responsible research is at the heart of what we do

The impact we create from our research has the potential to reach billions of people. That's why everything we do is guided by methodology that is grounded in responsible practices and thorough consideration.

Help us shape the future

Academic community

We've been working alongside the academic research community since day one. Explore the ways that we collaborate and provide resources and support through a variety of student and faculty programs.

Career Opportunities

From Accra to Zürich, to our home base in Mountain View, we’re looking for talented scientists, engineers, interns, and more to join our teams, not only at Google Research but on research projects across Google.

Explore our other teams and product areas

Google Cloud

Google DeepMind


How to Write and Publish Your Research in a Journal

Last Updated: February 26, 2024

This article was co-authored by Matthew Snipp, PhD and by wikiHow staff writer, Cheyenne Main. C. Matthew Snipp is the Burnet C. and Mildred Finley Wohlford Professor of Humanities and Sciences in the Department of Sociology at Stanford University. He is also the Director for the Institute for Research in the Social Sciences’ Secure Data Center. He has been a Research Fellow at the U.S. Bureau of the Census and a Fellow at the Center for Advanced Study in the Behavioral Sciences. He has published 3 books and over 70 articles and book chapters on demography, economic development, poverty and unemployment. He is also currently serving on the National Institute of Child Health and Human Development’s Population Science Subcommittee. He holds a Ph.D. in Sociology from the University of Wisconsin—Madison. There are 13 references cited in this article, which can be found at the bottom of the page. This article has been fact-checked, ensuring the accuracy of any cited facts and confirming the authority of its sources. This article has been viewed 696,858 times.

Publishing a research paper in a peer-reviewed journal allows you to network with other scholars, get your name and work into circulation, and further refine your ideas and research. Before submitting your paper, make sure it reflects all the work you’ve done and have several people read over it and make comments. Keep reading to learn how you can choose a journal, prepare your work for publication, submit it, and revise it after you get a response back.

Things You Should Know

  • Create a list of journals you’d like to publish your work in and choose one that best aligns with your topic and your desired audience.
  • Prepare your manuscript using the journal’s requirements and ask at least 2 professors or supervisors to review your paper.
  • Write a cover letter that “sells” your manuscript, says how your research adds to your field, and explains why you chose the specific journal you’re submitting to.

Choosing a Journal

Step 1 Create a list of journals you’d like to publish your work in.

  • Ask your professors or supervisors for well-respected journals that they’ve had good experiences publishing with and that they read regularly.
  • Many journals also only accept specific formats, so by choosing a journal before you start, you can write your article to their specifications and increase your chances of being accepted.
  • If you’ve already written a paper you’d like to publish, consider whether your research directly relates to a hot topic or area of research in the journals you’re looking into.

Step 2 Look at each journal’s audience, exposure, policies, and procedures.

  • Review the journal’s peer review policies and submission process to see if you’re comfortable creating or adjusting your work according to their standards.
  • Open-access journals can increase your readership because anyone can access them.

Writing the Research Paper

Step 1 Craft an effective introduction with a thesis statement.

  • Scientific research papers: Instead of a “thesis,” you might write a “research objective.” This is where you state the purpose of your research.
  • A vague thesis: “This paper explores how George Washington’s experiences as a young officer may have shaped his views during difficult circumstances as a commanding officer.”
  • A more specific thesis: “This paper contends that George Washington’s experiences as a young officer on the 1750s Pennsylvania frontier directly impacted his relationship with his Continental Army troops during the harsh winter at Valley Forge.”

Step 2 Write the literature review and the body of your paper.

  • Scientific research papers: Include a “materials and methods” section with the step-by-step process you followed and the materials you used. [5]
  • Read other research papers in your field to see how they’re written. Their format, writing style, subject matter, and vocabulary can help guide your own paper. [6]

Step 3 Write your conclusion that ties back to your thesis or research objective.

  • If you’re writing about George Washington’s experiences as a young officer, you might emphasize how this research changes our perspective of the first president of the U.S.
  • Link this section to your thesis or research objective.
  • If you’re writing a paper about ADHD, you might discuss other applications for your research.

Step 4 Write an abstract that describes what your paper is about.

  • Scientific research papers: You might include your research and/or analytical methods, your main findings or results, and the significance or implications of your research.
  • Try to get as many people as you can to read over your abstract and provide feedback before you submit your paper to a journal.

Editing & Revising Your Paper

Step 1 Prepare your manuscript according to the journal’s requirements.

  • Journals might also provide templates to help you structure your manuscript according to their specific guidelines. [11]

Step 2 Ask 2 colleagues to review your paper and revise it with their notes.

  • Not all journal reviewers will be experts on your specific topic, so a non-expert “outsider’s perspective” can be valuable.

Submitting Your Paper

Step 1 Check your sources for plagiarism and identify 5 to 6 keywords.

  • If you have a paper on the purification of wastewater with fungi, you might use both “fungi” and “mushrooms” as keywords.
  • Use software like iThenticate, Turnitin, or PlagScan to check for similarities between your article and published material available online. [15]

Step 2 Write a cover letter explaining why you chose their journal.

  • Header: Address the editor who will be reviewing your manuscript by name, and include the date of submission and the name of the journal you are submitting to.
  • First paragraph: Include the title of your manuscript, the type of paper it is (like review, research, or case study), and the research question you wanted to answer and why.
  • Second paragraph: Explain what was done in your research, your main findings, and why they are significant to your field.
  • Third paragraph: Explain why the journal’s readers would be interested in your work and why your results are important to your field.
  • Conclusion: State the author(s) and any journal requirements that your work complies with (like ethical standards).
  • “We confirm that this manuscript has not been published elsewhere and is not under consideration by another journal.”
  • “All authors have approved the manuscript and agree with its submission to [insert the name of the target journal].”

Step 3 Submit your article according to the journal’s submission guidelines.

  • Submit your article to only one journal at a time.
  • When submitting online, use your university email account. This connects you with a scholarly institution, which can add credibility to your work.

Navigating the Peer Review Process

Step 1 Try not to panic when you get the journal’s initial response.

  • Accept: Only minor adjustments are needed, based on the reviewers’ feedback. A first submission will rarely be accepted without any changes.
  • Revise and Resubmit: Changes are needed before publication can be considered, but the journal is still very interested in your work.
  • Reject and Resubmit: Extensive revisions are needed. Your work may not be acceptable for this journal, but they might accept it if significant changes are made.
  • Reject: The paper isn’t and won’t be suitable for this publication, but that doesn’t mean it won’t work for another journal.

Step 2 Revise your paper based on the reviewers’ feedback.

  • Try organizing the reviewer comments by how easy it is to address them. That way, you can break your revisions down into more manageable parts.
  • If you disagree with a comment made by a reviewer, try to provide an evidence-based explanation when you resubmit your paper.

Step 3 Resubmit to the same journal or choose another from your list.

  • If you’re resubmitting your paper to the same journal, include a point-by-point response explaining how you addressed each of the reviewers’ comments in your revision. [22]
  • If you’re not sure which journal to submit to next, you might be able to ask the journal editor which publications they recommend.

  • If reviewers suspect that your submitted manuscript plagiarizes another work, they may refer to a Committee on Publication Ethics (COPE) flowchart to decide how to proceed. [23]

References

  • https://www.wiley.com/en-us/network/publishing/research-publishing/choosing-a-journal/6-steps-to-choosing-the-right-journal-for-your-research-infographic
  • https://link.springer.com/article/10.1007/s13187-020-01751-z
  • https://libguides.unomaha.edu/c.php?g=100510&p=651627
  • http://www.canberra.edu.au/library/start-your-research/research_help/publishing-research
  • https://writingcenter.fas.harvard.edu/conclusions
  • https://writing.wisc.edu/handbook/assignments/writing-an-abstract-for-your-research-paper/
  • https://www.springer.com/gp/authors-editors/book-authors-editors/your-publication-journey/manuscript-preparation
  • https://apus.libanswers.com/writing/faq/2391
  • https://academicguides.waldenu.edu/library/keyword/search-strategy
  • https://ifis.libguides.com/journal-publishing-guide/submitting-your-paper
  • https://www.springer.com/kr/authors-editors/authorandreviewertutorials/submitting-to-a-journal-and-peer-review/cover-letters/10285574
  • http://www.apa.org/monitor/sep02/publish.aspx
  • Matthew Snipp, PhD. Research Fellow, U.S. Bureau of the Census. Expert Interview. 26 March 2020.

About This Article

Matthew Snipp, PhD

To publish a research paper, ask a colleague or professor to review your paper and give you feedback. Once you've revised your work, familiarize yourself with different academic journals so that you can choose the publication that best suits your paper. Make sure to look at the "Author's Guide" so you can format your paper according to that publication's guidelines. Then, submit your paper, and don't get discouraged if it is not accepted right away. You may need to revise your paper and try again. To learn about the different responses you might get from journals, see the section on navigating the peer review process.

Cal Poly Humboldt

Searching the Scientific Literature

Literature of science.

Introduction

Scientific literature is the principal medium for communicating the results of scientific research and, as such, represents the permanent record of the collective achievements of the scientific community over time. This scientific knowledge base is composed of the individual "end products" of scientific research and discovery, and it continues to grow as new research builds on earlier research. This new research may add to, substantiate, modify, refine, or refute existing knowledge on a specific topic. The process is cyclical: new research and discovery in the laboratory or field depend on the existing scientific knowledge base, and the new research in turn becomes valuable once it is incorporated into that knowledge base.

Scientific literature composing the scientific knowledge base is often divided into two basic categories:

  • Primary literature -- publications that report the results of original scientific research. These include journal papers, conference papers, monographic series, technical reports, theses, and dissertations.
  • Secondary literature -- publications that synthesize and condense what is known on specific topics. These include reviews, monographs, textbooks, treatises, handbooks, and manuals. These take time to produce and usually cite key "primary" publications on the topic.

Scientific Research/Publication Cycle

The following chart illustrates common steps involved in the scientific research process (inner circle), the dissemination of research results through the primary and secondary literature (outer circle), and the personal assimilation of this information resulting in new ideas and research (inner circle):

Scientific Journals, Magazines and Series

Scientific serials can be grouped into the following three categories. Journals - Scholarly or Popular? summarizes the differences among types of journals and popular magazines.

Journal papers are the basic "molecular" unit of the scientific knowledge base and the most important "primary" source in the sciences. More than 80% of the scientific research literature is published in this format. Each year, 1.5 million articles are published in over 25,000 peer-reviewed journals. Cumulatively, more than 50 million peer-reviewed papers have been published since the first scientific journal appeared in 1665.
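The two annual figures above imply an average output per journal. This is simple division meant only to convey scale. Output varies enormously from journal to journal, and because the journal count is given as "over 25,000", the true average is somewhat below this value.

```latex
\frac{1.5 \times 10^{6}\ \text{articles per year}}{25{,}000\ \text{journals}}
  = \frac{1{,}500{,}000}{25{,}000}
  = 60\ \text{articles per journal per year}
```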

  • Magazines and Newsletters  -- Articles appearing in these publications tend to be popular in format and scope. They may contain news and perspectives of professional societies and environmental organizations, report on research published in scholarly journals, report on environmental problems and new political initiatives, or contain articles aimed at the layperson.
  • Monographic series -- These are published by government agencies, universities, or professional organizations. See Natural Resources Agency Government Documents and Reports for additional information.
  • The series has a distinctive name. Typical names include Bulletin, Special Report, Special Paper, Technical Report, and Technical Paper.
  • Individual issues are consecutively numbered, e.g. Technical Paper No. 36.
  • Each issue has a distinctive author and title.
  • There is no regular publication schedule.

A typical example is:

Wheeler, W.E., R.C. Gatti, & G.A. Bartlett. (a) 1984. Duck Breeding, Ecology and Harvest Characteristics on Grand River Marsh Wildlife Area. (b) Wisconsin Department of Natural Resources (c) Technical Bulletin (d) No. 145 (e).

where (a) = individual author; (b) = individual title; (c) = series author; (d) = series title; (e) = series number.
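The labeled parts of a series citation map naturally onto a small record type. The minimal Python sketch below assumes that the ordering and punctuation of the Wheeler et al. example are the pattern to follow. `SeriesPaper` and `format_citation` are hypothetical names, not part of any citation manager, and other citation styles arrange these fields differently.

```python
from dataclasses import dataclass


@dataclass
class SeriesPaper:
    """One paper in a monographic series; fields follow labels (a)-(e) above."""
    authors: str        # (a) individual author(s)
    year: int
    title: str          # (b) individual title
    series_author: str  # (c) series author, usually the issuing agency
    series_title: str   # (d) series title
    number: int         # (e) series number


def format_citation(p: SeriesPaper) -> str:
    """Assemble the citation in the order shown in the Wheeler et al. example."""
    # Drop any trailing period so names ending in an initial don't produce "..".
    authors = p.authors.rstrip(".")
    return (
        f"{authors}. {p.year}. {p.title}. "
        f"{p.series_author} {p.series_title} No. {p.number}."
    )


if __name__ == "__main__":
    paper = SeriesPaper(
        authors="Wheeler, W.E., R.C. Gatti, & G.A. Bartlett",
        year=1984,
        title="Duck Breeding, Ecology and Harvest Characteristics on Grand River Marsh Wildlife Area",
        series_author="Wisconsin Department of Natural Resources",
        series_title="Technical Bulletin",
        number=145,
    )
    print(format_citation(paper))
```

Keeping the series author and series title as separate fields preserves the distinction the example draws between the paper's own author and title and those of the series it belongs to.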

To Find Individual Papers: Use databases listed in Articles and Databases to find individual papers published in scientific journals, magazines, and series. Databases typically can be searched by subject, taxonomic category, habitat, time period, chemical substance, geographic area, or author. In addition, the websites of many journal and magazine publishers contain searchable databases of articles published in their publications.

To Find Print and Fulltext Availability: See the Journal and Newspaper Finder for specific holdings and available formats of journal, magazine, and series titles available through the HSU Library. Enter the title of the publication, not the article title. Some series are also cataloged by individual author and title in the HSU Library Catalog. In addition, directories listed in Fulltext Journal Directories include some fulltext journals that are not in our Journal and Newspaper Finder.

To Find Abbreviations of Scientific Publications: Many scientific journal and series titles are abbreviated in the literature. Journal Title Abbreviations lists both general abbreviation sources and more specific discipline sources in the sciences.

To Find Important Journals by Subject: See Journal-Ranking.com, Journals Ranked by Impact (Sci-Bytes), SCImago Journal & Country Rank, and Eigenfactor.org - Ranking and Mapping Scientific Knowledge.

Conference Papers

Papers presented at national and international conferences, symposia, and workshops are another source of "primary" scientific information. For many conferences, the presented papers are eventually published in a "proceedings" or "transactions" volume. Papers with no published proceedings may be refined and reworked for formal publication in a journal. Proceedings available in the HSU Library are listed in the HSU Library Catalog under both author (generally the name of the conference, individual editor, or sponsoring organization) and title.

Many discipline databases included in Articles and Databases index individual conference papers by subject, taxonomic category, geographic area, and author. The Conference Papers Index and PapersFirst databases index only conference papers.

Theses and Dissertations

The outcome of graduate study conducted at universities is commonly a master's thesis or doctoral dissertation. In addition to the formal thesis or dissertation, research results are often communicated in other "primary" literature formats, such as the journal paper.

See Theses and Dissertations for how to find and acquire 1) HSU master's theses; and 2) theses and dissertations produced at other universities that are available in other libraries and on the Internet.

Scientific Monographs

Scientific monographs are book-length works written by specialists for the benefit of other specialists. As defined by the National Research Council, they attempt to "...collect, collate, analyze, integrate, and synthesize all relevant contributions to the archival literature of the scientific and engineering journals and to add original material as required". They differ from textbooks, which are pedagogical works, and from scientific popularizations written for the general public.

Monographs are listed in the HSU Library Catalog and in other library catalogs.

Government Documents and Technical Reports

Scientists at federal and state government agencies conduct research that is sometimes published officially by the government as a government document. Other research is published in the "open" scientific literature as journal articles and other publications.

The HSU Library is an official "depository library" for federal and state government documents and annually receives approximately 6,000 government documents in either paper or microfiche format. In addition, 80% of all recently published federal publications are available on the Internet.

Research projects conducted for government agencies are frequently published as technical reports. They are usually produced in response to a specific information need, with research either 1) conducted "in-house" by state or federal research labs, or 2) contracted out to universities, consulting firms, research institutes, or private industry.

Progress and final reports are typically used directly by the sponsoring agency, with limited distribution beyond the organization. As a result, technical reports are sometimes called "gray literature" because they are difficult to identify and acquire.

Technical reports are more flexibly organized than journal articles and tend to contain more of the scientific data collected. Research first reported in a technical report may be reworked and published in other "primary" literature formats.

The Natural Resources Agency Government Documents and Technical Reports research guide contains further information on government documents and technical reports issued by federal and California State agencies, including their organization in the HSU Library and indexes to their content. The focus is on agencies responsible for managing and conducting research in natural resources.

Scientific Data

Scientific data are numerical quantities or other factual attributes derived from observation, experimentation, or calculation. They are the raw material and the building blocks of scientific research. Data analysis and interpretation generate new scientific information.

Archiving the data collected and used in scientific research is important for future replication, for repurposing based on new ideas, and for exploring new analysis methodologies. Many funding agencies and scientific journals require authors of scientific papers to archive and share the data used in their studies.

Data repositories archive data and make them available to the scientific community. They may contain 1) data collected as part of massive mission-oriented projects, e.g., atmospheric, hydrological, oceanographic, or genomic; or 2) original data, or data extracted from larger datasets, that are associated with specific published research studies.

Following are major directories of data repositories:

  • Data.gov (United States Government) Browse or search for datasets available from US government executive agencies.
  • Data Files (Association of College and Research Libraries. Science and Technology Section) Lists federal, state and foreign government data repository directories.
  • DataCite (British Library, BioMed Central and Digital Curation Centre) Arranged alphabetically.
  • Global Change Master Directory (Goddard Space Flight Center) Browse by broad subject area or search by keyword.
  • Open Access Directory: Data Repositories (Graduate School of Library and Information Science, Simmons College) Arranged by broad subject.

SCIENCE & ENGINEERING INDICATORS

Publications Output: U.S. Trends and International Comparisons


Publication Output by Country, Region, or Economy and Scientific Field

Publication output reached 2.9 million articles in 2020, with over 90% of the total from countries with high-income and upper-middle-income economies (Figure PBS-1). Since 1996, output has consistently grown for countries with high-income economies, such as the United States, Germany, and the United Kingdom (UK), expanding from a large base number of publications (Table SPBS-2). Countries with upper-middle-income economies, such as China, Iran, Russia, and Brazil, have had a more rapid pace of growth since 1996, expanding from a relatively smaller base number of publications. Overall, the publication compound annual growth rates of countries with upper-middle-income and high-income economies have been 10% and 3%, respectively, for the 25-year period covering 1996–2020 (Figure PBS-1).

Note: Publication output includes only publications indexed in the Scopus database. The publication output discussion uses fractional counting, which credits coauthored publications to the collaborating institutions or countries in proportion to their participating authors. Country assignments refer to the institutional address of authors, with partial credit given for each international coauthorship. As part of our data analysis, we employ filters on the raw Scopus S&E publication data to remove publications with questionable quality, which appear in what are sometimes called predatory journals (NSB Indicators 2018: Bibliometric Data Filters sidebar).

Note: This report uses the World Bank (2021) country income classifications accessed in March 2021. The World Bank updates the classifications each year on 1 July and assigns them using gross national income per capita as measured in current U.S. dollars. This report uses the rankings. More information is available at https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups.


S&E articles, by income group: 1996–2020

Article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in S&E fields from Scopus. Articles are classified by their year of publication and are assigned to a region, country, or economy on the basis of the institutional address(es) of the author(s) listed in the article. Articles are credited on a fractional count basis (i.e., for articles produced by authors from different countries, each country receives fractional credit on the basis of the proportion of its participating authors). Data are not directly comparable to Science and Engineering Indicators 2020; see the Technical Appendix for information on data filters. Low-income economies are not included in this figure because of their low publication output. Data by country and income groups are available in Table SPBS-2.

National Center for Science and Engineering Statistics; Science-Metrix; Elsevier, Scopus abstract and citation database, accessed May 2021; World Bank Country and Lending Groups, accessed March 2021.


More recently, the compound annual growth in publication output for the world was 4% from 2010 to 2020 (Table PBS-1). Growth rates vary widely by country. Among the 15 largest publication producers, countries with compound annual growth rates above the world average were Russia (10%), Iran (9%), India (9%), China (8%), and Brazil (5%); those with lower growth rates were Japan (-1%), France (-0.3%), the United States (1%), the UK (1%), and Germany (1%). The countries with low growth rates are those that built their scientific capacity decades ago and continue to maintain their scientific research. The worldwide growth of publication output, from 1.9 million in 2010 to 2.9 million in 2020, was led by four geographically large countries: China (36%), India (9%), Russia (6%), and the United States (5%) together accounted for about half the increase in publications over this time period.

Note: The growth rates could be influenced by fractional counting. For example, the compound annual growth rate for France using whole counting is 1%. Publication output using whole counting is available in Table SPBS-17.
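The compound annual growth rates quoted above follow the standard CAGR formula, (end / start)^(1 / years) - 1. A minimal Python sketch, using only the world totals and country shares stated in the text (the function name `cagr` is illustrative), shows how a rise from 1.9 million to 2.9 million articles over 10 years works out to roughly 4% per year and how the shares translate into added articles:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate that turns `start` into `end`."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# World S&E article totals stated in the text, in millions.
world_rate = cagr(1.9, 2.9, 2020 - 2010)
print(f"World CAGR, 2010-2020: {world_rate:.1%}")

# Shares of the 2010-2020 increase stated in the text, converted to approximate article counts.
increase = 2.9 - 1.9  # million articles
shares = {"China": 0.36, "India": 0.09, "Russia": 0.06, "United States": 0.05}
for country, share in shares.items():
    print(f"{country}: about {share * increase:.2f} million additional articles")
print(f"Combined share of the increase: {sum(shares.values()):.0%}")
```

Running it gives a world rate of about 4.3% per year, which rounds to the 4% in the text, and a combined 56% share for the four countries, the "about half" cited above. Because the report's counts are fractional, the same formula applied to whole counts can yield a different rate, as the France note illustrates.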

S&E articles in all fields for 15 largest producing regions, countries, or economies: 2010 and 2020

na = not applicable.

The countries or economies are ranked based on the 2020 total. Article counts refer to publications from conference proceedings and peer-reviewed journal articles in S&E and indexed in Scopus (see Technical Appendix for more details). Articles are classified by their year of publication and are assigned to a region, country, or economy on the basis of the institutional address(es) of the author(s) listed in the article. Articles are credited on a fractional count basis (i.e., for articles from multiple countries or economies, each country or economy receives fractional credit on the basis of the proportion of its participating authors). Detail may not add to total because of countries or economies that are not shown. Proportions are based on the world total excluding unclassified addresses (data not presented). Details and other countries are available in Table SPBS-2.

National Center for Science and Engineering Statistics; Science-Metrix; Elsevier, Scopus abstract and citation database, accessed May 2021.

Collectively, the top 15 countries produced 76% of the world’s publication output of 2.9 million articles in 2020 (Table PBS-1). The two countries producing the most S&E publications in 2020 were China (669,744, or 23%) and the United States (455,856, or 16%) (Figure PBS-2). With the exception of Iran replacing Taiwan beginning in 2014, the top 15 producers of S&E articles have been the same over the last 10 years (NSB 2016).

Note: The proportion of output attributable to the large producers is consistent whether using fractional counting, as in Figure PBS-2 and Table PBS-1, or whole counting, as in Table SPBS-17. There is a slight difference between the United States and China when looking at the whole counting totals: using whole counting for 2020, the United States had 600,053 articles, while China had 742,431. A whole counting measure allocates one full count to each country with an author contributing to the article; in fractional counting, each country receives a proportion of the count based on the number of authors from that country. For example, if an article had four authors (two from the United States, one from China, and one from Brazil), the fractional scores would be 2/4 for the United States, 1/4 for China, and 1/4 for Brazil, whereas whole counting would give each of the three countries a full count of 1. In this example, the difference between whole and fractional counting reflects that the United States had more authors on the paper than China or Brazil.
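The whole and fractional counting rules described in the note above can be made concrete with a short Python sketch. For simplicity it assumes each author is assigned to exactly one country; the actual Indicators methodology works from institutional addresses, and authors may list several. The function names are illustrative:

```python
from collections import Counter
from fractions import Fraction


def whole_counts(papers: list[list[str]]) -> Counter:
    """Whole counting: every country with at least one author on a paper gets 1 for it."""
    counts: Counter = Counter()
    for author_countries in papers:
        for country in set(author_countries):
            counts[country] += 1
    return counts


def fractional_counts(papers: list[list[str]]) -> dict[str, Fraction]:
    """Fractional counting: each paper is worth 1 in total, split by each country's share of authors."""
    counts: dict[str, Fraction] = {}
    for author_countries in papers:
        n_authors = len(author_countries)
        for country, k in Counter(author_countries).items():
            counts[country] = counts.get(country, Fraction(0)) + Fraction(k, n_authors)
    return counts


# The note's example: four authors, two from the United States, one from China, one from Brazil.
example = [["United States", "United States", "China", "Brazil"]]
print(dict(whole_counts(example)))
print({country: str(share) for country, share in fractional_counts(example).items()})
```

For the four-author example, whole counting gives each of the three countries 1, while fractional counting gives the United States 1/2 and China and Brazil 1/4 each, so the paper still sums to one article. Using `Fraction` keeps the shares exact instead of accumulating floating-point rounding over many papers.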

S&E articles, by selected region, country, or economy and rest of world: 1996–2020

Article counts refer to publications from a selection of conference proceedings and peer-reviewed journals in S&E fields from Scopus. Articles are classified by their year of publication and are assigned to a region, country, or economy on the basis of the institutional address(es) of the author(s) listed in the article. Articles are credited on a fractional count basis (i.e., for articles produced by authors from different countries, each country receives fractional credit on the basis of the proportion of its participating authors). Data for all regions, countries, and economies are available in Table SPBS-2.

The U.S. trend of moderate but increasing publication output varies by state. The National Science Board’s (NSB’s) State Indicators data tool provides state-level data based on each state’s doctorate population and R&D funding, including academic S&E article output per 1,000 science, engineering, and health doctorate holders in academia (NSB 2021a) and academic S&E article output per $1 million of academic S&E R&D (NSB 2021b).
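The State Indicators measures mentioned above are simple normalizations of article counts. A minimal Python sketch follows; the state figures are hypothetical placeholders, not NSB data, and the function names are illustrative:

```python
def per_thousand_doctorates(articles: float, doctorate_holders: float) -> float:
    """Academic S&E articles per 1,000 S&E and health doctorate holders in academia."""
    return articles * 1_000 / doctorate_holders


def per_million_rd_dollars(articles: float, rd_dollars: float) -> float:
    """Academic S&E articles per $1 million of academic S&E R&D spending."""
    return articles * 1_000_000 / rd_dollars


# Hypothetical figures for two states (illustrative only, not NSB data).
states = {
    "State A": {"articles": 12_000, "holders": 30_000, "rd": 2_400_000_000},
    "State B": {"articles": 3_000, "holders": 6_000, "rd": 900_000_000},
}
for name, s in states.items():
    print(f"{name}: {per_thousand_doctorates(s['articles'], s['holders']):.1f} per 1,000 doctorates, "
          f"{per_million_rd_dollars(s['articles'], s['rd']):.2f} per $1M R&D")
```

Normalizing puts states of very different sizes on the same scale, but the two ratios need not agree: in this made-up pair, State B leads per doctorate holder (500 vs. 400) while State A leads per R&D dollar (5.00 vs. about 3.33).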

The U.S. trend of publication output varies across race or ethnicity and sex, which impacts R&D careers (see sidebar Publication Output by Underrepresented Groups and Impact on R&D Careers and Indicators 2022 report “The STEM Labor Force of Today: Scientists, Engineers, and Skilled Technical Workers”).

Publication Output by Underrepresented Groups and Impact on R&D Careers

The National Science Board stated in its Vision 2030 report that “women and underrepresented minorities remain inadequately represented in S&E relative to their proportions in the U.S. population” (NSB 2020). These disparities have also been found in the publication of peer-reviewed articles (Hopkins et al. 2013). The National Center for Science and Engineering Statistics (NCSES) has undertaken research to examine linkages between publication output and careers in research (Chang, White, and Sugimoto forthcoming).

Matching publication output data to demographic survey data provides a key to understanding publication output in conjunction with authors’ demographic, training, and career information. Prior researchers have attempted to add author demographics using various methods, such as sex and race disambiguation algorithms (e.g., NamSor, Ginni, Ethnicolr, OriginsInfo) that estimate the probability of race or sex from given names (or, in the case of Face++, from images). The accuracy of these matches varies dramatically by country and field; sex disambiguation algorithms perform well for Western countries and poorly for others, specifically in Asia and South America (Karimi et al. 2016). In addition, some scientific fields, such as astronomy and astrophysics, generally use initials rather than given names. Despite these limitations, researchers have observed sex and race disparities in publication output (Hopkins et al. 2013; Larivière et al. 2013; Marschke et al. 2018; and NSB Indicators 2018: S&E Publication Patterns, by Gender).

The limitations associated with the earlier approaches can be overcome using data collected directly from the authors. One such source is the NCSES Survey of Doctorate Recipients (SDR), * which provides demographic, education, and career history information from a sample of individuals with a U.S. research doctoral degree in a science, engineering, or health field (NCSES 2021). Clarivate, the owner of Web of Science (WoS), † matched SDR respondents to publication records in the WoS publication output database. The results provide demographic information, such as the sex and race or ethnicity of publication authors.

These data shed light on publication output differences between groups defined by race or ethnicity and sex, by discipline, and on impacts to R&D career paths (Chang, White, and Sugimoto forthcoming). ‡ The point estimates in Figure PBS-A show the odds of a pre-doctorate student publishing, by ethnic group or sex, relative to White students (or men, for the sex comparison), while the error bars show the uncertainty around each point estimate (95% confidence interval). The width of the confidence interval is closely linked to the size of the sample. In the SDR-WoS data, the number of minorities and women receiving degrees in the population influences the sample size and, consequently, the ability to measure odds ratios. For example, there are 3,750 women who received mathematics or statistics PhDs compared to 10,450 men (Table SPBS-32). A similar issue arises for mathematics or statistics PhDs by race or ethnicity (Table SPBS-33). Overall, compared to White graduates, Asian, Black, or Hispanic graduates are less likely to publish before their doctorate in biological, agricultural, and other life sciences; engineering; health sciences; and social sciences.
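To illustrate why sample size drives the width of the confidence intervals, the Python sketch below computes an unadjusted odds ratio and a Wald (Woolf) 95% interval from a 2×2 table. This is a deliberate simplification: the report's estimates come from models fitted separately by field with many controls (see the ‡ footnote), and the counts here are hypothetical:

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: group members who published      b: group members who did not
    c: reference members who published  d: reference members who did not
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Hypothetical counts: identical publishing rates in a small sample and a sample ten times larger.
small = odds_ratio_ci(30, 70, 40, 60)
large = odds_ratio_ci(300, 700, 400, 600)
for label, (ratio, low, high) in (("small", small), ("large", large)):
    print(f"{label}: OR={ratio:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Both samples have the same odds ratio (about 0.64), but the small-sample interval runs from roughly 0.36 to 1.15 and includes 1, so it is inconclusive, while the larger sample narrows it to roughly 0.53 to 0.77. This is the same effect that limits conclusions for groups with few doctorate recipients.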

S&E pre-doctorate publishing odds ratio, by sex and selected race or ethnicity: 1995–2006

S&E doctorates include science, engineering, and health PhD candidates at U.S. research doctorate institutions. Computer sciences is not included in the figure because the odds ratio and confidence interval show no conclusive results for any demographic group or sex. The figure shows the estimated odds ratios of publishing at least one article or conference proceeding during the five years before receiving a doctorate in the combined Web of Science and Survey of Doctorate Recipients database. For more detail, see Table SPBS-32 and Table SPBS-33.

National Center for Science and Engineering Statistics, Survey of Doctorate Recipients; Clarivate, Web of Science.

Compared to men, women are less likely to publish before graduation in the biological sciences, agriculture, engineering, health sciences, physics, and social sciences. Pre-doctorate publications appear to factor into obtaining a job in which research is the primary activity. § For those with at least one pre-PhD publication, 56% reported that their first job has research as its primary activity compared to 37% of those without a publication (Chang, White, and Sugimoto forthcoming).

* A machine learning approach matches the SDR respondents to the authors of publications indexed by the Web of Science (WoS). The matching algorithm incorporates name commonality, research field, education, employment affiliations, coauthorship network, and self-citations to predict matches from the SDR respondents to the WoS.

† WoS is a bibliometric database of conference proceedings and peer-reviewed literature with English-language titles and abstracts.

‡ To predict pre-doctorate publishing propensity, separate models were fitted for each doctoral field, and the following factors from the NCSES’s Survey of Earned Doctorates were controlled: doctorate award year, type of PhD-awarding institution, source of primary support, community college experience, U.S. citizenship status at the time of degree award, level of parental education, marital status, dependents under 18 years old, disability, graduate debt, and name commonality.

§ The model controls for critical factors, such as whether the PhD institution is ranked as a high-research institution, year of graduation, citizenship, parental degree, and student debt. The model does not measure article submissions or rejections.
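The * footnote above lists the kinds of evidence the SDR-to-WoS matching model uses. The Python sketch below shows, in a deliberately simplified form, how such features could be combined into a match probability with a logistic function. The feature definitions, weights, and caps are all hypothetical; the footnote does not describe the actual model's form or parameters, and a real system would learn its weights from labeled pairs:

```python
import math
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """Hypothetical evidence comparing one survey respondent with one author record."""
    name_commonality: float  # 0.0 = very rare name, 1.0 = very common name
    field_match: bool        # respondent's research field agrees with the record's field
    education_match: bool    # doctoral institution appears among the record's affiliations
    employer_match: bool     # reported employer appears among the record's affiliations
    shared_coauthors: int    # coauthors shared with records already linked to the respondent
    self_citations: int      # citations to records already linked to the respondent


# Illustrative weights only; a real model would estimate these from labeled pairs.
WEIGHTS = {
    "bias": -3.0,
    "name_commonality": -2.0,  # common names are weaker evidence, so they lower the score
    "field_match": 1.5,
    "education_match": 2.0,
    "employer_match": 2.0,
    "shared_coauthors": 0.5,
    "self_citations": 0.3,
}


def match_probability(p: CandidatePair) -> float:
    """Combine the features linearly and squash the sum with a logistic function into (0, 1)."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["name_commonality"] * p.name_commonality
        + WEIGHTS["field_match"] * p.field_match
        + WEIGHTS["education_match"] * p.education_match
        + WEIGHTS["employer_match"] * p.employer_match
        # Cap count features so one prolific record cannot dominate the score.
        + WEIGHTS["shared_coauthors"] * min(p.shared_coauthors, 10)
        + WEIGHTS["self_citations"] * min(p.self_citations, 10)
    )
    return 1.0 / (1.0 + math.exp(-z))


strong = CandidatePair(0.1, True, True, True, shared_coauthors=4, self_citations=3)
weak = CandidatePair(0.9, False, False, False, shared_coauthors=0, self_citations=0)
print(f"strong: {match_probability(strong):.3f}, weak: {match_probability(weak):.3f}")
```

A rare name with agreeing field, education, and employer evidence pushes the probability close to 1, while a common name with no corroborating evidence stays close to 0. Accepting or rejecting a match would then depend on a threshold, which is also an assumption here.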

Distribution of publications by field of science and region, country, or economy can indicate research priorities and capabilities. Health sciences is the largest field of science globally (25% of publications in 2020) (Table SPBS-2 and Table SPBS-10). Likely due to COVID-19, health sciences publications grew 16%, and biological and biomedical sciences publications grew 15%, from 2019 to 2020, far surpassing their previous 2009–19 compound annual growth rates of 3% each (Table SPBS-5 and Table SPBS-10). In the United States, the European Union (EU-27), the UK, and Japan, health sciences publication output far exceeds that of any other field (Figure PBS-3). The United States, the UK, and the EU-27 have the highest proportions of articles in the social sciences of the six countries and regions shown. In China, the largest research area is engineering (24%), followed by health sciences (15%) and computer and information sciences (12%). The largest scientific field for publication output in India is computer sciences (18%). Japan has a portfolio with health sciences (32%) at the top, followed by biological and biomedical sciences (13%) and engineering (13%).

Note: There is little difference between whole and fractional counting of publications for the large producing countries. The counting method matters more for small countries with high collaboration rates: fractional counting gives them only a fraction of a point for each coauthored article, while whole counting awards them a full point (Table SPBS-17 through Table SPBS-31).

S&E research portfolios, by eight largest fields of science and by selected region, country, or economy: 2020

EU = European Union.

Articles refer to publications from a selection of conference proceedings and peer-reviewed journals in S&E fields from Scopus. Articles are classified by their year of publication and are assigned to a region, country, or economy on the basis of the institutional address(es) of the author(s) listed in the article. Articles are credited on a fractional count basis (i.e., for articles from multiple countries, each country receives fractional credit on the basis of the proportion of its participating authors). See Table SPBS-1 for countries included in the EU; beginning in 2020, the United Kingdom was no longer a member of the EU. See Table SPBS-2 for all fields of science. See Table SPBS-2 through Table SPBS-16 for data on all regions, countries, and economies and all fields of science.

There is increasing interest in measuring publication output that crosses or combines the standard scientific fields for solving boundary-defying issues, such as climate change or poverty reduction (NRC 2014; NASEM 2021). While publication output provides a potential avenue for measuring cross-disciplinary research output, there are challenges for national-level measures. (See sidebar Measuring Cross-Disciplinarity Using Publication Output.)

Measuring Cross-Disciplinarity Using Publication Output

This sidebar uses cross-disciplinarity as an umbrella term that includes convergent, multidisciplinary, and interdisciplinary research because the measurement techniques for examining publication output are similar. Cross-disciplinary research includes the following:

  • Convergent research that is driven by a specific and compelling problem requiring deep integration across disciplines (NSF 2019). Convergent science is a team-based approach to problem solving cutting across fields of inquiry and institutional frontiers to integrate areas of knowledge from multiple fields to address specific scientific and societal challenges.
  • Multidisciplinary research (MDR) that “juxtaposes two or more disciplines focused on a question … [where] the existing structure of knowledge is not questioned” (NRC 2014:44).
  • Interdisciplinary research (IDR) that “integrates information, data, methods, tools, concepts, and/or theories from two or more disciplines focused on a complex question, problem, topic, or theme” (NRC 2014:44).

Efforts to measure cross-disciplinary research using publication output yield results that are not suitable for comparison at the country level (Wagner et al. 2011; Wang and Schneider 2020). This finding is consistent with sidebars in previous Indicators reports (NSB 2010; NSB 2016; Wagner et al. 2009). This sidebar explains the ongoing methodological issues with measuring convergence, MDR, and IDR at the country level and provides potential directions for future research.

For measurement at the country level, researchers have analyzed cross-disciplinary research using various bibliometric measures. Some have used article citations (Campbell et al. 2015; Porter and Chubin 1985), coauthor fields of specialization (Porter et al. 2007), text mining of abstracts or keywords listed on each article (Del Rio et al. 2001), or network analysis (Leydesdorff and Rafols 2011). An analysis of various approaches for measuring interdisciplinarity revealed a lack of consistent measurement outcomes across scientific fields, over time, and for countries or economies (Digital Science 2016).

Measuring cross-disciplinarity is challenging because indicators that are valid in one scientific area, such as citation counts, are not stable in another. For example, within the broad field of health sciences, health economics uses fewer citations, while biomedicine uses many more. When attempting to measure cross-disciplinarity for health sciences, the differences between health economics and biomedicine are, at least in part, related to different citation habits and not necessarily to differences in the cross-disciplinarity of the research.
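One family of citation-based indicators looks at the fields of a paper's cited references. The Python sketch below computes two simple variants, the share of references outside the paper's own field and the Shannon entropy of the cited-field distribution. Neither is claimed to be the measure used in any study cited here, and the reference lists are hypothetical. The example illustrates the problem described above: one extra outside citation moves the indicator far more for a paper with a short reference list than for one with a long list, so citation habits leak into the measure:

```python
import math
from collections import Counter


def outside_field_share(home_field: str, cited_fields: list[str]) -> float:
    """Fraction of a paper's references that point to fields other than its own."""
    if not cited_fields:
        raise ValueError("paper has no field-classified references")
    return sum(field != home_field for field in cited_fields) / len(cited_fields)


def field_entropy(cited_fields: list[str]) -> float:
    """Shannon entropy, in bits, of the field distribution of cited references."""
    if not cited_fields:
        raise ValueError("paper has no field-classified references")
    total = len(cited_fields)
    return -sum((n / total) * math.log2(n / total) for n in Counter(cited_fields).values())


# Hypothetical reference lists for two papers in the same home field.
home = "health sciences"
short_refs = [home] * 3 + ["economics"]     # 4 references, 1 outside the home field
long_refs = [home] * 30 + ["biology"] * 10  # 40 references, 10 outside the home field

for refs in (short_refs, long_refs):
    before = outside_field_share(home, refs)
    after = outside_field_share(home, refs + ["economics"])  # one extra outside citation
    print(f"{len(refs)} refs: share {before:.2f} -> {after:.2f}, "
          f"entropy {field_entropy(refs):.2f} bits")
```

Both toy papers start with an outside-field share of 0.25, but one additional outside citation raises the 4-reference paper to 0.40 and the 40-reference paper only to about 0.27.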

Although research has not uncovered robust cross-disciplinary measures for countries, there are insights into the growth and influence of convergence, IDR, and MDR. Measured broadly, researchers find growth in cross-disciplinarity: “from about the mid-1980s, both natural sciences and engineering (NSE) and medical fields (MED) raised their level of interdisciplinarity at the expense of a focus on specialties” (Larivière and Gingras 2014:197). The team also found that the social sciences, as well as the arts and humanities, were the most open to collaborating with other disciplines. While cross-disciplinarity has grown, citation lags are associated with cross-disciplinary research papers. Specifically, they garner fewer than the normal number of citations for the first 3 years but pick up more citations than normal over 13 years (Wang, Thijs, and Glänzel 2015).

Recently, Digital Science prepared a report for the Research Councils of the United Kingdom (RCUK) that scanned the current literature and measurement approaches (Digital Science 2016). RCUK concluded that “no single indicator of interdisciplinarity (either MDR or IDR) analysed here should, used alone, satisfy any stakeholder. They show diverse inconsistency—in terms of change over time, difference between disciplines and trajectory for countries—that raises doubts as to their specific relevance” (Digital Science 2016:8). The RCUK report suggested that combining bibliometric IDR measures with other data, such as award information, could create a framework for expert analysis of IDR. Among the recommendations were continued exploration of text analysis and the inclusion of departmental affiliations in award information.

Similarly, the 2021 National Academies of Sciences, Engineering, and Medicine workshop on Measuring Convergence in Science and Engineering found that “using a single or even a few atomistic indicators to measure complex research activities capable of addressing societal problems is misguided” (NASEM 2021:49). Workshop participant Ismael Rafols suggested shifting from an atomistic to a portfolio approach, investigating the entire landscape that makes convergence possible.


Computer Science > Computation and Language

Title: ReALM: Reference Resolution As Language Modeling

Abstract: Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in reference resolution, particularly for non-conversational entities, remains underutilized. This paper demonstrates how LLMs can be used to create an extremely effective system to resolve references of various types, by showing how reference resolution can be converted into a language modeling problem, despite involving forms of entities like those on screen that are not traditionally conducive to being reduced to a text-only modality. We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for on-screen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.
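The abstract describes recasting reference resolution, including references to on-screen entities, as a language modeling problem. As a rough illustration of that general idea only, the Python sketch below serializes hypothetical screen entities into numbered lines of text, ordered by position, so that a text-only model could answer with an index. The data structure, prompt wording, and ordering rule are assumptions and are not taken from the paper; no model is called:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreenEntity:
    """Hypothetical on-screen element: its visible text and top-left position in pixels."""
    text: str
    top: int
    left: int


def build_prompt(entities: list[ScreenEntity], request: str) -> tuple[str, list[ScreenEntity]]:
    """Render entities as numbered text lines so a text-only model can refer to them by index."""
    # Order top-to-bottom, then left-to-right, as a rough stand-in for reading order.
    ordered = sorted(entities, key=lambda e: (e.top, e.left))
    lines = [f"[{i}] {e.text}" for i, e in enumerate(ordered)]
    prompt = "\n".join(
        ["Screen:", *lines, f"User: {request}",
         "Which numbered entity does the user mean? Answer with the number only."]
    )
    return prompt, ordered


def resolve(answer: str, ordered: list[ScreenEntity]) -> Optional[ScreenEntity]:
    """Map the model's reply back to an entity; return None if it is not a valid index."""
    try:
        index = int(answer.strip())
    except ValueError:
        return None
    return ordered[index] if 0 <= index < len(ordered) else None


screen = [
    ScreenEntity("Call 555-0100", top=400, left=20),
    ScreenEntity("Pizza Palace", top=120, left=20),
    ScreenEntity("Open until 10 pm", top=260, left=20),
]
prompt, ordered = build_prompt(screen, "Call that number")
print(prompt)
# A language model would be queried with `prompt` here; "2" stands in for its reply.
match = resolve("2", ordered)
print(match.text if match else "unresolved")
```

Sorting by vertical and then horizontal position gives the model a reading order that loosely mirrors the screen layout, and asking for an index keeps the reply easy to map back to a concrete entity.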


AI Index Report

The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world’s most credible and authoritative source for data and insights about AI.


The 2024 AI Index Report will be out April 15.

Steering Committee Co-Directors

Jack Clark

Ray Perrault

Steering Committee Members

Erik Brynjolfsson

John Etchemendy

Katrina Ligett

Terah Lyons

James Manyika

Juan Carlos Niebles

Vanessa Parli

Yoav Shoham

Russell Wald

Staff Members

Loredana Fattorini

Nestor Maslej

Letter from the Co-Directors

AI has moved into its era of deployment; throughout 2022 and the beginning of 2023, new large-scale AI models have been released every month. These models, such as ChatGPT, Stable Diffusion, Whisper, and DALL-E 2, are capable of an increasingly broad range of tasks, from text manipulation and analysis, to image generation, to unprecedentedly good speech recognition. These systems demonstrate capabilities in question answering, and the generation of text, image, and code unimagined a decade ago, and they outperform the state of the art on many benchmarks, old and new. However, they are prone to hallucination, routinely biased, and can be tricked into serving nefarious aims, highlighting the complicated ethical challenges associated with their deployment.

Although 2022 was the first year in a decade in which private AI investment decreased, AI is still a topic of great interest to policymakers, industry leaders, researchers, and the public. Policymakers are talking about AI more than ever before. Industry leaders that have integrated AI into their businesses are seeing tangible cost and revenue benefits. The number of AI publications and collaborations continues to increase. And the public is forming sharper opinions about AI and which elements they like or dislike.

AI will continue to improve and, as such, become a greater part of all our lives. Given the increased presence of this technology and its potential for massive disruption, we should all begin thinking more critically about how exactly we want AI to be developed and deployed. We should also ask questions about who is deploying it—as our analysis shows, AI is increasingly defined by the actions of a small set of private sector actors, rather than a broader range of societal actors. This year’s AI Index paints a picture of where we are so far with AI, in order to highlight what might await us in the future.

- Jack Clark and Ray Perrault

Published: 26 March 2024 · Open access

Predicting and improving complex beer flavor through machine learning

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Christophe Vanderaa, Florian A. Theßeling, Łukasz Kreft, Alexander Botzki, Philippe Malcorps, Luk Daenen, Tom Wenseleers & Kevin J. Verstrepen

Nature Communications volume 15, Article number 2368 (2024)


  • Chemical engineering
  • Gas chromatography
  • Machine learning
  • Metabolomics
  • Taste receptors

The perception and appreciation of food flavor depend on many interacting chemical compounds and external factors, and are therefore challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that predict flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Model dissection identifies specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation for developing novel, tailored foods with superior flavors.


Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite these many applications, predicting food flavor and appreciation from chemical properties remains a largely elusive goal in sensory science, especially for complex foods and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even with innovations in metabolomics such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physicochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of high-quality sensory data, but they require meticulous training, have low throughput and are costly. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Such databases can amass large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers who contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict the flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex food or beverage matrix and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are poorly suited to studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, tend to overfit and handle non-linear and discontinuous relationships poorly 40 .

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly well suited for modeling the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, Inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers.
Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig.  S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic or low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively), reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH and sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table  S1 ). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category is yeast metabolites, such as esters and alcohols, which result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt or from other microbes such as non-Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids, total esters, hop aroma and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig.  1 , upper panel; Supplementary Data  1 and 2 ; Supplementary Fig.  S2 ). For the sake of clarity, only a subset of the measured compounds is shown in Fig.  1 . Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate; conversely, late addition of hops preserves aroma but limits bitterness 51 . Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52 . Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol and glycerol) correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites are linked to the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53 .
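The correlations above are Spearman rank correlations: the Pearson correlation computed on ranks rather than raw concentrations, so any monotonic relationship scores highly even if it is non-linear. A minimal pure-Python sketch follows, assuming each compound is a list of concentrations aligned by beer; the column names and values are hypothetical. In practice `scipy.stats.spearmanr` or `pandas.DataFrame.corr(method="spearman")` would do the same job.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def spearman_matrix(columns):
    """Pairwise Spearman correlations for a dict {name: values aligned by beer}."""
    names = list(columns)
    return {(p, q): spearman_rho(columns[p], columns[q]) for p in names for q in names}


# Hypothetical concentrations for five beers (arbitrary units).
data = {
    "compound_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "compound_b": [1.0, 4.0, 9.0, 16.0, 25.0],  # monotonic but non-linear in compound_a
    "compound_c": [5.0, 4.0, 3.0, 2.0, 1.0],    # perfectly reversed ordering
}
rho = spearman_matrix(data)
print(round(rho[("compound_a", "compound_b")], 3), round(rho[("compound_a", "compound_c")], 3))
```

Here compound_b's rho with compound_a is 1 (up to floating-point rounding) even though the relationship is quadratic, which is why rank correlation is a reasonable first screen for monotonic chemical co-variation.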

Figure 1

Spearman rank correlations are shown. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)), and sensory aspect (aroma, taste, palate, and overall appreciation). Please note that for the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source. For more details, see the main text and Methods section. Chemical data can be found in Supplementary Data  1 , correlations between all chemical compounds are depicted in Supplementary Fig.  S2 and correlation values can be found in Supplementary Data  2 . See Supplementary Data  4 for sensory panel assessments and Supplementary Data  5 for correlation values between all sensory descriptors.

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig.  S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria (Lactobacillus and Pediococcus) or unconventional yeast (Brettanomyces) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveal less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data  3 ) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA. In 95% of cases no significant difference was found across sessions ( p  > 0.05), indicating good panel consistency (Supplementary Table  S2 ).
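The panel-consistency check compares scores for the same repeated sample across sessions using ANOVA. A minimal one-way version of the F statistic is sketched below, assuming the scores for one repeated sample and one attribute are grouped by session; the grouping and the numbers are hypothetical, and the study's exact ANOVA design is not described here. Turning F into a p-value requires the F distribution, for example via `scipy.stats.f_oneway`, which returns both the statistic and the p-value.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (e.g. scores per session).

    F = (between-group mean square) / (within-group mean square).
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of session means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own session mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical bitterness scores for one repeated beer, grouped by tasting session.
sessions = [[3, 4, 3, 4], [4, 3, 3, 4], [3, 3, 4, 4]]
f_stat, df_b, df_w = one_way_anova_f(sessions)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A small F (here the session means are identical, so F is 0) means the session-to-session spread is small relative to the spread within sessions, which is what "no significant difference across sessions" expresses.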

Aroma and taste perception reported by the trained panel are often linked (Fig.  1 , bottom left panel, and Supplementary Data  4 and 5 ), with high correlations between hop aroma and taste (Spearman’s rho=0.83). Bitter taste was found to correlate with hop aroma and taste in general (Spearman’s rho=0.80 and 0.69), and particularly with “grassy” noble hops (Spearman’s rho=0.75). Barnyard flavor, most often associated with sour beers, is identified together with stale hops (Spearman’s rho=0.97) that are used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman’s rho=0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman’s rho=−0.48), in line with the hypothesis that they mask each other 59 , 60 . Beer body is highly correlated with alcohol (Spearman’s rho=0.79), and overall appreciation correlates with multiple aspects that describe beer mouthfeel (alcohol and carbonation; Spearman’s rho=0.32 and 0.39), as well as with hop and ester aroma intensity (Spearman’s rho=0.39 and 0.35).

Similar to the chemical analyses, sensorial analyses confirmed typical features of specific beer styles (Supplementary Fig.  S4 ). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is detected most strongly in Scotch, stout/porter and strong ale styles, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract), fall in the middle. Unsurprisingly, hop aromas are most strongly detected in hoppy beers. Like its chemical counterpart (Supplementary Fig.  S3 ), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aroma or taste attributes, as evaluated by the tasting panel (Fig.  2 , Supplementary Fig.  S5 , Supplementary Data  6 ). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman’s rho=0.68), while ethanol and glycerol correlate with tasters’ perceptions of alcohol and body, the mouthfeel sensation of fullness (Spearman’s rho=0.82/0.62 and 0.72/0.57, respectively). Likewise, darker color from roasted malts is a good indication of malt perception (Spearman’s rho=0.54).

Figure 2

Heatmap colors indicate Spearman’s Rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)). See Supplementary Data  6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds and suggests a need for a more complex model than simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data of our trained tasting panel, we collected 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. These provided numerical scores for beer appearance, aroma, taste, palate and overall quality, as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores in these online consumer reviews (rho=0.49, Supplementary Fig.  S6 ), but not with those of our trained tasting panel (rho=0.19). This suggests that price affects consumer appreciation, as has been reported for wine 63 , whereas blind tastings are unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel’s overall appreciation score and the online consumer appreciation scores (Fig.  3 , rho=0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and may further contribute to differences between the two groups of tasters, in appreciation as well as other aspects. Importantly, in contrast to the overall appreciation scores, for many sensory aspects the results from the professional panel correlated well with results obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig.  3 and below).

Figure 3

RateBeer text mining results can be found in Supplementary Data  7 . Rho values shown are Spearman correlation values, with asterisks indicating significant correlations ( p  < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from review texts (Supplementary Data  7 ). Processing review texts on the RateBeer database yielded results comparable to the trained panel’s scores for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt and hop tastes (Fig.  3 ). This is in line with expectations, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, more specific aromas like ester, coriander or diacetyl are underrepresented in the online reviews and show weaker correlations with the panel scores, underscoring the importance of using a trained tasting panel and standardized tasting sheets with explicit factors to be scored for evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute taste panel data for these sensory aspects.
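The text-mining tools are not specified in this section. One simple approach, shown only as a hypothetical sketch, counts how often each sensory descriptor is mentioned across a beer's reviews using a keyword lexicon. `LEXICON`, its word lists and `descriptor_mention_rates` are illustrative assumptions, not the study's tools.

```python
import re

# Hypothetical descriptor lexicon; the study's vocabulary is not given in the text.
LEXICON = {
    "bitter": ["bitter", "bitterness"],
    "sweet": ["sweet", "sweetness", "sugary"],
    "acidic": ["sour", "tart", "acidic"],
}


def descriptor_mention_rates(reviews, lexicon=LEXICON):
    """Return, per descriptor, the fraction of reviews mentioning it at least once."""
    # One case-insensitive, whole-word pattern per descriptor.
    patterns = {
        name: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
        for name, words in lexicon.items()
    }
    if not reviews:
        return {name: 0.0 for name in lexicon}
    counts = {name: 0 for name in lexicon}
    for text in reviews:
        for name, pattern in patterns.items():
            if pattern.search(text):  # count each review at most once per descriptor
                counts[name] += 1
    return {name: counts[name] / len(reviews) for name in lexicon}


reviews = [
    "Very bitter and a bit tart.",
    "Sweet, sugary finish.",
    "Not much bitterness.",
]
print(descriptor_mention_rates(reviews))
```

Keyword matching cannot read negation, so "Not much bitterness" still counts as a bitterness mention; a plausible reason why subtle descriptors are harder to extract reliably from free text than from a structured tasting sheet.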

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can capture both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews’ appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods): 3 linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso) and partial least squares regression (PLSR)), 5 decision tree models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)), 1 support vector regression (SVR) model and 1 artificial neural network (ANN) model.

To compare the performance of our machine learning models, the dataset was randomly split into a training and a test set, stratified by beer style. After a model was trained on the training set, its performance was evaluated by how well it predicted the test set, using the coefficient of determination (R²) of the multi-output models (see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64 . Importantly, both ways of evaluating the models’ performance broadly agreed. Performance of the different models varied (Table  1 ). Notably, all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data are inherently variable, and this variability is averaged out over the large number of public reviews from RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R² values, due to severe overfitting (training set R² = 1). Overfitting is a common issue in linear models with many parameters and limited samples, especially when interaction terms further amplify the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, outperforming multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, outperforming the linear models (LR, Lasso, PLSR) commonly used in sensory science 65 .
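The evaluation described above combines three ingredients: a train/test split stratified by beer style, the coefficient of determination R² = 1 − SS_res/SS_tot on the test set, and an average rank of models across descriptors. The sketch below shows each in plain Python under simplifying assumptions; the 20% test fraction, the rounding rule and the tie handling in ranking are hypothetical choices, not the study's settings. It also shows why R² can become negative: a model that predicts worse than the test-set mean has SS_res > SS_tot.

```python
import random
from collections import defaultdict


def stratified_split(strata, test_fraction=0.2, seed=0):
    """Split indices 0..n-1 into (train, test) so that each stratum (e.g. beer
    style) contributes about `test_fraction` of its members to the test set.
    Strata too small to yield a test item after rounding stay entirely in train."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, stratum in enumerate(strata):
        by_stratum[stratum].append(idx)
    train, test = [], []
    for stratum in sorted(by_stratum):
        idxs = by_stratum[stratum][:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot. It is negative when the
    predictions are worse than always predicting the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def average_rank(r2_by_descriptor):
    """{descriptor: {model: R2}} -> {model: mean rank}, where rank 1 = highest R2.
    Ties keep dictionary order (sorted() is stable), a simplification."""
    totals = defaultdict(float)
    for scores in r2_by_descriptor.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, model in enumerate(ranked, start=1):
            totals[model] += rank
    return {model: total / len(r2_by_descriptor) for model, total in totals.items()}


styles = ["Blond"] * 10 + ["Tripel"] * 5  # hypothetical style labels
train_idx, test_idx = stratified_split(styles)
print(len(train_idx), len(test_idx))      # 12 3
print(r_squared([1, 2, 3], [3, 2, 1]))    # -3.0: worse than predicting the mean
print(average_rank({"bitter": {"GBR": 0.6, "LR": -0.5},
                    "sweet": {"GBR": 0.5, "LR": 0.1}}))
```

Stratifying by style keeps rare styles from landing entirely in one set by chance, which matters when some styles account for only a few percent of the beers.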

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R² values up to 0.75 depending on the predicted sensory feature (Supplementary Table  S4 ). The GBR models predict consumer appreciation (RateBeer) better than our trained panel’s appreciation (R² of 0.67 versus 0.09) (Supplementary Table  S3 and Supplementary Table  S4 ). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66 . The SVR also shows intermediate performance, mostly because weak predictions for specific attributes lower its overall performance (Supplementary Table  S4 ).
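To make the gradient boosting idea concrete: starting from the mean, each round fits a small regression tree to the current residuals (the negative gradient of squared error) and adds a shrunken copy of it to the prediction. The sketch below uses depth-1 trees (stumps), squared loss and plain Python; the study's GBR implementation, tree depth and hyperparameters are not reproduced here, and `StumpBoostingRegressor` is a hypothetical name. It also records how much each split reduces squared error per feature, the quantity that impurity-based (MDI) importance sums up.

```python
def fit_stump(X, residuals):
    """Best single split on any feature for squared error.
    Returns (feature, threshold, left_value, right_value, sse_reduction)."""
    n = len(X)
    mean_all = sum(residuals) / n
    sse_all = sum((r - mean_all) ** 2 for r in residuals)
    best = (0, float("inf"), mean_all, mean_all, 0.0)  # fallback: no split, predict mean
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse_all - sse > best[4]:
                best = (f, thr, lm, rm, sse_all - sse)
    return best


class StumpBoostingRegressor:
    """Gradient boosting with squared loss and depth-1 trees (illustrative only)."""

    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.init_ = sum(y) / len(y)  # initial prediction: the mean target
        self.stumps_ = []
        self.sse_reduction_ = [0.0] * len(X[0])  # summed impurity decrease per feature
        pred = [self.init_] * len(y)
        for _ in range(self.n_estimators):
            residuals = [t - p for t, p in zip(y, pred)]  # negative gradient of 0.5*(y - p)^2
            f, thr, lv, rv, gain = fit_stump(X, residuals)
            self.stumps_.append((f, thr, lv, rv))
            self.sse_reduction_[f] += gain
            pred = [p + self.learning_rate * (lv if row[f] <= thr else rv)
                    for p, row in zip(pred, X)]
        return self

    def predict(self, X):
        out = []
        for row in X:
            p = self.init_
            for f, thr, lv, rv in self.stumps_:
                p += self.learning_rate * (lv if row[f] <= thr else rv)
            out.append(p)
        return out

    def feature_importances(self):
        """Impurity decrease per feature, normalized to sum to 1."""
        total = sum(self.sse_reduction_)
        return [s / total if total else 0.0 for s in self.sse_reduction_]


# Hypothetical data: column 0 drives the target, column 1 is noise.
X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 7], [6, 2]]
y = [0, 0, 0, 10, 10, 10]
model = StumpBoostingRegressor().fit(X, y)
print([round(p, 2) for p in model.predict([[2, 0], [5, 0]])])
print(model.feature_importances())
```

The learning rate trades speed for robustness: each stump corrects only a fraction of the remaining error, so many weak trees are combined rather than letting one tree fit the training data outright.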

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspect, because a product that shows low consumer appreciation scores often does not succeed commercially 25 . Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized to be either the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig.  4A ). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model for making predictions of consumer appreciation (Fig.  4B ). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.
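SHAP attributes a prediction to features using Shapley values from cooperative game theory: each feature's value is its marginal contribution averaged over all coalitions S of the other features, weighted by |S|!(n−|S|−1)!/n!. The brute-force sketch below illustrates that definition on a toy model, assuming a single baseline point stands in for "absent" features. The SHAP library instead averages over background data and uses fast model-specific algorithms for tree ensembles, so this is a conceptual illustration rather than what the study ran. The toy model, the baseline and the `mean_abs_shap` aggregation (mean absolute value, one common way to turn per-sample values into an importance score) are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample `x`; features outside a coalition take
    their `baseline` value. Runtime grows as 2**n_features, so only for small n."""
    n = len(x)

    def value(coalition):
        # Features in the coalition come from x, all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(model, X, baseline):
    """Global importance: mean absolute Shapley value of each feature over samples."""
    rows = [shapley_values(model, x, baseline) for x in X]
    return [sum(abs(r[i]) for r in rows) / len(rows) for i in range(len(baseline))]


def toy_model(z):
    # Additive effect of feature 0 plus an interaction between features 1 and 2.
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, [1, 2, 3], baseline=[0, 0, 0])
print([round(v, 6) for v in phi])  # the interaction is split equally between 1 and 2
print(mean_abs_shap(toy_model, [[1, 2, 3], [0, 1, 1]], baseline=[0, 0, 0]))
```

The per-sample values sum to toy_model(x) - toy_model(baseline) = 8, the efficiency property that lets SHAP explain each individual prediction additively before aggregating across samples.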

Figure 4

A The impurity-based feature importance (mean decrease in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The 15 highest-ranked chemical properties are shown. B SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing lower values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the model's prediction. C Spearman correlations between the 15 most important chemical properties and overall consumer appreciation. Numbers indicate the Spearman's ρ correlation coefficient and the rank of this correlation among all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig. 4). Ethyl acetate is the most abundant ester in beer, with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but it is often considered less important than other esters such as isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from contributing directly to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Note, however, that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig. S4). Although protein level is not often considered a driver of beer appreciation, it also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly because sour beers are generally highly appreciated in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well established as beer flavors or are even commonly regarded as detrimental to beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, an ester commonly linked to beer staling 71 , as key factors contributing to beer appreciation. Although high concentrations of these compounds are undoubtedly considered unpleasant, the positive effects of modest concentrations are not yet known 72 , 73 .

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig. 4C). Interestingly, only 6 of the SHAP-derived properties rank among the 15 parameters most correlated with appreciation. For some chemical compounds, the correlations are so low that they would likely have been considered unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal distribution for appreciation, with sour beers forming a separate cluster that the Spearman correlation misses entirely. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, these observations highlight the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.
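Spearman's ρ is the Pearson correlation of the ranks, so it only captures monotonic association. The plain-Python sketch below implements it (tied values receive their average rank) and applies it to made-up numbers, not study data, in which two sour beers form a clearly separated high-lactic-acid, high-appreciation cluster while the remaining beers follow an opposite trend; ρ comes out at only about 0.17.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(a), average_ranks(b))


# Made-up values: six non-sour beers with an opposite trend, plus two sour
# beers that form a separate high-lactic-acid, high-appreciation cluster.
lactic_acid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 3.0, 3.2]
appreciation = [0.5, 0.3, 0.1, -0.1, -0.3, -0.5, 1.0, 1.2]
print(round(spearman(lactic_acid, appreciation), 3))  # 0.167
```

A tree-based model can isolate such a cluster with a single split on lactic acid, whereas a single correlation coefficient averages it together with the rest of the data.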

Finally, to examine the relationships between these chemical properties and their predicted targets, we constructed partial dependence plots for the six most important predictors of consumer appreciation 74 , 75 , 76 (Supplementary Fig. S7). One-way partial dependence plots show how a change in the concentration of one compound affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This is incorrect, as it is well documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78 . The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands, in which (off-)flavors are rarely high enough to negatively impact the product. Two-way partial dependence plots show how changing the concentrations of two compounds together influences predicted appreciation, visualizing their interactions (Supplementary Fig. S7). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations of both compounds resulting in the highest predicted appreciation.
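One-way partial dependence averages the model's prediction over the dataset while one feature is fixed at each grid value. The plain-Python sketch below shows that computation (scikit-learn provides it as `sklearn.inspection.partial_dependence`). `step_model` is a hypothetical stand-in that, like a tree ensemble, is piecewise constant and has no split above a certain value, which reproduces the plateau described above.

```python
def partial_dependence(predict, data, feature, grid):
    """One-way partial dependence: mean prediction with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in data:
            modified = list(row)
            modified[feature] = value  # override the feature for every sample
            total += predict(modified)
        curve.append(total / len(data))
    return curve


def step_model(z):
    """Hypothetical tree-like model: piecewise constant in feature 0, linear in feature 1."""
    if z[0] < 1:
        level = 0.0
    elif z[0] < 2:
        level = 1.0
    else:
        level = 2.0  # no split above 2, so higher values change nothing
    return level + 0.5 * z[1]


data = [[0.0, 0.0], [1.0, 2.0], [3.0, -2.0]]
curve = partial_dependence(step_model, data, feature=0, grid=[0, 1, 2, 3, 4])
print(curve)  # [0.0, 1.0, 2.0, 2.0, 2.0]
```

Tree ensembles cannot extrapolate beyond the largest split threshold learned from the training data, so if no training beer had an unpleasantly high concentration, the curve simply flattens instead of turning down.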

To assess the robustness of our best-performing models and their predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations yielded similar performance (Supplementary Fig. S8). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the ET iterations, we did observe more variation in the top predictors, which is likely a consequence of the model's inherently randomized architecture in combination with co-correlations between certain predictors. Even so, the top predictors ethanol and ethyl acetate remain among the main predictors, although their rank in importance changes (Supplementary Fig. S8).
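One simple way to summarize how stable the top predictors are across repeated training runs is to count how often each feature lands in the top k. The sketch below assumes that each run yields one importance vector in a fixed feature order; the feature names and scores are hypothetical placeholders, not results from the 100 iterations reported here.

```python
from collections import Counter


def top_k(importances, names, k):
    """Names of the k features with the highest importance score."""
    ranked = sorted(zip(importances, names), key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked[:k]]


def top_k_frequency(runs, names, k):
    """Fraction of runs in which each feature appears among the top k."""
    counts = Counter()
    for importances in runs:
        counts.update(top_k(importances, names, k))
    return {name: counts[name] / len(runs) for name in names}


# Hypothetical importance vectors from three training runs with different seeds.
names = ["f1", "f2", "f3", "f4"]
runs = [
    [0.40, 0.30, 0.20, 0.10],
    [0.30, 0.20, 0.05, 0.45],
    [0.30, 0.33, 0.25, 0.12],
]
freq = top_k_frequency(runs, names, k=2)
print({name: round(f, 2) for name, f in freq.items()})  # {'f1': 1.0, 'f2': 0.67, 'f3': 0.0, 'f4': 0.33}
```

A feature that stays in the top k in every run (here `f1`) is robust to retraining even if its exact rank moves, which is the pattern described above for ethanol and ethyl acetate in the ET iterations.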

Next, we investigated whether combining the RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in the individual datasets. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer model, both without and with a dataset identifier as an additional feature (R² = 0.26 and 0.42, respectively, versus 0.67 for the RateBeer model). In the latter case, the dataset identifier is the most important feature (Supplementary Fig. S9), while the remaining feature importances are largely unchanged, with ethyl acetate and ethanol ranking highest, as in the original model trained only on RateBeer data. The large variation in the panel dataset appears to introduce noise, weakening the models' performance and reliability. In addition, it seems reasonable to assume that the two datasets are fundamentally different, as the panel dataset was obtained through blind tastings by a trained professional panel.

Lastly, we evaluated whether beer style identifiers would further enhance the model's performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R² = 0.66 with style information vs. R² = 0.67 without). The most important chemical features are consistent with the model trained without style information (e.g., ethanol and ethyl acetate), and, with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig. S9, Supplementary Tables S5 and S6). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, and the low number of samples belonging to some styles, which makes it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined: some styles overlap in features and some beers are misattributed to a style, all of which adds noise to models that use style parameters.

Model validation

To test whether our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to examine because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond is the best-represented style in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data 1).

In the first set of experiments, we adjusted the concentrations of the compounds identified as the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate), together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95th-percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig. 5A). Compared to controls, the spiked beers showed significantly improved overall appreciation among trained panelists, with panelists noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig. 5B). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.

Figure 5

Adding the top chemical compounds, identified as best predictors of appreciation by our model, into poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. A Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. B For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n  = 20 or 13).

In a final experiment, we tested whether the model's predictions can be used to boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data 1). Again, adding a mixture of the predicted compounds (in this case omitting ethanol) resulted in a significant increase in appreciation, body, ester flavor and sweetness.
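The significance of the paired intensity and preference choices in Fig. 5B was assessed with two-sided binomial tests against a 50:50 null. A minimal exact version in plain Python is sketched below. It sums the probabilities of all outcomes that are no more likely than the observed count, which is the convention used by `scipy.stats.binomtest`; the counts are hypothetical examples for panels of 20 and 13 tasters, not the study's results.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test p-value for k successes out of n trials.

    Sums the probabilities of every outcome that is no more likely than the
    observed one; the tiny relative tolerance keeps equally likely outcomes
    from being dropped by floating-point rounding.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-7)
    return min(1.0, sum(pr for pr in probs if pr <= threshold))


# Hypothetical counts: tasters choosing the spiked sample as preferred.
print(round(binomial_two_sided_p(15, 20), 4))  # 0.0414
print(round(binomial_two_sided_p(14, 20), 4))  # 0.1153
print(round(binomial_two_sided_p(11, 13), 4))  # 0.0225
```

With 20 tasters, 15 or more agreeing choices are needed to fall below alpha = 0.05 in this sketch, while 14 are not enough; smaller panels need a larger fraction of agreeing tasters.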

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials, and ultimately allow food manufacturers to produce superior, tailor-made products that better meet the demands of specific consumer groups.

A few studies have previously tried, with varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79 , 80 . Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding datasets that can train models to help close the gap between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This could be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily captured by a linear model architecture. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65 , 81 , 82 , modern machine learning methods allow for building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best-performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83 . Our results are also in line with the findings of Colantonio et al., who likewise identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26 . Importantly, besides our larger experimental scale, we were able to directly confirm our models' predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by more conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g., bitterness) often relates directly to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensory attributes (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound to the extent of a noticeable difference may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to validate whether a particular single compound would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases appreciation by trained tasters.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that, among co-correlating variables, the one with the largest main effect is consistently preferred for model building. As a result, co-correlating variables often have artificially low importance scores, both for impurity- and SHAP-based methods, as we observed in the comparison to the more randomized Extra Trees models. This implies that chemicals identified by GBR as key drivers of a specific sensory feature might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (ρ = 0.77, 0.72 and 0.68, respectively), while ethyl phenylacetate could hide the importance of prenyl isobutyrate and ethyl benzoate (ρ = 0.77 and 0.76, respectively). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, which limits the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single-parameter effects rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ considerably, which further complicates the analysis of style as a model factor.

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to the inherent limitations of GBR models, there are also some limitations associated with studying food aroma. Even if our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, which influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Apart from its limited number of samples and parameters, our dataset also does not include any demographic information about the tasters. Including such data could lead to better models that grasp external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the current model's ability to accurately predict very poorly appreciated products. Finally, while the models could be readily applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Given that the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate with it, validation experiments are essential to identify true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenyl acetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84 , it is relevant for brewers to know what compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see direct applications of our results in the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both at the individual level through the mutagenic, teratogenic and carcinogenic effects of ethanol 85 , 86 , and at the societal level through the burden caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Beer selection

250 commercial Belgian beers were selected to cover the broad diversity of beer styles and corresponding diversity in chemical composition and aroma. See Supplementary Fig.  S1 .

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate CO₂ concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurements by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis and Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of 2-heptanol (Sigma-Aldrich, H3003) (internal standard) solution in ethanol (Fisher Chemical, E/0650DF/C17) was added for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FPD. N₂ was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (Flow rate, 35 cm/s; Injection volume, 1000 µL; Injection mode, split; Combi PAL autosampler, CTC analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min and then raised to 80 °C at a rate of 5 °C/min, followed by a second ramp of 4 °C/min to 200 °C (held for 3 min) and a final ramp of 4 °C/min to 230 °C (held for 1 min). Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table S7).

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly comprising terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA) followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in 20 ml vials containing 1.75 g NaCl (VWR, 27810.295). 5 µl internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low-polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, with a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium gas flow was set at 2.7 mL/min for 0.1 min, followed by a decrease in flow of 20 ml/min to the normal 0.9 mL/min. The oven temperature was first held at 30 °C for 3 min and then raised to 80 °C at a rate of 7 °C/min, followed by a second ramp of 2 °C/min to 125 °C and a final ramp of 8 °C/min to a final temperature of 270 °C.
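As a worked check of the two oven temperature programs (HS-GC-FID/FPD and HS-SPME-GC-MS), the sketch below adds up hold times and ramp durations, where each ramp lasts its temperature span divided by its rate. It assumes no hold between consecutive ramps unless one is stated, no final hold for the GC-MS program (none is given), and it ignores cool-down and re-equilibration between injections; `oven_program_minutes` is a hypothetical helper.

```python
def oven_program_minutes(initial_temp, initial_hold, ramps):
    """Total run time (min) of a GC oven temperature program.

    `ramps` is a list of (rate_c_per_min, target_temp_c, hold_min) tuples.
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        # Ramp duration = temperature span / heating rate, then any hold at the target.
        total += (target - temp) / rate + hold
        temp = target
    return total


# HS-GC-FID/FPD: 50 °C for 5 min, 5 °C/min to 80 °C, 4 °C/min to 200 °C (3 min), 4 °C/min to 230 °C (1 min).
print(oven_program_minutes(50, 5, [(5, 80, 0), (4, 200, 3), (4, 230, 1)]))  # 52.5
# HS-SPME-GC-MS: 30 °C for 3 min, 7 °C/min to 80 °C, 2 °C/min to 125 °C, 8 °C/min to 270 °C (no hold assumed).
print(round(oven_program_minutes(30, 3, [(7, 80, 0), (2, 125, 0), (8, 270, 0)]), 2))  # 50.77
```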

Mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40, Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script as described in Goelen et al. and Reher et al. 87 , 88 (for package information, see Supplementary Table S8). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0 g), in combination with the NIST2017, FFNSC3 and Adams4 libraries, was used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correction for retention time shifts between samples run on different days based on alkane ladders, compound elution profiles were extracted and integrated using a file with 284 target compounds of interest, which were either recovered in our identified AMDIS list of spectra or were known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least squares analysis, after which peak areas were integrated 87 , 88 . Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Of the 284 target compounds analyzed, 167 were visually judged to have reliable elution profiles and were used for the final analysis.
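The n-alkane ladder allows a compound's retention time to be expressed as a retention index that is less sensitive to run-to-run drift and can be compared against library values. The sketch below uses the linear (van den Dool and Kratz) formula commonly applied to temperature-programmed runs, RI = 100·[n + (N − n)(t − t_n)/(t_N − t_n)], where n and N are the carbon numbers of the alkanes eluting just before and after the compound. The ladder times are hypothetical, and this is not a reproduction of the study's in-house R script.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Linear (van den Dool and Kratz) retention index for a temperature-programmed run.

    `alkane_rts` maps n-alkane carbon number -> retention time (min).
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    pos = bisect_right(times, rt)  # index of the first alkane eluting after rt
    if pos == 0 or pos == len(times):
        raise ValueError("retention time outside the alkane ladder")
    n, n_next = carbons[pos - 1], carbons[pos]
    t_n, t_next = times[pos - 1], times[pos]
    # Interpolate linearly between the bracketing alkanes (100 index units per carbon).
    return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))


# Hypothetical C8-C10 ladder retention times (min).
ladder = {8: 5.20, 9: 7.90, 10: 10.60}
print(round(linear_retention_index(9.25, ladder), 1))  # 950.0
```

A compound eluting exactly halfway between nonane and decane receives RI 950, regardless of small shifts in absolute retention time on a given day.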

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific™ Gallery™ Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. 2 ml of sample volume was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table S7 and Supplementary Table S9.

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Measurements comprised 50 ml of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90 . 30 volunteers were screened through a series of triangle tests. The sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel. Panelists were trained to identify and score 50 different attributes, using a 7-point scale to rate attributes’ intensity. The scoring sheet is included as Supplementary Data  3 . Sensory assessments took place between 10–12 a.m. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples on different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table  S8 ).
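Panel consistency was checked with ANOVA on samples repeated across sessions. The sketch below computes a one-way ANOVA F statistic in plain Python under one plausible layout, assumed here for illustration: one group of panel scores per session for a single repeated beer. The exact ANOVA design used in the study is not specified here, and the 7-point ratings are made up. `scipy.stats.f_oneway` returns the same F statistic together with a p-value.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Spread of the session means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of individual scores around their own session mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical 7-point intensity ratings of one repeated beer in three sessions.
sessions = [[3, 4, 5], [4, 5, 6], [5, 6, 7]]
print(one_way_anova_f(sessions))  # 3.0
```

A small F (relative to the critical value for the given degrees of freedom) indicates that repeated presentations of the same beer were scored consistently across sessions.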

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6) (for package information, see Supplementary Table S8) was used to collect 232,288 online reviews (mean = 922, min = 6, max = 5343) from RateBeer, an online beer review database. Each review entry comprised 5 numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.
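Scaling and centering per rater removes differences in how generously individual reviewers use the rating scale before scores are averaged per beer. A minimal plain-Python sketch follows; the (rater, beer, score) tuple layout is an assumption, the sketch uses the population standard deviation (the sample SD is an equally common choice), and the two raters are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_rater_zscores(reviews):
    """Center and scale each rater's scores: z = (score - rater mean) / rater SD.

    `reviews` is a list of (rater, beer, score) tuples.
    """
    by_rater = defaultdict(list)
    for rater, _, score in reviews:
        by_rater[rater].append(score)
    stats = {rater: (mean(s), pstdev(s)) for rater, s in by_rater.items()}
    zscored = []
    for rater, beer, score in reviews:
        mu, sd = stats[rater]
        # A rater who gives every beer the same score carries no ranking information.
        zscored.append((rater, beer, (score - mu) / sd if sd > 0 else 0.0))
    return zscored


def mean_per_beer(zscored):
    """Average the rater-normalized scores per beer."""
    by_beer = defaultdict(list)
    for _, beer, z in zscored:
        by_beer[beer].append(z)
    return {beer: mean(zs) for beer, zs in by_beer.items()}


# Hypothetical raters: A scores generously, B strictly, but both prefer beer2.
reviews = [("A", "beer1", 4.0), ("A", "beer2", 4.5), ("B", "beer1", 2.0), ("B", "beer2", 3.0)]
result = mean_per_beer(per_rater_zscores(reviews))
print(result)  # {'beer1': -1.0, 'beer2': 1.0}
```

Without normalization, rater A's generosity would dominate the per-beer means; after it, both raters contribute the same relative preference.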

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python. Reviews that were classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded. 181,025 reviews from >6000 reviewers from >40 countries remained. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words that are relevant to the beer context were specified and kept as-is (‘Chimay’, ‘Lambic’, etc.). A dictionary of semantically similar sensory terms, for example ‘floral’ and ‘flower’, was created, and the terms in each group were collapsed into one term. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate or overall quality; not to be confused with the five numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer had fewer than 50 reviews, all of its reviews were manually classified. This labeled data set was used to train a model that classified the remaining sentences for all beers 91 . Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TF-IDF) was used to calculate enrichment scores for sensory words per beer.
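
The TF-IDF step can be sketched by treating each beer's pooled taste and aroma tokens as one document. This uses the plain textbook weighting, term frequency × ln(N / document frequency); the text does not state which TF-IDF variant was used, and scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed IDF and L2-normalizes each row by default, so its numbers would differ. Function names and inputs are illustrative.

```python
import math
from collections import Counter


def tfidf_per_beer(beer_tokens):
    """TF-IDF score of every term for every beer.

    `beer_tokens` maps a beer id to the preprocessed sensory tokens pooled
    from that beer's taste and aroma sentences; each beer is one document.
    """
    n_docs = len(beer_tokens)
    # Document frequency: the number of beers in which each term occurs.
    df = Counter()
    for tokens in beer_tokens.values():
        df.update(set(tokens))
    scores = {}
    for beer, tokens in beer_tokens.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[beer] = {
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
    return scores


scores = tfidf_per_beer({"b1": ["sour", "fruity", "sour"], "b2": ["malty", "fruity"]})
print(round(scores["b1"]["sour"], 3))  # → 0.462
print(scores["b1"]["fruity"])  # → 0.0 (a term found in every beer is not enriched)
```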

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to euros and normalized per liter. Spearman correlations were then calculated between these prices and the mean overall appreciation scores from RateBeer and from the taste panel, respectively.
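
The price normalization and the rank correlation can be sketched in plain Python. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks, which is the standard definition and what, for example, `scipy.stats.spearmanr` returns. The currency rates in the usage example are placeholders, not the rates the authors used, and the function names are hypothetical.

```python
from statistics import fmean


def price_per_liter_eur(price, currency, volume_l, eur_rates):
    """Convert a shelf price to euros per liter (`eur_rates`: currency -> EUR per unit)."""
    return price * eur_rates[currency] / volume_l


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


rates = {"EUR": 1.0, "USD": 0.9}  # placeholder exchange rates
prices = [
    price_per_liter_eur(2.5, "EUR", 0.33, rates),
    price_per_liter_eur(4.0, "USD", 0.355, rates),
    price_per_liter_eur(9.0, "EUR", 0.75, rates),
]
appreciation = [0.1, 0.4, 0.3]  # hypothetical mean z-scored appreciation
print(round(spearman(prices, appreciation), 2))  # → 0.5
```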

Pairwise Spearman rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality from beer chemical profiles and (b) public reviews’ appreciation scores from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p < 0.05) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with the mean value per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set mean and standard deviation. In total, ten models were trained: three linear regression-based models (linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso) and partial least squares regression (PLSR)); five decision tree models (AdaBoost regressor (ABR), Extra Trees (ET), Gradient Boosting regressor (GBR), Random Forest (RF) and XGBoost regressor (XGBR)); one support vector machine model (SVR); and one artificial neural network model (ANN). The models were implemented using the ‘scikit-learn’ package (v1.2.2) and the ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and their hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R²) as the evaluation metric. The ANN (scikit-learn’s MLPRegressor) was instead optimized using Bayesian Tree-structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.
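
One of the ten models, GBR, can be sketched with scikit-learn as follows: a stratified 70/30 split, mean imputation and standardization fitted on training data only, and a five-fold grid search scored by R². This is an illustrative sketch, not the authors' code: the parameter grid, the random seed and the synthetic data are assumptions, the log-transformation step is omitted, and placing the imputer and scaler inside the pipeline means they are refit within each cross-validation fold, which may differ slightly from the procedure described above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_gbr(X, y, styles, seed=0):
    """Stratified 70/30 split, then a 5-fold grid search of a GBR pipeline scored by R²."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=styles, random_state=seed
    )
    pipeline = make_pipeline(
        SimpleImputer(strategy="mean"),  # missing values -> per-feature mean of the fit data
        StandardScaler(),                # scaled with statistics of the fit data, never the test set
        GradientBoostingRegressor(random_state=seed),
    )
    # Illustrative grid only; the authors' search space is not given in the text.
    param_grid = {
        "gradientboostingregressor__n_estimators": [50, 100],
        "gradientboostingregressor__max_depth": [2, 3],
    }
    search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    return search, search.score(X_test, y_test)  # held-out R²


# Synthetic stand-in data: 60 "beers", 5 "chemical" features, 3 styles.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=60)
X[rng.random(X.shape) < 0.01] = np.nan  # sprinkle about 1% missing values
styles = np.repeat(["lager", "ale", "stout"], 20)

search, test_r2 = fit_gbr(X, y, styles)
print(search.best_params_, round(test_r2, 2))
```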

Model dissection

GBR was found to outperform the other methods, yielding the models with the highest average R² values in both the trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensory trait were obtained using the ‘scikit-learn’ package. To examine the relationships between these chemical properties and their predicted targets, partial dependence plots (PDPs) were constructed for the six most important predictors of consumer appreciation 74 , 75 .
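
The two inspection steps can be sketched with NumPy: ranking features by a fitted model's impurity-based importances (for scikit-learn tree ensembles, the `feature_importances_` attribute), and computing a one-dimensional partial dependence curve by forcing one feature to each grid value and averaging the model's predictions. scikit-learn also provides this through `sklearn.inspection.partial_dependence` and `PartialDependenceDisplay`; the hand-written version below only makes the computation explicit. Function names are hypothetical, and the toy linear model stands in for a fitted GBR.

```python
import numpy as np


def top_features(importances, names, k=6):
    """Names of the k features with the largest importances, highest first."""
    order = np.argsort(importances)[::-1][:k]
    return [names[i] for i in order]


def partial_dependence_1d(predict, X, feature, grid):
    """Average prediction when `feature` is set to each grid value for every row.

    `predict` maps an (n, p) array to n predictions, e.g. a fitted model's `.predict`.
    """
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # override the feature for all observations
        averages.append(float(np.mean(predict(X_mod))))
    return np.array(averages)


def toy_predict(X):
    return 2 * X[:, 0] + X[:, 1]


X = np.array([[0.0, 1.0], [1.0, 3.0]])
print(partial_dependence_1d(toy_predict, X, feature=0, grid=[0.0, 1.0, 2.0]))  # → [2. 4. 6.]
print(top_features([0.1, 0.5, 0.4], ["f1", "f2", "f3"], k=2))  # → ['f2', 'f3']
```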

The ‘SHAP’ package in Python (v0.41.0) was used to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68 .
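
To make the quantity behind these rankings concrete, the sketch below computes exact Shapley values for a small model by enumerating every coalition of features, with "absent" features set to a baseline value. This brute-force version is exponential in the number of features and is not how the SHAP package works: SHAP's `TreeExplainer` computes such attributions efficiently for tree ensembles and defines feature absence using background data or the tree structure. The toy model and baseline are assumptions.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take `baseline`.

    Enumerates all 2^(p-1) coalitions per feature, so only feasible for small p.
    """
    p = len(x)

    def value(coalition):
        # Features in the coalition take x's values; the rest take the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(p)]
        return f(z)

    phi = []
    for i in range(p):
        others = [j for j in range(p) if j != i]
        total = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution
        phi.append(total)
    return phi


def toy_model(z):
    return 2 * z[0] + z[1] * z[2]


phi = exact_shapley(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
print([round(v, 6) for v in phi])  # → [2.0, 3.0, 3.0]
```

The interaction term is split equally between the two features involved, and the values sum to f(x) − f(baseline) = 8, the additivity property that SHAP explanations also satisfy.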

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, which were immediately resealed and inverted three times. As controls, fresh bottles of the same beer were opened for the same duration, resealed and inverted three times. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses, as outlined in the Trained panel section above. For each attribute, tasters were instructed to select the glass with the higher flavor intensity (directional difference test 92 ), and to select the glass they preferred.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. Compounds were the following: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506), lactic acid (Merck KGaA, 261106).
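
One reading of the spiking rule, sketched as arithmetic: divide each beer's compound concentration by its ethanol content, average these ratios within the style, multiply by the ethanol content of the beer to be spiked, and add the difference. This interpretation, the units (mg/L and % v/v), the bottle volume, the function name and all numbers below are assumptions; the text does not spell out the calculation.

```python
from statistics import fmean


def spike_amount_mg(current_mg_per_l, ethanol_abv, style_profiles, volume_l):
    """Milligrams of a compound to add so that its ethanol-normalized level
    matches the within-style average.

    `style_profiles`: (compound mg/L, ethanol % v/v) pairs for beers of the same style.
    """
    target_ratio = fmean(c / e for c, e in style_profiles)  # mg/L per % ABV
    target_mg_per_l = target_ratio * ethanol_abv
    # Spiking can only add compound, so never return a negative amount.
    return max(0.0, (target_mg_per_l - current_mg_per_l) * volume_l)


# Hypothetical style of two beers: 30 mg/L at 5% ABV and 40 mg/L at 8% ABV.
amount = spike_amount_mg(20, 6, [(30, 5), (40, 8)], volume_l=0.33)
print(round(amount, 2))  # → 4.29
```

Here the style average is 5.5 mg/L per % ABV, so a 6% beer is brought to 33 mg/L by adding 13 mg/L, or about 4.29 mg to a 0.33 L bottle.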

Significant differences in preference or perceived intensity were determined by performing the two-sided binomial test on each attribute.
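
For a directional difference or preference test with two glasses, the null hypothesis is that each taster picks either glass with probability 0.5. The exact two-sided binomial p-value can then be computed as below; because this null distribution is symmetric, doubling the larger tail gives the same result as summing all outcomes no more likely than the observed one (the rule R's `binom.test` applies). The function name and the 13-of-16 example are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test p-value for k choices out of n, under H0: p = 0.5."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    tail = max(k, n - k)
    # P(X >= tail) for X ~ Binomial(n, 0.5); the null is symmetric, so double it.
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)


# Hypothetical outcome: 13 of 16 tasters pick the spiked glass as more intense.
print(binomial_two_sided_p(13, 16))  # → 0.021270751953125
```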

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93 . The RateBeer score data are under restricted access; they are not publicly available because they are the property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA). Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 3293 , 217–220 (1995).

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

Roncoroni, M. & Verstrepen, K. J. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. Flavor chemistry of beer: Part II: Flavor and threshold of 239 aroma volatiles. in (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. part I: flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing 227–254. (Woodhead Publishing). https://doi.org/10.1533/9781855739062.227 (2004).

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Verstrepen, K. J. et al. Flavor active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-alcoholic beer production – an overview. Pol. J. Chem. Technol. 20 , 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. “A spoonful of sugar helps the medicine go down”: Bitter masking by sucrose among children and adults. Chem. Senses 40 , 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Ares, G. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. in Statistics for Linguistics with R (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black-Box Models Interpretable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. Publ. Am. Stat. Assoc. 39 , 272–281 (2019).

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcohol-mediated carcinogenesis. Nat. Rev. Cancer 7 , 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Carr, B. T. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton). https://doi.org/10.1201/b16452 (2014).

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Acknowledgements

We thank all lab members for their discussions and thank all tasting panel members for their contributions. Special thanks go out to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship; L.C. acknowledges financial support from KU Leuven (C16/17/006); F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen

Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen .

Ethics declarations

Competing interests.

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information, Peer Review File, Description of Additional Supplementary Files, Supplementary Data 1–7, Reporting Summary and Source Data.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat Commun 15 , 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0

Received : 30 October 2023

Accepted : 21 February 2024

Published : 26 March 2024

DOI : https://doi.org/10.1038/s41467-024-46346-0

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

By submitting a comment you agree to abide by our Terms and Community Guidelines . If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing: Translational Research newsletter — top stories in biotechnology, drug discovery and pharma.

research papers published in

  • Staff And Affiliates
  • Advisory Board
  • Research & Policy
  • Terner Labs
  • Terner Blog: No Limits

Inclusionary Zoning Paper April 2024 Final

Published On April 10, 2024

research papers published in

Share This Post:

  • Share via Twitter
  • Share via Facebook

Related Articles

Modeling inclusionary zoning’s impact on housing production in los angeles: tradeoffs and policy implications.

A new report, authored by Shane Phillips at the UCLA Lewis Center for Regional Policy Studies and published by the…

research papers published in

Upcoming Event: What’s at stake for housing in the 2024 elections?

Join the Terner Center & Labs on Monday, April 15 for a reception and panel discussion focused on what’s at…

research papers published in

Comparing ADU Permitting Time Inside and Outside the Coastal Zone

Author: Quinn Underriner California legislators are looking for a variety of ways to streamline building across California to help the…

research papers published in

2024 Legislative Preview: Housing and Homelessness Legislation Amid New Leadership and Budgetary Challenges

In a year of new leadership in the California legislature and significant budgetary hurdles, California's 2024 legislative session proves to…

Photo of California capitol building in Sacramento

IMAGES

  1. How to Publish a Research Paper in Reputed Journals?

    research papers published in

  2. (PDF) Analysis of research papers published in the Journal of Marine

    research papers published in

  3. 😎 How to write a research paper for journal publication. How to publish

    research papers published in

  4. (PDF) Choosing the Right Journal for a Scientific Paper

    research papers published in

  5. How to publish research paper in International Journals?

    research papers published in

  6. Published Research Paper_IJEAB_Aug 2016

    research papers published in

VIDEO

  1. Common Types of Research Papers for Publication

  2. What is The Importance of Research in Environmental Science

  3. Mistakes which can get your journal research paper rejected!

  4. Publish your research in open-access journals!🔥WiseUp #shorts

  5. I Published 50+ Research Papers as an Undergraduate Student

  6. The Article Publishing Process Part 1 of 2

COMMENTS

  1. Research articles

    Read the latest Research articles from Scientific Reports. ... Calls for Papers Guide to referees Editor's Choice Journal highlights Publish with us ...

  2. IEEE

    IEEE Spectrum is an award-winning technology magazine and the flagship publication of IEEE, covering major trends and developments in technology, engineering, and science. The Institute, dedicated to IEEE members, features stories about IEEE activities, member profiles, conference information, important member dates and deadlines, IEEE election ...

  3. Latest Research

    A phenome-wide association and Mendelian randomisation study of alcohol use variants in a diverse cohort comprising over 3 million individuals. Our work demonstrates that polymorphisms in genes encoding alcohol metabolising enzymes affect multiple domains of health beyond alcohol-related behaviours.

  4. ACS Publications

    ACS Publications provides high quality peer-reviewed journals, research articles, and information products and services supporting advancement across all fields of chemical sciences. Pair your accounts. Export articles to Mendeley. Get article recommendations from ACS based on references in your Mendeley library.

  5. Search

    Find the research you need | With 160+ million publications, 1+ million questions, and 25+ million researchers, this is where everyone can access science

  6. ResearchGate

    Access 160+ million publications and connect with 25+ million researchers. Join for free and gain visibility by uploading your research.

  7. JAMA

    22,106. 21,768. Explore the latest in medicine including the JNC8 blood pressure guideline, sepsis and ARDS definitions, autism science, cancer screening guidelines, and.

  8. How to Write and Publish a Research Paper for a Peer ...

    Communicating research findings is an essential step in the research process. Often, peer-reviewed journals are the forum for such communication, yet many researchers are never taught how to write a publishable scientific paper. In this article, we explain the basic structure of a scientific paper and describe the information that should be included in each section. We also identify common ...

  9. Publications

    Publications. Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field. Google publishes hundreds of research papers each year. Publishing our work enables us to collaborate and share ideas with, as well as learn from, the broader scientific community.

  10. Google Scholar Search Help

    Finding recent papers. Your search results are normally sorted by relevance, not by date. To find newer articles, try the following options in the left sidebar: click "Since Year" to show only recently published papers, sorted by relevance; click "Sort by date" to show just the new additions, sorted by date;

  11. Google Research

    One research paper started it all. The research we do today becomes the Google of the future. Google itself began with a research paper, published in 1998, and was the foundation of Google Search. Our ongoing research over the past 25 years has transformed not only the company, but how people are able to interact with the world and its information.

  12. 7 steps to publishing in a scientific journal

    Sun and Linton (2014), Hierons (2016) and Craig (2010) offer useful discussions on the subject of "desk rejections.". 4. Make a good first impression with your title and abstract. The title and abstract are incredibly important components of a manuscript as they are the first elements a journal editor sees.

  13. Research Papers in Education

    Journal overview. Research Papers in Education has developed an international reputation for publishing significant research findings across the discipline of education. The distinguishing feature of the journal is that we publish longer articles than most other journals, to a limit of 12,000 words. We particularly focus on full accounts of ...

  14. How to Publish a Research Paper: Your Step-by-Step Guide

    3. Submit your article according to the journal's submission guidelines. Go to the "author's guide" (or similar) on the journal's website to review its submission requirements. Once you are satisfied that your paper meets all of the guidelines, submit the paper through the appropriate channels.

  15. Literature of Science

    Papers published in journals generally go through a peer review process before acceptance and publication. Journal papers are the basic "molecular" unit of the scientific knowledge base and the most important "primary" source in the sciences. More than 80% of the scientific research literature is published in this format. Annually 1.5 million ...

  16. Publications Output: U.S. Trends and International Comparisons

    The countries with low growth rates are those that built their scientific capacity decades ago and continue to maintain their scientific research. The worldwide growth of publication output, from 1.9 million in 2010 to 2.9 million in 2020, was led by four geographically large countries. China (36%), India (9%), Russia (6%), and the United ...
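    As a rough check on the totals quoted above, and assuming only the two endpoints (1.9 million papers in 2010, 2.9 million in 2020), the implied compound annual growth rate of worldwide publication output is about 4.3% per year, or roughly 53% growth over the decade:

    ```latex
    \text{CAGR} = \left(\frac{2.9\ \text{million}}{1.9\ \text{million}}\right)^{1/10} - 1
                \approx 1.526^{0.1} - 1
                \approx 0.043
    ```

    This is an endpoint calculation only; it says nothing about year-to-year variation within the period.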

  17. 110553 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on RESEARCH PAPERS. Find methods information, sources, references or conduct a literature review on ...

  18. Empagliflozin after Acute Myocardial Infarction

    A total of 3260 patients were assigned to receive empagliflozin and 3262 to receive placebo. During a median follow-up of 17.9 months, a first hospitalization for heart failure or death from any ...

  19. How to Start Getting Published in Medical and Scientific Journals

    Trends in Medicine. Katherine J. Igoe, April 10, 2024. Whether you're starting a research career, breaking into academic publishing, or pivoting your area of interest, it may seem difficult to build the expertise and network to become a coauthor on academic papers.

  20. 10000 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on PEER-REVIEWED JOURNALS. Find methods information, sources, references or conduct a literature review ...

  21. [2403.20329] ReALM: Reference Resolution As Language Modeling

    Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the ...

  22. AI Index Report

    The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the ...

  23. Measuring the Persuasiveness of Language Models \ Anthropic

    Assessing the persuasive impacts of language models is inherently difficult. Persuasion is a nuanced phenomenon shaped by many subjective factors, and is further complicated by the bounds of experimental design. Our research takes a step toward evaluating the persuasiveness of language models, but still has many limitations, which we discuss below.

  24. Predicting and improving complex beer flavor through machine ...

    The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. ... from FWO (1S08821N). Research in the lab of K.J.V ...

  25. Inclusionary Zoning Paper April 2024 Final

    Published on April 10, 2024. Related articles: Modeling Inclusionary Zoning's Impact on Housing Production in Los Angeles: Tradeoffs and Policy Implications. A new report, authored by Shane Phillips at the UCLA Lewis Center ...