corpus linguistics research paper

International Journal of Corpus Linguistics

IJCL publishes its articles Online First.

IJCL offers authors the option to publish articles as Open Access.

Follow IJCL on Twitter: https://twitter.com/IJCL_journal

25 April 2024

  • Case and agreement variation in contact: A multifactorial investigation of it-clefts across World Englishes | Yi Zhang & Ming Yue

16 April 2024

  • A user-friendly corpus tool for disciplinary data-driven learning: Introducing CorpusMate | Peter Crosthwaite & Vít Baisa

4 April 2024

  • S. Flach & M. Hilpert (Eds.). 2022. Broadening the spectrum of corpus linguistics: New approaches to variability and change | Reviewed by Kristen Fleckenstein

25 March 2024

  • Down-sampling from hierarchically structured corpus data | Lukas Sönning

13 February 2024

  • “People should get their booster”: Stance towards Covid vaccination in news and academic blogs | Hang (Joanna) Zou & Ken Hyland

29 January 2024

  • Modeling the locative alternation in Mandarin Chinese: A corpus-based study | Mengmin Xu, Fuyin Li & Benedikt Szmrecsanyi

22 December 2023

  • J. Dunn. 2022. Natural Language Processing for Corpus Linguistics | Reviewed by Hanna Schmück | IJCL 29:1 (2024) pp. 123–129

21 December 2023

  • V. Viana (Ed.). 2022. Teaching English with Corpora: A Resource Book | Reviewed by Pascual Pérez-Paredes | IJCL 29:1 (2024) pp. 116–122

7 December 2023

  • Framing the path to net zero: A corpus-assisted discourse analysis of sustainability disclosures by major corporate emitters, 2011–2020 | Matteo Fuoli & Annika Beelitz

14 November 2023

  • Political framing of Covid-19: From metaphor to moral panic | Ariana N. Mohammadi

26 October 2023

  • Advancing Sino-Philippine linguistics and sociolinguistics using the Lannang Corpus (LanCorp): A multilingual, POS-tagged, and audio-textual databank | Wilkinson Daniel Wong Gonzales

9 October 2023

  • The inverse frequency effect: An exploratory study | David Temperley

26 September 2023

  • Pinpointing prescriptive impact: Using change point analysis for the study of prescriptivism at the idiolectal level | Beth Malory

14 August 2023

  • Keywords of the manosphere | Mark McGlashan & Alexandra Krendel | IJCL 29:1 (2024) p. 87
  • Association measures for collocation extraction: Automatic evaluation on a large-scale corpus | Qi Su, Chen Gu & Pengyuan Liu | IJCL 29:1 (2024) pp. 59–86

27 July 2023

  • Concordancing for CADS: Practical challenges and theoretical implications | Mathew Gillings & Gerlinde Mautner | IJCL 29:1 (2024) pp. 34–58

20 June 2023

  • Annotation uncertainty in the context of grammatical change | Marie-Luis Merten, Marcel Wever, Michaela Geierhos, Doris Tophinke & Eyke Hüllermeier | IJCL 28:3 (2023) pp. 430–459

15 June 2023

  • Metaphorical polysemy of the Chinese color term hēi 黑 “black”: A corpus-based cognitive semantic analysis with Behavioral Profiles | Meichun Liu & Jinmeng Dou | IJCL 29:1 (2024) pp. 1–33
  • G. Brookes & P. Baker. 2021. Obesity in the News: Language and Representation in the Press | Reviewed by Turo Hiltunen | IJCL 28:4 (2023) pp. 592–596
  • J. Egbert, D. Biber & B. Gray. 2022. Designing and Evaluating Language Corpora: A Practical Framework for Corpus Representativeness | Reviewed by Tony McEnery | IJCL 28:4 (2023) pp. 586–591

23 May 2023

  • A year to remember? Introducing the BE21 corpus and exploring recent part of speech tag change in British English | Paul Baker | IJCL 28:3 (2023) pp. 407–429

16 May 2023

  • Differences in syntactic annotation affect retrieval: Verb-attached PPs in the history of English | Eva Zehentner, Marianne Hundt, Gerold Schneider & Melanie Röthlisberger | IJCL 28:3 (2023) pp. 378–406

6 March 2023

  • Dative alternation in Chinese: A mixed-effects logistic regression analysis | Dong Zhang & Jiajin Xu | IJCL 28:4 (2023) pp. 559–585
  • M. McCarthy. 2020. Innovations and Challenges in Grammar | Reviewed by Beatrix Busse & Sophie Du Bois | IJCL 28:2 (2023) pp. 284–289

2 March 2023

  • T. McEnery & V. Brezina. 2022. Fundamental Principles of Corpus Linguistics | Reviewed by Niall Curry | IJCL 28:2 (2023) pp. 278–283

23 February 2023

  • LBiaP: A solution to the problem of attaining observation independence in lexical bundle studies | Viviana Cortes & William Lake | IJCL 28:2 (2023) pp. 263–277
  • “You betcha I’m a ’Merican”: The rise of YOU BET as a pragmatic marker | Tomoharu Hirota & Laurel J. Brinton | IJCL 28:4 (2023) pp. 528–558
  • A proposal for the inductive categorisation of parenthetical discourse markers in Spanish using parallel corpora | Hernán Robledo & Rogelio Nazar | IJCL 28:4 (2023) pp. 500–527

9 January 2023

  • When loanwords are not lone words: Using networks and hypergraphs to explore Māori loanwords in New Zealand English | David Trye, Andreea S. Calude, Te Taka Keegan & Julia Falconer | IJCL 28:4 (2023) pp. 461–499

2 December 2022

  • B. Le Bruyn & M. Paquot (Eds.). 2021. Learner Corpus Research Meets Second Language Acquisition | Reviewed by Li Nguyen | IJCL 28:1 (2023) pp. 120–124

25 November 2022

  • Assessing word commonness: Adding dispersion to frequency | Mikkel Ekeland Paulsen | IJCL 28:3 (2023) pp. 318–343
  • Things we smell and things they smell like: Communicatively relevant odours and odorants | Thomas Poulton | IJCL 28:3 (2023) pp. 291–317

Volume 29 (2024)

Volume 28 (2023), volume 27 (2022), volume 26 (2021), volume 25 (2020), volume 24 (2019), volume 23 (2018), volume 22 (2017), volume 21 (2016), volume 20 (2015), volume 19 (2014), volume 18 (2013), volume 17 (2012), volume 16 (2011), volume 15 (2010), volume 14 (2009), volume 13 (2008), volume 12 (2007), volume 11 (2006), volume 10 (2005), volume 9 (2004), volume 8 (2003), volume 7 (2002), volume 6 (2001), volume 5 (2000), volume 4 (1999), volume 3 (1998), volume 2 (1997), volume 1 (1996).


Subscription rates

All prices for print + online include postage/handling.

80.00 per volume. Private subscriptions are for personal use only, and must be pre-paid and ordered directly from the publisher.


The International Journal of Corpus Linguistics is a peer-reviewed journal and referees will assess submissions with regard to originality, significance, academic rigour, and presentation of argument. Manuscripts submitted to the International Journal of Corpus Linguistics should not at the same time be under consideration for publication elsewhere.

Full research papers should not exceed 9,000 words (including tables and references). Short papers, which introduce new corpora or provide technical descriptions of tools and annotation schemes, should be between 2,000 and 4,000 words in length (including tables and references).

Manuscripts of articles and reviews must be prepared in accordance with the style sheet.

Manuscripts should be submitted through the journal's online submission and manuscript tracking site. Please upload both a Word (.doc) and a PDF version of your manuscript.

Please consult the Short Guide to EM for Authors before you submit your paper.

John Benjamins journals are committed to maintaining the highest standards of publication ethics and to supporting ethical research practices.

Authors and reviewers are kindly requested to read the Ethics Statement.

Please also note the guidance on the use of (generative) AI in the statement.

Rights and Permissions

Authors must ensure that they have permission to use any third-party material in their contribution; the permission should include perpetual (not time-limited) world-wide distribution in print and electronic format.

For information on authors' rights, please consult the rights information page.

Open Access

Articles accepted for this journal can be made Open Access through payment of an Article Publication Charge (APC) of EUR 1800 (excl. tax); more information can be found on the publisher's Open Access Policy page. There is no fee if the article is not made Open Access and is thus available only to subscribers.

Corresponding authors from institutions with which John Benjamins has a Read & Publish arrangement can publish Open Access without paying a fee; information on the participating institutions and on which articles qualify can be found on the publisher's website.

For information about permission to post a version of your article online or in an institutional repository ('green' open access or self-archiving), please consult the rights information page.

John Benjamins Publishing Company has an agreement in place with Portico for the archiving of all its online journals and e-books.

If you are not able to submit online, or for any other editorial correspondence, please contact the editorial team:


Corpus Linguistics and Linguistic Theory

  • Online ISSN: 1613-7035
  • Print ISSN: 1613-7027
  • Type: Journal
  • Language: English
  • Publisher: De Gruyter Mouton
  • First published: May 20, 2005
  • Publication Frequency: 3 Issues per Year
  • Audience: Researchers from different theoretical backgrounds and with different areas of interest that share a commitment to the systematic and exhaustive analysis of naturally occurring language

Review of A Practical Handbook of Corpus Linguistics

Magali Paquot and Stephan Th. Gries (eds.), Springer Nature Publishing Company, Cham, 2020 (eBook), ISBN: 978-3-030-46216-1

  • Book Review
  • Published: 05 July 2022
  • Volume 6, pages 253–260 (2022)


  • Leida Maria Monaco (ORCID: orcid.org/0000-0003-4164-7815)


Biber, D. (1988). Variation across Speech and Writing . Cambridge: Cambridge University Press

Book   Google Scholar  

Biber, D. (1989). A typology of English texts. Linguistics , 27 , 3–43

Article   Google Scholar  

McEnery, T., & Hardie, A. (2011). Corpus Linguistics: Method, Theory and Practice . Cambridge: Cambridge University Press

McEnery, T., & Wilson, A. (1996). Corpus Linguistics . Edinburgh: Edinburgh University Press

Google Scholar  

Mohamed, G. (2011). Text Classification in the BNC Using Corpus and Statistical Methods . Lancaster University: Unpublished PhD Dissertation

Mohamed, G., & Hardie, A. (2021). Approaching Text Typology through Cluster Analysis in Arabic. In T. McEnery (Ed.), Arabic Corpus Linguistics (pp. 201–228). Edinburgh: Edinburgh University Press

Moisl, H. (2015). Cluster Analysis for Corpus Linguistics . Amsterdam/Philadelphia: John Benjamins

O’Keeffe, A., & McCarthy, M. (2022 [2010]). The Routledge Handbook of Corpus Linguistics . London:Routledge

Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use . Cambridge: Cambridge University Press

Stefanowitsch, A. (2020). Corpus Linguistics: A Guide to the Methodology . Berlin: Language Science Press

Weisser, M. (2015). Practical Corpus Linguistics: An Introduction to Corpus-Based Language Analysis . Chichester: Wiley

Download references

Author information

Authors and affiliations.

Departamento de Filología Inglesa, Alemana y Francesa, Facultad de Filosofía y Letras, Universidad de Oviedo, c/ Amparo Pedregal, 15, 33011, Oviedo, Spain

Leida Maria Monaco


Corresponding author

Correspondence to Leida Maria Monaco.


About this article

Monaco, L. Review of A Practical Handbook of Corpus Linguistics. Corpus Pragmatics 6, 253–260 (2022). https://doi.org/10.1007/s41701-022-00125-8


Received: 24 April 2022

Accepted: 04 May 2022

Published: 05 July 2022

Issue Date: September 2022

DOI: https://doi.org/10.1007/s41701-022-00125-8


Announcements

Articles falling within one of the categories published in RiCL are welcome throughout the year.

Current Issue

Book reviews

ISSN: 2243-4712


Abstracting & indexing

Google Scholar

Index Copernicus International

Internet Archive Scholar

Linguistic Bibliography Online

MLA International Bibliography

Norwegian List

OASPA 

Publication Forum

ScienceGate

Scimago Journal Rank

Ulrich's Periodicals Directory


Asociación Española de Lingüística de Corpus / Spanish Association for Corpus Linguistics, Departamento de Filología Inglesa, Facultad de Letras, Campus de La Merced, Universidad de Murcia, 30003 Murcia, Spain

  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Developmental and Physical Disabilities Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

The Oxford Handbook of Applied Linguistics (2nd edn)

38 Research in Corpus Linguistics

Douglas Biber is Regents' Professor of English (Applied Linguistics) at Northern Arizona University. His research efforts have focused on corpus linguistics, English grammar, and register variation (in English and cross-linguistic; synchronic and diachronic). His publications include books on register variation and corpus linguistics published by Cambridge University Press (1988, 1995, 1998, to appear), the co-authored Longman Grammar of Spoken and Written English (1999), and more recent studies of language use in university settings and discourse structure investigated from a corpus perspective (both published by Benjamins: 2006 and 2007).

Randi Reppen is professor of applied linguistics in the Department of English at Northern Arizona University. Her research interests include exploring how corpus linguistics can inform language teaching and materials development. She can be reached at http://[email protected].

Eric Friginal is assistant professor in the Department of Applied Linguistics and English as a Second Language at Georgia State University. His main research interest lies in using corpus linguistics to explore linguistic variation in professional, cross-cultural discourse in the context of outsourced call centers in the Philippines serving American customers. He is the author of The Language of Outsourced Call Centers: A Corpus-Based Study of Cross-Cultural Interaction. He can be reached at http://[email protected].

  • Published: 18 September 2012

Corpus linguistics is a research approach that has developed over the past few decades to support empirical investigations of language variation and use, resulting in research findings that have much greater generalizability and validity than would otherwise be feasible. Corpus linguistics is not in itself a model of language. Rather, it can be regarded as primarily a methodological approach; it is empirical, analyzing the actual patterns of use in natural texts. It utilizes a large and principled collection of natural texts, known as a corpus, as the basis for analysis. At the same time, corpus linguistics is more than a methodological approach, because these methodological innovations have enabled researchers to ask fundamentally different kinds of research questions, sometimes resulting in radically different perspectives on language variation and use from those taken in previous research. Corpus linguistic research offers strong support for the view that language variation is systematic and can be described using empirical, quantitative methods.

1. Introduction

Corpus linguistics is a research approach that has developed over the past several decades to support empirical investigations of language variation and use, resulting in research findings that have much greater generalizability and validity than would otherwise be feasible. Corpus linguistics is not in itself a model of language. Rather, it can be regarded as primarily a methodological approach:

It is empirical, analyzing the actual patterns of use in natural texts.

It utilizes a large and principled collection of natural texts, known as a corpus , as the basis for analysis.

It makes extensive use of computers for analysis, employing both automatic and interactive techniques.

It depends on both quantitative and qualitative analytical techniques. (Biber, Conrad & Reppen, 1998 : 4)

At the same time, corpus linguistics is more than a methodological approach, because these methodological innovations have enabled researchers to ask fundamentally different kinds of research questions, sometimes resulting in radically different perspectives on language variation and use from those taken in previous research. Corpus linguistic research offers strong support for the view that language variation is systematic and can be described using empirical, quantitative methods. Variation often involves complex patterns consisting of the interaction among several different linguistic parameters, but, in the end, it is systematic. Beyond this, the major contribution of corpus linguistics is to document the existence of linguistic constructs that are not recognized by current linguistic theories. Research of this type—referred to as a corpus-driven approach—identifies strong tendencies for words and grammatical constructions to pattern together in particular ways, whereas other theoretically possible combinations rarely occur.

A novice student of linguistics could be excused for believing that corpus linguistics evolved only recently, as a reaction against the standard practice of intuition-based linguistics. Introductory linguistics textbooks tend to present linguistic analysis (especially syntactic analysis) as it has been practiced over the past 50 years, employing the analyst's intuitions rather than being based on empirical analysis of natural texts. Against that background, it would be easy for a student to imagine that corpus linguistics developed only in the 1980s and 1990s, responding to the need to base linguistic descriptions on actual language use.

This view is far from accurate. In fact, intuition-based linguistics developed as a reaction to corpus-based linguistics. That is, the standard practice in linguistics up until the 1950s was to base language descriptions on analyses of collections of natural texts: precomputer corpora. Dictionaries have long been based on empirical analysis of word use in natural sentences. For example, Samuel Johnson's Dictionary of the English Language , published in 1755, was based on approximately 150,000 natural sentences recorded on slips of paper, to illustrate the natural usage of words. The Oxford English Dictionary , published in 1928, was based on approximately 5,000,000 citations from natural texts (totaling around 50 million words), compiled by over 2,000 volunteers over a 70-year period. (See the discussion in G. D. Kennedy, 1998 : 14–15.) West's ( 1953 ) creation of the General Service List from a preelectronic corpus of newspapers was one of the first empirical vocabulary studies not motivated by the goal of creating a dictionary.

Grammars were also sometimes based on empirical analyses of natural text corpora before 1960. For example, Jespersen's grammars of English (1909–1949) used natural sentences from newspapers and novels to illustrate the various structures. An even more noteworthy example of this type is the work of C. C. Fries, who wrote two corpus-based grammars of American English. The first, published in 1940, had a focus on usage and social variation, based on a corpus of letters written to the government. The second is essentially a grammar of conversation: It was published in 1952, based on a 250,000-word corpus of telephone conversations. It includes authentic examples taken from the corpus and discussion of grammatical features that are especially characteristic of conversation (e.g., the words well, oh, now , and why when they initiate a “response utterance unit”; Fries, 1952 : 101–102).

In the 1960s and 1970s, most research in linguistics shifted to intuition-based methods, arguing that language was a mental construct and that empirical analyses of corpora were not relevant for describing language competence. However, even during this period, some linguists continued the tradition of empirical linguistic analysis. For example, in the early 1960s, Randolph Quirk began the Survey of English Usage, a precomputer collection of 200 spoken and written texts (each around 5,000 words) that was subsequently used for descriptive grammars of English (e.g., Quirk et al., 1972). Functional linguists like Prince and Thompson also continued this descriptive tradition, arguing that (noncomputerized) collections of natural texts could be studied to identify systematic differences in the functional use of linguistic variants. For example, Prince (1978) compares the discourse functions of WH-clefts and IT-clefts in spoken and written texts. Thompson has been especially interested in the study of grammatical variation in conversation; for example, Thompson and Mulac (1991) analyzed factors influencing the retention versus omission of the complementizer that in conversation, whereas Fox and Thompson (1990) studied variation in the realization of relative clauses in conversation.

What changed in the 1980s was the widespread availability of large electronic corpora and the increasing availability of computational tools that facilitated the linguistic analysis of those corpora. Work on large electronic corpora began in the 1960s, when Kucera and Francis (1967) compiled the Brown Corpus (a one-million-word corpus of published AmE written texts). This was followed by a parallel corpus of BrE written texts: the LOB Corpus, published in the 1970s.

It was not until the 1980s, though, that major studies of language use based on large electronic corpora began to appear. Thus, in 1982, Francis and Kucera provide a frequency analysis of the words and grammatical part-of-speech categories found in the Brown Corpus, followed in 1989 by a similar analysis of the LOB Corpus (Johansson and Hofland, 1989 ). Book-length descriptive studies of linguistic features began to appear in this period (e.g., Granger, 1983 , on passives; de Haan, 1989 , on nominal postmodifiers) as did the first multidimensional studies of register variation (e.g., Biber, 1988 ). During this same period, English language learner dictionaries based on the analysis of large electronic corpora began to appear, such as the Collins CoBuild English Language Dictionary (1987) and the Longman Dictionary of Contemporary English (1987). Since that time, most descriptive studies of linguistic variation and use in English have been based on analysis of an electronic corpus, either a large standard corpus (such as the British National Corpus) or a small corpus designed for a specific study (e.g., a corpus of 20 biology research articles constructed for a genre analysis). Within applied linguistics, the subfields of English for specific purposes and English for academic purposes have been especially influenced by corpus research, so that nearly all articles published in these areas employ some kind of corpus analysis.

Studies in this research tradition have adopted the tools and techniques available from computer-based corpus linguistics, with its emphasis on the representativeness of the text collection, and its computational tools for investigating distributional patterns across registers and across discourse contexts in large text collections. The textbook treatments by Kennedy (1998), Biber, Conrad, and Reppen (1998), and McEnery, Xiao, and Tono (2006) provide good introductions to the methods used for these studies as well as surveys of previous research.

In the ensuing sections, we survey many of the most important linguistic studies over the past 25 years that have employed corpus analysis. These studies have been motivated by two major research goals (see Biber, Conrad, and Reppen, 1998 : 5–8):

To describe linguistic features, such as vocabulary, lexical combinations, or grammatical features. These studies focus on variation in the choice among related linguistic features (e.g., the simple past tense versus present perfect aspect) or on the discourse functions of a single linguistic feature.

To describe the overall characteristics of a variety: a register or dialect. These studies provide relatively comprehensive linguistic descriptions of a single variety or of a set of related varieties.

Section 2, which follows, introduces studies of the first type, whereas section 3 surveys studies of the second type. Studies of both types have been undertaken for many of the world's languages. However, to limit the scope of the chapter, we survey only studies of English. Then, in section 4, we survey pedagogical applications of these descriptive corpus-based studies, discussing how classroom teaching and materials development have been influenced by the corpus revolution.

2. Descriptive Linguistic Studies

2.1. Corpus Studies with a Lexical Focus

Many of the earliest uses of corpora were designed to provide word lists ranked by frequency, comparing the most frequent words in different varieties. For example, Francis and Kucera (1982) and Johansson and Hofland (1989) catalog the most frequent words in the Brown and LOB Corpora, comparing word frequencies in the fiction versus nonfiction components of the corpora.
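A frequency-ranked word list of this kind can be sketched in a few lines of Python. The sketch below is only an illustration: the two sample strings are invented stand-ins for a fiction and a nonfiction subcorpus (they are not taken from Brown or LOB), the tokenizer is deliberately crude, and the function names are hypothetical. Counts are normalized to a rate per 1,000 words so that samples of different sizes can be compared.

```python
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer: lowercase alphabetic strings (apostrophes kept).
    return re.findall(r"[a-z']+", text.lower())

def ranked_frequencies(text, per=1000):
    # Returns (word, raw count, rate per `per` tokens), most frequent first.
    tokens = tokenize(text)
    total = len(tokens)
    return [(w, c, c * per / total) for w, c in Counter(tokens).most_common()]

# Invented stand-ins for two subcorpora; not drawn from Brown or LOB.
fiction = "She said she would go. He said nothing and went home."
nonfiction = "The results of the study show that the rate of change is small."

for name, sample in [("fiction", fiction), ("nonfiction", nonfiction)]:
    word, count, rate = ranked_frequencies(sample)[0]
    print(f"{name}: {word!r} occurs {count} times ({rate:.1f} per 1,000 words)")
```

With real corpora, the same comparison would normally be run on lemmatized or part-of-speech-tagged text rather than on raw word forms.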

One of the major contributions of corpus-based lexical studies has been the insight that collocational associations are a central consideration for describing the meaning of a word. For example, the copular verbs turn, come, and go all have the same dictionary meaning: “to become, or to change to another state.” However, corpus research (Biber et al., 1999: 444–445) shows that these three verbs have very different collocational associations: The most common adjectives following turn are color terms, like black, brown, red, and white. The most common adjectives following come describe processes representing a change to a more dynamic condition, such as alive, awake, clean, loose, and unstuck. And in contrast to both other verbs, the most common adjectives following go are all negative: crazy, mad, and wrong. It is not clear whether differences like these should be regarded as part of the core connotational meaning of a word, but it seems uncontroversial that this kind of information is crucially important for language learners.
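The comparison of turn, come, and go can be approximated by counting the word that immediately follows each verb. The following is a minimal sketch under stated assumptions: the example sentences are invented for illustration and are not the corpus data analyzed by Biber et al. (1999), the verb-form lists are hand-written, and no part-of-speech tagging is applied, so the counts would include non-adjectives in real text. The function name `right_collocates` is hypothetical.

```python
import re
from collections import Counter, defaultdict

def right_collocates(sentences, lemma_forms):
    # Count the word immediately following any listed form of each verb.
    form_to_lemma = {f: lemma for lemma, forms in lemma_forms.items() for f in forms}
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for node, following in zip(tokens, tokens[1:]):
            if node in form_to_lemma:
                counts[form_to_lemma[node]][following] += 1
    return counts

# Hand-written inflected forms (an assumption; a lemmatizer would replace this).
forms = {
    "turn": ["turn", "turns", "turned", "turning"],
    "come": ["come", "comes", "came", "coming"],
    "go": ["go", "goes", "went", "gone", "going"],
}

# Invented example sentences, not corpus data.
sentences = [
    "The leaves turned brown.",
    "Her face turned red.",
    "The knot came loose.",
    "Everything went wrong.",
    "The plan went wrong.",
    "He went mad.",
]

collocates = right_collocates(sentences, forms)
for verb in forms:
    print(verb, collocates[verb].most_common(3))
```

A fuller analysis would restrict the following word to adjectives and would use an association measure rather than raw counts.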

There have been numerous corpus-based studies of collocation. Probably the best known is Sinclair (1991), who provides detailed descriptions of the collocations of decline, yield, and set in. Another excellent book-length introduction to the corpus-based study of collocation is by Partington (1998). For example, in chapter 2 of his book, Partington discusses the word sheer and its supposed synonyms pure, complete, and absolute, showing how these words are not at all interchangeable when considered from the perspective of their frequent collocates. Mahlberg (2005) provides a book-length treatment of general nouns in English (e.g., time, day, man, woman, people, thing, way), describing their meanings and use with respect to their collocational associations.

Most studies of collocation have disregarded register differences. One exception to this practice appears in a work by Biber, Conrad, and Reppen (1998: 43–53), which shows how the near-synonyms big, large, and great co-occur with very different sets of collocates (e.g., big enough versus large number versus great deal), and further shows how the collocational associations are very different in fiction versus academic writing. Other collocational studies taking a register perspective include those by Gledhill (2000) and Marco (2000), which both describe the functions of collocations in academic research writing.

Studies of collocation have in turn led to development of the notion of semantic prosody (Louw, 1993; Partington, 1998): the positive or negative connotations shared by the set of collocates that co-occur with a word. For example, the copular verb go (previously discussed) has a strong negative semantic prosody, whereas the copular verb come has a positive semantic prosody. Partington (1998: 66–67) discusses another example of this type: the verb commit, which has a strong negative semantic prosody, co-occurring with nouns like crime, suicide, and offenses. Similarly, Sinclair (1991: 74–75) notes that the nouns that co-occur as the subject of set in are mostly unpleasant states of affairs, such as rot, decay, malaise, despair, infection, disillusion, and so on. Studies have tended to focus on words with negative prosodies rather than positive prosodies. Other examples include cause (Stubbs, 1995), signs of (Stubbs, 2001: 458), and sit through (Hunston, 2002b: 60–62).

A related productive area of research has been the corpus-based (and corpus-driven) investigation of formulaic language in spoken and written registers. The methods and research goals of this line of research are quite different from the typical study of collocation. That is, studies of collocation have typically been case studies focused on a few particular words. These studies have typically disregarded register differences, and they have not attempted to generalize to the textual use of collocational combinations generally. In contrast, corpus studies of longer formulaic expressions are normally carried out in the context of a particular register or for the purposes of describing patterns of variation among multiple registers; in addition, the goal of these studies is to generalize about the use of formulaic language in the target registers rather than to present case studies restricted to one or two particular formulaic sequences. For example, Simpson (2004) and Simpson and Mendis (2003) describe the functions of idioms in academic spoken registers.

Many other studies have taken a corpus-driven approach to this research domain, identifying the sequences of words that are most common in different spoken and written registers (rather than starting with a set of formulaic expressions identified a priori based on their perceptual salience). These common word sequences, often referred to as lexical bundles , are usually not idiomatic and are not complete structures, but they are important building blocks of discourse. Thus, for example, Altenberg ( 1998 ) focuses on the recurrent word sequences in spoken English, whereas Biber et al. (1999 , chapter 13 ) compare the lexical bundles in conversation and academic writing. Applying that framework, several studies have considered the types and functions of lexical bundles in additional registers: university classroom teaching and textbooks (Biber, Conrad, and Cortes, 2004 ; Nesi and Basturkmen, 2006 ), university student writing (Cortes, 2004 ), university institutional and advising registers (Biber and Barbieri, 2007 ), and political debate (Partington and Morley, 2004). N. Ellis et al. ( 2008 ) begin with a corpus analysis to identify a set of word sequences that are either frequent or that have strong collocational associations; they then test the psycholinguistic status of those sequences with respect to their perceptual salience and for their role in language production and comprehension (cf. Schmitt, Grandage, and Adolphs, 2004 ).
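The corpus-driven identification of lexical bundles described above can be sketched as the extraction of recurrent word sequences that meet a frequency threshold and occur in more than one text. The sketch below is an illustration under assumptions, not the procedure of any study cited here: the sequence length and both cutoffs (`n`, `min_freq`, `min_texts`) are hypothetical parameters, and the three sample texts are invented.

```python
import re
from collections import Counter, defaultdict

def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    # Collect every n-word sequence, tracking total frequency and
    # the set of texts in which each sequence occurs.
    freq = Counter()
    text_range = defaultdict(set)
    for text_id, text in enumerate(texts):
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(text_id)
    # Keep only sequences that are frequent AND spread across texts.
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and len(text_range[gram]) >= min_texts
    }

# Invented sample texts, not corpus data.
texts = [
    "I don't know what you mean, but I don't know what to do.",
    "Well, I don't know what happened.",
    "As a result of the study, the size of the sample grew.",
]

bundles = lexical_bundles(texts)
print(bundles)
```

The requirement that a sequence occur in several texts guards against sequences that are frequent only because of one speaker's or author's idiosyncratic usage; in practice, frequency cutoffs are usually stated as normalized rates rather than raw counts.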

Corpus studies have shown that the types and functions of lexical bundles are very different among spoken and written registers (see, e.g., Biber, Conrad, and Cortes, 2004 ). First of all, there are generally more lexical bundles used in spoken registers than written registers. In terms of their structural characteristics, the bundles in speech tend to be composed of verb phrase and clause fragments, whereas the bundles in writing tend to be composed of noun phrase and prepositional phrase fragments. Those differences correspond to different discourse functions: The bundles in speech tend to be used for stance and discourse organizing functions, whereas the bundles in writing tend to have referential functions.

Of all subareas of applied linguistics, corpus research has probably had the greatest impact on lexical research and vocabulary studies. As previously noted, West (1953) created the General Service List of important vocabulary items based on analysis of a preelectronic corpus, and that list has been used in countless studies of vocabulary acquisition. One central concern has been to estimate the number of different words that a learner needs to know for different communicative purposes. Waring and Nation (1997) use corpus analysis to estimate the number of words needed to comprehend general written texts, whereas Coxhead (2000) analyzed a corpus of academic texts from several disciplines to develop a word list specifically for written academic language. Adolphs and Schmitt (2003) utilize analyses of spoken corpora to estimate the number of words required to understand conversational interactions.
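The logic behind such estimates can be illustrated with a simple coverage calculation: rank the word types in a text by frequency and count how many of the most frequent types are needed before a chosen proportion of the running words is accounted for. This is a hedged sketch only. The sample sentence is invented, the `target` proportion is a hypothetical parameter rather than a figure from the studies cited, and published estimates are typically based on word families rather than raw word forms.

```python
import re
from collections import Counter

def words_needed_for_coverage(text, target):
    # Number of top-ranked word types needed to cover `target`
    # (a proportion between 0 and 1) of the running words in `text`.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / len(tokens) >= target:
            return rank
    return len(counts)

# Invented sample text, not corpus data.
sample = "the cat sat on the mat and the dog sat on the rug"

print(words_needed_for_coverage(sample, 0.6))
print(words_needed_for_coverage(sample, 1.0))
```

Because a small number of very frequent words accounts for a large share of running text, each additional increment of coverage requires many more word types than the previous one.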

Corpus research is similarly accepted as the standard practice in lexicography, so that all major ELT dictionaries are currently based on analysis of actual word use in large corpora (e.g., the Collins CoBuild English Language Dictionary [1987], the Longman Dictionary of Contemporary English [1987], and the Cambridge Advanced Learner's Dictionary [2005]). In sum, it would not be an overstatement to say that corpus research has revolutionized the way that lexicography, vocabulary acquisition, and word use in general are approached in linguistics.

2.2. Corpus Studies with a Grammatical Focus

Within descriptive linguistics, there have been numerous book-length studies over the past 20 years reporting corpus-based investigations of grammar and discourse: for example, Tottie (1991) on negation, Collins (1991) on clefts, Mair (1990) on infinitival complement clauses, Meyer (1992) on apposition, several books on nominal structures (e.g., de Haan, 1989; Geisler, 1995; Johansson, 1995), Mindt (1995) on modal verbs, Hunston and Francis (2000) on pattern grammar, Lindquist and Mair (2004) on grammaticalization, and Mair (2006) on recent grammatical change within American English and British English (in other words, during the twentieth century).

Most corpus-based grammatical studies take a register perspective. Many of these focus on the linguistic variants associated with a feature, using register differences as one factor to account for the patterns of linguistic variation. However, there are an even larger number of studies that have focused on the use of a particular linguistic feature in a single register; in this case, the goals of the study are to describe both the discourse functions of the linguistic feature as well as the target register itself. Studies of both types can be further subdivided according to the linguistic level of the target feature (e.g., grammatical class, dependent clause type). In addition, both types of studies include descriptions of synchronic patterns of use as well as descriptions of historical patterns of variation.

Corpus-based studies of linguistic features using register as a predictor have investigated linguistic variation from all grammatical levels, from simple part of speech categories to variation in the realization of syntactic phrase and clause types. These studies have shown that descriptions of grammatical variation and use are not valid for the language as a whole. Rather, characteristics of the textual environment interact with register differences so that strong patterns of use in one register often represent only weak patterns in other registers. The Longman Grammar of Spoken and Written English (Biber et al., 1999 ) and Cambridge Grammar of English (Carter and McCarthy, 2006 ) are comprehensive reference works with this goal, applying corpus-based analyses to show how any grammatical feature can be described for structural characteristics as well as patterns of use across spoken and written registers.

As previously noted, many corpus-based studies use register differences as a predictor of linguistic variation, whereas others study linguistic features in the context of a single register. Thus, for example, Tottie 1991 contrasts the choices between synthetic and analytic negation, as in

He could find no words to express his pain. versus He couldn't find any words to express his pain.

Among other factors, Tottie shows that synthetic negation is strongly preferred in written rather than in spoken registers, whereas analytic negation is more commonly used in spoken registers. In contrast, Hyland (1998a) focuses on the single register of scientific research articles, describing variation in the use of hedges within that register.

As noted earlier, these studies have documented the use of lexico-grammatical features at all linguistic levels. Several studies analyze a single part-of-speech category, documenting the patterns of variation and use in particular registers. Studies taking the perspective of register variation include Barbieri 2005 on quotative verbs and Römer (2005a) on progressive verbs.

Several other studies describe linguistic variation within the context of a single spoken register, such as conversation. Quaglio and Biber 2006 survey the distinctive grammatical characteristics of conversation identified through corpus research, whereas other studies provide detailed descriptions of a particular feature in conversation. For example, McCarthy ( 2002 ) describes nonminimal response tokens; Aijmer 2002 provides a book-length description of discourse particles; Carter and McCarthy 2006 describe the discourse functions of the get passive; Tao and McCarthy 2001 focus on nonrestrictive which clauses; and Norrick 2008 describes the discourse functions of interjections in conversational narratives. Other studies of a single spoken register have focused on academic speech in university settings, based on analysis of the Michigan Corpus of Academic Spoken English (MICASE). For example, Fortanet 2004 focuses on the pronoun we in university lectures; Lindemann and Mauranen 2001 describe the use of just in academic speech; and Swales 2001 provides a detailed description of the discourse functions served by point and thing in university academic speech.

A much larger number of studies have described linguistic variation within the context of a particular written register, most often a type of academic writing. Many of these have focused on the kinds of verbs used in research writing (e.g., Thomas and Hawes, 1994 ), or the referring expressions in research articles (e.g., Hyland, 2001 , on the use of self-mentions and Kuo, 1999 , on the role relationships expressed by personal pronouns). Other studies deal with simple grammatical structures, but again most often within the context of academic writing. For example, Hyland (2002a) and Swales et al. ( 1998 ) describe variation in the use of imperatives and the expression of directives, whereas Hyland (2002b) and Marley 2002 focus on the use of questions in written registers.

The study of linguistic variation related to the expression of stance and modality has been especially popular in corpus-based research. Several of these studies compare the ways in which stance is expressed in spoken versus written registers. Biber and Finegan 1988 and Conrad and Biber 2001 focus on adverbial markers of stance in speech and writing, whereas Biber and Finegan (1989a , 1989b) and Biber et al. (1999 , chapter 12 ) survey variation in the use of numerous grammatical stance devices (including modal verbs, stance adverbials, and stance complement clause constructions), again contrasting the patterns of use in spoken versus written registers. Biber (2006a , 2006b) and Keck and Biber ( 2004 ) take a similar approach but applied to university spoken and written registers.

Many other studies focus exclusively on the expression of stance and modality in written registers (usually academic writing). These include Vohla's (1999) study of modality in medical research writing, the studies of stance by Charles (2003 , 2006 , 2007) on academic writing from different disciplines, and several studies that focus on hedging in academic writing (e.g., Grabe and Kaplan, 1997 ; Hyland, 1996 , 1998a ; Salager, 1994 ). Related studies have been carried out under the rubric of evaluation , again usually focusing on academic writing (e.g., Hunston and Thompson, 2000 ; Hyland and Tse, 2005 ; Römer, 2005b ; Stotesbury, 2003 ; Tucker, 2003 ; cf. Bednarek's 2006 study of evaluation in newspaper language). Fewer studies have described the linguistic devices used to express stance and evaluation in spoken registers; some of these have focused on conversation (e.g., McCarthy and Carter, 1997 , 2004 ; Tao, 2007 ), whereas others have focused on academic spoken registers (e.g., Mauranen, 2003 , 2004 ; Mauranen and Bondi, 2003 ; Swales and Burke, 2003 ).

Dependent clauses and more complex syntactic structures have also been the focus of numerous corpus-based studies that consider register differences. Several studies contrast the patterns of use in spoken and written registers: Collins (1991) on cleft constructions, de Haan (1989) on nominal postmodifiers, Geisler (1995) on relative infinitives, Johansson (1995) on relative pronoun choice, and Biber et al. (1999) on complement clause constructions. Other studies have focused on the use of a syntactic construction in a particular register, like the study of conditionals in medical discourse (G. Ferguson, 2001) or the study of extraposed constructions in university student writing (Hewings and Hewings, 2002).

All of the kinds of studies surveyed in the preceding paragraphs can be approached from a historical (or diachronic) perspective rather than a synchronic perspective, and numerous studies have taken that approach. For example, many of the papers in the edited volumes by Nevalainen and Kahlas-Tarkka (1997) and Kytö, Rydén, and Smitterberg (2006) incorporate register comparisons to describe historical change for linguistic features like existential clauses, adverbial clauses, and relative clauses. Biber and Clark (2002) contrast the kinds of noun modifiers common in academic versus popular written registers. Several historical studies of stance and modality have included analysis of register differences, such as Kytö (1991) on modal verbs in written and speech-based registers, Culpeper and Kytö (1999) on hedges in Early Modern English dialogues, Salager-Meyer and Defives (1998) on hedges in academic writing over the last two centuries, Fitzmaurice (2002b, 2003) on stance and politeness in early eighteenth-century letters, and Biber (2004) on historical change in the use of stance and modal features across a range of speech-based and written registers. A few studies have focused on recent (i.e., twentieth-century) historical change; for example, Hundt and Mair (1999) contrast the rapid grammatical change observed in “agile” registers (like newspaper writing) with the much slower pace of change observed in “uptight” registers like academic prose. Leech, Hundt, Mair, and Smith (in press) track historical change in the twentieth century using the register categories distinguished in the Brown/LOB family of corpora.

3. Descriptions of Varieties

3.1. Register Descriptions

The studies surveyed in the preceding section focus on a particular linguistic feature, using register to describe the use of that feature. In the present section, the analytical perspective is reversed: These studies focus on the overall description of a register, considering a suite of linguistic features that are characteristic of the register.

Many studies of this type describe spoken registers, including conversation (e.g., Biber, 2008; Carter and McCarthy, 1997, 2004; Quaglio and Biber, 2006; Biber and Conrad, in press: chapter 4), service encounters (e.g., McCarthy, 2000), call center interactions (Friginal, 2009a, 2009b), spoken business English (McCarthy and Handford, 2004), television dialogue (Quaglio, 2009; Rey, 2001), spoken media discourse (O'Keeffe, 2006), and spoken university registers like classroom teaching, office hours, and teacher-mentoring sessions (e.g., Biber, 2006a; Biber, Conrad, and Leech, 2002; Csomay, 2005; Reppen and Vásquez, 2007). Ädel and Reppen (2008) include several papers that use corpus analysis to describe different registers from academic, workplace, and television settings.

However, written registers have received considerably more attention than spoken registers. Academic prose has been the best described written register (see, e.g., Biber, 2006a; Biber, Connor, and Upton, 2007; Connor and Mauranen, 1999; Connor and Upton, 2004b; Conrad, 1996, 2001; Freddi, 2005; McKenna, 1997; Tognini-Bonelli and Del Lungo Camiciotti, 2005). But many other written registers have also been described using corpus-based analysis, including personal letters (e.g., Connor and Upton, 2003; Fitzmaurice, 2002a; Precht, 1998), written advertisements (e.g., Bruthiaux, 1994, 1996, 2005), newspaper discourse (e.g., Bednarek, 2006; Herring, 2003; Jucker, 1992), and fiction (e.g., Thompson and Sealey, 2007; Mahlberg, in press; Semino and Short, 2004). Electronic registers that have emerged over the past few decades, from e-mail communication to weblogs and texting, have been an especially interesting and productive area of research (see, e.g., Biber and Conrad, in press: chapter 7; Danet and Herring, 2003; Gains, 1999; Herring and Paolillo, 2006; Hundt, Nesselhauf, and Biewer, 2007; Morrow, 2006).

3.2. Multidimensional Analyses of Register Variation

Most of the studies previously listed have the primary goal of describing a single register. However, corpus analysis can also be used to describe the overall patterns of variation among a set of spoken and/or written registers. Perhaps the best known approach used for descriptions of this type is multidimensional (MD) analysis: a corpus-driven methodological approach that identifies the frequent linguistic co-occurrence patterns in a language, relying on inductive empirical/quantitative analysis (see, e.g., Biber, 1988 , 1995 ; Biber and Conrad, in press: chapter 8 ). Frequency plays a central role in the analysis, because each dimension represents a constellation of linguistic features that frequently co-occur in texts. These dimensions of variation can be regarded as linguistic constructs not previously recognized by linguistic theory. Thus, MD analysis is a corpus-driven (as opposed to corpus-based) methodology, in that the linguistic constructs—the dimensions—emerge from analysis of linguistic co-occurrence patterns in the corpus. The set of co-occurring linguistic features that comprise each dimension is identified quantitatively. That is, based on the actual distributions of linguistic features in a large corpus of texts, statistical techniques (specifically, factor analysis) are used to identify the sets of linguistic features that frequently co-occur in texts.

The original MD analyses (Biber, 1986 , 1988 ) investigated the relations among general spoken and written registers in English, based on analysis of the Lancaster-Oslo/Bergen (LOB) Corpus (15 written registers) and the London-Lund Corpus (6 spoken registers). Sixty-seven different linguistic features were analyzed computationally in each text of the corpus. Then, the co-occurrence patterns among those linguistic features were analyzed using factor analysis, identifying the underlying parameters of variation—in other words, the factors or dimensions.
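The first computational step described above, counting features in each text and then examining how they co-occur, can be sketched in a few lines. This is a minimal illustration only, not the procedure of the original studies: the three feature names, the counts, and the normalization to a rate per 1,000 words are assumptions made for the example, and pairwise Pearson correlation stands in for the correlation matrix that a factor analysis would take as input.

```python
from math import sqrt


def rate_per_thousand(count, text_length):
    """Normalize a raw feature count to a rate per 1,000 words."""
    return 1000.0 * count / text_length


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of rates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: raw counts of three features in four texts.
texts = [
    {"words": 2000, "nouns": 600, "prepositions": 280, "first_person_pronouns": 10},
    {"words": 2000, "nouns": 560, "prepositions": 250, "first_person_pronouns": 20},
    {"words": 1000, "nouns": 180, "prepositions": 80, "first_person_pronouns": 60},
    {"words": 1500, "nouns": 255, "prepositions": 105, "first_person_pronouns": 105},
]
features = ["nouns", "prepositions", "first_person_pronouns"]

# One list of normalized rates per feature, in text order.
rates = {f: [rate_per_thousand(t[f], t["words"]) for t in texts] for f in features}

# Pairwise correlations: features that rise and fall together across texts
# correlate positively; features in complementary distribution correlate negatively.
correlations = {(a, b): pearson(rates[a], rates[b]) for a in features for b in features}
```

In this toy data, nouns and prepositions rise and fall together across texts while first-person pronouns pattern in the opposite direction, which is the kind of co-occurrence structure a factor analysis would group into a single dimension with positive and negative poles.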

In the 1988 MD analysis, the 67 linguistic features were reduced to 7 underlying dimensions. (The technical details of the factor analysis are given in Biber, 1988 : chapters 4 – 5 ; see also Biber, 1995 : chapter 5 ). The dimensions are interpreted functionally, based on the assumption that linguistic co-occurrence reflects underlying communicative functions; that is, linguistic features occur together in texts because they serve related communicative functions. For example, table 38.1 lists the important co-occurring features for dimensions 1 and 2 from the 1988 MD analysis, together with the labels reflecting the functional interpretation.
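Once a dimension has been identified, each text can be placed along it. The sketch below assumes one common way of doing this, which the present chapter does not spell out: standardize each feature's rate across the corpus, then sum the standardized rates of the features on the positive pole and subtract those on the negative pole. The feature names, the rates, and the use of the population standard deviation are all illustrative assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of rates to mean 0 and standard deviation 1.

    Assumes the feature actually varies across texts (non-zero deviation).
    """
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


def dimension_scores(rates, positive, negative):
    """Score each text: sum of z-scores on the positive pole minus the negative pole."""
    z = {feature: z_scores(values) for feature, values in rates.items()}
    n_texts = len(next(iter(rates.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]


# Hypothetical rates per 1,000 words for four texts.
rates = {
    "nouns": [300.0, 280.0, 180.0, 170.0],
    "prepositions": [140.0, 125.0, 80.0, 70.0],
    "first_person_pronouns": [5.0, 10.0, 60.0, 70.0],
}

# Hypothetical assignment of features to the two poles of one dimension.
scores = dimension_scores(
    rates,
    positive=["first_person_pronouns"],
    negative=["nouns", "prepositions"],
)
```

Under these assumptions, texts dense in nouns and prepositions receive negative scores and texts dense in first-person pronouns receive positive ones, so registers can be compared by the mean score of their texts.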

Many subsequent studies have applied the 1988 dimensions of variation to study the linguistic characteristics of other more specialized registers and discourse domains (Conrad and Biber, 2001). However, other MD studies have undertaken new corpus-driven analyses to identify the distinctive sets of co-occurring linguistic features that appear in a particular discourse domain or in a language other than English. The following section surveys some of those studies.

3.2.1 Comparison of the Multidimensional Patterns across Discourse Domains and Languages

Numerous other studies have undertaken complete MD analyses, using factor analysis to identify the dimensions of variation operating in a particular discourse domain in English rather than applying the dimensions from the 1988 MD analysis (e.g., Biber, 1992 , 2001 , 2006a , 2008 ; Biber, Connor, and Upton, 2007 ; Biber and Jones, 2005 ; Biber and Kurjian, 2007 ; Friginal 2006 , 2009b ; Kanoksilapatham, 2005 , 2007 ; Reppen, 2001 ).

Given that each of these studies is based on a different corpus of texts, representing a different discourse domain, it is reasonable to expect that they would each identify a unique set of dimensions. This expectation is reinforced by the fact that the more recent studies have included additional linguistic features not used in earlier MD studies (e.g., semantic classes of nouns and verbs). However, despite these differences in design and research focus, there are certain striking similarities in the set of dimensions identified by these studies.

Most important, in nearly all of these studies, the first dimension identified by the factor analysis is associated with a literate, informational focus (e.g., nouns, prepositional phrases, attributive adjectives, longer words) versus an oral, involved focus (personal involvement/stance, interactivity, and/or real time production features). For example, the MD studies of university spoken and written registers (Biber, 2006a), elementary school spoken and written registers (Reppen, 2001), and eighteenth-century written and speech-based registers (Biber, 2001) all identified a first dimension of this type. More surprisingly, a similar dimension has emerged even in MD studies that have focused exclusively on spoken registers, such as that of M. White (1994), which investigated register variation within the domain of job interviews, and of Biber (2008), which investigated register variation among the different types of conversation. A second parameter found in most MD analyses corresponds to narrative discourse, reflected by the co-occurrence of features like past tense, third-person pronouns, perfect aspect, and communication verbs (see, e.g., the Biber, 2006a, study of university registers; Biber, 2001, on eighteenth-century registers; and the Biber, 2008, study of conversation text types).

However, most of these studies have also identified some dimensions that are unique to the particular discourse domain. For example, Reppen's ( 1994 ) factor analysis identified a dimension of “other-directed idea justification” in elementary student registers. The study of university spoken and written registers (Biber, 2006a ) identified two dimensions that are specialized to the university discourse domain: “Procedural versus content-focused discourse” and “academic stance.”

In sum, corpus-driven MD studies of English registers have uncovered both surprising similarities and notable differences in the underlying dimensions of variation. Two parameters seem to be fundamentally important, regardless of the discourse domain: a dimension associated with informational focus versus (inter) personal focus and a dimension associated with narrative discourse. At the same time, these MD studies have uncovered dimensions particular to the communicative functions and priorities of each different domain of use.

These same general patterns have emerged from MD studies of languages other than English, including Nukulaelae Tuvaluan (Besnier, 1988), Korean (Kim and Biber, 1994), Somali (Biber and Hared, 1992, 1994), Taiwanese (Jang, 1998), Spanish (Biber, Davies, Jones, and Tracy-Ventura, 2006; Biber and Tracy-Ventura, 2007; Parodi, 2007), and Dagbani (Purvis, 2008). Taken together, these studies provide the first comprehensive investigations of register variation in non-English languages.

Biber (1995) synthesizes several of these studies to investigate the extent to which the underlying dimensions of variation and the relations among registers are configured in similar ways across languages. These languages show striking similarities in their basic patterns of register variation, as reflected by the co-occurring linguistic features that define the dimensions of variation in each language, the functional considerations represented by those dimensions, and the linguistic/functional relations among analogous registers. For example, similar to the full MD analyses of English, these MD studies have all identified dimensions associated with informational versus (inter)personal purposes and with narrative discourse.

At the same time, each of these MD analyses has identified dimensions that are unique to a language, reflecting the particular communicative priorities of that language and culture. For example, the MD analysis of Somali identified a dimension interpreted as “distanced, directive interaction,” represented by optative clauses, first- and second-person pronouns, directional preverbal particles, and other case particles. Only one register is especially marked for the frequent use of these co-occurring features in Somali—personal letters. This dimension reflects the particular communicative priorities of personal letters in Somali, which are typically interactive as well as explicitly directive.

The cross-linguistic comparisons further show that languages as diverse as English and Somali have undergone similar patterns of historical evolution following the introduction of written registers. For example, specialist written registers in both languages have evolved over time to styles with an increasingly dense use of noun phrase modification. Historical shifts in the use of dependent clauses are also surprising: in both languages, certain types of clausal embedding—especially complement clauses—turn out to be associated with spoken registers rather than with written registers.

These synchronic and diachronic similarities raise the possibility of universals of register variation. Synchronically, such universals reflect the operation of underlying form/function associations tied to basic aspects of human communication; diachronically, such universals relate to the historical development of written registers in response to the pressures of modernization and language adaptation.

3.3. Corpus-Based Studies of Historical Registers

Corpus analysis has been especially important for historical descriptions of registers (see Biber and Conrad, in press: chapter 6). Multidimensional analysis has been used to document historical patterns of register variation (e.g., Atkinson, 1992, 1996, 1999; Biber, 2001; Biber and Finegan, 1989a, 1997; Geisler, 2002). However, there has been an even larger number of studies that provide a detailed description of a single historical register. A few MD studies have focused on a specific register, such as the study of historical change in fictional dialogue by Biber and Burges (2000) or the study of recent changes in television dialogue (Rey, 2001). But most of these studies provide detailed descriptions of the linguistic characteristics of a historical register. Several of these studies analyze spoken registers from earlier historical periods (e.g., Culpeper and Kytö, 2000, forthcoming; Kahlas-Tarkka and Rissanen, 2007; Kryk-Kastovsky, 2000, 2006; Kytö and Walker, 2003). The great majority, though, focus on written historical registers, such as letters (Fitzmaurice, 2002a; Nevala, 2004), medical recipes and herbals (Mäkinen, 2002; Taavitsainen, 2001), and medical and scientific writing (e.g., Taavitsainen and Pahta, 2000, 2004).

3.4. World Englishes and English as a Lingua Franca (ELF)

In general, sociolinguistics has been resistant to the application of corpus-based analyses, and so most studies of social and regional dialect variation continue to employ traditional methodologies. However, a few research projects have studied regional dialect variation from a corpus perspective. For the most part, these projects have been conducted in European universities (Freiburg, Helsinki, Newcastle) and have focused on British English dialects, resulting in the Newcastle Electronic Corpus of Tyneside English, the Helsinki Corpus of British English Dialects (see Ihalainen, 1990 ), and the Freiburg English Dialect Corpus (FRED; see Kortmann and Wagner, 2005 ; Anderwald and Wagner, 2005 ). We are aware of only one study to date that has applied a corpus approach to analyze American English regional dialects: Grieve's ( 2009 ) study of variation in a 50-million-word corpus of letters to the editor collected from 200 cities from across the United States.

In contrast, the linguistic study of global varieties of English—or “World Englishes”—is almost always carried out from a corpus perspective. The strengths of the corpus approach make it ideal for describing new varieties that have emerged as English adapts to changing circumstances of use and contact with local languages and cultures (see Breiteneder, 2008 ). Research efforts in this area have focused on two major subareas: the study of World Englishes (indigenous varieties of English) and the study of English as a Lingua Franca (ELF; English used by nonnative English speakers). (See J. Jenkins, 2006 , for a full discussion of this topic.)

Corpus development efforts in the arena of World Englishes are best represented by the International Corpus of English (ICE) project. The ICE project is an attempt to construct comparable corpora for all varieties of English spoken around the world (see Greenbaum, 1988 , 1990a , 1990b , 1990c , 1991 , 1996 ; Greenbaum and Nelson, 1996 ). Each corpus in ICE ideally has the same design—in other words, a total size of one million words, with 500 texts of approximately 2,000 words each from the same registers (news, conversation, etc.). The texts in the corpus date from 1990 or later. The authors and speakers of the texts are aged 18 or over, are educated through the medium of English, and either were born in the target country or moved there at an early age (Nelson, 1996 ).
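The shared ICE design lends itself to a simple consistency check. The sketch below is a hypothetical helper, not part of any ICE tooling: it encodes the figures given above (500 texts of approximately 2,000 words, one million words in total) and flags texts that stray from the target length. The 10 percent tolerance is an assumption chosen for illustration, since the design only says "approximately."

```python
ICE_TEXTS = 500
ICE_WORDS_PER_TEXT = 2000
ICE_TOTAL_WORDS = ICE_TEXTS * ICE_WORDS_PER_TEXT  # 500 x 2,000 = 1,000,000


def check_design(text_lengths, tolerance=0.10):
    """Return a list of problems found when comparing a corpus to the ICE design.

    text_lengths: word counts, one per text.
    tolerance: allowed relative deviation from 2,000 words (an assumed value).
    """
    problems = []
    if len(text_lengths) != ICE_TEXTS:
        problems.append(f"expected {ICE_TEXTS} texts, found {len(text_lengths)}")
    low = ICE_WORDS_PER_TEXT * (1 - tolerance)
    high = ICE_WORDS_PER_TEXT * (1 + tolerance)
    for index, length in enumerate(text_lengths):
        if not low <= length <= high:
            problems.append(f"text {index} has {length} words")
    return problems


# A corpus that matches the design exactly produces no problems.
print(check_design([2000] * 500))
```

Keeping every component corpus to the same size and register mix is what makes frequency comparisons across national varieties straightforward.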

As part of the ICE project or other related efforts, individual corpora have been constructed for many of the varieties of English used around the world. These include corpora for the “inner-circle” varieties of English (e.g., for Australia, Canada, Great Britain, New Zealand, the United States; see http://www.ucl.ac.uk/english-usage/ice/) as well as corpora for numerous other varieties of English spoken around the world, such as Caribbean English, East African English, Fiji English, Filipino English, Hong Kong English, Indian English, Jamaican English, Nigerian English, Singaporean English, and Xhosa English (see, e.g., Banjo, 1996; Bolt and Kingsley, 1996; Bolton, 2000; Burridge and Kortmann, 2008; Friginal, 2009b; Holmes, 1996; Hundt, 1998, 2006; Hundt and Biewer, 2007; Kortmann, 2006; Mair, 1992; Mair and Sand, 1998; Ooi, 1997; Rogers, 2002, 2003; Sand, 1998, 1999; Schmied, 1990, 1994, 2004a, 2004b, 2005, 2006, 2007; Schmied and Hudson-Ettle, 1996; Tent and Mugler, 1996, 2004).

A parallel research effort has focused on English as a lingua franca (ELF). Two especially important projects in this area have been the Vienna Oxford International Corpus of English (VOICE; see Seidlhofer, 2006 , 2007 ; Seidlhofer, Breiteneder, and Pitzl, 2006 ; Breiteneder et al., 2006 ) and the corpus of English as Lingua Franca in Academic Settings (ELFA corpus; see Mauranen, 2003 , 2006 , 2007 ).

4. Corpus Linguistics, Language Learning, and Language Pedagogy

Explorations into the pedagogical applications of corpus linguistics continue to match ongoing advancements in corpus-based technology and classroom research. Vocabulary acquisition and the mastery of grammar for language learners have traditionally been the preferred areas of investigation by many corpus researchers involved in the design and creation of language teaching materials (Conrad, 1999 , 2000 ; Hinkel, 2002 ). However, in recent years, corpus tools have been utilized in the teaching of specific skills particularly in genre-based writing (Hyland, 2004b ; Swales, 2002 ) and speaking in various academic and professional contexts.

There are several points of intersection between corpus linguistics and directly applied issues that involve language teaching and learning. In the following sections, we address four of these:

The compilation and analysis of learner corpora

The use of corpora for language teaching and learning

Applications of corpus research in ESP/EAP

The extent to which corpus findings can be integrated into textbooks and other teaching materials

4.1. Learner Corpora

One major application of corpus methods has been in the construction of learner corpora and the analysis of those corpora to document differences across L1 backgrounds. The most important project of this type is the International Corpus of Learner English (ICLE), a collection of corpora produced by learners from several different language backgrounds (see, e.g., Granger, 1993, 1994, 1996, 1998a, 2003a, 2003b). Many studies have compared the patterns of use in learner corpora to those found in native-English corpora to document patterns of overuse or underuse by learners. Studies have focused on a wide range of grammatical features, such as passives, participle clauses, connectors, and so on (see Aarts and Granger, 1998; Granger, 1997a, 2004; Granger and Tyson, 1996; Granger, Hung, and Petch-Tyson, 2002). Many studies in this tradition have also focused on formulaic sequences and the lexico-grammatical patterns associated with different learner groups (see, e.g., Altenberg and Granger, 2001; De Cock, 1998; De Cock et al., 1998; Granger, 1998b; Meunier and Granger, 2008). Although most corpus studies of learner language have been based on the ICLE, there have also been major studies with similar research goals undertaken from other perspectives (e.g., Hinkel, 2002, 2003; Reder, Harris, and Setzler, 2003).
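A comparison of overuse and underuse of this kind can be sketched as follows. The chapter does not say which statistic the cited studies use; the example assumes the two-corpus log-likelihood measure that is widely used in corpus linguistics for frequency comparisons, and all counts and corpus sizes are invented for illustration.

```python
from math import log


def per_million(count, corpus_size):
    """Normalize a raw count to a rate per million words."""
    return 1_000_000 * count / corpus_size


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-corpus log-likelihood (G2) for one feature's frequency."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * log(observed / expected)
    return 2 * g2


def compare(count_learner, size_learner, count_native, size_native):
    """Label the learner rate relative to the native rate and return G2."""
    learner_rate = per_million(count_learner, size_learner)
    native_rate = per_million(count_native, size_native)
    if learner_rate > native_rate:
        direction = "overuse"
    elif learner_rate < native_rate:
        direction = "underuse"
    else:
        direction = "no difference"
    g2 = log_likelihood(count_learner, size_learner, count_native, size_native)
    return direction, g2


# Hypothetical counts of one connector in two 200,000-word corpora.
direction, g2 = compare(300, 200_000, 150, 200_000)
print(direction, round(g2, 2))
```

A G2 value above 3.84 corresponds to p < 0.05 with one degree of freedom, although with large corpora even small differences reach that threshold, so the size of the difference in normalized rates deserves attention as well.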

4.2. Corpora for Language Teaching and Learning

An even larger number of studies address the use of corpora for language teaching, introducing the approaches and discussing potential pedagogical benefits. These include numerous book-length treatments (e.g., Aston, 2001a; Aston, Bernardini, and Stewart, 2004; Botley, McEnery, and Wilson, 2000; Burnard and McEnery, 2000; Ghadessy et al., 2001; Lewandowska-Tomaszczyk, 2003, 2004; McEnery and Wilson, 1997; Mukherjee and Rohrbach, 2006; O'Keeffe, McCarthy, and Carter, 2007; Sinclair, 2004; Thomas and Short, 1996; Tribble and Jones, 1997; Wichmann, Fligelstone, McEnery, and Knowles, 1997) as well as an even larger number of journal articles and book chapters (e.g., Alderson, 1996; Aston, 1995, 1997, 2001b; Barbieri and Eckhardt, 2007; Braun, 2005; Brodine, 2001; Donley and Reppen, 2001; Fligelstone, 1993; Huckin and Coady, 1999; Kaltenböck and Mehlmauer-Larcher, 2005; Leech, 1997, 2000; McCarthy and Carter, 2001; McEnery and Wilson, 1993, 1997, 2001; Meunier, 2002; Milton, 1998; Mindt, 1996; Mudraya, 2006; Murphy, 1996; O'Keeffe and Farr, 2003; Partington, 2001; Salsbury and Crummer, 2008; Shirato and Stapleton, 2007; Thompson and Tribble, 2001; Tribble, 2001; Yoon and Hirvela, 2004; Zorzi, 2001).

One especially common topic of these studies is the use of concordancing activities in the classroom, especially for inductive, data-driven learning (in addition to many of the studies previously cited, see Cobb, 1997; Flowerdew, 2001; Gaskell and Cobb, 2004; Gavioli, 1997, 2001; Johns, 1994, 1997; Nesselhauf, 2003; Qiao and Sussex, 2001; Sinclair, 2003; Stevens, 1993; Todd, 2001; Wichmann, 1995). For instance, Cobb (1997) and Horst, Cobb, and Nicolae (2005) report specific learning gains in the transfer of vocabulary knowledge of language learners that are attributable to the use of concordance programs and corpus-based tools. Similar studies by Chan and Liou (2005), Charles (2005), and Friginal (2006) illustrate how web-based concordancing instruction and the use of concordancers in editing laboratory reports significantly help students' learning and use of verb-noun collocations, reporting verbs, passive and active sentence structures, and linking adverbials. Most participants in these studies see the use of concordancers as helpful. Innovative corpus tools that aid in the introduction of new words, collocations, and lexical bundles help learners to improve their awareness of word meanings and of the uses of words in various contexts. In addition, hands-on concordancing also aids in successful learning of new academic vocabulary, and enhances students' performance in activities and on tests (Altenberg and Granger, 2001; McCarthy and Carter, 2002; Nesselhauf, 2005).
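For readers unfamiliar with concordancing, the core operation is a keyword-in-context (KWIC) display: every occurrence of a search word is shown with a fixed window of text on either side, aligned so that patterns of use stand out. The sketch below is a minimal illustration using only the Python standard library; the function name, the character-based window, and the sample sentence are assumptions for the example and do not reproduce any of the concordance programs used in the studies cited.

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for each whole-word match of keyword.

    width is the number of characters of context shown on each side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context and left-align the right context
        # so that the keyword falls in the same column on every line.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Hypothetical sample text standing in for a corpus file.
sample = (
    "The results suggest that hedging is common. "
    "Other results indicate that writers hedge less in speech."
)
for line in kwic(sample, "results"):
    print(line)
```

Learners working with a display like this can scan the words immediately to the right of the keyword (here, reporting verbs such as "suggest" and "indicate") and induce the typical collocations for themselves.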

Other studies focus more on the unexpected research findings that result from corpus investigations, discussing how such findings often indicate that we should be using radically different pedagogical approaches and different teaching materials than those traditionally used for language teaching (see, e.g., Carter and McCarthy, 1995; Conrad, 1999, 2000; Henry and Roseberry, 2001; Hughes and McCarthy, 1998; Hunston, 2002b; Hunston and Francis, 1998; Liu, 2003; Nesselhauf, 2003). For example, Biber and Reppen (2002) present corpus findings that identify the most common verbs in English conversation and then survey ESL grammar books to show that most of them fail to illustrate the use of those verbs.

4.3. Corpora and ESP/EAP

Research in the subfields of English for specific purposes (ESP) and English for academic purposes (EAP) has become almost entirely corpus based over the past 10 to 20 years. For example, a survey of articles in any recent issue of English for Specific Purposes or the Journal of English for Academic Purposes shows that recent linguistic descriptions of special/academic varieties in English are almost always based on corpus analysis.

Similarly, corpus approaches have become commonplace for ESP/EAP pedagogy. For example, Gilquin, Granger, and Paquot (2007), Hyland (2004b), Flowerdew (2005), and Gavioli (2005) all acknowledge the invaluable contribution of corpus approaches in the teaching of ESP/EAP, especially in increasing learners' awareness of the textual features of the target language. Yoon and Hirvela (2004) and Lee and Swales (2006) explore the use of corpora and corpus tools in EAP courses. For example, Lee and Swales piloted an innovative 13-week course in corpus-informed EAP, in which students were able to compare their writing with the linguistic patterns in a corpus of professional, published academic papers. These studies indicate that the corpus approach to academic writing facilitates the development of writing skills and contributes to learners' increased confidence; a majority of the participants in these studies reported that they would recommend corpus-informed writing classes to other foreign students.

4.4. Corpus-Informed Language Textbooks

In contrast to the extremely large number of books and research papers that advocate the application of corpus approaches for language teaching, there are surprisingly few language textbooks that are based on corpus research. ELT dictionaries, which have been based on corpus research since the 1980s, are the major exception here (see sections 1 and 2). However, publishers have been more reluctant to break with tradition in ELT textbooks for vocabulary and grammar.

There are a few notable exceptions to this generalization. In some cases, textbooks have been shaped by corpus analysis, even though this influence is not acknowledged on the book cover or in the introduction. Such books include the series Vocabulary in Use (McCarthy and O'Dell, 2001, 2004, 2005) and Natural Grammar (Thornbury, 2004). In more recent years, though, publishers have become more willing to market ESL textbooks that are directly shaped by the results of corpus research. For example, the four-level EFL/ESL Touchstone series by McCarthy, McCarten, and Sandiford (2006) is advertised as drawing on “the Cambridge International Corpus … to build a syllabus based on how people actually use English” (back cover). Vocabulary books like those by Schmitt and Schmitt (2005) and Huntley (2006) are corpus based in two major respects:

They teach the words on the “Academic Word List”: a list of the most common vocabulary items that occur in a large corpus of written academic texts (see Coxhead, 2000 , previously discussed in section 2.1).

They provide practice in the typical “collocations” of those words, derived from further corpus analysis.

Corpus-based EAP curricula are widely used throughout Europe and Asia, but they are usually based on locally created materials rather than on a major textbook. One exception to this is the corpus-informed textbook on chemistry research writing by Robinson, Stoller, Costanza-Robinson, and Jones ( 2008 ). This book is actually targeted for all students of chemistry, because native speakers of English encounter many of the same challenges in learning advanced disciplinary writing skills as do language learners.

It is possible to make a distinction between corpus-informed textbooks and corpus-based textbooks: The former incorporate natural examples taken from a corpus, whereas in the latter, decisions about inclusion/exclusion of topics and the sequence of topics are made based on the results of prior corpus analysis. In many cases, a corpus-based book will present linguistic patterns of use that would not have even been acknowledged in a traditional textbook. The vocabulary books by Schmitt and Schmitt (2005) and Huntley (2006) are corpus based in this sense. The grammar book by Thornbury (2004) also seems to be corpus based in this sense, although there is nothing in the book introduction that acknowledges the role of corpus analysis.

Two recent books provide corpus-based introductions to English grammar for advanced students training to become language teachers: The Longman Student Grammar of Spoken and Written English (and the accompanying workbook; Biber, Conrad, and Leech, 2002 ; Conrad, Biber and Leech, 2002 ) and the Teacher's Grammar of English (Cowan, 2008 ). Finally, Conrad and Biber (in press) identifies 50 of the most important and surprising corpus research findings from the Longman Grammar of Spoken and Written English , presenting those as grammar units for ESL/EFL students.

5. Future Directions

The present chapter has surveyed the extensive body of research using corpus analysis to describe the patterns of language use in English (and other languages). In addition, there is no shortage of studies that advocate the application of corpus approaches for language teaching. However, as described in the last section, much less effort has been given to the actual implementation of corpus research findings to develop teaching materials, especially textbooks that can provide the basis for a curriculum. Several such books are now in the works, and we anticipate that this state of affairs will change dramatically over the next few years.

One specific area that is currently receiving attention is the analysis of spoken corpora annotated for prosody in addition to lexico-grammatical information. Interestingly, the very first large spoken corpus of English—the London-Lund Corpus—included detailed coding to reflect pitch, length, and pausing phenomena (see Svartvik, 1990 ). However, this information was mostly disregarded in linguistic analyses of that corpus. More recently, though, spoken corpora are being analyzed to document systematic patterns of discourse intonation. Cheng, Greaves, and Warren's (2008 ; cf. Warren, 2004 ) study of the Hong Kong Corpus of Spoken English is one notable example of this type. Similarly, the C-ORAL-ROM project (Cresti and Moneglia, 2005 ) is a major research effort to develop acoustically analyzed spoken corpora for Italian, French, Spanish, and Portuguese.

Finally, multimodal annotation of spoken interactions should be another important area for future research (see, e.g., Gu, 2002 , 2007 ). In addition to enhanced prosodic and acoustic transcriptions of spoken corpora, these projects link video recordings to nonlinguistic features that play a crucial role in communication, such as facial expressions, hand gestures, and body position (see, e.g., Carter and Adolphs, 2008 ; Dahlmann and Adolphs, in press; Knight and Adolphs, 2008 ). Studies like these indicate that the strengths of corpus analysis can be extended to include aspects of communication beyond the analysis of the lexico-grammatical fabric of spoken and written texts.

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide

  • Copyright © 2024 Oxford University Press

COMMENTS

  1. (PDF) CORPUS METHODS IN LANGUAGE STUDIES

    Abstract. This chapter offers an introduction to corpus linguistics as a methodology for studying language, literature, and other fields in the humanities. It defines corpus linguistics, explores ...

  2. PDF Chapter 26 Writing up a Corpus-Linguistic Paper

    This is also a means of bringing credit and recognition to all those involved in corpus compilation. ... constructions are, all other things being equal or at least very similar, more likely to use that construction again than they would be if they had not used it before.

  3. PDF An Introduction to Corpus Linguistics

    What Is Corpus Linguistics? So what exactly is corpus linguistics? Corpus linguistics approaches the study of language in use through corpora (singular: corpus). A corpus is a large, principled collection of naturally occurring examples of language stored electronically. In short, corpus linguistics serves to answer two fundamental research ...

  4. Writing up a Corpus-Linguistic Paper

    Given that we prefer to see corpus linguistics as a method rather than a theory (see the special issue of the International Journal of Corpus Linguistics 15(3) for a debate of these two views), we believe outlining the methodological details of a corpus study in a way that is comprehensive enough is absolutely central. At a very high level of abstractness, there is really only one rule, which ...

  5. Applied Corpus Linguistics

    The role of Applied Corpus Linguistics is to provide a forum for further theorisation of corpus data analysis techniques, for the sharing of case studies and of new methods, and to advance the development and consolidation of applied corpus linguistics as a major force in social research. The journal welcomes contributions in the form of full ...

  6. (PDF) Research Trends in Corpus Linguistics: A Bibliometric Analysis of

    This paper uses a bibliometric analysis to map the field of Corpus Linguistics (CL) research in arts and humanities over the last 20 years, while tracking changes in the popular CL research topics ...

  7. Review of A Practical Handbook of Corpus Linguistics

    Finally, Part VI contains two wrapping-up chapters that give directions on writing a Corpus Linguistics research paper (Chap. 26) and on meta-analysing Corpus Linguistics research (Chap. 27), also including a practical guide with R. ... Chapter 26 'Writing up of a Corpus-Linguistics Paper', by Stefan Th. Gries and Magali Paquot, is a highly ...

  8. A Practical Handbook of Corpus Linguistics

    Part VI focuses on how to write a corpus linguistic paper and how to meta-analyze corpus linguistic research. The volume can serve as a course book as well as for individual study. It will be an essential reading for students of corpus linguistics as well as experienced researchers who want to expand their knowledge of the field.

  9. Corpus Linguistics: Overview

    Abstract. Corpus linguistics means the use of computer-assisted methods to study large quantities of real language. Such research is important for applied linguists because they investigate questions about actual language use. Due to advances in technology, researchers can discover new facts and test hypotheses about aspects of language which ...

  10. International Journal of Corpus Linguistics

    The International Journal of Corpus Linguistics (IJCL) publishes original research covering methodological, applied and theoretical work in any area of corpus linguistics. Through its focus on empirical language research, IJCL provides a forum for the presentation of new findings and innovative approaches in any area of linguistics (e.g. lexicology, grammar, discourse analysis, stylistics ...

  11. Corpus Linguistics and Linguistic Theory

    Objective Corpus Linguistics and Linguistic Theory (CLLT) is a peer-reviewed journal publishing high-quality original corpus-based research focusing on theoretically relevant issues in all core areas of linguistic research, or other recognized topic areas. It provides a forum for researchers from different theoretical backgrounds and different areas of interest that share a commitment to the ...

  12. Corpus linguistics in language testing research

    The five papers represent a broad variety of methodologies, research questions, and applications to language assessment, but each one illustrates the use of corpus linguistics to investigate the level of support for inferences in validity arguments either through comparative analyses of two or more relevant corpora or by using corpus data to ...

  13. Corpus Linguistics

    By using large, principled collections of naturally occurring language, corpus linguistics can accurately explore and describe linguistic characteristics and patterns associated with language use in different contexts (e.g., talking among friends, giving a formal speech, writing a friend, writing a research paper), across different speakers ...

  14. Corpus Linguistics and Corpus-Based Research and Its Implication in

    Corpus linguistics is a research approach that has developed over the past few decades to support empirical investigations of language variation and use, resulting in research findings that are ...

  15. PDF Review of A Practical Handbook of Corpus Linguistics

    a Corpus Linguistics research paper (Chap. 26) and on meta-analysing Corpus Linguistics research (Chap. 27), also including a practical guide with R. Although almost all chapters follow the same structure - an introduction followed by an exposition of the fundamentals, subsequently illustrated by two or three repre-

  16. Corpus-Based and Corpus-driven Analyses of Language Variation and Use

    Corpus linguistics is a research approach that has developed over the past few decades to support empirical investigations of language variation and use, resulting in research findings which have much greater generalizability and validity than would otherwise be feasible. Corpus studies have used two major research approaches: 'corpus-based ...

  17. Research in Corpus Linguistics

    Research in Corpus Linguistics (RiCL, ISSN 2243-4712) is a scholarly peer-reviewed international scientific journal published annually, aiming at the publication of contributions which contain empirical analyses of data from different languages and from different theoretical perspectives and frameworks.

  18. Research in Corpus Linguistics

    Eric Friginal is assistant professor in the Department of Applied Linguistics and English as a Second Language at Georgia State University. His main research interest lies in using corpus linguistics to explore linguistic variation in professional, cross-cultural discourse in the context of outsourced call centers in the Philippines serving American customers.

  19. Corpus linguistics

    The Brown Corpus was the first computerized corpus designed for linguistic research. ... a combination of papers of the ACL Anthology and Google Scholar metadata. Corpora can also aid in translation efforts or in teaching foreign languages. Methods. Corpus linguistics has generated a number of research methods, which attempt to trace a path ...

  20. (PDF) Corpus Linguistics

    PDF | On Jan 1, 2017, Marc Brysbaert and others published Corpus Linguistics | Find, read and cite all the research you need on ResearchGate

  21. Corpus Linguistics and Corpus-Based Research and Its Implication in

    Corpus & Discourse Analysis: Discourse analysis can also profit from corpus linguistics research. Two studies have made use of corpus linguistic research to reinforce the capacity and efficiency of discourse analysis. ... Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, 248-252 ...

  22. [PDF] 'Corpus Linguistics and Translation Studies: Implications and

    The rise of corpus linguistics has serious implications for any discipline in which language plays a major role. This paper explores the impact that the availability of corpora is likely to have on the study of translation as an empirical phenomenon. It argues that the techniques and methodology developed in the field of corpus linguistics will have a direct impact on the emerging discipline ...

  23. Corpus Linguistics Research Papers

    Journal of Language and Sexuality 7.2: 145-174. As an introduction to the special issue, this paper presents an overview of previous corpus linguistic work in the field of language and sexuality and discusses the compatibility of corpus linguistic methodology with queer linguistics as...

  24. A Corpus Based Analysis of Noun Modification in Empirical Research

    3.1 Overview of Design. This study used a corpus-based approach to investigate noun modification in the sections of empirical research articles in order to explore the intersection of genre and register by using IMRD sections with an analysis of several grammatical structures.