essay writing about organization


9.3 Organizing Your Writing

Learning Objectives

Understand how and why organizational techniques help writers and readers stay focused.
Assess how and when to use chronological order to organize an essay.
Recognize how and when to use order of importance to organize an essay.
Determine how and when to use spatial order to organize an essay.

The method of organization you choose for your essay is just as important as its content. Without a clear organizational pattern, your reader could become confused and lose interest. The way you structure your essay helps your readers draw connections between the body and the thesis, and the structure also keeps you focused as you plan and write the essay. Choosing your organizational pattern before you outline ensures that each body paragraph works to support and develop your thesis.

This section covers three ways to organize body paragraphs:

Chronological order
Order of importance
Spatial order

When you begin to draft your essay, your ideas may seem to flow from your mind in a random manner. Your readers, who bring to the table different backgrounds, viewpoints, and ideas, need you to organize these ideas clearly so that they can process and accept them.

A solid organizational pattern gives your ideas a path that you can follow as you develop your draft. Knowing how you will organize your paragraphs allows you to better express and analyze your thoughts. Planning the structure of your essay before you choose supporting evidence helps you conduct more effective and targeted research.

Chronological Order

In Chapter 8 “The Writing Process: How Do I Begin?”, you learned that chronological arrangement (also called “time order”) has the following purposes:

To explain the history of an event or a topic
To tell a story or relate an experience
To explain how to do or to make something
To explain the steps in a process

Chronological order is mostly used in expository writing, which is a form of writing that narrates, describes, informs, or explains a process. When using chronological order, arrange the events in the order that they actually happened, or will happen if you are giving instructions. This method requires you to use words such as first, second, then, after that, later, and finally. These transition words guide you and your reader through the paper as you expand your thesis.

For example, if you are writing an essay about the history of the airline industry, you would begin with its conception and detail the essential timeline events up to the present day. You would follow the chain of events using words such as first, then, next, and so on.

Writing at Work

At some point in your career you may have to file a complaint with your human resources department. Chronological order is a useful tool for describing the events that led up to your filing the grievance. You would lay out the events in the order that they occurred, using the key transition words. The more logical your complaint, the more likely you are to be well received and helped.

Choose an accomplishment you have achieved in your life. The important moment could be in sports, schooling, or extracurricular activities. On your own sheet of paper, list the steps you took to reach your goal. Try to be as specific as possible with the steps you took. Pay attention to using transition words to focus your writing.

Keep in mind that chronological order is most appropriate for the following purposes:

Writing essays containing heavy research
Writing essays with the aim of listing, explaining, or narrating
Writing essays that analyze literary works such as poems, plays, or books

When using chronological order, your introduction should indicate the information you will cover and in what order, and it should also establish the relevance of the information. Your body paragraphs should then provide clear divisions or steps in chronology. You can divide your paragraphs by time (such as decades, wars, or other historical events) or by the structure of the work you are examining (such as a line-by-line explication of a poem).

On a separate sheet of paper, write a paragraph that describes a process you are familiar with and can do well. Assume that your reader is unfamiliar with the procedure. Remember to use the chronological key words, such as first, second, then, and finally.

Order of Importance

Recall from Chapter 8 “The Writing Process: How Do I Begin?” that order of importance is best used for the following purposes:

Persuading and convincing
Ranking items by their importance, benefit, or significance
Illustrating a situation, problem, or solution

Most essays move from the least to the most important point, and the paragraphs are arranged in an effort to build the essay’s strength. Sometimes, however, it is necessary to begin with your most important supporting point, such as in an essay that contains a thesis that is highly debatable. When writing a persuasive essay, it is best to begin with the most important point because it immediately captivates your readers and compels them to continue reading.

For example, if you were supporting your thesis that homework is detrimental to the education of high school students, you would want to present your most convincing argument first, and then move on to the less important points for your case.

Some key transitional words you should use with this method of organization are most importantly, almost as importantly, just as importantly, and finally.

During your career, you may be required to work on a team that devises a strategy for a specific goal of your company, such as increasing profits. When planning your strategy you should organize your steps in order of importance. This demonstrates the ability to prioritize and plan. Using the order of importance technique also shows that you can create a resolution with logical steps for accomplishing a common goal.

On a separate sheet of paper, write a paragraph that discusses a passion of yours. Your passion could be music, a particular sport, filmmaking, and so on. Your paragraph should be built upon the reasons why you feel so strongly. Briefly discuss your reasons in the order of least to greatest importance.

Spatial Order

As stated in Chapter 8 “The Writing Process: How Do I Begin?”, spatial order is best used for the following purposes:

Helping readers visualize something as you want them to see it
Evoking a scene using the senses (sight, touch, taste, smell, and sound)
Writing a descriptive essay

Spatial order means that you explain or describe objects as they are arranged around you in your space, for example in a bedroom. As the writer, you create a picture for your reader, and their perspective is the viewpoint from which you describe what is around you.

The view must move in an orderly, logical progression, giving the reader clear directional signals to follow from place to place. The key to using this method is to choose a specific starting point and then guide the reader to follow your eye as it moves in an orderly trajectory from your starting point.

Pay attention to the following student’s description of her bedroom and how she guides the reader through the viewing process, foot by foot.

Attached to my bedroom wall is a small wooden rack dangling with red and turquoise necklaces that shimmer as you enter. Just to the right of the rack is my window, framed by billowy white curtains. The peace of such an image is a stark contrast to my desk, which sits to the right of the window, layered in textbooks, crumpled papers, coffee cups, and an overflowing ashtray. Turning my head to the right, I see a set of two bare windows that frame the trees outside the glass like a 3D painting. Below the windows is an oak chest from which blankets and scarves are protruding. Against the wall opposite the billowy curtains is an antique dresser, on top of which sits a jewelry box and a few picture frames. A tall mirror attached to the dresser takes up most of the wall, which is the color of lavender.

The paragraph incorporates two objectives you have learned in this chapter: using an implied topic sentence and applying spatial order. Often in a descriptive essay, the two work together.

The following are possible transition words to include when using spatial order:

Just to the left or just to the right
On the left or on the right
Across from
A little further down
To the south, to the east, and so on
A few yards away
Turning left or turning right

On a separate sheet of paper, write a paragraph using spatial order that describes your commute to work, school, or another location you visit often.

Collaboration

Please share with a classmate and compare your answers.

Key Takeaways

The way you organize your body paragraphs ensures that you and your readers stay focused on, and draw connections to, your thesis statement.
A strong organizational pattern allows you to articulate, analyze, and clarify your thoughts.
Planning the organizational structure for your essay before you begin to search for supporting evidence helps you conduct more effective and directed research.
Chronological order is most commonly used in expository writing. It is useful for explaining the history of your subject, for telling a story, or for explaining a process.
Order of importance is most appropriate in a persuasion paper as well as for essays in which you rank things, people, or events by their significance.
Spatial order describes things as they are arranged in space and is best for helping readers visualize something as you want them to see it; it creates a dominant impression.


Organization


The way you organize your ideas in a five-paragraph essay may be different from the way you normally organize your ideas. You should focus on one central idea, and that idea needs to be clearly stated multiple times. The essay should present reasons and evidence that support that one, central idea. You may have heard that American writers “tell you what they are going to tell you, they tell you, and then they tell you what they told you.” This is often true in a five-paragraph essay.

While it is often easier to draft your essay by beginning with the body paragraphs, the following section will present the organization of an essay to you in the order your reader should experience your writing. You should prepare them for the topic (in the beginning of the introduction), present your main idea (at the end of the introduction), provide explanations and evidence to support your main idea (in the body paragraphs), and summarize or extend your main idea (in the conclusion).

This content is provided to you freely by BYU Open Learning Network.

Access it online or download it at https://open.byu.edu/academic_a_writing/organization.



Module 1: An Overview of the Writing Process

Organizing an Essay

There are many elements that must come together to create a good essay. The topic should be clear and interesting. The author’s voice should come through, but not be a distraction. There should be no errors in grammar, spelling, punctuation, or capitalization. Organization is one of the most important elements of an essay, yet it is often overlooked. An organized essay is clear, focused, logical, and effective.

Organization makes it easier to understand the thesis. To illustrate, imagine putting together a bike. Having all of the necessary tools, parts, and directions in one place will make the job easier to complete than if the parts are spread across the room and the tools are scattered all over the house. The same logic applies to writing an essay. When all the parts of an essay are in a sensible order, it is easier both for the writer to put the essay together and for the reader to understand its main ideas.

Photo of a white kitchen lit with windows. Rows of glass jars line shelves over the countertop, and a hanging rack of pans and pots appears beneath that.

Strategy 1. Reverse Outlining

If your paper is about Huckleberry Finn, a working thesis might be: “In Huckleberry Finn, Mark Twain develops a contrast between life on the river and life on the shore.” However, you might feel uncertain whether your paper really follows through on the thesis as promised.

This paper may benefit from reverse outlining. Your aim is to create an outline of what you’ve already written, as opposed to the kind of outline that you make before you begin to write. The reverse outline will help you evaluate the strengths and weaknesses of both your organization and your argument.

Read the Draft and Take Notes. Read your draft over, and as you do so, make very brief notes in the margin about what each paragraph is trying to accomplish.

Outline the Draft. After you’ve read through the entire draft, transfer the brief notes to a fresh sheet of paper, listing them in the order in which they appear. The outline might look like this:

Paragraph 1: Intro
Paragraph 2: Background on Huck Finn
Paragraph 3: River for Huck and Jim
Paragraph 4: Shore and laws for Huck and Jim
Paragraph 5: Shore and family, school
Paragraph 6: River and freedom, democracy
Paragraph 7: River and shore similarities
Paragraph 8: Conclusion

Examine the Outline. Look for repetition and other organizational problems. In the reverse outline above, there’s a problem somewhere in Paragraphs 3-7, where the potential for repetition is high because you keep moving back and forth between river and shore.

Re-examine the Thesis, the Outline, and the Draft Together. Look closely at the outline and see how well it supports the argument in your thesis statement. You should be able to see which paragraphs need rewriting, reordering, or rejecting. You may find some paragraphs are tangential or irrelevant or that some paragraphs have more than one idea and need to be separated.

Strategy 2. Talk It Out

Drawing of two men sitting at a cafe table talking. They are wearing period dress (bowlers, suits, bow ties).

Find a friend, your T.A., your professor, a relative, a Writing Center tutor, or any sympathetic and intelligent listener. People are more accustomed to talking than writing, so it might be beneficial to explain your thinking out loud to someone before organizing the essay. Talking to someone about your ideas may also relieve pressure and anxiety about your topic.

Explain What Your Paper Is About. Pay attention to how you explain your argument verbally. It is likely that the order in which you present your ideas and evidence to your listener is a logical way to arrange them in your paper. Let’s say that you begin (as you did above) with the working thesis. As you continue to explain, you realize that even though your draft doesn’t mention “private enterprise” until the last two paragraphs, you begin to talk about it right away. This fact should tell you that you probably need to discuss private enterprise near the beginning.

Take Notes. You and your listener should keep track of the way you explain your paper. If you don’t, you probably won’t remember what you’ve talked about. Compare the structure of the argument in the notes to the structure of the draft you’ve written.

Get Your Listener to Ask Questions. As the writer, it is in your interest to receive constructive criticism so that your draft will become stronger. You want your listener to say things like, “Would you mind explaining that point about being both conservative and liberal again? I wasn’t sure I followed” or “What kind of economic principle is government relief? Do you consider it a good or bad thing?” Questions you can’t answer may signal an unnecessary tangent or an area needing further development in the draft. Questions you need to think about will probably make you realize that you need to explain more in your paper. In short, you want to know if your listener fully understands you; if not, chances are your readers won’t, either. [2]

Strategy 3. Paragraphs

Readers need paragraph breaks in order to organize their reading. Writers need paragraph breaks to organize their writing. A paragraph break indicates a change in focus, topic, specificity, point of view, or rhetorical strategy. Each paragraph should have one main idea, which the topic sentence expresses. The paragraph should be organized spatially, chronologically, or logically. The movement may be from general to specific, specific to general, or general to specific to general. All paragraphs must contain developed ideas: comparisons, examples, explanations, definitions, causes, effects, processes, or descriptions.

There are several strategies for concluding an essay, which may be combined or used singly, depending on the assignment’s length and purpose:

a summary of the main points
a hook and return to the introductory “attention-getter” to frame the essay
a web conclusion which relates the topic to a larger context of a greater significance
a proposal calling for action or further examination of the topic
a question which provokes the reader
a vivid image or compelling narrative [3]

Put Paragraphs into Sections. You should be able to group your paragraphs so that they make a particular point or argument that supports your thesis. If any paragraph, besides the introduction or conclusion, cannot fit into any section, you may have to ask yourself whether it belongs in the essay.

Re-examine Each Section. Assuming you have more than one paragraph under each section, try to distinguish between them. Perhaps you have two arguments in favor of the same position that can be distinguished from each other by author, logic, ethical principles invoked, etc. Write down the distinctions; they will help you formulate clear topic sentences.

Re-examine the Entire Argument. Which section do you want to appear first? Why? Which second? Why? In what order should the paragraphs appear in each section? Look for an order that makes the strongest possible argument. [4]

Organizing an Essay
Reorganizing Your Draft
Parts of an Essay
Authored by: J. Indigo Eriksen. Provided by: Blue Ridge Community College. License: CC BY-NC-SA: Attribution-NonCommercial-ShareAlike
Image of kitchen. Authored by: Elissa Merola. Located at: https://flic.kr/p/5u4XQt. License: CC BY-NC-ND: Attribution-NonCommercial-NoDerivatives
Image of two men talking. Authored by: Lovelorn Poets. Located at: https://flic.kr/p/at9FgL. License: CC BY: Attribution
Organizing an Essay. Authored by: Robin Parent. Provided by: Utah State University English Department. Project: USU Open CourseWare Initiative. License: CC BY-NC-SA: Attribution-NonCommercial-ShareAlike


6.14: Essay Organization

Lumen Learning


Learning Objectives

Examine the basic organization of traditional essays

Although college essays can offer ideas in many ways, one standard structure for expository essays is to offer the main idea or assertion early in the essay, and then offer categories of support.

One way to think about this standard structure is to compare it to a courtroom argument in a television drama. The lawyer asserts, “My client is not guilty.” Then the lawyer provides different reasons for lack of guilt: no physical evidence placing the client at the crime scene, client had no motive for the crime, and more.

In writing terms, the assertion is the thesis sentence, and the different reasons are the topic sentences. Consider the following example:

Topic Sentence (reason) #1: Workers need to learn how to deal with change.
Topic Sentence (reason) #2: Because of dealing with such a rapidly changing work environment, 21st-century workers need to learn how to learn.
Topic Sentence (reason) #3: Most of all, in order to negotiate rapid change and learning, workers in the 21st century need good communication skills.

As you can see, the supporting ideas in an essay develop out of the main assertion or argument in the thesis sentence.

Essay Organization

The structural organization of an essay will vary depending on the type of writing task you’ve been assigned, but most essays follow this basic structure:

Introduction

The introduction introduces the reader to the topic. We’ve all heard that first impressions are important. This is very true in writing as well. The goal is to engage the readers, hook them so they want to read on. Sometimes this involves giving an example, telling a story or narrative, asking a question, or building up the situation. The introduction should almost always include the thesis statement.

Body Paragraphs

The body of the essay is separated into paragraphs. Each paragraph usually covers a single claim or argues a single point, expanding on what was introduced in the thesis statement. For example, according to the National Institute of Mental Health, the two main causes of schizophrenia are genetic and environmental. Thus, if you were writing about the causes of schizophrenia, then you would have a body paragraph on genetic causes of schizophrenia and a body paragraph on the environmental causes.

A body paragraph usually includes the following:

Topic sentence that identifies the topic for the paragraph
Several sentences that describe and support the topic sentence. Remember that information from outside sources should be placed in the middle of the paragraph, not at the beginning or the end, so that you have room to introduce and explain the outside content.
Quotation marks placed around any information taken verbatim (word for word) from the source
Summary sentence(s) that draw conclusions from the evidence
Transitions or bridge sentences between paragraphs

Conclusion

The conclusion should do the following:

Draw final conclusions from the key points and evidence provided in the paper
Return to the way the essay began. For example, if you began with a story, draw final conclusions from that story; if you began with a question (or questions), refer back to it and be sure to provide the answer.


Contributors and Attributions

Revision and Adaptation. Provided by: Lumen Learning. License: CC BY-NC-SA: Attribution-NonCommercial-ShareAlike
Writing an Essay. Provided by: QUT Cite Write. Located at: http://www.citewrite.qut.edu.au/write/essay.jsp. License: CC BY-NC-SA: Attribution-NonCommercial-ShareAlike
Image of Choosing Paragraph Patterns. Authored by: GrinnPidgeon. Located at: flic.kr/p/a9oiLS. License: CC BY-SA: Attribution-ShareAlike
Essay Structure. Authored by: Marianne Botos, Lynn McClelland, Stephanie Polliard, Pamela Osback. Located at: https://pvccenglish.files.wordpress.com/2010/09/eng-101-inside-pages-proof2-no-pro.pdf. Project: Horse of a Different Color: English Composition and Rhetoric. License: CC BY: Attribution
Traditional Structure. Provided by: Excelsior OWL. Located at: https://owl.excelsior.edu/writing-process/essay-writing/essay-writing-traditional-structure-activity/. License: CC BY: Attribution
Image of writing in the sand. Authored by: Michitogo. Provided by: Pixabay. Located at: pixabay.com/photos/the-end-sand-end-beach-text-1544913/. License: Other. License Terms: pixabay.com/service/terms/#license


Organizing an Essay


Some basic guidelines

The best time to think about how to organize your paper is during the pre-writing stage, not the writing or revising stage. A well-thought-out plan can save you from having to do a lot of reorganizing when the first draft is completed. Moreover, it allows you to pay more attention to sentence-level issues when you sit down to write your paper.

When you begin planning, ask the following questions: What type of essay am I going to be writing? Does it belong to a specific genre? In university, you may be asked to write, say, a book review, a lab report, a document study, or a compare-and-contrast essay. Knowing the patterns of reasoning associated with a genre can help you to structure your essay.

For example, book reviews typically begin with a summary of the book you’re reviewing. They then often move on to a critical discussion of the book’s strengths and weaknesses. They may conclude with an overall assessment of the value of the book. These typical features of a book review lead you to consider dividing your outline into three parts: (1) summary; (2) discussion of strengths and weaknesses; (3) overall evaluation. The second and most substantial part will likely break down into two sub-parts. It is up to you to decide the order of the two subparts—whether to analyze strengths or weaknesses first. And of course it will be up to you to come up with actual strengths and weaknesses.

Be aware that genres are not fixed. Different professors will define the features of a genre differently. Read the assignment question carefully for guidance.

Understanding genre can take you only so far. Most university essays are argumentative, and there is no set pattern for the shape of an argumentative essay. The simple three-point essay taught in high school is far too restrictive for the complexities of most university assignments. You must be ready to come up with whatever essay structure helps you to convince your reader of the validity of your position. In other words, you must be flexible, and you must rely on your wits. Each essay presents a fresh problem.

Avoiding a common pitfall

Though there are no easy formulas for generating an outline, you can avoid one of the most common pitfalls in student papers by remembering this simple principle: the structure of an essay should not be determined by the structure of its source material. For example, an essay on an historical period should not necessarily follow the chronology of events from that period. Similarly, a well-constructed essay about a literary work does not usually progress in parallel with the plot. Your obligation is to advance your argument, not to reproduce the plot.

If your essay is not well structured, then its overall weaknesses will show through in the individual paragraphs. Consider the following two paragraphs from two different English essays, both arguing that despite Hamlet’s highly developed moral nature he becomes morally compromised in the course of the play:

(a) In Act 3, Scene 4, Polonius hides behind an arras in Gertrude’s chamber in order to spy on Hamlet at the bidding of the king. Detecting something stirring, Hamlet draws his sword and kills Polonius, thinking he has killed Claudius. Gertrude exclaims, “O, what a rash and bloody deed is this!” (28), and her words mark the turning point in Hamlet’s moral decline. Now Hamlet has blood on his hands, and the blood of the wrong person. But rather than engage in self-criticism, Hamlet immediately turns his mother’s words against her: “A bloody deed — almost as bad, good Mother, as kill a king, and marry with his brother” (29-30). One of Hamlet’s most serious shortcomings is his unfair treatment of women. He often accuses them of sins they could not have committed. It is doubtful that Gertrude even knows Claudius killed her previous husband. Hamlet goes on to ask Gertrude to compare the image of the two kings, old Hamlet and Claudius. In Hamlet’s words, old Hamlet has “Hyperion’s curls,” “the front of Jove,” and “an eye like Mars” (57-58). Despite Hamlet’s unfair treatment of women, he is motivated by one of his better qualities: his idealism.

(b) One of Hamlet’s most serious moral shortcomings is his unfair treatment of women. In Act 3, Scene 1, he denies to Ophelia ever having expressed his love for her, using his feigned madness as cover for his cruelty. Though his rantings may be an act, they cannot hide his obsessive anger at one particular woman: his mother. He counsels Ophelia to “marry a fool, for wise men know well enough what monsters you make of them” (139-41), thus blaming her in advance for the sin of adultery. The logic is plain: if Hamlet’s mother made a cuckold out of Hamlet’s father, then all women are capable of doing the same and therefore share the blame. The fact that Gertrude’s hasty remarriage does not actually constitute adultery only underscores Hamlet’s tendency to find in women faults that do not exist. In Act 3, Scene 4, he goes as far as to suggest that Gertrude shared responsibility in the murder of Hamlet’s father (29-30). By condemning women for actions they did not commit, Hamlet is doing just what he accuses Guildenstern of doing to him: he is plucking out the “heart” of their “mystery” (3.2.372-74).

The second of these two paragraphs is much stronger, largely because it is not plot-driven. It makes a well-defined point about Hamlet’s moral nature and sticks to that point throughout the paragraph. Notice that the paragraph jumps from one scene to another as is necessary, but the logic of the argument moves along a steady path. At any given point in your essays, you will want to leave yourself free to go wherever you need to in your source material. Your only obligation is to further your argument. Paragraph (a) sticks closely to the narrative thread of Act 3, Scene 4, and as a result the paragraph makes several different points with no clear focus.

What does an essay outline look like?

Most essay outlines will never be handed in. They are meant to serve you and no one else. Occasionally, your professor will ask you to hand in an outline weeks prior to handing in your paper. Usually, the point is to ensure that you are on the right track. Nevertheless, when you produce your outline, you should follow certain basic principles. Here is an example of an outline for an essay on Hamlet:

This is an example of a sentence outline. Another kind of outline is the topic outline. It consists of fragments rather than full sentences. Topic outlines are more open-ended than sentence outlines: they leave much of the working out of the argument for the writing stage.

When should I begin putting together a plan?

The earlier you begin planning, the better. It is usually a mistake to do all of your research and note-taking before beginning to draw up an outline. Of course, you will have to do some reading and weighing of evidence before you start to plan. But as a potential argument begins to take shape in your mind, you may start to formalize your thoughts in the form of a tentative plan. You will be much more efficient in your reading and your research if you have some idea of where your argument is headed. You can then search for evidence for the points in your tentative plan while you are reading and researching. As you gather evidence, those points that still lack evidence should guide you in your research. Remember, though, that your plan may need to be modified as you critically evaluate your evidence.

How can I construct a usable plan?

Here are two methods for constructing a plan. The first works best on the computer. The second method works well for those who think visually. It is often the method of choice for those who prefer to do some of their thinking with pen and paper, though it can easily be transposed to a word processor or your graphic software of choice.

Method 1: Hierarchical outline

This method usually begins by taking notes. Start by collecting potential points, as well as useful quotations and paraphrases of quotations, consecutively. As you accumulate notes, identify key points and start to arrange those key points into an outline. To build your outline, take advantage of outline view in Word or numbered lists in Google Docs. Or consider one of the specialized apps designed to help organize ideas: Scrivener, Microsoft OneNote, Workflowy, among others. All these tools make it easy for you to arrange your points hierarchically and to move those points around as you refine your plan.

You may, at least initially, keep your notes and your outline separate. But there is no reason for you not to integrate your notes into the plan. Your notes (minor points, quotations, and paraphrases) can all be interwoven into the plan, just below the main points they support. Some of your notes may not find a place in your outline. If so, either modify the plan or leave those points out.

Method 2: The circle method

This method is designed to get your key ideas onto a single page, where you can see them all at once. When you have an idea, write it down, and draw a circle around it. When you have an idea that supports another idea, do the same, but connect the two circles with a line. Supporting source material can be represented concisely by a page reference inside a circle. The advantage of the circle method is that you can see at a glance how things tie together; the disadvantage is that there is a limit to how much material you can cram onto a page.

Here is part of a circle diagram

Once you are content with your diagram, you have the option of turning it into an essay outline.

What is a reverse outline?

When you have completed your first draft, and you think your paper can be better organized, consider using a reverse outline. Reverse outlines are simple to create. Just read through your essay, and every time you make a new point, summarize it in the margin. If the essay is reasonably well-organized, you should have one point in the margin for each paragraph, and your points read out in order should form a coherent argument. You might, however, discover that some of your points are repeated at various places in your essay. Other points may be out of place, and still other key points may not appear at all. Think of all these points as the ingredients of an improved outline which you now must create. Use this new outline to cut and paste the sentences into a revised version of your essay, consolidating points that appear in several parts of your essay while eliminating repetition and creating smooth transitions where necessary.

You can improve even the most carefully planned essay by creating a reverse outline after completing your first draft. The process of revision should be as much about organization as it is about style.

How much of my time should I put into planning?

It is self-evident that a well-planned paper is going to be better organized than a paper that was not planned out. Thinking carefully about how you are going to argue your paper and preparing an outline can only add to the quality of your final product. Nevertheless, some people find it more helpful than others to plan. Those who are good at coming up with ideas but find writing difficult often benefit from planning. By contrast, those who have trouble generating ideas but find writing easy may benefit from starting to write early. Putting pen to paper (or typing away at the keyboard) may be just what is needed to get the ideas to flow.

You have to find out for yourself what works best for you, though it is fair to say that at least some planning is always a good idea. Think about whether your current practices are serving you well. You know you’re planning too little if the first draft of your essays is always a disorganized mess, and you have to spend a disproportionate amount of time creating reverse outlines and cutting and pasting material. You know you’re planning too much if you always find yourself writing your paper a day before it’s due after spending weeks doing research and devising elaborate plans.

Be aware of the implications of planning too little or too much.

Planning provides the following advantages:

helps you to produce a logical and orderly argument that your readers can follow
helps you to produce an economical paper by allowing you to spot repetition
helps you to produce a thorough paper by making it easier for you to notice whether you have left anything out
makes drafting the paper easier by allowing you to concentrate on writing issues such as grammar, word choice, and clarity

Overplanning poses the following risks:

doesn’t leave you enough time to write and revise
leads you to produce papers that try to cover too much ground at the expense of analytic depth
can result in a writing style that lacks spontaneity and ease
does not provide enough opportunity to discover new ideas in the process of writing

Walden University

Writing a Paper: Organizing Your Thoughts

Stacks of notes, books, and course materials in front of a blank computer screen may cause a moment of writer's block as you go to organize your paper, but there is no need to panic. Instead, organizing your paper will give you a sense of control and allow you to better integrate your ideas as you start to write.

Organizing your paper can be a daunting task if you begin too late, so organizing a paper should take place during the reading and note-taking process. As you read and take notes, make sure to group your data into self-contained categories. These categories will help you to build the structure of your paper.

Take, for example, a paper about children's education and the quantity of television children watch. Some categories may be the following:

Amount of television children watch (by population, age, gender, etc.)
Behaviors or issues linked to television watching (obesity, ADHD, etc.)
Outcomes linked to television watching (performance in school, expected income, etc.)
Factors influencing school performance (parent involvement, study time, etc.)

The list above holds some clear themes that may emerge as you read through the literature. It is sometimes a challenge to know what information to group together into a category. Sources that share similar data, support one another, or bring about similar concerns may be a good place to start looking for such categories.

For example, let's say you had three sources that had the following information:

The average American youth spends 900 hours in school over the course of a school year; the average American youth watches 1500 hours of television a year (Herr, 2001).
"According to the American Academy of Pediatrics (AAP), kids in the United States watch about 4 hours of TV a day - even though the AAP guidelines say children older than 2 should watch no more than 1 to 2 hours a day of quality programming" (Folder, Crisp, & Watson, 2005, p. 2).
"According to AAP (2007) guidelines, children under age 2 should have no screen time (TV, DVDs or videotapes, computers, or video games) at all. During the first 2 years, a critical time for brain development, TV can get in the way of exploring, learning, and spending time interacting and playing with parents and others, which helps young children develop the skills they need to grow cognitively, physically, socially, and emotionally" (Folder, Crisp, & Watson, 2005, p. 9).

With these three ideas, you might group them under this category: Amount of television children watch.

Each of these source quotations or paraphrases supports that category. For each group of information, repeat this process to group similar categories together. Then you can move on to order the information you gather.

Once you have read your sources, taken notes, and grouped your information by category, the next step is to read critically, evaluate your sources, determine your thesis statement, and decide the best order in which to present your research. Note that as you begin to narrow your topic or focus, you will find some sources are not relevant. That is fine! Do not try to squeeze every source mentioning "children" and "television" into your paper.

Let's say you have come up with the following categories from the sources you have read:

Children watch more than the recommended amount of television.
The more television children watch, the less likely they are to study.
Certain groups of children watch more television than others.
Students whose grades are poor in high school are 56% less likely to graduate from college.
Poor performance in middle school correlates to poor high school performance.

You will want the order of your material to advance and prove your thesis. Every thesis needs to be capable of advancement. Let's assume that your thesis is "Children who watch more than the recommended amount of television are less likely to receive a college education." In this case, it seems that you will want to start off by showing that there is a problem, and then giving examples of that problem and its consequences.

The best order for these categories would be the following:

Poor performance in middle school correlates to poor high school performance

The way a paper is organized is largely the result of the logical and causal relationships between the categories or topics apparent in the research. In other words, each category's placement is specifically chosen so that it is the result of the previous theme and able to contribute to the next, as the previous example shows. It is often a good practice to save your strongest argument or evidence until the end of the paper and build up to it. Using careful organization to advance your thesis will help guide your reader to your conclusion!


How to Organize an Essay

Last Updated: March 27, 2023

This article was co-authored by Jake Adams. Jake Adams is an academic tutor and the owner of Simplifi EDU, a Santa Monica, California based online tutoring business offering learning resources and online tutors for academic subjects K-College, SAT & ACT prep, and college admissions applications. With over 14 years of professional tutoring experience, Jake is dedicated to providing his clients the very best online tutoring experience and access to a network of excellent undergraduate and graduate-level tutors from top colleges all over the nation. Jake holds a BS in International Business and Marketing from Pepperdine University. There are 17 references cited in this article, which can be found at the bottom of the page. This article has been fact-checked, ensuring the accuracy of any cited facts and confirming the authority of its sources. This article has been viewed 285,248 times.

Laying the Groundwork

Step 1 Determine the type of essay you're writing.

For example, a high-school AP essay should have a very clear structure, with your introduction and thesis statement first, 3-4 body paragraphs that further your argument, and a conclusion that ties everything together.
On the other hand, a creative nonfiction essay might wait to present the thesis till the very end of the essay and build up to it.
A compare-and-contrast essay can be organized so that you discuss the similarities between two things in one paragraph and the differences in the next, or so that each paragraph compares and contrasts the two things on a single point.
You can also choose to organize your essay chronologically, starting at the beginning of the work or historical period you're discussing and going through to the end. This can be helpful for essays where chronology is important to your argument (like a history paper or lab report), or if you're telling a story in your essay.
The “support” structure begins with your thesis laid out clearly in the beginning and supports it through the rest of the essay.
The “discovery” structure builds to the thesis by moving through points of discussion until the thesis seems the inevitable, correct view.
The “exploratory” structure looks at the pros and cons of your chosen topic. It presents the various sides and usually concludes with your thesis.

If you haven't been given an assignment, you can always run ideas by your instructor or advisor to see if they're on track.
Ask questions about anything you don't understand. It's much better to ask questions before you put hours of work into your essay than it is to have to start over because you didn't clarify something. As long as you're polite, almost all instructors will be happy to answer your questions.

For example, are you writing an opinion essay for your school newspaper? Your fellow students are probably your audience in this case. However, if you're writing an opinion essay for the local newspaper, your audience could be people who live in your town, people who agree with you, people who don't agree with you, people who are affected by your topic, or any other group you want to focus on.

Getting the Basics Down

A thesis statement acts as the “road map” for your paper. It tells your audience what to expect from the rest of your essay.
Include the most salient points within your thesis statement. For example, your thesis may be about the similarity between two literary works. Describe the similarities in general terms within your thesis statement.
Consider the “So what?” question. A good thesis will explain why your idea or argument is important. Ask yourself: if a friend asked you “So what?” about your thesis, would you have an answer?
The “3-prong thesis” is common in high school essays, but is often frowned upon in college and advanced writing. Don't feel like you have to restrict yourself to this limited form.
Revise your thesis statement. If in the course of writing your essay you discover important points that were not touched upon in your thesis, edit your thesis.

If you have a librarian available, don't be afraid to consult with him or her! Librarians are trained in helping you identify credible sources for research and can get you started in the right direction.

Try freewriting. With freewriting, you don't edit or stop yourself. You just write (say, for 15 minutes at a time) about anything that comes into your head about your topic.
Try a mind map. Start by writing down your central topic or idea, and then draw a box around it. Write down other ideas and connect them to see how they relate. [14]
Try cubing. With cubing, you consider your chosen topic from 6 different perspectives: 1) Describe it, 2) Compare it, 3) Associate it, 4) Analyze it, 5) Apply it, 6) Argue for and against it.

If your original thesis was very broad, you can also use this chance to narrow it down. For example, a thesis about “slavery and the Civil War” is way too big to manage, even for a doctoral dissertation. Focus on more specific terms, which will help you when you start to organize your outline. [16]

Organizing the Essay

Step 1 Create an outline of the points to include in your essay.

Determine the order in which you will discuss the points. If you're planning to discuss 3 challenges of a particular management strategy, you might capture your reader's attention by discussing them in the order of most problematic to least. Or you might choose to build the intensity of your essay by starting with the smallest problem first.

Step 2 Avoid letting your sources drive your organization.

For example, a solid paragraph about Hamlet's insanity could draw from several different scenes in which he appears to act insane. Even though these scenes don't all cluster together in the original play, discussing them together will make a lot more sense than trying to discuss the whole play from start to finish.

Step 3 Write topic sentences for each paragraph.

Ensure that your topic sentence is directly related to your main argument. Avoid statements that may be on the general topic, but not directly relevant to your thesis.
Make sure that your topic sentence offers a “preview” of your paragraph's argument or discussion. Many beginning writers forget to use the first sentence this way, and end up with sentences that don't give a clear direction for the paragraph.
For example, compare these two first sentences: “Thomas Jefferson was born in 1743” and “Thomas Jefferson, who was born in 1743, became one of the most important people in America by the end of the 18th century.”
The first sentence doesn't give a good direction for the paragraph. It states a fact but leaves the reader clueless about the fact's relevance. The second sentence contextualizes the fact and lets the reader know what the rest of the paragraph will discuss.

Step 4 Use transitional words and sentences.

Transitions help underline your essay's overall organizational logic. For example, beginning a paragraph with something like “Despite the many points in its favor, Mystic Pizza also has several elements that keep it from being the best pizza in town” allows your reader to understand how this paragraph connects to what has come before.
Transitions can also be used inside paragraphs. They can help connect the ideas within a paragraph smoothly so your reader can follow them.
If you're having a lot of trouble connecting your paragraphs, your organization may be off. Try the revision strategies elsewhere in this article to determine whether your paragraphs are in the best order.
The Writing Center at the University of Wisconsin - Madison has a handy list of transitional words and phrases, along with the type of transition they indicate. [22]

You can try returning to your original idea or theme and adding another layer of sophistication to it. Your conclusion can show how necessary your essay is to understanding something about the topic that readers would not have been prepared to understand before.
For some types of essays, a call to action or appeal to emotions can be quite helpful in a conclusion. Persuasive essays often use this technique.
Avoid hackneyed phrases like “In sum” or “In conclusion.” They come across as stiff and cliched on paper.

Revising the Plan

You can reverse-outline on the computer or on a printed draft, whichever you find easier.
As you read through your essay, summarize the main idea (or ideas) of each paragraph in a few key words. You can write these on a separate sheet, on your printed draft, or as a comment in a word processing document.
Look at your key words. Do the ideas progress in a logical fashion? Or does your argument jump around?
If you're having trouble summarizing the main idea of each paragraph, it's a good sign that your paragraphs have too much going on. Try splitting your paragraphs up.

You may also find with this technique that your topic sentences and transitions aren't as strong as they could be. Ideally, your paragraphs should have only one way they could be organized for maximum effectiveness. If you can put your paragraphs in any order and the essay still kind of makes sense, you may not be building your argument effectively.

For example, you might find that placing your least important argument at the beginning drains your essay of vitality. Experiment with the order of the sentences and paragraphs for heightened effect.

References

Jake Adams. Academic Tutor & Test Prep Specialist. Expert Interview. 20 May 2020.
http://www.writing.utoronto.ca/advice/planning-and-organizing/organizing
http://writingcenter.unc.edu/handouts/understanding-assignments/
https://open.lib.umn.edu/writingforsuccess/chapter/6-1-purpose-audience-tone-and-content/
https://www.student.unsw.edu.au/writing-your-essay
https://www.hamilton.edu/writing/writing-resources/persuasive-essays
http://writingcenter.unc.edu/handouts/thesis-statements/
http://writingcenter.unc.edu/handouts/brainstorming/
https://owl.english.purdue.edu/engagement/2/2/53/
https://pressbooks.library.torontomu.ca/scholarlywriting/chapter/revising-a-thesis-statement/
http://writingcenter.unc.edu/handouts/reorganizing-drafts/
https://www.grammarly.com/blog/essay-outline/
https://wts.indiana.edu/writing-guides/paragraphs-and-topic-sentences.html
http://writingcenter.unc.edu/handouts/transitions/
https://writing.wisc.edu/Handbook/Transitions.html
http://writingcenter.unc.edu/handouts/conclusions/
https://writingcenter.unc.edu/tips-and-tools/reading-aloud/

About This Article

To organize an essay, start by writing a thesis statement that makes a unique observation about your topic. Then, write down each of the points you want to make that support your thesis statement. Once you have all of your main points, expand them into paragraphs using the information you found during your research. Finally, close your essay with a conclusion that reiterates your thesis statement and offers additional insight into why it’s important.

A Writer's Handbook

Essay Writing Organization

Writing is a process that everyone does differently, but an outline will help you with development of ideas.

I. Choice of introduction and linking sentences

A. Catching opener

B. Linking Sentences

C. Thesis

II. Topic Sentence One with Transition, Link to Thesis, and Topic of paragraph mentioned

A. Transition, and mention of first subtopic

1. Example, quote, illustration of subtopic
2. Explanation of the example, quote, or illustration of subtopic
3. If there is only one subtopic, the paragraph will need an additional example(s), quote(s), or illustration(s) and additional explanation(s)

B. Transition, and other subtopic, if applicable

1. Example, quote, illustration of subtopic

III. Topic Sentence Two with Transition, Link to Thesis, and Topic of paragraph mentioned

A. Transition, and mention of first subtopic

IV. And so on for as many body paragraphs as needed

V. Conclusion

A. Restate idea

B. Sum up main points of all main ideas within

C. Clincher

Graphic Organizer: For those who are more visual, this is a downloadable PDF of a graphic organizer for the essay outline.
Last Updated: Jan 3, 2023 9:01 AM
URL: https://library.jeffersonstate.edu/AWH

Announcing TGC’s 2024 Essay Contest for Young Adults

Writers aged 16–22 can get published and win $500.

The Gospel Coalition announces its 2024 essay contest, inviting young adults (ages 16–22) to explore and write about God’s faithfulness, their relationship with technology, and their heart for full-time ministry in our secular age.

Winning authors will receive a prize, and their essays will be published on TGC’s website. In addition, every writer who submits an essay will receive a coupon code for $50 off the Gen-Z registration for our TGC25 conference.

Essay Requirements

Each 800–1,000 word essay must be original, previously unpublished, and must respond to one of the following three prompts. With each of these prompts, contestants should draw from their own experiences and convictions, and use Scripture to support their conclusions. (Want examples? Read the winning essays from 2022 and 2023.) Contestants must give permission to TGC to publish their work, and each essay will be judged by TGC’s editorial team.

Submissions will be accepted from June 1 to July 1 and winners will be announced on September 2, 2024.

1. When did the Lord love you by not giving you what you wanted?

Many of us have unfulfilled desires. When was a time you saw the Lord’s love and kindness when he withheld something from you? What was it that you wanted and how did you see the Lord’s faithfulness through not giving it to you? Tell us what you learned from your experience, especially considering that our culture tells us we deserve to have all our desires fulfilled.

2. How has the gospel changed your relationship with your phone?

Today, phones are considered a necessity rather than a luxury. How does the truth of the gospel of Jesus Christ change how you view your phone and how you use it? How has your phone been a hindrance and how has it been an asset to your relationship with the Lord? Tell us what you’ve learned in navigating how to use your phone for the glory of God.

3. Why are you considering full-time ministry?

There’s a greater need than ever for young people to pursue full-time ministry. Why are you considering making ministry your vocation? Tell us your heart behind it, why you think it’s important, and what influences in your life have led you to move forward in this direction.

The contest winner will receive $500; second place will receive a $100 gift card to the TGC bookstore; third place will receive an assortment of books. The winning essays will be published on TGC’s website, as will any other essays the judges select.

Read the full contest rules and upload your essay. Questions? Contact [email protected] .

Open access
Published: 03 June 2024

Applying large language models for automated essay scoring for non-native Japanese

Wenchao Li & Haitao Liu

Humanities and Social Sciences Communications, volume 11, Article number: 723 (2024)

Language and linguistics

Recent advancements in artificial intelligence (AI) have led to an increased use of large language models (LLMs) for language assessment tasks such as automated essay scoring (AES), automated listening tests, and automated oral proficiency assessments. The application of LLMs for AES in the context of non-native Japanese, however, remains limited. This study explores the potential of LLM-based AES by comparing the efficiency of five models: two conventional machine learning technology-based methods (Jess and JWriter), two LLMs (GPT and BERT), and one Japanese local LLM (Open-Calm large model). To conduct the evaluation, a dataset consisting of 1400 story-writing scripts authored by learners with 12 different first languages was used. Statistical analysis revealed that GPT-4 outperforms Jess and JWriter, BERT, and the Japanese language-specific trained Open-Calm large model in terms of annotation accuracy and predicting learning levels. Furthermore, by comparing 18 different models that utilize various prompts, the study emphasized the significance of prompts in achieving accurate and reliable evaluations using LLMs.

Conventional machine learning technology in AES

AES has experienced significant growth with the advancement of machine learning technologies in recent decades. In the earlier stages of AES development, conventional machine learning-based approaches were commonly used. These approaches involved the following procedures: (a) a dataset of essays is fed to the machine learning system, where it serves as the basis for training the model and establishing patterns and correlations between linguistic features and human ratings; (b) the machine learning model is trained using linguistic features that best represent human ratings and can effectively discriminate learners’ writing proficiency. These features include lexical richness (Lu, 2012; Kyle and Crossley, 2015; Kyle et al. 2021), syntactic complexity (Lu, 2010; Liu, 2008), and text cohesion (Crossley and McNamara, 2016), among others. Conventional machine learning approaches in AES require human intervention, such as manual correction and annotation of essays, in order to create a labeled dataset for training the model. Several AES systems have been developed using conventional machine learning technologies. These include the Intelligent Essay Assessor (Landauer et al. 2003), the e-rater engine by Educational Testing Service (Attali and Burstein, 2006; Burstein, 2003), MyAccess with the IntelliMetric scoring engine by Vantage Learning (Elliot, 2003), and the Bayesian Essay Test Scoring system (Rudner and Liang, 2002). These systems have played a significant role in automating the essay scoring process and providing quick and consistent feedback to learners. However, as touched upon earlier, conventional machine learning approaches rely on predetermined linguistic features and often require manual intervention, making them less flexible and potentially limiting their generalizability to different contexts.

In the context of the Japanese language, conventional machine learning-incorporated AES tools include Jess (Ishioka and Kameda, 2006 ) and JWriter (Lee and Hasebe, 2017 ). Jess assesses essays by deducting points from the perfect score, utilizing the Mainichi Daily News newspaper as a database. The evaluation criteria employed by Jess encompass various aspects, such as rhetorical elements (e.g., reading comprehension, vocabulary diversity, percentage of complex words, and percentage of passive sentences), organizational structures (e.g., forward and reverse connection structures), and content analysis (e.g., latent semantic indexing). JWriter employs linear regression analysis to assign weights to various measurement indices, such as average sentence length and total number of characters. These weights are then combined to derive the overall score. A pilot study involving the Jess model was conducted on 1320 essays at different proficiency levels, including primary, intermediate, and advanced. However, the results indicated that the Jess model failed to significantly distinguish between these essay levels. Out of the 16 measures used, four measures, namely median sentence length, median clause length, median number of phrases, and maximum number of phrases, did not show statistically significant differences between the levels. Additionally, two measures exhibited between-level differences but lacked linear progression: the number of attributives declined words and the Kanji/kana ratio. On the other hand, the remaining measures, including maximum sentence length, maximum clause length, number of attributive conjugated words, maximum number of consecutive infinitive forms, maximum number of conjunctive-particle clauses, k characteristic value, percentage of big words, and percentage of passive sentences, demonstrated statistically significant between-level differences and displayed linear progression.

Both Jess and JWriter exhibit notable limitations, including the manual selection of feature parameters and weights, which can introduce biases into the scoring process. The reliance on human annotators to label non-native language essays also introduces potential noise and variability in the scoring. Furthermore, an important concern is the possibility of system manipulation and cheating by learners who are aware of the regression equation utilized by the models (Hirao et al. 2020 ). These limitations emphasize the need for further advancements in AES systems to address these challenges.

Deep learning technology in AES

Deep learning has emerged as one of the approaches for improving the accuracy and effectiveness of AES. Deep learning-based AES methods utilize artificial neural networks that mimic the human brain’s functioning through layered algorithms and computational units. Unlike conventional machine learning, deep learning autonomously learns from the environment and past errors without human intervention. This enables deep learning models to establish nonlinear correlations, resulting in higher accuracy. Recent advancements in deep learning have led to the development of transformers, which are particularly effective in learning text representations. Noteworthy examples include bidirectional encoder representations from transformers (BERT) (Devlin et al. 2019 ) and the generative pretrained transformer (GPT) (OpenAI).

BERT is a language representation model that utilizes a transformer architecture and is trained on two tasks: masked language modeling and next-sentence prediction (Hirao et al. 2020; Vaswani et al. 2017). In the context of AES, BERT follows specific procedures, as illustrated in Fig. 1: (a) the tokenized prompts and essays are taken as input; (b) special tokens, such as [CLS] and [SEP], are added to mark the beginning and separation of prompts and essays; (c) the transformer encoder processes the prompt and essay sequences, resulting in hidden layer sequences; (d) the hidden layers corresponding to the [CLS] tokens (T[CLS]) represent distributed representations of the prompts and essays; and (e) a multilayer perceptron uses these distributed representations as input to obtain the final score (Hirao et al. 2020).

AES system with BERT (Hirao et al. 2020 ).
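Steps (a) and (b) of the procedure above can be sketched in a few lines of Python. This is only an illustration of how a prompt and an essay might be packed into one input sequence, not the pipeline used by Hirao et al. (2020): the function name `build_bert_input`, the pre-tokenized inputs, and the default `max_length` of 512 positions (the usual limit for standard BERT checkpoints) are assumptions made here, and the encoder and multilayer perceptron of steps (c) to (e) are left out.

```python
def build_bert_input(prompt_tokens, essay_tokens, max_length=512):
    """Pack a tokenized prompt and essay into one BERT-style sequence.

    Hypothetical helper: returns the token sequence and the segment ids
    (0 for the prompt part, 1 for the essay part).
    """
    # Reserve three positions for [CLS] and the two [SEP] tokens, then
    # truncate the essay if it does not fit (an assumed, simple strategy).
    budget = max(max_length - len(prompt_tokens) - 3, 0)
    essay_tokens = essay_tokens[:budget]

    tokens = ["[CLS]"] + prompt_tokens + ["[SEP]"] + essay_tokens + ["[SEP]"]
    segment_ids = [0] * (len(prompt_tokens) + 2) + [1] * (len(essay_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_bert_input(
    ["write", "a", "story"], ["ken", "went", "out"]
)
print(tokens)
print(segment_ids)
```

Truncating the essay is the simplest way to respect a fixed input length, but it discards text; other strategies, such as scoring overlapping chunks, are possible.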

Training BERT on a substantial amount of sentence data through the Masked Language Model (MLM) objective allows it to capture contextual information within its hidden layers. Consequently, BERT is expected to be capable of identifying artificial essays as invalid and assigning them lower scores (Mizumoto and Eguchi, 2023). In the context of AES for nonnative Japanese learners, Hirao et al. (2020) combined the long short-term memory (LSTM) model proposed by Hochreiter and Schmidhuber (1997) with BERT to develop a tailored automated essay scoring system. Their findings revealed that the BERT model outperformed both the conventional machine learning approach utilizing character-type features such as “kanji” and “hiragana” and the standalone LSTM model. Takeuchi et al. (2021) presented an approach to Japanese AES that eliminates the requirement for pre-scored essays by relying solely on reference texts or a model answer for the essay task. They investigated multiple similarity evaluation methods, including frequency of morphemes, idf values calculated on Wikipedia, LSI, LDA, word-embedding vectors, and document vectors produced by BERT. The experimental findings revealed that the method utilizing the frequency of morphemes with idf values exhibited the strongest correlation with human-annotated scores across different essay tasks. The use of BERT in AES encounters several limitations. First, essays often exceed the model’s maximum length limit. Second, only score labels are available for training, which restricts access to additional information.

Mizumoto and Eguchi (2023) were pioneers in employing the GPT model for AES in non-native English writing. Their study focused on evaluating the accuracy and reliability of AES using the GPT-3 text-davinci-003 model, analyzing a dataset of 12,100 essays from the corpus of nonnative written English (TOEFL11). The findings indicated that AES utilizing the GPT-3 model exhibited a certain degree of accuracy and reliability. They suggest that GPT-3-based AES systems hold the potential to provide support for human ratings. However, applying the GPT model to AES presents a unique natural language processing (NLP) task that involves considerations such as nonnative language proficiency, the influence of the learner’s first language on the output in the target language, and identifying linguistic features that best indicate writing quality in a specific language. These linguistic features may differ morphologically or syntactically from those present in the learners’ first language, as observed in (1)–(3).

(1) Isolating

我-送了-他-一本-书

Wǒ-sòngle-tā-yī běn-shū

1sg-give.past-him-one.cl-book

“I gave him a book.”

(2) Agglutinative

彼-に-本-を-あげ-まし-た

Kare-ni-hon-o-age-mashi-ta

3sg-dat-book-acc-give-honorification-past

(3) Inflectional

give, give-s, gave, given, giving

Additionally, the morphological agglutination and subject-object-verb (SOV) order in Japanese, along with its idiomatic expressions, pose additional challenges for applying language models in AES tasks (4).

足-が棒-になり-ました

Ashi-ga bo-ni nari-mashita

leg- nom stick- dat become- past

“My leg became like a stick (I am extremely tired).”

The example sentence demonstrates the morpho-syntactic structure of Japanese and the presence of an idiomatic expression. In this sentence, the verb “なる” (naru), meaning “to become”, appears at the end of the sentence. The verb stem “なり” (nari) is followed by morphemes indicating honorification (“ます”, masu) and tense (“た”, ta), showcasing agglutination. While the sentence can be literally translated as “my leg became like a stick”, it carries an idiomatic interpretation that implies “I am extremely tired”.

To overcome this issue, CyberAgent Inc. (2023) has developed the Open-Calm series of language models specifically designed for Japanese. Open-Calm consists of pre-trained models available in various sizes, such as Small, Medium, Large, and 7B. Figure 2 depicts the fundamental structure of the Open-Calm model. A key feature of this architecture is the incorporation of the LoRA adapter and the GPT-NeoX framework, which can enhance its language processing capabilities.

GPT-NeoX Model Architecture (Okgetheng and Takeuchi 2024 ).

In a recent study, Okgetheng and Takeuchi (2024) assessed the efficacy of Open-Calm language models in grading Japanese essays. The research utilized a dataset of approximately 300 essays, which were annotated by native Japanese educators. The findings demonstrate the considerable potential of Open-Calm language models in automated Japanese essay scoring. Specifically, among the Open-Calm family, the Open-Calm Large model (referred to as OCLL) exhibited the highest performance. However, it is important to note that, as of the current date, the Open-Calm Large model does not offer public access to its server. Consequently, users are required to independently deploy and operate the environment for OCLL. In order to utilize OCLL, users must have a PC equipped with an NVIDIA GeForce RTX 3060 (8 or 12 GB VRAM).

In summary, while the potential of LLMs in automated scoring of nonnative Japanese essays has been demonstrated in two studies—BERT-driven AES (Hirao et al. 2020 ) and OCLL-based AES (Okgetheng and Takeuchi, 2024 )—the number of research efforts in this area remains limited.

Another significant challenge in applying LLMs to AES lies in prompt engineering and ensuring its reliability and effectiveness (Brown et al. 2020; Rae et al. 2021; Zhang et al. 2021). Various prompting strategies have been proposed, such as the zero-shot chain of thought (CoT) approach (Kojima et al. 2022), which involves manually crafting diverse and effective examples. However, manual efforts can lead to mistakes. To address this, Zhang et al. (2021) introduced an automatic CoT prompting method called Auto-CoT, which demonstrates matching or superior performance compared to the CoT paradigm. Another prompt framework is tree of thoughts, which enables a model to self-evaluate its progress at intermediate stages of problem-solving through deliberate reasoning (Yao et al. 2023).

Beyond linguistic studies, there has been a noticeable increase in the number of foreign workers in Japan and Japanese learners worldwide (Ministry of Health, Labor, and Welfare of Japan, 2022 ; Japan Foundation, 2021 ). However, existing assessment methods, such as the Japanese Language Proficiency Test (JLPT), J-CAT, and TTBJ Footnote 1 , primarily focus on reading, listening, vocabulary, and grammar skills, neglecting the evaluation of writing proficiency. As the number of workers and language learners continues to grow, there is a rising demand for an efficient AES system that can reduce costs and time for raters and be utilized for employment, examinations, and self-study purposes.

This study aims to explore the potential of LLM-based AES by comparing the effectiveness of five models: two LLMs (GPT Footnote 2 and BERT), one Japanese local LLM (OCLL), and two conventional machine learning-based methods (linguistic feature-based scoring tools - Jess and JWriter).

The research questions addressed in this study are as follows:

To what extent do the LLM-driven AES and linguistic feature-based AES, when used as automated tools to support human rating, accurately reflect test takers’ actual performance?

What influence does the prompt have on the accuracy and performance of LLM-based AES methods?

The subsequent sections of the manuscript cover the methodology, including the assessment measures for nonnative Japanese writing proficiency, criteria for prompts, and the dataset. The evaluation section focuses on the analysis of annotations and rating scores generated by LLM-driven and linguistic feature-based AES methods.

Methodology

The dataset utilized in this study was obtained from the International Corpus of Japanese as a Second Language (I-JAS) Footnote 3 . This corpus consisted of 1000 participants who represented 12 different first languages. For the study, the participants were given a story-writing task on a personal computer. They were required to write two stories based on the 4-panel illustrations titled “Picnic” and “The key” (see Appendix A). Background information for the participants was provided by the corpus, including their Japanese language proficiency levels assessed through two online tests: J-CAT and SPOT. These tests evaluated their reading, listening, vocabulary, and grammar abilities. The learners’ proficiency levels were categorized into six levels aligned with the Common European Framework of Reference for Languages (CEFR) and the Reference Framework for Japanese Language Education (RFJLE): A1, A2, B1, B2, C1, and C2. According to Lee et al. ( 2015 ), there is a high level of agreement (r = 0.86) between the J-CAT and SPOT assessments, indicating that the proficiency certifications provided by J-CAT are consistent with those of SPOT. However, it is important to note that the scores of J-CAT and SPOT do not have a one-to-one correspondence. In this study, the J-CAT scores were used as a benchmark to differentiate learners of different proficiency levels. A total of 1400 essays were utilized, representing the beginner (aligned with A1), A2, B1, B2, C1, and C2 levels based on the J-CAT scores. Table 1 provides information about the learners’ proficiency levels and their corresponding J-CAT and SPOT scores.

A dataset comprising a total of 1400 essays from the story-writing tasks was collected. Among these, 714 essays were utilized to evaluate the reliability of the LLM-based AES method, while the remaining 686 essays were designated as development data to assess the LLM-based AES’s capability to distinguish participants with varying proficiency levels. The GPT-4 API was used in this study. A detailed explanation of the prompt-assessment criteria is provided in the Prompt section. All essays were sent to the model for measurement and scoring.

Measures of writing proficiency for nonnative Japanese

Japanese exhibits a morphologically agglutinative structure where morphemes are attached to the word stem to convey grammatical functions such as tense, aspect, voice, and honorifics, e.g. (5).

食べ-させ-られ-まし-た-か

tabe-sase-rare-mashi-ta-ka

[eat (stem)-causative-passive voice-honorification-tense. past-question marker]

Japanese employs nine case particles to indicate grammatical functions, including the nominative case particle が (ga), the accusative case particle を (o), the genitive case particle の (no), the dative case particle に (ni), the locative/instrumental case particle で (de), the ablative case particle から (kara), the directional case particle へ (e), and the comitative case particle と (to). The agglutinative nature of the language, combined with the case particle system, provides an efficient means of distinguishing between active and passive voice, either through morphemes or case particles, e.g. 食べる taberu “eat conclusive” (active voice) and 食べられる taberareru “eat conclusive” (passive voice). In the active voice, “パンを食べる” (pan o taberu) translates to “to eat bread”. In the passive voice, it becomes “パンが食べられた” (pan ga taberareta), which means “(the) bread was eaten”. Additionally, different conjugations of the same lemma are counted as one type in order to ensure a comprehensive assessment of the language features. For example, 食べる taberu “eat conclusive”, 食べている tabeteiru “eat progressive”, and 食べた tabeta “eat past” are counted as one type.

To incorporate these features, previous research (Suzuki, 1999 ; Watanabe et al. 1988 ; Ishioka, 2001 ; Ishioka and Kameda, 2006 ; Hirao et al. 2020 ) has identified complexity, fluency, and accuracy as crucial factors for evaluating writing quality. These criteria are assessed through various aspects, including lexical richness (lexical density, diversity, and sophistication), syntactic complexity, and cohesion (Kyle et al. 2021 ; Mizumoto and Eguchi, 2023 ; Ure, 1971 ; Halliday, 1985 ; Barkaoui and Hadidi, 2020 ; Zenker and Kyle, 2021 ; Kim et al. 2018 ; Lu, 2017 ; Ortega, 2015 ). Therefore, this study proposes five scoring categories: lexical richness, syntactic complexity, cohesion, content elaboration, and grammatical accuracy. A total of 16 measures were employed to capture these categories. The calculation process and specific details of these measures can be found in Table 2 .

The T-unit, first introduced by Hunt (1966), is a measure used for evaluating speech and composition. It serves as an indicator of syntactic development and represents the shortest units into which a piece of discourse can be divided without leaving any sentence fragments. In the context of Japanese language assessment, Sakoda and Hosoi (2020) utilized the T-unit as the basic unit to assess the accuracy and complexity of Japanese learners’ speaking and storytelling. The calculation of T-units in Japanese follows these principles:

A single main clause constitutes 1 T-unit, regardless of the presence or absence of dependent clauses, e.g. (6).

ケンとマリはピクニックに行きました (main clause): 1 T-unit.

If a sentence contains a main clause along with subclauses, each subclause is considered part of the same T-unit, e.g. (7).

天気が良かったので (subclause)、ケンとマリはピクニックに行きました (main clause): 1 T-unit.

In the case of coordinate clauses, where multiple clauses are connected, each coordinated clause is counted separately. Thus, a sentence with coordinate clauses may have 2 T-units or more, e.g. (8).

ケンは地図で場所を探して (coordinate clause)、マリはサンドイッチを作りました (coordinate clause): 2 T-units.

Lexical diversity refers to the range of words used within a text (Engber, 1995 ; Kyle et al. 2021 ) and is considered a useful measure of the breadth of vocabulary in L n production (Jarvis, 2013a , 2013b ).

The type/token ratio (TTR) is widely recognized as a straightforward measure for calculating lexical diversity and has been employed in numerous studies. These studies have demonstrated a strong correlation between TTR and other methods of measuring lexical diversity (e.g., Bentz et al. 2016 ; Čech and Miroslav, 2018 ; Çöltekin and Taraka, 2018 ). TTR is computed by considering both the number of unique words (types) and the total number of words (tokens) in a given text. Given that the length of learners’ writing texts can vary, this study employs the moving average type-token ratio (MATTR) to mitigate the influence of text length. MATTR is calculated using a 50-word moving window. Initially, a TTR is determined for words 1–50 in an essay, followed by words 2–51, 3–52, and so on until the end of the essay is reached (Díez-Ortega and Kyle, 2023 ). The final MATTR scores were obtained by averaging the TTR scores for all 50-word windows. The following formula was employed to derive MATTR:

$\mathrm{MATTR}(W)=\frac{\sum_{i=1}^{N-W+1} F_{i}}{W\,(N-W+1)}$

Here, $N$ refers to the number of tokens in the corpus, $W$ is the chosen window size in tokens ($W < N$; 50 in this study), and $F_{i}$ is the number of types in the $i$-th window. $\mathrm{MATTR}(W)$ is therefore the mean of the type-token ratios (TTRs), based on the word form, over all $N-W+1$ windows. It is expected that individuals with higher language proficiency will produce texts with greater lexical diversity, as indicated by higher MATTR scores.
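The moving-window computation described above can be written out as a short Python sketch. It is not the authors' implementation: the function name `mattr`, the list-of-strings input, and the fallback to a plain TTR when a text is shorter than the window are assumptions made for illustration.

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio over all windows of `window` tokens.

    `tokens` is a list of word forms (or lemmas, if conjugations of the
    same lemma should count as one type).
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("cannot compute MATTR of an empty text")
    if n < window:
        # Assumed fallback: a single TTR over the whole (short) text.
        return len(set(tokens)) / n

    # F_i / W for each of the N - W + 1 windows, then the mean.
    ttrs = [
        len(set(tokens[i:i + window])) / window
        for i in range(n - window + 1)
    ]
    return sum(ttrs) / len(ttrs)


# Tiny worked example with a window of 2 tokens:
# windows are (a, a), (a, b), (b, b) with TTRs 0.5, 1.0, 0.5.
example = mattr(["a", "a", "b", "b"], window=2)
print(example)
```

Because every window has the same length, the score is less sensitive to total essay length than a single TTR over the whole text.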

Lexical density was captured by the ratio of the number of lexical words to the total number of words (Lu, 2012). Lexical sophistication refers to the utilization of advanced vocabulary, often evaluated through word frequency indices (Crossley et al. 2013; Haberman, 2008; Kyle and Crossley, 2015; Laufer and Nation, 1995; Lu, 2012; Read, 2000). In writing, lexical sophistication can be interpreted as vocabulary breadth, which entails the appropriate usage of vocabulary items across various lexico-grammatical contexts and registers (Garner et al. 2019; Kim et al. 2018; Kyle et al. 2018). In Japanese specifically, words are considered lexically sophisticated if they are not included in the “Japanese Education Vocabulary List Ver 1.0” (see Footnote 4). Consequently, lexical sophistication was calculated as the number of sophisticated word types relative to the total number of words per essay. Furthermore, it has been suggested that, in Japanese writing, sentences should ideally have a length of no more than 40 to 50 characters, as this promotes readability. Therefore, the median and maximum sentence length can be considered useful indices for assessment (Ishioka and Kameda, 2006).
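As a rough illustration of the two ratios just defined, the sketch below computes lexical density and lexical sophistication from tokens that have already been lemmatized and part-of-speech tagged. Everything concrete in it is an assumption: the tag set in `LEXICAL_POS`, the romanized toy tokens, and the three-word `basic_vocabulary` stand in for a real tagger and for the actual vocabulary list, neither of which is reproduced here.

```python
# Assumed set of content-word tags (Universal Dependencies style).
LEXICAL_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def lexical_density(tagged_tokens):
    """Number of lexical (content) words divided by the total number of words."""
    lexical = sum(1 for _, pos in tagged_tokens if pos in LEXICAL_POS)
    return lexical / len(tagged_tokens)


def lexical_sophistication(tagged_tokens, basic_vocabulary):
    """Number of word types outside the basic list divided by the total words."""
    sophisticated_types = {
        lemma for lemma, _ in tagged_tokens if lemma not in basic_vocabulary
    }
    return len(sophisticated_types) / len(tagged_tokens)


# Toy essay: (lemma, part of speech) pairs, romanized for readability.
essay = [("pan", "NOUN"), ("o", "ADP"), ("taberu", "VERB"), ("keshigomu", "NOUN")]
basic_vocabulary = {"pan", "o", "taberu"}  # hypothetical stand-in list

density = lexical_density(essay)
sophistication = lexical_sophistication(essay, basic_vocabulary)
print(density, sophistication)
```

Working on lemmas rather than surface forms keeps different conjugations of one verb from being counted as separate types.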

Syntactic complexity was assessed based on several measures, including the mean length of clauses, verb phrases per T-unit, clauses per T-unit, dependent clauses per T-unit, complex nominals per clause, adverbial clauses per clause, coordinate phrases per clause, and mean dependency distance (MDD). The MDD reflects the distance between the governor and dependent positions in a sentence. A larger dependency distance indicates a higher cognitive load and greater complexity in syntactic processing (Liu, 2008; Liu et al. 2017). The MDD has been established as an efficient metric for measuring syntactic complexity (Jiang, Quyang, and Liu, 2019; Li and Yan, 2021). To calculate a dependency distance, the position number of the dependent is subtracted from that of the governor, with the words in a sentence numbered in linear order, such as W1 … Wi … Wn. In any dependency relationship between words Wa and Wb, Wa is the governor and Wb is the dependent. The MDD of the entire sentence was obtained by averaging the absolute values of these governor – dependent differences:

MDD = $\frac{1}{n}{\sum }_{i=1}^{n}|{\rm{D}}{{\rm{D}}}_{i}|$

In this formula, $n$ represents the number of words in the sentence, and ${DD}_{i}$ is the dependency distance of the ${i}^{{th}}$ dependency relationship of a sentence. For example, the sentence ‘Mary-ga John-ni keshigomu-o watashita’ is annotated as [Mary- top -John- dat -eraser- acc -give- past], and its MDD would be 2. Table 3 provides the CSV file as a prompt for GPT-4.
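The MDD calculation can be illustrated with a short Python sketch. It averages over the dependency relations that are listed, which is the reading under which the example sentence comes out at 2. That reading is an assumption, as are the segmentation into four units and the choice of the verb as governor of all three arguments.

```python
def mean_dependency_distance(dependencies):
    """Average absolute distance between governor and dependent positions.

    `dependencies` is a list of (governor_position, dependent_position)
    pairs, using 1-based word positions within the sentence.
    """
    if not dependencies:
        raise ValueError("at least one dependency relation is required")
    distances = [abs(governor - dependent) for governor, dependent in dependencies]
    return sum(distances) / len(distances)


# Assumed segmentation (not taken from the paper's annotation files):
# 1 Mary-ga, 2 John-ni, 3 keshigomu-o, 4 watashita.
# The verb in position 4 is assumed to govern the three preceding units.
example = [(4, 1), (4, 2), (4, 3)]
mdd = mean_dependency_distance(example)
print(mdd)  # 2.0
```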

Cohesion (semantic similarity) and content elaboration aim to capture the ideas presented in test takers’ essays. Cohesion was assessed using three measures: Synonym overlap/paragraph (topic), Synonym overlap/paragraph (keywords), and word2vec cosine similarity. Content elaboration and development were measured as the number of metadiscourse markers (type)/number of words. To capture content closely, this study proposed a novel distance-based representation that encodes the cosine distance between the i-vectors of the learner’s essay and of the essay task (topic and keywords). The learner’s essay is decoded into a word sequence and aligned to the essay task’s topic and keywords for log-likelihood measurement. The cosine distance reveals the content elaboration score of the learner’s essay. The cosine similarity between the target and reference vectors is shown in (11), where, for each essay, $({L}_{1},\ldots ,{L}_{n})$ and $({N}_{1},\ldots ,{N}_{n})$ are the vectors representing the learner’s essay and the task’s topic and keywords, respectively. The content elaboration distance between $L$ and $N$ was calculated as follows:

$\cos \left(\theta \right)=\frac{{\rm{L}}\,\cdot\, {\rm{N}}}{\left|{\rm{L}}\right|{\rm{|N|}}}=\frac{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}{N}_{i}}{\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}^{2}}\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{N}_{i}^{2}}}$

A high similarity value indicates a low difference between the two recognition outcomes, which in turn suggests a high level of proficiency in content elaboration.
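As a minimal sketch of equation (11), the function below computes the cosine similarity of two equal-length vectors in plain Python. The vectors used here are toy values invented for illustration; how the essay and task vectors are actually obtained is outside the scope of this example.

```python
import math


def cosine_similarity(l_vec, n_vec):
    """Dot product of L and N divided by the product of their Euclidean norms."""
    if len(l_vec) != len(n_vec):
        raise ValueError("vectors must have the same length")
    dot = sum(l * n for l, n in zip(l_vec, n_vec))
    norm_l = math.sqrt(sum(l * l for l in l_vec))
    norm_n = math.sqrt(sum(n * n for n in n_vec))
    if norm_l == 0 or norm_n == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_l * norm_n)


# Toy vectors: `task` points in the same direction as `learner`,
# so the similarity is at its maximum.
learner = [1.0, 2.0, 2.0]
task = [2.0, 4.0, 4.0]
print(cosine_similarity(learner, task))

# Orthogonal vectors share no direction, so the similarity is zero.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```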

To evaluate the effectiveness of the proposed measures in distinguishing different proficiency levels among nonnative Japanese speakers’ writing, we conducted a multi-faceted Rasch measurement analysis (Linacre, 1994 ). This approach applies measurement models to thoroughly analyze various factors that can influence test outcomes, including test takers’ proficiency, item difficulty, and rater severity, among others. The underlying principles and functionality of multi-faceted Rasch measurement are illustrated in (12).

$\log \left(\frac{{P}_{{nijk}}}{{P}_{{nij}(k-1)}}\right)={B}_{n}-{D}_{i}-{C}_{j}-{F}_{k}$

Equation (12) defines the logarithm of the probability ratio $P_{nijk}/P_{nij(k-1)}$ as a function of multiple parameters. Here, $n$ represents the test taker, $i$ denotes a writing proficiency measure, $j$ corresponds to the human rater, and $k$ represents the proficiency score. The parameter $B_{n}$ signifies the proficiency level of test taker $n$ (where $n$ ranges from 1 to N). $D_{i}$ represents the difficulty parameter of test item $i$ (where $i$ ranges from 1 to L), while $C_{j}$ represents the severity of rater $j$ (where $j$ ranges from 1 to J). Additionally, $F_{k}$ represents the step difficulty for a test taker to move from score $k-1$ to $k$. $P_{nijk}$ refers to the probability of rater $j$ assigning score $k$ to test taker $n$ for test item $i$, and $P_{nij(k-1)}$ represents the probability of test taker $n$ being assigned score $k-1$ by rater $j$ for test item $i$. Each facet within the test is treated as an independent parameter and estimated within the same reference framework.

To evaluate the consistency of scores obtained through both human and computer analysis, we utilized the Infit mean-square (MNSQ) statistic. This statistic is a chi-square measure divided by the degrees of freedom and is weighted with information. It demonstrates higher sensitivity to unexpected patterns in responses to items near a person’s proficiency level (Linacre, 2002). Fit statistics are assessed based on predefined thresholds for acceptable fit. For the Infit MNSQ, which has a mean of 1.00, different thresholds have been suggested. Some propose stricter thresholds ranging from 0.7 to 1.3 (Bond et al. 2021), while others suggest more lenient thresholds ranging from 0.5 to 1.5 (Eckes, 2009). In this study, we adopted the criterion of 0.70–1.30 for the Infit MNSQ.
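To make the model in (12) concrete, the sketch below turns the adjacent-category log-odds into a full set of score probabilities and computes an information-weighted mean square from them. It is an illustration under stated assumptions, not the estimation procedure used in the study: scores are indexed 0 to K rather than A1 to C2, the parameter values are invented, and parameters are taken as given instead of being estimated from data.

```python
import math


def category_probabilities(b_n, d_i, c_j, step_difficulties):
    """Probabilities of scores 0..K implied by the adjacent-category log-odds.

    step_difficulties[k - 1] is F_k, the difficulty of moving from k-1 to k.
    """
    # Score k has log-weight equal to the sum of (B - D - C - F_h) for h <= k;
    # score 0 has an empty sum, i.e. 0.
    log_weights = [0.0]
    for f_k in step_difficulties:
        log_weights.append(log_weights[-1] + (b_n - d_i - c_j - f_k))
    peak = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def infit_mean_square(observed_scores, probability_rows):
    """Sum of squared residuals divided by the sum of model variances."""
    squared_residuals = 0.0
    model_variance = 0.0
    for x, probs in zip(observed_scores, probability_rows):
        expected = sum(k * p for k, p in enumerate(probs))
        model_variance += sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        squared_residuals += (x - expected) ** 2
    return squared_residuals / model_variance


# Invented parameter values, for illustration only.
probs = category_probabilities(b_n=1.0, d_i=0.2, c_j=0.1, step_difficulties=[-0.5, 0.5])
print(probs)
print(math.log(probs[2] / probs[1]))  # equals B - D - C - F_2 up to rounding
```

A value of `infit_mean_square` close to 1 means the observed scores vary about as much as the model expects; the 0.70–1.30 band adopted in the study would then be applied to this value.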

Moving forward, we can now proceed to assess the effectiveness of the 16 proposed measures based on five criteria for accurately distinguishing various levels of writing proficiency among non-native Japanese speakers. To conduct this evaluation, we utilized the development dataset from the I-JAS corpus, as described in Section Dataset . Table 4 provides a measurement report that presents the performance details of the 14 metrics under consideration. The measure separation was found to be 4.02, indicating a clear differentiation among the measures. The reliability index for the measure separation was 0.891, suggesting consistency in the measurement. Similarly, the person separation reliability index was 0.802, indicating the accuracy of the assessment in distinguishing between individuals. All 16 measures demonstrated Infit mean squares within a reasonable range, ranging from 0.76 to 1.28. The Synonym overlap/paragraph (topic) measure exhibited a relatively high outfit mean square of 1.46, although the Infit mean square falls within an acceptable range. The standard error for the measures ranged from 0.13 to 0.28, indicating the precision of the estimates.

Table 5 further illustrates the weights assigned to different linguistic measures for score prediction, with higher weights indicating stronger correlations between those measures and higher scores. Specifically, the following measures exhibited higher weights than the others: moving average type-token ratio per essay (0.0391); mean dependency distance (0.0388); mean length of clause, calculated by dividing the number of words by the number of clauses (0.0374); complex nominals per T-unit, calculated by dividing the number of complex nominals by the number of T-units (0.0379); coordinate phrases rate, calculated by dividing the number of coordinate phrases by the number of clauses (0.0325); and grammatical error rate, representing the number of errors per essay (0.0322).

Criteria (output indicator)

The criteria used to evaluate the writing ability in this study were based on CEFR, which follows a six-point scale ranging from A1 to C2. To assess the quality of Japanese writing, the scoring criteria from Table 6 were utilized. These criteria were derived from the IELTS writing standards and served as assessment guidelines and prompts for the written output.

A prompt is a question or detailed instruction that is provided to the model to obtain a proper response. After several pilot experiments, we decided to provide the measures (Section Measures of writing proficiency for nonnative Japanese) as the input prompt and use the criteria (Section Criteria (output indicator)) as the output indicator. Regarding the prompt language, given that the LLM was tasked with rating Japanese essays, we asked whether prompting in Japanese would work better. Footnote 5 We conducted experiments comparing the performance of GPT-4 using both English and Japanese prompts. Additionally, we utilized the Japanese local model OCLL with Japanese prompts. Multiple trials were conducted using the same sample. Regardless of the prompt language used, we consistently obtained the same grading results with GPT-4, which assigned a grade of B1 to the writing sample. This suggested that GPT-4 is reliable and capable of producing consistent ratings regardless of the prompt language. On the other hand, when we used Japanese prompts with the Japanese local model “OCLL”, we encountered inconsistent grading results. Out of 10 attempts with OCLL, only 6 yielded the same grade (B1), while the remaining 4 showed different outcomes, including A1 and B2 grades. These findings indicated that the language of the prompt was not the determining factor for reliable AES. Instead, the size of the training data and the model parameters played crucial roles in achieving consistent and reliable AES results for the language model.

The following is the utilized prompt, which details all measures and requires the LLM to score the essays using holistic and trait scores.

Please evaluate Japanese essays written by Japanese learners and assign a score to each essay on a six-point scale, ranging from A1, A2, B1, B2, C1 to C2. Additionally, please provide trait scores and display the calculation process for each trait score. The scoring should be based on the following criteria:

Moving average type-token ratio.

Number of lexical words (token) divided by the total number of words per essay.

Number of sophisticated word types divided by the total number of words per essay.

Mean length of clause.

Verb phrases per T-unit.

Clauses per T-unit.

Dependent clauses per T-unit.

Complex nominals per clause.

Adverbial clauses per clause.

Coordinate phrases per clause.

Mean dependency distance.

Synonym overlap paragraph (topic and keywords).

Word2vec cosine similarity.

Connectives per essay.

Conjunctions per essay.

Number of metadiscourse markers (types) divided by the total number of words.

Number of errors per essay.

Japanese essay text

出かける前に二人が地図を見ている間に、サンドイッチを入れたバスケットに犬が入ってしまいました。それに気づかずに二人は楽しそうに出かけて行きました。やがて突然犬がバスケットから飛び出し、二人は驚きました。バスケットの中を見ると、食べ物はすべて犬に食べられていて、二人は困ってしまいました。(ID_JJJ01_SW1)

The score of the example above was B1. Figure 3 provides an example of holistic and trait scores provided by GPT-4 (with a prompt indicating all measures) via Bing Footnote 6 .

Example of GPT-4 AES and feedback (with a prompt indicating all measures).
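For readers who want to reproduce the prompt layout, the following Python sketch assembles the instruction, the list of measures, and an essay into a single string. Only the wording of the instruction and the measures comes from the prompt quoted above; the numbering of the measures, the function name `build_prompt`, and the exact spacing are assumptions. No model is called here, and the essay excerpt is the first sentence of the sample shown above.

```python
INSTRUCTION = (
    "Please evaluate Japanese essays written by Japanese learners and assign "
    "a score to each essay on a six-point scale, ranging from A1, A2, B1, B2, "
    "C1 to C2. Additionally, please provide trait scores and display the "
    "calculation process for each trait score. The scoring should be based on "
    "the following criteria:"
)

MEASURES = [
    "Moving average type-token ratio.",
    "Number of lexical words (token) divided by the total number of words per essay.",
    "Number of sophisticated word types divided by the total number of words per essay.",
    "Mean length of clause.",
    "Verb phrases per T-unit.",
    "Clauses per T-unit.",
    "Dependent clauses per T-unit.",
    "Complex nominals per clause.",
    "Adverbial clauses per clause.",
    "Coordinate phrases per clause.",
    "Mean dependency distance.",
    "Synonym overlap paragraph (topic and keywords).",
    "Word2vec cosine similarity.",
    "Connectives per essay.",
    "Conjunctions per essay.",
    "Number of metadiscourse markers (types) divided by the total number of words.",
    "Number of errors per essay.",
]


def build_prompt(essay_text, measures=MEASURES):
    """Join the instruction, a numbered list of measures, and the essay text."""
    # Numbering the measures is a presentational choice made for this sketch.
    numbered = "\n".join(f"{i}. {m}" for i, m in enumerate(measures, start=1))
    return f"{INSTRUCTION}\n\n{numbered}\n\nJapanese essay text\n{essay_text}"


essay = "出かける前に二人が地図を見ている間に、サンドイッチを入れたバスケットに犬が入ってしまいました。"
prompt = build_prompt(essay)
print(prompt)
```

Passing a single-item list as `measures` gives the shape of a prompt that names only one measure.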

Statistical analysis

The aim of this study is to investigate the potential use of LLM for nonnative Japanese AES. It seeks to compare the scoring outcomes obtained from feature-based AES tools, which rely on conventional machine learning technology (i.e. Jess, JWriter), with those generated by AI-driven AES tools utilizing deep learning technology (BERT, GPT, OCLL). To assess the reliability of a computer-assisted annotation tool, the study initially established human-human agreement as the benchmark measure. Subsequently, the performance of the LLM-based method was evaluated by comparing it to human-human agreement.

To assess annotation agreement, the study employed standard measures such as precision, recall, and F-score (Brants 2000; Lu 2010), along with the quadratically weighted kappa (QWK) to evaluate the consistency and agreement in the annotation process. Let A and B denote the two human annotators whose annotations are compared. Recall, precision, and F-score are then defined in equations (13) to (15).

${\rm{Recall}}(A,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,A}$

${\rm{Precision}}(A,\,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,B}$

The F-score is the harmonic mean of recall and precision:

${\rm{F}}\text{-}{\rm{score}}=\frac{2\times {\rm{Precision}}\times {\rm{Recall}}}{{\rm{Precision}}+{\rm{Recall}}}$

The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, which occurs if either precision or recall is zero.
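The three definitions above translate directly into code. In this sketch each annotation is treated as a set of nodes, and a node is represented as a hypothetical `(label, start, end)` tuple; that representation and the toy annotations are assumptions made for illustration.

```python
def precision_recall_f(nodes_a, nodes_b):
    """Recall, precision, and F-score of annotator B measured against annotator A."""
    a, b = set(nodes_a), set(nodes_b)
    identical = len(a & b)  # nodes that both annotators marked identically
    recall = identical / len(a) if a else 0.0
    precision = identical / len(b) if b else 0.0
    if precision + recall == 0:
        f_score = 0.0  # by convention when there is no overlap at all
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy annotations, invented for illustration.
annotator_a = {("clause", 0, 5), ("clause", 5, 9), ("T-unit", 0, 9)}
annotator_b = {("clause", 0, 5), ("clause", 5, 6), ("clause", 6, 9), ("T-unit", 0, 9)}

precision, recall, f_score = precision_recall_f(annotator_a, annotator_b)
print(precision, recall, f_score)
```

Here two of the nodes coincide, so recall is 2/3 (relative to A's three nodes) and precision is 2/4 (relative to B's four nodes).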

In accordance with Taghipour and Ng ( 2016 ), the calculation of QWK involves two steps:

Step 1: Construct a weight matrix W as follows:

${W}_{{ij}}=\frac{{(i-j)}^{2}}{{(N-1)}^{2}}$

Here, $i$ represents the annotation made by the tool, while $j$ represents the annotation made by a human rater. $N$ denotes the total number of possible annotations. Matrix $O$ is subsequently computed, where ${O}_{i,j}$ represents the count of data annotated as $i$ by the tool and as $j$ by the human annotator. $E$, on the other hand, refers to the expected count matrix, which is normalized so that the sum of the elements in $E$ matches the sum of the elements in $O$.

Step 2: With matrices O and E, the QWK is obtained as follows:

$K=1-\frac{{\sum }_{i,j}{W}_{i,j}\,{O}_{i,j}}{{\sum }_{i,j}{W}_{i,j}\,{E}_{i,j}}$
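The two steps can be sketched in plain Python as follows. The expected matrix is built here as the outer product of the two raters' score histograms, scaled so that it sums to the same total as the observed matrix, which is the construction described by Taghipour and Ng (2016). Encoding the six CEFR levels as integers 0 to 5 is an assumption of this example, as are the toy ratings.

```python
def quadratic_weighted_kappa(tool_ratings, human_ratings, n_categories):
    """QWK between two lists of integer ratings in the range 0..n_categories-1."""
    if len(tool_ratings) != len(human_ratings) or not tool_ratings:
        raise ValueError("both rating lists must be non-empty and equally long")
    if n_categories < 2:
        raise ValueError("at least two rating categories are required")
    n = n_categories
    if any(not 0 <= r < n for r in list(tool_ratings) + list(human_ratings)):
        raise ValueError("ratings must lie between 0 and n_categories - 1")

    # Observed matrix O: O[i][j] counts items rated i by the tool and j by the human.
    observed = [[0.0] * n for _ in range(n)]
    for i, j in zip(tool_ratings, human_ratings):
        observed[i][j] += 1

    total = len(tool_ratings)
    tool_hist = [sum(observed[i]) for i in range(n)]
    human_hist = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = tool_hist[i] * human_hist[j] / total  # sums to `total`
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - numerator / denominator


# Toy ratings with levels A1..C2 encoded as 0..5 (an assumed encoding).
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 6))
print(quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], 3))
```

The first call shows identical ratings and the second shows fully reversed ratings, the two extremes of the scale.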

The value of the quadratic weighted kappa increases as the level of agreement improves. Further, to assess the accuracy of LLM scoring, the proportional reduction in mean squared error (PRMSE) was employed. The PRMSE approach takes into account the variability observed in human ratings to estimate the rater error, which is then subtracted from the variance of the human labels. This calculation provides an overall measure of agreement between the automated scores and true scores (Haberman et al. 2015; Loukina et al. 2020; Taghipour and Ng, 2016). The computation of PRMSE involves the following steps:

Step 1: Calculate the mean squared errors (MSEs) for the scoring outcomes of the computer-assisted tool (MSE tool) and the human scoring outcomes (MSE human).

Step 2: Determine the PRMSE by comparing the MSE of the computer-assisted tool (MSE tool) with the MSE from human raters (MSE human), using the following formula:

${\rm{PRMSE}}=1-\frac{{\rm{MSE}}\,{\rm{tool}}}{{\rm{MSE}}\,{\rm{human}}}=1-\frac{{\sum }_{i=1}^{n}{({{\rm{y}}}_{i}-{\hat{{\rm{y}}}}_{i})}^{2}}{{\sum }_{i=1}^{n}{({{\rm{y}}}_{i}-\hat{{\rm{y}}})}^{2}}$

In the numerator, ${\hat{y}}_{i}$ represents the scoring outcome predicted by a specific LLM-driven AES system for a given sample. The term ${y}_{i}-{\hat{y}}_{i}$ represents the difference between this predicted outcome and the mean value of all LLM-driven AES systems’ scoring outcomes. It quantifies the deviation of the specific LLM-driven AES system’s prediction from the average prediction of all LLM-driven AES systems. In the denominator, ${y}_{i}-\hat{y}$ represents the difference between the scoring outcome provided by a specific human rater for a given sample and the mean value of all human raters’ scoring outcomes. It measures the discrepancy between the specific human rater’s score and the average score given by all human raters. The PRMSE is then calculated by subtracting the ratio of the MSE tool to the MSE human from 1. PRMSE falls within the range of 0 to 1, with larger values indicating smaller errors in the LLM’s scoring relative to those of human raters. In other words, a higher PRMSE implies that the LLM’s scoring demonstrates greater accuracy in predicting the true scores (Loukina et al. 2020).

The interpretation of kappa values, ranging from −1 to 1, is based on the work of Landis and Koch (1977). Specifically, −1 indicates complete inconsistency, 0 indicates random agreement, 0.0–0.20 indicates an extremely low level of agreement (slight), 0.21–0.40 a low level of agreement (fair), 0.41–0.60 a medium level of agreement (moderate), 0.61–0.80 a high level of agreement (substantial), and 0.81–1 an almost perfect level of agreement. All statistical analyses were executed using Python scripts.
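The ratio in the PRMSE formula and the kappa bands can be sketched as below. The function implements the formula as written, one minus a ratio of two sums of squares; it does not reproduce the rater-error correction discussed in the cited PRMSE literature. Treating the first argument as reference scores and the second as the tool's predictions is an assumption of this example, and the scores are invented.

```python
def prmse(reference_scores, predicted_scores):
    """1 - sum((y_i - yhat_i)^2) / sum((y_i - mean(y))^2)."""
    if len(reference_scores) != len(predicted_scores) or len(reference_scores) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    mean_reference = sum(reference_scores) / len(reference_scores)
    squared_error = sum(
        (y - y_hat) ** 2 for y, y_hat in zip(reference_scores, predicted_scores)
    )
    squared_deviation = sum((y - mean_reference) ** 2 for y in reference_scores)
    if squared_deviation == 0:
        raise ValueError("reference scores must not all be identical")
    return 1 - squared_error / squared_deviation


def landis_koch_label(kappa):
    """Verbal label for a kappa value, following Landis and Koch (1977)."""
    if not -1 <= kappa <= 1:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented scores on a numeric scale, for illustration only.
print(prmse([1, 2, 3, 4], [2, 2, 3, 3]))
print(landis_koch_label(0.695))
```

Note that the ratio can fall below 0 when the predictions are worse than always predicting the mean, so the 0 to 1 range holds only for reasonably accurate predictions.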

Results and discussion

Annotation reliability of the LLM

This section focuses on assessing the reliability of the LLM’s annotation and scoring capabilities. To evaluate the reliability, several tests were conducted simultaneously, aiming to achieve the following objectives:

Assess the LLM’s ability to differentiate between test takers with varying levels of writing proficiency.

Determine the level of agreement between the annotations and scoring performed by the LLM and those done by human raters.

The evaluation of the results encompassed several metrics, including: precision, recall, F-Score, quadratically-weighted kappa, proportional reduction of mean squared error, Pearson correlation, and multi-faceted Rasch measurement.

Inter-annotator agreement (human–human annotator agreement)

We started with an agreement test of the two human annotators. Two trained annotators were recruited to determine the writing task data measures. A total of 714 scripts were used as the test data. Each analysis lasted 300–360 min. Inter-annotator agreement was evaluated using the standard measures of precision, recall, F-score, and QWK. Table 7 presents the inter-annotator agreement for the various indicators. As shown, the inter-annotator agreement was fairly high, with F-scores ranging from 1.0 for sentence and word number to 0.666 for grammatical errors.

The findings from the QWK analysis provided further confirmation of the inter-annotator agreement. The QWK values covered a range from 0.950 ( p = 0.000) for sentence and word number to 0.695 for synonym overlap number (keyword) and grammatical errors ( p = 0.001).

Agreement of annotation outcomes between human and LLM

To evaluate the consistency between human annotators and LLM annotators (BERT, GPT, OCLL) across the indices, the same test was conducted. The results of the inter-annotator agreement (F-score) between LLM and human annotation are provided in Appendix B-D. The F-scores ranged from 0.706 for Grammatical error # for OCLL-human to a perfect 1.000 for GPT-human, for sentences, clauses, T-units, and words. These findings were further supported by the QWK analysis, which showed agreement levels ranging from 0.807 ( p = 0.001) for metadiscourse markers for OCLL-human to 0.962 for words ( p = 0.000) for GPT-human. The findings demonstrated that the LLM annotation achieved a significant level of accuracy in identifying measurement units and counts.

Reliability of LLM-driven AES’s scoring and discriminating proficiency levels

This section examines the reliability of the LLM-driven AES scoring through a comparison of the scoring outcomes produced by human raters and the LLM ( Reliability of LLM-driven AES scoring ). It also assesses the effectiveness of the LLM-based AES system in differentiating participants with varying proficiency levels ( Reliability of LLM-driven AES discriminating proficiency levels ).

Reliability of LLM-driven AES scoring

Table 8 summarizes the QWK coefficient analysis between the scores computed by the human raters and GPT-4 for the individual essays from I-JAS Footnote 7 . As shown, the QWK of all measures ranged from k = 0.819 for lexical density (number of lexical words (tokens)/number of words per essay) to k = 0.644 for word2vec cosine similarity. Table 9 further presents the Pearson correlations between the 16 writing proficiency measures scored by human raters and GPT-4 for the individual essays. The correlations ranged from 0.672 for syntactic complexity to 0.734 for grammatical accuracy. The correlations between the writing proficiency scores assigned by human raters and the BERT-based AES system ranged from 0.661 for syntactic complexity to 0.713 for grammatical accuracy. The correlations between the writing proficiency scores given by human raters and the OCLL-based AES system ranged from 0.654 for cohesion to 0.721 for grammatical accuracy. These findings indicated an alignment between the assessments made by human raters and those of the GPT-4-, BERT-, and OCLL-based AES systems across various aspects of writing proficiency.

Reliability of LLM-driven AES discriminating proficiency levels

After validating the reliability of the LLM’s annotation and scoring, the subsequent objective was to evaluate its ability to distinguish between various proficiency levels. For this analysis, a dataset of 686 individual essays was utilized. Table 10 presents a sample of the results, summarizing the means, standard deviations, and the outcomes of the one-way ANOVAs based on the measures assessed by the GPT-4 model. A post hoc multiple comparison test, specifically the Bonferroni test, was conducted to identify any potential differences between pairs of levels.
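As a rough illustration of this analysis, the sketch below computes the one-way ANOVA F statistic for one measure across proficiency groups and the Bonferroni-adjusted significance level for the pairwise follow-up tests. It stops short of p-values, which would require the F distribution from a statistics library. The group values are invented toy numbers, not data from the study.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


# Toy values of one measure for three proficiency groups (invented).
primary = [1.0, 2.0, 3.0]
intermediate = [2.0, 3.0, 4.0]
advanced = [3.0, 4.0, 5.0]

print(one_way_anova_f([primary, intermediate, advanced]))
print(bonferroni_alpha(0.05, 3))  # three pairwise comparisons among three levels
```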

As the results reveal, seven measures presented linear upward or downward progress across the three proficiency levels. These were marked in bold in Table 10 and comprise one measure of lexical richness, i.e. MATTR (lexical diversity); four measures of syntactic complexity, i.e. MDD (mean dependency distance), MLC (mean length of clause), CNT (complex nominals per T-unit), and CPC (coordinate phrases rate); one cohesion measure, i.e. word2vec cosine similarity; and one measure of grammatical accuracy, i.e. GER (grammatical error rate). Regarding the ability of the sixteen measures to distinguish adjacent proficiency levels, the Bonferroni tests indicated that statistically significant differences exist between the primary level and the intermediate level for MLC and GER. One measure of lexical richness, namely LD, along with four measures of syntactic complexity (VPT, CT, DCT, ACC), two measures of cohesion (SOPT, SOPK), and one measure of content elaboration (IMM), exhibited statistically significant differences between proficiency levels. However, these differences did not demonstrate a linear progression between adjacent proficiency levels. No significant difference was observed in lexical sophistication between proficiency levels.

To summarize, our study aimed to evaluate the reliability and differentiation capabilities of the LLM-driven AES method. For the first objective, we assessed the LLM’s ability to differentiate between test takers with varying levels of writing proficiency using precision, recall, F-score, and quadratically-weighted kappa. Regarding the second objective, we compared the scoring outcomes generated by human raters and the LLM to determine the level of agreement. We employed quadratically-weighted kappa and Pearson correlations to compare the 16 writing proficiency measures for the individual essays. The results confirmed the feasibility of using the LLM for annotation and scoring in AES for nonnative Japanese. As a result, Research Question 1 has been addressed.

Comparison of BERT-, GPT-, OCLL-based AES, and linguistic-feature-based computation methods

This section compares the effectiveness of five AES methods for nonnative Japanese writing: LLM-driven approaches utilizing BERT, GPT, and OCLL, and linguistic feature-based approaches using Jess and JWriter. The comparison was conducted by comparing the ratings obtained from each approach with human ratings. All ratings were derived from the dataset introduced in Section Dataset. To facilitate the comparison, the agreement between the automated methods and human ratings was assessed using QWK and PRMSE. The performance of each approach is summarized in Table 11.

The QWK coefficient values indicate that LLMs (GPT, BERT, OCLL) and human rating outcomes demonstrated higher agreement compared to feature-based AES methods (Jess and JWriter) in assessing writing proficiency criteria, including lexical richness, syntactic complexity, content, and grammatical accuracy. Among the LLMs, the GPT-4 driven AES and human rating outcomes showed the highest agreement in all criteria, except for syntactic complexity. The PRMSE values suggest that the GPT-based method outperformed linguistic feature-based methods and other LLM-based approaches. Moreover, an interesting finding emerged during the study: the agreement coefficient between GPT-4 and human scoring was even higher than the agreement between different human raters themselves. This discovery highlights the advantage of GPT-based AES over human rating. Ratings involve a series of processes, including reading the learners’ writing, evaluating the content and language, and assigning scores. Within this chain of processes, various biases can be introduced, stemming from factors such as rater biases, test design, and rating scales. These biases can impact the consistency and objectivity of human ratings. GPT-based AES may benefit from its ability to apply consistent and objective evaluation criteria. By prompting the GPT model with detailed writing scoring rubrics and linguistic features, potential biases in human ratings can be mitigated. The model follows a predefined set of guidelines and does not possess the same subjective biases that human raters may exhibit. This standardization in the evaluation process contributes to the higher agreement observed between GPT-4 and human scoring. Section Prompt strategy of the study delves further into the role of prompts in the application of LLMs to AES. It explores how the choice and implementation of prompts can impact the performance and reliability of LLM-based AES methods. 
Furthermore, it is important to acknowledge the strengths of the local model, i.e. the Japanese local model OCLL, which excels in processing certain idiomatic expressions. Nevertheless, our analysis indicated that GPT-4 surpasses local models in AES. This superior performance can be attributed to the larger parameter size of GPT-4, estimated to be between 500 billion and 1 trillion, which exceeds the sizes of both BERT and the local model OCLL.

Prompt strategy

In the context of prompt strategy, Mizumoto and Eguchi ( 2023 ) conducted a study where they applied the GPT-3 model to automatically score English essays in the TOEFL test. They found that the accuracy of the GPT model alone was moderate to fair. However, when they incorporated linguistic measures such as cohesion, syntactic complexity, and lexical features alongside the GPT model, the accuracy significantly improved. This highlights the importance of prompt engineering and providing the model with specific instructions to enhance its performance. In this study, a similar approach was taken to optimize the performance of LLMs. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. Model 1 was used as the baseline, representing GPT-4 without any additional prompting. Model 2, on the other hand, involved GPT-4 prompted with 16 measures that included scoring criteria, efficient linguistic features for writing assessment, and detailed measurement units and calculation formulas. The remaining models (Models 3 to 18) utilized GPT-4 prompted with individual measures. The performance of these 18 different models was assessed using the output indicators described in Section Criteria (output indicator) . By comparing the performances of these models, the study aimed to understand the impact of prompt engineering on the accuracy and effectiveness of GPT-4 in AES tasks.

Based on the PRMSE scores presented in Fig. 4 , it was observed that Model 1, representing GPT-4 without any additional prompting, achieved a fair level of performance. However, Model 2, which utilized GPT-4 prompted with all measures, outperformed all other models in terms of PRMSE score, achieving a score of 0.681. These results indicate that the inclusion of specific measures and prompts significantly enhanced the performance of GPT-4 in AES. Among the measures, syntactic complexity was found to play a particularly significant role in improving the accuracy of GPT-4 in assessing writing quality. Following that, lexical diversity emerged as another important factor contributing to the model’s effectiveness. The study suggests that a well-prompted GPT-4 can serve as a valuable tool to support human assessors in evaluating writing quality. By utilizing GPT-4 as an automated scoring tool, the evaluation biases associated with human raters can be minimized. This has the potential to empower teachers by allowing them to focus on designing writing tasks and guiding writing strategies, while leveraging the capabilities of GPT-4 for efficient and reliable scoring.

PRMSE scores of the 18 AES models.

This study aimed to investigate two main research questions: the feasibility of utilizing LLMs for AES and the impact of prompt engineering on the application of LLMs in AES.

To address the first objective, the study compared the effectiveness of five different models: GPT, BERT, the Japanese local LLM (OCLL), and two conventional machine learning-based AES tools (Jess and JWriter). The PRMSE values indicated that the GPT-4-based method outperformed other LLMs (BERT, OCLL) and linguistic feature-based computational methods (Jess and JWriter) across various writing proficiency criteria. Furthermore, the agreement coefficient between GPT-4 and human scoring surpassed the agreement among human raters themselves, highlighting the potential of using the GPT-4 tool to enhance AES by reducing biases and subjectivity, saving time, labor, and cost, and providing valuable feedback for self-study. Regarding the second goal, the role of prompt design was investigated by comparing 18 models, including a baseline model, a model prompted with all measures, and 16 models prompted with one measure at a time. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. The PRMSE scores of the models showed that GPT-4 prompted with all measures achieved the best performance, surpassing the baseline and other models.

In conclusion, this study has demonstrated the potential of LLMs in supporting human rating in assessments. By incorporating automation, we can save time and resources while reducing biases and subjectivity inherent in human rating processes. Automated language assessments offer the advantage of accessibility, providing equal opportunities and economic feasibility for individuals who lack access to traditional assessment centers or necessary resources. LLM-based language assessments provide valuable feedback and support to learners, aiding in the enhancement of their language proficiency and the achievement of their goals. This personalized feedback can cater to individual learner needs, facilitating a more tailored and effective language-learning experience.

There are three important areas that merit further exploration. First, prompt engineering requires attention to ensure optimal performance of LLM-based AES across different language types. This study revealed that GPT-4, when prompted with all measures, outperformed models prompted with fewer measures. Therefore, investigating and refining prompt strategies can enhance the effectiveness of LLMs in automated language assessments. Second, it is crucial to explore the application of LLMs in second-language assessment and learning for oral proficiency, as well as their potential in under-resourced languages. Recent advancements in self-supervised machine learning techniques have significantly improved automatic speech recognition (ASR) systems, opening up new possibilities for creating reliable ASR systems, particularly for under-resourced languages with limited data. However, challenges persist in the field of ASR. First, ASR assumes correct word pronunciation for automatic pronunciation evaluation, which proves challenging for learners in the early stages of language acquisition due to diverse accents influenced by their native languages. Accurately segmenting short words becomes problematic in such cases. Second, developing precise audio-text transcriptions for languages with non-native accented speech poses a formidable task. Last, assessing oral proficiency levels involves capturing various linguistic features, including fluency, pronunciation, accuracy, and complexity, which are not easily captured by current NLP technology.

Data availability

The dataset utilized was obtained from the International Corpus of Japanese as a Second Language (I-JAS). The data URLs: [ https://www2.ninjal.ac.jp/jll/lsaj/ihome2.html ].

J-CAT and TTBJ are two computerized adaptive tests used to assess Japanese language proficiency.

SPOT is a specific component of the TTBJ test.

J-CAT: https://www.j-cat2.org/html/ja/pages/interpret.html

SPOT: https://ttbj.cegloc.tsukuba.ac.jp/p1.html#SPOT .

The study utilized a prompt-based GPT-4 model, developed by OpenAI, which has an impressive architecture with 1.8 trillion parameters across 120 layers. GPT-4 was trained on a vast dataset of 13 trillion tokens, using two stages: initial training on internet text datasets to predict the next token, and subsequent fine-tuning through reinforcement learning from human feedback.

https://www2.ninjal.ac.jp/jll/lsaj/ihome2-en.html

http://jhlee.sakura.ne.jp/JEV/ by Japanese Learning Dictionary Support Group 2015.

We express our sincere gratitude to the reviewer for bringing this matter to our attention.

On February 7, 2023, Microsoft began rolling out a major overhaul to Bing that included a new chatbot feature based on OpenAI’s GPT-4 (Bing.com).

Appendices E and F present the QWK coefficients between the scores assigned by the human raters and those produced by the BERT and OCLL models.
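For readers unfamiliar with the QWK (quadratic weighted kappa) coefficient named in this note, the following is a minimal sketch of the standard formula, not the authors' own implementation. The function name, its parameters, and the integer rating scale are assumptions made for illustration only.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two lists of integer ratings.

    Hypothetical helper for illustration; ratings must lie in
    [min_rating, max_rating].
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k = max_rating - min_rating + 1  # number of rating categories
    if k < 2:
        return 1.0  # a single category leaves no room for disagreement
    n = len(rater_a)

    # Observed agreement matrix: observed[i][j] counts essays rated
    # i by rater A and j by rater B (indices offset by min_rating).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal histograms of each rater's scores.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic penalty
            weighted_observed += weight * observed[i][j]
            # Expected count if the two raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        return 1.0  # both raters used one identical category throughout
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate agreement worse than chance.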

Attali Y, Burstein J (2006) Automated essay scoring with e-rater® V.2. J. Technol., Learn. Assess., 4

Barkaoui K, Hadidi A (2020) Assessing Change in English Second Language Writing Performance (1st ed.). Routledge, New York. https://doi.org/10.4324/9781003092346

Bentz C, Tatyana R, Koplenig A, Tanja S (2016) A comparison between morphological complexity measures: Typological data vs. language corpora. In Proceedings of the workshop on computational linguistics for linguistic complexity (CL4LC), 142–153. Osaka, Japan: The COLING 2016 Organizing Committee

Bond TG, Yan Z, Heene M (2021) Applying the Rasch model: Fundamental measurement in the human sciences (4th ed). Routledge

Brants T (2000) Inter-annotator agreement for a German newspaper corpus. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00), Athens, Greece, 31 May-2 June, European Language Resources Association

Brown TB, Mann B, Ryder N, et al. (2020) Language models are few-shot learners. Advances in Neural Information Processing Systems, Online, 6–12 December, Curran Associates, Inc., Red Hook, NY

Burstein J (2003) The E-rater scoring engine: Automated essay scoring with natural language processing. In Shermis MD and Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Čech R, Miroslav K (2018) Morphological richness of text. In Masako F, Václav C (ed) Taming the corpus: From inflection and lexis to interpretation, 63–77. Cham, Switzerland: Springer Nature

Çöltekin Ç, Taraka, R (2018) Exploiting Universal Dependencies treebanks for measuring morphosyntactic complexity. In Aleksandrs B, Christian B (ed), Proceedings of first workshop on measuring language complexity, 1–7. Torun, Poland

Crossley SA, Cobb T, McNamara DS (2013) Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications. System 41:965–981. https://doi.org/10.1016/j.system.2013.08.002


Crossley SA, McNamara DS (2016) Say more and be more coherent: How text elaboration and cohesion can increase writing quality. J. Writ. Res. 7:351–370

CyberAgent Inc (2023) Open-Calm series of Japanese language models. Retrieved from: https://www.cyberagent.co.jp/news/detail/id=28817

Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, Minnesota, 2–7 June, pp. 4171–4186. Association for Computational Linguistics

Diez-Ortega M, Kyle K (2023) Measuring the development of lexical richness of L2 Spanish: a longitudinal learner corpus study. Studies in Second Language Acquisition 1-31

Eckes T (2009) On common ground? How raters perceive scoring criteria in oral proficiency testing. In Brown A, Hill K (ed) Language testing and evaluation 13: Tasks and criteria in performance assessment (pp. 43–73). Peter Lang Publishing

Elliot S (2003) IntelliMetric: from here to validity. In: Shermis MD, Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ


Engber CA (1995) The relationship of lexical proficiency to the quality of ESL compositions. J. Second Lang. Writ. 4:139–155

Garner J, Crossley SA, Kyle K (2019) N-gram measures and L2 writing proficiency. System 80:176–187. https://doi.org/10.1016/j.system.2018.12.001

Haberman SJ (2008) When can subscores have value? J. Educat. Behav. Stat., 33:204–229

Haberman SJ, Yao L, Sinharay S (2015) Prediction of true test scores from observed item scores and ancillary data. Brit. J. Math. Stat. Psychol. 68:363–385

Halliday MAK (1985) Spoken and Written Language. Deakin University Press, Melbourne, Australia

Hirao R, Arai M, Shimanaka H et al. (2020) Automated essay scoring system for nonnative Japanese learners. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pp. 1250–1257. European Language Resources Association

Hunt KW (1966) Recent Measures in Syntactic Development. Elementary English, 43(7), 732–739. http://www.jstor.org/stable/41386067

Ishioka T (2001) About e-rater, a computer-based automatic scoring system for essays [Konpyūta ni yoru essei no jidō saiten shisutemu e − rater ni tsuite]. University Entrance Examination. Forum [Daigaku nyūshi fōramu] 24:71–76

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput. 9(8):1735–1780

Ishioka T, Kameda M (2006) Automated Japanese essay scoring system based on articles written by experts. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17–18 July 2006, pp. 233-240. Association for Computational Linguistics, USA

Japan Foundation (2021) Retrieved from: https://www.jpf.gp.jp/j/project/japanese/survey/result/dl/survey2021/all.pdf

Jarvis S (2013a) Defining and measuring lexical diversity. In Jarvis S, Daller M (ed) Vocabulary knowledge: Human ratings and automated measures (Vol. 47, pp. 13–44). John Benjamins. https://doi.org/10.1075/sibil.47.03ch1

Jarvis S (2013b) Capturing the diversity in lexical diversity. Lang. Learn. 63:87–106. https://doi.org/10.1111/j.1467-9922.2012.00739.x

Jiang J, Quyang J, Liu H (2019) Interlanguage: A perspective of quantitative linguistic typology. Lang. Sci. 74:85–97

Kim M, Crossley SA, Kyle K (2018) Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. Mod. Lang. J. 102(1):120–141. https://doi.org/10.1111/modl.12447

Kojima T, Gu S, Reid M et al. (2022) Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, New Orleans, LA, 29 November-1 December, Curran Associates, Inc., Red Hook, NY

Kyle K, Crossley SA (2015) Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Q 49:757–786

Kyle K, Crossley SA, Berger CM (2018) The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behav. Res. Methods 50:1030–1046. https://doi.org/10.3758/s13428-017-0924-4


Kyle K, Crossley SA, Jarvis S (2021) Assessing the validity of lexical diversity using direct judgements. Lang. Assess. Q. 18:154–170. https://doi.org/10.1080/15434303.2020.1844205

Landauer TK, Laham D, Foltz PW (2003) Automated essay scoring and annotation of essays with the Intelligent Essay Assessor. In Shermis MD, Burstein JC (ed), Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 159–174

Laufer B, Nation P (1995) Vocabulary size and use: Lexical richness in L2 written production. Appl. Linguist. 16:307–322. https://doi.org/10.1093/applin/16.3.307

Lee J, Hasebe Y (2017) jWriter Learner Text Evaluator, URL: https://jreadability.net/jwriter/

Lee J, Kobayashi N, Sakai T, Sakota K (2015) A Comparison of SPOT and J-CAT Based on Test Analysis [Tesuto bunseki ni motozuku ‘SPOT’ to ‘J-CAT’ no hikaku]. Research on the Acquisition of Second Language Japanese [Dainigengo to shite no nihongo no shūtoku kenkyū] (18) 53–69

Li W, Yan J (2021) Probability distribution of dependency distance based on a Treebank of Japanese EFL Learners’ Interlanguage. J. Quant. Linguist. 28(2):172–186. https://doi.org/10.1080/09296174.2020.1754611

Linacre JM (2002) Optimizing rating scale category effectiveness. J. Appl. Meas. 3(1):85–106


Linacre JM (1994) Constructing measurement with a Many-Facet Rasch Model. In Wilson M (ed) Objective measurement: Theory into practice, Volume 2 (pp. 129–144). Norwood, NJ: Ablex

Liu H (2008) Dependency distance as a metric of language comprehension difficulty. J. Cognitive Sci. 9:159–191

Liu H, Xu C, Liang J (2017) Dependency distance: A new perspective on syntactic patterns in natural languages. Phys. Life Rev. 21. https://doi.org/10.1016/j.plrev.2017.03.002

Loukina A, Madnani N, Cahill A, et al. (2020) Using PRMSE to evaluate automated scoring systems in the presence of label noise. Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, Seattle, WA, USA → Online, 10 July, pp. 18–29. Association for Computational Linguistics

Lu X (2010) Automatic analysis of syntactic complexity in second language writing. Int. J. Corpus Linguist. 15:474–496

Lu X (2012) The relationship of lexical richness to the quality of ESL learners’ oral narratives. Mod. Lang. J. 96:190–208

Lu X (2017) Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Lang. Test. 34:493–511

Lu X, Hu R (2022) Sense-aware lexical sophistication indices and their relationship to second language writing quality. Behav. Res. Method. 54:1444–1460. https://doi.org/10.3758/s13428-021-01675-6

Ministry of Health, Labor, and Welfare of Japan (2022) Retrieved from: https://www.mhlw.go.jp/stf/newpage_30367.html

Mizumoto A, Eguchi M (2023) Exploring the potential of using an AI language model for automated essay scoring. Res. Methods Appl. Linguist. 3:100050

Okgetheng B, Takeuchi K (2024) Estimating Japanese Essay Grading Scores with Large Language Models. Proceedings of 30th Annual Conference of the Language Processing Society in Japan, March 2024

Ortega L (2015) Second language learning explained? SLA across 10 contemporary theories. In VanPatten B, Williams J (ed) Theories in Second Language Acquisition: An Introduction

Rae JW, Borgeaud S, Cai T, et al. (2021) Scaling Language Models: Methods, Analysis & Insights from Training Gopher. ArXiv, abs/2112.11446

Read J (2000) Assessing vocabulary. Cambridge University Press. https://doi.org/10.1017/CBO9780511732942

Rudner LM, Liang T (2002) Automated Essay Scoring Using Bayes’ Theorem. J. Technol., Learning and Assessment, 1 (2)

Sakoda K, Hosoi Y (2020) Accuracy and complexity of Japanese Language usage by SLA learners in different learning environments based on the analysis of I-JAS, a learners’ corpus of Japanese as L2. Math. Linguist. 32(7):403–418. https://doi.org/10.24701/mathling.32.7_403

Suzuki N (1999) Summary of survey results regarding comprehensive essay questions. Final report of “Joint Research on Comprehensive Examinations for the Aim of Evaluating Applicability to Each Specialized Field of Universities” for 1996-2000 [shōronbun sōgō mondai ni kansuru chōsa kekka no gaiyō. Heisei 8 - Heisei 12-nendo daigaku no kaku senmon bun’ya e no tekisei no hyōka o mokuteki to suru sōgō shiken no arikata ni kansuru kyōdō kenkyū’ saishū hōkoku-sho]. University Entrance Examination Section Center Research and Development Department [Daigaku nyūshi sentā kenkyū kaihatsubu], 21–32

Taghipour K, Ng HT (2016) A neural approach to automated essay scoring. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 1–5 November, pp. 1882–1891. Association for Computational Linguistics

Takeuchi K, Ohno M, Motojin K, Taguchi M, Inada Y, Iizuka M, Abo T, Ueda H (2021) Development of essay scoring methods based on reference texts with construction of research-available Japanese essay data. In IPSJ J 62(9):1586–1604

Ure J (1971) Lexical density: A computational technique and some findings. In Coultard M (ed) Talking about Text. English Language Research, University of Birmingham, Birmingham, England

Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, Long Beach, CA, 4–7 December, pp. 5998–6008, Curran Associates, Inc., Red Hook, NY

Watanabe H, Taira Y, Inoue Y (1988) Analysis of essay evaluation data [Shōronbun hyōka dēta no kaiseki]. Bulletin of the Faculty of Education, University of Tokyo [Tōkyōdaigaku kyōiku gakubu kiyō], Vol. 28, 143–164

Yao S, Yu D, Zhao J, et al. (2023) Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36

Zenker F, Kyle K (2021) Investigating minimum text lengths for lexical diversity indices. Assess. Writ. 47:100505. https://doi.org/10.1016/j.asw.2020.100505

Zhang Y, Warstadt A, Li X, et al. (2021) When do you need billions of words of pretraining data? Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, pp. 1112-1125. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.90


This research was funded by National Foundation of Social Sciences (22BYY186) to Wenchao Li.

Author information

Authors and affiliations.

Department of Japanese Studies, Zhejiang University, Hangzhou, China

Department of Linguistics and Applied Linguistics, Zhejiang University, Hangzhou, China


Contributions

Wenchao Li is in charge of conceptualization, validation, formal analysis, investigation, data curation, visualization and writing the draft. Haitao Liu is in charge of supervision.

Corresponding author

Correspondence to Wenchao Li.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material file #1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Li, W., Liu, H. Applying large language models for automated essay scoring for non-native Japanese. Humanit Soc Sci Commun 11, 723 (2024). https://doi.org/10.1057/s41599-024-03209-9

Received: 02 February 2024

Accepted: 16 May 2024

Published: 03 June 2024

DOI: https://doi.org/10.1057/s41599-024-03209-9


Guest Essay

Jamie Raskin: How to Force Justices Alito and Thomas to Recuse Themselves in the Jan. 6 Cases


By Jamie Raskin

Mr. Raskin represents Maryland’s Eighth Congressional District in the House of Representatives. He taught constitutional law for more than 25 years and was the lead prosecutor in the second impeachment trial of Donald Trump.

Many people have gloomily accepted the conventional wisdom that because there is no binding Supreme Court ethics code, there is no way to force Associate Justices Samuel Alito and Clarence Thomas to recuse themselves from the Jan. 6 cases that are before the court.

Justices Alito and Thomas are probably making the same assumption.

But all of them are wrong.

It seems unfathomable that the two justices could get away with deciding for themselves whether they can be impartial in ruling on cases affecting Donald Trump’s liability for crimes he is accused of committing on Jan. 6. Justice Thomas’s wife, Ginni Thomas, was deeply involved in the Jan. 6 “stop the steal” movement. Above the Virginia home of Justice Alito and his wife, Martha-Ann Alito, flew an upside-down American flag — a strong political statement among the people who stormed the Capitol. Above the Alitos’ beach home in New Jersey flew another flag that has been adopted by groups opposed to President Biden.

Justices Alito and Thomas face a groundswell of appeals beseeching them not to participate in Trump v. United States, the case that will decide whether Mr. Trump enjoys absolute immunity from criminal prosecution, and Fischer v. United States, which will decide whether Jan. 6 insurrectionists — and Mr. Trump — can be charged under a statute that criminalizes “corruptly” obstructing an official proceeding. (Justice Alito said on Wednesday that he would not recuse himself from Jan. 6-related cases.)

Everyone assumes that nothing can be done about the recusal situation because the highest court in the land has the lowest ethical standards — no binding ethics code or process outside of personal reflection. Each justice decides for him- or herself whether he or she can be impartial.

Of course, Justices Alito and Thomas could choose to recuse themselves — wouldn’t that be nice? But begging them to do the right thing misses a far more effective course of action.

The U.S. Department of Justice — including the U.S. attorney for the District of Columbia, an appointed U.S. special counsel and the solicitor general, all of whom were involved in different ways in the criminal prosecutions underlying these cases and are opposing Mr. Trump’s constitutional and statutory claims — can petition the other seven justices to require Justices Alito and Thomas to recuse themselves not as a matter of grace but as a matter of law.

The Justice Department and Attorney General Merrick Garland can invoke two powerful textual authorities for this motion: the Constitution of the United States, specifically the due process clause, and the federal statute mandating judicial disqualification for questionable impartiality, 28 U.S.C. Section 455. The Constitution has come into play in several recent Supreme Court decisions striking down rulings by stubborn judges in lower courts whose political impartiality has been reasonably questioned but who threw caution to the wind to hear a case anyway. This statute requires potentially biased judges throughout the federal system to recuse themselves at the start of the process to avoid judicial unfairness and embarrassing controversies and reversals.

The constitutional and statutory standards apply to Supreme Court justices. The Constitution, and the federal laws under it, is the “supreme law of the land,” and the recusal statute explicitly treats Supreme Court justices as it does other judges: “Any justice, judge or magistrate judge of the United States shall disqualify himself in any proceeding in which his impartiality might reasonably be questioned.” The only justices in the federal judiciary are the ones on the Supreme Court.

This recusal statute, if triggered, is not a friendly suggestion. It is Congress’s command, binding on the justices, just as the due process clause is. The Supreme Court cannot disregard this law just because it directly affects one or two of its justices. Ignoring it would trespass on the constitutional separation of powers because the justices would essentially be saying that they have the power to override a congressional command.

When the arguments are properly before the court, Chief Justice John Roberts and Associate Justices Amy Coney Barrett, Neil Gorsuch, Ketanji Brown Jackson, Elena Kagan, Brett Kavanaugh and Sonia Sotomayor will have both a constitutional obligation and a statutory obligation to enforce recusal standards.

Indeed, there is even a compelling argument based on case law that Chief Justice Roberts and the other unaffected justices should raise the matter of recusal on their own, or sua sponte. Numerous circuit courts have agreed with the Eighth Circuit that this is the right course of action when members of an appellate court are aware of “overt acts” of a judge reflecting personal bias. Cases like this stand for the idea that appellate jurists who see something should say something instead of placing all the burden on parties in a case who would have to risk angering a judge by bringing up the awkward matter of potential bias and favoritism on the bench.

But even if no member of the court raises the issue of recusal, the urgent need to deal with it persists. Once it is raised, the court would almost surely have to find that the due process clause and Section 455 compel Justices Alito and Thomas to recuse themselves. To arrive at that substantive conclusion, the justices need only read their court’s own recusal decisions.

In one key 5-to-3 Supreme Court case from 2016, Williams v. Pennsylvania, Justice Anthony Kennedy explained why judicial bias is a defect of constitutional magnitude and offered specific objective standards for identifying it. Significantly, Justices Alito and Thomas dissented from the majority’s ruling.

The case concerned the bias of the chief justice of Pennsylvania, who had been involved as a prosecutor on the state’s side in an appellate death penalty case that was before him. Justice Kennedy found that the judge’s refusal to recuse himself when asked to do so violated due process. Justice Kennedy’s authoritative opinion on recusal illuminates three critical aspects of the current controversy.

First, Justice Kennedy found that the standard for recusal must be objective because it is impossible to rely on the affected judge’s introspection and subjective interpretations. The court’s objective standard requires recusal when the likelihood of bias on the part of the judge “is too high to be constitutionally tolerable,” citing an earlier case. “This objective risk of bias,” according to Justice Kennedy, “is reflected in the due process maxim that ‘no man can be a judge in his own case.’” A judge or justice can be convinced of his or her own impartiality but also completely missing what other people are seeing.

Second, the Williams majority endorsed the American Bar Association’s Model Code of Judicial Conduct as an appropriate articulation of the Madisonian standard that “no man can be a judge in his own cause.” Model Code Rule 2.11 on judicial disqualification says that a judge “shall disqualify himself or herself in any proceeding in which the judge’s impartiality might reasonably be questioned.” This includes, illustratively, cases in which the judge “has a personal bias or prejudice concerning a party,” a married judge knows that “the judge’s spouse” is “a person who has more than a de minimis interest that could be substantially affected by the proceeding” or the judge “has made a public statement, other than in a court proceeding, judicial decision or opinion, that commits or appears to commit the judge to reach a particular result.” These model code illustrations ring a lot of bells at this moment.

Third and most important, Justice Kennedy found for the court that the failure of an objectively biased judge to recuse him- or herself is not “harmless error” just because the biased judge’s vote is not apparently determinative in the vote of a panel of judges. A biased judge contaminates the proceeding not just by the casting and tabulation of his or her own vote but by participating in the body’s collective deliberations and affecting, even subtly, other judges’ perceptions of the case.

Justice Kennedy was emphatic on this point: “It does not matter whether the disqualified judge’s vote was necessary to the disposition of the case. The fact that the interested judge’s vote was not dispositive may mean only that the judge was successful in persuading most members of the court to accept his or her position — an outcome that does not lessen the unfairness to the affected party.”

Courts generally have found that any reasonable doubts about a judge’s partiality must be resolved in favor of recusal. A judge “shall disqualify himself in any proceeding in which his impartiality might reasonably be questioned.” While recognizing that the “challenged judge enjoys a margin of discretion,” the courts have repeatedly held that “doubts ordinarily ought to be resolved in favor of recusal.” After all, the reputation of the whole tribunal and public confidence in the judiciary are both on the line.

Judge David Tatel of the D.C. Circuit emphasized this fundamental principle in 2019 when his court issued a writ of mandamus to force recusal of a military judge who blithely ignored at least the appearance of a glaring conflict of interest. He stated: “Impartial adjudicators are the cornerstone of any system of justice worthy of the label. And because ‘deference to the judgments and rulings of courts depends upon public confidence in the integrity and independence of judges,’ jurists must avoid even the appearance of partiality.” He reminded us that to perform its high function in the best way, as Justice Felix Frankfurter stated, “justice must satisfy the appearance of justice.”

The Supreme Court has been especially disposed to favor recusal when partisan politics appear to be a prejudicial factor even when the judge’s impartiality has not been questioned. In Caperton v. A.T. Massey Coal Co., from 2009, the court held that a state supreme court justice was constitutionally disqualified from a case in which the president of a corporation appearing before him had helped to get him elected by spending $3 million promoting his campaign. The court, through Justice Kennedy, asked whether, quoting a 1975 decision, “under a realistic appraisal of psychological tendencies and human weakness,” the judge’s obvious political alignment with a party in a case “poses such a risk of actual bias or prejudgment that the practice must be forbidden if the guarantee of due process is to be adequately implemented.”

The federal statute on disqualification, Section 455(b), also makes recusal analysis directly applicable to bias imputed to a spouse’s interest in the case. Ms. Thomas and Mrs. Alito (who, according to Justice Alito, is the one who put up the inverted flag outside their home) meet this standard. A judge must recuse him- or herself when a spouse “is known by the judge to have an interest in a case that could be substantially affected by the outcome of the proceeding.”

At his Senate confirmation hearing, Chief Justice Roberts assured America that “judges are like umpires.”

But professional baseball would never allow an umpire to continue to officiate the World Series after learning that the pennant of one of the two teams competing was flying in the front yard of the umpire’s home. Nor would an umpire be allowed to call balls and strikes in a World Series game after the umpire’s wife tried to get the official score of a prior game in the series overthrown and canceled out to benefit the losing team. If judges are like umpires, then they should be treated like umpires, not team owners, fans or players.

Justice Barrett has said she wants to convince people “that this court is not comprised of a bunch of partisan hacks.” Justice Alito himself declared the importance of judicial objectivity in his opinion for the majority in the Dobbs v. Jackson Women’s Health Organization decision overruling Roe v. Wade — a bit of self-praise that now rings especially hollow.

But the Constitution and Congress’s recusal statute provide the objective framework of analysis and remedy for cases of judicial bias that are apparent to the world, even if they may be invisible to the judges involved. This is not really optional for the justices.

I look forward to seeing seven members of the court act to defend the reputation and integrity of the institution.

Jamie Raskin, a Democrat, represents Maryland’s Eighth Congressional District in the House of Representatives. He taught constitutional law for more than 25 years and was the lead prosecutor in the second impeachment trial of Donald Trump.


Jose Miranda Is Writing A Second Act With The Twins

In the fall of 2022, José Miranda began making a name for himself. A top prospect from the Minnesota Twins organization, Miranda enjoyed a strong rookie season as the Twins headed to Yankee Stadium to take on the New York Yankees.

During that series, Miranda made plans to meet up with his cousin Lin-Manuel Miranda, known for writing the Broadway play Hamilton. Lin-Manuel couldn’t make it to a game, but they met for dinner, and Miranda later hit a home run to put an exclamation point on his trip to New York.

Two years later, José has a fitting connection to his cousin. Miranda’s first act with the Twins was a smash hit, but his encore disappointed many. Still, Miranda has found a way to write a second act this season and is re-introducing himself to Minnesota’s long-term plans.

It started in the summer of 2021 when Miranda had one of the best seasons by a Twins prospect in the past 20 years.

Miranda began the year at Double-A Wichita. He crushed baseballs with the Wind Surge, hitting .345/.408/.588 with 13 homers and 36 RBI in 46 games. When the Twins promoted Miranda to Triple-A St. Paul, he continued to rake, hitting .343/.397/.563 with 17 homers and 56 RBI in 80 games with the Saints.

The total production helped Miranda become a consensus top 100 prospect, and he made his major league debut on May 2, 2022. Miranda’s first month in the majors was tough; he only hit .169/.200/.312 with two homers and seven RBI in his first 22 games. However, something clicked in the summer of 2022, and Miranda began to take off.

Miranda hit .332/.382/.536 with 10 homers and 44 RBI over his next 57 games and became a fixture. He rotated between first base, third base, and designated hitter. While his production tailed off in the season’s final months, Miranda still produced a final line of .268/.325/.426 with 15 homers and 66 RBI in 125 games in his rookie season.

Miranda’s rookie year drew rave reviews, and he was ready to produce an encore, changing his diet and dropping 12 pounds over the offseason. The Twins traded Gio Urshela and were prepared to give Miranda the everyday third baseman job until a shoulder injury in Spring Training prevented him from playing for Puerto Rico in the World Baseball Classic.

The injury lingered into the season, and Miranda experienced a sophomore slump. He only hit .211/.263/.303 with three homers and 13 RBI in 40 games. Miranda returned to the field in August, but it was with St. Paul, where he posted a .255/.326/.360 line with three homers and 23 RBI.

At the same time, Miranda was becoming the forgotten prospect in the Twins’ system. Royce Lewis took over third base upon his return from injury in June, and another promising prospect, Alex Kirilloff, manned first. Miranda’s lack of power kept him from being an option down the stretch, and he went home to prepare for the next season.

This is the moment when the main character in a play faces adversity that leads to a big payoff at the end. Alexander Hamilton overcame a challenging upbringing to become the first United States Secretary of the Treasury. Miranda’s cousin used that as the inspiration for one of the highest-grossing Broadway shows ever.

Even The Lion King, the top-grossing Broadway show of all time, had some sort of redemption story that led to an ultimate payoff at the end of the play. While Miranda’s adversity wasn’t as dramatic, he still was entering a crucial point of his career.

Unlike the year before, we didn’t hear much about Miranda’s offseason. Miranda’s shoulder healed, but he still didn’t produce the exit velocity or power from his rookie season during Spring Training. That led to another option to St. Paul at the beginning of the season, but injuries helped him make his way back to Minnesota shortly after the season began.

In 44 games, Twins fans are seeing the player who created so much optimism during his rookie season. Miranda is hitting .280/.311/.469 with six homers and 20 RBI and has carved out consistent playing time, particularly at third base.

The greatest example of Miranda’s resurgence might have been Sunday’s win over the Houston Astros. Hitting in the three-spot in the batting order, Miranda launched a sixth-inning home run to tie the game and played the hero in the top of the eighth, crushing a double that drove in the go-ahead run in a 4-3 victory.

While his defense is still an issue, his bat has returned to his rookie form, keeping him on the major league roster even as Lewis returns from injury this week.

“We’re going to move him around and find ways to get him into the lineup most days,” Twins manager Rocco Baldelli said after the game. “When Royce comes back, he’s going to play, of course, but he’s not going to be out there seven days a week. [He’s been out] a very long time. So he may play a couple of games, get a day [off], play two or three games, get a day.”

That plan seems to leave Miranda in the fold either as a key reserve at third base or a corner-utility infielder with a bat that doesn’t take anything from the lineup. It could lead to Miranda getting a start during this week’s series at Yankee Stadium, and maybe Lin-Manuel will be able to catch his cousin in action.

It would be a fitting chapter in Miranda’s career because he’s producing a second act that even a playwright like his cousin could love.


