
Automatic grading of programming assignments: An approach based on formal semantics

  • College of Information Sciences and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Programming assignment grading can be time-consuming and error-prone if done manually. Existing tools generate feedback with failing test cases. However, this method is inefficient and the results are incomplete. In this paper, we present AutoGrader, a tool that automatically determines the correctness of programming assignments and provides counterexamples given a single reference implementation of the problem. Instead of counting the passed tests, our tool searches for semantically different execution paths between a student's submission and the reference implementation. If such a difference is found, the submission is deemed incorrect; otherwise, it is judged to be a correct solution. We use weakest preconditions and symbolic execution to capture the semantics of execution paths and detect potential path differences. AutoGrader is the first automated grading tool that relies on program semantics and generates feedback with counterexamples based on path deviations. It also reduces human efforts in writing test cases and makes the grading more complete. We implement AutoGrader and test its effectiveness and performance with real-world programming problems and student submissions collected from an online programming site. Our experiment reveals that there are no false negatives using our proposed method and we detected 11 errors of online platform judges.
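As a purely hypothetical C sketch (not an example from the paper) of the kind of divergence such a path-by-path semantic comparison can expose:

    /* Reference implementation: Gregorian leap-year test. */
    int ref_is_leap(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    /* Student submission: omits the divisible-by-400 rule. It agrees with the
       reference on typical inputs (2023, 2024, 1900) but diverges on year = 2000;
       comparing the semantics of the two programs' paths yields that counterexample
       directly, even if no test case in a hand-written suite happens to cover it. */
    int sub_is_leap(int year) {
        return year % 4 == 0 && year % 100 != 0;
    }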


Authors: Xiao Liu, Shuai Wang, Pei Wang, Dinghao Wu

Conference: 41st IEEE/ACM International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET 2019), 25-31 May 2019

Published: May 2019, in Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering Education and Training, ICSE-SEET 2019. Institute of Electrical and Electronics Engineers Inc.

DOI: 10.1109/ICSE-SEET.2019.00022

Scopus record: http://www.scopus.com/inward/record.url?scp=85072117626&partnerID=8YFLogxK


How assignments are graded

Written by Julie Zelenski

We know that you will invest a lot of time into completing the programming assignments and the results of those efforts will be your biggest source of accomplishment (and frustration :-) in this course. We will celebrate and reward your work by counting the assignments for a healthy chunk of your course grade. Therefore, we want you to understand our standards and the grading process we use, so you can be sure to capture all the points you deserve.

We view grading as an important commitment we are making to help you grow as a programmer. Our grading process is designed to thoroughly exercise your program and the grading TA will review your code to provide comprehensive and thoughtful feedback. Our grade report will highlight the strengths of your work, as well as point out the ways in which you can improve. Some might say that our grading is thorough to excess (we are known for poking our nose into every nook and cranny and torturing your code with our battery of tests :-) but the care we take shows our respect and reciprocation for the effort you put into creating it. In order to be able to provide a thorough evaluation for all submissions with a relatively small staff, we have developed automated tools to assist with the mechanical aspects (the infamous "autotester") so as to free up our human expertise for providing qualitative feedback in the code review.

We evaluate each submission in two broad categories: functionality and code quality. For functionality, we assess your program's effectiveness from an external perspective. We are not looking at code, but testing its behavior. For code quality, we apply quality metrics and do an individual review of your code to appraise its design and readability.

How we evaluate functionality

Functionality measures how successfully the program executes on a comprehensive set of test cases. We create our test suite by working from the original program specification to identify a list of expected behaviors and write a test case for each. We use the autotester to run a submission on each test and award points for each successful result. Thus, the resulting functionality score is a direct reflection of how much observably correct behavior your program exhibited. This process is completely automated; the grader does not search your code to find bugs and deduct for them, nor do they appraise the code and award points for tasks that are attempted or close to correct.

Our functionality test suite is a mix of:

  • sanity: verifies basic functionality on simple inputs. These tests are published to you as the sanity check.
  • comprehensive: a battery of tests, each targeted at verifying a specific functional requirement, such as case-insensitive handling or the expected response for an unsuccessful search. Taken together, the comprehensive tests touch on all required functionality. We architect these tests to be as independent and orthogonal as possible to avoid interference between features.
  • robustness: these tests verify the graceful handling of required error conditions and proper treatment of exceptional inputs.
  • stress: the stress cases push the program harder, supplying large, complex inputs and exercising program features in combination. Anything and everything is fair game for stress, but the priority is on mainstream operation with little trafficking in extreme/exceptional conditions.

A typical rubric might dedicate 50% of the functionality points for passing sanity check, 30% to the comprehensive coverage, and 10% each to the robustness and stress components.
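For example, under that split a submission that passes sanity and every comprehensive test but fails all of the robustness and stress cases would still earn roughly 80% of the functionality points (50% + 30%).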

You may ask: does this scheme double-ding some errors? Yes, a program that fails a comprehensive case often fails a stress case due to the same root cause. But this bug was prevalent in both simple and complex scenarios, which means there was more opportunity to observe it and it was more detrimental to the program's overall correctness. In other words, we are ok with applying both deductions and have configured the point values with this in mind. :-) Contrast this with a bug that only triggers in a narrow context and thus fails just one test. The smaller deduction indicates this bug was more obscure, harder to uncover, and only slightly diminished correctness. The total functionality score is not computed by counting up bugs and penalizing them; we award points for observable, correct functionality. A submission could lose a large number of points for several failures all stemming from a single, critical off-by-one error; another submission can lose the same number of points due to a plethora of underlying bugs. Both submissions received the same score because they demonstrated the same amount of "working/not working" when tested.

What about partial credit? Earning partial credit depends on having correctly implemented some observable functionality. Consider two incomplete submissions. One attempts all the required functionality but falls short (imagine it had a tiny flaw in the initial task, such as reading the input file, which causes all output to be wrong even though the rest of the code may have been totally correct). This submission would earn few functionality points. The second attempts only some features, but correctly implements those. This submission can earn all the points for the cases that it handles correctly, despite the broken/missing code for the additional requirements. This strongly suggests a development strategy where you add functionality feature-by-feature, not attempting to implement all the required functionality in one fell swoop. I call this the "always have a working program" strategy. During development, the program will not yet include all the required features, but the code so far works correctly for the features that are attempted. From there, extend the program to handle additional cases to earn additional functionality points, taking care not to break any of the currently working features.

The mechanics of the autotester

Understanding the process we use can help you properly prepare your project for submission and avoid surprises in grading.

Output conformance is required. The autotester is just a fancy version of our sanity check tool. It executes your program and compares its output to what is expected. It is vital that your program produce conformant output in order to be recognized as the exact match the autotester is looking for. If you change the format or leave stray debugging printfs behind, your output may be misread. For example, consider the outputs below:
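Hypothetical sample (data invented for illustration; the real expected format comes from the assignment spec and sample executable):

    Expected output        Program A output       Program B output
    apple 3                apple = 3              3 apple
    banana 7               banana = 7             7 banana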

Although programs A and B seem to compute the appropriate information, they don't match the output format (A adds an extra = character and B inverted the columns). The A and B programs would fail every functionality test-- yikes! To avoid this happening to you, be sure to use sanity check and follow through to resolve any discrepancies. We do not adjust scores that were mis-evaluated because the submission didn't conform to sanity check.

Pass/fail scoring. Each automated test is scored as either passed or failed, without gradations for "more" or "less wrong", as such distinctions are difficult to automate and fraught with peril. Is missing some lines of output better than having the right lines in the wrong order or producing the correct output then crashing? All are scored by the autotester as incorrect for that test.

Timeout. The autotester employs a hard timeout to avoid stalling on potentially infinite loops. The hard timeout is generous but not unbounded (typically 10x the sample). A grossly inefficient program that executes more than an order of magnitude more slowly than the reference runs the risk of losing functionality points due to tripping the hard timeout on tests it was unable to complete in time. We do not re-run tests with ever-increasing timeouts to accommodate these programs.

Grader judgment. Most functionality cases are automatically scored without any involvement from the grader. For robustness cases and other error conditions, the autotester observes whether the program detects the problem, how it reports it to the user, and whether it appropriately handles it. The wording of your error messages is not required to match the sample program, but doing so guarantees full credit from the autotester. When the wording doesn't match, the autotester defers to the grading TA, who makes the call on whether the feedback is sufficiently informative, accurate, and actionable to earn full credit. (Not to stifle your creativity, but the risk-free strategy to avoid losing points is to follow our example :-)

How we evaluate quality

In addition to the automated tests for functionality, we also evaluate how well your program meets our standards for clean, well-written, well-designed code. Although good quality code is highly correlated with correct functionality, the two can diverge, e.g. a well-written program can contain a lurking functionality flaw or a mess of spaghetti code can manage to work correctly despite its design. Make it your goal for your submission to shine in both areas!

We use automated tests for these quality metrics:

Clean compile

We expect a clean build: no warnings, no errors. Any error will block the build, meaning we won't be able to test the program, so build errors absolutely must be addressed before submitting. Warnings are the way the compiler draws attention to a code passage that isn't an outright error but appears suspect. Some warnings are mild/harmless, but others are critically important. If you get in the habit of keeping your code compiling cleanly, you'll never miss a crucial message in a sea of warnings you are casually ignoring. We apply a small deduction if you leave behind unresolved build warnings.
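As a small, hypothetical illustration of a warning worth heeding (assuming the usual warning flags are enabled):

    #include <stdio.h>

    int main(void) {
        long matches = 5;
        /* The compiler warns that "%d" expects an int but receives a long;
           the mismatch is undefined behavior, not merely cosmetic. */
        printf("matches found: %d\n", matches);
        return 0;
    }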

Clean run under valgrind

We look for an error-free, leak-free report from Valgrind. In scoring a Valgrind report, leaks warrant only a minor deduction whereas memory errors are heavily penalized. Anytime you have an error in a Valgrind report, consider it a severe red flag and immediately prioritize investigating and resolving it. Unresolved memory errors can cause all manner of functionality errors due to unpredictable behavior and crashes. Submitting a program with a memory error will not only lose points in the Valgrind-specific test, but runs the risk of failing many other tests that stumble over the memory bug. Leaks, on the other hand, are mostly quite harmless and working to plug them can (and should) be postponed until near the end of your development. Unresolved leaks rarely cause failures outside of the Valgrind-specific test.
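A minimal, hypothetical program showing the difference between the two kinds of findings:

    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *name = malloc(4);
        strcpy(name, "test");     /* writes 5 bytes (text plus '\0') into a 4-byte
                                     block: Valgrind reports an invalid write --
                                     a memory error, heavily penalized */
        free(name);

        char *label = malloc(16);
        strcpy(label, "leaked");  /* allocated but never freed: reported at exit
                                     as "definitely lost" -- a leak, minor deduction */
        return 0;
    }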

Reasonable efficiency

We measure each submission against the benchmark solution and observe whether it performs similarly in terms of both runtime and memory usage. Our assignment rubric typically designates a small number of points for runtime and memory efficiency. A submission earns full credit by being in the same ballpark as the sample program (i.e. "ballpark" translates to within a factor of 2-3). Our sample is written with a straightforward approach and does not pursue aggressive optimization. Your submission can achieve the same results without heroics, and that is what we want to encourage. There is no bonus for outperforming this benchmark and we especially do not want you to sacrifice elegance or complicate the design in the name of efficiency (which will displease the TA during code review). Note that gross inefficiency (beyond 10x) puts you at risk of losing much more than the designated efficiency points due to the hard timeout on the autotester. If your program is in danger of running afoul of the hard timeout, it is a clear sign you need to bring your attention to correcting the inefficiency to avoid losing points for functionality tests that exceed the hard timeout in addition to the regular efficiency deductions.
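For example, if the sample solution finishes a stress test in 2 seconds, a submission completing it within roughly 4-6 seconds is comfortably in the ballpark for full efficiency credit, while one needing more than about 20 seconds would also trip the 10x hard timeout and forfeit that test's functionality points.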

For both the Valgrind and efficiency tests, we try to ignore issues of functional correctness if they don't otherwise interfere with the evaluation. For example, if the program attempts to reassemble and gets the wrong answer, its Valgrind report or runtime can still be evaluated. However, if the program is sufficiently buggy/incomplete (e.g. discards input or crashes), such inconclusive results can lead to loss of points.

Unless otherwise indicated in the problem statement, we do not require recovery, memory deallocation, or other cleanup from fatal errors; you are allowed to simply exit(1).

Code review

The most important part of the quality feedback is the TA's commentary from the code review. The TA will read your code in the role of a team manager giving feedback before accepting the code into the team's repository. Our standards should be familiar from CS106: clear, elegant code that is readable, cleanly designed, well-decomposed, and commented appropriately. Read Nick Parlante's hilarious Landmarks in coding quality. The TA's review will identify notable issues found when reading your code and point out the highlights and opportunities for improvement. The TA also assigns a bucket for the key tasks being evaluated. Most submissions will land in the median bucket (designated [ok]), which means the code is "reasonable". It likely has a number of small issues, but on balance is holding steady in the midst of the peer group and is not notably distinguished up or down. Code that is outstanding we will reward with the [+] bucket, while code that is more troubled will land in the [-] bucket. In rare cases where we need to send an even stronger message, there is a [--] bucket. Your most important takeaways from the code review will come in the detailed feedback given in the embedded comments marking up your code, so that's where you should focus your attention. The bucket serves to confirm that you're meeting expectations [ok] or that you are over[+]/under[-] performing.

Below is a list of code quality expectations in the base rubric for the code review of all assignments. There may be additional entries in a rubric specific to a particular assignment. A short illustrative C fragment follows the lists.

Cleanliness/readability

  • code is free of clutter: remove all dead code and unused vars/fns
  • split long lines if necessary (screenwidth ~120 chars)
  • 2-4 spaces per indent level (use spaces to indent instead of tabs to avoid editor inconsistency)
  • whitespace used to visually support logical separation
  • good naming conventions help avoid need for additional commentary
  • use consistent scheme for capitalization/underscores
  • use constants/#define'd/sizeof instead of hardcoded numbers
  • overview comments (per-function, per-module) with summary highlights
  • inline comments used sparingly where needed to decipher dense/complex lines
  • no excess verbiage that reiterates what code itself already says

Language conventions

  • choose the most clean, direct, conventional syntax available to you, e.g. ptr->field instead of (*ptr).field
  • functionally equivalent but more common to use subscript when accessing an individual array element, pointer arithmetic when accessing subarray
  • avoid unnecessary use of obscure constructs (such as comma operator, unions)
  • bool type from stdbool.h, static qualifier on private functions, const for read-only pointers

Program design

  • program flow decomposed into manageable, logical pieces
  • function interfaces are clean and well-encapsulated
  • appropriate algorithms used, coded cleanly
  • when you need the same lines more than once, don't copy and paste -- unify!
  • use standard library functions (string manipulation, formatted I/O, sort/search, type conversion, etc.) rather than reinventing them

Data structures

  • data structures are well-chosen and appropriate
  • no redundant storage/copying of data
  • no global variables

Pointers/memory

  • no unnecessary levels of indirection in variable/parameter declarations
  • uses specific pointee type whenever possible, void* only where required
  • low-level pointer manipulation/raw memory ops used only when required
  • allocation uses appropriate storage (stack versus heap, based on requirements)
  • allocations are of appropriate size
  • use typecasts only and exactly where necessary and appropriate
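A short, hypothetical fragment (names invented for illustration) that pulls together several of the conventions above: stdbool.h, static on private helpers, const for read-only pointers, -> for field access, and sizeof instead of hard-coded sizes.

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char *name;
        int count;
    } entry;

    /* static keeps this helper private to the module; const documents that the
       entry is read-only here */
    static bool entry_matches(const entry *e, const char *key) {
        return strcmp(e->name, key) == 0;    /* ptr->field, not (*ptr).field */
    }

    static entry *entry_create(const char *name) {
        entry *e = malloc(sizeof(entry));    /* sizeof, not a hard-coded byte count */
        e->name = strdup(name);              /* heap copy so the entry owns its string */
        e->count = 0;
        return e;
    }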

Investing in testing

Succeeding on the functionality tests is a measure of your achievement in implementing the program requirements, but it is also a reflection of your testing efforts. The more thorough you are in testing, the more bugs you can find and fix. High assignment scores are strongly correlated with good testing. I believe that most submissions come in with no known bugs; that is, had you observed the bug, you would have fixed it before submitting. But not everyone puts the same time into finding the bugs in the first place. Testing is an important part of the process. Your efforts to identify the inputs and code paths that need to be exercised for full coverage, and to test and re-test throughout development, can make a huge difference in results. (Read our thoughts on effective software testing for advice on tactics.)

When I create an assignment, I also put together extensive tests we plan to subject your submissions to. Why not give those tests to you up front? Is this stinginess just further evidence of my commitment to student cruelty? Oh, probably :-) But it is also true that your boss/advisor is not going to hand you an exhaustive test suite when assigning you a task. Along with designing and writing the code, a professional engineer is responsible for constructing test inputs, devising strategies, and/or building tools that allow them to verify their work. These skills are not trivial and require practice. Consider the CS107 assignments as chock-full of opportunities for you to gain proficiency in testing.

It is such a bummer to have worked hard on your submission only to get back our grading report with failures from bugs you would have fixed if only you had known about them! This sad outcome is entirely avoidable-- thorough testing allows you to find those bugs, which means you can fix them and earn full points on functionality. That totally rocks!

Interpreting our feedback

A grading report gives a list of the test cases and the pass/fail result from executing each against your program. Each test is identified with a short summary description of what was being tested. If you would like to know more about a particular case, you may ask in email or, better, come to office hours where we can manually re-run the test and walk you through it. Note that we do not give out test cases as part of maintaining a level playing field for future students. For the code review, we return your code marked up with our comments. This individualized feedback on your code quality will highlight opportunities for future improvement and commend your successes. Please take the time to read and reflect on all of the notations made by your grading TA, and use email or office hours to resolve any questions or misunderstandings about the feedback.

Frequently asked questions about assignment grading

How does a code review bucket map to a score? Are assignment grades curved? How do assignments figure into the course grade?

Answers to these and other burning questions can be found in how course grades are determined.

Can I get a regrade? I think the autotester results are wrong.

We have invested much time in our tools to try to ensure they evaluate the submissions correctly and consistently and we are fairly confident in them. But given we're human and it's software, mistakes can happen and it is important to us to correct them. If you believe there is a grading error due to a bug in our tools/tests, please let us know so we will investigate further. If there is a problem, we will be eager to fix it and correct the scores for all affected submissions.

My program worked great on sanity check, but still failed many grading tests. Why?

The information from sanity check is only as good as the test cases being run. The default cases supplied with sanity check are fairly minimal. Passing them is necessary, but not sufficient, to ensure a good outcome. You gain broader coverage by creating your own custom tests. The more comprehensive your own custom tests, the more confidence you can have in the results.

I ran Valgrind myself and got a clean report but my assignment grade docked me for errors/leaks. What gives?

We run Valgrind using one of our larger comprehensive test cases. You should do the same. Getting a clean Valgrind report on a simple test case confirms the correctness of that particular code path, but only that path. If you run Valgrind on a diverse set of code paths, you will be able to additionally confirm the absence/presence of memory issues lurking in those parts and will not be caught unaware by what we find there.

How can I tell what bug caused me to fail a particular test? Can you send me the test input to reproduce it myself?

The grade report includes a summary statement of the objective of each test, but they are admittedly terse. If you're in need of further detail, catch up with us in office hours or send us an email. We can talk you through the test and its result on your submission so you better understand what went wrong. We do not publish or release our test inputs.

If I can't get my program working on the general case, can I earn sanity points by hard-coding my program to produce the expected output for the sanity inputs so that it "passes" sanity?

No. Any submission that deliberately attempts to defraud the results of automated testing will be rejected from grading and receive a 0 score for all tests.

Is it better to submit by due date and earn the on-time bonus or sacrifice the bonus to spend more time testing/polishing?

It depends. If your program has no known problems or only very minor issues, definitely make the on-time submission and enjoy the bonus! You can follow up with additional testing after the fact and re-submit if you find something critical lurking. The on-time bonus is typically a 5% bump, enough to counteract a failure or two on a grading test, but not much more. Best to resolve any substantial issues before submitting, using the free late days to slip up to the hard deadline.

My indentation looked fine in my editor, but misaligned when displayed for code review. What gives?

If your file has mixed tabs and spaces, the expansion of tabs into spaces can change when loaded into an editor/viewer with settings different than your own. You can configure your editor to always use only spaces for indentation as one means to avoid such surprises, or use indent to post-process your final version before submission. We do note wonky indentation in code review and recommend you fix it in the future to improve the viewing experience for your grader, but indentation does not impact the style bucket grading unless grossly inconsistent.

Which TA grades my submissions?

We randomly shuffle grading TAs per assignment. The functionality tests are all autoscored; the grading TA handles any judgment calls and the code review. The TA who reviewed your submission is shown in the header of the grade report. All TAs work from a shared rubric and the head TA does a meta-review for consistent application and calibration across graders.

I don't understand or disagree with my code review. Who can I talk to about this?

You can email the TA who graded your submission to ask for clarification or further explanation of any unclear feedback. To challenge the rubric or its application, you'll need to take it up with me (Julie) :-). Come by my office hours!

Grading Programming Assignments with an Automated Grading and Feedback Assistant

  • Conference paper
  • First Online: 26 July 2022


  • Marcus Messer (ORCID: orcid.org/0000-0001-5915-9153)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13356)

Included in the following conference series:

  • International Conference on Artificial Intelligence in Education


Over the last few years, Computer Science class sizes have increased, resulting in a higher grading workload. Universities often use multiple graders to quickly deliver the grades and associated feedback to manage this workload. While using multiple graders enables the required turnaround times to be achieved, it can come at the cost of consistency and feedback quality. Partially automating the process of grading and feedback could help solve these issues. This project will look into methods to assist in grading and feedback partially subjective elements of programming assignments, such as readability, maintainability, and documentation, to increase the marker’s amount of time to write meaningful feedback. We will investigate machine learning and natural language processing methods to improve grade uniformity and feedback quality in these areas. Furthermore, we will investigate how using these tools may allow instructors to include open-ended requirements that challenge students to use their ideas for possible features in their assignments.



Author information

Authors and Affiliations

King’s College London, London, UK

Marcus Messer


Corresponding author

Correspondence to Marcus Messer.

Editor information

Editors and Affiliations

Ateneo De Manila University, Quezon, Philippines

Maria Mercedes Rodrigo

Department of Computer Science, North Carolina State University, Raleigh, NC, USA

Noburu Matsuda

Durham University, Durham, UK

Alexandra I. Cristea

University of Leeds, Leeds, UK

Vania Dimitrova


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Messer, M. (2022). Grading Programming Assignments with an Automated Grading and Feedback Assistant. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners’ and Doctoral Consortium. AIED 2022. Lecture Notes in Computer Science, vol 13356. Springer, Cham. https://doi.org/10.1007/978-3-031-11647-6_6


DOI: https://doi.org/10.1007/978-3-031-11647-6_6

Published: 26 July 2022

Publisher Name: Springer, Cham

Print ISBN: 978-3-031-11646-9

Online ISBN: 978-3-031-11647-6

eBook Packages: Computer Science, Computer Science (R0)


