Essay on Applying Corpora to the Correction of Noun Errors

Published: 2021/11/08
Number of words: 2944


The emergence of corpora as a complement to conventional teaching methods creates many new opportunities in the teaching of English and other languages. Using corpora as a tool for English instruction creates an environment conducive to a student-centered model of education. It also allows students to be self-driven instead of relying on teachers for instruction. Instead of rigidly transmitting knowledge, teachers guide students to search, observe, analyze, and draw conclusions from the corpus, either individually or in cooperation with others, which awakens their initiative to learn by themselves. Teachers can use corpora to shape how students work and to help them turn disorderly and complicated English input into regular, meaningful language knowledge. Teachers' use of corpora and corpus tools in vocabulary teaching is an increasingly well-developed area of practice (Friginal, 2018).

This study will evaluate how the use of corpora in language study can significantly improve the learning experience. It will also consider how corpora could be developed to make learning more effective and efficient. The research aims to contribute to the existing, though limited, body of research on corpora.


Literature review

A corpus is a collection of naturally occurring texts that represents a particular variety or state of a language. In modern linguistics, a corpus is usually defined as a body of authentic, machine-readable texts that is representative of a larger whole, such as a language or language variety (Lund University, n.d.). "Machine-readable" means that a computer can be used to sift through the texts and look for specific information. Authenticity, on the other hand, is a pivotal element of corpora: it means that the texts are taken from genuine spoken and written language, such as sermons, speeches, meetings, talks, lectures, reports, periodicals, and published books. Representativeness means that the texts should reflect a specific variety of a language.

Corpora are assembled for specific reasons and can therefore serve a variety of purposes. The British National Corpus (BNC), for example, is a multi-purpose corpus of more than 100 million words. One of the central reasons for its development was to create material that reflects modern British English in its various genres and social uses. Approximately 90% of the corpus consists of written British English, and the remaining 10% is spoken material. The corpus is divided into 4,124 documents, each consisting of either transcribed speech or written text, and it covers a wide variety of genres and subject areas.

Various methodological frameworks have been proposed in corpus linguistics to connect data with theory. One of the most prominent is the 3A perspective developed by Wallis and Nelson (2001), which consists of annotation, abstraction, and analysis. In this model, annotation applies a descriptive scheme to the text, abstraction translates the annotated terms into datasets, and analysis manipulates those datasets to answer the user's queries.
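To make the three steps concrete, here is a toy pipeline in Python. It is a minimal sketch under stated assumptions, not Wallis and Nelson's own system: a hypothetical hand-written lexicon, `TAG_LEXICON`, stands in for a real part-of-speech tagger during annotation; abstraction turns every noun into a record of the tags on either side of it; and analysis answers a single query over those records.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TAG_LEXICON = {
    "the": "DET", "a": "DET", "young": "ADJ",
    "woman": "NOUN", "man": "NOUN", "book": "NOUN", "books": "NOUN",
    "reads": "VERB", "writes": "VERB", ".": "PUNCT",
}


def annotate(text):
    """Annotation: attach a tag to every token; unknown words get 'X'."""
    return [(tok, TAG_LEXICON.get(tok, "X")) for tok in text.lower().split()]


def abstract(tagged):
    """Abstraction: turn the annotated text into a dataset with one
    record per noun, noting the tags of its neighbouring tokens."""
    records = []
    for i, (word, tag) in enumerate(tagged):
        if tag == "NOUN":
            records.append({
                "noun": word,
                "prev_tag": tagged[i - 1][1] if i > 0 else None,
                "next_tag": tagged[i + 1][1] if i + 1 < len(tagged) else None,
            })
    return records


def analyse(records, field):
    """Analysis: answer a query over the dataset by counting one field."""
    return Counter(r[field] for r in records)


records = abstract(annotate("The woman reads a book . A young man writes books ."))
print(analyse(records, "prev_tag"))  # which word classes precede nouns?
```

Because abstraction produces an ordinary dataset, the analysis step can ask new questions, for example `analyse(records, "next_tag")`, without repeating the annotation.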

Corpora share many features with other composite text collections, such as varied and complex internal structure. However, all corpora have certain elements in common that define them. One of the most important is the concordance. Concordance lines are the examples that a computer returns when a user runs a search through the corpus interface. They are essential because, besides showing each occurrence of the search term, they show the context in which it is used. The size of that context varies: it may be just one word before or after the node word (the search term), or one or more whole sentences. By widening the context around the node word, a corpus can reveal how the word is used in full. Concordances are produced with a concordancer, a software tool that queries the corpus and retrieves the contexts in which a word occurs. The concordancer displays the node word in each concordance line together with its surrounding context, so that its different uses can be compared (Sealey & Thompson, 2007).
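A concordancer's basic output is commonly shown in key word in context (KWIC) format, with the node word lined up in a central column and its left and right context on either side. The following minimal Python sketch assumes a plain whitespace-tokenized text; the function names `concordance` and `print_kwic` are hypothetical, and the `window` parameter is the number of context words kept on each side.

```python
def concordance(tokens, node, window=3):
    """Return one (left context, node, right context) tuple per hit.

    `window` is the number of tokens kept on each side of the node word.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def print_kwic(lines, width=25):
    """Print KWIC lines with the node words aligned in one column."""
    for left, node, right in lines:
        print(f"{left:>{width}}  {node}  {right}")


text = ("She gave me some useful information about the course . "
        "The information was out of date . "
        "We need more information before we decide .")
print_kwic(concordance(text.split(), "information"))
```

Setting `window=1` gives the one-word context mentioned above; a larger value moves towards sentence-length context.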

Collocation is another fundamental concept in corpus analysis. A collocation is a combination of words that occur together more often than chance would predict; English speakers say strong tea, for example, rather than powerful tea. Over time, speakers of a language conventionalize particular word combinations. The practical consequence is that, although many combinations could in principle describe a given phenomenon or situation, only a few are actually used. Collocation analysis is therefore essential, because it allows us to identify the most frequently used word combinations in a language (Gablasova et al., 2017).
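At its simplest, finding collocates means counting which words occur within a few positions of the node word. The sketch below does exactly that over a toy text; the function name `collocates` and its `span` parameter are hypothetical, and real corpus tools usually also rank collocates with an association measure such as mutual information (MI) rather than by raw counts alone.

```python
from collections import Counter


def collocates(tokens, node, span=2):
    """Count the words that occur within `span` tokens to the left or
    right of each occurrence of `node` (punctuation is skipped)."""
    words = [t.lower() for t in tokens]
    node = node.lower()
    counts = Counter()
    for i, w in enumerate(words):
        if w != node:
            continue
        for j in range(max(0, i - span), min(len(words), i + span + 1)):
            if j != i and words[j].isalpha():
                counts[words[j]] += 1
    return counts


tokens = ("she drinks strong tea every day . strong tea keeps him awake . "
          "they prefer green tea .").split()
print(collocates(tokens, "tea", span=1).most_common(1))  # → [('strong', 2)]
```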

Despite the advantages of using corpora to analyze and evaluate language use, this tool has certain limitations. First, no corpus can contain all knowledge about a language, because the possible combinations of words and sentences are unlimited. Second, some findings are trivial or redundant, and sifting through them reduces efficiency by wasting considerable time. Third, a corpus cannot be used on its own: a native speaker of the language is needed to confirm the results and judge whether they are grammatical.

Sample activity and rationale

This research focuses on senior undergraduates majoring in English and related subjects at universities in China. These students must take the Test for English Majors-Band 8 (TEM-8), a comprehensive English examination for English majors in China. The examination mainly assesses the general English proficiency of senior students and is used to judge whether they have acquired the knowledge of English required by the National College English Teaching Syllabus for English Majors (2004, cited in Yang, 2017). It includes sections on reading, listening, translation, writing, and error correction. The errors targeted in error correction and proofreading fall into two main categories: lexical and syntactic. Lexical errors are errors in choosing the appropriate word for a given context, whereas syntactic errors are violations of basic grammatical rules of sentence structure. Corpora are most useful for correcting students' lexical errors during practice.

The error correction and proofreading section is a significant part of the examination, as it accounts for almost 10% of the entire test.

TEM-8 is held once a year, and its content is updated regularly to keep pace with how English is used around the world. Two of the most widely used corpora are the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). The BNC is a static corpus, whereas COCA is more dynamic, which makes it better suited to students preparing for an exam whose content keeps changing. COCA is a general corpus of contemporary spoken and written English that supports diachronic comparison and is annotated, features that make it well suited to language teaching. Before corpora are used in class, teachers should introduce students to the basics of corpus use.
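Because corpora differ greatly in size, raw frequencies taken from two of them cannot be compared directly. A common remedy is to normalize each count to a rate per million words: normalized frequency = raw count × 1,000,000 / corpus size. The sketch below applies that formula; the function name `per_million`, the counts, and the corpus sizes are hypothetical round numbers chosen for illustration, not actual BNC or COCA figures.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    # Multiply before dividing so that round inputs give exact results.
    return raw_count * 1_000_000 / corpus_size


# Hypothetical counts for the same word in two corpora of different sizes.
smaller = per_million(500, 100_000_000)      # 100-million-word corpus
larger = per_million(1_000, 1_000_000_000)   # 1,000-million-word corpus
print(f"{smaller:.1f} vs {larger:.1f} occurrences per million words")
# → 5.0 vs 1.0 occurrences per million words
```

Here the larger corpus has twice the raw count but one fifth of the normalized frequency, which is why normalized rates, not raw counts, are the fair basis for comparison.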

In one example activity, the teacher divides the class into two groups: one group is given the search pattern ADJ woman and the other ADJ man, where ADJ stands for any adjective. The teacher then prepares questions on the contextual use of these nouns, deliberately including lexical and syntactic errors to test the students. Students use COCA to search for strings that match their pattern, examine how the nouns are used, and correct the errors in the teacher's questions. The teacher should also ask students to find the most frequent contexts in which each noun occurs.
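The core of this activity is a pattern search: find every adjective that immediately precedes a given noun and count how often each one occurs. The sketch below runs that search over a tiny hand-tagged sample rather than COCA; the sample `TAGGED` and the function `adj_before` are hypothetical, and the tags are assumed to be correct.

```python
from collections import Counter

# Hypothetical hand-tagged sample; a real search would use COCA's own tagging.
TAGGED = [
    ("a", "DET"), ("young", "ADJ"), ("woman", "NOUN"), ("met", "VERB"),
    ("an", "DET"), ("old", "ADJ"), ("man", "NOUN"), (".", "PUNCT"),
    ("the", "DET"), ("young", "ADJ"), ("man", "NOUN"), ("helped", "VERB"),
    ("a", "DET"), ("pregnant", "ADJ"), ("woman", "NOUN"), (".", "PUNCT"),
    ("another", "DET"), ("young", "ADJ"), ("woman", "NOUN"), ("left", "VERB"),
]


def adj_before(tagged, noun):
    """Match the pattern 'ADJ <noun>' and count which adjectives fill the ADJ slot."""
    return Counter(
        w1 for (w1, t1), (w2, _) in zip(tagged, tagged[1:])
        if t1 == "ADJ" and w2 == noun
    )


# One group searches "ADJ woman", the other "ADJ man".
for noun in ("woman", "man"):
    print(noun, adj_before(TAGGED, noun).most_common())
```

Comparing the two printed lists side by side mirrors the group discussion: students see which adjectives the two nouns share and which appear with only one of them.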

In another activity that promotes corpus use in language instruction, students guess the most frequent collocates of a word. In this in-class exercise, students guess which words occur alongside a given node noun; the teacher then runs the noun through COCA and awards points for guesses that match the most frequent combinations. The activity helps students use English correctly, because it familiarizes them with frequent word combinations. Its playful nature also promotes learner autonomy by encouraging students to use COCA on their own. The exercise can also be reversed: the teacher gives students the collocates, and they guess the node nouns that most often occur with them.
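One possible way to score the guessing game is to rank the collocates by corpus frequency and give more points for guesses that hit higher-ranked collocates. The scoring rule, the function `score_guesses`, and the counts below are all hypothetical, invented only to illustrate the idea; in class the counts would come from a COCA search.

```python
def score_guesses(collocate_counts, guesses, top_n=5):
    """Award points to guesses found among the top_n collocates: the most
    frequent earns top_n points, the next top_n - 1, and so on; misses earn 0."""
    ranked = sorted(collocate_counts, key=collocate_counts.get, reverse=True)[:top_n]
    points = {word: top_n - rank for rank, word in enumerate(ranked)}
    return {guess: points.get(guess.lower(), 0) for guess in guesses}


# Hypothetical counts of adjectives before the node noun "decision",
# invented for illustration; they are not real COCA figures.
counts = {"final": 120, "important": 95, "difficult": 80,
          "big": 40, "wise": 25, "strong": 10}
print(score_guesses(counts, ["important", "wise", "heavy"]))
# → {'important': 4, 'wise': 1, 'heavy': 0}
```

For the reversed version of the game, the same function works if the dictionary maps candidate node nouns to their frequency with the given collocate.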

A further activity aimed at promoting learner autonomy and independence has the teacher give students certain target words and ask them to use COCA to identify the meanings of these words and consolidate their usage. Students then examine the concordance lines and the language surrounding each node noun. Finally, the teacher gives them true/false questions that test their use of the words. This activity promotes independent learning, trains students to use COCA, and introduces them to concordances, which are essential to working with corpora.

Most of the students are intermediate users of English according to the Common European Framework of Reference for Languages (CEFR), which allows the teacher to set slightly more advanced tasks. Through group work, the teacher can ask students to examine how the given words combine with other words to form sentences (Yusu, 2014). In the activities described above, students discuss their findings in groups, highlighting the most frequent strings in which each word occurs and explaining why those strings are frequent.

Proofreading and error correction questions from the TEM-8 examination can also be combined with corpora as effective study material. Teachers should sort these questions, most of which concern noun usage, and have students analyze them with corpora. Students should be instructed to evaluate critically how each noun is used and which word combinations frequently occur with it. After working on their own, students can discuss their conclusions with classmates and with the teacher, who corrects them where necessary.

Teachers can use COCA for both vocabulary instruction and word comparison (Yusu, 2014). For example, a teacher can prepare questions containing incorrect noun usage and ask students to identify and correct the errors. The items can range from single words to longer statements and passages, which increases the difficulty of the exercises.
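The reasoning behind such correction exercises is that, when two forms compete, the one attested more often in a large corpus is usually the conventional choice. The sketch below compares candidate phrases by their frequency in a tiny stand-in corpus; the helper names `phrase_count` and `suggest` are hypothetical, and a real exercise would query COCA instead.

```python
def phrase_count(tokens, phrase):
    """Count exact, case-insensitive occurrences of a multi-word phrase."""
    target = phrase.lower().split()
    words = [t.lower() for t in tokens]
    n = len(target)
    return sum(words[i:i + n] == target for i in range(len(words) - n + 1))


def suggest(tokens, candidates):
    """Return the candidate attested most often, plus all the counts."""
    counts = {c: phrase_count(tokens, c) for c in candidates}
    return max(counts, key=counts.get), counts


# Tiny stand-in corpus; a classroom version would search COCA instead.
corpus = ("We need more information about the exam . "
          "They gave us much information . "
          "Good advice is hard to find . She asked for some advice .").split()

best, counts = suggest(corpus, ["many informations", "much information"])
print(best, counts)
# → much information {'many informations': 0, 'much information': 1}
```

Frequency is evidence, not proof: a form missing from a small corpus is not necessarily wrong, so a low count is a prompt to check further. When candidates tie, `max` simply returns the first one listed.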


The experiment described above gives an account of the results that can be achieved when corpora are used to teach non-native speakers to communicate in English, especially in their use of nouns. One finding was that, over time, students used the corpus more effectively and needed less help from the instructor. That students learned to use the corpus indicates that corpora can be an effective and efficient way to teach a language to non-native speakers. For effective learning, a corpus has to be easy to use and cover a broad range of vocabulary, so that students can examine many concordance lines. Dynamic corpora are also the most effective, because they allow students to search a wider variety of collocations. The larger the corpus, the more accurately students can identify which collocations are used most frequently, which significantly improves their understanding of common word combinations.

Corpora are also interactive learning tools, because they support a continuous learning process that does not require the instructor to be present. In the experiment, teachers gave students a set of nouns to run through the corpus, asked them to note how frequently different combinations occurred, and had them identify errors in the given words. Such learner-driven instruction instills a sense of independence that enables students to work on their language skills with corpora even when the teacher is absent. Because corpus work is technology-based, it is particularly suitable for younger learners, who tend to learn through technology more quickly than older learners (Breyer, 2011).

One limitation of corpora noted in the literature review is that they cannot accommodate every possible combination of words and therefore cannot be completely effective in teaching language use. However, technological development has made corpora more flexible, adaptable, and dynamic. It has also made it possible to integrate artificial intelligence (AI) into corpora, enabling these systems to collect written texts and transcribe spoken content without human input. As a result, corpus databases can grow rapidly and improve the overall learning experience for students.

Hong (2018) presents a model that uses AI to construct more efficient and effective corpora through a three-step process: mining information, retrieving it, and processing it. Hong's retrieval system uses web crawlers to gather information from the internet and automated tagging technology to index it. The system then applies language processing tools suited to each language to create a language database. AI can also keep a record of users' searches and queries, extract information from them, and feed it back into the corpus so that it grows continuously. AI has already been applied successfully in language learning and instruction, for example in applications such as Grammarly. In language instruction, AI could also integrate different corpora into a shared resource, making learning more efficient and effective.
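Hong's model is not described here in enough detail to reproduce, but its retrieval and indexing idea can be illustrated. In the sketch below, crawling is replaced by a fixed list of sample texts, automated tagging is reduced to recording each word's position, and the names `SOURCES`, `mine`, `build_index`, and `CorpusDB` are all hypothetical. The core structure is an inverted index, which maps each word to the places where it occurs so that searches need not scan every text, plus a query log of the kind that AI-based feedback could draw on.

```python
from collections import defaultdict

# Hypothetical sample sources: (language code, text).
SOURCES = [
    ("en", "The corpus grows every day"),
    ("en", "A corpus is a collection of texts"),
    ("fr", "Le corpus est une collection de textes"),
]


def mine(sources):
    """Stage 1, a stand-in for web crawling: keep the non-empty texts."""
    return [(lang, text) for lang, text in sources if text.strip()]


def build_index(documents):
    """Stage 2: build an inverted index, language -> word -> [(doc_id, position)]."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, (lang, text) in enumerate(documents):
        for pos, token in enumerate(text.lower().split()):
            index[lang][token].append((doc_id, pos))
    return index


class CorpusDB:
    """Stage 3: per-language lookups plus a log of user queries that
    could later be analysed and fed back into corpus building."""

    def __init__(self, index):
        self.index = index
        self.query_log = []

    def search(self, lang, word):
        self.query_log.append((lang, word))
        # .get() avoids creating empty entries for unknown languages or words.
        return self.index.get(lang, {}).get(word.lower(), [])


db = CorpusDB(build_index(mine(SOURCES)))
print(db.search("en", "corpus"))  # → [(0, 1), (1, 1)]
```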

Corpora also make learning more efficient by reducing the time it takes. With user-friendly interfaces and large amounts of stored data, corpora reduce the time students need to look up words and find out how to use them (Birkner, 2015). They also reduce teachers' preparation time, since teachers can quickly look up node words and find a wide variety of word combinations to use in class. Because corpora are built from authentic texts, teachers can also be confident that the examples they present reflect genuine language use.

The use of corpora expands the modes of instruction available to teachers. Corpora support both a corpus-based and a corpus-driven approach, and both are appropriate for language instruction. The corpus-based approach starts from predefined tasks or, in this case, specific words, which students query in the corpus to find out what they mean and how they are used. The corpus-driven approach, by contrast, does not start from any particular words; instead, the corpus itself serves as the primary tool of instruction. The teacher goes straight to the corpus and teaches by identifying statistically frequent word combinations (Birkner, 2015).
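The corpus-driven approach can be illustrated by letting the frequencies themselves suggest what to study. This sketch lists the most frequent adjacent word pairs in a text without starting from any chosen word; the function name `frequent_bigrams` and its thresholds `top` and `min_count` are hypothetical. A corpus-based exercise would instead begin with a given word and examine its contexts.

```python
from collections import Counter


def frequent_bigrams(tokens, top=3, min_count=2):
    """Corpus-driven exploration: rank adjacent word pairs by frequency
    without starting from any chosen search word."""
    words = [t.lower() for t in tokens]
    pairs = Counter(
        (a, b) for a, b in zip(words, words[1:]) if a.isalpha() and b.isalpha()
    )
    return [(" ".join(p), c) for p, c in pairs.most_common(top) if c >= min_count]


tokens = ("It is important to note that it is hard . "
          "It is clear that we need to note this .").split()
print(frequent_bigrams(tokens))  # → [('it is', 3), ('to note', 2)]
```

Pairs that span punctuation are skipped, so combinations across sentence boundaries are not counted.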


Using corpora in language instruction is an effective way to improve learners' overall competence in the language (Samu, 1991). In this study, learners' competence in English increased significantly as they learned to identify node words and produce possible word combinations. Corpora also brought the students' English closer to native usage, because they showed the word combinations that are used most frequently. This kind of training improves learners' fluency and makes their speech and writing easier for native speakers to understand. Learning through corpora also helps students develop self-motivation by encouraging independent study.


Using corpora in language instruction is more effective and efficient than conventional methods that involve only the teacher, the learner, and instructional materials such as books and notes. Corpora are more dynamic and contain far more content than conventional learning tools. As technology advances, corpora will become more efficient, learning students' habits, storing more information, and adapting more quickly. These advances will allow larger corpus databases and searches for more complex word combinations. They will also allow more content to be added to corpora automatically, making the systems more efficient. Greater dynamism will make corpora more adaptable and increase their capabilities. However, further research is needed to establish how corpora can be used more effectively in language instruction.


References

Birkner, V., 2015. Advantages and disadvantages of employing corpus evidence in sociolinguistic studies. The Teacher Magazine, 2(126), pp. 11-16.

Breyer, Y. A., 2011. Corpora in Language Teaching and Learning: Potential, Evaluation, Challenges. s.l.: Peter Lang.

Friginal, E., 2018. Corpus Linguistics for English Teachers: Tools, Online Resources, and Classroom Activities. Routledge.

Gablasova, D., Brezina, V. & McEnery, T., 2017. Collocations in Corpus-Based Language Learning Research: Identifying, Comparing, and Interpreting the Evidence. Language Learning, 67(S1), pp. 155-179.

Lund University, n.d. What is a Corpus? [Online].

NACFLT, 2004. Syllabus for TEM-8. Shanghai: Shanghai Foreign Language.

Samu, W., 1991. The Advantages and Drawbacks of Using Corpus in Translation, s.l.: University of Birmingham.

Sealey, A. & Thompson, P., 2007. Corpus, Concordance, Classification: Young Learners in the L1 Classroom. Language Awareness, 16(3), pp. 208-223.

Wallis, S. & Nelson, G., 2001. Knowledge discovery in grammatically analysed corpora. Data Mining and Knowledge Discovery, 5, pp. 307-340.

Yang, Y., 2017. Test for English majors-band 8 (TEM8) in China. Journal of Language Teaching and Research, 8(6), pp. 1229-1233.

Yusu, X., 2014. On the Application of Corpus of Contemporary American English in Vocabulary Instruction. International Education Studies, 7(8), pp. 68-73.
