Use of corpora in the development of materials for reading comprehension practice

Read the following article, written by two teachers of EST at Universidad Simón Bolívar, Venezuela:

Cartaya de Herrero, N., & Casart Quintero, Y. (2009). Rasgos léxicos del material de instrucción y de las pruebas de logro de comprensión de lectura en un curso de inglés científico y tecnológico. In Anales de la Universidad Metropolitana (Vol. 9, No. 1, pp. 115-132).

and, once you know more about their method of studying texts to improve their students' reading comprehension, try the tools they suggest: Web Vocabprofile, based on the British National Corpus.

Comment on your experience in this space.

8 thoughts on “Use of corpora in the development of materials for reading comprehension practice”

  1. I have tried it with the beginning of the text for practice on the wiki, namely, In-vivo Kinematics of the Cervical Spine in Frontal Sled Tests. It seems rather confusing, but as it includes some comments to help you understand the data, it becomes a bit clearer. After some practice, I think it can become rather useful in EST teaching and learning.

    I included 1331 words and the tool identified 45 keywords. It warns that longer texts should be used for more reliable comparisons.

  2. Activity 2.5: Read the article by Cartaya de Herrero, N., & Casart Quintero, Y. (2009), Rasgos léxicos del material de instrucción y de las pruebas de logro de comprensión de lectura en un curso de inglés científico y tecnológico, and, once you know more about their method of studying texts to improve their students' reading comprehension, try the tools they suggest: Web Vocabprofile, based on the British National Corpus.

    I copied in the text of my answer to Activity 2.3, and the results are: 68.47% of the words belong to the first thousand most frequent English word families, 12.85% belong to the second thousand most frequent word families, 13.86% are semi-specific words and 4.82% are off-list words. The lexical variation is 2.10 (words in the text / different words).
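The figures in the comment above can be reproduced in spirit with a few lines of Python. This is only a minimal sketch, not how Web Vocabprofile itself works: the tool matches word families from BNC-based lists, whereas this sketch matches surface forms against whatever word sets you supply. The function names and the tiny `bands` example are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def tokens_per_type(tokens):
    """Words in the text divided by different words (the ratio quoted above)."""
    if not tokens:
        raise ValueError("empty text")
    return len(tokens) / len(set(tokens))


def band_coverage(tokens, bands):
    """Percentage of tokens falling in each frequency band.

    `bands` maps a band name to a set of words (assumed input, e.g. a
    first-thousand list). A token is counted in the first band that
    contains it, or as 'off-list' if no band does.
    """
    if not tokens:
        raise ValueError("empty text")
    counts = Counter()
    for tok in tokens:
        for name, words in bands.items():
            if tok in words:
                counts[name] += 1
                break
        else:
            counts["off-list"] += 1
    return {name: 100 * counts[name] / len(tokens)
            for name in [*bands, "off-list"]}


if __name__ == "__main__":
    sample = tokenize("The cat saw the dog. The dog saw the cortex.")
    toy_bands = {"K1": {"the", "saw"}, "K2": {"cat", "dog"}}  # toy lists only
    print(tokens_per_type(sample))
    print(band_coverage(sample, toy_bands))
```

A higher words-per-different-word ratio means more repetition; the percentages across all bands plus "off-list" always add up to 100.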

  3. Having read Cartaya and Casart’s (2009) paper, one of their claims in particular (p. 122) comes in handy in the wider frame of an ESP/EAP course: ‘To the extent that a text shows a higher percentage of lexical variation, this would indicate that the text contains a wide range of different words, which would make it more difficult for a learner with a limited vocabulary’. This observation reminded me of Prof Cantos Gómez’s talk at the “VI Seminario de Investigación TIC-ETL”, held at UNED on 13th Dec 2013. The title of his presentation, “Aplicaciones del análisis discriminante en textos: discriminación de temáticas y tipificación de la dificultad lectora” (http://www.canal.uned.es/mmobj/index/id/16527), made it clear that computer algorithms might help determine how appropriate a given text is for a given user context, thus making it easier for the learner to access the information in the text and, in consequence, to decode it. Text and genre classification, however, were far from perfect, especially in cases where the linguistic differences were not large enough for the software to discern key elements.
    Web Vocabprofile seems useful for determining how the BNC’s least frequent words may affect vocabulary learning, but its indexing capacity does not seem as relevant for direct application in the language learning class as, for instance, the acquisition of a specialized vocabulary wordlist.

  4. Hi,
    I have chosen a text from the Twitter account of the course (Grasshoppers update escape response ‘real time’, http://phys.org/news/2014-04-grasshoppers-response-real.html) and the analysis shows that, in a 388-word article, the keywords are:
    (1) 40540.50 grasshopper
    (2) 13513.50 predator
    (3) 783.39 escape
    (4) 614.25 species
    (5) 455.00 sand
    (6) 320.48 professor
    (7) 297.82 grass
    (8) 273.69 behaviour
    (9) 138.25 strategy
    (10) 51.16 land
    (11) 37.18 suggest
    Thanks to this activity I have understood that a keyword is not the word used most often in a text, but the word used most often in comparison with a corpus or any other collection of texts. “Grasshopper” is not a very common word in general texts but it is in this one; for that reason it comes first in the keyword ranking for this text.
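The idea in the comment above, that a keyword is frequent relative to a reference corpus rather than frequent in absolute terms, can be written as a simple ratio of relative frequencies. This is a sketch under that assumption; the exact formula and reference counts the online tool uses are not given here, and all names and numbers below are hypothetical.

```python
def keyness_ratio(text_count, text_size, ref_count, ref_size):
    """How many times more frequent a word is in the text than in the reference.

    Assumed definition: (occurrences per word of text) divided by
    (occurrences per word of reference corpus).
    """
    if text_size <= 0 or ref_size <= 0:
        raise ValueError("corpus sizes must be positive")
    if ref_count <= 0:
        raise ValueError("word must occur in the reference corpus")
    text_rate = text_count / text_size
    ref_rate = ref_count / ref_size
    return text_rate / ref_rate


def rank_keywords(text_counts, text_size, ref_counts, ref_size):
    """Return (word, ratio) pairs sorted from most to least 'key'.

    Words missing from the reference counts are skipped in this sketch.
    """
    scored = [
        (word, keyness_ratio(count, text_size, ref_counts[word], ref_size))
        for word, count in text_counts.items()
        if ref_counts.get(word, 0) > 0
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented counts, purely for illustration.
    text_counts = {"the": 30, "grasshopper": 6}
    ref_counts = {"the": 60000, "grasshopper": 2}
    print(rank_keywords(text_counts, 400, ref_counts, 1_000_000))
```

With these invented counts, "the" occurs five times as often as "grasshopper" in the text yet ranks far lower, because it is also very common in the reference corpus.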

  5. Hello. I have chosen a highly scientific text, Functional organization of human sensorimotor cortex for speech articulation, and pasted 1026 words into the tool. A word like “cortical” is very uncommon in day-to-day conversation or even in a newspaper. It is 107632.00 times more frequent in the text than in the reference corpus. This goes to show how specific and difficult scientific texts are for non-scientists. The Lextutor tool can be useful for students and teachers: by identifying the words they really need in their discipline, students can focus on the most important ones first and learn the less important ones later.
    (1) 107632.00 cortical
    (2) 58708.00 ventral
    (3) 58708.00 cortex
    (4) 58708.00 sensorimotor
    (5) 39139.00 larynx
    (6) 23238.75 electrode
    (7) 13046.33 consonant
    (8) 4403.15 syllable
    (9) 3558.09 constrict
    (10) 3057.75 spatial
    (11) 2935.40 tract
    (12) 2446.17 phonetic
    (13) 1956.93 vowel
    (14) 1111.91 articulate
    (15) 1012.21 vocal
    (16) 603.17 tongue
    (17) 322.57 multiple
    (18) 276.92 contrast
    (19) 210.99 speech
    (20) 190.61 supplement
    (21) 114.39 active
    (22) 107.37 represent
    (23) 100.53 precise
    (24) 91.45 distribute
    (25) 88.39 figure
    (26) 76.64 determine
    (27) 74.74 during
    (28) 72.05 human
    (29) 71.95 locate
    (30) 67.36 pattern
    (31) 46.37 function
    (32) 44.54 whereas
    (33) 39.88 produce
    (34) 38.52 degree
    (35) 36.21 subject
    (36) 34.32 organize
    (37) 34.19 product
    (38) 30.22 individual
    (39) 29.06 single
    (40) 27.40 example
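For anyone who wants to turn a keyword listing like the ones pasted in the comments into a study list, here is a small sketch. It assumes each line has the shape `(rank) ratio word`, as in the listings above; the function names and the cut-off value are hypothetical choices, not something the tool prescribes.

```python
import re

# Matches lines such as "(1) 107632.00 cortical".
KEYWORD_LINE = re.compile(r"\(\s*(\d+)\s*\)\s+(\d+(?:\.\d+)?)\s+(\S+)")


def parse_keywords(output):
    """Parse pasted keyword output into (rank, ratio, word) tuples."""
    rows = []
    for line in output.splitlines():
        match = KEYWORD_LINE.search(line)
        if match:
            rank, ratio, word = match.groups()
            rows.append((int(rank), float(ratio), word))
    return rows


def study_list(rows, min_ratio):
    """Keep the words whose ratio is at least `min_ratio` (an arbitrary cut-off)."""
    return [word for _, ratio, word in rows if ratio >= min_ratio]


if __name__ == "__main__":
    pasted = """
    (1) 107632.00 cortical
    (2) 58708.00 ventral
    (40) 27.40 example
    """
    rows = parse_keywords(pasted)
    print(study_list(rows, min_ratio=1000))
```

Raising or lowering `min_ratio` is one way to separate discipline-specific words to learn first from more general ones that can wait.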
