A paper written for a course on the structure of the English language. It explores Data Driven Learning (using corpus linguistics in language learning).
APLING 629: Structure of the English Language
Data Driven Learning
Prof. Charles Meyer
Apostolos Koutropoulos
5/12/2009
Data Driven Learning Overview
Data driven learning derives from the work of Tim Johns, who suggested that instructors
should use corpora in language learning classrooms (Braun, 2007). Data driven learning is the
"application of tools (concordancers) and techniques from corpus linguistics in the service of language
learning" (Payne, 2008). Its benefit is that the focus is on "the exploitation of
authentic materials even when dealing with tasks such as the acquisition of grammatical structures and
lexical items [...], on real, exploratory tasks and activities rather than traditional «drill & kill» exercises,
[...] on learner-centred activities," and on "the use and exploitation of tools rather than ready-made or
off-the-shelf learnware" (Rüschoff, n.d.).
Data driven learning differs from traditional grammar learning in a few ways. For starters, the
traditional pedagogical approach to teaching grammar is a three-step process: the teacher presents
information, the students practice with this information, and then the
students produce new content. In contrast, in data driven learning students observe a grammatical
phenomenon of the language, hypothesize as to how this phenomenon works, and
then experiment to see if their hypothesis is correct (Payne, 2008).
In traditional language and grammar learning, the teacher is the driver and the students the
passengers, while in data driven learning the teacher is more of a co-pilot and navigator and the
students are able to sit in the driver's seat and take control of their learning. Because of this difference
in the pedagogic approach, the materials used are also different in a data driven learning classroom
compared to traditional classrooms.
In a traditional classroom the main companions to instruction are textbooks. The textbook-based,
traditional approach to grammar learning "divides up grammar in an [sic] system that ignores the
nature of English and of authentic communication using English" (Byrd, 1997). This poses a few
issues, outlined by Byrd (1997): inconsistency in defining what is easy grammar versus hard
grammar, difficulty covering all the material in a given curriculum, and the challenge of using authentic materials in the
language classroom.
Data driven learning addresses some of these issues by relying not on textbooks but on
corpora (a corpus being "a body of text assembled according to explicit design criteria for a specific purpose"
(Payne, 2008)), concordancing programs, and keyword-in-context (KWIC) displays. By using corpora, you are using
authentic text from the target language in your instruction of both that language and its
grammar. Thus you are exposing students to material that they are likely to come across as users of
the language.
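To make the KWIC format concrete, here is a minimal sketch of a concordancer in Python. It is only an illustration of the general technique, not the tool used by any of the sources cited here; the function name kwic, the width parameter, and the sample sentences are all invented for this example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        # Take up to `width` characters on each side of the match
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so the keyword lines up in a central column
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

# Invented sample text standing in for a real corpus (an assumption)
sample = (
    "It was a hectic week at the office. "
    "After a hectic morning we finally sat down for lunch. "
    "The city is hectic but never boring."
)

for line in kwic(sample, "hectic"):
    print(line)
```

Each printed line shows the keyword in a fixed central column with its immediate context on either side, which is the layout students scan when they look for patterns.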
Resource Evaluation
A quick Google search does not turn up many resources on language
teaching with a data driven learning methodology. Looking a little deeper, I found a number of journals,
such as ReCALL, with more information about data driven learning in practice. What I
found interesting was that data driven learning was the focus of experiments in classrooms in Europe,
Asia and the Middle East; however, I did not find any articles where the focus of the experiment was an
ESL classroom in the United States, where corpora are available and primary materials such as video and
audio files are also easily available.
The resources that I found fall into two categories. The first category is the aforementioned
research, in which researchers in EFL and ESL conduct classroom studies to see how data driven
learning approaches work compared to traditional learning methodologies when teaching a certain
topic. The second category, which is much smaller, as noted above, is that of learning resources for
language teachers and examples of data driven learning activities.
Passapong Sripicharn's website is one resource that has materials that are "designed to draw
the learners' attention to certain vocabulary and linguistics features by providing the students with
concordance data and guiding them to make a generalisation" (2005). He does this by first introducing
students to concordancing and the keyword-in-context (KWIC) format in lessons one and two. Subsequent
lessons provide the students with what seem to be handpicked sentences and KWIC examples that are
used to illustrate some feature of the language. Some examples of the types of exercises used are:
deducing the meaning of a word, putting sentences in order so that they make sense, filling in the blanks,
and determining relationships between different grammatical structures. Another resource
is David Lee's list of resources and miscellaneous links for tools and examples of teaching using corpora
(2007).
The examples that Passapong Sripicharn uses do present some issues, because they require
interpretation and the online format does not allow for detailed analysis and synthesis of information by
the student. Each exercise accepts a single interpretation, which may or may not be the only valid one. For example, in Unit 25¹
there is a KWIC concordance with hectic as the keyword. In this example there are five
sentences using hectic, and the students are asked to pick the word closest in meaning to
hectic. If you click on the answer the author intended, you get a pop-up window that says correct;
if you guess wrong, you get a pop-up window that says try again. The available synonym options are
boring, busy, bad, and lively. The answer marked as correct is busy; however, I could also see lively and
bad as viable options. From personal use, I know that I have used hectic to mean bad quite a few
times. If things are busy, I just say that it was a busy week.
1 http://www.geocities.com/tonypgnews/ddl_25.htm
There are also good examples of using collocation to determine the meaning of a word or an
expression. In Unit 30², for instance, Passapong Sripicharn provides KWIC text for the expressions
on the road to and on the brink of. Based on the collocating words that the students observe in these
examples, they are then asked to pick the right expression to fill in the blank in an exercise. The
relationship that we see is that on the road to is followed by something positive and on the brink of is
followed by something negative. Of course this generalisation has limitations, which a teacher must account for, such
as someone using on the road to hell or on the road to disaster, which obviously are not positive.
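The kind of observation students make in this exercise can be sketched as a simple tally of the word that follows each expression. The Python below is only an illustrative sketch: the function name right_collocates and the six sample sentences are invented for this example and are not taken from Passapong Sripicharn's materials.

```python
import re
from collections import Counter

def right_collocates(sentences, phrase):
    """Count the word that immediately follows phrase in each sentence."""
    pattern = re.compile(re.escape(phrase) + r"\s+(\w+)", re.IGNORECASE)
    counts = Counter()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts

# Invented sentences standing in for real concordance lines (an assumption)
sentences = [
    "The patient is on the road to recovery.",
    "The team is on the road to success.",
    "She is well on the road to recovery.",
    "The company was on the brink of collapse.",
    "The two countries were on the brink of war.",
    "He warned that we are on the road to disaster.",
]

for phrase in ("on the road to", "on the brink of"):
    print(phrase, "->", right_collocates(sentences, phrase).most_common())
```

The last sample sentence is included on purpose: a tally like this surfaces counterexamples such as on the road to disaster alongside the dominant pattern, which is exactly the limitation a teacher would need to discuss.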
Usage in my Classroom
The major question about data driven learning is whether or not I would be able to implement it
in a language classroom in which I taught, either an ESL/EFL classroom or a classroom where I would be
teaching Greek. Based on what I found in my searches, both for classroom materials and ideas and
for research articles, I would use data driven learning, but I would not
replace a whole curriculum with it. Instead, I would use data driven learning
techniques either to teach specific topics or as one type of exercise for the students to use in
the process of learning the target language.
Since the "DDL approach suggests that grammar learning should consist largely of
consciousness-raising activities rather than the teaching of rules" (Mansour & Ali Akbar, 2006), I would
have to see where I could best fit such an approach. The other reason why I would not go all
in with data driven learning is that the literature I reviewed indicates that there is no clear-cut
proof that data driven learning is superior to other methods of teaching (Boulton, 2009; Braun,
2007; Mansour & Ali Akbar, 2006).
2 http://www.geocities.com/tonypgnews/ddl_30.htm
One of the factors that I would need to consider before implementing data driven learning
is class size and the resources required for data driven learning activities. If the class is too large,
it may not be possible to conduct data driven learning exercises in class, and they might work better
as homework. For homework, other factors need to be taken into account, such as whether the
students have computers at home, whether they have access to the corpora that you want to use, and whether
they have access to the tools to do a KWIC analysis.
The second factor that I would have to consider is the expectations of the students. The
predominant belief seems to be that grammar learning means learning a set of rules, and if
such an element is lacking in the classroom, the students might feel that they have not learned
anything. Hadley (2002), for instance, reports that "Kerr (1993) found in his survey of 100
teacher trainees that attitudes toward grammar ranged from viewing it as an abstract set of rules, to
expressing feelings of terror. Similar sentiments are found in Chalker (1994), who notes that many
classroom teachers equate grammar with the acquisition of some set of rules -- rules that are at times
contradictory and at other times confusing."
Braun (2007) reports that several students felt that they hadn't learned any grammar
because they did not write down any grammar rules. Braun notes that "such statements reflect
prevailing perceptions about learning: it is still seen as something that happens only if, or as soon as,
something is being written down" (Braun, 2007). I think that in order for data driven learning to gain
acceptance from the students, the teacher needs to do two things. The first is to
explain to the students that experiential learning will not only help them deduce the rules but will also
help them learn how to analyze text when they encounter something they don't know in
a text and don't have someone with them who can explain it. The second
is to provide a summary of what the students have learned at the end of
each exercise and tie it in with other established rules. This way the teacher helps organize the rules
that the students have synthesized through their analysis of corpora, and
the students who feel that they haven't learned anything because they didn't write down any rules can
rest easy because they now have a few rules to write down.
One final factor to decide is whether to use a full corpus, such as the Corpus of
Contemporary American English³, and let the students run their own
concordances, or to filter the material and provide the students with printed concordances
such as the ones on Passapong Sripicharn's (2005) website. I think this would depend on
the level of the students in the classroom, the technology limitations, the classroom makeup, and
whether or not I would want my students to focus on a specific corpus. For instance, I may want to focus
on blog language for a few lessons to illustrate a few points. I could develop a corpus on my own for that
set of lessons and hand it out to students to use with their concordancers. Some corpora, on the other
hand, may require subscriptions, so it may be better to provide printouts of specific concordances.
I think that in the end it comes down to knowing your students. As Hadley (2001) writes about
problems with data driven learning, students can become demotivated if they are given too much
data, and they might not have sufficient material to analyze if they are given
too little. The concordancing materials might also be at a level beyond what the student is comfortable
with, so that students are not scaffolded but are instead asked to leap and hope they can grab on to the
level that the materials are on.
At this point Hadley points out that the teacher is stuck between a rock and a hard place. The
teacher can "simplify the concordance material and lessen its authenticity" (Hadley, 2001), as
Passapong Sripicharn (2005) did for his website, "or maintain the authenticity and risk demotivating
some students because of the difficulty of the material" (Hadley, 2001).
3 http://www.americancorpus.org/
Personally, I think that if curricula incorporate data driven learning from the early stages of language
learning, it is perfectly acceptable to choose simpler and less authentic material. As the students mature, you can
scaffold them onto more challenging and more authentic corpora in data driven learning exercises. In an
environment such as an ESL classroom, where you might find mixed levels of background knowledge
and analytical approaches, data driven learning can still be used. In this instance the teacher needs to
assess each student's skills and prepare material as needed for each student. This way each
student will be working at a level that they feel comfortable with, while at the same time
being scaffolded toward a more advanced level.
Bibliography
Boulton, A. (2009). Testing the limits of data-driven learning: language proficiency and training. (F. Blin, & J. Thompson, Eds.) ReCALL: the journal of EUROCALL, 21 (1), 37-54.
Braun, S. (2007, September 6). Beyond Data-Driven Learning: Learning activities for a spoken multimedia corpus. Retrieved April 30, 2009, from European Youth Language: http://www.um.es/sacodeyl/data/conferences/eurocall2007/Beyond%20Data-Driven%20Learning_eurocall2007_sb.ppt
Braun, S. (2007). Integrating Corpus Work into Secondary Education: From Data-Driven Learning to Needs-Driven Corpora. (F. Blin, & J. Thompson, Eds.) ReCALL: the journal of EUROCALL, 19 (3), 307-328.
Byrd, P. (1997, December). Grammar FROM Context: Re-thinking The Teaching Of Grammar At Various Proficiency Levels. The Language Teacher Online, 21 (12).
Cobb, T., Greaves, C., & Horst, M. (2001). Can the rate of lexical acquisition from reading be increased? An experiment in reading French with a suite of on-line resources. In P. Raymond, & C. Cornaire, Regards sur la didactique des langues secondes. (pp. 133-153). Montreal, QC, Canada: Éditions logique.
Hadley, G. (2001). Concordancing in Japanese TEFL: Unlocking the power of data-driven learning. In K. Gray, M. Ansell, S. Cardew, & M. Leedham, The Japanese Learner: Context, Culture and Classroom Practice (pp. 138-144). Oxford, UK: Oxford University Press.
Hadley, G. (2002). Sensing the Winds of Change: An Introduction to Data-Driven Learning. RELC Journal, 22 (2), 99-124.
Infante, P. (2009, April). Explicit Grammar Instruction: Theory & Research. Retrieved April 30, 2009, from Applied Linguistics Student Association: http://alsaclub.ning.com/forum/attachment/download?id=2643024%3AUploadedFi38%3A1481
Johns, T. (2000, August 1). Retrieved April 30, 2009, from Tim Johns' Data-Driven Learning Page: http://www.ecml.at/projects/voll/our_resources/graz_2002/ddrivenlrning/whatisddl/resources/tim_ddl_learning_page.htm
Lamy, M.-N., Klarskov Mortensen, H. J., & Davies, G. (2009). ICT4LT Module 2.4: Using concordance programs in the Modern Foreign Languages classroom. Retrieved April 30, 2009, from Information and Communication Technology for Language Teachers: http://www.ict4lt.org/en/en_mod2-4.htm
Lee, D. (2007). Teaching & Misc. Links. Retrieved April 30, 2009, from David Lee's Bookmarks for corpus-based linguistics: http://devoted.to/corpora
Mansour, K., & Ali Akbar, J. (2006). Data-driven Learning and Teaching collocation of prepositions: The Case of Iranian EFL Adult Learners. (J. Jung, & P. Robertson, Eds.) Asian EFL Journal, 8 (4), 192-209.
Mukherjee, J. (2005). Data Driven Learning. Retrieved April 30, 2009, from Anglistik Language Centre Giessen: http://www.uni-giessen.de/anglistik/ling/ALC/ddlintro.html
Passapong, S. (2005). My DDL Materials. Retrieved April 30, 2009, from Tony's DDL: http://geocities.com/tonypgnews/units_index_pilot.htm
Payne, J. S. (2008, June 8). Data-Driven South Asian Language Learning. Retrieved April 30, 2009, from The University of Chicago South Asian Language Resource Center: http://salrc.uchicago.edu/workshops/sponsored/061005/DDL.ppt
Rüschoff, B. (n.d.). Data-Driven Learning (DDL): the idea. Retrieved April 30, 2009, from http://www.ecml.at/projects/voll/rationale_and_help/booklets/resources/menu_booklet_ddl.htm
Tian, S. (2005). Data-Driven Learning: Do Learning Task and Proficiency Make a Difference? Proceedings of the 9th Conference of the Pan-Pacific Association of Applied Linguistics (pp. 360-372). Tokyo, Japan: Waseda University Media Mix Corp.
Truscott, J. (1998). Noticing in second language acquisition: A critical review. Second Language Research, 14 (2), 103-135.
Tuttle, H. (2009, April 25). Empowering Teachers to be Data Driven Decision Makers. Retrieved April 30, 2009, from archive.techlearning.com/techlearning/pdf/events/techforum/chi05/Tuttle_Empowering_D3M_TFCH05.pdf