Corpora: English Corpora: A – E
ACE | Bank of English | BNC | BROWN | CEECS | CHRISTINE | CIC | CLC | COLT | CPSA | ECI
ACE – Australian Corpus of English
Org: Macquarie University, Sydney, Australia
Time: 1986
Size: 1 million words (500 samples of 2,000 words each)
Contents: written and spoken, multigeneric (15 different genres)
Access: available on the ICAME CD-ROM
Notes: modelled on BROWN and LOB for linguistic research
Bank of English – Cobuild
Org: Cobuild and the University of Birmingham, UK (John Sinclair)
Time: majority of the material originates after 1990
Size: 415 million words in Oct 2000 (still growing)
Contents: written and spoken; multigeneric
Access: CobuildDirect allows restricted access to the corpus through a Java or telnet interface; restricted concordance and collocation queries are possible
Notes:
BNC – British National Corpus
Org: an industrial/academic consortium led by Oxford University Press
Time: completed in 1994; first release in 1995; second release in 2001
Size: over 100 million words (4,125 texts)
Contents: multigeneric; 90% written and 10% spoken materials
Access: licensed; a guest account is available via the SARA client at the BNC Online Service, and simple searches can be run at the BNC
Notes: SGML markup according to the TEI guidelines; POS tagging carried out with CLAWS
BROWN University Corpus
Org: Brown University, Rhode Island, U.S.
Time: 1960s
Size: ca. 1 million words
Contents: American written English; 500 text samples of approximately 2,000 words distributed over 15 text categories
Access: available on the ICAME CD-ROM
Notes:
CEECS – Corpus of Early English Correspondence Sampler
Org: University of Helsinki, Finland
Time: 1418-1680
Size: approx. 450,000 words
Contents: see the online list of included texts
Access: available on the ICAME CD-ROM
Notes: represents the non-copyrighted materials included in the Corpus of Early English Correspondence
CHRISTINE Corpus
Org: Geoffrey Sampson, University of Essex, UK
Time: first distributed in August 2000
Size:
Contents: spoken English, particularly spontaneous, informal spoken English
Access: freely available for download
Notes: see also SUSANNE
CIC – Cambridge International Corpus
Org: Cambridge University Press
Time: ongoing
Size: 300 million words and expanding
Contents: multigeneric; written and spoken British and American materials, learners’ English
Access: “Currently, it can only be used by authors and writers working for Cambridge University Press and by members of staff at UCLES.”
Notes: “Authors, editors and lexicographers use the CIC […] when they are working on books for Cambridge University Press.”
CLC – Cambridge Learner Corpus
Org: Cambridge University Press and UCLES
Time: ongoing
Size: 10 million words and expanding
Contents: anonymised exam scripts written by students taking UCLES English exams around the world
Access: “Currently, it can only be used by authors and writers working for Cambridge University Press and by members of staff at UCLES.”
Notes: forms part of the Cambridge International Corpus
COLT – Bergen Corpus of London Teenage Language
Org: University of Bergen, Norway
Time: material collected in 1993
Size: 500,000 words; the pilot version consists of 151 texts
Contents: transcripts of spoken ‘London Teenage Language’
Access: the pilot version can be searched online; registered users can search the entire corpus online; COLT is also available on the ICAME CD-ROM
Notes: COLT is part of the BNC; it is tagged for word classes
CPSA – Corpus of Spoken Professional American English
Org: contact: Michael Barlow
Time: 1994-1998
Size: 2 main sub-corpora of 1 million words each
Contents: short interchanges by 400 speakers; professional activities broadly tied to academics and politics
Access: registered users only ($79 for an individual user of the tagged version)
Notes: available both tagged and untagged; the tagging was performed by Tony McEnery and Paul Baker using the CLAWS programme at UCREL, Lancaster University
ECI Corpus
Org: ELSNET
Time: materials collected between 1984 and 1993
Size: four different corpora ranging from 4 to 34 million words
Contents: German, French and Dutch newspaper texts; parallel texts in English, Spanish and French
Access: available on CD-ROM for €50, for research purposes only
Notes: