  • Faculty of Humanities
  • Department of Linguistics, Literary and Aestetic Studies
Guarding the Guardians: Rating scale and rater training effects on reliability and validity of scores of an oral test of Norwegian as a second language

Carlsen, Cecilie
Doctoral thesis
Thesis_Cecilie Carlsen.pdf (1011.Kb)
Permanent link
https://hdl.handle.net/1956/2085
Date of publication
2004-01-16
Collections
  • Department of Linguistics, Literary and Aestetic Studies
Abstract
This thesis focuses on the scoring of a national test of Norwegian as a second language: Språkprøven i norsk for voksne innvandrere, developed by Norsk språktest at the University of Bergen. To ensure a fair assessment of the candidates’ oral production, the test constructors use trained raters who base their scores on an explicit rating scale (NORS). These two highly recommended procedures in performance testing have traditionally been viewed as means of heightening the reliability of test scores. In line with recent developments in the field of language testing, I argue that the rater variable affects not only reliability but the very construct validity of test scores. Rater training and the development of rating scales are costly and time-consuming enterprises. Establishing their effect on test scores is therefore of interest from a test-theoretical as well as a practical and economic point of view. In the study, four groups of informants are compared: non-linguists (or naïve native speakers), teachers of Norwegian as a second language without rater training, raters of Språkprøven, and finally a subgroup of the most experienced raters of Språkprøven. The informants score eight candidates’ video-recorded performances on a six-point scale. The first four performances are scored impressionistically, and the next four using the NORS. The quantitative data are used in an investigation of internal agreement (inter-rater reliability) between raters within the distinct groups. Informants are also asked to give written reports of their scores, which are used in an investigation of raters’ underlying criteria for assessing speech. The qualitative data are used first in an attempt to explain the results of the reliability study, and then in an investigation of the match between raters’ criteria and the criteria of the NORS (construct validity).
The results reveal differences between groups in the scores they give, as well as in the reasons for those scores. One important conclusion echoes the claim that “quantitative similarities in ratings may mask significant qualitative differences in the reasons for those ratings” (Connor-Linton 1995: 99).
Publisher
The University of Bergen
Copyright
All rights reserved
Cecilie Carlsen
