measuring reliability and validity in human coding and machine classification
DESCRIPTION
Slides delivered as part of #CAQDAS14. In 1989 the Department of Sociology at the University of Surrey convened the world's first conference on qualitative software, which brought together qualitative methodologists and software developers who debated the pros and cons of the use of technology for qualitative data analysis. The result was a book (Fielding & Lee (1991) Using Computers in Qualitative Research, Sage Publications), the setting-up of the CAQDAS Networking Project and many other conferences concerning the topics over the years. This conference will be another opportunity for methodologists, developers and researchers to come together and debate the issues. There will be keynote papers by leading experts in the field, software support clinics and opportunities to present work in progress. http://www.surrey.ac.uk/sociology/files/Programme%20.pdf
TRANSCRIPT
Measuring Reliability and Validity in Human Coding and Machine Classification
Dr. Stuart Shulman
May 2, 2014 CAQDAS Conference 2014
“…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971
Acknowledgements
• This research has been supported by grants from the National Science Foundation (NSF) and was supplemented through interagency agreements between the US Environmental Protection Agency, the US Fish & Wildlife Service, and the NSF.
– EIA 0089892 (2001-2002): “SGER Citizen Agenda-Setting in the Regulatory Process: Electronic Collection and Synthesis of Public Commentary”
– EIA 0327979 (2003-2004): “SGER Collaborative: A Testbed for eRulemaking Data”
– SES 0322662 (2003-2005): “Democracy and E-Rulemaking: Comparing Traditional vs. Electronic Comment from a Discursive Democratic Framework”
– IIS 0429293 (2004-2007): “Collaborative Research: Language Processing Technology for Electronic Rulemaking”
– SES-0620673 (2007): “Coding across the Disciplines: A Project-Based Workshop on Manual Text Annotation Techniques”
– IIS-0705566 (2007-2010): “Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis”
• Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation.
An Incredibly Important Book
Qualitative Methods: Genes, Taste, or Tactic?
• Qualitative by birth or choice?
– Some look to words as an alternative to number crunching
– Others are rooted in rich and meaningful interpretive traditions
• Another group is fluent in both qual & quant
– Mixed methods open up rather than limit fields of knowledge
• One central goal is valid inferences about phenomena
– Replicable and transparent methods
– Attention to error and corrective measures
– Internal and external validation of results
• Using computers for qualitative data analysis helps, but…
– Rigor still originates with the research design, not the technology
– Software makes better organization and efficiency possible
– Coders enable the researcher to step back while scaling up
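For human coders, one way to make replicable methods and attention to error concrete is a chance-corrected agreement statistic. The slides do not name a specific measure, so the following is only a minimal sketch using Cohen's kappa for two coders. The function name `cohens_kappa` and the example labels are illustrative assumptions, not part of the talk.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items.

    Illustrative helper (an assumption, not from the slides).
    coder_a and coder_b are equal-length sequences of category labels.
    """
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty set of items")

    n = len(coder_a)

    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Expected agreement by chance, from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    # If both coders used a single identical label, kappa is undefined;
    # this sketch reports 1.0 because the coders agree on every item.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical example: two coders label four comments as pro or con.
labels_a = ["pro", "pro", "con", "con"]
labels_b = ["pro", "con", "con", "con"]
kappa = cohens_kappa(labels_a, labels_b)
```

A value of 1 means complete agreement and a value near 0 means agreement no better than chance. Projects with more than two coders or with missing labels would need a different statistic, such as Krippendorff's alpha.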
Purist Pluralist Positivist
A spectrum of approaches to working with qualitative data
Different types of knowledge claims depending on where you sit
• Purist: deep immersion, closeness to data, antipathy to numbers, credible interpretation, in-depth analysis, contextual, subjective
• Pluralist: experimental, mixed method, adaptive, hybrid, flexible approach, interdisciplinary
• Positivist: quantitative, focus on error, measurement, critical, validity and reliability, replication & objectivity, generalization, hypotheses
These choices are philosophical, ideological, political and ethical
Emergent properties found in very well-read texts, such as the character type “extremist agent of the law”
Agenda-setting in the press
Relations between Classes
Rates and Terms for Credit
Farm Profitability
Cost of Living
Soil Fertility
Education
Exploration Speculation Coding
Validation
Skip Ahead 10 Years: Display Ideas Using IR & NLP Techniques
• Information Retrieval (IR)
– Search and cluster topics and cross-correlate by stakeholders
• Natural Language Processing (NLP)
– Grouped by opinion and writer type
[Chart: Con vs. Pro; axis ticks from 5,000 to 25,000]
Par 2.2(a1)
• Con:
– 150, 818: “impossible to maintain”
– 272: “too expensive for elderly”
• Pro:
– 169, 213, 391, 392, 394: “already being done in Alaska”
– 18: “extend to children”
Stuart W. Shulman. 2003. "An Experiment in Digital Government at the United States National Organic Program," Agriculture and Human Values 20(3), 253-265.
Coding Web Sites and Focus Groups to Study Agenda-Setting
Annotation to Improve Optical Character Recognition
Over 13,000 hours of video and audio were recorded in the public spaces of a long-term care (LTC) facility’s dementia unit in suburban Pittsburgh, PA. A codebook of 80+ codes was developed to categorize the behavior of the consenting residents and staff (only in relation to patients). 22 coders spent more than 4,400 hours over a period of 22 months coding the video data. The data were coded using the Informedia Digital Video Library (IDVL), an interface designed by computer scientists at Carnegie Mellon University.
http://cat.ucsur.pitt.edu
Dr. Stuart W. Shulman
Founder & CEO, Texifter, LLC
Research Associate Professor, Department of Political Science, University of Massachusetts Amherst
Director, Qualitative Data Analysis Program (QDAP)
Associate Director, National Center for Digital Government
Editor Emeritus, Journal of Information Technology & Politics
[email protected]
http://people.umass.edu/stu/
@stuartwshulman