
Latent Dimensions of Religion and Spirituality: A Longitudinal Correlated Topic Model

Seong-Hyeon (Sung) Kim1, Nathaniel R. Strenger2, & Narae Lee1

1Fuller Graduate School of Psychology, Pasadena, California, USA

2Pastoral Counseling Center, Dallas, Texas, USA


Overview

• Religion & Spirituality (R/S)

• Research Questions

• Topic models

• Automated text analysis

• Topics: Latent dimensions of text

• Topic proportions as compositional data

• Ternary diagrams

• Topic correlations


Religion & Spirituality (R/S)

• Definitions

• Religion: “the search for significance that occurs within the context of established institutions that are designed to facilitate spirituality” (Pargament et al., 2013, p. 15).

• Spirituality: “the search for the sacred” (Pargament et al., 2013, p. 14).

Pargament, K. I., Mahoney, A., Exline, J. J., Jones, J. W., & Shafranske, E. P. (2013). Envisioning an integrative paradigm for the psychology of religion and spirituality. In K. I. Pargament, J. J. Exline, & J. W. Jones (Eds.), APA handbook of psychology, religion, and spirituality (Vol. 1): Context, theory, and research (pp. 3–19). Washington, DC: American Psychological Association. https://doi.org/10.1037/14045-001


Religion & Spirituality (R/S)

• Gorsuch (1984) introduced factor analysis as a tool to investigate the dimensions of R/S.

• He criticized the oversupply of R/S measures.

• Our research introduces topic modeling as a tool to identify the fundamental dimensions, or building blocks, of R/S that have been conceptualized in the R/S measures.

Gorsuch, R. L. (1984). Measurement: The boon and bane of investigating religion. American Psychologist, 39(3), 228–236. https://doi.org/10.1037/0003-066X.39.3.228


Automated Text Analysis

• Quantitative (NOT qualitative) text analysis

• Three Different Types

1. Dictionary method: Pre-defined set of categories

2. Supervised learning: Outcome categories known (e.g., spam mail sorting)

3. Unsupervised learning: e.g., topic modeling (outcome categories unknown)
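To make the contrast concrete, the dictionary method (type 1) can be sketched as counting tokens that fall into pre-defined categories. The two categories and their word lists below are hypothetical, for illustration only; they are not a published R/S dictionary.

```python
from collections import Counter

# Hypothetical two-category dictionary, for illustration only.
DICTIONARY = {
    "religion": {"church", "denomination", "worship"},
    "spirituality": {"sacred", "transcendent", "spiritual"},
}


def dictionary_scores(tokens):
    """Count how many tokens fall into each pre-defined category."""
    counts = Counter()
    for token in tokens:
        for category, words in DICTIONARY.items():
            if token in words:
                counts[category] += 1
    return dict(counts)


print(dictionary_scores(["i", "attend", "church", "feel", "transcendent", "worship"]))
# prints {'religion': 2, 'spirituality': 1}
```

Unlike topic modeling, nothing here is learned from the data: the categories must be known in advance.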


Topic Modeling

• Identify topics, the latent dimensions, in the text data

• Machine (statistical) learning + computer science + statistics

• Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003): Basic and popular, but does not allow topic correlations

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.


TASA Corpus: 37,000 Texts & 300 Topics

Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 424–440). Hillsdale, NJ: Erlbaum.


Example: Steyvers & Griffiths (2007)

• 2 topics

• Each topic gives approximately equal probability to three words:

• Topic 1: “money,” “loan,” and “bank”

• Topic 2: “river,” “stream,” and “bank”

• 16 documents were created by arbitrarily mixing the two topics

• Let’s analyze this collection of documents with LDA (Blei et al., 2003)
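The way such a collection can be produced is sketched below as the generative process LDA assumes: each document gets a mixture of the two topics, and each token first picks a topic, then a word from that topic. The document length of 16 tokens and the uniform draw of each document's Topic 1 weight are assumptions for this sketch, not details taken from the slide.

```python
import numpy as np

VOCAB = ["money", "loan", "bank", "river", "stream"]
# Each topic puts equal probability (1/3) on its three words, as in the example.
PHI = np.array([
    [1 / 3, 1 / 3, 1 / 3, 0.0, 0.0],  # Topic 1: money, loan, bank
    [0.0, 0.0, 1 / 3, 1 / 3, 1 / 3],  # Topic 2: bank, river, stream
])


def generate_documents(n_docs=16, doc_length=16, seed=0):
    """Generate documents by mixing the two topics (assumed mixing scheme)."""
    rng = np.random.default_rng(seed)
    docs, mixtures = [], []
    for _ in range(n_docs):
        p1 = rng.uniform()                  # assumption: Topic 1 weight ~ Uniform(0, 1)
        theta = np.array([p1, 1.0 - p1])    # this document's topic proportions
        topics = rng.choice(2, size=doc_length, p=theta)          # topic per token
        words = [VOCAB[rng.choice(len(VOCAB), p=PHI[z])] for z in topics]
        docs.append(words)
        mixtures.append(theta)
    return docs, np.array(mixtures)


docs, mixtures = generate_documents()
```

Fitting LDA to `docs` should approximately recover `PHI` (the term distributions) and `mixtures` (the topic distributions), although the topic labels may come out in either order.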

Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 424–440). Hillsdale, NJ: Erlbaum.


Steyvers & Griffiths (2007)


Example: 16 Documents


Term Distributions for Topics

Topic 1
Word      Probability
bank      .390
money     .314
loan      .287
river     .009
stream    .000

Topic 2
Word      Probability
stream    .391
bank      .345
river     .240
money     .012
loan      .012


Topic Distribution for Documents


Matrix Factorization

Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 424–440). Hillsdale, NJ: Erlbaum.
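The matrix-factorization view can be written as P(w | d) = Σ_z P(w | z) P(z | d): a documents × topics matrix times a topics × words matrix gives a documents × words matrix. The sketch below uses the term probabilities from the "Term Distributions for Topics" table; the document-topic proportions in `THETA` are hypothetical.

```python
import numpy as np

WORDS = ["bank", "money", "loan", "river", "stream"]
# Topics x words: term distributions from the example's table (columns in WORDS order).
PHI = np.array([
    [0.390, 0.314, 0.287, 0.009, 0.000],  # Topic 1
    [0.345, 0.012, 0.012, 0.240, 0.391],  # Topic 2
])
# Documents x topics: hypothetical topic proportions for three documents.
THETA = np.array([
    [1.0, 0.0],  # pure Topic 1
    [0.5, 0.5],  # even mixture
    [0.0, 1.0],  # pure Topic 2
])

# Documents x words: P(w | d) = sum over z of P(w | z) * P(z | d).
P = THETA @ PHI
print(dict(zip(WORDS, P[1].round(4))))
```

Each row of `P` is again a probability distribution over words because each row of `THETA` and of `PHI` sums to 1.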


LDA & Beyond

• Limitations of LDA

• Fails to model correlation between topics

• Stems from the implicit independence assumption in the Dirichlet distribution on the topic proportions in documents

• In real texts, topics are usually correlated.
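The limitation can be checked numerically. For θ ~ Dirichlet(α) with α₀ = Σα, Cov(θᵢ, θⱼ) = −αᵢαⱼ / (α₀²(α₀ + 1)) for i ≠ j, so once α is fixed every pair of topic proportions is negatively correlated and there is no parameter to express positive co-occurrence. A minimal sketch comparing the formula with Monte Carlo draws:

```python
import numpy as np


def dirichlet_corr(alpha):
    """Theoretical correlation matrix of the components of a Dirichlet(alpha) vector."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    mean = alpha / a0
    cov = -np.outer(mean, mean) / (a0 + 1)               # off-diagonal: always < 0
    np.fill_diagonal(cov, mean * (1 - mean) / (a0 + 1))  # variances on the diagonal
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Monte Carlo check for a symmetric Dirichlet(1, 1, 1); theory gives -0.5 off-diagonal.
rng = np.random.default_rng(0)
samples = rng.dirichlet([1.0, 1.0, 1.0], size=20000)
empirical = np.corrcoef(samples, rowvar=False)
theoretical = dirichlet_corr([1.0, 1.0, 1.0])
print(theoretical.round(3))
print(empirical.round(3))
```

Changing `alpha` rescales these correlations but never makes them positive.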


LDA & Beyond

• Correlated Topic Model (CTM, Blei & Lafferty, 2007)

• Replaces the Dirichlet in LDA with the “more flexible logistic normal distribution” (p. 19).

• Blei & Lafferty (2007) cite Aitchison & Shen (1980) and Aitchison (1982, 1985).

Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of science. The Annals of Applied Statistics, 1(1), 17–35. https://doi.org/10.1214/07-AOAS114

Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B (Methodological), 44(2), 139–177.

Aitchison, J. (1985). A general class of distributions on the simplex. Journal of the Royal Statistical Society. Series B (Methodological), 47(1), 136–146.

Aitchison, J., & Shen, S. M. (1980). Logistic-normal distributions: Some properties and uses. Biometrika, 67(2), 261–272.
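A logistic normal draw maps a multivariate normal vector onto the simplex, and its covariance matrix Σ is what lets topics co-occur. A minimal sketch using Aitchison's additive logistic form (K − 1 normal coordinates, the last logit fixed at 0); the CTM paper writes the same idea with a softmax over K coordinates, and the μ and Σ values here are hypothetical:

```python
import numpy as np


def sample_logistic_normal(mu, sigma, size, rng):
    """Additive logistic-normal draws: eta ~ N(mu, sigma) in R^(K-1), last logit fixed at 0."""
    eta = rng.multivariate_normal(mu, sigma, size=size)
    eta = np.hstack([eta, np.zeros((size, 1))])
    expo = np.exp(eta - eta.max(axis=1, keepdims=True))  # subtract row max for stability
    return expo / expo.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
# Hypothetical parameters: strongly positively correlated logits for Topics 1 and 2.
theta = sample_logistic_normal([0.0, 0.0], [[2.0, 1.8], [1.8, 2.0]], 20000, rng)
corr_12 = np.corrcoef(theta[:, 0], theta[:, 1])[0, 1]
print(round(corr_12, 3))
```

With these parameters Topics 1 and 2 rise and fall together, a positive correlation that no Dirichlet can produce.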


Structural Topic Model (STM)

• Our research used the STM, which builds on the CTM

• Allows topic correlations

• Allows covariates (i.e., predictors of topic proportions)

• We collected 255 R/S measures published between 1929 and 2016 to identify the latent dimensions of text.


Atkins, D. C., Rubin, T. N., Steyvers, M., Doeden, M. A., Baucom, B. R., & Christensen, A. (2012). Topic models: A novel method for modeling couple and family text data. Journal of Family Psychology, 26, 816–827. https://doi.org/10.1037/a0029607


Preprocessing

• R ‘tm’ package (Feinerer & Hornik, 2015)

• Items of 255 R/S measures

• Preprocessed texts

• Removed stop words, numbers, and punctuation.

• e.g., a/an, the, to, for, at, she/he, I, ., or ?.

• Lemmatized words

• e.g., educate, educated, or educating → educate

Feinerer, I., & Hornik, K. (2015). tm: Text Mining Package (Version 0.6-2) [Computer software]. Retrieved from https://CRAN.R-project.org/package=tm.
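The preprocessing steps can be sketched in Python as below. This is not the ‘tm’ pipeline: the stop-word list is a tiny hypothetical subset, and the `LEMMAS` lookup is a hypothetical stand-in for a real lemmatizer.

```python
import re

# Hypothetical, abbreviated stop-word list (real lists are much longer).
STOP_WORDS = {"a", "an", "the", "to", "for", "at", "she", "he", "i"}

# Hypothetical lemma lookup standing in for a real lemmatizer.
LEMMAS = {"educated": "educate", "educating": "educate"}


def preprocess(item: str) -> list[str]:
    """Lowercase, strip numbers and punctuation, drop stop words, lemmatize."""
    text = item.lower()
    text = re.sub(r"\d+", " ", text)        # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)    # remove punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess("I was educated at the church for 12 years."))
# prints ['was', 'educate', 'church', 'years']
```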


Preprocessing

• Created a document-term matrix

• Dimensions: 255 documents × 5,617 terms

• Included

• unigrams

• bigrams (e.g., Jesus Christ)

• trigrams (e.g., religious (and/or) spiritual belief)

• Deleted low-frequency terms (< 3)
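Building a document-term matrix with uni-, bi-, and trigrams and then dropping rare terms can be sketched as follows. The slide does not say whether "low frequency" refers to total corpus count or to document frequency; this sketch assumes total corpus count below 3.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs, max_n=3, min_count=3):
    """Count uni-, bi-, and trigrams per document, then drop rare terms.

    Assumption: a term is dropped when its total corpus count is below min_count.
    """
    per_doc = [
        Counter(g for n in range(1, max_n + 1) for g in ngrams(tokens, n))
        for tokens in docs
    ]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    vocab = sorted(term for term, freq in totals.items() if freq >= min_count)
    matrix = [[counts[term] for term in vocab] for counts in per_doc]
    return vocab, matrix


docs = [["jesus", "christ", "love"], ["jesus", "christ"], ["jesus", "christ", "church"]]
vocab, matrix = document_term_matrix(docs)
# vocab is ['christ', 'jesus', 'jesus christ']; every row of matrix is [1, 1, 1]
```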


Model Estimation

• R ‘stm’ package (Roberts, Stewart, & Tingley, 2016)

• Topics

• Latent dimensions of text data

• Comparable to principal components or factors

• Estimated based on word co-occurrences across documents

• Structural topic modeling

• Estimates covariates’ effects on topic proportions

• Current analysis: decade of publication (1950s through 2010s) as a predictor
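How a covariate shifts topic proportions can be sketched with a simplified logistic-normal prevalence model: the mean of the K − 1 free logit coordinates is a linear function of decade, and expected proportions are averaged over draws. `GAMMA` and `SIGMA` are hypothetical numbers for illustration, not estimates from this study, and this is not the stm package's estimation routine.

```python
import numpy as np

DECADES = np.arange(1950, 2011, 10)  # 1950s through 2010s
# Hypothetical coefficients: rows = logit coordinates for Topics 1 and 2
# (Topic 3 is the reference), columns = (intercept, slope per decade).
GAMMA = np.array([[0.0, 0.2],
                  [0.5, -0.1]])
SIGMA = np.array([[0.5, 0.1],
                  [0.1, 0.5]])


def expected_proportions(decades, gamma, sigma, n_draws=5000, seed=0):
    """Monte Carlo expected topic proportions for each decade."""
    rng = np.random.default_rng(seed)
    x = np.column_stack([np.ones(len(decades)), (decades - decades[0]) / 10])
    out = []
    for mu in x @ gamma.T:                      # mean logits for this decade
        eta = rng.multivariate_normal(mu, sigma, size=n_draws)
        eta = np.hstack([eta, np.zeros((n_draws, 1))])
        theta = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
        out.append(theta.mean(axis=0))
    return np.array(out)


props = expected_proportions(DECADES, GAMMA, SIGMA)
print(props.round(3))
```

Plotting each column of `props` against `DECADES` gives curves of the same kind as the longitudinal plot of expected topic proportions.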

Roberts, M. E., Stewart, B. M., & Tingley, D. (2016). stm: R Package for Structural Topic Models (Version 1.1.3) [Computer software]. Retrieved from http://www.structuraltopicmodel.com


Top 50 Most Frequent Terms


Diagnostic Indexes


3 Topics Identified

• Topic 1: Spirituality

spirituality, spiritual belief, religious spiritual, wilderness, never experience, spiritual experience, connect, illness, transcendent, transcendent spiritual

• Topic 2: Religion

church member, loving, teaching church, dealing, dealing life, local religious, join, local religious group, question meaning life, religious denomination

• Topic 3: Judeo-Christianity

christian, allah, miracle, god will, god god, punish, client, god feel, patient, writing


Longitudinal Change of Expected Topic Proportions from the 1950s to the 2010s

The estimated regression lines and their 95% confidence intervals are plotted.


Created using R ‘compositions’ package (van den Boogaart, Tolosana, & Bren, 2015)

Van den Boogaart, K. G., Tolosana, R., & Bren, M. (2015). compositions: R Package for Compositional Data Analysis (Version 1.40-1) [Computer software]. Retrieved from https://cran.r-project.org/web/packages/compositions/index.html
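Assuming the figure on this slide is the ternary diagram listed in the overview, the mapping from three topic proportions to a point in an equilateral triangle can be sketched as below. The vertex placement (Topic 1 bottom-left, Topic 2 bottom-right, Topic 3 top) is an arbitrary choice for this sketch; the compositions package draws its own ternary plots and may orient them differently.

```python
import numpy as np


def ternary_xy(parts):
    """Map 3-part compositions (rows) to 2-D ternary-plot coordinates.

    Vertices: part 1 -> (0, 0), part 2 -> (1, 0), part 3 -> (0.5, sqrt(3)/2).
    """
    parts = np.asarray(parts, dtype=float)
    parts = parts / parts.sum(axis=-1, keepdims=True)  # closure: rows sum to 1
    x = parts[..., 1] + 0.5 * parts[..., 2]
    y = (np.sqrt(3) / 2) * parts[..., 2]
    return np.stack([x, y], axis=-1)


# The barycenter (equal topic proportions) lands at (0.5, sqrt(3)/6).
print(ternary_xy([1 / 3, 1 / 3, 1 / 3]))
```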


Normal Distribution on the Simplex
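A normal distribution on the simplex is one whose ilr coordinates are multivariate normal. A minimal sketch of the ilr transform, its inverse, and sampling, assuming a Helmert-type orthonormal basis; the compositions package's default basis may differ, which changes the coordinates but not the distribution family. The mean and covariance used for sampling are hypothetical.

```python
import numpy as np


def helmert_basis(D):
    """D x (D-1) matrix whose orthonormal columns are each orthogonal to the ones vector."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1))  # normalize the column
    return V


def clr(x):
    """Centered log-ratio: log parts minus their mean log."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)


def ilr(x):
    x = np.asarray(x, dtype=float)
    return clr(x) @ helmert_basis(x.shape[-1])


def ilr_inv(z):
    z = np.asarray(z, dtype=float)
    V = helmert_basis(z.shape[-1] + 1)
    y = np.exp(z @ V.T)                       # back to clr scale, then exponentiate
    return y / y.sum(axis=-1, keepdims=True)  # closure onto the simplex


# Sample a normal on the 3-part simplex: normal ilr coordinates, mapped back.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=1000)
compositions = ilr_inv(z)
```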


Topic Correlations

1. exp(-var(z)): Buccianti & Pawlowsky-Glahn (2005)

• z = ilr-transformed parts

• 0 (1) → high (low) variability of the ratio between parts

• e.g., .0016 for Topics 1 and 2

2. exp(-τ²/2): van den Boogaart & Tolosana-Delgado (2013)

• τ²: variation, i.e., the variance of the log-ratio between two parts

• Can be interpreted like a correlation coefficient

• Very small between topics

Buccianti, A., & Pawlowsky-Glahn, V. (2005). New perspectives on water chemistry and compositional data analysis. Mathematical Geology, 37(7), 703–727.

Van den Boogaart, K. G., & Tolosana-Delgado, R. (2013). Analyzing compositional data with R (Vol. 122). Heidelberg: Springer.
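Both indices can be computed from a documents × topics matrix of proportions, as sketched below. The sketch assumes z is the single ilr coordinate (balance) of the two-part subcomposition, z = ln(xᵢ/xⱼ)/√2. Under that assumption var(z) = τ²/2, so the two indices coincide; the sketch checks this numerically. A value of 1 means the ratio between the two topics is constant across documents.

```python
import numpy as np


def variation(x, i, j):
    """tau^2_ij: sample variance of the log-ratio ln(x_i / x_j) across rows."""
    return np.log(x[:, i] / x[:, j]).var(ddof=1)


def balance(x, i, j):
    """Single ilr coordinate of the two-part subcomposition (x_i, x_j)."""
    return np.log(x[:, i] / x[:, j]) / np.sqrt(2)


def index_exp_var(x, i, j):
    """exp(-var(z)) with z the two-part balance."""
    return np.exp(-balance(x, i, j).var(ddof=1))


def index_exp_tau(x, i, j):
    """exp(-tau^2 / 2) with tau^2 the variation of parts i and j."""
    return np.exp(-variation(x, i, j) / 2)


# Hypothetical topic proportions for 500 documents (not the study's data).
rng = np.random.default_rng(0)
props = rng.dirichlet([2.0, 3.0, 4.0], size=500)
print(index_exp_var(props, 0, 1), index_exp_tau(props, 0, 1))
```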


THANK YOU