Part 4: Advanced Analyses

Overview

Teaching: 60 min
Exercises: 35 min

Questions

How can I carry out confirmatory factor analysis?

How can I compare two models?

Objectives

Prepare data for CFA.

Fit a one factor and two-factor model

Compare the models

In order to carry out CFA on our data, we need to prep it by calculating subscores. Again, we will be using a number of functions from the tidyverse. We will also be using the lavaan package. This is a package for latent variable modeling. It’s creators have a nice website with tutorials and resources.

#load packages
library(tidyverse)
library(lavaan)
library(semPlot)

# read data
sem_dat <- read_csv("data/placement_2.csv")

# compute subscores
sem_dat <- gather(sem_dat, item, correct, 5:69)
sem_dat <- separate(sem_dat, item, into = c("number", "skill", "type"), 
                    sep = "_", remove = FALSE)

sum_scores <- group_by(sem_dat, ID, skill, type) %>% 
  summarise(total = sum(correct, na.rm = TRUE)) %>%
  unite(skill_type, 2:3, sep = "_") %>%
  spread(skill_type, total)

# some of the subscores are a bit limited in number of questions. combine
sum_scores <- mutate(sum_scores, list_global = list_mi + list_prag,
                        read_global = read_mi + read_inf + read_purp)

We will start by fitting a one-factor confirmatory model.

# Confirmatory Factor Analyses ----
# 1-factor model
## define the model

one <- '
# latent variable definitions
language =~ list_det + list_global + list_inf + read_det + read_voc +
            read_torg + read_global'

## fit the model
fit_one <- cfa(one, data = sum_scores, missing = "fiml")

## plot the model
semPaths(fit_one, "std", layout = "tree", intercepts = FALSE, residuals = T, nDigits = 2, 
         label.cex = 1, edge.label.cex=.95, fade = FALSE)

plot of chunk unnamed-chunk-2

We can use summary to view the results, and we can check the diagnostics with resid and modindices.

#view results
lavaan::summary(fit_one, estimates = TRUE, standardized = TRUE, fit.measures = TRUE)

lavaan 0.6-3 ended normally after 25 iterations

  Optimization method                           NLMINB
  Number of free parameters                         21

  Number of observations                           173
  Number of missing patterns                         1

  Estimator                                         ML
  Model Fit Test Statistic                      25.751
  Degrees of freedom                                14
  P-value (Chi-square)                           0.028

Model test baseline model:

  Minimum Function Test Statistic              447.156
  Degrees of freedom                                21
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.972
  Tucker-Lewis Index (TLI)                       0.959

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2230.686
  Loglikelihood unrestricted model (H1)      -2217.811

  Number of free parameters                         21
  Akaike (AIC)                                4503.372
  Bayesian (BIC)                              4569.591
  Sample-size adjusted Bayesian (BIC)         4503.094

Root Mean Square Error of Approximation:

  RMSEA                                          0.070
  90 Percent Confidence Interval          0.023  0.111
  P-value RMSEA <= 0.05                          0.201

Standardized Root Mean Square Residual:

  SRMR                                           0.034

Parameter Estimates:

  Information                                 Observed
  Observed information based on                Hessian
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  language =~                                                           
    list_det          1.000                               1.206    0.631
    list_global       1.216    0.154    7.893    0.000    1.467    0.728
    list_inf          1.064    0.145    7.360    0.000    1.283    0.687
    read_det          1.038    0.146    7.127    0.000    1.252    0.674
    read_voc          0.458    0.081    5.658    0.000    0.553    0.492
    read_torg         1.523    0.190    8.030    0.000    1.837    0.792
    read_global       1.091    0.148    7.398    0.000    1.316    0.702

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .list_det          5.029    0.145   34.583    0.000    5.029    2.629
   .list_global       5.237    0.153   34.164    0.000    5.237    2.597
   .list_inf          3.740    0.142   26.341    0.000    3.740    2.003
   .read_det          3.468    0.141   24.555    0.000    3.468    1.867
   .read_voc          1.896    0.085   22.208    0.000    1.896    1.688
   .read_torg         6.058    0.176   34.376    0.000    6.058    2.614
   .read_global       3.214    0.143   22.529    0.000    3.214    1.713
    language          0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .list_det          2.203    0.266    8.273    0.000    2.203    0.602
   .list_global       1.913    0.254    7.527    0.000    1.913    0.471
   .list_inf          1.840    0.231    7.968    0.000    1.840    0.528
   .read_det          1.884    0.235    8.027    0.000    1.884    0.546
   .read_voc          0.955    0.108    8.836    0.000    0.955    0.757
   .read_torg         1.999    0.294    6.796    0.000    1.999    0.372
   .read_global       1.788    0.229    7.804    0.000    1.788    0.508
    language          1.455    0.337    4.321    0.000    1.000    1.000

#view diagnostics
resid(fit_one, type="standardized")

$type
[1] "standardized"

$cov
            lst_dt lst_gl lst_nf red_dt red_vc rd_trg rd_glb
list_det     0.000                                          
list_global  1.919  0.000                                   
list_inf    -0.348  1.099  0.000                            
read_det    -1.550 -1.913 -1.161  0.000                     
read_voc     0.952  0.173 -0.684 -0.083  0.000              
read_torg   -1.324  0.222 -0.729  1.139 -0.180  0.000       
read_global -0.562 -3.020  1.036  1.585 -0.137  0.115  0.000

$mean
   list_det list_global    list_inf    read_det    read_voc   read_torg 
          0           0           0           0           0           0 
read_global 
          0 

modindices(fit_one)

           lhs op         rhs    mi    epc sepc.lv sepc.all sepc.nox
  list_det ~~ list_global 9.509  0.584   0.584    0.284    0.284
  list_det ~~    list_inf 0.143 -0.068  -0.068   -0.034   -0.034
  list_det ~~    read_det 2.132 -0.264  -0.264   -0.129   -0.129
  list_det ~~    read_voc 1.079  0.125   0.125    0.086    0.086
  list_det ~~   read_torg 1.856 -0.286  -0.286   -0.136   -0.136
  list_det ~~ read_global 0.287 -0.096  -0.096   -0.048   -0.048
list_global ~~    list_inf 2.040  0.257   0.257    0.137    0.137
list_global ~~    read_det 3.046 -0.314  -0.314   -0.166   -0.166
list_global ~~    read_voc 0.035  0.022   0.022    0.016    0.016
list_global ~~   read_torg 0.065  0.055   0.055    0.028    0.028
list_global ~~ read_global 8.015 -0.509  -0.509   -0.275   -0.275
  list_inf ~~    read_det 1.377 -0.200  -0.200   -0.108   -0.108
  list_inf ~~    read_voc 0.452 -0.076  -0.076   -0.057   -0.057
  list_inf ~~   read_torg 0.521 -0.146  -0.146   -0.076   -0.076
  list_inf ~~ read_global 1.453  0.205   0.205    0.113    0.113
  read_det ~~    read_voc 0.007 -0.010  -0.010   -0.007   -0.007
  read_det ~~   read_torg 2.855  0.341   0.341    0.175    0.175
  read_det ~~ read_global 5.295  0.391   0.391    0.213    0.213
  read_voc ~~   read_torg 0.032 -0.023  -0.023   -0.017   -0.017
  read_voc ~~ read_global 0.019 -0.015  -0.015   -0.012   -0.012
 read_torg ~~ read_global 0.022  0.030   0.030    0.016    0.016

Now we will fit a two-factor model so that we can compare it to the one-factor model. We can use the same commands to examine the model.

# 2-factor model
## define the model
two <- '
# latent variable definitions
listening =~ list_det + list_global + list_inf
reading =~ read_det + read_voc + read_torg + read_global

#covariances
listening ~~ reading'

##fit the model
fit_two <- cfa(two, data = sum_scores, missing = "fiml")

##plot the model
semPaths(fit_two, "std", layout = "tree", intercepts = FALSE, residuals = TRUE, nDigits = 2, 
         label.cex = 1, edge.label.cex=.95, fade = FALSE)

plot of chunk unnamed-chunk-4

Exercise

Explore the model summary and check the diagnostics.

Solution

#view results
lavaan::summary(fit_two, estimates = TRUE, standardized = TRUE, fit.measures = TRUE)

#view diagnostics
resid(fit_two, type="standardized")
modindices(fit_two)

We can compare models with the anova function:

#compare models

anova(fit_one, fit_two)

Chi Square Difference Test

        Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)   
fit_two 13 4496.3 4565.7 16.672                                 
fit_one 14 4503.4 4569.6 25.751     9.0783       1   0.002587 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we will fit a model to that controls for method effects:

one_method <- '
# latent variable definitions
language =~ list_det + list_global + list_inf + read_det + read_voc +
read_torg + read_global

#listening method
list_det ~~ list_global
list_global ~~ list_inf

#reading method
read_det ~~ read_voc
read_det ~~ read_torg
read_det ~~ read_global
read_torg ~~ read_global'

## fit the model
fit_one_method <- cfa(one_method, data = sum_scores, missing = "fiml")

## plot the model
semPaths(fit_one_method, "std", layout = "tree", intercepts = FALSE, residuals = TRUE, nDigits = 2, 
         label.cex = 1, edge.label.cex=.95, fade = FALSE)

plot of chunk unnamed-chunk-7

# view results
lavaan::summary(fit_one_method, estimates = TRUE, standardized = TRUE, fit.measures = TRUE)

lavaan 0.6-3 ended normally after 41 iterations

  Optimization method                           NLMINB
  Number of free parameters                         27

  Number of observations                           173
  Number of missing patterns                         1

  Estimator                                         ML
  Model Fit Test Statistic                       7.629
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.471

Model test baseline model:

  Minimum Function Test Statistic              447.156
  Degrees of freedom                                21
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.002

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2221.625
  Loglikelihood unrestricted model (H1)      -2217.811

  Number of free parameters                         27
  Akaike (AIC)                                4497.251
  Bayesian (BIC)                              4582.390
  Sample-size adjusted Bayesian (BIC)         4496.892

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent Confidence Interval          0.000  0.086
  P-value RMSEA <= 0.05                          0.737

Standardized Root Mean Square Residual:

  SRMR                                           0.018

Parameter Estimates:

  Information                                 Observed
  Observed information based on                Hessian
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  language =~                                                           
    list_det          1.000                               1.174    0.614
    list_global       1.217    0.154    7.877    0.000    1.429    0.709
    list_inf          1.101    0.162    6.779    0.000    1.293    0.692
    read_det          0.966    0.172    5.627    0.000    1.135    0.611
    read_voc          0.477    0.088    5.437    0.000    0.560    0.499
    read_torg         1.540    0.230    6.693    0.000    1.809    0.780
    read_global       1.091    0.180    6.045    0.000    1.281    0.683

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .list_det ~~                                                           
   .list_global       0.491    0.232    2.114    0.035    0.491    0.228
 .list_global ~~                                                        
   .list_inf          0.202    0.209    0.965    0.335    0.202    0.105
 .read_det ~~                                                           
   .read_voc          0.052    0.115    0.454    0.650    0.052    0.036
   .read_torg         0.430    0.280    1.538    0.124    0.430    0.202
   .read_global       0.458    0.237    1.935    0.053    0.458    0.227
 .read_torg ~~                                                          
   .read_global       0.115    0.280    0.411    0.681    0.115    0.058

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .list_det          5.029    0.145   34.583    0.000    5.029    2.629
   .list_global       5.237    0.153   34.160    0.000    5.237    2.597
   .list_inf          3.740    0.142   26.341    0.000    3.740    2.003
   .read_det          3.468    0.141   24.554    0.000    3.468    1.867
   .read_voc          1.896    0.085   22.208    0.000    1.896    1.688
   .read_torg         6.058    0.176   34.376    0.000    6.058    2.614
   .read_global       3.214    0.143   22.529    0.000    3.214    1.713
    language          0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .list_det          2.279    0.295    7.720    0.000    2.279    0.623
   .list_global       2.025    0.329    6.147    0.000    2.025    0.498
   .list_inf          1.816    0.261    6.964    0.000    1.816    0.521
   .read_det          2.164    0.306    7.075    0.000    2.164    0.627
   .read_voc          0.947    0.110    8.609    0.000    0.947    0.751
   .read_torg         2.101    0.411    5.113    0.000    2.101    0.391
   .read_global       1.879    0.301    6.237    0.000    1.879    0.534
    language          1.379    0.349    3.952    0.000    1.000    1.000

# view diagnostics
resid(fit_one_method, type="standardized")

$type
[1] "standardized"

$cov
            lst_dt lst_gl lst_nf red_dt red_vc rd_trg rd_glb
list_det     0.000                                          
list_global -0.058 -0.016                                   
list_inf    -0.208 -0.133  0.000                            
read_det    -0.126  0.130  0.021 -0.065                     
read_voc     1.079  0.336 -0.992 -0.134  0.000              
read_torg   -0.710  1.322 -0.652 -0.034 -0.173  0.000       
read_global  0.133 -2.101  1.459 -0.003 -0.022  0.000  0.000

$mean
   list_det list_global    list_inf    read_det    read_voc   read_torg 
          0           0           0           0           0           0 
read_global 
          0 

modindices(fit_one_method)

           lhs op         rhs    mi    epc sepc.lv sepc.all sepc.nox
  list_det ~~    list_inf 0.043 -0.054  -0.054   -0.026   -0.026
  list_det ~~    read_det 0.023 -0.026  -0.026   -0.012   -0.012
  list_det ~~    read_voc 1.155  0.129   0.129    0.088    0.088
  list_det ~~   read_torg 0.862 -0.207  -0.207   -0.095   -0.095
  list_det ~~ read_global 0.401  0.114   0.114    0.055    0.055
list_global ~~    read_det 0.105  0.057   0.057    0.027    0.027
list_global ~~    read_voc 0.015  0.015   0.015    0.011    0.011
list_global ~~   read_torg 2.708  0.378   0.378    0.183    0.183
list_global ~~ read_global 4.857 -0.399  -0.399   -0.204   -0.204
  list_inf ~~    read_det 0.053 -0.042  -0.042   -0.021   -0.021
  list_inf ~~    read_voc 0.831 -0.110  -0.110   -0.084   -0.084
  list_inf ~~   read_torg 0.725 -0.198  -0.198   -0.102   -0.102
  list_inf ~~ read_global 3.656  0.354   0.354    0.192    0.192
  read_voc ~~   read_torg 0.029 -0.026  -0.026   -0.019   -0.019
  read_voc ~~ read_global 0.000 -0.002  -0.002   -0.001   -0.001

Exercise

Compare the one_fit_method model with the fit_one and fit_two models.
Solution
## compare models
anova(fit_one, fit_one_method) #yes, accounting for method effects is important

anova(fit_two, fit_one_method) #inconclusive! 

Key Points

lavaan and semPlot are two useful packages for CFA.

previous episode

R for Assessment Specialists

next episode

Part 4: Advanced Analyses

Overview

Exercise

Solution

Exercise

Solution

Key Points

previous episode

next episode