Cohort diagnostics

Introduction

In this example we’re going to summarise cohort diagnostics results for cohorts of individuals with an ankle sprain, ankle fracture, forearm fracture, or a hip fracture using the Eunomia synthetic data.

Again, we’ll begin by creating our study cohorts.

library(CDMConnector)
library(CohortConstructor)
library(CodelistGenerator)
library(CohortCharacteristics)
library(CohortSurvival)
library(PhenotypeR)
library(dplyr)
library(ggplot2)

con <- DBI::dbConnect(duckdb::duckdb(), 
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con, 
                                cdmName = "Eunomia Synpuf",
                                cdmSchema   = "main",
                                writeSchema = "main", 
                                achillesSchema = "main")

cdm$injuries <- conceptCohort(cdm = cdm,
  conceptSet = list(
    "ankle_sprain" = 81151,
    "ankle_fracture" = 4059173,
    "forearm_fracture" = 4278672,
    "hip_fracture" = 4230399
  ),
  name = "injuries")
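
Before running any diagnostics it can be worth confirming that the four cohorts were created and contain records. The following is a minimal sketch, assuming the cdm object and cdm$injuries table created above; it uses settings() and cohortCount() from omopgenerics, which is installed as a dependency of CDMConnector.

```r
library(dplyr)

# Assumes `cdm$injuries` was created with conceptCohort() as shown above.
# One row per cohort, linking each cohort_definition_id to its name
omopgenerics::settings(cdm$injuries) |>
  select(cohort_definition_id, cohort_name)

# Number of records and number of subjects in each cohort
omopgenerics::cohortCount(cdm$injuries)
```

A cohort with zero records here would also appear with empty results in the diagnostics that follow.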

Cohort diagnostics

We can run cohort diagnostics analyses for each of our overall cohorts like so:

cohort_diag <- cohortDiagnostics(cdm$injuries, 
                                 matchedSample = NULL,
                                 survival = TRUE)

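cohortDiagnostics() builds on the CohortCharacteristics and CohortSurvival R packages to perform the following analyses on our cohorts: cohort counts, cohort attrition, cohort characteristics, cohort age distribution, cohort large scale characteristics, cohort overlap, cohort timing, and (when survival = TRUE) cohort survival.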

By default, the cohort characteristics, cohort age distribution, cohort large scale characteristics, and cohort survival analyses are also performed in a matched cohort. The matched cohort is created by matching on year of birth and sex (see the matchCohorts() function in the CohortConstructor package). This lets us compare the results in our cohorts with those obtained in the matched cohort, which represents the general population. Note that these analyses are performed in: (1) the original cohort, (2) individuals in the original cohort who have a match (named the sampled cohort), and (3) the matched cohort.

As the matching process can be computationally expensive, especially when the cohorts are very large, we can restrict the matching analysis to a subset of participants from the original cohort using the matchedSample parameter. Alternatively, if we do not want to create the matched cohorts, we can use matchedSample = 0.
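
The two options described above might look like the following sketch. It assumes cdm$injuries exists as created earlier and that matchedSample takes the number of people to sample for matching; the value 1000 and the object names cohort_diag_sampled and cohort_diag_unmatched are illustrative only.

```r
library(PhenotypeR)

# Assumes `cdm$injuries` was created as shown earlier.
# Run the matched analyses on a random sample of the original cohort
# (1000 is an arbitrary, illustrative sample size)
cohort_diag_sampled <- cohortDiagnostics(cdm$injuries,
                                         matchedSample = 1000,
                                         survival = TRUE)

# Skip the creation of matched cohorts altogether
cohort_diag_unmatched <- cohortDiagnostics(cdm$injuries,
                                           matchedSample = 0,
                                           survival = TRUE)
```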

The output of cohortDiagnostics() will be a summarised result table.
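
To see what this table contains, we can inspect it directly. This is a small sketch, assuming the cohort_diag object created above; it relies on settings() from omopgenerics, which stores one row per result_id describing the analysis that produced it.

```r
library(dplyr)

# Assumes `cohort_diag` was returned by cohortDiagnostics() above.
# The main table is in long format, with one row per estimate
glimpse(cohort_diag)

# The settings show which analysis (result_type) and package
# produced each set of results
omopgenerics::settings(cohort_diag) |>
  select(result_id, result_type, package_name)
```

Because all analyses are stored together in one object, each table and plot function used afterwards only has to pick out the result types it knows how to display.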

Visualise cohort diagnostics results

We will now use different functions to visualise the results generated by cohortDiagnostics(). Note that these functions come from the CohortCharacteristics and CohortSurvival R packages.

Cohort counts

tableCohortCount(cohort_diag)

Cohort attrition

tableCohortAttrition(cohort_diag)
plotCohortAttrition(cohort_diag)

Cohort characteristics

tableCharacteristics(cohort_diag)

Cohort large scale characteristics

tableLargeScaleCharacteristics(cohort_diag)

Cohort overlap

tableCohortOverlap(cohort_diag)
plotCohortOverlap(cohort_diag)

Cohort timing

tableCohortTiming(cohort_diag)
plotCohortTiming(cohort_diag)

Cohort survival

tableSurvival(cohort_diag, header = "estimate_name")
plotSurvival(cohort_diag, colour = "target_cohort", facet = "cdm_name")
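
The plot returned by plotSurvival() is a ggplot2 object, so it can be customised with the usual ggplot2 layers. A minimal sketch, assuming the cohort_diag object from above; the object name survival_plot, the axis labels, and the theme are illustrative choices.

```r
library(CohortSurvival)
library(ggplot2)

# Assumes `cohort_diag` contains survival results (survival = TRUE above).
survival_plot <- plotSurvival(cohort_diag,
                              colour = "target_cohort",
                              facet = "cdm_name")

# Add labels and a theme using standard ggplot2 layers
survival_plot +
  labs(x = "Time", y = "Survival probability") +
  theme_minimal()
```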