Inflammatory bowel disease

Nathan Constantine-Cooke; Chiara Cotronei

Inflammatory bowel disease

Authors

Affiliations

Nathan Constantine-Cooke

Institute of Genetics and Cancer

Chiara Cotronei

NHS Lothian

Published

January 13, 2026

Introduction

Code

set.seed(123)

source("Baseline/utils.R")

##############
## Packages ##
##############

library(plyr) # Used for mapping values
suppressPackageStartupMessages(library(tidyverse)) # ggplot2, dplyr, and magrittr
library(readxl) # Read in Excel files
library(lubridate) # Handle dates
library(datefixR) # Standardise dates
library(patchwork) # Arrange ggplots

# Generate tables
suppressPackageStartupMessages(library(table1))
library(knitr)
library(pander)

# Generate flowchart of cohort derivation
library(DiagrammeR)
library(DiagrammeRsvg)

# paths to PREdiCCt data
if (file.exists("/docker")) { # If running in docker
  data.path <- "data/final/20221004/"
  redcap.path <- "data/final/20231030/"
  prefix <- "data/end-of-follow-up/"
  outdir <- "data/processed/"
} else { # Run on OS directly
  data.path <- "/Volumes/igmm/cvallejo-predicct/predicct/final/20221004/"
  redcap.path <- "/Volumes/igmm/cvallejo-predicct/predicct/final/20231030/"
  prefix <- "/Volumes/igmm/cvallejo-predicct/predicct/end-of-follow-up/"
  outdir <- "/Volumes/igmm/cvallejo-predicct/predicct/processed/"
}

demo <- readRDS(paste0(outdir, "demo-demographics.RDS"))
demo.cd <- readRDS(paste0(outdir, "demo-cd.RDS"))
demo.uc <- readRDS(paste0(outdir, "demo-uc.RDS"))

cat_theme <- function(gg) {
  p <- gg +
    scale_fill_manual(values = c("#DA4167", "#F4D35E", "#083D77")) +
    scale_color_manual(values = colorspace::darken(c("#DA4167",
                                                     "#F4D35E",
                                                     "#083D77"),
                                                   0.2)
                     ) +
  theme_minimal()
  p
}

This page explores data which describes subjects’ IBD, primarily consisting of clinical data.

IBD type

For this study, we have grouped ulcerative colitis (UC) and inflammatory bowel disease unclassified (IBDU) into one category. IBD type is self-reported.

Whilst PREdiCCt was designed to recruit an equal number of CD and UC/IBDU subjects, the COVID-19 pandemic halted recruitment which has resulted in there not being an equal balance of CD and UC/IBDU subjects.There are slightly more Crohn’s disease (CD) than UC/IBDU subjects in the FC cohort.

Code

demo$diagnosis2 <- plyr::mapvalues(demo$diagnosis,
  from = seq(1, 4),
  to = c(1, 2, 2, 2)
) %>%
  factor(levels = c("1", "2"), labels = c("CD", "UC/IBDU"))

demo$diagnosis <- plyr::mapvalues(demo$diagnosis,
  from = seq(1, 4),
  to = c(1, 2, 3, 3)
) %>%
  factor(levels = c("1", "2", "3"), labels = c("CD", "UC", "IBDU"))


demo %>%
  drop_na(cat) %>%
  ggplot(aes(x = diagnosis2, fill = diagnosis2, color = diagnosis2)) +
  geom_bar() +
  ylab("Frequency") +
  xlab("IBD type") +
  theme_minimal() +
  scale_fill_manual(values = c("#2B2D42", "#92DCE5")) +
  scale_color_manual(values = c("#1E203A", "#1D9BA5")) +
  guides(
    fill = guide_legend(title = "IBD type"),
    color = guide_legend(title = "IBD type")
  )

Figure 1: Distribution of IBD type in the FC cohort.

IBD type is significantly associated with FC at recruitment.

Code

pander(chisq.test(demo$diagnosis2, demo$cat))

Table 1: Chi-squared test between IBD type and FC groups.

Pearson’s Chi-squared test: `demo$diagnosis2` and `demo$cat`
Test statistic	df	P value
6.815	2	0.03313 *

From Figure 2, we can see subjects with UC/IBDU are more likely to have FC < 50, whiilst CD subjects are more likely to have FC 50-250. Roughtly the same proportion of subjects with CD and UC/IBDU have FC > 250.

Code

p <- demo %>%
  drop_na(cat) %>%
  mutate(cat = fct_rev(cat)) %>%
  ggplot(aes(x = diagnosis2, fill = cat, color = cat)) +
  geom_bar(position="fill") +
  scale_y_continuous(labels = scales::percent) + 
  labs(x = "IBD subtype",
       y = "Proportion",
       fill = "FC group",
       color = "FC group")
cat_theme(p)

Figure 2: Distribution of FC category by IBD subtype.

Disease duration

Disease duration has been calculated by subtracting reported date of diagnosis from the date of entry into PREdiCCt. Date of diagnosis has been obtained via REDcap. Where only year was available, the datefixR R package was used to impute the middle of the year. Any disease durations reported to be negative using this method were mapped to 0.

Code

redcap <- read_xlsx(paste0(redcap.path, "redcap_clean.xlsx"))
redcap <- distinct(redcap, participantno, .keep_all = TRUE)

diag.date <- redcap[, c("participantno", "date_diag")]
colnames(diag.date)[1] <- "ParticipantNo"
diag.date <- subset(diag.date, (!is.na(date_diag)) &
  (!is.null(date_diag)) &
  date_diag != "")
duration <- merge(diag.date,
  demo,
  by = "ParticipantNo",
  all.x = FALSE,
  all.y = TRUE
)
duration <- distinct(duration, ParticipantNo, .keep_all = TRUE)

duration <- subset(duration, tolower(date_diag) != "unknown")
duration[duration[, "date_diag"] == "16/032016", "date_diag"] <- "16/03/2016"
duration[duration[, "date_diag"] == "19/04/2017P", "date_diag"] <- "19/04/2017"
duration[duration[, "date_diag"] == "2007 APPROX", "date_diag"] <- "2007"
duration[duration[, "date_diag"] == "approx 1988", "date_diag"] <- "1988"
duration[duration[, "date_diag"] == "?2014", "date_diag"] <- "2014"

duration <- fix_date_df(duration, c("entry_date", "date_diag"))

duration$duration <- as.numeric(with(
  duration,
  (entry_date - date_diag) / 365.25
))

duration[duration[, "duration"] < 0, "duration"] <- 0


duration.cd <- subset(duration, diagnosis == "1")

duration.uc <- subset(duration, diagnosis != "1")

p <- duration %>%
  ggplot(aes(x = duration, color = diagnosis2, fill = diagnosis2)) +
  geom_histogram(bins = 25) +
  theme_minimal() +
  labs(
    x = "Duration (years)",
    y = "Frequency",
    color = "IBD type",
    fill = "IBD type"
  ) +
  scale_fill_manual(
    values = c("#CDEDF6", "#FF6B6B")
  ) +
  scale_color_manual(
    values = c("#5EB1BF", "#C24343")
  )
ggsave("plots/Duration-full-cohort.png",
  p,
  width = 9,
  height = 6
)
ggsave("plots/Duration-full-cohort.pdf", p, width = 9, height = 6)


duration %>%
  drop_na(duration) %>%
  ggplot(aes(x = duration, color = diagnosis2, fill = diagnosis2)) +
  geom_histogram(bins = 25) +
  theme_minimal() +
  labs(
    x = "Duration (years)",
    y = "Frequency",
    color = "IBD type",
    fill = "IBD type"
  ) +
  scale_fill_manual(
    values = c("#CDEDF6", "#FF6B6B")
  ) +
  scale_color_manual(
    values = c("#5EB1BF", "#C24343")
  )

Whilst a low p-value was observed, disease duration was not significantly different between FC groups.

Code

demo <- merge(demo, duration[, c("ParticipantNo", "duration")],
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)

names(demo)[18] <- "IBD Duration"

demo %>%
  aov(formula = `IBD Duration` ~ cat) %>%
  summary() %>%
  pander()

Table 2: ANOVA between IBD disease duration and FC groups

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	639.6	319.8	2.708	0.06691
Residuals	2021	238664	118.1	NA	NA

Code

demo %>%
  filter(diagnosis2 == "CD") %>%
  aov(formula = `IBD Duration` ~ cat) %>%
  summary() %>%
  pander()

Table 3: ANOVA between IBD disease duration and FC groups in Crohn’s disease.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	162	81	0.6239	0.5361
Residuals	1051	136459	129.8	NA	NA

Code

demo %>%
  filter(diagnosis2 == "UC/IBDU") %>%
  aov(formula = `IBD Duration` ~ cat) %>%
  summary() %>%
  pander()

Table 4: ANOVA between IBD disease duration and FC groups in UC/IBDU.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	756	378	3.654	0.02626
Residuals	967	1e+05	103.5	NA	NA

Code

p <- demo %>%
  filter(diagnosis2 == "UC/IBDU") %>%
  drop_na(cat) %>%
  mutate(cat = fct_rev(cat)) %>%
  ggplot(aes(x = `IBD Duration`, fill = cat, color = cat)) +
  geom_density() +
  facet_grid(rows = vars(cat)) +
  labs(x = "Disease duration (years)",
       y = "Density",
       fill = "FC group",
       color = "FC group")
cat_theme(p)

Warning: Removed 56 rows containing non-finite outside the scale range
(`stat_density()`).

Figure 4: Distribution of disease duration by FC in UC.

IBD Control

Code

IBD <- read_xlsx(paste0(data.path, "Baseline2022/IBD.xlsx"))

# create flare_group in IBD_C and make into level
IBD <- IBD %>%
  mutate(flare_group = ifelse(FlaresInPastYear == 0,
    "No Flares",
    "1 or More Flares"
  ))
IBD$flare_group <- factor(IBD$flare_group,
  levels = c("No Flares", "1 or More Flares")
)

IBD <- IBD %>%
  mutate(treatment = ifelse(TreatmentUseful == 4,
    "Not On Treatment",
    "On Treatment"
  ))
IBD$treatment <- factor(IBD$treatment,
  levels = c("Not On Treatment", "On Treatment")
)

# correcting scores where No is favourable and Yes is unfavourable outcomes
# answer options were always YES/NO/Not sure
if (any(IBD$MissPlannedActivities == 3)) {
  IBD <- IBD %>%
    mutate_at(
      vars("MissPlannedActivities":"NewSymptoms"),
      ~ recode(., `3` = 1, `2` = 2, `1` = 0)
    )
}

# correct TreatmentUseful (Yes is favourable, No unfavourable)
# not on treatment was assigned the value 1
if (any(IBD$TreatmentUseful == 3)) {
  IBD <- IBD %>%
    mutate_at(
      "TreatmentUseful",
      ~ recode(., `4` = 2, `3` = 1, `2` = 0, `1` = 2)
    )
}

# exclude rows with missing data, 17 participants excluded for missing IBD-control questionnaire data
IBD <- IBD %>%
  filter(!rowSums(is.na(select(., 17:27))) > 0)

# new column with total control score
IBD <- IBD %>%
  mutate(control_score = rowSums(select(., 17:27), na.rm = TRUE))

# new column with IBD-control-8 score
# added 2 to make comparable to future scores where the first question is asked to make comparable to future and against the cut off of 13 which indicates remission
IBD <- IBD %>%
  mutate(control_8 = TreatmentUseful +
    MissPlannedActivities +
    WakeUpAtNight +
    SignificantPain +
    OftenLackEnergy +
    AnxiousDepressed +
    NeedChangeTreatment +
    2)

# create groups with cut off of IBD-control-8 scores quiescent (13 or above)/not quiescent
IBD <- IBD %>%
  mutate(control_grouped = ifelse(control_8 >= 13, "13-16", "0-12"))

IBD$control_8 <- as.factor(IBD$control_8)

IBD <- IBD %>%
  mutate(vas_control = ifelse(OverallControl >= 85, "85+", "<85"))

demo <- merge(demo, IBD[, c("ParticipantNo", "control_8", "OverallControl", "vas_control")],
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)

IBD Control is a validated questionnaire designed to assess disease control from a patient’s perspective (Bodger et al. 2013). We considered an eight question subset, IBD-Control-8 which ranges in potential scores from 0 to 16. We also consider the IBD-Control visual analogue scale (VAS). Bodger et al. (2013) found a cut-off of \geq 13 for IBD-Control-8 and a cut-off of \geq 85 for IBD-Control-VAS to be optimal for identifying quiescent disease.

Code

p1 <- demo %>%
  drop_na(cat) %>%
  ggplot(aes(x = control_8, color = control_8, fill = control_8)) +
  geom_bar() +
  theme_minimal() +
  theme(legend.position = "none") +
  xlab("Control-8 Scores") +
  ylab("Frequency") +
  scale_color_manual(values = viridis::viridis(15), na.value = "#032B43") +
  scale_fill_manual(values = viridis::viridis(15), na.value = "#032B43") +
  geom_vline(xintercept = 11.5, color = "#FF4B3E")

p2 <- demo %>%
  drop_na(cat) %>%
  ggplot(aes(x = OverallControl)) +
  geom_histogram(binwidth = 5, fill = "#FFD23F", color = "#947805") +
  theme_minimal() +
  xlab("VAS Scores") +
  ylab("Frequency") +
  geom_vline(xintercept = 85, color = "#FF4B3E") +
  scale_x_continuous(breaks = seq(0, 100, 10))
p1 / p2 + plot_annotation(tag_levels = "A")

Figure 5: Distributiuon of IBD Control scores for (A) Control-8 and (B) Visual analogue scale. The vertical red lines denote the cut-offs for quiescent disease reported by Bodger et al. (2013).

IBD-Control-8 scores and VAS were both found to be significantly associated with FC.

Code

demo %>%
  aov(formula = as.numeric(control_8) ~ cat) %>%
  summary() %>%
  pander()

Table 5: ANOVA between control-8 and FC groups across the FC cohort.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	65.25	32.63	3.335	0.03586
Residuals	1694	16574	9.784	NA	NA

Code

demo %>%
  filter(diagnosis2 == "CD") %>%
  aov(formula = as.numeric(control_8) ~ cat) %>%
  summary() %>%
  pander()

Table 6: ANOVA between control-8 and FC groups in Crohn’s disease.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	29.06	14.53	1.373	0.254
Residuals	867	9179	10.59	NA	NA

Code

demo %>%
  filter(diagnosis2 == "UC/IBDU") %>%
  aov(formula = as.numeric(control_8) ~ cat) %>%
  summary() %>%
  pander()

Table 7: ANOVA between control-8 and FC groups in UC/IBDU.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	73.18	36.59	4.166	0.01584
Residuals	824	7238	8.783	NA	NA

Code

demo %>%
  aov(formula = OverallControl ~ cat) %>%
  summary() %>%
  pander()

Table 8: ANOVA between control VAS and FC groups across the FC cohort.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	4692	2346	9.996	4.836e-05
Residuals	1687	395921	234.7	NA	NA

Code

demo %>%
  filter(diagnosis2 == "CD") %>%
  aov(formula = OverallControl ~ cat) %>%
  summary() %>%
  pander()

Table 9: ANOVA between control VAS and FC groups in Crohn’s disease.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	1632	815.8	3.4	0.03383
Residuals	862	206848	240	NA	NA

Code

demo %>%
  filter(diagnosis2 == "UC/IBDU") %>%
  aov(formula = OverallControl ~ cat) %>%
  summary() %>%
  pander()

Table 10: ANOVA between control VAS and FC groups in UC/IBDU.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	2933	1467	6.448	0.001665
Residuals	822	186983	227.5	NA	NA

Instead to treating VAS as a continuous value, going forward, VAS will be discretised into below 85 and 85 and above as a VAS of 85 was found by Bodger et al. (2013) to be predictive of quiescent disease.

Code

demo <- demo %>%
  select(-OverallControl)
demo %>%
  drop_na(cat) %>%
  ggplot(aes(x = vas_control, color = vas_control, fill = vas_control)) +
  geom_bar() +
  theme_minimal() +
  theme(legend.position = "none") +
  xlab("Visual analogue scores") +
  ylab("Frequency") +
  scale_color_manual(values = c("#D00000", "#447604"), na.value = "#032B43") +
  scale_fill_manual(values = c("#D00000", "#447604"), na.value = "#032B43")

Figure 6: Distribution of discretised IBD-Control-VAS scores.

The discretised VAS is continued to be significantly associated with FC.

Code

pander(chisq.test(demo$vas_control, demo$cat))

Table 11: Chi squared test between VAS and FC groups.

Pearson’s Chi-squared test: `demo$vas_control` and `demo$cat`
Test statistic	df	P value
16.89	2	0.0002153 * * *

Medication use

This section is concerned with medication used by study participants. Current treatment strategy, bio-naive status and antibiotic use are considered.

IBD treatment strategy

Code

drugs <- data.frame(ParticipantNo = demo$ParticipantNo)

drugs$ASA <- FALSE
drugs$imm <- FALSE
drugs$bio <- FALSE
drugs$category <- NA

redcap <- redcap %>% mutate(
  curr_on_mesa = ifelse(is.na(curr_on_mesa),
    0,
    curr_on_mesa
  ),
  curr_on_aza = ifelse(is.na(curr_on_aza),
    0,
    curr_on_aza
  ),
  curr_on_merc = ifelse(is.na(curr_on_merc),
    0,
    curr_on_merc
  ),
  curr_on_metho = ifelse(is.na(curr_on_metho),
    0,
    curr_on_metho
  ),
  curr_on_ciclo = ifelse(is.na(curr_on_ciclo),
    0,
    curr_on_ciclo
  ),
  curr_on_inflix = ifelse(is.na(curr_on_inflix),
    0,
    curr_on_inflix
  ),
  curr_on_ada = ifelse(is.na(curr_on_ada),
    0,
    curr_on_ada
  ),
  curr_on_goli = ifelse(is.na(curr_on_goli),
    0,
    curr_on_goli
  ),
  curr_on_vedo = ifelse(is.na(curr_on_vedo),
    0,
    curr_on_vedo
  ),
  curr_on_ust = ifelse(is.na(curr_on_ust),
    0,
    curr_on_ust
  )
)

for (i in 1:nrow(drugs)) {
  if (drugs[i, "ParticipantNo"] %in% redcap$participantno) {
    subject.data <- subset(redcap, participantno == drugs[i, "ParticipantNo"])[1, ]
    if (subject.data$curr_on_mesa == 1) drugs[i, "ASA"] <- TRUE

    # Immunosuppressants
    if (subject.data$curr_on_aza == 1 |
      subject.data$curr_on_merc == 1 |
      subject.data$curr_on_metho == 1 |
      subject.data$curr_on_ciclo == 1) {
      drugs[i, "imm"] <- TRUE
    }

    # Biologics
    if (subject.data$curr_on_inflix == 1 |
      subject.data$curr_on_ada == 1 |
      subject.data$curr_on_goli == 1 |
      subject.data$curr_on_vedo == 1 |
      subject.data$curr_on_ust == 1) {
      drugs[i, "bio"] <- TRUE
    }
    if (drugs[i, "imm"] & drugs[i, "bio"]) {
      drugs[i, "category"] <- 4
    } else if (drugs[i, "imm"]) {
      drugs[i, "category"] <- 2
    } else if (drugs[i, "bio"]) {
      drugs[i, "category"] <- 3
    } else if (drugs[i, "ASA"]) {
      drugs[i, "category"] <- 1
    }
  }
}

drugs$category <- factor(drugs$category,
  levels = 1:4,
  labels = c(
    "5-ASA",
    "Mono immunotherapy",
    "Mono biologic",
    "Combo therapy"
  )
)

drugs <- merge(drugs, demo[, c("ParticipantNo", "cat", "diagnosis2")], by = "ParticipantNo")
colnames(drugs)[5] <- "Treatment"

For IBD treatment, subjects are categorised as being on either a 5-ASA, mono immunotherapy, mono biologic, or immuno-biologic combination treatment strategy based upon REDCap data provided by participants’ IBD care teams.

Treatments. in REDCap were selected via check boxes. As such, NA (unticked) is assumed to mean the participant is not on a given medication.

Immunosuppressants are defined as:

Azathioprine
Mercaptopurine
Methotrexate
Ciclosporin

Biologics are defined as:

Infliximab
Adalimumab
Golimumab
Vedolizumab
Ustekinumab

Patients are only categorised as receiving a 5-ASA treatment plan if 5-ASA is the only IBD treatment they are receiving (I.E no immunotherapy or biologics).

As should be expected given poor efficacy in CD, a 5-ASA treatment strategy is much more common for UC/IBDU participants (Figure 7). Mono biologic therapy is the most common treatment strategy for CD subjects.

Code

plt.cols <- c("#52528C", "#DD6E42", "#048A81", "#C585B3")


p <- drugs %>%
  drop_na(cat) %>%
  ggplot(aes(x = Treatment, fill = Treatment, color = Treatment)) +
  geom_bar() +
  theme_minimal() +
  theme(legend.position = "none") +
  xlab("Treatment strategy") +
  ylab("Frequency") +
  scale_fill_manual(values = plt.cols, na.value = "#032B43") +
  scale_colour_manual(
    values = colorspace::darken(plt.cols, 0.3),
    na.value = "#032B43"
  ) +
  facet_grid(rows = vars(diagnosis2))

ggsave("plots/baseline/treat_strat.pdf" , p, width = 10, height = 8)
ggsave("plots/baseline/treat_strat.png" , p, width = 10, height = 8)
p

Figure 7: Reported treatment strategies for the FC cohort.

As we are unable to differentiate between participants which are are no treatment and those for whom treatment data is missing, we will instead use a None reported category.

Code

drugs <- drugs %>%
  mutate(Treatment = if_else(is.na(Treatment), 
                             "None reported",
                             Treatment))
demo <- merge(demo,
              drugs[, c("ParticipantNo", "Treatment")],
              by = "ParticipantNo",
              all.x = TRUE,
              all.y = FALSE,
              sort = FALSE)

Code

pander(with(subset(drugs, diagnosis2 == "CD"), chisq.test(Treatment, cat)))

Table 12: Chi-squared test between treatment strategy and FC groups for participants with Crohn’s disease.

Pearson’s Chi-squared test: `Treatment` and `cat`
Test statistic	df	P value
7.571	8	0.4765

Code

pander(with(subset(drugs, diagnosis2 == "UC/IBDU"), chisq.test(Treatment, cat)))

Table 13: Chi-squared test between treatment strategy and FC groups for participants with Ulcerative colitis/IBDU.

Pearson’s Chi-squared test: `Treatment` and `cat`
Test statistic	df	P value
12.26	8	0.1401

Current medications

Code

findCurrent <- function(x) {
  sum(x == 1)
}


drug.cols <- c("participantno",
               "curr_on_mesa",
               "curr_on_aza",
               "curr_on_merc",
               "curr_on_metho",
               "curr_on_ciclo",
               "curr_on_inflix",
               "curr_on_ada",
               "curr_on_goli",
               "curr_on_vedo",
               "curr_on_ust")

drug.tab <- redcap[, drug.cols]

drug.available <- c()
for (i in 1:nrow(drug.tab)) {
  if (any(drug.tab[i, -1] != 0)) {
    drug.available <- c(drug.available, "yes")
  } else(drug.available <- c(drug.available, "no"))
}

drug.tab$available <- drug.available

Whole cohort

Code

drug.tab.cd <- subset(drug.tab,
                     participantno %in%
                       subset(demo, diagnosis2 == "CD")$ParticipantNo)
drug.tab.uc <- subset(drug.tab,
                     participantno %in%
                       subset(demo, diagnosis2 == "UC/IBDU")$ParticipantNo)




cd.rx.counts <- sapply(drug.tab.cd[, -1], findCurrent) 
uc.rx.counts <- sapply(drug.tab.uc[, -1], findCurrent) 

rx.counts <- cbind(cd.rx.counts, uc.rx.counts)
rx.counts <- rx.counts[-nrow(rx.counts), ] # remove available row 
rownames(rx.counts) <- c("5-ASA",
                         "Azathioprine",
                         "Mercaptopurine",
                         "Methotrexate",
                         "Ciclosporin",
                         "Infliximab",
                         "Adalimumab",
                         "Golimumab",
                         "Vedolizumab",
                         "Ustekinumab")
colnames(rx.counts) <- c("Crohn's disease",
                         "Ulcerative colitis/IBDU")


knitr::kable(rx.counts)

	Crohn’s disease	Ulcerative colitis/IBDU
5-ASA	89	551
Azathioprine	325	218
Mercaptopurine	93	64
Methotrexate	61	14
Ciclosporin	1	1
Infliximab	286	104
Adalimumab	164	29
Golimumab	0	6
Vedolizumab	65	102
Ustekinumab	14	3

FC cohort

Code

fc.ids <- demo %>%
  filter(!is.na(cat)) %>%
  pull(ParticipantNo)


drug.tab.fc <- subset(drug.tab,
                      participantno %in% fc.ids)

drug.tab.fc.cd <- subset(drug.tab.fc,
                     participantno %in%
                       subset(demo, diagnosis2 == "CD")$ParticipantNo)
drug.tab.fc.uc <- subset(drug.tab.fc,
                     participantno %in%
                       subset(demo, diagnosis2 == "UC/IBDU")$ParticipantNo)




cd.rx.counts <- sapply(drug.tab.fc.cd[, -1], findCurrent) 
uc.rx.counts <- sapply(drug.tab.fc.uc[, -1], findCurrent) 

rx.counts <- cbind(cd.rx.counts, uc.rx.counts)
rx.counts <- rx.counts[-nrow(rx.counts), ] # remove available row
rownames(rx.counts) <- c("5-ASA",
                         "Azathioprine",
                         "Mercaptopurine",
                         "Methotrexate",
                         "Ciclosporin",
                         "Infliximab",
                         "Adalimumab",
                         "Golimumab",
                         "Vedolizumab",
                         "Ustekinumab")
colnames(rx.counts) <- c("Crohn's disease",
                         "Ulcerative colitis/IBDU")

knitr::kable(rx.counts)

	Crohn’s disease	Ulcerative colitis/IBDU
5-ASA	75	436
Azathioprine	270	177
Mercaptopurine	78	51
Methotrexate	52	12
Ciclosporin	1	1
Infliximab	241	87
Adalimumab	137	21
Golimumab	0	5
Vedolizumab	55	90
Ustekinumab	10	3

FFQ Cohort

Code

FFQ <- read_xlsx(paste0(
  prefix,
  "predicct ffq_nutrientfood groupDQI all foods_data (n1092)Nov2022.xlsx"
))

FFQ <- FFQ %>%
  filter(!is.na(fibre))

ffq.ids <- fc.ids[fc.ids %in% FFQ$participantno]




drug.tab.ffq <- subset(drug.tab,
                      participantno %in% ffq.ids)

drug.tab.ffq.cd <- subset(drug.tab.ffq,
                     participantno %in%
                       subset(demo, diagnosis2 == "CD")$ParticipantNo)
drug.tab.ffq.uc <- subset(drug.tab.ffq,
                     participantno %in%
                       subset(demo, diagnosis2 == "UC/IBDU")$ParticipantNo)




cd.rx.counts <- sapply(drug.tab.ffq.cd[, -1], findCurrent) 
uc.rx.counts <- sapply(drug.tab.ffq.uc[, -1], findCurrent) 

rx.counts <- cbind(cd.rx.counts, uc.rx.counts)
rx.counts <- rx.counts[-nrow(rx.counts), ] # remove available row
rownames(rx.counts) <- c("5-ASA",
                         "Azathioprine",
                         "Mercaptopurine",
                         "Methotrexate",
                         "Ciclosporin",
                         "Infliximab",
                         "Adalimumab",
                         "Golimumab",
                         "Vedolizumab",
                         "Ustekinumab")
colnames(rx.counts) <- c("Crohn's disease",
                         "Ulcerative colitis/IBDU")

knitr::kable(rx.counts)

	Crohn’s disease	Ulcerative colitis/IBDU
5-ASA	30	223
Azathioprine	111	79
Mercaptopurine	32	23
Methotrexate	16	4
Ciclosporin	0	0
Infliximab	81	27
Adalimumab	69	13
Golimumab	0	4
Vedolizumab	19	29
Ustekinumab	3	2

Bio-naive status

Code

bio.naive <- data.frame(ParticipantNo = redcap$participantno)
bio.naive <- subset(bio.naive, ParticipantNo %in% demo$ParticipantNo)
bio.naive <- merge(bio.naive,
  demo[, c("ParticipantNo", "diagnosis")],
  all.y = TRUE,
  all.x = FALSE
)
bio.naive <- distinct(bio.naive, ParticipantNo, .keep_all = TRUE)
bio.naive$naive <- "yes"

bio.vars <- c("inflix", "ada", "goli", "vedo", "ust")

for (i in 1:nrow(bio.naive)) {
  temp.df <- as.data.frame(subset(redcap, participantno == bio.naive[i, 1]))
  temp.df <- temp.df[, paste0("curr_on_", bio.vars)]
  temp.df <- temp.df[, !is.na(temp.df[1, ])]

  if (any(temp.df == 1)) {
    bio.naive[i, "naive"] <- "no-using"
  } else if (any(temp.df == 2)) {
    bio.naive[i, "naive"] <- "no-not-using"
  } # else naive
}

drugs.eof <- read_xlsx(paste0(prefix, "EOF_drugs.xlsx"))
using <- subset(drugs.eof, AtRecruitmentYN == 1)
bio.list <- seq(4, 8)
using <- subset(using, DrugType %in% bio.list)
using <- subset(demo, ParticipantId %in% using$ParticipantId)
using <- using[, c("ParticipantNo", "diagnosis")]
using$naive <- c("no-using")

drugs <- rbind(bio.naive, using)

for (i in unique(drugs$ParticipantNo)) {
  temp <- subset(drugs, ParticipantNo == i)
  # if "no-using" in either extract then report using
  if (nrow(temp) > 1) {
    if (any(temp$naive == "no-using")) {
      drugs <- subset(drugs, ParticipantNo != i)
      drugs <- rbind(
        drugs,
        data.frame(
          ParticipantNo = i,
          diagnosis = temp[1, "diagnosis"],
          naive = "no-using"
        )
      )
    }
  }
}

drugs <- distinct(drugs, ParticipantNo, .keep_all = TRUE)

demo <- merge(demo,
  drugs[, c(1, 3)],
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)

demo$naive <- factor(demo$naive,
  levels = c("no-using", "no-not-using", "yes"),
  labels = c("Current", "Previously", "Never prescribed")
)

colnames(demo)[colnames(demo) == "naive"] <- "Biologic"

Bio-naive status has been determined using REDCap and EoS data. As clinical teams were not explicitly asked if a subject was bio-naive, the subject is instead defined as being bio-naive if there are no recorded biologics for the subject (which may bias subjects with missing data towards incorrectly being reported as bio-naive). Non bio-naive subjects are categorised further into either “currently using” or “previously used”.

Biologic treatments are defined as infliximab, adalimumab, golimumab, vedolizumab, and ustekinumab.

Small molecule treatments are not considered in this report due to their low prescribing rates during the recruitment period (which ended March 2020).

Code

demo %>%
  drop_na(cat) %>%
  ggplot(aes(x = Biologic, fill = Biologic, color = Biologic)) +
  geom_bar() +
  xlab("Biologic status") +
  ylab("Frequency") +
  theme_minimal() +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#CACF85", "#8CBA80", "#E58F65")) +
  scale_color_manual(values = c("#747825", "#4A6E3F", "#934D1B")) +
  guides(
    fill = guide_legend(title = "Biologic status"),
    color = guide_legend(title = "Biologic status")
  ) +
  facet_grid(rows = vars(diagnosis2))

Bio-naive status was found to be associated with FC groups which may be due to patients with more active disease being more likely to be prescribed a biologic.

Code

pander(chisq.test(demo$Biologic, demo$cat))

Table 14: Chi-squared test between bio-naive status and FC groups.

Pearson’s Chi-squared test: `demo$Biologic` and `demo$cat`
Test statistic	df	P value
16.51	4	0.002406 * *

Smoking

Code

smoking <- read_xlsx(paste0(data.path, "Baseline2022/lifestyle.xlsx")) %>%
  select(ParticipantNo, Smoke, SmokedInPast)
smoking$Smoke <- plyr::mapvalues(smoking$Smoke, from = 2, to = 3)

for (i in 1:nrow(smoking)) {
  if ((!is.na(smoking[i, "SmokedInPast"]) & smoking[i, "SmokedInPast"] == 1) & (!is.na(smoking[i, "Smoke"]) & smoking[i, "Smoke"] == 3)) {
    smoking[i, "Smoke"] <- 2
  }
}

smoking$Smoke <- factor(smoking$Smoke,
  levels = seq(1, 3),
  labels = c("Current", "Previous", "Never")
)

smoking <- smoking[, -3]


demo <- merge(demo,
  smoking,
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)

esmoking <- read_xlsx(paste0(data.path, "Baseline2022/lifestyle.xlsx")) %>%
  select(ParticipantNo, ECigs, ECigsPast)
esmoking$ECigs <- plyr::mapvalues(esmoking$ECigs, from = 2, to = 3)

for (i in 1:nrow(esmoking)) {
  if ((!is.na(esmoking[i, "ECigsPast"] == 1) & esmoking[i, "ECigsPast"] == 1) &
      (!is.na(esmoking[i, "ECigs"]) & esmoking[i, "ECigs"] == 3)) {
    esmoking[i, "ECigs"] <- 2
  }
}

esmoking$ECigs <- factor(esmoking$ECigs,
  levels = seq(1, 3),
  labels = c("Current", "Previous", "Never")
)

esmoking <- esmoking[, -3]


demo <- merge(demo,
  esmoking,
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)

Code

demo %>%
  drop_na(cat) %>%
  ggplot(aes(x = Smoke, fill = Smoke, color = Smoke)) +
  geom_bar() +
  xlab("Smoking status") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(
    values = c("#4E0250", "#517664", "#1C448E"),
    na.value = "#032B43"
  ) +
  scale_color_manual(
    values = c("#3D003F", "#385245", "#033070"),
    na.value = "#032B43"
  ) +
  guides(
    fill = guide_legend(title = "Smoking status"),
    color = guide_legend(title = "Smoking status")
  )

Figure 9: Smoking status for subjects in the FC cohort.

Smoking status is not associated with FC at recruitment across the entire FC cohort.

Code

pander(chisq.test(demo$Smoke, demo$cat))

Table 15: Chi-squared test between between smoking status and FC groups.

Pearson’s Chi-squared test: `demo$Smoke` and `demo$cat`
Test statistic	df	P value
8.662	4	0.07013

E-cigarette use

The vast majority of participants do not report using some form of e-cigarettes. It should be noted that as recruitment ended March 2020, these day may not reflect more modern trends.

Code

demo.cd %>%
  drop_na(cat) %>%
  ggplot(aes(x = ECigs, fill = ECigs, color = ECigs)) +
  geom_bar() +
  xlab("E-cigarette status") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(
    values = c("#4E0250", "#517664", "#1C448E"),
    na.value = "#032B43"
  ) +
  scale_color_manual(
    values = c("#3D003F", "#385245", "#033070"),
    na.value = "#032B43"
  ) +
  guides(
    fill = guide_legend(title = "E-cigarette status"),
    color = guide_legend(title = "E-cigarette status")
  )

Figure 10: E-cigarette usage for subjects in the FC cohort.

Code

pander(chisq.test(demo.cd$ECigs, demo.cd$cat))

Table 16: Chi-squared test between between E-cigarettte usage and FC groups.

Pearson’s Chi-squared test: `demo.cd$ECigs` and `demo.cd$cat`
Test statistic	df	P value
5.96	4	0.2022

Variables only relevant to Crohn’s disease subjects

This section looks at data only applicable to subjects with CD. The following variables are considered: surgery, Montreal location, Montreal behaviour, Harvey-Bradshaw index, and smoking.

Surgery

Code

predicct.surgery <- read_xlsx(paste0(
  data.path,
  "Baseline2022/IBD.xlsx"
))[, c("ParticipantNo", "HadSurgeryForIBD")]
colnames(predicct.surgery)[2] <- "Surgery"


redcap.surgery <- redcap[, c("participantno", "surgery")]


names(redcap.surgery) <- c("ParticipantNo", "Surgery")

surgery <- rbind(redcap.surgery, predicct.surgery)
surgery <- subset(surgery, !is.na(surgery$Surgery))

# As REDCap is first, this will be prioritised over subject (if not NA)
surgery <- distinct(surgery, ParticipantNo, .keep_all = TRUE)

demo.cd <- subset(demo, diagnosis2 == "CD")
demo.cd <- merge(demo.cd, surgery, all.x = TRUE, all.y = FALSE)
demo.cd$Surgery <- factor(demo.cd$Surgery,
  levels = c(2, 1),
  labels = c("No", "Yes")
)

Surgical data has been obtained from both REDCap and patient-completed questionnaires. Where data are available from both data sources, REDCap is the preferred data source.

Code

demo.cd %>%
  drop_na(cat) %>%
  ggplot(aes(x = Surgery, fill = Surgery, color = Surgery)) +
  geom_bar() +
  xlab("Previous surgery") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(values = c("#F9B9B7", "#96C9DC"), na.value = "#032B43") +
  scale_color_manual(values = c("#D06965", "#448CA2"), na.value = "#032B43")

Figure 11: Previous surgery for Crohn’s disease subjects in the FC cohort.

FC at study recruitment is significantly associated with having previously undergone surgery for CD.

Code

pander(chisq.test(demo.cd$Surgery, demo.cd$cat))

Table 17: Chi-squared test between previous surgery and FC groups.

Pearson’s Chi-squared test: `demo.cd$Surgery` and `demo.cd$cat`
Test statistic	df	P value
7.389	2	0.02486 *

Montreal location

Code

mapping <- data.frame(
  code = seq(1, 6),
  definition = c(
    "Oesophago-gastric",
    "Duodenal",
    "Jejunal",
    "Ileal",
    "Colonic",
    "Rectal"
  )
)

cd.location <- redcap[, c("participantno", paste0("mac_extent___", 1:6))]
L4 <- rep(0, nrow(cd.location))
Location <- rep(NA, nrow(cd.location))

for (i in 1:nrow(cd.location)) {
  # If Oesophago-gastric, Duodenal, Jejunal, then L4 is present
  if (any(cd.location[i, 2:4] == 1)) L4[i] <- 1
  if (cd.location[i, 5] == 1 & cd.location[i, 6] == 1) {
    Location[i] <- 3 # Ileal-colonic
  } else if (cd.location[i, 5] == 1 & cd.location[i, 6] == 0) {
    Location[i] <- 1 # Ileal only
  } else if (cd.location[i, 5] == 0 & cd.location[i, 6] == 1) {
    Location[i] <- 2 # Colonic only
  }
  if (is.na(Location[i]) & L4[i] == 1) {
    Location[i] <- 4 # L4 Only
  }
}

L4[is.na(Location) & L4 == 0] <- NA

mont.df <- data.frame(
  cd.location[, "participantno"],
  Location,
  L4
)
colnames(mont.df)[1] <- "ParticipantNo"
demo.cd <- merge(demo.cd, mont.df,
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)

demo.cd$Location <- factor(demo.cd$Location,
  levels = c(1, 2, 3, 4),
  labels = c("L1", "L2", "L3", "L4 only")
)

demo.cd$L4 <- factor(demo.cd$L4,
  levels = c(0, 1),
  labels = c("Not present", "Present")
)

Montreal location has been obtained from REDcap data. Rather than being asked for Montreal location directly, clinical teams were asked to provide more granular detail by ticking boxes if inflammation was in any of the following areas Oesophago-gastric, Duodenal, Jejunal, Ileal, Colonic, Rectal. If only any of the first 3 were reported then the subject was assigned “L4 only”. L1-L3 were assigned as per convention (Silverberg et al. 2005). If no location was reported then location was assigned NA.

As can be seen in Figure 12, L3 is the most common Montreal location in this cohort with roughly an equal amount of subjects having either L1 or L2.

Code

demo.cd %>%
  drop_na(cat) %>%
  ggplot(aes(x = Location, fill = Location, color = Location)) +
  geom_bar() +
  xlab("Montreal location") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(
    values = c("#33CA7F", "#96C9DC", "#C33C54", "#FCB97D"),
    na.value = "#032B43"
  ) +
  scale_color_manual(
    values = c("#318859", "#448CA2", "#80333E", "#BD7700"),
    na.value = "#032B43"
  ) +
  guides(
    fill = guide_legend(title = "Montreal location"),
    color = guide_legend(title = "Montreal location")
  )

Figure 12: Montreal location for CD subjects in the FC cohort.

Due to the low number of subjects with L4-only, Fisher’s exact test has been used to compare with FC.

Code

pander(fisher.test(demo.cd$Location, demo.cd$cat, workspace = 20000000))

Table 18: Fisher’s exact test between Montreal location and FC groups.

Fisher’s Exact Test for Count Data: `demo.cd$Location` and `demo.cd$cat`
P value	Alternative hypothesis
0.6059	two.sided

In addition to L4-only being considered, we also investigate L4 as a modifier.

Code

demo.cd %>%
  drop_na(cat) %>%
  ggplot(aes(x = L4, fill = L4, color = L4)) +
  geom_bar() +
  xlab("Presence of L4") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(values = c("#51E5FF", "#EC368D"), na.value = "#032B43") +
  scale_color_manual(values = c("#079DB1", "#AF0863"), na.value = "#032B43") +
  guides(
    fill = guide_legend(title = "Presence of L4"),
    color = guide_legend(title = "Presence of L4")
  )

Figure 13: L4 (upper GI involvement) in the FC cohort.

Code

pander(chisq.test(demo.cd$L4, demo.cd$cat))

Table 19: Chi-squared test between L4 and FC groups.

Pearson’s Chi-squared test: `demo.cd$L4` and `demo.cd$cat`
Test statistic	df	P value
5.611	2	0.06047

Montreal behaviour

Code

cd.behaviour <- redcap[, c("participantno", "behaviour", "perianal")]
colnames(cd.behaviour) <- c("ParticipantNo", "Behaviour", "Perianal")
cd.behaviour$Perianal[!is.na(cd.behaviour$Perianal) & cd.behaviour$Perianal == 3] <- NA


demo.cd <- merge(demo.cd,
  cd.behaviour,
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)

demo.cd$Behaviour <- factor(demo.cd$Behaviour,
  levels = c(1, 2, 3),
  labels = c("B1", "B2", "B3")
)

Montreal behaviour has been obtained from REDCap data and was recorded conventionally. The majority of CD subjects have the B1 (inflammatory) phenotype.

Code

demo.cd %>%
  drop_na(cat) %>%
  ggplot(aes(x = Behaviour, fill = Behaviour, color = Behaviour)) +
  geom_bar() +
  xlab("Montreal behaviour") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(
    values = c("#7AE7C7", "#440381", "#D64045"),
    na.value = "#032B43"
  ) +
  scale_color_manual(
    values = c("#07A282", "#340165", "#942E31"),
    na.value = "#032B43"
  ) +
  guides(
    fill = guide_legend(title = "Montreal behaviour"),
    color = guide_legend(title = "Montreal behaviour")
  )

Figure 14: Montreal behaviour for CD subjects in the FC cohort.

Montreal behaviour is not significantly associated with FC at recruitment.

Code

pander(chisq.test(demo.cd$Behaviour, demo.cd$cat))

Table 20: Chi-squared test between Montreal behaviour and FC groups.

Pearson’s Chi-squared test: `demo.cd$Behaviour` and `demo.cd$cat`
Test statistic	df	P value
5.136	4	0.2736

Approximately a third of CD subjects are reported to have perianal disease.

Code

demo.cd$Perianal <- factor(demo.cd$Perianal,
  levels = c(2, 1),
  labels = c("No", "Yes")
)

demo.cd %>%
  drop_na(cat) %>%
  ggplot(aes(x = Perianal, fill = Perianal, color = Perianal)) +
  geom_bar() +
  xlab("Perianal disease") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(values = c("#023C40", "#AF5B5B"), na.value = "#032B43") +
  scale_color_manual(values = c("#002C30", "#754242"), na.value = "#032B43") +
  guides(
    fill = guide_legend(title = "Perianal disease"),
    color = guide_legend(title = "Perianal disease")
  )

Figure 15: Perianal disease for CD subjects in the FC cohort.

Perianal disease is not associated with FC at recruitment.

Code

pander(chisq.test(demo.cd$Perianal, demo.cd$cat))

Table 21: Chi-squared test between between perianal disease and FC groups.

Pearson’s Chi-squared test: `demo.cd$Perianal` and `demo.cd$cat`
Test statistic	df	P value
1.305	2	0.5207

Harvey-Bradshaw Index

Code

HBI.df <- redcap[, c("participantid", "current_hb")]
colnames(HBI.df) <- c("ParticipantId", "HBI")
HBI.df <- subset(HBI.df, ParticipantId %in% demo.cd$ParticipantId)

demo.cd <- merge(demo.cd, HBI.df, by = "ParticipantId", all.x = TRUE)

temp <- demo.cd %>%
  drop_na(cat)
percHBI <- round(sum(is.na(temp$HBI)) / nrow(temp), 2) * 100

Whilst HBI was collected via REDCap, substantial missingness is observed with 60% of CD subjects in the FC sub-cohort missing HBI.

Code

demo.cd %>%
  drop_na(cat) %>%
  ggplot(aes(x = as.factor(HBI))) +
  geom_bar(fill = "#A167A5", color = "#4A306D") +
  theme_minimal() +
  ylab("Frequency") +
  xlab("Harvey-Bradshaw index at time of recruitment")

Figure 16: Harvey-Bradshaw index for CD subjects in the FC cohort.

Code

pander(summary(aov(HBI ~ cat, data = demo.cd)))

Table 22: ANOVA between between Harvey-Bradshaw index and FC groups.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	13.1	6.548	1.174	0.3102
Residuals	441	2461	5.579	NA	NA

PRO-2 in Crohn’s disease

Sum of liquid stools per day and abdominal pain.

Code

PRO2 <- redcap[, c("participantno", "liq_stool_day", "abdo_pain")]
PRO2$liq_stool_day <- as.numeric(PRO2$liq_stool_day) # Drop ileostomies

PRO2$PRO2 <- with(PRO2, liq_stool_day + abdo_pain)

PRO2$ParticipantNo <- PRO2$participantno
demo.cd <- merge(demo.cd,
  PRO2[, c("ParticipantNo", "PRO2")],
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)

demo.cd %>%
  ggplot(aes(x = as.factor(PRO2))) +
  geom_bar(fill = "#FF8552", color = "#D2632A") +
  scale_fill_manual(na.value = "#032B43") +
  scale_color_manual(na.value = "#032B43") +
  theme_minimal()

Code

pander(summary(aov(PRO2 ~ cat, data = demo.cd)))

Table 23: ANOVA between between PRO2 and FC groups in Crohn’s disease.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	4.8	2.4	0.6194	0.5386
Residuals	669	2592	3.875	NA	NA

Smoking

Code

percPrev <- round(sum(temp$Smoke == "Previous", na.rm = TRUE) / nrow(temp), 2) * 100

We consider smoking for CD and UC separately as smoking is often reported to worsen outcomes in CD but is possibly protective in UC. Smoking status is self-reported by study participants.

Whilst most subjects have never been a smoker, a substantial proportion of subjects (27%) have previously been smokers.

Code

demo.cd %>%
  drop_na(cat) %>%
  ggplot(aes(x = Smoke, fill = Smoke, color = Smoke)) +
  geom_bar() +
  xlab("Smoking status") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(
    values = c("#4E0250", "#517664", "#1C448E"),
    na.value = "#032B43"
  ) +
  scale_color_manual(
    values = c("#3D003F", "#385245", "#033070"),
    na.value = "#032B43"
  ) +
  guides(
    fill = guide_legend(title = "Smoking status"),
    color = guide_legend(title = "Smoking status")
  )

Figure 17: Smoking status for CD subjects in the FC cohort.

Smoking status is associated with FC at recruitment for CD subjects.

Code

pander(chisq.test(demo.cd$Smoke, demo.cd$cat))

Table 24: Chi-squared test between between smoking status and FC groups for CD subjects.

Pearson’s Chi-squared test: `demo.cd$Smoke` and `demo.cd$cat`
Test statistic	df	P value
11.08	4	0.02572 *

E-cigarette use

The vast majority of participants do not report using some form of e-cigarettes. It should be noted that as recruitment ended March 2020, these day may not reflect more modern trends.

Code

demo.cd %>%
  drop_na(cat) %>%
  ggplot(aes(x = ECigs, fill = ECigs, color = ECigs)) +
  geom_bar() +
  xlab("E-cigarette status") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(
    values = c("#4E0250", "#517664", "#1C448E"),
    na.value = "#032B43"
  ) +
  scale_color_manual(
    values = c("#3D003F", "#385245", "#033070"),
    na.value = "#032B43"
  ) +
  guides(
    fill = guide_legend(title = "E-cigarette status"),
    color = guide_legend(title = "E-cigarette status")
  )

Figure 18: E-cigarette usage for CD subjects in the FC cohort.

Code

pander(chisq.test(demo.cd$ECigs, demo.cd$cat))

Table 25: Chi-squared test between between E-cigarettte usage and FC groups for CD subjects.

Pearson’s Chi-squared test: `demo.cd$ECigs` and `demo.cd$cat`
Test statistic	df	P value
5.96	4	0.2022

Variables only relevant to ulcerative colitis/IBDU subjects

This section looks at variables only relevant to UC/IBDU subjects: Montreal extent, Mayo score, Smoking status and E-cigarette use.

As PREdiCCt required subjects to be able to flare, this effectively excluded UC subjects who had undergone a proctocolectomy. As such, previous surgery is not considered. Montreal severity was not collected by PREdiCCt.

Montreal extent

Code

mapping <- data.frame(
  code = seq(1, 6),
  definition = c(
    "Rectum",
    "Recto-sigmoid",
    "< Splenic",
    "< Hepatic",
    "Total",
    "Unknown"
  ),
  mont = c(1, 2, 2, 3, 3, NA)
)

demo.uc <- subset(demo, diagnosis2 == "UC/IBDU")
redcap.uc <- subset(redcap, participantno %in% demo.uc$ParticipantNo)
names(redcap.uc)[names(redcap.uc) == "participantno"] <- "ParticipantNo"
redcap.uc <- redcap.uc[, c("ParticipantNo", "max_extent")]
redcap.uc$Extent <- mapvalues(redcap.uc$max_extent,
  from = mapping$code,
  to = mapping$mont
)

demo.uc <- merge(demo.uc,
  redcap.uc[, c("ParticipantNo", "Extent")],
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)
demo.uc$Extent <- factor(demo.uc$Extent,
  levels = seq(1, 3),
  labels = c("E1", "E2", "E3")
)

Montreal extent has been obtained from REDCap. Similar to Montreal location, the extent of inflammation has been reported in more granular detail than is required for the Montreal classification. The below table shows how these definitions have been mapped to Montreal extent.

Code

kable(mapping,
  col.names = c("Coding", "Definition", "Montreal extent"),
  align = c("l", "c", "c")
)

Coding	Definition	Montreal extent
1	Rectum	1
2	Recto-sigmoid	2
3	< Splenic	2
4	< Hepatic	3
5	Total	3
6	Unknown	NA

As can be seen in Figure 19 E2 is the most common Montreal extent classification.

Code

demo.uc %>%
  drop_na(cat) %>%
  ggplot(aes(x = Extent, fill = Extent, color = Extent)) +
  geom_bar() +
  xlab("Montreal extent") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(
    values = c("#7AE7C7", "#440381", "#D64045"),
    na.value = "#032B43"
  ) +
  scale_color_manual(
    values = c("#07A282", "#340165", "#942E31"),
    na.value = "#032B43"
  ) +
  guides(
    fill = guide_legend(title = "Montreal extent"),
    color = guide_legend(title = "Montreal extent")
  )

Figure 19: Montreal extent for UC/IBDU subjects in the FC cohort.

Montreal extent is not associated with FC

Code

pander(chisq.test(demo.uc$Extent, demo.uc$cat))

Table 26: Chi-squared test between Montreal extent and FC groups.

Pearson’s Chi-squared test: `demo.uc$Extent` and `demo.uc$cat`
Test statistic	df	P value
2.281	4	0.6842

Mayo score

Mayo score was collected via REDCap and is more complete than HBI.

Code

mayo.df <- redcap[, c("participantid", "current_mayo")]
colnames(mayo.df) <- c("ParticipantId", "Mayo")
mayo.df <- subset(mayo.df, ParticipantId %in% demo.uc$ParticipantId)

demo.uc <- merge(demo.uc, mayo.df, by = "ParticipantId", all.x = TRUE)
demo.uc %>%
  drop_na(cat) %>%
  ggplot(aes(x = as.factor(Mayo))) +
  geom_bar(fill = "#284B63", color = "#153243") +
  theme_minimal() +
  ylab("Frequency") +
  xlab("Mayo score at time of recruitment")

Mayo score is highly significantly associated with FC.

Code

pander(summary(aov(Mayo ~ cat, data = demo.uc)))

Table 27: ANOVA between between Mayo score and FC groups.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	25.2	12.6	8.212	0.0002973
Residuals	732	1123	1.534	NA	NA

PRO-2 in ulcerative colitis

Sum of stool frequency and rectal bleeding categories.

Code

PRO2 <- redcap[, c("participantno", "stool_freq", "rectal_bleed")]

PRO2$PRO2 <- with(PRO2, stool_freq + rectal_bleed)

PRO2$ParticipantNo <- PRO2$participantno
demo.uc <- merge(demo.uc,
  PRO2[, c("ParticipantNo", "PRO2")],
  by = "ParticipantNo",
  all.x = TRUE,
  all.y = FALSE
)

demo.uc %>%
  ggplot(aes(x = as.factor(PRO2), fill = diagnosis2, color = diagnosis2)) +
  geom_bar() +
  scale_fill_manual(values = "#FF8552", na.value = "#032B43") +
  scale_color_manual(values = "#D2632A", na.value = "#032B43") +
  theme_minimal() +
  theme(legend.position = "none")

Code

pander(summary(aov(PRO2 ~ cat, data = demo.uc)))

Table 28: ANOVA between between PRO2 and FC groups in ulcerative colitis/IBDU.

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cat	2	7.752	3.876	5.049	0.006645
Residuals	728	558.9	0.7678	NA	NA

Smoking

As with CD subjects, smoking status for UC/IBDU subjects is self-reported.

Code

demo.uc %>%
  drop_na(cat) %>%
  ggplot(aes(x = Smoke, fill = Smoke, color = Smoke)) +
  geom_bar() +
  xlab("Smoking status") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(
    values = c("#4E0250", "#517664", "#1C448E"),
    na.value = "#032B43"
  ) +
  scale_color_manual(
    values = c("#3D003F", "#385245", "#033070"),
    na.value = "#032B43"
  ) +
  guides(
    fill = guide_legend(title = "Smoking status"),
    color = guide_legend(title = "Smoking status")
  )

Figure 21: Smoking status for UC/IBDU subjects in the FC cohort.

As with CD, smoking status is significantly associated with FC.

Code

pander(chisq.test(demo.uc$Smoke, demo.uc$cat))

Table 29: Chi-squared test between between smoking status and FC groups for UC subjects.

Pearson’s Chi-squared test: `demo.uc$Smoke` and `demo.uc$cat`
Test statistic	df	P value
9.781	4	0.04427 *

E-cigarette use

Similar to CD, very few subjects reported using E-cigarettes.

Code

demo.uc %>%
  drop_na(cat) %>%
  ggplot(aes(x = ECigs, fill = ECigs, color = ECigs)) +
  geom_bar() +
  xlab("E-cigarette status") +
  ylab("Frequency") +
  theme_minimal() +
  scale_fill_manual(
    values = c("#4E0250", "#517664", "#1C448E"),
    na.value = "#032B43"
  ) +
  scale_color_manual(
    values = c("#3D003F", "#385245", "#033070"),
    na.value = "#032B43"
  ) +
  guides(
    fill = guide_legend(title = "E-cigarette status"),
    color = guide_legend(title = "E-cigarette status")
  )

Figure 22: E-cigarette usage for UC subjects in the FC cohort.

E-cigarette use was found to significantly differ between FC groups in UC. However, the low number of subjects reported to use E-cigarettes (Figure 22) should be beared in mind when interpretting this finding.

Code

pander(fisher.test(demo.uc$ECigs, demo.uc$cat))

Table 30: Chi-squared test between between E-cigarettte usage and FC groups for UC subjects.

Fisher’s Exact Test for Count Data: `demo.uc$ECigs` and `demo.uc$cat`
P value	Alternative hypothesis
0.0196 *	two.sided

Missingness

Code

demo %>%
  select(
  `IBD Duration`,
  control_8,
  vas_control,
  Smoke,
  ECigs, 
  SiteName
  ) %>%
missing_plot2(title = "Disease phenotyping missingness")

Code

demo.cd %>%
  select(
  Location,
  L4,
  Behaviour,
  Perianal,
  HBI, 
  PRO2,
  SiteName
  ) %>%
missing_plot2(title = "Disease phenotyping missingness (CD)")

Code

demo.uc %>%
  select(
  Extent,
  Mayo,
  PRO2,
  SiteName
  ) %>%
missing_plot2(title = "Disease phenotyping missingness (UC/IBDU)")

Code

saveRDS(demo, paste0(outdir, "demo-IBD.RDS"))
saveRDS(demo.cd, paste0(outdir, "demo-cd.RDS"))
saveRDS(demo.uc, paste0(outdir, "demo-uc.RDS"))

Reproducibility & reproduction

Session info

R version 4.4.0 (2024-04-24)

Platform: aarch64-unknown-linux-gnu

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: DiagrammeRsvg(v.0.1), DiagrammeR(v.1.0.11), pander(v.0.6.5), knitr(v.1.47), table1(v.1.4.3), patchwork(v.1.2.0), datefixR(v.1.6.1), readxl(v.1.4.3), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), dplyr(v.1.1.4), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), ggplot2(v.3.5.1), tidyverse(v.2.0.0) and plyr(v.1.8.9)

loaded via a namespace (and not attached): tidyselect(v.1.2.1), finalfit(v.1.0.7), viridisLite(v.0.4.2), farver(v.2.1.2), viridis(v.0.6.5), fastmap(v.1.2.0), rpart(v.4.1.23), digest(v.0.6.35), timechange(v.0.3.0), lifecycle(v.1.0.4), survival(v.3.5-8), magrittr(v.2.0.3), compiler(v.4.4.0), rlang(v.1.1.3), tools(v.4.4.0), utf8(v.1.2.4), yaml(v.2.3.8), labeling(v.0.4.3), htmlwidgets(v.1.6.4), curl(v.5.2.1), RColorBrewer(v.1.1-3), withr(v.3.0.0), nnet(v.7.3-19), grid(v.4.4.0), fansi(v.1.0.6), jomo(v.2.7-6), colorspace(v.2.1-0), mice(v.3.16.0), scales(v.1.3.0), iterators(v.1.0.14), MASS(v.7.3-60.2), cli(v.3.6.2), rmarkdown(v.2.27), ragg(v.1.3.2), generics(v.0.1.3), tzdb(v.0.4.0), visNetwork(v.2.1.2), minqa(v.1.2.7), splines(v.4.4.0), cellranger(v.1.1.0), vctrs(v.0.6.5), V8(v.4.4.2), boot(v.1.3-30), glmnet(v.4.1-8), Matrix(v.1.7-0), jsonlite(v.1.8.8), hms(v.1.1.3), mitml(v.0.4-5), Formula(v.1.2-5), systemfonts(v.1.3.1), foreach(v.1.5.2), glue(v.1.7.0), pan(v.1.9), nloptr(v.2.0.3), codetools(v.0.2-20), stringi(v.1.8.4), gtable(v.0.3.5), shape(v.1.4.6.1), lme4(v.1.1-35.3), munsell(v.0.5.1), pillar(v.1.9.0), htmltools(v.0.5.8.1), R6(v.2.5.1), textshaping(v.0.4.0), evaluate(v.0.23), lattice(v.0.22-6), backports(v.1.5.0), broom(v.1.0.6), Rcpp(v.1.0.12), gridExtra(v.2.3), nlme(v.3.1-164), xfun(v.0.44) and pkgconfig(v.2.0.3)

Licensed by CC BY unless otherwise stated.

References

Bodger, Keith, Clare Ormerod, Daniela Shackcloth, and Melanie Harrison. 2013. “Development and Validation of a Rapid, Generic Measure of Disease Control from the Patient’s Perspective: The IBD-Control Questionnaire.” Gut 63 (7): 1092–102. https://doi.org/10.1136/gutjnl-2013-305600.

Silverberg, Mark S., Jack Satsangi, Tariq Ahmad, Ian DR Arnott, Charles N. Bernstein, Steven R. Brant, Renzo Caprilli, et al. 2005. “Toward an Integrated Clinical, Molecular and Serological Classification of Inflammatory Bowel Disease: Report of a Working Party of the 2005 Montreal World Congress of Gastroenterology.” Canadian Journal of Gastroenterology and Hepatology 19 (September): 5A–36A. https://doi.org/10.1155/2005/269076.