Overall tables

Author: Nathan Constantine-Cooke (Institute of Genetics and Cancer)

Published: January 13, 2026

Introduction

Code

set.seed(123)

############## 
## Packages ##
##############

library(plyr) # Used for mapping values
suppressPackageStartupMessages(library(tidyverse)) # ggplot2, dplyr, and magrittr
library(readxl) # Read in Excel files
library(lubridate) # Handle dates
library(datefixR) # Standardise dates
library(patchwork) # Arrange ggplots

# Generate tables
suppressPackageStartupMessages(library(table1))
library(knitr)
library(pander)

# Generate flowchart of cohort derivation
library(DiagrammeR)
library(DiagrammeRsvg)

# paths to PREdiCCt data
if (file.exists("/docker")) { # If running in docker
  data.path <- "data/final/20221004/"
  redcap.path <- "data/final/20231030/"
  prefix <- "data/end-of-follow-up/"
  outdir <- "data/processed/"
} else { # Run on OS directly
  data.path <- "/Volumes/igmm/cvallejo-predicct/predicct/final/20221004/"
  redcap.path <- "/Volumes/igmm/cvallejo-predicct/predicct/final/20231030/"
  prefix <- "/Volumes/igmm/cvallejo-predicct/predicct/end-of-follow-up/"
  outdir <- "/Volumes/igmm/cvallejo-predicct/predicct/processed/"
}

demo <- readRDS(paste0(outdir, "demo-diet.RDS"))
FFQ <- read_xlsx(paste0(
  prefix,
  "predicct ffq_nutrientfood groupDQI all foods_data (n1092)Nov2022.xlsx"
))

This page presents key demographic and phenotypic tables for the cohorts, together with statistical tests of whether participant characteristics differ across cohorts.

In addition to the FC cohort (PREdiCCt subjects with a baseline faecal calprotectin (FC) measurement), there is the FFQ cohort, which consists of subjects with an analysed food frequency questionnaire (FFQ). Most subjects in the FFQ cohort (1012 of 1091) also have a baseline FC and are therefore also in the FC cohort. The remaining 79 have no baseline FC (see the FC row of the table below).
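The overlap between the two cohorts can be checked by cross-tabulating the two membership rules used in the code below: `drop_na(cat)` for the FC cohort and `ParticipantNo %in% FFQ$participantno` for the FFQ cohort. The sketch below uses a small made-up data frame in base R; `demo_toy` and `ffq_ids` are hypothetical stand-ins for `demo` and `FFQ$participantno`. On the real data, the final count would be expected to match the 79 FFQ subjects listed as missing FC in the table below.

```r
# Hypothetical stand-in for `demo`: `cat` is the baseline FC group (NA = no FC)
demo_toy <- data.frame(
  ParticipantNo = 1:6,
  cat = c("FC < 50", NA, "FC 50-250", "FC > 250", NA, "FC < 50")
)
# Hypothetical stand-in for FFQ$participantno
ffq_ids <- c(2, 3, 4)

# Membership rules mirroring drop_na(cat) and %in% FFQ$participantno
in_fc <- !is.na(demo_toy$cat)
in_ffq <- demo_toy$ParticipantNo %in% ffq_ids

# Cross-tabulate FC membership against FFQ membership
print(table(FC = in_fc, FFQ = in_ffq))

# FFQ participants who would not be in the FC cohort
n_ffq_without_fc <- sum(in_ffq & !in_fc)
n_ffq_without_fc
```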

Table of baseline data by cohort

Code

my.render.cont <- function(x) {
  with(
    stats.apply.rounding(stats.default(x),
      digits = 3,
      round.integers = FALSE
    ),
    c("", "Median (IQR)" = sprintf("%s (%s - %s)", MEDIAN, Q1, Q3))
  )
}

demo$control_8 <- as.numeric(demo$control_8)


comp <- demo
comp$cohort <- "All"

temp <- demo %>%
  drop_na(cat)
temp$cohort <- "FC"

comp <- rbind(comp, temp)

temp <- subset(demo, ParticipantNo %in% FFQ$participantno)
temp$cohort <- "FFQ"

comp <- rbind(comp, temp)

comp$cohort <- factor(comp$cohort,
  levels = c("All", "FC", "FFQ"),
  labels = c("Full cohort", "FC cohort", "FFQ cohort")
)

table1(
  ~ Age +
    Sex +
    Ethnicity +
    BMIcat +
    diagnosis +
    `IBD Duration` +
    as.numeric(IMD) +
    Smoke +
    ECigs +
    control_8 +
    vas_control +
    FC +
    CReactiveProtein +
    Haemoglobin +
    WCC +
    Albumin +
    Meat_sum +
    fibre +
    PUFA_percEng +
    NOVAScore_cat +
    Biologic | cohort,
  data = comp,
  render.continuous = my.render.cont,
  overall = FALSE
)

	Full cohort (N=2629)	FC cohort (N=2144)	FFQ cohort (N=1091)
Age
Median (IQR)	44.0 (32.0 - 56.0)	44.0 (32.0 - 57.0)	47.0 (35.0 - 58.0)
Sex
Male	1207 (45.9%)	966 (45.1%)	476 (43.6%)
Female	1422 (54.1%)	1178 (54.9%)	615 (56.4%)
Ethnicity
White	1862 (70.8%)	1657 (77.3%)	1039 (95.2%)
Non-white	71 (2.7%)	57 (2.7%)	24 (2.2%)
Missing	696 (26.5%)	430 (20.1%)	28 (2.6%)
BMIcat
Underweight	49 (1.9%)	38 (1.8%)	20 (1.8%)
Normal	1004 (38.2%)	840 (39.2%)	460 (42.2%)
Overweight	923 (35.1%)	751 (35.0%)	398 (36.5%)
Obese	552 (21.0%)	431 (20.1%)	191 (17.5%)
Missing	101 (3.8%)	84 (3.9%)	22 (2.0%)
diagnosis
CD	1370 (52.1%)	1118 (52.1%)	530 (48.6%)
UC	1174 (44.7%)	950 (44.3%)	523 (47.9%)
IBDU	85 (3.2%)	76 (3.5%)	38 (3.5%)
IBD Duration
Median (IQR)	10.0 (4.73 - 18.6)	9.80 (4.60 - 18.6)	10.3 (4.92 - 20.1)
Missing	151 (5.7%)	120 (5.6%)	53 (4.9%)
as.numeric(IMD)
Median (IQR)	4.00 (2.00 - 5.00)	4.00 (2.00 - 5.00)	4.00 (3.00 - 5.00)
Missing	30 (1.1%)	22 (1.0%)	9 (0.8%)
Smoke
Current	122 (4.6%)	109 (5.1%)	63 (5.8%)
Previous	721 (27.4%)	627 (29.2%)	399 (36.6%)
Never	1071 (40.7%)	965 (45.0%)	597 (54.7%)
Missing	715 (27.2%)	443 (20.7%)	32 (2.9%)
ECigs
Current	80 (3.0%)	70 (3.3%)	38 (3.5%)
Previous	61 (2.3%)	53 (2.5%)	27 (2.5%)
Never	1770 (67.3%)	1575 (73.5%)	994 (91.1%)
Missing	718 (27.3%)	446 (20.8%)	32 (2.9%)
control_8
Median (IQR)	13.0 (11.0 - 15.0)	13.0 (11.0 - 15.0)	13.0 (11.0 - 15.0)
Missing	718 (27.3%)	447 (20.8%)	38 (3.5%)
vas_control
<85	628 (23.9%)	569 (26.5%)	320 (29.3%)
85+	1275 (48.5%)	1121 (52.3%)	730 (66.9%)
Missing	726 (27.6%)	454 (21.2%)	41 (3.8%)
FC
Median (IQR)	49.0 (20.0 - 161)	49.0 (20.0 - 161)	49.0 (20.0 - 161)
Missing	485 (18.4%)	0 (0%)	79 (7.2%)
CReactiveProtein
Median (IQR)	2.00 (1.00 - 5.00)	2.00 (1.00 - 5.00)	2.00 (1.00 - 5.00)
Missing	637 (24.2%)	506 (23.6%)	280 (25.7%)
Haemoglobin
Median (IQR)	139 (130 - 147)	138 (129 - 147)	139 (130 - 146)
Missing	469 (17.8%)	363 (16.9%)	195 (17.9%)
WCC
Median (IQR)	6.20 (5.10 - 7.60)	6.20 (5.10 - 7.60)	6.20 (5.10 - 7.46)
Missing	470 (17.9%)	364 (17.0%)	196 (18.0%)
Albumin
Median (IQR)	41.0 (39.0 - 45.0)	41.0 (39.0 - 45.0)	41.0 (38.0 - 44.0)
Missing	572 (21.8%)	448 (20.9%)	250 (22.9%)
Meat_sum
Median (IQR)	35.8 (24.9 - 50.6)	36.0 (25.3 - 50.5)	35.8 (24.9 - 50.6)
Missing	1539 (58.5%)	1132 (52.8%)	1 (0.1%)
fibre
Median (IQR)	22.9 (17.0 - 29.4)	22.8 (16.9 - 29.3)	22.9 (17.0 - 29.4)
Missing	1539 (58.5%)	1132 (52.8%)	1 (0.1%)
PUFA_percEng
Median (IQR)	5.17 (4.54 - 5.92)	5.17 (4.53 - 5.89)	5.17 (4.54 - 5.92)
Missing	1539 (58.5%)	1132 (52.8%)	1 (0.1%)
NOVAScore_cat
Unprocessed	273 (10.4%)	252 (11.8%)	273 (25.0%)
Processed culinary	273 (10.4%)	257 (12.0%)	273 (25.0%)
Processed food	274 (10.4%)	255 (11.9%)	274 (25.1%)
Ultra-processed	271 (10.3%)	248 (11.6%)	271 (24.8%)
Missing	1538 (58.5%)	1132 (52.8%)	0 (0%)
Biologic
Current	920 (35.0%)	774 (36.1%)	310 (28.4%)
Previously	160 (6.1%)	133 (6.2%)	66 (6.0%)
Never prescribed	1549 (58.9%)	1237 (57.7%)	715 (65.5%)

Associations with cohort membership

Only age, biologic use, and albumin differed significantly across cohorts (all three p < 0.001); no other tested variable reached p < 0.05.
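Because `comp` stacks the full cohort together with its own subsets, every FC-cohort subject appears at least twice and FFQ-cohort subjects appear two or three times, so the observations entering the ANOVA and chi-squared tests on this page are not independent, which both tests assume. One possible alternative, sketched below on simulated data rather than PREdiCCt data, is to compare members of a sub-cohort with non-members so that each subject is counted exactly once. The data frame `toy` and its columns are hypothetical.

```r
set.seed(1)
# Hypothetical toy data: one row per participant, with a flag for FFQ membership
toy <- data.frame(
  Age = round(rnorm(200, mean = 45, sd = 15)),
  Sex = sample(c("Male", "Female"), 200, replace = TRUE),
  ffq = rep(c(TRUE, FALSE), times = c(80, 120))
)

# Continuous variable: Welch t-test of FFQ members vs non-members
# (the two groups are disjoint, so each participant contributes once)
res_t <- t.test(Age ~ ffq, data = toy)

# Categorical variable: chi-squared test on the same disjoint grouping
res_chi <- chisq.test(toy$Sex, toy$ffq)

res_t$p.value
res_chi$p.value
```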

Age

Age differs significantly across cohorts: subjects in the FFQ sub-cohort tend to be older (median 47 years) than those in the full or FC cohorts (median 44 years in both).

Code

pander(summary(aov(Age ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	4786	2393	10.08	4.256e-05
Residuals	5861	1391260	237.4	NA	NA

Sex

Code

pander(chisq.test(comp$Sex, comp$cohort))

Pearson’s Chi-squared test: `comp$Sex` and `comp$cohort`
Test statistic	df	P value
1.639	2	0.4406

Body mass index

Code

pander(summary(aov(BMI ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	111.4	55.7	1.95	0.1424
Residuals	5654	161505	28.56	NA	NA

Ethnicity

Code

pander(chisq.test(comp$Ethnicity, comp$cohort))

Pearson’s Chi-squared test: `comp$Ethnicity` and `comp$cohort`
Test statistic	df	P value
4.482	2	0.1063

Index of multiple deprivation

Code

pander(chisq.test(comp$IMD, comp$cohort))

Pearson’s Chi-squared test: `comp$IMD` and `comp$cohort`
Test statistic	df	P value
10.73	8	0.2172

Smoking status

Code

pander(chisq.test(comp$Smoke, comp$cohort))

Pearson’s Chi-squared test: `comp$Smoke` and `comp$cohort`
Test statistic	df	P value
0.5422	4	0.9693

IBD type

Code

pander(chisq.test(comp$diagnosis2, comp$cohort))

Pearson’s Chi-squared test: `comp$diagnosis2` and `comp$cohort`
Test statistic	df	P value
4.474	2	0.1068

Disease duration

Code

pander(summary(aov(`IBD Duration` ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	356.2	178.1	1.492	0.225
Residuals	5537	660933	119.4	NA	NA

IBD Control-8

Code

pander(summary(aov(control_8 ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	19.77	9.884	1.018	0.3613
Residuals	4658	45211	9.706	NA	NA

IBD visual analogue score

Code

pander(chisq.test(comp$vas_control, comp$cohort))

Pearson’s Chi-squared test: `comp$vas_control` and `comp$cohort`
Test statistic	df	P value
3.158	2	0.2062

Faecal calprotectin

For this test, FC is treated as a continuous variable rather than being discretised into FC groups.

Code

pander(summary(aov(FC ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	10042	5021	0.04235	0.9585
Residuals	5297	6.28e+08	118564	NA	NA
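FC is strongly right-skewed (the median is 49 but the upper quartile is 161), so a one-way ANOVA on raw values may be driven by a few very high measurements. A rank-based alternative is the Kruskal-Wallis test, available in base R as `kruskal.test()`; fitting `aov()` to log-transformed FC is another option. The sketch below uses simulated log-normal values and a hypothetical three-level `cohort` factor, not PREdiCCt data.

```r
set.seed(42)
# Simulated right-skewed FC-like values for three hypothetical groups
toy <- data.frame(
  FC = rlnorm(300, meanlog = log(50), sdlog = 1.2),
  cohort = factor(rep(c("A", "B", "C"), each = 100))
)

# Rank-based comparison: not dominated by the long right tail
kw <- kruskal.test(FC ~ cohort, data = toy)

# Alternative: one-way ANOVA on log-transformed values
fit_log <- aov(log(FC) ~ cohort, data = toy)

kw$p.value
summary(fit_log)
```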

C-reactive protein

Code

pander(summary(aov(CReactiveProtein ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	12.36	6.178	0.05603	0.9455
Residuals	4438	489297	110.3	NA	NA

Haemoglobin

Code

pander(summary(aov(Haemoglobin ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	84.22	42.11	0.2276	0.7965
Residuals	4834	894479	185	NA	NA

White cell count

Code

pander(summary(aov(WCC ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	81.86	40.93	1.06	0.3464
Residuals	4831	186499	38.6	NA	NA

Platelets

Code

pander(summary(aov(Platelets ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	12234	6117	1.25	0.2866
Residuals	4830	23635277	4893	NA	NA

Albumin

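Albumin differs significantly across cohorts (p < 0.001), but the difference is negligible in practical terms: the median is 41.0 in all three cohorts, and the FFQ cohort's interquartile range (38.0 - 44.0) is shifted down by only 1 unit relative to the full and FC cohorts (39.0 - 45.0).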

Code

pander(summary(aov(Albumin ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	320.6	160.3	6.931	0.0009876
Residuals	4591	106189	23.13	NA	NA
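One way to make "negligible" concrete is an effect size such as eta squared, the share of total variance explained by cohort: eta squared = SS(cohort) / (SS(cohort) + SS(residuals)). From the ANOVA table above this is 320.6 / (320.6 + 106189), roughly 0.003, so cohort membership accounts for about 0.3% of the variance in albumin. The helper `eta_squared` below is a hypothetical name, sketched for a one-way `aov` fit and checked on a small example with a known answer.

```r
# Eta squared for a one-way ANOVA: SS(effect) / SS(total)
eta_squared <- function(fit) {
  tab <- summary(fit)[[1]]  # rows: the effect, then Residuals
  ss <- tab[["Sum Sq"]]
  ss[1] / sum(ss)
}

# Small example with a known answer:
# group means are 2 and 5, so SS(between) = 13.5 and SS(within) = 4
toy <- data.frame(y = c(1, 2, 3, 4, 5, 6), g = rep(c("a", "b"), each = 3))
eta_toy <- eta_squared(aov(y ~ g, data = toy))

# Same calculation from the albumin sums of squares reported above
eta_albumin <- 320.6 / (320.6 + 106189)
eta_albumin
```

Applied to the real fit, `eta_squared(aov(Albumin ~ cohort, data = comp))` should reproduce the hand calculation, since `summary()` reports the same sums of squares shown in the table.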

Dietary data

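Dietary data have not been compared across cohorts. They are only available for subjects with an analysed FFQ, so the full-cohort dietary summaries describe the same subjects as the FFQ cohort.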

Biologic usage

Biologic use differs significantly across cohorts: subjects in the FFQ sub-cohort were less likely to have been prescribed a biologic, with 65.5% never prescribed one compared with 58.9% of the full cohort and 57.7% of the FC sub-cohort.

Code

pander(chisq.test(comp$Biologic, comp$cohort))

Pearson’s Chi-squared test: `comp$Biologic` and `comp$cohort`
Test statistic	df	P value
21.41	4	0.0002626 ***
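Because `Biologic` has no missing values in any cohort, the test above can be reproduced from the counts in the baseline table. The sketch below rebuilds that 3 x 3 table by hand (`biologic_counts` is typed in from the table, not read from the data) and shows the column percentages behind the statement that FFQ subjects were less often prescribed a biologic.

```r
# Counts typed in from the baseline table (rows: Biologic, columns: cohort)
biologic_counts <- matrix(
  c(920, 160, 1549,   # Full cohort
    774, 133, 1237,   # FC cohort
    310,  66,  715),  # FFQ cohort
  nrow = 3,
  dimnames = list(
    Biologic = c("Current", "Previously", "Never prescribed"),
    cohort = c("Full cohort", "FC cohort", "FFQ cohort")
  )
)

# Column percentages: share of each cohort in each biologic category
round(100 * prop.table(biologic_counts, margin = 2), 1)

# Pearson chi-squared test on the same table, df = (3 - 1) * (3 - 1) = 4
res <- chisq.test(biologic_counts)
res
```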

Table of baseline data by FC groups

Code

demo %>%
  drop_na(cat) %>%
  table1(
    x = ~ Age +
    Sex +
    Ethnicity +
    BMIcat +
    diagnosis +
    `IBD Duration` +
    Smoke +
    ECigs +
    as.numeric(IMD) +
    control_8 +
    vas_control +
    FC +
    CReactiveProtein +
    Haemoglobin +
    WCC +
    Platelets +
    Albumin +
    Meat_sum +
    fibre +
    PUFA_percEng +
    NOVAScore_cat +
    Biologic | cat,
    render.continuous = my.render.cont
  )

Table 1: Baseline data by FC group.

	FC < 50 (N=1077)	FC 50-250 (N=683)	FC > 250 (N=384)	Overall (N=2144)
Age
Median (IQR)	44.0 (33.0 - 56.0)	46.0 (33.0 - 59.5)	42.5 (30.0 - 55.0)	44.0 (32.0 - 57.0)
Sex
Male	442 (41.0%)	333 (48.8%)	191 (49.7%)	966 (45.1%)
Female	635 (59.0%)	350 (51.2%)	193 (50.3%)	1178 (54.9%)
Ethnicity
White	834 (77.4%)	540 (79.1%)	283 (73.7%)	1657 (77.3%)
Non-white	33 (3.1%)	13 (1.9%)	11 (2.9%)	57 (2.7%)
Missing	210 (19.5%)	130 (19.0%)	90 (23.4%)	430 (20.1%)
BMIcat
Underweight	19 (1.8%)	12 (1.8%)	7 (1.8%)	38 (1.8%)
Normal	446 (41.4%)	244 (35.7%)	150 (39.1%)	840 (39.2%)
Overweight	371 (34.4%)	246 (36.0%)	134 (34.9%)	751 (35.0%)
Obese	201 (18.7%)	155 (22.7%)	75 (19.5%)	431 (20.1%)
Missing	40 (3.7%)	26 (3.8%)	18 (4.7%)	84 (3.9%)
diagnosis
CD	533 (49.5%)	381 (55.8%)	204 (53.1%)	1118 (52.1%)
UC	504 (46.8%)	283 (41.4%)	163 (42.4%)	950 (44.3%)
IBDU	40 (3.7%)	19 (2.8%)	17 (4.4%)	76 (3.5%)
IBD Duration
Median (IQR)	10.1 (4.62 - 19.4)	9.66 (4.76 - 18.4)	9.07 (4.42 - 16.5)	9.80 (4.60 - 18.6)
Missing	60 (5.6%)	38 (5.6%)	22 (5.7%)	120 (5.6%)
Smoke
Current	64 (5.9%)	36 (5.3%)	9 (2.3%)	109 (5.1%)
Previous	309 (28.7%)	213 (31.2%)	105 (27.3%)	627 (29.2%)
Never	487 (45.2%)	300 (43.9%)	178 (46.4%)	965 (45.0%)
Missing	217 (20.1%)	134 (19.6%)	92 (24.0%)	443 (20.7%)
ECigs
Current	28 (2.6%)	33 (4.8%)	9 (2.3%)	70 (3.3%)
Previous	31 (2.9%)	16 (2.3%)	6 (1.6%)	53 (2.5%)
Never	799 (74.2%)	500 (73.2%)	276 (71.9%)	1575 (73.5%)
Missing	219 (20.3%)	134 (19.6%)	93 (24.2%)	446 (20.8%)
as.numeric(IMD)
Median (IQR)	4.00 (3.00 - 5.00)	4.00 (2.00 - 5.00)	4.00 (2.00 - 5.00)	4.00 (2.00 - 5.00)
Missing	15 (1.4%)	4 (0.6%)	3 (0.8%)	22 (1.0%)
control_8
Median (IQR)	13.0 (11.0 - 15.0)	13.0 (11.0 - 15.0)	13.0 (9.00 - 15.0)	13.0 (11.0 - 15.0)
Missing	218 (20.2%)	136 (19.9%)	93 (24.2%)	447 (20.8%)
vas_control
<85	250 (23.2%)	201 (29.4%)	118 (30.7%)	569 (26.5%)
85+	607 (56.4%)	342 (50.1%)	172 (44.8%)	1121 (52.3%)
Missing	220 (20.4%)	140 (20.5%)	94 (24.5%)	454 (21.2%)
FC
Median (IQR)	20.0 (20.0 - 29.0)	104 (71.0 - 157)	498 (347 - 780)	49.0 (20.0 - 161)
CReactiveProtein
Median (IQR)	2.00 (1.00 - 5.00)	2.00 (1.00 - 5.00)	4.00 (1.00 - 6.00)	2.00 (1.00 - 5.00)
Missing	256 (23.8%)	167 (24.5%)	83 (21.6%)	506 (23.6%)
Haemoglobin
Median (IQR)	139 (130 - 147)	139 (130 - 148)	137 (127 - 146)	138 (129 - 147)
Missing	184 (17.1%)	120 (17.6%)	59 (15.4%)	363 (16.9%)
WCC
Median (IQR)	6.10 (4.99 - 7.40)	6.30 (5.30 - 7.69)	6.60 (5.31 - 8.10)	6.20 (5.10 - 7.60)
Missing	184 (17.1%)	120 (17.6%)	60 (15.6%)	364 (17.0%)
Platelets
Median (IQR)	257 (223 - 299)	264 (225 - 304)	277 (231 - 322)	262 (224 - 305)
Missing	184 (17.1%)	120 (17.6%)	60 (15.6%)	364 (17.0%)
Albumin
Median (IQR)	42.0 (39.0 - 46.0)	41.0 (38.0 - 44.0)	40.0 (38.0 - 43.0)	41.0 (39.0 - 45.0)
Missing	226 (21.0%)	146 (21.4%)	76 (19.8%)	448 (20.9%)
Meat_sum
Median (IQR)	34.8 (24.6 - 48.1)	37.5 (26.0 - 53.2)	35.5 (25.2 - 51.4)	36.0 (25.3 - 50.5)
Missing	570 (52.9%)	357 (52.3%)	205 (53.4%)	1132 (52.8%)
fibre
Median (IQR)	23.5 (17.0 - 29.4)	23.0 (16.9 - 29.5)	21.7 (16.8 - 27.9)	22.8 (16.9 - 29.3)
Missing	570 (52.9%)	357 (52.3%)	205 (53.4%)	1132 (52.8%)
PUFA_percEng
Median (IQR)	5.15 (4.57 - 5.96)	5.14 (4.37 - 5.82)	5.33 (4.57 - 5.93)	5.17 (4.53 - 5.89)
Missing	570 (52.9%)	357 (52.3%)	205 (53.4%)	1132 (52.8%)
NOVAScore_cat
Unprocessed	121 (11.2%)	75 (11.0%)	56 (14.6%)	252 (11.8%)
Processed culinary	138 (12.8%)	74 (10.8%)	45 (11.7%)	257 (12.0%)
Processed food	136 (12.6%)	84 (12.3%)	35 (9.1%)	255 (11.9%)
Ultra-processed	112 (10.4%)	93 (13.6%)	43 (11.2%)	248 (11.6%)
Missing	570 (52.9%)	357 (52.3%)	205 (53.4%)	1132 (52.8%)
Biologic
Current	406 (37.7%)	220 (32.2%)	148 (38.5%)	774 (36.1%)
Previously	48 (4.5%)	55 (8.1%)	30 (7.8%)	133 (6.2%)
Never prescribed	623 (57.8%)	408 (59.7%)	206 (53.6%)	1237 (57.7%)

Table of baseline data by IBD type

Code

demo %>%
  drop_na(cat) %>%
  table1::table1(
    x = ~ Age +
    Sex +
    Ethnicity +
    BMIcat +
    diagnosis +
    `IBD Duration` +
    Smoke + 
    ECigs +
    as.numeric(IMD) +
    control_8 +
    vas_control +
    FC +
    CReactiveProtein +
    Haemoglobin +
    WCC +
    Platelets +  
    Albumin +
    Meat_sum +
    fibre +
    PUFA_percEng +
    NOVAScore_cat +
    Biologic |
      diagnosis2,
    render.continuous = my.render.cont
  )

Table 2: Baseline data by IBD type.

	CD (N=1118)	UC/IBDU (N=1026)	Overall (N=2144)
Age
Median (IQR)	41.5 (31.0 - 54.0)	48.0 (36.0 - 59.0)	44.0 (32.0 - 57.0)
Sex
Male	478 (42.8%)	488 (47.6%)	966 (45.1%)
Female	640 (57.2%)	538 (52.4%)	1178 (54.9%)
Ethnicity
White	849 (75.9%)	808 (78.8%)	1657 (77.3%)
Non-white	31 (2.8%)	26 (2.5%)	57 (2.7%)
Missing	238 (21.3%)	192 (18.7%)	430 (20.1%)
BMIcat
Underweight	20 (1.8%)	18 (1.8%)	38 (1.8%)
Normal	451 (40.3%)	389 (37.9%)	840 (39.2%)
Overweight	379 (33.9%)	372 (36.3%)	751 (35.0%)
Obese	215 (19.2%)	216 (21.1%)	431 (20.1%)
Missing	53 (4.7%)	31 (3.0%)	84 (3.9%)
diagnosis
CD	1118 (100%)	0 (0%)	1118 (52.1%)
UC	0 (0%)	950 (92.6%)	950 (44.3%)
IBDU	0 (0%)	76 (7.4%)	76 (3.5%)
IBD Duration
Median (IQR)	10.3 (5.04 - 20.2)	9.28 (4.27 - 16.8)	9.80 (4.60 - 18.6)
Missing	64 (5.7%)	56 (5.5%)	120 (5.6%)
Smoke
Current	63 (5.6%)	46 (4.5%)	109 (5.1%)
Previous	302 (27.0%)	325 (31.7%)	627 (29.2%)
Never	508 (45.4%)	457 (44.5%)	965 (45.0%)
Missing	245 (21.9%)	198 (19.3%)	443 (20.7%)
ECigs
Current	39 (3.5%)	31 (3.0%)	70 (3.3%)
Previous	33 (3.0%)	20 (1.9%)	53 (2.5%)
Never	799 (71.5%)	776 (75.6%)	1575 (73.5%)
Missing	247 (22.1%)	199 (19.4%)	446 (20.8%)
as.numeric(IMD)
Median (IQR)	4.00 (2.00 - 5.00)	4.00 (3.00 - 5.00)	4.00 (2.00 - 5.00)
Missing	11 (1.0%)	11 (1.1%)	22 (1.0%)
control_8
Median (IQR)	13.0 (10.0 - 15.0)	13.0 (11.0 - 15.0)	13.0 (11.0 - 15.0)
Missing	248 (22.2%)	199 (19.4%)	447 (20.8%)
vas_control
<85	318 (28.4%)	251 (24.5%)	569 (26.5%)
85+	547 (48.9%)	574 (55.9%)	1121 (52.3%)
Missing	253 (22.6%)	201 (19.6%)	454 (21.2%)
FC
Median (IQR)	54.0 (20.0 - 167)	43.0 (20.0 - 155)	49.0 (20.0 - 161)
CReactiveProtein
Median (IQR)	2.00 (1.00 - 5.00)	2.00 (1.00 - 5.00)	2.00 (1.00 - 5.00)
Missing	255 (22.8%)	251 (24.5%)	506 (23.6%)
Haemoglobin
Median (IQR)	137 (128 - 146)	139 (132 - 148)	138 (129 - 147)
Missing	164 (14.7%)	199 (19.4%)	363 (16.9%)
WCC
Median (IQR)	6.30 (5.20 - 7.60)	6.20 (5.00 - 7.60)	6.20 (5.10 - 7.60)
Missing	163 (14.6%)	201 (19.6%)	364 (17.0%)
Platelets
Median (IQR)	265 (227 - 309)	258 (222 - 303)	262 (224 - 305)
Missing	164 (14.7%)	200 (19.5%)	364 (17.0%)
Albumin
Median (IQR)	41.0 (38.0 - 44.0)	42.0 (39.0 - 45.0)	41.0 (39.0 - 45.0)
Missing	209 (18.7%)	239 (23.3%)	448 (20.9%)
Meat_sum
Median (IQR)	35.8 (25.3 - 49.9)	36.1 (25.1 - 50.9)	36.0 (25.3 - 50.5)
Missing	621 (55.5%)	511 (49.8%)	1132 (52.8%)
fibre
Median (IQR)	22.1 (16.4 - 28.4)	23.4 (17.8 - 29.9)	22.8 (16.9 - 29.3)
Missing	621 (55.5%)	511 (49.8%)	1132 (52.8%)
PUFA_percEng
Median (IQR)	5.19 (4.53 - 5.97)	5.15 (4.53 - 5.88)	5.17 (4.53 - 5.89)
Missing	621 (55.5%)	511 (49.8%)	1132 (52.8%)
NOVAScore_cat
Unprocessed	126 (11.3%)	126 (12.3%)	252 (11.8%)
Processed culinary	128 (11.4%)	129 (12.6%)	257 (12.0%)
Processed food	122 (10.9%)	133 (13.0%)	255 (11.9%)
Ultra-processed	121 (10.8%)	127 (12.4%)	248 (11.6%)
Missing	621 (55.5%)	511 (49.8%)	1132 (52.8%)
Biologic
Current	522 (46.7%)	252 (24.6%)	774 (36.1%)
Previously	96 (8.6%)	37 (3.6%)	133 (6.2%)
Never prescribed	500 (44.7%)	737 (71.8%)	1237 (57.7%)

Table of Crohn’s disease variables

Code

demo.cd <- readRDS(paste0(outdir, "demo-cd.RDS"))
demo.cd %>%
  drop_na(cat) %>%
  table1(
    x = ~ `IBD Duration` + 
      Location +
      L4 +
      HBI +
      PRO2 +
      Behaviour +
      Perianal +
      Surgery +
      Smoke +
      ECigs |
      cat,
    render.continuous = my.render.cont
  )

	FC < 50 (N=533)	FC 50-250 (N=381)	FC > 250 (N=204)	Overall (N=1118)
IBD Duration
Median (IQR)	10.2 (4.89 - 20.9)	10.6 (5.02 - 20.2)	10.1 (5.37 - 17.9)	10.3 (5.04 - 20.2)
Missing	33 (6.2%)	21 (5.5%)	10 (4.9%)	64 (5.7%)
Location
L1	138 (25.9%)	99 (26.0%)	46 (22.5%)	283 (25.3%)
L2	128 (24.0%)	90 (23.6%)	60 (29.4%)	278 (24.9%)
L3	176 (33.0%)	136 (35.7%)	73 (35.8%)	385 (34.4%)
L4 only	3 (0.6%)	4 (1.0%)	3 (1.5%)	10 (0.9%)
Missing	88 (16.5%)	52 (13.6%)	22 (10.8%)	162 (14.5%)
L4
Not present	398 (74.7%)	293 (76.9%)	151 (74.0%)	842 (75.3%)
Present	47 (8.8%)	36 (9.4%)	31 (15.2%)	114 (10.2%)
Missing	88 (16.5%)	52 (13.6%)	22 (10.8%)	162 (14.5%)
HBI
Median (IQR)	2.00 (1.00 - 4.00)	2.00 (1.00 - 4.00)	2.00 (1.00 - 3.00)	2.00 (1.00 - 4.00)
Missing	331 (62.1%)	227 (59.6%)	116 (56.9%)	674 (60.3%)
PRO2
Median (IQR)	0 (0 - 2.00)	0 (0 - 2.00)	0 (0 - 1.00)	0 (0 - 2.00)
Missing	242 (45.4%)	135 (35.4%)	69 (33.8%)	446 (39.9%)
Behaviour
B1	279 (52.3%)	211 (55.4%)	127 (62.3%)	617 (55.2%)
B2	96 (18.0%)	73 (19.2%)	37 (18.1%)	206 (18.4%)
B3	51 (9.6%)	31 (8.1%)	11 (5.4%)	93 (8.3%)
Missing	107 (20.1%)	66 (17.3%)	29 (14.2%)	202 (18.1%)
Perianal
No	283 (53.1%)	224 (58.8%)	112 (54.9%)	619 (55.4%)
Yes	149 (28.0%)	106 (27.8%)	66 (32.4%)	321 (28.7%)
Missing	101 (18.9%)	51 (13.4%)	26 (12.7%)	178 (15.9%)
Surgery
No	262 (49.2%)	181 (47.5%)	118 (57.8%)	561 (50.2%)
Yes	251 (47.1%)	193 (50.7%)	78 (38.2%)	522 (46.7%)
Missing	20 (3.8%)	7 (1.8%)	8 (3.9%)	35 (3.1%)
Smoke
Current	32 (6.0%)	26 (6.8%)	5 (2.5%)	63 (5.6%)
Previous	153 (28.7%)	108 (28.3%)	41 (20.1%)	302 (27.0%)
Never	234 (43.9%)	169 (44.4%)	105 (51.5%)	508 (45.4%)
Missing	114 (21.4%)	78 (20.5%)	53 (26.0%)	245 (21.9%)
ECigs
Current	19 (3.6%)	18 (4.7%)	2 (1.0%)	39 (3.5%)
Previous	16 (3.0%)	13 (3.4%)	4 (2.0%)	33 (3.0%)
Never	382 (71.7%)	272 (71.4%)	145 (71.1%)	799 (71.5%)
Missing	116 (21.8%)	78 (20.5%)	53 (26.0%)	247 (22.1%)

Table of Crohn’s disease variables by cohort

Code

comp <- demo.cd
comp$cohort <- "All"

temp <- demo.cd %>%
  drop_na(cat)
temp$cohort <- "FC"

comp <- rbind(comp, temp)

temp <- subset(demo.cd, ParticipantNo %in% FFQ$participantno)
temp$cohort <- "FFQ"

comp <- rbind(comp, temp)

comp$cohort <- factor(comp$cohort,
  levels = c("All", "FC", "FFQ"),
  labels = c("Full cohort", "FC cohort", "FFQ cohort")
)

table1(
  ~ Location +
    L4 +
    HBI +
    PRO2 +
    Behaviour +
    Perianal +
    Surgery +
    Smoke +
    ECigs |
    cohort,
  data = comp,
  render.continuous = my.render.cont,
  overall = FALSE
)

	Full cohort (N=1370)	FC cohort (N=1118)	FFQ cohort (N=530)
Location
L1	362 (26.4%)	283 (25.3%)	149 (28.1%)
L2	333 (24.3%)	278 (24.9%)	141 (26.6%)
L3	454 (33.1%)	385 (34.4%)	160 (30.2%)
L4 only	14 (1.0%)	10 (0.9%)	6 (1.1%)
Missing	207 (15.1%)	162 (14.5%)	74 (14.0%)
L4
Not present	1031 (75.3%)	842 (75.3%)	412 (77.7%)
Present	132 (9.6%)	114 (10.2%)	44 (8.3%)
Missing	207 (15.1%)	162 (14.5%)	74 (14.0%)
HBI
Median (IQR)	2.00 (1.00 - 4.00)	2.00 (1.00 - 4.00)	2.00 (1.00 - 3.00)
Missing	833 (60.8%)	674 (60.3%)	287 (54.2%)
PRO2
Median (IQR)	0 (0 - 2.00)	0 (0 - 2.00)	1.00 (0 - 2.00)
Missing	555 (40.5%)	446 (39.9%)	179 (33.8%)
Behaviour
B1	749 (54.7%)	617 (55.2%)	285 (53.8%)
B2	252 (18.4%)	206 (18.4%)	105 (19.8%)
B3	121 (8.8%)	93 (8.3%)	54 (10.2%)
Missing	248 (18.1%)	202 (18.1%)	86 (16.2%)
Perianal
No	746 (54.5%)	619 (55.4%)	299 (56.4%)
Yes	397 (29.0%)	321 (28.7%)	151 (28.5%)
Missing	227 (16.6%)	178 (15.9%)	80 (15.1%)
Surgery
No	684 (49.9%)	561 (50.2%)	249 (47.0%)
Yes	634 (46.3%)	522 (46.7%)	273 (51.5%)
Missing	52 (3.8%)	35 (3.1%)	8 (1.5%)
Smoke
Current	73 (5.3%)	63 (5.6%)	32 (6.0%)
Previous	343 (25.0%)	302 (27.0%)	186 (35.1%)
Never	562 (41.0%)	508 (45.4%)	299 (56.4%)
Missing	392 (28.6%)	245 (21.9%)	13 (2.5%)
ECigs
Current	45 (3.3%)	39 (3.5%)	17 (3.2%)
Previous	39 (2.8%)	33 (3.0%)	14 (2.6%)
Never	892 (65.1%)	799 (71.5%)	486 (91.7%)
Missing	394 (28.8%)	247 (22.1%)	13 (2.5%)

Associations between Crohn’s disease-only variables and cohort membership

None of the variables tested in the Crohn’s disease subgroup differed significantly across cohorts (all p > 0.2).

Montreal location

Code

pander(fisher.test(comp$Location, comp$cohort, workspace = 200000000))

Fisher’s Exact Test for Count Data: `comp$Location` and `comp$cohort`
P value	Alternative hypothesis
0.659	two.sided
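The large `workspace` argument is needed because the exact Fisher test on tables larger than 2 x 2 can exhaust memory. `fisher.test()` also accepts `simulate.p.value = TRUE` with `B` replicates, which estimates the p-value by Monte Carlo instead. The sketch below applies this to counts typed in from the Crohn's disease table above (`location_counts` is entered by hand, with missing locations excluded, as the original test does); with a fixed seed, the estimate should land close to the exact p-value reported.

```r
# Crohn's disease location counts typed in from the table above
# (rows: Montreal location, columns: cohort; missing values excluded)
location_counts <- matrix(
  c(362, 333, 454, 14,   # Full cohort
    283, 278, 385, 10,   # FC cohort
    149, 141, 160,  6),  # FFQ cohort
  nrow = 4,
  dimnames = list(
    Location = c("L1", "L2", "L3", "L4 only"),
    cohort = c("Full cohort", "FC cohort", "FFQ cohort")
  )
)

set.seed(123)
# Monte Carlo estimate of the Fisher p-value from B simulated tables with
# the same margins; avoids the memory demands of the exact algorithm
res <- fisher.test(location_counts, simulate.p.value = TRUE, B = 10000)
res$p.value
```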

Upper gastrointestinal inflammation

Code

pander(chisq.test(comp$L4, comp$cohort))

Pearson’s Chi-squared test: `comp$L4` and `comp$cohort`
Test statistic	df	P value
1.616	2	0.4457

Harvey-Bradshaw index

Code

pander(summary(aov(HBI ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	2.721	1.36	0.2382	0.788
Residuals	1221	6972	5.71	NA	NA

PRO2

Code

pander(summary(aov(PRO2 ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	0.6806	0.3403	0.08516	0.9184
Residuals	1835	7333	3.996	NA	NA

Montreal behaviour

Code

pander(chisq.test(comp$Behaviour, comp$cohort))

Pearson’s Chi-squared test: `comp$Behaviour` and `comp$cohort`
Test statistic	df	P value
1.81	4	0.7707

Perianal disease

Code

pander(chisq.test(comp$Perianal, comp$cohort))

Pearson’s Chi-squared test: `comp$Perianal` and `comp$cohort`
Test statistic	df	P value
0.2153	2	0.898

Surgery

Code

pander(chisq.test(comp$Surgery, comp$cohort))

Pearson’s Chi-squared test: `comp$Surgery` and `comp$cohort`
Test statistic	df	P value
2.961	2	0.2276

Smoking status

Code

pander(chisq.test(comp$Smoke, comp$cohort))

Pearson’s Chi-squared test: `comp$Smoke` and `comp$cohort`
Test statistic	df	P value
1.023	4	0.9063

E-cigarette use

Code

pander(chisq.test(comp$ECigs, comp$cohort))

Pearson’s Chi-squared test: `comp$ECigs` and `comp$cohort`
Test statistic	df	P value
3.416	4	0.4908

Table of ulcerative colitis/IBDU variables

Code

demo.uc <- readRDS(paste0(outdir, "demo-uc.RDS"))

demo.uc %>%
  drop_na(cat) %>%
  table1(
    x = ~ `IBD Duration` + Extent + Mayo + PRO2 + Smoke + ECigs | cat,
    render.continuous = my.render.cont
  )

	FC < 50 (N=544)	FC 50-250 (N=302)	FC > 250 (N=180)	Overall (N=1026)
IBD Duration
Median (IQR)	10.1 (4.48 - 17.8)	8.30 (4.31 - 15.4)	7.45 (3.10 - 13.4)	9.28 (4.27 - 16.8)
Missing	27 (5.0%)	17 (5.6%)	12 (6.7%)	56 (5.5%)
Extent
E1	74 (13.6%)	34 (11.3%)	20 (11.1%)	128 (12.5%)
E2	214 (39.3%)	126 (41.7%)	76 (42.2%)	416 (40.5%)
E3	126 (23.2%)	79 (26.2%)	48 (26.7%)	253 (24.7%)
Missing	130 (23.9%)	63 (20.9%)	36 (20.0%)	229 (22.3%)
Mayo
Median (IQR)	0 (0 - 1.00)	0 (0 - 1.00)	0.500 (0 - 2.00)	0 (0 - 1.00)
Missing	171 (31.4%)	62 (20.5%)	58 (32.2%)	291 (28.4%)
PRO2
Median (IQR)	0 (0 - 1.00)	0 (0 - 1.00)	0 (0 - 1.00)	0 (0 - 1.00)
Missing	172 (31.6%)	64 (21.2%)	59 (32.8%)	295 (28.8%)
Smoke
Current	32 (5.9%)	10 (3.3%)	4 (2.2%)	46 (4.5%)
Previous	156 (28.7%)	105 (34.8%)	64 (35.6%)	325 (31.7%)
Never	253 (46.5%)	131 (43.4%)	73 (40.6%)	457 (44.5%)
Missing	103 (18.9%)	56 (18.5%)	39 (21.7%)	198 (19.3%)
ECigs
Current	9 (1.7%)	15 (5.0%)	7 (3.9%)	31 (3.0%)
Previous	15 (2.8%)	3 (1.0%)	2 (1.1%)	20 (1.9%)
Never	417 (76.7%)	228 (75.5%)	131 (72.8%)	776 (75.6%)
Missing	103 (18.9%)	56 (18.5%)	40 (22.2%)	199 (19.4%)

Table of ulcerative colitis/IBDU variables by cohort

Code

comp <- demo.uc
comp$cohort <- "All"

temp <- demo.uc %>%
  drop_na(cat)
temp$cohort <- "FC"

comp <- rbind(comp, temp)

temp <- subset(demo.uc, ParticipantNo %in% FFQ$participantno)
temp$cohort <- "FFQ"

comp <- rbind(comp, temp)

comp$cohort <- factor(comp$cohort,
  levels = c("All", "FC", "FFQ"),
  labels = c("Full cohort", "FC cohort", "FFQ cohort")
)


table1(~ Extent + Mayo + PRO2 + Smoke + ECigs | cohort,
  data = comp,
  render.continuous = my.render.cont,
  overall = FALSE
)

	Full cohort (N=1259)	FC cohort (N=1026)	FFQ cohort (N=561)
Extent
E1	166 (13.2%)	128 (12.5%)	81 (14.4%)
E2	503 (40.0%)	416 (40.5%)	250 (44.6%)
E3	318 (25.3%)	253 (24.7%)	130 (23.2%)
Missing	272 (21.6%)	229 (22.3%)	100 (17.8%)
Mayo
Median (IQR)	0 (0 - 1.00)	0 (0 - 1.00)	0 (0 - 1.00)
Missing	366 (29.1%)	291 (28.4%)	136 (24.2%)
PRO2
Median (IQR)	0 (0 - 1.00)	0 (0 - 1.00)	0 (0 - 1.00)
Missing	372 (29.5%)	295 (28.8%)	136 (24.2%)
Smoke
Current	49 (3.9%)	46 (4.5%)	31 (5.5%)
Previous	378 (30.0%)	325 (31.7%)	213 (38.0%)
Never	509 (40.4%)	457 (44.5%)	298 (53.1%)
Missing	323 (25.7%)	198 (19.3%)	19 (3.4%)
ECigs
Current	35 (2.8%)	31 (3.0%)	21 (3.7%)
Previous	22 (1.7%)	20 (1.9%)	13 (2.3%)
Never	878 (69.7%)	776 (75.6%)	508 (90.6%)
Missing	324 (25.7%)	199 (19.4%)	19 (3.4%)

Associations between UC/IBDU-only variables and cohort membership

None of the variables tested in the UC/IBDU subgroup differed significantly across cohorts (all p > 0.15).

Montreal extent

Code

pander(chisq.test(comp$Extent, comp$cohort))

Pearson’s Chi-squared test: `comp$Extent` and `comp$cohort`
Test statistic	df	P value
2.793	4	0.593

Mayo score

Code

pander(summary(aov(Mayo ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	5.821	2.91	1.794	0.1665
Residuals	2050	3325	1.622	NA	NA

PRO2

Code

pander(summary(aov(PRO2 ~ cohort, data = comp)))

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
cohort	2	1.319	0.6593	0.8346	0.4342
Residuals	2040	1611	0.7899	NA	NA

Smoking status

Code

pander(chisq.test(comp$Smoke, comp$cohort))

Pearson’s Chi-squared test: `comp$Smoke` and `comp$cohort`
Test statistic	df	P value
0.398	4	0.9826

E-cigarette use

Code

pander(chisq.test(comp$ECigs, comp$cohort))

Pearson’s Chi-squared test: `comp$ECigs` and `comp$cohort`
Test statistic	df	P value
0.02764	4	0.9999

Code

demo %>%
  drop_na(cat) %>%
  saveRDS(paste0(outdir, "demo.RDS"))

demo %>%
  saveRDS(paste0(outdir, "demo-full.RDS"))

demo.cd %>%
  saveRDS(paste0(outdir, "demo-cd.RDS"))

demo.uc %>%
  saveRDS(paste0(outdir, "demo-uc.RDS"))

Reproduction and reproducibility

Session info

R version 4.4.0 (2024-04-24)

Platform: aarch64-unknown-linux-gnu

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: DiagrammeRsvg(v.0.1), DiagrammeR(v.1.0.11), pander(v.0.6.5), knitr(v.1.47), table1(v.1.4.3), patchwork(v.1.2.0), datefixR(v.1.6.1), readxl(v.1.4.3), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), dplyr(v.1.1.4), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), ggplot2(v.3.5.1), tidyverse(v.2.0.0) and plyr(v.1.8.9)

loaded via a namespace (and not attached): utf8(v.1.2.4), generics(v.0.1.3), stringi(v.1.8.4), hms(v.1.1.3), digest(v.0.6.35), magrittr(v.2.0.3), evaluate(v.0.23), grid(v.4.4.0), timechange(v.0.3.0), RColorBrewer(v.1.1-3), fastmap(v.1.2.0), cellranger(v.1.1.0), jsonlite(v.1.8.8), Formula(v.1.2-5), fansi(v.1.0.6), scales(v.1.3.0), codetools(v.0.2-20), cli(v.3.6.2), rlang(v.1.1.3), visNetwork(v.2.1.2), munsell(v.0.5.1), withr(v.3.0.0), yaml(v.2.3.8), tools(v.4.4.0), tzdb(v.0.4.0), colorspace(v.2.1-0), curl(v.5.2.1), vctrs(v.0.6.5), R6(v.2.5.1), lifecycle(v.1.0.4), V8(v.4.4.2), htmlwidgets(v.1.6.4), pkgconfig(v.2.0.3), pillar(v.1.9.0), gtable(v.0.3.5), glue(v.1.7.0), Rcpp(v.1.0.12), xfun(v.0.44), tidyselect(v.1.2.1), htmltools(v.0.5.8.1), rmarkdown(v.2.27) and compiler(v.4.4.0)

Licensed under CC BY unless otherwise stated.