-
Notifications
You must be signed in to change notification settings - Fork 6
/
rmarkdown_example_notebook.Rmd
309 lines (204 loc) · 9.98 KB
/
rmarkdown_example_notebook.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
---
title: "R Demo for the 2018 BHI & BSN Data Challenge"
output: html_notebook
editor_options:
chunk_output_type: inline
---
# R Demo for the 2018 BHI & BSN Data Challenge
The toughest part is often getting the database running and the software connecting to it. This is the way I usually do it:
run echo 'password' > .pwfile_ in a command prompt, where password is replaced with your user's postgres password
Run this script in the same directory as this file
The resulting chunk may install some R packages on your computer.
```{r}
if(!("dplyr" %in% installed.packages()) |!("dbplyr" %in% installed.packages()) | !("RPostgreSQL" %in% installed.packages())) {
install.packages(c("dplyr","dbplyr","RPostgreSQL"))
}
library(dplyr); library(dbplyr); library(RPostgreSQL); library(DBI)
# Create a database connection
user <- 'postgres'
host <- 'localhost'
dbname <- 'mimic'
schema <- 'mimiciii,public'
port <- 5434
passwd <- readLines("~/.pwfile_")
pg_src <- src_postgres(dbname = dbname,
host = host, port = port,
user = user, password = passwd,
options=paste0("-c search_path=", schema))
m <- dbDriver("PostgreSQL")
con <- dbConnect(m, user=user, password=passwd, dbname=dbname,host=host,port=port) # The trickiest parts
dbSendStatement(con,paste0("SET search_path TO ",schema))
```
```{r}
# Run query and assign the results to a DataFrame
# Requires the icustay_detail view from:
# https://github.com/MIT-LCP/mimic-code/tree/master/concepts/demographics
# To get these views, the following may work (uncomment to run):
## NOTE:
## This will overwrite old views on your machine if you had already created them.
## If you have them already, you likely do not need to run this code!.
## This will take a while, possibly several hours, depending on the speed of your computer/network
#
#
## if(!("MIMICutil" %in% installed.packages())) { devtools::install_github("jraffa/MIMICutil") }
##
## library(MIMICutil); library(RPostgreSQL)
## v <- get_views(URLlist="https://raw.githubusercontent.com/jraffa/MIMICutil/master/mat_view_urls_working",dplyrDB=pg_src,con=con)
###
query = "
WITH first_icu AS (
SELECT i.subject_id, i.hadm_id, i.icustay_id, i.gender, i.admittime admittime_hospital,
i.dischtime dischtime_hospital, i.los_hospital, i.age, i.admission_type,
i.hospital_expire_flag, i.intime intime_icu, i.outtime outtime_icu, i.los_icu,
s.first_careunit
FROM icustay_detail i
LEFT JOIN icustays s
ON i.icustay_id = s.icustay_id
WHERE i.hospstay_seq = 1
AND i.icustay_seq = 1
AND i.age >= 16
)
SELECT f.*, o.icustay_expire_flag, o.oasis, o.oasis_prob
FROM first_icu f
LEFT JOIN oasis o
ON f.icustay_id = o.icustay_id;
"
rs <- dbSendQuery(con,query)
dat <- dbFetch(rs)
# or using dplyr:
# uncomment below
#
## icustay_detail_tbl <- tbl(pg_src,"icustay_detail")
## icustays_tbl <- tbl(pg_src,"icustays")
## oasis_tbl <- tbl(pg_src,"oasis")
##
## first_icu_tbl <- icustay_detail_tbl %>% select(subject_id,hadm_id, icustay_id, gender,admittime_hospital=admittime,
## dischtime_hospital=dischtime, los_hospital, admission_type,
## hospital_expire_flag,intime_icu=intime, outtime_icu=outtime, los_icu,age,hospstay_seq,icustay_seq) %>%
## left_join(icustays_tbl %>% select(icustay_id, first_careunit) ,by="icustay_id") %>%
## filter(hospstay_seq==1,icustay_seq ==1,age>=16) %>% select(-hospstay_seq,-icustay_seq)
## dat <- first_icu_tbl %>%
## left_join(oasis_tbl %>% select(icustay_id,icustay_expire_flag, oasis, oasis_prob),by="icustay_id") %>%
## collect(n=Inf)
```
# Check Data Extracted
Always a good idea to inspect the data after you have extracted it. We will look at the first six patients (rows), and then check the number of rows, and get some summary statistics of the dataset.
```{r}
# Have a look at the dataset:
head(dat)
nrow(dat) #38557
summary(dat)
```
# Add day of week to DataFrame
If we are going to examine the weekend effect, we need to pull this out of the dataset, as you can see, all we have above are dates. We will define a weekend, as anytime between Saturday (00:00:00) until Sunday (23:59:59). The dates above are shifted, and that's why they look odd, but they are matched on the day of week, so this aspect is preserved.
```{r}
if(!("lubridate" %in% installed.packages())) { install.packages("lubridate")}
library(lubridate)
dat$dow <- as.factor(wday(dat$intime_icu ))
table(dat$dow)
# Convert to text levels
levels(dat$dow) <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
table(dat$dow)
dat$weekend <- dat$dow %in% c("Sunday","Saturday")
table(dat$weekend)
```
# Produce some Summary Statistics by DOW and Weekday vs. Weekend
Next, it's good to look at some basic summaries of the data. We will compute simple averages and percentages/counts for each of the variables we have extracted, and look at it by day of week (dow) and weekend (weekend).
```{r}
#If this fails, you may have to install pkg-config and libnlopt-dev on your system
# e.g., in Ubuntu/Debian:
# sudo apt-get install pkg-config libnlopt-dev
if(!("tableone" %in% installed.packages())) { install.packages("tableone")}
library(tableone)
vars <- c('gender', 'los_hospital', 'age', 'admission_type', 'hospital_expire_flag',
'los_icu','icustay_expire_flag', 'oasis', 'oasis_prob', 'first_careunit')
group_var <- "dow"
CreateTableOne(dat,vars=vars,strata=group_var,factorVars=c("hospital_expire_flag","icustay_expire_flag")) %>%
print(
printToggle = FALSE,
showAllLevels = TRUE,
cramVars = "kon"
) %>%
{data.frame(
variable_name = gsub(" ", " ", rownames(.), fixed = TRUE), .,
row.names = NULL,
check.names = FALSE,
stringsAsFactors = FALSE)}
```
```{r}
vars <- c('gender', 'los_hospital', 'age', 'admission_type', 'hospital_expire_flag',
'los_icu','icustay_expire_flag', 'oasis', 'oasis_prob', 'first_careunit')
group_var <- "weekend"
CreateTableOne(dat,vars=vars,strata=group_var,factorVars=c("hospital_expire_flag","icustay_expire_flag")) %>%
print(
printToggle = FALSE,
showAllLevels = TRUE,
cramVars = "kon"
) %>%
{data.frame(
variable_name = gsub(" ", " ", rownames(.), fixed = TRUE), .,
row.names = NULL,
check.names = FALSE,
stringsAsFactors = FALSE)}
```
It looks like there's a higher rate of hospital mortality (14.0% vs 10.8%) and ICU mortality (10.2% vs 7.8%) on weekends when compared to weekdays. There are also statistically significant differences between several other important variables, including: admission type, disease severity (OASIS), and the patient's first care unit, suggesting that these groups may be fundamentally different in some way. Let's explore this a little further.
```{r}
library(ggplot2)
dat %>% group_by(admission_type,dow) %>%
summarise(n=n(),mortRate=mean(hospital_expire_flag==1),
LL95 = mortRate - qnorm(0.975)*sqrt(mortRate*(1-mortRate)/n),
UL95 = mortRate + qnorm(0.975)*sqrt(mortRate*(1-mortRate)/n)) %>%
ggplot(aes(dow,mortRate,group=admission_type,col=admission_type)) +
geom_line() +
geom_ribbon(aes(ymax=UL95,ymin=LL95,fill=admission_type,col=NULL),alpha=0.1) +
xlab("Day of Week") + ylab("Hospital Mortality Rate")
```
```{r}
dat$weekend <- as.factor(dat$weekend)
dat$admission_type <- as.factor(dat$admission_type)
dat %>% group_by(admission_type,weekend) %>%
summarise(n=n(),mortRate=mean(hospital_expire_flag==1),
LL95 = mortRate - qnorm(0.975)*sqrt(mortRate*(1-mortRate)/n),
UL95 = mortRate + qnorm(0.975)*sqrt(mortRate*(1-mortRate)/n)) %>%
ggplot(aes(weekend,mortRate,group=admission_type,col=admission_type)) +
geom_line() +
geom_ribbon(aes(ymax=UL95,ymin=LL95,fill=admission_type,col=NULL),alpha=0.1) +
xlab("Weekend") + ylab("Hospital Mortality Rate")
```
# Model Building
Let's try to incorporate what we saw above into a very simple model. We will use logistic regression with hospital mortality as our outcome. First a unadjusted estimate, and then we will try to adjust for admission type.
```{r}
simple.glm <- glm(hospital_expire_flag ~ weekend,data=dat,family="binomial")
summary(simple.glm)
# Uncomment for pretty graph
## if(!("sjPlot" %in% installed.packages())) { install.packages("sjPlot")}
#library(sjPlot)
#sjp.glm(simple.glm)
```
So, looking at the crude rates, and odds ratios, we can see that patients admitted on a weekend have about a 35% increase in the odds of dying in the hospital when compared to those on a weekday. This effect is statistically significant (p<0.001).
Are we done?
I hope not. We saw from the plots above, there is likely some confounding and effect modification going on. Continuing on....
```{r}
# Without effect modification
adj.glm <- glm(hospital_expire_flag ~ weekend + admission_type,data=dat,family="binomial")
summary(adj.glm)
drop1(adj.glm,test="Chisq")
# With effect modification (interaction)
adj.int.glm <- glm(hospital_expire_flag ~ weekend*admission_type,data=dat,family="binomial")
summary(adj.int.glm)
drop1(adj.int.glm,test="Chisq")
```
```{r}
onWeekend <- expand.grid(weekend="TRUE",admission_type=levels(dat$admission_type))
onWeekday <- expand.grid(weekend="FALSE",admission_type=levels(dat$admission_type))
onWeekend$pred <- predict(adj.int.glm,newdata=onWeekend)
onWeekend
onWeekday$pred <- predict(adj.int.glm,newdata=onWeekday)
onWeekday
```
```{r}
onWeekend %>% inner_join(onWeekday,by="admission_type") %>% mutate(OR=exp((pred.x-pred.y)))
```
So, this mirrors what we saw above. While there may be differences between EMERGENCY and URGENT admission types, an ELECTIVE admission occurring on a weekend has an odds of mortality almost four times that of an ELECTIVE admission on a weekday.
What do you think? Do patients admitted on a weekend have a higher rate of mortality than those admitted during the week? Who is most effected, if at all?
Looking forward to see what you guys come up with!