Viewing parsing failures, guess_max parameter #245
Comments
I'm adding the
remotes::install_github("OuhscBbmc/REDCapR", ref="dev")

Thanks for all your help with this.

@isaactpetersen, I'm experimenting with this now. Can you paste the bad values in this issue? (Or similar ones, if the real values contain PHI.) I'd like something to test against.

@wibeasley, sorry for the delay. Here are some example variables where I receive the parsing error (when
@isaactpetersen, that makes sense. I was having trouble reading time values with readr this morning, completely independent of REDCap. I think it's worse when the leading unit isn't padded to two digits (i.e., '0:39' instead of '00:39'). Even if this case had an easy fix, I think this is a strong argument that the column data types should be specifiable when reading the CSV (after deleting NA rows).
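Specifying column types up front avoids guessing altogether. The sketch below calls `readr::read_csv()` directly on a small CSV; the file contents and the `record_id`/`time_1` column names are illustrative assumptions, not REDCapR's internals.

```r
# A minimal sketch, assuming the export is already a CSV file on disk.
# The file contents and column names are hypothetical.
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(
  c("record_id,time_1",
    "1,0:39",
    "2,12:05"),
  path
)

# Declare the types instead of letting readr guess them.
# `time_1` stays character, so the values are kept exactly as entered.
ds <- read_csv(
  path,
  col_types = cols(
    record_id = col_integer(),
    time_1    = col_character()
  )
)

str(ds)
```

Keeping an ambiguous column as character and converting it later is the conservative choice: no value is silently turned into `NA`.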
That makes sense, and I also believe the challenge is readr's handling of time values (independent of REDCap). The MM:SS time values were entered with REDCap's validation criterion of
@isaactpetersen, will you run the dev version and tell me if this fixes the parsing problems with your time variable? I tackled #257 this morning because we hit a big dataset that needed to be batched but had inconsistent data types across batches. It's working so far.

I started looking through other issues to include in the release and re-read your issue here. Now I realize that you had suggested the solution last week, and I simply didn't understand or appreciate it. I thought you were doing a column-by-column conversion with

So your solution is now essentially inside the batching process. Does this help with your time variable? It worked for some of our inconsistent date variables. I'd still like to implement
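The idea of moving the conversion into the batching step can be sketched as "read every batch as character, combine, then guess types once". This is a minimal illustration, assuming each batch arrives as CSV text; the helper `read_batch_as_character()` and the data are hypothetical, and this is not REDCapR's actual code.

```r
# A minimal sketch of batch-then-convert; not REDCapR's internal implementation.
library(readr)

# Hypothetical helper: read one batch with every column as character,
# so no two batches can disagree about a column's type.
read_batch_as_character <- function(lines) {
  path <- tempfile(fileext = ".csv")
  writeLines(lines, path)
  read_csv(path, col_types = cols(.default = col_character()))
}

# Hypothetical batches: the first holds a date, the second a time.
batch_1 <- read_batch_as_character(c("record_id,time_1", "1,2018-01-01"))
batch_2 <- read_batch_as_character(c("record_id,time_1", "2,0:39"))

# Stacking works because every column is character in both batches.
combined <- rbind(batch_1, batch_2)

# Guess types once, using the values from all batches together.
converted <- type_convert(combined)

str(converted)
```

Because `type_convert()` sees both values of `time_1` at once, it cannot settle on a date for one batch and a time for another; the mixed column stays character.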
@wibeasley, thanks for the updates. I just tried exporting the data using the dev version, and I receive the following warnings:
@isaactpetersen, I'm trying to replicate this with a small dataset. I created a REDCap project with two rows: the first has a date and the second has a time. It's working as intended. When both records are returned, `time_1` is a character; when only the first record is returned, it's a date.

```r
uri   <- "https://bbmc.ouhsc.edu/redcap/api/"
token <- "14A41597332864D74460CBBF52EE49A6"

# Return all records and all variables.
ds_both <-
  REDCapR::redcap_read(
    redcap_uri = uri,
    token      = token
  )$data
# `time_1` should be a character.
str(ds_both)

# Return only the first record.
ds_first <-
  REDCapR::redcap_read(
    redcap_uri   = uri,
    token        = token,
    filter_logic = "[record_id] = 1"
  )$data
# `time_1` should be a date.
str(ds_first)
```

Results:
Interestingly, now

```r
a <- tibble::tibble(
  b = c(1:2000000, "some string")
)
# Isn't fooled by the first 2 million rows; `b` is still a character.
readr::type_convert(a)
```
First off, I love the package and greatly appreciate your work on this. I receive the following warning when I read data from REDCap using `redcap_read()`:

However, when I try to view the parsing failures by typing `problems(objectName)`, it doesn't show any of them. How can I view the parsing failures so I can see how to address them? Are there suggestions for addressing parsing failures?

It's possible that the parsing failures are due to the small default value (1000) for `guess_max` in `read_csv()` (tidyverse/readr#588). Is it possible to increase the `guess_max` value when using `redcap_read()`? I see `guess_max` as a parameter for `redcap_read_oneshot()` but not for `redcap_read()`.

Thanks so much for your help!
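To illustrate the readr side of this question, the sketch below calls `readr::read_csv()` directly rather than going through REDCapR (the issue notes that `redcap_read_oneshot()` already accepts `guess_max`). The CSV, 1,500 numbers followed by one text value, is a hypothetical example.

```r
# A minimal sketch of readr's `guess_max` and `problems()`; the data are hypothetical.
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(c("score", as.character(1:1500), "not recorded"), path)

# By default readr guesses from a limited number of rows (up to 1,000).
# Depending on which rows are inspected, `score` may be guessed as numeric;
# the text value then becomes NA and is recorded as a parsing problem.
ds_default <- read_csv(path)

# problems() reads the failures attached to the object returned by read_csv();
# code that rebuilds the data frame afterwards may not carry them along.
print(problems(ds_default))

# With guess_max above the row count, every row is inspected, so the
# column is guessed as character and nothing fails to parse.
ds_all <- read_csv(path, guess_max = 5000)
print(problems(ds_all))
```

Raising `guess_max` trades speed and memory for more reliable guesses; declaring `col_types` explicitly removes the guessing step entirely.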