Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding dateRangeBegin and dateRangeEnd to redcap_read #321

Closed
pbchase opened this issue Feb 25, 2021 · 4 comments
Closed

Adding dateRangeBegin and dateRangeEnd to redcap_read #321

pbchase opened this issue Feb 25, 2021 · 4 comments
Assignees

Comments

@pbchase
Copy link
Contributor

pbchase commented Feb 25, 2021

Has anyone considered implementing dateRangeBegin or dateRangeEnd options in redcap_read? We'd love to read only the past hour of updates for some queries. We often use redcap's datediff in filter_logic, but it is difficult to use and solves a slightly different problem.

We might entertain the idea of adding the dateRangeBegin option to redcap_read but I wouldn't want to go down a path that has already proven fruitless. Has anyone tried these options? Are they performant? I need to turn a 20k record read into a 100 record read.

@wibeasley wibeasley self-assigned this Feb 26, 2021
@wibeasley
Copy link
Member

@pbchase, that would be a different date field for every project, correct? (eg, dob in some projects, specimen collection in others.)

How about precomputing the datetime boundary and pass them to the filter_logic parameter in redcap_read().

datetime_start <- Sys.time() - lubridate::hours(24) 
born_yesterday <- strftime(datetime_start, "%Y-%m-%d %H:%M:%S < [dob]") # Cast to a string w/o the timezone?

ds <- 
  REDCapR::redcap_read(
    redcap_uri    = uri,
    token         = token,
    filter_logic  = born_yesterday
  )$data

The value of the born_yesterday filter is "2021-02-24 18:49:41 < [dob]". I haven't tested this, but tell me first if it's the right idea. I think the only risky part is the format of datetime_start is compatible with the format of the dob in REDCap (also stored as a string).

If this solves the performance issues, I'll stick an example in the Advanced REDCapR Operations vignette.

There's a chance it isn't much faster. I'm guessing that even if yesterday's births are only 100 (our of 200k) records, MySQL is still retrieving all those obs, and the PHP is pivoting to wide (in order to connect dob to all the other eav rows in the redcap_data table). So the network traffic is reduced by a lot (b/c only 100 records eventually cross), but the computation load on the webserver (which pivots the obs after retrieving from the database server) is not.

@pbchase
Copy link
Contributor Author

pbchase commented Feb 26, 2021

I agree that your style of date constraint is easier to express in R and more reliable than using REDCap's datediff. Many thanks for that. Yet when I run similar tests in PHP or Curl, the filter logic is unreliable in execute. It keeps returning the entire dataset. REDCap filter logic is not great and it is less great via the API and with dates.

Yet that is all beside the point as dateRangeBegin and dateRangeEnd are different beasts. Those two dates are fed into a query of the event log table to identify the record IDs in the project that have had an insert or update event. This is the code that finds the record IDs of interest.

SELECT DISTINCT pk
FROM ".Logging::getLogEventTable($project_id)."
WHERE project_id = $project_id 
    AND ((object_type = 'redcap_data' AND event IN ('INSERT', 'UPDATE', 'DOC_DELETE', 'DOC_UPLOAD'))
        OR (page = 'PLUGIN' and event = 'OTHER'))
    AND ts >= $dateRangeBegin
    AND ts <= $dateRangeEnd

The execution is fast. It returned the last two hours of update traffic--614 record IDs--in 1.0 seconds from a project of 12K records. These options are perfect for my goal which is data synchronization with low I/O cost and reduced clock time. I really think these could be worthwhile.

Would you entertain a merge if we made a PR?

@wibeasley
Copy link
Member

I gotcha. So these would need a change to REDCap itself (to connect the event log table to the eav table) and then a change to REDCapR? The former would be a lot more work. Adding a new parameter (plus input validation and testing) to REDCapR is typically only an hour or two.

Of course, any PR is welcome. Tell me when you guys start doing it, and I'll help create a REDCap project or two that are dedicated to the unit test for dateRangeBegin/End

@wibeasley
Copy link
Member

Thanks for the code, @pbchase. I made one minor change: the R function accepts a POSIXct value, so it's guaranteed to be a valid datetime. Then REDCapR uses strftime() to format it in the desired string output. Any thoughts?

I added some simple tests too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants