finish first cut of ml functions and vignette entry
fawda123 committed Jun 25, 2024
1 parent 25d5b32 commit 5a22d14
Showing 15 changed files with 167 additions and 17 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -4,6 +4,7 @@ export(anlz_dps)
export(anlz_dps_facility)
export(anlz_ips)
export(anlz_ips_facility)
export(anlz_ml)
export(anlz_ml_facility)
export(util_ps_addcol)
export(util_ps_checkfls)
52 changes: 52 additions & 0 deletions R/anlz_ml.R
@@ -0,0 +1,52 @@
#' Calculate material loss (ML) loads and summarize
#'
#' Calculate material loss (ML) loads and summarize
#'
#' @param fls vector of file paths to raw entity data, one to many
#' @param summ `r summ_params('summ')`
#' @param summtime `r summ_params('summtime')`
#'
#' @details
#' Input data files in \code{fls} are first processed by \code{\link{anlz_ml_facility}} to calculate ML loads for each facility. `r summ_params('descrip')`
#'
#' @return data frame with loading data for TN as tons per month/year. Columns for TP, TSS, BOD, and hydrologic load are also returned with zero load for consistency with other point source load calculation functions.
#'
#' @export
#'
#' @seealso \code{\link{anlz_ml_facility}}
#'
#' @examples
#' fls <- list.files(system.file('extdata/', package = 'tbeploads'),
#' pattern = 'ps_indml', full.names = TRUE)
#' anlz_ml(fls)
anlz_ml <- function(fls, summ = c('entity', 'facility', 'segment', 'all'), summtime = c('month', 'year')){

# get facility and outfall level data
mlbyfac <- anlz_ml_facility(fls)

# add bay segment and source, must use facilities object since no coastco
baysegs <- facilities |>
dplyr::filter(grepl('Material Losses', source)) |>
dplyr::select(bayseg, entity, facility = facname)
mlld <- mlbyfac |>
dplyr::left_join(baysegs, by = c('entity', 'facility')) |>
dplyr::mutate(
segment = dplyr::case_when(
bayseg == 1 ~ "Old Tampa Bay",
bayseg == 2 ~ "Hillsborough Bay",
bayseg == 3 ~ "Middle Tampa Bay",
bayseg == 4 ~ "Lower Tampa Bay",
TRUE ~ NA_character_
),
source = 'ML'
) |>
dplyr::select(-bayseg)

##
# summarize by selection

out <- util_ps_summ(mlld, summ = summ, summtime = summtime)

return(out)

}
14 changes: 8 additions & 6 deletions R/anlz_ml_facility.R
@@ -5,9 +5,9 @@
#' @param fls vector of file paths to raw facility data, one to many
#'
#' @details
#' Input data should one row per year per facility, where the row shows the total tons per year of total nitrogen loss.
#' Input data should be one row per year per facility, where the row shows the total tons per year of total nitrogen loss. Input files are often created by hand based on reported annual tons of nitrogen shipped at each facility. The material losses as tons/yr are estimated from the tons shipped using an agreed upon loss rate. Values reported in the example files represent the estimated loss as the total tons of N shipped each year multiplied by 0.0023 and divided by 2000. The total N shipped at a facility each year can be obtained using a simple back-calculation (multiply by 2000, divide by 0.0023).
#'
#' @return data frame that is nearly identical to the input data except results are shown as monthly load as the annual estimate divided by 12. This is for consistency of reporting without sources.
#' @return data frame that is nearly identical to the input data except results are shown as monthly load as the annual loss estimate divided by 12. This is for consistency of reporting with other loading sources.
#'
#' @seealso \code{\link{anlz_ml}}
#'
@@ -21,7 +21,7 @@ anlz_ml_facility <- function(fls){

##
# import and prep all data
browser()

mlprep <- tibble::tibble(
fls = fls
) |>
@@ -35,12 +35,14 @@ anlz_ml_facility <- function(fls){
tidyr::unnest('entinfo') |>
tidyr::unnest('dat')

##
# expand to monthly

ml <- tidyr::crossing(
unique(mlprep[, c('year', 'entity', 'facname')]),
unique(mlprep[, c('Year', 'entity', 'facname')]),
Month = 1:12
) |>
dplyr::full_join(mlprep, by = c('year', 'entity', 'facname')) |>
dplyr::full_join(mlprep, by = c('Year', 'entity', 'facname')) |>
dplyr::mutate(
tn_load = tn_tonsyr / 12,
tp_load = NA,
@@ -50,7 +52,7 @@ anlz_ml_facility <- function(fls){
source = NA
) |>
dplyr::select(
Year = year,
Year,
Month,
entity,
facility = facname,
2 changes: 1 addition & 1 deletion R/globalVariables.R
@@ -3,7 +3,7 @@ globalVariables(c(".", "Facility.Name", "Month", "Outfall.ID", "Permit.Number",
"facname", "facnameshr", "flow_m3m", "flow_mgd", "flow_mgm", "hy_load",
"load_kg", "outfall", "outfallno", "tn_load", "tp_load", "tss_load", "var",
"basin", "coastco", "coastid", "facid", "permit", "dat", "dbasing", "hectare",
"name", "segment", "spccpro"))
"name", "segment", "spccpro", "tn_tonsyr"))

#' @importFrom utils read.table
NULL
2 changes: 1 addition & 1 deletion inst/extdata/ps_indml_csx_rock_2020.txt
@@ -1,2 +1,2 @@
Facility.Name year tn_tonsyr
Facility.Name Year tn_tonsyr
CSX Rockport 2020 0.88388253765
2 changes: 1 addition & 1 deletion inst/extdata/ps_indml_kinder_tampaplex_1718.txt
@@ -1,3 +1,3 @@
Facility.Name year tn_tonsyr
Facility.Name Year tn_tonsyr
Kinder Morgan Tampaplex 2017 0.18758685
Kinder Morgan Tampaplex 2018 0.1896925
2 changes: 1 addition & 1 deletion inst/extdata/ps_indml_kinder_tampaplex_2019.txt
@@ -1,2 +1,2 @@
Facility.Name year tn_tonsyr
Facility.Name Year tn_tonsyr
Kinder Morgan Tampaplex 2019 0.192658902
2 changes: 1 addition & 1 deletion inst/extdata/ps_indml_mosaic_riverview_2021.txt
@@ -1,2 +1,2 @@
Facility.Name year tn_tonsyr
Facility.Name Year tn_tonsyr
Mosaic Riverview 2021 0.986666666666667
36 changes: 36 additions & 0 deletions man/anlz_ml.Rd


2 changes: 2 additions & 0 deletions tests/testthat/helper-data.R
@@ -5,5 +5,7 @@ psindpth <- system.file('extdata/ps_ind_busch_busch_2020.txt', package = 'tbeplo
fls <- list.files(system.file('extdata/', package = 'tbeploads'), full.names = TRUE)
psdomfls <- fls[grepl('ps_dom', fls)]
psindfls <- fls[grepl('ps_ind_', fls)]
indmlfls <- fls[grepl('ps_indml', fls)]
dps <- anlz_dps_facility(psdomfls)
ips <- anlz_ips_facility(psindfls)
ml <- anlz_ml_facility(indmlfls)
3 changes: 2 additions & 1 deletion tests/testthat/test-anlz_dps.R
@@ -1,9 +1,10 @@
# Test cases
test_that("anlz_dps returns correct results for facility, month", {

# summarize by facility and month
result <- anlz_dps(psdomfls, summ = 'facility', summtime = 'month')
result <- names(result)
expected <- c("Year", "Month", "source", "entity", "facility", "segment", "tn_load",
"tp_load", "tss_load", "bod_load", "hy_load")
expect_identical(result, expected)

})
3 changes: 2 additions & 1 deletion tests/testthat/test-anlz_ips.R
@@ -1,9 +1,10 @@
# Test cases
test_that("anlz_ips returns correct results for facility, month", {

# summarize by facility and month
result <- anlz_ips(psindfls, summ = 'facility', summtime = 'month')
result <- names(result)
expected <- c("Year", "Month", "source", "entity", "facility", "segment", "tn_load",
"tp_load", "tss_load", "bod_load", "hy_load")
expect_identical(result, expected)

})
10 changes: 10 additions & 0 deletions tests/testthat/test-anlz_ml.R
@@ -0,0 +1,10 @@
test_that("anlz_ml returns correct results for facility, month", {

# summarize by facility and month
result <- anlz_ml(indmlfls, summ = 'facility', summtime = 'month')
result <- names(result)
expected <- c("Year", "Month", "source", "entity", "facility", "segment", "tn_load",
"tp_load", "tss_load", "bod_load", "hy_load")
expect_identical(result, expected)

})
20 changes: 20 additions & 0 deletions tests/testthat/test-anlz_ml_facility.R
@@ -0,0 +1,20 @@

test_that("Check load calculations", {

result <- ml |>
filter(Year == 2021 & Month == 1 & facility == 'Riverview') |>
mutate_if(is.numeric, round, 3)

expect_equal(result$tn_load[[1]], 0.082)
expect_equal(result$tp_load[[1]], NA)
expect_equal(result$tss_load[[1]], NA)
expect_equal(result$bod_load[[1]], NA)
expect_equal(result$hy_load[[1]], NA)

})

test_that("Verify output class", {

expect_s3_class(ml, "data.frame")

})
33 changes: 29 additions & 4 deletions vignettes/tbeploads.Rmd
@@ -40,7 +40,7 @@ Load estimates are broadly defined as domestic point source (DPS), industrial po

The DPS functions are designed to work with raw entity data provided by partners. The core function is `anlz_dps_facility()`, which requires only a vector of file paths as input, where each path points to a file with monthly parameter concentration (mg/L) and flow data (million gallons per day). The data also describe whether the observations are end of pipe (direct inflow to the bay) or reuse (applied to the land), with each defined by outfall IDs typically noted as D-001, D-002, etc. and R-001, R-002, etc., respectively. Both are estimated as concentration times flow, whereas reuse includes an attenuation factor for land application depending on location. The file names must follow a specific convention, where metadata for each entity is found in the `facilities()` data object using information in the file name.

For convenience, three example files are included with the package. The paths to these files are used as input to the function. Non-trivial data pre-processing and quality control is needed for each file and those included in the package are the correct format. The output is returned as tons per month for TN, TP, TSS, and BOD and million cubic meters per month for flow (hy).
For convenience, four example files are included with the package. The paths to these files are used as input to the function. Non-trivial data pre-processing and quality control is needed for each file and those included in the package are the correct format. The output is returned as tons per month for TN, TP, TSS, and BOD and million cubic meters per month for flow (hy).

```{r}
dpsfls <- list.files(system.file('extdata/', package = 'tbeploads'),
@@ -51,7 +51,7 @@ anlz_dps_facility(dpsfls)
The `anlz_dps()` function uses `anlz_dps_facility()` to summarize the DPS results by location as facility (combines outfall data), entity (combines facility data), bay segment (combines entity data), and as all (combines bay segment data). The results can also be temporally summarized as monthly or annual totals. The location summary is defined by the `summ` argument and the temporal summary is defined by the `summtime` argument. The `fls` argument used by `anlz_dps_facility()` is also used by `anlz_dps()`. The output is tons per month for TN, TP, TSS, and BOD and as million cubic meters per month for flow (hy) if `summtime = 'month'` or tons per year for TN, TP, TSS, and BOD and million cubic meters per year for flow (hy) if `summtime = 'year'`.

```{r}
# combine by enity and month
# combine by entity and month
anlz_dps(dpsfls, summ = 'entity', summtime = 'month')
# combine by bay segment and year
@@ -62,7 +62,7 @@ anlz_dps(dpsfls, summ = "segment", summtime = "year")

The IPS functions are designed to work with raw entity data provided by partners and are similar in functionality to the DPS functions. The core function is `anlz_ips_facility()` that requires only a vector of file paths as input, where each path points to a file with monthly parameter concentration (mg/L) and flow data (million gallons per day). Loads are estimated as concentration times flow. The file names must follow a specific convention, where metadata for each entity is found in the `facilities()` data object using information in the file name.
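
As a rough illustration of "concentration times flow", the sketch below converts a concentration in mg/L and a flow in million gallons per day to tons per month. The function name `load_tons_month()` is hypothetical and not part of the package, the conversion factors are standard unit definitions rather than values taken from the package source, and "tons" is assumed to mean US short tons.

```{r}
# Illustrative only: load_tons_month() is a hypothetical helper, not a
# tbeploads function, and the factors are standard unit definitions
load_tons_month <- function(conc_mgl, flow_mgd, days_in_month) {
  m3_per_mgal <- 3785.411784  # cubic meters in one million US gallons
  kg_per_ton <- 907.18474     # assumes US short tons
  flow_m3m <- flow_mgd * m3_per_mgal * days_in_month  # cubic meters per month
  load_kg <- conc_mgl * flow_m3m / 1000               # mg/L equals g/m3, so convert g to kg
  load_kg / kg_per_ton                                # kg to tons
}

# hypothetical facility: 3 mg/L TN at 2 mgd over a 30-day month
load_tons_month(conc_mgl = 3, flow_mgd = 2, days_in_month = 30)
```

The number of days per month is passed explicitly here because monthly flow totals depend on month length; how the package handles this is not shown in this commit.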

For convenience, three example files are included with the package. The paths to these files are used as input to the function. As before, non-trivial data pre-processing and quality control is needed for each file and those included in the package are the correct format. The output is returned as tons per month for TN, TP, TSS, and BOD and million cubic meters per month for flow (hy).
For convenience, four example files are included with the package. The paths to these files are used as input to the function. As before, non-trivial data pre-processing and quality control is needed for each file and those included in the package are the correct format. The output is returned as tons per month for TN, TP, TSS, and BOD and million cubic meters per month for flow (hy).

```{r}
ipsfls <- list.files(system.file('extdata/', package = 'tbeploads'),
@@ -73,9 +73,34 @@ anlz_ips_facility(ipsfls)
The `anlz_ips()` function uses `anlz_ips_facility()` to summarize the IPS results by location as facility (combines outfall data), entity (combines facility data), bay segment (combines entity data), and as all (combines bay segment data). The results can also be temporally summarized as monthly or annual totals. The location summary is defined by the `summ` argument and the temporal summary is defined by the `summtime` argument. The `fls` argument used by `anlz_ips_facility()` is also used by `anlz_ips()`. The output is tons per month for TN, TP, TSS, and BOD and as million cubic meters per month for flow (hy) if `summtime = 'month'` or tons per year for TN, TP, TSS, and BOD and million cubic meters per year for flow (hy) if `summtime = 'year'`.

```{r}
# combine by enity and month
# combine by entity and month
anlz_ips(ipsfls, summ = 'entity', summtime = 'month')
# combine by bay segment and year
anlz_ips(ipsfls, summ = "segment", summtime = "year")
```

### Material Losses (ML)

Material losses are estimates of nutrient loads to the bay primarily from fertilizer shipping activities at ports. Historically, loadings from material losses were much higher than at present. Only a few entities report material losses, typically as a total for the year and only for total nitrogen. The material losses as tons/yr are estimated from the tons shipped using an agreed upon loss rate. Values reported in the example files represent the estimated loss as the total tons of N shipped each year multiplied by 0.0023 and divided by 2000. The total N shipped at a facility each year can be obtained using a simple back-calculation (multiply by 2000, divide by 0.0023).
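
The loss calculation and its back-calculation can be sketched as follows. The helper names `ml_from_shipped()` and `shipped_from_ml()` are hypothetical and used only for illustration; the loss rate and divisor are those stated above, and the input value is the Mosaic Riverview 2021 estimate from the example files.

```{r}
# hypothetical helpers, not part of tbeploads
loss_rate <- 0.0023

# estimated material loss (tons/yr) from the reported amount of N shipped
ml_from_shipped <- function(shipped) shipped * loss_rate / 2000

# back-calculate the amount of N shipped from the estimated loss
shipped_from_ml <- function(tn_tonsyr) tn_tonsyr * 2000 / loss_rate

# Mosaic Riverview 2021 value from the example file
tn_tonsyr <- 0.986666666666667
shipped <- shipped_from_ml(tn_tonsyr)

# the round trip recovers the reported loss
all.equal(ml_from_shipped(shipped), tn_tonsyr)
```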

The core function is `anlz_ml_facility()`, which requires only a vector of file paths as input. Each file should contain one row per year per facility, with each row giving the total nitrogen loss in tons per year. The file names must follow a specific convention, where metadata for each entity is found in the `facilities()` data object using information in the file name.
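
How the package parses file names is not shown here. As a minimal sketch, assuming the underscore-separated pattern seen in the example file names (type, entity, facility, and period, e.g., `ps_indml_mosaic_riverview_2021.txt`), the pieces could be pulled apart as below. The element names in `info` are hypothetical labels chosen for illustration.

```{r}
# one of the example file names included with the package
fl <- 'ps_indml_mosaic_riverview_2021.txt'

# drop the directory and extension, then split on underscores
parts <- strsplit(tools::file_path_sans_ext(basename(fl)), '_')[[1]]

# assumed layout: <ps>_<type>_<entity>_<facility>_<period>
info <- list(
  type = paste(parts[1:2], collapse = '_'),
  entity = parts[3],
  facility = parts[4],
  period = parts[5]
)
str(info)
```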

For convenience, four example files are included with the package. The paths to these files are used as input to the function. The output is nearly identical to the input data since no load calculations are used, except results are shown as monthly load as the annual loss divided by 12. Additional empty columns (e.g., TP load, TSS load, etc.) are also returned for consistency of reporting with other loading sources.

```{r}
mlfls <- list.files(system.file('extdata/', package = 'tbeploads'),
pattern = 'ps_indml', full.names = TRUE)
anlz_ml_facility(mlfls)
```
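
The annual-to-monthly expansion described above can be illustrated with base R on a toy copy of the Kinder Morgan Tampaplex example file. This is a simplified sketch rather than the package code, which uses `tidyr::crossing()` and a join for the same purpose.

```{r}
# toy annual input mirroring the Kinder Morgan Tampaplex example file
annual <- data.frame(
  Facility.Name = 'Kinder Morgan Tampaplex',
  Year = c(2017, 2018),
  tn_tonsyr = c(0.18758685, 0.1896925)
)

# cross join with months 1-12; merge() returns all combinations
# when the two data frames share no column names
monthly <- merge(annual, data.frame(Month = 1:12))

# monthly load is the annual estimate divided by 12
monthly$tn_load <- monthly$tn_tonsyr / 12

# the monthly loads sum back to the annual estimate
check <- aggregate(tn_load ~ Year, data = monthly, FUN = sum)
all.equal(check$tn_load, annual$tn_tonsyr)
```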

The `anlz_ml()` function uses `anlz_ml_facility()` to summarize the ML results by location as facility, entity (combines facility data), bay segment (combines entity data), and as all (combines bay segment data). The results can also be temporally summarized as monthly or annual totals. The location summary is defined by the `summ` argument and the temporal summary is defined by the `summtime` argument. The `fls` argument used by `anlz_ml_facility()` is also used by `anlz_ml()`. The output is tons per month of TN if `summtime = 'month'` or tons per year of TN if `summtime = 'year'`. Columns for TP, TSS, BOD, and hydrologic load are also returned with zero load for consistency with other point source load calculation functions. Material loss loads are often combined with IPS loads for reporting.

```{r}
# combine by entity and month
anlz_ml(mlfls, summ = 'entity', summtime = 'month')
# combine by bay segment and year
anlz_ml(mlfls, summ = "segment", summtime = "year")
```
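
One way ML and IPS loads might be combined for reporting is sketched below. The two data frames are toy stand-ins with made-up TN values, not output from `anlz_ips()` or `anlz_ml()`, and the column names are assumed to follow the `Year`, `source`, `segment`, and `tn_load` names used elsewhere in the package.

```{r}
# made-up annual TN loads (tons/yr) by bay segment, for illustration only
ips_yr <- data.frame(
  Year = 2021,
  source = 'IPS',
  segment = c('Hillsborough Bay', 'Old Tampa Bay'),
  tn_load = c(10, 5)
)
ml_yr <- data.frame(
  Year = 2021,
  source = 'ML',
  segment = 'Hillsborough Bay',
  tn_load = 1
)

# stack the two sources and sum TN load within each year and bay segment
combined <- aggregate(tn_load ~ Year + segment, data = rbind(ips_yr, ml_yr), FUN = sum)
combined
```

Summing after stacking drops the `source` column, so the distinction between IPS and ML is lost in the combined total; keep the stacked table if both need to be reported separately.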
