Commit

reduction of timings of the examples of document_term_matrix, document_term_frequencies, document_term_frequencies_statistics, cooccurrence, dtm_bind, keywords_collocation
jwijffels committed Nov 9, 2022
1 parent 22dbf67 commit 4f0a0f4
Showing 9 changed files with 43 additions and 5 deletions.
1 change: 1 addition & 0 deletions NEWS.md
@@ -1,6 +1,7 @@
 ## CHANGES IN udpipe VERSION 0.8.10

 - use snprintf instead of sprintf to handle the R CMD check deprecating note on M1mac
+- reduction of timings of the examples of document_term_matrix, document_term_frequencies, document_term_frequencies_statistics, cooccurrence, dtm_bind, keywords_collocation

 ## CHANGES IN udpipe VERSION 0.8.9

9 changes: 6 additions & 3 deletions R/nlp_collocation.R
@@ -41,17 +41,20 @@
 #' @export
 #' @aliases keywords_collocation collocation
 #' @examples
+#' \dontshow{
+#' data.table::setDTthreads(1)
+#' }
 #' data(brussels_reviews_anno)
-#' x <- subset(brussels_reviews_anno, language %in% "fr")
+#' x <- subset(brussels_reviews_anno, language %in% "fr")
 #' colloc <- keywords_collocation(x, term = "lemma", group = c("doc_id", "sentence_id"),
 #' ngram_max = 3, n_min = 10)
 #' head(colloc, 10)
 #'
 #' ## Example on finding collocations of nouns preceded by an adjective
 #' library(data.table)
 #' x <- as.data.table(x)
-#' x[, xpos_previous := txt_previous(xpos, n = 1), by = list(doc_id, sentence_id)]
-#' x[, xpos_next := txt_next(xpos, n = 1), by = list(doc_id, sentence_id)]
+#' x <- x[, xpos_previous := txt_previous(xpos, n = 1), by = list(doc_id, sentence_id)]
+#' x <- x[, xpos_next := txt_next(xpos, n = 1), by = list(doc_id, sentence_id)]
 #' x <- subset(x, (xpos %in% c("NN") & xpos_previous %in% c("JJ")) |
 #' (xpos %in% c("JJ") & xpos_next %in% c("NN")))
 #' colloc <- keywords_collocation(x, term = "lemma", group = c("doc_id", "sentence_id"),
3 changes: 3 additions & 0 deletions R/nlp_cooccurrence.R
@@ -34,6 +34,9 @@
 #' for the combination of term1 and term2 how many times this combination occurred
 #' @export
 #' @examples
+#' \dontshow{
+#' data.table::setDTthreads(1)
+#' }
 #' data(brussels_reviews_anno)
 #'
 #' ## By document, which lemma's co-occur
16 changes: 15 additions & 1 deletion R/nlp_flow.R
@@ -17,6 +17,9 @@
 #' will assume that freq is 1 for each row in the input dataset \code{x}.
 #' @export
 #' @examples
+#' \dontshow{
+#' data.table::setDTthreads(1)
+#' }
 #' ##
 #' ## Calculate document_term_frequencies on a data.frame
 #' ##
@@ -119,6 +122,10 @@ document_term_frequencies.character <- function(x, document=paste("doc", seq_alo
 #' @export
 #' @examples
 #' data(brussels_reviews_anno)
+#' \dontshow{
+#' data.table::setDTthreads(1)
+#' brussels_reviews_anno <- subset(brussels_reviews_anno, language %in% "nl")
+#' }
 #' x <- document_term_frequencies(brussels_reviews_anno[, c("doc_id", "token")])
 #' x <- document_term_frequencies_statistics(x)
 #' head(x)
@@ -170,6 +177,9 @@ document_term_frequencies_statistics <- function(x, k = 1.2, b = 0.75){
 #' @export
 #' @seealso \code{\link[Matrix]{sparseMatrix}}, \code{\link{document_term_frequencies}}
 #' @examples
+#' \dontshow{
+#' data.table::setDTthreads(1)
+#' }
 #' x <- data.frame(doc_id = c(1, 1, 2, 3, 4),
 #' term = c("A", "C", "Z", "X", "G"),
 #' freq = c(1, 5, 7, 10, 0))
@@ -661,6 +671,9 @@ dtm_cor <- function(x) {
 #' @aliases dtm_rbind dtm_cbind
 #' @export
 #' @examples
+#' \dontshow{
+#' data.table::setDTthreads(1)
+#' }
 #' data(brussels_reviews_anno)
 #' x <- brussels_reviews_anno
 #'
@@ -681,7 +694,8 @@ dtm_cor <- function(x) {
 #'
 #' ## cbind
 #' library(data.table)
-#' x <- as.data.table(brussels_reviews_anno)
+#' x <- subset(brussels_reviews_anno, language %in% c("nl", "fr"))
+#' x <- as.data.table(x)
 #' x <- x[, token_bigram := txt_nextgram(token, n = 2), by = list(doc_id, sentence_id)]
 #' x <- x[, lemma_upos := sprintf("%s//%s", lemma, upos)]
 #' dtm1 <- document_term_frequencies(x = x, document = "doc_id", term = c("token"))
3 changes: 3 additions & 0 deletions man/cooccurrence.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions man/document_term_frequencies.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions man/document_term_frequencies_statistics.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions man/document_term_matrix.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 5 additions & 1 deletion man/dtm_bind.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
