Title: | Base package for the y3628 data analysis suite |
---|---|
Description: | Base functionality used by other packages in the y3628 suite of data analysis tools. The functions were initially written for various projects during my years at Michigan. |
Authors: | Ye Yuan [aut, cre] |
Maintainer: | Ye Yuan <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.0.3 |
Built: | 2025-02-13 04:07:09 UTC |
Source: | https://github.com/yeyuan98/y3628 |
Read 'table' file with flexible file extension
flexTableReader(col_names, col_types, skip, ...)
col_names |
Column names, character vector |
col_types |
Column types, same as |
skip |
Number of lines to skip before reading data |
... |
These dots are for future extensions and must be empty. |
Depending on the actual file format provided by the user, column types may
not be respected. For example, if the file is an Excel spreadsheet, the
provided column types are mapped to those supported by the readxl
package.
Reader function that accepts a file path
# TODO
Get specific parameters for a table reader
getNormalizedParams(ext, ...)
ext |
File extension. |
... |
Parameters, see details. |
Currently, the following parameters are supported: col_names, col_types, skip. For Excel files, col_types is not supported yet.
Named list of normalized parameters.
# Not exported
Get filetype-specific table reader
getTableReader(ext)
ext |
File extension. |
A suitable reader function.
# Not exported.
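Since the internal reader helpers are not exported and have no examples, the following is a minimal base-R sketch of the general idea of choosing a reader from a file extension. It is not the y3628 implementation: `pick_reader` and the set of supported extensions are hypothetical, and only `utils` and `tools` functions are used.

```r
# Hypothetical sketch; not the y3628 implementation.
# Choose a base-R reader function from a file extension.
pick_reader <- function(ext) {
  ext <- tolower(sub("^\\.", "", ext))  # accept "csv" as well as ".csv"
  switch(
    ext,
    csv = function(path, ...) utils::read.csv(path, ...),
    tsv = ,                              # falls through to the txt reader
    txt = function(path, ...) utils::read.delim(path, ...),
    stop("Unsupported file extension: ", ext)
  )
}

# Usage: write a small CSV to a temporary file and read it back.
path <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(a = 1:2, b = c("x", "y")), path, row.names = FALSE)
reader <- pick_reader(tools::file_ext(path))
df <- reader(path)
```

The returned value is itself a function that accepts a file path, which mirrors the "reader function" return type described above.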
Motivation: dplyr::group_by() takes its grouping variables through the ellipsis, which makes them flexible to specify. However, a function sometimes needs the ellipsis for other purposes. In those cases, it is desirable to "save" the ellipsis by letting the user provide all grouping variables in a single parameter.
grouper(.data, var.group, ...)
.data |
Data frame to group. |
var.group |
Quoted group variable(s). Multiple variables must be inside gvars(). |
... |
Must be empty. |
This function is not intended for end users.
Grouped data frame.
# Correct use
f <- function(df, var.group){
  var.group <- rlang::enexpr(var.group)
  return(grouper(df, var.group))
}
# Single group variable
f(mtcars, cyl)
# Multiple group variable
f(mtcars, gvars(cyl,vs))

# Wrong use
f <- function(df, var.group){
  return(grouper(df, var.group))
}
# f(mtcars, cyl) will error
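To illustrate the "save the ellipsis" motivation, here is a minimal sketch of how a quoted single-parameter grouping argument could be unpacked with rlang and dplyr. This is an assumption about the general technique, not the package's code: `group_by_quoted` and `count_rows` are hypothetical names, and `gvars(...)` is only inspected as an unevaluated call, so no `gvars` function needs to exist for the sketch to run.

```r
library(dplyr)
library(rlang)

# Hypothetical helper, not the y3628 implementation. `var.group` is an
# already quoted expression: either a bare name or a gvars(...) call.
group_by_quoted <- function(.data, var.group) {
  if (is_call(var.group, "gvars")) {
    vars <- call_args(var.group)  # the expressions inside gvars(...)
  } else {
    vars <- list(var.group)       # a single grouping variable
  }
  group_by(.data, !!!vars)
}

# The caller quotes its own argument, so `...` stays free for other uses
# (here it is forwarded to summarise()).
count_rows <- function(df, var.group, ...) {
  var.group <- enexpr(var.group)
  summarise(group_by_quoted(df, var.group), n = n(), ...)
}

by_cyl <- count_rows(mtcars, cyl)
by_cyl_vs <- count_rows(mtcars, gvars(cyl, vs))
```

Quoting with `enexpr()` in the calling function is the step that the "wrong use" example above omits.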
Customizable summary of data frame variables
groupThenSummarize(.data, var.group, .fns, ...)
.data |
a data frame (extension) |
var.group |
variables to group, either single name or gvars() |
.fns |
named list of summary functions |
... |
Columns to summarize (for example, disp:wt). |
a data frame (extension) of summary
# Single group variable
groupThenSummarize(mtcars, cyl, list(m=mean, s=sd), disp:wt)
# Multiple group variable
groupThenSummarize(mtcars, gvars(cyl,vs), list(m=mean, s=sd), disp:wt)
# cyl == 4 & vs == 0 group will have NA sd values.
# This is because there is only one row in this group.
mtcars[mtcars$cyl == 4 & mtcars$vs == 0,]
metaRcrd record-style vector
This creates a vector where each item is a record of data and metadata fields. Data fields are used for record equality/comparison operations. Metadata fields are conceptually "attributes" attached to the data fields.
metaRcrd(fields, meta.fields, ...)
fields |
A named list or data.frame where each row is a record. Names of this list are the field names for the record vector. |
meta.fields |
A character vector giving fields that should be considered as "metadata" fields. |
... |
For future extensions. Must be empty. |
An S3 vector of class y3628_metaRcrd.
Set operations must use vctrs methods (e.g., vctrs::vec_set_union()). Base set operations are not generic and hence invalid.
require(vctrs)
require(tibble)

## Representing metadata of experimental samples
today <- Sys.Date()
dates <- c(today, today, today, today+1) # when
conds <- factor(c("L", "M", "H", "L"), levels = c("L", "M", "H")) # condition
who <- c("me", "me", "me", "Ahri") # personnel
eg1 <- metaRcrd(list(date=dates, condition=conds, personnel=who), "personnel")

dates <- c(today) # when
conds <- factor("L", levels = c("L", "M", "H")) # condition
who <- "Ahri" # personnel
eg2 <- metaRcrd(list(date=dates, condition=conds, personnel=who), "personnel")

### concatenate
eg_full <- vec_c(eg1, eg2) # c() works just fine too
eg_full

### equality/set operation
eg1 == eg2
vec_set_difference(eg1, eg2) # use vec_set_* methods

### comparison/sort
eg1 <= eg2
sort(eg_full)

### use in a tibble
proj <- c("fancy project 1", "ok project 2") # what project
rec <- list(eg1, eg2) # experiment record for each project
project_record <- tibble(project = proj, record = rec)
project_record
Filtering metaRcrd objects
mR_filter(mRcrd, what)
mRcrd |
An S3 vector of class y3628_metaRcrd. |
what |
Filtering criteria. Quoted and evaluated to subset the record. |
Subsetted y3628_metaRcrd.
require(datasets)
my_mtcars <- datasets::mtcars[c("cyl", "hp", "mpg")]
my_mtcars <- metaRcrd(my_mtcars, meta.fields = "mpg")
mR_filter(my_mtcars, mpg > 20)
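The phrase "quoted and evaluated" can be illustrated on a plain data frame. The sketch below is a base-R analogue of the idea and makes no claim about how mR_filter() is implemented: `filter_quoted` is a hypothetical helper that captures the criterion unevaluated and then evaluates it with the columns in scope.

```r
# Hypothetical sketch on a plain data frame; not the mR_filter() implementation.
filter_quoted <- function(df, what) {
  cond <- substitute(what)                # capture the criterion unevaluated
  keep <- eval(cond, df, parent.frame())  # evaluate with columns of df in scope
  df[!is.na(keep) & keep, , drop = FALSE] # drop rows where the criterion is NA
}

small <- datasets::mtcars[c("cyl", "hp", "mpg")]
res <- filter_quoted(small, mpg > 20)
```

Because the criterion is quoted, the caller writes `mpg > 20` directly instead of `small$mpg > 20`.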
Applying summary function to string splits
str_split_summary(string, pattern, summary)
string |
Input vector, passed to |
pattern |
Pattern to look for, passed to |
summary |
Summary function that will be applied to each split. |
Vector same length as input vector
t <- c("0.1,0.3,0.7", "NA", "0.2", "")
str_split_summary(t, ",", \(x) mean(as.numeric(x)))
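For readers who want to see the split-then-summarize idea spelled out, here is a rough base-R sketch. It is an assumption about the general technique rather than the package's code: `split_summary` is a hypothetical name, it treats the pattern as a fixed string, it assumes the summary returns a single number, and its handling of empty strings may differ from str_split_summary().

```r
# Hypothetical base-R sketch; the real function may differ, for example in
# how it treats empty strings or regular-expression patterns.
split_summary <- function(string, pattern, summary) {
  pieces <- strsplit(string, pattern, fixed = TRUE)  # one character vector per input
  vapply(pieces, summary, FUN.VALUE = numeric(1))    # one number per input
}

x <- c("0.1,0.3,0.7", "NA", "0.2")
res <- split_summary(x, ",", function(p) mean(as.numeric(p)))
```

The result has one element per input string, matching the "vector same length as input vector" return value described above.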