Title: | The Machine Learning and AI Modeling Companion to 'healthyR' |
---|---|
Description: | Hospital machine learning and AI data analysis workflow tools, modeling, and automations. This library provides many useful tools for reviewing common administrative hospital data, including predicting length of stay and readmissions. The aim is to provide a simple and consistent verb framework that takes the guesswork out of everything. |
Authors: | Steven Sanderson [aut, cre, cph] |
Maintainer: | Steven Sanderson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0.9000 |
Built: | 2024-12-10 03:14:17 UTC |
Source: | https://github.com/spsanderson/healthyR.ai |
8 Hex RGB color definitions suitable for charts for colorblind people.
color_blind()
This function is used by other functions in this package to help render plots for those who are color blind.
A vector of 8 Hex RGB definitions.
Steven P. Sanderson II, MPH
Other Color_Blind: hai_scale_color_colorblind(), hai_scale_fill_colorblind()
color_blind()
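A short sketch of how the returned palette might be used with a manual ggplot2 color scale. This assumes ggplot2 is installed; `color_blind()` simply returns a character vector of 8 hex codes, and the package also provides hai_scale_color_colorblind() for the same purpose:

```r
library(healthyR.ai)
library(ggplot2)

# color_blind() returns a character vector of 8 hex RGB codes
pal <- color_blind()
length(pal)

# Feed the palette into a manual color scale
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = pal)
```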
This function creates a square mesh by sampling nodes uniformly on a square and then connecting these nodes with edges. The nodes are distributed based on the provided side length and number of segments. Horizontal, vertical, and diagonal edges are generated to fully connect the mesh. The function returns a list containing the nodes and edges, along with data frames and a ggplot object for visualization.
generate_mesh_data(.side_length = 1, .n_seg = 1)
.side_length |
A single numeric value representing the side length of the square. |
.n_seg |
A positive integer representing the number of segments along each side of the square. |
This function generates a mesh of nodes and edges based on the provided side length and number of segments.
This function creates a square mesh of nodes and edges, where the nodes are sampled uniformly on a square. The edges are generated to connect the nodes horizontally, vertically, and diagonally.
A list containing:
A matrix with coordinates of the nodes.
A list of edges connecting the nodes.
A data frame of nodes for ggplot.
A data frame of edges for ggplot.
A ggplot object visualizing the nodes and edges.
Additionally, the list contains attributes:
The side length used to generate the mesh.
The number of segments used to generate the mesh.
Dimensions of the nodes data frame.
Dimensions of the edges data frame.
Steven P. Sanderson II, MPH
Other Data Generation:
get_juiced_data()
generate_mesh_data(1, 1)
generate_mesh_data(1, 2)
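A brief sketch of inspecting the returned list. The exact element names are not spelled out above, so treat the inspection calls below as the way to discover them rather than a guarantee of specific names:

```r
library(healthyR.ai)

mesh <- generate_mesh_data(.side_length = 1, .n_seg = 2)

# Per the description, the list carries the node matrix, the edge list,
# data frames for ggplot, and a ggplot object.
names(mesh)      # discover the component names
attributes(mesh) # side length, segment count, and data frame dimensions
```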
This is a simple function that will get the juiced data from a recipe.
get_juiced_data(.recipe_object)
.recipe_object |
The recipe object you want to pass. |
Instead of typing out something like:
recipe_object %>% prep() %>% juice() %>% glimpse()
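you can collapse the pipeline into a single call. A minimal sketch using the built-in mtcars data (any recipe works):

```r
library(recipes)
library(healthyR.ai)

# A trivial recipe on a built-in data set
rec_obj <- recipe(mpg ~ ., data = mtcars)

# Equivalent to rec_obj %>% prep() %>% juice()
get_juiced_data(rec_obj)
```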
A tibble of the prepped and juiced data from the given recipe
Steven P. Sanderson II, MPH
Other Data Generation:
generate_mesh_data()
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(healthyR.data))
suppressPackageStartupMessages(library(rsample))
suppressPackageStartupMessages(library(recipes))

data_tbl <- healthyR_data %>%
  select(visit_end_date_time) %>%
  summarise_by_time(
    .date_var = visit_end_date_time,
    .by = "month",
    value = n()
  ) %>%
  set_names("date_col", "value") %>%
  filter_by_time(
    .date_var = date_col,
    .start_date = "2013",
    .end_date = "2020"
  )

splits <- initial_split(data = data_tbl, prop = 0.8)
rec_obj <- recipe(value ~ ., training(splits))

get_juiced_data(rec_obj)
This is a boilerplate function that will automatically create the following:
recipe
model specification
workflow
tuned model (grid, etc.)
hai_auto_c50(
  .data,
  .rec_obj,
  .splits_obj = NULL,
  .rsamp_obj = NULL,
  .tune = TRUE,
  .grid_size = 10,
  .num_cores = 1,
  .best_metric = "f_meas",
  .model_type = "classification"
)
.data |
The data being passed to the function. |
.rec_obj |
This is the recipe object you want to use. You can use hai_c50_data_prepper() to create one. |
.splits_obj |
NULL is the default; when NULL, one will be created. |
.rsamp_obj |
NULL is the default; when NULL, a resampling object will be created automatically. |
.tune |
Default is TRUE; this will create a tuning grid and a tuned workflow. |
.grid_size |
Default is 10. |
.num_cores |
Default is 1. |
.best_metric |
Default is "f_meas". You can choose a metric depending on the model type used. |
.model_type |
Default is "classification". |
This uses parsnip::boost_tree() with the engine set to C5.0.
A list
Steven P. Sanderson II, MPH
Other Boiler_Plate: hai_auto_cubist(), hai_auto_earth(), hai_auto_glmnet(), hai_auto_knn(), hai_auto_ranger(), hai_auto_svm_poly(), hai_auto_svm_rbf(), hai_auto_wflw_metrics(), hai_auto_xgboost()
Other C5.0: hai_c50_data_prepper()
## Not run:
data <- iris
rec_obj <- hai_c50_data_prepper(data, Species ~ .)
auto_c50 <- hai_auto_c50(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "f_meas",
  .model_type = "classification"
)
auto_c50$recipe_info
## End(Not run)
This is a boilerplate function that will automatically create the following:
recipe
model specification
workflow
tuned model (grid, etc.)
hai_auto_cubist(
  .data,
  .rec_obj,
  .splits_obj = NULL,
  .rsamp_obj = NULL,
  .tune = TRUE,
  .grid_size = 10,
  .num_cores = 1,
  .best_metric = "rmse"
)
.data |
The data being passed to the function. |
.rec_obj |
This is the recipe object you want to use. You can use hai_cubist_data_prepper() to create one. |
.splits_obj |
NULL is the default; when NULL, one will be created. |
.rsamp_obj |
NULL is the default; when NULL, a resampling object will be created automatically. |
.tune |
Default is TRUE; this will create a tuning grid and a tuned workflow. |
.grid_size |
Default is 10. |
.num_cores |
Default is 1. |
.best_metric |
Default is "rmse". Cubist is a regression-only model. |
This uses parsnip::cubist_rules() with the engine set to cubist.
A list
Steven P. Sanderson II, MPH
Other Boiler_Plate: hai_auto_c50(), hai_auto_earth(), hai_auto_glmnet(), hai_auto_knn(), hai_auto_ranger(), hai_auto_svm_poly(), hai_auto_svm_rbf(), hai_auto_wflw_metrics(), hai_auto_xgboost()
Other cubist: hai_cubist_data_prepper()
## Not run:
data <- mtcars
rec_obj <- hai_cubist_data_prepper(data, mpg ~ .)
auto_cube <- hai_auto_cubist(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "rmse"
)
auto_cube$recipe_info
## End(Not run)
This is a boilerplate function that will automatically create the following:
recipe
model specification
workflow
tuned model (grid, etc.)
hai_auto_earth(
  .data,
  .rec_obj,
  .splits_obj = NULL,
  .rsamp_obj = NULL,
  .tune = TRUE,
  .grid_size = 10,
  .num_cores = 1,
  .best_metric = "f_meas",
  .model_type = "classification"
)
.data |
The data being passed to the function. |
.rec_obj |
This is the recipe object you want to use. You can use hai_earth_data_prepper() to create one. |
.splits_obj |
NULL is the default; when NULL, one will be created. |
.rsamp_obj |
NULL is the default; when NULL, a resampling object will be created automatically. |
.tune |
Default is TRUE; this will create a tuning grid and a tuned workflow. |
.grid_size |
Default is 10. |
.num_cores |
Default is 1. |
.best_metric |
Default is "f_meas". You can choose a metric depending on the model type used. |
.model_type |
Default is "classification". |
This uses parsnip::mars() with the engine set to earth.
A list
Steven P. Sanderson II, MPH
Other Boiler_Plate: hai_auto_c50(), hai_auto_cubist(), hai_auto_glmnet(), hai_auto_knn(), hai_auto_ranger(), hai_auto_svm_poly(), hai_auto_svm_rbf(), hai_auto_wflw_metrics(), hai_auto_xgboost()
Other Earth: hai_earth_data_prepper()
## Not run:
data <- iris
rec_obj <- hai_earth_data_prepper(data, Species ~ .)
auto_earth <- hai_auto_earth(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "f_meas",
  .model_type = "classification"
)
auto_earth$recipe_info
## End(Not run)
This is a boilerplate function that will automatically create the following:
recipe
model specification
workflow
tuned model (grid, etc.)
hai_auto_glmnet(
  .data,
  .rec_obj,
  .splits_obj = NULL,
  .rsamp_obj = NULL,
  .tune = TRUE,
  .grid_size = 10,
  .num_cores = 1,
  .best_metric = "f_meas",
  .model_type = "classification"
)
.data |
The data being passed to the function. |
.rec_obj |
This is the recipe object you want to use. You can use hai_glmnet_data_prepper() to create one. |
.splits_obj |
NULL is the default; when NULL, one will be created. |
.rsamp_obj |
NULL is the default; when NULL, a resampling object will be created automatically. |
.tune |
Default is TRUE; this will create a tuning grid and a tuned workflow. |
.grid_size |
Default is 10. |
.num_cores |
Default is 1. |
.best_metric |
Default is "f_meas". You can choose a metric depending on the model type used. |
.model_type |
Default is "classification". |
This uses parsnip::multinom_reg() with the engine set to glmnet.
A list
Steven P. Sanderson II, MPH
Other Boiler_Plate: hai_auto_c50(), hai_auto_cubist(), hai_auto_earth(), hai_auto_knn(), hai_auto_ranger(), hai_auto_svm_poly(), hai_auto_svm_rbf(), hai_auto_wflw_metrics(), hai_auto_xgboost()
## Not run:
data <- iris
rec_obj <- hai_glmnet_data_prepper(data, Species ~ .)
auto_glm <- hai_auto_glmnet(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "f_meas",
  .model_type = "classification"
)
auto_glm$recipe_info
## End(Not run)
This is a boilerplate function that will automatically create the following:
recipe
model specification
workflow
tuned model (grid, etc.)
hai_auto_knn(
  .data,
  .rec_obj,
  .splits_obj = NULL,
  .rsamp_obj = NULL,
  .tune = TRUE,
  .grid_size = 10,
  .num_cores = 1,
  .best_metric = "rmse",
  .model_type = "regression"
)
.data |
The data being passed to the function. |
.rec_obj |
This is the recipe object you want to use. You can use hai_knn_data_prepper() to create one. |
.splits_obj |
NULL is the default; when NULL, one will be created. |
.rsamp_obj |
NULL is the default; when NULL, a resampling object will be created automatically. |
.tune |
Default is TRUE; this will create a tuning grid and a tuned workflow. |
.grid_size |
Default is 10. |
.num_cores |
Default is 1. |
.best_metric |
Default is "rmse". You can choose a metric depending on the model type used. |
.model_type |
Default is "regression". |
This uses parsnip::nearest_neighbor() with the engine set to kknn.
A list
Steven P. Sanderson II, MPH
Other Boiler_Plate: hai_auto_c50(), hai_auto_cubist(), hai_auto_earth(), hai_auto_glmnet(), hai_auto_ranger(), hai_auto_svm_poly(), hai_auto_svm_rbf(), hai_auto_wflw_metrics(), hai_auto_xgboost()
## Not run:
library(dplyr)
data <- iris
rec_obj <- hai_knn_data_prepper(data, Species ~ .)
auto_knn <- hai_auto_knn(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "f_meas",
  .model_type = "classification"
)
auto_knn$recipe_info
## End(Not run)
This is a boilerplate function that will automatically create the following:
recipe
model specification
workflow
tuned model (grid, etc.)
hai_auto_ranger(
  .data,
  .rec_obj,
  .splits_obj = NULL,
  .rsamp_obj = NULL,
  .tune = TRUE,
  .grid_size = 10,
  .num_cores = 1,
  .best_metric = "f_meas",
  .model_type = "classification"
)
.data |
The data being passed to the function. |
.rec_obj |
This is the recipe object you want to use. You can use hai_ranger_data_prepper() to create one. |
.splits_obj |
NULL is the default; when NULL, one will be created. |
.rsamp_obj |
NULL is the default; when NULL, a resampling object will be created automatically. |
.tune |
Default is TRUE; this will create a tuning grid and a tuned workflow. |
.grid_size |
Default is 10. |
.num_cores |
Default is 1. |
.best_metric |
Default is "f_meas". You can choose a metric depending on the model type used. |
.model_type |
Default is "classification". |
This uses parsnip::rand_forest() with the engine set to ranger.
A list
Steven P. Sanderson II, MPH
https://parsnip.tidymodels.org/reference/rand_forest.html
Other Boiler_Plate: hai_auto_c50(), hai_auto_cubist(), hai_auto_earth(), hai_auto_glmnet(), hai_auto_knn(), hai_auto_svm_poly(), hai_auto_svm_rbf(), hai_auto_wflw_metrics(), hai_auto_xgboost()
Other Ranger: hai_ranger_data_prepper()
## Not run:
data <- iris
rec_obj <- hai_ranger_data_prepper(data, Species ~ .)
auto_ranger <- hai_auto_ranger(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "f_meas"
)
auto_ranger$recipe_info
## End(Not run)
This is a boilerplate function that will automatically create the following:
recipe
model specification
workflow
tuned model (grid, etc.)
hai_auto_svm_poly(
  .data,
  .rec_obj,
  .splits_obj = NULL,
  .rsamp_obj = NULL,
  .tune = TRUE,
  .grid_size = 10,
  .num_cores = 1,
  .best_metric = "f_meas",
  .model_type = "classification"
)
.data |
The data being passed to the function. |
.rec_obj |
This is the recipe object you want to use. You can use hai_svm_poly_data_prepper() to create one. |
.splits_obj |
NULL is the default; when NULL, one will be created. |
.rsamp_obj |
NULL is the default; when NULL, a resampling object will be created automatically. |
.tune |
Default is TRUE; this will create a tuning grid and a tuned workflow. |
.grid_size |
Default is 10. |
.num_cores |
Default is 1. |
.best_metric |
Default is "f_meas". You can choose a metric depending on the model type used. |
.model_type |
Default is "classification". |
This uses parsnip::svm_poly() with the engine set to kernlab.
A list
Steven P. Sanderson II, MPH
https://parsnip.tidymodels.org/reference/svm_poly.html
Other Boiler_Plate: hai_auto_c50(), hai_auto_cubist(), hai_auto_earth(), hai_auto_glmnet(), hai_auto_knn(), hai_auto_ranger(), hai_auto_svm_rbf(), hai_auto_wflw_metrics(), hai_auto_xgboost()
Other SVM_Poly: hai_svm_poly_data_prepper()
## Not run:
data <- iris
rec_obj <- hai_svm_poly_data_prepper(data, Species ~ .)
auto_svm_poly <- hai_auto_svm_poly(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "f_meas"
)
auto_svm_poly$recipe_info
## End(Not run)
This is a boilerplate function that will automatically create the following:
recipe
model specification
workflow
tuned model (grid, etc.)
hai_auto_svm_rbf(
  .data,
  .rec_obj,
  .splits_obj = NULL,
  .rsamp_obj = NULL,
  .tune = TRUE,
  .grid_size = 10,
  .num_cores = 1,
  .best_metric = "f_meas",
  .model_type = "classification"
)
.data |
The data being passed to the function. |
.rec_obj |
This is the recipe object you want to use. You can use hai_svm_rbf_data_prepper() to create one. |
.splits_obj |
NULL is the default; when NULL, one will be created. |
.rsamp_obj |
NULL is the default; when NULL, a resampling object will be created automatically. |
.tune |
Default is TRUE; this will create a tuning grid and a tuned workflow. |
.grid_size |
Default is 10. |
.num_cores |
Default is 1. |
.best_metric |
Default is "f_meas". You can choose a metric depending on the model type used. |
.model_type |
Default is "classification". |
This uses parsnip::svm_rbf() with the engine set to kernlab.
A list
Steven P. Sanderson II, MPH
https://parsnip.tidymodels.org/reference/svm_rbf.html
Other Boiler_Plate: hai_auto_c50(), hai_auto_cubist(), hai_auto_earth(), hai_auto_glmnet(), hai_auto_knn(), hai_auto_ranger(), hai_auto_svm_poly(), hai_auto_wflw_metrics(), hai_auto_xgboost()
Other SVM_RBF: hai_svm_rbf_data_prepper()
## Not run:
data <- iris
rec_obj <- hai_svm_rbf_data_prepper(data, Species ~ .)
auto_rbf <- hai_auto_svm_rbf(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "f_meas"
)
auto_rbf$recipe_info
## End(Not run)
This function will extract the metrics from the hai_auto_ boilerplate functions.
hai_auto_wflw_metrics(.data)
.data |
The output of one of the hai_auto_ boilerplate functions. |
This function looks for a specific attribute of the hai_auto_ boilerplate output so that it can extract the tuned_results from the tuning process, if the model has indeed been tuned.
A tibble
Steven P. Sanderson II, MPH
Other Boiler_Plate: hai_auto_c50(), hai_auto_cubist(), hai_auto_earth(), hai_auto_glmnet(), hai_auto_knn(), hai_auto_ranger(), hai_auto_svm_poly(), hai_auto_svm_rbf(), hai_auto_xgboost()
## Not run:
data <- iris
rec_obj <- hai_knn_data_prepper(data, Species ~ .)
auto_knn <- hai_auto_knn(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "f_meas",
  .model_type = "classification",
  .grid_size = 2,
  .num_cores = 4
)
hai_auto_wflw_metrics(auto_knn)
## End(Not run)
This is a boilerplate function that will automatically create the following:
recipe
model specification
workflow
tuned model (grid, etc.)
hai_auto_xgboost(
  .data,
  .rec_obj,
  .splits_obj = NULL,
  .rsamp_obj = NULL,
  .tune = TRUE,
  .grid_size = 10,
  .num_cores = 1,
  .best_metric = "f_meas",
  .model_type = "classification"
)
.data |
The data being passed to the function. |
.rec_obj |
This is the recipe object you want to use. You can use hai_xgboost_data_prepper() to create one. |
.splits_obj |
NULL is the default; when NULL, one will be created. |
.rsamp_obj |
NULL is the default; when NULL, a resampling object will be created automatically. |
.tune |
Default is TRUE; this will create a tuning grid and a tuned workflow. |
.grid_size |
Default is 10. |
.num_cores |
Default is 1. |
.best_metric |
Default is "f_meas". You can choose a metric depending on the model type used. |
.model_type |
Default is "classification". |
This uses parsnip::boost_tree() with the engine set to xgboost.
A list
Steven P. Sanderson II, MPH
https://parsnip.tidymodels.org/reference/details_boost_tree_xgboost.html
Other Boiler_Plate: hai_auto_c50(), hai_auto_cubist(), hai_auto_earth(), hai_auto_glmnet(), hai_auto_knn(), hai_auto_ranger(), hai_auto_svm_poly(), hai_auto_svm_rbf(), hai_auto_wflw_metrics()
## Not run:
data <- iris
rec_obj <- hai_xgboost_data_prepper(data, Species ~ .)
auto_xgb <- hai_auto_xgboost(
  .data = data,
  .rec_obj = rec_obj,
  .best_metric = "f_meas"
)
auto_xgb$recipe_info
## End(Not run)
Automatically prep a data.frame/tibble for use in the C5.0 algorithm.
hai_c50_data_prepper(.data, .recipe_formula)
.data |
The data that you are passing to the function. Can be any type of data that is accepted by the recipe() function. |
.recipe_formula |
The formula that is going to be passed. For example, if you are using the Titanic data you might pass Survived ~ . |
This function will automatically prep your data.frame/tibble for use in the C5.0 algorithm. C5.0 is a decision-tree-based classification algorithm that expects data to be presented in a certain fashion.
This function will output a recipe specification.
A recipe object
Steven P. Sanderson II, MPH
https://www.rulequest.com/see5-unix.html
Other Preprocessor: hai_cubist_data_prepper(), hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_data_trig(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_svm_rbf_data_prepper(), hai_xgboost_data_prepper()
Other C5.0: hai_auto_c50()
library(ggplot2)

hai_c50_data_prepper(.data = Titanic, .recipe_formula = Survived ~ .)

rec_obj <- hai_c50_data_prepper(Titanic, Survived ~ .)
get_juiced_data(rec_obj)
Create a control chart, aka Shewhart chart: https://en.wikipedia.org/wiki/Control_chart.
hai_control_chart(
  .data,
  .value_col,
  .x_col,
  .center_line = mean,
  .std_dev = 3,
  .plt_title = NULL,
  .plt_catpion = NULL,
  .plt_font_size = 11,
  .print_plot = TRUE
)
.data |
data frame or a path to a csv file that will be read in |
.value_col |
variable of interest mapped to y-axis (quoted, ie as a string) |
.x_col |
variable to go on the x-axis, often a time variable. If unspecified, row indices will be used (quoted) |
.center_line |
Function used to calculate central tendency. Defaults to mean |
.std_dev |
Number of standard deviations above and below the central tendency to call a point influenced by "special cause variation." Defaults to 3 |
.plt_title |
Plot title |
.plt_catpion |
Plot caption |
.plt_font_size |
Font size; text elements will be scaled to this |
.print_plot |
Print the plot? Default = TRUE. Set to FALSE if you want to assign the plot to a variable for further modification, as in the last example. |
Control charts, also known as Shewhart charts (after Walter A. Shewhart) or process-behavior charts, are a statistical process control tool used to determine whether a manufacturing or business process is in a state of control. It is more appropriate to say that control charts are the graphical device for Statistical Process Monitoring (SPM). Traditional control charts are mostly designed to monitor process parameters when the underlying form of the process distribution is known. However, more advanced techniques are available in the 21st century, where incoming data streams can be monitored even without any knowledge of the underlying process distributions. Distribution-free control charts are becoming increasingly popular.
Generally called for the side effect of printing the control chart. Invisibly, returns a ggplot object for further customization.
Steven P. Sanderson II, MPH
data_tbl <- tibble::tibble(
  day = sample(
    c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
    100, TRUE
  ),
  person = sample(c("Tom", "Jane", "Alex"), 100, TRUE),
  count = rbinom(100, 20, ifelse(day == "Friday", .5, .2)),
  date = Sys.Date() - sample.int(100)
)

hai_control_chart(.data = data_tbl, .value_col = count, .x_col = date)

# In addition to printing or writing the plot to file, hai_control_chart
# returns the plot as a ggplot2 object, which you can then further customize
library(ggplot2)
my_chart <- hai_control_chart(data_tbl, count, date)

my_chart +
  ylab("Number of Adverse Events") +
  scale_x_date(name = "Week of ... ", date_breaks = "week") +
  theme(axis.text.x = element_text(angle = -90, vjust = 0.5, hjust = 1))
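Conceptually, the center line and control limits the chart draws reduce to a simple computation. A minimal base-R sketch, assuming the default mean center line and 3 standard deviations; the function's internal computation may differ in detail:

```r
# Hypothetical series of counts
x <- c(12, 15, 11, 14, 30, 13, 12)

center <- mean(x) # the center line (the .center_line default)
sigma  <- sd(x)

# .std_dev = 3 puts the limits 3 standard deviations out
upper_limit <- center + 3 * sigma
lower_limit <- center - 3 * sigma

# Points outside the limits suggest special cause variation
x[x > upper_limit | x < lower_limit]
```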
Automatically prep a data.frame/tibble for use in the cubist algorithm.
hai_cubist_data_prepper(.data, .recipe_formula)
.data |
The data that you are passing to the function. Can be any type of data that is accepted by the recipe() function. |
.recipe_formula |
The formula that is going to be passed. For example, if you are using the diamonds data you might pass price ~ . |
This function will automatically prep your data.frame/tibble for use in the cubist algorithm. The cubist algorithm is for regression only.
This function will output a recipe specification.
A recipe object
Steven P. Sanderson II, MPH
https://rulequest.com/cubist-info.html
Other Preprocessor: hai_c50_data_prepper(), hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_data_trig(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_svm_rbf_data_prepper(), hai_xgboost_data_prepper()
Other cubist: hai_auto_cubist()
library(ggplot2)

hai_cubist_data_prepper(.data = diamonds, .recipe_formula = price ~ .)

rec_obj <- hai_cubist_data_prepper(diamonds, price ~ .)
get_juiced_data(rec_obj)
Takes in a recipe and will impute missing values using a selected method. To choose the method, use a quoted argument like "median" or "bagged".
hai_data_impute(
  .recipe_object = NULL,
  ...,
  .seed_value = 123,
  .type_of_imputation = "mean",
  .number_of_trees = 25,
  .neighbors = 5,
  .mean_trim = 0,
  .roll_statistic,
  .roll_window = 5
)
.recipe_object |
The recipe object that you want to process |
... |
One or more selector functions to choose variables to be imputed. When used with imp_vars, these dots indicate which variables are used to predict the missing data in each variable. See selections() for more details |
.seed_value |
To make results reproducible, set the seed. |
.type_of_imputation |
This is a quoted argument and can be one of the following: "bagged", "knn", "linear", "lower", "mean", "median", "mode", or "roll". |
.number_of_trees |
This is used for the "bagged" type of imputation; an integer for the number of bagged trees. |
.neighbors |
This should be filled in with an integer value if "knn" is the type of imputation chosen. |
.mean_trim |
This should be filled in with a fraction if "mean" is the type of imputation chosen. |
.roll_statistic |
This should be filled in with a single unquoted function that takes a single argument, such as mean. This should be filled in if "roll" is the type of imputation chosen. |
.roll_window |
This should be filled in with an integer value if "roll" is the type of imputation chosen. |
This function will get your data ready for processing with many types of ml/ai models.
This is intended to be used inside of the data processor and therefore is an internal function. This documentation exists to explain the process and help the user understand the parameters that can be set in the pre-processor function.
A list object
Steven P. Sanderson II, MPH
https://recipes.tidymodels.org/reference/index.html#section-step-functions-imputation/
step_impute_bag: https://recipes.tidymodels.org/reference/step_impute_bag.html
step_impute_knn: https://recipes.tidymodels.org/reference/step_impute_knn.html
step_impute_linear: https://recipes.tidymodels.org/reference/step_impute_linear.html
step_impute_lower: https://recipes.tidymodels.org/reference/step_impute_lower.html
step_impute_mean: https://recipes.tidymodels.org/reference/step_impute_mean.html
step_impute_median: https://recipes.tidymodels.org/reference/step_impute_median.html
step_impute_mode: https://recipes.tidymodels.org/reference/step_impute_mode.html
step_impute_roll: https://recipes.tidymodels.org/reference/step_impute_roll.html
Other Data Recipes: hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_data_trig(), pca_your_recipe()
Other Preprocessor: hai_c50_data_prepper(), hai_cubist_data_prepper(), hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_data_trig(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_svm_rbf_data_prepper(), hai_xgboost_data_prepper()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

date_seq <- seq.Date(from = as.Date("2013-01-01"), length.out = 100, by = "month")
val_seq <- rep(c(rnorm(9), NA), times = 10)
df_tbl <- tibble(
  date_col = date_seq,
  value = val_seq
)

rec_obj <- recipe(value ~ ., df_tbl)

hai_data_impute(
  .recipe_object = rec_obj,
  value,
  .type_of_imputation = "roll",
  .roll_statistic = median
)$impute_rec_obj %>%
  get_juiced_data()
Takes in a recipe and will create polynomial features using a selected recipe step.
hai_data_poly(.recipe_object = NULL, ..., .p_degree = 2)
.recipe_object |
The recipe object that you want to process |
... |
One or more selector functions to choose which variables are affected by the step. See selections() for more details |
.p_degree |
The polynomial degree, an integer. |
This function will get your data ready for processing with many types of ml/ai models.
This is intended to be used inside of the data processor and therefore is an internal function. This documentation exists to explain the process and help the user understand the parameters that can be set in the pre-processor function.
A list object
Steven P. Sanderson II, MPH
https://recipes.tidymodels.org/reference/step_poly.html
Other Data Recipes: hai_data_impute(), hai_data_scale(), hai_data_transform(), hai_data_trig(), pca_your_recipe()
Other Preprocessor: hai_c50_data_prepper(), hai_cubist_data_prepper(), hai_data_impute(), hai_data_scale(), hai_data_transform(), hai_data_trig(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_svm_rbf_data_prepper(), hai_xgboost_data_prepper()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

date_seq <- seq.Date(from = as.Date("2013-01-01"), length.out = 100, by = "month")
val_seq <- rep(rnorm(10, mean = 6, sd = 2), times = 10)
df_tbl <- tibble(
  date_col = date_seq,
  value = val_seq
)

rec_obj <- recipe(value ~ ., df_tbl)

hai_data_poly(
  .recipe_object = rec_obj,
  value
)$scale_rec_obj %>%
  get_juiced_data()
Takes in a recipe and will scale values using a selected recipe. To call the recipe use a quoted argument like "scale" or "normalize".
hai_data_scale(
  .recipe_object = NULL,
  ...,
  .type_of_scale = "center",
  .range_min = 0,
  .range_max = 1,
  .scale_factor = 1
)
.recipe_object |
The recipe object that you want to process |
... |
One or more selector functions to choose which variables are affected by the step. See selections() for more details |
.type_of_scale |
This is a quoted argument and can be one of the following:
|
.range_min |
A single numeric value for the smallest value in the range. This defaults to 0. |
.range_max |
A single numeric value for the largest value in the range. This defaults to 1. |
.scale_factor |
A numeric value of either 1 or 2 that scales the numeric inputs by one or two standard deviations. By dividing by two standard deviations, the coefficients attached to continuous predictors can be interpreted the same way as with binary inputs. Defaults to 1. More in reference below. |
This function will get your data ready for processing with many types of ml/ai models.
This is intended to be used inside of the data processor and therefore is an internal function. This documentation exists to explain the process and help the user understand the parameters that can be set in the pre-processor function.
A list object
Steven P. Sanderson II, MPH
Gelman, A. (2007) "Scaling regression inputs by dividing by two standard deviations." Unpublished. Source: http://www.stat.columbia.edu/~gelman/research/unpublished/standardizing.pdf.
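To make the effect of .scale_factor concrete, here is an illustrative base R sketch (not the package internals) of scaling a predictor by one versus two standard deviations, per Gelman (2007). The scale_by_sd helper is hypothetical and only mirrors the argument described above.

```r
# Hypothetical helper mirroring the .scale_factor argument above.
scale_by_sd <- function(x, .scale_factor = 1) {
  (x - mean(x)) / (.scale_factor * sd(x))
}

x <- mtcars$hp

one_sd <- scale_by_sd(x, .scale_factor = 1) # classic standardization
two_sd <- scale_by_sd(x, .scale_factor = 2) # Gelman (2007) scaling

sd(one_sd) # 1
sd(two_sd) # 0.5
```

Dividing by two standard deviations halves the spread, which is what makes coefficients on continuous predictors comparable to those on binary inputs.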
https://recipes.tidymodels.org/reference/index.html#section-step-functions-normalization
step_center: https://recipes.tidymodels.org/reference/step_center.html
step_normalize: https://recipes.tidymodels.org/reference/step_normalize.html
step_range: https://recipes.tidymodels.org/reference/step_range.html
step_scale: https://recipes.tidymodels.org/reference/step_scale.html
Other Data Recipes: hai_data_impute(), hai_data_poly(), hai_data_transform(), hai_data_trig(), pca_your_recipe()
Other Preprocessor: hai_c50_data_prepper(), hai_cubist_data_prepper(), hai_data_impute(), hai_data_poly(), hai_data_transform(), hai_data_trig(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_svm_rbf_data_prepper(), hai_xgboost_data_prepper()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

date_seq <- seq.Date(from = as.Date("2013-01-01"), length.out = 100, by = "month")
val_seq <- rep(rnorm(10, mean = 6, sd = 2), times = 10)
df_tbl <- tibble(
  date_col = date_seq,
  value = val_seq
)

rec_obj <- recipe(value ~ ., df_tbl)

hai_data_scale(
  .recipe_object = rec_obj,
  value,
  .type_of_scale = "center"
)$scale_rec_obj %>%
  get_juiced_data()
Takes in a recipe and will perform the desired transformation on the selected variable(s) using a selected recipe. To call the desired transformation recipe use a quoted argument like "boxcox", "bs", etc.
hai_data_transform(
  .recipe_object = NULL,
  ...,
  .type_of_scale = "log",
  .bc_limits = c(-5, 5),
  .bc_num_unique = 5,
  .bs_deg_free = NULL,
  .bs_degree = 3,
  .log_base = exp(1),
  .log_offset = 0,
  .logit_offset = 0,
  .ns_deg_free = 2,
  .rel_shift = 0,
  .rel_reverse = FALSE,
  .rel_smooth = FALSE,
  .yj_limits = c(-5, 5),
  .yj_num_unique = 5
)
.recipe_object |
The recipe object that you want to process |
... |
One or more selector functions to choose which variables are affected by the step. See selections() for more details |
.type_of_scale |
This is a quoted argument and can be one of the following:
|
.bc_limits |
A length 2 numeric vector defining the range to compute the transformation parameter lambda. |
.bc_num_unique |
An integer to specify minimum required unique values to evaluate for a transformation |
.bs_deg_free |
The degrees of freedom for the spline. As the degrees of freedom for a spline increase, more flexible and complex curves can be generated. When a single degree of freedom is used, the result is a rescaled version of the original data. |
.bs_degree |
Degree of polynomial spline (integer). |
.log_base |
A numeric value for the base. |
.log_offset |
An optional value to add to the data prior to logging (to avoid log(0)) |
.logit_offset |
A numeric value used to modify values of the columns that are either one or zero. They are modified to be |
.ns_deg_free |
The degrees of freedom for the natural spline. As the degrees of freedom for a natural spline increase, more flexible and complex curves can be generated. When a single degree of freedom is used, the result is a rescaled version of the original data. |
.rel_shift |
A numeric value dictating a translation to apply to the data. |
.rel_reverse |
A logical to indicate if the left hinge should be used as opposed to the right hinge. |
.rel_smooth |
A logical indicating if the softplus function, a smooth approximation to the rectified linear transformation, should be used. |
.yj_limits |
A length 2 numeric vector defining the range to compute the transformation parameter lambda. |
.yj_num_unique |
An integer where data that have less possible values will not be evaluated for a transformation. |
This function will get your data ready for processing with many types of ml/ai models.
This is intended to be used inside of the data processor and therefore is an internal function. This documentation exists to explain the process and help the user understand the parameters that can be set in the pre-processor function.
A list object
Steven P. Sanderson II, MPH
https://recipes.tidymodels.org/reference/step_BoxCox.html
https://recipes.tidymodels.org/reference/step_bs.html
https://recipes.tidymodels.org/reference/step_log.html
https://recipes.tidymodels.org/reference/step_logit.html
https://recipes.tidymodels.org/reference/step_ns.html
https://recipes.tidymodels.org/reference/step_relu.html
https://recipes.tidymodels.org/reference/step_sqrt.html
https://recipes.tidymodels.org/reference/step_YeoJohnson.html
Other Data Recipes: hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_trig(), pca_your_recipe()
Other Preprocessor: hai_c50_data_prepper(), hai_cubist_data_prepper(), hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_trig(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_svm_rbf_data_prepper(), hai_xgboost_data_prepper()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

date_seq <- seq.Date(from = as.Date("2013-01-01"), length.out = 100, by = "month")
val_seq <- rep(rnorm(10, mean = 6, sd = 2), times = 10)
df_tbl <- tibble(
  date_col = date_seq,
  value = val_seq
)

rec_obj <- recipe(value ~ ., df_tbl)

hai_data_transform(
  .recipe_object = rec_obj,
  value,
  .type_of_scale = "log"
)$scale_rec_obj %>%
  get_juiced_data()
Takes in a recipe and will scale values using a selected recipe. To call the recipe use a quoted argument like "sinh", "cosh" or "tanh".
hai_data_trig(
  .recipe_object = NULL,
  ...,
  .type_of_scale = "sinh",
  .inverse = FALSE
)
.recipe_object |
The recipe object that you want to process |
... |
One or more selector functions to choose which variables are affected by the step. See selections() for more details |
.type_of_scale |
This is a quoted argument and can be one of the following:
|
.inverse |
A logical: should the inverse function be used? Default is FALSE |
This function will get your data ready for processing with many types of ml/ai models.
This is intended to be used inside of the data processor and therefore is an internal function. This documentation exists to explain the process and help the user understand the parameters that can be set in the pre-processor function.
A list object
Steven P. Sanderson II, MPH
https://recipes.tidymodels.org/reference/step_hyperbolic.html
Other Data Recipes: hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_transform(), pca_your_recipe()
Other Preprocessor: hai_c50_data_prepper(), hai_cubist_data_prepper(), hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_svm_rbf_data_prepper(), hai_xgboost_data_prepper()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

date_seq <- seq.Date(from = as.Date("2013-01-01"), length.out = 100, by = "month")
val_seq <- rep(rnorm(10, mean = 6, sd = 2), times = 10)
df_tbl <- tibble(
  date_col = date_seq,
  value = val_seq
)

rec_obj <- recipe(value ~ ., df_tbl)

hai_data_trig(
  .recipe_object = rec_obj,
  value,
  .type_of_scale = "sinh"
)$scale_rec_obj %>%
  get_juiced_data()
Default classification metric sets from yardstick
hai_default_classification_metric_set()
Default classification metric sets from yardstick
A yardstick metric set tibble
Steven P. Sanderson II, MPH
Other Default Metric Sets: hai_default_regression_metric_set()
hai_default_classification_metric_set()
Default regression metric sets from yardstick
hai_default_regression_metric_set()
Default regression metric sets from yardstick
A yardstick metric set tibble
Steven P. Sanderson II, MPH
Other Default Metric Sets: hai_default_classification_metric_set()
hai_default_regression_metric_set()
This will produce a ggplot2 or plotly histogram plot of the density information provided from the hai_get_density_data_tbl function.
hai_density_hist_plot(
  .data,
  .dist_name_col = distribution,
  .value_col = dist_data,
  .alpha = 0.382,
  .interactive = FALSE
)
.data |
The data that is produced from using |
.dist_name_col |
The column that has the distribution name, should be distribution and that is set as the default. |
.value_col |
The column that contains the x values that comes from the
|
.alpha |
The alpha parameter for ggplot |
.interactive |
This is a Boolean of TRUE/FALSE and is defaulted to FALSE.
TRUE will produce a |
This will produce a histogram of the density information that is
produced from the function hai_get_density_data_tbl
. It will look for an attribute
from the .data
param to ensure the function was used.
A plot, either ggplot2
or plotly
Steven P. Sanderson II, MPH
Other Distribution Plots: hai_density_plot(), hai_density_qq_plot()
library(dplyr)

df <- hai_scale_zero_one_vec(.x = mtcars$mpg) %>%
  hai_distribution_comparison_tbl()

dist_data_tbl <- hai_get_dist_data_tbl(df)

hai_density_hist_plot(
  .data = dist_data_tbl,
  .dist_name_col = distribution,
  .value_col = dist_data,
  .alpha = 0.5,
  .interactive = FALSE
)
This will produce a ggplot2 or plotly density plot of the density information provided from the hai_get_density_data_tbl function.
hai_density_plot(
  .data,
  .dist_name_col,
  .x_col,
  .y_col,
  .size = 1,
  .alpha = 0.382,
  .interactive = FALSE
)
.data |
The data that is produced from using |
.dist_name_col |
The column that has the distribution name, should be distribution and that is set as the default. |
.x_col |
The x value from the tidied density object. |
.y_col |
The y value from the tidied density object. |
.size |
The size parameter for ggplot. |
.alpha |
The alpha parameter for ggplot. |
.interactive |
This is a Boolean of TRUE/FALSE and is defaulted to FALSE.
TRUE will produce a |
This will produce a density plot of the density information that is
produced from the function hai_get_density_data_tbl
. It will look for an attribute
from the .data
param to ensure the function was used.
A plot, either ggplot2 or plotly
Steven P. Sanderson II, MPH
Other Distribution Plots: hai_density_hist_plot(), hai_density_qq_plot()
library(dplyr)

df <- hai_scale_zero_one_vec(.x = mtcars$mpg) %>%
  hai_distribution_comparison_tbl()

tidy_density_tbl <- hai_get_density_data_tbl(df)

hai_density_plot(
  .data = tidy_density_tbl,
  .dist_name_col = distribution,
  .x_col = x,
  .y_col = y,
  .alpha = 0.5,
  .interactive = FALSE
)
This will produce a ggplot2 or plotly qq plot of the density information provided from the hai_get_density_data_tbl function.
hai_density_qq_plot(
  .data,
  .dist_name_col = distribution,
  .x_col = x,
  .y_col = y,
  .size = 1,
  .alpha = 0.382,
  .interactive = FALSE
)
.data |
The data that is produced from using |
.dist_name_col |
The column that has the distribution name, should be distribution and that is set as the default. |
.x_col |
The column that contains the x values that comes from the
|
.y_col |
The column that contains the y values that comes from the
|
.size |
The size parameter for ggplot |
.alpha |
The alpha parameter for ggplot |
.interactive |
This is a Boolean of TRUE/FALSE and is defaulted to FALSE.
TRUE will produce a |
This will produce a qq plot of the density information that is
produced from the function hai_get_density_data_tbl
. It will look for an attribute
from the .data
param to ensure the function was used.
A plot, either ggplot2
or plotly
Steven P. Sanderson II, MPH
Other Distribution Plots: hai_density_hist_plot(), hai_density_plot()
library(dplyr)

df <- hai_scale_zero_one_vec(.x = mtcars$mpg) %>%
  hai_distribution_comparison_tbl()

tidy_density_tbl <- hai_get_density_data_tbl(df)

hai_density_qq_plot(
  .data = tidy_density_tbl,
  .dist_name_col = distribution,
  .x_col = x,
  .y_col = y,
  .size = 1,
  .alpha = 0.5,
  .interactive = FALSE
)
This function will attempt to get some key information on the data you pass to it. It will also automatically normalize the data from 0 to 1. This will not change the distribution, just its scale, in order to make sure that many different types of distributions can be fit to the data, which should help identify what the distribution of the passed data could be.
The resulting output has attributes added to it that get used in other functions that are meant to complement each other.
This function will automatically pass the .x
parameter to hai_skewness_vec()
and hai_kurtosis_vec()
in order to help create the random data
from the distributions.
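The zero-one normalization described above can be sketched in base R. This is a hedged sketch (assumed to match what hai_scale_zero_one_vec() does; use the package function in practice); because the transform is monotone, the shape of the distribution is unchanged.

```r
# Min-max scaling onto [0, 1]; a linear, monotone transform.
scale_zero_one <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

scaled <- scale_zero_one(mtcars$mpg)
range(scaled) # 0 1

# Rank order (and therefore the distribution's shape) is preserved:
identical(order(scaled), order(mtcars$mpg)) # TRUE
```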
The distributions that can be chosen from are:
Distribution | R stats::dist |
normal | rnorm |
uniform | runif |
exponential | rexp |
logistic | rlogis |
beta | rbeta |
lognormal | rlnorm |
gamma | rgamma |
weibull | rweibull |
chisquare | rchisq |
cauchy | rcauchy |
hypergeometric | rhyper |
f | rf |
poisson | rpois |
hai_distribution_comparison_tbl(
  .x,
  .distributions = c("gamma", "beta"),
  .normalize = TRUE
)
.x |
The numeric vector to analyze. |
.distributions |
A character vector of distributions to check. For example, c("gamma","beta") |
.normalize |
A boolean value of TRUE/FALSE, the default is TRUE. This
will normalize the data using the |
Get information on the empirical distribution of your data along with generated densities of other distributions. This information is in the resulting tibble that is generated. Three columns will be generated: distribution, from the param .distributions; dist_data, which is a list vector of density values passed to the underlying stats r distribution function; and density_data, which is the dist_data column passed to list(stats::density(unlist(dist_data))). This gives you the desired vector that can be used in resultant plots (dist_data), or you can interact with the density object itself.
If the skewness of the distribution is negative, then for the gamma and beta
distributions the skew is set equal to the kurtosis and the kurtosis is set
equal to sqrt((skew)^2)
A tibble.
Steven P. Sanderson II, MPH
Other Distribution Functions: hai_get_density_data_tbl(), hai_get_dist_data_tbl()
x_vec <- hai_scale_zero_one_vec(mtcars$mpg)

df <- hai_distribution_comparison_tbl(
  .x = x_vec,
  .distributions = c("beta", "gamma")
)

df
Automatically prep a data.frame/tibble for use in the Earth algorithm.
hai_earth_data_prepper(.data, .recipe_formula)
.data |
The data that you are passing to the function. Can be any type
of data that is accepted by the |
.recipe_formula |
The formula that is going to be passed. For example
if you are using the |
This function will automatically prep your data.frame/tibble for use in the Earth algorithm. The Earth algorithm is for classification and regression.
This function will output a recipe specification.
A recipe object
Steven P. Sanderson II, MPH
Other Preprocessor: hai_c50_data_prepper(), hai_cubist_data_prepper(), hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_data_trig(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_svm_rbf_data_prepper(), hai_xgboost_data_prepper()
Other Earth: hai_auto_earth()
library(ggplot2)

# Regression
hai_earth_data_prepper(.data = diamonds, .recipe_formula = price ~ .)
reg_obj <- hai_earth_data_prepper(diamonds, price ~ .)
get_juiced_data(reg_obj)

# Classification
hai_earth_data_prepper(Titanic, Survived ~ .)
cla_obj <- hai_earth_data_prepper(Titanic, Survived ~ .)
get_juiced_data(cla_obj)
Takes a numeric vector(s) or date and will return a tibble of one of the following:
"sin"
"cos"
"sincos"
c("sin","cos","sincos")
hai_fourier_augment(
  .data,
  .value,
  .period,
  .order,
  .names = "auto",
  .scale_type = c("sin", "cos", "sincos")
)
.data |
The data being passed that will be augmented by the function. |
.value |
This is passed |
.period |
The number of observations that complete a cycle |
.order |
The fourier term order |
.names |
The default is "auto" |
.scale_type |
A character of one of the following: "sin", "cos", or "sincos". All can be passed by setting the param equal to c("sin","cos","sincos") |
Takes a numeric vector or date and will return a vector of one of the following:
"sin"
"cos"
"sincos"
c("sin","cos","sincos")
This function is intended to be used on its own in order to add columns to a tibble.
An augmented tibble
Steven P. Sanderson II, MPH
Other Augment Function: hai_fourier_discrete_augment(), hai_hyperbolic_augment(), hai_polynomial_augment(), hai_scale_zero_one_augment(), hai_scale_zscore_augment(), hai_winsorized_move_augment(), hai_winsorized_truncate_augment()
suppressPackageStartupMessages(library(dplyr))

len_out <- 10
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

hai_fourier_augment(data_tbl, b, .period = 12, .order = 1, .scale_type = "sin")
hai_fourier_augment(data_tbl, b, .period = 12, .order = 1, .scale_type = "cos")
Takes a numeric vector(s) or date and will return a tibble of one of the following:
"sin"
"cos"
"sincos"
c("sin","cos","sincos"). When any of these values falls below zero, the result is zero, else one.
hai_fourier_discrete_augment(
  .data,
  .value,
  .period,
  .order,
  .names = "auto",
  .scale_type = c("sin", "cos", "sincos")
)
.data |
The data being passed that will be augmented by the function. |
.value |
This is passed |
.period |
The number of observations that complete a cycle |
.order |
The fourier term order |
.names |
The default is "auto" |
.scale_type |
A character of one of the following: "sin", "cos", or "sincos". All can be passed by setting the param equal to c("sin","cos","sincos") |
Takes a numeric vector or a date and will return a vector of one of the following:
"sin"
"cos"
"sincos"
c("sin","cos","sincos")
This function is intended to be used on its own in order to add columns to a tibble.
An augmented tibble
Steven P. Sanderson II, MPH
Other Augment Function: hai_fourier_augment(), hai_hyperbolic_augment(), hai_polynomial_augment(), hai_scale_zero_one_augment(), hai_scale_zscore_augment(), hai_winsorized_move_augment(), hai_winsorized_truncate_augment()
suppressPackageStartupMessages(library(dplyr))

len_out <- 24
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

hai_fourier_discrete_augment(data_tbl, b, .period = 2 * 12, .order = 1, .scale_type = "sin")
hai_fourier_discrete_augment(data_tbl, b, .period = 2 * 12, .order = 1, .scale_type = "cos")
Takes a numeric vector or date and will return a vector of one of the following:
"sin"
"cos"
"sincos" This will do value = sin(x) * cos(x) When either of these values falls below zero, then zero else one
hai_fourier_discrete_vec(
  .x,
  .period,
  .order,
  .scale_type = c("sin", "cos", "sincos")
)
.x |
A numeric vector |
.period |
The number of observations that complete a cycle |
.order |
The fourier term order |
.scale_type |
A character of one of the following: "sin","cos","sincos" |
Takes a numeric vector or date and will return a vector of one of the following:
"sin"
"cos"
"sincos"
The internal calculation is straightforward:
sin = sin(2 * pi * h * x)
, where h = .order/.period
cos = cos(2 * pi * h * x)
, where h = .order/.period
sincos = sin(2 * pi * h * x) * cos(2 * pi * h * x)
where h = .order/.period
This function can be used on its own. It is also the basis for the function
hai_fourier_discrete_augment()
.
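The documented calculation can be sketched in base R. This is an assumption about the internals, shown for clarity (use hai_fourier_discrete_vec() in practice): compute the fourier term, then binarize it so anything below zero becomes 0, otherwise 1.

```r
# Hypothetical sketch of the discrete fourier calculation described above.
fourier_discrete_sketch <- function(x, .period, .order, .scale_type = "sin") {
  h <- .order / .period
  val <- switch(.scale_type,
    sin    = sin(2 * pi * h * x),
    cos    = cos(2 * pi * h * x),
    sincos = sin(2 * pi * h * x) * cos(2 * pi * h * x)
  )
  # Below zero -> 0, otherwise -> 1
  ifelse(val < 0, 0, 1)
}

fourier_discrete_sketch(1:12, .period = 12, .order = 1, .scale_type = "sin")
```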
A numeric vector of 1's and 0's
Steven P. Sanderson II, MPH
Other Vector Function: hai_fourier_vec(), hai_hyperbolic_vec(), hai_kurtosis_vec(), hai_scale_zero_one_vec(), hai_scale_zscore_vec(), hai_skewness_vec(), hai_winsorized_move_vec(), hai_winsorized_truncate_vec()
suppressPackageStartupMessages(library(dplyr))

len_out <- 24
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

vec_1 <- hai_fourier_discrete_vec(data_tbl$a, .period = 12, .order = 1, .scale_type = "sin")
vec_2 <- hai_fourier_discrete_vec(data_tbl$a, .period = 12, .order = 1, .scale_type = "cos")
vec_3 <- hai_fourier_discrete_vec(data_tbl$a, .period = 12, .order = 1, .scale_type = "sincos")

plot(data_tbl$b)
lines(vec_1, col = "blue")
lines(vec_2, col = "red")
lines(vec_3, col = "green")
Takes a numeric vector and will return a vector of one of the following:
"sin"
"cos"
"sincos" This will do value = sin(x) * cos(x)
hai_fourier_vec(.x, .period, .order, .scale_type = c("sin", "cos", "sincos"))
.x |
A numeric vector |
.period |
The number of observations that complete a cycle |
.order |
The fourier term order |
.scale_type |
A character of one of the following: "sin","cos","sincos" |
Takes a numeric vector and will return a vector of one of the following:
"sin"
"cos"
"sincos"
The internal calculation is straightforward:
sin = sin(2 * pi * h * x)
, where h = .order/.period
cos = cos(2 * pi * h * x)
, where h = .order/.period
sincos = sin(2 * pi * h * x) * cos(2 * pi * h * x)
where h = .order/.period
This function can be used on its own. It is also the basis for the function
hai_fourier_augment()
.
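The formulas above can be sketched directly in base R. This is an assumption about the internals, shown only to make the formulas concrete (use hai_fourier_vec() in practice).

```r
# Hypothetical sketch of the fourier term calculation described above.
fourier_sketch <- function(x, .period, .order, .scale_type = "sin") {
  h <- .order / .period
  switch(.scale_type,
    sin    = sin(2 * pi * h * x),
    cos    = cos(2 * pi * h * x),
    sincos = sin(2 * pi * h * x) * cos(2 * pi * h * x)
  )
}

# With .period = 12 and .order = 1 the term repeats every 12 observations,
# so x = 3 and x = 15 give the same value:
fourier_sketch(c(3, 15), .period = 12, .order = 1, .scale_type = "sin")
```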
A numeric vector
Steven P. Sanderson II, MPH
Other Vector Function: hai_fourier_discrete_vec(), hai_hyperbolic_vec(), hai_kurtosis_vec(), hai_scale_zero_one_vec(), hai_scale_zscore_vec(), hai_skewness_vec(), hai_winsorized_move_vec(), hai_winsorized_truncate_vec()
suppressPackageStartupMessages(library(dplyr))

len_out <- 25
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

vec_1 <- hai_fourier_vec(data_tbl$b, .period = 12, .order = 1, .scale_type = "sin")
vec_2 <- hai_fourier_vec(data_tbl$b, .period = 12, .order = 1, .scale_type = "cos")
vec_3 <- hai_fourier_vec(data_tbl$date_col, .period = 12, .order = 1, .scale_type = "sincos")

plot(data_tbl$b)
lines(vec_1, col = "blue")
lines(vec_2, col = "red")
lines(vec_3, col = "green")
This function will return a tibble that can either be nested/unnested,
and grouped or un-grouped. The .data
argument must be the output of the
hai_distribution_comparison_tbl()
function.
hai_get_density_data_tbl(.data, .unnest = TRUE, .group_data = TRUE)
.data |
The data from the |
.unnest |
Should the resulting tibble be un-nested, a Boolean value TRUE/FALSE. The default is TRUE |
.group_data |
Should the resulting tibble be grouped, a Boolean value TRUE/FALSE. The default is FALSE |
This function expects to take the output of the hai_distribution_comparison_tbl()
function. It returns a tibble of the tidy
density data.
A tibble.
Steven P. Sanderson II, MPH
Other Distribution Functions:
hai_distribution_comparison_tbl()
,
hai_get_dist_data_tbl()
library(dplyr)

df <- hai_scale_zero_one_vec(.x = mtcars$mpg) %>%
  hai_distribution_comparison_tbl()

hai_get_density_data_tbl(df)
This function will return a tibble that can either be nested/unnested,
and grouped or ungrouped. The .data
argument must be the output of the
hai_distribution_comparison_tbl()
function.
hai_get_dist_data_tbl(.data, .unnest = TRUE, .group_data = FALSE)
.data |
The data from the |
.unnest |
Should the resulting tibble be unnested, a boolean value TRUE/FALSE. The default is TRUE |
.group_data |
Should the resulting tibble be grouped, a boolean value TRUE/FALSE. The default is FALSE |
This function expects to take the output of the hai_distribution_comparison_tbl()
function. It returns a tibble of the distribution and the randomly generated
data produced from the associated stats r function like rnorm
A tibble.
Steven P. Sanderson II, MPH
Other Distribution Functions:
hai_distribution_comparison_tbl()
,
hai_get_density_data_tbl()
library(dplyr)

df <- hai_scale_zero_one_vec(.x = mtcars$mpg) %>%
  hai_distribution_comparison_tbl()

hai_get_dist_data_tbl(df)
Automatically prep a data.frame/tibble for use in the glmnet algorithm.
hai_glmnet_data_prepper(.data, .recipe_formula)
.data |
The data that you are passing to the function. Can be any type
of data that is accepted by the |
.recipe_formula |
The formula that is going to be passed. For example
if you are using the |
This function will automatically prep your data.frame/tibble for use in the glmnet algorithm. It expects data to be presented in a certain fashion.
This function will output a recipe specification.
A recipe object
Steven P. Sanderson II, MPH
Other Preprocessor:
hai_c50_data_prepper()
,
hai_cubist_data_prepper()
,
hai_data_impute()
,
hai_data_poly()
,
hai_data_scale()
,
hai_data_transform()
,
hai_data_trig()
,
hai_earth_data_prepper()
,
hai_knn_data_prepper()
,
hai_ranger_data_prepper()
,
hai_svm_poly_data_prepper()
,
hai_svm_rbf_data_prepper()
,
hai_xgboost_data_prepper()
Other knn:
hai_knn_data_prepper()
library(ggplot2)

hai_glmnet_data_prepper(.data = Titanic, .recipe_formula = Survived ~ .)

rec_obj <- hai_glmnet_data_prepper(Titanic, Survived ~ .)
get_juiced_data(rec_obj)
This function expects a data.frame/tibble and will return a faceted histogram.
hai_histogram_facet_plot(
  .data,
  .bins = 10,
  .scale_data = FALSE,
  .ncol = 5,
  .fct_reorder = FALSE,
  .fct_rev = FALSE,
  .fill = "steelblue",
  .color = "white",
  .scale = "free",
  .interactive = FALSE
)
.data |
The data you want to pass to the function. |
.bins |
The number of bins for the histograms. |
.scale_data |
This is a boolean set to FALSE. TRUE will use |
.ncol |
The number of columns for the facet_wrap argument. |
.fct_reorder |
Should the factor column be reordered? TRUE/FALSE, default of FALSE |
.fct_rev |
Should the factor column be reversed? TRUE/FALSE, default of FALSE |
.fill |
Default is |
.color |
Default is 'white' |
.scale |
Default is 'free' |
.interactive |
Default is FALSE, TRUE will produce a |
Takes in a data.frame/tibble and returns a faceted histogram.
A ggplot or plotly plot
Steven P. Sanderson II, MPH
hai_histogram_facet_plot(.data = iris)
hai_histogram_facet_plot(.data = iris, .scale_data = TRUE)
Takes a numeric vector(s) or date and will return a tibble of one of the following:
"sin"
"cos"
"tan"
"sincos"
c("sin","cos","tan", "sincos")
hai_hyperbolic_augment(
  .data,
  .value,
  .names = "auto",
  .scale_type = c("sin", "cos", "tan", "sincos")
)
.data |
The data being passed that will be augmented by the function. |
.value |
This is passed |
.names |
The default is "auto" |
.scale_type |
A character of one of the following: "sin","cos","tan", "sincos" All can be passed by setting the param equal to c("sin","cos","tan","sincos") |
Takes a numeric vector or date and will return a vector of one of the following:
"sin"
"cos"
"tan"
"sincos"
c("sin","cos","tan", "sincos")
This function is intended to be used on its own in order to add columns to a tibble.
An augmented tibble
Steven P. Sanderson II, MPH
Other Augment Function:
hai_fourier_augment()
,
hai_fourier_discrete_augment()
,
hai_polynomial_augment()
,
hai_scale_zero_one_augment()
,
hai_scale_zscore_augment()
,
hai_winsorized_move_augment()
,
hai_winsorized_truncate_augment()
suppressPackageStartupMessages(library(dplyr))

len_out <- 10
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

hai_hyperbolic_augment(data_tbl, b, .scale_type = "sin")
hai_hyperbolic_augment(data_tbl, b, .scale_type = "tan")
Takes a numeric vector and will return a vector of one of the following:
"sin"
"cos"
"tan"
"sincos" This will do value = sin(x) * cos(x)
hai_hyperbolic_vec(.x, .scale_type = c("sin", "cos", "tan", "sincos"))
.x |
A numeric vector |
.scale_type |
A character of one of the following: "sin","cos","tan","sincos" |
Takes a numeric vector and will return a vector of one of the following:
"sin"
"cos"
"tan"
"sincos"
This function can be used on its own. It is also the basis for the function
hai_hyperbolic_augment()
.
A numeric vector
Steven P. Sanderson II, MPH
Other Vector Function:
hai_fourier_discrete_vec()
,
hai_fourier_vec()
,
hai_kurtosis_vec()
,
hai_scale_zero_one_vec()
,
hai_scale_zscore_vec()
,
hai_skewness_vec()
,
hai_winsorized_move_vec()
,
hai_winsorized_truncate_vec()
suppressPackageStartupMessages(library(dplyr))

len_out <- 25
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

vec_1 <- hai_hyperbolic_vec(data_tbl$b, .scale_type = "sin")
vec_2 <- hai_hyperbolic_vec(data_tbl$b, .scale_type = "cos")
vec_3 <- hai_hyperbolic_vec(data_tbl$b, .scale_type = "sincos")

plot(data_tbl$b)
lines(vec_1, col = "blue")
lines(vec_2, col = "red")
lines(vec_3, col = "green")
This is a wrapper around the h2o::h2o.kmeans()
function that will return a list
object with a lot of useful and easy to use tidy style information.
hai_kmeans_automl(
  .data,
  .split_ratio = 0.8,
  .seed = 1234,
  .centers = 10,
  .standardize = TRUE,
  .print_model_summary = TRUE,
  .predictors,
  .categorical_encoding = "auto",
  .initialization_mode = "Furthest",
  .max_iterations = 100
)
.data |
The data that is to be passed for clustering. |
.split_ratio |
The ratio for training and testing splits. |
.seed |
The default is 1234, but can be set to any integer. |
.centers |
The default is 10. Specify the number of clusters (groups of data) in a data set. |
.standardize |
The default is set to TRUE. When TRUE all numeric columns will be set to zero mean and unit variance. |
.print_model_summary |
This is a boolean and controls if the model summary is printed to the console. The default is TRUE. |
.predictors |
This must be in the form of c("column_1", "column_2", ... "column_n") |
.categorical_encoding |
Can be one of the following:
|
.initialization_mode |
This can be one of the following:
|
.max_iterations |
The default is 100. This specifies the number of training iterations |
A list object
Steven P. Sanderson II, MPH
Other Kmeans:
hai_kmeans_automl_predict()
,
hai_kmeans_mapped_tbl()
,
hai_kmeans_obj()
,
hai_kmeans_scree_data_tbl()
,
hai_kmeans_scree_plt()
,
hai_kmeans_tidy_tbl()
,
hai_kmeans_user_item_tbl()
## Not run:
h2o.init()

output <- hai_kmeans_automl(
  .data = iris,
  .predictors = c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length"),
  .standardize = FALSE
)

h2o.shutdown()
## End(Not run)
This is a wrapper around the h2o::h2o.predict()
function that will return a list
object with a lot of useful and easy to use tidy style information.
hai_kmeans_automl_predict(.input)
.input |
This is the output of the |
This function will internally take in the output assigned from the
hai_kmeans_automl()
function only and return a list of useful
information. The items that are returned are as follows:
prediction - The h2o dataframe of predictions
prediction_tbl - The h2o predictions in tibble format
valid_tbl - The validation data in tibble format
pred_full_tbl - The entire validation set with the predictions attached using
base::cbind()
. The predictions are in a column called predicted_cluster
and
are in the format of a factor using forcats::as_factor()
A list object
Steven P. Sanderson II, MPH
Other Kmeans:
hai_kmeans_automl()
,
hai_kmeans_mapped_tbl()
,
hai_kmeans_obj()
,
hai_kmeans_scree_data_tbl()
,
hai_kmeans_scree_plt()
,
hai_kmeans_tidy_tbl()
,
hai_kmeans_user_item_tbl()
## Not run:
h2o.init()

output <- hai_kmeans_automl(
  .data = iris,
  .predictors = c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length"),
  .standardize = FALSE
)

pred <- hai_kmeans_automl_predict(output)

h2o.shutdown()
## End(Not run)
Create a tibble that maps the hai_kmeans_obj()
using purrr::map()
to create a nested data.frame/tibble that holds n centers. This tibble will be
used to help create a scree plot.
hai_kmeans_mapped_tbl(.data, .centers = 15)

kmeans_mapped_tbl(.data, .centers = 15)
.data |
You must have a tibble in the working environment from the
|
.centers |
How many different centers do you want to try |
Takes in a single parameter of .centers. This is used to create the tibble
and map the hai_kmeans_obj()
function down the list creating a nested tibble.
A nested tibble
Steven P. Sanderson II, MPH
https://en.wikipedia.org/wiki/Scree_plot
Other Kmeans:
hai_kmeans_automl()
,
hai_kmeans_automl_predict()
,
hai_kmeans_obj()
,
hai_kmeans_scree_data_tbl()
,
hai_kmeans_scree_plt()
,
hai_kmeans_tidy_tbl()
,
hai_kmeans_user_item_tbl()
library(healthyR.data)
library(dplyr)

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  filter(payer_grouping != "Medicare B") %>%
  filter(payer_grouping != "?") %>%
  select(service_line, payer_grouping) %>%
  mutate(record = 1) %>%
  as_tibble()

ui_tbl <- hai_kmeans_user_item_tbl(
  .data = data_tbl,
  .row_input = service_line,
  .col_input = payer_grouping,
  .record_input = record
)

hai_kmeans_mapped_tbl(ui_tbl)
Takes the output of the hai_kmeans_user_item_tbl()
function and applies the
k-means algorithm to it using stats::kmeans()
hai_kmeans_obj(.data, .centers = 5)

kmeans_obj(.data, .centers = 5)
.data |
The data that gets passed from |
.centers |
How many initial centers to start with |
Uses the stats::kmeans()
function and creates a wrapper around it.
A stats k-means object
Steven P. Sanderson II, MPH
Other Kmeans:
hai_kmeans_automl()
,
hai_kmeans_automl_predict()
,
hai_kmeans_mapped_tbl()
,
hai_kmeans_scree_data_tbl()
,
hai_kmeans_scree_plt()
,
hai_kmeans_tidy_tbl()
,
hai_kmeans_user_item_tbl()
library(healthyR.data)
library(dplyr)

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  filter(payer_grouping != "Medicare B") %>%
  filter(payer_grouping != "?") %>%
  select(service_line, payer_grouping) %>%
  mutate(record = 1) %>%
  as_tibble()

hai_kmeans_user_item_tbl(
  .data = data_tbl,
  .row_input = service_line,
  .col_input = payer_grouping,
  .record_input = record
) %>%
  hai_kmeans_obj()
Take data from the hai_kmeans_mapped_tbl()
and unnest it into a
tibble for inspection and for use in the hai_kmeans_scree_plt()
function.
hai_kmeans_scree_data_tbl(.data)

kmeans_scree_data_tbl(.data)
.data |
You must have a tibble in the working environment from the
|
Takes in a single parameter of .data from hai_kmeans_mapped_tbl()
and
transforms it into a tibble that is used for hai_kmeans_scree_plt()
. It will
show the values (tot.withinss) at each center.
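For intuition, tot.withinss is the total within-cluster sum of squares: the summed squared distance from each point to its assigned cluster center. A minimal Python sketch of the quantity (hypothetical helper name, not the package's implementation):

```python
def tot_withinss(points, assignments, centers):
    """Sum of squared distances from each 2-D point to its assigned center."""
    total = 0.0
    for (x, y), k in zip(points, assignments):
        cx, cy = centers[k]
        total += (x - cx) ** 2 + (y - cy) ** 2
    return total
```

Plotting this value against the number of centers produces the elbow shape a scree plot is used to inspect.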
A nested tibble
Steven P. Sanderson II, MPH
Other Kmeans:
hai_kmeans_automl()
,
hai_kmeans_automl_predict()
,
hai_kmeans_mapped_tbl()
,
hai_kmeans_obj()
,
hai_kmeans_scree_plt()
,
hai_kmeans_tidy_tbl()
,
hai_kmeans_user_item_tbl()
library(healthyR.data)
library(dplyr)

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  filter(payer_grouping != "Medicare B") %>%
  filter(payer_grouping != "?") %>%
  select(service_line, payer_grouping) %>%
  mutate(record = 1) %>%
  as_tibble()

ui_tbl <- hai_kmeans_user_item_tbl(
  .data = data_tbl,
  .row_input = service_line,
  .col_input = payer_grouping,
  .record_input = record
)

kmm_tbl <- hai_kmeans_mapped_tbl(ui_tbl)

hai_kmeans_scree_data_tbl(kmm_tbl)
Create a scree-plot from the hai_kmeans_mapped_tbl()
function.
hai_kmeans_scree_plt(.data)

kmeans_scree_plt(.data)

hai_kmeans_scree_plot(.data)
.data |
The data from the |
Outputs a scree-plot
A ggplot2 plot
Steven P. Sanderson II, MPH
https://en.wikipedia.org/wiki/Scree_plot
Other Kmeans:
hai_kmeans_automl()
,
hai_kmeans_automl_predict()
,
hai_kmeans_mapped_tbl()
,
hai_kmeans_obj()
,
hai_kmeans_scree_data_tbl()
,
hai_kmeans_tidy_tbl()
,
hai_kmeans_user_item_tbl()
library(healthyR.data)
library(dplyr)

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  filter(payer_grouping != "Medicare B") %>%
  filter(payer_grouping != "?") %>%
  select(service_line, payer_grouping) %>%
  mutate(record = 1) %>%
  as_tibble()

ui_tbl <- hai_kmeans_user_item_tbl(
  .data = data_tbl,
  .row_input = service_line,
  .col_input = payer_grouping,
  .record_input = record
)

kmm_tbl <- hai_kmeans_mapped_tbl(ui_tbl)

hai_kmeans_scree_plt(.data = kmm_tbl)
K-Means tidy functions
hai_kmeans_tidy_tbl(.kmeans_obj, .data, .tidy_type = "tidy")

kmeans_tidy_tbl(.kmeans_obj, .data, .tidy_type = "tidy")
.kmeans_obj |
A |
.data |
The user item tibble created from |
.tidy_type |
"tidy","glance", or "augment" |
Takes in a k-means object and its associated user item tibble and then
returns one of the items asked for. Either: broom::tidy()
, broom::glance()
or broom::augment()
. The function defaults to broom::tidy()
.
A tibble
Steven P. Sanderson II, MPH
Other Kmeans:
hai_kmeans_automl()
,
hai_kmeans_automl_predict()
,
hai_kmeans_mapped_tbl()
,
hai_kmeans_obj()
,
hai_kmeans_scree_data_tbl()
,
hai_kmeans_scree_plt()
,
hai_kmeans_user_item_tbl()
library(healthyR.data)
library(dplyr)
library(broom)

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  filter(payer_grouping != "Medicare B") %>%
  filter(payer_grouping != "?") %>%
  select(service_line, payer_grouping) %>%
  mutate(record = 1) %>%
  as_tibble()

uit_tbl <- hai_kmeans_user_item_tbl(
  .data = data_tbl,
  .row_input = service_line,
  .col_input = payer_grouping,
  .record_input = record
)

km_obj <- hai_kmeans_obj(uit_tbl)

hai_kmeans_tidy_tbl(
  .kmeans_obj = km_obj,
  .data = uit_tbl,
  .tidy_type = "augment"
)

hai_kmeans_tidy_tbl(
  .kmeans_obj = km_obj,
  .data = uit_tbl,
  .tidy_type = "glance"
)

hai_kmeans_tidy_tbl(
  .kmeans_obj = km_obj,
  .data = uit_tbl,
  .tidy_type = "tidy"
) %>%
  glimpse()
Takes in a data.frame/tibble and transforms it into an aggregated/normalized user-item tibble of proportions. The user will need to input the parameters for the rows/user and the columns/items.
hai_kmeans_user_item_tbl(.data, .row_input, .col_input, .record_input)

kmeans_user_item_tbl(.data, .row_input, .col_input, .record_input)
.data |
The data that you want to transform |
.row_input |
The column that is going to be the row (user) |
.col_input |
The column that is going to be the column (item) |
.record_input |
The column that is going to be summed up for the aggregation and normalization process. |
This function should be used before fitting a k-means model. This is commonly referred to as a user-item matrix because "users" tend to be on the rows and "items" (e.g. orders) on the columns. You must supply a column that can be summed for the aggregation and normalization process to occur.
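Conceptually, the transform sums the record column per (row, column) cell and then normalizes each row to proportions. A minimal Python sketch of that aggregation/normalization step (hypothetical name, not the package's implementation):

```python
from collections import defaultdict

def user_item_proportions(rows):
    """rows: (user, item, value) triples -> {user: {item: proportion}}.

    Values are summed per cell, then each user's row is normalized so its
    proportions sum to 1."""
    sums = defaultdict(lambda: defaultdict(float))
    for user, item, value in rows:
        sums[user][item] += value
    out = {}
    for user, items in sums.items():
        row_total = sum(items.values())
        out[user] = {item: v / row_total for item, v in items.items()}
    return out
```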
An aggregated/normalized user-item tibble
Steven P. Sanderson II, MPH
Other Kmeans:
hai_kmeans_automl()
,
hai_kmeans_automl_predict()
,
hai_kmeans_mapped_tbl()
,
hai_kmeans_obj()
,
hai_kmeans_scree_data_tbl()
,
hai_kmeans_scree_plt()
,
hai_kmeans_tidy_tbl()
library(healthyR.data)
library(dplyr)

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  filter(payer_grouping != "Medicare B") %>%
  filter(payer_grouping != "?") %>%
  select(service_line, payer_grouping) %>%
  mutate(record = 1) %>%
  as_tibble()

hai_kmeans_user_item_tbl(
  .data = data_tbl,
  .row_input = service_line,
  .col_input = payer_grouping,
  .record_input = record
)
Automatically prep a data.frame/tibble for use in the k-NN algorithm.
hai_knn_data_prepper(.data, .recipe_formula)
.data |
The data that you are passing to the function. Can be any type
of data that is accepted by the |
.recipe_formula |
The formula that is going to be passed. For example
if you are using the |
This function will automatically prep your data.frame/tibble for use in the k-NN algorithm. The k-NN algorithm is a lazy learning classification algorithm. It expects data to be presented in a certain fashion.
This function will output a recipe specification.
A recipe object
Steven P. Sanderson II, MPH
Other Preprocessor:
hai_c50_data_prepper()
,
hai_cubist_data_prepper()
,
hai_data_impute()
,
hai_data_poly()
,
hai_data_scale()
,
hai_data_transform()
,
hai_data_trig()
,
hai_earth_data_prepper()
,
hai_glmnet_data_prepper()
,
hai_ranger_data_prepper()
,
hai_svm_poly_data_prepper()
,
hai_svm_rbf_data_prepper()
,
hai_xgboost_data_prepper()
Other knn:
hai_glmnet_data_prepper()
library(ggplot2)

hai_knn_data_prepper(.data = Titanic, .recipe_formula = Survived ~ .)

rec_obj <- hai_knn_data_prepper(iris, Species ~ .)
get_juiced_data(rec_obj)
This function takes in a vector as its input and will return the kurtosis of that vector. The length of this vector must be at least four numbers. The kurtosis explains the sharpness of the peak of a distribution of data.
((1/n) * sum((x - mu)^4)) / (((1/n) * sum((x - mu)^2))^2)
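The formula above is the fourth central moment divided by the squared second central moment. A minimal Python sketch (not the package's R implementation; the name kurtosis is hypothetical):

```python
def kurtosis(x):
    """Population kurtosis per the moment-ratio formula; needs >= 4 values."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least four numbers")
    mu = sum(x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n  # fourth central moment
    m2 = sum((v - mu) ** 2 for v in x) / n  # second central moment (variance)
    return m4 / m2 ** 2
```

For the vector [1, 2, 3, 4] this yields 1.64.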
hai_kurtosis_vec(.x)
.x |
A numeric vector of length four or more. |
A function to return the kurtosis of a vector.
The kurtosis of a vector
Steven P. Sanderson II, MPH
https://en.wikipedia.org/wiki/Kurtosis
Other Vector Function:
hai_fourier_discrete_vec()
,
hai_fourier_vec()
,
hai_hyperbolic_vec()
,
hai_scale_zero_one_vec()
,
hai_scale_zscore_vec()
,
hai_skewness_vec()
,
hai_winsorized_move_vec()
,
hai_winsorized_truncate_vec()
hai_kurtosis_vec(rnorm(100, 3, 2))
This function takes in a data table and a predictor column. A user can either create
their own formula using the .formula
parameter or, if they leave the default of
NULL
then the user must enter a .degree
AND .pred_col
column.
hai_polynomial_augment(
  .data,
  .formula = NULL,
  .pred_col = NULL,
  .degree = 1,
  .new_col_prefix = "nt_"
)
.data |
The data being passed that will be augmented by the function. |
.formula |
This should be a valid formula like 'y ~ .^2' or NULL. |
.pred_col |
This is passed |
.degree |
This should be an integer and is used to set the degree in the poly function. The degree must be less than the unique data points or it will error out. |
.new_col_prefix |
The default is "nt_" which stands for "new_term". You can set this to whatever you like, as long as it is a quoted string. |
A valid data.frame/tibble must be passed to this function. It is required that
a user either enter a .formula
or a .degree
AND .pred_col
otherwise this
function will stop and error out.
Under the hood this function will create a stats::poly()
function if the
.formula
is left as NULL
. For example:
.formula = A ~ .^2
OR .degree = 2, .pred_col = A
There is also a parameter .new_col_prefix
which will add a character string
to the column names so that they are easily identified further down the line.
The default is 'nt_'
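To illustrate the prefixed new-term columns, here is a minimal Python sketch that builds raw power terms (note: R's stats::poly() defaults to orthogonal polynomials, not raw powers; raw powers are shown only for clarity, and the name polynomial_terms is hypothetical):

```python
def polynomial_terms(values, degree, prefix="nt_"):
    """Raw power terms up to `degree`, keyed with the new-column prefix."""
    return {f"{prefix}{d}": [v ** d for v in values] for d in range(1, degree + 1)}
```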
An augmented tibble
Steven P. Sanderson II, MPH
Other Augment Function:
hai_fourier_augment()
,
hai_fourier_discrete_augment()
,
hai_hyperbolic_augment()
,
hai_scale_zero_one_augment()
,
hai_scale_zscore_augment()
,
hai_winsorized_move_augment()
,
hai_winsorized_truncate_augment()
suppressPackageStartupMessages(library(dplyr))

data_tbl <- data.frame(
  A = c(0, 2, 4),
  B = c(1, 3, 5),
  C = c(2, 4, 6)
)

hai_polynomial_augment(.data = data_tbl, .pred_col = A, .degree = 2, .new_col_prefix = "n")
hai_polynomial_augment(.data = data_tbl, .formula = A ~ .^2, .degree = 1)
Takes in a numeric vector and returns the range of that vector
hai_range_statistic(.x)
.x |
A numeric vector |
Takes in a numeric vector and returns the range of that vector using
the diff
and range
functions.
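R's diff(range(x)) is simply the maximum minus the minimum. A minimal Python sketch of the same statistic (hypothetical name, not the package's implementation):

```python
def range_statistic(x):
    """Range of a numeric vector, mirroring diff(range(x)) in R."""
    return max(x) - min(x)
```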
A single number, the range statistic
Steven P. Sanderson II, MPH
hai_range_statistic(seq(1:10))
Automatically prep a data.frame/tibble for use in the Ranger algorithm.
hai_ranger_data_prepper(.data, .recipe_formula)
.data |
The data that you are passing to the function. Can be any type
of data that is accepted by the |
.recipe_formula |
The formula that is going to be passed. For example
if you are using the |
This function will automatically prep your data.frame/tibble for use in the Ranger algorithm.
This function will output a recipe specification.
A recipe object
Steven P. Sanderson II, MPH
https://parsnip.tidymodels.org/reference/rand_forest.html
Other Preprocessor:
hai_c50_data_prepper()
,
hai_cubist_data_prepper()
,
hai_data_impute()
,
hai_data_poly()
,
hai_data_scale()
,
hai_data_transform()
,
hai_data_trig()
,
hai_earth_data_prepper()
,
hai_glmnet_data_prepper()
,
hai_knn_data_prepper()
,
hai_svm_poly_data_prepper()
,
hai_svm_rbf_data_prepper()
,
hai_xgboost_data_prepper()
Other Ranger:
hai_auto_ranger()
library(ggplot2)

# Regression
hai_ranger_data_prepper(.data = diamonds, .recipe_formula = price ~ .)
reg_obj <- hai_ranger_data_prepper(diamonds, price ~ .)
get_juiced_data(reg_obj)

# Classification
hai_ranger_data_prepper(Titanic, Survived ~ .)
cla_obj <- hai_ranger_data_prepper(Titanic, Survived ~ .)
get_juiced_data(cla_obj)
8 Hex RGB color definitions suitable for charts for colorblind people.
hai_scale_color_colorblind(..., theme = "hai")
... |
Data passed in from a |
theme |
Right now this is |
This function is used in others in order to help render plots for those that are color blind.
A ggplot layer
Steven P. Sanderson II, MPH
Other Color_Blind:
color_blind()
,
hai_scale_fill_colorblind()
8 Hex RGB color definitions suitable for charts for colorblind people.
hai_scale_fill_colorblind(..., theme = "hai")
... |
Data passed in from a |
theme |
Right now this is |
This function is used in others in order to help render plots for those that are color blind.
A ggplot layer
Steven P. Sanderson II, MPH
Other Color_Blind:
color_blind()
,
hai_scale_color_colorblind()
Takes a numeric vector and will return a vector that has been scaled from [0,1]
hai_scale_zero_one_augment(.data, .value, .names = "auto")
.data |
The data being passed that will be augmented by the function. |
.value |
This is passed |
.names |
This is set to 'auto' by default but can be a user supplied character string. |
Takes a numeric vector and will return a vector that has been scaled from [0,1]
The input vector must be numeric. The computation is fairly straightforward.
This may be helpful when comparing the distributions of data against a distribution like the beta from the fitdistrplus package, which requires data to be between 0 and 1.
This function is intended to be used on its own in order to add columns to a tibble.
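The underlying computation is min-max scaling: (x - min) / (max - min). A minimal Python sketch of the vector form (hypothetical name, not the package's R implementation):

```python
def scale_zero_one(x):
    """Rescale a numeric vector to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("vector is constant; range is zero")
    return [(v - lo) / (hi - lo) for v in x]
```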
An augmented tibble
Steven P. Sanderson II, MPH
Other Augment Function:
hai_fourier_augment()
,
hai_fourier_discrete_augment()
,
hai_hyperbolic_augment()
,
hai_polynomial_augment()
,
hai_scale_zscore_augment()
,
hai_winsorized_move_augment()
,
hai_winsorized_truncate_augment()
Other Scale:
hai_scale_zero_one_vec()
,
hai_scale_zscore_augment()
,
hai_scale_zscore_vec()
,
step_hai_scale_zscore()
df <- data.frame(x = rnorm(100, 2, 1)) hai_scale_zero_one_augment(df, x)
Takes a numeric vector and will return a vector that has been scaled from [0,1]
hai_scale_zero_one_vec(.x)
.x |
A numeric vector to be scaled from |
Takes a numeric vector and will return a vector that has been scaled from [0,1]
The input vector must be numeric. The computation is fairly straightforward.
This may be helpful when comparing the distributions of data against a distribution like the beta from the fitdistrplus package, which requires data to be between 0 and 1.
This function can be used on it's own. It is also the basis for the function
hai_scale_zero_one_augment()
.
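The "fairly straightforward" computation is presumably standard min-max scaling; a minimal sketch (an assumption about the computation, not the actual package internals):

```r
# Minimal sketch of min-max scaling to [0, 1]; an assumption about the
# computation, not the actual internals of hai_scale_zero_one_vec()
scale_zero_one <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

scale_zero_one(c(2, 4, 6, 10))
```

Under this form the smallest value maps to 0 and the largest to 1, with everything else scaled linearly between them.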
A numeric vector
Steven P. Sanderson II, MPH
Other Vector Function:
hai_fourier_discrete_vec(), hai_fourier_vec(), hai_hyperbolic_vec(), hai_kurtosis_vec(), hai_scale_zscore_vec(), hai_skewness_vec(), hai_winsorized_move_vec(), hai_winsorized_truncate_vec()
Other Scale:
hai_scale_zero_one_augment(), hai_scale_zscore_augment(), hai_scale_zscore_vec(), step_hai_scale_zscore()
vec_1 <- rnorm(100, 2, 1)
vec_2 <- hai_scale_zero_one_vec(vec_1)
dens_1 <- density(vec_1)
dens_2 <- density(vec_2)
max_x <- max(dens_1$x, dens_2$x)
max_y <- max(dens_1$y, dens_2$y)
plot(dens_1,
  asp = max_y / max_x,
  main = "Density vec_1 (Red) and vec_2 (Blue)",
  col = "red", xlab = "", ylab = "Density of Vec 1 and Vec 2"
)
lines(dens_2, col = "blue")
Takes a numeric vector and will return a vector that has been scaled by mean and standard deviation
hai_scale_zscore_augment(.data, .value, .names = "auto")
.data |
The data being passed that will be augmented by the function. |
.value |
This is passed to rlang::enquo() to capture the column symbol. |
.names |
This is set to 'auto' by default but can be a user supplied character string. |
Takes a numeric vector and will return a vector that has been scaled by mean
and standard deviation.
The input vector must be numeric. The computation is fairly straightforward.
This may be helpful when trying to compare the distribution of your data to a
distribution, such as the beta from the fitdistrplus package, that requires
data to be between 0 and 1.
This function is intended to be used on its own in order to add columns to a tibble.
An augmented tibble
Steven P. Sanderson II, MPH
Other Augment Function:
hai_fourier_augment(), hai_fourier_discrete_augment(), hai_hyperbolic_augment(), hai_polynomial_augment(), hai_scale_zero_one_augment(), hai_winsorized_move_augment(), hai_winsorized_truncate_augment()
Other Scale:
hai_scale_zero_one_augment(), hai_scale_zero_one_vec(), hai_scale_zscore_vec(), step_hai_scale_zscore()
df <- data.frame(x = mtcars$mpg)
hai_scale_zscore_augment(df, x)
Takes a numeric vector and will return a vector that has been scaled by mean and standard deviation
hai_scale_zscore_vec(.x)
.x |
A numeric vector to be scaled by mean and standard deviation inclusive. |
Takes a numeric vector and will return a vector that has been scaled by mean
and standard deviation.
The input vector must be numeric. The computation is fairly straightforward.
This may be helpful when trying to compare the distribution of your data to a
distribution, such as the beta from the fitdistrplus package, that requires
data to be between 0 and 1.
This function can be used on its own. It is also the basis for the function
hai_scale_zscore_augment().
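The computation is presumably the classic z-score transform, centering by the mean and scaling by the standard deviation; a minimal sketch (an assumption, not the actual package internals):

```r
# Minimal sketch of z-score scaling; an assumption about the computation,
# not the actual internals of hai_scale_zscore_vec()
scale_zscore <- function(x) {
  (x - mean(x)) / sd(x)
}

vec <- scale_zscore(mtcars$mpg)
c(mean = round(mean(vec), 4), sd = round(sd(vec), 4))
```

A vector scaled this way has mean 0 and standard deviation 1, which is what makes the before/after histograms in the example below directly comparable.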
A numeric vector
Steven P. Sanderson II, MPH
Other Vector Function:
hai_fourier_discrete_vec(), hai_fourier_vec(), hai_hyperbolic_vec(), hai_kurtosis_vec(), hai_scale_zero_one_vec(), hai_skewness_vec(), hai_winsorized_move_vec(), hai_winsorized_truncate_vec()
Other Scale:
hai_scale_zero_one_augment(), hai_scale_zero_one_vec(), hai_scale_zscore_augment(), step_hai_scale_zscore()
vec_1 <- mtcars$mpg
vec_2 <- hai_scale_zscore_vec(vec_1)
ax <- pretty(min(vec_1, vec_2):max(vec_1, vec_2), n = 12)
hist(vec_1, breaks = ax, col = "blue")
hist(vec_2, breaks = ax, col = "red", add = TRUE)
Takes in a data.frame/tibble and returns a vector of names of the columns that are skewed.
hai_skewed_features(.data, .threshold = 0.6, .drop_keys = NULL)
.data |
The data.frame/tibble you are passing in. |
.threshold |
A level of skewness that indicates where you feel a column should be considered skewed. |
.drop_keys |
A c() character vector of columns you do not want passed to the function. |
Takes in a data.frame/tibble and returns a character vector of the names of
the skewed columns. There are two other parameters. The first is .threshold,
which sets the level of skewness above which a column is considered too
skewed. The second is .drop_keys: columns you do not want considered in the
skewness calculation.
A character vector of column names that are skewed.
Steven P. Sanderson II, MPH
hai_skewed_features(mtcars)
hai_skewed_features(mtcars, .drop_keys = c("mpg", "hp"))
hai_skewed_features(mtcars, .drop_keys = "hp")
This function takes in a vector as its input and will return the skewness of that vector. The length of this vector must be at least four numbers. The skewness describes the asymmetry of the distribution of the data.
((1/n) * sum((x - mu)^3)) / (((1/n) * sum((x - mu)^2))^(3/2))
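The moment-based skewness formula can be computed directly; a worked sketch:

```r
# Direct implementation of the moment-based skewness formula:
# third central moment divided by the second central moment to the 3/2 power
skewness <- function(x) {
  n  <- length(x)
  mu <- mean(x)
  ((1 / n) * sum((x - mu)^3)) / (((1 / n) * sum((x - mu)^2))^(3 / 2))
}

skewness(c(1, 2, 3, 10)) # a long right tail yields a positive value
```

A symmetric vector such as c(-1, 0, 1) yields 0, while data with a long right tail yields a positive value and a long left tail a negative one.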
hai_skewness_vec(.x)
.x |
A numeric vector of length four or more. |
A function to return the skewness of a vector.
The skewness of a vector
Steven P. Sanderson II, MPH
https://en.wikipedia.org/wiki/Skewness
Other Vector Function:
hai_fourier_discrete_vec(), hai_fourier_vec(), hai_hyperbolic_vec(), hai_kurtosis_vec(), hai_scale_zero_one_vec(), hai_scale_zscore_vec(), hai_winsorized_move_vec(), hai_winsorized_truncate_vec()
hai_skewness_vec(rnorm(100, 3, 2))
Automatically prep a data.frame/tibble for use in the SVM_Poly algorithm.
hai_svm_poly_data_prepper(.data, .recipe_formula)
.data |
The data that you are passing to the function. Can be any type
of data that is accepted by the |
.recipe_formula |
The formula that is going to be passed. For example
if you are using the |
This function will automatically prep your data.frame/tibble for use in the SVM_Poly algorithm. The SVM_Poly algorithm can be used for both regression and classification.
This function will output a recipe specification.
A recipe object
Steven P. Sanderson II, MPH
https://parsnip.tidymodels.org/reference/svm_poly.html
Other Preprocessor:
hai_c50_data_prepper(), hai_cubist_data_prepper(), hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_data_trig(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_rbf_data_prepper(), hai_xgboost_data_prepper()
Other SVM_Poly:
hai_auto_svm_poly()
library(ggplot2)

# Regression
hai_svm_poly_data_prepper(.data = diamonds, .recipe_formula = price ~ .)
reg_obj <- hai_svm_poly_data_prepper(diamonds, price ~ .)
get_juiced_data(reg_obj)

# Classification
hai_svm_poly_data_prepper(Titanic, Survived ~ .)
cla_obj <- hai_svm_poly_data_prepper(Titanic, Survived ~ .)
get_juiced_data(cla_obj)
Automatically prep a data.frame/tibble for use in the SVM_RBF algorithm.
hai_svm_rbf_data_prepper(.data, .recipe_formula)
.data |
The data that you are passing to the function. Can be any type
of data that is accepted by the |
.recipe_formula |
The formula that is going to be passed. For example
if you are using the |
This function will automatically prep your data.frame/tibble for use in the SVM_RBF algorithm. The SVM_RBF algorithm can be used for both regression and classification.
This function will output a recipe specification.
A recipe object
Steven P. Sanderson II, MPH
https://parsnip.tidymodels.org/reference/svm_rbf.html
Other Preprocessor:
hai_c50_data_prepper(), hai_cubist_data_prepper(), hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_data_trig(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_xgboost_data_prepper()
Other SVM_RBF:
hai_auto_svm_rbf()
library(ggplot2)

# Regression
hai_svm_rbf_data_prepper(.data = diamonds, .recipe_formula = price ~ .)
reg_obj <- hai_svm_rbf_data_prepper(diamonds, price ~ .)
get_juiced_data(reg_obj)

# Classification
hai_svm_rbf_data_prepper(Titanic, Survived ~ .)
cla_obj <- hai_svm_rbf_data_prepper(Titanic, Survived ~ .)
get_juiced_data(cla_obj)
Create a umap object from the uwot::umap()
function.
hai_umap_list(.data, .kmeans_map_tbl, .k_cluster = 5)

umap_list(.data, .kmeans_map_tbl, .k_cluster = 5)
.data |
The data from the |
.kmeans_map_tbl |
The data from the |
.k_cluster |
Pick the desired amount of clusters from your analysis of the scree plot. |
This takes in the user item table/matrix that is produced by the
hai_kmeans_user_item_tbl() function. This function uses the defaults of
uwot::umap().
A list of tibbles and the umap object
Steven P. Sanderson II, MPH
https://github.com/jlmelville/uwot (GitHub)
https://arxiv.org/abs/1802.03426 (arXiv paper)
Other UMAP:
hai_umap_plot()
library(healthyR.data)
library(dplyr)
library(broom)

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  filter(payer_grouping != "Medicare B") %>%
  filter(payer_grouping != "?") %>%
  select(service_line, payer_grouping) %>%
  mutate(record = 1) %>%
  as_tibble()

uit_tbl <- hai_kmeans_user_item_tbl(
  .data = data_tbl,
  .row_input = service_line,
  .col_input = payer_grouping,
  .record_input = record
)

kmm_tbl <- hai_kmeans_mapped_tbl(uit_tbl)

umap_list(.data = uit_tbl, kmm_tbl, 3)
Create a UMAP Projection plot.
hai_umap_plot(.data, .point_size = 2, .label = TRUE)

umap_plt(.data, .point_size = 2, .label = TRUE)
.data |
The data from the |
.point_size |
The desired size for the points of the plot. |
.label |
Should |
This takes in umap_kmeans_cluster_results_tbl
from the umap_list()
function output.
A ggplot2 UMAP Projection with clusters represented by colors.
Steven P. Sanderson II, MPH
https://github.com/jlmelville/uwot (GitHub)
https://arxiv.org/abs/1802.03426 (arXiv paper)
Other UMAP:
hai_umap_list()
library(healthyR.data)
library(dplyr)
library(broom)
library(ggplot2)

data_tbl <- healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  filter(payer_grouping != "Medicare B") %>%
  filter(payer_grouping != "?") %>%
  select(service_line, payer_grouping) %>%
  mutate(record = 1) %>%
  as_tibble()

uit_tbl <- hai_kmeans_user_item_tbl(
  .data = data_tbl,
  .row_input = service_line,
  .col_input = payer_grouping,
  .record_input = record
)

kmm_tbl <- hai_kmeans_mapped_tbl(uit_tbl)

ump_lst <- hai_umap_list(.data = uit_tbl, kmm_tbl, 3)

hai_umap_plot(.data = ump_lst, .point_size = 3)
Takes a numeric vector and will return a tibble with the winsorized values.
hai_winsorized_move_augment(.data, .value, .multiple, .names = "auto")
.data |
The data being passed that will be augmented by the function. |
.value |
This is passed to rlang::enquo() to capture the column symbol. |
.multiple |
A positive number indicating how many times the zero-centered mean absolute deviation should be multiplied by for the scaling parameter. |
.names |
The default is "auto" |
Takes a numeric vector and will return a winsorized vector of values that have been moved some multiple from the mean absolute deviation zero center of some vector. The intent of winsorization is to limit the effect of extreme values.
An augmented tibble
Steven P. Sanderson II, MPH
https://en.wikipedia.org/wiki/Winsorizing
Other Augment Function:
hai_fourier_augment(), hai_fourier_discrete_augment(), hai_hyperbolic_augment(), hai_polynomial_augment(), hai_scale_zero_one_augment(), hai_scale_zscore_augment(), hai_winsorized_truncate_augment()
suppressPackageStartupMessages(library(dplyr))

len_out <- 24
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

hai_winsorized_move_augment(data_tbl, a, .multiple = 3)
Takes a numeric vector and will return a vector of winsorized values.
hai_winsorized_move_vec(.x, .multiple = 3)
.x |
A numeric vector |
.multiple |
A positive number indicating how many times the zero-centered mean absolute deviation should be multiplied by for the scaling parameter. |
Takes a numeric vector and will return a winsorized vector of values that have been moved some multiple from the mean absolute deviation zero center of some vector. The intent of winsorization is to limit the effect of extreme values.
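One plausible reading of "moved some multiple from the mean absolute deviation zero center" is clamping to median(x) +/- .multiple * mad(x); a hedged sketch under that assumption (the actual internals of hai_winsorized_move_vec() may differ):

```r
# A sketch of winsorization by movement, ASSUMING bounds of
# median(x) +/- multiple * mad(x); the actual internals of
# hai_winsorized_move_vec() may differ
winsorize_move <- function(x, multiple = 3) {
  center <- median(x)
  bound  <- multiple * mad(x)
  # clamp values outside [center - bound, center + bound]
  pmin(pmax(x, center - bound), center + bound)
}

winsorize_move(c(1:10, 100), multiple = 1) # the 100 is pulled toward the bulk
```

Either way, the key property holds: extreme observations are pulled in toward the bulk of the data rather than dropped.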
A numeric vector
Steven P. Sanderson II, MPH
https://en.wikipedia.org/wiki/Winsorizing
This function can be used on its own. It is also the basis for the function
hai_winsorized_move_augment().
Other Vector Function:
hai_fourier_discrete_vec(), hai_fourier_vec(), hai_hyperbolic_vec(), hai_kurtosis_vec(), hai_scale_zero_one_vec(), hai_scale_zscore_vec(), hai_skewness_vec(), hai_winsorized_truncate_vec()
suppressPackageStartupMessages(library(dplyr))

len_out <- 25
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

vec_1 <- hai_winsorized_move_vec(data_tbl$a, .multiple = 1)

plot(data_tbl$a)
lines(data_tbl$a)
lines(vec_1, col = "blue")
Takes a numeric vector and will return a tibble with the winsorized values.
hai_winsorized_truncate_augment(.data, .value, .fraction, .names = "auto")
.data |
The data being passed that will be augmented by the function. |
.value |
This is passed to rlang::enquo() to capture the column symbol. |
.fraction |
A positive fraction between 0 and 0.5 that is passed to the
|
.names |
The default is "auto" |
Takes a numeric vector and will return a winsorized vector of values that have been truncated if they are less than or greater than some defined fraction of a quantile. The intent of winsorization is to limit the effect of extreme values.
An augmented tibble
Steven P. Sanderson II, MPH
https://en.wikipedia.org/wiki/Winsorizing
Other Augment Function:
hai_fourier_augment(), hai_fourier_discrete_augment(), hai_hyperbolic_augment(), hai_polynomial_augment(), hai_scale_zero_one_augment(), hai_scale_zscore_augment(), hai_winsorized_move_augment()
suppressPackageStartupMessages(library(dplyr))

len_out <- 24
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

hai_winsorized_truncate_augment(data_tbl, a, .fraction = 0.05)
Takes a numeric vector and will return a vector of winsorized values.
hai_winsorized_truncate_vec(.x, .fraction = 0.05)
.x |
A numeric vector |
.fraction |
A positive fraction between 0 and 0.5 that is passed to the
|
Takes a numeric vector and will return a winsorized vector of values that have been truncated if they are less than or greater than some defined fraction of a quantile. The intent of winsorization is to limit the effect of extreme values.
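Quantile-based winsorization as described above can be sketched as clamping to the .fraction and 1 - .fraction quantiles (an assumption; the actual internals of hai_winsorized_truncate_vec() may differ):

```r
# A sketch of quantile-based winsorization, ASSUMING values outside the
# [fraction, 1 - fraction] quantiles are clamped; the actual internals of
# hai_winsorized_truncate_vec() may differ
winsorize_truncate <- function(x, fraction = 0.05) {
  bounds <- quantile(x, probs = c(fraction, 1 - fraction), names = FALSE)
  # clamp everything to lie within the two quantile bounds
  pmin(pmax(x, bounds[1]), bounds[2])
}

winsorize_truncate(c(-500, 1:98, 500), fraction = 0.05)
```

With .fraction = 0.05, the most extreme 5% of values on each tail are replaced by the corresponding quantile, which limits the influence of outliers like the -500 and 500 above.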
A numeric vector
Steven P. Sanderson II, MPH
https://en.wikipedia.org/wiki/Winsorizing
This function can be used on its own. It is also the basis for the function
hai_winsorized_truncate_augment().
Other Vector Function:
hai_fourier_discrete_vec(), hai_fourier_vec(), hai_hyperbolic_vec(), hai_kurtosis_vec(), hai_scale_zero_one_vec(), hai_scale_zscore_vec(), hai_skewness_vec(), hai_winsorized_move_vec()
suppressPackageStartupMessages(library(dplyr))

len_out <- 25
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

vec_1 <- hai_winsorized_truncate_vec(data_tbl$a, .fraction = 0.05)

plot(data_tbl$a)
lines(data_tbl$a)
lines(vec_1, col = "blue")
Automatically prep a data.frame/tibble for use in the xgboost algorithm.
hai_xgboost_data_prepper(.data, .recipe_formula)
.data |
The data that you are passing to the function. Can be any type
of data that is accepted by the |
.recipe_formula |
The formula that is going to be passed. For example
if you are using the |
This function will automatically prep your data.frame/tibble for use in the XGBoost algorithm.
This function will output a recipe specification.
A recipe object
Steven P. Sanderson II, MPH
https://parsnip.tidymodels.org/reference/details_boost_tree_xgboost.html
Other Preprocessor:
hai_c50_data_prepper(), hai_cubist_data_prepper(), hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_data_trig(), hai_earth_data_prepper(), hai_glmnet_data_prepper(), hai_knn_data_prepper(), hai_ranger_data_prepper(), hai_svm_poly_data_prepper(), hai_svm_rbf_data_prepper()
library(ggplot2)

# Regression
hai_xgboost_data_prepper(.data = diamonds, .recipe_formula = price ~ .)
reg_obj <- hai_xgboost_data_prepper(diamonds, price ~ .)
get_juiced_data(reg_obj)

# Classification
hai_xgboost_data_prepper(Titanic, Survived ~ .)
cla_obj <- hai_xgboost_data_prepper(Titanic, Survived ~ .)
get_juiced_data(cla_obj)
This is a simple function that will perform PCA analysis on a passed recipe.
pca_your_recipe(.recipe_object, .data, .threshold = 0.75, .top_n = 5)
.recipe_object |
The recipe object you want to pass. |
.data |
The full data set that is used in the original recipe object passed
into |
.threshold |
A number between 0 and 1. A fraction of the total variance that should be covered by the components. |
.top_n |
How many variables loadings should be returned per PC |
This is a simple wrapper around some recipes functions to perform a PCA on a given recipe. This function will output a list and return it invisibly. All of the components of the analysis will be returned in a list as their own objects that can be selected individually. A scree plot is also included. The items that get returned are:
pca_transform - This is the pca recipe.
variable_loadings
variable_variance
pca_estimates
pca_juiced_estimates
pca_baked_data
pca_variance_df
pca_rotation_df
pca_variance_scree_plt
pca_loadings_plt
pca_loadings_plotly
pca_top_n_loadings_plt
pca_top_n_plotly
A list object with several components.
Steven P. Sanderson II, MPH
https://recipes.tidymodels.org/reference/step_pca.html
Other Data Recipes:
hai_data_impute(), hai_data_poly(), hai_data_scale(), hai_data_transform(), hai_data_trig()
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(healthyR.data))
suppressPackageStartupMessages(library(rsample))
suppressPackageStartupMessages(library(recipes))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(plotly))

data_tbl <- healthyR_data %>%
  select(visit_end_date_time) %>%
  summarise_by_time(
    .date_var = visit_end_date_time,
    .by = "month",
    value = n()
  ) %>%
  set_names("date_col", "value") %>%
  filter_by_time(
    .date_var = date_col,
    .start_date = "2013",
    .end_date = "2020"
  ) %>%
  mutate(date_col = as.Date(date_col))

splits <- initial_split(data = data_tbl, prop = 0.8)

rec_obj <- recipe(value ~ ., training(splits)) %>%
  step_timeseries_signature(date_col) %>%
  step_rm(matches("(iso$)|(xts$)|(hour)|(min)|(sec)|(am.pm)"))

output_list <- pca_your_recipe(rec_obj, .data = data_tbl)

output_list$pca_variance_scree_plt
output_list$pca_loadings_plt
output_list$pca_top_n_loadings_plt
step_hai_fourier
creates a specification of a recipe
step that will convert numeric data into either a 'sin', 'cos', or 'sincos'
feature that can aid in machine learning.
step_hai_fourier(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  columns = NULL,
  scale_type = c("sin", "cos", "sincos"),
  period = 1,
  order = 1,
  skip = FALSE,
  id = rand_id("hai_fourier")
)
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables that will be used to create the new variables. The
selected variables should have class |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
columns |
A character string of variables that will be
used as inputs. This field is a placeholder and will be
populated once |
scale_type |
A character string of a scaling type, one of "sin","cos", or "sincos" |
period |
The number of observations that complete a cycle |
order |
The fourier term order |
skip |
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations. |
id |
A character string that is unique to this step to identify it. |
Numeric Variables
Unlike other steps, step_hai_fourier
does not
remove the original numeric variables. recipes::step_rm()
can be
used for this purpose.
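The Fourier feature the step presumably computes takes the standard form sin(2 * pi * order * x / period) or cos(2 * pi * order * x / period); a hedged sketch (the exact internals of step_hai_fourier() may differ):

```r
# A sketch of a single Fourier term, ASSUMING the standard form
# sin/cos(2 * pi * order * x / period); the exact internals of
# step_hai_fourier() may differ
fourier_term <- function(x, period = 1, order = 1, scale_type = c("sin", "cos")) {
  scale_type <- match.arg(scale_type)
  theta <- 2 * pi * order * x / period
  switch(scale_type, sin = sin(theta), cos = cos(theta))
}

fourier_term(0.25, period = 1, order = 1, scale_type = "sin") # sin(pi/2) = 1
```

Increasing order adds higher-frequency harmonics, while period controls how many observations complete one full cycle, matching the period and order arguments above.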
For step_hai_fourier
, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
Main Recipe Functions:
recipes::recipe()
recipes::prep()
recipes::bake()
Other Recipes:
step_hai_fourier_discrete(), step_hai_hyperbolic(), step_hai_scale_zero_one(), step_hai_scale_zscore(), step_hai_winsorized_move(), step_hai_winsorized_truncate()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

len_out <- 10
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

# Create a recipe object
rec_obj <- recipe(a ~ ., data = data_tbl) %>%
  step_hai_fourier(b, scale_type = "sin") %>%
  step_hai_fourier(b, scale_type = "cos") %>%
  step_hai_fourier(b, scale_type = "sincos")

# View the recipe object
rec_obj

# Prepare the recipe object
prep(rec_obj)

# Bake the recipe object - Adds the Time Series Signature
bake(prep(rec_obj), data_tbl)

rec_obj %>% get_juiced_data()
step_hai_fourier_discrete
creates a specification of a recipe
step that will convert numeric data into either a 'sin', 'cos', or 'sincos'
feature that can aid in machine learning.
step_hai_fourier_discrete(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  columns = NULL,
  scale_type = c("sin", "cos", "sincos"),
  period = 1,
  order = 1,
  skip = FALSE,
  id = rand_id("hai_fourier_discrete")
)
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables that will be used to create the new variables. The
selected variables should have class |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
columns |
A character string of variables that will be
used as inputs. This field is a placeholder and will be
populated once |
scale_type |
A character string of a scaling type, one of "sin","cos", or "sincos" |
period |
The number of observations that complete a cycle |
order |
The fourier term order |
skip |
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations. |
id |
A character string that is unique to this step to identify it. |
Numeric Variables
Unlike other steps, step_hai_fourier_discrete
does not
remove the original numeric variables. recipes::step_rm()
can be
used for this purpose.
For step_hai_fourier_discrete
, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
Main Recipe Functions:
recipes::recipe()
recipes::prep()
recipes::bake()
Other Recipes:
step_hai_fourier(), step_hai_hyperbolic(), step_hai_scale_zero_one(), step_hai_scale_zscore(), step_hai_winsorized_move(), step_hai_winsorized_truncate()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

len_out <- 10
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

# Create a recipe object
rec_obj <- recipe(a ~ ., data = data_tbl) %>%
  step_hai_fourier_discrete(b, scale_type = "sin") %>%
  step_hai_fourier_discrete(b, scale_type = "cos") %>%
  step_hai_fourier_discrete(b, scale_type = "sincos")

# View the recipe object
rec_obj

# Prepare the recipe object
prep(rec_obj)

# Bake the recipe object - Adds the Time Series Signature
bake(prep(rec_obj), data_tbl)

rec_obj %>% get_juiced_data()
step_hai_hyperbolic
creates a specification of a recipe
step that will convert numeric data into a 'sin', 'cos', 'tan', or
'sincos' feature that can aid in machine learning.
step_hai_hyperbolic( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, scale_type = c("sin", "cos", "tan", "sincos"), skip = FALSE, id = rand_id("hai_hyperbolic") )
step_hai_hyperbolic( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, scale_type = c("sin", "cos", "tan", "sincos"), skip = FALSE, id = rand_id("hai_hyperbolic") )
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables will be used to create the new variables. The
selected variables should have class numeric. |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
columns |
A character string of variables that will be
used as inputs. This field is a placeholder and will be
populated once prep() is used. |
scale_type |
A character string of a scaling type, one of "sin", "cos", "tan", or "sincos". |
skip |
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations. |
id |
A character string that is unique to this step to identify it. |
Numeric Variables
Unlike other steps, step_hai_hyperbolic
does not
remove the original numeric variables. recipes::step_rm()
can be
used for this purpose.
For step_hai_hyperbolic
, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
Main Recipe Functions:
recipes::recipe()
recipes::prep()
recipes::bake()
Other Recipes:
step_hai_fourier()
,
step_hai_fourier_discrete()
,
step_hai_scale_zero_one()
,
step_hai_scale_zscore()
,
step_hai_winsorized_move()
,
step_hai_winsorized_truncate()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

len_out <- 10
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

# Create a recipe object
rec_obj <- recipe(a ~ ., data = data_tbl) %>%
  step_hai_hyperbolic(b, scale_type = "sin") %>%
  step_hai_hyperbolic(b, scale_type = "cos")

# View the recipe object
rec_obj

# Prepare the recipe object
prep(rec_obj)

# Bake the recipe object - adds the hyperbolic features
bake(prep(rec_obj), data_tbl)

rec_obj %>% get_juiced_data()
step_hai_scale_zero_one
creates a specification of a recipe
step that will scale numeric data to the zero to one interval.
step_hai_scale_zero_one( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, skip = FALSE, id = rand_id("hai_scale_zero_one") )
step_hai_scale_zero_one( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, skip = FALSE, id = rand_id("hai_scale_zero_one") )
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables will be used to create the new variables. The
selected variables should have class numeric. |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
columns |
A character string of variables that will be
used as inputs. This field is a placeholder and will be
populated once prep() is used. |
skip |
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations. |
id |
A character string that is unique to this step to identify it. |
Numeric Variables
Unlike other steps, step_hai_scale_zero_one
does not
remove the original numeric variables. recipes::step_rm()
can be
used for this purpose.
For step_hai_scale_zero_one
, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
Main Recipe Functions:
recipes::recipe()
recipes::prep()
recipes::bake()
Other Recipes:
step_hai_fourier()
,
step_hai_fourier_discrete()
,
step_hai_hyperbolic()
,
step_hai_scale_zscore()
,
step_hai_winsorized_move()
,
step_hai_winsorized_truncate()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

data_tbl <- data.frame(a = rnorm(200, 3, 1), b = rnorm(200, 2, 2))

# Create a recipe object
rec_obj <- recipe(a ~ ., data = data_tbl) %>%
  step_hai_scale_zero_one(b)

# View the recipe object
rec_obj

# Prepare the recipe object
prep(rec_obj)

# Bake the recipe object - adds the zero to one scaled column
bake(prep(rec_obj), data_tbl)

rec_obj %>% prep() %>% juice()
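For intuition, zero to one scaling maps a vector x to (x - min(x)) / (max(x) - min(x)). Assuming the companion vector function hai_scale_zero_one_vec() applies the same transform as the step, the two can be compared directly:

```r
suppressPackageStartupMessages(library(healthyR.ai))

x <- c(2, 4, 6, 8, 10)

# Manual zero to one rescaling
manual <- (x - min(x)) / (max(x) - min(x))
manual
# 0.00 0.25 0.50 0.75 1.00

# Compare against the package's vector helper
all.equal(manual, hai_scale_zero_one_vec(x))
```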
step_hai_scale_zscore
creates a specification of a recipe
step that will center and scale numeric data to z-scores
(subtract the mean, divide by the standard deviation).
step_hai_scale_zscore( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, skip = FALSE, id = rand_id("hai_scale_zscore") )
step_hai_scale_zscore( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, skip = FALSE, id = rand_id("hai_scale_zscore") )
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables will be used to create the new variables. The
selected variables should have class numeric. |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
columns |
A character string of variables that will be
used as inputs. This field is a placeholder and will be
populated once prep() is used. |
skip |
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations. |
id |
A character string that is unique to this step to identify it. |
Numeric Variables
Unlike other steps, step_hai_scale_zscore
does not
remove the original numeric variables. recipes::step_rm()
can be
used for this purpose.
For step_hai_scale_zscore
, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
Main Recipe Functions:
recipes::recipe()
recipes::prep()
recipes::bake()
Other Recipes:
step_hai_fourier()
,
step_hai_fourier_discrete()
,
step_hai_hyperbolic()
,
step_hai_scale_zero_one()
,
step_hai_winsorized_move()
,
step_hai_winsorized_truncate()
Other Scale:
hai_scale_zero_one_augment()
,
hai_scale_zero_one_vec()
,
hai_scale_zscore_augment()
,
hai_scale_zscore_vec()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

data_tbl <- data.frame(
  a = mtcars$mpg,
  b = AirPassengers %>% as.vector() %>% head(32)
)

# Create a recipe object
rec_obj <- recipe(a ~ ., data = data_tbl) %>%
  step_hai_scale_zscore(b)

# View the recipe object
rec_obj

# Prepare the recipe object
prep(rec_obj)

# Bake the recipe object - adds the z-score scaled column
bake(prep(rec_obj), data_tbl)

rec_obj %>% prep() %>% juice()
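Z-score scaling centers a vector at its mean and divides by its standard deviation. Assuming the companion vector function hai_scale_zscore_vec() computes (x - mean(x)) / sd(x), its result can be checked against a manual calculation and against base R's scale():

```r
suppressPackageStartupMessages(library(healthyR.ai))

x <- mtcars$mpg

# Manual z-score: center at the mean, divide by the standard deviation
manual <- (x - mean(x)) / sd(x)

# Compare against the package's vector helper
all.equal(manual, hai_scale_zscore_vec(x))

# Base R's scale() applies the same transform
all.equal(manual, as.vector(scale(x)))
```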
step_hai_winsorized_move
creates a specification of a recipe
step that will winsorize numeric data.
step_hai_winsorized_move( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, multiple = 3, skip = FALSE, id = rand_id("hai_winsorized_move") )
step_hai_winsorized_move( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, multiple = 3, skip = FALSE, id = rand_id("hai_winsorized_move") )
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables will be used to create the new variables. The
selected variables should have class numeric. |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
columns |
A character string of variables that will be
used as inputs. This field is a placeholder and will be
populated once prep() is used. |
multiple |
A positive number indicating how many times the zero-centered mean absolute deviation should be multiplied by for the scaling parameter. |
skip |
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations. |
id |
A character string that is unique to this step to identify it. |
Numeric Variables
Unlike other steps, step_hai_winsorized_move
does not
remove the original numeric variables. recipes::step_rm()
can be
used for this purpose.
For step_hai_winsorized_move
, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
Main Recipe Functions:
recipes::recipe()
recipes::prep()
recipes::bake()
Other Recipes:
step_hai_fourier()
,
step_hai_fourier_discrete()
,
step_hai_hyperbolic()
,
step_hai_scale_zero_one()
,
step_hai_scale_zscore()
,
step_hai_winsorized_truncate()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

len_out <- 10
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

# Create a recipe object
rec_obj <- recipe(b ~ ., data = data_tbl) %>%
  step_hai_winsorized_move(a, multiple = 3)

# View the recipe object
rec_obj

# Prepare the recipe object
prep(rec_obj)

# Bake the recipe object - adds the winsorized column
bake(prep(rec_obj), data_tbl)

rec_obj %>% get_juiced_data()
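The multiple argument sets the width of a band around the data's center; values outside the band are moved to its boundary. A base-R sketch of that idea, using the median as the center and the MAD as the spread purely for illustration (the step's internal center and spread measures may differ):

```r
set.seed(123)
x <- c(rnorm(20), 25)  # twenty typical values plus one extreme value
multiple <- 3

# Center and spread: median and MAD used here for illustration
center <- median(x)
spread <- mad(x)

lower <- center - multiple * spread
upper <- center + multiple * spread

# Move (clamp) values outside the band back to its boundaries
x_wins <- pmin(pmax(x, lower), upper)

range(x)       # the raw data still contains the extreme value
range(x_wins)  # the winsorized data is bounded by [lower, upper]
```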
step_hai_winsorized_truncate
creates a specification of a recipe
step that will winsorize numeric data.
step_hai_winsorized_truncate( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, fraction = 0.05, skip = FALSE, id = rand_id("hai_winsorized_truncate") )
step_hai_winsorized_truncate( recipe, ..., role = "predictor", trained = FALSE, columns = NULL, fraction = 0.05, skip = FALSE, id = rand_id("hai_winsorized_truncate") )
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables will be used to create the new variables. The
selected variables should have class numeric. |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
columns |
A character string of variables that will be
used as inputs. This field is a placeholder and will be
populated once prep() is used. |
fraction |
A positive fraction between 0 and 0.5 that is passed to the
underlying winsorizing function. |
skip |
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations. |
id |
A character string that is unique to this step to identify it. |
Numeric Variables
Unlike other steps, step_hai_winsorized_truncate
does not
remove the original numeric variables. recipes::step_rm()
can be
used for this purpose.
For step_hai_winsorized_truncate
, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
Main Recipe Functions:
recipes::recipe()
recipes::prep()
recipes::bake()
Other Recipes:
step_hai_fourier()
,
step_hai_fourier_discrete()
,
step_hai_hyperbolic()
,
step_hai_scale_zero_one()
,
step_hai_scale_zscore()
,
step_hai_winsorized_move()
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))

len_out <- 10
by_unit <- "month"
start_date <- as.Date("2021-01-01")

data_tbl <- tibble(
  date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
  a = rnorm(len_out),
  b = runif(len_out)
)

# Create a recipe object
rec_obj <- recipe(b ~ ., data = data_tbl) %>%
  step_hai_winsorized_truncate(a, fraction = 0.05)

# View the recipe object
rec_obj

# Prepare the recipe object
prep(rec_obj)

# Bake the recipe object - adds the winsorized column
bake(prep(rec_obj), data_tbl)

rec_obj %>% get_juiced_data()
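Truncation-style winsorizing clips at the empirical quantiles implied by fraction: values below the fraction quantile or above the 1 - fraction quantile are set to those quantile values. A base-R sketch of that mechanic (illustrative; the step's internal implementation may differ):

```r
set.seed(123)
x <- c(rnorm(100), 50)  # one extreme value
fraction <- 0.05

# Clip at the 5th and 95th percentiles implied by `fraction`
q <- quantile(x, probs = c(fraction, 1 - fraction))
x_trunc <- pmin(pmax(x, q[[1]]), q[[2]])

range(x_trunc)  # both endpoints now lie at the clipping quantiles
```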