Title: | Hospital Data Analysis Workflow Tools |
---|---|
Description: | Hospital data analysis workflow tools, modeling, and automations. This library provides many useful tools to review common administrative hospital data, such as average length of stay, readmission rates, and average net pay amounts by service line. The aim is to provide a simple and consistent verb framework for these routine analyses. |
Authors: | Steven Sanderson [aut, cre, cph] |
Maintainer: | Steven Sanderson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.2.9000 |
Built: | 2024-12-26 06:00:41 UTC |
Source: | https://github.com/spsanderson/healthyR |
Get the counts of a column by a particular grouping if supplied, otherwise just get counts of a column.
category_counts_tbl(.data, .count_col, .arrange_value = TRUE, ...)
.data |
The data.frame/tibble supplied. |
.count_col |
The column that has the values you want to count. |
.arrange_value |
Defaults to TRUE, which arranges the resulting tibble in descending order by .count_col. |
... |
Place the values you want to pass in for grouping here. |
Requires a data.frame/tibble.
Requires a value column, that is, the column that is going to be counted.
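As a rough illustration of what counting a column within an optional grouping means, the sketch below uses base R only. It is not the package's implementation; `count_by_group()` and the toy `visits` data frame are hypothetical names introduced here.

```r
# Hypothetical helper: count the values of one column, optionally within groups.
count_by_group <- function(data, count_col, group_cols = character(0),
                           descending = TRUE) {
  cols <- c(group_cols, count_col)
  # table() tallies every combination of the selected columns
  out <- as.data.frame(table(data[cols]), responseName = "n",
                       stringsAsFactors = FALSE)
  # drop combinations that never occur in the data
  out <- out[out$n > 0, , drop = FALSE]
  if (descending) out <- out[order(-out$n), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data, made up for this sketch
visits <- data.frame(
  payer = c("A", "A", "B", "A"),
  flag  = c("I", "O", "I", "I")
)

res_grouped   <- count_by_group(visits, "payer", "flag")
res_ungrouped <- count_by_group(visits, "payer")
res_grouped
```

Setting `descending = FALSE` in this sketch leaves the rows in the order produced by `table()`.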
Steven P. Sanderson II, MPH
Other Data Table Functions: los_ra_index_summary_tbl(), named_item_list(), top_n_tbl(), ts_census_los_daily_tbl(), ts_signature_tbl()
library(healthyR)
library(healthyR.data)
library(dplyr)

# Count payer groupings within each inpatient/outpatient flag
healthyR_data %>%
  category_counts_tbl(
    .count_col = payer_grouping
    , .arrange_value = TRUE
    , ip_op_flag
  )

# Count inpatient/outpatient flags within each service line
healthyR_data %>%
  category_counts_tbl(
    .count_col = ip_op_flag
    , .arrange_value = TRUE
    , service_line
  )
8 Hex RGB color definitions suitable for charts for colorblind people.
color_blind()
This function is used in others in order to help render plots for those that are color blind.
A vector of 8 Hex RGB definitions.
Steven P. Sanderson II, MPH
Other Color Blind: hr_scale_color_colorblind(), hr_scale_fill_colorblind()
color_blind()
Diverging bars is a bar chart that can handle both negative and positive values. It can be implemented with a small tweak to geom_bar(), but the usage of geom_bar() can be confusing because it can produce a bar chart as well as a histogram.
By default, geom_bar() has the stat set to count. That means that when you provide just a continuous X variable (and no Y variable), it tries to make a histogram out of the data.
To make geom_bar() draw bars instead of a histogram, you need to do two things: set stat = "identity", and provide both x and y inside aes(), where x is either character or factor and y is numeric.
To get diverging bars instead of plain bars, make sure your categorical variable has two categories that change value at a certain threshold of the continuous variable. In the example below, mpg from the mtcars data set is normalized by computing the z-score. Those vehicles with an mpg z-score above zero are marked green and those below are marked red.
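The recipe described above can be sketched directly in ggplot2. This is an illustrative sketch only, not the body of diverging_bar_plt(); the local names `df` and `p` are introduced here, and the plotting step is skipped when ggplot2 is not installed.

```r
# Data preparation: z-score of mpg plus a two-level flag that flips at zero
df <- mtcars
df$car_name <- rownames(df)
df$mpg_z    <- round((df$mpg - mean(df$mpg)) / sd(df$mpg), 2)
df$mpg_type <- ifelse(df$mpg_z < 0, "below", "above")
df <- df[order(df$mpg_z), ]
# Fix the factor levels so the bars are drawn in sorted order
df$car_name <- factor(df$car_name, levels = df$car_name)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # stat = "identity" uses the supplied y values as bar heights
  p <- ggplot(df, aes(x = car_name, y = mpg_z, fill = mpg_type)) +
    geom_bar(stat = "identity", width = 0.5) +
    coord_flip()
  # print(p) draws the chart
}
```

Because x is a factor and y is numeric, negative z-scores extend to one side of zero and positive ones to the other, which is what makes the bars diverge.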
diverging_bar_plt( .data, .x_axis, .y_axis, .fill_col, .plot_title = NULL, .plot_subtitle = NULL, .plot_caption = NULL, .interactive = FALSE )
.data |
The data to pass to the function, must be a tibble/data.frame. |
.x_axis |
The data that is passed to the x-axis. |
.y_axis |
The data that is passed to the y-axis. This will also equal the
parameter |
.fill_col |
The column that will be used to fill the color of the bars. |
.plot_title |
Default is NULL |
.plot_subtitle |
Default is NULL |
.plot_caption |
Default is NULL |
.interactive |
Default is FALSE. TRUE returns a plotly plot |
This function takes only a few arguments and returns a ggplot2 object.
A plotly plot or a ggplot2 static plot
Steven P. Sanderson II, MPH
Other Plotting Functions: diverging_lollipop_plt(), gartner_magic_chart_plt(), los_ra_index_plt(), ts_alos_plt(), ts_median_excess_plt(), ts_plt(), ts_readmit_rate_plt()
library(healthyR)
suppressPackageStartupMessages(library(ggplot2))

data("mtcars")
mtcars$car_name <- rownames(mtcars)
mtcars$mpg_z <- round((mtcars$mpg - mean(mtcars$mpg))/sd(mtcars$mpg), 2)
mtcars$mpg_type <- ifelse(mtcars$mpg_z < 0, "below", "above")
mtcars <- mtcars[order(mtcars$mpg_z), ] # sort
mtcars$car_name <- factor(mtcars$car_name, levels = mtcars$car_name)

diverging_bar_plt(
  .data = mtcars
  , .x_axis = car_name
  , .y_axis = mpg_z
  , .fill_col = mpg_type
  , .interactive = FALSE
)
This is a diverging lollipop function. A lollipop chart conveys the same information as a bar chart or a diverging bar chart, but with a more modern look. Instead of geom_bar(), geom_point() and geom_segment() are used to draw the lollipops. The example below uses the same data prepared in the diverging bars example.
diverging_lollipop_plt( .data, .x_axis, .y_axis, .plot_title = NULL, .plot_subtitle = NULL, .plot_caption = NULL, .interactive = FALSE )
.data |
The data to pass to the function, must be a tibble/data.frame. |
.x_axis |
The data that is passed to the x-axis. This will also be the
|
.y_axis |
The data that is passed to the y-axis. This will also equal the
parameters of |
.plot_title |
Default is NULL |
.plot_subtitle |
Default is NULL |
.plot_caption |
Default is NULL |
.interactive |
Default is FALSE. TRUE returns a plotly plot |
This function takes only a few arguments and returns a ggplot2 object.
A plotly plot or a ggplot2 static plot
Steven P. Sanderson II, MPH
Other Plotting Functions: diverging_bar_plt(), gartner_magic_chart_plt(), los_ra_index_plt(), ts_alos_plt(), ts_median_excess_plt(), ts_plt(), ts_readmit_rate_plt()
library(healthyR)
suppressPackageStartupMessages(library(ggplot2))

data("mtcars")
mtcars$car_name <- rownames(mtcars)
mtcars$mpg_z <- round((mtcars$mpg - mean(mtcars$mpg))/sd(mtcars$mpg), 2)
mtcars$mpg_type <- ifelse(mtcars$mpg_z < 0, "below", "above")
mtcars <- mtcars[order(mtcars$mpg_z), ] # sort
mtcars$car_name <- factor(mtcars$car_name, levels = mtcars$car_name)

diverging_lollipop_plt(
  .data = mtcars
  , .x_axis = car_name
  , .y_axis = mpg_z
)
Diagnosis to Condition Code Mapping file
data(dx_cc_mapping)
A data frame with 86852 rows and 5 variables
Other AHRQ: px_cc_mapping
Plot a Gartner Magic Chart of two continuous variables.
gartner_magic_chart_plt( .data, .x_col, .y_col, .point_size_col = NULL, .y_lab = "", .x_lab = "", .plot_title = "", .top_left_label = "", .top_right_label = "", .bottom_right_label = "", .bottom_left_label = "" )
.data |
The dataset you want to plot. |
.x_col |
The x-axis for the plot. |
.y_col |
The y-axis for the plot. |
.point_size_col |
The default is NULL. If you want to size the dots by a column in the data frame/tibble, enter the column name here. |
.y_lab |
The y-axis label (default: ""). |
.x_lab |
The x-axis label (default: ""). |
.plot_title |
The title of the plot (default: ""). |
.top_left_label |
The top left label (default: ""). |
.top_right_label |
The top right label (default: ""). |
.bottom_right_label |
The bottom right label (default: ""). |
.bottom_left_label |
The bottom left label (default: ""). |
A ggplot plot.
Steven P. Sanderson II, MPH
Other Plotting Functions: diverging_bar_plt(), diverging_lollipop_plt(), los_ra_index_plt(), ts_alos_plt(), ts_median_excess_plt(), ts_plt(), ts_readmit_rate_plt()
library(healthyR)
library(dplyr)
library(ggplot2)

data_tbl <- tibble(
  x = rnorm(100, 0, 1),
  y = rnorm(100, 0, 1),
  z = abs(x) + abs(y)
)

# Size the points by column z
gartner_magic_chart_plt(
  .data = data_tbl,
  .x_col = x,
  .y_col = y,
  .point_size_col = z,
  .x_lab = "los",
  .y_lab = "ra",
  .plot_title = "tst",
  .top_right_label = "High RA-LOS",
  .top_left_label = "High RA",
  .bottom_left_label = "Leader",
  .bottom_right_label = "High LOS"
)

# Same chart with uniformly sized points
gartner_magic_chart_plt(
  .data = data_tbl,
  .x_col = x,
  .y_col = y,
  .point_size_col = NULL,
  .x_lab = "los",
  .y_lab = "ra",
  .plot_title = "tst",
  .top_right_label = "High RA-LOS",
  .top_left_label = "High RA",
  .bottom_left_label = "Leader",
  .bottom_right_label = "High LOS"
)
8 Hex RGB color definitions suitable for charts for colorblind people.
hr_scale_color_colorblind(..., theme = "hr")
... |
Data passed in from a |
theme |
Right now this is |
This function is used in others in order to help render plots for those that are color blind.
A ggplot layer
Steven P. Sanderson II, MPH
Other Color Blind: color_blind(), hr_scale_fill_colorblind()
8 Hex RGB color definitions suitable for charts for colorblind people.
hr_scale_fill_colorblind(..., theme = "hr")
... |
Data passed in from a |
theme |
Right now this is |
This function is used in others in order to help render plots for those that are color blind.
A ggplot layer
Steven P. Sanderson II, MPH
Other Color Blind: color_blind(), hr_scale_color_colorblind()
Plot the index of the length of stay and readmit rate against each other along with the variance
los_ra_index_plt(.data)
.data |
The data supplied from |
Expects a tibble
Expects a Length of Stay and Readmit column, must be numeric
Uses cowplot to stack plots
A patchwork ggplot2 plot
Steven P. Sanderson II, MPH
Other Plotting Functions: diverging_bar_plt(), diverging_lollipop_plt(), gartner_magic_chart_plt(), ts_alos_plt(), ts_median_excess_plt(), ts_plt(), ts_readmit_rate_plt()
library(healthyR)
suppressPackageStartupMessages(library(dplyr))

data_tbl <- tibble(
  "alos" = runif(186, 1, 20)
  , "elos" = runif(186, 1, 17)
  , "readmit_rate" = runif(186, 0, .25)
  , "readmit_rate_bench" = runif(186, 0, .2)
)

los_ra_index_summary_tbl(
  .data = data_tbl
  , .max_los = 15
  , .alos_col = alos
  , .elos_col = elos
  , .readmit_rate = readmit_rate
  , .readmit_bench = readmit_rate_bench
) %>%
  los_ra_index_plt()

los_ra_index_summary_tbl(
  .data = data_tbl
  , .max_los = 10
  , .alos_col = alos
  , .elos_col = elos
  , .readmit_rate = readmit_rate
  , .readmit_bench = readmit_rate_bench
) %>%
  los_ra_index_plt()
Create the length of stay and readmit index summary tibble
los_ra_index_summary_tbl( .data, .max_los = 15, .alos_col, .elos_col, .readmit_rate, .readmit_bench )
.data |
The data you are going to analyze. |
.max_los |
You can give a maximum LOS value. Let's say you typically do not see LOS over 15 days; you would then set .max_los to 15, and all values greater than .max_los will be grouped to .max_los. |
.alos_col |
The Average Length of Stay column |
.elos_col |
The Expected Length of Stay column |
.readmit_rate |
The Actual Readmit Rate column |
.readmit_bench |
The Expected Readmit Rate column |
Expects a tibble
Expects the following columns and there should only be these 4
Length Of Stay Actual - Should be an integer
Length Of Stay Benchmark - Should be an integer
Readmit Rate Actual - Should be 0/1 for each record, 1 = readmitted, 0 = not readmitted.
Readmit Rate Benchmark - Should be a percentage from the benchmark file.
This will add a column called visits that will be the count of records per length of stay from 1 to .max_los
The .max_los param can be left blank and the function will default to 15. If this is not a good default and you don't know what it should be, then set it to the 75th percentile from the stats::quantile() function using the defaults, like so:
.max_los = stats::quantile(data_tbl$alos)[[4]]
Uses all data to compute variance. If you want it for a particular time frame, you will have to filter the data that goes into the .data argument. It is suggested to use timetk::filter_by_time().
The index is computed as the excess of the length of stay or readmit rates over their respective expectations.
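The .max_los advice above can be tried out in base R. The sketch below shows that the fourth element of the default stats::quantile() output is the 75th percentile and how values above a cap can be folded down to it. It illustrates the idea only and is not the package's code; `alos`, `max_los`, and `capped` are local names assumed for this sketch.

```r
set.seed(1)
alos <- runif(186, 1, 20)

# quantile() returns the 0%, 25%, 50%, 75% and 100% points by default,
# so element 4 is the 75th percentile
max_los <- stats::quantile(alos)[[4]]

# Fold every value above the cap down to the cap
capped <- pmin(alos, max_los)

range(capped)
```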
A tibble
Steven P. Sanderson II, MPH
Other Data Table Functions: category_counts_tbl(), named_item_list(), top_n_tbl(), ts_census_los_daily_tbl(), ts_signature_tbl()
library(healthyR)
suppressPackageStartupMessages(library(dplyr))

data_tbl <- tibble(
  "alos" = runif(186, 1, 20)
  , "elos" = runif(186, 1, 17)
  , "readmit_rate" = runif(186, 0, .25)
  , "readmit_bench" = runif(186, 0, .2)
)

los_ra_index_summary_tbl(
  .data = data_tbl
  , .max_los = 15
  , .alos_col = alos
  , .elos_col = elos
  , .readmit_rate = readmit_rate
  , .readmit_bench = readmit_bench
)

los_ra_index_summary_tbl(
  .data = data_tbl
  , .max_los = 10
  , .alos_col = alos
  , .elos_col = elos
  , .readmit_rate = readmit_rate
  , .readmit_bench = readmit_bench
)
Takes in a data.frame/tibble and creates a named list from a supplied grouping variable. Can be used in conjunction with save_to_excel() to create a new sheet for each group of data.
named_item_list(.data, .group_col)
.data |
The data.frame/tibble. |
.group_col |
The column that contains the groupings. |
Requires a data.frame/tibble and a grouping column.
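For intuition, a named list keyed by a grouping column can be produced in base R with split(). This is a sketch of the concept rather than the function's internals; the `visits` data frame and its values are made up for illustration.

```r
# Toy data, made up for this sketch
visits <- data.frame(
  service_line = c("Medical", "Surgical", "Medical"),
  los          = c(3, 5, 2)
)

# One list element per distinct service line, named after the group
by_line <- split(visits, visits$service_line)

names(by_line)
by_line[["Medical"]]
```

Each element is itself a data frame, which is the shape needed when every group should end up on its own worksheet.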
Steven P. Sanderson II, MPH
Other Data Table Functions: category_counts_tbl(), los_ra_index_summary_tbl(), top_n_tbl(), ts_census_los_daily_tbl(), ts_signature_tbl()
library(healthyR)
library(healthyR.data)

df <- healthyR_data
df_list <- named_item_list(.data = df, .group_col = service_line)
df_list
Gives the optimal binwidth for a histogram given a data set, its value column, and the desired number of bins.
opt_bin(.data, .value_col, .iters = 30)
.data |
The data set in question |
.value_col |
The column that holds the values |
.iters |
How many times the cost function loop should run |
Supply a data.frame/tibble with a value column. From this, an optimal binwidth will be computed for the number of bins desired.
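Since the function is credited as a modification of Hideaki Shimazaki's program, a minimal base R sketch of the Shimazaki-Shinomoto bin-selection idea may help: for each candidate number of bins, compute the cost (2k - v) / width^2, where k and v are the mean and the biased variance of the bin counts, and keep the candidate with the lowest cost. This is an assumption-laden illustration, not the code behind opt_bin(); `shimazaki_bins()`, `n_min`, and `n_max` are hypothetical names.

```r
# Hypothetical helper: choose a bin count by minimizing the
# Shimazaki-Shinomoto cost over a range of candidate bin counts.
shimazaki_bins <- function(x, n_min = 2L, n_max = 50L) {
  x_range    <- range(x)
  candidates <- n_min:n_max
  cost <- vapply(candidates, function(n_bins) {
    width  <- diff(x_range) / n_bins
    breaks <- seq(x_range[1], x_range[2], length.out = n_bins + 1L)
    counts <- as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))
    k <- mean(counts)              # mean count per bin
    v <- mean((counts - k)^2)      # biased variance of the counts
    (2 * k - v) / width^2
  }, numeric(1))
  best <- candidates[which.min(cost)]
  list(
    n_bins    = best,
    bin_width = diff(x_range) / best,
    breaks    = seq(x_range[1], x_range[2], length.out = best + 1L)
  )
}

set.seed(42)
fit <- shimazaki_bins(rnorm(1000))
fit$n_bins
fit$bin_width
```

The sketch scans each candidate once; the package function additionally takes an .iters argument, whose exact role is not reproduced here.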
A tibble of histogram breakpoints
Steven P. Sanderson II, MPH
Modified from Hideaki Shimazaki, Department of Physics, Kyoto University (shimazaki at ton.scphys.kyoto-u.ac.jp). Feel free to modify/distribute this program.
Other Utilities: save_to_excel(), sql_left(), sql_mid(), sql_right()
library(healthyR)
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(dplyr))

df_tbl <- rnorm(n = 1000, mean = 0, sd = 1)
df_tbl <- df_tbl %>%
  as_tibble() %>%
  set_names("value")

df_tbl %>%
  opt_bin(
    .value_col = value
    , .iters = 100
  )
Procedure to Condition Code Mapping file
data(px_cc_mapping)
A data frame with 79721 rows and 5 variables
Other AHRQ: dx_cc_mapping
Save a tibble/data.frame to an Excel .xlsx file. The file will automatically be saved with a save_dtime in the format of 20201109_132416 for November 9th, 2020 at 1:24:16 PM.
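The save_dtime format described above corresponds to the strftime pattern "%Y%m%d_%H%M%S". The sketch below builds such a stamp in base R; where exactly the package places the stamp within the file name is not stated here, so the layout produced by the hypothetical `stamped_file_name()` helper is an assumption.

```r
# Hypothetical helper: append a date-time stamp to a file name.
stamped_file_name <- function(file_name, when = Sys.time()) {
  stamp <- format(when, "%Y%m%d_%H%M%S")
  paste0(file_name, "_", stamp, ".xlsx")
}

# A fixed time is used so the result is reproducible
example_time <- as.POSIXct("2020-11-09 13:24:16", tz = "UTC")
stamped <- stamped_file_name("census", example_time)
stamped
```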
save_to_excel(.data, .file_name)
.data |
The tibble/data.frame that you want to save as an |
.file_name |
The name you want to give to the file. |
Requires a tibble/data.frame to be passed to it.
A saved excel file
Steven P. Sanderson II, MPH
Other Utilities: opt_bin(), sql_left(), sql_mid(), sql_right()
Takes a few arguments from a data.frame/tibble and returns a service line augmented to a data.frame/tibble for a set of patients.
service_line_augment(.data, .dx_col, .px_col, .drg_col)
.data |
The data being passed that will be augmented by the function. |
.dx_col |
The column containing the Principal Diagnosis for the discharge. |
.px_col |
The column containing the Principal Coded Procedure for the discharge. It is possible that this could be blank. |
.drg_col |
The DRG Number coded to the inpatient discharge. |
This is an augment function in that it appends a vector to a data.frame/tibble that is passed to the .data parameter. A data.frame/tibble is required, along with a principal diagnosis column, a principal procedure column, and a column for the DRG number. These are needed so that the function can join the dx_cc_mapping and px_cc_mapping columns to provide the service line. This function only works on visits that are coded using ICD Version 10.
Let's take an example discharge: the DRG is 896 and the Principal Diagnosis code maps to DX_660, so this visit would get grouped to alcohol_abuse.
DRG 896: ALCOHOL, DRUG ABUSE OR DEPENDENCE WITHOUT REHABILITATION THERAPY WITH MAJOR COMPLICATION OR COMORBIDITY (MCC)
DX_660 maps to the following ICD-10 codes, e.g. F1010 Alcohol abuse, uncomplicated:
library(healthyR)
library(dplyr)
dx_cc_mapping %>%
  filter(CC_Code == "DX_660", ICD_Ver_Flag == "10")
An augmented data.frame/tibble with the service line appended as a new column.
Steven P. Sanderson II, MPH
df <- data.frame(
  dx_col = "F10.10",
  px_col = NA,
  drg_col = "896"
)

service_line_augment(
  .data = df,
  .dx_col = dx_col,
  .px_col = px_col,
  .drg_col = drg_col
)
Takes a few arguments from a data.frame/tibble and returns a service line vector for a set of patients.
service_line_vec(.data, .dx_col, .px_col, .drg_col)
.data |
The data being passed that will be augmented by the function. |
.dx_col |
The column containing the Principal Diagnosis for the discharge. |
.px_col |
The column containing the Principal Coded Procedure for the discharge. It is possible that this could be blank. |
.drg_col |
The DRG Number coded to the inpatient discharge. |
This is a vectorized function in that it returns a vector. It can be applied inside of a mutate statement when using dplyr if desired. A data.frame/tibble is required, along with a principal diagnosis column, a principal procedure column, and a column for the DRG number. These are needed so that the function can join the dx_cc_mapping and px_cc_mapping columns to provide the service line. This function only works on visits that are coded using ICD Version 10.
Let's take an example discharge: the DRG is 896 and the Principal Diagnosis code maps to DX_660, so this visit would get grouped to alcohol_abuse.
DRG 896: ALCOHOL, DRUG ABUSE OR DEPENDENCE WITHOUT REHABILITATION THERAPY WITH MAJOR COMPLICATION OR COMORBIDITY (MCC)
DX_660 maps to the following ICD-10 codes, e.g. F1010 Alcohol abuse, uncomplicated:
library(healthyR)
library(dplyr)
dx_cc_mapping %>%
  filter(CC_Code == "DX_660", ICD_Ver_Flag == "10")
A vector of service line assignments.
Steven P. Sanderson II, MPH
df <- data.frame(
  dx_col = "F10.10",
  px_col = NA,
  drg_col = "896"
)

service_line_vec(
  .data = df,
  .dx_col = dx_col,
  .px_col = px_col,
  .drg_col = drg_col
)
Perform an SQL LEFT() type function on a piece of text
sql_left(.text, .num_char)
.text |
A piece of text/string to be manipulated |
.num_char |
How many characters do you want to grab |
You must supply data that you want to manipulate.
Steven P. Sanderson II, MPH
Other Utilities: opt_bin(), save_to_excel(), sql_mid(), sql_right()
sql_left("text", 3)
Perform an SQL SUBSTRING type function
sql_mid(.text, .start_num, .num_char)
.text |
A piece of text/string to be manipulated |
.start_num |
What place to start at |
.num_char |
How many characters do you want to grab |
You must supply data that you want to manipulate.
Steven P. Sanderson II, MPH
Other Utilities: opt_bin(), save_to_excel(), sql_left(), sql_right()
sql_mid("this is some text", 6, 2)
Perform an SQL RIGHT type function
sql_right(.text, .num_char)
.text |
A piece of text/string to be manipulated |
.num_char |
How many characters do you want to grab |
You must supply data that you want to manipulate.
Steven P. Sanderson II, MPH
Other Utilities: opt_bin(), save_to_excel(), sql_left(), sql_mid()
sql_right("this is some more text", 3)
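For readers unfamiliar with the SQL LEFT, SUBSTRING, and RIGHT string functions, the base R sketch below shows comparable operations built on substr(). These are hypothetical stand-ins (`left_chars()`, `mid_chars()`, `right_chars()`) that assume 1-based positions as in SQL; they are not the package's implementations, and the package functions may differ in details.

```r
# Hypothetical base R equivalents of SQL LEFT / SUBSTRING / RIGHT
left_chars  <- function(text, n) substr(text, 1, n)
mid_chars   <- function(text, start, n) substr(text, start, start + n - 1)
right_chars <- function(text, n) {
  len <- nchar(text)
  substr(text, len - n + 1, len)
}

left_chars("text", 3)
mid_chars("this is some text", 6, 2)
right_chars("this is some more text", 3)
```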
Get a tibble returned with n records sorted either by descending order (default) or ascending order.
top_n_tbl(.data, .n_records, .arrange_value = TRUE, ...)
.data |
The data you want to pass to the function |
.n_records |
How many records you want returned |
.arrange_value |
A boolean with TRUE as the default. TRUE sorts data in descending order |
... |
The columns you want to pass to the function. |
Requires a data.frame/tibble
Requires at least one column to be chosen inside of the ...
Will return the tibble in sorted order that is chosen with descending as the default
Steven P. Sanderson II, MPH
Other Data Table Functions: category_counts_tbl(), los_ra_index_summary_tbl(), named_item_list(), ts_census_los_daily_tbl(), ts_signature_tbl()
library(healthyR)
library(healthyR.data)

df <- healthyR_data

df_tbl <- top_n_tbl(
  .data = df
  , .n_records = 3
  , .arrange_value = TRUE
  , service_line
  , payer_grouping
)

print(df_tbl)
Plot ALOS - Average Length of Stay
ts_alos_plt(.data, .date_col, .value_col, .by_grouping, .interactive)
.data |
The time series data you need to pass |
.date_col |
The date column |
.value_col |
The value column |
.by_grouping |
How you want the data summarized - "sec", "min", "hour", "day", "week", "month", "quarter" or "year" |
.interactive |
TRUE or FALSE. TRUE returns a |
Expects a tibble with a date time column and a value column
Uses timetk for underlying summarization and plot
If .by_grouping is missing it will default to "day"
A static ggplot2 object is returned if the .interactive argument is FALSE; otherwise a plotly plot is returned.
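The summarization step behind a grouping such as "month" amounts to averaging the value column within each period. The base R sketch below illustrates that for monthly grouping only; it does not use timetk and is not the function's own code. `alos_by_month()` and the toy vectors are hypothetical.

```r
# Hypothetical helper: average length of stay per calendar month.
alos_by_month <- function(dates, los) {
  month <- format(dates, "%Y-%m")
  out <- aggregate(los, by = list(month = month), FUN = mean, na.rm = TRUE)
  names(out)[2] <- "alos"
  out
}

# Toy data, made up for this sketch
dates <- as.Date(c("2020-01-01", "2020-01-15", "2020-02-01"))
los   <- c(2, 4, 6)

monthly <- alos_by_month(dates, los)
monthly
```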
A timetk time series plot
Steven P. Sanderson II, MPH
Other Plotting Functions: diverging_bar_plt(), diverging_lollipop_plt(), gartner_magic_chart_plt(), los_ra_index_plt(), ts_median_excess_plt(), ts_plt(), ts_readmit_rate_plt()
library(healthyR)
library(healthyR.data)
library(timetk)
library(dplyr)
library(purrr)

# Make A Series of Dates ----
data_tbl <- healthyR_data

df_tbl <- data_tbl %>%
  filter(ip_op_flag == "I") %>%
  select(visit_end_date_time, length_of_stay) %>%
  summarise_by_time(
    .date_var = visit_end_date_time
    , .by = "day"
    , visits = mean(length_of_stay, na.rm = TRUE)
  ) %>%
  filter_by_time(
    .date_var = visit_end_date_time
    , .start_date = "2012"
    , .end_date = "2019"
  ) %>%
  set_names("Date","Values")

ts_alos_plt(
  .data = df_tbl
  , .date_col = Date
  , .value_col = Values
  , .by_grouping = "month"
  , .interactive = FALSE
)
Sometimes it is important to know what the census was on any given day, or what the average length of stay is on a given day, including for those patients that are not yet discharged. This can be easily achieved. This will return one record for every account, so the data will still need to be summarized. If there are multiple entries per day then those records will show up and you will therefore have multiple entries in the column date in the resulting tibble. If you want to aggregate from there you should be able to do so easily.
If you have a record where the .start_date_col is filled in but the corresponding end_date is null, then the end date will be set equal to Sys.Date().
If a record has a start_date that is NA then it will be discarded.
This function can take a little bit of time to run while the join comparison runs.
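The rules above (open stays end today, records without a start date are dropped) can be sketched in base R as a per-day head count. This illustrates the logic only and returns one row per day rather than one record per account, so it is not a substitute for the function; `daily_census()` and its `today` argument are hypothetical.

```r
# Hypothetical helper: number of patients in house on each day.
daily_census <- function(start, end, today = Sys.Date()) {
  keep  <- !is.na(start)          # discard records with no start date
  start <- start[keep]
  end   <- end[keep]
  end[is.na(end)] <- today        # still admitted: treat as ending today
  days <- seq(min(start), max(end), by = "day")
  census <- vapply(
    seq_along(days),
    function(i) sum(start <= days[i] & end >= days[i]),
    integer(1)
  )
  data.frame(date = days, census = census)
}

# Toy data: one closed stay, one open stay, one record with no start date
start <- as.Date(c("2024-01-01", "2024-01-02", NA))
end   <- as.Date(c("2024-01-03", NA, "2024-01-05"))

census_tbl <- daily_census(start, end, today = as.Date("2024-01-04"))
census_tbl
```

Passing `today` explicitly keeps the toy result reproducible; by default the sketch falls back to Sys.Date(), mirroring the behaviour described above.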
ts_census_los_daily_tbl( .data, .keep_nulls_only = FALSE, .start_date_col, .end_date_col, .by_time = "day" )
.data |
The data you want to pass to the function |
.keep_nulls_only |
A boolean that will keep only those records that have a NULL end date, meaning the patient is still admitted. The default is FALSE which brings back all records. |
.start_date_col |
The column containing the start date for the record |
.end_date_col |
The column containing the end date for the record. |
.by_time |
How you want the data presented, defaults to day and should remain that way unless you need more granular data. |
Requires a dataset that has at least a start date column and an end date column
Takes a single boolean parameter
A tibble object
Steven P. Sanderson II, MPH
Other Data Table Functions: category_counts_tbl(), los_ra_index_summary_tbl(), named_item_list(), top_n_tbl(), ts_signature_tbl()
library(healthyR)
library(healthyR.data)
library(dplyr)

df <- healthyR_data

df_tbl <- df %>%
  filter(ip_op_flag == "I") %>%
  select(visit_start_date_time, visit_end_date_time) %>%
  timetk::filter_by_time(.date_var = visit_start_date_time, .start_date = "2020")

ts_census_los_daily_tbl(
  .data = df_tbl
  , .keep_nulls_only = FALSE
  , .start_date_col = visit_start_date_time
  , .end_date_col = visit_end_date_time
)
Plot out the excess +/- of the median value grouped by certain time parameters.
ts_median_excess_plt( .data, .date_col, .value_col, .x_axis, .ggplot_group_var, .years_back )
.data |
The data that is being analyzed, data must be a tibble/data.frame. |
.date_col |
The column of the tibble that holds the date. |
.value_col |
The column that holds the value of interest. |
.x_axis |
What is to be the x-axis: day, week, etc. |
.ggplot_group_var |
The variable to group the ggplot on. |
.years_back |
How many years back you want to go in order to compute the median value. |
Supply data that you want to view and you will see the excess +/- of the median values over a specified time series tibble.
A ggplot2 plot
Other Plotting Functions: diverging_bar_plt(), diverging_lollipop_plt(), gartner_magic_chart_plt(), los_ra_index_plt(), ts_alos_plt(), ts_plt(), ts_readmit_rate_plt()
library(healthyR)
suppressPackageStartupMessages(library(timetk))

ts_signature_tbl(
  .data = m4_daily
  , .date_col = date
) %>%
  ts_median_excess_plt(
    .date_col = date
    , .value_col = value
    , .x_axis = month
    , .ggplot_group_var = year
    , .years_back = 1
  )
This is a wrapper function to the timetk::plot_time_series() function with a limited functionality parameter set. To see the full reference please visit the timetk package site.
ts_plt( .data, .date_col, .value_col, .color_col = NULL, .facet_col = NULL, .facet_ncol = NULL, .interactive = FALSE )
.data |
The data to pass to the function, must be a tibble/data.frame. |
.date_col |
The column holding the date. |
.value_col |
The column holding the value. |
.color_col |
The column holding the variable for color. |
.facet_col |
The column holding the variable for faceting. |
.facet_ncol |
How many columns do you want. |
.interactive |
Return a |
This function exposes only a few of the arguments of timetk::plot_time_series(), presets some of the others, and leaves the rest at their defaults. The smoother functionality is turned off.
A plotly plot or a ggplot2 static plot
Steven P. Sanderson II, MPH
https://business-science.github.io/timetk/reference/plot_time_series.html
Other Plotting Functions:
diverging_bar_plt()
,
diverging_lollipop_plt()
,
gartner_magic_chart_plt()
,
los_ra_index_plt()
,
ts_alos_plt()
,
ts_median_excess_plt()
,
ts_readmit_rate_plt()
suppressPackageStartupMessages(library(dplyr))
library(timetk)
library(healthyR.data)

healthyR.data::healthyR_data %>%
  filter(ip_op_flag == "I") %>%
  select(visit_end_date_time, service_line) %>%
  filter_by_time(
    .date_var = visit_end_date_time
    , .start_date = "2020"
  ) %>%
  group_by(service_line) %>%
  summarize_by_time(
    .date_var = visit_end_date_time
    , .by = "month"
    , visits = n()
  ) %>%
  ungroup() %>%
  ts_plt(
    .date_col = visit_end_date_time
    , .value_col = visits
    , .color_col = service_line
  )
Plot Readmit Rate
ts_readmit_rate_plt(.data, .date_col, .value_col, .by_grouping, .interactive)
.data |
The data you need to pass. |
.date_col |
The date column. |
.value_col |
The value column. |
.by_grouping |
How you want the data summarized - "sec", "min", "hour", "day", "week", "month", "quarter" or "year". |
.interactive |
TRUE or FALSE. TRUE returns a |
Expects a tibble with a date time column and a value column
Uses timetk
for the underlying summarization and plotting
If .by_grouping is missing it will default to "day"
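To make the period summarization concrete, here is a minimal base R sketch. The package itself relies on timetk for this step; base R's `cut()` and `aggregate()` are swapped in here purely for illustration. The helper name `summarise_by`, the toy data, and the use of the mean as the aggregate are assumptions, not the package's API. Note that `cut()` on dates accepts "day", "week", "month", "quarter" and "year", a narrower set than the one listed for `.by_grouping`.

```r
# Hypothetical daily data: values 1, 2, 3, ... for Jan-Mar 2019
dates <- seq(as.Date("2019-01-01"), as.Date("2019-03-31"), by = "day")
df    <- data.frame(date = dates, value = seq_along(dates))

# Hypothetical helper: `by` defaults to "day", mirroring the documented default
summarise_by <- function(data, by = "day") {
  # cut() labels each date with the first day of its period
  period <- as.Date(cut(data$date, breaks = by))
  aggregate(
    value ~ period,
    data = data.frame(period = period, value = data$value),
    FUN  = mean   # assumed aggregate; the package may use a different one
  )
}

monthly <- summarise_by(df, by = "month")
daily   <- summarise_by(df)   # falls back to "day"
monthly
```

With the default, every date is its own period, so `daily` has one row per input row, while `monthly` collapses the same data to three rows.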
A timetk
time series plot that is interactive
Steven P. Sanderson II, MPH
Other Plotting Functions:
diverging_bar_plt()
,
diverging_lollipop_plt()
,
gartner_magic_chart_plt()
,
los_ra_index_plt()
,
ts_alos_plt()
,
ts_median_excess_plt()
,
ts_plt()
set.seed(123)

suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(dplyr))

ts_tbl <- tk_make_timeseries(
  start = "2019-01-01"
  , by = "day"
  , length_out = "1 year 6 months"
)
values <- arima.sim(
  model = list(
    order = c(0, 1, 0))
  , n = 547
  , mean = 1
  , sd = 5
)

df_tbl <- tibble(
  x = ts_tbl
  , y = values
) %>%
  set_names("Date","Values")

ts_readmit_rate_plt(
  .data = df_tbl
  , .date_col = Date
  , .value_col = Values
  , .by_grouping = "month"
  , .interactive = FALSE
)
Returns a tibble that adds the time series signature from the timetk::tk_augment_timeseries_signature() function. The signature columns are all derived from the date column chosen with the .date_col parameter.
ts_signature_tbl(.data, .date_col, .pad_time = TRUE, ...)
.data |
The data that is being analyzed. |
.date_col |
The column that holds the date. |
.pad_time |
Boolean TRUE/FALSE. If TRUE then the |
... |
Grouping variables to be used by |
Supply data with a date column and this will add the year, month, week, week day and hour to the tibble. The original date column is kept.
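As a rough illustration of what such a signature contains, the following base R sketch derives the year, month, week, weekday and hour from a date-time column. It does not call timetk and does not reproduce the exact columns or names that timetk::tk_augment_timeseries_signature() adds; the toy data, the column names, and the use of ISO 8601 week and weekday numbering are assumptions.

```r
# Hypothetical data with a single date-time column (UTC to avoid locale effects)
df <- data.frame(
  date_time = as.POSIXct(
    c("2020-01-06 08:30:00", "2020-07-15 17:45:00"),
    tz = "UTC"
  )
)

# Derive calendar features; the original column is kept
df$year  <- as.integer(format(df$date_time, "%Y"))
df$month <- as.integer(format(df$date_time, "%m"))
df$week  <- as.integer(format(df$date_time, "%V"))  # ISO 8601 week of year
df$wday  <- as.integer(format(df$date_time, "%u"))  # 1 = Monday ... 7 = Sunday
df$hour  <- as.integer(format(df$date_time, "%H"))

df
```

Columns like these can then be used as grouping or axis variables in later summaries and plots.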
Returns a time-series signature tibble.
You must know the data going into the function and whether certain columns should be dropped or kept before passing the result to further functions.
A tibble
Steven P. Sanderson II, MPH
Other Data Table Functions:
category_counts_tbl()
,
los_ra_index_summary_tbl()
,
named_item_list()
,
top_n_tbl()
,
ts_census_los_daily_tbl()
library(timetk)

ts_signature_tbl(
  .data = m4_daily
  , .date_col = date
  , .pad_time = TRUE
  , id
)