---
title: "Getting Started with healthyR.ai"
subtitle: "A Quick Introduction"
author: "Steven P. Sanderson II, MPH"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Getting Started with healthyR.ai}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

First of all, thank you for using `healthyR.ai`. If you encounter issues or want
to make a feature request, please visit <https://github.com/spsanderson/healthyR.ai/issues>.

```{r setup}
library(healthyR.ai)
```

In this short example we will showcase the `pca_your_recipe()` function. This
function takes only a few arguments: `.data`, the full data set that is passed
internally to `recipes::bake()`; `.recipe_object`, a recipe you have already made
and want to pass to the function in order to perform the PCA; and `.threshold`,
the fraction of the total variance that should be captured by the components.
The call later in this vignette also sets `.top_n`, which limits how many
variable loadings appear in the top-n loadings plot.
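
To make the idea behind `.threshold` concrete, here is a minimal base R sketch
using `stats::prcomp()` on the built-in `mtcars` data. It is only an
illustration of what a variance threshold means and is not the internal code of
`pca_your_recipe()`. The object names (`thr_fit`, `cum_var`, `n_components`)
are made up for this example.

```{r threshold_sketch}
# Base R sketch of a variance threshold; not healthyR.ai's internal code.
# mtcars ships with R and is used here only for illustration.
thr_fit <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each component
var_explained <- thr_fit$sdev^2 / sum(thr_fit$sdev^2)
cum_var       <- cumsum(var_explained)

# Smallest number of components whose cumulative share reaches the threshold
threshold    <- 0.8
n_components <- which(cum_var >= threshold)[1]

round(cum_var, 3)
n_components
```

A higher threshold keeps more components, and a lower one keeps fewer.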

To start this walkthrough we will first load a few libraries.

# Libraries

```{r lib_load, warning=FALSE, message=FALSE}
library(timetk)
library(dplyr)
library(purrr)
library(healthyR.data)
library(rsample)
library(recipes)
library(ggplot2)
library(plotly)
```

# Data

Now that we have our libraries loaded, we can get our data set ready.

## Data Set

```{r data_set}
data_tbl <- healthyR_data %>%
    select(visit_end_date_time) %>%
    summarise_by_time(
        .date_var = visit_end_date_time,
        .by       = "month",
        value     = n()
    ) %>%
    set_names("date_col","value") %>%
    filter_by_time(
        .date_var = date_col,
        .start_date = "2013",
        .end_date = "2020"
    ) %>%
    mutate(date_col = as.Date(date_col))

head(data_tbl)
```

The data set is simple and by itself would not be at all useful for PCA, since
there is only one predictor: time. To facilitate the use of the function in this
example, we will create a `splits` object and a `recipe` object.

## Splits

```{r splits}
splits <- initial_split(data = data_tbl, prop = 0.8)

splits

head(training(splits))
```

## Initial Recipe

```{r initial_rec_obj}
rec_obj <- recipe(value ~ ., training(splits)) %>%
    step_timeseries_signature(date_col) %>%
    step_rm(matches("(iso$)|(xts$)|(hour)|(min)|(sec)|(am.pm)"))

rec_obj

get_juiced_data(rec_obj) %>% glimpse()
```

Now that we have our initial recipe, we can use the `pca_your_recipe()` function.

```{r pca_your_rec}
pca_list <- pca_your_recipe(
  .recipe_object = rec_obj,
  .data          = data_tbl,
  .threshold     = 0.8,
  .top_n         = 5
)
```

# Inspect PCA Output

The function returns a list object invisibly, so you must assign the output to a
variable. You can then access the items of the list in the usual manner.
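
As a small sketch of what an invisible return means in practice, the
hypothetical helper below (`make_summary()`, which is not part of
`healthyR.ai`) returns a list with `invisible()`. Calling it on its own prints
nothing, while assigning the result lets you reach each element with `$` or
`[[`.

```{r invisible_sketch}
# Hypothetical helper, for illustration only; it is not part of healthyR.ai.
make_summary <- function(x) {
  out <- list(
    mean_value = mean(x),
    sd_value   = sd(x)
  )
  # invisible() suppresses auto-printing of the returned value
  invisible(out)
}

# Nothing is printed here because the result is returned invisibly
make_summary(c(2, 4, 6))

# Assign the result, then access elements with `$` or `[[`
res <- make_summary(c(2, 4, 6))
res$mean_value
res[["sd_value"]]
```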

The following items are included in the output of the function:

  1. pca_transform - This is the PCA recipe.
  2. variable_loadings
  3. variable_variance
  4. pca_estimates
  5. pca_juiced_estimates
  6. pca_baked_data
  7. pca_variance_df
  8. pca_variance_scree_plt
  9. pca_rotation_df
  10. pca_loadings_plt
  11. pca_top_n_loadings_plt

Let's go down the list of items.

## PCA Transform

This is the item you will want to save to a variable, as it is the recipe object
itself that you will use further down the line in your work.

```{r pca_transform}
pca_rec_obj <- pca_list$pca_transform

pca_rec_obj
```
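
To give a feel for how a PCA recipe is typically used downstream, here is a
hedged sketch built directly with `recipes` on the built-in `mtcars` data. The
steps chosen here (`step_normalize()` followed by `step_pca()`) are an
assumption made for illustration and may differ from the steps that
`pca_your_recipe()` adds. All object names are made up for this example.

```{r pca_recipe_sketch}
library(recipes)

# Illustrative recipe on mtcars; these steps are assumptions, not the
# exact steps that pca_your_recipe() adds.
sketch_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), threshold = 0.8)

# prep() estimates the normalization and the PCA from the training data
sketch_prep <- prep(sketch_rec, training = mtcars)

# bake() applies the estimated transformation to new rows
sketch_baked <- bake(sketch_prep, new_data = head(mtcars))

names(sketch_baked)
```

The baked data keeps the outcome column and replaces the numeric predictors with
principal component columns.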

## Variable Loadings

```{r var_loadings}
pca_list$variable_loadings
```

## Variable Variance

```{r var_variance}
pca_list$variable_variance
```

## PCA Estimates

```{r pca_estimates}
pca_list$pca_estimates
```

## Juiced and Baked Data

```{r juice_bake}
pca_list$pca_juiced_estimates %>% glimpse()

pca_list$pca_baked_data %>% glimpse()
```

## Rotation Data

```{r rotation_df}
pca_list$pca_rotation_df %>% glimpse()
```
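
For readers who want to see what a rotation matrix represents, the base R sketch
below uses `stats::prcomp()` on `mtcars`. It assumes that "rotation" carries the
same meaning here as the `rotation` element of `prcomp()`, that is, one row per
original variable and one column per principal component. The object names are
made up for this example.

```{r rotation_sketch}
# Base R sketch; assumes "rotation" has the same meaning as in prcomp().
rot_fit <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Rows are original variables, columns are principal components
rotation <- rot_fit$rotation
dim(rotation)

# Scores are the centered and scaled data multiplied by the rotation matrix
scores_manual <- scale(mtcars) %*% rotation
all.equal(unname(scores_manual), unname(rot_fit$x))

# Rotation columns are orthonormal, so t(R) %*% R is the identity matrix
all.equal(unname(crossprod(rotation)), diag(ncol(rotation)))
```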

## Variance and Scree Plot

```{r var_df}
pca_list$pca_variance_df %>% glimpse()
```

```{r scree_plt, fig.width=8, fig.height=8}
pca_list$pca_variance_scree_plt
```

## Variable Loading Plots

```{r loading_plots}
pca_list$pca_loadings_plt

pca_list$pca_top_n_loadings_plt
```