Understanding the relationships between stock prices and economic indicators is critical for developing robust investment strategies and for gaining macroeconomic insight into portfolio trends. This post describes a framework for analyzing these relationships using several statistical and machine learning techniques. Each approach has its own strengths and offers complementary insights into the dynamics of stocks and economic variables. Because such relationships often show up with a delay and may evolve over time, accounting for time lags is crucial when looking for structural patterns between stocks and economic time series. In the following, some key methods are outlined and a possible implementation is illustrated using example scripts in R:

Lagged Correlation:
Compute correlations at different time lags to identify delayed relationships.
Granger Causality:
Use lagged economic variables to test predictive relationships.
Lagged Regression:
Include lagged versions of economic variables as predictors in regression models.
Cross-Correlation Function (CCF):
Identify lagged relationships by analyzing cross-correlations.
Distributed Lag Models:
Estimate the cumulative effect of lagged predictors.
Vector Autoregression (VAR):
Include lags to model interdependencies over time.

Notes for lagged time-series:
   Data Alignment: Ensure lagged variables align properly with stock data (e.g., handle NAs).
   Lag Selection: Use statistical criteria like AIC or BIC to determine the optimal lag.
   Interpretation: Visualize results to understand time-lagged dependencies.
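
The lag-selection note can be illustrated with a minimal base-R sketch. It uses simulated data (an assumption of this example, not the data from this post) in which y reacts to x two days later, and compares AIC and BIC across candidate lag lengths. The names max_lag_demo, x_lags and criteria are hypothetical.

```r
# Simulated toy data: y depends on x two days earlier (assumption of this sketch)
set.seed(42)
n <- 300
x <- rnorm(n)
y <- c(rnorm(2), 0.8 * x[1:(n - 2)] + rnorm(n - 2, sd = 0.5))

max_lag_demo <- 5  # hypothetical upper bound for the lag search

# embed() places x_t, x_{t-1}, ..., x_{t-max_lag_demo} side by side, so every
# candidate model is fitted on the same rows and the criteria are comparable
x_lags <- embed(x, max_lag_demo + 1)
y_cut  <- y[(max_lag_demo + 1):n]

# Fit one regression per candidate lag length p (columns 2..p+1 are lags 1..p)
fits <- lapply(1:max_lag_demo, function(p) {
  lm(y_cut ~ x_lags[, 2:(p + 1), drop = FALSE])
})

criteria <- data.frame(
  lags = 1:max_lag_demo,
  AIC  = sapply(fits, AIC),
  BIC  = sapply(fits, BIC)
)
best_lag_aic <- criteria$lags[which.min(criteria$AIC)]
best_lag_bic <- criteria$lags[which.min(criteria$BIC)]
print(criteria)
```

BIC penalizes extra lags more heavily than AIC, so the two criteria can disagree; in that case the smaller model is usually the more cautious choice for short samples.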

1 Collect data

First, we need to pull the factors we want to test from our database. These are based on FRED’s economic time series and Yahoo’s currency time series.
Second, we pick some sample stocks that vary widely by sector and are likely to be found in almost everyone’s portfolio.

eco_factors <- c('DFII10','DFII5','VIXCLS','T10Y2Y','T10Y3M','EFFR','DCOILWTICO','DCOILBRENTEU','DFEDTARL','BAMLC0A0CM','BAMLH0A0HYM2',
                 'RIFSPPNA2P2D90NB','DTB3','DGS1MO','DGS3MO','DGS6MO','DGS1','DGS2','DGS3','DGS7','DGS20','DGS30','DGS5','DGS10',
                 'EURAUD=X','EURCHF=X','EURCNY=X','EURHKD=X','EURJPY=X','EURUSD=X')

stocks <- c("aapl","msft","de","x9984_t","dte_de","jpm","ko","wmt") # each ticker listed once

base_data <- aikiaTrade::get_any_history(c(stocks,
                                           eco_factors),
                                         start_date = '2024-09-03',
                                         end_date = '2024-11-29',
                                         complete_cases = T,
                                         exact = T,
                                         as.xts.return = T)

# Split the combined xts object into economic factors and stocks
economic_data <- base_data[,eco_factors] 
stock_df <- base_data[,stocks]

To calculate the time-lagged series, we set up a helper function.

lag_variables_xts <- function(data, max_lag) {
  lagged_list <- lapply(0:max_lag, function(lag) xts::lag.xts(data, k = lag, na.pad = TRUE))
  lagged_xts <- do.call(cbind, lagged_list)
  colnames(lagged_xts) <- stringr::str_replace_all(colnames(lagged_xts),"\\.","_lag")
  return(lagged_xts)
}
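
To see what lagging with NA padding does to a series, here is a minimal base-R sketch on a toy vector. It does not use xts; lag_vec, max_lag_demo and the values are hypothetical and only mirror the idea of the helper above.

```r
# Toy series with a Date index (hypothetical values)
dates  <- as.Date("2024-09-02") + 0:5
values <- c(10, 11, 12, 13, 14, 15)

# Shift a vector down by k positions and pad the start with NA
lag_vec <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA, k), head(x, -k))
}

max_lag_demo <- 2
lagged <- sapply(0:max_lag_demo, function(k) lag_vec(values, k))
colnames(lagged) <- c("x", paste0("x_lag", 1:max_lag_demo))
lagged_df <- data.frame(date = dates, lagged)

# Rows that still contain padding NAs are dropped, as with na.omit() later on
clean_df <- na.omit(lagged_df)
print(clean_df)
```

Each additional lag costs one observation at the start of the sample, which matters when the history is short.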

Since we want to reuse this script in different sessions and phases, we keep its settings flexible and define the maximum time lag as a separate input variable.

max_lag <- 5

Now, the time-lagged series can be created and then cleaned up. We make sure that all relevant xts structures are identical and that no NAs remain.

economic_lagged <- lag_variables_xts(economic_data, max_lag)
economic_lagged <- na.omit(economic_lagged)
economic_lagged <- apply(economic_lagged, 2, function(x) ifelse(is.finite(x), x, 0)) |> xts::as.xts() # replace non-finite values with 0; apply() returns a plain matrix, so convert back to xts
economic_data <- apply(economic_data, 2, function(x) ifelse(is.finite(x), x, 0)) |> xts::as.xts()

# Extract the dates (indexes) from the economic xts and keep only congruent dates in all data frames.
common_dates <- zoo::index(stock_df) %in% zoo::index(economic_lagged)
stock_df <- stock_df[common_dates]

str(economic_lagged)
str(stock_df)

2 Multiple Approaches

With the time series prepared, we can now turn to the various approaches for examining structural connections between stock returns and their potential underlying factors.

2.1 Lagged Correlation Analysis

Description:
Lagged correlation measures the strength and direction of relationships between economic indicators and stock prices at different time lags.

cor_results_lagged <- cor(economic_lagged, stock_df, use = "pairwise.complete.obs")
top_bottom_correlations <- do.call(rbind, lapply(1:ncol(cor_results_lagged), function(i) {
  cor_values <- cor_results_lagged[, i] # Use actual values for ranking
  sorted_indices <- order(cor_values, decreasing = TRUE) # Sort descending
  top_5 <- sorted_indices[1:5] # Top 5 positive
  bottom_5 <- sorted_indices[(length(sorted_indices) - 4):length(sorted_indices)] # Bottom 5 negative
  tibble::tibble(Stock = colnames(stock_df)[i],
             Economic_Variable = c(rownames(cor_results_lagged)[top_5], rownames(cor_results_lagged)[bottom_5]),
             Correlation = c(cor_values[top_5], cor_values[bottom_5]))
}))

Output:

Correlation values for each economic variable and stock pair at different lags.
The five strongest positive and the five strongest negative correlations per stock, since persistently negative correlations are just as important.

Interpretation:

Positive correlations: When the economic variable increases, the stock’s value tends to increase at the given lag.
Negative correlations: When the economic variable increases, the stock’s value tends to decrease.
Lag significance: The time delay (lag) where the correlation is strongest can suggest a predictive relationship (e.g., stock reacts after a lag of n days to changes in the economic variable).
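
The idea of reading the lag off the strongest correlation can be sketched in base R. The data are simulated (an assumption of this example): the toy stock reacts to the toy factor three days later, and the names factor_x, stock_y and lagged_cor are hypothetical.

```r
set.seed(1)
n <- 250
factor_x <- rnorm(n)  # stand-in for an economic indicator
# The toy stock reacts to the factor with a delay of 3 days
stock_y <- c(rnorm(3), 0.7 * factor_x[1:(n - 3)] + rnorm(n - 3, sd = 0.5))

lags <- 0:5
lagged_cor <- sapply(lags, function(k) {
  # pair stock_y[t] with factor_x[t - k]
  cor(stock_y[(k + 1):n], factor_x[1:(n - k)])
})
names(lagged_cor) <- paste0("lag", lags)

strongest_lag <- lags[which.max(abs(lagged_cor))]
print(round(lagged_cor, 3))
print(strongest_lag)
```

Ranking by the absolute value keeps strong negative relationships in view as well.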

Advantages:

Simple and intuitive, making it a great starting point.
Highlights time-delay patterns in stock-economic relationships.

Disadvantages:

Does not account for the influence of other variables, leading to potential spurious correlations.
Assumes linear relationships, ignoring non-linear dynamics.

2.2 Granger Causality Testing

Description:
Granger causality evaluates whether past values of an economic indicator improve predictions of stock prices.

library(magrittr) # provides the %>% pipe used from here on

granger_results <- list()
for (i in 1:ncol(stock_df)) {
  for (j in 1:ncol(economic_lagged)) {
    # grangertest() can fail for some series; return NULL in that case and skip the pair
    test <- tryCatch({
      lmtest::grangertest(stock_df[, i] ~ economic_lagged[, j], order = max_lag)
    }, error = function(e) NULL)
    if (!is.null(test)) {
      granger_results[[paste0(names(stock_df)[i], "_vs_",names(economic_lagged)[j])]] <- list(p_value = test$`Pr(>F)`[2])
    }
  }
}
granger_significant <- do.call(rbind, lapply(names(granger_results), function(name) {
  components <- unlist(strsplit(name, "_vs_"))
  data.frame(Stock = components[1], Economic_Variable = components[2], P_Value = granger_results[[name]]$p_value)
}))
top_granger <- granger_significant %>% 
  dplyr::group_by(Stock) %>% 
  dplyr::slice_min(order_by = P_Value, n = 10, with_ties = FALSE) %>% # Top 10 smallest p-values
  dplyr::ungroup()

Output:

P-values indicating whether an economic variable Granger-causes a stock’s values, i.e. whether its past values help predict them.
The 10 most significant economic variables for each stock.

Interpretation:

A small p-value (< 0.05) suggests that the economic variable has a statistically significant predictive relationship with the stock after accounting for past stock values.
Directionality: Unlike correlation, Granger causality implies that the economic variable contains unique predictive information about the stock.

Limitations:

Granger causality doesn’t confirm causation — it suggests that one variable’s past values improve predictions of another.
Sensitive to lag selection and assumes linear relationships.
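
What the test compares can be made explicit with a minimal base-R sketch: a restricted regression with only the stock's own lags against an unrestricted one that adds the lags of the candidate variable, judged by an F-test. The data are simulated and granger_by_hand is a hypothetical helper written for this illustration, not a function of lmtest.

```r
set.seed(7)
n <- 300
x <- rnorm(n)
y <- numeric(n)
# By construction, yesterday's x helps to predict today's y
for (t in 2:n) y[t] <- 0.3 * y[t - 1] + 0.6 * x[t - 1] + rnorm(1, sd = 0.5)

granger_by_hand <- function(cause, effect, p) {
  e <- embed(effect, p + 1)  # effect_t, effect_{t-1}, ..., effect_{t-p}
  k <- embed(cause, p + 1)
  target     <- e[, 1]
  own_lags   <- e[, -1, drop = FALSE]
  cause_lags <- k[, -1, drop = FALSE]
  restricted   <- lm(target ~ own_lags)               # own history only
  unrestricted <- lm(target ~ own_lags + cause_lags)  # plus lags of the other series
  anova(restricted, unrestricted)$`Pr(>F)`[2]         # p-value of the F-test
}

p_x_to_y <- granger_by_hand(x, y, p = 2)
p_y_to_x <- granger_by_hand(y, x, p = 2)
print(c(x_to_y = p_x_to_y, y_to_x = p_y_to_x))
```

Running the test in both directions is a cheap check on whether the predictive relationship is one-sided.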

Key Points:

Focuses on predictive relationships, not just correlations.
Small p-values indicate significant Granger causation, suggesting that the variable adds unique predictive value.

Advantages:

Captures directionality, showing how changes in economic variables might drive stock performance.
Relatively simple to implement and interpret for time series data.

Disadvantages:

Assumes stationarity and may produce misleading results for non-stationary series.
Sensitive to lag selection, which can influence outcomes significantly.

2.3 Distributed Lag Regression

Description:
This regression model examines how economic indicators at different time lags impact stock prices.

dlm_results <- list()
for (i in 1:ncol(stock_df)) {
  # Regress the stock on all lagged economic variables at once
  model <- lm(as.numeric(stock_df[, i]) ~ ., data = economic_lagged)
  coefficients <- summary(model)$coefficients |> 
                          tibble::as_tibble(rownames = "factors") |> 
                          dplyr::arrange(`Pr(>|t|)`) |> 
                          dplyr::filter(factors != "(Intercept)")
  dlm_results[[paste0(names(stock_df)[i])]] <- head(coefficients, 10) # avoids NA rows if fewer than 10 remain
}
top_dlm <- do.call(rbind, lapply(names(dlm_results), function(stock) {
  tibble::tibble(Stock = stock, dlm_results[[stock]])
}))

Output:

Regression coefficients for economic variables at different lags.
The 10 coefficients with the smallest p-values for each stock.

Interpretation:

A large positive coefficient means that an increase in the economic variable at a specific lag is associated with an increase in the stock’s value.
A large negative coefficient indicates the opposite effect.
Lag-specific relationships: Highlights the magnitude and direction of influence for specific time delays.
Significance of coefficients: Examine p-values for individual lags to determine which relationships are statistically significant.
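
A distributed lag regression and its cumulative effect can be sketched in base R as follows. The data are simulated with known lag weights (an assumption of this example), and max_lag_demo, dl_data and dl_fit are hypothetical names.

```r
set.seed(11)
n <- 400
x <- rnorm(n)
# True lag weights 0.5, 0.3 and 0.1 on lags 0, 1 and 2, so the cumulative effect is 0.9
y <- 0.5 * x + 0.3 * c(NA, x[-n]) + 0.1 * c(NA, NA, x[1:(n - 2)]) + rnorm(n, sd = 0.3)

max_lag_demo <- 3
X <- embed(x, max_lag_demo + 1)  # columns: x_t, x_{t-1}, ..., x_{t-3}
colnames(X) <- paste0("x_lag", 0:max_lag_demo)
dl_data <- data.frame(y = y[(max_lag_demo + 1):n], X)

dl_fit <- lm(y ~ ., data = dl_data)
lag_coefs <- coef(dl_fit)[-1]        # drop the intercept
cumulative_effect <- sum(lag_coefs)  # total response to a lasting one-unit change in x

print(round(lag_coefs, 3))
print(round(cumulative_effect, 3))
```

The individual coefficients show when the effect arrives, while their sum shows how large it is overall.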

Limitations:

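Regression assumes a linear relationship and may overfit if too many predictors or lags are included. In this example, 30 factors with lags 0 to 5 yield 180 predictors against roughly three months of daily data, so the individual coefficients and p-values should be read with caution.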

Key Points:

Quantifies the magnitude and direction of influence at each lag.
Produces regression coefficients to assess economic variables’ influence.

Advantages:

Provides detailed insights into lag-specific effects.
Useful for identifying time-sensitive drivers of stock prices.

Disadvantages:

Assumes a linear relationship, potentially oversimplifying complex interactions.
Prone to overfitting when too many lags or variables are included.

2.4 Cross-Correlation Function (CCF) Analysis

Description:
CCF examines the correlation between two time series (stocks and economic variables) across multiple lags.

ccf_results <- list()
for (i in 1:ncol(stock_df)) {
  for (j in 1:ncol(economic_data)) {
    ccf_values <- ccf(as.numeric(stock_df[, i]), as.numeric(economic_data[, j]), lag.max = max_lag, plot = FALSE)
    max_positive <- max(ccf_values$acf[ccf_values$acf > 0], na.rm = TRUE)
    max_pos_lag <- ccf_values$lag[ccf_values$acf == max_positive]
    max_negative <- min(ccf_values$acf[ccf_values$acf < 0], na.rm = TRUE)
    max_neg_lag <- ccf_values$lag[ccf_values$acf == max_negative]
    ccf_results[[paste0(names(stock_df)[i],"_vs_", names(economic_data)[j])]] <- list(
      Max_Positive = max_positive,
      Lag_Positive = max_pos_lag,
      Max_Negative = max_negative,
      Lag_Negative = max_neg_lag
    )
  }
}
ccf_summary <- do.call(rbind, lapply(names(ccf_results), function(name) {
  components <- unlist(strsplit(name, "_vs_"))
  tibble::tibble(Stock = components[1],
             Economic_Variable = components[2],
             Max_Positive = ccf_results[[name]]$Max_Positive,
             Lag_Positive = ccf_results[[name]]$Lag_Positive,
             Max_Negative = ccf_results[[name]]$Max_Negative,
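             Lag_Negative = ccf_results[[name]]$Lag_Negative)
}))
# Diagnostic: show pairs without any positive or any negative correlation
# (max()/min() over an empty set return -Inf/Inf)
ccf_summary %>%
  dplyr::filter(is.infinite(Max_Positive) | is.infinite(Max_Negative))

# Drop those pairs before ranking
ccf_summary <- ccf_summary %>%
  dplyr::filter(!is.infinite(Max_Positive) & !is.infinite(Max_Negative))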

filter_top_bottom_ccf <- function(ccf_summary, top_n = 5) {
  # Initialize an empty list to store results
  filtered_ccf <- list()
  
  # Loop through each stock in the ccf_summary
  for (stock in unique(ccf_summary$Stock)) {
    stock_data <- ccf_summary[ccf_summary$Stock == stock, ]
    
    # Get top N highest positive correlations
    top_positive <- stock_data %>% 
      dplyr::arrange(desc(Max_Positive)) %>% 
      head(top_n)
    
    # Get top N lowest negative correlations
    top_negative <- stock_data %>% 
      dplyr::arrange(Max_Negative) %>% 
      head(top_n)
    
    # Combine top positive and negative correlations
    stock_filtered <- rbind(top_positive, top_negative)
    filtered_ccf[[stock]] <- stock_filtered
  }
  
  # Combine results into a single dataframe
  final_filtered_ccf <- do.call(rbind, filtered_ccf)
  return(final_filtered_ccf)
}

# Filter top and bottom CCF results
filtered_ccf_summary <- filter_top_bottom_ccf(ccf_summary, top_n = 5)

Output:

Maximum positive correlation and minimum negative correlation values for each economic variable-stock pair at various lags.

Interpretation:

Peak positive correlation: Indicates the lag at which the economic variable and stock are most strongly positively related.
Peak negative correlation: Indicates the lag at which they are most strongly negatively related.
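Can identify lead-lag relationships: For instance, if the peak correlation is at a lag of 3 days, changes in the economic variable may predict stock behavior after 3 days. Note that R’s ccf(x, y) estimates the correlation between x[t+k] and y[t]; with the stock passed first, as in the code above, positive lags mean the economic variable leads the stock and negative lags mean the stock leads.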
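
The sign convention of the lag is easy to get wrong, so here is a minimal sketch with simulated data (an assumption of this example) in which the toy stock follows the toy economic series by three days. The names econ, stock and ccf_table are hypothetical.

```r
set.seed(3)
n <- 300
econ  <- rnorm(n)
# The toy stock follows the economic series with a delay of 3 days
stock <- c(rnorm(3), 0.8 * econ[1:(n - 3)] + rnorm(n - 3, sd = 0.4))

# ccf(x, y) estimates cor(x[t + k], y[t]); the stock is passed first, as in the loop above
cc <- stats::ccf(stock, econ, lag.max = 5, plot = FALSE)
ccf_table <- data.frame(lag = as.vector(cc$lag), correlation = as.vector(cc$acf))

peak <- ccf_table[which.max(abs(ccf_table$correlation)), ]
print(peak)
```

With this argument order, a peak at a positive lag means the economic series moves first; swapping the arguments flips the sign of the lag.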

Limitations:

Does not account for the influence of other variables.

Key Points:

Finds peak positive and negative correlations, indicating strong relationships.
Can uncover lagged dependencies between variables.

Advantages:

Provides a visual tool to explore lead-lag patterns.
Suitable for identifying asynchronous relationships.

Disadvantages:

Doesn’t account for confounding variables, limiting its explanatory power.
Sensitive to noise in time series, which can distort correlation patterns.

2.5 Vector Autoregression (VAR)

Description:
VAR models analyze the interdependencies among multiple time series, allowing each variable to depend on its own lags and those of others.

library(tseries)
# Ensure stationarity by differencing (if necessary)
check_stationarity <- function(series) {
  adf_test <- adf.test(series, alternative = "stationary", k = 0)
  return(adf_test$p.value < 0.05) # Returns TRUE if stationary
}

# Apply differencing to non-stationary series
make_stationary <- function(data) {
  stationary_data <- data
  for (col in colnames(data)) {
    if (!check_stationarity(as.numeric(data[, col]))) {
      stationary_data[, col] <- xts::diff.xts(data[, col], differences = 1)
    }
  }
  return(na.omit(stationary_data)) # Remove NA rows created by differencing
}
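
Why differencing helps can be seen in a minimal base-R sketch of the Dickey-Fuller regression that underlies such tests. This version includes only a constant and no extra lags, so it is a simplification and not a replacement for adf.test(); the data are simulated and df_stat is a hypothetical helper.

```r
set.seed(5)
n <- 250
random_walk <- cumsum(rnorm(n))  # non-stationary by construction
differenced <- diff(random_walk) # its first difference is white noise

# Dickey-Fuller regression: diff(y)_t = a + g * y_{t-1} + e_t
# A strongly negative t-statistic on g speaks against a unit root
df_stat <- function(y) {
  dy    <- diff(y)
  y_lag <- y[-length(y)]
  fit   <- lm(dy ~ y_lag)
  unname(summary(fit)$coefficients["y_lag", "t value"])
}

stat_level <- df_stat(random_walk)
stat_diff  <- df_stat(differenced)
print(c(level = stat_level, differenced = stat_diff))
```

The statistic has to be compared with Dickey-Fuller critical values rather than the usual t-table (about -2.86 at the 5% level for the constant-only case in large samples).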

Process data for stationarity

economic_data_stationary <- make_stationary(economic_data)
stock_data_stationary <- make_stationary(stock_df)

Define some helper functions

# Helper function to fit VAR model with optimal lag selection
fit_var <- function(stock_series, economic_data_func) {

  aligned_data <- merge(stock_series, economic_data_func, all = F) # Align datasets

  colnames(aligned_data)[1] <- "Stock" # Rename for clarity
  
  tryCatch({
    lag_selection <- vars::VARselect(aligned_data, type = "const")$selection # Choose lag order
    optimal_lag <- lag_selection["AIC(n)"] # Use AIC for lag selection
    var_model <- vars::VAR(aligned_data, p = optimal_lag, type = "const") # Fit VAR model
    return(var_model)
  }, error = function(e) {
    warning("VAR model fitting failed: ", conditionMessage(e))
    return(NULL)
  })
}

# Extract significant coefficients from VAR results
extract_significant_var <- function(var_model, significance_level = 0.05) {
  if (is.null(var_model)) return(NULL)
  var_coeffs <- lapply(var_model$varresult, function(model) {
    coef_summary <- summary(model)$coefficients # Get coefficients table
    significant <- coef_summary[coef_summary[, "Pr(>|t|)"] <= significance_level, , drop = FALSE]
    significant
  })
  var_coeffs <- do.call(rbind, lapply(names(var_coeffs), function(name) {
    
    if (nrow(var_coeffs[[name]]) > 0 && name == "Stock") { # keep only the equation that explains the stock
      data.frame(
        Predictor = rownames(var_coeffs[[name]]),
        Coefficient = var_coeffs[[name]][, 1],
        P_Value = var_coeffs[[name]][, 4],
        Dependent_Variable = name
      )
    } else {
      NULL
    }
  }))
  return(var_coeffs)
}

Now that all helper functions are set up and the preliminary steps for a VAR analysis are done, we loop over the individual stocks, fit a model against the factors for each, and collect the results in the list var_results_summary.

var_results_summary <- list()
for (i in 1:ncol(stock_data_stationary)) {
  cat("Analyzing Stock:", colnames(stock_data_stationary)[i], "\n")
  
  # Fit VAR model for each stock
  var_model <- fit_var(stock_data_stationary[, i], economic_data_stationary)
  
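  # Extract coefficients; significance_level = 1 keeps all of them here,
  # the p-value threshold is applied later in the scoring function
  significant_coeffs <- extract_significant_var(var_model, significance_level = 1)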
  
  if (!is.null(significant_coeffs)) {
    significant_coeffs$Stock <- colnames(stock_data_stationary)[i] # Add stock name
    var_results_summary[[colnames(stock_data_stationary)[i]]] <- significant_coeffs
  } else {
    cat("No significant coefficients found for Stock:", colnames(stock_data_stationary)[i], "\n")
  }
}

# Combine results into a single dataframe
final_summary <- do.call(rbind, var_results_summary)

final_summary_top10 <- final_summary |> 
  dplyr::group_by(Stock) |> 
  dplyr::arrange(P_Value,.by_group = T) |> 
  dplyr::slice_head(n=10) |> 
  dplyr::ungroup() |> 
  dplyr::select(-Dependent_Variable)

Output:

Significant coefficients from the VAR model showing how lagged values of economic variables and stocks predict future stock values.

Interpretation:

Significant coefficients for an economic variable at a given lag suggest that it influences the stock’s future values.
VAR accounts for interdependencies between multiple economic variables, providing a more comprehensive view of relationships.
Examining impulse response functions can show how shocks to economic variables propagate through the system and affect stock prices over time.
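
The mechanics of an impulse response can be sketched in base R for the simplest case, a two-variable VAR with one lag estimated by OLS. The data are simulated, the responses are non-orthogonalised, and the names A_true, A_hat and irf_econ_to_stock are hypothetical; for fitted vars models the package's own irf() function would be the usual route.

```r
set.seed(9)
n <- 500
# True coefficient matrix: row 1 = stock equation, row 2 = economic equation
A_true <- matrix(c(0.5, 0.3,
                   0.0, 0.4), nrow = 2, byrow = TRUE)
series <- matrix(0, nrow = n, ncol = 2, dimnames = list(NULL, c("stock", "econ")))
for (t in 2:n) series[t, ] <- A_true %*% series[t - 1, ] + rnorm(2, sd = 0.5)

# Equation-by-equation OLS: regress both variables on the first lag of both
current <- series[-1, ]
lagged  <- series[-n, ]
fit     <- lm(current ~ lagged)  # multivariate lm, one coefficient column per equation
A_hat   <- t(coef(fit)[-1, ])    # drop intercepts; rows = equations, columns = lagged variables

# Response of the stock to a unit shock in the economic series after h steps: (A_hat^h)[1, 2]
horizons <- 0:6
irf_econ_to_stock <- numeric(length(horizons))
A_power <- diag(2)
for (h in seq_along(horizons)) {
  irf_econ_to_stock[h] <- A_power[1, 2]
  A_power <- A_power %*% A_hat
}
names(irf_econ_to_stock) <- paste0("h", horizons)
print(round(A_hat, 3))
print(round(irf_econ_to_stock, 3))
```

The response starts at zero, appears with the first lag, and then fades as long as the estimated system is stable.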

Limitations:

Assumes stationarity of the time series.
Requires careful lag selection (using AIC, BIC, etc.)

Key Points:

Explores mutual relationships between stocks and economic variables.
Uses AIC/BIC to select optimal lag structures.

Advantages:

Captures feedback loops and interactions among multiple variables.
Provides impulse-response analysis to study the impact of shocks.

Disadvantages:

Computationally expensive, especially for large datasets or many variables.
Requires all series to be stationary, necessitating preprocessing steps.

2.6 Random Forest with Lagged Features

Description:
Random Forest, a machine learning technique, ranks economic variables based on their importance in predicting stock prices.

rf_results <- list()
for (i in 1:ncol(stock_df)) {
  stock_series <- as.numeric(stock_df[, i])
  lagged_features <- as.data.frame(economic_lagged[complete.cases(economic_lagged), ])
  response <- stock_series[1:length(stock_series)] 
  predictors <- lagged_features[1:length(response), ] 
  
  if (length(response) > 10) { # Ensure sufficient data
    rf_model <- randomForest::randomForest(x = predictors, y = response, importance = TRUE)
    importance_scores <- randomForest::importance(rf_model)
    sorted_importance <- importance_scores[order(-importance_scores[, 1]), ] # Sort by importance
    rf_results[[names(stock_df)[i]]] <- tibble::tibble(
      Economic_Variable = rownames(sorted_importance)[1:10],
      Importance = sorted_importance[1:10, 1]
    )
  }
}

# Combine the top 10 features per stock into one data frame
top_bottom_rf <- do.call(rbind, lapply(names(rf_results), function(stock) {
  top_features <- rf_results[[stock]]
  data.frame(Stock = stock,
             Economic_Variable = top_features$Economic_Variable,
             Importance = top_features$Importance)
}))

Output:

Variable importance scores for lagged economic variables predicting each stock’s values.
Top 10 most important economic variables for each stock.

Interpretation:

High importance score: Indicates that the economic variable is a strong predictor of the stock’s values.
Random forests are non-linear models, so they can capture more complex relationships than linear regression or VAR.
Useful for ranking predictors and understanding feature contributions but does not provide explicit equations or coefficients.
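
The first importance column returned for a regression forest is permutation-based: it measures how much the prediction error grows when one feature is shuffled. The following base-R sketch shows that idea only. It deliberately uses a linear model on a hold-out set as a stand-in for the forest, and the simulated features f1 to f3 are hypothetical.

```r
set.seed(21)
n <- 400
features <- data.frame(f1 = rnorm(n), f2 = rnorm(n), f3 = rnorm(n)) # hypothetical lagged factors
target   <- 1.0 * features$f1 + 0.3 * features$f2 + rnorm(n, sd = 0.5) # f3 is pure noise

train <- 1:300
test  <- 301:n
# Stand-in model (NOT a random forest): a linear regression fitted on the training rows
model <- lm(target ~ ., data = cbind(target = target, features)[train, ])

mse <- function(data) mean((target[test] - predict(model, newdata = data))^2)
baseline <- mse(features[test, ])

perm_importance <- sapply(names(features), function(col) {
  shuffled <- features[test, ]
  shuffled[[col]] <- sample(shuffled[[col]]) # break the link between this feature and the target
  mse(shuffled) - baseline                   # increase in error = importance
})
ranking <- sort(perm_importance, decreasing = TRUE)
print(round(ranking, 3))
```

Such a score ranks features by how much the model relies on them, but it says nothing about the sign of the relationship.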

Limitations:

Importance scores rank each lagged feature but do not show the direction or size of its effect.

Key Points:

Incorporates non-linear relationships between variables.
Uses lagged features of economic variables for prediction.

Advantages:

Handles high-dimensional data and multicollinearity effectively.
Provides feature importance scores for ranking predictive variables.

Disadvantages:

Acts as a “black-box” model, with limited interpretability compared to statistical methods.
Requires significant computational resources for training and tuning.

Next, we want to determine which economic factors are best suited overall, and which are best suited for each stock. To do so, we feed the results of the various tests into a summarizing scoring function that integrates all the methods we have just implemented: Lagged Correlation, Granger Causality, Distributed Lag Regression, Cross-Correlation Function (CCF), VAR model coefficients, and Random Forest feature importance.

3 Function to align and create a scoring table

NOTE:
What we are doing here is of course only a rough first step to get an idea of which factors might generally 
move our portfolio and individual stocks. 
It is not advisable to simply combine the results of the various different approaches; they should be evaluated 
individually and possible common features recorded separately.

The overall ranking framework aggregates results from all approaches into a unified ranking of economic factors. The function combines scores from multiple analyses, normalizing and weighting them based on their significance to identify the most impactful economic variables for each stock. This framework offers a simple first view by integrating the strengths of diverse methods and prioritizing variables with consistent importance across approaches. However, the weightings are subjective, which introduces bias into the final ranking, and the integration adds complexity.
In order to be a little more robust, we set thresholds for significance of p-values <=0.05 and correlation limits for abs(Correlation) >= 0.6.
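
How normalization and weighting interact can be shown on a toy table. The factor names, raw values and weights below are hypothetical. Taking absolute correlations, so that strong negative relationships are not pushed to the bottom, is an assumption of this sketch.

```r
# Min-max normalization to [0, 1] with a guard against a zero range
normalize <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(ifelse(is.na(x), NA_real_, 0))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical raw results for three factors from two methods
toy <- data.frame(
  Factor      = c("A", "B", "C"),
  Correlation = c(0.9, -0.7, 0.6),
  P_Value     = c(0.04, 0.001, NA) # factor C did not show up in the second method
)
toy$Correlation_Score <- normalize(abs(toy$Correlation))
toy$Granger_Score     <- normalize(-log(toy$P_Value))
toy$Granger_Score[is.na(toy$Granger_Score)] <- 0 # a missing result contributes nothing

toy_weights <- c(Correlation = 0.5, Granger = 0.5) # illustrative weights only
toy$Weighted_Score <- toy_weights[["Correlation"]] * toy$Correlation_Score +
                      toy_weights[["Granger"]]     * toy$Granger_Score
toy <- toy[order(toy$Weighted_Score, decreasing = TRUE), ]
print(toy)
```

Because min-max scaling always assigns 0 to the weakest entry of each method, a factor that barely passes a threshold receives no credit from that method, which is worth keeping in mind when reading the ranking.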

create_factor_scores <- function(stock_name, 
                                 top_bottom_correlations, 
                                 top_granger, 
                                 top_dlm, 
                                 filtered_ccf_summary, 
                                 final_summary_top10, 
                                 top_bottom_rf,
                                 weights) {
  
  # Normalize a vector to a scale of 0 to 1
  normalize <- function(x, reverse = FALSE) {
    if (reverse) {
      return(1 - (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE)))
    }
    (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
  }

  # Extract and format each data frame to include only the relevant stock's factors
  correlations_df <- top_bottom_correlations %>%
    dplyr::filter(Stock == stock_name) %>%
    dplyr::filter(abs(Correlation) >= 0.6) %>%  # Apply correlation threshold
    dplyr::select(Economic_Variable, Correlation) %>%
    dplyr::rename(Correlation_Score = Correlation)

  granger_df <- top_granger %>%
    dplyr::filter(Stock == stock_name) %>%
    dplyr::filter(P_Value <= 0.05) %>%  # Apply p-value threshold
    dplyr::select(Economic_Variable, P_Value) %>%
    dplyr::mutate(Granger_Score = -log(P_Value)) %>% # Transform p-value to a score for consistency
    dplyr::select(Economic_Variable, Granger_Score)

  dlm_df <- top_dlm %>%
    dplyr::rename(P_Value = `Pr(>|t|)`) %>%
    dplyr::filter(Stock == stock_name,  # keep only this stock's coefficients
                  P_Value <= 0.05) %>% 
    dplyr::select(Economic_Variable = factors, DLM_Score = Estimate) 

  ccf_df_pos <- filtered_ccf_summary %>%
    dplyr::filter(Stock == stock_name) %>%
    dplyr::filter(Max_Positive >= 0.6) %>%  # Apply positive correlation threshold
    dplyr::select(Economic_Variable, CCF_Score_max = Max_Positive )

  ccf_df_neg <- filtered_ccf_summary %>%
    dplyr::filter(Stock == stock_name) %>%
    dplyr::filter(Max_Negative <= -0.6) %>%
    dplyr::select(Economic_Variable, CCF_Score_min = Max_Negative )

  final_summary_df <- final_summary_top10 %>%
    dplyr::filter(Stock == stock_name,
                  P_Value <= 0.05) %>%
    dplyr::select(Economic_Variable = Predictor, VAR_Score = Coefficient)

  rf_df <- top_bottom_rf %>%
    dplyr::filter(Stock == stock_name) %>%
    dplyr::select(Economic_Variable, Importance) %>%
    dplyr::rename(RF_Score = Importance)
  
  # Combine all the data frames using a full join to align by Economic_Variable
  factor_scores <- list(
    correlations_df,
    granger_df,
    dlm_df,
    ccf_df_pos,
    ccf_df_neg,
    final_summary_df,
    rf_df
  ) %>%
    purrr::reduce(dplyr::full_join, by = "Economic_Variable") %>%
    dplyr::mutate(
      # Normalize each score column; absolute values keep strong negative
      # relationships from being ranked last
      Correlation_Score = normalize(abs(Correlation_Score)),
      Granger_Score = normalize(Granger_Score),
      DLM_Score = normalize(abs(DLM_Score)),
      CCF_Positive_Score = normalize(CCF_Score_max),
      CCF_Negative_Score = normalize(CCF_Score_min, reverse = TRUE), # most negative CCF scores highest
      VAR_Score = normalize(abs(VAR_Score)),
      RF_Score = normalize(RF_Score)
    ) %>%
    dplyr::mutate_all(~replace(., is.na(.), 0)) %>%
    dplyr::rename(Factor = Economic_Variable)
    
  # Calculate weighted score for each factor
  factor_scores$Weighted_Score <- with(factor_scores, 
                                       weights["Correlation"] * Correlation_Score +
                                       weights["Granger"] * Granger_Score +
                                       weights["DLM"] * DLM_Score +
                                       weights["CCF_Positive"] * CCF_Positive_Score +
                                       weights["CCF_Negative"] * CCF_Negative_Score +
                                       weights["VAR"] * VAR_Score +
                                       weights["RF"] * RF_Score
  ) 
  # Rank factors by weighted score
  factor_scores <- factor_scores[order(factor_scores$Weighted_Score, decreasing = TRUE), ]
  factor_scores$Stock <- stock_name
  
  factor_scores <- factor_scores %>% dplyr::distinct(Factor,.keep_all = T)
  
  return(factor_scores)
}

With the corresponding weightings, we finally gain an initial overview of the economic factors that were repeatedly listed as influencing factors in the course of the analysis.

weights <- c(Correlation = 0.2, Granger = 0.2, DLM = 0.15, CCF_Positive = 0.15, CCF_Negative = 0.1, VAR = 0.1, RF = 0.1)

final_ranking <- purrr::map_df(unique(stocks), create_factor_scores, # reuse the ticker vector defined at the start
                    top_bottom_correlations = top_bottom_correlations,
                    top_granger = top_granger,
                    top_dlm = top_dlm,
                    filtered_ccf_summary = filtered_ccf_summary,
                    final_summary_top10 = final_summary_top10,
                    top_bottom_rf = top_bottom_rf, 
                    weights = weights)

4 Visualizing the Results

Visualizing the results is a good way to intuitively understand the large amount of data.
To do so, we like to use the plotly package, which does not overload the plot with information. It offers the possibility of interactively handling and displaying additional information by hovering over the image.

Key Features of the plot:

Color-coded bars improve readability.
Hover functionality displays detailed information for each factor.
Enables quick identification of key economic drivers.

First, summarize the top factors by their overall highest rating

top_factors <- final_ranking %>%
  dplyr::distinct(Factor,Stock,.keep_all = T) %>%
  dplyr::group_by(Factor) %>%
  dplyr::summarise(Average_Score = mean(Weighted_Score, na.rm = TRUE)) %>%
  dplyr::arrange(desc(Average_Score)) %>%
  dplyr::slice_head(n = 20) # Keep top 20 factors

Create a Plotly bar plot

plotly::plot_ly(
  data = top_factors,
  x = ~reorder(Factor, Average_Score,decreasing=TRUE),
  y = ~Average_Score,
  type = 'bar',
  text = ~paste("Average Score:", round(Average_Score, 3)),
  textposition = 'auto',
  marker = list(
    color = ~Average_Score, # Use the score for color
    colorscale = list(
      c(0, "#CC9719"),
      c(0.25, "#FFBE88"),
      c(1, "#CC6A19")
      ),
    showscale = FALSE, # Hide the color scale legend
    line = list(color = "#CC6A19", width = 1.5)
  )
) %>%
  plotly::layout(
    title = list(
      text = "Top Economic Factors by Overall Rating",
      font = list(size = 24), 
      y = 0.95 # Shift downward
    ),
    xaxis = list(title = "<b>Economic Variable</b>"),
    yaxis = list(title = "<b>Average Final Score</b>"),
    bargap = 0.2
  )

As a further example, we can select the 20 most important factors for Deere & Co from the final_ranking data frame and examine which factors the company most likely depends on.

top_10_single <- final_ranking %>%
  dplyr::distinct(Factor,Stock,.keep_all = T) %>%
  dplyr::filter(Stock == "de") %>%
  dplyr::arrange(desc(Weighted_Score)) %>%
  dplyr::slice_head(n = 20)

Again, we are using plotly to create a bar plot for interactive additional information.

plotly::plot_ly(
  data = top_10_single,
  x = ~reorder(Factor, Weighted_Score,decreasing=TRUE), # Order bars by Weighted_Score
  y = ~Weighted_Score,
  type = 'bar',
  text = ~paste("Weighted Score:", round(Weighted_Score, 3)),
  textposition = 'auto',
  hoverinfo = 'text', # Show only custom hover text
  marker = list(
    color = ~Weighted_Score, # Use the score for color
    colorscale = list(
      c(0, "#4964AB"),
      c(0.1, "#9BABD4"),
      c(1, "#6893A1")
    ),
    showscale = FALSE, # Hide the color scale legend
    line = list(color = "#1F525B", width = 1.5)
  )
) %>%
  plotly::layout(
    title = list(
      text = "Top Economic Factors for 'Deere & Co'",
      font = list(size = 24), 
      y = 0.95 # Shift downward
    ),
    xaxis = list(title = "<b>Economic Variables</b>", tickangle = -45),
    yaxis = list(title = "<b>Weighted Score</b>"),
    showlegend = FALSE, # Disable legend
    bargap = 0.2
  )

5 Conclusion

Of course, the past is no guide for the future and we can only estimate which factors are currently influencing our results.

The analysis also shows that there is room to find more meaningful factors. It would therefore be advisable to run another round with new economic factors and/or further lagged time series in order to identify additional and, if possible, even more meaningful relationships between economic factors and stock performance.

These results can then be taken into account and incorporated into further analyses, such as a performance attribution, which also breaks down the allocation of stock performance to (economic) underlying factors.

Finally, having reached this point, thank you for your interest! And if you found the article helpful and liked it, please consider giving me a short feedback via

Mail OR LinkedIn

Exploring Structural Relationships Between Stocks and Economic time series: A Multifaceted Approach
