Title: | Rank of Variables Based on Principal Component Analysis for Mixed Data Types |
---|---|
Description: | Principal Component Analysis (PCA) is a statistical technique used to reduce the dimensionality of a dataset while preserving as much variability as possible. By transforming the original variables into a new set of uncorrelated variables called principal components, PCA helps in identifying patterns and simplifying the complexity of high-dimensional data. The 'RankPCA' package provides a streamlined workflow for performing PCA on datasets containing both categorical and continuous variables. It facilitates data preprocessing, encoding of categorical variables, and computes PCA to determine the optimal number of principal components based on a specified variance threshold. The package also computes composite indices for ranking observations, which can be useful for various analytical purposes. Garai, S., & Paul, R. K. (2023) <doi:10.1016/j.iswa.2023.200202>. |
Authors: | Dr. Sandip Garai [aut, cre, cph] |
Maintainer: | Dr. Sandip Garai <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-11-05 04:12:49 UTC |
Source: | https://github.com/cran/RankPCA |
This function performs Principal Component Analysis (PCA) on datasets containing both categorical and continuous variables. It facilitates data preprocessing, encoding of categorical variables, and computes PCA to determine the optimal number of principal components based on a specified variance threshold. The function also computes composite indices for ranking observations.
rankPCA(data, range_cat_var, range_continuous_var, threshold)
rankPCA(data, range_cat_var, range_continuous_var, threshold)
data |
data to be analyzed. |
range_cat_var |
Range of categorical variables. |
range_continuous_var |
Range of continuous variables. |
threshold |
Threshold for cumulative variance explained. |
A list containing PCA results and composite index.
Garai, S., & Paul, R. K. (2023). Development of MCS based-ensemble models using CEEMDAN decomposition and machine intelligence. Intelligent Systems with Applications, 18, 200202, https://doi.org/10.1016/j.iswa.2023.200202.
# Create a sample dataset set.seed(123) sample_data <- data.frame( Category1 = sample(c("A", "B", "C"), 100, replace = TRUE), Category2 = sample(c("X", "Y", "Z"), 100, replace = TRUE), Category3 = sample(c("M", "N", "O"), 100, replace = TRUE), Continuous1 = rnorm(100), Continuous2 = runif(100, min = 0, max = 100), Continuous3 = rnorm(100, mean = 50, sd = 10), Continuous4 = rpois(100, lambda = 5), Continuous5 = rbinom(100, size = 10, prob = 0.5) ) result <- rankPCA(data = sample_data, range_cat_var = 1:3, range_continuous_var = 4:8, threshold = 80) # Access the results eigenvalues_pca <- result$eigenvalues_pca pca_max_dim <- result$pca_max_dim coordinates <- result$coordinates eigenvalues <- result$eigenvalues weighted_coordinates <- result$weighted_coordinates weighted_sums <- result$weighted_sums composite_index <- result$composite_index loading_vectors <- result$loading_vectors
# Create a sample dataset set.seed(123) sample_data <- data.frame( Category1 = sample(c("A", "B", "C"), 100, replace = TRUE), Category2 = sample(c("X", "Y", "Z"), 100, replace = TRUE), Category3 = sample(c("M", "N", "O"), 100, replace = TRUE), Continuous1 = rnorm(100), Continuous2 = runif(100, min = 0, max = 100), Continuous3 = rnorm(100, mean = 50, sd = 10), Continuous4 = rpois(100, lambda = 5), Continuous5 = rbinom(100, size = 10, prob = 0.5) ) result <- rankPCA(data = sample_data, range_cat_var = 1:3, range_continuous_var = 4:8, threshold = 80) # Access the results eigenvalues_pca <- result$eigenvalues_pca pca_max_dim <- result$pca_max_dim coordinates <- result$coordinates eigenvalues <- result$eigenvalues weighted_coordinates <- result$weighted_coordinates weighted_sums <- result$weighted_sums composite_index <- result$composite_index loading_vectors <- result$loading_vectors
This function calculates the ranking of variables based on the sum of absolute values for each row of loading vectors.
variable_ranking(loading_vectors)
variable_ranking(loading_vectors)
loading_vectors |
A matrix containing loading vectors. |
A data frame containing the ranked variables.
# Define row and column names row_names <- c("Category1A", "Category1B", "Category1C", "Category2X", "Category2Y", "Category2Z", "Category3M", "Category3N", "Category3O", "Continuous1", "Continuous2", "Continuous3", "Continuous4", "Continuous5") col_names <- paste0("PC", 1:8) # Define the data matrix loading_vectors <- matrix(c( 0.199, 0.268, 0.189, 0.641, 0.092, 0.171, 0.079, -0.070, 0.244, -0.371, 0.042, -0.426, 0.358, -0.070, 0.016, 0.371, -0.435, 0.099, -0.227, -0.216, -0.441, -0.100, -0.094, -0.294, 0.087, -0.338, 0.458, 0.083, -0.515, -0.150, 0.007, 0.029, -0.473, 0.170, -0.164, 0.172, 0.296, 0.006, -0.044, 0.462, 0.407, 0.155, -0.279, -0.261, 0.198, 0.141, 0.039, -0.510, 0.101, -0.487, -0.465, 0.302, -0.117, 0.062, 0.036, 0.035, 0.145, 0.546, 0.057, -0.211, -0.123, -0.325, 0.287, 0.191, -0.274, -0.003, 0.491, -0.134, 0.271, 0.272, -0.349, -0.245, 0.290, 0.207, 0.001, -0.048, -0.250, -0.090, -0.275, 0.330, -0.134, 0.099, -0.277, -0.072, -0.180, 0.485, 0.134, 0.147, 0.006, 0.051, -0.216, 0.007, 0.008, -0.278, -0.712, 0.004, 0.320, 0.145, -0.061, 0.146, -0.078, 0.215, -0.414, 0.096, 0.061, 0.044, 0.096, -0.271, -0.273, 0.603, -0.064, 0.245 ), ncol = 8, byrow = TRUE) # Assign row and column names rownames(loading_vectors) <- row_names colnames(loading_vectors) <- col_names # Now you can use the loading_vectors variable in your code print(loading_vectors) # rank the variables ranked_variables <- variable_ranking(loading_vectors) print(ranked_variables)
# Define row and column names row_names <- c("Category1A", "Category1B", "Category1C", "Category2X", "Category2Y", "Category2Z", "Category3M", "Category3N", "Category3O", "Continuous1", "Continuous2", "Continuous3", "Continuous4", "Continuous5") col_names <- paste0("PC", 1:8) # Define the data matrix loading_vectors <- matrix(c( 0.199, 0.268, 0.189, 0.641, 0.092, 0.171, 0.079, -0.070, 0.244, -0.371, 0.042, -0.426, 0.358, -0.070, 0.016, 0.371, -0.435, 0.099, -0.227, -0.216, -0.441, -0.100, -0.094, -0.294, 0.087, -0.338, 0.458, 0.083, -0.515, -0.150, 0.007, 0.029, -0.473, 0.170, -0.164, 0.172, 0.296, 0.006, -0.044, 0.462, 0.407, 0.155, -0.279, -0.261, 0.198, 0.141, 0.039, -0.510, 0.101, -0.487, -0.465, 0.302, -0.117, 0.062, 0.036, 0.035, 0.145, 0.546, 0.057, -0.211, -0.123, -0.325, 0.287, 0.191, -0.274, -0.003, 0.491, -0.134, 0.271, 0.272, -0.349, -0.245, 0.290, 0.207, 0.001, -0.048, -0.250, -0.090, -0.275, 0.330, -0.134, 0.099, -0.277, -0.072, -0.180, 0.485, 0.134, 0.147, 0.006, 0.051, -0.216, 0.007, 0.008, -0.278, -0.712, 0.004, 0.320, 0.145, -0.061, 0.146, -0.078, 0.215, -0.414, 0.096, 0.061, 0.044, 0.096, -0.271, -0.273, 0.603, -0.064, 0.245 ), ncol = 8, byrow = TRUE) # Assign row and column names rownames(loading_vectors) <- row_names colnames(loading_vectors) <- col_names # Now you can use the loading_vectors variable in your code print(loading_vectors) # rank the variables ranked_variables <- variable_ranking(loading_vectors) print(ranked_variables)