Title: | Method for Clustering Partially Observed Data |
---|---|
Description: | Software for k-means clustering of partially observed data, implementing the k-POD method of Chi, Chi, and Baraniuk (2016) <doi:10.1080/00031305.2015.1086685>. |
Authors: | Jocelyn T. Chi [aut, cre], Eric C. Chi [aut, ctb], Richard G. Baraniuk [aut] |
Maintainer: | Jocelyn T. Chi <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1 |
Built: | 2024-11-08 02:40:10 UTC |
Source: | https://github.com/cran/kpodclustr |
assign_clustpp
Function for assigning clusters to rows in a matrix
assign_clustpp(X, init_centers, kmpp_flag = TRUE, max_iter = 20)
X |
Data matrix containing missing entries whose rows are observations and columns are features |
init_centers |
Centers for initializing k-means |
kmpp_flag |
(Optional) Indicator for whether or not to initialize with k-means++ |
max_iter |
(Optional) Maximum number of iterations |
Jocelyn T. Chi
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
Orig <- Data$Orig
clusts <- assign_clustpp(Orig, k)
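The example above passes the number of clusters `k` as `init_centers` rather than a matrix of centers. Base R's `stats::kmeans` accepts either form for its `centers` argument, which the sketch below illustrates on synthetic data. Whether `assign_clustpp` forwards `init_centers` to `kmeans` in this way is an assumption, not something this manual states; the fit formula is the one given for `kpod`.

```r
# Sketch using base R only; not the package's implementation.
set.seed(1)
X_demo <- matrix(rnorm(200), nrow = 100, ncol = 2)

# Form 1: centers given as a number of clusters (random rows are used as starts)
fit_by_count <- kmeans(X_demo, centers = 3)

# Form 2: centers given as an explicit matrix with one row per cluster
init <- X_demo[sample.int(nrow(X_demo), 3), , drop = FALSE]
fit_by_matrix <- kmeans(X_demo, centers = init)

# Fit as defined in the kpod help page: 1 - (total withinss / totss)
fit_value <- 1 - sum(fit_by_count$withinss) / fit_by_count$totss
```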
findMissing
Function for finding indices of missing data in a matrix
findMissing(X)
X |
Data matrix containing missing entries whose rows are observations and columns are features |
A numeric vector containing indices of the missing entries in X
Jocelyn T. Chi
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
missing <- findMissing(X)
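For intuition, indices of missing entries in a matrix can be found in base R with `which(is.na(...))`, which returns linear (column-major) positions. The sketch below assumes `findMissing` returns indices of this kind; the manual only says it returns a numeric vector of indices.

```r
# Base R sketch; X_demo is a hypothetical 2 x 3 matrix filled column by column.
X_demo <- matrix(c(1, NA, 3, 4, NA, 6), nrow = 2, ncol = 3)

# Linear (column-major) indices of the missing entries
missing_idx <- which(is.na(X_demo))
# missing_idx is c(2, 5)

# Convert linear indices to (row, column) pairs if needed
missing_rc <- arrayInd(missing_idx, dim(X_demo))
# rows of missing_rc are (2, 1) and (1, 3)
```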
initialImpute
Initial imputation for k-means
initialImpute(X)
X |
Data matrix containing missing entries whose rows are observations and columns are features |
A data matrix containing no missing entries
Jocelyn T. Chi
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
X_copy <- initialImpute(X)
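The manual does not say which values `initialImpute` fills in. One simple possibility, shown here purely as an illustration, is to replace every missing entry with the mean of the observed entries. The helper name `impute_with_mean` is hypothetical and is not part of the package.

```r
# Hypothetical helper: fill missing entries with the mean of all observed entries.
# This is an assumed strategy, not necessarily what initialImpute does.
impute_with_mean <- function(X) {
  X_filled <- X
  X_filled[is.na(X)] <- mean(X, na.rm = TRUE)
  X_filled
}

X_demo <- matrix(c(1, NA, 3, 4, NA, 6), nrow = 2, ncol = 3)
X_filled <- impute_with_mean(X_demo)
# Observed entries are 1, 3, 4, 6, so both gaps are filled with 3.5
```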
kmpp
Computes initial centroids via k-means++
kmpp(X, k)
X |
Data matrix whose rows are observations and columns are features |
k |
Number of clusters. |
A data matrix whose rows contain initial centroids for the k clusters
n <- 10
p <- 2
X <- matrix(rnorm(n*p),n,p)
k <- 3
kmpp(X,k)
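As background, k-means++ seeding picks the first centroid uniformly at random and each later centroid with probability proportional to its squared distance from the nearest centroid chosen so far. The following base R sketch shows that general scheme; `kmpp_sketch` is a hypothetical name, and the package's `kmpp` may differ in its details.

```r
# Sketch of standard k-means++ seeding (not the package's own code).
kmpp_sketch <- function(X, k) {
  n <- nrow(X)
  idx <- integer(k)
  idx[1] <- sample.int(n, 1)  # first centroid: uniform over rows
  # Squared distance from every row to its nearest chosen centroid
  d2 <- rowSums(sweep(X, 2, X[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # Sample the next centroid with probability proportional to d2
    idx[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(X, 2, X[idx[j], ])^2))
  }
  X[idx, , drop = FALSE]
}

set.seed(1)
X_demo <- matrix(rnorm(20), nrow = 10, ncol = 2)
centers_demo <- kmpp_sketch(X_demo, 3)
```

Rows already chosen have squared distance zero, so they cannot be drawn again.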
kpod
Function for performing k-POD, a method for k-means clustering on partially observed data
kpod(X, k, kmpp_flag = TRUE, maxiter = 100)
X |
Data matrix containing missing entries whose rows are observations and columns are features |
k |
Number of clusters |
kmpp_flag |
(Optional) Indicator for whether or not to initialize with k-means++ |
maxiter |
(Optional) Maximum number of iterations |
cluster: Clustering assignment obtained with k-POD
cluster_list: List containing clustering assignments obtained in each iteration
obj_vals: List containing the value of the k-means objective function in each iteration
fit: Fit of clustering assignment obtained with k-POD (calculated as 1-(total withinss/totss))
fit_list: List containing fit of clustering assignment obtained in each iteration
Jocelyn T. Chi
p <- 5
n <- 200
k <- 3
sigma <- 0.15
missing <- 0.20
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
Orig <- Data$Orig
truth <- Data$truth
kpod_result <- kpod(X,k)
kpodclusters <- kpod_result$cluster
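To make the idea concrete, the general k-POD scheme alternates between running k-means on a filled-in copy of the data and overwriting the missing entries with the coordinates of each row's assigned centroid. The sketch below is a simplified base R illustration under stated assumptions (mean-based initial fill, random restarts via `nstart`, a tolerance-based stopping rule); `kpod_sketch` is a hypothetical name and this is not the package's implementation.

```r
# Simplified illustration of the k-POD idea using stats::kmeans.
kpod_sketch <- function(X, k, maxiter = 100, tol = 1e-8) {
  miss <- is.na(X)
  X_filled <- X
  X_filled[miss] <- mean(X, na.rm = TRUE)  # assumed initial fill
  obj_prev <- Inf
  for (iter in seq_len(maxiter)) {
    km <- kmeans(X_filled, centers = k, nstart = 10)
    # Centroid of the cluster each row was assigned to
    fitted <- km$centers[km$cluster, , drop = FALSE]
    # Overwrite only the missing entries; observed entries stay fixed
    X_filled[miss] <- fitted[miss]
    # Stop once the k-means objective no longer changes
    if (abs(obj_prev - km$tot.withinss) < tol) break
    obj_prev <- km$tot.withinss
  }
  list(cluster = km$cluster,
       fit = 1 - sum(km$withinss) / km$totss)
}

# Usage on synthetic, well-separated clusters with 10% of entries removed
set.seed(42)
truth_demo <- rep(1:3, each = 30)
mu <- rbind(c(0, 0), c(5, 5), c(-5, 5))
X_demo <- mu[truth_demo, ] + matrix(rnorm(180, sd = 0.3), nrow = 90, ncol = 2)
X_demo[sample.int(180, 18)] <- NA
res_demo <- kpod_sketch(X_demo, 3)
```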
makeData
Function for making test data
makeData(p, n, k, sigma, missing, seed = 12345)
p |
Number of features (or variables) |
n |
Number of observations |
k |
Number of clusters |
sigma |
Variance |
missing |
Desired missingness, given as a proportion between 0 and 1 (for example, 0.05 for 5% missing entries) |
seed |
(Optional) Seed (default seed is 12345) |
Jocelyn T. Chi
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
X <- makeData(p,n,k,sigma,missing)$Orig
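For readers who want to see what test data of this kind might look like, the sketch below generates Gaussian clusters and removes a fraction of entries at random. It is an illustration only: `make_data_sketch` is a hypothetical name, the list elements `Orig`, `Missing`, and `truth` mirror the names used in the examples in this manual, and the way noise is scaled (standard deviation `sqrt(sigma)`, since `sigma` is documented as a variance) is an assumption.

```r
# Hypothetical generator; not the package's makeData.
make_data_sketch <- function(p, n, k, sigma, missing, seed = 12345) {
  set.seed(seed)
  centers <- matrix(rnorm(k * p), nrow = k, ncol = p)  # one row per cluster
  truth <- sample.int(k, n, replace = TRUE)            # true cluster labels
  # sigma is documented as a variance, so the noise sd is sqrt(sigma) (assumed)
  noise <- sqrt(sigma) * matrix(rnorm(n * p), nrow = n, ncol = p)
  Orig <- centers[truth, , drop = FALSE] + noise
  # Remove a fraction `missing` of all entries uniformly at random
  Missing <- Orig
  n_miss <- round(missing * n * p)
  Missing[sample.int(n * p, n_miss)] <- NA
  list(Orig = Orig, Missing = Missing, truth = truth)
}

demo <- make_data_sketch(p = 2, n = 100, k = 3, sigma = 0.25, missing = 0.05)
```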