| Title: | Method for Clustering Partially Observed Data |
|---|---|
| Description: | Software for k-means clustering of partially observed data from Chi, Chi, and Baraniuk (2016) <doi:10.1080/00031305.2015.1086685>. |
| Authors: | Jocelyn T. Chi [aut, cre], Eric C. Chi [aut, ctb], Richard G. Baraniuk [aut] |
| Maintainer: | Jocelyn T. Chi <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1 |
| Built: | 2026-05-31 10:29:24 UTC |
| Source: | https://github.com/cran/kpodclustr |
assign_clustpp Function for assigning clusters to rows in a matrix
assign_clustpp(X, init_centers, kmpp_flag = TRUE, max_iter = 20)
| Argument | Description |
|---|---|
| X | Data matrix containing missing entries whose rows are observations and columns are features |
| init_centers | Centers for initializing k-means |
| kmpp_flag | (Optional) Indicator for whether or not to initialize with k-means++ |
| max_iter | (Optional) Maximum number of iterations |
Jocelyn T. Chi
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
Orig <- Data$Orig
clusts <- assign_clustpp(Orig, k)
findMissing Function for finding indices of missing data in a matrix
findMissing(X)
| Argument | Description |
|---|---|
| X | Data matrix containing missing entries whose rows are observations and columns are features |
A numeric vector containing indices of the missing entries in X
Jocelyn T. Chi
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
missing <- findMissing(X)
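To illustrate what "indices of the missing entries" can look like, the base R sketch below locates missing values in a toy matrix. It assumes missing entries are coded as NA and reports linear (column-major) indices; whether findMissing returns exactly this form is an assumption, not something stated above.

```r
# Toy 3 x 2 matrix with two missing entries (coded as NA by assumption).
# matrix() fills column by column: column 1 is (1, NA, 3), column 2 is (4, 5, NA).
X_toy <- matrix(c(1, NA, 3,
                  4, 5, NA), nrow = 3, ncol = 2)

# Linear (column-major) indices of the NA entries.
missing_idx <- which(is.na(X_toy))

# The same positions expressed as (row, column) pairs.
missing_rc <- arrayInd(missing_idx, dim(X_toy))
```

Linear indices are convenient because they can be used directly for subsetting and assignment, for example `X_toy[missing_idx] <- 0`.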
initialImpute Initial imputation for k-means
initialImpute(X)
| Argument | Description |
|---|---|
| X | Data matrix containing missing entries whose rows are observations and columns are features |
A data matrix containing no missing entries
Jocelyn T. Chi
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
X_copy <- initialImpute(X)
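The fill rule used by initialImpute is not described above. As one plausible starting point, the sketch below replaces each missing entry with the mean of the observed values in its column. The helper name `impute_col_means` is hypothetical, and column-mean filling is an assumption made for illustration only.

```r
# Hypothetical helper (not part of the package): fill each NA with the
# mean of the observed values in the same column.
# A column with no observed values would be filled with NaN.
impute_col_means <- function(X) {
  col_means <- colMeans(X, na.rm = TRUE)
  # (row, col) positions of the missing entries.
  na_pos <- which(is.na(X), arr.ind = TRUE)
  X[na_pos] <- col_means[na_pos[, "col"]]
  X
}

# Column 1 is (1, NA, 3) with observed mean 2;
# column 2 is (4, 6, NA) with observed mean 5.
X_toy <- matrix(c(1, NA, 3,
                  4, 6, NA), nrow = 3, ncol = 2)
X_filled <- impute_col_means(X_toy)
```

Observed entries are left untouched, so the result differs from the input only at the positions that were missing.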
kmpp Computes initial centroids via kmeans++
kmpp(X, k)
| Argument | Description |
|---|---|
| X | Data matrix whose rows are observations and columns are features |
| k | Number of clusters |
A data matrix whose rows contain initial centroids for the k clusters
n <- 10
p <- 2
X <- matrix(rnorm(n*p),n,p)
k <- 3
kmpp(X,k)
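For readers unfamiliar with k-means++ seeding, the sketch below shows the general idea in base R: the first centroid is a row drawn uniformly at random, and each later centroid is a row drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The helper `kmpp_sketch` is hypothetical and is not the package's kmpp; it assumes a complete numeric matrix with at least k distinct rows.

```r
# Hypothetical helper (not the package's kmpp): k-means++ seeding.
kmpp_sketch <- function(X, k) {
  n <- nrow(X)
  centers <- matrix(NA_real_, nrow = k, ncol = ncol(X))
  # First centroid: one row chosen uniformly at random.
  centers[1, ] <- X[sample.int(n, 1), ]
  # Squared distance from every row to its nearest chosen centroid.
  d2 <- rowSums(sweep(X, 2, centers[1, ])^2)
  for (j in seq_len(k - 1) + 1) {
    # Next centroid: sample a row with probability proportional to d2.
    # Rows already chosen have d2 = 0 and cannot be drawn again.
    idx <- sample.int(n, 1, prob = d2)
    centers[j, ] <- X[idx, ]
    d2 <- pmin(d2, rowSums(sweep(X, 2, centers[j, ])^2))
  }
  centers
}

set.seed(1)
X_demo <- matrix(rnorm(20), nrow = 10, ncol = 2)
init <- kmpp_sketch(X_demo, 3)

# stats::kmeans accepts a matrix of starting centroids via `centers`.
km <- kmeans(X_demo, centers = init)
```

Spreading the starting centroids apart in this way tends to reduce the chance that two of them start inside the same cluster.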
kpod Function for performing k-POD, a method for k-means clustering on partially observed data
kpod(X, k, kmpp_flag = TRUE, maxiter = 100)
| Argument | Description |
|---|---|
| X | Data matrix containing missing entries whose rows are observations and columns are features |
| k | Number of clusters |
| kmpp_flag | (Optional) Indicator for whether or not to initialize with k-means++ |
| maxiter | (Optional) Maximum number of iterations |
cluster: Clustering assignment obtained with k-POD
cluster_list: List containing clustering assignments obtained in each iteration
obj_vals: List containing the value of the k-means objective function in each iteration
fit: Fit of clustering assignment obtained with k-POD (calculated as 1-(total withinss/totss))
fit_list: List containing fit of clustering assignment obtained in each iteration
Jocelyn T. Chi
p <- 5
n <- 200
k <- 3
sigma <- 0.15
missing <- 0.20
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
Orig <- Data$Orig
truth <- Data$truth
kpod_result <- kpod(X,k)
kpodclusters <- kpod_result$cluster
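The general shape of a k-POD-style iteration can be sketched in base R: fill the missing entries, run k-means on the filled matrix, overwrite the missing entries with the matching coordinates of each row's assigned centroid, and repeat until the assignments stop changing. The helper `kpod_sketch` below is hypothetical and is not the package's kpod. The column-mean initial fill, the warm start from the previous centroids, and the stopping rule are all assumptions made for illustration. The fit value follows the formula quoted above, 1-(total withinss/totss).

```r
# Hypothetical sketch of a k-POD-style loop (not the package's kpod()).
# Assumes maxiter >= 1 and that every column has at least one observed value.
kpod_sketch <- function(X, k, maxiter = 100) {
  miss <- is.na(X)
  # Initial fill: column means of the observed entries (an assumption).
  X_fill <- X
  col_means <- colMeans(X, na.rm = TRUE)
  X_fill[miss] <- col_means[col(X)[miss]]

  centers <- k          # first pass: let kmeans pick k random rows
  cluster <- NULL
  for (iter in seq_len(maxiter)) {
    km <- kmeans(X_fill, centers = centers, iter.max = 50)
    # Overwrite each missing entry with the matching coordinate of the
    # centroid its row is currently assigned to.
    X_fill[miss] <- km$centers[km$cluster, , drop = FALSE][miss]
    # Stop once the assignments no longer change.
    if (!is.null(cluster) && all(km$cluster == cluster)) break
    cluster <- km$cluster
    centers <- km$centers  # warm start keeps cluster labels aligned
  }
  fit <- 1 - km$tot.withinss / km$totss
  list(cluster = km$cluster, fit = fit, iterations = iter)
}

# Toy data: three separated groups in two dimensions, 12 of 120 entries masked.
set.seed(1)
n_demo <- 60
k_demo <- 3
means <- matrix(c(0, 0, 5, 5, 0, 5), nrow = k_demo, byrow = TRUE)
truth_demo <- rep(1:k_demo, each = n_demo / k_demo)
X_obs <- means[truth_demo, ] + matrix(rnorm(n_demo * 2, sd = 0.3), n_demo, 2)
X_obs[sample(n_demo * 2, 12)] <- NA

res <- kpod_sketch(X_obs, k_demo)
```

Because the observed entries are never modified, only the filled-in values move between iterations, which is what lets the loop settle once the assignments stabilize.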
makeData Function for making test data
makeData(p, n, k, sigma, missing, seed = 12345)
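| Argument | Description |
|---|---|
| p | Number of features (or variables) |
| n | Number of observations |
| k | Number of clusters |
| sigma | Variance |
| missing | Desired missingness percentage, given as a fraction (for example, 0.05 as in the example for this function) |
| seed | (Optional) Seed (default seed is 12345) |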
Jocelyn T. Chi
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
X <- makeData(p,n,k,sigma,missing)$Orig
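To make the roles of the arguments concrete, the sketch below generates clustered test data and then masks a fraction of its entries. The helper `make_data_sketch` is hypothetical and is not the package's makeData: the Gaussian cluster centers, the uniform masking, and the treatment of `sigma` as a variance (following the argument description above) are assumptions. The returned element names `Orig`, `Missing`, and `truth` mirror those used in the examples in this manual.

```r
# Hypothetical generator (not the package's makeData()).
make_data_sketch <- function(p, n, k, sigma, missing, seed = 12345) {
  set.seed(seed)
  # k cluster centers in p dimensions (standard normal by assumption).
  centers <- matrix(rnorm(k * p), nrow = k, ncol = p)
  # True cluster label of each observation.
  truth <- sample.int(k, n, replace = TRUE)
  # Observations: assigned center plus noise with variance sigma.
  noise <- sqrt(sigma) * matrix(rnorm(n * p), nrow = n, ncol = p)
  Orig <- centers[truth, , drop = FALSE] + noise
  # Mask a fraction `missing` of all n * p entries, chosen uniformly.
  Missing <- Orig
  n_missing <- round(missing * n * p)
  Missing[sample.int(n * p, n_missing)] <- NA
  list(Orig = Orig, Missing = Missing, truth = truth)
}

demo <- make_data_sketch(p = 2, n = 100, k = 3, sigma = 0.25, missing = 0.05)
```

Fixing the seed inside the generator makes repeated calls with the same arguments return the same data, which is useful when comparing clustering runs.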