Introducing the “missRanger” package, which imputes missing values with chained random forests. The package can both generate missing values in a dataset (generateNA) and impute them (missRanger).
Package version is 2.2.0. Checked with R version 4.2.2.
Install Package
Run the following command.
#Install Package
install.packages("missRanger")
Example
See the comments in the code and the package help for details.
#Loading the library
library("missRanger")
###Create Data#####
set.seed(1234)
n <- 10
TestData <- data.frame(Group = sample(paste0("Group ", 1:2), n, replace = TRUE),
                       Time_1 = round(rnorm(n) - 1.5, 2),
                       Time_2 = round(rnorm(n), 2),
                       Time_3 = round(rnorm(n) - 1.5, 2))
TestData[1:4, 2:4] <- sample(1:2, 12, replace = TRUE)
########
#Insert missing values into the data: generateNA function
#Specify data: x option; a vector, matrix or data.frame can be specified
#Proportion of missing values to insert per column: p option; range 0-1
#Set seed: seed option
ResultData <- generateNA(x = TestData, p = 0.3, seed = 1234)
#Result
ResultData
# Group Time_1 Time_2 Time_3
#1 Group 2 2.00 2.00 2.00
#2 Group 2 1.00 NA 1.00
#3 Group 2 1.00 2.00 1.00
#4 Group 2 1.00 NA 2.00
#5 <NA> NA 2.42 -2.44
#6 <NA> NA 0.13 NA
#7 Group 1 -2.50 NA NA
#8 Group 1 -2.28 -0.44 -2.21
#9 Group 1 NA 0.46 -2.00
#10 <NA> -0.54 -0.69 NA
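To confirm how many values generateNA replaced with NA, the missing values can be counted per column. The following is a minimal sketch using base R functions. It uses the built-in iris data instead of TestData so that it runs on its own, and the object name iris_na is only illustrative.

```r
library(missRanger)

# Replace roughly 30% of the values in each column with NA
# (iris_na is an illustrative name, not part of the package)
iris_na <- generateNA(x = iris, p = 0.3, seed = 1234)

# Number of missing values per column
colSums(is.na(iris_na))

# Proportion of missing values per column; expected to be close to p
colMeans(is.na(iris_na))
```

The same two checks can be applied to ResultData from the example above.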
#Impute missing values by chained random forests: missRanger function
#Open access:https://doi.org/10.1093/bioinformatics/btr597
#Open access:http://www.jstatsoft.org/v45/i03/
#Specify data: data option
#Specify as variables to impute (left) ~ variables used to impute (right): formula option
#For example, to impute all variables in ResultData without using Group as a predictor, use . ~ . - Group
#Number of candidate values for predictive mean matching: pmm.k option; 0 disables it
#Number of trees: num.trees option; passed on to ranger
#Display the process: verbose option; 0: hide, 1: show progress bar,
#2: show OOB prediction error per iteration and variable
missRanger(data = ResultData,
           formula = . ~ . - Group, pmm.k = 3,
           num.trees = 100, verbose = 2)
#Missing value imputation by random forests
#Variables to impute: Group, Time_1, Time_2, Time_3
#Variables used to impute: Time_1, Time_2, Time_3
#Group Time_1 Time_2 Time_3
#iter 1: 1.0000 1.0000 0.9862 1.6359
#
# Group Time_1 Time_2 Time_3
#1 Group 2 2.00 2.00 2.00
#2 Group 2 1.00 2.42 1.00
#3 Group 2 1.00 2.00 1.00
#4 Group 2 1.00 -0.69 2.00
#5 Group 2 2.00 2.42 -2.44
#6 Group 2 2.00 0.13 1.00
#7 Group 1 -2.50 -0.44 2.00
#8 Group 1 -2.28 -0.44 -2.21
#9 Group 1 1.00 0.46 -2.00
#10 Group 2 -0.54 -0.69 -2.21
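The missRanger call above does not set a random seed, so the imputed values can differ from run to run. As a hedged sketch of a reproducible run, the seed option of missRanger can be set and the result checked for remaining missing values. The built-in iris data and the object names iris_na, iris_imp and iris_imp2 are assumptions made only to keep the example self-contained.

```r
library(missRanger)

# Data with missing values (illustrative; any data.frame with NA works)
iris_na <- generateNA(x = iris, p = 0.3, seed = 1234)

# seed fixes the random numbers used for imputation; verbose = 0 hides the progress output
iris_imp <- missRanger(data = iris_na, formula = . ~ ., pmm.k = 3,
                       num.trees = 100, seed = 1234, verbose = 0)

# Remaining missing values per column; every count should be 0
colSums(is.na(iris_imp))

# A second run with the same seed is expected to reproduce the first
iris_imp2 <- missRanger(data = iris_na, formula = . ~ ., pmm.k = 3,
                        num.trees = 100, seed = 1234, verbose = 0)
identical(iris_imp, iris_imp2)
```

Assigning the result to an object, as done here, also makes it easy to pass the imputed data on to later analysis steps.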
I hope this makes your analysis a little easier!