Analysis in R: Impute missing values with chained random forests, “missRanger” package


Introducing the “missRanger” package, which imputes missing values using chained random forests. The package is useful both for generating missing values in data and for imputing them.

Package version is 2.2.0. Checked with R version 4.2.2.


Install Package

Run the following command.

#Install Package
install.packages("missRanger")


See the command and package help for details.

#Loading the library
library("missRanger")

###Create Data#####
n <- 10
TestData <- data.frame(Group = sample(paste0("Group ", 1:2), n, replace = TRUE),
                       Time_1 = round(rnorm(n) - 1.5, 2),
                       Time_2 = round(rnorm(n), 2),
                       Time_3 = round(rnorm(n) - 1.5, 2))
TestData[1:4, 2:4] <- sample(1:2, 12, replace = TRUE)

#Insert missing values into data: generateNA command
#Specify data: x option; a vector, matrix, or data.frame can be specified
#Proportion of values to set missing per column: p option; between 0 and 1, default 0.1
#Set seed: seed option
ResultData <- generateNA(x = TestData, p = 0.3, seed = 1234)

#     Group Time_1 Time_2 Time_3
#1  Group 2   2.00   2.00   2.00
#2  Group 2   1.00     NA   1.00
#3  Group 2   1.00   2.00   1.00
#4  Group 2   1.00     NA   2.00
#5     <NA>     NA   2.42  -2.44
#6     <NA>     NA   0.13     NA
#7  Group 1  -2.50     NA     NA
#8  Group 1  -2.28  -0.44  -2.21
#9  Group 1     NA   0.46  -2.00
#10    <NA>  -0.54  -0.69     NA
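
One way to confirm what generateNA did is to count the NA values per column with base R. The sketch below rebuilds data shaped like TestData; the names demo_data, with_na and mixed_na are illustrative and not from the original. It also passes a vector to p, which the package help describes as per-column proportions when x is a data.frame.

```r
library(missRanger)

# Illustrative data shaped like TestData above (object names are assumptions)
set.seed(1)
n <- 10
demo_data <- data.frame(Group = sample(paste0("Group ", 1:2), n, replace = TRUE),
                        Time_1 = round(rnorm(n) - 1.5, 2),
                        Time_2 = round(rnorm(n), 2),
                        Time_3 = round(rnorm(n) - 1.5, 2))

# Same call as above: the proportion 0.3 is applied to every column
with_na <- generateNA(x = demo_data, p = 0.3, seed = 1234)

# Count missing values per column
colSums(is.na(with_na))

# p given per column: leave Group untouched, vary the Time columns
mixed_na <- generateNA(x = demo_data, p = c(0, 0.2, 0.5, 0.8), seed = 1234)
colSums(is.na(mixed_na))
```

Counting NA values this way before imputing is also a quick check that no column has become entirely missing.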

#Impute missing values with chained random forests: missRanger command
#Specify data: data option
#Specify variables to impute (left) ~ variables used to impute (right): formula option
#For example, to impute all variables without using Group as a predictor, use . ~ . - Group
#Number of candidate values for predictive mean matching: pmm.k option; 0 disables it
#Number of trees per forest: num.trees option; passed on to ranger
#Display the process: verbose option; 0: hide, 1: show progress bar,
#2: show OOB prediction error per iteration and variable
missRanger(data = ResultData,
           formula = .~. -Group, pmm.k = 3,
           num.trees = 100, verbose = 2)
#Missing value imputation by random forests
#Variables to impute:		Group, Time_1, Time_2, Time_3
#Variables used to impute:	Time_1, Time_2, Time_3
#Group	Time_1	Time_2	Time_3
#iter 1:	1.0000 	1.0000 	0.9862 	1.6359 	
#     Group Time_1 Time_2 Time_3
#1  Group 2   2.00   2.00   2.00
#2  Group 2   1.00   2.42   1.00
#3  Group 2   1.00   2.00   1.00
#4  Group 2   1.00  -0.69   2.00
#5  Group 2   2.00   2.42  -2.44
#6  Group 2   2.00   0.13   1.00
#7  Group 1  -2.50  -0.44   2.00
#8  Group 1  -2.28  -0.44  -2.21
#9  Group 1   1.00   0.46  -2.00
#10 Group 2  -0.54  -0.69  -2.21
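
The call above prints the imputed data without storing it. A minimal sketch of keeping the result and checking it follows. The object names (demo_data, with_na, imputed) and the larger n are illustrative choices, not from the original. The seed option of missRanger is set here so that repeated runs give the same result.

```r
library(missRanger)

# Illustrative data shaped like TestData, with more rows (an assumption)
set.seed(1)
n <- 50
demo_data <- data.frame(Group = sample(paste0("Group ", 1:2), n, replace = TRUE),
                        Time_1 = round(rnorm(n) - 1.5, 2),
                        Time_2 = round(rnorm(n), 2),
                        Time_3 = round(rnorm(n) - 1.5, 2))
with_na <- generateNA(x = demo_data, p = 0.3, seed = 1234)

# Store the imputed data; seed fixes the random draws, verbose = 0 hides progress
imputed <- missRanger(data = with_na,
                      formula = . ~ . - Group, pmm.k = 3,
                      num.trees = 100, seed = 1234, verbose = 0)

# Check whether any missing values remain
anyNA(imputed)

# With pmm.k > 0, imputed values are drawn from the observed values of that column
observed_time_1 <- with_na$Time_1[!is.na(with_na$Time_1)]
all(imputed$Time_1 %in% observed_time_1)
```

Because predictive mean matching only reuses observed values, the imputed Time columns keep the same two-decimal rounding as the original data.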

I hope this makes your analysis a little easier!

