Analysis in R: Commands related to data frames


R is increasingly used for web analytics, but I don't think it is widespread yet. This article introduces data frame commands that are useful if you are considering R for web analytics. Once you are comfortable with data frame operations, more advanced tasks become much easier to tackle.

The examples below use the following sample data.

###Creating Data Examples#####
Day <- c("5月07日", "5月08日", "5月09日", "5月10日", "5月11日",
         "5月12日", "5月13日", "5月14日", "5月15日")
Sales <- c(5, 2, NA, 4, NA, 5, 6, 7, 8)
Cost <- c(2000, 400, 0, 800, 0, 2000, 2200, 2400, 2600)
########

###Conversion of data into data frames#####
AnaDate <- data.frame(Day, Sales, Cost)
########

AnaDate
      Day Sales Cost
1 5月07日     5 2000
2 5月08日     2  400
3 5月09日    NA    0
4 5月10日     4  800
5 5月11日    NA    0
6 5月12日     5 2000
7 5月13日     6 2200
8 5月14日     7 2400
9 5月15日     8 2600

Introducing data frame processing functions

Check data contents: summary

summary(AnaDate)

      Day        Sales            Cost
 5月07日:1   Min.   :2.000   Min.   :   0
 5月08日:1   1st Qu.:4.500   1st Qu.: 400
 5月09日:1   Median :5.000   Median :2000
 5月10日:1   Mean   :5.286   Mean   :1378
 5月11日:1   3rd Qu.:6.500   3rd Qu.:2200
 5月12日:1   Max.   :8.000   Max.   :2600
 (Other):3   NA's   :2  
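
As a complement to summary, here is a minimal sketch of two other base R checks, str and dim, which report the column types and the shape of the data. It rebuilds the sample data so that it runs on its own. Note that from R 4.0.0 onward, data.frame() keeps strings as character vectors by default, so summary shows Length/Class/Mode for Day instead of the per-value counts above.

```r
# Rebuild the sample data from this article
Day <- paste0("5月", sprintf("%02d", 7:15), "日")
Sales <- c(5, 2, NA, 4, NA, 5, 6, 7, 8)
Cost <- c(2000, 400, 0, 800, 0, 2000, 2200, 2400, 2600)
AnaDate <- data.frame(Day, Sales, Cost)

str(AnaDate)             # column types and the first few values
shape <- dim(AnaDate)    # number of rows, then number of columns
head(AnaDate, 3)         # first three rows only
```

Checking the types first is worthwhile because functions such as mean only work on numeric columns.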

Count the missing values in the data: is.na to flag NA values and sum to total them

sum(is.na(AnaDate))

 [1] 2

#It indicates that there are two missing values in the data.
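
The total alone does not say where the missing values are. As a rough sketch using only base R, colSums gives the count per column and complete.cases identifies the affected rows. The variable names na_per_column and na_rows are my own and are not part of the original example.

```r
# Rebuild the sample data from this article
Day <- paste0("5月", sprintf("%02d", 7:15), "日")
Sales <- c(5, 2, NA, 4, NA, 5, 6, 7, 8)
Cost <- c(2000, 400, 0, 800, 0, 2000, 2200, 2400, 2600)
AnaDate <- data.frame(Day, Sales, Cost)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_per_column <- colSums(is.na(AnaDate))
print(na_per_column)

# Row numbers that contain at least one NA
na_rows <- which(!complete.cases(AnaDate))
print(na_rows)
```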

Reference specified rows or columns: bracket operator [i, j]

*i is the row and j is the column.

Extract only the rows without missing values by passing complete.cases(AnaDate) as the row index i.

AnaDate[complete.cases(AnaDate), ] 

      Day Sales Cost
1 5月07日     5 2000
2 5月08日     2  400
4 5月10日     4  800
6 5月12日     5 2000
7 5月13日     6 2200
8 5月14日     7 2400
9 5月15日     8 2600

The data is displayed with rows 3 and 5, which contain missing values, removed.
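
The [i, j] operator also accepts plain numbers, column names, and logical conditions. The following is a small sketch of those forms on the same sample data. The variable names and the cost threshold of 2000 are chosen only for illustration.

```r
# Rebuild the sample data from this article
Day <- paste0("5月", sprintf("%02d", 7:15), "日")
Sales <- c(5, 2, NA, 4, NA, 5, 6, 7, 8)
Cost <- c(2000, 400, 0, 800, 0, 2000, 2200, 2400, 2600)
AnaDate <- data.frame(Day, Sales, Cost)

second_row <- AnaDate[2, ]         # row 2, all columns (j left empty)
sales_col  <- AnaDate[, "Sales"]   # all rows, Sales column as a vector
one_cell   <- AnaDate[2, 3]        # row 2, column 3 (Cost)

# Logical condition as i, column names as j
high_cost <- AnaDate[AnaDate$Cost >= 2000, c("Day", "Cost")]
print(high_cost)
```

Leaving i or j empty means "all rows" or "all columns", which is why the comma is still required.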

Merge data frames: rbind to stack rows (vertical merge) and cbind to join columns side by side (horizontal merge)

rbind(AnaDate, AnaDate)

       Day Sales Cost
1  5月07日     5 2000
2  5月08日     2  400
3  5月09日    NA    0
4  5月10日     4  800
5  5月11日    NA    0
6  5月12日     5 2000
7  5月13日     6 2200
8  5月14日     7 2400
9  5月15日     8 2600
10 5月07日     5 2000
11 5月08日     2  400
12 5月09日    NA    0
13 5月10日     4  800
14 5月11日    NA    0
15 5月12日     5 2000
16 5月13日     6 2200
17 5月14日     7 2400
18 5月15日     8 2600

cbind(AnaDate, AnaDate) 

      Day Sales Cost     Day Sales Cost
1 5月07日     5 2000 5月07日     5 2000
2 5月08日     2  400 5月08日     2  400
3 5月09日    NA    0 5月09日    NA    0
4 5月10日     4  800 5月10日     4  800
5 5月11日    NA    0 5月11日    NA    0
6 5月12日     5 2000 5月12日     5 2000
7 5月13日     6 2200 5月13日     6 2200
8 5月14日     7 2400 5月14日     7 2400
9 5月15日     8 2600 5月15日     8 2600
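
Binding a data frame to itself is only a demonstration, and cbind in particular leaves duplicate column names. A more typical use might look like the sketch below: rbind appends a new day and cbind attaches a derived column. The extra row (5月16日, 3 sales, cost 1200) and the CostPerSale column are hypothetical values I made up for illustration.

```r
# Rebuild the sample data from this article
Day <- paste0("5月", sprintf("%02d", 7:15), "日")
Sales <- c(5, 2, NA, 4, NA, 5, 6, 7, 8)
Cost <- c(2000, 400, 0, 800, 0, 2000, 2200, 2400, 2600)
AnaDate <- data.frame(Day, Sales, Cost)

# rbind: append one more day (hypothetical values); column names must match
new_day <- data.frame(Day = "5月16日", Sales = 3, Cost = 1200)
longer <- rbind(AnaDate, new_day)

# cbind: attach a derived column; it needs one value per existing row
CostPerSale <- AnaDate$Cost / AnaDate$Sales   # NA wherever Sales is NA
wider <- cbind(AnaDate, CostPerSale)
print(wider)
```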

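Batch processing for rows or columns: apply function

Replacing mean with another function changes the processing that is applied. This example calculates the average. The second argument sets the direction: 1 applies the function to each row, and 2 applies it to each column.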

#Processing for rows (MARGIN = 1)
apply(AnaDate[, 2:ncol(AnaDate)], 1, mean, na.rm=TRUE)
[1] 1002.5  201.0    0.0  402.0    0.0 1002.5 1103.0 1203.5 1304.0

#Processing for columns (MARGIN = 2)
apply(AnaDate[, 2:ncol(AnaDate)], 2, mean, na.rm=TRUE)
      Sales        Cost
   5.285714 1377.777778 
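
To illustrate swapping the function, the sketch below applies max in place of mean and compares apply with the base R shortcut colMeans. In apply, a second argument of 1 works row by row and 2 works column by column. The variable names are my own.

```r
# Rebuild the sample data from this article
Day <- paste0("5月", sprintf("%02d", 7:15), "日")
Sales <- c(5, 2, NA, 4, NA, 5, 6, 7, 8)
Cost <- c(2000, 400, 0, 800, 0, 2000, 2200, 2400, 2600)
AnaDate <- data.frame(Day, Sales, Cost)

metrics <- AnaDate[, c("Sales", "Cost")]   # numeric columns only

# Column by column (second argument 2)
col_mean_apply <- apply(metrics, 2, mean, na.rm = TRUE)
col_mean_short <- colMeans(metrics, na.rm = TRUE)   # same result, dedicated function
col_max        <- apply(metrics, 2, max, na.rm = TRUE)

# Row by row (second argument 1)
row_mean <- apply(metrics, 1, mean, na.rm = TRUE)

print(col_max)
```

For plain sums and means, colMeans, rowMeans, colSums, and rowSums are usually simpler. apply is most useful when you need a function that has no dedicated shortcut.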

I hope this makes your analysis a little easier!
