As a data scientist, you often work with datasets organized by category, where each category is a column and the values for individual items run down the rows.

The **colMeans()** function is a quick way to find the mean value of each category. But what exactly is colMeans(), and how do you use it with a numeric matrix, an array, a data frame, or a built-in dataset? Let's find out in detail.

**colMeans in R**

**colMeans()** is a built-in (base R) function that calculates the mean of each column of a numeric matrix, array, or data frame. For a matrix or data frame, it returns a numeric vector with one mean per column, named after the columns when they have names.
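The R documentation describes colMeans() as equivalent to calling apply() with FUN = mean over the columns, only much faster. A minimal comparison on a small matrix whose values are made up for illustration:

```r
# A small example matrix (values chosen for illustration only)
m <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 3, ncol = 2)

# Built-in column means
cm <- colMeans(m)

# Same computation with apply(): MARGIN = 2 means "for each column"
am <- apply(m, 2, mean)

print(cm)                # [1]  4 10
print(all.equal(cm, am)) # [1] TRUE
```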

**Syntax**

`colMeans(x, na.rm = FALSE, dims = 1)`

**Parameters**

**x:** An array of two or more dimensions containing numeric, complex, integer, or logical values, or a numeric data frame.

**dims:** An integer specifying which dimensions are regarded as ‘**columns**’ to average over. For colMeans(), the mean is taken over dimensions 1:dims, so the default dims = 1 averages down the rows of each column.

**na.rm:** A logical argument. If **TRUE**, missing values (NA, including NaN) are omitted before each mean is computed.
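Because x may contain logical values, colMeans() on a logical matrix returns the proportion of TRUE values in each column (TRUE counts as 1, FALSE as 0). A small sketch; the scores and the pass mark of 50 are made up for illustration:

```r
# Made-up exam scores: rows are students, columns are subjects
scores <- matrix(c(40, 75, 90, 55,
                   80, 45, 30, 70),
                 nrow = 4,
                 dimnames = list(NULL, c("math", "physics")))

# Logical matrix: TRUE where a score reaches the (hypothetical) pass mark of 50
passed <- scores >= 50

# Column means of a logical matrix = share of TRUE values per column
pass_rate <- colMeans(passed)
print(pass_rate)
```

Here 3 of the 4 math scores and 2 of the 4 physics scores pass, so the rates are 0.75 and 0.5.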

**Example**

Let’s create a matrix using the matrix() function and calculate the mean of each of its columns.

```
# The values 1 to 4
rv <- rep(1:4)
# Arrange them into a 2 x 2 matrix
mtrx <- matrix(rv, 2, 2)
mtrx
cat("The mean of columns is: ", "\n")
# One mean per column
colMeans(mtrx)
```

**Output**

```
     [,1] [,2]
[1,]    1    3
[2,]    2    4
The mean of columns is:
[1] 1.5 3.5
```

The **rep()** function replicates the values of a vector (numbers or text) a specified number of times.
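Called without a times or each argument, rep(1:4) returns 1:4 unchanged, so rv in the example is simply 1 2 3 4. The times and each arguments control how the repetition happens:

```r
# rep() with no extra arguments returns its input unchanged
a <- rep(1:4)             # 1 2 3 4

# times: repeat the whole vector
b <- rep(1:2, times = 3)  # 1 2 1 2 1 2

# each: repeat every element before moving on to the next
d <- rep(1:2, each = 3)   # 1 1 1 2 2 2

print(a)
print(b)
print(d)
```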

The **matrix()** function arranges those values into a **2 x 2** matrix, filling it column by column by default.
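Passing byrow = TRUE to matrix() fills the values row by row instead of column by column, which changes which values share a column and therefore changes the column means. A short comparison:

```r
rv <- 1:4

# Default: filled column by column, so the columns are (1, 2) and (3, 4)
by_col <- matrix(rv, nrow = 2, ncol = 2)

# byrow = TRUE: filled row by row, so the columns are (1, 3) and (2, 4)
by_row <- matrix(rv, nrow = 2, ncol = 2, byrow = TRUE)

print(colMeans(by_col))  # [1] 1.5 3.5
print(colMeans(by_row))  # [1] 2 3
```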

The mean of the first column is 1.5 because (1 + 2) / 2 = 1.5; likewise, the mean of the second column is (3 + 4) / 2 = 3.5.
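The same arithmetic can be written out directly: each column mean is the column sum divided by the number of values in the column, which for a matrix without missing values is the row count. A quick check with colSums() and nrow():

```r
mtrx <- matrix(1:4, 2, 2)

# Column sums: (1 + 2) and (3 + 4)
sums <- colSums(mtrx)          # 3 7

# Divide by the number of values per column (the number of rows)
manual <- sums / nrow(mtrx)    # 1.5 3.5

print(all.equal(manual, colMeans(mtrx)))  # [1] TRUE
```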

**Calculate the mean of columns of an array in R**

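To create an array in R, use the array() function. Let’s create a three-dimensional array and use the **colMeans()** function to calculate the **mean** of its columns. Here, array(1:4, c(2, 2, 2)) recycles the values 1:4 to fill both 2 x 2 layers. With the default dims = 1, colMeans() averages over the first dimension only, so the result is a 2 x 2 matrix whose entry [j, k] is the mean of arr[, j, k].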

```
arr <- array(1:4, c(2, 2, 2))
arr
cat("The mean of columns is: ", "\n")
colMeans(arr)
```

**Output**

```
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    1    3
[2,]    2    4

The mean of columns is:
     [,1] [,2]
[1,]  1.5  1.5
[2,]  3.5  3.5
```
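Raising dims changes which dimensions count as ‘columns’. With dims = 2, colMeans() averages over the first two dimensions together, giving one mean per 2 x 2 layer. A short sketch on the same array, checked against an equivalent apply() call:

```r
arr <- array(1:4, c(2, 2, 2))

# dims = 1 (default): average over dimension 1, leaving a 2 x 2 matrix
m1 <- colMeans(arr)

# dims = 2: average over dimensions 1:2, leaving one value per layer
m2 <- colMeans(arr, dims = 2)

print(dim(m1))  # [1] 2 2
print(m2)       # [1] 2.5 2.5

# Equivalent apply() call: MARGIN = 3 keeps the layer index
print(all.equal(m2, apply(arr, 3, mean)))
```

Each layer holds the values 1, 2, 3, and 4, so both layer means are 2.5.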

**Calculate the mean of columns of a data frame in R**

To create a data frame in R, use the data.frame() function. To calculate the mean of columns of the data frame, use the **colMeans()** function.

```
x <- 2:4
y <- 2:4 * 2
z <- 2:4 * 3
w <- 2:4 * 4
df <- data.frame(x, y, z, w)
df
cat("The mean of columns of df is: ", "\n")
colMeans(df)
```

**Output**

```
  x y  z  w
1 2 4  6  8
2 3 6  9 12
3 4 8 12 16
The mean of columns of df is:
 x  y  z  w
 3  6  9 12
```
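colMeans() only accepts numeric data, so a data frame that also holds a text column (a name or label, say) makes it stop with an error. One way around this, sketched here with a made-up data frame df2 and a hypothetical id column, is to keep only the numeric columns first:

```r
df2 <- data.frame(id = c("a", "b", "c"),   # made-up text column
                  x = 2:4,
                  y = 2:4 * 2)

# colMeans(df2) would stop with an error because 'id' is not numeric.
# Keep only the numeric columns before averaging; df2[num_cols] always
# returns a data frame, even if just one column is selected.
num_cols <- sapply(df2, is.numeric)
means <- colMeans(df2[num_cols])

print(means)
```

The result contains only the numeric columns: x has mean 3 and y has mean 6.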

**Calculate the mean of columns of a data set in R**

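You can also calculate the mean of each column of a built-in dataset with the **colMeans()** function. We will use the **USArrests** dataset, which ships with R in the datasets package. All four of its columns are numeric, so colMeans() can be applied to it directly.

```
colMeans(USArrests)
```

**Output**

```
  Murder  Assault UrbanPop     Rape
   7.788  170.760   65.540   21.232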
```
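To average only some of the columns, subset the data frame first; round() then tidies the result for reporting. For example, with two of the USArrests columns:

```r
# Select columns by name, then take their means
crime_means <- colMeans(USArrests[, c("Murder", "Rape")])

# Round to one decimal place for reporting
print(round(crime_means, 1))
```

This gives 7.8 for Murder and 21.2 for Rape, matching the full output above after rounding.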

**Handling NA Values (na.rm) in colMeans() function**

One of the most common issues with the **R colMeans()** function is the presence of **NAs** (i.e., missing values) in the data. Let’s see what happens when we apply colMeans() to data with missing values.

```
x <- c(1, 2, NA, 3)
y <- c(NA, 4, 5, 6)
z <- c(7, NA, 8, 9)
w <- c(10, 11, NA, 13)
df <- data.frame(x, y, z, w)
df
cat("The mean of columns of df is: ", "\n")
colMeans(df)
```

**Output**

```
   x  y  z  w
1  1 NA  7 10
2  2  4 NA 11
3 NA  5  8 NA
4  3  6  9 13
The mean of columns of df is:
 x  y  z  w
NA NA NA NA
```

Every value in the output is **NA** because each column contains one **NA**, and with the default **na.rm = FALSE**, a single missing value makes that column's mean **NA**.
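Before dropping missing values, it helps to check how many there are. is.na() marks each missing entry with TRUE, and colSums() on that logical result counts the NAs in each column:

```r
x <- c(1, 2, NA, 3)
y <- c(NA, 4, 5, 6)
z <- c(7, NA, 8, 9)
w <- c(10, 11, NA, 13)
df <- data.frame(x, y, z, w)

# A single NA is enough to make a mean NA
print(mean(df$x))        # [1] NA

# Count missing values per column: TRUE counts as 1, FALSE as 0
na_counts <- colSums(is.na(df))
print(na_counts)
```

Each column has exactly one NA here, so na_counts is 1 for x, y, z, and w.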

Fortunately, there is an easy solution: pass **na.rm = TRUE** to **colMeans()**.

```
x <- c(1, 2, NA, 3)
y <- c(NA, 4, 5, 6)
z <- c(7, NA, 8, 9)
w <- c(10, 11, NA, 13)
df <- data.frame(x, y, z, w)
cat("The mean of columns of df is: ", "\n")
colMeans(df, na.rm = TRUE)
```

**Output**

```
The mean of columns of df is:
       x        y        z        w
 2.00000  5.00000  8.00000 11.33333
```

As you can see, colMeans() ignored the **NA** values and calculated each mean from the remaining values in the column; for example, the mean of w is (10 + 11 + 13) / 3 = 11.33333. Please note that the handling of missing values is a research topic in its own right, and simply ignoring **NA** values is usually not the best idea.
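One edge case is worth knowing: if a column contains only NA values, nothing is left to average once they are removed, so colMeans(..., na.rm = TRUE) returns NaN for that column rather than NA, consistent with mean(). A small sketch with a made-up data frame df3 whose column b is entirely missing:

```r
df3 <- data.frame(a = c(1, 2, 3),
                  b = c(NA_real_, NA_real_, NA_real_))  # made-up all-NA column

res <- colMeans(df3, na.rm = TRUE)
print(res)

# is.nan() tells this case apart from an ordinary NA
print(is.nan(res))
```

Column a averages to 2, while column b comes back as NaN.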

That's it for this tutorial on the colMeans() function in R.