4 DataFrames.jl - 4.8 Groupby and Combine - 《Julia Data Science》

The strategy is to split the dataset into distinct students, apply the mean function to each student, and combine the result.

The split step is called groupby; as its second argument we give the column by which we want to split the dataset:

groupby(all_grades(), :name)

To apply the mean function to each group, pass the grouped data to the combine function.
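As a minimal sketch of this step, assume a small hypothetical table with the columns :name and :grade standing in for all_grades(). The column name :grade and all values are assumptions and are not taken from the text. The mean function comes from the Statistics standard library:

```julia
using DataFrames
using Statistics  # mean is defined in the Statistics standard library

# Hypothetical stand-in for all_grades(); names and grades are made up.
grades = DataFrame(
    name = ["Ann", "Ben", "Ann", "Ben", "Cy"],
    grade = [6.0, 5.0, 8.0, 9.5, 7.0],
)

# Split: one group per distinct value in the :name column.
gdf = groupby(grades, :name)

# Apply and combine: compute the mean grade per group and gather
# the per-group results into a single DataFrame.
result = combine(gdf, :grade => mean)
```

With the default settings, the result should contain one row per student and a new column named grade_mean, built from the source column name and the function name.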

Imagine having to do this without the groupby and combine functions. We would need to loop over our data to split it up into groups, then loop over each split to apply a function, and finally loop over each group to gather the final result. Therefore, the split-apply-combine technique is a great one to know.
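To make the comparison concrete, here is one possible way to write those three loops by hand, next to the two-call version. This is only a sketch: the table, its column names, and the helper variables groups, means, and manual are hypothetical and chosen for illustration.

```julia
using DataFrames
using Statistics

# Hypothetical grades table; names and values are made up.
grades = DataFrame(
    name = ["Ann", "Ben", "Ann", "Ben", "Cy"],
    grade = [6.0, 5.0, 8.0, 9.5, 7.0],
)

# Split: collect the grades of each student in a dictionary.
groups = Dict{String, Vector{Float64}}()
for row in eachrow(grades)
    push!(get!(groups, row.name, Float64[]), row.grade)
end

# Apply: compute the mean of each group.
means = Dict(name => mean(vals) for (name, vals) in groups)

# Combine: gather the per-group results into one table.
manual = DataFrame(
    name = collect(keys(means)),
    grade_mean = collect(values(means)),
)

# The same computation with groupby and combine.
short = combine(groupby(grades, :name), :grade => mean)
```

Both tables hold the same rows, although the manual version gives no guarantee about row order because it iterates over a Dict.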

using DataFrames
group = [:A, :A, :B, :B]
X = 1:4
Y = 5:8  # assumed values; the definition of Y is missing from the source
df = DataFrame(; group, X, Y)

Applying a function to multiple source columns is accomplished in a similar manner. The only difference is the dot . operator before the right arrow =>, which indicates that the function has to be applied to each of the source columns [:X, :Y]:

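using Statistics  # provides mean
rounded_mean(col) = round(Int, mean(col))  # assumed definition: the mean rounded to an integer
gdf = groupby(df, :group)
combine(gdf, [:X, :Y] .=> rounded_mean; renamecols=false)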
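To see what the dot contributes on its own, the following sketch builds the broadcasted pairs separately and compares them with the written-out form. It uses plain mean rather than rounded_mean, and the values of Y as well as the variable names col_pairs, broadcasted, and spelled_out are assumptions made for illustration.

```julia
using DataFrames
using Statistics

# Broadcasting => over the column names yields one pair per column.
col_pairs = [:X, :Y] .=> mean
# col_pairs is the same as [:X => mean, :Y => mean]

# Example table; the Y values are assumed.
df = DataFrame(group = [:A, :A, :B, :B], X = 1:4, Y = 5:8)
gdf = groupby(df, :group)

# Passing the broadcasted vector of pairs ...
broadcasted = combine(gdf, col_pairs; renamecols=false)

# ... gives the same table as listing every pair by hand.
spelled_out = combine(gdf, :X => mean, :Y => mean; renamecols=false)
```

The broadcast form scales better when the list of source columns is long or is computed at run time, since only the vector of column names has to change.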