18 Aggregate
using CSV, DataFrames, StatsBase
iris = CSV.read("/Users/egenn/icloud/Data/iris.csv", DataFrame)
150×5 DataFrame
125 rows omitted
| Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | String15 |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 139 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
Clean up the column names by replacing each "." with "_":
rename!(iris, replace.(names(iris), "." => "_"))
150×5 DataFrame
125 rows omitted
| Row | Sepal_Length | Sepal_Width | Petal_Length | Petal_Width | Species |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | String15 |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 139 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
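The renaming step above can be tried in isolation. The following is a minimal sketch on a small hand-made table (the two-row `df` is hypothetical and only stands in for the iris file). `names(df)` returns the column names as strings, and the broadcast `replace.` call swaps the dot in each one. The reason this helps: a name containing a dot cannot be used with the `df.column` or `:column` shorthand and has to be written as a string, for example `df[!, "Sepal.Length"]`.

```julia
using DataFrames

# Hypothetical toy table with dotted column names, standing in for iris
df = DataFrame("Sepal.Length" => [5.1, 4.9], "Sepal.Width" => [3.5, 3.0])

# Build the new names first, then rename in place
new_names = replace.(names(df), "." => "_")
rename!(df, new_names)

# The cleaned names now work with property access
df.Sepal_Length
```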
18.1 Create a grouped DataFrame
groupby(iris, :Species)
GroupedDataFrame with 3 groups based on key: Species
First Group (50 rows): Species = "setosa"
25 rows omitted
| Row | Sepal_Length | Sepal_Width | Petal_Length | Petal_Width | Species |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | String15 |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 39 | 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 40 | 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 41 | 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 42 | 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 43 | 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 44 | 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 45 | 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 46 | 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 47 | 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 48 | 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 49 | 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 50 | 5.0 | 3.3 | 1.4 | 0.2 | setosa |
⋮
Last Group (50 rows): Species = "virginica"
25 rows omitted
| Row | Sepal_Length | Sepal_Width | Petal_Length | Petal_Width | Species |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | String15 |
| 1 | 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 2 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 3 | 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 4 | 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 5 | 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 6 | 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 7 | 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 8 | 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 9 | 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 10 | 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 11 | 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 12 | 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 13 | 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 39 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 40 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 41 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 42 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 43 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 44 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 45 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 46 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 47 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 48 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 49 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 50 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
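A grouped DataFrame can also be inspected group by group. The sketch below assumes a small made-up table (the five-row `df` is hypothetical, not the iris data) and shows three common ways to reach the groups: by position, by key value, and by iterating over key and group pairs. Each group is a view into the parent table, not a copy.

```julia
using DataFrames

# Hypothetical toy data standing in for iris
df = DataFrame(
    Species = ["setosa", "setosa", "virginica", "virginica", "virginica"],
    Sepal_Length = [5.1, 4.9, 6.3, 5.8, 7.1],
)

gdf = groupby(df, :Species)

n_groups = length(gdf)                     # number of groups
first_group = gdf[1]                       # SubDataFrame view of the first group
virginica = gdf[(Species = "virginica",)]  # look up a group by its key value

# Iterate over keys and groups together
for (key, sub) in pairs(gdf)
    println(key.Species, ": ", nrow(sub), " rows")
end
```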
18.2 Apply a function to a grouped DataFrame
combine(groupby(iris, :Species), :Sepal_Length => mean)
3×2 DataFrame
| Row | Species | Sepal_Length_mean |
|---|---|---|
| | String15 | Float64 |
| 1 | setosa | 5.006 |
| 2 | versicolor | 5.936 |
| 3 | virginica | 6.588 |
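The same `combine` call accepts several `column => function` pairs at once, and an optional third element sets the output column name. The sketch below is an illustration on a hypothetical four-row table, not the iris data. It loads `mean`, `std`, and `median` from the `Statistics` standard library, and the output names `N` and `Sepal_Length_sd` are arbitrary choices. Without an explicit name, the result column is named `<column>_<function>`, as with `Sepal_Length_mean` above.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for iris
df = DataFrame(
    Species = ["setosa", "setosa", "virginica", "virginica"],
    Sepal_Length = [5.0, 5.2, 6.0, 7.0],
    Petal_Length = [1.4, 1.6, 5.0, 6.0],
)

stats = combine(
    groupby(df, :Species),
    nrow => :N,                                      # rows per group
    :Sepal_Length => mean => :Sepal_Length_mean,     # explicit output name
    :Sepal_Length => std => :Sepal_Length_sd,
    [:Sepal_Length, :Petal_Length] .=> median,       # same function on several columns
)
```

Broadcasting `.=>` over a vector of columns is a compact way to apply one function to many columns without writing each pair by hand.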