using DataFrames16 Reshape
16.1 Long to wide: unstack()
Example 1:
dat_long = DataFrame(ID = repeat([1:3;], 3),
Fruit = ["mango", "mango", "mango",
"banana", "banana", "banana",
"tangerine", "tangerine", "tangerine"],
Score = [1.1, 2.1, 3.1, 1.2, 2.2, 3.2, 1.3, 2.3, 3.3])9×3 DataFrame
| Row | ID | Fruit | Score |
|---|---|---|---|
| Int64 | String | Float64 | |
| 1 | 1 | mango | 1.1 |
| 2 | 2 | mango | 2.1 |
| 3 | 3 | mango | 3.1 |
| 4 | 1 | banana | 1.2 |
| 5 | 2 | banana | 2.2 |
| 6 | 3 | banana | 3.2 |
| 7 | 1 | tangerine | 1.3 |
| 8 | 2 | tangerine | 2.3 |
| 9 | 3 | tangerine | 3.3 |
Reshape a DataFrame from wide to long format using unstack():
Usage: unstack(df, colkey, value)
unstack(dat_long, :Fruit, :Score)3×4 DataFrame
| Row | ID | mango | banana | tangerine |
|---|---|---|---|---|
| Int64 | Float64? | Float64? | Float64? | |
| 1 | 1 | 1.1 | 1.2 | 1.3 |
| 2 | 2 | 2.1 | 2.2 | 2.3 |
| 3 | 3 | 3.1 | 3.2 | 3.3 |
Example 2:
dat_long = DataFrame(
Account_ID = [8001, 8002, 8003, 8004, 8001, 8002, 8003, 8004,
8001, 8002, 8003, 8004, 8001, 8002, 8003, 8004],
Age = [67.8017038366664, 42.9198507293701, 46.2301756642422,
39.665983196671, 67.8017038366664, 42.9198507293701,
46.2301756642422, 39.665983196671, 67.8017038366664,
42.9198507293701, 46.2301756642422, 39.665983196671,
67.8017038366664, 42.9198507293701, 46.2301756642422,
39.665983196671],
Admission = ["ED", "Planned", "Planned", "ED", "ED", "Planned",
"Planned", "ED", "ED", "Planned", "Planned", "ED", "ED", "Planned",
"Planned", "ED"],
Lab_key = ["RBC", "RBC", "RBC", "RBC", "WBC", "WBC", "WBC", "WBC",
"Hematocrit", "Hematocrit", "Hematocrit", "Hematocrit",
"Hemoglobin", "Hemoglobin", "Hemoglobin", "Hemoglobin"],
Lab_value = [4.63449321082268, 3.34968550627897, 4.27037213597765,
4.93897736897793, 8374.22887757195, 7612.37380499927,
8759.27855519425, 6972.28096216548, 36.272693147236,
40.5716317809522, 39.9888624177955, 39.8786884058422,
12.6188444991545, 12.1739747363806, 15.1293426442183,
14.8885696185238]
)16×5 DataFrame
| Row | Account_ID | Age | Admission | Lab_key | Lab_value |
|---|---|---|---|---|---|
| Int64 | Float64 | String | String | Float64 | |
| 1 | 8001 | 67.8017 | ED | RBC | 4.63449 |
| 2 | 8002 | 42.9199 | Planned | RBC | 3.34969 |
| 3 | 8003 | 46.2302 | Planned | RBC | 4.27037 |
| 4 | 8004 | 39.666 | ED | RBC | 4.93898 |
| 5 | 8001 | 67.8017 | ED | WBC | 8374.23 |
| 6 | 8002 | 42.9199 | Planned | WBC | 7612.37 |
| 7 | 8003 | 46.2302 | Planned | WBC | 8759.28 |
| 8 | 8004 | 39.666 | ED | WBC | 6972.28 |
| 9 | 8001 | 67.8017 | ED | Hematocrit | 36.2727 |
| 10 | 8002 | 42.9199 | Planned | Hematocrit | 40.5716 |
| 11 | 8003 | 46.2302 | Planned | Hematocrit | 39.9889 |
| 12 | 8004 | 39.666 | ED | Hematocrit | 39.8787 |
| 13 | 8001 | 67.8017 | ED | Hemoglobin | 12.6188 |
| 14 | 8002 | 42.9199 | Planned | Hemoglobin | 12.174 |
| 15 | 8003 | 46.2302 | Planned | Hemoglobin | 15.1293 |
| 16 | 8004 | 39.666 | ED | Hemoglobin | 14.8886 |
Usage: unstack(df, rowky, colkey, value)
unstack(dat_long, [:Account_ID, :Age, :Admission], :Lab_key, :Lab_value)4×7 DataFrame
| Row | Account_ID | Age | Admission | RBC | WBC | Hematocrit | Hemoglobin |
|---|---|---|---|---|---|---|---|
| Int64 | Float64 | String | Float64? | Float64? | Float64? | Float64? | |
| 1 | 8001 | 67.8017 | ED | 4.63449 | 8374.23 | 36.2727 | 12.6188 |
| 2 | 8002 | 42.9199 | Planned | 3.34969 | 7612.37 | 40.5716 | 12.174 |
| 3 | 8003 | 46.2302 | Planned | 4.27037 | 8759.28 | 39.9889 | 15.1293 |
| 4 | 8004 | 39.666 | ED | 4.93898 | 6972.28 | 39.8787 | 14.8886 |
16.2 Wide to long: stack()
Example 1:
dat_wide = DataFrame(ID = [1, 2, 3],
mango = [1.1, 2.1, 3.1],
banana = [1.2, 2.2, 3.2],
tangerine = [1.3, 2.3, 3.3])3×4 DataFrame
| Row | ID | mango | banana | tangerine |
|---|---|---|---|---|
| Int64 | Float64 | Float64 | Float64 | |
| 1 | 1 | 1.1 | 1.2 | 1.3 |
| 2 | 2 | 2.1 | 2.2 | 2.3 |
| 3 | 3 | 3.1 | 3.2 | 3.3 |
Usage: stack(df, [measure_vars], [id_vars])
If no measure_vars are defined, defaults to all floating point columns.
If no id_vars are defined, defaults to all columns not in measure_vars
dat_wide2long = stack(dat_wide)9×3 DataFrame
| Row | ID | variable | value |
|---|---|---|---|
| Int64 | String | Float64 | |
| 1 | 1 | mango | 1.1 |
| 2 | 2 | mango | 2.1 |
| 3 | 3 | mango | 3.1 |
| 4 | 1 | banana | 1.2 |
| 5 | 2 | banana | 2.2 |
| 6 | 3 | banana | 3.2 |
| 7 | 1 | tangerine | 1.3 |
| 8 | 2 | tangerine | 2.3 |
| 9 | 3 | tangerine | 3.3 |
stack(dat_wide, 2:4, 1)9×3 DataFrame
| Row | ID | variable | value |
|---|---|---|---|
| Int64 | String | Float64 | |
| 1 | 1 | mango | 1.1 |
| 2 | 2 | mango | 2.1 |
| 3 | 3 | mango | 3.1 |
| 4 | 1 | banana | 1.2 |
| 5 | 2 | banana | 2.2 |
| 6 | 3 | banana | 3.2 |
| 7 | 1 | tangerine | 1.3 |
| 8 | 2 | tangerine | 2.3 |
| 9 | 3 | tangerine | 3.3 |
stack(dat_wide, [:mango, :banana, :tangerine], :ID)9×3 DataFrame
| Row | ID | variable | value |
|---|---|---|---|
| Int64 | String | Float64 | |
| 1 | 1 | mango | 1.1 |
| 2 | 2 | mango | 2.1 |
| 3 | 3 | mango | 3.1 |
| 4 | 1 | banana | 1.2 |
| 5 | 2 | banana | 2.2 |
| 6 | 3 | banana | 3.2 |
| 7 | 1 | tangerine | 1.3 |
| 8 | 2 | tangerine | 2.3 |
| 9 | 3 | tangerine | 3.3 |
Example 2:
dat_wide = DataFrame(
Account_ID = [8001, 8002, 8003, 8004],
Age = [67.8017038366664, 42.9198507293701, 46.2301756642422,
39.665983196671],
RBC = [4.63449321082268, 3.34968550627897, 4.27037213597765,
4.93897736897793],
WBC = [8374.22887757195, 7612.37380499927, 8759.27855519425,
6972.28096216548],
Hematocrit = [36.272693147236, 40.5716317809522, 39.9888624177955,
39.8786884058422],
Hemoglobin = [12.618444991545, 12.1739747363806, 15.1293426442183,
14.8885696185238],
Admission = ["ED", "Planned", "Planned", "Planned"]
)4×7 DataFrame
| Row | Account_ID | Age | RBC | WBC | Hematocrit | Hemoglobin | Admission |
|---|---|---|---|---|---|---|---|
| Int64 | Float64 | Float64 | Float64 | Float64 | Float64 | String | |
| 1 | 8001 | 67.8017 | 4.63449 | 8374.23 | 36.2727 | 12.6184 | ED |
| 2 | 8002 | 42.9199 | 3.34969 | 7612.37 | 40.5716 | 12.174 | Planned |
| 3 | 8003 | 46.2302 | 4.27037 | 8759.28 | 39.9889 | 15.1293 | Planned |
| 4 | 8004 | 39.666 | 4.93898 | 6972.28 | 39.8787 | 14.8886 | Planned |
Usage: stack(df, measures_vars, id_vars; variable_name, value_name)
stack(dat_wide,
[:RBC, :WBC, :Hematocrit, :Hemoglobin],
[:Account_ID, :Age, :Admission],
variable_name = :Lab, value_name = :Value)16×5 DataFrame
| Row | Account_ID | Age | Admission | Lab | Value |
|---|---|---|---|---|---|
| Int64 | Float64 | String | String | Float64 | |
| 1 | 8001 | 67.8017 | ED | RBC | 4.63449 |
| 2 | 8002 | 42.9199 | Planned | RBC | 3.34969 |
| 3 | 8003 | 46.2302 | Planned | RBC | 4.27037 |
| 4 | 8004 | 39.666 | Planned | RBC | 4.93898 |
| 5 | 8001 | 67.8017 | ED | WBC | 8374.23 |
| 6 | 8002 | 42.9199 | Planned | WBC | 7612.37 |
| 7 | 8003 | 46.2302 | Planned | WBC | 8759.28 |
| 8 | 8004 | 39.666 | Planned | WBC | 6972.28 |
| 9 | 8001 | 67.8017 | ED | Hematocrit | 36.2727 |
| 10 | 8002 | 42.9199 | Planned | Hematocrit | 40.5716 |
| 11 | 8003 | 46.2302 | Planned | Hematocrit | 39.9889 |
| 12 | 8004 | 39.666 | Planned | Hematocrit | 39.8787 |
| 13 | 8001 | 67.8017 | ED | Hemoglobin | 12.6184 |
| 14 | 8002 | 42.9199 | Planned | Hemoglobin | 12.174 |
| 15 | 8003 | 46.2302 | Planned | Hemoglobin | 15.1293 |
| 16 | 8004 | 39.666 | Planned | Hemoglobin | 14.8886 |