8 Data Input/Output
using DataFrames
8.1 Julia IO
Julia IO (the JuliaIO organization on GitHub) is a collection of packages for reading and writing different file formats in Julia.
Part of Julia IO, FileIO.jl is a package that “aims to provide a common framework for detecting file formats and dispatching to appropriate readers/writers.”
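As a minimal sketch of that dispatch, assuming both FileIO.jl and JLD2.jl are installed, save() and load() choose the backend from the file extension. The temporary path and the dictionary contents are made up for illustration.

```julia
using FileIO  # generic save()/load() front end; JLD2.jl must be installed as the backend

# Hypothetical data and a temporary path; the .jld2 extension selects the JLD2 backend
path = joinpath(mktempdir(), "example.jld2")
save(path, Dict("hello" => "world", "n" => 42))

# load() without a key returns a Dict with everything stored in the file
contents = load(path)

# load() with a key returns just that entry
n = load(path, "n")
```

The calling code never names the format explicitly, so switching to another supported format should mostly be a matter of changing the file extension and installing the matching package.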
df = DataFrame(a = randn(20), b = randn(20));
insertcols!(df, 2, :a² => df.a .^2);
insertcols!(df, :b² => df.b .^2);
first(df, 3)
| Row | a | a² | b | b² |
|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.317969 | 0.101104 | 1.03639 | 1.0741 |
| 2 | 0.0558648 | 0.00312088 | -1.93684 | 3.75135 |
| 3 | -0.0300227 | 0.000901361 | -0.579862 | 0.33624 |
8.2 CSV
CSV support in Julia is provided by CSV.jl, a high-performance package for reading and writing CSV data.
using CSV
8.2.1 Read CSV
To read a CSV file as a DataFrame, pipe CSV.File() to DataFrame():
iris = CSV.File(expanduser("~/icloud/Data/iris.csv")) |> DataFrame
| Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | String15 |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 139 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
This is the same as:
iris = DataFrame(CSV.File(expanduser("~/icloud/Data/iris.csv")))
| Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | String15 |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 139 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
Alternatively, use CSV.read() with the second argument set to the sink type:
iris = CSV.read(expanduser("~/icloud/Data/iris.csv"), DataFrame)
| Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | String15 |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 139 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
Show the first 5 rows of the DataFrame:
first(iris, 5)
| Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | String15 |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
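Real CSV files often use a different delimiter or a marker for missing values. The sketch below parses a small, made-up semicolon-delimited text from memory instead of a file, so it runs without iris.csv. `delim` and `missingstring` are CSV.jl keyword arguments; the column names and values are invented for illustration.

```julia
using CSV, DataFrames

# Made-up data: semicolon-delimited, with "NA" marking a missing value
data = """
id;height;species
1;5.1;setosa
2;NA;versicolor
3;6.3;virginica
"""

# CSV.read accepts an IO source as well as a file path
measurements = CSV.read(IOBuffer(data), DataFrame; delim = ';', missingstring = "NA")

# The "NA" entry is parsed as `missing`, so the column still holds numbers
n_missing = count(ismissing, measurements.height)
```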
8.2.2 Write CSV
CSV.write(expanduser("~/icloud/Data/df.csv"), df)
df = DataFrame(a = randn(20), b = randn(20));
insertcols!(df, 2, :a² => df.a .^2);
insertcols!(df, :b² => df.b .^2);
first(df, 3)
| Row | a | a² | b | b² |
|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.451832 | 0.204152 | 1.65199 | 2.72906 |
| 2 | 1.04743 | 1.09711 | -0.0433866 | 0.0018824 |
| 3 | -1.48649 | 2.20966 | -0.529627 | 0.280505 |
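One quick way to check a CSV.write() call is to read the file back and compare. The following is a minimal round-trip sketch that uses a small made-up DataFrame and a temporary file rather than the paths used in this chapter.

```julia
using CSV, DataFrames

# Small made-up table for the round trip
df_out = DataFrame(a = [1.5, 2.5, 3.5], b = ["x", "y", "z"])

# Write to a temporary file; CSV.write returns the path it wrote to
path = tempname() * ".csv"
CSV.write(path, df_out)

# Read the file back into a new DataFrame
df_in = CSV.read(path, DataFrame)
```

Column types are inferred again on reading, so the string column may come back as a compact string type rather than `String`, even though the values compare equal.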
8.3 Serialization
The standard library module Serialization lets you serialize() and deserialize() arbitrary data to and from a file. Use it for short-term, preferably local, I/O, as the result will likely not be interoperable between systems and/or Julia versions.
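A minimal round-trip sketch using only the standard library; the dictionary and the temporary path are made up for illustration.

```julia
using Serialization

# Any Julia value can be serialized; here a small made-up dictionary
record = Dict("name" => "example", "values" => [1.0, 2.0, 3.0])

# Write to a temporary file, then read it back in the same session
path = tempname()
serialize(path, record)
restored = deserialize(path)
```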
using Serialization
serialize(expanduser("~/icloud/Data/Julia/df"), df)
8.4 JLD2
JLD2.jl reads and writes Julia structures using a subset of HDF5 written in pure Julia.
using JLD2
To save and load data using JLD2, use the @save and @load macros.
You can save multiple Julia objects to a single file.
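As a self-contained sketch, the following saves two made-up objects to a temporary file with @save and reads them back with jldopen(), which gives access to the stored objects by name. The variable names and the path are assumptions for illustration.

```julia
using JLD2

# Two made-up objects to store in one file
x = collect(1:5)
y = Dict("alpha" => 0.1)

# @save stores each variable under its own name
path = tempname() * ".jld2"
@save path x y

# Open the file read-only and fetch the stored objects by name
loaded = jldopen(path, "r") do file
    (x = file["x"], y = file["y"])
end
```

Reading through jldopen() avoids overwriting variables of the same name in the current scope, which is what @load would do.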
8.4.1 Save JLD
df1 = DataFrame(a = 1:5, b = randn(5))
df2 = DataFrame(c = 6:10, d = randn(5))
@save expanduser("~/icloud/Data/Julia/dfs.jld") df1 df2
8.4.2 Load JLD
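# When the filename is an expression rather than a string literal, @load needs the variable names
@load expanduser("~/icloud/Data/Julia/dfs.jld") df1 df2
8.5 HDF5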
HDF5 support in Julia is provided by HDF5.jl.
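A minimal sketch of writing and reading a numeric array with HDF5.jl, assuming the package is installed. The dataset name "A" and the temporary path are made up for illustration.

```julia
using HDF5

# Made-up 2×3 matrix to store
A = reshape(collect(1.0:6.0), 2, 3)

# Create the file and write the array as a dataset named "A"
path = tempname() * ".h5"
h5open(path, "w") do file
    write(file, "A", A)
end

# Read the dataset back; h5read opens and closes the file for us
B = h5read(path, "A")
```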
8.6 Arrow
Apache Arrow format support is provided by Arrow.jl.
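A short round-trip sketch with Arrow.jl and DataFrames.jl, assuming both are installed. The table contents and the temporary path are made up.

```julia
using Arrow, DataFrames

# Small made-up table
df_out = DataFrame(id = 1:3, score = [0.5, 1.5, 2.5])

# Write the table in Arrow format
path = tempname() * ".arrow"
Arrow.write(path, df_out)

# Arrow.Table reads the file; copycols = true copies the columns
# into ordinary Julia vectors so the DataFrame can be modified
tbl = Arrow.Table(path)
df_in = DataFrame(tbl; copycols = true)
```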
8.7 RData
Support for reading R’s .RData and .rda formats is provided by RData.jl.
To write to an .RData file, it is recommended to use RCall.jl to call R from within Julia.
8.8 BSON
Support for BSON files is provided by BSON.jl.
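A minimal sketch with BSON.jl, assuming the package is installed: bson() writes a dictionary with Symbol keys, and BSON.load() reads it back. The contents and the temporary path are made up for illustration.

```julia
using BSON

# Made-up document; BSON.jl expects Symbol keys at the top level
doc = Dict(:name => "example", :values => [1.0, 2.0, 3.0])

# Write the dictionary to a temporary .bson file
path = tempname() * ".bson"
bson(path, doc)

# BSON.load returns a Dict keyed by Symbol
restored = BSON.load(path)
```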
8.9 MAT
Support for reading and writing MATLAB .mat files is provided by MAT.jl.
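A minimal sketch with MAT.jl, assuming the package is installed: matwrite() takes a dictionary that maps variable names to values, and matread() returns such a dictionary. The matrix and the temporary path are made up for illustration.

```julia
using MAT

# Made-up matrix to store under the MATLAB variable name "A"
A = [1.0 2.0; 3.0 4.0]

# Write a .mat file; keys must be valid MATLAB variable names
path = tempname() * ".mat"
matwrite(path, Dict("A" => A))

# matread returns a Dict mapping variable names to their values
vars = matread(path)
```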