Comma-separated values (CSV) files are are very effective way to store tables. CSV files have two advantages over other data storage files. First, it does exactly what the name indicates it does, namely storing values by separating them using commas . This acronym is also used as the file extension. So, be sure that you save your files using the “.csv” extension such as “myfile.csv.” To demonstrate how a CSV file looks, we can add the package using the Pkg REPL mode (Section 3.5.4.2):

    and load it via:

    1. using CSV

    We can now use our previous data:

    1. grades_2020()

    and read it from a file after writing it:

    1. function write_grades_csv()
    2. path = "grades.csv"
    3. CSV.write(path, grades_2020())
    4. end
    1. path = write_grades_csv()
    2. read(path, String)
    1. name,grade_2020
    2. Sally,1.0
    3. Bob,5.0
    4. Alice,8.5
    5. Hank,4.0

    Here, we also see the second benefit of CSV data format: the data can be read by using a simple text editor. This differs from many alternative data formats which require proprietary software, e.g. Excel.

    This works wonders, but what if our data contains commas , as values? If we were to naively write data with commas, it would make the files very hard to convert back to a table. Luckily, CSV.jl handles this for us automatically. Consider the following data with commas ,:

    1. function grades_with_commas()
    2. df = grades_2020()
    3. df[3, :name] = "Alice,"
    4. df
    5. end
    6. grades_with_commas()
    1. function write_comma_csv()
    2. path = "grades-commas.csv"
    3. CSV.write(path, grades_with_commas())
    4. end
    5. path = write_comma_csv()
    6. read(path, String)

    So, CSV.jl adds quotation marks " around the comma-containing values. Another common way to solve this problem is to write the data to a tab-separated values (TSV) file format. This assumes that the data doesn’t contain tabs, which holds in most cases.

    Also, note that TSV files can also be read using a simple text editor, and these files use the “.tsv” extension.

    1. function write_comma_tsv()
    2. end
    3. read(write_comma_tsv(), String)
    1. name grade_2020
    2. Sally 1.0
    3. Bob 5.0
    4. Alice, 8.5
    5. Hank 4.0

    Text file formats like CSV and TSV files can also be found that use other delimiters, such as semicolons “;” spaces “ ,” or even something as unusual as “π.”

    1. function write_space_separated()
    2. path = "grades-space-separated.csv"
    3. CSV.write(path, grades_2020(); delim=' ')
    4. end
    5. read(write_space_separated(), String)
    1. name grade_2020
    2. Sally 1.0
    3. Bob 5.0
    4. Alice 8.5
    5. Hank 4.0

    By convention, it’s still best to give files with special delimiters, such as “;” the “.csv” extension.

    Loading CSV files using CSV.jl is done in a similar way. You can use CSV.read and specify in what kind of format you want the output. We specify a DataFrame.

    1. path = write_grades_csv()
    2. CSV.read(path, DataFrame)

    Conveniently, CSV.jl will automatically infer column types for us:

    1. path = write_grades_csv()
    2. df = CSV.read(path, DataFrame)
    1. 4×2 DataFrame
    2. Row name grade_2020
    3. String7 Float64
    4. ─────┼─────────────────────
    5. 1 Sally 1.0
    6. 2 Bob 5.0
    7. 4 Hank 4.0
    1. Row a b c d e
    2. String3 Date Int64 Float64 DateTime
    3. ─────┼──────────────────────────────────────────────────────────
    4. 1 Kim 2018-02-03 3 4.0 2018-02-03T10:00:00

    These CSV basics should cover most use cases. For more information, see the and especially the CSV.File constructor docstring.

    4.1.2 Excel

    There are multiple Julia packages to read Excel files. In this book, we will only look at XLSX.jl, because it is the most actively maintained package in the Julia ecosystem that deals with Excel data. As a second benefit, XLSX.jl is written in pure Julia, which makes it easy for us to inspect and understand what’s going on under the hood.

    Load XLSX.jl via

    1. using XLSX:
    2. eachtablerow,
    3. readxlsx,
    4. writetable

    To write files, we define a little helper function for data and column names:

    1. function write_xlsx(name, df::DataFrame)
    2. path = "$name.xlsx"
    3. data = collect(eachcol(df))
    4. cols = names(df)
    5. writetable(path, data, cols)
    6. end

    Now, we can easily write the grades to an Excel file:

    1. function write_grades_xlsx()
    2. path = "grades"
    3. write_xlsx(path, grades_2020())
    4. "$path.xlsx"
    5. end

    When reading it back, we will see that XLSX.jl puts the data in a XLSXFile type and we can access the desired sheet much like a Dict:

    1. path = write_grades_xlsx()
    2. xf = readxlsx(path)
    1. XLSXFile("grades.xlsx") containing 1 Worksheet
    2. sheetname size range
    3. -------------------------------------------------
    4. Sheet1 5x2 A1:B5
    1. xf = readxlsx(write_grades_xlsx())