In Section 4.4, we saw that select can take one or more source columns and put it into one or more target columns source => target. Also to recap here’s an example: select(df, :name => :people_names).

    In this section, we discuss how to transform variables, that is, how to modify data. In DataFrames.jl, the syntax is source => transformation => target.

    Like before, we use the grades_2020 dataset:

    Suppose we want to increase all the grades in grades_2020 by 1. First, we define a function that takes as argument a vector of data and returns all of its elements increased by 1. Then we use the transform function from DataFrames.jl that, like all native DataFrames.jl’s functions, takes a DataFrame as first argument followed by the transformation syntax:

    1. transform(grades_2020(), :grade_2020 => plus_one)
    namegrade_2020grade_2020_plus_one
    Sally1.02.0
    Bob5.06.0
    Alice8.59.5
    Hank4.05.0

    Like we said above, the DataFrames.jl minilanguage is always source => transformation => target. So, if we want to keep the naming of the target column in the output, we can do:

    1. transform(grades_2020(), :grade_2020 => plus_one => :grade_2020)

    We can also use the keyword argument renamecols=false:

    namegrade_2020
    Sally2.0
    Bob6.0
    Alice9.5
    Hank5.0

    The same transformation can also be written with select as follows:

    1. select(grades_2020(), :, :grade_2020 => plus_one => :grade_2020)

    where the : means “select all the columns” as described in Section . Alternatively, you can also use Julia’s broadcasting and modify the column by accessing it with df.grade_2020:

    1. df.grade_2020 = plus_one.(df.grade_2020)
    2. df
    namegrade_2020
    Sally2.0
    Bob6.0
    Alice9.5
    Hank5.0

    To show how to transform two columns at the same time, we use the left joined data from Section :

    With this, we can add a column saying whether someone was approved by the criterion that one of their grades was above 5.5:

    1. pass(A, B) = [5.5 < a || 5.5 < b for (a, b) in zip(A, B)]
    2. transform(leftjoined, [:grade_2020, :grade_2021] => pass; renamecols=false)
    namegrade_2020grade_2021grade_2020_grade_2021
    Sally1.09.5true
    Hank4.06.0true
    Bob5.0missingmissing
    Alice8.5missingtrue

    We can clean up the outcome and put the logic in a function to get a list of all the approved students:

    1. function only_pass()
    2. pass(A, B) = [5.5 < a || 5.5 < b for (a, b) in zip(A, B)]
    3. leftjoined = transform(leftjoined, [:grade_2020, :grade_2021] => pass => :pass)
    4. passed = subset(leftjoined, :pass; skipmissing=true)
    5. return passed.name