Missing Values

    The behavior of missing values follows one basic rule: missing values propagate automatically when passed to standard operators and functions, in particular mathematical functions. Uncertainty about the value of one of the operands induces uncertainty about the result. In practice, this means an operation involving a missing value generally returns missing.
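
    As a brief illustration of this rule, the sketch below applies a few Base operators and functions to missing. The particular expressions and variable names are chosen here only for illustration.

```julia
# Arithmetic with a missing operand yields missing.
a = missing + 1
b = 2 * missing

# String concatenation with * propagates missing as well.
c = "a" * missing

# So does a mathematical function such as abs.
d = abs(missing)

println((a, b, c, d))
```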

    As missing is a normal Julia object, this propagation rule only works for functions which have opted in to implement this behavior. This can be achieved either via a specific method defined for arguments of type Missing, or simply by accepting arguments of this type, and passing them to functions which propagate them (like standard operators). Packages should consider whether it makes sense to propagate missing values when defining new functions, and define methods appropriately if that is the case. Passing a missing value to a function for which no method accepting arguments of type Missing is defined throws a MethodError, just like for any other type.
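
    The two ways of opting in, and the MethodError raised otherwise, can be sketched as follows. The functions scale_and_shift, half and half_or_missing are hypothetical names introduced only for this example.

```julia
# Hypothetical function without any Missing-specific method: it propagates
# missing simply because * and + already do.
scale_and_shift(x) = 2x + 1

# Hypothetical function restricted to Int: no method accepts Missing.
half(x::Int) = div(x, 2)

# Hypothetical function that opts in with a dedicated Missing method.
half_or_missing(x::Int) = div(x, 2)
half_or_missing(::Missing) = missing

r1 = scale_and_shift(missing)
r2 = half_or_missing(missing)

# half(missing) has no applicable method, so a MethodError is thrown.
threw = try
    half(missing)
    false
catch err
    err isa MethodError
end

println((r1, r2, threw))
```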

    Equality and Comparison Operators

    Standard equality and comparison operators follow the propagation rule presented above: if any of the operands is missing, the result is missing. Here are a few examples:

    julia> missing == 1
    missing
    julia> missing == missing
    missing
    julia> missing < 1
    missing
    julia> 2 >= missing
    missing

    In particular, note that missing == missing returns missing, so == cannot be used to test whether a value is missing. To test whether x is missing, use ismissing(x).
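
    A minimal sketch of the difference between the two tests; the variable names are illustrative only.

```julia
x = missing

# The comparison itself is missing, so it cannot be used as a condition.
cmp = (x == missing)

# ismissing always returns a Bool.
flag = ismissing(x)
other = ismissing(1)

# A Bool result can safely drive control flow.
label = ismissing(x) ? "no data" : "value: $x"
println(label)
```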

    Special comparison operators isequal and === are exceptions to the propagation rule: they always return a Bool value, even in the presence of missing values, considering missing as equal to missing and as different from any other value. They can therefore be used to test whether a value is missing:

    julia> missing === 1
    false
    julia> isequal(missing, 1)
    false
    julia> missing === missing
    true
    julia> isequal(missing, missing)
    true

    The isless operator is another exception: missing is considered as greater than any other value. This operator is used by sort, which therefore places missing values after all other values.

    julia> isless(1, missing)
    true
    julia> isless(missing, Inf)
    false
    julia> isless(missing, missing)
    false
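
    The effect on sorting can be seen with a small vector. This is only a sketch; the input values are arbitrary.

```julia
v = [3, missing, 1, 2]

# In ascending order, missing sorts after every other value.
ascending = sort(v)

# Reversing the order moves missing to the front.
descending = sort(v, rev=true)

println(ascending)
println(descending)
```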

    Logical Operators

    Logical (or boolean) operators |, & and xor are another special case, as they only propagate missing values when it is logically required. For these operators, whether or not the result is uncertain depends on the particular operation, following the well-established rules of three-valued logic, which are also implemented by NULL in SQL and NA in R. This abstract definition actually corresponds to a relatively natural behavior which is best explained via concrete examples.

    Let us illustrate this principle with the logical "or" operator |. Following the rules of boolean logic, if one of the operands is true, the value of the other operand does not have an influence on the result, which will always be true:

    julia> true | true
    true
    julia> false | true
    true

    julia> true | missing
    true
    julia> missing | true
    true

    On the contrary, if one of the operands is false, the result could be either true or false depending on the value of the other operand. Therefore, if that operand is missing, the result has to be missing too:

    julia> false | true
    true
    julia> true | false
    true
    julia> false | false
    false
    julia> false | missing
    missing
    julia> missing | false
    missing

    The behavior of the logical "and" operator & is similar to that of the | operator, with the difference that missingness does not propagate when one of the operands is false. For example, when the first operand is false, the result is false whatever the second operand is, even if it is missing.
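
    The case of a false operand can be sketched as follows; the variable names are illustrative only.

```julia
# With one operand equal to false, & returns false regardless of the other.
r1 = false & false
r2 = false & true
r3 = false & missing
r4 = missing & false

println((r1, r2, r3, r4))
```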

    On the other hand, missingness propagates when one of the operands is true, for example the first one:

    julia> true & true
    true
    julia> true & false
    false
    julia> true & missing
    missing
    Finally, the "exclusive or" logical operator xor always propagates missing values, since both operands always have an effect on the result. Also note that the negation operator ! returns missing when the operand is missing, just like other unary operators.
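
    To see all three operators side by side, the following sketch builds the full truth table over true, false and missing. The table layout and variable names are choices made for this example.

```julia
vals = (true, false, missing)

# One row per operand pair: (a, b, a | b, a & b, xor(a, b)).
table = [(a, b, a | b, a & b, xor(a, b)) for a in vals for b in vals]

for row in table
    println(join(row, "\t"))
end

# Negating missing also gives missing.
neg = !missing
```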

    Control Flow and Short-Circuiting Operators

    Control flow operators including if, while and the ternary operator x ? y : z do not allow for missing values. This is because of the uncertainty about whether the actual value would be true or false if we could observe it, which implies that we do not know how the program should behave. A TypeError is thrown as soon as a missing value is encountered in this context:

    julia> if missing
               println("here")
           end
    ERROR: TypeError: non-boolean (Missing) used in boolean context

    For the same reason, contrary to the logical operators presented above, the short-circuiting boolean operators && and || do not allow for missing values in situations where the value of the operand determines whether the next operand is evaluated or not. For example:

    julia> missing || false
    ERROR: TypeError: non-boolean (Missing) used in boolean context
    julia> missing && false
    ERROR: TypeError: non-boolean (Missing) used in boolean context
    julia> true && missing && false
    ERROR: TypeError: non-boolean (Missing) used in boolean context

    On the other hand, no error is thrown when the result can be determined without the missing values. This is the case when the code short-circuits before evaluating the missing operand, and when the missing operand is the last one:

    julia> true && missing
    missing
    julia> false && missing
    false
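
    When a condition may be missing, one possible way to keep control flow well defined is to test with ismissing first, or to substitute a default with coalesce. The sketch below is only one approach, and the default of false is an assumption made for this example.

```julia
x = missing

# Option 1: handle the missing case explicitly before branching on x.
branch = if ismissing(x)
    "unknown"
elseif x
    "yes"
else
    "no"
end

# Option 2: coalesce returns its first non-missing argument, so a missing
# condition is treated as false here.
treated_as_false = coalesce(x, false) ? "yes" : "no"

println((branch, treated_as_false))
```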

    Arrays With Missing Values

    Arrays containing missing values can be created like other arrays:

    julia> [1, missing]
    2-element Array{Union{Missing, Int64},1}:
     1
     missing

    Arrays allowing for missing values can be constructed with the standard syntax. Use Array{Union{Missing, T}}(missing, dims) to create arrays filled with missing values:

    julia> Array{Union{Missing, String}}(missing, 2, 3)
    2×3 Array{Union{Missing, String},2}:
     missing  missing  missing
     missing  missing  missing

    An array allowing for missing values but which does not contain any such value can be converted back to an array which does not allow for missing values using convert. If the array contains missing values, a MethodError is thrown during conversion.
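
    Both outcomes of the conversion can be sketched as follows; the array contents and variable names are illustrative only.

```julia
# Element type allows missing, but no missing value is present.
x = Union{Missing, String}["a", "b"]

# Element type allows missing, and one entry is actually missing.
y = Union{Missing, String}[missing, "b"]

# Succeeds: the result no longer allows missing values.
x_strict = convert(Array{String}, x)

# Fails: missing cannot be converted to String, so a MethodError is thrown.
threw = try
    convert(Array{String}, y)
    false
catch err
    err isa MethodError
end

println((eltype(x_strict), threw))
```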

    Skipping Missing Values

    Since missing values propagate with standard mathematical operators, reduction functions return missing when called on arrays which contain missing values:

    julia> sum([1, missing])
    missing

    In this situation, use the skipmissing function to skip missing values:

    julia> sum(skipmissing([1, missing]))
    1

    This convenience function returns an iterator which filters out missing values efficiently. It can therefore be used with any function which supports iterators:

    julia> using Statistics  # provides mean

    julia> maximum(skipmissing([3, missing, 2, 1]))
    3
    julia> mean(skipmissing([3, missing, 2, 1]))
    2.0
    julia> mapreduce(sqrt, +, skipmissing([3, missing, 2, 1]))
    4.146264369941973

    Use collect to extract non-missing values and store them in an array:

    julia> collect(skipmissing([3, missing, 2, 1]))
    3-element Array{Int64,1}:
     3
     2
     1

    Logical Operations on Arrays

    The three-valued logic described above for logical operators is also used by logical functions applied to arrays. Thus, array equality tests using the == operator return missing whenever the result cannot be determined without knowing the actual value of the missing entry. In practice, this means that missing is returned if all non-missing values of the compared arrays are equal, but one or both arrays contain missing values (possibly at different positions):

    julia> [1, missing] == [2, missing]
    false
    julia> [1, missing] == [1, missing]
    missing
    julia> [1, 2, missing] == [1, missing, 2]
    missing

    As for single values, use isequal to treat missing values as equal to other missing values but different from non-missing values:

    julia> isequal([1, missing], [1, missing])
    true
    julia> isequal([1, 2, missing], [1, missing, 2])
    false