Scalar User Defined Functions (UDFs)

    To define the properties of a user-defined function, use the following methods of the UserDefinedFunction class.

    • asNonNullable(): UserDefinedFunction

      Updates UserDefinedFunction to non-nullable.

    • asNondeterministic(): UserDefinedFunction

      Updates UserDefinedFunction to nondeterministic.

    • withName(name: String): UserDefinedFunction

      Updates UserDefinedFunction with a given name.
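    Each of these methods returns a UserDefinedFunction, which suggests that the update is delivered through the return value rather than by changing the object in place. The plain-Java sketch below illustrates that calling pattern under this assumption. UdfProperties, defaults(), and the accessor methods are hypothetical names invented for the illustration; they are not part of Spark, and the defaults chosen here (unnamed, nullable, deterministic) are assumptions of the sketch.

```java
// Hypothetical stand-in used only for illustration; this is NOT Spark's UserDefinedFunction.
final class UdfProperties {
    private final String name;          // null means "no name given"
    private final boolean nullable;
    private final boolean deterministic;

    private UdfProperties(String name, boolean nullable, boolean deterministic) {
        this.name = name;
        this.nullable = nullable;
        this.deterministic = deterministic;
    }

    // Assumed defaults for this sketch: unnamed, nullable, deterministic.
    static UdfProperties defaults() {
        return new UdfProperties(null, true, true);
    }

    // Each method leaves the receiver untouched and returns an updated copy.
    UdfProperties asNonNullable() { return new UdfProperties(name, false, deterministic); }
    UdfProperties asNondeterministic() { return new UdfProperties(name, nullable, false); }
    UdfProperties withName(String newName) { return new UdfProperties(newName, nullable, deterministic); }

    String name() { return name; }
    boolean isNullable() { return nullable; }
    boolean isDeterministic() { return deterministic; }

    public static void main(String[] args) {
        UdfProperties base = UdfProperties.defaults();

        base.asNondeterministic();                    // return value discarded
        System.out.println(base.isDeterministic());   // prints "true": base is unchanged

        UdfProperties random = base.asNondeterministic().asNonNullable().withName("random");
        System.out.println(random.isDeterministic()); // prints "false"
        System.out.println(random.isNullable());      // prints "false"
        System.out.println(random.name());            // prints "random"
    }
}
```

    If a real UserDefinedFunction behaves the same way, the practical consequence is to assign or chain the result of asNonNullable(), asNondeterministic(), and withName(...) instead of calling them as standalone statements.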

    Find the full Scala example code at “examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedScalar.scala” in the Spark repo. The Java version of the example follows.

    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.expressions.UserDefinedFunction;
    import static org.apache.spark.sql.functions.udf;
    import org.apache.spark.sql.types.DataTypes;

    SparkSession spark = SparkSession
      .builder()
      .appName("Java Spark SQL UDF scalar example")
      .getOrCreate();

    // Define and register a zero-argument non-deterministic UDF
    // UDF is deterministic by default, i.e. produces the same result for the same input.
    // asNondeterministic() returns the updated UserDefinedFunction, so keep its result.
    UserDefinedFunction random = udf(
      () -> Math.random(), DataTypes.DoubleType
    ).asNondeterministic();
    spark.udf().register("random", random);
    spark.sql("SELECT random()").show();
    // +-------+
    // |UDF()  |
    // +-------+
    // |xxxxxxx|
    // +-------+

    // Define and register a one-argument UDF
    spark.udf().register("plusOne",
      (UDF1<Integer, Integer>) x -> x + 1, DataTypes.IntegerType);
    spark.sql("SELECT plusOne(5)").show();
    // +----------+
    // |plusOne(5)|
    // +----------+
    // |         6|
    // +----------+

    // Define and register a two-argument UDF
    UserDefinedFunction strLen = udf(
      (String s, Integer x) -> s.length() + x, DataTypes.IntegerType
    );
    spark.udf().register("strLen", strLen);
    spark.sql("SELECT strLen('test', 1)").show();
    // +------------+
    // |UDF(test, 1)|
    // +------------+
    // |           5|
    // +------------+

    // UDF in a WHERE clause
    spark.udf().register("oneArgFilter",
      (UDF1<Long, Boolean>) x -> x > 5, DataTypes.BooleanType);
    spark.range(1, 10).createOrReplaceTempView("test");
    spark.sql("SELECT * FROM test WHERE oneArgFilter(id)").show();
    // +---+
    // | id|
    // +---+
    // |  6|
    // |  7|
    // |  8|
    // |  9|
    // +---+