Running the Apache Beam samples With Apache Flink

    For example, for Hop 1.2, the latest currently supported version is 1.13. Make sure to download Flink for a recent and JDK 8 compatible Scale version. For Flink 1.3.6, this is Scala 2.11.

    Download your selected Flink version and unzip to a convenient location.

    To keep things as simple as possible, we’ll run a local single node Flink cluster with a single command.

    In the folder where you unzipped Flink to, run:

    Your output should look similar to the one below:

    Apache Flink Dashboard

    In Hop Gui’s metadata perspective for the Samples project, edit the Flink pipeline run configuration and make sure the Fat jar file location (the very last option) points to the Hop fat jar you created earlier in the prerequisites.

    Set your Flink master to your cluster’s master. For embedded Flink, [local] will do.

    Go back to the data orchestration perspective and run one of the Beam pipelines in the samples project. In this example, we used samples/beam/pipelines/generate-synthetic-data.hpl

    When you start your pipeline from Hop Gui, it will appear in your Flink Dashboard.

    Apache Flink Dashboard - width="90%"

    Set your Flink master to [auto] and export your Hop metadata again (see ).

    Unlike Spark you can not pass java options at runtime to the TaskManager. So we also want to set the PROJECT_HOME variable in the run configuration. This variable is used during execution to know where the source files are. (Metadata perspective → Pipeline Run Configuration → Flink → Variables)

    Apache Beam - Flink run configuration - master

    Use a command like the one below to pass all the information required by .

    With your Hop and Flink set up correctly, your output will look similar to what’s shown below:

    After your pipeline finishes and the flink run command ends, your Flink dashboard will show a new entry in the ‘Completed Job List’. You can follow up any running applications in the ‘Running Job List’ and drill down into their execution details while running.