Running the Apache Beam samples with Apache Spark

    Make sure to use a Spark version that is supported by your Hop version. For example, for Hop 1.2, the latest currently supported version is 3.1.2.

    Download your selected Spark version and unzip to a convenient location.
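
    For example, for Spark 3.1.2 (the mirror URL and archive name below are illustrative; pick the build that matches your environment):

        # download the Spark 3.1.2 binary distribution
        wget https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
        # unpack it and move into the unzipped folder
        tar -xzf spark-3.1.2-bin-hadoop3.2.tgz
        cd spark-3.1.2-bin-hadoop3.2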

    To keep things as simple as possible, we’ll run a local single node Spark cluster.

    First we need to start our local master. This can be done with a single command from the folder where you unzipped Spark:
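
        # start a standalone Spark master on this machine; the start-master.sh
        # script ships with the Spark distribution under sbin/
        ./sbin/start-master.sh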

    The script prints the location of the master’s log file, which you can check to verify the master started successfully.

    You should now be able to access the Spark Master’s web UI at http://localhost:8080.

    Copy the master’s URL from the header of the master’s page, e.g. spark://localhost.localdomain:7077.

    Apache Spark - Master web UI

    With the master in place, we can start a worker (formerly called a slave). Similar to the master, this is a single command that takes the master’s URL you copied earlier as an argument:
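
        # start a worker and register it with the master; replace the URL with
        # the one you copied from the master's web UI. In Spark versions before
        # 3.1, this script is called start-slave.sh.
        ./sbin/start-worker.sh spark://localhost.localdomain:7077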

    Again, the script prints the location of the worker’s log file. The worker should now also be listed on the master’s web UI.

    Since Spark doesn’t support remote execution, we’ll run one of the sample pipelines through spark-submit.

    Use a command like the one below to pass all the information required by spark-submit.
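
    For example (a sketch with placeholder paths; org.apache.hop.beam.run.MainBeam is Hop’s main class for running Beam pipelines, and input-process-output.hpl is one of the Beam sample pipelines):

        # submit the Hop fat jar to the local Spark master; MainBeam takes the
        # pipeline to run, the exported metadata, and the name of the pipeline
        # run configuration as arguments
        spark-submit \
          --master spark://localhost.localdomain:7077 \
          --class org.apache.hop.beam.run.MainBeam \
          /path/to/hop-fat.jar \
          /path/to/beam/pipelines/input-process-output.hpl \
          /path/to/hop-metadata.json \
          Spark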

    In this case, the fat jar and metadata export files were saved to a local folder. The last argument, Spark, is the name of the Spark pipeline run configuration in the samples project. Replace the paths with the necessary values for your environment and run the command.

    While the pipeline runs, spark-submit prints verbose logging output to the console.