Apache Spark (Tutorial 1) : Java 8 + Maven 3 + Eclipse

Requirements

  • Java 8 : We are going to use the Java 8 Function interface and lambda expressions (see the sketch after this list).
  • Maven 3 : Just to automate fetching the project dependencies.
  • Eclipse : My usual IDE for Java/JavaEE development.
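To make the first point concrete, here is a minimal sketch of what Java 8 brings : Spark's Function interface implemented as a lambda instead of an anonymous inner class (the class name LambdaExample is just a placeholder) :

    import org.apache.spark.api.java.function.Function;

    public class LambdaExample {

        public static void main(String[] args) throws Exception {
            // Before Java 8, this required an anonymous inner class;
            // with lambda expressions it becomes a one-liner.
            Function<Integer, Integer> square = x -> x * x;

            System.out.println(square.call(4)); // prints 16
        }
    }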

Windows Environment Setup

  1. Download the winutils.exe executable from the Hortonworks repository.
  2. Create a dummy directory and place the downloaded winutils.exe inside a bin subdirectory. For example : C:\SparkDev\bin.
  3. Add the environment variable HADOOP_HOME pointing to C:\SparkDev. You have 2 choices (a programmatic alternative is sketched below) :
    • Windows > System Settings
    • Eclipse > your class which can be run as a Java application (containing the static main method) > Right Click > Run As > Run Configurations > Environment tab :

(Figure : Apache Spark HADOOP_HOME environment variable in Eclipse)
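Besides those two choices, a programmatic route is possible : the hadoop.home.dir system property can be set at the very start of the main method, before the Spark context is created. A minimal sketch, assuming the C:\SparkDev directory from step 3 (the class name is a placeholder) :

    public class HadoopHomeSetup {

        public static void main(String[] args) {
            // Equivalent to the HADOOP_HOME environment variable; it must be
            // set before the Spark context is created so that winutils.exe
            // can be located under %HADOOP_HOME%\bin.
            System.setProperty("hadoop.home.dir", "C:\\SparkDev");
        }
    }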

Project Setup

Create a Maven project and configure its pom.xml as follows :
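Since Spark artifacts are published per Scala version, a minimal sketch could look like this (the project coordinates, the _2.11 Scala suffix and the 2.2.0 Spark version are assumptions; adapt them to your target environment) :

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                                 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <!-- Placeholder coordinates -->
        <groupId>com.example</groupId>
        <artifactId>spark-tutorial-1</artifactId>
        <version>1.0-SNAPSHOT</version>

        <properties>
            <!-- Compile with Java 8 so lambda expressions are available -->
            <maven.compiler.source>1.8</maven.compiler.source>
            <maven.compiler.target>1.8</maven.compiler.target>
        </properties>

        <dependencies>
            <!-- Spark core for the Java API; version is an assumption -->
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.2.0</version>
            </dependency>
        </dependencies>
    </project>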

Local Spark Context Creation

The benefit of creating a local Spark context is that everything runs locally, with no need to deploy a Spark server separately as a master. This is very convenient during the development phase. So here is the basic configuration :
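A minimal sketch (the class and application names are placeholders) :

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkApp {

        public static void main(String[] args) {
            // "local[*]" runs Spark inside the current JVM, with one worker
            // thread per available core, so no standalone master is needed.
            SparkConf conf = new SparkConf()
                    .setAppName("spark-tutorial-1")
                    .setMaster("local[*]");

            JavaSparkContext sc = new JavaSparkContext(conf);

            // ... create and transform RDDs here ...

            sc.close();
        }
    }

The local[*] master URL is exactly what makes a separate master unnecessary while developing; switching to a real cluster later only means changing this value.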


Now that we have an operational environment, let’s move on to the punchy examples of the RDD Transformations and Actions tutorial.