Tuesday, May 17, 2016

How to Run Spark testing application in your fav Python IDE

Here is a quick step to run and test your spark application using python IDE, essentially, we need load the dependency module. setup the environment and load the context.

1. copy and grab pyspark folder under the standard spark distribution to your project folder

2. setup some bootstrap to take care the environments using the following code , I use 1.6.1 as an example. and you may create this as a module.
class  Setup(object):

    def setupSpark(self):
        os.environ["SPARK_HOME"] = "/Users/and/Development/spark/spark-1.6.1-bin-hadoop2.6/"        os.environ["PYSPARK_SUBMIT_ARGS"]="--master local[2]";
        spark_home = os.environ.get("SPARK_HOME")

        spark_release_file = spark_home + "/RELEASE"        if os.path.exists(spark_release_file) and "Spark 1.6.1" in open(spark_release_file).read():
            pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
            if not "pyspark-shell" in pyspark_submit_args: pyspark_submit_args += " pyspark-shell"            os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

        sys.path.insert(0, spark_home + "/python")

        sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.9-src.zip"))

        return pyspark.SparkContext()
3. you are good to go
from lib.setup import Setup


print sc.parallelize(range(1,10)).count()

No comments:

Locations of visitors to this page