My Note on Solutions.: May 2016

Tuesday, May 17, 2016

Spark window function, failure: ``union'' expected but `(' found

this a very weird error when I try to run a simple window ranking, all looks good from the syntax perspective.

team=[("Lakers","WEST",29 ),("Golden State","WEST",89 ),
      ("MIA HEAT","EAST",79 ),("SAS","WEST",9 ),
      ("RAPTORS","EAST",29 ) ]

sql.createDataFrame(
    sc.parallelize(
        team).map(lambda x: Row(Team=x[0],Division=x[1], Score=x[2])))\
    .registerAsTable("team")

print sql.sql("SELECT team, division, score, rank() OVER (PARTITION BY division ORDER BY score desc)"              "  as rank FROM team").take(10)

And I got this errors complaining the syntax

4j.protocol.Py4JJavaError: An error occurred while calling o36.sql.
: java.lang.RuntimeException: [1.43] failure: ``union'' expected but `(' found

SELECT team, division, score, rank() OVER (PARTITION BY division ORDER BY score desc)  as rank FROM team
                                          ^
 at scala.sys.package$.error(package.scala:27)
 at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
 at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
 at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
 at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)

To Fix this, please make sure you are using HiveContext instead of SqlContext

How to Run Spark testing application in your fav Python IDE

Here is a quick step to run and test your spark application using python IDE, essentially, we need load the dependency module. setup the environment and load the context.

1. copy and grab pyspark folder under the standard spark distribution to your project folder

2. setup some bootstrap to take care the environments using the following code , I use 1.6.1 as an example. and you may create this as a module.

class  Setup(object):


    def setupSpark(self):
        os.environ["SPARK_HOME"] = "/Users/and/Development/spark/spark-1.6.1-bin-hadoop2.6/"        os.environ["PYSPARK_SUBMIT_ARGS"]="--master local[2]";
     
        spark_home = os.environ.get("SPARK_HOME")

    
        spark_release_file = spark_home + "/RELEASE"        if os.path.exists(spark_release_file) and "Spark 1.6.1" in open(spark_release_file).read():
            pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
            if not "pyspark-shell" in pyspark_submit_args: pyspark_submit_args += " pyspark-shell"            os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

     
        sys.path.insert(0, spark_home + "/python")

     
        sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.9-src.zip"))

        return pyspark.SparkContext()

3. you are good to go

from lib.setup import Setup

sc=Setup().setupSpark()

print sc.parallelize(range(1,10)).count()

Pages

My Note on Solutions.

Tuesday, May 17, 2016

Spark window function, failure: ``union'' expected but `(' found

How to Run Spark testing application in your fav Python IDE

Search This Blog

All Blogs

Blog Archive