My Note on Solutions.

Tuesday, May 17, 2016

Spark window function, failure: ``union'' expected but `(' found

This is a very weird error that I hit when trying to run a simple window ranking; everything looks fine from a syntax perspective.
team=[("Lakers","WEST",29 ),("Golden State","WEST",89 ),
      ("MIA HEAT","EAST",79 ),("SAS","WEST",9 ),
      ("RAPTORS","EAST",29 ) ]

        team).map(lambda x: Row(Team=x[0],Division=x[1], Score=x[2])))\

print sql.sql("SELECT team, division, score, rank() OVER (PARTITION BY division ORDER BY score desc)"
              "  as rank FROM team").take(10)

And I got this error complaining about the syntax:
py4j.protocol.Py4JJavaError: An error occurred while calling o36.sql.
: java.lang.RuntimeException: [1.43] failure: ``union'' expected but `(' found

SELECT team, division, score, rank() OVER (PARTITION BY division ORDER BY score desc)  as rank FROM team
 at scala.sys.package$.error(package.scala:27)
 at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
 at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
 at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
 at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)

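To fix this, make sure you are using HiveContext instead of SQLContext. In Spark 1.x the default SQLContext parser does not understand window functions, so it fails at the parenthesis right after OVER (position 1.43 in the message); the HiveQL parser used by HiveContext accepts the query.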
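To double check what the query should return once it parses, the ranking can be reproduced in plain Python. This is only a sketch of the semantics of rank() OVER (PARTITION BY division ORDER BY score desc), not Spark code, and the helper name rank_by_division is made up for illustration.

```python
from itertools import groupby

team = [("Lakers", "WEST", 29), ("Golden State", "WEST", 89),
        ("MIA HEAT", "EAST", 79), ("SAS", "WEST", 9),
        ("RAPTORS", "EAST", 29)]


def rank_by_division(rows):
    """Mimic rank() OVER (PARTITION BY division ORDER BY score DESC)."""
    ranked = []
    # Sort by division first so groupby sees each partition as one run.
    by_division = sorted(rows, key=lambda r: r[1])
    for division, members in groupby(by_division, key=lambda r: r[1]):
        ordered = sorted(members, key=lambda r: r[2], reverse=True)
        rank, previous_score = 0, None
        for position, (name, _, score) in enumerate(ordered, start=1):
            # rank() gives ties the same rank and leaves a gap afterwards.
            if score != previous_score:
                rank = position
                previous_score = score
            ranked.append((name, division, score, rank))
    return ranked


ranked = rank_by_division(team)
```

Each tuple in ranked is (team, division, score, rank), with the rank restarting at 1 for every division.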

How to run a Spark test application in your favorite Python IDE

Here are the quick steps to run and test your Spark application from a Python IDE. Essentially, we need to load the dependency module, set up the environment, and load the context.

1. Copy the pyspark folder from the standard Spark distribution into your project folder.

2. Set up some bootstrap code to take care of the environment, using the following code. I use 1.6.1 as an example, and you may turn this into a module.
import os
import sys


class Setup(object):

    def setupSpark(self):
        # Point at the local Spark distribution and run with two local threads.
        os.environ["SPARK_HOME"] = "/Users/and/Development/spark/spark-1.6.1-bin-hadoop2.6/"
        os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2]"
        spark_home = os.environ.get("SPARK_HOME")

        # For the 1.6.1 release, make sure the submit args include "pyspark-shell".
        spark_release_file = spark_home + "/RELEASE"
        if os.path.exists(spark_release_file) and "Spark 1.6.1" in open(spark_release_file).read():
            pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
            if not "pyspark-shell" in pyspark_submit_args:
                pyspark_submit_args += " pyspark-shell"
            os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

        sys.path.insert(0, spark_home + "/python")

        sys.path.insert(0, os.path.join(spark_home, "python/lib/"))

        # Import only after the paths above are in place.
        import pyspark
        return pyspark.SparkContext()
3. You are good to go:
from lib.setup import Setup

sc = Setup().setupSpark()

print sc.parallelize(range(1,10)).count()
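The one non-obvious part of the bootstrap is the PYSPARK_SUBMIT_ARGS handling. Here is a minimal sketch of that step pulled out as a standalone function, so it can be checked without a Spark install; ensure_pyspark_shell is a hypothetical name, not part of pyspark.

```python
def ensure_pyspark_shell(submit_args):
    """Return submit_args with "pyspark-shell" appended if it is missing."""
    if "pyspark-shell" not in submit_args:
        submit_args += " pyspark-shell"
    return submit_args


args = ensure_pyspark_shell("--master local[2]")
```

Calling it a second time on its own result changes nothing, so it is safe to run the bootstrap more than once in the same IDE session.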

Tuesday, December 15, 2015

How to: test ElasticSearch geoLocation support

Here is a quick tutorial on setting up ES 2.1 and indexing some earthquake data using the REST API, then querying around a point of interest (POI) within a given distance and returning the earthquake counts aggregated by range.
The query looks like this. Basically, it finds all the earthquakes that happened within 300 km of San Diego.

The results show the individual earthquakes and the aggregated buckets.
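As a rough idea of what such a request body can look like, here is a sketch that builds one as a Python dict: a geo_distance filter for the 300 km radius plus a geo_distance aggregation for the range buckets. The field name location, the aggregation name, the ring boundaries and the approximate San Diego coordinates are all assumptions for illustration, and the field has to be mapped as geo_point for this to work.

```python
import json

# Assumed values: approximate coordinates for San Diego.
SAN_DIEGO = {"lat": 32.7157, "lon": -117.1611}

query = {
    "query": {
        "bool": {
            "filter": {
                # Keep only documents within 300 km of the POI.
                "geo_distance": {
                    "distance": "300km",
                    "location": SAN_DIEGO,
                }
            }
        }
    },
    "aggs": {
        "rings_around_poi": {
            # Count the matching earthquakes per distance ring.
            "geo_distance": {
                "field": "location",
                "origin": SAN_DIEGO,
                "unit": "km",
                "ranges": [
                    {"to": 100},
                    {"from": 100, "to": 200},
                    {"from": 200, "to": 300},
                ],
            }
        }
    },
}

body = json.dumps(query, indent=2)
```

The body string can be pasted into Sense or Postman as the payload of a _search request against the index.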
You can grab the data source from USGS; I enclosed the CSV file here.
To do the indexing, first we define the mapping. I used Postman to make the REST call. My mapping looks like this:
Once we have the mapping and the index, we can feed the data to ES using the Sense tool. I wrote a small script to parse the CSV data into JSON following the Bulk format,
like this:

I found Sense to be the best client for Bulk indexing; all the other REST plugins in Chrome gave me an encoding error.
Then copy and paste the data we converted. Basically, each entry tells the importer that we want to create a new document called eq, followed by the doc itself.
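A conversion script along these lines could look like the sketch below. It is not the original script: the index name earthquakes, the CSV column names (time, latitude, longitude, mag, place) and the sample row are assumptions made up for illustration.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index="earthquakes", doc_type="eq"):
    """Turn CSV rows into Bulk API lines: one action line, then one document line."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: index a new document of the given type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Document line: the earthquake itself, with a geo_point style location.
        lines.append(json.dumps({
            "time": row["time"],
            "mag": float(row["mag"]),
            "place": row["place"],
            "location": {"lat": float(row["latitude"]), "lon": float(row["longitude"])},
        }))
    # The Bulk API expects a newline after the last line as well.
    return "\n".join(lines) + "\n"


# Made-up sample row, only to show the output shape.
sample = ("time,latitude,longitude,mag,place\n"
          "2015-12-01T00:00:00Z,33.0,-116.5,2.1,sample place\n")
bulk_body = csv_to_bulk(sample)
```

Every document takes two lines in the output, which is why the data pasted into Sense alternates between an action line and the doc.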

Thursday, May 21, 2015

HTTP 1.1 Chunked transfer.

If you inspect the HTTP response for some zipped resources, you may notice two things (assuming we are on HTTP 1.1): there is no Content-Length header in the response, and the Transfer-Encoding is chunked.


HTTP 1.1 has a full spec for chunked encoding, check it here.

If you use Node.js, you can write a simple test to try out chunked transfer. Here is the source code:


Then try pulling the response using wget. You will notice the 3-second delay between Okay3 and Okay4.


Inspect the traffic in between and you can see the wire-level bits.


The 1st chunk returns Okay1


then the 2nd returns Okay2 and Okay3


after the 3-second delay

we get Okay4



Once we call end, you can see the end trailer (the final zero-length chunk).

Wireshark is smart enough to tell you that 4 frames are involved in this HTTP request/response.
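To make the wire format concrete, here is a small sketch of the chunked framing itself: each chunk is its size in hex, CRLF, the data, CRLF, and a zero-size chunk ends the body. The function names are made up for illustration, and the decoder ignores trailer headers, so treat it as a reading aid, not a full HTTP implementation.

```python
def encode_chunked(parts):
    """Frame each body part as an HTTP/1.1 chunk and append the terminator."""
    out = b""
    for part in parts:
        if not part:
            continue  # a zero-length chunk would end the body early
        out += format(len(part), "x").encode("ascii") + b"\r\n" + part + b"\r\n"
    return out + b"0\r\n\r\n"


def decode_chunked(data):
    """Reassemble the body from chunked framing (trailers are not handled)."""
    body = b""
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The size line may carry extensions after ';', which are skipped here.
        size = int(data[pos:eol].split(b";")[0].decode("ascii"), 16)
        if size == 0:
            return body
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


wire = encode_chunked([b"Okay1", b"Okay2", b"Okay3", b"Okay4"])
```

Because every chunk announces its own length, the server can start sending before it knows the total size, which is why there is no Content-Length header.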


Monday, May 18, 2015

PowerShell: Filter and Projection

To get started with any command in PS, run help.



To filter, use Where-Object, or use ? directly.


Get the running services


Or just use ? instead.


Use Select to run a projection: select name and status only


Also, you can skip and tail the results


Select the last 5 only


Convert the result to an HTML page using the ConvertTo-* cmdlets


To show it in a grid view


Friday, May 1, 2015

What's new in ASP.NET 5

updates for 4.6: Web Forms, MVC 5

Roslyn support (.NET compiler)

Roslyn C# language feature
var name = "ss";
var message = $"this is a message {name}";


totally modular
>old days, all features on. now you can turn them on and off; faster
>cloud support
>faster dev cycle. 2 seconds to compile
>cross platform. runs on Mac, Linux, Windows
>faster. less memory
powered by readthedocs, markdown syntax
1. turn features on and off, modular. can run on IoT

//like nodejs
public void Configure(IApplicationBuilder app)
{
    app.Run(async (context) =>
        await context.Response.WriteAsync("Hello World"));
}

//add dependency Microsoft.AspNet.Diagnostics


dnx . web //start the app

//no static file handler by default. need to add the dependency
//add dependency

app.UseStaticFiles(); // extension method

2. Roslyn engine
instead of compiling cs to a dll and then loading it,
it now compiles and loads in memory

DNX (.NET Execution Environment)
5.0 core

the old project file was the csproj, which included all files and caused merge issues.
now all files in the folder are in the project

commands in project.json
like aliases in npm

dnx . web  //run the web command in the folder.

target framework

bower support
>>>nuget not versioned

gulp support
task runner support (show gulp tasks)

>before build
>after build

publish the app to disk and run it directly by running the web command


<Environment names="Dev">

<link rel="stylesheet" href="cdnlink" asp-fallback-href="otherlinks">

model injection.
more html
<input asp-for="Email" class="form-control"/>

used to be Html.TextBoxFor(m => m.Email, new { @class = "form-control" })

4. configuration.

new Configuration().AddIniFile() or AddJsonFile()

AddUserSecrets(): no connection string in web.config.
also, when hosted in the cloud, IT ops will assign those values

user-secret //
user-secret set AppSettings:SiteTitle "sec title"

5. controller

new ScrottController(), no need for a base class

public xxxController()

public string Index()
return "hello, index";

6. dotnet version manager
dnvm list

7. mac

dnu restore

dnx run kestrel
Microsoft.AspNet.Hosting --server Kestrel

8. run on Raspberry Pi

What's new in C# 6

Roslyn open source

  IDE Features (Ctrl+.)
  . light bulb to remove unused namespaces
    .. fix the scope, remove all unused ones across the project / solution
  . refactor
    rename, introduce local variable, show conflicts
    add this automatically if there is a conflict that the IDE can resolve
  . Array vs ImmutableArray

  var c = new ImmutableArray<int>(); calling c.Length will trigger a null reference exception.
  //you can build a code analyzer
  //to tell the IDE you should use ImmutableArray<int>.Empty

language new features
  instead of big changes, little things added.

  1. using static System.Console; //similar to TypeScript: import {WriteLine} from System.Console

    then you can call the WriteLine method directly.
  2. Immutable auto-property.
    public class Point
      public int X { get; }
      public int Y { get; } = 10; //default value
  3. lambda for methods
    public override string ToString() => String.Format("tostring {0}", X);

  4. $String (string interpolation)
    public override string ToString() => $"({X}{Y})";
  5. nameof(variable)
  6. null-conditional operator
    if (p != null && p.name == "xx")
    will be if (p?.name == "xx")
    if (json != null && json["x"] != null && json["x"] == "mon")
    will be if (json?["x"] == "mon")
  7. initialize elements
    public JObject ToJson() => new JObject() { ["x"] = X, ["Y"] = Y };
  8. await in catch block

    try { var result = await repo.DosomethingAsync(); }
    catch (Exception ex)
      { await repo.LogException(ex); } //doable now

  9.catch(Exception ex) when (ex.Occurences>3)

Debugging features
  1. you can edit code, even add a new class, and run initialization code while debugging the app
    run LINQ queries

    also in the watch window

  C# extensions toolkit in the extension gallery
  2. C# interactive window

      #r "System"
      #r "System.Core"
      using System.Diagnostics;
      using System.Linq;
      var memoryPigs = from p in Process.GetProcesses() where p.WorkingSet64 > 64*1024*1024 select new
      { p.ProcessName, p.WorkingSet64 };
      foreach (var r in memoryPigs)
