My Note on Solutions.

Wednesday, August 10, 2016

How to locate Magento credit card leakge

Just got a chance to help locate the leaking issues of a customer who got complaints from customer about credit card leak after made purchase.

and it turns out to be a very smart hacker. here is the steps I try to locate the issue.

I use tcpdump to capture all traffic for a couple hours and do a quick analysis to see whether there are some special.  like sending out credit card using SMTP with the port 25, or just use stand http post, if it https. you can sort the TLS certificate to see any special certificate which represents some evil 3rd party hosts.

Nothing found special for me. most just traffic to this site and some 3rd party API call like shipping rate calculation, Fedex integration.

then do a analysis about the request log to see any url which has huge hit from the same IP.
you can use some handy linux command to do the aggregation.

awk '{print $8$2}' requestlog|sort|uniq -c|sort -r -n|more

nothing special, most hits are from crawlers ip with Google.

feel a little big frustrated now, this must be a smart hacker.  then I watch for file change  within 5 minutes. if they hacker intercept the request and store somewhere, I will definitly find out where did he/she store the file.

go the root directly of the magento, I spect all files change within 5 mintues (I placed a order using a dummy credit card"

 find . -mmin -5 -ls

there we go, one file called db-tab-footer_bg.gif was changed minutes ago. there should not be any chagne for gif files. this turns out to be a complicated image file. since it's all stored with encrypted data ( it must be credit card there),

if I check the access log, not a lot request to this file, only once a week coming from the hacker's IP.

now, time to locate how can they capture the data and dump to this file.
just search all php files containing db-tab-footer_bg.gif.
 it's in the global config file, here is the content. essentially they intercept all post request and encrypt using his/her RSA public key and put to the gif file. nobody can decrypt it.

now time to get ride of the backdoor and do the housekeeping to locate how did they inject the backdoor on the server

Sunday, July 31, 2016

Group All established connections by target IP

here is a very common case that you want to check whether you have even connection for a distributed system on the server side. or whether the client send the requests cross all server endpoints.

Basic idea. netstat -an|grep 'EST' will show you all the established connection.

so the 4th column is the client endpoints, and 5th column is the target one.

if we want to see how many connection for each Target as a IP. you can do the following command.
netstat -an -p tcp|grep "EST"|awk '{print $5}'|awk -F "." '{print $1"."$2"."$3"."$4 }'|sort |uniq -c |sort -n -r
and if we only want to see the top 5.  we can put rows index for the awk hint. 
netstat -an -p tcp|grep "EST"|awk '{print $5}'|awk -F "." 'FNR<6 {print $1"."$2"."$3"."$4 }'|sort |uniq -c |sort -n -r

Tuesday, May 17, 2016

Spark window function, failure: ``union'' expected but `(' found

this a very weird error when I try to run a simple window ranking, all looks good from the syntax perspective.
team=[("Lakers","WEST",29 ),("Golden State","WEST",89 ),
      ("MIA HEAT","EAST",79 ),("SAS","WEST",9 ),
      ("RAPTORS","EAST",29 ) ]

        team).map(lambda x: Row(Team=x[0],Division=x[1], Score=x[2])))\

print sql.sql("SELECT team, division, score, rank() OVER (PARTITION BY division ORDER BY score desc)"              "  as rank FROM team").take(10)

And I got this errors complaining the syntax
4j.protocol.Py4JJavaError: An error occurred while calling o36.sql.
: java.lang.RuntimeException: [1.43] failure: ``union'' expected but `(' found

SELECT team, division, score, rank() OVER (PARTITION BY division ORDER BY score desc)  as rank FROM team
 at scala.sys.package$.error(package.scala:27)
 at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
 at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
 at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
 at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)

To Fix this, please make sure you are using HiveContext instead of SqlContext

How to Run Spark testing application in your fav Python IDE

Here is a quick step to run and test your spark application using python IDE, essentially, we need load the dependency module. setup the environment and load the context.

1. copy and grab pyspark folder under the standard spark distribution to your project folder

2. setup some bootstrap to take care the environments using the following code , I use 1.6.1 as an example. and you may create this as a module.
class  Setup(object):

    def setupSpark(self):
        os.environ["SPARK_HOME"] = "/Users/and/Development/spark/spark-1.6.1-bin-hadoop2.6/"        os.environ["PYSPARK_SUBMIT_ARGS"]="--master local[2]";
        spark_home = os.environ.get("SPARK_HOME")

        spark_release_file = spark_home + "/RELEASE"        if os.path.exists(spark_release_file) and "Spark 1.6.1" in open(spark_release_file).read():
            pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
            if not "pyspark-shell" in pyspark_submit_args: pyspark_submit_args += " pyspark-shell"            os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

        sys.path.insert(0, spark_home + "/python")

        sys.path.insert(0, os.path.join(spark_home, "python/lib/"))

        return pyspark.SparkContext()
3. you are good to go
from lib.setup import Setup


print sc.parallelize(range(1,10)).count()

Tuesday, December 15, 2015

How to: test ElasticSearch geoLocation support

Here is a quick tutorial to setup ES 2.1 and index some earthquake data by using the REST api. and then do by query by assign the POI within couple miles and return those earthquakes count aggregated by Range.
the query looks like this.  basically query all the earthquake happened near San Diego within 300KMs.

and the results showing the individual earthquakes and the buckets aggregated.
you can grad the data source from USGS , I closed the CSV file here
to do the Indexing, first we define the mapping , I used Postman to do the REST call. My mapping looks like this,
once we have the mapping and Index, we can feed the data to ES using the Sense tool. I put a small script to parse the CSV data to JSON following the Bulk format. 
like this

And I found Sense is the best client when you do the Bulk indexing, all other REST plugin in Chome give me the encoding error.
then copy and paste the data we convered, basically tell the importer we want to create a new document called eq, then the doc itself

Thursday, May 21, 2015

Http 1.1 Chunked transfer.

If you inspect the http response with some zipped resources, you may find 2 things. I assure we are on HTTP 1.1. there is no content-length header ins the response, also the transfer-encoding is chunked.


and Http 1.1 have a full spec about the chunked encoding, check it here.

if you use node.js , we can write a simple test to try out the chunked transfer. here is the source code


and try pull the response using wget. you will notice the 3 seconds delay between Okay3 and 4.


and inspect the traffic in between, you can see the wirelevel bits


1st response returns Okay1


then the 2nd returns2 and 3


after 3 seconds delay

we get okay4



once we call the end , you can see the end trailer.

wireshark are smart enough, to tell you 4 frames invloved in this http req/res


Monday, May 18, 2015

Power-shell , Filter and Projection

To getstart with any command in PS, run help.



To filter it, use the where-object or use ? directly.


get running service


Or just using ? instead.


Using Select to run a projection, select name, status only


also, you can skip and tail the results


sellect last 5 only


convert the result to a Html page? using the convert*


to show it in a gridview


Locations of visitors to this page