ClickStreamFlow example application for Tigon.
This app demonstrates how to use Tigon SQL to join two data streams.
In this example, a view event data stream and a click event data stream are joined to generate meta information for each click event using the view event data stream. Click events are filtered on the basis of conditions specified in a SQL query.
The ClickStreamFlow application consists of:
- A SQLInputFlowlet that accepts two incoming data streams of view events and click events. It processes this data with the provided SQL query and attaches meta information to each click event using the information from the associated view event.
- A DigestFlowlet that accepts the output data generated by the SQLInputFlowlet and emits it to an HTTP endpoint as string-encoded JSON.
- The generated output data is logged to stdout.
ClickStreamFlow accepts two input streams:
- viewStream: A stream of data packets that represent a view of a web page. The data contains the following fields:
  - pageViewID: Represents the unique ID that corresponds to the view event on a specific page. In this example, the ID is assumed to be an increasing attribute. It can be used to identify a specific view event.
  - viewTime: Represents the time at which the view event occurred. In this example, times are in seconds and stored as integers.
  - lid: Represents the ID of the nth link available on a given web page.
  - pageInfo: This field contains additional information about the web page being viewed. It may contain the name of the page or the web server it is being served from.
  - linkDetails: This field is a string-encoded map of link IDs and their associated details.
- clickStream: A stream of data packets that represent a click event. It contains these fields:
  - refPageViewID: The ID of the page view event that led to the click.
  - clickTime: The time at which this click event occurred.
  - lid: The ID of the link that was clicked.
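The linkDetails field described above is a string-encoded map. As a rough illustration, the sketch below decodes the form used in this README's sample data (`'0':'LinkName0' - '1':'LinkName1'`). The function name `parse_link_details` is hypothetical, and the separator and quoting rules are inferred from the sample data only, so treat this as an assumption rather than Tigon's own decoding logic.

```python
from typing import Dict


def parse_link_details(encoded: str) -> Dict[str, str]:
    """Decode "'0':'LinkName0' - '1':'LinkName1'" into {"0": "LinkName0", ...}.

    Assumes entries are separated by " - " and that each entry is a
    single-quoted key and value joined by ":" (inferred from the sample data).
    """
    details = {}
    for entry in encoded.split(" - "):
        # Split on the first ":" only, so a value may itself contain colons.
        key, _, value = entry.partition(":")
        details[key.strip().strip("'")] = value.strip().strip("'")
    return details


link_details = parse_link_details("'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'")
print(link_details["1"])
```

With a decoded map like this, the lid of a click event can be used as the lookup key for the details of the link that was clicked.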
The SQL query performs a join on these two data streams based on the pageViewID and the refPageViewID in viewStream and clickStream respectively. It generates an output record containing meta information associated with each click: click time; referrer page information; link ID; link details; and the referrer page view ID.
The query filters out all click events that occur more than five minutes (300 seconds) after the associated view event.
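The join and time filter described above can be sketched in plain Python. This is not the SQL query the app ships with; it is an illustration under stated assumptions: the types `View` and `Click` and the function `join_clicks` are hypothetical, clicks with no matching view are assumed to be dropped, and a click exactly 300 seconds after its view is assumed to be kept.

```python
from typing import Dict, List, NamedTuple

MAX_GAP_SECONDS = 300  # five-minute window described in the text


class View(NamedTuple):
    page_view_id: int
    view_time: int                # seconds
    page_info: str
    link_details: Dict[str, str]  # link ID -> details, already decoded


class Click(NamedTuple):
    ref_page_view_id: int
    click_time: int               # seconds
    lid: str


def join_clicks(views: List[View], clicks: List[Click]) -> List[dict]:
    """Attach view metadata to each click that falls inside the time window."""
    views_by_id = {v.page_view_id: v for v in views}
    joined = []
    for click in clicks:
        view = views_by_id.get(click.ref_page_view_id)
        if view is None:
            continue  # assumption: clicks without a matching view are dropped
        if click.click_time - view.view_time > MAX_GAP_SECONDS:
            continue  # click came more than 300 s after the view
        joined.append({
            "clickTime": click.click_time,
            "pageInfo": view.page_info,
            "lid": click.lid,
            "linkDetails": view.link_details.get(click.lid),
            "refPageViewID": view.page_view_id,
        })
    return joined


views = [View(2, 20, "PageName2", {"0": "LinkName0", "1": "LinkName1", "2": "LinkName2"})]
clicks = [
    Click(2, 290, "1"),  # 270 s after the view: kept
    Click(2, 330, "0"),  # made-up late click, 310 s after the view: dropped
]
result = join_clicks(views, clicks)
print(result)
```

Indexing the views by ID first keeps the lookup for each click constant-time, which mirrors the equality join on pageViewID and refPageViewID.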
To build the ClickStreamFlow jar, run:
mvn clean package
To build the jar without running the tests, run:
mvn clean package -DskipTests
To run the app in the Standalone Runtime Environment:
$ ./run_standalone.sh /path/to/ClickStreamFlow-<version>.jar co.cask.tigon.apps.clickstreamflow.ClickStreamFlow
To run the app in the Distributed Runtime Environment:
$ ./run_distributed.sh /path/to/ClickStreamFlow-<version>.jar co.cask.tigon.apps.clickstreamflow.ClickStreamFlow
Optional runtime arguments:
- httpPort: Specify the network port for the HTTP data ingestion service. If not specified, Tigon automatically assigns a port. This value is always announced at runtime.
- tcpPort_viewStream: Specify the network port for the TCP data ingestion service for ingesting view stream data. If not specified, Tigon automatically assigns a port. This value is always announced at runtime.
- tcpPort_clickStream: Specify the network port for the TCP data ingestion service for ingesting click stream data. If not specified, Tigon automatically assigns a port. This value is always announced at runtime.
To post view event data to the viewStream:
$ curl -X POST http://localhost:<httpPort>/v1/tigon/viewStream -d '{ "data" : ["1","10","0","1","2","PageName1","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"] }'
To post click event data to the clickStream:
$ curl -X POST http://localhost:<httpPort>/v1/tigon/clickStream -d '{ "data" : ["2","290","1"] }'
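When posting many events, building the request body programmatically can be easier than nesting quotes on the command line. The sketch below constructs the same kind of POST request as the curl commands above using only the Python standard library. The helper name `build_ingest_request` and port 8080 are hypothetical; substitute the port that Tigon announces at runtime.

```python
import json
import urllib.request


def build_ingest_request(port: int, stream: str, fields: list) -> urllib.request.Request:
    """Build (but do not send) a POST request for one event on the given stream."""
    url = f"http://localhost:{port}/v1/tigon/{stream}"
    # Every field is sent as a string, matching the curl examples.
    body = json.dumps({"data": [str(f) for f in fields]}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# Port 8080 is a placeholder for the announced httpPort.
request = build_ingest_request(8080, "clickStream", [2, 290, 1])
print(request.full_url)
print(request.data.decode("utf-8"))
```

To actually send the event, pass the request to `urllib.request.urlopen(request)` while the app is running.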
Sample input for the two streams:
- viewStream:
{"data":["1","10","0","1","2","PageName1","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
{"data":["2","20","0","1","2","PageName2","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
{"data":["3","30","0","1","2","PageName3","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
{"data":["4","40","0","1","2","PageName4","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
{"data":["5","50","0","1","2","PageName5","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
{"data":["6","60","0","1","2","PageName6","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
{"data":["7","70","0","1","2","PageName7","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
{"data":["8","80","0","1","2","PageName8","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
{"data":["9","90","0","1","2","PageName9","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
{"data":["10","100","0","1","2","PageName10","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
- clickStream:
{"data":["2","290","1"]}
{"data":["4","320","2"]}
{"data":["6","350","0"]}
{"data":["8","380","1"]}
{"data":["10","410","2"]}
Output logged for this input:
ClickTime : 290 Link Message : LinkName1 Referrer Page : PageName2
ClickTime : 320 Link Message : LinkName2 Referrer Page : PageName4
ClickTime : 350 Link Message : LinkName0 Referrer Page : PageName6
ClickTime : 380 Link Message : LinkName1 Referrer Page : PageName8
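The sample clickStream contains five events but only four lines of output appear. The short calculation below, using the view and click times from the sample data, shows the gap for each click. It assumes the window is inclusive (a gap of exactly 300 seconds is kept), which is inferred from the sample output rather than from the query text.

```python
MAX_GAP_SECONDS = 300

# viewTime for each pageViewID referenced by the sample clicks.
view_times = {2: 20, 4: 40, 6: 60, 8: 80, 10: 100}

# (refPageViewID, clickTime) pairs from the sample clickStream.
sample_clicks = [(2, 290), (4, 320), (6, 350), (8, 380), (10, 410)]

# Seconds elapsed between each view and its click.
gaps = {ref: click_time - view_times[ref] for ref, click_time in sample_clicks}
kept = [ref for ref, gap in gaps.items() if gap <= MAX_GAP_SECONDS]

print(gaps)  # {2: 270, 4: 280, 6: 290, 8: 300, 10: 310}
print(kept)  # [2, 4, 6, 8]
```

The click on page view 10 arrives 310 seconds after its view, so it falls outside the window and produces no output line.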
Copyright © 2014 Cask Data, Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.