This repository has been archived by the owner on Jan 9, 2023. It is now read-only.

ClickStreamFlow

ClickStreamFlow example application for Tigon.

This app demonstrates how to use Tigon SQL to join two data streams.

In this example, a view event data stream and a click event data stream are joined to generate meta information for each click event using the view event data stream. Click events are filtered on the basis of conditions specified in a SQL query.

Implementation Details

The ClickStreamFlow application consists of:

  • A SQLInputFlowlet that accepts two incoming data streams of view events and click events. It then processes this data based on the provided SQL query and provides meta information for each click event using the information from the associated view event.
  • A DigestFlowlet that accepts the output data generated by the SQLInputFlowlet and emits it to an HTTP endpoint as string-encoded JSON.
  • The generated output data is logged to stdout.

Data Processing

ClickStreamFlow accepts two input streams:

  • viewStream : A stream of data packets that represent a view of a web page. The data contains the following fields:
    • pageViewID : Represents the unique ID that corresponds to the view event on a specific page. In this example, the ID is assumed to be an increasing attribute. It can be used to identify a specific view event.
    • viewTime : Represents the time at which the view event occurred. In this example, times are in seconds and stored as integers.
    • lid : Represents the ID of the nth link available on a given web page.
    • pageInfo : This field contains additional information about the web page being viewed. It may contain the name of the page or the web server it is being served from.
    • linkDetails : This field is a string-encoded map of link IDs and their associated details.
  • clickStream : A stream of data packets that represents a click event. It contains these fields:
    • refPageViewID : The corresponding page view event ID that was responsible for the click.
    • clickTime : The time at which this click event occurred.
    • lid : The ID of the link that was clicked.
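
The string-encoded linkDetails map can be decoded with a few lines of code. The following Python sketch is not part of the application; it assumes the format seen in the sample input in this README, where pairs are separated by " - " and each pair is a single-quoted link ID and detail joined by a colon. The function name parse_link_details is hypothetical.

```python
def parse_link_details(encoded: str) -> dict:
    """Decode "'0':'LinkName0' - '1':'LinkName1'" into {"0": "LinkName0", ...}.

    Assumes the separator and quoting used in this README's sample input.
    """
    details = {}
    for pair in encoded.split(" - "):
        if not pair.strip():
            continue  # skip empty fragments, e.g. from an empty string
        # Split on the first colon only: left side is the link ID, right side the detail.
        link_id, _, detail = pair.partition(":")
        details[link_id.strip().strip("'")] = detail.strip().strip("'")
    return details


print(parse_link_details("'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"))
```

With a map like this, the lid of a click event can be used as a key to look up the details of the clicked link.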

The SQL query performs a join on these two data streams based on the pageViewID and the refPageViewID in viewStream and clickStream respectively. It generates an output record containing meta information associated with each click: click time; referrer page information; link ID; link details; and the referrer page view ID.

The query filters out all click events that occur more than five minutes (300 seconds) after the associated view event.
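
To make the join and the time filter concrete, here is a minimal Python sketch of the same logic applied to in-memory lists. It is an illustration only, not the Tigon SQL query the application runs, and it works on a batch rather than on live streams. The function name, the dictionary layout, and the output keys are assumptions. Keeping a click that arrives exactly 300 seconds after its view is inferred from the sample output in this README.

```python
VIEW_WINDOW_SECONDS = 300  # five minutes, as described above


def join_clicks_with_views(views, clicks, window=VIEW_WINDOW_SECONDS):
    """Join each click to its view and drop clicks that arrive too late.

    views:  dicts with pageViewID, viewTime, pageInfo, linkDetails (a dict here)
    clicks: dicts with refPageViewID, clickTime, lid
    """
    views_by_id = {view["pageViewID"]: view for view in views}
    joined = []
    for click in clicks:
        view = views_by_id.get(click["refPageViewID"])
        if view is None:
            continue  # no matching view event, so nothing to join with
        if click["clickTime"] - view["viewTime"] > window:
            continue  # click came more than `window` seconds after the view
        joined.append({
            "clickTime": click["clickTime"],
            "referrerPage": view["pageInfo"],
            "lid": click["lid"],
            "linkDetail": view["linkDetails"].get(click["lid"]),
            "refPageViewID": click["refPageViewID"],
        })
    return joined


links = {"0": "LinkName0", "1": "LinkName1", "2": "LinkName2"}
views = [
    {"pageViewID": 8, "viewTime": 80, "pageInfo": "PageName8", "linkDetails": links},
    {"pageViewID": 10, "viewTime": 100, "pageInfo": "PageName10", "linkDetails": links},
]
clicks = [
    {"refPageViewID": 8, "clickTime": 380, "lid": "1"},   # 300 s after the view: kept
    {"refPageViewID": 10, "clickTime": 410, "lid": "2"},  # 310 s after the view: dropped
]
for record in join_clicks_with_views(views, clicks):
    print(record["clickTime"], record["linkDetail"], record["referrerPage"])
```

The sketch only checks the upper bound of the window; how the actual query treats a click timestamped before its view is not stated here.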

Build and Usage

To build the ClickStreamFlow jar, run:

 mvn clean package

To build the jar without running the tests, run:

 mvn clean package -DskipTests

To run the app in the Standalone Runtime Environment:

 $ ./run_standalone.sh /path/to/ClickStreamFlow-<version>.jar co.cask.tigon.apps.clickstreamflow.ClickStreamFlow

To run the app in the Distributed Runtime Environment:

 $ ./run_distributed.sh /path/to/ClickStreamFlow-<version>.jar co.cask.tigon.apps.clickstreamflow.ClickStreamFlow

Optional runtime arguments:

  • httpPort: Specify the network port for the HTTP data ingestion service. If not specified, Tigon automatically assigns a port. This value is always announced at runtime.
  • tcpPort_viewStream: Specify the network port for the TCP data ingestion service for ingesting view stream data. If not specified, Tigon automatically assigns a port. This value is always announced at runtime.
  • tcpPort_clickStream: Specify the network port for the TCP data ingestion service for ingesting click stream data. If not specified, Tigon automatically assigns a port. This value is always announced at runtime.

Example Usage

To post view event data to the viewStream:

$ curl -X POST http://localhost:<httpPort>/v1/tigon/viewStream -d '{ "data" : ["1","10","0","1","2","PageName1","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"] }' 

To post click event data to the clickStream:

$ curl -X POST http://localhost:<httpPort>/v1/tigon/clickStream -d '{ "data" : ["2","290","1"] }' 
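
The same requests can be built from a script. The sketch below uses only the Python standard library and mirrors the curl commands above: same URL path, same JSON body shape. The helper name build_event_request and the port 8080 are placeholders; use the httpPort that Tigon announces at runtime.

```python
import json
import urllib.request


def build_event_request(http_port, stream, fields, host="localhost"):
    """Build a POST request for one event, matching the curl examples above.

    stream is "viewStream" or "clickStream"; fields are sent as JSON strings.
    """
    url = f"http://{host}:{http_port}/v1/tigon/{stream}"
    body = json.dumps({"data": [str(field) for field in fields]}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# 8080 is a placeholder port for illustration only.
request = build_event_request(8080, "clickStream", [2, 290, 1])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send the event to a running instance:
# urllib.request.urlopen(request)
```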

Sample Input and Output

Input

  • viewStream:

      {"data":["1","10","0","1","2","PageName1","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
      {"data":["2","20","0","1","2","PageName2","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
      {"data":["3","30","0","1","2","PageName3","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
      {"data":["4","40","0","1","2","PageName4","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
      {"data":["5","50","0","1","2","PageName5","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
      {"data":["6","60","0","1","2","PageName6","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
      {"data":["7","70","0","1","2","PageName7","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
      {"data":["8","80","0","1","2","PageName8","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
      {"data":["9","90","0","1","2","PageName9","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
      {"data":["10","100","0","1","2","PageName10","'0':'LinkName0' - '1':'LinkName1' - '2':'LinkName2'"]}
    
  • clickStream:

      {"data":["2","290","1"]}
      {"data":["4","320","2"]}
      {"data":["6","350","0"]}
      {"data":["8","380","1"]}
      {"data":["10","410","2"]}
    

Output

    ClickTime : 290	Link Message : LinkName1	Referrer Page : PageName2
    ClickTime : 320	Link Message : LinkName2	Referrer Page : PageName4
    ClickTime : 350	Link Message : LinkName0	Referrer Page : PageName6
    ClickTime : 380	Link Message : LinkName1	Referrer Page : PageName8

License and Trademarks

Copyright © 2014 Cask Data, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.