SlideShare a Scribd company logo
Analytical Queries with Hive: SQL
Windowing, and Table Functions

                     Harish Butani, SAP
Agenda
 Ø  Why
      Ø  What    are Partitioned Table Functions (PTFs)?
        Ø  Why are they interesting?
 Ø  What
        Ø  Our solution
        Ø  Demo
 Ø  How
        Ø  Our Implementation, briefly
 Ø  Expand on concept of PTFs:
        Ø  Multi Pass and Recursive Algorithms
 Ø  Next steps and Summary
What are PTFs?
What are PTFs
Ø  AreFunction invocations that can appear in place of a
  table in SQL.
  Contract is Table in à Table out.


Ø  Input   is partitioned (optionally ordered) .
  An instance of a PTF operates on a Partition.
  Partitioning drives parallel execution


Ø  Similar   to MR, but predates MR.

Ø  Available   in many DBs: Oracle, Aster, DB2 etc.
Analytics expressed using PTFs
}    Aggregations by Partition à Ranking,Top N.
      }    Rank products within manufacturer by price
      }    List three largest census tracts by area within each US county
}    Inter row Calculations à Time Series Analysis
      }    Find occurrences where a flight was more than 15 mins. late, five or
            more times in a row
}    Multi Pass Algorithms à Market Basket Analysis
      }    Find items bought frequently together
      }    Find Web pages visited in the same session
}    Graph Algorithms à implemented as Recursive Queries
      }    Find lowest cost flights between two cities
                … exposed in SQL as Table Function invocations
Analytics expressed using PTFs
Ø  Aster        SQL/MR Function Library
  Ø    Time Series Analysis
  Ø    Graph Analysis
  Ø    Fraud Detection
  Ø    Sessionization
  Ø    …
PTFs: bottom-line
Ø  Enable
        more interesting Questions more simply in the SQL
  context
  enable analysis not expressible in SQL
  Simplify expressing analysis
Ø  Foster
        Reuse by providing Function Libraries and bridging to
  external engines.

Ø  App. Developers     expect and rely on this in other DBs.

Ø  Our    solution:
  Ø    An attempt to provide this for Hive
  Ø    Still under development
PTF Invocation Example
          Example: Market Basket Analysis
          Ø  Input is a large set of Baskets, each contains a set of Items
          Ø  Find Items that occur frequently together.
        No Standard Form. But typical structure is:
                              from FrequentItemSets(
                                Basket
                                partition by basketId order by itemName,
                                supportThreshold= 0.15)                        Ø  From clause invokes PTF: FIS
                              select itemset                                   Ø  Fn. told how to partition and
                                                                                   order Input
                                                                               Ø  Other Args.: Support threshold
                                                                                   in this case
                                                                               Ø  No change in other parts of SQL
 Basket
                                                           ItemSets	
  
BasketId	
     ItemName	
                    {"items":["apples","baguette"]}
1	
            Apples	
                      {"items":["apples","corned_b"]}
1	
            Baguette	
                    {"items":["apples","hering"]}
2	
            Apples	
                      {"items":["apples","olives"]}
2	
            Avocado	
                     {"items":["apples"]}
2	
            Olives	
  
SQL Windowing
 Ø  Used  to express Aggregations on Partitions
 Ø  Further Window expressions enable aggregations on a
     Window surrounding a row. So row specific Aggregations.
 Ø  Functions available:
     Ø    Ranking: Rank, DenseRank, PercentRank, NTile
     Ø    Aggregation: Sum, Min, Max, Avg, StdDev,Variance
     Ø    Navigation: First Value, Last Value, Lead, Lag
     Ø    Statistics: CoVariance, Linear Regression: Slope, Intercept

 Ø  Enable    expressing
     Ø    Cumulative Sums
     Ø    Delta Analysis
     Ø    Ratios
SQL Windowing, as a PTF
 Simple Example:
 Ø  Group Sales data by Channel and Month
 Ø  Within each Channel: compute Rank, DenseRank, for each
     Month by sales amount
 Ø  Also compute Rank over all Months across Channels

      select channel, month, sum(amount),
          rank() over (order by sum(amount) desc) AS ra,
          denserank() over (partition by channel order by sum(amount) desc) as dr,
          rank() over (partition by channel order by sum(amount) desc) as r
      From sales
      Group by channel, month;




 Key Observation:
 Ø  Processing happens in 3 stages
 Ø  First everything else is executed: join, group by, and having
     clauses
 Ø  Result Set is made available to Windowing Functions: Partition
     by Partition.

                  Similar to how PTFs are executed!
SQL Windowing, as a PTF
    If all windowing clauses have the same
    partition & order expression,
                                                                                     1. Assume Rank
    then this can be expressed as a PTF                                              on all rows not in
                                                                                     original Query

                                       select channel, month, r, dr
                                           denserank() AS dr,
                                           rank() as r                                                    2. Form makes
                                       From <Group By Query>                                              explicit: first
                                         partition by channel order by sum(amount)                        everything else is
                                                                                                          executed; then do
                                                                                                          Windowing
In PTF form                                                                                               calculations.
-  Assume a PTF fn. called WindowingTableFunction
-  Input is the Group By Query
-  Args. to function are Windowing Clauses

select channel, month, s, r, dr
From WindowingTableFunction(
        <select channel, month, sum(amount_sold) as s
          from sales
           Group by channel, month>
        partition by channel                                   Our Solution:
        order by s,
        [r : <rank()>, dr: <denserank()>] )
                                                               1.  Windowing Clause Support at this level
                                                               2.  Can easily add multiple order support
                                                               3.  Multiple partition support requires more
                                                                    thought
Our Solution
PTFs with Hive
} To   use
   } Download    jar from https://github.com/hbutani/SQLWindowing/wiki
   } Setup   windowingCli.sh in bin/ext directory
   } Use    in CLI mode
   } Also   usable from API
PTFs with Hive CLI
 Windowing CLI
                                                Hive CLI


                       Hive                     Translator

        Windowing
          Shell
                                                             Ø  In CLI enter HQL or PTF Queries
                                                Metadata
                                                             Ø  WindowingCLI embeds Hive
                      Hive callbacks:                             Ø  HQL passed to Hive
                      - Metadata                             Ø  Windowing CLI interacts with Hive for
                      - Execute Embedded Hive                    metadata and executing embedded Hive
                      Queries
                                                                 Queries
Execute PTF Queries
as MR jobs
                                                             Ø  PTF works in a similar fashion to Hive
                                            Execute Hive          Ø  Translates Queries into MR jobs
                                            Queries as MR
                                            jobs
     Cluster
Query Structure
     Query abstraction is a select statement:
          Ø  Input  is a Table Function Invocation
          Ø  Filter & Project on Table Function output
          Ø  Table Function call:
                    Ø  Input is Hive Table, Query or another PTF => can chain PTFs
                    Ø  Specify partitioning and order of Input
                    Ø  Other Function Args.                 Input can be:
                                                            •  Hive Table
                                                            •  Hive Query
From PartitionedTableFunction(
        Input Specification                                 •  Another PTF
               Partition by …
        Order by …
        Function Arguments…
                                                   Not shown here:
   )
Select (ColumnName | Expression)+                  1.  Output of Query can be written to Hive
Where Expression                                       Table or Partition
                                                   2.    Can this form be a SubQuery in HQL? Not
                                                         yet.
Query Structure: Windowing Clauses
                                                         1. Windowing Clauses one
From Input Specification,                                variation to Simple Query
                                                         form
   Partition by …
   Order by …
with
  windowing clause….,
  windowing clause…
Select (ColumnName | Expression)+
Where Expression




                Syntactic sugar
                for                 From WindowingTableFunction(
                                            Input Specification
                                                   Partition by …
                                            Order by …
                                            Windowing Clauses …
                                       )
                                    Select (ColumnName | Expression)+
                                    Where Expression
Query Examples: Basic Query
   Rank Parts within Manufacturer by price
Part                                                          1.  On TPCH   Part table
MfrName	
              PartName	
               Price	
       2.  Rank Parts within each Manufacturer by
Manufacturer#1	
       violet almond            2095.99	
     Price
                       orange lavender
                       peach	
  
Manufacturer#2	
       yellow magenta           2094.99	
  
                       gainsboro almond
                       turquoise	
  
                                                                        Not so straightforward w/o windowing because no
Manufacturer#2	
       papaya cream             2095.99	
  
                       smoke yellow khaki	
                             inter row expressions
Manufacturer#1	
       pink orange peach        2094.99	
               1.  Rank over all Rows: (Mfr, Part, Overall Rank)
                       beige steel	
             	
                     2.  Rank Min Query: (Mfr, Min(Rank) )
                                                                        3.  Join 1 & 2 on Mfr, subtract Rank from Min
                                                                             (Rank)



                     from part_rc
                     partition by p_mfgr                                    Manufacturer#1	
     violet almond            2095.99	
     1	
  
                                                                                                 orange lavender
                     order by p_mfgr, p_retailprice desc
                                                                                                 peach	
  
                     with
                                                                            Manufacturer#1	
     pink orange peach        2094.99	
     2	
  
                       rank() as r
                                                                                                 beige steel	
  
                     select p_mfgr,p_name, p_retailprice, r                 Manufacturer#2	
     papaya cream             2095.99	
     1	
  
                                                                                                 smoke yellow khaki	
  
                                                                            Manufacturer#2	
     yellow magenta           2094.99	
     2	
  
                                                                                                 gainsboro almond          	
  
                                                                                                 turquoise	
  
Demo
Query Examples: Top N
           Calculate the Top 3 Tracts(based on land area) by County.
                                                                   Census Geo Header data:
 County	
     Tract	
      AreaLand	
           SumLev	
              1.  Geography dimension for Census data.
 001	
        451101	
     300	
                140	
  
                                                                         2.  Contains   data from multiple hierarchies and levels.
 001	
                     1200	
               120	
  
 005	
        000102	
     35	
                 140	
                    3.  Query   on County-> Census Tract -> Census Block
 004	
        000200	
     15	
                 140	
                    hierarchy
                                                                         4.  Summary    Level column used to identity level

  from <select county, tract, arealand
          from geo_header_sf1
          where sumlev = 140>
  partition by county
  order by county, arealand desc
  with                                                                                   County	
     Tract	
      AreaLand	
     R	
     Cum_Area	
  
    rank() as r,                                                                         001	
        451101	
     300	
          1	
     300	
  
    sum(arealand) over rows                                                               	
  
        between unbounded preceding and current row as cum_area                          001	
        450701	
     250	
          2	
     550	
  
  select county, tract, arealand, r, cum_area                                             	
  
  where <r <= 3>                                                                         001	
        441503	
     150	
          3	
     700	
  
                                                                                                       	
  
                                                                                         005	
        000102	
     450	
          1	
     450	
  
                                                                                         005	
        000200	
     200	
          2	
     650	
  
1. Input is a HQL                     3. Only output top     2. Sum from start
query                                 3 rows from each       of Partition up to
                                      partition.             Current row.
PTF Example: NPath
 Now example of a PTF: NPath
 }  Look for patterns in Time
 }  User specifies Labels: interesting conditions, for e.g. LATE : arr_delay > 15 mins
 }  Then specifies Patterns on Labels. Patterns are simple Regexes. For e.g.
       }    LATE.LATE.LATE.LATE.LATE+ à look for occurrences where a flight is 5 or more times late.
 }    On Occurrences found (Occurrences are a set of rows) specify aggregation calculations. For e.g.
       }    Average Delay among late occurrences
       }    Number of delays



Note
}  This is a non trivial function to implement.
}  But from User point of view just another Function invocation. Can specify Function behavior
    through arguments
}  Also Query executed in the same way: Partition input, invoke function on each Partition…
PTF Example: NPath
         Find incidents where a Flight(to NY) has been more than 15 minutes
         late 5 or more times in a row.                             1. Query on
                                                                    FlightsData table,
                                                                                                                                                  restrict to flights
                                                                                                                                                  to NY
      from npath(<select origin_city, year, month, day_of_month, arr_delay, fl_num
                      from flightsdata                                                                                                2. Looking at data
                      where dest_city = 'New York' and dep_time != ''>                                                                per Flight; order
                    partition by fl_num                                                                                               within partition
                    order by year, month, day_of_month,                                                                               by time

                      <[LATE : "arr_delay > 15"]>,

                     'LATE.LATE.LATE.LATE.LATE+',
                                                                                                                                   4. This is very hard in SQL.
                     <["origin_city", "fl_num", "year", "month", "day_of_month",                                                   Remember the LABEL and
                                      ["(path.sum() { it.arr_delay})/((double)count)", "double", "avgDelay"],                      PATTERNS are specified at
                                      ["count", "int", "numOfDelays"]                                                              Query execution time. So
                     ]>                                                                                                            window of analysis is
                  )                                                                                                                dynamic.
      select origin_city, fl_num, year, month, day_of_month, avgDelay, numOfDelays


                                                                    Origin	
         FlNum	
     Year	
      Month	
     Day	
        AvgDelay	
          NumOfDely	
  
3.                                                                  Boston	
         1017	
      2010	
      10	
        25	
         59.37	
             8	
  
-  Arg. 3 specify conditions as LABELS                              Boston 	
        1017	
      2010	
      10	
        26 	
        58.14	
             7	
  
-  Arg. 4 specify PATTERN                                           Boston	
         1017	
      2010	
      10	
        28	
         30.83	
             6	
  
-  Arg. 5 specify AGGR. EXPRESSIONS
                                                                    Boston 	
        1017	
      2010 	
     10	
        29	
         25.67	
             5	
  
                                                                    Pittsburgh	
     1058	
      2010	
      12	
        26	
         82.62	
             8	
  
Our Solution: What’s available
 Ø    Windowing Functions
        Ø  21 functions available
        Ø  For Ranking, Aggregation, Navigation and Statistics
        Ø  Both Row based and Value based windows.
 Ø    One Pass PTFs
        Ø  NPath
        Ø  Others in the works: Allocation, Deallocation etc. in the bucket of
            Lightweight Dimensional Analysis: See wiki for details.
 Ø    Multi Pass PTFs
        Ø  Market Basket Analysis: using Dynamic Item Set counting Algorithm.
        Ø  Plans to do Generalized Transitive Closure.


                Looking for your input and help implementing others
Our Solution: Query Evaluation
Query Evaluation: a PTF
                                                                                                       Hopefully No Surprise
                                                                                                       Ø    Shuffle to Partition and Sort
 Input           Map                                                                                   Ø    PTFs work on Partitions instead of
DataSet         Splits                                                                                       Rows
                                                                                                       Ø    PTFs use Groovy for expression
                                                                                                             evaluation today.
                                   Writables                  Shuffle controlled by
                                       +                      partition and order
                                    SerDe                     specification

                                                              	
  

                                                                                                                                               Output

                                                                             Partition                                            Partition

          Ø    A PartitionedTableFunction (PTF) given a
                                                                              Writables                                            Writables
                Partition computes an output Partition.
                                                                                  +               PTF                                  +
                Ø    An invocation of PTF specifies how input                 SerDe                                                SerDe
                      dataset should be partitioned and ordered.
                Ø    A PTF defines shape of Output.                           Ø     Partitions are containers of Rows.
                Ø    A PTF may operate on raw data before it is                       Ø  List of Writables + ObjectInspector
                      partitioned and ordered.                                 Ø     Rows exposed as Groovy Binding; Partition
                                                                                      exposed as Groovy iterable.
  For details see doc. directory on GitHub                                     Ø     All Evaluation in context of Row and Partition
                                                                                      and optionally Window
Multi Pass & Recursive Queries PTFs
Multi Pass and Recursive Queries PTFs
}    So far
      }    functions perform one pass over the input
      }    function acts on input after it is partitioned.
}    But other use cases require multiple passes on the input
      Ø  Since each Partition executed independently:
            Ø  you may need to consolidate Output
            Ø  and based on consolidation revisit data.

      Ø  Recursive   Queries used for implementing Graph Algorithms
            are an important subclass of such problems
Ø  In      the Context of this talk the focus is:
      Ø  How    do these fit into the PTF model: both from an interface
            perspective and also from an execution model
Multi Pass PTFs
}    Key Observation:
}    Execution model can be extended with following changes:
      }    Partition input and persist
      }    Repeat à a fixed # of iterations or dynamically determined
            }    Map-side: operate on persisted Input partitions; do Partition à Partition
            }    use shuffle to consolidate output across partitions.
            }    Output from one pass read as Input in next pass.
Ø  But           from Interface Perspective
      Ø    End User still sees this as a PTF invocation


                        Multi Pass mechanics still being worked out
Query Evaluation: Multi Pass PTF
                                                                                                                             repeat
 Input
DataSet                               Map
                                     Splits
                                                  Partition                            Partition

                                                                                        Writables        Shuffle controlled by
                 Partition and
                                                   Writables
                                                       +          PTF       mapside         +            partition and order
                 optionally order                   SerDe                                SerDe           specification

                 	
                                                                                      	
  



                                                               Partition                                        Partition
 Ø    Input to Function is Partitioned.
                                                                Writables                                        Writables
       Function is applied on Map-Side
                                                                                      PTF
 Ø 
                                                                    +                       reduceside
                                                                                                                     +
 Ø    Shuffle used to collect Output                            SerDe                                            SerDe
 Ø    Process repeated
 Ø    Pass n reads output from Pass n-1



                                              Output
PTF Example: Market Basket Analysis
 Ø    Frequent ItemSets computed by doing 2 passes (the SON alg.):
         Ø  Pass 1 compute Frequent ItemSets for each Partition independently
             and consolidate across Partitions.
         Ø  Pass 2 go over input again, eliminate false negatives.
         Ø  Note: computing Frequent ItemSets is complex
               Ø  number of possible ItemSets is exponential
               Ø  Many interesting algorithms: we have implemented Dynamic
                   Item Counting alg.
 Ø    See wiki for details….
 Ø    This is not a big jump from 1 pass
         Ø  Output of Pass 1 is very small relative to input; so no issue of
             communication cost
         Ø  Only 2 passes involved.
PTF Example: Market Basket Analysis
 Basket
                                                                             Implementation and Interface needs work:
BasketId	
     ItemName	
                                                    •  User exposed to the fact that input initially
1	
            Apples	
                                                         partitioned. And then output of function is
1	
            Baguette	
                                                       partitioned.
                                                                             •  Also Multi Pass mechanics not ready.
2	
            Apples	
  
2	
            Avocado	
  
2	
            Olives	
  



 from candidateFrequentItemSets(                                         from FrequentItemSets(
                 <select * from basketdata                                 Basket
                  distribute by basketName                                 partition by basketId order by itemName,
                  sort by basketName, itemName>                            supportThreshold= 0.15)
 partition by itemset order by itemset,                                  select itemset
 'basketName', 'itemName', <0.15>)
 select itemset



                                                     ItemSets	
  
                                       {"items":["apples","baguette"]}
                                       {"items":["apples","corned_b"]}
                                       {"items":["apples","hering"]}
                                       {"items":["apples","olives"]}
                                       {"items":["apples"]}
Recursive Queries as PTFs
}    Heart of Graph Algorithms is traversal: the discovery and selection of Paths
}    In SQL context Graph held in Table R(Src., Dest.) and traversal expressed
      as Relational Operators run iteratively to a fixed point.
      }  Typically until no new Paths discovered.
}    Lot of work in DB community to parallelize these Algorithms:
      }  Partition work
      }  Reduce communication à naïve impl. will suffer from high
          comm. cost.
}    Of late revival of interest:
      }  HaLoop project: Relation based implementation. Based on tweaking
          MR mechanics: changes to JobTracker and TaskTracker.
      }  Giraph project: Matrix based Direct Algorithms
Recursive Queries as PTFs
}     In PTF context:
}     A Graph Algorithm involves
       }  An Input Relation R(Source, Destination)
       }  Output contains Paths that meet a certain criteria.
}     A Recursive Query is processed in a fashion very similar to a Multi Pass
       Algorithms
       }    Partition input and persist ß Partition R by Destination
       }    Repeat
                 }    operate on persisted partitions ß Do map-side Join of R with newly discovered
                       Paths
                 }    use partitioning to consolidate output across partitions. ß partition output of join (the
                       new Paths) by Source. Dedup; output only new Paths.

}     But for End User still sees this as a PTF invocation

      Under the covers could use HaLoop or Giraph to implement Alg.?
PTF Example: Transitive Closure
Flights
                                               Simple example
Source	
       Destination	
                   •  List all possible Routes
New York	
     London	
  

New York	
     Paris	
  
 	
  
London	
       Bombay	
  
                	
  
London	
       Dubai	
  
Dubai	
        Bombay	
  



      from transitiveClosure(
                      <select * from Flights
                        distribute by Dest>
      partition by Src,
      ‘Src', ‘Dest’)
      select Src, Dest                                                       Source	
       Destination	
  
                                                                             New York	
     London	
  

                                                                             New York	
     Paris	
  
                                                                              	
  
                                                                             London	
       Bombay	
  
                                                                                             	
  
                                                                             London	
       Dubai	
  
                                                                             Dubai	
        Bombay	
  
                                                                             New York       Bombay
                                                                             New York       Dubai
PTF Example: Generalized TC
 But TC Mechanics can be generalized.                            More Interesting example:
 Specify:                                                        Input: FlightsTable(Src, Dest, ArrTm, DepTm, Cost)
 Ø  Path Joining Condition: how to generate new                 Ø   Find me the best routes from any airport in New York to
     Paths                                                            Bombay.
 Ø  Path Attributes: how to calculate Aggregation               Ø   The waits at intermediate points must be between 2 to 5 hours.
     attributes on Paths à Sets of Edges                        Ø   Pick the lowest cost flight, but for a direct flight I am willing to
 Ø  Path selection criteria: how to pick from multiple               pay a $100 premium.
     Paths between a Source and Destination.


                                                          from GeneralizedTC(
from GeneralizedTC(                                                       <select * from Flights
                <select * from Flights                                      where Src = ‘New York’
                  distribute by Dest>                                       distribute by Dest>
partition by Src,                                         partition by Src,
SourceColumn,                                             ‘Src’,
DestColumn,                                               ‘Dest’,
Path Joining Condition,                                   Path Joining Condition = <Dest = Path.Src &&
[Path Attributes],                                                                      ArrTm <= Path.DepTm - 120 &&
Path selection condition,                                                               ArrTm >= Path.DepTm - 300>,
select Src, Dest,….                                       Path Attributes = [
Where …                                                                        totalCost: <sum(Price)>,
                                                                               hops : <count(*)>
                                                                              ],
                                                          Path selection condition = <
                                                                                            p1.cost <= p2.cost ||
Google for ‘Shaul Dhar Generalized
Transitive Closure in SQL.’
                                                                                            (p1.hops == 0 && p1.cost <= p2.cost + 100)
1993 Phd thesis from Univ. of Wisconsin,                                                    >,
Madison.                                                  select Src, Dest,….
                                                          Where Dest = ‘Bombay’
Summary and Next Steps
Benefits
Ø  Enable
        more interesting Questions more simply in the SQL
  context
  enable analysis not expressible in SQL
  Simplify expressing analysis


Ø  Foster
        Reuse by providing Function Libraries and bridging to
  external engines.

Ø  App. Developers   expect and rely on this in other DBs.
Next Steps
 Ø  Make Windowing   clauses closer to standard SQL
 Ø  Use Hive Expressions Evaluation at runtime instead of Groovy
 Ø  So why cannot this be part of Hive?
    Ø  Yes   makes a lot of sense. See wiki for a step-by-step plan.
 Ø  Build    out Multi-Pass and Recursive Query mechanics
    Ø  Enhance   PTF interface. Ensure simple ifc for End User. No leakage of
        implementation.
    Ø  Investigate using HaLoop/Giraph as execution model

 Ø  Flush    out function library
More information
}  More details/download at
    https://github.com/hbutani/SQLWindowing/wiki
}  Contact:
      }    Harish Butani: harish.butani@sap.com
Appendix
So Why PTFs?
 Ø    Usability
       Ø  Table Functions: enable more interesting Questions more simply
       Ø  Enable analysis not expressible in SQL
       Ø  Simplification of existing queries: both syntactically and in performance
             Ø    replace self-joins with intra row computations
 Ø    Reusability
       Ø    Functions are parameterized, so reusable in wide range of contexts
       Ø    Functions can reshape Output Schema at runtime
 Ø    Bridge
       Ø    Since ifc to SQL engine is coarse, execution of PTFs may involve other MPP
             engines (multi-pass)
 Ø    Speed
       Ø    Partitioned Table Functions enable working at scale by breaking down dataset
             into manageable Partitions
So Why PTFs?
 Ø  Use    Cases:
    Ø  Time  Series Analysis: NPath
    Ø  Market Basket Analysis
    Ø  Lightweight Dimensional Analytics
    Ø  Graph Algorithms: Transitive Closure
    Ø  Sessionization
    Ø  SQL Windowing Clauses
      Ø    Can be treated as a PTF if all windowing clauses are on the same Partition.
PTF Interface
                     3. Function may have Map
                     side processing




                 1.
                 -  Controls Shape of Output
                 -  Responsible for Output




                2. Enables chaining of
                Functions
Partitioned Table Function Mechanics
                                                                     Node
1.  Partition input into n             Input DataSet
                                                                        Partition 0
    partitions, optionally
    order partitions                                                                        Func.


2.    distribute over set of
      processing nodes
                                                  Partition and      Node
                                                  optionally order
3.    execute Fn. at each node                                         Partition 0
                                                  	
                                                      Next Operator
4.     merge data from all                                                                  Func.
      instances and stream
      merged result to next
      Operator                                                                       .
                                                                                     .
                                                                                     .
                                                                     Node
                                                                                     	
  
                                                                       Partition 0
         Why?
         Ø for Performance: enables parallel
                                                                                            Func.
         operation on large datasets
         Ø for providing advanced analytics                                                        This is not the
                                                                                                    same as Hive
               Supported in many Databases: Oracle, DB2, Aster etc.                                 UDTF
Support in RDBMS: Aster SQL/MR
Problem: Group clickstream data into sessions
Ø  Consider all rows of a User as a Partition
Ø  Order rows by time
Ø  Group rows within 60 seconds of each other as
    belonging to a Session.



Benefits
Ø  Work at Scale: designed to work over 100s of terabytes
Ø  Usability and Reusability:
      Ø  SQL/MR looks like a table, it can appear anywhere a table can appear in SQL
      Ø  Enable analysis not expressible in SQL
      Ø  Simplification of existing queries
      Ø  Functions are parameterized so usable in wide range of contexts.


Aster Function Library
Ø  Time Series Analysis
Ø  Graph Analysis
Ø  Fraud Detection
Ø  Sessionization
Ø ....
Support in RDBMS : Oracle
FUNCTION test_ptf (p_cursor IN t_parallel_test_ref_cursor)
  RETURN t_parallel_test_tab PIPELINED
  PARALLEL_ENABLE(PARTITION p_cursor BY HASH (id));

END parallel_ptf_api;

SELECT sid, count(*)
FROM TABLE(test_ptf(CURSOR(SELECT * FROM input) ) ) t2
GROUP BY sid;
Hive Script Operators
 But one can do PTFs in Hive using Script Operators
  From
          (From clicks
          Select transform(userid, pageid, ts)
          Using ‘/bin/cat’
          As (userid, pageid, ts)
          Distribute by userid
          Sort by userid, ts) map_output
  Select transform(userid, pageid, sessionId)
  Using ‘your java pgm/python script/… pass in args..’
  As (userid, pageid, sessionId)



But
Ø  Not as Usable
       Ø  compare to Aster Query
       Ø  Painful for App. Developer to use. Imagine trying to
           chain functions.
Ø  Not very Reusable                                                        Analogy
       Ø  arg passing embedded in strings that spawn process       Integration via stdin stdout
 And implementation artifacts                                                   Vs
       Ø  Data streamed across process boundaries.                 An embedded Func Library
       Ø  Type information passing limited: tab separated files.
PTF Example: Hierarchical Evaluation
  For a Country, State, City Geo hierarchy compute: % of Country Sales,
  Top City Sales, Avg. City Sales

from hierarchyEvaluate(
  < select Country, State, City, sum(Sales) from Sales group by Country, State, City>
   partition by Country
   order by Country, State, City,
<['Country', 'State', 'City']>,
<[
  ["Sales / Ancestor('Country', 'Sales') * 100.0", "% Country"],
  ["TopN(Descendants('City'), 'Sales', ,1)", "Top City"],
  ["Avg(Descendants('City'), Sales)", "Avg. City"]
]>)
select Country, State, City, '% Country', 'Top City', 'Avg. City'
Simple PTF implementation
                            Annotation specifies Function
                            Args, Name, behavior




                            Shape based on Function
                            before it in chain; or the
                            Query Input if this is the first
                            function.
Query Evaluation: complete picture
                      Ø    A Query is a chain of PTFs.
                            Ø    Input of chain is a Hive Query or table
                      Ø    Windowing clauses are syntax sugar for the
                            special Windowing Table function.
                      Ø    Query translated to a series of Jobs.  Each Job
                            executes part of the Function Chain
                            Ø    For intermediate steps: the output is written
                                  to hdfs and exposed as a Temporary Table to
                                  be used by the next Job in the Chain.
Query Evaluation: Windowing Clauses


 Input           Map
DataSet         Splits



                               Writables               Shuffle controlled by
                                   +                   partition and order
                                SerDe                  specification

                                                       	
  

                                                                                                      Output

                                                                      Partition          Partition
          Windowing Table Function:
                                                                       Writables          Writables
          1.    Pass Partition to each Agg. Function                       +       PTF        +
                                                                        SerDe              SerDe
          2.    If Window Clause specified, pass a Window
                also
          3.    Collect output from each Agg. Function
          4.    Union over all Agg. Functions + Input used
                to evaluate select list
Query Eval.: Market Basket Analysis
        Based on the 2 pass SON algorithm as described in the
        Mining Massive Datasets book.
Job 1                      Node                           Ø           Apply Dynamic Item Counting
                                                                       with support appropriately
                                                                                                             Node
Input Basket                   Partition 0                             scaled down.
  DataSet                                                 Ø           Output Candidate IS from
                                                                                                                   Partition 0
                                                                       each Mapper.
                                                    Map

                                                                                                                                   Reduce


                                                                Distribute by itemset
              Partition by basket            .
              Order by basket, item          .                  	
                                                                      Candidate
                                             .
              	
  
                                             	
                                                                                         ItemSets



Job 2                      Node
                                                                                              Node
                               Partition 0
 Scan baskets for                                                                                    Partition 0
 Candidate Itemsets
                                                    Map

                                                                                                                          Reduce
                                                                                                                                             Final

                                                                Distribute by itemset                                                       ItemSets

                                                                	
  
Query Eval.: Transitive Closure
Sample invocation

      From transitiveClosure(<select X, Y from InputTable cluster by Y>
                              partition by X)                                                                                             }    In the ith iteration:
      Select X, Y;                                                                                                                              }    On the Map side a partition of
                                                                                                                                                      T is joined with the
                                                                                                                                                      corresponding deltaR; this is
First Pass:                                                                                                                                           similar to a MapJoin in Hive.
                                                                                                                                                      Generates new Paths.
                                                                                                                                                }    On the Reduce side the new
      Cluster:	
                                                                        Create ith Partition of R                                     paths from deltaNextR are held
                                      Distribute by X
                                                                                        	
                                                            in a searchable structure; the
       Node i
                      Map Function:
                                      	
                    ReduceFunction:
                                                                                Node j                                                                rows of the corresponding R
                                                                                                                                                      partition are streamed to mark
         T(X, i)      O/P (X, Row)                          O/P Row                 R(i, Y)                                                           any rows in deltaNextR that
                                                                                                                                 repeat               are duplicates. At the end a new
                                                                                                                                 	
                   file is stored in the
                                                                                                                                                      correponding R partition with
                                                                                                                                                      the new paths

ith Pass:

    Cluster:	
                                             1.  Read in R(i, Y)                          Create ith deltaR file
                     1.  Read in dR(I, Y)                  2.  Perform dRnext – R                       in R(i, Y)
                                                           3.  Add dR rows to R
      Node i
                     2.  Perform dR join T

                     Map Function:                                   ReduceFunction:
                                                                                               Node j   	
  
       T(X, i)       O/P (X, Row)
                                      3. Distribute by X

                                      	
                                                                                                        Can we use HaLoop
                                                                                                                                                or Giraph to do
                                                                                                                                                Graph Processing?
Can this be part of Hive?
}    There are undeniable reasons for doing this, briefly:
      }    end users would want this functionality inside Hive for reasons of consistent behavior, support etc.
      }    use of a consistent expression language. For e.g. reuse of Hive functions in Windowing
            clauses.
      }    Implementation wise:
            }    Windowing is orders of magnitude simpler than Hive, and can benefit from using equivalent components that are in
                  Hive.
            }    Avoid the trap of constantly chasing changes in the Hive code base.
            }    Folding in Table function mechanics may open up optimizations not possible with the approach today.
Path to folding into Hive
 Possible Path to moving into Hive ( Details here)
 Ø  Step 1: Move to Hive MR mechanics for Job execution
 Ø  Step 2: Move to Hive Evaluators and Expressions
 Ø  Step 3: Introduce the concept of Partition Table Functions in
     Hive; allow users to invoke PTFs via an exec Function
     mechanism.
 Ø  Step 4: Allow Table function invocations to appear in Table
     Expressions; do AST transformations to translate to HQL
     followed by exec PTF.
 Ø  Step 5: Extend HQL with Windowing Clauses

More Related Content

Analytical Queries with Hive: SQL Windowing and Table Functions

  • 1. Analytical Queries with Hive: SQL Windowing, and Table Functions Harish Butani, SAP
  • 2. Agenda Ø  Why Ø  What are Partitioned Table Functions (PTFs)? Ø  Why are they interesting? Ø  What Ø  Our solution Ø  Demo Ø  How Ø  Our Implementation, briefly Ø  Expand on concept of PTFs: Ø  Multi Pass and Recursive Algorithms Ø  Next steps and Summary
  • 4. What are PTFs Ø  AreFunction invocations that can appear in place of a table in SQL. Contract is Table in à Table out. Ø  Input is partitioned (optionally ordered) . An instance of a PTF operates on a Partition. Partitioning drives parallel execution Ø  Similar to MR, but predates MR. Ø  Available in many DBs: Oracle, Aster, DB2 etc.
  • 5. Analytics expressed using PTFs }  Aggregations by Partition à Ranking,Top N. }  Rank products within manufacturer by price }  List three largest census tracts by area within each US county }  Inter row Calculations à Time Series Analysis }  Find occurrences where a flight was more than 15 mins. late, five or more times in a row }  Multi Pass Algorithms à Market Basket Analysis }  Find items bought frequently together }  Find Web pages visited in the same session }  Graph Algorithms à implemented as Recursive Queries }  Find lowest cost flights between two cities … exposed in SQL as Table Function invocations
  • 6. Analytics expressed using PTFs Ø  Aster SQL/MR Function Library Ø  Time Series Analysis Ø  Graph Analysis Ø  Fraud Detection Ø  Sessionization Ø  …
  • 7. PTFs: bottom-line Ø  Enable more interesting Questions more simply in the SQL context enable analysis not expressible in SQL Simplify expressing analysis Ø  Foster Reuse by providing Function Libraries and bridging to external engines. Ø  App. Developers expect and rely on this in other DBs. Ø  Our solution: Ø  An attempt to provide this for Hive Ø  Still under development
  • 8. PTF Invocation Example Example: Market Basket Analysis Ø  Input is a large set of Baskets, each contains a set of Items Ø  Find Items that occur frequently together. No Standard Form. But typical structure is: from FrequentItemSets( Basket partition by basketId order by itemName, supportThreshold= 0.15) Ø  From clause invokes PTF: FIS select itemset Ø  Fn. told how to partition and order Input Ø  Other Args.: Support threshold in this case Ø  No change in other parts of SQL Basket ItemSets   BasketId   ItemName   {"items":["apples","baguette"]} 1   Apples   {"items":["apples","corned_b"]} 1   Baguette   {"items":["apples","hering"]} 2   Apples   {"items":["apples","olives"]} 2   Avocado   {"items":["apples"]} 2   Olives  
  • 9. SQL Windowing Ø  Used to express Aggregations on Partitions Ø  Further Window expressions enable aggregations on a Window surrounding a row. So row specific Aggregations. Ø  Functions available: Ø  Ranking: Rank, DenseRank, PercentRank, NTile Ø  Aggregation: Sum, Min, Max, Avg, StdDev,Variance Ø  Navigation: First Value, Last Value, Lead, Lag Ø  Statistics: CoVariance, Linear Regression: Slope, Intercept Ø  Enable expressing Ø  Cumulative Sums Ø  Delta Analysis Ø  Ratios
  • 10. SQL Windowing, as a PTF Simple Example: Ø  Group Sales data by Channel and Month Ø  Within each Channel: compute Rank, DenseRank, for each Month by sales amount Ø  Also compute Rank over all Months across Channels select channel, month, sum(amount), rank() over (order by sum(amount) desc) AS ra, denserank() over (partition by channel order by sum(amount) desc) as dr, rank() over (partition by channel order by sum(amount) desc) as r From sales Group by channel, month; Key Observation: Ø  Processing happens in 3 stages Ø  First everything else is executed: join, group by, and having clauses Ø  Result Set is made available to Windowing Functions: Partition by Partition. Similar to how PTFs are executed!
  • 11. SQL Windowing, as a PTF If all windowing clauses have the same partition & order expression, 1. Assume Rank then this can be expressed as a PTF on all rows not in original Query select channel, month, r, dr denserank() AS dr, rank() as r 2. Form makes From <Group By Query> explicit: first partition by channel order by sum(amount) everything else is executed; then do Windowing In PTF form calculations. -  Assume a PTF fn. called WindowingTableFunction -  Input is the Group By Query -  Args. to function are Windowing Clauses select channel, month, s, r, dr From WindowingTableFunction( <select channel, month, sum(amount_sold) as s from sales Group by channel, month> partition by channel Our Solution: order by s, [r : <rank()>, dr: <denserank()>] ) 1.  Windowing Clause Support at this level 2.  Can easily add multiple order support 3.  Multiple partition support requires more thought
  • 13. PTFs with Hive } To use } Download jar from https://github.com/hbutani/SQLWindowing/wiki } Setup windowingCli.sh in bin/ext directory } Use in CLI mode } Also usable from API
  • 14. PTFs with Hive CLI Windowing CLI Hive CLI Hive Translator Windowing Shell Ø  In CLI enter HQL or PTF Queries Metadata Ø  WindowingCLI embeds Hive Hive callbacks: Ø  HQL passed to Hive - Metadata Ø  Windowing CLI interacts with Hive for - Execute Embedded Hive metadata and executing embedded Hive Queries Queries Execute PTF Queries as MR jobs Ø  PTF works in a similar fashion to Hive Execute Hive Ø  Translates Queries into MR jobs Queries as MR jobs Cluster
  • 15. Query Structure Query abstraction is a select statement: Ø  Input is a Table Function Invocation Ø  Filter & Project on Table Function output Ø  Table Function call: Ø  Input is Hive Table, Query or another PTF => can chain PTFs Ø  Specify partitioning and order of Input Ø  Other Function Args. Input can be: •  Hive Table •  Hive Query From PartitionedTableFunction( Input Specification •  Another PTF Partition by … Order by … Function Arguments… Not shown here: ) Select (ColumnName | Expression)+ 1.  Output of Query can be written to Hive Where Expression Table or Partition 2.  Can this form be a SubQuery in HQL? Not yet.
  • 16. Query Structure: Windowing Clauses 1. Windowing Clauses one From Input Specification, variation to Simple Query form Partition by … Order by … with windowing clause…., windowing clause… Select (ColumnName | Expression)+ Where Expression Syntactic sugar for From WindowingTableFunction( Input Specification Partition by … Order by … Windowing Clauses … ) Select (ColumnName | Expression)+ Where Expression
  • 17. Query Examples: Basic Query Rank Parts within Manufacturer by price Part 1.  On TPCH Part table MfrName   PartName   Price   2.  Rank Parts within each Manufacturer by Manufacturer#1   violet almond 2095.99   Price orange lavender peach   Manufacturer#2   yellow magenta 2094.99   gainsboro almond turquoise   Not so straightforward w/o windowing because no Manufacturer#2   papaya cream 2095.99   smoke yellow khaki   inter row expressions Manufacturer#1   pink orange peach 2094.99   1.  Rank over all Rows: (Mfr, Part, Overall Rank) beige steel       2.  Rank Min Query: (Mfr, Min(Rank) ) 3.  Join 1 & 2 on Mfr, subtract Rank from Min (Rank) from part_rc partition by p_mfgr Manufacturer#1   violet almond 2095.99   1   orange lavender order by p_mfgr, p_retailprice desc peach   with Manufacturer#1   pink orange peach 2094.99   2   rank() as r beige steel   select p_mfgr,p_name, p_retailprice, r Manufacturer#2   papaya cream 2095.99   1   smoke yellow khaki   Manufacturer#2   yellow magenta 2094.99   2   gainsboro almond     turquoise  
  • 18. Demo
  • 19. Query Examples: Top N Calculate the Top 3 Tracts(based on land area) by County. Census Geo Header data: County   Tract   AreaLand   SumLev   1.  Geography dimension for Census data. 001   451101   300   140   2.  Contains data from multiple hierarchies and levels. 001   1200   120   005   000102   35   140   3.  Query on County-> Census Tract -> Census Block 004   000200   15   140   hierarchy 4.  Summary Level column used to identity level from <select county, tract, arealand from geo_header_sf1 where sumlev = 140> partition by county order by county, arealand desc with County   Tract   AreaLand   R   Cum_Area   rank() as r, 001   451101   300   1   300   sum(arealand) over rows     between unbounded preceding and current row as cum_area 001   450701   250   2   550   select county, tract, arealand, r, cum_area     where <r <= 3> 001   441503   150   3   700       005   000102   450   1   450   005   000200   200   2   650   1. Input is a HQL 3. Only output top 2. Sum from start query 3 rows from each of Partition up to partition. Current row.
  • 20. PTF Example: NPath Now example of a PTF: NPath }  Look for patterns in Time }  User specifies Labels: interesting conditions, for e.g. LATE : arr_delay > 15 mins }  Then specifies Patterns on Labels. Patterns are simple Regexes. For e.g. }  LATE.LATE.LATE.LATE.LATE+ à look for occurrences where a flight is 5 or more times late. }  On Occurrences found (Occurrences are a set of rows) specify aggregation calculations. For e.g. }  Average Delay among late occurrences }  Number of delays Note }  This is a non trivial function to implement. }  But from User point of view just another Function invocation. Can specify Function behavior through arguments }  Also Query executed in the same way: Partition input, invoke function on each Partition…
  • 21. PTF Example: NPath Find incidents where a Flight(to NY) has been more than 15 minutes late 5 or more times in a row. 1. Query on FlightsData table, restrict to flights to NY from npath(<select origin_city, year, month, day_of_month, arr_delay, fl_num from flightsdata 2. Looking at data where dest_city = 'New York' and dep_time != ''> per Flight; order partition by fl_num within partition order by year, month, day_of_month, by time <[LATE : "arr_delay > 15"]>, 'LATE.LATE.LATE.LATE.LATE+', 4. This is very hard in SQL. <["origin_city", "fl_num", "year", "month", "day_of_month", Remember the LABEL and ["(path.sum() { it.arr_delay})/((double)count)", "double", "avgDelay"], PATTERNS are specified at ["count", "int", "numOfDelays"] Query execution time. So ]> window of analysis is ) dynamic. select origin_city, fl_num, year, month, day_of_month, avgDelay, numOfDelays Origin   FlNum   Year   Month   Day   AvgDelay   NumOfDely   3. Boston   1017   2010   10   25   59.37   8   -  Arg. 3 specify conditions as LABELS Boston   1017   2010   10   26    58.14   7   -  Arg. 4 specify PATTERN Boston   1017   2010   10   28   30.83   6   -  Arg. 5 specify AGGR. EXPRESSIONS Boston   1017   2010    10   29   25.67   5   Pittsburgh   1058   2010   12   26   82.62   8  
  • 22. Our Solution: What’s available Ø  Windowing Functions Ø  21 functions available Ø  For Ranking, Aggregation, Navigation and Statistics Ø  Both Row based and Value based windows. Ø  One Pass PTFs Ø  NPath Ø  Others in the works: Allocation, Deallocation etc. in the bucket of Lightweight Dimensional Analysis: See wiki for details. Ø  Multi Pass PTFs Ø  Market Basket Analysis: using Dynamic Item Set counting Algorithm. Ø  Plans to do Generalized Transitive Closure. Looking for your input and help implementing others
  • 23. Our Solution: Query Evaluation
  • 24. Query Evaluation: a PTF Hopefully No Surprise Ø  Shuffle to Partition and Sort Input Map Ø  PTFs work on Partitions instead of DataSet Splits Rows Ø  PTFs use Groovy for expression evaluation today. Writables Shuffle controlled by + partition and order SerDe specification   Output Partition Partition Ø  A PartitionedTableFunction (PTF) given a Writables Writables Partition computes an output Partition. + PTF + Ø  An invocation of PTF specifies how input SerDe SerDe dataset should be partitioned and ordered. Ø  A PTF defines shape of Output. Ø  Partitions are containers of Rows. Ø  A PTF may operate on raw data before it is Ø  List of Writables + ObjectInspector partitioned and ordered. Ø  Rows exposed as Groovy Binding; Partition exposed as Groovy iterable. For details see doc. directory on GitHub Ø  All Evaluation in context of Row and Partition and optionally Window
  • 25. Multi Pass & Recursive Queries PTFs
  • 26. Multi Pass and Recursive Queries PTFs }  So far }  functions perform one pass over the input }  function acts on input after it is partitioned. }  But other use cases require multiple passes on the input Ø  Since each Partition executed independently: Ø  you may need to consolidate Output Ø  and based on consolidation revisit data. Ø  Recursive Queries used for implementing Graph Algorithms are an important subclass of such problems Ø  In the Context of this talk the focus is: Ø  How do these fit into the PTF model: both from an interface perspective and also from an execution model
  • 27. Multi Pass PTFs }  Key Observation: }  Execution model can be extended with following changes: }  Partition input and persist }  Repeat à a fixed # of iterations or dynamically determined }  Map-side: operate on persisted Input partitions; do Partition à Partition }  use shuffle to consolidate output across partitions. }  Output from one pass read as Input in next pass. Ø  But from Interface Perspective Ø  End User still sees this as a PTF invocation Multi Pass mechanics still being worked out
  • 28. Query Evaluation: Multi Pass PTF repeat Input DataSet Map Splits Partition Partition Writables Shuffle controlled by Partition and Writables + PTF mapside + partition and order optionally order SerDe SerDe specification     Partition Partition Ø  Input to Function is Partitioned. Writables Writables Function is applied on Map-Side PTF Ø  + reduceside + Ø  Shuffle used to collect Output SerDe SerDe Ø  Process repeated Ø  Pass n reads output from Pass n-1 Output
  • 29. PTF Example: Market Basket Analysis Ø  Frequent ItemSets computed by doing 2 passes (the SON alg.): Ø  Pass 1 compute Frequent ItemSets for each Partition independently and consolidate across Partitions. Ø  Pass 2 go over input again, eliminate false negatives. Ø  Note: computing Frequent ItemSets is complex Ø  number of possible ItemSets is exponential Ø  Many interesting algorithms: we have implemented Dynamic Item Counting alg. Ø  See wiki for details…. Ø  This is not a big jump from 1 pass Ø  Output of Pass 1 is very small relative to input; so no issue of communication cost Ø  Only 2 passes involved.
  • 30. PTF Example: Market Basket Analysis Basket Implementation and Interface needs work: BasketId   ItemName   •  User exposed to the fact that input initially 1   Apples   partitioned. And then output of function is 1   Baguette   partitioned. •  Also Multi Pass mechanics not ready. 2   Apples   2   Avocado   2   Olives   from candidateFrequentItemSets( from FrequentItemSets( <select * from basketdata Basket distribute by basketName partition by basketId order by itemName, sort by basketName, itemName> supportThreshold= 0.15) partition by itemset order by itemset, select itemset 'basketName', 'itemName', <0.15>) select itemset ItemSets   {"items":["apples","baguette"]} {"items":["apples","corned_b"]} {"items":["apples","hering"]} {"items":["apples","olives"]} {"items":["apples"]}
  • 31. Recursive Queries as PTFs }  Heart of Graph Algorithms is traversal: the discovery and selection of Paths }  In SQL context Graph held in Table R(Src., Dest.) and traversal expressed as Relational Operators run iteratively to a fixed point. }  Typically until no new Paths discovered. }  Lot of work in DB community to parallelize these Algorithms: }  Partition work }  Reduce communication à naïve impl. will suffer from high comm. cost. }  Of late revival of interest: }  HaLoop project: Relation based implementation. Based on tweaking MR mechanics: changes to JobTracker and TaskTracker. }  Giraph project: Matrix based Direct Algorithms
  • 32. Recursive Queries as PTFs }  In PTF context: }  A Graph Algorithm involves }  An Input Relation R(Source, Destination) }  Output contains Paths that meet a certain criteria. }  A Recursive Query is processed in a fashion very similar to a Multi Pass Algorithms }  Partition input and persist ß Partition R by Destination }  Repeat }  operate on persisted partitions ß Do map-side Join of R with newly discovered Paths }  use partitioning to consolidate output across partitions. ß partition output of join (the new Paths) by Source. Dedup; output only new Paths. }  But for End User still sees this as a PTF invocation Under the covers could use HaLoop or Giraph to implement Alg.?
  • 33. PTF Example: Transitive Closure Flights Simple example Source   Destination   •  List all possible Routes New York   London   New York   Paris       London   Bombay       London   Dubai   Dubai   Bombay   from transitiveClosure( <select * from Flights distribute by Dest> partition by Src, ‘Src', ‘Dest’) select Src, Dest Source   Destination   New York   London   New York   Paris       London   Bombay       London   Dubai   Dubai   Bombay   New York Bombay New York Dubai
  • 34. PTF Example: Generalized TC But TC Mechanics can be generalized. More Interesting example: Specify: Input: FlightsTable(Src, Dest, ArrTm, DepTm, Cost) Ø  Path Joining Condition: how to generate new Ø  Find me the best routes from any airport in New York to Paths Bombay. Ø  Path Attributes: how to calculate Aggregation Ø  The waits at intermediate points must be between 2 to 5 hours. attributes on Paths à Sets of Edges Ø  Pick the lowest cost flight, but for a direct flight I am willing to Ø  Path selection criteria: how to pick from multiple pay a $100 premium. Paths between a Source and Destination. from GeneralizedTC( from GeneralizedTC( <select * from Flights <select * from Flights where Src = ‘New York’ distribute by Dest> distribute by Dest> partition by Src, partition by Src, SourceColumn, ‘Src’, DestColumn, ‘Dest’, Path Joining Condition, Path Joining Condition = <Dest = Path.Src && [Path Attributes], ArrTm <= Path.DepTm - 120 && Path selection condition, ArrTm >= Path.DepTm - 300>, select Src, Dest,…. Path Attributes = [ Where … totalCost: <sum(Price)>, hops : <count(*)> ], Path selection condition = < p1.cost <= p2.cost || Google for ‘Shaul Dhar Generalized Transitive Closure in SQL.’ (p1.hops == 0 && p1.cost <= p2.cost + 100) 1993 Phd thesis from Univ. of Wisconsin, >, Madison. select Src, Dest,…. Where Dest = ‘Bombay’
  • 36. Benefits Ø  Enable more interesting Questions more simply in the SQL context enable analysis not expressible in SQL Simplify expressing analysis Ø  Foster Reuse by providing Function Libraries and bridging to external engines. Ø  App. Developers expect and rely on this in other DBs.
  • 37. Next Steps Ø  Make Windowing clauses closer to standard SQL Ø  Use Hive Expressions Evaluation at runtime instead of Groovy Ø  So why cannot this be part of Hive? Ø  Yes makes a lot of sense. See wiki for a step-by-step plan. Ø  Build out Multi-Pass and Recursive Query mechanics Ø  Enhance PTF interface. Ensure simple ifc for End User. No leakage of implementation. Ø  Investigate using HaLoop/Giraph as execution model Ø  Flush out function library
  • 38. More information }  More details/download at https://github.com/hbutani/SQLWindowing/wiki }  Contact: }  Harish Butani: [email protected]
  • 40. So Why PTFs? Ø  Usability Ø  Table Functions: enable more interesting Questions more simply Ø  Enable analysis not expressible in SQL Ø  Simplification of existing queries: both syntactically and in performance Ø  replace self-joins with intra row computations Ø  Reusability Ø  Functions are parameterized, so reusable in wide range of contexts Ø  Functions can reshape Output Schema at runtime Ø  Bridge Ø  Since ifc to SQL engine is coarse, execution of PTFs may involve other MPP engines (multi-pass) Ø  Speed Ø  Partitioned Table Functions enable working at scale by breaking down dataset into manageable Partitions
  • 41. So Why PTFs? Ø  Use Cases: Ø  Time Series Analysis: NPath Ø  Market Basket Analysis Ø  Lightweight Dimensional Analytics Ø  Graph Algorithms: Transitive Closure Ø  Sessionization Ø  SQL Windowing Clauses Ø  Can be treated as a PTF if all windowing clauses are on the same Partition.
  • 42. PTF Interface 3. Function may have Map side processing 1. -  Controls Shape of Output -  Responsible for Output 2. Enables chaining of Functions
  • 43. Partitioned Table Function Mechanics Node 1.  Partition input into n Input DataSet Partition 0 partitions, optionally order partitions Func. 2.  distribute over set of processing nodes Partition and Node optionally order 3.  execute Fn. at each node Partition 0   Next Operator 4.  merge data from all Func. instances and stream merged result to next Operator . . . Node   Partition 0 Why? Ø for Performance: enables parallel Func. operation on large datasets Ø for providing advanced analytics This is not the same as Hive Supported in many Databases: Oracle, DB2, Aster etc. UDTF
  • 44. Support in RDBMS: Aster SQL/MR Problem: Group clickstream data into sessions Ø  Consider all rows of a User as a Partition Ø  Order rows by time Ø  Group rows within 60 seconds of each other as belonging to a Session. Benefits Ø  Work at Scale: designed to work over 100s of terabytes Ø  Usability and Reusability: Ø  SQL/MR looks like a table, it can appear anywhere a table can appear in SQL Ø  Enable analysis not expressible in SQL Ø  Simplification of existing queries Ø  Functions are parameterized so usable in wide range of contexts. Aster Function Library Ø  Time Series Analysis Ø  Graph Analysis Ø  Fraud Detection Ø  Sessionization Ø ....
  • 45. Support in RDBMS : Oracle FUNCTION test_ptf (p_cursor IN t_parallel_test_ref_cursor) RETURN t_parallel_test_tab PIPELINED PARALLEL_ENABLE(PARTITION p_cursor BY HASH (id)); END parallel_ptf_api; SELECT sid, count(*) FROM TABLE(test_ptf(CURSOR(SELECT * FROM input) ) ) t2 GROUP BY sid;
  • 46. Hive Script Operators But one can do PTFs in Hive using Script Operators From (From clicks Select transform(userid, pageid, ts) Using ‘/bin/cat’ As (userid, pageid, ts) Distribute by userid Sort by userid, ts) map_output Select transform(userid, pageid, sessionId) Using ‘your java pgm/python script/… pass in args..’ As (userid, pageid, sessionId) But Ø  Not as Usable Ø  compare to Aster Query Ø  Painful for App. Developer to use. Imagine trying to chain functions. Ø  Not very Reusable Analogy Ø  arg passing embedded in strings that spawn process Integration via stdin stdout And implementation artifacts Vs Ø  Data streamed across process boundaries. An embedded Func Library Ø  Type information passing limited: tab separated files.
  • 47. PTF Example: Hierarchical Evaluation For a Country, State, City Geo hierarchy compute: % of Country Sales, Top City Sales, Avg. City Sales from hierarchyEvaluate( < select Country, State, City, sum(Sales) from Sales group by Country, State, City> partition by Country order by Country, State, City, <['Country', 'State', 'City']>, <[ ["Sales / Ancestor('Country', 'Sales') * 100.0", "% Country"], ["TopN(Descendants('City'), 'Sales', ,1)", "Top City"], ["Avg(Descendants('City'), Sales)", "Avg. City"] ]>) select Country, State, City, '% Country', 'Top City', 'Avg. City'
  • 48. Simple PTF implementation Annotation specifies Function Args, Name, behavior Shape based on Function before it in chain; or the Query Input if this is the first function.
  • 49. Query Evaluation: complete picture Ø  A Query is a chain of PTFs. Ø  Input of chain is a Hive Query or table Ø  Windowing clauses are syntax sugar for the special Windowing Table function. Ø  Query translated to a series of Jobs.  Each Job executes part of the Function Chain Ø  For intermediate steps: the output is written to hdfs and exposed as a Temporary Table to be used by the next Job in the Chain.
  • 50. Query Evaluation: Windowing Clauses Input Map DataSet Splits Writables Shuffle controlled by + partition and order SerDe specification   Output Partition Partition Windowing Table Function: Writables Writables 1.  Pass Partition to each Agg. Function + PTF + SerDe SerDe 2.  If Window Clause specified, pass a Window also 3.  Collect output from each Agg. Function 4.  Union over all Agg. Functions + Input used to evaluate select list
  • 51. Query Eval.: Market Basket Analysis Based on the 2 pass SON algorithm as described in the Mining Massive Datasets book. Job 1 Node Ø  Apply Dynamic Item Counting with support appropriately Node Input Basket Partition 0 scaled down. DataSet Ø  Output Candidate IS from Partition 0 each Mapper. Map Reduce Distribute by itemset Partition by basket . Order by basket, item .   Candidate .     ItemSets Job 2 Node Node Partition 0 Scan baskets for Partition 0 Candidate Itemsets Map Reduce Final Distribute by itemset ItemSets  
  • 52. Query Eval.: Transitive Closure Sample invocation From transitiveClosure(<select X, Y from InputTable cluster by Y> partition by X) }  In the ith iteration: Select X, Y; }  On the Map side a partition of T is joined with the corresponding deltaR; this is First Pass: similar to a MapJoin in Hive. Generates new Paths. }  On the Reduce side the new Cluster:   Create ith Partition of R paths from deltaNextR are held Distribute by X   in a searchable structure; the Node i Map Function:   ReduceFunction: Node j rows of the corresponding R partition are streamed to mark T(X, i) O/P (X, Row) O/P Row R(i, Y) any rows in deltaNextR that repeat are duplicates. At the end a new   file is stored in the correponding R partition with the new paths ith Pass: Cluster:   1.  Read in R(i, Y) Create ith deltaR file 1.  Read in dR(I, Y) 2.  Perform dRnext – R in R(i, Y) 3.  Add dR rows to R Node i 2.  Perform dR join T Map Function: ReduceFunction: Node j   T(X, i) O/P (X, Row) 3. Distribute by X   Can we use HaLoop or Giraph to do Graph Processing?
  • 53. Can this be part of Hive? }  There are undeniable reasons for doing this, briefly: }  end users would want this functionality inside Hive for reasons of consistent behavior, support etc. }  use of a consistent expression language. For e.g. reuse of Hive functions in Windowing clauses. }  Implementation wise: }  Windowing is orders of magnitude simpler than Hive, and can benefit from using equivalent components that are in Hive. }  Avoid the trap of constantly chasing changes in the Hive code base. }  Folding in Table function mechanics may open up optimizations not possible with the approach today.
  • 54. Path to folding into Hive Possible Path to moving into Hive ( Details here) Ø  Step 1: Move to Hive MR mechanics for Job execution Ø  Step 2: Move to Hive Evaluators and Expressions Ø  Step 3: Introduce the concept of Partition Table Functions in Hive; allow users to invoke PTFs via an exec Function mechanism. Ø  Step 4: Allow Table function invocations to appear in Table Expressions; do AST transformations to translate to HQL followed by exec PTF. Ø  Step 5: Extend HQL with Windowing Clauses