Building a Rule Engine with SQL Server
Joshua Greenberg, Ph.D.
February 2006
Summary: Rules play a central role in a wide variety of applications. In addition to the declarative specification of business rules, the simple rule engine design described in this article can be used to implement state machines, predicate dispatchers, or any other rule-based system. (15 printed pages)
Contents
Introduction
Running Example
Design Fundamentals
A Few Examples
Conclusion
Introduction
The term rule is often used to describe what may more accurately be called a well-formed formula (wff). In many applications, the satisfaction of a wff is necessary to identify the suitability of some action. For example, in a customer relationship management (CRM) application, we may want to suggest one or more products similar to the product currently being purchased. To make our suggestion as attractive as possible, it would be wise to leverage the information we have gathered about this, and other, customers. For instance, our decision about which additional products to offer might be based on the current customer's age, purchase history, gender, geographic location, or a wide array of other factors. By including such demographics, we can maximize the likelihood that our suggestions will be valuable to our customers, and will consequently translate into additional purchases and revenue for our firm.
In this article we present a design for a rule engine based on nothing more than a few tables and queries in a SQL Server database. The running example used to explore the various aspects of the design was chosen to be as realistic as possible. As a result, the simplicity of the rule engine design is often overshadowed by the complexity of the particular business application. While reading through the content, it is important to keep the following points in mind:
- The rule-engine implementation is simple and convenient to implement and maintain.
- The core of the rule-engine implementation uses nothing more than a single table, Wff, and a single query, QueryOfQueries. This article presents some variations of the Wff table and QueryOfQueries procedure, but the fundamental rule-engine requirements are completely encapsulated by the very simple QueryOfQueries2 procedure.
- The criteria used to construct a rule (e.g., eye-color = blue, age < 20, etc.) are composed of operands and operators. The rule-engine implementation has immediate access to all of the operators available in SQL, and can be easily extended to include any custom operators (by using SQL Server User Defined Functions). In other words, the rule-engine has no limitations on the complexity of the operators it can use.
- The article is broken into two main divisions. First, we show how the rule engine is used to identify the satisfaction of one or more rules (wffs). Then, given the IDs of the satisfied rules, we show how this information can be translated into the suggestion of a particular cross-sell product, a transition in a state-machine, or the selection of a method implementation in a dispatcher. The important point to understand is that the rule-engine implementation is actually orthogonal to the particular business application. Furthermore, connecting the rule-engine implementation to the business application is as simple as adding a new column to the Wff table, or joining the Wff table to your business specific tables.
- Our rule-engine implementation is presented in contrast to the more standard (and costly) implementations based on RETE or other similar algorithms. The majority of rule engines use an internal rule structure such as Rule(Operand, Operator, Value). Such implementations have their own set of advantages and disadvantages. One of the major disadvantages is the amount of source code needed to actually implement the rule engine. In our simple SQL-based rule-engine, there is no code—just a table and a query. While it would be foolish to state that the rule-engine presented here is superior to all other implementations, I believe it is fair to say that the simplicity (and zero-cost) of this model may make it the perfect option for a huge number of applications. If I were selling this model, I would suggest that any IT professional ready to engage the power of a rule-based system would be wise to begin with the rule engine proposed in this article, and then move to a more complex solution only once they have exhausted the power of this simpler model.
- Throughout the article, no effort has been made to explore possible performance improvements. The focus of this article is an explanation of the rule-engine implementation. While there are many ways to greatly improve the performance of the solution, the implementation shown here will probably work quickly enough for the majority of applications.
Running Example
Since our goal is to design a working implementation, it will help to have a concrete example. Our particular example will be kept simple enough to avoid complicating the explanation, but will be complex enough to highlight all of the important features in such a system.
Consider a Web site that sells gourmet foods. When a customer decides to make a purchase, we request that the customer register and answer a small set of personal questions. In addition to the standard name, address, and phone, we assume that the customer also provides us with their age and gender. Based on this information and a history of other buyer's purchases and personal information, we would like to suggest an additional product to the customer at checkout time. To accomplish this, we have established the following set of criteria (functions):
- DOBBetween(c,x,y) — True if customer, c, was born between dates x and y
- AvgMonthlyPurchase(c,x) — True if customer, c, spends more than x dollars per month
- ResState(c,x) — True if customer, c, lives in state x (e.g., Florida)
- Gender(c,x) — True if customer, c, has gender, x
- Purchasing(c,x) — True if customer, c, is currently purchasing product, x
A schema used to hold customer, order, and item information might include the following tables.
Customer(ID, Fname, Lname, DOB, Gender)
Address(ID, CustID, Street1, Street2, City, State, Zip)
Order(ID, CustID, Date)
OrderItem(ItemID, OrderID, Qty)
Item(ID, Name, Price)
Since this article is not focused on relational schema design, we have kept the schema rather simple. Essentially, the Customer table holds the customer's first name, last name, gender, and date of birth. The Address table contains a customer's address, and uses a foreign key into the Customer table to identify the matching customer. The Order table gives each order a unique ID, maintains a foreign key pointing to the customer making the order, and also holds the date the order was made. Finally, the OrderItem table has foreign keys into both Order and Item, and maintains the type and number of each Item purchased in a given Order. The price of those items is found in the Item table. Obviously, a real schema may be far more complex, but we will only be using this schema for a brief moment, so it will be quite adequate.
With our tables in hand, we can easily design a query that returns all customers matching a given set of criteria. For example, consider the following SELECT statement.
CREATE PROCEDURE FLMalesOverTwentyThatBuyMonthlyNowBuyingSausage(@CustID)
AS
SELECT Customer.ID
FROM Customer INNER JOIN
Address ON Customer.ID = Address.CustID INNER JOIN
[Order] ON Customer.ID = [Order].CustID INNER JOIN
OrderItem ON [Order].ID = OrderItem.OrderID INNER JOIN
Item ON OrderItem.ItemID = Item.ID
WHERE (Customer.Gender = 'M')
AND (Customer.DOB > '1/1/65')
AND (Customer.DOB < '1/1/85')
AND (Address.State = 'FL')
AND (Customer.ID = @CustID)
GROUP BY Customer.ID
HAVING (SUM(Item.Price * OrderItem.Qty) / (DATEDIFF(MM, MIN([Order].[Date]),
MAX([Order].[Date])) + 1) > 100)
The query above will return the customer ID of the current customer if that customer meets the following criteria:
The customer purchases at least $100 worth of goods each month.
The customer is male.
The customer was born between 1/1/65 and 1/1/85.
The customer's state of residence is Florida.
The customer is currently purchasing Italian Sausage. Note that this criterion does not appear explicitly in the query, but is assumed implicitly since we are running this particular query. We will see a better way to accomplish this when we explore the design of our rule engine.
This is a reasonably complicated query that allows us to identify the customers falling into a rather specific group. It is likely that our application would define many such groups. For example, all else being equal, we may want to offer men a red wine and women a white wine. Or, if the age of our buyer is less than 21, we may wish to offer a special jar of gourmet olives. In general, there may be a number of fundamentally similar queries that allow us to partition our customers, and to offer them a product specific to their group. A brute force approach to identifying the group to which our current customer belongs would be to run all of these slightly different queries and then to check to see which query returns the ID of our current customer. Obviously, this is not very convenient, but in the end we would, hopefully, find the group into which our current customer falls. So, if we assumed that for each query (slight variation of the same query) we also maintain an associated product (Item.ID), then when we finally discover the query that returns our current customer ID, we could offer that associated product as a cross-sell. The algorithm looks something like:
For Each Query in our List of Queries
Run the Query
If the Query returns our current customer's ID
Find the ItemID associated with this query
Offer this Item to our current customer
Break
End If
End For
Clearly, there must be a better way. Keep in mind that there might be hundreds of variations of this same query and we would now need to keep track of each variation and its associated cross-sell product. Perhaps, instead, we can turn the situation inside out, and store the queries as rows in a table. Then, we could ask a different question. That is, instead of running each different select statement to ask, "Does this customer match our given criteria," we could issue a single query that would ask, "Which queries are satisfied by our current customer's data?" So, we are now left with the problems of storing all of our query variations in one (or more) tables, and of designing a new query that identifies a set of satisfied queries. While this may sound confusing, it is actually quite straightforward.
Design Fundamentals
In the previous section, we explored an inefficient algorithm for determining the partition into which our current customer falls. One important feature of our partitioning mechanism is that the distinguishing criteria were similar for each of the different partitions. That is, applying the same basic criteria with slightly altered instantiation values imposed the different partitions. For example, the partition created by our example query returned only men of a certain age meeting a few other restrictions. A second partition was based on the same restrictions except that it contained only women instead of men. In general, using the same set of criteria created all of our different partitions:
- DOBBetween(c,x,y) — True if customer, c, was born between dates x and y
- AvgMonthlyPurchase(c,x) — True if customer, c, spends more than x dollars per month
- ResState(c,x) — True if customer, c, lives in state x (e.g., Florida)
- Gender(c,x) — True if customer, c, has gender, x
- Purchasing(c,x) — True if customer, c, is currently purchasing product, x
So, all of our queries are based on these same five criteria, but with different values for c, x, and y. While a real system may have a much larger set of criteria, it is rather unlikely that any single partitioning scheme would be based on too vast a number of criteria. One reason for this is that most partitioning schemes would seek to impose disjoint partitions (equivalence classes). In terms of queries, we are suggesting that each different query would return a completely different set of customers. In the context of our running example, this makes sense, since the specificity implied by such a scheme ensures that only a single cross-sell product is suggested. If this were not the case, we would either have to develop a conflict resolution mechanism, or suggest multiple products to our customer (there is actually a far better way to suggest multiple products to the customer, as I will show later). Of course, there is nothing wrong with these latter approaches, and they can both be easily implemented using the rule engine present shortly, but they may not always be the right choice. Either way, it is likely that a given application (or problem within an application) will be solved with fewer than 20 or 30 criteria.
To be clear, I will present a design that can work with either disjoint or overlapping partitions, but I suggest that in many situations a scheme based on disjoint partitions will be most effective.
The first step in building our wff engine is identification of the initial criteria; our five sample criteria have already been outlined above. Once this is done, a table is constructed to hold each of the criteria's variables, except for what we might call the this variable. For example, consider the criterion, DOBBetween(c,x,y). This criterion can easily be recast as c.DOBBetween(x,y). This same modification can be applied to all of our criteria, and we can therefore eliminate the, c-variable, and simply write our criterion as DOBBetween(x,y). Next, we need to create columns in our table for each of the variables in the resultant term. In this case, we might call one column, MinDOB, and the other variable, MaxDOB. Proceeding in this fashion, the following table would be constructed from the five criteria above.
Wff(ID, MinDOB, MaxDOB, MinMonthlyPurchase, State, Gender, CurItemID)
The first column, ID, represents the unique identifier for wff. However, the ID field is not a primary key for the table. As I will explain shortly, we use this ID to simplify the implementation of the logical OR operator in my wffs.
Using this table structure, we can now convert our sample query into a record. The corresponding record appears as follows.
Wff(1, '1/1/65', '1/1/85', 100, 'FL', 'M', 14)
It should be clear that we have simply taken the constant values from FLMalesOverTwentyThatBuyMonthlyNowBuyingSausage, and added them into a record of the Wff table. However, notice that the in the last column we are now explicitly identifying the current item being purchased (e.g., Italian Sausage) rather than depending on any implicit mechanism ties to a specific query. The next step is creating a query that returns this record (ID = 1), if and only if FLMalesOverTwentyThatBuyMonthlyNowBuyingSausage would return the current customer's ID. This is accomplished with the following new query.
CREATE PROCEDURE QueryOfQueries(@CurItemID, @CurCustID)
AS
SELECT DISTINCT Wff.ID
FROM Customer
INNER JOIN Address ON Customer.ID = Address.CustID
INNER JOIN [Order] ON Customer.ID = [Order].CustID
INNER JOIN OrderItem ON [Order].ID = OrderItem.OrderID
INNER JOIN Item ON OrderItem.ItemID = Item.ID
CROSS JOIN Wff
WHERE (Customer.Gender = Wff.Gender OR Wff.Gender IS NULL)
AND (Customer.DOB > Wff.MinDOB OR Wff.MinDOB IS NULL)
AND (Customer.DOB < Wff.MaxDOB OR Wff.MaxDOB IS NULL)
AND (Address.State = Wff.State OR Wff.State IS NULL)
AND (@CurItemID = Wff.CurItemID OR Wff.CurItemID IS NULL)
AND Customer.ID = @CurCustID
GROUP BY Customer.ID, Wff.ID, Wff.Gender, Wff.MinDOB, Wff.MaxDOB, Wff.State,
Wff.MinMonthlyPurchase, Wff.CurItemID
HAVING ((SUM(Item.Price * OrderItem.Qty) / (DATEDIFF(MM, MIN([Order].[Date]),
MAX([Order].[Date])) + 1) > Wff.MinMonthlyPurchase) OR
(Wff.MinMonthlyPurchase IS NULL))
Now, we can examine what QueryOfQueries actually does.
The QueryOfQueries procedure is inseparable from the Wff table. Each record in the Wff table defines a different set of constant values used as the WHERE clause restrictions for a given query. As I have already described, the record:
Wff(1, '1/1/65', '1/1/85', 100, 'FL', 'Male', 14)
represents the FLMalesOverTwentyThatBuyMonthlyNowBuyingSausage query. This works because of the way a SELECT statement does its processing. If we were to take the values in Wff(ID=1) and substitute them for their corresponding values in QueryOfQueries, we would be left with the following SELECT statement:
CREATE PROCEDURE QueryOfQueriesA(@CurItemID, @CurCustID)
AS
SELECT DISTINCT Wff.ID
FROM Customer
INNER JOIN Address ON Customer.ID = Address.CustID
INNER JOIN [Order] ON Customer.ID = [Order].CustID
INNER JOIN OrderItem ON [Order].ID = OrderItem.OrderID
INNER JOIN Item ON OrderItem.ItemID = Item.ID
CROSS JOIN Wff
WHERE (Customer.Gender = 'M' OR 'M' IS NULL)
AND (Customer.DOB > '1/1/65' OR '1/1/65' IS NULL)
AND (Customer.DOB < '1/1/85' OR '1/1/85' IS NULL)
AND (Address.State = 'FL' OR 'FL' IS NULL)
AND (@CurItemID = Wff.CurItemID OR Wff.CurItemID IS NULL)
AND Customer.ID = @CurCustID
AND Wff.ID = 1
GROUP BY Customer.ID, Wff.ID, Wff.Gender, Wff.MinDOB, Wff.MaxDOB, Wff.State,
Wff.MinMonthlyPurchase, Wff.CurItemID
HAVING ((SUM(Item.Price * OrderItem.Qty) / (DATEDIFF(MM, MIN([Order].[Date]),
MAX([Order].[Date])) + 1) > 100) OR (100 IS NULL))
Simplifying further, we obtain:
CREATE PROCEDURE QueryOfQueriesB(@CurItemID, @CurCustID)
AS
SELECT DISTINCT Wff.ID
FROM Customer
INNER JOIN Address ON Customer.ID = Address.CustID
INNER JOIN [Order] ON Customer.ID = [Order].CustID
INNER JOIN OrderItem ON [Order].ID = OrderItem.OrderID
INNER JOIN Item ON OrderItem.ItemID = Item.ID
CROSS JOIN Wff
WHERE Customer.Gender = 'M'
AND Customer.DOB > '1/1/65'
AND Customer.DOB < '1/1/85'
AND Address.State = 'FL'
AND @CurItemID = Wff.CurItemID
AND Customer.ID = @CurCustID
AND Wff.ID = 1
GROUP BY Customer.ID, Wff.ID, Wff.Gender, Wff.MinDOB, Wff.MaxDOB, Wff.State,
Wff.MinMonthlyPurchase, Wff.CurItemID
HAVING (SUM(Item.Price * OrderItem.Qty) / (DATEDIFF(MM, MIN([Order].[Date]),
MAX([Order].[Date])) + 1) > 100)
For those who cringe at the sight of the CROSS JOIN, it should be clear that the INNER JOINs collectively return only a single record (due to the Customer.ID = @CurCustID restriction). An alternative, though visually more complex, formulation of this query makes this point very clear.
CREATE PROCEDURE QueryOfQueriesAlt(@CurItemID, @CurCustID)
AS
SELECT DISTINCT Wff.ID
FROM
(SELECT
Customer.ID AS CID,
SUM(Item.Price * OrderItem.Qty) / (DATEDIFF(MM, MIN([Order].[Date]),
MAX([Order].[Date])) + 1) AS AvgMonthlyPurchase,
Customer.Gender AS CGender,
Customer.DOB AS CDOB,
Address.State AS Cstate
FROM Customer
INNER JOIN Address ON Customer.ID = Address.CustID
INNER JOIN [Order] ON Customer.ID = [Order].CustID
INNER JOIN OrderItem ON [Order].ID = OrderItem.OrderID
INNER JOIN Item ON OrderItem.ItemID = Item.ID
WHERE Customer.ID = @CurCustID
GROUP BY Customer.ID, Customer.Gender, Customer.DOB, Address.State)
Cids
CROSS JOIN
Wff
WHERE (Cids.CGender = Wff.Gender OR Wff.Gender Is NULL)
AND (Cids.CDOB > Wff.MinDOB OR Wff.MinDOB Is NULL)
AND (Cids.CDOB < Wff.MaxDOB OR Wff.MaxDOB Is NULL)
AND (Cids.CState = Wff.State OR Wff.State Is NULL)
AND (Wff.CurItemID = @CurItemID OR Wff.CurItemID Is NULL)
AND (Cids.AvgMonthlyPurchase > Wff.MinMonthlyPurchase OR
Wff.MinMonthlyPurchase Is NULL)
In QueryOfQueriesAlt, we see that the inner SELECT statement only returns a single record (due to the Customer.ID = @CurCustID restriction). Consequently, the CROSS JOIN has no performance penalty. Furthermore, one may also recognize that the complexity of the query is only a result of the need for a GROUP BY to calculate the average monthly purchases. If your particular business rules do not require a GROUP BY, then the resultant query can be built with little difficulty. Finally, as we will see at the end of this section, there is an alternative formulation of the QueryOfQueries procedure that neatly separates the business-specific schema complexity from the rule-engine specific functionality. If you are curious right now, take a peek ahead at the QueryOfQueries2 procedure. Those few lines are truly all one needs to implement the rule engine; the rest of the complexity above is domain-specific.
The query, QueryOfQueriesB, except for the fact that it returns a Wff.ID instead of a Customer.ID, is equivalent to FLMalesOverTwentyThatBuyMonthlyNowBuyingSausage. You may be wondering about the effect of the additional Wff.ID = 1 restriction in the WHERE clause. However, a quick glance at the query should convince you of the fact that the WFF.ID value will only be returned if the Customer.ID would have been returned. Furthermore, it should be clear that the addition of the Wff.ID = 1 restriction was only done because we had explicitly restricted QueryOfQueriesB to use the values of the Wff(ID=1) record. In fact, this latter observation may completely elucidate the design. When the SELECT statement in QueryOfQueries executes, it goes record by record through the Wff table, replacing the Wff.<field> placeholders with the values found in each record. We have simply shown the equivalent of the first of these replacements, and given it the explicit name, QueryOfQueriesB. Thus, execution of QueryOfQueries on a Wff table with n records will result in the creation and execution of n queries of the form found in QueryOfQueriesB. So, at this point we have shown how to use a table and a query to check for the satisfaction of a well-formed formula. Or, more accurately, we have shown how to use a single table and a single query to check for the satisfaction of any number of well-formed formulas (assuming those formulas are constructed using the same criteria).
Before moving on to more specific and useful examples, there are a few points worth discussing. First, we have not yet explored the necessity of the "(OR Wff.<field_name> = NULL)" expressions. The easiest way to explain these is by taking a simple example. Consider a cross-sell for customers under 21 (since this article will first be read in 2006, we will just check for customers with a DOB > '1/1/85'). Clearly, if this is our only criterion, we need some way of ensuring that all of the other criteria implied in the design of our Wff table are ignored. This is done very easily, and results in creation of the following record.
Wff(2, '1\1\85', NULL, NULL, NULL, NULL, NULL)
Without going through the complete substitution process for this record, it should be clear that all of the "(OR Wff.<field_name> = NULL)" expressions, except for (Wff.MinDOB is NULL), will return TRUE, and the only condition left to be tested will be Customer.DOB > '1/1/85'. Therefore, if this single condition returns TRUE, then the Wff(ID=2) record will be returned, implying that the equivalent well-formed formula is satisfied.
Throughout this article, I have been speaking about well-formed formulas, but I have not really given a direct example. At this point, a few examples might help to clarify the remaining content in this section. Using a slight variation of the five criteria we established above, we can express the two wffs corresponding to records 1 and 2 in our Wff table. Note that we have replaced the DOBBetween(x,y) criterion with two new criteria, MinDOB(x) and MaxDOB(x).
Wff(ID = 1): MinDOB('1/1/65') AND MaxDOB('1/1/85') AND AvgMonthlyPurchase(100)
AND ResState('FL') AND Gender('Male') AND Purchasing(14)
Wff(ID = 2): MinDOB('1/1/85')
It should be immediately apparent that the only logical operator we have used is AND (conjunction). Before you grow concerned, rest assured that the model is in no way limited to this single operator. To be complete, we must at least be able to represent OR (disjunction) and NOT (negation). This is actually quite simple, and requires no algorithmic modifications to anything we have already presented.
The negation operator is trivially realized by adding your desired restriction to the query in QueryOfQueries and by adding the column containing the complementary value to the Wff table. For example, if we were only interested in individuals without blue eyes, we would alter our Wff table by adding a column called, BadEyeColor. The new schema for Wff would now be:
Wff(ID, MinDOB, MaxDOB, MinMonthlyPurchase, State, Gender, CurItemID, BadEyeColor)
The change to the query is equally trivial. We simply add a new constraint to the WHERE clause:
AND ((NOT Customer.EyeColor = Wff.BadEyeColor) OR (Wff.BadEyeColor is NULL))
The corresponding row in the Wff table would be:
Wff(3, NULL, NULL, NULL, NULL, NULL, NULL, 'Blue')
Implementation of the disjunction operator is equally trivial. For example, if we were interested in people who were either born after '1/1/85' or did not have green eyes and were male, we would not have to alter the Wff table, nor would we have to alter the query. The only step required is the addition of two records to the Wff table.
Wff(4, NULL, NULL, NULL, NULL, 'Male', NULL, 'Green')
Wff(4, '1\1\85', NULL, NULL, NULL, NULL, NULL, NULL)
As described earlier, the Wff.ID column is not a primary key. When the query in QueryOfQueries is executed, it has two opportunities to return the Wff.ID = 4. That is, it will return the Wff.ID = 4 if the customer does not have green eyes and is a male OR if he/she was born after '1/1/85'. In terms of a well-formed formula, we are now able to represent:
Wff(ID = 4): (Gender('Male) AND NOT EyeColor('Green')) OR MinDOB('1\1\85')
The ability to express AND, OR, and NOT, gives us the capacity to express any logical operator including the conditional and bi-conditional. We close this section with a further simplification to our explanatory model. Consider the QueryofQueries2 procedure shown below.
CREATE PROCEDURE QueryOfQueries2(@BirthDate, @AvgMonthlyPurchase,
@State, @Gender, @CurItemID)
AS
SELECT DISTINCT Wff.ID
FROM Wff
WHERE ((MinDOB < @BirthDate) OR (MinDOB is NULL))
AND ((MaxDOB > @BirthDate) OR (MaxDOB is NULL))
AND ((MinMonthlyPurchase < @AvgMonthlyPurchase) OR (MinMonthlyPurchase is NULL))
AND ((State = @State) OR (State is NULL))
AND ((Gender = @Gender) OR (Gender is NULL))
AND ((CurItemID = @CurItemID) OR (CurItemID is NULL))
QueryofQueries2 is designed under the assumption that it appears within a stored procedure accepting @variables for all arguments to the query. Those variables represent the current customer's properties. This is an alternative to the QueryofQueries procedure where the query pulls data directly from the available tables. The QueryOfQueries2 version will work regardless of whether the current customer has or has not already been added to our database. Of course, if we are pulling historical data for this customer, such as his average monthly purchase, then some amount of upfront work (based on an existing CustID) would still be required to make the call to QueryOfQueries2. However, that work (e.g., calculating the average monthly purchase and passing it in as a parameter) is going to be done either way. So, QueryOfQueries2 is equally as efficient as QueryOfQueries, but its implementation is much easier to follow, and it allows us to focus on the important aspects of the design, as opposed to the tangential questions of how to calculate a customer's average monthly purchases. In other words, from this point forward we can ignore the complexity of a business-specific query (with GROUP BY and multiple INNER JOINs) and simply focus on the more fundamental aspects of the design
A Few Examples
In our running business rule example, we have presently only discovered which of a number of well-formed formulas are satisfied by a given customer's properties. This takes us a long way towards our original goal of offering a cross-sell product, but it is missing a step. Fortunately, adding in this final piece of functionality is quite straightforward. Given the Wff.ID returned from the QueryOfQueries procedure, we would like to find the cross-sell. Or, better yet, we can just change the QueryOfQueries procedure to return the cross-sell product's ItemID. There are, in fact, a number of ways this can be accomplished. One simple approach would be to add one more column to the Wff table, and store the cross-sell ItemID directly in that column. The resultant table schema would look like.
Wff(ID, MinDOB, MaxDOB, MinMonthlyPurchase, State, Gender, CurItemID, CrossItemID)
And, the QueryofQueries2 procedure would be altered to return this new column.
CREATE PROCEDURE QueryOfQueries2(@BirthDate, @AvgMonthlyPurchase,
@State, @Gender, @CurItemID)
AS
SELECT DISTINCT Wff.CrossItemID
FROM Wff
WHERE ((MinDOB < @BirthDate) OR (MinDOB is NULL))
AND ((MaxDOB > @BirthDate) OR (MaxDOB is NULL))
AND ((MinMonthlyPurchase < @AvgMonthlyPurchase) OR (MinMonthlyPurchase is NULL))
AND ((State = @State) OR (State is NULL))
AND ((Gender = @Gender) OR (Gender is NULL))
AND ((CurItemID = @CurItemID) OR (CurItemID is NULL))
However, this approach is not very flexible. A far better alternative is to use an additional mapping table to associate a satisfied Wff.ID with an Offer.ID. Then, this Offer.ID can be associated with any number of products in the remainder of your business schema. Using this technique, our Wff table would remain unchanged, and a new WffOffer table would be added to our schema.
Wff(ID, MinDOB, MaxDOB, MinMonthlyPurchase, State, Gender, CurItemID)
WffOffer(WffID, OfferID)
OfferItem(OfferID, ItemID)
In addition to the Wff and WffOffer tables, we have shown a third table called OfferItem. This table allows the business to associate any number of Items (pricing options, etc., not shown) with a Wff.ID. Now, our QueryOfQueries procedure would appear something like:
CREATE PROCEDURE QueryOfQueries3(@BirthDate, @AvgMonthlyPurchase,
@State, @Gender, @CurItemID)
AS
SELECT DISTINCT OfferItem.ItemID
FROM Wff INNER JOIN WffOffer ON Wff.ID = WffOffer.WffID
INNER JOIN OfferItem ON WffOffer.OfferID = OfferItem.ItemID
WHERE ((MinDOB < @BirthDate) OR (MinDOB is NULL))
AND ((MaxDOB > @BirthDate) OR (MaxDOB is NULL))
AND ((MinMonthlyPurchase < @AvgMonthlyPurchase) OR (MinMonthlyPurchase is NULL))
AND ((State = @State) OR (State is NULL))
AND ((Gender = @Gender) OR (Gender is NULL))
AND ((CurItemID = @CurItemID) OR (CurItemID is NULL))
Using this same mechanism, we can alter the QueryOfQueries procedure to return anything we like. In fact, we can shift focus entirely, and use this same mechanism to implement a state-transition diagram (deterministic finite automata). A transition is fired when an input is received and the input satisfies a given wff. Further, upon transition, a certain output may be produced and a new state may be entered. So, our implementation must do the following:
- Model the transition criteria as wffs.
- Model the current state as one of the transition criteria.
- Model the next state as a field associated with firing of a given rule.
- Model the output as an entity.
The following tables are sufficient, though a smaller or greater number of tables could also be used to implement the same ideas:
Std(ID, Cond1, Cond2, CurState)
Transition(ID, StdID, NextState)
TransitionOutput(TransID, OutputID)
Output(ID, ExternalEntityID)
The new QueryOfQueries procedure, now called DoTransition, returns the NextState and the ExternalEntityIDs.
CREATE PROCEDURE DoTransition(@Cond1, @Cond2, @CurState)
AS
SELECT DISTINCT Transition.NextState, Output.ExternalEntityID
FROM Std INNER JOIN Transition ON Std.ID = Transition.StdID
INNER JOIN TransitionOutput ON Transition.ID = TransitionOutput.TransID
INNER JOIN Output ON TransitionOutput.OutputID = Output.ID
WHERE ((Std.Cond1 = @Cond1) OR (Std.Cond1is NULL))
AND ((Std.Cond2 = @Cond2) OR (Std.Cond2is NULL))
AND Std.CurState = @CurState
Another interesting example is a custom multiple predicate dispatch mechanism. Whereas a common object-oriented language such as C# or Java can only dispatch on the runtime type of the this object (and the static types of the other arguments), we can create a table in our database that models a far more complex dispatch strategy. Without going into too many details, the resultant tables might include:
Dispatch(ID, Cond1, Cond2)
Method(ID, DispID, ClassName, MethodName)
The corresponding QueryOfQueries pseudo-code procedure might be:
CREATE PROCEDURE DoDispatch(@Cond1, @Cond2, @Args)
AS
SELECT DISTINCT Method.ClassName, Method.MethodName
FROM Dispatch INNER JOIN Method ON Dispatch.ID = Method.DispID
WHERE ((Std.Cond1 = @Cond1) OR (Std.Cond1is NULL))
AND ((Std.Cond2 = @Cond2) OR (Std.Cond2is NULL))
pMethod = ReflectOn(Method.ClassName, Method.MethodName)
pMethod.Invoke(parseArgs(@Args))
Conclusion
In the interest of space, we appealed to some "hand waving" in the implementation of DoDispatch. Additional effort would be needed to map the actual parameters to the formal parameters of the identified method. Leveraging the semantics provided by an RDF model could do this most effectively. That is, each actual parameter would be tied to a semantically meaningful RDF entity, as would each formal parameter of the identified method. This mapping would ensure that the proper arguments are passed to the method, even if the method were added long after the model was initially designed.
It is interesting to note that in the case of the state-transition diagram implementation, we would like to ensure that there is only a single matching wff record, while in the dispatch implementation we would actually expect to get multiple results and then do additional processing to determine the most specific method. Of course, if the state-transition diagram were an implementation of a workflow system with parallel paths, then we may still want multiple results. But these are all topics for another article.
About the Author
Joshua Greenberg, Ph.D. is the president of , a Florida-based consulting firm specializing in software and business development for the J2EE, .NET, and Oracle platforms. Joshua has over fifteen years of experience building mainstream information systems. In addition, Joshua has continued his research on consumer-to-business systems and semantically enriched content.