Table Oriented Programming

A practical, intuitive, and consistent way to organize
and process data and algorithm collections
 
Updated 2/12/2002

Summary of Primary Table-Oriented-Programming Concepts

  1. Control Tables - A way of organizing processing, decision, and attribute logic.

  2. Table-Friendly Syntax - Syntax and language constructs that make dealing with tables and T.O.P. easier and more natural. This includes overcoming the weaknesses of SQL.

  3. Data Dictionaries - Special Control Tables for storing processing, decision, and attribute logic for relational database fields and/or UI fields.

  4. Fundamental and Consistent Collection Operations - A base set of operations (interface) that all collections (tables, trees, stacks, lists, etc.) should have easy or built-in access to regardless of a collection's current size or complexity. (Arrays are evil! Arrays are the Goto of the collections world.)

  5. Code Management - Relational tables are a potentially much more sophisticated tool for managing complex and multi-faceted collections of programming code than OO classes or files.

  6. Database Engine Neutrality - A T.O.P. system should be able to access a wide variety of database engines. There are some practical limitations to this, but the goal should be kept in mind.

  7. Memory-Mapping Reduction - The goal of reducing or eliminating the need to manually map and/or transfer memory variables to and from table fields and to and from the UI (screens). (This process should be invisible to the programmer, even though the internal implementation usually uses memory-based copies.)

  8. File Directory Management - Hierarchies are too narrow in scope and too restrictive. It is time for multi-aspect thinking. One search key (the hierarchy) is not enough. (Link)

Most of these concepts can be summed up nicely
by the concept of Collection Convergence. See also Yin and Yang.

Table-Oriented Programming (TOP for short) can be characterized as a programming language and/or development method that makes dealing with tables, lists, indexing, sorting, searching, filtering, etc. a native and direct part of language design and complexity management. This is a contrast to the clumsy collection API's and attribute management techniques such as set/get made popular by object oriented programming vendors. Table-Oriented Programming can also be characterized by using tables to organize program logic, not just data. Such tables are called Control Tables. They offer potential organization benefits over both raw procedural programming and object oriented programming.

Most general-purpose languages use API-like constructs (function library calls) and SQL to deal with tables. We believe that this approach is too bulky, code-intensive, and formal to be used often. Pushing data into, pulling it out of, and converting it for API's and SQL is not very practical. (Some OOP languages do not call them API's but use something that is essentially the same.)

For example, most languages have special math-handling syntax for dealing with mathematical equations. Example:

   a = (b * c) + e + f
Now, if your only choice was to use API's, then you would have to use syntax like:
   a = plus(plus(times(b,c),e),f)      // silly example
Or, in OOP-ish syntax:
   a = ((b.times(c)).plus(e)).plus(f)   // sillier
Or, as an OOP purist:
   a = ((b.math.times(c)).math.plus(e)).math.plus(f)   // silliest
It would of course be silly to force math experts to use such syntax; yet the equivalent is being done to database and table developers. This API-like approach is fine for occasional use, but if 70% of your applications dealt with math, this would get a bit cumbersome. We have special constructs and syntax for math, why not tables? Most custom business applications use or need far more table handling than math. Perl is the king of strings, Java is the king of networking, C is the king of speed, we now need a king of tables. (SQL and MS-Access fall short of the title).

The market focus on Object Oriented Programming has left table-handling in the dust with regard to innovation. Sorted tables and lists are actually very useful for dealing with complex data structures including trees, stacks, queues, etc. Also, tables are not limited by the size of RAM, unlike traditional data structure algorithms. They provide built-in virtual memory.

Most custom business applications are very table intensive. Accounting, inventory, reservation systems, order tracking, etc., are common examples. Also, file and directory lists, E-mail lists, sales prospects, and even lines and circles in a drawing program can be represented with tables. Yet, the languages usually used, such as C++ and Visual Basic, use nothing more than API's to work with tables. These languages encourage people to use in-memory constructs rather than ordered tables. Sad.

Although SQL is a high-level language that is quite powerful for certain types of operations, it is far from a general-purpose table-processing language. Many programmers end up writing "spaghetti SQL" because the alternative is to use annoying API calls or to convert to data cursors. SQL is also a poor match for interactive programs because it is more of a batch-processing and query-processing language.

SQL's set-oriented processing approach is often just not appropriate for many situations. SQL also has an annoying nest-happy LISP-like structure, which makes breaking the logic down into manageable chunks tough, especially for multi-joins. Using cursors can sometimes help, but they are far from standardized, receive little vendor attention, and are often not given "native" or direct access to the data engine.

SQL also cannot use functions and methods that are in the calling program; you have to use SQL's built-in or external functions. SQL puts a wall between you, your code, and the data. In addition, SQL does not support named field sets, which will be described later. (More on SQL and stored procedures.)

TOP languages do exist in various levels or incarnations of table-orientedness. These include Xbase derivatives (dBASE, FoxPro, Clipper), PAL (Paradox), PowerBuilder, Perl (for certain list types), Progress, Oracle's PL/SQL, and Clarion Developer. (We will not necessarily vouch for the quality or design of these languages, only say that they have a table tilt to them.) These languages get little press compared to the big OOP languages. Also, when upgrades are built for them, OOP features get most of the development resources, and their TOP features are now treated as a second priority by the vendors.

Why does OOP get 20 times more attention than TOP? We are not saying that TOP should be everything, but it does not deserve to be ignored. Given that tables are common and powerful, TOP does not deserve only 5% of the attention that OOP gets. We only ask for balance, not an overthrow.


My Motivation

Why am I so heck-bent on promoting Table-Oriented Programming? Simply because I have found the table paradigm very useful for RAD (rapid application development), software organization, and flexibility. Yet the IT market focused on technologies like Object Oriented Programming that made for better brochures and airline magazine articles rather than real and practical benefits.

My exposure to TOP started back in the late 1980's when I purchased a dBASE III book. I quickly fell in love with dBASE and later its XBase derivatives. (dBASE was not the first language I learned, nor was it the first that I used in a commercial setting.) It made working with relational tables such a snap that I started to view ALL collections as XBase tables. (Collections are any set of similar or closely related items.) This even began including program logic. (After all, OOP subclasses are simply a collection of related classes.)

Other languages tended to use different "containers" within the same language for collections. Such containers include arrays, vectors, dictionaries (associative arrays), and API/object interfaces to SQL database engines. The problem with these is that they are very limited and very different from each other. If the needs of a collection grew beyond the bounds or features of one of these structures or significantly changed in requirements, then switching to another "type" of collection container or re-creating the needed collection feature by hand was a pain in the [beep], let alone darn illogical.

It seemed much more logical to me to have ONE kind of interface into ANY collection and then hook up an easy-to-use set of standard collection operations available to ALL collections, big and small. (Not all engines will support all features, but the idea is to switch engines to get needed features, not to rewrite your existing API calls.) Although it has some annoying limitations and language weaknesses, XBase opened my eyes to table-oriented thinking.

OOP and other fads and trends prevented this powerful view of collections from progressing any further, and even reversed it to some extent. SQL as an interface is fine for formal transactions, but is too bulky and formal for many types of collection manipulation. Thus, I am here trying to sell the dream and vision of what should perhaps be called "collection-oriented programming." I found it a more powerful metaphor than anything else on the market, and I hope you will too.


Ideal Table Oriented Programming (ITOP) Features

We have looked at table-intensive processes and found a common set of features that would enhance TOP features of existing languages. We call these features "Ideal Table Oriented Programming" because not all are found in existing TOP languages. These features are to TOP what Inheritance, Polymorphism, and Encapsulation are to OOP. (In fact, ITOP shares many of these OOP aspects.)

Data Dictionaries

First, we will present a portion of an example data dictionary. Data dictionaries are an important concept in ITOP. We will refer to parts of it below to explain certain concepts.

Data Dictionary Sample (simplified)
Table-Spec.  Field-Name  Field-Title      Pre-Func.    Post-Func.                Groups  Sort-1  Pick-Func.   Total-able
Customers    CustName    Customer Name    {none}       {none}                    R       10      custProfl()  No
Purchases    PurchDate   Purchase Date    vdate1()     dateFmt1(2)               B,R     20      {none}       No
Trans        Amt         Purchase Amount  preDollar()  postDollar("###,###.##")  B,R     30      {none}       Yes

Brief Table Legend:
Table-Spec. - Table or field-set specifier. (Fields can be virtual.)
Field-Name - Abbreviation for field name.
Field-Title - Descriptive field title.
Pre-Func. - Pre-validation function. Similar to an OOP constructor.
Post-Func. - Post-validation function. May also perform formatting for display.
Groups - Groups that the field belongs to. (There are many ways to represent these.)
Sort-1 - First sort order of fields as displayed on tables and reports. (May have other sorts.)
Pick-Func. - Function called if user double-clicks on field.
Total-able - 'Y' if field can be totaled on a report.
(Note that a Data Dictionary can have many more columns than shown and can be organized in different ways.)
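
To make this concrete, here is a minimal sketch (in Python, purely for illustration) of how the sample dictionary above could be held as ordinary rows in memory. The column names and the idea of storing validator names as plain strings are assumptions of this sketch, not a prescription.

   # Hypothetical in-memory form of the sample data dictionary above.
   # Each row describes one field; validators are named as strings here
   # only to keep the sketch small.
   data_dictionary = [
       {"table": "Customers", "field": "CustName",  "title": "Customer Name",
        "pre": None,          "post": None,          "groups": "R",
        "sort1": 10, "pick": "custProfl()", "totalable": False},
       {"table": "Purchases", "field": "PurchDate", "title": "Purchase Date",
        "pre": "vdate1()",    "post": "dateFmt1(2)", "groups": "B,R",
        "sort1": 20, "pick": None, "totalable": False},
       {"table": "Trans",     "field": "Amt",       "title": "Purchase Amount",
        "pre": "preDollar()", "post": 'postDollar("###,###.##")', "groups": "B,R",
        "sort1": 30, "pick": None, "totalable": True},
   ]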

Data dictionaries (DD's) are essentially tables that describe other tables. A DD differs from a common table-structure listing in that it may apply to more than one table, and it can also assign functions or behavior to handle common or related operations. DD's are often described in the literature as only a documentation tool; however, we are extending them to also serve as centralized storage for field-related properties and/or operations actually used in the software.

Under ideal conditions, the DD provides enough information to generate input screens, multi-row grids, and reports without programming these from scratch. It keeps all logic related to a field or column in one central place. (Similar to the goal of an OOP class or subclass.) It is much easier to find and change information in DD tables than to hunt through separate modules or subclasses in program code. DD's are not intended to replace all program code, just reduce the need for it except down at the true customization level where it belongs.
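
As a rough illustration of what "generating from the DD" could mean, the following Python sketch (building on the hypothetical data_dictionary rows shown earlier) walks the dictionary for one table and produces a crude labeled input screen. A real ITOP system would drive grids, forms, and reports the same way; the function and key names here are invented.

   def prompt_fields(dd, table_name):
       # Build a crude input form for one table, driven entirely by the DD.
       rows = [r for r in dd if r["table"] == table_name]
       rows.sort(key=lambda r: r["sort1"])      # honor the Sort-1 column
       for r in rows:
           value = input(r["title"] + ": ")     # label comes from Field-Title
           # the pre-validation function named in r["pre"] would run here
           print("storing", r["field"], "=", value)

   # prompt_fields(data_dictionary, "Purchases")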

See an actual data dictionary for more examples and specifics. Note that the linked examples don't need to contain programming code and function calls to be effective. Putting programming code in tables is simply one TOP technique among many, but not a prerequisite.
 

The End of Linear Paradigms

Data dictionaries greatly reduce the need for bulky field specifications often used in OOP:
   field1.property1 = x
   field1.property2 = x
   field1.property3 = x
   ...etc...
   field1.property29 = x
   field1.property30 = x
   field2.property1 = x
   field2.property2 = x
   ...etc...
   field49.property1 = x
   field49.property2 = x
   field49.property3 = x
   ...etc...
   field50.property30 = x
I see these constructs all over VB and Java code. A construct like this is crying out for a tabled alternative when you have several dozen fields and several properties/functions. If you have 4 tables with 20 fields each, and each field averages 15 used properties, then you would have to write about 1,200 lines of code (4 x 20 x 15). However, this could be converted into a table that is about 80 by 20 in cell dimensions (assuming a total of 20 properties and/or functions). The 2D nature of tables makes them much more compact and logical for representing similar information. (This applies to control tables as well as DD's.) Code that repeats similar but slightly different constructs or assignments over and over again is sometimes called "comb" code, or "E" code, because of its repetitious appearance. (Stacked E's resemble a comb.)
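
A hedged sketch of the tabled alternative: the property values live in rows, and one small loop applies them. The Widget class and property names below are stand-ins for whatever form or field objects a real toolkit would supply; only the table-driven loop itself is the point.

   # Hypothetical property table: one row per (field, property, value).
   property_rows = [
       ("custName",  "width", 30),
       ("custName",  "label", "Customer Name"),
       ("purchDate", "width", 10),
       ("purchDate", "label", "Purchase Date"),
       # ...the remaining rows replace the long run of assignments above...
   ]

   class Widget:                      # stand-in for a real UI field object
       pass

   fields = {}
   for field_name, prop, value in property_rows:
       widget = fields.setdefault(field_name, Widget())
       setattr(widget, prop, value)   # one short loop instead of "comb" code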

Optional Data Dictionary

Although data dictionaries are very powerful, they should be optional. This is because DD's are a bit formal and take some effort to build, just like any good organizational paradigm. You should be able to generate a quick data table in a program without having to fill out a DD. Not all tables and lists require high levels of formality, especially if there are only a few fields. ITOP does not focus on just large or just small tables. Tables may be quick, ad-hoc array-like structures, billion-record IRS transactions, or something in-between. DD's should not be shoved down one's throat.

Detached Data Dictionary

In addition to being optional, DD's should not be built into the table file itself. This is where Microsoft Access goes wrong. DD's cannot be shared as easily if there must be a one-to-one relationship with each table. (One-to-one DD's can still be built if desired.) For example, sometimes the same or similar structures and operations are used with many different tables.

Allowing all such tables to share one or a few DD's makes maintenance much easier. Plus, tables from different systems can be accessed without having to convert to or from their native DD's.

An ITOP application should make it easy to physically separate the program code, data dictionary, and actual tables if so desired. An option to jam them all together, as MS-Access prefers, should also be given.

In the DD example, the Table-Spec column allows asterisks to indicate that the Field-Name alone will be used to find the appropriate entry. For example, several tables may have a CustName field in them. Rather than creating an entry for each table, an asterisk is put in the Table-Spec column to serve as a wild-card.
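
A small sketch of how that wild-card lookup might behave (the helper and row keys are hypothetical, reusing the earlier data_dictionary layout): prefer an exact table match, then fall back to an asterisk entry for the same field name.

   def dd_lookup(dd, table, field):
       # Exact (table, field) entry wins; otherwise fall back to a "*" wildcard.
       exact = [r for r in dd if r["table"] == table and r["field"] == field]
       if exact:
           return exact[0]
       wild = [r for r in dd if r["table"] == "*" and r["field"] == field]
       return wild[0] if wild else None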

Extendable Data Dictionary

In addition to being optional, the DD should also be extendable if needed by the application. ITOP should only expect that a certain minimum set of fields be included. The developer should be able to add fields to the data dictionary as needed.

For example, if a certain action happens when a field is double-clicked, the data dictionary should be able to take a new column specifying the snippet or function call to execute for each field upon double-clicking. (This example assumes that double-clicking is not already part of the minimum standards.)

Pre and Post Validation Functions

The pre- and post-validation functions are a very powerful part of ITOP. They allow consistent processing of fields regardless of where they are entered or displayed. For example, the pre-validation function executes regardless of whether the data was entered in a form, a grid, or any other input approach (assuming a short-cut outside of ITOP is not used.)

The pre-validation function serves two purposes. First, it checks the data to see that it is correct, and second, it formats the field data for storage. For instance, a date may be input as "12/31/1998". The pre-validation function may change it to "19981231" before storing it in the actual table. If the user entered "12/32/1998", then the function would return a value of 'false', indicating an error. The function may resemble this pseudo code:

  Boolean Function Vdate1()  
    boolean status = true     // initialize
    yearpart = substr(curfld,7,4)
    monthpart = substr(curfld,1,2)
    daypart = substr(curfld,4,2)
    if not between(monthpart,"01","12") _
       or not between(daypart,"01","31") then
      status = false
      curmsg = "Bad month or day number"
    else
      curout = convert2yyyymmdd(curfld)
    endif
    return(status)   // true if passed
  End Function
Notes: Curfld, Curmsg, and Curout are pre-assigned variables. Curfld is the current field as entered by the user. Curmsg is the error message given to the user if the validation fails (a default is assigned if not programmed), and Curout is the field re-formatted for storage. The ITOP system automatically prepares and uses these variables before and after the function is triggered by user or batch actions. Another such reserved variable may be the length of the native string. This variable-assignment method is only one possible approach to pre-validation routines; depending on the programming language, it may be better to pass these as function parameters instead.

Post-validation routines re-format the input for display. There is no true/false return value since the data was already checked during input. Therefore, the return value is the reformatted field. For example, if the stored value is "19981231", then the post-validation function can turn it into "12/31/1998". In short, the post-validation function makes the output prettier or easier to read. The example above uses dateFmt1(2). This sample function returns the date with the year shown as 2 digits. (The function may get the original value from a Curfld-like variable, as shown in the pre-validation example.)
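
For instance, a dateFmt1-style post-validator might look roughly like the following; this is a guess at its behavior based on the description above, with the 2-digit-year argument matching the dateFmt1(2) entry in the sample DD.

   def date_fmt1(stored, year_digits=4):
       # "19981231" -> "12/31/1998" (or "12/31/98" when year_digits is 2)
       year, month, day = stored[0:4], stored[4:6], stored[6:8]
       if year_digits == 2:
           year = year[2:]
       return month + "/" + day + "/" + year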

It may seem like a pain to write pre- and post-validation functions, but remember that the same functions can be used over and over again. The inputs and outputs to these functions are generic enough that generic functions can be written for common formats like dates, phone numbers, etc. Thus, you do not have to re-invent the wheel for similar field types. (Although the programmer is expected to build all the validation functions, a pre-built set could be included in the DD kit to save steps or serve as examples.)

Sort Orders

Data dictionary sort orders specify the order in which fields appear on reports and screens. In our example the fields are given an order in the Sort-1 column. The DD could also have a Sort-2 column, a Sort-3 column, etc.
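
In practice a report or screen generator would simply order the DD rows on whichever sort column was requested, along the lines of this small, hypothetical snippet (again using the earlier data_dictionary layout):

   def field_layout(dd, table, sort_col="sort1"):
       # Return field names for a table in the order given by Sort-1, Sort-2, etc.
       rows = [r for r in dd if r["table"] in (table, "*")]
       return [r["field"] for r in sorted(rows, key=lambda r: r[sort_col])]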


Standard Collection Operations

A good table-oriented system gives every collection (such as tables) a standard set of operations that can be used on all of them. One is not limited to just the operations that the programmer anticipated and explicitly built in for a given collection. Building or adding each of these operations one at a time as needs arise can be very time-consuming.

These operations include filtering, ordering, searching, auto caching and persistence, grouping and totaling, transferring, import and export, field/property selection, inserting, deleting, updating, and joining or relating. Click here for more details on these fundamental operations.
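
To give a flavor of what one uniform interface might look like, here is a hedged Python sketch of a tiny in-memory collection offering a few of those operations. The class and method names are invented for illustration; the point is that a larger, engine-backed collection could accept the same calls.

   class Collection:
       # Minimal sketch of a uniform collection interface (illustrative only).
       def __init__(self, rows=None):
           self.rows = list(rows or [])
       def insert(self, row):
           self.rows.append(dict(row))
       def delete(self, where):
           self.rows = [r for r in self.rows if not where(r)]
       def update(self, where, changes):
           for r in self.rows:
               if where(r):
                   r.update(changes)
       def filter(self, where):
           return Collection(r for r in self.rows if where(r))
       def order_by(self, *cols):
           return Collection(sorted(self.rows, key=lambda r: [r[c] for c in cols]))
       def total(self, col):
           return sum(r[col] for r in self.rows)

   # The same calls apply whether this holds ten rows or (ideally) millions.
   trans = Collection([{"cust": "A", "amt": 5.0}, {"cust": "B", "amt": 7.5}])
   large_sales = trans.filter(lambda r: r["amt"] > 6).order_by("cust")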

No Ceilings!

Many current approaches to collection processing have practical ceilings that require arbitrary interface changes to move to the next step. When these ceilings are reached, the programmer is forced to revamp the existing code to take advantage of the next level of power. Such revamping is a waste of time and resources. (Bertrand Meyer calls this the "Continuity Problem", where a small change in the requirements results in a large change in program code.) It would be like having to steer with your elbow if on a bicycle, steer with your nose if in a car, and then steer with your foot if in an 18-wheel rig.

Fortunately, the transportation industry pretty much standardized on steering by turning a wheel with one's hands regardless of the vehicle size or task. (Well, the bike uses a bar, but close enough.) The software collections industry is not this wise yet. It still wants to divide collections into things like stacks, queues, sets, dictionaries, trees, etc., letting short-lived operational needs drive the protocol chosen. In my experience, collection needs change and grow over time, repeatedly. Thus, one should pick a flexible collections protocol. Once a stack, always a stack? Nooooo waaaay. It may still act as a stack for some operations, but it will often need other views as well.

These ceilings are usually either complexity ceilings or size ceilings (such as RAM). Let's look at a common Perl approach and then some SQL problems that tend to be ceiling bound.

Perlers often use lists of lists and/or pointers to lists to store and process collections. Perl "associative arrays" are basically a RAM table with 2 columns and one index (on the "key" column). If the requirements suddenly change, such as needing 3 columns, or 2 indexes with persistence, one then has to completely revamp the way fields and/or indexing are done. Perlers usually add a second level of complexity in the form of a list of pointers or a list of lists. In ITOP, or even XBase, these additions would be dirt simple. There is nothing magic about the limit of 2 columns and one index, so why do Perl and array-centric thinking impose this arbitrary limit?
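
Python dictionaries show the same pattern as Perl's associative arrays, so the point can be sketched without Perl; the example below is mine, not from the original, and the structures are deliberately toy-sized.

   # Stage 1: an associative array is fine for key -> one value.
   phone = {"alice": "555-1111", "bob": "555-2222"}

   # Stage 2: a third "column" forces a doubling-up into lists, and every
   # piece of code that touched the old structure must change.
   phone = {"alice": ["555-1111", "Sales"], "bob": ["555-2222", "Support"]}

   # Stage 3: a second index (say, by department) means building and
   # maintaining a parallel structure by hand.
   by_dept = {}
   for name, (number, dept) in phone.items():
       by_dept.setdefault(dept, []).append(name)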

Note that I have proposed using associative arrays elsewhere quite a few times. This may seem like a contradiction. However, those uses are generally an interface mechanism and not data-collection management.

Although I find pointers to pointers nasty and error-prone to work with in almost any form, let us just assume that this approach is fine in some cases. However, if the complexity of the structure grows, if the quantity and variety of operations keeps growing, or if the size of such structures increases beyond a certain amount, then the typical response is to use a more powerful relational database add-in. Aside from the fact that DB API's can be bureaucratic to work with, one has to convert the native pointer structure and much of its processing into something the DB API's can use.

Thus, there are roughly 3 different kinds of interfaces one has to use as a collection graduates from simple to middle-level to complex:

  1. A regular or associative array.
  2. An array of arrays (or a list of pointers) if the structure grows beyond 2 columns or 1 index. (A "doubling-up," if you will.)
  3. Relational API's when heavy persistence, concurrency, or size is needed.

I see no reason why the same basic interface cannot be used from baby collections to Oracle-size collections. Why the industry tolerates this, I have no idea. Perhaps because they have not seen collections done right.

Note that there may be some minor setting differences as collections scale. For example, transaction markers and concurrency error handling may need to be specified for the higher-end collections. However, these can be treated as additions to the existing code, not overhauls.
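
As a rough illustration of that scaling goal, the sketch below keeps one calling convention while the storage behind it changes from a throwaway in-memory table to a file-backed engine. The wrapper and its method names are hypothetical; only Python's standard sqlite3 module is assumed.

   import sqlite3

   class TableStore:
       # One interface whether the rows live in RAM or in a database file.
       def __init__(self, engine=":memory:"):
           self.db = sqlite3.connect(engine)      # pass a file path to scale up
           self.db.execute("CREATE TABLE IF NOT EXISTS t (name TEXT, amt REAL)")
       def insert(self, name, amt):
           self.db.execute("INSERT INTO t VALUES (?, ?)", (name, amt))
       def find(self, min_amt):
           return self.db.execute(
               "SELECT name, amt FROM t WHERE amt >= ?", (min_amt,)).fetchall()

   store = TableStore()             # in-memory today...
   # store = TableStore("big.db")   # ...engine-backed tomorrow, same calls
   store.insert("alice", 12.5)
   print(store.find(10))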

Now let's look at traditional SQL operations. SQL is usually fine for fairly simple processing. However, as the number of expressions, links (joins), and/or fields increases, SQL can get nasty at times. Standard SQL lacks many of the black-box (subroutine) and reference-reduction (factoring) techniques found in most programming languages (and promoted as "good practice"). In standard SQL you usually cannot assign variables, macros, subroutines, etc. to complex or repeating parts in order to break the logic and sequence down into more manageable parts. You simply end up with one big, messy string with lots of parentheses. Beyond a certain complexity point one has to break the statement into 2 or more separate SQL statements.

Further, if set-oriented operations are no longer sufficient to handle the complexity of the job, the entire SQL statement has to be converted into a cursor-oriented approach that deals better with one record at a time. It is like having to stop, back up for several miles, and then start again on a different path. (See SQL Criticism.)

ITOP offers several techniques to avoid or reduce overhauls from complexity and size changes. The primary technique is the provision of a built-in set of standard, common, rudimentary, yet powerful collection operations (described above). Other techniques include internal-aware expression evaluation and the blurring of set-orientation versus cursor-orientation in database commands. (Set-oriented operations have some significant advantages in traditional client/server setups; however, one should have a choice, especially if the bandwidth between the client and the server is sufficient.)

"Complexity Scaling" can also be horizontal as well as vertical. For example, an API that is dedicated to a stack collection can get cumbersome if the needs grow outside of the traditional parameters of stacks. I encounter the need to use and view stacks, trees, queues, etc. in ways outside of these narrow collection "subtypes" all the time. Requirements change and your collections interface should be ready for such changes.

See Also:
Array Problems
Taxonomy Trap

Collection Convergence

There is a pattern to most of these (above) recommendations.
[Convergence diagram]
Right now many systems have roughly four different collection protocols or protocol types. Is there a reason for this? One may argue that different collections have different needs, and thus specialization is required. Although this is certainly true of the implementation much of the time, I find it generally not to be the case with the collection protocols themselves. The primary reason is that requirements change too often. A collection may start out as a tree or a stack, but morph into a more general-looking collection as more requirements appear. I have experienced this process on several occasions. (See The Collection Taxonomy Trap for more about this, and The Multi-Dispatch Pattern for some source-code management ideas.)

Besides morphability and scalability (described in the previous section), another benefit is easier training. Instead of learning four or more different collection management systems, one should only have to learn a single protocol. Fine adjustments and specialized extensions can then be added as needed (such as a Pop(x) "wrapper" function for stack-like activity).

A third advantage is that the same collection system can be used for all the different collection types and variations. Rather than build a class/code browser, an RDBMS browser, an array browser, and so on, vendors can focus on building one grand collection system and browser that does it all. It could even be modular, such that you can attach different text-browsing engines that highlight code keywords, etc.

Even if you disagree with my specific protocol and/or syntax proposals, the idea of a consistent collection protocol should ring through as a very logical idea.

You may notice that my rejection of strong protocol taxonomies parallels my distaste for heavy use of sub-classing, also known as sub-typing and IS-A thinking. Software engineering has over-emphasized IS-A thinking. Perhaps in some niches it has an important place, but not for custom business applications.
 

Few Types

The proliferation of field types has made it more difficult to transfer or share data between different applications, and it generates confusion. ITOP has only two fundamental data types: numeric and character, plus perhaps a byte type for conversion purposes. (I have been kicking around ideas for having only one type.) The pre- and post-validators give any special handling needed by the field. A format string can be provided for various items like dates ("99/99/99"), Social Security Numbers ("999-99-9999"), and so forth. (Input formats are not shown in our sample DD.)

Types like dates and SSN's can be internally represented (stored) just fine with characters or possibly integers. For example, December 31, 1998 could be represented as "19981231". This provides a natural sort order.
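
The natural ordering is easy to verify: stored as yyyymmdd character strings, dates compare and sort correctly with nothing but plain string comparison (a small illustration of my own, not from the original):

   dates = ["19981231", "19980115", "20011007"]
   assert sorted(dates) == ["19980115", "19981231", "20011007"]
   assert "19981231" < "20011007"   # string comparison matches date order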

Booleans can be represented with "Y" and "N" (yes and no), and blank for uninitialized. This has the advantage of allowing more codes in the future if two turn out not to be enough. Further, I have witnessed RDBMS numeric ID numbers being changed into strings and vice versa. Being type-agnostic reduces or eliminates the code changes needed after external or data-source type changes. (Fortunately, ID numbers are rarely compared with greater-than and/or less-than operators. There are drawbacks to type-agnosticism, but overall I think the benefits are greater.)

Enforcement of format can be done via validation specifiers (both built-in and custom). Fewer language types increases the share-ability and portability of data. (See also Black Box Bye Bye.)

Field Groups

One of the most time-wasting parts of programming table processing in many of the popular languages is having to type the names of all the fields that will show up on a screen or report. It would be much easier to specify the name of a set, and all fields belonging to that set would then be used. Although sets could have mnemonic names, we chose to use letters in our example for simplicity.

Suppose that we had to make a report that showed customer transaction detail but omitted customer names for reasons of confidentiality. With our setup, we could just ask for a report on all fields in set "B" (see the Groups column above). When dealing with tables with 50-plus fields, specifying a set name is much simpler than typing 50 names or building a field loop.
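
In code, honoring a group letter could be as simple as filtering the data dictionary on its Groups column, something like this hypothetical helper (using the earlier data_dictionary layout, where set "B" picks up PurchDate and Amt but not CustName):

   def fields_in_group(dd, group):
       # Return the fields whose Groups entry contains the requested set letter.
       return [r["field"] for r in dd
               if group in r["groups"].split(",")]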

Other Possible ITOP Features