The interactive, single-user format of personal computers has spawned applications quite different in character from mainframe applications. Like mainframes, personal computers can serve the needs of an organization in areas such as accounting, billing, or inventory. Unlike typical mainframes, however, they also serve personal needs, such as preparing memos, developing financial projections, and planning projects.
Two of the three major personal computer applications--spreadsheets and word processing--are primarily used as personal productivity tools by individuals. The third major application--database management systems--is different. It is used almost exclusively as a scaled-down version of its mainframe counterpart. Mainframe database management systems do not typically serve the individual's needs, but rather the organization's needs.
Does this mean that individuals do not need to store, organize, retrieve, and otherwise manage information? No, they certainly do. But their needs--and the character of their data--are quite different from organizational needs and data.
In the course of their activities, professionals and managers encounter a variety of information that can be conveniently expressed as a short piece of text that we call an item. An item might represent (for example) an idea, task, reminder, or fact. Typically, an item consists of a single phrase or a sentence and has little or no formal internal structure.
Some examples of items are:
An individual may handle several items in a single day and have hundreds or even thousands of "active" items that need to be stored, organized, and scanned regularly.
While much of the information that individuals need to manage is short and self-generated, sometimes the granularity is larger and comes from other sources. An individual may need to index or organize larger bodies of text such as memos, reports, messages, or news stories. It is useful to associate such objects with items, where the item may be a topic, summary, source, or headline describing the text. We call such bodies of text notes.
To cope with the task of organizing items and notes, individuals group them into sets--typically in the form of an ordered list or file. The items/notes then can be manipulated conveniently as a unit. We call these sets categories and say that the items/notes are assigned to categories.
A representative range of categories for one individual is:
When many categories are defined, the categories take on special significance: they provide the concepts for organizing and understanding the information at higher levels of abstraction. The set names become the words of a language, which can describe the information and its interrelationships. This language can capture salient features of the universe of the information, reflecting the "real world" from which they are drawn.
When items/notes can be assigned to multiple categories, an individual can alter his or her focus to reflect different perspectives by organizing the same information in multiple ways. We call each such organization a view. This is difficult to accomplish with physical storage methods (such as paper in files), where each item or note exists in a unique place.
Most people are unaware of the volume of information they manage on a regular basis, because they use a variety of mechanisms for this purpose. The style of management is highly individual, but often includes such aids as note cards, Rolodexes, yellow Post-it notes, while-you-were-out messages, and lists on paper that are recopied on a regular basis. In general, personal information is too ad hoc and poorly structured to warrant putting it into a record-oriented on-line database.
In the 1960s, mainframe computers were owned primarily by large organizations and businesses and were applied to their problems. These consisted mainly of clerical database tasks, which usually involved filing, retrieving, sorting, processing, and reporting data. For these applications, the computer represented several advantages over the corresponding manual processes:
These advantages of computerized databases over manual processes prompted both database theorists and practitioners to focus on the efficient, reliable manipulation of large volumes of data. It was recognized, early on, that understanding and describing the structure of the data in advance (i.e., at database design time) was a major advantage. Most of the data being manipulated had a repeating, record-oriented structure that could be used to improve processing and storage efficiency. Therefore, relatively little attention was paid to providing richer data description languages or designing systems with more flexible data structures.
As a result, the dominant database management paradigms are not well suited for managing personal data. They are oriented toward the storage and retrieval of large volumes of data with a known, repetitive structure. These databases are mainly used for keeping corporate records and communicating within an organization.
Personal data has different characteristics: it is often of relatively limited volume; its structure is not known in advance and evolves over time; and it contains heterogeneous data types. This information may be generated and managed directly by an individual or may be some body of data that is indexed or accessed according to some idiosyncratic needs. Therefore, mainframe databases and their personal computer counterparts are not very useful for personal data, where discovering, presenting, or modifying the structure of information is more important. A different data management paradigm is required.
Many programs designed to manage personal data have adopted a completely unstructured approach, treating all data as free text that is searched and displayed in response to ad hoc keyword queries. This approach confuses the lack of a fixed structure with no structure at all. Personal data has structure, but its structure is fluid, changing in response to evolving needs.
Hypertext systems, such as HyperCard for the Apple Macintosh, provide a simple structuring mechanism--links between data elements--which allows the user to create a topological space of adjacent elements. However, structure is a matter of the grouping of similar items and the relationships defined among those groups, not the simple linking of individual items together. Most hypertext systems assume that the information of interest resides exclusively in the data rather than in the structure. They do not provide the user with the means to view and manipulate structure itself or create relationships among groups of data elements.
To study the requirements of personal database management, we constructed several prototype systems, culminating in a program suitable for release as a commercial product. This program, named "Lotus Agenda," meets several design requirements for managing personal information. These are:
From the user's perspective, Agenda consists of a few simple elements that are used to incrementally define and populate a database.
The first of these is the item, as described above. Items are the content of the database and can consist of up to several lines (350 characters) of text. Items in Agenda are assigned to or filed in categories. Categories provide the structure of the database and are used to organize items and display them conveniently. Like items, the categories are not pre-defined--the user creates, deletes, and modifies categories as necessary in order to manage the items.
Each category defines a subset of those items assigned to it. Alternatively, categories can be understood as one-place predicates defined over the universe of items. Given the categories defined earlier, the user might assign an item "Call Fred next Tuesday about pricing policy plans" to several categories: "Phone calls," "Fred Smith," and "Pricing policy committee."
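The two readings of a category, as a set of items and as a one-place predicate, can be sketched in a few lines of Python. This is only an illustration: the `Database` class and its method names are hypothetical and say nothing about how Agenda stores its data.

```python
from collections import defaultdict


class Database:
    """Toy item/category store (hypothetical names, not Agenda's design)."""

    def __init__(self):
        # category name -> set of item texts assigned to it
        self.assignments = defaultdict(set)

    def assign(self, item, *categories):
        """File one item under any number of categories."""
        for category in categories:
            self.assignments[category].add(item)

    def members(self, category):
        """Set reading: the subset of items assigned to the category."""
        return set(self.assignments.get(category, ()))

    def holds(self, category, item):
        """Predicate reading: is category(item) true?"""
        return item in self.assignments.get(category, ())


db = Database()
call = "Call Fred next Tuesday about pricing policy plans"
db.assign(call, "Phone calls", "Fred Smith", "Pricing policy committee")
```

Because the item is a member of three sets at once, it can be reached through any of the three categories, which a single-location filing system cannot offer.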
Figure 1 shows a simple list of items in Agenda assigned to a category called "Tasks." A bar cursor serves as a pointer, highlighting the current item or category in reverse video. In the example, the cursor is on item "Call Jeff about the R&D budget."
Both items and categories can optionally have an attached body of text called a note. Notes can be up to about 10,000 characters in length and may be internal to an Agenda database or contained in a separate ASCII file. Notes are easily accessible from an item or category by placing the bar cursor on the item or category and pressing a function key.
Figure 2 illustrates a note attached to item "Call Jeff about the R&D budget." Typically, notes are used to capture some further detail or some secondary information about an item or category. Another use of an item and its note is to make the note a short document and use the item itself as a title. Notes on categories are often used to document the intent behind the category, i.e., what sorts of items should be assigned there.
The categories are organized into a tree structure called the category hierarchy. The purpose of the hierarchy is to conveniently describe two fundamental structuring principles: subsumption and mutual exclusion. Items that are assigned to a category in the hierarchy are also implicitly assigned to the parent of that category in the hierarchy, i.e., a parent category subsumes its children. A category in the hierarchy can be optionally designated as exclusive, indicating that no item can be assigned to more than one child in the category's subtree.
Subsumption and mutual exclusion, defined through this tree structure, implement a limited class of logical implications: A implies B (subsumption); and A implies not B (mutual exclusion). Advanced features called conditions and actions allow users to define more complex logical interrelationships between categories, including a form of declarative if-then rules. Automatic assignment and other forms of intelligent behavior are implemented in part by enforcing these relationships.
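The two structuring principles can be illustrated with a small Python sketch. The class and function names are hypothetical, and one policy is an assumption made here rather than something the text states: when a new assignment conflicts with an exclusive category, the new assignment replaces the old one.

```python
class Category:
    def __init__(self, name, parent=None, exclusive=False):
        self.name = name
        self.parent = parent
        self.exclusive = exclusive
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def ancestors(self):
        """Yield the parent, grandparent, and so on up to the root."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def subtree(self):
        """Yield this category and everything below it."""
        yield self
        for child in self.children:
            yield from child.subtree()


def branch_under(ancestor, node):
    """Return the child of `ancestor` whose subtree contains `node`."""
    while node.parent is not ancestor:
        node = node.parent
    return node


def assign(explicit, item, category):
    """Record an explicit assignment, enforcing mutual exclusion.

    Assumed policy: the new assignment wins over conflicting old ones.
    """
    current = explicit.setdefault(item, set())
    for ancestor in category.ancestors():
        if not ancestor.exclusive:
            continue
        keep = branch_under(ancestor, category)
        for other in list(current):
            if ancestor in other.ancestors() and branch_under(ancestor, other) is not keep:
                current.discard(other)
    current.add(category)


def is_assigned(explicit, item, category):
    """Subsumption: an assignment anywhere in the subtree counts."""
    return any(c in explicit.get(item, ()) for c in category.subtree())


root = Category("Main")
priority = Category("Priority", root, exclusive=True)
high = Category("High", priority)
low = Category("Low", priority)

explicit = {}
assign(explicit, "Call Jeff about the R&D budget", low)
assign(explicit, "Call Jeff about the R&D budget", high)  # displaces "Low"
```

After the second call the item is assigned to "High" and, through subsumption, to "Priority" and "Main", but no longer to "Low".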
Items are entered and displayed in Agenda through views. A view is analogous to a report in a typical database management system, except that it is dynamic: it is modified automatically when items or categories change. The user typically defines a number of different views and can quickly switch among them. Views allow the user to focus on different aspects of the database for different purposes by varying the content or format of the display. For example, when deciding what to do next, the user might consult a view listing items by priority; in case a colleague drops in, another view might provide only the items that relate to that person, organized by topic.
The user constructs views visually by arranging categories into (possibly irregular) row and column structures on the display. Formally, views are defined via three mechanisms:
In summary, an Agenda database consists of four basic elements. Items and notes provide the content; categories organized into a hierarchy provide the structure; and views provide the data entry and reporting (output) mechanism.
The power of Agenda comes primarily from the user's ability to update the database implicitly through the views. By manipulating items and categories in a view, the user can make changes to the underlying database that are immediately reflected in other views, without concern for the internal organization or storage of the information.
While this facility is a significant advantage to the user, it presents a complex implementation problem. Theoretically, updating a database through a view is not always well defined. Specifically, an action may be ambiguous or even contradictory to interpret in the underlying database. Agenda uses a variety of formal methods and heuristics to interpret these actions in a reasonable way and effect reasonable changes to the database.
A normal suite of screen-oriented commands is provided to the user in order to manipulate the objects visible in the view: insert, delete, move, copy, edit, or replace. The system interprets these commands by making the simplest change to the underlying database that results in the literal change to the display indicated by the command. As a result, the actual interpretation of a command is highly context sensitive.
Some simple examples are:
By far the most common operations on categories in a view are insert and replace, which involve the selection of a category for display in one of these three contexts.
One approach to allowing the user to select categories to define a view is to present the category hierarchy and allow the user to select the desired categories. Experience with users, however, revealed that it was disconcerting to keep switching back and forth between the partially constructed view and a display of the category hierarchy. We chose an alternative approach, allowing the user to simply position the cursor on the screen and type the name of the desired category.
The goal of these methods is to make the minimal logically consistent change to the database that will produce the change to the view indicated by the user. This often causes other views to be modified implicitly as well. When the user makes a change to a view that is ambiguous with respect to a database (i.e., that has multiple interpretations), Agenda selects the interpretation that causes the minimal disruption to other views. If this heuristic provides no guidance for resolving the ambiguity, Agenda examines assignments of similar items and categories in the vicinity of the user's change.
While these techniques do not always yield an optimal result, they are quite satisfactory in the majority of cases. The residual cases are minor annoyances, more than offset by the convenience of updating through views.
Selecting a category by typing its name introduces some ambiguities in interpreting the user's intent. Is the user creating a new category or trying to select an existing one? What if there are multiple categories with the same name in different parts of the hierarchy? To resolve these questions, Agenda incorporates a concept called category matching. When the user begins to type, a special display appears at the top of the screen, which indicates how many categories in the hierarchy match the string being entered and displays the first of these (in hierarchy display order). While typing the string, the user can use arrow keys to display other matches, pop up a display box containing the relevant portions of the hierarchy, accept the currently displayed category, or create a new category simply by typing a unique string.
This process has several advantages. In most contexts, the user only types a few characters to select a category without sacrificing the ability to make a selection from the entire hierarchy when desired. Also, new categories can be created without special commands--incrementally extending the structure of the database.
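Category matching can be sketched as a search over the hierarchy in display order. The matching rule used here (a case-insensitive prefix of the category name) is an assumption for illustration, as are the function names and the nested-dictionary representation of the hierarchy; the text does not specify Agenda's actual rule.

```python
def display_order(tree, path=()):
    """Yield each category's full path depth-first, i.e., in display order."""
    for name, children in tree.items():
        yield path + (name,)
        yield from display_order(children, path + (name,))


def match_categories(tree, typed):
    """Return the paths of all categories whose name starts with `typed`.

    Assumed rule: case-insensitive prefix match on the category name.
    An empty result means the string is unique, so a new category
    could be created from it.
    """
    typed = typed.casefold()
    return [p for p in display_order(tree) if p[-1].casefold().startswith(typed)]


# Nested dicts keep insertion order, which stands in for display order.
hierarchy = {
    "People": {"Fred Smith": {}, "Jeff": {}},
    "Projects": {
        "Pricing policy committee": {"Fred Smith": {}},
        "Press release": {},
    },
}

matches = match_categories(hierarchy, "fred")
```

Returning full paths rather than bare names keeps the two "Fred Smith" categories distinct, so the user can step from the first match to the second with the arrow keys.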
Perhaps the most powerful aspect of updating through views is the role of criteria (queries). When an item is entered into a view for which a criteria selection is specified, it follows that the item is intended to satisfy the current criteria. Agenda incorporates a special reverse query evaluation technique that can force a new item to satisfy a given set of criteria. Thus, entering a new item into a view that displays only items associated with a particular project or date (for example) will assign that item to that project or date.
In some cases, there are multiple ways that a new item could satisfy the criteria. When such ambiguities occur, the context of surrounding items in the view is used to select a single interpretation.
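One way to picture reverse query evaluation is as the inverse of a filter: instead of testing whether an item satisfies the criteria, the program adds whatever assignments are needed to make the test succeed. The sketch below makes several assumptions that the text does not confirm: criteria are a conjunction of clauses, each clause is a list of alternative categories, exclusions are ignored, and ambiguity is resolved by a simple vote among neighboring items. The function name is hypothetical.

```python
from collections import Counter


def force_satisfy(item_categories, criteria, neighbors=()):
    """Return the item's categories, extended so every clause is satisfied.

    criteria  -- list of clauses (ANDed); each clause lists alternatives (ORed)
    neighbors -- category sets of the surrounding items in the view
    """
    result = set(item_categories)
    votes = Counter(c for cats in neighbors for c in cats)
    for clause in criteria:
        if result & set(clause):
            continue  # the clause already holds for this item
        # Prefer the alternative most common among neighbors;
        # max() keeps the first alternative when votes are tied.
        result.add(max(clause, key=lambda c: votes[c]))
    return result


criteria = [["Project X"], ["High", "Medium"]]
nearby = [{"Project X", "Medium"}, {"Project X", "Medium"}]

with_context = force_satisfy(set(), criteria, nearby)
without_context = force_satisfy(set(), criteria)
```

With neighbors filed under "Medium", the new item is assigned to "Project X" and "Medium"; with no context to consult, this sketch falls back to the first alternative, "High".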
Entering and categorizing data in a personal database can be a chore. Ideally, a personal database system should aid the user in this process. Agenda uses a wide variety of cues and contextual information to help the user properly categorize items and manage the database. Updating the database through views is one aspect of this process. Another mechanism allows the user to program automatic assignments and implicit actions.
In addition to serving as the structure of the database, the category hierarchy embodies a declarative program that is executed against changed items. Each time a new item is entered or an old one is modified, it is placed on a queue, which is processed in the background when no other actions are pending. The processing for each item takes place sequentially through the categories of the hierarchy in depth-first order (the order in which the hierarchy is displayed).
The user can associate with each category in the hierarchy a set of conditions and a set of actions, assembled from a restricted vocabulary available for this purpose. The conditions are evaluated each time an item is changed, and if the conditions are met, the changed item is automatically assigned to this category. Whenever an assignment is made to a category (including an explicit assignment by the user), any actions the user has defined for that category are immediately taken.
There are three types of conditions that can be tested for each category. These are called string conditions, profile conditions, and date conditions.
Unless the user explicitly defines a profile condition or a date condition, a category has only a string condition associated with it, since it exists implicitly by virtue of the category's name. This is by far the most common situation, as it does not require any special action on the part of the user. Indeed, the user does not even need to know about the subtleties of automatic assignment in order to benefit from string conditions.
In the simplest case, items will get assigned to categories that match one or more words in the item. For instance, an item "call Mr. Smith about the policy meeting" might match the string conditions of categories "calls," "John Smith," "policy committee," and "Meetings" and hence be automatically assigned to them.
A set of special symbols can be embedded into a category name to gain additional control of string matching. These provide for alternative descriptions of the category (called aliases); requiring precise matches; accepting any substrings; and fixing the order of matched words or case sensitivity.
A global setting called Initiative determines how strong a match must be in order for the string conditions to succeed. By varying the initiative, the user can control how freely the program will accept matches between items and categories.
Another global setting, called Authority, determines whether matches over the initiative threshold are carried out immediately or are put in a special queue of suggested categorizations to be reviewed with the user. This technique, sometimes called mixed initiative, allows the program to engage the user in a clarification dialog for weak or potentially spurious automatic assignments. When this queue is non-empty, a special ? symbol appears at the upper right of the screen. The user begins the dialog by selecting a menu item "questions," after which the program presents its pending assignments and the strength of the evidence for them.
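The interplay of string conditions, Initiative, and Authority can be sketched as follows. The scoring rule (the fraction of the category name's words found in the item, after crude plural stripping) is an assumption chosen to keep the example short; the text does not describe how Agenda actually measures match strength, and all names below are hypothetical.

```python
import re


def stem(word):
    """Very crude stemmer: drop a trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def words(text):
    return {stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}


def match_strength(item, category):
    """Fraction of the category name's words that occur in the item."""
    category_words = words(category)
    if not category_words:
        return 0.0
    return len(category_words & words(item)) / len(category_words)


def auto_assign(item, categories, initiative=0.5, authority=True):
    """Split matches over the threshold into assignments or questions.

    initiative -- minimum strength for a string condition to succeed
    authority  -- True: assign immediately; False: queue for review
    """
    assigned, questions = [], []
    for category in categories:
        strength = match_strength(item, category)
        if strength >= initiative:
            (assigned if authority else questions).append((category, strength))
    return assigned, questions


item = "call Mr. Smith about the policy meeting"
categories = ["calls", "John Smith", "policy committee", "Meetings"]

eager, _ = auto_assign(item, categories, initiative=0.5)
cautious, _ = auto_assign(item, categories, initiative=0.75)
_, pending = auto_assign(item, categories, initiative=0.5, authority=False)
```

At the lower threshold all four categories match; raising it keeps only "calls" and "Meetings", whose names match in full. With authority withheld, the same matches land in the question queue together with their strengths.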
Profile conditions are used for implementing ad hoc logical implications. For example, any item assigned to both "John Smith" and "policy committee" should be assigned to "discuss with manager." They are also handy for implementing default assignments, such as classifying any "bug" not already assigned to a "classification" as "serious."
Date conditions are mainly used to define calendar views--views in which the items are displayed in sections according to dates. Using date conditions, it is possible to construct a daily calendar, weekly calendar, or any other desired arrangement. Date conditions are also useful for changing an item's assignments based on some time-sensitive conditions. For example, an item that becomes overdue can be automatically assigned to a "late" category, or one due within three days can be moved into a "high priority" category.
Date conditions work in conjunction with a natural language date parser that extracts English date expressions from items and assigns the appropriate date. This grammar can correctly interpret expressions such as "two weeks from last Tuesday," "the day after tomorrow," and "the last week in June," as well as standard American date formats such as "May 25, 1989" and "12/5/87."
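To give a feel for date extraction, here is a far smaller, pattern-based sketch that handles only three of the forms mentioned above. It is not a grammar and makes no claim about how Agenda's parser works; the function name is hypothetical, and two-digit years are assumed to fall in the 1900s.

```python
import re
from datetime import date, timedelta

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def parse_date(text, today):
    """Return the first date recognized in `text`, or None."""
    lowered = text.lower()
    # Relative expressions; the longer phrase must be tested first.
    if "day after tomorrow" in lowered:
        return today + timedelta(days=2)
    if "tomorrow" in lowered:
        return today + timedelta(days=1)
    try:
        # American numeric format, month first, e.g. 12/5/87.
        m = re.search(r"\b(\d{1,2})/(\d{1,2})/(\d{2})\b", lowered)
        if m:
            month, day, year = (int(g) for g in m.groups())
            return date(1900 + year, month, day)  # assumed century
        # Spelled-out month, e.g. May 25, 1989.
        m = re.search(r"\b([a-z]+) (\d{1,2}), (\d{4})\b", lowered)
        if m and m.group(1) in MONTHS:
            month = MONTHS.index(m.group(1)) + 1
            return date(int(m.group(3)), month, int(m.group(2)))
    except ValueError:
        return None  # e.g. month 13 or day 32
    return None


today = date(1989, 5, 25)
due = parse_date("Send the report the day after tomorrow", today)
```

Expressions such as "two weeks from last Tuesday" need real parsing of nested relative terms, which is exactly what the fixed patterns above cannot do.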
Actions are somewhat simpler than conditions in that they are taken immediately whenever assignments occur. They can be used to date an item (date actions), push an item onto one or more other categories (profile actions), or execute a limited set of special actions such as to discard the item from the database.
Note that any change to an item that results from a successful condition or an action will place the item back onto the queue of changed items, causing it to be processed again through the hierarchy.
Programs of considerable complexity can be constructed by cascading sequences of conditions and actions throughout the hierarchy. In effect, the hierarchy is a declarative program that is run against each changed item.
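The processing loop described in this section can be sketched as a queue of changed items that is run repeatedly through the hierarchy in depth-first order. Everything below is a simplification under stated assumptions: category names are unique, conditions are plain Python callables, and the only action modeled is the profile action of pushing an item onto other categories. None of the names come from Agenda.

```python
from collections import deque


class Category:
    def __init__(self, name, condition=None, pushes=(), children=()):
        self.name = name
        self.condition = condition      # callable(item, assigned) -> bool
        self.pushes = list(pushes)      # profile action: categories to add
        self.children = list(children)


def depth_first(category):
    yield category
    for child in category.children:
        yield from depth_first(child)


def assign(index, assigned, name):
    """Assign a category and immediately take its actions."""
    if name in assigned:
        return False
    assigned.add(name)
    for target in index[name].pushes:
        assign(index, assigned, target)
    return True


def process(root, queue, assignments):
    """Drain the queue; return the number of passes through the hierarchy."""
    index = {c.name: c for c in depth_first(root)}  # keeps depth-first order
    passes = 0
    while queue:
        item = queue.popleft()
        assigned = assignments.setdefault(item, set())
        changed = False
        for category in index.values():
            if category.condition and category.condition(item, assigned):
                changed = assign(index, assigned, category.name) or changed
        passes += 1
        if changed:
            queue.append(item)  # a changed item is processed again
    return passes


root = Category("Main", children=[
    Category("Calls", condition=lambda item, cats: "call" in item.lower()),
    Category("Discuss with manager",
             condition=lambda item, cats: {"John Smith", "Policy committee"} <= cats),
    Category("John Smith", condition=lambda item, cats: "smith" in item.lower()),
    Category("Policy committee",
             condition=lambda item, cats: "policy" in item.lower(),
             pushes=["Committees"]),
    Category("Committees"),
])

item = "call Mr. Smith about the policy meeting"
assignments = {}
passes = process(root, deque([item]), assignments)
```

"Discuss with manager" appears in the hierarchy before the two categories it depends on, so its condition fails on the first pass and succeeds only after the item is queued again. This is the cascading behavior described above; in this sketch it terminates because assignments are only ever added.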
There are several structuring principles that are inherent in tabular database organization, but that seem alien to personal data. Forcing users to consider them inhibits their ability to organize their data.
The most important of these is a fixed field/value distinction. In a tabular database, the fields (column headings), and possibly the set of legal values that can appear in a field (data), are normally defined in advance--when the database is designed. The organization into fields and values, however, is not a characteristic of the data itself, but a representational convenience; the same information can be represented in different tabular forms.
In general, the fields represent the fixed information about what can be in the domain, while the values indicate the changing information about what is. The distinction is not inherent in the data, but rather reflects how the data may change over time.
While data from a traditional database system could be transformed and presented in any of these formats, it would be quite difficult to allow arbitrary updates to the transformed data, as this may require extensive internal reorganization. By not presuming a particular field/value orientation at the database level, Agenda can present the data in any such format and support updates in real time.
In Agenda, both fields and values are represented by the more general concept of category. Whether a category serves as a field or a value depends on its use in a given context--what role it plays in a view.
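The idea that the same category can act as a field in one view and as a value in another can be shown with a small sketch. The data, the flat two-level hierarchy, and the `view` function are hypothetical and only illustrate the principle.

```python
# Parent category -> children. Which parent acts as the "field" is
# decided by the view, not by the data.
hierarchy = {"Person": ["Fred", "Jeff"], "Priority": ["High", "Low"]}

items = {
    "Call about budget": {"Jeff", "High"},
    "Write press release": {"Fred", "Low"},
    "Review pricing": {"Fred", "High"},
}


def view(items, hierarchy, section_parent, column_parent):
    """Group items into sections by one parent's children and show,
    as a column value, each item's child of another parent."""
    table = {}
    for section in hierarchy[section_parent]:
        rows = []
        for text, categories in items.items():
            if section in categories:
                value = next((c for c in hierarchy[column_parent] if c in categories), None)
                rows.append((text, value))
        table[section] = rows
    return table


by_person = view(items, hierarchy, "Person", "Priority")
by_priority = view(items, hierarchy, "Priority", "Person")
```

In `by_person`, "Fred" is a section heading and "High" is a column value; in `by_priority` the roles are reversed, yet both views are computed from the same unchanged assignments.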
While it is possible in Agenda to create a multi-table structure by adding an additional section to the view, it is misleading to do so in this instance. Agenda databases can contain multiple record types, as in a multi-table database; however, only a limited form of join is possible. In a typical tabular database, each table contains one or more distinguished fields constituting a unique key for each entry. The corresponding concept in Agenda is the item itself, while the categories correspond to fully indexed non-key fields. Some relational database joins implicitly alter the interpretation of a given field from being a key to a non-key field. In this example, one might expect to join the two tables in 6d on DEPARTMENT, yielding a table. However, this implies a conversion of DEPARTMENT from a key to a non-key field. The corresponding operation in Agenda requires converting an item to a category, which is not a view-level operation.
Nonetheless, the referential integrity implied by the decomposition of the data into two tables is enforced in Agenda. Specifically, an individual could not be assigned to a department inconsistent with his or her division.
In a conventional database, the tabular structure is fixed in advance and the content varies over time. When dealing with personal data, users must be able to reorganize their databases on-the-fly. The basic problem is that the structure of a personal database may not be known in advance and, indeed, may continuously evolve over time as the nature of the information or the context changes.
For example, consider the problem of organizing a new project to be undertaken. As the project begins to take shape, it becomes necessary to organize the mass of ideas, tasks, and information into categories or classifications so that the project can be understood and managed at an appropriate level of abstraction. A limited amount of detail can be handled at a time, and the structure provides the means to focus on manageable sub-tasks or aggregate the information into a meaningful big picture. The complexity of the structure is often dictated by the volume of information that needs to be handled. The amount of information can vary dramatically as the project actually unfolds over time--and therefore the structural requirements change as well.
In this context, the ratio of items to categories becomes important. Experience with Agenda suggests that personal information tends to be organized into a relatively small number of categories, each of which typically contains multiple items. If the items are too numerous, the user tends to refine the categories; if they are too few, the user tends to aggregate them. The purpose of this process seems to be to maintain this ratio at a comfortable level. This requirement is in contrast to record-oriented databases, where the ratio of items to categories is typically either one-to-one--such as each person having a unique address--or very-many-to-one--such as all people being assigned to either male or female.
We have also found that a single item tends to be filed in a relatively small number of places. This is why filing cabinets--where an item can be filed at most once--work acceptably, but do not provide the convenience or flexibility of a personal information manager. There is a definite departure point in power--a significant step up in utility--when items can be conveniently assigned to multiple categories.
The evolution of personal database structure often follows a particular pattern. Users seem to like regular, normalized data. As they discover and refine the structure of their data, however, it passes through non-standard states and in fact may be only partially normalized in steady state.
To address the need for dynamic modification of the database structure, Agenda category hierarchies can be modified incrementally without making the database, defined views, or other program structures obsolete.
Two design considerations allow Agenda to provide this flexibility. First, the physical storage of the items does not depend on the structure of the database. Thus, no reformatting, reorganization, or reindexing at the physical level is required to implement a change to the category hierarchy. Second, no distinction is made internally between terminal and non-terminal categories in the hierarchy. Consequently, an item can be assigned to categories anywhere in the hierarchy, including the root category. To refine the categorization for a set of items, the user moves the items' assignments further down the hierarchy; to aggregate, the user moves the items' assignments up the hierarchy.
As noted earlier, most aspects of a view are defined by selecting categories. This allows Agenda to modify the views as the database structure changes. Since categories are the vocabulary for defining the elements of a view (criteria, sections, and columns), the program can respond intelligently to changes in the category hierarchy.
For example, deleting a category need not make a view obsolete--it may just remove a section or columns from a view or modify the selection criteria. Similarly, simple heuristics can be used to augment a view when new categories are added. For example, adding a new category as a child of the "Division/Dept" category would add a new section to a view in which all the existing divisions already appear as sections.
Perhaps the most complex adjustment required when the category hierarchy changes is the reinterpretation of queries (criteria). In Agenda, the user does not directly specify a Boolean expression in order to select the items that will appear in a view. Rather, the user identifies the categories of interest, indicating whether items assigned to that category are to be included or excluded, and the system synthesizes an appropriate Boolean expression.
This expression is not a simple conjunction of the categories indicated--complex expressions of logical ANDs, ORs, and NOTs often result. The heuristics for formulating these expressions derive from a simple observation: the user would never intend a query that would necessarily result in a null set of selected items.
For example, if a group of mutually exclusive categories is selected (i.e., an item can appear in at most one of them), these categories will be ORed in the query, since conjoining them could never result in a non-empty set of items. While this approach limits the user's flexibility for specifying arbitrary queries, the query synthesis algorithm almost always produces the desired result in practice. Most queries are actually quite simple and follow the heuristic rules used in the program. When a category is deleted, queries in the system are resynthesized when they are next needed, to reflect the change.
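The one heuristic spelled out above, ORing categories that are mutually exclusive and ANDing everything else, can be sketched as follows. The text does not list the remaining heuristics, so this covers only that rule; the function name and the `exclusive_parent` mapping are hypothetical.

```python
def synthesize_query(include, exclude, exclusive_parent):
    """Build a Boolean query from the user's category selections.

    include, exclude -- category names marked for inclusion or exclusion
    exclusive_parent -- maps a category to its exclusive parent, if it has one
    Returns a readable expression and a predicate over an item's categories.
    """
    clauses = {}
    for category in include:
        parent = exclusive_parent.get(category)
        # Siblings under one exclusive parent share a clause (ORed);
        # every other category gets a clause of its own (ANDed).
        key = ("exclusive", parent) if parent else ("single", category)
        clauses.setdefault(key, []).append(category)
    clause_list = list(clauses.values())

    def matches(item_categories):
        return (all(any(c in item_categories for c in clause) for clause in clause_list)
                and not any(c in item_categories for c in exclude))

    parts = ["(" + " OR ".join(clause) + ")" for clause in clause_list]
    parts += [f"NOT {c}" for c in exclude]
    return " AND ".join(parts), matches


exclusive_parent = {"High": "Priority", "Low": "Priority"}
text, matches = synthesize_query(["High", "Low", "Fred Smith"], ["Done"], exclusive_parent)
```

Here `text` is `(High OR Low) AND (Fred Smith) AND NOT Done`. ANDing "High" with "Low" would select nothing, since no item can hold both, so the sketch never produces that form.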
Evolving databases create another challenge. How do you move data from one database to another? In a database where the structure is defined in advance, it is possible (though a bit tedious) to write a program that picks up information from one database, re-formats it, and copies it into another database. In an Agenda database, such a program is in constant danger of becoming obsolete, as the category hierarchies of the source and destination database continue to evolve.
At the core of this problem is the indeterminacy of the translation. From a linguistic perspective, an item and a category assignment form a declarative proposition: the item is the subject, and the category assignment is a statement about that subject. For example, an item "Write Agenda press release" may be assigned to a category "Fred Smith," which has a parent category "Task assignments," representing the proposition that Fred Smith is responsible for writing a press release about Agenda. Each database contains a vocabulary of categories that, in their context, make assertions about the items.
While items can be transferred easily, it is difficult to determine how an item's assignments in one database should be interpreted in another database. When is a category in one database the same as a category in another? What if there are multiple categories in the destination database that match a category in the source database?
Our approach to this problem is to use partial match algorithms that attempt to find similarities between the structure of two databases. Moving data from one database to another involves assigning the data as similarly as possible in the destination database to its category assignments in the source database.
Two basic measures are used in the partial match process--comparing the category names (i.e., a string comparison) and comparing category hierarchy structures in the vicinity of the categories being compared.
When an item is exported from one database to another, information is carried along with it indicating not only the specific category assignments in the source database, but certain information about the context of those categories in the category hierarchy. In the destination database, the hierarchy is searched for matching categories, and if these are found, the item is assigned to them. If no matching categories are found, an optional attempt is made to locate an appropriate place in the destination category hierarchy for the orphaned category, and a corresponding category is created.
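The two measures, name similarity and similarity of the surrounding hierarchy, can be combined into a single score. The sketch below is an illustration under assumptions: it uses the ratio from Python's `difflib.SequenceMatcher` for string comparison, reduces "context" to the parent category's name, and picks the weight and threshold arbitrarily. The text does not describe Agenda's actual partial match algorithm.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(exported, destination, threshold=0.6, name_weight=0.7):
    """Find the destination category that best matches an exported one.

    exported    -- (category name, parent name) carried along with the item
    destination -- list of (category name, parent name) pairs
    Returns the best pair, or None if nothing scores above the threshold
    (the caller may then create the orphaned category).
    """
    name, parent = exported
    best, best_score = None, 0.0
    for cand_name, cand_parent in destination:
        score = (name_weight * similarity(name, cand_name)
                 + (1 - name_weight) * similarity(parent, cand_parent))
        if score > best_score:
            best, best_score = (cand_name, cand_parent), score
    return best if best_score >= threshold else None


destination = [
    ("Fred Smith", "Customers"),
    ("Fred Smith", "Task assignments"),
    ("Pricing", "Projects"),
]

found = best_match(("Fred Smith", "Task assignments"), destination)
orphan = best_match(("Zoology", "Hobbies"), destination)
```

Both "Fred Smith" categories match on name, so the parent comparison decides between them. "Zoology" resembles nothing in the destination, and the `None` result signals that a new category would have to be created.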
This approach to inter-database communication serves several purposes in addition to the immediate requirement to move items around. It allows databases with similar, but not necessarily identical, structures to remain related, even though each may evolve differently over time. For example, many users maintain a "shadow" database of completed action items as a separate Agenda database. By exporting items from one or more primary databases to this database, the structure is incrementally refined as necessary to reflect changes in the category hierarchies of the primary databases.
A similar benefit accrues to shared databases. Members of a work group may communicate by exporting items to each other or to a common shared database, which is consulted periodically. This can be used to accumulate changes to track progress on a project, or simply to send messages back and forth.
Transferring items from one database to another is reminiscent of the transfer of DNA in living organisms. An agent carries with it a genetic description that, in a suitably related environment, can serve to integrate the agent into this new environment. Under some conditions, the invading agent not only becomes integrated into the host, but transforms the basic structure of the host to accommodate it.
The development of Agenda spanned approximately two and a half years from conception to scheduled product release. During this time, many novel and interesting concepts were developed that were not ultimately incorporated into the product and may form the basis for future versions or other products.
User and market testing occurred throughout the final 12 months of this process. The development team was often surprised by the range of novel applications discovered by users. Several of these involved the analysis of existing bodies of text or information that was drawn in real-time from external sources. These included:
Because Agenda incorporates a variety of heuristic methods for taking actions without an explicit command (such as automatic assignments), it requires a subtle paradigm shift on the part of the user. Few computer users are accustomed to programs that interpret intentions in addition to simply executing commands. We found that people were quite willing to accept this sharing of control, tolerating occasional mistakes in the program's judgment in return for the benefits of improved personal organization and effectiveness. The results of cascading program-initiated actions often surprise even experienced users, who continually fine-tune their databases to more closely mirror their own judgment.
From a user's perspective, item/category databases are tools for controlling complexity. They make it possible to organize volumes of detail far greater than manual methods allow; quickly shift perspective in response to immediate demands; centralize disparate notes, ideas, reminders, lists, and tasks; and get the big picture by examining information from multiple viewpoints and at varying levels of detail. Personal information managers provide a means for coping with the heterogeneous, evolving, free-form information that individuals must manage on a continuing basis.
This article has been modified by removing references to illustrations and various extraneous references.