
Building an RDF model:

A quick look at iCalendar

I spent a few hours reading 50 pages of the iCalendar RFC2445 with a view to evaluating proposals to put it into XML. My conclusion early on was that the spec should be written in terms of RDF properties, particularly as it has a clear property/value and parameter/value structure.

Epilogue: see also the RDF Calendar Workspace, started late 2002, which takes a similar approach to this "quick look" with running code, example data, and schemas.

DanC

Summary

General points I noticed included

  1. The spec is full of x-extensions and IANA registries. These would all be done using namespaces in XML.
  2. There is no summary of properties with their domains and ranges, which would make the spec much clearer.
  3. The parameter value type of "URI" implicitly causes dereferencing. This is not clear from the spec but is assumed by the examples.
  4. There are a few examples of wanton reification, e.g. relationship type.
  5. Encodings, for cleanliness: the encoding is a relationship between two objects, not the property of an object. Same comment on XML DSig.
  6. I am concerned that I have not found very much protocol defining how agents interact, or what a message containing a calendar entry means. But maybe that is elsewhere in the spec.

Narrative

When looking for a natural representation in RDF of data in a given language, one looks first for the natural structure of the language. iCalendar has a nested set of structures which naturally lend themselves to an RDF graph interpretation. Apart from the noted exceptions, this translation leads to a set of fairly logically defined RDF properties which could form iCalendar's contribution to the semantic web.

A "calendar" consists of a set of components, such as events, to-do list entries and journal entries. These seem natural RDF types. (There is a choice of whether to introduce a specific property as the relationship between the containing calendar and a specific type of component, or whether to use a generic inclusion property and then specify the subtype of the component.)

The components have properties, even known as properties in iCalendar. Now each property is in fact a complex thing which has a "value" (implicitly named) and various "parameters" with names.

The named parameters are clearly easily represented as RDF properties.

The values are generally atomic things such as integers and strings, with two exceptions. One is when the value is a URI, which implies that the actual value is in a document with that URI. Another is that the value datatype "recur" is a string which itself has a substructure. This recurrence substructure takes the form of (guess what!) a set of attribute value pairs.
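
The attribute/value substructure of a recurrence value can be sketched in a few lines of Python. This is only an illustration: the function name `parse_recur` and the sample rule are assumptions made here, and nothing is validated against the grammar in the spec.

```python
def parse_recur(value):
    """Split a recur string such as 'FREQ=WEEKLY;COUNT=10' into a dict.

    Comma-separated values (e.g. BYDAY=MO,WE) become lists of strings.
    No validation against the spec's grammar is attempted.
    """
    parts = {}
    for pair in value.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        name, _, raw = pair.partition("=")
        parts[name.upper()] = raw.split(",")
    return parts


# Hypothetical sample rule, made up for this sketch.
rule = parse_recur("FREQ=WEEKLY;INTERVAL=2;BYDAY=MO,WE;COUNT=10")
print(rule)
```

Each key of the resulting dictionary is a candidate RDF property of a recurrence node, which is the point made above: the string is really a nested set of attribute value pairs.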

Detailed comments

2.3 Internationalization

If this were XML this would be done for you, with Unicode and the various encodings etc.

4.1 Content Lines

x-name and iana-token are extensions which XML would give for free using namespaces.

"Each property defines the specific ABNF for the parameters allowed on the property"

This makes general parsing impossible and direct conversion into XML difficult. The only hope is that in fact it is not true and there is more consistency than this line leads you to believe! This sounds like a remake of the RFC822 problem which HTTP has in spades: one parser per page of the spec.
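
If the consistency hoped for above does hold, a single generic parser would be enough for every property. The following Python sketch assumes that every content line has the shape name, then zero or more `;param=value` pairs, then `:` and the value, with double quotes protecting `:`, `;` and `,` inside parameter values. The function names and the sample addresses are made up for illustration, and line unfolding is not handled.

```python
def split_unquoted(text, sep):
    """Split text on sep, ignoring separators inside double quotes."""
    pieces, current, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
            current.append(ch)
        elif ch == sep and not quoted:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))
    return pieces


def parse_content_line(line):
    """Return (name, params, value) for one already-unfolded content line."""
    # The first colon outside quotes separates name and parameters from the value.
    quoted = False
    for i, ch in enumerate(line):
        if ch == '"':
            quoted = not quoted
        elif ch == ":" and not quoted:
            head, value = line[:i], line[i + 1:]
            break
    else:
        raise ValueError("no ':' separator found")
    name, *raw_params = split_unquoted(head, ";")
    params = {}
    for raw in raw_params:
        pname, _, pvalue = raw.partition("=")
        # A parameter may carry several comma-separated values.
        params[pname.upper()] = [v.strip('"') for v in split_unquoted(pvalue, ",")]
    return name.upper(), params, value


# Hypothetical sample line; the address is a placeholder.
print(parse_content_line('ATTENDEE;CN="Smith, J";ROLE=CHAIR:mailto:chair@example.org'))
```

A parser like this knows nothing about individual properties, which is exactly what per-property ABNF would rule out.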

4.1.3

Here in the example

ATTACH;FMTTYPE=image/basic;ENCODING=BASE64;VALUE=BINARY:
      MIICajCCAdOgAwIBAgICBEUwDQYJKoZIhvcNAQEEBQAwdzELMAkGA1U
      EBhMCVVMxLDAqBgNVBAoTI05ldHNjYXBlIENvbW11bmljYXRpb25zIE
      <...remainder of "BASE64" encoded binary data...>

represents the encoding as though it were a property of the value. It isn't: it is a relationship between the value and the string expressed here. It is nicer to write that relationship directly:

<attach>
   <fmttype>image/basic</fmttype>
   <base64>MIICajCCAdOgAwIBAgICBEUwDQYJKoZIhvcN
     [...]
   </base64>
</attach>

which would mean (in XML or RDF nonstriped strawman syntax) "Something is attached which has content type image/basic and has base64 encoding MMICCblablahblah".

Note that making base64 a first class relationship (subclass of encoding) makes for brevity and extensibility: with a namespace I can introduce a new one.

Value=binary has all these problems and is unnecessary. It is assumed in base64. The earlier example with the URI

ATTACH:http://xyz.com/public/quarterly-report.doc

has an implicit dereferencing operation which it would be best to expose:

<attach>
   <uri>http://xyz.com/public/quarterly-report.doc</uri>
</attach>

which means, consistently with the previous example, "something is attached which is identified by URI http://...."
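
The two strawman forms above can be produced mechanically. This Python sketch uses the standard `xml.etree.ElementTree` module. The function name `attach_to_xml`, the element names (taken from the strawman XML above) and the truncated base64 sample are illustrative assumptions rather than a proposed schema.

```python
import xml.etree.ElementTree as ET


def attach_to_xml(params, value):
    """Build an <attach> element from ATTACH parameters and value.

    The encoding becomes a child element (a relationship), not a parameter,
    and VALUE=BINARY is dropped because base64 already implies it.
    """
    attach = ET.Element("attach")
    if "FMTTYPE" in params:
        ET.SubElement(attach, "fmttype").text = params["FMTTYPE"]
    if params.get("ENCODING", "").upper() == "BASE64":
        ET.SubElement(attach, "base64").text = value
    else:
        # No inline encoding: the value is a URI identifying the attachment.
        ET.SubElement(attach, "uri").text = value
    return attach


# The base64 text is a truncated placeholder, not real data.
inline = attach_to_xml(
    {"FMTTYPE": "image/basic", "ENCODING": "BASE64", "VALUE": "BINARY"},
    "MIICajCCAdOg",
)
linked = attach_to_xml({}, "http://xyz.com/public/quarterly-report.doc")
print(ET.tostring(inline, encoding="unicode"))
print(ET.tostring(linked, encoding="unicode"))
```

Adding another encoding would mean adding another child element name, ideally in its own namespace, without touching the handling of `uri`.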

4.2 property parameters

Property parameter values MUST NOT contain a double quote. So I guess that if I want to represent something which does... I attach it?

4.2.1

ALTREP and many of the following parameters can be represented obviously as RDF properties. There needs to be an explicit property between the introduced thing and any "value".

<description>
   <altrep>cid:[email protected]</altrep>
   <text>Project XYZ review meeting</text>
</description>

This becomes more obvious when you look at things like ATTENDEE.

4.2.2.

There seems to be an embryonic notion of type here ("properties with the CAL-ADDRESS value type"). I assume this can be formalized. It would be so much simpler if this were tabulated.

4.2.3 Calendar User Type.

"mailto:" is usually in lower case. I thought it was in fact mandatory that it be in lower case.

4.2.5 Delegatees

It is very confusing who ends up notionally being the attendee when both DELEGATED-TO and DELEGATED-FROM are specified. Changing this to RDF, or contemplating doing logical operations on it, makes one queasy about the solidity here.

ATTENDEE;DELEGATED-TO="mailto:[email protected]";DELEGATED-FROM="mailto:[email protected]":[email protected]

What is that equivalent to? I assume [email protected] goes to the meeting.

4.2.7. See comment about 4.1.3

4.2.9 Free/Busy Time type

Make relationships first class.

FREEBUSY=FREE: would be better as FREE: to reduce unnecessary complication and allow extension.

If that section of the spec (4.2.9) seems to be self-referential and difficult to read, that is also because it is describing an unnatural part of a clumsy syntax. You don't say "I am free or busy as follows: 12-1pm and we are talking about free here"! Because RDF makes these things first class objects and allows you to group FREE and BUSY and REALLYBUSY as subclasses of FREEBUSYTYPE, life is easier.
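
A rough Python sketch of what "first class" means here, with triples written as plain tuples. The names `schema_triples` and `freebusy_triples`, the `subPropertyOf` label (standing in for something like rdfs:subPropertyOf), the node label and the sample period are all assumptions made for illustration.

```python
FBTYPE_SUBPROPERTIES = frozenset({"FREE", "BUSY", "BUSY-UNAVAILABLE", "BUSY-TENTATIVE"})


def schema_triples(names=FBTYPE_SUBPROPERTIES):
    """Declare each free/busy kind as a specialization of FBTYPE."""
    return [(name, "subPropertyOf", "FBTYPE") for name in sorted(names)]


def freebusy_triples(subject, fbtype, periods, names=FBTYPE_SUBPROPERTIES):
    """Relate a subject directly to each period through the FREE/BUSY/... property."""
    if fbtype not in names:
        raise ValueError(f"unknown free/busy kind: {fbtype}")
    return [(subject, fbtype, period) for period in periods]


# Hypothetical node label and period, made up for this sketch.
print(freebusy_triples("_:fb1", "FREE", ["19980314T233000Z/19980315T003000Z"]))

# Extension is just one more declared name.
extended = FBTYPE_SUBPROPERTIES | {"REALLYBUSY"}
print(schema_triples(extended))
```

Generic machinery that only understands FBTYPE can still process REALLYBUSY, because the schema says how the two are related.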

4.2.10 language

xml:lang of course is what one would get for free with XML.

4.2.15

"RELATED-TO:RELTYPE=SIBLING" is a classic wanton reification. Just say SIBLING:

Unfortunately the specification defines how calendars can be put into a hierarchical relationship but doesn't say what that relationship *means*. Maybe it does later in the spec.

4.2.18 Sent By

This is a relationship between a mailbox and another mailbox. It is that the owner of one mailbox is being represented by the owner of another. Yes, the message which asserted this data was probably sent by the agent, but the term is misleading when it crops up in the data. This will cause confusion. This is an example of the clarification which arises when you try to represent the meaning of each rdf:property (icalendar:parameter) independently.

4.2.20 Value Data Type

Note that the "URI" data type does not just constrain the value string to be a valid URI, but indicates that the value string is the document you get when you dereference the URI. Big difference, particularly when you automate the base 64 decoding of something.

In general, note that XML data types are defined by the XML Schema working group. See draft @@. A comparison would be a useful exercise.

4.8.4.1 ATTENDEE

"If the LANGUAGE property parameter is specified, the identified language applies to the CN parameter"

That is a terrible bit of design - a typical bit of interference between different headers which is so tempting for designers in these flat specs which can't use nesting. How many other clauses like this are there?

LANGUAGE is, I must admit, a problem which RDF has in general. It is difficult to specify that a string has a language without making an intermediate node that you don't want. This is, I realize, the same as the intermediate packaging problem: how to let a system know that what it asked for is inside, but in the meantime, here is some useful information about it. Here is a number and by the way it is prime. Here is a GIF and by the way it is copyright. Here is a common name and by the way it is in English. It is interesting to see the way iCalendar has the same problem.
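
The intermediate node can be made concrete with triples written as Python tuples. This is a sketch only: the function name, the node labels and the property names `value` and `language` are hypothetical, chosen to mirror the discussion above.

```python
def string_with_language(subject, prop, text, lang, node="_:cn1"):
    """Attach a language to a string by going through an intermediate node.

    What one would like to say is simply (subject, prop, text); the extra
    node exists only so that the language has something to hang on.
    """
    return [
        (subject, prop, node),
        (node, "value", text),
        (node, "language", lang),
    ]


# Hypothetical attendee node and common name.
triples = string_with_language("_:attendee1", "CN", "John Smith", "en")
for triple in triples:
    print(triple)
```

A query for the common name now has to know about the packaging and follow `value`, which is the inconvenience described above.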

4.8.4 UID

There is linking between components of calendars which uses "UIDs" which are mid URIs with the prefix removed. This is a bug.
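
Restoring the prefix is trivial, which is what makes its omission annoying. A minimal Python sketch, assuming that the intended scheme is the `mid:` scheme for message identifiers and that percent-encoding the UID is acceptable; the function names and the sample UID are made up here.

```python
from urllib.parse import quote, unquote


def uid_to_mid(uid):
    """Turn a bare iCalendar UID into a mid: URI by restoring the scheme."""
    # '@' is kept literal; other reserved characters are percent-encoded.
    return "mid:" + quote(uid, safe="@")


def mid_to_uid(uri):
    """Recover the bare UID from a mid: URI."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "mid":
        raise ValueError(f"not a mid: URI: {uri!r}")
    return unquote(rest)


# Hypothetical UID in the usual "something@host" shape.
uid = "19960401T080045Z-4000F192713-0052@host1.example"
print(uid_to_mid(uid))
```

With the scheme in place, the link between two components is an ordinary URI reference that generic RDF tools can follow.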

4.8.7.4 "SEQUENCE"

This is not in fact a property of an event, but is a property of a given expression of the state of an event. The rule is that it must be incremented by the organizer if the event changes significantly. In a peer-to-peer world, it is not obvious what to do.

Not reviewed

I skipped most of the rest of the spec but a few very similar concerns arose with some other parts I glanced at.

Conclusion

It seems that using RDF nodes for the calendar, for each event etc, and for each icalendar:property is a fairly straightforward mapping.

A spinoff would be a vocabulary which would include useful reusable models of time. The timezone work could be factored out if it is definitive.

Where RDF mapping was not obvious this sometimes coincided with unclear aspects of the specification.

There are three levels at which the RDF mapping could be made:

  1. A very direct mapping of the ical:properties and parameters onto rdf:properties. Always use the same "value" rdf:property for the VALUE of an ical:property. This would leave some things looking illogical in RDF. It would be simple to define as a mapping, but the definition of the properties would be strange in some cases.
  2. Make a few simple adjustments to make the RDF more natural. Places to look for these have been indicated with a @@ in the tables. This will make the mapping obvious to an iCal expert reading the RDF, but at the same time make the RDF queries simpler and the properties more reusable. It would move things like RELATED RELTYPE=X into a subclass relationship between X and RELATED which allows generic RDF machinery to process it.
  3. An extensive rework in which the logic of rules was largely exposed in RDFS or something stronger would of course be great.
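
The difference between the first two levels can be sketched in Python, with triples as plain tuples. The function names `level1` and `level2`, the node labels and the sample identifiers are assumptions for illustration; only the RELATED-TO case from the list above is handled specially.

```python
def level1(subject, name, params, value):
    """Direct mapping: an intermediate node carries the value and every parameter."""
    node = f"_:{name.lower()}"
    triples = [(subject, name, node), (node, "value", value)]
    triples += [(node, pname, pvalue) for pname, pvalue in sorted(params.items())]
    return triples


def level2(subject, name, params, value):
    """Adjusted mapping: RELATED-TO with RELTYPE=X becomes the property X."""
    if name == "RELATED-TO" and "RELTYPE" in params:
        return [(subject, params["RELTYPE"], value)]
    # Everything else falls back to the direct mapping in this sketch.
    return level1(subject, name, params, value)


# Hypothetical event and UID labels.
args = ("_:event1", "RELATED-TO", {"RELTYPE": "SIBLING"}, "uid-2")
print(level1(*args))
print(level2(*args))
```

The second form needs one triple instead of three, and a schema statement relating SIBLING to RELATED-TO lets generic queries over RELATED-TO still find it.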

Appendix: Node types

Node types inferred
party implicit node in all properties with a CAL-ADDRESS value type. (person or group: anything which can have a mailbox)
cal-address A mailbox - normally mailto:... URI
CU Calendar user defined in CUTYPE
INDIVIDUAL, GROUP, RESOURCE, ROOM CU
ldap-directory starts "ldap:" (is this a standard?) URI
mime-type string
participation status needs-action, accepted, declined, tentative, delegated, ... (an enum type - could do better. Constraints in the spec.) string
component of a calendar
EVENT, TODO, etc component
TimeProperty DTSTART, DTEND, DUE, EXDATE, RDATE
Timezone see TZID string
icalobject
recur defined by recurrence properties -Really complex datatype could be broken down into RDF! Contains its own nested attr/value structure.

Appendix: rdf:Properties - from "parameters"

Properties from section 4
iCalendar name domain range Notes
ALTREP anything iCal property? URI alternative to body
CN party string
: (mailbox) party cal-address Implicit node between a party and that party's mailbox. Represented by "value" of property
CUTYPE - type
DELEGATED-FROM party cal-address
DELEGATED-TO party cal-address
DIR party URI
eightbit, base64 bits text text encodes bits according to RFC2045. Was value of encoding "property" which was faulty model. Now, subclass of generic "encoding" property
ENCODING bits text Only in schema, as superclass of eightbit and base64 See notes
FMTTYPE document mime-type Why not call it content-type?! Applies to a document. Expect the implicit uri property to tell you which object.
FBTYPE Supertype of the following
FREE, BUSY, BUSY-UNAVAILABLE, BUSY-TENTATIVE ? time-interval enum became subclasses FBTYPE property
LANGUAGE string-or-doc iso-language Equivalent xml:lang
MEMBER party cal-address group membership
PARTSTAT party enum A status: part of some protocol?
RANGE component superclass only of ...
THISANDPRIOR, THISANDFUTURE component date-time subclass of RANGE (was qualifier)
RELATED component period@@ superclass of TRIGGER-FROM-START and TRIGGER-FROM-END?
RELTYPE component component Superclass only, of
PARENT, CHILD, SIBLING component component Subclasses of RELTYPE. Hierarchical constraints. Semantics unclear@@.
ROLE party enum roleparam Attendee; role=chair could it be better "chair?". Wait and see whether it is a separate dimension.
RSVP party boolean
SENT-BY party cal-address Misleading. "Represented by" would be better. Some message was sent.
TZID anything taking time or D timezone Yuk. Should be part of the time string. Makes time complicated
VALUE string-or-doc string Superclass of the following
BINARY, BOOLEAN, CAL-ADDRESS, DATE, DATE-TIME, DURATION, FLOAT, INTEGER, PERIOD, RECUR, TEXT, TIME, URI, UTC-OFFSET string string Specifies the datatype of an associated string
URI document URI Subclass of VALUE but indicates the value is the content of the resource identified.
calprop icalobject superclass for the following
VERSION icalobject string subclass of calprop. unique.
PRODID icalobject string subclass of calprop. semantics? unique.

CALSCALE icalobject string subclass of calprop
METHOD icalobject string This is a hook for a protocol definition
VEVENT icalobject event Property VEVENT of calendar implies component is of type event. See spec for properties including this in their domain
VTODO icalobject todo similar
VJOURNAL icalobject journal similar
VFREEBUSY icalobject freebusy similar
VTIMEZONE icalobject timezonedef similar Definition of a timezone.
VALARM ?component alarm can nest in component
CALSCALE icalobject

Appendix: Calendar component Properties

See spec 4.8

The columns E, T etc indicate whether the subject of the property is permitted to be an event, todo, journal, freebusy, alarm or timezone component.

Properties of calendar components
iCalendar name E T J F A Tz range Notes
ATTACH y y y y text-or-doc
CATEGORIES y y y text List of enums
CLASS y y y classification
COMMENT y y y y y text no comment
DESCRIPTION y y y y text
GEO y y float float lat long. @@ Split into two properties?
LOCATION y y text
PERCENT-COMPLETE y integer
PRIORITY y y integer
RESOURCES y y text
STATUS y y y text enum - see the spec.
SUMMARY y y y y text
COMPLETED date-time
DTEND y y date-time or date
DUE y date-time or date
DTSTART y y y y date-time or date
DURATION y y y y duration
FREEBUSY y period
TRANSP y text really boolean!
TZID a a a a a a text
TZNAME y text
TZOFFFROM y utc-offset like -0500
TZOFFTO y utc-offset
TZURL y URI
ATTENDEE y y y y y y party @@ If language is specified, it applies to CN: Kludge! @@@
CONTACT y y y y text
ORGANIZER y y y y party Note in FREEBUSY the use is different
RECURRENCE-ID y y y date-time or date Could be a problem. Not a property of an event, but its presence makes it a reference to a specific occurrence of a repeated event.
RELATED-TO y y y text (really URI which is UID of component) Subclass only of PARENT, CHILD, SIBLING above.
PARENT , CHILD, SIBLING y y y see RELATED-TO
URI y y y y URI document "associated with" component. For more information.
UID y y y y UID - URI without mid: @@ Missing scheme!!! @@ replace with mid: URI
EXDATE y y y date-time or date Excludes the dates given @@ implicit logic makes search logic difficult.
EXRULE y y y recur
RDATE y y y date-time or date
RRULE y y y recur

Properties of alarm components and config control and misc
name domain range Notes
ACTION A text really an enum
REPEAT A integer
TRIGGER A duration or date-time See RELATED. @@ Split into two properties?
CREATED ETJ date-time
DTSTAMP ETJF date-time
LAST-MODIFIED ETJTz date-time
SEQUENCE ETJ integer fuzzy rules for incrementing this
REQUEST-STATUS ETJF text eg 3.1.1

Properties from recurrence rules
name domain range Notes
UNTIL rrule text text - all these are text with various constraints and substructure
COUNT
INTERVAL
BYSECOND
BYMINUTE
BYHOUR
BYDAY
BYMONTHDAY
BYYEARDAY
BYWEEKNO
BYMONTH
BYSETPOS
WKST
FREQ

Properties of
name domain range Notes

Examples

@@@

References

There must be a much better list of resources for hacking calendar files of various formats - but until I find it here are some random things I found.


$Id: foo.html,v 1.47 2005/02/02 18:26:43 timbl Exp $
TimBL, Oct 2001