Automating a Highly Expressive Knowledge Space: Dual Representation

2007-07-30


Abstract. "Dual representation" is a methodology for building and maintaining a highly flexible content management system around a centralized vocabulary. This vocabulary incorporates a formal semantics (via a combination of an OWL ontology and a RIF Core rule set) for hierarchical document composition, together with an abstract framework for modelling the terms of a domain in a way that facilitates semi-automated use of native XML and RDF representations. It relies on a recent set of technologies (Semantic Web technologies) to automate certain aspects of data entry, structure, storage, display, and retrieval with minimal intervention by traditional database administrators and computer programmers.

A Content Management System instance is built around a domain model expressed using a Data Node Model. From the domain model, various XML/RDF management components are generated. The Data Mason takes a domain model and generates XML schemas, formal ontologies, XSLT transforms, stored queries, XML templates, document definitions, etc. The ScreenCompiler takes an abstract representation of a data entry form (with references to terms in a domain model) and generates a user interface in a particular host language (currently XForms).

Introduction

The model described here is the same one described in the Semantic Technology Conference 2006 proceedings for Automating the Management of a Semi-structured Database [DNM]. It is a semi-structured modelling language which uses RDF to describe infoset items that participate in functional mappings to RDF. The functional mappings can be formally defined using GRDDL [GRDDL]. The model maps information items [INFOSET] parsed from an XML document, with namespace nodes which share a common base URI, to Assertional Box (ABox) membership assertions (i.e., statements such as "item set #23211 is a member of a collection / class with the same name") as well as other assertions about these information items.

The mapping consists of several transformation rules which convert RDF statements made between these Data Nodes (OWL Classes [OWL] and OWL Properties), including: contains, "is a specialization of", "inherits structural constraints". These statements are converted into Description Logic (OWL-DL) expressions. Besides class membership assertions, the RDF graphs which are the range of these functional mappings (GRDDL result graph) will also include contains relations made between instances. This relation is interpreted as the owl:inverseOf of <http://www.w3.org/2001/04/infoset#parent> as defined in the RDF Schema for the XML Information Set [RDFS-INFO]. These RDF graphs will conform to the OWL expressions generated from an instance of the model. This controlled Semantic Web represents ontological commitment ([PROXSEM],[KR]) within a particular domain (vehicle almanacs, digital library resource management, etc.).

Adopted Rule Language: RIF BLD

A variation of the human-readable concrete syntax for the Rule Interchange Format (modified to use compact URIs and a currently circulating editor's revision [RIFBLD]) is used to describe the semantics which require rule-based dialects for the additional expressiveness over OWL. Where OWL is sufficient, the Manchester OWL syntax [MANOWL] is used. The dnode namespace prefix is used when forming compact URIs [CURIE] in the namespace of the Data Node model.

Macros for Dual Representation Archetypes

The dnode:type property is used to associate an XSD datatype to a leaf node in a directed multigraph of Data Nodes (an RDF graph of a data node model). These leaf data nodes are interpreted as RDF properties with the same URI. Depending on the XSD datatype assigned to these leaf nodes, some of them are interpreted as owl:DatatypeProperties with a pairwise disjoint partition of values [SWBPCV] for their range, each with human-readable string labels. Others will be interpreted as simple owl:DatatypeProperties with an rdfs:range corresponding to the assigned datatype. The former subcategory of leaf nodes can be thought of as Controlled Vocabularies.

class: dnode:DataNode

class: dnode:RootNode
EquivalentTo: dnode:DataNode and not ( dnode:contained-by some dnode:DataNode) 

class: dnode:LeafNode
SubClassOf: dnode:DataNode,
            (dnode:type min 1),
            not ( dnode:contains some dnode:DataNode )

class: dnode:ControlledVocabulary
EquivalentTo: dnode:LeafNode and ( dnode:type value owl:DataRange )

In addition, a single dnode:RootNode is designated as the data node whose members (in an XML instance) will be the root node (or document element) in the corresponding XML instance document. The dnode:contained-by property is introduced for convenience to emphasize that these data nodes (and their instances) are true root nodes and cannot be contained by another data node.
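
The class definitions above can be read as a recognition procedure over a data node graph. The following is a minimal sketch under stated assumptions, not the framework's implementation: the dictionary layout, the `classify` and `contained_by` helpers, and the use of a Python list to stand in for an owl:DataRange are all hypothetical. It also takes a closed-world reading (a node with no recorded container is treated as a root, and the necessary conditions on dnode:LeafNode are used as a test), which OWL itself would not license.

```python
# Hypothetical in-memory data node model: node name -> description.
# "contains" lists contained data nodes; "type" is an XSD datatype name or,
# for enumerations, a list of permitted values (standing in for owl:DataRange).
MODEL = {
    "ex:Root":    {"contains": ["ex:NodeA", "ex:NodeB", "ex:hasFooB"]},
    "ex:NodeA":   {"contains": ["ex:hasFooA"]},
    "ex:NodeB":   {"contains": []},
    "ex:hasFooA": {"type": "xsd:string"},
    "ex:hasFooB": {"type": ["value1", "value2", "value3"]},
}


def contained_by(model, node):
    """Return the data nodes that assert dnode:contains on `node`."""
    return [parent for parent, desc in model.items()
            if node in desc.get("contains", [])]


def classify(model, node):
    """Return the dnode classes that `node` falls under (closed-world reading)."""
    desc = model[node]
    kinds = {"dnode:DataNode"}
    if not contained_by(model, node):
        kinds.add("dnode:RootNode")        # not contained by any data node
    if "type" in desc and not desc.get("contains"):
        kinds.add("dnode:LeafNode")        # has a type, contains nothing
        if isinstance(desc["type"], list):
            kinds.add("dnode:ControlledVocabulary")
    return kinds
```

Here `classify(MODEL, "ex:hasFooB")` yields the data node, leaf node and controlled vocabulary classes, while `ex:NodeB` is only a plain data node because it carries no datatype.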

There are several rules involving antecedents which include atoms that make use of the contains property between two (non-leaf) data nodes. These data nodes primarily comprise the backbone of a data node model. The first rule gives the general interpretation for the use of the contains property:

Forall ?A ?B1 ... ?BN
   And(rdfs:subClassOf(?A _:univRest) 
       owl:Restriction(_:univRest)
       owl:onProperty(_:univRest dnode:contains)
       owl:allValuesFrom(_:univRest _:disjClass) 
       owl:unionOf(_:disjClass _:aList) 
       list:member(_:aList ?B1)
       list:member(_:aList ?B2)
       ...
       list:member(_:aList ?BN))

          :- And(dnode:contains(?A ?B1) 
                 dnode:contains(?A ?B2) 
                 ... 
                 dnode:contains(?A ?BN)
                 dnode:DataNode(?A) 
                 dnode:DataNode(?B1)
                 ...
                 dnode:DataNode(?BN))

Which can be paraphrased as:

When used between two data nodes (A and B), the contains relationship indicates that members of the first data node (an OWL class) which appear in a source XML instance document can be related to members of the second data node (also an OWL class) by the same property in the corresponding faithful RDF rendition of the source.

The list:member logic function (or built-in) is borrowed here as shorthand notation for constructing an anonymous class consisting of the boolean disjunction operation applied over each contained data node.
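
As an illustration of what this rule produces, the sketch below renders the universal restriction for one data node as a Manchester-style class expression. The function name `contains_restriction` and its plain-string output are assumptions made for this example; the rule itself produces RDF statements, not text.

```python
def contains_restriction(contained):
    """Render 'dnode:contains only (B1 or ... or BN)' for the given data nodes.

    `contained` holds the (non-leaf) data nodes that a node A is asserted to
    contain. Returns None when A contains nothing, since the rule then has no
    contains atoms to match.
    """
    if not contained:
        return None
    # The disjunction corresponds to the owl:unionOf list built via list:member.
    union = " or ".join(sorted(contained))
    return f"dnode:contains only ({union})"


# Superclass expression implied for a node that contains ex:NodeA and ex:NodeB.
print(contains_restriction(["ex:NodeB", "ex:NodeA"]))
```

Sorting the members is only for stable output; the order of a union has no semantic significance.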

The remaining rules deal with the owl:DatatypeProperties. These are either those with XSD datatypes as their rdfs:range:

Forall ?L ?P ?DT
   And(rdfs:subClassOf(?P _:Rest) 
       owl:Restriction(_:Rest)
       owl:onProperty(_:Rest ?L)
       owl:allValuesFrom(_:Rest ?DT) 
       rdfs:domain(?L ?P)) :- And(dnode:contains(?P ?L) 
                                  dnode:LeafNode(?L)
                                  dnode:type(?L ?DT) 
                                  owl:disjointWith(?DT owl:DataRange))

or pair-wise disjoint sets of values (owl:DataRange).

Forall ?L ?P ?ENUM
   And(rdfs:subClassOf(?P _:Rest) 
       owl:Restriction(_:Rest)
       owl:onProperty(_:Rest ?L)
       owl:allValuesFrom(_:Rest ?ENUM)
       rdfs:domain(?L ?P)) :- And(dnode:contains(?P ?L) 
                                  dnode:ControlledVocabulary(?L)
                                  dnode:enumeratedClassFn(?L ?ENUM))

The dnode:enumeratedClassFn logic function (arity 1) takes the dnode:ControlledVocabulary instance and returns an anonymous class which corresponds to the value space for the controlled vocabulary. This value space is modelled after the second pattern introduced by the Semantic Web Best Practices and Deployment Working Group in Representing Specified Values in OWL: "value partitions" and "value sets" [SWBPCV].
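
The two leaf-node rules differ only in the filler of the universal restriction. The sketch below shows that difference under stated assumptions: `leaf_restriction` is a hypothetical helper, a Python list stands in for the value space returned by dnode:enumeratedClassFn, and the result is rendered as Manchester-style text rather than as RDF.

```python
def leaf_restriction(parent, leaf, dtype):
    """Derive the axioms implied when `parent` contains the leaf node `leaf`.

    `dtype` is either an XSD datatype name (first rule) or a list of permitted
    values for a controlled vocabulary (second rule).
    """
    if isinstance(dtype, (list, tuple)):
        # Controlled vocabulary: the filler is an enumerated data range.
        filler = "{" + ", ".join(f'"{value}"' for value in dtype) + "}"
    else:
        # Plain datatype property: the filler is the datatype itself.
        filler = dtype
    return {
        "subClassOf": (parent, f"{leaf} only {filler}"),  # allValuesFrom
        "domain": (leaf, parent),                         # rdfs:domain
    }


print(leaf_restriction("ex:NodeA", "ex:hasFooA", "xsd:string"))
print(leaf_restriction("ex:Root", "ex:hasFooB", ["value1", "value2"]))
```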

Finally, this rule covers how the dnode:inheritsConstraints property is interpreted (with respect to dnode:contains) as well as the rdfs:subClassOf property:

Forall ?A ?B ?C
   dnode:contains(?A ?C) :- And(Or(
                                  dnode:inheritsConstraints(?A ?B) 
                                  rdfs:subClassOf(?A ?B)
                                )
                                dnode:DataNode(?A)
                                dnode:DataNode(?B)
                                dnode:contains(?B ?C))

It should be noted that the dnode:inheritsConstraints relation behaves more like XSD type derivation than the rdfs:subClassOf relation, which mimics the class inclusion operator in Description Logic.
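
Because the inheritance rule can fire repeatedly along a chain of data nodes, its effect is a fixed point over the asserted contains statements. The following is a minimal sketch of that propagation, assuming a hypothetical representation in which both relations are plain dictionaries of sets; the node names are invented for illustration.

```python
def inherited_contains(contains, parents):
    """Propagate dnode:contains along inheritsConstraints / subClassOf links.

    contains: dict mapping a data node to the set of nodes it is asserted to contain.
    parents:  dict mapping a data node A to the set of nodes B for which
              dnode:inheritsConstraints(A B) or rdfs:subClassOf(A B) holds.
    Returns a new dict holding the asserted and derived contains statements.
    """
    closed = {node: set(children) for node, children in contains.items()}
    changed = True
    while changed:  # repeat until no rule application adds a new statement
        changed = False
        for node, supers in parents.items():
            for sup in supers:
                new = closed.get(sup, set()) - closed.get(node, set())
                if new:
                    closed.setdefault(node, set()).update(new)
                    changed = True
    return closed


# Hypothetical model: ex:SportsCar derives from ex:Car, which derives from ex:Vehicle.
asserted = {"ex:Vehicle": {"ex:hasVin"}, "ex:Car": {"ex:hasDoors"}}
links = {"ex:Car": {"ex:Vehicle"}, "ex:SportsCar": {"ex:Car"}}
print(inherited_contains(asserted, links)["ex:SportsCar"])
```

The loop terminates even if the links form a cycle, since each pass can only add members to finite sets.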

There is an analogy between how these rules are applied and the way in which Description Logic dialects (such as OWL) act as templates for well-defined portions of a much more expressive language. The RDF-based OWL dialect maps [DL2OWL] RDF statements to general forms of Description Logic expressions. Data node model RDF statements about data nodes are mapped to more specific Description Logic expressions. In particular, these expressions can be interpreted as describing infoset constraints which can be captured in XPath-based document processing dialects such as XSLT, XPath, XQuery, XProc, etc. By describing infoset constraints in tandem with model-theoretic constraints, a fully functional mapping from XML documents to RDF graphs can be provided. In this way an XML / RDF repository can guarantee a closed, fully-automated, faithful rendition of its information space with strong ontological commitment [KR].

Figure 1. Dual Representation Architecture


A Dual Representation Convention for XML->RDF

The general pattern or convention set up by these rules and their interpretation is as follows. As an example, below is a data node model describing 5 data nodes in a data node multigraph.

ex:Root a dnode:DataNode;
        dnode:contains ex:NodeA, ex:NodeB, ex:hasFooB.
ex:NodeA a dnode:DataNode;
        dnode:contains ex:hasFooA.
ex:hasFooA a dnode:LeafNode;
           dnode:type xsd:string.
ex:NodeB a dnode:DataNode.
ex:hasFooB a dnode:ControlledVocabulary;
         dnode:type [ a owl:DataRange; owl:oneOf ("value1" "value2" "value3" .. ) ].

XML instance documents which conform to a data node model will be of the form:

<ex:Root>
  <ex:NodeA>
    <ex:hasFooA>.. string ..</ex:hasFooA>
  </ex:NodeA>
  <ex:NodeB />
  <ex:hasFooB>.. string ..</ex:hasFooB>
</ex:Root>

The corresponding faithful RDF rendition (through the use of a generated XSLT transformation) will be in the form:

[] a ex:Root ;
   dnode:contains 
     [ a ex:NodeA ; ex:hasFooA ".. string .." ],
     [ a ex:NodeB ] ;
   ex:hasFooB ".. string .." .

The full data node model framework includes an additional mechanism (not covered in this overview) which allows certain combinations of dnode:contains asserted between members of specific data nodes to be transformed (in the corresponding RDF) into assertions using a specific property instead. This provides more expressive relations between members of the data nodes than the contains relationship, whose semantics is limited to infoset constraints in a directed graph and is thus very ambiguous.
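
Setting that additional mechanism aside, the basic XML-to-RDF convention can be sketched in a few lines. The framework performs this step with a generated XSLT transformation; the Python below is only a stand-in to show the shape of the mapping. The namespace URI bound to ex:, the `LEAVES` set, the `to_triples` function and the `_:bN` blank node labels are all assumptions introduced for this example.

```python
import itertools
import xml.etree.ElementTree as ET

EX = "http://example.org/ns#"    # hypothetical namespace URI for the ex: prefix
LEAVES = {"hasFooA", "hasFooB"}  # local names of the leaf data nodes in the model


def local(elem):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return elem.tag.split("}", 1)[-1]


def to_triples(xml_text):
    """Map an XML instance to (subject, predicate, object) tuples."""
    counter = itertools.count(1)
    triples = []

    def visit(elem):
        subject = f"_:b{next(counter)}"  # one blank node per non-leaf element
        triples.append((subject, "rdf:type", "ex:" + local(elem)))
        for child in elem:
            name = local(child)
            if name in LEAVES:
                # Leaf data nodes become properties whose value is the text content.
                triples.append((subject, "ex:" + name, (child.text or "").strip()))
            else:
                # Non-leaf data nodes become class members linked by dnode:contains.
                triples.append((subject, "dnode:contains", visit(child)))
        return subject

    visit(ET.fromstring(xml_text))
    return triples


DOC = f"""<ex:Root xmlns:ex="{EX}">
  <ex:NodeA><ex:hasFooA>alpha</ex:hasFooA></ex:NodeA>
  <ex:NodeB/>
  <ex:hasFooB>value1</ex:hasFooB>
</ex:Root>"""

for triple in to_triples(DOC):
    print(triple)
```

The tuples do not distinguish literals from blank node labels; a real rendition would use an RDF library's term types for that.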

Value Proposition of Dual Representation in CMS and RESTful Applications

This guided modelling of a multigraph of OWL classes and RDF properties, together with the XML-based dialects generated from it, is how Content Management Systems with native support ([KR-DB], [XCMS]) for XML and RDF syntax and for GRDDL-like mechanisms are bootstrapped with predefined GRDDL transforms, XML schemas, OWL ontologies, and even fully generated XForms user interfaces.

This last capability (auto-generated screens) is yet another example of how the formally expressed infoset constraints can be used to facilitate operations on the XML source documents. In particular, a templated user interface description (which references the data nodes in the corresponding data node model) is processed by a script which takes (as additional input) the data node model and generates a fully featured XForms 1.1 document. The XForms 1.1 document is pre-built to perform user-guided XML operations on XML instances which make use of the namespace associated with the data node model.
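
To make the idea concrete, the sketch below derives a single XForms form control from a leaf data node: a controlled vocabulary becomes an xf:select1 with one item per value, and any other leaf becomes an xf:input. This is an illustration only and not the ScreenCompiler itself, whose template format is not described here; the `control_for` function, its parameters and the labels are hypothetical.

```python
import xml.etree.ElementTree as ET

XF = "http://www.w3.org/2002/xforms"  # XForms namespace
ET.register_namespace("xf", XF)


def control_for(leaf, label, dtype):
    """Build an XForms control bound (via @ref) to the leaf data node `leaf`.

    `dtype` is an XSD datatype name, or a list of permitted values when the
    leaf is a controlled vocabulary.
    """
    enumerated = isinstance(dtype, (list, tuple))
    control = ET.Element(f"{{{XF}}}{'select1' if enumerated else 'input'}",
                         {"ref": leaf})
    ET.SubElement(control, f"{{{XF}}}label").text = label
    if enumerated:
        for value in dtype:
            # Each permitted value becomes one selectable item.
            item = ET.SubElement(control, f"{{{XF}}}item")
            ET.SubElement(item, f"{{{XF}}}label").text = value
            ET.SubElement(item, f"{{{XF}}}value").text = value
    return control


picker = control_for("ex:hasFooB", "Foo B", ["value1", "value2", "value3"])
print(ET.tostring(picker, encoding="unicode"))
```

Deriving the control type from the model rather than from the template is what lets one abstract form description stay valid as the vocabulary's permitted values change.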

These XML documents are sent back and forth over the wire (in very RESTful fashion [RWAB]) to the underlying CMS which updates the corresponding faithful rendition of each document with respect to the XSLT transform. In this way, the first-class representation (the original XML document) becomes the primary exchange format for data entry, replication, RESTful operations, etc.

The uniformity of the initial XML vocabulary avoids the impedance mismatch that arises when CRUD user interfaces are associated with RDF graphs directly. The dual representation capability also emphasizes Representational State Transfer [REST] operations where the entire state of a resource (an XML document in a CMS with a corresponding faithful RDF rendition) is sent back and forth from the CMS's HTTP server to a remote, browser-based application. This provides a uniform granularity for other features common in a CMS including ACL-based security policy, where access control is delegated at the document / graph level. In addition, this (to some extent) obviates the need to support a fine-grained update language for manipulating RDF graphs. Updates to the RDF graph can be made at the coarse-grained level associated with the HTTP PUT method.
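
A coarse-grained update of this kind amounts to replacing the whole XML document with a single PUT request. The sketch below only constructs such a request with the Python standard library and does not send it; the `build_put` helper, the document URL and the media type are assumptions, since the CMS's actual resource layout is not described here.

```python
import urllib.request


def build_put(url, xml_text):
    """Prepare an HTTP PUT carrying the full state of one XML document."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


# Hypothetical document URL; the server would regenerate the RDF rendition on receipt.
request = build_put("http://cms.example.org/documents/23211", "<ex:Root/>")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted because the URL is fictional.
```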

Bibliography

[KR-DB] Borgida, A: Knowledge Representation meets Databases — a view of the symbiosis —, 9 June 2007. Proceedings of the 2007 International Workshop on Description Logics (DL2007). http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS//Vol-250/invited_3.pdf

[MANOWL] Horridge, M. - Drummond, N. Goodwin, J. - Rector, A. - Stevens, R.- Wang, H. H: The Manchester OWL Syntax, Date. Proceedings of the OWL: Experiences and Directions Workshop Series. http://owl-workshop.man.ac.uk/acceptedLong/submission_9.pdf

[GRDDL] Connolly, D.: Gleaning Resource Descriptions from Dialects of Languages, W3C Proposed Recommendation, 16 July 2007. World Wide Web Consortium. http://www.w3.org/TR/grddl

[OWL] Dean, M. - Schreiber, G. (Editors): OWL Web Ontology Language Reference, W3C Recommendation, 10 February 2004. World Wide Web Consortium. http://www.w3.org/TR/owl-ref

[RDFS-INFO] Tobin, R.: An RDF Schema for the XML Information Set, W3C Note, 6 April, 2001. World Wide Web Consortium. http://www.w3.org/TR/xml-infoset-rdfs

[RIFBLD] Boley, H. - Kifer, M.: RIF Basic Logic Dialect. http://www.w3.org/2005/rules/wg/wiki/Core

[CURIE] Birbeck, M.: CURIE Syntax 1.0, W3C Note, 27 October 2005, World Wide Web Consortium. http://www.w3.org/2001/sw/BestPractices/HTML/2005-10-27-CURIE

[INFOSET] Cowan, J. - Tobin, R.: XML Information Set (Second Edition), W3C Recommendation, 4 February 2004, World Wide Web Consortium. http://www.w3.org/TR/xml-infoset/

[KR] Davis, R - Shrobe, H - Szolovits, P: What is a Knowledge Representation?, 1993. MIT Computer Science and Artificial Intelligence Laboratory. http://groups.csail.mit.edu/medg/ftp/psz/k-rep.html

[PROXSEM] Thompson, H.: Formalising the Proximate Semantics of XML Languages with UML, OWL and GRDDL, 17 May 2007, Proceedings of XTech 2007. http://2007.xtech.org/public/schedule/detail/46

[XCMS] Ogbuji, C.: Tools for Next Generation of CMS: XML, RDF, & GRDDL, 23 May 2007. Semantic Technology Conference 2007, San Jose, CA. http://www.semantic-conference.com/2007/sessions/r3.html

[SWBPCV] Rector, A.: Representing Specified Values in OWL: "value partitions" and "value sets", W3C Working Group Note, 17 May 2005, World Wide Web Consortium. http://www.w3.org/TR/swbp-specified-values/

[REST] Fielding, R.: Architectural Styles and the Design of Network-based Software Architectures: Representational State Transfer. 2000. University of California, Irvine. http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

[RWAB] Birbeck, M. - Boyer, J. - Gilman, A. - Kelley, K. - Pemberton, S. - Wiecha, C.: Rich Web Application Backplane. 16 November 2006. World Wide Web Consortium. http://www.w3.org/TR/backplane/

[DL2OWL] Horrocks, I. - Patel-Schneider, P. - van Harmelen, F.: From SHIQ and RDF to OWL: The Making of a Web Ontology Language, 2003. Journal of Web Semantics. http://www.cs.man.ac.uk/~horrocks/Publications/download/2003/HoPH03a.pdf