Versa by Deconstruction (and Analogy)

Chimezie, June 16th, 2005

Purpose

The purpose of this document is to serve as a companion to the Versa specification. It is written explicitly for readers with little to no experience in reading a formal language specification. It is also geared toward readers with minimal experience with the RDF data model and no experience with its associated syntaxes. It focuses mainly on the most fundamental aspects of Versa, as well as those similar to XPath and to SPARQL (an alternative and more recent RDF query language).

What is Versa?

Versa is a query language designed for the specific purpose of extracting information from an RDF graph. A Versa query facilitates the isolation of resources, and their associated property values through specific patterns and constraints as specified by a Versa expression. A Versa query is performed by submitting a Versa expression to a Versa query processor associated with an RDF graph from which the user wishes to extract information.

A Versa expression consists of either: a resource, a blank node, a literal, a traversal expression (or filter), a list, a set, a variable (referenced by name), or the recognized name of a function.

RDF Data Model

The RDF Data Model is best described in section 3.2 of the RDF Concepts document. A diagram of this model is below:

[Figure: RDF Data Model]

Data types

Versa defines the following data types:

Resource: A Versa resource is a string that represents the URI (Uniform Resource Identifier) of an RDF resource in the underlying RDF graph. A Versa resource is expressed in one of two forms: as a QName (defined by the XML Namespaces specification) or as a fully expressed URI written as a quoted string preceded by the @ character.

In short, a QName consists of two parts separated by the ASCII colon character (':'). The first part is the namespace prefix, which is a short string associated with a URI. The second part is a local name, which is also a short string (but usually longer than a prefix).

A QName is essentially a shortened form of a URI. The full URI is formed by concatenating the URI associated with the prefix with the local name. For example, the rdf prefix is almost always associated with the following URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#. So, the QName rdf:type is shorthand for the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
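The expansion described above can be sketched in a few lines of Python. This is only an illustration and not part of any Versa implementation: the PREFIXES table and the expand_qname helper are hypothetical names chosen for this example.

```python
# Hypothetical prefix table for this sketch; a real processor would take
# these bindings from the namespace mappings supplied with the query.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand_qname(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI by simple concatenation."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in %r" % qname)
    return prefixes[prefix] + local

print(expand_qname("rdf:type"))
```

Calling expand_qname("rdf:type") yields the full rdf:type URI given in the text.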

It's useful to think of a QName as a name in two parts: an identifier for a larger domain of information and a name that identifies a concept within that domain. It allows you to be specific about the concepts you use where there could be ambiguity.

Consider the above example: the concept type could mean several things depending on the context. It could mean a size or style of printed or typewritten characters or it could mean a classification of people or things with common characteristics. By using the URI associated with the rdf prefix you can be specific about which concept of type you meant to use.

Blank Node / Anonymous Resource: A Versa blank node is the same as an RDF blank node. An RDF anonymous resource can be:

"treated as simply indicating the existence of a thing, without using, or saying anything about, the name of that thing."

So, they are resources without a permanent unique identifier (a URI). Consider the statement:

"Someone was the mother of Lee Harvey Oswald."

The woman who birthed Lee is not named explicitly but that doesn't prevent us from making a statement about her.

Literal: A Versa literal is a special kind of expression that represents a value: a string, a number, or a boolean value.

String: A Versa string is simply a sequence of zero or more characters. Strings are expressed within Versa by enclosing them with single or double quotes. In order to allow the quote characters themselves to be included within a string, Versa provides a means to 'escape' quotes by using the '\' character. The following is an example of using the '\' character to allow the inclusion of a quote character within a string:

'This document\'s subject is Versa'

Number: A Versa number is a literal that represents a numerical value.

Boolean: A boolean represents a literal value of 'true' or 'false.' The character '*' is provided as the short-form for 'true.'

Set: A Versa set follows from the definition of a set in the mathematical sense (as specified by Naive Set Theory): a collection of distinct elements having specific common properties. The members of a set can be literals, resources, blank nodes, lists, and (other) sets.

List: A Versa list is a collection of elements (not necessarily distinct), each of which can be a literal, a resource, a blank node, a set, or another list.

Contexts

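Often, Versa expressions are evaluated with respect to a context. A Versa context is very similar to an XPath context (and is made up of similar parts). As a refresher, an XPath 1.0 context consists of a context node, a context position and context size, a set of variable bindings, a function library, and the set of namespace declarations in scope.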

The definition of a Versa Context:

"Many Versa constructs are evaluated with regard to a context. The context is a value of any data type, and it can always be referred to explicitly in an expression using the token '.'"

You can think of the Versa context as an XPath context without a position and size, associated with either a named graph or the entire underlying RDF model. This concept of querying within a named graph is well expressed in section 8.2 of the SPARQL specification. This named graph is the scope.

Forward Filter / Traversal Expressions

The most fundamental kind of expression in Versa is the forward traversal operator. A forward traversal operator is an expression that returns a list of objects from statements that match the pattern it represents. Traversal operators are almost direct, visual analogs of triple patterns. A Versa forward traversal operator always takes the following form:

subjects - predicates -> boolean

Statements match the pattern if their subject, predicate, and object match the criteria expressed in the operator. Specifically, the first item (marked subjects above) is an expression that is expected to result in a list of resources. The results from evaluating this expression are converted to a list. Statements that match must have a subject whose value is equivalent to one of the resources in the resulting list.

Versa has an explicit set of rules that are followed when converting from one data type to another. The most common example, however, is converting a single resource or literal to a list. A resource or literal is converted to a list by returning a list with the resource or literal as its only member (a list with a length of 1). The complete set of rules is expressed as a matrix in section 3.2 of the Versa specification.
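As a rough sketch of this conversion, the Python helper below wraps a single value in a one-element list. The name to_list is hypothetical. Only the single-value rule comes from the text above; the handling of sets is an assumption made for this illustration, so consult the conversion matrix in the specification for the actual rules.

```python
def to_list(value):
    """Convert a value to a list (illustrative subset of the conversion rules)."""
    if isinstance(value, list):
        return value                      # already a list: unchanged
    if isinstance(value, (set, frozenset)):
        return list(value)                # assumption: members in arbitrary order
    return [value]                        # single resource or literal: list of length 1

print(to_list("rdfs:label"))
```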

The second item (marked predicates above) is an expression that is also expected to result in a list of resources. In this case, the expression is evaluated with respect to a list consisting of the resources from the first expression (subjects). Statements that have matched the first criterion (the subjects expression) are examined and excluded if their predicate is not a member of this resulting list.

Finally, the third expression (marked as boolean) is evaluated with respect to the object of each of the statements that have matched the pattern so far. This expression is expected to evaluate to a boolean value. If the value is true, the object of the statement is returned; otherwise it is discarded.
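The three steps above can be sketched in Python over an in-memory list of (subject, predicate, object) tuples. This is a simplification under stated assumptions, not how any Versa processor is implemented: forward_traverse and the sample TRIPLES are hypothetical, the subjects and predicates are passed in as already-evaluated lists, and the boolean expression is modeled as a Python callable applied to each object.

```python
def forward_traverse(triples, subjects, predicates, object_test):
    """Return the objects of statements matching: subjects - predicates -> boolean."""
    results = []
    for s, p, o in triples:
        # Keep the object only if all three criteria hold for this statement.
        if s in subjects and p in predicates and object_test(o):
            results.append(o)
    return results

# Hypothetical sample data for the sketch.
TRIPLES = [
    ("ex:doc", "rdfs:label", "Versa by Deconstruction"),
    ("ex:doc", "ex:year", 2005),
    ("ex:other", "rdfs:label", "Another resource"),
]

# The lambda that always returns True plays the role of '*'.
labels = forward_traverse(TRIPLES, ["ex:doc"], ["rdfs:label"], lambda o: True)
print(labels)
```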

Below is a diagram that expresses this process:

There is an alternative form that returns the subjects of the matching statements instead of their objects. These are called Forward Traversal Filter Operators, and they all take the following form:

subjects |- predicates -> boolean
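The filter form can be sketched the same way in Python. As before, this is only an illustration: forward_filter and the sample data are hypothetical names, and returning each matching subject once is an assumption of this sketch rather than something stated in the text.

```python
def forward_filter(triples, subjects, predicates, object_test):
    """Return the subjects of statements matching: subjects |- predicates -> boolean."""
    matched = []
    for s, p, o in triples:
        if s in subjects and p in predicates and object_test(o):
            if s not in matched:          # assumption: report each subject once
                matched.append(s)
    return matched

# Hypothetical sample data for the sketch.
TRIPLES = [
    ("ex:a", "rdfs:label", "A"),
    ("ex:b", "ex:year", 2005),
    ("ex:c", "rdfs:label", "C"),
    ("ex:a", "rdfs:label", "Alpha"),
]

# Which of the candidate resources have any label at all?
labelled = forward_filter(TRIPLES, ["ex:a", "ex:b", "ex:c"], ["rdfs:label"], lambda o: True)
print(labelled)
```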

Below is an example taken from the Versa by Example document:

all()-rdfs:label->*

all is a built-in Versa function that returns all the resources in the underlying RDF model. The prefix rdfs is almost always associated with the URI http://www.w3.org/2000/01/rdf-schema#. This URI identifies the domain of concepts that have to do with defining an RDF vocabulary. Within this domain, the property http://www.w3.org/2000/01/rdf-schema#label is defined as "A human-readable name for the subject." In this case, the rdfs:label property, when used as the predicate of a statement, associates a human-readable name (a string) with the subject of the statement.

The example forward traversal expression will return the human-readable names of every resource in the RDF model. This is an example of the most common way the '*' character is used.

Backward Traversal Expressions

A backward traversal operator essentially expresses the inverse constraint along the same predicate(s). Backward traversal operator expressions are of the form:

list <- list - boolean
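Read as objects <- predicates - boolean, the backward form can be sketched in Python as the mirror image of forward traversal. This reflects my reading of the operator and is not taken from an implementation: backward_traverse and the sample data are hypothetical, and the boolean is modeled as a callable applied to each candidate subject.

```python
def backward_traverse(triples, objects, predicates, subject_test):
    """Return the subjects of statements matching: objects <- predicates - boolean."""
    results = []
    for s, p, o in triples:
        # Start from the object side and walk the predicate "backwards".
        if o in objects and p in predicates and subject_test(s):
            results.append(s)
    return results

# Hypothetical sample data for the sketch.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:doc", "rdf:type", "ex:Document"),
]

# Roughly: ex:Person <- rdf:type - *
people = backward_traverse(TRIPLES, ["ex:Person"], ["rdf:type"], lambda s: True)
print(people)
```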

Below is a more complex example also taken from the Versa by Example document that demonstrates combining traversal expressions in both directions:

(rdfs:Resource <- rdf:type - *) - rdfs:label -> *

The rdfs domain also defines the concept rdfs:Resource (fully identified by the URI http://www.w3.org/2000/01/rdf-schema#Resource). rdfs:Resource is the class of all resources in a graph, so the all function can be considered as implementing the leftmost (inner) expression:

rdfs:Resource <- rdf:type - *

The second part of the expression will return the human-readable label of every resource included in the result of evaluating the first expression. In other words, it will return the label of everything in the underlying RDF model.

Variables

Versa allows the use of variables to temporarily bind results to a named variable. A Versa context always has associated with it a set of variable bindings. Variables can be referenced by an expression with the following form: $variableName, where variableName is the name of the variable being referenced. Such an expression always evaluates to the value associated with the named variable.

Standard Functions

Conversion Functions

Set and List Functions

Comparison Functions

Boolean Functions

String Functions

Distribute Function

The distribute function is the most common way data is extracted from targeted resources in the underlying graph. It is also the most difficult to use the first time around because of the format of its result. Below is a diagram of how the distribute function works:

Mathematically, the distribute function is best thought of as the Cartesian product of the source list and the list of expressions that follows it, where each (item, expression) pair is replaced by the result of evaluating the expression with the item as the context. The results are grouped by item, in the order of the items in the source list.
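The shape of the result is easier to see in a small Python sketch. It assumes that each Versa expression can be modeled as a Python callable taking the context item. The names distribute (as a Python function), objects_of, and the sample data are hypothetical.

```python
def distribute(items, *expressions):
    """For each item, evaluate every expression with that item as the context.

    Returns one inner list per item, in the order of the source list.
    """
    return [[expr(item) for expr in expressions] for item in items]

# Hypothetical sample data for the sketch.
TRIPLES = [
    ("ex:a", "rdfs:label", "A"),
    ("ex:a", "ex:year", 2004),
    ("ex:b", "rdfs:label", "B"),
]

def objects_of(subject, predicate):
    """Stand-in for a traversal such as '. - predicate -> *'."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

rows = distribute(
    ["ex:a", "ex:b"],
    lambda ctx: objects_of(ctx, "rdfs:label"),
    lambda ctx: objects_of(ctx, "ex:year"),
)
print(rows)
```

Each row corresponds to one source item and holds one result per expression, so a resource with no ex:year statement still gets a row with an empty list in that position.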

Resource Functions

The type function (probably the most common function used) essentially implements the following expression:

all() |- rdf:type -> member(transitive-closure(list(classes), '. - rdfs:subClassOf -> *'),.)

Additional Functions

The following is a list of functions added to 4Suite's Versa implementation after the specification was originally written:

Scoped Query

The scoped-subquery function takes two arguments: expr and scope. The first argument is a string which is evaluated as a Versa expression. The second argument is converted to a list and the first item in this list is used as the scope to use when evaluating the expression. A scope is the name of an RDF graph (within the entire model). An RDF model can be thought of as a large RDF graph comprised of smaller, uniquely identified graphs. It is often useful to be able to look only within a named sub-graph. Since the context can only be associated with one named graph, you will often need to change the graph you want to look within at some point in the middle of a query.
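A minimal Python sketch of the idea follows, assuming the model is stored as (subject, predicate, object, scope) tuples and that the expression is modeled as a callable over plain triples rather than as a Versa string. The function scoped_subquery here and the sample QUADS are hypothetical and only mirror the behavior described above.

```python
def scoped_subquery(quads, expr, scope):
    """Evaluate expr against only the statements in the first scope given."""
    scopes = scope if isinstance(scope, list) else [scope]   # convert to a list
    graph_name = scopes[0]                                   # first item is the scope
    triples = [(s, p, o) for s, p, o, g in quads if g == graph_name]
    return expr(triples)

# Hypothetical sample data: the same resource described in two named graphs.
QUADS = [
    ("ex:a", "rdfs:label", "A in graph one", "ex:g1"),
    ("ex:a", "rdfs:label", "A in graph two", "ex:g2"),
]

labels = scoped_subquery(
    QUADS,
    lambda triples: [o for s, p, o in triples if p == "rdfs:label"],
    ["ex:g2", "ex:g1"],
)
print(labels)
```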

Scope

The scope function takes a single argument which is converted to a list. For each item in the list, the following expression is evaluated:

. - rdf:type -> *

The definition of the rdf:type property (as stated within the RDF Primer) is:

When an RDF resource is described with an rdf:type property, the value of that property is considered to be a resource that represents a category or class of things, and the subject of that property is considered to be an instance of that category or class.

The scopes of all the statements that match this expression are collected and returned as a list.
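In Python terms, and again assuming statements are stored as (subject, predicate, object, scope) tuples, the behavior described above might look like the sketch below. The Python function scope and the sample data are hypothetical.

```python
def scope(quads, resources):
    """Collect the scopes of every rdf:type statement about each resource."""
    scopes = []
    for resource in resources:
        for s, p, o, graph in quads:
            # Equivalent in spirit to evaluating '. - rdf:type -> *' per item.
            if s == resource and p == "rdf:type":
                scopes.append(graph)
    return scopes

# Hypothetical sample data for the sketch.
QUADS = [
    ("ex:a", "rdf:type", "ex:Person", "ex:g1"),
    ("ex:a", "rdfs:label", "A", "ex:g1"),
    ("ex:b", "rdf:type", "ex:Person", "ex:g2"),
]

print(scope(QUADS, ["ex:a", "ex:b"]))
```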

TransitiveClosure

This function is a bit more complicated and takes two arguments: source-set and expr. The first argument is expected to be a starting list of resources. The second argument is a string representation of a Versa expression to apply to all items in the starting set and, recursively, to the results. The result of evaluating the expression each time is converted to a list (using the standard conversion rules). This is best demonstrated with the following example, which will return all of my ancestors if evaluated against an RDF model that uses the property ex:parentOf to express a relationship between a descendant and its direct parent, and that contains a resource which represents me (http://chimezie.ogbuji.net):

transitive-closure(list(@'http://chimezie.ogbuji.net'),'. <- ex:parentOf - *')
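The recursion can be sketched in Python as a worklist loop. This is an illustration under assumptions, not the 4Suite implementation: the expression is modeled as a callable named step, the resources and triples are made up, and the sketch returns only the resources reached (excluding the starting items), which matches the "ancestors" reading of the example but may differ from the real function.

```python
def transitive_closure(start, step):
    """Apply step to each starting item and, recursively, to every new result."""
    seen = []
    frontier = list(start)
    while frontier:
        item = frontier.pop(0)
        for result in step(item):
            if result not in seen:        # guards against cycles in the data
                seen.append(result)
                frontier.append(result)
    return seen

# Hypothetical sample data for the sketch.
TRIPLES = [
    ("ex:mom", "ex:parentOf", "ex:me"),
    ("ex:grandma", "ex:parentOf", "ex:mom"),
    ("ex:me", "ex:parentOf", "ex:kid"),
]

def parents_of(ctx):
    """Stand-in for the expression '. <- ex:parentOf - *'."""
    return [s for s, p, o in TRIPLES if p == "ex:parentOf" and o == ctx]

ancestors = transitive_closure(["ex:me"], parents_of)
print(ancestors)
```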

Some Suggested Extension Functions

Below is a list of the most useful extension functions I've defined:

The value function is a shortcut for the expression:

. - rdf:value -> *

The query-chain function treats each of its arguments as the string representation of a Versa expression, which is compiled and evaluated. The results of successive evaluations are bound to variables of the form resultN, where N is the number of the expression evaluated. Each expression is evaluated with the prior result as the context node and with all the previous results available as variables (result1, result2, ..., resultN).
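The chaining behavior can be sketched in Python as follows. Each expression is modeled as a callable that receives the current context and a dictionary of variable bindings. The function query_chain here is a hypothetical stand-in, and returning the final result together with the bindings is an assumption of this sketch.

```python
def query_chain(context, *expressions):
    """Evaluate expressions in order, feeding each result to the next as context."""
    variables = {}
    result = context
    for n, expr in enumerate(expressions, start=1):
        result = expr(result, variables)       # prior result is the context
        variables["result%d" % n] = result     # bind result1, result2, ...
    return result, variables

final, bindings = query_chain(
    [1, 2, 3],
    lambda ctx, v: [x * 2 for x in ctx],       # becomes result1
    lambda ctx, v: ctx + v["result1"],         # can refer back to result1
)
print(final)
print(sorted(bindings))
```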

Finally, the sparql-eval function would take the first string as a SPARQL query and evaluate it with respect to the context node and scope or with respect to the given scope (the second argument). The resulting variables selected are bound within the context. It has the advantage of being able to express graph patterns in a "Turtle-like" syntax which can be more concise for some kinds of graph patterns.

References