Versa by Deconstruction
=======================

*`Chimezie. June 12th. 2005`*

## Purpose ##

The purpose of this document is to serve as a companion to the Versa specification. It is written explicitly for readers with little to no experience in reading a formal language specification. It is also geared toward readers with minimal experience with the RDF data model and no experience with its associated syntaxes. It focuses mainly on the most fundamental aspects of Versa, as well as those similar to [SPARQL](http://www.w3.org/TR/rdf-sparql-query) (an alternative and more recent RDF query language) and [XPath](http://www.w3.org/TR/xpath).

## What is Versa? ##

Versa is a query language designed for the specific purpose of extracting information from an RDF graph. A Versa query isolates resources and their associated property values through the patterns and constraints specified by a Versa *expression*. A Versa query is performed by submitting a Versa expression to a Versa *query processor* associated with the RDF graph from which the user wishes to extract information.

A Versa expression consists of one of the following: a resource, a blank node, a literal, a traversal expression (or filter), a list, a set, a variable (referenced by name), or the recognized name of a function.

## RDF Data Model ##

The RDF Data Model is best described in section [3.2](http://www.w3.org/TR/rdf-concepts/#section-data-model) of the RDF Concepts document. A diagram of this model is below:

[Diagram: RDF Data Model]

## Data types ##

Versa defines the following data types:

**Resource:** A Versa resource is a string that represents the URI (Uniform Resource Identifier) of an RDF resource in the underlying RDF graph. A Versa resource is expressed in one of two forms: as a QName (defined by the XML Namespaces specification), or as a fully expressed URI written as a quoted string preceded by the @ character.
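
Both forms ultimately name a URI. As a rough illustration (a minimal sketch, not how any Versa processor is actually implemented), the Python function below resolves a resource written in either form. The function name `resolve_resource` and the `PREFIXES` table are assumptions made up for this example; a real processor would take its prefix bindings from the namespace declarations in scope.

```python
# Hypothetical prefix table for illustration only.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def resolve_resource(token, prefixes=PREFIXES):
    """Return the full URI for a resource written in either Versa form."""
    # Full form: '@' followed by a quoted URI, e.g. @'http://example.org/x'
    if token.startswith("@"):
        quoted = token[1:]
        if len(quoted) >= 2 and quoted[0] == quoted[-1] and quoted[0] in "'\"":
            return quoted[1:-1]
        raise ValueError("expected a quoted URI after '@': %r" % token)
    # QName form: prefix:localname, expanded by concatenation
    prefix, sep, local = token.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in %r" % token)
    return prefixes[prefix] + local


print(resolve_resource("rdf:type"))
print(resolve_resource("@'http://chimezie.ogbuji.net'"))
```

The first call prints the expanded form of *rdf:type*; the second simply strips the `@` and the quotes.
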
In short, a QName consists of two parts separated by the ASCII colon character (':'). The first part is the namespace prefix, a short string associated with a URI. The second part is a local name, which is also a short string (but usually longer than a prefix). A QName is essentially a shortened form of a URI: the full URI is formed by concatenating the URI associated with the prefix with the local name. For example, the *rdf* prefix is almost always associated with the URI <http://www.w3.org/1999/02/22-rdf-syntax-ns#>. So the QName *rdf:type* is shorthand for the URI <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.

It's useful to think of a QName as a name in two parts: an identifier for a larger domain of information, and a name that identifies a concept within that domain. This allows you to be specific about the concepts you use where there could otherwise be ambiguity. Consider the above example: the concept *type* could mean several things depending on the context. It could mean a size or style of printed or typewritten characters, or it could mean a classification of people or things with common characteristics. By using the URI associated with the *rdf* prefix, you can be specific about which concept of type you mean.

**Blank Node / Anonymous Resource:** A Versa blank node is the same as an RDF blank node. An RDF anonymous resource can be: *"treated as simply indicating the existence of a thing, without using, or saying anything about, the name of that thing."* So blank nodes are resources without a permanent unique identifier (a URI). Consider the statement: *"Someone was the mother of Lee Harvey Oswald."* The woman who gave birth to him is not named explicitly, but that doesn't prevent us from making a statement about her.

**Literal:** A Versa literal is a special kind of expression that represents a value: a string, a number, or a boolean value.

**String:** A Versa string is simply a sequence of zero or more characters.
Strings are expressed within Versa by enclosing them in single or double quotes. To allow the quote characters themselves to be included within a string, Versa provides a means to 'escape' quotes using the `\` character. The following example uses the `\` character to include a quote character within a string: `'This document\'s subject is Versa'`

**Number:** A Versa number is a literal that represents a numerical value.

**Boolean:** A boolean represents a literal value of 'true' or 'false'. The character '\*' is provided as the short form for 'true'.

**Set:** A Versa set follows the definition of a set in the mathematical sense (as specified by [Naive Set Theory](http://en.wikipedia.org/wiki/Naive_set_theory)): a collection of distinct elements having specific common properties. Members of a set may be literals, resources, blank nodes, (other) sets, and lists.

**List:** A Versa list is a collection of elements (not necessarily distinct), each of which may be a literal, a resource, a blank node, an(other) list, or a set.

## Contexts ##

Often, Versa expressions are evaluated with respect to a context. A Versa context is very similar to an XPath context (and is composed of similar parts). As a refresher, an XPath context consists of:

* a node (the **context node**)
* a pair of non-zero positive integers (the **context position** and the **context size**)
* a set of variable bindings
* a function library
* the set of namespace declarations in scope for the expression

The definition of a Versa context is:

*"Many Versa constructs are evaluated with regard to a context. The context is a value of any data type, and it can always be referred to explicitly in an expression using the token '.'."*

You can think of the Versa context as an XPath context without a position and size, and associated with either a named graph or the entire underlying RDF model.
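
To picture the comparison, the context can be modelled as a small record. This is only an illustrative sketch: the class `VersaContext` and its field names are assumptions chosen for this example and are not taken from the Versa specification or from any implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class VersaContext:
    """Illustrative stand-in for a Versa evaluation context."""
    current: Any = None  # the value the token "." refers to; any data type
    variables: Dict[str, Any] = field(default_factory=dict)
    functions: Dict[str, Callable] = field(default_factory=dict)
    namespaces: Dict[str, str] = field(default_factory=dict)
    scope: Optional[str] = None  # a named graph; None means the whole model

    def with_current(self, value):
        """Return a context whose "." is `value`; the other parts are shared."""
        return VersaContext(value, self.variables, self.functions,
                            self.namespaces, self.scope)


ctx = VersaContext(namespaces={"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"})
inner = ctx.with_current("http://chimezie.ogbuji.net")
```

Unlike an XPath context, the record has no position or size; what it adds is the optional `scope`.
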
This concept of querying within a named graph is well expressed in section [8.2](http://www.w3.org/TR/rdf-sparql-query/#restrictByLabel) of the SPARQL specification. This *named graph* is the scope:

* A scope: the name of a sub-graph within the entire RDF model to which queries are restricted

## Forward Filter / Traversal Expressions ##

The most fundamental kind of expression in Versa is the *forward traversal operator*. A forward traversal operator is an expression that returns a list of the objects of statements matching the pattern it represents. Traversal operators are almost direct visual analogs of [triple patterns](http://www.w3.org/TR/rdf-sparql-query/#defn_TriplePattern). A Versa forward traversal operator always takes the following form:

*subjects - predicates -> boolean*

Statements match the pattern if their subject, predicate, and object match the criteria expressed in the operator. Specifically, the first item (marked *subjects* above) is an expression that is expected to result in a list of resources. The result of evaluating this expression is converted to a list. Statements that match must have a subject equal to one of the resources in the resulting list.

Versa has an explicit set of rules that are followed when converting from one data type to another. The most common case is converting a single resource or literal to a list: the result is a list with the resource or literal as its only member (a list of length 1). The complete set of rules is expressed as a matrix in section 3.2 of the Versa specification.

The second item (marked *predicates* above) is an expression that is also expected to result in a list of resources. In this case, the expression is evaluated with respect to a list consisting of the resources from the first expression (*subjects*).
Statements that have matched the first criterion (the *subjects* expression) are examined and excluded if their predicate is not a member of this resulting list.

Finally, the third expression (marked as *boolean*) is evaluated with respect to the object of each statement that has matched the pattern so far. This expression is expected to evaluate to a boolean value. If the value is *true*, the object of the statement is returned; otherwise it is discarded.

There is an alternative form that returns the subjects of the matching statements instead. These are called *forward traversal filter operators*, and they all take the following form:

*subjects |- predicates -> boolean*

Below is an example taken from the *Versa by Example* [document](http://uche.ogbuji.net/tech/rdf/versa/versa-by-example.txt):

*all() - rdfs:label -> \**

*all* is a built-in Versa function that returns all the resources in the underlying RDF model. The prefix *rdfs* is almost always associated with the URI <http://www.w3.org/2000/01/rdf-schema#>. This URI identifies the domain of concepts that have to do with defining an RDF vocabulary. Within this domain, the property <http://www.w3.org/2000/01/rdf-schema#label> is defined as *"A human-readable name for the subject."* In this case the *rdfs:label* property, when used as the predicate of a statement, associates a human-readable name (a string) with the subject of the statement. The example forward traversal expression returns the human-readable names of every resource in the RDF model. This is an example of the most common way the '\*' character is used.

## Backward Traversal Expressions ##

A backward traversal operator essentially expresses the inverse constraint along the same predicate(s).
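
The forward operators described above, and the backward operator introduced here, can be pictured as simple filters over a list of (subject, predicate, object) triples. The sketch below is a simplification under stated assumptions: the function names are invented for this example, QName-like strings stand in for resources, the *predicates* argument is a fixed list rather than an expression evaluated against the subjects, the boolean test is a Python callable, and `all_resources` only collects subjects.

```python
def as_list(value):
    """Convert a single value to a one-member list, as Versa does."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def all_resources(triples):
    """Simplified stand-in for all(): distinct subjects in first-seen order."""
    seen = []
    for s, _, _ in triples:
        if s not in seen:
            seen.append(s)
    return seen


def forward(triples, subjects, predicates, test=lambda obj: True):
    """subjects - predicates -> boolean : return the matching objects."""
    subjects, predicates = as_list(subjects), as_list(predicates)
    return [o for s, p, o in triples
            if s in subjects and p in predicates and test(o)]


def forward_filter(triples, subjects, predicates, test=lambda obj: True):
    """subjects |- predicates -> boolean : return the subjects that match."""
    predicates = as_list(predicates)
    return [subj for subj in as_list(subjects)
            if any(s == subj and p in predicates and test(o)
                   for s, p, o in triples)]


def backward(triples, objects, predicates, test=lambda subj: True):
    """objects <- predicates - boolean : return the matching subjects."""
    objects, predicates = as_list(objects), as_list(predicates)
    return [s for s, p, o in triples
            if o in objects and p in predicates and test(s)]


# Made-up sample data for illustration.
triples = [
    ("ex:a", "rdfs:label", "Alpha"),
    ("ex:b", "rdfs:label", "Beta"),
    ("ex:a", "rdf:type", "ex:Thing"),
]

labels = forward(triples, all_resources(triples), "rdfs:label")   # ['Alpha', 'Beta']
typed = forward_filter(triples, ["ex:a", "ex:b"], "rdf:type")     # ['ex:a']
things = backward(triples, "ex:Thing", "rdf:type")                # ['ex:a']
```

The default test plays the role of '\*': it accepts every candidate.
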
Backward traversal operator expressions are of the form:

*list <- list - boolean*

Below is a more complex example, also taken from the *Versa by Example* document, that combines traversal expressions in both directions:

*(rdfs:Resource <- rdf:type - \*) - rdfs:label -> \**

The *rdfs* domain also defines a concept *rdfs:Resource* (fully identified by the URI <http://www.w3.org/2000/01/rdf-schema#Resource>). *rdfs:Resource* is the class of all resources in a graph. So the *all* function can be considered as implementing the leftmost (inner) expression:

*rdfs:Resource <- rdf:type - \**

The second part of the expression returns the human-readable label of every resource included in the result of evaluating the first expression. In other words, it returns the label of everything in the underlying RDF model.

## Variables ##

Versa allows results to be bound temporarily to named variables. A Versa context always has a set of variable bindings associated with it. Variables can be referenced by an expression of the following form: *$variableName*, where **variableName** is the name of the variable being referenced. Such an expression always evaluates to the value associated with the named variable.
## Standard Functions ##

### Conversion Functions ###

* list
* set
* boolean
* string
* number

### Set and List Functions ###

* member (returns true if the given list has a member equal to a value, if given)
* distribute
* map
* filter
* sort (convert the first argument to a list sorted in the given order)
* max
* min
* union
* intersection
* difference
* join
* head
* rest
* tail
* length
* slice

### Comparison Functions ###

* lt (less than)
* gt (greater than)
* lte / le (less than or equal to)
* eq (equal to)
* gte / ge (greater than or equal to)

### Boolean Functions ###

* and
* or
* not
* is-resource
* is-literal

### String Functions ###

* concat
* starts-with
* contains
* substring-before
* substring-after
* substring
* string-length

### Resource Functions ###

* all (return all resources or those that match the given criteria)
* [type](http://www.w3.org/TR/rdf-primer/#example12) (returns all resources of a specified type)
* traverse
* order (similar to *transitive-closure* but where the expression is: . - *property* -> \*)
* properties (return the properties associated with the given resources)

The *type* function (probably the most commonly used function) essentially implements the following expression:

*all() |- [rdf:type](http://www.w3.org/TR/rdf-primer/#example12) -> member(transitive-closure(list(*classes*), '. - [rdfs:subClassOf](http://www.w3.org/TR/rdf-schema/#ch_subclassof) -> \*'),.)*

## Additional Functions ##

The following is a list of functions added to 4Suite's Versa implementation *after* the specification was originally written:

* scoped-subquery
* scoped
* transitive-closure

### Scoped Query ###

The *scoped-subquery* function takes two arguments: *expr* and *scope*. The first argument is a string which is evaluated as a Versa expression. The second argument is converted to a list, and the first item in this list is used as the scope when evaluating the expression. A scope is the name of an RDF graph (within the entire model).
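
The idea of a scope can be made concrete with a small sketch. It is not 4Suite's implementation: the model is assumed to be a plain dictionary from graph names to lists of triples, and a Python callable stands in for the string Versa expression that the real function would compile and evaluate. All names and sample data are hypothetical.

```python
def scoped_subquery(model, query, scope):
    """Evaluate `query` against the graph named by the first item of `scope`."""
    # Convert the scope argument to a list, then use only its first item.
    scopes = list(scope) if isinstance(scope, (list, tuple)) else [scope]
    if not scopes:
        return []
    graph = model.get(scopes[0], [])
    return query(graph)


# Hypothetical model made of two named graphs.
model = {
    "urn:graph:people": [("ex:lee", "rdfs:label", "Lee")],
    "urn:graph:places": [("ex:dallas", "rdfs:label", "Dallas")],
}


def labels(graph):
    """Stand-in for a Versa expression: collect every rdfs:label value."""
    return [o for _, p, o in graph if p == "rdfs:label"]


result = scoped_subquery(model, labels, ["urn:graph:places", "urn:graph:people"])
```

Only the first scope in the list is consulted, so `result` holds the labels found in `urn:graph:places`.
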
An RDF model can be thought of as a large RDF graph composed of smaller, uniquely identified graphs. It is often useful to be able to look only within a named sub-graph. Since the context can only be associated with *one* named graph, you will often need to change the graph you want to look within at some point in the middle of a query.

### Scope ###

The scope function takes a single argument, which is converted to a list. For each item in the list, the following expression is evaluated:

*. - rdf:type -> \**

The definition of the *rdf:type* property (as stated in the RDF Primer) is:

*"When an RDF resource is described with an rdf:type property, the value of that property is considered to be a resource that represents a category or class of things, and the subject of that property is considered to be an instance of that category or class."*

The scopes of all the statements that match this expression are collected and returned as a list.

### [TransitiveClosure](http://en.wikipedia.org/wiki/Transitive_closure) ###

This function is a bit more complicated and takes two arguments: *source-set* and *expr*. The first argument is expected to be a starting list of resources. The second argument is a string representation of a Versa expression, which is applied to every item in the starting set and then recursively to the results. The result of evaluating the expression each time is converted to a list (using the standard conversion rules). This is best demonstrated with the following example, which returns all of my ancestors if evaluated against an RDF model that uses the property *ex:parentOf* to express the relationship between a descendant and its direct parent, and that contains a resource representing me (**http://chimezie.ogbuji.net**):

``transitive-closure(list(@'http://chimezie.ogbuji.net'),'. <- ex:parentOf - *')``

## Some Suggested Extension Functions ##

Below is a list of the most useful extension functions I've defined:

* value(*resources*)
* query-chain(*expr1*,*expr2*,..,*exprN*)
* sparql-eval(*expr*,*scope*)

The *value* function is a shortcut for the expression:

*. - rdf:value -> \**

The *query-chain* function treats each of its arguments as the string representation of a Versa expression, which is compiled and evaluated. The results of successive evaluations are bound to variables of the form *resultN*, where **N** is the number of the expression evaluated. Each expression is evaluated with the prior result as the context node and with all the previous results available as variables (**result1**,**result2**,..,**resultN**).

Finally, the *sparql-eval* function would take the first string as a SPARQL query and evaluate it with respect to the context node and scope, or with respect to the given scope (the second argument). The resulting variables [*selected*](http://www.w3.org/TR/rdf-sparql-query/#select) are bound within the context. It has the advantage of being able to express graph patterns in a ["*Turtle-like*"](http://www.w3.org/TR/rdf-sparql-query/#syntaxMisc) syntax, which can be more concise for some kinds of graph patterns.

## References ##

* **Namespaces in XML.** Tim Bray, Dave Hollander, and Andrew Layman. January 14th 1999. <http://www.w3.org/TR/REC-xml-names>
* **RDF Data Model.** World Wide Web Consortium. February 10th 2004. <http://www.w3.org/TR/rdf-concepts/#section-data-model>
* **Versa Specification.** Uche Ogbuji, Mike Olson. <http://copia.ogbuji.net/files/Versa.html>
* **RDF Context references.** Chimezie Ogbuji. May 16th 2005. <http://copia.ogbuji.net/blog/2005-05-16/Contexts__>
* **XPath - XML Path Language.** World Wide Web Consortium. <http://www.w3.org/TR/xpath>
* **SPARQL Query Language for RDF: Restricting by Graph Label.** World Wide Web Consortium. April 19th 2005.
<http://www.w3.org/TR/rdf-sparql-query>
* **SPARQL - Graph Patterns.** Andy Seaborne, 2004. <http://www.w3.org/2004/Talks/17Dec-sparql/QueryLang1/all.html>
* **Naive Set Theory.** <http://en.wikipedia.org/wiki/Naive_set_theory>
* **Redland Rasqal RDF Query Demonstration.** Dave Beckett. <http://librdf.org/query>
* **Notation 3 Design Issues.** Tim Berners-Lee. 27th December 2001. <http://www.w3.org/DesignIssues/Notation3.html>
* **Turtle - Terse RDF Triple Language.** Dave Beckett. 23rd December 2004. <http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle>
* **New Syntaxes for RDF.** Dave Beckett. 17th November 2003. <http://www.ilrt.bris.ac.uk/discovery/2003/11/new-syntaxes-rdf/paper.html>