Semantics

MarkLogic is the world’s only Enterprise Triple Store, managing documents, data, and triples together so you can discover, understand, and make decisions in context. MarkLogic 8 extends its support for standard SPARQL so you can run analytics (aggregates) over triples, explore semantic graphs using property paths, and update semantic triples, all using the standard SPARQL 1.1 language over standard protocols. In addition, MarkLogic 8 lets you discover new facts and relationships with Automatic Inference.

  1. SPARQL 1.1 Query – Property Paths
  2. SPARQL 1.1 Query – Aggregates
  3. SPARQL 1.1 Update – Graph Update
  4. SPARQL 1.1 Update – Graph Management
  5. Automatic Inference

SPARQL 1.1 Query – Property Paths

SPARQL 1.1's property paths let you traverse an RDF graph – that is, you can follow a route across a graph.

You can answer questions like:

  • "show me all the people who are connected to Y" by finding people that know Y, and people that know people that know Y, and so on. This is sometimes known as a transitive closure query.
  • "show me all the papers that were influenced by X" by finding all papers that cite X, and papers that cite papers that cite X, and so on
  • in a triple store that has "parent" relationships ( :John :hasParent :Joe , :Joe :hasParent :Martha), "show me all the ancestors of :John"
  • "show me the contents of containers that travelled on a ship that is owned by a member of the board of a company whose headquarters is in China"

See SPARQL 1.1 Query Language – Property Paths (http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#propertypaths).

In MarkLogic 7 the following SPARQL 1.1 paths are supported:

Syntax Form   Property Path Expression Name   Matches
iri           PredicatePath                   An IRI. A path of length one.
elt1 | elt2   AlternativePath                 An alternative path of elt1 or elt2 (all possibilities are tried).
elt1 / elt2   SequencePath                    A sequence path of elt1 followed by elt2.
^elt          InversePath                     Inverse path (object to subject).
(elt)                                         A group path elt; brackets control precedence.
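
One way to read the table above is to treat each path expression as a set of (subject, object) pairs. The following is a minimal Python sketch of that reading, not MarkLogic code: the sample triples, the :hasGuardian predicate, and the helper names are assumptions made for illustration.

```python
# Toy model: a property path denotes a set of (subject, object) pairs.
# The triples and the :hasGuardian predicate are hypothetical sample data.
TRIPLES = {
    (":John", ":hasParent", ":Joe"),
    (":Joe", ":hasParent", ":Martha"),
    (":John", ":hasGuardian", ":Ann"),
}

def predicate(iri):
    """PredicatePath: pairs joined by a single triple with this predicate."""
    return {(s, o) for (s, p, o) in TRIPLES if p == iri}

def alternative(a, b):
    """AlternativePath (a | b): pairs matched by either path."""
    return a | b

def sequence(a, b):
    """SequencePath (a / b): follow a, then follow b from where a ended."""
    return {(s, o) for (s, mid) in a for (start, o) in b if mid == start}

def inverse(a):
    """InversePath (^a): the same pairs, read from object to subject."""
    return {(o, s) for (s, o) in a}

parent = predicate(":hasParent")
grandparents = sequence(parent, parent)                   # :hasParent / :hasParent
children = inverse(parent)                                # ^:hasParent
carers = alternative(parent, predicate(":hasGuardian"))   # :hasParent | :hasGuardian

print(sorted(grandparents))
print(sorted(children))
print(sorted(carers))
```

Grouping with brackets corresponds to ordinary function nesting here, for example sequence(alternative(a, b), c) for (a | b) / c.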

In MarkLogic 8.0, all SPARQL 1.1 paths are supported except negation (!), including the following unenumerated paths:

elt?          ZeroOrOnePath                   A path that connects the subject and object of the path by zero or one matches of elt.
elt+          OneOrMorePath                   A path that connects the subject and object of the path by one or more matches of elt.
elt*          ZeroOrMorePath                  A path that connects the subject and object of the path by zero or more matches of elt.
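
To make the difference between these three operators concrete, here is a small Python sketch that evaluates them over the :hasParent triples from the "ancestors of :John" example. It models the semantics only; the function names are hypothetical, and the SPARQL query in the comment is shown for comparison rather than executed.

```python
# Sample data from the ancestors example earlier in this section.
TRIPLES = {
    (":John", ":hasParent", ":Joe"),
    (":Joe", ":hasParent", ":Martha"),
}

def step(node, predicate):
    """Nodes reachable from `node` by exactly one match of `predicate`."""
    return {o for (s, p, o) in TRIPLES if s == node and p == predicate}

def one_or_more(start, predicate):
    """elt+ : nodes reachable by one or more matches (transitive closure)."""
    reached, frontier = set(), step(start, predicate)
    while frontier:
        reached |= frontier
        # Only expand nodes not seen before, so cycles terminate.
        frontier = {n for node in frontier for n in step(node, predicate)} - reached
    return reached

def zero_or_more(start, predicate):
    """elt* : like elt+, but the start node itself always matches."""
    return {start} | one_or_more(start, predicate)

def zero_or_one(start, predicate):
    """elt? : the start node itself, plus anything exactly one match away."""
    return {start} | step(start, predicate)

# Comparable SPARQL: SELECT ?ancestor WHERE { :John :hasParent+ ?ancestor }
print(sorted(one_or_more(":John", ":hasParent")))
print(sorted(zero_or_more(":John", ":hasParent")))
print(sorted(zero_or_one(":John", ":hasParent")))
```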

Try out the following:

  • SPARQL queries with property paths over a large number of triples
  • SPARQL queries with property paths over a large number of triples in a cluster – at least 3 D-nodes
  • SPARQL queries with property paths as part of a complex SPARQL query
  • SPARQL queries with property paths plus a cts:query parameter to restrict results to only some documents (a combination query) – e.g. restrict to a collection or directory

Examples

For examples, see SPARQL paths examples

SPARQL 1.1 Query – Aggregates

With aggregate SPARQL functions you can do simple analytic queries over triples.
MarkLogic 8 supports all the SPARQL 1.1 aggregate functions (COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, and SAMPLE) as well as the grouping operations GROUP BY and GROUP BY .. HAVING.
See the W3C recommendation at http://www.w3.org/TR/sparql11-query/#aggregates.

Aggregate functionality includes:

  • GROUP BY
  • COUNT
  • SUM
  • MIN
  • MAX
  • SAMPLE
  • AVG
  • GROUP_CONCAT
  • GROUP BY .. HAVING <some aggregate variable>
  • ORDER BY <some aggregate variable>
  • GROUP BY <more than one item>
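
As a rough illustration of how GROUP BY, COUNT, HAVING, and ORDER BY fit together, the sketch below computes in plain Python what a comparable SPARQL query would return. The :cites triples, the example.org prefix, and the function name are assumptions made for this example; the query text is included for comparison only and is not executed.

```python
from collections import defaultdict

# Hypothetical sample data: which paper cites which.
TRIPLES = [
    (":p1", ":cites", ":x"),
    (":p2", ":cites", ":x"),
    (":p3", ":cites", ":x"),
    (":p1", ":cites", ":y"),
    (":p2", ":cites", ":y"),
    (":p3", ":cites", ":z"),
]

# A comparable SPARQL 1.1 query (not executed here).
QUERY_TEXT = """
PREFIX : <http://example.org/>
SELECT ?cited (COUNT(?paper) AS ?n)
WHERE { ?paper :cites ?cited }
GROUP BY ?cited
HAVING (COUNT(?paper) > 1)
ORDER BY DESC(?n)
"""

def citation_counts(triples, more_than=1):
    """Group by cited paper, count citers, filter on the count, sort descending."""
    groups = defaultdict(list)
    for s, p, o in triples:
        if p == ":cites":
            groups[o].append(s)                      # GROUP BY ?cited
    rows = [(cited, len(papers))                     # COUNT(?paper) AS ?n
            for cited, papers in groups.items()
            if len(papers) > more_than]              # HAVING (COUNT(?paper) > 1)
    return sorted(rows, key=lambda row: row[1], reverse=True)  # ORDER BY DESC(?n)

print(citation_counts(TRIPLES))
```

Note that HAVING filters on the aggregated value after grouping, which is why :z (cited once) drops out of the result.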

Try the following:

  • SPARQL queries with aggregates over a large number of triples
  • SPARQL queries with aggregates over a large number of triples in a cluster – at least 3 D-nodes
  • SPARQL queries with aggregates as part of a complex SPARQL query
  • SPARQL queries with aggregates plus a cts:query parameter to restrict results to only some documents (a combination query) – e.g. restrict to a collection or directory

Examples

Formatted examples at: SPARQL aggregates examples

SPARQL 1.1 Update – Graph Update

Delete, insert, and update (delete/insert) triples using the SPARQL 1.1 Update language. The following commands are supported in MarkLogic 8:

  • INSERT DATA
  • DELETE DATA
  • DELETE .. INSERT WHERE
  • DELETE WHERE
  • INSERT WHERE
  • CLEAR

Note that there is no UPDATE command in SPARQL 1.1 Update! To change a triple or set of triples, use the DELETE .. INSERT WHERE command. This will delete and insert triples in the same transaction, but the things you delete aren’t necessarily the same as the things you insert – if you want that kind of update functionality, you need to write the DELETE .. INSERT WHERE appropriately.

Per the SPARQL 1.1 Update spec there are two “shapes” of command here: INSERT DATA and DELETE DATA insert and delete specific triples that you spell out in full, while DELETE .. INSERT WHERE lets you specify a pattern to match against. To delete triples according to a pattern, use DELETE .. INSERT WHERE without the optional INSERT clause. Similarly, to insert triples according to a pattern, use DELETE .. INSERT WHERE without the optional DELETE clause. See http://www.w3.org/TR/sparql11-update/#updateLanguage for details.
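
The pattern-based shape can be sketched in a few lines of Python. This is a simplified model, not MarkLogic's implementation: it assumes a WHERE clause with a single triple pattern, and the helper names and the :isAcquaintedWith predicate are made up for illustration. It does reflect one point from the SPARQL 1.1 Update spec: all matches are found first, then deletions are applied before insertions.

```python
def match(pattern, triple):
    """Return variable bindings if `triple` matches `pattern`, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must bind to the same value everywhere it appears.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return bindings

def instantiate(template, bindings):
    """Fill a triple template with the values bound by the WHERE clause."""
    return tuple(bindings.get(term, term) for term in template)

def delete_insert_where(graph, where, delete=(), insert=()):
    """Model of DELETE .. INSERT WHERE for a single-pattern WHERE clause."""
    solutions = [b for b in (match(where, t) for t in graph) if b is not None]
    to_delete = {instantiate(d, b) for b in solutions for d in delete}
    to_insert = {instantiate(i, b) for b in solutions for i in insert}
    return (graph - to_delete) | to_insert   # deletions first, then insertions

graph = {
    (":John", ":knows", ":Joe"),
    (":Joe", ":knows", ":Martha"),
    (":John", ":age", "42"),
}

# "Update": replace every :knows triple with an :isAcquaintedWith triple.
renamed = delete_insert_where(
    graph,
    where=("?s", ":knows", "?o"),
    delete=[("?s", ":knows", "?o")],
    insert=[("?s", ":isAcquaintedWith", "?o")],
)

# Delete by pattern only: leave out the INSERT part.
pruned = delete_insert_where(
    graph, where=("?s", ":knows", "?o"), delete=[("?s", ":knows", "?o")]
)

print(sorted(renamed))
print(sorted(pruned))
```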

The only SPARQL 1.1 Update – Graph Update command not supported in this release is LOAD. LOAD doesn’t add anything to the Graph Store HTTP Protocol commands that are supported in MarkLogic 7 – see “Addressing the Graph Store” at http://docs.marklogic.com/guide/semantics/loading#id_39864.

Examples

For examples, see SPARQL update examples

SPARQL 1.1 Update – Graph Management

Manipulate RDF graphs using the SPARQL 1.1 Update language. The following commands are supported:

  • CREATE – create a graph
  • DROP – drop a graph and its contents
  • COPY – make the destination graph into a copy of the source graph; any content in the destination graph before this operation will be removed (think copy/paste)
  • MOVE – move the contents of the source graph into the destination graph, and remove them from the source graph; any content in the destination graph before this operation will be removed (think cut/paste)
  • ADD – add the contents of the source graph into the destination graph; keep the source graph intact; keep the initial contents of the destination graph intact
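
The difference between COPY, MOVE, and ADD is easy to mix up, so here is a small Python sketch that models a graph store as a dictionary from graph name to a set of triples. The graph names and function names are hypothetical, and the model ignores permissions and transactions; it only illustrates what each command leaves behind in the source and destination graphs.

```python
def add(store, src, dst):
    """ADD: merge the source into the destination; both keep their content."""
    store[dst] = store.get(dst, set()) | store[src]

def copy(store, src, dst):
    """COPY: the destination becomes a copy of the source; its old content is lost."""
    store[dst] = set(store[src])

def move(store, src, dst):
    """MOVE: like COPY, but the source graph is removed afterwards."""
    store[dst] = store.pop(src)

def fresh_store():
    # Hypothetical starting point used for each command below.
    return {
        ":draft": {(":a", ":p", ":b")},
        ":live": {(":c", ":p", ":d")},
    }

after_add = fresh_store()
add(after_add, ":draft", ":live")

after_copy = fresh_store()
copy(after_copy, ":draft", ":live")

after_move = fresh_store()
move(after_move, ":draft", ":live")

print(after_add)
print(after_copy)
print(after_move)
```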

Graph-level security is enforced for SPARQL and XQuery/JavaScript operations. You can specify or change permissions on a graph.

Examples

For examples, see SPARQL update graph examples

Automatic Inference

Infer new triples automatically by specifying one or more rulesets. Rules are applied at query-time (for backward-chaining inference). Rulesets for RDFS, RDFS-Plus, OWL-Horst, and their subsets are supplied, and you can create your own.

To see the available rulesets, go to your MarkLogic install directory, then go to the Config directory under that. You’ll see a set of files with a .rules extension. Each file here is a ruleset. If you open one in a text editor you’ll see the rulesets are componentized – that is, they are defined in small component rulesets, then built up into larger rulesets using import. Inferencing is quite expensive – this “building block” approach means you can enable only the rules you really need for each query.
This is an important consideration – for best performance, you should only apply the rules that you need to apply.
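
To illustrate what applying a rule at query time means, the sketch below answers "which resources have type X?" using a single RDFS-style rule: if ?x has type ?c1 and ?c1 is a subclass of ?c2, then ?x also has type ?c2. This is a toy Python model under stated assumptions, not MarkLogic's inference engine; the sample triples and function names are invented for the example. The inferred triples are never written to the store; they are derived while the query runs.

```python
# Hypothetical sample data.
TRIPLES = {
    (":Rex", "rdf:type", ":Dog"),
    (":Dog", "rdfs:subClassOf", ":Mammal"),
    (":Mammal", "rdfs:subClassOf", ":Animal"),
    (":Tweety", "rdf:type", ":Bird"),
    (":Bird", "rdfs:subClassOf", ":Animal"),
}

def subclasses_of(cls, seen=None):
    """`cls` plus every class that reaches it via rdfs:subClassOf."""
    seen = {cls} if seen is None else seen
    for s, p, o in TRIPLES:
        if p == "rdfs:subClassOf" and o == cls and s not in seen:
            seen.add(s)
            subclasses_of(s, seen)   # work backwards from the query goal
    return seen

def instances_of(cls, infer=True):
    """Answer '?x rdf:type cls', optionally applying the subclass rule."""
    classes = subclasses_of(cls) if infer else {cls}
    return {s for s, p, o in TRIPLES if p == "rdf:type" and o in classes}

print(sorted(instances_of(":Animal")))          # with the rule applied
print(sorted(instances_of(":Animal", False)))   # asserted triples only
```

Each additional rule adds more of this work to every query, which is one way to see why enabling only the rules a query needs matters for performance.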

You can set a default ruleset for a database, and supplement or override that default on a per-query basis.

In MarkLogic 8, there are two “flavors” for each of the top-level rulesets.

  • xxx-full.rules – the full ruleset according to the appropriate specification.
    owl-horst-full.rules, rdfs-plus-full.rules, rdfs-full.rules.
  • xxx.rules – a partially optimized version of the ruleset, more highly optimized than xxx-full.rules.
    owl-horst.rules, rdfs-plus.rules, rdfs.rules.

New built-ins for inference

In the worked examples we’ll use a couple of new built-ins – sem:store() and sem:ruleset-store() – to define the universe of triples over which we want to query. The store definition may include a ruleset, and it may contain other ways of restricting a query’s domain such as a cts:query.

The signatures for the new functions are reproduced below, from the MarkLogic 8 docs.
The worked examples show how to use these built-ins to specify a ruleset for inference on a per-query basis.

sem:store

sem:store(
   [$options as xs:string*],
   [$query as cts:query?]
) as sem:store

Summary

Returns a sem:store value that queries from the current database’s triple index, restricted by the cts:query argument, when passed to sem:sparql() or sem:sparql-update() as the $store argument.

Parameters
$options
    Options as a sequence of string values. Available options are:
      • “any” – Values from any fragment should be included.
      • “document” – Values from document fragments should be included.
      • “properties” – Values from properties fragments should be included.
      • “locks” – Values from locks fragments should be included.
      • “checked” – Word positions should be checked when resolving the query.
      • “unchecked” – Word positions should not be checked when resolving the query.
      • “size=number of MB” – The maximum size of the memory used to cache inferred triples. This defaults to the default inference size set for the app-server. If the value provided is bigger than the maximum inference size set for the app-server, an error is raised [XDMP-INFSIZE].
      • “no-default-rulesets” – Don't apply the database's default rulesets to the sem:store.

$query
    Only include triples in fragments selected by the cts:query. The triples do not need to match the query, but they must occur in fragments selected by the query. The fragments are not filtered to ensure they match the query, but instead selected in the same manner as “unfiltered” cts:search operations. If a string is entered, the string is treated as a cts:word-query of the specified string.

sem:ruleset-store

sem:ruleset-store(
   $locations as xs:string*,
   [$store as sem:store*],
   [$options as xs:string*]
) as sem:store

Summary

Returns a sem:store value that answers queries from the set of triples derived by applying the ruleset to the triples in the sem:store values provided in $store.

Parameters
$locations
    The locations of the rulesets.

$store
    The base store(s) to apply the ruleset over to get inferred triples.

$options
    Options as a sequence of string values. Available options are:
      • “size=number of MB” – The maximum size of the memory used to cache inferred triples. This defaults to the default inference size set for the app-server. If the value provided is bigger than the maximum inference size set for the app-server, an error is raised [XDMP-INFSIZE].

Required Privileges

http://marklogic.com/xdmp/privileges/sem-sparql

New signature for sem:sparql() and sem:sparql-update()

Note that the signatures for sem:sparql() and sem:sparql-update() have been changed for this release.
For example, the cts:query is no longer passed as its own parameter – it’s defined as part of sem:store.
The new signature for sem:sparql() is:

sem:sparql(
   $sparql as xs:string,
   [$bindings as map:map?],
   [$options as xs:string*],
   [$store as sem:store*]
) as item()*

The old (MarkLogic 7.x) signature for sem:sparql() has been retained for backwards-compatibility, but has been deprecated.

For Inference examples: see SPARQL inference examples
