Introducing SPARQL

There are a good number of online resources for learning SPARQL.

As well, we recommend the following books:

Learning SPARQL, by Bob DuCharme
Semantic Web for the Working Ontologist, by Jim Hendler and Dean Allemang

And, of course, there is the W3C SPARQL spec and their published Glossary of Linked Data terms.

We assume you can learn SPARQL syntax elsewhere. In this exercise, we will write a series of SPARQL queries over the data you've just loaded in the previous exercise.

We have provided versions of all the queries for this exercise in an associated Query Console workspace, ts-sparql.xml. You may wish to try formulating the queries yourself, before reading the solutions in the workspace. Or you can simply download and import them into Query Console and try them, if you're feeling challenged :).

Browsing the graph

In order to understand what's in our data, it's helpful to explore a bit. The REST API exposes an endpoint for browsing around your graph. Point your browser at http://localhost:9910/v1/graphs/things (replace localhost as needed) and you will see the first 10,000 nodes (listed by IRIs) in the database.

I happen to be interested in bridges, so when I did this I clicked on <http://dbpedia.org/resource/Brooklyn_Bridge> and got back all the triples that reference the Brooklyn Bridge. Go ahead and do that yourself.

From there, I clicked on the predicate for geographic points (<http://www.georss.org/georss/point>). If you do this, you will see the first 10,000 geo points we have. Scrolling down the results, you'll eventually see the subject <http://dbpedia.org/resource/Brooklyn>. We've found what looks to be a resource identifier for the city of Brooklyn. You can then click on it to see all the facts about Brooklyn. (Alternatively, if you were looking for Brooklyn to start with, you could have read about DBPedia and learned that it uses the prefix <http://dbpedia.org/resource/> for resources.)
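If you would rather build these browse URLs in a script than click through them, the following is a minimal Python sketch. The host and port come from the URL above. The helper name things_url is hypothetical, and passing the node as an iri query parameter is an assumption about the REST API that you should check against the documentation for your server version.

```python
from urllib.parse import urlencode


def things_url(host="localhost", port=9910, iri=None):
    """Build the URL for browsing the graph, or one node when iri is given."""
    base = f"http://{host}:{port}/v1/graphs/things"
    if iri is None:
        # No node selected: this is the listing page described above.
        return base
    # Assumption: the endpoint accepts the node's IRI as an 'iri' parameter.
    # urlencode percent-encodes the IRI so it is safe inside a query string.
    return base + "?" + urlencode({"iri": iri})


print(things_url())
print(things_url(iri="http://dbpedia.org/resource/Brooklyn_Bridge"))
```

Paste either printed URL into your browser, replacing localhost as needed.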

Asking Questions of DBPedia

You have an identifier for Brooklyn. So, let's see what we can find out about it.

You can see from the things endpoint that we have facts that use the predicate <http://dbpedia.org/ontology/birthPlace>. So you can ask "Who was born in Brooklyn?" You can write that in SPARQL as:

SELECT *
WHERE {
  ?s <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Brooklyn>
}

You can actually write that so it is a little more readable, with prefixes, as:

PREFIX db: <http://dbpedia.org/resource/>
PREFIX onto: <http://dbpedia.org/ontology/>

SELECT *
WHERE {
  ?s onto:birthPlace db:Brooklyn
}
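A prefixed name is only an abbreviation: the query processor replaces the prefix with its declared IRI and appends the local part. The short Python sketch below illustrates that expansion for the two prefixes used here. The expand function and the PREFIXES table are hypothetical helpers for illustration, not part of any SPARQL library, and they ignore escaping rules that a real parser handles.

```python
# The two prefix declarations from the query above.
PREFIXES = {
    "db": "http://dbpedia.org/resource/",
    "onto": "http://dbpedia.org/ontology/",
}


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'db:Brooklyn' into a full IRI in angle brackets."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    # The full IRI is simply the declared prefix IRI followed by the local part.
    return f"<{prefixes[prefix]}{local}>"


print(expand("onto:birthPlace"))
print(expand("db:Brooklyn"))
```

The two printed IRIs are the same ones that appear in the unprefixed version of the query.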

You can now see that Danny Kaye was born in Brooklyn. What else do you know about him? You can ask that as:

PREFIX db: <http://dbpedia.org/resource/>
PREFIX onto: <http://dbpedia.org/ontology/>

SELECT ?p ?o
WHERE {
  db:Danny_Kaye ?p ?o
}

You can use Query Console to execute these SPARQL queries against the tutsem-content database (make sure to choose Query Type: SPARQL Query). I'll leave the next few to you:

Find all predicates and objects with Danny Kaye as subject
1. Return the answer as triples - i.e. Danny Kaye - predicate - object (Hint: SPARQL SELECT returns "solutions"; SPARQL CONSTRUCT returns "triples")
2. Alternatively, do this via a DESCRIBE query
Who else was born in the same place as Danny Kaye?
Who was born in the same place as Danny Kaye AND died in Seattle?
Find everyone who was born in the same place as Danny Kaye OR who died in Washington DC. Return results in descending order of name.

News Data

The BBC data contains news articles and metadata stored as triples. One of the vocabularies used is rnews (<http://iptc.org/std/rNews/2011-10-07#>). You can go learn about rnews when you have time, but for now, let's take as given that it uses the following identifiers:

NewsItem - ID of the news item
headline
datePublished

If you recall, we loaded the news triples into the graph "http://www.bbc.co.uk/news/graph".

Can you find all the headlines and dates of news items in the graph "http://www.bbc.co.uk/news/graph", ordered by date? Try this:

PREFIX rnews: <http://iptc.org/std/rNews/2011-10-07#>

# Note: the predicate "a" is shorthand for "has the RDF type"
SELECT $s $headline $date
FROM <http://www.bbc.co.uk/news/graph>
WHERE {
  $s a rnews:NewsItem ;
     rnews:headline $headline ;
     rnews:datePublished $date .
}
ORDER BY DESC($date)
Now, try finding all the headlines and dates of news items in the graph "http://www.bbc.co.uk/news/graph", ordered by date, but only show the items newer than July 11 2013. (hint: use FILTER)
Next, find all the headlines and dates of news items in the graph "http://www.bbc.co.uk/news/graph", ordered by date, but only show the second "page" of results (a page is 25 items).
What if a news item doesn't have a datePublished? Modify your headlines query to include headlines of items that don't have a date. (NB: The dataset doesn't actually have items with missing dates.)
Find all the headlines and dates of news items in the graph "http://www.bbc.co.uk/news/graph", ordered by date, but only show the items where the headline contains "Elton John".
Are there any news items in the graph "http://www.bbc.co.uk/news/graph" newer than August 1st 2013?
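As a hint for the paging exercise above: SPARQL pages through results with LIMIT and OFFSET, and the offset for a page is the number of items on all earlier pages. The Python sketch below only works out that arithmetic. The page_clause helper is hypothetical, and it assumes pages are numbered from 1.

```python
def page_clause(page, page_size=25):
    """Return the LIMIT/OFFSET clause for a 1-based page number."""
    if page < 1:
        raise ValueError("pages are numbered from 1")
    # Skip every item that belongs to an earlier page.
    offset = (page - 1) * page_size
    return f"LIMIT {page_size} OFFSET {offset}"


print(page_clause(2))
```

Append the clause after the ORDER BY of your query. Without an ORDER BY, the order of solutions is not guaranteed, so pages may not be stable between runs.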

If you recall from our data loading exercise, we learned a little about how the IRIs of our news documents are expressed. Now, let's say we want to find out something about one of the news documents. Let's get all the subjects added by OpenCalais and organize them by type. We can do that by issuing a SPARQL query based on a specific IRI. For example, the query below uses <http://www.bbc.co.uk/news/world-asia-22965046>:

PREFIX oc: <http://s.opencalais.com/1/pred/>
PREFIX cat: <http://s.opencalais.com/1/type/cat/>
PREFIX e: <http://s.opencalais.com/1/type/em/e/>
PREFIX geo: <http://s.opencalais.com/1/type/er/Geo/>
PREFIX bbc: <http://www.bbc.co.uk/news/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rnews: <http://iptc.org/std/rNews/2011-10-07#>

# @title All subjects for a given News Item.
# @author Philip A. R. Fennell
CONSTRUCT {
  $subject oc:name $name ;
           rdf:type $type .
}
FROM <http://www.bbc.co.uk/news/graph>
WHERE {
  SELECT DISTINCT $subject $name $type
  WHERE {
    $DocInfo owl:sameAs <http://www.bbc.co.uk/news/world-asia-22965046> .
    $thing oc:docId $DocInfo ;
           oc:subject $subject .
    $subject oc:name $name ;
             rdf:type $type .
  }
  ORDER BY $type
}

Nifty! As you see, the CONSTRUCT statement created new triples that could have been inserted into the database.

For additional credit:

Try running the SPARQL queries via REST
Run some SPARQL queries as XQuery Search API extensions
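For the REST option, here is a minimal Python sketch that uses only the standard library. It reuses port 9910 from the browsing step. The /v1/graphs/sparql path, the application/sparql-query content type, and digest authentication are assumptions about how your REST instance is configured, so verify them against the product documentation. The function names and the credentials are placeholders.

```python
import urllib.request

# The "Who was born in Brooklyn?" query from earlier in this exercise.
QUERY = """
PREFIX db: <http://dbpedia.org/resource/>
PREFIX onto: <http://dbpedia.org/ontology/>
SELECT * WHERE { ?s onto:birthPlace db:Brooklyn }
"""


def build_sparql_request(query, host="localhost", port=9910):
    """Prepare, but do not send, a POST of a SPARQL query to the REST endpoint."""
    url = f"http://{host}:{port}/v1/graphs/sparql"  # assumed endpoint path
    return urllib.request.Request(
        url,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_sparql(query, user, password, host="localhost", port=9910):
    """Send the query with HTTP digest authentication and return the response text."""
    request = build_sparql_request(query, host, port)
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(passwords)
    )
    with opener.open(request) as response:
        return response.read().decode("utf-8")
```

With the server running, a call such as run_sparql(QUERY, "your-user", "your-password") should return the solutions as a JSON string. If your app server uses a different authentication scheme, swap in the matching handler.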

See also News_Search.xml for additional advanced queries.

References

Query Console Workspaces:
- ts-sparql.xml
- News_Search.xml over the news content and triples
Product Documentation:
- Semantics Quickstart
- SPARQL
