Home > Federated query support
Federated query support (SERVICE keyword)
The problem
SPARQL and semantic web in general have the capacity to work in distributed contexts and to resolve queries against more than one SPARQL endpoint. Yet this is complicated to comprehend for a user, and even more complicated for a user to specify which part of its query should be routed to which SPARQL service.
The solution
Starting with v8, Sparnatural provides basic support for federated querying using the SERVICE
keyword. The idea is that federated querying will be activated for certain properties in your configuration, that you need to configure in advance. Once configured, the use of federated querying will be transparent for the user, who will not have to set anything when writing its query.
In order to configure federated querying, you need to :
- Declare the endpoint to which you want to send the federated queries, by creating an instance of
sd:Service
- Indicate the SPARQL endpoint URL of this service using the
sd:endpoint
property - Link specific properties in your configuration to this endpoint using the
core:sparqlService
annotation.
In the queries involving these specific properties, all the “branch” of the query using this property will automatically be surrounded with a SERVICE
keyword.
We will illustrate this configuration with federated queries on Wikidata.
Configuration in SHACL
- Add a
http://data.sparna.fr/ontologies/sparnatural-config-core#sparqlService
annotation on a property shape that is to be executed as a SERVICE - This annotation will point to an entity that itself is the subject of an
http://www.w3.org/ns/sparql-service-description#endpoint
predicate holding the actual SPARQL endpoint URL - Optionnaly you can add
http://data.sparna.fr/ontologies/sparnatural-config-core#executedAfter
on the property shape to force the rest of the query to be put in subquery that will be evalued before the SERVICE clause is sent
PREFIX core: <http://data.sparna.fr/ontologies/sparnatural-config-core#>
PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>
ex:myNodeShape a sh:NodeShape ;
sh:property ex:myProperty .
ex:myProperty sh:path ex:myProperty ;
# use custom annotation to link the property to the SPARQL service
# the SPARQL service indicates its endpoint with the sd:endpoint property
core:sparqlService [sd:endpoint <https://vocab.getty.edu/sparql> ] ;
# rdfs:label ...
# the remote part of the query will be executed after the rest
core:executedAfter true ;
.
Configuration in OWL
The short, simple, Turtle way
If you edit the configuration in Turtle, a configuration could like this:
PREFIX core: <http://data.sparna.fr/ontologies/sparnatural-config-core#>
PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>
ex:myProperty owl:subPropertyOf core:NonSelectableProperty ;
# use custom annotation to link the property to the SPARQL service
# the SPARQL service indicates its endpoint with the sd:endpoint property
core:sparqlService [sd:endpoint <https://vocab.getty.edu/sparql> ] ;
# rdfs:label ...
.
Configuring in Protégé
Declare an instance of sd:Service
sd:Service
is a class declared in the SPARQL Service Description vocabulary. It is included for you in the Sparnatural configuration ontology, so you don’t have to add an import.
In Protégé, create an instance of that class, and give it the URI you want, for example https://www.wikidata.org
:
Declare the endpoint URL
While the Service URI can be anything you like, you need to formally declare the technical URL at which the service listens using the sd:endpoint
property on this instance.
in Protégé you are force to first declare an instance of owl:Thing with the URL to be able to further select it (if you edit your configuration manually you don’t need to do that).
Create an instance of owl:Thing and give it the precise URL of the SPARQL endpoint, in our case https://query.wikidata.org/
:
Then come back to your Service individual, edit it and add an object property assertion with predicate sd:endpoint
and value https://query.wikidata.org/
. In Protégé you are forced to switch to first switch to “View > Render by prefixed name” in order to be able to select your endpoint URL:
Link a property to the service
Now that our service is ready, we can indicate that a property in our configuration should be routed to that Service. We do that with the core:sparqlService
property. For example we could imagine that a property to fetch the geographic coordinates uses federation to fetch the geometry from Wikidata:
Note how:
- we select the service www.wikidata.org, not the URL https://query.wikidata.org
- Our property is mapped to
<http://www.wikidata.org/prop/direct/P625>
which is the property identifier in Wikidata holding geo coordinates
and… TADAM !
Now when a query is issued involving this property, it is automatically wrapped in a SERVICE clause using the endpoint URL:
Additionnal (experimental) config : executedAfter
Some - if not most - of the federated query use-cases involve first querying the local triplestore, retrieve some URIs, and then second fetch some properties of these URIs in a remote federated triplestore. And not the reverse. For these kinds of query to be successfull, we need to instruct Sparnatural to execute first the local part of the query, and in a second time the remote part of the query. This involves wrapping the first part of the query in nested SELECT clause.
This can be achieved by setting an extra (experimental) flag config:executedAfter
on the property:
PREFIX core: <http://data.sparna.fr/ontologies/sparnatural-config-core#>
PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>
ex:myProperty owl:subPropertyOf core:NonSelectableProperty ;
# use custom annotation to link the property to the SPARQL service
# the SPARQL service indicates its endpoint with the sd:endpoint property
core:sparqlService [sd:endpoint <https://vocab.getty.edu/sparql> ] ;
# the remote part of the query will be executed after the rest
core:executedAfter true ;
# rdfs:label ...
This way, the generated SPARQL query will look like this: (note the use of the inner SELECT):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?ProvidedCHO_1 ?ProvidedCHO_1_label WHERE {
{
SELECT * WHERE {
?ProvidedCHO_1 rdf:type <http://www.europeana.eu/schemas/edm/ProvidedCHO>.
OPTIONAL {
?ProvidedCHO_1 (^<http://www.openarchives.org/ore/terms/proxyFor>/<http://purl.org/dc/elements/1.1/title>) ?ProvidedCHO_1_label.
FILTER((LANG(?ProvidedCHO_1_label)) = "en")
}
?ProvidedCHO_1 (^<http://www.openarchives.org/ore/terms/proxyFor>/<http://purl.org/dc/elements/1.1/type>) ?Type_2.
}
}
SERVICE <https://vocab.getty.edu/sparql> {
?Type_2 (<http://www.w3.org/2004/02/skos/core#scopeNote>/rdf:value) ?ScopeNote_4.
FILTER((LANG(?ScopeNote_4)) = "en")
}
}
LIMIT 1000
Joining on literal values
Sometimes the key to join between the local and the remote triplestore is a literal value. See the original discussion at #605. In that case, the way to configure this in SHACL is the following:
- Declare the property that contains in the local triplestore the key to join with the remote SPARQL service. Like “lcsh id”.
this:Museum a sh:NodeShape;
sh:targetClass dbpedia:Museum;
sh:nodeKind sh:IRI;
rdfs:label "Museum"@en, "Musée"@fr;
sh:property
# the property expressing the literal value
this:Museum_LCSH_id .
# The library of congress ID, A Literal
this:Museum_LCSH_id
sh:path dbpedia:fake_lcsh_id;
sh:name "LCSH ID"@en, "LCSH ID"@fr;
sh:node this:LCSHIdentifier;
dash:searchWidget core:NonSelectableProperty .
- Declare a
sh:node
constraint on this property pointing to the entity in the config that “represents” these literal values, like “LCSH Identifier”
# The library of congress ID, A Literal
this:Museum_LCSH_id
# points to a Literal NodeShape
sh:node this:LCSHIdentifier .
# A Literal NodeShape
this:LCSHIdentifier a sh:NodeShape;
# This represents literals
sh:nodeKind sh:Literal;
rdfs:label "Library of Congress Subject Headings ID"@en, "Library of Congress Subject Headings ID"@fr;
# ...
.
- Then attach a property on this literal entity, which represent how this same literal is expressed in the remote SPARQL service. This is counter intuitive, because literals normally cannot have properties, but this property will be an inverse property from the remote entity to the literal. It is this property that will be annotated with the
core:sparqlService
. The property also uses ansh:node
to declare its “range” entity, as usual.
Like this:
# A Literal NodeShape
this:LCSHIdentifier a sh:NodeShape;
# property attached to this literal
sh:property this:Wikidata_hasLCSHIdentifier ;
.
# The property being an inverse and linking to the Wikidata item
# Note the use of inversePath
this:Wikidata_hasLCAuthorityIdentifier sh:path [ sh:inversePath wdt:P244 ] ;
sh:name "is LC Authority ID of"@en, "is LC Authority ID of"@fr;
sh:nodeKind sh:IRI;
sh:node this:MuseumWikidata;
# This is using a SPARQL SERVICE keyword
core:sparqlService <https://www.wikidata.org> .