Configuring and Securing a SPARQL endpoint
2012 VIVO Implementation Fest
Welcome & Who am I?
Vincent Sposato, University of Florida
  – Enterprise Software Engineering
  – Primarily focused on VIVO operations and reproducible harvests
John Fereira, Cornell University
  – Mann Library Information Technology Services (ITS)
  – Programmer / Analyst / Technology Strategist
Goals of this session
• Provide you with an overview of SPARQL endpoints and their uses
• Provide you with a process for installing and configuring a SPARQL endpoint (Fuseki specifically)
• Outline the possibilities for securing such an endpoint
• Answer questions
SPARQL Endpoint Overview
What is a SPARQL endpoint?
• A SPARQL endpoint enables users to query a knowledge base via the SPARQL language
• Results are normally returned in a machine-readable format, since the primary purpose of the endpoint is information exchange
• Current implementations
  – Joseki / Fuseki
  – Virtuoso
  – Many others depending on needs…
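As a rough illustration of the points above, a client can send a query to an endpoint over plain HTTP and ask for a machine-readable result format. The sketch below assumes curl is installed. The function name sparql_select, and the endpoint URL and query in the usage line, are placeholders rather than anything defined in these slides.

```shell
#!/bin/sh
# Send a SPARQL SELECT query to an endpoint with an HTTP GET request.
# $1 = endpoint URL, $2 = query text, $3 = result media type (optional).
# sparql_select is a hypothetical helper name, not part of any tool.
sparql_select() {
    # -G moves the --data-urlencode value into the URL query string,
    # so curl takes care of percent-encoding the query text.
    curl -s -G \
        -H "Accept: ${3:-application/sparql-results+json}" \
        --data-urlencode "query=$2" \
        "$1"
}
```

A call might look like: sparql_select "http://www.example.com/sparql" "SELECT * WHERE { ?s ?p ?o } LIMIT 5". Whether a given server honors the requested media type depends on the implementation.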
Why use a SPARQL endpoint?
• To provide querying services for your dataset
• To provide your semantic data to other applications through machine-readable interfaces
Public SPARQL endpoints
• US Government
  – Data.gov (http://semantic.data.gov/sparql)
• University of Florida
  – VIVO (http://sparql.vivo.ufl.edu/sparql.html)
• Bio2RDF
  – PubMed SPARQL (http://pubmed.bio2rdf.org/sparql)
Data Reuse Example from Cornell
Data as it appears in VIVO for: Abruña, Héctor D
Data as it appears on the Cornell Department of Chemistry and Biology site for: Abruña, Héctor D
Why Fuseki and not Joseki?
• Fuseki is the successor to Joseki, and is based upon SPARQL 1.1
• Joseki has database connection timeout issues that Fuseki is able to resolve with an additional library
• Fuseki has true update support and the ability to define specific graphs
Fuseki Installation
Requirements for Fuseki
• Oracle/Sun Java 1.6+
  – OpenJDK would work
• Latest Fuseki package
  – Download the distribution package, as it is a complete environment
  – https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/0.2.2-incubating-SNAPSHOT/
• Apache Web Server
  – Only if you want to redirect output by way of AJP
    • Gives you the ability to remove the :3030 from the end of the URL of the SPARQL endpoint
JAVA 6 JDK
• Can I use OpenJDK?
  – Yes, you can. However, if you are installing it on the same server as your VIVO, make sure it is configured so that it does not interfere with Sun Java and the VIVO application
• What is Java?
  – “Write once, run anywhere” – popular quote about Java
• Installation
  – Debian/Ubuntu
    • apt-get install sun-java6-jdk
    • apt-get install openjdk-6-jre
  – CentOS/Red Hat
    • yum install java (need to configure alternatives)
    • yum install java-1.6.0-openjdk
  – Windows: download and install
Apache
• Why do I need Apache too?
  – It lets you use AJP to forward port 3030 to a standard web port (80, 443)
• What is Apache?
  – “a secure, efficient and extensible server that provides HTTP services in sync with current HTTP standards” – httpd.apache.org
• Installation
  – Debian/Ubuntu – apt-get install apache2
  – CentOS/Red Hat – yum install httpd
  – Windows: download and follow the instructions
Fuseki
• Download Fuseki (tar/zip)
  – wget https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/0.2.2-incubating-SNAPSHOT/jena-fuseki-0.2.2-incubating-20120506.050243-16-distribution.tar.gz
• Extract contents of the file
  – tar xzvf jena-fuseki-0.2.2-incubating-20120506.050243-16-distribution.tar.gz
• Create a Fuseki directory
  – mkdir /usr/local/fuseki
• Copy extracted contents to new directory
  – cp -R jena-fuseki-0.2.2-incubating-SNAPSHOT/* /usr/local/fuseki
• Make fuseki-server executable
  – chmod 755 /usr/local/fuseki/fuseki-server
Supporting Libraries
• Download Jena-ARQ-2.9.0
  – wget http://www.apache.org/dist/incubator/jena/jena-arq-2.9.0-incubating/jena-arq-2.9.0-incubating.jar
• Download Jena-IRI-0.9.0
  – wget http://www.apache.org/dist/incubator/jena/jena-iri-0.9.0-incubating/jena-iri-0.9.0-incubating.jar
• Download Jena-SDB-1.3.4
  – wget http://sourceforge.net/projects/jena/files/SDB/SDB-1.3.4/sdb-1.3.4.zip/download
  – mv download sdb-1.3.4.zip (wget saves the file under the name "download")
• Download MySQL-Connector-Java-5.1.19
  – wget http://mirrors.ibiblio.org/pub/mirrors/maven2/mysql/mysql-connector-java/5.1.19/mysql-connector-java-5.1.19.jar
Fuseki Configuration
Prepare supporting libraries
• Make a lib directory under /usr/local/fuseki
  – mkdir /usr/local/fuseki/lib
• Copy all jar files into the new lib directory
  – Make sure that you unzip the SDB-1.3.4 file and extract the jar file from it
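The copy step can be sketched as a small shell helper. This is only an illustration: stage_jars is a hypothetical name, and it assumes the downloaded jars, including the one taken out of the unpacked SDB archive, have already been gathered into a single directory.

```shell
#!/bin/sh
# Copy the .jar files sitting directly in a download directory into
# Fuseki's lib directory. stage_jars is a hypothetical helper name.
# $1 = directory holding the downloaded jars, $2 = target lib directory.
stage_jars() {
    mkdir -p "$2" || return 1
    for jar in "$1"/*.jar; do
        # If nothing matched, the unexpanded pattern is not a real file.
        [ -f "$jar" ] || continue
        cp "$jar" "$2"/ || return 1
    done
}
```

For example: stage_jars "$HOME/downloads" /usr/local/fuseki/lib. Subdirectories are deliberately not searched, so the extra jars bundled inside the unpacked SDB archive are not copied by accident.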
Create configuration file
• Create a new file in the /usr/local/fuseki directory
  – nano /usr/local/fuseki/fuseki-vivo.ttl
• This file will hold Fuseki’s:
  – Server Service definitions
  – RDF Dataset definitions
  – Graph definitions
Add namespaces to the file
# Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix jumble:  <http://rootdev.net/vocab/jumble#> .
@prefix sdb:     <http://jena.hpl.hp.com/2007/sdb#> .
This section defines the namespaces we will be utilizing throughout the configuration file. The Fuseki configuration file is written in N3/Turtle.
Define the Fuseki server
[] rdf:type fuseki:Server ;
    # Timeout - server-wide default: milliseconds.
    # Format 1: "1000" -- 1 second timeout
    # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for the rest of the query.
    # See java doc for ARQ.queryTimeout
    ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000,60000" ] ;
    fuseki:services (
        <#service_VIVO_read_only>
    ) .
This section tells the Fuseki server which of the services defined later should be enabled. If a service is not ‘turned on’ here, its definition later in the file will be ignored.
Define the connection libraries
# SDB
[] ja:loadClass "net.rootdev.fusekisdbconnect.SDBConnect" .
jumble:SDBConnect rdfs:subClassOf ja:RDFDataset .
This section specifically defines the connection classes you will be using. The one needed for VIVO 1.2+ will be SDB.
Define the service
<#service_VIVO_read_only> rdf:type fuseki:Service ;
    rdfs:label "UF VIVO Service (R)" ;
    fuseki:name "VIVO" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    # A separate read-only graph store endpoint:
    fuseki:serviceReadGraphStore "get" ;
    fuseki:dataset <#ufvivo_dataset_read> ;
    .
This section defines the name of the service, and the different functionality that this service will provide. It also has a link to the dataset that is backing this service.
Define the dataset
<#ufvivo_dataset_read> rdf:type sdb:DatasetStore ;
    sdb:store <#VIVOStore> .
Here the dataset that will be served by your services is defined. You can add named graphs if you want to expose only specific graphs. There is also a link to the actual store in which this data resides.
Define the data store
<#VIVOStore> rdf:type jumble:SDBConnect ;
    rdfs:label "UF VIVO SDB Store" ;
    sdb:layout "layout2" ;
    jumble:defaultUnionGraph "true" ;
    sdb:engine "InnoDB" ;
    sdb:connection [
        rdf:type sdb:SDBConnection ;
        sdb:sdbHost "localhost" ;
        sdb:sdbType "mysql" ;
        sdb:sdbName "vitrodb" ;
        sdb:sdbUser "vitro" ;
        sdb:sdbPassword "vitro123" ;
        sdb:driver "com.mysql.jdbc.Driver" ;
    ] .
We define the actual database connection information required to allow the service to query the database. Here we are assuming you are using MySQL; other servers may be configured differently.
Create Fuseki launch script
• Create a new file in the /usr/local/fuseki directory
  – nano /usr/local/fuseki/launchFuseki.sh
• This file will:
  – Set some environment variables
  – Execute the Java jar file for Fuseki
  – Output results to a log
Define the environment
#!/bin/bash
export FusekiInstallDir=/usr/local/fuseki
export FusekiPort=3030
export FusekiJVMArgs="-cp $FusekiInstallDir/fuseki-server.jar:$FusekiInstallDir/lib/* -Xmx1200M"
export Date=`date +%Y-%m-%d`
export FusekiLogFile=$FusekiInstallDir/FusekiLog-$Date.log
export FusekiConfigFile=$FusekiInstallDir/fuseki-vivo.ttl
export FusekiServiceName=/VIVO
These items are needed in order to properly call the remainder of the tasks associated with initiating Fuseki.
Initiate Java & Fuseki
# Check to see if logfile exists
if [ ! -f "$FusekiLogFile" ]; then
    touch "$FusekiLogFile"
fi

# Check to see if config file exists
if [ ! -f "$FusekiConfigFile" ]; then
    echo "ERROR - Fuseki failed to start - no configuration file - $FusekiConfigFile" >> "$FusekiLogFile"
    exit 1
fi

# Execute Java calling the package for Fuseki
java $FusekiJVMArgs org.apache.jena.fuseki.FusekiCmd --desc "$FusekiConfigFile" --port=$FusekiPort $FusekiServiceName >> "$FusekiLogFile" 2>&1 &
We do some basic checks and then start the Fuseki server in the background, passing it the configuration it needs.
Get Fuseki started
• Change permissions on launchFuseki.sh to allow for execution
  – chmod 755 launchFuseki.sh
• Run launchFuseki.sh
  – ./launchFuseki.sh
• Tail the log to ensure that all is running correctly
  – tail -f FusekiLog-YYYY-MM-DD.log (the date is the day the script was run)
  – Last line should appear as:
  – 17:42:24 INFO Server :: Started 2012/05/08 17:42:24 EDT on port 3030
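Instead of watching the log by hand, the same check can be scripted. The sketch below polls a log file for the "Server :: Started" line shown above. The name wait_for_fuseki and the 30-second default are assumptions made for illustration.

```shell
#!/bin/sh
# Poll a Fuseki log file until the "Server :: Started" line shows up.
# $1 = log file, $2 = seconds to wait (default 30).
# Returns 0 once the line is found, 1 on timeout.
# wait_for_fuseki is a hypothetical helper name.
wait_for_fuseki() {
    waited=0
    while [ "$waited" -lt "${2:-30}" ]; do
        # The pattern tolerates extra padding around the "::" separator.
        if grep -q 'Server *:: *Started' "$1" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

A launch wrapper could then run something like: wait_for_fuseki "$FusekiLogFile" 60 || echo "Fuseki did not report a start". Because the log file is per day and appended to, a "Started" line from an earlier run on the same day would also satisfy the check.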
Test your Fuseki
• Go to www.example.com:3030
• Select Control Panel from the Server Management area
• Select /VIVO from the dropdown that appears, and click Select
• Let’s enter a SPARQL query to test:
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo:  <http://aims.fao.org/aos/geopolitical.owl#>
PREFIX core: <http://vivoweb.org/ontology/core#>
#
# This example query gets 50 geographic locations
# and (if available) their labels
#
SELECT ?countryName ?iso3
WHERE
{
    ?country rdf:type core:Country
    OPTIONAL { ?country geo:nameListEN ?countryName }
    OPTIONAL { ?country geo:codeISO3 ?iso3 }
}
LIMIT 50
• Select Text from the Output dropdown
• Click Get Results
• If the query returned 50 rows, then you now have a working endpoint. CONGRATULATIONS!
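The row count can also be checked from the command line. The helper below is a sketch that counts data rows in CSV results read from standard input. The name count_result_rows is hypothetical, and it assumes the server can return SPARQL results as CSV, which has one header line followed by one line per row.

```shell
#!/bin/sh
# Count the data rows in SPARQL CSV results read from standard input.
# The first line of the CSV format is the header with variable names.
# count_result_rows is a hypothetical helper name.
count_result_rows() {
    # Skip the header, strip carriage returns, count non-empty lines.
    tail -n +2 | tr -d '\r' | grep -c .
}
```

With the test query saved in a shell variable named QUERY (an assumed name), a check might look like: curl -s -G -H 'Accept: text/csv' --data-urlencode "query=$QUERY" http://www.example.com:3030/VIVO/query | count_result_rows. The /VIVO/query path follows from the service name and the "query" endpoint name in the configuration file.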
Securing Fuseki
Basic - Firewall
• The easiest method of protecting your SPARQL endpoint would be a firewall
• You can block access to the specific ports that Fuseki is running on
• This is more akin to using a machete when a scalpel might be better suited
• Works well if you have no interest in sharing data with the outside world
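As one possible shape for such a rule set, the sketch below prints a pair of iptables commands that accept connections to the Fuseki port from a single trusted network and drop everything else. It assumes a Linux host using iptables. The function name fuseki_firewall_rules is hypothetical, and the commands are only printed so they can be reviewed before anyone runs them as root.

```shell
#!/bin/sh
# Print iptables commands that limit a TCP port to one source network.
# Nothing is applied; review the output, then run it as root.
# $1 = port Fuseki listens on, $2 = network allowed to connect.
# fuseki_firewall_rules is a hypothetical helper name.
fuseki_firewall_rules() {
    # Accept the trusted network first, then drop everyone else.
    echo "iptables -A INPUT -p tcp --dport $1 -s $2 -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport $1 -j DROP"
}
```

For instance, fuseki_firewall_rules 3030 127.0.0.1 prints rules that leave port 3030 reachable only from the local machine, which fits the case where Apache on the same host is the only intended client.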
Intermediate – Fuseki Config
• If you want people to be able to read data through your endpoint, but not update it, the Fuseki config file is a good start.
• If you do not define an update process, no one will be able to update your dataset. PERIOD.
• Even if you happen to leave in the update configuration, Fuseki will not allow updates unless you start the server with --update.
• This is an intermediate level of configuration, although the controls are still fairly broad: on or off
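A quick way to see whether a configuration file still defines write-capable endpoints is to search it for the relevant properties. The service definition shown earlier in this deck includes fuseki:serviceUpdate, fuseki:serviceUpload and fuseki:serviceReadWriteGraphStore, so those are the names this sketch looks for. The function name find_write_services is hypothetical, and the check is a plain text search rather than a real Turtle parse.

```shell
#!/bin/sh
# List the write-capable service properties in a Fuseki config file.
# Lines that start with "#" are treated as comments and skipped.
# Returns 0 when write services are found, 1 when none are.
# find_write_services is a hypothetical helper name.
find_write_services() {
    grep -v '^[[:space:]]*#' "$1" |
        grep -E 'fuseki:service(Update|Upload|ReadWriteGraphStore)'
}
```

Running find_write_services /usr/local/fuseki/fuseki-vivo.ttl prints any matching lines. An empty result suggests the file only exposes read endpoints, though properties written with full IRIs instead of the fuseki: prefix would be missed.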
Advanced – Fuseki Partitions
• Run two or more separate Fuseki configurations that allow different levels of access and/or expose different datasets.
• Grant access to the different Fuseki servers based upon the ports being used.
• Authentication could also be added at this point, for example backed by some form of external authentication.
Questions?