Phobegone
Technical Report

Vlad Ionuț Milinovici
Alexandru Ioan Palade
Faculty of Computer Science, Iași

Introduction

This project aims to provide a Web service that combines sources of treatment information from the semantic Web about different phobias. It also features a social aspect that lets users recommend what they consider the better treatments. The current version of this technical report explores the implementation ideas and possibilities.

The purpose of our application is to help people treat their phobias. We want to aggregate information from the semantic Web into a single, easy-to-use source of help.

The following chapters describe the main modules, the data formats, and the data sources.

Project Architecture

The application server back-end will be written in the Python programming language, using an RDF-specific library. The Web client side uses the standard HTML5 and CSS specifications together with the AngularJS library, which gives the presentation layer a dynamic and modular design. We will now describe the main components of the software.

Internal Models

Phobia Selection and Treatments

First, the user needs to find their phobia. To achieve this, we store a regularly updated list of phobias retrieved from DBpedia, together with their descriptions. The user then types either the phobia name or the thing that he or she is afraid of, and the UI filters the list accordingly.
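
As a sketch of how this list could be fetched, the following queries the public DBpedia SPARQL endpoint. The SPARQLWrapper package and the assumption that phobia articles sit under the category Category:Phobias are illustrative choices, not fixed design decisions:

    from SPARQLWrapper import SPARQLWrapper, JSON

    DBPEDIA = "https://dbpedia.org/sparql"

    def fetch_phobias():
        # Ask DBpedia for every resource in the "Phobias" category,
        # together with its English label and abstract.
        sparql = SPARQLWrapper(DBPEDIA)
        sparql.setQuery("""
            PREFIX dct:  <http://purl.org/dc/terms/>
            PREFIX dbc:  <http://dbpedia.org/resource/Category:>
            PREFIX dbo:  <http://dbpedia.org/ontology/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            SELECT ?phobia ?label ?abstract WHERE {
                ?phobia dct:subject dbc:Phobias ;
                        rdfs:label ?label ;
                        dbo:abstract ?abstract .
                FILTER (lang(?label) = "en" && lang(?abstract) = "en")
            }
        """)
        sparql.setReturnFormat(JSON)
        bindings = sparql.query().convert()["results"]["bindings"]
        return [(b["label"]["value"], b["abstract"]["value"]) for b in bindings]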

There may exist lesser-known phobias that do not have a Wikipedia article. In that case, we still want to show the user their phobia name, even though we have no helpful data for it. This is more user-friendly, and the user can still benefit from other features of our service (e.g. the social aspect). Therefore we should also keep a more complete, static list of phobias.

Finding the remedies for a phobia has three stages. First, we query Wikidata for any directly relevant properties; for example, the property "drug or therapy used for treatment" (P2176). Then, we look up the drug on DrugBank (through DBpedia) to suggest similar, possibly useful drugs. We can also extract useful information such as a drug's status and price.
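
A minimal sketch of the first stage, assuming the Wikidata property P2176 and that the phobia's Wikidata item ID is already known from an earlier lookup:

    from SPARQLWrapper import SPARQLWrapper, JSON

    WIKIDATA = "https://query.wikidata.org/sparql"

    def drugs_for(condition_qid):
        # condition_qid is a Wikidata item ID such as "Q12345" (placeholder).
        # wd:, wdt:, wikibase: and bd: are predefined at this endpoint.
        sparql = SPARQLWrapper(WIKIDATA)
        sparql.setQuery(f"""
            SELECT ?drug ?drugLabel WHERE {{
                wd:{condition_qid} wdt:P2176 ?drug .
                SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
            }}
        """)
        sparql.setReturnFormat(JSON)
        rows = sparql.query().convert()["results"]["bindings"]
        return [r["drugLabel"]["value"] for r in rows]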

Because the Wikidata project is still new and far from complete, we need additional sources of information. Therefore, we look in the Treatments section of the phobia's Wikipedia article. For each link to another Wikipedia page, we query its type and try to determine whether it is relevant. For example, if it is an article about a pharmaceutical drug, there is a good chance that it is relevant to us. We could also use a basic Natural Language Processing algorithm to deduce the context of a resource URI from the Treatments section.
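
The relevance check could look like the following sketch, which asks DBpedia whether a linked resource has a type we consider relevant; the choice of dbo:Drug as the only relevant class is illustrative:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Classes we treat as evidence that a linked page is a treatment;
    # extending this list (e.g. with therapy-related classes) is expected.
    RELEVANT_TYPES = ["http://dbpedia.org/ontology/Drug"]

    def is_relevant(resource_uri):
        sparql = SPARQLWrapper("https://dbpedia.org/sparql")
        values = " ".join(f"<{t}>" for t in RELEVANT_TYPES)
        sparql.setQuery(f"""
            ASK {{
                VALUES ?type {{ {values} }}
                <{resource_uri}> a ?type .
            }}
        """)
        sparql.setReturnFormat(JSON)
        return sparql.query().convert()["boolean"]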

We use schema.org to tag the human-readable text with semantic data. For this, we need a mapping between the DBpedia and Wikidata ontologies and the schema.org ontology. We can navigate the DBpedia concept graph until we find an owl:sameAs property that contains a mapping to schema.org, although the resulting mapping is sometimes too broad to be useful.
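
One possible implementation of this lookup, assuming DBpedia resource URIs dereference to RDF (which rdflib can parse directly):

    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    def schema_org_mapping(resource_uri):
        # DBpedia dereferences resource URIs to RDF via content negotiation.
        g = Graph()
        g.parse(resource_uri)
        # Return the first owl:sameAs target inside schema.org, if any.
        for target in g.objects(URIRef(resource_uri), OWL.sameAs):
            if str(target).startswith("http://schema.org/"):
                return target
        return None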

Figure: The collaboration diagram for the main modules.

Internal Data

All the stored data will be in the RDF format and queried with SPARQL, so data coming from other formats must first be transformed into RDF.

The data regarding the social aspect of the application must likewise be transformed and stored as RDF. For the contact and social-graph information we will use the FOAF vocabulary.
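
A small sketch of how a user and their social links could be stored with rdflib's built-in FOAF namespace; the phobegone.example.org namespace is a placeholder for our own:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import FOAF, RDF

    # Placeholder namespace for our own user resources.
    EX = Namespace("http://phobegone.example.org/users/")

    def add_user(g, user_id, name, friend_ids=()):
        user = EX[user_id]
        g.add((user, RDF.type, FOAF.Person))
        g.add((user, FOAF.name, Literal(name)))
        for friend_id in friend_ids:
            g.add((user, FOAF.knows, EX[friend_id]))
        return user

    g = Graph()
    add_user(g, "alice", "Alice", friend_ids=["bob"])
    print(g.serialize(format="turtle"))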

All our current data sources have their own specific ontologies. The exception is the supplementary resources such as books and articles. ISBN and ISSN are both official URN namespaces, and there are other methods to transform a book resource into an RDF representation.
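
For illustration, a book could be attached as a supplementary resource by using its ISBN URN directly as the subject URI (the ISBN and title below are placeholders):

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS

    g = Graph()
    # The ISBN URN itself identifies the book; no custom URI is needed.
    book = URIRef("urn:isbn:9780000000000")  # placeholder ISBN
    g.add((book, DCTERMS.title, Literal("A book about phobia treatment")))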

For most of the semantic data processing we can use rdflib. This library can parse and serialize RDF/XML, supports SPARQL queries, and provides data persistence. There will be a conversion module based on rdflib that makes the transactions between the model and the database seamless.
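
The core of this conversion module might look like the following sketch; the file names are illustrative:

    from rdflib import Graph

    g = Graph()
    g.parse("treatments.rdf", format="xml")   # load RDF/XML from a source

    # SPARQL over the in-memory graph.
    for row in g.query("""
            SELECT ?s ?label WHERE {
                ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
            }"""):
        print(row.s, row.label)

    # Write the (possibly transformed) data back out as RDF/XML.
    g.serialize(destination="store.rdf", format="xml")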

Our API will provide responses in the JSON-LD data format (a W3C Recommendation). The rdflib-jsonld extension offers this capability.
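
Producing a JSON-LD response then reduces to choosing the serialization format, assuming the plugin is installed (in recent rdflib versions JSON-LD support is built in):

    from rdflib import Graph

    g = Graph()
    g.parse("treatments.rdf", format="xml")
    # The JSON-LD plugin registers the "json-ld" serializer name.
    payload = g.serialize(format="json-ld", indent=2)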

There are numerous other Python libraries for RDF processing. For the triple store database, we have the following options: AllegroGraph, OpenLink Virtuoso and Sesame.

Data Sources

As data sources, we use DBpedia, Wikidata, Wikipedia, DrugBank and OpenStreetMap. They are highlighted in green in the figure presenting the architecture diagram.

We will use the data from the DBpedia graph through its SPARQL endpoint. DBpedia extracts (or maps) data from Wikipedia, mostly metadata about the articles and their infoboxes, and tries to add structure to Wikipedia's textual information.

Despite this, DBpedia does not structure the actual text of the articles, where most of the information about treatments resides. Therefore we do not extract remedies directly from each phobia article. Instead, we try to find links from there to other Wikipedia pages. We then determine whether such a page refers to a psychological treatment, and if so we conclude that it is relevant to us.

Wikidata is also useful because it holds information we cannot find on DBpedia. Contrary to DBpedia, Wikidata provides data to Wikipedia, and its information is structured from the start; in other words, Wikidata does not extract data from Wikipedia as DBpedia does. It therefore offers supplementary information for some of the phobias, for instance medication. Because this information is structured, it is straightforward to extract. The only problem is that Wikidata is a newer project and does not yet provide enough data. We will consider contributing by adding some relevant information ourselves.

In addition to suggestions for remedies, we can also recommend psychologists close to the user's geographic location. This could be useful if the application does not find enough information, or if the user really needs help from an expert. For this, we can use OpenStreetMap and the APIs built on top of it. The OSM Semantic Network is encoded as a SKOS vocabulary and contains Linked Open Data representations of OSM tags as JSON-LD. In the case of OpenStreetMap, we can use a regular expression in the API query to retrieve all the facilities that contain "Psychology" in the name, and we can restrict this search to a radius around a specific place.
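
One concrete way to run such a query is the Overpass API, which supports regular expressions on tag values. In the sketch below, the name regex, the healthcare tag and the 10 km radius are all illustrative choices:

    import requests

    OVERPASS = "https://overpass-api.de/api/interpreter"

    def nearby_psychologists(lat, lon, radius_m=10000):
        # Two illustrative filters: a case-insensitive name regex and
        # the healthcare=psychotherapist tag, both within radius_m metres.
        query = f"""
            [out:json];
            (
              node["name"~"psycholog",i](around:{radius_m},{lat},{lon});
              node["healthcare"="psychotherapist"](around:{radius_m},{lat},{lon});
            );
            out body;
        """
        response = requests.post(OVERPASS, data={"data": query})
        response.raise_for_status()
        return [element.get("tags", {}).get("name")
                for element in response.json()["elements"]]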

Conclusion

The semantic Web offers the tools necessary to build complex applications by combining different sources of information. One difficulty is the lack of sufficient data in the newer projects (although this is improving continuously). Another, minor, issue is that mappings between similar ontologies can be incomplete or inconsistent.

References

DBpedia
DBpedia, by Universität Leipzig, Freie Universität Berlin and OpenLink Software; https://dbpedia.org/.
DrugBank
DrugBank, by the University of Alberta and The Metabolomics Innovation Centre; https://www.drugbank.ca/.
FOAF Vocabulary
FOAF Vocabulary, by Dan Brickley and Libby Miller; http://xmlns.com/foaf/spec/.
OpenStreetMap
OpenStreetMap, by Steve Coast; https://www.openstreetmap.org/.
rdflib
rdflib, by Daniel Krech; https://github.com/RDFLib/rdflib.
Schema.org
Schema.org, by Google, Yahoo, Microsoft and Yandex; https://schema.org/.
Wikidata
Wikidata, by the Wikimedia Foundation; https://www.wikidata.org/.