Program and Proceedings

The program for the ISWC workshop “NLP & DBpedia” on October 22nd, 2013 in Sydney, Australia is now available at:

We are very happy to announce that the day will open with a keynote by Raphael Troncy, titled “NERD: an open source platform for extracting and disambiguating named entities in very diverse documents”.

The proceedings will be submitted to CEUR and will hopefully be available in a couple of days at http://ceur-ws.org/Vol-1064 .

Meanwhile, we are providing a temporary download from:

We also provide an introduction, so you can get an overview of the Volume:

NLP & DBpedia – An Upward Knowledge Acquisition Spiral –

You can cite the individual articles by replacing the authors and title in the following BibTeX entry:

@inproceedings{hellmann-nlp2013,
author = {Sebastian Hellmann and Agata Filipowska and Caroline Barriere and Pablo N. Mendes and Dimitris Kontokostas},
title = {{NLP \& DBpedia - An Upward Knowledge Acquisition Spiral}},
booktitle = {Proceedings of the 1st International Workshop on NLP and DBpedia, October 21-25, Sydney, Australia},
year = {2013},
series = {NLP \& DBpedia 2013},
volume = {1064},
address = {Sydney, Australia},
month = {October},
publisher = {CEUR Workshop Proceedings},
}

Posted in Uncategorized | Comments Off

Two NIF workshops in Leipzig (24.9.) and Prague (9.10.): Call for participation

With the standardization of ITS 2.0 almost finished, we are also close to providing a complete NIF 2.0 specification.
WS 1 is a more general tutorial held in conjunction with the LSWT, and WS 2 is a more specialized workshop with developers, expert discussions and coding. The former is aimed at teaching and dissemination, the latter at advancing NIF and NER benchmarking and at a community meet-and-greet.

WS 1, 24.9.2013, Leipzig: Content Analysis and the Semantic Web (Tutorial)
held in conjunction with the Leipziger Semantic Web Tag

Session 1: Semantic Web and NLP – Sebastian Hellmann, INFAI
Session 2 & 3: Relation Extraction and Opinion Mining – Feiyu Xu, DFKI
Session 4: Interactive session, Q&A, hands-on
More info here: https://nlp2rdf.org/leipzig-24-9-2013

WS 2, 9.10.2013, Prague: NIF workshop: An open benchmark for Wikipedia-Based NER
held in conjunction with the LOD2 plenary meeting (http://lod2.eu)

The NLP Interchange Format (NIF) 2.0 is currently being created and is taking shape. One large topic is creating interoperable Named Entity Recognition and Linking tools and corpora to ease the engineering burden when benchmarking such systems. This workshop is a developer meeting to talk about:

  • NIF & NER: how does NIF currently meet the needs of the NER community, and what is missing? Is it feasible to create a universal best practice?
  • NIF & GATE: can NIF be incorporated into GATE, and if so, how?
  • NIF & GATE NER evaluation framework: brainstorming on ideas for future research on how multilingual NER tools can be evaluated with the support of GATE, DBpedia and possibly NIF.

More information available here: https://nlp2rdf.org/events/prague-9-10-2013

Posted in Uncategorized | Comments Off

New landing page and GitHub repos

We finally started moving. The new overview page is here:
http://persistence.uni-leipzig.org/nlp2rdf/

and the GitHub repo is here:

Posted in Uncategorized | Comments Off

NIF Roadmap 2012 and pointers

Just a repost of an email I wrote to the Stanbol Dev mailing list. See here for the discussion.

Below is a copy of the email:
Over the last year, we have been working on the NLP Interchange Format (NIF).
NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.

What NIF currently is:
1. In Sept. 2011, we published the specification 1.0: https://nlp2rdf.org/nif-1-0 . There are about 8-12 implementations out there that we know of (see the demo at point 5).
2. One of the latest draft papers about it can be found here: http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
3. The basic idea is to use # fragments to give URIs to strings, e.g.:
http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729 represents the first occurrence of “Semantic Web” in http://www.w3.org/DesignIssues/LinkedData.html
Of course, you can then use this URI as a subject and add any annotation you want (a minimal Jena sketch of this appears after this list), e.g.:
:offset_717_729 its:mentions dbpedia:Semantic_Web .
4. There is a Web annotator making use of the hash URI scheme of NIF:

5. There is a demonstrator (it will be much nicer in a couple of days) with eye candy, but a minor bug:
6. Apart from that, NIF also tries to find best practices for annotations, e.g. OLiA identifiers for part-of-speech tags, NERD, or the lemon model.
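To make point 3 a bit more concrete, here is a minimal Jena sketch that mints such an offset URI and attaches one annotation to it. The namespace behind the its: prefix and the output serialization are illustrative assumptions, not something the NIF 1.0 spec prescribes:

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;

public class OffsetUriExample {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        String doc = "http://www.w3.org/DesignIssues/LinkedData.html";
        // URI for the substring "Semantic Web" at character offsets 717-729
        Resource mention = model.createResource(doc + "#offset_717_729");
        // attach an annotation; the namespace URI behind its: is an assumption
        mention.addProperty(
            model.createProperty("http://www.w3.org/2005/11/its/rdf#", "mentions"),
            model.createResource("http://dbpedia.org/resource/Semantic_Web"));
        model.write(System.out, "N-TRIPLE");
    }
}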

What is planned for NIF:
a) A new spec, NIF 2.0, within this year. Discussion will take place on this mailing list: http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp2rdf
NIF will be simplified (simpler URI schemes and annotations), consolidated (better implementations) and extended (ability to express confidence values, string sets, etc.).
b) We plan to have implementations for NERD http://nerd.eurecom.fr , DBpedia Spotlight, Zemanta.com and DKPro http://www.ukp.tu-darmstadt.de/research/current-projects/dkpro/
c) Inclusion of XPointer as a NIF URI scheme and creation of a mapping to “string URIs”. This should somehow be compatible with the Internationalisation Tag Set (ITS) 2.0 http://www.w3.org/TR/its20/ , but we are still working together on a bidirectional bridge. There has been a plethora of discussion, partly in this thread: http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0101.html
d) NIF should be compatible with PROV-AQ: Provenance Access and Query http://www.w3.org/TR/2012/WD-prov-aq-20120619/

What I am hoping for or my ideas about how Stanbol and NIF overlap:
I) Reading your documentation, you seem to be able to provide very good use cases and feedback for NIF 2.0. We would really like to include that and also tailor NIF 2.0 to your needs. We are currently setting up a wiki (still ugly, sorry): http://wiki.nlp2rdf.org/ . Please mail me for accounts.
II) I would assume that you need some OWL model for all the enhancer output. NIF standardizes NLP tool output; it tries to be blank-node free and lightweight, but still as expressive as possible. For you this would mean that you could really save time, as ontology modelling is really tedious. By reusing NIF you would get a free data model and spec, and you could focus on the implementation of the Stanbol engine. (I got a 404 on http://incubator.apache.org/enhancer/enhancementstructure.html .)
I read “fise” somewhere. What is it? How does it compare to NIF? What URIs do you use? How many triples do you have per annotation?
III) With NIF we focused on the RDF output for tools, not on the workflow. Stanbol seems to focus on the workflow as well, right? It might be easy to implement a NIF engine with Stanbol. This could be a good showcase for NIF and Stanbol. With a Debian package, we could include Stanbol into the LOD2 Stack http://stack.lod2.eu/

Posted in News | Comments Off

Gate ANNIE

 

My name is Didier Cehrix and like Marcus and Robert I’m studying computer science at the University of Leipzig. For the practical part of the lecture “Software aus Komponenten” I created a wrapper for the NLP software GATE, especially for the ANNIE plugin.

The wrapper uses the embedded version of GATE. The ANNIE plugin is a part-of-speech tagger. For the NIF output, the GATE document format must be converted: a GATE document uses a tree, and this tree must be traversed to generate the RDF output. A sketch of this conversion is shown below.
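The following is a minimal, hypothetical sketch of that conversion. It assumes GATE Embedded, ANNIE and Jena are on the classpath and that the ANNIE pipeline has already been run on the document; the document URI and the posTag property are placeholders, not the vocabulary used by the actual wrapper:

import gate.Annotation;
import gate.Document;
import gate.Factory;
import gate.Gate;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Resource;

public class AnnieToNif {
    public static void main(String[] args) throws Exception {
        Gate.init(); // initialise GATE Embedded
        Document doc = Factory.newDocument("My favorite actress is Natalie Portman!");
        // ... load and run the ANNIE pipeline on this document here (omitted) ...

        Model model = ModelFactory.createDefaultModel();
        String base = "http://example.org/doc.txt"; // placeholder document URI
        // traverse the Token annotations and emit one offset-based URI per token
        for (Annotation a : doc.getAnnotations().get("Token")) {
            long begin = a.getStartNode().getOffset();
            long end = a.getEndNode().getOffset();
            Resource word = model.createResource(base + "#offset_" + begin + "_" + end);
            Object pos = a.getFeatures().get("category"); // POS tag set by the ANNIE tagger
            if (pos != null) {
                word.addProperty(
                    model.createProperty("http://example.org/vocab#", "posTag"),
                    pos.toString());
            }
        }
        model.write(System.out, "RDF/XML");
    }
}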

For future work I will implement a document converter from NIF to GATE. This would enable using NIF as input to GATE.

Homepage: GATE
Additional parameters: none
Web service URL:
Demo:
Code:
Posted in Implementations | Comments Off

DBpedia Spotlight

My name is Robert Schulze and like Marcus I’m studying computer science at the University of Leipzig. For the practical course of the lecture “Software aus Komponenten” I created a wrapper for the DBpedia Spotlight web service that generates NIF output.

The wrapper uses DBpedia Spotlight to find named entities in a given text input. It is implemented in Node.js, runs as a web service itself and fulfills all normative and (almost) all interface requirements given by the NIF 1.0 specification. Please have a look at the  for a detailed overview. In addition to the specification, I added JSONP as an output format. JSONP output allows JavaScript developers to create client-side software on top of my implementation.

For future development I would like to support N-Triples, Turtle and N3 as output formats. Because, to the best of my knowledge, there is no RDF framework/library/tool written in JavaScript or for Node.js that supports transformations between these formats and RDF/XML, this is not the easiest goal to achieve. Furthermore, I think it would be convenient to have a small reference implementation that uses the JSONP output.

Homepage:
Additional parameters: none
Status: NIF 1.0 compliant without RDF/XML input; JSONP output
Web service URL:
Demo:
Code:

RDF | JSON | N3 | NTriples

Posted in Implementations | Comments Off

MontyLingua

My name is Marcus Nitzschke and I’m studying computer science at the University of Leipzig. This implementation was written for the practical course of the lecture “Software aus Komponenten” in autumn 2011. Generally, I chose this topic because I’m interested in the techniques of the Semantic Web, and in particular because the connection between these techniques and NLP applications was a new experience for me.

According to the website, “MontyLingua is a free, commonsense-enriched, end-to-end natural language understander for English”. The commonsense-enriched part distinguishes MontyLingua from various other NLP tools. MontyLingua combines a tokenizer, part-of-speech tagger, extractor, lemmatiser and a so-called NLGenerator, which generates naturalistic English sentences and text summaries.

Because MontyLingua is written in Python, this is one of the first non-Java wrappers for NLP2RDF (Monty also provides a Java binary, but Python is more fun :) ). The wrapper currently implements the part-of-speech tagger component of MontyLingua. For future work, it would be interesting to extract information about word relationships, which is provided by MontyLingua.

Homepage: MontyLingua
Additional parameters: none
Status: NIF 1.0 compliant without RDF/XML input and given error handling.
Web service URL:
Demo:
Code:

RDF | JSON | N3 | NTriples

Posted in Implementations | Comments Off

Tutorial: How to call a NIF web service with your favorite SemWeb library

The parameters for NIF 1.0 can be found in the Parameter Section of the spec.
Below are example code snippets for several client side implementations. The result is always a combined RDF model of two NIF services.

curl

Note that there is currently no “best” RDF merge tool for the command line, so we will use the Jena CLI tools.

# query snowball demo webservice
curl "http://nlp2rdf.lod2.eu/demo/NIFStemmer?input=My%20favorite%20actress%20is%20Natalie%20Portman!&input-type=text&nif=true" > snowball.owl
# query stanford demo webservice
curl "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore?input=My%20favorite%20actress%20is%20Natalie%20Portman!&input-type=text&nif=true" > stanford.owl
#combine with Jena rdfcat
rdfcat -x snowball.owl stanford.owl > combined.owl

Jena

See http://jena.sourceforge.net


import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

// build one combined model from the two NIF web services
Model model = ModelFactory.createDefaultModel();
String text = "My favorite actress is Natalie Portman!";
StringBuilder p = new StringBuilder();
p.append("?input=");
p.append(URLEncoder.encode(text, "UTF-8"));
p.append("&input-type=text");
p.append("&nif=true");
URL stemmer = new URL("http://nlp2rdf.lod2.eu/demo/NIFStemmer" + p.toString());
URL stanford = new URL("http://nlp2rdf.lod2.eu/demo/NIFStanfordCore" + p.toString());
// the services return RDF/XML here, which Model.read parses by default
model.read(
   new BufferedReader(new InputStreamReader(stemmer.openConnection().getInputStream())), null);
model.read(
   new BufferedReader(new InputStreamReader(stanford.openConnection().getInputStream())), null);
// write the combined model to standard out
model.write(System.out, "RDF/XML");

ARC2

See the ARC2 library. This is also the code used in this demo.


// include the ARC2 library (adjust the path to your installation)
include_once("arc/ARC2.php");
$text = "My favorite actress is Natalie Portman!";
$stemmer = "http://nlp2rdf.lod2.eu/demo/NIFStemmer?input-type=text&nif=true&input=".urlencode($text);
$parser = ARC2::getRDFXMLParser();
$parser->parse($stemmer);
$stemmertriples = $parser->getTriples();
$stanford = "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore?input-type=text&nif=true&input=".urlencode($text);
$parser = ARC2::getRDFXMLParser();
$parser->parse($stanford);
$stanfordtriples = $parser->getTriples();
// merge both triple arrays and serialize them as Turtle
$alltriples = array_merge($stanfordtriples, $stemmertriples);
$ser = ARC2::getTurtleSerializer();
$output = $ser->getSerializedTriples($alltriples);
echo $output;
Posted in Tutorials | Comments Off

Tutorial Challenge: Semantic Yellow Pages

According to the Get Involved page each blog post has to start with a short introduction:
Hello, I am Konrad Höffner and I am a student of computer science at the University of Leipzig. I love living in the future, but one of the things I am still dissatisfied with is yellow pages. They are nationally limited, pestered by advertisements (and I *hate* ads), don’t understand synonyms, are only indexed in the language of the country of origin and/or are generally dumb (try searching for “delicious pizza nearby” in Google Maps). Fortunately, I think the Semantic Web is the technology that can alleviate this nuisance, and it’s only *you* who can save the world!

Challenge

Your goal is to create a Semantic Yellow Pages search for LinkedGeoData. In a simple HTML search form, a user can enter keywords or a search sentence. From the search, the following needs to be extracted:

  1. a location
  2. an amenity
  3. (optional) a restriction or filter condition

This information is then used to construct SPARQL queries on the LinkedGeoData knowledge base (or another knowledge base if you like) and to present the results to the user. Note that if no location can be found in the search string, the user’s current location should be used instead. The position can be determined via the HTML5 geolocation feature (or given in another input text field for testing).

Example 1

“I am looking for an optician in Paris.” Here the location is the city of Paris and the amenity is an optician. There is no restriction or filter in this example. The city of Paris only has a single geo point (its center). Since it is a city, a radius of 5 km is appropriate. Here is an example SPARQL query:

# graph, prefix and resource URIs filled in here; adjust them to the endpoint you use
Prefix lgd: <http://linkedgeodata.org/triplify/>
Prefix lgdo: <http://linkedgeodata.org/ontology/>
Select ?optician ?name ?opticiangeo from <http://linkedgeodata.org> {
   ?paris owl:sameAs <http://dbpedia.org/resource/Paris> .
   ?paris geo:geometry ?parisgeo .
   ?optician a lgdo:Optician .
   OPTIONAL {?optician rdfs:label ?name . }
   ?optician geo:geometry ?opticiangeo .
   Filter(bif:st_intersects(?parisgeo, ?opticiangeo, 5))
}
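If you want to run such a query from your application, here is a hedged Jena ARQ sketch; the endpoint URL and the trimmed-down query (no radius filter) are assumptions for illustration, so adapt them to the endpoint you actually use:

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class OpticianQuery {
    public static void main(String[] args) {
        // simplified version of the query above, without the radius filter
        String query =
            "PREFIX lgdo: <http://linkedgeodata.org/ontology/> " +
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "SELECT ?optician ?name WHERE { " +
            "  ?optician a lgdo:Optician . " +
            "  OPTIONAL { ?optician rdfs:label ?name } " +
            "} LIMIT 10";
        // assumed URL of the public LinkedGeoData SPARQL endpoint
        QueryExecution qe = QueryExecutionFactory.sparqlService(
                "http://linkedgeodata.org/sparql", query);
        try {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.nextSolution();
                System.out.println(row.get("optician") + "  " + row.get("name"));
            }
        } finally {
            qe.close();
        }
    }
}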

As an additional challenge, you can adjust the radius not only to the type of the location (city: 5 km, country: 100 km), but also to the searched amenity (5 km, 0.4 km).

Example 2

“cheap restaurant”: Here the task is to find cheap restaurants near the position of the user. The amenity in this case is a restaurant. The restriction is hard to extract here; it could best be translated to “below a certain price point”, which even then still requires the application to a) find out the restaurants’ prices and b) determine where that price point lies (e.g. below the median, or at least one standard deviation to the left of the average). Because the restriction handling is quite challenging, it is OK if you don’t implement restrictions or only do it for basic cases like “within 500 m”. If a restriction is present and the results are shown as a list, the results should be ordered according to the restriction criterion, e.g. for “within 500 m” they should be ordered by distance, ascending.

Requirements

Most of the following requirements should be met:

  • Synonyms should be included, e.g. searching for “tooth doctor” returns the same results as “dentist”.
  • Other languages should be included, e.g. searching for “Zahnarzt” returns the same results as “dentist”.
  • Search results should be shown as a table. The geo position and the name of the amenities should be shown along with their relevant properties (distance, opening times, etc.).

Suggested resources

Some suggestions for resources that can be used; you are free to use anything else:

  • LinkedGeoData (recommended for the ontology, the instances themselves and the queries)
  • . Note that there are already many multilingual labels included.
  • Wortschatz Universität Leipzig for the synonyms. AFAIK the synonyms are only included for the German language so it is ok if synonyms are only used for German. A prepared table of synonyms specifically for the LinkedGeoData ontology will be available soon, so that you don’t have to work with the Wortschatz interface.
  • LinkedGeoData also supports a REST interface and can thus be queried like this: . Further details can be found at .
Posted in Tutorial Challenges | Comments Off

Tutorial Challenge: Semantic Search

According to the Get Involved page each blog post has to start with a short introduction:
My name is Sebastian and I wrote this challenge to give you a rough template for writing your own challenge. Besides, I think that the problem can be easily solved with NIF and that it is a good showcase.

The goal of this challenge is to create a Semantic Search. In this context, this means the following:

For a given text (see below) a user gets a search form and can enter one or several search terms. The search shall return all sentences that have “something to do” with the search term. Additional information should also be shown.

Most of the following requirements should be met:

  • Synonyms should be included, e.g. searching for “USA” returns sentences with “United States”.
  • Some form of normalisation (stemming, lemmatising, stopword removal) should be applied.
  • DBpedia instances that are in the text and match the search should be shown. They can also be shown to disambiguate the search, e.g. when searching for “Bush” or “Madonna”.
  • Instances related or similar to the found DBpedia instances that are also in the same text should be shown, e.g. Barack Obama is related to the United States.

Given text

this text should be used:

Mockup

A static mockup, where only “USA” can be searched, can be found here:

Code:

Some suggestions for resources that can be used; you are free to use anything else:

  • Snowball Stemmer
  • FOX
  • Stanford CoreNLP 
  • The lexicalization data set by Pablo Mendes. Some information from the data set was extracted and loaded into a Virtuoso triple store here: http://hanne.aksw.org:8890/sparql (graph: http://dbpedia.org/lexicalizations). You can get alternative surface forms for DBpedia resources with SPARQL queries; see the sketch below.
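As a starting point, the following hedged Jena ARQ sketch lists everything the lexicalizations graph says about one example resource; since the exact predicate for surface forms is not stated above, use it to discover the right property (the chosen resource is arbitrary):

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class SurfaceForms {
    public static void main(String[] args) {
        // inspect the lexicalizations graph for one DBpedia resource;
        // if this returns nothing, try the resource in object position instead
        String query =
            "SELECT ?p ?o WHERE { " +
            "  GRAPH <http://dbpedia.org/lexicalizations> { " +
            "    <http://dbpedia.org/resource/United_States> ?p ?o " +
            "  } " +
            "} LIMIT 100";
        QueryExecution qe = QueryExecutionFactory.sparqlService(
                "http://hanne.aksw.org:8890/sparql", query);
        try {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                QuerySolution row = rs.nextSolution();
                System.out.println(row.get("p") + "  " + row.get("o"));
            }
        } finally {
            qe.close();
        }
    }
}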

 

Posted in Tutorial Challenges | Comments Off