Wikidata:WikidataCon 2017/Notes/An open source tool for fishing Wikidata entities in text and PDF documents
Title: An open source tool for fishing Wikidata entities in text and PDF documents
Speaker(s)
Name or username: Patrice Lopez
Useful links:
https://github.com/kermitt2/nerd
Abstract
entity-fishing (repo: https://github.com/kermitt2/nerd, demo: http://entity-fishing.science-miner.com, documentation: http://nerd.readthedocs.io) is an open source tool for automatically identifying and disambiguating Wikidata entities in multilingual text and PDF documents. The tool relies on machine-learning techniques that use Wikipedia as a training source. entity-fishing offers high performance and scalability and is generic with respect to domains and languages, so it can address a wide variety of uses. Our work focuses in particular on processing scholarly documents, taking advantage of the large amount of scientific knowledge and links present in Wikidata.
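To make the idea of "fishing" Wikidata entities concrete, here is a minimal Python sketch of how a client might prepare a query and read back the disambiguated entities. It is not taken from the talk: the helper names `build_query` and `extract_entities` are hypothetical, the JSON field names (`text`, `language`, `entities`, `rawName`, `wikidataId`) are assumptions that should be checked against the documentation linked above, and the sample response is hand-written for illustration, not real tool output.

```python
import json


def build_query(text, language="en"):
    """Serialize a disambiguation query as JSON.

    The field names used here are assumptions for illustration;
    check them against the entity-fishing documentation.
    """
    return json.dumps({"text": text, "language": {"lang": language}})


def extract_entities(response):
    """Return (surface form, Wikidata ID) pairs from a parsed response.

    Mentions without a Wikidata ID are skipped, since they were
    recognized but not linked to an entity.
    """
    pairs = []
    for entity in response.get("entities", []):
        wikidata_id = entity.get("wikidataId")
        if wikidata_id is not None:
            pairs.append((entity.get("rawName"), wikidata_id))
    return pairs


# Hand-written sample response (illustrative only, not real output).
sample_response = {
    "entities": [
        {"rawName": "Douglas Adams", "wikidataId": "Q42"},
        {"rawName": "yesterday"},  # a mention with no linked entity
    ]
}

if __name__ == "__main__":
    print(build_query("Douglas Adams wrote a famous novel."))
    print(extract_entities(sample_response))
```

Keeping the payload construction and the response parsing in separate functions means either side can be adjusted once the actual field names are confirmed, without touching the other.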
Collaborative notes of the session
Entity recognizer
Grobid-NER - https://github.com/kermitt2/grobid-ner
LMDB
Entity embedding