Wikidata:WikidataCon 2017/Submissions/Extracting a database of etymological relationships from Wiktionary

 This is an Open submission for WikidataCon 2017 that has not yet been reviewed by the members of the Program Committee.

Submission no. 20
Title of the submission

Wikiproject Etymology: Extracting a database of etymological relationships from Wiktionary


Author(s) of the submission

Ester Pantaleo

E-mail address

esterpantaleo@gmail.com

Country of origin

Italy

Affiliation, if any (organisation, company etc.)

Type of session
Length of session

10 min

Ideal number of attendees

30


Abstract

In this talk I will present progress on extracting a database of etymological relationships from Wiktionary. Working with data extracted from the English Wiktionary, I have generated an RDF database of lexical information and etymological relationships. I have also created an interactive visualization, a graphical etymology dictionary, in which users can search for a word (in principle in any language) and view its etymological tree, i.e., the tree of the word's ancestors, the descendants of those ancestors, and the descendants of the word itself. Etymological trees are multilingual trees that show how words in different languages have evolved from a common ancestor. By clicking on words in the tree, users can also see the lexical data associated with them, such as part of speech (POS) and definitions.
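As a rough illustration of the tree definition above (ancestors, descendants of ancestors, and descendants of the word itself), here is a minimal Python sketch. It is not etytree's actual implementation: the edge list is hand-written, simplified toy data rather than output of the extraction, and the `lang:form` labels and all function names are assumptions made for this example.

```python
from collections import defaultdict

# Hand-written, simplified toy data (NOT extracted by etytree).
# Each pair is (word, direct ancestor); labels use a hypothetical "lang:form" scheme.
EDGES = [
    ("eng:door", "enm:dore"),
    ("enm:dore", "ang:duru"),
    ("ang:duru", "gem-pro:*durz"),
    ("deu:Tür", "gem-pro:*durz"),
    ("eng:cat", "ang:catt"),  # unrelated family, should never appear in the door tree
]


def build_index(edges):
    """Index the edge list in both directions."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for word, ancestor in edges:
        parents[word].add(ancestor)
        children[ancestor].add(word)
    return parents, children


def reachable(start, neighbours):
    """Return every node reachable from start, excluding start unless a cycle leads back."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:  # the seen set also guards against cycles in the data
                seen.add(nxt)
                stack.append(nxt)
    return seen


def etymological_tree(word, edges):
    """Nodes of the tree: the word, its ancestors, and all descendants of either."""
    parents, children = build_index(edges)
    ancestors = reachable(word, parents)
    nodes = {word} | ancestors
    for node in {word} | ancestors:
        nodes |= reachable(node, children)
    return nodes


if __name__ == "__main__":
    for node in sorted(etymological_tree("eng:door", EDGES)):
        print(node)
```

Searching for either `eng:door` or `deu:Tür` yields the same set of nodes, because both words share a common ancestor in the toy data; this mirrors how a single tree connects words from different languages.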

This is an IEG project centered around Wiktionary that could produce data for Wikidata once a structure for lexical data is available in Wikidata (see the Wikidata for Wiktionary project). For this purpose, as suggested before (see here), the primary sources tool could be used, since the data would need a validation step.

The slides above are screenshots of the interactive visualizations produced by the graphical and multilingual etymology dictionary etytree. The tool is currently under development and can be tested at etytree. Users can click on language tags and words to see definitions. We have also started a Wikidata project: Etymology.

What will attendees take away from this session?

Attendees will:

  1. Get a general sense of how the data has been extracted.
  2. See a few examples of etymological trees constructed by etytree using data extracted from Wiktionary. The dictionary can be used to search for any word in any language (in principle), and it shows graphically how words in different languages are connected to each other when they derive from the same root. Users can discover words in other languages that derive from the same ancestral word, as well as relationships between words in the same language.
  3. See how different pages in Wiktionary contain the same (and sometimes contradictory) information.
  4. See difficulties encountered during the extraction process.

Slides

 

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest.

  1. --Micru (talk) 13:46, 11 August 2017 (UTC)
  2. --Sannita - not just another it.wiki sysop 16:43, 1 September 2017 (UTC)
  3. ...