User talk:Hjfocs/Archive 1


Feel free to get in touch with me here!

Unwanted references

Just asking if you are aware that there seems to be quite a big number of unwanted URL references being offered by the Wikidata Primary Sources tool within the domain of films. The type of references I am referring to are spam at best, and illegal at worst. For example, some just lead to pages such as [1] as a reference for The Alamo (Q1621909). Others are just links to shopping pages which I suspect are just spam. Perhaps there is some way that some of these references could be automatically deleted? Those which link to torrents are especially unwanted because it may be illegal to build a library of links to torrents of pirated movies. Danrok (talk) 23:29, 2 October 2015 (UTC)

Hey @Danrok: those references come from the Google Knowledge Vault project, which is claimed to feed the Freebase datasets in the primary sources tool (cf. the project page). There seems to be a related open issue in the tool repo; I think it would be useful to report your comments there as well.
As a side note, such unwanted references are one of the first things that caught my attention when playing with Freebase data, and are one of the main motivations behind StrepHit: I think it is crucial to spend time investigating verifiable sources. Cheers! --Hjfocs (talk) 08:36, 5 October 2015 (UTC)

Primary Sources list

Moin Moin Hjfocs, I have noticed entries that appear in the list, but when I open the item itself, there is no entry. Examples: Károly Makk (Q1046993), Ravi Shankar (Q103774). Somehow I can't see the error. Can you help? Regards --Crazy1880 (talk) 18:15, 7 December 2015 (UTC)

@Crazy1880: thanks for reporting! According, for instance, to this call to the primary sources back-end API, there is indeed data for Károly Makk (Q1046993). Hence it is correct for that item to appear in the list, but it seems nothing is displayed in the front-end!
I think the error may be related to our previous conversation. Unfortunately, the issues (64 and 65) we opened some time ago are still (almost) untouched.
Could you please open a new specific issue in that repository? I find it important to keep track of these problems there.
Google is the owner, and it seems there is no active maintenance.
Good news, however: The StrepHit project has been accepted for funding, so I expect to push forward the development of the tool very soon!
I will get back to you once the project starts, since you are a fundamental resource for testing the data and the tool.
Talk to you soon! --Hjfocs (talk) 19:19, 8 December 2015 (UTC)

No Dataset

Moin Moin Hjfocs, there is no dataset in the list. Is this related to the NFS error tonight? Regards --Crazy1880 (talk) 11:55, 16 December 2015 (UTC)

@Crazy1880: not sure what you mean here? --Hjfocs (talk) 18:30, 18 December 2015 (UTC)
Moin Moin, in the meantime it seems to have been fixed. So all is right. Thanks --Crazy1880 (talk) 18:32, 18 December 2015 (UTC)

Alessio Guidetti grants for Wikimania 2016


Hi, a message from the Alessio Guidetti Grants Committee

Hi, as you may know, this year Wikimania, the annual gathering of the Wikimedia communities, will take place in Esino Lario (Lecco) from 22 to 28 June.

As in past editions of the event, for 2016 the Wikimedia Italia association intends to make some participation grants available.
You can find the call for applications with all the details at this link.
The deadline is 30 April 2016, 23:59 CEST.

You can find all the information about Wikimania Esino Lario on the official website of the event.
Thanks, and sincere wishes for good work and good fun on the free encyclopedia :-)

To stop receiving this type of message, remove your name from these lists.

Alexmar983 13:24, 24 April 2016 (UTC)

More details about your NLP project(s)?

Do you have a report describing the pipeline you are creating to feed wikidata with content and references? I will be presenting wikidata at the 2016 BioCreative conference on biomedical language processing and they are very curious about existing efforts along these lines. Our group is also interested in expanding this kind of workflow into the biomedical domain. Thanks! --I9606 (talk) 22:53, 13 June 2016 (UTC)

@I9606: thanks for reaching out! You can find below a list of links:
As a side note, I am also collaborating with ContentMine, who has been working on fact extraction from academic papers, using PubMed as a prominent source. @Bomarrow1: looping you in.
Cheers, --Hjfocs (talk) 11:08, 14 June 2016 (UTC)
@Hjfocs, Bomarrow1: I'm curious to hear if you have made any headway on the biomedical aims mentioned in the initial proposal? Seems like maybe no? I think there is a pretty good chance that the BioCreative folks will be interested in running a challenge on a task very much like the one you proposed for your IEG here. Any advice on guiding that discussion would be great. Also, if they are indeed interested, it would be great for you to get involved with leading the challenge. The conference is one month away now. Good luck and keep us all posted! --I9606 (talk) 16:45, 1 July 2016 (UTC)
@I9606: In these 6 months, we implemented the biographical domain, although the biomedical one was a candidate (cf. m:Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Timeline#Sources_identification).
Feel free to share all the StrepHit details with the folks you mentioned.
Cheers, --Hjfocs (talk) 08:41, 4 July 2016 (UTC)

Wikidata:Press coverage

Hi Marco, do you see a way to leverage your news mining to update that page on a more regular basis? --Daniel Mietchen (talk) 22:37, 7 June 2017 (UTC)

@Daniel Mietchen: It's technically feasible, although it would require implementing Web spiders for a given set of news sources. The code to patch would be this:
Basically, you need to write a spider for each source. Cf., for instance, the NNDB spider:
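To illustrate the idea, here is a minimal, hypothetical sketch of what a per-source spider does: fetch a news index page and extract article links. It is not the actual StrepHit code (which uses its own spider framework); the sample HTML and class name are assumptions, and only the Python standard library is used:

```python
from html.parser import HTMLParser

class NewsLinkSpider(HTMLParser):
    """Collect (url, title) pairs for every <a href> tag on a news index page.
    A real per-source spider would encode each outlet's actual markup."""

    def __init__(self):
        super().__init__()
        self.links = []        # harvested (href, anchor text) pairs
        self._in_link = False  # are we currently inside an <a> tag?
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self._in_link = True
                self._href = attrs["href"]
                self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_link = False

# Feed a tiny hypothetical index page instead of a live HTTP response
spider = NewsLinkSpider()
spider.feed('<div class="headline"><a href="/wikidata-story">Wikidata in the news</a></div>')
```

In a real deployment, the `feed()` call would receive the downloaded index page of one specific news source, and a filtering step would keep only links whose text mentions Wikidata.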
Hjfocs (talk) 12:53, 8 June 2017 (UTC)
Hmm. I can't help with that. Pinging Léa. --Daniel Mietchen (talk) 16:06, 8 June 2017 (UTC)
Hello, I don't have the technical skills to do that. Why not ask on Project Chat or the wikidata-tech mailing list to see if a volunteer can help? Lea Lacroix (WMDE) (talk) 07:34, 9 June 2017 (UTC)
@Lea Lacroix (WMDE): sure, why not? @Daniel Mietchen: feel free to open a new thread. Please don't forget to mention me if you use the project chat, thanks. --Hjfocs (talk) 17:28, 9 June 2017 (UTC)


Your bot has been listed at Wikidata:Requests for permissions/Removal/Inactive bot accounts as being inactive for over two years. As a housekeeping measure, it is proposed to remove the bot flag from inactive bot accounts, unless you expect the bot will be operated again in the near future. If you consent to the removal of the bot flag (or do not reply on the deflag page), you can re-request the bot flag at Wikidata:Requests for permissions/Bot should you need it again. Of course, you may request retaining your bot flag here if you need it. Regards--GZWDer (talk) 12:31, 26 June 2017 (UTC)

Translation administrator

You should request a translation administrator flag, which allows you to move pages without affecting their existing translations.--GZWDer (talk) 14:58, 29 June 2017 (UTC)

@GZWDer: thanks a lot for the heads-up (I was not aware of that), and for your work in Wikidata:Primary_sources_tool. --Hjfocs (talk) 10:55, 30 June 2017 (UTC)

Wikidata Entity Linking

Hi! I am trying to make a map of the existing Entity Linking services that can output Wikidata identifiers. Of course, most systems that output Wikipedia links can be converted a posteriori to return QIDs (by using sitelinks), but I am interested in systems that deal with Wikidata specifically (i.e. which would be able to output Wikidata items that do not have any sitelinks). I saw that in StrepHit you use the Dandelion API. Are you happy with it, and are you aware of any alternatives? Thanks! − Pintoch (talk) 13:08, 29 June 2017 (UTC)
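The a-posteriori conversion described above (Wikipedia title → QID via sitelinks) can be sketched with the Wikidata `wbgetentities` API. The helper names below are illustrative, and the parsing is demonstrated against a canned response shaped like the real API output, so no network access is needed:

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def sitelink_query_url(title, site="enwiki"):
    """Build a wbgetentities request that resolves a Wikipedia title to an item."""
    params = {
        "action": "wbgetentities",
        "sites": site,       # which Wikipedia the title comes from
        "titles": title,
        "props": "sitelinks",  # we only need the mapping, not full entity data
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)

def qid_from_response(response):
    """Extract the QID from a wbgetentities JSON response, or None if unresolved."""
    for entity_id in response.get("entities", {}):
        # unresolved titles come back under the key "-1" with a "missing" flag
        if entity_id.startswith("Q"):
            return entity_id
    return None

# Canned response shaped like the real API output for a resolvable title
canned = {"entities": {"Q42": {"id": "Q42", "sitelinks": {}}}}
print(qid_from_response(canned))  # → Q42
```

Note the limitation Pintoch points out: this route only works for items that actually have a sitelink, which is exactly why Wikidata-native linkers are interesting.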
