
Wikidata talk:WikiProject Freebase

What is this?

I hardly know what this is but it seemed interesting, and I could not find a place to discuss this migration. It seemed big enough to warrant its own project page. I hope someone explains more. Blue Rasberry (talk) 15:02, 17 December 2014 (UTC)

Thanks for setting up the page! --Denny (talk) 18:07, 17 December 2014 (UTC)
I'm interested in learning more, too. I am a long-time Wikipedian who was just getting into Freebase editing, doing simple data entry and clean-up. More recently, I've been dipping my toe into Wikidata editing, but I have been admittedly a little lost with how and what to contribute here. Now I feel like I have to bring myself up to speed on the finer aspects of both projects if I want to salvage the work I've done on Freebase. Dancter (talk) 18:43, 18 December 2014 (UTC)

Properties

Maybe it's a good idea to make a list of Wikidata-properties and their equivalent on Freebase, so we can look if we need some extra properties. Sjoerd de Bruin (talk) 18:22, 17 December 2014 (UTC)

Yes, that would be extremely useful. We could also add it as a statement on the property page here. --Denny (talk) 22:11, 17 December 2014 (UTC)
How would that work? So, for example, if it was determined that the closest analog to the Freebase property "Based on" on Wikidata is "fictional analog of," would we add a statement to Property:P1074 with the mid value /m/02h7lm7 for the Freebase identifier? Dancter (talk) 18:58, 18 December 2014 (UTC)
I guess so. What would be useful would be two different kind of mappings: "these two properties have exactly the same meaning", and "this property on Freebase is a strict subset of this property on Wikidata". Both these annotations would be extremely valuable, actually necessary, for creating the mapped data. --Denny (talk) 16:53, 22 December 2014 (UTC)
Could someone start a three-column list of Freebase properties? Then we could add the corresponding WD properties (2nd column) and comments (3rd column). Also: isn't Wikipedia the main source for Freebase? Does Freebase "know" what info it has imported from Wikipedia? How can we avoid re-importing outdated data that has already been corrected on WP? BTW: I've seen that many items on Freebase are not linked to WD through Freebase ID (P646) even if they have a link to WP. --Kolja21 (talk) 19:01, 23 December 2014 (UTC)
Should the list be placed in a subpage, such as Wikidata:WikiProject Freebase/Properties? Are there any properties that don't need to be listed, such as ones that were deprecated on Freebase, or do we want to look at them all? There's probably a much easier way to generate a formatted list of all the types and properties in the Freebase Commons, but I'm not familiar with MQL. For me, it involves a lot of copying and pasting, and it would take quite a while to list everything. Perhaps it would help to prioritize and divide? Here is a list of most of the Commons domains on Freebase. There are a few others such as Metaweb System Types, Common, and Freebase that may be important. Also, there is probably a lot of useful data in bases such as Schema staging and Web Video. I was thinking that for a particular domain, it may be a good idea to solicit input from people with special expertise or interest in that domain, such as through a related WikiProject. Dancter (talk) 18:46, 24 December 2014 (UTC)
Hey guys, here's a list of most-used freebase types, and the mappings can occur there. A similar output of freebase properties is also possible, if it is desired. cheers Spencerk (talk) 11:41, 5 January 2015 (UTC)
ACTION ITEM - We're going to need a "disambiguating" flag for a Property in Wikidata, which will be useful for the new Wikidata API being worked on now. Thadguidry (talk) 02:14, 7 January 2015 (UTC)

Statistical mappings

Hi guys, I think the freebase-wikidata properties mappings can be produced automatically. When freebase says:

   berlin   location/containedby   germany 

and wikidata says:

   berlin   P30   germany

Because we already know the ids, over even a small number of topics you can develop confidence that p30 and 'location/containedby' refer to the same thing. I've done this on an 800-topic sample, and the results are here. The code is here on github. cheers Spencerk (talk) 17:16, 5 January 2015 (UTC)
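The counting idea described above can be sketched roughly as follows. This is only an illustration, not the code linked on github: it assumes both datasets are already available as (subject, property, object) triples with reconciled ids, and the function name `cooccurrence_mapping`, the `min_support` threshold and the sample triples are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_mapping(fb_triples, wd_triples, min_support=2):
    """Suggest Freebase -> Wikidata property pairs from shared (subject, object) pairs.

    Assumption: subjects and objects use the same ids in both datasets.
    """
    # Index the Wikidata properties by the (subject, object) pair they connect.
    wd_index = defaultdict(set)
    for subj, prop, obj in wd_triples:
        wd_index[(subj, obj)].add(prop)

    # Count how often each Freebase property coincides with each Wikidata property.
    counts = defaultdict(Counter)
    for subj, prop, obj in fb_triples:
        for wd_prop in wd_index.get((subj, obj), ()):
            counts[prop][wd_prop] += 1

    suggestions = {}
    for fb_prop, counter in counts.items():
        wd_prop, support = counter.most_common(1)[0]
        if support >= min_support:
            # Share of all co-occurrences that went to the best candidate.
            suggestions[fb_prop] = (wd_prop, support / sum(counter.values()))
    return suggestions


# Hypothetical sample data following the berlin/germany example above.
fb = [
    ("berlin", "location/containedby", "germany"),
    ("paris", "location/containedby", "france"),
    ("rome", "location/containedby", "italy"),
]
wd = [
    ("berlin", "P30", "germany"),
    ("paris", "P30", "france"),
    ("rome", "P30", "italy"),
    ("rome", "P17", "italy"),
]
print(cooccurrence_mapping(fb, wd))
```

The score is only the share of co-occurrences won by the best candidate, so a larger sample and a higher `min_support` would be needed before trusting a suggestion.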

@Spencerk: these lists are perfect, thank you very much! I will definitely use them for creating the candidate files! Thank you! Any further help on that is appreciated. --Denny (talk) 18:00, 6 January 2015 (UTC)

cool, thanks Denny, let me run it on a bigger sample, and i'll add the wikidata property names so it's easier to debug.
...Given that it's just fb >> wikidata, it's okay for a fb property to be a subset of the wikidata one, right? cheers Spencerk (talk) 12:52, 7 January 2015 (UTC)
Yes, subsets of Wikidata properties are fine. Basically, what we need is (formally speaking) a list of Freebase-properties that are subproperties of Wikidata-properties. This will allow us to make mappings and provide the data for upload. I already did a first test with /people/person/nationality -> P27, and got several hundred thousand mapped instances. That's pretty neat :) --Denny (talk) 16:13, 8 January 2015 (UTC)
How will properties such as /people/person/parents be dealt with? Dancter (talk) 17:04, 8 January 2015 (UTC)
Tougher question. Probably based on P21 of the object, use P22 or P25. But I guess we will need to define such mapping rules individually for each property we map. --Denny (talk) 17:34, 8 January 2015 (UTC)
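A rule of the kind suggested here could look like the sketch below. It is an assumption about how such a rule might be written, not the actual import logic: the function name `map_parent_statement` and the `gender_of` lookup (parent item to P21 value) are hypothetical, while Q6581097 and Q6581072 are the Wikidata items for male and female.

```python
MALE = "Q6581097"    # Wikidata item for "male"
FEMALE = "Q6581072"  # Wikidata item for "female"


def map_parent_statement(child, parent, gender_of):
    """Turn one /people/person/parents triple into a P22 or P25 statement.

    gender_of is a hypothetical lookup from an item to its P21 value.
    Returns None when the parent's P21 is missing or has another value,
    so that the statement can be left for manual review.
    """
    gender = gender_of.get(parent)
    if gender == MALE:
        return (child, "P22", parent)  # father
    if gender == FEMALE:
        return (child, "P25", parent)  # mother
    return None


# Hypothetical items for illustration only.
genders = {"Q_father": MALE, "Q_mother": FEMALE}
print(map_parent_statement("Q_child", "Q_father", genders))
print(map_parent_statement("Q_child", "Q_unknown", genders))
```

Returning None instead of guessing keeps the ambiguous cases visible, so each one can be reviewed by hand rather than imported blindly.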
That'd probably be a good rule for suggestion, but I think it'd be good for whatever tool is used to import the data to allow the user the option to adjust things for each statement case. For example, it is possible for there to be an instance in which the sex or gender (P21) value on Wikidata is actually incorrect. The user should still be able to use the tool to add the correct statement of father (P22) or mother (P25). The nice thing about a statistical approach to property mapping is that regardless of how cleanly a Freebase property maps based on strict definition, if it is frequently (mis)used in a particular way (i.e. /business/business_location, /organization/organization_partnership/members), in theory an alternative could still be suggested that may allow users to add a correct statement, even if the original statement as defined was technically incorrect. Dancter (talk) 18:02, 9 January 2015 (UTC)
Denny: So you say you have several hundred thousand nationality claims for humans. What are your sources for these claims? And "imported from freebase dump" (or "copied from Wikipedia categories in 2008 to Freebase / imported from freebase dump to wikidata") is not a source.
How many of these humans have a claimed nationality that didn't legally exist at the time they died? And some of these humans are actually living people and go to real airports and have real border controls... "So you're Russian, where is your passport?" "I'm not Russian." "Wikipedia says you're Russian. Please follow my colleague." Won't happen? It probably will. How many of those several hundred thousand nationality claims are false? How many birth and death dates in Freebase [1][2] are false? How many family relatives are false? How many marital status claims are false?
Freebase data is largely built on Wikipedia infoboxes and other structured data sources like FashionModelDatabase and MusicBrainz. As of January 2014, Freebase had approximately 44 million topics and 2.4 billion "facts" (in the /music domain alone: 27 million topics and 187 million facts [3]). But I read that the number of "confident facts" is only 637 million, where "confident facts" means those with a probability of being true at or above 90%. Which raises the question of how they know what is true, and apparently they don't. "To assess the severity of this problem, we manually labeled a subset of our balanced test set, using an in-house team of raters. This subset consisted of 1000 triples for 10 different predicates. We asked each rater to evaluate each such triple, and to determine (based on their own research, which can include web searches, looking at wikipedia, etc) whether each triple is true or false or unknown; we discarded the 305 triples with unknown labels."
And so should we: Discard the Freebase claims with unknown sources. --Atlasowa (talk) 23:49, 8 January 2015 (UTC)
Hi @Atlasowa:! I agree completely with you. The data should be treated as being without sources. I also wouldn't want it to be simply uploaded, but rather to be piped through the primary sources tool, where each single statement would be evaluated and manually extended with a source, and then the contributor using the tool would manually upload it to Wikidata. There is no plan to simply dump these triples into Wikidata, I think that would be quite bad.
Regarding the numbers you cited, you are mixing up Knowledge Vault with Freebase. For Freebase we estimate a much higher quality than the 90% you cite. Nevertheless, as the examples show, there are plenty of errors in Freebase. Which is why I fully agree that we should not just upload these triples into Wikidata, but provide a curation step. What I was discussing here was merely how to prepare the data for that curation step. I hope that addresses some of the worries you expressed.--Denny (talk) 00:14, 9 January 2015 (UTC)
Hi atlasowa, yes, thanks for sharing that paper. Honestly, I've had conversations with a dozen companies using Freebase, and have never heard accuracy complaints, ever. Since there's no non-statistical KB that's close to the size of Freebase, there's no way to fully evaluate it, though I've never heard of a spot-check that's lower than 95% accuracy. I know you WD folks are keen, but you're missing a huge opportunity. Spencerk (talk) 18:10, 16 January 2015 (UTC)
hi @Denny: Here's a bigger run - Better property mappings cheers! Spencerk (talk) 18:10, 16 January 2015 (UTC)

Scope

Is there going to be discussion on the scope of this effort? Will it involve changing the requirements for data in Wikidata?

(I've started out putting this on the discussion page, but I'm hoping that some discussion of scope will make it onto the main page.)

Pfps (talk) 18:45, 18 December 2014 (UTC)

Wikidata definitely needs a way to host content that doesn't comply with all its technical and notability requirements, and distinguish it from content that does. If data could be stored in several layers (at least "validated" and "not validated"), all problematic content could be flagged as "dirty" without the need to delete it, and it could be cleaned up in progressive steps, rather than rejected. Diego Moya (talk) 17:43, 18 December 2014 (UTC)
That matches what we were thinking would be an excellent way to proceed. Pfps (talk) 18:40, 18 December 2014 (UTC)
I agree that that's the most sensible way to proceed. --Waldir (talk) 17:34, 4 January 2015 (UTC)

Notifications

I posted notice of this page and the Freebase project on the Wikidata chat board and English Wikipedia's village pump.

Blue Rasberry (talk) 19:08, 18 December 2014 (UTC)

Freebase is further discussed on the Wikidata project chat at Wikidata:Project chat/Archive/2015/01#Freebase - large content donation from Google. Blue Rasberry (talk) 21:06, 7 January 2015 (UTC)

Early days and next steps

just catching up on the news now - very exciting stuff!

@Bluerasberry, Denny, Lydia_Pintscher_(WMDE): As discussed on the mailing list, the implications of this announcement are hard to know at this point, but there is clearly interest in having the community involved in the design and development of processes and related tools. I wonder then if an aspect of this is something that could be turned into a Wikimedia IEG in time for the next cycle of funding (which I believe will be between March and May of next year)? My understanding is that Google is helping with the migration of Freebase data, and I know that IEG does not fund MediaWiki software features, but in terms of community organizing efforts by individual Wikidata contributors and developing processes and workflows, there could be something here? (Full disclosure: I am currently a member of the IEG Committee).

Interested to hear your thoughts. -Thepwnco (talk) 23:49, 20 December 2014 (UTC)

I like the thought, but it really depends on the actual proposal. I currently would not know how to make an IEG application that can support that - in the end, the decisions on how exactly to incorporate the new data lie with the community. --Denny (talk) 16:51, 22 December 2014 (UTC)
There is plenty of grant money at the IEG level and not enough people who seek to apply for it. It would be fantastic if this project could lead to someone, anyone, anywhere applying for funds to develop Wikidata. I would comment on any proposal made. Here are some ideas for proposals:
  • Documenting the Freebase content donation
  • Soliciting comments from stakeholders in other Wikimedia projects who might be interested in seeing what Freebase has; bringing more people into Wikidata to look at content here
  • Organizing training to have anyone anywhere process Wikidata and Freebase datasets
  • Developing general Wikidata tutorials or guides using Freebase content as examples
  • Curating existing discussions about Freebase, summarizing the topics, and presenting links to this information in Wikidata
  • In any way, using this donation to increase anyone's engagement in Wikidata
If someone has a small idea and wants a few thousand dollars or has a bigger idea and would like to request USD 10-20k, then that is the scale of funding which is commonly requested at meta:Grants:IdeaLab. Blue Rasberry (talk) 16:56, 22 December 2014 (UTC)
@Bluerasberry: whoa, those are all great ideas! Thanks for sharing and providing your thoughts on the potential for IEG funding within the Wikidata community. I think it's clear that there is no shortage of interesting and exciting Wikidata projects that could be submitted for IEG funding. The key is having contributors within the community who want to take on the responsibility of project lead and/or community manager for the particular initiative.
I also wanted to share that I've just today learned that IEG is actually going to be changing its format for the next round to focus specifically on granting projects that endeavour to increase gender diversity within WMF projects. Now that obviously doesn't rule out a Wikidata-Freebase project, but it does change things quite a bit in terms of planning and preparing a proposal. I have also had confirmed that round 2 of IEG for 2015 will follow the same format as previous rounds and will still accept all applications rather than ones addressing one strategic topic (in the case of round 1, gender diversity). More information should be released in the next month or so. -Thepwnco (talk) 01:49, 23 December 2014 (UTC)
@Thepwnco: I think there is funding by some channel in any case. Especially if some group has community support, is willing to work for the equivalent of USD 25/hour or thereabouts, and makes a proposal for less than USD 10k, I think just about any project could find funding. None of the Wikimedia grant funding channels even get many proposals to consider at all, so any reasonable proposal is already in the top 50% of fundable options. If anyone wanted to actually make a proposal then I would comment on it and solicit comments from others. I would like to see more people making Wikidata more accessible and usable. Wikipedia is big, Wikidata will be bigger, and I would like community input sooner. I wish someone would ask for a grant. Blue Rasberry (talk) 14:55, 24 December 2014 (UTC)

Data-tuning between Wikidata and Freebase

Moin, is it possible to compare Wikidata and Freebase to see which Wikidata items already have Freebase IDs? Or could a bot take that over? And how should the merger go in the end? Regards --Crazy1880 (talk) 11:13, 24 December 2014 (UTC)

Well, we already have Property:P646 which was initially populated by a bot using the Freebase<-->Wikidata dump mapping. Legoktm (talk) 00:43, 1 January 2015 (UTC)

Giving the public a Freebase API via Wikidata Query Service enhancements

The overall effort of a new Wikidata API (mimic Freebase API) is now being tracked in Phabricator here: https://phabricator.wikimedia.org/project/board/891/ Thadguidry (talk) 02:07, 7 January 2015 (UTC)

Property mappings

The data is getting in a state where it is more and more cleaned up. I can now create a list of Freebase-properties that would have the biggest impact if mapped to Wikidata-properties. Here's the top of the list (only properties with datatype item for now):

  1. http://www.freebase.com/location/location/containedby
  2. http://www.freebase.com/location/location/contains
  3. http://www.freebase.com/people/person/place_of_birth
  4. http://www.freebase.com/location/location/people_born_here
  5. http://www.freebase.com/people/person/nationality
  6. http://www.freebase.com/people/profession/people_with_this_profession
  7. http://www.freebase.com/people/person/profession
  8. http://www.freebase.com/music/genre/artists
  9. http://www.freebase.com/music/artist/genre
  10. http://www.freebase.com/people/deceased_person/place_of_death

As you can see, the list of properties includes also reverses, which, in Wikidata, we would often (but not always) skip, i.e. we wouldn't have a property for Nr. 4, people_born_here (which is good, in my opinion).

The list contains a few hundred properties (maybe around a thousand). How do I best share it in such a way that we can work together on a mapping of those properties to Wikidata properties? A few ideas that come to my mind are:

  1. use Freebase ID (P646) on the relevant property to add, e.g. /people/person/nationality to country of citizenship (P27)
  2. use equivalent property (P1628) to connect country of citizenship (P27) to http://www.freebase.com/people/person/nationality
  3. even better, since more flexible, use subproperty of (P1647), i.e. a property to point to a subproperty of the given property (sorry for this sentence). I.e. instead of equalizing country of origin (P495) with /film/film/country (as suggested by the statistical mapping) we would say P495 has - as one of its subproperties - /film/film/country, i.e. every /film/film/country is a P495, but not every P495 is a /film/film/country
  4. instead of doing it in Wikidata, work together in a Google Spreadsheet (which would be a pity to not have the data in Wikidata itself)
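The difference between ideas 2 and 3 can be illustrated with a small sketch. Everything here is hypothetical (the `PropertyMapping` class, the relation labels and the helper names); the two sample mappings are the ones mentioned in the list above. The point is that both kinds of mapping allow converting Freebase data to Wikidata, but only an exact equivalence can safely be read in the other direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyMapping:
    freebase_property: str
    wikidata_property: str
    relation: str  # hypothetical labels: "equivalent" or "subproperty"


MAPPINGS = [
    PropertyMapping("/people/person/nationality", "P27", "equivalent"),
    PropertyMapping("/film/film/country", "P495", "subproperty"),
]


def to_wikidata(fb_property, mappings=MAPPINGS):
    """Both relation kinds justify a Freebase -> Wikidata conversion."""
    for m in mappings:
        if m.freebase_property == fb_property:
            return m.wikidata_property
    return None


def to_freebase(wd_property, mappings=MAPPINGS):
    """Only exact equivalents can be mapped back without losing meaning."""
    return [
        m.freebase_property
        for m in mappings
        if m.wikidata_property == wd_property and m.relation == "equivalent"
    ]


print(to_wikidata("/film/film/country"))  # P495
print(to_freebase("P495"))                # []
print(to_freebase("P27"))                 # ['/people/person/nationality']
```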

Any thoughts on how to proceed best?

Once we have decided on the how, some of the data from the statistical mapping above could be used (but it contains quite a few errors, probably due to the small sample size - anyone wants to run it on a bigger sample?) --Denny (talk) 17:38, 2 March 2015 (UTC)

To experiment, one could try to populate filming location (P915) from http://www.freebase.com/m/01x3gpk for any films here (about 26'000). I created a small project for this (WikiProject Filming Locations).
At a second stage, items for films currently not available at Wikidata could be created and populated. --- Jura 17:45, 2 March 2015 (UTC)
Yes, I think that would be a good start, to first go for items (films) that already are available in Wikidata. I also would be happy to start with a single property, and then expand. What I was trying to get at with my question, though, was how to work together on the list of mappings of properties, e.g. P915 to /film/film/featured_film_location resp. /m/01x3gpk. Or is this too early to think about this and we should simply start with a single property? --Denny (talk) 18:03, 2 March 2015 (UTC)
Some properties might be quite straight forward, others have their own pitfalls. The question might be if import property by property or item by item.
The mapping could be done directly on the property pages. --- Jura 21:03, 2 March 2015 (UTC)
Wikidata items are already being mapped according to Freebase ID (P646), so I think it makes sense to map Wikidata properties the same way. Would these mapping statements have any effect on the import tool after the data is released? It reads to me that, at least with respect to the primary sources tool, the data is being specially prepared according to hard mappings established beforehand.
I like idea 3. I'd prefer not getting too clever about the mappings. I'm assuming the P646 mappings will also be used by third-party clients transitioning from Freebase APIs, so I think that should be taken into consideration. For all the useful Freebase properties that don't have clear and exact Wikidata equivalents, I think we should generally just have Wikidata properties created for them. Also, it's been over 3 months since the transition was announced, and though I expect the pace to pick up somewhat after Freebase goes read-only, I don't think we'll be able to fine-tune the details of integrating every valuable Freebase property in the following 3 months, before Freebase closes down, after which we'll be flying blind. I'd be pessimistic about trying to get the many properties needed for this approach through Wikidata:Property proposal, though. There's so much to do. I don't even know if the issue of how to handle mediators/CVTs has even been broached, yet.
As far as how to collaborate around this process, I prefer something that ensures that each Freebase property has some on-wiki discussion and approval about how it is mapped, however brief. A collaborative spreadsheet such as Google Spreadsheet would probably be the fastest way for everyone to propose, track, and visualize mappings in one place, but personally, due to past experiences, I don't feel comfortable enough about my privacy on Google Docs to use it. My thought is that regardless of how the mappings are proposed/contributed, somewhere there is a wikitable (or wikitables) showing for each mapping the users who've signed off on it, and links to any discussions regarding it. Dancter (talk) 19:19, 24 March 2015 (UTC)
Yes, the data is being specially prepared for Wikidata. I am mapping according to the mappings defined in Wikidata for the items, and I would like to do the same for the properties. I can redo the mapping anytime after the initial release of the tool, and probably will do so as more and more data gets mapped.
The third parties are a good point. For them, a mapping in Wikidata would make most sense.
The point about Google Docs is understood, and would mean we should do the mapping on Wikidata. If we tried a centralized discussion, this might get too big. Hmm.
Since there are fewer properties in Wikidata than in Freebase, it could be effective to discuss the integration on the talk page of the property being mapped to. What I mean: /film/film/genre and /music/artist/genre are both mapped to P136. So let's add two claims (using P646?) on P136 that list both /film/film/genre and /music/artist/genre, and let's discuss anything that needs to be discussed on the discussion page of P136.
Then I can just grab the values on the properties and create the mappings. --Denny (talk) 21:53, 26 March 2015 (UTC)
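Grabbing the values from the properties and turning them into a mapping could be sketched as below. This assumes the claims have already been read from Wikidata into a plain dictionary; the function name `invert_property_claims` and the input shape are hypothetical, and no Wikidata API call is shown.

```python
def invert_property_claims(claims_by_wd_property):
    """Build a Freebase-property -> Wikidata-property lookup.

    claims_by_wd_property is a hypothetical dict such as
    {"P136": ["/film/film/genre", "/music/artist/genre"]}.
    A Freebase property claimed by two different Wikidata properties
    is reported instead of being silently overwritten.
    """
    lookup = {}
    for wd_prop, fb_props in claims_by_wd_property.items():
        for fb_prop in fb_props:
            if fb_prop in lookup and lookup[fb_prop] != wd_prop:
                raise ValueError(
                    f"{fb_prop} is claimed by both {lookup[fb_prop]} and {wd_prop}"
                )
            lookup[fb_prop] = wd_prop
    return lookup


claims = {
    "P136": ["/film/film/genre", "/music/artist/genre"],
    "P27": ["/people/person/nationality"],
}
print(invert_property_claims(claims))
```

Raising on conflicts is one possible choice; such cases are probably the ones worth discussing on the property talk page.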
There's no reason why we can't try to do all of the above, and see what takes. The Google Docs thing is my issue, and I can work around it if need be. I do think there needs to be some sort of central documentation, though. My concern with simply mapping directly to properties is that the only real tool that I am aware of for tracking usage of such mappings would be the "What links here" function, which is fairly crude. Right now, it's easy for me to see the few Freebase mids that were already mapped to Wikidata properties, and by clicking through to each of them, I can see that only one of those mids is for an actual Freebase property. I wouldn't be able to manage that sort of thing as the number of mapped properties grows. Dancter (talk) 01:03, 27 March 2015 (UTC)

Items about people: Freebase

At User:Jura1/People_charts#Resources, I added a Freebase chart. The coverage seems fairly low (10%: 294722 of 2523507).

As Freebase has items on 3.4 million people, is there a way to add more mappings to Wikidata? --- Jura 22:57, 14 June 2015 (UTC)

@tpt: what do you think of it? --- Jura 22:41, 15 June 2015 (UTC)

We use the mapping published by Google that is very conservative (it maps a Freebase item to a Wikidata one if, and only if, they share links to 2 or more Wikipedias) and it is the one that has been imported into Wikidata. But some bigger mapping exists like one from Samsung. We may use it in the future to feed the Primary Source tool as data are not directly added to Wikidata but I am not sure that the quality is good enough for a direct importation into Wikidata (but I don't really know). Tpt (talk) 23:47, 15 June 2015 (UTC)
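The conservative rule described here (a match requires links to 2 or more shared Wikipedias) can be sketched as follows. This is not Google's actual procedure, only an illustration of the rule as stated: the function name, the data shapes and the sample ids are hypothetical.

```python
from collections import Counter


def match_by_shared_sitelinks(fb_links, wd_links, min_shared=2):
    """Match Freebase topics to Wikidata items that share enough Wikipedia links.

    fb_links and wd_links map an id to a set of (wiki, title) pairs.
    A topic is matched only if exactly one item shares at least
    min_shared links with it.
    """
    # Each Wikipedia page belongs to at most one Wikidata item.
    item_by_link = {}
    for item, links in wd_links.items():
        for link in links:
            item_by_link[link] = item

    matches = {}
    for topic, links in fb_links.items():
        shared = Counter(item_by_link[l] for l in links if l in item_by_link)
        candidates = [item for item, n in shared.items() if n >= min_shared]
        if len(candidates) == 1:
            matches[topic] = candidates[0]
    return matches


# Hypothetical ids and titles for illustration only.
fb = {
    "/m/hypothetical1": {("enwiki", "Example Zoo"), ("dewiki", "Beispielzoo")},
    "/m/hypothetical2": {("enwiki", "Some Person")},
}
wd = {
    "Q_zoo": {("enwiki", "Example Zoo"), ("dewiki", "Beispielzoo")},
    "Q_person": {("enwiki", "Some Person")},
}
print(match_by_shared_sitelinks(fb, wd))
print(match_by_shared_sitelinks(fb, wd, min_shared=1))
```

Lowering `min_shared` to 1 shows how many additional topics a less conservative rule would pick up, at the cost of more false matches.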
Would be interesting to know how much we excluded. 10% seems low. How many would there be if 1 link would be sufficient? How many if 1 link + DOB/DOD?
It seems we might miss out on a lot of content. About half the items in enwiki don't link to any other Wikipedia (first chart @ User:Jura1/People_charts#Resources).
Maybe the ones that didn't get mapped don't have that many statements worth importing anyways. --- Jura 08:26, 16 June 2015 (UTC)
There are 1.15M Wikidata items with a Freebase identifier and the Samsung one contains around 4.4M relations. I assume this mapping is mostly done using Wikipedia links but I haven't managed to find out how they did it (they said a few months ago they would publish the source code but it seems they didn't). So, as Samsung mapped 4 times more items, I think we can assume that the number of statements is far from low. Moreover, as these items are about less well-known topics, they are perhaps less well covered by Wikidata. I'll try to investigate more on the Samsung mapping in the coming weeks but I believe that the mapping of properties is a more important goal for now. Tpt (talk) 18:46, 16 June 2015 (UTC)
The numbers I mentioned just cover items with P31=Q5. Coverage is probably higher for geographic feature, usually present in many more languages. --- Jura 07:19, 18 June 2015 (UTC)

Wikidata:WikiProject Freebase/Mapping

FAQ

I added some questions at Help:FAQ/Freebase#How_can_I_access_the_data_in_Freebase.3F_How_can_I_query_Freebase.E2.80.99s_data.3F. --- Jura 07:25, 18 June 2015 (UTC)

Apparently, it's not meant for Wikidata people, just for Freebasers. Well, here are the questions about Freebase: --- Jura 11:39, 18 June 2015 (UTC)

How do I list all properties in Freebase?
http://www.freebase.com/type/property?instances= I'm not sure if there is a more user friendly page. If you want the data in JSON there is the query sandbox. Tpt (talk) 20:42, 18 June 2015 (UTC)
How do I list all possible values for a property in Freebase?
Go to the property description page on Freebase (like http://www.freebase.com/people/person/gender ). The type of the property values is the right box that links to the type description page. If it is an "enumerate" type, the possible values are displayed in the "Topic" section. If it isn't, they are accessible using the "instances" tab. Tpt (talk) 20:42, 18 June 2015 (UTC)
How do I list all items (instances) of a given property and their values in Freebase?
Go to the property description page on Freebase (like http://www.freebase.com/people/person/gender ). Click on the "instances" tab to get the list of instances of this property. Tpt (talk) 20:42, 18 June 2015 (UTC)
How do I make this into a spreadsheet?
I don't think there is such an option in the Freebase UI. You may use the MQL API, which returns JSON, and then convert it to CSV, or use the Freebase dumps directly. Another way is to use the OpenRefine tool, which is able to query Freebase and export data as CSV. Tpt (talk) 20:42, 18 June 2015 (UTC)
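The JSON-to-CSV step mentioned in this answer could look like the minimal sketch below. It assumes the response has already been saved as text and that the rows sit in a list under a "result" key; the function name, the column names and the sample response are hypothetical, and no call to the MQL API is made.

```python
import csv
import io
import json


def mql_result_to_csv(response_text, columns):
    """Convert a saved JSON response into CSV text.

    Assumption: the response is an object whose "result" key holds a
    list of flat objects. Keys not listed in columns are ignored and
    missing keys become empty cells.
    """
    rows = json.loads(response_text)["result"]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical response for illustration only.
sample = '{"result": [{"mid": "/m/hypothetical1", "name": "Example Zoo", "type": "/zoos/zoo"}]}'
print(mql_result_to_csv(sample, ["mid", "name"]))
```

The resulting text can be written to a .csv file and opened in any spreadsheet program.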

Q20666937

I created the above for simple cases. --- Jura 21:14, 13 July 2015 (UTC)

+ Q20680025. --- Jura 14:12, 23 July 2015 (UTC)

Zoo species integration review

I added some data from www.freebase.com/zoos to Property:P1990. There were about 2400 species for 170 zoos. The ones with most species were Q818395 and Q1043283. AmaryllisGardener had added some data to both manually as well.

  • Identifiers for zoos: First I checked how many zoos already had a Freebase identifier (about 150 of 700). I could add another 340 identifiers by matching the names from www.freebase.com/zoos to zoos already at Wikidata (see Q20666937). I found a few additional zoos in Freebase which weren't at www.freebase.com/zoos but generally didn't include much information beyond the name and a Wikipedia excerpt. To cover the 170 zoos entirely, I had to add a few identifiers manually. The multibeacon tool seems to miss a few identifiers, so I worked with the general beacon files. User:Jura1/zoos provides an overview.
  • Identifiers for species: Most species already had identifiers at Commons. Still, there were some I had to map manually as well.
  • There are a couple of additional elements in Freebase that might be worth importing, but I found that, e.g., the date of opening is easier to import from Wikipedia directly. Visitor data is ready for import as well, but QuickStatements has some issues with that datatype.

I would be glad to have your feedback. --- Jura 10:23, 17 July 2015 (UTC)

Congrats! Your work is very nice. I think we should find an efficient workflow that avoids rebuilding everything each time we want to import some data from Freebase. If I have understood your work correctly, we met the same challenges:
  1. Improve the mapping between Wikidata items and Freebase topics so that we don't miss too much data. It would be nice if we had a tool to improve the currently available mappings. A path we could follow is to use the IDs present in both Freebase and Wikidata to do some reconciliation and then use something like mix'n'match for the remaining things.
  2. Do the conversion. I think that my conversion tool is doing it pretty well if challenge 1 is solved and we have mapped properties.
  3. Upload new data into Wikidata. Primary Sources does it currently but the process is far too slow for data that doesn't need human curation. I'm thinking of adding a new mode to Primary Sources to make this process quicker and easier, but it may require a fair amount of work. Until that is done, working directly with the Primary Sources API is possible.
PS: If you want to help with one of these challenges, feel free to ping me on IRC or here.
Tpt (talk) 20:28, 17 July 2015 (UTC)
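The reconciliation path in challenge 1 (use identifiers present on both sides, then hand the rest to something like mix'n'match) can be sketched as below. This is an assumption about how such a step might work, not the existing conversion tool: the function name, the input shapes and the sample identifiers are all hypothetical.

```python
def reconcile_by_shared_id(fb_ids, wd_ids):
    """Match Freebase topics to Wikidata items through a shared external identifier.

    fb_ids maps a Freebase mid to an external id (for example a MusicBrainz id);
    wd_ids maps a Wikidata item to the same kind of external id.
    Returns (matched, unmatched), where unmatched lists the topics left
    for manual matching.
    """
    item_by_external = {}
    ambiguous = set()
    for item, external in wd_ids.items():
        if external in item_by_external:
            # Two items carry the same external id: do not match automatically.
            ambiguous.add(external)
        item_by_external[external] = item

    matched, unmatched = {}, []
    for mid, external in fb_ids.items():
        if external in item_by_external and external not in ambiguous:
            matched[mid] = item_by_external[external]
        else:
            unmatched.append(mid)
    return matched, unmatched


# Hypothetical identifiers for illustration only.
fb = {"/m/hyp1": "ext-001", "/m/hyp2": "ext-002", "/m/hyp3": "ext-003"}
wd = {"Q_one": "ext-001", "Q_dup_a": "ext-002", "Q_dup_b": "ext-002"}
print(reconcile_by_shared_id(fb, wd))
```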
Thanks for the review. I found it a good way to help me identify possible issues with the large scale upload. It's harder to extrapolate from 10s of edits with the Primary Sources tool. BTW for (1) I came across the issue at Property_talk:P646#Freebase_redirects. Maybe my sample was bad, as I got 20 redirects for 400 (or 150?). --- Jura 14:12, 23 July 2015 (UTC)

Next steps

Just wondering, what are the next planned steps and are there any target dates for these? --- Jura 09:18, 15 September 2015 (UTC)

Wondering this also. There is renewed interest to finish the mapping + import. // Sj (talk) 12:25, 20 July 2018 (UTC)

Freebase as a source

Many bots and processes acknowledge a source by identifying it in a "reference statement". When Freebase has a source for a statement, this source is obviously to be preferred. When there is no source, it is best to acknowledge Freebase as the source. It is best practice, it acknowledges the contributions of the Freebase community, and it is the "right" thing to do.

This should be easy to implement. Thanks, GerardM (talk) 20:38, 30 September 2015 (UTC)

Finnish user

Renewed interest in this

I've had a few conversations about this recently; hoping I can help refresh this page.

  • Do we still need a mix/n/match style tool for reconciling + resolving mappings?
  • Tpt commented that the primary sources tool is being updated, after which he can rerun his conversion tool + import
  • Spot checking + more importing would still need to be done by hand

Thanks to Bluerasberry, Denny, Jura, Legoktm and all who have worked on this so far.

Sj (talk) 14:45, 20 July 2018 (UTC)

It could be very nice to have a good topic/item matching tool. We only have 4M of Freebase topics mapped so a very low percentage of the Freebase total. This tool should be fed with the existing automated mappings not already on Wikidata because of their lower quality.
I could easily rerun the conversion tools and update the current Primary Sources database if the version v2 deployment is delayed. Tpt (talk) 14:53, 20 July 2018 (UTC)
I previously commented years ago. Now I understand Wikidata better and have 100 projects here. I am not sure what Freebase information is good for now and what is unused. If there is no champion of this dataset to explain it then I would not know how to engage. Blue Rasberry (talk) 17:48, 20 July 2018 (UTC)
  • How about attempting to import directly topics that Wikidata doesn't cover yet? Initially I had thought that tv series episodes might be a good thing, but their coverage has improved recently.
    --- Jura 09:42, 24 July 2018 (UTC)
We should definitely import some topics but probably not all of them. There were a lot of "sandbox" topics created by users for tests or for trying to get into the Google Knowledge Graph. There are also some domains we probably do not want to import, e.g. the set of topics that were used to represent ISBN identifiers. A massive import would also create some deduplication problems. Tpt (talk) 12:20, 24 July 2018 (UTC)
I think we may set up a (read-only) mirror of Freebase in Wikimedia Labs; this will help future import or integration.--GZWDer (talk) 12:25, 24 July 2018 (UTC)
It would be cool if that could be run as a sparql endpoint that could be queried from query server.
--- Jura 09:24, 25 July 2018 (UTC)

What is the key impact of WikiProjects on Wikidata?

I am new to the Wikipedia project and Wikidata, so before getting involved in any project I wanted to clear up a few doubts with community members:

1- What is the best starting point for learning about WikiProjects? 2- How can I get involved with an existing ongoing project? I would like to join one, for example as an intern.
