
Welcome to Wikidata, Iamcarbon!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike, and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here and become an active editor for Wikidata.

Best regards!

Notability

Thank you for contributing to Wikidata. I see that you recently created an item that does not clearly indicate its notability. The Wikidata project only accepts items that meet its notability criteria, and your item is therefore likely to be deleted soon. In brief, items must have an associated Wikipedia article, must be needed for statements on another notable item, or must have both identifiers and serious sources. For the last case, a good indication of notability would be multiple articles about the subject in independent publications like newspapers or magazines. You can add such sources as references to specific claims using reference URL (P854), or as top-level claims using described at URL (P973).

Also, this may not apply in this specific case, but you should know that we discourage editors from contributing on topics with which they have a strong personal connection, as this may present a conflict of interest. If you are being paid to edit here, then you are obliged to disclose this. For a longer version, you might find it useful to read the essay "How to create an item on Wikidata so that it won't get deleted". Madamebiblio (talk) 10:32, 20 March 2024 (UTC)

Google Knowledge Graph Search API

I'm curious which Knowledge Graph API you're using. I recently imported a batch of entries for people and was stuck with a 1 QPS limit. I was using the Enterprise API, which is still in preview. Is the source for your bot public? I'm curious how you did the matching (e.g. in ambiguous cases).

I really wish Google's API here were better. They clearly have a lot of data tied to these identifiers that they aren't releasing. BrokenSegue (talk) 14:31, 23 March 2024 (UTC)

Hi @BrokenSegue!
I'm also using the same Google Knowledge Graph (GKG) API, but have been able to stay under its 60 queries-per-minute limit because the rest of our pipeline is VERY slow.
For context: over the past ~10 years I have been building and annotating an internal dataset of images (mostly in the arts, architecture, and fashion fields), which has grown to about 15M images. I recently ran these through several new AI models (GPT-4 and Claude) and looked up the overlapping labels on Google Knowledge Graph to determine basic notability and significance. We then cross-checked these against Wikidata to establish additional notability, and found roughly 70K missing IDs.
I was hoping to contribute the most notable of these entities and concepts back to Wikidata to improve interoperability with the Knowledge Graph, but have been treading softly to make sure these contributions meet the project's goals and policies.
I've started this process by manually looking up each entity / concept and determining whether it already exists (and just needs an association), or deciding whether to add it.
Most of these Knowledge Graph entities DO exist. When there are only a few items with the same name, it's quick to find the right one and associate the ID. When there are many, it takes time to look through them all manually to find the right one (particularly when the items lack descriptions or labels). In the worst case, like "Vase of Flowers", there are hundreds of items with the exact same name, and we need to match the actual photo. Matching specific pieces of art is the most tedious and takes a lot of time. My plan is to automate this in the future using a vision model.
Where I have checked all the existing items with the same name, as well as various aliases and alternate identifiers, and am confident the entity does not exist, I have been doing a basic notability check (e.g. making sure there are multiple pages of Google results mentioning it, with the source coming up first). It has been harder to find and reference reputable sources that aren't written by the author, as most of the top-ranking websites are polluted with SEO or locked behind a paywall. It can take up to 15 minutes per item to find 1-2 good sources, even when the item is well known.
I estimate that out of every 100 newly discovered Knowledge Graph labels, only 10 pass my basic notability check. The majority of those already exist (and just need the ID association), and maybe 1-2 of the 100 end up being added manually. I've added around 250 items in total using this process. Iamcarbon (talk) 03:28, 24 March 2024 (UTC)
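The two rate limits mentioned above are the same quota stated two ways: 60 queries per minute is 1 query per second. A minimal client-side throttle that keeps a single, sequential worker under that rate might look like the sketch below. It is an illustration only, not part of Google's client libraries or either bot: the class `MinIntervalThrottle` and its parameters are hypothetical, and it assumes one worker issuing requests one after another. The clock and sleep functions are injectable so the spacing can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Hypothetical helper: spaces calls at least 60 / per_minute seconds apart."""

    def __init__(
        self,
        per_minute: int = 60,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / per_minute  # 60 QPM -> one call per second
        self.clock = clock
        self.sleep = sleep
        # Earliest time the next call may start; None until the first call.
        self.next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

In use, `throttle.wait()` would be called immediately before each Knowledge Graph request. A fixed minimum interval is simpler than a token bucket and never bursts, which suits a pipeline that is already slower than the quota.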
Very cool work. Is this part of a research project, a business, or just a hobby project? My imports of gkids were done much more conservatively: I checked whether the URL they had on file matched the enwiki sitelink. I've also been interested in importing Bing entity ID (P9885), but their API was too expensive last I checked. BrokenSegue (talk) 15:18, 24 March 2024 (UTC)
This is currently a personal project exploring how to improve AI/ML algorithms' ability to cite authoritative data. I'm hoping to apply this work to research and personal knowledge building in the future.
Comparing URLs makes a lot of sense. Once I get a bot going, I'll see if I can make any additional matches using this approach too.
I haven't looked at the Bing entity IDs, but it would be great to get those associated too. I'll add them to my list to look into. Iamcarbon (talk) 21:13, 26 March 2024 (UTC)
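The URL comparison described in this thread can be sketched as below. This is an assumption-laden illustration, not BrokenSegue's actual bot code: it assumes the Knowledge Graph entry carries an English Wikipedia URL (in Knowledge Graph Search API results this typically appears under `detailedDescription.url`) and that the Wikidata item's enwiki sitelink title is already at hand. The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import unquote, urlparse

# Assumption: only desktop and mobile English Wikipedia hosts count as a match.
ENWIKI_HOSTS = {"en.wikipedia.org", "en.m.wikipedia.org"}


def enwiki_title_from_url(url: str) -> Optional[str]:
    """Return the article title from an English Wikipedia URL, or None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ENWIKI_HOSTS or not parsed.path.startswith("/wiki/"):
        return None
    # Decode percent-escapes (e.g. %C3%A9 -> é) and turn underscores into spaces.
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ").strip()
    return title or None


def normalize_title(title: str) -> str:
    """Approximate enwiki title normalization: spaces for underscores, first letter uppercase."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def kg_url_matches_sitelink(kg_url: str, enwiki_sitelink: str) -> bool:
    """True only when the Knowledge Graph URL points at the item's enwiki article."""
    title = enwiki_title_from_url(kg_url)
    return title is not None and normalize_title(title) == normalize_title(enwiki_sitelink)
```

A match is strong evidence that the two identifiers describe the same thing; everything else (no URL, a non-English URL, or a URL pointing at a redirect, which this sketch does not resolve) falls through to manual review.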

Q125214215

Hello! I think iPhone 13 Pro Max (Q125214215) is the same as iPhone 13 Pro Max (Q108541741) and they should be merged. If you don't know how, I can! -wd-Ryan (Talk/Edits) 00:27, 30 March 2024 (UTC)

Identical. Merged! (thank you!) Iamcarbon (talk) 01:54, 30 March 2024 (UTC)