User:TiagoLubiana/WikidataPodAI

So, I am experimenting with the concept of turning the weekly Status Updates into a podcast using ChatGPT and some AI voice generators. Let's see how this goes!

1st iteration - Feb 08 2024

After playing around with ChatGPT's "Create a GPT" system, I was reasonably satisfied with the output. The most valuable things I did were (1) telling it to play the role of an experienced Wikidatian talking to a fellow Wikidatian and (2) asking for the text to be screen-reader friendly. This is the current configuration of the GPT:

Configurations

  • Name: WikiPod Parser
  • Description: Crafts screen reader-friendly, neutral podcast scripts from Wikidata summaries for experienced Wikidatians.
  • Instructions: WikiPod Parser is expertly designed to convert Wikidata Weekly Summaries into podcast scripts that are not only neutral and factual but also optimized for screen reader accessibility. This transformation involves reformatting bullet-point data into a narrative that flows like a conversation among experienced Wikidatians. The goal is to produce content that feels like an insightful discussion between knowledgeable peers, without sacrificing the clarity or factual integrity that Wikidata editors expect. By focusing on creating seamless narratives that logically connect information, the GPT ensures the content is easily navigable and comprehensible through auditory means. It takes on the role of a well-informed, impartial journalist, adept at crafting engaging and coherent stories from structured data, aiming to deliver a comprehensive and listener-friendly podcast experience for the Wikidata community. The GPT avoids using abbreviations to ensure clarity and accessibility, making the scripts as inclusive as possible for all listeners.


I've paid 30 USD out of my (Brazilian student) budget for LOVO.ai's basic monthly plan, which includes 2 hours of AI voice generation and enables downloads. I'm not sure it was the best value for money, but I did like their system, and it was good enough for this prototype.
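As a rough check on the cost-benefit question, and assuming the 2 hours are a monthly quota that gets fully used, the plan works out to:

```latex
\frac{30\ \text{USD}}{2\ \text{h} \times 60\ \tfrac{\text{min}}{\text{h}}}
  = \frac{30\ \text{USD}}{120\ \text{min}}
  = 0.25\ \text{USD per minute of generated audio}
```

Any minutes left unused in a month raise the effective cost per published minute.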

Tooling

  • Text generation: custom GPT (WikiPod Parser) fed with the full wikitext source of this week's summary.
  • Voice generation: LOVO.ai's Genny voice and podcast generator (https://lovo.ai/). Very open to suggestions on that step!
  • Deployment: Genny link
  • Distribution: a YouTube video on my channel, plus sharing in the Telegram groups and in the next Weekly Summary to gather feedback.
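The text-generation step starts from the summary's wikitext. Below is a minimal Python sketch of one way to fetch it, using MediaWiki's standard `action=raw` view. It assumes the Weekly Summaries live at pages titled `Wikidata:Status updates/YYYY MM DD`, which is an assumption worth checking against the actual page. The helper names `summary_title`, `raw_url`, and `fetch_wikitext` and the User-Agent string are hypothetical; the result would still be pasted into the custom GPT by hand.

```python
from datetime import date
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: Weekly Summaries are titled "Wikidata:Status updates/YYYY MM DD".
SUMMARY_PREFIX = "Wikidata:Status updates/"

# Hypothetical placeholder; Wikimedia asks clients to send a descriptive User-Agent.
USER_AGENT = "WikidataPodAI-sketch/0.1 (hypothetical contact)"


def summary_title(issue_date: date) -> str:
    """Build the (assumed) page title of the Weekly Summary for a given date."""
    return f"{SUMMARY_PREFIX}{issue_date:%Y %m %d}"


def raw_url(title: str) -> str:
    """Return the action=raw URL, which serves the page's wikitext as plain text."""
    # MediaWiki URLs use underscores for spaces; keep ':' and '/' readable.
    return (
        "https://www.wikidata.org/w/index.php?title="
        + quote(title.replace(" ", "_"), safe=":/")
        + "&action=raw"
    )


def fetch_wikitext(issue_date: date) -> str:
    """Download the summary's wikitext (requires network access)."""
    request = Request(raw_url(summary_title(issue_date)), headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_wikitext(date(2024, 2, 5))` would request `https://www.wikidata.org/w/index.php?title=Wikidata:Status_updates/2024_02_05&action=raw`, if the page follows the assumed naming.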

Links

Generated script

Welcome to this week's edition of the Wikidata Weekly Summary podcast, where we delve into the happenings and highlights within the Wikidata community for the week leading up to February 5, 2024. Let's unwrap the discussions, developments, and events that have been stirring interest among Wikidatians.

Starting with discussions, the community saw new bot permission requests, indicating the tireless efforts to enhance data quality on Wikidata. SamoasambiaBot is on a mission to import coordinate values from Finnish Wikipedia articles and to add corresponding location statements. Meanwhile, MidleadingBot 5 has been proposed to create items for books housed in the National Diet Library, a move towards expanding the bibliographic coverage on Wikidata.

Closed discussions varied in outcomes, including closures due to being "Not an RfC discussion," "No Consensus," and "Stale." These discussions ranged widely from links concerning MotoGP riders to debates on creating new properties for beaches, reflecting the diverse and dynamic nature of conversations within the community.

The events calendar was bustling with opportunities for learning and engagement. Highlights included the LD4 Wikidata Affinity Group call, discussing Wikidata for Education and the Dagbani NLP project, and the kick-off panel discussion for the Black Histories Wikipedia & Wikidata Edit-a-thon. The Wikidata 101 clinic also promised to be an enriching session for those looking to deepen their understanding of Wikidata.

In terms of publications and resources, blogs like the Diff Blog and the ZBW Labs Blog offered insights into the roles of librarians in the Wikidata ecosystem and the expanded access to the 20th Century Press Archives, respectively. Papers and videos further enriched the discourse around Wikidata, exploring its challenges and opportunities in the Semantic Web community and beyond.

The tool of the week spotlighted a query and user interface for exploring the works of Venezuelan authors, artists, scientists, and creators in the public domain, showcasing the innovative uses of Wikidata for cultural and historical exploration.

Updates on Wikidata's development goals for the first quarter of 2024 highlighted the platform's commitment to improvement and to adapting to user needs. The introduction of new properties and proposals, from "official list URL" to "Unified Saudi Occupational Classification," demonstrated the ongoing expansion of Wikidata's data model.

Lastly, the weekly tasks encouraged community engagement across various fronts, from adding labels in multiple languages to contributing to showcase items and lexemes. These tasks play a crucial role in enriching and ensuring the accuracy of the vast knowledge repository that Wikidata represents.

This summary serves as a testament to the vibrant and collaborative spirit of the Wikidata community, underlining the platform's key role in the ecosystem of free knowledge and its continuous evolution driven by the contributions of its dedicated members.

Comments

Feel free to add comments below!

  • The voice fluidity is not perfect, but it is okay. It is all a bit monotonous and lacks some kind of background music or tonal variation. 30 USD / month is not outrageous, but still a bit expensive. As a Wikidatian, though, I find it already sounds okay. The beginning and end could be a bit more meaningful. ~~~~