Wikidata Bot

Python, Pywikibot

This user account is a bot with a bot flag. The bot is operated by Amitie 10g.
  • Block this bot if it is malfunctioning.
  • Check its work.
  • Contact the operator about mistakes.
  • See all Requests for Permissions related to this bot: 1
  • Task: import data from TOP500
  • Type of execution: on demand
  • License: MIT license
This bot runs on Wikimedia Toolforge.
Administrators: If this bot needs to be blocked due to a malfunction, please remember to disable autoblocks so that other WMF Toolforge bots are not affected.

TOP500 importer is a project that aims to import data from TOP500 into Wikidata.

Tasks

The import process consists of the following tasks (or steps), in order:

  1.   Get operating systems and pass them to dict.py (most done; more will be added soon)
  2.   Get platforms and pass them to dict.py (some done; more will be added soon)
  3.   Get manufacturers and pass them to dict.py (some done; more will be added soon)
  4.   Get locations and pass them to dict.py (some done; more will be added soon)
  5.   Get CPUs and pass them to dict.py (most done; more will be added soon)
  6.   Import everything to Wikidata (using mass())[n 1][n 2]
  7.   Consolidate duplicates, using the Merge gadget (READ NOTE!!!)[n 3]
    •   Split items that are not true duplicates back apart, as some were merged before the issue described in the note was noticed.
  8.   Check wrong manufacturers and correct them, using QuickStatements
    •   Move manufacturers from HPE to SGI, as appropriate[n 4]
  9.   Repeat steps 1 to 5, to get more data and ensure everything is available.
  10.   Find items with missing statements via bot (code will be written soon)
  1. The import process has been parallelized. However, this task has taken weeks because of the time needed to fetch data from the TOP500 website, plus the maxthrottle parameter (left at the default value provided by Pywikibot), which is kept to avoid database replication lag.
  2. The bot has imported entries up to TOP500 ID 200.000. More runs will be performed soon.
  3. The TOP500 database contains several machines with the same name (normally the system hardware, like Origin 2000, rather than a nickname, like Pleiades) but with different identifiers and, in most cases, different locations; those machines would have different specs. Different installations of machines with the same name should not be considered duplicates, so they will not be merged into older items. Machines with an unknown location, or several machines at the same location, are candidates for merging. This job should be done by a human; once the import has finished, I'll create a page to coordinate the merge candidates, to avoid doing this more than once.
  4. Silicon Graphics (Q623459) created several supercomputers; in 2009 it was purchased by Rackable Systems (which was then renamed Silicon Graphics International (Q17080768)); and in 2016 SGI was purchased by Hewlett Packard Enterprise (Q19923099). As the TOP500 database lists HPE as the manufacturer of machines built by SGI, those items should be corrected according to the period in which the machine was built.
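
The manufacturer correction described in note 4 could be sketched as follows. This is a minimal illustration, not the bot's actual code: the function names are hypothetical, the 2009 and 2016 cut-offs are applied by installation year as an assumption (a machine from the acquisition year itself is attributed to the acquiring company), and the output assumes the QuickStatements V1 tab-separated syntax with P176 (manufacturer), where a leading "-" removes a statement.

```python
from typing import List

# QIDs as given in note 4.
SILICON_GRAPHICS = "Q623459"      # Silicon Graphics (original company)
SGI_INTERNATIONAL = "Q17080768"   # Silicon Graphics International (formerly Rackable Systems)
HPE = "Q19923099"                 # Hewlett Packard Enterprise

MANUFACTURER_PROPERTY = "P176"    # Wikidata property "manufacturer"


def sgi_manufacturer_for_year(year: int) -> str:
    """Return the manufacturer QID for an SGI-built machine installed in `year`.

    Assumption: the acquisition years (2009, 2016) are used as cut-offs.
    """
    if year < 2009:
        return SILICON_GRAPHICS
    if year < 2016:
        return SGI_INTERNATIONAL
    return HPE


def quickstatements_fix(item_qid: str, year: int) -> List[str]:
    """Build QuickStatements V1 lines that replace HPE with the correct manufacturer.

    Returns an empty list when HPE is already the right value.
    """
    correct = sgi_manufacturer_for_year(year)
    if correct == HPE:
        return []
    return [
        # A leading "-" asks QuickStatements to remove the statement.
        f"-{item_qid}\t{MANUFACTURER_PROPERTY}\t{HPE}",
        # A plain line adds the corrected statement.
        f"{item_qid}\t{MANUFACTURER_PROPERTY}\t{correct}",
    ]
```

For a machine installed in 2005, this helper would produce one removal line for Hewlett Packard Enterprise and one addition line for Silicon Graphics. The generated lines should still be reviewed by a human before they are submitted.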

Subpages

  • /created: Log with pages created
  • /status: Status number (1 or 128: running; 0: stopped; 2: error)
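
The status numbers listed above could be interpreted with a small lookup like the one below. This is only a sketch: the helper name is hypothetical, and it assumes that the /status subpage contains nothing but the number, which the page does not confirm.

```python
# Status codes as listed for the /status subpage.
STATUS_LABELS = {
    0: "stopped",
    1: "running",
    2: "error",
    128: "running",
}


def describe_status(raw: str) -> str:
    """Map the raw text of the status page to a human-readable label.

    Anything that is not one of the known numbers is reported as "unknown".
    """
    try:
        code = int(raw.strip())
    except ValueError:
        return "unknown"
    return STATUS_LABELS.get(code, "unknown")
```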

Items created

This list has been generated automatically, and its order is based on the TOP500 ID. Several items seem to be duplicates, but some are distinct instances, so a manual review is in progress. See also: Xtools.

Archives: current 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Page is archived manually
