User:Amadalvarez/sports statistics property
Background
To date, properties related to sports information have been underdeveloped. Player and team characteristics have average coverage, but seasons, leagues, championships, matches, etc. have items with poor content. I think the main reason is the variety of performances and data across sports, which makes it difficult to share a homogeneous ontology. If we focus on the quantitative properties that measure results, we find:
- a basic group that has existed from the very beginning and measures elementary totals: number of matches played/races/starts (P1350), number of points/goals/set scored (P1351), number of wins (P1355), number of losses (P1356), number of draws/ties (P1357), points for (P1358), number of points/goals conceded (P1359).
- a group with total career figures of a player: total goals in career (P6509), total shots in career (P6543), total points in career (P6544), penalty minutes in career (P6546).
- some properties specific to a single sport, such as career plus-minus rating (P6547) for hockey; century breaks (P4912) and highest break (P6590) for snooker; or doubles record (P555) and singles record (P564) for tennis (by the way, with a format completely contrary to good design for quantitative data).
New trend
In late February 2021, a block of baseball-only statistical properties was created to measure several actions of players, and some for matches: number of at bats (P9180), number of hits (P9184), bases on balls (P9188), runs batted in (P9190), stolen bases (P9217), doubles hit (P9220), triples hit (P9225). A request is now open for 13 new properties (1 for rugby, 1 for baseball and 11 for basketball) to collect statistical totals.
We're opening a can of worms! The number of statistical indicators managed by sports specialists is high, and I understand that this is reasonable from the point of view of someone dealing with a single sport. Additionally, these indicators have more than one dimension, as they are usually shown by player, team, season, match, etc. and their combinations (e.g. goals of a player in a match, in a season, in a team, in their career, etc.). If we multiply this by the number of sports (except for cases like P1351, which is shared by many sports), the number of properties will become huge. Also, if someone is willing to maintain a new statistical indicator, who will be in a position to approve or reject the creation of a new property for, say, the number of turnovers (Q354115) or the Pace Factor [1] of an NBA team?
Proposal
My proposal is to handle these indicators in a common structure for all of them, based on the traditional OLAP cube with 3 dimensions: subject, object and scope. Each combination of these three dimensions points to a quantity, that is, the value of the indicator. I'm not talking about having an OLAP; I'm just borrowing the idea of a 3-dimensional representation.
Component description
A "single wildcard property" called “sports statistics”, “statistic indicator value”, “quantity of” (or whatever we agree on; it doesn't matter now) will define the "indicator" we need to collect in a sports item. Let's look at the three dimensions:
- the subject of the measure (player, team, competition, match, ...).
- It answers the question, "Whose indicator is it?".
- In our case, it is the item.
- We may show it on the Z-axis of the cube.
- the object of the measure (points, goals, rebounds, fast laps, etc.).
- It answers the question "What is the concept measured?".
- Here we will use the new wildcard property to define the indicator concept. It is not a closed list, as would happen with specific properties for each indicator.
- We may show it on the X-axis of the cube.
- the scope of the measure.
- It answers the question "Which part does the measure correspond to?". Usually this is a "period of time" rather than a date, for example: season, time in a league, match, career, etc. However, it can also be used to define other criteria, for instance, which member of a team the indicator refers to.
- We may show it on the Y-axis of the cube.
If we project the concepts defined on the cube, the cell on the X-Y-Z coordinate contains the figure of the measured quantity.
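The cube can be pictured as a lookup keyed by the three dimensions. The following is only a minimal sketch in Python, assuming a plain in-memory dictionary; the names `StatCube`, `record`, `value` and `slice_by_subject` are illustrative and not part of any Wikidata tooling. The identifiers and figures are the ones used in the examples on this page.

```python
from typing import Dict, Tuple

# One cell of the cube: (subject, object, scope) -> measured quantity.
Cell = Tuple[str, str, str]


class StatCube:
    """Toy 3-dimensional store: one cell per (subject, object, scope)."""

    def __init__(self) -> None:
        self._cells: Dict[Cell, int] = {}

    def record(self, subject: str, obj: str, scope: str, quantity: int) -> None:
        self._cells[(subject, obj, scope)] = quantity

    def value(self, subject: str, obj: str, scope: str) -> int:
        return self._cells[(subject, obj, scope)]

    def slice_by_subject(self, subject: str) -> Dict[Tuple[str, str], int]:
        """All (object, scope) cells of one subject, i.e. one plane of the cube."""
        return {(o, sc): q for (s, o, sc), q in self._cells.items() if s == subject}


cube = StatCube()
cube.record("Q41421", "Q2353718", "Q282049", 32292)    # Jordan, point, career
cube.record("Q41421", "Q2353718", "Q1321749", 2427)    # Jordan, point, 1996–97 NBA season
cube.record("Q41421", "Q654355", "Q282049", 6672)      # Jordan, rebound, career
cube.record("Q128109", "Q2353718", "Q63859186", 6942)  # Bulls, point, 2019–20 season

print(cube.value("Q41421", "Q2353718", "Q1321749"))
print(len(cube.slice_by_subject("Q41421")))
```

The point of the sketch is that adding a new indicator only adds a new key on the object axis; no new structure is required.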
Examples of use
Here we can see different situations, such as the figures of the whole career, a season or a match for the case of a player, a team or a match.
- the "value" of the "quantity of" property is the object to be measured;
- the value of the P518 qualifier is the scope of the measurement;
- and the value of the P1114 qualifier is the measured figure.
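As a rough illustration of this mapping, the sketch below models one such statement as a small Python record. It assumes nothing about the real Wikibase data format: `StatStatement` and `as_row` are hypothetical names, and `P????` is a placeholder because the proposed property has no identifier yet.

```python
from dataclasses import dataclass

NEW_PROPERTY = "P????"    # placeholder: the proposed property has no ID yet
APPLIES_TO_PART = "P518"  # qualifier holding the scope
QUANTITY = "P1114"        # qualifier holding the measured figure


@dataclass(frozen=True)
class StatStatement:
    subject: str   # item that holds the statement (whose indicator it is)
    obj: str       # main value of the new property (what is measured)
    scope: str     # value of the P518 qualifier
    quantity: int  # value of the P1114 qualifier

    def as_row(self) -> str:
        """Render the statement as 'item | property | value | qualifiers'."""
        return (
            f"{self.subject} | {NEW_PROPERTY} | {self.obj} | "
            f"{APPLIES_TO_PART}={self.scope} | {QUANTITY}={self.quantity}"
        )


# Michael Jordan, basketball game, career, 1072
games = StatStatement("Q41421", "Q18431960", "Q282049", 1072)
print(games.as_row())
```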
In player item:
- Whole career statistics
- Michael Jordan (Q41421) quantity of (new) basketball game (Q18431960)
applies to part (P518) career (Q282049); quantity (P1114) 1072
- Michael Jordan (Q41421) quantity of (new) point (Q2353718)
applies to part (P518) career (Q282049); quantity (P1114) 32292
- Michael Jordan (Q41421) quantity of (new) rebound (Q654355)
applies to part (P518) career (Q282049); quantity (P1114) 6672
- Michael Jordan (Q41421) quantity of (new) assist (Q1510817)
applies to part (P518) career (Q282049); quantity (P1114) 5633
- Michael Jordan (Q41421) quantity of (new) three-point field goal (Q746826)
applies to part (P518) career (Q282049); quantity (P1114) 536
- .... etc.
- Time in league statistics
- Dani Pedrosa (Q313959) quantity of (new) podium (Q5688743)
applies to part (P518) MotoGP (Q10858737); quantity (P1114) 112
- Dani Pedrosa (Q313959) quantity of (new) pole position (Q588596)
applies to part (P518) MotoGP (Q10858737); quantity (P1114) 31
- Dani Pedrosa (Q313959) quantity of (new) fastest lap (Q310258)
applies to part (P518) MotoGP (Q10858737); quantity (P1114) 44
- Dani Pedrosa (Q313959) quantity of (new) point (Q3393320)
applies to part (P518) MotoGP (Q10858737); quantity (P1114) 3087
- Dani Pedrosa (Q313959) quantity of (new) fastest lap (Q310258)
applies to part (P518) 250cc/Moto2 (Q15635270); quantity (P1114) 15
- Dani Pedrosa (Q313959) quantity of (new) point (Q3393320)
applies to part (P518) 250cc/Moto2 (Q15635270); quantity (P1114) 626
- Season statistics
- Michael Jordan (Q41421) quantity of (new) basketball game (Q18431960)
applies to part (P518) 1996–97 NBA season (Q1321749); quantity (P1114) 82
- Michael Jordan (Q41421) quantity of (new) point (Q2353718)
applies to part (P518) 1996–97 NBA season (Q1321749); quantity (P1114) 2427
- .... etc.
- Match
- Michael Jordan (Q41421) quantity of (new) point (Q2353718)
applies to part (P518) Game 4 of 1988 NBA Playoffs Eastern Conference First Round, Chicago Bulls at Cleveland Cavaliers (Q56670521); quantity (P1114) 44
- Michael Jordan (Q41421) quantity of (new) rebound (Q654355)
applies to part (P518) Game 4 of 1988 NBA Playoffs Eastern Conference First Round, Chicago Bulls at Cleveland Cavaliers (Q56670521); quantity (P1114) 5
In team item:
- Chicago Bulls (Q128109) quantity of (new) point (Q2353718)
applies to part (P518) 2019–20 Chicago Bulls season (Q63859186); quantity (P1114) 6942
In match item: examples equivalent to the requested properties.
- 1989 Georgetown vs. Princeton men's basketball game (Q90747291) quantity of (new) rebound (Q654355)
applies to part (P518) Bob Scrabis (Q98841453); quantity (P1114) 2
- 1989 Georgetown vs. Princeton men's basketball game (Q90747291) quantity of (new) rebound (Q654355)
applies to part (P518) Princeton Tigers men's basketball (Q7245013); quantity (P1114) 13
Some considerations
- The property we used in the examples to represent the temporal scope is applies to part (P518), but it could also be criterion used (P1013) or relative to (P2210). It is worth discussing which one we will use in all cases, so that the information can be accessed in a homogeneous way.
- Although not represented in the diagram, the model may incorporate other additional qualifiers.
- Note that when the subject is a match, or when the period (scope) is already defined by the item itself (e.g. Morocco at the 2018 FIFA World Cup (Q44310763)), the scope dimension represents values about clubs or players, or is not needed at all.
- The new property (“sports statistics” or whatever name) must appear as main value (Q54828448).
Overlapping with existing properties
Once this property is approved, it will provide a solution for many of the newly created properties and for those still in the proposal phase. For the historical, widely populated properties, a very precise analysis will be necessary, since in some cases they are applied to multiple situations and with different structures. In general, we could say:
- Current uses as main property can be migrated with few changes. In many cases, the current usage is erroneous and should be fixed in advance.
- When it is being used to display total data (career), it will be possible to migrate it to the new property.
- Many uses as a qualifier of properties P1350, P1351 seem reasonable to maintain, especially when they document a "scope" that has no associated item:
- example to keep: the P54 statements of a player gather the sports chronology and can contain repetitions of the same club, and there are no items for each team-period of a player. Ex: Marek Štěch (Q984140)
- example that could be migrated: Marek Štěch (Q984140) participant in (P1344) 2018–19 FA Cup (Q54866623)
number of matches played/races/starts (P1350) 0
- Many uses as a qualifier of properties P1355, P1356, P1357, P1358, P1359 could be migrated to the new property, but solutions that leave heterogeneous structures for a single purpose must be avoided.
- In any case, the systems that currently consume these structures must be carefully considered during the migration process.
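To make the "could be migrated" case concrete, here is a hedged sketch of how a statistical qualifier on an existing statement might be rewritten into the proposed structure. It is not a bot or an agreed migration procedure: the dictionary layout, the function `migrate_qualifiers`, the placeholder `P????` and the concept placeholder `Q_MATCH` are all assumptions made for illustration (the item that would represent "match played" is not decided on this page).

```python
from typing import Any, Dict, List

NEW_PROPERTY = "P????"  # placeholder: the proposed property has no ID yet


def migrate_qualifiers(
    subject: str,
    statement: Dict[str, Any],
    concept_for: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Turn statistical qualifiers of one statement into new-style statements.

    The old main value (a competition, for example) becomes the scope (P518);
    each mapped qualifier becomes one statement with its figure in P1114.
    Qualifiers without a mapping are skipped, i.e. not migrated.
    """
    migrated = []
    for prop, figure in statement["qualifiers"].items():
        if prop not in concept_for:
            continue
        migrated.append(
            {
                "subject": subject,
                "property": NEW_PROPERTY,
                "value": concept_for[prop],
                "qualifiers": {"P518": statement["value"], "P1114": figure},
            }
        )
    return migrated


# Marek Štěch (Q984140): participant in (P1344) 2018–19 FA Cup (Q54866623), P1350 = 0
old = {"property": "P1344", "value": "Q54866623", "qualifiers": {"P1350": 0}}
concepts = {"P1350": "Q_MATCH"}  # hypothetical placeholder for the measured concept
new_statements = migrate_qualifiers("Q984140", old, concepts)
print(new_statements)
```

Note that a figure of 0 is still a valid value and is carried over unchanged.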
The following table shows the current situation.
Property | Defined as | # entries as principal | # entries as qualifier | Scope |
---|---|---|---|---|
number of matches played/races/starts (P1350) | Undefined | 39190 | 675513 | Principal: career for teams, career for competitions (28222); players' participation in competitions (12406). Qualifier: results within a team (P54) or in a competition (P1344) for players; results of participant teams in competitions (222454) |
number of points/goals/set scored (P1351) | Undefined | 6297 | 756543 | Similar to P1350 |
number of wins (P1355) | Qualifier | 37! | 31958 | Principal: usually shows career info for players, and info that should be in P1350 for teams. Qualifier: results within P1344, P1923, P2416, P710, mainly. |
number of losses (P1356) | Undefined | 25 | 30956 | Principal: career info for players and autodefined scope items for teams. Qualifier: results within P1344, P1923, P710, mainly. |
number of draws/ties (P1357) | Qualifier | 22! | 30732 | Principal: career info for players and autodefined scope items for teams. Qualifier: results within P1344, P1923, P710, mainly. |
points for (P1358) | Qualifier | 14! | 101002 | Principal: career info for players and autodefined scope items for teams. Qualifier: results within P1344, P1923, P710, mainly. |
number of points/goals conceded (P1359) | Undefined | 242 | 16958 | Principal: career info for players (goalkeepers, mainly) and goals received for teams/season. Qualifier: results within P1323, P1944, P710, mainly. |
total goals in career (P6509) | Undefined | 11494 | 4 | career |
total shots in career (P6543) | Undefined | 6189 | 2 | career |
total points in career (P6544) | Undefined | 7265 | 4 | career |
penalty minutes in career (P6546) | Undefined | 7250 | 0 | career |
career plus-minus rating (P6547) | Undefined | 6189 | 0 | career |
century breaks (P4912) | Principal | 502 | 1 | career |
highest break (P6590) | Undefined | 658 | 0 | career |
doubles record (P555) | Undefined | 3935 | 5 | career |
singles record (P564) | Undefined | 3956 | 4 | career |
number of at bats (P9180) | Both | 3 | 4 | Principal: career for player. Qualifier: match for competitions |
number of hits (P9184) | Both | 3 | 4 | Principal: career for player. Qualifier: match for competitions |
bases on balls (P9188) | Both | 2 | 3 | Principal: career for player. Qualifier: match for competitions |
stolen bases (P9217) | Both | 4 | 6 | Principal: career for player. Qualifier: match for competitions |
doubles hit (P9220) | Both | 2 | 4 | Principal: career for player. Qualifier: match for competitions |
triples hit (P9225) | Both | 2 | 4 | Principal: career for player. Qualifier: match for competitions |