Wikidata:Property proposal/text features

Text features

number of words

Originally proposed at Wikidata:Property proposal/Sister projects

Description: number of words in text
Data type: Quantity
Domain: Wikisource texts
Allowed values: >0
Allowed units: none
Example 1: À M. Paul Foucher (Q55867126) → 1000 (replace with actual number)
Example 2: À M. des Herbiers (Q55867160) → 800 (replace with actual number)
Example 3: À son frère (Q55867161) → 700 (replace with actual number)
Planned use: add to some Wikisource texts

Discussion

  •   Comment It could be interesting, but the rules for computing the number of words should be fixed, because the exact same text can have different word counts in different systems... --Hsarrazin (talk) 08:11, 14 November 2018 (UTC)
  •   Support @Hsarrazin: determination method (P459) can be used to denote which method was used to count words. Dhx1 (talk) 10:28, 14 November 2018 (UTC)
  •   Comment Also suggest the domain be expanded to cover all texts described by Wikidata, not just Wikisource items, where a reliable source exists for the word count. Dhx1 (talk) 10:33, 14 November 2018 (UTC)
    • Yes, P459 would generally be added with a value that links to a fairly detailed explanation of how the count is done. Personally I'd start out with Wikisource and see how it goes. Eventually it could be expanded. --- Jura 12:23, 15 November 2018 (UTC)
  • tend to   Oppose, seems like a specific version of number of parts of this work (P2635), or redundant with the scheme:
    ⟨ text ⟩ has part ⟨ word ⟩
    quantity (P1114)   ⟨ number of words ⟩.
    Also see type–token distinction (Q175928): there is a difference in the number of word-types used (if you use « dog » twice in your text, this counts as one word-type but two « occurrences » of the word « dog »). Actually, we may be able to solve this with the pair of properties has part / has part(s) of the class (P2670), now that I think of it:
    ⟨ the text ⟩ has part(s) of the class (P2670)   ⟨ word-type ⟩
    quantity (P1114)   ⟨ the number of different word-types ⟩
    and
    ⟨ the text ⟩ has part ⟨ word ⟩
    quantity (P1114)   ⟨ the number of different occurrences ⟩
    Indeed, « word-type » can be thought of as a metaclass of words, and « has part of the type » can cross the boundary between the class level and the metaclass level (that’s what it is for, actually), while « text » and « words » can be thought of as classes of the same level: you use words to build text, and each time you copy a text you copy all of its words along with the text. author  TomT0m / talk page 13:16, 20 November 2018 (UTC)
    • Thanks for your input. number of parts of this work (P2635) could work if we were just interested in one aspect, but using units to differentiate between types of parts seems complicated, as we would need to retrieve the detailed SPARQL node each time. has part(s) of the class (P2670) seems a good alternative, but as we will likely have several values for the statements (depending on the calculation method), selecting the correct one is slightly easier with a separate property. Furthermore, as this property will apply to many items, I think a dedicated property is preferable. --- Jura 06:42, 23 November 2018 (UTC)
      @Jura1: The counting method is actually a case that can be discriminated using « has part » / « has part of the type »: « has part of the type » is appropriate, for example, if you count the number of « word-types », per the type-token distinction, and for « word-tokens » we can even use « has part ». I also note that you don’t detail at all how to model the different counting methods. I think it may be far more appropriate not to use arbitrary items for obscure, undescribed methods if we can use generic concepts to model them (an item for « word type », for example, through metaclassification). author  TomT0m / talk page 13:49, 16 December 2018 (UTC)
  •   Support Good idea. I wonder if the domain could indeed be stretched beyond Wikisource entries. Lymantria (talk) 11:14, 16 December 2018 (UTC)
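
To illustrate two points raised in this discussion (the same text yields different word counts under different rules, and word-tokens differ from word-types), here is a minimal Python sketch. The function names, the sample sentence, and the two tokenization rules are hypothetical choices made for illustration only; they are not a counting method proposed or agreed on in this discussion.

```python
import re


def count_whitespace_tokens(text: str) -> int:
    """Count word-tokens by splitting on whitespace (punctuation stays attached)."""
    return len(text.split())


def count_regex_tokens(text: str) -> int:
    """Count word-tokens as runs of word characters (splits on apostrophes and hyphens)."""
    return len(re.findall(r"\w+", text))


def count_word_types(text: str) -> int:
    """Count distinct word-types: case-insensitive unique regex tokens."""
    return len({token.lower() for token in re.findall(r"\w+", text)})


# Hypothetical sample sentence, not taken from any Wikisource text.
sample = "The dog saw the other dog; it's a well-known dog."

print(count_whitespace_tokens(sample))  # 10
print(count_regex_tokens(sample))       # 12
print(count_word_types(sample))         # 9
```

The two token counts differ only because of how « it's » and « well-known » are split, which is the kind of ambiguity a determination method (P459) qualifier would need to document. The type count is lower because « the » and « dog » each occur more than once.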

@ديفيد عادل وهبة خليل 2, Hsarrazin, Lymantria, TomT0m, Dhx1, Jura1:   Done: number of words (P6570). − Pintoch (talk) 20:28, 6 March 2019 (UTC)

number of sentences

Originally proposed at Wikidata:Property proposal/Sister projects

Description: number of sentences in text
Data type: Quantity
Domain: Wikisource texts
Allowed values: >0
Allowed units: none
Example 1: À M. Paul Foucher (Q55867126) → 50 (replace with actual number)
Example 2: À M. des Herbiers (Q55867160) → 40 (replace with actual number)
Example 3: À son frère (Q55867161) → 30 (replace with actual number)
Planned use: add to some Wikisource texts

Motivation (both proposals)

I think it would be good to add such metadata to Wikisource texts. Maybe additional properties can be useful.

@Hsarrazin: who edits there frequently. @Dhx1: who mentioned related readability scores on Project chat --- Jura 05:40, 14 November 2018 (UTC)

Discussion