User:Rdrg109/1/16

Motivation

Someone in /r/ChineseLanguage (Q110820753) asked where to find free research papers on Chinese literature, and I thought this question could be answered with Wikidata. On this page, I present some queries that may surface information relevant to that question.

Queries

With the following query, we can obtain the total number of scholarly articles written in Chinese. At the time of writing, the query returns 1,059,747.

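SELECT (COUNT(*) AS ?count) {
  ?item wdt:P407 wd:Q7850;          # language of work or name (P407): Chinese (Q7850)
        wdt:P31 wd:Q13442814.       # instance of (P31): scholarly article (Q13442814)
}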
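The counts on this page come from the Wikidata Query Service web interface. If you would rather run them from a script, the following is a minimal Python sketch using only the standard library. It assumes the public endpoint https://query.wikidata.org/sparql and the standard SPARQL 1.1 JSON results format; the User-Agent string is a placeholder to replace with your own contact details, since Wikimedia asks automated clients to identify themselves.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The count query from this page.
COUNT_QUERY = """
SELECT (COUNT(*) AS ?count) {
  ?item wdt:P407 wd:Q7850;
        wdt:P31 wd:Q13442814.
}
"""


def build_request_url(query: str) -> str:
    """Encode a SPARQL query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def parse_count(response_text: str) -> int:
    """Read ?count from a SPARQL 1.1 JSON result that has a single row."""
    data = json.loads(response_text)
    bindings = data["results"]["bindings"]
    return int(bindings[0]["count"]["value"])


def fetch_count(query: str) -> int:
    """Run the query against the live service (needs Internet access)."""
    request = urllib.request.Request(
        build_request_url(query),
        # Placeholder User-Agent: replace it with your own tool name and contact.
        headers={"User-Agent": "example-script/0.1 (your-contact-here)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling print(fetch_count(COUNT_QUERY)) prints the current count, which will differ from the figure above as Wikidata is edited; a query over this many items may also occasionally hit the service's 60-second timeout.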

With the following query, we can count the scholarly articles written in Chinese that have a full work available at URL (P953) statement. Since we are interested in papers that are free, we want those that have a URL where they can be accessed. At the time of writing, the query returns 4,987.

SELECT (COUNT(*) AS ?count) {
  ?item wdt:P407 wd:Q7850;          # written in Chinese
        wdt:P31 wd:Q13442814;       # scholarly article
        wdt:P953 [].                # has any full work available at URL (P953) value
}
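To put these two counts in proportion (using the figures reported above, which will change as Wikidata is edited), fewer than one in two hundred Chinese-language scholarly articles has a full work available at URL (P953) statement:

```latex
\frac{4\,987}{1\,059\,747} \approx 0.0047 = 0.47\,\%
```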

With the following query, we can list those scholarly articles whose full work available at URL (P953) ends in .pdf. These are the ones most likely to be freely available. Unfortunately, at the time of writing, this query returns only 9 rows. I noticed that some values of full work available at URL (P953) don't end in .pdf because they point to HTML pages with a button that opens the PDF, so those articles are free as well. We would need to check those websites, since they could also host freely accessible articles.

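SELECT
  ?item
  ?url
# Named subquery (a Blazegraph extension supported by the Wikidata Query Service):
# first collect every Chinese scholarly article that has a P953 URL...
WITH {
  SELECT ?item ?url {
    ?item wdt:P407 wd:Q7850;
          wdt:P31 wd:Q13442814;
          wdt:P953 ?url.
  }
} AS %0
{
  INCLUDE %0.
  # ...then keep only the URLs that end in ".pdf" (case-sensitive match).
  FILTER(REGEX(STR(?url), "\\.pdf$", '')).
}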
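The REGEX filter above is case-sensitive and requires the URL to end exactly in .pdf, so links such as https://example.org/paper.PDF or https://example.org/paper.pdf?download=1 would be skipped. In SPARQL, passing the "i" flag, as in REGEX(STR(?url), "\\.pdf$", "i"), makes the match case-insensitive. If you export the URLs from the query service instead, a small Python sketch like the following can apply both the strict test and a looser one (my own assumption about what counts as a PDF link, not something Wikidata guarantees) and set aside the HTML pages that need a manual check. The example.org URLs are hypothetical.

```python
from urllib.parse import urlparse


def ends_in_pdf(url: str) -> bool:
    # Mirrors the SPARQL FILTER(REGEX(STR(?url), "\\.pdf$", '')) above.
    return url.endswith(".pdf")


def path_ends_in_pdf(url: str) -> bool:
    # Looser test (an assumption): ignore letter case and any ?query or #fragment.
    return urlparse(url).path.lower().endswith(".pdf")


def split_candidates(urls):
    """Split URLs into likely direct PDF links and pages to check by hand."""
    direct, to_check = [], []
    for url in urls:
        if path_ends_in_pdf(url):
            direct.append(url)
        else:
            to_check.append(url)
    return direct, to_check


# Hypothetical URLs, for illustration only.
urls = [
    "https://example.org/paper.pdf",
    "https://example.org/paper.PDF?download=1",
    "https://example.org/article/123",
]
direct, to_check = split_candidates(urls)
```

Here the first two URLs land in direct and the third in to_check, which corresponds to the HTML pages that the paragraph above suggests checking manually.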

With the following queries, we can list those scholarly articles whose main subject is part of computer science (first query) or mechanical engineering (second query). Both queries are identical except for one entity, so you can replace that entity with whatever field you are interested in (e.g. biology, mathematics, thermodynamics, psychology).

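# First query: computer science (Q21198).
SELECT ?item ?titleZh ?titleEn {
  ?item wdt:P407 wd:Q7850;
        wdt:P31 wd:Q13442814;
        wdt:P921/wdt:P361 wd:Q21198.   # main subject (P921) that is part of (P361) computer science

  # Optional titles (P1476): the Chinese one and, if present, an English one.
  OPTIONAL {
    ?item wdt:P1476 ?titleZh.
    FILTER(LANG(?titleZh) = "zh").
  }

  OPTIONAL {
    ?item wdt:P1476 ?titleEn.
    FILTER(LANG(?titleEn) = "en").
  }
}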
# Second query: mechanical engineering (Q101333); only the field entity differs.
SELECT ?item ?titleZh ?titleEn {
  ?item wdt:P407 wd:Q7850;
        wdt:P31 wd:Q13442814;
        wdt:P921/wdt:P361 wd:Q101333.  # main subject that is part of mechanical engineering

  OPTIONAL {
    ?item wdt:P1476 ?titleZh.
    FILTER(LANG(?titleZh) = "zh").
  }

  OPTIONAL {
    ?item wdt:P1476 ?titleEn.
    FILTER(LANG(?titleEn) = "en").
  }
}
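Since the two topic queries differ only in the field entity, the query text can be generated from a template. The Python sketch below reproduces those queries verbatim and swaps in a field's QID; the two QIDs are the ones used above, and the QID for any other field (biology, psychology, etc.) would have to be looked up on Wikidata first. The names TOPIC_TEMPLATE and topic_query are hypothetical.

```python
import re

# QIDs taken from the two queries above; look up other fields on Wikidata.
COMPUTER_SCIENCE = "Q21198"
MECHANICAL_ENGINEERING = "Q101333"

# The topic query with the field left as a placeholder.
# Literal SPARQL braces are doubled because str.format uses { } for placeholders.
TOPIC_TEMPLATE = """SELECT ?item ?titleZh ?titleEn {{
  ?item wdt:P407 wd:Q7850;
        wdt:P31 wd:Q13442814;
        wdt:P921/wdt:P361 wd:{field}.

  OPTIONAL {{
    ?item wdt:P1476 ?titleZh.
    FILTER(LANG(?titleZh) = "zh").
  }}

  OPTIONAL {{
    ?item wdt:P1476 ?titleEn.
    FILTER(LANG(?titleEn) = "en").
  }}
}}"""


def topic_query(field_qid: str) -> str:
    """Return the topic query for the given field QID, e.g. "Q21198"."""
    # Reject anything that is not a plain QID so it cannot break the query text.
    if not re.fullmatch(r"Q[1-9][0-9]*", field_qid):
        raise ValueError(f"not a Wikidata QID: {field_qid!r}")
    return TOPIC_TEMPLATE.format(field=field_qid)
```

For example, topic_query(MECHANICAL_ENGINEERING) returns the second query above. Note that the path wdt:P921/wdt:P361 only matches articles whose main subject is part of the field, not articles whose main subject is the field itself; the path wdt:P921/wdt:P361* would match both, though it may be slower or time out.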

Additional notes

I first thought of answering this with a reply in the Discord server, but then I remembered that messages there are not indexed by search engines, so the knowledge would be gatekept inside that server. For this reason, I decided to write this page. It required more effort, but at least anyone with Internet access can read it, not only the members of that Discord server.

Using dorks in search engines

Unfortunately, as of this date, Wikidata yields fewer than 100 results for freely accessible papers (the PDF query above returns only 9). By using search operators in some search engines, you can find more. Here are some dorks (search queries built from such operators) that can do that.

Scholarly articles published by institutions in Taiwan that use Traditional Chinese characters. We can't be sure that the institutions are in Taiwan, but because we restrict the results to websites under the .tw top-level domain (.edu.tw), we are considering the likely candidates.

site:.edu.tw filetype:pdf "國" ("abstract" OR "摘要")

Scholarly articles published by institutions in China that use Traditional Chinese characters. We can't be sure that the institutions are in China, but because we search under the .cn top-level domain (.edu.cn), those institutions are the likely candidates.

site:.edu.cn filetype:pdf "大學" ("abstract" OR "摘要")

Scholarly articles published by institutions in China that use Simplified Chinese characters. Again, we can't be sure that the institutions are in China, but searching under the .cn top-level domain narrows the results to the likely candidates.

site:.edu.cn filetype:pdf "大学" ("abstract" OR "摘要")
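The three dorks above share the same shape and differ only in the domain and in a marker word (國 "country"; 大學 and 大学 "university"). A small Python sketch can build them and, assuming Google's https://www.google.com/search?q= URL format, turn each one into a link to open in a browser. Other search engines support similar operators with their own syntax, so this is an example rather than a general tool; the function names are hypothetical.

```python
from urllib.parse import quote_plus

ABSTRACT_CLAUSE = '("abstract" OR "摘要")'  # "摘要" is Chinese for "abstract"


def institution_dork(domain: str, marker: str) -> str:
    """PDFs under `domain` that contain `marker` and an abstract heading."""
    return f'site:{domain} filetype:pdf "{marker}" {ABSTRACT_CLAUSE}'


def google_search_url(dork: str) -> str:
    """Assumes Google's https://www.google.com/search?q=... URL format."""
    return "https://www.google.com/search?q=" + quote_plus(dork)


DORKS = {
    "Taiwan, traditional": institution_dork(".edu.tw", "國"),
    "China, traditional": institution_dork(".edu.cn", "大學"),
    "China, simplified": institution_dork(".edu.cn", "大学"),
}

for label, dork in DORKS.items():
    print(label, google_search_url(dork))
```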

Filter the results to Simplified Chinese and Traditional Chinese

To limit the results to written works in Simplified Chinese or in Traditional Chinese, search for a word whose characters differ between the two scripts. For example, "literature" is written "文学" in Simplified Chinese and "文學" in Traditional Chinese.

If I wanted to find written works on literature in Simplified Chinese, I would search for the following:

filetype:pdf (site:.edu.tw OR site:.edu.cn) ("abstract" OR "摘要") "文学"

Conversely, if I wanted to find written works on literature in Traditional Chinese, I would search for the following:

filetype:pdf (site:.edu.tw OR site:.edu.cn) ("abstract" OR "摘要") "文學"
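The same idea can be written as a small Python sketch: given the simplified and traditional spellings of a word, it builds the two searches shown above. The function name script_dorks is hypothetical, and it only checks that the two spellings differ; choosing a word whose characters really differ between the scripts is still up to you.

```python
BASE = 'filetype:pdf (site:.edu.tw OR site:.edu.cn) ("abstract" OR "摘要")'


def script_dorks(simplified: str, traditional: str) -> dict:
    """Return one search per script, using a word spelled differently in each."""
    if simplified == traditional:
        # A word spelled identically in both scripts cannot separate the results.
        raise ValueError("the two spellings must differ")
    return {
        "simplified": f'{BASE} "{simplified}"',
        "traditional": f'{BASE} "{traditional}"',
    }


literature = script_dorks("文学", "文學")
```

A document can still contain both forms (for example, when it quotes a source written in the other script), so the results are likely, but not guaranteed, to be in the chosen script.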