User:MaxEnt/shoshin

» First impressions of Wikidata by User:MaxEnt «

Reverse chronology, with newest material at the top.

All widening gyres lead to Rome edit

13 March 2021

In my recent initiative to really learn JavaScript for real this time (beyond hacking 30 lines of code as a stopgap measure every six months), I authored a simple Greasemonkey script that runs on all Wikipedia pages (I'm not currently serving this from my on-wiki userscript, which is also possible, but more public).

My small script makes a simple fetch request to the MediaWiki API on my personal wiki to determine whether I already have a page of the same name as the Wikipedia page I'm currently viewing. Easy enough.
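The existence check reduces to inspecting the response of a stock action=query call. A minimal sketch, assuming the standard format=json response shape (the sample responses and titles here are invented; in that format, missing titles appear under a negative page-ID key with a "missing" member):

```javascript
// Does a parsed action=query response say the page exists?
// MediaWiki lists missing titles under a negative key with a
// "missing" member; existing pages carry a real pageid.
function pageExists(apiResponse) {
  const pages = apiResponse.query.pages;
  return Object.values(pages).some((p) => !("missing" in p));
}

// Invented sample responses, for illustration only:
const found = {
  query: { pages: { "123": { pageid: 123, title: "Shoshin" } } },
};
const absent = {
  query: { pages: { "-1": { title: "No such page", missing: "" } } },
};
```

The fetch itself is the easy part; the interesting decisions are all in how defensively you read the result.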

I thought it was working perfectly at this simple task, but I was wrong.

In the case where my personal wiki contains an alias page, but the alias page contains no templates whatsoever, the API does not return an empty array (as Dijkstra would have mandated in response, since "nothing to report" is distinct from "you didn't even ask"); instead the API returns an object with no "templates" member whatsoever.

My code blithely assumed that if I explicitly requested a list of all templates, such a member would surely be present. Weirdly, after attempting to access this non-existent member, none of the rest of my script executes (as far as I can see from console.log statements), but no error is reported either. It simply fails silently, and no magical personal infobox appears.
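To make the pitfall concrete, here is a sketch of the two response shapes (assuming the standard action=query&prop=templates output; the page objects below are invented for illustration):

```javascript
// A page that transcludes templates: the "templates" member is present.
const withTemplates = {
  pageid: 123,
  title: "Some page",
  templates: [{ ns: 10, title: "Template:Infobox" }],
};

// An alias page with no transclusions: "templates" is simply absent,
// not an empty array.
const noTemplates = {
  pageid: 456,
  title: "Some alias",
};

// Iterating the missing member throws a TypeError; inside an async
// handler whose rejection nobody inspects, that error can vanish
// without a trace, which matches the silent failure described above.
function countTemplates(page) {
  let n = 0;
  for (const t of page.templates) n += 1; // TypeError when undefined
  return n;
}
```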

I can't stand overuse of if/else proceduralism (if you have a hammer, and it swings both ways ...) so rather than layer on the braces, I elected instead to look for the best idiom to add a member that doesn't exist with a chosen default value (in this case the empty array []).

I quickly found a StackExchange page advocating

member_expr = member_expr || default_value;

Whatever happened to the reflexive ||= assignment operator, as God and Brian Kernighan (or Kenneth Iverson) would have shaped it from a muddy river bank? Too many programmers have a low conception of the ODR, but I digress.

The above syntax was endorsed as idiomatic, except that most programmers use Lodash or underscore.js (those being the ratified Wikidata names) when confronted with this kind of "utility belt" picklette.

Well, I had heard of these, but I was keeping myself confined to liturgical Latin for the first few weeks, without any dang slangifiers.

And the upshot:

  • underscore.js is not much adopted by new projects these days, but remains ubiquitous among large pre-existing projects.
  • Lodash has more current uptake, but only for larger projects, as ES6 now directly provides a large enough subset for smaller projects to eschew the extra complexity of yet another library.
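As a coda to the ||= lament: the language did eventually grow logical assignment operators (ES2021, and already shipping in major browsers by early 2021), so the utility-belt idiom can now be written without repeating the member expression. A quick comparison, using an invented API-result object:

```javascript
// Classic idiom: repeat the member expression on both sides.
const a = { title: "Some alias" }; // hypothetical API result
a.templates = a.templates || [];

// ES2021 logical OR assignment: same effect, member stated once.
const b = { title: "Some alias" };
b.templates ||= [];

// Nullish coalescing assignment: fills in only null/undefined,
// so an existing falsy value such as 0 or "" survives.
const c = { title: "Some alias" };
c.templates ??= [];
```

Whether the CopyQ-vintage engine supports these is another question; in a current browser or Node, they work as advertised.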

And no, this is not my own callow opinion formed from five minutes on SE. Following this thread, I found the wonkiest post imaginable, by Albert Ziegler of Semmle, which brings high-end data science to the uptake graph for underscore.js and Lodash. Seriously: fitted Markov models and logistic regression analysis alongside big fancy R plots that even I don't know how to make (it's the particular manner of burping one plot from another plot that I've never seen before).

He's doing this to showcase the power of their internal technology: a Datalog spin-off of blurry ontological constituency { SemmleCode, .QL }, this/these being an object-oriented query engine for deduction with support for recursive query (good for attacking Java stack dumps—and I've never seen a large production Java system that wasn't able to stack dump a quarter bog-roll in a single inverse gulp). If someone (or something) has to read those dumps, I'm sure glad it's SemmleCode and not me.

[*] Runaway instantiation in a complex C++ template library can knock Enterprise Java's quarter bog-roll flat for eye-glazing wall-of-text geek Mesmerism, but only at compile time, rather than run time.

Weirdly, this is fairly adjacent to OWL, SPARQL, and things like Apache Jena that I was just reading about yesterday.

Rome-return factor one.

But there's also already a second boom-a-Rome factor.

The Semmle page turned out to be a complete mess. Puffy and badly structured. Which I immediately attempted to address with the least edit possible.

Principally, my edit (group) involved adding the second official definitional item for SemmleCode (this redirect is actively linked from the Wikipedia .QL page):

Semmle Inc is a code-analysis platform provider, with offices in San Francisco, Seattle, New York, Oxford, Valencia and Copenhagen.

SemmleCode is an object-oriented query language for deductive databases developed by Semmle.

It is distinguished within this class by its support for recursive query.

It turns out that the three main sections of the article (it only has three sections) all pertain to SemmleCode more than Semmle.

I promptly renamed "Background" to "SemmleCode background".

And "Integration with development environments" to "SemmleCode integration with development environments".

The third main section (in the middle of the muddle) is "Sample query in QL". Already clear this was not about Semmle, so no title change.

This is precisely what I complained about in an earlier post: the crazy-making array of pages where corporations are gaily conflated and confused with their signature software product/project (since in our newfangled PC-panopticon "gay" is no longer used in a derogatory sense by anyone with future life prospects, I figure it's finally high time to yank the old word back, in a rearguard chicken-salad action).

I say: Down with the gay naughty notability vise where Semmle and SemmleCode are confined to a Frankenbooth of Frankensteinian ambiguity between creator and creation.

For the record, Frankenstein was the scientist, not the monster, despite the Prince Charles / Princess Diana photo-op disparity field, in which Diana seemed for a time to wear the real royal pants of the regality-challenged royal couple.

[*] The gay naughties peaked circa 2008, and Wikipedia participation has never quite been the same since.

[**] The strong ODR: In computer science, if you're capturing structure by typing the same thing twice, and "thing" has internal syntactic structure (much beyond internal dots), then you're doing it wrong. Hence ||= for "reassign if not false-ish" ought to be a real thing.

Fun with notability edit

11 March 2021

On my first real day, I've already created two items. I got around to skimming the notability policy on the second pass (sue me).

Wikidata in its first phases has two main goals: to centralize interlanguage links across Wikimedia projects and to serve as a general knowledge base for the world at large.

Hmmm, I sort of only barely give a toss about centralizing interlanguage links. I'm sure that was a pain point for a lot of people on Wikipedia, but it was never my dog. The only time I would ever use this is when I'm cooking Indian cuisine, because there sure are a lot of confusing Hindi synonyms for one ingredient after another. Perhaps that would help me finally distinguish the seventeen different types of black cumin.

On the other hand, for "general knowledge base" I'm all in, and that's an understatement.

It contains at least one valid sitelink to a page on [yada yada].

I'm happy to link en.wiki pages that are not already represented, should I come across one I care about. But this really isn't my dog, either.

It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references.

Now we're getting to my dog. I use Wikipedia as a shadow ontology for my own personal research wiki, and it often does not suit the purpose, because Wikipedia culture loves to jam multiple discrete subjects onto a single page. In my own wiki I do not do this, which makes keeping a rough correspondence difficult.

A really bad case of this concerns all those mushroom VC companies from California where their PR face deliberately confuses their all-encompassing web presence with their legal corporate entity.

In my own wiki, a corporation is part of category:corporation and a software project is part of category:software project. I never combine the two.

Another bad case on Wikipedia is the whole of mammalian biology. For this one I don't fault the culture as much, because it's a nasty problem.

So you have a page titled "intelligence". It will probably be a page 80% about human intelligence (ours truly) and only a very little bit about monkey or dolphin or dog intelligence. On my own wiki, a page primarily about a human capacity or biological system has the word "human" at the front of the page title. And maybe I then have to supplement that with "primate intelligence" or "mammalian intelligence" and it's a hassle for sure, but think about the ONTOLOGY!

I'm hoping that Wikidata will prove to be a useful second opinion, with its ontology fundamentally less jammed up.

Of course, this runs counter to the aspirations for Wikidata in its current phase: to be a shadow ontology of the encyclopedia we now have. Well, I know how that's going to work out (not at all well in the long run), I know it won't stay this way forever, and I know the actual present battle is to prevent wholesale addition of material outside these guidelines, because we're really not ready for that yet.

Meanwhile, I'm mostly here to stress-test the looming ontology divergence, and staying entirely (or even mostly) within the bounds of the broken Wikipedia ontology won't serve my purpose.

It fulfills a structural need, for example: it is needed to make statements made in other items more useful.

Another item where I'm all in, and another understatement.

One of my main activities on my personal wiki is collecting small abuses of language. I have an entire page devoted to words of "trombone" scope: words like "useful" that could be valued at a zinc penny or at a platinum pound.

But already I've presumed too much to characterize the problem as one of mere magnitude. I specialize in thinking way too hard about how to maximally complicate "useful".

When we're trying to decarbonize the entire global economy with a human population pushing toward ten billion, and we give ourselves a ludicrously short two- or three-decade deadline, is it "useful" to conserve water by flushing less often? Yeah, it is, but it's zinc penny useful, if you know how to do the real math.

It's surely "useful" that the current project is orderly and doesn't fall over under the collective weight of good intentions.

But it's also "useful", on some celestial spreadsheet somewhere, to think really hard about what Wikidata wants to be, after its ontology passes through puberty (and the angry phase of bipolar separation anxiety concerning the pragmatic but inelegant Wikipedian "mothership" ontology, which is suddenly all too embarrassing for words).

In my remit, it's not at first obvious what's actually useful. That's fundamentally why I'm here: because I don't yet know.

But, fortunately, it's "useful" to sort that kind of issue out, so I still legitimately slide in under the wire of official policy, as I choose to read it on first encounter.

Some lame CopyQ fu edit

11 March 2021

This particular CopyQ fu only works with Firefox (FF) running on a traditional X desktop.

Command type: automatic.

Filter:

copyq:

if (!isClipboard()) fail();
var url = str(read('text/x-moz-url-priv'));
var m = url.match(/^https:\/\/www\.wikidata\.org\//);
if (m === null) fail();

Command:

\([QPL]\d+\) ?$

Script:

copyq:

var clip = str(clipboard());
var ffurl = str(clipboard('text/x-moz-url-priv'));
popup('auto FF wikidata.org');

var m = clip.match(/ *([^(]*) +\(([QPL]\d+)\)/);

if (m === null) ignore();

var name = m[1];
var prop = m[2];

var txt = '[[' + prop + '|' + name + ' (' + prop + ')]]';

setData(mimeText, txt);
copy(txt);

On code style: I'm not a JS wizard (yet), but it would hardly matter here, because CopyQ's engine isn't up to speed with all of ES6, so I'm unable to use template interpolation (and who knows what else), and I don't push the envelope with fancier idioms.

On Firefox, when you copy to the clipboard it creates a special MIME tag which contains the URL of the source page.

I've set up the filter to look for a URL of the form

https://www.wikidata.org/

I've set up the command to look for clipboard text ending with:

(Q999), or a similar L or P expression

If both of those pass, it runs the script.

Using "divine providence" as an example, when I triple-click on "providence" in the page title, FF selects the entire page title.

While I'm doing this, CopyQ fires automatically on every change to the X selection. !isClipboard() rejects those changes, because the X selection is useless to me: it doesn't carry the crucial text/x-moz-url-priv MIME type which lets my script taste the source page URL.

Then I click "copy" so that FF stores to the regular clipboard, with the special MIME types.

It then uses a simple regular expression to rewrite the clipboard contents as follows:

[[Q866338|Divine providence (Q866338)]]
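Stated in isolation, the rewrite is just this (a plain-function restatement of the CopyQ script above, for readers without CopyQ at hand):

```javascript
// Rewrite "Name (Q123)" clipboard text as a wiki link, the same way
// the CopyQ script does; returns null when nothing entity-shaped
// (Q/P/L followed by digits, in parentheses) is found.
function rewriteClip(clip) {
  const m = clip.match(/ *([^(]*) +\(([QPL]\d+)\)/);
  if (m === null) return null;
  const name = m[1];
  const id = m[2];
  return "[[" + id + "|" + name + " (" + id + ")]]";
}
```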

I'm probably not even supposed to use this construct, in favour of some {{Q}}-ish alternative I've yet to discover.

But I can't reliably verify my edits to page source (raw editor) with only the Q templates, so I'm doing it this way for now.

Note 1: What I'm seeing with the hover tip on the rendered page for the Q elements is just a reiteration of the Q number, which helps not at all.

Note 2: It's spectacularly more useful in my clipboard manager to search and inspect my clipboard elements formatted with both the number and the name, rather than the number alone.

I do a lot of nested work with various tabs open. An unpasted item from some long-ago tab can wind up 100 slots down the stack, and it's still there when I return to my suspended edit tab.

Chip the glasses and crack the plates!
Blunt the knives and bend the forks!
That's what Bilbo Baggins hates-

Turns out, I do stack my deferred clipboard items a little too deep sometimes, and then they are not recovered as the whole suspended edifice comes tumbling down.

From divine providence to strict order (Q11077412) edit

11 March 2021

Well, I took the short description too much on faith, and when I went to en.wiki and a few other places, I discovered that God's intervention has two distinct scales:

  • continuous trim of every last atom — sounds tiring to me
  • direct manifestation in the human experience (special providence)

At en.wiki the discussion is 90% Christian with a side-order of LDS.

Meanwhile, I'm struggling with « instance of | attributes of God in Christianity »

This essay begins in very concrete terms not the least bit applicable to this case.

Now I learn that classes are used for just about every abstract idea that can't be put directly into a box.

Human brains, in general, are a class.

Like subclass of (P279), part of (P361) is a transitive property.

This is weird.

Normal definition of transitive:

The Transitive Property states that for all real numbers x ,y, and z, if x=y and y=z, then x=z

What I think this page is trying to say is (R⊆S)∧(S⊆T)⟹R⊆T

Which turns out to be the correct definition of a transitive relation.

What the page is really about:

I love my Mom because she bought me bananas and bananas are amazing. So, using the Transitive Property ... my Mom is amazing.

But it also associates "transitive property" with equality (an equivalence class) rather than a general relation.

The following property: If a = b and b = c, then a = c. One of the equivalence properties of equality.

Most of the quick Google hits associate "transitive property" with this weaker notion of equivalence under equality, whereas "transitive relation" always dives directly into the stronger notion of partial order.

Preorders are more general than equivalence relations and (non-strict) partial orders, both of which are special cases of a preorder:

  • an antisymmetric preorder is a partial order
  • a symmetric preorder is an equivalence relation.
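Pulling those definitions together (standard order-theory bookkeeping, nothing Wikidata-specific):

```latex
% A relation R on a set X is transitive when
\forall a, b, c \in X : \quad (a \mathrel{R} b \wedge b \mathrel{R} c)
  \implies a \mathrel{R} c
% A preorder is a reflexive, transitive relation. Then:
%   an antisymmetric preorder (aRb and bRa imply a = b) is a partial order;
%   a symmetric preorder (aRb implies bRa) is an equivalence relation.
% Equality is the degenerate case: an equivalence relation, hence the
% "transitive property" of school algebra, but far from the whole story.
```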

I was about to make a quick change to the Help:Basic membership properties, but then I noticed the <!--T:92--> markup embedded in the page, so I rushed off to read the "make sure you read these passages" (yes, in that order):

Now isn't this a whole new world of hurt? But, of course, it all makes great sense.

Instead, I came to my senses and left a rather detailed comment on the appropriate talk page suggesting improved language:

Supernatural realm edit

11 March 2021

I've now also created the item supernatural realm, having done quite a bit of searching beforehand.

Sure enough, 30 s later, as I'm trying to apply this new item somewhere, an input box suggests elsewhere, causing me to immediately reconsider my own new item.

But here's the problem: elsewhere is an instance-of mythical place.

The supernatural realm that many Americans fervently believe in is not considered by any of those believers to be a mythical place.

These people think its existence is firmly established by revelation via divine providence.

I'm among those who side with divine providence as a hypothetical entity. But it's not for me to classify heaven and hell as mythical elsewheres, nonetheless.

I have a long history of ontological struggles with en.wiki, because I use it as my source ontology for my own research wiki, which is extremely large as personal wikis go.

It was pretty easy to carve off from sexual ethics (a field of study) the new item sexual morality (what sexual ethics studies).

This case is far more difficult to resolve. Think it's time for a walk in the sunshine to digest my fresh ontological impasse.

New user on Wikidata edit

11 March 2021

I'm a long-time editor on en.Wikipedia, but I hadn't taken a serious tour of Wikidata until now.

I think it's always useful to record the teething pains of an experienced person when adapting to a new environment. "Beginner's mind" (Shoshin) does not last very long.

One of my first edit attempts was to add a "start time" property to the hoary network protocol SNMP.

This was an interesting case, because it contains SNMPv1, which itself has a "start time" property.

There doesn't seem to be any way for SNMP to infer "start time" from its member SNMPv1 as opposed to another viable member, such as SNMPv2—except that SNMPv1 is classified as "replaced by" its successor. But that's probably going above and beyond for most software clients.

Furthermore, SNMPv1 is not classified as the "inception point" of the protocol (there could quite easily be an alpha SNMPv0.1 which we have yet to capture).

So I thought maybe I could qualify my "start time" with "software version" = "1". But this only elicited a big black ! telling me I was violating the schema.

I thought that would be a clever way to annotate that my "start time" is based on the start time of SNMPv1, as my "decision method".

I think I also tried "has part" for "SNMPv1" and got the same black bang.

All around, it seems my first attempt at qualification was too subtle for the schema by half.

Was I overexcited by qualification, or is the schema actually too rigid in this respect?