User:Phil wink/Quantitative scansion notes

On 14 December 2015, User:Epìdosis proposed the new property: quantitative metrical pattern. This essay is a response to that proposal. It contains both background information and some suggestions of my own, which I hope will lead to productive discussion. If so, I will likely update the page as questions arise and implementations are worked out. For reference, "Version 1" is here.

Introduction

Scansion — the analysis and (usually) graphic notation of the metrical structure of verse — is a quagmire. It promises simplicity and clarity, but often delivers … lunacy. My first goal is to point out as many problems as I can right at the beginning, so that as this property is implemented there will be as few surprises as possible. Secondarily, I try to suggest paths that I feel overcome these problems. Disagreements about scansion are often the result of people talking past each other because of unstated assumptions about what it really is or should do; so I have tried to be explicit about both the facts (as I understand them) and my opinions. But I'm happy to try to clarify more.

I have a variety of ambitions for this property. They cannot all be met perfectly, so tradeoffs are inevitable. One ambition is that it be simple: The fewer characters, and the fewer exotic characters, the better. It should also be powerful: Our work should be applicable to as many germane languages, periods, traditions, and wiki uses as possible. It should be optimized both for humans and computers: Our scansion should be so tightly defined that it can truly function as data, yet it must communicate to humans who have typographical traditions and visual needs of their own.

The scope of the proposed property is quantitative meter, so

1. We are not scanning individual lines, but mapping the pattern that all lines in a given meter follow.

2. We are completely ignoring syllabic, accentual, accentual-syllabic, tonal, parallel, free, and all other types of verse which aren’t quantitative verse.

3. Conversely, we will attempt to accommodate all types of quantitative verse by addressing the known challenges of the 3 central traditions: Greek/Latin, Sanskrit, and Arabic.

But first, let us "dot our i's and cross our t's — and mind our p's and q's"...

Typography

Preliminary criteria

1. Symbols likely to display correctly under most conditions (font/browser/operating system)

2. Monospaces correctly (really, a special instance of #1)

3. One prosodic element = one character: avoid formatted characters, combining characters, repeated characters

4. Easy to type, edit

5. Visually meaningful; as far as possible, reflect scansion tradition

Brill symbols

Starting with the symbols listed in Brill's notes (Rietbroek 2008), I tested Criterion #1 (whether they actually display) across several relatively standard and capacious fonts I happened to have on my computer: Arial Unicode, Calibri, Consolas, Courier New, DejaVu Sans, Liberation Sans, Linux Libertine G, Lucida Sans Unicode, TeXGyreScholia, and Times New Roman. Results were broadly similar. If a symbol usually appeared correctly across fonts, I gave it a Y or N in the 1:1 column, depending on whether it passed Criterion #3. An asterisk (*) indicates that the symbol was generally supplied by a math font rather than by the specified font. If no Y, N, or * appears, the symbol failed to appear correctly in most tested fonts. I also downloaded and tested New Athena Unicode, in which, as expected, all symbols worked. Note that the descriptions come from the PDF and do not always correspond to the Unicode character names. Using this table, you can see for yourself which symbols do or do not render on your system(s).

#	Symbol	Unicode	1:1	Description	Notes
1	×	00D7	Y	(multiplication sign) anceps
2	⏑	23D1		metrical breve
3	–	2013	Y	(EN dash) longum
4	⏒	23D2		metrical long over short
5	⏓	23D3		metrical short over long	Y in Linux Libertine G
6	⏔	23D4		metrical long over two shorts
7	⏕	23D5		metrical two shorts over long
8	◯◯	25EF	*	Aeolian basis	N in Arial Unicode & DejaVu Sans (repeated characters)
9	⏖	23D6		metrical two shorts joined
10	⌒	2312	*	brevis in longo	Y in Arial Unicode
11	̭	032D	N	catalexis indicator	(combining with preceding space)
12	⁝	205D		tricolon	Y in DejaVu Sans
13	|	007C	Y	word end indicator
14	‖	2016	Y	period end	Absent in Calibri & Courier New
15	| | |	007C	N	stanza end	(repeated characters)
16	⊗	2297	*	stanza end	Y in Lucida Sans Unicode & DejaVu Sans & Arial Unicode & Linux Libertine G
17	^H	0048	N	hiatus	(superscript 'H')
18	∫	222B	Y	dovetail	Absent in TeXGyreScholia
19	~	007E	Y	responsion
20	¨	00A8	Y	anaclasis
22	́	0301	N	ictus	(combining character)
23	͡	0361	N	bridge	Absent in Lucida Sans Unicode & TeXGyreScholia (combining character)
24	⏗	23D7		metrical triseme
25	⏘	23D8		metrical tetraseme
26	⏙	23D9		metrical pentaseme
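Of the criteria above, only Criterion #3 can be checked mechanically. Whether a symbol actually displays (Criterion #1) depends on fonts and has to be tested by eye, as in the table. The Python sketch below uses only the standard `unicodedata` module. The function name `is_single_unit` is hypothetical. It simply fails any string longer than one code point and any lone combining mark.

```python
import unicodedata

def is_single_unit(symbol: str) -> bool:
    """Rough check of Criterion #3: one prosodic element = one character.

    Fails multi-character strings (repeated characters such as '◯◯' or
    '| | |') and lone combining marks (categories Mn, Mc, Me), which need
    a base character to display.
    """
    if len(symbol) != 1:
        return False
    return unicodedata.category(symbol) not in {"Mn", "Mc", "Me"}

# A few rows from the table above.
candidates = {
    "anceps": "\u00D7",               # ×
    "longum": "\u2013",               # EN DASH
    "metrical breve": "\u23D1",       # ⏑
    "Aeolian basis": "\u25EF\u25EF",  # repeated character
    "catalexis indicator": "\u032D",  # combining mark
    "stanza end": "| | |",
    "ictus": "\u0301",                # combining acute
}

for label, sym in candidates.items():
    verdict = "Y" if is_single_unit(sym) else "N"
    print(f"{label:20} U+{ord(sym[0]):04X}  {verdict}")
```

The metrical breve passes this check even though it has no Y in the table. It fails Criterion #1, not #3.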

Taking "Y" symbols as our first cut of "usable symbols", we note that many of these will likely not be very useful in a reasonably simple quantitative scansion, while some quite foundational symbols (like the longs & shorts over each other) are absent or rare.

"Y" symbols — first cut

× – | ‖ ∫ ~ ¨

Additional symbols

At this point we may already begin to wonder if the solution must involve a distinct underlying code + transformed human-readable display. But let us first catalogue a few other symbols that might be of use.

#	Symbol	Unicode	Description	Notes
27	/	002F	slash
28	\	005C	backslash
29	¦	00A6	broken bar
30	!	0021	exclamation point	rare in scansion, but possibly useful in underlying code
31	#	0023	number sign	rare in scansion, but possibly useful in underlying code
32	^	005E	caret
33	˘	02D8	breve
34	∪	222A	union	(math symbol; not available in most fonts)
35	̆	0306	combining breve	(combining: for display only?)
36	͝	035D	combining double breve	(combining: for display only?)
37	1	0031	numeral 1	rare in scansion, but possibly useful in underlying code
38	2	0032	numeral 2	rare in scansion, but possibly useful in underlying code
39	3	0033	numeral 3	rare in scansion, but possibly useful in underlying code
40	B	0042	capital B	rare in scansion, but possibly useful in underlying code
41	D	0044	capital D	rare in scansion, but possibly useful in underlying code
42	U	0055	capital U
43	u	0075	lowercase u
44	W	0057	capital W	rare in scansion, but possibly useful in underlying code
45	w	0077	lowercase w	rare in scansion, but possibly useful in underlying code
46	X	0058	capital X
47	x	0078	lowercase x
48		0020	space

Monospacing

Symbols which consistently render correctly in monospaced fonts are extremely valuable when demonstrating scansion online, as this allows WYSIWYG editing:

–  ˘  ˘ –  ˘   ˘ – |  – –   –    – ˘  ˘  – –
Arma virumque cano, Troiae qui primus ab oris

To the extent that the symbols used in Wikidata's claims should match those used elsewhere in the Wikimedia Empire, then symbols likely to monospace well should also be used in Wikidata. Consider the "union" symbol:

1234567890
∪∪∪∪∪∪∪∪∪∪

On my browsers, anyway, this supposedly monospaced character renders at about 1.2 times the width of the digits above it! This is because it is actually being replaced with a character from a (non-monospaced) math font. So (1) "union" is bad at monospacing, (2) the proper Unicode metrical breve is largely unavailable, and (3) the crappy little breve above is too high and shrimpy. This pretty much leaves us with (gasp) "u":

–  u  u –  u   u – |  – –   –    – u  u  – –
Arma virumque cano, Troiae qui primus ab oris

A truly ugly solution, but universally displayable, and easily entered.

Latin and Greek

Halporn, Ostwald, and Rosenmeyer (1980, pp. 3–4, 61–62; hereafter "HOR") use the same symbols whether scanning Greek or Latin verse:

#	Symbol	Description
3	–	Longum
2	⏑	Breve
1	×	Anceps
11	̭	Lack of 1 element: acephaly (headlessness) or catalexis (taillessness)
27	/	regular occurrence of a word break
-	(dashed slash)	Alternative positions of word break (not in Unicode)
27 × 2	//	Pause, i.e. end of period
27 × 3	///	End of strophe
23	͡	Bridge
6	⏔	Though #6 & #7 are not defined specially in their list of sigla, they are special instances of #3 & #2, and are used in HOR.
7	⏕

Nasty units

However, these symbols are combined ad lib to form a wide variety of compound symbols. The scansions discussed below are not just theoretically possible, they actually occur in HOR (and not as crazy exceptional cases).

Examples of standard quantitative scansions from HOR which present special rendering challenges

Left (happy little emoji): This symbol (which, by analogy to anceps and biceps, I will call triceps) indicates a position which may hold 1 long or 1 short or 2 shorts. It does not occur in Unicode, but I believe it is necessary that Wikidata somehow come to terms with this unit, as it is a standard component of verse used by Plautus and Terence (and surely others). For display purposes, the best (non-LaTeX) I can come up with is ∪͝∪ which renders as ∪͝∪ or ... probably a bit better ... u͝u which renders as u͝u. (A similar trick I find less appealing is u͜͞u which renders u͜͞u.) But even if these are workable display solutions, they destroy any sense of the value as code.
Center (no skateboarding): This indicates a biceps (the 2 shorts in a dactyl which may be replaced by 1 long) which may optionally have a break within it (if, of course, it's the 2 shorts). I assume that an effect like this would require LaTeX. I will argue later that Wikidata may be best off ignoring all optional breaks.
Right (park bench with double rainbow): This is another biceps which is bridged with the following long … but of course depending upon which way the biceps is realized, it can be a long-long bridge or a short-long bridge. I assume this too would require LaTeX. I will also argue against coding for bridges in Wikidata.

Principled limitations in Greek and Latin

Wikidata can decide to code absolutely all structure in a verse line type, but the cost seems great, both for humans trying to edit the code, and browsers trying to render it.

Several principled limitations will greatly reduce these difficulties, without (I think) much compromising the verse structure:

1. Code no bridges. The Greeks and Romans didn't know they had them (even though they did); why should we spend a combining character on them? Furthermore, at least some bridges are genre-specific, so if we were to code them, we'd either have to create distinct items for (say) tragic iambic trimeter versus non-tragic iambic trimeter — or we'd have to take the further step of distinguishing optional bridges... a bridge too far!

2. Code no optional breaks. While mandatory breaks (like that in the middle of the 2nd line of an elegiac couplet) should be retained, optional breaks can be viewed as slightly less a feature of verse structure and slightly more a feature of a poet's style. And (as we saw above) when they occur in the midst of a biceps, they may be difficult to render.

3. Do not distinguish the more likely syllable type. Anceps (×) and biceps positions are sometimes distinguished as being primarily filled with short, secondarily with long (or vice versa) by displaying the secondary symbol above the primary one. Let us simplify our character set by not caring which is more common.

4. Do not code the 1%. Since we are coding line types, not individual lines, we aim to code all possibilities. In exceptional cases, established rules are broken, either by incompetence, miscopying, or by inspired willfulness. Let us not let outliers stand in the way of coding the norm.

5. Code no foot/metron divisions. Many texts (especially introductory texts) graphically divide a line into its component feet or metra (Arma vi|rumque ca|no, Troi|ae qui |primus ab |oris). My sense is that more academic publications move away from this. For our purposes, ignoring foot/metron breaks eliminates a lot of clutter and frees up that break symbol (which would be either | or /) for use elsewhere. (This is probably the least important of my "principled limitations" and may well be reversed, especially considering the probable utility of foot/pāda markers in Sanskrit and Arabic verse.)

We'll summarize just what elements are really needed, and what symbols to put to them, after addressing quantitative scansion in some additional languages.

Sanskrit

Closely analogous to Greek and Latin (G/L), Sanskrit measures heavy and light syllables. Unfortunately, the typical symbols are reversed:

G/L type	G/L symbol	Skt type	Skt symbol	Skt letter	Unicode
long	–	heavy (guru)	∪	ग (G)	0917
short	∪	light (laghu)	– (or |)	ल (L)	0932

Sanskrit also has its own traditional symbolic scansion, using Devanagari letters to indicate series (gaṇas) of 3 syllables. A section with syllable count not divisible by 3 results in a series of letters + 1 or 2 guru or laghu symbols tacked on at the end.

Devanagari	Unicode	Roman	G/L scansion
म	092E	M	– – –
य	092F	Y	∪ – –
र	0930	R	– ∪ –
स	0938	S	∪ ∪ –
त	0924	T	– – ∪
ज	091C	J	∪ – ∪
भ	092D	Bh	– ∪ ∪
न	0928	N	∪ ∪ ∪

I do not know to what extent gaṇa (Sanskrit letter) and symbolic (light=| heavy=∪) scansions are still used, but in principle these and standard G/L scansion should be losslessly interconvertible. However, as far as I've seen, Sanskrit notation has no letter or symbol for anceps or any other type of position which allows multiple possibilities. This suggests that Sanskrit notation is in fact optimized for scanning realized verses, not the underlying metrical form. If so, then while Sanskrit notation can be losslessly transformed into G/L, the reverse is not true, and our project (scanning the underlying meters) might require a G/L foundation.
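The asymmetry described above can be checked with a small sketch. A realized line converts to gaṇas and back without loss, while a line containing an anceps has no gaṇa form at all. The dictionary and function names are hypothetical. "B" stands for Bh (as in the hard-path examples later), light is written u and heavy –, and grouping simply runs from the first syllable.

```python
from __future__ import annotations

# Gaṇa letters (Roman keys from the table above; "B" = Bh) -> G/L symbols.
GANAS = {
    "M": "– – –", "Y": "u – –", "R": "– u –", "S": "u u –",
    "T": "– – u", "J": "u – u", "B": "– u u", "N": "u u u",
}
# Inverse lookup: tuple of 3 symbols -> gaṇa letter.
SYMBOLS_TO_GANA = {tuple(p.split()): letter for letter, p in GANAS.items()}

def ganas_to_gl(ganas: str, tail: str = "") -> list[str]:
    """Expand gaṇa letters plus trailing single syllables into G/L symbols."""
    syllables = []
    for letter in ganas:
        syllables.extend(GANAS[letter].split())
    return syllables + tail.split()

def gl_to_ganas(syllables: list[str]) -> tuple[str, list[str]]:
    """Group a realized line into gaṇas from the start; 1-2 leftover
    syllables are returned as a tail. Any symbol other than u or –
    (e.g. an anceps x) raises KeyError: there is no gaṇa letter for it."""
    full = len(syllables) - len(syllables) % 3
    letters = "".join(SYMBOLS_TO_GANA[tuple(syllables[i:i + 3])]
                      for i in range(0, full, 3))
    return letters, syllables[full:]

line = "u – u – u u u u".split()   # the first Anuṣṭubh pāda below
letters, tail = gl_to_ganas(line)
print(letters, tail)               # JB ['u', 'u']
assert ganas_to_gl(letters, " ".join(tail)) == line  # lossless round trip
```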

Many types of Sanskrit verse (e.g. the shlokas of the Mahabharata and Ramayana) typically mark the end of the first half-verse with | (#13) and the end of the second half-verse with ‖ (#14, or 2 × #13). Sanskrit verses tend to be quite long, and in European versification would usually be conceptualized as couplets or short stanzas; even in Sanskrit they may be printed as 2, 3, or 4 lines.

The Indian elephant in the room

There is a big challenge with coding some Sanskrit verse, which is not seen (or seen rarely) in Greek or Latin. Consider one of the most important Sanskrit meters, the Anuṣṭubh. It comprises 4 pādas of 8 syllables each. Here are the scansions of 2 instances of (I think!) well-formed pādas:

u – u – u u u u
– u – – – – – –

We logically deduce that the underlying metrical structure must be:

x x x – x x x x

This would give us 2⁷ = 128 variants per pāda. But there aren't that many, because the acceptable values of an anceps depend upon the values of the other anceps positions within the pāda. Different genres and periods can have slightly different dependency rules, but I will tabulate the list given by Brown (1869, pp. 5–6). Assume that the 1st and 8th positions are independent: they can be heavy or light in any case. We are left with 2 sets of 3 positions each (that is, 2 gaṇas), which we can crosstabulate. "1" indicates the combination is valid for the 1st pāda, "2" for the 2nd pāda. For the 1st pāda only 18 of the 64 possible combinations are well-formed; for the 2nd pāda, only 5.

1st gaṇa ↓ / 2nd gaṇa →	M – – –	Y ∪ – –	R – ∪ –	J ∪ – ∪	Bh – ∪ ∪	N ∪ ∪ ∪
M: – – –		1	1	2	1	1
Y: ∪ – –		1	1	2	1	1
R: – ∪ –	1	1	1		1	1
S: ∪ ∪ –
T: – – ∪		1		2	1	1
J: ∪ – ∪		1		2
Bh: – ∪ ∪		1		2
N: ∪ ∪ ∪
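Brown's crosstab can be encoded directly to confirm the counts quoted above. This is a sketch over the table as transcribed here. Blank cells are read as invalid, and "B" stands for Bh. The variable names are hypothetical.

```python
from itertools import product

# "1" cells: first gaṇa (positions 2-4) -> second gaṇas (positions 5-7)
# allowed in the 1st pāda.
FIRST_PADA = {
    "M": {"Y", "R", "B", "N"},
    "Y": {"Y", "R", "B", "N"},
    "R": {"M", "Y", "R", "B", "N"},
    "T": {"Y", "B", "N"},
    "J": {"Y"},
    "B": {"Y"},
}
# "2" cells: in the 2nd pāda the second gaṇa is always J (u – u),
# so only the first gaṇa varies.
SECOND_PADA = {"M", "Y", "T", "J", "B"}
ALL_GANAS = "MYRSTJBN"  # the 8 gaṇas

valid_first = [(g1, g2) for g1, g2 in product(ALL_GANAS, repeat=2)
               if g2 in FIRST_PADA.get(g1, set())]

print(2 ** 7)                               # 128: naive count for x x x – x x x x
print(len(ALL_GANAS) ** 2)                  # 64 gaṇa pairs per pāda
print(len(valid_first))                     # 18 well-formed in the 1st pāda
print(len(SECOND_PADA))                     # 5 well-formed in the 2nd pāda
print(len(valid_first) * len(SECOND_PADA))  # 90 combinations per half-verse
```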

That's the first half-verse; the second half-verse exhibits exactly the same structure. How on earth to scan this efficiently will be addressed later. In the meantime, we merely observe that a simple formulation like

x x x x x x x x ¦ x x x x u – u x |

is true (in a sense) but does a very poor job of describing the actual metrical structure. Yet displaying 18 × 5 = 90 distinct scansions per half-verse is probably not a great idea either!

Mātrika (moric) meters

An entire category of Sanskrit verse is structured rather differently, though still quantitatively. Moric verse, rather than specifying valid strings of light and heavy syllables as we have encountered above, simply specifies a total length value for a given segment of verse, where light = 1 mora and heavy = 2 morae. Moric verse (mātrika in Sanskrit) is evidently especially prevalent in the prosodies of the Prakrits (related vernaculars). Let us take the Āryā meter. It comprises 4 pādas with lengths 12-18-12-15. This might be symbolically represented:

12 morae ¦ 18 morae | 12 morae ¦ 15 morae ||

or, more granularly:

m m m m m m m m m m m m ¦ m m m m m m m m m m m m m m m m m m |
m m m m m m m m m m m m ¦ m m m m m m m m m m m m m m m ||

Scansion from Gasparov (1996) symbolizing the 5 possible combinations of light & heavy filling a segment of 4 morae.

where m=1 mora. The number of permutations of light and heavy syllables which would fill these pādas … well, I don’t want to think about it. However, it appears that in reality, these large pādas are composed of smaller measures of 4 morae, which by definition can only be realized in 5 ways:

m m m m =
u u u u
u u –
–   u u
u –   u
–   –
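The "5 ways" above, and the size of the unconstrained problem for whole pādas, follow from a simple recursion. A segment of n morae starts with either a light syllable (leaving n − 1) or a heavy one (leaving n − 2), so the counts follow the Fibonacci numbers. Below is a sketch with hypothetical function names. It deliberately ignores every actual rule of the Āryā.

```python
from __future__ import annotations
from functools import lru_cache

def fillings(morae: int) -> list[list[str]]:
    """Every sequence of light (u, 1 mora) and heavy (–, 2 morae)
    syllables totalling exactly `morae`."""
    if morae < 0:
        return []
    if morae == 0:
        return [[]]
    return ([["u"] + rest for rest in fillings(morae - 1)]
            + [["–"] + rest for rest in fillings(morae - 2)])

@lru_cache(maxsize=None)
def count_fillings(morae: int) -> int:
    """Same count without listing: a Fibonacci recurrence."""
    if morae < 0:
        return 0
    if morae <= 1:
        return 1
    return count_fillings(morae - 1) + count_fillings(morae - 2)

for seq in fillings(4):
    print(" ".join(seq))  # the 5 ways to fill a 4-mora measure

# Unconstrained counts for pādas of 12, 18, 12 and 15 morae.
print([count_fillings(n) for n in (12, 18, 12, 15)])  # [233, 4181, 233, 987]
```

The listing order differs from the one above, but the set of 5 patterns is the same.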

This suggests that coding for dependent anceps (discussed below), or something very similar, could be used to represent this meter, at least in the data. However, how this is optimally displayed is still an open question. Gasparov's clever symbol can be approximated with uu͞uu which renders uu͞uu; or it can be quite closely duplicated with the exotic uu uu which renders as uu uu. Other less fussy options include:

uuuu uuuu uuuu ¦ uuuu uuuu uuuu uuuu uu | uuuu uuuu uuuu ¦ uuuu uuuu u uuuu uu ||

or, more efficiently (and I think more clearly):

4m 4m 4m ¦ 4m 4m 4m 4m uu | 4m 4m 4m ¦ 4m 4m u 4m uu ||
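One side benefit of the compact notation is that it can be checked mechanically. The sketch below assumes a hypothetical tokenization: "Nm" (or "NM") is a measure of N morae, u is 1 mora, – is 2 morae, and letters written together (uu, uuuu) are summed. It totals each pāda between the ¦, | and || dividers.

```python
from __future__ import annotations
import re

def token_morae(token: str) -> int:
    """Mora value of one whitespace-separated token (hypothetical scheme)."""
    match = re.fullmatch(r"(\d+)[mM]", token)
    if match:
        return int(match.group(1))
    weights = {"u": 1, "–": 2, "-": 2}
    return sum(weights[ch] for ch in token)  # e.g. "uu" -> 2

def pada_lengths(scansion: str) -> list[int]:
    """Split on ||, | and ¦, then total the morae in each pāda."""
    padas = re.split(r"\|\||[¦|]", scansion)
    return [sum(token_morae(t) for t in p.split()) for p in padas if p.strip()]

arya = "4m 4m 4m ¦ 4m 4m 4m 4m uu | 4m 4m 4m ¦ 4m 4m u 4m uu ||"
print(pada_lengths(arya))  # [12, 18, 12, 15]
```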

I have several open questions about mātrika, and further research is needed. So my proposed solutions below only provisionally account for this metrical category.

Arabic

This section will only look at Classical Arabic verse, predominant from around the time of Muhammad through the mid-20th century. Pre-classical and modern Arabic verse may present other issues. Arabic prosody immediately presents 3 incompatibilities with Greek/Latin/Sanskrit: direction, and prosodic units and scansion (which will be treated together).

Direction of text

Arabic is written right-to-left (RTL); romanized Arabic, of course, runs LTR. In what order should a symbol string representing the meter run? All text and scansions in this essay run LTR, and it is my tacit assumption that Wikidata's values will also run LTR, though this is open to discussion. Assuming WD's values do run LTR, if it is desirable to display these scansions RTL under some circumstances, what are our options? A second property? Automated transformations (e.g. via template)? Manual transformations?
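If Wikidata stores LTR values, the simplest automated transformation is to reverse the order of units rather than of characters, so that multi-character units survive. Below is a minimal sketch with a hypothetical function name. It assumes the result is shown in a left-to-right context. Inside a right-to-left paragraph, the Unicode Bidirectional Algorithm may reorder runs again.

```python
def to_rtl_order(scansion: str) -> str:
    """Reverse the order of whitespace-separated units (not characters),
    so multi-character units such as '||' stay intact."""
    return " ".join(reversed(scansion.split()))

half = "u – – | u – – – | u – – | u – – –"
print(to_rtl_order(half))  # – – – u | – – u | – – – u | – – u
```

Applying the function twice returns the original string, so a template could offer both directions from one stored value.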

I will not address Persian verse in this essay, but it is (to my knowledge) the only other major language which features both RTL writing and quantitative prosody (there may be others). Persian verse is closely modeled on Arabic and, for the present, it is assumed that a system that supports Arabic scansion will provide a firm basis for Persian.

Prosodic units and scansion

Unlike Greek, Latin, and Sanskrit prosody, Arabic prosody does not conceive of verse as composed of syllables, but of hierarchical groups ultimately defined by phoneme types. To briefly describe the prosodic system, moving from small to big, where C=consonant, v=short vowel, and V=long vowel…

The ḥarf is the minimal linguistic unit. It is either moving (Cv) or quiescent (C or V). The ḥarf is the basic unit of time, so is essentially equivalent to "mora". However, whereas in other prosodies (e.g. Latin, Japanese) a single mora can function as a prosodic unit, in Arabic the ḥarf is too brief, so its Elementary Prosodic Units (EPUs) are composed of multiple ḥarfs. The minimal EPU is CvC = 2 ḥarfs = "long syllable" (though, as stated, this concept is foreign).

EPU	ḥarfs	G/L equivalent	Notes
Light Cord (khafīf)	2	–	optionally replaced with ∪ (i.e. it's an anceps)
Heavy Cord (thaqīl)	2	∪ ∪	optionally replaced with – (i.e. it's a biceps)
Joined Peg (majmūʿ)	3	∪ –
Separated Peg (mafrūq)	3	– ∪	Theoretically defective, but necessary to explain some verse forms.
Fāsila Cord	4	∪ ∪ –	Some prosodists reject the Heavy Cord, and use Fāsila instead.

The prosody admits 8 metrical feet, each composed of 1 peg + 1 or 2 cords. Arabic verse has no traditional abstract system of scansion. Metrical form is traditionally represented, not symbolically, but verbally by means of 8 mnemonic words which exemplify the feet.

Mnemonic	G/L scansion	ḥarf values	Cord/peg composition
faʿūlun	∪ – –	32	J + L
fāʿilun	– ∪ –	23	L + J
mafāʿīlun	∪ – – –	322	J + L + L
mustafʿilun	– – ∪ –	223	L + L + J
fāʿilātun	– ∪ – –	232	L + J + L
mufāʿalatun	∪ – ∪ ∪ –	322 or 34	J + H + L (or J + F)
mutafāʿilun	∪ ∪ – ∪ –	223 or 43	H + L + J (or F + J)
mafʿūlātu	– – – ∪	2221*	L + L + S*

*Theoretically defective, but necessary to explain some verse forms.
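Each foot is just a concatenation of cords and pegs, so the G/L scansion and ḥarf columns of the table can be derived from the composition column. That makes a cheap consistency check. Below is a sketch with hypothetical names. It treats each EPU as a fixed string, so the heavy cord is always u u and never its – substitute. For the asterisked mafʿūlātu, the naive sum gives 223, not the 2221 listed in the table.

```python
from __future__ import annotations

# EPUs from the table above: letter -> (ḥarf count, G/L symbols).
EPUS = {
    "L": (2, ["–"]),            # light cord
    "H": (2, ["u", "u"]),       # heavy cord
    "J": (3, ["u", "–"]),       # joined peg
    "S": (3, ["–", "u"]),       # separated peg
    "F": (4, ["u", "u", "–"]),  # fāsila
}

def compose(units: str) -> tuple[str, str]:
    """Cord/peg string such as 'JLL' -> (G/L scansion, ḥarf digits)."""
    symbols, harfs = [], ""
    for unit in units:
        count, syllables = EPUS[unit]
        symbols.extend(syllables)
        harfs += str(count)
    return " ".join(symbols), harfs

FEET = {
    "faʿūlun": "JL", "fāʿilun": "LJ", "mafāʿīlun": "JLL",
    "mustafʿilun": "LLJ", "fāʿilātun": "LJL", "mufāʿalatun": "JHL",
    "mutafāʿilun": "HLJ", "mafʿūlātu": "LLS",
}

for name, units in FEET.items():
    scansion, harfs = compose(units)
    print(f"{name:12} {scansion:10} {harfs}")
```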

Like Sanskrit verses, Arabic verses tend to be quite long, with major divisions within them. Here are 3 different scansion views of the Ṭawīl verse:

ḥarfs   3  2    3  2 2    3  2    3  2 2    3  2    3  2 2    3  2    3  2 2
G/L    u - - | u - - - | u - - | u - - - ! u - - | u - - - | u - - | u - - - #
Mnem  faʿūlun mafāʿīlun faʿūlun mafāʿīlun faʿūlun mafāʿīlun faʿūlun mafāʿīlun

In European versification, this structure would tend to be thought of as a couplet.

Representation

Though Classical Arabic prosody does not use the concept of the syllable, all metrical structure can be reduced to short and long syllables. So using Greek/Latin symbols should render the structure losslessly. I assume that this will be our primary method. However, if it is also desirable to display Arabic meters with their traditional scansion, it may be necessary to keep the foot divisions in the G/L-style scansion (contra my G/L notes above), as this will help identify the correct mnemonic words. It is my understanding that ḥarfs, pegs, and cords, while essential furniture in Arabic prosody, are not considered part of the scansion.
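A few lines of code show why foot divisions help identify the mnemonic words. With dividers, each foot is a direct lookup. Without them, a string of longs and shorts can sometimes be cut into feet in more than one way. The names below are hypothetical, and the lookup table is built from the G/L column of the foot table.

```python
from __future__ import annotations

# G/L scansion of each foot (from the table above) -> mnemonic word.
MNEMONICS = {
    "u – –": "faʿūlun",
    "– u –": "fāʿilun",
    "u – – –": "mafāʿīlun",
    "– – u –": "mustafʿilun",
    "– u – –": "fāʿilātun",
    "u – u u –": "mufāʿalatun",
    "u u – u –": "mutafāʿilun",
    "– – – u": "mafʿūlātu",
}

def to_mnemonics(scansion: str) -> list[str]:
    """Foot-divided scansion (feet separated by |) -> mnemonic words."""
    feet = (" ".join(part.split()) for part in scansion.split("|"))
    return [MNEMONICS[foot] for foot in feet if foot]

def all_segmentations(symbols: list[str]) -> list[list[str]]:
    """Every way to cut an undivided scansion into known feet (3-5 syllables)."""
    if not symbols:
        return [[]]
    results = []
    for size in (3, 4, 5):
        head = " ".join(symbols[:size])
        if len(symbols) >= size and head in MNEMONICS:
            for rest in all_segmentations(symbols[size:]):
                results.append([MNEMONICS[head]] + rest)
    return results

print(to_mnemonics("u – – | u – – – | u – – | u – – –"))
# Without dividers this string has two parses:
# fāʿilun + mustafʿilun, or fāʿilātun + fāʿilun.
print(all_segmentations("– u – – – u –".split()))
```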

This raises the possibility of 4 distinct displays for Arabic scansion:

LTR Greek/Latin symbols
RTL Greek/Latin symbols
LTR Arabic mnemonics (romanized)
RTL Arabic mnemonics (Arabic script)

Moving forward

Because the Sanskrit and Arabic systems are to some degree deficient in symbols descriptive of the underlying meter (as opposed to realized verse lines), I propose that Wikidata's values be founded upon the Greek/Latin system, though some modifications may be necessary.
Because of the difficulties of displaying certain units, I am advocating for a system in which the Wikidata values are strings of Code, which can be called up by templates which transform them into strings of the Display values. If later it is determined that better Display values are available, or if the Unicode options become sufficiently common, then these transforming templates can be updated and all displayed instances of the scansions will be automatically updated, without having to alter the underlying values. This also leaves the door open to the creation of additional transforming templates to display the coded scansions in formats native to Sanskrit, Arabic, or other prosodies (though as discussed above, the native display of Sanskrit underlying meters is questionable).

Minimal prosodic units

In the table below, I try to answer these questions:

1. Regardless of details of coding or display, what are the minimal prosodic units which must be included if we wish a single system to accommodate Greek, Latin, Sanskrit, and Arabic quantitative scansions?

2. In a perfect Unicode world, what symbol is optimally mapped to the unit?

3. In the existing world of unpredictable browsers and fonts, how do we optimally display the unit?

4. Assuming that the Wikidata value will need to be transformed into different characters for display anyway, what are the optimal code values?

#	UNIT	Unicode	Unicode code	Display	Display code	Code	Code code	Description
1	unit separator	[space]	0020	[space]	0020	[space]	0020	Unit separator, both for code and display.
2	Light syllable	⏑	23D1	∪ or u	222A or 0075	u	0075	=G/L "short". "Light" is probably more accurate, even for G/L.
3	Heavy syllable	–	2013	–	2013	-	002D	=G/L "long". "Heavy" is probably more accurate, even for G/L. Note that I'm assuming an EN dash for Unicode & Display, but a hyphen for Code.
4	Independent anceps	⏒	23D2	×	00D7	x	0078
5	Dependent anceps	[?]	?	[?]	?	X	0058	If this unit is used, it will require additional coding apparatus, discussed below.
6	Biceps	⏔	23D4	∪∪ or uu	`<u>∪∪</u>` or `<u>uu</u>`	w or B or 2	0077 or 0042 or 0032	Literally a double-u. Biceps starts with B and looks like a sideways B. 2 stands for BIceps.
7	Triceps	[NA]	-	∪͝∪ or u͝u	`<u>∪͝∪</u>` or `<u>u͝u</u>`	3	0033
8	Foot/metron/pāda divider	[not displayed?]	-	[not displayed?]	-	|	007C	I'm still inclined to ignore these for G/L, but unsure. They may be necessary for Sanskrit and Arabic code. If they must be displayed for S and/or A, then perhaps the broken bar (¦=00A6) would be appropriate.
9	Mandatory word boundary	/	002F	/	002F	/	002F	=G/L "caesura".
10	Intra-verse line boundary	|	007C + <br>	|	007C + <br>	!	0021	Pipes displayed only for Sanskrit.
11	Verse end	‖	2016	||	007C × 2			This unit is not needed in code; pipes displayed only for Sanskrit.
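Here is a minimal sketch of what a transforming template might do with this table, written in Python rather than wiki template syntax. It assumes the first-choice values from the table (w for biceps, and plain rather than underlined display). It drops the | divider and turns ! into a line break, as for non-Sanskrit verse. It displays a dependent anceps X exactly like x, as the "easy path" below recommends. All names are hypothetical, and an unknown code unit simply raises KeyError.

```python
# Hypothetical Code -> Display table, following the Code and Display
# columns above (first-choice values; underlining omitted).
CODE_TO_DISPLAY = {
    "u": "u",         # light syllable
    "-": "–",         # heavy: hyphen in code, EN DASH in display
    "x": "×",         # independent anceps
    "X": "×",         # dependent anceps, shown like x ("easy path")
    "w": "uu",        # biceps
    "3": "u\u035Du",  # triceps, via COMBINING DOUBLE BREVE
    "|": "",          # foot/metron/pāda divider: not displayed
    "/": "/",         # mandatory word boundary
}

def render(code: str) -> str:
    """Turn a space-separated Code value into display text.
    The intra-verse boundary '!' becomes a line break."""
    lines, current = [], []
    for unit in code.split():
        if unit == "!":
            lines.append(" ".join(current))
            current = []
        elif CODE_TO_DISPLAY[unit]:  # skip units with no display form
            current.append(CODE_TO_DISPLAY[unit])
    lines.append(" ".join(current))
    return "\n".join(line for line in lines if line)

print(render("x x - u u - u - u - x"))
print(render("u - - | u - - - | u - - | u - - - ! u - - | u - - - | u - - | u - - -"))
```

Swapping a display value (say, ∪ for u) then means editing one table entry rather than every stored value, which is the point of keeping Code and Display separate.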

Dependent anceps

To my knowledge, the dependent anceps problem applies chiefly to Sanskrit verse, but this discussion may be germane in other situations.

The easy path

We may choose not to solve the problem, and simply display:

x x x x x x x x ¦ x x x x u – u x |

even though we know this is a poor description of the actual metrical structure. After all, this complex situation can never be fully explained with a mere scansion; this is what articles are for. However, even if we take this path of avoidance, I recommend that we still keep a discrete code for "Dependent anceps" (e.g. capital X), and just transform this code to the same display value as "Independent anceps". This is because if we later decide to make the system more robust, at least some of the required structure will already be there.

The hard path

The other option is to attempt to characterize the dependencies in the scansion code. Whether and how these complex dependencies are displayed is another question. Certainly our initial transforming templates would simply ignore this information, so that to the viewer there would initially be little or no difference between the easy and hard paths … the difference would be hidden within the coded value. The display might either be identical to the above example, or just subtly different, say:

x x x x x x x x ¦ x x x x u – u x |

or

x X X X X X X x ¦ x X X X u – u x |

However, the code would contain robust information. (For now, I am using the gaṇas for shorthand, since they are germane for Sanskrit; if we pursue this hard path, we'll have a fuller discussion of how this will be managed.) The slightly easier method is to specify each half-pāda's valid possibilities:

x X X X (M Y R T J B) X X X (M Y R B N) x | x X X X (M Y T J B) u – u x !

This correctly limits which gaṇas could appear in a given position, but fails to account for the interaction between the 2 gaṇas within the 1st pāda. To do this, more extensive coding would be required:

x X X X X X X (<M Y> <M R> <M B> <M N> <Y Y> <Y R> <Y B> <Y N> <R M> <R Y> <R R> <R B> <R N> <T Y> <T B> <T N> <J Y> <B Y>) x | x X X X (M Y T J B) u – u x !

This perfectly reflects all dependencies (at least according to Brown). A more hierarchical approach is probably superior; in that case, we'd get something like this:

x X X X X X X (<M <Y R B N>> <Y <Y R B N>> <R <M Y R B N>> <T <Y B N>> <J Y> <B Y>) x | x X X X (M Y T J B) u – u x !

All of these examples scan only the 1st half-verse (though the second is a verbatim repeat of the first). Just for reference, the above formulation is real, but I suspect it to be one of the worst-case scenarios of complexity. But no promises. Naturally, all hard paths require additional reserved code characters.

Display

I think it would be valuable to house this complex dependent anceps information in our code, even if we find no adequate way to display it. At the moment, the only display solution that makes sense to me is a modification of Quinn's (see Phalaeceans, below). I propose the following rules:

1. Begin the scansion with the entire verse on 1 line (no line breaks, even at [!]). This scansion displays the simple “X” (or whatever we decide to use) for each dependent anceps.

2. Report out each variant segment (but only the variant bits) on its own line.

3. When multiple sets of variants need to be reported, and they are not mutually dependent, separate them with a broken bar that extends down as far as both lists run side by side.

4. When one defined set of variants is repeated, do not repeat the list, but label the initial list, and repeat the label.

Given this system, Brown's Anuṣṭubh (again, this is probably a near-worst-case scenario) would be displayed:

x X X X X X X x ¦ x X X X u – u x | x X X X X X X x ¦ x X X X u – u x ||
 (A)            ¦  (B)               (A)               (B)
  – – – u – –   ¦   – – –
  – – – – u –   ¦   u – –
  – – – – u u   ¦   – – u
  – – – u u u   ¦   u – u
  u – – u – –   ¦   – u u
  u – – – u –
  u – – – u u
  u – – u u u
  – u – – – –
  – u – u – –
  – u – – u –
  – u – – u u
  – u – u u u
  – – u u – –
  – – u – u u
  – – u u u u
  u – u u – –
  – u u u – –

This system may work equally well for moric verse. The Āryā might be displayed:

4m      4m 4m ¦ 4m 4m 4m 4m uu | 4m 4m 4m ¦ 4m 4m u 4m uu ||
u u u u
u u –
–   u u
u –   u
–   –

I do not imagine these complex displays replacing the simple displays, but being an alternative we can offer, once the "simple" display is worked out.

Additional code units

To implement dependent anceps at the Code level, something like these additional Units will be necessary. (These are not used consistently in the examples above.)

#	UNIT	Code	Unicode	Description
12	alternative separator	,	002C	Separates elements, any 1 of which may alternatively be mapped onto the given positions.
13	dependent containers	[ ]	005B & 005D	Contains 1 group of dependent anceps (e.g. `[X X X]` ) whose alternatives must then be listed. In standard display, these brackets are hidden, but their content is still displayed.
14	list containers	( )	0028 & 0029	Contains the list to be mapped onto the previous dependents. In standard display these containers and everything inside them are totally ignored.
15	alternative containers	< >	003C & 003E	Occur within (). Contain distinct alternatives that may be mapped onto the group of dependent anceps. May be nested, e.g. `<- <- u, u u, u ->>` = `- - u` OR `- u u` OR `- u -`.
16	group labels	letters (lowercase?)		If group labeling is desired (e.g. "(A)" and "(B)" in the Anuṣṭubh example above), these should be implemented in the code with the literal letters that are to be displayed. These labels should probably be placed within the dependent containers. Possibly they could be placed within the list containers: the first time followed by the list they label, the second time alone (e.g. `(A)`), indicating a repetition of the previously defined list.
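The nesting rule in row 15 is easy to get wrong by hand, so a small expander is a useful sanity check. This sketch implements only the comma convention of row 15, where <, > and , are delimiters and whitespace separates units inside them. It does not handle the space-separated gaṇa shorthand used in the Anuṣṭubh examples. The function names are hypothetical.

```python
from __future__ import annotations

import re
from itertools import product

TOKEN = re.compile(r"[<>,]|[^\s<>,]+")  # delimiters, or runs of other symbols

def _alternatives(tokens: list[str], i: int) -> tuple[list[list[str]], int]:
    """Parse comma-separated sequences up to a '>' or the end of input."""
    realizations = []
    while True:
        seq, i = _sequence(tokens, i)
        realizations.extend(seq)
        if i < len(tokens) and tokens[i] == ",":
            i += 1
        else:
            return realizations, i

def _sequence(tokens: list[str], i: int) -> tuple[list[list[str]], int]:
    """Parse units and <...> groups up to ',' or '>', expanding all combinations."""
    parts = []  # one list of realizations per unit or group
    while i < len(tokens) and tokens[i] not in (",", ">"):
        if tokens[i] == "<":
            inner, i = _alternatives(tokens, i + 1)
            if i >= len(tokens) or tokens[i] != ">":
                raise ValueError("unclosed '<'")
            parts.append(inner)
        else:
            parts.append([[tokens[i]]])
        i += 1
    combos = product(*parts)  # pick one realization per part
    return [[sym for part in combo for sym in part] for combo in combos], i

def expand(code: str) -> list[str]:
    """Expand a row-15-style expression into its flat alternatives."""
    tokens = TOKEN.findall(code)
    realizations, i = _alternatives(tokens, 0)
    if i != len(tokens):
        raise ValueError(f"unexpected {tokens[i]!r}")
    return [" ".join(r) for r in realizations]

print(expand("<- <- u, u u, u ->>"))  # ['- - u', '- u u', '- u -']
```

At the top level the same function also expands a plain comma list such as Quinn's "- -, - u, u -".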

Additionally, for moric verse to be displayed, we will probably have to add:

#	UNIT	Code	Unicode
17	mora count	numerals	0030-39
18	mora indicator	M	004D

Both of these will probably be displayed as coded, e.g. 4M is displayed: 4M. However, implications of using dependent anceps structure with moric verse have not yet been fully vetted.

Stray notes

Phalaeceans and their implications

A closer look at Latin hendecasyllabics (phalaeceans here, for better clarity) highlights some issues. Consider these scansions:

x x – u u – u – u – u (PEPP3&4)

x x – u u – u – u – x (Cole in Wimsatt; also PEPP1&2 since you asked)

x x – u u – u – u – – (HOR)

– – – u u – u – u – x (Quinn)
– u
u –

References

First, this underlines the importance of including references in our statements. This is a well-studied and (I think) basically uncontroversial meter, but the first 4 sources I checked all gave subtly different scansions. I suspect the final syllable in PEPP3&4 is simply an error sadly perpetuated through editions. If the final syllable in HOR includes an "understood" brevis in longo, then it is functionally equivalent to Cole. But I don't know.

Qualifiers

Second, I wonder if Quinn's minority opinion on the first 2 positions is the result of his writing specifically about Catullus’s phalaeceans, not phalaeceans in general. This encourages us to include an "as used by/in" property as qualifier for our statements. The evolution of verse practice is such that — although conforming to the same overall pattern — often one poet allows certain configurations that another does not. (As a second example, the dependent anceps formulation of Kālidāsa's Anuṣṭubh is radically simpler than that of the Epics — probably each should appear as distinct values of Anuṣṭubh: quantitative metrical pattern, one with the qualifier "as used by Kālidāsa", the other with the qualifier "as used in the Mahābhārata and Rāmāyaṇa".)

Dependent anceps

Third, the most interesting difference (to me, anyway) is that Quinn specifies that the first 2 syllables are dependent anceps. (That is, if they were independent, Quinn would have to list u u as a fourth alternative — all 4 PEPPs explicitly list all 4 possible combinations in the first 2 positions; Quinn lists 3, as shown.) Thus we have a non-Sanskrit instance of a scansion that ideally would be coded with a dependent anceps scansion: X X (- -, - u, u -) - u u - u - u - x. I have not yet been able to confirm this with a reliable source, but the English Wikipedia's article on meter suggests that even the highly regulated Classical Arabic has an instance too.

References

Brown, Charles Philip (1869). Sanskrit Prosody and Numerical Symbols Explained (PDF). London: Trübner & Co.
Gasparov, M.L. (1996). A History of European Versification. Oxford: Clarendon Press. ISBN 0-19-815879-3.
Halporn, James W.; Ostwald, Martin; Rosenmeyer, Thomas G. (1980). The Meters of Greek and Latin Poetry (Revised ed.). Indianapolis: Hackett Publishing Company, Inc. ISBN 0-87220-243-7.
Morgan, Les (2011). Croaking Frogs: A Guide to Sanskrit Metrics and Figures of Speech. Mahodara Press. ISBN 978-1463725624.
PEPP3: Preminger, Alex; Brogan, T.V.F.; et al., eds. (1993). The New Princeton Encyclopedia of Poetry and Poetics. New York: MJF Books. ISBN 1-56731-152-0.
PEPP4: Greene, Roland; Cushman, Stephen; et al., eds. (2012). The Princeton Encyclopedia of Poetry and Poetics (Fourth ed.). Princeton, NJ: Princeton University Press. ISBN 978-0-691-13334-8.
Quinn, Kenneth, ed. (1973). Catullus: The Poems (2nd ed.). London: St. Martin's Press. ISBN 0-333-01787-0.
Rietbroek, Pim (2008). Metrical Notation: A Guide to the most frequently-used symbols in Unicode (PDF). Version 1.0.3. Brill.
TEI Consortium, eds. "6 Verse". TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 2.9.1. 2015-10-15. TEI Consortium. (Accessed 2016-01-02)
Wimsatt, W. K., ed. (1972). Versification: Major Language Types. New York: New York University Press. ISBN 0-8147-9155-7.