
mkmss

A manuscript copying simulation

Underlying model

This copying simulation is based on a geographical model where “Places” (= population centres) create demand for texts. There are four places in the current version of the simulation: Rome, Ephesus, Antioch, and Alexandria.

The simulation runs through a number of cycles (called “generations”). Each cycle consists of five steps: (1) import copies from other places; (2) make copies from local copies; (3) edit local copies according to local preferences; (4) lose copies; (5) grow demand for copies (using logistic growth).
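The five steps can be pictured as one loop per generation. What follows is a minimal Python sketch under simplifying assumptions, not the app's own code. All of the following are hypothetical:

- the names `run_generation` and `copy_with_changes`;
- representing a copy as a list of state indices;
- choosing exemplars and source Places uniformly rather than by Zipf rank;
- leaving out step (3);
- starting from a single copy at Rome.

```python
import random

def copy_with_changes(exemplar, n_states, p_change, rng):
    """Copy an exemplar; each character is redrawn (possibly to the same state) with probability p_change."""
    return [rng.randrange(n) if rng.random() < p_change else state
            for state, n in zip(exemplar, n_states)]

def run_generation(places, demand, n_states, p_import, p_change, p_loss, r, K, rng):
    """Run one hypothetical generation, updating `places` and `demand` in place.

    places: dict mapping a Place name to its list of extant copies
    demand: dict mapping a Place name to the number of copies it wants
    """
    names = list(places)
    for name in names:
        local = places[name]
        shortfall = max(int(demand[name]) - len(local), 0)
        for _ in range(shortfall):
            donors = [other for other in names if other != name and places[other]]
            if donors and (not local or rng.random() < p_import):
                # (1) import: copy a text held by another Place
                exemplar = rng.choice(places[rng.choice(donors)])
            elif local:
                # (2) copy a local text
                exemplar = rng.choice(local)
            else:
                continue  # nothing available to copy yet
            local.append(copy_with_changes(exemplar, n_states, p_change, rng))
        # (3) editing toward local preferences is omitted in this sketch.
        # (4) each copy is lost independently with probability p_loss.
        places[name] = [c for c in local if rng.random() >= p_loss]
        # (5) logistic growth of demand: dN = N * r * (1 - N / K)
        N = demand[name]
        demand[name] = N + r * N * (1 - N / K)

# Usage with placeholder settings.
rng = random.Random(1)
n_states = [2, 3, 2, 4]
places = {"Rome": [[0, 0, 0, 0]], "Ephesus": [], "Antioch": [], "Alexandria": []}
demand = {name: 2.0 for name in places}
for _ in range(8):
    run_generation(places, demand, n_states, 0.1, 0.05, 0.2, 1.0, 50.0, rng)
```

The order of the steps inside the loop follows the list above. Everything else, including the data layout, is illustrative.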

At the end, some copies are recovered. They are taken both from copies that survive until the end of the cycles (extant copies) and from copies lost during the cycles (lost copies).

A model data set

Examples of real data sets can be found here:

http://www.tfinney.net/Views/index.xhtml

We will aim to find a combination of simulator settings to produce analysis results like those obtained from UBS4 apparatus data for the Gospel of Mark:

CMDS (classical multidimensional scaling): http://www.tfinney.net/Views/cmds/Mark-UBS4.15.SMD.gif

DC (divisive clustering): http://www.tfinney.net/Views/dc/Mark-UBS4.15.SMD.png

NJ (neighbour-joining): http://www.tfinney.net/Views/pheno/NJ/Mark-UBS4.15.SMD.png


Start the simulation

The simulation can be temporarily accessed here:

https://mkmss.shinyapps.io/mkmss

Another way to access it is to install RStudio's Shiny package for R on your machine and then download the files located here:

http://www.tfinney.net/Simulation/scripts/Apps/mkmss/

Characters/text

A character is a place where the text varies. (In NT textual research, a character is called a “variant phrase”.)

A state is one of the textual manifestations (of zero or more words) encountered at a character. (In NT textual research, a state is called a “reading”.)

A character can have two or more states. This simulation uses a negative binomial distribution to decide how many states are contained in each character. (The distribution is calibrated to behave similarly to the UBS4 apparatus with respect to the number of states per character.)

Experiment with this slider to see the effect of different numbers of characters per artificial text. As with a number of other sliders, larger values make the simulation work harder and therefore take longer to complete.
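The following Python sketch shows one way to draw the number of states for each character. It assumes every character has at least two states, plus a negative binomial number of extra states. The parameters `k` and `p` are placeholders, not the app's calibrated values, which are not given here. The function names are hypothetical. Python's standard library has no negative binomial sampler, so one is built from Bernoulli trials.

```python
import random

def negative_binomial(k, p, rng):
    """Number of failures before the k-th success in Bernoulli(p) trials."""
    failures = successes = 0
    while successes < k:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def states_per_character(n_characters, k=1, p=0.6, seed=None):
    """Hypothetical: two states minimum plus a negative binomial number of extra states."""
    rng = random.Random(seed)
    return [2 + negative_binomial(k, p, rng) for _ in range(n_characters)]

counts = states_per_character(140, seed=42)  # about 140 characters, as for UBS4 Mark
print(min(counts), max(counts), sum(counts) / len(counts))
```

With these placeholder values, the mean number of states is 2 + k(1 − p)/p ≈ 2.67. Calibrating against UBS4 would mean tuning `k` and `p` until the distribution of `counts` resembles the apparatus.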

Generations/simulation

The number of cycles to complete. With a growth factor of r = 1, it takes about eight generations for the population of copies to get near its maximum size.

Larger values make the simulation work much harder.

P(import) / unit (of demand)

For every copy required by a Place, what is the chance of importing one from another Place?

Copies are more likely to be imported from nearer Places, according to a Zipf distribution (P = k / rank^s) with s = 1. Here rank is the source Place's rank by nearness, and k is a normalising constant.

Increasing P(import) makes more “cross-talk” between Places. (Look for the word “taken” under the “Recovered texts” tab.)
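Here is a small Python sketch of choosing a source Place by nearness rank with Zipf weights. The nearness ordering used below is hypothetical, as are the function names. Only the form P = k / rank^s with s = 1 comes from the text.

```python
import random

def zipf_probabilities(n, s=1.0):
    """Normalised Zipf probabilities P(rank) = k / rank**s for ranks 1..n."""
    raw = [1 / rank ** s for rank in range(1, n + 1)]
    k = 1 / sum(raw)  # normalising constant so the probabilities sum to 1
    return [k * w for w in raw]

def choose_source(others_by_distance, rng, s=1.0):
    """Pick a source Place; `others_by_distance` lists the other Places, nearest first."""
    probs = zipf_probabilities(len(others_by_distance), s)
    return rng.choices(others_by_distance, weights=probs, k=1)[0]

# Hypothetical nearness ordering, for illustration only.
others = ["Ephesus", "Antioch", "Alexandria"]
print([round(p, 3) for p in zipf_probabilities(len(others))])  # [0.545, 0.273, 0.182]
print(choose_source(others, random.Random(0)))
```

With three other Places and s = 1, the nearest is chosen about three times as often as the farthest (6/11 versus 2/11).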

P(change) / character

For every character copied, what is the chance that it will change to another one of the possible states? (Remember that a number of states is assigned to each character at the start of the simulation.)

What do you think is a realistic proportion in this situation? That is, what is the chance that a scribe would change the state of a UBS-like character in the course of making one copy? The UBS apparatus has about 140 characters (i.e. variant phrases) for the Gospel of Mark (about 9 per chapter).

Try different values to see the effect. Large values produce the characteristic pattern of unrelated texts, i.e. texts that relate to one another in the same way as texts composed of randomly chosen states. In an NJ diagram, unrelated texts produce a pattern that looks like the spokes of a bicycle wheel.
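The Python sketch below shows per-character change during copying. It assumes that a changed character moves to one of its other states, chosen uniformly; the app may weight the alternatives differently. The function name and the placeholder state counts are hypothetical. With about 140 characters, a copy is expected to contain 140 × P(change) changes, so P(change) = 0.01 gives 1.4 changes per copy on average.

```python
import random

def copy_text(exemplar, n_states, p_change, rng):
    """Copy a text; at each character, switch to a different state with probability p_change.

    Assumption: the new state is chosen uniformly from the character's other states.
    """
    copy = []
    for state, n in zip(exemplar, n_states):
        if rng.random() < p_change:
            state = rng.choice([s for s in range(n) if s != state])
        copy.append(state)
    return copy

rng = random.Random(3)
n_states = [2 + rng.randrange(3) for _ in range(140)]  # placeholder state counts
exemplar = [0] * 140
child = copy_text(exemplar, n_states, 0.01, rng)
print(sum(a != b for a, b in zip(exemplar, child)))  # number of changed characters in this copy
```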

P(correction) / generation

What is the chance that a copy will be corrected against another copy from the same Place?

Copies to use as the exemplar (i.e. used as a source of states for the copy being corrected) are chosen according to their rank in the Place's extant (i.e. not lost) collection. A Zipf distribution (with s=1) is used here too.

New additions (whether imported or copied) are added to the end of the extant collection so they are less likely to be used as exemplars until the ones ranked above them are lost.

What is a reasonable value for this? (Just about every real copy was corrected at least once.)
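This Python sketch picks a correction exemplar by rank in a Place's extant collection, with rank 1 as the first item. New copies are appended to the end, so they start at the lowest rank. The function names are hypothetical. So is the rule that a corrected copy adopts the exemplar's state at every character; the app may correct only some characters.

```python
import random

def choose_exemplar(collection, corrected_index, rng, s=1.0):
    """Choose an exemplar by rank in the extant collection, using Zipf weights 1 / rank**s."""
    ranked = [c for i, c in enumerate(collection) if i != corrected_index]
    if not ranked:
        return None  # nothing to correct against
    weights = [1 / rank ** s for rank in range(1, len(ranked) + 1)]
    return rng.choices(ranked, weights=weights, k=1)[0]

def correct(collection, index, rng, s=1.0):
    """Hypothetical correction: the copy adopts the exemplar's state at every character."""
    exemplar = choose_exemplar(collection, index, rng, s)
    if exemplar is not None:
        collection[index] = list(exemplar)

rome = [[0, 0], [1, 0], [0, 1]]
rome.append([1, 1])  # a new addition goes to the end of the collection
correct(rome, 3, random.Random(5))
print(rome[3])  # now matches one of the three older copies
```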

P(edition) / generation

This introduces the idea of local preference for particular states. Were some readings preferred more than others in a place?

At the start of the simulation, each place is assigned its own list of preferred states for each character. The preferred states are a permutation of the list of states. E.g., for a character with three states, 1 3 2 and 3 2 1 are two permutations. As with just about every other thing in this simulation, the permutations are randomly generated.

What kind of effect would you expect local preferences to have? (Try the slider and see.)
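Here is a Python sketch of assigning each Place a random preference order (a permutation of the states) for every character. The function name and the example state counts are hypothetical. States are numbered from 1, matching the "1 3 2" example in the text.

```python
import random

def preferred_orders(places, n_states, rng):
    """Give each Place a random permutation of each character's states, most preferred first."""
    return {place: [rng.sample(range(1, n + 1), n) for n in n_states]
            for place in places}

prefs = preferred_orders(["Rome", "Ephesus", "Antioch", "Alexandria"], [3, 2, 4], random.Random(11))
print(prefs["Rome"])  # three permutations, one per character
```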

Trend (toward preferred text)

The strength of preference for the higher ranked states is determined by the trend slider. This sets the exponent of the Zipf distribution used to choose from the list of preferences. A value of one gives relative probabilities of 1, 0.5, 0.33, 0.25, … whereas a value of five gives relative probabilities of 1, 0.031, 0.0041, 0.00098, …

The higher the value, the more pronounced is preference for the local flavour of a text. (A value of five or more could be described as pathological parochialism.)
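The relative probabilities quoted above follow from Zipf weights 1 / rank^trend. The Python sketch below reproduces them and uses them to pick a state from a preference list. The function names are hypothetical, and so is treating the editing step as one weighted draw.

```python
import random

def relative_probabilities(n, trend):
    """Zipf weights 1 / rank**trend for ranks 1..n (not normalised)."""
    return [1 / rank ** trend for rank in range(1, n + 1)]

def edit_state(preferences, trend, rng):
    """Hypothetical editing step: draw a state from a Place's preference list."""
    weights = relative_probabilities(len(preferences), trend)
    return rng.choices(preferences, weights=weights, k=1)[0]

print([round(w, 2) for w in relative_probabilities(4, 1)])  # [1.0, 0.5, 0.33, 0.25]
print([f"{w:.2g}" for w in relative_probabilities(4, 5)])   # ['1', '0.031', '0.0041', '0.00098']
print(edit_state([3, 2, 1], 5, random.Random(0)))
```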

P(loss) / generation

What is the chance that a copy will be lost in a generation? A value of 0.5 means that only one copy in 2^n survives n generations. E.g., for 10 generations, the chance of survival would be one in 1024.

What would you guess is the per generation chance of survival for a real New Testament manuscript? (Assume that one generation is 25 years.)
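The survival figure above is (1 − P(loss))^n, assuming each generation's loss is independent. A one-function Python check follows; the function name is hypothetical.

```python
def survival_probability(p_loss, generations):
    """Chance that a copy survives `generations` generations when each one loses it with probability p_loss."""
    return (1 - p_loss) ** generations

print(1 / survival_probability(0.5, 10))  # 1024.0
```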

This slider acts as a reference for other “per generation” sliders. E.g. if P(correction) is twice P(loss), each copy undergoes two correction events on average during its lifetime.
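The "two correction events" figure comes from the expected lifetime of a copy. If a copy is lost with probability P(loss) each generation, it is present for 1 / P(loss) generations on average. That gives P(correction) / P(loss) expected corrections. The Python sketch below checks this by simulation. It assumes that each generation a copy may first be corrected and then may be lost; the app's ordering of steps may differ slightly. The function names are hypothetical.

```python
import random

def expected_corrections(p_correction, p_loss):
    """Closed form: a copy is present for 1 / p_loss generations on average."""
    return p_correction / p_loss

def simulated_corrections(p_correction, p_loss, trials=20000, seed=0):
    """Monte Carlo check; each generation a copy may be corrected, then may be lost."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        while True:
            if rng.random() < p_correction:
                total += 1
            if rng.random() < p_loss:
                break
    return total / trials

print(expected_corrections(0.2, 0.1))   # 2.0
print(simulated_corrections(0.2, 0.1))  # close to 2
```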

Growth / generation (logistic)

The growth is calculated as follows:

ΔN = N * r * (1 − N/K)

This slider sets r. (K is set inside the program.) A value of r = 1 makes the demand roughly double every generation at the beginning. The rate of increase decreases to zero as N/K approaches one.
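The Python sketch below iterates ΔN = N * r * (1 − N/K) to show both behaviours described: near-doubling at first, and levelling off near K. The values of K and the starting demand (1% of K) are hypothetical, since K is set inside the program. With these choices and r = 1, demand is above 90% of K after eight generations, consistent with the "about eight generations" mentioned under Generations/simulation.

```python
def logistic_demand(n0, r, K, generations):
    """Iterate dN = N * r * (1 - N / K); return N for generations 0..generations."""
    history = [n0]
    N = n0
    for _ in range(generations):
        N += r * N * (1 - N / K)
        history.append(N)
    return history

demand = logistic_demand(10, 1.0, 1000, 8)
print(round(demand[1] / demand[0], 2))  # 1.99: nearly doubles at first
print(round(demand[-1]))                # 924: close to K after eight generations
```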

Lost / extant (for recovery)

What is the relative probability that the recovery phase will retrieve lost copies instead of extant ones?

Lost copies tend to be older, but there are few really old ones.

The simulation is set to recover 60 of 1200 copies, i.e. only 5%.

What do you think is the real ratio of lost (e.g. ones found in rubbish heaps, burials, caves) to extant copies (e.g. ones that have survived in monasteries, libraries or museums) for New Testament manuscripts?
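One way to read the Lost / extant slider is as a relative weight in a draw without replacement. Each lost copy gets weight `lost_weight`, and each extant copy gets weight 1. The Python sketch below implements that reading. It is an assumption about the slider, not a description of the app. The 600/600 split of the 1200 copies and the function name are also hypothetical.

```python
import random

def recover(extant, lost, n_recover, lost_weight, rng):
    """Draw n_recover copies without replacement; lost copies weigh `lost_weight`, extant ones 1."""
    pool = [(c, 1.0) for c in extant] + [(c, lost_weight) for c in lost]
    recovered = []
    for _ in range(min(n_recover, len(pool))):
        weights = [w for _, w in pool]
        if sum(weights) == 0:
            break  # nothing left with positive weight
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        recovered.append(pool.pop(i)[0])
    return recovered

extant = [f"extant-{i}" for i in range(600)]
lost = [f"lost-{i}" for i in range(600)]
sample = recover(extant, lost, 60, 0.5, random.Random(4))
print(len(sample), len(sample) / (len(extant) + len(lost)))  # 60 0.05
```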

Seed (for random numbers)

This lets you try the same slider settings with a different set of random numbers.

The program uses (pseudo)random processes extensively. Setting a seed lets one reproduce the same set of pseudorandom numbers(!)

Try different seed values to see the effect.
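The Python sketch below shows what a seed does in general. Two generators given the same seed produce identical sequences, and a different seed gives a different sequence. The app runs in R's Shiny, where the usual counterpart is `set.seed()`; how the app applies its seed is not documented here. The function name is hypothetical.

```python
import random

def draws(seed, n=5):
    """Return n pseudorandom numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

print(draws(42) == draws(42))  # True: same seed, same numbers
print(draws(42) == draws(43))  # False: a different seed gives a different sequence
```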

Mission

Your mission is to find a combination of settings that produces analysis results which “look like” those obtained from a real data set (Mark UBS4).
