Looking through the cutoff window

Citation for published version (APA):
Lancia, C. (2013). Looking through the cutoff window. Eindhoven: Technische Universiteit Eindhoven. DOI: 10.6100/IR760624

Document status and date: Published: 01/01/2013

Document Version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Please check the document version of this publication:
• A submitted manuscript is the author's version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website.
• The final author version and the galley proof are versions of the publication after peer review.
• The final published version features the final layout of the paper including the volume, issue and page numbers.

Link to publication

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.
• You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement: https://www.tue.nl/index.php?id=71870

Take down policy
If you believe that this document breaches copyright please contact us at: [email protected] providing details and we will investigate your claim.

Download date: 15. Feb. 2019



https://doi.org/10.6100/IR760624
https://research.tue.nl/en/publications/looking-through-the-cutoff-window(6add8900-d29a-45ad-97dd-b08031f19780).html

LOOKING THROUGH THE CUTOFF WINDOW

Carlo Lancia

CIP-DATA LIBRARY Technische Universiteit Eindhoven

October 2013, Carlo Lancia

Looking through the cutoff window / Lancia, Carlo

A catalogue record is available from the Eindhoven University of Technology Library.

ISBN: 978-90-386-3493-7

MSC2010: 60B10, 60J05, 60K25, 90B20, 90B22.

Subject Headings: Markov chains, Cutoff Phenomenon, Stationary Distribution, Queues with Correlated Arrivals, Late Customers, Air Traffic Management, Airport Congestion.

Cover image and design: Carlo & Michele Lancia

Printed by Wöhrmann Print Service, Zutphen, The Netherlands.

ABSTRACT

The main topic of the present dissertation is the cutoff phenomenon for discrete-time Markov chains. We speak of cutoff when a random process experiences a sudden, abrupt convergence at a deterministic time after having been arbitrarily far from equilibrium for a long time.

The first part of this essay builds a general framework for studying cutoff behaviour. Many works on cutoff have appeared in the recent literature, but very few of them exploit any intuition from statistical mechanics in establishing the phenomenon. Cutoff is nowadays completely understood for the class of birth-and-death chains. Unfortunately, in many cases of interest the process under examination is not within that class. Nevertheless, it is often the case that a projection of the original chain returns a birth-and-death chain or a more convenient process. If this projection is performed capitalising on the entropic properties of the original process, the resulting projected chain is likely to exhibit a drift towards a relatively small region of the projected state space. Such a region corresponds to those states that are most likely to be visited under equilibrium conditions. The aforementioned drift, in turn, can provide a quasi-deterministic trajectory to the portion of the state space where the equilibrium measure is most concentrated, implying cutoff-like behaviour.

The role played by entropy in highlighting the drift is extensively studied through a number of examples, such as many-particle systems, card-shuffling models, birth-and-death chains and random walks on higher-dimensional structures. The main results provided therein use the language and framework of hitting times to characterise the cutoff phenomenon and the existence of two contributions to the cutoff window, in which the sharp convergence takes place. The cutoff window arises as an interplay between the strength of the drift and the thermalisation time that is needed to relax inside the region of the state space where the stationary distribution is mostly concentrated.

The second part of the research focuses on a queueing system where the arrivals are given by a point process called Pre-Scheduled Random Arrivals. The arrival pattern is obtained by superimposing random fluctuations on a constant stream of customers. This problem was posed and intensively studied in the late 1950s by the founders of queueing theory, but so far it remains unsolved. This essay proposes a method to approximate the stationary distribution up to the desired precision when the fluctuations are exponentially distributed. The results obtained suggest that this queueing system exhibits cutoff as in the first part of the dissertation.

The study of this model is eventually motivated by comparing the theoretical queue length with actual data from air traffic applications.


SUMMARY

The main topic of this research is cutoff for discrete-time Markov chains. We speak of cutoff when a stochastic process suddenly converges to the equilibrium state after having been arbitrarily far from it for a very long time.

The thesis is divided into two parts. The first provides a set of general tools that are useful for the study of cutoff. Many articles on this theme have recently appeared in the literature, but most of them study the phenomenon without exploiting the intuition that statistical mechanics can provide. Cutoff is by now completely characterised for the class of processes known as birth-and-death chains. In many cases of interest, unfortunately, the process under examination does not belong to that class, but it is nevertheless possible to obtain a birth-and-death chain, or a more convenient process, through a projection. If this operation is carried out by making the best use of the entropic properties of the original process, the projected process will show a tendency to move towards a small region of the state space, corresponding to the set of states that are most likely to be visited under equilibrium conditions. This drift effect can guarantee a quasi-deterministic trajectory towards the portion of the state space where the equilibrium measure is most concentrated, producing the convergence typical of cutoff.

The role played by entropy in highlighting the drift is studied in depth through a series of examples, such as systems with a large number of particles, card-shuffling models, birth-and-death chains, and random walks on multidimensional structures. The main result uses the language and tools of hitting times to characterise the cutoff phenomenon and to show the existence of two contributions to the cutoff window. The window is the time interval in which the convergence to equilibrium takes place; it arises as a mutual interaction between the strength of the drift and the thermalisation process, during which the process loses memory of its past trajectory and diffuses in the region where the stationary measure is concentrated.

The second part of this research focuses on a queueing model in which the arrivals are the realisation of a stochastic process called Pre-Scheduled Random Arrivals. The arrival process is obtained by imposing random fluctuations on a constant stream of customers. This model was introduced in the 1950s by the founders of queueing theory and has been studied intensively ever since, but a complete characterisation of the equilibrium state has not yet been found. This thesis proposes a method to approximate the stationary measure to the desired precision in the special case in which the imposed delays are exponentially distributed. The results obtained seem to suggest that this queueing model exhibits cutoff in the sense illustrated in the first part of the thesis.

Finally, the study of this queueing model is motivated by applications to air traffic. In particular, the theoretical queue length is compared with the one obtained from a database of real data.


PUBLICATIONS

Some ideas and figures have previously appeared in the following publications:

[1] Carlo Lancia, Francesca R. Nardi, and Benedetto Scoppola. Entropy-driven cutoff phenomena. Journal of Statistical Physics, 149(1):108–141, 2012.

[2] Gianluca Guadagni, Carlo Lancia, Sokol Ndreca, and Benedetto Scoppola. Queues with exponentially delayed arrivals. arXiv preprint arXiv:1302.1999, 2013.

[3] Carlo Lancia and Benedetto Scoppola. Equilibrium and non-equilibrium Ising models by means of PCA. Journal of Statistical Physics, 2013. To appear.

[4] Maria Virginia Caccavale, Antonio Iovanella, Carlo Lancia, Guglielmo Lulli, and Benedetto Scoppola. A model of inbound air traffic: the application to Heathrow Airport. Journal of Air Transport Management, 2014. To appear.


CONTENTS

List of Figures
List of Tables
Listings
Acronyms
Symbols
Preface
Acknowledgements

1 Introduction

2 The Cutoff Phenomenon
  2.1 Representation of a Markov chain
  2.2 Topology of the state space
  2.3 The equilibrium distribution
  2.4 Reversibility
  2.5 Markov chain mixing
  2.6 The cutoff phenomenon
  2.7 Collecting and shuffling
  2.8 Cutoff for birth-and-death chains
  2.9 Seven shuffles are enough

3 Sizing Up the Cutoff Window
  3.1 How the drift triggers cutoff
  3.2 A window with two shutters
  3.3 Entropy-driven cutoff phenomena
  3.4 Coupon Collector's Chain Revisited
  3.5 The Ehrenfest Urn model
  3.6 The Lazy Random Walk on the Hypercube
  3.7 Mean-field Ising model

4 Cracking the Cutoff Window
  4.1 Non-reversible random walk on a cylinder
  4.2 Partially-diffusive random walk

5 Exponentially Delayed Arrivals
  5.1 The EDA/D/1 queueing system
  5.2 The generating function of an EDA/D/1
  5.3 Intermediate results on Q(z,y)
  5.4 Power series expansion of Q(z,y)
  5.5 Recursive computation of Q^(k)(z,y)
  5.6 Analytic results on akj(z)
  5.7 Automated computation of the solution
  5.8 General expression of Q(z,y)
  5.9 Closing Remarks

6 Cutoff for EDA/D/1
  6.1 The t auxiliary chain
  6.2 The bulk of the stationary distribution
  6.3 Heuristic lower bound on Q_{0,l}
  6.4 Cutoff for the truncated model
  6.5 Extending the proof to the general case

7 Pre-Scheduled Random Arrivals
  7.1 The PSRA/D/1 queueing system
  7.2 The problem of airport congestion
  7.3 London Heathrow Airport
  7.4 Insensitivity to the delays distribution
  7.5 The SESAR Programme
  7.6 Mixed-traffic scenarios
  7.7 Python code for PSRA simulation

Conclusions

Appendices
  A Hitting time of the bulk for the mean-field Ising model
  B Hitting time of the bulk for the partially diffusive random walk

Bibliography

LIST OF FIGURES

Figure 1 Graphical interpretation of TV-distance.
Figure 2 Cutoff for the biased random walk.
Figure 3 Top-in-at-random shuffle.
Figure 4 Rising sequences in a deck of 13 cards while it is riffle-shuffled.
Figure 5 Approach to equilibrium vs. behaviour of rising sequences.
Figure 6 Distribution of the number of rising sequences in a deck of 52 cards.
Figure 11 Approach to equilibrium of a biased random walk of size 10,000.
Figure 12 Approach to equilibrium of a biased random walk of size 100,000.
Figure 13 The Ehrenfest Urn: approach to equilibrium depending on the initial state.
Figure 14 Evolute measure and approach to equilibrium of an Ehrenfest Urn.
Figure 15 Coupling scheme for the random walk on the cylinder.
Figure 16 Transitions of EDA/D/1 in the quarter plane.
Figure 17 Stationary queue of EDA/D/1: simulation vs. truncated expansion.
Figure 18 Paths of the typical trajectories from (0, l) back to T_l for an EDA/D/1 queue.
Figure 19 Example of PSRA arrival pattern.
Figure 20 STARs of London Heathrow airport.
Figure 21 Qualitative layout of the inbound air traffic over London Heathrow.
Figure 22 Fit of the queue distribution at London Heathrow airport.
Figure 24 Output of a PSRA/D/1 queueing system for different delays PDF.
Figure 26 PSRA vs. 4D Trajectory, FIFO policy.
Figure 27 PSRA vs. 4D Trajectory, BEBS policy.


LIST OF TABLES

Table 1 Goodness of fit for the queue distribution at London Heathrow airport.

LISTINGS

Listing 1 Sage code to compute akj(z).
Listing 2 Sage code to compute Akj(z).
Listing 3 Sage code to compute Q^(k)(z,y).
Listing 4 Sage code to compute Q(z,y) up to any prescribed order n.
Listing 5 Python code for simulations of different single-server queue models.

ACRONYMS

ASMA Arrival Sequencing and Metering Area

ATA Actual Time of Arrival

ATC Air Traffic Control

ATM Air Traffic Management

BEBS Best Equipped Best Served

BDC Birth-and-Death Chain

BT Business Trajectory

BVP Boundary Value Problem

EDA Exponentially Delayed Arrivals

EU European Union

FIFO First In First Out

GDP Ground Delay Program

HDR High Density Rule

IATA International Air Transport Association

ICAO International Civil Aviation Organization

IID Independent and Identically Distributed


IFR Instrument Flight Rules

LACC London Area Control Centre

LTCC London Terminal Control Centre

MC Markov Chain

MCMC Markov Chain Monte Carlo

MMQS Markov Modulated Queueing System

PDF Probability Density Function

PSA Power Series Approximation

PSRA Pre-Scheduled Random Arrivals

RHS Right Hand Side

SESAR Single European Sky ATM Research

STAR Standard Terminal Arrival Route

SWIM System Wide Information Management

TMA Terminal Manoeuvring Area

VFR Visual Flight Rules

SYMBOLS

The following is a list of frequently used symbols; their meaning is invariant throughout the essay.

δ_{i,j} Kronecker's delta;

P(A) Probability of event A;

E[X] Expectation of random variable X;

Var[X] Variance of random variable X;

σ[X] Standard deviation of random variable X;

X_n^t Generic Markov Chain (MC);

Ω_n State space of a generic MC;

P_n Transition matrix of a generic MC;

μ_n^0 Initial measure of a generic MC;

μ_n^t Evolved measure after t steps of a generic MC;

π_n Stationary measure of a generic MC;

‖μ_1 − μ_2‖_TV Total-variation distance between μ_1 and μ_2;


Hitting time of a generic set, or generic stopping time;

Coalescence time of a generic coupling;

tAn ,u Family of nested subsets;

n Hitting time of the set An , ;

∁ Superscript to denote complementary set;

Asymptotic equivalence;

o(·) Asymptotically dominated by ·;

O(·) Asymptotically bounded above by ·;

Θ(·) Asymptotically bounded above and below by ·;

∼ Equivalence relation, or distribution of a random variable;

7 Superscript in projected quantities;

ρ Traffic index, load;

Q_{n,l} Stationary distribution of EDA/D/1;

Q(z, y) Generating function of Q_{n,l}.

PREFACE

When I was a bachelor student, the odds of me getting a PhD in Mathematics were two to the power of two hundred and seventy-six thousand seven hundred and nine to one against. My favourite activity was playing tressette, an Italian game close to bridge. The breaks between lectures were perfect occasions to play small tournaments. I was indeed so addicted a player that at the final exam of Discrete Mathematics I was asked to compute the probability of a ten-card hand* with at least one void suit. It was my baptism.

In the Fall of 2004 I met Benedetto Scoppola, who taught me a first course in Probability and Markov Chains. In the following years I took his courses in Graph Theory, Advanced Markov Chains, and Queueing Theory. He also supervised both my bachelor's and master's theses: respectively, an application of a perfect sampling algorithm, the Randomness Recycler, to the Clique Problem on Erdős–Rényi random graphs, and the development of an original MCMC algorithm for the so-called Terminal Steiner Tree Problem. These works share the idea of minimising an objective function by sampling from a Gibbs measure, the so-called Statistical Mechanics approach to Operations Research problems.

Regarding the subject of this essay there is a nice story to tell. In the final year of my master's I had to complete a self-study activity. I thought I would ask Gianluca Guadagni, my former teacher of Stochastic Differential Equations, for a small research topic. My request was moved by a desire for payback: I felt he had not graded my exam fairly with respect to some other colleagues. Expecting something about Stochastic Differential Equations, I went to him and asked for a research subject. I had clearly counted my chickens before they hatched, because he suggested that I study cutoff instead. Quite reluctantly, I accepted. I had just shaped my future years, but who could have ever foretold that?

I started reading [DLP10], which was the most recent paper on cutoff at that time. Quite soon, though, I encountered the wonderful papers on cutoff by Persi Diaconis, [AD86] and [BD92] in particular, and I decided that I liked them the most. In those papers cutoff was proved for two deck-shuffling models, the perfect match for my passion for cards.

After graduating with a master's degree in Mathematical Engineering, I went for a few job interviews without the necessary determination: I had more than half a mind to try an academic career, and in the end I entered the open competitions for a PhD position. Before applying, I went to my former teacher in Quantum Mechanics and Statistical Physics, Andrey Varlamov, and took counsel with him. He is a great scientist and a wonderful speaker, one of the best I have ever encountered in my life, and a very pragmatic person into the bargain. "To be a researcher you need two vectors," he explained to me, his thumb and forefinger forming an L. "The first is the vector of exploration," he continued, twisting his wrist as if his forefinger were a drill, "and you need it for getting inside things. However, this vector is rather useless without the second one, the vector of money, which is orthogonal to the former. Only by working in this direction can you live on research. I think you already possess the first vector but, regarding the second, I can give no guarantees."

* In Italy regular decks are composed of 40 cards.


I was quite lucky to win a position in the same university I had just graduated from, so that I could continue working with Benedetto. Together we decided that the best option for my PhD project was to continue studying cutoff phenomena. At that time, a paper by Javiera Barrera, Olivier Bertoncini, and Roberto Fernández had appeared on arXiv.org. In [BBF09] cutoff was investigated in the class of Birth-and-Death Chains (BDCs) with the explicit use of hitting times and the concept of drift. For BDCs there already existed a complete characterisation of cutoff, but this was actually more focused on the spectral properties of the transition matrix [DSC06, DLP10], and hitting times could be found only by looking under the hood. We thought of bridging the two approaches and started to develop a general methodology for proving cutoff. The idea was to tackle the problem in a way more familiar to statistical mechanics, keeping at the same time the probabilistic view of a sudden convergence of the evolved measure to the equilibrium one.

After one year, we had written the draft of a paper which contained an interesting result, the forebear of Theorem 3.2. It is quite a pity that its proof was formally incorrect. Even worse, I discovered this rather important fact during a seminar.

It was 2011, and Francesca Nardi had invited me to present my research at Eurandom. I was actually a bit nervous before the presentation, but I never expected that I was going to argue with Sergey Foss, who was among the listeners. The incident happened because of a wrong formula: I had bounded a random variable instead of its expectation. As a matter of fact, it was not such a big error, because the random variable was supposed to be quasi-deterministic; it was an informal seminar into the bargain, and I was just supposed to present my ideas in a more-colloquial-than-detailed way. At that point, the optimal strategy would have been to admit some sloppiness in the formulas and be safe.

Pride goes before a fall, they say, and I had the very brilliant idea of claiming I was right. Sure enough, a quarrel started, during which I managed to score no points for myself but a few for Sergey. After a while I was deeply buried in the troubles I had just looked for. In a desperate attempt to resolve the situation I scribbled something on the blackboard, but things only worsened as he left his seat and came to the board. "Surely not to negotiate my surrender, but to finish me off," I thought*. I was about to give up when Francesca intervened in the dispute and defused it. I completed the talk nearly whispering, running ashamedly through the remaining slides.

Later that year, Francesca became co-supervisor of my PhD project; together with Benedetto we fixed the draft and published our first paper. It was dedicated to the loving memory of Roberta Dal Passo, a former professor of the University of Rome Tor Vergata. I will be forever grateful to her for nurturing my confidence and mindset.

I visited Eurandom once more during the spring of 2012 to continue my collaboration with Francesca. The trip also gave me the opportunity to discuss cutoff, diffusion and thermalisation several times with Roberto Fernández. The title of this dissertation was actually coined by him during an enlightening conversation on the meaning of cutoff. Almost at the end of that visit we asked for a double degree agreement. Thanks to Francesca's dogged determination we quickly found a joint agreement between TU/e and Tor Vergata. In November 2012 I moved to Eindhoven and started writing this dissertation.

* Apparently, this is the standard way for Russian mathematicians to express that they have an interest in the topic. At that moment, though, I was mainly concerned about the possibility of just getting kicked out.

The essay spans aspects of Theoretical and Applied Probability with the topic of cutoff phenomena; a little Statistics appears as well in the last chapter, which covers an actual Air Traffic Management problem. The dissertation also bridges the way cutoff is approached by different communities of probabilists, namely, those researchers who study cutoff as a singular kind of convergence to the equilibrium measure, and those researchers who are interested in cutoff as the potential counterpart (and trigger) of metastable behaviours. Looking through the cutoff window, those who describe the phenomenon as the former see it with the eyes of the latter.

Extending across many different subjects, the trademark of this dissertation lies precisely in the breadth of the topics it traverses. Personally, I really like how smooth the transition is from the initial chapters, purely theoretical, to the last one, entirely devoted to an important applied problem. As Andrey said, research means delving into problems along the vector of exploration, and I had to find a trade-off between breadth and depth, or I would never have completed this work. While writing it, I have tried my best to keep the formalism at bay, so as to enhance its bridging qualities. The essay should be mostly accessible even to an undergraduate student; at any rate, I have written it with this target in mind.

Eindhoven, 2 October 2013

Um, um, um. Stop that thunder!
Plenty too much thunder up here.
What's the use of thunder? Um, um, um.
We don't want thunder; we want rum;
give us a glass of rum. Um, um, um!

Herman Melville [Mel51]

ACKNOWLEDGEMENTS

There are many people I owe much to. I guess this is the right place to write that I am really fond of them.

First of all, I wish to thank both my supervisors. Francesca, this joint project was made possible only by your calm and perseverance. It is a pity that you could take only a small part in it; I sincerely wish you all the best and a fast recovery. Benedetto, you truly represent to me what being a scientist means. I have been a student of yours for ten years or so, and it has been a constant growth for me ever since. Should I ever be parted from you, be sure that I will invoke "un po' di puma" first.

Next, there are a few people I wish to credit for their precious help, which has been important, if not fundamental, in the realisation of this essay. I would like to start by expressing my gratitude to Remco van der Hofstad and Roberto Fernández: discussing with them has been a source of positive inspiration for me. I would like to address earnest thanks to Elisabetta Scoppola, for the words of encouragement she had for me at the early stage of this work. I also wish to thank Maria Vlasiou, Stella Kapodistria, and Serban Badila, for the useful conversations we had on the topics of Chapter 5. Many thanks to Sokol Ndreca, Gianluca Guadagni, Antonio Iovanella and Guglielmo Lulli, for their precious friendship and the collaboration on the subjects of Chapters 5 and 7. A special mention is due to Emilio Cirillo: our walks along the Dommel have proved really helpful in clearing up the heuristics of Chapter 6. I would also like to thank Maria Virginia Caccavale, who processed the Heathrow dataset, and Damiano Taurino, for his helpful comments on the SESAR programme and the 4D Trajectories.

I wish to express my deepest gratitude to the espresso brigade, that is to say, Alessandro, Enrico, Julien, Maria Luisa, Rui, and my officemate, Martin: hard times await me without your sweet company. I also owe an apology to my dearest friend, Ale, for the unfair words I said regarding the style of his master's thesis; they were only sour grapes. In Italian we say "Chi disprezza compra", and it looks like I am no exception to this rule after all. Also, I wish to address heartfelt thanks to Alex, who read the non-technical parts and gave me important feedback on the cover design.

Finally, I want to mention my family, Piero, Giuliana, Michele, and Giacomo. I could write double the amount of these pages and give just an introductory view of how complex and difficult my feelings for them are; but my equilibrium state, what really lies at the bottom of my heart, is just love and serenity. Vi voglio bene sempre.

Last but not least, I wish to thank Silvia, my sweet love. However rough the path that lies in front of us might be, if you are at my side I will never falter. This small work of mine is dedicated to you.


Call me cutoff. Some instants ago, never mind how long precisely, having little or no states in my trajectory, and nothing particular to interest me on the tails, I thought I would converge about a little and see the relevant part of the state space.

1 INTRODUCTION

The present essay is mainly focused on the cutoff phenomenon for finite Markov Chains (MCs). A MC is a model for random dynamics, i.e., a probabilistic description of the evolution of an observable. The signature feature of a MC is that it depends on past observations only through the current state. In other words, the observable takes its next value by running a probabilistic update rule that depends on the current value only and typically comes bundled in a square matrix called the transition kernel, or transition matrix. Such a characteristic is known as the Markov property, or memoryless property, and it can be summarised in the following way: whatever the past sequence of values already taken, only the very last matters.
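As a minimal sketch of the update rule just described, the following Python snippet simulates a chain from its transition matrix. The 3-state matrix `P`, the seed, and the function names are hypothetical choices made only for illustration; they do not come from the dissertation.

```python
import random

def step(P, state, rng):
    """Draw the next state using only the current one (row `state` of P)."""
    u = rng.random()
    cumulative = 0.0
    for nxt, p in enumerate(P[state]):
        cumulative += p
        if u < cumulative:
            return nxt
    return len(P[state]) - 1  # guard against floating-point round-off

def simulate(P, start, steps, seed=0):
    """Return a trajectory of `steps` transitions started at `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        # The next value depends on path[-1] only: the memoryless property.
        path.append(step(P, path[-1], rng))
    return path

# Hypothetical 3-state transition matrix; each row sums to 1.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
path = simulate(P, start=0, steps=20)
print(path)
```

Note that `step` never inspects the earlier part of `path`, which is exactly the sense in which only the very last value matters.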

Under mild assumptions on the transition matrix, a MC exhibits a unique stationary state (or equilibrium state), that is, a probability distribution that gives the likelihood of measuring a certain value of the observable in equilibrium conditions. The stationary state is an asymptotic characterisation of the MC, in the sense that it describes the long-run behaviour of the observable. The existence of a unique equilibrium distribution is often quite easy to prove; this partly explains why MCs are important sources of elegant mathematics in addition to being widely deployed tools for modelling stochastic evolutions.
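To make the notion of stationary state concrete, here is a small sketch that approximates it by repeatedly applying the transition matrix to an initial measure until the measure stops changing. The matrix `P`, the tolerance, and the function names are assumptions chosen for illustration; power iteration is one standard way to do this, not necessarily the one used in the dissertation.

```python
def evolve(mu, P):
    """One step of the evolution of a measure: mu -> mu P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, tol=1e-12, max_iter=100_000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    mu = [1.0 / n] * n  # start from the uniform measure
    for _ in range(max_iter):
        nxt = evolve(mu, P)
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    raise RuntimeError("power iteration did not converge")

# Hypothetical irreducible, aperiodic 3-state chain.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
pi = stationary(P)
print(pi)
```

For this particular matrix the detailed-balance equations can be solved by hand, giving the distribution (1/4, 1/2, 1/4), which the iteration should reproduce up to the chosen tolerance.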

It is also possible to use MCs as efficient computational devices. Indeed, the stationary distribution of a MC can be easily sampled by simulating the chain evolution until equilibrium is reached. This idea has given rise to the so-called Markov Chain Monte Carlo (MCMC) paradigm, with plenty of applications ranging from approximate counting and integration to combinatorial optimisation, statistical physics, and statistical inference. The interested reader is referred to [Jer93, JS96, Jer03] for further details. However, the validity of MCMC algorithms crucially depends on how fast the MC being run reaches its equilibrium state. Slow convergence to equilibrium will, in practice, result in a useless algorithm that requires too large a computational time to be run. On the other hand, it is quite dangerous to stop the algorithm without precise knowledge of whether equilibrium has been reached or not. Indeed, it may happen that the output is unreliable, because it is sampled from a distribution which is not the targeted one.
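The MCMC idea can be illustrated with a toy Metropolis sampler. Everything below is an assumption made for the sake of the example (the cyclic state space, the target weights, the burn-in length, the function name); it is a sketch of the general paradigm, not code from the dissertation.

```python
import random
from collections import Counter

def metropolis_sample(weights, steps, burn_in, seed=0):
    """Run a Metropolis chain on a cycle and return visit frequencies.

    `weights` are unnormalised target probabilities; the proposal moves
    one site clockwise or anticlockwise with equal probability.
    """
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    visits = Counter()
    for t in range(burn_in + steps):
        proposal = (state + rng.choice((-1, 1))) % n
        # Accept with probability min(1, w(proposal) / w(state)).
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        if t >= burn_in:  # discard the samples taken before equilibrium
            visits[state] += 1
    return [visits[k] / steps for k in range(n)]

weights = [1, 2, 3, 4]  # hypothetical target, known up to normalisation
target = [w / sum(weights) for w in weights]
freqs = metropolis_sample(weights, steps=200_000, burn_in=1_000)
print([round(f, 3) for f in freqs], target)
```

The `burn_in` parameter is where the convergence question enters: chosen too small, the recorded samples come from a distribution that is not the targeted one; chosen too large, computation is wasted.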

There exists an entire industry devoted to the characterisation of the convergence to equilibrium; an exhaustive exposition of these techniques can be found in [MT06, LPW06]. Estimating $\lambda_2$, the second eigenvalue of the transition matrix, is the most frequently addressed problem, since in many instances $\lambda_2^t$ is a satisfactory proxy for the actual distance from equilibrium after the chain has evolved for t steps. The picture is radically different when the MC exhibits cutoff, because the cutoff phenomenon gives an extremely accurate description of the convergence to the stationary state. More precisely, a MC exhibits cutoff behaviour if the distance from the equilibrium state suddenly drops from almost the maximum to almost the minimum value. Such an abrupt convergence takes place over a short time-window b, negligible with respect to the time a needed for the phenomenon to arise. In this respect, cutoff is much more informative than the classical bound in terms of the second eigenvalue cited above. In particular, for MCMC algorithms the presence of cutoff immediately advises

- not to run the algorithm for times much larger than a, as it would be a waste of computational resources to simulate the MC much longer than necessary;

- not to stop it before a steps have passed, for in this case the output would be sampled from a probability distribution which is as far as possible from the targeted one.

The cutoff phenomenon naturally arises in many models of interest, see for instance [Dia96]. Hence, the characterisation of cutoff phenomena is a favourable addition to the study of non-asymptotic behaviour and convergence to equilibrium, a central topic in the modern theory of MCs.

The name cutoff phenomenon first appeared in the literature in [AD86], although the first results on this subject were obtained a few years earlier [DS81]. In 1994 Yuval Peres conjectured that cutoff occurs if and only if the time to reach equilibrium is much larger than the ratio $1/(1-\lambda_2)$, see [Per04]. It took more than ten years to have that conjecture proved within a special class of MCs, known as Birth-and-Death Chains (BDCs) [DSC06, DLP10]. This was possible because BDCs manifest a peculiar link between the transition matrix and the stationary distribution, and between the spectral properties of the transition kernel and the typical evolution of the chain. Whether Yuval Peres's characterisation holds for a wider class of chains still remains an open question.

The remaining literature on the cutoff phenomenon is mainly composed of model-dependent results. In his 1996 survey Persi Diaconis wrote: "At present writing, proof of a cutoff is a difficult, delicate affair, requiring detailed knowledge of the chain, such as all eigenvalues and eigenvectors. Most of the examples where this can be pushed through arise from random walk on groups, with the walk having a fair amount of symmetry." Since then the picture has essentially remained unchanged, see for instance the discussion in [LS13]. However, in the last few years cutoff has been investigated in its generality for its relation to the exponential escape, a feature of metastable behaviour. Metastable behaviour can be roughly described as an (exponentially) long sojourn in a state of apparent equilibrium followed by a quick transition to the stable equilibrium. In [BBF09] the authors showed that, under suitable hypotheses, cutoff and exponential behaviour are two sides of the same coin. To study both phenomena in a common framework, they had to renounce the description of cutoff in terms of an abrupt fall-off in the distance from equilibrium. They chose, instead, the common, unifying language of hitting times. A hitting time is a random variable representing the first time a state is visited by the MC. For the exponential behaviour, hitting times are used to mark the first moment the chain visits a state sufficiently far from the metastable state for the escape to be successful. When studying cutoff, hitting times are used to flag the first time the chain visits target quantiles of the stationary distribution.

This research is conveniently situated at the interface between the two approaches to cutoff described above. On the one hand it tackles the phenomenon classically, i. e., with the formalism of the distance from the equilibrium state; on the other it explicitly speaks the language of hitting times. It also uses the idiom of statistical physics. The quote above by Persi Diaconis mentions a fair amount of symmetry; it turns out that these symmetries, if any, may often be used to define an equivalence relation on the set of all possible values of a MC, the state space. Projecting the chain onto the quotient state space by means of this equivalence relation often leads to a simpler process for which cutoff is easier to prove. The projection highlights the entropy of each equivalence class and helps to establish the paths of the typical trajectories in the state space. In this way it is possible to characterise how the chain approaches the relevant portion of the state space, i. e., the one that corresponds to the typical values taken by the MC at equilibrium. In terms of the stationary state, the relevant part of the state space translates to the appropriate quantiles of the equilibrium distribution. The cutoff time, a, can then be interpreted as the expectation of the hitting time of this relevant part, whilst the cutoff window, b, is discovered to arise as the intertwining of two separate contributions: the standard deviation of that hitting time, and the thermalisation.

Among the original contributions of this work, the thermalisation is one of the most important: it is the new thing we see looking through the cutoff window. Roughly speaking, it is the time to reach equilibrium starting from within the relevant part of the state space. This means that the typical trajectory of a MC exhibiting cutoff can be decomposed into two parts: the approach to the relevant quantiles of the state space and the relaxation to equilibrium once they have been reached. Each of these parts is naturally studied over the corresponding time scale, easing the task of proving cutoff. In fact, a quite common approach to the proof of cutoff is the design of a coupling, sufficiently clever to allow estimates on the overall time scale set by a and b. Such a detailed inspection is not needed here, being intrinsic to the modus operandi developed. The proposed new methodology is shown at work in a variety of examples, mainly classical and non-classical models of random walks.

The essay continues with the study of the cutoff phenomenon for a family of queueing systems, extremely important for both historical and applied reasons. These are single-server queues with fixed-length, deterministic service time, and arrival process obtained as the superposition of Independent and Identically Distributed (IID) random shifts to a pre-scheduled stream of customers. An arrival stream of this kind was introduced for the first time by C.B. Winsten in the 50s, who named it the problem of the late customer [Win59]; it was then studied by the pioneers of queueing theory, in particular by D.G. Kendall [Ken64]. Currently it is better known by the name of Pre-Scheduled Random Arrivals (PSRA) [GNS11, Gwi11, NH12]. It is well suited to describing the actual stream of arrivals in many situations where a planned inflow of customers is inherently subject to random fluctuations, e. g. transportation systems.

The queueing system obtained in the special case of exponentially distributed delay can be easily described by means of a bivariate MC, i. e. a two-component chain. Finding the stationary distribution for this MC is a very difficult and still open problem. The equilibrium state can be investigated using the bivariate generating function to produce an iterative functional scheme, able to approximate the generating function to the desired order. The analysis of the generating function also leads to a fair location of the relevant quantiles of the stationary distribution in the quarter plane. Then, using the already developed methodology, a cutoff-like abrupt convergence can be shown.

The research is completed with a study of the inbound air traffic at London Heathrow Airport. The analysis of a data set of actual arrivals gives a description of the airport congestion. A comparison of the latter with the output of the family of queues described above shows a surprising goodness of fit. The existence of the cutoff behaviour for such a system is synonymous with resilience, a key performance feature of models for Air Traffic Management (ATM). According to [Glu12], resilience means that, starting from a stress situation such as peaks of traffic load, time deficit, operational procedures, limitation and reliability of equipment, or abnormal/emergency situations, the system steadily reaches equilibrium, that is to say, normal operating conditions. Due to the presence of cutoff the system can cope very well with congestion, and the time needed to recover from a stress situation can be estimated with high precision.

2 THE CUTOFF PHENOMENON

Cutoff belongs to the mixing properties of a Markov Chain (MC), a sequence of random objects describing the evolution of a possibly complex system. The mixing properties of a MC specify the existence of a stationary distribution and the speed of convergence to it. The cutoff phenomenon is a strong realisation of the former and a sharp characterisation of the latter at the same time.

For these statements to make sense from a mathematical point of view we start with some definitions.

2.1 representation of a markov chain

Let $\Omega_n$ be a sequence of finite sets. A MC is a collection of $\Omega_n$-valued random variables $X_n^0, X_n^1, X_n^2, \ldots$ satisfying the so-called Markov (or memoryless) property: for each $i, j \in \Omega_n$ and for each sequence of states $\{k_s\}_{0 \le s \le t} \subseteq \Omega_n$ with $k_{t-1} = i$ and $k_t = j$,

$$
\begin{aligned}
P\left(X_n^t = k_t \mid X_n^0 = k_0,\, X_n^1 = k_1,\, \ldots,\, X_n^{t-1} = k_{t-1}\right) &= P\left(X_n^t = j \mid X_n^{t-1} = i\right), && (2.1)\\
&= P_n(i, j) . && (2.2)
\end{aligned}
$$

The meaning of (2.1) is that the evolution of a MC, i. e., its next value $X_n^t$, depends only on the current position $X_n^{t-1}$, whatever its past trajectory $X_n^0, X_n^1, \ldots, X_n^{t-2}$ in the state space may be. Equation (2.2) points out that the probability of a transition from state i to state j does not depend on time; in this case the MC is said to be homogeneous. Although inhomogeneous MCs can be very useful to model relevant systems, we will not consider them and henceforth restrict ourselves to the homogeneous case.

Remark 2.1. All the quantities introduced so far display a subscript n. This is indeed the usual notation used to represent families of MCs. Families of MCs are a key ingredient in the definition of cutoff, to be introduced later on in Section 2.6.

By (2.2), a MC can be represented as a $|\Omega_n| \times |\Omega_n|$ square matrix $P_n = \{P_n(i, j)\}$, usually called the transition matrix or transition kernel. The elements of a transition matrix are non-negative, that is, $P_n(i, j) \ge 0$, and they sum to unity over any row, i. e., $\sum_j P_n(i, j) = 1$. Another interesting object to consider is the probability law according to which the initial state $X_n^0$ is chosen. This is called the initial distribution of the MC and is indicated by $\mu_n^0$. The initial distribution can be thought of as a vector $\mu_n^0 \in [0, 1]^{|\Omega_n|}$ such that its i-th component is $\mu_n^0(i) = P\left(X_n^0 = i\right)$.

Once $P_n$ and $\mu_n^0$ are prescribed, the evolved measure after t steps can be computed. The evolved measure, $\mu_n^t$, is a vector whose i-th entry is $\mu_n^t(i) = P\left(X_n^t = i\right)$. It can be found by multiplying the t-th power of $P_n$ by $\mu_n^0$ to the left [H02], i. e.,

$$\mu_n^t = \mu_n^0 P_n^t .$$
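As a small illustration of the relation between the initial distribution, the transition matrix and the evolved measure, the following Python sketch computes the evolved measure by repeated left-multiplication. The two-state transition matrix and the function name `evolve` are assumptions made only for this example; they do not come from the text.

```python
def evolve(mu0, P, t):
    """Return mu0 * P^t for a row vector mu0 and a row-stochastic matrix P."""
    size = len(P)
    mu = list(mu0)
    for _ in range(t):
        # One step of the chain: mu_new(j) = sum_i mu(i) * P(i, j).
        mu = [sum(mu[i] * P[i][j] for i in range(size)) for j in range(size)]
    return mu


# Hypothetical two-state chain, chosen only for illustration.
P = [[0.9, 0.1],
     [0.5, 0.5]]
mu0 = [1.0, 0.0]          # the chain starts in state 0 with probability 1

mu3 = evolve(mu0, P, 3)   # distribution of the chain after three steps
print(mu3)
```

Because each step is a plain vector-matrix product, the same loop applies to any finite row-stochastic matrix.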


Remark 2.2. When $\mu_n^0$ is set, the evolved measure $\mu_n^t$ is given by a deterministic recursive relation and, in principle, can be computed at any specified time.

2.2 topology of the state space

Any MC can be naturally represented as a graph G(V, E), where the vertex set is $V = \Omega_n$ and the edge set $E \subseteq V \times V$ contains all the pairs of states (i, j) such that $P_n(i, j) > 0$. According to the geometrical intuition, two states i and j are said to be adjacent if the edge (i, j) is in E. The state i is said to communicate with state j if there exists a path $L \subseteq G$ that joins i with j. If all vertices in V communicate with each other then the graph G is connected and the transition matrix $P_n$ is said to be irreducible. The interested reader is referred to [Bol98] for a comprehensive introduction to graph theory.

Each chain $X_n^t$ induces a neighbourhood structure on $\Omega_n$ through its unique graph representation G(V, E). Given a pair of states (i, j), let $\mathcal{P}_{i,j}$ be the set of all paths joining i with j, if any, and define the following function on $\Omega_n \times \Omega_n$:

$$
d(i, j) =
\begin{cases}
\min_{L \in \mathcal{P}_{i,j}} \mathrm{len}(L) , & \text{if } \mathcal{P}_{i,j} \neq \emptyset , \\
\infty , & \text{otherwise} ,
\end{cases}
\qquad (2.3)
$$

where len(L) is the length of L, i. e., the number of edges that compose the path.
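The function d can be evaluated by a breadth-first search on the directed graph of the chain. The Python sketch below is only an illustration under stated assumptions: the name `graph_distance` is hypothetical, states are labelled 0, 1, ..., and the convention d(i, i) = 0 (the empty path) is assumed, since the text does not fix it.

```python
from collections import deque
import math


def graph_distance(P, i, j):
    """Length of a shortest directed path from i to j in the graph of P.

    An edge (u, v) is present whenever P[u][v] > 0. Returns math.inf when
    j cannot be reached from i. Assumed convention: d(i, i) = 0.
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return dist[u]
        for v in range(len(P)):
            if P[u][v] > 0 and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf


# Hypothetical example: a deterministic rotation 0 -> 1 -> 2 -> 0.
rotation = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
print(graph_distance(rotation, 0, 2), graph_distance(rotation, 2, 0))
```

On the rotation the two distances differ, which is the lack of symmetry expected of a directed graph.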

Remark 2.3. We note that "state i is adjacent to state j" and "state i is communicating with state j" may not be symmetric relations. In graph theory terms, G(V, E) is a directed graph. As such, the equality d(i, j) = d(j, i) fails in general to hold, and the function defined by (2.3) is not a metric on $\Omega_n$.

Given a subset of the state space $A \subseteq \Omega_n$, we define its boundary as the set of states of A that are adjacent to a state outside A, i. e.,

$$\partial A = \left\{ i \in A : d(i, j) = 1 \text{ for some } j \in \Omega_n \setminus A \right\} . \qquad (2.4)$$

In the following we will very often encounter a special type of MCs, named Birth-and-Death Chains (BDCs). The graph representation G(V, E) of a BDC is isomorphic to a segment, that is, the state space of a BDC can be put in a one-to-one correspondence with the set $\{1, 2, \ldots, |\Omega_n|\}$; from state i only transitions to states i, i - 1, and i + 1 are allowed. BDCs are of great interest in applications and are frequently used to model queueing systems. For a BDC, the set $\partial A$ contains two elements for each connected component of the restricted graph G(A, E), and if the starting point $i \in \Omega_n \setminus A$ is known then it is possible to establish in which state the chain will be found the first time it visits A. We come back to this very important remark later on in Section 3.3.

2.3 the equilibrium distribution

In Section 2.1 we have pointed out that the transition kernel of a MC is a non-negative matrix. If the chain happens also to be irreducible*, i. e., if

$$\forall\, i, j \in \Omega_n \ \ \exists\, t \ge 1 \ \text{such that} \ P_n^t(i, j) > 0 , \qquad (2.5)$$

then the Perron-Frobenius Theorem states that the spectrum of $P_n$ is included in $(-1, 1]$ and that 1 is a simple eigenvalue [Sen81]. Therefore,

$$\exists!\, \pi_n \ \text{such that} \ \pi_n = \pi_n P_n . \qquad (2.6)$$

* The irreducibility condition (2.5) is equivalent to all pairs of states being communicating in the sense of Section 2.2.

Formula (2.6) establishes that there exists an invariant probability distribution for the MC. If a chain is started with initial distribution $\mu_n^0 = \pi_n$, then $\mu_n^t = \pi_n$ for all $t \ge 0$. Because of this property, $\pi_n$ is called the stationary (or equilibrium) distribution. A classical result about $\pi_n$ is the so-called Markov Chain Convergence Theorem [Gne66, Br99, H02]. It states that under the further assumption of the chain being aperiodic (see below for the definition),

$$\pi_n = \lim_{t \to \infty} \mu_n^t . \qquad (2.7)$$

A MC $X_n^t$ is said to be aperiodic if, for all $i \in \Omega_n$,

$$\gcd\left\{ t > 0 : P_n^t(i, i) > 0 \right\} = 1 ,$$

where gcd stands for the greatest common divisor.

By (2.7), a MC that is both irreducible and aperiodic progressively loses memory of its past, in the sense that the probability of finding the chain in state i becomes independent of the starting state after a sufficiently large number of steps. Therefore, an irreducible and aperiodic chain is often called ergodic. The convergence expressed by (2.7) is usually meant to be component-wise, as in [Gne66]. In this case the theorem states the convergence in the sense of the $\ell^\infty$-norm, i. e.,

$$\lim_{t \to \infty} \left\| \mu_n^t - \pi_n \right\|_\infty = \lim_{t \to \infty} \sup_{i \in \Omega_n} \left| \mu_n^t(i) - \pi_n(i) \right| = 0 . \qquad (2.7a)$$

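However, any distance between probability distributions is suitable since the set of all probability measures on $\Omega_n$ is finite-dimensional. The most often used distance between probability measures is probably the so-called total-variation distance, frequently abbreviated to TV-distance,

$$
\begin{aligned}
\left\| \mu_n^t - \pi_n \right\|_{TV} &= \max_{A \subseteq \Omega_n} \left| \mu_n^t(A) - \pi_n(A) \right| , && (2.8)\\
&= \frac{1}{2} \sum_{i \in \Omega_n} \left| \mu_n^t(i) - \pi_n(i) \right| . && (2.9)
\end{aligned}
$$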
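The agreement of the two expressions (2.8) and (2.9) can be checked numerically on a small state space. The sketch below is illustrative only: the function names and the three-point distributions are assumptions, and the maximum in (2.8) is found by brute-force enumeration of all subsets, which is feasible only for very small state spaces.

```python
from itertools import combinations


def tv_half_l1(mu, pi):
    """Total-variation distance via (2.9): half of the l1 distance."""
    return 0.5 * sum(abs(m - p) for m, p in zip(mu, pi))


def tv_max_over_sets(mu, pi):
    """Total-variation distance via (2.8): maximum over all subsets A."""
    states = range(len(mu))
    best = 0.0
    for size in range(len(mu) + 1):
        for A in combinations(states, size):
            gap = abs(sum(mu[i] for i in A) - sum(pi[i] for i in A))
            best = max(best, gap)
    return best


# Hypothetical distributions on three states, chosen only for illustration.
mu = [0.5, 0.3, 0.2]
pi = [0.2, 0.3, 0.5]
print(tv_half_l1(mu, pi), tv_max_over_sets(mu, pi))
```

The subset enumeration grows exponentially with the number of states, so in practice (2.9) is the formula to use.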

Remark 2.4. Figure 1 gives a graphical interpretation of the total-variation distance between two probability distributions $\mu$ and $\nu$. In particular, the overlap region has area $1 - \|\mu - \nu\|_{TV}$. This means that if $\mu$ and $\nu$ are supported on disjoint regions then their TV-distance is equal to 1.

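Another distance of common use is the $\ell^p$-distance

$$\left\| \mu_n^t - \pi_n \right\|_p = \left( \sum_{i \in \Omega_n} \left| \frac{\mu_n^t(i)}{\pi_n(i)} - 1 \right|^p \pi_n(i) \right)^{1/p} .$$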

Remark 2.5. The TV-distance is a number between 0 and 1, which satisfies $\|\mu_n^t - \pi_n\|_1 = 2\, \|\mu_n^t - \pi_n\|_{TV}$. In addition, regarded as a function of time, the TV-distance is non-increasing*, i. e.,

$$\left\| \mu_n^t - \pi_n \right\|_{TV} \ge \left\| \mu_n^{t+1} - \pi_n \right\|_{TV} . \qquad (2.10)$$

* Surprisingly, this easy and useful property of the TV-distance is not mentioned at all in many textbooks. A nice exception is [Jer03].

Figure 1: Graphical interpretation of the total-variation distance between two probability measures $\mu$ and $\nu$. The TV-distance equals both the areas shaded in red and blue. The area shaded in purple equals $1 - \|\mu - \nu\|_{TV}$.

Separation is a further possibility for quantifying the likeness of two distributions. It is defined as

$$\mathrm{sep}\left(\mu_n^t, \pi_n\right) = \max_{i \in \Omega_n} \left\{ 1 - \frac{\mu_n^t(i)}{\pi_n(i)} \right\} .$$

Although separation is not formally a distance (it is not symmetric), it is important for historical reasons. In the early studies on the cutoff phenomenon many important results were initially obtained for separation, see e. g. [AD86, AD87]. In particular, separation was used in the fundamental paper [DSC06] that fully characterises the cutoff phenomenon for the whole class of BDCs, see Section 2.8 below.

2.4 reversibility

A MC is reversible with respect to a given probability measure $\pi_n$ on $\Omega_n$ if

$$\pi_n(i)\, P_n(i, j) = \pi_n(j)\, P_n(j, i) ,$$

in which case $\pi_n$ is said to be a reversible measure with respect to $P_n$. A reversible chain owes its name to the time reversal property it exhibits: if the initial distribution is $\mu_n^0 = \pi_n$ then, for each sequence of states $\{k_m\}$ in $\Omega_n$,

$$P\left(X_n^0 = k_0,\, X_n^1 = k_1,\, \ldots,\, X_n^t = k_t\right) = P\left(X_n^0 = k_t,\, X_n^1 = k_{t-1},\, \ldots,\, X_n^t = k_0\right) . \qquad (2.11)$$

Roughly speaking, equation (2.11) states that a reversible MC behaves the same regardless of whether time runs backwards or forwards. Another extremely important feature of reversible measures is that they are also stationary [H02].
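The reversibility condition and the fact that a reversible measure is stationary can both be checked mechanically on small examples. The Python sketch below is an illustration only; the function names, the tolerance and the two example chains are assumptions. The second example, a deterministic rotation, shows that the converse fails: its uniform measure is stationary but not reversible.

```python
def is_reversible(pi, P, tol=1e-12):
    """Check detailed balance pi(i) P(i, j) = pi(j) P(j, i) for all i, j."""
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
               for i in range(n) for j in range(n))


def is_stationary(pi, P, tol=1e-12):
    """Check the invariance relation pi = pi P component by component."""
    n = len(P)
    return all(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) <= tol
               for j in range(n))


# Hypothetical two-state chain with reversible measure (5/6, 1/6).
two_state = [[0.9, 0.1],
             [0.5, 0.5]]
pi_two = [5 / 6, 1 / 6]

# Hypothetical rotation 0 -> 1 -> 2 -> 0 with the uniform measure.
rotation = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]

print(is_reversible(pi_two, two_state), is_stationary(pi_two, two_state))
print(is_reversible(uniform, rotation), is_stationary(uniform, rotation))
```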

For a reversible chain the following formula for the t-th power of the transition matrix can be proved:

$$P_n^t(i, j) = \sqrt{\frac{\pi_n(j)}{\pi_n(i)}}\ \sum_{m=1}^{|\Omega_n|} \lambda_{n,m}^t\, u_{n,m}(i)\, u_{n,m}(j) , \qquad (2.12)$$

where $\lambda_{n,m}$ and $u_{n,m}$ are the m-th eigenvalue and eigenvector of the symmetric matrix $P'_n = D P_n D^{-1}$, respectively, and D is a diagonal matrix such that $D(i, i) = \sqrt{\pi_n(i)}$, see [Fil91, LPW06]. An almost immediate consequence of (2.12) is a bound on the total-variation distance from stationarity after t steps. For a reversible and ergodic MC started at time t = 0 from state i,

$$\left\| \mu_n^t - \pi_n \right\|_{TV} \le \frac{1}{2} \left[ \lambda_{n,2}^{2t}\, \frac{1 - \pi_n(i)}{\pi_n(i)} \right]^{1/2} , \qquad (2.13)$$

where $\lambda_{n,2}$ is the second largest* eigenvalue of $P_n$ in absolute value, i. e., $|\lambda_{n,2}| = \max_{m \neq 1} |\lambda_{n,m}|$.

2.5 markov chain mixing

Given an ergodic MC, the Convergence Theorem entails the existence of a unique stationary measure $\pi_n$ that represents the asymptotic distribution of $X_n^t$ when t is large, ideally infinite. To have a more precise control on the convergence, the mixing time is typically introduced. The mixing time is the first time such that the distance from equilibrium drops below a fixed threshold $\varepsilon$, i. e.,

$$t_n^{\mathrm{mix}}(\varepsilon) = \min\left\{ t \ge 0 : \left\| \mu_n^t - \pi_n \right\|_{TV} \le \varepsilon \right\} . \qquad (2.14)$$

Remark 2.6. The mixing time of a MC is a purely deterministic quantity which is not affected by the realisation of the MC whatsoever.

Alternative definitions of the mixing time can be obtained by using a different metric on the set of probability measures on $\Omega_n$, but (2.14) is by far the most common way to define the mixing time. There are, however, dissenting opinions on whether the TV-distance is the right distance to measure how far from equilibrium a MC is. The reason for that is the tendency of the TV-distance to be very unforgiving even of small deviations from stationarity. An example may be useful.

Suppose that, after perfectly shuffling a deck of 52 cards, we happen to see the bottom card, say Q. The distribution of our deck is now no longer uniform over the set of 52! card permutations, but over the set of 51! permutations having Q at the bottom-most position. From (2.8) the distance between the uniform distribution and the biased one is

$$\left( \frac{1}{51!} - \frac{1}{52!} \right) 51! = 1 - \frac{1}{52} \approx 0.98 ,$$

which is quite close to 1, the distance from uniformity of a brand-new deck.

The mixing time is completely determined by the transition matrix $P_n$ and by the initial distribution $\mu_n^0$. In order to drop the latter dependence, the worst-case scenario is often considered, i. e.,

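$$t_n^{\mathrm{mix}}(\varepsilon) = \max_{\mu_n^0}\ \min\left\{ t \ge 0 : \left\| \mu_n^t - \pi_n \right\|_{TV} \le \varepsilon \right\} . \qquad (2.14a)$$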
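For a small chain whose stationary distribution is known, both the mixing time from a given start and its worst-case version can be computed by direct iteration. The sketch below is illustrative: the function names, the two-state chain and the choice of threshold 1/4 are assumptions, and the worst case is taken over deterministic starting states only, which the text states is equivalent to maximising over all initial distributions.

```python
def step(mu, P):
    """One step of the chain: return the row vector mu * P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]


def tv(mu, pi):
    """Total-variation distance, computed as half of the l1 distance."""
    return 0.5 * sum(abs(m - p) for m, p in zip(mu, pi))


def mixing_time(P, pi, start, eps, t_max=10_000):
    """First t >= 0 at which the chain started at `start` is within eps of pi."""
    mu = [0.0] * len(P)
    mu[start] = 1.0
    for t in range(t_max + 1):
        if tv(mu, pi) <= eps:
            return t
        mu = step(mu, P)
    raise RuntimeError("threshold not reached within t_max steps")


def worst_case_mixing_time(P, pi, eps):
    """Maximum of the mixing time over all deterministic starting states."""
    return max(mixing_time(P, pi, s, eps) for s in range(len(P)))


# Hypothetical two-state chain with stationary distribution (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = [5 / 6, 1 / 6]
print(mixing_time(P, pi, 0, 0.25), mixing_time(P, pi, 1, 0.25))
print(worst_case_mixing_time(P, pi, 0.25))
```

The two starting states give different answers here, which is exactly the dependence that the worst-case definition removes.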

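Accordingly, the worst-case mixing time can be related to the spectral properties of the transition matrix. For example, a direct consequence of (2.13) is the following bound on the mixing time:

$$t_n^{\mathrm{mix}}(\varepsilon) \le \frac{\log \dfrac{1}{2\varepsilon} + \dfrac{1}{2} \log \dfrac{1 - \min_i \pi_n(i)}{\min_i \pi_n(i)}}{\log \dfrac{1}{\lambda_{n,2}}} . \qquad (2.15)$$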

* The largest being 1, by the Perron-Frobenius theorem.
We note that maximising $\|\mu_n^t - \pi_n\|_{TV}$ over the initial distribution, as in (2.14a), is equivalent to setting $\mu_n^0(i) = \delta_{i,j}$ (Kronecker's delta) and maximising over j, see [MT06].

Both (2.13) and (2.15) suggest that the closer $\lambda_{n,2}$ is to 1, the larger is the time needed to ensure convergence within a tolerance $\varepsilon$. Let us define the spectral gap as $\mathrm{gap}_n = 1 - \lambda_{n,2}$ and the relaxation time as $t_n^{\mathrm{rel}} = \mathrm{gap}_n^{-1}$. Then, the following lower bound holds in fact for ergodic reversible MCs:

$$t_n^{\mathrm{mix}}(\varepsilon) \ge \left( t_n^{\mathrm{rel}} - 1 \right) \log \frac{1}{2\varepsilon} .$$

Spectral methods for bounding the mixing time are highly relevant in the treatment of the cutoff phenomenon. In particular, for the class of BDCs there exists a complete characterisation of cutoff in terms of the asymptotic behaviour of the product $\mathrm{gap}_n \cdot t_n^{\mathrm{mix}}$ (equivalently, of the ratio $t_n^{\mathrm{mix}} / t_n^{\mathrm{rel}}$), see [DLP10] and Section 2.8 below. However, the approach followed in this thesis does not focus on spectral methods. The interested reader is referred to [MT06, LPW06] for a comprehensive discussion of those techniques.

Another widely used method for bounding the total-variation distance from equilibrium is the Coupling Lemma. A coupling of two MCs with the same transition probabilities but distinct initial distributions is a bivariate chain $(X_n^t, Y_n^t)$ such that the marginals of the joint transition probability $P_n((i, i'), (j, j'))$ are $P_n(i, j)$ and $P_n(i', j')$, respectively. In other words, both components $X_n^t$ and $Y_n^t$ are MCs with transition matrix $P_n$, but the initial distribution of $X_n^t$ may differ in general from that of $Y_n^t$. Couplings are generally designed in such a way that the two components stay glued together after they have met for the first time, i. e.,

$$\text{if } X_n^s = Y_n^s \ \text{ then } \ X_n^t = Y_n^t \quad \forall\, t \ge s . \qquad (2.16)$$

In this way, if $Y_n^t$ is started according to the stationary distribution $\pi_n$ and the chains evolve together after they meet for the first time, the mixing time of $X_n^t$ is dominated by the coalescence time $\tau = \min\{t : X_n^t = Y_n^t\}$. The Coupling Lemma states in fact that for a coupling satisfying (2.16), with starting positions $X_n^0 = i$ and $Y_n^0 \sim \pi_n$,

$$\left\| \mu_n^t - \pi_n \right\|_{TV} \le P\left( \tau > t \right) . \qquad (2.17)$$

We will make frequent use of couplings, for they allow us to transform the problem of estimating a deterministic object like the TV-distance into the problem of estimating the coalescence time, $\tau$. The interested reader is referred to [Lin92] or [Tho00] for a more detailed treatment of coupling methods for finite MCs.

2.6 the cutoff phenomenon

Hereafter we consider families of finite ergodic discrete-time MCs, that is, sextets of the form

$$\left\{ \Omega_n,\ X_n^t,\ P_n,\ \pi_n,\ \mu_n^t,\ \mu_n^0 \right\} ,$$

where $\Omega_n$ is the finite state space of the n-th chain $X_n^t$, which has transition matrix $P_n$ and unique stationary measure $\pi_n$. The symbols $\mu_n^0$ and $\mu_n^t$ stand for the initial distribution of the n-th chain and its probability distribution after t steps; the time t is a discrete quantity. For the sake of simplicity we will drop the "finite ergodic discrete-time" specification and simply refer to them as families of MCs.

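Definition 2.1. A family of Markov chains is said to exhibit a total-variation cutoff if there exist two sequences of integers, $\{a_n\}$ and $\{b_n\}$, such that

$$\frac{b_n}{a_n} \xrightarrow[n \to \infty]{} 0 \qquad (2.18)$$

and

$$
\begin{aligned}
\lim_{\theta \to \infty}\ \liminf_{n \to \infty}\ \left\| \mu_n^{a_n - \theta b_n} - \pi_n \right\|_{TV} &= 1 , && (2.19)\\
\lim_{\theta \to \infty}\ \limsup_{n \to \infty}\ \left\| \mu_n^{a_n + \theta b_n} - \pi_n \right\|_{TV} &= 0 . && (2.20)
\end{aligned}
$$

In this case $a_n$ and $b_n$ are called cutoff time and cutoff window, respectively.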

Remark 2.7. If a family of MCs exhibits an $(a_n, b_n)$-cutoff then it will also exhibit an $(a_n, b'_n)$-cutoff for every sequence of windows $b'_n$ such that $b_n = O(b'_n)$ and $b'_n / a_n \to 0$.

Remark 2.8. Definition 2.1 is specific for the TV-distance and was first introduced in [AD86]. It can easily be adapted to any other notion of distance $\|\mu_n^t - \pi_n\|$; in this case the family is said to exhibit a $\|\cdot\|$-cutoff. However, an arbitrary distance could in principle be not bounded above by 1, or even be unbounded. For this reason a bit of care may be required in adapting the definition. For example, in the case of the $\ell^2$ distance equations (2.19) and (2.20) become

$$
\begin{aligned}
\lim_{\theta \to \infty}\ \liminf_{n \to \infty}\ \left\| \mu_n^{a_n - \theta b_n} - \pi_n \right\|_2 &= +\infty , && (2.19a)\\
\lim_{\theta \to \infty}\ \limsup_{n \to \infty}\ \left\| \mu_n^{a_n + \theta b_n} - \pi_n \right\|_2 &= 0 , && (2.20a)
\end{aligned}
$$

see [SC97] and [DSC06] for more details.

Equations (2.19) and (2.20) represent the sharp convergence to the equilibrium distribution in a narrow window of order $b_n$ centred about the cutoff time $a_n$. Figure 2 displays $\|\mu_n^t - \pi_n\|_{TV}$ as a function of time for a biased random walk. A biased random walk is a BDC on the segment $\Omega_n = \{0, 1, \ldots, n\}$ whose transition probabilities present a constant unbalance (bias) towards one of the extreme points of $\Omega_n$. The transition probabilities are displayed by (3.1) and (3.2) in Section 3.1; with respect to Figure 2, the bias is $\beta = 1/6$. We clearly see that the system abruptly converges in a small window centred at $a_n = n/(2\beta)$. The actual number of steps needed to achieve equilibrium is $b_n = \Theta(\sqrt{n})$, and the size of the window is negligible with respect to the length of the plateau.

The cutoff phenomenon is a crisp asymptotic picture of the mixing time of a family of MCs. The following bound is valid for every ergodic MC (see for example [LPW06]):

$$\left\| \mu_n^{k\, t_n^{\mathrm{mix}}(\varepsilon)} - \pi_n \right\|_{TV} \le (2\varepsilon)^k . \qquad (2.21)$$

Take $\varepsilon = 1/4$; then equation (2.21) states that in a time equal to $t_n^{\mathrm{mix}}(1/4)$ the chain is at a distance from equilibrium lower than 1/2. After that, it is sufficient to wait another $t_n^{\mathrm{mix}}(1/4)$ steps to see that distance reduced by a factor 2. Conversely, if cutoff is present then the time to go from distance 1/2 to distance 1/4 is proportional to the window size $b_n$, an infinitesimal time lapse with respect to the number of steps already waited to reach distance 1/2. Thus, establishing cutoff for a given chain is a much stronger characterisation than providing any estimate of the mixing time. In [AD86], the first paper that formally addressed the cutoff phenomenon, D. Aldous and P. Diaconis made the following remark. Denoting by d(k) the TV-distance from equilibrium at time k, "[...] it is elementary that $d(k) \to 0$ geometrically fast, and Perron-Frobenius theory says $d(k) \sim a \lambda^k$, where $a$, $\lambda$ have eigenvalue/eigenvector interpretation*, but these asymptotics miss the cut-off phenomenon. For card players, the question is not 'exactly how close to uniform is the deck after a million riffle-shuffles?', but 'is 7 shuffles enough?'."

Figure 2: Biased random walk on a segment. The transition probabilities are $P_n(i, i-1) = 1/6$, $P_n(i, i) = 1/3$ and $P_n(i, i+1) = 1/2$. The curves refer to different values of n, the length of the segment.
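The abrupt drop shown in Figure 2 can be reproduced qualitatively by iterating the distribution of a biased walk with the transition probabilities 1/6, 1/3 and 1/2. The Python sketch below is only an illustration under assumptions that are not taken from the text: the walk starts from state 0, a forbidden move at either end of the segment is replaced by holding (the boundary rules of (3.1) and (3.2) are not reproduced here), and the function name is hypothetical. The stationary law is obtained from detailed balance, so it does not depend on the assumed boundary rule.

```python
def biased_walk_tv_profile(n, t_max, down=1/6, stay=1/3, up=1/2):
    """TV-distance from stationarity at times 0..t_max, starting from state 0."""
    # Stationary law from detailed balance: pi(i) is proportional to (up/down)^i.
    weights = [(up / down) ** i for i in range(n + 1)]
    total = sum(weights)
    pi = [w / total for w in weights]

    mu = [0.0] * (n + 1)
    mu[0] = 1.0
    profile = []
    for _ in range(t_max + 1):
        profile.append(0.5 * sum(abs(m - p) for m, p in zip(mu, pi)))
        new = [0.0] * (n + 1)
        for i, mass in enumerate(mu):
            if mass == 0.0:
                continue
            # Assumed boundary rule: a forbidden move is replaced by holding.
            new[max(i - 1, 0)] += mass * down
            new[i] += mass * stay
            new[min(i + 1, n)] += mass * up
        mu = new
    return profile


profile = biased_walk_tv_profile(n=100, t_max=400)
for t in (0, 200, 280, 300, 320, 400):
    print(t, round(profile[t], 3))
```

With a drift of 1/2 - 1/6 = 1/3 per step, the walk needs roughly 3n steps to cross the segment, so for n = 100 the distance should stay near 1 well before time 300 and be small well after it.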

The cutoff phenomenon appears in many natural examples ranging from card shuffling models and random walks on the symmetric group [BD92, DMP95, DS81, Hil92, Ros94, Por95] to statistical mechanics models [Ald83, DS87, DGM90, LS10, LS13], list management problems [DFP92] and many others. An excellent and exhaustive review is given in [Dia96] by Persi Diaconis. We next discuss some examples.

2.7 collecting and shuffling

The literature about the cutoff phenomenon starts in 1981 with the paper by P. Diaconis and M. Shahshahani. In [DS81] they investigated the convergence to uniformity of a shuffling method called random transpositions, a MC on the symmetric group $S_n$. Given a set of n objects (cards in this case), the symmetric group is the set of all n! possible permutations of those objects. As for every random walk on $S_n$, the equilibrium distribution is uniform [LPW06]. To perform the random transposition shuffle, the cards of a deck are initially laid out in a row. Two cards are then chosen uniformly at random and transposed; the two cards may coincide, in which case no transposition is made. These operations are repeated until the deck is shuffled.

* Cf. (2.12) on page 8.


Figure 3: Top-in-at-random shuffle. The topmost card is inserted back into a randomly chosen position.

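For $c = c(t, n) = \dfrac{t - \frac{1}{2} n \log n}{n}$, P. Diaconis and M. Shahshahani proved that

$$
\begin{aligned}
&\exists\, b \in \mathbb{R} \ \text{s.t. for } n \ge 10,\ c > 0 , \quad \left\| \mu_n^t - \pi_n \right\|_{TV} \le b\, e^{-2c} , && (2.22)\\
&\forall\, t , \quad \left\| \mu_n^t - \pi_n \right\|_{TV} \ge 2 \left( \frac{1}{e} - e^{-e^{-2c}} \right) + o(1) \quad \text{as } n \to \infty . && (2.23)
\end{aligned}
$$

If the upper bound (2.22) is evaluated at $t = \frac{1}{2} n \log n + \theta n$ then (2.20) is obtained.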

The original proof of (2.22) and (2.23) is rather technical and uses the tools of group representations. An alternative and much easier approach is via strong stationary times, see [LPW06]. Stationary times are a special kind of stopping times and play a key role in many proofs of cutoff. The first use of stopping times in establishing cutoff was made by D. Aldous and P. Diaconis: in [AD86] they proved cutoff for the top-in-at-random shuffle. Incidentally, that was also the first paper in which the name cutoff was used.

Definition 2.2. A non-negative discrete random variable $\tau_n$ is a stopping time for the MC $X_n^t$ if the event $\{\tau_n = s\}$ depends on the trajectory of $X_n^t$ only up to time s. In other words, the indicator function $\mathbf{1}_{\{\tau_n = s\}}$ is a function of $X_n^0, X_n^1, \ldots, X_n^s$ only.

Definition 2.3. A stopping time $\tau_n$ is called a stationary time if $X_n^t$ evaluated at time $\tau_n$ is at equilibrium. In formulas, $\tau_n$ is a stationary time if

$$P\left( X_n^{\tau_n} = i \right) = \pi_n(i) .$$

Definition 2.4. A strong stationary time is a stationary time with the additional property of $X_n^{\tau_n}$ being independent of $\tau_n$, i. e.,

$$P\left( X_n^t = i \mid \tau_n = t \right) = \pi_n(i) .$$

The top-in-at-random shuffle is performed by repeatedly inserting the topmost card back in a position of the deck picked out uniformly at random, see Figure 3. The MC that models the top-in-at-random shuffle is also a random walk on the symmetric group, so its equilibrium distribution is uniform. For the top-in-at-random shuffle, D. Aldous and P. Diaconis proved that

$$
\begin{aligned}
&\forall\, \theta \ge 0,\ n \ge 2 , \quad \left\| \mu_n^{n \log n + \theta n} - \pi_n \right\|_{TV} \le e^{-\theta} , \\
&\forall\, \theta_n \to \infty , \quad \left\| \mu_n^{n \log n - \theta_n n} - \pi_n \right\|_{TV} \to 1 .
\end{aligned}
\qquad (2.24)
$$

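The key to (2.24) is the following Lemma:

Lemma 2.1. Let $X_n^t$ be a MC with state space $\Omega_n$. Let $\tau_n$ be a strong stationary time for $X_n^t$. Then, $\forall\, t \ge 0$,

$$\left\| \mu_n^t - \pi_n \right\|_{TV} \le P\left( \tau_n > t \right) .$$

Proof. For any $A \subseteq \Omega_n$,

$$
\begin{aligned}
\mu_n^t(A) &= P\left( X_n^t \in A \right) , \\
&= \sum_{s \le t} P\left( X_n^t \in A,\ \tau_n = s \right) + P\left( X_n^t \in A,\ \tau_n > t \right) , \\
&= \sum_{s \le t} \pi_n(A)\, P\left( \tau_n = s \right) + P\left( X_n^t \in A \mid \tau_n > t \right) P\left( \tau_n > t \right) , \\
&= \pi_n(A) + \left[ P\left( X_n^t \in A \mid \tau_n > t \right) - \pi_n(A) \right] P\left( \tau_n > t \right) ,
\end{aligned}
$$

which yields $\left| \mu_n^t(A) - \pi_n(A) \right| \le P\left( \tau_n > t \right)$.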

A strong stationary time for the top-in-at-random model is the integer following the first time the original bottom card reaches the topmost position. Let us imagine performing the top-in-at-random shuffle until $T_n^j$, the first time when j cards have been re-inserted into the deck below the card that at time t = 0 was the bottommost one. Clearly, all arrangements of these j cards are equally likely, due to the randomness of the insertion position. The number of steps between the first moment j - 1 cards are below the original bottom card and the first moment j cards are below it is $T_n^j - T_n^{j-1}$, a geometric random variable. In formulas,

$$
\begin{aligned}
P\left( T_n^j - T_n^{j-1} = t \right) &= \frac{j}{n} \left( 1 - \frac{j}{n} \right)^{t-1} , && (2.25)\\
T_n^0 &= 0 .
\end{aligned}
$$

At time $T^{n-1}_n$, when $n-1$ cards have been placed below the original bottommost card, the latter has reached the topmost position of the deck, and the $n-1$ cards below it are uniformly permuted. Performing another shuffle, the system loses all memory of the starting position and reaches uniformity. Thus, a strong stationary time for the top-in-at-random shuffle is $\tau^{top}_n = T^{n-1}_n + 1$. To obtain (2.24) it is sufficient to show that for $t = n \log n + \theta n$,

\[ P\left(\tau^{top}_n > t\right) \le e^{-\theta} . \tag{2.26} \]

Inequality (2.26) is easily obtained via the coupon collector's problem, as we explain next.
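As a sanity check on the construction of $\tau^{top}_n$, the following Python sketch simulates the top-in-at-random shuffle and estimates the mean of $T^{n-1}_n + 1$. Since this time is one plus a sum of independent geometric variables with success probabilities $j/n$, $j = 1, \dots, n-1$, its mean is $n(1 + 1/2 + \dots + 1/n)$. The function names, the deck encoding (index 0 is the top) and the seeds are illustrative assumptions, not part of the original text.

```python
import random

def strong_stationary_time(n, rng):
    """Simulate one run of the top-in-at-random shuffle on n cards.

    Returns T^{n-1}_n + 1: the step right after the original bottom card
    has reached the top of the deck.
    """
    deck = list(range(n))      # index 0 is the top; card n-1 starts at the bottom
    bottom = n - 1
    steps = 0
    while deck[0] != bottom:   # stop once the original bottom card is on top
        card = deck.pop(0)
        deck.insert(rng.randrange(n), card)   # n equally likely insertion slots
        steps += 1
    return steps + 1           # one more shuffle re-inserts the old bottom card

def harmonic(n):
    """n-th harmonic number 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def estimate_mean(n, trials, seed=0):
    """Monte Carlo estimate of the mean of the strong stationary time."""
    rng = random.Random(seed)
    return sum(strong_stationary_time(n, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    n = 10
    print("simulated mean:", estimate_mean(n, 20000, seed=1))
    print("n * H_n       :", n * harmonic(n))
```

The two printed numbers should agree up to Monte Carlo error, which reflects the identity in distribution with the coupon collector's time discussed next.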

The coupon collector draws with equal probability from a set of $n$ different coupons, and the drawn coupons are immediately replaced. The collector wins as soon as he has drawn all the $n$ different coupons. The question typically asked by the collector, especially if he pays a fee for each draw he makes, is "how many draws do I need to win?". The answer is rather clear-cut, due to the cutoff phenomenon.

The coupon collector's model is a BDC on the segment $\{0, 1, \dots, n\}$, $X^t_n = i$ meaning that at time $t$ the collector is still missing $i$ coupons. As soon as $X^t_n = 0$ the collector wins, so we define

\[ \tau^{cc}_n = \min\{ t \ge 0 : X^t_n = 0 \} . \]

The transition probabilities for the coupon collector's chain are

\[
P\left(X^t_n = j \mid X^{t-1}_n = i\right) =
\begin{cases}
i/n , & \text{if } j = i - 1 , \\
1 - i/n , & \text{if } j = i , \\
0 , & \text{otherwise} .
\end{cases}
\]

The coupon collector's chain is a biased random walk, in the sense that the probability to go left (from $i$ to $j < i$) is larger than the probability to go right (from $i$ to $j > i$). More precisely, it is the extreme case of the biased random walk we have already discussed*: the probability to go right is in fact null. For the coupon collector's model, the number of steps $S^i_n$ before a move takes place from state $i$ to $i-1$ represents the number of draws between two successful extractions of missing coupons. The random variable $S^i_n$ is geometrically distributed, i.e.,

\[ P\left(S^i_n = s\right) = \frac{i}{n} \left(1 - \frac{i}{n}\right)^{s-1} . \tag{2.27} \]

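According to (2.25) and (2.27), the random variables $T^i_n - T^{i-1}_n$ and $S^i_n$ are identically distributed and so are the random variables $\tau^{top}_n = 1 + \sum_{j=1}^{n-1} \left(T^j_n - T^{j-1}_n\right)$ and $\tau^{cc}_n = 1 + \sum_{j=1}^{n-1} S^j_n$. Therefore,

\[ P\left(\tau^{top}_n > t\right) = P\left(\tau^{cc}_n > t\right) . \tag{2.28} \]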

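Now, if we call $A_j$ the event "the $j$-th coupon is not drawn in the first $t$ trials" then

\begin{align*}
P\left(\tau^{cc}_n > t\right) &= P\Big( \bigcup_{j=1}^{n} A_j \Big) \le \sum_{j=1}^{n} P(A_j) \\
&= n \left(1 - \frac{1}{n}\right)^{t} \le n\, e^{-t/n} , \tag{2.29}
\end{align*}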

which gives (2.26) for $t = n \log n + \theta n$.

Provided the total number of coupons is large, the coupon collector now knows that $n \log n + cn$ draws will be enough to win. Can he expect to win having drawn far fewer than $n \log n$ coupons? No, he definitely cannot. According to Cantelli's inequality† the tail probabilities of a real random variable $X$ with finite mean $\mu$ and variance $\sigma^2$ can be estimated as follows:

\[
\begin{aligned}
P(X - \mu \ge \theta\sigma) &\le \frac{1}{1+\theta^2} , \\
P(X - \mu \le -\theta\sigma) &\le \frac{1}{1+\theta^2} .
\end{aligned}
\tag{2.30}
\]

The time to win, $\tau^{cc}_n$, has mean $n \log n$ and variance $O(n^2)$, so that (2.30) implies

\[ P\left(\tau^{cc}_n \le n \log n - \theta n\right) \le \frac{1}{1+\theta^2} . \]

* Cf. Figure 2 on page 12.
† Named after the Italian mathematician Francesco Paolo Cantelli, inequality (2.30) is the one-sided version of Chebyshev's inequality. A proof of this inequality can be found in [Fel68b].

The stationary distribution for the coupon collector's chain is a mass concentrated in state 0, $\pi_n(i) = \delta_{i,0}$. This means that $\tau^{cc}_n$ is a stationary time for the coupon collector because when the chain reaches state 0, only self-transitions are available (absorbing state). Further, by means of (2.8),

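\begin{align*}
\left\| \mu^t_n - \pi_n \right\|_{TV} &= 1 - P\left(X^t_n = 0\right) \\
&= 1 - P\left(\tau^{cc}_n \le t\right) . \tag{2.31}
\end{align*}

If $t = n \log n - \theta n$, then (2.31) yields

\[ \left\| \mu^{\,n \log n - \theta n}_n - \pi_n \right\|_{TV} \ge 1 - \frac{1}{1+\theta^2} , \]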

which is equivalent to (2.20) for $a_n = n \log n$ and $b_n = \Theta(n)$. Lemma 2.1 applied to the stationary time $\tau^{cc}_n$ and (2.28)–(2.29) (or Cantelli again) lead to (2.19) and an $(a_n, b_n)$-cutoff.
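The cutoff of the coupon collector's chain can also be observed numerically. The sketch below evolves the law of the chain exactly and evaluates the distance through (2.31), $d(t) = 1 - P(X^t_n = 0)$. The function name `coupon_tv` and the choice $n = 100$ are illustrative assumptions.

```python
import math

def coupon_tv(n, t_max):
    """Return [d(0), ..., d(t_max)] with d(t) = 1 - P(X^t_n = 0), as in (2.31).

    The state is the number of missing coupons; the chain starts from n.
    """
    dist = [0.0] * (n + 1)
    dist[n] = 1.0                       # all n coupons are missing at time 0
    out = [1.0 - dist[0]]
    for _ in range(t_max):
        new = [0.0] * (n + 1)
        for i, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[i] += mass * (1 - i / n)        # draw an already owned coupon
            if i > 0:
                new[i - 1] += mass * (i / n)    # draw a missing coupon
        dist = new
        out.append(1.0 - dist[0])
    return out

if __name__ == "__main__":
    n = 100
    d = coupon_tv(n, 800)
    for theta in (-2, -1, 0, 1, 2):
        t = math.ceil(n * math.log(n) + theta * n)
        print(f"theta={theta:+d}  t={t:4d}  d(t)={d[t]:.4f}")
```

For positive $\theta$ the printed distance stays below $e^{-\theta}$, in agreement with (2.26) and (2.29), while for negative $\theta$ it is close to 1.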

2.8 cutoff for birth-and-death chains

In Section 2.5 we reviewed some methods for bounding the distance of the evolved measure $\mu^t_n$ of an ergodic MC from the unique stationary distribution $\pi_n$. For our purposes we can sort those methods into two distinct classes, namely, spectral methods and random-times methods. Random times comprise coalescence and stopping/stationary times. Section 2.7 explained how the existence of a concentrated strong stationary time can be exploited to prove cutoff in some classical examples. On the other hand, the evolution of a MC and its mixing properties depend only on $P_n$, as we already noted at the end of Section 2.1 and in Remark 2.6. A question naturally arising is whether, with respect to the cutoff phenomenon, the spectral properties of $P_n$ and the presence of random times particularly meaningful for the evolution of the chain are in fact two sides of the same coin. Although a positive answer is strongly expected and sought for, so far it could be found only in the class of Birth-and-Death Chains (BDCs).

The state space $\Omega_n$ of a BDC can be put in a one-to-one correspondence with the segment $\{0, 1, \dots, |\Omega_n| - 1\}$. Only transitions to nearest neighbours are allowed, that is $P_n(i,j) = 0$ if $|i - j| > 1$. It is common practice to indicate the non-zero transition probabilities with the following symbols:

\[
\begin{aligned}
p_i &= P_n(i, i+1) , \\
r_i &= P_n(i, i) , \\
q_i &= P_n(i, i-1) .
\end{aligned}
\]

If $p_i, q_i > 0$ $\forall\, i = 0, 1, \dots, |\Omega_n| - 1$ and $r_i > 0$ for at least one state $i \in \Omega_n$ then a BDC is ergodic, and the stationary distribution can be written as

\[ \pi_n(i) = \pi_n(0) \prod_{k=1}^{i} \frac{p_{k-1}}{q_k} . \tag{2.32} \]
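Formula (2.32) translates directly into a few lines of code. The following sketch builds $\pi_n$ from the rates $p_i$, $q_i$ with exact rational arithmetic and checks invariance by applying one step of the chain. The list conventions (`p[i]` for $i = 0, \dots, N-1$, `q[i]` for $i = 1, \dots, N$ with `q[0]` an unused placeholder) and the function names are assumptions made for this illustration.

```python
from fractions import Fraction

def bdc_stationary(p, q):
    """Stationary distribution of a BDC on {0, ..., N} via the product formula (2.32).

    p[i] = P(i, i+1) for i = 0..N-1; q[i] = P(i, i-1) for i = 1..N (q[0] is unused).
    """
    N = len(p)
    weights = [Fraction(1)]                  # unnormalised pi(0)
    for k in range(1, N + 1):
        weights.append(weights[-1] * Fraction(p[k - 1]) / Fraction(q[k]))
    total = sum(weights)
    return [w / total for w in weights]

def step(dist, p, q):
    """Return dist * P, where P is the tridiagonal transition matrix of the BDC."""
    N = len(dist) - 1
    new = [Fraction(0)] * (N + 1)
    for i, mass in enumerate(dist):
        up = Fraction(p[i]) if i < N else Fraction(0)
        down = Fraction(q[i]) if i > 0 else Fraction(0)
        new[i] += mass * (1 - up - down)     # holding probability r_i
        if i < N:
            new[i + 1] += mass * up
        if i > 0:
            new[i - 1] += mass * down
    return new

if __name__ == "__main__":
    p = [Fraction(1, 6)] * 4                     # example rates (assumed)
    q = [Fraction(0)] + [Fraction(1, 2)] * 4
    pi = bdc_stationary(p, q)
    print(pi, step(pi, p, q) == pi)
```

The check `step(pi, p, q) == pi` holds because (2.32) is equivalent to the detailed balance relation $\pi_n(k)\, q_k = \pi_n(k-1)\, p_{k-1}$.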

In [Per04] Yuval Peres conjectured that in many natural chains cutoff occurs if and only if $t^{rel}_n = o\left(t^{mix}_n(1/4)\right)$ asymptotically, for the parameter $n \to \infty$. This conjecture was proved true within the class of BDCs both for cutoff in separation and TV-distance in [DSC06] and [DLP10], respectively.

In order to better understand the proof of Yuval Peres's claim and how it links the spectral properties of the transition matrix to the behaviour of the distance from stationarity, let us start by surveying the proof of cutoff for the coupon collector's chain. Carried out at the end of Section 2.7, it is based on two facts, namely, the stationary distribution is a mass in state 0 and there exists a stationary time, $\tau^{cc}_n$, which is concentrated in the sense that

\[ \frac{\sigma^2\left[\tau^{cc}_n\right]}{E^2\left[\tau^{cc}_n\right]} \xrightarrow[n \to \infty]{} 0 . \tag{2.33} \]

For the examples presented in Section 2.7 the stationary time $\tau^{cc}_n$ is also the hitting time of state 0. Hitting times, defined below, play a fundamental role in the methodology to be developed in Chapter 3.

Definition 2.5. The hitting time of a set $A \subseteq \Omega_n$ is defined as

\[ \tau_n(A) = \min\left\{ t \ge 0 : X^t_n \in A \right\} . \]

Remark 2.9. Hitting times are stopping times.

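With respect to Definition 2.5, the proof of cutoff for the coupon collector can then be sketched as follows:

1. The stationary measure $\pi_n$ is concentrated in the subset $A = \{0\}$. As a consequence, equation (2.8) yields

\begin{align*}
\left\| \mu^t_n - \pi_n \right\|_{TV} &= \max_{A \subseteq \Omega_n} \left\{ \pi_n(A) - \mu^t_n(A) \right\} \\
&\ge \pi_n(A) - P\left(X^t_n \in A\right) \\
&= 1 - P\left(X^t_n \in A\right) \\
&\ge 1 - P\left(\tau_n(A) \le t\right) ;
\end{align*}

2. The hitting time of 0 is a stationary time for the chain. Lemma 2.1 then provides the bound

\[ \left\| \mu^t_n - \pi_n \right\|_{TV} \le P\left(\tau_n(A) > t\right) ; \]

3. The hitting time $\tau_n(A)$ is concentrated as in (2.33), so if we take $a_n = E[\tau_n(A)]$ and $b_n = \sigma[\tau_n(A)]$ then the probability of both the events $\{\tau_n(A) \le a_n - \theta b_n\}$ and $\{\tau_n(A) \ge a_n + \theta b_n\}$ is asymptotically small in $n$ and $\theta$ by (2.30).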

Remark 2.10. Having in mind a possible generalisation of this structure to other-than-BDCs, the most critical point is 2. Indeed, the stationary measure of a BDC, as well as the first two moments of any hitting time, depend only on the transition rates $p_i$ and $q_i$. The last statement is justified by (2.32) and equations (2.43)–(2.46) below.

In [DSC06] P. Diaconis and L. Saloff-Coste remarked that if $D(\cdot, \cdot)$ is either separation or TV-distance then there exists a sequence of random times $T^D_n$ such that

\[ D\left(\mu^t_n, \pi_n\right) = P\left(T^D_n > t\right) . \tag{2.34} \]

If $D$ is total variation then the elements of the family $T^D_n$ are understood to be optimal coupling times (optimal in the sense that (2.17) is satisfied as an equality), while if $D$ is separation then the $T^D_n$'s are optimal strong stationary times. They also stressed that once mean and variance of $T^D_n$ are computed, the distance from stationarity can be easily bounded by means of Cantelli's inequality (2.30). In the case of separation $T^D_n$ is the first time an auxiliary chain, named strong stationary dual, hits the state which is furthest (in the sense of (2.3)) with respect to the starting state, conventionally taken to be $X^0_n = 0$. The dual chain is still a BDC and keeps the eigenvalues of the original chain, see [DF90]. Then, using the first passage time† distribution in terms of the spectral properties of $P_n$ provided by [Kei79, BS87], P. Diaconis and L. Saloff-Coste obtained

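\[
E\left[T^D_n\right] = \sum_{i=2}^{|\Omega_n|} \frac{1}{\lambda_i} , \qquad
\sigma^2\left[T^D_n\right] = \sum_{i=2}^{|\Omega_n|} (1 - \lambda_i)\, \frac{1}{\lambda_i^2} ,
\]

where $\{\lambda_i\}_{2 \le i \le |\Omega_n|}$ are the non-zero eigenvalues of $I - P_n$, that is, $1 - \lambda_i$ runs over the eigenvalues of $P_n$ different from 1. The variance of $T^D_n$ can be easily bounded by

\[ \sigma^2\left[T^D_n\right] \le t^{rel}_n\, E\left[T^D_n\right] , \tag{2.35} \]

or alternatively as

\[ \sigma^2\left[T^D_n\right] \le E^2\left[T^D_n\right] . \tag{2.36} \]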

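Via (2.30) and (2.34) it is immediate to show that

\begin{align*}
\mathrm{sep}\left( \mu^{\,E[T^D_n] - \theta\sigma[T^D_n]}_n , \pi_n \right) &\ge 1 - \frac{1}{1+\theta^2} , \tag{2.37} \\
\mathrm{sep}\left( \mu^{\,E[T^D_n] + \theta\sigma[T^D_n]}_n , \pi_n \right) &\le \frac{1}{1+\theta^2} , \tag{2.38}
\end{align*}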

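that is to say, a cutoff with $a_n = E[T^D_n]$ and $b_n = \Theta\left(\sigma[T^D_n]\right)$, provided $t^{rel}_n = o\left(E[T^D_n]\right)$. Conversely, suppose the chain exhibits an $(a'_n, b'_n)$-cutoff with cutoff time $a'_n$. The separation distance can be alternatively bounded by (2.30), (2.34), and (2.35). For all $\varepsilon > 0$,

\begin{align*}
\mathrm{sep}\left( \mu^{\,(1-\varepsilon) E[T^D_n]}_n , \pi_n \right) &\ge 1 - \frac{1}{1 + \varepsilon^2\, \mathrm{gap}_n\, E[T^D_n]} , \tag{2.39} \\
\mathrm{sep}\left( \mu^{\,(1+\varepsilon) E[T^D_n]}_n , \pi_n \right) &\le \frac{1}{1 + \varepsilon^2\, \mathrm{gap}_n\, E[T^D_n]} . \tag{2.40}
\end{align*}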

Inequalities (2.39) and (2.40) could be interpreted as a cutoff with cutoff window proportional to the cutoff time. This means that the lengths of the time intervals $[a_n - b_n, a_n + b_n]$ and $\left[(1-\varepsilon) E[T^D_n], (1+\varepsilon) E[T^D_n]\right]$ must be of the same order for $n$ large. In other words, $a_n = \Theta\left(E[T^D_n]\right)$. Inequalities (2.36)–(2.40) now imply that $\mathrm{gap}_n\, E[T^D_n] \to \infty$. The proof of Yuval Peres's conjecture for the separation case is concluded by showing that $E[T^D_n]$ is proportional to the mixing time and $\mathrm{gap}_n\, E[T^D_n] \to \infty$ if and only if $\mathrm{gap}_n\, t^{mix}_n(\varepsilon) \to \infty$.

In [DLP10] the conjecture is again proved true in the class of BDCs but for TV-distance. Here the authors used the following equivalent definition of cutoff:

Definition 2.6. A family of ergodic MCs is said to exhibit cutoff if

\[ \lim_{n \to \infty} \frac{t^{mix}_n(\varepsilon)}{t^{mix}_n(1 - \varepsilon)} = 1 \qquad \forall\, \varepsilon \in (0, 1/2) . \]

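The structure of the proof is similar to the one we have outlined on page 17, the main difference being that point 2 is replaced by

\[ \left\| \mu^t_n - \pi_n \right\|_{TV} \le P\left( \max\left\{ \tau_n(Q(\varepsilon)),\ \tau_n(Q(1-\varepsilon)) \right\} > t \right) + 2\varepsilon , \]

where $\tau_n(Q(\varepsilon))$ is the hitting time (starting from state 0) of the $\varepsilon$-quantile of the stationary distribution, i.e.,

\[ Q(\varepsilon) = \min\left\{ k \ge 0 : \sum_{i=0}^{k} \pi_n(i) \ge \varepsilon \right\} . \]

† "First passage time" is another name for hitting time.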
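The quantile state $Q(\varepsilon)$ is straightforward to compute once the stationary distribution is available. A minimal sketch, assuming $\pi_n$ is given as a Python list indexed by the states $0, 1, \dots$ (the function name is hypothetical):

```python
def quantile_state(pi, eps):
    """Smallest state k such that pi[0] + ... + pi[k] >= eps, i.e. Q(eps)."""
    total = 0.0
    for k, mass in enumerate(pi):
        total += mass
        if total >= eps:
            return k
    # Only reached if rounding keeps the cumulative sum below eps (eps close to 1).
    return len(pi) - 1

if __name__ == "__main__":
    pi = [0.5, 0.25, 0.125, 0.125]   # toy stationary distribution (assumed)
    print(quantile_state(pi, 0.1), quantile_state(pi, 0.6), quantile_state(pi, 0.9))
```

When $\pi_n$ is concentrated near state 0, as for a chain with a drift to the left, both $Q(\varepsilon)$ and $Q(1-\varepsilon)$ stay close to 0.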

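Then, the first passage time distribution provided by [Kei79, BS87] is used as in [DSC06] to obtain the following bound on the variance of $\tau_n(Q(\varepsilon))$:

\[ \sigma^2\left[\tau_n(Q(\varepsilon))\right] \le \frac{E\left[\tau_n(Q(\varepsilon))\right]}{\mathrm{gap}_n} . \tag{2.41} \]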

Similarly to what we have seen for $T^D_n$, the hitting time $\tau_n(Q(\varepsilon))$ is concentrated if the product $\mathrm{gap}_n\, E\left[\tau_n(Q(\varepsilon))\right]$ diverges. It turns out that again this condition is also necessary for cutoff, and $E\left[\tau_n(Q(\varepsilon))\right]$ is proportional to the mixing time. Thus, Yuval Peres's conjecture is proved.

A radically different approach to the cutoff phenomenon for BDCs is the one proposed in [MY01, BBF09]:

Definition 2.7. A family of chains is said to exhibit cutoff at mean times if there exists a sequence of random times $T_n$ such that

\[ \frac{T_n}{E[T_n]} \xrightarrow[n \to \infty]{\text{Prob}} 1 . \tag{2.42} \]

This alternative definition is adopted for several reasons. It frees the phenomenon from the notion of distance in use; the TV-distance, in particular, can greatly distort the picture of the approach to stationarity*. Definition 2.7 does not require the knowledge of the stationary distribution and lets us look at the cutoff phenomenon as a physical phenomenon rather than as a purely probabilistic one. Moreover, it gives the possibility to set up a common framework to study and characterise cutoff and metastability at the same time. In [BBF09] the authors indeed derive some sufficient conditions for both the cutoff phenomenon (in the sense of (2.42)) and exponential escape (a fundamental feature of metastable behaviour) to arise. Last but not least, if we compare the proof of cutoff for the coupon collector's chain and the characterisation of cutoff for BDCs both in separation and TV-distance, then we can fairly say that (2.42) captures the soul of the phenomenon. Condition (2.42) is in fact implied by a concentration property like $\sigma[T_n] = o(E[T_n])$, a cornerstone of all the discussions so far.
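The concentration property $\sigma[T_n] = o(E[T_n])$ can be made concrete for the coupon collector's time, which is a sum of independent geometric waiting times with success probabilities $i/n$, $i = 1, \dots, n$. The sketch below computes the exact mean and standard deviation from the moments of the geometric distribution; the function name is an assumption made for this illustration.

```python
import math

def coupon_moments(n):
    """Mean and standard deviation of the coupon collector's time tau^cc_n.

    tau^cc_n is a sum of independent geometric variables with success
    probabilities i/n (i = 1..n); each has mean n/i and variance (1 - i/n)/(i/n)^2.
    """
    mean = sum(n / i for i in range(1, n + 1))
    var = sum((1 - i / n) / (i / n) ** 2 for i in range(1, n + 1))
    return mean, math.sqrt(var)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        mean, std = coupon_moments(n)
        print(f"n={n:6d}  E={mean:12.2f}  sigma={std:10.2f}  sigma/E={std / mean:.4f}")
```

The ratio $\sigma/E$ decreases with $n$, but only at a logarithmic rate, since the mean grows like $n \log n$ while the standard deviation grows like $n$.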

In the analysis carried out in [BBF09] a key role is played by formulas for the first and second moment of the hitting times of any state. We introduce these formulas here for future reference. Let us suppose that a BDC has state space $\Omega_n = \{0, 1, \dots, n\}$ and let $T_{i \to j}$ be the hitting time of $j$ starting from $i$, that is,

\[ T_{i \to j} = \min\left\{ t \ge 0 : X^t_n = j,\ X^0_n = i \right\} . \]

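For $0 \le j < i \le n$,

\begin{align*}
E\left[T_{i \to j}\right] &= \sum_{k=j+1}^{i} \frac{1}{q_k} \sum_{m=k}^{n} \frac{\pi_n(m)}{\pi_n(k)} , \tag{2.43} \\
E\left[T^2_{i \to j}\right] &= \sum_{k=j+1}^{i} \frac{2}{q_k} \sum_{m=k}^{n} E\left[T_{m \to j}\right] \frac{\pi_n(m)}{\pi_n(k)} - E\left[T_{i \to j}\right] , \tag{2.44}
\end{align*}

whereas, for $0 \le i < j \le n$,

\begin{align*}
E\left[T_{i \to j}\right] &= \sum_{k=i}^{j-1} \frac{1}{p_k} \sum_{m=0}^{k} \frac{\pi_n(m)}{\pi_n(k)} , \tag{2.45} \\
E\left[T^2_{i \to j}\right] &= \sum_{k=i}^{j-1} \frac{2}{p_k} \sum_{m=0}^{k} E\left[T_{m \to j}\right] \frac{\pi_n(m)}{\pi_n(k)} - E\left[T_{i \to j}\right] . \tag{2.46}
\end{align*}

* See the discussion on page 9 on the unforgiving nature of TV-distance.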

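A proof of formulas (2.43)–(2.46) is found in [Fel68a, BBF09]. From (2.43) and (2.44) the following formula can be easily obtained for the variance of $T_{i \to j}$ when $0 \le j < i \le n$:

\begin{align*}
\sigma^2\left[T_{i \to j}\right] &= \sum_{k=j+1}^{i} \sigma^2\left[T_{k \to k-1}\right] \tag{2.47} \\
&= \left[ \sum_{k=j+1}^{i} \frac{1}{q_k} \sum_{m=k}^{n} \left( 2E\left[T_{m \to k-1}\right] - E\left[T_{k \to k-1}\right] \right) \frac{\pi_n(m)}{\pi_n(k)} \right] - E\left[T_{i \to j}\right] .
\end{align*}

An analogous formula holds if $0 \le i < j \le n$.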
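The downward formulas (2.43) and (2.44) can be checked numerically. The sketch below evaluates them with exact rational arithmetic; the ratios $\pi_n(m)/\pi_n(k)$ are taken from the product formula (2.32). The function names and the list conventions (`p[i]` for $i = 0, \dots, n-1$, `q[i]` for $i = 1, \dots, n$ with `q[0]` unused) are assumptions made for this illustration.

```python
from fractions import Fraction

def pi_ratio(p, q, m, k):
    """pi_n(m) / pi_n(k) for m >= k, from the product formula (2.32)."""
    ratio = Fraction(1)
    for l in range(k + 1, m + 1):
        ratio *= Fraction(p[l - 1]) / Fraction(q[l])
    return ratio

def mean_down(p, q, i, j):
    """E[T_{i->j}] for j < i, formula (2.43). States are 0..n with n = len(p)."""
    n = len(p)
    total = Fraction(0)
    for k in range(j + 1, i + 1):
        inner = sum(pi_ratio(p, q, m, k) for m in range(k, n + 1))
        total += inner / Fraction(q[k])
    return total

def second_moment_down(p, q, i, j):
    """E[T_{i->j}^2] for j < i, formula (2.44)."""
    n = len(p)
    total = Fraction(0)
    for k in range(j + 1, i + 1):
        inner = sum(mean_down(p, q, m, j) * pi_ratio(p, q, m, k)
                    for m in range(k, n + 1))
        total += 2 * inner / Fraction(q[k])
    return total - mean_down(p, q, i, j)

def variance_down(p, q, i, j):
    """Variance of T_{i->j} for j < i, from (2.43) and (2.44)."""
    return second_moment_down(p, q, i, j) - mean_down(p, q, i, j) ** 2

if __name__ == "__main__":
    p = [Fraction(1, 6)] * 4                    # example rates (assumed)
    q = [Fraction(0)] + [Fraction(1, 2)] * 4
    print(mean_down(p, q, 4, 0), variance_down(p, q, 4, 0))
```

Two consistency checks are worth running: in a two-state chain $T_{1 \to 0}$ is geometric with parameter $q_1$, so its variance must equal $(1 - q_1)/q_1^2$; and the variance of $T_{i \to j}$ must equal the sum of the variances of the single downward steps, which is the first line of (2.47).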

2.9 seven shuffles are enough

We end this chapter with the riffle shuffle, also known as dovetail shuffle. Experiments in [Dia88] show that a good mathematical model for this shuffling technique is the following. A deck of $n$ cards is split in two according to a binomial distribution $B(n, 1/2)$, then the two parts are riffled together in such a way that the next card drops from one of the two heaps with probability proportional to the number of cards still present in each heap, see [Gil55] for more details. The mathematical analysis of the riffle shuffle is presented in [BD92], the seminal paper by D. Bayer and P. Diaconis. The riffle shuffle is arguably the first example that comes to mind when speaking of cutoff. A possible explanation for that is equation (2.48) below, which gives a mathematical foundation to the well-known statement "seven shuffles are enough".

The analysis of the riffle shuffle is based on the concept of rising sequences. A rising sequence is a maximal subset of an arrangement of cards, consisting of successive face values displayed in order. Given the permutation A 4 5 2 6 3 7 we recognise two interleaved rising sequences, namely, A 2 3 and 4 5 6 7. As pointed out in [BD92], rising sequences do not intersect, thus any arrangement of cards is the union of its rising sequences. Figure 4 shows a single iteration of the riffle shuffle. The deck, initially arranged in ascending order, is cut in two parts following a binomial rule, that is, the probability of the first packet having $k$ cards is $\binom{n}{k} 2^{-n}$. This first stage selects two rising sequences, namely, $1, 2, \dots, k$ and $k+1, k+2, \dots, n$. Then, the two packets are riffled together in such a way that, if the two heaps have respectively $A$ and $B$ cards, the next card will fall from the first heap with probability $\frac{A}{A+B}$. While the riffling takes place, the cards coming from the two heaps keep their relative order so that the final arrangement is just the interleaving of the mentioned sequences. If successive shuffles are performed then the number of rising sequences in the deck will initially tend to double. Conversely, the length of any sequence will tend to decrease exponentially fast.
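The two-stage description above can be sketched in a few lines of Python. The code counts rising sequences and performs one riffle shuffle according to the binomial cut and the proportional dropping rule; cards are encoded as the integers $1, \dots, n$ (with the ace as 1), and the function names are assumptions made for this illustration.

```python
import random

def rising_sequences(perm):
    """Number of rising sequences of an arrangement of the values 1..n."""
    position = {card: idx for idx, card in enumerate(perm)}
    # A new rising sequence starts at every value v+1 that is placed before v.
    return 1 + sum(1 for v in range(1, len(perm)) if position[v + 1] < position[v])

def riffle_shuffle(deck, rng):
    """One riffle shuffle: binomial cut, then cards drop from each packet
    with probability proportional to the number of cards left in it."""
    n = len(deck)
    cut = sum(rng.random() < 0.5 for _ in range(n))    # Binomial(n, 1/2) cut
    left, right = deck[:cut], deck[cut:]
    out = []
    a = b = 0                                          # cards already dropped
    while a < len(left) or b < len(right):
        rem_left, rem_right = len(left) - a, len(right) - b
        if rng.random() * (rem_left + rem_right) < rem_left:
            out.append(left[a])
            a += 1
        else:
            out.append(right[b])
            b += 1
    return out

if __name__ == "__main__":
    print(rising_sequences([1, 4, 5, 2, 6, 3, 7]))     # the example in the text
    rng = random.Random(0)
    deck = list(range(1, 53))
    for t in range(1, 8):
        deck = riffle_shuffle(deck, rng)
        print(t, rising_sequences(deck))
```

Since each packet keeps its internal order, a single shuffle can at most double the number of rising sequences, so after $t$ shuffles there are at most $2^t$ of them.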

Figure 4: Riffle shuffling and rising sequences. (a) Initial deck arrangement, only one rising sequence is present; (b) The deck is divided in two parts according to a binomial distribution; (c) The packets are riffled together; (d) New deck arrangement, two rising sequences are now present.

In their delightful paper, D. Bayer and P. Diaconis proved that if a deck of $n$ cards (originally in ascending order) is given a sequence of $t$ riffle shuffles then the probability of the deck being arranged according to a permutation $\eta$ with $r$ rising sequences is

\[ Q_t(r) = \binom{2^t + n - r}{n}\, 2^{-nt} , \]

where $r$ is the number of rising sequences in $\eta$. Figure 6 displays $Q_t(r)$ as a function of the number of rising sequences $r$. As $t$ increases, the permutations with a larger number of rising sequences become more likely. The main theorem in [BD92] states that the riffle shuffle model exhibits cutoff. In particular, if $\mu^t_n(\eta)$ is the distribution of the deck arrangement after $t$ steps and $\pi_n(\eta) = \frac{1}{n!}$ then for $t = \log_2\left(n^{3/2} c\right)$,

\[ \left\| \mu^t_n - \pi_n \right\|_{TV} = 1 - 2\,\Phi\!\left( \frac{-1}{4c\sqrt{3}} \right) + O\left(n^{-1/4}\right) , \tag{2.48} \]

where $0 < c < \infty$ and $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$. The proof of (2.48) is carried out through a very detailed analysis of $Q_t(r)$ for $t$ around $\log_2 n^{3/2}$ and the use of the central limit theorem. The idea is that the probability of a deck arrangement having $r$ rising sequences is actually given by $Q_t(r)$ times the number of permutations with $r$ rising sequences, often called the Eulerian number $E_{n,r}$. Using the number of rising sequences to identify a permutation, the symmetric group $S_n$ is mapped onto the segment $\{1, 2, \dots, n\}$. As a consequence, both $\mu^t_n$ and $\pi_n$ get transformed according to the entropic contribution $E_{n,r}$. In [Tan73] it is proved that for large $n$, suitably rescaled Eulerian numbers are distributed according to a normal $N\left(n/2, \sqrt{n/12}\right)$. For $r = \frac{n}{2} + h$, $-\frac{n}{2} + 1 \le h \le \frac{n}{2}$,

\[ \frac{E_{n,r}}{n!} = \frac{\exp\left\{ -\frac{1}{2} \left( \frac{h}{\sqrt{n/12}} \right)^{2} \right\}}{\sqrt{2\pi\, \frac{n}{12}}} \left( 1 + o\left(n^{-1/2}\right) \right) \quad \text{uniformly in } h . \tag{2.49} \]

Figure 5: Riffle shuffling of a deck of 52 cards. The chart displays the TV-distance from stationarity computed from (2.48) and a simulation of the number of rising sequences.

The stationary distribution is hence no longer uniform; instead it can be asymptotically treated as a normal. On the other hand, the evolved measure $\mu^t_n(\eta)$ shows a sort of energy-entropy conflict. $Q_t(r)$ gives in fact the largest weight to the initial permutation $1, 2, \dots, n$ even for those values of $t$ for which stationarity is achieved, see Figures 6h–6j. Nonetheless, the configurations with a number of rising sequences more than $O(\sqrt{n})$ away from $n/2$ are greatly penalised. As a result, after $\log_2 n^{3/2}$ steps the evolved measure is nearly uniform in the bulk of (2.49). The bulk is the family of subsets

\[ A_{n,\theta} = \left\{ r : \left| r - \frac{n}{2} \right| \le \theta \sqrt{\frac{n}{12}} \right\} . \]

Since the stationary measure is concentrated on $A_{n,\theta}$, the difference between the evolved and the stationary measure in the complement $A^{c}_{n,\theta}$ can be neglected.

It is very difficult to exploit an approach to cutoff like the one illustrated in Section 2.8 in the case of the riffle shuffle. Although the stationary measure presents a concentration property in the sense discussed above, it is rather hard to compute the hitting time of the region $A_{n,\theta}$. Among the technical reasons, the complete expression of the transition matrix for the riffle shuffle model is not known in the literature [BD92]. However, Figure 5 gives strong evidence that the picture we have been building since Section 2.8 is quite general. Indeed, the convergence is triggered at the first time the number of rising sequences hits the set $A_{n,\theta}$.
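Returning to the rising-sequence formula for $Q_t(r)$, the TV-distance after $t$ riffle shuffles can be evaluated exactly for a finite deck by grouping permutations according to their number of rising sequences: $\| \mu^t_n - \pi_n \|_{TV} = \frac{1}{2} \sum_{r=1}^{n} E_{n,r} \left| Q_t(r) - \frac{1}{n!} \right|$. The sketch below does this with exact rational arithmetic. The Eulerian numbers are generated through the standard recurrence $E_{n,r} = r\, E_{n-1,r} + (n - r + 1)\, E_{n-1,r-1}$, which is a known fact not stated in the text; function names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb, factorial

def eulerian_row(n):
    """row[r] = number of permutations of n cards with r rising sequences (r = 1..n)."""
    row = [0, 1]                          # n = 1: a single permutation, r = 1
    for size in range(2, n + 1):
        new = [0] * (size + 1)
        for r in range(1, size + 1):
            same = row[r] if r < size else 0
            new[r] = r * same + (size - r + 1) * row[r - 1]
        row = new
    return row

def q_t(n, t, r):
    """Probability, after t riffle shuffles, of one given permutation with r rising sequences."""
    return Fraction(comb(2 ** t + n - r, n), 2 ** (n * t))

def riffle_tv(n, t):
    """Exact TV-distance from uniformity after t riffle shuffles of n cards."""
    euler = eulerian_row(n)
    uniform = Fraction(1, factorial(n))
    return sum(euler[r] * abs(q_t(n, t, r) - uniform) for r in range(1, n + 1)) / 2

if __name__ == "__main__":
    for t in range(1, 11):
        print(t, round(float(riffle_tv(52, t)), 3))
```

For a 52-card deck the distance stays essentially at 1 for the first few shuffles and then drops quickly, with a value close to 0.334 at $t = 7$, as tabulated in [BD92].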

Figure 6 (continued as Figures 7–10): $Q_t(r)$ for a regular deck of 52 cards, $t = 1, \dots, 10$. Panels (a) to (j) show $Q_t(r)$ after 1 to 10 shuffles, respectively.

3 sizing up the cutoff window

In Chapter 2 we have introduced the cutoff phenomenon and presented the fundamental results that characterise it for the class of Birth-and-Death Chains (BDCs). We have also outlined a common thread to all chains exhibiting cutoff behaviour, i.e., the existence of a stopping time that encapsulates all the features of the approach to stationarity. When such a stopping time is a strong stationary time, a concentration property of the form (2.33) or (2.42) easily gives cutoff, and the proof, mutatis mutandis, is the same as for the coupon collector's chain, Section 2.7. In Remark 2.10 we have stressed that the upper bound to the TV-distance is the most critical point to deal with for a generalisation of the method. We begin this chapter by looking for a possible way to overcome that point, starting from the biased random walk. This is a model that has much in common with the coupon collector's chain from the point of view of the description of cutoff that we are building, yet it does not present any evident strong stationary time.

3.1 the drift triggers cutoff: biased random walk on a segment

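The biased random walk on a segment is a BDC on the state space $\Omega_n = \{0, 1, \dots, n\}$. For $i \in \{1, 2, \dots, n-1\}$, the transition probabilities are

\[
\begin{aligned}
P_n(i, i+1) &= p_i = \tfrac{1}{3} - \beta , \\
P_n(i, i) &= r_i = \tfrac{1}{3} , \\
P_n(i, i-1) &= q_i = \tfrac{1}{3} + \beta ,
\end{aligned}
\tag{3.1}
\]

whereas at the extreme points 0 and $n$,

\[
\begin{aligned}
P_n(0, 1) &= p_0 = \tfrac{1}{3} - \beta , & P_n(0, 0) &= r_0 = \tfrac{2}{3} + \beta , \\
P_n(n, n-1) &= q_n = \tfrac{1}{3} + \beta , & P_n(n, n) &= r_n = \tfrac{2}{3} - \beta .
\end{aligned}
\tag{3.2}
\]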

The parameter $\beta \in (0, 1/3)$ is called bias, or drift. Since $\beta > 0$, the chain defined by (3.1) and (3.2) is more likely to take transitions to the left rather than to the right. The imbalance in the transition probabilities does not present any spatial dependency, so we say that the chain has a constant drift to the left. According to (2.32),

\[ \pi_n(i) = \pi_n(0) \left( \frac{1/3 - \beta}{1/3 + \beta} \right)^{i} , \]

where, by normalisation and up to a correction that vanishes exponentially fast in $n$,

\[ \pi_n(0) = \frac{2\beta}{1/3 + \beta} . \]
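The approach to stationarity of the chain (3.1)–(3.2) can be explored numerically by evolving the law of the chain step by step. The sketch below does so in floating-point arithmetic and returns the TV-distance at each time together with the stationary distribution. The starting state is taken to be $n$, the point furthest from where $\pi_n$ concentrates; this choice, the function name and the parameter values are assumptions made for this illustration.

```python
def biased_walk_tv(n, beta, t_max, start=None):
    """TV-distance from stationarity at times 0..t_max for the chain (3.1)-(3.2).

    Returns (tv, pi), where tv[t] is the distance at time t and pi is the
    stationary distribution on {0, ..., n}. The chain starts at `start`
    (default: state n, an assumption of this sketch).
    """
    p, r, q = 1 / 3 - beta, 1 / 3, 1 / 3 + beta
    weights = [(p / q) ** i for i in range(n + 1)]     # product formula (2.32)
    z = sum(weights)
    pi = [w / z for w in weights]
    dist = [0.0] * (n + 1)
    dist[n if start is None else start] = 1.0
    tv = []
    for _ in range(t_max + 1):
        tv.append(0.5 * sum(abs(a - b) for a, b in zip(dist, pi)))
        new = [0.0] * (n + 1)
        for i, mass in enumerate(dist):
            new[i] += mass * r
            if i < n:
                new[i + 1] += mass * p
            else:
                new[i] += mass * p        # r_n = 2/3 - beta: blocked move to the right
            if i > 0:
                new[i - 1] += mass * q
            else:
                new[i] += mass * q        # r_0 = 2/3 + beta: blocked move to the left
        dist = new
    return tv, pi

if __name__ == "__main__":
    tv, pi = biased_walk_tv(50, 0.1, 1000)
    for t in (0, 100, 200, 250, 300, 400, 600, 1000):
        print(t, round(tv[t], 4))
```

With a drift of $2\beta$ per step, the walk needs about $n / (2\beta)$ steps to cross the segment, and the distance drops from near 1 to near 0 around that time.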

Figure 11: Biased random walk on a segment of size 10,000. To have a comparable time scale, the case $\beta = 0.000$ is drawn for a segment of size 1,000.

The presence of a drift positively affects the stochastic stability of any Markov Chain (MC) [MT93]. This is especially true in BDCs, due to formulas (2.32) and (2.43)–(2.47). Figure 11 shows how the evolved measure $\mu^t_n$ approaches $\pi_n$ for increasing values of the bias, including the case $\beta = 0$. When $\beta = 0$ we have a uniform random walk because the probability to go left equals the probability to go right, and the chain has a diffusive behaviour. In diffusive processes the distance from stationarity decreases exponentially*, but not abruptly. In other words, diffusion means no cutoff. Conversely, for every drift larger than 0, the diffusive behaviour is destroyed and a cutoff-like curve arises. In Figure 11, though, the smallest value of $\beta$ seems to be problematic, for the convergence does not look abrupt at all. However, we must not forget that cutoff is an asymptotic feature. Increasing the size of the state space, Figure 12 shows in fact the correct curve for the formerly problematic value of the bias. Both