
  • Boaz Barak

    Introduction to Theoretical Computer Science

    Textbook in preparation. Available on https://introtcs.org

  • Text available on https://github.com/boazbk/tcs - please post any issues there - thank you!

    This version was compiled on Tuesday 8th September, 2020 05:00

    Copyright © 2020 Boaz Barak

    This work is licensed under a Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” license.


  • To Ravit, Alma and Goren.

  • Contents

    Preface

    Preliminaries

    0 Introduction
    1 Mathematical Background
    2 Computation and Representation

    I Finite computation
    3 Defining computation
    4 Syntactic sugar, and computing every function
    5 Code as data, data as code

    II Uniform computation
    6 Functions with Infinite domains, Automata, and Regular expressions
    7 Loops and infinity
    8 Equivalent models of computation
    9 Universality and uncomputability
    10 Restricted computational models
    11 Is every theorem provable?

    III Efficient algorithms
    12 Efficient computation: An informal introduction
    13 Modeling running time
    14 Polynomial-time reductions
    15 NP, NP completeness, and the Cook-Levin Theorem
    16 What if P equals NP?
    17 Space bounded computation

    IV Randomized computation
    18 Probability Theory 101
    19 Probabilistic computation
    20 Modeling randomized computation

    V Advanced topics
    21 Cryptography
    22 Proofs and algorithms
    23 Quantum computing

    VI Appendices

  • Contents (detailed)

    Preface
    0.1 To the student
    0.1.1 Is the effort worth it?
    0.2 To potential instructors
    0.3 Acknowledgements

    Preliminaries

    0 Introduction
    0.1 Integer multiplication: an example of an algorithm
    0.2 Extended Example: A faster way to multiply (optional)
    0.3 Algorithms beyond arithmetic
    0.4 On the importance of negative results
    0.5 Roadmap to the rest of this book
    0.5.1 Dependencies between chapters
    0.6 Exercises
    0.7 Bibliographical notes

    1 Mathematical Background
    1.1 This chapter: a reader’s manual
    1.2 A quick overview of mathematical prerequisites
    1.3 Reading mathematical texts
    1.3.1 Definitions
    1.3.2 Assertions: Theorems, lemmas, claims
    1.3.3 Proofs
    1.4 Basic discrete math objects
    1.4.1 Sets
    1.4.2 Special sets
    1.4.3 Functions
    1.4.4 Graphs
    1.4.5 Logic operators and quantifiers
    1.4.6 Quantifiers for summations and products
    1.4.7 Parsing formulas: bound and free variables
    1.4.8 Asymptotics and Big-𝑂 notation
    1.4.9 Some “rules of thumb” for Big-𝑂 notation
    1.5 Proofs
    1.5.1 Proofs and programs
    1.5.2 Proof writing style
    1.5.3 Patterns in proofs
    1.6 Extended example: Topological Sorting
    1.6.1 Mathematical induction
    1.6.2 Proving the result by induction
    1.6.3 Minimality and uniqueness
    1.7 This book: notation and conventions
    1.7.1 Variable name conventions
    1.7.2 Some idioms
    1.8 Exercises
    1.9 Bibliographical notes

    2 Computation and Representation
    2.1 Defining representations
    2.1.1 Representing natural numbers
    2.1.2 Meaning of representations (discussion)
    2.2 Representations beyond natural numbers
    2.2.1 Representing (potentially negative) integers
    2.2.2 Two’s complement representation (optional)
    2.2.3 Rational numbers and representing pairs of strings
    2.3 Representing real numbers
    2.4 Cantor’s Theorem, countable sets, and string representations of the real numbers
    2.4.1 Corollary: Boolean functions are uncountable
    2.4.2 Equivalent conditions for countability
    2.5 Representing objects beyond numbers
    2.5.1 Finite representations
    2.5.2 Prefix-free encoding
    2.5.3 Making representations prefix-free
    2.5.4 “Proof by Python” (optional)
    2.5.5 Representing letters and text
    2.5.6 Representing vectors, matrices, images
    2.5.7 Representing graphs
    2.5.8 Representing lists and nested lists
    2.5.9 Notation
    2.6 Defining computational tasks as mathematical functions
    2.6.1 Distinguish functions from programs!
    2.7 Exercises
    2.8 Bibliographical notes

    I Finite computation

    3 Defining computation
    3.1 Defining computation
    3.2 Computing using AND, OR, and NOT
    3.2.1 Some properties of AND and OR
    3.2.2 Extended example: Computing XOR from AND, OR, and NOT
    3.2.3 Informally defining “basic operations” and “algorithms”
    3.3 Boolean Circuits
    3.3.1 Boolean circuits: a formal definition
    3.3.2 Equivalence of circuits and straight-line programs
    3.4 Physical implementations of computing devices (digression)
    3.4.1 Transistors
    3.4.2 Logical gates from transistors
    3.4.3 Biological computing
    3.4.4 Cellular automata and the game of life
    3.4.5 Neural networks
    3.4.6 A computer made from marbles and pipes
    3.5 The NAND function
    3.5.1 NAND Circuits
    3.5.2 More examples of NAND circuits (optional)
    3.5.3 The NAND-CIRC Programming language
    3.6 Equivalence of all these models
    3.6.1 Circuits with other gate sets
    3.6.2 Specification vs. implementation (again)
    3.7 Exercises
    3.8 Bibliographical notes

    4 Syntactic sugar, and computing every function
    4.1 Some examples of syntactic sugar
    4.1.1 User-defined procedures
    4.1.2 Proof by Python (optional)
    4.1.3 Conditional statements
    4.2 Extended example: Addition and Multiplication (optional)
    4.3 The LOOKUP function
    4.3.1 Constructing a NAND-CIRC program for LOOKUP
    4.4 Computing every function
    4.4.1 Proof of NAND’s Universality
    4.4.2 Improving by a factor of 𝑛 (optional)
    4.5 Computing every function: An alternative proof
    4.6 The class SIZE(𝑇)
    4.7 Exercises
    4.8 Bibliographical notes

    5 Code as data, data as code
    5.1 Representing programs as strings
    5.2 Counting programs, and lower bounds on the size of NAND-CIRC programs
    5.2.1 Size hierarchy theorem (optional)
    5.3 The tuples representation
    5.3.1 From tuples to strings
    5.4 A NAND-CIRC interpreter in NAND-CIRC
    5.4.1 Efficient universal programs
    5.4.2 A NAND-CIRC interpreter in “pseudocode”
    5.4.3 A NAND interpreter in Python
    5.4.4 Constructing the NAND-CIRC interpreter in NAND-CIRC
    5.5 A Python interpreter in NAND-CIRC (discussion)
    5.6 The physical extended Church-Turing thesis (discussion)
    5.6.1 Attempts at refuting the PECTT
    5.7 Recap of Part I: Finite Computation
    5.8 Exercises
    5.9 Bibliographical notes

    II Uniform computation

    6 Functions with Infinite domains, Automata, and Regular expressions
    6.1 Functions with inputs of unbounded length
    6.1.1 Varying inputs and outputs
    6.1.2 Formal Languages
    6.1.3 Restrictions of functions
    6.2 Deterministic finite automata (optional)
    6.2.1 Anatomy of an automaton (finite vs. unbounded)
    6.2.2 DFA-computable functions
    6.3 Regular expressions
    6.3.1 Algorithms for matching regular expressions
    6.4 Efficient matching of regular expressions (optional)
    6.4.1 Matching regular expressions using DFAs
    6.4.2 Equivalence of regular expressions and automata
    6.4.3 Closure properties of regular expressions
    6.5 Limitations of regular expressions and the pumping lemma
    6.6 Answering semantic questions about regular expressions
    6.7 Exercises
    6.8 Bibliographical notes

    7 Loops and infinity
    7.1 Turing Machines
    7.1.1 Extended example: A Turing machine for palindromes
    7.1.2 Turing machines: a formal definition
    7.1.3 Computable functions
    7.1.4 Infinite loops and partial functions
    7.2 Turing machines as programming languages
    7.2.1 The NAND-TM Programming language
    7.2.2 Sneak peek: NAND-TM vs Turing machines
    7.2.3 Examples
    7.3 Equivalence of Turing machines and NAND-TM programs
    7.3.1 Specification vs implementation (again)
    7.4 NAND-TM syntactic sugar
    7.4.1 “GOTO” and inner loops
    7.5 Uniformity, and NAND vs NAND-TM (discussion)
    7.6 Exercises
    7.7 Bibliographical notes

    8 Equivalent models of computation
    8.1 RAM machines and NAND-RAM
    8.2 The gory details (optional)
    8.2.1 Indexed access in NAND-TM
    8.2.2 Two dimensional arrays in NAND-TM
    8.2.3 All the rest
    8.3 Turing equivalence (discussion)
    8.3.1 The “Best of both worlds” paradigm
    8.3.2 Let’s talk about abstractions
    8.3.3 Turing completeness and equivalence, a formal definition (optional)
    8.4 Cellular automata
    8.4.1 One dimensional cellular automata are Turing complete
    8.4.2 Configurations of Turing machines and the next-step function
    8.5 Lambda calculus and functional programming languages
    8.5.1 Applying functions to functions
    8.5.2 Obtaining multi-argument functions via Currying
    8.5.3 Formal description of the λ calculus
    8.5.4 Infinite loops in the λ calculus
    8.6 The “Enhanced” λ calculus
    8.6.1 Computing a function in the enhanced λ calculus
    8.6.2 Enhanced λ calculus is Turing-complete
    8.7 From enhanced to pure λ calculus
    8.7.1 List processing
    8.7.2 The Y combinator, or recursion without recursion
    8.8 The Church-Turing Thesis (discussion)
    8.8.1 Different models of computation
    8.9 Exercises
    8.10 Bibliographical notes

    9 Universality and uncomputability
    9.1 Universality or a meta-circular evaluator
    9.1.1 Proving the existence of a universal Turing Machine
    9.1.2 Implications of universality (discussion)
    9.2 Is every function computable?
    9.3 The Halting problem
    9.3.1 Is the Halting problem really hard? (discussion)
    9.3.2 A direct proof of the uncomputability of HALT (optional)
    9.4 Reductions
    9.4.1 Example: Halting on the zero problem
    9.5 Rice’s Theorem and the impossibility of general software verification
    9.5.1 Rice’s Theorem
    9.5.2 Halting and Rice’s Theorem for other Turing-complete models
    9.5.3 Is software verification doomed? (discussion)
    9.6 Exercises
    9.7 Bibliographical notes

    10 Restricted computational models
    10.1 Turing completeness as a bug
    10.2 Context free grammars
    10.2.1 Context-free grammars as a computational model
    10.2.2 The power of context free grammars
    10.2.3 Limitations of context-free grammars (optional)
    10.3 Semantic properties of context free languages
    10.3.1 Uncomputability of context-free grammar equivalence (optional)
    10.4 Summary of semantic properties for regular expressions and context-free grammars
    10.5 Exercises
    10.6 Bibliographical notes

    11 Is every theorem provable?
    11.1 Hilbert’s Program and Gödel’s Incompleteness Theorem
    11.1.1 Defining “Proof Systems”
    11.2 Gödel’s Incompleteness Theorem: Computational variant
    11.3 Quantified integer statements
    11.4 Diophantine equations and the MRDP Theorem
    11.5 Hardness of quantified integer statements
    11.5.1 Step 1: Quantified mixed statements and computation histories
    11.5.2 Step 2: Reducing mixed statements to integer statements
    11.6 Exercises
    11.7 Bibliographical notes

    III Efficient algorithms

    12 Efficient computation: An informal introduction
    12.1 Problems on graphs
    12.1.1 Finding the shortest path in a graph
    12.1.2 Finding the longest path in a graph
    12.1.3 Finding the minimum cut in a graph
    12.1.4 Min-Cut Max-Flow and Linear programming
    12.1.5 Finding the maximum cut in a graph
    12.1.6 A note on convexity
    12.2 Beyond graphs
    12.2.1 SAT
    12.2.2 Solving linear equations
    12.2.3 Solving quadratic equations
    12.3 More advanced examples
    12.3.1 Determinant of a matrix
    12.3.2 Permanent of a matrix
    12.3.3 Finding a zero-sum equilibrium
    12.3.4 Finding a Nash equilibrium
    12.3.5 Primality testing
    12.3.6 Integer factoring
    12.4 Our current knowledge
    12.5 Exercises
    12.6 Bibliographical notes
    12.7 Further explorations

    13 Modeling running time
    13.1 Formally defining running time
    13.1.1 Polynomial and Exponential Time
    13.2 Modeling running time using RAM Machines / NAND-RAM
    13.3 Extended Church-Turing Thesis (discussion)
    13.4 Efficient universal machine: a NAND-RAM interpreter in NAND-RAM
    13.4.1 Timed Universal Turing Machine
    13.5 The time hierarchy theorem
    13.6 Non-uniform computation
    13.6.1 Oblivious NAND-TM programs
    13.6.2 “Unrolling the loop”: algorithmic transformation of Turing Machines to circuits
    13.6.3 Can uniform algorithms simulate non-uniform ones?
    13.6.4 Uniform vs. Non-uniform computation: A recap
    13.7 Exercises
    13.8 Bibliographical notes

    14 Polynomial-time reductions
    14.1 Formal definitions of problems
    14.2 Polynomial-time reductions
    14.2.1 Whistling pigs and flying horses
    14.3 Reducing 3SAT to zero one and quadratic equations
    14.3.1 Quadratic equations
    14.4 The independent set problem
    14.5 Some exercises and anatomy of a reduction
    14.5.1 Dominating set
    14.5.2 Anatomy of a reduction
    14.6 Reducing Independent Set to Maximum Cut
    14.7 Reducing 3SAT to Longest Path
    14.7.1 Summary of relations
    14.8 Exercises
    14.9 Bibliographical notes

    15 NP, NP completeness, and the Cook-Levin Theorem
    15.1 The class NP
    15.1.1 Examples of functions in NP
    15.1.2 Basic facts about NP
    15.2 From NP to 3SAT: The Cook-Levin Theorem
    15.2.1 What does this mean?
    15.2.2 The Cook-Levin Theorem: Proof outline
    15.3 The NANDSAT Problem, and why it is NP hard
    15.4 The 3NAND problem
    15.5 From 3NAND to 3SAT
    15.6 Wrapping up
    15.7 Exercises
    15.8 Bibliographical notes

    16 What if P equals NP?
    16.1 Search-to-decision reduction
    16.2 Optimization
    16.2.1 Example: Supervised learning
    16.2.2 Example: Breaking cryptosystems
    16.3 Finding mathematical proofs
    16.4 Quantifier elimination (advanced)
    16.4.1 Application: self improving algorithm for 3SAT
    16.5 Approximating counting problems and posterior sampling (advanced, optional)
    16.6 What does all of this imply?
    16.7 Can P ≠ NP be neither true nor false?
    16.8 Is P = NP “in practice”?
    16.9 What if P ≠ NP?
    16.10 Exercises
    16.11 Bibliographical notes

    17 Space bounded computation
    17.1 Exercises
    17.2 Bibliographical notes

    IV Randomized computation

    18 Probability Theory 101
    18.1 Random coins
    18.1.1 Random variables
    18.1.2 Distributions over strings
    18.1.3 More general sample spaces
    18.2 Correlations and independence
    18.2.1 Independent random variables
    18.2.2 Collections of independent random variables
    18.3 Concentration and tail bounds
    18.3.1 Chebyshev’s Inequality
    18.3.2 The Chernoff bound
    18.3.3 Application: Supervised learning and empirical risk minimization
    18.4 Exercises
    18.5 Bibliographical notes

    19 Probabilistic computation
    19.1 Finding approximately good maximum cuts
    19.1.1 Amplifying the success of randomized algorithms
    19.1.2 Success amplification
    19.1.3 Two-sided amplification
    19.1.4 What does this mean?
    19.1.5 Solving SAT through randomization
    19.1.6 Bipartite matching
    19.2 Exercises
    19.3 Bibliographical notes
    19.4 Acknowledgements

    20 Modeling randomized computation
    20.1 Modeling randomized computation
    20.1.1 An alternative view: random coins as an “extra input”
    20.1.2 Success amplification of two-sided error algorithms
    20.2 BPP and NP completeness
    20.3 The power of randomization
    20.3.1 Solving BPP in exponential time
    20.3.2 Simulating randomized algorithms by circuits
    20.4 Derandomization
    20.4.1 Pseudorandom generators
    20.4.2 From existence to constructivity
    20.4.3 Usefulness of pseudorandom generators
    20.5 P = NP and BPP vs P
    20.6 Non-constructive existence of pseudorandom generators (advanced, optional)
    20.7 Exercises
    20.8 Bibliographical notes

    V Advanced topics

    21 Cryptography
    21.1 Classical cryptosystems
    21.2 Defining encryption
    21.3 Defining security of encryption
    21.4 Perfect secrecy
    21.4.1 Example: Perfect secrecy in the battlefield
    21.4.2 Constructing perfectly secret encryption
    21.5 Necessity of long keys
    21.6 Computational secrecy
    21.6.1 Stream ciphers or the “derandomized one-time pad”
    21.7 Computational secrecy and NP
    21.8 Public key cryptography
    21.8.1 Defining public key encryption
    21.8.2 Diffie-Hellman key exchange
    21.9 Other security notions
    21.10 Magic
    21.10.1 Zero knowledge proofs
    21.10.2 Fully homomorphic encryption
    21.10.3 Multiparty secure computation
    21.11 Exercises
    21.12 Bibliographical notes

    22 Proofs and algorithms
    22.1 Exercises
    22.2 Bibliographical notes

    23 Quantum computing
    23.1 A brief introduction to quantum mechanics
    23.1.1 The double slit experiment
    23.1.2 Quantum amplitudes
    23.1.3 Linear algebra quick review
    23.2 Bell’s Inequality
    23.3 Quantum weirdness
    23.4 Quantum computing and computation - an executive summary
    23.5 Quantum systems
    23.5.1 Quantum amplitudes
    23.5.2 Quantum systems: an executive summary
    23.6 Analysis of Bell’s Inequality (optional)
    23.7 Quantum computation
    23.7.1 Quantum circuits
    23.7.2 QNAND-CIRC programs (optional)
    23.7.3 Uniform computation
    23.8 Physically realizing quantum computation
    23.9 Shor’s Algorithm: Hearing the shape of prime factors
    23.9.1 Period finding
    23.9.2 Shor’s Algorithm: A bird’s eye view
    23.10 Quantum Fourier Transform (advanced, optional)
    23.10.1 Quantum Fourier Transform over the Boolean Cube: Simon’s Algorithm
    23.10.2 From Fourier to Period finding: Simon’s Algorithm (advanced, optional)
    23.10.3 From Simon to Shor (advanced, optional)
    23.11 Exercises
    23.12 Bibliographical notes

    VI Appendices

  • Preface

    “We make ourselves no promises, but we cherish the hope that the unobstructed pursuit of useless knowledge will prove to have consequences in the future as in the past” … “An institution which sets free successive generations of human souls is amply justified whether or not this graduate or that makes a so-called useful contribution to human knowledge. A poem, a symphony, a painting, a mathematical truth, a new scientific fact, all bear in themselves all the justification that universities, colleges, and institutes of research need or require”, Abraham Flexner, The Usefulness of Useless Knowledge, 1939.

    “I suggest that you take the hardest courses that you can, because you learn the most when you challenge yourself… CS 121 I found pretty hard.”, Mark Zuckerberg, 2005.

    This is a textbook for an undergraduate introductory course on Theoretical Computer Science. The educational goals of this book are to convey the following:

    • That computation arises in a variety of natural and human-made systems, and not only in modern silicon-based computers.

    • Similarly, beyond being an extremely important tool, computation also serves as a useful lens to describe natural, physical, mathematical and even social concepts.

    • The notion of universality of many different computational models, and the related notion of the duality between code and data.

    • The idea that one can precisely define a mathematical model of computation, and then use that to prove (or sometimes only conjecture) lower bounds and impossibility results.

    • Some of the surprising results and discoveries in modern theoretical computer science, including the prevalence of NP-completeness, the power of interaction, the power of randomness on one hand and the possibility of derandomization on the other, the ability to use hardness “for good” in cryptography, and the fascinating possibility of quantum computing.




    I hope that following this course, students would be able to recognize computation, with both its power and pitfalls, as it arises in various settings, including seemingly “static” content or “restricted” formalisms such as macros and scripts. They should be able to follow through the logic of proofs about computation, including the central concept of a reduction, as well as understanding “self-referential” proofs (such as diagonalization-based proofs that involve programs given their own code as input). Students should understand that some problems are inherently intractable, and be able to recognize the potential for intractability when they are faced with a new problem. While this book only touches on cryptography, students should understand the basic idea of how we can use computational hardness for cryptographic purposes. However, more than any specific skill, this book aims to introduce students to a new way of thinking of computation as an object in its own right and to illustrate how this new way of thinking leads to far-reaching insights and applications.

    My aim in writing this text is to try to convey these concepts in the simplest possible way and try to make sure that the formal notation and model help elucidate, rather than obscure, the main ideas. I also tried to take advantage of modern students’ familiarity (or at least interest!) in programming, and hence use (highly simplified) programming languages to describe our models of computation. That said, this book does not assume fluency with any particular programming language, but rather only some familiarity with the general notion of programming. We will use programming metaphors and idioms, occasionally mentioning specific programming languages such as Python, C, or Lisp, but students should be able to follow these descriptions even if they are not familiar with these languages.

    Proofs in this book, including the existence of a universal Turing Machine, the fact that every finite function can be computed by some circuit, the Cook-Levin theorem, and many others, are often constructive and algorithmic, in the sense that they ultimately involve transforming one program to another. While it is possible to follow these proofs without seeing the code, I do think that having access to the code, and the ability to play around with it and see how it acts on various programs, can make these theorems more concrete for the students. To that end, an accompanying website (which is still work in progress) allows executing programs in the various computational models we define, as well as see constructive proofs of some of the theorems.

    0.1 TO THE STUDENT

    This book can be challenging, mainly because it brings together a variety of ideas and techniques in the study of computation. There are quite a few technical hurdles to master, whether it is following the diagonalization argument for proving the Halting Problem is undecidable, combinatorial gadgets in NP-completeness reductions, analyzing probabilistic algorithms, or arguing about the adversary to prove the security of cryptographic primitives.

    The best way to engage with this material is to read these notes actively, so make sure you have a pen ready. While reading, I encourage you to stop and think about the following:

    • When I state a theorem, stop and take a shot at proving it on your own before reading the proof. You will be amazed by how much better you can understand a proof even after only 5 minutes of attempting it on your own.

    • When reading a definition, make sure that you understand what the definition means, and what the natural examples are of objects that satisfy it and objects that do not. Try to think of the motivation behind the definition, and whether there are other natural ways to formalize the same concept.

    • Actively notice which questions arise in your mind as you read the text, and whether or not they are answered in the text.

    As a general rule, it is more important that you understand the definitions than the theorems, and it is more important that you understand a theorem statement than its proof. After all, before you can prove a theorem, you need to understand what it states, and to understand what a theorem is about, you need to know the definitions of the objects involved. Whenever a proof of a theorem is at least somewhat complicated, I provide a “proof idea.” Feel free to skip the actual proof in a first reading, focusing only on the proof idea.

    This book contains some code snippets, but this is by no means a programming text. You don’t need to know how to program to follow this material. The reason we use code is that it is a precise way to describe computation. Particular implementation details are not as important to us, and so we will emphasize code readability at the expense of considerations such as error handling, encapsulation, etc. that can be extremely important for real-world programming.

    0.1.1 Is the effort worth it?

    This is not an easy book, and you might reasonably wonder why you should spend the effort in learning this material. A traditional justification for a “Theory of Computation” course is that you might encounter these concepts later on in your career. Perhaps you will come across a hard problem and realize it is NP complete, or find a need to use what you learned about regular expressions. This might very well be true, but the main benefit of this book is not in teaching you any practical tool or technique, but instead in giving you a different way of thinking: an ability to recognize computational phenomena even when they occur in non-obvious settings, a way to model computational tasks and questions, and to reason about them.

    Regardless of any use you will derive from this book, I believe learning this material is important because it contains concepts that are both beautiful and fundamental. The role that energy and matter played in the 20th century is played in the 21st by computation and information, not just as tools for our technology and economy, but also as the basic building blocks we use to understand the world. This book will give you a taste of some of the theory behind those, and hopefully spark your curiosity to study more.

    0.2 TO POTENTIAL INSTRUCTORS

    I wrote this book for my Harvard course, but I hope that other lecturers will find it useful as well. To some extent, it is similar in content to “Theory of Computation” or “Great Ideas” courses such as those taught at CMU or MIT.

    The most significant difference between our approach and more traditional ones (such as Hopcroft and Ullman’s [HU69; HU79] and Sipser’s [Sip97]) is that we do not start with finite automata as our initial computational model. Instead, our initial computational model is Boolean Circuits. (An earlier book that starts with circuits as the initial model is John Savage’s [Sav98].) We believe that Boolean Circuits are more fundamental to the theory of computing (and even its practice!) than automata. In particular, Boolean Circuits are a prerequisite for many concepts that one would want to teach in a modern course on Theoretical Computer Science, including cryptography, quantum computing, derandomization, attempts at proving P ≠ NP, and more. Even in cases where Boolean Circuits are not strictly required, they can often offer significant simplifications (as in the case of the proof of the Cook-Levin Theorem).

    Furthermore, I believe there are pedagogical reasons to start with Boolean circuits as opposed to finite automata. Boolean circuits are a more natural model of computation, and one that corresponds more closely to computing in silicon, making the connection to practice more immediate to the students. Finite functions are arguably easier to grasp than infinite ones, as we can fully write down their truth table. The theorem that every finite function can be computed by some Boolean circuit is both simple enough and important enough to serve as an excellent starting point for this course. Moreover, many of the main conceptual points of the theory of computation, including the notions of the duality between code and data, and the idea of universality, can already be seen in this context.



    After Boolean circuits, we move on to Turing machines and prove results such as the existence of a universal Turing machine, the uncomputability of the halting problem, and Rice’s Theorem. Automata are discussed after we see Turing machines and undecidability, as an example for a restricted computational model where problems such as determining halting can be effectively solved.

    While this is not our motivation, the order we present circuits, Turing machines, and automata roughly corresponds to the chronological order of their discovery. Boolean algebra goes back to Boole’s and De Morgan’s works in the 1840s [Boo47; De 47] (though the definition of Boolean circuits and the connection to physical computation was given 90 years later by Shannon [Sha38]). Alan Turing defined what we now call “Turing Machines” in the 1930s [Tur37], while finite automata were introduced in the 1943 work of McCulloch and Pitts [MP43] but only really understood in the seminal 1959 work of Rabin and Scott [RS59].

    More importantly, while models such as finite-state machines, regular expressions, and context-free grammars are incredibly important for practice, the main applications for these models (whether it is for parsing, for analyzing properties such as liveness and safety, or even for software-defined routing tables) rely crucially on the fact that these are tractable models for which we can effectively answer semantic questions. This practical motivation can be better appreciated after students see the undecidability of semantic properties of general computing models.

    The fact that we start with circuits makes proving the Cook-Levin Theorem much easier. In fact, our proof of this theorem can be (and is) done using a handful of lines of Python. Combining this proof with the standard reductions (which are also implemented in Python) allows students to appreciate visually how a question about computation can be mapped into a question about (for example) the existence of an independent set in a graph.

    Some other differences between this book and previous texts are the following:

    1. For measuring time complexity, we use the standard RAM machine model used (implicitly) in algorithms courses, rather than Turing machines. While these two models are of course polynomially equivalent, and hence make no difference for the definitions of the classes P, NP, and EXP, our choice makes the distinction between notions such as 𝑂(𝑛) or 𝑂(𝑛^2) time more meaningful. This choice also ensures that these finer-grained time complexity classes correspond to the informal definitions of linear and quadratic time that students encounter in their algorithms lectures (or their whiteboard coding interviews…).

    2. We use the terminology of functions rather than languages. That is, rather than saying that a Turing Machine 𝑀 decides a language 𝐿 ⊆ {0, 1}∗, we say that it computes a function 𝐹 ∶ {0, 1}∗ → {0, 1}. The terminology of “languages” arises from Chomsky’s work [Cho56], but it is often more confusing than illuminating. The language terminology also makes it cumbersome to discuss concepts such as algorithms that compute functions with more than one bit of output (including basic tasks such as addition, multiplication, etc…). The fact that we use functions rather than languages means we have to be extra vigilant about students distinguishing between the specification of a computational task (e.g., the function) and its implementation (e.g., the program). On the other hand, this point is so important that it is worth repeatedly emphasizing and drilling into the students, regardless of the notation used. The book does mention the language terminology and reminds of it occasionally, to make it easier for students to consult outside resources.

    Reducing the time dedicated to finite automata and context-free languages allows instructors to spend more time on topics that a modern course in the theory of computing needs to touch upon. These include randomness and computation, the interactions between proofs and programs (including Gödel’s incompleteness theorem, interactive proof systems, and even a bit on the 𝜆-calculus and the Curry-Howard correspondence), cryptography, and quantum computing.

    This book contains sufficient detail to enable its use for self-study. Toward that end, every chapter starts with a list of learning objectives, ends with a recap, and is peppered with “pause boxes” which encourage students to stop and work out an argument or make sure they understand a definition before continuing further.

    Section 0.5 contains a “roadmap” for this book, with descriptions of the different chapters, as well as the dependency structure between them. This can help in planning a course based on this book.

    0.3 ACKNOWLEDGEMENTS

    This text is continually evolving, and I am getting input from many people, for which I am deeply grateful. Salil Vadhan co-taught with me the first iteration of this course and gave me a tremendous amount of useful feedback and insights during this process. Michele Amoretti and Marika Swanberg carefully read several chapters of this text and gave extremely helpful detailed comments. Dave Evans and Richard Xu contributed many pull requests fixing errors and improving phrasing. Thanks to Anil Ada, Venkat Guruswami, and Ryan O’Donnell for helpful tips from their experience in teaching CMU 15-251.

    Thanks to everyone that sent me comments, typo reports, or posted issues or pull requests on the GitHub repository https://github.com/boazbk/tcs. In particular I would like to acknowledge helpful feedback from Scott Aaronson, Michele Amoretti, Aadi Bajpai, Marguerite Basta, Anindya Basu, Sam Benkelman, Jarosław Błasiok, Emily Chan, Christy Cheng, Michelle Chiang, Daniel Chiu, Chi-Ning Chou, Michael Colavita, Rodrigo Daboin Sanchez, Robert Darley Waddilove, Anlan Du, Juan Esteller, David Evans, Michael Fine, Simon Fischer, Leor Fishman, Zaymon Foulds-Cook, William Fu, Kent Furuie, Piotr Galuszka, Carolyn Ge, Mark Goldstein, Alexander Golovnev, Sayan Goswami, Michael Haak, Rebecca Hao, Joosep Hook, Thomas HUET, Emily Jia, Chan Kang, Nina Katz-Christy, Vidak Kazic, Eddie Kohler, Estefania Lahera, Allison Lee, Benjamin Lee, Ondřej Lengál, Raymond Lin, Emma Ling, Alex Lombardi, Lisa Lu, Aditya Mahadevan, Christian May, Jacob Meyerson, Leon Mlodzian, George Moe, Glenn Moss, Hamish Nicholson, Owen Niles, Sandip Nirmel, Sebastian Oberhoff, Thomas Orton, Joshua Pan, Pablo Parrilo, Juan Perdomo, Banks Pickett, Aaron Sachs, Abdelrhman Saleh, Brian Sapozhnikov, Anthony Scemama, Peter Schäfer, Josh Seides, Alaisha Sharma, Haneul Shin, Noah Singer, Matthew Smedberg, Miguel Solano, Hikari Sorensen, David Steurer, Alec Sun, Amol Surati, Everett Sussman, Marika Swanberg, Garrett Tanzer, Eric Thomas, Sarah Turnill, Salil Vadhan, Patrick Watts, Jonah Weissman, Ryan Williams, Licheng Xu, Richard Xu, Wanqian Yang, Elizabeth Yeoh-Wang, Josh Zelinsky, Fred Zhang, Grace Zhang, and Jessica Zhu.

    I am using many open source software packages in the production of these notes for which I am grateful. In particular, I am thankful to Donald Knuth and Leslie Lamport for LaTeX and to John MacFarlane for Pandoc. David Steurer wrote the original scripts to produce this text. The current version uses Sergio Correia’s panflute. The templates for the LaTeX and HTML versions are derived from Tufte LaTeX, Gitbook and Bookdown. Thanks to Amy Hendrickson for some LaTeX consulting. Juan Esteller and Gabe Montague initially implemented the NAND* programming languages in OCaml and Javascript. I used the Jupyter project to write the supplemental code snippets.

    Finally, I would like to thank my family: my wife Ravit, and my children Alma and Goren. Working on this book (and the corresponding course) took so much of my time that Alma wrote an essay for her fifth-grade class saying that “universities should not pressure professors to work too much.” I’m afraid all I have to show for this effort is 600 pages of ultra-boring mathematical text.


  • PRELIMINARIES

  • 0 Introduction

    “Computer Science is no more about computers than astronomy is about telescopes”, attributed to Edsger Dijkstra.1

    “Hackers need to understand the theory of computation about as much as painters need to understand paint chemistry.”, Paul Graham 2003.2

    1 This quote is typically read as disparaging the importance of actual physical computers in Computer Science, but note that telescopes are absolutely essential to astronomy as they provide us with the means to connect theoretical predictions with actual experimental observations.

    2 To be fair, in the following sentence Graham says “you need to know how to calculate time and space complexity and about Turing completeness”. This book includes these topics, as well as others such as NP-hardness, randomization, cryptography, quantum computing, and more.

    “The subject of my talk is perhaps most directly indicated by simply asking two questions: first, is it harder to multiply than to add? and second, why? … I (would like to) show that there is no algorithm for multiplication computationally as simple as that for addition, and this proves something of a stumbling block.”, Alan Cobham, 1964

    The origin of much of science and medicine can be traced back to the ancient Babylonians. However, the Babylonians’ most significant contribution to humanity was arguably the invention of the place-value number system. The place-value system represents any number using a collection of digits, whereby the position of the digit is used to determine its value, as opposed to a system such as Roman numerals, where every symbol has a fixed numerical value regardless of position. For example, the average distance to the moon is approximately 238,900 of our miles or 259,956 Roman miles. The latter quantity, expressed in standard Roman numerals, is MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMDCCCCLVI

    Writing the distance to the sun in Roman numerals would require about 100,000 symbols: a 50-page book just containing this single number!


    Learning Objectives:

    • Introduce and motivate the study of computation for its own sake, irrespective of particular implementations.

    • The notion of an algorithm and some of its history.

    • Algorithms as not just tools, but also ways of thinking and understanding.

    • Taste of Big-𝑂 analysis and the surprising creativity in the design of efficient algorithms.


    For someone who thinks of numbers in an additive system like Roman numerals, quantities like the distance to the moon or sun are not merely large—they are unspeakable: cannot be expressed or even grasped. It’s no wonder that Eratosthenes, who was the first person to calculate the earth’s diameter (up to about ten percent error), and Hipparchus, who was the first to calculate the distance to the moon, did not use a Roman-numeral type system but rather the Babylonian sexagesimal (i.e., base 60) place-value system.

    0.1 INTEGER MULTIPLICATION: AN EXAMPLE OF AN ALGORITHM

    In the language of Computer Science, the place-value system for representing numbers is known as a data structure: a set of instructions, or “recipe”, for representing objects as symbols. An algorithm is a set of instructions, or “recipe”, for performing operations on such representations. Data structures and algorithms have enabled amazing applications that have transformed human society, but their importance goes beyond their practical utility. Structures from computer science, such as bits, strings, graphs, and even the notion of a program itself, as well as concepts such as universality and replication, have not just found (many) practical uses but contributed a new language and a new way to view the world.

    In addition to coming up with the place-value system, the Babylonians also invented the “standard algorithms” that we were all taught in elementary school for adding and multiplying numbers. These algorithms have been essential throughout the ages for people using abaci, papyrus, or pencil and paper, but in our computer age, do they still serve any purpose beyond torturing third-graders? To see why these algorithms are still very much relevant, let us compare the Babylonian digit-by-digit multiplication algorithm (“grade-school multiplication”) with the naive algorithm that multiplies numbers through repeated addition. We start by formally describing both algorithms, see Algorithm 0.1 and Algorithm 0.2.

    Algorithm 0.1 — Multiplication via repeated addition.

    Input: Non-negative integers 𝑥, 𝑦
    Output: Product 𝑥 ⋅ 𝑦
    1: Let 𝑟𝑒𝑠𝑢𝑙𝑡 ← 0.
    2: for 𝑖 = 1, … , 𝑦 do
    3:   𝑟𝑒𝑠𝑢𝑙𝑡 ← 𝑟𝑒𝑠𝑢𝑙𝑡 + 𝑥
    4: end for
    5: return 𝑟𝑒𝑠𝑢𝑙𝑡
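
    To make the pseudocode concrete, here is a direct Python rendering of Algorithm 0.1. This is only an illustrative sketch (it is not code from the book or its repository), and the function name mult_repeated_addition is made up for this example.

        def mult_repeated_addition(x, y):
            """Multiply non-negative integers x and y by adding x to itself
            y times, mirroring Algorithm 0.1; this takes about y additions."""
            result = 0
            for _ in range(y):
                result += x
            return result

        # Example: 7 * 6 computed with 6 additions.
        assert mult_repeated_addition(7, 6) == 42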


    Algorithm 0.2 — Grade-school multiplication.

    Input: Non-negative integers 𝑥, 𝑦
    Output: Product 𝑥 ⋅ 𝑦
    1: Write 𝑥 = 𝑥_{𝑛−1}𝑥_{𝑛−2} ⋯ 𝑥_0 and 𝑦 = 𝑦_{𝑚−1}𝑦_{𝑚−2} ⋯ 𝑦_0 in decimal place-value notation. # 𝑥_0 is the ones digit of 𝑥, 𝑥_1 is the tens digit, etc.
    2: Let 𝑟𝑒𝑠𝑢𝑙𝑡 ← 0
    3: for 𝑖 = 0, … , 𝑛 − 1 do
    4:   for 𝑗 = 0, … , 𝑚 − 1 do
    5:     𝑟𝑒𝑠𝑢𝑙𝑡 ← 𝑟𝑒𝑠𝑢𝑙𝑡 + 10^{𝑖+𝑗} ⋅ 𝑥_𝑖 ⋅ 𝑦_𝑗
    6:   end for
    7: end for
    8: return 𝑟𝑒𝑠𝑢𝑙𝑡
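
    Similarly, here is an illustrative Python sketch of Algorithm 0.2 (again not the book’s own code; the helper name digits is hypothetical, introduced only for this example). It sums the products of every pair of digits, shifted by the appropriate power of 10.

        def digits(x):
            """Return the decimal digits of x from least to most significant,
            e.g. digits(123) == [3, 2, 1]."""
            ds = []
            while True:
                ds.append(x % 10)
                x //= 10
                if x == 0:
                    return ds

        def mult_grade_school(x, y):
            """Multiply non-negative integers x and y as in Algorithm 0.2:
            add up the single-digit products, shifted by powers of 10."""
            result = 0
            yd = digits(y)
            for i, xi in enumerate(digits(x)):
                for j, yj in enumerate(yd):
                    result += (10 ** (i + j)) * xi * yj
            return result

        assert mult_grade_school(123, 456) == 123 * 456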

    Both Algorithm 0.1 and Algorithm 0.2 assume that we already know how to add numbers, and Algorithm 0.2 also assumes that we can multiply a number by a power of 10 (which is, after all, a simple shift). Suppose that 𝑥 and 𝑦 are two integers of 𝑛 = 20 decimal digits each. (This roughly corresponds to 64 binary digits, which is a common size in many programming languages.) Computing 𝑥 ⋅ 𝑦 using Algorithm 0.1 entails adding 𝑥 to itself 𝑦 times which entails (since 𝑦 is a 20-digit number) at least 10^19 additions. In contrast, the grade-school algorithm (i.e., Algorithm 0.2) involves 𝑛^2 shifts and single-digit products, and so at most 2𝑛^2 = 800 single-digit operations. To understand the difference, consider that a grade-schooler can perform a single-digit operation in about 2 seconds, and so would require about 1,600 seconds (about half an hour) to compute 𝑥 ⋅ 𝑦 using Algorithm 0.2. In contrast, even though it is more than a billion times faster than a human, if we used Algorithm 0.1 to compute 𝑥 ⋅ 𝑦 using a modern PC, it would take us 10^20/10^9 = 10^11 seconds (which is more than three millennia!) to compute the same result.
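
    The numbers above are back-of-envelope arithmetic, and can be reproduced with a few lines of Python (the rates assumed here, 2 seconds per single-digit operation for the grade-schooler and 10^9 operations per second for the PC, are the ones used in the estimates of the previous paragraph):

        n = 20                                # decimal digits in each of x and y

        grade_school_ops = 2 * n ** 2         # at most 2n^2 = 800 single-digit operations
        human_seconds = grade_school_ops * 2  # 1600 seconds, roughly half an hour

        pc_seconds = 10 ** 20 // 10 ** 9      # repeated addition on a PC: 10^11 seconds
        print(human_seconds, pc_seconds)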

    Computers have not made algorithms obsolete. On the contrary, the vast increase in our ability to measure, store, and communicate data has led to much higher demand for developing better and more sophisticated algorithms that empower us to make better decisions based on these data. We also see that in no small extent the notion of algorithm is independent of the actual computing device that executes it. The digit-by-digit multiplication algorithm is vastly better than iterated addition, regardless whether the technology we use to implement it is a silicon-based chip, or a third-grader with pen and paper.

    Theoretical computer science is concerned with the inherent properties of algorithms and computation; namely, those properties that are independent of current technology. We ask some questions that were already pondered by the Babylonians, such as “what is the best way to multiply two numbers?”, but also questions that rely on cutting-edge science such as “could we use the effects of quantum entanglement to factor numbers faster?”.

Remark 0.3 — Specification, implementation and analysis of algorithms. A full description of an algorithm has three components:

• Specification: What is the task that the algorithm performs (e.g., multiplication in the case of Algorithm 0.1 and Algorithm 0.2).

• Implementation: How is the task accomplished: what is the sequence of instructions to be performed. Even though Algorithm 0.1 and Algorithm 0.2 perform the same computational task (i.e., they have the same specification), they do it in different ways (i.e., they have different implementations).

• Analysis: Why does this sequence of instructions achieve the desired task. A full description of Algorithm 0.1 and Algorithm 0.2 will include a proof for each one of these algorithms that on input 𝑥, 𝑦, the algorithm does indeed output 𝑥 ⋅ 𝑦.

Often as part of the analysis we show that the algorithm is not only correct but also efficient. That is, we want to show that not only will the algorithm compute the desired task, but it will do so within a prescribed number of operations. For example, Algorithm 0.2 computes the multiplication function on inputs of 𝑛 digits using O(𝑛²) operations, while Algorithm 0.4 (described below) computes the same function using O(𝑛^{1.6}) operations. (We define the O notation used here in Section 1.4.8.)

0.2 EXTENDED EXAMPLE: A FASTER WAY TO MULTIPLY (OPTIONAL)

Once you think of the standard digit-by-digit multiplication algorithm, it seems like the “obviously best” way to multiply numbers. In 1960, the famous mathematician Andrey Kolmogorov organized a seminar at Moscow State University in which he conjectured that every algorithm for multiplying two 𝑛 digit numbers would require a number of basic operations that is proportional to 𝑛² (Ω(𝑛²) operations, using O-notation as defined in Chapter 1). In other words, Kolmogorov conjectured that in any multiplication algorithm, doubling the number of digits would quadruple the number of basic operations required.


Figure 1: The grade-school multiplication algorithm illustrated for multiplying 𝑥 = 10x̄ + x̲ and 𝑦 = 10ȳ + y̲. It uses the formula (10x̄ + x̲) × (10ȳ + y̲) = 100x̄ȳ + 10(x̄y̲ + x̲ȳ) + x̲y̲.

3 If 𝑥 is a number then ⌊𝑥⌋ is the integer obtained by rounding it down, see Section 1.7.

A young student named Anatoly Karatsuba was in the audience, and within a week he disproved Kolmogorov’s conjecture by discovering an algorithm that requires only about 𝐶𝑛^{1.6} operations for some constant 𝐶. Such a number becomes much smaller than 𝑛² as 𝑛 grows, and so for large 𝑛 Karatsuba’s algorithm is superior to the grade-school one. (For example, Python’s implementation switches from the grade-school algorithm to Karatsuba’s algorithm for numbers that are 1000 bits or larger.) While the difference between an O(𝑛^{1.6}) and an O(𝑛²) algorithm can sometimes be crucial in practice (see Section 0.3 below), in this book we will mostly ignore such distinctions. However, we describe Karatsuba’s algorithm below since it is a good example of how algorithms can often be surprising, as well as a demonstration of the analysis of algorithms, which is central to this book and to theoretical computer science at large.

Karatsuba’s algorithm is based on a faster way to multiply two-digit numbers. Suppose that 𝑥, 𝑦 ∈ [100] = {0, … , 99} are a pair of two-digit numbers. Let’s write x̄ for the “tens” digit of 𝑥, and x̲ for the “ones” digit, so that 𝑥 = 10x̄ + x̲, and write similarly 𝑦 = 10ȳ + y̲ for x̄, x̲, ȳ, y̲ ∈ [10]. The grade-school algorithm for multiplying 𝑥 and 𝑦 is illustrated in Fig. 1.

The grade-school algorithm can be thought of as transforming the task of multiplying a pair of two-digit numbers into four single-digit multiplications via the formula

(10x̄ + x̲) × (10ȳ + y̲) = 100x̄ȳ + 10(x̄y̲ + x̲ȳ) + x̲y̲   (1)

Generally, in the grade-school algorithm doubling the number of digits in the input results in quadrupling the number of operations, leading to an O(𝑛²) time algorithm. In contrast, Karatsuba’s algorithm is based on the observation that we can express Eq. (1) also as

(10x̄ + x̲) × (10ȳ + y̲) = (100 − 10)x̄ȳ + 10[(x̄ + x̲)(ȳ + y̲)] − (10 − 1)x̲y̲   (2)

which reduces multiplying the two-digit numbers 𝑥 and 𝑦 to computing the following three simpler products: x̄ȳ, x̲y̲, and (x̄ + x̲)(ȳ + y̲). By repeating the same strategy recursively, we can reduce the task of multiplying two 𝑛-digit numbers to the task of multiplying three pairs of ⌊𝑛/2⌋ + 1 digit numbers.³ Since every time we double the number of digits we triple the number of operations, we will be able to multiply numbers of 𝑛 = 2^ℓ digits using about 3^ℓ = 𝑛^{log₂ 3} ∼ 𝑛^{1.585} operations.
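For instance (a small numerical illustration of ours, not part of the original text), take 𝑥 = 23 and 𝑦 = 45, so that x̄ = 2, x̲ = 3, ȳ = 4, y̲ = 5. The three products are x̄ȳ = 8, x̲y̲ = 15, and (x̄ + x̲)(ȳ + y̲) = 5 ⋅ 9 = 45, and plugging them into Eq. (2) gives

(100 − 10) ⋅ 8 + 10 ⋅ 45 − (10 − 1) ⋅ 15 = 720 + 450 − 135 = 1035 = 23 ⋅ 45.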

The above is the intuitive idea behind Karatsuba’s algorithm, but it is not enough to fully specify it. A complete description of an algorithm entails a precise specification of its operations together with its analysis: a proof that the algorithm does in fact do what it’s supposed to do.



Figure 2: Karatsuba’s multiplication algorithm illustrated for multiplying 𝑥 = 10x̄ + x̲ and 𝑦 = 10ȳ + y̲. We compute the three orange, green and purple products x̄ȳ, x̲y̲ and (x̄ + x̲)(ȳ + y̲) and then add and subtract them to obtain the result.

Figure 3: Running time of Karatsuba’s algorithm vs. the grade-school algorithm. (Python implementation available online.) Note the existence of a “cutoff” length, where for sufficiently large inputs Karatsuba becomes more efficient than the grade-school algorithm. The precise cutoff location varies by implementation and platform details, but will always occur eventually.

The operations of Karatsuba’s algorithm are detailed in Algorithm 0.4, while the analysis is given in Lemma 0.5 and Lemma 0.6.

    Algorithm 0.4 — Karatsuba multiplication.

Input: non-negative integers 𝑥, 𝑦, each of at most 𝑛 digits
Output: 𝑥 ⋅ 𝑦
1: procedure Karatsuba(𝑥, 𝑦)
2:   if 𝑛 ≤ 4 then return 𝑥 ⋅ 𝑦
3:   Let 𝑚 = ⌊𝑛/2⌋
4:   Write 𝑥 = 10^𝑚 x̄ + x̲ and 𝑦 = 10^𝑚 ȳ + y̲
5:   𝐴 ← Karatsuba(x̄, ȳ)
6:   𝐵 ← Karatsuba(x̄ + x̲, ȳ + y̲)
7:   𝐶 ← Karatsuba(x̲, y̲)
8:   return (10^{2𝑚} − 10^𝑚) ⋅ 𝐴 + 10^𝑚 ⋅ 𝐵 + (1 − 10^𝑚) ⋅ 𝐶
9: end procedure
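For readers who want to experiment, here is a minimal Python sketch following the same recursive structure as Algorithm 0.4 (splitting at 𝑚 = ⌊𝑛/2⌋ decimal digits). It is meant only to illustrate the recursion and makes no attempt at efficiency:

def karatsuba(x: int, y: int) -> int:
    """Multiply non-negative integers x and y following the recursion of Algorithm 0.4."""
    n = max(len(str(x)), len(str(y)))   # number of decimal digits
    if n <= 4:
        return x * y                    # base case: any multiplication method works here
    m = n // 2
    x_hi, x_lo = divmod(x, 10 ** m)     # x = 10^m * x_hi + x_lo
    y_hi, y_lo = divmod(y, 10 ** m)     # y = 10^m * y_hi + y_lo
    A = karatsuba(x_hi, y_hi)
    B = karatsuba(x_hi + x_lo, y_hi + y_lo)
    C = karatsuba(x_lo, y_lo)
    return (10 ** (2 * m) - 10 ** m) * A + 10 ** m * B + (1 - 10 ** m) * C

Note that, just as in the analysis below, each call on 𝑛-digit inputs makes three recursive calls on numbers of roughly 𝑛/2 digits.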

Algorithm 0.4 is only half of the full description of Karatsuba’s algorithm. The other half is the analysis, which entails proving that (1) Algorithm 0.4 indeed computes the multiplication operation and (2) it does so using O(𝑛^{log₂ 3}) operations. We now turn to showing both facts:

Lemma 0.5 For every pair of non-negative integers 𝑥, 𝑦, when given input 𝑥, 𝑦, Algorithm 0.4 will output 𝑥 ⋅ 𝑦.

Proof. Let 𝑛 be the maximum number of digits of 𝑥 and 𝑦. We prove the lemma by induction on 𝑛. The base case is 𝑛 ≤ 4, where the algorithm returns 𝑥 ⋅ 𝑦 by definition. (It does not matter which algorithm we use to multiply four-digit numbers; we can even use repeated addition.) Otherwise, if 𝑛 > 4, we define 𝑚 = ⌊𝑛/2⌋ and write 𝑥 = 10^𝑚 x̄ + x̲ and 𝑦 = 10^𝑚 ȳ + y̲.

Plugging this into 𝑥 ⋅ 𝑦, we get

𝑥 ⋅ 𝑦 = 10^{2𝑚} x̄ȳ + 10^𝑚 (x̄y̲ + x̲ȳ) + x̲y̲ .   (3)

Rearranging the terms, we see that

𝑥 ⋅ 𝑦 = 10^{2𝑚} x̄ȳ + 10^𝑚 [(x̄ + x̲)(ȳ + y̲) − x̄ȳ − x̲y̲] + x̲y̲ .   (4)

Since the numbers x̄, x̲, ȳ, y̲, x̄ + x̲, ȳ + y̲ all have at most 𝑚 + 1 < 𝑛 digits, the induction hypothesis implies that the values 𝐴, 𝐵, 𝐶 computed by the recursive calls will satisfy 𝐴 = x̄ȳ, 𝐵 = (x̄ + x̲)(ȳ + y̲) and 𝐶 = x̲y̲. Plugging this into (4), we see that 𝑥 ⋅ 𝑦 equals the value (10^{2𝑚} − 10^𝑚) ⋅ 𝐴 + 10^𝑚 ⋅ 𝐵 + (1 − 10^𝑚) ⋅ 𝐶 computed by Algorithm 0.4.



Lemma 0.6 If 𝑥, 𝑦 are integers of at most 𝑛 digits, Algorithm 0.4 will take O(𝑛^{log₂ 3}) operations on input 𝑥, 𝑦.

Proof. Fig. 4 illustrates the idea behind the proof, which we only sketch here, leaving filling out the details as Exercise 0.4. The proof is again by induction. We define 𝑇(𝑛) to be the maximum number of steps that Algorithm 0.4 takes on inputs of length at most 𝑛. Since in the base case (𝑛 ≤ 4) Algorithm 0.4 performs only a constant number of computations, we know that 𝑇(𝑛) ≤ 𝑐 for some constant 𝑐 whenever 𝑛 ≤ 4, and for 𝑛 > 4 it satisfies the recursive equation

𝑇(𝑛) ≤ 3𝑇(⌊𝑛/2⌋ + 1) + 𝑐′𝑛   (5)

for some constant 𝑐′ (using the fact that addition can be done in O(𝑛) operations).

The recursive equation (5) solves to O(𝑛^{log₂ 3}). The intuition behind this is presented in Fig. 4, and it is also a consequence of the so-called “Master Theorem” on recurrence relations. As mentioned above, we leave completing the proof to the reader as Exercise 0.4.
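In slightly more detail (a sketch of the calculation, simplifying by taking 𝑛 to be a power of 2 and ignoring the +1 in the recursion): unrolling (5) for 𝑖 levels gives

𝑇(𝑛) ≤ 3^𝑖 𝑇(𝑛/2^𝑖) + 𝑐′𝑛 ⋅ ∑_{𝑗=0}^{𝑖−1} (3/2)^𝑗 ,

and taking 𝑖 = log₂ 𝑛 levels, the first term is a constant times 3^{log₂ 𝑛} = 𝑛^{log₂ 3}, while the geometric sum is at most a constant times (3/2)^{log₂ 𝑛} = 𝑛^{log₂ 3 − 1}, so both terms are O(𝑛^{log₂ 3}).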

Figure 4: Karatsuba’s algorithm reduces an 𝑛-bit multiplication to three 𝑛/2-bit multiplications, which in turn are reduced to nine 𝑛/4-bit multiplications, and so on. We can represent the computational cost of all these multiplications in a 3-ary tree of depth log₂ 𝑛, where at the root the extra cost is 𝑐𝑛 operations, at the first level the extra cost is 𝑐(𝑛/2) operations, and at each of the 3^𝑖 nodes of level 𝑖, the extra cost is 𝑐(𝑛/2^𝑖). The total cost is 𝑐𝑛 ∑_{𝑖=0}^{log₂ 𝑛} (3/2)^𝑖 ≤ 10𝑐𝑛^{log₂ 3} by the formula for summing a geometric series.

Karatsuba’s algorithm is by no means the end of the line for multiplication algorithms. In the 1960’s, Toom and Cook extended Karatsuba’s ideas to get an O(𝑛^{log_𝑘(2𝑘−1)}) time multiplication algorithm for every constant 𝑘. In 1971, Schönhage and Strassen got even better algorithms using the Fast Fourier Transform; their idea was to somehow treat integers as “signals” and do the multiplication more efficiently by moving to the Fourier domain. (The Fourier transform is a central tool in mathematics and engineering, used in a great many applications; if you have not seen it yet, you are likely to encounter it at some point in your studies.) In the years that followed, researchers kept improving the algorithm.



Only very recently, Harvey and van der Hoeven managed to obtain an O(𝑛 log 𝑛) time algorithm for multiplication (though it only starts beating the Schönhage-Strassen algorithm for truly astronomical numbers). Yet, despite all this progress, we still don’t know whether or not there is an O(𝑛) time algorithm for multiplying two 𝑛 digit numbers!

Remark 0.7 — Matrix Multiplication (advanced note). (This book contains many “advanced” or “optional” notes and sections. These may assume background that not every student has, and can be safely skipped over as none of the future parts depends on them.) Ideas similar to Karatsuba’s can be used to speed up matrix multiplications as well. Matrices are a powerful way to represent linear equations and operations, widely used in numerous applications of scientific computing, graphics, machine learning, and many many more. One of the basic operations one can do with two matrices is to multiply them.

For example, if $x = \begin{pmatrix} x_{0,0} & x_{0,1} \\ x_{1,0} & x_{1,1} \end{pmatrix}$ and $y = \begin{pmatrix} y_{0,0} & y_{0,1} \\ y_{1,0} & y_{1,1} \end{pmatrix}$, then the product of 𝑥 and 𝑦 is the matrix $\begin{pmatrix} x_{0,0}y_{0,0} + x_{0,1}y_{1,0} & x_{0,0}y_{0,1} + x_{0,1}y_{1,1} \\ x_{1,0}y_{0,0} + x_{1,1}y_{1,0} & x_{1,0}y_{0,1} + x_{1,1}y_{1,1} \end{pmatrix}$. You can see that we can compute this matrix by eight products of numbers.

Now suppose that 𝑛 is even and 𝑥 and 𝑦 are a pair of 𝑛 × 𝑛 matrices which we can think of as each composed of four (𝑛/2) × (𝑛/2) blocks x_{0,0}, x_{0,1}, x_{1,0}, x_{1,1} and y_{0,0}, y_{0,1}, y_{1,0}, y_{1,1}. Then the formula for the matrix product of 𝑥 and 𝑦 can be expressed in the same way as above, just replacing products x_{a,b}y_{c,d} with matrix products, and addition with matrix addition. This means that we can use the formula above to give an algorithm that doubles the dimension of the matrices at the expense of increasing the number of operations by a factor of 8, which for 𝑛 = 2^ℓ results in 8^ℓ = 𝑛³ operations.

In 1969 Volker Strassen noted that we can compute the product of a pair of two-by-two matrices using only seven products of numbers by observing that each entry of the matrix 𝑥𝑦 can be computed by adding and subtracting the following seven terms: t_1 = (x_{0,0} + x_{1,1})(y_{0,0} + y_{1,1}), t_2 = (x_{1,0} + x_{1,1})y_{0,0}, t_3 = x_{0,0}(y_{0,1} − y_{1,1}), t_4 = x_{1,1}(y_{1,0} − y_{0,0}), t_5 = (x_{0,0} + x_{0,1})y_{1,1}, t_6 = (x_{1,0} − x_{0,0})(y_{0,0} + y_{0,1}), t_7 = (x_{0,1} − x_{1,1})(y_{1,0} + y_{1,1}). Indeed, one can verify that $xy = \begin{pmatrix} t_1 + t_4 - t_5 + t_7 & t_3 + t_5 \\ t_2 + t_4 & t_1 - t_2 + t_3 + t_6 \end{pmatrix}$.


Using this observation, we can obtain an algorithm such that doubling the dimension of the matrices results in increasing the number of operations by a factor of 7, which means that for 𝑛 = 2^ℓ the cost is 7^ℓ = 𝑛^{log₂ 7} ∼ 𝑛^{2.807}. A long sequence of work has since improved this algorithm, and the current record has running time about O(𝑛^{2.373}). However, unlike the case of integer multiplication, at the moment we don’t know of any algorithm for matrix multiplication that runs in time linear or even close to linear in the size of the input matrices (e.g., an O(𝑛² polylog(𝑛)) time algorithm). People have tried to use group representations, which can be thought of as generalizations of the Fourier transform, to obtain faster algorithms, but this effort has not yet succeeded.
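As a quick numerical sanity check of these seven identities (the check is ours, not part of the original remark, and it assumes the NumPy library is available), one can verify them on random 2×2 matrices:

import numpy as np

def strassen_2x2(x, y):
    """Multiply two 2x2 matrices using only Strassen's seven products."""
    t1 = (x[0, 0] + x[1, 1]) * (y[0, 0] + y[1, 1])
    t2 = (x[1, 0] + x[1, 1]) * y[0, 0]
    t3 = x[0, 0] * (y[0, 1] - y[1, 1])
    t4 = x[1, 1] * (y[1, 0] - y[0, 0])
    t5 = (x[0, 0] + x[0, 1]) * y[1, 1]
    t6 = (x[1, 0] - x[0, 0]) * (y[0, 0] + y[0, 1])
    t7 = (x[0, 1] - x[1, 1]) * (y[1, 0] + y[1, 1])
    return np.array([[t1 + t4 - t5 + t7, t3 + t5],
                     [t2 + t4, t1 - t2 + t3 + t6]])

x = np.random.randint(0, 10, size=(2, 2))
y = np.random.randint(0, 10, size=(2, 2))
assert np.array_equal(strassen_2x2(x, y), x @ y)  # agrees with the usual eight-product formula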

    0.3 ALGORITHMS BEYOND ARITHMETIC

The quest for better algorithms is by no means restricted to arithmetic tasks such as adding, multiplying or solving equations. Many graph algorithms, including algorithms for finding paths, matchings, spanning trees, cuts, and flows, have been discovered in the last several decades, and this is still an intensive area of research. (For example, the last few years saw many advances in algorithms for the maximum flow problem, borne out of unexpected connections with electrical circuits and linear equation solvers.) These algorithms are being used not just for the “natural” applications of routing network traffic or GPS-based navigation, but also for applications as varied as drug discovery through searching for structures in gene-interaction graphs to computing risks from correlations in financial investments.

Google was founded based on the PageRank algorithm, which is an efficient algorithm to approximate the “principal eigenvector” of (a dampened version of) the adjacency matrix of the web graph. The Akamai company was founded based on a new data structure, known as consistent hashing, for a hash table where buckets are stored at different servers. The backpropagation algorithm, which computes partial derivatives of a neural network in O(𝑛) instead of O(𝑛²) time, underlies many of the recent phenomenal successes of learning deep neural networks. Algorithms for solving linear equations under sparsity constraints, a concept known as compressed sensing, have been used to drastically reduce the amount and quality of data needed to analyze MRI images. This made a critical difference for MRI imaging of cancer tumors in children, where previously doctors needed to use anesthesia to suspend breath during the MRI exam, sometimes with dire consequences.



Even for classical questions, studied through the ages, new discoveries are still being made. For example, for the question of determining whether a given integer is prime or composite, which has been studied since the days of Pythagoras, efficient probabilistic algorithms were only discovered in the 1970s, while the first deterministic polynomial-time algorithm was only found in 2002. For the related problem of actually finding the factors of a composite number, new algorithms were found in the 1980s, and (as we’ll see later in this course) discoveries in the 1990s raised the tantalizing prospect of obtaining faster algorithms through the use of quantum mechanical effects.

Despite all this progress, there are still many more questions than answers in the world of algorithms. For almost all natural problems, we do not know whether the current algorithm is the “best”, or whether a significantly better one is still waiting to be discovered. As alluded to in Cobham’s opening quote for this chapter, even for the basic problem of multiplying numbers we have not yet answered the question of whether there is a multiplication algorithm that is as efficient as our algorithms for addition. But at least we now know the right way to ask it.

    0.4 ON THE IMPORTANCE OF NEGATIVE RESULTS

Finding better algorithms for problems such as multiplication, solving equations, graph problems, or fitting neural networks to data is undoubtedly a worthwhile endeavor. But why is it important to prove that such algorithms don’t exist? One motivation is pure intellectual curiosity. Another reason to study impossibility results is that they correspond to the fundamental limits of our world. In other words, impossibility results are laws of nature.

Here are some examples of impossibility results outside computer science (see Section 0.7 for more about these). In physics, the impossibility of building a perpetual motion machine corresponds to the law of conservation of energy. The impossibility of building a heat engine beating Carnot’s bound corresponds to the second law of thermodynamics, while the impossibility of faster-than-light information transmission is a cornerstone of special relativity. In mathematics, while we all learned the formula for solving quadratic equations in high school, the impossibility of generalizing this formula to equations of degree five or more gave birth to group theory. The impossibility of proving Euclid’s fifth axiom from the first four gave rise to non-Euclidean geometries, which ended up crucial for the theory of general relativity.

In an analogous way, impossibility results for computation correspond to “computational laws of nature” that tell us about the fundamental limits of any information processing apparatus, whether based on silicon, neurons, or quantum particles.



Moreover, computer scientists have found creative approaches to apply computational limitations to achieve certain useful tasks. For example, much of modern Internet traffic is encrypted using the RSA encryption scheme, whose security relies on the (conjectured) impossibility of efficiently factoring large integers. More recently, the Bitcoin system uses a digital analog of the “gold standard” where, instead of using a precious metal, new currency is obtained by “mining” solutions for computationally difficult problems.

✓ Chapter Recap

• The history of algorithms goes back thousands of years; they have been essential to much of human progress and these days form the basis of multi-billion dollar industries, as well as life-saving technologies.

• There is often more than one algorithm to achieve the same computational task. Finding a faster algorithm can often make a much bigger difference than improving computing hardware.

• Better algorithms and data structures don’t just speed up calculations, but can yield new qualitative insights.

• One question we will study is to find out what is the most efficient algorithm for a given problem.

• To show that an algorithm is the most efficient one for a given problem, we need to be able to prove that it is impossible to solve the problem using a smaller amount of computational resources.

    0.5 ROADMAP TO THE REST OF THIS BOOK

Often, when we try to solve a computational problem, whether it is solving a system of linear equations, finding the top eigenvector of a matrix, or trying to rank Internet search results, it is enough to use the “I know it when I see it” standard for describing algorithms. As long as we find some way to solve the problem, we are happy and might not care much about the exact mathematical model for our algorithm. But when we want to answer a question such as “does there exist an algorithm to solve the problem 𝑃?” we need to be much more precise.

In particular, we will need to (1) define exactly what it means to solve 𝑃, and (2) define exactly what an algorithm is. Even (1) can sometimes be non-trivial, but (2) is particularly challenging; it is not at all clear how (and even whether) we can encompass all potential ways to design algorithms.



We will consider several simple models of computation, and argue that, despite their simplicity, they do capture all “reasonable” approaches to achieve computing, including all those that are currently used in modern computing devices.

Once we have these formal models of computation, we can try to obtain impossibility results for computational tasks, showing that some problems can not be solved (or perhaps can not be solved within the resources of our universe). Archimedes once said that given a fulcrum and a long enough lever, he could move the world. We will see how reductions allow us to leverage one hardness result into a slew of others, illuminating the boundaries between the computable and uncomputable (or tractable and intractable) problems.

Later in this book we will go back to examining our models of computation, and see how resources such as randomness or quantum entanglement could potentially change the power of our model. In the context of probabilistic algorithms, we will see a glimpse of how randomness has become an indispensable tool for understanding computation, information, and communication. We will also see how computational difficulty can be an asset rather than a hindrance, and be used for the “derandomization” of probabilistic algorithms. The same ideas also show up in cryptography, which has undergone not just a technological but also an intellectual revolution in the last few decades, much of it building on the foundations that we explore in this course.

Theoretical Computer Science is a vast topic, branching out and touching upon many scientific and engineering disciplines. This book provides a very partial (and biased) sample of this area. More than anything, I hope I will manage to “infect” you with at least some of my love for this field, which is inspired and enriched by the connection to practice, but is also deep and beautiful regardless of applications.

0.5.1 Dependencies between chapters
This book is divided into the following parts, see Fig. 5.

• Preliminaries: Introduction, mathematical background, and representing objects as strings.

• Part I: Finite computation (Boolean circuits): Equivalence of circuits and straight-line programs. Universal gate sets. Existence of a circuit for every function, representing circuits as strings, universal circuit, lower bound on circuit size using the counting argument.

• Part II: Uniform computation (Turing machines): Equivalence of Turing machines and programs with loops. Equivalence of models (including RAM machines, 𝜆 calculus, and cellular automata), configurations of Turing machines, existence of a universal Turing machine, uncomputable functions (including the Halting problem and Rice’s Theorem), Gödel’s incompleteness theorem, restricted computational models (regular and context-free languages).

• Part III: Efficient computation: Definition of running time, time hierarchy theorem, P and NP, P/poly, NP completeness and the Cook-Levin Theorem, space bounded computation.

• Part IV: Randomized computation: Probability, randomized algorithms, BPP, amplification, BPP ⊆ P/poly, pseudorandom generators and derandomization.

• Part V: Advanced topics: Cryptography, proofs and algorithms (interactive and zero knowledge proofs, Curry-Howard correspondence), quantum computing.

Figure 5: The dependency structure of the different parts. Part I introduces the model of Boolean circuits to study finite functions with an emphasis on quantitative questions (how many gates to compute a function). Part II introduces the model of Turing machines to study functions that have unbounded input lengths with an emphasis on qualitative questions (is this function computable or not). Much of Part II does not depend on Part I, as Turing machines can be used as the first computational model. Part III depends on both parts as it introduces a quantitative study of functions with unbounded input length. The more advanced parts IV (randomized computation) and V (advanced topics) rely on the material of Parts I, II and III.

The book largely proceeds in linear order, with each chapter building on the previous ones, with the following exceptions:

• The topics of 𝜆 calculus (Section 8.5), Gödel’s incompleteness theorem (Chapter 11), automata/regular expressions and context-free grammars (Chapter 10), and space-bounded computation (Chapter 17) are not used in the following chapters. Hence you can choose whether to cover or skip any subset of them.

• Part II (Uniform Computation / Turing Machines) does not have a strong dependency on Part I (Finite computation / Boolean circuits), and it should be possible to teach them in the reverse order with minor modifications. Boolean circuits are used in Part III (efficient computation) for results such as P ⊆ P/poly and the Cook-Levin Theorem, as well as in Part IV (for BPP ⊆ P/poly and derandomization) and Part V (specifically in cryptography and quantum computing).

• All chapters in Part V (Advanced topics) are independent of one another and can be covered in any order.

A course based on this book can use all of Parts I, II, and III (possibly skipping over some or all of the 𝜆 calculus, Chapter 11, Chapter 10 or Chapter 17), and then either cover all or some of Part IV (randomized computation), and add a “sprinkling” of advanced topics from Part V based on student or instructor interest.

    0.6 EXERCISES

Exercise 0.1 Rank the significance of the following inventions in speeding up multiplication of large (that is, 100-digit or more) numbers. That is, use “back of the envelope” estimates to order them in terms of the speedup factor they offered over the previous state of affairs.

a. Discovery of the grade-school digit by digit algorithm (improving upon repeated addition)

b. Discovery of Karatsuba’s algorithm (improving upon the digit by digit algorithm)

c. Invention of modern electronic computers (improving upon calculations with pen and paper).

Exercise 0.2 The 1977 Apple II personal computer had a processor speed of 1.023 MHz, or about 10⁶ operations per second. At the time of this writing the world’s fastest supercomputer performs 93 “petaflops” (10¹⁵ floating point operations per second) or about 10¹⁸ basic steps per second. For each one of the following running times (as a function of the input length 𝑛), compute for both computers how large an input they could handle in a week of computation, if they run an algorithm that has this running time:

a. 𝑛 operations.
b. 𝑛² operations.
c. 𝑛 log 𝑛 operations.
d. 2^𝑛 operations.
e. 𝑛! operations.


4 As we will see in Chapter 21, almost any company relying on cryptography needs to assume the non-existence of certain algorithms. In particular, RSA Security was founded based on the security of the RSA cryptosystem, which presumes the non-existence of an efficient algorithm to compute the prime factorization of large integers.

5 Hint: Use a proof by induction; suppose that this is true for all 𝑛’s from 1 to 𝑚 and prove that this is true also for 𝑚 + 1.

6 Start by showing this for the case that 𝑛 = 𝑘^𝑡 for some natural number 𝑡, in which case you can do so recursively by breaking the matrices into 𝑘 × 𝑘 blocks.

Exercise 0.3 — Usefulness of algorithmic non-existence. In this chapter we mentioned several companies that were founded based on the discovery of new algorithms. Can you give an example for a company that was founded based on the non-existence of an algorithm? See footnote for hint.⁴

Exercise 0.4 — Analysis of Karatsuba’s Algorithm. a. Suppose that 𝑇_1, 𝑇_2, 𝑇_3, … is a sequence of numbers such that 𝑇_2 ≤ 10 and for every 𝑛, 𝑇_𝑛 ≤ 3𝑇_{⌊𝑛/2⌋+1} + 𝐶𝑛 for some 𝐶 ≥ 1. Prove that 𝑇_𝑛 ≤ 20𝐶𝑛^{log₂ 3} for every 𝑛 > 2.⁵

b. Prove that the number of single-digit operations that Karatsuba’s algorithm takes to multiply two 𝑛 digit numbers is at most 1000𝑛^{log₂ 3}.

Exercise 0.5 Implement in the programming language of your choice functions Gradeschool_multiply(x,y) and Karatsuba_multiply(x,y) that take two arrays of digits x and y and return an array representing the product of x and y (where x is identified with the number x[0]+10*x[1]+100*x[2]+... etc.) using the grade-school algorithm and the Karatsuba algorithm respectively. At what number of digits does the Karatsuba algorithm beat the grade-school one?

Exercise 0.6 — Matrix Multiplication (optional, advanced). In this exercise, we show that if for some 𝜔 > 2 we can write the product of two 𝑘 × 𝑘 real-valued matrices 𝐴, 𝐵 using at most 𝑘^𝜔 multiplications, then we can multiply two 𝑛 × 𝑛 matrices in roughly 𝑛^𝜔 time for every large enough 𝑛.

To make this precise, we need to make some notation that is unfortunately somewhat cumbersome. Assume that there is some 𝑘 ∈ ℕ and 𝑚 ≤ 𝑘^𝜔 such that for every 𝑘 × 𝑘 matrices 𝐴, 𝐵, 𝐶 such that 𝐶 = 𝐴𝐵, we can write for every 𝑖, 𝑗 ∈ [𝑘]:

𝐶_{𝑖,𝑗} = ∑_{ℓ=0}^{𝑚−1} 𝛼^ℓ_{𝑖,𝑗} 𝑓_ℓ(𝐴) 𝑔_ℓ(𝐵)   (6)

for some linear functions