


Can Boltzmann Machines Discover Cluster Updates?

Lei Wang∗
Beijing National Lab for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China

Boltzmann machines are physics-informed generative models with wide applications in machine learning. They can learn the probability distribution from an input dataset and generate new samples accordingly. Applying them back to physics, Boltzmann machines are ideal recommender systems to accelerate the Monte Carlo simulation of physical systems due to their flexibility and effectiveness. More intriguingly, we show that the generative sampling of Boltzmann Machines can even discover unknown cluster Monte Carlo algorithms. The creative power comes from the latent representation of the Boltzmann machines, which learn to mediate complex interactions and identify clusters of the physical system. We demonstrate these findings with concrete examples of the classical Ising model with and without four-spin plaquette interactions. Our results endorse a fresh research paradigm where intelligent machines are designed to create or inspire human discovery of innovative algorithms.

Introduction– It has been intriguing to wonder whether artificial intelligence can make scientific discoveries [1–4] ever since the dawn of AI. With recent rapid progress in machine learning, AI is reaching human-level performance in many tasks, and it is becoming increasingly plausible that AI can indeed achieve the lofty goal of making scientific discoveries. Focusing on the computational study of many-body physical problems with relatively well-defined rules and targets, e.g. distinguishing phases of matter or finding the lowest energy state, the above question has been addressed recently in Refs. [5–8].

An equally interesting question is whether AI can invent, or at least inspire human discovery of, new problem-solving strategies, i.e. new algorithms. In this respect, the two examples from DeepMind, where computers discover an optimal strategy in the video game Breakout [9] and master the board game Go [10], are particularly inspiring. These are vivid examples of an intelligent agent discovering nontrivial problem-solving strategies unknown even to its programmer.

Along this line, it is highly desirable to devise new efficient Monte Carlo algorithms to sample configuration spaces of physical problems more rapidly. The existing ones such as hybrid Monte Carlo [11], cluster algorithms [12, 13], the loop algorithm [14], and the worm algorithm [15] are all landmark achievements in computational physics and find wide applications in physical, statistical, and biological problems. There are some recent efforts to improve Monte Carlo sampling [16–20] using ideas and techniques from machine learning. Similar approaches were also discussed in the statistics literature [21–23], where one uses surrogate functions to guide and accelerate hybrid Monte Carlo calculations [11]. As suggested in [16], using intelligent Boltzmann Machines (BM) [24, 25] opens possibilities of algorithmic innovation, because they can discover unknown algorithmic strategies instead of merely acting as cheaper surrogate functions.

A BM is an energy-based model consisting of stochastic visible ($s$) and hidden ($h$) variables, illustrated in Fig. 1. The BM architecture is specified by an energy function $E(s,h)$ which depends on the connectivity, connection weights, and biases of the units. The joint probability distribution of the units follows the Boltzmann distribution $p(s,h) = e^{-E(s,h)}$.

Figure 1. A schematic plot of the Boltzmann Machine, which can learn from data and generate new samples. The Boltzmann Machine consists of stochastic visible (red and blue dots) and hidden units (white and gray squares) connected into a network. The colors indicate the status of the various units, which follow a Boltzmann distribution with a given energy function. The BM learns the marginal probability distribution of the visible variables from data by tuning its energy function parameters. We show that a BM with an appropriately designed architecture can discover efficient cluster Monte Carlo algorithms in the generative sampling.

One can train the BM by tuning its energy function such that the marginal distribution of the visible variables $p(s) = \sum_h p(s,h)$ approximates the target probability distribution $\pi(s)$ of a dataset. The hidden units of a BM mediate interactions between the visible units and serve as internal representations of the data.

A successfully trained BM can capture salient features of the input data. For example, a BM learns about pen strokes from an image dataset of handwritten digits [26]. Once trained, the BM can generate new samples from the learned distribution. To keep the physical simulation unbiased, the BM-recommended update of the visible units $s \to s'$ is accepted with the probability given by the Metropolis-Hastings rule [27–29],

$$A(s \to s') = \min\left[ 1, \frac{p(s)}{p(s')} \cdot \frac{\pi(s')}{\pi(s)} \right]. \qquad (1)$$

Equation (1) shows that the BM guides the simulation and increases the acceptance rate by exploiting the knowledge learned from data. In particular, one can even achieve a rejection-free Monte Carlo simulation scheme if the BM perfectly describes the target probability distribution. This is in principle possible because the BM is a universal approximator of discrete probability distributions [30–32].
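Since only ratios enter Eq. (1), unnormalized weights suffice on both sides, and it is convenient to evaluate the test in log space. A minimal sketch (the function name and signature are ours, not from the paper):

```python
import numpy as np

def accept(log_p_s, log_p_sp, log_pi_s, log_pi_sp, rng):
    """Metropolis-Hastings test of Eq. (1) for a BM-recommended move s -> s'.
    log_p_* are unnormalized log weights of the BM marginal p(s), and
    log_pi_* those of the physical distribution pi(s)."""
    log_A = min(0.0, (log_p_s - log_p_sp) + (log_pi_sp - log_pi_s))
    return np.log(rng.random()) < log_A
```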



The expressive powers of BM were studied recently from physics perspectives [33–36]. See Refs. [8, 37, 38] for other recent applications of BM to quantum and statistical physics problems.

Reference [16] argues that the efficient simulation of the BM, and more importantly, its ability to capture high-level features make BM ideal recommender systems to accelerate Monte Carlo simulation of challenging physical problems. In particular, Ref. [16] employs a restricted architecture of BM where the connections are limited to be between the visible and hidden units. Such a restricted BM can be sampled efficiently by blocked Gibbs sampling alternating between the hidden and visible units. The generative sampling of the restricted BM already gives rise to nonlocal updates because the BM learns about collective density correlations from the Monte Carlo data. This suggests that besides being used as a general purpose recommender engine for accelerating Monte Carlo simulations, the BM may discover conceptually new efficient updates.

In this paper, we demonstrate the BM's creative power by exact constructions and then present a general framework to exploit this power. The crucial insight is that the hidden units of the BM can learn to mediate complex interactions between the visible units and decouple the visible units into disconnected clusters. The generative sampling of the BM then automatically proposes efficient cluster updates. To encourage these discoveries, it is crucial to design the BM with a suitable architecture and allow its parameters to adapt to the physical distribution via learning.

Example: Ising model– To make the discussion concrete, we start with the classical Ising model and show that the generative sampling of the BM encompasses a wide range of celebrated cluster algorithms [12, 39–42]. The Boltzmann weight of the Ising model reads

$$\pi(s) = \exp\left( \beta J \sum_{\ell} \prod_{i \in \ell} s_i \right), \qquad (2)$$

where $\beta = 1/T$ is the inverse temperature and $J$ is the coupling constant. We consider ferromagnetic coupling $J > 0$ in the following for clarity; the considerations are nevertheless general and valid for the antiferromagnetic case as well. Equation (2) consists of a summation over the links $\ell$ of a lattice and a product over the Ising spins $s_i \in \{-1, 1\}$ residing on the vertices connected by each link.

To devise a BM-inspired cluster update of the Ising model, we consider the architecture illustrated in Fig. 2(a). We view the Ising spins as visible variables and introduce binary hidden variables $h_\ell \in \{0, 1\}$ on the links of the lattice. These units are coupled according to the following energy function

$$E(s,h) = -\sum_{\ell} \left( W \prod_{i \in \ell} s_i + b \right) h_\ell. \qquad (3)$$

Equation (3) is a high-order BM [43] because the interaction consists of three-spin terms (one hidden unit and two visible units). Similar architectures were discussed in the machine learning literature under the name three-way Boltzmann Machines [44–46].


Figure 2. (a) The Boltzmann Machine Eq. (3) reproduces cluster Monte Carlo algorithms of the Ising model Eq. (2). Solid dots residing on the vertices are the visible units representing the Ising spins; red and blue colors denote Ising spin up and down. The squares residing on the links are the binary hidden units, where white and gray colors indicate the inactive ($h_\ell = 0$) or active ($h_\ell = 1$) status of the hidden unit. The effective interaction between the visible units can either be $W$ (thick links) or $0$ (thin links). (b) The sampling of the BM. Given the visible units, we sample the hidden units according to Eq. (4). The inactive hidden units (white squares) divide the visible units into disconnected components, which can be flipped collectively at random.

In light of the translational invariance of the Ising model (2), we use the same connection weight $W$ and bias $b$ for all the links. The BM energy function Eq. (3) therefore contains only two free parameters.

To perform generative sampling of the BM (3), we proceed in two steps by exploiting its particular architecture shown in Fig. 2. First, given a set of visible Ising spins, we can readily perform direct sampling of the hidden units. This is possible because the conditional probability factorizes into a product over the links, $p(h|s) = p(s,h)/p(s) = \prod_\ell p(h_\ell|s)$, where

$$p(h_\ell = 1|s) = \sigma\left( W \prod_{i \in \ell} s_i + b \right), \qquad (4)$$

and $\sigma(z) = 1/(1 + e^{-z})$ is the sigmoid activation function. As shown in Fig. 2(a), the inactive hidden units (white squares) divide the lattice into disconnected components, since $h_\ell = 0$ in Eq. (3) decouples the visible Ising spins residing on the link $\ell$. One can therefore identify the connected components using a union-find algorithm [47, 48] and flip all the visible Ising spins within each component collectively at random. This cluster move respects the statistical weight of the BM Eq. (3) due to the $Z_2$ symmetry of the visible Ising spins in the energy function.
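A minimal sketch of this two-step generative sampling, assuming a periodic $L \times L$ square lattice of Ising spins stored in a numpy array; the function names (`bm_cluster_update`, `find`) are ours, not from the paper:

```python
import numpy as np

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def bm_cluster_update(spins, W, b, rng):
    """One generative-sampling step of the BM Eq. (3): sample the hidden
    link units via Eq. (4), then flip each connected component of the
    active links at random."""
    L = spins.shape[0]
    parent = np.arange(L * L)  # union-find forest over lattice sites
    for x in range(L):
        for y in range(L):
            i = x * L + y
            # right and down neighbors cover each link of the lattice once
            for nx, ny in (((x + 1) % L, y), (x, (y + 1) % L)):
                # Eq. (4): activation probability of the hidden unit on this link
                p = 1.0 / (1.0 + np.exp(-(W * spins[x, y] * spins[nx, ny] + b)))
                if rng.random() < p:  # h_l = 1 keeps the two sites coupled
                    parent[find(parent, i)] = find(parent, nx * L + ny)
    # Z2 symmetry: flip every connected component with probability 1/2
    flip = rng.random(L * L) < 0.5
    for i in range(L * L):
        if flip[find(parent, i)]:
            spins[i // L, i % L] *= -1
    return spins
```

With $W$ and $b$ chosen to satisfy Eq. (5) below, every update recommended by this routine is accepted.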

Combining the two steps in Fig. 2(b) forms an update of the visible units of the BM. Recommending the update to the Ising model Monte Carlo simulation, it is accepted with the probability Eq. (1) [29]. Matching the unnormalized marginal distribution of the BM (3), $p(s) = \prod_\ell \left( 1 + e^{W \prod_{i \in \ell} s_i + b} \right)$, and the Ising model Boltzmann weight Eq. (2) gives rise to a rejection-free Monte Carlo scheme. The resulting condition

$$\frac{1 + e^{b+W}}{1 + e^{b-W}} = e^{2\beta J} \qquad (5)$$

can always be satisfied with appropriately chosen $W$ and $b$.
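Indeed, substituting $u = e^W$ turns Eq. (5) into the quadratic $e^b u^2 + (1 - e^{2\beta J})\, u - e^{2\beta J + b} = 0$, whose positive root gives $W$ for any bias $b$. A minimal numerical sketch (the function name `solve_W` is ours):

```python
import numpy as np

def solve_W(beta_J, b):
    """Solve Eq. (5) for the connection weight W at a given bias b,
    via the positive root of the quadratic in u = e^W."""
    c = np.exp(2.0 * beta_J) - 1.0
    u = (c + np.sqrt(c**2 + 4.0 * np.exp(2.0 * (b + beta_J)))) / (2.0 * np.exp(b))
    return np.log(u)

# The two limiting cases examined in the text:
beta_J = 0.44
print(solve_W(beta_J, b=-20.0) - 20.0)  # b -> -inf: W + b -> ln(e^{2 beta J} - 1)
print(solve_W(beta_J, b=+20.0))         # b -> +inf: W -> beta J
```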


It is instructive to examine the BM-recommended updates in two limiting cases. In the limit $b \to -\infty$, the solution of Eq. (5) reads $W + b = \ln(e^{2\beta J} - 1)$. Thus, the conditional sampling of the hidden units Eq. (4) will set $h_\ell = 1$ with probability $\sigma(W + b) = 1 - e^{-2\beta J}$ if the link connects two parallel spins, $\prod_{i \in \ell} s_i = 1$, while it will always set the hidden unit to inactive, $h_\ell = 0$, if the link connects two anti-parallel spins. Combined with the random cluster flip of the visible units, this BM-recommended update shown in Fig. 2(b) exactly reproduces the Swendsen-Wang cluster algorithm [12] for the Ising model.

In the opposite limit $b \to \infty$, the solution of Eq. (5) approaches $W = \beta J$. In this limit, all the hidden units are frozen to $h_\ell = 1$ because the activation function in Eq. (4) saturates no matter whether the visible Ising spins are aligned or not. The BM Eq. (3) then trivially reproduces the Ising model statistics by copying its coupling constant $\beta J$ to the connection weight $W$. In this limit the BM-recommended update shown in Fig. 2(b) is a trivial global flip of the visible Ising spins.

In between the above two limiting cases, the BM still recommends valid rejection-free Monte Carlo updates for the Ising model. These updates correspond to Niedermayer's cluster algorithm [39], where the sites are randomly connected into clusters according to Eq. (4) and the clusters may contain misaligned visible spins. The bias parameter $b$ in Eq. (4) controls the activation threshold of the hidden units and thus affects the average cluster size. In essence, the BM (3) encompasses several general cluster Monte Carlo frameworks, including the Kandel-Domany [40] and the dual Monte Carlo [41, 42] algorithms. In these algorithms, the Monte Carlo sampling alternates between the physical degrees of freedom and auxiliary graphical variables corresponding to the hidden units of the BM.

Example: Ising model with plaquette interactions– The potential of the BM goes beyond reproducing existing algorithmic frameworks [12, 39–42]. By further exploiting the power of its latent representations, one can make nontrivial algorithmic discoveries. We illustrate this using the plaquette Ising model [17] as an example. The Boltzmann weight reads

$$\pi(s) = \exp\left( \beta J \sum_{\ell} \prod_{i \in \ell} s_i + \beta K \sum_{\wp} \prod_{i \in \wp} s_i \right), \qquad (6)$$

where the second term contains four-spin interactions on each square plaquette, denoted by $\wp$. We consider $K > 0$ for concreteness. Since no simple and efficient cluster algorithm is known, Ref. [17] fits the Boltzmann weight Eq. (6) to an ordinary Ising model Eq. (2) and proposes Monte Carlo updates by simulating the latter model with cluster algorithms [13, 49]. However, the acceptance rate decreases for large systems due to imperfect fitting, and the approach ends up showing scaling behavior similar to that of the single spin flip update.

Here we construct a BM which suggests an efficient, unbiased, and rejection-free cluster Monte Carlo algorithm for Eq. (6).

Figure 3. The Boltzmann Machine Eq. (8) suggests a new cluster update for the plaquette Ising model (6). Red/blue dots on the vertices denote the visible Ising spins, and white/gray squares in the plaquette centers denote the hidden units. The double arrows indicate the two parallel links $\ell_\wp$, $\bar\ell_\wp$ composing the plaquette $\wp$. The hidden units are sampled directly according to Eq. (9), where the breakup of the plaquette into parallel links is chosen at random. Once the hidden units are given, Eq. (8) reduces to an inhomogeneous Ising model where the visible spins are coupled with the modified coupling strengths $\beta J$, $\beta J + W/2$, and $\beta J - W/2$, indicated by the thickness of the links.

First, we decompose the four-spin plaquette interaction using the Hubbard-Stratonovich (HS) transformation [50, 51]

$$\exp\left( \beta K \prod_{i \in \wp} s_i \right) = \frac{e^{-\beta K}}{2} \sum_{h_\wp \in \{0,1\}} \exp\left[ W \left( h_\wp - \frac{1}{2} \right) F_\wp(s) \right], \qquad (7)$$

where $W = \operatorname{arccosh}(e^{2\beta K})$ is the coupling strength between the binary HS field $h_\wp$ and the sum of two-spin products $F_\wp(s) = \prod_{i \in \ell_\wp} s_i + \prod_{i \in \bar\ell_\wp} s_i$ defined on the plaquette. The two parallel links $\ell_\wp$ and $\bar\ell_\wp$ constitute the plaquette $\wp$, see Fig. 3.

Equation (7) is equivalent to the discrete HS transformation widely adopted for Hubbard models [52]. Regarding the HS field $h_\wp$ as a hidden unit, the following BM

$$E(s,h) = -\sum_{\ell} \left[ \beta J + W \sum_{\wp} \left( h_\wp - \frac{1}{2} \right) \left( \delta_{\ell \ell_\wp} + \delta_{\ell \bar\ell_\wp} \right) \right] \prod_{i \in \ell} s_i \qquad (8)$$

exactly reproduces Eq. (6) after marginalization. Since Eq. (7) holds for an arbitrary partition of the plaquette into two parallel links, $\ell_\wp \cup \bar\ell_\wp = \wp$ and $\ell_\wp \cap \bar\ell_\wp = \emptyset$, we choose a vertical or horizontal breakup at random for each plaquette.

Simulation of the BM Eq. (8) suggests an efficient cluster update for the original plaquette Ising model (6). First of all, sampling the hidden variables given the visible Ising spins is straightforward, since the conditional probability factorizes over the plaquettes, $p(h|s) = \prod_\wp p(h_\wp|s)$, where

$$p(h_\wp = 1|s) = \sigma\left( W F_\wp(s) \right). \qquad (9)$$

Therefore, the hidden unit of each plaquette activates independently given the local feature $F_\wp(s)$. Next, once the hidden variables are given, the BM Eq. (8) corresponds to an Ising model with two-spin interactions only, shown in Fig. 3. One can sample it efficiently using cluster updates [12], taking into account the randomly modified coupling strengths. As discussed above, this amounts to introducing another set of hidden units which play the role of auxiliary graphical variables. Finally, according to Eq. (1), the updates of the visible Ising spins are always accepted because the BM Eq. (8) exactly reproduces the statistics of the plaquette Ising model Eq. (6).
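A schematic sketch of the first step, assuming a periodic square lattice with couplings stored per link (index 0 for horizontal, 1 for vertical links); the function names are ours:

```python
import numpy as np

def link_product(spins, o, x, y):
    """Two-spin product on a link: o = 0 horizontal, o = 1 vertical."""
    L = spins.shape[0]
    if o == 0:
        return spins[x, y] * spins[x, (y + 1) % L]
    return spins[x, y] * spins[(x + 1) % L, y]

def plaquette_hidden_step(spins, beta_J, beta_K, rng):
    """Sample the hidden units of the BM Eq. (8) and return the modified
    couplings K[link] of the effective inhomogeneous Ising model (Fig. 3)."""
    L = spins.shape[0]
    W = np.arccosh(np.exp(2.0 * beta_K))
    K = np.full((2, L, L), beta_J)  # start from the uniform two-spin coupling
    for x in range(L):
        for y in range(L):
            xp, yp = (x + 1) % L, (y + 1) % L
            # random breakup of the plaquette into two parallel links
            if rng.random() < 0.5:
                links = ((0, x, y), (0, xp, y))  # the two horizontal links
            else:
                links = ((1, x, y), (1, x, yp))  # the two vertical links
            F = sum(link_product(spins, *lk) for lk in links)
            h = rng.random() < 1.0 / (1.0 + np.exp(-W * F))  # Eq. (9)
            for lk in links:  # shift the couplings by +/- W/2, cf. Fig. 3
                K[lk] += W * (h - 0.5)
    return K
```

Given K, a Swendsen-Wang-style sweep connects neighboring sites with probability $\max(0,\, 1 - e^{-2 K_\ell s_i s_j})$ and flips the resulting clusters; by construction the move is then always accepted.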


To demonstrate the efficiency of the discovered cluster update, we simulate the plaquette Ising model (6) in the vicinity of the critical point and compare the performance to the simple local update algorithm. Figure 4(a) shows the Binder ratio $\langle (\sum_i s_i)^4 \rangle / \langle (\sum_i s_i)^2 \rangle^2$ for various system sizes at $K/J = 0.2$, which indicates a critical temperature $T/J = 2.4955(5)$. The black dashed line indicates the universal critical value of the Binder ratio, $1.1679$, corresponding to the two-dimensional Ising universality class [53]. Figure 4(b) shows the energy autocorrelation times [54] of the local updates and the cluster updates at the critical point, both measured in units of Monte Carlo sweeps of the visible spins [55]. The local updates exhibit the same scaling for the Ising model ($K = 0$) and the plaquette Ising model ($K = 0.2$), while the cluster updates are orders of magnitude more efficient than the local updates. The dynamic exponent of the cluster algorithm is also significantly reduced compared to the local update.

General framework– To sum up, we outline a general framework for discovering cluster updates using the following BM

$$E(s,h) = -E(s) - \sum_{\alpha} \left[ W_\alpha F_\alpha(s) + b_\alpha \right] h_\alpha, \qquad (10)$$

where $F_\alpha(s)$ is a feature of the visible units and $h_\alpha \in \{0, 1\}$ is the corresponding hidden variable. $W_\alpha$ and $b_\alpha$ are the connection weight and bias, and $\alpha$ indexes the various features. For example, Eq. (3) and Eq. (8) used features defined on links ($\alpha = \ell$) and on plaquettes ($\alpha = \wp$), respectively. In general, one is free to design features consisting of long-range or even multispin interactions [56]. There are several crucial points in the design of Eq. (10). First, one can easily sample the hidden units conditioned on these features, since there is no interaction between the hidden variables, i.e. Eq. (10) is a semi-restricted BM. The activation probability of each hidden unit is $p(h_\alpha = 1|s) = \sigma\left( W_\alpha F_\alpha(s) + b_\alpha \right)$, as sketched below. Second, once the hidden units are given, Eq. (10) reduces to an effective model for the visible spins, which should be easier to sample than the original problem. For example, one can randomly flip each disconnected component separated by the inactive hidden units if $E(s) = 0$ and $F_\alpha(s) = F_\alpha(-s)$. Alternatively, one can build another BM to simplify the sampling of Eq. (10) given the hidden units. One can even apply this idea iteratively and build a hierarchy of BMs. Overall, the key design principle is to choose appropriate features in Eq. (10) such that the BM correctly reproduces the distribution of the physical problem and is easy to simulate. A good design is likely to exploit knowledge of the original physical problem [57].
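The first point is a one-liner in practice; a minimal sketch, assuming the features $F_\alpha(s)$ have already been evaluated into an array (the function name `sample_hidden` is ours):

```python
import numpy as np

def sample_hidden(F, W, b, rng):
    """Conditional sampling for the semi-restricted BM Eq. (10): each hidden
    unit activates independently with probability sigma(W_a F_a(s) + b_a)."""
    z = W * F + b  # one entry per feature index alpha
    return rng.random(z.shape) < 1.0 / (1.0 + np.exp(-z))

# Example: uniform link features F_l(s) = prod_{i in l} s_i recover Eq. (4).
rng = np.random.default_rng(0)
F = np.array([1.0, -1.0, 1.0])  # hypothetical feature values
h = sample_hidden(F, W=2.0, b=-1.0, rng=rng)
```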

The BM Eq. (10) provides a general paradigm for discovering efficient cluster updates automatically from data, because its parameters are learnable, see Fig. 1. Although we adopted a constructive approach in this paper, in general the hidden units of the BMs can learn to be auxiliary graphical variables or HS fields. The BM learning can be done in multiple ways: through unsupervised learning of the configuration data [37, 58], through supervised learning of the unnormalized target distribution $\pi(s)$ [16], or even through reinforcement learning [59] by optimizing the autocorrelation time of the Monte Carlo samples.

Figure 4. Results for the Ising model with four-spin plaquette interactions (6) on square lattices with linear size $L$. (a) Binder ratio versus $1/L$ at temperatures $T = 2.494$, $2.495$, $2.496$, and $2.497$, obtained using the cluster update suggested by the BM Eq. (8) at $K/J = 0.2$. The dashed line indicates the universal value for the two-dimensional Ising universality class. (b) Energy autocorrelation time versus $L$ for the local and cluster updates at $K = 0$ and $K = 0.2$; the cluster update improves the energy autocorrelation time by orders of magnitude at the critical point compared to the local update.


In closing, we note that many cluster quantum Monte Carlo algorithms [60, 61] share the framework of [40–42]. Generalization of the hidden units to higher integers or even continuous variables is likely to increase the capacity of the BM; in this case one can include higher-order self-interactions of the hidden variables in Eq. (10). To this end, these BMs provide concrete parametrizations of valid Monte Carlo update policies which can be optimized through learning. This approach opens the promise of discovering practically useful Monte Carlo algorithms for a broad range of problems, such as frustrated magnets or correlated fermions, where known efficient cluster updates are rare. Exploring more general and powerful BM architectures in these settings may lead to even more exciting algorithmic discoveries.

L.W. thanks Li Huang and Yi-feng Yang for collaborations on [16, 19] and acknowledges Jing Chen, Junwei Liu, Yang Qi, Zi-Yang Meng, and Tao Xiang for useful discussions. L.W. thanks Li Huang for comments on the manuscript. L.W. is supported by the Ministry of Science and Technology of China under Grant No. 2016YFA0302400 and the start-up grant of IOP-CAS. The simulation is performed on the Tianhe-2 supercomputer at the National Supercomputer Center in Guangzhou. We use the ALPS library [62] for the Monte Carlo data analysis.


∗ [email protected]

[1] P. Langley, H. A. Simon, G. L. Bradshaw, and J. M. Zytkow, Scientific Discovery: Computational Explorations of the Creative Processes (MIT Press, 1987).
[2] M. Alai, Minds and Machines 14, 21 (2004).
[3] N. J. Nilsson, The Quest for Artificial Intelligence (Cambridge University Press, 2009).
[4] Y. Gil, M. Greaves, J. Hendler, and H. Hirsh, Science 346, 171 (2014).
[5] J. Carrasquilla and R. G. Melko, Nature Physics (2017), 10.1038/nphys4035.
[6] E. P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nature Physics (2017), 10.1038/nphys4037.
[7] L. Wang, Physical Review B 94, 195105 (2016).
[8] G. Carleo and M. Troyer, Science 355, 602 (2017).
[9] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, Nature 518, 529 (2015).
[10] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, Nature 529, 484 (2016).
[11] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, Physics Letters B (1987).
[12] R. H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 58, 86 (1987).
[13] U. Wolff, Phys. Rev. Lett. 62, 361 (1989).
[14] H. G. Evertz, G. Lana, and M. Marcu, Phys. Rev. Lett. 70, 875 (1993).
[15] N. Prokof'ev and B. Svistunov, Phys. Rev. Lett. 87, 160601 (2001).
[16] L. Huang and L. Wang, Physical Review B 95, 035105 (2017).
[17] J. Liu, Y. Qi, Z. Y. Meng, and L. Fu, Physical Review B 95, 041101 (2017).
[18] J. Liu, H. Shen, Y. Qi, Z. Y. Meng, and L. Fu, arXiv:1611.09364 (2016).
[19] L. Huang, Y.-f. Yang, and L. Wang, arXiv:1612.01871 (2016).
[20] X. Y. Xu, Y. Qi, J. Liu, L. Fu, and Z. Y. Meng, arXiv:1612.03804 (2016).
[21] R. M. Neal, Bayesian Learning for Neural Networks (Springer Science & Business Media, 1996).
[22] J. S. Liu, Monte Carlo Strategies in Scientific Computing (Springer Science & Business Media, 2008).
[23] C. E. Rasmussen, J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, in Bayesian Statistics 7 (2003) pp. 651–659.
[24] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, Cognitive Science 9, 147 (1985).
[25] G. E. Hinton and T. J. Sejnowski, in Parallel Distributed Processing, Vol. 1 (1986) pp. 282–317.
[26] http://deeplearning.net/tutorial/rbm.html.
[27] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, The Journal of Chemical Physics 21, 1087 (1953).
[28] W. K. Hastings, Biometrika 57, 97 (1970).
[29] See the Supplementary Materials for a proof of Eq. (1).
[30] Y. Freund and D. Haussler, in Proceedings of the 4th International Conference on Neural Information Processing Systems (NIPS'91) (Morgan Kaufmann Publishers, San Francisco, CA, 1994) pp. 912–919.
[31] N. Le Roux and Y. Bengio, Neural Computation 20, 1631 (2008).
[32] G. Montufar and N. Ay, Neural Computation 23, 1306 (2011).
[33] J. Chen, S. Cheng, H. Xie, L. Wang, and T. Xiang, arXiv:1701.04831 (2017).
[34] D.-L. Deng, X. Li, and S. Das Sarma, arXiv:1701.04844 (2017).
[35] X. Gao and L.-M. Duan, arXiv:1701.05039 (2017).
[36] Y. Huang and J. E. Moore, arXiv:1701.06246 (2017).
[37] G. Torlai and R. G. Melko, Physical Review B 94, 165134 (2016).
[38] G. Torlai and R. G. Melko, arXiv:1610.04238 (2016).
[39] F. Niedermayer, Phys. Rev. Lett. 61, 2026 (1988).
[40] D. Kandel and E. Domany, Physical Review B 43, 8539 (1991).
[41] N. Kawashima and J. E. Gubernatis, Physical Review E 51, 1547 (1995).
[42] D. M. Higdon, Journal of the American Statistical Association 93, 585 (1998).
[43] T. J. Sejnowski, in AIP Conference Proceedings, Vol. 151 (AIP, 1986) pp. 398–403.
[44] R. Memisevic and G. Hinton, in Computer Vision and Pattern Recognition (2007) pp. 1–8.
[45] R. Memisevic and G. E. Hinton, Neural Computation 22, 1473 (2010).
[46] A. Krizhevsky and G. E. Hinton, in International Conference on Artificial Intelligence and Statistics (2010) pp. 621–628.
[47] R. Sedgewick and K. D. Wayne, Algorithms (Addison-Wesley Professional, 2011).
[48] J. Gubernatis, N. Kawashima, and P. Werner, Quantum Monte Carlo Methods: Algorithms for Lattice Models (Cambridge University Press, 2016).
[49] G. T. Barkema and J. F. Marko, Phys. Rev. Lett. 71, 2070 (1993).
[50] R. Stratonovich, in Soviet Physics Doklady, Vol. 2 (1957) p. 416.
[51] J. Hubbard, Phys. Rev. Lett. 3, 77 (1959).
[52] J. E. Hirsch, Physical Review B 28, 4059 (1983).
[53] J. Salas and A. D. Sokal, Journal of Statistical Physics 98, 551 (2000).
[54] V. Ambegaokar and M. Troyer, American Journal of Physics 78, 150 (2010).
[55] M. Newman and G. T. Barkema, Monte Carlo Methods in Statistical Physics (Oxford, 1999).
[56] Introducing two sets of hidden units coupled respectively to the features $F_\ell(s) = \prod_{i\in\ell} s_i$ and $F_\wp(s) = \prod_{i\in\wp} s_i$ gives an alternative cluster algorithm for the plaquette Ising model Eq. (6). The resulting algorithm is a simple generalization of the Swendsen-Wang algorithm [12] to the case of plaquette units. A satisfied plaquette, $F_\wp(s) = 1$, is connected with probability $1 - e^{-2\beta K}$, while an unsatisfied plaquette is always disconnected. However, this algorithm has a larger average cluster size compared to the one presented in the text, because the visible units are more likely to be connected with the plaquette decomposition.
[57] When all the hidden units are frozen to the active state by an infinitely strong bias, the marginal probability of the BM (10) approaches $p(s) = e^{E(s)+\sum_\alpha W_\alpha F_\alpha(s)}$. In this limit the BM reduces to an effective Hamiltonian of the visible spins with the designed features. One thus loses the flexibility empowered by the latent structures. Moreover, the lack of nonlinearity in the marginal distribution induced by the dynamical hidden units also reduces the fitting capacity of the BM.
[58] G. E. Hinton, Neural Computation 14, 1771 (2002).
[59] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA, USA, 1998).
[60] H. G. Evertz, Advances in Physics 52, 1 (2003).
[61] N. Kawashima and K. Harada, Journal of the Physical Society of Japan 73, 1379 (2004).
[62] B. Bauer et al., Journal of Statistical Mechanics: Theory and Experiment 2011, P05001 (2011).


Detailed balance condition of Eq. (1)

The acceptance probability of the recommended update from the restricted Boltzmann Machine was derived in [16]. We repeat the derivation for the BMs considered in the main text for the convenience of the readers.

First of all, the Metropolis-Hastings [27, 28] acceptance rate of the physical model satisfies

$$A(s \to s') = \min\left[ 1, \frac{T(s' \to s)}{T(s \to s')} \cdot \frac{\pi(s')}{\pi(s)} \right]. \qquad (11)$$

The transition probability is recommended from the simulation of the BM, where we sample alternately between the hidden and visible units, i.e., $T(s \to s') = F_h(s \to s')\, p(h|s)$. Here $F_h$ denotes the update of the visible units given the hidden variables $h$. In Ref. [16] we used $F_h(s \to s') = p(s'|h)$, as the conditional probability given the hidden units of the restricted BM is simply tractable. For the BMs considered in this paper, we adopt more sophisticated updates, such as cluster-flip transitions of the visible units. The requirement on $F_h$ is that it respects the joint probability distribution of the BM given the hidden units,

$$F_h(s \to s')\, p(s,h) = F_h(s' \to s)\, p(s',h). \qquad (12)$$

The ratio of the transition probabilities thus satisfies

T (s→ s′)T (s′ → s)

=Fh(s→ s′)p(h|s)Fh(s′ → s)p(h|s′)

=p(s′,h)p(h|s)p(s,h)p(h|s′)

=p(s′)p(s)

. (13)

Substituting Eq. (13) into the Metropolis-Hastings acceptance probability Eq. (11), we obtain Eq. (1) of the main text.