future of integer calibration weighting methods...dse: dual-system estimation nass uses dse to...

43
Future of Integer Calibration Weighting Methods Luca Sartore National Institute of Statistical Sciences (NISS) USDA National Agricultural Statistics Service [email protected] [email protected] SDSS 2018 Data Science at the National Institute of Statistical Sciences May 17, 2018 ... providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1

Upload: others

Post on 11-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Future of Integer Calibration Weighting Methods

Luca Sartore

National Institute of Statistical Sciences (NISS)USDA National Agricultural Statistics Service

[email protected][email protected]

SDSS 2018Data Science at the National Institute of Statistical Sciences

May 17, 2018

“. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1

Page 2: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum Computing:The Future of Integer Calibration

Luca Sartore

National Institute of Statistical Sciences (NISS)USDA National Agricultural Statistics Service

[email protected][email protected]

SDSS 2018Data Science at the National Institute of Statistical Sciences

May 17, 2018

“. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.” 2

Page 3: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Acknowledgements

Valbona BejleriKatherine CespedesMatt FetterThomas JacobJohnathan LisicBeth SchleinClifford SpiegelmanKelly ToppinLinda Young

SDSS 2018 – Data Science at NISS – Luca Sartore 3

Page 4: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Presentation Outline

PART I: MOTIVATION

1. Census of Agriculture

2. The calibration problem at NASS

PART II: METHODOLOGY

3. Quantum computing

4. Quantum rounding algorithm

5. Quantum integer calibration

PART III: APPLICATION

6. Simulation study

7. Concluding remarks

SDSS 2018 – Data Science at NISS – Luca Sartore 4

Page 5: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

PART I

MOTIVATION

1. Census of Agriculture

2. The calibration problem at NASS

SDSS 2018 – Data Science at NISS – Luca Sartore 5

Page 6: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Census of Agriculture

Every five years, USDA’s National Agricultural Statistics Service(NASS) conducts the Census of Agriculture.

I The Census provides a detailed picture of U.S. farms, ranchesand the people who operate them.

I It is the only source of uniform, comprehensive agriculturaldata for every state and county in the United States.

I NASS also obtains information on most commodities fromadministrative sources or surveys of non-farm populations(e.g. cotton ginning data).

SDSS 2018 – Data Science at NISS – Luca Sartore 6

Page 7: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

DSE: Dual-System Estimation

NASS uses DSE to adjust its estimates by generating weightsassigned to each data-record.

I DSE requires two independent surveys to produce adjustedestimates for under-coverage, non-response and incorrectfarm-classification at the national, state and county levels.

I The adjusted weights are used as starting values for thecalibration process.

I The weights are calibrated to ensure that the Censusestimates are consistent across all levels of aggregation and inagreement with information from other sources.

SDSS 2018 – Data Science at NISS – Luca Sartore 7

Page 8: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Calibration problemA solution w such that T = Aw , where

T is a vector partitioned into y known and y∗ unknownpopulation totals,

A is the matrix of collected data from a population, and

w is a vector of unknown weights.

Calibration finds the solution of the linear system y = Aw , where

A is a sub-matrix of the collected data.

NASS publishes its estimates by usinginteger weights

to avoid fractional farms.

SDSS 2018 – Data Science at NISS – Luca Sartore 8

Page 9: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

INCA: Integer CalibrationCurrently, NASS uses the following steps to calibrate its weights:

1. All unfeasible weights are truncated to their closest boundary,and to minimize the objective function, non-integer weightsare then rounded sequentially according to an importanceindex based on the gradient.

2. Each weight, according to the magnitude of the gradient, isallowed to move by unit-steps that decrease the objectivefunction.

Limitation

INCA converges to a local minima,not to a global solution.

SDSS 2018 – Data Science at NISS – Luca Sartore 9

Page 10: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

PART II

METHODOLOGY

3. Quantum computing

4. Quantum rounding algorithm

5. Quantum integer calibration

SDSS 2018 – Data Science at NISS – Luca Sartore 10

Page 11: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum Computing and the future of INCA

Quantum computing

Looks promising for applications wherecomplex problems need

computationally efficient solutions, such asfinding a discrete global optimum.

Quantum Integer Calibration (QUINCA)I Rounds the DSE weights with a quantum search.I Performs multidimensional adjustments of the rounded

weights to match given calibration benchmarks.

SDSS 2018 – Data Science at NISS – Luca Sartore 11

Page 12: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

The quantum-bit (qubit)

Classical computers use bits to represent either zeros or ones, butquantum computers operate with qubits (or quantum bits), whichare allowed to denote values that are simultaneously 0 and 1.

Observing the status of a qubit with superposition α0|0〉+ α1|1〉will produce

x =

{0, with probability |α0|2,1, with probability |α1|2,

where

I the amplitudes of α0 and α1 ∈ C satisfy |α0|2 + |α1|2 = 1,

I |0〉 denotes the vector (1, 0)>, and

I |1〉 represents the vector (0, 1)>.

SDSS 2018 – Data Science at NISS – Luca Sartore 12

Page 13: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum operations

Quantum algorithms manipulate the qubits and producemeasurements of the probability distribution of the observedoutcomes.

1. The algorithm takes as input n classical bits and creates asuperposition of 2n possible states.

2. The superposition is then processed by quantum operations.

3. When the superposition is measured, it randomly collapses tozero or one.

4. Step 3 is iterated to assure convergence over the probabilitydistribution.

SDSS 2018 – Data Science at NISS – Luca Sartore 13

Page 14: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum rounding at a glance

SDSS 2018 – Data Science at NISS – Luca Sartore 14

Page 15: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Example of quantum roundingI We have a matrix of data

A =

[1 1 16 4 8

].

I We start with the DSE weights w∗ = (2.3, 5.1, 7.9)>.I Our known totals are y = (15, 97)>.I We consider the objective function

L(w) =∑i

|yi − w>Ai |,

where Ai is the i-th row of the matrix A.I The gradient of L

g(w) = −A>sign(y − Aw).

SDSS 2018 – Data Science at NISS – Luca Sartore 15

Page 16: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum rounding at a glance (continued)

SDSS 2018 – Data Science at NISS – Luca Sartore 16

Page 17: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Step 1: Trimming the weightsI The weights that do not satisfy given constraints are trimmed

and all the others are truncated.I Update the weights

wi =

1, if w∗i < 1,

bw∗i c, if 1 ≤ w∗i < 6,

6, if w∗i ≥ 6,

where w∗i represents the i-th DSE weight.

In our example

w∗ = (2.3, 5.1, 7.9)>

w = (2, 5, 6)>

SDSS 2018 – Data Science at NISS – Luca Sartore 17

Page 18: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum rounding at a glance (continued)

SDSS 2018 – Data Science at NISS – Luca Sartore 18

Page 19: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Step 2: Compute probabilitiesA qubit |Ψi 〉 is initialized as |Ψi 〉 =

√1− pi |0〉+

√pi |1〉, where

the initial probabilities are computed as

pi =

{g−1(1) gi (w

∗ − w), if gi < 0 and 0 < w∗ − w < 1,

0, otherwise,

where g−1(1) denotes the inverse of the smallest component of thegradient, gi represents the i-th component of the gradient, for anyi = 1, . . . , n.

In our example

g = (−7, − 5, − 9)>

w∗ − w = (0.3, 0.1, 0)>

p = (0.23, 0.06, 0)>

SDSS 2018 – Data Science at NISS – Luca Sartore 19

Page 20: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum rounding at a glance (continued)

SDSS 2018 – Data Science at NISS – Luca Sartore 20

Page 21: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Step 3 & 4: Stable setting of γ0 and γ1

I The quantities γ0i = 1− pi and γ1i = pi are set so that

pi =γ1i

γ0i + γ1i.

I These two values, γ0i and γ1i , will be updated at eachiteration.

In our example

γ0 = (0.77, 0.94, 1)>

γ1 = (0.23, 0.06, 0)>

SDSS 2018 – Data Science at NISS – Luca Sartore 21

Page 22: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum rounding at a glance (continued)

SDSS 2018 – Data Science at NISS – Luca Sartore 22

Page 23: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Step 5 & 6: Observing the quantum stateI m binary vectors xj are generated by observing the quantum

states of the qubits j = 1, . . . ,m.I The performance of these vectors is evaluated by a loss

function Lj associated to xj .

In our example

The measurements (with m = 5) and their losses are

x1 = (1, 0, 0)> → L1 = 12

x2 = (0, 0, 0)> → L2 = 19

x3 = (1, 0, 0)> → L3 = 12

x4 = (1, 1, 0)> → L4 = 7

x5 = (0, 0, 0)> → L5 = 19

SDSS 2018 – Data Science at NISS – Luca Sartore 23

Page 24: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum rounding at a glance (continued)

SDSS 2018 – Data Science at NISS – Luca Sartore 24

Page 25: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Step 7: Updating the probabilities via γ0 and γ1

γ[τ+1]0i ← γ

[τ ]0i (1− λ) + λ

m∑j=1

(1− xji )

(L(m) − Lj

L(m) − L(1)

)2

,

γ[τ+1]1i ← γ

[τ ]1i (1− λ) + λ

m∑j=1

xji

(L(m) − Lj

L(m) − L(1)

)2

,

where

I L(1) denotes the minimum loss associated with the best fit,

I L(m) represents the maximum loss associated with the worst fit,

I the scalar λ ∈ [0, 1] is used to speed-up convergence.

In our example

For λ = 0.5,

γ0 = (0.38, 0.81, 1.34)>, and γ1 = (0.96, 0.53, 0)>

p = (0.71, 0.39, 0)>

SDSS 2018 – Data Science at NISS – Luca Sartore 25

Page 26: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Quantum rounding at a glance (continued)

SDSS 2018 – Data Science at NISS – Luca Sartore 26

Page 27: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Step 8: ConvergenceThese operations are iterated until convergence is achieved.

γ1i/(γ0i + γ1i )→ 0 ⇒ wi = bw∗i c

γ1i/(γ0i + γ1i )→ 1 ⇒ wi = dw∗i e

In our example

The probabilities p ≈ (1, 1, 0)>, so the vector of roundedweights is

w + p =

256

+

110

=

366

SDSS 2018 – Data Science at NISS – Luca Sartore 27

Page 28: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

QUINCA at a glance

SDSS 2018 – Data Science at NISS – Luca Sartore 28

Page 29: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

The calibration settingI Final integer weights are computed iteratively by unit

adjustments according to the sign of the gradient.I The feasible steps are computed by s � x .I The components of the vector s satisfy the equality

si = −sign(gi ) and those of x are generated by measuring thestatus of the qubits |Ψi 〉, for any i = 1, . . . , n.

In our example

The new gradient g and the vector s are

g = (−6, − 4, − 8)>

s = (1, 1, 1)>

SDSS 2018 – Data Science at NISS – Luca Sartore 29

Page 30: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

QUINCA at a glance (continued)

SDSS 2018 – Data Science at NISS – Luca Sartore 30

Page 31: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Step 1: Qubits initializationQubits |Ψi 〉 are initialized to take into account only for feasible adjustments inthe opposite direction of the gradient; i.e.

pi =

0, if gi > 0 and wi < 2,

0, if gi < 0 and wi > buic − 1,

|gi |/max(|g(1)|, |g(n)|

), otherwise.

These are used to initialize γ0 and γ1.

In our example

p = (0.75, 0, 0)>

γ0 = (0.25, 1, 1)>

γ1 = (0.75, 0, 0)>

SDSS 2018 – Data Science at NISS – Luca Sartore 31

Page 32: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

QUINCA at a glance (continued)

SDSS 2018 – Data Science at NISS – Luca Sartore 32

Page 33: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Step 2: Quantum adjustmentsI The performance of the m binary vectors is evaluated by the

loss function Lj , for any j = 1, . . . ,m.I γ0 and γ1 are updated as in the rounding algorithm.I When the ratio γ1i/(γ0i + γ1i ) converges, wi ← wi + si xi ,

where

xi =

{0, if γ0i > γ1i ,

1, otherwise.

In our example

The probabilities p ≈ (1, 0, 0)>, therefore

w + s � p =

366

+

111

�1

00

=

466

SDSS 2018 – Data Science at NISS – Luca Sartore 33

Page 34: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

QUINCA at a glance (continued)

SDSS 2018 – Data Science at NISS – Luca Sartore 34

Page 35: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Step 3: Convergence

I At each iteration the gradient is updated and a new set ofprobabilities are computed to initialize the qubits.

I The quantum calibration algorithm terminates when the ratioγ1i/(γ0i + γ1i )→ 0 for any i = 1, . . . , n.

In our example

The optimal solution was found in w = c(4, 6, 6)> with afinal loss of 2. No further adjustments are needed.

SDSS 2018 – Data Science at NISS – Luca Sartore 35

Page 36: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

PART III

APPLICATION

6. Simulation study

7. Concluding remarks

SDSS 2018 – Data Science at NISS – Luca Sartore 36

Page 37: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

A simulation study1) The data

I 150 weights ωi ∼ Gamma(3.333, 1).

I A 201× 150 matrix A is simulated such that

aki =

{1, if k = 1,

bkicki , otherwise,

where bki ∼ Bernoulli(0.3) and cki ∼ Poisson(4).

I Calibration benchmarks are computed as y = Aω.

I DSE weights are simulated from a U(0, 7.5).

I Final weights are restricted such that wi ∈ [1, 6].

SDSS 2018 – Data Science at NISS – Luca Sartore 37

Page 38: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

A simulation study (continued)2) The loss function and its gradient

The loss function is

Lj =201∑k=1

∣∣∣∣∣yk −150∑i=1

akiwi

∣∣∣∣∣ ,and its gradient is

gi = −201∑k=1

sign(εk)aki ,

where εj = yk −∑150

i=1 akiwi .

SDSS 2018 – Data Science at NISS – Luca Sartore 38

Page 39: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

A simulation study (continued)3) The setting for the experiments

I Investigating the performance of the algorithm with respect to

I number of measurements m ∈ {64, 101, 161, 256},I learning rate λ ∈ {0.50, 0.60, 0.69, 0.75}.

I The simulated vector of the DSE weights is the same for allthe combinations of m and λ.

SDSS 2018 – Data Science at NISS – Luca Sartore 39

Page 40: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

A simulation study (continued)4) The results

Table: Final loss after QUINCANumber of Learning Rate

Measurements 0.50 0.60 0.69 0.7564 1588 1532 1529 1437

101 1498 1362 1403 1330161 1585 1402 1468 1431256 1610 1590 1343 1327

Table: Elapsed time in secondsNumber of Learning Rate

Measurements 0.50 0.60 0.69 0.7564 0.75s 0.59s 0.82s 0.67s

101 1.74s 2.37s 1.61s 1.30s161 2.99s 3.27s 2.28s 3.04s256 3.19s 2.35s 6.03s 5.02s

SDSS 2018 – Data Science at NISS – Luca Sartore 40

Page 41: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Concluding remarks

I QUINCA is a preliminary improvement of INCA.

I The proposed methodology performs a quantum search that,by design, overcomes the limitations of INCA and finds bettervectors of integer calibrated weights.

I QUINCA adjusts the weights by performing multidimensionalsteps and has the potential of converging heuristically to aglobal solution.

I Future research can exploit quantum entanglement to movetowards a global solution in one step.

SDSS 2018 – Data Science at NISS – Luca Sartore 41

Page 42: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Selected References

Barnett, S. (2009). Quantum information, volume 16. Oxford University Press.

Dasgupta, S., Papadimitriou, C. H., and Vazirani, U. (2006). Algorithms.McGraw-Hill, Inc.

Grover, L. and Rudolph, T. (2002). Creating superpositions that correspond toefficiently integrable probability distributions. arXiv preprintquant-ph/0208112.

Koenker, R. (2005). Quantile regression. Cambridge University Press, NewYork.

Sartore, L. and Toppin, K. (2016). inca: Integer Calibration. R package version0.0.2.

Sartore, L., Toppin, K., Young, L., and Spiegelman, C. (2018). Developinginteger calibration weights for Census of Agriculture. Journal of Agricultural,Biological and Environmental Statistics, Accepted.

Young, L. J., Lamas, A. C., and Abreu, D. A. (2017). The 2012 Census ofAgriculture: a capture–recapture analysis. Journal of Agricultural, Biologicaland Environmental Statistics, 22(4):523–539.

SDSS 2018 – Data Science at NISS – Luca Sartore 42

Page 43: Future of Integer Calibration Weighting Methods...DSE: Dual-System Estimation NASS uses DSE to adjust its estimates by generating weights assigned to each data-record. I DSE requires

Thank you!

Questions?

Luca Sartore, PhD

[email protected]

[email protected]

SDSS 2018 – Data Science at NISS – Luca Sartore 43