cs229 final writeupcs229.stanford.edu/proj2016spr/report/028.pdf · production rate ( mcf / day )...

CS229 Final Project

Shale Gas Production Decline Prediction Using Machine Learning Algorithms

Wentao Zhang [email protected] Shaochuan Xu [email protected]

In the petroleum industry, oil companies sometimes purchase producing oil and gas wells from others instead of drilling new ones. The shale gas production decline curve is critical for assessing how much more natural gas a specific well can produce in the future. This matters greatly in acquisitions between oil companies, because a small underestimate or overestimate of future production may result in significantly undervaluing or overvaluing an oilfield. In this project, we first use Locally Weighted Linear Regression to predict future production from existing decline curves. We then apply K-means to group the decline curves into two categories, high and low productivity. We also use Principal Component Analysis to compute the eigenvectors of the covariance matrix and predict future production from them, both with and without K-means as a preprocessing step. Finally, the three methods are compared in terms of accuracy, measured by a standardized error.

• Dataset

The data we use are monthly production rate curves of thousands of shale gas wells, shown below. To handle curves of different lengths and the zero-production data points in some curves, we modify the data slightly. We replace every zero data point in a curve with a very small number, 0.0001, and we make all curves the same length by appending zeros, so that the data can be loaded into MATLAB as a matrix.
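The preprocessing described above can be sketched in a few lines. This is a minimal Python illustration rather than the MATLAB code used in the project; the function name `build_matrix` and the toy curves are assumptions made for the example.

```python
def build_matrix(curves, eps=1e-4):
    """Replace zero rates with eps, then right-pad every curve with zeros.

    Returns the padded matrix (list of equal-length rows) and the
    original length of each curve.
    """
    max_len = max(len(c) for c in curves)
    matrix, lengths = [], []
    for c in curves:
        # Genuine zero-production months become a small positive number.
        cleaned = [eps if v == 0 else float(v) for v in c]
        lengths.append(len(cleaned))
        # Padding zeros only fill the matrix out to a rectangular shape.
        matrix.append(cleaned + [0.0] * (max_len - len(cleaned)))
    return matrix, lengths


# Toy data (hypothetical): one well with a zero-rate month, one short well.
curves = [[900, 700, 0, 500], [300, 250]]
matrix, lengths = build_matrix(curves)
```

In this sketch, an exact zero in the matrix can only be padding, because measured zeros were replaced by 0.0001 beforehand.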

Figure 1 The 2152 decline curves used to learn to predict future production (plot title: shale gas production decline - all curves; x-axis: month; y-axis: production rate (mcf/day))

• Locally Weighted Linear Regression (LWLR)

Our goal is to predict the future gas production of a new well, given its historical production data and information from other wells with longer histories. Suppose that we randomly choose a decline curve r with n months in total. We want to use the first l months to predict the remaining (n-l) months of the curve. To find curves in the training set that are “similar” to r, we define the distance between two curves as the squared L2 norm of their difference. Before calculating the distance, we filter the training set by removing curves whose history is shorter than n. We then pick the k wells from the filtered training set that are closest to r, give each of them a weight w_i, and predict r as:

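\[
f_{\text{predicted}} =
\frac{\sum_{i \in \text{neighb}_k(f_{\text{past\_existing}})} w\!\left(d\!\left(f^{(i)}_{\text{past\_existing}},\, f_{\text{measured}}\right)/h\right) \cdot f^{(i)}_{\text{future\_existing}}}
     {\sum_{i \in \text{neighb}_k(f_{\text{past\_existing}})} w\!\left(d\!\left(f^{(i)}_{\text{past\_existing}},\, f_{\text{measured}}\right)/h\right)}
\]

where h is the longest distance. Results: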

Figure 2 High-productivity, good fit (upper left); High-productivity, bad fit (upper right); Low-productivity, good fit (lower left); Low-productivity, bad fit (lower right).

We fix the number of neighbors at k = 3. Figure 2 shows four typical predicted curves. The results are generally consistent with the real values. The predicted curves are smoother than the real ones because each prediction is a weighted average of multiple training wells.
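The neighbor-weighted prediction can be sketched as follows. The report does not state the form of the weight function w, so the kernel exp(-u^2) used here is an assumption, as is taking h to be the largest distance among the k selected neighbors; the names `sq_l2` and `lwlr_predict` are hypothetical.

```python
import math


def sq_l2(a, b):
    """Squared L2 distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lwlr_predict(known, n, training, k=3, kernel=lambda u: math.exp(-u * u)):
    """Predict months l..n-1 of a curve from its first l months.

    known    : first l months of the test curve
    n        : total number of months of the test curve
    training : list of curves (lists), possibly of different lengths
    kernel   : weight function w (assumed form, not given in the report)
    """
    l = len(known)
    # Drop training curves whose history is shorter than n.
    eligible = [c for c in training if len(c) >= n]
    if not eligible:
        raise ValueError("no training curve is long enough")
    # k nearest curves, compared on the known months only.
    neighbors = sorted(eligible, key=lambda c: sq_l2(c[:l], known))[:k]
    dists = [sq_l2(c[:l], known) for c in neighbors]
    h = max(dists) or 1.0  # assumed bandwidth; avoid dividing by zero
    weights = [kernel(d / h) for d in dists]
    total = sum(weights)
    # Weighted average of the neighbors' futures, month by month.
    return [sum(w * c[t] for w, c in zip(weights, neighbors)) / total
            for t in range(l, n)]


# Toy training set (hypothetical values); the last curve is too short.
training = [[100, 80, 60, 50], [110, 90, 70, 60], [10, 8, 6, 5], [100, 80]]
pred = lwlr_predict([100, 80], 4, training, k=2)
exact = lwlr_predict([100, 80], 4, training, k=1)
```

Because the output is a weighted average, each predicted month always lies between the smallest and largest neighbor values for that month, which is consistent with the smoothing noted above.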

[Plot panels, Well 1219 - LWLR: production rate (mcf/day) vs. month; legend: known curve, predicted curve, test curve]

Figure 3 Predicted curves as l (the number of known months) increases, for a good fit; error vs. l

Figure 4 Predicted curves as l (the number of known months) increases, for a bad fit; error vs. l

In Figure 3 and Figure 4, we lengthen the known curve step by step and plot the error against the number of known months. The error does not decrease when a longer portion of the curve is known and a shorter portion is predicted. A likely reason is that the Standardized Error is defined as the average relative error over the predicted months: in the tail of the curve the absolute values are small, so relative errors easily become large. A better error measure is needed to tell whether the prediction really improves with a longer known curve.
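The tail effect can be illustrated with a small sketch of the error measure. It assumes the Standardized Error is the mean of |predicted - actual| / |actual| over the predicted months, which is our reading of the definition above; the function name and the numbers are hypothetical.

```python
def standardized_error(predicted, actual):
    """Average relative error over the predicted months (assumed definition)."""
    terms = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return sum(terms) / len(terms)


# The absolute error is 20 mcf/day in every month of both examples.
early = standardized_error([1020, 920], [1000, 900])  # early, high-rate months
tail = standardized_error([30, 25], [10, 5])          # late, low-rate months
```

With the same absolute miss, the tail months give a relative error more than a hundred times larger than the early months, so predicting only the tail can raise the average even when the fit is no worse in absolute terms.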

• Principal Component Analysis (PCA)

Since each well has a history of tens of months, we intuitively want to reduce the time dimension and keep the intrinsic components that reflect production decline. First, we filter the training set by removing the wells whose history is shorter than the total number of months n of the test curve. After normalizing the data, we eigendecompose the empirical covariance matrix and take the first 5 eigenvectors as the principal components. Then we fit the known part of the test curve with a linear combination of these 5 eigenvectors. The coefficients θ of the linear combination are calculated by linear regression,

\[
y_{\text{known}} = U_l \theta
\]

and we predict the future decline curve as

\[
y_{\text{estimate}} = U_h \theta
\]

[Plot panels: production rate (mcf/day) vs. month, and standardized error vs. month; panel titles: Well 1219 - LWLR, Well 1923 - LWLR]

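where $y_{\text{known}} \in \mathbb{R}^{l}$ is the normalized known history of the test well and $U_l \in \mathbb{R}^{l \times 5}$ contains the first l dimensions (rows) of the 5 eigenvectors. $U_h$ contains the remaining dimensions of the same eigenvectors, corresponding to the months to be predicted. Our estimate is therefore $y_{\text{estimate}}$. Results: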


Figure 5 High-productivity, good fit (upper left); High-productivity, bad fit (upper right); Low-productivity, good fit (lower left); Low-productivity, bad fit (lower right).

As can be seen from Figure 5, the prediction is either too smooth or too variable compared with the real data. This is because, at the fitting step, θ is either underfitted (high bias, giving an overly smooth curve) or overfitted (high variance, giving an overly variable curve). Another problem with PCA is that all the training wells contribute to the estimate, which makes it imprecise for wells with very high or very low production.
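The PCA fitting and prediction steps can be sketched with NumPy as below. The report does not specify the normalization, so subtracting the per-month mean curve is an assumption here, and `pca_predict` is a hypothetical name. The toy data are built from two basis curves so that two components are enough.

```python
import numpy as np


def pca_predict(train, known, n_components=5):
    """Fit the known months with the leading eigenvectors, then extrapolate.

    train : (m, n) array of training curves, all with n months
    known : first l months of the test curve
    """
    train = np.asarray(train, dtype=float)
    known = np.asarray(known, dtype=float)
    l = known.shape[0]
    mean = train.mean(axis=0)            # assumed normalization: centering
    centered = train - mean
    cov = centered.T @ centered / train.shape[0]   # empirical covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues ascending
    U = eigvecs[:, ::-1][:, :n_components]         # leading eigenvectors
    U_l, U_h = U[:l], U[l:]                        # known rows / future rows
    # Least-squares solution of y_known = U_l theta.
    theta, *_ = np.linalg.lstsq(U_l, known - mean[:l], rcond=None)
    return U_h @ theta + mean[l:]                  # y_estimate = U_h theta


# Toy curves (hypothetical): combinations of a decaying and a flat profile.
b1 = np.array([8.0, 4.0, 2.0, 1.0, 0.5, 0.25])
b2 = np.ones(6)
train = np.array([1 * b1 + 0 * b2, 0 * b1 + 1 * b2,
                  2 * b1 + 1 * b2, 1 * b1 + 3 * b2])
test_curve = 3 * b1 + 2 * b2
pred = pca_predict(train, test_curve[:4], n_components=2)
```

Note that when l is smaller than the number of components, the system for θ is underdetermined and `np.linalg.lstsq` returns the minimum-norm solution, which is one way an unstable fit of θ can arise.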

• PCA after K-means

If we assume that high-productivity wells are similar to each other and low-productivity wells are similar to each other, we can group all the decline curves into two categories. We modify the K-means method to handle the fact that different decline curves have different dimensions: we calculate the distance between a centroid and a curve using only the dimensions of the shorter one. Comparing the two panels of Figure 6 shows that this modified K-means method is good enough to distinguish high-productivity wells from low-productivity wells.
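The modified K-means can be sketched as follows. Only the truncated distance comes from the description above; the centroid update (a per-month mean over the member curves that reach that month) and the explicit initial centroids are assumptions, and all names are hypothetical.

```python
def truncated_sq_dist(a, b):
    """Squared L2 distance over the months both sequences have."""
    m = min(len(a), len(b))
    return sum((a[t] - b[t]) ** 2 for t in range(m))


def update_centroid(members):
    """Per-month mean over the member curves that reach that month (assumed)."""
    longest = max(len(c) for c in members)
    centroid = []
    for t in range(longest):
        vals = [c[t] for c in members if len(c) > t]
        centroid.append(sum(vals) / len(vals))
    return centroid


def kmeans_variable_length(curves, init_centroids, iters=20):
    """K-means on curves of different lengths using the truncated distance."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid under the truncated distance.
        labels = [min(range(len(centroids)),
                      key=lambda j: truncated_sq_dist(c, centroids[j]))
                  for c in curves]
        # Update step: recompute each centroid from its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            new_centroids.append(update_centroid(members) if members
                                 else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Toy curves (hypothetical): two high-rate wells and two low-rate wells.
curves = [[5000, 4000, 3000], [5200, 4100], [300, 250, 200, 150], [280, 240]]
labels, centroids = kmeans_variable_length(curves, [curves[0], curves[2]])
```

With two clusters, the labels separate the high-rate wells from the low-rate wells even though no two curves in the toy set have the same length.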

Figure 6 Decline curves in the high productivity wells (left) and the low productivity wells (right)

Then, we run PCA again after clustering the original decline curves by K-means.


Figure 7 High-productivity, good fit (upper left); High-productivity, bad fit (upper right); Low-productivity, good fit (lower left); Low-productivity, bad fit (lower right).

From Figure 7, we can see that although the underfitting/overfitting problem still exists, the results are better than those of the original PCA. This may be because clustering adds the L2-norm distance information to PCA, which makes it an integrated method.

• Discussion

Figure 8 Errors of three methods calculated from Leave-One-Out cross validation

We apply Leave-One-Out cross validation to all three methods, compare the predictions with the real production data, and calculate the average relative errors as in Figure 3 and Figure 4. We also define a threshold value to cap extremely large errors. We do this because a single extreme value can make the average of all relative errors very large, and these extreme values arise from the shut-down periods of the wells (when production is nearly zero). Figure 8 supports our intuition that LWLR is the best of the three methods, because no information is lost to dimension reduction. PCA has the largest relative error of the three because the higher-order principal components, which reflect the details of the decline curves, are not included. K-means clusters the wells into high- and low-productivity classes, which improves PCA by making that prior information available.
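The evaluation loop can be sketched as below. How the threshold is applied is not stated in the report, so capping each monthly relative error at `cap` is an assumption; the baseline predictor `mean_future` is a hypothetical stand-in for any of the three methods, and the toy curves are invented.

```python
def capped_relative_error(predicted, actual, cap=1.0):
    """Mean relative error with each monthly term capped (assumed rule)."""
    terms = [min(abs(p - a) / abs(a), cap) for p, a in zip(predicted, actual)]
    return sum(terms) / len(terms)


def leave_one_out(curves, l, predict, cap=1.0):
    """Hold out each curve in turn, predict its months l.., average the error."""
    errors = []
    for i, test in enumerate(curves):
        training = curves[:i] + curves[i + 1:]
        predicted = predict(test[:l], len(test), training)
        errors.append(capped_relative_error(predicted, test[l:], cap))
    return sum(errors) / len(errors)


def mean_future(known, n, training):
    """Hypothetical baseline: average the futures of all long-enough curves."""
    l = len(known)
    pool = [c for c in training if len(c) >= n]
    return [sum(c[t] for c in pool) / len(pool) for t in range(l, n)]


# Toy curves (hypothetical); 0.0001 marks a shut-down month after zero
# replacement, so the denominators are never exactly zero.
curves = [[100, 80, 60, 50], [110, 90, 70, 0.0001], [90, 70, 50, 40]]
capped = leave_one_out(curves, l=2, predict=mean_future, cap=1.0)
uncapped = leave_one_out(curves, l=2, predict=mean_future, cap=float("inf"))
```

In this toy case the single near-zero month dominates the uncapped average, while the capped average stays on the same scale as the other wells' errors.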
