Modeling the Dynamics of Online Auctions
Using a Functional Data Analytic Approach
Galit Shmueli (+ Wolfgang Jank)
Dept of Decision & Information Technologies
Robert H. Smith School of Business
University of Maryland, College Park
December 2004
2
Overview
Online auctions
  Importance
  How they work
  “Classical” empirical research and new opportunities
Where are the statisticians?
Using FDA for
  Representing auctions
  Studying auction dynamics
  Comparing auctions
  Exploring relations with other variables
Current & Future directions
3
Online Auctions
Central in the eMarketplace (eBay, Yahoo!, Amazon.com…)
High accessibility, low transaction costs
eBay has more than 27M active users (from over 61M registered). At any moment there are ~10M items listed across more than 43,000 product categories, amounting to nearly $15 billion in gross merchandise sales (BusinessWeek, 2003)
4
Online Auctions
The focus of much empirical research
Players: IS and economists
We’re looking at this from a whole new perspective! (and lots of this can be applied to other eCommerce data)
5
eBay.com
Is by far the largest C2C auction site
Buy/sell anything imaginable
(Almost) anyone can buy/sell. You need a credit card to register (free).
In lots of countries
6
How eBay auctions work: Selling an item
Set some auction features (duration, opening price,…)
Describe item
Bells & whistles
+ more info on shipping, text description, payment options, etc.
7
How eBay auctions work: Bidding on an item
Choose auction
Proxy bidding:
  Place max bid
  eBay bids for you
  Price increases by one increment
  Highest bidder pays 2nd highest bid
  Highest bid is not disclosed!
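The proxy-bidding rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `proxy_auction_state`, the single fixed `increment`, and the tie-breaking rule (earlier bid wins) are hypothetical simplifications, not eBay's actual implementation.

```python
def proxy_auction_state(max_bids, opening_price, increment=1.0):
    """Return (leader_index, displayed_price) once all proxy bids are placed.

    max_bids: maximum bids in order of arrival (each assumed >= opening_price).
    increment: one fixed bid increment (an assumption; real increments vary).
    """
    if not max_bids:
        return None, opening_price
    # Rank bidders by maximum bid; the earlier bid wins a tie.
    ranked = sorted(range(len(max_bids)), key=lambda i: (-max_bids[i], i))
    leader = ranked[0]
    if len(ranked) == 1:
        # A lone bidder leads at the opening price.
        return leader, opening_price
    runner_up = max_bids[ranked[1]]
    # One increment above the runner-up, capped at the leader's own maximum.
    price = min(max_bids[leader], runner_up + increment)
    return leader, price


leader, price = proxy_auction_state([50.0, 80.0, 65.0], opening_price=10.0)
print(leader, price)  # the leader's maximum of 80.0 is never displayed
```

The displayed price depends only on the second-highest maximum, which is why the highest bid stays hidden in the bid history.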
8
Bidding on an item – cont.
Auction theory: bid your max and leave
In practice: lots of sniping
Sniping agents (wow – more data!)
9
Research Q’s Asked by Economists and IS researchers
Auction design mechanisms – mostly regressions on final price
  Lucking-Reiley et al: Opening Bid, Number of Bidders, Number of Bids, Length of Auction, Reputation of Seller
  Bapna et al: Bid increments
Winner’s Curse – structural model + prior
  Winner likely to over-pay (Bajari & Hortacsu)
Bid Shilling – t-tests
  Fraudulent “price-pushing” by the seller (Kauffman & Wood)
Reputation and trust – regression, probit model
  Seller rating effect on price or P(+ rating) (Wood et al; Ba & Pavlov)
Bid Sniping – bid time CDF
  Last-minute bidding to increase chances of success (Roth & Ockenfels)
  But early bidding also prevalent
Bidding strategies – k-means clustering
  3 strategies: Participators, evaluators, opportunists (Bapna et al.)
No statisticians playing the game!
11
Why? Data Accessibility?
eBay displays data for all auctions completed in the last 30 days.
Millions of auctions (how do you sample?)
Data are in HTML format!!!!
Researchers use spiders (web agents)
  People usually write their own code
  eBay changes the rules and formats
  eBay does NOT like spiders
  You really need some programming expertise
Commercial software (Andale, Hammertap)
  Data directly from eBay
  Limited (mostly aggregates)
  Expensive, unreliable
12
Lots of opportunities there!
No statistical framing (sample/pop, type of data, etc)
No data visualization
Mostly “traditional” statistical methods
Ignoring data
Sampling issues
and more….
13
Unstated assumptions in current (static) approach
An auction is an observation from a population of eBay auctions (US market, certain time-frame, etc.)
Sample collected by web-spider is random and representative of population.
Data structure: multivariate, with a fixed set of measurements on each auction
Auctions are independent
14
Visualizing Online Auction Data
Lots of empirical research, but no-one is LOOKING at the data!
Ordinary displays not always useful
Shmueli & Jank, “Visualizing online auctions”, JCGS, forthcoming
15
Enlightening Visualizations
Detecting Fraud (color = seller rating)
16
Advanced visualizations for interpreting modeling results
Surplus from eBay auctions (Bapna, Jank, & Shmueli, 2004)
Data from sniping agent gives highest bid
What are the factors that affect surplus?
Advanced, interactive visualizations help learn the multidimensional structure of the data and interpret the results of complicated models!
Beats heavy statistical software like SAS
17
Understanding complicated results: surplus model
[Figure: plot panels with axes "(log) Price" and "Number of Days"]
Variable                  Coefficient   SE     P-value
Intercept                  2.51         0.52   <.0001
Categories*
  Antique/Art              0.41         0.10   <.0001
  Pottery/Glass            0.28         0.07   0.00
  Collectibles             0.41         0.05   <.0001
  EverythingElse           0.38         0.09   <.0001
  Toys/Hobbies             0.33         0.08   <.0001
  Music/Movie/Games        0.39         0.15   0.01
  Jewelry                 -0.30         0.12   0.01
  Automotive              -0.24         0.06   0.00
  Home/Garden             -0.26         0.05   <.0001
  Health/Beauty           -0.16         0.06   0.00
US Dollars**               0.20         0.04   <.0001
NUM_DAYS                  -0.15         0.07   0.03
SNIPE_TIME                -0.23         0.06   0.00
NUM_BIDDERS***            -0.52         0.05   <.0001
PRICE***                   0.36         0.03   <.0001
S_RATING***               -0.03         0.01   0.00
W_RATING***                0.03         0.01   0.02
OPENING_BID***            -0.17         0.02   <.0001
OPENING_BID x PRICE        0.04         0.00   <.0001
PRICE x NUM_BIDDERS        0.09         0.02   <.0001
NUM_DAYS x SNIPE_TIME      0.02         0.01   0.01
* Base category: Books, Business/Industry, Clothing/Accessories, Computer, Coins/Stamps, Electronics, Photography, Sporting Goods
** Base category: Euros and GBP
*** The variables surplus, price, opening bid, winner rating, seller rating and number of bidders were transformed to the log-scale
18
Back to current research
Almost exclusively static
Auction = Snapshot at end
Response: price, # bids,…
But eBay does show complete bid histories!
19
Our new dynamic approach
Auction = complete bid history
Response:
  Price over time
  # of bidders over time
  Average bidder rating over time…
Interested in auction dynamics!
Car/horse race
20
Data Structure: Challenges
Each bid history = time series measured at unequally-spaced time points, closed interval.
Bidding is usually sparse at mid-auction and dense at auction end
Different auctions
  Different number of bids, placed at different times
  Different durations
Much variability across auctions
We have LOTS of auctions!
How to represent an auction?
21
Alternative representation: Curves!
Functional Data Analysis is a modern statistical approach suitable for modeling objects (curves, 3D objects, etc), not just scalars/vectors.
Made famous by the two monographs of Ramsay & Silverman
http://ego.psych.mcgill.ca/misc/fda
22
Example of FDA: Handwriting
Possible goal: detect fraudulent signature
Twenty traces of writing “fda” by same person
We can think of these traces as functions with X,Y coordinates
Use FDA to explore and model similarities and differences between the 20 traces.
23
FDA for bidding data
Bids from a single auction are represented by a single entity
Assume a very flexible underlying curve for all auctions
Storage and computation: represent each auction by a set of basis functions and a set of coefficients
Perform statistical analyses on the coefficients, or on a grid taken on the curves
24
The bidding path (=the functional object)
An auction is represented by its bidding path, a continuous function of time that tracks price in $ (or another quantity!)
In practice, bidding paths are observed at random discrete time points. These are in the observed bid histories
We aim to reconstruct the unobservable continuous profile from the observed discrete bid history
25
Recovering the bidding path
Use smoothing to recover the bidding path
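One useful smoother is the Penalized Smoothing Spline
  Piecewise polynomial with smooth breakpoints (knots $\tau_1, \dots, \tau_L$):
  $$f(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \dots + \beta_p t^p + \sum_{l=1}^{L} \beta_{pl}\,(t - \tau_l)_+^p$$
Penalize curvature by minimizing
  $$\mathrm{PENSSE}_\lambda(f) = \underbrace{\sum_j \bigl(y_j - f(t_j)\bigr)^2}_{\text{fit}} + \lambda \underbrace{\int \bigl(f''(x)\bigr)^2\,dx}_{\text{curvature}}$$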
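A minimal numerical sketch of this penalized fit, assuming a cubic truncated-power basis (p = 3) and a simple Riemann-sum approximation of the curvature integral. The bid times, prices, knots and the function names `basis`, `basis_d2` and `fit_penalized_spline` are hypothetical and serve only to show how the fit and curvature terms trade off through λ.

```python
import numpy as np

def basis(t, knots):
    """Cubic truncated-power basis: 1, t, t^2, t^3, (t - tau_l)_+^3."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def basis_d2(t, knots):
    """Second derivative of each basis function above."""
    t = np.asarray(t, dtype=float)
    cols = [np.zeros_like(t), np.zeros_like(t), 2.0 * np.ones_like(t), 6.0 * t]
    cols += [6.0 * np.clip(t - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def fit_penalized_spline(t, y, knots, lam, t_min=0.0, t_max=7.0, n_grid=701):
    """Minimize sum_j (y_j - f(t_j))^2 + lam * integral of f''(x)^2 dx."""
    X = basis(t, knots)
    grid = np.linspace(t_min, t_max, n_grid)
    D2 = basis_d2(grid, knots)
    dx = grid[1] - grid[0]
    P = (D2.T @ D2) * dx  # Riemann-sum approximation of the penalty matrix
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)

# Hypothetical 7-day bid history: sparse mid-auction, dense at the end.
t = np.array([0.1, 1.2, 2.5, 4.0, 5.5, 6.5, 6.8, 6.9, 6.95, 6.99])
y = np.log(np.array([10.0, 25.0, 40.0, 60.0, 75.0, 110.0, 150.0, 180.0, 200.0, 215.0]))
knots = [2.0, 4.0, 6.0]

def rss(beta):
    """Residual sum of squares (the fit term) for a coefficient vector."""
    return float(np.sum((y - basis(t, knots) @ beta) ** 2))

beta_rough = fit_penalized_spline(t, y, knots, lam=0.01)
beta_smooth = fit_penalized_spline(t, y, knots, lam=100.0)
print(rss(beta_rough), rss(beta_smooth))
```

A larger λ buys a smoother curve at the cost of a worse fit to the observed bids, so the second printed value should be the larger one.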
26
Smoothing Splines for recovering bidding paths
Strengths
  Good tradeoff between fit and local variability
  Computationally cheap (+ numerically stable): well approximated by a finite set of B-spline basis functions
  $$f(t) = \sum_{i=1}^{q} \beta_i\,\phi_i(t)$$
  For smooth derivatives penalize higher order derivatives
Challenges
  Must determine λ and knots
  Requires prior interpolation+smoothing
  Curves not necessarily monotone
27
From bid histories to bidding paths: potential enhancements
Use live-bids rather than proxy-bids
Use monotone splines (non-decreasing)
Integrate auction theory into curve requirements (knot positions, polynomial order, etc)
[Figure: two fitted curves plotted against Day of Auction. Panel "Case 6 RMS residual = 0.073634": log(Current Price). Panel "Auction 6 (monotone splines)": log(livebid).]
28
Learning about Auction dynamics (the auction as a car race)
1st derivative = velocity, 2nd = acceleration, 3rd = ?
[Figure panels: Auction #1, Auction #2]
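As a rough sketch of how velocity and acceleration can be read off a fitted bidding path, the snippet below differentiates a curve numerically on a fine grid. The quadratic path and the function name `dynamics` are hypothetical. With a spline representation the derivatives would normally come from the basis functions themselves, so finite differences are only a stand-in here.

```python
import numpy as np

def dynamics(t, f):
    """Finite-difference velocity (1st) and acceleration (2nd derivative)."""
    velocity = np.gradient(f, t)
    acceleration = np.gradient(velocity, t)
    return velocity, acceleration

t = np.linspace(0.0, 7.0, 141)   # day of auction
f = 5.0 + 0.02 * t**2            # hypothetical smooth log-price path
v, a = dynamics(t, f)
```

Plotting `v` against `a` gives the kind of velocity-versus-acceleration view used in phase-plane plots. The values nearest the two ends of the grid are less accurate because they rely on one-sided differences.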
29
A sample of auctions
158 auctions for new Palm M515 PDAs
7-days, new $250
30
And their derivatives
31
Curve fitting: Sensitivity Analysis
Smoothing splines + pre-smoothing
Monotone smoothing splines
Choice of knots hardly influential
Smoothing parameter chosen ad-hoc
32
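Smoothing spline vs. Monotone smoothing spline
Smoothing spline:
  $$f(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \dots + \beta_p t^p + \sum_{l=1}^{L} \beta_{pl}\,(t - \tau_l)_+^p$$
  $$\mathrm{PENSSE}_\lambda(f) = \sum_j \bigl(y_j - f(t_j)\bigr)^2 + \lambda \int \bigl(D^2 f(x)\bigr)^2\,dx$$
Monotone smoothing spline:
  $$f(t) = C_0 + C_1\, D^{-1} \exp\left\{ \int_0^t \frac{D^2 f(x)}{D f(x)}\,dx \right\}$$
  $$F_\lambda(f) = \sum_j \bigl(y_j - f(t_j)\bigr)^2 + \lambda \int \left( \frac{D^2 f(x)}{D f(x)} \right)^2 dx$$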
33
Basis function expansions
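Splines: linear combination of B-splines
  $$f(t) = \sum_{i=1}^{q} \beta_i\,\phi_i(t)$$
Monotone: The ratio $D^2 f / D f$ can be approximated by a linear combination of basis functions
Fitted function (β_0, β_1 here are the constants C_0, C_1 of the monotone form):
  $$f(t) = \beta_0 + \beta_1\, D^{-1}\bigl[\exp\{ D^{-1} \boldsymbol{\phi}(t)^T \mathbf{c} \}\bigr]$$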
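The monotone representation can be checked numerically: whatever the coefficients c, integrating exp(·) yields a non-decreasing curve as long as the outer constant is positive. The sketch below replaces D^{-1} with a cumulative trapezoid rule. The three-function basis `phi`, the coefficient values and the helper names `cumtrapz0` and `monotone_curve` are hypothetical.

```python
import numpy as np

def cumtrapz0(y, x):
    """Cumulative trapezoid integral of y over x, starting at 0 (numerical D^{-1})."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(steps)))

def monotone_curve(t, w, beta0=0.0, beta1=1.0):
    """Evaluate f = beta0 + beta1 * D^{-1} exp(D^{-1} w) on the grid t."""
    inner = cumtrapz0(w, t)                        # D^{-1} w
    return beta0 + beta1 * cumtrapz0(np.exp(inner), t)

t = np.linspace(0.0, 7.0, 701)
# Hypothetical unconstrained expansion w(t) = phi(t)^T c.
phi = np.column_stack([np.ones_like(t), t - 3.5, np.sin(t)])
c = np.array([-1.0, 0.6, -2.0])
f = monotone_curve(t, phi @ c, beta0=5.0, beta1=0.05)
```

Because the exponential is always positive, every increment of `f` is positive, which is the property wanted for a price path that cannot fall.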
34
Exploratory analysis of curves: Auction Explorer
35
“Handling” the curves
Two approaches
Functional datum (fd):
  Use curve coefficients directly in analysis
  When: linear representation + linear operations
Grid
  Use a set of discrete values from a grid taken on the curves.
  When: nonlinear operations and nonlinear representation (e.g. monotone splines)
36
Exploring & Modeling The Auction Curves
Summaries of curves
  Average curve
  95% CI for curve
Bid paths and/or derivative curves
Compare subsets of auctions
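A small sketch of the grid-based summaries: the average curve and a pointwise 95% band. It assumes the curves have already been evaluated on a common grid and uses a normal-approximation interval (mean ± 1.96 standard errors) at each grid point. That interval and the function name `mean_curve_ci` are assumptions, since the slides do not say how the CI was computed.

```python
import numpy as np

def mean_curve_ci(curves, z=1.96):
    """Pointwise mean and normal-approximation CI; curves is (n_auctions, n_grid)."""
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    mean = curves.mean(axis=0)
    se = curves.std(axis=0, ddof=1) / np.sqrt(n)  # standard error per grid point
    return mean, mean - z * se, mean + z * se

# Two hypothetical curves sampled at two grid points.
mean, lower, upper = mean_curve_ci([[1.0, 2.0], [3.0, 4.0]])
```

The same function can be applied to derivative curves, or separately to subsets of auctions to compare them.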
37
Exploratory analysis: Auction Clustering
Using the bidding curve coefficients we apply cluster analysis (k-medoids)
Early bidding Sniping
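A compact k-medoids sketch on curve coefficients. It alternates between assigning each auction to its nearest medoid and re-picking each cluster's most central member. The deterministic farthest-first start, the Euclidean distance and the toy coefficient vectors are assumptions made for illustration, not necessarily the variant used for the slides.

```python
import numpy as np

def k_medoids(X, k, n_iter=100):
    """Alternating k-medoids on the rows of X; returns (medoid indices, labels)."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Deterministic start: most central point, then farthest-first additions.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: member with the smallest total distance to its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Hypothetical coefficient vectors for six auctions forming two groups.
coefs = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
medoids, labels = k_medoids(coefs, k=2)
```

Unlike k-means, each cluster center is an actual auction, so the medoid's bidding path can be plotted as a representative of its cluster.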
38
Comparing cluster dynamics: Phase-plane plots
Early bidding
Sniping
39
Characterizing the 2 Profiles
Two profiles differ with respect to Opening Bid
Investigate this influence dynamically via Functional Regression

         Opening Bid     Seller Rating       Bidder Rating     # Bids
Early    46.01 (7.94)    908.16 (106.08)     101.86 (10.42)    7.04 (0.52)
Late     22.31 (6.94)    1171.54 (292.89)    94.29 (13.29)     11.13 (0.83)
40
functional-PCA : When do auctions behave differently?
When during the auction do bid curves deviate most/least?
PCA+ varimax
300 premium wristwatches
Principal components as perturbations of the mean
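One way to sketch functional PCA on the grid is a singular value decomposition of the centered curve matrix. The rows of `Vt` then play the role of principal component curves, which can be shown as perturbations of the mean curve. The varimax rotation mentioned above is left out, and the simulated curves and the function name `grid_fpca` are hypothetical.

```python
import numpy as np

def grid_fpca(curves, n_components=2):
    """PCA of curves sampled on a common grid; curves is (n_auctions, n_grid)."""
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    _, s, Vt = np.linalg.svd(curves - mean, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)
    return mean, Vt[:n_components], var_explained[:n_components]

t = np.linspace(0.0, 7.0, 50)
base = 4.0 + 0.2 * t
h = np.sin(np.pi * t / 7.0)      # hypothetical mid-auction mode of variation
scores = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 2.0])
curves = base + np.outer(scores, h)
mean, comps, var = grid_fpca(curves)
# Perturbations of the mean along the first component.
upper, lower = mean + comps[0], mean - comps[0]
```

Where a component is far from zero along the time axis is where auctions differ most from one another.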
41
Functional Regression Models
Involve a curve as a response/predictor
In our case, response = bidding path
Predictors:
  Static: opening price, seller rating, etc.
  Dynamic: current # bidders, current avg bidder rating
Grid: fit a regression model at each grid point and then interpolate the coefficients
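The grid approach can be sketched as one ordinary least-squares fit per grid point, followed by interpolation of the estimated coefficients into a parameter curve. The single static predictor, the simulated curves and the function name `pointwise_regression` are hypothetical.

```python
import numpy as np

def pointwise_regression(curves, x):
    """Regress curve values on a scalar predictor x at every grid point.

    curves: (n_auctions, n_grid); x: (n_auctions,).
    Returns the intercept curve and the slope curve, each of length n_grid.
    """
    curves = np.asarray(curves, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, curves, rcond=None)  # one column per grid point
    return coef[0], coef[1]

grid = np.linspace(0.0, 7.0, 29)
x = np.array([1.0, 2.0, 3.0, 4.0])          # e.g. (log) opening bid, hypothetical
true_slope = 1.0 - grid / 7.0                # effect that fades over the auction
curves = grid + np.outer(x, true_slope)      # noise-free for illustration
intercept, slope = pointwise_regression(curves, x)
slope_at_3_25 = np.interp(3.25, grid, slope)  # interpolate between grid points
```

The interpolated slope values trace out the estimated parameter curve, showing how the predictor's effect changes over the course of the auction.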
42
Functional Regression of Bidding Path vs. Opening Bid
Estimated Parameter Curve
43
Functional Regression of Bidding Acceleration vs. Opening Bid
Estimated Parameter Curve
44
Interpretation: Opening Bid and Auction Energy
[Diagram: two panels, each labeled "Value of Item" and "Open Bid"]
Potential Market Energy left in the auction
45
Current & Future Directions
Real-time forecasting of bidding paths of ongoing auctions
Representing an auction in 2D (price + #bids over time)
Modeling other aspects of auction data
  Consumer surplus – with Ravi Bapna
  Bid arrival process – with Ralph Russo (Iowa)
  New predictors: currency, category, and dynamic ones
  Effects of auction design changes
  eBay addiction
Other eCommerce and IT applications
Papers: http://www.smith.umd.edu/ceme/statistics
Extras
47
Smoothing Spline Parameters
Order of the Spline
  Cubic spline: popular, provides smooth fit; 2nd derivative (curvature), no breakpoints
  To obtain m smooth derivatives, use spline of order m+2.
Knot locations (breakpoints)
  The more knots, the more flexible (wiggliness)
  Tradeoff between data-fit and variability of function
Smoothness penalty parameter λ
  λ → 0: fit approaches exact interpolation
  λ → ∞: fit approaches linear regression
48
Alternatively: B-spline basis functions
B-splines on a fixed grid of knots (s1 < s2 < … < sq) give a good approximation to most smooth functions
Computational aspect: numerical stability, especially for irregularly distributed time-points
They form a set of natural cubic splines with limited support
  $$f(t) = \sum_{i=1}^{q} \beta_i\,\phi_i(t)$$
  where $\phi_i$ is basis function i and the $\beta_i$ are the coefficients
  $$\hat{\boldsymbol{\beta}} = (\Phi' W \Phi)^{-1} \Phi' W \mathbf{y}$$
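A self-contained sketch of the weighted least-squares estimate with a B-spline basis. The basis is built with the Cox-de Boor recursion on a clamped cubic knot vector, which is an ordinary B-spline basis rather than a natural-spline one. The knot placement, the weights and the function names `bspline_design` and `weighted_ls` are hypothetical.

```python
import numpy as np

def bspline_design(x, knots, degree=3):
    """Design matrix Phi[j, i] = phi_i(x_j) via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    n_int = len(knots) - 1
    B = np.zeros((len(x), n_int))
    for i in range(n_int):
        B[:, i] = (knots[i] <= x) & (x < knots[i + 1])
    # Put the right endpoint into the last non-empty knot interval.
    last = np.flatnonzero(np.diff(knots) > 0).max()
    B[x == knots[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_next = np.zeros((len(x), n_int - d))
        for i in range(n_int - d):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                B_next[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (knots[i + d + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B

def weighted_ls(Phi, y, w):
    """Solve (Phi' W Phi) beta = Phi' W y with W = diag(w)."""
    W = np.diag(w)
    return np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ y)

t = np.linspace(0.0, 7.0, 50)
knots = np.concatenate(([0.0] * 4, [2.0, 4.0, 6.0], [7.0] * 4))
Phi = bspline_design(t, knots)
y = 2.0 + 3.0 * t                    # hypothetical data lying in the spline space
w = np.where(t > 6.0, 5.0, 1.0)      # hypothetical weights favoring the auction end
beta_hat = weighted_ls(Phi, y, w)
```

Each basis function is nonzero on only a few knot intervals, so Φ'WΦ is banded, which is the source of the numerical stability noted above.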