Using Machine Learning to Predict Price Dispersion

Theory suggests two sources of price dispersion among homogeneous goods: market frictions and product heterogeneity. We collected posted-price listings for Kindle Fire tablets from eBay to determine whether listing heterogeneity can explain the high degree of dispersion we observe. Using a basic set of controls and empirical techniques in line with the previous literature, we can explain only 13% of the variation in posted prices, a figure that is also consistent with earlier findings.
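The percentages quoted here are shares of price variation explained, which is conventionally measured by the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below shows only that arithmetic. It assumes that "explained variation" means R² and does not say whether the paper computes it in sample or on held-out listings. The prices and predictions are made-up numbers for illustration, not the paper's data.

```python
def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by y_pred: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mean_y = sum(y_true) / len(y_true)
    # Total variation of observed prices around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Illustrative posted prices and model predictions (hypothetical numbers).
prices = [100, 120, 140, 160]
predicted = [110, 120, 130, 160]
print(r_squared(prices, predicted))  # SS_tot = 2000, SS_res = 200, so 0.9
```

Predicting the sample mean for every listing gives R² = 0, so a figure such as 13% or 42% measures how much better a model does than that naive baseline.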

However, by applying machine learning to a richer set of variables, which we extract from the raw downloaded HTML pages, we can explain 42% of the dispersion. We interpret this number as a bound on the role of market frictions in driving price dispersion. Variables describing the amount of information in the listings, the style of the listings, and the content of the listings’ text are each effective price predictors on their own. Our analysis suggests that the content of the listings’ text plays the primary role in generating the machine learning estimator’s predictions. We repeat our analysis on a cross-section of products from a variety of eBay categories, including household products, sporting goods, and other consumer electronics, and find a comparable degree of price predictability across all of them.
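The abstract does not list the exact variables extracted from the HTML. The following is therefore only a minimal sketch, under stated assumptions, of how the three families it names (amount of information, style, and text content) might be pulled from one raw listing page with Python's standard-library `html.parser`. Every feature name here (`n_words`, `n_images`, `n_style_tags`, `n_tables`, `word=...`) is hypothetical and is not taken from the paper.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class ListingFeatureParser(HTMLParser):
    """Collects tag counts and visible text from one listing's raw HTML (hypothetical features)."""

    # Assumed markers of listing "style": emphasis and font markup.
    STYLE_TAGS = {"b", "strong", "i", "em", "font", "u"}

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.text_chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible text, not script or stylesheet bodies.
        if self._skip_depth == 0:
            self.text_chunks.append(data)


def extract_features(html):
    """Return a flat dict of hypothetical information, style, and content features."""
    parser = ListingFeatureParser()
    parser.feed(html)
    parser.close()
    # Tag boundaries are treated as word breaks.
    words = re.findall(r"[a-z0-9]+", " ".join(parser.text_chunks).lower())
    features = {
        # Amount of information.
        "n_words": len(words),
        "n_images": parser.tag_counts["img"],
        # Style.
        "n_style_tags": sum(parser.tag_counts[t] for t in ListingFeatureParser.STYLE_TAGS),
        "n_tables": parser.tag_counts["table"],
    }
    # Content: bag-of-words counts of the listing text.
    for word, count in Counter(words).items():
        features["word=" + word] = count
    return features


html = '<div><b>New</b> Kindle Fire, <i>sealed</i><img src="a.jpg"><script>var x = 1;</script></div>'
print(extract_features(html))
```

A dictionary per listing like this could then be vectorized and passed to whatever estimator is used. The paper's choice of estimator and its actual feature set are not specified here, so any such pipeline is an assumption.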