Department of Labor

Webscrape price data for the CPI

Hundreds of data collectors make in-person visits to retailers around the US to collect prices for the Consumer Price Index (CPI). Other data collectors manually collect data from retail websites. These collection methods are slow, labor intensive, and prone to human error.


The CPI should begin collecting prices by webscraping. Webscraping uses automated computer programs to collect data from websites, and such programs can be directed to harvest price data directly from retail websites. Economists at MIT already use webscraping to create their own price indexes under the "Billion Prices Project." With webscraping, the Bureau of Labor Statistics (BLS) could potentially collect the same data with hundreds fewer workers. Shifting data collection from brick-and-mortar outlets to websites (when possible) would reduce respondent burden as well as BLS expenses.
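
As a rough illustration of the automated collection described above, the following Python sketch pulls product names and prices out of a page's HTML. The markup, the class names "name" and "price", and the function scrape_prices are all assumptions made up for this example; a real retail site would use its own layout, and a production scraper would also need to fetch pages over the network and respect each site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical product-page markup; a real retailer's HTML will differ.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Milk, 1 gal</span><span class="price">$3.49</span></li>
  <li class="product"><span class="name">Eggs, dozen</span><span class="price">$2.19</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects (name, price) pairs from spans with class 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self._field = None   # which span we are currently inside, if any
        self._name = None    # most recent product name seen
        self.prices = []     # list of (name, price in dollars)

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "price":
            # Strip the dollar sign and store the price as a number.
            self.prices.append((self._name, float(data.strip().lstrip("$"))))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_prices(html):
    """Return every (name, price) pair found in the given HTML text."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


for name, price in scrape_prices(SAMPLE_HTML):
    print(name, price)
```

The same function would return every listed price on a page, not just a hand-picked few, which is the basis for the larger samples discussed in this proposal.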


Webscraping would make data collection better as well as cheaper. Automated collection avoids the transcription errors that manual collection introduces, so data could be collected accurately without devoting resources to managing data collectors and double-checking their work. Webscraping also allows larger samples, since BLS could easily collect all prices from a given website instead of limiting itself to a small, manually collected sample. A larger sample would improve the accuracy of the CPI. During tough economic times, accurate economic statistics are essential. Webscraping would allow the BLS to produce a better product at a lower cost.
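
To make the sample-size point concrete, here is a minimal Python sketch of the textbook standard error of a sample mean, s / sqrt(n). It assumes simple random sampling, which is a simplification and not a description of the actual CPI sample design; the function name and the numbers are hypothetical.

```python
import math


def standard_error_of_mean(stdev, n):
    """Standard error of a sample mean under simple random sampling."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return stdev / math.sqrt(n)


# Hypothetical figures: price spread of $0.50 within one item category.
small_sample = standard_error_of_mean(0.50, 100)   # manually collected sample
large_sample = standard_error_of_mean(0.50, 400)   # scraped sample, 4x larger

print(small_sample, large_sample)
```

Under these assumptions, quadrupling the number of prices collected halves the standard error of the average price.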

I agree to have my idea, not my name or information, posted online. YES


Idea No. 8598