dunnhumby
Source files
Real-world data to put your theory into practice
(Nearly) Real-world data
Here at dunnhumby, we understand the importance of great data and the analysts who make sense of it. Uncovering patterns, predicting trends, validating theories — insight gained through analysing customer data is the foundation of our business and key to the success of every one of our clients.
But more than that, we just really love data. We love connecting the dots. We love the human stories data can help you tell. And we love the people who love data as much as we do. That’s why we created Source Files, a platform for sharing datasets inspired by the real world, where fellow data geeks, from professors to students to data scientists, can easily access rich data sources. Whether you’re teaching a course, completing a class project, testing an algorithm, or running a hackathon, Source Files is the place to go to put your theory into practice.
Breakfast at the Frat
What’s inside?
- A representation of sales and promotion information on five products from three brands within four categories (mouthwash, pretzels, frozen pizza, and boxed cereal) over 156 weeks.
- Unit sales, households, visits, and spend data by product, store, and week
- Base Price and Shelf Price, to determine a product’s discount, if any
- Promotional support details (e.g. sale tag, in-store display), if applicable
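As a minimal sketch of how Base Price and Shelf Price can be turned into a discount, the snippet below computes discount depth as a fraction of the base price. The field names `base_price` and `shelf_price` and the sample rows are made up for illustration; check the dataset's own column names before reusing it.

```python
def discount_depth(base_price, shelf_price):
    """Fraction taken off the base price; 0.0 means no discount."""
    if base_price <= 0:
        raise ValueError("base_price must be positive")
    # Clamp at zero so a shelf price above base is not reported as a negative discount.
    return max(0.0, (base_price - shelf_price) / base_price)


# Hypothetical rows; the real files may use different column names.
rows = [
    {"week": 1, "base_price": 4.00, "shelf_price": 4.00},
    {"week": 2, "base_price": 4.00, "shelf_price": 3.00},
]
depths = [discount_depth(r["base_price"], r["shelf_price"]) for r in rows]
```

Here week 2 is sold at a quarter off, while week 1 carries no discount.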
What’s it for?
This dataset is designed to facilitate time series analyses, including:
- Price sensitivity analysis
- Promotional effectiveness analysis
- Comparing/contrasting results across products, categories or store geographies
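One common way to approach price sensitivity is a log-log regression of units sold on price, where the slope is read as a price elasticity. The sketch below is one possible approach, not a prescribed method for this dataset, and the weekly figures are invented so that the answer is easy to check by hand.

```python
import math


def log_log_elasticity(prices, units):
    """Slope of ln(units) on ln(price) by ordinary least squares."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(u) for u in units]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("prices must vary to estimate a slope")
    return sxy / sxx


# Made-up observations following units = 100 * price ** -2.
prices = [1.0, 2.0, 4.0]
units = [100.0, 25.0, 6.25]
elasticity = log_log_elasticity(prices, units)
```

With this toy data the slope comes out at about -2, meaning a 1% price rise goes with roughly a 2% drop in units. Real weekly data would also need controls for promotions and seasonality.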
Download 'Breakfast at the Frat: A Time Series Analysis'
Carbo-Loading
What’s inside?
- A representation of household level transactions over a period of two years from four categories: Pasta, Pasta Sauce, Syrup, and Pancake Mix
What’s it for?
- Classroom projects and case studies
- Understanding the process required to mine data
- Learning how to merge data tables and aggregate data
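To give a feel for merging and aggregating, here is a small sketch that joins a transaction table to a product lookup and totals spend by category. The table layouts, keys, and values are hypothetical stand-ins, not the actual Carbo-Loading schema.

```python
from collections import defaultdict

# Hypothetical product lookup keyed by product id.
products = {
    "P1": {"category": "Pasta"},
    "P2": {"category": "Pasta Sauce"},
}

# Hypothetical transaction rows.
transactions = [
    {"household": "H1", "product_id": "P1", "spend": 2.0},
    {"household": "H1", "product_id": "P2", "spend": 3.5},
    {"household": "H2", "product_id": "P1", "spend": 1.5},
]

spend_by_category = defaultdict(float)
for t in transactions:
    category = products[t["product_id"]]["category"]  # the merge step
    spend_by_category[category] += t["spend"]         # the aggregate step
```

The same two steps map onto a join followed by a group-by in SQL, R, or any of the tools listed under Special considerations.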
How should I use it?
Professors have had success asking students questions such as:
- What is the household penetration of Product X? That is, out of all customers purchasing Pasta Sauce, what percent purchase Product X or Brand Z?
- Did any customers first purchase an item or category using a coupon? If so, how many of these customers made additional purchases of the item or category?
- In two complementary categories (e.g. Pasta and Pasta Sauce), what products, if any, are commonly purchased together?
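The household penetration question can be sketched as follows: count the distinct households that bought anything in the category, then the share of those that bought the product of interest. The tuple layout and the sample purchases are assumptions made for illustration only.

```python
def penetration(purchases, category, product):
    """Share of households buying `category` that bought `product` within it."""
    category_buyers = {h for h, c, p in purchases if c == category}
    product_buyers = {h for h, c, p in purchases if c == category and p == product}
    if not category_buyers:
        return 0.0
    return len(product_buyers) / len(category_buyers)


# Hypothetical (household, category, product) rows.
purchases = [
    ("H1", "Pasta Sauce", "Product X"),
    ("H2", "Pasta Sauce", "Product Y"),
    ("H3", "Pasta Sauce", "Product X"),
    ("H3", "Pasta Sauce", "Product Y"),
    ("H4", "Pasta", "Product W"),
]
share = penetration(purchases, "Pasta Sauce", "Product X")
```

In this toy example three households buy Pasta Sauce and two of them buy Product X, so penetration is two thirds. Using sets means a household that buys the product several times is still counted once.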
Special considerations
Don’t forget, you’re dealing with Big Data! Large file sizes may take 5+ minutes to download, and importing the millions of rows of data contained within will require specialised software such as R, Microsoft Excel with PowerPivot, Microsoft Access, SAS, SPSS, SQL, etc.
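If the files are delimited text, one way to cope with millions of rows is to stream them one row at a time rather than loading everything into memory. The sketch below assumes a CSV layout with invented column names, and uses an in-memory string as a stand-in for a real file handle.

```python
import csv
import io
from collections import Counter


def count_rows_by_column(lines, column):
    """Stream CSV rows one at a time and count the values seen in `column`."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# Stand-in for open("transactions.csv", newline=""); file and column names are hypothetical.
sample = io.StringIO("household,category\nH1,Pasta\nH2,Syrup\nH1,Pasta\n")
counts = count_rows_by_column(sample, "category")
```

Because only the running counts are kept, memory use stays small no matter how long the file is.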
Download 'Carbo-Loading: A Relational Database'
The Complete Journey
What’s inside?
- A representation of household level transactions over two years from a group of 2,500 households who are frequent shoppers at a retailer
- All of a household’s purchases within the store, not just those from a limited number of categories
- Customer attributes and direct marketing contact history for select households
What’s it for?
- More advanced classroom settings
- Academic research on the effects of direct marketing to customers
How should I use it?
Professors have had success asking students questions such as:
- How many customers are spending more/less over time?
- Which customer attributes appear to affect customer spend?
- Is there evidence to suggest that direct marketing improves overall customer engagement?
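A simple first pass at the "spending more/less over time" question is to split the period in two and compare each household's spend before and after the split. This is only one possible framing. The row layout, the split at week 53, and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def spend_trend(transactions, split_week):
    """Label each household by comparing spend before split_week with spend from it onwards."""
    early = defaultdict(float)
    late = defaultdict(float)
    for household, week, spend in transactions:
        (early if week < split_week else late)[household] += spend
    labels = {}
    for household in set(early) | set(late):
        diff = late[household] - early[household]
        labels[household] = "more" if diff > 0 else "less" if diff < 0 else "same"
    return labels


# Hypothetical (household, week, spend) rows across two years.
transactions = [
    ("H1", 1, 10.0), ("H1", 60, 15.0),
    ("H2", 2, 20.0), ("H2", 70, 5.0),
]
labels = spend_trend(transactions, split_week=53)
counts = {k: list(labels.values()).count(k) for k in ("more", "less", "same")}
```

The comparison is only fair when both periods are the same length, so pick the split week accordingly.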
Special considerations
Don’t forget, you’re dealing with Big Data! Large file sizes may take 5+ minutes to download, and importing the millions of rows of data contained within will require specialised software such as R, Microsoft Excel with PowerPivot, Microsoft Access, SAS, SPSS, SQL, etc.
Download 'The Complete Journey'
Let’s Get Sort-of-Real
What’s inside?
By the numbers
- 117: Weeks of dummy till transaction data
- 300M: Total number of transactions
- 47M: Total number of baskets
- 400,000: Average number of baskets per week
- 2.6M: Average number of transactions per week
- ~500,000: Distinct number of customers
- ~5,000: Distinct number of products
- ~760: Distinct number of stores
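The weekly averages follow from the totals, which a quick calculation confirms. The last figure assumes each transaction is a single row within a basket, which is an interpretation rather than something stated here.

```python
weeks = 117
transactions = 300_000_000
baskets = 47_000_000

transactions_per_week = transactions / weeks  # about 2.56 million
baskets_per_week = baskets / weeks            # about 402 thousand
rows_per_basket = transactions / baskets      # about 6.4, under the assumption above
```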
What’s it for?
We’ve replicated the typical patterns found in real in-store data to help data scientists test their techniques and algorithms in a (nearly) real-world environment.
A note on download times
Please remember, you’re dealing with Big Data! Large file sizes can result in download times of five minutes or more. Please be patient.
Samples available
- Data preview
- 2,000 baskets, randomly selected, over a period of two weeks
- All transactions for a randomly selected sample of 5,000 customers
- All transactions for a randomly selected sample of 50,000 customers
User guide
Downloading the full dataset? You’ll want to check out our handy User Guide too.
Download 'Let's Get Sort-of-Real: Data Sample'
Download 'Let's Get Sort-of-Real: Sample 2K baskets'
Download 'Let's Get Sort-of-Real: Sample 5K customers'
Download 'Let's Get Sort-of-Real: Sample 50K customers'
Full dataset
Ready to get real? Grab the full 4.3GB dataset below (in nine ~500MB files, for your downloading convenience).
Download 'Let's Get Sort-of-Real: Part One'
Download 'Let's Get Sort-of-Real: Part Two'
Download 'Let's Get Sort-of-Real: Part Three'
Download 'Let's Get Sort-of-Real: Part Four'
Download 'Let's Get Sort-of-Real: Part Five'
Download 'Let's Get Sort-of-Real: Part Six'
Download 'Let's Get Sort-of-Real: Part Seven'
Download 'Let's Get Sort-of-Real: Part Eight'
Download 'Let's Get Sort-of-Real: Part Nine'