Data Ethics: the emperor’s new clothes?

Last year I took on a new role at dunnhumby, leading our best practice for data management and governance. As part of this remit I knew I needed to assess our position on the hot new topic of Data Ethics, deciding where we were doing well and where we needed to focus and improve.

The first step was to understand what was meant by the term ‘Data Ethics’. This term, alongside ‘AI Ethics’, has become a hot trend, with articles, roles, software and whole companies springing up to help data-rich organisations tackle it. The use of the word ‘ethics’ lends weight to the topic, a sense of obligation and risk. No one wants to risk being branded ‘unethical’, but it’s not immediately obvious how one avoids this; it goes beyond being compliant with the law and with privacy rules.

For me, Data Ethics is about what we consider when we decide a) what data to use, and b) how to use it, to ensure we are treating our clients, their customers, and our colleagues with respect. As a customer data science company that transforms and analyses billions of data points every day, we do not take Data Ethics lightly.

Some organisations are ahead of the game and I am very happy to learn from them. For the past few months I have had the Open Data Institute’s ‘Data Ethics Canvas’ printed out and stuck on my wall. It is a stunning visual, but it also evokes the image of an information and process tsunami, and it has left me questioning where to start and how much there is to do. It’s easy to feel like a very small person at the bottom of a very large hill.

But as I have started to dig into this topic, I have been relieved to discover this is not a whole new data discipline. In fact, many of the questions are the same ones that data engineers, CTOs, CIOs and data governance teams have been facing for years, joined more recently by infosec teams and privacy lawyers. You could consider these aspects the ‘hygiene factors’:

Do I know what data I want to use and what the level of quality is?
Do I have the right permissions to use it?
Can I secure the data from accidental or malicious leak/theft?

For many of us in the data science industry, these questions, although sometimes complex to answer and implement, are our bread and butter; we ask them every day to ensure we are compliant with legislation and organisational policy.

And so it’s reasonable to ask, is the buzz around ‘data ethics’ justified? Or is it the emperor’s new clothes? Have we all been doing this already?

More challenging questions

But then we come to the really challenging stuff: the questions that make your brain hurt and have you exploring wormholes at 3am. Many of these questions are indeed fairly new to our industry. Many are a natural extension of the hygiene factors above, but they are much more complex and open-ended:

What are all the potential consequences on society of the data I am using and how I apply the insights I have derived from the data?
Is my data limited in a way that impacts my understanding and application of insights derived from it?

When it comes to industries that regularly generate and use highly sensitive, highly regulated data, such as health care and banking, examples of the above spring to mind easily. But even in the seemingly innocuous world of retail we should now be asking these much broader questions.

Let’s look at some examples…

What are all the potential consequences on society of the data I am using and how I apply the insights I have derived from the data?

At dunnhumby, when we consider a new way of using data, we usually focus first on how this can benefit the shopper – how their experience of the retailer can be improved through the use of data and data science. This might be about offering a better curated range that suits their needs and preferences, making the check-out more convenient, or inspiring them based on products they have browsed or previously bought. This will be explicitly linked to the retailer’s strategy, e.g. whether they are trying to grow a certain category or launch a new own-brand range.

Most companies would stop there, confident the objectives will be met, and proceed with the work. What often doesn’t happen is a more holistic assessment that considers the unintended consequences from the use of data – the knock-on effects, the groups that are indirectly impacted, the potential misuse of data or insights. At dunnhumby we have a data governance board designed to debate and decide upon these scenarios, and we are increasingly getting into these holistic assessments.

In recent months, for example, we have discussed the potential implications of joining up loyalty card transactions for multiple members of the same household (something considered common practice in many countries). The benefits are that we create a more realistic view of the consumers of products, and gain the ability to understand how tastes and preferences impact the entire household shop. We may also use this to streamline communications to that household, so they get one relevant message, rather than duplicates to each individual member. All of this sounds sensible and beneficial to the shopper. But there are potential downsides: could the ‘relevant communications’ expose purchases by one household member to another? Could we fall into assumptions about the make-up of the household based on societal expectations?

Another example that is often debated is the use of demographic or profiling identities. This has been common practice across the world, with features such as age, gender, socio-economic band still regularly used to segment and target customers, and in many countries going beyond this to look at ethnicity and religious affiliation. But how much does the use of these labels help reinforce harmful, out-of-date stereotypes, and limit our ability to challenge and change these? For the past 30 years, dunnhumby have focused on analysing what people are really doing, what they are really buying and using that to improve and tailor their experience with retailers. We strongly believe what you have purchased previously is a much better indication of what you might purchase in future than demographic information and helps us avoid potentially harmful stereotypes.

Is my data limited in a way that impacts my understanding and application of insights derived from it?

Traditionally, when we consider data quality, we want a complete, timely, accurate data set. In the world of retail this often means a data set that contains all the till transactions for a recent period, and ideally a link to a customer database created through marketing activities such as a loyalty programme. In this scenario, ‘complete’ data could still be missing vital behaviours, or whole groups of customers representing sections of society, because they are not visible in the retailer’s data.

An example would be a traditional loyalty segmentation that considers how much a customer has spent in assessing their ‘loyalty’ (RFV: recency, frequency and value). If loyalty is truly what we want to understand, the important data is their share of wallet, or the number of categories they are shopping. If we go on total spend alone, we can easily exclude customers who are spending less overall, which could be driven by many factors and not necessarily by their loyalty. The unintended consequence could be that those customers miss out on the best offers and coupons, despite being loyal to the retailer. They could be spending less because they are in a lower income bracket, and so we start to uncover some potential unintentional bias or even discrimination.
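
To make the gap between total spend and share of wallet concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the three customers, their figures, and in particular the `total_grocery_budget` field, which a retailer would rarely observe directly and would normally have to estimate. It is not a description of how dunnhumby builds segmentations.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    spend_with_retailer: float   # visible in till / loyalty card data
    total_grocery_budget: float  # hypothetical: usually unknown, would need estimating

    @property
    def share_of_wallet(self) -> float:
        # Fraction of the customer's grocery budget spent with this retailer.
        return self.spend_with_retailer / self.total_grocery_budget


# Invented figures for illustration only.
customers = [
    Customer("A", spend_with_retailer=400.0, total_grocery_budget=1000.0),
    Customer("B", spend_with_retailer=150.0, total_grocery_budget=160.0),
    Customer("C", spend_with_retailer=250.0, total_grocery_budget=500.0),
]


def top_by(customers, key, n=1):
    """Return the names of the n customers ranked highest by `key`."""
    ranked = sorted(customers, key=key, reverse=True)
    return [c.name for c in ranked[:n]]


# Who would receive the "best offers" under each definition of loyalty?
by_spend = top_by(customers, key=lambda c: c.spend_with_retailer)
by_share = top_by(customers, key=lambda c: c.share_of_wallet)

print("Top by total spend:    ", by_spend)
print("Top by share of wallet:", by_share)
```

In this toy data, customer B spends the least in absolute terms but gives the retailer almost all of their budget, so the two rankings pick different "most loyal" customers. Comparing the two lists is one simple way to check whether a spend-based segmentation is quietly penalising lower-budget shoppers.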

Another example would be the very use of a loyalty card to capture transactional data to analyse. Many retailers will be making significant decisions about how best to serve their customers using this dataset. However, there could be whole communities that are not represented because they choose not to use loyalty cards, for a variety of reasons.

Conclusion

It’s become clear to me that Data Ethics is not the emperor’s new clothes; it is an evolution of data security and data privacy and it is bringing new considerations and challenges to the data science industry. This is partly driven by government and legislation, but also by citizens’ expectations.

In the world of retail loyalty, we often talk about a ‘value exchange’ – this used to be limited to tangible monetary reward for sharing data (e.g. if I use my loyalty card, I get some relevant coupons), but it is going beyond this; people now expect their data to be used ethically and even to contribute to improvements across society.

It is an incredibly challenging area for data-rich companies; there is no binary answer to whether an action, a use of data, or a data set is ‘ethical’. The most we can do is establish solid frameworks to assess the scale of impact on individuals, and set parameters within which our data science and data engineering communities can work and innovate while minimising risk.

The key for me is that we continue to challenge ourselves and debate these topics. There will be many times that we cannot avoid some unintended consequences or bias, but what is crucial is that we do this knowingly and thoughtfully and use these experiences to continuously improve how we use data to provide benefits.
