With vast amounts of data now being generated by every business process and customer touch point, companies in almost every industry are focused on making the most of this data for competitive advantage. This has led to data science becoming a ‘must have’ capability, and data science teams being rapidly formed in businesses of all sizes, in all sectors.
Having spent over 20 years working in, setting up or running teams designed to leverage this competitive advantage, I wanted to share my observations on what it takes to build a great data science team. The starting point is a simple eight-step framework that ensures you begin with the best foundation for success.
We all know that the first step to building a team is to hire some people, but what skillsets are required to build a high-performing data science team? There is a lot of hype about unicorn data scientists who can do it all, and about emerging fields like Machine Learning (ML) Engineering and ML Ops, but it’s not just about recruiting a bunch of bright technical experts. Make sure you hire people who can understand your business, construct a problem statement and extract insights from science and analytics. These traits are as important as being able to use mathematical models to create the next algorithm.
Integrate your team into the business, as it is critical that data science delivers at the right part of the decision-making process. Successful teams are moving away from influencing decisions to augmenting or even automating them, which means science models must be incorporated into the business process. It is therefore critical to invest time in understanding these processes and to build commercial understanding by being as close to the other departments as possible.
Analysts and data scientists don’t always like agile. They like logical flows, assembling all data before building features and before building models. But businesses often don’t have the luxury of time for this approach, wanting to see results quickly as they make day-to-day decisions. You need to strike a balance, and be flexible, responsive and adaptable by working in an agile way, adding more data and features as you progress, and building confidence early.
It’s a tricky thing to do without results to benchmark, but you must estimate the value that your scientific models will add to the business if implemented. Even before you start, you should use your business knowledge to make assumptions about the outcomes. After implementation, measure the value and communicate this to stakeholders. We’re all impatient to move on to the next thing, but the best ammunition for arguing for more resource, budget or time is demonstrating the value that your data science team can deliver.
It’s natural that people will be sceptical of your team in the beginning, as many decisions are made on experience and gut instinct rather than on data. Start small to build trust with your stakeholders: find those who are more open to the data-driven approach, and use your results in these areas to influence more widely.
Preparing teams to productionise code helps you to move faster. This will mean you need engineering skills as well as modelling skills within the team, but it ensures that the business can see results from your work more rapidly.
Make sure the business knows you are there, what your remit is, and how it will benefit the business overall. Communicate as widely and regularly as you can, demonstrating how and where data science is improving business processes, growing sales, helping win new customers and creating efficiencies.
Data science is still evolving, with new techniques, technologies and concepts being introduced frequently, so rethink how you equip your team with the skills and knowledge they need. Bring the outside in, encourage curiosity, and learn from others, both within the business and the wider data science community.
With data scientist now feted as one of the most sought-after roles in business, and investment in building this capability being ramped up across the board, it can be confusing to know where to start. Follow these principles, though, and you’ll set your data science team up for the best possible success.