A recent LinkedIn article claimed that Python has overtaken R as the leading language used by data scientists to build machine learning platforms. While this may come as no surprise to the many in the data science community who have embraced Python for its scalability and ease of use, the uninitiated may be wondering what all the fuss is about.
Programming languages, believe it or not, have existed for over 200 years. We’ve come a long way from punch-card programmable looms and machine-specific assembly languages to the programming languages we are so familiar with today. Higher-level languages evolved because the early low-level ones were far too laborious and error-prone to build entire systems with. Object-oriented programming, in turn, gave non-specialists a way to create meaningful applications by hiding complex implementations from their users. Today, the drive for more reliable and versatile code sustains an open-source ecosystem among developers.
More recently, APIs have found their way into code in every industry, from government to gaming. Google Maps was one of the first widely known examples of data from different web applications being mashed together to make a new one. Its enormous popularity prompted Google to release an API so that developers could use its map services without resorting to hacks. Demand drives development forward.
The principles that drove the evolution of programming are also why Python has become the language of choice today. Out of the box it doesn’t do anything clever like statistical modelling or even matrix multiplication, but it does have dedicated fans who, over time, have established a thriving ecosystem of statistical and analytical tools built for common purposes. Libraries created by academic institutions and companies to solve their own scientific challenges have been open-sourced and shared with the global community, extending Python’s capabilities to tasks such as analysing celestial objects or creating the next Picasso using deep learning. Easy to use, general-purpose and transparent, Python also encourages self-service.
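To make the "out of the box" point concrete, here is a minimal sketch of matrix multiplication written with nothing but built-in lists. The function name `matmul` is our own, chosen for illustration; libraries such as numpy offer the same operation as a single optimised call.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows, using only built-ins."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # zip(*b) yields the columns of b, so each entry is a row-by-column dot product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

It works, but every user would have to write, test and speed up code like this themselves, which is exactly the gap that shared libraries fill.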
Python’s active ecosystem has enabled users to interface with tools written in other programming languages. Such is its versatility that new programming models ship with APIs that let users code in Python. One example is the PySpark API, which gives Python programmers access to the benefits of a cluster computing framework. Cluster computing allows tasks to run in parallel across many machines, which means machine learning algorithms can be scaled to run faster. Companies like eBay, Yahoo and Netflix are using Apache Spark to enhance the customer experience with targeted offers, personalised content and online recommendations.
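The idea behind running tasks in parallel can be sketched on a single machine with Python's standard library. This is an analogy only, not the PySpark API: the function names are hypothetical, and threads stand in for the worker machines of a real cluster, so the sketch shows the shape of the computation (split, process independently, combine) rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into at most n_parts roughly equal chunks."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """The per-partition task: it needs nothing from any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_parts=4):
    chunks = partition(data, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(sum_of_squares, chunks))  # "map" step
    return sum(partials)  # "reduce" step


print(parallel_sum_of_squares(list(range(10))))  # 285
```

Because each chunk is processed independently, the same pattern can be spread over many machines, which is the part a cluster framework takes care of.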
At dunnhumby, Python is contributing to greater productivity within our data science teams. We benefit greatly from the active ecosystem and use a whole host of open-source libraries. For example, packages like pandas, numpy, scipy, statsmodels and scikit-learn enable us to iterate quickly through different machine learning models. Building on other Python modules, we created dunnhumby’s own Python library for data science, which has reduced routine analytical workload from weeks to days. Using libraries like xlsxwriter to output pre-formatted reports eliminates the need to highlight numbers or fields manually. Graphics libraries like matplotlib and seaborn allow us to create visually enticing charts and graphs that help communicate key findings and results to our retail clients. For any repetitive task in a daily workflow (for example, a data analyst publishing data into marts for analysis), it’s very likely that Python can automate it and save valuable time. We’re already taking advantage of these benefits, with 200 of our data science professionals skilled in Python programming today and plans in place to have all analysts trained in Python by mid-2018, further boosting productivity and development capabilities.
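As a rough illustration of automating a repetitive reporting task, the sketch below totals sales per category and writes a summary file using only the standard library. The column names (`category`, `sales`), the sample rows and the function names are all hypothetical, invented for this example; in practice a library such as pandas or xlsxwriter would typically handle the same job with richer formatting.

```python
import csv
import io
from collections import defaultdict


def summarise_sales(rows):
    """Total the 'sales' column per 'category' and return sorted summary rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += float(row["sales"])
    return [{"category": c, "total_sales": round(t, 2)} for c, t in sorted(totals.items())]


def write_report(summary, handle):
    """Write the summary as CSV to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=["category", "total_sales"])
    writer.writeheader()
    writer.writerows(summary)


# Hypothetical input; a real job would read from a file or database instead.
raw = io.StringIO("category,sales\nbakery,10.5\ndairy,4\nbakery,2.5\n")
summary = summarise_sales(csv.DictReader(raw))
out = io.StringIO()
write_report(summary, out)
print(out.getvalue())
```

Once a task is captured in a script like this, it can be rerun on new data without any manual steps.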
So is Python the great enabler that will truly revolutionise data science practices? It certainly has a role to play in opening up the discipline, as this recent story from the US suggests: Berkeley, the well-known university in California, is now requiring all its undergraduates (regardless of their majors) to take the Foundations of Data Science course, which is taught in Python[1]. Not only is this a testament to how fundamental programmatic and statistical thinking is for the next generation of skilled workers, it also places Python firmly at the pinnacle when it comes to data science.