Provincie Noord-Holland: Scaling Data Science in the Public Sector
See how Provincie Noord-Holland, an organization in the Netherlands' public sector, maximizes data science project efficiency and collaboration with Dataiku.
Scaling Data Science in the Public Sector
We recently sat down with Kasper de Rooy and Eveline Helder from Provincie Noord-Holland (PNH) to learn how their data analyses, which in some cases previously took years, now only take a few weeks, as well as the steps they are taking to become a more data-driven organization (and the role Dataiku plays in this process).
About PNH
The province of Noord-Holland is one of the 12 provinces in the Netherlands and is arguably the most critical piece of the Dutch economy. Interestingly, the province has its own flag, coat of arms, and anthem.
PNH consists of the management, provincial officials, and the board that own policy and projects throughout the province. Core tasks of the PNH include but are not limited to the province’s environment, energy, and climate; spatial development and water management; regional accessibility and public transport; quality of public administration; cultural infrastructure and monument conservation; and regional economy.
Nearly 1,400 total employees work for more than 2.8 million North Hollanders.
Getting Started
Three years ago, the team at PNH launched an internal initiative to become a more data-driven organization. Despite having this firm ambition, they were not sure what steps were necessary to actually achieve that goal. What technology and expertise did they need? How would they set up experiments? What new processes would need to be implemented?
They began by talking with many other public sector organizations that had more experience with data science, then started creating a blueprint for what their ideal team would look like and how to get started. Within PNH, interest grew and knowledge sharing became more frequent, particularly among those who worked in the fields of road infrastructure, mobility, and biodiversity. After these initial conversations, the team had a few key realizations:
They needed data scientists and technology to help them attain success with their data science initiatives.
They did not want to be driven by technology alone, but rather wanted to be both data- and business-driven in order to generate positive performance and encourage buy-in among organization-wide stakeholders.
To deliver tangible value to the organization, they outlined specific criteria so that they would start with projects and topics that were a priority for the province.
They wanted to build in-house knowledge and skills across business units to avoid having to exclusively rely on third-party companies.
“Have a starting point and go from there. There’s a lot to be gained from working with your colleagues to identify your exact needs.” – Kasper de Rooy, Datalab Leader
Unique Challenges for the Public Sector
Because PNH is a not-for-profit organization that aims to achieve societal successes, the team’s initiative was a challenging one. It required strategic alignment from the beginning, as these kinds of societal advances can be difficult to quantify.
In addition, data science for government organizations comes with its own set of requirements and regulations. To this end, Eveline stated, “As a government entity, not only do we need to consider regulations and law when it comes to conducting experiments and working with data, but we need to think about the greater good and the impact the project will have.”
Although PNH’s projects rarely involve data points on individual people and predominantly concern data about the physical world and policy, the team wants to ensure it has the proper privacy-compliant steps in place because it feels a strong responsibility toward society.
Upon hiring their first data scientist at the end of 2018, the team at PNH explored their technical needs. There were many questions to be answered, such as how to define and quantify success in their projects. At the time, they were doing their data science projects with whatever tools they could access, which were limited due to their closed IT environment.
The new data scientist had worked on data science at a larger company and from this experience knew that the data science platform they invested in should allow for (and encourage) collaboration on data experiments. The team got their first Dataiku license to begin exploration.
“Not only do we need to consider regulations and law when it comes to conducting experiments and working with data, but we need to think about the greater good and the impact the project will have.”
– Eveline Helder, Data Scientist
Initial Use Cases
After using the initial license for several weeks, the team at PNH compiled feedback from other colleagues and departments and, with positive insights from various internal groups, ended up signing on as a customer. Outlined below are three of the initial use cases where Dataiku has helped PNH scale their data science efforts.
1. Meadow Birds Analysis
One PNH project deals with biodiversity and nature, specifically protecting the meadow bird population within the province. Every year, PNH spends significant resources counting meadow birds and identifying their species, then interpreting this raw data to determine trends in numbers, the level of endangerment of specific species, and so on. Previously, PNH relied on external services to compile and analyze this data.
Waiting for results used to take upwards of two years, whereas now, with Dataiku, the team can extract insights and calculate trends based on historical data in a matter of weeks and can understand the data on their own. Instead of needing to outsource, PNH has been able to leverage accessible data to make decisions on policy and save time to be used for other priority projects.
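The article does not say how PNH calculates these trends. As a minimal sketch of one common approach, the example below fits a least-squares line to yearly counts for a single species and reports the slope in birds per year. The function name and the counts are invented for illustration and are not PNH data.

```python
def yearly_trend(counts_by_year):
    """Least-squares slope of count against year (birds per year).

    counts_by_year: dict mapping a year to the number of birds counted.
    Illustrative only; the article does not describe PNH's actual method.
    """
    years = list(counts_by_year)
    if len(years) < 2:
        raise ValueError("need at least two years to estimate a trend")
    n = len(years)
    mean_year = sum(years) / n
    mean_count = sum(counts_by_year.values()) / n
    # Covariance of (year, count) divided by variance of year gives the slope.
    covariance = sum(
        (year - mean_year) * (count - mean_count)
        for year, count in counts_by_year.items()
    )
    variance = sum((year - mean_year) ** 2 for year in years)
    return covariance / variance


# Invented counts for one hypothetical species.
example_counts = {2015: 120, 2016: 110, 2017: 100, 2018: 90}
slope = yearly_trend(example_counts)
print(f"Estimated change: {slope:+.1f} birds per year")
```

A negative slope points to a declining population. In practice, a trend like this would be computed per species and per area before drawing any policy conclusions.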
2. Traffic Light Optimization and Performance Measurement
PNH typically optimizes the traffic lights in the province once every three years. Before using Dataiku, the team measured performance by gathering data from a typical day before the optimization and a typical day after it, identifying how many cars stopped and for how long, and using that to judge the success of the optimization. However, the team observed that this method compares roughly one data point with another. Every day in the province is unique and there is a lot of variation within a day, so measuring multiple data points would generate stronger insights.
With Dataiku, the team gathered data from a week before the optimization and a week after, split it into 15-minute intervals, and accounted for the fact that different traffic lights have different settings for morning rush hour, evening rush hour, and the rest of the day. This gave the team significantly more data points with which to evaluate the optimization.
Further, they can drill down to see where improvements have specifically been made (i.e. in the morning or evening rush hour) and observe at what times the optimization has the biggest impact. The team at PNH attributes a lot of this success to Dataiku’s ease of use and is excited to continue involving others for future data analysis projects.
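The before-and-after comparison described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than PNH's actual Dataiku flow: the rush-hour boundaries, the function names, and the sample observations are all invented for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumed period boundaries (hours); the article does not give PNH's settings.
MORNING_RUSH = range(7, 9)    # 07:00-08:59
EVENING_RUSH = range(16, 18)  # 16:00-17:59


def period_of(ts):
    """Label a timestamp as morning rush, evening rush, or rest of day."""
    if ts.hour in MORNING_RUSH:
        return "morning_rush"
    if ts.hour in EVENING_RUSH:
        return "evening_rush"
    return "rest_of_day"


def slot_of(ts):
    """Floor a timestamp to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def mean_stop_by_period(observations):
    """Average stop duration per period, computed from 15-minute slot means.

    observations: iterable of (timestamp, stop_seconds) pairs.
    """
    per_slot = defaultdict(list)
    for ts, stop_seconds in observations:
        per_slot[slot_of(ts)].append(stop_seconds)

    per_period = defaultdict(list)
    for slot, values in per_slot.items():
        per_period[period_of(slot)].append(mean(values))

    return {period: mean(slot_means) for period, slot_means in per_period.items()}


def compare(before, after):
    """Change in mean stop time (after minus before) per period seen in both sets."""
    b, a = mean_stop_by_period(before), mean_stop_by_period(after)
    return {period: a[period] - b[period] for period in b if period in a}


# Invented observations: (time a car stopped, seconds it waited).
before = [
    (datetime(2021, 3, 1, 7, 5), 40.0),
    (datetime(2021, 3, 1, 7, 10), 50.0),
    (datetime(2021, 3, 1, 7, 20), 60.0),
    (datetime(2021, 3, 1, 12, 0), 20.0),
]
after = [
    (datetime(2021, 3, 8, 7, 5), 30.0),
    (datetime(2021, 3, 8, 7, 20), 40.0),
    (datetime(2021, 3, 8, 12, 0), 20.0),
]

changes = compare(before, after)
for period, delta in changes.items():
    print(f"{period}: {delta:+.1f} s mean stop time")
```

Averaging within each 15-minute slot first keeps a single busy interval from dominating its period, and a negative change means cars waited less after the optimization.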
3. Field Productivity and Compliance
A third use case also deals with biodiversity, specifically on agricultural land. The team at PNH analyzed GIS data (satellite imagery) in combination with their data science techniques. PNH provides subsidies to farmers for mowing their fields later in the season in order to protect birds laying eggs in the fields earlier in the season.
The team uses the satellite data to check if the farmers comply with the agreement to mow their fields later in the season as well as identify the most productive areas or types of fields, which they then use to inform future decisions regarding water level, for example. Once they identify the fields that do not comply with the policies, representatives from PNH can visit these farmers to start a dialogue with them and encourage them to course correct their habits to preserve the biodiversity of the province.
“The biggest impact Dataiku has had on the organization is the availability of an easy-to-use platform to collaborate on data science projects.”
– Kasper de Rooy, Datalab Leader
A Collaborative Approach: PNH and Dataiku
For PNH, the most impactful aspect of Dataiku is the collaboration. They sought out a single platform where their small team of four data scientists could work together on projects and share their findings not only with each other but with broader teams. Eveline mentioned that she spent less time fixing code than when she used other tools and enjoyed how much Dataiku helps keep analysts and other team members in the loop, promoting cross-departmental involvement and communication.
Thanks to Dataiku, PNH was able to:
Make significant efficiency gains, as in some cases its data analyses previously took years and now take only a few weeks.
Foster a team-based approach to all data science initiatives.
Bring data analysis in-house, instead of having to use external services for it, which saves both time and resources.
Apply machine learning models to projects such as traffic light optimization and biodiversity, to name a few.
Standardize its data analysis processes and more easily answer internal queries.
“Using a platform that has a unified environment has a lot of value.”
– Kasper de Rooy, Datalab Leader
Looking Ahead
The team at PNH is excited about improving its data and AI maturity and continuing to grow its data science team. They now see the wide range of fields of expertise within the data domain and are keen to continue learning from these different experts, both internal and external, to ultimately generate insights and tangible business impact with other disciplines, companies, and government organizations. Finally, they aim to continue to grow their reach and impact through the number of data science projects and increased diversity within those projects.
“Dataiku helps us keep everyone in the loop and communicate how our projects are going. Everyone can be involved.” – Eveline Helder, Data Scientist