Digitalization and software solutions



Big data and machine learning for prediction of corrosion in pipelines

In this blog post we will look at some of the achievements of a 5-day machine learning hackathon we arranged recently. We were curious about one of the key concepts in our current strategy – could we become a bit more “data smart” in integrity management and maintenance planning for pipelines? We wanted to learn more about the opportunities offered by, and the maturity of, technologies such as big data, machine learning, artificial intelligence and the internet of things. How easy were they to apply, and how could they fit into our current product portfolio?

[Image: Onshore pipeline]

Scenario and use case of our hackathon

In our hackathon, we set up a mixed team of business representatives, experienced developers, data scientists and domain (pipeline) experts; around 10 people were involved in total. We had good support from Microsoft, including some of their experts in Azure and machine learning. Having an interdisciplinary team turned out to be crucial to the success of the hackathon.

Our dataset came from an onshore pipeline system with a total length of 1,455 miles. We had both pipe state data (depth of cover, coating, casings, welds and much more) and condition data. In machine learning, you would normally create a “training data set”. Our training set consisted of roughly 59,000 rows of data, of which around 3,000 had measured corrosion. We used specialized tools for data management, data cleaning and machine learning.
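
A training set like this is heavily imbalanced: only about one row in twenty has measured corrosion. The minimal Python sketch below uses only the two rounded counts quoted in this post to show why that matters. A trivial model that never predicts corrosion would already be about 95% "accurate", and one common remedy is to weight each class inversely to its frequency. The binary corrosion / no-corrosion framing, the function name and the weighting formula are illustrative assumptions; how (or whether) the imbalance was handled in the hackathon is not stated here.

```python
def imbalance_summary(n_rows, n_corroded):
    """Summarize class imbalance for a corrosion / no-corrosion label."""
    n_clean = n_rows - n_corroded
    positive_rate = n_corroded / n_rows
    # Accuracy of a trivial model that always predicts "no corrosion".
    baseline_accuracy = n_clean / n_rows
    # "Balanced" class weights: n_rows / (n_classes * rows_in_class),
    # so each class contributes the same total weight to a weighted loss.
    weights = {
        "corrosion": n_rows / (2 * n_corroded),
        "no_corrosion": n_rows / (2 * n_clean),
    }
    return positive_rate, baseline_accuracy, weights


# Rounded counts quoted in the post (assumed binary labelling).
rate, baseline, weights = imbalance_summary(59_000, 3_000)
print(f"positive rate {rate:.1%}, baseline accuracy {baseline:.1%}")
print(weights)
```

With these weights both classes carry the same total weight (29,500 each), and the roughly 95% baseline is a reminder that plain accuracy is a weak yardstick for a dataset shaped like this one.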

We wanted to investigate whether we could create predictive algorithms for corrosion in pipelines. Specifically, we wanted to predict susceptibility to corrosion in parts of a pipeline system that cannot be inspected with in-line inspection devices, typically referred to as “unpiggable” pipelines. To date, assessing such sections has been both expensive and, in many instances, ineffective in preventing serious pipeline failures.

Background info – pipeline “pigs” and big data

Pipeline “pigs” are devices that are inserted into onshore and offshore oil and gas pipelines to perform different types of tasks without stopping the fluid flow. These devices were initially designed to remove residues, plugs or anything else that could prevent or slow down the flow inside the pipelines. Today pigging is used throughout the life of a pipeline, for many different reasons. For instance, pigs are used for internal inspections to detect corrosion or cracks that could lead to failure, and to provide a basis for repair decisions that can help prevent leaks and ruptures, which can be explosive and dangerous to people, property and the environment. There are two main technologies used on the pigs that assess pipe wall condition: ultrasound and magnetism (magnetic flux leakage). The choice between them depends on the objectives of the inspection.

[Image: Pipeline pig]

Unfortunately, a large proportion of the world’s pipelines are not accessible for pig inspections. Where pigs are used, regardless of the technology, they generate vast volumes of data (big data) in varying formats, which are then analyzed by the inspection vendor. The results are typically estimates of the sizes of corrosion areas or cracks, with varying degrees of accuracy that must be accounted for in subsequent decision making. Applying advanced analytics to this data can help uncover patterns not revealed by conventional analytics, and so support more effective predictions than would otherwise be possible. These forms of predictive analytics are the future of pipeline condition assessment and monitoring, and will give stakeholders a better overview of operations and more control and flexibility in managing their assets.
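
One simple way to account for sizing accuracy is to treat each reported depth as uncertain and work with a conservative upper bound. The sketch below assumes, purely for illustration, a tool tolerance of ±10% of wall thickness at 80% confidence and zero-mean, normally distributed sizing errors; neither figure comes from the hackathon data, and `conservative_depth` is a hypothetical helper. It converts the stated tolerance into a standard deviation and returns the depth at a chosen one-sided confidence level.

```python
from statistics import NormalDist


def conservative_depth(reported_pct, tolerance_pct=10.0, tolerance_conf=0.80,
                       target_conf=0.95):
    """Upper-bound corrosion depth (% of wall) allowing for sizing error.

    Assumes sizing errors are normal with zero mean and that the tool
    specification means P(|error| <= tolerance_pct) = tolerance_conf.
    """
    std_normal = NormalDist()
    # Two-sided tolerance -> one-sided quantile, e.g. 80% -> z(0.90) ~ 1.28.
    z_tol = std_normal.inv_cdf(0.5 + tolerance_conf / 2)
    sigma = tolerance_pct / z_tol
    # One-sided upper bound at the target confidence, e.g. z(0.95) ~ 1.645.
    z_target = std_normal.inv_cdf(target_conf)
    return reported_pct + z_target * sigma


# A hypothetical 40% deep anomaly reported by the vendor.
print(round(conservative_depth(40.0), 1))
```

Raising `target_conf` makes the bound more conservative; feeding such bounds, rather than raw point estimates, into later predictions is one way to carry the inspection uncertainty forward.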

These types of fully digitalized services can bring various benefits to day-to-day operations, improving the efficiency and effectiveness of risk and integrity management programmes. For condition monitoring, these analytical tools can help operators improve maintenance regimes and optimize inspection intervals by combining asset-specific, industry, historical and real-time data in data-driven prediction and decision processes. Detecting anomalies and accurately predicting future behaviour during operations will enable more effective decision making, helping to focus operational spend where it reduces risk the most. In most cases, the oil and gas industry will benefit greatly from adopting such digitalization initiatives. Big data and machine learning provide opportunities in almost every facet of asset operations, and imagination seems to be the only limit to what we can achieve.

Prediction of fatigue or corrosion in pipelines

As a classification society with more than 150 years’ history, we have significant experience in predicting fatigue in various types of structures, including pipelines. Our Rules and Recommended Practices set requirements for inspection intervals, such as every fifth or tenth year, depending on the type of component or structure, its utilization and its criticality. But why five or ten years? Simply because we have made a qualified assumption that this is a sensible time to take another look. This is well within all margins, and generally a very conservative approach. It also captures the very essence of the change that big data and machine learning bring to the industry. Many rules and estimation-type engineering analyses evolved to overcome a lack of data, and they do this quite effectively. In many applications this is no longer necessary: we are data rich (we have big data), and we have an analytics toolset that is now viable for everyday use.
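
To contrast a fixed interval with a data-driven one, the sketch below derives a re-inspection interval from a corrosion rate using a deliberately simplified linear-growth model. The 80% maximum allowable depth, the safety factor of 0.5 and the function names are hypothetical choices for illustration only; they are not taken from our Rules or Recommended Practices.

```python
def remaining_life_years(wall_mm, depth_mm, rate_mm_per_year,
                         max_depth_fraction=0.8):
    """Years until the deepest defect reaches the allowable depth.

    Assumes linear corrosion growth at a constant rate (hypothetical model).
    """
    if rate_mm_per_year <= 0:
        raise ValueError("corrosion rate must be positive")
    allowable_mm = max_depth_fraction * wall_mm
    if depth_mm >= allowable_mm:
        return 0.0
    return (allowable_mm - depth_mm) / rate_mm_per_year


def reinspection_interval_years(wall_mm, depth_mm, rate_mm_per_year,
                                safety_factor=0.5, max_depth_fraction=0.8):
    """Inspect again after a fixed fraction of the remaining life."""
    return safety_factor * remaining_life_years(
        wall_mm, depth_mm, rate_mm_per_year, max_depth_fraction)


# Hypothetical 10 mm wall with a 3 mm deep defect.
print(reinspection_interval_years(10.0, 3.0, rate_mm_per_year=0.25))
print(reinspection_interval_years(10.0, 3.0, rate_mm_per_year=0.10))
```

The point of the sketch is that the interval follows from the data: a better estimate of the local corrosion rate, for example from a trained model, directly shortens or lengthens the interval instead of defaulting to five or ten years.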

“Big data” in the pipeline assessment business context includes the vast quantities of data coming from sensor-laden inspection devices such as pigs, and increasingly from embedded or remote sensors, the most common of which are cathodic protection monitoring and corrosion coupons. In addition, there are many types of asset property data, historical assessments, operational state, soil and environmental information, which exist in many formats and are often unstructured, such as documents, photographs or other images. This volume, variety of sources and mix of structured and unstructured information is true “big data”, and it is impossible for engineers to interpret it without a more advanced analytical approach such as machine learning or artificial intelligence.

Correlation with other data sources and dealing with unstructured data

In the hackathon, we trained our algorithms on previously pigged sections of pipeline, where we had the inspection history to determine levels of corrosion, as well as many forms of non-inspection information, including pipe properties, corrosion protection history, coating type, local climate data, soil properties and previous field examination results. The algorithm was trained to predict the level of corrosion on a section of pipe from these other attributes.
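
The algorithm used in the hackathon is not named here. As a stand-in, the minimal sketch below uses k-nearest-neighbours regression in plain Python: each attribute is standardized with the training set's mean and standard deviation, and the corrosion level of an unpigged section is estimated as the average over the k most similar pigged sections. The three attributes (depth of cover, coating age and soil resistivity), the data values and k = 3 are all hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_scaler(rows):
    """Per-attribute mean and standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant attributes to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def scale(row, means, stds):
    """Standardize one row so no attribute dominates the distance."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def knn_predict(train_X, train_y, query, k=3):
    """Average corrosion level of the k most similar training sections."""
    means, stds = fit_scaler(train_X)
    q = scale(query, means, stds)
    distances = sorted(
        (math.dist(scale(row, means, stds), q), y)
        for row, y in zip(train_X, train_y)
    )
    return mean(y for _, y in distances[:k])


# Hypothetical pigged sections: [depth of cover (m), coating age (years),
# soil resistivity (ohm-m)] -> max corrosion depth (% of wall thickness).
train_X = [
    [1.2, 10.0, 5000.0],
    [1.0, 35.0, 800.0],
    [0.9, 40.0, 600.0],
    [1.5, 5.0, 8000.0],
    [1.1, 30.0, 1000.0],
    [1.3, 12.0, 4000.0],
]
train_y = [5.0, 32.0, 40.0, 2.0, 28.0, 6.0]

# Hypothetical unpigged section with old coating in low-resistivity soil.
print(knn_predict(train_X, train_y, [1.0, 38.0, 700.0]))
```

With these made-up values the three nearest sections are the ones with old coating in low-resistivity soil, so the estimate is their average, about 33% of wall thickness. In practice, a tested library implementation and a model chosen by validation would replace this hand-rolled version.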


During a fairly short hackathon, we were able to go through a complete machine learning workflow and create relatively accurate predictions of corrosion levels at locations of the pipeline beyond those where we had trained the algorithm. We even managed to add the new predictive method as a trial extension directly into our existing integrity management solution for onshore and offshore pipelines. The overall accuracy was much higher than we expected, given the limited dataset and time constraints.
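
The validation scheme used in the hackathon is not described here. When the goal is to predict corrosion at locations away from the training data, one reasonable check is to hold out whole contiguous stretches of pipeline rather than randomly sampled rows, because neighbouring rows tend to share soil, coating and operating conditions, and a random split can overstate accuracy. The sketch below is a hypothetical illustration of such a block split together with a mean absolute error metric; the 50-mile block size, the one-in-five hold-out ratio and the mileposts are arbitrary assumptions.

```python
from statistics import mean


def block_split(mileposts, block_miles=50.0, test_every=5):
    """Split record indices into train/test by contiguous milepost blocks.

    Every `test_every`-th block of `block_miles` is held out for testing,
    so test locations are never interleaved with training locations.
    """
    train_idx, test_idx = [], []
    for i, mp in enumerate(mileposts):
        block = int(mp // block_miles)
        if block % test_every == 0:
            test_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, test_idx


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted depth."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical mileposts of inspected records along the line.
mileposts = [0.0, 10.0, 60.0, 120.0, 250.0, 260.0, 300.0]
train_idx, test_idx = block_split(mileposts)
print("train:", train_idx, "test:", test_idx)
print(mean_absolute_error([10.0, 20.0], [12.0, 17.0]))
```

Scoring a model only on the held-out blocks gives a more honest picture of how it will behave on stretches it has never seen, which is the situation for unpiggable sections.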

Although machine learning research began as early as the 1950s, the past 10 years have brought more of a revolution than an evolution in maturity, tooling and simplicity. One main reason is the rise of cloud computing and the possibility of working with “unlimited” storage and compute power. Even though we used relatively “simple” resources such as Microsoft Azure Machine Learning Studio and the Azure ML cheat sheet, we came surprisingly far.

The performance achieved by the machine learning approach to our industry problem is extremely promising, and it has allowed us to define a path towards the rapid deployment of machine learning within our asset integrity product solutions.


5 Comments
Brad says:

Which hackathon was this for? Machine learning and pipeline big data is something I have been looking into heavily and playing with for a couple of years now…

Jo Øvstaas says:

This was an internal hackathon where we gathered around 10 people. It was a mix of data scientists, full stack developers and pipeline engineers. Hackathons are an efficient way to explore good ideas or get to know new technologies.

Khairul Chowdhury says:

You can get some free data on the PHMSA website.

Download the zip files from the sidebar on the right. I have already done some analysis of onshore pipeline corrosion. After a manual parametric study, the accuracy was 98% in predicting corrosion-related leaks or ruptures. It's too good to be true; it needs more validation.


Is it possible to get the dataset that was used for the hackathon? We would like to explore the same problem. Can you also let us know whether there is a video, ultrasound or electromagnetic dataset that could be processed to determine the state of the pipeline?
Warm regards,

Jo Øvstaas says:

These datasets come from our customer and cannot be shared.
