Thoughts on Process historians
Today, I found myself wanting to talk about a SCADA-related topic a little bit. What is SCADA, you say? Well, you can either check this article, or just read on. SCADA is a world that I live in, where software meets temperature sensors and pressure gauges. SCADA is the software layer that sits between a human being and a bunch of devices that control and monitor operations in a factory, oil pipelines in the desert, or power generation substations… you get the idea!! Process historians are not considered SCADA systems by themselves, however they are vital companions in almost all SCADA-related projects. You can put both Process historians and SCADA systems under the umbrella of industrial software.
So what is a Process historian?
Process historians can be defined as complex pieces of software that are used to store and analyze vital process and industrial data. For example, let’s say you are in charge of a factory floor and you discover that your equipment overheats more than usual when it’s about to malfunction. You will need to store the temperature readings of the equipment somewhere over time, so that you can investigate whether the temperature is rising over time or not, and by how much. With this kind of visibility, you can replace equipment right on time, before it fails on you. A process historian will do the storing, the analysis, the trending, the exposure of the data through APIs, and even the alarm notifications if you configure the product right with the appropriate license. Process historians are vital enablers of success in the industrial world.
Process historians usually undertake considerably sophisticated tasks. A typical Process historian setup will not only store data over time but will also perform complex analysis and calculations on that data. Say you are in a power generation facility: you will want to calculate power by multiplying currents and voltages over time. Say you are in the pipeline industry: maybe you would want to calculate the densities of the fluids streaming through your pipes as they travel from source to destination. The use cases just keep piling up. Process historian products come with a variety of trending and data graphing options that you can use to visualize the raw or calculated data.
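Just to make the calculation idea concrete, here is a tiny sketch in Python. It is not any vendor's actual code, the numbers are made up, and the function name is mine; it just shows what deriving a power series from stored current and voltage readings boils down to:

```python
# Illustrative only: derive a calculated "power" series from two stored series.
# Each series is a list of (timestamp, value) pairs, the way a historian would keep them.
current_amps = [(1, 10.0), (2, 10.5), (3, 11.0)]      # (t, amperes)
voltage_volts = [(1, 230.0), (2, 229.5), (3, 231.0)]  # (t, volts)

def calc_power(currents, voltages):
    """Multiply matching samples to produce a derived power series (watts)."""
    volts_by_t = dict(voltages)
    return [(t, i * volts_by_t[t]) for t, i in currents if t in volts_by_t]

power_watts = calc_power(current_amps, voltage_volts)
print(power_watts)  # [(1, 2300.0), (2, 2409.75), (3, 2541.0)]
```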
Guts of a Process historian
Process historians are databases; that much is probably obvious by now. What sets them apart from common database engines is how they store the data. Process historians are technically a specialized type of database called a time series database. A time series database attaches a timestamp to every new piece of data it receives, then stores the pieces of data in the order in which they were received. This makes trending a piece of data over time (like the temperature over time in our first example) fairly fast and efficient, because the data was simply stored that way. Time series databases typically don’t need to form complex relations between different data points when they store them. In other words, a time series database is optimal for retrieving a piece of data that changes over a period of time.
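As a rough sketch of the idea, here is what an append-only time series store might look like in Python. Real engines use far more sophisticated storage and indexing, so treat this purely as illustration:

```python
import bisect
import time

class TinyTimeSeries:
    """Toy time series store: every value gets a timestamp and is kept in arrival order."""
    def __init__(self):
        self.timestamps = []  # stays sorted because data arrives in time order
        self.values = []

    def record(self, value, timestamp=None):
        self.timestamps.append(timestamp if timestamp is not None else time.time())
        self.values.append(value)

    def between(self, start, end):
        """Range query: all samples with start <= t <= end, found by binary search."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

temps = TinyTimeSeries()
temps.record(71.2, timestamp=100)
temps.record(73.9, timestamp=160)
temps.record(80.5, timestamp=220)
print(temps.between(150, 230))  # [(160, 73.9), (220, 80.5)]
```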
Process historians fall under the NoSQL database category because they are not relational databases in nature. A relational database, which is the most common kind of database engine currently, stores data in tables with rows and columns: you define what the columns and rows should be, with no timestamps involved unless you put them in yourself. You then define complex relationships between these tables and how a change in one table can affect other tables, and so on. This is too much clutter for the use case of a time series database; most of the heavy algorithms that make rows, columns and table relations efficient in relational databases become a burden if all you need to do is store a piece of data with a timestamp and move on to the next. Typical examples of relational databases are Microsoft SQL Server and MySQL.
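For contrast, here is the relational pattern in Python using SQLite: a table with columns you define up front, where the timestamp is just another column you happened to add. The table and column names are made up for illustration:

```python
import sqlite3

# Minimal relational sketch: rows and columns defined up front,
# with the timestamp as an ordinary column we chose to include.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (equipment_id INTEGER, ts INTEGER, temperature REAL)")
conn.execute("INSERT INTO readings VALUES (1, 100, 71.2), (1, 160, 73.9)")
rows = conn.execute(
    "SELECT ts, temperature FROM readings WHERE equipment_id = 1 ORDER BY ts"
).fetchall()
print(rows)  # [(100, 71.2), (160, 73.9)]
```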
Time series databases are also used in fields other than the industrial process world; they are used to collect stock prices over time in the financial world, for example, or individual server utilization over time in the IT world, and the list continues. They are typically built with algorithms designed heavily around timestamping each piece of data stored. Process historians differ from the rest of the time series databases in that they are very aggressively tailored towards the process control industry.
First, the prebuilt analysis equations, data visualization options and data sheets in Process historians read like snapshots from a process engineering textbook. They have a variety of options for efficiency equations, power equations, steam table charts, industrial equipment icons ready for use, etc., all of which are very relevant in the industrial world.
Second, they are very compatible with the industrial software packages typically used in process control, like HMIs, DCSs, PLC drivers, OPC drivers, etc.
Third, Process historians typically use specialized algorithms to compress data and save disk space. Let’s say you get a value of 1 at t1 and a value of 1.0001 at t2; for most cases you don’t need to store 1.0001 because it won’t affect your analysis much. Over time, that saves a lot of disk space and memory, and of course you can disable the compression if you want to.
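To illustrate the idea, here is a simplified deadband filter in Python. It is not any vendor's actual compression algorithm (real products use more sophisticated schemes), but it shows how the tiny wiggles get dropped:

```python
def deadband_compress(samples, deadband=0.01):
    """Keep a sample only if it differs from the last *stored* value by more than the deadband."""
    stored = []
    for t, value in samples:
        if not stored or abs(value - stored[-1][1]) > deadband:
            stored.append((t, value))
    return stored

raw = [(1, 1.0), (2, 1.0001), (3, 1.0002), (4, 1.5)]
print(deadband_compress(raw))  # [(1, 1.0), (4, 1.5)] -- the 1.0001 and 1.0002 never hit the disk
```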
Fourth, Process historians typically come prepackaged with “interfaces”, which are separate pieces of software that can be deployed to the field to closely monitor the small sensors and controllers while the historian sits in the datacenter or the cloud. This is vital because a lot of use cases involve monitoring large numbers of sensors distributed across vast geographical areas (the pipeline industry, for example, can connect countries and continents in some cases), and it is not practical to install the Process historian at each sensor since it is a very heavy piece of software. Instead, you install the “interface”, which is light and can communicate with the sensor or the controller, and the “interface” relays the data to the central historian.
Fifth, store and forward, which is extremely vital for Process historians since a missing piece of data can result in an incorrect analysis, which can lead to a wrong critical decision. What store and forward guarantees is that data will not be lost even if the central historian loses its connection with the remote interface. The remote interface will detect that the historian is not accepting data, then start storing the data it collects in an internal local buffer. Once the connection to the historian opens back up, the interface will forward this data up to the historian.
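A minimal sketch of the store-and-forward pattern, in Python and illustrative only; real interfaces persist the buffer to disk and are far more robust than this toy class:

```python
class Interface:
    """Toy field interface: try to send each sample; buffer locally when the historian is unreachable."""
    def __init__(self, send_to_historian):
        self.send = send_to_historian  # callable that raises ConnectionError when the link is down
        self.buffer = []               # local store used while disconnected

    def collect(self, sample):
        self.buffer.append(sample)
        self.flush()

    def flush(self):
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return              # historian unreachable: keep buffering, retry later
            self.buffer.pop(0)      # forwarded successfully, drop it from the local buffer

# Quick demonstration: the link is down, then comes back up.
received = []
def broken_link(sample):
    raise ConnectionError

iface = Interface(broken_link)
iface.collect((100, 71.2))      # connection down: the sample stays in the local buffer
iface.send = received.append    # connection restored
iface.flush()                   # buffered data is forwarded to the historian
print(received)                 # [(100, 71.2)]
```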
Sixth, Process historians typically cache recent data in memory before storing it permanently on disk. This is very efficient for analysis and calculations performed on newer data, which is usually used to detect any sudden surprises in production before they become big problems.
Seventh, most of the support personnel at Process historian vendors have, up to this point, backgrounds in industrial engineering, which builds common ground for communication when the industrial engineers who use the software come asking for help.
Process historian vendors
There are currently multiple vendors that offer Process historian packages, and almost none of them are cheap. The established players in the market are currently OSIsoft PI, Aspentech IP21, GE Proficy and Wonderware InSQL (or Industrial SQL). Almost all Process historians rely heavily on Microsoft technology; I have yet to see a vendor that can claim to be cross-platform, which of course adds more cost burden for all the Windows Server licenses and the companion administration servers and configuration databases. I have worked with most of these vendors’ products to varying degrees, but let’s cover OSIsoft since most of my recent projects have been built on it.
OSIsoft PI is a popular but expensive option for Process historians. The entire company revolves around the historian; it’s their only product and they love it dearly. PI has evolved drastically in the last few years; a user of the product five years ago would notice a big leap in capabilities and usability between then and now. They developed a platform called Asset Framework that wraps the core of the historian and adds more features to it. It can combine time series data stored in the historian with other forms of data stored in relational databases; this can be used, for example, to link serial numbers of equipment found in a relational database with the time series performance data of that equipment, which can then be used to generate effective performance reports. The product also includes REST APIs, .NET SDKs and ODBC/JDBC access. They also developed a web portal for accessing historical data for on-the-go analysis. PI includes built-in high availability and redundancy features which mirror the data across multiple nodes. Process historians need high availability to ensure that a server problem does not cause future analytics to produce wrong results, since a lot of analytics rely on totaling or averaging numbers over time as part of the calculations.
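To give a feel for what programmatic access to a historian looks like, here is a hypothetical REST call in Python. The host, route and field names below are made up for illustration; they are not OSIsoft’s actual PI Web API, so check the vendor documentation for the real endpoints:

```python
import json
import urllib.request

# Hypothetical endpoint and parameters -- illustrative only, not a real vendor API.
url = ("https://historian.example.com/api/streams/boiler1.temperature/recorded"
       "?start=2015-01-01&end=2015-01-02")
with urllib.request.urlopen(url) as response:
    samples = json.load(response)  # e.g. [{"timestamp": "...", "value": 71.2}, ...]
for sample in samples:
    print(sample["timestamp"], sample["value"])
```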
IP21 is a worthy contender with a lot of extensions and capabilities. Lots of Fortune 500 industrial companies use IP21 for their process data analysis. The product was acquired by Aspentech when it bought the company that originally developed it. It has an adapter called CIMIO which facilitates sending data to the historian. It also supports ODBC/JDBC, and it uses a special language called SQL+ which can be used to script complex behaviors into the historian.
Wonderware has extremely popular SCADA products and is pushing the edge to get more traction; they released a cloud historian option in which they manage the infrastructure hosting your data in the cloud. They are investing in their next generation of historians, and the coming years will show whether it pays off.
GE has made an interesting choice by combining their Process historian product with big data technologies like Hadoop and MapReduce. This is still a young investment, and the coming years will again show its worth. GE does a lot of dogfooding with their historian product, which has helped speed up development so far. The company spans numerous industries with countless use cases for Process historians, which is good for their industrial software division.
Process historian and open source
Process historians currently reside very comfortably in the closed source world with relatively high price tags. Due to their unique nature and heavy customization, that hasn’t been a problem for them so far. They rely very heavily on the Microsoft stack: clouds are in Azure, scripts are in PowerShell, web portals are in Silverlight, SDKs are in VC++ or .NET, etc. They are moving a bit towards HTML5, but it is still in the works.
The open source world currently offers multiple options for time series databases which could be made into Process historians if the right investment and energy were put into them. The barrier isn’t trivial though, and the industrial world hasn’t cared enough so far to invest in that sort of thing. An excellent example of an upcoming open source time series database which I like is InfluxDB; it is still actively in development and rough around the edges, but once it reaches maturity it will be a force to reckon with in the world of time series data collection. There is also OpenTSDB, which is being used in production by a number of organizations.
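Just to show how low the barrier to playing with these tools is, here is a minimal write in Python, assuming a local InfluxDB exposing its HTTP write endpoint and line protocol (the 1.x style) and an already-created database named "plant"; adjust for whatever version you actually run:

```python
import urllib.request

# Assumes InfluxDB is listening on its default port 8086 and a "plant" database exists.
point = "temperature,equipment=pump01 value=71.2"  # line protocol: measurement,tags field=value
req = urllib.request.Request(
    "http://localhost:8086/write?db=plant",
    data=point.encode("utf-8"),
    method="POST",
)
urllib.request.urlopen(req)  # the server answers with 204 No Content on success
```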
Two of the main features from the open source world that would offer great value to Process historians are sharding and distributed data processing, both of which Process historians currently lack a bit.
Sharding is the process of distributing the data load across multiple server nodes while keeping track of where the data went. Sharding uses specialized algorithms to ensure that when a client requests data, the system can sort out which node hosts that piece of data and serve it to the client. Sharding is essential for enormous data loads, to ensure they will not break a single server; without it, scalability becomes a big pain as the organization using the data grows and expands.
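A toy sketch of the idea in Python: hash the tag name, and the same tag always lands on the same node, so reads know exactly where to look. Real systems use consistent hashing or range partitioning so nodes can be added without reshuffling everything:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def node_for(tag_name):
    """Hash-based sharding: map a tag name deterministically to one of the nodes."""
    digest = hashlib.md5(tag_name.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(node_for("pump01.temperature"))  # always the same node for this tag
print(node_for("line4.pressure"))      # may land on a different node
```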
Distributed data processing is now a must-have for organizations that are extreme data crunchers, like Google or Amazon. The principle of distributed data processing is to divide very heavy calculations into smaller calculations, have them executed on distributed server nodes, then collect the results and join them together via a lighter calculation. This technique makes the power of any analytical engine virtually endless. It is the base principle for big data processing technologies like MapReduce.
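As a toy illustration of the split-compute-combine idea (far simpler than a real MapReduce job, and using local processes instead of server nodes), here is an average computed in pieces:

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """Heavy step: each worker reduces its own chunk to a small partial result."""
    return (sum(chunk), len(chunk))

if __name__ == "__main__":
    readings = [71.2, 73.9, 80.5, 68.4, 75.1, 77.7]           # pretend this is huge
    chunks = [readings[i:i + 2] for i in range(0, len(readings), 2)]
    with Pool(processes=3) as pool:
        partials = pool.map(partial_sum, chunks)               # the heavy work runs in parallel
    total, count = map(sum, zip(*partials))                    # light "combine" step joins the results
    print(total / count)                                       # average of all readings
```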
Even though I have a lot of experience in the enterprise, I personally believe in open source and the power of the masses as opposed to the power of the enterprise.
These were my thoughts on Process historians, I hope you learned something new!