29 October 2020
Huijuan Shao
R&D Division, Hitachi America, Ltd.
Failure prediction is an important problem in industry and has been studied for decades in many areas. Generally, most equipment and industrial components deteriorate after running for a period of time. Failures can also propagate among devices that are physically connected to each other: when one component of a system fails, related components may break down too. For example, in a mill plant, when a motor fails, the bearings physically connected to it may fail as well. To capture these relationships, model-based techniques use system equations to extract analytical redundancies between devices’ measurements [1, 2]. An alternative is a data-driven approach that uses information from similar devices for failure prediction; data-driven methods use a system’s measurements as the features for failure prediction [3]. To predict the multivariate responses of multiple devices simultaneously and with higher accuracy, we use the correlated features from these devices. We introduce a multivariate Bernoulli distribution with a logit transformation to learn the correlation between the predictors and the multivariate responses.
Multivariate Bernoulli Log-Normal Model
The inputs are features from multiple devices, and the output is a multivariate binary response with one component per device. This output can be represented as a multivariate random variable whose binary components follow a multivariate Bernoulli distribution. A regression then links the input features to the parameters of this distribution, and the model parameters are estimated from the likelihood of the observed responses. Together, these pieces form the multivariate Bernoulli Log-Normal (MBLN) model.
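To make this structure concrete, here is a minimal NumPy sketch for two devices. The bivariate Bernoulli part uses the standard log-linear parameterization, in which the natural parameters (f1, f2, f12) are logits and a log odds ratio of the cell probabilities. How the regression and the Log-Normal part are specified is an assumption here, not taken from this post: the natural parameters are modeled as a linear function of the features with a coefficient matrix B plus Gaussian noise with covariance Σ (`Sigma`), and the likelihood is approximated by Monte Carlo. All function names are hypothetical.

```python
import itertools

import numpy as np

# Joint outcomes (y1, y2) for two devices, in the order
# (0,0), (0,1), (1,0), (1,1). K devices would need all 2**K vectors.
OUTCOMES = np.array(list(itertools.product([0, 1], repeat=2)))


def mvb_probs(F):
    """Cell probabilities of a bivariate Bernoulli distribution.

    Each row of F holds natural parameters (f1, f2, f12), and
    log p(y) = f1*y1 + f2*y2 + f12*y1*y2 - log(normalizer).
    Returns an array of shape (m, 4) ordered like OUTCOMES.
    """
    F = np.atleast_2d(np.asarray(F, dtype=float))
    y1, y2 = OUTCOMES[:, 0], OUTCOMES[:, 1]
    S = F[:, [0]] * y1 + F[:, [1]] * y2 + F[:, [2]] * (y1 * y2)
    S = S - S.max(axis=1, keepdims=True)  # avoid overflow in exp
    P = np.exp(S)
    return P / P.sum(axis=1, keepdims=True)


def natural_params(p):
    """Logit-type inverse map from cell probabilities to (f1, f2, f12)."""
    p00, p01, p10, p11 = p
    return np.array([
        np.log(p10 / p00),                # log-odds of device 1 failing when device 2 does not
        np.log(p01 / p00),                # log-odds of device 2 failing when device 1 does not
        np.log(p11 * p00 / (p10 * p01)),  # log odds ratio: interaction between the devices
    ])


def mc_log_likelihood(X, Y, B, Sigma, n_draws=2000, seed=0):
    """Monte Carlo log-likelihood under an ASSUMED logit-normal regression.

    Assumption (not confirmed by the post): f_i = x_i @ B + e_i with
    e_i ~ N(0, Sigma), and y_i | f_i ~ bivariate Bernoulli(f_i).
    X: (n, d) features, Y: (n, 2) binary responses, B: (d, 3), Sigma: (3, 3).
    """
    rng = np.random.default_rng(seed)
    mean = X @ B                                   # (n, 3) linear predictors
    eps = rng.multivariate_normal(np.zeros(B.shape[1]), Sigma, size=n_draws)
    idx = 2 * Y[:, 0] + Y[:, 1]                    # column of each observed outcome
    total = 0.0
    for i in range(X.shape[0]):
        probs = mvb_probs(mean[i] + eps)[:, idx[i]]  # p(y_i | f) for each noise draw
        total += np.log(probs.mean())                # average over draws of e_i
    return total
```

The nonzero interaction term f12 is what lets one device's state inform the prediction for the other; with f12 = 0 the two responses are independent. With K devices, the full multivariate Bernoulli has 2^K - 1 natural parameters, so this enumeration grows quickly.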
We applied the model to a simulated water tank system dataset [4] to demonstrate and validate its performance. The dataset describes a network of water tanks, and the measurements for each tank include its pressure and refill mode. We ran the MBLN model on a subset of five connected tanks, shown in Figure 1, with the aim of predicting whether tanks T22 and T30 leak.
Figure 1: Graph structure of water tank data
The dataset was split into two parts: the first 90% for training and the remaining 10% for testing. The MBLN model was then applied to predict the leak status of T22 and T30 simultaneously. As baselines, five other models, namely gradient boosting, random forest, logistic regression, lasso and elastic-net regularized generalized linear models (glmnet), and k-nearest neighbors (kNN), were used to predict the leak status of T22 and T30. Figure 2 compares the receiver operating characteristic (ROC) curves of these six models. MBLN performs best in terms of the ROC curve because its estimated parameters B and Σ capture the correlation among the five tanks and thereby contribute to leak prediction for T22 and T30.
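The evaluation protocol can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the rows are in time order (the post says the first 90% were used for training) and that each model outputs a leak score per tank; the function names are hypothetical, and in practice a library routine such as scikit-learn's `roc_curve` would normally be used instead.

```python
def chronological_split(n_rows, train_frac=0.9):
    """Index ranges for a time-ordered split: first train_frac of rows train, rest test."""
    cut = int(n_rows * train_frac)
    return range(0, cut), range(cut, n_rows)


def roc_points(y_true, scores):
    """(FPR, TPR) points from sweeping the decision threshold from high to low.

    Tied scores are processed as one group, so ties produce a diagonal step.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both positive and negative labels")
    pairs = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every example whose score equals the current threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For the two tanks, `roc_points` could be called once per tank on its column of predicted leak scores, or once on the pooled predictions; the post does not specify which of these Figure 2 shows.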
Figure 2: ROC curve comparison of six models
The scope of this work comes down to two points. First, MBLN is most useful when at least two devices are predicted jointly. Second, the input data from these devices should be correlated to some degree. For more details, please refer to our paper [5].
Many thanks to my co-authors Chi Zhang, Shuai Zheng, Hamed Khorasgani, Ahmed Farahat and Chetan Gupta, with whom this research was jointly carried out.