Clever Application Of A Predictive Model
What if you could use a predictive model to find new combinations of attributes that do not exist in the data but could be valuable?
In Chapter 10 of Applied Predictive Modeling, Kuhn and Johnson provide a case study that does just this. It’s a fascinating and creative example of how to use a predictive model.
In this post we will discover this less obvious use of a predictive model and the types of experimental design to which it belongs.
Compressive Strength of Concrete Mixtures
The problem modeled in the case study is the compressive strength of different concrete mixtures. Each record in the data is described by the amounts of ingredients of a concrete mixture, such as:
- Cement
- Fly ash
- Blast furnace slag
- Water
- Superplasticizer
- Coarse aggregate
- Fine aggregate
The property of interest from the resulting mixture is the compressive strength of the concrete. Strong concrete made with fewer or cheaper ingredients is desirable.
For the full details, refer to Chapter 10 of Applied Predictive Modeling.
Predictive Model
Many complex machine learning methods are spot checked on this regression problem, such as:
- Linear Regression
- Radial basis function Support Vector Machines (SVM)
- Neural Networks
- MARS
- Regression Trees (CART and conditional inference trees)
- Bagged and Boosted decision trees
Model accuracy was considered in terms of the RMSE and the R^2 of the predictions. Some of the better performing methods were Neural Networks, Boosted Decision Trees, Cubist and Random Forest.
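As a minimal sketch of these two metrics, the snippet below computes RMSE and R^2 by hand. The function names and the toy numbers are illustrative assumptions, not values from the book, and R^2 is computed here as 1 - SS_res / SS_tot; the case study may use a different definition (for example, a squared correlation).

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination, defined as 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up compressive strengths and model predictions, for illustration only.
observed = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 51.0, 59.0]

print("RMSE:", rmse(observed, predicted))
print("R^2:", r_squared(observed, predicted))
```

A lower RMSE and a higher R^2 both indicate predictions that track the observed strengths more closely.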
Optimizing Compressive Strength
This is the clever part of the case study.
After accurate models were created and selected (Neural Networks and Cubist models), the models were used to locate new mixture quantities that resulted in improved concrete compressive strength.
This involved using the Nelder-Mead algorithm, a direct search method (a family also referred to as pattern search), to search the parameter space for a combination of mixture quantities that, when passed to the predictive model, yielded a predicted compressive strength greater than any in the dataset.
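To make the idea concrete, here is a rough sketch of searching a model's input space with a simplified Nelder-Mead routine. Everything in it is an assumption for illustration: predict_strength is a hypothetical stand-in for a trained model (a smooth toy function of only two ingredients), and the simplex routine is a basic textbook variant, not the implementation used in the case study.

```python
def predict_strength(cement, water):
    """Hypothetical stand-in for a trained model (toy function, peak at 400, 160)."""
    return 80.0 - 0.001 * (cement - 400.0) ** 2 - 0.005 * (water - 160.0) ** 2

def nelder_mead(f, x0, step=10.0, max_iter=500, tol=1e-10):
    """Minimize f from x0 with a simplified Nelder-Mead simplex search."""
    n = len(x0)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(x0)]
    for i in range(n):
        point = list(x0)
        point[i] += step
        simplex.append(point)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the centroid, away from the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)  # halfway between centroid and worst
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)

# Maximize predicted strength by minimizing its negative.
best_mix = nelder_mead(lambda x: -predict_strength(x[0], x[1]), [300.0, 200.0])
print("Suggested mixture:", best_mix)
print("Predicted strength:", predict_strength(*best_mix))
```

With this toy model the search should settle near cement = 400 and water = 160, the maximum built into the stand-in function. A real search would also need to keep the quantities physically valid, and it is worth restarting from several different initial mixtures because a direct search can stop at a local optimum.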
A number of new mixtures were discovered and plotted in a projected domain relative to the provided data. These new mixtures represent the basis for actual commercial experiments that could be performed in order to find an improved concrete mixture.
Response Surface Methodology
The approach is related to a specific type of experimental design called Response Surface Methodology (RSM).
RSM is used when you want to develop, improve or optimize a process for a new or existing product. It’s commonly used in industrial settings, for problems where the relationship between the inputs and the output is not well understood and needs to be estimated.
Designed experiments are performed in order to collect examples of the inputs and the response variable or variables. The input variables may be quantities or timings in a process, and the output or response variable is something desirable in the result, like strength or quality.
A statistical model is then constructed to approximate the relationship between the independent variables and the dependent variable, and finally an optimization process explores new combinations of inputs to maximize the output variable.
A critical step prior to performing the designed experiments is to reduce the number of variables to only those factors known to influence the response variable. This is a form of feature selection with which we are very familiar in machine learning.
Simple models are used to model the functional relationship, such as first or second order polynomials. The method is called response surface because of the continuous nature of the response for many problems and because, for two input variables, the response can be plotted as a surface in three dimensions.
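For reference, a second-order response surface model in k input variables is conventionally written as shown below. The coefficients (the betas) are estimated from the experimental data and epsilon is the error term. The notation is the standard textbook form, not something taken from the case study.

```latex
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon
```

Dropping the squared and interaction terms leaves the first-order model.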
Surrogate Model
Surrogate modeling is when the model constructed in RSM is used in place of a simulation of the problem. For example, in aviation, rather than building every candidate aircraft wing or testing each design in a costly simulator, you can model the results of earlier experiments or simulations and use that model to estimate which new designs are worth testing.
The models may be more elaborate in order to capture complex non-linear relationships between the inputs and the response variable. For example, Support Vector Machines and Neural Networks may be used. Additionally, more powerful search methods that use stochastic processes may be applied, such as simulated annealing or evolutionary algorithms.
The overall process may look something like this:
1. Reduce the number of variables involved
2. Design experiments and execute them sequentially to collect source data to model
3. Construct a surrogate model from the experimental data
4. Apply a search method to the variables using the surrogate model
5. Sequentially perform experiments based on the optimized predictions of the surrogate model
6. Iterate Steps 3 to 5 until a stopping condition is met
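The loop in the steps above can be sketched in a few lines. This is only a toy under stated assumptions: run_experiment is a hypothetical stand-in for a costly physical experiment, there is a single input scaled to the range 0 to 1 (so the variable reduction step is skipped), the surrogate is a least-squares quadratic, and the search is a plain grid search rather than a direct search method.

```python
def run_experiment(x):
    """Hypothetical stand-in for a costly experiment (toy process, peak at 0.6)."""
    return 50.0 - 40.0 * (x - 0.6) ** 2

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    # Augmented 3x3 system: sum(x^(i+j)) * coef = sum(y * x^i).
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(3)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [u - factor * v for u, v in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]

# Step 2: a small initial design spread over the allowed input range.
xs = [0.0, 0.5, 1.0]
ys = [run_experiment(x) for x in xs]
candidates = [i / 100 for i in range(101)]

for _ in range(5):  # Step 6: iterate, with a cap on the number of rounds
    a, b, c = fit_quadratic(xs, ys)  # Step 3: build the surrogate
    # Step 4: search the surrogate for the most promising input.
    best_x = max(candidates, key=lambda x: a + b * x + c * x * x)
    if any(abs(best_x - x) < 1e-9 for x in xs):
        break  # stopping condition: the suggestion has already been tested
    # Step 5: run the suggested experiment and add it to the data.
    xs.append(best_x)
    ys.append(run_experiment(best_x))

print("Best tested input:", xs[ys.index(max(ys))], "response:", max(ys))
```

Because the toy process is itself quadratic, the surrogate fits it almost immediately and the loop stops after one refinement. A real process would typically need more rounds, a richer surrogate and a more careful stopping rule.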
Summary
In this post you discovered a clever way to use a predictive model.
In the case study you learned of an example of using machine learning algorithms to model the results of concrete mixture experiments and then search the parameter space for mixtures with optimal compressive strength that may be taken as the basis for further experiments.
You learned that this type of experimental design is called Response Surface Methodology and is used in industrial problem domains for processes like the concrete mixture example. You also learned that the predictive model in this case study is called a surrogate model.
This is a powerful method that you could use in other domains where performing experiments or simulations carries a large computational overhead.