
Linear Regression from Scratch in Excel


While using Excel or Google Sheets to solve an actual machine-learning problem can be a bad idea, implementing an algorithm from scratch, with simple formulas and a simple dataset, is very helpful for understanding how it works. Doing this for almost all the common algorithms, including neural networks, has helped me a lot.


In this article, I will share how I implemented a simple Linear Regression with Gradient Descent. You can use this link, Simple linear regression with gradient descent, to get the Excel/Google Sheets file.


Now let’s get our hands dirty!


Using a simple dataset

First, I use a very simple dataset with one feature; the graph below shows the target variable y and the feature variable x.


[Figure: the target variable y plotted against the feature x]

Creating the linear model

In Google Sheets or Excel, you can add a trendline to the chart, which directly gives you the result of Linear Regression.


[Figure: trendline added to the chart in Google Sheets]

But if you want to use the model to make predictions, you need to implement it yourself. In this case, the model is quite simple: for each new observation x, the prediction is given by the formula y = a*x + b, where a and b are the parameters of the model.

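Outside the spreadsheet, a minimal Python sketch of this model is a single function; the parameter values in the example are placeholders, not the ones fitted by the trendline:

def predict(x, a, b):
    """Predict y for a new observation x with the linear model y = a*x + b."""
    return a * x + b

# Example with hypothetical parameters a = 2 and b = 1:
print(predict(3, a=2, b=1))  # -> 7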


The cost function of the model

How can we obtain the parameters a and b? Well, the optimal values for a and b are those minimizing the cost function, which is the Squared Error of the model. So for each data point, we can calculate the Squared Error.


Squared Error = (prediction - real value)² = (a*x + b - real value)²

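In Python, a minimal sketch of this cost looks like the following (the helper names are mine, not from the sheet):

def squared_error(x, y, a, b):
    """Squared error of the prediction a*x + b against the real value y."""
    return (a * x + b - y) ** 2

def cost(xs, ys, a, b):
    """Total cost: the sum of the squared errors over all data points."""
    return sum(squared_error(x, y, a, b) for x, y in zip(xs, ys))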

In order to find the minimum of the cost function, we use the gradient descent algorithm.



Simple gradient descent

Before implementing the gradient descent for the Linear Regression, we can first do it for a simple function: (x-2)^2.


The idea is to find the minimum of this function using the following process:


  • First, we randomly choose an initial value of x.
  • Then, at each step, we calculate the value of the derivative function df at the current x: df(x).
  • The next value of x is obtained by subtracting the derivative multiplied by a step size: x = x - step_size*df(x)

You can modify the two parameters of the gradient descent: the initial value of x and the step size.

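Here is a minimal Python sketch of this process for f(x) = (x - 2)², whose derivative is df(x) = 2*(x - 2); the default number of steps is my own choice:

def simple_gradient_descent(x0, step_size, n_steps=20):
    """Minimize f(x) = (x - 2)**2 using gradient descent."""
    x = x0                        # initial value, chosen freely
    for _ in range(n_steps):
        df = 2 * (x - 2)          # derivative df(x) at the current x
        x = x - step_size * df    # update: x = x - step_size*df(x)
    return x

print(simple_gradient_descent(x0=10, step_size=0.1))  # approaches the minimum at x = 2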


And in some cases, the gradient descent will not work. For example, if the step size is too big, the x value can explode.

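With the sketch above, you can reproduce this behaviour by picking a step size larger than 1; for this function the updates then overshoot the minimum and grow:

print(simple_gradient_descent(x0=10, step_size=1.5))  # |x - 2| doubles at every step and explodes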


Gradient descent for linear regression

The principle of the gradient descent algorithm is the same for linear regression: we have to calculate the partial derivatives of the cost function with respect to the parameters a and b. Let's denote them da and db.


Squared Error = (prediction - real value)² = (a*x + b - real value)²


da = 2*(a*x + b - real value)*x


db = 2*(a*x + b - real value)

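As a sketch, the two derivatives for a single observation (x, y) can be written as:

def partial_derivatives(x, y, a, b):
    """Partial derivatives da, db of the squared error for one observation."""
    error = a * x + b - y    # prediction minus real value
    da = 2 * error * x       # derivative with respect to a
    db = 2 * error           # derivative with respect to b
    return da, db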

In the following graph, you can see how a and b converge towards the target value.


[Figure: a and b converging towards their target values]

Now, in practice, we have many observations, and this calculation has to be done for each data point. That's where things get crazy in a Google Sheet, so we use only 10 data points.


You will see that I first created a sheet with long formulas to calculate da and db, which contain the sum of the derivatives of all the observations. Then I created another sheet to show all the details.

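Outside the spreadsheet, the same computation is a short loop: da and db are summed over all the observations, exactly as in the long sheet formulas. This is a sketch only; the 10-point dataset below is made up for illustration (the article's actual data is in the linked sheet), and the step size and number of steps are my own choices:

def fit_linear_regression(xs, ys, a0=0.0, b0=0.0, step_size=0.001, n_steps=5000):
    """Fit y = a*x + b by gradient descent on the summed squared error."""
    a, b = a0, b0
    for _ in range(n_steps):
        # Sum the partial derivatives da and db over all the observations
        da = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys))
        db = sum(2 * (a * x + b - y) for x, y in zip(xs, ys))
        a -= step_size * da
        b -= step_size * db
    return a, b

# Hypothetical 10-point dataset, roughly following y = 2x:
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 4.3, 5.9, 8.2, 9.8, 12.1, 14.2, 15.8, 18.1, 20.2]
print(fit_linear_regression(xs, ys))  # a close to 2, b close to 0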

If you open the Google Sheet, you can play with it yourself by modifying the parameters of the gradient descent: the initial values of a and b, and the step size. Enjoy!


Now, if you want to understand other algorithms, please feel free to copy this Google Sheet and change it a little for Logistic Regression or even a Neural Network.


Original article: https://towardsdatascience.com/linear-regression-from-scratch-in-excel-3d8192214752
