cs231n assignment1--Softmax
With the SVM part finished, this section is relatively easy; most of it mirrors the SVM implementation.
For the gradient derivation I mainly referred to this article:
http://www.jianshu.com/p/004c99623104multiclass
Gradient derivation:
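In brief, for a single example $x_i$ with label $y_i$ and scores $s = x_i W$ (so $s_j = x_i \cdot w_j$), the softmax loss is

$$L_i = -\log\frac{e^{s_{y_i}}}{\sum_j e^{s_j}} = -s_{y_i} + \log\sum_j e^{s_j}.$$

Writing $p_j = e^{s_j}/\sum_k e^{s_k}$ for the predicted probability of class $j$, differentiating gives $\partial L_i/\partial s_j = p_j - \mathbb{1}\{j = y_i\}$, and by the chain rule

$$\frac{\partial L_i}{\partial w_j} = \left(p_j - \mathbb{1}\{j = y_i\}\right)\, x_i^T.$$

That is, every column $j$ of $dW$ accumulates $p_j\, x_i$, and the correct-class column additionally receives $-x_i$; averaging over the minibatch and adding the regularization gradient $\mathrm{reg}\cdot W$ completes the gradient.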
The vectorized implementation is analogous to the SVM one, so having implemented the SVM, softmax is not hard to write.
For reference, the svm.py code from the previous part is listed below; softmax.py follows the same skeleton (a sketch of it is given after the listing):
import numpy as np
from random import shuffle
def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]  # C
    num_train = X.shape[0]    # N
    loss = 0.0
    for i in xrange(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in xrange(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                dW[:, y[i]] -= X[i, :]
                dW[:, j] += X[i, :]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss (and its gradient).
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it in dW.             #
    # Rather than first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################

    return loss, dW
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.
    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    num_train = X.shape[0]
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    scores = np.dot(X, W)
    correct_class_scores = scores[np.arange(num_train), y]
    correct_class_scores = np.reshape(correct_class_scores, (num_train, -1))
    margin = scores - correct_class_scores + 1.0  # numpy broadcasting
    margin[np.arange(num_train), y] = 0.0
    margin[margin <= 0] = 0.0
    loss += np.sum(margin) / num_train
    loss += 0.5 * reg * np.sum(W * W)
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the gradient for the structured SVM     #
    # loss, storing the result in dW.                                           #
    #                                                                           #
    # Hint: Instead of computing the gradient from scratch, it may be easier    #
    # to reuse some of the intermediate values that you used to compute the     #
    # loss.                                                                     #
    #############################################################################
    margin[margin > 0] = 1.0
    row_sum = np.sum(margin, axis=1)
    margin[np.arange(num_train), y] = -row_sum
    dW = 1.0 / num_train * np.dot(X.T, margin) + reg * W
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################

    return loss, dW
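For completeness, here is a minimal sketch of what the two functions in softmax.py could look like, following the same interface as the SVM functions above. This is the standard approach (shifting scores by their row maximum for numerical stability), not necessarily the exact code that was submitted:

import numpy as np

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops).
    Inputs and outputs follow svm_loss_naive above.
    """
    dW = np.zeros_like(W)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        scores -= np.max(scores)                    # shift for numerical stability
        probs = np.exp(scores) / np.sum(np.exp(scores))
        loss += -np.log(probs[y[i]])
        for j in range(num_classes):
            # dL_i/dw_j = (p_j - 1{j == y_i}) * x_i
            dW[:, j] += (probs[j] - (j == y[i])) * X[i]
    loss = loss / num_train + 0.5 * reg * np.sum(W * W)
    dW = dW / num_train + reg * W
    return loss, dW

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.
    """
    num_train = X.shape[0]
    scores = X.dot(W)                                # shape (N, C)
    scores -= np.max(scores, axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    loss = -np.sum(np.log(probs[np.arange(num_train), y])) / num_train
    loss += 0.5 * reg * np.sum(W * W)
    dscores = probs                                  # dL/ds = p - one_hot(y)
    dscores[np.arange(num_train), y] -= 1.0
    dW = X.T.dot(dscores) / num_train + reg * W
    return loss, dW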
Notebook code for the assignment:
Softmax exercise
Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.
This exercise is analogous to the SVM exercise. You will:
implement a fully-vectorized loss function for the Softmax classifier
implement the fully-vectorized expression for its analytic gradient
check your implementation with numerical gradient
use a validation set to tune the learning rate and regularization strength
optimize the loss function with SGD
visualize the final learned weights
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the linear classifier. These are the same steps as we used for the
    SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]
    mask = np.random.choice(num_training, num_dev, replace=False)
    X_dev = X_train[mask]
    y_dev = y_train[mask]

    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image

    # add bias dimension and transform into columns
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

    return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev
# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
print 'dev data shape: ', X_dev.shape
print 'dev labels shape: ', y_dev.shape
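With the default arguments (49,000 training, 1,000 validation, 1,000 test and 500 dev examples, each image flattened to 32x32x3 = 3072 pixels plus one bias dimension), these prints should report:

Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3073)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3073)
Test labels shape:  (1000,)
dev data shape:  (500, 3073)
dev labels shape:  (500,)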
Softmax Classifier
Your code for this section will all be written inside cs231n/classifiers/softmax.py.
# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.
from cs231n.classifiers.softmax import softmax_loss_naive
import time
# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
# As a rough sanity check, our loss should be something close to -log(0.1).
print 'loss: %f' % loss
print 'sanity check: %f' % (-np.log(0.1))
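Why $-\log(0.1)$? With small random weights all ten class scores are close to zero, so the softmax assigns each class a probability of roughly $1/10$, and the expected loss per example is

$$L \approx -\log\frac{1}{10} = \log 10 \approx 2.302.$$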
# Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
# similar to SVM case, do another gradient check with regularization
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
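grad_check_sparse samples a few random entries of W and compares a centered finite difference against the analytic gradient at each of them. A rough sketch of the idea (not the course's exact implementation; the name sparse_grad_check is just for illustration):

import numpy as np

def sparse_grad_check(f, x, analytic_grad, num_checks=10, h=1e-5):
    # Compare the analytic gradient with a centered numerical estimate
    # at a few randomly chosen coordinates of x.
    for _ in range(num_checks):
        ix = tuple([np.random.randint(m) for m in x.shape])
        oldval = x[ix]
        x[ix] = oldval + h
        fxph = f(x)                      # f(x + h)
        x[ix] = oldval - h
        fxmh = f(x)                      # f(x - h)
        x[ix] = oldval                   # restore the original value
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = (abs(grad_numerical - grad_analytic) /
                     (abs(grad_numerical) + abs(grad_analytic) + 1e-12))
        print('numerical: %f analytic: %f, relative error: %e'
              % (grad_numerical, grad_analytic, rel_error))

As a rule of thumb, relative errors around 1e-7 or smaller indicate the analytic gradient is correct.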
# Now that we have a naive implementation of the softmax loss function and its gradient,
# implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version should be
# much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'naive loss: %e computed in %fs' % (loss_naive, toc - tic)
from cs231n.classifiers.softmax import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)
# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'Loss difference: %f' % np.abs(loss_naive - loss_vectorized)
print 'Gradient difference: %f' % grad_difference
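The Frobenius norm treats the difference of the two gradient matrices as one long vector, $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$; if both implementations are correct, the loss and gradient differences printed here should be essentially zero (up to floating-point round-off).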
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [5e4, 1e8]
for i in range(np.shape(learning_rates)[0]):
    for j in range(np.shape(regularization_strengths)[0]):
        softmax = Softmax()
        learning_rate = learning_rates[i]
        reg = regularization_strengths[j]
        loss_hist = softmax.train(X_train, y_train, learning_rate, reg,
                                  num_iters=1500, verbose=True)
        y_train_pred = softmax.predict(X_train)
        training_accuracy = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        validation_accuracy = np.mean(y_val == y_val_pred)
        results[(learning_rate, reg)] = (training_accuracy, validation_accuracy)
        if best_val < validation_accuracy:
            best_val = validation_accuracy
            best_softmax = softmax
################################################################################
# TODO: #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save #
# the best trained softmax classifer in best_softmax. #
################################################################################
pass
################################################################################
# END OF YOUR CODE #
################################################################################
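The Softmax class used above lives in cs231n/classifiers/linear_classifier.py, which is not listed in this post; its train method runs plain mini-batch SGD on the softmax loss. A minimal sketch of that loop, assuming a loss_fn with the same signature as softmax_loss_vectorized (the standalone helper train_sgd is made up for illustration):

import numpy as np

def train_sgd(W, X, y, loss_fn, learning_rate=1e-7, reg=5e4,
              num_iters=1500, batch_size=200, verbose=False):
    # Plain mini-batch stochastic gradient descent.
    num_train = X.shape[0]
    loss_history = []
    for it in range(num_iters):
        # Sample a mini-batch (sampling with replacement keeps this simple).
        idx = np.random.choice(num_train, batch_size)
        X_batch, y_batch = X[idx], y[idx]
        loss, grad = loss_fn(W, X_batch, y_batch, reg)
        loss_history.append(loss)
        W -= learning_rate * grad        # vanilla gradient step
        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))
    return W, loss_history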
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy)
print 'best validation accuracy achieved during cross-validation: %f' % best_val
# evaluate on test set
# Evaluate the best softmax on test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'softmax on raw pixels final test set accuracy: %f' % (test_accuracy, )
# Visualize the learned weights for each class
w = best_softmax.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])