Brute-force fix: UnicodeEncodeError when importing MNIST data on Win10 + TensorFlow + MNIST + Python 3.6

阿新 • Published: 2019-02-12

Background

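MNIST is the most commonly used, and also the simplest, dataset in the TensorFlow world.
That is why it so often serves as a test after a fresh TensorFlow install, when setting up the GPU build, or when trying out a TensorFlow operation you have not touched before.
However, the way of loading MNIST given in the official documentation: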

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

raises an error when run directly:

File "C:\Anaconda3\envs\tensorflow-gpu\lib\site-packages\zmq\utils\jsonapi.py", line 43, in dumps
s = s.encode('utf8')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd5' in position 2416: surrogates not allowed
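
The "surrogates not allowed" part of the message can be reproduced in isolation. This is a minimal sketch, assuming the stray character came from a byte that could not be decoded as UTF-8: Python's surrogateescape error handler maps such a byte to a lone surrogate like '\udcd5'. Where that byte actually originates in the traceback above is not something this snippet establishes.

```python
# A lone surrogate such as '\udcd5' is not valid text,
# so strict UTF-8 encoding rejects it.
bad = '\udcd5'
try:
    bad.encode('utf8')
    encode_failed = False
    reason = None
except UnicodeEncodeError as err:
    encode_failed = True
    reason = err.reason

# One way such a character can appear: decoding a byte that is not
# valid UTF-8 with the surrogateescape error handler.
smuggled = b'\xd5'.decode('utf-8', errors='surrogateescape')

# The same handler turns the surrogate back into the original byte.
restored = smuggled.encode('utf-8', errors='surrogateescape')

print(encode_failed, reason, smuggled == bad, restored)
```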

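There is plenty of material online about this problem, solutions included, but I don't like any of it:

Many of the answers assume Linux; for someone like me who only has Win10, some of those methods are questionable.
For people new to TensorFlow, or not yet comfortable with Python, what is needed is a simple, blunt way to use the MNIST data. Spending learning energy on encoding problems, or on how to read MNIST from binary files, just wears down one's revolutionary enthusiasm.

So why not read the MNIST binary files directly and write a knock-off module that is used almost the same way as the official one? Wouldn't that be nice?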

Preparation

Code

Create mnist.py in the working directory (the imitation starts with the name), with the following contents:

# -*- coding: utf-8 -*-
import numpy as np
import struct

# Read the images; returns a numpy array of shape [num_samples, image_width*image_height]
def read_img(path, filename):
    with open(path+filename,'rb') as bitfile:
        buffer = bitfile.read()
        # IDX3 header: magic number, image count, rows, columns (big-endian uint32)
        head = struct.unpack_from('>IIII',buffer,0)
        print('load head:', head)
        imgNum = head[1]
        height = head[2]
        width = head[3]
        # One unsigned byte per pixel
        bits = imgNum*width*height
        bitsString = '>'+str(bits)+'B'
        offset = struct.calcsize('>IIII')
        imgs = struct.unpack_from(bitsString,buffer,offset)

    imgs = np.reshape(imgs,[imgNum,width*height])
    print('load image finished')
    return imgs

# Read the ground-truth labels; returns a one-hot numpy array of shape [num_samples, 10]
def read_label(path, filename):
    with open(path+filename,'rb') as bitfile:
        buffer = bitfile.read()
        # IDX1 header: magic number, label count (big-endian uint32)
        head = struct.unpack_from('>II',buffer,0)
        print('load head:', head)
        labelNum = head[1]
        labelString = '>'+str(labelNum)+'B'
        offset = struct.calcsize('>II')
        raw = struct.unpack_from(labelString,buffer,offset)

    label = np.reshape(raw,[labelNum])
    # One-hot encode: row i gets a 1.0 in the column of its digit
    labels = np.zeros([labelNum,10])
    for i in range(labelNum):
        labels[i,label[i]] = 1.0
    print('load labels finished')
    return labels

class train(object):
    def __init__(self,path='mnist\\'):
        self.images = read_img(path, 'x_train.idx3-ubyte')
        self.labels = read_label(path,'y_train.idx1-ubyte')
        self.it_img=iter(self.images)
        self.it_label=iter(self.labels)

    # Return the next batch_size samples as (images, labels)
    def next_batch(self,batch_size):
        batch_img=[]
        batch_label=[]
        try:
            for _ in range(batch_size):
                batch_img.append(next(self.it_img))
                batch_label.append(next(self.it_label))
        except StopIteration:
            # Data exhausted: rewind to the start, then signal the caller
            self.it_img=iter(self.images)
            self.it_label=iter(self.labels)
            raise
        return np.array(batch_img), np.array(batch_label)



class test(object):
    def __init__(self,path='mnist\\'):
        self.images = read_img(path, 'x_test.idx3-ubyte')
        self.labels = read_label(path,'y_test.idx1-ubyte')
        self.it_img=iter(self.images)
        self.it_label=iter(self.labels)

    # Return the next batch_size samples as (images, labels)
    def next_batch(self,batch_size):
        batch_img=[]
        batch_label=[]
        try:
            for _ in range(batch_size):
                batch_img.append(next(self.it_img))
                batch_label.append(next(self.it_label))
        except StopIteration:
            # Data exhausted: rewind to the start, then signal the caller
            self.it_img=iter(self.images)
            self.it_label=iter(self.labels)
            raise
        return np.array(batch_img), np.array(batch_label)

There are two classes, train and test; create the object that matches the data you want to read.
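
Unpacking every pixel with struct builds a very large tuple of Python integers before NumPy ever sees the data. As a possible alternative, here is a minimal sketch that parses the same IDX layout with np.frombuffer. The names parse_idx_images and parse_idx_labels are hypothetical and not part of the module above. They take the raw file bytes, and the image result is a read-only uint8 array rather than the default integer array that read_img returns. The demo data is synthetic.

```python
import struct
import numpy as np

def parse_idx_images(buffer):
    """Hypothetical helper: IDX3 bytes -> uint8 array [num, rows*cols]."""
    magic, num, rows, cols = struct.unpack_from('>IIII', buffer, 0)
    if magic != 2051:
        raise ValueError('unexpected image magic number: %d' % magic)
    offset = struct.calcsize('>IIII')
    # View the pixel bytes directly instead of unpacking them one by one
    pixels = np.frombuffer(buffer, dtype=np.uint8,
                           count=num * rows * cols, offset=offset)
    return pixels.reshape(num, rows * cols)

def parse_idx_labels(buffer, num_classes=10):
    """Hypothetical helper: IDX1 bytes -> one-hot array [num, num_classes]."""
    magic, num = struct.unpack_from('>II', buffer, 0)
    if magic != 2049:
        raise ValueError('unexpected label magic number: %d' % magic)
    offset = struct.calcsize('>II')
    digits = np.frombuffer(buffer, dtype=np.uint8, count=num, offset=offset)
    # Row k of the identity matrix is the one-hot vector for digit k
    return np.eye(num_classes)[digits]

# Synthetic demo: two 2x3 "images" and two labels (5 and 0)
img_bytes = struct.pack('>IIII', 2051, 2, 2, 3) + bytes(range(12))
lbl_bytes = struct.pack('>II', 2049, 2) + bytes([5, 0])
demo_images = parse_idx_images(img_bytes)
demo_labels = parse_idx_labels(lbl_bytes)
print(demo_images.shape, demo_labels.shape)
```

With a real file, the bytes would come from open(path, 'rb').read(), just as in read_img and read_label.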

How to use


In[1]: import mnist

In[2]: train=mnist.train()

load head: (2051, 60000, 28, 28)
load image finished
load head: (2049, 60000)
load labels finished

In[3]: test=mnist.test()

load head: (2051, 10000, 28, 28)
load image finished
load head: (2049, 10000)
load labels finished

In[4]: img,label=train.next_batch(batch_size=5)
img
Out[5]: 
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

label
Out[6]: 
array([[ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])

In[7]: import numpy as np

In[8]: np.shape(img)
Out[8]: (5, 784)

In[9]: np.shape(label)
Out[9]: (5, 10)

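That is pretty much ready to eat straight out of the bag.

For now it only provides the next_batch method; more can be added later if the need arises.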
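
As one example of what could be added, here is a sketch of a shuffled mini-batch generator. iter_batches is a hypothetical helper, not part of mnist.py. It assumes the images and labels are NumPy arrays with the same first dimension, and unlike next_batch it also yields the final, smaller batch. The demo data is synthetic.

```python
import numpy as np

def iter_batches(images, labels, batch_size, shuffle=True, seed=None):
    """Hypothetical helper: yield (images, labels) mini-batches for one epoch."""
    num = len(images)
    order = np.arange(num)
    if shuffle:
        # Shuffle the indices once so images and labels stay paired
        rng = np.random.RandomState(seed)
        rng.shuffle(order)
    for start in range(0, num, batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

# Synthetic demo: 10 samples of 4 "pixels"; sample i has label i
demo_images = np.arange(10 * 4).reshape(10, 4)
demo_labels = np.eye(10)
batches = list(iter_batches(demo_images, demo_labels, batch_size=4, seed=0))
for img, label in batches:
    print(img.shape, label.shape)
```

Applied to the module above, the call would look like iter_batches(train.images, train.labels, batch_size=100).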
