The torch.utils.data.DataLoader class
阿新 • Published: 2018-12-15
class DataLoader(object):
    r"""
    Data loader. Combines a dataset and a sampler, and provides
    single- or multi-process iterators over the dataset.

    Arguments:
        dataset (Dataset): dataset from which to load the data.
        batch_size (int, optional): how many samples per batch to load
            (default: 1).
        shuffle (bool, optional): set to ``True`` to have the data reshuffled
            at every epoch (default: False).
        sampler (Sampler, optional): defines the strategy to draw samples from
            the dataset. If specified, ``shuffle`` must be False.
        batch_sampler (Sampler, optional): like sampler, but returns a batch of
            indices at a time. Mutually exclusive with batch_size, shuffle,
            sampler, and drop_last.
        num_workers (int, optional): how many subprocesses to use for data
            loading. 0 means that the data will be loaded in the main process.
            (default: 0)
        collate_fn (callable, optional): merges a list of samples to form a
            mini-batch.
        pin_memory (bool, optional): If ``True``, the data loader will copy
            tensors into CUDA pinned memory before returning them.
        drop_last (bool, optional): set to ``True`` to drop the last incomplete
            batch, if the dataset size is not divisible by the batch size.
            If ``False`` and the size of dataset is not divisible by the batch
            size, then the last batch will be smaller. (default: False)
        timeout (numeric, optional): if positive, the timeout value for
            collecting a batch from workers. Should always be non-negative.
            (default: 0)
        worker_init_fn (callable, optional): If not None, this will be called
            on each worker subprocess with the worker id (an int in
            ``[0, num_workers - 1]``) as input, after seeding and before data
            loading. (default: None)

    .. note:: By default, each worker will have its PyTorch seed set to
              ``base_seed + worker_id``, where ``base_seed`` is a long
              generated by the main process using its RNG. However, seeds for
              other libraries may be duplicated upon initializing workers
              (e.g., NumPy), causing each worker to return identical random
              numbers. (See :ref:`dataloader-workers-random-seed` section in
              FAQ.) You may use ``torch.initial_seed()`` to access the PyTorch
              seed for each worker in :attr:`worker_init_fn`, and use it to
              set other seeds before data loading.

    .. warning:: If ``spawn`` start method is used, :attr:`worker_init_fn`
                 cannot be an unpicklable object, e.g., a lambda function.
""" __initialized = False def __init__(self, dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=default_collate, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None): self.dataset = dataset self.batch_size = batch_size self.num_workers = num_workers self.collate_fn = collate_fn self.pin_memory = pin_memory self.drop_last = drop_last self.timeout = timeout self.worker_init_fn = worker_init_fn if timeout < 0: raise ValueError('timeout option should be non-negative') if batch_sampler is not None: if batch_size > 1 or shuffle or sampler is not None or drop_last: raise ValueError('batch_sampler option is mutually exclusive ' 'with batch_size, shuffle, sampler, and ' 'drop_last') self.batch_size = None self.drop_last = None if sampler is not None and shuffle: raise ValueError('sampler option is mutually exclusive with ' 'shuffle') if self.num_workers < 0: raise ValueError('num_workers option cannot be negative; ' 'use num_workers=0 to disable multiprocessing.') if batch_sampler is None: if sampler is None: if shuffle: sampler = RandomSampler(dataset) else: sampler = SequentialSampler(dataset) batch_sampler = BatchSampler(sampler, batch_size, drop_last) self.sampler = sampler self.batch_sampler = batch_sampler self.__initialized = True def __setattr__(self, attr, val): if self.__initialized and attr in ('batch_size', 'sampler', 'drop_last'): raise ValueError('{} attribute should not be set after {} is ' 'initialized'.format(attr, self.__class__.__name__)) super(DataLoader, self).__setattr__(attr, val) def __iter__(self): return _DataLoaderIter(self) def __len__(self): return len(self.batch_sampler)
The data loader combines a dataset and a sampler, and provides single- or multi-process iteration over the dataset.
It is used when training a model: it splits the training data into mini-batches, and each iteration yields one batch until the whole dataset has been consumed. In effect, it prepares the data for the training loop.
Parameters:
dataset: the dataset from which to load the data.
batch_size: how many samples each batch contains (default: 1).
shuffle: whether to shuffle the data. When True, the data is reshuffled at every epoch, i.e., after all batches have been yielded, the next pass over the loader draws them in a new order (default: False).
sampler: a custom strategy for drawing samples from the dataset. If a sampler is specified, shuffle must be False.
batch_sampler: like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
num_workers: how many worker subprocesses to use for loading. When 0, the data is loaded in the main process (default: 0).
collate_fn: a callable that merges a list of samples into a mini-batch (see the sketch after this list).
pin_memory: a boolean; when True, the loader copies tensors into CUDA pinned memory before returning them.
drop_last: a boolean; when True, the last incomplete batch is dropped if the dataset size is not divisible by batch_size; when False, the leftover samples form a smaller final batch.
timeout: defaults to 0. When positive, it is the upper bound on the time to collect one batch from the workers; exceeding it raises an error. The value must be non-negative.
worker_init_fn: a callable invoked in each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading; useful for per-worker setup such as seeding other libraries (see the seeding sketch below).
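Regarding collate_fn: per the docstring, it merges the list of individual samples into one mini-batch; the default (default_collate) stacks tensors along a new batch dimension. A minimal sketch of a custom collate_fn, reproducing that stacking by hand plus one extra field (the function name and extra field are my own illustration):

import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.linspace(1, 10, 10)
y = torch.linspace(10, 1, 10)
dataset = TensorDataset(x, y)

def my_collate(batch):
    # `batch` is a list of dataset items, here (x_i, y_i) tuples.
    # Stack them manually (what default_collate would do) and also
    # return the actual batch size as a third element.
    xs = torch.stack([item[0] for item in batch])
    ys = torch.stack([item[1] for item in batch])
    return xs, ys, len(batch)

loader = DataLoader(dataset, batch_size=4, collate_fn=my_collate)
for xs, ys, n in loader:
    print(n, xs, ys)   # the last batch has n == 2, since 10 % 4 != 0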
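Regarding worker_init_fn: the docstring's note explains that each worker gets the PyTorch seed base_seed + worker_id, but other libraries such as NumPy may end up with duplicated seeds across workers, and it suggests using torch.initial_seed() inside worker_init_fn. A sketch along those lines (the % 2**32 is my addition, since np.random.seed only accepts 32-bit values):

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.linspace(1, 10, 10)
y = torch.linspace(10, 1, 10)
dataset = TensorDataset(x, y)

def seed_numpy(worker_id):
    # torch.initial_seed() returns this worker's PyTorch seed
    # (base_seed + worker_id); reuse it to seed NumPy so each
    # worker draws different random numbers.
    np.random.seed(torch.initial_seed() % 2**32)

loader = DataLoader(dataset, batch_size=5, num_workers=2,
                    worker_init_fn=seed_numpy)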
Example:
'''
Batch training: split the data into mini-batches for training.
DataLoader wraps the data to be used; in this program the data is
packed five samples at a time, and each iteration yields one batch.
'''
import torch
import torch.utils.data as Data

torch.manual_seed(1)

BATCH_SIZE = 5

x = torch.linspace(1, 10, 10)
y = torch.linspace(10, 1, 10)

torch_dataset = Data.TensorDataset(x, y)  # wrap the tensors in a dataset

loader = Data.DataLoader(      # draw batch_size samples from the dataset each time
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,              # shuffle the data
    num_workers=2,             # use two worker processes
)

def show_batch():
    for epoch in range(3):     # iterate over the full dataset 3 times
        for step, (batch_x, batch_y) in enumerate(loader):
            # each step yields one batch of batch_size samples
            # training ...
            # print the batch to inspect the data
            print('Epoch:', epoch, '|Step:', step, '|batch x:',
                  batch_x.numpy(), '|batch y:', batch_y.numpy())

if __name__ == '__main__':
    show_batch()
Output:
Epoch: 0 |Step: 0 |batch x: [ 5. 7. 10. 3. 4.] |batch y: [6. 4. 1. 8. 7.]
Epoch: 0 |Step: 1 |batch x: [2. 1. 8. 9. 6.] |batch y: [ 9. 10. 3. 2. 5.]
Epoch: 1 |Step: 0 |batch x: [ 4. 6. 7. 10. 8.] |batch y: [7. 5. 4. 1. 3.]
Epoch: 1 |Step: 1 |batch x: [5. 3. 2. 1. 9.] |batch y: [ 6. 8. 9. 10. 2.]
Epoch: 2 |Step: 0 |batch x: [ 4. 2. 5. 6. 10.] |batch y: [7. 9. 6. 5. 1.]
Epoch: 2 |Step: 1 |batch x: [3. 9. 1. 8. 7.] |batch y: [ 8. 2. 10. 3. 4.]
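As a quick check of the drop_last behavior described above, here is a short sketch (my own addition) with a batch size that does not divide the 10 samples:

import torch
import torch.utils.data as Data

x = torch.linspace(1, 10, 10)
y = torch.linspace(10, 1, 10)
torch_dataset = Data.TensorDataset(x, y)

# 10 samples with batch_size=4: the final batch is incomplete.
keep = Data.DataLoader(torch_dataset, batch_size=4, drop_last=False)
drop = Data.DataLoader(torch_dataset, batch_size=4, drop_last=True)
print([len(bx) for bx, _ in keep])   # [4, 4, 2] -> smaller last batch kept
print([len(bx) for bx, _ in drop])   # [4, 4]    -> incomplete last batch dropped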