【PyTorch Study】__getitem__ in 《TensorDataset》 and 《DataLoader》

阿新 • Published: 2018-12-19

from torch.utils.data import Dataset


class TensorDataset(Dataset):
    """Dataset wrapping tensors.
    Each sample will be retrieved by indexing tensors along the first dimension.
    Arguments:
        *tensors (Tensor): tensors that have the same size of the first dimension.
    """

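    def __init__(self, *tensors):
        # Every wrapped tensor must have the same length along dim 0,
        # because a single index selects one row from each of them.
        assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)
        self.tensors = tensors

    def __getitem__(self, index):
        # Index every tensor with the same index and return the results as a
        # tuple, e.g. (features[index], labels[index]).
        return tuple(tensor[index] for tensor in self.tensors)

    def __len__(self):
        # The number of samples is the size of the first dimension.
        return self.tensors[0].size(0)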
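To see what __getitem__ and __len__ return in practice, here is a minimal sketch using the TensorDataset that ships in torch.utils.data (assumed to behave like the class above, which it still does in current releases). The features and labels tensors are hypothetical toy data.

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical toy data: 5 samples with 3 features each, plus one label per sample.
features = torch.arange(15, dtype=torch.float32).reshape(5, 3)
labels = torch.tensor([0, 1, 0, 1, 1])

ds = TensorDataset(features, labels)

# __len__ is the size of the first dimension of the first tensor.
print(len(ds))  # 5

# __getitem__ applies the same index to every wrapped tensor and returns a tuple.
x, y = ds[2]
print(x, y)  # tensor([6., 7., 8.]) tensor(0)

# Tensors whose first dimensions differ are rejected by the assert in __init__.
mismatch_rejected = False
try:
    TensorDataset(features, torch.tensor([0, 1]))
except AssertionError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```

Because indexing is delegated to the tensors themselves, a slice such as ds[1:3] also works and returns a tuple of two-row tensors.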

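# DataLoader class from torch/utils/data/dataloader.py in an older PyTorch release
# (the pre-1.2 implementation). It relies on names from that file: default_collate
# and _DataLoaderIter are module-level names there, and the samplers come from
# torch/utils/data/sampler.py.
from torch.utils.data.sampler import SequentialSampler, RandomSampler, BatchSampler


class DataLoader(object):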
    r"""
    Data loader. Combines a dataset and a sampler, and provides
    single- or multi-process iterators over the dataset.

    Arguments:
        dataset (Dataset): dataset from which to load the data.
        batch_size (int, optional): how many samples per batch to load
            (default: 1).
        shuffle (bool, optional): set to ``True`` to have the data reshuffled
            at every epoch (default: False).
        sampler (Sampler, optional): defines the strategy to draw samples from
            the dataset. If specified, ``shuffle`` must be False.
        batch_sampler (Sampler, optional): like sampler, but returns a batch of
            indices at a time. Mutually exclusive with batch_size, shuffle,
            sampler, and drop_last.
        num_workers (int, optional): how many subprocesses to use for data
            loading. 0 means that the data will be loaded in the main process.
            (default: 0)
        collate_fn (callable, optional): merges a list of samples to form a mini-batch.
        pin_memory (bool, optional): If ``True``, the data loader will copy tensors
            into CUDA pinned memory before returning them.
        drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
            if the dataset size is not divisible by the batch size. If ``False`` and
            the size of dataset is not divisible by the batch size, then the last batch
            will be smaller. (default: False)
        timeout (numeric, optional): if positive, the timeout value for collecting a batch
            from workers. Should always be non-negative. (default: 0)
        worker_init_fn (callable, optional): If not None, this will be called on each
            worker subprocess with the worker id (an int in ``[0, num_workers - 1]``) as
            input, after seeding and before data loading. (default: None)

    .. note:: By default, each worker will have its PyTorch seed set to
              ``base_seed + worker_id``, where ``base_seed`` is a long generated
              by the main process using its RNG. However, seeds for other libraries
              may be duplicated upon initializing workers (e.g., NumPy), causing
              each worker to return identical random numbers. (See
              :ref:`dataloader-workers-random-seed` section in FAQ.) You may
              use ``torch.initial_seed()`` to access the PyTorch seed for each
              worker in :attr:`worker_init_fn`, and use it to set other seeds
              before data loading.

    .. warning:: If ``spawn`` start method is used, :attr:`worker_init_fn` cannot be an
                 unpicklable object, e.g., a lambda function.
    """

    # Name-mangled to _DataLoader__initialized; checked by __setattr__ below.
    __initialized = False

    def __init__(self, dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None,
                 num_workers=0, collate_fn=default_collate, pin_memory=False, drop_last=False,
                 timeout=0, worker_init_fn=None):
        self.dataset = dataset
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.collate_fn = collate_fn
        self.pin_memory = pin_memory
        self.drop_last = drop_last
        self.timeout = timeout
        self.worker_init_fn = worker_init_fn

        if timeout < 0:
            raise ValueError('timeout option should be non-negative')

        if batch_sampler is not None:
            if batch_size > 1 or shuffle or sampler is not None or drop_last:
                raise ValueError('batch_sampler option is mutually exclusive '
                                 'with batch_size, shuffle, sampler, and '
                                 'drop_last')
            self.batch_size = None
            self.drop_last = None

        if sampler is not None and shuffle:
            raise ValueError('sampler option is mutually exclusive with '
                             'shuffle')

        if self.num_workers < 0:
            raise ValueError('num_workers option cannot be negative; '
                             'use num_workers=0 to disable multiprocessing.')

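        if batch_sampler is None:
            # Default path: choose a per-sample sampler (random order if shuffle,
            # otherwise sequential) and wrap it in a BatchSampler that groups
            # indices into lists of batch_size.
            if sampler is None: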
                if shuffle:
                    sampler = RandomSampler(dataset)
                else:
                    sampler = SequentialSampler(dataset)
            batch_sampler = BatchSampler(sampler, batch_size, drop_last)

        self.sampler = sampler
        self.batch_sampler = batch_sampler
        self.__initialized = True

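    def __setattr__(self, attr, val):
        # Once __init__ has finished, batch_size, sampler and drop_last are frozen,
        # because batch_sampler has already been built from them.
        if self.__initialized and attr in ('batch_size', 'sampler', 'drop_last'):
            raise ValueError('{} attribute should not be set after {} is '
                             'initialized'.format(attr, self.__class__.__name__))

        super(DataLoader, self).__setattr__(attr, val)

    def __iter__(self):
        # Each call to iter() (i.e. each epoch) creates a fresh iterator that loads
        # batches in the main process or in worker subprocesses.
        return _DataLoaderIter(self)

    def __len__(self):
        # Number of batches, not number of samples.
        return len(self.batch_sampler)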
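The default branch of __init__ (a sampler chosen from shuffle, then wrapped in a BatchSampler), __len__ and the __setattr__ guard can be checked directly. This sketch assumes a current PyTorch install, where DataLoader, TensorDataset and the sampler classes are public in torch.utils.data and keep the same default-sampler logic as the older excerpt above; the toy data is hypothetical.

```python
import torch
from torch.utils.data import (BatchSampler, DataLoader, RandomSampler,
                              SequentialSampler, TensorDataset)

# Hypothetical toy data: 10 samples with 2 features each and integer labels.
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10)
ds = TensorDataset(features, labels)

# shuffle=False and sampler=None: a SequentialSampler wrapped in a BatchSampler.
seq = BatchSampler(SequentialSampler(ds), batch_size=4, drop_last=False)
print(list(seq), len(seq))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]] 3

# drop_last=True discards the incomplete last batch.
full = BatchSampler(SequentialSampler(ds), batch_size=4, drop_last=True)
print(list(full), len(full))  # [[0, 1, 2, 3], [4, 5, 6, 7]] 2

# shuffle=True: RandomSampler yields every index exactly once, in a new order.
shuffled = [i for batch in BatchSampler(RandomSampler(ds), 4, False) for i in batch]
print(sorted(shuffled) == list(range(10)))  # True

# DataLoader builds the same BatchSampler internally; len() counts batches.
loader = DataLoader(ds, batch_size=4, shuffle=False)
print(len(loader))  # 3
batches = []
for xb, yb in loader:
    # default_collate stacks the per-sample (x, y) tuples into batched tensors.
    batches.append((tuple(xb.shape), yb.tolist()))
print(batches)  # [((4, 2), [0, 1, 2, 3]), ((4, 2), [4, 5, 6, 7]), ((2, 2), [8, 9])]

# batch_size cannot be reassigned after construction (the __setattr__ guard).
try:
    loader.batch_size = 8
    frozen = False
except ValueError:
    frozen = True
print(frozen)  # True

# A sampler together with shuffle=True is rejected as mutually exclusive.
try:
    DataLoader(ds, sampler=SequentialSampler(ds), shuffle=True)
    exclusive = False
except ValueError:
    exclusive = True
print(exclusive)  # True
```

Note that len(loader) is ceil(10 / 4) = 3 with drop_last=False and would be 10 // 4 = 2 with drop_last=True, mirroring the two BatchSampler lengths.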
