Python's yield: usage and principles

阿新 • Published: 2019-01-15

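I translated a Stack Overflow answer on how yield is used. The translation is rough, so bear with me; I am hoping to improve bit by bit.

To understand what yield does, we need to understand what generators are. Before generators come iterables.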

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and therefore an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

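Anything you can read with "for ... in ..." is an iterable: lists, strings, files and so on. Iterables are convenient because you can read them as many times as you like, but all of their values are stored in memory, and that is often more than you need when there are many values.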
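As a small illustration of that point, the sketch below runs the same for loop over a list, a string, and a file-like object. The io.StringIO object is only a stand-in for a real file (an assumption made so the example needs nothing on disk), and the name collected is made up for this example.

```python
import io

collected = []

# A list, a string, and an in-memory text stream standing in for a file.
sources = ([1, 2], "ab", io.StringIO("x\ny\n"))

for source in sources:
    # The same loop works for each one because they are all iterables.
    for item in source:
        collected.append(item)

print(collected)  # [1, 2, 'a', 'b', 'x\n', 'y\n']
```

The list yields its elements, the string yields its characters, and the file-like object yields its lines, newline included.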

Generators

Generators are iterables too, but you can only iterate over them once. They do not keep all their values in memory; they produce the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

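This is almost the same as the list comprehension, except that () replaces []. But you cannot run `for i in mygenerator` a second time, because a generator can only be used once: it computes 0 and forgets it, then computes 1, and finally 4, one value after another.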
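The single-use behaviour is easy to check. A minimal sketch, with variable names chosen only for this example:

```python
squares = (x * x for x in range(3))

first_pass = list(squares)   # consumes every value the generator produces
second_pass = list(squares)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4]
print(second_pass)  # []
```

The second pass is empty rather than an error: an exhausted generator simply has nothing left to give.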

Yield

yield is used somewhat like return, except that the function returns a generator. For example:

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

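The example above is very basic, but it shows how yield is used: calling createGenerator() gives back a generator.

To master yield, you must understand its key point: when you call the function, the code you wrote in the function body does not run. The function only returns a generator object. This is a bit tricky :-)

Your code then runs, picking up where it left off, each time the for loop asks the generator for a value.

Now the hard part:

The first time the for loop asks the generator object for a value, the code in your function runs from the beginning until it hits yield, and the yielded value becomes the first value of the loop. Each later request resumes the function, runs one more iteration of the loop you wrote, and returns the next value. This goes on until the generator is considered empty, which happens when the function runs without hitting yield again. That can be because the loop has come to an end, or because an "if/else" condition is no longer satisfied.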
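The following sketch makes that order of execution visible by recording what the function body does. The function squares and the list log are made up for this illustration; next() is the built-in that a for loop uses behind the scenes to request each value.

```python
log = []  # records which parts of the generator body have run


def squares(n):
    log.append("body started")
    for i in range(n):
        log.append(f"yielding {i * i}")
        yield i * i
    log.append("body finished")


gen = squares(2)
log_after_call = list(log)      # snapshot: calling the function ran nothing

first = next(gen)               # runs the body up to the first yield
log_after_first = list(log)

second = next(gen)              # resumes after the yield, runs to the next one
leftover = next(gen, "empty")   # body runs off the end, so the default is returned

print(log_after_call)   # []
print(log_after_first)  # ['body started', 'yielding 0']
print(first, second, leftover)  # 0 1 empty
```

Passing a default to next() is only a convenience for the demonstration; without it, the last call would raise StopIteration, which is the signal a for loop uses to stop.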

Your code explained

Generator:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there are no more than two values: the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidates list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result
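The two excerpts are fragments of a larger class that is not shown here. To see them run together, here is a self-contained sketch under stated assumptions: the Node class, its constructor, the find method name, and the absolute-difference distance in _get_dist are all hypothetical stand-ins, while the attribute names and the two conditions are copied from the excerpts.

```python
class Node:
    """Hypothetical tree node; only the attribute names come from the excerpts."""

    def __init__(self, median, values, left=None, right=None):
        self._median = median
        self._values = values
        self._leftchild = left
        self._rightchild = right

    def _get_dist(self, obj):
        # Assumption: a plain absolute difference stands in for the real metric.
        return abs(obj - self._median)

    def _get_child_candidates(self, distance, min_dist, max_dist):
        # Same conditions as the generator excerpt: yields zero, one or two children.
        if self._leftchild and distance - max_dist < self._median:
            yield self._leftchild
        if self._rightchild and distance + max_dist >= self._median:
            yield self._rightchild

    def find(self, obj, min_dist, max_dist):
        # Same loop as the caller excerpt, wrapped in a method so `return` is valid.
        result, candidates = [], [self]
        while candidates:
            node = candidates.pop()
            distance = node._get_dist(obj)
            if min_dist <= distance <= max_dist:
                result.extend(node._values)
            candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
        return result


tree = Node(5, ["root"], left=Node(2, ["left"]), right=Node(8, ["right"]))

narrow = tree.find(5, 0, 3)   # only the left child passes its condition
wide = tree.find(5, 0, 10)    # both children are yielded; the right one is popped first

print(narrow)  # ['root', 'left']
print(wide)    # ['root', 'right', 'left']
```

Each call to _get_child_candidates creates a fresh generator for one node, and extend() drains it immediately, so no generator is ever read twice.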

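The caller's loop contains a few clever parts:

1. The loop iterates over a list, but the list grows while the loop is running :-) This is a concise way to walk through nested data, although it is a little risky because you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but the while loop keeps creating new generator objects, and these produce different values from the earlier ones because they are applied to different nodes.

2. The extend() method is a list method that expects an iterable and adds its values to the list.

Usually we pass it a list: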

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

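But in this code it receives a generator, which is good for two reasons:

First, you do not need to read the values twice.

Second, you do not need to hold all the values in memory.

It works because Python does not care whether the argument of a method is a list. Python expects an iterable, so it works with strings, lists, tuples and generators. This is called duck typing, and it is one of the reasons Python is so cool. But duck typing is another story, for another question.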
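A quick sketch of that duck typing: list.extend() accepts each kind of iterable mentioned above. The generator function squares and the variable items are made up for this example.

```python
def squares(n):
    # A small generator function, used here only as one more kind of iterable.
    for i in range(n):
        yield i * i


items = [1, 2]
items.extend([3, 4])       # a list
items.extend((5, 6))       # a tuple
items.extend("ab")         # a string contributes one item per character
items.extend(squares(3))   # a generator is consumed value by value

print(items)  # [1, 2, 3, 4, 5, 6, 'a', 'b', 0, 1, 4]
```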

Now let's look at some advanced usage:

Controlling generator exhaustion:

>>> class Bank(): # let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
...
>>> hsbc = Bank() # when everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(next(corner_street_atm))
$100
>>> print(next(corner_street_atm))
$100
>>> print([next(corner_street_atm) for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # crisis is coming, no more money!
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> wall_street_atm = hsbc.create_atm() # it's even true for new ATMs
>>> print(next(wall_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> hsbc.crisis = False # trouble is, even post-crisis the ATM remains empty
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> brand_new_atm = hsbc.create_atm() # build a new one to get back in business
>>> for cash in brand_new_atm:
...    print(cash)
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

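This can be very useful for things like controlling access to a resource.

The itertools module

The itertools module contains special functions for manipulating iterables. Ever wanted to duplicate a generator, or chain two generators together? And so on.

Then just import itertools.

Here is an example: the possible orders of arrival in a four-horse race: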

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]
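Duplicating and chaining, the two wishes mentioned at the start of this section, correspond to itertools.tee and itertools.chain in the standard library. A minimal sketch, with squares as a made-up generator function:

```python
import itertools


def squares(n):
    for i in range(n):
        yield i * i


# tee() splits one iterator into independent iterators over the same values.
copy_a, copy_b = itertools.tee(squares(3))
values_a = list(copy_a)
values_b = list(copy_b)

# chain() reads its arguments one after another as a single stream.
chained = list(itertools.chain(squares(2), squares(3)))

print(values_a, values_b)  # [0, 1, 4] [0, 1, 4]
print(chained)             # [0, 1, 0, 1, 4]
```

After calling tee(), the original generator should not be used directly any more, since advancing it would make the copies miss values.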

Finally, understanding the inner mechanisms of iteration:

Iteration is a process involving iterables (implementing the __iter__() method) and iterators (implementing the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.
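To make the two roles concrete, here is a hand-written iterable and iterator pair. The Countdown and CountdownIterator classes are invented for this sketch; only __iter__() and __next__() are the actual protocol methods.

```python
class Countdown:
    """Iterable: hands out a fresh iterator every time iter() is called on it."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: remembers the current position and produces one value per call."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators return themselves so they can also be used in a for loop.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells the for loop there is nothing left
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first_pass = list(countdown)    # the for machinery calls iter(), then next() repeatedly
second_pass = list(countdown)   # a new iterator, so the values come out again

iterator = iter(countdown)
head = next(iterator)
rest = list(iterator)           # the iterator itself is single use, like a generator

print(first_pass, second_pass)  # [3, 2, 1] [3, 2, 1]
print(head, rest)               # 3 [2, 1]
```

A generator object plays the part of CountdownIterator here: it is its own iterator, which is why it can only be consumed once.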

For more on this, read about how for loops work.
