Getting the top-K of a high-dimensional numpy array

阿新 • Published: 2021-01-13

Tags: Python

Preface

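For the theory, look up how numpy's argpartition and partition methods work; this article covers only usage and a performance check. Note that numpy >= 1.15.0 is required: argpartition has been available since 1.8.0, but the function in this article also uses np.take_along_axis, which was added in 1.15.0.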
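
As a quick orientation before the full function, here is a minimal sketch of what argpartition and partition return on a 1-D array. The array `scores` and the value of `k` are made up for illustration only.

```python
import numpy as np

scores = np.array([0.3, 0.9, 0.1, 0.7, 0.5])
k = 2

# With kth=-k, the last k positions hold the indices of the k largest values.
# Their order inside that slice is not guaranteed.
top_idx = np.argpartition(scores, -k)[-k:]
print(sorted(top_idx.tolist()))

# partition applies the same rearrangement to the values themselves.
top_vals = np.partition(scores, -k)[-k:]
print(sorted(top_vals.tolist()))

# Sorting only those k candidates gives the usual descending order.
order = np.argsort(scores[top_idx])[::-1]
print(top_idx[order], scores[top_idx][order])
```

Only the k selected elements are fully sorted at the end, which is where the saving over a full sort comes from.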

Main text

Enough talk, here is the code. It should be clear at a glance; if it is not, run it yourself and see.

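import numpy as np

def get_sorted_top_k(array, top_k=1, axis=-1, reverse=False):
    """
    Return the sorted top-k entries of a multi-dimensional array along one axis.
    Args:
        array: multi-dimensional array
        top_k: number of entries to keep
        axis: axis along which to select
        reverse: if True, return the k largest in descending order;
            otherwise the k smallest in ascending order

    Returns:
        top_sorted_scores: the selected values
        top_sorted_indexes: their positions along `axis` in the input
    """
    if reverse:
        # argpartition puts the index of the kth element where a full sort would put it,
        # with smaller values before it and larger ones after; partition does the same for values.
        # Elements inside each partition stay unordered, so the top-k slice is re-sorted below.
        # The smaller top_k is relative to the axis length, the more this saves over a full sort.
        axis_length = array.shape[axis]
        partition_index = np.take(np.argpartition(array, kth=-top_k, axis=axis),
                                  range(axis_length - top_k, axis_length), axis)
    else:
        # kth is a zero-based position, so kth=top_k - 1 places the k smallest first
        # (kth=top_k would be out of bounds when top_k equals the axis length)
        partition_index = np.take(np.argpartition(array, kth=top_k - 1, axis=axis), range(0, top_k), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    # re-sort only the k selected entries
    sorted_index = np.argsort(top_scores, axis=axis)
    if reverse:
        sorted_index = np.flip(sorted_index, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indexes = np.take_along_axis(partition_index, sorted_index, axis)
    return top_sorted_scores, top_sorted_indexes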


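if __name__ == "__main__":
    import time
    from sklearn.metrics.pairwise import cosine_similarity

    x = np.random.rand(10, 128)
    y = np.random.rand(1000000, 128)
    z = cosine_similarity(x, y)
    # top-3 via argpartition, then sorting only the 3 selected entries per row
    start_time = time.perf_counter()
    sorted_index_1 = get_sorted_top_k(z, top_k=3, axis=1, reverse=True)[1]
    print("argpartition:", time.perf_counter() - start_time)
    # baseline: full argsort of every row
    start_time = time.perf_counter()
    sorted_index_2 = np.flip(np.argsort(z, axis=1)[:, -3:], axis=1)
    print("argsort:", time.perf_counter() - start_time)
    # both approaches should return the same indices (assuming no tied scores)
    print((sorted_index_1 == sorted_index_2).all())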
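
The check above needs scikit-learn and a large matrix. As a lighter sanity check, the sketch below compares the partition approach with a full argsort along every axis of a small 3-D array. The helper `top_k_desc` and the array `data` are hypothetical, condensed for illustration and not the article's function; `np.random.default_rng` assumes numpy >= 1.17.

```python
import numpy as np

def top_k_desc(array, k, axis=-1):
    """Hypothetical helper: indices of the k largest values along `axis`, largest first."""
    n = array.shape[axis]
    # Unordered indices of the k largest values along the axis.
    idx = np.argpartition(array, -k, axis=axis)
    idx = np.take(idx, np.arange(n - k, n), axis=axis)
    # Order only those k candidates, largest first.
    vals = np.take_along_axis(array, idx, axis=axis)
    order = np.flip(np.argsort(vals, axis=axis), axis=axis)
    return np.take_along_axis(idx, order, axis=axis)

rng = np.random.default_rng(0)
# A permutation has no repeated values, so the top-k indices are unambiguous.
data = rng.permutation(2 * 3 * 50).reshape(2, 3, 50).astype(float)

k = 2
for axis in (0, 1, 2):
    n = data.shape[axis]
    fast = top_k_desc(data, k, axis=axis)
    # Reference: full sort, keep the last k, reverse to descending order.
    full = np.argsort(data, axis=axis)
    slow = np.flip(np.take(full, np.arange(n - k, n), axis=axis), axis=axis)
    assert np.array_equal(fast, slow)
```

Using distinct values matters here: with ties, both methods are still valid top-k answers but may pick different indices, so an exact index comparison could fail.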

Afterword

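Not to brag, but this code looks pretty neat, and it is considerably faster than a full argsort when top_k is small relative to the axis length.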
