A First Look at Parallel File Processing in Python (Part 2)

阿新 • Published: 2020-12-10

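The previous post parallelized the work by splitting a large file into smaller files on disk. A more efficient approach is to split the file in memory: compute byte ranges for the file and let each process handle its own range.

Each result is then written straight to the target file, which removes the steps of splitting into small files, merging them, and deleting them afterwards.

The code is as follows: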

import math
from multiprocessing import Pool

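"""
Start several processes that read and write the file directly, without splitting it into smaller files.
The callback passed to apply_async receives the return value of the called function;
error_callback receives the exception that was raised.
"""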


# User business logic
def business(line):
    return line


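def my_callback(lines):
    # Runs in the parent process each time a worker finishes a block.
    # `lines` is the string returned by Reader.execute; append it to
    # output_file, the module-level name set in the __main__ section.
    with open(output_file, 'a') as f:
        f.write(lines)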


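# Read and process one block of the file
class Reader(object):
    def __init__(self, file_name, start_pos, end_pos, business_func):
        self.file_name = file_name
        self.start_pos = start_pos
        self.end_pos = end_pos
        self.business_func = business_func

    def execute(self):
        lines = []
        # Binary mode keeps seek()/tell() as exact byte offsets
        with open(self.file_name, 'rb') as f:
            if self.start_pos != 0:
                # If the block starts mid-line, skip to the next line start;
                # the previous block handles the line that straddles the boundary
                f.seek(self.start_pos - 1)
                if f.read(1) != b'\n':
                    f.readline()
                    self.start_pos = f.tell()
            f.seek(self.start_pos)
            # Process every line that starts inside [start_pos, end_pos]
            while self.start_pos <= self.end_pos:
                raw = f.readline()
                if not raw:  # end of file; without this check the last block never terminates
                    break
                # Assumes UTF-8 text; change the encoding to match the input file
                line = raw.decode('utf-8').rstrip('\r\n')
                lines.append(self.business_func(line))
                self.start_pos = f.tell()
        # business_func receives each line without its newline; add one back per line
        return ''.join(new_line + '\n' for new_line in lines)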


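# Split the file into the requested number of blocks; returns a list of (start, end) byte positions
class FileBlock(object):
    def __init__(self, file_name, block_num):
        self.file_name = file_name
        self.block_num = block_num

    def block_file(self):
        pos_list = []
        # Binary mode so that tell() reports the size in bytes
        with open(self.file_name, 'rb') as f:
            f.seek(0, 2)  # jump to the end of the file
            start_pos = 0
            file_size = f.tell()
            block_size = math.ceil(file_size / self.block_num)
            # Use < rather than <= so that no empty block is emitted at end of file
            while start_pos < file_size:
                if start_pos + block_size > file_size:
                    pos_list.append((start_pos, file_size))
                else:
                    pos_list.append((start_pos, start_pos + block_size))
                # End positions are inclusive, so the next block starts one byte later
                start_pos = start_pos + block_size + 1

        return pos_list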


if __name__ == '__main__':
    concurrency = 8
    p = Pool(concurrency)
    input_file = '/opt/test/target.txt'
    output_file = '/opt/test/target2.txt'
    fb = FileBlock(input_file, concurrency)
    for s, e in fb.block_file():
        reader = Reader(input_file, s, e, business)
        p.apply_async(reader.execute, callback=my_callback)

    p.close()
    p.join()
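
One thing to keep in mind with the callback approach: apply_async callbacks fire as workers finish, so blocks can reach the output file in a different order from the input. If the original line order matters, one option is to collect the results with Pool.map, which returns them in task order, and write them out once. The following is only a minimal sketch under a few assumptions: the input is UTF-8 text, the names line_aligned_ranges, process_range and run_in_order are hypothetical helpers invented for this example, and upper-casing stands in for the real business logic.

```python
import os
from multiprocessing import Pool


def line_aligned_ranges(path, parts):
    """Split a file into at most `parts` half-open byte ranges [start, end)
    that begin and end on line boundaries."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    step = max(1, -(-size // parts))  # ceiling division
    cuts = [0]
    with open(path, 'rb') as f:
        while cuts[-1] < size:
            target = min(cuts[-1] + step, size)
            f.seek(target)
            f.readline()  # move forward to the end of the current line
            cuts.append(min(f.tell(), size))
    return list(zip(cuts[:-1], cuts[1:]))


def process_range(task):
    """Worker: read bytes [start, end) and return the processed text."""
    path, start, end = task
    with open(path, 'rb') as f:
        f.seek(start)
        data = f.read(end - start)
    # Placeholder business logic (hypothetical): upper-case the block
    return data.decode('utf-8').upper()


def run_in_order(path, out_path, workers=4):
    """Process the file in parallel and write the results in input order."""
    tasks = [(path, s, e) for s, e in line_aligned_ranges(path, workers)]
    with Pool(workers) as pool:
        results = pool.map(process_range, tasks)  # results come back in task order
    with open(out_path, 'w', encoding='utf-8', newline='') as out:
        out.writelines(results)
```

Calling run_in_order('/opt/test/target.txt', '/opt/test/target2.txt', 8) from inside an `if __name__ == '__main__':` section would mirror the script above. The trade-off is that every block's result is held in memory until all workers are done, whereas the callback version writes each block as soon as it is ready.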
