Generating Video Subtitles with Python and Baidu Speech Recognition

A Xin • Published: 2020-04-10

Extracting audio from the video

Install moviepy:

pip install moviepy

Relevant code:

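from moviepy.editor import VideoFileClip  # import path for moviepy 1.x

audio_file = work_path + '\\out.wav'  # work_path and video_file are set elsewhere
video = VideoFileClip(video_file)
# Resample to 16 kHz mono for speech recognition
video.audio.write_audiofile(audio_file, ffmpeg_params=['-ar', '16000', '-ac', '1'])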
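Before sending audio to the recognizer, it can help to confirm that the extracted file really is 16 kHz mono. The following is a minimal sketch using only the standard-library `wave` module; the helper names `describe_wav` and `is_16k_mono` are illustrative and not part of the original post, and the sketch assumes the file is uncompressed PCM WAV.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) of a PCM WAV file."""
    with wave.open(path, 'rb') as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)
    return channels, rate, duration


def is_16k_mono(path):
    """True if the file has one channel sampled at 16000 Hz."""
    channels, rate, _ = describe_wav(path)
    return channels == 1 and rate == 16000
```

Calling `is_16k_mono(audio_file)` right after the extraction step would catch a wrong sample rate or channel count early.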

Splitting the audio on silence

Use the audio library pydub. Install it:

pip install pydub

First method:

from pydub import AudioSegment
from pydub.silence import split_on_silence

sound = AudioSegment.from_wav(audio_file)
# silence_thresh: anything quieter than sound.dBFS * 1.3 counts as silence.
# The audio is split wherever such silence lasts at least 500 ms.
sounds = split_on_silence(sound, min_silence_len=500, silence_thresh=sound.dBFS * 1.3)

sec = 0
for chunk in sounds:
    sec += len(chunk)  # len() of an AudioSegment is its duration in ms
print('split duration is ', sec)
print('dBFS: {0},max_dBFS: {1},duration: {2},split: {3}'.format(
    round(sound.dBFS, 2), round(sound.max_dBFS, 2), sound.duration_seconds, len(sounds)))
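The threshold `sound.dBFS * 1.3` may look odd at first: dBFS values of real audio are negative, so multiplying by 1.3 moves the threshold further below the average loudness. The sketch below only illustrates that arithmetic; `silence_threshold` and `dbfs_to_amplitude_ratio` are hypothetical helper names, and the -20 dBFS average is an assumed example value.

```python
def silence_threshold(average_dbfs, factor=1.3):
    """Scale a (negative) average loudness to get a lower silence threshold."""
    return average_dbfs * factor


def dbfs_to_amplitude_ratio(dbfs):
    """Convert dBFS to a linear amplitude ratio relative to full scale."""
    return 10 ** (dbfs / 20.0)


average = -20.0  # assumed average loudness of the clip, in dBFS
threshold = silence_threshold(average)
print('threshold (dBFS):', threshold)
print('threshold amplitude / average amplitude:',
      dbfs_to_amplitude_ratio(threshold) / dbfs_to_amplitude_ratio(average))
```

A quieter recording has a more negative average, so the same factor yields a proportionally lower threshold, which is why the threshold is derived from the clip rather than hard-coded.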


The split durations look off and the chunks are hard to locate in the original audio, so let's try another approach:

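from pydub.silence import detect_nonsilent

# Split the audio by searching for silence
# Reference: https://wqian.net/blog/2018/1128-python-pydub-split-mp3-index.html
# Arguments: min_silence_len=500 ms, silence_thresh=sound.dBFS * 1.3, seek_step=1 ms
timestamp_list = detect_nonsilent(sound, 500, sound.dBFS * 1.3, 1)

for section in timestamp_list:
    d = section[1] - section[0]
    print("Section is :", section, "duration is:", d)
print('dBFS: {0},max_dBFS: {1},duration: {2},split: {3}'.format(
    round(sound.dBFS, 2), round(sound.max_dBFS, 2), sound.duration_seconds, len(timestamp_list)))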

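The output (shown as a screenshot in the original post) lists the start and end timestamp of each non-silent section in milliseconds, which is easier to work with.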

Using Baidu speech recognition

First create an application on the Baidu AI Cloud platform to obtain an API Key and a Secret Key.


Getting an Access Token

Baidu AI products require authorization. A certain quota is free, which is enough for generating subtitles.

import json
from urllib.request import urlopen, Request
from urllib.error import HTTPError
from urllib.parse import urlencode

# Placeholders: use the keys of your own application
API_KEY = 'your-api-key'
SECRET_KEY = 'your-secret-key'
# Values as used in Baidu's official REST demo for the Pro version; check the current docs
TOKEN_URL = 'http://aip.baidubce.com/oauth/2.0/token'
SCOPE = 'brain_enhanced_asr'  # set to False to skip the scope check


class DemoError(Exception):
    pass


def fetch_token():
    """Fetch an Access Token from Baidu AI Cloud (Python 3)."""
    params = {'grant_type': 'client_credentials', 'client_id': API_KEY, 'client_secret': SECRET_KEY}
    post_data = urlencode(params).encode('utf-8')
    req = Request(TOKEN_URL, post_data)
    try:
        f = urlopen(req)
        result_str = f.read()
    except HTTPError as err:
        print('token http response http code : ' + str(err.code))
        result_str = err.read()  # an HTTP error response still has a body
    result_str = result_str.decode()

    result = json.loads(result_str)
    print(result)
    if 'access_token' in result and 'scope' in result:
        if SCOPE and SCOPE not in result['scope'].split(' '):  # SCOPE = False skips the check
            raise DemoError('scope is not correct')
        print('SUCCESS WITH TOKEN: %s EXPIRES IN SECONDS: %s' % (result['access_token'], result['expires_in']))
        return result['access_token']
    else:
        raise DemoError('MAYBE API_KEY or SECRET_KEY not correct: access_token or scope not found in token response')
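Since the token response includes `expires_in`, a script that runs repeatedly does not need to request a new token every time. The following is a minimal in-memory caching sketch. `TokenCache` is a hypothetical helper, and it assumes a `fetch` callable that returns a `(token, expires_in_seconds)` pair, which differs from `fetch_token()` above (that function returns only the token string and would need a small change to expose `expires_in`).

```python
import time


class TokenCache:
    """Cache a token in memory and refresh it shortly before it expires."""

    def __init__(self, fetch, margin_seconds=60, clock=time.time):
        self._fetch = fetch            # callable returning (token, expires_in_seconds)
        self._margin = margin_seconds  # refresh this many seconds early
        self._clock = clock            # injectable so the cache can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in - self._margin
        return self._token
```

For a one-off subtitle run, calling `fetch_token()` once at the start is enough; the cache only pays off for long-running or repeated jobs.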

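Recognizing speech from raw data

Here we use the Pro version (極速版) of Baidu speech recognition to convert the speech to text. According to the official description, it runs on a dedicated GPU cluster, responds twice as fast as the standard API, and improves recognition accuracy by 15%. It is intended for near-field short-speech interaction such as voice search on phones and chat input. It accepts a complete recording of at most 60 seconds and returns the recognition result in real time.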

from urllib.request import urlopen, Request
from urllib.error import HTTPError
from urllib.parse import urlencode

# Values as used in Baidu's official REST demo for the Pro version; check the current docs
ASR_URL = 'http://vop.baidu.com/pro_api'
DEV_PID = 80001
CUID = '123456PYTHON'  # any unique identifier for the caller
FORMAT = 'pcm'  # raw PCM data taken from pydub
RATE = 16000  # matches the 16 kHz audio extracted earlier


def asr_raw(speech_data, token):
    length = len(speech_data)
    if length == 0:
        raise DemoError('file length read 0 bytes')

    params = {'cuid': CUID, 'token': token, 'dev_pid': DEV_PID}
    # Use the following line instead to test the self-training platform
    # params = {'cuid': CUID, 'dev_pid': DEV_PID, 'lm_id': LM_ID}
    params_query = urlencode(params)

    headers = {
        'Content-Type': 'audio/' + FORMAT + '; rate=' + str(RATE),
        'Content-Length': str(length),
    }

    url = ASR_URL + "?" + params_query
    req = Request(url, speech_data, headers)
    try:
        f = urlopen(req)
        result_str = f.read()
    except HTTPError as err:
        result_str = err.read()  # an HTTP error response still has a body

    return str(result_str, 'utf-8')
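Because a single request may carry at most 60 seconds of audio, a non-silent section longer than that would need to be cut before it is sent. The sketch below shows one simple way to do this on `[start_ms, end_ms]` pairs like those returned by `detect_nonsilent`. `split_long_sections` is a hypothetical helper that is not in the original post, and cutting at fixed offsets is an assumption that may split a word in the middle.

```python
MAX_SEGMENT_MS = 60 * 1000  # the 60-second limit, in milliseconds


def split_long_sections(sections, max_ms=MAX_SEGMENT_MS):
    """Cut every [start_ms, end_ms] section into pieces no longer than max_ms."""
    result = []
    for start, end in sections:
        while end - start > max_ms:
            result.append([start, start + max_ms])
            start += max_ms
        if end > start:
            result.append([start, end])
    return result


print(split_long_sections([[0, 1000], [2000, 130000]]))
# [[0, 1000], [2000, 62000], [62000, 122000], [122000, 130000]]
```

A gentler alternative would be to lower `min_silence_len` for long sections so that they break at natural pauses instead of fixed offsets.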

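Generating the subtitles

Subtitle format: https://www.cnblogs.com/tocy/p/subtitle-format-srt.html

Generating subtitles is simply an application of speech recognition: assemble the recognized text according to the srt subtitle format and you are done. See the article linked above for the details of the format. The code is as follows: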

token = fetch_token()
text = []
idx = 1  # SRT entries are numbered from 1
for start, end in timestamp_list:
    data = sound[start:end].raw_data
    str_rst = asr_raw(data, token)
    result = json.loads(str_rst)

    if result['err_no'] == 0:
        # format_time must return an SRT timestamp such as 00:00:01,500
        text.append('{0}\n{1} --> {2}\n'.format(idx, format_time(start / 1000), format_time(end / 1000)))
        text.append(result['result'][0])
        text.append('\n\n')  # a blank line separates SRT entries
        idx = idx + 1
        print(format_time(start / 1000), "txt is ", result['result'][0])
# "w" creates the file if it does not exist; UTF-8 keeps Chinese text intact
with open(srt_file, "w", encoding="utf-8") as f:
    f.writelines(text)
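The post does not show `format_time`. A possible implementation, offered here as a sketch rather than the author's version, converts seconds into the `HH:MM:SS,mmm` timestamp that srt files use. `srt_entry` is an additional hypothetical helper that assembles one complete entry.

```python
def format_time(seconds):
    """Convert seconds (int or float) to an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600 * 1000)
    minutes, rest = divmod(rest, 60 * 1000)
    secs, millis = divmod(rest, 1000)
    return '{0:02d}:{1:02d}:{2:02d},{3:03d}'.format(hours, minutes, secs, millis)


def srt_entry(index, start_ms, end_ms, text):
    """Build one SRT entry: index, time range, text, then a blank line."""
    return '{0}\n{1} --> {2}\n{3}\n\n'.format(
        index, format_time(start_ms / 1000.0), format_time(end_ms / 1000.0), text)


print(format_time(3661.5))  # 01:01:01,500
print(srt_entry(1, 1500, 4000, 'hello'), end='')
```

Working in whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point rounding problems in the millisecond field.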

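Summary

I downloaded a video from a video site to test with. The Pro mode was the best in both speed and recognition accuracy, and it felt easier to use than NetEase's Jianwai (網易見外) platform.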
