Latest Google algorithm: test results of a Chinese TTS implementation

阿新 • Published: 2018-11-12

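Introduction

This article implements Chinese TTS. The resulting system does not call the Baidu, Alibaba, Tencent, or iFlytek speech APIs at synthesis time. It relies only on a model I trained myself, built through sample processing and testing. Baidu's API is used only to produce the training data.

How the samples were made:

Because of limited time and money, I could not hire professionals to record a large number of samples. My workaround was:

Using the Baidu speech synthesis API

With Baidu's speech synthesis API, I wrote a short script that has the API read aloud a novel of about 450,000 characters. Each sentence (one line of the text file) becomes one training sample.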

import os
import time

from aip import AipSpeech  # Baidu AI SDK (pip install baidu-aip)

APP_ID = '114788XX'              # your own App ID
API_KEY = 'your_api_key'         # your own API key
SECRET_KEY = 'your_secret_key'   # your own secret key

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

txt_path = 'XX.txt'  # the text you want the Baidu API to read, one sample per line

# Optional pre-processing: hard-wrap the text into 30-character lines.
# import re
# with open(txt_path, 'r', encoding='utf8') as f:
#     text = f.read()
#     text = re.sub(r'(.{30})', lambda x: '{}\n'.format(x.group(1)), text)
# with open(txt_path, 'w', encoding='utf8') as f:
#     f.write(text)

# per = voice ID, spd = speed, vol = volume, aue = 6 (wav output)
OPTIONS = {'per': '4', 'spd': '5', 'vol': '7', 'aue': '6'}

os.makedirs('./wav', exist_ok=True)
os.makedirs('./txt', exist_ok=True)

with open(txt_path, 'r', encoding='utf8') as src:
    for i, line in enumerate(src):
        index = '2B%06d' % i      # sample ID, e.g. 2B000000
        # To resume after an interruption, skip lines that are already done:
        # if i < 8331:
        #     continue
        line = line.strip()
        if not line:              # blank lines cannot be synthesized
            continue

        try:
            res = client.synthesis(line, 'zh', 1, OPTIONS)
        except Exception:
            # Back off and retry once (network error or rate limiting).
            time.sleep(5)
            res = client.synthesis(line, 'zh', 1, OPTIONS)

        # On success the SDK returns audio bytes; on failure, a dict with the error.
        if not isinstance(res, dict):
            with open('./wav/{}.wav'.format(index), 'wb') as wav_file:
                wav_file.write(res)

            with open('./txt/{}.txt'.format(index), 'w', encoding='utf8') as txt_file:
                # line = pinyin.get(line, format="numerical", delimiter=" ")
                txt_file.write(line)
        else:
            print(index, 'err', res)

        print(index)
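The commented-out re.sub step above hard-wraps the novel every 30 characters, so many samples start or end in the middle of a sentence. A minimal alternative sketch, not the method used for the published samples, is to split at Chinese sentence-ending punctuation first and hard-wrap only the sentences that are still too long. The function name split_for_tts, the 30-character limit, and the punctuation set 。！？； are my own assumptions.

```python
import re

# Assumed set of full-width sentence-ending punctuation marks.
_SENTENCE = re.compile(r'[^。！？；]+[。！？；]*')


def split_for_tts(text, max_len=30):
    """Hypothetical helper: split text into TTS lines of at most max_len chars.

    Splits each paragraph at sentence-ending punctuation, then hard-wraps
    any sentence that is still longer than max_len.
    """
    pieces = []
    for paragraph in text.splitlines():
        for sentence in _SENTENCE.findall(paragraph):
            sentence = sentence.strip()
            # Hard-wrap over-long sentences, like the original re.sub step.
            while len(sentence) > max_len:
                pieces.append(sentence[:max_len])
                sentence = sentence[max_len:]
            if sentence:
                pieces.append(sentence)
    return pieces


def write_tts_lines(src_path, dst_path, max_len=30):
    """Write one TTS line per row, ready to be used as txt_path above."""
    with open(src_path, encoding='utf8') as f:
        lines = split_for_tts(f.read(), max_len)
    with open(dst_path, 'w', encoding='utf8') as f:
        f.write('\n'.join(lines) + '\n')
```

For example, split_for_tts("海水很藍。天空中飛來一群小鳥！") gives ["海水很藍。", "天空中飛來一群小鳥！"]. Sentence-level samples keep prosody natural at the cost of more variable sample lengths.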

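Training and sample processing

The training samples must use the same format as the LJSpeech training samples from my previous article, "深度學習之經驗和訓練集（訓練中英文樣本）" (Deep learning: experience and training sets, training on Chinese and English samples).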
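LJSpeech stores its audio under wavs/ and lists every clip in a pipe-separated metadata.csv with the columns ID|transcription|normalized transcription. The Baidu script above writes ./wav/<id>.wav and ./txt/<id>.txt instead, so a conversion step is needed. The following is a hypothetical helper that assumes those two directories as input. It copies the audio and reuses the raw text as the normalized column, because it applies no text normalization (for example, conversion to pinyin).

```python
import os
import shutil


def build_ljspeech_layout(wav_dir, txt_dir, out_dir):
    """Hypothetical helper: turn wav/ + txt/ pairs into an LJSpeech-style
    layout (out_dir/wavs/*.wav plus out_dir/metadata.csv).

    Returns the number of rows written to metadata.csv.
    """
    wavs_out = os.path.join(out_dir, 'wavs')
    os.makedirs(wavs_out, exist_ok=True)
    rows = []
    for name in sorted(os.listdir(wav_dir)):
        utt_id, ext = os.path.splitext(name)
        if ext.lower() != '.wav':
            continue
        txt_path = os.path.join(txt_dir, utt_id + '.txt')
        if not os.path.isfile(txt_path):
            continue  # skip clips that have no transcript
        with open(txt_path, encoding='utf8') as f:
            # '|' is the column separator, so it must not appear in the text.
            text = f.read().strip().replace('|', ' ')
        shutil.copyfile(os.path.join(wav_dir, name),
                        os.path.join(wavs_out, name))
        # Columns: ID | transcription | normalized transcription.
        rows.append('{}|{}|{}'.format(utt_id, text, text))
    with open(os.path.join(out_dir, 'metadata.csv'), 'w', encoding='utf8') as f:
        for row in rows:
            f.write(row + '\n')
    return len(rows)
```

Usage: build_ljspeech_layout('./wav', './txt', './dataset'). The row order follows the sorted sample IDs, so the 2B000000-style names stay in reading order.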

Sample download

Link: https://pan.baidu.com/s/1k0auHRQQkSyfGB-nAcwlDA Password: 7yyq

For the core training algorithm, join the QQ group: 821953467


from __future__ import print_function

import argparse
from datetime import datetime
import json
import os
import sys
import time

import tensorflow as tf
from tensorflow.python.client import timeline

from wavenet import WaveNetModel, AudioReader, optimizer_factory

# Select the GPU to train on: the second GPU, ordered by PCI bus ID.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

BATCH_SIZE = 1
DATA_DIRECTORY = './VCTK-Corpus'  # default only; pass --data_dir to train on your own samples
LOGDIR_ROOT = './logdir'
CHECKPOINT_EVERY = 50
NUM_STEPS = int(1e5)
LEARNING_RATE = 1e-3
WAVENET_PARAMS = './wavenet_params.json'
STARTED_DATESTRING = "{0:%Y-%m-%dT%H-%M-%S}".format(datetime.now())
SAMPLE_SIZE = 100000
L2_REGULARIZATION_STRENGTH = 0
SILENCE_THRESHOLD = 0.3
EPSILON = 0.001
MOMENTUM = 0.9
MAX_TO_KEEP = 5
METADATA = False


def get_arguments():
    def _str_to_bool(s):
        """Convert string to bool (in argparse context)."""
        if s.lower() not in ['true', 'false']:
            raise ValueError('Argument needs to be a '
                             'boolean, got {}'.format(s))
        return {'true': True, 'false': False}[s.lower()]

    parser = argparse.ArgumentParser(description='WaveNet example network')
    parser.add_argument('--batch_size', type=int, default=BATCH_SIZE,
                        help='How many wav files to process at once. Default: ' + str(BATCH_SIZE) + '.')
    parser.add_argument('--data_dir', type=str, default=DATA_DIRECTORY,
                        help='The directory containing the VCTK corpus.')
    parser.add_argument('--store_metadata', type=_str_to_bool, default=METADATA,
                        help='Whether to store advanced debugging information '
                        '(execution time, memory consumption) for use with '
                        'TensorBoard. Default: ' + str(METADATA) + '.')
    parser.add_argument('--logdir', type=str, default=None,
                        help='Directory in which to store the logging '
                        'information for TensorBoard. '
                        'If the model already exists, it will restore '
                        'the state and will continue training. '
                        'Cannot use with --logdir_root and --restore_from.')
    parser.add_argument('--logdir_root', type=str, default=None,
                        help='Root directory to place the logging '
                        'output and generated model. These are stored '
                        'under the dated subdirectory of --logdir_root. '
                        'Cannot use with --logdir.')
    parser.add_argument('--restore_from', type=str, default=None,
                        help='Directory in which to restore the model from. '
                        'This creates the new model under the dated directory '
                        'in --logdir_root. '
                        'Cannot use with --logdir.')
    parser.add_argument('--checkpoint_every', type=int,
                        default=CHECKPOINT_EVERY,
                        help='How many steps to save each checkpoint after. Default: ' + str(CHECKPOINT_EVERY) + '.')
    parser.add_argument('--num_steps', type=int, default=NUM_STEPS,
                        help='Number of training steps. Default: ' + str(NUM_STEPS) + '.')
    parser.add_argument('--learning_rate', type=float, default=LEARNING_RATE,
                        help='Learning rate for training. Default: ' + str(LEARNING_RATE) + '.')
    parser.add_argument('--wavenet_params', type=str, default=WAVENET_PARAMS,
                        help='JSON file with the network parameters. Default: ' + WAVENET_PARAMS + '.')
    parser.add_argument('--sample_size', type=int, default=SAMPLE_SIZE,
                        help='Concatenate and cut audio samples to this many '
                        'samples. Default: ' + str(SAMPLE_SIZE) + '.')
    parser.add_argument('--l2_regularization_strength', type=float,
                        default=L2_REGULARIZATION_STRENGTH,
                        help='Coefficient in the L2 regularization. '
                        'Default: ' + str(L2_REGULARIZATION_STRENGTH) + ' (disabled).')
    parser.add_argument('--silence_threshold', type=float,
                        default=SILENCE_THRESHOLD,
                        help='Volume threshold below which to trim the start '
                        'and the end from the training set samples. Default: ' + str(SILENCE_THRESHOLD) + '.')
    parser.add_argument('--optimizer', type=str, default='adam',
                        choices=optimizer_factory.keys(),
                        help='Select the optimizer specified by this option. Default: adam.')
    parser.add_argument('--momentum', type=float,
                        default=MOMENTUM, help='Specify the momentum to be '
                        'used by sgd or rmsprop optimizer. Ignored by the '
                        'adam optimizer. Default: ' + str(MOMENTUM) + '.')
    parser.add_argument('--histograms', type=_str_to_bool, default=False,
                        help='Whether to store histogram summaries. Default: False')
    parser.add_argument('--gc_channels', type=int, default=None,
                        help='Number of global condition channels. Default: None. Expecting: Int')
    parser.add_argument('--max_checkpoints', type=int, default=MAX_TO_KEEP,
                        help='Maximum amount of checkpoints that will be kept alive. Default: '
                             + str(MAX_TO_KEEP) + '.')
    return parser.parse_args()


def save(saver, sess, logdir, step):
    model_name = 'model.ckpt'
    checkpoint_path = os.path.join(logdir, model_name)
    print('Storing checkpoint to {} ...'.format(logdir), end="")
    sys.stdout.flush()

    if not os.path.exists(logdir):
        os.makedirs(logdir)

    saver.save(sess, checkpoint_path, global_step=step)
    print(' Done.')


def load(saver, sess, logdir):
    print("Trying to restore saved checkpoints from {} ...".format(logdir),
          end="")

    ckpt = tf.train.get_checkpoint_state(logdir)
    if ckpt:
        print("  Checkpoint found: {}".format(ckpt.model_checkpoint_path))
        global_step = int(ckpt.model_checkpoint_path
                          .split('/')[-1]
                          .split('-')[-1])
        print("  Global step was: {}".format(global_step))
        print("  Restoring...", end="")
        saver.restore(sess, ckpt.model_checkpoint_path)
        print(" Done.")
        return global_step
    else:
        print(" No checkpoint found.")
        return None


def get_default_logdir(logdir_root):
    logdir = os.path.join(logdir_root, 'train', STARTED_DATESTRING)
    return logdir


def validate_directories(args):
    """Validate and arrange directory related arguments."""

    # Validation
    if args.logdir and args.logdir_root:
        raise ValueError("--logdir and --logdir_root cannot be "
                         "specified at the same time.")

    if args.logdir and args.restore_from:
        raise ValueError(
            "--logdir and --restore_from cannot be specified at the same "
            "time. This is to keep your previous model from unexpected "
            "overwrites.\n"
            "Use --logdir_root to specify the root of the directory which "
            "will be automatically created with current date and time, or use "
            "only --logdir to just continue the training from the last "
            "checkpoint.")

    # Arrangement
    logdir_root = args.logdir_root
    if logdir_root is None:
        logdir_root = LOGDIR_ROOT

    logdir = args.logdir
    if logdir is None:
        logdir = get_default_logdir(logdir_root)
        print('Using default logdir: {}'.format(logdir))

    restore_from = args.restore_from
    if restore_from is None:
        # args.logdir and args.restore_from are exclusive,
        # so it is guaranteed the logdir here is newly created.
        restore_from = logdir

    return {
        'logdir': logdir,
        'logdir_root': args.logdir_root,
        'restore_from': restore_from
    }


def main():
    args = get_arguments()

    try:
        directories = validate_directories(args)
    except ValueError as e:
        print("Some arguments are wrong:")
        print(str(e))
        return

    logdir = directories['logdir']
    restore_from = directories['restore_from']

    # Even if we restored the model, we will treat it as new training
    # if the trained model is written into an arbitrary location.
    is_overwritten_training = logdir != restore_from

    with open(args.wavenet_params, 'r') as f:
        wavenet_params = json.load(f)

    # Create coordinator.
    coord = tf.train.Coordinator()

    # Load raw waveforms from the training corpus in --data_dir.
    with tf.name_scope('create_inputs'):
        # Allow silence trimming to be skipped by specifying a threshold near
        # zero.
        silence_threshold = args.silence_threshold if args.silence_threshold > \
                                                      EPSILON else None
        gc_enabled = args.gc_channels is not None
        reader = AudioReader(
            args.data_dir,
            coord,
            sample_rate=wavenet_params['sample_rate'],
            gc_enabled=gc_enabled,
            receptive_field=WaveNetModel.calculate_receptive_field(wavenet_params["filter_width"],
                                                                   wavenet_params["dilations"],
                                                                   wavenet_params["scalar_input"],
                                                                   wavenet_params["initial_filter_width"]),
            sample_size=args.sample_size,
            silence_threshold=silence_threshold)
        audio_batch = reader.dequeue(args.batch_size)
        if gc_enabled:
            gc_id_batch = reader.dequeue_gc(args.batch_size)
        else:
            gc_id_batch = None

    # Create network.
    net = WaveNetModel(
        batch_size=args.batch_size,
        dilations=wavenet_params["dilations"],
        filter_width=wavenet_params["filter_width"],
        residual_channels=wavenet_params["residual_channels"],
        dilation_channels=wavenet_params["dilation_channels"],
        skip_channels=wavenet_params["skip_channels"],
        quantization_channels=wavenet_params["quantization_channels"],
        use_biases=wavenet_params["use_biases"],
        scalar_input=wavenet_params["scalar_input"],
        initial_filter_width=wavenet_params["initial_filter_width"],
        histograms=args.histograms,
        global_condition_channels=args.gc_channels,
        global_condition_cardinality=reader.gc_category_cardinality)

    if args.l2_regularization_strength == 0:
        args.l2_regularization_strength = None
    loss = net.loss(input_batch=audio_batch,
                    global_condition_batch=gc_id_batch,
                    l2_regularization_strength=args.l2_regularization_strength)
    optimizer = optimizer_factory[args.optimizer](
                    learning_rate=args.learning_rate,
                    momentum=args.momentum)
    trainable = tf.trainable_variables()
    optim = optimizer.minimize(loss, var_list=trainable)

    # Set up logging for TensorBoard.
    writer = tf.summary.FileWriter(logdir)
    writer.add_graph(tf.get_default_graph())
    run_metadata = tf.RunMetadata()
    summaries = tf.summary.merge_all()

    # Set up session
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=False))
    init = tf.global_variables_initializer()
    sess.run(init)

    # Saver for storing checkpoints of the model.
    saver = tf.train.Saver(var_list=tf.trainable_variables(), max_to_keep=args.max_checkpoints)

    try:
        saved_global_step = load(saver, sess, restore_from)
        if is_overwritten_training or saved_global_step is None:
            # The first training step will be saved_global_step + 1,
            # therefore we put -1 here for new or overwritten trainings.
            saved_global_step = -1

    except:
        print("Something went wrong while restoring checkpoint. "
              "We will terminate training to avoid accidentally overwriting "
              "the previous model.")
        raise

    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    reader.start_threads(sess)

    step = None
    last_saved_step = saved_global_step
    try:
        for step in range(saved_global_step + 1, args.num_steps):
            start_time = time.time()
            if args.store_metadata and step % 50 == 0:
                # Slow run that stores extra information for debugging.
                print('Storing metadata')
                run_options = tf.RunOptions(
                    trace_level=tf.RunOptions.FULL_TRACE)
                summary, loss_value, _ = sess.run(
                    [summaries, loss, optim],
                    options=run_options,
                    run_metadata=run_metadata)
                writer.add_summary(summary, step)
                writer.add_run_metadata(run_metadata,
                                        'step_{:04d}'.format(step))
                tl = timeline.Timeline(run_metadata.step_stats)
                timeline_path = os.path.join(logdir, 'timeline.trace')
                with open(timeline_path, 'w') as f:
                    f.write(tl.generate_chrome_trace_format(show_memory=True))
            else:
                summary, loss_value, _ = sess.run([summaries, loss, optim])
                writer.add_summary(summary, step)

            duration = time.time() - start_time
            print('step {:d} - loss = {:.3f}, ({:.3f} sec/step)'
                  .format(step, loss_value, duration))

            if step % args.checkpoint_every == 0:
                save(saver, sess, logdir, step)
                last_saved_step = step

    except KeyboardInterrupt:
        # Introduce a line break after ^C is displayed so save message
        # is on its own line.
        print()
    finally:
        # step is still None if the loop never ran (e.g. num_steps already reached).
        if step is not None and step > last_saved_step:
            save(saver, sess, logdir, step)
        coord.request_stop()
        coord.join(threads)


if __name__ == '__main__':
    main()
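The training script computes the network's receptive field from wavenet_params.json through WaveNetModel.calculate_receptive_field. As a sketch, I assume the wavenet package is ibab/tensorflow-wavenet, which this train.py closely follows. Under that assumption, the formula reduces to (filter_width - 1) * sum(dilations) + 1, plus the width of the initial causal layer minus one. The parameter values below are illustrative assumptions, not the settings used for the results in this article.

```python
import json


def calculate_receptive_field(filter_width, dilations, scalar_input,
                              initial_filter_width):
    """Number of audio samples one output sample can depend on (sketch)."""
    # Each dilated causal layer with dilation d adds (filter_width - 1) * d samples.
    receptive_field = (filter_width - 1) * sum(dilations) + 1
    # The initial causal layer adds its own width minus one.
    if scalar_input:
        receptive_field += initial_filter_width - 1
    else:
        receptive_field += filter_width - 1
    return receptive_field


# Illustrative values only (assumed, modeled on the default wavenet_params.json
# of ibab/tensorflow-wavenet): five stacks of dilations 1, 2, 4, ..., 512.
PARAMS = {
    "filter_width": 2,
    "sample_rate": 16000,
    "dilations": [2 ** i for i in range(10)] * 5,
    "residual_channels": 32,
    "dilation_channels": 32,
    "quantization_channels": 256,
    "skip_channels": 512,
    "use_biases": True,
    "scalar_input": False,
    "initial_filter_width": 32,
}


def save_params(params, path):
    """Write the keys that main() reads from --wavenet_params as JSON."""
    with open(path, 'w') as f:
        json.dump(params, f, indent=4)


if __name__ == '__main__':
    rf = calculate_receptive_field(PARAMS["filter_width"], PARAMS["dilations"],
                                   PARAMS["scalar_input"],
                                   PARAMS["initial_filter_width"])
    print(rf, 'samples =', rf / PARAMS["sample_rate"], 'seconds')  # 5117 samples
```

Calling save_params(PARAMS, 'wavenet_params.json') writes a file that the training script can load with --wavenet_params. With these values the receptive field is 5,117 samples, so each prediction is conditioned on roughly 0.32 s of 16 kHz audio.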

Checking the training results

Test sentences

1.三間新的房間很漂亮和乾淨.   
2.音樂是人放鬆和解除煩惱的一種方式.
3.在農村晚上不要經常外出去活動，因為比較漆黑。
4.海水很藍，天空中飛來一群小鳥
5.秋天是一個收貨的季節，老人在忙碌著
6.老大畢竟兩個老人跟著大兒子過活也因為老兩口面上還算公正三兄弟
7.間沒多少齷齪這次葉小麗跑了之後老兩口更是過來幫他忙上忙下馬四
8.妹這幾天乾脆住在這邊幫他帶著孩子加上原身的記憶李生接受起他們
9.氣生了三個都是女兒想到這馬四妹又犯愁老大媳婦不願意再把孩子送
10.心疼不已媽你怎麼讓來弟洗碗李紅心虛的看了李生一眼把孩子遞給老
11.頭片子養大了還不是別人家的實在不行再找戶人家送了馬四妹堅決反
12.了名聲不好聽身世不親白孩子你抱過去養著戶口遷過去關葉小麗什

Test audio downloads

1.wav link: https://pan.baidu.com/s/12CqB9myfNzzWTJlqAWoJyw password: n3h2
2.wav link: https://pan.baidu.com/s/1UVGVOyaP2HsIIS2Af2hKVw password: euik
3.wav link: https://pan.baidu.com/s/1uo7xfenFdGHhwJG3TiGzRQ password: 686w
4.wav link: https://pan.baidu.com/s/1WSVIZuoDqRYX5md1mMEk5w password: rqkk
5.wav link: https://pan.baidu.com/s/1MZlOoHHkJ4wGnz_4eQnCng password: 57xa
6.wav link: https://pan.baidu.com/s/19Ta399HJ-iOpitisnjs8sw password: 25av
7.wav link: https://pan.baidu.com/s/1kEikiW5MUAFpUxHuNlZqZg password: hvwu
8.wav link: https://pan.baidu.com/s/1kEikiW5MUAFpUxHuNlZqZg password: hvwu
9.wav link: https://pan.baidu.com/s/1_7z8jGef4MfwBMdnNMhmhA password: g51x
10.wav link: https://pan.baidu.com/s/1uyQ6Cuq0DEhWcW8wLhqhFg password: b3rv
11.wav link: https://pan.baidu.com/s/1ZSk_chv5PlJtsaLh49JfNg password: acgc
12.wav link: https://pan.baidu.com/s/1MBm53MtJtPBBYBZx7KIMFw password: ttan

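Summary

The test samples in this article come from a model trained for only a little over 50,000 steps. The error is still fairly large, and more training is needed. I am confident that later results will be much better than the Baidu and iFlytek samples.

QQ discussion group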
