WordCountPro Notes

阿新 • Published: 2018-04-09


1. Project code:

WordCountPro GitHub

Contributor	Commits
李露陽	14 (249++ 265--)
魯平	21 (339++ 92--)
蔣誌遠	18 (1035++ 339--)

2. PSP

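PSP2.1	PSP stage	Estimated time (min)	Actual time (min)
Planning	Planning	15	23
Estimate	Estimating how long the task will take	10	10
Development	Development	530	892
- Analysis	- Requirements analysis (including learning new technologies)	100	339
- Design Spec	- Producing the design document	100	150
- Coding Standard	- Coding standard (defining suitable conventions for the current development)	10	8
- Design	- Detailed design	30	23
- Coding	- Implementation	200	220
- Code Review	- Code review	30	34
- Test	- Testing (self-testing, fixing code, committing changes)	100	120
Reporting	Reporting	190	309
- Test Report	- Test report	60	65
- Size Measurement	- Measuring the workload	10	12
- Postmortem & Process Improvement Plan	- Postmortem and process improvement plan	120	232
	Total	780	1272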

3. Module breakdown

We split the program into two major modules: one handles IO, the other implements the core functionality.

Each module is designed as follows:

1. Main

/**
 * com.hust.wcPro
 * Created by Blues on 2018/3/27.
 */

import java.util.HashMap;

public class Main {
    static public void main(String[] args) {

        IOController io_control = new IOController();
        
        String valid_file = io_control.get(args);
        if (valid_file.equals("")) {
            return ;
        }
        
        WordCounter wordcounter = new WordCounter();
        
        HashMap<String, Integer> result = wordcounter.count(valid_file);

        io_control.save(result);

    }
}

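The Main function is responsible for calling all the interfaces. Its logic is simple: the IO controller obtains a valid file argument, the core function of the WordCounter class is called, and the IO controller sorts the result and writes it to result.txt.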

2. IOController

The IOController class is responsible for IO. Its design is as follows:

class IOController {
    IOController() {}
    
    /**
     * Parses the main function arguments
     * 
     * @param args the main function arguments
     * @return a valid file name
     */
    public String get(String[] args);

    /**
     * Saves the result in sorted order
     * 
     * @param result the result map, with words as keys and counts as values
     * @return the state code of operation
     */
    public int save(HashMap<String, Integer> result);
}

get() parses the main function's arguments and returns the name of a valid, existing file.
save() sorts the result passed in and writes it to result.txt.
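
The ordering that save() applies is not shown in this section, but the test-case generator later in this post sorts the expected result by descending count and then alphabetically by word, writing one `word: count` line per entry. A minimal Python sketch of that ordering, assuming save() follows the same rule (the function name `format_result` is hypothetical and not part of the project):

```python
def format_result(counts):
    """Return the report text: one 'word: count' line per entry.

    Assumed ordering (taken from the test-case generator, not from
    the Java source): higher counts first, ties broken alphabetically.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ''.join('{}: {}\n'.format(word, count) for word, count in ordered)


if __name__ == '__main__':
    # Small illustrative input; 'z' and 'zxz' tie on count.
    print(format_result({'zxz': 52, 'o': 253, 'hess': 67, 'z': 52}), end='')
```

Sorting on the tuple `(-count, word)` keeps the whole rule in a single key function, so ties on the count are resolved deterministically.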

3. WordCounter

public class WordCounter {
    
    WordCounter() {
    }
    
    /**
     * Counts the words in the specific file
     * 
     * @param filename the file to be counted
     * @return the result saves the word(lowercased) as key and count as value
     */
    public HashMap<String, Integer> count(String filename);
}

The WordCounter class implements the core count() function, which counts the occurrences of each word in the given file and returns the result as a Map.
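
The exact word rules are not spelled out in this section. Judging from the test-case generator and the unit-test names later in the post, a word appears to be a run of English letters that may contain single inner hyphens, compared case-insensitively, with digits and other symbols acting as separators. Under that assumption, a minimal Python sketch (a regular-expression stand-in for illustration, not the project's character-by-character Java implementation; `count_words` and `WORD_PATTERN` are hypothetical names):

```python
import re
from collections import Counter

# Assumed word shape: letters, optionally joined by single hyphens.
# Leading, trailing, and repeated hyphens are treated as separators.
WORD_PATTERN = re.compile(r'[a-z]+(?:-[a-z]+)*')


def count_words(text):
    """Count lowercased words in text and return a plain dict."""
    return dict(Counter(WORD_PATTERN.findall(text.lower())))


if __name__ == '__main__':
    print(count_words('Hello-world hello, -foo- 3bar--BAR'))
```

In this sketch `bar--BAR` yields two occurrences of `bar`, because a doubled hyphen cannot join two words under the assumed rule.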

4. Project management

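To collaborate efficiently and manage the project well, we chose Gradle for project and dependency management; it also makes it easier to run unit tests with JUnit 5. Because several members work on the project, we use Git for source control.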

The contents of the Gradle configuration file build.gradle are shown below for reference:

buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath 'org.junit.platform:junit-platform-gradle-plugin:1.1.0'
    }
}

plugins {
    id 'com.gradle.build-scan' version '1.12.1'
    id 'java'
    id 'eclipse'
    id 'idea'
    id 'maven'
}

buildScan {
    licenseAgreementUrl = "https://gradle.com/terms-of-service"
    licenseAgree = "yes"
}

apply plugin: 'org.junit.platform.gradle.plugin'

int javaVersion = Integer.valueOf((String) JavaVersion.current().getMajorVersion())
if (javaVersion < 10) apply plugin: 'jacoco'

jar {
    baseName = 'wcPro'
    version = '0.0.1'
    manifest {
        attributes 'Main-Class': 'Main'
    }
}

repositories {
    mavenCentral()
}

dependencies {
    testCompile (
        'org.junit.jupiter:junit-jupiter-api:5.0.3',
        'org.json:json:20090211'
    )

    testRuntime(
        'org.junit.jupiter:junit-jupiter-engine:5.0.3',
        'org.junit.vintage:junit-vintage-engine:4.12.1',
        'org.junit.platform:junit-platform-launcher:1.0.1',
        'org.junit.platform:junit-platform-runner:1.0.1'
    )
}

task wrapper(type: Wrapper) {
    description = 'Generates gradlew[.bat] scripts'
    gradleVersion = '4.6'
}

5. Testing

1. Unit testing

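Our unit tests are written at interface granularity. The project has three main interfaces, so we test each of them separately. The main interfaces are: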

IOController.get()
IOController.save()
WordCounter.count()

We designed the UnitTest class to test these interfaces. Its contents are as follows:

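import java.util.HashMap;

import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

class UnitTest {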

    UnitTest() {}

    private String getTestResourcePath() {
        String path = "build/resources/test/";
        String osName = System.getProperty("os.name").toLowerCase();
        if (osName.startsWith("win")) {
            // String is immutable, so the result of replace() must be assigned back
            path = path.replace('/', '\\');
        }
        return path;
    }

    @Test
    void testSortMap() {
        //...
    }

    @Test
//    @DisplayName("Custom test name containing spaces")
    @DisplayName("Custom test file that doesn't exist")
    void testIOHandling() {
        //...
    }

    /**
     * Use reflection to test {@code private} method {@code isEngChar()}
     */
    @Test
    void testIsEngChar() {
       //...
    }

    /**
     * Use reflection to test {@code private} method {@code isHyphen()}
     */
    @Test
    void testIsHyphen() {
        //...
    }


    String fileParentPath = "src/test/resources/";

    @Test
    void testCountEmptyFile() {
        //...
    }

    @Test
    @DisplayName("Border test: wc.count(endWithHyphen.txt)")
    void testCountFileEndWithHyphen() {
        //...
    }

    @Test
    @DisplayName("Border test: wc.count(startWithHyphen.txt)")
    void testCountFileStartWithHyphen() {
        //...
    }

    @Test
    @DisplayName("Border test: number starting with hyphen")
    void testNumberStartWithHyphen() {
        //...
    }

    @Test
    @DisplayName("Border test: file with quotation mark")
    void testCountFileWithQuatation() {
        //...
    }


    @Test
    void testCountHyphen() {
        //...
    }

    @Test
    @DisplayName("Border test: single quotation mark")
    void testCountSingleQuotationMark() {
        String fileName = "singleQuotationMark.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap<String, Integer> result = wc.count(relativePath);
        assertEquals(2, result.size());
    }

    @Test
    @DisplayName("Border test: count file with continued hyphens")
    void testCountFileWithContinuedHyphen() {
        //...
    }

    @Test
    @DisplayName("Border test: file with continued hyphens")
    void testFileWithContinuedHyphen() {
        //...
    }


    @Test
    @DisplayName("Border test: double quotation mark")
    void testCountDoubleQuotationMark() {
        //...
    }

    @Test
    @DisplayName("Border test: word with number")
    void testCountWordWithNumber() {
        //...
    }

    @Test
    @DisplayName("Border test: word with multiple kinds of char")
    void testCountMultiple() {
        //...
    }

}

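The tests above rely on the JUnit 5 test engine. We hit quite a few pitfalls during setup, from choosing the project management tool, to writing the configuration file, to designing the test cases. We used white-box testing: the cases above were designed to cover the program's branches and states.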

To test private methods, we used reflection to access them; see the complete code here.

2. Static testing

For static testing we used the Alibaba P3C plugin for IntelliJ IDEA.

The inspection reported the following issues:

[Screenshot: P3C inspection warning]

This warning flags a naming-convention violation, but we do not think the term IO needs to be camel-cased, so we treated it as a false positive.

[Screenshot: P3C inspection warning]

This warning is a good one: with nowWord.equals(""), the program crashes with a NullPointerException if nowWord is null, whereas "".equals(nowWord) is safe.

3. Black-box testing

To test efficiently, we automated the tests with scripts, which also makes stress testing easier.

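First we need a large number of correct test cases: each one must be large enough, and its expected result must be known to be correct. Writing them by hand is simply impractical, so they have to be generated automatically. To that end we wrote a simple Python script that generates test cases with random contents but controllable size: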

from functools import reduce
import numpy as np
from numpy.random import randint
import json
import sys, os, re

elements = {
    "words": "abcdefghijklmnopqrstuvwxyz-",
    "symbol": "!@#$%^&*()~`_+=|\\:;\"'<>?/ \t\r\n1234567890-"
}

def generate_usecase(configs):
    global elements
    
    path = os.path.join('test', 'testcase')
    result_path = os.path.join('test', 'result')
    if not os.path.exists(path):
        os.makedirs(path)
    if not os.path.exists(result_path):
        os.makedirs(result_path)
    for config_idx, config in enumerate(configs):
        word_dict = {}
        i = 0
        # generate valid words until we have num_of_type of them
        while i < config['num_of_type']:
            word_len = randint(*config['word_size'])
            word_elements = randint(0, len(elements['words']), word_len)
            word = np.array(list(elements['words']))[word_elements]
            word = ''.join(word)
            # collapse repeated '-' and strip leading/trailing '-', which are not valid in a word
            word = re.sub(r'-{2,}','-', word)
            word = re.sub(r'^-*', '', word)
            word = re.sub(r'-*$', '', word)
            if len(word) == 0:  # bad luck: the word was all '-', so generation failed; try again
                continue
            word_dict[word] = 0
            i += 1
        total_count = 0
        # set how many times each word is repeated
        for key in word_dict.keys():
            word_dict[key] = randint(*config['word_repeat'])
            total_count += word_dict[key]
        word_dict_tmp = word_dict.copy()
        final_string = ''
        # build the final test-case text
        for i in range(total_count):
            key, val = None, 0
            while (val == 0):
                key_tmp = list(word_dict_tmp.keys())[randint(len(word_dict))]
                val = word_dict_tmp[key_tmp]
                if val != 0:
                    key = key_tmp
                    word_dict_tmp[key_tmp] = val-1
            # randomize the case of each letter in the word
            word_upper_case = randint(0, 2, len(key))
            key = ''.join([s.upper() if word_upper_case[i] > 0 else s for i, s in enumerate(list(key))])
            final_string += key
            sep = ''
            # build a valid separator
            for _ in range(randint(*config['sep_size'])):
                sep += elements['symbol'][randint(0, len(elements['symbol']))]
            if sep == '-':
                while sep == '-':
                    sep = elements['symbol'][randint(0, len(elements['symbol']))]
            final_string += sep

        with open(os.path.join(path, '{}_usecase.txt').format(config_idx), 'w') as f:
            f.write(final_string)
                   
        sorted_key = sorted(word_dict.items(), key=lambda kv:(-kv[1], kv[0]))
        result = ''
        for key, val in sorted_key:
            result += key + ': ' + str(val) + '\n'

        with open(os.path.join(path, '{}_result_true.txt'.format(config_idx)), 'w') as f:
            f.write(result)

        print('test case {} generated'.format(config_idx))

def main():
    config = sys.argv[-1]
    with open(config) as f:
        config = json.load(f)
    
    generate_usecase(config)

if __name__ == '__main__':
    main()

The configuration file looks like this:

[
    {
        "num_of_type": 10,
        "word_size": [1, 10],
        "sep_size": [1,3],
        "word_repeat": [1, 300]
    },
    {
        "num_of_type": 20,
        "word_size": [1, 20],
        "sep_size": [1,3],
        "word_repeat": [20, 300]
    }
]

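The configuration is simple: specify how many distinct words to generate, the length range of each word, the length range of the separators, and the range of how many times each word repeats. The script then generates the test case together with the correctly sorted expected result.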
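
Because the script passes each range to `numpy.random.randint`, whose upper bound is exclusive, a range such as `[1, 300]` yields repeat counts from 1 to 299. The sketch below (a hypothetical helper named `occurrence_bounds`, not part of the project's script) computes the resulting bounds on the total number of word occurrences for one config entry:

```python
def occurrence_bounds(config):
    """Return (min_total, max_total) word occurrences for one config entry.

    Each of the num_of_type words is repeated randint(low, high) times,
    and randint excludes the upper bound, so the largest repeat is high - 1.
    """
    low, high = config['word_repeat']
    distinct_words = config['num_of_type']
    return distinct_words * low, distinct_words * (high - 1)


if __name__ == '__main__':
    first_entry = {'num_of_type': 10, 'word_size': [1, 10],
                   'sep_size': [1, 3], 'word_repeat': [1, 300]}
    print(occurrence_bounds(first_entry))  # (10, 2990)
```

The same exclusivity applies to `word_size` and `sep_size`, so `"sep_size": [1, 3]` produces separators of one or two characters.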

..........
YMtyibqY
zxz*^QtRWv*O=3KDvJKmpQb86MThOdnP
ZXZ>#aAys>&mthodnP>`qtRWv(QTRWV*YmTYiBqY^\O9Zxz_?MthOdNP$ zxZ="MtHODnP#!yMTYibqY:o%2AaYS<#QTRwV8MTHOdnp!o#+MTHodNP)*QTRWV;YmtyiBQY  ZXz$hesS`aayS_#FKcU=)AAys;fKcu-$Z$MthoDnp
 YMTYIBqy/3aAyS!Zxz'yMtyiBQY~1KdvjKMpQB'@aAYs'Z'zXZ3z2hESs5aAys@yMtyiBQy4qtRWV3kDvJKMpQB:9yMTyIbqy_YmtyIBqY
KdvJKmpqB>YMtYibQy
>z2O
z`^FKCu$<QTRwv#<mtHOdnP%z+z"*FKCu9hESs<fkcu!YMtYiBqY"HesS9MtHODNp
ZxZ
.........

Above is part of an automatically generated test case.

mthodnp: 287
o: 253
aays: 250
kdvjkmpqb: 232
fkcu: 170
qtrwv: 151
ymtyibqy: 133
hess: 67
zxz: 52
z: 32

Above is the generated expected result.

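With the test cases generated, what remains is to run them automatically and record the run time, so we wrote the following script to handle this tedious work. It lives in the project's build.sh: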

echo '--------- building jar -----------'
gradle build -x test

echo '------ generating test case ------'
python ./scripts/testcase_generate.py ./scripts/config.json

echo '-------- setting test env --------'
cp ./build/libs/* ./test
echo 'jar copyed to ./test'
echo '----------- testing --------------'
declare -i num_test
num_test=($(ls -l ./test/testcase | wc -l)-1)/2
echo 'number of test:' ${num_test}
cd test
jarname=$(ls *.jar)
declare -i correct_cnt
correct_cnt=0
echo 'testing ' ${jarname}
num_test=num_test-1
for i in $(seq 0 ${num_test})
do
    start=`python -c 'import time; print (time.time())'`
    java -jar ${jarname} ./testcase/${i}_usecase.txt
    end=`python -c 'import time; print (time.time())'`
    cmp result.txt ./testcase/${i}_result_true.txt
    if [ ${?} == 0 ]; then
        correct_cnt=correct_cnt+1
        echo 'test ' $i ' passed...time: ' `bc <<< $end-$start`
    else
        echo 'test ' $i ' failed...time: ' `bc <<< $end-$start`
    fi
    mv result.txt ./result/${i}_result.txt
done
num_test=num_test+1
echo 'test passed: ' ${correct_cnt} 'total: ' ${num_test}
cd ..

The output of a run is as follows:

--------- building jar -----------
Starting a Gradle Daemon (subsequent builds will be faster)

BUILD SUCCESSFUL in 4s
2 actionable tasks: 2 executed
------ generating test case ------
test case 0 generated
test case 1 generated
test case 2 generated
test case 3 generated
test case 4 generated
test case 5 generated
test case 6 generated
test case 7 generated
test case 8 generated
-------- setting test env --------
jar copyed to ./test
----------- testing --------------
number of test: 9
testing  wcPro-0.0.1.jar
test  0  passed...time:  0.160000085831
test  1  passed...time:  0.200000047684
test  2  passed...time:  0.299999952316
test  3  passed...time:  0.569999933243
test  4  passed...time:  1.80999994278
test  5  passed...time:  2.25999999046
test  6  passed...time:  0.390000104904
test  7  passed...time:  0.269999980927
test  8  passed...time:  0.230000019073
test passed:  9 total:  9

This gives us automated black-box testing, so we can check run time and correctness right away.

6. Code review

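We reviewed the count() method of the core WordCounter class. Guided by the static-testing results, we found a number of coding habits that need improving, as well as some unsafe operations: for example, when reading character by character, the loop must first check whether the value read is -1. We also discussed how the file is traversed, and whether reading and analyzing one character at a time spends too much time on IO.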

7. Code optimization

Based on the black-box test results, we ran a benchmark. The run times in the black-box tests were:

TestCase	Size	Time / s
0	4K	0.16
1	56K	0.20
2	510K	0.30
3	3.5M	0.57
4	17.7M	1.81
5	25.4M	2.26
6	1.5M	0.39
7	300K	0.27
8	167K	0.23
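
To make the trend in the table easier to read, throughput can be computed per test case. The sketch below uses the sizes and times from the table and assumes that K and M mean 1024 and 1024² bytes (the post does not say which unit convention was used); the helper names are hypothetical:

```python
# Assumed unit convention: binary multiples.
UNITS = {'K': 1024, 'M': 1024 ** 2}

# (size, seconds) pairs copied from the benchmark table, test cases 0..8.
BENCHMARK = [
    ('4K', 0.16), ('56K', 0.20), ('510K', 0.30),
    ('3.5M', 0.57), ('17.7M', 1.81), ('25.4M', 2.26),
    ('1.5M', 0.39), ('300K', 0.27), ('167K', 0.23),
]


def size_in_bytes(size):
    """Convert a label such as '3.5M' into a byte count."""
    return float(size[:-1]) * UNITS[size[-1]]


def throughput_mib_per_s(size, seconds):
    """Return throughput in MiB/s for one table row."""
    return size_in_bytes(size) / seconds / UNITS['M']


if __name__ == '__main__':
    for size, seconds in BENCHMARK:
        rate = throughput_mib_per_s(size, seconds)
        print('{:>6}  {:5.2f} s  {:6.2f} MiB/s'.format(size, seconds, rate))
```

Under these assumptions the two largest files are processed at roughly 10 to 11 MiB/s, while the 4K case still takes 0.16 s, which suggests that a fixed startup cost dominates the small inputs.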

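The program traverses the file only once, with no backtracking and no redundant loops or checks, so we suspect that the performance bottleneck lies in IO reads and writes. Because the file is processed one character at a time, every character read costs IO time, which would seem to slow the program down.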
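
One way to check this hypothesis is to compare per-character reads on an unbuffered stream with the same loop over a buffered one. The sketch below is a Python analogue for illustration only, not the project's Java code; in Java the usual counterpart would be wrapping the reader in a `BufferedReader`. The function names and the temporary file are hypothetical.

```python
import os
import tempfile
import time


def count_bytes_unbuffered(path):
    """Read one byte per call straight from the OS (no user-space buffer)."""
    total = 0
    with open(path, 'rb', buffering=0) as f:
        while True:
            chunk = f.read(1)
            if not chunk:  # empty result means end of file; check it first
                break
            total += 1
    return total


def count_bytes_buffered(path):
    """Same loop, but reads are served from an in-memory buffer."""
    total = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(1)
            if not chunk:
                break
            total += 1
    return total


def timed(func, path):
    """Run func(path) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


if __name__ == '__main__':
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'word-count ' * 20000)
    try:
        for func in (count_bytes_unbuffered, count_bytes_buffered):
            size, seconds = timed(func, tmp.name)
            print(func.__name__, size, 'bytes', round(seconds, 3), 's')
    finally:
        os.remove(tmp.name)
```

If the two timings differ substantially on the target machine, buffering is worth trying before restructuring the counting logic itself.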
