Reproducing the RAT-SQL paper: bug summary and reproduction workflow

阿新 • Published: 2021-12-18

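A summary of the bugs I ran into while reproducing the ACL 2020 paper RAT-SQL (paper|code), together with my own reproduction workflow. I tried Docker first, hit a string of problems, and finally decided to use a conda environment directly.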

1 Bug summary

1.1 ValueError: Unsupported kind for param args: VAR_POSITIONAL

This happens during preprocess. The cause is a PyTorch version that is too new; PyTorch can be installed with the following command:

conda install pytorch==1.3.1 cudatoolkit=10.1

Similar bugs in the preprocess stage can be fixed the same way. Python 3.7 is recommended.
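Because this bug depends on the installed versions, a quick check before running preprocess can save a failed run. The following is a minimal sketch, assuming the versions named in this post (Python 3.7, PyTorch 1.3.1) are the known-good ones; the helper names parse_version and version_warnings are hypothetical and not part of RAT-SQL.

```python
# Versions taken from this post; treat them as known-good, not as hard limits.
KNOWN_GOOD_PYTHON = (3, 7)
KNOWN_GOOD_TORCH = (1, 3, 1)


def parse_version(text):
    """Turn '1.3.1' or '1.3.1+cu101' into a tuple of ints such as (1, 3, 1)."""
    core = text.split('+')[0]
    return tuple(int(part) for part in core.split('.') if part.isdigit())


def version_warnings(python_version, torch_version):
    """Return human-readable warnings for versions that differ from the post."""
    warnings = []
    major_minor = tuple(python_version[:2])
    if major_minor != KNOWN_GOOD_PYTHON:
        warnings.append('Python %d.%d differs from 3.7' % major_minor)
    if parse_version(torch_version) > KNOWN_GOOD_TORCH:
        warnings.append('PyTorch %s is newer than 1.3.1' % torch_version)
    return warnings
```

Inside the activated environment, calling version_warnings(sys.version_info, torch.__version__) should return an empty list if the setup matches this post.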

1.2 no space left on device

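This happens during train. The code released by Microsoft saves too many model checkpoints: a non-BERT training run needs roughly tens of GB, and a BERT run needs hundreds of GB.
Point --logdir at a disk with enough space, or reduce the number of checkpoints.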
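Both remedies can be scripted. The sketch below checks free space on the target disk and, optionally, deletes all but the newest few checkpoint files. It is only an illustration: the function names are hypothetical, and the default file pattern 'model_checkpoint-*' is an assumption that should be checked against the actual file names in your logdir before deleting anything.

```python
import glob
import os
import shutil


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2 ** 30


def prune_checkpoints(logdir, keep, pattern='model_checkpoint-*'):
    """Delete all but the `keep` newest files matching `pattern` in `logdir`.

    The default pattern is an assumption; check the real file names first.
    Returns the list of deleted paths.
    """
    paths = sorted(glob.glob(os.path.join(logdir, pattern)),
                   key=os.path.getmtime)
    doomed = paths[:-keep] if keep > 0 else paths
    for path in doomed:
        os.remove(path)
    return doomed
```

Pruning by modification time keeps the most recent checkpoints, which is usually what is needed for eval; if a specific step is required, exclude it from the pattern instead.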

1.3 __LOGDIR__ path not found

This happens during eval. In infer.py and eval.py, __LOGDIR__ is replaced with the actual log path, but in run.py it is not. Replace the following two lines, starting at line 104 of run.py,

res_json = json.load(open(eval_output_path))
print(step, res_json['total_scores']['all']['exact'])

with the following code:

# Requires `import os` and `import _jsonnet` at the top of run.py.
model_config = json.loads(_jsonnet.evaluate_file(
    eval_config.config,
    tla_codes={'args': eval_config.config_args}))
# Use a separate name so that logdir itself is not changed between eval steps.
eval_logdir = logdir
if 'model_name' in model_config:
    eval_logdir = os.path.join(logdir, model_config['model_name'])
eval_output_path = eval_output_path.replace('__LOGDIR__', eval_logdir)
with open(eval_output_path) as f:
    res_json = json.load(f)
if 'model_name' in model_config:
    print(step, res_json['total_scores']['all']['exact'])
else:
    print(step, res_json['total_scores'])
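The path substitution in the fix above can be isolated into a small function, which makes it easy to try on sample values. This is a sketch only; resolve_eval_path is a hypothetical helper, not a function in the RAT-SQL code base, and the sample paths in the usage note are made up.

```python
import os


def resolve_eval_path(eval_output_path, logdir, model_config):
    """Replace the __LOGDIR__ placeholder the way the fix above does.

    If the model config names a model, its directory is appended to logdir
    before the placeholder is substituted.
    """
    if 'model_name' in model_config:
        logdir = os.path.join(logdir, model_config['model_name'])
    return eval_output_path.replace('__LOGDIR__', logdir)
```

For example, resolve_eval_path('__LOGDIR__/x.eval', 'logs', {}) gives 'logs/x.eval', while a config containing 'model_name' inserts that name as an extra directory level.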

1.4 assert next_choices is not None

This happens when evaluating WikiSQL. Change line 12 of experiments/wikisql-glove-run.jsonnet from

eval_use_heuristic: true

to

eval_use_heuristic: false

1.5 AttributeError: 'RMKeyView' object has no attribute 'index'

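This also happens when evaluating WikiSQL, and is a bug in the records package itself. Open path/to/anaconda3/envs/ratsql/lib/python3.7/site-packages/records.py (where ratsql is the conda environment name),
find the keys function at line 40, and change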

return self._keys

to

return list(self._keys)
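The reason the list() wrapper helps can be shown with an ordinary dict. In the sketch below, Python's built-in dict_keys stands in for RMKeyView; this assumes RMKeyView behaves like a standard keys view in lacking an index method, which is what the error message suggests.

```python
# dict_keys stands in for RMKeyView here (an assumption, see the note above).
row = {'id': 1, 'name': 'alice', 'age': 30}
keys_view = row.keys()

# A keys view offers no positional lookup, so .index is missing.
has_index = hasattr(keys_view, 'index')

# A list built from the same keys does support .index.
position = list(keys_view).index('name')
```

Here has_index is False and position is 1, which is why returning list(self._keys) lets the calling code look up a column position again.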

1.6 Adding a custom package path to the conda environment

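"Package" here means one that cannot be installed with pip install or conda install, such as wikisql in third_party. Adding it to the current terminal with export PYTHONPATH (or to the current user or all users by similar means) felt cumbersome, so I wanted to add it to my conda environment only, where it affects neither other projects nor other people. The following single command does it.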

conda develop /path/to/rat-sql/third_party/wikisql/
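To confirm that the path was registered, one can check Python's import search path from inside the activated environment. The following is a minimal sketch; is_importable_from is a hypothetical helper name, and the path you pass in is whatever you gave to conda develop.

```python
import os
import sys


def is_importable_from(path, sys_path=None):
    """True if `path` is one of the directories Python searches for imports.

    `sys_path` defaults to the interpreter's own sys.path; passing a list
    explicitly is only meant for testing.
    """
    entries = sys.path if sys_path is None else sys_path
    target = os.path.normpath(os.path.abspath(path))
    return any(os.path.normpath(os.path.abspath(p)) == target
               for p in entries if p)
```

Paths are normalized before comparison, so a trailing slash on the argument does not matter.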

2 Reproduction workflow

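The root password is required. Environment: Ubuntu 20.04, RTX 3090. "/path/to/" stands for the path of the directory that contains the file or directory in question; for example, for "/home/ps/rat-sql", "/path/to/" is "/home/ps/".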

2.1

sudo su

mkdir -p /usr/share/man/man1 && \
    apt-get update && apt-get install -y \
    build-essential \
    cifs-utils \
    curl \
    default-jdk \
    dialog \
    dos2unix \
    git \
    sudo

exit

2.2

conda create -n ratsql python=3.7
conda activate ratsql
pip install asdl==0.1.5
pip install astor==0.7.1
pip install attrs==18.2.0
pip install babel==2.7.0
pip install bpemb==0.2.11
pip install cython==0.29.1
pip install jsonnet==0.14.0
pip install networkx==2.2
pip install nltk==3.4
pip install pyrsistent==0.14.9
pip install pytest==5.3.2
pip install records==0.5.3
pip install stanford-corenlp==3.9.2
pip install tabulate==0.8.6
conda install pytorch==1.3.1 cudatoolkit=10.1
pip install torchtext==0.3.1
pip install tqdm==4.36.1
pip install transformers==2.3.0
pip install entmax
pip install scikit-learn

2.3

python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt')"
python -c "from transformers import BertModel; BertModel.from_pretrained('bert-large-uncased-whole-word-masking')"

2.4

sudo su

mkdir -p third_party && \
cd third_party && \
curl https://download.cs.stanford.edu/nlp/software/stanford-corenlp-full-2018-10-05.zip | jar xv && \
cd .. && \
git clone https://github.com/salesforce/WikiSQL third_party/wikisql

exit

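If you cannot reach GitHub, you can try replacing https:// with git:// (note that GitHub has since turned off the unauthenticated git:// protocol, so this may no longer work).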

2.5

mkdir -p data && \
cd data && \
cp -r /path/to/data ./ && \
cd ..

2.6

/bin/bash -c 'if compgen -G "/path/to/rat-sql/**/*.sh" > /dev/null; then dos2unix /path/to/rat-sql/**/*.sh; fi'
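If dos2unix is not available, the same line-ending conversion can be done in Python. This is a sketch of an equivalent, not the command used in this post; convert_crlf_to_lf is a hypothetical helper name, and it searches the root directory and all subdirectories for .sh files.

```python
import glob
import os


def convert_crlf_to_lf(root):
    """Rewrite every *.sh file under `root` with LF line endings.

    Returns the paths of the files that were actually changed.
    """
    changed = []
    pattern = os.path.join(root, '**', '*.sh')
    for path in glob.glob(pattern, recursive=True):
        with open(path, 'rb') as f:
            data = f.read()
        fixed = data.replace(b'\r\n', b'\n')
        if fixed != data:
            with open(path, 'wb') as f:
                f.write(fixed)
            changed.append(path)
    return changed
```

Files are read and written in binary mode so that nothing other than the CRLF sequences is altered.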

2.7

conda develop /path/to/rat-sql/third_party/wikisql

3 Running

python run.py preprocess experiments/spider-glove-run.jsonnet
python run.py train experiments/spider-glove-run.jsonnet
python run.py eval experiments/spider-glove-run.jsonnet

python run.py preprocess experiments/spider-bert-run.jsonnet
python run.py train experiments/spider-bert-run.jsonnet
python run.py eval experiments/spider-bert-run.jsonnet

python run.py preprocess experiments/wikisql-glove-run.jsonnet
python run.py train experiments/wikisql-glove-run.jsonnet
python run.py eval experiments/wikisql-glove-run.jsonnet
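The three stages for each experiment always run in the same order, so they can be driven from a small script that stops at the first failure. The sketch below only wraps the commands listed above; build_commands and run_experiment are hypothetical helper names, and the script is assumed to be started from the rat-sql repository root with the ratsql environment active.

```python
import subprocess

# The three stages, in the order they must run.
MODES = ('preprocess', 'train', 'eval')


def build_commands(config_path):
    """Return the three run.py invocations for one experiment config."""
    return [['python', 'run.py', mode, config_path] for mode in MODES]


def run_experiment(config_path):
    """Run preprocess, train, and eval in order; stop at the first failure."""
    for command in build_commands(config_path):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

For example, run_experiment('experiments/spider-glove-run.jsonnet') would issue the first group of commands above in sequence.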
