MongoDB Exercise 1
阿新 • Published: 2017-12-08
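1. Parse the file, process only the fields that appear as keys in the FIELDS dictionary, and return a list of dictionaries of cleaned values.
Requirements:
1. Rename the dictionary keys according to the mapping in the FIELDS dictionary.
2. Remove the extra parenthesised qualifier from "rdf-schema#label", for example "(spider)".
3. If "name" is "NULL" or contains non-alphanumeric characters, set it to the same value as "label".
4. If a field's value is "NULL", convert it to None.
5. If "synonym" has a value, convert it to an array (list) by removing the "{}" characters and splitting the string on "|". Any further cleanup is up to you, for example removing a "*" prefix. If there is only a single synonym, the value should still be a list.
6. Strip leading and trailing whitespace from all fields, if there is any.
7. The output structure should look like this: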
[ { 'label': 'Argiope',
    'uri': 'http://dbpedia.org/resource/Argiope_(spider)',
    'description': 'The genus Argiope includes rather large and spectacular spiders that often ...',
    'name': 'Argiope',
    'synonym': ["One", "Two"],
    'classification': {
        'family': 'Orb-weaver spider',
        'class': 'Arachnid',
        'phylum': 'Arthropod',
        'order': 'Spider',
        'kingdom': 'Animal',
        'genus': None
    }
  },
  { 'label': ... , },
  ...
]
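Requirements 2 and 3 can be sketched as two small helpers. This is only an illustration: the names `clean_label` and `resolve_name` are hypothetical and not part of the exercise, and the sketch assumes that any character for which `str.isalnum()` is false, including a space, counts as "non-alphanumeric".

```python
import re


def clean_label(label):
    """Remove a parenthesised qualifier such as "(spider)" and trim whitespace."""
    return re.sub(r"\(.*\)", "", label).strip()


def resolve_name(name, label):
    """Return label when name is "NULL" or contains non-alphanumeric characters.

    Assumption: spaces and punctuation both count as non-alphanumeric.
    """
    name = name.strip()
    if name == "NULL" or not name.isalnum():
        return label
    return name


label = clean_label("Argiope (spider)")
print(label)                                # Argiope
print(resolve_name("NULL", label))          # Argiope
print(resolve_name("Ogdenia", label))       # Ogdenia
```

Passing the already cleaned label into `resolve_name` keeps the fallback name free of the parenthesised qualifier.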
import codecs
import csv
import json
import pprint
import re

DATAFILE = 'arachnid.csv'
FIELDS = {'rdf-schema#label': 'label',
          'URI': 'uri',
          'rdf-schema#comment': 'description',
          'synonym': 'synonym',
          'name': 'name',
          'family_label': 'family',
          'class_label': 'class',
          'phylum_label': 'phylum',
          'order_label': 'order',
          'kingdom_label': 'kingdom',
          'genus_label': 'genus'}


def process_file(filename, fields):
    # Keys of the FIELDS dictionary, i.e. the CSV columns to process
    process_fields = fields.keys()
    # Result list
    data = []
    with open(filename, "r", newline="") as f:
        reader = csv.DictReader(f)
        # Skip the first 3 rows of the file
        for i in range(3):
            next(reader)
        # Read the remaining rows
        for line in reader:
            # Cleaned dictionary for this row
            res = {}
            # Sub-dictionary stored under the 'classification' key
            res['classification'] = {}
            # Loop over the keys of FIELDS
            for field in process_fields:
                # Value of this column in the row, stripped of whitespace (requirement 6)
                tmp_val = line[field].strip()
                # New key for the output, i.e. the value in FIELDS (requirement 1)
                new_key = fields[field]
                # Requirement 4
                if tmp_val == 'NULL':
                    tmp_val = None
                # Requirement 2 (skipped when the label itself is NULL)
                if field == 'rdf-schema#label' and tmp_val:
                    tmp_val = re.sub(r'\(.*\)', '', tmp_val).strip()
                # Requirement 3 (only the "NULL" case is handled here,
                # not names containing non-alphanumeric characters)
                if field == 'name' and line[field].strip() == 'NULL':
                    tmp_val = line['rdf-schema#label'].strip()
                # Requirement 5
                if field == 'synonym' and tmp_val:
                    tmp_val = parse_array(tmp_val)
                # Keys that belong in the 'classification' sub-dictionary
                if new_key in ['kingdom', 'family', 'order', 'phylum', 'genus', 'class']:
                    res['classification'][new_key] = tmp_val
                    continue
                # Store the new key and value in res
                res[new_key] = tmp_val
            # Add the finished row to the result list
            data.append(res)
    return data


def parse_array(v):
    # Parse an array value.
    # If it starts with "{" and ends with "}", remove the braces, split on "|",
    # strip whitespace from each item and return the list.
    if (v[0] == "{") and (v[-1] == "}"):
        v = v.lstrip("{")
        v = v.rstrip("}")
        v_array = v.split("|")
        v_array = [i.strip() for i in v_array]
        return v_array
    # A single synonym is still returned as a list
    return [v]


def test():
    # Test function: if no assertion fails, the result is correct
    data = process_file(DATAFILE, FIELDS)
    print("Your first entry:")
    pprint.pprint(data[0])
    first_entry = {
        "synonym": None,
        "name": "Argiope",
        "classification": {
            "kingdom": "Animal",
            "family": "Orb-weaver spider",
            "order": "Spider",
            "phylum": "Arthropod",
            "genus": None,
            "class": "Arachnid"
        },
        "uri": "http://dbpedia.org/resource/Argiope_(spider)",
        "label": "Argiope",
        "description": "The genus Argiope includes rather large and spectacular spiders that often have a strikingly coloured abdomen. These spiders are distributed throughout the world. Most countries in tropical or temperate climates host one or more species that are similar in appearance. The etymology of the name is from a Greek name meaning silver-faced."
    }

    assert len(data) == 76
    assert data[0] == first_entry
    assert data[17]["name"] == "Ogdenia"
    assert data[48]["label"] == "Hydrachnidiae"
    assert data[14]["synonym"] == ["Cyrene Peckham & Peckham"]


if __name__ == "__main__":
    test()
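Requirement 5 leaves the remaining synonym cleanup up to the reader. As one possible variant of `parse_array`, the sketch below also drops a leading "*" from each item and discards empty items. The function name `parse_synonyms` is hypothetical, and treating "NULL" or an empty string as "no synonyms" is an assumption made for this example.

```python
def parse_synonyms(value):
    """Turn a raw synonym field into a list of cleaned strings, or None.

    Assumptions for this sketch: "NULL" and empty strings mean no synonyms,
    and a leading "*" on an item is noise that can be dropped.
    """
    if value is None:
        return None
    value = value.strip()
    if not value or value == "NULL":
        return None
    # Remove one pair of surrounding braces, if present
    if value.startswith("{") and value.endswith("}"):
        value = value[1:-1]
    # Split on "|", then strip whitespace and any "*" prefix from each item
    items = [item.strip().lstrip("*").strip() for item in value.split("|")]
    # Drop items that became empty after cleaning
    return [item for item in items if item]


print(parse_synonyms("{* One | Two}"))
print(parse_synonyms("Cyrene Peckham & Peckham"))
print(parse_synonyms("NULL"))
```

A single synonym without braces still comes back as a one-element list, which matches the requirement that the value always stays in list form.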