How to count the words that start with a given prefix among 5 million words?

阿新 · Published: 2018-12-17

This is an algorithm interview question I came across a while ago on a WeChat official account. I'm recording it here:

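A few notes on the algorithm: the approach below stores the words in a compressed dictionary tree (a radix trie), where every node keeps a count of the added words that pass through it. Counting the words with a given prefix then only requires walking down the tree along the prefix and reading that count, instead of scanning all 5 million words.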
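The counting idea can be seen without any node splitting. Below is a minimal sketch on a plain trie with one node per character, assuming Java 8+. The class CountingTrie and its methods insert, countPrefix and contains are hypothetical names chosen for illustration; they are not part of the original post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: an uncompressed trie that counts words per prefix.
class CountingTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int passCount; // number of added words whose path goes through this node
        int endCount;  // number of added words that end exactly at this node
    }

    private final Node root = new Node();

    // Every node on the word's path, including the root, gets its pass count bumped.
    void insert(String word) {
        Node cur = root;
        cur.passCount++;
        for (char c : word.toCharArray()) {
            cur = cur.children.computeIfAbsent(c, k -> new Node());
            cur.passCount++;
        }
        cur.endCount++;
    }

    // Walk down the prefix; the node reached already holds the answer.
    int countPrefix(String prefix) {
        Node cur = walk(prefix);
        return cur == null ? 0 : cur.passCount;
    }

    boolean contains(String word) {
        Node cur = walk(word);
        return cur != null && cur.endCount > 0;
    }

    // Follows s character by character; returns null if the path breaks off.
    private Node walk(String s) {
        Node cur = root;
        for (char c : s.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }

    public static void main(String[] args) {
        CountingTrie trie = new CountingTrie();
        String[] words = {"interest", "interesting", "interested", "inside",
                          "insert", "apple", "inter", "interesting"};
        for (String w : words) {
            trie.insert(w);
        }
        System.out.println("count prefix inter : " + trie.countPrefix("inter")); // 5
        System.out.println("find inside : " + trie.contains("inside"));         // true
    }
}
```

A query costs one map lookup per prefix character, however many words are stored. The price is one node per character; the compressed tree below reduces the node count by storing whole substrings on each node, at the cost of splitting nodes during insertion.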

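import java.util.HashMap;
import java.util.Map;

/**
 * Dictionary tree (a compressed trie / radix tree): each node stores a substring,
 * and count records how many added words pass through that node.
 */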
public class DictionaryTree {

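    // Node of the dictionary tree
    private class Node {
        // Whether the path from the root to this node spells a complete word
        private boolean isWord;
        // Number of added words that pass through this node
        private int count;
        // Substring stored on this node (null for the root)
        private String str;
        // Child nodes, keyed by their str
        private Map<String, Node> childs;
        // Parent node
        private Node parent;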

        public Node() {
            childs = new HashMap<String, Node>();
        }

        public Node(boolean isWord, int count, String str) {
            this();
            this.isWord = isWord;
            this.count = count;
            this.str = str;
        }

        public void addChild(String key, Node node) {
            childs.put(key, node);
            node.parent = this;
        }

        public void removeChild(String key) {
            childs.remove(key);
        }

        public String toString() {
            return "str : " + str + ", isWord : " + isWord + ", count : " + count;
        }
    }

    // Root of the dictionary tree
    private Node root;

    DictionaryTree() {
        // Initialize the root (its str stays null)
        root = new Node();
    }

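    // Add a string at or below the given node
    private void addStr(String word, Node node) {

        // The word passes through this node, so bump its counter
        node.count++;

        String str = node.str;
        if(null != str) {

            // Find the common prefix of the word and the node's string
            String commonPrefix = "";
            for(int i=0; i<word.length(); i++) {
                if(str.length() > i && word.charAt(i) == str.charAt(i)) {
                    commonPrefix += word.charAt(i);
                } else {
                    break;
                }
            }

            // A common prefix exists
            if(commonPrefix.length() > 0) {
                if (commonPrefix.length() == str.length() && commonPrefix.length() == word.length()) {
                    // Same string as this node: either a repeated word, or an
                    // intermediate node created by a split, so mark it as a word
                    node.isWord = true;
                } else if(commonPrefix.length() == str.length() && commonPrefix.length() < word.length()) {
                    // The remaining part of the word
                    String wordLeft = word.substring(commonPrefix.length());
                    // Keep looking for the remainder among the children
                    searchChild(wordLeft, node);
                } else if(commonPrefix.length() < str.length()) {
                    // Split the node: the common prefix becomes a new parent node
                    Node splitNode = new Node(true, node.count, commonPrefix);
                    // Put the split node in the old node's place under its parent
                    splitNode.parent = node.parent;
                    splitNode.parent.addChild(commonPrefix, splitNode);
                    node.parent.removeChild(node.str);
                    // The new word does not pass through the old node, so undo the increment
                    node.count--;
                    // The old node keeps the rest of its string, below the split node
                    String strLeft = str.substring(commonPrefix.length());
                    node.str = strLeft;
                    splitNode.addChild(strLeft, node);
                    // If the word is longer than the common prefix, its remainder becomes a new leaf
                    if(commonPrefix.length() < word.length()) {
                        splitNode.isWord = false;
                        String wordLeft = word.substring(commonPrefix.length());
                        Node leftNode = new Node(true, 1, wordLeft);
                        splitNode.addChild(wordLeft, leftNode);
                    }
                }
            } else {
                // No common prefix: add a new node directly
                // (not reached via add(), since searchChild only descends into a child with the same first character)
                Node newNode = new Node(true, 1, word);
                node.addChild(word, newNode);
            }
        } else {
            // Root node
            if(node.childs.size() > 0) {
                searchChild(word, node);
            } else {
                Node newNode = new Node(true, 1, word);
                node.addChild(word, newNode);
            }
        }
    }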

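    // Add the remaining string below one of the node's children
    private void searchChild(String wordLeft, Node node) {
        boolean isFind = false;
        if(node.childs.size() > 0) {
            // Iterate over the children
            for(String childKey : node.childs.keySet()) {
                Node childNode = node.childs.get(childKey);
                // Same first character: continue adding the string in that child.
                // Siblings never share a first character, so at most one child matches;
                // stop iterating right away because addStr may modify node.childs.
                if(wordLeft.charAt(0) == childNode.str.charAt(0)) {
                    isFind = true;
                    addStr(wordLeft, childNode);
                    break;
                }
            }
        }
        // No child starts with the same character: add the string as a new child
        if(!isFind) {
            Node newNode = new Node(true, 1, wordLeft);
            node.addChild(wordLeft, newNode);
        }
    }

    // Add a word
    public void add(String word) {
        // Empty words would break charAt(0) in searchChild, so ignore them
        if(null == word || word.isEmpty()) {
            return;
        }
        addStr(word, root);
    }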

    // Look up a string starting from the given node
    private boolean findStr(String word, Node node) {
        boolean isMatch = true;
        String wordLeft = word;
        String str = node.str;
        if(null != str) {
            // The word does not start with this node's string
            if(!word.startsWith(str)) {
                isMatch = false;
            } else {
                // Match: compute the remaining part of the word
                wordLeft = word.substring(str.length());
            }
        }
        // If it matches
        if(isMatch) {
            // If part of the word is still left
            if(wordLeft.length() > 0) {
                // Continue searching in the children
                for(String key : node.childs.keySet()) {
                    Node childNode = node.childs.get(key);
                    boolean isChildFind = findStr(wordLeft, childNode);
                    if(isChildFind) {
                        return true;
                    }
                }
                return false;
            } else {
                // Nothing left: the whole word has been matched, so return whether this node marks a word
                return node.isWord;
            }
        }
        return false;
    }

    // Look up a word
    public boolean find(String word) {
        // A null word is never in the tree
        if(null == word) {
            return false;
        }
        return findStr(word, root);
    }

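    // Count the words with the given prefix below the children of a node
    private int countChildStr(String prefix, Node node) {
        // Iterate over the children
        for(String key : node.childs.keySet()) {
            Node childNode = node.childs.get(key);
            // Try to match the child; at most one child can match
            int childCount = countStr(prefix, childNode);
            if(childCount != 0) {
                return childCount;
            }
        }
        return 0;
    }

    // Count the words with the given prefix, starting at the given node
    private int countStr(String prefix, Node node) {
        String str = node.str;
        // Not the root
        if(null != str) {
            // Neither string is a prefix of the other: no match
            if(!prefix.startsWith(str) && !str.startsWith(prefix)) {
                return 0;
            // The prefix ends inside (or exactly at the end of) this node's string
            } else if(str.startsWith(prefix)) {
                // Target node found: every word through it has the prefix
                return node.count;
            // The node's string is shorter than the prefix and matches its beginning
            } else if(prefix.startsWith(str)) {
                // Match the rest of the prefix against the children
                String prefixLeft = prefix.substring(str.length());
                if(prefixLeft.length() > 0) {
                    return countChildStr(prefixLeft, node);
                }
            }
        } else {
            // Root node: search its children directly
            return countChildStr(prefix, node);
        }
        return 0;
    }

    // Count the words that start with prefix
    public int count(String prefix) {
        // Special case: a null, empty or whitespace-only prefix matches every word
        if(null == prefix || prefix.trim().isEmpty()) {
            return root.count;
        }
        // Match downward from the root
        return countStr(prefix, root);
    }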

    // Print a node
    private void printNode(Node node, int layer) {
        // Indent one tab per level
        for(int i=0; i<layer; i++) {
            System.out.print("\t");
        }
        // Print the node itself
        System.out.println(node);
        // Print the children recursively
        for (String str : node.childs.keySet()) {
            Node child = node.childs.get(str);
            printNode(child, layer + 1);
        }
    }

    // Print the whole dictionary tree
    public void print() {
        printNode(root, 0);
    }

}

Main method

/**
 * @author xiaoshi on 2018/10/5.
 */
public class Main {

    public static void main(String[] args) {

        DictionaryTree dt = new DictionaryTree();

        dt.add("interest");
        dt.add("interesting");
        dt.add("interested");
        dt.add("inside");
        dt.add("insert");
        dt.add("apple");
        dt.add("inter");
        dt.add("interesting");

        dt.print();

        boolean isFind = dt.find("inside");
        System.out.println("find inside : " + isFind);

        int count = dt.count("inter");
        System.out.println("count prefix inter : " + count);

    }

}

Output

str : null, isWord : false, count : 8
    str : apple, isWord : true, count : 1
    str : in, isWord : false, count : 7
        str : ter, isWord : true, count : 5
            str : est, isWord : true, count : 4
                str : ing, isWord : true, count : 2
                str : ed, isWord : true, count : 1
        str : s, isWord : false, count : 2
            str : ert, isWord : true, count : 1
            str : ide, isWord : true, count : 1
find inside : true
count prefix inter : 5
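These numbers can be cross-checked with a brute-force scan over the same eight words. The sketch below assumes Java 8+; the class PrefixCountCheck and its method countWithPrefix are hypothetical names, not from the original post. Note that "interesting" is added twice and the tree counts every add() call, which is why the root reports 8 and the prefix "inter" reports 5.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical brute-force reference: scans every word on each query.
class PrefixCountCheck {
    // O(N * |prefix|) per query, which is what the tree avoids for 5 million words.
    static long countWithPrefix(List<String> words, String prefix) {
        return words.stream().filter(w -> w.startsWith(prefix)).count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("interest", "interesting", "interested",
                "inside", "insert", "apple", "inter", "interesting");
        System.out.println("count prefix inter : " + countWithPrefix(words, "inter")); // 5
        System.out.println("find inside : " + words.contains("inside"));             // true
    }
}
```

The scan is fine for a handful of words, but with 5 million words and many queries, the tree answers each query by walking only along the prefix.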
