POJ 1451 - T9 - [Trie]

Problem link: http://bailian.openjudge.cn/practice/1451/

Total time limit: 1000 ms; memory limit: 65536 kB

Description

Background

A while ago it was quite cumbersome to create a message for the Short Message Service (SMS) on a mobile phone. This was because you only have nine keys and the alphabet has more than nine letters, so most characters could only be entered by pressing one key several times. For example, if you wanted to type "hello" you had to press key 4 twice, key 3 twice, key 5 three times, again key 5 three times, and finally key 6 three times. This procedure is very tedious and keeps many people from using the Short Message Service.

This led manufacturers of mobile phones to try and find an easier way to enter text on a mobile phone. The solution they developed is called T9 text input. The "9" in the name means that you can enter almost arbitrary words with just nine keys and without pressing them more than once per character. The idea of the solution is that you simply start typing the keys without repetition, and the software uses a built-in dictionary to look for the "most probable" word matching the input. For example, to enter "hello" you simply press keys 4, 3, 5, 5, and 6 once. Of course, this could also be the input for the word "gdjjm", but since this is no sensible English word, it can safely be ignored. By ruling out all other "improbable" solutions and only taking proper English words into account, this method can speed up writing of short messages considerably. Of course, if the word is not in the dictionary (like a name) then it has to be typed in manually using key repetition again.

More precisely, with every character typed, the phone will show the most probable combination of characters it has found up to that point. Let us assume that the phone knows about the words "idea" and "hello", with "idea" occurring more often. Pressing the keys 4, 3, 5, 5, and 6, one after the other, the phone offers you "i", "id", then switches to "hel", "hell", and finally shows "hello".

Problem

Write an implementation of the T9 text input which offers the most probable character combination after every keystroke. The probability of a character combination is defined to be the sum of the probabilities of all words in the dictionary that begin with this character combination. For example, if the dictionary contains three words "hell", "hello", and "hellfire", the probability of the character combination "hell" is the sum of the probabilities of these words. If some combinations have the same probability, your program is to select the first one in alphabetic order. The user should also be able to type the beginning of words. For example, if the word "hello" is in the dictionary, the user can also enter the word "he" by pressing the keys 4 and 3 even if this word is not listed in the dictionary.

Input
The first line contains the number of scenarios.

Each scenario begins with a line containing the number w of distinct words in the dictionary (0<=w<=1000). These words are given in the next w lines in ascending alphabetic order. Every line starts with the word which is a sequence of lowercase letters from the alphabet without whitespace, followed by a space and an integer p, 1<=p<=100, representing the probability of that word. No word will contain more than 100 letters.

Following the dictionary, there is a line containing a single integer m. Next follow m lines, each consisting of a sequence of at most 100 decimal digits 2-9, followed by a single 1 meaning "next word".

Output
The output for each scenario begins with a line containing "Scenario #i:", where i is the number of the scenario starting at 1.

For every number sequence s of the scenario, print one line for every keystroke stored in s, except for the 1 at the end. In this line, print the most probable word prefix defined by the probabilities in the dictionary and the T9 selection rules explained above. Whenever none of the words in the dictionary match the given number sequence, print "MANUALLY" instead of a prefix.

Terminate the output for every number sequence with a blank line, and print an additional blank line at the end of every scenario.

Sample Input
2
5
hell 3
hello 4
idea 8
next 8
super 3
2
435561
43321
7
another 5
contest 6
follow 3
give 13
integer 6
new 14
program 4
5
77647261
6391
4681
26684371
77771

Sample Output
Scenario #1:
i
id
hel
hell
hello

i
id
ide
idea


Scenario #2:
p
pr
pro
prog
progr
progra
program

n
ne
new

g
in
int

c
co
con
cont
anoth
anothe
another

p
pr
MANUALLY
MANUALLY

Source
Northwestern Europe 2001

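Problem summary:

Old keypad phones generally had nine keys, and typing English on them was tedious. For example, to type "hello" you had to press key 4 twice, key 3 twice, key 5 three times, key 5 three times again, and finally key 6 three times.

A newer input scheme called "T9" only requires pressing each key once per letter; the software then uses a built-in dictionary to find the most probable word matching the input. For example, to enter "hello" you press 4, 3, 5, 5, 6 once each. This could also be the input for "gdjjm", but since that is not a sensible English word it can safely be ignored. By ruling out all other improbable candidates and considering only proper English words, this method speeds up writing text messages considerably. If a word is not in the built-in dictionary (a name, for instance), it must still be entered by pressing keys repeatedly.

You are given the "T9" dictionary, which contains $w$ ($0 \le w \le 10^3$) distinct strings (each at most $100$ characters long, already sorted in ascending lexicographic order) together with their occurrence probabilities $p$. You are also given $m$ typing operations. Each is a string of at most $100$ digits from $2 \sim 9$ representing the keys pressed in order, followed by a single $1$ that ends the operation.

The probability of a "character combination" is defined as the sum of the probabilities of all dictionary words that have it as a prefix. For example, if the dictionary contains the three words "hell", "hello", and "hellfire", the probability of the combination "hell" is the sum of the probabilities of those words. If several combinations have the same probability, the program must choose the lexicographically smallest one.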
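To make the definition concrete, here is a brute-force reference sketch. It is not the solution used in this post, only a slow way to check answers on small inputs. The names `digitsOf` and `bruteForceT9` are made up for illustration, and the key string is assumed to exclude the trailing 1.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Phone digit for each letter 'a'..'z' on the standard keypad.
static const std::string kKeys = "22233344455566677778889999";

// Hypothetical helper: digits that type `word` with one press per letter.
std::string digitsOf(const std::string& word) {
    std::string d;
    for (char c : word) {
        assert(c >= 'a' && c <= 'z');
        d += kKeys[c - 'a'];
    }
    return d;
}

// For every keystroke of `keys` (digits 2-9, no trailing 1), returns the most
// probable prefix, or "MANUALLY" if no dictionary word matches.
std::vector<std::string> bruteForceT9(
        const std::vector<std::pair<std::string, int>>& dict,
        const std::string& keys) {
    // total[prefix] = sum of probabilities of all words starting with prefix.
    std::map<std::string, int> total;
    for (const auto& entry : dict)
        for (std::size_t len = 1; len <= entry.first.size(); ++len)
            total[entry.first.substr(0, len)] += entry.second;

    std::vector<std::string> shown;
    for (std::size_t len = 1; len <= keys.size(); ++len) {
        std::string best = "MANUALLY";
        int bestProb = 0;  // every real probability is at least 1
        // std::map iterates in alphabetical order, so a strict '>' keeps the
        // alphabetically first prefix among equal probabilities.
        for (const auto& kv : total) {
            if (kv.first.size() == len &&
                digitsOf(kv.first) == keys.substr(0, len) &&
                kv.second > bestProb) {
                best = kv.first;
                bestProb = kv.second;
            }
        }
        shown.push_back(best);
    }
    return shown;
}
```

On the first sample dictionary, `bruteForceT9(dict, "43556")` walks through "i", "id", "hel", "hell", "hello", because "i"/"id" (8) beat "h"/"he" (3+4=7) until the third key rules "idea" out.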

Solution:

Build a trie in which each node has eight branches, one for each of the phone's digit keys $2 \sim 9$.
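The branch index can be derived as in the small sketch below. The lookup string is the same one the accepted code uses; the function names `branchOf` and `branchOfKey` are only illustrative.

```cpp
#include <cassert>
#include <string>

// Standard keypad layout: entry i is the digit that carries letter 'a' + i.
const std::string kKeyOf = "22233344455566677778889999";

// Child index (0..7) in an 8-way trie node for a lowercase dictionary letter.
int branchOf(char letter) {
    assert(letter >= 'a' && letter <= 'z');
    return kKeyOf[letter - 'a'] - '2';
}

// Child index (0..7) for a pressed digit key '2'..'9'.
int branchOfKey(char digit) {
    assert(digit >= '2' && digit <= '9');
    return digit - '2';
}
```

Letters on the same key collapse to the same branch, so `branchOf('g')`, `branchOf('h')`, and `branchOf('i')` all equal `branchOfKey('4')`. This is why a node has to remember which of the colliding prefixes is the most probable.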

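Each node stores a $pr$ and an $s$: after pressing the keys that lead to this node, the most probable string is $s$, and its probability is $pr$.

Each time a string is inserted, $pr$ and $s$ are maintained for every node along its path.

With this scheme, the probability of one prefix cannot be inserted in several batches; each prefix may be inserted only once. So we first compute $pr$ for every prefix. Because the input dictionary is already in lexicographic order, all words sharing a prefix are adjacent, so the total probability of each prefix can be accumulated directly by carrying a running sum from each word to the next.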
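The accumulation step can be isolated as follows. This is a standalone sketch of the same idea using `std::vector` instead of the global arrays; the function name `accumulatePrefixes` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns pr where pr[i][k] is the total probability of the prefix
// words[i][0..k], stored only at the LAST word that has this prefix;
// earlier words sharing the prefix hold 0 at that position.
// Assumes `words` is sorted ascending and distinct, as the problem guarantees.
std::vector<std::vector<int>> accumulatePrefixes(
        const std::vector<std::string>& words,
        const std::vector<int>& probs) {
    assert(words.size() == probs.size());
    assert(std::is_sorted(words.begin(), words.end()));

    std::vector<std::vector<int>> pr(words.size());
    for (std::size_t i = 0; i < words.size(); ++i)
        pr[i].assign(words[i].size(), probs[i]);

    for (std::size_t i = 1; i < words.size(); ++i) {
        std::size_t common = std::min(words[i].size(), words[i - 1].size());
        for (std::size_t k = 0;
             k < common && words[i][k] == words[i - 1][k]; ++k) {
            pr[i][k] += pr[i - 1][k];  // carry the running sum forward
            pr[i - 1][k] = 0;          // so each prefix is counted exactly once
        }
    }
    return pr;
}
```

For "hell 3", "hello 4", "idea 8" this gives all zeros for "hell", {7, 7, 7, 7, 4} for "hello", and {8, 8, 8, 8} for "idea". Inserting with a strict "greater than" comparison then skips the zeroed entries, so each prefix reaches the trie once with its full total.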

This problem is quite helpful for deepening one's understanding of tries.

Accepted (AC) code:

#include<bits/stdc++.h>
using namespace std;
const int maxn=1e3+5;
const int maxs=103;

int n,m;
string s[maxn];
int pr[maxn][maxs];

namespace Trie
{
    const int SIZE=maxn*maxs;
    int sz;
    struct TrieNode{
        int ed;
        string s;
        int pr;
        int nxt[8];
    }trie[SIZE];
    void init()
    {
        sz=1;
        for(int i=0;i<SIZE;i++)
        {
            trie[i].ed=trie[i].pr=0;
            trie[i].s.clear();
            memset(trie[i].nxt,0,sizeof(trie[i].nxt));
        }
    }
    // key[c-'a'] is the phone digit ('2'..'9') that carries letter c
    const string key="22233344455566677778889999";
    void insert(int idx)
    {
        string &str=s[idx];
        int p=1;
        for(int k=0;k<str.size();k++)
        {
            int ch=key[str[k]-'a']-'2'; // branch index 0..7
            if(!trie[p].nxt[ch]) trie[p].nxt[ch]=++sz;
            p=trie[p].nxt[ch];
            if(pr[idx][k]>trie[p].pr)
            {
                trie[p].pr=pr[idx][k];
                trie[p].s=str.substr(0,k+1);
            }
        }
        trie[p].ed++;
    }
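    void search(const string& s)
    {
        int p=1;
        // the last character is the terminating '1', so it is skipped
        for(int i=0;i+1<(int)s.size();i++)
        {
            int ch=s[i]-'2';
            p=trie[p].nxt[ch]; // node 0 has no children, so p stays 0 after a miss
            if(!p) cout<<"MANUALLY\n";
            else cout<<trie[p].s<<'\n';
        }
    }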
};

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int T;
    cin>>T;
    for(int kase=1;kase<=T;kase++)
    {
        cin>>n;
        for(int i=1,prob;i<=n;i++)
        {
            cin>>s[i]>>prob;
            for(int k=0;k<s[i].size();k++) pr[i][k]=prob;
        }
        for(int i=2;i<=n;i++)
        {
            for(int k=0;k<min(s[i].size(),s[i-1].size());k++)
            {
                if(s[i][k]==s[i-1][k])
                {
                    pr[i][k]+=pr[i-1][k];
                    pr[i-1][k]=0;
                }
                else break;
            }
        }

        Trie::init();
        for(int i=1;i<=n;i++) Trie::insert(i);

        cin>>m;
        cout<<"Scenario #"<<kase<<":\n";
        for(int i=1;i<=m;i++)
        {
            cin>>s[0];
            Trie::search(s[0]);
            cout<<'\n';
        }
        cout<<'\n';
    }
}

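Notes:

Out of concern that cin/cout would be too slow, the code disables stdio synchronization and unties cin from cout; see https://blog.csdn.net/qian2213762498/article/details/81982380:

Two things affect the performance of cout and cin: synchronization and buffering. Synchronization can be disabled with ios::sync_with_stdio(false);. The operating system manages and optimizes the buffers, but only to a very limited extent. Using endl flushes the buffer: it first writes '\n' and then performs a flush, which is slow, so prefer '\n' over endl for line breaks. In addition, cin and cout are tied together: when the two are used alternately, cout is flushed before each cin operation, which is also slow. The tie can be removed with cin.tie(nullptr);.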
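Put together, the three habits look roughly like this. It is a minimal sketch: `setupFastIO` and `writeLines` are names invented here, and the actual speed-up depends on the judge and the standard library.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // only needed if you capture output in a std::ostringstream
#include <string>
#include <vector>

// Call once at the start of main, before any input or output is performed.
void setupFastIO() {
    std::ios::sync_with_stdio(false);  // stop syncing C++ streams with C stdio
    std::cin.tie(nullptr);             // cout is no longer flushed before each read
    assert(std::cin.tie() == nullptr);
}

// Writes one line per entry using '\n', which does not force a flush
// the way std::endl does.
void writeLines(std::ostream& out, const std::vector<std::string>& lines) {
    for (const std::string& line : lines) out << line << '\n';
}
```

After `setupFastIO()`, do not mix `scanf`/`printf` with `cin`/`cout` in the same program, since the two families no longer share synchronized buffers. `writeLines` accepts any `std::ostream`, so it works with `std::cout` on the judge and with a `std::ostringstream` when testing locally.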
