2408 Anagram Groups (A Suffocating String Bucket Sort)

Anagram Groups
Time Limit: 1000MS    Memory Limit: 65536K
Description
World-renowned Prof. A. N. Agram's current research deals with large anagram groups. He has just found a new application for his theory on the distribution of characters in English language texts. Given such a text, you are to find the largest anagram groups. 
A text is a sequence of words. A word w is an anagram of a word v if and only if there is some permutation p of character positions that takes w to v. Then, w and v are in the same anagram group. The size of an anagram group is the number of words in that group. Find the 5 largest anagram groups.
Input

The input contains words composed of lowercase alphabetic characters, separated by whitespace (or newlines). It is terminated by EOF. You can assume there will be no more than 30000 words.
Output
Output the 5 largest anagram groups. If there are fewer than 5 groups, output them all. Sort the groups by decreasing size. Break ties lexicographically by the lexicographically smallest element. For each group output, print its size and its member words. Sort the member words lexicographically and print equal words only once.
Sample Input

undisplayed
trace
tea
singleton
eta
eat
displayed
crate
cater
carte
caret
beta
beat
bate
ate
abet
Sample Output
Group of size 5: caret carte cater crate trace .
Group of size 4: abet bate beat beta .
Group of size 4: ate eat eta tea .
Group of size 1: displayed .
Group of size 1: singleton .

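The problem is mostly about sorting, but you should build a struct for each word: all words in the same anagram group share the same lexicographically smallest arrangement (the string you get by rearranging a word's characters into the lexicographically smallest order, i.e. its letters sorted alphabetically; for example, trace, crate and cater all become acert).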

#include <iostream>
#include <cstdio>
#include <cstring>
#include <string>
#include <map>
#include <algorithm>
using namespace std;

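map<string,int> ma;   ///canonical form -> number of words with that form (the group size), used for the bucket sort

map<string,bool> ak;  ///words that have already been printed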
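struct node{
	char init[25];///original word
	char str[25];///canonical form (lexicographically smallest arrangement)
	int len;
}str[30005];
struct work{
	bool flag;
	char init[25];
}fa[5][30005];///buckets for the bucket sort, indexed by group size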
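bool cmp (const node& a,const node& b)
{
	///std::sort needs a strict weak ordering: return true only when a < b.
	///Returning true for equal words (strcmp <= 0) is undefined behavior in std::sort.
	return strcmp(a.init,b.init) < 0;
}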
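void fun(node* s) ///counting (bucket) sort of the letters: rearrange the word into its lexicographically smallest form, the "root" shared by all of its anagrams
{
	int code[26] = {0},len = strlen(s->init);
	for (int i = 0;i < len; i++){
		code[s->init[i] - 'a']++; ///count each letter
	}
	int por = 0;
	for (int i = 0;i < 26;i ++){
		while (code[i]--){ ///write the letters back out in alphabetical order
			s->str[por++] = 'a'+i;
		}
	}
	s->str[por] = '\0';
}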
int main ()
{
	int i = 0; ///number of words read
	while (~scanf ("%s",str[i].init)){ ///~EOF == 0, so the loop stops at end of input
		fun(&str[i]);
		str[i].len = strlen(str[i].init);
		i++;
	}
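	sort(str,str+i,cmp);///sort by the original word, so group members come out in lexicographic order and the first member seen is the group's smallest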
	for (int j = 0;j < i; j++){
		ma[str[j].str]++;///count how many words share this canonical form (the group size), used for the bucket sort
	}
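	for (int j = 0;j < i; j++){
		for (int k = 0;k < 5; k++){///a second bucket sort: bucket by group size; each bucket stores a canonical form
			///Several groups can have the same size, so the buckets get a second dimension: if layer k is taken, try layer k+1.
			///Words are visited in lexicographic order and at most five groups are printed, so if more than five groups
			///share a size, the sixth one is lexicographically larger than the first five and never needs recording.
			///Hence only five layers of buckets.
			if (!fa[k][ma[str[j].str]].flag){
				strcpy(fa[k][ma[str[j].str]].init,str[j].str);
				fa[k][ma[str[j].str]].flag = true;
				ma[str[j].str] = 0; ///later words of this group land in bucket 0, which is never printed
				break;
			}
		}
	}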
	int cut = 0; ///number of groups printed so far
	for (int j = 30000;j > 0; j--){///walk the buckets from the largest size down (a group can hold all 30000 words)
		for (int k = 0;k < 5; k++){
			if (fa[k][j].flag){///this bucket holds a group
				cout << "Group of size "<< j << ":";
				for (int cur = 0;cur < i; cur++){ ///print every input word whose canonical form matches the bucket's
					if (!strcmp(fa[k][j].init,str[cur].str) && ak[str[cur].init] != true){
						cout << ' ' <<str[cur].init;
						ak[str[cur].init] = true;///a word that has already been printed must not be printed again
					}
				}
				cout << " .\n";
				cut++;
			}
			if (cut == 5){///stop after five groups
				j = 0;
				break;
			}
		}
	}
	return 0;
}