
Big Data Log Analysis Project: the MapReduce Program

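Overall approach:
Use Flume to ship the logs from the server to Hadoop, then use MapReduce programs to clean the data and compute the PV and visit models. Finally, use Azkaban to run the programs on a schedule.
Each user login is identified by its session.
I have tested this myself and it works.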
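The PV and visit statistics can be sketched in a few lines. This is only an illustration in plain Python that mimics the map and reduce steps; it is not the Hadoop API and not the code used in this project. The `Hit` tuple, the function names, and the fields of the visit record (entry page, exit page, start, end, page count) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical cleaned record: (session, request path, timestamp in ms).
Hit = Tuple[str, str, int]


def map_pv(hits: Iterable[Hit]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (request path, 1) for every cleaned record."""
    for _session, path, _ts in hits:
        yield path, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the emitted values per key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def build_visits(hits: Iterable[Hit]) -> Dict[str, dict]:
    """Group records by session and summarise each session as one visit."""
    by_session = defaultdict(list)
    for session, path, ts in hits:
        by_session[session].append((ts, path))

    visits = {}
    for session, events in by_session.items():
        events.sort()  # order the session's requests by timestamp
        visits[session] = {
            "entry_page": events[0][1],
            "exit_page": events[-1][1],
            "start_ms": events[0][0],
            "end_ms": events[-1][0],
            "page_count": len(events),
        }
    return visits
```

In a real job the grouping by key is done by the framework's shuffle phase; here a dictionary stands in for it so the logic can be run locally.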
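Raw log fields: id, method description (in Chinese), logged-in user name, login time, operation duration (milliseconds), request path 1, request path 2, full request path, request method (GET/POST), browser information, user IP address, requested page, user session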
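A minimal sketch of the cleaning step, assuming each record is one tab-separated line with exactly the 13 fields listed above, in that order. The delimiter, the `LogRecord` type, and the rule of dropping any line that does not match are assumptions for illustration; records that carry extra fields (request parameters, for example) would need a looser rule.

```python
from typing import NamedTuple, Optional

FIELD_COUNT = 13  # assumed: the 13 fields listed in the description


class LogRecord(NamedTuple):
    record_id: str
    description: str
    user_name: str
    timestamp_ms: int
    elapsed_ms: int
    path1: str
    path2: str
    full_url: str
    method: str
    user_agent: str
    ip: str
    page: str
    session: str


def parse_record(line: str, sep: str = "\t") -> Optional[LogRecord]:
    """Return a LogRecord, or None if the line fails the cleaning rules."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) != FIELD_COUNT:
        return None  # wrong number of fields: treat as dirty data
    try:
        timestamp_ms = int(parts[3])
        elapsed_ms = int(parts[4])
    except ValueError:
        return None  # time or duration is not numeric
    if not parts[12]:
        return None  # no session: the record cannot be assigned to a visit
    return LogRecord(parts[0], parts[1], parts[2],
                     timestamp_ms, elapsed_ms, *parts[5:13])
```

An empty user name is kept on purpose, since a request made before login still belongs to a session.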
The raw log looks like this:

95367   後臺首頁    sw2 1529919971466   21  http://upms.zhangshuzheng.cn:1111   /manage
/index http://upms.zhangshuzheng.cn:1111/manage/index GET Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 "/manage/index.jsp" 548a66a9-e89c-401b-b1f0-503357ce72ae 95366 登入 sw2 1529919971322 50 http://upms.zhangshuzheng.cn:1111 /sso
/login http://upms.zhangshuzheng.cn:1111/sso/login POST {validateCode=[2GRQ],password=[12345],rememberMe=[false],backurl=[],username=[sw2]} Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 {"code":1,"data":"http://upms.zhangshuzheng.cn:1111"
,"message":"success"} 548a66a9-e89c-401b-b1f0-503357ce72ae 95365 登入 1529919964249 0 http://upms.zhangshuzheng.cn:1111 /sso/login http://upms.zhangshuzheng.cn:1111/sso/login POST {validateCode=[FDEY],password=[12345],rememberMe=[false],backurl=[],username=[sw2]} Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 {"code":10107,"data":"請更換驗證碼!","message":"ValidateCode error"} 548a66a9-e89c-401b-b1f0-503357ce72ae 95364 登入 1529919670205 2 http://upms.zhangshuzheng.cn:1111 /sso/login http://upms.zhangshuzheng.cn:1111/sso/login GET Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 "/sso/login.jsp" f1124085-8fdb-45e8-9a01-716153d24b11 95363 退出登入 1529919670085 47 http://upms.zhangshuzheng.cn:1111 /sso/logout http://upms.zhangshuzheng.cn:1111/sso/logout GET Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 "redirect:http://upms.zhangshuzheng.cn:1111/manage/index" 2837e087-0958-4e47-ac4a-c94441199deb 95362 查詢字典 lzh 1529919651268 19 http://upms.zhangshuzheng.cn:1111 /manage/dictionary/select/sys http://upms.zhangshuzheng.cn:1111/manage/dictionary/select/sys GET sort=pkId&order=asc&offset=0&limit=50 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 
{"total":13,"rows":[{"code":"sys","ctime":1521769310000,"description":"系統","fatherCode":"sys","fatherDesc":"系統","pkId":3},{"code":"mg","ctime":1522120361000,"description":"密級","fatherCode":"sys","fatherDesc":"sys","pkId":33,"remarks":"檔案祕密級別"},{"code":"preservationDate","ctime":1522121226000,"description":"儲存期限","fatherCode":"sys","fatherDesc":"sys","pkId":40,"remarks":"設定文件的儲存期限"},{"code":"tradition","ctime":1522132680000,"description":"傳統歸檔","fatherCode":"sys","fatherDesc":"sys","pkId":44,"remarks":"傳統歸檔"},{"code":"comArticle","ctime":1522134595000,"description":"來文","fatherCode":"sys","fatherDesc":"sys","pkId":51,"remarks":"簡化整理--來文"},{"code":"sendArticle","ctime":1522135517000,"description":"發文","fatherCode":"sys","fatherDesc":"sys","pkId":56,"remarks":"簡化整理--發文"},{"code":"innerArticle","ctime":1522137766000,"description":"內部檔案","fatherCode":"sys","fatherDesc":"sys","pkId":63,"remarks":"簡化整理--內部檔案"},{"code":"singleArchive","ctime":1522139048000,"description":"單件","fatherCode":"sys","fatherDesc":"sys","pkId":71,"remarks":"簡化管理--單件"},{"code":"separator","ctime":1522216114000,"description":"分隔符","fatherCode":"sys","fatherDesc":"sys","pkId":78,"remarks":"特殊字元符號"},{"code":"carrierType","ctime":1522380386000,"description":"載體型別","fatherCode":"sys","fatherDesc":"sys","pkId":94,"remarks":"檔案的載體"},{"code":"archiveSource","ctime":1522381316000,"description":"檔案來源","fatherCode":"sys","fatherDesc":"sys","pkId":98,"remarks":"檔案的出處"},{"code":"abbreviation","ctime":1523347840000,"description":"門類簡稱","fatherCode":"sys","fatherDesc":"sys","pkId":109,"remarks":"門類號的簡稱"},{"code":"activitiCode","ctime":1524106881000,"description":"工作流定義","fatherCode":"sys","fatherDesc":"sys","pkId":120,"remarks":"應用於本專案的所有工作流"}]} upms:dictionary:select 2837e087-0958-4e47-ac4a-c94441199deb 95361 查詢字典不分頁 lzh 1529919651241 16 http://upms.zhangshuzheng.cn:1111 /manage/dictionary/selectNoPagination/sys http://upms.zhangshuzheng.cn:1111/manage/dictionary/selectNoPagination/sys GET Mozilla/5.0 (Windows 
NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 {"total":13,"rows":[{"code":"mg","ctime":1522120361000,"description":"密級","fatherCode":"sys","fatherDesc":"sys","pkId":33,"remarks":"檔案祕密級別"},{"code":"preservationDate","ctime":1522121226000,"description":"儲存期限","fatherCode":"sys","fatherDesc":"sys","pkId":40,"remarks":"設定文件的儲存期限"},{"code":"tradition","ctime":1522132680000,"description":"傳統歸檔","fatherCode":"sys","fatherDesc":"sys","pkId":44,"remarks":"傳統歸檔"},{"code":"comArticle","ctime":1522134595000,"description":"來文","fatherCode":"sys","fatherDesc":"sys","pkId":51,"remarks":"簡化整理--來文"},{"code":"sendArticle","ctime":1522135517000,"description":"發文","fatherCode":"sys","fatherDesc":"sys","pkId":56,"remarks":"簡化整理--發文"},{"code":"innerArticle","ctime":1522137766000,"description":"內部檔案","fatherCode":"sys","fatherDesc":"sys","pkId":63,"remarks":"簡化整理--內部檔案"},{"code":"carrierType","ctime":1522380386000,"description":"載體型別","fatherCode":"sys","fatherDesc":"sys","pkId":94,"remarks":"檔案的載體"},{"code":"separator","ctime":1522216114000,"description":"分隔符","fatherCode":"sys","fatherDesc":"sys","pkId":78,"remarks":"特殊字元符號"},{"code":"activitiCode","ctime":1524106881000,"description":"工作流定義","fatherCode":"sys","fatherDesc":"sys","pkId":120,"remarks":"應用於本專案的所有工作流"},{"code":"sys","ctime":1521769310000,"description":"系統","fatherCode":"sys","fatherDesc":"系統","pkId":3},{"code":"abbreviation","ctime":1523347840000,"description":"門類簡稱","fatherCode":"sys","fatherDesc":"sys","pkId":109,"remarks":"門類號的簡稱"},{"code":"singleArchive","ctime":1522139048000,"description":"單件","fatherCode":"sys","fatherDesc":"sys","pkId":71,"remarks":"簡化管理--單件"},{"code":"archiveSource","ctime":1522381316000,"description":"檔案來源","fatherCode":"sys","fatherDesc":"sys","pkId":98,"remarks":"檔案的出處"}]} upms:dictionary:selectNoPagination 2837e087-0958-4e47-ac4a-c94441199deb 95360 字典首頁 lzh 1529919650618 10 http://upms.zhangshuzheng.cn:1111 /manage/dictionary/index 
http://upms.zhangshuzheng.cn:1111/manage/dictionary/index GET Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 "/manage/dictionary/index.jsp" upms:dictionary:read 2837e087-0958-4e47-ac4a-c94441199deb 95359 全宗列表 lzh 1529919646915 64 http://upms.zhangshuzheng.cn:1111 /manage/fonds/list http://upms.zhangshuzheng.cn:1111/manage/fonds/list POST {} Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 [{"ctime":1524758400000,"fondsId":13,"fondsName":"佔地方","fondsNum":"000","mtime":1525190400000,"str1":"0","str2":"1"},{"ctime":1523462400000,"fondsId":1,"fondsName":"ZYP測試","fondsNum":"000","mtime":1525190400000,"str1":"0","str2":"0"},{"ctime":1525190400000,"fondsId":19,"fondsName":"lzh測試2","fondsNum":"001","str1":"0","str2":"1"},{"ctime":1523462400000,"fondsId":4,"fondsName":"301","fondsNum":"002","mtime":1524758400000,"str1":"0","str2":"1"},{"ctime":1523462400000,"fondsId":2,"fondsName":"ZYP測試","fondsNum":"003","mtime":1524758400000,"str1":"0","str2":"1"},{"ctime":1523462400000,"fondsId":5,"fondsName":"ZXY測試","fondsNum":"004","str1":"0","str2":"0"},{"ctime":1523462400000,"fondsId":6,"fondsName":"WXL測試","fondsNum":"005","str1":"0","str2":"0"},{"ctime":1523462400000,"fondsId":7,"fondsName":"SW測試","fondsNum":"006","str1":"0","str2":"1"},{"ctime":1523462400000,"fondsId":8,"fondsName":"LZH測試2","fondsNum":"007","str1":"0","str2":"0"},{"ctime":1524758400000,"fondsId":14,"fondsName":"1","fondsNum":"008","mtime":1524758400000,"str1":"0","str2":"1"},{"ctime":1524758400000,"fondsId":16,"fondsName":"123123","fondsNum":"011","str1":"0","str2":"1"},{"ctime":1524758400000,"fondsId":15,"fondsName":"2","fondsNum":"012","mtime":1524758400000,"str1":"0","str2":"1"},{"ctime":1526313600000,"fondsId":21,"fondsName":"測試99","fondsNum":"099","str1":"0"},{"ctime":1525190400000,"fondsId":20,"fondsName":"lzh測試","fondsNum":"60","mtime":1525190400000,"str1":"0"
,"str2":"1"},{"ctime":1523894400000,"fondsId":9,"fondsName":"innoking","fondsNum":"YNJY","str1":"0","str2":"1"}] 2837e087-0958-4e47-ac4a-c94441199deb 95358 查詢保管年限 lzh 1529919646747 69 http://upms.zhangshuzheng.cn:1111 /manage/scope/preservationDate http://upms.zhangshuzheng.cn:1111/manage/scope/preservationDate GET code=preservationDate Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 [{"code":"D","ctime":1522121323000,"description":"短期","fatherCode":"preservationDate","fatherDesc":"preservationDate","pkId":41,"remarks":"保管期限_短期(30年)"},{"code":"C","ctime":1522121397000,"description":"長期","fatherCode":"preservationDate","fatherDesc":"preservationDate","pkId":42,"remarks":"保管期限_長期(60年)"},{"code":"Y","ctime":1522121727000,"description":"永久","fatherCode":"preservationDate","fatherDesc":"preservationDate","pkId":43,"remarks":"保管期限_永久(無期限)"}] 2837e087-0958-4e47-ac4a-c94441199deb 95357 分類首頁 lzh 1529919645581 6 http://upms.zhangshuzheng.cn:1111 /manage/archivestype/index http://upms.zhangshuzheng.cn:1111/manage/archivestype/index GET Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 "/manage/archivestype/index.jsp" upms:archivestype:read 2837e087-0958-4e47-ac4a-c94441199deb 95356 個人資料首頁 lzh 1529919643316 9 http://upms.zhangshuzheng.cn:1111 /manage/personalData/index http://upms.zhangshuzheng.cn:1111/manage/personalData/index GET Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 "/manage/personalData/index.jsp" upms:personalData:read 2837e087-0958-4e47-ac4a-c94441199deb 95355 後臺首頁 lzh 1529919639681 60 http://upms.zhangshuzheng.cn:1111 /manage/index http://upms.zhangshuzheng.cn:1111/manage/index GET Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 "/manage/index.jsp" 
2837e087-0958-4e47-ac4a-c94441199deb 95354 登入 lzh 1529919639478 70 http://upms.zhangshuzheng.cn:1111 /sso/login http://upms.zhangshuzheng.cn:1111/sso/login POST {validateCode=[wqby],password=[123456],rememberMe=[false],backurl=[],username=[lzh]} Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 {"code":1,"data":"http://upms.zhangshuzheng.cn:1111","message":"success"} 2837e087-0958-4e47-ac4a-c94441199deb 95353 登入 1529919630737 2 http://upms.zhangshuzheng.cn:1111 /sso/login http://upms.zhangshuzheng.cn:1111/sso/login GET Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 "/sso/login.jsp" 2837e087-0958-4e47-ac4a-c94441199deb 95352 退出登入 1529919630594 61 http://upms.zhangshuzheng.cn:1111 /sso/logout http://upms.zhangshuzheng.cn:1111/sso/logout GET Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 "redirect:http://upms.zhangshuzheng.cn:1111/manage/index" 6b774f4d-9071-4443-a5b8-042e5e06aecc 95351 許可權列表 admin 1529919600561 62 http://upms.zhangshuzheng.cn:1111 /manage/permission/list http://upms.zhangshuzheng.cn:1111/manage/permission/list GET sort=permissionId&order=asc&offset=0&limit=10 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36 127.0.0.1 {"total":148,"rows":[{"ctime":1,"icon":"zmdi zmdi-accounts-list","name":"系統組織管理","orders":1,"permissionId":1,"pid":0,"status":1,"systemId":1,"type":1},{"ctime":2,"name":"系統管理","orders":2,"permissionId":2,"permissionValue":"upms:system:read","pid":1,"status":1,"systemId":1,"type":2,"uri":"/manage/system/index"},{"ctime":3,"name":"組織管理","orders":3,"permissionId":3,"permissionValue":"upms:organization:read","pid":1,"status":1,"systemId":1,"type":2,"uri":"/manage/organization/index"},{"ctime":4,"icon":"zmdi zmdi-accounts","name":"角色使用者管理","orders":
