12 Cool Data Science Project Ideas for Beginners and Experts

Data Science brings together a variety of scientific tools, processes, algorithms, and knowledge extraction systems for identifying meaningful patterns in structured and unstructured data alike.

Data Science has been booming for the last couple of years, and the push that recent innovations have given Artificial Intelligence is only going to take it to the next level. As more industries begin to realize the power of Data Science, more opportunities surface in the market.

If you fancy Data Science and are eager to get a solid grip on the technology, now is as good a time as ever to hone your skills so you can understand and manage the upcoming challenges in the field. The purpose of this article is to share some practical ideas for your next project, which will not only boost your confidence in Data Science but also play a critical part in enhancing your skills.

Data really powers everything that we do. — Jeff Weiner

Top Interesting Data Science Projects

Understanding Data Science can be quite confusing at first, but with constant practice, you can soon begin to grasp the various notions and terminologies in the subject. The best way to gain more exposure to Data Science apart from going through the literature is to take on some helpful projects which will not only upskill you but will also make your resume more impressive.

In this section, we will share a handful of fun and interesting project ideas with you, spread across all skill levels, from beginners through intermediate practitioners to veterans.

1. Building Chatbots

Chatbots play a pivotal role for businesses as they can effortlessly handle a barrage of customer queries and messages without any slowdown. They have single-handedly reduced the customer service workload for us by automating a majority of the process. They do this by utilizing techniques backed with Artificial Intelligence, Machine Learning, and Data Science.

Chatbots work by analyzing the input from the customer and replying with an appropriate mapped response. To train the chatbot, you can use Recurrent Neural Networks with the intents JSON dataset while the implementation can be handled using Python. Whether you want your chatbot to be domain-specific or open-domain depends on its purpose. As these chatbots process more interactions, their intelligence and accuracy also increase.
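
As a rough illustration of the "analyze the input, reply with a mapped response" loop, here is a minimal Python sketch. It deliberately swaps the Recurrent Neural Network for a simple word-overlap match so that it runs with the standard library alone. The intents, patterns, and responses are made-up examples, and the function names (`tokenize`, `best_intent`, `reply`) are hypothetical.

```python
import json
import random
import re

# Hypothetical intents data; a real project would load this from an intents JSON file.
INTENTS_JSON = """
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["hello", "hi there", "good morning"],
     "responses": ["Hello! How can I help you?"]},
    {"tag": "hours",
     "patterns": ["when are you open", "what are your opening hours"],
     "responses": ["We are open from 9am to 5pm, Monday to Friday."]},
    {"tag": "goodbye",
     "patterns": ["bye", "see you later", "goodbye"],
     "responses": ["Goodbye! Have a nice day."]}
  ]
}
"""

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def best_intent(message, intents):
    """Return the intent whose patterns overlap most with the message, or None."""
    words = tokenize(message)
    best, best_score = None, 0.0
    for intent in intents:
        for pattern in intent["patterns"]:
            pattern_words = tokenize(pattern)
            union = words | pattern_words
            # Jaccard overlap between the message and this pattern.
            score = len(words & pattern_words) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = intent, score
    return best

def reply(message, intents):
    """Pick one of the mapped responses for the best-matching intent."""
    intent = best_intent(message, intents)
    if intent is None:
        return "Sorry, I did not understand that."
    return random.choice(intent["responses"])

intents = json.loads(INTENTS_JSON)["intents"]
print(reply("Hi there!", intents))
print(reply("What are your opening hours?", intents))
```

A trained model would replace `best_intent` with a classifier that predicts the intent tag, while the response lookup stays the same.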

2. Credit Card Fraud Detection

Photo by Avery Evans on Unsplash

Credit card fraud is more common than you think, and lately it has been on the rise. In terms of numbers, we're on the path to cross a billion credit card users by the end of 2022. But thanks to innovations in technologies like Artificial Intelligence, Machine Learning, and Data Science, credit card companies have been able to identify and intercept these frauds with sufficient accuracy.

Simply put, the idea is to analyze the customer's usual spending behavior, including the locations where the spending happens, to separate fraudulent transactions from non-fraudulent ones. For this project, you can use either R or Python with the customer's transaction history as the dataset and feed it into models such as decision trees, Artificial Neural Networks, and Logistic Regression. As you feed more data to your system, you should be able to increase its overall accuracy.
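
To make the Logistic Regression option concrete, here is a small from-scratch sketch in Python. It is only an illustration under stated assumptions: the six transactions are made up, the two features (amount relative to the customer's usual maximum, and distance from their usual location) are hypothetical and already scaled to the 0..1 range, and a real project would more likely rely on a library implementation and far more data.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for features, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, features)) + b
            error = sigmoid(z) - label
            for j, xj in enumerate(features):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def fraud_probability(features, w, b):
    """Estimated probability that a transaction is fraudulent."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)

# Made-up training data: [relative amount, relative distance], both in 0..1.
X = [[0.10, 0.00], [0.20, 0.10], [0.15, 0.05],
     [0.90, 0.80], [0.80, 0.90], [0.95, 0.70]]
y = [0, 0, 0, 1, 1, 1]  # 1 = fraudulent, 0 = legitimate

w, b = train_logistic(X, y)
for features in X:
    print(features, round(fraud_probability(features, w, b), 3))
```

Transactions that look like the customer's usual behavior should score below 0.5, and the unusual ones above it; the threshold itself is a design choice that trades missed fraud against false alarms.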

3. Fake News Detection

Photo by Aaron Burden on Unsplash

We're sure fake news needs no introduction. In today's hyper-connected world, it has become ridiculously easy to share fake news over the internet. Every once in a while, you can see false information being spread online from unauthorized sources, which not only causes problems for the people targeted but also has the potential to cause widespread panic and even violence.

To curb the spread of fake news, it is crucial to identify the authenticity of the information, which can be done using this Data Science project. For this, you can use Python and build a model with TfidfVectorizer and PassiveAggressiveClassifier to separate the real news from the fake one. Some of the Python libraries suited for this project are pandas, NumPy, and scikit-learn, and for the dataset, you can use News.csv.
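
To show what the two named building blocks do, the sketch below reimplements the ideas from scratch in plain Python: TF-IDF weighting turns each headline into a vector, and a passive-aggressive learner updates its weights only when an example is misclassified or falls inside the margin. This is not scikit-learn's code, just a simplified stand-in; the four headlines and all function names are invented for illustration, and a real project would use `TfidfVectorizer` and `PassiveAggressiveClassifier` on the full dataset.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """Return an L2-normalised sparse TF-IDF vector as a dict."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_passive_aggressive(vectors, labels, epochs=10, C=1.0):
    """PA-I updates: change the weights only when the hinge loss is non-zero."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):  # y is +1 (REAL) or -1 (FAKE)
            margin = y * sum(w.get(t, 0.0) * v for t, v in x.items())
            loss = max(0.0, 1.0 - margin)
            sq_norm = sum(v * v for v in x.values())
            if loss > 0.0 and sq_norm > 0.0:
                tau = min(C, loss / sq_norm)  # step size, capped by C
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + tau * y * v
    return w

def predict(text, idf, w):
    """Label a headline; unseen vocabulary scores 0 and defaults to REAL."""
    x = tfidf(text, idf)
    score = sum(w.get(t, 0.0) * v for t, v in x.items())
    return "REAL" if score >= 0 else "FAKE"

# Made-up headlines standing in for the news dataset.
headlines = [
    "City council approves new budget for schools",
    "Miracle pill cures all diseases overnight",
    "Scientists publish peer reviewed study on climate trends",
    "Aliens secretly control the world government",
]
labels = [1, -1, 1, -1]

idf = fit_idf(headlines)
weights = train_passive_aggressive([tfidf(h, idf) for h in headlines], labels)
print(predict("Miracle pill cures everything overnight", idf, weights))
print(predict("Council approves schools budget", idf, weights))
```

With only four training headlines the model merely memorizes vocabulary, so the sketch demonstrates the mechanics rather than realistic accuracy.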

4. Forest Fire Prediction

Photo by Pixabay from Pexels

Building a forest fire and wildfire prediction system will be another good use of the capabilities offered by Data Science. A wildfire or forest fire is essentially an uncontrolled fire in a forest. Every incident of a forest wildfire has caused an immense amount of damage to not only nature but the animal habitat and human property as well.

To control and even predict the chaotic nature of wildfires, you can use k-means clustering to identify major fire hotspots and their severity. This could be useful in properly allocating resources. You can also make use of meteorological data to find the periods and seasons in which wildfires are common, which can increase your model's accuracy.
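
The hotspot idea can be sketched with a bare-bones k-means (Lloyd's algorithm) in plain Python. The fire coordinates below are invented, the centroids are seeded with the first k points purely to keep the example deterministic, and cluster size is used here as a crude stand-in for severity; all of these are assumptions made for illustration.

```python
import math
from collections import Counter

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments

# Hypothetical (x, y) locations of recorded fires on a map grid.
fires = [(10.0, 10.0), (50.0, 50.0), (10.5, 9.5),
         (9.5, 10.5), (50.5, 49.5), (49.5, 50.5)]

centroids, assignments = kmeans(fires, k=2)
hotspot_sizes = Counter(assignments)  # fires per hotspot
for index, centre in enumerate(centroids):
    print("hotspot", index, "centre", centre, "fires", hotspot_sizes[index])
```

In practice you would choose k by inspecting the data (for example with an elbow plot) and use a better seeding strategy than taking the first k points.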

5. Classifying Breast Cancer

Photo by Anna Shvets from Pexels

In case you want to add a project related to the healthcare industry to your portfolio, you can try building a breast cancer detection system using Python. Breast cancer cases have been on the rise lately, and the best possible way to fight breast cancer is to identify it at an early stage and take appropriate preventive measures.

To build such a system with Python, you can use the IDC (Invasive Ductal Carcinoma) dataset, which contains histology images of malignant, cancer-inducing cells, and train your model on it. For this project, you'll find Convolutional Neural Networks well suited to the task, and as for the Python libraries, you can use NumPy, OpenCV, TensorFlow, Keras, scikit-learn, and Matplotlib.

6. Driver Drowsiness Detection

Road accidents take many lives every year, and one of their causes is sleepy drivers. Since drowsy driving is a potential danger on the road, one of the best ways to prevent such accidents is to implement a drowsiness detection system.

A driver drowsiness detection system such as this is yet another project that has the potential to save many lives, by constantly assessing the driver's eyes and sounding an alarm when the system detects frequent closing of the eyes.

A webcam is a must for this project to allow the system to periodically monitor the driver’s eyes. To make this happen, this Python project will require a deep learning model and libraries such as OpenCV, TensorFlow, Pygame, and Keras.
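
The alarm logic that sits on top of the deep learning model can be sketched separately from the vision part. The Python example below assumes a hypothetical eye-state classifier that has already labeled each webcam frame as "eyes closed" or "eyes open"; the scoring scheme, the threshold of 5, and the function name `drowsiness_alerts` are illustrative choices rather than part of any particular library.

```python
def drowsiness_alerts(eye_closed_frames, threshold=5):
    """Yield the frame indexes at which an alarm should start.

    eye_closed_frames: iterable of booleans, True when the (hypothetical)
    eye-state classifier reports closed eyes for that frame.
    """
    score = 0
    alarm_on = False
    for index, closed in enumerate(eye_closed_frames):
        # Raise the score while the eyes stay closed, decay it otherwise.
        score = score + 1 if closed else max(0, score - 1)
        if score >= threshold and not alarm_on:
            alarm_on = True
            yield index
        elif score < threshold:
            alarm_on = False

# Simulated classifier output for eleven consecutive frames.
frames = [False, True, True, False, True, True, True, True, True, False, False]
print(list(drowsiness_alerts(frames, threshold=5)))
```

Letting the score decay instead of resetting it means that frequent blinks with short gaps still add up, which matches the "frequent closing of the eyes" condition described above.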

7. Recommender Systems (Movie/Web Show Recommendation)

Photo by Pixabay from Pexels

Have you ever wondered how media platforms like YouTube, Netflix, and others recommend what you should watch next? To do so, they use a tool called a recommender (or recommendation) system. It takes several metrics into consideration, such as age, previously watched shows, most-watched genre, and watch frequency, and feeds them into a Machine Learning model, which then generates what the user might like to watch next.

Based on your preference and input data, you can try to build either a content-based recommendation system or a collaborative filtering recommendation system. For this project, you can pick R with the MovieLens dataset that covers ratings for over 58,000 movies, and as for the packages, you can use recommenderlab, ggplot2, reshape2, and data.table.
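
Although the packages listed above are for R, the collaborative filtering idea is language-independent, so here is a minimal user-based sketch in Python to keep the examples in this article consistent. The three users and their ratings are invented, and the scoring rule (sum of similarity times rating over the other users) is one simple choice among many.

```python
import math

# Hypothetical ratings (user -> {movie: rating on a 1-5 scale}).
ratings = {
    "ana":   {"Alien": 5, "Heat": 4, "Up": 1},
    "ben":   {"Alien": 5, "Heat": 5, "Up": 1, "Blade Runner": 5},
    "chloe": {"Alien": 1, "Heat": 1, "Up": 5, "Coco": 5},
}

def cosine(a, b):
    """Cosine similarity over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=2):
    """Rank movies the user has not seen by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ana", ratings))
```

Because "ben" rates movies much like "ana" does, his favorite ranks ahead of the one liked by "chloe". Cosine similarity on raw ratings is always positive, so a fuller implementation would typically mean-center each user's ratings first.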

8. Sentiment Analysis

Also known as opinion mining, sentiment analysis is a tool backed by Artificial Intelligence that lets you identify, gather, and analyze people's opinions about a subject or a product. These opinions could come from a variety of sources, including online reviews and survey responses, and could involve a range of emotions such as happiness, anger, love, and excitement, as well as positive or negative attitudes.

Modern data-driven companies benefit the most from a sentiment analysis tool, as it gives them critical insight into people's reactions to the dry run of a new product launch or a change in business strategy. To build a system like this, you could use R with the dataset from the janeaustenr package along with the tidytext package.

9. Exploratory Data Analysis

Photo by Lukas from Pexels

Data Analysis starts with EDA. Exploratory Data Analysis plays a key role in the data analysis process, as this step helps you make sense of your data and often involves visualizing it for better exploration. For visualization, you can pick from a range of options, such as histograms, scatterplots, or heat maps. EDA can also expose unexpected results and outliers in your data. Once you have identified the patterns and derived the necessary insights from your data, you are good to go.

A project of this scale can easily be done with Python, and for the packages, you can use pandas, NumPy, seaborn, and matplotlib.
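
Before reaching for plots, a first EDA pass is often just summary statistics and an outlier check. The sketch below uses only Python's standard `statistics` module in place of pandas so it runs anywhere; the sample values are invented, the function name `summarize` is hypothetical, and the 1.5 × IQR rule is a common convention rather than the only way to flag outliers.

```python
import statistics

def summarize(values):
    """Return basic summary statistics and the values outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }

# Made-up sample, for example daily order counts with one suspicious spike.
daily_orders = [12, 14, 15, 15, 16, 17, 18, 95]

summary = summarize(daily_orders)
for name, value in summary.items():
    print(name, value)
```

With pandas, `DataFrame.describe()` gives a similar overview per column, and a seaborn or matplotlib histogram would make the spike visible at a glance.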

A great source for EDA datasets is the IBM Analytics Community.

10. Gender Detection & Age Prediction

Framed as a classification problem, this gender detection and age prediction project will put both your Machine Learning and Computer Vision skills to the test. The goal here is to build a system that takes a person's image and tries to identify their age and gender.

For this fun project, you can implement Convolutional Neural Networks and use Python with the OpenCV package. You can grab the Adience dataset for this project. Factors such as makeup, lighting, and facial expressions make this challenging and can throw your model off, so keep that in mind.

11. Recognizing the Speech Emotions

Speech is one of the most fundamental ways of expressing ourselves, and it hides various emotions inside it, such as calmness, anger, joy, and excitement, to name a few. By analyzing the emotions behind the speech, it is possible to use this information to restructure our actions and services, and even products, to offer a more personalized service to specific individuals.

This Speech Emotion Recognition project tries to identify and extract emotions from multiple sound files containing human speech. To make something like this in Python, you can use the Librosa, SoundFile, NumPy, scikit-learn, and PyAudio packages. For the dataset, you can use the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), which has over 7300 files for you to use.

12. Customer Segmentation

Photo by You X Ventures on Unsplash

Modern businesses thrive by delivering highly personalized services to their customers, which would not be possible without some form of customer categorization or segmentation. With segmentation in place, organizations can easily structure their services and products around their customers while targeting them to drive more revenue.

For this project, you will use unsupervised learning to group your customers into clusters based on individual attributes such as age, gender, region, interests, and so on. K-means clustering or hierarchical clustering will be suitable here, but you can also experiment with fuzzy clustering or density-based clustering methods. You can use the Mall_Customers dataset as sample data.

More Data Science Project Ideas to Build

  • Coronavirus visualizations
  • Visualising climate change
  • Uber’s pickup analysis
  • Web traffic forecasting using time series
  • Impact Of Climate Change On Global Food Supply
  • Detecting Parkinson’s Disease
  • Pokemon Data Exploration
  • Earth Surface Temperature Visualization
  • Brain Tumor Detection with Data Science
  • Predictive policing

Conclusion

Through this article, we tried to cover more than 10 fun and handy Data Science project ideas for you, which will help you understand the ABCs of the technology. Data Science is one of the hottest in-demand domains in the industry and its future holds much promise, but to make the most of the upcoming opportunities, you need to be prepared to take on the challenges it brings. Good luck!

Note: To avoid misunderstandings, I want to point out that this article represents just my personal opinion, and you have every right to disagree with it.

If you have more suggestions or ideas, we’d love to hear about them.

I hope you've found this article useful!

About the Author

Claire D. is a Content Crafter and Marketer at Digitalogy, a tech sourcing and custom matchmaking marketplace that connects people across the globe with pre-screened, top-notch developers and designers based on their specific needs. Connect with me on Medium, LinkedIn, and Twitter.

Translated from: https://towardsdatascience.com/12-cool-data-science-projects-ideas-for-beginners-and-experts-fc75b5498e03
