5 Awesome Computer Vision Project Ideas with Python, Machine Learning, and Deep Learning

Project Ideas

Computer vision is a field of artificial intelligence that works with images and video to solve real-life visual problems. Its main goal is to give computers the ability to recognize, understand, and identify the contents of digital images or videos so that tasks can be automated.

Humans have no trouble identifying objects and their surroundings. It is not so easy for computers to identify and distinguish the various patterns, visuals, images, and objects in an environment. The difficulty arises because the human brain and eyes interpret a scene very differently from a computer, which ultimately represents everything in binary, as 0s and 1s. A color image is usually stored as a three-dimensional array: height, width, and three color channels for red, green, and blue. Each value ranges from 0 to 255, and with this array representation we can write code to identify and recognize images. With the advancements in machine learning, deep learning, and computer vision, modern computer vision projects can solve complicated tasks like image segmentation and classification, object detection, face recognition, and much more.
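
To make the array representation concrete, here is a minimal sketch that builds a tiny color image by hand with NumPy. The 2x3 size and the pixel values are made up for illustration; a real photo has the same layout, only with far more pixels.

```python
import numpy as np

# A tiny 2x3 "image": height 2, width 3, and 3 color channels (R, G, B).
# dtype uint8 gives every channel value the 0-255 range.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]      # top-left pixel: pure red
image[0, 1] = [0, 255, 0]      # top-middle pixel: pure green
image[0, 2] = [0, 0, 255]      # top-right pixel: pure blue
image[1, :] = [255, 255, 255]  # bottom row: white

print(image.shape)   # (height, width, channels)
print(image[0, 0])   # the three channel values of a single pixel

# Slicing the last axis extracts one color channel as a 2-D array.
red_channel = image[:, :, 0]
print(red_channel)
```

Slicing out a single channel like this is the starting point for most of the color and masking operations used in the projects below.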

We will look at two beginner projects to get started with computer vision, then two intermediate level projects to build a more solid foundation in computer vision with machine learning and deep learning. Finally, we will look at one advanced level computer vision project using deep learning. For each project, we will briefly discuss the related theory and then how the project can be approached and extended. I will try to provide at least one link to resources that will help you get started with each of these projects.

Photo by Daniil Kuželev on Unsplash

Beginner level computer vision projects:

1. Color Detection

This is a basic project for beginners to get started with the computer vision library OpenCV. Here you can learn how to tell various colors apart. This starter project also helps in understanding the concept of masking. The task is to distinguish colors such as red, green, blue, black, and white in a given frame and display only the colors that are detected. The project gives a better understanding of how masking works, which carries over to more complicated image classification and image segmentation tasks. It is also a good way to learn how images are stored as NumPy arrays with stacked color channels (note that OpenCV loads images in BGR order rather than RGB), and how to convert a color image to grayscale.
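
The masking idea can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code from the referenced tutorial: the helper names, the 2x2 test frame, and the "red" range are all made up. In OpenCV, cv2.inRange, cv2.bitwise_and, and cv2.cvtColor perform the equivalent steps, and color ranges are usually chosen in HSV rather than RGB.

```python
import numpy as np

def color_mask(image_rgb, lower, upper):
    """Return a boolean mask that is True where every channel lies in [lower, upper]."""
    lower = np.asarray(lower, dtype=np.uint8)
    upper = np.asarray(upper, dtype=np.uint8)
    return np.all((image_rgb >= lower) & (image_rgb <= upper), axis=-1)

def apply_mask(image_rgb, mask):
    """Keep the pixels selected by the mask and black out everything else."""
    result = np.zeros_like(image_rgb)
    result[mask] = image_rgb[mask]
    return result

def to_grayscale(image_rgb):
    """Weighted sum of R, G, B (ITU-R BT.601 luma weights), rounded to uint8."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(image_rgb @ weights).astype(np.uint8)

# Made-up 2x2 frame: reddish, greenish, bluish, and white pixels.
frame = np.array([[[200, 30, 20], [10, 220, 15]],
                  [[25, 20, 210], [255, 255, 255]]], dtype=np.uint8)

# Hypothetical "red" range: strong red channel, weak green and blue.
red_mask = color_mask(frame, lower=(150, 0, 0), upper=(255, 80, 80))
red_only = apply_mask(frame, red_mask)
gray = to_grayscale(frame)

print(red_mask)
print(gray)
```

Only the reddish pixel survives the mask; the white pixel is rejected because its green and blue channels are too high for the chosen range.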

The same task can be taken further with deep learning models such as U-Net or CANet, which handle more complex image segmentation and classification tasks using a mask for each image. There is a wide range of more complex deep learning projects to explore if you want to learn more.

There are lots of free resources online to get started with a color detection project of your choice. After looking at the various resources, I found the reference below to be a good choice because it has a YouTube video as well as a detailed explanation of the code. Both the starter code and the video demonstration are provided there.

2. Optical Character Recognition (OCR)

This is another basic project well suited for beginners. Optical character recognition is the conversion of two-dimensional text data (text in an image or scanned document) into machine-encoded text using an electronic or mechanical device. You use computer vision to read the image files. After reading an image, use the pytesseract module in Python to extract the text from the image (a PDF page first has to be rendered as an image) and convert it into a string that can be displayed in Python.
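
The read-then-extract flow described above might look like the following sketch. It assumes the Pillow and pytesseract packages and the Tesseract engine are installed; the helper names and the cleanup step are my own additions for illustration, not part of pytesseract.

```python
def clean_ocr_text(raw_text):
    """Strip surrounding whitespace, form feeds, and blank lines from raw OCR output."""
    lines = (line.strip() for line in raw_text.replace("\x0c", "").splitlines())
    return "\n".join(line for line in lines if line)

def image_to_text(image_path, lang="eng"):
    """Run Tesseract on one image file and return the cleaned text.

    The imports are local so that clean_ocr_text() works even without the
    OCR stack installed.
    """
    from PIL import Image
    import pytesseract

    raw_text = pytesseract.image_to_string(Image.open(image_path), lang=lang)
    return clean_ocr_text(raw_text)

# Usage with a placeholder path (not a file from this article):
#     print(image_to_text("sample.png"))
```

The cleanup helper is there because raw OCR output often carries stray blank lines and a trailing form feed character.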

Installing pytesseract can be slightly complicated, because the Python module is a wrapper around the Tesseract OCR engine, which has to be installed separately, so refer to a good guide for the installation procedure. The resource link provided below should make the overall installation easier and also gives an intuitive understanding of optical character recognition. Once you understand how OCR works and the tools required, you can move on to more complex problems, for example using sequence-to-sequence attention models to translate the text read by OCR from one language into another.

Here are two links that will help you get started with Google text-to-speech and optical character recognition. See the references in the optical character recognition link to learn about OCR in more detail.

Intermediate level computer vision projects:

1. Face Recognition using Deep Learning

Face recognition is the identification of a human face together with the name of the authorized user it belongs to. Face detection is a simpler task and can be considered a beginner level project; it is one of the steps required for face recognition. Face detection distinguishes a human face from the rest of the body and the background. A Haar cascade classifier can be used for face detection and can detect multiple faces in a frame. The Haar cascade classifier for frontal faces is usually distributed as an XML file that can be loaded with OpenCV and then used to detect faces. Alternatively, histogram of oriented gradients (HOG) features can be combined with a support vector machine (SVM) trained on labeled data to perform this task.
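
A minimal sketch of Haar cascade face detection with OpenCV follows. It assumes the opencv-python package, which bundles the frontal-face XML file; the function names and the scaleFactor and minNeighbors values are illustrative starting points rather than tuned settings.

```python
def largest_face(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if there are none."""
    boxes = [tuple(int(v) for v in box) for box in boxes]
    return max(boxes, key=lambda b: b[2] * b[3], default=None)

def detect_faces(image_bgr):
    """Detect frontal faces with OpenCV's bundled Haar cascade.

    Returns a list of (x, y, w, h) boxes. The import is local so that
    largest_face() can be used without OpenCV installed.
    """
    import cv2

    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    cascade = cv2.CascadeClassifier(cascade_path)
    # Haar cascades operate on single-channel images.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are common starting values, not tuned settings.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in face) for face in faces]

# Usage with a placeholder path (not a file from this article):
#     import cv2
#     boxes = detect_faces(cv2.imread("group_photo.jpg"))
#     print(largest_face(boxes))
```

Picking the largest box is one simple way to choose the main subject when several faces are detected, before cropping it and passing it on to a recognition model.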

The best approach for face recognition is to use deep neural networks (DNNs). After faces are detected, we can use deep learning to solve the recognition task. There is a wide variety of pretrained architectures that can be used for transfer learning, such as VGG-16, ResNet-50, and FaceNet, which simplify the construction of a deep learning model and allow users to build high-quality face recognition systems. You can also build a custom deep learning model for the task. Modern face recognition models are highly accurate, with reported accuracies of around 99% or more on labeled datasets. Face recognition models are used in security systems, surveillance, attendance systems, and a lot more.
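
One common way to turn a network's output into a name is to compare embedding vectors, which is how FaceNet-style models are typically used. The sketch below is not the VGG-16 classifier approach mentioned in this article; the embeddings, names, and threshold are made up purely to show the matching step.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(embedding, known_faces, threshold=0.8):
    """Return the best-matching name, or "unknown" if no similarity reaches the threshold."""
    best_name, best_score = "unknown", threshold
    for name, reference in known_faces.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Made-up 4-dimensional embeddings; real models use far more dimensions.
known_faces = {
    "alice": [0.9, 0.1, 0.0, 0.1],
    "bob": [0.1, 0.8, 0.3, 0.0],
}

print(identify([0.85, 0.15, 0.05, 0.1], known_faces))  # close to alice's reference
print(identify([0.0, 0.0, 1.0, 0.0], known_faces))     # far from both references
```

The threshold is what lets the system reject strangers instead of always returning the closest enrolled person.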

Below is an example of a face recognition model I built using VGG-16 transfer learning, with face detection performed by a Haar cascade classifier. Check it out for a more detailed explanation of how you can build your very own face recognition model.

2. Object Detection/Object Tracking

This computer vision project could easily be considered a fairly advanced one, but there are so many free tools and resources available that you can complete it without much difficulty. Object detection is a computer vision technique that allows us to identify and locate objects in an image or video: the detector draws a bounding box around each recognized object, assigns it a label, and reports a confidence score for the prediction. With this kind of identification and localization, object detection can be used to count objects in a scene and determine their precise locations while labeling them. Object tracking is slightly different, as you not only detect a particular object but also follow it from frame to frame with the bounding box around it. Examples include following a particular vehicle along a road or tracking the ball in a sport like golf, cricket, or baseball. Algorithms for these tasks include R-CNNs (region-based convolutional neural networks), SSD (single shot detector), and YOLO (you only look once), among many others.
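
The difference between detecting and tracking can be illustrated with intersection over union (IoU), the standard overlap measure for bounding boxes. The sketch below uses a simple greedy IoU matching step of my own for illustration; it is not the method of any particular tracker or of the resources mentioned here, and the box coordinates and track names are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def match_tracks(previous, detections, min_iou=0.3):
    """Greedy tracking step: give each tracked object the new box that overlaps it most."""
    matches, used = {}, set()
    for track_id, old_box in previous.items():
        candidates = [(iou(old_box, box), i)
                      for i, box in enumerate(detections) if i not in used]
        if candidates:
            score, index = max(candidates)
            if score >= min_iou:
                matches[track_id] = detections[index]
                used.add(index)
    return matches

# Boxes from the previous frame, keyed by a made-up track name.
previous = {"car": (0, 0, 10, 10), "ball": (50, 50, 60, 60)}
# Unlabeled detections in the current frame, in arbitrary order.
detections = [(52, 50, 62, 60), (1, 0, 11, 10)]

print(match_tracks(previous, detections))
```

Each object keeps its identity across frames because its new box overlaps its old one far more than it overlaps any other object's box.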

I am going to mention two of the best resources, by two talented programmers. One method is aimed at embedded systems like the Raspberry Pi, and the other is for real-time webcam object detection on a PC. The two resources below are some of the best ways to get started with object detection and object tracking, and both have YouTube videos explaining them in detail. Please do check them out to gain a better understanding of object detection.

Advanced level computer vision projects:

1. Human Emotion and Gesture Recognition

This project uses computer vision and deep learning to detect faces and classify the emotion shown on each face. The models not only classify emotions but also detect and classify different hand gestures based on the recognized fingers. After the emotion or gesture is recognized, the system gives a vocal response announcing the predicted emotion or gesture. The best part about this project is the wide range of datasets available to choose from.
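
The last step of such a pipeline, turning raw model outputs into a sentence that can be spoken, can be sketched in plain Python. The label list, the logits, and the confidence threshold are hypothetical; a trained model would supply the real scores.

```python
import math

# Hypothetical label set; the real labels depend on the chosen dataset.
EMOTIONS = ["angry", "happy", "neutral", "sad", "surprised"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def describe_prediction(logits, labels=EMOTIONS, min_confidence=0.5):
    """Pick the most likely label and phrase it as a sentence for a speech step."""
    probabilities = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    if probabilities[best] < min_confidence:
        return "I am not sure what you are feeling."
    return f"You look {labels[best]}."

# Made-up logits in which the "happy" class clearly dominates.
print(describe_prediction([0.2, 3.1, 0.5, -1.0, 0.3]))
```

The returned sentence is what would be handed to a text-to-speech step to produce the vocal response.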

The link below refers to one of my own deep learning projects, which uses computer vision, data augmentation, and libraries such as TensorFlow and Keras to build deep learning models. I highly recommend checking the two-part series below for a complete breakdown and analysis of how to approach this advanced computer vision task. Also, make sure to refer to the Google text-to-speech link provided in the previous section to understand how the conversion of text to speech works.

Photo by Anastasia Petrova on Unsplash

Conclusion:

These are 5 awesome computer vision project ideas across various difficulty levels. A brief theory for each concept was provided along with links to some helpful resources. I hope this article helps readers dive into the amazing field of computer vision and explore the various projects it offers. If you are interested in learning everything about machine learning, feel free to check out my tutorial series, which explains every machine learning concept from scratch, at the link provided below. New parts of the series will be added on a weekly basis, or sometimes even faster.

Thank you all for sticking on till the end and I hope you enjoyed the read. Have a wonderful day!

Translated from: https://towardsdatascience.com/5-awesome-computer-vision-project-ideas-with-python-machine-learning-and-deep-learning-721425fa7905