
Text Mining & Optical Character Recognition with Python

Topic modelling, news classification, NER, sentiment analysis, keyword extraction, license plate recognition system
Christ Raharja
1,043 Students enrolled

Welcome to the Text Mining & Optical Character Recognition with Python course. This is a comprehensive, project-based course in which you will learn, step by step, how to apply advanced text mining techniques using natural language processing. You will also build an optical character recognition (OCR) system with Python libraries such as EasyOCR and Tesseract, capable of extracting text from various document types and images. The course combines text mining with computer vision, giving you the chance to practice your programming skills by building complex projects with real-world applications. In the introduction session, you will learn the fundamentals of text mining and OCR: their use cases, how the technologies work, and their technical challenges and limitations. In the next session, we will download text datasets from Kaggle; the data will contain hundreds or even thousands of unstructured text records. Before starting the projects, we will cover basic text mining techniques such as tokenization, stopword removal, stemming, lemmatization, and text normalization. This section is important because it gives you a basic understanding of text mining that the projects build on.
Afterward, we will start the project section. For text mining, we will have eight projects: in the first, we will build a named entity recognition system for news articles; in the second, a topic modeling system for academic research; in the third, news article classification and categorization using TF-IDF; in the fourth, a text summarization system for research papers; in the fifth, a keyword extraction system for a search engine optimization tool; in the sixth, sentiment analysis on product reviews; in the seventh, a plagiarism detection tool; and in the last, a spam email classification system. In the next section, we will learn the basic techniques required for OCR, such as image processing and region of interest identification. For OCR, we will have three projects: a car license plate recognition system, a handwriting recognition system, and a receipt scanner system.
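
As a rough illustration of two of the preprocessing steps mentioned above, here is a minimal sketch of tokenization and stopword removal in plain Python. It is not the course's code: the course uses NLTK for these steps, while this sketch relies only on the standard library. The `STOPWORDS` set and the function names `tokenize` and `remove_stopwords` are assumptions made up for the example.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch;
# libraries such as NLTK ship much fuller lists).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "over"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only the tokens that are not in the stopword set."""
    return [token for token in tokens if token not in stopwords]


tokens = tokenize("The quick brown fox jumps over the lazy dog.")
filtered = remove_stopwords(tokens)
print(filtered)  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

A regular expression is used here only to keep the example dependency-free; a real tokenizer handles contractions, punctuation, and non-English text far more carefully.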

First of all, before getting into the course, we need to ask ourselves this question: why should we learn about text mining and optical character recognition? Here is my answer: text mining and optical character recognition are essential for transforming unstructured text data into valuable insights, enabling businesses and researchers to analyze and interpret vast amounts of information efficiently. These technologies play a crucial role in automating data extraction and analysis, reducing manual effort and increasing accuracy. In fields such as healthcare, finance, and law, text mining and OCR are indispensable for managing large volumes of documents, extracting relevant information, and ensuring compliance with regulatory requirements. Moreover, by mastering these techniques, we equip ourselves with the skills needed to develop advanced data-driven applications, ultimately enhancing our ability to solve complex real-world problems through data science and artificial intelligence.

Below are things that you can expect to learn from this course:

  • Learn the fundamentals of text mining and its use cases

  • Learn the fundamentals of optical character recognition and its use cases

  • Learn how text mining works. This section covers data collection, text preprocessing, feature extraction, text analysis, and modeling

  • Learn how optical character recognition works. This section covers image capture, preprocessing, text localization, character segmentation, character recognition, and output generation

  • Learn how to do tokenization and remove stopwords using NLTK

  • Learn how to perform stemming, lemmatization, and text normalization using NLTK

  • Learn how to build a named entity recognition system using spaCy and Flair

  • Learn how to perform topic modeling using Gensim and LDA

  • Learn how to build a news article classifier using TF-IDF

  • Learn how to build a text summarizer using Transformers and BART

  • Learn how to extract keywords using Rake NLTK and spaCy

  • Learn how to perform sentiment analysis using TextBlob and BERT

  • Learn how to build a plagiarism detection tool using TF-IDF & cosine similarity

  • Learn how to build a spam email detection tool using a support vector machine

  • Learn how to do image processing and identify regions of interest

  • Learn how to build a car license plate recognition system using EasyOCR

  • Learn how to build a handwriting recognition system using EasyOCR

  • Learn how to build a receipt scanner system using Tesseract
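
To give a feel for the plagiarism detection idea listed above (TF-IDF plus cosine similarity), here is a minimal standard-library sketch. It is an illustration under stated assumptions, not the course's implementation: the function names `tf_idf_vectors` and `cosine_similarity`, the smoothed IDF formula, and the three toy documents are all chosen for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency and a smoothed IDF,
    ln((1 + n) / (1 + df)) + 1, so no weight is ever zero or undefined.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (made up): the first two documents are identical,
# the third shares no words with them.
docs = [
    "the cat sat on the mat".split(),
    "the cat sat on the mat".split(),
    "stock prices fell sharply today".split(),
]
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # close to 1.0 (identical text)
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0 (no shared terms)
```

A score near 1 suggests heavily overlapping wording, while a score near 0 suggests unrelated text; where to set the threshold for flagging a document is a judgment call that depends on the corpus.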

Introduction to Text Mining
Introduction to Optical Character Recognition
Finding & Downloading Datasets from Kaggle
Tokenization & Removing Stopwords with NLTK
Stemming, Lemmatization, and Text Normalization with NLTK
Building Named Entity Recognition System with Spacy & Flair
News Articles Classification with TF-IDF
Summarizing Text with Transformers & BART
Extracting Keywords with Rake NLTK & Spacy
Sentiment Analysis with TextBlob & BERT
Building Plagiarism Detection Tool with TF-IDF & Cosine Similarity
Image Processing & Region of Interest Identification
Building Car License Plate Recognition System with EasyOCR
Building Handwriting Recognition System with EasyOCR
How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don't have an internet connection, some instructors also let their students download course lectures. That's up to the instructor though, so make sure you get on their good side!
Course details
Video 4 hours
Certificate of Completion
