Introduction to NLP and Text Mining
Overview
In this course you will be introduced to the essential techniques of natural language processing (NLP) and text mining with Python. The course will discuss how to apply unsupervised and supervised modeling techniques to text, and devote considerable attention to data preparation and data handling methods required to transform unstructured text into a form in which it can be mined.
- Intermediate, Advanced
- 4 Weeks
- Expert Instructor
- Tuition-Back Guarantee
- 100% Online
- TA Support
Learning Outcomes
This course focuses on learning key concepts, tools and methodologies for natural language processing with an emphasis on hands-on learning through guided tutorials and real-world examples. You will learn how to:
- Process text data and strings, and perform pattern matching with regular expressions in Python
- Preprocess and wrangle noisy text data via stemming, lemmatization, tokenization, removal of stop-words and more
- Represent text data in structured and easy-to-consume formats for machine learning and text mining
- Represent text documents using features related to text word frequency, parts of speech and sentiment
- Represent text documents using vectorized features like bag-of-words, TF-IDF, and document similarity
- Use the concepts of information retrieval and document similarity (e.g. in applications like recommender systems)
- Perform unsupervised NLP using techniques like keyphrase extraction, topic modeling and text summarization
- Leverage pre-trained models for part-of-speech (POS) tagging and named entity recognition (NER)
- Develop supervised models to classify documents
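Several of these outcomes (regular-expression tokenization, stop-word removal, TF-IDF vectors and document similarity) fit together in a single small pipeline. The sketch below is an illustration only, not course material. It uses just the Python standard library and makes some assumptions: the tiny STOP_WORDS set is hand-picked, and the helper names preprocess, tfidf_vectors and cosine are hypothetical. It also uses a smoothed IDF, idf = ln((1 + N) / (1 + df)) + 1, which is one common variant among several (scikit-learn's TfidfVectorizer uses the same formula by default).

```python
import math
import re
from collections import Counter

# Hypothetical, hand-picked stop-word list for illustration only; real
# pipelines usually rely on a curated list (e.g. from NLTK or spaCy).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "for"}


def preprocess(text):
    """Lowercase, tokenize on runs of letters with a regex, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    TF is the raw count of a term in the document; IDF is the smoothed
    variant ln((1 + N) / (1 + df)) + 1, where N is the number of documents
    and df is the number of documents containing the term.
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "The cat sat on the mat.",
    "A cat sat on a warm mat.",
    "Stock markets fell sharply in early trading.",
]
vectors = tfidf_vectors(docs)
print(round(cosine(vectors[0], vectors[1]), 3))  # high: shared vocabulary
print(round(cosine(vectors[0], vectors[2]), 3))  # zero: no shared terms
```

The two sentences about a cat on a mat score as similar, while the finance headline shares no terms with them and scores zero. The same document-similarity idea underlies the information retrieval and recommender-system applications mentioned above.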
Who Should Take This Course
Data scientists and aspiring data scientists who want to analyze text data and build models that use text data.
Our Instructors
Mr. Dipanjan Sarkar
He holds a Master of Technology degree from IIIT Bangalore, with specializations in data science and software engineering, and earned a postgraduate diploma in machine learning and artificial intelligence from Columbia University in the City of New York.
Dipanjan has been an analytics practitioner and consultant for several years, specializing in machine learning, natural language processing, computer vision and deep learning. Passionate about data science and education, he also serves as an AI Advisor, Subject Matter Expert and Instructor at organizations such as Springboard, Propulsion Academy and Statistics.com (The Institute for Statistics Education), where he helps people build their skills in data science and artificial intelligence. Dipanjan also beta-tests new data science courses for the popular MOOC platform Coursera before they are released. He is a published author of several books on R, Python, machine learning, natural language processing and deep learning, including Text Analytics with Python (2nd ed.).
Course Syllabus
Week 1
Introduction and Text Data Preparation
- Introduction to NLP & NLP applications
- Python for NLP
- NLP basics – Parsing Text and Exploring Text Corpora
- Tokenization and POS Tags
- Shallow Parsing
- Constituency Parsing
- Corpus Analysis
- WordNet & Synsets
- Working with Text and Regular Expressions
Week 2
Feature Engineering and Representation
- Introduction to text pre-processing and wrangling
- Text pre-processing and wrangling – methodologies
- Build your own text pre-processor
- Non-vectorized text feature engineering
- Vectorized representations of text features
- Keyphrase Extraction – Concepts and Methodologies
Week 3
Unsupervised Natural Language Processing
- Introduction to text pre-processing and wrangling
- Text pre-processing and wrangling – methodologies
- Build your own text pre-processor
- Non-vectorized text feature engineering
- Vectorized representations of text features
- Keyphrase Extraction – Concepts and Methodologies
Week 4
Information Extraction
- Introduction to supervised natural language processing
- Text Classification – concepts and methodologies
- Machine Learning for Text Classification
- Sequential Tagging Models
- Parts of Speech Tagging
- Named Entity Recognition
Class Dates
2024
Instructors: Mr. Dipanjan Sarkar
2025
Instructors: Mr. Dipanjan Sarkar
Prerequisites
Predictive Analytics 1 – Machine Learning Tools
- Skill: Intermediate, Advanced
- Credit Options: ACE, CAP, CEU
The Statistics.com courses have helped me a lot, pushing me to the limit and making me learn much more than I expected I could. The knowledge I gained I could immediately leverage in my job … then eventually led to landing a job in my dream company – Amazon.
Karolis Urbonas
This program has been a life and work game changer for me. Within 2 weeks of taking this class, I was able to produce far more than I ever had before.
Susan Kamp
The material covered in the Analytics for Data Science Certificate will be indispensable in my work. I can’t wait to take other courses. Great work!
Stephen McAllister
I learned more in the past 6 weeks than I did taking a full semester of statistics in college, and 10 weeks of statistics in graduate school. Seriously.
Amir Aminimanizani
This is the best online course I have ever taken. Very well prepared. Covers a lot of real-life problems. Good job, thank you very much!
Elena Rose
The more courses I take at Statistics.com, the more appreciation I have for the smart approach, quality of instructors, assistants, admin and program. Well done!
Leonardo Nagata
This course greatly benefited me because I am interested in working in AI. It has given me solid foundational knowledge…After completing this last course, I feel I have gained valuable skills that will enhance my employability in Data Science, opening up diverse career opportunities.
Richard Jackson
Frequently Asked Questions
- What is your satisfaction guarantee and how does it work?
- Can I transfer or withdraw from a course?
- Who are the instructors at Statistics.com?
Visit our knowledge base and learn more.
Additional Information
Homework
Homework in this course consists of short-answer questions that test concepts and guided data analysis problems using software.
In addition to assigned readings, this course also has a getting-started guide and supplemental readings available online.
Course Text
The text used for the practical work in this course is Text Analytics with Python (Apress, 2019) by Dipanjan Sarkar, chosen for its wealth of hands-on Python illustrations and code. The code for these illustrations is organized here:
https://github.com/dipanjanS/text-analytics-with-python/tree/master/New-Second-Edition
Note: this text is also used in the follow-on course, NLP and Deep Learning.
For a well-written guide to foundational concepts and context, you may wish to consider Fundamentals of Predictive Text Mining (Springer, 2015) by Weiss, Indurkhya and Zhang.
Software
This course provides problems and illustrations in Python, and assumes some familiarity with that language.
Supplemental Information
Literacy, Accessibility, and Dyslexia
At Statistics.com, we aim to provide a learning environment suitable for everyone. To help you get the most out of your learning experience, we have researched and tested several assistance tools. For students with dyslexia, colorblindness, or reading difficulties, we recommend the following web browser add-ons and extensions:
Chrome
- Color Enhancer (for colorblindness)
- HelperBird (for colorblindness, dyslexia, and reading difficulties)
Firefox
- Mobile Dyslexic
- Color Vision Simulation (native accessibility feature)
- Instructions for other native accessibility features
Safari
- Navidys (for colorblindness, dyslexia, and reading difficulties)
- HelperBird for Safari (for colorblindness, dyslexia, and reading difficulties)