Data analytics and data science are popular terms, and skills in these areas are in great demand. But what do these terms mean? Below is an overview and a listing of related courses. Information about our certificate programs in data science and analytics is also available.
Data Prep
It is a truism that most of the work in data mining lies not in specifying, applying and interpreting algorithms, but in extracting, cleaning and preparing data. Learn how to extract data from a relational database using SQL and merge it into a single file in R, so that you can perform statistical operations.
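The extract-and-merge step can be illustrated with a minimal sketch. The text refers to R; this sketch uses Python's built-in sqlite3 module instead. The tables (customers, orders) and their columns are hypothetical, invented only for illustration. The idea is to join two relational tables with SQL and flatten the result into one row per customer, ready for statistical work.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
    """)
    return conn


def extract_flat_rows(conn):
    """Join the tables and aggregate to one row per customer."""
    query = """
        SELECT c.customer_id AS customer_id,
               c.region AS region,
               COUNT(o.order_id) AS n_orders,
               COALESCE(SUM(o.amount), 0.0) AS total_spend
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.region
        ORDER BY c.customer_id
    """
    cur = conn.execute(query)
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


def to_csv_text(header, rows):
    """Serialize the merged rows as CSV text (a single flat file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    header, rows = extract_flat_rows(build_demo_db())
    print(to_csv_text(header, rows))
```

The LEFT JOIN keeps customers who have no orders, so the flat file contains every customer rather than only those who purchased.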
Predictive Modeling and Forecasting
In predictive modeling (also called predictive analytics) we seek to predict the value of a variable of interest (purchase/no purchase, fraudulent/not fraudulent, malignant/benign, amount of spending, etc.) by using “training” data where the value of this variable is known. Once a statistical model is built with the training data (“trained”), it is then applied to data where the value is unknown. Predictive modeling is also termed “supervised learning” and is covered in the following courses:
- Customer Analytics Using R
- Predictive Analytics 1 (Excel section, R section, Python section)
- Predictive Analytics 2 (Excel section, R section, Python section)
- Predictive Analytics 3 (Excel section, R section, Python section)
- Forecasting Analytics
- Applied Predictive Analytics
- Machine Learning Using Weka
- Deep Learning
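The train-then-apply pattern described above can be sketched in a few lines of Python. This is only an illustration, not a method taken from any of the listed courses: it uses a nearest-centroid classifier (chosen for brevity), and the feature values and labels are made up.

```python
from math import dist


def train_centroids(features, labels):
    """'Train' the model: compute the mean feature vector for each known label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    """Apply the model: return the label whose centroid is closest to x."""
    return min(model, key=lambda label: dist(model[label], x))


if __name__ == "__main__":
    # Hypothetical training data: [site visits, minutes on site] with known outcome.
    train_x = [[1, 2], [2, 1], [2, 3], [8, 9], [9, 7], [7, 8]]
    train_y = ["no purchase"] * 3 + ["purchase"] * 3
    model = train_centroids(train_x, train_y)

    # New records where the outcome is unknown.
    for record in ([7.5, 8.0], [1.0, 1.0]):
        print(record, "->", predict(model, record))
```

The outcome labels are used only during training; prediction needs nothing but the feature values of the new record, which is what makes this a supervised method.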
Anomaly Detection
Anomaly detection is a special application of predictive modeling in which the goal is to detect unusual events (e.g. an attack on a network). There may be training examples of these anomalies, but often there are no exemplars, so unsupervised methods must be used to detect outliers.
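When there are no labeled anomalies, one simple unsupervised approach is to flag values that sit far from the bulk of the data. The sketch below uses the modified z-score based on the median and the median absolute deviation (MAD). The constant 0.6745 and the cutoff 3.5 are common conventions rather than anything prescribed by the text, and the traffic figures are invented for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: the score is undefined, so flag nothing.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


if __name__ == "__main__":
    # Hypothetical requests per minute on a network; one burst stands out.
    requests_per_minute = [120, 115, 130, 125, 118, 122, 980, 127]
    print(mad_outliers(requests_per_minute))  # -> [6]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value can inflate the latter enough to hide itself.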
Recommender Systems
The purpose of a recommender system is to identify, statistically, “what goes with what.” These systems lie behind the notices you see on web sites advising you that “customers who bought X also bought Y.” The general statistical terms for the methods used are affinity analysis and association rules; these are unsupervised methods.
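A minimal sketch of the association-rule idea follows. It counts how often pairs of items appear together and reports rules "X -> Y" with their support (share of all baskets containing both items) and confidence (share of baskets with X that also contain Y). The baskets and the two thresholds are assumptions made for illustration; production systems typically use algorithms such as Apriori that scale to larger itemsets.

```python
from collections import Counter
from itertools import permutations


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (x, y, support, confidence) for every pair rule x -> y that passes."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    item_counts = Counter(item for b in baskets for item in b)

    # Count ordered pairs so that x -> y and y -> x are evaluated separately.
    pair_counts = Counter()
    for b in baskets:
        for x, y in permutations(sorted(b), 2):
            pair_counts[(x, y)] += 1

    rules = []
    for (x, y), both in pair_counts.items():
        support = both / n
        confidence = both / item_counts[x]
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


if __name__ == "__main__":
    baskets = [
        ["bread", "butter"],
        ["bread", "butter", "jam"],
        ["bread", "milk"],
        ["butter", "jam"],
        ["bread", "butter", "milk"],
    ]
    for x, y, support, confidence in pair_rules(baskets):
        print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```

No outcome variable is involved: the rules come purely from co-occurrence counts, which is why these methods are classed as unsupervised.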
Segmentation/Clustering
In clustering, we seek to identify groups of customers, records, etc. that are similar to one another. “Clustering” is the general statistical technique; when we apply it to customers it is the statistical component in customer segmentation. Clustering is an “unsupervised” data mining method – there is no known outcome that serves to train a model.
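As an illustration of grouping similar records without a known outcome, here is a small sketch of k-means, one widely used clustering algorithm (the text does not single out a specific method). The initialization from the first k points is a simplification chosen to keep the example deterministic, and the customer figures are invented.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Basic Lloyd's algorithm: alternate assignment and centroid update."""
    # Naive initialization (an assumption for this sketch): the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return assignments, centroids


if __name__ == "__main__":
    # Hypothetical customers: [purchases per year, average basket size].
    customers = [[1, 1], [9, 9], [1.5, 2], [8, 8], [1, 0.5], [9, 10]]
    assignments, centroids = kmeans(customers, k=2)
    print(assignments)  # -> [0, 1, 0, 1, 0, 1]
    print(centroids)
```

The cluster numbers themselves carry no meaning; interpreting a cluster as, say, "frequent big spenders" is a separate step in customer segmentation.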
Text Analytics & Social Network Analysis
The most rapid data growth is not in numerical data, but in text – Twitter feeds, the contents of Facebook pages, emails, etc. – which must be pre-processed to be usable.
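What pre-processing means in practice can be shown with a short sketch: lowercase the text, strip URLs, split it into tokens, drop common stop words, and count the remaining terms. The stop-word list and the sample messages are illustrative assumptions; real projects usually rely on a fuller list and a dedicated text-processing library.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "for", "on", "it"}


def tokenize(text):
    """Lowercase, remove URLs, split into tokens, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    tokens = re.findall(r"[a-z0-9#@']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def term_counts(documents):
    """Turn each document into a bag of words (term -> frequency)."""
    return [Counter(tokenize(d)) for d in documents]


if __name__ == "__main__":
    posts = [
        "Loving the NEW phone! http://example.com #happy",
        "The phone is great and the camera is great",
    ]
    for counts in term_counts(posts):
        print(dict(counts))
```

Once each document is reduced to term counts, the numeric methods described elsewhere on this page (prediction, clustering) can be applied to text.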
Tools to Use in Data Analytics
Visualization
Graphical visualization techniques are important ways to explore data, gain insight, and deal with the complexity of big data.