Unsupervised learning is a branch of machine learning concerned with analyzing and clustering unlabeled datasets. Unsupervised learning algorithms find patterns or groups in the data with little to no human intervention. In mathematical terms, unsupervised learning involves observing several instances of a vector X and learning the probability distribution p(X) of these instances.
This approach contrasts with supervised learning (which includes classification and regression), where the model learns from a labeled training set. Supervised and unsupervised models therefore differ in their input data. A supervised model uses labeled input and output data. An unsupervised model learns from an unlabeled training set and must discover the structure of the data on its own, for example which points belong together. With an unsupervised model, the goal is to gain insight from large volumes of data. With a supervised model, the goal is to predict an outcome for new data.
If you wish to learn more about supervised machine learning and its various methods, please see the previous articles dedicated to these topics.
Summary of the composition of machine learning
Two types of unsupervised learning problems
We can think of unsupervised learning problems as being divided into two categories: clustering and association rules.
Clustering is an unsupervised learning technique that groups unlabeled data points based on their similarities and differences. Points are grouped into clusters so that points in the same group are as similar as possible to each other, while points in different groups have little to nothing in common. To do so, cluster analysis identifies features or characteristics of the data points and groups them based on the presence or absence of these features. Clustering methods include k-means clustering, hierarchical clustering and probabilistic clustering.
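Here is a minimal k-means sketch in plain Python that makes the idea of grouping by similarity concrete. It is an illustration under simplifying assumptions, not a production implementation. The data points are hypothetical 2-D tuples, and similarity is measured as Euclidean distance. The centroids are initialized with a deterministic farthest-point rule, chosen here so the result is reproducible. Library implementations typically use randomized schemes such as k-means++.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i])) for p in points]

def kmeans(points, k, max_iter=100):
    """Minimal k-means (Lloyd's algorithm) on points given as tuples of floats."""
    # Farthest-point initialisation (an assumption made for determinism): start from
    # the first point, then repeatedly add the point farthest from every chosen centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = assign(points, centroids)  # assignment step
        new_centroids = []
        for i in range(k):  # update step: move each centroid to the mean of its members
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical unlabeled data: two loose groups of points.
data = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (8.0, 8.2), (8.1, 7.9), (7.9, 8.0)]
centroids, labels = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The three points near (1, 1) and the three near (8, 8) end up in different clusters, even though no labels were provided. Note that k-means requires the number of clusters k to be chosen in advance.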
Association rules, on the other hand, are a form of unsupervised learning that finds relationships between variables in a dataset. In other words, these algorithms find items that frequently occur together in a database. They are often used for market basket analysis, which helps companies understand how purchases of different products are related. Such analysis can establish rules of the form: “People who buy product X also tend to buy product Y”. Association algorithms include the Apriori algorithm, the Eclat algorithm and the FP-growth algorithm.
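Rules like “People who buy X also buy Y” are usually scored with two measures. Support is the share of baskets containing both X and Y. Confidence is the share of baskets containing X that also contain Y. The sketch below computes both for every pair of products in a few hypothetical baskets and keeps the rules that clear two illustrative thresholds. It is a brute-force count over item pairs, not the Apriori, Eclat or FP-growth algorithm. Those algorithms exist precisely to avoid enumerating every candidate itemset on large datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets: each set is the products bought in one transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (x, y, support, confidence) for rules x -> y between single products."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both products
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # a rule can hold in one direction only
            confidence = count / item_counts[x]  # P(y in basket | x in basket)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

for x, y, support, confidence in pair_rules(baskets):
    print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```

With these thresholds, "bread -> butter" is kept (confidence 3/4), but "bread -> jam" is dropped (confidence 2/4). This shows why confidence is computed separately for each direction of a rule.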
There are various unsupervised learning algorithms, which we will dive into in the following article. Please take a look at it if you wish to learn more about these methods.
Applications of unsupervised learning
Although unsupervised learning tends to be more difficult to use than supervised learning, and can be less accurate because there are no labels against which to check its output, it still has many applications for which it is preferred over supervised models.
a. Anomaly detection in finance
Unsupervised learning algorithms can analyze large amounts of data and identify unusual points in a dataset. Once these anomalies have been detected, they can be brought to the user’s attention. The user can then decide whether to act on the warning. Anomaly detection is particularly useful in the financial and banking sectors, where fraud has become a daily problem because credit card details are so easily stolen. Unsupervised learning models can identify unauthorized or fraudulent transactions on a bank account, since these usually depart from the user’s normal spending pattern.
For this purpose, an anomaly detection model takes the individual’s credit card history as input. This history includes the type, amount, time and location of each transaction, among other details. Using this information, the model can flag a transaction as potentially fraudulent based on how much it stands out from the rest of the transactions. Upon receiving the warning, the bank can inform the cardholder. The cardholder can then decide to block the card or confirm that the transaction is legitimate.
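The sketch below roughly illustrates how a transaction can be flagged “based on how it stands out”. It scores each numeric feature of a hypothetical transaction history with a robust z-score. This score is the distance from the median, scaled by the median absolute deviation (MAD). Any transaction whose score exceeds a threshold on any feature is flagged. The field names, the data and the 3.5 cut-off (a commonly quoted convention for this score) are assumptions for illustration. Real fraud systems use far richer features and models, and nothing here describes what any particular bank does.

```python
import math
from statistics import median

def robust_z_scores(values):
    """Modified z-scores: 0.6745 * (value - median) / MAD."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        # No spread at all: treat any value that differs from the median as anomalous.
        return [0.0 if v == med else math.inf for v in values]
    return [0.6745 * (v - med) / mad for v in values]

def flag_anomalies(transactions, features, threshold=3.5):
    """Return the ids of transactions that are unusual on at least one feature."""
    scores = {f: robust_z_scores(t[f] for t in transactions) for f in features}
    return [
        t["id"]
        for i, t in enumerate(transactions)
        if any(abs(scores[f][i]) > threshold for f in features)
    ]

# Hypothetical card history: amount in dollars, distance from the holder's home in km.
history = [
    {"id": 1, "amount": 42.0, "distance_km": 3},
    {"id": 2, "amount": 15.5, "distance_km": 1},
    {"id": 3, "amount": 60.0, "distance_km": 5},
    {"id": 4, "amount": 23.0, "distance_km": 2},
    {"id": 5, "amount": 38.0, "distance_km": 4},
    {"id": 6, "amount": 1250.0, "distance_km": 8400},
    {"id": 7, "amount": 29.0, "distance_km": 2},
]
print(flag_anomalies(history, ["amount", "distance_km"]))  # [6]
```

The median and MAD are used instead of the mean and standard deviation. This way, a single extreme transaction cannot inflate the baseline it is being compared against.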
Today, many banks, such as Bank of America, use anomaly detection models like these to ensure the safety of their clients’ information and savings.
Anomaly detection also has other applications, such as flagging human errors or security breaches.
b. Clustering with medical data
The medical domain usually offers large amounts of data with no “labels”. Labeling medical data can be very time-consuming and expensive, so unsupervised models are often more appropriate here than supervised methods. Unsupervised clustering models can therefore be very useful for image detection, segmentation and classification. A clustering algorithm can take a large amount of unlabeled medical data as input and identify clusters or patterns that would have been difficult for doctors to spot.
For example, an unsupervised learning algorithm can be run on a neurological disease dataset to identify factors leading to the disease, or subgroups corresponding to different stages of its progression.
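Subgroups such as disease stages are often explored with hierarchical clustering, one of the methods listed earlier. The following is a minimal agglomerative (bottom-up) sketch with average linkage. Every patient starts in their own cluster, and the two closest clusters are merged repeatedly until the requested number remains. The two features per patient and their values are hypothetical standardized measurements invented for the example. The naive pairwise search is only suitable for small datasets.

```python
import math

def average_linkage(points, a, b):
    """Mean pairwise Euclidean distance between clusters a and b (lists of indices)."""
    return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerative(points, n_clusters):
    """Bottom-up hierarchical clustering, stopped when n_clusters clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average linkage...
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: average_linkage(points, clusters[pair[0]], clusters[pair[1]]),
        )
        # ...and merge the second into the first (i < j, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical patients described by two standardized measurements each.
patients = [
    (0.1, 0.2), (0.2, 0.1), (0.15, 0.25),  # one apparent subgroup
    (2.0, 2.1), (2.2, 1.9),                # a second subgroup
    (4.0, 0.5), (4.1, 0.4), (3.9, 0.6),    # a third subgroup
]
groups = agglomerative(patients, n_clusters=3)
print(groups)
```

This sketch stops early at a fixed number of clusters. Recording the full sequence of merges instead yields the tree (dendrogram) that gives hierarchical clustering its name.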
c. Recommendation engines and personalized ads
Using unsupervised learning, a model can take an individual’s purchase and/or search history as input and use it to find trends and make predictions. Companies can use this type of model to build effective sales strategies and targeted ads tailored to each individual based on their behavioral data.
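One simple, label-free way to turn purchase histories into suggestions is to look at what users with overlapping histories bought. The sketch below uses hypothetical users and products. It scores each product the target user does not own by how many items its buyers share with the target user. It is a toy neighborhood-style recommender for illustration, not how any particular company’s recommendation engine works.

```python
from collections import Counter

# Hypothetical purchase histories: user -> set of products bought.
histories = {
    "alice": {"running shoes", "water bottle", "sports watch"},
    "bob": {"running shoes", "water bottle", "headband"},
    "carol": {"running shoes", "headband"},
    "dave": {"novel", "reading lamp"},
}

def recommend(user, histories, top_n=2):
    """Suggest products bought by users whose histories overlap with `user`'s."""
    owned = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(owned & items)  # number of products both users bought
        if overlap == 0:
            continue  # no shared purchases: this user tells us nothing
        for item in items - owned:
            scores[item] += overlap  # weight suggestions by how similar the buyer is
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("carol", histories))  # ['water bottle', 'sports watch']
```

Bob shares two products with Carol and Alice shares one, so the water bottle (bought by both) ranks above the sports watch.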
Many other applications of unsupervised learning exist, and they all tend to help businesses identify patterns in large amounts of data that human experts could not find effectively. For instance, these algorithms are also used by Google News and Apple News to group articles from different news outlets that cover the same story.
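Grouping articles about the same story can be sketched with a simple similarity measure. In the sketch below, each hypothetical headline is reduced to a set of words. Pairs of headlines whose Jaccard similarity (shared words divided by total distinct words) reaches a threshold are linked, and linked headlines form a group. The headlines, the stop-word list and the 0.3 threshold are assumptions for illustration. This is not a description of how Google News or Apple News actually group articles.

```python
import re

# Hypothetical list of very common words that carry little topical meaning.
STOPWORDS = {"a", "an", "the", "to", "by", "as", "on", "of", "in", "after"}

def tokens(text):
    """Lower-case word set of a headline, without stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def jaccard(a, b):
    """Share of distinct words that two word sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_articles(articles, threshold=0.3):
    word_sets = [tokens(a) for a in articles]
    parent = list(range(len(articles)))  # union-find forest over article indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Link every pair of sufficiently similar headlines.
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            if jaccard(word_sets[i], word_sets[j]) >= threshold:
                parent[find(i)] = find(j)

    # Collect the connected components as groups of article indices.
    groups = {}
    for i in range(len(articles)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

articles = [
    "Central bank raises interest rates to fight inflation",
    "Interest rates raised by central bank as inflation persists",
    "Local team wins championship final",
    "Championship final won by local team after extra time",
    "Central bank decision on interest rates surprises markets",
]
for group in group_articles(articles):
    print([articles[i] for i in group])
```

Because links are transitive, two headlines can land in the same group without being directly similar, as long as a chain of similar headlines connects them.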
What’s next
Hopefully, this article has given you a good understanding of what unsupervised learning is and of some of its interesting applications.
As you may have deduced, supervised and unsupervised learning models each have their own clear range of applications, depending on the outcome the user wants. In the following article, we will dive into the main unsupervised learning algorithms, such as k-means clustering, hierarchical clustering and the Apriori algorithm.