What is Machine Learning?

Machine learning is a branch of artificial intelligence that deals with algorithms that improve through experience. Rather than following hand-written rules, the algorithm "learns" by processing large quantities of data and adjusting its behavior according to the results. Over time, its performance on a task improves without the task being explicitly programmed.

How Does Machine Learning Work? 

To understand machine learning, consider a simple use case. An office mailbox has a steady stream of incoming emails from customers. The office manager would like an automated system that tags each email and routes it to the correct department so that payment queries go to the billing team, product queries go to sales, and so on. 

One way to handle this is to examine each email for keywords, which could be stored in a lookup table. Anything with "pay," "cash," or "balance" in the text could go to billing, while "purchase," "order," or "product" could go to sales. 

This algorithm would work to a certain extent, but the developers would have to include every possible keyword in the lookup table. They would also need to resolve clashes – where would the system route an email that said, "I would like to pay the balance for my recent product order"?
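The lookup-table approach and its clash problem can be sketched in a few lines of Python. This is only an illustration: the table holds the six keywords named above, the function name `route_by_keyword` is hypothetical, and whole-word matching is an assumption made here (so "payment" would not match "pay").

```python
import re

# Hypothetical lookup table built from the keywords mentioned in the text.
KEYWORDS = {
    "pay": "billing", "cash": "billing", "balance": "billing",
    "purchase": "sales", "order": "sales", "product": "sales",
}

def route_by_keyword(email_text):
    """Return the set of departments whose keywords appear in the email."""
    words = re.findall(r"[a-z]+", email_text.lower())
    return {KEYWORDS[w] for w in words if w in KEYWORDS}

email = "I would like to pay the balance for my recent product order"
departments = route_by_keyword(email)
if len(departments) > 1:
    # The clash described above: both billing and sales keywords are present.
    print("Ambiguous email, candidates:", sorted(departments))
```

The example email matches keywords from both departments, so the table alone cannot decide where it belongs.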

With machine learning techniques, the algorithm learns from the data it processes. In the above example, the algorithm might first try to route the email to the sales department. This action would then be flagged as incorrect, so the algorithm learns that this particular phrasing indicates a billing query. Future queries of a similar nature will be routed directly to billing – without any manual intervention.
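One way to picture this feedback loop is a toy router that keeps a score for every (word, department) pair and adjusts those scores whenever a routing decision is flagged as wrong. The sketch below uses a simple perceptron-style update. The class name `FeedbackRouter` and its methods are hypothetical, and a real system would use a far richer model.

```python
import re
from collections import defaultdict

class FeedbackRouter:
    """Toy router that adjusts per-word scores when a decision is corrected."""

    def __init__(self, departments):
        self.departments = list(departments)
        self.weights = defaultdict(float)  # (word, department) -> score

    def _words(self, text):
        return set(re.findall(r"[a-z]+", text.lower()))

    def predict(self, text):
        words = self._words(text)
        scores = {d: sum(self.weights[(w, d)] for w in words)
                  for d in self.departments}
        # Ties resolve to the department listed first.
        return max(self.departments, key=lambda d: scores[d])

    def correct(self, text, right_department):
        """Perceptron-style update, applied when a routing is flagged as wrong."""
        guess = self.predict(text)
        if guess == right_department:
            return
        for w in self._words(text):
            self.weights[(w, right_department)] += 1.0
            self.weights[(w, guess)] -= 1.0

router = FeedbackRouter(["sales", "billing"])
email = "I would like to pay the balance for my recent product order"
first_guess = router.predict(email)   # no evidence yet, so the tie goes to "sales"
router.correct(email, "billing")      # the misrouting is flagged
second_guess = router.predict("Can I pay the balance for my order?")
print(first_guess, "->", second_guess)
```

After a single correction, a similarly worded email is routed to billing without any change to the code.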

What Are the Methods of Machine Learning?

All machine learning methods are based on the same basic principles: create a machine learning algorithm and "train" it by allowing it to process large datasets. There are four main approaches to doing this, depending on the data available and the desired outcome:

Supervised Machine Learning

In supervised learning, the training data is already labeled with the correct outcome. In the example above, each incoming email would be tagged with the correct department.

This data trains the algorithm to seek a particular set of outcomes. The algorithm can build working models that perform classification (assigning a category) or regression (predicting a numeric value) based on historical values.
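As a rough illustration of supervised classification, the sketch below trains a small naive Bayes text classifier on labeled emails. Naive Bayes is one common choice, not the only one, and the four training emails and the function names are made up for this example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(labeled_emails):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_emails:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for w in tokenize(text):
            if w not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled training data.
training = [
    ("please send my invoice, I want to pay the balance", "billing"),
    ("my card payment failed and cash was taken twice", "billing"),
    ("I would like to order a new product", "sales"),
    ("do you have this product available to purchase", "sales"),
]
model = train_naive_bayes(training)
print(classify("I need to pay my balance", model))      # billing
print(classify("I want to purchase a product", model))  # sales
```

The labels in `training` play the role of the department tags described above. The model never sees a routing rule, only examples.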

These models are useful when working with consistent incoming data. For example, credit card transaction data is often relatively uniform. A regression model fitted to historical transactions can reveal unusual outliers, which could potentially indicate fraud.
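The outlier idea can be sketched with an ordinary least-squares line: fit the line to past transactions, then flag any point whose residual is unusually large. The data below is a made-up toy set, and the threshold of two standard deviations is an arbitrary assumption rather than a recommended setting.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(xs, ys, threshold=2.0):
    """Return indexes whose residual exceeds threshold * residual std dev."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals)
            if spread > 0 and abs(r) > threshold * spread]

# Hypothetical toy data: items per transaction and amount charged.
items = [1, 2, 3, 4, 5, 6, 7, 8]
amounts = [10, 20, 30, 40, 500, 60, 70, 80]
print(flag_outliers(items, amounts))  # [4]
```

Only the fifth transaction (index 4) sits far from the fitted line, so it is the one worth a closer look.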

Unsupervised Machine Learning

In unsupervised learning, the algorithm is given no correct outcomes to aim for. Without pre-labeled data to rely on, it must process the data, identify structures, and form its own model.

Usually, this involves techniques such as clustering, anomaly detection, and generative adversarial networks.

This can be useful in tasks such as data exploration, where little is known about the data contents. The machine learning process can flag interesting structures that might be suitable for deeper analytics.
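Clustering is perhaps the easiest of these techniques to show in code. The following is a minimal one-dimensional k-means sketch with a deterministic starting point. The input values (imagined as email lengths in words) are invented for the example, and production code would normally rely on an established library instead.

```python
def k_means_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means: returns (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)]
                 for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        assignments = [min(range(k), key=lambda c: abs(v - centroids[c]))
                       for v in values]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments

# Hypothetical unlabeled data: email lengths in words.
lengths = [12, 15, 11, 14, 210, 190, 205]
centroids, assignments = k_means_1d(lengths, k=2)
print(assignments)  # [0, 0, 0, 0, 1, 1, 1]
```

No labels were supplied, yet the algorithm separates short emails from long ones. Deciding what those two groups mean is left to the analyst.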

Semi-supervised Machine Learning

In semi-supervised learning, the algorithm is provided with both labeled and unlabeled data. The labeled data helps the algorithm to infer the correct outcomes and build functioning models. These models can then be used to process the unlabeled data.

Semi-supervised learning is often used as a compromise where there aren't enough resources to label all available data. The machine learning algorithm can work with what it has to build a functioning set of data rules.
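Self-training is one simple way to combine labeled and unlabeled data: train on the labeled examples, label the unlabeled ones the model is confident about, and repeat. The sketch below assumes a deliberately crude word-count classifier, and all names, example emails, and the `min_margin` confidence rule are hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def fit(labeled):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in labeled:
        counts[label].update(tokens(text))
    return counts

def predict(counts, text):
    """Return (label, margin): the best label and its lead over the runner-up."""
    scores = {label: sum(c[w] for w in tokens(text)) for label, c in counts.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    runner_up = scores[ranked[1]] if len(ranked) > 1 else 0
    return ranked[0], scores[ranked[0]] - runner_up

def self_train(labeled, unlabeled, min_margin=2, rounds=5):
    """Repeatedly pseudo-label the unlabeled texts the model is confident about."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        counts = fit(labeled)
        confident = []
        for text in pool:
            label, margin = predict(counts, text)
            if margin >= min_margin:
                confident.append((text, label))
        if not confident:
            break
        labeled.extend(confident)
        done = {text for text, _ in confident}
        pool = [text for text in pool if text not in done]
    return fit(labeled), pool

labeled = [("pay the balance", "billing"), ("order a product", "sales")]
unlabeled = ["please pay my balance today", "balance overdue today", "new product order"]
model, leftover = self_train(labeled, unlabeled)
print(predict(model, "overdue today")[0])  # billing
```

The word "overdue" never appears in the labeled emails. The model associates it with billing only because of the pseudo-labeled examples picked up along the way.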

Natural Language Processing, or NLP, often uses semi-supervised learning. NLP deals with the processing of either written or spoken language. The machine learning algorithm is provided with a corpus, such as a dictionary and some sentiment analysis data, and it gradually learns to interpret language with increasing nuance, based on experience.

Reinforcement Learning

In reinforcement learning, the algorithm may attempt a number of different solutions to a problem. It then compares the results and learns to favor the best outcomes. 

In the example above, learning can be reinforced if each department refuses to accept misrouted emails. The algorithm might try to send each email to multiple departments and note whether the email is accepted or refused. When it gets things right, the positive feedback reinforces the correct behavior. 
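The accept-or-refuse loop described above resembles a simple bandit problem, which can be sketched with an epsilon-greedy strategy: mostly pick the department with the best record so far, occasionally try another at random. In this sketch the "support" department, the topic names, the +1/-1 rewards, and the parameter values are all assumptions made for illustration.

```python
import random

# "support" is a hypothetical third department added for the example.
DEPARTMENTS = ["sales", "billing", "support"]

def train_router(correct_department, rounds=200, epsilon=0.1, seed=0):
    """Learn which department accepts each topic by trial and feedback."""
    rng = random.Random(seed)
    topics = sorted(correct_department)
    value = {t: {d: 0.0 for d in DEPARTMENTS} for t in topics}
    tries = {t: {d: 0 for d in DEPARTMENTS} for t in topics}
    for step in range(rounds):
        topic = topics[step % len(topics)]
        if rng.random() < epsilon:
            choice = rng.choice(DEPARTMENTS)                 # explore
        else:
            choice = max(DEPARTMENTS, key=value[topic].get)  # exploit
        # Feedback: +1 if the department accepts the email, -1 if it refuses.
        reward = 1.0 if choice == correct_department[topic] else -1.0
        tries[topic][choice] += 1
        # Running average of the rewards seen for this (topic, department).
        value[topic][choice] += (reward - value[topic][choice]) / tries[topic][choice]
    return {t: max(DEPARTMENTS, key=value[t].get) for t in topics}

policy = train_router({"payment": "billing", "catalog": "sales"})
print(policy)
```

A refusal pushes a department's score below zero, so the router moves on to untried departments until one accepts. From then on it favors that department for the topic.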

Reinforcement learning is used in dynamic environments like networking, IoT, and robotics. This kind of algorithm can respond quickly to sudden changes. For example, if a network path fails, it can start routing traffic through a different channel. 

How Is Machine Learning Used in Data Storage? 

Machine learning isn't a standalone application. Instead, machine learning techniques are being integrated into many data tools and platforms. This adds an extra layer of AI to the storage, retrieval, and analysis of data.

Machine learning can play a big part in standard operations, such as:

  • ETL: Extract, Transform, Load processes apply transformations to incoming data before loading it into the destination repository. This incoming data may be unpredictable to some extent, either because of the nature or the quality of the data. Machine learning algorithms can learn to react to variations and ensure a smooth data flow. 
  • Data Integration: Integrating multiple data sources can lead to errors, incompatibilities, and data loss. Machine learning algorithms can be deployed to tackle these issues on the fly, reducing both the need for manual intervention and the overall processing time.
  • Data Exploration: Data exploration is the process of searching through data for patterns and clusters that might not be immediately visible. In a large repository, such as a data lake, this is rarely practical without machine learning tools. 
  • Structuring Data: Unstructured data, such as files, images, audio, and documents, is not held in a table structure. Machine learning can help to add structure to this data by tagging it or creating metadata. Natural language processing is one common application of this type of machine learning. 

Machine learning is not general-purpose artificial intelligence, but it's currently one of the most widely used approaches to intelligent algorithm design.

