## Introduction

*Artificial intelligence (AI)*, *deep learning*, and *neural networks* represent incredibly exciting and powerful machine learning-based techniques used to solve many real-world problems. For a primer on machine learning, you may want to read this five-part series that I wrote.

While human-like deductive reasoning, inference, and decision-making by a computer is still a long time away, there have been remarkable gains in the application of AI techniques and associated algorithms.

The concepts discussed here are extremely technical, complex, and based on mathematics, statistics, probability theory, physics, signal processing, machine learning, computer science, psychology, linguistics, and neuroscience.

That said, this article is not meant to provide such a technical treatment, but rather to explain these concepts at a level that can be understood by most non-practitioners, while also serving as a reference or review for technical folks.

The primary motivation and driving force for these areas of study, and for developing these techniques further, is that the solutions required to solve certain problems are incredibly complicated, not well understood, and not easy to determine manually.

Increasingly, we rely on these techniques and machine learning to solve these problems for us, without requiring explicit programming instructions. This is critical for two reasons. First, we likely wouldn’t be able to write, or at least wouldn’t know how to write, the programs required to model and solve many of the problems that AI techniques are able to solve. Second, even if we did know how to write the programs, they would be inordinately complex and nearly impossible to get right.

Luckily for us, machine learning and AI algorithms, along with properly selected and prepared training data, are able to do this for us.

So with that, let’s get started!

## Artificial Intelligence Overview

In order to define AI, we must first define the concept of *intelligence* in general. A paraphrased definition based on Wikipedia is:

Intelligence can be generally described as the ability to perceive information, and retain it as knowledge to be applied towards adaptive behaviors within an environment or context.

While there are many different definitions of intelligence, they all essentially involve learning, understanding, and the application of the knowledge learned to achieve one or more goals.

It’s therefore a natural extension to say that AI can be described as intelligence exhibited by machines. So what does that mean exactly, when is it useful, and how does it work?

A familiar example of an AI solution is *IBM’s Watson*, which was made famous by beating the two greatest *Jeopardy* champions in history, and is now being used as a *question answering computing system* for commercial applications. *Apple’s Siri* and *Amazon’s Alexa* are similar examples.

In addition to speech recognition and natural language (processing, generation, and understanding) applications, AI is also used for other recognition tasks (pattern, text, audio, image, video, facial, …), autonomous vehicles, medical diagnoses, gaming, search engines, spam filtering, crime fighting, marketing, robotics, remote sensing, computer vision, transportation, music recognition, classification, and so on.

Something worth mentioning is a concept known as the *AI effect*. This describes the case where once an AI application has become somewhat mainstream, it’s no longer considered by many to be AI. It happens because people tend to no longer think of the solution as involving real intelligence, but only as an application of normal computing.

This is despite the fact that these applications still fit the definition of AI regardless of widespread usage. The key takeaway here is that today’s AI is not necessarily tomorrow’s AI, at least not in some people’s minds.

There are many different goals of AI as mentioned, with different techniques used for each. The primary topics of this article are artificial neural networks and an advanced version known as deep learning.

## Biological Neural Networks Overview

The human brain is exceptionally complex and quite literally the most powerful computing machine known.

The inner-workings of the human brain are often modeled around the concept of *neurons* and the networks of neurons known as *biological neural networks*. According to Wikipedia, it’s estimated that the human brain contains roughly 100 billion neurons, which are connected along pathways throughout these networks.

At a very high level, neurons interact and communicate with one another through an interface consisting of *axon terminals* that are connected to *dendrites* across a gap (*synapse*) as shown here.

In plain English, a single neuron will pass a message to another neuron across this interface if the sum of *weighted input signals* from one or more neurons (*summation*) into it is great enough (exceeds a *threshold*) to cause the message transmission. When the threshold is exceeded and the message is passed along to the next neuron, this is called *activation*.

The *summation* process can be mathematically complex. Each neuron’s input signal is actually a *weighted combination* of potentially many input signals, and the weighting of each input means that that input can have a different influence on any subsequent calculations, and ultimately on the final output of the entire network.

In addition, each neuron applies a function or *transformation* to the weighted inputs, which means that the combined weighted input signal is transformed mathematically prior to evaluating if the *activation threshold* has been exceeded. The combination of weighted input signals and the function applied can be either linear or nonlinear.
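The summation, transformation, and threshold steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a model of real neurons: the function names, the example weights, and the threshold values are all hypothetical, and `math.tanh` is used simply as one possible nonlinear transformation.

```python
import math


def weighted_sum(inputs, weights):
    """Summation step: add up the input signals, each scaled by its weight."""
    return sum(x * w for x, w in zip(inputs, weights))


def neuron_activates(inputs, weights, threshold, transform=lambda s: s):
    """Return True when the transformed weighted sum exceeds the threshold.

    By default the transformation leaves the sum unchanged (a linear case).
    """
    combined = weighted_sum(inputs, weights)
    return transform(combined) > threshold


# Three incoming signals; the second one carries the most weight (made-up values).
signals = [1.0, 1.0, 0.0]
weights = [0.2, 0.9, 0.5]

total = weighted_sum(signals, weights)  # roughly 1.1
fires_linear = neuron_activates(signals, weights, threshold=1.0)
fires_nonlinear = neuron_activates(signals, weights, threshold=0.5, transform=math.tanh)
stays_quiet = neuron_activates(signals, weights, threshold=2.0)
```

Changing a single weight changes how much that input contributes to the sum, which is what allows one input to have more influence on the outcome than another.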

These input signals can originate in many ways, with our senses being some of the most important, as well as ingestion of gases (breathing), liquids (drinking), and solids (eating) for example. A single neuron may receive hundreds of thousands of input signals at once that undergo the summation process to determine if the message gets passed along, and ultimately causes the brain to instruct actions, memory recollection, and so on.

The ‘thinking’ or processing that our brain carries out, and the subsequent instructions given to our muscles, organs, and body are the result of these neural networks in action. In addition, the brain’s neural networks continuously change and update themselves in many ways, including modifications to the amount of weighting applied between neurons. This happens as a direct result of learning and experience.

Given this, it’s a natural assumption that for a computing machine to replicate the brain’s functionality and capabilities, including being ‘intelligent’, it must successfully implement a computer-based or artificial version of this network of neurons.

This is the genesis of the advanced statistical technique and term known as *artificial neural networks*.

## Artificial Neural Networks Overview

*Artificial neural networks* (*ANNs*) are statistical models directly inspired by, and partially modeled on, biological neural networks. They are capable of modeling and processing nonlinear relationships between inputs and outputs in parallel. The related algorithms are part of the broader field of machine learning, and can be used in many applications as discussed.

Artificial neural networks are characterized by containing *adaptive weights* along paths between neurons that can be tuned by a *learning algorithm* that *learns* from observed data in order to improve the model. In addition to the learning algorithm itself, one must choose an appropriate *cost function*.

The cost function measures how far the model’s output is from the desired output, and is what’s used to *learn* the optimal solution to the problem being solved. This involves determining the best values for all of the tunable model parameters, with neuron path adaptive weights being the primary target, along with algorithm tuning parameters such as the *learning rate*. This is usually done through *optimization* techniques such as *gradient descent* or *stochastic gradient descent*.

These optimization techniques basically try to bring the ANN solution as close as possible to the optimal solution, which, when successful, means that the ANN is able to solve the intended problem with high performance.
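To make the idea of a cost function and gradient descent more concrete, here is a minimal sketch that learns a single weight. It assumes a deliberately tiny model, `y = w * x`, and uses mean squared error as the cost function. The data, the function names, the learning rate, and the number of steps are all chosen purely for illustration.

```python
def cost(w, xs, ys):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cost_gradient(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=200, w=0.0):
    """Repeatedly nudge w in the direction that lowers the cost."""
    for _ in range(steps):
        w -= learning_rate * cost_gradient(w, xs, ys)
    return w


# Toy training data generated by the rule y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

starting_cost = cost(0.0, xs, ys)
learned_w = gradient_descent(xs, ys)
final_cost = cost(learned_w, xs, ys)
```

The learned weight ends up very close to 2, the value that generated the data, and the cost drops from its starting value to nearly zero. A real neural network applies the same idea to many weights at once.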

Architecturally, an artificial neural network is modeled using layers of *artificial neurons*, or computational units able to receive input and apply an activation function along with a threshold to determine if messages are passed along.

In a simple model, the first layer is the *input* layer, followed by one *hidden* layer, and lastly by an *output* layer. Each layer can contain one or more neurons.
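As a rough sketch of how data flows through such a simple model, the following Python passes two input values through one hidden layer of three neurons and then through a single output neuron. The weights are hand-picked, hypothetical numbers rather than learned ones, and `math.tanh` is assumed as the activation function only to keep the example short.

```python
import math


def layer_output(inputs, layer_weights):
    """Compute the outputs of one layer of neurons.

    Each row of layer_weights holds one neuron's weights, one per input.
    tanh is used as the activation function purely for illustration.
    """
    return [
        math.tanh(sum(x * w for x, w in zip(inputs, neuron_weights)))
        for neuron_weights in layer_weights
    ]


# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
output_weights = [[0.7, -0.5, 0.2]]


def forward(inputs):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = layer_output(inputs, hidden_weights)
    return layer_output(hidden, output_weights)


hidden_values = layer_output([1.0, 2.0], hidden_weights)
prediction = forward([1.0, 2.0])
```

Each layer only sees the outputs of the layer before it, so adding more hidden layers amounts to chaining more calls to `layer_output`.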

Models can become increasingly complex, with increased abstraction and problem-solving capabilities, by increasing the number of *hidden layers*, the number of neurons in any given layer, and/or the number of paths between neurons. Note that the chance of *overfitting* also increases with model complexity.

Model architecture and tuning are therefore major components of ANN techniques, in addition to the actual learning algorithms themselves. All of these characteristics of an ANN can have significant impact on the performance of the model.

Additionally, models are characterized and tunable by the *activation function* used to convert a neuron’s weighted input to its output activation. There are many different types of transformations that can be used as the activation function, and a discussion of them is out of scope for this article.
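Without going into detail, the sketch below shows three commonly used activation functions (sigmoid, hyperbolic tangent, and the rectified linear unit, or ReLU) applied to the same weighted input. The specific choice of functions and the input value are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squash any real number into the range (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Pass positive values through unchanged and clip negatives to zero."""
    return max(0.0, z)


# The same weighted input produces a different output activation
# depending on which activation function the neuron uses.
weighted_input = -0.5
activations = {
    "sigmoid": sigmoid(weighted_input),
    "tanh": tanh(weighted_input),
    "relu": relu(weighted_input),
}
```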

The abstraction of the output as a result of the transformations of input data through neurons and layers is a form of *distributed representation*, as contrasted with *local representation*. The meaning represented by a single artificial neuron, for example, is a form of local representation. The meaning of the entire network, however, is a form of distributed representation due to the many transformations across neurons and layers.

One thing worth noting is that while ANNs are extremely powerful, they can also be very complex and are considered *black box* algorithms, which means that their inner-workings are very difficult to understand and explain. The decision to employ ANNs to solve a problem should therefore be made with that in mind.

## Deep Learning Introduction

*Deep learning*, while sounding flashy, is really just a term to describe certain types of neural networks and related algorithms that often consume very *raw* input data. They process this data through many layers of nonlinear transformations in order to calculate a target output.

Unsupervised *feature extraction* is also an area where deep learning excels. Feature extraction is when an algorithm is able to automatically derive or construct meaningful features of the data to be used for further learning, generalization, and understanding. The burden is traditionally on the data scientist or programmer to carry out the feature extraction process in most other machine learning approaches, along with feature selection and engineering.

Feature extraction usually involves some amount of dimensionality reduction as well, which means reducing the number of input features and the amount of data required to generate meaningful results. This has many benefits, which include simplification, reduced computational and memory requirements, and so on.

More generally, deep learning falls under the group of techniques known as *feature learning* or *representation learning*. As discussed so far, feature extraction is used to ‘learn’ which features to focus on and use in machine learning solutions. The machine learning algorithms themselves ‘learn’ the optimal parameters to create the best performing model.

Paraphrasing Wikipedia, *feature learning* algorithms allow a machine to both learn for a specific task using a well-suited set of features, and also learn the features themselves. In other words, these algorithms *learn how to learn*!

Deep learning has been used successfully in many applications, and is considered to be one of the most cutting-edge machine learning and AI techniques at the time of this writing. The associated algorithms are often used for *supervised*, *unsupervised*, and *semi-supervised* learning problems.

For neural network-based deep learning models, the number of layers is greater than in so-called *shallow learning* algorithms. Shallow algorithms tend to be less complex and require more up-front knowledge of optimal features to use, which typically involves feature selection and engineering.

In contrast, deep learning algorithms rely more on optimal model selection and optimization through model tuning. They are better suited to solving problems where prior knowledge of features is less desired or necessary, and where labeled data is unavailable or not required for the primary use case.

In addition to statistical techniques, neural networks and deep learning leverage concepts and techniques from signal processing as well, including nonlinear processing and/or transformations.

You may recall that a nonlinear function is one that is not characterized simply by a straight line. It therefore requires more than just a slope to model the relationship between the input, or independent variable, and the output, or dependent variable. Nonlinear functions can include polynomial, logarithmic, and exponential terms, as well as any other transformation that isn’t linear.
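One simple way to see the difference is to measure the slope between neighboring points. The sketch below does this for one linear and one nonlinear function; both functions and the sample points are arbitrary choices made for illustration.

```python
def slopes(f, xs):
    """Slope of f between each pair of neighboring x values."""
    return [(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:])]


def linear(x):
    """A straight line with slope 3."""
    return 3.0 * x + 1.0


def quadratic(x):
    """A polynomial (nonlinear) function."""
    return x ** 2


xs = [0.0, 1.0, 2.0, 3.0]
linear_slopes = slopes(linear, xs)        # the same slope on every interval
quadratic_slopes = slopes(quadratic, xs)  # the slope changes from one interval to the next
```

For the linear function a single slope describes every interval, while for the quadratic no single slope is enough to capture the relationship between input and output.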

Many phenomena observed in the physical universe are actually best modeled with nonlinear transformations. This is true as well for transformations between inputs and the target output in machine learning and AI solutions.

## A Deeper Dive into Deep Learning - No Pun Intended

As mentioned, input data is transformed throughout the layers of a deep learning neural network by artificial neurons or processing units. The chain of transformations that occur from input to output is known as the *credit assignment path*, or *CAP*.

The CAP value is a proxy for the measurement or concept of ‘depth’ in a deep learning model architecture. According to Wikipedia, most researchers in the field agree that deep learning has multiple nonlinear layers with a CAP greater than two, and some consider a CAP greater than ten to be *very deep learning*.

While a detailed discussion of the many different deep-learning model architectures and learning algorithms is beyond the scope of this article, some of the more notable ones include:

- Feed-forward neural networks
- Recurrent neural networks
- Multi-layer perceptrons (MLP)
- Convolutional neural networks
- Recursive neural networks
- Deep belief networks
- Convolutional deep belief networks
- Self-organizing maps
- Deep Boltzmann machines
- Stacked de-noising auto-encoders

It’s worth pointing out that due to the relative increase in complexity, deep learning and neural network algorithms can be prone to overfitting. In addition, increased model and algorithmic complexity can result in very significant computational resource and time requirements.

It’s also important to consider that solutions may represent *local minima* as opposed to a global optimal solution. This is due to the complex nature of these models when combined with optimization techniques such as gradient descent.
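The effect can be illustrated with a one-variable sketch. The function below is a made-up stand-in for a cost function with two valleys, and plain gradient descent is run from two different starting points. The function, the learning rate, and the step count are all assumptions chosen for illustration.

```python
def f(x):
    """A toy cost function with two valleys of different depth."""
    return x ** 4 - 3 * x ** 2 + x


def gradient(x):
    """Derivative of f with respect to x."""
    return 4 * x ** 3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=2000):
    """Plain gradient descent: keep stepping downhill from the start point."""
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


# Same algorithm, same function, different starting points.
right_x = descend(2.0)    # settles in the shallower valley (a local minimum)
left_x = descend(-2.0)    # settles in the deeper valley (the global minimum)

found_different_minima = f(left_x) < f(right_x)
```

Both runs stop at a point where the gradient is essentially zero, yet only one of them finds the lowest value of the function. Which one is found depends entirely on where the search begins.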

Given all of this, proper care must be taken when leveraging artificial intelligence algorithms to solve problems, including the selection, implementation, and performance assessment of algorithms themselves. While out of scope for this article, the field of machine learning includes many techniques that can help with these areas.

## Summary

AI is an extremely powerful and exciting field. It’s only going to become more important and ubiquitous moving forward, and will certainly continue to have very significant impacts on modern society.

Artificial neural networks (ANNs) and the more complex deep learning technique are some of the most capable AI tools for solving very complex problems, and will continue to be developed and leveraged in the future.

While a Terminator-like scenario is unlikely any time soon, the progression of artificial intelligence techniques and applications will certainly be very exciting to watch!