Archive for Programming

Turbocharging Python with Command Line Tools

By Noah Gift for

Table of Contents

  • Introduction
  • Using The Numba JIT (Just in time Compiler)
  • Using the GPU with CUDA Python
  • Running True Multi-Core Multithreaded Python using Numba
  • KMeans Clustering
  • Summary


Introduction

It’s as good a time to be writing code as ever – these days, a little bit of code goes a long way. Just a single function is capable of performing incredible things. Thanks to GPUs, Machine Learning, the Cloud, and Python, it’s easy to create “turbocharged” command-line tools. Think of it as upgrading your code from using a basic internal combustion engine to a nuclear reactor. The basic recipe for the upgrade? One function, a sprinkle of powerful logic, and, finally, a decorator to route it to the command-line.

Writing and maintaining traditional GUI applications – web or desktop – is a Sisyphean task at best. It all starts with the best of intentions, but can quickly turn into a soul-crushing, time-consuming ordeal where you end up asking yourself why you thought becoming a programmer was a good idea in the first place. Why did you run that web framework setup utility that essentially automated a 1970s technology – the relational database – into a series of Python files? The old Ford Pinto with the exploding rear gas tank has newer technology than your web framework. There has got to be a better way to make a living.

The answer is simple: stop writing web applications and start writing nuclear-powered command-line tools instead. The turbocharged command-line tools that I share below are focused on fast results with minimal lines of code. They can do things like learn from data (machine learning), make your code run 2,000 times faster, and best of all, generate colored terminal output.

Here are the raw ingredients that will be used to make several solutions:

You can follow along with source code, examples, and resources in Kite’s GitHub repository.

Using The Numba JIT (Just in time Compiler)

Python has a reputation for slow performance because it’s fundamentally a scripting language. One way to get around this problem is to use the Numba JIT. Here’s what that code looks like:

First, use a timing decorator to get a grasp on the runtime of your functions:

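from functools import wraps
from time import time

def timing(f):
    """Print how long each call to f takes."""
    @wraps(f)  # keep f's name and docstring on the wrapper
    def wrap(*args, **kwargs):
        ts = time()
        result = f(*args, **kwargs)
        te = time()
        print(f'fun: {f.__name__}, args: [{args}, {kwargs}] took: {te-ts} sec')
        return result
    return wrap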

Next, add a numba.jit decorator with the “nopython” keyword argument set to True. This ensures that the code is run by the JIT instead of regular Python.

import numba

@timing
@numba.jit(nopython=True)
def expmean_jit(rea):
    """Perform multiple mean calculations"""

    val = rea.mean() ** 2
    return val

When you run it, you can see both a “jit” as well as a regular version being run via the command-line tool:

$ python jit-test

Running NO JIT
func:'expmean' args:[(array([[1.0000e+00, 4.2080e+05, 4.2350e+05, ..., 1.0543e+06, 1.0485e+06,
       [2.0000e+00, 5.4240e+05, 5.4670e+05, ..., 1.5158e+06, 1.5199e+06,
       [3.0000e+00, 7.0900e+04, 7.1200e+04, ..., 1.1380e+05, 1.1350e+05,
       [1.5277e+04, 9.8900e+04, 9.8100e+04, ..., 2.1980e+05, 2.2000e+05,
       [1.5280e+04, 8.6700e+04, 8.7500e+04, ..., 1.9070e+05, 1.9230e+05,
       [1.5281e+04, 2.5350e+05, 2.5400e+05, ..., 7.8360e+05, 7.7950e+05,
        7.7420e+05]], dtype=float32),), {}] took: 0.0007 sec

$ python jit-test --jit

Running with JIT
func:'expmean_jit' args:[(array([[1.0000e+00, 4.2080e+05, 4.2350e+05, ..., 1.0543e+06, 1.0485e+06,
       [2.0000e+00, 5.4240e+05, 5.4670e+05, ..., 1.5158e+06, 1.5199e+06,
       [3.0000e+00, 7.0900e+04, 7.1200e+04, ..., 1.1380e+05, 1.1350e+05,
       [1.5277e+04, 9.8900e+04, 9.8100e+04, ..., 2.1980e+05, 2.2000e+05,
       [1.5280e+04, 8.6700e+04, 8.7500e+04, ..., 1.9070e+05, 1.9230e+05,
       [1.5281e+04, 2.5350e+05, 2.5400e+05, ..., 7.8360e+05, 7.7950e+05,
        7.7420e+05]], dtype=float32),), {}] took: 0.2180 sec

How does that work? Just a few lines of code allow for this simple toggle:

@cli.command()  # assumes a click group named cli is defined in the script
@click.option('--jit/--no-jit', default=False)
def jit_test(jit):
    rea = real_estate_array()
    if jit:
        click.echo(click.style('Running with JIT', fg='green'))
        expmean_jit(rea)
    else:
        click.echo(click.style('Running NO JIT', fg='red'))
        expmean(rea)

In some cases a JIT version could make code run thousands of times faster, but benchmarking is key. Another item to point out is the line:

click.echo(click.style('Running with JIT', fg='green'))

This script allows for colored terminal output, which can be very helpful in creating sophisticated tools.
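On the benchmarking point: a JIT compiles a function the first time it is called, so a single timed call, like the 0.2180 sec run above, likely includes compilation time. A minimal sketch of a fairer measurement, using only the standard library, makes a few untimed warm-up calls and then keeps the best of several timed runs. The `benchmark` helper and the pure-Python `square_of_mean` stand-in are illustrative names, not part of the article's tool.

```python
from time import perf_counter


def benchmark(func, *args, warmup=1, repeat=5):
    """Return the best wall-clock time (seconds) of func(*args).

    The warm-up calls are not timed, so one-off costs such as JIT
    compilation on the first call do not distort the result.
    """
    for _ in range(warmup):
        func(*args)
    best = float('inf')
    for _ in range(repeat):
        start = perf_counter()
        func(*args)
        best = min(best, perf_counter() - start)
    return best


def square_of_mean(values):
    """Pure-Python stand-in for expmean: the squared mean of a sequence."""
    return (sum(values) / len(values)) ** 2


data = [float(i) for i in range(10_000)]
best_seconds = benchmark(square_of_mean, data)
print(f'best of 5 runs: {best_seconds:.6f} sec')
```

Passing a jitted function in place of `square_of_mean` should work the same way, since the warm-up call absorbs the compile step.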

Using the GPU with CUDA Python

Another way to nuclear-power your code is to run it straight on a GPU. This example requires a machine with a CUDA-enabled GPU. Here’s what that code looks like:

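import numpy as np
from numba import cuda, vectorize

# add_ufunc is not shown in this excerpt; a definition along these lines is assumed
@vectorize(['float32(float32, float32)'], target='cuda')
def add_ufunc(x, y):
    return x + y

def cuda_operation():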
    """Performs Vectorized Operations on GPU"""

    x = real_estate_array()
    y = real_estate_array()

    print('Moving calculations to GPU memory')
    x_device = cuda.to_device(x)
    y_device = cuda.to_device(y)
    out_device = cuda.device_array(
        shape=(x_device.shape[0],x_device.shape[1]), dtype=np.float32)

    print('Calculating on GPU')
    add_ufunc(x_device,y_device, out=out_device)

    out_host = out_device.copy_to_host()
    print(f'Calculations from GPU {out_host}')

Note that the NumPy array is first moved to the GPU, and then a vectorized function does the work on the GPU. After that work is completed, the data is moved back from the GPU. Depending on what the code is running, using a GPU can bring a monumental improvement. The output from the command-line tool is shown below:

$ python cuda-operation
Moving calculations to GPU memory

(10015, 259)
Calculating on GPU
Calculations from GPU [[2.0000e+00 8.4160e+05 8.4700e+05 ... 2.1086e+06 2.0970e+06 2.0888e+06]
 [4.0000e+00 1.0848e+06 1.0934e+06 ... 3.0316e+06 3.0398e+06 3.0506e+06]
 [6.0000e+00 1.4180e+05 1.4240e+05 ... 2.2760e+05 2.2700e+05 2.2660e+05]
 [3.0554e+04 1.9780e+05 1.9620e+05 ... 4.3960e+05 4.4000e+05 4.4080e+05]
 [3.0560e+04 1.7340e+05 1.7500e+05 ... 3.8140e+05 3.8460e+05 3.8720e+05]
 [3.0562e+04 5.0700e+05 5.0800e+05 ... 1.5672e+06 1.5590e+06 1.5484e+06]]

Running True Multi-Core Multithreaded Python using Numba

One common performance problem with Python is the lack of true, multi-threaded performance. This also can be fixed with Numba. Here’s an example of some basic operations:

@numba.jit(parallel=True)
def add_sum_threaded(rea):
    """Use all the cores"""

    x,_ = rea.shape
    total = 0
    for _ in numba.prange(x):
        total += rea.sum()
    return total

def add_sum(rea):
    """traditional for loop"""

    x,_ = rea.shape
    total = 0
    for _ in range(x):
        total += rea.sum()
    return total

@cli.command()  # assumes a click group named cli is defined in the script
@click.option('--threads/--no-jit', default=False)
def thread_test(threads):
    rea = real_estate_array()
    if threads:
        click.echo(click.style('Running with multicore threads', fg='green'))
        add_sum_threaded(rea)
    else:
        click.echo(click.style('Running NO THREADS', fg='red'))
        add_sum(rea)

The key difference in the parallel version is that it uses @numba.jit(parallel=True) and numba.prange to spawn threads for iteration. Looking at the picture below, all of the CPUs are maxed out on the machine, but when almost the exact same code is run without the parallelization, it uses only one core.

$ python thread-test

$ python thread-test --threads

KMeans Clustering

One more powerful thing that can be accomplished in a command-line tool is machine learning. In the example below, a KMeans clustering function is created with just a few lines of code. This clusters a pandas DataFrame into a default of 3 clusters.

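import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

def kmeans_cluster_housing(clusters=3):
    """Kmeans cluster a dataframe"""
    url = ''  # the CSV URL is missing from this copy of the article
    val_housing_win_df = pd.read_csv(url)
    # the column list is cut off in this copy; only the surviving names are shown
    numerical_df = (
        val_housing_win_df.loc[:, ['TOTAL_ATTENDANCE_MILLIONS', 'ELO']]
    )
    #scale data
    scaler = MinMaxScaler()
    scaled = scaler.fit_transform(numerical_df)
    #cluster data
    k_means = KMeans(n_clusters=clusters)
    kmeans = k_means.fit(scaled)
    val_housing_win_df['cluster'] = kmeans.labels_
    return val_housing_win_df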
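To make the two steps in that function (scale, then cluster) concrete, here is a rough standard-library sketch of what min-max scaling and k-means do. It swaps scikit-learn for a plain implementation of Lloyd's algorithm, starts the centroids at the first k rows for repeatability, and runs on made-up rows; none of the names or numbers come from the article's dataset.

```python
def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        [(value - low) / span for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]


def kmeans(rows, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at the first k rows."""
    centroids = [list(row) for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: attach each row to its nearest centroid.
        for i, row in enumerate(rows):
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, label in zip(rows, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical (attendance, value) pairs forming two obvious groups
rows = [(1.0, 200.0), (1.2, 220.0), (0.9, 210.0),
        (5.0, 900.0), (5.2, 950.0), (4.8, 920.0)]
labels = kmeans(min_max_scale(rows), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Scaling first matters because the second column is hundreds of times larger than the first and would otherwise dominate the distance calculation.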

The cluster number can be changed by passing in another number (as shown below) using click:

@click.option('--num', default=3, help='number of clusters')
def cluster(num):
    df = kmeans_cluster_housing(clusters=num)
    click.echo('Clustered DataFrame')

Finally, the output of the Pandas DataFrame with the cluster assignment is shown below. Note that the cluster assignment is now included as a column.

$ python -W cluster

Clustered DataFrame
                                   0              1                 2                 3           4
TEAM                               Chicago Bulls  Dallas Mavericks  Sacramento Kings  Miami Heat  Toronto Raptors
GMS                                41             41                41                41          41
PCT_ATTENDANCE                     104            103               101               100         100
COUNTY                             Cook           Dallas            Sacremento        Miami-Dade  York-County
MEDIAN_HOME_PRICE_COUNTY_MILLIONS  269900.0       314990.0          343950.0          389000.0    390000.0
COUNTY_POPULATION_MILLIONS         5.20           2.57              1.51              2.71        1.10
cluster                            0              0                 1                 0           0

$ python -W cluster --num 2

Clustered DataFrame
                                   0              1                 2                 3           4
TEAM                               Chicago Bulls  Dallas Mavericks  Sacramento Kings  Miami Heat  Toronto Raptors
GMS                                41             41                41                41          41
PCT_ATTENDANCE                     104            103               101               100         100
COUNTY                             Cook           Dallas            Sacremento        Miami-Dade  York-County
MEDIAN_HOME_PRICE_COUNTY_MILLIONS  269900.0       314990.0          343950.0          389000.0    390000.0
COUNTY_POPULATION_MILLIONS         5.20           2.57              1.51              2.71        1.10
cluster                            1              1                 0                 1           1


Summary

The goal of this article is to show how simple command-line tools can be a great alternative to heavy web frameworks. In under 200 lines of code, you’re now able to create a command-line tool that involves GPU parallelization, JIT, core saturation, as well as Machine Learning. The examples I shared above are just the beginning of upgrading your developer productivity to nuclear power, and I hope you’ll use these programming tools to help build the future.

Many of the most powerful things happening in the software industry are based on functions: distributed computing, machine learning, cloud computing (functions as a service), and GPU-based programming are all great examples. The natural way of controlling these functions is a decorator-based command-line tool – not clunky 20th-century web frameworks. The Ford Pinto is now parked in a garage, and you’re driving a shiny new “turbocharged” command-line interface that maps powerful yet simple functions to logic using the Click framework.

Noah Gift is a lecturer and consultant at both the UC Davis Graduate School of Management MSBA program and the Graduate Data Science program, MSDS, at Northwestern. He teaches and designs graduate machine learning, AI, and data science courses, and consults on machine learning and cloud architecture for students and faculty.

Noah’s new book, Pragmatic AI, will help you solve real-world problems with contemporary machine learning, artificial intelligence, and cloud computing tools. Noah Gift demystifies all the concepts and tools you need to get results—even if you don’t have a strong background in math or data science. Save 30% with the code, “KITE”.

About the Author:

This article originally appeared on

(Reprinted with permission)


Thoughts on Security

By Adam Smith for

Last week we launched Kite, a copilot for programmers. We’ve been excited about the Kite vision since 2014—we’re blown away by how many of you are excited about it too!

The response far exceeded our expectations. We had over a thousand upvotes on Hacker News; we were in the all-time top 1% of launches on Product Hunt; and we had over two thousand tweets about Kite, not counting retweets. We couldn’t be more grateful to those who believed in the vision and took the time to share Kite with a friend or join the discussion online.

That said, we have a lot of work to do. Kite is the first product of its kind, which means we’re pioneering some new terrain. We signed up for this, and are committed to getting it right.

Why Cloud? Garmin versus Waze.

The first question is: why keep the copilot logic in the cloud, instead of locally as part of the Kite install? The short answer is we can build a better experience if Kite is a cloud service.

The full answer is a long list of things that are better about cloud services. Editors today are Garmin GPS, and Kite is Waze. Some folks still use Garmin GPS due to privacy concerns, but most of the world uses internet-connected navigation for its many advantages: fresher maps, more coverage, better tuned navigation algorithms, better user experience because iteration is 10x cheaper, etc.

The same patterns apply to Kite. I’d like to give three quick examples, and then talk about the larger strategy.

  1. Data by the Terabyte. Kite uses lots of data to power the copilot experience. We index public documentation, maintain maps of the Python world (e.g. scipy.array is an alias for numpy.array), and surface patterns extracted from all of GitHub. We keep all of this in RAM, so you don’t have to. We run servers with 32 GB of RAM; while some of you may have that kind of rig (we’re jealous!), the typical MacBook Pro doesn’t. This data set will grow as we add support for more programming languages and more functionality. With a cloud-based architecture you don’t need to preselect which languages you’ll use, or sacrifice gigabytes of memory on your machine.
  2. Machine Learning. Kite is powered by a number of statistical models, and we’re adding more over time. For example, Kite’s search and “Did you mean” features both use machine learning. Of course we could ship these to your local client, but our models will get smarter over time if we know which result you clicked on (like Google Search) and whether you accepted a suggested change to your code (like Google Spellcheck).
  3. Rapid ship cycles. We ship multiple times per week. This means our bugs get fixed faster, data is fresher, and you get the newest features as soon as possible.

The cloud and its resulting feedback loops lead to better products, faster. We’ve seen the same evolution across a number of verticals. A few examples:

  • Outlook → Gmail
  • Colocation → AWS
  • Network File Share → Dropbox
  • MS Office → Google Docs

In each of these cases, security had to be addressed. At first it wasn’t clear the world would make the jump. It didn’t happen all at once, and there are still people using the legacy technologies. This evolution takes time, and overall is very healthy.

So what does Kite need to do as a company excited about the possibilities of cloud-connected programming?

Security: Core Principles

Let’s talk about the security concerns that naturally arise from a cloud-powered programming copilot. As software developers, we have naturally had security on our minds since the beginning. Frankly, many of us here at Kite would have left similar comments on the HN thread :). Many of you are rightfully concerned about security as well, so let’s jump in.

Our approach to security begins with a few core principles:

  1. Security is a journey, not a destination. We will never be done giving you the tools you need to control your data. We will also never be done earning your trust.
  2. Control. You should control what data gets sent to Kite’s backend and whether you want us to store it for your later use. We should offer as much control as we can.
  3. Transparency. You should understand what is happening. We need to communicate this repeatedly, and clearly.
  4. We’re building the future together with you. We don’t presume to have all of the answers. We want to work with all of you to find the best solutions.

We are committed to these principles. We want you and your employer to be excited about using Kite, and we think these principles are a good first step.

Let’s look at some examples of how we’ll put these principles into action.

You should be able to control

  • Which directories and files, if any, are indexed by Kite,
  • If Kite should remember your history of code changes,
  • If Kite should help with terminal commands,
  • If Kite should remember terminal commands you’ve previously written,
  • If Kite should remember the output of past terminal commands,
  • …and you should be able to easily turn these switches on and off.
  • If you change a setting, we should ask if you’d like to delete historical data, as applicable.

You should always be able to see

  • What files Kite has indexed (and permanently remove them as needed),
  • What terminal commands, or file edits, are being remembered by Kite (and permanently remove them as needed),
  • …and Kite should check in periodically to verify that your security settings match your preferences.

These are the first levels of control and transparency, which are based on files, directories, and the type of information (terminal versus editor).

Secrets, like passwords or keys, are a category of content that deserve special attention. We don’t want secrets on our servers, and we will be developing multiple mechanisms (automated and manual) to make sure they stay off our servers. We don’t have specifics to announce yet, but we believe we will set industry standards that will be adopted across multiple categories of tools such as continuous integration and code review systems.

We know a lot of folks are also interested in on-premises deployment. We understand the use case and want to support it. We worry that it would delay a lot of seriously awesome stuff we have on the roadmap, e.g. support for JavaScript, so we are thinking through how to fit it in. It is something we want to facilitate, particularly in the long run.

An Example

Since last week’s launch we have begun adding some of these principles into the product. I’d like to show you one feature we shipped yesterday. It’s called the lockout screen.

Kite’s Security panel asks users to whitelist the directories that Kite should be turned on for. Code living outside of this whitelist never gets read by Kite. So what should the sidebar show if you open a Python file outside of the whitelist? As of yesterday’s addition, you’ll see something like this:

This interaction embodies the principles of transparency and control. It communicates what is happening, why, and gives you a one-click control mechanism to change what’s happening, if you so choose.
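As a rough illustration of the whitelist idea (not Kite's actual implementation, which is not described here), a directory check of this kind can be sketched with the standard library. The function name and the paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_whitelisted(file_path, whitelist):
    """Return True if file_path is inside (or equal to) a whitelisted directory."""
    path = PurePosixPath(file_path)
    for directory in whitelist:
        allowed = PurePosixPath(directory)
        # Compare whole path components, so /app does not match /app2.
        if path == allowed or allowed in path.parents:
            return True
    return False


whitelist = ['/home/dev/projects/app']  # hypothetical user setting
print(is_whitelisted('/home/dev/projects/app/main.py', whitelist))  # True
print(is_whitelisted('/home/dev/notes/secret.py', whitelist))       # False
```

This sketch compares paths purely as text; a real tool would also need to resolve symlinks and `..` segments before deciding.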

The Future Ahead

We are committed to incorporating the principles of control and transparency into the foundations of Kite. We will write more about security on our blog as we design and implement these features.

That said, we realize that everyone has different needs. We can’t promise that the options and functionality we choose on day 1 will be perfect for everyone, but we’re working day and night to expand the circle as widely as possible. We’ll do this tirelessly over the long term.

We’d love to hear your thoughts along the way. It’s only been a week, but all of you have been incredibly helpful as we learn how to get this right. As always, we encourage you to talk with us on Twitter at @kitehq.

Nothing makes us happier than knowing so many of you are equally excited about the Kite vision. The future of programming is awesome. Let’s build it together!

P.S. We are hiring! We are looking for frontend web devs, generalist systems engineers, programming language devs, and mac/windows/linux developers. You can reach us at [email protected].

This article originally appeared on

(Reprinted with permission)


10 Essential Data Science Packages for Python

By TJ Simmons for

Interest in data science has risen remarkably in the last five years. And while there are many programming languages suited for data science and machine learning, Python is the most popular.

Since it’s the language of choice for machine learning, here’s a Python-centric roundup of ten essential data science packages, including the most popular machine learning packages.


Scikit-Learn is a Python module for machine learning built on top of SciPy and NumPy. David Cournapeau started it as a Google Summer of Code project. Since then, it’s grown to over 20,000 commits and more than 90 releases. Companies such as J.P. Morgan and Spotify use it in their data science work.

Because Scikit-Learn has such a gentle learning curve, even the people on the business side of an organization can use it. For example, a range of tutorials on the Scikit-Learn website show you how to analyze real-world data sets. If you’re a beginner and want to pick up a machine learning library, Scikit-Learn is the one to start with.

Here’s what it requires:

  • Python 3.5 or higher.
  • NumPy 1.11.0 or higher.
  • SciPy 0.17.0 or higher.
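A quick way to check minimums like these on your own machine is sketched below, using only the standard library (`importlib.metadata` needs Python 3.8 or newer). The helper names are made up for this example, and the version parsing is deliberately simple: it reads only the leading numeric part of a version string.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn '1.11.0' into (1, 11, 0); suffixes such as 'rc1' are ignored."""
    match = re.match(r'\d+(?:\.\d+)*', text)
    parts = [int(p) for p in match.group().split('.')] if match else []
    # Pad short versions so that '1.11' compares equal to '1.11.0'.
    return tuple(parts + [0] * (3 - len(parts)))


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets(package, minimum):
    """True if the package is installed at `minimum` or newer."""
    found = installed_version(package)
    return found is not None and parse_version(found) >= parse_version(minimum)


print('Python >= 3.5:', sys.version_info >= (3, 5))
for name, minimum in [('numpy', '1.11.0'), ('scipy', '0.17.0')]:
    print(f'{name} >= {minimum}:', meets(name, minimum))
```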


PyTorch does two things very well. First, it accelerates tensor computation using strong GPU acceleration. Second, it builds dynamic neural networks on a tape-based autograd system, thus allowing reuse and greater performance. If you’re an academic or an engineer who wants an easy-to-learn package to perform these two things, PyTorch is for you.

PyTorch is excellent in specific cases. For instance, do you want to compute tensors faster by using a GPU, as I mentioned above? Use PyTorch because you can’t do that with NumPy. Want to use an RNN for language processing? Use PyTorch because of its define-by-run feature. Or do you want to use deep learning but you’re just a beginner? Use PyTorch because Scikit-Learn doesn’t cater to deep learning.

Requirements for PyTorch depend on your operating system. The installation is slightly more complicated than, say, Scikit-Learn. I recommend using the “Get Started” page for guidance. It usually requires the following:

  • Python 3.6 or higher.
  • Conda 4.6.0 or higher.


Caffe is one of the fastest implementations of a convolutional network, making it ideal for image recognition. It’s best for processing images.

Yangqing Jia started Caffe while working on his PhD at UC Berkeley. It’s released under the BSD 2-Clause license, and it’s touted as one of the fastest-performing deep-learning frameworks out there. According to the website, Caffe’s image processing is quite astounding. They claim it can process “over 60M images per day with a single NVIDIA K40 GPU.”

I should highlight that Caffe assumes you have at least a mid-level knowledge of machine learning, although the learning curve is still relatively gentle.

As with PyTorch, requirements depend on your operating system. Check the installation guide here. I recommend using the Docker version if you can so it works right out of the box. The compulsory dependencies are below:

  • CUDA for GPU mode.
    • Library version 7 or higher and the latest driver version are recommended, but releases in the 6s are fine too.
    • Versions 5.5 and 5.0 are compatible but considered legacy.
  • BLAS via ATLAS, MKL, or OpenBLAS.
  • Boost 1.55 or higher.


TensorFlow is one of the most famous machine learning libraries for some very good reasons. It specializes in numerical computation using dataflow graphs.

Originally developed by Google Brain, TensorFlow is open sourced. It uses dataflow graphs and differentiable programming across a range of tasks, making it one of the most highly flexible and powerful machine learning libraries ever created.

If you need to process large data sets quickly, this is a library you shouldn’t ignore.

The most recent stable version is v1.13.1, but the new v2.0 is in beta now.


Theano is one of the earliest open-source software libraries for deep-learning development. It’s best for high-speed computation.

While Theano announced that it would stop major developments after the release of v1.0 in 2017, you can still study it for historical reasons. It’s made this list of top ten data science packages for Python because if you familiarize yourself with it, you’ll get a sense of how its innovations later evolved into the features you now see in competing libraries.


Pandas is a powerful and flexible data analysis library written in Python. While not strictly a machine learning library, it’s well-suited for data analysis and manipulation for large data sets. In particular, I enjoy using it for its data structures, such as the DataFrame, the time series manipulation and analysis, and the numerical data tables. Many business-side employees of large organizations and startups can easily pick up Pandas to perform analysis. Plus, it’s fairly easy to learn, and it rivals competing libraries in terms of its features in data analysis.

If you want to use Pandas, here’s what you’ll need:


Keras is built for fast experimentation. It’s capable of running on top of other frameworks like TensorFlow, too. Keras is best for easy and fast prototyping as a deep learning library.

Keras is popular amongst deep learning library aficionados for its easy-to-use API. Jeff Hale created a compilation that ranked the major deep learning frameworks, and Keras compares very well.

The only requirement for Keras is one of three possible backend engines: TensorFlow, Theano, or CNTK.


NumPy is the fundamental package needed for scientific computing with Python. It’s an excellent choice for researchers who want an easy-to-use Python library for scientific computing. In fact, NumPy was designed for this purpose; it makes array computing a lot easier.

Originally, the code for NumPy was part of SciPy. However, scientists who needed only the array object in their work had to install the large SciPy package. To avoid that, a new package was separated from SciPy and called NumPy.

If you want to use NumPy, you’ll need Python 2.6.x, 2.7.x, 3.2.x, or newer.


Matplotlib is a Python 2D plotting library that makes it easy to produce cross-platform charts and figures.

So far in this roundup, we’ve covered plenty of machine learning, deep learning, and even fast computational frameworks. But with data science, you also need to draw graphs and charts. When you talk about data science and Python, Matplotlib is what comes to mind for plotting and data visualization. It’s ideal for publication-quality charts and figures across platforms.

For long-term support, the current stable version is v2.2.4, but you can get v3.0.3 for the latest features. It does require that you have Python 3 or newer, since support for Python 2 is being dropped.


SciPy is a gigantic library of data science packages mainly focused on mathematics, science, and engineering. If you’re a data scientist or engineer who wants the whole kitchen sink when it comes to running technical and scientific computing, you’ve found your match with SciPy.

Since it builds on top of NumPy, SciPy has the same target audience. It has a wide collection of subpackages, each focused on a niche such as Fourier transforms, signal processing, optimization algorithms, spatial algorithms, and nearest neighbor. Essentially, this is the companion Python library for your typical data scientist.

As far as requirements go, you’ll need NumPy if you want SciPy. But that’s it.


This brings to an end my roundup of the 10 major data-science-related Python libraries. Is there something else you’d like us to cover that also uses Python extensively? Let us know!

And don’t forget that Kite can help you learn these packages faster with its ML-powered autocomplete as well as handy in-editor docs lookups. Check it out for free as an IDE plugin for any of the leading IDEs.

This article originally appeared on

(Reprinted with permission)


Introduction to Artificial Neural Networks in Python

By Padmaja Bhagwat,

The Python implementation presented may be found in the Kite repository on GitHub.

Biology inspires the Artificial Neural Network

The Artificial Neural Network (ANN) is an attempt at modeling the information processing capabilities of the biological nervous system. The human body is made up of trillions of cells, and the nervous system cells – called neurons – are specialized to carry “messages” through an electrochemical process. The nodes in an ANN are the counterparts of biological neurons. They are connected to each other by synaptic weights (or simply weights), the counterparts of the synaptic connections between the axons and dendrites of biological neurons.

Let’s think of a scenario where you’re teaching a toddler how to identify different kinds of animals. You know that they can’t simply identify any animal using basic characteristics like a color range and a pattern: just because an animal is within a range of colors and has black vertical stripes and a slightly elliptical shape doesn’t automatically make it a tiger.

Instead, you should show them many different pictures, and then teach the toddler to identify those features in the picture on their own, hopefully without much of a conscious effort. This specific ability of the human brain to identify features and memorize associations is what inspired the emergence of ANNs.

What is an Artificial Neural Network?

In simple terms, an artificial neural network is a set of connected input and output units in which each connection has an associated weight. During the learning phase, the network learns by adjusting the weights in order to be able to predict the correct class label of the input tuples. Neural network learning is also referred to as connectionist learning, referencing the connections between the nodes. In order to fully understand how artificial neural networks work, let’s first look at some early design approaches.

What can an Artificial Neural Network do?

Today, instead of designing standardized solutions to general problems, we focus on providing a personalized, customized solution to specific situations. For instance, when you log in to any e-commerce website, it’ll provide you with personalized product recommendations based on your previous purchases, items on your wishlist, most frequently clicked items, and so on.

The platform is essentially analyzing the user’s behavior pattern and then recommending the solution accordingly; solutions like these can be effectively designed using Artificial Neural Networks.

ANNs have been successfully applied in a wide range of domains such as:

  • Classification of data – Is this flower a rose or tulip?
  • Anomaly detection – Is this particular user activity on the website potentially fraudulent?
  • Speech recognition – Hey Siri! Can you tell me a joke?
  • Audio generation – Jukedeck, can you compose an uplifting folk song?
  • Time series analysis – Is it a good time to start investing in the stock market?

And the list goes on…

Early model of ANN

The McCulloch-Pitts model of Neuron (1943 model)

This model is made up of a basic unit called a neuron. The main feature of the McCulloch-Pitts neuron model is that a weighted sum of input signals is compared against a threshold to determine the neuron output. When the sum is greater than or equal to the threshold, the output is 1. When the sum is less than the threshold, the output is 0. It can be put into equations as follows:

McCulloch-Pitts Model of Neuron
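The equations themselves are not reproduced in this copy. Going by the description above, they can be reconstructed as follows, with eq. 1 as the weighted sum and eq. 2 as the threshold comparison. The symbol names are assumptions: x_i are the input signals, w_i their weights, n the number of inputs, T the threshold, and y the output.

```latex
\begin{align}
  s &= \sum_{i=1}^{n} w_i x_i \tag{1} \\
  y &= f(s) =
    \begin{cases}
      1 & \text{if } s \ge T \\
      0 & \text{if } s < T
    \end{cases} \tag{2}
\end{align}
```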

This function f, which is also referred to as an activation function or transfer function, is depicted in the figure below, where T stands for the threshold.

The figure below depicts the overall McCulloch-Pitts Model of Neuron.

Let’s start by designing the simplest Artificial Neural Network that can mimic the basic logic gates. On the left side, you can see the mathematical implementation of a basic logic gate, and on the right-side, the same logic is implemented by allocating appropriate weights to the neural network.

If you give the first set of inputs to the network, i.e. (0, 0), each input is multiplied by its weight and the products are summed as follows: (0*1) + (0*1) = 0 (refer to eq. 1). Here, the sum, 0, is less than the threshold, 0.5, hence the output will be 0 (refer to eq. 2).

Whereas, for the second set of inputs (1,0), the sum (1*1) + (0*1) = 1 is greater than the threshold, 0.5, hence the output will be 1.

Similarly, you can try different combinations of weights and thresholds to design neural networks depicting an AND gate and a NOT gate, as shown below.

This way, the McCulloch-Pitts model demonstrates that networks of these neurons could, in principle, compute any arithmetic or logical function.
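The worked example above translates into a few lines of Python. In this sketch the weights (1, 1) and threshold 0.5 come from the text and behave like an OR gate; the AND and NOT values are one possible choice picked for illustration, since the original figures are not available here.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (1) if the weighted sum reaches the threshold, else stay silent (0)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def or_gate(a, b):
    # Weights (1, 1) and threshold 0.5, as in the worked example above.
    return mcculloch_pitts((a, b), (1, 1), 0.5)


def and_gate(a, b):
    # Assumed values: both inputs must be on for the sum to reach 1.5.
    return mcculloch_pitts((a, b), (1, 1), 1.5)


def not_gate(a):
    # Assumed values: a negative weight inverts the input.
    return mcculloch_pitts((a,), (-1,), -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, 'OR:', or_gate(a, b), 'AND:', and_gate(a, b))
print('NOT 0:', not_gate(0), 'NOT 1:', not_gate(1))
```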

Perceptron model

This is the simplest type of neural network, and it helps with linear (binary) classification of data. The figure below shows linearly separable data.

perceptron model artificial neural networks

The learning rule for training the neural network was first introduced with this model. In addition to the variable weight values, the perceptron added an extra input that represents bias. Thus, equation 1 was modified as follows:

Sum = w_1 x_1 + w_2 x_2 + ... + w_n x_n + b

Bias is used to adjust the output of the neuron along with the weighted sum of the inputs. It’s just like the intercept added in a linear equation.
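The post does not spell out the learning rule itself, so the sketch below assumes the classic perceptron update: each weight moves by learning_rate * (target - prediction) * input, and the bias moves by learning_rate * (target - prediction). The AND-gate data, the starting values of zero, and the function names are all illustrative choices.

```python
def predict(inputs, weights, bias):
    """Step activation: fire when the weighted sum plus bias is non-negative."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum >= 0 else 0


def train_perceptron(samples, targets, learning_rate=1, epochs=20):
    # Start from zero weights and zero bias (an arbitrary starting point).
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in zip(samples, targets):
            error = target - predict(inputs, weights, bias)
            # Nudge each weight in proportion to its input and the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            # The bias behaves like a weight on a constant input of 1.
            bias += learning_rate * error
    return weights, bias


and_samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

weights, bias = train_perceptron(and_samples, and_targets)
predictions = [predict(sample, weights, bias) for sample in and_samples]
print("weights:", weights, "bias:", bias, "predictions:", predictions)
```

Because the AND data is linearly separable, the updates stop changing once every sample is classified correctly. The bias ends up negative here, which shifts the decision boundary away from the origin, just as an intercept would in a linear equation.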

Multilayer perceptron model

A perceptron that has a single layer of weights can only help with linear (binary) classification of data. What if the input data is not linearly separable, as shown in the figure below?

Multilayer perceptron model

This is when we use a multilayer perceptron with a non-linear activation function such as sigmoid.

A multilayer perceptron has three main components:

  • Input layer: This layer accepts the input features. Note that this layer does not perform any computation – it just passes on the input data (features) to the hidden layer.
  • Hidden layer: This layer performs all sorts of computations on the input features and transfers the result to the output layer. There can be one or more hidden layers.
  • Output layer: This layer is responsible for producing the final result of the model.
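The three layers can be sketched as a single forward pass with NumPy. This is only an illustration: the layer sizes, the random weights, and the zero biases are assumptions, and no training happens here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(features, hidden_weights, hidden_bias, output_weights, output_bias):
    # Input layer: no computation, the features are passed on unchanged.
    # Hidden layer: weighted sum plus bias, then a non-linear activation.
    hidden = sigmoid(features @ hidden_weights + hidden_bias)
    # Output layer: the same computation, applied to the hidden activations.
    output = sigmoid(hidden @ output_weights + output_bias)
    return hidden, output


rng = np.random.default_rng(seed=0)
n_inputs, n_hidden, n_outputs = 3, 4, 1  # hypothetical layer sizes

hidden_weights = rng.uniform(-1, 1, size=(n_inputs, n_hidden))
hidden_bias = np.zeros(n_hidden)
output_weights = rng.uniform(-1, 1, size=(n_hidden, n_outputs))
output_bias = np.zeros(n_outputs)

features = np.array([[0, 1, 1],
                     [1, 0, 0]])
hidden, output = forward(features, hidden_weights, hidden_bias,
                         output_weights, output_bias)
print("hidden shape:", hidden.shape, "output shape:", output.shape)
```

Each row of features is one example, so the two examples flow through the network together and produce one output value each.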

Now that we’ve discussed the basic architecture of a neural network, let’s understand how these networks are trained.

Training phase of a neural network

Training a neural network is quite similar to teaching a toddler how to walk. In the beginning, when she is first trying to learn, she’ll naturally make mistakes as she learns to stand on her feet and walk gracefully.

Similarly, in the initial phase of training, neural networks tend to make a lot of mistakes. Initially, the predicted output could be stunningly different from the expected output. This difference between the predicted and expected outputs is termed the ‘error’.

The entire goal of training a neural network is to minimize this error by adjusting its weights.

This training process consists of three (broad) steps:

1. Initialize the weights

The weights in the network are initialized to small random numbers (e.g., ranging from -1 to 1, or -0.5 to 0.5). Each unit has a bias associated with it, and the biases are similarly initialized to small random numbers.

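from numpy import random

def initialize_weights():
    # Seed the generator so that runs are repeatable
    # (the seed value is an assumption; the original listing left this step blank)
    random.seed(1)

    # Assign random weights to a 3 x 1 matrix
    synaptic_weights = random.uniform(low=-1, high=1, size=(3, 1))
    return synaptic_weights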

2. Propagate the input forward

In this step, the weighted sum of the input values is calculated and the bias is added to it. The result is passed to an activation function – say, a sigmoid activation function – which squeezes the value into a particular range (in this case, between 0 and 1). This decides whether a neuron should be activated or not.


Our sigmoid utility functions are defined like so:

from numpy import exp

def sigmoid(x):
    return 1 / (1 + exp(-x))

def sigmoid_derivative(x):
    # x is expected to be a sigmoid output already, i.e. x = sigmoid(z)
    return x * (1 - x)
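One detail is easy to miss: sigmoid_derivative expects a value that has already been through the sigmoid, not the raw weighted sum. A quick numerical check, sketched below with arbitrary sample points, compares it with a finite-difference slope of the sigmoid.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(output):
    # `output` must be a sigmoid output, i.e. output = sigmoid(z).
    return output * (1 - output)


z = np.array([-2.0, 0.0, 1.5])  # arbitrary sample points
step = 1e-6

# Central finite difference: the slope of the sigmoid measured numerically.
numerical = (sigmoid(z + step) - sigmoid(z - step)) / (2 * step)

# Analytical slope, computed from the sigmoid output.
analytical = sigmoid_derivative(sigmoid(z))

print("numerical: ", numerical)
print("analytical:", analytical)
print("match:", np.allclose(numerical, analytical))
```

This is why the training code passes the predicted output, rather than the weighted sum, into sigmoid_derivative.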

3.  Backpropagate the error

In this step, we first calculate the error, i.e., the difference between our predicted output and the expected output. Then the weights of the network are adjusted so that, during the next pass, the predicted output is closer to the expected output, thereby reducing the error.

For neuron j (also referred to as unit j) of the output layer, the error is computed as follows:

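Err_j = O_j (1 - O_j)(T_j - O_j) ……………….. (5)

where T_j is the expected output, O_j is the predicted output, and O_j (1 - O_j) is the derivative of the sigmoid function.

The weights and biases are then updated to reflect the backpropagated error:

W_ij = W_ij + l * Err_j * O_i ……………….. (6)
b_j = b_j + l * Err_j ……………….. (7)

Here W_ij is the weight on the connection from unit i to unit j, and O_i is the output of unit i, that is, the input that unit j receives over that connection.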

Above, l is the learning rate, a constant typically between 0 and 1. It decides how quickly the weights and bias change. If the learning rate is high, the weights and bias will vary drastically with each epoch. If it’s too low, the change will be very slow.
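A single update step, worked through in code, shows how equations (5) to (7) fit together. All the numbers below are made up for illustration, and the inputs are treated as the outputs O_i of the previous layer.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


# Hypothetical values for a single output neuron j with three inputs.
inputs = [1.0, 0.0, 1.0]     # O_i: outputs of the previous layer
weights = [0.2, -0.3, 0.4]   # W_ij
bias = 0.1                   # b_j
target = 1.0                 # T_j
learning_rate = 0.9          # l


def forward(weights, bias):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


output = forward(weights, bias)                                   # O_j

# Equation (5): error term for an output neuron.
err = output * (1 - output) * (target - output)

# Equation (6): each weight moves in proportion to the input it carries.
new_weights = [w + learning_rate * err * x for w, x in zip(weights, inputs)]

# Equation (7): the bias moves by the error term scaled by the learning rate.
new_bias = bias + learning_rate * err

new_output = forward(new_weights, new_bias)
print("output before update:", round(output, 4))
print("output after update: ", round(new_output, 4))
```

The weight attached to the zero input does not change, because it played no part in producing the output. The other weights and the bias grow, which pushes the output closer to the target of 1.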

We terminate the training process when our model’s predicted output is almost the same as the expected output. Steps 2 and 3 are repeated until one of the following terminating conditions is met:

  • The error is minimized to the least possible value
  • The training has gone through the maximum number of iterations
  • There is no further reduction in error value
  • The training error is almost the same as the validation error
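The stopping rules above can be sketched as checks inside the training loop. The version below is an assumption-laden illustration rather than the code used in this post: it trains one sigmoid neuron, the tolerance and patience values are arbitrary, and the validation-error rule is left out because it needs a separate held-out data set.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_until_done(inputs, targets, learning_rate=0.5, max_iterations=10000,
                     tolerance=1e-3, patience=50):
    """Train one sigmoid neuron and report which stopping rule fired."""
    rng = np.random.default_rng(seed=0)
    weights = rng.uniform(-1, 1, size=(inputs.shape[1], 1))
    bias = 0.0
    best_error = float("inf")
    stale_epochs = 0

    for epoch in range(max_iterations):
        output = sigmoid(inputs @ weights + bias)
        mean_squared_error = float(np.mean((targets - output) ** 2))

        # Rule: the error is small enough.
        if mean_squared_error < tolerance:
            return weights, bias, "error below tolerance"

        # Rule: the error has not improved for `patience` epochs in a row.
        if mean_squared_error < best_error:
            best_error = mean_squared_error
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return weights, bias, "no further reduction in error"

        # Backward pass for a single output unit.
        delta = output * (1 - output) * (targets - output)
        weights += learning_rate * (inputs.T @ delta)
        bias += learning_rate * float(delta.sum())

    # Rule: the iteration budget is used up.
    return weights, bias, "maximum iterations reached"


inputs = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [1, 0, 1]])
targets = np.array([[1, 0, 1]]).T

weights, bias, reason = train_until_done(inputs, targets)
print("stopped because:", reason)
print("predictions:", sigmoid(inputs @ weights + bias).round(2).T)
```

Returning the reason alongside the weights makes it easy to tell a converged run from one that simply ran out of iterations.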

So, let’s create a simple interface that allows us to run the training process:

from numpy import dot

def learn(inputs, synaptic_weights, bias):
    # Forward pass: weighted sum plus bias, squashed by the sigmoid
    return sigmoid(dot(inputs, synaptic_weights) + bias)

def train(inputs, expected_output, synaptic_weights, bias, learning_rate, training_iterations):
    for epoch in range(training_iterations):
        # Forward pass -- Pass the training set through the network.
        predicted_output = learn(inputs, synaptic_weights, bias)

        # Backward pass
        # Calculate the error (equation 5)
        error = sigmoid_derivative(predicted_output) * (expected_output - predicted_output)

        # Adjust the weights and bias by a factor
        weight_factor = dot(inputs.T, error) * learning_rate
        # Sum over the training examples so the bias stays a single number
        bias_factor = error.sum() * learning_rate

        # Update the synaptic weights
        synaptic_weights += weight_factor

        # Update the bias
        bias += bias_factor

        if (epoch % 1000) == 0:
            print("Epoch", epoch)
            print("Predicted Output = ", predicted_output.T)
            print("Expected Output = ", expected_output.T)
    return synaptic_weights

Bringing it all together

Finally, we can train the network and see the results using the simple interface created above. You’ll find the complete code in the Kite repository.

from numpy import array

# Initialize random weights for the network
synaptic_weights = initialize_weights()

# The training set
inputs = array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 1]])

# Target set
expected_output = array([[1, 0, 1]]).T

# Test set
test = array([1, 0, 1])

# Train the neural network
# (the original listing was cut off after learning_rate; the iteration count below is an assumed value)
trained_weights = train(inputs, expected_output, synaptic_weights, bias=0.001, learning_rate=0.98,
                        training_iterations=10000)

# Test the neural network with a test example
accuracy = (learn(test, trained_weights, bias=0.01)) * 100

print("accuracy =", accuracy[0], "%")


You have now had a sneak peek into Artificial Neural Networks! Although the mathematics behind training a neural network might have seemed a little intimidating at the beginning, you can now see how easy it is to implement them using Python.

In this post, we’ve learned some of the fundamental connections between logic gates and the basic neural network. We’ve also looked into the perceptron model and the different components of a multilayer perceptron.

In my upcoming post, I’m going to talk about different types of artificial neural networks and how they can be used in your day-to-day applications. Python is well known for its rich set of libraries, such as Keras, Scikit-learn, and Pandas, which abstract away the intricacies involved in data manipulation, model building, model training, and so on. We’ll see how to use these libraries to build some cool applications. This post is an introduction to some of the basic concepts involved in building these models before we dive into using libraries.

Try it yourself

The best way of learning is by trying it out on your own, so here are some questions you can try answering using the concepts we learned in this post:

  1. Can you build an XOR model by tweaking the weights and thresholds?
  2. Try adding more than one hidden layer to the neural network, and see how the training phase changes.

See you in the next post!

About the Author:

This article originally appeared on

(Reprinted with permission)