Numerical instability in deep learning with softmax

One of the most frequently used activation functions in the output layer of a multi-class classification network is softmax. Softmax is defined as f(x)_i = exp(x_i) / sum_j exp(x_j): it returns a probability for each individual class, and the probabilities sum to one. For a two-class problem, softmax reduces to the sigmoid: the probability of class 2 is sigmoid(x_2 - x_1).
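
A minimal NumPy sketch of this definition (the function names and example values are mine, not taken from the original screenshots):

```python
import numpy as np

def softmax(x):
    """Naive softmax: exp(x_i) / sum_j exp(x_j)."""
    e = np.exp(x)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])
print(softmax(x))        # ~[0.090 0.245 0.665], sums to 1

# Two-class case: the softmax probability of class 2 equals
# a sigmoid applied to the difference x2 - x1.
two = np.array([0.5, 1.5])
print(softmax(two)[1], sigmoid(two[1] - two[0]))   # both ~0.7311
```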

When translating softmax into code, there are a few things to watch out for, because exp() can overflow or underflow in floating point and wreck the computed probabilities (and, downstream, the gradients). Let’s look at two examples:


When the inputs are scaled up 1000x, the probabilities become useless: everything collapses to 0 or 1, and once exp() overflows to infinity the output turns into NaN.
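
A small NumPy sketch of this behavior (my own example values, standing in for the original screenshot):

```python
import numpy as np

def softmax(x):
    e = np.exp(x)            # overflows to inf for very large x
    return e / e.sum()

x = np.array([0.1, 0.6, 0.3])
print(softmax(x))            # ~[0.269 0.444 0.286] -- reasonable

print(softmax(1000 * x))     # effectively [0, 1, 0]: the smaller classes are wiped out
print(softmax(np.array([1000.0, 2000.0, 3000.0])))  # [nan nan nan] once exp overflows to inf
```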


The same thing happens in the other direction. As the inputs shrink toward zero, every class ends up with essentially the same probability, and with large negative inputs every exp() underflows to zero, so the denominator itself vanishes.
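
And the other direction, again with made-up numbers, assuming NumPy float64:

```python
import numpy as np

def softmax(x):
    e = np.exp(x)
    return e / e.sum()

x = np.array([0.1, 0.6, 0.3])
print(softmax(x / 1000))     # ~[0.3333 0.3334 0.3333] -- nearly uniform,
                             # the relative ordering is almost invisible

# With very negative inputs every exp underflows to 0, the denominator
# becomes 0, and 0/0 gives NaN:
print(softmax(np.array([-1000.0, -2000.0, -3000.0])))   # [nan nan nan]
```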

However, there is an easy fix: compute softmax(x - c) instead, where the most commonly used choice is c = max(x). Softmax is invariant to this shift, and it leaves every entry at most zero, which rules out overflow; since at least one element is exactly zero, the denominator is at least one and cannot vanish. Underflow in some, but not all, of the exponentials is harmless.

Let’s see the impact.
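
Here is a sketch of the stabilized version, applied to the same made-up inputs as above:

```python
import numpy as np

def stable_softmax(x):
    """softmax(x - max(x)): the largest shifted entry is 0, so exp never
    overflows and the denominator is always >= 1."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

print(stable_softmax(np.array([1000.0, 2000.0, 3000.0])))    # [0. 0. 1.] instead of NaN
print(stable_softmax(np.array([-1000.0, -2000.0, -3000.0]))) # [1. 0. 0.] instead of NaN
print(stable_softmax(np.array([0.1, 0.6, 0.3])))             # unchanged: ~[0.269 0.444 0.286]
```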


As the same examples show, the numerically stable version produces well-defined probabilities for the inputs that previously overflowed to NaN or collapsed to a useless uniform distribution.


Deep Learning with GPU: How do we start? A quick setup guide on Amazon EC2

Deep learning is one of the hottest buzzwords in tech and is impacting everything from health care to transportation to manufacturing, and more. Companies are turning to deep learning to solve hard problems, like speech recognition, object recognition, and machine translation.

Every new breakthrough comes with challenges. The biggest challenge for deep learning is that training a model requires a massive number of matrix multiplications and other operations. A single CPU usually has no more than 12 cores, which quickly becomes a bottleneck. The good thing is that all of this matrix computation can be parallelized, and that’s where the GPU comes to the rescue. A single GPU can have thousands of cores, which makes it a natural fit for deep learning’s massive matrix operations. GPUs are much faster than CPUs for deep learning because they dedicate orders of magnitude more resources to floating-point operations and run specialized algorithms that keep their deep pipelines filled.
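
To make the claim concrete, here is a rough timing sketch. It assumes PyTorch is installed and a CUDA GPU is visible (neither is part of this post’s setup), and the actual speedup will vary a lot by hardware:

```python
import time
import torch

def time_matmul(device, n=4096, repeats=10):
    """Rough average time of an n x n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)               # warm-up so allocation/launch cost isn't measured
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()     # wait for the GPU to finish before stopping the clock
    return (time.time() - start) / repeats

print("CPU:", time_matmul("cpu"))
if torch.cuda.is_available():
    print("GPU:", time_matmul("cuda"))
```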



Now we know why a GPU is necessary for deep learning. Perhaps you’re interested in deep learning and can’t wait to try it, but you don’t have a big GPU in your computer. The good news is that there are public GPU servers to start with: Google, Amazon, and OVH all rent GPU servers at very reasonable prices.

In this article, I’ll show you how to set up a deep learning server on Amazon EC2, in this case a p2.xlarge GPU instance. To set up the Amazon instance, here is the prerequisite software you’ll need:

  1. Python 2.7 (Anaconda recommended)
  2. Cygwin with wget and vim (if on Windows)
  3. The Amazon AWS Command Line Interface (AWS CLI) installed (on Mac)

Here is the fun part:

  1. Register an Amazon ec2 account at: https://aws.amazon.com/console/
  2. Go to Support –> Support Center –> Create case (only for new EC2 users). Type the information into the form and click ‘submit’ at the end, then wait up to 24-48 hours for the account to be activated. If you are already an EC2 user, you can skip this step.
  3. Create new user group. From console, Services –> Security, Identity & Compliance –> IAM –> Users –> Add user
  4. After creating the new user, add permissions by clicking on the user you just created.
  5. Obtain access keys: Users –> Access Keys –> Create access key. Save the information.
  6. Now that the Amazon EC2 account is set up, go to the Mac Terminal (or Cygwin on Windows).
  7. Download the setup files setup_p2.sh and setup_instance.sh from fast.ai. Change the extension back to .sh if needed, since WordPress doesn’t support bash file uploads.
  8. Save the two shell scripts to your current working directory.
  9. In the terminal, type: aws configure. Enter the access key ID and secret access key saved in step 5.
  10. Run: bash setup_p2.sh
  11. Save the generated text (on terminal) for connecting to the server
  12. Connect to your instance: ssh -i /Users/lxxxx/.ssh/aws-key-fast-ai.pem ubuntu@ec2-34-231-172-2xx.compute-1.amazonaws.com
  13. Check your instance by typing: nvidia-smi
  14. Open the Chrome browser at ec2-34-231-172-2xx.compute-1.amazonaws.com:8888 (password: dl_course).
  15. Now you can start writing your deep learning code in the Python notebook (a quick GPU sanity check is sketched after this list).
  16. Shut down your instance in the console when you’re done, or you’ll pay a lot of money.
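
Once the notebook is up (step 15), a quick sanity check that the GPU is actually visible might look like this. This is only a sketch: the fast.ai AMI ships with its own preconfigured stack, so the PyTorch import below is an assumption; adapt it to whatever framework is installed on the instance.

```python
# Run inside a notebook cell to confirm the p2 instance's GPU is visible.
import subprocess
print(subprocess.check_output(["nvidia-smi"]).decode())   # same output as step 13

import torch                              # assumes PyTorch is installed
print(torch.cuda.is_available())          # should print True on a GPU instance
print(torch.cuda.get_device_name(0))      # e.g. "Tesla K80" on a p2 instance
```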

For a complete tutorial video, please check Jeremy Howard’s video here.

Tips:

The settings and passwords are saved in your home directory, under ~/.aws and ~/.ipython.