Deep Learning with GPUs: How do we start? A quick setup guide on Amazon EC2

Deep learning is one of the hottest buzzwords in tech, and it is impacting everything from health care to transportation to manufacturing. Companies are turning to deep learning to solve hard problems such as speech recognition, object recognition, and machine translation.

Every new breakthrough comes with challenges. The biggest challenge for deep learning is that training a model requires a massive number of matrix multiplications and other operations. A single CPU usually has no more than 12 cores, and it quickly becomes a bottleneck for developing deep learning networks. The good thing is that all of this matrix computation can be parallelized, and that's where the GPU comes to the rescue. A single GPU can have thousands of cores, which makes it a great fit for deep learning's massive matrix operations. GPUs are much faster than CPUs for deep learning because they dedicate orders of magnitude more resources to floating point operations and run specialized algorithms that keep their deep pipelines filled.

Now we know why GPUs are necessary for deep learning. You're probably interested in deep learning and can't wait to do something about it, but you don't have a big GPU in your computer. The good news is that there are public GPU servers for you to start with. Google, Amazon, and OVH all rent out GPU servers, and the cost is very reasonable.

In this article, I'll show you how to set up a deep learning server on Amazon EC2, using a p2.xlarge GPU instance in this case. To set up the Amazon instance, here is the prerequisite software you'll need:

  1. Python 2.7 (Anaconda is recommended)
  2. Cygwin with wget and vim (if on Windows)
  3. The Amazon AWS Command Line Interface (AWS CLI) on Mac (a quick install sketch follows this list)
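
For item 3, one common way to install the AWS CLI on a Mac (assuming Python and pip are already available, e.g. through Anaconda) is:

# install the AWS Command Line Interface into the current Python environment
pip install awscli

# confirm that the aws command is available
aws --version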

Here is the fun part:

  1. Register an Amazon EC2 account at: https://aws.amazon.com/console/
  2. Go to Support –> Support Center –> Create case (only required for new EC2 users). Fill in the form, submit it, and wait up to 24-48 hours for it to be approved. If you are already an EC2 user, you can skip this step.
  3. Create a new user. From the console: Services –> Security, Identity & Compliance –> IAM –> Users –> Add user.
  4. After creating the new user, add permissions to it by clicking the user you just created.
  5. Obtain access keys: Users –> Access Keys –> Create access key. Save this information.
  6. Now we're done with the Amazon EC2 account. Go to the Terminal on Mac or Cygwin on Windows.
  7. Download the setup files setup_p2.sh and setup_instance.sh from fast.ai. Change the extension back to .sh after downloading, since WordPress doesn't support bash file uploads.
  8. Save the two shell scripts to your current working directory.
  9. In the terminal, type: aws configure and enter the access key ID and secret access key saved in step 5. (The full terminal sequence is collected in the sketch after this list.)
  10. Run: bash setup_p2.sh
  11. Save the text printed to the terminal; it contains the information needed to connect to the server.
  12. Connect to your instance: ssh -i /Users/lxxxx/.ssh/aws-key-fast-ai.pem ubuntu@ec2-34-231-172-2xx.compute-1.amazonaws.com
  13. Check your instance by typing: nvidia-smi
  14. Open Chrome and go to: ec2-34-231-172-2xx.compute-1.amazonaws.com:8888 (password: dl_course).
  15. Now you can start writing your deep learning code in the Jupyter notebook.
  16. Shut down your instance from the console when you're done, or you'll keep paying for it.
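
For reference, the terminal portion of the steps above boils down to the following sketch (the key file path and host name here are placeholders; use the values printed by setup_p2.sh on your machine):

# configure the AWS CLI with the access key ID and secret access key from step 5
aws configure

# create and launch the p2 GPU instance
bash setup_p2.sh

# connect to the instance using the key and host name printed by the script
ssh -i ~/.ssh/aws-key-fast-ai.pem ubuntu@ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com

# once connected, verify that the GPU is visible
nvidia-smi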

For a complete tutorial video, please check Jeremy Howard’s video here.

Tips:

The settings and passwords are saved under ~/.aws and ~/.ipython in your home directory.

Proposing a new metric for assessing missing data (Porosity Score) – Original

0.1 Introduction

In exploratory data analysis we often encounter data series with missing values. The challenge is deciding which time series to keep and how to score them. The simplest approach is to compute the total percentage of missing data, but it has a big flaw: it cannot differentiate the quality of time series that have the same number of missing data points positioned differently.

Let’s take a look at the following two data vectors: [NA,1,NA,1,NA,1,NA,1] and [1,1,1,1,NA,NA,NA,NA].
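
A quick check in R (v1 and v2 are simply the two vectors above) shows that the naive percentage metric gives both exactly the same score:

# Both vectors are 50% missing, so the simple percentage cannot tell them apart.
v1 <- c(NA, 1, NA, 1, NA, 1, NA, 1)
v2 <- c(1, 1, 1, 1, NA, NA, NA, NA)
mean(is.na(v1))   # 0.5
mean(is.na(v2))   # 0.5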

The recovery rate for these two vectors is different: the first time series is generally considered easier to impute, i.e. to estimate the missing values. Because of this difference, I have come up with another method to score the quality of a series: the porosity score. The concept is borrowed from environmental physics. It computes an adjusted porosity score for a time series vector by considering how the missing/bad data is positioned and how large each block of missing data is, and it adjusts each block's impact on the overall score, whether the gaps are isolated points or continuous runs spaced every k indices.

The porosity score proposed here penalizes each block of missing data by its size: the bigger the continuous hole, the worse the data.

0.2 Define function

The function is defined below as PorosityScore. By default it returns the score with the penalty turned on, which is the recommended metric. This means that each block of missing data is penalized differently: the penalty weight for a missing block of size 4 is 4, while it is 1 for a block of size 1, so one hole of four consecutive missing values counts far more than four isolated ones. This makes sense because the bigger the continuous hole, the worse the data.
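
To make that concrete, here is a minimal sketch of the penalty arithmetic on the two vectors from the introduction, assuming tolerance = 0 (penalty_sketch is just an illustrative helper that uses rle(); the full function below tracks the blocks with an explicit loop):

penalty_sketch <- function(x) {
  runs   <- rle(is.na(x))               # runs of missing / present values
  blocks <- runs$lengths[runs$values]   # sizes of the missing blocks
  sum(blocks * blocks)                  # a size-1 block adds 1, a size-4 block adds 16
}

penalty_sketch(c(NA, 1, NA, 1, NA, 1, NA, 1))   # four size-1 blocks -> 4
penalty_sketch(c(1, 1, 1, 1, NA, NA, NA, NA))   # one size-4 block  -> 16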

# This function computes the porosity of a time series vector.
# The computed porosity (incompleteness) can then be used to screen feature variables in a data frame.
# The function finds the blocks of missing data and tracks the size of each block.
#
# Input: tsIn: time series vector
#        tolerance: default 0; blocks of this size or smaller are ignored in the adjusted and penalty scores
#        missingValue: NA, 0, or a user-specified sentinel (e.g. -99999)
#        batch: when used inside apply(), set it to TRUE so that only a single numeric score is returned
# Output: a list containing:
#         1. total.porosity.score (0-1)
#         2. adjusted.porosity.score (0-1)
#         3. PenaltyPorosity: score with penalty (recommended) (0 - length(tsIn)^2)
#         4. missing.blocksize
#         "adjusted" and "penalty" control which score is returned when batch = TRUE.
#
# e.g.
# > a <- c(1,2,NA,3,NA,NA,4,5,6,7,8,NA,9,10,NA,NA)
# > result <- PorosityScore(a, tolerance = 2)
# > result$adjusted.porosity.score
# > result$total.porosity.score
#
# For data frame usage: apply(dfIn, 2, PorosityScore, tolerance = 0, batch = TRUE, adjusted = FALSE, penalty = TRUE)

PorosityScore<- function(tsIn,tolerance =0,missingValue = NA,batch = FALSE, adjusted = FALSE, penalty = TRUE){
  #tsIn <- c(1,2,NA,3,NA,NA,4,5,6,7,8,NA,9,10,NA,NA)
  
  mVal = -99999999.9999 
  if(is.na(missingValue)) {
    tsIn[is.na(tsIn)] <- mVal
  }else{
    mVal = missingValue
  }
  idx <- which(tsIn == mVal )
  # Compute the total sparsity of the data
  totalPorosity <- length(idx) / length(tsIn)
  
  result <- list() 
  
  count <- 0
  i = 1
  while(i <= length(tsIn)) {
    if(tsIn[i] == mVal){
      count <-  count + 1
    }else{
      if(count !=0){
        result <- append(result,count)
        }
      count <- 0
    } 
    
    i <-  i +1
  } 
  
  if(count !=0) {
    result <- append(result,count)
  } 
  
  if(length(result) ==0){
      adjPorosity <- 0
      PenaltyPorosity <- 0
      blockSizeVec <- NA
      sprintf("The average porosity is: %5.1f.", mean(blockSizeVec))
      sprintf("The total and adjusted porosity score is:(%5.1f , %5.1f)", totalPorosity,adjPorosity)
      resultlist <-  list("total.porosity.score" =  totalPorosity ,"adjusted.porosity.score" = adjPorosity, 
                 "PenaltyPorosity"=PenaltyPorosity, "missing.blocksize" = blockSizeVec) 
  }else{
      # convert the list of block sizes to a vector
      blockSizeVec <- unlist(result)    # number of missing values in each missing block
      # long continuous runs of missing data (block size > 1) are bad (e.g. [2,3,3,4,4,1,1,5,6,6])
      AvgPorosity <- mean(blockSizeVec)                            # the smaller, the better
      # adjusted porosity score: ignore blocks no larger than the tolerance
      resVecAdj <- blockSizeVec[blockSizeVec > tolerance]
      adjPorosity <- sum(resVecAdj) / length(tsIn)
      # penalty score: each retained block is weighted by its own size
      PenaltyPorosity <- sum(resVecAdj * resVecAdj)
      sprintf("The average porosity is: %5.1f.", mean(blockSizeVec))
      sprintf("The total and adjusted porosity score is:(%5.1f , %5.1f)", totalPorosity,adjPorosity)
      resultlist <-  list("total.porosity.score" =  totalPorosity ,"adjusted.porosity.score" = adjPorosity, 
                 "PenaltyPorosity"=PenaltyPorosity, "missing.blocksize" = blockSizeVec) 
  }
  
 if(batch) {
  # for using with apply function 
  # only return adjusted porosity since total porosity is too easy to compute
   if(adjusted){
     return(adjPorosity)
   }
   if(penalty){
     return(PenaltyPorosity)    
   }
 }else{
    return(resultlist) 
 }  
}

0.3 Example

Let's look at a couple of examples:

# use it with single vector
 print("dataset one")
## [1] "dataset one"
 a <-  c(1,2,NA,3,NA,NA,4,5,6,7,8,NA,9,10,NA,NA)
 result <- PorosityScore(a)
 print(result)
## $total.porosity.score
## [1] 0.375
## 
## $adjusted.porosity.score
## [1] 0.375
## 
## $PenaltyPorosity
## [1] 10
## 
## $missing.blocksize
## [1] 1 2 1 2
 #print("data set one")
 #print(result$adjusted.porosity.score)
 #print(result$total.porosity.score)
 #print(result$PenaltyPorosity)
 print("dataset two")
## [1] "dataset two"
 a2 <-  c(1,NA,2,3,4,NA,4,NA,6,NA,8,NA,9,10,NA)
 result2 <- PorosityScore(a2)
 print(result2)
## $total.porosity.score
## [1] 0.4
## 
## $adjusted.porosity.score
## [1] 0.4
## 
## $PenaltyPorosity
## [1] 6
## 
## $missing.blocksize
## [1] 1 1 1 1 1 1
#print(result2$adjusted.porosity.score)
#print(result2$total.porosity.score)
#print(result2$PenaltyPorosity)
# how to use it with a dataframe
#dfIn <- as.data.frame(matrix(5,5,2))
#results <- apply(dfIn,2,PorosityScore,tolerance=1,batch=TRUE, adjusted = FALSE, penalty = TRUE)

0.4 Conclusion

As we can see, the function successfully distinguishes time series with different missing patterns. In the example above, the first vector has the greater penalty porosity score (10 versus 6). We can use this score to rank numeric features by the quality of their missing data and filter out the worst ones.
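
As an illustration of that kind of ranking, here is a hypothetical data frame (the column names and values are made up) scored with the batch mode of PorosityScore:

# Score every column of a data frame and rank the columns by penalty porosity.
dfIn <- data.frame(
  x1 = c(NA, 1, NA, 1, NA, 1, NA, 1),   # scattered single gaps
  x2 = c(1, 1, 1, 1, NA, NA, NA, NA),   # one long gap
  x3 = 1:8                              # complete column
)
scores <- apply(dfIn, 2, PorosityScore, tolerance = 0, batch = TRUE,
                adjusted = FALSE, penalty = TRUE)
sort(scores)   # lower scores (x3, then x1) indicate cleaner, easier-to-impute features

Columns whose score exceeds a chosen threshold can then be dropped before modeling.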