Increase the cell width of the Jupyter notebook in browser

Sometimes you may need the full screen width for showing an animation or picture, or for embedding an iframe, but by default the notebook only uses a small part of it. Assuming you only want to change the current notebook and not the default setting, here is how:

from IPython.core.display import display, HTML

display(HTML("<style>.container {width:100% !important;}</style>"))

Alternatively, you can set the CSS of a notebook by reading it from a pre-defined CSS file.

First, create a file containing the CSS settings:

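A minimal sketch of what such a file might contain (the single width rule below is just an assumption, matching the inline style above); you can even write it from a notebook cell:

css = "<style>.container { width: 100% !important; }</style>"
with open("custom_ipython.css", "w") as f:
    f.write(css)

The <style> tags are kept inside the file so that the HTML(styles) call below injects it as CSS rather than plain text.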

If the file name is custom_ipython.css, then add a cell containing:

from IPython.core.display import HTML

def css_styling():
    # read the stylesheet (including its <style> tags) and inject it into the notebook
    styles = open("custom_ipython.css", "r").read()
    return HTML(styles)

css_styling()


Numerical instability in deep learning with softmax

One of the most frequently used activation functions in the output layer of a multi-class classification network is softmax. Softmax is defined as f(x)_i = exp(x_i) / sum_j exp(x_j), and it returns a probability for each individual class, with all probabilities summing to one. For a two-class problem, sigmoid returns the same probability as softmax, since exp(x_1) / (exp(x_1) + exp(x_2)) = 1 / (1 + exp(-(x_1 - x_2))) = sigmoid(x_1 - x_2).

When translating softmax into program code, there are a few small things to watch out for: the exponentials overflow when the weights explode and underflow when the weights vanish, which makes the result numerically unstable. Let's look at two examples:
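
Here is a minimal sketch of the naive implementation with some toy weights (the exact numbers are only for illustration):

import numpy as np

def softmax_naive(x):
    # literal translation of f(x)_i = exp(x_i) / sum_j exp(x_j)
    e = np.exp(x)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
print(softmax_naive(x))         # [0.09003057 0.24472847 0.66524096]

# weights 100x larger: the probabilities saturate to (essentially) 0 and 1
print(softmax_naive(x * 100))
# weights 1000x larger: exp() overflows to inf and every entry becomes NaN
print(softmax_naive(x * 1000))

# weights shrunk toward zero: every class gets roughly the same probability
print(softmax_naive(x / 1000))  # roughly [0.333 0.333 0.334]
# weights pushed far negative: every exp() underflows to 0, the denominator
# vanishes, and the division returns NaN
print(softmax_naive(x - 1000))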


When the weights are scaled up, the probabilities become useless: they saturate to 0 and 1, and once exp() overflows every entry turns into NaN.


Something similar happens when the weights vanish: as they shrink toward zero, every class ends up with roughly the same probability, and if they are pushed far enough negative, all the exponentials underflow, the denominator becomes zero, and the result is NaN.

However, there is an easy fix: compute softmax(x + c) instead, which gives exactly the same result because the common factor exp(c) cancels between the numerator and the denominator. The most commonly used choice is c = -max(x), which leaves every shifted entry non-positive. That rules out overflow, and the denominator cannot vanish because at least one element is exp(0) = 1. Underflow in some, but not all, of the shifted entries is harmless.

Let’s see the impact.
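
A sketch of the stable version, reusing the toy vector from the snippet above:

def softmax_stable(x):
    # softmax(x + c) with c = -max(x): the largest shifted entry is 0,
    # so exp() cannot overflow and the denominator is at least exp(0) = 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax_stable(x))         # [0.09003057 0.24472847 0.66524096] -- unchanged
print(softmax_stable(x * 1000))  # [0. 0. 1.] -- saturated, but no overflow or NaN
print(softmax_stable(x - 1000))  # [0.09003057 0.24472847 0.66524096] -- no more NaN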


As these examples show, the stable version keeps the probabilities meaningful where the naive version saturates, overflows, or underflows.

Delete thousands of spam emails without a subject in Gmail

Machine learning has been used successfully to detect and flag spam in Gmail and other services, but it still fails in many cases. One of the biggest gaps in Gmail is that spam without a subject line can end up sitting in the inbox. This is very annoying, since these spam messages are generated automatically from unique email addresses, so it is difficult to build a universal filter based on the sender's address and not feasible to delete them manually. This happened to my Gmail a couple of weeks ago when it was flooded with spam.


Finally, I found a way to remove all of the junk automatically using Gmail's filter import/export.

First, create a new filter and export it to an XML file.


Then edit the file and update the section enclosed by the <entry> tags.


The XML script to do the trick is:

<entry>
  <category term='filter'></category>
  <title>Mail Filter</title>
  <id>tag:mail.google.com,2008:filter:1434203171999</id>
  <updated>2017-09-30T14:47:33Z</updated>
  <content></content>
  <apps:property name='subject' value=''/>
  <apps:property name='hasAttachment' value='true'/>
  <apps:property name='shouldTrash' value='true'/>
  <apps:property name='sizeOperator' value='s_sl'/>
  <apps:property name='sizeUnit' value='s_smb'/>
</entry>

Then import the XML file back into Gmail. The filter will now do the job for you.

The <id> tag in your export will be different from this sample; keep your own value.

 

Here is why we need GPUs and parallelization

We all know that deep learning, especially in computer vision, is resource intensive. The connection becomes much more concrete when you look at a typical ConvNet configuration and count the memory and parameters it has to store and compute.

[Table: per-layer activation memory and parameter counts for the ConvNet configuration]

Source: Fei-Fei Li & Andrej Karpathy, Stanford University.

There are 13 convolution layers with 3x3 filters, interleaved with several pooling layers of stride two, followed by three fully-connected layers with O(1000) nodes each. As the numbers show, most of the memory is consumed by the early CONV layers, while most of the parameters live in the late FC layers of this ConvNet.
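
As a rough sketch you can tally these numbers yourself. The channel widths, FC sizes, and 224x224x3 input below follow the standard VGG-16 recipe and are assumptions here, not taken from the slide:

conv_cfg = [64, 64, 'pool', 128, 128, 'pool', 256, 256, 256, 'pool',
            512, 512, 512, 'pool', 512, 512, 512, 'pool']
fc_cfg = [4096, 4096, 1000]

h = w = 224
c = 3
activations = h * w * c        # number of stored activation values (input image)
params = 0

for layer in conv_cfg:
    if layer == 'pool':        # 2x2 pooling, stride 2: halves each spatial dimension
        h //= 2
        w //= 2
    else:                      # 3x3 conv, padding 1: spatial size unchanged
        params += 3 * 3 * c * layer + layer   # weights + biases
        c = layer
    activations += h * w * c

n_in = h * w * c               # 7 * 7 * 512 = 25088 values feed the first FC layer
for n_out in fc_cfg:
    params += n_in * n_out + n_out
    activations += n_out
    n_in = n_out

print(f"activations: ~{activations * 4 / 1e6:.0f} MB per image (float32, forward only)")
print(f"parameters : ~{params / 1e6:.0f} million (~{params * 4 / 1e6:.0f} MB)")

Even this back-of-the-envelope tally lands at roughly 138 million parameters and about 60 MB of activations for a single forward pass of one image, before counting gradients or batches.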

Now you can see why GPUs and parallel computing are so helpful for deep nets.

 

Two Common Catches in R Programming

Sometimes the script you wrote gives you a big surprise because of a subtle difference in a command. Here are two common, hard-to-catch traps in R programming.

1. which() vs. %in% when subsetting a data frame

which

df <- data.frame(a = runif(5), d = runif(5), animals = c('dog','cat','snake','lion','rat'), z = 1:5)
results1 <- df[, -which(names(df) %in% c("a","d"))]  # works as expected
# how about this one?
results2 <- df[, -which(names(df) %in% c("b","c"))]  # surprise! all columns are gone
# when nothing matches, which() returns integer(0), and df[, -integer(0)] selects zero columns

%in%

df <- data.frame(a = runif(5), d = runif(5), animals = c('dog','cat','snake','lion','rat'), z = 1:5)
results1 <- df[, !names(df) %in% c("a","d")]  # works as expected
# how about this one
results2 <- df[, !names(df) %in% c("b","c")]  # returns the unaltered data frame
# when nothing matches, the logical vector is all TRUE, so every column is kept

Another quick way to drop columns is to assign NULL to them:

dropVec <- c('a','d')
df[dropVec] <- list(NULL)

2. Missing parentheses ()

Look at the following example. You would expect it to print 1 through 9, right? Instead, it prints 0 through 9, because 1:n-1 is parsed as (1:n) - 1 rather than 1:(n-1).

n <- 10
for (i in 1:n-1) {
  print(i)
}
## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
# adding the parentheses gives the intended 1 through 9
n <- 10
for (i in 1:(n-1)){
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9

Or check out my Rpub: http://rpubs.com/euler-tech/303265

Common challenges while aggregating data with multiple group IDs and functions in R

While analyzing a dataset, one of the most common tasks is looking at the features in an aggregated way, for example aggregating the data by year, month, day, or IDs. You then often want to apply not just one but several aggregation functions to the groups (say min, max, count, etc.).

There are a couple of ways to do it in R:

  • Aggregate with each function separately and merge the results.

agg.sum <- aggregate(. ~ id1 + id2, data = x, FUN = sum)

agg.min <- aggregate(. ~ id1 + id2, data = x, FUN = min)

# non-id columns from the two results get .x / .y suffixes after the merge
merge(agg.sum, agg.min, by = c("id1", "id2"))

  • Aggregate everything at once using 'dplyr'

library(dplyr)

# include only var1 and var2
df %>% group_by(id1, id2) %>% summarise_at(.funs = funs(mean, min, n()), .vars = vars(var1, var2))

# exclude var1 and var2
df %>% group_by(id1, id2) %>% summarise_at(.funs = funs(mean, min, n()), .vars = vars(-var1, -var2))

These are very handy for quick analysis, especially for people who prefer simpler code.