How to evaluate unsupervised learning

Every time we build a machine learning model, or any predictive model, the first question we ask is: how do we evaluate it? What is the best metric for each model? For supervised learning problems there are usually well-known, pre-set metrics. But for unsupervised learning, what should we do?

Let’s first look at the typical unsupervised learning algorithms and their corresponding application scenarios.

Typical unsupervised learning includes:

  • Hierarchical clustering: builds a multilevel hierarchy of clusters by creating a cluster tree
  • k-means clustering: partitions data into k distinct clusters based on distance to the centroid of a cluster
  • Gaussian mixture models: models clusters as a mixture of multivariate normal density components
  • Self-organizing maps: uses a neural network that learns the topology and distribution of the data
  • Hidden Markov models: uses observed data to recover the sequence of states
  • Generative models such as the Boltzmann machine: learn to generate outputs with a distribution similar to that of the input

Unsupervised learning methods are used in bioinformatics for sequence analysis and genetic clustering; in data mining for sequence and pattern mining; in medical imaging for image segmentation; and in computer vision for object recognition and dimensionality reduction.

Let’s go back to our original question: how to evaluate unsupervised learning?

Obviously, the answer depends on the class of unsupervised algorithms you use.

  1. Dimensionality reduction algorithms

For this type of algorithm, we can use methods similar to supervised learning: look at the reconstruction error on a held-out test dataset, or apply a k-fold cross-validation procedure.
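As a concrete illustration (a minimal sketch using sklearn’s PCA on made-up data, not code from the original post), the held-out reconstruction error can be computed like this:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Toy data: 500 samples with 20 features (an assumed, synthetic dataset)
rng = np.random.RandomState(0)
X = rng.randn(500, 20)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Fit PCA on the training split, then measure how well it reconstructs
# unseen test data from only 5 components
pca = PCA(n_components=5).fit(X_train)
X_test_rec = pca.inverse_transform(pca.transform(X_test))
error = np.mean((X_test - X_test_rec) ** 2)
print("held-out reconstruction error:", error)
```

A lower held-out error means the learned subspace generalizes better; comparing errors across different numbers of components is one way to pick the dimensionality.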

  2. Clustering algorithms

It is difficult to evaluate a clustering if you don’t have labeled test data. Typically there are two types of metrics: I. internal metrics, which use only information about the computed clusters themselves to evaluate whether clusters are compact and well separated [3]; II. external metrics, which compare the clustering against an externally known structure of the data, such as ground-truth class labels [1].

For external indices, we evaluate the results of a clustering algorithm based on a known cluster structure of a data set (or cluster labels).

For internal indices, we evaluate the results using quantities and features inherent in the data set. The optimal number of clusters is usually determined based on an internal validity index.

A very good resource for clustering evaluation is sklearn’s documentation page, which lists methods such as the adjusted Rand index, mutual-information-based scores, homogeneity, completeness and V-measure, and the Fowlkes-Mallows score. All of these assume ground-truth labels are available; the one method that does not is the Silhouette Coefficient, which evaluates a clustering from the data alone.
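For instance, the Silhouette Coefficient can be computed with sklearn without any labels (a hypothetical example on synthetic blobs, not from the original post):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 reasonably separated clusters
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# Score k-means clusterings for several k; higher silhouette is better
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print("k=%d  silhouette=%.3f" % (k, silhouette_score(X, labels)))
```

Scanning the silhouette score over candidate values of k is a common way to choose the number of clusters with an internal index.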

Sometimes an extrinsic performance function can be defined to evaluate it. For instance, if clustering is used to create meaningful classes (e.g. document classification), it is possible to hand-label an external dataset and test the accuracy against that gold standard. Another way of evaluating a clustering is to use a high-dimensional visualization tool like t-SNE to check the result visually. For example, for feature learning in images, visualization of the learned features can be useful.


  3. Generative models

This type of method is stochastic, so the actual performance achieved after a given amount of training may depend on the random seed. We can therefore vary the seed and compare several runs to see whether performance differs significantly. Visualizing the reconstructed output alongside the input can also be a good check; for example, hand-written digits reconstructed by an RBM can be compared with the training samples.
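The seed-variation idea can be sketched like this (an illustrative setup on toy binary data; sklearn’s BernoulliRBM exposes score_samples, a pseudo-likelihood proxy rather than an exact likelihood):

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Toy binary data: 200 samples of 64 binary features
rng = np.random.RandomState(0)
X = (rng.rand(200, 64) > 0.5).astype(float)

# Train the same model under several seeds and compare the scores
scores = []
for seed in (0, 1, 2):
    rbm = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=10,
                       random_state=seed).fit(X)
    scores.append(rbm.score_samples(X).mean())
    print("seed=%d  mean pseudo-likelihood=%.2f" % (seed, scores[-1]))
```

If the scores vary wildly across seeds, the training is unstable and any single run’s number should be treated with caution.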



[1] Halkidi, Maria, Yannis Batistakis, and Michalis Vazirgiannis. “On clustering validation techniques.” Journal of Intelligent Information Systems 17.2-3 (2001): 107-145.
[2] Hall, Peter, Jeff Racine, and Qi Li. “Cross-validation and the estimation of conditional probability densities.” Journal of the American Statistical Association 99.468 (2004).
[3] Yanchi Liu, Zhongmou Li, and Hui Xiong. “Understanding of Internal Clustering Validation Measures.” IEEE International Conference on Data Mining (2010).







Three different ways of initializing deep neural network yield surprising results

When training a deep neural network, there are many parameters to be initialized and then trained through forward and backward propagation. We often spend a lot of time trying different activation functions, tuning the depth of the network, the number of units, and other hyperparameters, but forget the importance of initializing the weights and biases. In this article, I’ll share three initialization methods (1. zero initialization, 2. random initialization, 3. He initialization) and show their corresponding impact.

In this example, we use a three-layer neural network with the following setup: Linear –> ReLU –> Linear –> ReLU –> Linear –> Sigmoid. So the first two hidden layers are (Linear + ReLU) and the last layer (L) is (Linear + Sigmoid).


The dataset is created using the following code.
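A sketch of such a dataset (assuming sklearn’s make_circles; the original screenshot code may have differed):

```python
import numpy as np
from sklearn.datasets import make_circles

# 300 noisy points in two concentric-circle classes
train_X, train_Y = make_circles(n_samples=300, noise=0.05, random_state=1)
train_X = train_X.T               # shape (2, 300): features x examples
train_Y = train_Y.reshape(1, -1)  # shape (1, 300): one label per example
```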




  1. Zero initialization

In this case, we just assign all weights and biases to zero using np.zeros().
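A minimal sketch of the idea (the helper name and layer-dims format are illustrative, not the original code):

```python
import numpy as np

def initialize_zeros(layer_dims):
    """All weights and biases start at 0, so every neuron in a layer
    computes the same output and receives the same gradient update."""
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1]))
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params
```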


With zero initialization, none of the points are correctly separated and the log-loss cost stays flat, because all the neurons in a layer compute exactly the same function.

  2. Random initialization

In this case, the weights are initialized randomly and scaled up by a large factor (10), with the biases set to zero. The network now breaks symmetry and starts to learn correctly.
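Sketched the same way (again, names and layout are illustrative):

```python
import numpy as np

def initialize_random(layer_dims, scale=10.0):
    """Weights drawn from a standard normal and multiplied by a large
    factor (10, as in the post); biases start at zero."""
    np.random.seed(3)  # fixed seed for reproducibility
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * scale
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params
```

Large initial weights break symmetry, but they also saturate the sigmoid output early in training, which tends to slow convergence.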


  3. He initialization

Last, we’ll see how ‘He’ initialization works. In He et al., 2015, they proposed scaling each layer’s random weights by sqrt(2./layer_dims[l-1]), i.e. the square root of 2 over the number of units feeding into the layer. With this initialization, the network separates the two classes very well.
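A sketch of He initialization under the same assumed layout:

```python
import numpy as np

def initialize_he(layer_dims):
    """He et al. (2015): scale each layer's Gaussian weights by
    sqrt(2 / fan_in), which keeps ReLU activations well-scaled
    as depth grows."""
    np.random.seed(3)
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                * np.sqrt(2.0 / layer_dims[l - 1]))
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params
```

The factor of 2 is specific to ReLU units (which zero out half their inputs); for tanh layers the analogous Xavier initialization uses sqrt(1/fan_in).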


As we can see, initialization is very important when training deep neural networks: it is essential to break the symmetry between neurons in the same layer, and a proper initialization scale can make training much faster.



Converting a Unix timestamp to a readable date in Python and Java

We’re living in a world of at least four dimensions, and the most critical dimension, time, is recorded alongside nearly every digital dataset. One of the most common but least readable ways to record it is the Unix timestamp. It shows a date and time as something like ‘1513219212’ (ISO 8601: 2017-12-14T02:40:12Z), and you might have no idea what it means.

First, what is a Unix timestamp? Simply put, it is a way to track time as a running total of seconds. The count starts at the Unix Epoch, January 1st, 1970 at 00:00:00 UTC, so a Unix timestamp is merely the number of seconds between a particular moment and the Epoch. One reason Unix timestamps are popular with webmasters is that a single value represents the same instant in all time zones at once. (Wikipedia)

Almost every programming language has a library to handle this.

In python, you can do the following:

import datetime

datetime.datetime.fromtimestamp(1513219212).strftime("%Y-%m-%d %H:%M:%S")

# returns '2017-12-13 21:40:12' (local time; here, US Eastern)

The above method gives locale-dependent results and is error prone. This method is better:

import datetime

print(datetime.datetime.utcfromtimestamp(1513219212).strftime("%Y-%m-%dT%H:%M:%SZ"))

# This will print out '2017-12-14T02:40:12Z'

‘Z’ stands for Zulu time, which is the same as GMT and UTC.

You can also use ‘ctime’ from time module to get a human-readable timestamp from a Unix timestamp.
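For example (the exact string depends on your local time zone):

```python
import time

# ctime renders a Unix timestamp as a human-readable local-time string
print(time.ctime(1513219212))  # e.g. 'Wed Dec 13 21:40:12 2017' in US Eastern
```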

Here is how to do it in Java:

Something like the following (reconstructed from the description, since the original code was embedded):

import java.util.Date;

public class TimestampToDate {
    public static void main(String[] args) {
        long millis = 1513219212L * 1000;  // Java expects milliseconds
        Date date = new Date(millis);
        System.out.println("The date is: " + date);
    }
}

The output will be: The date is: Wed Dec 13 21:40:12 EST 2017.

It is important to note the difference here: Java expects milliseconds, and you need the value as a long (1513219212L * 1000); multiplying plain ints would cause an integer overflow.

If you want to set the time zone in Java, you can simply add this line (with import java.util.TimeZone) before creating the Date:

TimeZone.setDefault(TimeZone.getTimeZone("UTC"));

It will then return: The date is: Thu Dec 14 02:40:12 UTC 2017.

Dealing with legacy code that contains ‘xrange’ in Python 2.7

Python 3.x has been around since 2008, but 2.7.x is still widely used in current development. In machine-learning code, one of the most frequently used functions in loops is ‘xrange’, which has been replaced by ‘range’ in 3.x. Here is a good pattern for writing code that is compatible with both Python 2 and 3.x:



try:
    xrange
except NameError:
    # Python 3: xrange no longer exists, so alias it to range
    xrange = range

For Python 2.7 die-hard fans switching to 3.x, you can define ‘xrange’ as follows:

def xrange(x):
    return iter(range(x))


Understand blockchain with simple Python code

Everybody knows Bitcoin now, but not everyone knows how the underlying blockchain technology works. A blockchain is like a distributed ledger: a consensus of replicated, shared and synchronized digital data spread geographically across multiple sites, with no centralized data storage. This is different from both centralized and conventional decentralized storage.


In other words, a blockchain is a public data store where each new piece of data is put in a ‘block’ container and appended to an immutable chain of past data. For Bitcoin and other coins, this data is a series of transaction records, but in principle the data stored can be anything. Blockchain technology is considered highly tamper-resistant, since the computational resources required to rewrite the chain are enormous.

Here, I’ll show a simple Python code to demonstrate how blockchain works:
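A minimal sketch of the idea (class and function names here are illustrative, not the original code): each block stores an index, a timestamp, some data, and the SHA-256 hash of the previous block, so altering any past block invalidates every later hash.

```python
import datetime
import hashlib

class Block:
    def __init__(self, index, timestamp, data, previous_hash):
        self.index = index
        self.timestamp = timestamp
        self.data = data
        self.previous_hash = previous_hash
        self.hash = self.hash_block()

    def hash_block(self):
        # The block's own hash covers everything, including previous_hash,
        # which is what chains the blocks together
        payload = (str(self.index) + str(self.timestamp) + str(self.data)
                   + str(self.previous_hash))
        return hashlib.sha256(payload.encode()).hexdigest()

def genesis_block():
    # The first block has no predecessor, so use an arbitrary previous hash
    return Block(0, datetime.datetime.now(), "Genesis Block", "0")

def next_block(last):
    index = last.index + 1
    return Block(index, datetime.datetime.now(), "Block #%d" % index, last.hash)

chain = [genesis_block()]
for _ in range(5):
    chain.append(next_block(chain[-1]))
    print("Block #%d added, hash: %s" % (chain[-1].index, chain[-1].hash))
```

Running it prints each new block’s index and hash as the chain grows.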



As seen in the code, each block contains the hash of the previous block, and this makes the blockchain hard to modify. In practice, there are additional restrictions that make each new block harder to generate. For example, you can require every new block’s hash to start with n leading zeros; the more leading zeros required, the harder it is to generate a valid block. And because the ledger is distributed, a new block is only accepted as legitimate once a majority (at least 51%) of the participating nodes vote it ‘valid’.
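The leading-zeros rule can be sketched like this (illustrative only; real proof-of-work schemes such as Bitcoin’s are far more involved):

```python
import hashlib

def mine(data, difficulty=4):
    """Search for a nonce such that sha256(data + nonce) starts with
    `difficulty` leading zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(("%s%d" % (data, nonce)).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine("Block #1", difficulty=4)
print(nonce, digest)
```

Each extra required zero multiplies the expected search work by 16, while verifying a found nonce still takes a single hash, which is what makes the scheme useful.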



How to clear all in the Python Spyder workspace

While doing data analysis, sometimes we want to clear everything in the current workspace to get a fresh environment, similar to Matlab’s ‘clear all’ command. Here is what the function looks like:

def clear_all():
    """Clears all the variables from the workspace of the Spyder application."""
    gl = globals().copy()
    for var in gl:
        # skip private names, functions and modules
        if var[0] == '_': continue
        if 'func' in str(globals()[var]): continue
        if 'module' in str(globals()[var]): continue
        del globals()[var]

if __name__ == "__main__":
    clear_all()

Converting local time to UTC and vice versa in Python

When dealing with global time-series data, we often encounter data in different time zones. Here are the Python scripts I created to address this issue:

  1. Converting from local to UTC

# e.g. local_to_utc(t.timetuple())

import time, calendar
import datetime

def local_to_utc(t_tuple):
    # interpret the tuple as local time, then re-express it in UTC
    secs = time.mktime(t_tuple)
    utcStruct = time.gmtime(secs)
    return datetime.datetime(*utcStruct[:6])

  2. Converting from UTC to local time

# e.g. utc_to_local(t.timetuple())

import time
import calendar
import datetime

def utc_to_local(t_tuple):
    # interpret the tuple as UTC, then re-express it in local time
    secs = calendar.timegm(t_tuple)
    localStruct = time.localtime(secs)
    return datetime.datetime(*localStruct[:6])