Pandas - ValueError: If using all scalar values, you must pass an index

Python users know how convenient it is to create a pandas data frame from a dictionary. For example:

import pandas as pd
df = pd.DataFrame({'Key': ['a', 'b', 'c', 'd'], 'Value': [1, 2, 3, 4]})

It works beautifully when each value is a list or dict holding the entries for one column. However, pandas raises the error "ValueError: If using all scalar values, you must pass an index" when you try to convert the following dictionary to a data frame.

dict_test = {
    'bacon': 'pig',
    'pulled pork': 'pig',
    'pastrami': 'cow',
    'honey ham': 'pig',
    'nova lox': 'salmon'
}

df = pd.DataFrame.from_dict(dict_test)

Why is that?

When pandas creates a data frame from a dictionary, it expects each value to be a list or dict, because each value supplies the rows of one column. If every value is a scalar, pandas cannot tell how many rows to create, so you also need to supply an index. In this example, the values are 'pig' instead of ['pig'].
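To see this in action, here is a minimal sketch that reproduces the error and then does what the message itself suggests, passing an index. The dictionary is shortened, and the one-element index `[0]` and the `error_message` variable are arbitrary choices for illustration.

```python
import pandas as pd

dict_test = {'bacon': 'pig', 'pastrami': 'cow', 'nova lox': 'salmon'}

# All values are scalars, so pandas cannot infer how many rows to create.
try:
    pd.DataFrame(dict_test)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# Supplying an index tells pandas the number of rows: one row, labelled 0.
df = pd.DataFrame(dict_test, index=[0])
print(df)
```

The result has a single row and one column per key, which is the same shape you get by wrapping each value in a list.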

How to fix it:

1. Wrap each value in a list, so that each key becomes a column with one row:

dict_test = {
    'bacon': ['pig'],
    'pulled pork': ['pig'],
    'pastrami': ['cow'],
    'honey ham': ['pig'],
    'nova lox': ['salmon']
}
df = pd.DataFrame.from_dict(dict_test)

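2. Convert the dictionary's items to a list of (key, value) pairs and name the two columns. In Python 3.x, dict.items() returns a view, so wrap it in list(). Use the plain DataFrame constructor here, because from_dict does not accept columns with its default orientation.

df = pd.DataFrame(list(dict_test.items()), columns=['food', 'animal'])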

3. Specify the orientation as 'index', so that the dictionary keys become the row labels:

df = pd.DataFrame.from_dict(dict_test, orient='index')

4. Build a Series first, then convert it to a data frame:

s = pd.Series(dict_test, name='animal')
s.index.name = 'Food'
df = pd.DataFrame(s)
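The four fixes do not all produce the same shape, which is easy to miss. The sketch below compares them on a shortened version of the dictionary. The variable names, the `columns=['animal']` argument on the `orient='index'` call, and the use of `to_frame()` are additions for illustration rather than part of the fixes above.

```python
import pandas as pd

dict_test = {'bacon': 'pig', 'pastrami': 'cow', 'nova lox': 'salmon'}

# Fix 1: wrap each value in a list -> one row, one column per food.
wide = pd.DataFrame({food: [animal] for food, animal in dict_test.items()})

# Fix 2: list of (key, value) pairs -> one row per food, two columns.
pairs = pd.DataFrame(list(dict_test.items()), columns=['food', 'animal'])

# Fix 3: keys become the row index; columns= names the single value column.
by_index = pd.DataFrame.from_dict(dict_test, orient='index', columns=['animal'])

# Fix 4: Series first, then DataFrame; the Series name becomes the column name.
from_series = pd.Series(dict_test, name='animal').to_frame()

for label, frame in [('wide', wide), ('pairs', pairs),
                     ('by_index', by_index), ('from_series', from_series)]:
    print(label, frame.shape)
```

Fix 1 gives a wide, single-row frame, while fixes 2 to 4 give one row per food, so pick the one that matches how you want to query the data afterwards.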


The convenience of subplots=True in DataFrame.plot

When it comes to data analysis, there is a saying: "a picture is worth a thousand words." Visualization is an essential and effective way to explore data, and it is usually our first step in understanding the raw data. Python has a lot of visualization libraries. A pandas dataframe also comes with plenty of built-in plot kinds: line, bar, barh, hist, box, kde, density, area, pie, scatter and hexbin.

The quickest way to visualize all the columns of a dataframe is to simply call df.plot(). For example:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A': np.arange(1, 10), 'B': 2 * np.arange(1, 10)})
df.plot(title='plot all columns in one chart.')

[Figure dfplot1.png: columns A and B plotted on one chart]

But often we want each feature plotted on its own chart because the data is complex. Separate charts help us disentangle the dataset.

It turns out that there is a simple trick in df.plot: pass subplots=True.

df.plot(figsize=(8, 4), subplots=True, layout=(2, 1), title='plot all columns in separate charts');

[Figure dfplot2.png: columns A and B plotted on separate charts, stacked vertically]

That's it. Simple but effective. You can change the arrangement by adjusting the layout tuple, which is read as (rows, columns).
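As a small sketch of how the layout tuple behaves: when `layout` is given together with `subplots=True`, `df.plot` returns the axes as a NumPy array with that (rows, columns) shape. The `Agg` backend line, the `axes` variable name and the y-axis label are assumptions added so the example runs without a display and has something to show.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(1, 10), 'B': 2 * np.arange(1, 10)})

# layout=(1, 2): one row, two columns, so the charts sit side by side.
axes = df.plot(figsize=(8, 4), subplots=True, layout=(1, 2),
               title='plot all columns in separate charts')
print(axes.shape)

# Each element is a regular matplotlib Axes, so it can be styled individually.
axes[0, 1].set_ylabel('B values')
```

Swapping the tuple back to `(2, 1)` stacks the two charts vertically again.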

Hope you find it helpful too.