Analyzing data from Microsoft Access file

While working as a data scientist, sometime you’ll receive an old data file from a system in Microsoft Access format. The first instinct is that “Oh, do I need to install Microsoft Access?”. You’re out of luck if you use Linux or Mac and it is cumbersome to install a big piece of software that you won’t use often. The good thing is that we can use Python to do the task easily.

Microsoft only creates ODBC driver for windows. If you’re using windows, it should have been installed on the system. The newest ‘ACE’ ODBC driver is included in the Office installation. To check whether you have the drive on your windows system, run the following code:

import pyodbc

[x for x in pyodbc.drivers() if x.startswith('Microsoft Access Driver']

 

If you see an empty list then you are running 64-bit Python and you need to install the 64-bit version of the “ACE” driver. If you only see [‘Microsoft Access Driver (*.mdb)’] and you need to work with a .accdb file then you need to install the 32-bit version of the “ACE” driver.

To connect the file and start the query, here is the code:


conn_str = ( r'DRIVER={Microsoft Access Driver ( *.mdb, .accdb)};'r'DBQ=C:\path.accdb; )
connection =pyodbc.connect(conn_str)
cursor =connection.cursor()
# Run a sql-like query
results = cursor.execute("select * from TableName limit 5;")
rows = results.fetchall()
# Turn it into a dataframe
labels = list('ABCDE') # if it's a five column data
pd.Dataframe.from_records(rows, columns = labels) # You must provide column names

 

For Ubuntu User, you can install pyodbc drive by the following command:

$ sudo apt-get install python-pyodbc

$ sudo apt-get install mdbtools libmdbodbc1

On Mac, you can install pyodbc simply by: pip install pyodbc

Access to the file is the same as windows, make the connection and setup the cursor to the file, then start to execute the sql commands.

The rest will be on you for creating interesting results.