Jupyter notebooks are a popular way of exploring data sets by setting out your code, data and visualisations in an interactive, web-based notebook. Jupyter notebooks can be run on your own machine, or as a service, as is the case with IBM Watson Studio. Oftentimes your data is in CSV format and loaded into a data frame for analysis using Apache Spark or Pandas, but it is also possible to load data from a Cloudant database directly into the notebook. In this article I’ll demonstrate how this is done in Python and Node.js.

Sprinkling some Pixiedust

We’re going to use the open-source Pixiedust library along the way. Pixiedust provides helper functions that allow data to be visualised with very little effort in a notebook. Throw it a Spark or Pandas dataframe and Pixiedust will do the rest, whether you need a table, chart or map.

Setting up Watson Studio

Watson Studio allows Jupyter notebooks to be run as a service in the IBM Cloud. The notebooks can be backed by a choice of kernels: choose your Python version, number of CPU cores and memory allocation. The notebooks can be paired with other services in the IBM Cloud such as Apache Spark, IBM Cloudant and IBM Cloud Object Storage to create analytic workspaces and interactive dashboards that draw insights from your data.

Setting up Cloudant

To allow access to our Cloudant data from a notebook, we could use the Cloudant account’s admin credentials, but better practice is to create an API key/password pair that has read access to only the database(s) needed.

In the Cloudant dashboard, select the database to be accessed and choose the “Permissions” tab, then click the “Generate API Key” button.

Make a note of the key and password. The new key is automatically given _reader access to your database and it can be given further permissions by checking boxes against the key.

Python

We’re going to use the official Cloudant libraries in our notebooks: first up, the Python library.

The first time only, we’re going to need to install the library with a shell command in a notebook cell:

!pip install cloudant

In the next cell, you can then import the library and make a connection to the Cloudant database:

from cloudant import Cloudant
u = 'APIKEY'
p = 'PASSWORD'
a = 'HOST'
client = Cloudant(u, p, account=a, connect=True, auto_renew=True)

Here we supply the API key/password pair we generated earlier and a hostname, or more precisely, the bit of your Cloudant account’s domain name before .cloudant.com.
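If you only have the full hostname or URL to hand, a small helper can pull out the account name for you. This is a minimal sketch: `account_from_host` is a hypothetical helper, not part of the Cloudant library, and it assumes the hostname ends in `.cloudant.com`.

```python
from urllib.parse import urlparse

CLOUDANT_SUFFIX = '.cloudant.com'

def account_from_host(host):
    """Return the account name from a hostname or URL ending in .cloudant.com.

    Hypothetical helper for illustration; not part of the Cloudant library.
    """
    # accept either a bare hostname or a full URL
    hostname = urlparse(host).hostname if '://' in host else host
    if hostname is None or not hostname.endswith(CLOUDANT_SUFFIX):
        raise ValueError('not a cloudant.com hostname: %r' % host)
    # strip the suffix to leave just the account name
    return hostname[:-len(CLOUDANT_SUFFIX)]

print(account_from_host('https://myaccount.cloudant.com'))  # myaccount
```

The returned value is what would be passed as the `account` parameter when creating the client.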

We can use the Cloudant client object to connect to a specific database:

db = client['mydata']

and use that object to fetch some data - in this case the first 500 documents:

# fetch first 500 documents
response = db.all_docs(limit=500, include_docs=True)

# put document bodies into an array
docs = []
for r in response['rows']:
    docs.append(r['doc'])
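One thing to watch for: the response lists every document in the database, and that includes any design documents (those whose id starts with `_design/`), which you probably don't want in your analysis. The sketch below filters them out. It runs against a hand-made response in the same shape as the real one; `extract_docs` and the sample data are made up for illustration.

```python
def extract_docs(response):
    """Return document bodies from an all_docs-style response, skipping design documents."""
    return [
        row['doc']
        for row in response['rows']
        if not row['id'].startswith('_design/')
    ]

# made-up response for illustration, mimicking the shape returned with include_docs=True
sample_response = {
    'total_rows': 3,
    'rows': [
        {'id': '_design/views', 'key': '_design/views', 'value': {'rev': '1-a'},
         'doc': {'_id': '_design/views', '_rev': '1-a', 'views': {}}},
        {'id': 'doc1', 'key': 'doc1', 'value': {'rev': '1-b'},
         'doc': {'_id': 'doc1', '_rev': '1-b', 'city': 'London'}},
        {'id': 'doc2', 'key': 'doc2', 'value': {'rev': '1-c'},
         'doc': {'_id': 'doc2', '_rev': '1-c', 'city': 'Paris'}},
    ],
}

docs = extract_docs(sample_response)
print(len(docs))  # 2
```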
    

The documents array can become a Pandas data frame,

# create a Pandas dataframe containing the data
import pandas as pd
df = pd.DataFrame(data=docs)
display(df)

and the data frame can then be used in a Pixiedust visualisation:

import pixiedust
display(df)

Using Node.js in notebooks

The pixiedust_node project allows you to mix-n-match Node.js code in notebooks together with Python code. Patrick Titzler wrote an excellent guide to running Node.js in Watson Studio notebooks. On a local installation of Jupyter, the Node.js executable simply needs to be available on your machine’s path.

Once your Jupyter environment is configured, we can install pixiedust_node:

!pip install pixiedust_node
import pixiedust_node

and install the official Node.js Cloudant library:

npm.install('@cloudant/cloudant')

At this point, any cells starting with %%node are interpreted as Node.js code. Firstly, we need to initialise our Cloudant connection:

%%node
const Cloudant = require('@cloudant/cloudant')
const u = 'USERNAME'
const p = 'PASSWORD'
const a = 'HOST'
const cloudant = Cloudant({account: a, username: u, password: p})
const db = cloudant.db.use('mydata')

We can use the db object to fetch some data and turn the data into an array of documents for display:

%%node
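var docs = []
db.list({ include_docs: true, limit: 500 }, (err, data) => {
  // stop early if the request failed
  if (err) {
    return print(err.message)
  }
  // collect the document bodies
  for (var r of data.rows) {
    docs.push(r.doc)
  }
  display(docs)
})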

A cool feature of pixiedust_node is that global Node.js variables are automatically copied into the Python environment. So the docs array is immediately available to work with in Python in the next cell:

print('This is Python')
print(len(docs))
# This is Python
# 500
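Before building a data frame from `docs`, it can be worth checking which fields the documents actually contain, since documents in the same Cloudant database need not share a schema. A minimal sketch, using made-up documents in place of the real `docs` array; `field_counts` is a hypothetical helper:

```python
from collections import Counter

def field_counts(docs):
    """Count how many documents contain each top-level field."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.keys())
    return counts

# made-up documents for illustration; in the notebook you would pass `docs`
sample_docs = [
    {'_id': 'a', '_rev': '1-x', 'city': 'London', 'population': 8900000},
    {'_id': 'b', '_rev': '1-y', 'city': 'Paris'},
]

# print each field alongside the number of documents that include it
for field, n in sorted(field_counts(sample_docs).items()):
    print(field, n)
```

Fields that appear in only some of the documents will show up as missing values in the resulting data frame.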

Further reading

There’s much more to be done with Notebooks & Cloudant. Here’s some more reading material: