GitHub Webhooks and Cloud Functions

February 08, 2021 | Glynn Bird | GitHub Serverless

Some GitHub repositories hold more than application source code: they can also store data files in JSON, YAML, XML or any other file format.

In this blog post we’ll create an IBM Cloud Function that is triggered by each commit to a GitHub repository and stores a copy of the repository’s JSON data in a Cloudant database.


The Cloudant-based mirror of the data can then be indexed, searched and used for operational use-cases while staying in near real-time sync with the GitHub reference.

pic

Photo by Brady Rogers on Unsplash

Prerequisites:

Step 1 - IBM Cloud Function

First, we’ll create an IBM Cloud Function that will be called for every GitHub commit.
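To give a feel for what such a function has to do, here is a minimal Python sketch. It assumes the function receives GitHub’s push event payload as its parameters, where each entry in `commits` carries `added`, `modified` and `removed` lists of file paths, and that commits are listed oldest first. The helper name `changed_json_files` is hypothetical; `main(params)` is simply the entry-point shape used by Python actions. It is not the script used in this post, only an illustration of the idea.

```python
# Sketch only: works out which data files a push touched.
# Assumes the GitHub push event payload is passed in as the action's params.

DATA_EXTENSIONS = (".json", ".geojson")


def changed_json_files(push_event):
    """Collect the data files that were written or removed by a push."""
    upserts, deletes = set(), set()
    # Assumption: commits are ordered oldest first, so later commits win.
    for commit in push_event.get("commits", []):
        for path in commit.get("added", []) + commit.get("modified", []):
            if path.endswith(DATA_EXTENSIONS):
                upserts.add(path)
                deletes.discard(path)
        for path in commit.get("removed", []):
            if path.endswith(DATA_EXTENSIONS):
                deletes.add(path)
                upserts.discard(path)
    return {"upserts": sorted(upserts), "deletes": sorted(deletes)}


def main(params):
    """Entry point in the form used by Python actions."""
    return changed_json_files(params)
```

A real function would go on to fetch the content of each file in `upserts` and write it to Cloudant; this sketch stops at deciding what changed.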


Step 2 - GitHub WebHook

Step 3 - Create a Cloudant service

Step 4 - Configuring the Cloud Function

Step 5 - That’s it!

Try creating, editing and deleting JSON documents in your GitHub repository - you should see them mirrored in your Cloudant service’s “github” database.

Note that data only travels one way: from GitHub to Cloudant.
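As an illustration of that one-way flow, the sketch below turns a set of GitHub changes into the body of a single Cloudant `_bulk_docs` request. It assumes the file path is used as the document `_id` (a choice made for this example, not necessarily what the original script does) and that the current `_rev` of each existing document has already been looked up. The function name `build_bulk_docs` and its parameters are hypothetical.

```python
import json


def build_bulk_docs(upserts, deletes, current_revs):
    """Build a _bulk_docs request body from GitHub changes.

    upserts:      dict of file path -> JSON text read from GitHub
    deletes:      iterable of file paths removed from GitHub
    current_revs: dict of _id -> _rev for documents already in Cloudant
    """
    docs = []
    for path, text in upserts.items():
        body = json.loads(text)
        if not isinstance(body, dict):
            # Documents must be JSON objects, so wrap arrays and scalars.
            body = {"data": body}
        doc = dict(body, _id=path)  # assumption: file path doubles as _id
        if path in current_revs:
            doc["_rev"] = current_revs[path]  # needed to update an existing doc
        docs.append(doc)
    for path in deletes:
        if path in current_revs:
            # A deletion is written as a stub carrying _deleted: true.
            docs.append({"_id": path, "_rev": current_revs[path], "_deleted": True})
    return {"docs": docs}
```

Nothing here ever writes back to GitHub, which is why edits made directly in Cloudant would not appear in the repository.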

What else can you do with this?

  1. Other data formats. Currently this script only accepts data from .json or .geojson files, but it could be extended to convert other formats, e.g. YAML, into JSON before writing the data to Cloudant.
  2. Add indexing and querying. Using Cloudant’s extensive indexing options, the Cloudant copy of the data could be used to provide a read-only API for this data, allowing API clients to retrieve slices of data of your choosing.
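One way to approach the first idea is a small table of parsers keyed by file extension, so that supporting a new format means registering one more function. The sketch below is an assumption about how that could look; `PARSERS`, `register_parser` and `parse_file` are hypothetical names, and only the two extensions mentioned above are wired in.

```python
import json
import os

# Parsers keyed by lower-case file extension; each returns JSON-ready data.
PARSERS = {
    ".json": json.loads,
    ".geojson": json.loads,
}


def register_parser(extension, parser):
    """Add support for another file format."""
    PARSERS[extension.lower()] = parser


def parse_file(path, text):
    """Return the parsed content, or None when the format is unsupported."""
    extension = os.path.splitext(path)[1].lower()
    parser = PARSERS.get(extension)
    if parser is None:
        return None
    return parser(text)
```

If the third-party PyYAML package were available in the function’s runtime, YAML support could then be added with `register_parser(".yaml", yaml.safe_load)`.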

Limitations

The serverless function assumes that the commit it is processing can be written to Cloudant in one bulk write, but some GitHub commits or squashed pull requests could add up to many megabytes, exceeding the size of a single Cloudant write.
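A possible way around this is to split the documents into several smaller bulk writes. The sketch below groups documents into batches by serialized size and by count. The thresholds are illustrative assumptions rather than Cloudant’s actual request limits, and `chunk_docs` is a hypothetical helper.

```python
import json


def chunk_docs(docs, max_bytes=1_000_000, max_docs=500):
    """Yield lists of documents small enough for one bulk write each.

    max_bytes and max_docs are illustrative defaults, not Cloudant limits.
    A single document larger than max_bytes still gets a batch of its own.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        # Start a new batch when adding this doc would exceed either limit.
        if batch and (batch_bytes + size > max_bytes or len(batch) >= max_docs):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded batch would then be sent as its own bulk write, at the cost of the commit no longer arriving in Cloudant as a single request.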

Cloudant documents are limited to 1MB, so any larger document added to GitHub would fail to be written to Cloudant.
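A function could guard against this by measuring each document before writing it. The sketch below separates documents that fit from those that do not, so the oversized ones can be logged or skipped instead of failing the write. It assumes 1MB means 1,000,000 bytes of serialized JSON, which is a conservative reading, and `split_by_size` is a hypothetical helper.

```python
import json

# Assumption: treat the 1MB document limit as 1,000,000 bytes of JSON.
MAX_DOC_BYTES = 1_000_000


def split_by_size(docs, limit=MAX_DOC_BYTES):
    """Return (writable, too_large) lists based on serialized size."""
    writable, too_large = [], []
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if size <= limit:
            writable.append(doc)
        else:
            too_large.append(doc)
    return writable, too_large
```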