Getting started with CKAN

As I’ve written about before, one of the main tasks the code fellows will be doing is setting up open data platforms. We’ve seen what can be achieved at the national level with data.gov and data.gov.uk, both of which are powered by CKAN. In Amsterdam, the code fellows were introduced to another platform, CitySDK.

CKAN and CitySDK offer different approaches and advantages for organisations opening their data. CitySDK mandates that data follow a certain format. In contrast, CKAN functions as a portal: data can be hosted anywhere, in any format, so it’s up to the data owners to look after it appropriately.

Rather than spend too much time analysing, we decided to spend an afternoon setting up CKAN together. CKAN requires Linux. If you’d like to install it, you’ll need to have root access, and be familiar with installing packages, creating directories, changing permissions, and other command line basics.

Our first step was to set up a basic cloud server instance on Rackspace. This wasn’t anything special, just a fresh Ubuntu 12.04 install.

CKAN can be installed from a package or from source. Considering we might have a play with the code base at some point, we decided to install from source, following this guide.

Over SSH, the first few steps were to install the relevant packages and set up a Python virtual environment. From inside the virtual environment we used the pip command, which clones the repo and installs CKAN from GitHub. In the process we got a few warnings, but nothing that needed to be dealt with.

Once you’ve installed the Python requirements you’ll need to set up a PostgreSQL database and add the database user details to a config file. The list of steps is deceptively long, as you’ll also need to install Solr for text search. This isn’t a step you can skip! We thought we could get CKAN going for testing purposes without it, but it won’t run without a Solr instance.
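Since CKAN won’t start without Solr, it can be handy to confirm that something is answering at the Solr address before going any further. The snippet below is only a rough sketch: the function name `solr_is_up` is made up for illustration, and the URL is an assumption (a local Solr on port 8983), so swap in whatever address your own instance uses.

```python
import urllib.request


def solr_is_up(url, timeout=5):
    """Return True if an HTTP GET to the given URL succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error responses.
        return False


if __name__ == "__main__":
    # Assumed address; use the host, port and path your Solr instance listens on.
    solr_url = "http://127.0.0.1:8983/solr"
    print("Solr reachable:", solr_is_up(solr_url))
```

A `True` here only shows that a web server answered at that address. It doesn’t prove that the CKAN schema has been loaded into Solr.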

We followed the single instance Solr guide as suggested. Everything ran smoothly until we tried to start the Jetty server. We got the “Could not start Jetty servlet engine because no Java Development Kit….” error which you’ll see in the note. Our Java JDK package was installed in an unexpected place, so we had to hunt it down and set its location in the Jetty configuration file accordingly.
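One way to hunt down a JDK is to start from the `javac` binary, follow any symlinks and walk up two directories. Here is a minimal sketch of that idea. The helper name `java_home_from_javac` is hypothetical, and it assumes the usual `<JDK root>/bin/javac` layout, which may not hold on every system.

```python
import os
import shutil


def java_home_from_javac(javac_path):
    """Guess the JDK root directory from the location of the javac binary."""
    # Follow symlinks, since /usr/bin/javac is often a chain of links to the real JDK.
    real_path = os.path.realpath(javac_path)
    # Assumes javac lives in <JDK root>/bin/javac, so go up two levels.
    return os.path.dirname(os.path.dirname(real_path))


if __name__ == "__main__":
    javac = shutil.which("javac")
    if javac is None:
        print("javac not found on PATH; is a JDK installed?")
    else:
        print("JDK candidate:", java_home_from_javac(javac))
```

The printed directory is only a candidate for the Java path in the Jetty configuration file, so it’s worth checking it by eye before using it.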

Moving on to step 6 in the guide, we used the paster command to create the database tables. N.B. when you’re doing this, remember to do it inside the Python virtualenv.
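If you’re not sure whether the virtualenv is active, you can ask the Python interpreter itself. This is a small sketch rather than anything from the guide, and the function name `in_virtualenv` is just illustrative. It relies on `sys.real_prefix` (set by older virtualenv releases) and on `sys.prefix` differing from `sys.base_prefix` (venv and newer virtualenv releases).

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print("virtualenv active:", sys.prefix)
    else:
        print("No virtualenv active; activate it before running paster.")
```

Run it with the same `python` you get on the command line. If it reports no virtualenv, paster is probably being picked up from the system install, or not found at all.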

Step 7, setting up a datastore, is optional. The datastore extension enables you to use a PostgreSQL database to store structured data. If you’ve already got data structured and hosted elsewhere then you can probably skip this. The guide doesn’t mention it, but there’s also a filestore extension you can use to upload and host files through CKAN. This can be done locally or with a cloud storage solution like Amazon’s S3.

For the final steps, linking to who.ini was easy, but we ran into some trouble using paster. After messing around for a moment we realised we weren’t in the Python virtual env (not the first time that caught us out). Once we were back in and ran ‘paster serve’, we had our first deployment of CKAN!

CKAN's first intro page

Overall, the guide is quite comprehensive. If you’ve got a fresh install and full access, and you don’t need to stray from the path too much, you should be fine.
