data collection (for all kinds of things) with KoBoToolbox

I’ve been dabbling quite a bit with data collection this past year. There are a lot of ways to skin this cat, but I’ve settled on KoBoToolbox for the reasons laid out below.

My MSUDAI colleague Ben Carter has done a wonderful job of documenting his experiences with KoBoToolbox for field data collection from the perspective of an archaeologist. He provides some wonderful introductions here and here. Read these first or come back to them.

KoBoToolbox is a highly versatile, free, open source data collection tool originally designed for humanitarian organizations. Its back end is similar to Open Data Kit (ODK), running XForms/XML, but KoBo adds really slick Enketo forms in the browser, making it compatible with any device or operating system. It’s also hosted, which is a double-edged sword, but fantastic for getting a project up and running almost instantly.

You can set up a project in several ways. The KoBoToolbox website provides a super simple GUI with drag and drop selections to set up values. You can do a great deal from here, but for more fine-grained control of settings and logic you can also download the structure as an XLSForm, add more code, and upload back into the system. Or you can just download the straight XML and modify it directly, if that’s your thing.

the KoBo form builder

Projects so far

A lot of my experience with the platform has been either on a test basis or for temporary projects. Since the data resides in the cloud (in this case on Harvard servers somewhere), this can get tricky for a state employee like myself. For that reason, I can’t currently use KoBoToolbox for any data that is in any way sensitive (including archaeological site locations). It is possible to run the platform on your own Linux server or via Docker, but my agency doesn’t have compatible resources to make that happen at the moment. So I’ve set up quite a few projects that are really only prototypes. Hopefully I can find a way to make them official soon.

Site visits

a screenshot of the basic site visit form

This project is designed for quick recording of visual observations at a site, without excavation. This is the kind of thing our regional archaeologists do all the time and it’s been a challenge to get the information back into the site inventory database. This form includes a GPS reading and multiple photo attachments. One thing I love is that the GPS reading also displays accuracy, so you can identify a comfortable threshold for error and grab points until you get a good one. You can set some metadata to be automatically collected, such as date/time, device ID, etc. If you’re in the middle of nowhere without a data connection, the whole system still works as long as you have the form loaded. Any records you create will queue in the browser until you’re back to a connection.
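The "grab points until you get a good one" idea can be sketched outside the form as well. The snippet below is only an illustration, not part of the actual site visit form. It assumes readings arrive as ODK-style geopoint strings (four space-separated numbers: latitude, longitude, altitude, accuracy in meters). The 5 m threshold, the function names, and the sample coordinates are all made up for the example.

```python
def parse_geopoint(value):
    """Split a geopoint string 'lat lon altitude accuracy' into named floats."""
    lat, lon, alt, acc = (float(part) for part in value.split())
    return {"lat": lat, "lon": lon, "altitude": alt, "accuracy_m": acc}


def first_acceptable(readings, max_error_m=5.0):
    """Return the first reading whose reported accuracy is within the threshold."""
    for raw in readings:
        point = parse_geopoint(raw)
        if point["accuracy_m"] <= max_error_m:
            return point
    return None  # nothing good enough yet, so keep collecting


# Made-up sample readings; accuracy improves as the GPS settles.
readings = [
    "37.5407 -77.4360 50.0 18.0",
    "37.5408 -77.4361 51.0 9.5",
    "37.5408 -77.4361 51.0 4.2",
]

best = first_acceptable(readings)
print(best)
```

The same check could be applied after the fact to exported records, to flag any point whose accuracy falls outside whatever threshold you are comfortable with.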

As I mentioned before, site location stored on someone else’s servers is a no-go. So unfortunately I have not been able to flesh this out and deploy it to any real degree.

Survey data collection

View the form

This one was similar to the site visits project above, but designed at a more granular level to collect information about Phase I survey. I got to take it for a dry run last fall when I was out in the field and it was SO FUN, even with the mild torture of knowing that I couldn’t “officially” implement it. Another element of KoBoToolbox that really shines is the on-the-fly analysis. I could visualize all points on a map at any time and filter them by other data fields. I had a running map of positive shovel test pits, historic positives, and lithics. We were also collecting more accurate GPS points with a handheld unit, but the cell phone and tablet coordinates were completely adequate as reference data.

KoBoToolbox allows for very fine, granular control over data constraints, cascading picklists, skip logic, etc. If, in a perfect world, I could find a way to safely (and compliantly) store locational data, I’d take the time to sync up values and constraints with our core VCRIS site database terms and artifact vocabularies.
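To make those three features concrete, here is a rough sketch of how they are usually expressed in an XLSForm, written out as Python data so the rows can be printed or pasted into a spreadsheet. The column names (type, name, label, relevant, constraint, choice_filter, list_name) are standard XLSForm columns. Every field name, choice, and limit below is hypothetical and not taken from the actual survey form or from VCRIS.

```python
SURVEY_COLUMNS = ["type", "name", "label", "relevant", "constraint", "choice_filter"]
CHOICE_COLUMNS = ["list_name", "name", "label", "artifact_class"]

# "survey" sheet: one row per question (all names here are hypothetical).
survey = [
    {"type": "select_one stp_result", "name": "stp_result",
     "label": "Shovel test result"},
    # Data constraint: depth must fall within a plausible range.
    {"type": "integer", "name": "depth_cm", "label": "Depth (cm)",
     "constraint": ". >= 0 and . <= 200"},
    # Skip logic: only ask about artifacts when the test pit was positive.
    {"type": "select_one artifact_class", "name": "artifact_class",
     "label": "Artifact class", "relevant": "${stp_result} = 'positive'"},
    # Cascading picklist: types are filtered by the class chosen above.
    {"type": "select_one artifact_type", "name": "artifact_type",
     "label": "Artifact type", "relevant": "${stp_result} = 'positive'",
     "choice_filter": "artifact_class = ${artifact_class}"},
]

# "choices" sheet: the extra artifact_class column drives the cascade.
choices = [
    {"list_name": "stp_result", "name": "positive", "label": "Positive"},
    {"list_name": "stp_result", "name": "negative", "label": "Negative"},
    {"list_name": "artifact_class", "name": "lithic", "label": "Lithic"},
    {"list_name": "artifact_class", "name": "historic", "label": "Historic"},
    {"list_name": "artifact_type", "name": "flake", "label": "Flake",
     "artifact_class": "lithic"},
    {"list_name": "artifact_type", "name": "point", "label": "Projectile point",
     "artifact_class": "lithic"},
    {"list_name": "artifact_type", "name": "ceramic", "label": "Ceramic",
     "artifact_class": "historic"},
    {"list_name": "artifact_type", "name": "glass", "label": "Glass",
     "artifact_class": "historic"},
]


def to_tsv(rows, columns):
    """Render rows as tab-separated text, leaving unused cells blank."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(row.get(col, "") for col in columns))
    return "\n".join(lines)


print(to_tsv(survey, SURVEY_COLUMNS))
print()
print(to_tsv(choices, CHOICE_COLUMNS))
```

Syncing with an agency vocabulary would mostly mean generating the choices rows from the database's own term lists instead of typing them by hand.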

Records inventory

View the form

One of my huge, ongoing projects has been to create a unified database for archaeological materials collections, but I’ve since broadened it to include records because it all lives in the same storage area, at least for the moment. Our “field notes” collection has been managed by different people in different physical locations over the years, and there has never been a finding aid short of a rough ordering by site number in boxes. As we move toward digitization and face critical physical storage issues, it’s time to figure out exactly what we have and where it is.

Our relational database is currently in Microsoft Access. Hopefully we can migrate to something more powerful and stable in the future, but for now it’s what I’m working with. I chose to set up data collection in KoBoToolbox instead of in an Access form primarily because we often get funds for short-term help on very short notice, sometimes without the ability to provide IT equipment quickly to temporary staff, interns, and volunteers. Since we don’t have public wi-fi available in our building, KoBo seemed like a great choice because I could set multiple people up to work on any device at hand, regardless of connectivity and with no required technical proficiency. Also, concurrent work in an Access database is a pain under our current configuration.

This form isn’t particularly sophisticated in terms of skip logic or cascading picklists, but I did want to assign a Recordset ID to each new entry. My initial plan was to code in an auto-incrementing value for each new record, but I couldn’t find a good way to do so. There are some unique IDs automatically generated in KoBo, but one is random (and very long), and the Index value would be perfect except that it automatically resets if the form is updated and redeployed. Big problem.

Basic documentation for KoBoToolbox and XLSForms is great, but advanced functions aren’t very well documented. Calculation fields are incredibly powerful, but there doesn’t appear to be an authoritative source for commands, functions, and parameters. In the end I discovered that I could code in functions as default values for any field, but it is not possible to reference an entry in an earlier record. As such, I decided to create Recordset IDs as a random number with a prefix.
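The original calculation only survives here as a screenshot, so the following is not the formula used in the form. It is a minimal Python sketch of the same idea (a fixed prefix joined to a random number), plus a quick check of how likely two records are to draw the same ID. The "RS-" prefix and the six-digit width are assumptions chosen for the example.

```python
import random

PREFIX = "RS-"  # hypothetical prefix, not the one used in the real form


def make_recordset_id(rng, digits=6):
    """Join the prefix to a zero-padded random integer with the given width."""
    number = int(rng.random() * 10**digits)
    return f"{PREFIX}{number:0{digits}d}"


def collision_probability(n_records, digits=6):
    """Chance that at least two of n_records share an ID (birthday problem)."""
    space = 10**digits
    p_all_unique = 1.0
    for i in range(n_records):
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique


rng = random.Random(42)  # seeded only so the example is repeatable
ids = [make_recordset_id(rng) for _ in range(173)]
print(ids[:3])
print(f"collision chance for 173 records: {collision_probability(173):.2%}")
```

With six digits, the chance of at least one duplicate among 173 recordsets works out to roughly 1.5 percent, and it climbs quickly as the collection grows. Since a form cannot look back at earlier records, duplicates would have to be caught after export, for example with a uniqueness check when the data is loaded into the relational database.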

There are two ways to get this value to display on the form. First, I edited the Calculation field of an exported version of the XLSForm.

XLSForm with concatenation and random number calculation

That worked, but I needed to play around with decimal places and formatting, and in the process I discovered that I could also add this into the label section of any field through the KoBoToolbox dashboard GUI.

calculation in the label of the KoBo field

So, here we are! The other unexpected and lovely advantage of KoBo is once again the integrated analysis. We’re not too far into this project, but it’s clear from the outset that we have specific proportions of different record types and media. Having this data will allow us to be very agile as we develop strategies to integrate subsets into digital repositories, look for offsite storage, and more. Further, once this is integrated into the relational database, we’ll have an instant snapshot of all records and material collections on hand, and we can then connect that information back to the broader history of survey data associated with each site. From there we can integrate all this into our online system so CRM archaeologists and researchers can instantly see what we’ve got and how much. I’m dreaming big.

after a few days of work, we've got a breakdown of types within 173 recordsets
