databases, optical codes, databases, databases...

Since all the Institute action wrapped up in August and September, I’m back to some database projects that have been asking for my attention for a while. Back in May I posted about optical scanning. First order of business now is to take this project back up again and get rolling. Some backstory: I don’t directly work with our artifact collections, but we’re a very small shop. And since all information is connectable by state site number, it’s in my diabolical master plan to get all agency databases talking to each other. And, in keeping with what I’ve come to learn is a big part of the Agile [1] philosophy of project management and software development, I wanted to build something as simple as possible that works at a base level. We can increase complexity later.

So when there was funding available in the summer to hire folks to inventory boxes, I whipped together a plan with Collections colleagues to inventory boxes one by one (which had never actually been done before), give them each a scannable optical code identity, and rebuild the Access database into something relational and easy to update.

We bought a thermal printer for box tags and the inventory folks recorded all site numbers in each box along with the boxes’ location. When incorporated with existing data, we’ll have a good handle on collections inventory.

I ran into a hitch, though. Data Matrix encoding seems to be quite proprietary. This hurts my little open heart. I know my employer doesn’t have any more funds earmarked for this project, so I’m determined to do well with what we’ve got and look for open solutions. After some research, I came upon this fantastic paper in Biodiversity Data Journal (an open science journal) on the Makelabels code from Virginia Tech, available on GitHub.

The software from the printer manufacturer does have this capability, but a) locating updated software releases from Zebra is a nightmare, and b) the codes look good, but DON’T SCAN. Blast. So my next order of business is to generate a bunch of codes that work and begin to tag boxes. After some research and consideration of our fiscal constraints, I decided to use non-adhesive tags and place them into the existing sticky sleeves on each box, with a duplicate tag in a polyethylene bag inside the box.

I’m also going to dive back into restructuring the old Access database with inventory information and making things relate. Ultimately the agency would be very well served by a more robust system, but Access will do to meet immediate needs of tracking box locations, loan information, and the like.
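
For the curious, here is a rough sketch of what "making things relate" could look like. It uses Python's built-in sqlite3 module as a stand-in for Access, and every table name, column name, and ID format in it is made up for illustration; none of it is our actual schema. The idea is just that boxes and site numbers each get their own table, and a third table links them, since one box can hold material from several sites and one site can be spread across several boxes.

```python
import sqlite3


def build_inventory_db(path=":memory:"):
    """Create a small box inventory database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE box (
            box_id   TEXT PRIMARY KEY,   -- the value encoded on the box tag
            location TEXT NOT NULL       -- where the box is shelved
        );
        CREATE TABLE site (
            site_number TEXT PRIMARY KEY -- state site number
        );
        -- Link table: one row per (box, site) pairing.
        CREATE TABLE box_site (
            box_id      TEXT NOT NULL REFERENCES box(box_id),
            site_number TEXT NOT NULL REFERENCES site(site_number),
            PRIMARY KEY (box_id, site_number)
        );
        CREATE TABLE loan (
            loan_id   INTEGER PRIMARY KEY,
            box_id    TEXT NOT NULL REFERENCES box(box_id),
            borrower  TEXT NOT NULL,
            date_out  TEXT NOT NULL,
            date_back TEXT               -- NULL while the loan is open
        );
        """
    )
    return conn


def record_box(conn, box_id, location, site_numbers):
    """Store one inventoried box and the site numbers found in it."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO box (box_id, location) VALUES (?, ?)",
            (box_id, location),
        )
        for site_number in site_numbers:
            conn.execute(
                "INSERT OR IGNORE INTO site (site_number) VALUES (?)",
                (site_number,),
            )
            conn.execute(
                "INSERT OR IGNORE INTO box_site (box_id, site_number) "
                "VALUES (?, ?)",
                (box_id, site_number),
            )


def boxes_for_site(conn, site_number):
    """Return (box_id, location) for every box holding a given site."""
    return conn.execute(
        "SELECT b.box_id, b.location "
        "FROM box AS b JOIN box_site AS bs ON bs.box_id = b.box_id "
        "WHERE bs.site_number = ? ORDER BY b.box_id",
        (site_number,),
    ).fetchall()
```

With something like this in place, scanning a tag gives you a `box_id`, and a call such as `boxes_for_site(conn, "SITE-002")` (a placeholder site number) answers the question "where is everything from this site?" The same link-table pattern should carry over to Access, where the relationships are drawn between tables rather than written as SQL.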

When this is wrapped up, we’ll be that much closer to being able to integrate all these datasets inside our agency and to really tap into some of the power of this information. Next stop, scanning, OCRing, and indexing the vast collection of nonstandard artifact catalogs. Visualizations, keyword queries, endless possibilities. A person can dream, right?


  1. Agile side note: getting into research on project management strategy, I’ve come to find that I have pretty good intuition on how to set these kinds of projects up. Small, workable chunks, lots of demos, constant reworking. And this is totally obvious and boring to people who do anything tech-related, but keep in mind that I’m an archaeologist in a bureaucracy, so, novel!