Approaches to Openness: Digital Archaeology Data in Virginia and Public Engagement
Jolene Smith, Virginia Department of Historic Resources
Society for Historical Archaeology, January 2016
I drafted this paper as a public document. Check out the wonderful comments.
Abstract: Virginia’s archaeological site inventory contains detailed information on nearly 43,000 sites in datasets maintained by the Department of Historic Resources (State Historic Preservation Office). At times, responsibility to protect sensitive sites from looting and vandalism seems to run counter to providing information to the public about Virginia’s archaeology. But the two are not mutually exclusive. This paper will explore Virginia’s historical approach to archaeological data dissemination with regard to both risks and benefits. It will also outline future initiatives to maximize site data availability for different types of users. By leveraging archaeological site information to create a sense of stewardship among local governments, development interests, and the general public, we may be able to protect Virginia’s buried heritage more effectively than ever before.
Archaeological Data in Virginia
 As of this moment, Virginia contains exactly 44,130 archaeological sites officially recorded in the statewide inventory at the Department of Historic Resources. I have been privileged to manage this dataset since 2008. The Archives at DHR includes everything in Virginia recorded with a Smithsonian Trinomial site number. It’s nearly comprehensive, but we do miss out on information from some academic archaeology, investigations conducted by private foundations, etc. All that is to say we have a lot, but we don’t have everything.
 The Department of Historic Resources is the State Historic Preservation Office for Virginia, so much of the data we collect reflects the needs of compliance with environmental regulations as well as supporting the workflow for listing historic properties on the National Register of Historic Places. The vast majority of our data has been created in response to the growth of the Cultural Resource Management industry after the 1966 introduction of the National Historic Preservation Act.
Historically, our agency’s approaches to data sharing have been very much ad hoc. Generally, research required one-on-one contact between a researcher and SHPO staff, although security varied depending on staffing. In the early days, researchers could pull records straight from files, but as time progressed, Archives staff provided more oversight (Barber 2015).
All this changed as DHR transitioned to electronic systems. In 1991 the agency began using IPS (Integrated Preservation Software), an offline relational database developed with the National Park Service, distributed among individuals and institutions, and manually synced as needed (Virginia Department of Historic Resources 2001). While a game changer, this system was about as beloved as you might think. For the first time, archaeological site records were digitized, although only for new work. NPS encouraged new kinds of multi-state analysis and new ways to use information, and envisioned large-scale data analysis using a standardized system (Miller 1994); however, IPS was never widely adopted.
Between the late 1990s and early 2000s, DHR staff and contractors transcribed all basic site-level information from paper records and migrated IPS data into the new Historic Resources Data Sharing System (DSS), one of the first web databases in the country with comprehensive state archaeological site data. DSS access was limited to professional archaeologists, planners, environmental firms, and other qualified individuals. For the first time, users had access to statewide information with querying power and a full web GIS. In the mid-2000s, we also began distributing statewide, county, and project-specific geospatial data (via GIS shapefiles).
In 2013, DHR transitioned to VCRIS, a new, more powerful web database and map. We introduced tiered access in order to allow for conditional security and data restriction (some users are authorized to access the statewide archaeological and architectural dataset, others are restricted to architecture, etc.). VCRIS includes a public viewer, although it is currently limited to architectural point data, as well as a full GIS feature service that allows authorized users to access live data in external (ESRI-friendly) applications. Although the functionality is not refined, it is possible for users to export very large datasets as well as to query by keyword in large open text fields. For example, querying the assemblage description for a specific type of artifact, such as “copper bead,” is now possible for authorized users beyond DHR’s database administrators. It’s not pretty, but it’s possible.
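To make the keyword example concrete, here is a minimal sketch of a case-insensitive phrase search over an open text field. It is an illustration only: the records, the field name `assemblage_description`, and the function `keyword_search` are all hypothetical and do not reflect the actual VCRIS schema or query interface.

```python
# Hypothetical records: identifiers, field names, and descriptions are
# invented for illustration and are not drawn from VCRIS.
SITES = [
    {"site_id": "EX-0001",
     "assemblage_description": "Glass trade beads, one rolled copper bead, lithic debitage."},
    {"site_id": "EX-0002",
     "assemblage_description": "Whiteware, cut nails, window glass."},
    {"site_id": "EX-0003",
     "assemblage_description": "Copper Bead fragments and shell-tempered ceramics."},
]


def keyword_search(records, field, phrase):
    """Return the records whose free-text field contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [r for r in records if needle in r.get(field, "").casefold()]


matches = keyword_search(SITES, "assemblage_description", "copper bead")
print([r["site_id"] for r in matches])
```

A plain substring match like this will miss variant spellings and plurals split across words, which is part of why free-text querying is possible but not pretty.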
In the DSS era, access to information was generally limited to users who paid an annual or quarterly access fee, with waivers for “strategic partners” and specific projects on a case-by-case basis. VCRIS operates on a similar fee-for-service model (and its maintenance and enhancements are completely funded by these fees), although DHR has formalized complimentary access and made it easier for academic researchers to access data at no cost.
While we are very electronic, many researchers (both professional and nonprofessional) come in person to the DHR reading room. Security has varied over the years, but has been largely reliant on reading room staff to ask specific questions and verify the intentions of information seekers before data is released. Recently we’ve introduced a paper form that archives visitors must sign that collects information about professional qualifications and reason for visit in order to reduce the need for sometimes awkward interrogation by desk staff.
To sum up this history lesson and to put all this back into context, DHR’s data can be seen as “open” or “closed” in a few different ways:
Ease of access: How easy (or difficult) is it for a researcher to get the information they need?
Audience with access: Can someone who works at DHR get the information? How about a CRM archaeologist? A graduate student out of state? An interested member of the public?
Usability of the data: Can a researcher easily get the information out of its parent document for further analysis?
Cost of access: If the information is not free, is the cost easy to absorb? Does the cost of accessing the information disincentivize thorough research?
Toward An Open Approach
In 2012, Joshua Wells and David Anderson approached us with an intriguing request: send our whole database (or a large chunk of it) off to be a part of DINAA (the Digital Index of North American Archaeology), where it would be linked with databases from other states and publicly accessible, with specific locations and sensitive details protected. When I received the initial communications, I was very excited by the concept. I had been doing some very basic research on interoperability of databases, and this was coming at a perfect time. I was, however, a bit nervous about how my colleagues would react to such a dramatic shift toward openness in what has traditionally been a fairly restrictive environment. I made a case to the State Archaeologist and DHR’s Archivist and they agreed to participate. As of today, select fields from Virginia’s entire inventory as of 2013 are available for anyone to access through DINAA, and our time periods and site type descriptions have been integrated into a linked open data model.
DHR has over 8,000 archaeology reports digitized as PDFs; however, up to this point they have been stored in network folders accessible only to DHR staff, with very little metadata beyond what is stored in filenames. A critical shortage of available storage space as well as a major opportunity for public outreach inspired an ongoing proof-of-concept digital repository project. This project will identify a selection of low-risk (mostly because the sites have been destroyed) and high-interest archaeological reports and project documentation for integration into a digital repository with a free, public web portal. The repository will be built to current digital archives standards and will also incorporate elements of linked open data to connect the amazing resources at DHR with the wider world.
Earlier this year, my small agency had an all-staff meeting. This time around, our keynote speaker was David Givens, archaeologist at Jamestown. He gave us a really wonderful overview of the top stories from Jamestown. He spoke to us about researching a silver reliquary and trying to identify its contents based on imaging. There was a lead item inside the box and its identity was unclear. After much online searching, he came upon an example of a lead ampulla in the digital catalog of the Museum of London. It was a breakthrough. That was what was in the box at Jamestown. There were oohs and ahhs from the DHR audience. And I thought to myself, “we HAVE this.” Well, not this pictured 15th century lead ampulla exactly, but we have so many pieces of so many puzzles in our collections. Our materials could be making national and international news just like this.
Risks of Rapid Change and Downsides to Openness
Blowing Virginia’s archaeological dataset wide open is a pretty radical concept. Just think about the imagery. It’s explosive. Call me a fuddy duddy, but I prefer my explosions to be of the controlled sort.
Reservations about opening up archaeological data generally fall into three groups:
Someone might steal my ideas. From the perspective of a state-level archive, this point actually hasn’t been raised very frequently. When materials are submitted to us, they become publicly accessible to a limited degree by default. However, it has been very challenging to keep up with academic work (namely theses and dissertations) and that’s something we could improve upon in the future. The open scholarship community is active and growing and can provide much more on why this concern isn’t or shouldn’t be valid. See Eric Kansa’s recent essay entitled “Click Here to Save Archaeology” for more on this issue and others only superficially touched upon in this paper (2015).
We won’t be able to make any money from this. Sadly, this point is one that hits close to home. I do not mean to imply that my agency is out to make a tidy profit from archaeologists just trying to do some research, but the fact of the matter is that our digital information infrastructure and in-person reading room are currently structured to support themselves. VCRIS is currently funded exclusively through license fees, and DHR Archives sees ever-dwindling revenue from in-person visits (copies, scanning, etc.). Restructuring funding models is a hard sell because it’s hard to do, but I argue we must look for different solutions.
Publishing archaeological information might attract looters. This is a valid concern and one that isn’t taken lightly by SHPO archaeologists or others in the archaeological community. The challenge is finding the appropriate balance: maximizing overall benefit to sites, communities, our agency, and researchers, while minimizing risk. Unfortunately this risk isn’t easily quantifiable, so we’re stuck with best guesses. I propose that the professional looters (at least in Virginia) already know where to look. They’ve identified the locations of Civil War earthworks and campsites. They’ve researched the locations of burial caves and mounds. We may have more to fear from the casual or opportunistic looter. By releasing detailed site information connected to buffered locational data on all but the most sensitive sites, I argue that we can approach this balance.
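One way to picture “buffered locational data” is to replace each site’s precise coordinates with the center of a coarse grid cell before publication, and to withhold the most sensitive sites entirely. The sketch below is only an illustration under stated assumptions: the 0.2-degree cell size, the `sensitive` flag, the field names, and both functions are hypothetical, and it is not a description of how DHR or DINAA actually generalize locations.

```python
import math


def generalize_location(lat, lon, cell_deg=0.2):
    """Snap a coordinate to the center of the grid cell that contains it.

    cell_deg is an arbitrary, assumed cell size in decimal degrees.
    """
    def snap(value):
        return (math.floor(value / cell_deg) + 0.5) * cell_deg

    return round(snap(lat), 6), round(snap(lon), 6)


def public_view(site, cell_deg=0.2):
    """Return a publishable copy of a site record, or None if it is withheld."""
    if site.get("sensitive"):
        # Assumed policy: the most sensitive sites are not published at all.
        return None
    lat, lon = generalize_location(site["lat"], site["lon"], cell_deg)
    return {
        "site_id": site["site_id"],
        "site_type": site["site_type"],
        "lat": lat,
        "lon": lon,
    }


# Hypothetical record; the coordinates are an arbitrary point, not a real site.
example = {"site_id": "EX-0001", "site_type": "camp",
           "lat": 37.5407, "lon": -77.4360, "sensitive": False}
print(public_view(example))
```

Every site falling in the same cell is published at the same point, so the descriptive data stays useful for comparative research while the record no longer supplies the specific coordinates a treasure hunter would want.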
There have been instances in the past when individuals out to find treasure have accessed or attempted to access archaeological records through DHR’s reading room. In one case, a large amount of information on shipwrecks was provided to a treasure hunter without proper screening. In another case, a member of a relic hunting message board proposed a concerted effort to visit DHR and request records under false pretenses (but we check those boards, too). In both of these cases, the bad actors were motivated by finding very specific coordinate information, not comparative research data.
But the adage “you can’t un-ring a bell” comes to mind. If we put this information out there, what if it is too much? What if something goes wrong? I argue that the time to take a series of calculated risks is now.
Positive Implications of Openness
So, let’s for a moment imagine that we’ve made a lot of progress opening up Virginia’s data and are well on our way to utopia. Some significant portion of our collection of over 6 million objects is digitized, linked, and accessible to anyone. Information about time periods and site types for our statewide inventory is public (and we’re already there for the time being thanks to DINAA). The vast body of knowledge held within more than 650,000 pages of archaeological reports is online in machine-readable formats. As they arrive, raw datasets are indexed and published, to be tapped into by researchers worldwide. Publishing our data online won’t solve all our archaeological questions, but it is bound to unlock some doors we don’t even know are there.
By making the information free to access, we also begin to democratize Virginia archaeology to some degree. Indigenous communities and non-professionals may find it possible to explore alternative ways of interpreting the data (Kansa 2015). Researchers who are currently locked out by the restrictive nature of our non-commercial complimentary VCRIS account policy may take advantage of access to information.
As an employee of a government agency, I work within a structure of metrics, performance measures, and beans to count (whether I like it or not). Well, here are so many beans! My pitch to superiors who are rightfully concerned with financial solvency is this: how do we make ourselves as an agency indispensable to the outside world? When we are faced with a political administration that is not sympathetic to archaeology or historic preservation, who will write the letters to the Capitol? I don’t argue that this is the reason to open our data, but I do hold that a widespread investment by outsiders in DHR’s collection of information has the potential to serve a protective function.
Another real benefit of increasing access to our archaeological site information is increasing stewardship for archaeology. If people can learn more about the value of the information in their very own backyards, they are more likely to help protect it. Awareness of our agency and the resources it provides, including technical assistance by regional archaeologists, is good for sites.
References
Archaeology Data Service. “Guides to Good Practice: Main.” Accessed December 31, 2015. http://guides.archaeologydataservice.ac.uk/.
Barber, Michael. Personal Communication. E-mail, December 10, 2015.
Kansa, Eric. “Click Here to Save Archaeology,” 2015. https://ekansa-pubs.github.io/click-here-to-save-archaeology/.
Miller, Diane. “National Register Information Is a Hidden Treasure.” CRM 17, no. 2 (1994): 13.
Seifried, Rebecca. “Linked Open Data for the Uninitiated.” ISAW Papers 7.26 (2014). http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/seifried/.
Virginia Department of Historic Resources. “Guidelines for Conducting Cultural Resource Survey in Virginia,” 2001.