The modern opens the past
Digital data-mining can greatly benefit archaeological studies, scholar says
It turns out that the past has a future. That is, archaeology has a chance to transform its print-bound publishing norms into a digital-age machine for gathering, sharing, and analyzing vast stores of field data. Such information, often paid for by public funds, is now more likely to be hoarded and parceled out to journals that are obscure or too expensive for most people.
Harvard-trained anthropologist Eric Kansa, Ph.D. ’01, brought this message and more to a Harvard audience in the inaugural lecture in a series organized by the Digital Futures consortium. The group began this summer as a One Harvard gathering of experts interested in how the digital age will change scholarship.
Kansa is “a true revolutionary” and “an ethnologist of academic culture,” said Digital Futures co-chair Judson Harward, director of research computing for the arts and humanities at Harvard University Information Technology. And his message of helping archaeology into the present is important for the future of the profession, he added. “No academic field has a longer history of publishing. But few have so conservative a publishing tradition.” The lecture, “A More Open Future for the Past,” was delivered Sept. 10 at Science Center Hall A.
Kansa teaches at the University of California, Berkeley, and co-directs the nonprofit Alexandria Archive Institute in San Francisco. He’s at the forefront of a movement to shift archaeology publishing from expensive print platforms with limited reach to open-source venues that he said will broaden usage, encourage innovation, and improve scholarship by creating vibrant communities of researchers.
“The gold standard of professional communication in academic archaeology is a peer-reviewed article in an established journal,” he said. Yet the digital age already has its examples of online academic journals that are prestigious and highly accessible, and that attract the best “symbolic capital,” including Nobel Prize-winning contributors. PLOS Biology is at the forefront, said Kansa, and for a decade has been marrying print-age gravitas with digital-age scholarship.
Archaeology needs the same kind of forward-looking champions. Kansa looked out over the Harvard audience, which included tenured faculty, and said, “Please. Start an open-access journal.”
He was quick to supply a broader context. “The need for transformation in archaeology is part of a bigger suite of movements … to reform academic communication generally,” he said. “This is just a small part of the big picture.”
The broadest context relates to all the movements, inside and outside academia, that urge open access and open data throughout the online world. One landmark, said Kansa, was the 2003 Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities.
Some of those contexts point to progress in open access: online social habits, including bookmarking, hyperlinks, and open annotation; text-mining, now a common tool in research; and sharing strategies — Kansa’s forte and emphasis — which require ways of structuring data so it can be distributed efficiently.
In archaeology, and in academia generally, the trends include rising costs of traditional print publishing, up 300 percent since 1985; a slump in public funds for academic research and a countervailing explosion in student debt, which has risen more than 500 percent in the last decade; a shift in the academic environment away from tenured faculty that has caused graduate students to flee academe or seek more “alt-ac” (alternative academic) positions; and entrenched publishing norms that put up editorial and legal impediments to sharing data.
Before Internet activist Aaron Swartz committed suicide, said Kansa, he faced federal felony charges with sentences possibly totaling more than 30 years relating to his download of 3.5 million academic articles from JSTOR, a pay-walled repository for archiving such articles. “For comparison’s sake,” Kansa said, “you get 20 years for human trafficking.”
All these woes explain what Kansa said was an increasingly risk-averse academic environment and its “dysfunctional incentives” to publish, to share information, and to advance ever-changing careers. Traditional journals share information only in restricted ways. Look at it like an archaeologist, he said. “If you dig up a site, you’re actually destroying it. So data is all you have left.”
But there are few incentives to conform to data-recording standards that would make sharing easier. As an example, Kansa said, one researcher had multiple data spreadsheets recorded in a private code. Only the researcher and his team knew, for instance, that the number “14 means ‘sheep,’” he said.
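To make the private-code problem concrete, here is a minimal sketch in Python of publishing a codebook alongside a spreadsheet so that outsiders can read it. Only the pairing of 14 with "sheep" comes from the lecture; the column names, the specimen IDs, the second code, and the `decode_rows` helper are hypothetical, invented for illustration, and do not describe any tool Kansa presented.

```python
import csv
import io

# Hypothetical codebook. Only the "14 means sheep" pairing comes from the lecture.
TAXON_CODEBOOK = {"14": "sheep"}


def decode_rows(rows, codebook, field="taxon_code"):
    """Return copies of the rows with a readable label added beside each private code."""
    decoded = []
    for row in rows:
        code = row[field].strip()
        # Codes missing from the codebook are flagged rather than guessed.
        label = codebook.get(code, "unknown code")
        decoded.append({**row, "taxon_label": label})
    return decoded


# Invented sample data standing in for a field spreadsheet exported as CSV.
RAW = "specimen_id,taxon_code\nA-001,14\nA-002,99\n"

rows = list(csv.DictReader(io.StringIO(RAW)))
result = decode_rows(rows, TAXON_CODEBOOK)

for row in result:
    print(row["specimen_id"], row["taxon_code"], row["taxon_label"])
```

Without the codebook, the second column is just a list of numbers. With it, anyone can read the first row as a sheep bone, and the unexplained code in the second row is flagged instead of silently passed along.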
On the other hand, publishing vast bodies of archaeological data gathered in the field, if done properly, would expand the quality, scale, and accessibility of information that reveals how humans once lived. The point, he said, is to publish data, whether old or new, “so it is fit for wider consumption, for wider use.”
He said that the Britain-based Archaeology Data Service is archiving some open data content that is in accessible file formats, and can be legally transferred for free. At Harvard, the Shelby White and Leon Levy Program for Archaeological Publications supports open data publication. (Kansa is a board member at the Harvard program, which gives out nearly $1 million a year to fund publications of archaeological excavations.) More broadly, the Obama administration has recognized that “open access to public research is an important goal,” he said.
But there’s the rub. Data has to be “structured,” said Kansa, edited into format architectures that make it easy to search, compress, and analyze. “If you care” about data, he said, “you have to care more and more about structured data.”
That’s Kansa’s specialty. His Open Context nonprofit promotes “data-sharing as a form of publishing,” he said. It has 38 projects going now, on topics ranging from Asian stoneware jars to maps of animal bones that help determine the rise and spread of agriculture. The California Digital Library, a prestigious repository and arm of the University of California, handles Open Context’s data preservation and archiving.
There are other ventures exploring data publication, said Kansa, including the for-profit Journal of Open Archaeology Data.
But that is just the beginning. Kansa showed two views of the Web of Data map from 2009 and 2011 — a blooming, flower-like graphical representation of sites that carry open data from a variety of fields, arranged so machines can access and process linked data. “We need archaeology on that map,” he said. “If you’re not [on the map], you’re working in isolation.”
“This is really a lot of work,” said Kansa of wrangling data sets into order. Complicating the issue is archaeology itself. The profession is rife with data rich in numbers, words, photos, and graphics, as, for example, in studying distribution patterns for animal bones or Asian stoneware.
“Data are hard,” he said. “Data never speak for themselves,” but have to be formatted and clarified in much the way that an editor prepares text — hence the data publishing analogy.
Once data are available in clarified formats, they begin to bring together scholars from diverse disciplines in ways that make the world a more understandable, collaborative, and interconnected place. After all, the original intent of the Web was to create communities of inquiry, said Kansa, since “You don’t just share data to make it useful.”
On a smaller scale, those intersecting communities are already evident at Harvard. Co-sponsoring the Kansa lecture with the Digital Futures consortium were DARTH Crimson, Harvard’s Digital Arts and Humanities initiative, the Harvard Library, the Harvard College Library, the Peabody Museum of Archaeology and Ethnology, the Harvard Department of Anthropology, the FAS Standing Committee on Archaeology, and the Harvard Semitic Museum.
The lecture is available for viewing online.