On mass digitization for the UNC library (Kirill Fesenko, 2007)

Mass digitization of collections has become an attractive area for leading academic libraries. By digitizing books and publishing them on the Internet, libraries seek to experiment with new technologies, improve access to their collections, provide innovative support for education and research, and increase the visibility and competitiveness of their universities. Our Library has also been researching these opportunities, with the aim of starting a one-year experimental large-scale digitization project. Our major goals for this project are the following: to experiment with new digitization technologies, processes and related workflows; to research new ways of publishing digitized material online and providing Internet access to it; and to explore new opportunities for developing online collections cooperatively with other academic and cultural institutions.

With these ideas in mind, our staff examined the three main mass digitization vendors and their approaches to working with libraries: Google, Microsoft and the Internet Archive. We met in person or held phone conferences with representatives of all three organizations, reviewed their contracts and consulted with colleagues at other institutions. Of the three vendors and their approaches, our Library chose the Internet Archive for the pilot project. Google's and Microsoft's contracts for the mass digitization of library collections would not support our goals for the pilot project, for several reasons. First, under both Google's and Microsoft's approaches, the selection of titles for digitization would rest with the vendor rather than with the Library. Second, both contracts restrict the Library's ability to use the digitized books. For example, these contracts require the Library to block Internet search engines from accessing the digitized books on its servers. Third, Google requires the books to be shipped to its own digitization centers, so we would not be able to experiment with the equipment. In Microsoft's case, the contract requires the installation of at least 10 digitization stations, which would commit us to a larger workload than we would like for the pilot project. In addition, Google's practice of scanning library books that are still in copyright without explicit permission from publishers led a group of authors and publishers to sue the company for copyright infringement. To our knowledge, this case is still pending in federal court.

The Internet Archive's proposal, on the other hand, does not carry the restrictions described above. The Internet Archive will provide the Library with one high-speed scanner (known as a "scribe") and an operator, and will publish the digitized books on its web site for public access. Importantly, the Library will retain the flexibility to choose which books are digitized and to reuse them on its own web sites. This project will also help the Library connect with the Open Content Alliance (OCA), a reputable group of cultural, technology, nonprofit and governmental organizations working closely with the Internet Archive on similar projects. Finally, through this experimental one-year project we plan to develop the knowledge and experience needed to define a long-term approach to the mass digitization of our collections.