TEI exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

OPAC integration in the era of mass digitization: the MBooks experience

Internal identifier: 000344 (Istex/Corpus); previous: 000343; next: 000345

Authors: Bradford Eden; Christina Kelleher Powell

RBID : ISTEX:3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF

Abstract

Purpose: The purpose of this paper is to provide an overview of the OPAC integration in the University of Michigan's local implementation of materials digitized in the partnership with Google. Design/methodology/approach: The paper provides a discussion of different strategies used in integrating metadata with digital resources and presenting the digital objects to the user in the OPAC. Findings: The paper finds that methods that had served in smaller digitization projects require more automation and error reduction processes in an undertaking of this scale. Increased integration with the OPAC is one approach. Originality/value: Michigan is the first of the Google partners to mount their materials themselves and others involved in mass digitization may be interested in the experience.

DOI: 10.1108/07378830810857771

Links to Exploration step

ISTEX:3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">OPAC integration in the era of mass digitization the MBooks experience</title>
<author wicri:is="90%">
<name sortKey="Eden, Bradford" sort="Eden, Bradford" uniqKey="Eden B" first="Bradford" last="Eden">Bradford Eden</name>
</author>
<author wicri:is="90%">
<name sortKey="Kelleher Powell, Christina" sort="Kelleher Powell, Christina" uniqKey="Kelleher Powell C" first="Christina" last="Kelleher Powell">Christina Kelleher Powell</name>
<affiliation>
<mods:affiliation>Digital Library Production Service, University of Michigan, Ann Arbor, Michigan, USA</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1108/07378830810857771</idno>
<idno type="url">https://api.istex.fr/document/3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000344</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">OPAC integration in the era of mass digitization the MBooks experience</title>
<author wicri:is="90%">
<name sortKey="Eden, Bradford" sort="Eden, Bradford" uniqKey="Eden B" first="Bradford" last="Eden">Bradford Eden</name>
</author>
<author wicri:is="90%">
<name sortKey="Kelleher Powell, Christina" sort="Kelleher Powell, Christina" uniqKey="Kelleher Powell C" first="Christina" last="Kelleher Powell">Christina Kelleher Powell</name>
<affiliation>
<mods:affiliation>Digital Library Production Service, University of Michigan, Ann Arbor, Michigan, USA</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Library Hi Tech</title>
<idno type="ISSN">0737-8831</idno>
<imprint>
<publisher>Emerald Group Publishing Limited</publisher>
<date type="published" when="2008-03-07">2008-03-07</date>
<biblScope unit="volume">26</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="24">24</biblScope>
<biblScope unit="page" to="32">32</biblScope>
</imprint>
<idno type="ISSN">0737-8831</idno>
</series>
<idno type="istex">3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF</idno>
<idno type="DOI">10.1108/07378830810857771</idno>
<idno type="filenameID">2380260104</idno>
<idno type="original-pdf">2380260104.pdf</idno>
<idno type="href">07378830810857771.pdf</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0737-8831</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract">Purpose: The purpose of this paper is to provide an overview of the OPAC integration in the University of Michigan's local implementation of materials digitized in the partnership with Google. Design/methodology/approach: The paper provides a discussion of different strategies used in integrating metadata with digital resources and presenting the digital objects to the user in the OPAC. Findings: The paper finds that methods that had served in smaller digitization projects require more automation and error reduction processes in an undertaking of this scale. Increased integration with the OPAC is one approach. Originality/value: Michigan is the first of the Google partners to mount their materials themselves and others involved in mass digitization may be interested in the experience.</div>
</front>
</TEI>
<istex>
<corpusName>emerald</corpusName>
<author>
<json:item>
<name>Bradford Eden</name>
</json:item>
<json:item>
<name>Christina Kelleher Powell</name>
<affiliations>
<json:string>Digital Library Production Service, University of Michigan, Ann Arbor, Michigan, USA</json:string>
</affiliations>
</json:item>
</author>
<subject>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Online catalogues</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Digital libraries</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>University libraries</value>
</json:item>
</subject>
<language>
<json:string>eng</json:string>
</language>
<originalGenre>
<json:string>research-article</json:string>
</originalGenre>
<abstract>Purpose: The purpose of this paper is to provide an overview of the OPAC integration in the University of Michigan's local implementation of materials digitized in the partnership with Google. Design/methodology/approach: The paper provides a discussion of different strategies used in integrating metadata with digital resources and presenting the digital objects to the user in the OPAC. Findings: The paper finds that methods that had served in smaller digitization projects require more automation and error reduction processes in an undertaking of this scale. Increased integration with the OPAC is one approach. Originality/value: Michigan is the first of the Google partners to mount their materials themselves and others involved in mass digitization may be interested in the experience.</abstract>
<qualityIndicators>
<score>4.705</score>
<pdfVersion>1.3</pdfVersion>
<pdfPageSize>519 x 680 pts</pdfPageSize>
<refBibsNative>true</refBibsNative>
<keywordCount>3</keywordCount>
<abstractCharCount>788</abstractCharCount>
<pdfWordCount>3301</pdfWordCount>
<pdfCharCount>19172</pdfCharCount>
<pdfPageCount>9</pdfPageCount>
<abstractWordCount>117</abstractWordCount>
</qualityIndicators>
<title>OPAC integration in the era of mass digitization the MBooks experience</title>
<genre>
<json:string>research-article</json:string>
</genre>
<host>
<volume>26</volume>
<publisherId>
<json:string>lht</json:string>
</publisherId>
<pages>
<last>32</last>
<first>24</first>
</pages>
<issn>
<json:string>0737-8831</json:string>
</issn>
<issue>1</issue>
<subject>
<json:item>
<value>Information &amp; knowledge management</value>
</json:item>
<json:item>
<value>Information &amp; communications technology</value>
</json:item>
<json:item>
<value>Internet</value>
</json:item>
<json:item>
<value>Library &amp; information science</value>
</json:item>
<json:item>
<value>Information behaviour &amp; retrieval</value>
</json:item>
<json:item>
<value>Librarianship/library management</value>
</json:item>
<json:item>
<value>Information user studies</value>
</json:item>
<json:item>
<value>Metadata</value>
</json:item>
<json:item>
<value>Library technology</value>
</json:item>
</subject>
<genre>
<json:string>journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<title>Library Hi Tech</title>
<doi>
<json:string>10.1108/lht</json:string>
</doi>
</host>
<publicationDate>2008</publicationDate>
<copyrightDate>2008</copyrightDate>
<doi>
<json:string>10.1108/07378830810857771</json:string>
</doi>
<id>3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF</id>
<score>0.16283412</score>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">OPAC integration in the era of mass digitization the MBooks experience</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Emerald Group Publishing Limited</publisher>
<availability>
<p>EMERALD</p>
</availability>
<date>2008</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">OPAC integration in the era of mass digitization the MBooks experience</title>
<author>
<persName>
<forename type="first">Bradford</forename>
<surname>Eden</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">Christina</forename>
<surname>Kelleher Powell</surname>
</persName>
<affiliation>Digital Library Production Service, University of Michigan, Ann Arbor, Michigan, USA</affiliation>
</author>
</analytic>
<monogr>
<title level="j">Library Hi Tech</title>
<idno type="pISSN">0737-8831</idno>
<idno type="DOI">10.1108/lht</idno>
<imprint>
<publisher>Emerald Group Publishing Limited</publisher>
<date type="published" when="2008-03-07"></date>
<biblScope unit="volume">26</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="24">24</biblScope>
<biblScope unit="page" to="32">32</biblScope>
</imprint>
</monogr>
<idno type="istex">3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF</idno>
<idno type="DOI">10.1108/07378830810857771</idno>
<idno type="filenameID">2380260104</idno>
<idno type="original-pdf">2380260104.pdf</idno>
<idno type="href">07378830810857771.pdf</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2008</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract>
<p>Purpose: The purpose of this paper is to provide an overview of the OPAC integration in the University of Michigan's local implementation of materials digitized in the partnership with Google. Design/methodology/approach: The paper provides a discussion of different strategies used in integrating metadata with digital resources and presenting the digital objects to the user in the OPAC. Findings: The paper finds that methods that had served in smaller digitization projects require more automation and error reduction processes in an undertaking of this scale. Increased integration with the OPAC is one approach. Originality/value: Michigan is the first of the Google partners to mount their materials themselves and others involved in mass digitization may be interested in the experience.</p>
</abstract>
<textClass>
<keywords scheme="keyword">
<list>
<head>Keywords</head>
<item>
<term>Online catalogues</term>
</item>
<item>
<term>Digital libraries</term>
</item>
<item>
<term>University libraries</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Emerald Subject Group">
<list>
<label>cat-IKM</label>
<item>
<term>Information &amp; knowledge management</term>
</item>
<label>cat-ICT</label>
<item>
<term>Information &amp; communications technology</term>
</item>
<label>cat-INT</label>
<item>
<term>Internet</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Emerald Subject Group">
<list>
<label>cat-LISC</label>
<item>
<term>Library &amp; information science</term>
</item>
<label>cat-IBRT</label>
<item>
<term>Information behaviour &amp; retrieval</term>
</item>
<label>cat-LLM</label>
<item>
<term>Librarianship/library management</term>
</item>
<label>cat-IUS</label>
<item>
<term>Information user studies</term>
</item>
<label>cat-MTD</label>
<item>
<term>Metadata</term>
</item>
<label>cat-LTC</label>
<item>
<term>Library technology</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="2008-03-07">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="corpus emerald not found" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="UTF-8"</istex:xmlDeclaration>
<istex:document><!-- Auto generated NISO JATS XML created by Atypon out of MCB DTD source files. Do Not Edit! -->
<article dtd-version="1.0" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">lht</journal-id>
<journal-id journal-id-type="doi">10.1108/lht</journal-id>
<journal-title-group>
<journal-title>Library Hi Tech</journal-title>
</journal-title-group>
<issn pub-type="ppub">0737-8831</issn>
<publisher>
<publisher-name>Emerald Group Publishing Limited</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.1108/07378830810857771</article-id>
<article-id pub-id-type="original-pdf">2380260104.pdf</article-id>
<article-id pub-id-type="filename">2380260104</article-id>
<article-categories>
<subj-group subj-group-type="type-of-publication">
<compound-subject>
<compound-subject-part content-type="code">research-article</compound-subject-part>
<compound-subject-part content-type="label">Research paper</compound-subject-part>
</compound-subject>
</subj-group>
<subj-group subj-group-type="subject">
<compound-subject>
<compound-subject-part content-type="code">cat-IKM</compound-subject-part>
<compound-subject-part content-type="label">Information &amp; knowledge management</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-ICT</compound-subject-part>
<compound-subject-part content-type="label">Information &amp; communications technology</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-INT</compound-subject-part>
<compound-subject-part content-type="label">Internet</compound-subject-part>
</compound-subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="subject">
<compound-subject>
<compound-subject-part content-type="code">cat-LISC</compound-subject-part>
<compound-subject-part content-type="label">Library &amp; information science</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-IBRT</compound-subject-part>
<compound-subject-part content-type="label">Information behaviour &amp; retrieval</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-IUS</compound-subject-part>
<compound-subject-part content-type="label">Information user studies</compound-subject-part>
</compound-subject>
<compound-subject>
<compound-subject-part content-type="code">cat-MTD</compound-subject-part>
<compound-subject-part content-type="label">Metadata</compound-subject-part>
</compound-subject>
</subj-group>
</subj-group>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-LLM</compound-subject-part>
<compound-subject-part content-type="label">Librarianship/library management</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-LTC</compound-subject-part>
<compound-subject-part content-type="label">Library technology</compound-subject-part>
</compound-subject>
</subj-group>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>OPAC integration in the era of mass digitization: the MBooks experience</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="editor">
<string-name>
<given-names>Bradford</given-names>
<surname>Eden</surname>
</string-name>
</contrib>
</contrib-group>
<contrib-group>
<contrib contrib-type="author">
<string-name>
<given-names>Christina</given-names>
<surname>Kelleher Powell</surname>
</string-name>
<aff>Digital Library Production Service, University of Michigan, Ann Arbor, Michigan, USA</aff>
</contrib>
</contrib-group>
<pub-date pub-type="ppub">
<day>07</day>
<month>03</month>
<year>2008</year>
</pub-date>
<volume>26</volume>
<issue>1</issue>
<issue-title>Information organization futures</issue-title>
<issue-title content-type="short">Information organization futures</issue-title>
<fpage>24</fpage>
<lpage>32</lpage>
<permissions>
<copyright-statement>© Emerald Group Publishing Limited</copyright-statement>
<copyright-year>2008</copyright-year>
<license license-type="publisher">
<license-p></license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="07378830810857771.pdf"></self-uri>
<abstract>
<sec>
<title content-type="abstract-heading">Purpose</title>
<x></x>
<p>The purpose of this paper is to provide an overview of the OPAC integration in the University of Michigan's local implementation of materials digitized in the partnership with Google.</p>
</sec>
<sec>
<title content-type="abstract-heading">Design/methodology/approach</title>
<x></x>
<p>The paper provides a discussion of different strategies used in integrating metadata with digital resources and presenting the digital objects to the user in the OPAC.</p>
</sec>
<sec>
<title content-type="abstract-heading">Findings</title>
<x></x>
<p>The paper finds that methods that had served in smaller digitization projects require more automation and error reduction processes in an undertaking of this scale. Increased integration with the OPAC is one approach.</p>
</sec>
<sec>
<title content-type="abstract-heading">Originality/value</title>
<x></x>
<p>Michigan is the first of the Google partners to mount their materials themselves and others involved in mass digitization may be interested in the experience.</p>
</sec>
</abstract>
<kwd-group>
<kwd>Online catalogues</kwd>
<x>, </x>
<kwd>Digital libraries</kwd>
<x>, </x>
<kwd>University libraries</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>peer-reviewed</meta-name>
<meta-value>no</meta-value>
</custom-meta>
<custom-meta>
<meta-name>academic-content</meta-name>
<meta-value>yes</meta-value>
</custom-meta>
<custom-meta>
<meta-name>rightslink</meta-name>
<meta-value>included</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec>
<title>Introduction – steps toward integration</title>
<p>The University of Michigan Digital Library Production Service (DLPS) has a long history of including the online catalog and the work of catalogers into the digitization process as electronic texts were created for online delivery. Even in the earliest days of its predecessor organization, the Humanities Text Initiative, the OPAC was seen as a primary resource for users looking for electronic texts, and MARC records were created for each of the titles made available for the American Verse Project (located online at
<ext-link ext-link-type="uri" xlink:href="http://quod.lib.umich.edu/a/amverse">http://quod.lib.umich.edu/a/amverse</ext-link>
) as early as 1995. Appropriate changes were made to existing MARC records to meet standards for electronic surrogates and then loaded into the OPAC as a new record, separate from its print original. This process of creating “electronic edition” records for digitized versions of print holdings continued for ten years. Electronic holdings began to be added to the original print record as digitization became a more routine part of the entire library workflow, as projects like the Mellon Foundation‐funded Making of America (located online at
<ext-link ext-link-type="uri" xlink:href="http://moa.umdl.umich.edu">http://moa.umdl.umich.edu</ext-link>
) were developed, and digitization replaced microfilming as the reformatting method of choice for dealing with brittle books. A report of items that were put online was sent out each month, and catalogers ran the scripts required to create a holding record and attach it to the print record for each text digitized.</p>
<p>Moving from the small‐scale production of the Humanities Text Initiative, where texts were transcribed and encoded through manual processes (see
<xref ref-type="bibr" rid="b3">Powell and Kerr (1997)</xref>
for a summary of the workflow), to the large‐scale production model of representing volumes through page images converted into text using optical character recognition (OCR), also brought changes in how metadata was created for each electronic text. Rather than trying to create metadata independently by hand for each digitized text, whether by DLPS staff or descriptive catalogers, methods were put into place to automatically derive metadata from existing cataloging. MARC records for titles being digitized were exported from the OPAC by library systems office staff and converted into Text Encoding Initiative (TEI) headers in Standard Generalized Markup Language (SGML) using Perl scripts based on crosswalks between the two systems of description (
<xref ref-type="bibr" rid="b2">Marko and Powell, 2001</xref>
). Advances in technology allowed DLPS staffers to export records from the OPAC directly and convert MARC Extensible Markup Language (XML) into TEI headers using Extensible Stylesheet Language Transformations (XSLT), streamlining the process further. Open Archives Initiative (OAI) records were created through a similar process.</p>
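<p>The crosswalk idea described above, in which fields of a MARC record are mapped onto elements of a TEI header, can be illustrated with a minimal Python sketch. This is not the Perl or XSLT used at Michigan: the two-entry mapping, the function name marc_to_tei_header and the dictionary layout of the input record are assumptions made purely for illustration.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced crosswalk (an assumption for illustration):
# (MARC tag, subfield code) -> name of a TEI element inside titleStmt.
CROSSWALK = {
    ("245", "a"): "title",   # MARC 245 $a: title proper
    ("100", "a"): "author",  # MARC 100 $a: main entry, personal name
}


def marc_to_tei_header(marc_fields):
    """Build a minimal teiHeader from a dict keyed by (tag, subfield)."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    for key, tei_name in CROSSWALK.items():
        value = marc_fields.get(key)
        if value:  # skip fields that are absent from the record
            ET.SubElement(title_stmt, tei_name).text = value
    return header


# Illustrative input; a real record would come from a MARC or MARCXML export.
record = {("245", "a"): "Farm ballads", ("100", "a"): "Carleton, Will"}
header = marc_to_tei_header(record)
print(ET.tostring(header, encoding="unicode"))
```

<p>Keeping the mapping in a single table, separate from the code that walks it, mirrors the role a crosswalk plays between the two systems of description: extending the conversion means adding rows rather than rewriting logic.</p>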
<p>Exporting records from the OPAC moved integration further in another way as well. Original practice in the Humanities Text Initiative was to create a unique identifier for each text by combining the first five letters of the author's last name and the first five significant letters of the title, resulting in an identifier like CarleFarmB for Will Carleton's
<italic>Farm Ballads</italic>
. Following this scheme for more than a handful of volumes quickly leads to issues surrounding uniqueness. For the first round of Making of America conversion, identifiers were created by the vendor based on the sequence in which the materials were digitized. Here, uniqueness was not a problem; however, the identifiers were almost completely meaningless in identifying a text. Since these vendor‐created identifiers needed to be matched with the MARC record to unite the OCR with the TEI header and create a complete electronic text, a solution to both the problems of identifier uniqueness and identifier meaning became clear. The MARC 001 (Control Number) field, which uniquely identifies records in the OPAC, became the standard identifier for all locally‐created digital objects, tying these volumes to the metadata that describe them.</p>
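<p>The original Humanities Text Initiative naming scheme, and the reason it stops being unique, can be sketched in a few lines of Python. The function name, the list of leading articles treated as non-significant, and the second title below are assumptions for illustration; the source only gives the rule and the CarleFarmB example.</p>

```python
# Assumption: "significant" letters skip a leading English article and any
# non-letter characters. The source does not spell this rule out.
LEADING_ARTICLES = ("a ", "an ", "the ")


def letters_only(text):
    """Keep alphabetic characters, preserving their original case."""
    return "".join(ch for ch in text if ch.isalpha())


def hti_identifier(last_name, title):
    """First five letters of the surname plus first five significant title letters."""
    lowered = title.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            title = title[len(article):]
            break
    return letters_only(last_name)[:5] + letters_only(title)[:5]


print(hti_identifier("Carleton", "Farm Ballads"))  # CarleFarmB
# A hypothetical second title by the same author collides with the first:
print(hti_identifier("Carleton", "Farm Bells"))    # CarleFarmB
```

<p>The collision in the last line is the uniqueness problem the paragraph describes: any two titles by similarly named authors that share their first five significant letters receive the same identifier.</p>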
<p>The OPAC has also been used as a tool to locate material for digitization, as in the second phase of the Making of America project. The original project had a thematic focus of American materials published between 1850 and 1877, and subject specialists selected monographs individually in the subject areas of education, psychology, American history, sociology, science and technology, and religion. In the second phase, commonly referred to as “MoA IV,” material was selected automatically based on criteria such as place and date of publication, subject headings, call number ranges, and library locations such as remote storage (
<xref ref-type="bibr" rid="b1">Bonn, 1999</xref>
).</p>
</sec>
<sec>
<title>The insufficiency of past practice – a realization</title>
<p>In each of these cases, the initial practice required more manual intervention than the later refinements; however, they could never be called truly automated – human intention and intervention, no matter how modest, was required in record selection, item selection, record export, electronic item metadata creation, and description of/linking to the resulting digital object. As planning began for the digitization partnership with Google, it was clear that past practice would not serve for a project of this scale done in concert with an external partner. As DLPS moved forward on MBooks, the local implementation of the materials digitized as part of the Michigan Digitization Project (MDP), opportunities to leverage the OPAC – both the intellectual work within the records and the technical infrastructure of the library management system itself – presented themselves.</p>
<p>The most compelling argument for integrating the OPAC into the process was that the criteria for including material in the digital collection were no longer bibliographic. The University of Michigan was no longer building a collection based on a particular topic, genre, or time period. Print materials held by the library would be digitized, barring issues with their physical condition that would preclude digitization. This focus on the book as an item to be digitized as part of the entire library collection, rather than as a title that fit within a subset of that collection selected for digitization, led to the inevitable selection of the barcode, rather than the record identifier, as the new unique identifier for materials. Since Google planned to digitize all copies of a particular title, not compare them and select the best, this would eliminate the looming problem of duplicate record identifiers if we continued to follow our previous workflow. Barcodes would be unique for each copy, regardless of how many might be encountered throughout the library system over the course of the project.</p>
<p>Furthermore, the identifier in the MARC 001 field describes an entire intellectual work, which could consist of multiple volumes. Using this as an identifier for digitization required extensions to the record ID value for each piece, e.g. BAC1457.0001.001, BAC1457.0001.002, etc. Extracting these identifiers from the OPAC manually (keying them into a database by the reformatting staff, for example), with enumeration of multiple parts also done manually, when assigning them to volumes prior to digitization inevitably led to errors. The errors were not insurmountable but were inevitable, and required materials digitized under the wrong identifier to be renamed. Scanning the barcode removes the potential for user error in keying or copying identifiers and provides an opportunity for tracking the item throughout the digitization process by making use of the circulation module of the OPAC. It is important to note that this refers to the barcode at the time of scan. It is immortalized as the MDP identifier, even if the barcode on the physical item itself should be replaced at some point in the future.</p>
<p>Another driving force in integrating the OPAC more firmly into the MDP workflow and MBooks delivery system was the unwillingness to create a “shadow OPAC.” What seemed to be an ideal strategy when working with collections of thousands of volumes quickly grows unworkable when applied to millions of volumes. Extracting records and essentially recreating the entire OPAC externally would remove the metadata from the usual round of correction and authority work and create another system to maintain. Also, while the storage required for bibliographic metadata is small compared to that for the digital objects themselves, it is not insignificant when discussing millions of volumes.</p>
<p>Finally, the workflow for creating links from the MARC records to the digitized objects is unsustainable at such a high volume. A daily task of batching identifiers for linking, even in a largely automated process, is too much work, as compared to a monthly task of similar numbers of items. It was projected that we could be mounting as many volumes per day as we had been per year once the Google partnership was in full swing.</p>
</sec>
<sec>
<title>New roles for the OPAC</title>
<sec>
<title>The OPAC as tracking module</title>
<p>With a project of this scale, involving an external partner, maintaining information about the location of the volumes has been of paramount importance. The OPAC has served a central role in the control of the process. As books are pulled for digitization, their barcodes are scanned. If there is no barcode on the volume, it is set aside until one is added, and if the barcode is not linked to a record, it is set aside until it can be linked
<xref ref-type="fn" rid="fn1">[1]</xref>
. The volume is checked out to the MDP and the item status is set. This process has turned up a number of status errors, such as volumes listed as missing or at the bindery. Status conflicts are reported by the OPAC and are checked manually and resolved as appropriate
<xref ref-type="fn" rid="fn2">[2]</xref>
. After the volumes are scanned and returned for shelving, the status is updated. Using the OPAC as a tracking module keeps all the management and inventory information where it is readily available to any staff members who may be looking for a particular volume for any reason.</p>
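<p>The intake decisions described above (no barcode, barcode not linked to a record, otherwise check out to the MDP) amount to a small triage rule. The Python sketch below is only an illustration: the function name, the returned strings, the barcode values and the use of a plain dictionary in place of the library management system are all assumptions.</p>

```python
def triage_volume(barcode, barcode_to_record):
    """Decide the next step for a volume pulled for digitization.

    barcode_to_record is a hypothetical stand-in for the OPAC lookup:
    a mapping from item barcode to bibliographic record ID.
    """
    if not barcode:
        return "set aside: add a barcode"
    if barcode not in barcode_to_record:
        return "set aside: link barcode to a record"
    return "check out to MDP"


# Illustrative data only; these barcodes and the record ID are made up.
catalog = {"39015000000001": "BAC1457"}
print(triage_volume("39015000000001", catalog))  # check out to MDP
print(triage_volume("39015000000002", catalog))  # set aside: link barcode to a record
print(triage_volume("", catalog))                # set aside: add a barcode
```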
</sec>
<sec>
<title>The OPAC as linking module</title>
<p>One of the library's earliest goals was to provide access to the digitized volumes through Mirlyn, the university library catalog. Much thought was given to how to best record location information that was concise, accurate, and – perhaps most importantly – stable; creating numerous MARC 856 (Electronic Location and Access) fields full of URLs that might have to be changed for unforeseen reasons at some point in the future was an unattractive scenario. After analysis of prior usage, it was decided to include the MDP identifiers (that is, the barcodes) in the call‐number‐2 field in the item record for the digitized volume, an option that provided the greatest flexibility.</p>
<p>After a volume's digital images and data are retrieved from Google and validated via an automated process designed and implemented by the library's Core Services group, a Metadata Encoding and Transmission Standard (METS) document is created and a CNRI handle – a unique, persistent identifying URL – is registered for that volume. The record for the volume is then automatically updated by the Library Systems Office through a batch file of barcodes and handles submitted daily; the MARC 006 and 007 fixed fields are updated to record the reformatting codes, the MARC 533 (Reproduction Note) field is updated to record the descriptive data about the reproduction, and the barcode is added to the call‐number‐2 field in the item record along with the “mdp” namespace, where it officially becomes the MDP identifier. Using an Aleph expand routine provided within the Ex Libris system, the call‐number‐2 field is used to automatically create a “virtual MARC 856 field” to provide users with links to the MBooks and Google versions in the Mirlyn display (see
<xref ref-type="fig" rid="F_2380260104001">Figure 1</xref>
).</p>
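<p>The benefit of storing only the namespaced barcode and deriving links at display time can be shown with a short Python sketch. It does not reproduce the Aleph expand routine: the function name, the dot used to join namespace and barcode, the sample barcode and both URL templates (on example.org) are assumptions for illustration.</p>

```python
# Hypothetical URL templates; the real MBooks and Google link patterns
# are not given in the text.
LINK_TEMPLATES = {
    "MBooks": "https://mbooks.example.org/view/{identifier}",
    "Google": "https://books.example.org/view/{identifier}",
}


def virtual_856_links(barcode, namespace="mdp"):
    """Derive display links from the identifier stored in call-number-2."""
    # Assumption: namespace and barcode are joined with a dot.
    identifier = f"{namespace}.{barcode}"
    links = [
        {"label": label, "url": template.format(identifier=identifier)}
        for label, template in LINK_TEMPLATES.items()
    ]
    return {"identifier": identifier, "links": links}


result = virtual_856_links("39015000000001")  # made-up barcode
print(result["identifier"])
for link in result["links"]:
    print(link["label"], link["url"])
```

<p>Because the URLs are computed from the stored identifier, a change to either delivery system only requires editing the templates, not the individual records, which is the stability concern raised earlier about hand-written MARC 856 fields.</p>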
<p>Serial records present a link with detailed holdings for multiple items (see
<xref ref-type="fig" rid="F_2380260104002">Figure 2</xref>
).</p>
<p>Icons display with the MBooks links, showing the availability status for the item. This is determined by another facet of the OPAC's changing role in the era of mass digitization.</p>
</sec>
<sec>
<title>The OPAC as a source of baseline rights information</title>
<p>Access to digital materials is one of the most complex issues, and the University of Michigan Library is taking a conservative approach in order to avoid making copyrighted materials available. Manually investigating the rights status of each volume before it is made available, however, would be a daunting task that would doom the project before it began. DLPS is leveraging the years of work already done by catalogers to make an initial assessment based on established copyright policy and basic bibliographic information. As volumes are loaded into the repository and the MARC records are updated, an automated examination of the bibliographic record is performed and a row for each volume is added to the rights database. Because MARC does not lend itself to storing rights information, an external database has been created to record the rights status of each volume
<xref ref-type="fn" rid="fn3">[3]</xref>
. For each item, the bibliographic record is examined, a determination is made based on a series of tests, and a rights attribute is assigned, as is a reason for the determination (in this case, “bibliographically derived”) and a timestamp.</p>
<p>The rights tests are relatively simple: whether the volume was published in the United States, whether it was published before an established cutoff date for public domain status (which depends on place of publication), and whether it is a US federal government document. Analysis begins with the MARC 008 control field (Fixed‐Length Data Elements). Its 40 characters provide information about the work as a whole that is invaluable in performing the initial rights assessment. The Date 1 and Date 2 fields, Place of Publication field, and Government Publication field are all examined to determine whether an item is in the public domain, in the public domain in the United States, or likely to be in copyright. Additionally, if the Place of Publication fixed field indicates the volume was published in the USA, the MARC 260 (Publication, Distribution, etc.) subfield a (place of publication) is checked against a list of known cities; if it does not match, the algorithm treats the volume as published outside the USA. Similarly, if either date field contains fill characters or blanks, that date is set to null. Any occurrence of “u” in a date is changed to a 9; e.g. 18uu would become 1899. This yields the most conservative interpretation of the date in the algorithm.</p>
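<p>The date handling and fixed‐field extraction described above can be sketched in Python as follows. This is a minimal illustration, not the library's actual code: the function and class names are hypothetical, the character offsets follow the MARC 21 layout of the 008 field for books (Date 1 at positions 07-10, Date 2 at 11-14, Place of Publication at 15-17, Government Publication at 28), and the check of the 260 subfield a against a list of known cities is omitted.</p>
<preformat><![CDATA[
```python
from typing import NamedTuple, Optional


class Fixed008(NamedTuple):
    """The 008 elements used in the rights tests (hypothetical container)."""
    date1: Optional[int]
    date2: Optional[int]
    place: str
    gov_pub: str


def normalize_date(raw: str) -> Optional[int]:
    """Return a conservative year, or None when the date is unusable."""
    # A fill character ("|") or a blank anywhere nulls the whole date.
    if "|" in raw or " " in raw:
        return None
    # Unknown digits become 9, the latest and so most conservative reading.
    candidate = raw.replace("u", "9")
    if candidate.isascii() and candidate.isdigit():
        return int(candidate)
    # Anything else (letters, punctuation, empty string) is treated as null.
    return None


def parse_008(field: str) -> Fixed008:
    """Slice the four elements out of a 40-character MARC 008 field."""
    if len(field) != 40:
        raise ValueError("MARC 008 must be exactly 40 characters")
    return Fixed008(
        date1=normalize_date(field[7:11]),
        date2=normalize_date(field[11:15]),
        place=field[15:18].strip(),
        gov_pub=field[28],
    )
```
]]></preformat>
<p>Under these assumptions, normalize_date("18uu") returns 1899, while a date containing a fill character or a blank returns None.</p>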
<p>If a volume passes all the tests, it is marked as either Public Domain or Public Domain in the USA, depending on where and when it was published. Should it fail to match at any point in the algorithm, it is marked as In Copyright. If information is ambiguous or is missing from the record, it is marked Undetermined and treated as if it were in copyright; a report of potentially flawed records is generated for cataloger review
<xref ref-type="fn" rid="fn4">[4]</xref>
. The reason “bibliographically derived” is applied and the database row is stamped with the time and date of the query. Mass identification of copyright status based on bibliographic information in this manner means that some volumes are restricted unnecessarily. The rights database has additional rights and reason attributes to accommodate future investigation (for example, determination of whether or not a copyright notice was printed in the text during the years this would be appropriate) or contract negotiation with publishers. These rights are combined with information about the user's location (via IP address) and identity (if known; university affiliates may authenticate, for example, and possibly have access to licensed works) to provide appropriate access for each volume and an appropriate MBooks logo in Mirlyn.</p>
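<p>The shape of the rights row produced by this process can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the attribute labels ("pd", "pdus", "ic", "und"), the function name, and the exact mapping from place and date to an attribute are hypothetical, and the cutoff years are deliberately left as required parameters because the text does not state them. Only the reason string and the presence of a timestamp come from the description above.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

REASON = "bibliographically derived"


@dataclass
class RightsRow:
    volume_id: str
    attribute: str      # hypothetical labels: "pd", "pdus", "ic", "und"
    reason: str
    timestamp: datetime


def assess(volume_id: str, year: Optional[int], in_usa: Optional[bool],
           us_federal_doc: bool, *, us_cutoff: int,
           non_us_cutoff: int) -> RightsRow:
    """Assign a rights attribute from bibliographic data alone."""
    if year is None or in_usa is None:
        # Missing or ambiguous data: undetermined, handled like in copyright.
        attribute = "und"
    elif in_usa and us_federal_doc:
        attribute = "pd"
    elif in_usa:
        attribute = "pd" if year < us_cutoff else "ic"
    elif year < non_us_cutoff:
        attribute = "pd"
    elif year < us_cutoff:
        # Assumed rule: early enough for the USA but not worldwide.
        attribute = "pdus"
    else:
        attribute = "ic"
    return RightsRow(volume_id, attribute, REASON, datetime.now(timezone.utc))
```
]]></preformat>
<p>Keeping the cutoff years as arguments means a policy change requires no change to the logic; rows with the attribute "und" would be the ones listed in the report for cataloger review.</p>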
</sec>
<sec>
<title>The OPAC as a metadata source for other interfaces</title>
<p>The previous new roles have all been contained within the OPAC itself, extending its services for tracking volumes or providing data exports into new areas of effort. For the first time, however, DLPS is using the OPAC as a source of metadata that is displayed in other, non‐Mirlyn, interfaces. Clicking on one of the MBooks links shown in the screenshots above launches the MBooks pageturner. This interface is not part of the OPAC; it was written by the Digital Library eXtension Service (DLXS) staff and uses the METS files to provide users access to the digital page images and text. These METS documents contain the MDP identifier for the object, but do not contain a copy of the descriptive metadata. The volume title presented in the interface is retrieved on the fly from Mirlyn using the Aleph X‐Server functionality; the barcode is passed to the X‐Server, which retrieves the record as XML. Depending on which view the user has selected, either the title or a fuller set of publication information is retrieved, converted by XSLT and displayed as HTML; a link back to the record for the volume is provided as well, as shown below. The first screenshot shows the default view, with the title drawn from the MARC 245 (Title Statement) field, truncated if necessary (see
<xref ref-type="fig" rid="F_2380260104003">Figure 3</xref>
).</p>
<p>If the user desires more information and clicks on the “more” link, a fuller citation is displayed, with Main Entry information drawn from the MARC 100, 110 or 111 fields as appropriate, as well as publication information drawn from the MARC 250 (Edition Statement) and 260 (Publication, Distribution, etc.) fields and the MARC 300 (Physical Description) field, as shown below. If it is later discovered that this is the wrong record, or some of the metadata in the record is incorrect, corrections made in Mirlyn are automatically available to MBooks, without regenerating the METS document. This leverages maintenance of the catalog and precludes building another metadata database within MBooks.</p>
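<p>The two views described above, a truncated title and a fuller citation, can be sketched as follows. The article's interface uses XSLT on the XML returned by the Aleph X‐Server; this sketch swaps in Python's standard ElementTree purely for illustration. The sample record, its MARCXML‐style element names (datafield, subfield) and the 60‐character limit are assumptions, not the X‐Server's actual output format or the interface's real truncation length.</p>
<preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical record; the real X-Server response format may differ.
SAMPLE = """<record>
  <datafield tag="100"><subfield code="a">Doe, Jane.</subfield></datafield>
  <datafield tag="245">
    <subfield code="a">An example title :</subfield>
    <subfield code="b">with a subtitle.</subfield>
  </datafield>
  <datafield tag="260">
    <subfield code="a">Ann Arbor :</subfield>
    <subfield code="b">Example Press,</subfield>
    <subfield code="c">1899.</subfield>
  </datafield>
  <datafield tag="300"><subfield code="a">200 p.</subfield></datafield>
</record>"""


def field_text(record, tag):
    """Join the subfields of the first field with this tag, or return ''."""
    field = record.find(f"datafield[@tag='{tag}']")
    if field is None:
        return ""
    return " ".join((sf.text or "").strip() for sf in field.findall("subfield"))


def short_title(record, limit=60):
    """Default view: the 245 title, truncated if necessary."""
    title = field_text(record, "245")
    if len(title) <= limit:
        return title
    return title[: limit - 3].rstrip() + "..."


def full_citation(record):
    """Fuller view: main entry (100, 110 or 111), title, 250, 260 and 300."""
    candidates = (field_text(record, tag) for tag in ("100", "110", "111"))
    main_entry = next((text for text in candidates if text), "")
    parts = [main_entry, field_text(record, "245")]
    parts += [field_text(record, tag) for tag in ("250", "260", "300")]
    return " ".join(part for part in parts if part)
```
]]></preformat>
<p>Because both functions read the record at display time, a correction made in the catalog would show up on the next request, which is the behavior the paragraph above describes.</p>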
<p>In a similar vein, OAI records are now being harvested directly from Mirlyn, rather than converted from exported MARC records. Using a locally‐created harvester, MBooks records that contain items determined to be in the public domain are harvested from the Ex Libris OAI data provider module. This is a far more efficient process and again avoids creating another external set of records to be reprocessed and maintained (see
<xref ref-type="fig" rid="F_2380260104004">Figure 4</xref>
).</p>
<p>So far, MBooks appears to be a success. Usage, especially local usage, is growing steadily, with almost 200,000 pages viewed since the launch in April 2007. Development is ongoing, with the most recent work focused on accessibility under Section 508 of the Rehabilitation Act. Teaming up with the Office of Services for Students with Disabilities (OSSD), the library is now able to offer access to the OCRed text of volumes to students who use screen readers. Students with visual impairments who register with the OSSD will receive an email message when they check out a book that has been digitized, providing them with a URL that will permit them to access the text of the digitized volume. Such a service is only possible now that Mirlyn is integrated into the MBooks interface, sharing metadata about volumes and users in ways we had not previously envisioned.</p>
</sec>
</sec>
<sec>
<fig position="float" id="F_2380260104001">
<label>
<bold>Figure 1
<x> </x>
</bold>
</label>
<graphic xlink:href="2380260104001.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2380260104002">
<label>
<bold>Figure 2
<x> </x>
</bold>
</label>
<graphic xlink:href="2380260104002.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2380260104003">
<label>
<bold>Figure 3
<x> </x>
</bold>
</label>
<graphic xlink:href="2380260104003.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2380260104004">
<label>
<bold>Figure 4
<x> </x>
</bold>
</label>
<graphic xlink:href="2380260104004.tif"></graphic>
</fig>
</sec>
</body>
<back>
<fn-group>
<title>Notes</title>
<fn id="fn1">
<p>One of the advantages of using the barcode for digitization is that if it happens to be linked to the record for an incorrect edition of a work, for example, it is a simple matter to have it relinked to the correct record, whereas using the record identifier for the digitized object would require the image files to be renamed, as previously mentioned. Even if the linking error is discovered after the volume is online – and these misidentifications are often reported by users – correcting the problem is straightforward and does not change the URL for the volume.</p>
</fn>
<fn id="fn2">
<p>As with the barcode errors mentioned previously, staff have joked that the MDP is really a big inventory and record clean‐up project.</p>
</fn>
<fn id="fn3">
<p>This is separate from access rules for individual users, which are determined by other means, such as IP address or authorization through the University of Michigan authentication system.</p>
</fn>
<fn id="fn4">
<p>Yet another situation where the mass digitization project is helping to identify areas in the OPAC where attention might be needed.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="b1">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Bonn</surname>
,
<given-names>M.</given-names>
</string-name>
</person-group>
(
<year>1999</year>
), “
<article-title>
<italic>Building a digital library: the stories of the making of America</italic>
</article-title>
”, in
<person-group person-group-type="editor">
<string-name>
<surname>Saunders</surname>
,
<given-names>L.</given-names>
</string-name>
</person-group>
(Ed.),
<source>
<italic>The Evolving Virtual Library II: More Visions and Case Studies</italic>
</source>
,
<publisher-name>Information Today</publisher-name>
,
<publisher-loc>Medford, NJ</publisher-loc>
.</mixed-citation>
</ref>
<ref id="b2">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Marko</surname>
,
<given-names>L.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Powell</surname>
,
<given-names>C.</given-names>
</string-name>
</person-group>
(
<year>2001</year>
), “
<article-title>
<italic>Descriptive metadata strategy for TEI headers: a University of Michigan Library case study</italic>
</article-title>
”,
<source>
<italic>OCLC Systems and Services</italic>
</source>
, Vol.
<volume>17</volume>
No.
<issue>3</issue>
, pp.
<fpage>117</fpage>
<x></x>
<lpage>20</lpage>
.</mixed-citation>
</ref>
<ref id="b3">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Powell</surname>
,
<given-names>C.K.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Kerr</surname>
,
<given-names>N.</given-names>
</string-name>
</person-group>
(
<year>1997</year>
), “
<article-title>
<italic>SGML creation and delivery: the humanities text initiative</italic>
</article-title>
”,
<source>
<italic>D‐Lib Magazine</italic>
</source>
,
<issue>July/August</issue>
, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.dlib.org/dlib/july97/humanities/07powell.html">www.dlib.org/dlib/july97/humanities/07powell.html</ext-link>
.</mixed-citation>
</ref>
</ref-list>
<ref-list>
<title>Further Reading</title>
<ref id="frg1">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Shaw</surname>
,
<given-names>E.J.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Blumson</surname>
,
<given-names>S.</given-names>
</string-name>
</person-group>
(
<year>1997</year>
), “
<article-title>
<italic>Making of America: online searching and page presentation at the University of Michigan</italic>
</article-title>
”,
<source>
<italic>D‐Lib Magazine</italic>
</source>
,
<issue>July/August</issue>
, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.dlib.org/dlib/july97/america/07shaw.html">www.dlib.org/dlib/july97/america/07shaw.html</ext-link>
.</mixed-citation>
</ref>
</ref-list>
<app-group>
<app id="APP1">
<title>About the author</title>
<p>Christina Kelleher Powell is the Coordinator of Encoded Text Services at the University of Michigan Digital Library Production Service. She has an MILS from the University of Michigan School of Information. She is indebted to her colleagues throughout the library whose work she describes here.</p>
</app>
</app-group>
</back>
</article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>OPAC integration in the era of mass digitization the MBooks experience</title>
</titleInfo>
<titleInfo type="alternative" lang="en" contentType="CDATA">
<title>OPAC integration in the era of mass digitization the MBooks experience</title>
</titleInfo>
<name type="personal">
<namePart type="given">Bradford</namePart>
<namePart type="family">Eden</namePart>
</name>
<name type="personal">
<namePart type="given">Christina</namePart>
<namePart type="family">Kelleher Powell</namePart>
<affiliation>Digital Library Production Service, University of Michigan, Ann Arbor, Michigan, USA</affiliation>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="research-article"></genre>
<originInfo>
<publisher>Emerald Group Publishing Limited</publisher>
<dateIssued encoding="w3cdtf">2008-03-07</dateIssued>
<copyrightDate encoding="w3cdtf">2008</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract>Purpose The purpose of this paper is to provide an overview of the OPAC integration in the University of Michigan's local implementation of materials digitized in the partnership with Google. Designmethodologyapproach The paper provides a discussion of different strategies used in integrating metadata with digital resources and presenting the digital objects to the user in the OPAC. Findings The paper finds that methods that had served in smaller digitization projects require more automation and error reduction processes in an undertaking of this scale. Increased integration with the OPAC is one approach. Originalityvalue Michigan is the first of the Google partners to mount their materials themselves and others involved in mass digitization may be interested in the experience.</abstract>
<subject>
<genre>keywords</genre>
<topic>Online catalogues</topic>
<topic>Digital libraries</topic>
<topic>University libraries</topic>
</subject>
<relatedItem type="host">
<titleInfo>
<title>Library Hi Tech</title>
</titleInfo>
<genre type="journal">journal</genre>
<subject>
<genre>Emerald Subject Group</genre>
<topic authority="SubjectCodesPrimary" authorityURI="cat-IKM">Information &amp; knowledge management</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-ICT">Information &amp; communications technology</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-INT">Internet</topic>
</subject>
<subject>
<genre>Emerald Subject Group</genre>
<topic authority="SubjectCodesPrimary" authorityURI="cat-LISC">Library &amp; information science</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-IBRT">Information behaviour &amp; retrieval</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-LLM">Librarianship/library management</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-IUS">Information user studies</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-MTD">Metadata</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-LTC">Library technology</topic>
</subject>
<identifier type="ISSN">0737-8831</identifier>
<identifier type="PublisherID">lht</identifier>
<identifier type="DOI">10.1108/lht</identifier>
<part>
<date>2008</date>
<detail type="title">
<title>Information organization futures</title>
</detail>
<detail type="volume">
<caption>vol.</caption>
<number>26</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>1</number>
</detail>
<extent unit="pages">
<start>24</start>
<end>32</end>
</extent>
</part>
</relatedItem>
<identifier type="istex">3C787A80BAA6BE1CCBA927A91DF44ECF673DECDF</identifier>
<identifier type="DOI">10.1108/07378830810857771</identifier>
<identifier type="filenameID">2380260104</identifier>
<identifier type="original-pdf">2380260104.pdf</identifier>
<identifier type="href">07378830810857771.pdf</identifier>
<accessCondition type="use and reproduction" contentType="copyright">© Emerald Group Publishing Limited</accessCondition>
<recordInfo>
<recordContentSource>EMERALD</recordContentSource>
</recordInfo>
</mods>
</metadata>
<serie></serie>
</istex>
</record>
