OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Managing text digitisation

Internal identifier: 000298 (Istex/Corpus); previous: 000297; next: 000299

Managing text digitisation

Authors: Stephen Chapman

Source:

RBID : ISTEX:C3FAF33A64CC32020BC01071612AE259794A3A61

Abstract

Good project plans include technical specifications, plans of work, budgets and schedules that are consistent with project goals. Text digitisation presents many variables and many choices, so there is no one-size-fits-all solution to recommend. This article presents a simple-to-use questionnaire as a tool for project managers to translate their vision of text digitisation into a series of functional requirements optimised for their collections and users. These requirements can then be used to develop specifications, draft workflows, and select appropriate staff, services and equipment.

Url:
DOI: 10.1108/14684520310462536

Links to Exploration step

ISTEX:C3FAF33A64CC32020BC01071612AE259794A3A61

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Managing text digitisation</title>
<author wicri:is="90%">
<name sortKey="Chapman, Stephen" sort="Chapman, Stephen" uniqKey="Chapman S" first="Stephen" last="Chapman">Stephen Chapman</name>
<affiliation>
<mods:affiliation>Stephen Chapman is the Preservation Librarian for Digital Initiatives at Weissman Preservation Centre, Harvard University Library, Cambridge, Massachusetts, USA.</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:C3FAF33A64CC32020BC01071612AE259794A3A61</idno>
<date when="2003" year="2003">2003</date>
<idno type="doi">10.1108/14684520310462536</idno>
<idno type="url">https://api.istex.fr/document/C3FAF33A64CC32020BC01071612AE259794A3A61/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000298</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Managing text digitisation</title>
<author wicri:is="90%">
<name sortKey="Chapman, Stephen" sort="Chapman, Stephen" uniqKey="Chapman S" first="Stephen" last="Chapman">Stephen Chapman</name>
<affiliation>
<mods:affiliation>Stephen Chapman is the Preservation Librarian for Digital Initiatives at Weissman Preservation Centre, Harvard University Library, Cambridge, Massachusetts, USA.</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Online Information Review</title>
<idno type="ISSN">1468-4527</idno>
<imprint>
<publisher>MCB UP Ltd</publisher>
<date type="published" when="2003-02-01">2003-02-01</date>
<biblScope unit="volume">27</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="17">17</biblScope>
<biblScope unit="page" to="27">27</biblScope>
</imprint>
<idno type="ISSN">1468-4527</idno>
</series>
<idno type="istex">C3FAF33A64CC32020BC01071612AE259794A3A61</idno>
<idno type="DOI">10.1108/14684520310462536</idno>
<idno type="filenameID">2640270102</idno>
<idno type="original-pdf">2640270102.pdf</idno>
<idno type="href">14684520310462536.pdf</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1468-4527</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Good project plans include technical specifications, plans of work, budgets and schedules that are consistent with project goals. Text digitisation presents many variables and many choices, so there is no one-size-fits-all solution to recommend. This article presents a simple-to-use questionnaire as a tool for project managers to translate their vision of text digitisation into a series of functional requirements optimised for their collections and users. These requirements can then be used to develop specifications, draft workflows, and select appropriate staff, services and equipment.</div>
</front>
</TEI>
<istex>
<corpusName>emerald</corpusName>
<author>
<json:item>
<name>Stephen Chapman</name>
<affiliations>
<json:string>Stephen Chapman is the Preservation Librarian for Digital Initiatives at Weissman Preservation Centre, Harvard University Library, Cambridge, Massachusetts, USA.</json:string>
</affiliations>
</json:item>
</author>
<subject>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Digital libraries</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Projects</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Electronic data processing</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Information technology</value>
</json:item>
</subject>
<language>
<json:string>eng</json:string>
</language>
<abstract>Good project plans include technical specifications, plans of work, budgets and schedules that are consistent with project goals. Text digitisation presents many variables and many choices, so there is no one-size-fits-all solution to recommend. This article presents a simple-to-use questionnaire as a tool for project managers to translate their vision of text digitisation into a series of functional requirements optimised for their collections and users. These requirements can then be used to develop specifications, draft workflows, and select appropriate staff, services and equipment.</abstract>
<qualityIndicators>
<score>5.996</score>
<pdfVersion>1.2</pdfVersion>
<pdfPageSize>595 x 842 pts (A4)</pdfPageSize>
<refBibsNative>true</refBibsNative>
<keywordCount>4</keywordCount>
<abstractCharCount>588</abstractCharCount>
<pdfWordCount>5289</pdfWordCount>
<pdfCharCount>38366</pdfCharCount>
<pdfPageCount>11</pdfPageCount>
<abstractWordCount>83</abstractWordCount>
</qualityIndicators>
<title>Managing text digitisation</title>
<genre.original>
<json:string>review-article</json:string>
</genre.original>
<genre>
<json:string>review-article</json:string>
</genre>
<host>
<volume>27</volume>
<publisherId>
<json:string>oir</json:string>
</publisherId>
<pages>
<last>27</last>
<first>17</first>
</pages>
<issn>
<json:string>1468-4527</json:string>
</issn>
<issue>1</issue>
<subject>
<json:item>
<value>Information & knowledge management</value>
</json:item>
<json:item>
<value>Information & communications technology</value>
</json:item>
<json:item>
<value>Internet</value>
</json:item>
<json:item>
<value>Library & information science</value>
</json:item>
<json:item>
<value>Collection building & management</value>
</json:item>
<json:item>
<value>Information behaviour & retrieval</value>
</json:item>
<json:item>
<value>Records management & preservation</value>
</json:item>
<json:item>
<value>Bibliometrics</value>
</json:item>
<json:item>
<value>Databases</value>
</json:item>
<json:item>
<value>Document management</value>
</json:item>
</subject>
<genre>
<json:string>Journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<title>Online Information Review</title>
<doi>
<json:string>10.1108/oir</json:string>
</doi>
</host>
<publicationDate>2003</publicationDate>
<copyrightDate>2003</copyrightDate>
<doi>
<json:string>10.1108/14684520310462536</json:string>
</doi>
<id>C3FAF33A64CC32020BC01071612AE259794A3A61</id>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/C3FAF33A64CC32020BC01071612AE259794A3A61/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/C3FAF33A64CC32020BC01071612AE259794A3A61/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/C3FAF33A64CC32020BC01071612AE259794A3A61/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">Managing text digitisation</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>MCB UP Ltd</publisher>
<availability>
<p>EMERALD</p>
</availability>
<date>2003</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">Managing text digitisation</title>
<author>
<persName>
<forename type="first">Stephen</forename>
<surname>Chapman</surname>
</persName>
<affiliation>Stephen Chapman is the Preservation Librarian for Digital Initiatives at Weissman Preservation Centre, Harvard University Library, Cambridge, Massachusetts, USA.</affiliation>
</author>
</analytic>
<monogr>
<title level="j">Online Information Review</title>
<idno type="pISSN">1468-4527</idno>
<idno type="DOI">10.1108/oir</idno>
<imprint>
<publisher>MCB UP Ltd</publisher>
<date type="published" when="2003-02-01"></date>
<biblScope unit="volume">27</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="17">17</biblScope>
<biblScope unit="page" to="27">27</biblScope>
</imprint>
</monogr>
<idno type="istex">C3FAF33A64CC32020BC01071612AE259794A3A61</idno>
<idno type="DOI">10.1108/14684520310462536</idno>
<idno type="filenameID">2640270102</idno>
<idno type="original-pdf">2640270102.pdf</idno>
<idno type="href">14684520310462536.pdf</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2003</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>Good project plans include technical specifications, plans of work, budgets and schedules that are consistent with project goals. Text digitisation presents many variables and many choices, so there is no one-size-fits-all solution to recommend. This article presents a simple-to-use questionnaire as a tool for project managers to translate their vision of text digitisation into a series of functional requirements optimised for their collections and users. These requirements can then be used to develop specifications, draft workflows, and select appropriate staff, services and equipment.</p>
</abstract>
<textClass>
<keywords scheme="keyword">
<list>
<head>Keywords</head>
<item>
<term>Digital libraries</term>
</item>
<item>
<term>Projects</term>
</item>
<item>
<term>Electronic data processing</term>
</item>
<item>
<term>Information technology</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Emerald Subject Group">
<list>
<label>cat-IKM</label>
<item>
<term>Information & knowledge management</term>
</item>
<label>cat-ICT</label>
<item>
<term>Information & communications technology</term>
</item>
<label>cat-INT</label>
<item>
<term>Internet</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Emerald Subject Group">
<list>
<label>cat-LISC</label>
<item>
<term>Library & information science</term>
</item>
<label>cat-CBM</label>
<item>
<term>Collection building & management</term>
</item>
<label>cat-IBRT</label>
<item>
<term>Information behaviour & retrieval</term>
</item>
<label>cat-RMP</label>
<item>
<term>Records management & preservation</term>
</item>
<label>cat-BIB</label>
<item>
<term>Bibliometrics</term>
</item>
<label>cat-DAT</label>
<item>
<term>Databases</term>
</item>
<label>cat-DOCM</label>
<item>
<term>Document management</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="2003-02-01">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/C3FAF33A64CC32020BC01071612AE259794A3A61/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="corpus emerald not found" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="UTF-8"</istex:xmlDeclaration>
<istex:document><!-- Auto generated NISO JATS XML created by Atypon out of MCB DTD source files. Do Not Edit! -->
<article dtd-version="1.0" xml:lang="en" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">oir</journal-id>
<journal-id journal-id-type="doi">10.1108/oir</journal-id>
<journal-title-group>
<journal-title>Online Information Review</journal-title>
</journal-title-group>
<issn pub-type="ppub">1468-4527</issn>
<publisher>
<publisher-name>MCB UP Ltd</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.1108/14684520310462536</article-id>
<article-id pub-id-type="original-pdf">2640270102.pdf</article-id>
<article-id pub-id-type="filename">2640270102</article-id>
<article-categories>
<subj-group subj-group-type="type-of-publication">
<compound-subject>
<compound-subject-part content-type="code">review-article</compound-subject-part>
<compound-subject-part content-type="label">General review</compound-subject-part>
</compound-subject>
</subj-group>
<subj-group subj-group-type="subject">
<compound-subject>
<compound-subject-part content-type="code">cat-IKM</compound-subject-part>
<compound-subject-part content-type="label">Information & knowledge management</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-ICT</compound-subject-part>
<compound-subject-part content-type="label">Information & communications technology</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-INT</compound-subject-part>
<compound-subject-part content-type="label">Internet</compound-subject-part>
</compound-subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="subject">
<compound-subject>
<compound-subject-part content-type="code">cat-LISC</compound-subject-part>
<compound-subject-part content-type="label">Library & information science</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-CBM</compound-subject-part>
<compound-subject-part content-type="label">Collection building & management</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-BIB</compound-subject-part>
<compound-subject-part content-type="label">Bibliometrics</compound-subject-part>
</compound-subject>
<compound-subject>
<compound-subject-part content-type="code">cat-DAT</compound-subject-part>
<compound-subject-part content-type="label">Databases</compound-subject-part>
</compound-subject>
</subj-group>
</subj-group>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-IBRT</compound-subject-part>
<compound-subject-part content-type="label">Information behaviour & retrieval</compound-subject-part>
</compound-subject>
</subj-group>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-RMP</compound-subject-part>
<compound-subject-part content-type="label">Records management & preservation</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-DOCM</compound-subject-part>
<compound-subject-part content-type="label">Document management</compound-subject-part>
</compound-subject>
</subj-group>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Managing text digitisation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<string-name>
<given-names>Stephen</given-names>
<surname>Chapman</surname>
</string-name>
<aff>Stephen Chapman is the Preservation Librarian for Digital Initiatives at Weissman Preservation Centre, Harvard University Library, Cambridge, Massachusetts, USA.</aff>
</contrib>
</contrib-group>
<pub-date pub-type="ppub">
<day>01</day>
<month>02</month>
<year>2003</year>
</pub-date>
<volume>27</volume>
<issue>1</issue>
<fpage>17</fpage>
<lpage>27</lpage>
<permissions>
<copyright-statement>© MCB UP Limited</copyright-statement>
<copyright-year>2003</copyright-year>
<license license-type="publisher">
<license-p></license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="14684520310462536.pdf"></self-uri>
<abstract>
<p>Good project plans include technical specifications, plans of work, budgets and schedules that are consistent with project goals. Text digitisation presents many variables and many choices, so there is no one‐size‐fits‐all solution to recommend. This article presents a simple‐to‐use questionnaire as a tool for project managers to translate their vision of text digitisation into a series of functional requirements optimised for their collections and users. These requirements can then be used to develop specifications, draft workflows, and select appropriate staff, services and equipment.</p>
</abstract>
<kwd-group>
<kwd>Digital libraries</kwd>
<x>, </x>
<kwd>Projects</kwd>
<x>, </x>
<kwd>Electronic data processing</kwd>
<x>, </x>
<kwd>Information technology</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>peer-reviewed</meta-name>
<meta-value>no</meta-value>
</custom-meta>
<custom-meta>
<meta-name>academic-content</meta-name>
<meta-value>yes</meta-value>
</custom-meta>
<custom-meta>
<meta-name>rightslink</meta-name>
<meta-value>included</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<ack>
<p>Refereed article received 9 September 2002. Approved for publication 12 September 2002. Developed initially for the Northeast Document Conservation Centre’s School for Scanning conferences, this paper also presents personal opinions of lessons learned in Harvard’s Library Digital Initiative projects (1998‐ ). The author is grateful to NEDCC for being included in the School for Scanning faculty and to Bill Comstock in the Harvard College Library for reviewing drafts of this article.</p>
</ack>
</front>
<body>
<sec>
<title>Introduction</title>
<p>Many technologies and services are available to improve access to books, journals, manuscripts and other printed materials in libraries and archives. Good digital images can be made by scanning paper or microfilm; searchable text by keying or optical character recognition (OCR); encoded text by applying a standard or content‐specific encoding scheme; and networked delivery by using a commercial or custom‐developed “page turning” application. These many “ors” are both a blessing and a curse to managers of text conversion projects. Where production teams could formerly implement specifications for analogue reformatting (e.g. preservation microfilming), they must often develop them for digitising similar materials. How can this be done accurately and efficiently?</p>
<p>This article presents a decision checklist for managers seeking to create specifications and infrastructure, or to interpret available guidelines, products, and services, for text digitisation. (In this context, “text” refers to a broad range of source materials: handwritten and machine‐printed, single sheet and bound, printed originals or film and photocopy surrogates.) The best practice advocated here is to arrive at specifications that account for project‐ and programme‐specific variables associated with the source materials and the digital reproductions.</p>
<p>No project can be planned, funded and undertaken without specifications. Solutions promoted or marketed as “standard” preservation and access strategies should be assessed carefully. Key variables among ostensibly similar projects include use requirements for the digitised text, attributes of source materials, near‐ and long‐term goals for digital library content and infrastructure, staffing, schedules, and budgets. The best “best practice” for scanning, for example, will never produce searchable text. Image and metadata formats optimised for one delivery application might work poorly, if at all, in another. In planning projects, always be guided by the question, “What is right for my collection and my institution?”, and remember that if the project team does not consider key questions and make decisions, someone else may make them on their behalf.</p>
</sec>
<sec>
<title>Decision checklist</title>
<p>A project manager does not have to be well versed in computer programming, digital imaging technologies, metadata standards, or user interface design to make sensible decisions about text digitisation. However, the project manager must be a skilled communicator who ensures that key questions and decisions are vetted among the parties who will create, manage, and use digital resources. Answers to the following macro‐level questions during planning can be instrumental in developing (or choosing) specifications and workflows for imaging, full text creation, metadata, storage and delivery.</p>
<sec>
<title>Checklist</title>
<list list-type="order">
<list-item>
<label>1. </label>
<p>What type of digitised text is needed?</p>
</list-item>
<list-item>
<label>2. </label>
<p>What functions need to be supported in delivery?</p>
</list-item>
<list-item>
<label>3. </label>
<p>How much quality is needed?</p>
</list-item>
<list-item>
<label>4. </label>
<p>What are the planned outcomes for the source materials?</p>
</list-item>
<list-item>
<label>5. </label>
<p>How will people find the digital objects?</p>
</list-item>
<list-item>
<label>6. </label>
<p>Are there any restrictions to using the digital objects?</p>
</list-item>
<list-item>
<label>7. </label>
<p>What lifespan is envisioned for the digital resources?</p>
</list-item>
</list>
<p>This brief list of questions can help a project team zero in on the functional requirements for systems and the technical specifications for the “raw materials” used to deliver functional digital objects (multi‐page works) to the World Wide Web. These raw materials include digital images, searchable text files, structural metadata, administrative metadata, catalogue records or other discovery metadata, and links.</p>
<p>Given the ever‐increasing range of viable options for text digitisation, assume that budget and institutional policies, not technology and services, will be the limiting factors in decision‐making. In each area of inquiry, the corollary question may not be, “Can it be done?” (because it very likely can), but rather, “Can we afford it?”</p>
</sec>
</sec>
<sec>
<title>Question 1: what type of digitised text is needed?</title>
<p>Although the question of product type is not the largest one to bear on costs, it gets to the heart of digitisation workflow. Projects will require measurably different systems, services, and procedures according to the nature of the final products.</p>
<p>Consider, for example, whether a project must yield searchable texts. Managers must ensure that stakeholders recognise that “digitisation” is not synonymous with creating searchable text. To begin planning, determine whether you need to produce digital images, searchable text, or both.</p>
<p>As explained in detail in digitisation handbooks (Morrison <italic>et al</italic>., 1998; Sitts, 2000), scanning is a photographic process that produces raster digital images, comprising basic components (pixels) that are not machine‐searchable. (“Page images” refer to digital images that reproduce one side of each originally produced printed page; since this term is gaining wide currency, it will be used here.) Text conversion, on the other hand, is a transcription process in which keying or OCR produces machine‐readable – and therefore fully searchable – alphanumeric characters. “Full text” refers to all formats of “fully searchable” alphanumeric data.</p>
<p>Thus, if one wants to deliver page images that replicate the appearance of the original materials and use full text for searching, the digitisation strategy would have to include both scanning and text conversion.</p>
<p>Starting at the end, by considering delivery issues first, is a good way to decide what types of products need to be made in digitisation workflows. Review examples of digitised text created and delivered by libraries and archives to identify archetypes of the results you want to achieve.</p>
<p>After referring to examples of the five product types defined in
<xref ref-type="fig" rid="F_2640270102001">Table I</xref>
, list the ones that are essential in your project.</p>
<p>Reviews need only be cursory at this early stage of planning. Do not get sidetracked in evaluating look‐and‐feel attributes of interfaces, or trying to determine what back‐end systems and formats are being used. Use the archetypes to facilitate discussions about baseline product goals with technical advisors. These conversations will be liberating or sobering, depending on an institution’s (or vendor’s) state of readiness. Skilled technical advisors can readily assess whether sufficient infrastructure exists or can be developed within given parameters to turn a vision into a real‐world implementation.</p>
<p>Finally, in addition to reviewing archetype digital collections, a team may also choose to review commercial products and services to answer Question 1. Here again, one encounters a range of options. Consider the deceptively simple example of PDF (Adobe’s Portable Document Format). As noted in
<xref ref-type="fig" rid="F_2640270102002">Table II</xref>
, which lists many of the popular formats to deliver electronic text, Adobe (2002) formally describes several “flavours” of PDF in its documentation, and providers of paper‐to‐PDF conversion services have expanded this core list by adding more options (Document Solutions, 2002).</p>
<p>The rule for evaluating commercial products and services is the same as the one for reviewing archetypes of networked resources: find relevant examples and classify them according to a straightforward scheme that addresses baseline requirements for your project.</p>
<p>By simply classifying the types of digitised text products that are needed in a project, technical specifications can be developed – or choices can at least be narrowed – for delivery applications and approaches to conversion as follows.</p>
<sec>
<title>Specifications</title>
<p>Delivery application requires (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>means to display page images;</p>
</list-item>
<list-item>
<label></label>
<p>means to index and search text.</p>
</list-item>
</list>
<p>Approach to digitisation (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>requires scanning;</p>
</list-item>
<list-item>
<label></label>
<p>requires text conversion.</p>
</list-item>
</list>
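<p>As a hedged illustration, the Y/N answers above can be derived mechanically from the products a project needs. The sketch below is written in Python; the names Question1Specs and specs_for_products, the "ocr" and "keying" labels, and the rule that OCR implies scanning even when page images are not delivered (OCR takes images as its input) are assumptions made for this example, not part of the checklist itself.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question1Specs:
    """Y/N answers to the two Question 1 specification lists."""
    display_page_images: bool       # delivery application: display page images
    index_and_search_text: bool     # delivery application: index and search text
    requires_scanning: bool         # approach to digitisation: scanning
    requires_text_conversion: bool  # approach to digitisation: text conversion


def specs_for_products(page_images: bool, full_text: bool,
                       text_method: str = "ocr") -> Question1Specs:
    """Derive the Y/N answers from the products the project must deliver.

    text_method is "ocr" or "keying" (hypothetical labels for this sketch).
    """
    if not (page_images or full_text):
        raise ValueError("choose at least one product: page images or full text")
    if text_method not in ("ocr", "keying"):
        raise ValueError("text_method must be 'ocr' or 'keying'")
    # Assumption: OCR needs scanned images as input, so it implies scanning
    # even when the page images themselves are not delivered.
    ocr_needs_images = full_text and text_method == "ocr"
    return Question1Specs(
        display_page_images=page_images,
        index_and_search_text=full_text,
        requires_scanning=page_images or ocr_needs_images,
        requires_text_conversion=full_text,
    )


if __name__ == "__main__":
    # Page images for display plus full text for searching: all four are "Y".
    print(specs_for_products(page_images=True, full_text=True))
```
]]></preformat>
<p>Recording the answers in one structure of this kind makes it easy to share them with technical advisors and to revisit them when the product goals change.</p>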
</sec>
</sec>
<sec>
<title>Question 2: what functions need to be supported in delivery?</title>
<p>Where the answer to Question 1 helps the manager develop the raw materials inventory (page images, full text, or both), Questions 2 and 3 largely dictate the amount of processing these materials will require. As a consequence, these are the key questions pertaining to cost.</p>
<p>To be useful in original and digitised forms, multi‐page text needs to be organised. Structural metadata serve the same purposes as pagination, signatures, sewing and binding in printed materials: they organise individual pages (page images) into objects that facilitate navigation and citation.</p>
<p>The operative question here, however, is not, “What structural metadata do we need?”, but, “What do the digital objects need to do?” Good project plans specify functional requirements for the multi‐page objects, as well as their constituent parts (discussed in Question 3).</p>
<p>Continuing the exercise in Question 1, a project team will likely discover that digital objects of the same type (e.g. page images + full text for searching) are delivered in diverse ways. Delivery applications are designed to make multi‐page text Web accessible. Thus, all “page turners” work with standard Web browsers. Beyond this shared protocol, however, page turners present different features, each tailored to specific uses. Examples include: means to turn pages, go to specified pages or sections, magnify views, do fielded searching, display alternative formats, print pages and sections, and even highlight and annotate selected phrases. These features, in effect, denote the added benefits that justify digitising rather than photocopying or microfilming collections.</p>
<p>The Making of America (MoA II) project team at the University of California, Berkeley presents a useful model to interpret interface design. The buttons and other features of page turners serve as “methods” to express underlying digital object “behaviours” (Hurley <italic>et al</italic>., 1999). The MoA II model implicitly recognises that behaviours, which are closely tied to use requirements, are dynamic. As users demand new functions, either the methods, the underlying document encoding, or both, must change. Thus, digital objects and their associated interfaces designed to meet one set of requirements may be enhanced to meet new demands in the future.</p>
<p>Before setting minimum thresholds for the look‐and‐feel representations of page images or displayed full text, managers are encouraged to set baseline requirements for using the multi‐page objects. Use
<xref ref-type="fig" rid="F_2640270102003">Table III</xref>
as an example to list core, required functions for a page turner, then further define the functions by describing the tasks you envision users being able to perform.</p>
<p>By simply classifying and describing object behaviours, technical specifications can be developed – or choices can at least be narrowed – for delivery applications and structural metadata as follows.</p>
<sec>
<title>Specifications</title>
<p>Delivery application requires (answer Y/N for each) means to:</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>page forward, page back and go to specific pages;</p>
</list-item>
<list-item>
<label></label>
<p>navigate to sections/parts of the digital object;</p>
</list-item>
<list-item>
<label></label>
<p>customise display of images (e.g. rotate, zoom, thumbnail view);</p>
</list-item>
<list-item>
<label></label>
<p>display different version(s) of the work (text, image);</p>
</list-item>
<list-item>
<label></label>
<p>perform fielded as well as keyword searching;</p>
</list-item>
<list-item>
<label></label>
<p>print designated sections;</p>
</list-item>
<list-item>
<label></label>
<p>print an accurate (two‐sided) codex of the entire object.</p>
</list-item>
</list>
<p>Structural metadata needed (answer Y/N for each) to:</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>encode pagination;</p>
</list-item>
<list-item>
<label></label>
<p>encode parts or sections;</p>
</list-item>
<list-item>
<label></label>
<p>encode features within sections and/or pages.</p>
</list-item>
</list>
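<p>One way to connect the two lists above is to record which kind of structural metadata each delivery function relies on, then take the union for the functions a project requires. The Python sketch below is illustrative only: the function names, the three metadata levels, and the mapping itself (for example, that fielded searching depends on features encoded within sections or pages) are assumptions to be replaced by the project's own analysis.</p>
<preformat><![CDATA[
```python
# Hypothetical mapping: delivery function -> structural metadata it relies on.
FUNCTION_NEEDS = {
    "page_navigation": {"pagination"},    # page forward/back, go to a page
    "section_navigation": {"sections"},   # navigate to sections or parts
    "print_sections": {"sections"},       # print designated sections
    "print_codex": {"pagination"},        # two-sided print of the whole object
    "fielded_search": {"features"},       # search within encoded features
}

# The three Y/N items of the structural metadata list, in order.
METADATA_LEVELS = ("pagination", "sections", "features")


def structural_metadata_needed(functions):
    """Return a Y/N (True/False) answer per metadata level.

    functions is an iterable of keys from FUNCTION_NEEDS.
    """
    needed = set()
    for name in functions:
        if name not in FUNCTION_NEEDS:
            raise KeyError(f"unknown delivery function: {name}")
        needed.update(FUNCTION_NEEDS[name])
    return {level: level in needed for level in METADATA_LEVELS}


if __name__ == "__main__":
    required = ["page_navigation", "print_sections"]
    for level, answer in structural_metadata_needed(required).items():
        print(f"encode {level}: {'Y' if answer else 'N'}")
```
]]></preformat>
<p>Because costs for structural metadata rise with the number of behaviours supported, a table of this kind also shows which functions could be dropped to avoid a whole level of encoding.</p>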
</sec>
</sec>
<sec>
<title>Question 3: how much quality is needed?</title>
<p>As noted above, major cost‐benefit decisions pertain to the amount of processing – particularly manual processing – that needs to be applied to “raw scans” and “raw text”. The adage of getting what you pay for generally holds true. Costs for structural metadata increase in some proportion to the numbers of behaviours that must be supported. Prices for scanning partially relate to numbers of images (which might be produced in multiple versions per page), but quality requirements have a greater impact upon cost. For page images, quality may be defined subjectively (e.g. with terms such as “legibility” and “fidelity”), semi‐objectively by the Quality Index (QI), or objectively with performance metrics such as the modulation transfer function (MTF). QI and MTF are different metrics used to judge resolution: QI relies on targets and human interpretation of results; MTF on targets and software analysis (Kenney and Chapman, 1995; Williams, 1998). With full text, quality is synonymous with accuracy of character transcription, text formatting, or both.</p>
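<p>For full text, accuracy requirements are easier to make explicit when they are stated as a number. One common way to do so, offered here as an assumption rather than as the method of the sources cited above, is character accuracy: one minus the edit (Levenshtein) distance between a trusted reference transcription and the converted text, divided by the length of the reference. The function names in this Python sketch are hypothetical.</p>
<preformat><![CDATA[
```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, converted: str) -> float:
    """Share of reference characters transcribed correctly, from 0.0 to 1.0."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    errors = levenshtein(reference, converted)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    reference = "Good digital images can be made by scanning paper."
    converted = "Good digita1 images can be rnade by scanning paper."
    print(f"character accuracy: {character_accuracy(reference, converted):.3f}")
```
]]></preformat>
<p>A contract could then state a threshold for a sample of pages checked against a keyed reference, rather than leaving “accuracy” implicit; this measure does not cover accuracy of text formatting.</p>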
<p>When selecting equipment and services, make quality requirements explicit, not implicit. Detailed RFPs, specifications and contracts are instrumental in developing realistic budgets and schedules. Explicit quality requirements also promote cordial relations among production teams.</p>
<p>Because people who use digitised surrogates of historic text might judge quality by what isn’t there, completeness is an important attribute to define. Under what circumstances may parts of the original work be excluded in reformatting? (Answers to Question 2 are pertinent. Blank pages, for example, need to be represented in order to produce printed codexes with accurate pagination.) To reduce costs, may portions of page margins be cropped? Is original order important? Must each part of the original be located
<italic>in situ</italic>
, or may “ancillary items” such as bookplates, foldouts, and advertisements be placed in different locations?</p>
<p>(
<italic>Note</italic>
: Mandating “100 per cent completeness” reduces the likelihood that either microfilm or photocopy surrogates would suffice as a single source for digitisation; in these cases, one would need to use original printed sources to fill in gaps, or, if more cost effective, to waive using surrogates and digitise originals.)</p>
<sec>
<title>Page image quality</title>
<p>Guides to digitisation sometimes overlook the importance of stressing basic photographic principles of good object‐to‐lens positioning, even lighting, sharp focus, and accurate exposure to produce pleasing images. Much of the discussion about digital quality centres on maximising resolution and tone reproduction, and minimising noise and other artefacts. Because larger file sizes can increase preservation costs (see Question 7), it is important to focus on what is needed to achieve success in a project, and not to get sidetracked by investigating which system can produce the highest resolution or bit depth for a given cost.</p>
<p>Good digital masters are not characterised by size, but by their capability to produce the appropriate range of deliverables to meet current and future use requirements. If evidence of aging is not deemed to be intrinsic to the value of the reproductions, then colour digitisation would be an unnecessary expense. There are also cases in which file sizes suitable to produce 1:1 printed facsimiles would not be sufficient to meet all use requirements – such as permitting users to view details on screen at 10:1 or greater magnification.</p>
<p>
<xref ref-type="fig" rid="F_2640270102004">Table IV</xref>
provides examples of how quality benchmarks influence cost. (All other things being equal, costs also vary according to the size and condition of source materials.) To generalise, quality requirements at the “legibility” end of the spectrum are most likely to permit scanning from originals or intermediates (photocopies, microfilm), and minimise or eliminate the need for post‐capture processing. Quality requirements towards the “facsimile/enhanced” end of the spectrum are likely to require scanning original materials, using professional skill and/or equipment, and performing post‐capture image processing.</p>
<p>Quality benchmarks for page images should be considered carefully due to the finality of the photographic process. Tones and details not captured in a digital master cannot be added later. Thus, increasing quality in the future comes at the expense of rescanning pages (or other parts of the original object such as foldouts or covers) rather than re‐processing masters as needed.</p>
</sec>
<sec>
<title>Full text quality</title>
<p>Full text has one quality attribute: accuracy. As with page images, quality benchmarks may be placed on a spectrum from inexact (but acceptable) to exact. Answering the following questions will help to establish accuracy benchmarks for your project:</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>What is the minimum percentage of characters (e.g. per page) that must be accurately transcribed? (Conversely, what is the maximum error rate that can be tolerated?)</p>
</list-item>
<list-item>
<label></label>
<p>Over and above the minimum, are higher character accuracy rates required for regions of interest such as formulas, captions, titles, personal names, indexes, etc.?</p>
</list-item>
<list-item>
<label></label>
<p>If full text is to be displayed (see Question 1), does original layout need to be replicated or approximated? Fonts?</p>
</list-item>
</list>
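<p>To make the first of these questions concrete, character accuracy for a page can be measured by comparing a candidate transcription with a trusted reference transcription. The following is a minimal sketch, assuming that accuracy is defined as one minus the edit distance divided by the length of the reference text. That definition, the function names and the sample strings are illustrative assumptions, not a method prescribed in this article.</p>
<preformat>
```python
def edit_distance(reference: str, candidate: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(candidate) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, cand_char in enumerate(candidate, start=1):
            # Cost of aligning the two characters (0 if they match, 1 if not).
            substitution = previous[j - 1] + (ref_char != cand_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, candidate: str) -> float:
    """Share of reference characters transcribed correctly (0.0 to 1.0)."""
    if not reference:
        return 1.0 if not candidate else 0.0
    errors = edit_distance(reference, candidate)
    return max(0.0, 1.0 - errors / len(reference))


def meets_benchmark(reference: str, candidate: str, minimum: float) -> bool:
    """True if the candidate reaches the project's minimum accuracy."""
    return character_accuracy(reference, candidate) >= minimum
```
</preformat>
<p>For example, comparing the reference “abcd” with the candidate “abxd” gives one error in four characters, an accuracy of 0.75, which would fail a benchmark of 0.9995.</p>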
<p>One might reason that any level of searching is better than none, and disregard setting a minimum accuracy rate in favour of accepting the automated results from OCR. (Because uncorrected OCR‐generated text – “raw OCR” – can easily be ten times less expensive than keying, it is an attractive option for machine‐printed text input/full‐text‐for‐searching output workflows.) But would this strategy satisfy your project requirements? Using raw OCR for searching works well until users ask, “How reliable are these results?” Furthermore, if text is displayed, users might ask, “How reliable is this transcription?” Answering such questions precisely requires comparing OCR results to the original source. (Given the inherent redundancy in many keying workflows, accuracy rates from keying are generally trustworthy.) Statistical sampling methods have been used to project overall rates of accuracy for uncorrected OCR, but these procedures, like manual comparisons, require additional staff, procedures, or systems (Weissman, 2002a).</p>
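<p>As an illustration of how sampling can project an overall accuracy rate, the sketch below draws a random sample of pages and summarises the per‐page accuracies that staff measure for that sample. It assumes a simple random sample and a normal approximation for the confidence interval. It is not the procedure described in Weissman (2002a), and the function names and default values are hypothetical.</p>
<preformat>
```python
import math
import random
import statistics


def choose_sample(page_ids: list[str], sample_size: int, seed: int = 0) -> list[str]:
    """Pick pages for manual comparison; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(page_ids, min(sample_size, len(page_ids)))


def project_accuracy(sampled_accuracies: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Return (mean, lower, upper) for the projected accuracy rate.

    Uses a normal approximation; z = 1.96 corresponds to roughly 95 per cent.
    """
    mean = statistics.mean(sampled_accuracies)
    if len(sampled_accuracies) == 1:
        return mean, mean, mean
    std_error = statistics.stdev(sampled_accuracies) / math.sqrt(len(sampled_accuracies))
    return mean, max(0.0, mean - z * std_error), min(1.0, mean + z * std_error)
```
</preformat>
<p>The width of the interval shows how much trust the projected rate deserves. A wide interval suggests that more pages need to be checked before the figure is quoted to users.</p>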
<p>Some manual text conversion processes are inevitable if original text is handwritten, exceptionally high accuracy rates are mandated, or baseline per‐page minimums need to be guaranteed. In these cases, either text is keyed, or raw OCR is inspected and corrected to meet quality objectives. OCR saves costs only in theory. Whether it does so in your project depends on the chosen threshold for character accuracy and on the quality of page images, because OCR processes only digital images, not printed pages. For historic materials (particularly nineteenth century and earlier), it is possible that the combined costs of scanning + OCR + correcting OCR‐generated errors will exceed keying costs to meet project standards for quality.</p>
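<p>The comparison between the two routes is simple arithmetic once unit costs are known. The sketch below is one way to set it out, assuming that correction cost scales with the number of errors that must be fixed to lift raw OCR to the target accuracy. All names are hypothetical, and every cost figure must come from the project's own quotations.</p>
<preformat>
```python
from dataclasses import dataclass


@dataclass
class PageCosts:
    """Per-page unit costs supplied by the project (no defaults are implied)."""
    scanning: float
    ocr: float
    correction_per_error: float
    keying: float


def expected_errors(chars_per_page: int, raw_accuracy: float, target_accuracy: float) -> float:
    """Errors per page that must be corrected to reach the target accuracy."""
    gap = max(0.0, target_accuracy - raw_accuracy)
    return chars_per_page * gap


def ocr_route_cost(costs: PageCosts, chars_per_page: int,
                   raw_accuracy: float, target_accuracy: float) -> float:
    """Scanning + OCR + correction, per page."""
    errors = expected_errors(chars_per_page, raw_accuracy, target_accuracy)
    return costs.scanning + costs.ocr + errors * costs.correction_per_error


def cheaper_route(costs: PageCosts, chars_per_page: int,
                  raw_accuracy: float, target_accuracy: float) -> str:
    """Return "ocr" only when the combined OCR route costs less than keying."""
    ocr_total = ocr_route_cost(costs, chars_per_page, raw_accuracy, target_accuracy)
    return "ocr" if costs.keying > ocr_total else "keying"
```
</preformat>
<p>Running the comparison with raw accuracy rates measured on a pilot batch shows how sensitive the outcome is to image quality: a small drop in raw accuracy can reverse the result.</p>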
<p>Fortunately, quality decisions about full text masters do not have the finality of those for page images. Post‐capture processes such as automated spell checking, keying, or re‐keying can be applied to enhance accuracy when resources are available. Accuracy can be improved for entire works or for meaningful portions (e.g. formulas, tables, personal names, place names, captions).</p>
<p>If full text is to be displayed, one must decide whether it should replicate or approximate original fonts and page layout. There are several methods to control how text is rendered (on screen or paper), but all depend on underlying text encoding. Encoding should be considered in the broader context of digital object functionality (Question 2), as its principal function is to embed document structure. Managers seriously considering setting accuracy requirements for the formatting and rendering of full text are encouraged to consult resources that address this topic more fully (HATII and NINCH, 2002).</p>
<p>By simply describing where quality must fall on the spectrums of legible‐to‐faithful for page images and raw‐to‐accurate for full text, technical specifications can be developed – or choices can at least be narrowed – for digital masters as follows.</p>
</sec>
<sec>
<title>Specifications</title>
<p>Standard of completeness is to:</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>include all sides of all pages, including covers, in original volume;</p>
</list-item>
<list-item>
<label></label>
<p>include all sides of all pages
<italic>in situ</italic>
;</p>
</list-item>
<list-item>
<label></label>
<p>permit excluding: _____ (name the non‐essential components).</p>
</list-item>
</list>
<p>Page image masters must (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>be saved at true proportions (1:1) to the original printed page;</p>
</list-item>
<list-item>
<label></label>
<p>meet standard of legibility (for detail and tone reproduction);</p>
</list-item>
<list-item>
<label></label>
<p>meet standard of fidelity (for detail and tone reproduction);</p>
</list-item>
<list-item>
<label></label>
<p>meet specified intermediate standard (e.g. faithful detail, legible tone).</p>
</list-item>
</list>
<p>Full text must (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>be complete (one text file per page), but character accuracy may be raw;</p>
</list-item>
<list-item>
<label></label>
<p>be complete, with character accuracy meeting minimum level (e.g. 99.95 per cent);</p>
</list-item>
<list-item>
<label></label>
<p>be complete, with accuracy meeting specified intermediate level (e.g. raw OCR for most text, 99.995 per cent accuracy for tables/formulas);</p>
</list-item>
<list-item>
<label></label>
<p>approximate (or replicate) the layout, fonts, etc. of the original pages.</p>
</list-item>
</list>
</sec>
</sec>
<sec>
<title>Question 4: what are the planned outcomes for the source materials?</title>
<p>Mandated outcomes for source materials shape specifications for several workflows. These decisions are fundamental to preparation, conservation and cataloguing activities, particularly if items are relocated following digitisation. Handling policies, moreover, also serve as principal criteria to select digitisation equipment (scanners, cameras, lighting, copystands, and cradles).</p>
<p>Although digitisation has often been cited as a preservation strategy that will decrease requests to consult originals, preservation administrators have underscored that reformatting itself subjects materials to risks of loss or damage (McIntyre, 1998). Chapman (1998) notes, “When considering handling at the level of the collection rather than the single item, it is not unreasonable to argue that a group of materials may never be handled to such an extent again during its lifetime”. Transporting items, even within an institution, and handling them in cataloguing, preparation and digitisation workflows introduces risks, particularly for brittle or fragile materials. On the other hand, reformatting provides an opportune time to assess, stabilise, or rehouse items.</p>
<p>As illustrated in
<xref ref-type="fig" rid="F_2640270102005">Table V</xref>
, planned outcomes pertaining to the structure, housing, storage location, and circulation policy of source materials influence costs. In general, changes to the status quo increase costs. Disbinding is a notable, and complex, exception. Pre‐scanning disbinding costs could easily be offset by cost savings in digitisation, but what additional work would need to take place afterwards? If volumes are retained, would they be rebound and returned to the stacks as circulating copies? Rebound or left disbound and moved to depository storage (not accessible to the public)? These questions help to shape cataloguing specifications (and costs), as records need to specify location and circulation information accurately at the end of the project.</p>
<p>Setting quality benchmarks for page images (Question 3) helps a manager and service provider answer the question, “How should technicians use the scanners or digital cameras in a project?” The terms and conditions for materials handling, however, answer the more fundamental question: “Which type of scanner or camera can be used for your project?” (see
<xref ref-type="fig" rid="F_2640270102006">Table VI</xref>
).</p>
<p>For single‐page sources, would autofeeding be allowed? If not, then the least expensive (highest production) type of scanner would not be viable. Given the format, size, and condition of printed pages, would technicians be permitted to turn pages or bound volumes over to position them correctly on a scanner platen? If not, then flatbed scanners would also be rejected, limiting options to overhead scanners or digital cameras with appropriate copystands and book cradles. Bound volumes that must be handled carefully can only be digitised in two ways, particularly if the volumes should not be opened fully. Either one person uses a digital camera or bookscanner with a specialised cradle (Weissman, 2002b) or digitisation becomes a two‐person job: one positions the volume and turns pages, the other operates the camera.</p>
<p>Finally, in addition to establishing criteria to select scanning equipment, handling policies help managers assess “hybrid approaches” to digitising bound volumes. Book‐to‐film‐to‐digital approaches might be sensible for institutions that have existing infrastructure for high‐production, safe‐handling bound‐volume reformatting. (Many e‐reserves programmes, for example, mandate book‐to‐photocopy‐to‐digital production workflows to balance requirements for handling, quality and cost.) Hybrid approaches are neither intrinsically good nor bad; they should be evaluated in the larger context of handling guidelines and overall costs. If the total cost of photocopying or microfilming and digitising is less than direct digitisation, then two‐copy workflows might be judged to be the best approach – pending review and acceptance of digital image quality.</p>
<p>By choosing outcomes for source materials, technical specifications can be developed – or choices can at least be narrowed – for materials preparation, conservation, cataloguing workflows, and scanning equipment, as follows.</p>
<sec>
<title>Specifications</title>
<p>Materials preparation requires procedures and staff to (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>disbind volumes (if lower scanning costs are guaranteed);</p>
</list-item>
<list-item>
<label></label>
<p>clean, repair or conserve materials;</p>
</list-item>
<list-item>
<label></label>
<p>rehouse materials.</p>
</list-item>
</list>
<p>During digitisation (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>handling policies must be followed (describe …);</p>
</list-item>
<list-item>
<label></label>
<p>curator must be present (to supervise handling, or to handle items directly).</p>
</list-item>
</list>
<p>Viable scanner(s) (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>flatbed, overhead or camera (note whether greyscale or colour is required);</p>
</list-item>
<list-item>
<label></label>
<p>overhead or digital camera only;</p>
</list-item>
<list-item>
<label></label>
<p>book cradle(s) required (180° cradle __ ) (90° cradle __ ).</p>
</list-item>
</list>
<p>Post‐digitisation workflows require (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>materials treatment or rehousing;</p>
</list-item>
<list-item>
<label></label>
<p>cataloguing to update item location (or withdrawal);</p>
</list-item>
<list-item>
<label></label>
<p>cataloguing to update circulation status.</p>
</list-item>
</list>
</sec>
</sec>
<sec>
<title>Question 5: how will people find the digital objects?</title>
<p>Digitised text objects that meet requirements for functionality, image quality and accuracy are of little use if they cannot be located and retrieved. Standards and practices for descriptive metadata, which facilitate discovery, are more mature and numerous than those for structural and administrative metadata, page images, and full text. There are probably more viable options in this area of decision making and specification setting than any other; familiarising a project team with a range of metadata standards to prioritise choices is not practical. Establishing baseline requirements for discovery helps to prepare a manager or project team to solicit expert advice, review literature, or evaluate products.</p>
<p>Questions pertaining to discovery might begin with scope:</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>Where will people be able to locate your digitised text resources? From one place or several? (Bibliographic catalogues? Catalogues for encoded finding aids? Web search engine indexes? Custom‐designed Web site portals? Abstract and indexing services?)</p>
</list-item>
<list-item>
<label></label>
<p>Is interoperability important? Do records created for one system (domain) need to be constructed so that they can be copied to or “harvested” by other systems?</p>
</list-item>
<list-item>
<label></label>
<p>Which access points are essential to ensure that primary audiences will locate materials?</p>
</list-item>
<list-item>
<label></label>
<p>As a corollary, which additional systems and/or access points are needed to promote use of the items or collections by new audiences?</p>
</list-item>
</list>
<p>Key cost decisions in this arena, however, pertain to authority control, levels of domain expertise required for subject cataloguing, and maintenance costs over time. Processes for updating descriptive metadata should be discussed, particularly if the metadata will be distributed among multiple systems. What application(s) need to be maintained to permit editing and updating?</p>
<p>Issues of synchronicity should also be addressed. Do digitised objects need to be identified in the same records as their print or film counterparts? If so, what happens if decisions are made not to map digitised versions to the existing levels of description and organisation of the originals? Suppose, for example, that several pamphlets stored in an envelope, and described collectively as a group in the catalogue record, are digitised to produce independently titled objects. Would the catalogue record link to an intermediate page that lists all titles? Or would the record be expanded to list titles individually, with links to each digital object? Or would the group‐level record be replaced by new item‐level records?</p>
<p>Naming and linking strategies are also important components of discovery metadata. When digital objects are moved to new locations, how will links be updated? Do strategies for “persistent naming” need to be explored, or is the project of small enough scale that it would be viable to update links manually?</p>
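<p>The idea behind persistent naming can be sketched as a lookup table: catalogue records store a name that never changes, and only the table is updated when an object moves. The class below is an in‐memory illustration under that assumption. The class, its methods, and any names or locations passed to it are hypothetical, and a production service would need durable storage and access control.</p>
<preformat>
```python
class NameResolver:
    """Maps persistent names to current locations (an in-memory sketch)."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, name: str, location: str) -> None:
        """Assign a name once; re-registering would break persistence."""
        if name in self._locations:
            raise ValueError(f"name already registered: {name}")
        self._locations[name] = location

    def move(self, name: str, new_location: str) -> None:
        """Record a new location without changing the published name."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_location

    def resolve(self, name: str) -> str:
        """Return the current location for a persistent name."""
        return self._locations[name]
```
</preformat>
<p>With this arrangement, moving an object requires one call to <italic>move</italic> rather than an edit to every record that links to it. For a small project, updating links manually may still be the cheaper option.</p>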
<p>By describing discovery requirements in general terms, technical specifications can be developed – or choices can at least be narrowed – for descriptive metadata and linking as follows.</p>
<sec>
<title>Specifications</title>
<p>Descriptive metadata (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>records (or tags) to be created for one “stand‐alone system”;</p>
</list-item>
<list-item>
<label></label>
<p>records need to be shared (uploaded to or harvested by other systems);</p>
</list-item>
<list-item>
<label></label>
<p>digitised objects need to be discoverable in multiple systems (bibliographic catalogues, finding aids catalogues, Web search engines, etc.);</p>
</list-item>
<list-item>
<label></label>
<p>standards/rules for authority control need to be implemented.</p>
</list-item>
</list>
<p>Naming and linking strategies (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>catalogue records or links on “main page” always point directly to object;</p>
</list-item>
<list-item>
<label></label>
<p>records/links (sometimes/always) point to an intermediate “contents” page;</p>
</list-item>
<list-item>
<label></label>
<p>persistent links desired.</p>
</list-item>
</list>
</sec>
</sec>
<sec>
<title>Question 6: are there any restrictions to using the digital objects?</title>
<p>This question applies to all versions of digitised text objects that will be stored and managed. If selection strategies are devised to identify works that can be distributed freely to all, does that mean that access would be unrestricted to all versions of each title – masters as well as networked‐delivery formats? Decisions regarding accessibility not only shape delivery infrastructure (e.g. systems for authorisation and authentication), but also potentially affect policies and practices for storage. </p>
<p>If some versions need to be controlled, then specify whether access needs to be mediated (e.g. to solicit payment, to solicit permissions, to facilitate non‐networked delivery of data, etc.) or restricted (i.e. accessible only to a designated subset of “the entire world”). If the latter, what is the subset and will its membership change over time?</p>
<p>By describing access restrictions in these general terms, technical specifications can be developed – or choices can at least be narrowed – for delivery systems, storage, and administrative metadata as follows.</p>
<sec>
<title>Specifications</title>
<p>Delivery infrastructure: systems needed (answer Y/N for each) to:</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>restrict access to authorised users;</p>
</list-item>
<list-item>
<label></label>
<p>request and receive payment from users;</p>
</list-item>
<list-item>
<label></label>
<p>forward request (e.g. to a curator), then to deliver versions asynchronously (e.g. by FTP or on CD‐ROM).</p>
</list-item>
</list>
<p>Storage (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>masters must be segregated from delivery versions to restrict access;</p>
</list-item>
<list-item>
<label></label>
<p>“response class” of storage must ensure real‐time delivery;</p>
</list-item>
<list-item>
<label></label>
<p>response class of storage must ensure asynchronous delivery.</p>
</list-item>
</list>
<p>Administrative metadata (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>level of access needs to be associated with each version of the object;</p>
</list-item>
<list-item>
<label></label>
<p>level of access needs to be associated with each digital file;</p>
</list-item>
<list-item>
<label></label>
<p>systems needed to change/update access levels over time.</p>
</list-item>
</list>
</sec>
</sec>
<sec>
<title>Question 7: what lifespan is envisioned for the digital resources?</title>
<p>Stated assumptions about life expectancies for digital resources help in making decisions about where data need to be stored. (Used here, “digital resources” refers to all digitised text components: descriptive, structural and administrative metadata, page images, and full text.) Digital files of all formats are physical objects whose longevity relates meaningfully to the storage environment. To generalise, library and archival collections stored in controlled environments have a greater likelihood of remaining usable over time than collections stored in secure, although relatively uncontrolled, locations. Controlled storage environments are characterised by having systems and oversight designed to respond to known or perceived risks, such as biological, chemical, or mechanical failures. The notable and additional risk for digital files is technical obsolescence: incompatibility of data with associated devices needed to render information into human‐readable form.</p>
<p>The operative questions here are, “Where can data go?” and “What terms and conditions apply to storing digitised text components in these locations?” Project managers should investigate whether institutional policies exist or need to be developed to govern storage practices – as well as terms of ownership and rights of use – for materials managed by “external” entities. Emerging digital repositories and digital asset management services provide opportunities to delegate preservation responsibilities to specialists. These partnerships, however, will depend not only on negotiated agreements, but also on ongoing payments. In some cases, costs are billed annually according to the file sizes of the deposited objects ($ per GB); thus, decisions about where objects are to be stored might influence how materials are scanned (see Question 3). If these ongoing maintenance costs cannot be budgeted, what costs would accrue within the institution to assume responsibilities to ensure that digital objects remain usable beyond the life of the project?</p>
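<p>Because storage may be billed by size, it helps to estimate file sizes before choosing capture settings. The sketch below applies the standard calculation for an uncompressed raster image (pixel width × pixel height × bits per pixel ÷ 8) and multiplies the total by an annual rate. The function names are hypothetical, a gigabyte is assumed to be 1,000,000,000 bytes, and the rate is a placeholder to be replaced by the provider's actual price.</p>
<preformat>
```python
def uncompressed_size_bytes(width_in: float, height_in: float,
                            dpi: int, bits_per_pixel: int) -> int:
    """Size of one uncompressed page image, ignoring file headers."""
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    return pixels_wide * pixels_high * bits_per_pixel // 8


def annual_storage_cost(total_bytes: int, rate_per_gb: float) -> float:
    """Yearly charge for a deposit billed per decimal gigabyte."""
    gigabytes = total_bytes / 1_000_000_000
    return gigabytes * rate_per_gb
```
</preformat>
<p>For a 6 × 9 inch page, a 600 dpi 1‐bit master is 2,430,000 bytes uncompressed, whereas a 300 dpi 24‐bit colour master is 14,580,000 bytes. Compression changes these totals, so the figures are upper bounds for planning rather than predictions.</p>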
<p>By stating the minimum lifespan desired for digital resources, a project team can assess whether infrastructure is or will be sufficient to meet expectations of ongoing usability. Some amount of data transformation will be necessary – although how much, and at what intervals, cannot be predicted – for resources expected to keep pace with technology changes, user expectations, or both. Thus, most strategies for ongoing management of digital resources require administrative metadata to document ownership, rights, and technical attributes of data. (Some of this metadata might also be recorded to document provenance and the history of changes made to objects or their underlying components.)</p>
<p>Stating mandatory or strawman life expectancies for digital resources helps to develop technical specifications – or narrow choices – for storage, administrative oversight, and administrative metadata as follows.</p>
<sec>
<title>Specifications</title>
<p>Storage:</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>if offline, what measures need to be taken to ensure that minimum life expectancy will be met? (checksums, duplicate copies, distributed copies, periodic data checks, disaster recovery plans etc.);</p>
</list-item>
<list-item>
<label></label>
<p>if online, same question.</p>
</list-item>
</list>
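<p>Of the measures listed above, checksums and periodic data checks are straightforward to automate. The following is a minimal sketch using Python's standard <italic>hashlib</italic> module. It assumes that a manifest of relative file paths and their recorded checksums was created when the files were deposited. The function names, the manifest layout and the choice of SHA‐256 are illustrative assumptions.</p>
<preformat>
```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so that large page images fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest: dict[str, str], root: Path) -> list[str]:
    """Return relative paths that are missing or whose checksum differs."""
    problems = []
    for relative_path, recorded in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file() or file_checksum(target) != recorded:
            problems.append(relative_path)
    return problems
```
</preformat>
<p>Running <italic>changed_files</italic> on a schedule against each stored copy identifies files to restore from a duplicate or distributed copy. It detects damage but does not repair it, which is why checksums are listed alongside duplicate copies rather than instead of them.</p>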
<p>Administrative and operational oversight (answer “exists” or “needed”) to:</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>maintain descriptive metadata;</p>
</list-item>
<list-item>
<label></label>
<p>maintain digital objects;</p>
</list-item>
<list-item>
<label></label>
<p>negotiate with providers of storage or preservation services;</p>
</list-item>
<list-item>
<label></label>
<p>fund any of the activities cited above.</p>
</list-item>
</list>
<p>Administrative metadata (answer Y/N for each):</p>
<list list-type="bullet">
<list-item>
<label></label>
<p>ownership metadata (e.g. rights to preserve/distribute) required;</p>
</list-item>
<list-item>
<label></label>
<p>technical metadata required to facilitate future data transformations;</p>
</list-item>
<list-item>
<label></label>
<p>desire to make metadata “repository‐compliant” (regardless of whether data will be deposited to a repository right away).</p>
</list-item>
</list>
</sec>
</sec>
<sec>
<title>Summary</title>
<p>Digitisation projects require technical specifications and accompanying workflows for data and metadata production to be planned, budgeted and managed effectively. Specifications for one project may not be well suited to meet the requirements of another. Even if source materials are alike, user needs, delivery applications, and data management/preservation infrastructures can vary widely among institutions. Use of a decision‐making checklist at the start of planning is one way to establish functional requirements that managers and technical advisors may use as criteria to develop the best text digitisation strategy for their collections.</p>
</sec>
<sec>
<fig position="float" id="F_2640270102001">
<label>
<bold>Table I
<x> </x>
</bold>
</label>
<caption>
<p>Types of digitised text products</p>
</caption>
<graphic xlink:href="2640270102001.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640270102002">
<label>
<bold>Table II
<x> </x>
</bold>
</label>
<caption>
<p>Selected “flavours” of digitised text</p>
</caption>
<graphic xlink:href="2640270102002.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640270102003">
<label>
<bold>Table III
<x> </x>
</bold>
</label>
<caption>
<p>Sample list of object behaviours, by function</p>
</caption>
<graphic xlink:href="2640270102003.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640270102004">
<label>
<bold>Table IV
<x> </x>
</bold>
</label>
<caption>
<p>Sample quality benchmarks for key attributes of page images</p>
</caption>
<graphic xlink:href="2640270102004.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640270102005">
<label>
<bold>Table V
<x> </x>
</bold>
</label>
<caption>
<p>Costs related to designated outcomes for source materials</p>
</caption>
<graphic xlink:href="2640270102005.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640270102006">
<label>
<bold>Table VI
<x> </x>
</bold>
</label>
<caption>
<p>Production costs and scanner configurations related to handling decisions</p>
</caption>
<graphic xlink:href="2640270102006.tif"></graphic>
</fig>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<mixed-citation>
<person-group person-group-type="author">
<string-name>Adobe Systems</string-name>
</person-group>
(
<year>2002</year>
), “
<article-title>
<italic>The four flavours of Adobe PDF for paper‐based documents</italic>
</article-title>
”, Adobe Acrobat Capture 3.0 White Paper, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.adobe.com/products/acrcapture/pdfs/aacflavors.pdf">www.adobe.com/products/acrcapture/pdfs/aacflavors.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Chapman</surname>
,
<given-names>S.</given-names>
</string-name>
</person-group>
(
<year>1998</year>
), “
<article-title>
<italic>Guidelines for image capture</italic>
</article-title>
”,
<source>
<italic>Joint RLG and NPO Preservation Conference, Guidelines for Digital Imaging</italic>
</source>
, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.rlg.org/preserv/joint/chapman.html">www.rlg.org/preserv/joint/chapman.html</ext-link>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation>
<person-group person-group-type="author">
<string-name>Document Solutions</string-name>
</person-group>
(
<year>2002</year>
), “
<article-title>
<italic>Services: PDF conversion</italic>
</article-title>
”, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.document-solutions.com/services_ptop.htm">www.document‐solutions.com/services_ptop.htm</ext-link>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation>
<person-group person-group-type="author">
<string-name>Humanities Advanced Technology and Information Institute (HATII) and NINCH</string-name>
</person-group>
(
<year>2002</year>
), “
<article-title>
<italic>Digitisation and encoding of text</italic>
</article-title>
”,
<source>
<italic>The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials</italic>
</source>
,
<publisher-name>National Initiative for a Networked Cultural Heritage</publisher-name>
,
<publisher-loc>Washington, DC</publisher-loc>
, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.nyu.edu/its/humanities/ninchguide/V/">www.nyu.edu/its/humanities/ninchguide/V/</ext-link>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Hurley</surname>
,
<given-names>B.J.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Price‐Wilkin</surname>
,
<given-names>J.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Proffitt</surname>
,
<given-names>M.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Besser</surname>
,
<given-names>H.</given-names>
</string-name>
</person-group>
(
<year>1999</year>
),
<source>
<italic>The Making of America II Testbed Project: A Digital Library Service Model</italic>
</source>
,
<publisher-name>Council on Library and Information Resources</publisher-name>
,
<publisher-loc>Washington, DC</publisher-loc>
, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.clir.org/pubs/reports/pub87/part2.html">www.clir.org/pubs/reports/pub87/part2.html</ext-link>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Kenney</surname>
,
<given-names>A.R.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Chapman</surname>
,
<given-names>S.</given-names>
</string-name>
</person-group>
(
<year>1995</year>
),
<source>
<italic>Tutorial: Digital Resolution Requirements for Replacing Text‐Based Material: Methods for Benchmarking Image Quality</italic>
</source>
,
<publisher-name>Commission on Preservation and Access</publisher-name>
,
<publisher-loc>Washington, DC</publisher-loc>
.</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>McIntyre</surname>
,
<given-names>J.E.</given-names>
</string-name>
</person-group>
(
<year>1998</year>
), “
<article-title>
<italic>Protecting the physical form</italic>
</article-title>
”,
<source>
<italic>Joint RLG and NPO Preservation Conference, Guidelines for Digital Imaging</italic>
</source>
, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.rlg.org/preserv/joint/mcintyre.html">www.rlg.org/preserv/joint/mcintyre.html</ext-link>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Morrison</surname>
,
<given-names>A.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Popham</surname>
,
<given-names>M.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Wikander</surname>
,
<given-names>K.</given-names>
</string-name>
</person-group>
(
<year>1998</year>
), “
<article-title>
<italic>Creating and documenting electronic texts: a guide to good practice</italic>
</article-title>
”,
<source>
<italic>AHDS Guides to Good Practice</italic>
</source>
, available at:
<ext-link ext-link-type="uri" xlink:href="http://ota.ahds.ac.uk/documents/creating/">http://ota.ahds.ac.uk/documents/creating/</ext-link>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation>
<person-group person-group-type="editor">
<string-name>
<surname>Sitts</surname>
,
<given-names>M.K.</given-names>
</string-name>
</person-group>
(Ed.) (
<year>2000</year>
),
<source>
<italic>Handbook for Digital Projects: A Management Tool for Preservation and Access</italic>
</source>
,
<publisher-name>Northeast Document Conservation Centre</publisher-name>
,
<publisher-loc>Andover, MA</publisher-loc>
, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.nedcc.org/digital/dighome.htm">www.nedcc.org/digital/dighome.htm</ext-link>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Weissman Preservation Centre</surname>
,
<given-names>Harvard University Library</given-names>
</string-name>
</person-group>
(
<year>2002</year>
a), “
<article-title>
<italic>Text digitisation, OCR</italic>
</article-title>
”, available at:
<ext-link ext-link-type="uri" xlink:href="http://preserve.harvard.edu/resources/ocr.html">http://preserve.harvard.edu/resources/ocr.html</ext-link>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Weissman Preservation Centre</surname>
,
<given-names>Harvard University Library</given-names>
</string-name>
</person-group>
(
<year>2002</year>
b), “
<article-title>
<italic>Bookscanners</italic>
</article-title>
”, available at:
<ext-link ext-link-type="uri" xlink:href="http://preserve.harvard.edu/resources/bookscanners.html">http://preserve.harvard.edu/resources/bookscanners.html</ext-link>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Williams</surname>
,
<given-names>D.R.</given-names>
</string-name>
</person-group>
(
<year>1998</year>
), “
<article-title>
<italic>What is an MTF … and why should you care?</italic>
</article-title>
”,
<source>
<italic>RLG DigiNews</italic>
</source>
, Vol.
<volume>2</volume>
No.
<issue>1</issue>
, available at:
<ext-link ext-link-type="uri" xlink:href="http://www.rlg.org/preserv/diginews/diginews21.html">www.rlg.org/preserv/diginews/diginews21.html</ext-link>
#technical</mixed-citation>
</ref>
</ref-list>
</back>
</article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>Managing text digitisation</title>
</titleInfo>
<titleInfo type="alternative" lang="en" contentType="CDATA">
<title>Managing text digitisation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Stephen</namePart>
<namePart type="family">Chapman</namePart>
<affiliation>Stephen Chapman is the Preservation Librarian for Digital Initiatives at Weissman Preservation Centre, Harvard University Library, Cambridge, Massachusetts, USA.</affiliation>
</name>
<typeOfResource>text</typeOfResource>
<genre type="review-article" displayLabel="review-article"></genre>
<originInfo>
<publisher>MCB UP Ltd</publisher>
<dateIssued encoding="w3cdtf">2003-02-01</dateIssued>
<copyrightDate encoding="w3cdtf">2003</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract lang="en">Good project plans include technical specifications, plans of work, budgets and schedules that are consistent with project goals. Text digitisation presents many variables and many choices, so there is no one-size-fits-all solution to recommend. This article presents a simple-to-use questionnaire as a tool for project managers to translate their vision of text digitisation into a series of functional requirements optimised for their collections and users. These requirements can then be used to develop specifications, draft workflows, and select appropriate staff, services and equipment.</abstract>
<subject>
<genre>Keywords</genre>
<topic>Digital libraries</topic>
<topic>Projects</topic>
<topic>Electronic data processing</topic>
<topic>Information technology</topic>
</subject>
<relatedItem type="host">
<titleInfo>
<title>Online Information Review</title>
</titleInfo>
<genre type="Journal">journal</genre>
<subject>
<genre>Emerald Subject Group</genre>
<topic authority="SubjectCodesPrimary" authorityURI="cat-IKM">Information &amp; knowledge management</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-ICT">Information &amp; communications technology</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-INT">Internet</topic>
</subject>
<subject>
<genre>Emerald Subject Group</genre>
<topic authority="SubjectCodesPrimary" authorityURI="cat-LISC">Library &amp; information science</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-CBM">Collection building &amp; management</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-IBRT">Information behaviour &amp; retrieval</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-RMP">Records management &amp; preservation</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-BIB">Bibliometrics</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-DAT">Databases</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-DOCM">Document management</topic>
</subject>
<identifier type="ISSN">1468-4527</identifier>
<identifier type="PublisherID">oir</identifier>
<identifier type="DOI">10.1108/oir</identifier>
<part>
<date>2003</date>
<detail type="volume">
<caption>vol.</caption>
<number>27</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>1</number>
</detail>
<extent unit="pages">
<start>17</start>
<end>27</end>
</extent>
</part>
</relatedItem>
<identifier type="istex">C3FAF33A64CC32020BC01071612AE259794A3A61</identifier>
<identifier type="DOI">10.1108/14684520310462536</identifier>
<identifier type="filenameID">2640270102</identifier>
<identifier type="original-pdf">2640270102.pdf</identifier>
<identifier type="href">14684520310462536.pdf</identifier>
<accessCondition type="use and reproduction" contentType="copyright">© MCB UP Limited</accessCondition>
<recordInfo>
<recordContentSource>EMERALD</recordContentSource>
</recordInfo>
</mods>
</metadata>
<serie></serie>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000298 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000298 | SxmlIndent | more
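Once a record has been extracted this way, its MODS block can be read with any XML parser. The following is a minimal sketch in Python, not part of Dilib: the function name `summarise_mods` and the trimmed `SAMPLE` fragment are hypothetical, and the sketch assumes the XML is well-formed (ampersands escaped, no undeclared namespace prefixes), which may not hold for raw HfdSelect output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down copy of the <mods> element shown in the record.
# Raw output may need cleaning (unescaped "&", namespace prefixes) before parsing.
SAMPLE = """<mods version="3.6">
  <titleInfo lang="en"><title>Managing text digitisation</title></titleInfo>
  <name type="personal">
    <namePart type="given">Stephen</namePart>
    <namePart type="family">Chapman</namePart>
  </name>
  <relatedItem type="host">
    <titleInfo><title>Online Information Review</title></titleInfo>
    <identifier type="DOI">10.1108/oir</identifier>
  </relatedItem>
  <identifier type="DOI">10.1108/14684520310462536</identifier>
</mods>"""


def summarise_mods(xml_text):
    """Return title, author, journal and article DOI from a MODS fragment."""
    mods = ET.fromstring(xml_text)
    given = mods.findtext("name/namePart[@type='given']", default="")
    family = mods.findtext("name/namePart[@type='family']", default="")
    return {
        "title": mods.findtext("titleInfo/title"),
        "author": f"{given} {family}".strip(),
        "journal": mods.findtext("relatedItem[@type='host']/titleInfo/title"),
        # Direct children only, so the journal-level DOI in <relatedItem> is skipped.
        "doi": mods.findtext("identifier[@type='DOI']"),
    }


if __name__ == "__main__":
    for field, value in summarise_mods(SAMPLE).items():
        print(f"{field}: {value}")
```

The paths are relative to the `<mods>` root on purpose: the record carries two DOI identifiers, one for the journal and one for the article, and only the article-level one is a direct child of `<mods>`.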

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:C3FAF33A64CC32020BC01071612AE259794A3A61
   |texte=   Managing text digitisation
}}


This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024