
Technical and Metadata Standards

Technical Details

Equipment used:
  • Canon CanoScan 8800F
  • MacBook Air
  • HP Workstation
  • Nikon D3100
Software used:
  • MP Navigator EX Ver. 1.08
  • Adobe Bridge CC 6.0.0.151 x64
  • ABBYY FineReader 12
  • Tesseract 3.04.01
  • Omeka.org
  • Omeka.net
  • Sublime
  • TextEdit
  • FileZilla
  • Cyberduck
  • Google Docs

Digitization Standards:

Scanning:

Documents and photo prints were scanned at 24-bit color and 600 dpi so that items would not need to be rescanned if a lower-quality scan later proved inadequate. If evaluation of the scans shows that a smaller or different file is needed, post-processing software such as Preview on macOS or Adobe Photoshop can resize or reformat the images. All scans were saved as TIFF master files because the format is lossless and supports embedded metadata.
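To give a sense of what these settings imply for storage, the following sketch estimates the pixel dimensions and uncompressed size of a 24-bit, 600 dpi scan. The 8.5 by 11 inch page size and the function name are assumptions made for illustration only; real TIFF files also carry headers and embedded metadata, and may use lossless compression, so actual sizes will differ.

```python
def uncompressed_scan_size(width_in, height_in, dpi=600, bit_depth=24):
    """Return (width_px, height_px, size_bytes) for an uncompressed scan.

    width_in and height_in are the physical dimensions in inches.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_pixel = bit_depth // 8  # 24-bit color is 3 bytes per pixel
    return width_px, height_px, width_px * height_px * bytes_per_pixel


# Hypothetical example: a letter-size document page.
w, h, size = uncompressed_scan_size(8.5, 11)
print(f"{w} x {h} px, about {size / 1024**2:.0f} MiB")  # 5100 x 6600 px, about 96 MiB
```

Master files of this size are one reason smaller JPEG derivatives are prepared for web delivery.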

Scanning photo prints at high resolution provides the flexibility to choose the quality of the image shown on the web. Specimen photos were converted to JPEG and resized to 2,000 pixels on the long side so that specimen details can be viewed clearly on the website without long loading times. Documents were scanned at the same settings as the photos, because 24-bit scans of old documents that have deteriorated with age give better results in the OCR tools (ABBYY FineReader 12 and Tesseract 3.04.01). The document scans were also converted to JPEG before being posted on the web; they are generally 1,000 pixels on the long side, although this varies with the clarity of the document text.
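The long-side resizing described above can be sketched as a small calculation. This is an illustrative helper, not the tool actually used for the collection; the function name and the sample dimensions are assumptions. The resulting size could then be passed to an image editor or library to produce the JPEG derivative.

```python
def fit_long_side(width, height, long_side):
    """Scale (width, height) so the longer edge equals long_side.

    The aspect ratio is preserved; results are rounded to whole pixels.
    Note that an image smaller than long_side would be scaled up.
    """
    if width <= 0 or height <= 0 or long_side <= 0:
        raise ValueError("dimensions must be positive")
    scale = long_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical 4 x 6 in print scanned at 600 dpi (2400 x 3600 px), web size 2,000 px.
print(fit_long_side(2400, 3600, 2000))  # (1333, 2000)

# Hypothetical letter-size document scan (5100 x 6600 px), web size 1,000 px.
print(fit_long_side(5100, 6600, 1000))  # (773, 1000)
```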

The scanning standard used meets or exceeds the standards of both the Library of Congress and the U.S. National Archives and Records Administration (NARA).

Metadata:

Producing a standardized metadata scheme for the IU Paleontology Collection (IUPC) is challenging for the following reasons:
i) the wide variety of analog resource formats
ii) the need to connect each element to both the original analog resource and the digitized resource
iii) the complex relationships among metadata values and the difficulty of differentiating them
iv) the lack of controlled vocabularies for the most important information about each specimen: taxonomy, stratigraphic units, and horizon

Darwin Core is designed to describe physical objects and observations. Consider digitizing a publication that describes a specimen in detail, with photographs. Would it be practical to use Darwin Core to describe the paper as a whole rather than the specimen mentioned in it? At what granularity should each element be described? If the author of the paper is not the collector of the specimens it mentions, should the collector’s name be included, and which element is appropriate for it? Should the metadata record represent both the original analog resource and the digitized resource? What is the best practice, and would the one-to-one principle work? The reverse case raises the same questions: when digitizing a specimen that was mentioned in a published work, how should the relationship between the specimen and the publication be shown in the metadata?

The current solution is to begin by evaluating users’ needs; users are most often interested in the original object. The metadata record is therefore standardized to represent the physical resource, using a customized combination of the Dublin Core and Darwin Core element sets as the metadata standard, with DC: relation describing the connection between a specimen and the publication that used it.
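As a rough illustration of this approach, the sketch below builds a record that describes the physical specimen and links it to a publication through a relation element. It is not the IUPC scheme itself: the function name, the choice of elements, and all values are placeholders, and mapping the collector to Darwin Core's recordedBy term is an assumption made for the example.

```python
def build_specimen_record(catalog_number, scientific_name, collector, publications):
    """Build a metadata record for a physical specimen.

    Keys combine Dublin Core (dc:) and Darwin Core (dwc:) element names.
    The record describes the original object, not its digital surrogate.
    """
    return {
        "dc:type": "PhysicalObject",            # DCMI Type Vocabulary value
        "dwc:basisOfRecord": "FossilSpecimen",
        "dwc:catalogNumber": catalog_number,
        "dwc:scientificName": scientific_name,
        "dwc:recordedBy": collector,            # assumed element for the collector
        "dc:relation": list(publications),      # works that used this specimen
    }


# All values below are hypothetical placeholders.
record = build_specimen_record(
    catalog_number="EXAMPLE-0001",
    scientific_name="Example genus species",
    collector="Example Collector",
    publications=["Example Author (1900). Example paper describing the specimen."],
)
for element, value in record.items():
    print(f"{element}: {value}")
```

Keeping the collector and the publication in separate elements lets the author of a paper differ from the person who collected the specimen without either being lost.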

I hope that, by compiling all the IUPC records in the database, the IUPC will be able to create a controlled vocabulary for paleontology with assistance from professionals in the field.

Until then, this metadata scheme is subject to change to best represent the IUPC material.

Metadata element sets:
  • Dublin Core
  • Darwin Core, an extension of the Dublin Core metadata schema for biological specimens
Controlled vocabulary:

LCNAF, W3CDTF, IMT, RFC5646, LCSH, DCMI Type Vocabulary, TGN, ISO 3166-1-alpha-2 country codes, USGS Geolex, DwC Terms, GRBio, ISO 8601:2004(E)
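Encoding schemes such as W3CDTF lend themselves to automated checking during data entry. The following is a minimal sketch that validates only the date-level forms of W3CDTF (YYYY, YYYY-MM, and YYYY-MM-DD); the forms that include a time and time zone are deliberately left out, and the function name is an assumption for illustration.

```python
import re
from datetime import date

# Date-only W3CDTF granularities: year, year-month, or complete date.
_W3CDTF_DATE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_w3cdtf_date(value):
    """Return True if value is YYYY, YYYY-MM, or YYYY-MM-DD and a real date."""
    match = _W3CDTF_DATE.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


for candidate in ["1998", "1998-07", "1998-07-04", "1998-02-30", "07/04/1998"]:
    print(candidate, is_w3cdtf_date(candidate))
```

The first three sample values are accepted; the last two are rejected, one for being an impossible calendar date and one for using a different format.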

Complete metadata scheme:

References:

http://memory.loc.gov/ammem/about/techStandards.pdf

http://purl.dlib.indiana.edu/iudl/findingaids/paleontology/encodedtext/VAD5801

https://memory.loc.gov/ammem/award99/icuhtml/dcguide.html

https://www.archives.gov/files/preservation/technical/guidelines.pdf

https://www.library.wisc.edu/digital-library-services/wp-content/uploads/sites/22/2015/06/dccompanion.pdf

Miller, S. J. (2011). Metadata for digital collections: A how-to-do-it manual. New York: Neal-Schuman Publishers.

https://www.nedcc.org/free-resources/preservation-leaflets/5.-photographs/5.1-a-short-guide-to-film-base-photographic-materials-identification,-care,-and-duplication

https://www.nps.gov/museum/coldstorage/html/filmid2_0.html