VII

Understand the system of standards and methods used to control and create information structures and apply basic principles involved in the organization and representation of knowledge.

Discourse

What good is a collection if it is not organized coherently? That is the relevance and gravity of this competency: it refers to the ability to catalog, order, classify and present a collection in an easily understandable, searchable and well-ordered manner. Cataloging and description are what separate a library from a warehouse of books; the systems we use to catalog our records and form subject-relevant collections distinguish the library as an invaluable academic resource. In essence, the record does not in itself provide enough information to be used to best effect, and we as information professionals must create companion data (referred to as metadata) to inform the patron’s access.

Collections are described and cataloged in the field according to a variety of factors, and no one size fits all. On the contrary, each discipline of library science offers its own comprehensive classification and cataloging standards. Systems such as the Open Archival Information System (OAIS) model, Dublin Core and Encoded Archival Description (EAD), along with the profession’s standards for finding aids and inventories, are most applicable to archival institutions and special repositories, while MARC, Dewey Decimal and Library of Congress descriptive and classification systems are most appropriate for more public, open access collections. Many institutions use a variety of classification systems in conjunction with AACR2 cataloging standards (Anglo-American Cataloguing Rules, Second Edition) to create complex, multifaceted information structures, often fully integrated with digital management systems. All such structures aim to increase the access, relevance, visibility, extensibility and “findability” (to invoke Peter Morville) of a collection. The ultimate goal is to take a disorganized collection of records and create a system which allows them to be browsed, interrogated and classified. Metadata provides a rich context to a record by describing its origin, creator, contents and place within a hierarchy of knowledge.
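To make the idea of companion data concrete, here is a minimal sketch in Python. It assumes a single invented item described with a handful of element names from the Dublin Core Metadata Element Set; the record values and the `matches` helper are hypothetical and exist only for illustration. It shows how a record that says nothing about itself becomes searchable once it carries metadata.

```python
# Hypothetical record: the item and all of its values are invented for illustration.
# The keys are element names from the Dublin Core Metadata Element Set.
record = {
    "title": "Letters from the Harbor Office",
    "creator": "Unknown clerk",
    "date": "1923",
    "type": "Text",
    "subject": ["Shipping", "Correspondence"],
    "description": "Bound volume of outgoing business letters.",
}


def matches(record, term):
    """Return True if the term appears in any metadata value (case-insensitive)."""
    term = term.lower()
    for value in record.values():
        # Some elements repeat (e.g. subject), so treat every value as a list.
        values = value if isinstance(value, list) else [value]
        if any(term in v.lower() for v in values):
            return True
    return False


print(matches(record, "shipping"))
print(matches(record, "poetry"))
```

A real catalog would index controlled vocabularies rather than scan raw strings, but the principle is the same: the patron queries the description, not the object.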

The existence of standard cataloging rules, as with AACR2, is an absolute must. Standardization allows for interoperability between library institutions. Before the creation of such standards during the 1950s and 1960s, libraries often used contradictory, proprietary systems which lacked a common vocabulary or consistency. With the advent and continued development of standardization efforts, library databases can now be integrated and have become massive networks no longer restricted to local physical plants. A record described in New York is likely to be cataloged in an identical fashion in London. This consistency not only enables the end user to query the record databases more easily, but also provides a means by which information scientists can very rapidly share information on their collections. The apex of this cosmopolitan collaboration is WorldCat, a unified “world catalog” of most libraries across Europe, the United States and beyond, made possible by the standardization of MARC and AACR2 in descriptive practices. Standardization does come at a price, however: serious effort must be made to adhere to AACR2 rules, which is a time-consuming process often necessitating the presence of full-time catalogers. Without a common description of records, databases return poor search results, which in turn hinders information literacy.

While the AACR standards have been in place since the late 1960s (with AACR2 following in 1978) and have established themselves in the library world, perhaps the most exciting and compelling evolution of metadata and cataloging is still in its infancy. Web 2.0 introduced the concepts of radical decentralization, the destruction of linear, top-down cataloging and an emphasis on digital, online technology. In Web 2.0 projects such as Wikipedia, Flickr and Twitter, information is cataloged either by folksonomy or procedural generation. That is to say, the user decides the rules by a combination of tagging and wikified, backward compatible software. When information specialists are involved, they often no longer enumerate the categories from the top down (as with Dewey or Library of Congress) and instead develop the systems which allow for user collaboration and sorting. Dissenters to this radical departure argue that the chaos intrinsic to the system makes it unsuitable for academic systems, but advocates such as Peter Morville disagree.

What is true is that these Web 2.0 systems have completely dominated the popular information world, and the old ways of classifying information appear strange and foreign to the end user, especially to the young. The technologies of Web 2.0 allow the user to manipulate and become immersed in the information structures, and often to do the work a librarian would otherwise do. One need look no further than Wikipedia to find a highly organized system, of labyrinthine yet coherent complexity, completely devised by tens of thousands of volunteers. In Wikipedia, as with Flickr, we also see the potential problems of Web 2.0 cataloging mitigated by smart software. While consistency may be a concern, the allure of such systems and the Web 2.0 features they offer clearly outweighs the potential drawbacks. Yet at the time of this writing it would be premature to throw in one’s lot with the old guard *or* with the information-silo-smashing others, as the technology and methods are still in flux and have yet to mature fully. Whether or not these systems are fully developed is irrelevant: the end user expects them, and exposing that user to the older systems requires orientation and a labor of patience.

Nevertheless, finding the manpower, time and resources to catalog collections is a hurdle to overcome, especially considering the legion of born-digital records which now dominates our information landscape. A solution to cataloging the influx of data may not be found in librarians, or other professionals, but instead in gamers and voluntary user collaboration. Take the ESP Game from GWAP (Games with a Purpose), the pet project of computer scientist Luis von Ahn: a simple multiplayer experience in which players have to describe an image using metadata (descriptors) while also matching what the other player picks. This game is behind the recent vast improvement in Google Image Search queries (which, as you may have noticed, now allows you to do all sorts of advanced searches), as the descriptors derived from gameplay have been imported into the search engine. The task of cataloging millions of images based on verbose descriptors would have proved impossible for a professional team, not to mention economically impractical. Yet give the users of the internet a fun game where they have to guess what other people are thinking in describing an image, and you can catalog vast amounts of information for free.
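The mechanic described above, in which a descriptor only counts when two players independently agree on it, can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the game’s actual implementation: the function names, the sample guesses and the way rounds are tallied are all hypothetical, and the real game ends a round at the first match rather than collecting every overlap.

```python
from collections import Counter


def agreed_labels(guesses_a, guesses_b, taboo=()):
    """Return the labels both players typed, in player A's order.

    Words in `taboo` (labels already collected for the image) are ignored,
    which pushes players toward new descriptors.
    """
    taboo_set = {t.strip().lower() for t in taboo}
    b_set = {g.strip().lower() for g in guesses_b}
    agreed = []
    for guess in guesses_a:
        guess = guess.strip().lower()
        if guess in b_set and guess not in taboo_set and guess not in agreed:
            agreed.append(guess)
    return agreed


def collect(rounds):
    """Tally agreed labels for one image across many pairs of players."""
    tally = Counter()
    for guesses_a, guesses_b in rounds:
        tally.update(agreed_labels(guesses_a, guesses_b))
    return tally


# Hypothetical guesses from two rounds played on the same image.
rounds = [
    (["dog", "park", "Ball"], ["ball", "dog"]),
    (["puppy", "dog"], ["dog", "grass"]),
]
print(collect(rounds).most_common())
```

Labels that survive many rounds are the ones a stranger is most likely to type into a search box, which is exactly what makes them useful as metadata.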

The thing we must take away from this discussion is that while the creation and nature of information structures may vary, the existence of such structures is essential. The librarian must be capable of producing metadata, assigning it to records in a systematic way and classifying collections according to a hierarchy of knowledge. The value of the library is not the place, it is not the warehouse of books, but it is the systems and guidance which enable the patron to reach them. The librarian is an architect of information structures, and must surpass the role of book warden. This is a vocation I embrace – as we shall soon see.

Applied Work

While at SJSU I have furnished a great deal of work in regard to organization and classification of knowledge. Only a few examples of a substantial amount of coursework and work experience will follow.

The first piece of applied work I will present is a keystone work which I believe can by itself answer the question of my proficiency in this competency. It is a course project produced for Dr. Robert Ellett’s course on classification, and it involved a complete process by which records are organized, classified and described. It was a group project involving myself, Dorothy Russell, Josh Tiffany, and Nick Velkavrh. The project called for cataloging eight books. A PDF listing the characteristics of the books and their covers was provided to the group, and complete MARC records had to be created from it. As a secondary component of the project, the records had to be organized in a “bookshelf” by Dewey Decimal call number. I was initially responsible for cataloging two of the records, as well as configuring the project on Google Docs. As time went on, everyone in the group partook in long meetings and email correspondence, scrutinizing every detail of the MARC records we had produced. We were informed through course documentation that if more than two errors were located in the final product we would receive a grade of C, and additional errors would constitute a failing grade. An error could be something as minor as a period or semicolon in the wrong place. Accordingly, immense effort was taken to produce perfect MARC records.

This process involved inspecting the scenario records from the PDF provided by Dr. Ellett, then translating those characteristics and details into MARC according to AACR2 standards. Naturally, some of the MARC fields required expertise beyond a knowledge of the formatting rules, including the ability to build Dewey Decimal and Library of Congress numbers, as well as to conduct research in OCLC. The project paper concludes with a series of short essays justifying the group’s classification decisions, which demonstrate the depth of our understanding.
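To give a sense of how exacting this translation is, the following Python sketch assembles a single MARC title statement (field 245) with AACR2-style punctuation. It is an illustration under stated assumptions rather than the method our group used: the book, its author and the helper names are invented, the article list covers English only, and the display uses the common `$a` subfield notation. The second indicator counts the characters to skip when filing, so a leading “The ” yields 4.

```python
# English-only list of leading articles; an assumption made for this sketch.
ARTICLES = ("the ", "an ", "a ")


def nonfiling_count(title):
    """Number of leading characters (article plus space) to skip when filing."""
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return len(article)
    return 0


def field_245(title, subtitle, responsibility, has_main_entry=True):
    """Build a display string for MARC field 245 (title statement).

    The first indicator is 1 when a title added entry is made, which is the
    usual case when the record also has a 1XX main entry; otherwise it is 0.
    """
    ind1 = "1" if has_main_entry else "0"
    ind2 = str(nonfiling_count(title))
    body = f"$a {title}"
    if subtitle:
        body += f" :$b {subtitle}"  # space-colon precedes other title information
    body += f" /$c {responsibility}."  # space-slash precedes the statement of responsibility
    return f"245 {ind1}{ind2} {body}"


# Hypothetical book, invented for illustration.
print(field_245("The quiet archive", "a hypothetical subtitle", "Jane Doe"))
```

Every space, colon and slash in that output is prescribed, which is why a single misplaced mark counted as an error in the project.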

My second piece of applied work was another group exercise, with the same group, in which we were asked to create Library of Congress (LC) call numbers, each with an appropriate MARC tag, subfield and cutter number, along with the publication date. I was responsible, as above, for the same number of titles as each of my group mates. After exchanging answers via Google Docs, we corresponded by email until a consensus on contentious entries was achieved. This evidence further speaks to my ability to work with the LC system.

My third piece of evidence is yet another group exercise with the same group. It followed the same pattern as the exercise above, except with Dewey Decimal: the methodology, results and significance were the same, yet in this case the work demonstrates a mastery of cataloging in accordance with Dewey rules.

My fourth piece of applied work comprises two components: a PDF file containing a hierarchy of subjects and subheadings, and an Excel spreadsheet containing a list of records organized according to the former. This was work I produced for Credo Reference, a private company operating out of Boston. Credo Reference offers an online encyclopedia intended to serve as an allegedly more academic substitute for Wikipedia. I was hired, along with a team of other interns, to organize Credo’s topic pages (the equivalent of entries) and to catalog them according to a more robust and extensible set of rules. The first document is the new subject hierarchy we produced for the company. The base was produced by John Shawler, my supervisor, but was heavily expanded upon by a think tank of interns on a daily basis, so much so that the final product bears little resemblance to the original draft. My areas of responsibility were science, psychology and technology. I had a fundamental impact on the organization of these subjects, but also played a significant role in the organization of the other subjects and headings. My main contribution was ensuring the academic coherency and depth of the final subject headings. As the headings were being selected over a few weeks, our group of interns was also tasked with organizing the topic pages on the Credo Platform according to the new classification system.

The second document component is therefore a spreadsheet containing the complete reclassification project. I was responsible for reclassifying the science, psychology and technology sections. This work, spanning weeks at four or five hours a day, involved conducting research on the subject of each topic page and then fitting each page into the schema of knowledge we had constructed. Subsequent work at Credo during Fall 2011, which I am not submitting as evidence but which must nevertheless be stated for completeness’ sake, includes:

  • Creating topic pages (the equivalent of writing or compiling encyclopedia entries) and classifying them so that they function within the Credo Reference platform
  • Organizing preexisting topic pages according to the old subject hierarchy and
  • Researching reference databases, organizing descriptions and classifications of them within the new subject hierarchy, and configuring them to display on the Credo Platform

Together my work at Credo was valuable as an experience in that I was able to contribute to the creation of new information structures, apply them to a wide variety of content and to consider the necessary steps to do so again in the future.

My fifth piece of applied work is an EAD stub from my work at Stanford University. In Fall 2011 I was tasked with converting OCR finding aids and inventories to EAD stubs using Stanford’s standard format. My work involved examining the OCR records and manually copying the contents into EAD format by means of an Excel spreadsheet with transformation formulas. I used Oxygen XML Editor to create, manage and validate the XML files. Completed stubs were then entered into the Stanford Archive’s database by means of Archivists’ Toolkit. During my work I studied and mastered the EAD format and am now capable of producing metadata relevant to archival institutions. The stub is in XML format and is one of dozens that I furnished for the university. Accordingly, this piece of work also speaks to my ability to work with advanced code bases and information technologies.
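The row-to-stub transformation described above can be approximated in Python with the standard library. This is a minimal sketch under my own assumptions and does not reproduce Stanford’s standard format: the `ead_stub` function, the column names and the sample row are hypothetical, while the element names (`eadheader`, `eadid`, `archdesc`, `did`, `unittitle`, `unitid`, `unitdate`, `physdesc`, `extent`) come from EAD 2002. The output is not validated against the EAD schema.

```python
import xml.etree.ElementTree as ET


def ead_stub(row):
    """Build a bare-bones EAD 2002 stub from one spreadsheet row (a dict)."""
    ead = ET.Element("ead")

    # Header: identifies the finding aid itself.
    header = ET.SubElement(ead, "eadheader")
    ET.SubElement(header, "eadid").text = row["id"]
    filedesc = ET.SubElement(header, "filedesc")
    titlestmt = ET.SubElement(filedesc, "titlestmt")
    ET.SubElement(titlestmt, "titleproper").text = "Guide to the " + row["title"]

    # Archival description: identifies the collection being described.
    archdesc = ET.SubElement(ead, "archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = row["title"]
    ET.SubElement(did, "unitid").text = row["id"]
    ET.SubElement(did, "unitdate").text = row["dates"]
    physdesc = ET.SubElement(did, "physdesc")
    ET.SubElement(physdesc, "extent").text = row["extent"]
    return ead


# Hypothetical row, as it might be keyed in from an OCR finding aid.
row = {
    "id": "EX0001",
    "title": "Example Family Papers",
    "dates": "1900-1950",
    "extent": "3 linear feet",
}
print(ET.tostring(ead_stub(row), encoding="unicode"))
```

Generating the XML from structured rows, rather than typing tags by hand, keeps dozens of stubs consistent with one another and leaves validation to a dedicated XML editor.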

My sixth piece of applied work is an OAIS metadata sheet that I produced, amongst many others, for a digital preservation project I overview in competency VI. The project called for creating a digital preservation plan, part of which was the creation of metadata sheets in accordance with archival standards for born digital objects. This speaks to my ability to create information structures for digital repositories, born digital records and to understand the importance of providing metadata for such records in the face of digital obsolescence and other factors I survey elsewhere.

Cataloging Course Project (.PDF)

LC Classification (.PDF)

Dewey Decimal Classification (.PDF)

Credo Reference Subject Headings and Subheadings (.PDF)

Credo Reference Topic Page Organization Spreadsheet (.XLS)

EAD Stub (.XML), also on PasteBin for easier viewing by those who do not have access to an XML editor

OAIS metadata sheet (.XLS)

Bibliography

Chan, L.M. (1994). Cataloging and Classification: An Introduction. White Plains, NY: McGraw-Hill Humanities/Social Sciences/Languages.

Fritz, D. A. (2007). Cataloging with AACR2 and MARC21: For Books, Electronic Resources, Sound Recordings, Videorecordings, and Serials, 2006 Cumulation. Chicago, IL: American Library Association.

Morville, P. (2005). Ambient Findability: What We Find Changes Who We Become. Sebastopol, CA: O’Reilly Media.
