Can classification systems such as the Library of Congress Classification System (or other ontologies) possibly classify all documents? Would it be useful to classify all documents? I used to think so, but now I'm not so sure:
Consider a classification system that denotes whether a document uses humor and, further, whether or not the humor is funny. Consider an author writing a piece of humor which relies entirely for its humor on being classified as not funny. If classified as funny, the humor fails and the document is mis-classified; if classified as not funny, the humor succeeds and the document is mis-classified. Either way, the document is mis-classified.
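The argument above can be sketched as a search for a label that remains correct once it has been assigned. This is only an illustrative model: the function names and the two-label scheme are assumptions made for the sketch, not part of any real classification system.

```python
from typing import Callable, List

# Hypothetical two-label scheme for documents that use humor.
LABELS = ("funny", "not funny")


def ordinary_joke_is_funny(assigned_label: str) -> bool:
    """An ordinary joke (assumed funny) whose humor ignores its label."""
    return True


def pathological_joke_is_funny(assigned_label: str) -> bool:
    """A joke whose humor relies entirely on being classified as not funny."""
    return assigned_label == "not funny"


def consistent_labels(is_funny: Callable[[str], bool]) -> List[str]:
    """Return every label that still describes the document after being assigned."""
    stable = []
    for label in LABELS:
        actual = "funny" if is_funny(label) else "not funny"
        if actual == label:
            stable.append(label)
    return stable


print(consistent_labels(ordinary_joke_is_funny))      # ['funny']
print(consistent_labels(pathological_joke_is_funny))  # []
```

The ordinary joke has exactly one stable label, while the pathological joke has none: whichever label is assigned, the assignment itself makes that label wrong.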
Such classification systems exist and are useful in the real world. Consider, for example, the newsgroup rec.humour.funny, a moderated newsgroup which tries to carry only funny humour.
I have written and submitted pathological jokes of this kind, but received no response from the moderators (who must judge the humour of each joke).
Cathy suggested that this apparent paradox can be resolved because the joke is impossible to construct, as it contains an internal paradox (i.e. it's only true when it's false). The problem with this argument is that jokes are a literary form with no requirement of internal consistency; indeed, many famous examples (much of Lewis Carroll's work, for example) contain internal contradictions.
Deciding whether every document can be classified involves trying to prove either:
- That every document can be classified using the Library of Congress Classification System; or
- That not every document can be classified using the Library of Congress Classification System.
There are a number of approaches to this:
Consider a document that completely describes a classification scheme that is apparently identical to the Library of Congress Classification System (but without any direct or indirect reference to the Library of Congress Classification System) and that also asserts that a referenced document has proved the classification scheme incomplete. This document is then submitted to the Library of Congress (LoC) for classification. If the Library of Congress Classification System is incomplete, then the classification scheme described is the Library of Congress Classification System and the document is classified with the Library of Congress Classification System materials (``Classification, Library of Congress'', Z696). If the Library of Congress Classification System is complete, then the classification scheme described is not the Library of Congress Classification System and the document is classified elsewhere (probably under ``Subject cataloging'', Z695).
Unfortunately this approach is flawed in that it assumes that the LoC (or anyone else) always classifies documents correctly, which is known to be untrue.
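A minimal sketch of this first approach and its flaw, assuming the two class numbers quoted above. The function names and the `classifier_is_infallible` flag are hypothetical and exist only to make the reasoning explicit.

```python
from typing import Optional

# Class numbers quoted in the text; the constant names are illustrative.
LCC_MATERIALS = "Z696"       # ``Classification, Library of Congress''
SUBJECT_CATALOGING = "Z695"  # ``Subject cataloging''


def correct_class(lcc_is_complete: bool) -> str:
    """Class the test document should receive under the argument in the text."""
    return SUBJECT_CATALOGING if lcc_is_complete else LCC_MATERIALS


def infer_completeness(assigned_class: str,
                       classifier_is_infallible: bool) -> Optional[bool]:
    """Try to read completeness back from the class actually assigned.

    Returns True or False only when the classifier can be trusted;
    otherwise the assigned class tells us nothing, so return None.
    """
    if not classifier_is_infallible:
        return None
    if assigned_class == SUBJECT_CATALOGING:
        return True
    if assigned_class == LCC_MATERIALS:
        return False
    return None  # classified somewhere unexpected: no conclusion


print(infer_completeness(correct_class(False), classifier_is_infallible=True))
print(infer_completeness(correct_class(False), classifier_is_infallible=False))
```

With a trusted classifier the assigned class would reveal whether the system is complete; once the classifier may err, the same class number supports no conclusion at all.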
Consider a document whose subject is the fact that the document is mis-classified in the Library of Congress Classification System.
The Library of Congress Classification System has no difficulty classifying this document, because it makes no judgments about the truth of a document's contents and the document is clearly about the Library of Congress Classification System, so it is classified with Library of Congress Classification System materials.
Should all documents be classified?
Consider a new document that is sufficiently metaphorical and allusive that it could be about anything. Any subject classification assigned to the document by a classifier instantly places that subject at the forefront of a reader's mind when interpreting the document; the classifier thus biases all subsequent readers of the document.
The correct classification is not under ``Metaphor'' or ``Allusions'' (both valid Library of Congress Classification System classes) because these classifications are for documents that are about metaphor and allusion, not documents that use metaphor and allusion. The document could be about metaphor and/or allusion as well as using them but, as previously stated, it could equally well be about anything.
If the document remains unclassified, then it is largely inaccessible to library users, since much searching and browsing is performed by subject---this is certainly true of new, recently published works by unknown authors.
A document with an associated classification is a different document from one without, and this classification can have a profound influence on the document's interpretation. The inclusion of cataloging-in-publication data in many modern books makes this concretely true.
Interpretation
The work describing how to catalog and classify using the Library of Congress Classification System is ``Subject Cataloging Manual: Classification'', which includes a section, ``General Principles of Classification'', listing 8 principles. All 8 require interpretative evaluation (they are not clearly, simply and directly implementable using computers as we know them), 6 refer to large external schedules, and several use terminology such as ``intent of the author,'' ``influence'' and ``appropriate'' without clear definition.
A more significant problem in attempting to prove the Library of Congress Classification System complete or incomplete is that ``Subject Cataloging Manual: Classification'', F10, page 2, gives the ``General Principles of Classification'' and states:
7: Unless instructions in the schedules or past practice dictate otherwise, class works on the influence of one subject on another with the subject influenced.
Any deliberately written pathological document (a document written to cause problems) which couldn't be classified using the normal rules could be classified with the Library of Congress Classification System materials using this rule. Undoubtedly human classifiers have the capability to detect pathological documents (try giving a self-referential text about classification to a classifier sometime). It is not clear, however, whether a computer can be programmed to be a complete detector of deliberately written pathological documents.