Paradoxes in Classification Systems

Can classification systems such as the Library of Congress Classification System (or other ontologies) possibly classify all documents? Would it be useful to classify all documents? I used to think so, but now I'm not so sure:

Consider a classification system that denoted whether a document used humor and, further, whether or not the humor was funny. Consider an author writing a piece of humor which relied entirely for its humor on being classified as not funny. If classified as funny, the humor fails and the document is mis-classified; if classified as not funny, the humor succeeds and the document is mis-classified. Either way, the document is mis-classified.
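The argument can be sketched in a few lines of Python. This is only an illustration: the two labels and the function names (pathological_joke_is_funny, ordinary_joke_is_funny, consistent_labels) are assumptions made for the sketch, not part of any real classification system. The sketch tries each label in turn and keeps those that agree with how funny the document actually is once it carries that label.

```python
from typing import Callable, List

LABELS = ("funny", "not funny")


def pathological_joke_is_funny(assigned_label: str) -> bool:
    """Hypothetical joke whose humor relies entirely on being labelled 'not funny'."""
    return assigned_label == "not funny"


def ordinary_joke_is_funny(assigned_label: str) -> bool:
    """Hypothetical joke that is funny regardless of its label."""
    return True


def consistent_labels(is_funny: Callable[[str], bool]) -> List[str]:
    """Return every label that matches the document's funniness under that label."""
    consistent = []
    for label in LABELS:
        actual = "funny" if is_funny(label) else "not funny"
        if actual == label:
            consistent.append(label)
    return consistent


print(consistent_labels(ordinary_joke_is_funny))      # ['funny']
print(consistent_labels(pathological_joke_is_funny))  # []
```

The ordinary joke has one consistent label; the pathological joke has none, which is the "either way, mis-classified" conclusion above.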

Such classification systems exist and are useful in the real world. Consider, for example, the newsgroup rec.humor.funny, a moderated newsgroup which tries to carry only funny humor. Pathological jokes have been attempted (by myself) and submitted, but without response from the moderators (who must judge the humor of the joke).

Cathy suggested that this apparent paradox can be resolved because the joke is impossible to construct, as it contains an internal paradox (i.e. it's only true when it's false). The problem with this argument is that jokes are a literary form with no requirement of internal consistency; indeed, many famous examples (much of Lewis Carroll's work, for example) contain many internal contradictions.

Attempting to prove the Library of Congress Classification System Complete or Incomplete

This involves trying to prove either:

That every document can be classified using the Library of Congress Classification System; or
That not every document can be classified using the Library of Congress Classification System.
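
Stated a little more formally, as a sketch only: the symbols $D$ (the set of all documents), $C$ (the set of classes in the schedules) and the relation $\mathrm{classifies}(c, d)$ are notation assumed here for illustration, not taken from the Library of Congress materials.

```latex
\begin{align*}
\text{Complete:}   \quad & \forall d \in D \;\, \exists c \in C : \mathrm{classifies}(c, d) \\
\text{Incomplete:} \quad & \exists d \in D \;\, \forall c \in C : \neg\,\mathrm{classifies}(c, d)
\end{align*}
```

The second statement is the negation of the first, so exactly one of them holds, provided ``can be classified'' is given a fixed meaning.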

There are a number of approaches to this:

Consider a document that completely described a classification scheme that was apparently identical to the Library of Congress Classification System (but without any direct or indirect reference to the Library of Congress Classification System) and that also asserted that a referenced document had proved the classification scheme incomplete. This document is then submitted to the Library of Congress (LoC) for classification. If the Library of Congress Classification System is incomplete, then the classification scheme described is the Library of Congress Classification System and the document is classified with the Library of Congress Classification System materials (``Classification, Library of Congress'', Z696). If the Library of Congress Classification System is complete, then the classification scheme described is not the Library of Congress Classification System and the document is classified elsewhere (probably under ``Subject cataloging'', Z695).

Unfortunately this approach is flawed in that it assumes that the LoC (or anyone else) always classifies documents correctly, which is known to be untrue.

Consider a document whose subject was the fact that the document was mis-classified in the Library of Congress Classification System.

The Library of Congress Classification System has no difficulty classifying this document, because it is not making judgments about the relative truth of the contents of a document and the document is clearly about the Library of Congress Classification System, so it is classified with Library of Congress Classification System materials.
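
A minimal Python sketch of why the self-referential claim causes no trouble: classification looks only at what the document is about, never at whether what it says is true. The Document type, the SCHEDULE table and the classify function are hypothetical names invented for this sketch; the only detail taken from the text is the class Z696 for Library of Congress classification materials.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    subject: str  # what the document is about
    claim: str    # what the document asserts (never consulted by classify)


# Hypothetical one-entry subject table; Z696 is the class quoted in the text.
SCHEDULE = {"Library of Congress Classification System": "Z696"}


def classify(doc: Document) -> Optional[str]:
    """Classify by subject alone; the truth of doc.claim plays no part."""
    return SCHEDULE.get(doc.subject)


doc = Document(
    subject="Library of Congress Classification System",
    claim="This document is mis-classified.",
)
print(classify(doc))  # Z696
```

Once the document is placed in Z696 its claim happens to be false, but nothing in the procedure depends on that.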

Should all documents be classified?

Consider a new document that is sufficiently metaphorical and allusive that it could be about anything. Any assignment of a subject classification to the document by a classifier instantly places that subject at the forefront of a reader's mind when interpreting the book; thus the classifier biases all subsequent readers of the document.

The correct classification is not under ``Metaphor'' or ``Allusions'' (both valid Library of Congress Classification System classes) because these classifications are for documents that are about metaphor and allusion, not documents that use metaphor and allusion. The document could be about metaphor and/or allusion as well as using them, but as previously stated it could equally well be about anything.

If the document remains unclassified, then it is largely inaccessible to library users, since much searching and browsing is performed by subject---this is certainly true of new, recently published works by unknown authors.

A document with an associated classification is a different document to one without, and this classification can have a profound influence on the document's interpretation. This is made concretely true by the inclusion of cataloging-in-publication data in many modern books.

Interpretation

The work describing how to catalog and classify using the Library of Congress Classification System is ``Subject Cataloging Manual: Classification'', which includes a section ``General Principles of Classification'' listing 8 principles. Of the 8 principles, all require interpretative evaluation (that is, none is clearly, simply and directly implementable using computers as we know them), 6 refer to large external schedules, and several use terminology such as ``intent of the author,'' ``influence'' and ``appropriate'' without clear definition.

A more significant problem in attempting to prove the Library of Congress Classification System complete or incomplete is that ``Subject Cataloging Manual: Classification'' F10, page 2, gives the ``General Principles of Classification'' and states:

7: Unless instructions in the schedules or past practice dictate otherwise, class works on the influence of one subject on another with the subject influenced.

Any deliberately written pathological document (a document written to cause problems) which couldn't be classified using the normal rules could be classified with the Library of Congress Classification System materials using this rule. Undoubtedly human classifiers have the capability to detect pathological documents (try giving a self-referential text about classification to a classifier sometime). It is not clear, however, whether a computer can be programmed to be a complete detector of deliberately written pathological documents.
