Semantic Compositions has previously been on record as saying "bah, humbug!" to the Semantic Web (here and here). Mark Liberman brings it up again in the context of a new thesaurus specification language, with the same reservations that SC has (which, in fairness, he was worrying about several years before SC started blogging).
Prof. Liberman addresses a particular annoyance of SC's, which is that it's very, very hard to keep up with the proliferation of spelling and other variants for named entities. Quick, what does Big Blue refer to? (Answer: IBM, or maybe that's International Business Machines.) He comes up with 37 variations on the name of a laboratory procedure which bears the acronym RT-PCR. SC has encountered similar difficulties in building ontologies about terrorism -- there are 1.4 million mentions of "al Qaeda" out there, but there are also some 65,000+ mentions of the possibly related terrorist organization "al Queda" (hint: they're the same). It's no better with Kaddafi. Or Khadaffy. Or Qadaffi. Or Ghadaffi. Or however this guy's name is being spelled today.
SC particularly empathizes with Prof. Liberman's lament that:
"It's fair to respond that the authors of SKOS are trying to solve a different problem, namely how to let people who are putting explicit semantics in their web documents do so in a way that allows for variable concept labels and partly-related alternative conceptual schemata. Fine -- but some people may think that this will help those who want to represent the content of the kind of documents that ordinary folk write in natural language, especially for documents that are scientific or technical in character. But it won't."
As SC formulated the same complaint: "So the mindset behind the semantic web goes something like this: NLP is hard. If a task is hard, we quit. Therefore, since NLP is hard, we quit."
Your host suspects that Prof. Liberman would agree with the following statement: The problem with the Semantic Web initiative isn't technological, it's sociological. There's too much of an emphasis on proliferating standards, and coming up with painfully forced cute acronyms (like making "Web Ontology Language" be represented by the acronym "OWL", which plainly violates every intuition that anybody not involved with the Semantic Web has about acronyms); there's not nearly enough time spent on considering what the purpose of the technology is.
SC is not at all opposed to the idea that it's useful to have a highly expressive formal language for representing the meaning of text, and doesn't think Prof. Liberman is, either. SC would like to continue to do what Prof. Liberman is talking about: "represent[ing] the content of the kind of documents that ordinary folk write in natural language, especially for documents that are scientific or technical in character". In particular, he likes to build applications, and exchange them for pictures of former presidents, preferably printed on paper from these folks. This is a job which is greatly complicated by two tasks: 1) keeping up with the constant fiddling with language specifications, and 2) manually adding lots of aliases to the core ontology. (1) is a problem which will only be solved by mass hypnosis at an upcoming DARPA program meeting, and I'm not sure I care which. (2) is something that SKOS doesn't help me stop doing.
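For a sense of what task (2) amounts to in practice, here is a minimal Python sketch of a hand-maintained alias table, using the variants mentioned earlier in this post. Everything about it is an assumption made for illustration: the names `ALIASES`, `build_lookup`, and `resolve` are hypothetical, the choice of which spelling counts as canonical is arbitrary, and this is not how SC's ontologies or SKOS actually store labels.

```python
# Hypothetical alias table: canonical label -> hand-entered surface variants.
# The canonical spellings are arbitrary choices made for this sketch.
ALIASES = {
    "IBM": ["Big Blue", "International Business Machines"],
    "al Qaeda": ["al Queda"],
    "Kaddafi": ["Khadaffy", "Qadaffi", "Ghadaffi"],
}


def normalize(text):
    """Fold case and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.casefold().split())


def build_lookup(aliases):
    """Invert the alias table into a normalized-variant -> canonical mapping."""
    lookup = {}
    for canonical, variants in aliases.items():
        for name in [canonical, *variants]:
            lookup[normalize(name)] = canonical
    return lookup


LOOKUP = build_lookup(ALIASES)


def resolve(mention):
    """Return the canonical label for a mention, or None if nobody added it."""
    return LOOKUP.get(normalize(mention))


print(resolve("big blue"))
print(resolve("Al  Queda"))
print(resolve("Gaddafi"))
```

The last call is the point: a spelling that nobody has typed into the table yet simply fails to resolve, and a label vocabulary only records the variants that somebody has already found.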
SC has also done commercial work on mapping terms between ontologies, which the SKOS project is also supposed to help with. As your host has said before, having mappings doesn't actually do a whole lot to resolve the basic incompatibilities implied by the different hierarchies that the terms are sitting in. Speaking from the commercial side, SC would love to be able to take some of the Semantic Web standards and build some genuinely novel information retrieval applications off of them. Right now, though, they're too concerned with logical niceties, and not at all with being robust enough to survive real-world cases where you can't hope to automatically extract all of the data that they need. OK, fine, you can hope. But you might want to place a side bet against that, just in case SC is right.