In Part I of this series, I laid out the case for the semantic web, albeit with rather less enthusiasm than its backers. Today, the #1 issue with a bullet: agreement.
SC spent most of his paid employment time during 2003 building an ontology. The rest of that time was devoted to software for mapping terms across ontologies, a task itself so nebulous, so poorly defined, that it probably shouldn't be attempted right now. Any academics thinking they can identify SC from publications now: you're wrong. SC's team didn't publish anything about it. Never mind that, though.
To catch up any readers who don't know what an ontology is: the term "ontology" without a determiner refers to the branch of philosophy that studies what exists, and how we classify it. When a linguist or computer scientist says "an ontology", they mean a structured classification of the things that we know, generally sorted into a hierarchy.
Building an ontology using a framework you didn't define should be a mandatory experience for anyone presuming to tell the world how to represent meaning. Take a second to consider the meaning of the word "address", and then continue reading.
Did you think of a post office? e-mail? computer memory? giving a speech? OK, all that proves is that unambiguous terms are a good thing. So maybe our ontology needs to have 4 kinds of address, each with a unique name. Let's standardize on one example to keep going, though. What terms do you need to represent a postal mail address?
ZIP code? street address? city? state? name? apartment number? salutation? Did you think to separate salutation from the rest of the addressee's name? 5 digit ZIP, or 9?
The problem is that different people store this differently in their minds, and produce very different representations as a result. Lazy programmers will want to store everything in a few strings; more conscientious or anal-retentive types will split the strings by parts, check the format of each string for acceptability, and so forth. It might not even be laziness; if you're keeping a directory of addresses for people in your school, you don't really need the same sort of elaborate validation procedures you might want for administering a criminal database. So two developers who both need to represent an address might come up with completely different data structures for the job, even though they both know all of the things we listed in the previous paragraph. Remember that slow-loading web page? It includes 14 separate definitions for the word "action", which should give you some idea about how hard it is for people to agree on the meaning of a single term.
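To make the two-developers point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the class names `LooseAddress` and `StrictAddress`, the field names, the sample address, and the ZIP check are not taken from any real system.

```python
import re
from dataclasses import dataclass, fields


@dataclass
class LooseAddress:
    """Hypothetical developer A: a few free-text strings, no validation."""
    name: str   # salutation and name together, as typed
    lines: str  # everything else in one string


@dataclass
class StrictAddress:
    """Hypothetical developer B: one field per part, with a ZIP format check."""
    salutation: str
    name: str
    street: str
    apartment: str
    city: str
    state: str
    zip_code: str

    def __post_init__(self):
        # Accept a 5-digit ZIP or a 9-digit ZIP+4 such as 62704-1234.
        if not re.fullmatch(r"\d{5}(-\d{4})?", self.zip_code):
            raise ValueError(f"bad ZIP code: {self.zip_code!r}")


def shared_field_names(a, b):
    """Return the field names that two record types have in common."""
    return {f.name for f in fields(a)} & {f.name for f in fields(b)}


# The same (invented) postal address, stored both ways.
loose = LooseAddress("Dr. Jane Doe", "12 Elm St Apt 4, Springfield, IL 62704")
strict = StrictAddress("Dr.", "Jane Doe", "12 Elm St", "4",
                       "Springfield", "IL", "62704")

print(shared_field_names(LooseAddress, StrictAddress))
print(loose.name == strict.name)
```

The only field name the two records share is `name`, and even there the contents disagree, because one developer folded the salutation into it and the other did not. Matching field names is not the same as matching meanings.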
Agreeing on the format of data isn't the only challenge for the semantic web, though. Once you've got a lot of terms, you'll probably want to organize them into a hierarchy like we discussed before. Now, the question is how you want to organize that hierarchy. Here, we present a fairly serious disagreement between two ontologies prepared by teams of people with Ph.D.s in CS, linguistics, philosophy, and other related disciplines.
Consider the word "communication". For SUMO, a publicly available ontology, it's an event where information is transferred. It inherits from other concepts like so:
entity -> physical -> process -> intentionalProcess -> socialInteraction -> communication
The inheritance chain simply means that each term is held to have all of the properties of whatever comes above it, as well as having some specific facts that distinguish it from everything above.
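For readers who think better in code, the chain above can be sketched as a child-to-parent table. The term names are the ones in the chain; the "facts" attached to each term are hypothetical placeholders invented for this sketch, not SUMO's actual axioms.

```python
# Child -> parent links, copied from the chain shown above.
PARENT = {
    "physical": "entity",
    "process": "physical",
    "intentionalProcess": "process",
    "socialInteraction": "intentionalProcess",
    "communication": "socialInteraction",
}

# Hypothetical distinguishing facts per term (illustration only).
OWN_FACTS = {
    "physical": {"exists in space and time"},
    "process": {"unfolds over time"},
    "intentionalProcess": {"has a purposeful agent"},
    "socialInteraction": {"involves more than one agent"},
    "communication": {"transfers information"},
}


def lineage(term):
    """Return the chain from the root of the hierarchy down to `term`."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain[::-1]


def inherited_facts(term):
    """Collect a term's own facts plus those of everything above it."""
    facts = set()
    for t in lineage(term):
        facts |= OWN_FACTS.get(t, set())
    return facts


print(" -> ".join(lineage("communication")))
print(sorted(inherited_facts("communication")))
```

Asking for the facts about `communication` returns its own fact plus the four picked up from its ancestors, which is all that "inherits from" means here.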
Another ontology, which I can't name, organizes things a bit differently. Communication is a field of study.
object -> mentalObject -> abstractObject -> fieldOfStudy -> communication
Not only do the terms not mean the same thing, but what I haven't shown is that SUMO considers physical and abstract to partition the world at the highest level, so these aren't even related.
You might object that this just means there's some other term in the second ontology which ought to mean what "communication" means in the first. And you'd be right. It's:
event -> mentalEvent -> communicativeEvent
This more readily represents the "act of information transfer", but it does not carry several pieces of meaning explicitly present in SUMO: it's not necessarily intentional, it's not necessarily social, and it's not necessarily a process implemented strictly through physical means. So even though one might write a bit of code to translate between the two terms, "communication" and "communicativeEvent", it still wouldn't tell you anything about the concepts that each one inherits from.
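That "bit of code" might look like the following sketch. It encodes only the parent links shown in this post, plus a single hand-written term mapping; the table and function names (`SUMO_PARENT`, `OTHER_PARENT`, `TERM_MAP`, `translate`, `unmapped_ancestors`) are assumptions made for illustration, and the second ontology's hierarchy may well continue above `event`.

```python
# Child -> parent links for each ontology, limited to what the post shows.
SUMO_PARENT = {
    "physical": "entity",
    "process": "physical",
    "intentionalProcess": "process",
    "socialInteraction": "intentionalProcess",
    "communication": "socialInteraction",
}
OTHER_PARENT = {
    "mentalEvent": "event",
    "communicativeEvent": "mentalEvent",
}

# The one hand-coded, term-to-term mapping (hypothetical).
TERM_MAP = {"communication": "communicativeEvent"}


def ancestors(term, parent):
    """Return everything above `term`, nearest ancestor first."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found


def translate(term):
    """Translate a SUMO term into the other ontology, or None if unmapped."""
    return TERM_MAP.get(term)


def unmapped_ancestors(term):
    """List the SUMO ancestors of `term` that have no translation at all."""
    return [a for a in ancestors(term, SUMO_PARENT) if translate(a) is None]


target = translate("communication")
print(target)
print(ancestors(target, OTHER_PARENT))
print(unmapped_ancestors("communication"))
```

The translation itself succeeds, but every one of the five SUMO ancestors comes back unmapped, and nothing above `communicativeEvent` says intentional, social, or physical. The mapping moves the label across and leaves the inherited meaning behind.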
Worse, there's no way to automate understanding what's covered in one ontology versus another. Although there have been some fairly serious attempts at it, all of them require some kind of hard-coded mappings to begin with, and the approaches often don't generalize well beyond the pairs of ontologies they're written for. Even when successful term-to-term mappings are found, there's still the problem of enforcing agreement between the data encapsulated inside each term.
XML, RDF, DAML, and future extensions to those languages will all allow you to automate reading the way these things are represented. In that sense, they've simplified the semantic web problem. But unless you enforce the use of a standard ontology and standard features, the applications built with those languages will still not be able to talk to each other. That's why, as nice as the semantic web idea sounds, SC doesn't think it will work until somebody says "enough!", and takes control of what documents mean -- an event nobody is particularly hoping will happen.
(Edited on 1/12/05 at 3:03 p.m. to update SUMO link at request of SUMO editor Adam Pease.)
"That's why, as nice as the semantic web idea sounds, SC doesn't think it will work until somebody says "enough!", and takes control of what documents mean -- an event nobody is particularly hoping will happen."
A lurking word, "Microsoft", just shuffled its feet in the dark alleys of my mind. I shuddered.
Posted by: Virge | February 06, 2004 at 07:29 PM
This isn't going to do much for your confidence, Virge:
Microsoft MindNet
Posted by: Semantic Compositions | February 06, 2004 at 11:56 PM
We'll have an automatically updated repository of "world knowledge" force-fitted to Microsoft's ontology and frequently distorted by googlebombing practices.
I suppose this shouldn't be too disheartening. Right now we have an information network our grandparents would have thought of as utopian -- and the biggest single use for it is porn.
"I've seen the future, baby: it is murder" - Leonard Cohen
Posted by: Virge | February 07, 2004 at 12:54 AM
I'd suggest that the SemWeb approach implicitly acknowledges that a single, global "world knowledge" is not likely to happen (and probably isn't desirable!). The RDF and OWL languages do make it relatively straightforward to make 'local' ontologies and mappings between them (they are defined globally, but their application may be much narrower). There are certainly big issues on the global scale - trust etc. But most of the time in practice it isn't necessary to deal with every ontology under the sun, just the ones relevant to the immediate task.
I think it goes along the lines of the Alan Kay quote: "Simple things should be simple; complex things should be possible". The current web gives us lots of 'simple' (-ish) things. Semantic Web technologies may not offer all the answers just yet, but at least they offer a chance of complex things being possible.
Posted by: Danny | March 17, 2004 at 05:27 AM