Geoff Pullum isn't one to take censorship lying down. So he did some digging, and got a response from Websense that they were filtering by IP address instead of content, at least in the case of Language Log. So now it's no longer blocked.
In light of this, a few additional statements are necessary:
1) It is now clear that SC could have contacted Websense directly about the matter, since Prof. Pullum was able to. However, this is not how the unblocking process is presented when a page is actually blocked. Since SC does not wish to engage in a battle over copyright, he will not reproduce the Websense message page here in whole or in part. It states, though, that if you feel a site has been inappropriately blocked, you should contact the IT department of your company, not Websense. This does not make it obvious that Websense is interested in hearing directly about erroneous blocking, although clearly they are in fact open to being contacted. Having previously had requests rejected by his employer's IT department, once for the official website of a subcontractor and once for a web interface to an ontology browser, SC had no intention of trying again.
2) The explanation offered to Prof. Pullum doesn't satisfy SC as to the mechanism involved. If Websense worked by blocking packets from a given IP address once a request had been made, that would be understandable. But then they could not have corrected the problem as presented to Prof. Pullum: if they're sniffing packets for the IP address, they're checking at a level too low to see which directory a request is aimed at, and that distinction is essential when multiple sites share the same address and port. SC thinks the blocking has to be going on at a higher layer of network services, namely the HTTP layer. And at that level, they ought to be able to distinguish easily between requests to one web server and another, even if some redirection from aliases is involved (SC would be very surprised if they aren't maintaining a column in their database listing aliases for banned sites).
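To make the distinction in (2) concrete, here is a minimal sketch in Python of what each layer can see. This is an illustration, not Websense's actual implementation; the function name and the example hostname are hypothetical. An IP-level filter observes only the destination address and port, so every site hosted on one machine looks identical to it, while an HTTP-layer filter can parse the request line and Host header and so tell virtual hosts and paths apart.

```python
def http_level_view(raw_request: bytes):
    """Parse an HTTP/1.1 request the way an HTTP-layer filter could.

    An IP-level filter sees only the destination address and port, so
    every site hosted on the same machine looks the same to it.  At the
    HTTP layer, the request line and Host header are visible, which is
    enough to tell one virtual host (and one directory) from another.
    """
    lines = raw_request.decode("ascii").split("\r\n")
    method, path, version = lines[0].split(" ")
    headers = dict(line.split(": ", 1) for line in lines[1:] if ": " in line)
    return headers.get("Host"), path

# A request to a (hypothetical) blocked site: indistinguishable from any
# co-hosted site at the IP level, trivially identifiable at the HTTP layer.
req = b"GET /blog/ HTTP/1.1\r\nHost: blocked.example.edu\r\n\r\n"
print(http_level_view(req))  # ('blocked.example.edu', '/blog/')
```

Since a filter operating at this layer already has the hostname and path in hand, distinguishing one web server from another (or looking the hostname up against a table of aliases) is straightforward.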
3) SC fully respects the right of employers to set policies regarding the use of the equipment which is provided to employees. This goes double for time spent on that equipment that is being billed to customers. However, that respect is conditioned on the expectation that the policies will be clearly stated and enforced in a manner consistent with their plain meaning. This episode isn't an argument that all blocking software is inappropriate in all contexts; it is, however, an argument for greater transparency in the implementation of the software.
UPDATE: Geoff Nunberg weighs in on Language Log with some additional commentary. It seems that (2) above is basically correct, but that IP-based blocking is actually used to supplement the filtering algorithms, precisely to deal with the workaround that aliasing might provide. However, SC can't attest to the accuracy of Prof. Nunberg's statement that:
Note by the way that the filters also block all translation sites, Google cache and image pages, anonymizer sites, and other sites that return a url different from the one that was requested -- but that's another issue.
SC regularly uses cached Google pages and translation sites from behind the Websense filter at work without a problem. For all your host knows, this could be a matter of configuration. Whether it's a configuration choice or simply a brute-force response to a difficult engineering challenge, though, it is rather disheartening to learn.
(updated 2/14/04 at 6:01 p.m.)
You're right -- this is a question of configuration. Sites like Google cache sites, anonymizer sites, translation sites and the like are classified separately by most filters -- N2H2 calls these Loophole sites, while SurfControl places them in a category called Remote Proxies, and SmartFilter categorizes them as Anonymizer/Translator sites. (I haven't looked to see exactly how WebSense deals with them.) A user can choose to allow this category, but a library that sets its filters at the most restrictive level will wind up blocking access to them.
N2H2 describes this category as follows: "Sites that offer a loophole that can be exploited to access pages which would otherwise be filtered out from your service. This includes proxy evasion directions, Peer to Peer software, anonymizing services, and some Web translators. etc. Unless this category is selected, the system's Internet Content Filtering protection can be compromised." (See http://www.n2h2.com/products/categories.php)
Posted by: geoff nunberg | March 04, 2004 at 02:09 PM