January 16, 2009

Cheers to the Chief

Long-suffering martini purists long ago recognized that any old drink that happened to be served in a martini glass would be given that name, however undeserving, but stand firm on the classic recipe nonetheless:

[A]s a result of the publication of this monograph, I shall be offered innumerable Martinis. I also know that most of them will be downright poisonous or otherwise unacceptable...
In the first place, the Martini on the rocks is an abomination, and must be classed with fast foods, rock and roll, snowmobiles, acid rain, polyester fabrics, supermarket tomatoes, and books printed on toilet paper as a symptom of anomie. My Martini shall be served "straight up" in a thoroughly chilled, stemmed glass. The gin, but not the vermouth...shall have been chilled before mixing, and the gin and vermouth shall be shaken or stirred -- I don't care which -- with good ice. "Good" means made from spring water, or failing that, Perrier or the like. (Lowell Edmunds, "Martini, Straight Up", Preface to the First Edition, 1979)

SC confesses, before proceeding further, that he is a vodka martini drinker, not a gin martini drinker, and has a taste for many of the adulterated concoctions that earn Dr. Edmunds' disapproval. Nevertheless, a short sampling makes reasonably clear that -tini is now simply a suffix meaning "served in a martini glass, possibly containing vodka or gin":

  • A partial selection from the Cheesecake Factory (click on "From the Bar"; formatting per original):

    TROPICAL MARTINI
    Absolut Vodka Shaken with Passion Fruit, Mango and Pineapple

    BIKINI MARTINI™
    Malibu and Cruzan Pineapple Rums with Pineapple Juice.
    Delicious!

    ASIAN PEAR MARTINI
    Absolut Vanilia Vodka, Pear, Sake and Passion Fruit

    RED RASPBERRY MARTINI
    Stoli Raspberry Vodka, Chambord and Fresh Raspberry

  • A couple from the Ruth's Chris Steak House drink menu (formatting largely per original):

    Fusion Martini                
    The ultimate fusion of Belvedere Lemonessence &
    Belvedere Vodka with a touch of fresh lemon sour. Served up.

    Pear Twist Martini                
    Belvedere Lemonessence is "twisted" with
    Absolut Pear Vodka and fresh lemon sour. Served up.

So when SC saw the following show up in an e-mail from the Patina Group this afternoon, he wondered what was particularly Obama-related about an "Obamatini":

Obamatini

Alas, no information is available from the company's website, so SC set out in search of Obamatinis to see what other people thought might be characteristic; recipes from the first 4 pages of Google results are shown:

  • Chocolate vodka, Frangelico and Chambord (link)
  • Zodiac vodka, Blue Curacao, sour mix and blueberries (link)
  • Godiva white chocolate liqueur and vodka (link)
  • Rye (eww), cardamom vodka (double ewww), various fruit juices and grenadine (link)
  • Ciroc (a grape-based "vodka" only dubiously entitled to the name), lemonade and Chambord (link)
  • Vanilla vodka and Godiva dark chocolate liqueur (link)
  • Self-consciously avoiding "race-based liqueurs", Grey Goose, blueberry juice and Chambord (link)

The obvious choices are things that link Obama to his party (Blue Curacao, blueberry juice) or are suggestive of physical appearance (the various selections of chocolate vodka), but the recurring Chambord was a total surprise -- nothing whatsoever obviously links the President-Elect to raspberries generally or Chambord in particular (including several pages worth of Google results for each pairing).

A final note of linguistic observation: SC was surprised it was "Obamatini" instead of "Obamartini", as the latter seems like the obvious choice from an orthographic standpoint. However, Obama and -tini are clearly the morphemes available for combination (when it isn't a Baracktail instead), and so the morphologically expected result is observed instead.

 

January 15, 2009

Write or Die

Readers wondering how they might have prevented SC from going absent so much since 2005 will be thrilled to discover Write Or Die, a web application that is best described by its author:

The idea is to instill in the would-be writer with [sic] a fear of not writing. We do this by employing principles taught in Introduction to Psychology. Anyone remember Operant Conditioning and Negative Reinforcement?
...
Consequences:

  • Gentle Mode: A certain amount of time after you stop writing, a box will pop up, gently reminding you to continue writing.
  • Normal Mode: If you persistently avoid writing, you will be played
    a most unpleasant sound. The sound will stop if and only if you
    continue to write.
  • Kamikaze Mode: Keep Writing or Your Work Will Unwrite Itself

The penalties are not cumulative -- "kamikaze mode" does not play unpleasant noises, and in SC's opinion, the deletion of his text is actually less annoying than the car horn-like sound, "peanut butter-and-jelly song", and other obnoxious sounds that have been employed. The model for writing that it presents is that stream-of-consciousness trumps quality, but there's something to be said for forcing yourself to just get a draft done.

("Courtesy" -- thanks a lot guys -- of the forums at the world's best website)

January 14, 2009

Paging "Ironhead"

SC flew back home from LSA earlier this week just in time to watch the Pittsburgh Steelers ruin what had only lately become a decent season for his beloved Chargers. While he can take warm comfort in the fact that everyone in Pittsburgh woke up to 16-degree Fahrenheit temperatures while he resumed his comfortable temperate-zone lifestyle, the Terrible Towel wavers get to contemplate the prospects of a Super Bowl if they take care of the Baltimore Ravens.

And it's that Ravens part that is precisely the problem for Pittsburgh mayor Luke Ravenstahl, according to this story from ESPN. Callers to a local radio show apparently have persuaded him to file -- but not pay the fee to finish the job -- an application for a name change to Luke "Steelerstahl". You can see a picture of Mr. Steelerstahl filling out the paperwork in this article from the Pittsburgh Post-Gazette. The Post-Gazette legitimately asks whether this could actually have legal ramifications (presumably because the mayor would be signing documents under a pseudonym if he uses Ravenstahl):

Left unanswered: Whether official actions, like vetoes of council legislation, could be challenged legally if the mayor's name is in limbo. "You'd have to ask the legal folks that question. I guess there's no truth to the rumor, either, that [Council President] Doug Shields came down and applied to be Luke Ravenstahl now that I've given up my name."

Also left unanswered: whether the home team missed an opportunity twenty years ago to draft University of Pittsburgh star Craig Heyward and get him to change his nickname to "Steelhead".

January 13, 2009

What a...great idea!

One of SC's favorite sessions at the 2009 LSA meeting was titled "Computational Linguistics: Implementation of Analyses against Data". Go here for a listing of the papers (it's session #30). There was a very conscious effort this year, driven very clearly by the efforts of Emily Bender and Terry Langendoen (they had a joint session to themselves earlier for this purpose), to present computational methods as desirable technical approaches to handling theoretical issues, which is exactly the sort of thing your host has always wanted to see develop further. Herewith, a little about each of the talks:

Emily Bender kicked off the discussion with a presentation on a grammar she built for the extinct language Wambaya (making use of 801 examples drawn from the documentation in Rachel Nordlinger's dissertation). Ordinarily, testing all sorts of licensing constraints and making sure that your newer rules don't break your older rules is a process that can take months. However, with the aid of the Grammar Matrix, a tool for writing and testing analyses in the Head-Driven Phrase Structure Grammar formalism, she managed to produce a grammar that correctly analyzed 91% of the cases in her development set, and 76% of cases in a separate test set, spending 210 hours in 5 1/2 weeks to accomplish this task. The introduction of formal test and development methods into the construction of theoretical analyses is welcome, and the steadily rising graph she presented to document the improvements in the grammar as a function of time was frankly astounding. If the only thing anyone took away from the presentation was that they should bring a genuine test plan into their work and actually keep metrics of their work as it progresses, the talk was a success. That it made such a convincing case for the utility of automated parsing and generation as core tools in doing theoretical work is a dream come true.

Next up was a presentation by Jason Baldridge and Katrin Erk of progress in a research project titled Efficient Annotation of Resources by Learning. Their team is tackling the problem of constructing interlinear glosses for text in languages where little prior data is available -- a problem for just about any small minority language in the world, and hence one where an efficient computational solution could reap enormous rewards (scientifically -- the IPO might be a bit more of a pipe dream). For the LSA talk, they described an experiment where 2 trained linguists were given 100,000 clauses of a Mayan dialect called Uspanteko (you can see an example at the project wiki); one was a speaker of the language, and the other was a theoretically knowledgeable individual with no Uspanteko experience. The question posed was: how much can you gloss in 2 weeks with a little help from a computer? And the answer appears to be: with random selections from the corpus (to keep from overtraining on sequential -- and possibly contiguous -- material), enough to get a machine learning algorithm to predict labels for the entire corpus with about 30% accuracy. That's not good enough to leave the job to the machine, obviously, but it is good enough to already help rank possible tags for a user to speed up their manual annotation, which is exactly the application they're developing. If you've never tried to use an annotation interface that doesn't know anything about what you're up to -- or worse, tried to do it in a plain-text editor -- trust SC when he tells you that any further progress these folks make will be a blessing.

Following the EARL team, Nianwen Xue, Susan Brown and Martha Palmer presented a paper titled "Computational lexicons: When theory meets data", covering work on building a computational lexicon integrating data from a number of prior projects, which you can browse here. Specifically, they wanted to provide a resource combining the semantic role data found in PropBank (a treebank that encodes data about verb arguments in real sentences) with VerbNet, a very detailed implementation of Beth Levin's work on verb classes. The reason you would want this integration is that sense data is notably lacking from the PropBank, itself an extension of the Penn Treebank, and this is a Bad Thing when trying to train a parser to assign semantic roles to new text. The tagging procedure by which they accomplish their integration is sensible enough, albeit not something to write much about, but the import of the work is clear -- you really can build a computational resource that is faithful to both the needs of statistical parsing and generation algorithms and linguistic theory. It's not hard to imagine building a variety of potentially very interesting applications using a word-sense-aware parser backed by this lexicon, because a little semantic role data is a lot better than nothing at all.

Next up, Jason Riggle and John Goldsmith presented a paper with a too-rare title, "Information-theoretic approaches to phonology", which appears to be an update of this 2007 manuscript. Prof. Goldsmith gave a plenary address at the previous LSA meeting on computational methods, based on this paper, which provoked a certain amount of misunderstanding and suspicion that he was somehow not interested in finding out what was going on inside people's heads when they use language. Nothing could be further from the truth; the current paper demonstrates how the classic autosegmental theory of phonological tiers could be expressed in terms of probabilities for both consonant and vowel segments. More than that, it introduces the use of a genuinely zero-based metric for evaluating the quality of a phonological model, by tying the comparison of models to the number of bits needed to represent segments and words. Now, SC would stipulate that it is not at all clear that the language apparatus always and everywhere chooses the most efficient coding scheme that could be computed. However, as a metric for evaluating whether or not a particular theory has explanatory power, this is an excellent approach. If you can't show that your theory actually buys you something better than a naive n-gram model, you had better have some other compelling reason for adopting your proposal. Indeed, the autosegmental model actually was not the most efficient from a bits/symbol perspective, but the evidence for tiers is compelling enough to not discard them in favor of flat bigrams.
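For readers who want to see what "bits per symbol" means in practice, here is a minimal sketch of the general idea, not a reconstruction of Riggle and Goldsmith's actual models. It scores a held-out word list under a unigram model and a flat bigram model, both with add-one smoothing, and reports the average number of bits each needs per segment. The function names, the toy word lists, and the choice of smoothing are all assumptions made for illustration; a real comparison would also have to charge for word boundaries and for the size of the model itself.

```python
import math
from collections import Counter


def unigram_bits_per_symbol(train, held_out):
    """Average bits per segment under an add-one-smoothed unigram model."""
    counts = Counter(seg for word in train for seg in word)
    alphabet = {seg for word in train + held_out for seg in word}
    total = sum(counts.values()) + len(alphabet)  # add-one denominator
    bits, n = 0.0, 0
    for word in held_out:
        for seg in word:
            bits -= math.log2((counts[seg] + 1) / total)
            n += 1
    return bits / n


def bigram_bits_per_symbol(train, held_out, boundary="#"):
    """Average bits per segment under an add-one-smoothed bigram model.

    Each segment is predicted from the one before it; the first segment
    of a word is predicted from a word-boundary symbol.
    """
    alphabet = {seg for word in train + held_out for seg in word}
    pair_counts, context_counts = Counter(), Counter()
    for word in train:
        prev = boundary
        for seg in word:
            pair_counts[(prev, seg)] += 1
            context_counts[prev] += 1
            prev = seg
    bits, n = 0.0, 0
    for word in held_out:
        prev = boundary
        for seg in word:
            p = (pair_counts[(prev, seg)] + 1) / (context_counts[prev] + len(alphabet))
            bits -= math.log2(p)
            n += 1
            prev = seg
    return bits / n


# Invented toy data: a "language" that strictly alternates two segments.
train = ["abab", "baba"]
held_out = ["abab"]

if __name__ == "__main__":
    print("unigram:", round(unigram_bits_per_symbol(train, held_out), 3))
    print("bigram: ", round(bigram_bits_per_symbol(train, held_out), 3))
```

On this toy data the bigram model comes out cheaper than the unigram model, because knowing the previous segment makes the next one nearly predictable. That is the sense in which a proposed theory has to "buy you something" over the naive baseline: it should lower the bit count, or come with some other compelling justification.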

Finally, the talk that most excited SC was saved for last -- Christopher Potts presenting work with Florian Schwarz on getting pragmatic data out of reviews from TripAdvisor and Amazon. The methodology is brilliantly simple: these sites give you a convenient 5-point scale for rating things, with clearly defined negative and positive opinions. So count up associations of ratings with words, and you've got yourself a taxonomy of emotional baggage. Leaving the details of the computation to the linked paper, SC will note only that it demonstrated that "what a" tends to be a useful signal of heightened emotion:

  • What a dump!
  • What a nice hotel!
  • What a completely quite neutral reaction I'm faking to throw off the math!

In all seriousness, phrases like "what a" are found to show up in both 1- and 5-star reviews, indicating extremity of reaction (although not polarity), while other words have more clearly directional connotations, like "wow" (positive) and "never" (negative). Even with noise of the sort introduced above, Potts and Schwarz show their results to be remarkably robust, with spurious examples of the relevant constructions occurring at frequencies that are orders of magnitude below those of the cases of interest. These are the sort of lessons one would ordinarily learn through survey-based research with lots of manually tabulated results and much smaller quantities of data. As a pure language-engineering tool, the applications are obvious -- it's easy to imagine conducting tests to start classifying all sorts of words as emotionally laden, positive, negative, and so forth, and integrating that into software that acts on opinions. As a research tool for theoretical inquiry, one can just as easily imagine constructing a program to serve as a filter for finding examples deserving closer scrutiny in a corpus.
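The counting step is simple enough to sketch in a few lines. What follows is a deliberately crude stand-in, not Potts and Schwarz's actual computation: for each star rating, it reports the share of reviews with that rating that contain a given phrase. The function name and the handful of reviews are invented for illustration, and plain substring matching is an assumption that a real study would replace with proper tokenization (as written, "what a" would also match "what about").

```python
from collections import Counter


def share_by_rating(reviews, phrase):
    """Map each star rating to the share of its reviews containing `phrase`.

    `reviews` is an iterable of (rating, text) pairs. Dividing by the number
    of reviews at each rating keeps over-represented ratings from swamping
    the comparison.
    """
    totals, hits = Counter(), Counter()
    needle = phrase.lower()
    for rating, text in reviews:
        totals[rating] += 1
        if needle in text.lower():  # crude substring match, see lead-in
            hits[rating] += 1
    return {rating: hits[rating] / totals[rating] for rating in sorted(totals)}


# Invented toy reviews, not drawn from TripAdvisor or Amazon.
reviews = [
    (1, "What a dump! Never again."),
    (1, "Dirty room and rude staff."),
    (2, "Not great, would never return."),
    (3, "It was fine."),
    (3, "Average hotel, average price."),
    (4, "Nice stay overall."),
    (5, "What a nice hotel!"),
    (5, "Wow, what a view!"),
]

if __name__ == "__main__":
    for phrase in ("what a", "wow", "never"):
        print(phrase, share_by_rating(reviews, phrase))
```

A phrase whose shares are high at both ends of the scale and low in the middle signals extremity without polarity; one whose shares pile up at a single end is directional. With only eight made-up reviews the numbers mean nothing, of course, which is rather the point of doing this over a very large corpus.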

What all of the papers from this symposium have in common is a commitment to the utility of theoretical linguistics, combined with an equally fervent commitment to the idea that systematic counting of examples is a legitimate way to validate your theories. The notion that a good theory ought to be able to survive contact with data doesn't require an abandonment of theoretical work in itself, and bringing a formal development cycle to your work is simply a dose of good-for-you discipline.

January 12, 2009

A word for our grandchildren

SC attended the American Dialect Society's Word of the Year vote at the LSA conference this past weekend, together with the Tensor and Russell Lee-Goldman, and wishes to add a few things that aren't made clear simply by reading the press release.

1) The voting is quite informal, more so than the apparent exact counts would indicate. If something was an obvious winner on a show-of-hands, an appropriately large number was simply estimated.

2) Fish pedicures are new to SC. They were also new to a woman in the crowd who stood up right before the vote on "most outrageous word" to ask, "Is this a hoax?". After half the crowd responded with an enthusiastic no, she sat down with a look of horror that was completely understandable, but also the funniest moment of the event.

3) "Category" is a rather flexible notion. For example, although the Tensor theorized that the category of "Election-Related Words" was intended to be a catch-all in an attempt to avoid having the whole event be about the election, 7 of the 11 election-related words to show up in the nominations (excluding the "word of the year" category itself, which consisted of nominations made after all other categories had been voted on) were outside the election category.

4) "of the year" is also rather flexible. "-licious" has been around for a while, a fact which actually caused the person who nominated it to withdraw it. "[name] the [job]", a category meant to encapsulate "Joe the Plumber" and his numerous namesakes, is a pattern going back decades as the other bloggers present observed (think "Rosie the Riveter" and "Bob the Builder", which both add a phonological constraint of alliteration). John McCain has been referred to as a "maverick" countless times over more than a decade, and "lipstick on a pig" is attested repeatedly in business contexts predating Barack Obama's 2008 use of it (here's one from 2000, and another from 2007). 

Despite the hiccups, SC actually thinks the right word of the year was picked, that being "bailout". Unless there is a stunning and permanent change in the way the United States' government finances its operations and prints money, it is highly likely that bailout funds spent today will still be part of the national debt 30 years hence, and as much a part of the conversation about solvency as any entitlement program that exists today. Thus, your host will be rather surprised if his grandchildren don't end up griping about financing "The Great Bailout", as he suspects it will be known by then. Also, as the chair of the voting pointed out, it would be a lot easier to explain picking "bailout" than picking "Barack Obama" as a pair of combining morphemes (think "Barack star" or "Obamanation"). So well done, American Dialect Society -- bailed out by the bailout.

January 11, 2009

SC does LSA 2009

Hi again.

So your host has been MIA for quite some time. But that didn't stop him from going to LSA 2009, and having a good time. Highlights included, in no particular order:

  • A symposium on applying computational methods to linguistic analysis
  • A panel discussion on linguists as expert witnesses, and possible credentialing
  • Terry Langendoen and Emily Bender's presentation of a vision for future computational resources
  • John Rickford's plenary session on "age grading" -- how the characteristics of individual grammars change over time
  • Getting together with the alas-rather-shrunken community of bloggers in attendance

Having taken copious notes on some of these, SC hopes to write about them over the next week. But promised posts having been a source of trouble around here ([says the understatement king of all time -- Ed.]) let's just see what comes next.