Gelf Magazine - Looking over the overlooked

Comedy | Internet | Media

July 12, 2005

Does Google News Have a Sense of Humor?

How does Google News treat satire? And what's in the search algorithm's secret sauce anyway? Gelf investigates.

David Goldenberg

Axis of Logic Screenshot
Courtesy Opinion Journal
Axis of Logic took the top spot on Google News with its satirical story about Bush's arrest
On Dec. 1 of last year, the top spot (in the upper left-hand corner) on Google News was taken by the Axis of Logic, reporting that, upon his visit to Canada, George Bush had been arrested and charged with war crimes (see the screenshot at left). The article said, “Under both Canadian and International War Crimes law George Bush is being charged with genocide, torture, murder, and the most strong of all war crimes, the crime of war of aggression.”

Of course, the article wasn’t describing actual events. But many people were appalled that such an obvious piece of satire could have gotten prominent, serious placement on a website of the most valuable media corporation in the world. (Google recently overtook AOL Time Warner for this distinction.) Slate’s Jack Shafer asked, “Will somebody teach Google News' algorithms a sense of humor?” Certainly, Google News relies deeply on bytes of code—its founder Krishna Bharat told Wired Magazine that “It's a computer, and computers do not understand these topics the way humans do and can't be systematically biased in any direction.” But that’s not exactly true. Within Google News, there are people doing some work, like the Oompa Loompas who run the great machines at Willy Wonka’s Chocolate Factory. And Google’s Oompa Loompas do have some control, which allows them to express both a bias and a sense of humor.

Communication By ‘Pizza’

How does Gelf know anything about how Google News deals with satire? Little information comes from the company itself. Gelf’s questions were passed around from one Google public-relations representative to another, but no spokesperson ever responded to any of our questions about how Google deals with satire. I’ve written about Google before (in Gelf about Mark Jen, who was fired from the company for blogging, and in a Wired Magazine piece with Terry Tang about hidden perks at Google and other companies), and evasiveness with the press seems to be the company’s modus operandi.

So Gelf talked with those publications that are a part of Google News’s system to try to understand how humor and computers can coexist. Gelf sent a survey to all of the publications it could locate that Google lists as ‘satire’ sites. We found about 40, but there are likely more. (Google News places the word ‘satire’ in parentheses next to the name of the publications it labels as such.) We also talked with some publications that were once labeled as satire and are no longer, as well as some that publish satire but remain free of that label (like the Axis of Logic). We heard back from over 20 sources, and got a good sense of how Google News deals with satire.

Very few of the people we talked to had any communication with programmers at Google News outside of the initial ‘you’ve been accepted’ email. Of those who did, none ever spoke by telephone to a Google News programmer. “You can’t get any information out of these people,” says Eric Olsen, the founder and publisher of Blogcritics, which was once incorrectly listed as a satire publication. “There is no person identified, no phone number to call, and any time I have a question I just write back to the address and it takes a week. Even if it’s an emergency. During the satire thing, I convinced the woman at the front desk it was an emergency. She gave me a special code to communicate with them right away. For that day, I think the emergency code was ‘pizza.’ Even then, it still took a full ten days to get a response.”

At Uruknet (a non-satire site from Italy), webmaster Vincenzo Viscuso also found that Google responded more slowly when there was an emergency, such as the two times that Google News dropped Uruknet as a source without explanation: “In a previous occasion, we wrote to Google for a problem, and they answered us in two or three days; instead, in these two cases, they did not answer us at all.”


Who Gets In?

Google News collects and publishes articles from over 7,000 sources, ranging from the New York Times to Slashdot to Heckler Spray. Google continues to add more publications to its sources list, including, recently, some blogs. But as with much of Google’s inner workings, the news arm's method of deciding which sources to publish remains mysterious. Certainly, to be included, a site must be submitted to Google, and it seems to help if the site is submitted by someone who runs the site. Of the 15 publications we surveyed, all but two of them had been submitted by someone inside the organizations.

Some publications are uncomfortable with what would seem to be a symbiotic relationship. They don’t like the idea of Google making money off of their research, and are willing to sacrifice the readership that Google brings them in order to keep what they see as control of their product. Earlier this year, Google News removed Agence France-Presse after the French company threatened to sue over copyright infringement. (See a copy of the complaint in PDF here).

But many publications are submitted every day by the people who run them, and most of them are rejected, at least the first time around. Some have suggested that sites (like the conservative Human Events Online) are kept out because the opinions expressed on them clash with the political beliefs of programmers at Google (Romenesko), but this seems unlikely. Some sites are not included because they are found to contain inappropriate language, but the sources aggregated by Google reflect opinions from across the political spectrum. And while those publications listed as satire tend to skew left, this is more likely a reflection of the current political state than of any intentional political bias.

(In case you were wondering, yes, Gelf did ask Google News to consider using our feed and, yes, Google News did reject us.)

Getting ‘Nuked’

The biggest complaint that any of the publications had was that Google is willing to delist or change the listing of publications without explaining to them what is going on. For some satire publications that take pride in stirring up controversy, that means that their status at Google is constantly at risk. “We got nuked from Google News for exactly 30 days,” writes Kamal El-Din of Unconfirmed Sources, an online satire magazine that carries headlines like “Karl Rove Awarded Presidential Medal of Freedom For Defending Identity of Valerie Plame.” “This was about two months ago and I never figured out why. I asked but never got a response. I would like to know which story got me bounced so I can chill my free-speech rights. And I guess that’s the only bummer (and a small one at that) about Google News. It’s a mostly free outlet, but not quite.”

“I think we got banned or suspended because of complaints to Google about our content. We got some hate mail like ‘You guys are a bunch of lying bastards and I’m going to tell Google on you’ so I’m guessing that is what they did.”

The same thing seems to have happened to Uruknet, the anti-war (but not satire) site run out of Italy. It has been cut off from Google News twice, and once wiped from Google search altogether. “Of course, we searched very well for the causes of both lockings,” writes Viscuso, the webmaster. “And in both cases we were sure that it was not a technical problem, looking at the log of Google-robot visits, also because, strangely, they happened just after two campaigns against us.”

“The first time the cause was a certain Michelle Malkin that, from her blog, invited her visitors to write to Google News to make uruknet.info be excluded; after some days, these happened. We did the same thing, and invited our visitor to write, also providing an easy form; some hundreds of people wrote to Google News using the form, and we feel other hundreds wrote directly. So, Google re-indexed us.”

Indeed, Google News seems to rely heavily on reader input to manage its sources. Eric Olsen of Blogcritics told Gelf that he believes that his site was labeled as “satire” ex post facto after someone wrote in to Google News to grumble: “They make it real easy to feedback as a reader. If they clicked on a link and it was crap, they’d complain.”

Les Blough, the editor of Axis of Logic, found that articles that he wrote (though not the rest of the site's stories) were also passed over by Google News after an outraged reader wrote to him saying he was going to complain to Google. In an open letter to alternative media outlets, he wrote, “None of the articles which I authored were picked up by the Google crawler following this complaint, even those published immediately before and after were picked up and distributed.”

What Is Satire, Anyway?

Even though Olsen’s site, Blogcritics, consists almost exclusively of entertainment news and reviews, it occasionally does print a satire piece, and he believes this is what led Google News to change his site's listing. “There’s no distinction. You are what you are,” he says. “They label your entire site satire. As of 8 o’clock Tuesday night, the week before the final of American Idol, everything started showing up satire.” After several people wrote in on behalf of Blogcritics, Google removed the satire label.

Like Blogcritics and many other sites, Axis of Logic prints both satire and straight news, and remains unlabeled on Google News. Other sites, even those that print no satire at all, have been labeled as such. Recently, left-leaning political blog Daily Kos was also labeled and delabeled satire. And while the penis-joke-friendly Washington insider blog Wonkette is currently listed as satire, its equally snarky (and less newsy) sister publication Gawker is not.

“Some of what blather.net does is satire—some of it isn't,” the site’s Dave Walsh writes. “Actually, listing us as satire means that we get to publish relatively controversial stuff. Whether people believe it's satire or not is another thing.”

Satire editors tell Gelf which stories brought them the most readers from Google News

•Frederick Gundling, the editor of the Washington Obfuscator, writes “To date, our biggest article was a story on Lynndie England around the time of her trial.”

•“The Britney is pregnant story did well for me,” writes Gary Smith of the Voice of Reason.

•“On breaking news, our stories often are listed first and with a photo,” writes Todd Fox of the Swift Report, pointing to a story of how Brad and Jen broke up because of their political differences.

•“Days before the verdict was announced, we did a piece about how Michael Jackson was acquitted and escaped with a llama and a monkey,” says Ridiculopathy’s Mark Arenz. “The picture was a badly done photoshop, on purpose.”

•Dave Walsh of Blather writes, “We've often been in the Top 10 for several stories like this one” about Ireland’s unwed mothers.

Placement, the ‘Secret Sauce’

How does Google decide which publications have the most relevant information regarding the important news of the day? The algorithms that determine Google News’ ranking system are some of the company’s best-kept secrets, and for good reason.

“I would love to understand their algorithms...their formula for placement,” writes Bill Doty of Broken Newz. “But if I did, I would probably tweak the way I do things for better standings...horrid.”

“I don't try to game the algorithms,” says J.J. Burgund, the editor of Spoof News. “It would just make my head hurt. Spoof News does what I call 'ripped from the headlines satire,' so I'm not surprised when an article gets good placement. We follow our muse and let the algorithms follow theirs. It works out fine most of the time.”

Indeed, staying current seems to be the best way for satire sources to get hits from Google. “When Janet Jackson's nipple made its public debut the masses were starved for even a lick of information,” writes Brian White of Glossy News. “Some of us got as many as 20,000 visits in a single day just from Google News. We had more provocative headlines, souped-up photos and a promise to pander to their animal lust in ways conventional media wouldn't and couldn't. Nipplegate really made a lot of us reconsider our relationship with Google News and weigh a shift in our format accordingly.”

When Google News first launched in 2003, Bharat summed up everything about the service that was available for public consumption in a Google Friends Newsletter, and explained that Google’s vaunted system of PageRank, which it uses to rank web pages in search, was part of the system used to determine relevance of news items:

PageRank is one of these factors, but the exact mix of determinants is part of our secret sauce and not something we're able to discuss in detail. We can say that Google News also integrates other attributes, such as the recency of the content, to help determine which stories get the most prominence.

Last month, at the World Editors Forum, Bharat again discussed the inner workings of Google News, but added little in the way of details (editorsweblog).

“I woke up one Saturday morning and wrote this dumb joke about a missing marathon runner who turned up in Albuquerque,” writes Spoof News's Burgund. “After I posted it, I found just a few other references to the 'Runaway Bride.' That article made it to the top group of U.S. news. So the next day, I wrote about the Runaway Bride again. The article made it to the lead in U.S. news, right above the New York Times and over 2,000 related articles.”

Ethan Zuckerman, a fellow at Harvard's Berkman Center for Internet and Society, told the Online Journalism Review that he has a theory as to why smaller-time sources get ranked higher than obviously bigger sources on some stories, especially those having to do with people’s names. “I think what you're seeing is an odd little linguistic artifact,” he said, explaining that while mainstream news publications refer to people on second reference by their last names, alternative news sites often use the whole name multiple times on purpose. As far as Google News is concerned, this means that the smaller sources are more relevant.

El-Din of Unconfirmed Sources concurs. “We have this thing where we use ‘George W. Bush’ every time we mention the president,” he writes. “For a stretch of several weeks if you typed ‘George W. Bush’ in the search bar, ten of the first twenty stories might be ours. Just now we have two in the first ten. (Not bad for a bunch of subversive lefties.)”
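Zuckerman's theory is easy to picture with a toy example. The sketch below is only an illustration of the idea, not a description of how Google News actually works: the scoring function, its name, and the two sample passages are all invented here, and the "relevance" it computes is nothing more than how often the exact search phrase appears relative to the length of the text.

```python
import re


def phrase_count(text: str, phrase: str) -> int:
    """Count case-insensitive occurrences of the exact phrase in the text."""
    # Lookarounds keep the phrase from matching inside a longer word.
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def naive_relevance(text: str, query: str) -> float:
    """Hypothetical score: exact-phrase hits divided by total word count."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return phrase_count(text, query) / len(words)


# Invented sample text: a mainstream-style piece switches to the last name
# on second reference, while an alternative-style piece repeats the full name.
mainstream = (
    "George W. Bush spoke in Ottawa on Tuesday. "
    "Bush said trade talks would continue. "
    "Later, Bush met with protesters."
)
alternative = (
    "George W. Bush spoke in Ottawa on Tuesday. "
    "George W. Bush said trade talks would continue. "
    "Later, George W. Bush met with protesters."
)

query = "George W. Bush"
mainstream_score = naive_relevance(mainstream, query)
alternative_score = naive_relevance(alternative, query)
print(f"mainstream:  {mainstream_score:.3f}")
print(f"alternative: {alternative_score:.3f}")
```

Under this made-up measure, the passage that repeats the full name scores higher for a "George W. Bush" search even though both passages say the same thing, which is the artifact Zuckerman and El-Din describe.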

While several satire sites get a large chunk of their readership from Google News, none of them consistently outperforms the large news organizations. According to research done by Newsknife, which determines the relevancy of news organizations based on their placement within Google News (down the 10 sub-pages of listings for individual news items), only one satire site, Ridiculopathy, was ranked for relevance, and it came in at number 525 overall.

“I noticed that we were getting used more about six months ago,” Ridiculopathy’s Mark Arenz tells Gelf. “We started getting email from people saying ‘But that’s not what happened,’ or ‘The president didn’t say that.’ Even though the word satire is all over our site, I feel bad for folks...” Arenz pauses, then adds, “The hate mail is half the reason we do it.”

David Goldenberg

David Goldenberg is the co-founder and editor of Gelf, and the host of Geeking Out, Gelf's monthly science speaking series.








Comments

- Internet
- posted on May 05, 07
Richard Smith

Suggested reading:

The Circus of Medicine. Lima, OH: Wyndham Hall Press.
by Richard Dean Smith

Medical practice received uncharacteristic sanction and public approval during the middle decades of the 20th century. For centuries before and decades since, medicine has been the focus of criticism and distrust. Throngs, droves, herds, flocks, legions of self-appointed critics, pricey ‘healthcare’ consultants, economists (academic and otherwise), on-the-run politicians, and promoters of the healthcare insurance industrial complex brought bizarre exaggerated claims against the medical profession. Nearly everyone has opinions on ‘what’s wrong’ with doctors, hospitals, and medicine—a new / old breed of hangers-on surround medicine; such as, some rabble healthcare consultants throwing dice on a blanket in the hospital parking lot, some misguided influential academic healthcare economists dealing three-card monte on the front sidewalk, some less than objective medical journalists hustling a shell game in the hospital foyer, that is, a pervasive new form of healthcare quackery. A result was the managed care mass medical movement.
Literature of past centuries shows that medicine has always been surrounded by a circus: promoters, jugglers, swindlers, con-men and women, charlatans, imposters, poseurs, humbug artists in the School of Indictment opposed to ethical practicing physicians. Purveyors of not just bad advice, but worse. Robert Morris says: “Some of the most responsible doctors will always be in the hands of financial fakers, and some of the most responsible business men will always be in the hands of medical fakers,” and healthcare consultant fakers. In the 19th century, Worthington Hooker said, “It is folly for the physician to boast that he worships in a temple, upon whose altars no strange fires ever burn, while he looks out with contempt upon what he regards as the almost heathenish observances and worship of the unscientific and unlearned people.”
The “intellectual underpinnings” of managed care form a Noodle’s Oration: a superficially plausible argument consisting entirely of fallacies and logical errors. Oliver Wendell Holmes said, “There is a class of minds much more ready to believe that which is at first sight incredible, and because it is incredible, than what is generally thought reasonable: Credo quia impossibile est.” Historian James Harvey Young says: “The quack has been adept at erecting a beautifully logical structure on the basis of a single false but plausible premise. Numerous educated men have missed the premise, admired the logic, and been trapped.”
In the presence of woodenheadedness that permitted rapid adoption and excited growth of managed care, the mass medical movement of managed care flourished unrestrained. Failure of the popular press, failure of ‘learned’ journals, the fallacy of self-assumed authority of the healthcare consulting industrial complex, and lack of responsible inquiry by the academic community (whose tenure intended to prevent such atrocities against human reason) supported the irrational, foolish movement into managed care: “Knaves there will always be, and fools—whatever the justification for their folly—and, therefore, pseudo-medical deception.” The self-trumpeter’s fallacy became the creed and modus operandi of the managed care mass medical movement.
While medical practice is solemn and serious, humor and satire may neutralize madness of the mass movement of managed care: satire consists: “not of mirth, but of the intense and even painful sense of the absurd.…where it is vice rather than folly that is the target, or folly so noxious as to amount to vice—and provoking reactions that vice engenders but not mere folly—we are in the presence of Satire,” and thus, the modern Circus of Medicine.

Links:
wyndhamhallpress.com
richardsmithmd.com


- Internet
- posted on Aug 04, 07
writing fiend

Excellent reporting. Thank you David for bringing this issue to the forefront.


