
Google Translate’s bug and Google Suggest’s racial oddity

Google Translate

You may have heard that Google launched their AJAX Language API: translations on the fly via JavaScript. Sweet! Google Translate is usually not that bad. It still gets quite a few things wrong, but overall the quality is acceptable.

Google uses statistical learning techniques, as opposed to a rule-based approach. From their FAQs:

Most state-of-the-art, commercial machine-translation systems in use today have been developed using a rule-based approach, and require a lot of work to define vocabularies and grammars.

Our system takes a different approach: we feed the computer billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model. We’ve achieved very good results in research evaluations.
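
In other words, a phrase-based statistical system roughly picks, for each source phrase, the target phrase it has most often seen aligned with it, and lets phrases it has never seen pass through untouched. Here is a toy caricature in Python (the phrase table and counts are invented, and Google’s actual models are far more sophisticated):

```python
from collections import Counter

# Toy caricature of phrase-based statistical MT: for each source phrase,
# pick the target phrase most often seen aligned with it in parallel text.
# The entries and counts below are invented purely for illustration.
phrase_table = {
    "Hund": Counter({"dog": 90, "hound": 10}),
    "Bank": Counter({"bank": 70, "bench": 30}),  # ambiguous in German
}

def translate_phrase(source):
    """Return the most frequently aligned target phrase, or pass it through."""
    candidates = phrase_table.get(source, Counter())
    if not candidates:
        return source  # unknown phrases survive untranslated
    return candidates.most_common(1)[0][0]

print(translate_phrase("Hund"))   # -> "dog"
print(translate_phrase("Bank"))   # -> "bank", even when "bench" was meant
print(translate_phrase("Katze"))  # -> "Katze", passed through unseen
```

Raw frequency wins over context here, which is exactly where this kind of approach gets into trouble.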

Machine translation is a very hard problem, and the quality can be so-so at times. That said, I’m about to unveil the most ridiculous bug I’ve ever encountered in this system. Do you notice anything strange in the translations below, first from German to English and then from German to French?

GERMAN: Output: 4 – 600 Ohm Made in Austria!! Funktionstüchtig! Die Kopfhörer haben einen Spitzen Sound der unverfälscht wieder gegeben wird!! Die Qualität der Kopfhörer ist einfach Spitze.

ENGLISH: Output: 4 – 600 ohms Made in USA! Funktionstüchtig! The headphones have a peak sound of the genuine will be given again! The quality of the headphones is simple tip.

FRENCH: Output: 4 – 600 Ohm Made in France! Fonctionne! Les casques ont un peu de son authentique sera à nouveau! La qualité des écouteurs est facile de pointe.

You should clearly see an issue here. In case you don’t, I’ll be more explicit:

[Screenshot (madeinusa.png): “Made in USA” highlighted in the English output]

Google Translate sometimes changes the country mentioned in the source text into the main country associated with the target language. That’s a pretty big bug right there. Certain terms should be translated verbatim via a simple dictionary mapping, especially something as fixed and unambiguous as country names.
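
A minimal sketch of that dictionary-mapping fix: shield country names behind placeholder tokens before the text reaches the statistical engine, then restore them afterwards. The country list, the token scheme, and the stubbed-out translation step are all hypothetical:

```python
# Sketch of the verbatim-mapping idea: mask country names with placeholder
# tokens so the statistical engine cannot "improve" them, then put the
# original names back. COUNTRIES and the token format are invented.
COUNTRIES = ["Austria", "USA", "France", "Germany", "Italy"]

def protect_countries(text):
    """Replace each known country name with a stable placeholder token."""
    mapping = {}
    for i, country in enumerate(COUNTRIES):
        token = f"__COUNTRY_{i}__"
        if country in text:
            text = text.replace(country, token)
            mapping[token] = country
    return text, mapping

def restore_countries(text, mapping):
    """Swap the placeholder tokens back to the original country names."""
    for token, country in mapping.items():
        text = text.replace(token, country)
    return text

masked, mapping = protect_countries("Made in Austria!! Funktionstüchtig!")
translated = masked  # stand-in for the engine's output; tokens pass through
print(restore_countries(translated, mapping))  # country name survives intact
```

The same trick generalizes to any proper noun that should never be “translated”.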

Thanks to my friend Ludo, who noticed this bug.

Google Suggest’s racial oddity

While we are on the topic of Google bugs and anomalies, I’ll add a small oddity to the mix. I must preface this part of the post by clarifying that I respect all ethnicities and colors, and that I have good friends from all over the world. I am against racism, but not against discussions about racism. Be warned: I won’t publish anyone’s racist or offensive comments. This post merely points out Google Suggest’s selective behavior, which of course gets picked up by Firefox’s Google search box in the top corner, too.

Google suggestions are based on the number of queries received and the number of results for any given query. This means that typing a few words into Google Suggest reveals the most likely queries starting with those terms. In Google’s own words:

Our algorithms use a wide range of information to predict the queries users are most likely to want to see.

For example, if I write “money is”, Google will suggest: “money is the root of all evil”, “money is debt”, “money is power”, “money is everything”, and so on.
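
Conceptually, this is just prefix matching over a ranked log of past queries, with a popularity cutoff. A toy sketch (the query log, counts, and threshold are all invented):

```python
from collections import Counter

# Toy query log mapping queries to popularity counts. Numbers are invented.
query_log = Counter({
    "money is the root of all evil": 9400,
    "money is debt": 5100,
    "money is power": 4800,
    "money is everything": 3900,
    "money island": 120,
})

MIN_POPULARITY = 500  # queries below this threshold are never suggested

def suggest(prefix, k=4):
    """Return up to k popular queries starting with the given prefix."""
    matches = [(q, n) for q, n in query_log.items()
               if q.startswith(prefix) and n >= MIN_POPULARITY]
    return [q for q, _ in sorted(matches, key=lambda m: -m[1])[:k]]

print(suggest("money is"))  # the four popular completions, most popular first
```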

I’m an Italian programmer, so I tried “programmers are” and got the hilarious suggestion that “programmers are lazy”. :) Alright, what about “Italians are”? Here are the results:

[Screenshot (italians.png): Google Suggest results for “Italians are”]

Some people are racist; that’s nothing new. These are stereotypes, for and against Italians, and they shouldn’t surprise anyone. You can’t really blame Google either for what people have been typing in the most: Google suggests automatically, based on the most popular queries. Okay, that’s Italians. What about other nationalities? The most common stereotypes are all well represented: Americans, French, Germans, Spaniards, Chinese, Indians, and so on. What about “whites” in general?

[Screenshot (whites.png): Google Suggest results for “whites are”]

Sad, I know. The picture doesn’t change much if you look for “Christians are”, “Muslims are”, “Jews are”, “gays are”, “cops are”, “men are”, “women are”, and so on.

Google won’t suggest anything if the queries are not popular enough: “Caucasians are” yields no suggestions, but “Caucasians” (alone) does. Google could do a couple of things: either blacklist the few dozen racial terms that are popular enough to show up in the suggestions, or simply decide, as a matter of policy, that suggestions are automated and that if you look up stupid race-based queries you shouldn’t be offended by the suggestions you get back.
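
The blacklist option is trivial to implement, which makes the selective behavior described next all the more puzzling. A sketch with a purely hypothetical blocklist (it could wrap the toy suggest() from earlier, here stubbed with a lambda):

```python
# Sketch of the blanket-blacklist option: suppress all suggestions when the
# query starts with a blocked term. The blocklist contents are hypothetical.
BLOCKED_TERMS = {"blacks"}  # while "whites", "jews", etc. are apparently absent

def filtered_suggest(prefix, suggest_fn):
    """Return suggestions unless the query's first word is blocklisted."""
    words = prefix.strip().lower().split()
    if words and words[0] in BLOCKED_TERMS:
        return []  # blocked term: no suggestions at all
    return suggest_fn(prefix)

fake_engine = lambda p: ["(whatever the query log says)"]
print(filtered_suggest("blacks are", fake_engine))  # -> []
print(filtered_suggest("whites are", fake_engine))  # -> passes through
```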

A few years ago there used to be very reprehensible suggestions about black people, as one would expect given the results for the other ethnicities and nationalities, and the racism that unfortunately still exists today. A while ago, though, Google did something rather odd: possibly after receiving complaints, they removed “blacks” from the list and left every other group in the suggestion engine. If you search for “blacks are” you won’t find any suggestions, even though I’m pretty sure such queries are as popular as they ever were, just like those containing “whites are”, “Greeks are”, or “Christians are”. On top of that, even “blacks” on its own will not prompt any suggestions. To further convince you: even an unusual term like “purples” still yields two suggestions, “purples 80s” and “purples wxsand”. If this exclusion was the right thing to do, then Google should do the same for the other groups as well. If it wasn’t, then why favor only one group?

I don’t know if we should consider this a form of “selective racism”, but it’s odd, and I thought I’d point it out even though the subject is delicate and risky. If you think about it, it isn’t really a racial problem at all; it’s a question of how to make software engineering decisions that handle potentially offensive outcomes properly and equally for all of your users.


10 Responses to “Google Translate’s bug and Google Suggest’s racial oddity”

  1. Filepromptguy says:

    Racist or just a bug in the Google code?

    More like a bug to me.

  2. @Filepromptguy: it’s extremely unlikely to be a bug. It used to work just as it did for all the other groups, and then suddenly stopped working for only one of them. The fact that “blacks” without the word “are” doesn’t prompt anything either strongly suggests that it has been intentionally blocked. Anyway, it’s an oddity I wanted to point out, but it’s not a big deal. :)

  3. friism says:

    The racial/religious nastiness is not new and Google provides an explanation:
    http://www.google.com/explanation.html

  4. @friism: I’m aware of their official explanation as to why nastiness ends up in search suggestions. Their message makes perfect sense. However, there is a clear lack of intervention when it comes to those words associated with pretty much all the major religious beliefs, nationalities, and ethnicities except for the one I described.

  5. mind says:

    “<various racial slurs> are” shows nothing as well. i’m sure there’s just a list of words somewhere that autocomplete is disabled for. ‘blacks’ must be on this list, while ‘whites’ is not. you’d think they would add all of the non-offensive terms because they can become offensive based on context.

    then again, fuck people getting offended. (i feel dirty after talking about programmatic ways to keep people from seeing certain words)

  6. @mind: yes, they most likely have a list to which they added “blacks” (and racial slurs) but not “jews” or other groups. It would be nicer to implement a better filter that blacklists other racist/sexist offensive suggestions, but it’s nothing vital (it wouldn’t be censorship or anything like that, given that the results would still be there, just not the autocomplete). On the other hand, having no blacklist at all and just letting the automated system give suggestions would be acceptable too, if specified in Google’s policy. As I said, I wanted to point this out, but I don’t think it’s a big deal at all.

  7. T. says:

    Garbage in, garbage out. It is not a bug; it is a problem with the input data.

  8. User says:

    Of course, this word-based blacklisting suffers from the usual problem of distinguishing the racial slur from other phrases that happen to contain a homonym. Looks like Google won’t be giving me any contextual suggestions about popular searches on the New Zealand national rugby team!

  9. [...] Antonio Cangiano and [...]

  10. JR says:

    Have you seen this? Pretty hilarious if you ask me. http://www.blahblahfish.com


I sincerely welcome and appreciate your comments, whether they agree or dissent with my article. However, trolling will not be tolerated. Comments are automatically closed 15 days after the publication of each article.
