ketan agrawal

Neural networks like to “cheat”
Last modified on April 08, 2022

artificial intelligence

Links to “Neural networks like to ‘cheat’”

Counterfactual Generative Networks

Neural networks like to “cheat” by exploiting simple correlations that fail to generalize. For example, image classifiers can latch onto spurious correlations with background texture rather than the actual object’s shape; a classifier might learn that a green-grass background implies “cow.”
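A toy, hedged sketch of this shortcut behavior (not from the original note): synthetic data in which a made-up “background greenness” feature is spuriously correlated with the “cow” label during training but not at test time, and a plain logistic regression learns the shortcut.

```python
# Hedged toy sketch: a classifier exploiting a spurious background feature.
# In training, "green background" co-occurs with the "cow" label 95% of the
# time; at test time that correlation is broken. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, p_spurious):
    """Each example: [shape_feature, background_greenness]."""
    y = rng.integers(0, 2, size=n)                     # 1 = cow, 0 = not-cow
    shape = y + rng.normal(0, 1.5, size=n)             # weak, noisy "true" signal
    # Background matches the label with probability p_spurious.
    background = np.where(rng.random(n) < p_spurious, y, 1 - y) + rng.normal(0, 0.1, n)
    return np.column_stack([shape, background]), y

X_train, y_train = make_data(5000, p_spurious=0.95)    # grass ~ cow in training
X_test,  y_test  = make_data(5000, p_spurious=0.50)    # correlation broken at test

clf = LogisticRegression().fit(X_train, y_train)
print("train acc:", clf.score(X_train, y_train))       # high: the shortcut works here
print("test acc: ", clf.score(X_test, y_test))         # drops once grass no longer implies cow
print("weights (shape, background):", clf.coef_)       # the background weight dominates
```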

This work decomposes the image generation process into three independent causal mechanisms – shape, texture, and background. One can then generate “counterfactual images” to improve out-of-distribution (OOD) robustness, e.g. by placing a cow on a swimming-pool background. Related: generative models, counterfactuals
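A minimal sketch of the composition idea, assuming the three mechanisms emit a shape mask m, a foreground texture f, and a background b that are composited as m·f + (1 − m)·b; random tensors stand in for the generators’ outputs.

```python
# Minimal sketch of CGN-style analytic composition. The placeholder tensors
# below stand in for what the shape, texture, and background mechanisms produce.
import torch

def compose(mask, foreground, background):
    """Composite image: x = m * f + (1 - m) * b."""
    return mask * foreground + (1 - mask) * background

# Hypothetical counterfactual: a "cow" shape mask pasted over a pool background.
mask        = torch.rand(1, 1, 64, 64)   # stand-in for the shape mechanism's output
cow_texture = torch.rand(1, 3, 64, 64)   # stand-in for the texture mechanism's output
pool_bg     = torch.rand(1, 3, 64, 64)   # stand-in for the background mechanism's output

counterfactual = compose(mask, cow_texture, pool_bg)
print(counterfactual.shape)  # torch.Size([1, 3, 64, 64])
```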

CS224u: Natural Language Understanding (Introduction > Limitations)

  • NLU systems are easily “confused.”
  • Models don’t “know what the world is like.” E.g., GPT-3 doesn’t really know what a cat is, and image-captioning models’ mistakes similarly show that they lack world knowledge.
  • Systems can encourage self-harm.
  • Systems are vulnerable to adversarial attacks.
  • Social biases are reflected in NLP models.
  • Observing diminishing returns in ever-larger language models.
  • Neural networks like to “cheat”: NLI models can learn to predict the relation between premise and hypothesis from superficial correlations rather than genuine understanding (see the sketch after this list).
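To make the “superficial correlation” point concrete, here is a purely illustrative hypothesis-only heuristic; the cue list and thresholds are made up, but surface rules of this flavor are known to beat chance on crowd-sourced NLI datasets.

```python
# Hedged sketch (not from the note): a "hypothesis-only" heuristic for NLI that
# never reads the premise, illustrating how surface cues like negation words
# can correlate with the gold label in crowd-sourced datasets.
NEGATION_CUES = {"not", "no", "never", "nobody", "nothing"}

def shortcut_nli(premise: str, hypothesis: str) -> str:
    """Predict an NLI label while ignoring the premise entirely."""
    tokens = set(hypothesis.lower().split())
    if tokens & NEGATION_CUES:
        return "contradiction"   # negation words often mark crowd-written contradictions
    if len(tokens) <= 6:
        return "entailment"      # short, generic hypotheses skew toward entailment
    return "neutral"

# Toy usage: the premise is never consulted.
print(shortcut_nli("A man is riding a horse.", "Nobody is riding anything."))  # contradiction
```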

Question someone asked: “Is there any case where symbolic approaches definitely would be used over neural nets?”

Answer: A mental health chatbot, where you don’t want the system saying harmful things to users, and other safety-critical situations. Also, a mixture of symbolic and neural approaches is sometimes used; e.g., Google Translate may tack on a logical rule that attempts to correct for gender bias in neurally produced translations.
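A hedged sketch of that hybrid pattern (every name and rule here is hypothetical): a neural model proposes a reply, and a hand-written symbolic layer can veto it.

```python
# Hedged sketch of the hybrid idea from the answer: a neural model proposes a
# response, and a symbolic rule layer vetoes unsafe ones. `neural_chatbot` is a
# hypothetical stand-in for any generative model; the patterns are illustrative.
import re

BLOCKED_PATTERNS = [r"\bhurt yourself\b", r"\byou should give up\b"]  # illustrative only
SAFE_FALLBACK = "I'm not able to help with that, but please reach out to a professional."

def neural_chatbot(user_message: str) -> str:
    # Placeholder for a learned model's output.
    return "Maybe you should give up."

def respond(user_message: str) -> str:
    candidate = neural_chatbot(user_message)
    # Symbolic safety layer: deterministic rules the neural model cannot override.
    if any(re.search(p, candidate, flags=re.IGNORECASE) for p in BLOCKED_PATTERNS):
        return SAFE_FALLBACK
    return candidate

print(respond("I feel like a failure."))  # falls back to the safe response
```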