Bookmarks for July 9th through July 10th

by danhon

This is an auto-posted collection of public links I’ve either posted to, or favourites from Twitter, my Instapaper bookmarks and my public links posted to for July 9th, 16:01 through July 10th, 13:41:

  • To Really ‘Disrupt,’ Tech Needs to Listen to Actual Researchers | WIRED
  • [1906.10198] Multimodal and Multi-view Models for Emotion Recognition – Studies on emotion recognition (ER) show that combining lexical and acoustic information results in more robust and accurate models. The majority of the studies focus on settings where both modalities are available in training and evaluation. However, in practice, this is not always the case; getting ASR output may represent a bottleneck in a deployment pipeline due to computational complexity or privacy-related constraints. To address this challenge, we study the problem of efficiently combining acoustic and lexical modalities during training while still providing a deployable acoustic model that does not require lexical inputs. We first experiment with multimodal models and two attention mechanisms to assess the extent of the benefits that lexical information can provide. Then, we frame the task as a multi-view learning problem to induce semantic information from a multimodal model into our acoustic-only network using a contrastive loss function. Our multimodal model outperforms the previous state of the art on the USC-IEMOCAP dataset reported on lexical and acoustic information. Additionally, our multi-view-trained acoustic network significantly surpasses models that have been exclusively trained with acoustic features.
  • IMPROVING EMOTION CLASSIFICATION THROUGH VARIATIONAL INFERENCE OF LATENT VARIABLES – "Conventional models for emotion recognition from speech signal are trained in supervised fashion using speech utterances with emotion labels. In this study we hypothesize that speech signal depends on multiple latent variables including the emotional state, age, gender, and speech content. We propose an Adversarial Autoencoder (AAE) to perform variational inference over the latent variables and reconstruct the input feature representations. Reconstruction of feature representations is used as an auxiliary task to aid the primary emotion recognition task. Experiments on the IEMOCAP dataset demonstrate that the auxiliary learning tasks improve emotion classification accuracy compared to a baseline supervised classifier. Further, we demonstrate that the proposed learning approach can be used for the end-to-end speech emotion recognition, as it's applicable for models that operate on frame-level inputs."
  • Using Adversarial Training to Recognize Speakers’ Emotions : Alexa Blogs – A person’s tone of voice can tell you a lot about how they’re feeling. Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic. 
  • Amazon’s AI improves emotion detection in voices | VentureBeat
  • Are you really the ‘real’ you? | Life and style | The Guardian
  • JF Ptak Science Books: Deciding to Use the Atomic Bomb: The Chicago Metallurgical Lab Poll, July, 1945 – "In late June 1945, the Interim Committee (a secret, blue chip group established by Secretary of War Stimson with the approval of President Harry S. Truman to examine the problems that could result from the creation of the atomic bomb) decided what exactly to do with the weapons."