A language model computes a probability for a sequence of words: P(w1, w2, …, wT)
P(the cat is small) > P(small the is cat)
P(walking home after school) > P(walking house after school)
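In practice this joint probability is factored with the chain rule, so the model only ever has to estimate the probability of the next word given the words before it (the standard decomposition, written out here for reference):

P(w1, w2, …, wT) = P(w1) · P(w2 | w1) · … · P(wT | w1, …, wT−1)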
Given a list of word vectors: x1, x2, …, xt−1, xt, xt+1, …, xT
At a single time step, the model combines the previous hidden state with the current word vector and outputs a probability distribution over the next word.
Main idea: use the same set of W weights at all time steps.
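The per-time-step equations are not reproduced above, so here is a minimal NumPy sketch of the usual recurrence; the names W_hh, W_hx, W_s and all sizes are illustrative, not taken from the original:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, h_prev, W_hh, W_hx, W_s):
    # Combine the previous hidden state with the current word vector
    h_t = np.tanh(W_hh @ h_prev + W_hx @ x_t)
    # Distribution over the vocabulary for the next word
    y_t = softmax(W_s @ h_t)
    return h_t, y_t

# The same W_hh, W_hx, W_s are reused at every position t = 1 … T
d, H, V = 50, 100, 10000          # word-vector size, hidden size, vocabulary size (toy values)
rng = np.random.default_rng(0)
W_hh = 0.01 * rng.normal(size=(H, H))
W_hx = 0.01 * rng.normal(size=(H, d))
W_s  = 0.01 * rng.normal(size=(V, H))

h = np.zeros(H)
for x_t in rng.normal(size=(8, d)):   # a toy sequence of 8 word vectors
    h, y_hat = rnn_step(x_t, h, W_hh, W_hx, W_s)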
Same cross-entropy loss function as before, but now predicting words instead of classes.
Evaluation can simply be the negative of the average log probability over a dataset of T words (T = number of words):
J = −(1/T) Σ_{t=1}^{T} log2 P(wt | w1, …, wt−1)
But more common is perplexity: 2^J, so lower is better.
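As a concrete illustration (the probabilities below are made up, not from the text): given the model's probability for the correct word at each position of a held-out corpus, the loss and perplexity are computed as follows.

import numpy as np

# p[t] = probability the model assigned to the correct word at position t (toy values)
p = np.array([0.2, 0.05, 0.4, 0.1])

J = -np.mean(np.log2(p))   # negative average log2 probability per word
perplexity = 2 ** J        # lower is better; equals np.exp(-np.mean(np.log(p)))
print(J, perplexity)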
Goal: Classify each word as
direct subjective expressions (DSEs)
Explicit mentions of private states or speech events expressing private states
expressive subjective expressions (ESEs)
Expressions that indicate sentiment, emotion, etc. without explicitly conveying them.
Problem: For classification you want to incorporate information from both the words preceding and the words following the current word.
Each memory layer passes an intermediate sequential representation to the next.
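One common way to write the bidirectional recurrence (sketched here for reference; the notation is not from the text above) keeps a forward and a backward hidden state at every time step and classifies each word from their concatenation:

h_fw(t) = f(W_fw x_t + V_fw h_fw(t−1) + b_fw)    (left-to-right pass)
h_bw(t) = f(W_bw x_t + V_bw h_bw(t+1) + b_bw)    (right-to-left pass)
y_t = g(U [h_fw(t); h_bw(t)] + c)                (predict the label from both states)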
MPQA 1.2 corpus (Wiebe et al., 2005)
consists of 535 news articles (11,111 sentences)
manually labeled with DSEs and ESEs at the phrase level
Below is a bidirectional recurrent neural network (LSTM) implementation using the TensorFlow library.
from __future__ import print_function
import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np
# Import MNIST data
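The comment above suggests the original example continues by loading MNIST data as a stand-in sequence dataset; that part is not reproduced here. Below is a minimal sketch of how the bidirectional LSTM graph itself can be built with the contrib RNN API imported above, assuming TensorFlow 1.x; the hyperparameter names and values are illustrative placeholders, not taken from the original.

timesteps = 28      # sequence length (illustrative)
num_input = 28      # features per time step (illustrative)
num_hidden = 128    # LSTM hidden units (illustrative)
num_classes = 10    # output classes (illustrative)

# Input placeholder: (batch, time, features)
X = tf.placeholder(tf.float32, [None, timesteps, num_input])

# Output projection applied to the concatenated forward/backward hidden states
weights = {'out': tf.Variable(tf.random_normal([2 * num_hidden, num_classes]))}
biases = {'out': tf.Variable(tf.random_normal([num_classes]))}

def BiRNN(x, weights, biases):
    # static_bidirectional_rnn expects a list of `timesteps` tensors of shape (batch, num_input)
    x = tf.unstack(x, timesteps, 1)
    # Separate cells for the forward and backward passes
    lstm_fw_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)
    lstm_bw_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)
    # Each output is the concatenation [h_fw; h_bw] of size 2 * num_hidden
    outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
    # Project the last time step's concatenated state to class logits
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

logits = BiRNN(X, weights, biases)

For per-word labeling such as the DSE/ESE task, the same projection would be applied to every element of outputs rather than only to the last one.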