Some notes on understanding RNNs
0. LSTM
Inputs to an LSTM cell
$X=\{x_1,x_2,\dots,x_n\}$, X.shape=(counts, n, ...)
No. | Symbol | Description | shape | tf/keras |
---|---|---|---|---|
1. | $C_{i-1}$ | cell state of cell $i-1$ (memory cell) | initialized, shape=(counts,1,units) | |
2. | $H_{i-1}$ | output of cell $i-1$ (hidden state) | shape=(counts,1,units) | |
3. | $x_i$ | input to cell $i$ | $x_i$.shape=(counts,1,...) | data input |
Outputs of an LSTM cell
No. | Symbol | Description | shape | tf/keras |
---|---|---|---|---|
1. | $C_i$ | cell state of cell $i$ (memory cell) | shape=(counts,1,units) | |
2. | $H_i$ | output of cell $i$ (hidden state) | shape=(counts,1,units) | |
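A minimal sketch of these inputs and outputs (assuming TensorFlow 1.x with tf.keras and made-up sizes: 4 samples, 1 time step, 3 features, 2 units). The previous states $H_{i-1}$ and $C_{i-1}$ are passed via initial_state, and the new states are read back via return_state; note that Keras expects the state tensors as 2-D (counts, units) rather than (counts, 1, units):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM

units = 2
x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))        # x_i
h_prev = tf.placeholder(dtype=tf.float32, shape=(None, units))  # H_{i-1}
c_prev = tf.placeholder(dtype=tf.float32, shape=(None, units))  # C_{i-1}

# Feed the previous hidden/cell state in, read the new hidden/cell state back.
h_t, h_state, c_state = LSTM(units, return_state=True)(x, initial_state=[h_prev, c_prev])

sess = tf.Session()
sess.run(tf.global_variables_initializer())
outs = sess.run([h_t, h_state, c_state],
                feed_dict={x: np.ones((4, 1, 3), dtype=np.float32),
                           h_prev: np.zeros((4, units), dtype=np.float32),
                           c_prev: np.zeros((4, units), dtype=np.float32)})
print([o.shape for o in outs])   # [(4, 2), (4, 2), (4, 2)]
```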
About the gates
The forget gate decides which information to discard and which to keep.
The input gate (also called the update gate) decides how to update the cell state.
The output gate determines the value of the next hidden state.
Gate | Range |
---|---|
$\Gamma_f^{\langle t \rangle}$ | (0,1) |
$\Gamma_i^{\langle t \rangle}$ | (0,1) |
$\Gamma_o^{\langle t \rangle}$ | (0,1) |
- Forget gate
$$\Gamma_f^{\langle t \rangle} = \sigma(W_f[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)\tag{1} $$
$b_f$: the forget-gate bias; in TensorFlow it appears as tf.nn.rnn_cell.BasicLSTMCell(num_units, forget_bias=1.0)
$h^{\langle t-1 \rangle}$: the hidden state of the previous cell
$W_f$: the weights; in Keras the recurrent part of this matrix is initialized by recurrent_initializer (and the input part by kernel_initializer)
$x^{\langle t \rangle}$: the current input
With num_units=128, the dimensions in equation (1) are:
Parameter | Shape |
---|---|
$h^{\langle t-1 \rangle}$ | shape=(128,1) |
$x^{\langle t \rangle}$ | shape=(28,1) |
$[h^{\langle t-1 \rangle}, x^{\langle t \rangle}]$ | shape=(128,1)+(28,1)=(156,1) |
$W_f$ | shape=(128,156) |
$b_f$ | shape=(128,1) |
Because of the sigmoid activation, $0 \le \Gamma_f^{\langle t \rangle} \le 1$.
If $\Gamma_f^{\langle t \rangle} = 0$, the current LSTM cell discards the previous cell state $c^{\langle t-1 \rangle}$.
If $\Gamma_f^{\langle t \rangle} = 1$, the current LSTM cell keeps the previous cell state $c^{\langle t-1 \rangle}$. A numpy shape check of equation (1) follows below.
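A minimal numpy sketch of equation (1) with the shapes from the table above (num_units=128, input dimension 28); the weights and input here are random placeholders, not trained values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

num_units, input_dim = 128, 28
h_prev = np.zeros((num_units, 1))                 # h^{t-1}, shape (128, 1)
x_t = np.random.randn(input_dim, 1)               # x^{t},   shape (28, 1)
concat = np.vstack([h_prev, x_t])                 # [h^{t-1}, x^{t}], shape (156, 1)

W_f = np.random.randn(num_units, num_units + input_dim)  # shape (128, 156)
b_f = np.zeros((num_units, 1))                            # shape (128, 1)

gamma_f = sigmoid(W_f @ concat + b_f)             # forget gate, shape (128, 1)
print(gamma_f.shape)                              # (128, 1)
print(gamma_f.min() > 0 and gamma_f.max() < 1)    # True: values stay in (0, 1)
```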
- Update gate / Input gate
$$\Gamma_u^{\langle t \rangle} = \sigma(W_u[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)\tag{2} $$
Similar to the forget gate, here $\Gamma_u^{\langle t \rangle}$ is again a vector of values between 0 and 1. This will be multiplied element-wise with $\tilde{c}^{\langle t \rangle}$, in order to compute $c^{\langle t \rangle}$.
- Updating the cell state (using the update and forget gates)
To update the cell state, we first create a new vector of candidate values that can be combined with the previous cell state. The equation we use is:
$$ \tilde{c}^{\langle t \rangle} = \tanh(W_c[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)\tag{3} $$
Finally, the new cell state is:
$$ c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} \tilde{c}^{\langle t \rangle} \tag{4} $$
The tanh keeps the candidate values $\tilde{c}^{\langle t \rangle}$ in the range $(-1, 1)$.
- Output gate
To decide which outputs we will use, we will use the following two formulas:
$$ \Gamma_o^{\langle t \rangle}= \sigma(W_o[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)\tag{5}$$
$$ h^{\langle t \rangle} = \Gamma_o^{\langle t \rangle}* \tanh(c^{\langle t \rangle})\tag{6} $$
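Putting equations (1)-(6) together, here is a minimal single-step numpy sketch of one LSTM cell; the weights are random and untrained, and units/input_dim are made-up illustrative sizes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell step following equations (1)-(6)."""
    W_f, b_f, W_u, b_u, W_c, b_c, W_o, b_o = params
    concat = np.vstack([h_prev, x_t])              # [h^{t-1}, x^{t}]
    gamma_f = sigmoid(W_f @ concat + b_f)          # (1) forget gate
    gamma_u = sigmoid(W_u @ concat + b_u)          # (2) update/input gate
    c_tilde = np.tanh(W_c @ concat + b_c)          # (3) candidate cell state
    c_t = gamma_f * c_prev + gamma_u * c_tilde     # (4) new cell state
    gamma_o = sigmoid(W_o @ concat + b_o)          # (5) output gate
    h_t = gamma_o * np.tanh(c_t)                   # (6) new hidden state
    return h_t, c_t

units, input_dim = 4, 3
rng = np.random.default_rng(0)
params = []
for _ in range(4):                                 # W/b for forget, update, candidate, output
    params += [rng.standard_normal((units, units + input_dim)), np.zeros((units, 1))]

h_t, c_t = lstm_step(rng.standard_normal((input_dim, 1)),
                     np.zeros((units, 1)), np.zeros((units, 1)), params)
print(h_t.shape, c_t.shape)   # (4, 1) (4, 1)
```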
LSTM variants
1. LSTMCell examples
Parsing the Keras LSTM parameters
```
tf.keras.layers.LSTM

tf.layers.Layer
 |_ tf.keras.layers.Layer
     |_ tf.keras.layers.RNN
         |_ tf.keras.layers.LSTM

__init__(
    units,
    activation='tanh',
    recurrent_activation='hard_sigmoid',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros',
    unit_forget_bias=True,
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    implementation=1,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)

__call__(
    inputs,
    initial_state=None,  # a tensor or list of tensors representing the initial state of the RNN layer
    constants=None,
    **kwargs
)
```
- return_sequences=False; return_state=False
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM

x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell_num = 2
a = LSTM(cell_num)(x)
print("cell_num is {}".format(cell_num))

sess = tf.Session()
x_instance = np.array([[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]])
print("x input counts/shape[0] is {}".format(x_instance.shape[0]))
sess.run(tf.global_variables_initializer())
h_t = sess.run(a, feed_dict={x: x_instance})
print("h_t shape is {}".format(h_t.shape))

# cell_num is 2
# x input counts/shape[0] is 4
# h_t shape is (4, 2)
```
h_t is $h^{\langle t \rangle}$, the hidden state of the last time step; its shape is (counts, units).
- return_sequences=True; return_state=False
```python
x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell_num = 2
a = LSTM(cell_num, return_sequences=True, return_state=False)(x)
print("cell_num is {}".format(cell_num))

sess = tf.Session()
x_instance = np.array([[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]])
print("x input counts/shape[0] is {},x input timestamp/shape[1] is {},shape is {}".format(
    x_instance.shape[0], x_instance.shape[1], x_instance.shape))
sess.run(tf.global_variables_initializer())
H_all = sess.run(a, feed_dict={x: x_instance})
print("H_all shape is {}".format(H_all.shape))

# cell_num is 2
# x input counts/shape[0] is 4,x input timestamp/shape[1] is 1,shape is (4, 1, 3)
# H_all shape is (4, 1, 2)
```
H_all is the full sequence of hidden states $[h^{\langle 1 \rangle}, h^{\langle 2 \rangle}, \dots, h^{\langle t \rangle}]$ stacked along the time axis, with shape (counts, time_steps, units).
- return_sequences=False; return_state=True
```python
x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell_num = 2
a = LSTM(cell_num, return_sequences=False, return_state=True)(x)
print("cell_num is {}".format(cell_num))

sess = tf.Session()
x_instance = np.array([[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]])
print("x input counts/shape[0] is {},x input timestamp/shape[1] is {},shape is {}".format(
    x_instance.shape[0], x_instance.shape[1], x_instance.shape))
sess.run(tf.global_variables_initializer())
H_all, h_t, c_t = sess.run(a, feed_dict={x: x_instance})
```
In this case H_all == h_t: with return_sequences=False, the first returned tensor is simply the last hidden state.
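A quick check, continuing the snippet above:

```python
print(H_all.shape, h_t.shape, c_t.shape)   # (4, 2) (4, 2) (4, 2)
print(np.allclose(H_all, h_t))             # True
```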
- return_sequences=True; return_state=True
```python
x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell_num = 2
a = LSTM(cell_num, return_sequences=True, return_state=True)(x)
print("cell_num is {}".format(cell_num))

sess = tf.Session()
x_instance = np.array([[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]])
print("x input counts/shape[0] is {},x input timestamp/shape[1] is {},shape is {}".format(
    x_instance.shape[0], x_instance.shape[1], x_instance.shape))
sess.run(tf.global_variables_initializer())
H_all, h_t, c_t = sess.run(a, feed_dict={x: x_instance})
```
H_all is again the full sequence of hidden states $[h^{\langle 1 \rangle}, h^{\langle 2 \rangle}, \dots, h^{\langle t \rangle}]$, while h_t and c_t are the hidden state and cell state of the last time step.
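Continuing the snippet above, h_t is simply the last time-step slice of H_all:

```python
print(H_all.shape, h_t.shape, c_t.shape)       # (4, 1, 2) (4, 2) (4, 2)
print(np.allclose(H_all[:, -1, :], h_t))       # True
```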
A quick survey shows that the commonly used LSTM layers are keras.layers.LSTM, tf.contrib.rnn.LSTMCell, and tf.nn.rnn_cell.LSTMCell; the latter two share the same implementation.
```python
LSTMCell = tf.contrib.rnn.BasicLSTMCell(num_units)
# equivalent: tf.nn.rnn_cell.BasicLSTMCell(num_units)

tf.keras.layers.RNN(
    cell,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)  # recommended
```
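A minimal sketch of this cell-plus-wrapper pattern (assuming TF 1.x; tf.keras.layers.LSTMCell is used here because it plugs directly into tf.keras.layers.RNN, and the tf.nn.rnn_cell / tf.contrib.rnn cells follow the same pattern):

```python
import numpy as np
import tensorflow as tf

num_units = 2
x = tf.placeholder(dtype=tf.float32, shape=(None, 1, 3))
cell = tf.keras.layers.LSTMCell(num_units)                  # the per-time-step cell
out = tf.keras.layers.RNN(cell, return_sequences=False)(x)  # the wrapper unrolls it over time

sess = tf.Session()
sess.run(tf.global_variables_initializer())
h_t = sess.run(out, feed_dict={x: np.ones((4, 1, 3), dtype=np.float32)})
print(h_t.shape)   # (4, 2)
```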
2. Backpropagation through time (BPTT)
For the memory cell M:
```python
# h_all_1 shape = [counts, time_steps, cell_num_1]
# h_all_2 = LSTM(32, return_sequences=True, return_state=False, dropout=0,
#                recurrent_initializer='orthogonal',
#                kernel_initializer='orthogonal')(h_all_1)
# h_all_2 shape = [counts, time_steps, cell_num_2]
# h_all_3 = LSTM(8, return_sequences=True, return_state=False, dropout=0,
#                recurrent_initializer='orthogonal',
#                kernel_initializer='orthogonal')(h_all_2)
# h_all_3 shape = [counts, time_steps, cell_num_3]
# lay3 = GlobalMaxPool1D()(h_all_3)
# lay3 shape = [counts, cell_num_3]
# lay4 = tf.layers.dense(lay3, units=20, activation=tf.nn.relu)
# output = activation(dot(input, kernel) + bias)
# lay4 shape = [counts, units_nums]
# print("lay4 shape is {}".format(lay4.shape))
```
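For reference, a runnable sketch of the pipeline outlined in the comments above, using the Keras functional API; the input shape (10 time steps, 16 features) is made up, tf.keras.layers.Dense stands in for tf.layers.dense, and both LSTM layers return sequences so that GlobalMaxPool1D can pool over time:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, GlobalMaxPool1D, Dense
from tensorflow.keras.models import Model

time_steps, features = 10, 16                       # illustrative input shape
inputs = Input(shape=(time_steps, features))        # h_all_1: [counts, time_steps, features]

h_all_2 = LSTM(32, return_sequences=True,
               recurrent_initializer='orthogonal',
               kernel_initializer='orthogonal')(inputs)    # [counts, time_steps, 32]
h_all_3 = LSTM(8, return_sequences=True,
               recurrent_initializer='orthogonal',
               kernel_initializer='orthogonal')(h_all_2)   # [counts, time_steps, 8]
lay3 = GlobalMaxPool1D()(h_all_3)                          # [counts, 8]
lay4 = Dense(20, activation='relu')(lay3)                  # output = activation(dot(input, kernel) + bias)

model = Model(inputs, lay4)
model.summary()
```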