Deep Learning: Recurrent Neural Networks (RNN)

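Author: 明天依旧可好
QQ discussion group: 807041986

Note: if you have a deep learning question that this article does not cover, leave a comment below and I will add it to the article.

Original link: https://mtyjkh.blog.csdn.net/article/details/111088248
Deep learning series: A Simple Introduction to Deep Learning (TensorFlow 2)
Code and data: reply "深度学习" to the WeChat official account 明天依旧可好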

Importing the data

import pandas as pd
import tensorflow as tf

# Load only the two columns used below: the sentiment label and the tweet text.
df = pd.read_csv("Tweets.csv", usecols=["airline_sentiment", "text"])
df

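# pd.Categorical finds the distinct values (the categories) in a list-like column
# and returns a Categorical object that attaches this category information to the data.
# The category labels are available via .categories and the matching integer codes via .codes.
df.airline_sentiment = pd.Categorical(df.airline_sentiment).codes
df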
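To make the label encoding concrete, here is a minimal pure-Python sketch of what `.codes` and `.categories` give for string labels. It assumes the column holds the three labels negative, neutral, and positive; those label values and the helper name `categorical_codes` are assumptions for illustration, and missing values are not handled.

```python
def categorical_codes(values):
    """Rough stand-in for pd.Categorical(values).codes and .categories."""
    categories = sorted(set(values))                      # distinct labels, sorted
    index = {label: i for i, label in enumerate(categories)}
    codes = [index[v] for v in values]                    # label -> integer code
    return codes, categories

labels = ["neutral", "positive", "negative", "negative", "positive"]
codes, categories = categorical_codes(labels)
print(categories)  # ['negative', 'neutral', 'positive']
print(codes)       # [1, 2, 0, 0, 2]
```

Under that assumption the encoded column contains three distinct codes (0, 1, 2), not two.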

Building the vocabulary

import tensorflow_datasets as tfds

# Note: newer tensorflow_datasets releases expose this API under tfds.deprecated.text.
tokenizer = tfds.features.text.Tokenizer()

vocabulary_set = set()
for text in df["text"]:
    some_tokens = tokenizer.tokenize(text)
    vocabulary_set.update(some_tokens)

vocab_size = len(vocabulary_set)
vocab_size
''' Output: 18027 '''

Encoding a sample (test)

encoder = tfds.features.text.TokenTextEncoder(vocabulary_set)
# `text` still holds the last tweet from the vocabulary loop above.
encoded_example = encoder.encode(text)
print(encoded_example)
'''
text is:
'@AmericanAir we have 8 ppl so we need 2 know how many seats are on the next flight. Plz put us on standby for 4 people on the next flight?'
Output:
[12939, 13052, 13579, 11267, 14825, 8674, 13052, 12213, 12082, 12156, 5329, 5401, 10099, 3100, 7974, 7804, 5671, 2947, 9873, 7864, 9704, 7974, 3564, 11759, 15266, 11250, 7974, 7804, 5671, 2947]
'''
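The two steps above (collecting a vocabulary, then mapping each token to an integer) can be sketched without tensorflow_datasets. This is an illustrative stand-in and not the library's implementation: the regex tokenizer, the helpers `build_vocab` and `encode`, and the sample sentences are all assumptions. It leaves id 0 free for padding and starts real tokens at 1.

```python
import re

def tokenize(text):
    # Keep runs of letters and digits; punctuation and whitespace are dropped.
    return re.findall(r"[A-Za-z0-9]+", text)

def build_vocab(texts):
    vocab = set()
    for t in texts:
        vocab.update(tokenize(t))
    # Sort for a reproducible id assignment; id 0 is left for padding.
    return {tok: i for i, tok in enumerate(sorted(vocab), start=1)}

def encode(text, token_to_id):
    oov_id = len(token_to_id) + 1      # one shared id for unseen tokens
    return [token_to_id.get(tok, oov_id) for tok in tokenize(text)]

texts = ["we need 2 seats", "next flight on standby"]
token_to_id = build_vocab(texts)
print(len(token_to_id))                              # 8
print(encode("we need seats on Mars", token_to_id))  # [8, 3, 6, 5, 9]
```

In this sketch the ids depend on the sort order, so they will not match the ids printed by the encoder above.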

Encoding the text as numbers

df["encoded_text"] = [encoder.encode(text) for text in df["text"]]
df

train_x = df["encoded_text"][:10000]
train_y = df["airline_sentiment"][:10000]
test_x = df["encoded_text"][10000:]
test_y = df["airline_sentiment"][10000:]

from tensorflow import keras

# Pad (or truncate) every sequence to exactly 50 token ids.
train_x = keras.preprocessing.sequence.pad_sequences(train_x, maxlen=50)
test_x = keras.preprocessing.sequence.pad_sequences(test_x, maxlen=50)

train_x.shape, train_y.shape, test_x.shape, test_y.shape
''' Output: ((10000, 50), (10000,), (4640, 50), (4640,)) '''
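With its default arguments, `pad_sequences` pads with zeros at the front and, for sequences longer than `maxlen`, keeps the last `maxlen` items. A pure-Python sketch of that default behaviour follows; the helper name `pad_pre` is hypothetical and the tiny input is made up for illustration.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad with `value` and keep the last `maxlen` ids of long sequences."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                          # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)  # pad at the front
    return padded

result = pad_pre([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(result)  # [[0, 0, 5, 6], [3, 4, 5, 6]]
```

Every row ends up with the same length, which is what lets the encoded tweets be stacked into a single (num_samples, 50) array.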

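Building the model

model = tf.keras.Sequential([
    # vocab_size + 1 rows: ids 1..vocab_size for tokens, plus id 0 for padding.
    tf.keras.layers.Embedding(vocab_size + 1, 64),
    # Two 64-unit LSTMs (forward and backward), concatenated into 128 features.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    # A single output logit (no activation).
    tf.keras.layers.Dense(1)
])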

model.summary()
'''
Output:
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_2 (Embedding)      (None, None, 64)          1153792
_________________________________________________________________
bidirectional_2 (Bidirection (None, 128)               66048
_________________________________________________________________
dense_4 (Dense)              (None, 64)                8256
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 65
=================================================================
Total params: 1,228,161
Trainable params: 1,228,161
Non-trainable params: 0
_________________________________________________________________
'''
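The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers; the variable names are chosen here for illustration.

```python
vocab_size = 18027   # from the vocabulary step
embed_dim = 64
lstm_units = 64

# One 64-dimensional vector per id, including id 0.
embedding = (vocab_size + 1) * embed_dim
# One LSTM direction: 4 gates, each with input weights, recurrent weights, and a bias.
lstm_one_direction = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)
bidirectional = 2 * lstm_one_direction
# Dense layers: inputs * outputs weights plus one bias per output.
dense_hidden = (2 * lstm_units) * 64 + 64
dense_out = 64 * 1 + 1

total = embedding + bidirectional + dense_hidden + dense_out
print(embedding, bidirectional, dense_hidden, dense_out, total)
# 1153792 66048 8256 65 1228161
```

Almost all of the parameters sit in the embedding table, which grows linearly with the vocabulary size.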

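Compiling the model

# Note: the labels are integer class codes, while this loss expects 0/1 targets
# for the model's single logit. This is the configuration that produced the log below.
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])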

Training the model

history = model.fit(train_x,
                    train_y,
                    epochs=20,
                    batch_size=200,
                    validation_data=(test_x, test_y),
                    verbose=1)
'''
Output:
Epoch 1/20
50/50 [==============================] - 6s 117ms/step - loss: -4.8196 - accuracy: 0.6652 - val_loss: -0.7605 - val_accuracy: 0.7071
......
Epoch 19/20
50/50 [==============================] - 6s 123ms/step - loss: -37.5176 - accuracy: 0.7586 - val_loss: -9.0619 - val_accuracy: 0.7272
Epoch 20/20
50/50 [==============================] - 6s 120ms/step - loss: -40.0017 - accuracy: 0.7611 - val_loss: -7.7479 - val_accuracy: 0.7248
'''
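The negative loss values in the log are worth a closer look, because cross-entropy on valid 0/1 targets cannot go below zero. One likely explanation, assuming the sentiment column encodes three classes as 0, 1, and 2, is that a label of 2 is being fed to a binary loss. The pure-Python sketch below illustrates the arithmetic; the function names and the example logits are made up for illustration.

```python
import math

def bce_from_logit(label, logit):
    """Binary cross-entropy on a raw logit (numerically stable form)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def sparse_cce_from_logits(label, logits):
    """Softmax cross-entropy for an integer class label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# A label of 2 with a large positive logit makes the "binary" loss negative,
# and it keeps falling as the logit grows.
binary_loss = bce_from_logit(label=2, logit=10.0)
print(binary_loss)

# A three-class loss stays non-negative for any label in {0, 1, 2}.
multiclass_loss = sparse_cce_from_logits(label=2, logits=[0.0, 0.0, 10.0])
print(multiclass_loss)
```

In Keras terms, a three-class variant would correspond to a final `Dense(3)` layer trained with `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`. That change is a suggestion, not the configuration that produced the log above, so its accuracy figures would need to be measured separately.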

Article URL: https://blog.csdn.net/qq_38251616/article/details/111088248
