Tensorflow笔记(week&nbsp;0)

Tensorflow笔记(week 0)

Getting start

Tensorflow 是基于 graph 的并行计算模型。关于 graph 的理解可以参考官方文档。
更多的简介可以参考一篇我觉得写得 okay 的博客。

Basic Calculation

首先，我们尝试去编写一段代码，运用 tensorflow 去完成简单的四则运算。

const = tf.constant(2.0, name='const2')

b = tf.placeholder(dtype=tf.float32, name='b')
c = tf.Variable(1.0, dtype=tf.float32, name='c')

d = tf.add(b, c, name='d')
e = tf.subtract(c, const, name='e')

a = tf.multiply(d, e, name='a')

init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)
    res = sess.run(a, feed_dict={b: np.arange(0, 10, 2)})
    print("The result is: {}".format(res))

在此处，我们执行的运算为 a = (b+c) * (c-const)，输出结果如下:

The result is: [-1. -3. -5. -7. -9.]
实际上，使用 Tensorflow 来完成这种简单的运算的效率是非常低的，但是作为一个简单的入门例子，这段代码还是有一点价值的。在这段代码中，我们先使用了 tf.constant() 定义了一个常量 const，用 tf.placeholder() 定义了一个占位值，用 tf.Variable() 定义了一个变量。其中，占位值的意义类似于参数，可以运行我们在运行模型的时候传入数值。之后的 a、d、e 是普通的四则运算，Tensorflow 其实有重载这些简单的运算符，但是网上的推荐做法是依然调用函数。接下来，我们需要注意的是 tf.global_variables_initializer()，这行代码的官方解释是 – 初始化模型的参数，返回一个用来初始化 graph 中所有 global variable 的 op，这种描述其实比较迷幻，但是我们不需要太过纠结这个函数的意义，它更类似于一种机制。最后，我们通过调用 tf.Session() 来运行模型，我们可以通过 feed_dict 来传入参数，在这里，我用了一个循环来测试多组数据。

Matrix Calculation

我们使用 python，多数遇到的情况都是矩阵运算，在这里，我们也稍微测试一下 Tensorflow 的矩阵运算。

a = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=tf.float32)
b = tf.Variable([[49, -231, 0.43], [12, 5.12, -91.1], [321, 95, 6.4]], dtype=tf.float32)

c = tf.matmul(a, b)

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    res = sess.run(c)
    print(res)

整体来说，这也是一段很简单的代码，唯一需要注意的是，计算矩阵乘法需要用到 tf.matmul()。

Simple Neutral Network

这篇笔记的重点在于处理一个简单的图像分类问题，我们使用的数据集是比较有名的 MNIST，有关这个数据集的详细内容可以到它的官网上查看。实际上，Tensorflow 有内嵌了这个数据集，只需要引用 tensorflow.examples.tutorials.mnist，Tensorflow 就可以帮我们下载这个数据集，并提供了一系列实用的方法，具体内容可以在网上搜索。在这里，我参考了一篇博客，自己将数据集解析成 numpy 矩阵，至于原理的话，在参考的博客上有比较详细的说明。之后，我们就结合代码来分析一下如何使用 Tensorflow 来训练一个简单的模型。

import os
import struct
import numpy as np

import matplotlib.pyplot as plt

import tensorflow as tf


def load_mnist(path, kind='train'):
    """Load MNIST data from 'path'"""
    label_path = os.path.join(path, '%s-labels.idx1-ubyte' % kind)
    image_path = os.path.join(path, '%s-images.idx3-ubyte' % kind)

    with open(label_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = np.fromfile(lbpath, dtype=np.uint8)

    with open(image_path, 'rb') as imgpath:
        magic, n, rows, cols = struct.unpack('>IIII', imgpath.read(16))
        images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), 784)

    return images, labels


def display(images, labels, n=1):
    """Display n group(s) of images"""
    for i in range(n):
        fig, ax = plt.subplots(nrows=2, ncols=5, sharex='all', sharey='all')
        ax = ax.flatten()
        for j in range(10):
            img = images[labels == j][i].reshape(28, 28)
            ax[j].imshow(img, cmap='gray', interpolation='nearest')
        ax[0].set_xticks([])
        ax[0].set_yticks([])
        plt.tight_layout()
        plt.show()


# Main Function
x_train, y_train = load_mnist('/Users/Eddie/Desktop')
x_test, y_test = load_mnist('/Users/Eddie/Desktop', 't10k')
# display(x_train, y_train)

# Parameters
alpha = 1.0
epochs = 25
batch_size = 100

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Weight, Bias
w1 = tf.Variable(tf.random_normal([784, 392], stddev=0.03), name='w1')
b1 = tf.Variable(tf.random_normal([392]), name='b1')
w2 = tf.Variable(tf.random_normal([392, 10], stddev=0.03), name='w2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')

# Hidden Layer
hidden_out = tf.nn.sigmoid(tf.add(tf.matmul(x, w1), b1))

# Output
out = tf.nn.softmax(tf.add(tf.matmul(hidden_out, w2), b2))

# Loss
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y*tf.log(out) + (1-y)*tf.log(1-out), axis=1))
optimizer = tf.train.AdadeltaOptimizer(learning_rate=alpha).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(out, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Initialization
init = tf.global_variables_initializer()

# One Hot
y_train_one_hot = tf.one_hot(y_train, 10, 1, 0)
y_test_one_hot = tf.one_hot(y_test, 10, 1, 0)

# Session
with tf.Session() as sess:
    sess.run(init)
    y_train = sess.run(y_train_one_hot)
    y_test = sess.run(y_test_one_hot)
    total_batch = len(y_train) // batch_size
    for epoch in range(epochs):
        cost = 0.0
        for i in range(total_batch):
            batch_x = x_train[i*batch_size:(i+1)*batch_size]
            batch_y = y_train[i*batch_size:(i+1)*batch_size]
            res, c = sess.run([optimizer, cross_entropy], feed_dict={x: batch_x, y: batch_y})
            cost += c / total_batch
        print("Epoch:", (epoch+1), "cost =", "{:.3f}".format(cost))
    print(sess.run(accuracy, feed_dict={x: x_test, y: y_test}))

首先，我们自己定义 load_mnist() 函数，其第一个参数为文件路径，第二个参数为文件名，我们默认读入的文件名为 train。而数据的读入需要考虑到数据的实际存放方式，更详细的内容可以参考之前提到的博客。值得注意的是，函数返回两个参数，第一个为图像矩阵，第二个为对于的类标，而类标的格式并不是比较常规的 one hot，而是一个属于 [0, 9] 的十进制数。
display() 函数用于显示我们导入好的数据，前两个参数显而易见，最后一个参数代表显示图片的组数，同一种字迹的 10 个数字为一组。在显示图片的时候，我们需要用到 matplotlib 库，这是一个模仿 matlab 处理图片方式的 py 库。
对于训练模型的部分，我们首先需要定义神经网络的超函数，这里包括: 学习率(alpha)、迭代次数(epochs)、数据批容量(batch_size)。我们需要传入的参数为 [None, 784] 维的特征与 [None, 10] 维的预测类标。但是，我们的类标并不是以 one hot 格式储存，因此，我们需要调用 tf.one_hot() 对类别进行预处理。其中，tf.one_hot() 的第二个参数表示类别的总数。接着，我们通过 tf.random_normal() 来生成神经网络 weight 与 bias 的初始值，在这里，我们只使用了一个单隐层、隐层包含 392 个神经元的模型。对于激活函数，我们使用了比较常规的 sigmoid 方程，原因是 sigmoid 的输出值属于 [0, 1]，比较方便套用逻辑回归的代价函数。同时，为了使模型更容易收敛，我们使用 AdadeltaOptimizer 来使学习率能够在训练过程中自适应。
最后，留意代码的 Session 部分，在实现分批的时候其实使用了一个比较暴力的方法，实际上，应该还有提升效率的空间。由于中秋还有一些其他事情要处理，就不打算再做一个总结了，具体的训练结果如下:

Epoch: 1 cost = 0.869
Epoch: 2 cost = 0.437
Epoch: 3 cost = 0.362
Epoch: 4 cost = 0.315
Epoch: 5 cost = 0.282
Epoch: 6 cost = 0.257
Epoch: 7 cost = 0.235
Epoch: 8 cost = 0.215
Epoch: 9 cost = 0.200
Epoch: 10 cost = 0.186
Epoch: 11 cost = 0.171
Epoch: 12 cost = 0.161
Epoch: 13 cost = 0.148
Epoch: 14 cost = 0.139
Epoch: 15 cost = 0.131
Epoch: 16 cost = 0.122
Epoch: 17 cost = 0.115
Epoch: 18 cost = 0.108
Epoch: 19 cost = 0.100
Epoch: 20 cost = 0.094
Epoch: 21 cost = 0.089
Epoch: 22 cost = 0.083
Epoch: 23 cost = 0.079
Epoch: 24 cost = 0.074
Epoch: 25 cost = 0.070
0.9705