Whispers in the Wind of Life

Hierarchical Classification (Continued)

2025-11-12 14:29:30

1. Recap of the Last Post

Picking up from the confusion left at the end of the last post on multi-level classification, this update shares some real progress! In the previous post, I set out to build a hierarchically structured, multi-level classification model, but what I actually ended up building was essentially a flat classifier. Although the final classification results were decent, I'm well aware that they relied heavily on the pretrained VGG16 model and on the relatively small number of classes involved. An implementation like that can hardly meet real engineering needs: in practice the number of classes to distinguish often runs into the thousands, and you can imagine how such a simple classification scheme would fare on millions of samples. So the natural question is how to implement the hierarchical model I originally sketched; by classifying level by level, I believe a model can better cope with more classes and more data. That was exactly my earlier confusion: how do you actually implement such a hierarchical model?

2. B-CNN (Branch Convolutional Neural Network)

Thanks to one of my teachers at school for the pointer! This time I came across something great, B-CNN (Branch CNN), which cleared up most of my confusion, so let me share it here!

Zhu X., Bain M. B-CNN: Branch Convolutional Neural Network for Hierarchical Classification. arXiv:1709.09890, 2017.

2.1 B-CNN Structure

Let me first introduce the key components of B-CNN through a few key excerpts from the paper. First, the structure diagram (Figure 1 in the paper)!

A possible way to embed a hierarchy of classes into a CNN model is to output multiple predictions along the CNN layers as the data flow through, from coarse to fine. In this case, lower layers output coarser predictions while higher layers output finer predictions.

We name it Branch Convolutional Neural Network (B-CNN) as it contains several branch networks along the main convolution workflow to do predictions hierarchically.

In other words, the paper argues that to implement a hierarchical classification structure that goes from coarse to fine classes, the model's lower layers should output the coarse (superclass) predictions while its higher layers output the fine (subclass) predictions. This is also why the model is called B-CNN: it contains multiple output branches (as in the figure above) that realize the hierarchical classification.

A B-CNN model uses existent CNN components as building blocks to construct a network with internal output branches. The network shown at the bottom in Figure 1a is a traditional convolutional neural network. It can be an arbitrary ConvNet with multiple layers. The middle part in Figure 1a shows the output branch networks of a B-CNN. Each branch net produces a prediction on the corresponding level in the label tree (Figure 1b, shown in same color). On the top of each branch, fully connected layers and a softmax layer are used to produce the output in one-hot representation. Branch nets can consist of ConvNets and fully connected neural networks. But for simplicity, in our experiments, we only use fully connected neural networks as our branch nets.

To summarize briefly: in the B-CNN structure from the paper, the horizontal direction of the figure (from the input through to the final fine prediction) consists mostly of ConvNet blocks, while the branches that split off vertically use fully connected layers to produce their predictions, for simplicity of the experiments.

2.2 Loss Function & Loss Weights

When the image is fed into B-CNN, the network will output three corresponding predictions as the data flow through and each level's loss will contribute to the final loss function based on the loss weights distribution.

The paper also introduces the concept of a label tree: each level has its own number of classes, so from Coarse 1 down to the final Fine level (Coarse 3) there are three different sets of one-hot labels. When an image is fed into the model, it outputs three prediction vectors, one per level; each level's loss is assigned a loss weight, and the three losses are combined according to their weights to form the model's final loss.

Here is the loss function defined in the paper, written out in my own notation: $K$ is the number of hierarchy levels, $A^{k}$ is the loss weight of level $k$, and the log term is the usual cross-entropy loss between the level-$k$ one-hot label $y^{k}$ and the prediction $\hat{y}^{k}$:

$$\mathcal{L} = -\sum_{k=1}^{K} A^{k} \sum_{i} y_i^{k} \log \hat{y}_i^{k}$$

A loss of this form accounts for every level's loss and for how much each level contributes to the final loss.
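To make the weighting concrete, here is a made-up numeric example (the numbers are purely illustrative): with $K = 2$ levels and weights $A = [0.9, 0.1]$, per-level cross-entropies of 1.2 (coarse) and 2.3 (fine) combine into a total loss of $0.9 \times 1.2 + 0.1 \times 2.3 = 1.31$, so at this stage the coarse level dominates the gradient.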

2.3 Branch Training Strategy

Here it comes, here it comes, it arrives bearing a strategy. What really made my eyes light up is the training strategy the paper proposes for this branched model. So what exactly does it look like? The paper suggests adjusting the levels' loss weights so that the model focuses on different levels at different training stages (i.e., epoch ranges). Simply put, for a two-level model, the initial loss weight distribution could be [0.9, 0.1], making the model focus on learning the first level; after, say, 30 epochs, the distribution could be switched to [0.2, 0.8], shifting the model's focus to learning the second level.

This procedure requires the classifier to extract lower features first with coarse instructions and fine tune parameters with fine instructions later. It to an extent prevents the vanishing gradient problem which would make the updates to parameters on lower layers very difficult when the network is very deep.

As the paper explains, such a focus-shifting branch training strategy is interpretable, and it can to some extent alleviate the vanishing gradient problem that deep networks often face. (Section 3 below implements exactly this schedule with a Keras callback.)

3. Implementation

OK, next let's implement B-CNN by combining my earlier code with the paper's code! Note that the original paper's code is here -> B-CNN. First, the necessary packages and some preset values:

import keras
import numpy as np
import os
import cv2
import tensorflow as tf
from keras.models import Model
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, Input
from keras.initializers import he_normal
from keras import optimizers
from keras.callbacks import LearningRateScheduler
from keras.layers.normalization import BatchNormalization
from keras.utils.data_utils import get_file
from keras import backend as K
import matplotlib
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split

Image_width = 150
Image_height = 150
Image_channels = 3
Image_size = (Image_width, Image_height)
Image_shape = (Image_width, Image_height, Image_channels)
batch_size = 15

# Loss weights for the two hierarchy levels
alpha = K.variable(value=0.8, dtype="float32", name="alpha")  # A1, coarse level
beta = K.variable(value=0.2, dtype="float32", name="beta")    # A2, fine level

# Adjust the learning rate according to the training stage
def scheduler(epoch):
    learning_rate_init = 0.0003
    if epoch > 20:
        learning_rate_init = 0.0005
    if epoch > 30:
        learning_rate_init = 0.0001
    return learning_rate_init

# Adjust the loss weights of the two levels as training progresses
class LossWeightsModifier(keras.callbacks.Callback):
    def __init__(self, alpha, beta):
        super(LossWeightsModifier, self).__init__()
        self.alpha = alpha
        self.beta = beta

    def on_epoch_end(self, epoch, logs=None):  # shift the training focus
        if epoch == 10:
            K.set_value(self.alpha, 0.8)  # keep focusing on the coarse level
            K.set_value(self.beta, 0.2)
        if epoch == 25:
            K.set_value(self.alpha, 0.1)  # shift focus to the fine level
            K.set_value(self.beta, 0.9)
        if epoch == 40:
            K.set_value(self.alpha, 0)    # train on the fine level only
            K.set_value(self.beta, 1)
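One detail worth pointing out: alpha and beta are Keras backend variables rather than plain floats precisely so the callback can change the compiled loss weights on the fly, without recompiling the model. A quick sanity check (a minimal sketch using the variables defined above):

# The weights are backend variables, so set_value updates them in place
# even after model.compile has captured them.
print(K.get_value(alpha), K.get_value(beta))  # 0.8 0.2 (initial values)
K.set_value(alpha, 0.1)
K.set_value(beta, 0.9)
print(K.get_value(alpha), K.get_value(beta))  # 0.1 0.9
K.set_value(alpha, 0.8)  # restore the initial distribution before training
K.set_value(beta, 0.2)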

The next step is the one mentioned last time: turn the labels of each level into the corresponding one-hot encodings and split the data into training and validation sets; see the previous post for details.

# Given the training-data path, return the data array and both label lists
def prepare(path):
    fileList = os.listdir(path)  # folder to process
    data = []
    labels1 = []
    labels2 = []
    for fileName in fileList:
        image = cv2.imread(os.path.join(path, fileName))
        if image is not None:
            std_image = tf.image.per_image_standardization(image)  # per-image normalization (needs TF2 eager mode for .numpy() below)
            image2 = cv2.resize(std_image.numpy(), Image_size)
            image2 = img_to_array(image2)
            data.append(image2)
            label1 = str(fileName).split("_")[0:1]  # coarse label from the file name
            label2 = str(fileName).split("_")[1:2]  # fine label from the file name
            labels1.append(label1)
            labels2.append(label2)
    data = np.array(data, dtype="float32")
    return data, labels1, labels2

path = r"./DATA/train"
data, labels1, labels2 = prepare(path)

mlb1 = MultiLabelBinarizer()  # generates the one-hot encodings
mlb2 = MultiLabelBinarizer()
labels1_onehot = mlb1.fit_transform(labels1).astype('float32')  # coarse one-hot labels
labels1_num = len(mlb1.classes_)
labels2_onehot = mlb2.fit_transform(labels2).astype('float32')  # fine one-hot labels
labels2_num = len(mlb2.classes_)

label = []
for i in range(labels2_onehot.shape[0]):
    lab = []
    lab.append(labels1_onehot[i])
    lab.append(labels2_onehot[i])
    label.append(lab)  # combined [coarse, fine] label per image

(trainX, testX, trainY, testY) = train_test_split(data, label, test_size=0.2, random_state=42)
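To make the file-name-based labelling concrete, here is a tiny sketch with made-up file names (the "coarse_fine_id.jpg" pattern matches how prepare() splits names, but these particular names are just for illustration):

# Hypothetical file names of the form "<coarse>_<fine>_<id>.jpg"
names = ["vehicle_car_001.jpg", "vehicle_ship_002.jpg", "animal_cat_003.jpg"]
coarse = [n.split("_")[0:1] for n in names]  # [['vehicle'], ['vehicle'], ['animal']]

mlb = MultiLabelBinarizer()
onehot = mlb.fit_transform(coarse)
print(mlb.classes_)  # ['animal' 'vehicle'] -- columns are sorted alphabetically
print(onehot)        # [[0 1]
                     #  [0 1]
                     #  [1 0]]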

Next, build a custom generator for data augmentation, using the same approach as last time:

train_datagen = ImageDataGenerator(  # data augmentation
    rotation_range=15,
    shear_range=0.1,
    rescale=1./255,
    zoom_range=0.2,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1
)

validate_datagen = ImageDataGenerator(
    rescale=1./255
)

# Custom generator that yields each batch of images with both levels of labels
def data_generator(generator, images, labels, batch_size):
    num_samples = len(images)
    input_generator = generator.flow(images, labels, batch_size=batch_size)
    while True:
        for offset in range(0, num_samples, batch_size):
            batch_samples, batch_labels = input_generator.next()
            X_train = []
            y1_train = []
            y2_train = []
            for i in range(len(batch_samples)):
                img = batch_samples[i]
                labels1 = batch_labels[i][0]  # coarse one-hot label
                labels2 = batch_labels[i][1]  # fine one-hot label
                X_train.append(img)
                y1_train.append(labels1)
                y2_train.append(labels2)
            X_train = np.array(X_train)
            y1_train = np.array(y1_train)
            y2_train = np.array(y2_train)
            yield X_train, [y1_train, y2_train]
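Before wiring this into training, it can help to pull a single batch and check the shapes (a minimal sketch; the shape comments assume the batch_size of 15 and the 150x150x3 input defined above):

gen = data_generator(train_datagen, trainX, trainY, batch_size)
X_batch, (y1_batch, y2_batch) = next(gen)
print(X_batch.shape)   # (15, 150, 150, 3)
print(y1_batch.shape)  # (15, labels1_num)
print(y2_batch.shape)  # (15, labels2_num)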

Next comes building the model itself. It contains one coarse branch (corresponding to my original vehicle-vs-animal superclasses) and one fine branch for the subclasses.

# ------------------ model ----------------------
img_input = Input(shape=Image_shape, name='input')

# --- block 1 ---
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

# --- coarse branch ---
c_1_bch = Flatten(name='c1_flatten')(x)
c_1_bch = Dense(256, activation='relu', name='c1_fc2')(c_1_bch)
c_1_bch = BatchNormalization()(c_1_bch)
c_1_bch = Dropout(0.5)(c_1_bch)
class_one_classify = Dense(labels1_num, activation='softmax', name='class_one')(c_1_bch)

# --- block 3 ---
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
x = BatchNormalization()(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

# --- fine block ---
x = Flatten(name='flatten')(x)
x = Dense(256, activation='relu', name='fc_cifar10_1')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
class_two_classify = Dense(labels2_num, activation='softmax', name='class_two')(x)

model = Model(inputs=img_input, outputs=[class_one_classify, class_two_classify], name='branch-CNN')

sgd = optimizers.SGD(lr=0.003, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              loss_weights=[alpha, beta],
              metrics=['accuracy'])
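Since the model has two outputs, predict returns one array per level. A quick shape check on a dummy batch (a sketch; the random input is purely illustrative):

dummy = np.random.rand(2, 150, 150, 3).astype("float32")
p_coarse, p_fine = model.predict(dummy)
print(p_coarse.shape)  # (2, labels1_num) -- coarse softmax per image
print(p_fine.shape)    # (2, labels2_num) -- fine softmax per image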

The next part is where I got to know Keras a bit better: using callbacks to control the training process:

change_lr = LearningRateScheduler(scheduler)  # controls the learning-rate schedule
change_lw = LossWeightsModifier(alpha, beta)  # controls the loss-weight schedule
cbks = [change_lr, change_lw]

total_validate = len(testY)    # number of validation samples
total_train = trainX.shape[0]  # number of training samples

history = model.fit_generator(
    data_generator(train_datagen, trainX, trainY, batch_size),
    epochs=60,
    validation_data=data_generator(validate_datagen, testX, testY, batch_size),
    validation_steps=total_validate//batch_size,
    steps_per_epoch=total_train//batch_size,
    callbacks=cbks,
    verbose=1,
)

Callbacks turn out to be really interesting: they let you dynamically adjust training parameters, stop training early, and do other fun things. I plan to study callback usage more carefully to deepen my command of Keras, hehe.
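For instance, the early stopping just mentioned is itself nothing more than another callback; a minimal sketch (not used in the run above; restore_best_weights assumes Keras >= 2.2):

from keras.callbacks import EarlyStopping

# Stop once val_loss hasn't improved for 5 epochs, keeping the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
# cbks = [change_lr, change_lw, early_stop]

And that wraps up the concrete implementation of hierarchical classification with B-CNN!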

4. Experimental Results

Limited by my hardware, I haven't yet found hyperparameters that give really good results. I'll keep tuning, though; I still believe this structure can deliver solid results, and I'll post them here once I've dialed in a decent set of parameters, hehe.

5. Next Goals

By and large, this whole self-driven exploration has now covered data collection, data preprocessing, model building, and model optimization; what remains is model deployment. I've actually been tinkering with that quite a bit at home these days, so the next post should document the deployment results!