How to Build a Face Recognition System with DL4J (Part 2)


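The next problem: the transfer learning API in DL4J can only append structure to the tail of a model, whereas in our scenario part of the pretrained model has to sit in the middle of the new network. What can we do? No rush. Let's look at the source of the transfer learning API and see how DL4J wraps it. The build method in org.deeplearning4j.nn.transferlearning.TransferLearning gives us a clue.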
    public ComputationGraph build() {
        initBuilderIfReq();
        ComputationGraphConfiguration newConfig = editedConfigBuilder
                .validateOutputLayerConfig(validateOutputLayerConfig == null ? true : validateOutputLayerConfig)
                .build();
        if (this.workspaceMode != null)
            newConfig.setTrainingWorkspaceMode(workspaceMode);
        ComputationGraph newGraph = new ComputationGraph(newConfig);
        newGraph.init();

        int[] topologicalOrder = newGraph.topologicalSortOrder();
        org.deeplearning4j.nn.graph.vertex.GraphVertex[] vertices = newGraph.getVertices();
        if (!editedVertices.isEmpty()) {
            //set params from orig graph as necessary to new graph
            for (int i = 0; i < topologicalOrder.length; i++) {
                if (!vertices[topologicalOrder[i]].hasLayer())
                    continue;
                org.deeplearning4j.nn.api.Layer layer = vertices[topologicalOrder[i]].getLayer();
                String layerName = vertices[topologicalOrder[i]].getVertexName();
                long range = layer.numParams();
                if (range <= 0)
                    continue; //some layers have no params
                if (editedVertices.contains(layerName))
                    continue; //keep the changed params
                INDArray origParams = origGraph.getLayer(layerName).params();
                layer.setParams(origParams.dup()); //copy over origGraph params
            }
        } else {
            newGraph.setParams(origGraph.params());
        }
        // ... rest of the method not shown in this excerpt

So it simply calls layer.setParams to set the parameters on each layer. That gives us a plan: build a model with the same layer structure as vgg16, then set vgg16's parameters into the new model. In essence, what remains useful after a deep learning model has been trained is its parameters; once we have them, we can use the model however we like. Without further ado, here is the code that builds our target model:
    private static ComputationGraph buildModel() {
        ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(123)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .activation(Activation.RELU)
                .graphBuilder()
                .addInputs("input1", "input2")
                .addVertex("stack", new StackVertex(), "input1", "input2")
                .layer("conv1_1", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nIn(3).nOut(64)
                        .build(), "stack")
                .layer("conv1_2", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(64)
                        .build(), "conv1_1")
                .layer("pool1", new SubsamplingLayer.Builder()
                        .poolingType(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).stride(2, 2)
                        .build(), "conv1_2")
                // block 2
                .layer("conv2_1", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(128)
                        .build(), "pool1")
                .layer("conv2_2", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(128)
                        .build(), "conv2_1")
                .layer("pool2", new SubsamplingLayer.Builder()
                        .poolingType(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).stride(2, 2)
                        .build(), "conv2_2")
                // block 3
                .layer("conv3_1", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(256)
                        .build(), "pool2")
                .layer("conv3_2", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(256)
                        .build(), "conv3_1")
                .layer("conv3_3", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(256)
                        .build(), "conv3_2")
                .layer("pool3", new SubsamplingLayer.Builder()
                        .poolingType(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).stride(2, 2)
                        .build(), "conv3_3")
                // block 4
                .layer("conv4_1", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(512)
                        .build(), "pool3")
                .layer("conv4_2", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(512)
                        .build(), "conv4_1")
                .layer("conv4_3", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(512)
                        .build(), "conv4_2")
                .layer("pool4", new SubsamplingLayer.Builder()
                        .poolingType(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).stride(2, 2)
                        .build(), "conv4_3")
                // block 5
                .layer("conv5_1", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(512)
                        .build(), "pool4")
                .layer("conv5_2", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(512)
                        .build(), "conv5_1")
                .layer("conv5_3", new ConvolutionLayer.Builder()
                        .kernelSize(3, 3).stride(1, 1).padding(1, 1).nOut(512)
                        .build(), "conv5_2")
                .layer("pool5", new SubsamplingLayer.Builder()
                        .poolingType(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).stride(2, 2)
                        .build(), "conv5_3")
                .addVertex("unStack1", new UnstackVertex(0, 2), "pool5")
                .addVertex("unStack2", new UnstackVertex(1, 2), "pool5")
                .addVertex("cosine", new CosineLambdaVertex(), "unStack1", "unStack2")
                .addLayer("out", new LossLayer.Builder().build(), "cosine")
                .setOutputs("out")
                .setInputTypes(InputType.convolutionalFlat(224, 224, 3), InputType.convolutionalFlat(224, 224, 3))
                .build();
        ComputationGraph network = new ComputationGraph(conf);
        network.init();
        return network;
    }
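
With the target model defined, the remaining step is the name-based parameter copy that the build() excerpt above performs. The following is a minimal, dependency-free sketch of that logic only, not DL4J code: plain maps from layer name to a flattened parameter array stand in for the two graphs, and the names ParamCopySketch and copyParams are hypothetical. It also assumes that the layer names in the target model match those of the pretrained model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamCopySketch {

    /**
     * Copies parameters by layer name from a pretrained model into a target model.
     * Each map is a stand-in for a graph: layer name -> flattened parameters.
     * Layers without parameters (e.g. pooling) are represented by empty arrays.
     * Returns the number of layers whose parameters were copied.
     */
    static int copyParams(Map<String, float[]> pretrained, Map<String, float[]> target) {
        int copied = 0;
        for (Map.Entry<String, float[]> entry : target.entrySet()) {
            float[] dst = entry.getValue();
            if (dst.length == 0) {
                continue; // some layers have no params
            }
            float[] src = pretrained.get(entry.getKey());
            if (src == null) {
                continue; // layer exists only in the target model
            }
            if (src.length != dst.length) {
                throw new IllegalStateException(
                        "Parameter count mismatch for layer " + entry.getKey());
            }
            entry.setValue(src.clone()); // defensive copy, analogous to origParams.dup()
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) {
        // Hypothetical toy data; real layers hold far more parameters.
        Map<String, float[]> pretrained = new LinkedHashMap<>();
        pretrained.put("conv1_1", new float[] {0.1f, 0.2f, 0.3f});
        pretrained.put("pool1", new float[0]);
        pretrained.put("fc6", new float[] {9f, 9f}); // not present in the target

        Map<String, float[]> target = new LinkedHashMap<>();
        target.put("conv1_1", new float[3]);
        target.put("pool1", new float[0]);
        target.put("out", new float[0]); // no counterpart in the pretrained model

        System.out.println("layers copied: " + copyParams(pretrained, target));
    }
}
```

In DL4J terms, the loop body would correspond to reading origGraph.getLayer(layerName).params() and passing a dup() of it to layer.setParams(...), as in the build() excerpt. The length check is an addition of this sketch: it fails fast if a layer in the new model was declared with a different shape than its pretrained counterpart.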

