How do you do multi-task learning with Torch7? How do you branch out?

This kind of task is really better done in Caffe: loading existing models, branching out, multiple losses, different learning rates for different layers and so on are all well supported there and mostly just need edits to the prototxt configuration file; doing it in Torch is more work. If you do have to use Torch, the rough approach is:

1) Import the existing model. loadcaffe is recommended, since Caffe is the format in which the largest number of pretrained models is available.

2) loadcaffe gives you the network as an nn.Sequential. Adding branches to it directly is awkward, so first convert the nn.Sequential into nngraph form by hand, one node per layer; building a more complex network on top of that is then straightforward (see the first sketch after this list).

3) For multiple losses, nn.ParallelCriterion is all you need (second sketch below).

4) For finetuning you usually want different learning rates for different layers, and that is where Torch gets more annoying. There is a solution online: https://gist.github.com/szagoruyko/1e994e713fce4a41773e . The gist is blocked in China, so its content is quoted here:

Q: How to set different learning rates/weight decays per layer?
A: Two ways of doing it:
- If you are using optim.sgd, pass in the optimState the fields learningRates/weightDecays, containing a Tensor with the multiplying factors (for the learning rate) or the values themselves (for the weight decay) per parameter of the network. Here is an example. The downside of this approach is that you need to store an extra tensor of the size of the network.
- Instead of doing parameters, gradParameters = model:getParameters(), do parameters, gradParameters = model:parameters(). This will give you a table of tensors, each of them corresponding to a separate weight/bias per layer. While optimizing using optim, keep a separate optimState for each parameter (which implies calling optim.sgd in a for loop). The last sketch below follows this second recipe.
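A minimal sketch of steps 1) and 2). The file names, the branch index `branchAt`, the 4096-dimensional layer sizes and the 10-class second head are all placeholder assumptions, not something the original answer specifies:

```lua
require 'nn'
require 'nngraph'
require 'loadcaffe'

-- 1) load the pretrained Caffe model as an nn.Sequential ('nn' or 'cudnn' backend)
local seq = loadcaffe.load('deploy.prototxt', 'weights.caffemodel', 'nn')

-- 2) rebuild it as an nngraph, one node per layer, keeping a handle on the
--    node where the new branch should split off
local branchAt = 20                   -- hypothetical index of the last shared layer
local inputNode = nn.Identity()()
local node = inputNode
local branchPoint
for i = 1, #seq.modules do
  node = seq:get(i)(node)             -- wrap each existing layer as a graph node
  if i == branchAt then branchPoint = node end
end

-- original head continues from `node`; attach a second task head to `branchPoint`
local head2 = nn.Linear(4096, 10)(nn.ReLU(true)(nn.Linear(4096, 4096)(branchPoint)))

local model = nn.gModule({inputNode}, {node, head2})   -- two outputs, one per task
```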
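For step 3), nn.ParallelCriterion combines one criterion per output, with optional per-task weights. In this sketch `model` is the two-output gModule above, and `input`, `labels`, `values` are hypothetical data and targets:

```lua
require 'nn'

local criterion = nn.ParallelCriterion()
criterion:add(nn.CrossEntropyCriterion(), 1.0)   -- task 1: classification
criterion:add(nn.MSECriterion(), 0.5)            -- task 2: regression, weighted 0.5

-- model returns a table of outputs; targets are a matching table
local outputs = model:forward(input)
local loss = criterion:forward(outputs, {labels, values})
local gradOutputs = criterion:backward(outputs, {labels, values})

model:zeroGradParameters()
model:backward(input, gradOutputs)
```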
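For step 4), a sketch following the second recipe in the quoted gist and continuing the example above. The learning-rate rule (1e-4 for the shared pretrained layers, 1e-3 for the rest) and the other hyperparameters are only illustrative assumptions:

```lua
require 'optim'

-- per-layer tensors instead of one flattened vector
local params, gradParams = model:parameters()

local optimStates = {}
for i = 1, #params do
  -- roughly two tensors (weight + bias) per layer; tensors up to the branch
  -- point are treated as pretrained and get a smaller learning rate
  local lr = (i <= 2 * branchAt) and 1e-4 or 1e-3
  optimStates[i] = { learningRate = lr, weightDecay = 5e-4, momentum = 0.9 }
end

-- one training step
model:zeroGradParameters()
local outputs = model:forward(input)
local loss = criterion:forward(outputs, {labels, values})
model:backward(input, criterion:backward(outputs, {labels, values}))

for i = 1, #params do
  -- feval only hands back the gradient already computed above
  local feval = function() return loss, gradParams[i] end
  optim.sgd(feval, params[i], optimStates[i])
end

-- (The gist's first recipe instead keeps model:getParameters() and passes
-- optimState.learningRates / optimState.weightDecays tensors with one entry
-- per parameter, at the cost of an extra tensor the size of the network.)
```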
■ Another user
Use multiple loss functions and just backpropagate their gradients through the network.
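A minimal sketch of what this means for a single-output network with two losses; `model`, `crit1`, `crit2`, `input` and the targets are hypothetical placeholders:

```lua
require 'nn'

local out = model:forward(input)
local loss1 = crit1:forward(out, target1)
local loss2 = crit2:forward(out, target2)

-- gradients w.r.t. the shared output simply add up;
-- clone() because a criterion's backward() reuses its internal buffer
local gradOut = crit1:backward(out, target1):clone()
gradOut:add(crit2:backward(out, target2))

model:zeroGradParameters()
model:backward(input, gradOut)
-- then update the parameters with optim as usual
```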

