Background

I have been playing around with Deep Dream and Inceptionism, using the Caffe framework to visualize layers of GoogLeNet, an architecture built for the Imagenet project, a large visual database designed for visual object recognition.

You can find Imagenet here: Imagenet 1000 classes.


To explore the architecture and generate "dreams", I am using three notebooks:

https://github.com/google/deepdream/blob/master/dream.ipynb
https://github.com/kylemcdonald/deepdream/blob/master/dream.ipynb
https://github.com/auduno/deepdraw/blob/master/deepdraw.ipynb


The basic idea here is to extract some features from each channel of a specified layer, either from the model itself or from a "guide" image.

Then we feed an image we want to modify into the model and extract the features of the same specified layer (for each octave), enhancing the best-matching features, i.e., those with the largest dot product between the two feature vectors.
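
To make the matching step concrete, here is a minimal numpy sketch with toy sizes (the variable names are mine and purely illustrative); it mirrors the objective_guide function used in Approach (b) below:

import numpy as np

# Toy sizes: x holds the current layer features, y the guide features,
# both flattened to (channels, positions).
ch, n_x, n_y = 4, 6, 5
x = np.random.randn(ch, n_x).astype(np.float32)
y = np.random.randn(ch, n_y).astype(np.float32)

A = x.T.dot(y)       # matrix of dot products, shape (n_x, n_y)
best = A.argmax(1)   # best-matching guide vector for each position in x
grad = y[:, best]    # gradient that pulls x towards those guide vectors
print(grad.shape)    # (4, 6), same layout as x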


So far I have managed to modify input images and control the dreams using the following approaches:

(a) applying layers as 'end' objectives for the input image optimization (see Feature Visualization);
(b) using a second image to guide the optimization objective on the input image;
(c) visualizing Googlenet model classes generated from noise.


However, the effect I want to achieve sits in-between these techniques, and I have not found any documentation, paper, or code for it.

Desired result (not part of the question to be answered)

To have one single class or unit belonging to a given 'end' layer (a) guide the optimization objective (b) and have this class visualized on the input image (c):

An example with class = 'face' and input_image = 'clouds.jpg':

Please note: the image above was generated using a model for face recognition, which was not trained on the Imagenet dataset. It is for demonstration purposes only.


Working code

Approach (a)

from cStringIO import StringIO
import numpy as np
import scipy.ndimage as nd
import PIL.Image
from IPython.display import clear_output, Image, display
from google.protobuf import text_format
import matplotlib.pyplot as plt
import caffe
         
model_name = 'GoogLeNet' 
model_path = 'models/dream/bvlc_googlenet/' # substitute your path here
net_fn   = model_path + 'deploy.prototxt'
param_fn = model_path + 'bvlc_googlenet.caffemodel'
   
model = caffe.io.caffe_pb2.NetParameter()
text_format.Merge(open(net_fn).read(), model)
model.force_backward = True
open('models/dream/bvlc_googlenet/tmp.prototxt', 'w').write(str(model))
    
net = caffe.Classifier('models/dream/bvlc_googlenet/tmp.prototxt', param_fn,
                       mean = np.float32([104.0, 116.0, 122.0]), # ImageNet mean, training set dependent
                       channel_swap = (2,1,0)) # the reference model has channels in BGR order instead of RGB

def showarray(a, fmt='jpeg'):
    a = np.uint8(np.clip(a, 0, 255))
    f = StringIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))
  
# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
    return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']
def deprocess(net, img):
    return np.dstack((img + net.transformer.mean['data'])[::-1])
      
def objective_L2(dst):
    dst.diff[:] = dst.data 

def make_step(net, step_size=1.5, end='inception_4c/output', 
              jitter=32, clip=True, objective=objective_L2):
    '''Basic gradient ascent step.'''

    src = net.blobs['data'] # input image is stored in Net's 'data' blob
    dst = net.blobs[end]

    ox, oy = np.random.randint(-jitter, jitter+1, 2)
    src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2) # apply jitter shift
            
    net.forward(end=end)
    objective(dst)  # specify the optimization objective
    net.backward(start=end)
    g = src.diff[0]
    # apply normalized ascent step to the input image
    src.data[:] += step_size/np.abs(g).mean() * g

    src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2) # unshift image
            
    if clip:
        bias = net.transformer.mean['data']
        src.data[:] = np.clip(src.data, -bias, 255-bias)

 
def deepdream(net, base_img, iter_n=20, octave_n=4, octave_scale=1.4, 
              end='inception_4c/output', clip=True, **step_params):
    # prepare base images for all octaves
    octaves = [preprocess(net, base_img)]
    
    for i in xrange(octave_n-1):
        octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale,1.0/octave_scale), order=1))
    
    src = net.blobs['data']
    
    detail = np.zeros_like(octaves[-1]) # allocate image for network-produced details
    
    for octave, octave_base in enumerate(octaves[::-1]):
        h, w = octave_base.shape[-2:]
        
        if octave > 0:
            # upscale details from the previous octave
            h1, w1 = detail.shape[-2:]
            detail = nd.zoom(detail, (1, 1.0*h/h1,1.0*w/w1), order=1)

        src.reshape(1,3,h,w) # resize the network's input image size
        src.data[0] = octave_base+detail
        
        for i in xrange(iter_n):
            make_step(net, end=end, clip=clip, **step_params)
            
            # visualization
            vis = deprocess(net, src.data[0])
            
            if not clip: # adjust image contrast if clipping is disabled
                vis = vis*(255.0/np.percentile(vis, 99.98))
            showarray(vis)

            print octave, i, end, vis.shape
            clear_output(wait=True)
            
        # extract details produced on the current octave
        detail = src.data[0]-octave_base
    # returning the resulting image
    return deprocess(net, src.data[0])

I run the code above with:

end = 'inception_4c/output'
img = np.float32(PIL.Image.open('clouds.jpg'))
_=deepdream(net, img)

Approach (b)

"""
Use one single image to guide 
the optimization process.

This affects the style of generated images 
without using a different training set.
"""

def dream_control_by_image(optimization_objective, end):
    # this image will shape input img
    guide = np.float32(PIL.Image.open(optimization_objective))  
    showarray(guide)
  
    h, w = guide.shape[:2]
    src, dst = net.blobs['data'], net.blobs[end]
    src.reshape(1,3,h,w)
    src.data[0] = preprocess(net, guide)
    net.forward(end=end)

    guide_features = dst.data[0].copy()
    
    def objective_guide(dst):
        x = dst.data[0].copy()
        y = guide_features
        ch = x.shape[0]
        x = x.reshape(ch,-1)
        y = y.reshape(ch,-1)
        A = x.T.dot(y) # compute the matrix of dot-products with guide features
        dst.diff[0].reshape(ch,-1)[:] = y[:,A.argmax(1)] # select ones that match best

    _=deepdream(net, img, end=end, objective=objective_guide)  # 'img' is the global input image loaded below

Then I run the code above with:

end = 'inception_4c/output'
# image to be modified
img = np.float32(PIL.Image.open('img/clouds.jpg'))
guide_image = 'img/guide.jpg'
dream_control_by_image(guide_image, end)

Question

Now here is my failed approach to access individual classes, hot-encoding the matrix of classes and focusing on one (to no avail so far):

def objective_class(dst, class_i=50):
    # 'class' is a reserved word in Python, so the argument is renamed 'class_i'.
    # According to the Imagenet classes:
    # 50: 'American alligator, Alligator mississipiensis'
    one_hot = np.zeros_like(dst.data)
    one_hot.flat[class_i] = 1.
    dst.diff[:] = one_hot  # was one_hot.flat[class_i], which set every gradient entry to 1.0
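
For comparison, the deepdraw notebook linked above drives a single class by targeting the network's 1000-way classification layer rather than a convolutional layer; on 'inception_4c/output', the flat index 50 lands on an arbitrary spatial position of the first channel, not on a class. A minimal sketch along those lines, assuming the standard GoogLeNet deploy prototxt where the pre-softmax layer is named 'loss3/classifier' (note this fully connected layer fixes the input size to 224x224, which is why deepdraw operates on fixed-size crops):

def objective_class_logits(dst, class_i=50):
    # Hypothetical variant in the spirit of the deepdraw notebook: only
    # meaningful when end='loss3/classifier', whose blob has shape
    # (1, 1000), one entry per Imagenet class.
    dst.diff[:] = 0.
    dst.diff[0, class_i] = 1.  # ascend only the logit of class 50

# e.g.: _=deepdream(net, img, end='loss3/classifier', objective=objective_class_logits)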

To make it clear: this question is not about the dream code, which is the interesting background and already-working code; it is only about this last paragraph: could someone please guide me on how to get images of a selected class (take class #50: 'American alligator, Alligator mississipiensis' as an example) from ImageNet, so that I can use them as input (together with the cloud image) to create a dreamed image?