Background

I have been playing around with Deep Dream and Inceptionism, using the Caffe framework to visualize layers of GoogLeNet, an architecture built for the Imagenet project, a large visual database designed for visual object recognition.

You can find Imagenet here: Imagenet 1000 classes.


To explore the architecture and to generate "dreams", I am using three notebooks:

https://github.com/google/deepdream/blob/master/dream.ipynb
https://github.com/kylemcdonald/deepdream/blob/master/dream.ipynb
https://github.com/auduno/deepdraw/blob/master/deepdraw.ipynb


The basic idea here is to extract some features from every channel of a specified layer, either from the model or from a "guide" image.

Then we feed an image we wish to modify into the model and extract the features of the same specified layer (for each octave), enhancing the best-matching features, i.e. the largest dot products of the two feature vectors.
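To make the matching step concrete, here is a toy NumPy sketch of the idea (not the notebooks' code; the real version appears later in objective_guide, and the toy shapes are arbitrary):

import numpy as np

# x holds the current image's features, y the guide image's features,
# both flattened to (channels, positions)
x = np.random.rand(4, 6)   # 4 channels, 6 spatial positions (toy sizes)
y = np.random.rand(4, 9)   # guide features, 9 spatial positions

A = x.T.dot(y)             # (6, 9) matrix of dot products between feature vectors
best = y[:, A.argmax(1)]   # for each position in x, the best-matching guide vector
print(best.shape)          # (4, 6): same layout as x, filled with guide features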


So far I have managed to modify input images and control the dreams using the following approaches:

(a) applying layers as "end" objectives for the input image's optimization (see Feature Visualization),
(b) using a second image to guide the optimization objective on the input image,
(c) visualizing Googlenet model classes generated from noise.


However, the effect I want to achieve sits in-between these techniques, and I have not found any documentation, paper, or code on it.

Desired result (not part of the question to be answered)

Have one single class or unit belonging to a given "end" layer (a) guide the optimization objective (b) and have this class visualized (c) on the input image:

An example with class = 'face' and input_image = 'clouds.jpg':

Please note: the image above was generated using a facial recognition model, which was not trained on the Imagenet dataset. It is for demonstration purposes only.


Working code

Approach (a)

from cStringIO import StringIO
import numpy as np
import scipy.ndimage as nd
import PIL.Image
from IPython.display import clear_output, Image, display
from google.protobuf import text_format
import matplotlib.pyplot as plt
import caffe
         
model_name = 'GoogLeNet' 
model_path = 'models/dream/bvlc_googlenet/' # substitute your path here
net_fn   = model_path + 'deploy.prototxt'
param_fn = model_path + 'bvlc_googlenet.caffemodel'
   
model = caffe.io.caffe_pb2.NetParameter()
text_format.Merge(open(net_fn).read(), model)
model.force_backward = True
open('models/dream/bvlc_googlenet/tmp.prototxt', 'w').write(str(model))
    
net = caffe.Classifier('models/dream/bvlc_googlenet/tmp.prototxt', param_fn,
                       mean = np.float32([104.0, 116.0, 122.0]), # ImageNet mean, training set dependent
                       channel_swap = (2,1,0)) # the reference model has channels in BGR order instead of RGB

def showarray(a, fmt='jpeg'):
    a = np.uint8(np.clip(a, 0, 255))
    f = StringIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))
  
# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
    return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']
def deprocess(net, img):
    return np.dstack((img + net.transformer.mean['data'])[::-1])
      
def objective_L2(dst):
    # L2 objective: the gradient is the activation itself,
    # amplifying whatever the layer already responds to
    dst.diff[:] = dst.data

def make_step(net, step_size=1.5, end='inception_4c/output', 
              jitter=32, clip=True, objective=objective_L2):
    '''Basic gradient ascent step.'''

    src = net.blobs['data'] # input image is stored in Net's 'data' blob
    dst = net.blobs[end]

    ox, oy = np.random.randint(-jitter, jitter+1, 2)
    src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2) # apply jitter shift
            
    net.forward(end=end)
    objective(dst)  # specify the optimization objective
    net.backward(start=end)
    g = src.diff[0]
    # apply normalized ascent step to the input image
    src.data[:] += step_size/np.abs(g).mean() * g

    src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2) # unshift image
            
    if clip:
        bias = net.transformer.mean['data']
        src.data[:] = np.clip(src.data, -bias, 255-bias)

 
def deepdream(net, base_img, iter_n=20, octave_n=4, octave_scale=1.4, 
              end='inception_4c/output', clip=True, **step_params):
    # prepare base images for all octaves
    octaves = [preprocess(net, base_img)]
    
    for i in xrange(octave_n-1):
        octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale,1.0/octave_scale), order=1))
    
    src = net.blobs['data']
    
    detail = np.zeros_like(octaves[-1]) # allocate image for network-produced details
    
    for octave, octave_base in enumerate(octaves[::-1]):
        h, w = octave_base.shape[-2:]
        
        if octave > 0:
            # upscale details from the previous octave
            h1, w1 = detail.shape[-2:]
            detail = nd.zoom(detail, (1, 1.0*h/h1,1.0*w/w1), order=1)

        src.reshape(1,3,h,w) # resize the network's input image size
        src.data[0] = octave_base+detail
        
        for i in xrange(iter_n):
            make_step(net, end=end, clip=clip, **step_params)
            
            # visualization
            vis = deprocess(net, src.data[0])
            
            if not clip: # adjust image contrast if clipping is disabled
                vis = vis*(255.0/np.percentile(vis, 99.98))
            showarray(vis)

            print octave, i, end, vis.shape
            clear_output(wait=True)
            
        # extract details produced on the current octave
        detail = src.data[0]-octave_base
    # returning the resulting image
    return deprocess(net, src.data[0])

I run the code above with:

end = 'inception_4c/output'
img = np.float32(PIL.Image.open('clouds.jpg'))
_=deepdream(net, img)

Approach (b)

"""
Use one single image to guide 
the optimization process.

This affects the style of generated images 
without using a different training set.
"""

def dream_control_by_image(optimization_objective, end):
    # this image will shape input img
    guide = np.float32(PIL.Image.open(optimization_objective))  
    showarray(guide)
  
    h, w = guide.shape[:2]
    src, dst = net.blobs['data'], net.blobs[end]
    src.reshape(1,3,h,w)
    src.data[0] = preprocess(net, guide)
    net.forward(end=end)

    guide_features = dst.data[0].copy()
    
    def objective_guide(dst):
        x = dst.data[0].copy()
        y = guide_features
        ch = x.shape[0]
        x = x.reshape(ch,-1)
        y = y.reshape(ch,-1)
        A = x.T.dot(y) # compute the matrix of dot-products with guide features
        dst.diff[0].reshape(ch,-1)[:] = y[:,A.argmax(1)] # select ones that match best

    _=deepdream(net, img, end=end, objective=objective_guide)  # 'img' is the global input image defined before the call

Then I run the code with:

end = 'inception_4c/output'
# image to be modified
img = np.float32(PIL.Image.open('img/clouds.jpg'))
guide_image = 'img/guide.jpg'
dream_control_by_image(guide_image, end)

Question

Now follows my failed approach, trying to access individual classes by hot-encoding the matrix of classes and focusing on one of them (to no avail so far):

def objective_class(dst, class_i=50):
    # according to the imagenet classes
    # 50: 'American alligator, Alligator mississipiensis'
    # ('class' is a reserved word in Python, hence 'class_i')
    one_hot = np.zeros_like(dst.data)
    one_hot.flat[class_i] = 1.
    dst.diff[:] = one_hot
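For reference, the deepdraw notebook linked above steers the optimization toward one class by backpropagating from the classifier layer, so that the one-hot entry lines up with the 1000 class scores. A hedged usage sketch of that idea ('loss3/classifier' is the blob name of GoogLeNet's final InnerProduct layer in deploy.prototxt); whether this produces the desired in-between effect is exactly what the question is about:

# Sketch only: point 'end' at the classifier layer so dst.data has
# shape (1, 1000) and the one-hot entry selects class #50
end = 'loss3/classifier'
img = np.float32(PIL.Image.open('clouds.jpg'))
_ = deepdream(net, img, end=end, objective=objective_class)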

To be clear: this question is not about the dreaming code (that is just the interesting background and already working code); it is only about the question of this last paragraph: could someone please guide me on how to get images of a chosen class (taking class #50: 'American alligator, Alligator mississipiensis' as an example) from ImageNet (so that I can use them as input, together with the cloud image, to create a dreamed image)?


The question is how to get images of the chosen class #50: 'American alligator, Alligator mississipiensis' from ImageNet.

Go to image-net.org. Click on "Download". Follow the instructions under "Download Image URLs":

How to download the URLs of a synset from your browser? 1. Type a query into the Search Box and click the "Search" button

The alligator does not show up. ImageNet is under maintenance, and only ILSVRC synsets are included in the search results. That is no problem: we are fine with the similar animal "alligator lizard", since this search is only about getting to the right branch of the WordNet treemap. I do not know whether you would get direct ImageNet images here even without the maintenance.

2. Open a synset page

Scroll down:

Look for the American alligator, which happens to also be a saurian diapsid reptile, in its near vicinity:

3. You will find the "Download URLs" button at the bottom left of the image browsing window.

You get all of the URLs of the chosen class. A text file pops up in the browser:

http://image-net.org/api/text/imagenet.synset.geturls?wnid=n01698640

We see here that it is just about knowing the right WordNet id that needs to be put at the end of the URL.

Manual image download

The text file looks as follows:

http://farm1.static.flickr.com/136/326907154_d975d0c944.jpg
http://weeksbay.org/photo_gallery/reptiles/American20Alligator.jpg
...
and so on, up to image number 1261.

As an example, the first URL links to:

And the second is a dead link:

The third link is dead as well, but the fourth is still working.

The images behind these URLs are publicly available, but many of the links are dead, and the pictures are of rather low resolution.

Automated image download

Again from the ImageNet guide:

How to download by HTTP protocol? To download a synset by HTTP request, you need to obtain the "WordNet ID" (wnid) of a synset first. When you use the explorer to browse a synset, you can find the WordNet ID below the image window. (Click Here and search "Synset WordNet ID" to find out the wnid of the "Dog, domestic dog, Canis familiaris" synset). To learn more about the "WordNet ID", please refer to Mapping between ImageNet and WordNet. Given the wnid of a synset, the URLs of its images can be obtained at http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=[wnid]. You can also get the hyponym synsets given the wnid; please refer to the API documentation to learn more.

So what is in the API documentation?

There is everything needed to get all WordNet IDs (the so-called "synset IDs") and their words for all synsets; that is, it has any class name and its WordNet ID at hand, free of charge.

Obtain the words of a synset: Given the wnid of a synset, the words of the synset can be obtained at http://www.image-net.org/api/text/wordnet.synset.getwords?wnid=[wnid]. You can also Click Here to download the mapping between WordNet ID and words for all synsets, and Click Here to download the mapping between WordNet ID and glosses for all synsets.
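As a quick illustration, a minimal sketch that fetches the words of a synset via the getwords endpoint quoted above (the User-Agent header is an assumption, mirroring the download code further below):

import urllib.request

wnid = "n01698640"
url = "http://www.image-net.org/api/text/wordnet.synset.getwords?wnid=" + wnid

req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with urllib.request.urlopen(req) as r:
    # the endpoint returns the synset's words as plain text
    print(r.read().decode("utf-8"))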

If you know the WordNet id of your choice and its class name, you can use nltk.corpus.wordnet of "nltk" (Natural Language Toolkit); see the WordNet interface.
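For example, a small sketch of how the alligator synset could be looked up with nltk, assuming the usual convention that an ImageNet wnid is the letter "n" followed by the zero-padded 8-digit WordNet noun offset:

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

# Look up the synset by one of its lemmas; the exact lemma name is an
# assumption here, check wn.synsets('alligator') if it does not resolve
synset = wn.synset('american_alligator.n.01')
wnid = 'n{:08d}'.format(synset.offset())

print(synset.lemma_names())  # e.g. ['American_alligator', 'Alligator_mississipiensis']
print(wnid)                  # expected: n01698640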

In our case, we just need the images of class #50: 'American alligator, Alligator mississipiensis'. We already know what we need, so we can leave nltk.corpus.wordnet aside (see tutorials or Stack Exchange questions for more). We can automate the download of all alligator images by looping through the URLs that are still alive. We could of course also widen this to the full WordNet with a loop over all WordNet IDs, though this would take far too much time for the whole treemap, and it is also not recommended, since the images will stop being there if thousands of people download them daily.

I am afraid I will not take the time to write Python code that accepts the ImageNet class number "#50" as an argument, though that should be possible as well, using the mapping tables from WordNet to ImageNet. The class name and its WordNet ID are enough here.
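If you do want the class-number route, a minimal sketch, assuming you have Caffe's ilsvrc12 synset_words.txt at hand (fetched by Caffe's data/ilsvrc12/get_ilsvrc_aux.sh; line i holds the wnid and words of class i):

def wnid_of_class(class_i, synset_words_path='data/ilsvrc12/synset_words.txt'):
    # each line looks like: "n01698640 American alligator, Alligator mississipiensis"
    with open(synset_words_path) as f:
        line = f.readlines()[class_i]
    wnid, words = line.strip().split(' ', 1)
    return wnid, words

print(wnid_of_class(50))  # expected: ('n01698640', 'American alligator, ...')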

For a single WordNet ID, the code could look as follows:

import urllib.request 
import csv

wnid = "n01698640"
url = "http://image-net.org/api/text/imagenet.synset.geturls?wnid=" + str(wnid)

# From https://stackoverflow.com/a/45358832/6064933
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with open(wnid + ".csv", "wb") as f:
    with urllib.request.urlopen(req) as r:
        f.write(r.read())

with open(wnid + ".csv", "r") as f:
    counter = 1
    for line in f.readlines():      
        print(line.strip("\n"))
        failed = []
        try:
            with urllib.request.urlopen(line) as r2:
                with open(f'''{wnid}_{counter:05}.jpg''', "wb") as f2:
                    f2.write(r2.read())
        except:
            failed.append(f'''{counter:05}, {line}'''.strip("\n"))
        counter += 1
        if counter == 10:
            break

with open(wnid + "_failed.csv", "w", newline="") as f3:
    writer = csv.writer(f3)
    writer.writerow(failed)

Result:

If you need the images even behind the dead links, and in their original quality, and if your project is non-commercial, you can sign in and check "How do I get a copy of the images?" in the Download FAQ.

In the URL above, you see wnid=n01698640 at its end, which is the WordNet id that is mapped to ImageNet. Or, in the "Images of the Synset" tab, just click on "Wordnet IDs".

To get to:

Or right-click and "save as":

You can use the WordNet id to get the original images.

If your project is commercial, I would suggest contacting the ImageNet team.


Add-on

Taking up the idea of a comment: if you do not want many images, but just the "one class image" that represents the class as much as possible, have a look at Visualizing GoogLeNet Classes and try to use this method with the images of ImageNet instead. It also uses the deepdream code; see the sketch after the quote below.

Visualizing GoogLeNet Classes, July 2015: Ever wondered what a deep neural network thinks a Dalmatian should look like? Well, wonder no more. Recently Google published a post describing how they managed to use deep neural networks to generate class visualizations and modify images through the so-called "inceptionism" method. They later published the code to modify images via the inceptionism method yourself; however, they didn't publish code to generate the class visualizations they show in the same post. While I never figured out exactly how Google generated their class visualizations, after butchering the deepdream code and this ipython notebook from Kyle McDonald, I managed to coach GoogLeNet into drawing these: ... [with many other example images to follow]
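For completeness, a hedged sketch of that idea with the code above: start from random noise instead of a photograph and let the class objective paint the class from scratch. The noise range around the ImageNet mean, the iteration count, and the layer name are assumptions in the spirit of the deepdraw notebook, not its exact settings:

# Hypothetical: visualize class #50 from noise with the functions defined above
noise = np.float32(np.random.uniform(100, 156, (224, 224, 3)))  # around the mean
_ = deepdream(net, noise, iter_n=50, end='loss3/classifier',
              objective=objective_class)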