Stable Diffusion Image Generation, Guide 6

This article covers two topics: how to use the Hugging Face Diffusers framework, and how to use that framework to upscale images to higher resolution.

The Hugging Face Diffusers framework provides APIs for image super-resolution, along with a pretrained model, stabilityai/stable-diffusion-x4-upscaler, which appears to be the upscaling model Hugging Face promotes most. Its Model Card includes a short program showing how to use it.

Besides that, Hugging Face provides a second pretrained model, CompVis/ldm-super-resolution-4x-openimages. Its Model Card also includes a short program showing how to use it.

Drawing on both Model Cards, I wrote a program, given in Appendix 1. It feeds the same portrait image to both pretrained models, so we can compare their results.

The following sections record the problems encountered along the way, and their solutions.

Upscaling images with Hugging Face Diffusers.

1. access_token:

Using the pretrained model stabilityai/stable-diffusion-x4-upscaler looks simple, but I hit the following error:

PytorchStreamReader failed reading zip archive: failed finding central directory.

According to StackOverflow, the cause appears to be an incomplete download of the pretrained model.

When the program in Appendix 1 runs, it automatically downloads and caches the pretrained model. We rented a GPU server from MistGPU, where the cache lives in the /home/mist/.cache/ folder.

$ ls -aR /home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler:
.  ..  blobs  refs  snapshots

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/blobs:
.  ..
113c9c05070fd0647f57884af1a15871a64dd298
2188379b05015f531d61503e714234d00a64939792f3098b324e516547f0194f
269c89302cd24cd8d3937982450d79f106f405be
33478c297ec29218100f8ee86007b3ab4c2701896d5ca5c9e3a84fc29f678183
3def213b5da2e3dc8b82e313d106374e70cc7a34
469be27c5c010538f845f518c4f5e8574c78f7c8
615fdb58220250bb05bc9f1382327683a4e96728
707195f27097fb84d29f56c8a0ea9300b5b36a83
76e821f1b6f0a9709293c3b6b51ed90980b3166b
7bee7a4acd3ccb2ee9c470d7e9105dffd48d449da4d3d4a5056f7d9e51f4fc5e
887ab066b7264fd29980113a98db6acd349db0e5
9701b233be392017374527288e155239afa0450365fea2a6a779faa33afc8c37
aad5ad10ade11526c5036b5f4410dda9a55d5869
ae0c5be6f35217e51c4c000fd325d8de0294e99c
b14ab7fbdf9d227b0ebd443de6502c2a0f69e109
b6dc05aaae1ba43c230612932492a81e431126582481fd6c7d94c6b15f9ce584
cce6febb0b6d876ee5eb24af35e27e764eb4f9b1d0b7c026c8c3333d4cfc916c

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/refs:
.  ..  main

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/snapshots:
.  ..  19b610c68ca7572defb6e09e64d1063f32b4db83

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/snapshots/19b610c68ca7572defb6e09e64d1063f32b4db83:
.  ..  low_res_scheduler  model_index.json  scheduler  text_encoder  tokenizer  unet  vae

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/snapshots/19b610c68ca7572defb6e09e64d1063f32b4db83/low_res_scheduler:
.  ..  scheduler_config.json

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/snapshots/19b610c68ca7572defb6e09e64d1063f32b4db83/scheduler:
.  ..  scheduler_config.json

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/snapshots/19b610c68ca7572defb6e09e64d1063f32b4db83/text_encoder:
.  ..  config.json  model.safetensors  pytorch_model.bin

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/snapshots/19b610c68ca7572defb6e09e64d1063f32b4db83/tokenizer:
.  ..  merges.txt  special_tokens_map.json  tokenizer_config.json  vocab.json

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/snapshots/19b610c68ca7572defb6e09e64d1063f32b4db83/unet:
.  ..  .ipynb_checkpoints  config.json  diffusion_pytorch_model.bin  diffusion_pytorch_model.safetensors

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/snapshots/19b610c68ca7572defb6e09e64d1063f32b4db83/unet/.ipynb_checkpoints:
.  ..  config-checkpoint.json

/home/mist/.cache/huggingface/diffusers/models--stabilityai--stable-diffusion-x4-upscaler/snapshots/19b610c68ca7572defb6e09e64d1063f32b4db83/vae:
.  ..  config.json  diffusion_pytorch_model.bin  diffusion_pytorch_model.safetensors

I deleted the cache, then reran the program so it would download and cache the pretrained model again. I repeated this several times, but the error persisted.
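If a corrupt cache is suspected, it should also be possible to bypass it programmatically instead of deleting folders by hand. The following is a minimal sketch of my own, assuming the force_download and cache_dir keyword arguments that diffusers' from_pretrained forwards to the Hugging Face Hub downloader; it is not part of the original script.

# Hedged sketch: force a clean re-download rather than deleting
# ~/.cache/huggingface by hand. Assumes from_pretrained accepts the
# force_download and cache_dir keyword arguments.
import torch
from diffusers import StableDiffusionUpscalePipeline

pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
    force_download=True,  # ignore any possibly corrupt cached blobs
    cache_dir="/home/mist/.cache/huggingface/diffusers",
)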

Later I tried an access_token, along with a few other fixes, and the program unexpectedly started working.

According to the Hugging Face documentation, using the stabilityai/stable-diffusion-x4-upscaler pretrained model should not require an access_token; perhaps the real fix was that we upgraded diffusers, transformers, and other Python packages, which resolved bugs in the older versions.
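When a package upgrade is the suspected fix, it helps to record the versions actually in use, so that a working setup can be reproduced later. A trivial sketch of my own, not from the original post:

# Record the versions in use, to correlate fixes with upgrades.
import diffusers
import torch
import transformers

print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)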

Below is how to use a Hugging Face access_token, for future reference.

1) For the concepts, see these two Hugging Face documents: "User access tokens" and "Loading pipelines that require access request".

2) In practice, there are two steps.

Step one: go to the relevant Hugging Face page and generate your own Hugging Face access token.

Each Hugging Face user appears to have a dedicated access token that stays the same string over time; the string starts with hf_.

Step two: reference your own access_token in the program.

model_stable = "stabilityai/stable-diffusion-x4-upscaler"
access_token = "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"
pipeline_stable = StableDiffusionUpscalePipeline.from_pretrained(
    model_stable,
    torch_dtype=torch.float16,
    use_auth_token=access_token)
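Alternatively, assuming the huggingface_hub package is installed, you can authenticate once per machine instead of passing the token to every call. This sketch is my addition, not from the original post:

# Hedged alternative: log in once via huggingface_hub, after which
# from_pretrained no longer needs an explicit use_auth_token.
from huggingface_hub import login

login(token="hf_xxx")  # replace with your own token (placeholder shown)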

2. The prompt seems to have no effect:

When using the stabilityai/stable-diffusion-x4-upscaler pretrained model, you must provide a prompt to guide the upscaling, as in the code snippet below.

After repeated testing, the prompt seems to have no noticeable effect on upscaling portraits.

prompt = "a portrait of a beautiful woman"
upscaled_image = pipeline_stable(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upscaled_portrait_stable.png")
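To probe this claim more rigorously, one can fix the random seed and vary only the prompt, so any difference in the output is attributable to the prompt alone. A sketch of my own, reusing pipeline_stable and low_res_img from above, and the generator parameter documented in the __call__() signature shown further below:

# Hedged experiment: same seed, different prompts; any difference in the
# saved images is then due to the prompt alone.
import torch

for tag, p in [("with_prompt", "a portrait of a beautiful woman"),
               ("no_prompt", "")]:
    gen = torch.Generator(device="cuda").manual_seed(42)
    img = pipeline_stable(prompt=p, image=low_res_img, generator=gen).images[0]
    img.save(f"upscaled_{tag}.png")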

The source code of StableDiffusionUpscalePipeline can be found in its GitHub repo.

But the pipeline actually in use may not match the GitHub repo, so you still need to read the source actually deployed locally. Since we rent from MistGPU, the deployed Python packages all live in the /mistgpu/site-packages/ folder.

The locally deployed source of StableDiffusionUpscalePipeline can be found under /mistgpu/site-packages/.
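Rather than searching site-packages by hand, Python's standard inspect module reports where an imported class was actually loaded from. A small sketch, my addition:

# Print the path of the deployed source file for the imported class.
import inspect
from diffusers import StableDiffusionUpscalePipeline

print(inspect.getsourcefile(StableDiffusionUpscalePipeline))
# Expected to point into /mistgpu/site-packages/diffusers/pipelines/stable_diffusion/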

$ ls /mistgpu/site-packages/diffusers/pipelines/stable_diffusion
__init__.py
__pycache__
pipeline_cycle_diffusion.py
pipeline_flax_stable_diffusion.py
pipeline_onnx_stable_diffusion.py
pipeline_onnx_stable_diffusion_img2img.py
pipeline_onnx_stable_diffusion_inpaint.py
pipeline_onnx_stable_diffusion_inpaint_legacy.py
pipeline_stable_diffusion.py
pipeline_stable_diffusion_depth2img.py
pipeline_stable_diffusion_image_variation.py
pipeline_stable_diffusion_img2img.py
pipeline_stable_diffusion_inpaint.py
pipeline_stable_diffusion_inpaint_legacy.py
pipeline_stable_diffusion_k_diffusion.py
pipeline_stable_diffusion_upscale.py
safety_checker.py
safety_checker_flax.py

To learn how to use a Hugging Face pipeline, read the __call__() function in its source.

For example, the usage of StableDiffusionUpscalePipeline can be read from its __call__() function, shown below.

@torch.no_grad()
def __call__(
    self,
    prompt: Union[str, List[str]],
    image: Union[torch.FloatTensor, PIL.Image.Image, List[PIL.Image.Image]],
    num_inference_steps: int = 75,
    guidance_scale: float = 9.0,
    noise_level: int = 20,
    negative_prompt: Optional[Union[str, List[str]]] = None,
    num_images_per_prompt: Optional[int] = 1,
    eta: float = 0.0,
    generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
    latents: Optional[torch.FloatTensor] = None,
    output_type: Optional[str] = "pil",
    return_dict: bool = True,
    callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
    callback_steps: Optional[int] = 1,
):
    r"""
    Function invoked when calling the pipeline for generation.

    Args:
        prompt (`str` or `List[str]`):
            The prompt or prompts to guide the image generation.
        image (`PIL.Image.Image` or `List[PIL.Image.Image]` or `torch.FloatTensor`):
            `Image`, or tensor representing an image batch which will be upscaled.
        num_inference_steps (`int`, *optional*, defaults to 50):
            The number of denoising steps. More denoising steps usually lead to a higher
            quality image at the expense of slower inference.
        guidance_scale (`float`, *optional*, defaults to 7.5):
            Guidance scale as defined in [Classifier-Free Diffusion
            Guidance](https://arxiv.org/abs/2207.12598). `guidance_scale` is defined as `w` of
            equation 2. of [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale
            is enabled by setting `guidance_scale > 1`. Higher guidance scale encourages to
            generate images that are closely linked to the text `prompt`, usually at the
            expense of lower image quality.
        negative_prompt (`str` or `List[str]`, *optional*):
            The prompt or prompts not to guide the image generation. Ignored when not using
            guidance (i.e., ignored if `guidance_scale` is less than `1`).
        num_images_per_prompt (`int`, *optional*, defaults to 1):
            The number of images to generate per prompt.
        eta (`float`, *optional*, defaults to 0.0):
            Corresponds to parameter eta (η) in the DDIM paper:
            https://arxiv.org/abs/2010.02502. Only applies to [`schedulers.DDIMScheduler`],
            will be ignored for others.
        generator (`torch.Generator`, *optional*):
            One or a list of [torch
            generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) to
            make generation deterministic.
        latents (`torch.FloatTensor`, *optional*):
            Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as
            inputs for image generation. Can be used to tweak the same generation with
            different prompts. If not provided, a latents tensor will be generated by sampling
            using the supplied random `generator`.
        output_type (`str`, *optional*, defaults to `"pil"`):
            The output format of the generated image. Choose between
            [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
        return_dict (`bool`, *optional*, defaults to `True`):
            Whether or not to return a
            [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] instead of a plain
            tuple.
        callback (`Callable`, *optional*):
            A function that will be called every `callback_steps` steps during inference. The
            function will be called with the following arguments:
            `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
        callback_steps (`int`, *optional*, defaults to 1):
            The frequency at which the `callback` function will be called. If not specified,
            the callback will be called at every step.

    Returns:
        [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] or `tuple`:
        [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] if `return_dict` is True,
        otherwise a `tuple`. When returning a tuple, the first element is a list with the
        generated images, and the second element is a list of `bool`s denoting whether the
        corresponding generated image likely represents "not-safe-for-work" (nsfw) content,
        according to the `safety_checker`.
    """
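With this signature in hand, the call can be tuned beyond the defaults. A hedged sketch of my own, reusing pipeline_stable and low_res_img from the earlier snippets:

# Sketch: exercise the main tuning knobs from the signature above.
import torch

generator = torch.Generator(device="cuda").manual_seed(0)  # reproducible run
upscaled = pipeline_stable(
    prompt="a portrait of a beautiful woman",
    image=low_res_img,
    num_inference_steps=75,  # more denoising steps, slower but usually cleaner
    guidance_scale=9.0,      # how strongly the prompt is enforced
    noise_level=20,          # noise added to the low-res input before denoising
    generator=generator,
).images[0]
upscaled.save("upscaled_portrait_tuned.png")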

3. The LDM pretrained model is easier to use:

As mentioned above, for image super-resolution Hugging Face currently provides two pretrained models: stabilityai/stable-diffusion-x4-upscaler and CompVis/ldm-super-resolution-4x-openimages.

Using stabilityai/stable-diffusion-x4-upscaler requires StableDiffusionUpscalePipeline;

using CompVis/ldm-super-resolution-4x-openimages requires LDMSuperResolutionPipeline; see the code in Appendix 1. A few observations:

1. Reading the __call__() function of LDMSuperResolutionPipeline, it does not require a prompt at all.

Also, downloading and caching this pretrained model produced no incomplete-download errors.

2. Both the Stable model and the LDM model upscale by a fixed factor of 4 (in our test, the 128×128 input became 512×512); the output size cannot be set freely.

3. Comparing the photos upscaled by the Stable model and the LDM model, the results are about the same.

Conclusion: LDMSuperResolutionPipeline is currently the better choice; a minimal usage sketch follows.
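For reference, here is the LDM pipeline on its own, condensed from the Appendix 1 script; the size assertion at the end is my addition, verifying the fixed 4x factor.

# Minimal LDM-only usage, condensed from Appendix 1.
import torch
from PIL import Image
from diffusers import LDMSuperResolutionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline = LDMSuperResolutionPipeline.from_pretrained(
    "CompVis/ldm-super-resolution-4x-openimages").to(device)

low_res_img = Image.open("low_res_portrait.jpg").resize((128, 128))
upscaled = pipeline(low_res_img, num_inference_steps=100, eta=1).images[0]
assert upscaled.size == (512, 512)  # added sanity check: 4x the 128x128 input
upscaled.save("upscaled_portrait_ldm.png")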

Appendix 1. test_img_super_resolution.py

# test_img_super_resolution.py
# 2023/01/24
#
from PIL import Image
from io import BytesIO
from diffusers import LDMSuperResolutionPipeline
from diffusers import StableDiffusionUpscalePipeline
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Open an image as a PIL.Image
low_res_img = Image.open("low_res_portrait.jpg")
low_res_img = low_res_img.resize((128, 128))

# Model 1. Modified from:
# https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler
# load model and scheduler
model_stable = "stabilityai/stable-diffusion-x4-upscaler"
access_token = "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"
pipeline_stable = StableDiffusionUpscalePipeline.from_pretrained(
    model_stable, torch_dtype=torch.float16, use_auth_token=access_token)
pipeline_stable = pipeline_stable.to(device)

prompt = "a portrait of a beautiful woman"
upscaled_image = pipeline_stable(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upscaled_portrait_stable.png")

# Model 2. Modified from:
# https://huggingface.co/CompVis/ldm-super-resolution-4x-openimages
model_ldm = "CompVis/ldm-super-resolution-4x-openimages"
pipeline_ldm = LDMSuperResolutionPipeline.from_pretrained(model_ldm)
pipeline_ldm = pipeline_ldm.to(device)

# run pipeline in inference (sample random noise and denoise)
upscaled_image = pipeline_ldm(low_res_img, num_inference_steps=100, eta=1).images[0]
# save image
upscaled_image.save("upscaled_portrait_ldm.png")
