Graphic design, which has been evolving since the 15th century, plays a crucial role in advertising. Creating high-quality designs demands design-oriented planning, reasoning, and layer-wise generation. This intricate task involves understanding vague user intentions and faithfully generating multi-layered visual elements, such as backgrounds, decorations, objects, and fonts. It also requires planning the layout of all elements and reasoning visually so that the generated elements satisfy established design principles.
Unlike the recent CanvaGPT, which integrates GPT-4 with existing design templates to build a custom GPT, this paper introduces the COLE system, a hierarchical generation framework designed to address these challenges comprehensively. COLE transforms a vague intention prompt, such as "design a poster for Hisaishi's concert," into a high-quality multi-layered graphic design, while also supporting flexible editing based on user input.
The key insight is to dissect the complex task of text-to-design generation into a hierarchy of simpler sub-tasks, each addressed by a specialized model; the models work collaboratively, and their results are consolidated into a cohesive final output. This hierarchical task decomposition streamlines the complex process and significantly enhances generation reliability.
Our system comprises multiple fine-tuned Large Language Models (LLMs), Large Multimodal Models (LMMs), and Diffusion Models (DMs), each tailored for a specific sub-task: design-aware layer-wise captioning, layout planning, reasoning, and image and text generation. Furthermore, we construct a benchmark to demonstrate the superiority of our system over existing methods in generating high-quality graphic designs from user intent. Finally, we present a Canva-like multi-layered image editing tool that supports flexible editing of the generated multi-layered graphic designs.
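To make the hierarchical decomposition concrete, the sketch below shows how such a pipeline might be orchestrated. All names, stages, and stub outputs here are illustrative assumptions for exposition; they are not the authors' actual models or API, and each stub stands in for a fine-tuned LLM, LMM, or DM.

```python
# Hypothetical sketch of a COLE-style hierarchical pipeline.
# Every function below is a stub standing in for a specialized model;
# names, layer kinds, and coordinates are assumptions, not the real system.
from dataclasses import dataclass, field


@dataclass
class Layer:
    kind: str      # e.g. "background", "object", "text"
    caption: str   # layer-wise caption an LLM would produce
    bbox: tuple    # (x, y, w, h) assigned by the layout planner


@dataclass
class Design:
    intention: str
    layers: list = field(default_factory=list)


def expand_intention(prompt: str) -> list:
    """Stage 1 (stub): an LLM expands a vague intention into layer captions."""
    return [f"{kind} for: {prompt}" for kind in ("background", "object", "text")]


def plan_layout(captions: list) -> list:
    """Stage 2 (stub): an LMM plans one bounding box per layer."""
    boxes = [(0, 0, 1024, 1024), (256, 256, 512, 512), (128, 840, 768, 120)]
    return boxes[: len(captions)]


def generate_design(prompt: str) -> Design:
    """Stage 3 (stub): diffusion models would render each layer;
    here we only assemble the planned metadata into a layered design."""
    captions = expand_intention(prompt)
    boxes = plan_layout(captions)
    kinds = ("background", "object", "text")
    layers = [Layer(k, c, b) for k, c, b in zip(kinds, captions, boxes)]
    return Design(intention=prompt, layers=layers)


design = generate_design("design a poster for Hisaishi's concert")
for layer in design.layers:
    print(layer.kind, layer.bbox)
```

Because each stage only consumes the previous stage's structured output, any single model can be swapped or fine-tuned independently, which is the practical benefit of the decomposition described above.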
We perceive our system as an important step towards addressing more complex and multi-layered graphic design generation tasks in the future.
Figure: multi-layered editing example (original; edit text only; edit object only; final result).
@misc{jia2024cole,
title={COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design},
author={Peidong Jia and Chenxuan Li and Yuhui Yuan and Zeyu Liu and Yichao Shen and Bohan Chen and Xingru Chen and Yinglin Zheng and Dong Chen and Ji Li and Xiaodong Xie and Shanghang Zhang and Baining Guo},
year={2024},
eprint={2311.16974},
archivePrefix={arXiv},
primaryClass={cs.CV}
}