VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models

Ajay Jain*
UC Berkeley
Amber Xie*
UC Berkeley
Pieter Abbeel
UC Berkeley

Abstract

Diffusion models have shown impressive results in text-to-image synthesis. Using massive datasets of captioned images, diffusion models learn to generate raster images of highly diverse objects and scenes. However, designers frequently use vector representations of images like Scalable Vector Graphics (SVGs) for digital icons, graphics and stickers. Vector graphics can be scaled to any size, and are compact. In this work, we show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics. We do so without access to large datasets of captioned SVGs. Instead, inspired by recent work on text-to-3D synthesis, we vectorize a text-to-image diffusion sample and fine-tune with a Score Distillation Sampling loss. By optimizing a differentiable vector graphics rasterizer, our method distills abstract semantic knowledge out of a pretrained diffusion model. By constraining the vector representation, we can also generate coherent pixel art and sketches. Our approach, VectorFusion, produces more coherent graphics than prior works that optimize CLIP, a contrastive image-text model.


Example generated vectors

VectorFusion generates vector graphics from diverse captions. Search through SVGs in our gallery.


Infinitely scalable assets

Vector graphics are compact but can be scaled to arbitrary size while staying sharp. Caption: "a train. minimal flat 2d vector icon. lineal color. on a white background. trending on artstation."


One-stage text-to-SVG generation

We generate vector graphics from scratch by optimizing an image-text loss based on Score Distillation Sampling. VectorFusion takes an inverse graphics approach, enabled by the DiffVG differentiable SVG rasterizer.

Generating 128 SVG paths from scratch.
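To make the one-stage loop concrete, here is a minimal PyTorch sketch, not our exact training code: it initializes random closed cubic Bezier paths with pydiffvg and updates their control points and fill colors with a latent Score Distillation Sampling (SDS) gradient. It assumes a Stable Diffusion vae, unet, text_emb, and alphas_cumprod loaded elsewhere (e.g. with diffusers); classifier-free guidance, device placement, and augmentations are omitted, and hyperparameters are illustrative.

import torch
import pydiffvg

W = H = 512
shapes, groups = [], []
for i in range(128):  # 128 randomly initialized closed paths
    points = torch.rand(12, 2) * torch.tensor([W, H])  # 4 cubic segments
    path = pydiffvg.Path(
        num_control_points=torch.tensor([2, 2, 2, 2], dtype=torch.int32),
        points=points.requires_grad_(True),
        stroke_width=torch.tensor(0.0),
        is_closed=True)
    shapes.append(path)
    groups.append(pydiffvg.ShapeGroup(
        shape_ids=torch.tensor([i]),
        fill_color=torch.rand(4).requires_grad_(True)))

params = [s.points for s in shapes] + [g.fill_color for g in groups]
opt = torch.optim.Adam(params, lr=0.1)
render = pydiffvg.RenderFunction.apply

for step in range(2000):
    scene = pydiffvg.RenderFunction.serialize_scene(W, H, shapes, groups)
    img = render(W, H, 2, 2, step, None, *scene)   # (H, W, 4) RGBA
    alpha = img[:, :, 3:4]
    img = alpha * img[:, :, :3] + (1 - alpha)      # composite onto white
    img = img.permute(2, 0, 1).unsqueeze(0)        # (1, 3, H, W) in [0, 1]

    # Latent SDS: encode the rendering, perturb it at a random timestep,
    # and use w(t) * (eps_hat - eps) as a constant gradient on the latents.
    latents = vae.encode(img * 2 - 1).latent_dist.sample() * 0.18215
    t = torch.randint(50, 950, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * latents + (1 - a).sqrt() * noise
    with torch.no_grad():
        eps_hat = unet(noisy, t, encoder_hidden_states=text_emb).sample
    grad = (1 - a) * (eps_hat - noise)
    loss = (grad * latents).sum()  # d(loss)/d(latents) equals the SDS gradient
    opt.zero_grad(); loss.backward(); opt.step()
    for g in groups:               # keep RGBA fill colors valid
        g.fill_color.data.clamp_(0.0, 1.0)

Because eps_hat is computed without gradients, (grad * latents).sum() backpropagates exactly the SDS gradient through the VAE encoder and the DiffVG rasterizer to the path control points and colors.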

Fine-tuning for better quality and speed

VectorFusion also supports a more efficient, higher-quality multi-stage setting. First, our method samples raster images from the Stable Diffusion text-to-image diffusion model, then traces those samples automatically with LIVE. However, the raster samples are often dull, difficult to vectorize, or missing details from the text. Fine-tuning with the Score Distillation Sampling loss improves vibrancy and consistency with the text.

Generating 64 SVG paths, initialized from a diffusion sample.
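As a rough illustration of the three stages, the sketch below chains sampling, tracing, and fine-tuning. Here trace_with_live is a hypothetical wrapper around the LIVE vectorizer (the released LIVE code is its own optimization loop rather than a one-call API), and the model id and sample count are illustrative.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
prompt = ("a train. minimal flat 2d vector icon. lineal color. "
          "on a white background. trending on artstation.")

# Stage 1: sample raster candidates from the text-to-image model.
samples = pipe(prompt, num_images_per_prompt=4).images

# Stage 2: trace one sample into an initial set of 64 SVG paths.
# (Hypothetical wrapper: LIVE itself adds and optimizes paths layer by layer.)
shapes, groups = trace_with_live(samples[0], num_paths=64)

# Stage 3: fine-tune the traced paths and colors with the latent SDS loop
# from the one-stage sketch above, improving vibrancy and text consistency.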

Pixel art

Following Pixray, we restrict SVG paths to squares on a grid, which lets VectorFusion generate a retro video game pixel art style.

Pixel art generated with VectorFusion, initialized by pixelating a diffusion sample.
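A minimal sketch of this constraint, assuming pydiffvg's Rect shape: the square geometry is frozen and only the per-cell fill colors receive gradients from the same SDS loss. The grid resolution and learning rate are illustrative.

import torch
import pydiffvg

W = H = 512
GRID = 32                       # a 32x32 grid of "pixels"
cell = W / GRID
shapes, groups, colors = [], [], []
for row in range(GRID):
    for col in range(GRID):
        x0, y0 = col * cell, row * cell
        # Fixed axis-aligned square; its geometry is NOT optimized.
        square = pydiffvg.Rect(p_min=torch.tensor([x0, y0]),
                               p_max=torch.tensor([x0 + cell, y0 + cell]))
        color = torch.rand(4).requires_grad_(True)  # only colors get gradients
        shapes.append(square)
        groups.append(pydiffvg.ShapeGroup(
            shape_ids=torch.tensor([len(shapes) - 1]), fill_color=color))
        colors.append(color)

opt = torch.optim.Adam(colors, lr=0.02)
# ...then run the same latent SDS loop as above over shapes/groups.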

Sketches

Our method extends naturally to text-to-sketch generation. We start by drawing 16 random strokes, then minimize our latent Score Distillation Sampling loss to learn an abstract line drawing that reflects the user-supplied text.

Sketches generated with VectorFusion.
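A minimal sketch of this setting: 16 open cubic Bezier strokes with a fixed black stroke color, whose control points alone are optimized with the latent SDS loss. Stroke width and jitter scale are illustrative.

import torch
import pydiffvg

W = H = 512
shapes, groups = [], []
for i in range(16):
    # One open cubic segment: 4 control points jittered around a random start.
    start = torch.rand(1, 2) * torch.tensor([W, H])
    points = (start + torch.randn(4, 2) * 30).requires_grad_(True)
    stroke = pydiffvg.Path(
        num_control_points=torch.tensor([2], dtype=torch.int32),
        points=points,
        stroke_width=torch.tensor(3.0),
        is_closed=False)
    shapes.append(stroke)
    groups.append(pydiffvg.ShapeGroup(
        shape_ids=torch.tensor([i]),
        fill_color=None,
        stroke_color=torch.tensor([0.0, 0.0, 0.0, 1.0])))

opt = torch.optim.Adam([s.points for s in shapes], lr=0.1)
# ...optimize with the same latent SDS loop as in the one-stage setting.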

Citation

@article{jain2022vectorfusion,
  author = {Jain, Ajay and Xie, Amber and Abbeel, Pieter},
  title = {VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models},
  journal = {arXiv},
  year = {2022},
}