AI Text-to-Image Generation, Video/3D Generation Papers/Models
1. DALL-E, Zero-Shot Text-to-Image Generation (2021.1 OpenAI, text-to-image)
DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. The public repo releases only the discrete VAE used to tokenize images, not the text-to-image transformer (usage sketch after this entry).
- site : https://openai.com/blog/dall-e/
- paper : https://arxiv.org/abs/2102.12092
- github/source : https://github.com/openai/DALL-E
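The repo above ships only the discrete VAE that tokenizes a 256×256 image into a 32×32 grid of codes from an 8192-entry codebook. A minimal round-trip sketch using that package and its published checkpoints, simplified from the repo's usage notebook (the random input image is just for illustration):

```python
import torch
import torch.nn.functional as F
from dall_e import map_pixels, unmap_pixels, load_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Pretrained dVAE encoder/decoder published alongside the repo
enc = load_model("https://cdn.openai.com/dall-e/encoder.pkl", device)
dec = load_model("https://cdn.openai.com/dall-e/decoder.pkl", device)

# x: a (1, 3, 256, 256) image tensor in [0, 1], squashed into the dVAE's
# expected range by map_pixels (a random image here, for illustration)
x = map_pixels(torch.rand(1, 3, 256, 256, device=device))

# Encode to a 32x32 grid of discrete token ids
z = torch.argmax(enc(x), dim=1)

# Decode the one-hot codes back to an image reconstruction
z_onehot = F.one_hot(z, num_classes=enc.vocab_size).permute(0, 3, 1, 2).float()
x_rec = unmap_pixels(torch.sigmoid(dec(z_onehot).float()[:, :3]))
```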
2. Disco Diffusion (2021.10 Open Project, text-to-image)
A Frankensteinian amalgamation of notebooks, models, and techniques for generating AI art and animations, built around CLIP-guided diffusion (sketch after this entry).
- paper : none
- github/source : https://github.com/alembics/disco-diffusion
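Disco Diffusion's core mechanism is CLIP guidance: at each denoising step, the current image estimate is nudged along the gradient of its CLIP similarity to the prompt. A minimal sketch of that guidance gradient (not Disco's exact code), assuming the openai/CLIP package:

```python
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

def clip_guidance_grad(model, image, text_features, scale=1000.0):
    """Gradient that pulls a denoised image estimate toward the prompt.

    image: (B, 3, H, W) tensor in [0, 1]; text_features: encoded prompt.
    In a guided sampler, this gradient perturbs each denoising step.
    """
    image = image.detach().requires_grad_(True)
    # CLIP expects 224x224 inputs normalized with its own statistics
    x = F.interpolate(image, size=224, mode="bicubic", align_corners=False)
    mean = x.new_tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
    std = x.new_tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)
    image_features = model.encode_image((x - mean) / std)
    loss = -F.cosine_similarity(image_features, text_features).mean() * scale
    return torch.autograd.grad(loss, image)[0]

model, _ = clip.load("ViT-B/32", device="cpu")
text_features = model.encode_text(clip.tokenize(["a lighthouse in a storm"]))
grad = clip_guidance_grad(model, torch.rand(1, 3, 512, 512), text_features)
```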
3. DALL-E 2, Hierarchical Text-Conditional Image Generation with CLIP Latents (2022.4 OpenAI, text-to-image)
DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.
- site : https://openai.com/dall-e-2 (paid service)
- paper : https://arxiv.org/abs/2204.06125v1
- github/source : not open
4. Midjourney (2022.3, text-to-image)
A text-to-image generation service operated through a Discord bot.
- Discord paid service
5. Imagen, Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (2022.5 Google, text-to-image)
Unprecedented photorealism combined with a deep level of language understanding; sampling relies on dynamic thresholding to tolerate large guidance weights (sketch after this entry).
- site : https://imagen.research.google
- paper : https://arxiv.org/abs/2205.11487
- github/source : not open
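One concrete, reproducible piece of the Imagen paper is dynamic thresholding, which keeps samples from saturating under large classifier-free guidance weights. A minimal sketch of that step as described in the paper (percentile 99.5% is the paper's choice):

```python
import torch

def dynamic_threshold(x0: torch.Tensor, percentile: float = 0.995) -> torch.Tensor:
    """Clamp a predicted clean image x0 (range ~[-1, 1]) per sample.

    s is the given percentile of |x0|; when s > 1, clamp to [-s, s] and
    rescale, instead of statically clipping to [-1, 1].
    """
    s = torch.quantile(x0.abs().flatten(start_dim=1), percentile, dim=1)
    s = s.clamp(min=1.0).view(-1, *([1] * (x0.dim() - 1)))
    return x0.clamp(-s, s) / s
```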
6. Stable Diffusion (2022.8 stability.ai, text-to-image)
Public open release of a text-to-image latent diffusion model, with code and weights available (minimal sampling sketch after this entry).
- site : https://stability.ai/blog/stable-diffusion-public-release
- service : https://beta.dreamstudio.ai/
- paper : https://arxiv.org/abs/2112.10752v2
- github/source : https://github.com/CompVis/stable-diffusion
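Since the weights are public, a few lines suffice to sample from it. A minimal sketch using the Hugging Face diffusers wrapper (one common way to run it, not the only one), assuming a CUDA GPU and the original v1-4 checkpoint:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the public v1-4 checkpoint in half precision
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# 50 steps and guidance scale ~7.5 are the usual defaults
image = pipe(
    "a photograph of an astronaut riding a horse",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("astronaut.png")
```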
7. MDM: Human Motion Diffusion Model (2022.9, Tel Aviv University (Israel), text-to-motion (3D))
We show that our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion. The model predicts the clean motion signal itself at each diffusion step rather than the noise (sketch after this entry).
- site : https://guytevet.github.io/mdm-page
- paper : https://arxiv.org/abs/2209.14916
- github/source : https://github.com/GuyTevet/motion-diffusion-model
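A distinctive design choice in MDM is that the network predicts the clean motion x̂0 at every diffusion step instead of the noise, so geometric losses (joint positions, foot contact, velocity) can be applied directly to the prediction. A minimal sketch of that training step, with `model` a stand-in for MDM's conditional transformer:

```python
import torch

def mdm_training_step(model, x0, cond, alphas_cumprod):
    """One simple-loss step where the model predicts x0, not epsilon.

    x0: clean motion sequence (batch first); cond: text/action conditioning.
    `model` is a hypothetical stand-in for MDM's transformer.
    """
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    # Standard forward-diffusion corruption of the motion sequence
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * torch.randn_like(x0)
    x0_hat = model(x_t, t, cond)
    # Geometric losses would attach to x0_hat here as well
    return torch.mean((x0 - x0_hat) ** 2)
```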
8. DreamFusion: Text-to-3D using 2D Diffusion (2022.9, Google, UC Berkeley, text-to-3D)
Generates 3D models from a text prompt by optimizing a NeRF against a frozen 2D text-to-image diffusion model via Score Distillation Sampling, with no 3D training data (sketch after this entry).
- site : https://dreamfusion3d.github.io
- paper : https://arxiv.org/abs/2209.14988
- github/source : not open
- unofficial implementation (Imagen swapped for Stable Diffusion) : https://github.com/ashawkey/stable-dreamfusion
A PyTorch implementation of the text-to-3D model DreamFusion, powered by the Stable Diffusion text-to-2D model.
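The key idea is Score Distillation Sampling (SDS): render the current 3D scene, corrupt the render with noise, and use the frozen diffusion model's denoising residual as a gradient for the scene parameters, never backpropagating through the diffusion U-Net itself. A minimal sketch of that gradient, with `unet` a stand-in for the frozen noise predictor:

```python
import torch

def sds_loss(unet, alphas_cumprod, render, text_emb):
    """Score Distillation Sampling step on a differentiable render.

    unet(x_t, t, text_emb) -> predicted noise; `render` is the image
    produced by the NeRF at a sampled camera, with grads to its weights.
    """
    b = render.shape[0]
    # Sample a timestep away from the extremes, as in common practice
    t = torch.randint(20, 980, (b,), device=render.device)
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(render)
    x_t = a.sqrt() * render + (1.0 - a).sqrt() * noise

    with torch.no_grad():  # the diffusion model stays frozen
        eps_hat = unet(x_t, t, text_emb)

    w = 1.0 - a  # one common choice of the weighting w(t)
    grad = (w * (eps_hat - noise)).detach()
    # A surrogate loss whose gradient w.r.t. `render` is exactly `grad`
    return (grad * render).sum()
```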
9. Make-A-Video (2022.9, Meta AI, text-to-video)
We propose Make-A-Video, an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). The image U-Net is inflated with factorized space-time layers (sketch after this entry).
- site : https://make-a-video.github.io/
- paper : https://arxiv.org/abs/2209.14792
- github/source : not open
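Make-A-Video adapts a pretrained text-to-image U-Net to video using factorized space-time ("pseudo-3D") layers: a 2D spatial convolution followed by a 1D temporal convolution initialized as the identity, so the inflated network starts out equal to the image model. The paper's exact modules are not public; a minimal sketch of such a layer:

```python
import torch
import torch.nn as nn

class Pseudo3DConv(nn.Module):
    """Factorized space-time convolution: 2D spatial conv, then 1D temporal conv."""

    def __init__(self, dim, kernel=3):
        super().__init__()
        self.spatial = nn.Conv2d(dim, dim, kernel, padding=kernel // 2)
        self.temporal = nn.Conv1d(dim, dim, kernel, padding=kernel // 2)
        # Identity-init the temporal conv so the net starts as the image model
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x):  # x: (batch, channels, time, height, width)
        b, c, t, h, w = x.shape
        # Spatial conv over each frame independently
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        x = self.spatial(x)
        # Temporal conv over each pixel location independently
        x = x.reshape(b, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        x = self.temporal(x)
        return x.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)
```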
10. Phenaki: Variable Length Video Generation From Open Domain Textual Description (2022.10, Google, text-to-video)
We present Phenaki, a model capable of realistic video synthesis, given a sequence of textual prompts. Video is compressed into discrete tokens and generated with a bidirectional masked transformer (sampling sketch after this entry).
- site : https://phenaki.video/
- paper : https://arxiv.org/abs/2210.02399
- github/source : not open
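Phenaki tokenizes video with a causal video tokenizer (C-ViViT) and generates those tokens MaskGIT-style: predict every masked position in parallel, keep the most confident predictions, and re-mask the rest on a schedule. Phenaki's code is not open; a generic sketch of that decoding loop, with `predict_logits` a hypothetical stand-in for the transformer (whose embedding table must include the extra [MASK] id):

```python
import math
import torch

def masked_token_sample(predict_logits, seq_len, vocab_size, steps=12):
    """MaskGIT-style parallel decoding over a token sequence.

    predict_logits(tokens) -> (seq_len, vocab_size) logits; the id
    `vocab_size` is reserved as the [MASK] token.
    """
    MASK = vocab_size
    tokens = torch.full((seq_len,), MASK, dtype=torch.long)
    for step in range(steps):
        conf, pred = predict_logits(tokens).softmax(-1).max(-1)
        masked = tokens == MASK
        tokens = torch.where(masked, pred, tokens)
        # Already-fixed tokens get max confidence so they are never re-masked
        conf = torch.where(masked, conf, torch.ones_like(conf))
        # Cosine schedule for how many tokens stay masked next round
        n_mask = int(seq_len * math.cos(math.pi / 2 * (step + 1) / steps))
        if n_mask > 0:
            tokens[conf.argsort()[:n_mask]] = MASK  # least confident first
    return tokens
```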
11. Imagen Video (2022.10, Google, text-to-video)
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models, trained with the v-parameterization (sketch after this entry).
- site : https://imagen.research.google/video/
- paper : https://arxiv.org/abs/2210.02303
- github/source : not open
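Per the paper, the cascade's models use the v-prediction parameterization (from Salimans & Ho's progressive-distillation work), which the authors found avoids artifacts such as color shift at high resolutions. A sketch of the parameterization's algebra, not Imagen Video code:

```python
import torch

def v_target(x0, eps, alpha_t, sigma_t):
    """v-parameterization training target: v = alpha_t * eps - sigma_t * x0."""
    return alpha_t * eps - sigma_t * x0

def x0_from_v(x_t, v, alpha_t, sigma_t):
    """Recover the clean-sample prediction from a predicted v, given
    x_t = alpha_t * x0 + sigma_t * eps and alpha_t**2 + sigma_t**2 == 1."""
    return alpha_t * x_t - sigma_t * v
```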