AI Text-to-Image Generation, Video/3D Generation Papers/Models
1. DALL-E, Zero-Shot Text-to-Image Generation (2021.1 OpenAI, text-to-image)
DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. The public repo releases only the discrete VAE used to tokenize images, not the text-to-image transformer (usage sketch after this entry).
- site : https://openai.com/blog/dall-e/
- paper : https://arxiv.org/abs/2102.12092
- github/source : https://github.com/openai/DALL-E
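The repo above ships only the discrete VAE that tokenizes a 256×256 image into a 32×32 grid of codes from an 8192-entry codebook. A minimal round-trip sketch using that package and its published checkpoints, simplified from the repo's usage notebook (the random input image is just for illustration):

```python
import torch
import torch.nn.functional as F
from dall_e import map_pixels, unmap_pixels, load_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Pretrained dVAE encoder/decoder published alongside the repo
enc = load_model("https://cdn.openai.com/dall-e/encoder.pkl", device)
dec = load_model("https://cdn.openai.com/dall-e/decoder.pkl", device)

# x: a (1, 3, 256, 256) image tensor in [0, 1], squashed into the dVAE's
# expected range by map_pixels (a random image here, for illustration)
x = map_pixels(torch.rand(1, 3, 256, 256, device=device))

# Encode to a 32x32 grid of discrete token ids
z = torch.argmax(enc(x), dim=1)

# Decode the one-hot codes back to an image reconstruction
z_onehot = F.one_hot(z, num_classes=enc.vocab_size).permute(0, 3, 1, 2).float()
x_rec = unmap_pixels(torch.sigmoid(dec(z_onehot).float()[:, :3]))
```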
2. Disco Diffusion (2021.10 Open Project, text-to-image)
A Frankensteinian amalgamation of notebooks, models, and techniques for generating AI art and animations, built around CLIP-guided diffusion (sketch after this entry).
- paper : none
- github/source : https://github.com/alembics/disco-diffusion
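Disco Diffusion's core mechanism is CLIP guidance: at each denoising step, the current image estimate is nudged along the gradient of its CLIP similarity to the prompt. A minimal sketch of that guidance gradient (not Disco's exact code), assuming the openai/CLIP package:

```python
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

def clip_guidance_grad(model, image, text_features, scale=1000.0):
    """Gradient that pulls a denoised image estimate toward the prompt.

    image: (B, 3, H, W) tensor in [0, 1]; text_features: encoded prompt.
    In a guided sampler, this gradient perturbs each denoising step.
    """
    image = image.detach().requires_grad_(True)
    # CLIP expects 224x224 inputs normalized with its own statistics
    x = F.interpolate(image, size=224, mode="bicubic", align_corners=False)
    mean = x.new_tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
    std = x.new_tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)
    image_features = model.encode_image((x - mean) / std)
    loss = -F.cosine_similarity(image_features, text_features).mean() * scale
    return torch.autograd.grad(loss, image)[0]

model, _ = clip.load("ViT-B/32", device="cpu")
text_features = model.encode_text(clip.tokenize(["a lighthouse in a storm"]))
grad = clip_guidance_grad(model, torch.rand(1, 3, 512, 512), text_features)
```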
3. DALL-E 2, Hierarchical Text-Conditional Image Generation with CLIP Latents (2022.4 OpenAI, text-to-image)
DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.
- site : https://openai.com/dall-e-2 (paid service)
- paper : https://arxiv.org/abs/2204.06125v1
- github/source : not open
4. Midjourney (2022.3, text-to-image)
A text-to-image generation service operated through a Discord bot.
- Discord paid service
5. Imagen, Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (2022.5 Google, text-to-image)
Unprecedented photorealism combined with a deep level of language understanding; sampling relies on dynamic thresholding to tolerate large guidance weights (sketch after this entry).
- site : https://imagen.research.google
- paper : https://arxiv.org/abs/2205.11487
- github/source : not open
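One concrete, reproducible piece of the Imagen paper is dynamic thresholding, which keeps samples from saturating under large classifier-free guidance weights. A minimal sketch of that step as described in the paper (percentile 99.5% is the paper's choice):

```python
import torch

def dynamic_threshold(x0: torch.Tensor, percentile: float = 0.995) -> torch.Tensor:
    """Clamp a predicted clean image x0 (range ~[-1, 1]) per sample.

    s is the given percentile of |x0|; when s > 1, clamp to [-s, s] and
    rescale, instead of statically clipping to [-1, 1].
    """
    s = torch.quantile(x0.abs().flatten(start_dim=1), percentile, dim=1)
    s = s.clamp(min=1.0).view(-1, *([1] * (x0.dim() - 1)))
    return x0.clamp(-s, s) / s
```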
6. Stable Diffusion (2022.8 stability.ai, text-to-image)
Public open release of a text-to-image latent diffusion model, with code and weights available (minimal sampling sketch after this entry).
- site : https://stability.ai/blog/stable-diffusion-public-release
- service : https://beta.dreamstudio.ai/
- paper : https://arxiv.org/abs/2112.10752v2
- github/source : https://github.com/CompVis/stable-diffusion
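Since the weights are public, a few lines suffice to sample from it. A minimal sketch using the Hugging Face diffusers wrapper (one common way to run it, not the only one), assuming a CUDA GPU and the original v1-4 checkpoint:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the public v1-4 checkpoint in half precision
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# 50 steps and guidance scale ~7.5 are the usual defaults
image = pipe(
    "a photograph of an astronaut riding a horse",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("astronaut.png")
```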
7. MDM: Human Motion Diffusion Model (2022.9, Tel Aviv University (Israel), text-to-motion (3D))
We show that our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion. The model predicts the clean motion signal itself at each diffusion step rather than the noise (sketch after this entry).
- site : https://guytevet.github.io/mdm-page
- paper : https://arxiv.org/abs/2209.14916
- github/source : https://github.com/GuyTevet/motion-diffusion-model
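A distinctive design choice in MDM is that the network predicts the clean motion x̂0 at every diffusion step instead of the noise, so geometric losses (joint positions, foot contact, velocity) can be applied directly to the prediction. A minimal sketch of that training step, with `model` a stand-in for MDM's conditional transformer:

```python
import torch

def mdm_training_step(model, x0, cond, alphas_cumprod):
    """One simple-loss step where the model predicts x0, not epsilon.

    x0: clean motion sequence (batch first); cond: text/action conditioning.
    `model` is a hypothetical stand-in for MDM's transformer.
    """
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    # Standard forward-diffusion corruption of the motion sequence
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * torch.randn_like(x0)
    x0_hat = model(x_t, t, cond)
    # Geometric losses would attach to x0_hat here as well
    return torch.mean((x0 - x0_hat) ** 2)
```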
8. DreamFusion: Text-to-3D using 2D Diffusion (2022.9, Google, UC Berkeley, text-to-3D)
Generates 3D models from a text prompt by optimizing a NeRF against a frozen 2D text-to-image diffusion model via Score Distillation Sampling, with no 3D training data (sketch after this entry).
- site : https://dreamfusion3d.github.io
- paper : https://arxiv.org/abs/2209.14988
- github/source : not open
- unofficial implementation (Imagen swapped for Stable Diffusion) : https://github.com/ashawkey/stable-dreamfusion
A PyTorch implementation of the text-to-3D model DreamFusion, powered by the Stable Diffusion text-to-2D model.
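The key idea is Score Distillation Sampling (SDS): render the current 3D scene, corrupt the render with noise, and use the frozen diffusion model's denoising residual as a gradient for the scene parameters, never backpropagating through the diffusion U-Net itself. A minimal sketch of that gradient, with `unet` a stand-in for the frozen noise predictor:

```python
import torch

def sds_loss(unet, alphas_cumprod, render, text_emb):
    """Score Distillation Sampling step on a differentiable render.

    unet(x_t, t, text_emb) -> predicted noise; `render` is the image
    produced by the NeRF at a sampled camera, with grads to its weights.
    """
    b = render.shape[0]
    # Sample a timestep away from the extremes, as in common practice
    t = torch.randint(20, 980, (b,), device=render.device)
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(render)
    x_t = a.sqrt() * render + (1.0 - a).sqrt() * noise

    with torch.no_grad():  # the diffusion model stays frozen
        eps_hat = unet(x_t, t, text_emb)

    w = 1.0 - a  # one common choice of the weighting w(t)
    grad = (w * (eps_hat - noise)).detach()
    # A surrogate loss whose gradient w.r.t. `render` is exactly `grad`
    return (grad * render).sum()
```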
9. Make-A-Video (2022.9, Meta AI, text-to-video)
We propose Make-A-Video, an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). The image U-Net is inflated with factorized space-time layers (sketch after this entry).
- site : https://make-a-video.github.io/
- paper : https://arxiv.org/abs/2209.14792
- github/source : not open
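Make-A-Video adapts a pretrained text-to-image U-Net to video using factorized space-time ("pseudo-3D") layers: a 2D spatial convolution followed by a 1D temporal convolution initialized as the identity, so the inflated network starts out equal to the image model. The paper's exact modules are not public; a minimal sketch of such a layer:

```python
import torch
import torch.nn as nn

class Pseudo3DConv(nn.Module):
    """Factorized space-time convolution: 2D spatial conv, then 1D temporal conv."""

    def __init__(self, dim, kernel=3):
        super().__init__()
        self.spatial = nn.Conv2d(dim, dim, kernel, padding=kernel // 2)
        self.temporal = nn.Conv1d(dim, dim, kernel, padding=kernel // 2)
        # Identity-init the temporal conv so the net starts as the image model
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x):  # x: (batch, channels, time, height, width)
        b, c, t, h, w = x.shape
        # Spatial conv over each frame independently
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        x = self.spatial(x)
        # Temporal conv over each pixel location independently
        x = x.reshape(b, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        x = self.temporal(x)
        return x.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)
```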
10. Phenaki: Variable Length Video Generation From Open Domain Textual Description (2022.10, Google, text-to-video)
We present Phenaki, a model capable of realistic video synthesis, given a sequence of textual prompts. Video is compressed into discrete tokens and generated with a bidirectional masked transformer (sampling sketch after this entry).
- site : https://phenaki.video/
- paper : https://arxiv.org/abs/2210.02399
- github/source : not open
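Phenaki tokenizes video with a causal video tokenizer (C-ViViT) and generates those tokens MaskGIT-style: predict every masked position in parallel, keep the most confident predictions, and re-mask the rest on a schedule. Phenaki's code is not open; a generic sketch of that decoding loop, with `predict_logits` a hypothetical stand-in for the transformer (whose embedding table must include the extra [MASK] id):

```python
import math
import torch

def masked_token_sample(predict_logits, seq_len, vocab_size, steps=12):
    """MaskGIT-style parallel decoding over a token sequence.

    predict_logits(tokens) -> (seq_len, vocab_size) logits; the id
    `vocab_size` is reserved as the [MASK] token.
    """
    MASK = vocab_size
    tokens = torch.full((seq_len,), MASK, dtype=torch.long)
    for step in range(steps):
        conf, pred = predict_logits(tokens).softmax(-1).max(-1)
        masked = tokens == MASK
        tokens = torch.where(masked, pred, tokens)
        # Already-fixed tokens get max confidence so they are never re-masked
        conf = torch.where(masked, conf, torch.ones_like(conf))
        # Cosine schedule for how many tokens stay masked next round
        n_mask = int(seq_len * math.cos(math.pi / 2 * (step + 1) / steps))
        if n_mask > 0:
            tokens[conf.argsort()[:n_mask]] = MASK  # least confident first
    return tokens
```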
11. Imagen Video (2022.10, Google, text-to-video)
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models, trained with the v-parameterization (sketch after this entry).
- site : https://imagen.research.google/video/
- paper : https://arxiv.org/abs/2210.02303
- github/source : not open
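Per the paper, the cascade's models use the v-prediction parameterization (from Salimans & Ho's progressive-distillation work), which the authors found avoids artifacts such as color shift at high resolutions. A sketch of the parameterization's algebra, not Imagen Video code:

```python
import torch

def v_target(x0, eps, alpha_t, sigma_t):
    """v-parameterization training target: v = alpha_t * eps - sigma_t * x0."""
    return alpha_t * eps - sigma_t * x0

def x0_from_v(x_t, v, alpha_t, sigma_t):
    """Recover the clean-sample prediction from a predicted v, given
    x_t = alpha_t * x0 + sigma_t * eps and alpha_t**2 + sigma_t**2 == 1."""
    return alpha_t * x_t - sigma_t * v
```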