WonWizard AI Research and Development,
Android & Wizard Dreamer, Korea.
Projects and study. AI Video/Audio, AI Medical, AI Challenge...

AI Text-to-Image Generation, Video/3D Generation Papers/Models

1. DALL-E, Zero-Shot Text-to-Image Generation (2021.1 OpenAI, text-to-image)

DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs.

  • site : https://openai.com/blog/dall-e/
  • paper : https://arxiv.org/abs/2102.12092
  • github/source : https://github.com/openai/DALL-E
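
DALL·E treats an image as a sequence of discrete tokens and generates them autoregressively with a transformer, conditioned on the text tokens. The toy sketch below shows only that sampling scheme; the `fake_model` stand-in, the tiny vocabulary, and the 8-token "image" are all illustrative assumptions (the real system uses a dVAE codebook of 8192 image tokens and a 12-billion-parameter transformer):

```python
import numpy as np

# Toy sketch of autoregressive image-token generation: text tokens and
# image tokens form one sequence, and the model predicts each next image
# token given everything before it. fake_model is a stand-in returning
# pseudo-logits; in DALL-E it is a large transformer.
VOCAB_SIZE = 16     # illustrative; real DALL-E uses 8192 image codes
IMAGE_TOKENS = 8    # illustrative; real DALL-E generates 32x32 = 1024

rng = np.random.default_rng(0)

def fake_model(sequence):
    """Stand-in for the transformer: logits over the next token."""
    local = np.random.default_rng(sum(sequence) + len(sequence))
    return local.standard_normal(VOCAB_SIZE)

def sample_image_tokens(text_tokens, temperature=1.0):
    seq = list(text_tokens)
    for _ in range(IMAGE_TOKENS):
        logits = fake_model(seq) / temperature
        probs = np.exp(logits - logits.max())   # softmax over the vocab
        probs /= probs.sum()
        seq.append(int(rng.choice(VOCAB_SIZE, p=probs)))
    return seq[len(text_tokens):]               # the new image tokens

tokens = sample_image_tokens([3, 1, 4])
print(len(tokens))   # 8
```

In the real model the sampled tokens are then decoded back to pixels by the dVAE decoder.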

2. Disco Diffusion (2021.10 Open Project, text-to-image)

A frankensteinian amalgamation of notebooks, models, and techniques for the generation of AI art and animations.

  • paper : none
  • github/source : https://github.com/alembics/disco-diffusion

3. DALL-E 2, Hierarchical Text-Conditional Image Generation with CLIP Latents (2022.4 OpenAI, text-to-image)

DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.

  • site : https://openai.com/dall-e-2 (Paid Service)
  • paper : https://arxiv.org/abs/2204.06125v1
  • github/source : not open

4. Midjourney (2022.3 Midjourney, text-to-image)

A text-to-image generation bot accessed through Discord.

  • Discord Paid Service

5. Imagen, Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (2022.5 Google, text-to-image)

Imagen delivers an unprecedented degree of photorealism combined with a deep level of language understanding.

  • site : https://imagen.research.google
  • paper : https://arxiv.org/abs/2205.11487
  • github/source : not open

6. Stable Diffusion (2022.8 stability.ai, text-to-image)

Text-to-image model released publicly as open source.

  • site : https://stability.ai/blog/stable-diffusion-public-release
  • service : https://beta.dreamstudio.ai/
  • paper : https://arxiv.org/abs/2112.10752v2
  • github/source : https://github.com/CompVis/stable-diffusion
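
Stable Diffusion is a latent diffusion model: noise is gradually added to a compressed latent, and a learned network reverses that process step by step. The sketch below is a minimal toy of the underlying DDPM math only, assuming a 4-dimensional "latent" and an oracle that knows the true noise (the real model predicts it with a U-Net conditioned on text):

```python
import numpy as np

# Toy sketch of the diffusion process underlying Stable Diffusion.
# Illustrative only: a 4-dim vector stands in for the VAE latent image,
# and the true noise stands in for the learned noise predictor.
rng = np.random.default_rng(42)

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # standard DDPM noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

x0 = rng.standard_normal(4)           # clean "latent"
t = 500
eps = rng.standard_normal(4)          # noise added at step t

# Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps

# A trained model predicts eps from (x_t, t); given the exact eps we can
# invert the forward step, which is what each denoising step estimates:
x0_hat = (x_t - np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])

print(np.allclose(x0, x0_hat))   # True
```

Running diffusion in the VAE's latent space rather than pixel space is what makes the model cheap enough to run on consumer GPUs.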

7. MDM: Human Motion Diffusion Model (2022.9, Tel Aviv University (Israel), text-to-motion (3D))

We show that our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion.

  • site : https://guytevet.github.io/mdm-page
  • paper : https://arxiv.org/abs/2209.14916
  • github/source : https://github.com/GuyTevet/motion-diffusion-model

8. DreamFusion: Text-to-3D using 2D Diffusion (2022.9, Google & UC Berkeley, text-to-3D)

DreamFusion generates 3D models from text using only a pretrained 2D text-to-image diffusion model, with no 3D training data.

  • site : https://dreamfusion3d.github.io
  • paper : https://arxiv.org/abs/2209.14988
  • github/source : not open
  • unofficial implementation (Imagen swapped for Stable Diffusion) :
    https://github.com/ashawkey/stable-dreamfusion
    A PyTorch implementation of the text-to-3D model DreamFusion, powered by the Stable Diffusion text-to-2D model.

9. Make-A-Video (2022.9, Meta AI, text-to-video)

We propose Make-A-Video – an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

  • site : https://make-a-video.github.io/
  • paper : https://arxiv.org/abs/2209.14792
  • github/source : not open

10. Phenaki: Variable Length Video Generation From Open Domain Textual Description (2022.10, Google, text-to-video)

We present Phenaki, a model capable of realistic video synthesis, given a sequence of textual prompts.

  • site : https://phenaki.video/
  • paper : https://arxiv.org/abs/2210.02399
  • github/source : not open

11. Imagen Video (2022.10, Google, text-to-video)

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models.

  • site : https://imagen.research.google/video/
  • paper : https://arxiv.org/abs/2210.02303
  • github/source : not open
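
A "cascade of video diffusion models" means a base model generates a short, low-resolution video and later stages progressively upsample it. The toy below shows only that shape-level pipeline, with made-up sizes and nearest-neighbor repetition standing in for the super-resolution diffusion stages (the real system also upsamples temporally, adding frames):

```python
import numpy as np

# Toy sketch of a cascaded video pipeline: a base stage produces a tiny
# video, and each later stage doubles the spatial resolution. Here the
# "super-resolution model" is just nearest-neighbor repetition.
def fake_base_model(frames=4, h=8, w=8):
    # (frames, height, width, RGB channels); all-zero placeholder video
    return np.zeros((frames, h, w, 3))

def fake_superres_stage(video, factor=2):
    # Repeat pixels along height and width to double the resolution.
    return video.repeat(factor, axis=1).repeat(factor, axis=2)

video = fake_base_model()
for _ in range(3):            # three spatial super-resolution stages
    video = fake_superres_stage(video)

print(video.shape)            # (4, 64, 64, 3)
```

Cascading lets each stage stay small: the base model handles content and motion while the upsamplers only add detail.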
