Images, Audio & Video

Diffusion and Image Generation

Modern AI image generation is dominated by diffusion models, which learn to reverse a noise-adding process to produce images from...
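
To make the "noise-adding process" concrete, here is a minimal sketch of the forward (noising) half of a diffusion model, using only the Python standard library. It is an illustration under stated assumptions, not the method of any particular system: the linear noise schedule, the step count, and the function names (linear_beta_schedule, cumulative_signal, add_noise) are all hypothetical choices made for this example. A real model would also train a neural network to predict the added noise, and that learned prediction is what lets it run the process in reverse.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (assumed values, for illustration)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Running product of (1 - beta): the share of signal variance left at each step."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Jump straight to noise level t by mixing the pixels with Gaussian noise.

    Returns the noisy pixels and the noise that was mixed in. A denoising
    model would be trained to recover that noise from the noisy pixels and t.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal_scale * x + noise_scale * e for x, e in zip(pixels, noise)]
    return noisy, noise


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    alpha_bars = cumulative_signal(linear_beta_schedule(1000))
    image = [0.5, -0.25, 1.0, 0.0]  # a toy four-pixel "image"
    for t in (0, 499, 999):
        noisy, _ = add_noise(image, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 4), [round(v, 3) for v in noisy])
```

At early steps the output stays close to the original pixels; by the last step almost no signal remains and the result is close to pure Gaussian noise. Generation starts from such noise and removes it step by step.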

Multimodal models process and reason across more than one type of data, combining text with images, audio, and video in a single m...

Speech and audio AI covers the full pipeline from human voice to machine-generated sound, including transcription, synthesis, voic...