• Open Daily: 10am - 10pm
    Alley-side Pickup: 10am - 7pm

    3038 Hennepin Ave Minneapolis, MN
    612-822-4611

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paperback

Series: Foundations and Trends® in Computer Graphics and Vision

General Computers, Programming

ISBN10: 1638283362
ISBN13: 9781638283362
Publisher: Now Publishers
Published: May 6 2024
Pages: 230
Weight: 0.72
Height: 0.48 Width: 6.14 Depth: 9.21
Language: English
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants.


The survey covers five core topics, grouped into two classes. The first class surveys well-established research areas: multimodal foundation models pre-trained for specific purposes, covering two topics, namely methods of learning vision backbones for visual understanding, and text-to-image generation. The second class covers recent advances in exploratory, open research areas: multimodal foundation models that aim to play the role of general-purpose assistants, covering three topics, namely unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs.

Also from

Li, Chunyuan

Also in

Programming