Overview

This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models with vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants. It covers five core topics, organized into two classes: (i) a survey of well-established research areas, namely multimodal foundation models pre-trained for specific purposes, spanning two topics: methods of learning vision backbones for visual understanding, and text-to-image generation; and (ii) recent advances in exploratory, open research areas, namely multimodal foundation models that aim to play the role of general-purpose assistants, spanning three topics: unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs. The target audience is researchers, graduate students, and professionals in the computer vision and vision-language multimodal communities who want to learn the basics and recent advances in multimodal foundation models.

Full Product Details

Author: Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang
Publisher: now publishers Inc
Imprint: now publishers Inc
Weight: 0.330kg
ISBN: 9781638283362
ISBN 10: 1638283362
Pages: 230
Publication Date: 06 May 2024
Audience: Professional and scholarly, Professional & Vocational
Format: Paperback
Publisher's Status: Active
Availability: In Print

Table of Contents

1. Introduction
2. Visual Understanding
3. Visual Generation
4. Unified Vision Models
5. Large Multimodal Models: Training with LLMs
6. Multimodal Agents: Chaining Tools with LLM
7. Conclusion and Research Trends
Acknowledgments
References

Countries Available: All regions