
A Streamlit-based Multimodal AI Generator using Google's Gemini API for text generation and image captioning (image-to-text).


Eswarpuli/genai-multimodal-app


Gemini AI - Multimodal Generator (Text + Image)

This is a multimodal AI web app built with Streamlit and powered by Google's Gemini 2.5 API, enabling both text-to-text and image-to-text generation.

Live App


🔧 Features

  • 🤖 Text Generation: Enter a prompt and receive a contextual text response generated by Gemini's language model.
  • 🖼️ Image Captioning: Upload an image and get a detailed description from Gemini's vision capabilities.
  • ⚡ Updated UI: A clean, intuitive layout for an improved user experience.
  • 🌐 Built with Streamlit for responsive, real-time interaction.
  • 🧠 Powered by Google Generative AI (Gemini's multimodal language and vision capabilities).
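The app's source is not reproduced in this README, so the following is only a minimal sketch of how the two features could map onto a single Gemini `generateContent` REST request, using just the Python standard library. Text generation sends a text part alone; image captioning adds the image as base64-encoded inline data. The model id `gemini-2.5-flash` and the helper names `build_request`, `extract_text`, and `generate` are assumptions for illustration; the app itself may use Google's Python SDK instead of raw HTTP.

```python
import base64
import json
import urllib.request

# Assumption: the README only says "Gemini 2.5"; this specific model id is illustrative.
MODEL = "gemini-2.5-flash"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_request(prompt, image_bytes=None, mime_type="image/jpeg"):
    """Build a generateContent request body (hypothetical helper).

    Text generation: only the prompt is sent as a text part.
    Image captioning: the image is appended as base64-encoded inline data.
    """
    parts = [{"text": prompt}]
    if image_bytes is not None:
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        })
    return {"contents": [{"parts": parts}]}


def extract_text(response):
    """Join the text parts of the first candidate in a parsed JSON response."""
    candidates = response.get("candidates", [])
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


def generate(body, api_key):
    """POST the request body to the Gemini endpoint and return the generated text."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

For captioning, the image bytes could come from a Streamlit upload: `st.file_uploader` returns a file-like object whose `getvalue()` yields the raw bytes, which would then be passed as `generate(build_request("Describe this image.", data, "image/png"), api_key)`. Keeping request building and response parsing separate from the network call makes both easy to test without an API key.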

๐Ÿ› ๏ธ Tech Stack

  • Python
  • Streamlit
  • Google Generative AI (Gemini 2.5)
  • PIL (Python Imaging Library)

📌 Use Cases

  • Natural Language Text Completion
  • Image Understanding / Caption Generation
  • AI Demos and Multimodal Prototypes

📸 Preview

App Preview
