duixcom / Duix-Avatar

🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.

cloning video-generation digital-human cloning-tool ai-avatar ai-avatars video-synthesis multimodal-ai

Updated Oct 16, 2025
C

lancedb / vectordb-recipes

Resources, examples & tutorials for multimodal AI, RAG and agents using vector search and LLMs

machine-learning ai deep-learning embeddings openai gpt agents fine-tuning multimodal rag vector-database llms langchain llama-index lancedb gpt-4-vision multimodal-ai

Updated Jan 14, 2026
Jupyter Notebook

AutoArk / EVA-OS

EVA OS: a real-time multimodal AIOS for next-generation hardware, enabling your devices to be “alive” and as intelligent as a real brain.

real-time robotics webrtc smart-devices voice-assistant aios multimodal-ai

Updated Jan 29, 2026
TypeScript

waybarrios / vllm-mlx

OpenAI-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s.

macos machine-learning text-to-speech computer-vision inference tts speech-to-text stt video-understanding audio-processing mlx image-understanding apple-silicon llm mllm vision-language-model vllm multimodal-ai

Updated Jan 29, 2026
Python

Denis2054 / Building-Business-Ready-Generative-AI-Systems

This GitHub repository contains the complete code for building Business-Ready Generative AI Systems (GenAISys) from scratch. It guides you through architecting and implementing advanced AI controllers, intelligent agents, and dynamic RAG frameworks. The projects demonstrate practical applications across various domains.

multi-agent-systems ai-agents rag human-centered-ai llms chain-of-thought enterprise-ai agentic-ai ai-architecture multimodal-ai deepseek-r1 context-engineering generative-ai-systems

Updated Aug 9, 2025
Jupyter Notebook

athrael-soju / Snappy

🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐

python docker typescript computer-vision nextjs document-retrieval rag fastapi vector-search document-understanding pdf-search vector-database vision-ai qdrant colpali multimodal-ai multivector-search deepseek-ocr visual-retrieval

Updated Dec 25, 2025
Python

thubZ09 / vision-language-model-research

Hub for researchers exploring VLMs and multimodal learning :)

nlp machine-learning research computer-vision deep-learning multimodal-learning multimodal-deep-learning vision-language multimodal-large-language-models vlms multimodal-ai

Updated Jan 26, 2026

sbhjt-gr / InferrLM

On-device AI for iOS & Android

embeddings gemini http-server openai document-processing rag edge-ai on-device-ai local-inference anthropic llamacpp llama-cpp local-llm gguf deepseek multimodal-ai

Updated Jan 28, 2026
TypeScript

kiranbaby14 / TalkMateAI

🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync

websocket nextjs vlm fastapi huggingface whisper-ai flash-attention-2 multimodal-ai kokoro-tts smolvlm

Updated Jul 5, 2025
TypeScript

seehiong / prompt-to-puzzle

A web app that dynamically generates playable 'Spot the Difference' games from a single text prompt using a multimodal pipeline with Google's Gemini and Imagen models.

react game typescript computer-vision html5-canvas puzzle-game generative-art text-to-image hackathon-project appwrite google-cloud-run generative-ai google-gemini google-ai-studio spot-the-difference multimodal-ai google-imagen

Updated Sep 13, 2025
TypeScript

alperensumeroglu / ai-clips-maker

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing for creators and automated workflows.

Updated Apr 2, 2025
Python

DmitryRyumin / ICML-2025-Papers

ICML 2025 Papers: Dive into cutting-edge research from the premier machine learning conference. Stay current with breakthroughs in deep learning, generative AI, optimization, reinforcement learning, and beyond. Code implementations included. ⭐ support the future of machine learning research!

machine-learning reinforcement-learning deep-learning optimization reinforcement-learning-algorithms icml ai-research graph-learning diffusion-models generative-ai multimodal-ai icml-2025

Updated Oct 24, 2025

KrishnaswamyLab / ImmunoStruct

[Nature Machine Intelligence] ImmunoStruct enables multimodal deep learning for immunogenicity prediction

Updated Jan 20, 2026
Python

sinanuozdemir / oreilly-multimodal-ai

Learn how multimodal AI merges text, image, and audio for smarter models

openai diffusion multimodal deepgram livekit stable-diffusion dreambooth generative-ai llava dalle-3 llama3 multimodal-ai

Updated Jan 21, 2025
Jupyter Notebook

neocortex-link / neocortex-unity-sdk

Neocortex Unity SDK for Smart NPCs and Virtual Assistants

ai game-development npc npcs game-ai ai-agents conversational-ai smart-agent ai-tools ai-agent aiagent smart-agents aiagents multimodal-ai smart-npc smart-npcs unity-llm unityllm

Updated Jan 28, 2026
C#

microsoft / multimodal-ai

Enterprise-ready solution leveraging multimodal Generative AI (Gen AI) to enhance existing or new applications beyond text, implementing RAG, image classification, video analysis, and advanced image embeddings.

python ai azure video-analysis azure-ai enterprise-ai multimodal-ai

Updated Jan 27, 2026
HCL

doepking / gemini_multimodal_demo

A demo multimodal AI chat application built with Streamlit and Google's Gemini model. Features include: secure Google OAuth, persistent data storage with Cloud SQL (PostgreSQL), and intelligent function calling. Includes a persona-based newsletter engine to deliver personalized insights.

postgresql google-cloud smtp cloud-sql cloud-run gemini-ai multimodal-ai

Updated Aug 3, 2025
Python

byerlikaya / SmartRAG

⚡ Production-ready .NET Standard 2.1 RAG library with 🤖 multi-AI provider support, 🏢 enterprise vector storage, 📄 intelligent document processing, and 🗄️ multi-database query coordination. 🌍 Cross-platform compatible.

Updated Jan 28, 2026
C#

michaelbeijer / Supervertaler

Open source, AI-enhanced CAT tool with multi-LLM support (OpenAI, Claude, Gemini, Ollama), innovative Superlookup concordance system offering access to multiple terminology sources (TMs, glossaries, web resources, etc.), and seamless CAT tool integration (memoQ, Trados, CafeTran, Phrase).

python nlp translation ai localization gemini ahk translation-memory prompts memoq claude cafetran translation-tool cat-tool llm prompt-engineering multimodal-ai context-aware-translation

Updated Jan 28, 2026
Python

umitkacar / awesome-vision-models

Vision Foundation Models: SAM, ViT, CLIP, DINOv2, object detection, segmentation, and multimodal AI for computer vision.

computer-vision sam yolo image-recognition object-detection vit clip semantic-segmentation zero-shot-learning mae instance-segmentation vision-transformers foundation-models visual-understanding open-vocabulary dinov2 grounding-dino multimodal-ai

Updated Nov 10, 2025
Makefile

multimodal-ai

Here are 207 public repositories matching this topic...
