Chromium Browser with Component Extraction & AI Agent System

A full-featured Chromium-based browser built with Electron that includes advanced component extraction and composition capabilities, plus a powerful AI Agent system for natural language system control.

🤖 AI Agent System - 39 Capabilities for Total System Control

Control your computer using natural language! The AI Agent system provides 39 capabilities across 5 modules:

🎯 What You Can Do

System Control (18 capabilities): Open/close apps, manage services, analyze performance
File Management (10 capabilities): Create, edit, view, organize files intelligently
Process Management (3 capabilities): Monitor and control processes and services
Task Automation (6 capabilities): Schedule tasks, create workflows, save macros
AI Chat (2 capabilities): Natural conversation and intelligent Q&A

🚀 Quick Start

Click the 🤖 Agent Chat button in the bottom right
Type natural language commands like:
- system info - View CPU, RAM, disk usage
- find slow processes - Find resource hogs
- list processes - Show running processes
- open calculator - Launch apps
- check disk - View disk space
- help - See all 39 capabilities

📚 Documentation

QUICK_START.md - Get started in 60 seconds
AGENT_COMMANDS.md - Complete command reference
AGENT_SYSTEM.md - Architecture details

⚠️ Important: How Commands Work

The system shows capability names like open_application, but you need to use natural language:

❌ Don't type: open_application
✅ Type instead: open notepad

See QUICK_START.md for full details!
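
To illustrate why the capability name itself is not a valid command, here is a minimal sketch of how a natural-language phrase might be resolved to a capability. It is not the project's actual parser. Only `open_application` comes from this README; `system_info`, `list_processes`, the `RULES` table, and `resolveCapability` are hypothetical names used for illustration.

```javascript
// Hypothetical rule table: each rule maps a phrase pattern to a capability name.
// Only "open_application" appears in this README; the other names are assumed.
const RULES = [
  { pattern: /^open\s+(.+)$/i, capability: "open_application" },
  { pattern: /^system info$/i, capability: "system_info" },       // assumed name
  { pattern: /^list processes$/i, capability: "list_processes" }, // assumed name
];

// Return the first matching capability and its argument, or null if nothing matches.
function resolveCapability(input) {
  const text = input.trim();
  for (const { pattern, capability } of RULES) {
    const match = text.match(pattern);
    if (match) {
      // match[1] is undefined for patterns without a capture group
      return { capability, argument: match[1] ?? null };
    }
  }
  return null;
}

console.log(resolveCapability("open notepad"));     // matches the "open <app>" rule
console.log(resolveCapability("open_application")); // null: no phrase rule matches
```

Under these assumed rules, typing the raw capability name falls through every pattern, which mirrors the behavior described above.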

Problem Statement

The Challenge: Modern workflows require monitoring multiple web applications, dashboards, and data sources simultaneously. Users often need to switch between dozens of tabs, losing focus and productivity while trying to compose insights from scattered information.

The Solution: ComponentFlow eliminates tab chaos by letting you extract specific components (charts, metrics, feeds) from any website and compose them into unified, live-updating dashboards - all while maintaining full browsing capabilities.

Screenshots

AI-Powered Browsing Experience

Full-featured Chromium browser with integrated AI summarizer for intelligent page analysis and insights.

Advanced Composition Dashboard

Compose live components from multiple websites into unified dashboards with real-time data updates.

Clean, Modern Interface

Elegant home interface with smart extraction, live composition, and AI analysis capabilities.

Integrated System Explorer

Built-in system explorer for seamless file management and workspace organization.

Overview

This project provides:

Chromium Browser: A fully functional web browser with tabs, navigation, and all standard browser features
Component Extraction: Extract specific HTML components (divs, sections, etc.) from any website using CSS selectors
Component Composition: Combine extracted components from different websites into a single unified view
AI-Powered Summarizer: Page summarization and analysis

Features

Browser Features

Multi-Tab Browsing: Open and manage multiple tabs just like Chrome
Full Navigation: Address bar, back/forward buttons, refresh, home button
Browse Any Website: Navigate to any URL on the internet
Webview Integration: Each tab runs in its own isolated webview

🆕 Voice Commands & Speech Recognition

Offline Speech Recognition: Local Python server with Whisper for voice commands
No Internet Dependency: Works completely offline once models are downloaded
Natural Voice Commands: Speak commands like "open aws", "new tab", "refresh"
Audio Recording: Built-in microphone recording with MediaRecorder API
Privacy-Focused: All audio processing happens locally on your machine

📖 Local Speech Recognition Setup →

🆕 System Control & File Management

Natural Language Commands: Control your system using plain English
File Operations: Create, read, delete, move, and copy files
Directory Management: Browse, search, and organize folders
Command Execution: Run system commands safely with output capture
Smart Command Parser: Understands intent from natural language

📖 Full System Control Documentation →

Component Extraction

CSS Selector Extraction: Extract any component using CSS selectors (e.g., .header, #main-content, div.card)
Visual Preview: See extracted components before saving
HTML Inspection: View the full HTML of extracted components

Component Composition

Multi-Source Composition: Combine components from different websites
Live Dynamic Components: Each component is a live webview that maintains its functionality
Real-Time Updates: Components make actual API calls and update dynamically
Isolated Rendering: Each component runs independently with its own context
Auto-Sizing: Components automatically adjust height based on content
Component Management: Save, remove, and organize extracted components
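
As a rough sketch of the save and remove operations, the store below keeps only a URL and a selector per component in memory, as described under How It Works. The class name `ComponentStore` and its methods are assumptions for illustration, not the project's actual API.

```javascript
// Hypothetical in-memory store: each entry holds only the source URL and CSS selector.
class ComponentStore {
  constructor() {
    this.components = new Map(); // id -> { id, url, selector }
    this.nextId = 1;
  }

  // Save a component reference and return its generated id.
  save(url, selector) {
    const id = this.nextId++;
    this.components.set(id, { id, url, selector });
    return id;
  }

  // Remove a component; returns true if it existed.
  remove(id) {
    return this.components.delete(id);
  }

  // List saved components in insertion order.
  list() {
    return [...this.components.values()];
  }
}

const store = new ComponentStore();
const metricsId = store.save("http://localhost:3003", ".metrics");
store.save("http://localhost:3001", ".stat-card");
store.remove(metricsId);
console.log(store.list());
```

Because nothing is persisted in this sketch, saved components would be lost when the app closes.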

Project Structure

chromium-browser/
├── main.js                    # Electron main process
├── src/
│   ├── browser.html          # Browser UI
│   ├── browser.css           # Browser styles
│   └── browser.js            # Browser logic (with local speech recognition)
├── speech_server.py          # Python Flask server for speech recognition
├── requirements.txt           # Python dependencies for speech server
├── start_speech_server.bat   # Windows batch script to start speech server
├── LOCAL_SPEECH_README.md     # Detailed speech recognition documentation
├── start-all.js              # Start all apps at once
├── package.json
└── README.md

Installation

Clone the repository:

git clone <your-repo-url>
cd chromium-browser

Install dependencies:

npm install

Environment Setup:

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys
# Required: OPENAI_API_KEY for AI summarization features

Set up Local Speech Recognition (Optional but recommended):

# Install Python dependencies for speech recognition
pip install -r requirements.txt

# Or use the provided setup script (Windows)
# Double-click start_speech_server.bat

Optional - Set up local development apps:

The project includes sample applications for testing component extraction. These are optional for general browser use.
```
# Create the apps directory if you want to use demo apps
mkdir apps

# See DEVELOPMENT.md for instructions on setting up demo applications
```

Usage

Quick Start (Recommended)

Start everything with one command:

npm run start-apps

This will launch all three demo web apps and the Chromium browser automatically.

Manual Start

Start the Speech Server (for voice commands):

# Windows - double-click this file
start_speech_server.bat

# Or manually:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python speech_server.py

Start the Chromium browser:

npm start

How to Use

Basic Browsing

Enter URL: Type any URL in the address bar and press Enter or click "Go"
New Tab: Click "+ New Tab" to open additional tabs
Navigate: Use ← → ⟳ ⌂ buttons for back, forward, refresh, and home
Switch Tabs: Click on any tab to switch to it
Close Tabs: Click the × on any tab to close it

Using Voice Commands

Start the Speech Server: Run start_speech_server.bat or python speech_server.py
Enable Voice Commands: Click the 🎤 microphone icon in the toolbar
Start Recording: Click the microphone again to begin recording
Speak Clearly: Say a command like "open aws", "new tab", or "refresh"
Wait for Processing: The system will transcribe and execute your command

Supported Voice Commands:

"open aws" → Opens AWS cost app
"open azure" → Opens Azure cost app
"open store" → Opens e-commerce app
"new tab" → Creates new tab
"close tab" → Closes current tab
"go home" → Navigates to home
"refresh" → Reloads page
"show system" → Opens system monitor

Voice Command Tips:

Speak clearly and close to your microphone
Wait for the recording to complete (5 seconds)
Check the browser console if commands aren't working
The speech server must be running on port 5000
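
The supported voice commands listed above amount to a lookup from a transcribed phrase to a browser action. The sketch below shows one way such a lookup could work, assuming the transcript may arrive with capitalization and punctuation. The action identifiers (such as `openAwsCostApp`) and the function names are hypothetical.

```javascript
// Hypothetical action identifiers keyed by the spoken phrases listed in this README.
const VOICE_COMMANDS = new Map([
  ["open aws", "openAwsCostApp"],
  ["open azure", "openAzureCostApp"],
  ["open store", "openStoreApp"],
  ["new tab", "createTab"],
  ["close tab", "closeCurrentTab"],
  ["go home", "navigateHome"],
  ["refresh", "reloadPage"],
  ["show system", "openSystemMonitor"],
]);

// Lowercase, drop punctuation, and collapse whitespace so "Open AWS." matches "open aws".
function normalizeTranscript(transcript) {
  return transcript
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Return the action for a transcript, or null when the phrase is not recognized.
function matchVoiceCommand(transcript) {
  return VOICE_COMMANDS.get(normalizeTranscript(transcript)) ?? null;
}

console.log(matchVoiceCommand(" Open AWS. "));
console.log(matchVoiceCommand("play music"));
```

An exact-match table like this is simple to debug from the browser console, though it would not tolerate paraphrases.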

Extracting Components

Method 1: Hover Selection (Recommended)

Navigate to a website (e.g., http://localhost:3003 for the E-commerce dashboard)
Click "📦 Extract" button to open the extraction panel
Click "🎯 Pick Element by Hovering" button
Hover over any element on the page - it will be highlighted with a green outline
Click the element you want to extract
The selector will be auto-filled and extraction will start automatically
Click "Save to Composition" to add it to your collection

Method 2: Manual CSS Selector

Navigate to a website (e.g., http://localhost:3003 for the E-commerce dashboard)
Click "📦 Extract" button to open the extraction panel
Enter a CSS selector manually:
- .metrics - Extract the metrics section from e-commerce app
- .product-table - Extract the product table
- .stat-card - Extract individual stat cards from AWS app
- .service-list - Extract the service list
Click "Extract Component" to preview the extracted HTML
Click "Save to Composition" to add it to your collection

Composing Components

Click "🎨 Compose" button to open the composition view
View all saved components from different websites
See the live preview of all components combined
Remove components you don't want
Click "Back to Browser" to return to browsing

Using System Control

Click the "System Explorer" button (folder icon) in the toolbar
Select the "Commands" tab for natural language control
Type a command like:
- create file test.txt
- list files in C:\Users
- copy "document.txt" to "backup.txt"
- run command "dir"
Click "Execute" or press Enter
View results with detailed output and status

Examples:

create file "notes.txt" with content "Hello World"
delete file old_data.txt
move "file1.txt" to "C:\Backup\file1.txt"
show files in Downloads
run command "ipconfig"
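
Several of the examples above wrap paths in double quotes. The following sketch shows one way to split such a command into tokens and recognize the copy, move, and delete forms. The function names and the returned object shape are assumptions, and the project's real parser may accept many more phrasings.

```javascript
// Split a command into tokens, keeping "quoted strings" together (quotes removed).
function tokenize(command) {
  const tokens = [];
  const re = /"([^"]*)"|(\S+)/g;
  let m;
  while ((m = re.exec(command)) !== null) {
    tokens.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return tokens;
}

// Recognize "<copy|move> <source> to <destination>" and "delete file <path>".
// Returns null for anything else.
function parseFileCommand(command) {
  const [verb, ...rest] = tokenize(command);
  const action = (verb || "").toLowerCase();
  if (action === "copy" || action === "move") {
    if (rest.length === 3 && rest[1].toLowerCase() === "to") {
      return { action, source: rest[0], destination: rest[2] };
    }
    return null;
  }
  if (action === "delete") {
    if (rest.length === 2 && rest[0].toLowerCase() === "file") {
      return { action, path: rest[1] };
    }
    return null;
  }
  return null;
}

console.log(parseFileCommand('move "file1.txt" to "C:\\Backup\\file1.txt"'));
console.log(parseFileCommand("delete file old_data.txt"));
```

Quoting matters in this sketch because an unquoted path containing spaces would be split into several tokens.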

See SYSTEM_CONTROL.md for complete documentation and examples.

🎤 Testing Voice-Activated System Monitoring

Prerequisites:

Start the speech recognition server (see LOCAL_SPEECH_README.md)
Ensure your microphone is connected and permissions are granted

Test Procedure:

Click the 🎤 microphone icon in the navigation bar
Wait for "Recording..." status with red bars
Speak clearly: "Show system resources"
Wait for processing - the command will be transcribed
System panel opens automatically with detailed information

Available Voice Commands:

"Show system resources"  → Opens system panel with all info
"Show CPU"               → Opens system panel (focused on CPU)
"Show disk"              → Opens system panel (disk information)
"Show memory"            → Opens system panel (RAM usage)
"Show services"          → Opens system panel (Windows services)
"Show processes"         → Opens system panel (running processes)

Test File Creation via Voice:

Click the 🎤 microphone icon
Say: "Create file hello.py on desktop"
Check your Desktop for the newly created hello.py file
File will contain Python template code with timestamp
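
A minimal sketch of how template content with a timestamp might be chosen by file extension follows. The exact template text is an assumption: the README only states that the Python file contains a print statement and a timestamp, so the `.js` and default branches and the function name `buildTemplate` are hypothetical.

```javascript
const path = require("path");

// Build starter content for a new file based on its extension.
// The template wording is assumed; only "print statement plus timestamp" comes from the README.
function buildTemplate(filename, now = new Date()) {
  const stamp = now.toISOString();
  switch (path.extname(filename).toLowerCase()) {
    case ".py":
      return `# Created ${stamp}\nprint("Hello from ${filename}")\n`;
    case ".js":
      return `// Created ${stamp}\nconsole.log("Hello from ${filename}");\n`;
    default:
      return `Created ${stamp}\n`;
  }
}

console.log(buildTemplate("hello.py"));
```

Passing the date as a parameter keeps the function deterministic, which makes it easier to test.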

Other supported filenames:

"Create file test.txt on desktop" - Creates text file
"Create file script.js on desktop" - Creates JavaScript file
"Make file notes.txt on desktop" - Alternative phrasing

What You'll See:

System Resources: CPU usage per core, total/used/free memory, disk space per drive
Windows Services: List of running services with status
Running Processes: Top processes sorted by memory usage
Real-time Metrics: Visual bars showing CPU and memory percentages
Desktop File: Python file with print statement and timestamp
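
For reference, memory and per-core CPU figures of the kind listed above can be read with Node's built-in `os` module. This is a sketch under that assumption and may differ from how the system panel actually collects its data. The function names are hypothetical.

```javascript
const os = require("os");

// Total, free, and used memory in bytes, plus the used percentage.
function memorySnapshot() {
  const total = os.totalmem();
  const free = os.freemem();
  const used = total - free;
  return { total, free, used, usedPercent: Math.round((used / total) * 100) };
}

// Per-core busy percentage computed from cumulative times since boot.
// A live reading would need two samples and the difference between them.
function cpuSnapshot() {
  return os.cpus().map((cpu, core) => {
    const t = cpu.times;
    const total = t.user + t.nice + t.sys + t.idle + t.irq;
    const busyPercent = total === 0 ? 0 : Math.round(((total - t.idle) / total) * 100);
    return { core, model: cpu.model, busyPercent };
  });
}

console.log(memorySnapshot());
console.log(cpuSnapshot());
```

Disk space per drive, services, and process lists are not covered by the `os` module and would need platform-specific commands or an additional library.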

Example Selectors for Demo Apps

E-commerce App (localhost:3003):

.metrics - All metric boxes
.metric-box - Individual metric
.product-table - Product table
.orders-chart - Weekly orders chart

AWS Cost App (localhost:3001):

.stats-grid - All statistics
.stat-card - Individual stat card
.chart-section - Cost trend chart
.service-list - Service cost list

Azure Cost App (localhost:3002):

.azure-costs - Main container
.cost-card - Individual cost cards
.resource-table - Resource usage table

Development

For development with auto-restart:

npm run dev

Building

Production Build

To build the Chromium browser app for distribution:

Quick Build (Recommended):

npm run build

This will:

Clean any previous builds
Package the application using electron-packager
Create a distributable version in dist/chromium-browser-win32-x64/

Manual Build Commands:

# Clean previous builds
npm run clean

# Package for Windows
npm run package

# Package for all platforms (Windows, macOS, Linux)
npm run package:all

Built Application

After building, you'll find:

Executable: dist/chromium-browser-win32-x64/chromium-browser.exe
Complete App: The entire dist/chromium-browser-win32-x64/ folder contains everything needed to run the app

Running the Built App:

cd dist/chromium-browser-win32-x64
./chromium-browser.exe

Or simply double-click chromium-browser.exe in the file explorer.

Build Scripts Available

| Script | Description |
| --- | --- |
| `npm run build` | Complete build process with cleaning |
| `npm run package` | Package for Windows x64 |
| `npm run package:all` | Package for all platforms |
| `npm run clean` | Remove all build files |

Distribution

The built application is portable and self-contained. You can:

Zip the folder: Compress dist/chromium-browser-win32-x64/ and share
Copy to other machines: The entire folder works on any Windows x64 machine
Create installer: Use tools like NSIS or Inno Setup for proper installers

Build Requirements

Node.js: v16 or higher
npm: v7 or higher
Windows: x64 architecture (for Windows builds)
Disk Space: ~200MB for build output

Troubleshooting Build Issues

Common Issues:

"Cannot create symbolic link": Run terminal as Administrator or disable code signing
ENOENT errors: Ensure all dependencies are installed with npm install
Permission errors: Check write permissions in the project directory

Technologies Used

Electron: Cross-platform desktop app framework
Chromium: Full-featured browser engine (via Electron webview)
Express.js: Web server for demo applications
Cheerio: Server-side HTML parsing for component extraction
Axios: HTTP client for fetching web pages
Flask: Python web framework for speech recognition server
OpenAI Whisper: Offline speech recognition model
MediaRecorder API: Browser API for audio recording
HTML/CSS/JavaScript: Frontend technologies

How It Works

Browser: Each tab creates an isolated <webview> element that loads websites independently
Extraction: When you request extraction, the browser fetches the page HTML via Axios and parses it with Cheerio on the backend to validate the selector
Selector Matching: Cheerio finds elements matching your CSS selector and extracts metadata
Storage: Component metadata (URL + selector) is stored in memory
Composition: Each saved component creates a live webview that:
- Loads the full website in the background
- Hides all elements except your selected component using CSS injection
- Maintains full functionality (API calls, timers, WebSockets, etc.)
- Updates dynamically just like the original website
- Auto-adjusts height based on content

This means your composed dashboard shows live, real-time data from multiple sources!
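
The CSS-injection step described above can be sketched as a function that builds a stylesheet hiding everything except the selected component. This is one possible approach and not necessarily the rules the project injects. The function name `buildIsolationCss` is hypothetical, and the sketch assumes a single selector without commas.

```javascript
// Build CSS that hides every element except the one matching `selector` and its descendants.
// visibility is used instead of display because a descendant can override
// visibility: hidden, whereas display: none on an ancestor hides the target as well.
function buildIsolationCss(selector) {
  if (typeof selector !== "string" || selector.trim() === "") {
    throw new TypeError("selector must be a non-empty string");
  }
  if (selector.includes(",")) {
    throw new TypeError("this sketch supports a single selector only");
  }
  const target = selector.trim();
  return [
    "body * { visibility: hidden !important; }",
    `${target}, ${target} * { visibility: visible !important; }`,
  ].join("\n");
}

// In Electron, a string like this could be passed to a <webview> tag's insertCSS method.
console.log(buildIsolationCss(".metrics"));
```

Because hidden elements still occupy layout space with this approach, a real implementation would likely also reposition or resize the visible component.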

Use Cases

Dashboard Composition: Combine metrics from multiple monitoring tools
Report Generation: Extract data visualizations from different sources
Component Library: Build a library of reusable UI components
Web Scraping: Extract specific data from websites for analysis
Learning: Study how different websites structure their HTML

Limitations

CORS: Some websites may block extraction due to CORS policies
Dynamic Content: JavaScript-rendered content may not be captured during extraction, because the fetched HTML is parsed with Cheerio, which does not execute scripts
Authentication: Cannot extract from pages requiring login (without additional setup)
Styling: Extracted components may look different if they depend on external stylesheets

Future Enhancements

License

MIT License
