[Hackathon] feat: Workflow Hub — Community Gallery for Sharing and Forking Workflows#5113

Open
EmilySun621 wants to merge 4 commits into apache:main from EmilySun621:hackathon/workflow-hub

EmilySun621 commented May 16, 2026

TL;DR: A community-powered gallery where researchers publish workflows and others can browse, star, and one-click fork them into their workspace — like GitHub for data science pipelines. Ships with 15 curated seed workflows so the Hub is never empty.

⚠️ Still under testing


😤 The Problem

A new student joins the lab. She needs to build a diabetes prediction pipeline but has no idea where to start. There's no way to discover what others have built, no way to learn from existing workflows, no way to reuse them.

Every researcher starts from zero. Every time.


✨ What We Built

🌐 Workflow Hub

A browsable, searchable gallery of community workflows — star your favorites, fork them into your workspace, publish your own.


📋 Hub List Page — Sidebar → Hub → Workflow Hub

| Feature | Details |
| --- | --- |
| 🔎 Full-text search | By name, tag, or description |
| 🏷️ Category filters | Biomedical, NLP, CV, Finance, EDA, Education, Tabular |
| 📊 Sort by | Trending · Most Stars · Most Forks · Recent |
| 🖼️ Featured grid | 3-column layout with DAG preview thumbnails |
| 🤖 Agent badge | Workflows generated by custom agents are labeled |
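
The search, category filter, and sort behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `HubEntry` shape, the `browse` function, and all field names are assumptions. "Trending" is left out because the description does not say how it is computed.

```typescript
// Hypothetical shape of a hub entry; field names are assumptions for illustration.
interface HubEntry {
  title: string;
  description: string;
  category: string;
  tags: string[];
  stars: number;
  forks: number;
  publishedAt: number; // epoch milliseconds
}

type SortKey = "stars" | "forks" | "recent";

// Case-insensitive substring match against title, description, and each tag.
function matchesQuery(entry: HubEntry, query: string): boolean {
  const q = query.trim().toLowerCase();
  if (q === "") return true;
  return [entry.title, entry.description, ...entry.tags].some(
    field => field.toLowerCase().includes(q)
  );
}

// Applies the category chip, then the search box, then the selected sort order.
function browse(
  entries: HubEntry[],
  query: string,
  category: string | null,
  sort: SortKey
): HubEntry[] {
  const compare: Record<SortKey, (a: HubEntry, b: HubEntry) => number> = {
    stars: (a, b) => b.stars - a.stars,
    forks: (a, b) => b.forks - a.forks,
    recent: (a, b) => b.publishedAt - a.publishedAt,
  };
  return entries
    .filter(e => category === null || e.category === category)
    .filter(e => matchesQuery(e, query))
    .sort(compare[sort]); // filter() returned a fresh array, so the input is not mutated
}
```

Passing `null` as the category stands for "no chip selected", and an empty query matches every entry.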

📄 Workflow Detail Page

| Feature | Details |
| --- | --- |
| 👤 Author info | Avatar, name, publish date |
| 📝 Description | Full writeup of what the workflow does |
| 🔀 DAG preview | Operators shown as connected boxes |
| 🏷️ Tags | Searchable labels |
| 📊 Stats panel | ⭐ Stars · 🍴 Forks · 👁️ Views · 🧩 Operators |
| 🤖 Agent config | "Import Agent Config" button when workflow was agent-generated |

🍴 Fork — One-Click Reuse

Click Fork → new workflow created as [Fork] Original Title → opens in workspace with all operators pre-configured → add your own data and run.

Fork creates a real workflow in Texera's backend, not a localStorage copy.


⭐ Star — Save Favorites

Toggle star on any workflow. Count updates instantly. Persists across sessions.
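
One way a persistent star toggle could work is sketched below. It assumes the starred set is stored as a JSON array under a single storage key; the key name `hub.starred`, the `KeyValueStore` interface, and the helper names are hypothetical. An in-memory store stands in for the browser's `localStorage` so the sketch runs outside a browser.

```typescript
// Minimal storage contract; the browser's localStorage satisfies it structurally.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in so the sketch can run without a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const STARRED_KEY = "hub.starred"; // hypothetical key name

function readStarred(store: KeyValueStore): Set<string> {
  const raw = store.getItem(STARRED_KEY);
  return new Set<string>(raw ? (JSON.parse(raw) as string[]) : []);
}

// Flips the star for one entry, writes the set back, and returns the new state.
function toggleStar(store: KeyValueStore, entryId: string): boolean {
  const starred = readStarred(store);
  const nowStarred = !starred.has(entryId);
  if (nowStarred) {
    starred.add(entryId);
  } else {
    starred.delete(entryId);
  }
  store.setItem(STARRED_KEY, JSON.stringify(Array.from(starred)));
  return nowStarred;
}

// Count to display: the entry's base count plus one if this user starred it.
function displayedStars(baseCount: number, starredByUser: boolean): number {
  return baseCount + (starredByUser ? 1 : 0);
}
```

Because the state is re-read from the store on every call, a fresh page load sees the same starred set as the previous session.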


📤 Publish — Share Your Work

Click "Publish Workflow" → select a saved workflow → add title, description, category, tags → operators auto-extracted from workflow content → published to the Hub for everyone.
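
The operator extraction step might look something like the sketch below, which orders operator types by following links from sources to sinks (a topological sort). The `WorkflowContentLike` types are a simplified stand-in assumed for illustration, not Texera's actual `WorkflowContent` definition, and `extractOperatorChain` is a hypothetical name.

```typescript
// Simplified stand-ins for workflow content; assumed shapes, not Texera's real types.
interface ContentOperator { operatorID: string; operatorType: string; }
interface ContentLink { source: { operatorID: string }; target: { operatorID: string }; }
interface WorkflowContentLike { operators: ContentOperator[]; links: ContentLink[]; }

// Returns operator types in dependency order (Kahn's algorithm).
// Operators that sit on a cycle are omitted from the result.
function extractOperatorChain(content: WorkflowContentLike): string[] {
  const indegree = new Map<string, number>();
  const successors = new Map<string, string[]>();
  const typeOf = new Map<string, string>();
  for (const op of content.operators) {
    indegree.set(op.operatorID, 0);
    successors.set(op.operatorID, []);
    typeOf.set(op.operatorID, op.operatorType);
  }
  for (const link of content.links) {
    const from = link.source.operatorID;
    const to = link.target.operatorID;
    if (!successors.has(from) || !indegree.has(to)) continue; // skip dangling links
    successors.get(from)!.push(to);
    indegree.set(to, indegree.get(to)! + 1);
  }
  // Start from operators with no incoming links, in declaration order.
  const queue = content.operators
    .filter(op => indegree.get(op.operatorID) === 0)
    .map(op => op.operatorID);
  const ordered: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    ordered.push(typeOf.get(id)!);
    for (const next of successors.get(id)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  return ordered;
}
```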


📦 15 Seed Workflows — Never Empty

The Hub ships pre-loaded so new users see a living gallery from day one:

| Category | Workflows |
| --- | --- |
| 🧬 Biomedical | Diabetes Prediction (CRISP-DM), Heart Disease, Breast Cancer, COVID-19 Clinical Trials |
| 📝 NLP | Sentiment Analysis, News Topic Classification |
| 💰 Finance | Credit Card Fraud, Stock Price Regression |
| 👁️ CV | MNIST Digits |
| 📊 EDA | Movie Recommendation EDA, Air Quality |
| 🎓 Education | UCI Iris Beginner, Titanic Survival, Wine Quality |
| 📋 Tabular | Census Income Prediction |

Each seed includes title, author, description, category, tags, operator list, and star/fork/view counts.
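
A seed record with the fields listed above could be typed and sanity-checked along these lines. The interface, property names, and `validateSeed` helper are assumptions made for illustration; only the field list and the category names come from this description.

```typescript
// Category names as listed in the Hub's category filters.
const CATEGORIES = ["Biomedical", "NLP", "CV", "Finance", "EDA", "Education", "Tabular"] as const;
type Category = (typeof CATEGORIES)[number];

// Hypothetical seed shape covering the fields named in the description.
interface SeedWorkflow {
  title: string;
  author: string;
  description: string;
  category: Category;
  tags: string[];
  operators: string[];
  stars: number;
  forks: number;
  views: number;
}

// Returns a list of human-readable problems; an empty list means the seed looks usable.
function validateSeed(seed: SeedWorkflow): string[] {
  const problems: string[] = [];
  if (seed.title.trim() === "") problems.push("title is empty");
  if (seed.author.trim() === "") problems.push("author is empty");
  if (seed.operators.length === 0) problems.push("operator list is empty");
  const counts: Array<[string, number]> = [
    ["stars", seed.stars],
    ["forks", seed.forks],
    ["views", seed.views],
  ];
  for (const [name, value] of counts) {
    if (!Number.isInteger(value) || value < 0) {
      problems.push(`${name} must be a non-negative integer`);
    }
  }
  return problems;
}
```

Running a check like this over the seed list at startup would catch a malformed entry before it renders as a broken card.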


🎬 Demo Walkthrough

  1. 🌐 Sidebar → "Workflow Hub" → 15 community workflows
  2. 🏷️ Filter "Biomedical" → see diabetes, heart disease, cancer workflows
  3. 📄 Click "Diabetes Prediction (CRISP-DM)"
  4. 📊 See description, DAG preview, 142 stars, 38 forks
  5. 🍴 Click "Fork to My Workflows"
  6. 🖥️ [Fork] Diabetes Prediction opens in workspace with operators
  7. ⚙️ Configure CSV source with your own data → Run
  8. 📤 Click "Publish Workflow" → share your own workflow back to the Hub

📸 Screenshots

[Four screenshots, captured 2026-05-16 between 12:15 PM and 12:16 PM]

🏗️ Architecture

┌──────────────────────────────────────────────────────┐
│  🖥️  Frontend (Angular)                              │
│                                                      │
│  New Components                                      │
│  • Hub list page (search, sort, categories, cards)   │
│  • Detail page (DAG preview, fork, star, stats)      │
│  • Publish dialog (select workflow, add metadata)     │
│                                                      │
│  New Services                                        │
│  • workflow-hub.service (seed data, localStorage     │
│    CRUD, star/fork logic)                            │
│                                                      │
│  Modified (additive only)                            │
│  • 2 new routes, sidebar link                        │
├──────────────────────────────────────────────────────┤
│  🔒 Texera Core Engine (Amber) — UNMODIFIED          │
│  Fork creates real workflows through existing         │
│  backend APIs — no new endpoints needed              │
└──────────────────────────────────────────────────────┘

🔒 Zero modifications to Texera's core engine


✅ Testing

| Test | Status |
| --- | --- |
| Angular typecheck | ✅ Clean |
| Seed data renders on first visit | ✅ Pass |
| Search filters by name/tag/description | ✅ Pass |
| Category chips filter correctly | ✅ Pass |
| Sort (trending/stars/forks/recent) | ✅ Pass |
| Star toggle persists | ✅ Pass |
| Fork creates real workflow in backend | ✅ Pass |
| Forked workflow opens in editor with operators | ✅ Pass |
| Publish extracts operators from content | ✅ Pass |

💡 Why This Matters

Every data science platform has a workflow editor. Almost none have a community layer where users discover and build on each other's work. The Hub turns Texera from a tool you use alone into a platform where knowledge compounds.

Before: Start from scratch every time, with no idea what others have built.

After: Browse → fork → customize → publish back, standing on each other's shoulders.

Emily Sun and others added 4 commits May 15, 2026 21:55

This bundles the feature work that built up on this branch:

- Custom agents: dashboard CRUD page and editor dialog (48px icon tile,
  chip-style guardrails, model selector). Each custom agent now carries a
  LiteLLM model_name (Opus 4.7 / Haiku 4.5) that is passed through to the
  agent-service so different agents can use different models.

- Conversation history is scoped per (workflowId, agentId): switching
  agent or workflow yields a different conversation list. localStorage
  key: texera.workflowConversations.v1.{workflowId}.{agentId}.

- Time machine: workflow snapshot list, revert, and agent-tagged
  checkpoints. New workflow-history-tool in agent-service backs the
  "undo my last change" flow; amber gains a WorkflowSnapshotResource;
  sql/updates/23.sql adds the snapshot table.

- Operator-aware custom-agent prompts: the system prompt now injects the
  full operator catalog with a "prefer built-in operators over Python
  UDFs" rule, sourced from WorkflowSystemMetadata at request time.

- LiteLLM: added the claude-opus-4.7 entry alongside claude-haiku-4.5
  and gpt-5-mini in bin/litellm-config.yaml.

- Agent panel rewritten around the (conversation list / chat) two-view
  model with subscription-managed list reloads and per-step persistence.
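
The per-(workflowId, agentId) scoping described in the conversation-history bullet above comes down to how the storage key is built. A minimal sketch, assuming numeric workflow ids and string agent ids (the function name and parameter types are hypothetical; only the key format comes from the commit message):

```typescript
// Key prefix taken from the commit message.
const CONVERSATION_KEY_PREFIX = "texera.workflowConversations.v1";

// Builds the storage key for one (workflow, agent) pair.
// Id types are assumptions; the commit message only gives the key layout.
function conversationKey(workflowId: number, agentId: string): string {
  return `${CONVERSATION_KEY_PREFIX}.${workflowId}.${agentId}`;
}
```

Since both ids are part of the key, switching either the workflow or the agent reads a different conversation list.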

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a community gallery under /dashboard/hub/workflow-hub where users
browse, star, fork, and publish data science workflows. Backed by 15
seed entries and localStorage for stars/forks/views/publishes so the
page is never empty.

- List page: search, sort (trending/stars/forks/recent), category
  chips, featured grid, DAG-chain preview cards, agent badges.
- Detail page: SVG DAG preview, stats panel, fork-to-my-workflows
  (uses WorkflowPersistService.duplicateWorkflow when a backend wid is
  attached, otherwise falls back to a local stub), star toggle, and an
  optional 'Agent Included' card.
- Publish dialog: pulls the user's workflows via the persist service,
  derives operator chain from workflow content, writes a hub entry to
  localStorage.
- Sidebar: 'Workflow Hub' link added to the Hub submenu.

Seed entries don't have a workflowId, so the previous code only
incremented a localStorage counter and navigated to /dashboard/user/workflow
without actually writing to the backend — the forked workflow never showed
up in the Workflows page. Now the seed path calls
WorkflowPersistService.createWorkflow with empty content named
"[Fork] <title>", waits for the backend to return the new wid, and routes
straight into the new workflow's workspace. The duplicate-workflow path
for real-wid entries is unchanged.

The previous fix used createWorkflow with empty content, so forking a seed
entry produced a workflow with the right name but zero operators — the
"Executions doesn't exist" 403 the user saw was just the workspace trying
to load nonexistent executions for an empty workflow.

Now seed entries carry a sampleOperators field listing REAL Texera operator
types from the running backend's metadata (verified against the 163
operators the deployed build exposes). When the user forks:

1. Wait for OperatorMetadataService to publish the schema list.
2. For each known sampleOperators type, build a proper OperatorPredicate via
   WorkflowUtilService.getNewOperatorPredicate (which fills in ports, default
   properties, and the correct operatorVersion).
3. Connect consecutive operators by their first output→input ports.
4. Lay them out in a horizontal chain (200px apart).
5. POST to /workflow/create with the populated WorkflowContent and navigate
   to the new wid.

Any sampleOperators not present in the running build land in a single
comment box at the top of the canvas so the user can see what was intended.
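
Steps 2 to 4 and the unknown-operator fallback can be sketched as a pure planning function. This is an illustration under stated assumptions, not the PR's code: it does not call `WorkflowUtilService.getNewOperatorPredicate`, and the `ForkPlan` types, the `planFork` name, the generated ids, and the starting position are all hypothetical. Only the 200px spacing and the chain-linking rule come from the commit message.

```typescript
interface Point { x: number; y: number; }
interface PlannedOperator { operatorID: string; operatorType: string; position: Point; }
interface PlannedLink { sourceOperatorID: string; targetOperatorID: string; }
interface ForkPlan {
  operators: PlannedOperator[];
  links: PlannedLink[];
  unknownTypes: string[]; // would be listed in a single comment box
}

const SPACING_PX = 200; // horizontal gap between operators, per the commit message

function planFork(
  sampleOperators: string[],
  knownTypes: Set<string>,
  origin: Point = { x: 100, y: 100 } // assumed starting position
): ForkPlan {
  const operators: PlannedOperator[] = [];
  const unknownTypes: string[] = [];
  for (const type of sampleOperators) {
    if (!knownTypes.has(type)) {
      unknownTypes.push(type); // not in the running build: report instead of placing
      continue;
    }
    const index = operators.length;
    operators.push({
      operatorID: `${type}-${index}`, // placeholder id scheme
      operatorType: type,
      position: { x: origin.x + index * SPACING_PX, y: origin.y },
    });
  }
  // Link each placed operator to the one before it, forming a linear chain.
  const links: PlannedLink[] = operators.slice(1).map((op, i) => ({
    sourceOperatorID: operators[i].operatorID,
    targetOperatorID: op.operatorID,
  }));
  return { operators, links, unknownTypes };
}
```

Skipping unknown types before assigning positions keeps the chain contiguous, so a missing operator does not leave a 200px hole in the layout.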

For real (published) hub entries with a workflowId, the path is still
WorkflowPersistService.duplicateWorkflow — unchanged.
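
The choice between the two fork paths can be expressed as a small decision function. The sketch below only models the decision, not the backend calls; `HubEntryRef`, `ForkStrategy`, and `chooseForkStrategy` are hypothetical names, while the `[Fork] <title>` naming and the workflowId check follow the commit messages.

```typescript
// Hypothetical minimal view of a hub entry for the purpose of forking.
interface HubEntryRef {
  title: string;
  workflowId?: number;        // present only for published (real) entries
  sampleOperators?: string[]; // present only for seed entries
}

type ForkStrategy =
  | { kind: "duplicate"; workflowId: number }
  | { kind: "createFromSeed"; name: string; sampleOperators: string[] };

function chooseForkStrategy(entry: HubEntryRef): ForkStrategy {
  if (entry.workflowId !== undefined) {
    // Published entry: duplicate the existing backend workflow.
    return { kind: "duplicate", workflowId: entry.workflowId };
  }
  // Seed entry: create a new workflow populated from sampleOperators.
  return {
    kind: "createFromSeed",
    name: `[Fork] ${entry.title}`,
    sampleOperators: entry.sampleOperators === undefined ? [] : entry.sampleOperators,
  };
}
```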

github-actions bot added the engine, ddl-change, frontend, dev, common, and agent-service labels May 16, 2026