Fail to Reproduce WebArena Results with GenericAgent-GPT-4o #249
Open
Labels: bug (Something isn't working)
Description
Dear AgentLab Authors,
Thank you for the great work! I'm trying to reproduce the WebArena results with GenericAgent-GPT-4o, using the code below. Everything else follows AgentLab's defaults. However, the score I got is 25, which is significantly lower than the 31.4 shown on the BrowserGym Leaderboard. Do you have any suggestions for the reproduction? Is there code available that reproduces a score of about 31?
Thanks again for your great contribution to the community!
from agentlab.agents.generic_agent import AGENT_4o
from agentlab.experiments.study import make_study

# Build a study that runs the default GPT-4o GenericAgent on WebArena.
study = make_study(
    benchmark="webarena",
    agent_args=[AGENT_4o],
    comment="repo 4o agent",
)

# Run the study with 5 parallel jobs.
study.run(n_jobs=5)
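As a rough check on whether the gap could be run-to-run sampling noise, here is a minimal sketch that estimates the binomial standard error of a success rate. It rests on assumptions that are not stated above: that both numbers are success rates in percent, that the run covers the full WebArena task set of 812 tasks, and that task outcomes are independent. The helper `binomial_std_err` is a hypothetical name used for illustration and is not part of AgentLab.

```python
import math


def binomial_std_err(success_rate: float, n_tasks: int) -> float:
    """Standard error of a success rate estimated from n_tasks independent trials."""
    return math.sqrt(success_rate * (1.0 - success_rate) / n_tasks)


# Assumption: the study ran all 812 WebArena tasks.
N_TASKS = 812

# Assumption: 25 and 31.4 are success rates in percent.
observed = 0.25
reported = 0.314

se = binomial_std_err(observed, N_TASKS)
gap = reported - observed

print(f"standard error: {se * 100:.2f} points")
print(f"gap: {gap * 100:.1f} points, about {gap / se:.1f} standard errors")
```

Under these assumptions the gap is several standard errors wide, so sampling noise alone seems an unlikely explanation. Differences in setup (model version, benchmark instance state, agent flags) are more plausible candidates to compare.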