allison-truhlar
left a comment
I will open a branch to work on some possible UI/styling changes.
General questions:
- Is this meant primarily for usage in the CLI app, or also for deployment to fileglancer.int.janelia.org? If the latter:
  - Do we want to add an option for preconfigured Apps? So that when we launch Fileglancer on production, we can add in a "starter pack" of apps that are commonly used or that we want to promote?
  - This is in part a how-does-the-cluster-work question: how is it determined what project to charge a job to? By username? Do we want to make it possible for the user to configure what account to charge a job to?
- Should we get rid of the /jobs page (points to Tasks) and the associated convert tab and dialog?
- I tried adding a GitHub repo that didn't have a valid runnables.yaml file. I think the error that was returned could be improved, and I was also wondering if we want to remove a cloned repo if it's found not to have a valid runnables.yaml file? Currently it's left on disk.
Yes, please! The UI here is just a placeholder. But I would recommend holding off on this for now, until we can discuss this functionality as a team and finalize the requirements.
This is intended for deployment on fileglancer.int.janelia.org, where apps will run on the cluster. But you can also use it locally with the local executor.
This is a fantastic idea. Let's discuss more in person.
Yes, and I just added that feature this morning.
Not yet, but maybe in the future. I agree we should think closely about this.
Once we nail down the base functionality (and merge the services PR into this one), we'll definitely need to do some passes focused on UX, error handling, etc.
Fileglancer Services
```bash
eval "$(conda shell.bash hook)"
conda activate my-analysis-env
```
I'm confused why a service would go this route.
`conda run` is generally the recommended way to run a specific command within an environment, especially during non-interactive use. `conda activate` is mainly intended for interactive use.
https://docs.conda.io/projects/conda/en/stable/commands/run.html
- If the main purpose is just to run Python, this can often be done by providing the full path to a Python executable, since conda-forge hardcodes the library path in the executable. This is how Jupyter kernels within conda environments generally function. I would consider an example that more strongly depends on environment activation.
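To make the two alternatives above concrete, here is a minimal Python sketch of how a launcher might build the command line without `conda activate`. The function names and the example environment prefix are hypothetical and not taken from this PR. Only the `conda run -n <env> <command>` form comes from the conda documentation linked above, and the `bin/python` location assumes a POSIX-style environment layout.

```python
from pathlib import PurePosixPath
from typing import List


def conda_run_argv(env_name: str, command: List[str]) -> List[str]:
    """Wrap a command in `conda run -n <env>`, the non-interactive route."""
    return ["conda", "run", "-n", env_name, *command]


def direct_python_argv(env_prefix: str, script: str) -> List[str]:
    """Call the environment's interpreter by full path, with no activation.

    Assumption: the interpreter lives at <prefix>/bin/python (POSIX layout).
    """
    return [str(PurePosixPath(env_prefix) / "bin" / "python"), script]


if __name__ == "__main__":
    # Hypothetical environment name, prefix, and script, for illustration only.
    print(conda_run_argv("my-analysis-env", ["python", "serve.py"]))
    print(direct_python_argv("/opt/envs/my-analysis-env", "serve.py"))
```

Either list could be passed to `subprocess.run` as-is, which avoids sourcing shell hooks in a generated job script.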
Sure, this was just a quick hack to get things working with Cellmap Flow. We can revisit this later, if we want to continue supporting conda. Right now, I'm not even sure if we should.
## Quick Start
1. Create a `runnables.yaml` file in your GitHub repository
While I can imagine a few reasons why you are using YAML here, I am surprised to see YAML used for a new project. The pitfalls of YAML are well documented now.
https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell
The issue that bothers me the most is how easy it is to write something that looks correct but means something else. For example, `no` is interpreted as `false`. Another is how easy it is to accidentally write a number when a string is intended. This bites me most often with version numbers.
Would plain JSON or a small extension of it (e.g. JSONC, JSON5) be sufficient? (Note that JSON and JSONC are subsets of YAML). The other popular alternative is TOML.
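As an illustration of the two pitfalls named above, here is a small standard-library sketch of a check that flags unquoted scalars a YAML 1.1 parser would likely coerce. The helper `coercion_risk` is hypothetical and not part of this PR. It is deliberately conservative: it ignores the exact capitalisation rules of YAML 1.1 booleans and only approximates YAML's number grammar.

```python
import re
from typing import Optional

# Lower-cased boolean spellings from the YAML 1.1 bool type.
_YAML11_BOOLS = {"y", "n", "yes", "no", "true", "false", "on", "off"}

# Rough approximation of a decimal int/float; not the full YAML grammar.
_NUMBER = re.compile(r"^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$")


def coercion_risk(scalar: str) -> Optional[str]:
    """Return 'bool' or 'number' if an unquoted scalar may not stay a string."""
    text = scalar.strip()
    if text.lower() in _YAML11_BOOLS:
        return "bool"
    if _NUMBER.match(text):
        return "number"
    return None


if __name__ == "__main__":
    for value in ["no", "1.10", "1.2.3", "Norway"]:
        print(value, "->", coercion_risk(value))
```

A manifest validator could run a check like this over fields that are meant to be strings (for example a version field) and ask the author to quote the value.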
YAML has its issues, but so does every other format. I don't think any of the problems in that blog are relevant to how we're using YAML here.
I experimented with both JSON and TOML and I didn't like how the files looked for this use case. JSON is too verbose and difficult to edit (lived experience with nextflow_schema.json) and TOML is not great for lists of objects.
| Variable | Availability | Description |
|----------|-------------|-------------|
| `FG_WORK_DIR` | All jobs | Absolute path to the job's working directory (contains `repo/` symlink, log files, etc.) |
| `SERVICE_URL_PATH` | Service-type jobs only | Absolute path where the service should write its URL. Equivalent to `$FG_WORK_DIR/service_url` |
Perhaps SERVICE_URL_PATH should also have an FG prefix, i.e. FG_SERVICE_URL_PATH?
I didn't do that because SERVICE_URL_PATH represents an API that needs to be implemented by apps, such as Cellmap Flow. I didn't want to create a dependency from Cellmap Flow on Fileglancer.
But thanks for asking, because I agree this is odd. I think what might be better is for the variable name to be defined in the runnables.yaml file (perhaps defaulting to SERVICE_URL_PATH). Then each app could decide how it wants the path passed in.
I'm open to other ideas for this API.
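To ground the discussion of this API, here is a minimal sketch of both sides of the handshake: an app writing its URL to the file named by an environment variable, and a poller waiting for that file. The function names, the `env_var` parameter (which stands in for the idea of making the variable name configurable), and the temporary-file rename are assumptions for illustration, not the implementation in this PR.

```python
import os
import time
from pathlib import Path
from typing import Optional


def write_service_url(url: str, env_var: str = "SERVICE_URL_PATH") -> Path:
    """App side: write the service URL to the file named by `env_var`."""
    target = Path(os.environ[env_var])
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(url + "\n")
    # Rename into place so a poller never reads a half-written URL.
    os.replace(tmp, target)
    return target


def poll_service_url(path: Path, timeout: float = 5.0,
                     interval: float = 0.1) -> Optional[str]:
    """Poller side: return the URL once the file exists and is non-empty."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            url = path.read_text().strip()
            if url:
                return url
        time.sleep(interval)
    return None  # the service did not report a URL in time
```

Because the app only reads an environment variable and writes a file, it needs no import from or knowledge of Fileglancer, which is the decoupling discussed above.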
This PR adds Apps, which are CLI tools that can be executed on the cluster from a web form and monitored within Fileglancer.
There are two new top-level pages: Apps and Jobs. The Apps page allows you to add apps via a GitHub URL that conforms to a new "Runnables Protocol" that I created based on inspiration from Nextflow Schema, CWL, IPP, and Fractal. In short, each app repo contains a
`runnables.yaml` manifest that declares one or more entry points, and defines their parameters, resource defaults, and environment variables. A single repo can contain multiple manifests in subdirectories, each discovered and registered as a separate app. Manifests can also reference a separate `repo_url` for the actual tool code. This will allow us to create a "catalog" repo of external tools in cases where a tool's own repo can't host the `runnables.yaml`.
Currently, the prerequisites support Pixi, Maven, and NPM, and only Pixi has been tested. I'd like to expand this to also include Conda and containers. Allowing uncontainerized tools like Pixi is a bit of a risk from a complexity standpoint, and also won't work for users who don't have these prereqs set up. I'd like to keep thinking about the best solution to this, such as offering solutions for automatically installing prereqs, pre-installing all prerequisite platforms in a shared location during Fileglancer server setup, exploring solutions like album, etc.
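The manifest discovery described above (one app per `runnables.yaml`, including manifests in subdirectories) could look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: the function name `discover_manifests` is hypothetical, and skipping the `.git` directory is an assumption about what a cloned repo contains.

```python
from pathlib import Path
from typing import List

MANIFEST_NAME = "runnables.yaml"


def discover_manifests(repo_root: Path) -> List[Path]:
    """Return every manifest file under repo_root, sorted, ignoring .git."""
    found = []
    for path in repo_root.rglob(MANIFEST_NAME):
        # Assumption: anything inside the .git directory is not an app.
        if ".git" in path.relative_to(repo_root).parts:
            continue
        if path.is_file():
            found.append(path)
    return sorted(found)
```

Each returned path would then be parsed and registered as a separate app. An empty result is the case where the repo has no manifest at all.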
App URLs are stored in your preferences and the repos are cloned and cached in
`~/.fileglancer/apps`. When you launch an entry point, the manifest is used to auto-generate a web form with appropriate controls for each parameter type. Parameters can be organized into collapsible sections. File and directory parameters use the file selection dialog.
Jobs are submitted to the cluster via a new py-cluster-api package (inspired by dask-jobqueue and our java-lsf), which initially supports local and LSF executors. We can easily add others (e.g. Slurm) as necessary for non-Janelia users. The cluster executor type and default resources are configured in Fileglancer's
`config.yaml` under a new `cluster` section. See `docs/config.yaml.template` for the possible options.
The Jobs page lists all submitted jobs and updates periodically. Clicking on a Job brings you to a Job Detail page that shows four tabs: parameters, the generated script, stdout, and stderr. Jobs can be cancelled, deleted, or relaunched with the same (or modified) parameters.
On server restart, the job monitor automatically reconnects to any active cluster jobs and resumes polling, as long as the
`job_name_prefix` is set (otherwise this is determined randomly each time, similar to the `session_secret_key`). A reconciliation loop detects zombie jobs and syncs status transitions from the executor to the database.
Services
Services are cluster jobs that are started and stopped by the user as needed. They can be web servers, notebooks, APIs, etc.
- `type: service` in `runnables.yaml`. These run until explicitly stopped rather than running to completion.
- `SERVICE_URL_PATH` as an env var in the job script. Services write their URL to a new file at `SERVICE_URL_PATH`. Fileglancer polls this file and when the URL is written it gets displayed in the UI for the user to access.
Trying it out
You can paste in these app URLs:
fileglancer)
fileglancer)
@StephanPreibisch @JaneliaSciComp/fileglancer @dchen116 @mzouink