Merged
41 changes: 27 additions & 14 deletions .github/workflows/rust.yml
@@ -15,6 +15,8 @@
# specific language governing permissions and limitations
# under the License.

# For some actions, we use Runs-On to run them on ASF infrastructure: https://datafusion.apache.org/contributor-guide/#ci-runners

name: Rust

concurrency:
@@ -45,7 +47,7 @@ jobs:
# Check crate compiles and base cargo check passes
linux-build-lib:
name: linux build test
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m7a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=8,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
Contributor

One thing I was thinking is that someone in the future may run into this and have a hard time understanding what all the runs-on configuration is about.

I didn't see it documented anywhere -- https://github.com/search?q=repo%3Aapache%2Fdatafusion%20runs-on&type=code

Is there any chance that you could write up some documentation in a README that explains why we are using runs-on and how to apply/unapply it?

For example, without the context of this PR it might not be obvious that

    runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}

is the same as

    runs-on: ubuntu-latest

except that in the apache repo it triggers the use of special runners.

Collaborator Author

will do!

Collaborator Author

Updated docs/source/contributor-guide/index.md and referenced it in the CI yml file. Do you think we should also mention it in the README?

Collaborator Author

Going to merge this PR now, but happy to expand the docs further in a follow-up if you like 🙂

container:
image: amd64/rust
steps:
@@ -99,7 +101,7 @@ jobs:
linux-datafusion-substrait-features:
name: cargo check datafusion-substrait features
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
@@ -136,10 +138,11 @@ jobs:
linux-datafusion-proto-features:
name: cargo check datafusion-proto features
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
@@ -167,10 +170,11 @@ jobs:
linux-cargo-check-datafusion:
name: cargo check datafusion features
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
@@ -267,7 +271,7 @@ jobs:
linux-test:
name: cargo test (amd64)
needs: linux-build-lib
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m7a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
volumes:
@@ -318,8 +322,9 @@ jobs:
linux-test-datafusion-cli:
name: cargo test datafusion-cli (amd64)
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
@@ -347,10 +352,11 @@ jobs:
linux-test-example:
name: cargo examples (amd64)
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
@@ -377,10 +383,11 @@ jobs:
linux-test-doc:
name: cargo test doc (amd64)
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
@@ -398,10 +405,11 @@ jobs:
linux-rustdoc:
name: cargo doc
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
@@ -438,10 +446,11 @@ jobs:
verify-benchmark-results:
name: verify benchmark results (amd64)
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
@@ -471,7 +480,7 @@ jobs:
sqllogictest-postgres:
name: "Run sqllogictest with Postgres runner"
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
services:
@@ -489,6 +498,7 @@ jobs:
--health-timeout 5s
--health-retries 5
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
@@ -509,10 +519,11 @@ jobs:
sqllogictest-substrait:
name: "Run sqllogictest in Substrait round-trip mode"
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
@@ -639,10 +650,11 @@ jobs:
clippy:
name: clippy
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
@@ -685,10 +697,11 @@ jobs:
config-docs-check:
name: check configs.md and ***_functions.md is up-to-date
needs: linux-build-lib
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
container:
image: amd64/rust
steps:
- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
8 changes: 4 additions & 4 deletions datafusion-cli/src/object_storage.rs
@@ -749,7 +749,6 @@ mod tests {
eprintln!("{e}");
return Ok(());
}
let expected_region = "eu-central-1";
let location = "s3://test-bucket/path/file.parquet";
// Set it to a non-existent file to avoid reading the default configuration file
unsafe {
@@ -766,9 +765,10 @@ mod tests {
get_s3_object_store_builder(table_url.as_ref(), &aws_options, false).await?;

// Verify that the region was auto-detected in test environment
assert_eq!(
builder.get_config_value(&AmazonS3ConfigKey::Region),
Some(expected_region.to_string())
assert!(
builder
.get_config_value(&AmazonS3ConfigKey::Region)
.is_some()
);

Ok(())
28 changes: 27 additions & 1 deletion docs/source/contributor-guide/index.md
@@ -202,6 +202,32 @@ It's recommended to write a high-quality issue with a clear problem statement an

### CI Runners

We use [Runs-On](https://runs-on.com/) for some actions in the main repository, which run in the ASF AWS account to speed up CI time. In forks, these actions run on the default GitHub runners since forks do not have access to ASF infrastructure.
#### Runs-On

We use [Runs-On](https://runs-on.com/) for some actions in the main repository, which run in the ASF AWS account to speed up CI. In forks, these actions run on the default GitHub runners since forks do not have access to ASF infrastructure.

To configure them, we use the following format:

`runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}`

This is a conditional expression: in the main (`apache`) repository it requests a Runs-On custom runner with the listed labels (with `{0}` filled in by `github.run_id`), and in forks it falls back to the standard `ubuntu-latest` GitHub runner. The labels follow the [Runs-On job label format](https://runs-on.com/configuration/job-labels/).
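
A minimal sketch of a job using this pattern. The job name `example-job` and the `cargo check` step are hypothetical placeholders; the `runs-on` expression, container image, and checkout pin are copied from `.github/workflows/rust.yml`:

```yaml
jobs:
  example-job: # hypothetical job name, for illustration only
    # In apache/datafusion this resolves to the Runs-On label string built by format();
    # in a fork, github.repository_owner is not 'apache', so it resolves to 'ubuntu-latest'.
    runs-on: ${{ github.repository_owner == 'apache' && format('runs-on={0},family=m8a,cpu=16,image=ubuntu24-full-x64,extras=s3-cache,disk=large,tag=datafusion', github.run_id) || 'ubuntu-latest' }}
    container:
      image: amd64/rust
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
      - run: cargo check # hypothetical step
```

GitHub Actions expressions have no ternary operator, so `cond && a || b` is the usual idiom; it works here because `format(...)` always returns a non-empty (truthy) string. To move a job off Runs-On, replace the whole expression with `ubuntu-latest`.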

For those actions we also use the [Runs-On action](https://runs-on.com/caching/magic-cache/#how-to-use), which adds support for external caching and reports job metrics:

`- uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e`

On standard GitHub runners (for example, in forks), this action does nothing.
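
A sketch of how the beginning of a Runs-On job's `steps` looks, mirroring `.github/workflows/rust.yml`, where the Runs-On action is added as the first step, ahead of checkout. The `submodules: true` option is taken from jobs in that workflow that need it and is not required by the Runs-On action itself:

```yaml
steps:
  # Runs-On action: external caching and job metrics on Runs-On runners; no-op on standard GitHub runners
  - uses: runs-on/action@cd2b598b0515d39d78c38a02d529db87d2196d1e # v2.0.3
  - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
    with:
      submodules: true
```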

##### Spot Instances

By default, Runs-On jobs run on [spot instances](https://runs-on.com/configuration/spot-instances/), so they may occasionally be interrupted. When that happens, the CI log shows:

```
Error: The operation was canceled.
```

According to Runs-On, spot instance termination is extremely rare for instances that run for less than 1 hour, and interrupted jobs are restarted automatically.

#### GitHub Runners

We also use standard GitHub runners for some actions in the main repository; these are also runnable in forks.