feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery #10352
Open
localai-bot wants to merge 3 commits into
… + gallery

Mirrors the locate-anything-cpp backend to register a new depth-anything backend that wraps the Depth Anything 3 ggml port (depth-anything.cpp) via purego (cgo-less, no Python at inference).

- backend/go/depth-anything-cpp/: gRPC backend (Load + Predict + GenerateImage), purego binding to the da_capi_* C ABI, CMake/Makefile/run/package/test scripts building depth-anything.cpp's DA_SHARED static .so per CPU variant.
- backend/index.yaml: depth-anything backend meta + all hardware-variant capability entries (cpu/cuda12/cuda13/intel-sycl-f32+f16/vulkan/nvidia-l4t).
- gallery/index.yaml: 8 Depth Anything 3 GGUF models (base q4_k/q8_0/f16/f32, small, large, giant, mono-large).
- .github/backend-matrix.yml: one build entry per hardware variant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
			img.Pix[y*img.Stride+x] = uint8(n * 255)
		}
	}
	f, err := os.Create(dst)

h, w = int(ch), int(cw)
n := h * w
if n > 0 {
	src := unsafe.Slice(ptr, n)
		return nil, fmt.Errorf("depth-anything-cpp: mkdir export dir: %w", err)
	}
	dstDir = tmp
} else if err := os.MkdirAll(dstDir, 0o755); err != nil {

	paths = append(paths, out)
case "colmap":
	out := filepath.Join(dstDir, "colmap")
	if err := os.MkdirAll(out, 0o755); err != nil {

if p == nil || n <= 0 {
	return nil
}
src := unsafe.Slice(p, n)
The Depth RPC handler calls da_capi_depth_dense / da_capi_points (C-API ABI 3); pin the native build to the commit that exports them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What

Adds a new depth-anything backend to LocalAI, mirroring the existing locate-anything-cpp backend. It wraps the Depth Anything 3 ggml port (monocular metric depth + camera pose estimation in C++/ggml), loaded via purego (cgo-less, no Python at inference).

Depth has no native OpenAI endpoint, so the model is exposed two ways: GenerateImage (depth PNG) and Predict (JSON depth stats + pose).

The C side shares a ggml graph allocator and is not reentrant, so the backend embeds base.SingleThread to serialize inference.

Backend

backend/go/depth-anything-cpp/ mirrors locate-anything-cpp's structure: a single Go package main, where main.go dlopens the variant .so and registers the da_capi_* symbols, and godepthanythingcpp.go implements Load/Predict/GenerateImage. The CMakeLists.txt + Makefile clone and build depth-anything.cpp pinned to commit 61ede2a6f1402aab3875729126830b61561db6ae, using -DDA_SHARED=ON -DDA_BUILD_CLI=OFF -DDA_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF to produce a self-contained libdepthanythingcpp-<variant>.so (ggml linked statically) per CPU variant (avx/avx2/avx512/fallback), exactly like locate-anything-cpp.

Models (gallery)
Eight Depth Anything 3 GGUFs published at huggingface.co/mudler/depth-anything.cpp-gguf, all with backend: depth-anything:

- depth-anything-3-base (q4_k default), -q8_0, -f16, -f32
- depth-anything-3-small (vits), -large (vitl), -giant (vitg)
- depth-anything-3-mono-large (monocular, depth + sky mask, no pose)

Registration

- backend/index.yaml: depth-anything meta + all hardware-variant capability/image-tag entries (cpu, cuda12, cuda13, intel-sycl-f32/f16, vulkan, nvidia-l4t-arm64, cuda13-l4t), same quay.io/go-skynet/local-ai-backends:... + localai/localai-backends:... mirror scheme as locate-anything-cpp.
- .github/backend-matrix.yml: one build entry per hardware variant for backend: "depth-anything-cpp".

Note / deviation
Faithfully mirrors locate-anything-cpp, which is part of LocalAI's root Go module (no per-backend go.mod/proto). The backend therefore uses pkg/grpc + pkg/grpc/proto + base.SingleThread rather than the standalone go.mod/proto from the depth-anything.cpp repo, as required for compatibility with LocalAI's gRPC server contract.

API: typed Depth RPC + POST /v1/depth

This PR also adds a typed Depth gRPC RPC (mirroring the existing Detect RPC end-to-end) and a REST endpoint that expose the full Depth Anything 3 output surface, not just the depth PNG / stats JSON.
Proto (backend/backend.proto):

rpc Depth(DepthRequest) returns (DepthResponse) {}

message DepthRequest {
  string src = 1;                // image: filesystem path or base64 payload
  string dst = 2;                // optional output dir for exports
  bool include_depth = 3;
  bool include_confidence = 4;
  bool include_pose = 5;
  bool include_sky = 6;
  bool include_points = 7;
  float points_conf_thresh = 8;
  repeated string exports = 9;   // "glb", "colmap"
}

message DepthResponse {
  int32 width = 1;
  int32 height = 2;
  repeated float depth = 3;          // width*height row-major metric depth
  repeated float confidence = 4;     // DualDPT
  repeated float sky = 5;            // mono models
  repeated float extrinsics = 6;     // 12 (3x4)
  repeated float intrinsics = 7;     // 9 (3x3)
  int32 num_points = 8;
  repeated float points = 9;         // num_points*3 xyz (world space)
  bytes point_colors = 10;           // num_points*3 u8 rgb
  repeated string export_paths = 11;
  bool is_metric = 12;
}

REST:
POST /v1/depth (request schema.DepthRequest → schema.DepthResponse).

Accepts the image as a file path, base64, or URL (like the other vision endpoints); point_colors is base64-encoded in the JSON response. When no include_* flag is set, the endpoint returns everything the model produces (depth + confidence + pose + sky). Routed via a new depth usecase (FLAG_DEPTH / MethodDepth) wired into backend_capabilities.go for the depth-anything backend.

Full data surface: per-pixel metric depth + confidence (DualDPT) or depth + sky (mono), camera pose (extrinsics 3x4 / intrinsics 3x3), a back-projected 3D point cloud (xyz + rgb, conf-thresholded), and glb / COLMAP exports, all backed by the native ABI-3 C-API (da_capi_depth_dense, da_capi_points, da_capi_export_glb, da_capi_export_colmap). The existing Predict (stats JSON) and GenerateImage (depth PNG) paths are kept.

Wiring mirrors Detect across every layer: backend.proto, the regenerated pkg/grpc/proto (gitignored, built via make protogen-go), pkg/grpc (interface.go / backend.go / client.go / base / embed / server), pkg/model + core/services/nodes client wrappers, core/backend/depth.go, and core/http/endpoints/localai/depth.go + route registration.

Test Plan
- cd backend/go/depth-anything-cpp && make depth-anything-cpp (clones + builds depth-anything.cpp variants + Go binary)
- make test (downloads depth-anything-small-f32.gguf + a test image, runs the gRPC Load/Predict smoke test in main_test.go)
- make package produces a self-contained package dir (run.sh + variant .so files + bundled libc/libstdc++/libgomp)
- depth-anything-3-base, call GenerateImage (depth PNG) and Predict (JSON depth stats + pose)

🤖 Generated with Claude Code
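As a companion to the last test-plan step, the flat depth array described for DepthResponse (width*height values, row-major) can be rendered to an 8-bit grayscale image for a quick visual check. The sketch below is hypothetical: DepthToGray is not a function from this PR, and min-max normalization is an assumption made for illustration, so the backend's GenerateImage may scale depth differently.

```go
package main

import (
	"errors"
	"fmt"
	"image"
	"math"
)

// DepthToGray converts a flat, row-major depth buffer (width*height values)
// into an 8-bit grayscale image.
// ASSUMPTION: min-max normalization over finite values; this is an
// illustrative choice, not necessarily what the backend does.
func DepthToGray(depth []float32, width, height int) (*image.Gray, error) {
	if width <= 0 || height <= 0 || len(depth) != width*height {
		return nil, errors.New("depth length does not match width*height")
	}

	isFinite := func(v float32) bool {
		f := float64(v)
		return !math.IsNaN(f) && !math.IsInf(f, 0)
	}

	// Find the finite range so NaN/Inf pixels do not skew the scaling.
	lo, hi := float32(math.Inf(1)), float32(math.Inf(-1))
	for _, d := range depth {
		if !isFinite(d) {
			continue
		}
		if d < lo {
			lo = d
		}
		if d > hi {
			hi = d
		}
	}
	span := hi - lo

	img := image.NewGray(image.Rect(0, 0, width, height))
	for y := 0; y < height; y++ {
		for x := 0; x < width; x++ {
			d := depth[y*width+x]
			var n float32 // stays 0 for non-finite pixels or a flat map
			if span > 0 && isFinite(d) {
				n = (d - lo) / span
			}
			img.Pix[y*img.Stride+x] = uint8(n * 255)
		}
	}
	return img, nil
}

func main() {
	img, err := DepthToGray([]float32{1, 2, 3, 5}, 2, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(img.Pix) // prints [0 63 127 255]
}
```

The returned *image.Gray can be passed to png.Encode from the standard image/png package to write a file for comparison with the PNG that GenerateImage produces.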