From 2d61ad2e2a9f5788633bcd173b1cfbfebe843a7e Mon Sep 17 00:00:00 2001 From: Bharati Khanijo Date: Wed, 25 Mar 2026 15:33:02 +0530 Subject: [PATCH 1/6] Updated Model list and documentation for DDP --- ...utilize_multiple_gpus_to_train_model.ipynb | 272 +++++++++++++++++- 1 file changed, 271 insertions(+), 1 deletion(-) diff --git a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb index f307890146..8c86b18ca2 100644 --- a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb +++ b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb @@ -1 +1,271 @@ -{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Train arcgis.learn models on multiple GPUs"]}, {"cell_type": "markdown", "metadata": {}, "source": [""]}, {"cell_type": "markdown", "metadata": {}, "source": ["
Accelerating AI with GPUs"]}, {"cell_type": "markdown", "metadata": {}, "source": ["In this guide, we will walk you through how `arcgis.learn` models with PyTorch backend, support training across multiple GPUs.\n", "You can verify how many GPUs are available to PyTorch by runing the following commands: \n", "\n", "`import torch \n", "print ('Available devices ', torch.cuda.device_count())\n", "`\n", "\n", "PyTorch provides capabilities to utilize multiple GPUs in two ways:\n", "- Data Parallelism\n", "- Model Parallelism\n", "\n", "`arcgis.learn` uses one of the two ways to train models using multiple GPUs.\n", "Each of the two ways has its own significance and both offer an easy means of wrapping your code to add the capability of training the model on multiple GPUs.\n", "\n", "**Data Parallelism**: Data Parallelism is when we split the mini-batch of samples into multiple smaller mini-batches and run the computation for each of the smaller mini-batches in parallel across multiple GPUs.\n", "\n", "Data Parallelism is implemented using `torch.nn.DataParallel`. You can wrap a Module in `DataParallel` and it will be parallelized over multiple GPUs in the batch dimension. For more details [click here](https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html). \n", "\n", "For certain models, `arcgis.learn` models already provide support for data parallelism in order to enhance model performance. This makes it easy for users to utilize multiple GPUs while training model on a single machine. Below is a detailed list of the models that have `DataParallel` support.\n", "\n", "- `FeatureClassifier`\n", "- `MaskRCNN`\n", "- `MultiTaskRoadExtractor`\n", "- `ConnectNet`\n", "- `PointCNN`\n", "- `SingleShotDetetor`\n", "- `UnetClassifier`\n", "\n", "You can set a subset of GPUs to be used for training your model by running the following command in the first cell:\n", "\n", "`\n", "import os\n", "os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' # for setting first 2 GPUs\n", "`"]}, {"cell_type": "markdown", "metadata": {}, "source": ["**Model parallelism**: is when we split a model between multiple devices or nodes (such as GPU-equipped instances) for creating an efficient training pipeline to maximize GPU utilization. Model parallelism is implemented using `torch.nn.DistributedDataParallel`. \n", "\n", "This approach parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension. The module is replicated on each machine and each device, and each such replica handles a portion of the input. During the backwards pass, gradients from each node are averaged. To read more, [click here](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html).\n", "Currently, the following models available in arcgis.learn support `DistributedDataParallel`: \n", "\n", "- `UnetClassifier`\n", "- `DeepLab`\n", "- `PSPNetClassifier`\n", "- `MaskRCNN`\n", "- `FasterRCNN`\n", "- `HEDEdgeDetector`\n", "- `BDCNEdgeDetector`\n", "- `ModelExtension`"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The following blocks of code download a script and training data that can be used to test the functionality of multi-gpu support, given a user has multiple GPUs. 
"]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["from arcgis.gis import GIS"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": ["gis = GIS()"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": ["script_item = gis.content.get('afd1c9a88a6f4f04896b4172c0f3a78c')"]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/plain": ["'train_model.zip'"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["script_item.name"]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": ["filepath = script_item.download(file_name=script_item.name)"]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": ["import zipfile\n", "import os\n", "from pathlib import Path\n", "with zipfile.ZipFile(filepath, 'r') as zip_ref:\n", " zip_ref.extractall(Path(filepath).parent)"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": ["script_path = Path(os.path.join(os.path.splitext(filepath)[0]) + '.py')"]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"data": {"text/plain": ["WindowsPath('C:/Users/Admin/AppData/Local/Temp/train_model.py')"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["script_path"]}, {"cell_type": "markdown", "metadata": {}, "source": ["To run multiple GPUs while training your model, you can download the script from above and/or create one for your own training data. Execute the command shown below in your command prompt:\n", "\n", "`python -m torch.distributed.launch --nproc_per_node=2 train-model.py`\n", "\n", "`nproc_per_node` = number of GPU instances per machine."]}, {"cell_type": "markdown", "metadata": {}, "source": ["For detailed arguments of (Distributed data Parallel)DDP, please refer to [this page](https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py). "]}, {"cell_type": "markdown", "metadata": {}, "source": [""]}, {"cell_type": "markdown", "metadata": {}, "source": ["To verify that your GPUs are utilized for training, run `nvidia-smi` as shown below:"]}, {"cell_type": "markdown", "metadata": {}, "source": [""]}, {"cell_type": "markdown", "metadata": {}, "source": ["### References:"]}, {"cell_type": "markdown", "metadata": {}, "source": ["- https://www.esri.com/arcgis-blog/products/arcgis-pro/imagery/deep-learning-with-arcgis-pro-tips-tricks/\n", "- https://pytorch.org/tutorials/intermediate/ddp_tutorial.html\n", "- https://fastai1.fast.ai/distributed.html\n", "- https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py\n", "- https://towardsdatascience.com/how-to-scale-training-on-multiple-gpus-dae1041f49d2"]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9"}}, "nbformat": 4, "nbformat_minor": 4} \ No newline at end of file +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Train arcgis.learn models on multiple GPUs" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
Accelerating AI with GPUs" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this guide, we will walk you through how `arcgis.learn` models with PyTorch backend, support training across multiple GPUs.\n", + "You can verify how many GPUs are available to PyTorch by runing the following commands: \n", + "\n", + "`import torch \n", + "print ('Available devices ', torch.cuda.device_count())\n", + "`\n", + "\n", + "PyTorch provides capabilities to utilize multiple GPUs in two ways:\n", + "- Data Parallelism\n", + "- Distributed Data Parallelism\n", + "\n", + "`arcgis.learn` uses one of the two ways to train models using multiple GPUs.\n", + "Each of the two ways has its own significance and both offer an easy means of wrapping your code to add the capability of training the model on multiple GPUs.\n", + "\n", + "**Data Parallelism**: Data Parallelism is when we split the mini-batch of samples into multiple smaller mini-batches and run the computation for each of the smaller mini-batches in parallel across multiple GPUs, on a single machine.\n", + "\n", + "Data Parallelism is implemented using `torch.nn.DataParallel`. You can wrap a Module in `DataParallel` and it will be parallelized over multiple GPUs in the batch dimension. For more details [click here](https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html). \n", + "\n", + "For certain models, `arcgis.learn` models already provide support for data parallelism in order to enhance model performance. This makes it easy for users to utilize multiple GPUs while training model on a single machine. Below is a detailed list of the models that have `DataParallel` support.\n", + "\n", + "- `FeatureClassifier`\n", + "- `SingleShotDetector`\n", + "\n", + "You can set a subset of GPUs to be used for training your model by running the following command in the first cell:\n", + "\n", + "`\n", + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' # for setting first 2 GPUs\n", + "`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Distributed Data Parallelism (DDP)**: In DDP, data is partitioned across multiple devices or nodes (such as GPU-equipped instances). It is implemented using `torch.nn.DistributedDataParallel`. Unlike data parallelism, DDP supports model parallelism across multiple machines, thus making it significantly more scalable. \n", + "\n", + "This approach parallelizes by chunking data along the batch dimension and distributing it across specified devices. 
Ideally, one process is spawned for each model replica; however, a single replica may span multiple devices if the model is large.\n", + "During the backward pass, DDP automatically synchronizes gradients across all processes to ensure consistent model updates.\n", + "To read more, [click here](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html).\n", + "Currently, the following models available in arcgis.learn support `DistributedDataParallel`: \n", + "\n", + "- `UnetClassifier`\n", + "- `DeepLab`\n", + "- `PSPNetClassifier`\n", + "- `MaskRCNN`\n", + "- `FasterRCNN`\n", + "- `HEDEdgeDetector`\n", + "- `BDCNEdgeDetector`\n", + "- `ModelExtension`\n", + "- `SuperResolution`\n", + "- `DETReg`\n", + "- `MaXDeepLab`\n", + "- `MMDetection`\n", + "- `MMSegmentation`\n", + "- `RTDetrV2`\n", + "- `SamLoRA`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following blocks of code download a script and training data that can be used to test the functionality of multi-gpu support, given a user has multiple GPUs. " + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from arcgis.gis import GIS" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "gis = GIS()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "script_item = gis.content.get('afd1c9a88a6f4f04896b4172c0f3a78c')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'train_model.zip'" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "script_item.name" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "filepath = script_item.download(file_name=script_item.name)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "import zipfile\n", + "import os\n", + "from pathlib import Path\n", + "with zipfile.ZipFile(filepath, 'r') as zip_ref:\n", + " zip_ref.extractall(Path(filepath).parent)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "script_path = Path(os.path.join(os.path.splitext(filepath)[0]) + '.py')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "WindowsPath('C:/Users/Admin/AppData/Local/Temp/train_model.py')" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "script_path" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To run multiple GPUs while training your model, you can download the script from above and/or create one for your own training data. Execute the command shown below in your command prompt:\n", + "\n", + "`python -m torch.distributed.launch --nproc_per_node=2 train_model.py`\n", + "\n", + "`nproc_per_node` = number of GPU instances to be used on the given machine." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For detailed arguments of (Distributed data Parallel)DDP, please refer to [this page](https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py). 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To verify that your GPUs are utilized for training, run `nvidia-smi` as shown below:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### References:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- https://www.esri.com/arcgis-blog/products/arcgis-pro/imagery/deep-learning-with-arcgis-pro-tips-tricks/\n", + "- https://pytorch.org/tutorials/intermediate/ddp_tutorial.html\n", + "- https://fastai1.fast.ai/distributed.html\n", + "- https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py\n", + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "dl-embedding_test_01", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 6ed2d8c7c13b6e6cf3c97ec574069513cffe7249 Mon Sep 17 00:00:00 2001 From: Bharati Khanijo Date: Wed, 25 Mar 2026 16:01:30 +0530 Subject: [PATCH 2/6] Fix for Spelling errors and grammatical mistakes --- .../utilize_multiple_gpus_to_train_model.ipynb | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb index 8c86b18ca2..d2e41e406c 100644 --- a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb +++ b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb @@ -25,25 +25,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In this guide, we will walk you through how `arcgis.learn` models with PyTorch backend, support training across multiple GPUs.\n", - "You can verify how many GPUs are available to PyTorch by runing the following commands: \n", + "In this guide, we will walk you through how `arcgis.learn` models with PyTorch backend support training across multiple GPUs.\n", + "You can verify how many GPUs are available to PyTorch by running the following commands: \n", "\n", "`import torch \n", "print ('Available devices ', torch.cuda.device_count())\n", "`\n", "\n", - "PyTorch provides capabilities to utilize multiple GPUs in two ways:\n", + "PyTorch provides the capability to utilize multiple GPUs in two ways:\n", "- Data Parallelism\n", "- Distributed Data Parallelism\n", "\n", - "`arcgis.learn` uses one of the two ways to train models using multiple GPUs.\n", - "Each of the two ways has its own significance and both offer an easy means of wrapping your code to add the capability of training the model on multiple GPUs.\n", + "`arcgis.learn` uses one of these two methods to train models using multiple GPUs.\n", + "Each of the two ways has its own significance and both offer an easy means of wrapping your code to add capability to train the model on multiple GPUs.\n", "\n", - "**Data Parallelism**: Data Parallelism is when we split the mini-batch of samples into multiple smaller mini-batches and run the computation for each of the smaller mini-batches in parallel across multiple GPUs, on a single machine.\n", + "**Data Parallelism**: Data Parallelism refers to splitting 
the mini-batch of samples into multiple smaller mini-batches and run the computation for each of the smaller mini-batches in parallel across multiple GPUs, on a single machine.\n", "\n", "Data Parallelism is implemented using `torch.nn.DataParallel`. You can wrap a Module in `DataParallel` and it will be parallelized over multiple GPUs in the batch dimension. For more details [click here](https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html). \n", "\n", - "For certain models, `arcgis.learn` models already provide support for data parallelism in order to enhance model performance. This makes it easy for users to utilize multiple GPUs while training model on a single machine. Below is a detailed list of the models that have `DataParallel` support.\n", + "For certain models, `arcgis.learn` already provide support for data parallelism in order to enhance model performance. This makes it easy for users to utilize multiple GPUs while training a model on a single machine. Below is a detailed list of the models that have `DataParallel` support.\n", "\n", "- `FeatureClassifier`\n", "- `SingleShotDetector`\n", @@ -52,7 +52,7 @@ "\n", "`\n", "import os\n", - "os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' # for setting first 2 GPUs\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' # for setting the first 2 GPUs\n", "`" ] }, @@ -60,7 +60,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Distributed Data Parallelism (DDP)**: In DDP, data is partitioned across multiple devices or nodes (such as GPU-equipped instances). It is implemented using `torch.nn.DistributedDataParallel`. Unlike data parallelism, DDP supports model parallelism across multiple machines, thus making it significantly more scalable. \n", + "**Distributed Data Parallelism (DDP)**: In DDP, data is partitioned across multiple devices or nodes (such as GPU-equipped instances). It is implemented using `torch.nn.DistributedDataParallel`. Unlike data parallelism, DDP supports model parallelism across multiple machines, making it significantly more scalable. \n", "\n", "This approach parallelizes by chunking data along the batch dimension and distributing it across specified devices. Ideally, one process is spawned for each model replica; however, a single replica may span multiple devices if the model is large.\n", "During the backward pass, DDP automatically synchronizes gradients across all processes to ensure consistent model updates.\n", From b1aceeaee068da79464aa368e760899af99d01c4 Mon Sep 17 00:00:00 2001 From: Bharati Khanijo Date: Wed, 25 Mar 2026 16:12:45 +0530 Subject: [PATCH 3/6] Grammatical enhancement --- .../utilize_multiple_gpus_to_train_model.ipynb | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb index d2e41e406c..71b7d82a73 100644 --- a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb +++ b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb @@ -88,7 +88,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The following blocks of code download a script and training data that can be used to test the functionality of multi-gpu support, given a user has multiple GPUs. " + "The following blocks of code download a script and training data that can be used to test the functionality of multi-GPU support, provided the user has multiple GPUs. 
\n",
+    "The user can modify the script downloaded at `script_path` to be in accordance with their model and dataset."
    ]
   },
   {
@@ -204,7 +205,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "For detailed arguments of (Distributed data Parallel)DDP, please refer to [this page](https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py). "
+    "For detailed arguments of DDP (Distributed Data Parallel), please refer to [this page](https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py). "
    ]
   },

From 2e84674b7fe346668de66fe1eaa8cb3c894a2c5c Mon Sep 17 00:00:00 2001
From: Bharati Khanijo
Date: Wed, 25 Mar 2026 16:32:47 +0530
Subject: [PATCH 4/6] Updated to improve narrative consistency

---
 .../utilize_multiple_gpus_to_train_model.ipynb | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb
index 71b7d82a73..0e186c0a9c 100644
--- a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb
+++ b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb
@@ -25,7 +25,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In this guide, we will walk you through how `arcgis.learn` models with PyTorch backend support training across multiple GPUs.\n",
+    "In this guide, we walk you through how `arcgis.learn` models with a PyTorch backend support training across multiple GPUs.\n",
     "You can verify how many GPUs are available to PyTorch by running the following commands: \n",
     "\n",
     "`import torch \n",
@@ -37,13 +37,13 @@
     "- Distributed Data Parallelism\n",
     "\n",
     "`arcgis.learn` uses one of these two methods to train models using multiple GPUs.\n",
-    "Each of the two ways has its own significance and both offer an easy means of wrapping your code to add capability to train the model on multiple GPUs.\n",
+    "Each method has its own significance and both offer an easy way to wrap your code to enable training on multiple GPUs.\n",
     "\n",
-    "**Data Parallelism**: Data Parallelism refers to splitting the mini-batch of samples into multiple smaller mini-batches and run the computation for each of the smaller mini-batches in parallel across multiple GPUs, on a single machine.\n",
+    "**Data Parallelism**: Data Parallelism refers to splitting the mini-batch of samples into multiple smaller mini-batches and running the computation for each of the smaller mini-batches in parallel across multiple GPUs on a single machine.\n",
     "\n",
-    "Data Parallelism is implemented using `torch.nn.DataParallel`. You can wrap a Module in `DataParallel` and it will be parallelized over multiple GPUs in the batch dimension. For more details [click here](https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html). \n",
+    "Data Parallelism is implemented using `torch.nn.DataParallel`. You can wrap a module in `DataParallel`, and it will be parallelized over multiple GPUs in the batch dimension. For more details [click here](https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html). \n",
     "\n",
-    "For certain models, `arcgis.learn` already provide support for data parallelism in order to enhance model performance. This makes it easy for users to utilize multiple GPUs while training a model on a single machine. 
Below is a detailed list of the models that have `DataParallel` support.\n",
+    "For certain models, `arcgis.learn` already provides support for data parallelism to enhance model performance. This makes it easy for users to utilize multiple GPUs while training a model on a single machine. Below is a list of the models that have `DataParallel` support.\n",
     "\n",
     "- `FeatureClassifier`\n",
     "- `SingleShotDetector`\n",
@@ -65,7 +65,7 @@
     "**Distributed Data Parallelism (DDP)**: In DDP, data is partitioned across multiple devices or nodes (such as GPU-equipped instances). It is implemented using `torch.nn.DistributedDataParallel`. Unlike data parallelism, DDP supports model parallelism across multiple machines, making it significantly more scalable. \n",
     "\n",
     "This approach parallelizes by chunking data along the batch dimension and distributing it across specified devices. Ideally, one process is spawned for each model replica; however, a single replica may span multiple devices if the model is large.\n",
     "During the backward pass, DDP automatically synchronizes gradients across all processes to ensure consistent model updates.\n",
     "To read more, [click here](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html).\n",
-    "Currently, the following models available in arcgis.learn support `DistributedDataParallel`: \n",
+    "Currently, the following models in `arcgis.learn` support `DistributedDataParallel`: \n",
     "\n",
     "- `UnetClassifier`\n",
     "- `DeepLab`\n",
@@ -88,8 +88,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The following blocks of code download a script and training data that can be used to test the functionality of multi-GPU support, provided the user has multiple GPUs. \n",
-    "The user can modify the script downloaded at `script_path` to be in accordance with their model and dataset."
+    "The following blocks of code download a script and training data that you can use to test the functionality of multi-GPU support, provided you have multiple GPUs. \n",
+    "You can modify the script downloaded at `script_path` to be in match your model and dataset."
    ]
   },
@@ -219,7 +219,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To verify that your GPUs are utilized for training, run `nvidia-smi` as shown below:"
+    "To verify that your GPUs are being utilized for training, run `nvidia-smi` as shown below:"
    ]
   },

From aa36625547a719a5001c06061762105ce52f3c19 Mon Sep 17 00:00:00 2001
From: Bharati Khanijo
Date: Wed, 25 Mar 2026 16:50:39 +0530
Subject: [PATCH 5/6] Updates for better consistency

---
 .../utilize_multiple_gpus_to_train_model.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb
index 0e186c0a9c..5ab3dd5174 100644
--- a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb
+++ b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb
@@ -88,8 +88,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The following blocks of code download a script and training data that you can use to test the functionality of multi-GPU support, provided you have multiple GPUs. \n",
-    "You can modify the script downloaded at `script_path` to be in match your model and dataset."
+    "The following blocks of code download a sample script that you can use to test the functionality of multi-GPU support, provided you have multiple GPUs. \n",
+    "You can modify the script downloaded at `script_path` to include your model and dataset."
] }, { From db7eb3a7eeda79dbf94a3e3e2edd1447a7d4c0fb Mon Sep 17 00:00:00 2001 From: Bharati Khanijo Date: Wed, 25 Mar 2026 16:58:58 +0530 Subject: [PATCH 6/6] Jupyter Black changes --- .../utilize_multiple_gpus_to_train_model.ipynb | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb index 5ab3dd5174..333c746d16 100644 --- a/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb +++ b/guide/14-deep-learning/utilize_multiple_gpus_to_train_model.ipynb @@ -103,7 +103,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -112,16 +112,16 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "script_item = gis.content.get('afd1c9a88a6f4f04896b4172c0f3a78c')" + "script_item = gis.content.get(\"afd1c9a88a6f4f04896b4172c0f3a78c\")" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -150,24 +150,25 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import zipfile\n", "import os\n", "from pathlib import Path\n", - "with zipfile.ZipFile(filepath, 'r') as zip_ref:\n", + "\n", + "with zipfile.ZipFile(filepath, \"r\") as zip_ref:\n", " zip_ref.extractall(Path(filepath).parent)" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "script_path = Path(os.path.join(os.path.splitext(filepath)[0]) + '.py')" + "script_path = Path(os.path.join(os.path.splitext(filepath)[0]) + \".py\")" ] }, {
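
Note on the launch command used throughout this series: the contents of the downloaded `train_model.py` item are not shown in these patches. As a rough illustration only, a minimal stand-in script compatible with `python -m torch.distributed.launch --nproc_per_node=2 ...` could look like the sketch below. It uses plain PyTorch rather than `arcgis.learn`, and the toy dataset, toy model, and reliance on the `LOCAL_RANK` environment variable (set by the launcher on recent PyTorch versions, which also ship `torchrun` as the successor entry point) are assumptions, not the actual script.

# ddp_sketch.py -- hypothetical stand-in for train_model.py, NOT the downloaded item.
# Launch: python -m torch.distributed.launch --nproc_per_node=2 ddp_sketch.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # The launcher starts one process per GPU and sets LOCAL_RANK, RANK,
    # WORLD_SIZE, MASTER_ADDR, and MASTER_PORT for each of them.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Toy stand-ins for the real training data and arcgis.learn model.
    dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)  # disjoint data shard per process
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # one replica per process

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle so shards differ across epochs
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()  # DDP averages gradients across processes here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

With `--nproc_per_node=2`, two such processes run side by side, each driving one GPU with its own shard of the data; `nvidia-smi` should then show activity on both devices, as the guide describes.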