3.4. GPU#

GPUs and other heterogeneous accelerators are widely used to accelerate deep learning. Together with NVIDIA, the Dask community provides a GPU-based data science toolkit to speed up a variety of workloads.

Dask GPU Clusters#

Dask-CUDA extends dask.distributed so that it can detect and manage GPU devices.

Before using these GPUs, install Dask-CUDA with pip install dask-cuda. Similar to the dask.distributed deployment described in Section 3.3, Dask-CUDA provides a single-machine LocalCUDACluster. LocalCUDACluster automatically discovers and registers all GPUs on the node and pairs each GPU with a number of CPU cores. For example, my environment has 4 GPUs, so starting a single-machine Dask cluster launches 4 Dask workers, each assigned one GPU.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()
client = Client(cluster)
client
/fs/fast/u20200002/envs/dispy/lib/python3.11/site-packages/distributed/node.py:182: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37111 instead
  warnings.warn(

Client: Client-5c3311bf-0ce5-11ef-bd8c-000012e4fe80
Connection method: Cluster object | Cluster type: dask_cuda.LocalCUDACluster
Dashboard: http://127.0.0.1:37111/status
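
By default, LocalCUDACluster uses every GPU it detects. To restrict it to a subset, you can pass the CUDA_VISIBLE_DEVICES argument; a minimal sketch, where the GPU indices 0 and 1 are just an example:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# one worker is started per listed GPU; here only GPU 0 and GPU 1 are used
cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="0,1")
client = Client(cluster)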

We can also launch a multi-node Dask GPU cluster. First, start the Dask scheduler:

dask scheduler

Then start a GPU-aware worker on every GPU node; together they form a Dask GPU cluster:

dask cuda worker tcp://scheduler:8786

Finally, connect a client to the scheduler:

client = Client("10.0.0.3:8786")
client

Client: Client-6039933f-0ce3-11ef-b163-000012e4fe80
Connection method: Direct
Dashboard: http://10.0.0.3:8787/status

Scheduler: Scheduler-d073585d-dcac-41bf-9c5c-1055fe07576c
Comm: tcp://10.0.0.3:8786 | Workers: 8 | Total threads: 8 | Total memory: 180.00 GiB

Each of the 8 workers manages one GPU, with 1 thread and 22.50 GiB of memory: 4 workers on node 10.0.0.2 and 4 on node 10.0.0.3.
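
To confirm that each worker is pinned to its own GPU, we can run a small function on every worker with client.run(). A minimal sketch, assuming CuPy is installed on all workers:

def current_device():
    import cupy
    # index of the GPU that this worker process is bound to
    return cupy.cuda.runtime.getDevice()

# executes on every worker; returns a mapping from worker address to GPU index
client.run(current_device)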

Note

Dask-CUDA only discovers and registers the GPUs; it cannot isolate them, so non-Dask tasks can still preempt these GPUs. Isolating GPU resources requires container technologies such as Kubernetes.

GPU Tasks#

Not all workloads can be accelerated by GPUs; GPUs mainly speed up compute-intensive tasks such as machine learning and deep learning. The GPU-backed frameworks that Dask currently supports include the following (a short sketch follows the list):

  • The Dask-distributed version of CuPy (Dask Array backed by CuPy)

  • Dask-cuDF, which scales the GPU-accelerated cuDF out across a GPU cluster
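
For example, Dask DataFrame offers a cuDF backend analogous to the CuPy backend for Dask Array shown later. A minimal sketch, assuming dask_cudf is installed and using a hypothetical file data.csv:

import dask
import dask.dataframe as dd

# route Dask DataFrame operations to the GPU-backed cuDF
dask.config.set({"dataframe.backend": "cudf"})

# "data.csv" is a hypothetical example file
ddf = dd.read_csv("data.csv")
ddf.head()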

Note

When using libraries that run on NVIDIA GPUs, add the CUDA directories to the PATH and LD_LIBRARY_PATH environment variables: CuPy and cuDF depend on the CUDA dynamic libraries installed there.
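
A quick way to verify that CuPy can see the CUDA installation and the GPUs is to query the CUDA runtime:

import cupy

# number of GPUs visible to this process
print(cupy.cuda.runtime.getDeviceCount())
# version of the CUDA runtime that CuPy is using
print(cupy.cuda.runtime.runtimeGetVersion())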

Example: Singular Value Decomposition#

The code below performs singular value decomposition (SVD) on the GPU, a task well suited to GPU acceleration. Setting dask.config.set({"array.backend": "cupy"}) switches the Dask Array execution backend to CuPy on the GPU.

import cupy
import dask
import dask.array as da

# use CuPy as the Dask Array backend
dask.config.set({"array.backend": "cupy"})
# a random state backed by CuPy, so chunks are generated directly on the GPU
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.random((10000, 1000), chunks=(1000, 1000))
u, s, v = da.linalg.svd(x)
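
svd only builds the task graph; nothing has run on the GPU yet. Calling dask.compute() triggers the computation, and the results come back as CuPy arrays:

# trigger execution on the GPU workers; u, s, v become CuPy ndarrays
u, s, v = dask.compute(u, s, v)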