Rexdf

The devil is in the Details.

Building TensorFlow on Apple M1


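This post records an ordinary Sunday afternoon. Apple's M1 has been out for about half a year, and rumors about the M2/M1X have already started, so I assumed TensorFlow support would be mostly in place. In practice there still seems to be no official native binary release. This post covers a native build on the M1. There are two other approaches, running Terminal under Rosetta 2 and using Miniforge, which this post does not discuss.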

0. Python environment

The Python versions that support the M1 are roughly 3.8.5+ and 3.9.1+.

pyenv install 3.8.11
pyenv virtualenv 3.8.11 tf3811
pyenv activate tf3811
pip install -U pip wheel
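Before going further it can help to confirm that the interpreter in the new virtualenv is a native arm64 build rather than an x86_64 one running under Rosetta 2. The following is only a minimal sketch; the helper name `is_native_arm64` is made up for illustration.

```python
import platform


def is_native_arm64(system: str, machine: str) -> bool:
    """Return True when the values describe a native Apple Silicon interpreter."""
    return system == "Darwin" and machine == "arm64"


# An x86_64 Python running under Rosetta 2 reports "x86_64" here.
print(platform.system(), platform.machine(),
      is_native_arm64(platform.system(), platform.machine()))
```

If the last value printed is False on an M1 machine, the shell or the Python build is probably running under Rosetta 2.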

1. Bazel version

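The TensorFlow website currently lists 3.7.2 as the latest supported Bazel version, and even the latest master branch caps the version at 3.99.0. Bazel's release binaries, however, only support darwin-arm64 from 4.1 onward.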

So I decided to use Bazel 4.1 directly and apply a patch to the TensorFlow source.

diff --git a/configure.py b/configure.py
index 4f096b87550..9f7e16bd29a 100644
--- a/configure.py
+++ b/configure.py
@@ -50,7 +50,8 @@ _TF_WORKSPACE_ROOT = ''
 _TF_BAZELRC = ''
 _TF_CURRENT_BAZEL_VERSION = None
 _TF_MIN_BAZEL_VERSION = '3.7.2'
-_TF_MAX_BAZEL_VERSION = '3.99.0'
+#_TF_MAX_BAZEL_VERSION = '3.99.0'
+_TF_MAX_BAZEL_VERSION = '5.0.0'
 
 NCCL_LIB_PATHS = [
     'lib64/', 'lib/powerpc64le-linux-gnu/', 'lib/x86_64-linux-gnu/', ''
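To see why raising the upper bound is enough, here is a rough sketch of a min/max version range check. It is not the code from configure.py; the function names `version_tuple` and `bazel_version_ok` are hypothetical, and only the version strings come from the patch above.

```python
def version_tuple(version: str) -> tuple:
    """Turn '4.1.0' into (4, 1, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def bazel_version_ok(current: str, minimum: str, maximum: str) -> bool:
    """Return True when minimum <= current <= maximum."""
    return version_tuple(minimum) <= version_tuple(current) <= version_tuple(maximum)


# Bazel 4.1.0 against the original bound, then against the patched bound.
print(bazel_version_ok("4.1.0", "3.7.2", "3.99.0"))
print(bazel_version_ok("4.1.0", "3.7.2", "5.0.0"))
```

With the original bound of 3.99.0 the check rejects 4.1.0; with the patched bound of 5.0.0 it passes.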

2. First attempt

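With the Bazel version sorted out, you just follow the official build guide straight through and everything works. Seems simple, right? If it were, I would not be writing this post (^_^)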

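First, when installing the two Python dependencies, pip install numpy will most likely install 1.21.0, but the TensorFlow master branch currently requires ~=1.19.2. After that you will run into a libclang error.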
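The `~=` operator is a "compatible release" constraint: `~=1.19.2` accepts 1.19.2 and later patch releases of 1.19, but not 1.20 or 1.21. The sketch below is a simplified illustration for plain numeric versions only, not pip's actual resolver; `satisfies_compatible_release` is a hypothetical helper.

```python
def version_tuple(version: str) -> tuple:
    """Turn '1.19.2' into (1, 19, 2)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate: str, spec: str) -> bool:
    """Simplified `~=` check for plain numeric versions such as '1.19.2'.

    `~=1.19.2` means: at least 1.19.2, and still within the 1.19 series.
    """
    cand, base = version_tuple(candidate), version_tuple(spec)
    prefix = base[:-1]  # drop the last component: (1, 19)
    return cand >= base and cand[:len(prefix)] == prefix


for candidate in ("1.21.0", "1.19.5", "1.19.1"):
    print(candidate, satisfies_compatible_release(candidate, "1.19.2"))
```

This is why the default 1.21.0 has to be replaced with a pinned 1.19.x release.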

3. Resolving dependencies

3.1 libclang

wget https://github.com/llvm/llvm-project/releases/download/llvmorg-11.1.0/llvm-project-11.1.0.src.tar.xz
tar xf llvm-project-11.1.0.src.tar.xz
mv llvm-project-11.1.0.src llvm-project-11.1.0
cd llvm-project-11.1.0
mkdir build
cd build
brew install gcc
cmake ../llvm -DLLVM_ENABLE_PROJECTS=clang -DBUILD_SHARED_LIBS=OFF -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE=MinSizeRel -DCMAKE_CXX_FLAGS_MINSIZEREL="-Os -s -DNDEBUG -static-libgcc -static-libstdc++" -DCMAKE_C_COMPILER=gcc-11 -DCMAKE_CXX_COMPILER=g++-11 -DCMAKE_OSX_DEPLOYMENT_TARGET=10.9
make libclang -j$(sysctl -n hw.ncpu)
cd ../..
wget https://github.com/sighingnow/libclang/archive/refs/tags/llvm-11.1.0.tar.gz
tar xvf llvm-11.1.0.tar.gz
cd libclang-llvm-11.1.0
cp ../llvm-project-11.1.0/build/lib/libclang.dylib native/
pip install -v .
# python setup.py build
# python setup.py install

3.2 numpy

brew install openblas
OPENBLAS="$(brew --prefix openblas)" pip install numpy==1.19.5

3.3 grpcio

CFLAGS="-I /opt/homebrew/opt/openssl/include" LDFLAGS="-L /opt/homebrew/opt/openssl/lib" GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1 pip install grpcio==1.38.1

3.4 h5py

brew install hdf5
HDF5_DIR=/opt/homebrew/opt/hdf5 pip3 install --no-build-isolation h5py==3.1.0
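Once the three packages are installed, a quick check that the active environment really holds the pinned versions can save a failed build later. This is a sketch using the standard library's `importlib.metadata`; the pins are copied from the commands above, while `installed_version` and `check_pins` are names invented for this example.

```python
from importlib import metadata

# Versions taken from the pip commands in sections 3.2 to 3.4.
PINS = {"numpy": "1.19.5", "grpcio": "1.38.1", "h5py": "3.1.0"}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


for name, (expected, found) in check_pins(PINS).items():
    print(f"{name}: expected {expected}, found {found}")
```

No output means every pinned package matches.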

4. Build

#!/bin/bash
set -e
pushd tensorflow
git pull
export TEST_TMPDIR=$HOME/tmp
bazel --output_user_root=$HOME/bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
#bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package $HOME
popd
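The script above writes the wheel into $HOME but does not install it. A small sketch for locating the newest wheel there and printing the matching pip command follows; the file name pattern `tensorflow-*.whl` is an assumption about what build_pip_package produces, and `newest_wheel` is a hypothetical helper.

```python
import glob
import os


def newest_wheel(directory, pattern="tensorflow-*.whl"):
    """Return the most recently modified matching wheel in `directory`, or None."""
    wheels = glob.glob(os.path.join(directory, pattern))
    return max(wheels, key=os.path.getmtime) if wheels else None


wheel = newest_wheel(os.path.expanduser("~"))
print(f"pip install {wheel}" if wheel else "no TensorFlow wheel found")
```

Picking by modification time avoids installing a stale wheel left over from an earlier build.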

5. Test

pushd $HOME/models/official/vision/image_classification
export MODEL_DIR="$HOME/models"
export DATA_DIR="$HOME/data"
export NUM_GPUS="0"
export PYTHONPATH=$PYTHONPATH:$HOME/models
python3 mnist_main.py \
--model_dir=$MODEL_DIR \
--data_dir=$DATA_DIR \
--train_epochs=10 \
--distribution_strategy=one_device \
--num_gpus=0 \
--download

Output

...
58/58 [==============================] - ETA: 0s - loss: 1.4935 - sparse_categorical_accuracy: 0.5453
2021-07-11 18:30:40.523279: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:461] The `assert_cardinality` transformation is currently not handled by the auto-shard rewrite and will be removed.
58/58 [==============================] - 19s 312ms/step - loss: 1.4935 - sparse_categorical_accuracy: 0.5453 - val_loss: 0.5256 - val_sparse_categorical_accuracy: 0.8397
Epoch 2/10
58/58 [==============================] - ETA: 0s - loss: 0.4501 - sparse_categorical_accuracy: 0.8579
2021-07-11 18:31:00.658115: W tensorflow/core/framework/dataset.cc:679] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
58/58 [==============================] - 20s 344ms/step - loss: 0.4501 - sparse_categorical_accuracy: 0.8579 - val_loss: 0.2695 - val_sparse_categorical_accuracy: 0.9229
Epoch 3/10
58/58 [==============================] - 20s 341ms/step - loss: 0.2812 - sparse_categorical_accuracy: 0.9153 - val_loss: 0.1970 - val_sparse_categorical_accuracy: 0.9427
...

This gets crushed by an Nvidia GPU at 15ms/step, and it is even slower than Intel at 260ms/step.

6. tensorflow-metal PluggableDevice

See tensorflow-metal

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
...
conda create -n tf39 python=3.9
conda activate tf39
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal
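After installing the plugin, it is worth checking that TensorFlow actually sees a GPU device before running the benchmark. The sketch below relies on `tf.config.list_physical_devices()`, which returns objects with a `device_type` attribute; `count_by_type` is a helper made up for this example, and the import is guarded so the snippet also runs where TensorFlow is absent.

```python
from collections import Counter


def count_by_type(devices):
    """Count devices by their `device_type` attribute, e.g. {'CPU': 1, 'GPU': 1}."""
    return dict(Counter(d.device_type for d in devices))


try:
    import tensorflow as tf
    print(count_by_type(tf.config.list_physical_devices()))
except ImportError:
    print("tensorflow is not installed in this environment")
```

If the printed dictionary has no GPU entry, the Metal plugin was presumably not picked up.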

Result

58/58 [==============================] - ETA: 0s - loss: 1.5647 - sparse_categorical_accuracy: 0.5239
2021-07-11 18:51:53.772789: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:461] The `assert_cardinality` transformation is currently not handled by the auto-shard rewrite and will be removed.
2021-07-11 18:51:53.799144: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-11 18:51:53.801124: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-11 18:51:53.844207: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
58/58 [==============================] - 8s 100ms/step - loss: 1.5647 - sparse_categorical_accuracy: 0.5239 - val_loss: 0.5425 - val_sparse_categorical_accuracy: 0.8359
Epoch 2/10
58/58 [==============================] - ETA: 0s - loss: 0.4310 - sparse_categorical_accuracy: 0.8648
2021-07-11 18:51:59.418133: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-11 18:51:59.420233: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
58/58 [==============================] - 5s 95ms/step - loss: 0.4310 - sparse_categorical_accuracy: 0.8648 - val_loss: 0.2834 - val_sparse_categorical_accuracy: 0.9180
Epoch 3/10
58/58 [==============================] - ETA: 0s - loss: 0.2663 - sparse_categorical_accuracy: 0.9195
2021-07-11 18:52:05.022747: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-11 18:52:05.025100: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
58/58 [==============================] - 6s 96ms/step - loss: 0.2663 - sparse_categorical_accuracy: 0.9195 - val_loss: 0.1869 - val_sparse_categorical_accuracy: 0.9452
Epoch 4/10
58/58 [==============================] - ETA: 0s - loss: 0.2088 - sparse_categorical_accuracy: 0.9369
2021-07-11 18:52:10.479798: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-11 18:52:10.482146: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
58/58 [==============================] - 5s 94ms/step - loss: 0.2088 - sparse_categorical_accuracy: 0.9369 - val_loss: 0.1528 - val_sparse_categorical_accuracy: 0.9548
Epoch 5/10
58/58 [==============================] - ETA: 0s - loss: 0.1774 - sparse_categorical_accuracy: 0.9464
2021-07-11 18:52:15.953144: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-07-11 18:52:15.955539: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
58/58 [==============================] - 5s 94ms/step - loss: 0.1774 - sparse_categorical_accuracy: 0.9464 - val_loss: 0.1292 - val_sparse_categorical_accuracy: 0.9602

95ms/step is acceptable. The 1060 runs at 35ms/step, so this is roughly a third of its speed.
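The ratios behind these comparisons can be worked out from the step times quoted in this post. This is just arithmetic on the numbers above; the figure for the native CPU build uses the third epoch (341ms/step) as a representative value, which is my own choice, and the dictionary labels are shorthand.

```python
# ms/step figures quoted in this post.
STEP_MS = {
    "M1 native CPU build": 341,
    "M1 tensorflow-metal": 95,
    "Intel": 260,
    "1060": 35,
    "Nvidia": 15,
}


def slowdown(slower_ms: float, faster_ms: float) -> float:
    """How many times longer one step takes compared with another."""
    return slower_ms / faster_ms


metal = STEP_MS["M1 tensorflow-metal"]
print(f"metal vs 1060: {slowdown(metal, STEP_MS['1060']):.1f}x slower")
print(f"CPU build vs metal: {slowdown(STEP_MS['M1 native CPU build'], metal):.1f}x slower")
```

A step on tensorflow-metal takes a little under three times as long as on the 1060, which matches the "about a third" estimate.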

7. References

  1. Installing TensorFlow on the M1 Mac
  2. Fails to build from source on MacOS 11 + M1
  3. Apple M1, Python, Pandas, and Homebrew
  4. BLD: fail to build on Apple M1
  5. How can I install GRPCIO on an Apple M1 Silicon laptop?
  6. libclang/.github/workflows/libclang-macosx-amd64.yml
