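Looking into Amazon Elastic Inference (EI)

I wrote up some notes I took a long time ago as a blog post.

What is Amazon Elastic Inference (EI)?
Pricing
Limitations
Differences from GPU instances
What you need to do to use Amazon EI
Launching an EC2 instance
Running a model with EI Predictor
Reference URLs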

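What is Amazon Elastic Inference (EI)?

Amazon Elastic Inference (Amazon EI) is an accelerated compute service that lets you attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 or Amazon SageMaker instance type or Amazon ECS task. This means you can choose the instance type that best suits the overall compute, memory, and storage needs of your application, and then separately configure the amount of inference acceleration you need.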

https://aws.amazon.com/jp/machine-learning/elastic-inference/faqs/

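In short, it seems to let you attach GPU acceleration to EC2, ECS, and SageMaker.

Pricing

See the Amazon Elastic Inference pricing page for the price list. A calculation example follows later, but by choosing a suitable instance type you can keep costs below those of a GPU instance.

Limitations

Unlike GPU instances, EI comes with several limitations.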

Before you get started with Amazon Elastic Inference - Amazon Elastic Inference

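Only one EI accelerator can be attached to a single EC2 instance
EI accelerators cannot be shared between EC2 instances
An EI accelerator cannot be detached from an EC2 instance or moved to another instance, and its type cannot be changed either (the only option is to terminate the instance)
Only the Amazon Elastic Inference enhanced MXNet and Amazon Elastic Inference enhanced TensorFlow libraries can make inference calls to Elastic Inference accelerators. They cannot be used together with CUDA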

Differences from GPU instances

EI reaches the accelerator through a VPC endpoint, so inference latency is higher than with a local GPU
According to the documentation, c5.xlarge × eia2.medium reportedly performs as well as or better than p2.xlarge
- p2.xlarge : $ 1.542 * 24h * 30days = $ 1,110.24
- c5.xlarge × eia1.large : ($ 0.214 + $ 0.450) * 24h * 30days = $ 478.08 *1
GPU utilization and memory usage can be viewed in CloudWatch
- https://docs.aws.amazon.com/ja_jp/elastic-inference/latest/developerguide/ei-cloudwatch-metrics.html
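The monthly figures in the list above can be reproduced with a few lines of Python. This is only a sketch: the hourly prices are the ones quoted in this post (they differ by region and change over time), the 30-day month is the post's own simplification, and the name monthly_cost is made up for illustration.

```python
HOURS_PER_MONTH = 24 * 30  # the post's simplification: a 30-day month


def monthly_cost(*hourly_rates_usd: float) -> float:
    """Sum the hourly rates and scale them to a 30-day month, rounded to cents."""
    return round(sum(hourly_rates_usd) * HOURS_PER_MONTH, 2)


# Hourly prices as quoted in the post (assumption: still valid for your region).
p2_xlarge = monthly_cost(1.542)
c5_xlarge_with_ei = monthly_cost(0.214, 0.450)  # c5.xlarge + eia1.large
savings = 1 - c5_xlarge_with_ei / p2_xlarge

print(f"p2.xlarge      : ${p2_xlarge:,.2f}")
print(f"c5.xlarge + EI : ${c5_xlarge_with_ei:,.2f}")
print(f"savings        : {savings:.0%}")
```

With these prices the EI combination comes out at well under half the cost of the GPU instance, provided the instance and accelerator sizes really are sufficient for the workload.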

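Incidentally, lspci on the instance shows a VGA controller made by Amazon. Because the EI accelerator is reached over the network through the VPC endpoint, this is most likely the instance's own virtual display device rather than the accelerator itself.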

# c5.large + eia1.medium
$ lspci | grep VGA
00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111

What you need to do to use Amazon EI

Add security group rules

HTTPS (443) must be allowed for both inbound and outbound traffic.

Create an AWS PrivateLink endpoint
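If you prefer to script the rule, the following is a rough sketch using boto3 rather than the console. It assumes boto3 is installed and credentials are configured. The helper names, the security group ID, and the CIDR range are hypothetical placeholders, and you should narrow the CIDR to whatever fits your VPC.

```python
def https_permission(cidr: str) -> dict:
    """Build one IpPermissions entry that allows TCP 443 for the given CIDR."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": cidr}],
    }


def allow_https_both_ways(ec2_client, group_id: str, cidr: str) -> None:
    """Add the same HTTPS rule as an inbound and an outbound rule."""
    permission = https_permission(cidr)
    ec2_client.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=[permission]
    )
    ec2_client.authorize_security_group_egress(
        GroupId=group_id, IpPermissions=[permission]
    )


# Usage (placeholders, not real resources):
#   import boto3
#   allow_https_both_ways(boto3.client("ec2"), "sg-0123456789abcdef0", "10.0.0.0/16")
```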

$ aws ec2 create-vpc-endpoint \
  --vpc-id <VPC ID> \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.<REGION>.elastic-inference.runtime \
  --subnet-ids <SUBNET ID> \
  --security-group-ids <SECURITY GROUP ID>
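The same endpoint can be created from Python. The sketch below mirrors the CLI arguments one to one with boto3's create_vpc_endpoint. The function names are made up for illustration, and the region and resource IDs in the usage comment are placeholders.

```python
def ei_endpoint_params(region, vpc_id, subnet_ids, security_group_ids) -> dict:
    """Keyword arguments for create_vpc_endpoint, matching the CLI call above."""
    return {
        "VpcId": vpc_id,
        "VpcEndpointType": "Interface",
        "ServiceName": f"com.amazonaws.{region}.elastic-inference.runtime",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }


def create_ei_endpoint(ec2_client, region, vpc_id, subnet_ids, security_group_ids) -> str:
    """Create the interface endpoint and return its ID."""
    response = ec2_client.create_vpc_endpoint(
        **ei_endpoint_params(region, vpc_id, subnet_ids, security_group_ids)
    )
    return response["VpcEndpoint"]["VpcEndpointId"]


# Usage (placeholders, not real resources):
#   import boto3
#   create_ei_endpoint(boto3.client("ec2"), "us-west-2", "vpc-0abc",
#                      ["subnet-0abc"], ["sg-0abc"])
```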

Attach an IAM role

Attach the following policy to the role. In particular, make sure not to forget elastic-inference:Connect.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elastic-inference:Connect",
                "iam:List*",
                "iam:Get*",
                "ec2:Describe*",
                "ec2:Get*"
            ],
            "Resource": "*"
        }
    ]
}

It is handy to create this as an IAM managed policy.

$ cat << EOF > policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elastic-inference:Connect",
                "iam:List*",
                "iam:Get*",
                "ec2:Describe*",
                "ec2:Get*"
            ],
            "Resource": "*"
        }
    ]
}
EOF

$ aws iam create-policy \
  --policy-name AmazonElasticInferenceAccess \
  --policy-document file://policy.json
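As an alternative to the heredoc, the policy document can be generated and registered from Python. This is a sketch under the same assumptions as the CLI version. The action list is copied from the policy above, the helper names are hypothetical, and the call relies on boto3's iam.create_policy.

```python
import json

# Actions copied from the policy shown in this post.
EI_ACTIONS = [
    "elastic-inference:Connect",
    "iam:List*",
    "iam:Get*",
    "ec2:Describe*",
    "ec2:Get*",
]


def ei_policy_document() -> dict:
    """Return the same policy document that the heredoc writes to policy.json."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(EI_ACTIONS), "Resource": "*"}
        ],
    }


def create_ei_policy(iam_client, name: str = "AmazonElasticInferenceAccess") -> str:
    """Create the managed policy and return its ARN."""
    response = iam_client.create_policy(
        PolicyName=name,
        PolicyDocument=json.dumps(ei_policy_document()),
    )
    return response["Policy"]["Arn"]


# Usage:
#   import boto3
#   arn = create_ei_policy(boto3.client("iam"))
```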

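Launching an EC2 instance

Launch a new EC2 instance with the EI accelerator option set. Note that an accelerator cannot be attached to an instance that is already running. The option can be set from either the Management Console or the AWS CLI.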

$ aws ec2 run-instances \
...
  --elastic-inference-accelerator Type=eia1.large
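For reference, a boto3 equivalent might look like the sketch below. The arguments elided with "..." in the CLI call are not reconstructed here. The subnet, security group, and instance profile parameters are my own assumptions about what a typical launch needs, and the function names are hypothetical. The accelerator is requested through the ElasticInferenceAccelerators parameter of run_instances.

```python
def run_instances_params(image_id, instance_type, accelerator_type,
                         subnet_id, security_group_ids, instance_profile_name) -> dict:
    """Keyword arguments for run_instances with one EI accelerator attached."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "SubnetId": subnet_id,
        "SecurityGroupIds": list(security_group_ids),
        "IamInstanceProfile": {"Name": instance_profile_name},
        # Only one accelerator can be attached per instance.
        "ElasticInferenceAccelerators": [{"Type": accelerator_type, "Count": 1}],
    }


def launch_with_ei(ec2_client, **kwargs) -> str:
    """Launch the instance and return its instance ID."""
    response = ec2_client.run_instances(**run_instances_params(**kwargs))
    return response["Instances"][0]["InstanceId"]


# Usage (all IDs and names are placeholders):
#   import boto3
#   launch_with_ei(boto3.client("ec2"), image_id="ami-...", instance_type="c5.large",
#                  accelerator_type="eia1.medium", subnet_id="subnet-...",
#                  security_group_ids=["sg-..."], instance_profile_name="my-ei-profile")
```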

For this test I created an EC2 instance with the following settings. The VPC, subnet, and security group are assumed to exist already. Attach the IAM policy created earlier (AmazonElasticInferenceAccess) to the instance's IAM role.

EC2 instance: c5.large
EI: eia1.medium
Deep Learning AMI (Ubuntu 16.04) Version 29.1 (ami-07e49d277938544cf)

Log in to the EC2 instance over SSH or Session Manager. If you log in with Session Manager, switch to the ubuntu user.

$ sudo su - ubuntu

Running a model with EI Predictor

I tried this out by following the Amazon Web Services blog post on deploying TensorFlow models with Amazon Elastic Inference using the flexible new Python API available in EI-enabled TensorFlow 1.12.

$ source activate amazonei_tensorflow_p36
Please run 'python ~/anaconda3/bin/EISetupValidator.py' if you experience issues using Amazon EI service. This script verifies that this instance is correctly configured to use Amazon EI service.

(amazonei_tensorflow_p36) $ python ~/anaconda3/bin/EISetupValidator.py
All the validation checks passed for Amazon EI from this instance - i-0165b536e224827eb

# Download and unzip the ResNet SSD sample model
(amazonei_tensorflow_p36) $ curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip
(amazonei_tensorflow_p36) $ unzip ssd_resnet.zip -d /tmp

# Download a photo of three dogs
(amazonei_tensorflow_p36) $ curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg

# Copy and paste the source code from the AWS blog post
(amazonei_tensorflow_p36) $ vim ssd_resnet_predictor.py

# Run the inference script
(amazonei_tensorflow_p36) $ python ssd_resnet_predictor.py --image 3dogs.jpg
# Many warnings are printed, so only part of the output is reproduced here
Using Amazon Elastic Inference Client Library Version: 1.6.3
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-2e7df871fc5e474c837628dde72a5ec3
Elastic Inference Accelerator Type: eia1.medium
Elastic Inference Accelerator Ordinal: 0

Inference 0 took 20.271986 seconds
Inference 1 took 0.197032 seconds
Inference 2 took 0.201697 seconds
Inference 3 took 0.199163 seconds
Inference 4 took 0.199758 seconds
Inference 5 took 0.201006 seconds
Inference 6 took 0.201349 seconds
Inference 7 took 0.199473 seconds
Inference 8 took 0.199053 seconds
Inference 9 took 0.199506 seconds
Inference 10 took 0.201012 seconds
Inference 11 took 0.197802 seconds
Inference 12 took 0.200899 seconds
Inference 13 took 0.200465 seconds
Inference 14 took 0.201096 seconds
Inference 15 took 0.198585 seconds
Inference 16 took 0.199784 seconds
Inference 17 took 0.200048 seconds
Inference 18 took 0.199824 seconds
Inference 19 took 0.200712 seconds
3 detection[s]
['dog', 'dog', 'dog']
Running SSD Resnet on EIPredictor using default Signature Def
Using DEFAULT_SERVING_SIGNATURE_DEF_KEY .....
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
I0110 07:44:28.352408 140618874283776 saver.py:1503] Saver not created because there are no variables in the graph to restore
INFO:tensorflow:The specified SavedModel has no variables; no checkpoints were restored.
I0110 07:44:28.352689 140618874283776 loader_impl.py:374] The specified SavedModel has no variables; no checkpoints were restored.
The first inference request loads the model into the accelerator and can take several seconds to complete.Please standby!

Inference 0 took 9.871971 seconds
Inference 1 took 0.200099 seconds
Inference 2 took 0.199755 seconds
Inference 3 took 0.201527 seconds
Inference 4 took 0.198058 seconds
Inference 5 took 0.201121 seconds
Inference 6 took 0.200824 seconds
Inference 7 took 0.199095 seconds
Inference 8 took 0.199493 seconds
Inference 9 took 0.200469 seconds
Inference 10 took 0.199092 seconds
Inference 11 took 0.200935 seconds
Inference 12 took 0.199280 seconds
Inference 13 took 0.199491 seconds
Inference 14 took 0.199399 seconds
Inference 15 took 0.200766 seconds
Inference 16 took 0.203524 seconds
Inference 17 took 0.198946 seconds
Inference 18 took 0.201468 seconds
Inference 19 took 0.198967 seconds
3 detection[s]
['dog', 'dog', 'dog']
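In both runs the first request is far slower than the rest because, as the log message says, it loads the model into the accelerator. To compare warm-up time against steady-state latency, the log can be parsed with a short script. The sketch below uses the first five lines of the first run as sample input, and the function name split_warmup is made up for illustration.

```python
import re
from statistics import mean

# Matches lines such as "Inference 3 took 0.199163 seconds".
LINE = re.compile(r"Inference (\d+) took ([\d.]+) seconds")


def split_warmup(log: str):
    """Return (first inference time, list of the remaining times) in seconds."""
    times = [float(match.group(2)) for match in LINE.finditer(log)]
    if not times:
        raise ValueError("no inference timings found in the log")
    return times[0], times[1:]


sample = """\
Inference 0 took 20.271986 seconds
Inference 1 took 0.197032 seconds
Inference 2 took 0.201697 seconds
Inference 3 took 0.199163 seconds
Inference 4 took 0.199758 seconds
"""

warmup, steady = split_warmup(sample)
print(f"warm-up: {warmup:.2f} s, steady state: {mean(steady) * 1000:.1f} ms")
```

In this sample the steady-state requests settle at roughly 0.2 seconds each, so the warm-up cost only matters for the first request after the model is loaded.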