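Trying Out Redshift Cluster Encryption (Preparation)

Until recently, encrypting storage for AWS services was pretty much all I did, so encryption is the only topic I have, lol. This time it is storage encryption for a Redshift cluster.

This one is long, so I am splitting it into a preparation part and a migration part.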

How to encrypt a Redshift cluster
How to migrate the data
UnloadCopyUtility
Preparation
Wrapping up
Reference URLs

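How to encrypt a Redshift cluster

Redshift clusters do support encryption, but unlike RDS, you cannot enable encryption when restoring from a snapshot.
According to the documentation, the only way seems to be to create a new, encrypted Redshift cluster and migrate the data into it.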

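You enable encryption when you launch a cluster. To migrate from an unencrypted cluster to an encrypted cluster, you first unload the data from the existing source cluster. Then you reload the data into a new target cluster with the encryption settings you chose.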
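As a rough sketch of "enable encryption when you launch a cluster", the parameters for boto3's Redshift create_cluster call could be put together like this. The helper names, the node type, the node count, and the database name are my own placeholder assumptions, not something taken from the documentation quoted above.

```python
def encrypted_cluster_params(cluster_id, kms_key_id, master_user, master_password,
                             node_type="dc2.large", number_of_nodes=2, db_name="mydb"):
    """Build keyword arguments for boto3's redshift create_cluster call.

    node_type, number_of_nodes and db_name are placeholder defaults.
    """
    params = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
        "DBName": db_name,
        "Encrypted": True,        # encryption is chosen at launch time
        "KmsKeyId": kms_key_id,   # a dedicated KMS key rather than the default key
    }
    if number_of_nodes > 1:
        params["ClusterType"] = "multi-node"
        params["NumberOfNodes"] = number_of_nodes
    else:
        params["ClusterType"] = "single-node"
    return params


def create_encrypted_cluster(params, region):
    """Launch the cluster. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the builder above works without boto3
    client = boto3.client("redshift", region_name=region)
    return client.create_cluster(**params)
```

Keeping the parameter builder separate from the API call makes it easy to review what will be sent before anything is actually created.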

How to migrate the data

As for how to do the migration (unloading and copying the data), I would have liked to use AWS Database Migration Service (DMS), but Redshift cannot be selected as a source.

Amazon RDS instance databases, and Amazon S3
* Oracle versions 11g (versions 11.2.0.3.v1 and later) and 12c, for the Enterprise, Standard, Standard One, and Standard Two editions.
* Microsoft SQL Server versions 2008R2, 2012, 2014, and 2016 for the Enterprise, Standard, Workgroup, and Developer editions. The Web and Express editions are not supported.
* MySQL versions 5.5, 5.6, and 5.7.
* MariaDB (supported as a MySQL-compatible data source).
* PostgreSQL 9.4 and later. Change data capture (CDC) is only supported for versions 9.4.9 and higher and 9.5.4 and higher. The rds.logical_replication parameter, which is required for CDC, is supported only in these versions and later.
* Amazon Aurora (supported as a MySQL-compatible data source).
* Amazon S3.

https://docs.aws.amazon.com/ja_jp/dms/latest/userguide/CHAP_Introduction.Sources.html

The documentation mentioned earlier introduces the Amazon Redshift Unload/Copy Utility (UnloadCopyUtility), so that is what I will use this time.

UnloadCopyUtility

Amazon Redshift Unload/Copy Utility

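It is part of the Redshift utility tools developed by AWS.

Looking at the figure linked below, it works in a simple way: data unloaded from the source Redshift cluster is staged temporarily in S3, then loaded into the target Redshift cluster with the COPY command.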

RedshiftUnloadCopy

https://github.com/awslabs/amazon-redshift-utils/blob/master/src/UnloadCopyUtility/RedshiftUnloadCopy.png
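To make the flow in the figure a bit more concrete, here is a small sketch that builds the two SQL statements involved. The statement shapes are modeled on what the utility prints in its log output further down in this post; the function names and arguments are my own, and this is not the utility's actual code.

```python
def _credentials(iam_role, symmetric_key):
    # Credential string in the same form as in the utility's log output.
    return "aws_iam_role={0};master_symmetric_key={1}".format(iam_role, symmetric_key)


def build_unload_sql(schema, table, s3_prefix, iam_role, symmetric_key):
    """UNLOAD run on the source cluster: writes encrypted, gzipped files to S3."""
    return (
        "unload ('SELECT * FROM {schema}.{table}') "
        "to '{prefix}' credentials '{cred}' "
        "manifest encrypted gzip "
        "delimiter '^' addquotes escape allowoverwrite"
    ).format(schema=schema, table=table, prefix=s3_prefix,
             cred=_credentials(iam_role, symmetric_key))


def build_copy_sql(schema, table, s3_prefix, iam_role, symmetric_key, region):
    """COPY run on the target cluster: loads the files listed in the manifest."""
    return (
        "copy {schema}.{table} "
        "from '{prefix}manifest' credentials '{cred}' "
        "manifest encrypted gzip "
        "delimiter '^' removequotes escape compupdate off REGION '{region}'"
    ).format(schema=schema, table=table, prefix=s3_prefix,
             cred=_credentials(iam_role, symmetric_key), region=region)


if __name__ == "__main__":
    prefix = "s3://copy-redshift-cluster/mytable_a/mydb.public.mytable_a."
    role = "arn:aws:iam::123456789012:role/redshift-iam-role"
    print(build_unload_sql("public", "mytable_a", prefix, role, "REDACTED"))
    print(build_copy_sql("public", "mytable_a", prefix, role, "REDACTED", "ap-northeast-1"))
```

The point is simply that the source side only ever runs UNLOAD and the target side only ever runs COPY, with S3 sitting in between.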

Preparation

Now for the main topic of this article: let's get UnloadCopyUtility ready to use.

The execution environment is Amazon Linux.

% uname -r
4.9.85-38.58.amzn1.x86_64
% cat /etc/system-release
Amazon Linux AMI release 2017.09
% python -V
Python 2.7.13

Installing packages and so on

Install the required packages. I am assuming pip is already installed; if it is not, install pip first.

% yum install postgresql postgresql-devel gcc python-devel libffi-devel
% pip install PyGreSQL boto3 pytz pycrypto awscli
% pip list | grep PyGreSQL
PyGreSQL                         5.0.4
% pip list | grep boto3
boto3                        1.7.16
% pip list | grep pytz
pytz                         2018.4
% pip list | grep pycrypto
pycrypto                     2.6.1
% aws --version
aws-cli/1.15.16 Python/2.7.13 Linux/4.9.85-38.58.amzn1.x86_64 botocore/1.10.16

Fetch the Redshift utility tools with git clone.

% cd /export/tmp/
% git clone https://github.com/awslabs/amazon-redshift-utils.git
% cd amazon-redshift-utils/src/UnloadCopyUtility/

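Creating the IAM role for the Redshift cluster

The Redshift cluster itself needs to access S3, so create an IAM role with the s3:GetObject and s3:PutObject permissions and attach it to the source Redshift cluster. The COPY on the target cluster in the log further down uses the same role, so the target presumably needs it attached as well.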
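For reference, the policy documents for such a role might look roughly like the following sketch. The bucket name is the one used later in this post; the function names are my own. Depending on your setup, further S3 actions may turn out to be necessary, which I do not cover here.

```python
import json


def staging_bucket_policy(bucket):
    """Permissions policy: read and write objects in the staging bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": "arn:aws:s3:::{0}/*".format(bucket),
            }
        ],
    }


def redshift_trust_policy():
    """Trust policy: lets the Redshift service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(staging_bucket_policy("copy-redshift-cluster"), indent=2))
    print(json.dumps(redshift_trust_policy(), indent=2))
```

The printed JSON can be pasted into the IAM console or saved to files when creating the role.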

Creating the S3 bucket for the migration

Create an S3 bucket to hold the data temporarily.
I named it copy-redshift-cluster here, but change it as appropriate.

% aws s3 mb s3://copy-redshift-cluster

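Creating the KMS master key

Create the KMS master key to specify for the encrypted Redshift cluster.
The default key can also be used, but it has drawbacks such as not being able to share snapshots across accounts, so it is better to create a dedicated KMS key where possible.
Fortunately, a script for this is bundled with UnloadCopyUtility, so let's use it.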

% cd /export/tmp/amazon-redshift-utils/src/UnloadCopyUtility/
% region=$(curl -fsSL http://169.254.169.254/latest/meta-data/placement/availability-zone | sed -e 's/.$//')
% ./createKmsKey.sh $region

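Encrypting the password

The connection details for the Redshift clusters go into the configuration file that comes up later, so encrypt the password beforehand.
A script is provided for this as well, so use that. Running it prints a long string; make a note of it.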

% vim PASSWORD
<Redshift cluster password>

% ./encryptValue.sh $(cat PASSWORD) $region
AQICAHiOHMxxxxxxxxxxxxxxxxxxxx
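If you are curious what such an encryption step amounts to, here is a minimal sketch using the KMS encrypt API through boto3. I am assuming the script does something equivalent (KMS encrypt, then base64); I have not confirmed that against its source. The function name and the key alias in the usage note are hypothetical.

```python
import base64


def encrypt_password(kms_client, key_id, password):
    """Encrypt a password with KMS and return the ciphertext as base64 text.

    kms_client is expected to behave like boto3.client("kms").
    """
    response = kms_client.encrypt(KeyId=key_id, Plaintext=password.encode("utf-8"))
    return base64.b64encode(response["CiphertextBlob"]).decode("ascii")
```

With real credentials, usage would look something like encrypt_password(boto3.client("kms", region_name=region), "alias/my-unload-copy-key", password), where the alias is a made-up example. Passing the client in as an argument also makes the function easy to try out with a stub.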

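Creating the configuration file

A sample configuration file is at UnloadCopyUtility/example/config.json, so copy it and create your own. One configuration file is needed per table.

Configure it like this and save it as mytable_a.json. The // comment lines are only there for explanation; strict JSON does not allow comments, so it is safest to remove them from the actual file.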

{
  // Settings for the source Redshift cluster
  "unloadSource": {
    "clusterEndpoint": "source-my-cluster.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com",
    "clusterPort": 5439,
    "connectPwd": "AQICAHiOHMxxxxxxxxxxxxxxxxxxxx",
    "connectUser": "master",
    "db": "mydb",
    "schemaName": "public",
    "tableName": "mytable_a"
  },
  // S3 settings: where the unloaded data is staged
  "s3Staging": {
    "aws_iam_role": "arn:aws:iam::123456789012:role/redshift-iam-role",
    "path": "s3://copy-redshift-cluster/mytable_a/",
    "deleteOnSuccess": "True",
    "region": "ap-northeast-1"
  },
  // Settings for the target Redshift cluster
  "copyTarget": {
    "clusterEndpoint": "target-my-cluster.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com",
    "clusterPort": 5439,
    "connectPwd": "AQICAHiOHMxxxxxxxxxxxxxxxxxxxx",
    "connectUser": "master",
    "db": "mydb",
    "schemaName": "public",
    "tableName": "mytable_a"
  }
}

Upload the configuration file you created to S3.

% aws s3 cp mytable_a.json s3://copy-redshift-cluster/
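Since one configuration file is needed per table, writing them by hand gets tedious once there are many tables. Here is a minimal sketch that generates one file per table with the same keys as the example above. The function names are my own, and I am assuming both clusters use the same user, database, schema, and encrypted password, as in the example.

```python
import json
import os


def build_config(table, source_endpoint, target_endpoint, encrypted_password,
                 iam_role, bucket, region,
                 user="master", db="mydb", schema="public", port=5439):
    """Return one table's configuration as a dict with the keys from the example."""
    def cluster(endpoint):
        return {
            "clusterEndpoint": endpoint,
            "clusterPort": port,
            "connectPwd": encrypted_password,
            "connectUser": user,
            "db": db,
            "schemaName": schema,
            "tableName": table,
        }

    return {
        "unloadSource": cluster(source_endpoint),
        "s3Staging": {
            "aws_iam_role": iam_role,
            "path": "s3://{0}/{1}/".format(bucket, table),
            "deleteOnSuccess": "True",
            "region": region,
        },
        "copyTarget": cluster(target_endpoint),
    }


def write_configs(tables, out_dir, **settings):
    """Write <table>.json into out_dir for every table and return the paths."""
    paths = []
    for table in tables:
        path = os.path.join(out_dir, "{0}.json".format(table))
        with open(path, "w") as f:
            json.dump(build_config(table, **settings), f, indent=2)
        paths.append(path)
    return paths
```

The generated files contain no comments, so they are plain JSON and can be uploaded with aws s3 cp as they are.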

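Running it

As a trial, migrate the data for this table, mytable_a.
When testing, make sure no queries are running on the source Redshift cluster. If stopping them is difficult, restore a cluster from a snapshot for testing and try it there.
--destination-table-auto-create is an option that creates the table on the target Redshift cluster if it does not exist.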

% python redshift_unload_copy.py \
--destination-table-auto-create \
--s3-config-file s3://copy-redshift-cluster/mytable_a.json $region

INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): copy-redshift-cluster.s3.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): copy-redshift-cluster.s3-ap-northeast-1.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): kms.ap-northeast-1.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): kms.ap-northeast-1.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): kms.ap-northeast-1.amazonaws.com
INFO:root:Task succeeded FailIfResourceClusterDoesNotExistsTask(c45dc900-6874-4578-be4b-78dddee8b353)
INFO:root:Task succeeded NoOperationTask(f3d0a1c9-ca74-43d1-87da-dc0436c09644)
INFO:root:Task succeeded FailIfResourceDoesNotExistsTask(5d6199c2-08e6-476a-b9f8-7f134e853cde)
INFO:root:Creating target target-my-cluster.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com:mydb.public.mytable_a
INFO:root:Creating target-my-cluster.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com:mydb.public.mytable_a with:
INFO:root:CREATE TABLE IF NOT EXISTS public.mytable_a (omitted) ;
INFO:root:Task succeeded CreateIfTargetDoesNotExistTask(5b8f0f6b-7b37-428e-96a2-3019fcc98afd)
INFO:root:Task succeeded NoOperationTask(bcb5f367-3311-43c4-b7d7-ff99867d7555)
INFO:root:Exporting from Source (UnloadDataToS3Task(02176840-bb2b-4f5e-a21e-a09c01185c6b))
INFO:root:Executing unload_table against source-my-cluster.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com:mydb.public.mytable_a:
INFO:root:unload ('SELECT * FROM public.mytable_a')
                     to 's3://copy-redshift-cluster/2018-05-07_18:49:11/mydb.public.mytable_a.' credentials
                     'aws_iam_role=arn:aws:iam::123456789012:role/redshift-iam-role;master_symmetric_key=REDACTED'
                     manifest
                     encrypted
                     gzip
                     delimiter '^' addquotes escape allowoverwrite
INFO:root:Task succeeded UnloadDataToS3Task(02176840-bb2b-4f5e-a21e-a09c01185c6b)
INFO:root:Importing to Target (CopyDataFromS3Task(d09bc2be-0a61-494d-8ff8-cc00cb42d362))
INFO:root:Executing copy_table against target-my-cluster.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com:mydb.public.mytable_a:
INFO:root:copy public.mytable_a
                   from 's3://copy-redshift-cluster/2018-05-07_18:49:11/mydb.public.mytable_a.manifest' credentials
                   'aws_iam_role=arn:aws:iam::123456789012:role/redshift-iam-role;master_symmetric_key=REDACTED'
                   manifest
                   encrypted
                   gzip
                   delimiter '^' removequotes escape compupdate off REGION 'ap-northeast-1'
INFO:  Load into table 'mytable_a' completed, 11353329 record(s) loaded successfully.
INFO:root:Task succeeded CopyDataFromS3Task(d09bc2be-0a61-494d-8ff8-cc00cb42d362)
Cleaning up S3 Data Staging Location s3://copy-redshift-cluster/2018-05-07_18:49:11/mydb.public.mytable_a
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): copy-redshift-cluster.s3.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): copy-redshift-cluster.s3-ap-northeast-1.amazonaws.com
INFO:root:Task succeeded CleanupS3StagingAreaTask(b5957f91-b2d1-49cb-9c24-926176336e8c)
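With several tables, the same command has to be run once per configuration file. Here is a minimal sketch of a wrapper for that, assuming it is run from the UnloadCopyUtility directory and that each table's configuration was uploaded as <table>.json to the bucket root, as above. The function names are my own.

```python
import subprocess


def build_command(bucket, table, region, auto_create=True):
    """Build the argument list for one run of redshift_unload_copy.py."""
    cmd = ["python", "redshift_unload_copy.py"]
    if auto_create:
        cmd.append("--destination-table-auto-create")
    cmd += ["--s3-config-file", "s3://{0}/{1}.json".format(bucket, table), region]
    return cmd


def migrate_tables(bucket, tables, region):
    """Run the utility for each table in turn, stopping at the first failure."""
    for table in tables:
        # check_call raises CalledProcessError if the utility exits non-zero
        subprocess.check_call(build_command(bucket, table, region))
```

Stopping at the first failure is a deliberate choice here, so that a failed table can be investigated before moving on to the next.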

On the target Redshift cluster, run select "table", tbl_rows from svv_table_info; and if the table has been created and the row count matches the source, you are good.
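Comparing counts by eye works for one table, but with many tables a small helper is handy. Here is a minimal sketch that takes the query results from both clusters as dicts (table name to row count) and reports mismatches. How the dicts are fetched is left out, and the function name is my own. As far as I know, tbl_rows also counts rows marked for deletion that have not been vacuumed yet, so a mismatch reported here may be worth double-checking with count(*).

```python
def diff_row_counts(source_counts, target_counts):
    """Return {table: (source_rows, target_rows)} for tables whose counts differ.

    A table missing from the target is reported with None as its target count.
    """
    mismatches = {}
    for table, rows in source_counts.items():
        target_rows = target_counts.get(table)
        if target_rows != rows:
            mismatches[table] = (rows, target_rows)
    return mismatches


if __name__ == "__main__":
    source = {"mytable_a": 11353329}
    target = {"mytable_a": 11353329}
    print(diff_row_counts(source, target) or "all row counts match")
```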

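Wrapping up

That completes the preparation for encrypting the Redshift cluster.
As you will find once you actually start migrating, tables with a lot of data fail partway through. I will explain how to deal with that in the next article.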

本日も乙
