[Work in progress] [3] Understanding the MLOps Pipeline Through the Azure MLOps Template

Let's walk through the public template that Azure provides to understand how an MLOps pipeline fits together.

First, clone https://github.com/Azure/mlops-v2.

Then follow mlops-v2/documentation/deployguides/deployguide_gha.md and carry out the steps below.

Install and log in to gh (the GitHub CLI)
- On Linux: apt-get install gh / on macOS: brew install gh
- gh auth login
- With gh you can clone repositories, open pull requests, and so on using short commands.
If you use WSL, install dos2unix.
If git config has not been set up yet, run the following.
- git config --global user.email "you@example.com"
- git config --global user.name "Your Name"
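
The prerequisites above can be checked before running the script. The following is a minimal sketch and is not part of the mlops-v2 guide; missing_tools is a hypothetical helper name, and the tool list simply mirrors the steps above.

```shell
#!/bin/sh
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report anything from the checklist that is still missing.
# (dos2unix only matters on WSL.)
missing="$(missing_tools git gh dos2unix)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Install these first:" $missing
fi

# The steps above also set a git identity, so check that it exists.
if command -v git >/dev/null 2>&1; then
  git config --global user.email >/dev/null || echo "git user.email is not set"
  git config --global user.name  >/dev/null || echo "git user.name is not set"
fi
```

If nothing is printed, the tools and the git identity are in place; gh auth status can then confirm that the login succeeded.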

Once that is done, run sparse_checkout.sh. (You need to edit the variables at the top of the script to match your own settings.)

If the script runs successfully, a new private repository is created.

Exploring the project structure

The overall structure of the project looks like this.

.
├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
├── SECURITY.md
├── SUPPORT.md
├── config-infra-dev.yml : infra configuration for development. The workflow adds its values to $GITHUB_ENV.
├── config-infra-prod.yml : infra configuration for production. The workflow adds its values to $GITHUB_ENV. <https://docs.github.com/ko/actions/using-workflows/workflow-commands-for-github-actions#setting-an-environment-variable>
├── data : data used for training
│   ├── taxi-batch.csv
│   ├── taxi-data.csv
│   └── taxi-request.json
├── data-science : ML / DL code
│   ├── environment
│   │   └── train-conda.yml : conda environment file for training
│   └── src : ML/DL code written with MLflow
│       ├── evaluate.py : evaluation code
│       ├── prep.py : preprocessing code
│       ├── register.py : registers the trained ML model when the deploy flag is True
│       └── train.py : trains the model and saves the trained model
├── environment.yml : conda environment file for using azureml-cli-v2
├── infrastructure : infrastructure defined with Terraform
│   ├── aml_deploy.tf
│   ├── locals.tf
│   ├── main.tf
│   ├── modules
│   │   ├── aml-workspace
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   └── variables.tf
│   │   ├── application-insights
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   └── variables.tf
│   │   ├── container-registry
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   └── variables.tf
│   │   ├── data-explorer
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   └── variables.tf
│   │   ├── key-vault
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   └── variables.tf
│   │   ├── resource-group
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   └── variables.tf
│   │   └── storage-account
│   │       ├── main.tf
│   │       ├── outputs.tf
│   │       └── variables.tf
│   └── variables.tf
├── mlops : MLOps-related files
│   └── azureml 
│       ├── deploy : aml-cli-v2 YAML definitions for deployment <https://learn.microsoft.com/ko-kr/azure/machine-learning/reference-yaml-job-command?view=azureml-api-2>
│       │   ├── batch
│       │   │   ├── batch-deployment.yml
│       │   │   └── batch-endpoint.yml
│       │   └── online
│       │       ├── online-deployment.yml
│       │       ├── online-endpoint.yml
│       │       └── score.py
│       └── train : aml-cli-v2 YAML definitions for training
│           ├── data.yml
│           ├── pipeline.yml
│           └── train-env.yml
└── requirements.txt
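
The annotations on config-infra-dev.yml and config-infra-prod.yml say that the workflow adds their values to $GITHUB_ENV. As a rough sketch of that mechanism (not the template's actual workflow step), the following turns simple "key: value" lines into the key=value lines that GitHub Actions expects in that file. yaml_to_env is a hypothetical helper, and it assumes flat scalar values with no lists, quoting, or trailing comments.

```shell
#!/bin/sh
# Hypothetical helper: print key=value for every "key: value" line in a file.
# Comment lines and keys without a value (such as a parent "variables:" key)
# are skipped. This is a line filter, not a real YAML parser.
yaml_to_env() {
  sed -n -E 's/^[[:space:]]*([A-Za-z_][A-Za-z0-9_]*):[[:space:]]*([^[:space:]#].*)$/\1=\2/p' "$1"
}

# Inside a workflow step this would be used roughly as:
#   yaml_to_env config-infra-dev.yml >> "$GITHUB_ENV"
```

Variables written to $GITHUB_ENV become available to the later steps of the same job, which is how one config file can drive both the infrastructure and the training steps.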

Creating a subscription-scoped service principal and adding GitHub secrets

In this template, the main branch deploys to production and every other branch deploys to the development environment.

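<aside> 💡 Separate resource groups are created for dev and for production. This lets the dev environment run on small resources and production on larger ones, and it lets the production infrastructure be locked down so that access to it is restricted. Work is done on separate branches, and the workflow files pick the environment with an if condition.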

</aside>
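
The branch rule described above can be sketched as a small shell function. This is an illustration only: the template makes this choice with an if condition inside its workflow files, select_config is a hypothetical name, and pairing each branch with one of the two config-infra files is an assumption based on the project structure.

```shell
#!/bin/sh
# Hypothetical helper: map a branch name to the infra config file it would use.
# main -> production config, any other branch -> development config.
select_config() {
  if [ "$1" = "main" ]; then
    echo "config-infra-prod.yml"
  else
    echo "config-infra-dev.yml"
  fi
}

select_config main          # prints config-infra-prod.yml
select_config feature/foo   # prints config-infra-dev.yml
```

In a GitHub Actions run, the branch name is available as $GITHUB_REF_NAME, so the same check could be written as select_config "$GITHUB_REF_NAME".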

Because this setup has to create multiple resource groups, you need a service principal scoped to the whole subscription. Create one with the following command.

$ az ad sp create-for-rbac --name <service_principal_name> --role contributor --scopes /subscriptions/<subscription_id> --sdk-auth

Save the JSON that this command prints.

Example:

{
  "clientId": "xxxx6ddc-xxxx-xxxx-xxx-ef78a99dxxxx",
  "clientSecret": "xxxx79dc-xxxx-xxxx-xxxx-aaaaaec5xxxx",
  "subscriptionId": "xxxx251c-xxxx-xxxx-xxxx-bf99a306xxxx",
  "tenantId": "xxxx88bf-xxxx-xxxx-xxxx-2d7cd011xxxx",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}
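
One way to add this JSON as a GitHub secret from the terminal is gh secret set, which reads the secret value from standard input. The sketch below assumes the JSON was saved to a file named sp.json and that the workflows expect a secret named AZURE_CREDENTIALS; both names are assumptions here, so check the deploy guide for the exact secret names. has_sp_fields is a hypothetical helper that only checks that the four core fields are present.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file mentions all four core fields
# of the service principal JSON. It does not validate the values themselves.
has_sp_fields() {
  for key in clientId clientSecret subscriptionId tenantId; do
    grep -q "\"$key\"" "$1" || return 1
  done
}

# Usage (run inside the cloned repository; file and secret names are assumptions):
#   has_sp_fields sp.json && gh secret set AZURE_CREDENTIALS < sp.json
```

The file contains the client secret in plain text, so delete it after the upload and keep it out of version control.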