AzureDeveloper

Jupyter for Azure Machine Learning in Five Days - Day 3: Deploying an Image Classification Model with Jupyter

Category: Azure Machine Learning ◆ Tags: #Azure #ArtificialIntelligence #MachineLearning #JupyterBook ◆ Published: 2023-06-11 21:51:40

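In the previous chapter we learned how to train an image classification model with Jupyter and how to register the trained model in the workspace. In this chapter we use that model directly: we connect to the Azure Machine Learning workspace from Jupyter and deploy the model as a web service, so that users can consume it through a REST API.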

For the earlier chapters, see the links at the end of this article.

The key thing to understand in this chapter is how to deploy a model and where to deploy it.

Before you begin

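Complete the environment setup from the earlier chapters.
Complete the previous chapter, training a model with Jupyter.
After activating the conda environment, make sure the matplotlib and scikit-learn libraries are installed.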
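If you want to verify the last prerequisite from inside the notebook, a minimal sketch using only the standard library's importlib.util.find_spec could look like the following. The helper name missing_packages is hypothetical; note that scikit-learn's import name is sklearn.

```python
import importlib.util

def missing_packages(module_names):
    """Return the top-level import names that cannot be found in the active environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    # scikit-learn is imported as "sklearn"
    missing = missing_packages(["matplotlib", "sklearn"])
    print("Missing:", missing or "none")
```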

Deploy the trained model

The rest of this chapter, along with its source code, is also available at: https://github.com/hylinux/azure-demo/blob/main/azure-machine-learning/jupyter-5-days/Day-2.ipynb

Set up the environment

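Start Jupyter Notebook locally from the project's root directory, and make sure you have placed your Azure Machine Learning workspace config.json file in that root directory. Then create a new notebook in Jupyter Notebook, select your conda Python environment, and enter the following code in a cell: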

Note
You must also have used the Azure CLI to set the target cloud, signed in with az login, and used az to set the default subscription that contains the workspace.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
 
import azureml.core

# Display the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)
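Workspace.from_config(), used later in this chapter, reads the workspace details from config.json. As an optional sanity check before connecting, here is a minimal standard-library sketch that confirms the file exists and contains the three keys a workspace config file holds (subscription_id, resource_group, workspace_name). The helper name check_workspace_config is hypothetical.

```python
import json
import os

# Keys found in the config.json downloaded for an Azure ML workspace
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def check_workspace_config(path="config.json"):
    """Return the required keys that are missing or empty in the workspace config file."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} not found; start Jupyter from the project root")
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if not config.get(key)]
```

Calling check_workspace_config() from the project root returns an empty list when all three keys are present.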

Deploy the trained model as a web service

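In this section we deploy the model to Azure Container Instances (ACI). ACI is generally recommended for development and testing; for production, deploying to Azure Kubernetes Service (AKS) is generally recommended to meet production requirements.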

To deploy to ACI, we need to:

Define a scoring script that is exposed to clients.
Define a deployment configuration used to create the ACI environment.

Create the `scoring` script

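We create a Python script named score.py. Two functions in this script are essential:

init: loads the model registered in the workspace into a global object. init runs only once, when the service initializes.
run: receives the user's request and its input data, then uses the model to predict the result the user needs.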

Here is the main implementation:

%%writefile score.py
import json
import numpy as np
import os
import pickle
import joblib

def init():
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # For multiple models, it points to the folder containing all deployed models (./azureml-models)
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')
    model = joblib.load(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    # you can return any data type as long as it is JSON-serializable
    return y_hat.tolist()

As you can see, it is very simple: load the model, accept the input, and use the model to make predictions.
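To make the init/run contract concrete without deploying anything, here is a minimal local simulation. It is only an illustration, not how Azure ML actually hosts the script: it assumes a hypothetical toy "model" (a pickled dict holding a threshold) in place of sklearn_mnist_model.pkl, sets AZUREML_MODEL_DIR to a temporary folder the way the deployment does, calls init() once, and then calls run() once per request.

```python
import json
import os
import pickle
import tempfile

model = None  # populated once by init(), reused by every run() call

def init():
    """Load the model once; the hosting service sets AZUREML_MODEL_DIR before calling this."""
    global model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "toy_model.pkl")
    with open(model_path, "rb") as f:
        model = pickle.load(f)

def run(raw_data):
    """Handle one request: parse the JSON body, predict, return a JSON-serializable list."""
    rows = json.loads(raw_data)["data"]
    threshold = model["threshold"]  # toy rule: predict 1 when a row's sum exceeds the threshold
    return [1 if sum(row) > threshold else 0 for row in rows]

def simulate_service(payloads):
    """Hypothetical local harness: write a toy model, call init() once, then run() per payload."""
    with tempfile.TemporaryDirectory() as model_dir:
        with open(os.path.join(model_dir, "toy_model.pkl"), "wb") as f:
            pickle.dump({"threshold": 1.0}, f)  # stand-in for sklearn_mnist_model.pkl
        os.environ["AZUREML_MODEL_DIR"] = model_dir
        init()
        return [run(payload) for payload in payloads]

if __name__ == "__main__":
    payload = json.dumps({"data": [[0.2, 0.3], [0.9, 0.8]]})
    print(simulate_service([payload]))  # [[0, 1]]
```

The design point this mirrors is that the expensive step (loading the model from disk) happens once in init(), while run() stays cheap and stateless per request.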

Create the deployment configuration

from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"data": "MNIST",  "method" : "sklearn"}, 
                                               description='Predict MNIST with sklearn')

Deploy to ACI

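Deploying to ACI usually takes 2-5 minutes. Again, we deploy from a script, which:

Creates an environment object that describes the model's dependencies.
Creates the inference configuration needed to deploy the model as a web service.
Deploys to ACI.
Returns a web service endpoint.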

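%%time
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core import Workspace
from azureml.core.model import Model

# Connect to the workspace described by config.json in the project root
ws = Workspace.from_config()
# Fetch the model registered in the previous chapter
model = Model(ws, 'sklearn_mnist')

# Reuse the environment created during training; it lists the model's dependencies
myenv = Environment.get(workspace=ws, name="tutorial-env", version="1")
# Bind the scoring script to that environment
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

# Deploy to ACI using the aciconfig defined in the previous cell
service = Model.deploy(workspace=ws, 
                       name='sklearn-mnist-svc3', 
                       models=[model], 
                       inference_config=inference_config, 
                       deployment_config=aciconfig)

service.wait_for_deployment(show_output=True)
# The REST endpoint that clients will call
print(service.scoring_uri)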

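Note
tutorial-env is the environment object created in the previous chapter when training the model. To keep the deployment straightforward, we simply reuse it here.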
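Because a deployment takes a few minutes, service.wait_for_deployment blocks until the service finishes provisioning. The same idea, sketched generically, is to poll a state until it leaves a transitional value or a timeout expires. This is a hypothetical helper, not the SDK's implementation: get_state stands in for something like reading service.state, and the state names used as defaults ("Healthy", "Failed", "Unhealthy") are assumptions for illustration.

```python
import time

def wait_until_settled(get_state, timeout_s=600.0, poll_s=10.0,
                       done_states=("Healthy",), failed_states=("Failed", "Unhealthy")):
    """Poll get_state() until it returns a done or failed state, or until timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError(f"deployment ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s} seconds")
        time.sleep(poll_s)

if __name__ == "__main__":
    # Simulated state sequence standing in for a real service
    states = iter(["Transitioning", "Transitioning", "Healthy"])
    print(wait_until_settled(lambda: next(states), poll_s=0))
```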

Test the model

We use the following scripts to test the deployed model.

Download the test data

import os
from azureml.core import Dataset
from azureml.opendatasets import MNIST

data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

mnist_file_dataset = MNIST.get_file_dataset()
mnist_file_dataset.download(data_folder, overwrite=True)

Load the test data

# load_data is provided by the project's utils.py helper script, which must be importable from the notebook's working directory
from utils import load_data
import os
import glob

data_folder = os.path.join(os.getcwd(), 'data')
# Shrink the pixel intensities (X) from 0-255 to 0-1; inputs sent to the service should be scaled the same way as the training data
X_test = load_data(glob.glob(os.path.join(data_folder,"**/t10k-images-idx3-ubyte.gz"), recursive=True)[0], False) / 255.0
y_test = load_data(glob.glob(os.path.join(data_folder,"**/t10k-labels-idx1-ubyte.gz"), recursive=True)[0], True).reshape(-1)
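load_data itself lives in utils.py and is not shown here. For readers curious about what it has to do, here is a hypothetical standard-library reader for the MNIST IDX files, written from the published IDX layout: a big-endian magic number (2051 for image files, 2049 for label files), big-endian dimension sizes, then the raw unsigned-byte data. It is a sketch, not the utils.py code.

```python
import gzip
import struct

IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions (count, rows, cols)
LABEL_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension (count)

def load_idx(source, label=False):
    """Read a gzipped IDX file given as a path or a binary file object.

    Returns one flat list of pixel values per image, or a flat list of labels.
    """
    with gzip.open(source, "rb") as gz:
        magic, n_items = struct.unpack(">II", gz.read(8))
        expected = LABEL_MAGIC if label else IMAGE_MAGIC
        if magic != expected:
            raise ValueError(f"unexpected magic number {magic}, wanted {expected}")
        if label:
            # One unsigned byte per label
            return list(gz.read(n_items))
        n_rows, n_cols = struct.unpack(">II", gz.read(8))
        size = n_rows * n_cols  # 28 * 28 = 784 for MNIST
        raw = gz.read(n_items * size)
        # Pixels are stored row-major, one image after another
        return [list(raw[i * size:(i + 1) * size]) for i in range(n_items)]
```

Dividing each pixel by 255.0 afterwards gives the same 0-1 scaling that the cell above applies.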

Run predictions

import json
test = json.dumps({"data": X_test.tolist()})
test = bytes(test, encoding='utf8')
y_hat = service.run(input_data=test)
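The cell above sends all 10,000 test images in a single request, which produces a large JSON body. If that becomes a problem, one option is to split the rows into smaller batches and concatenate the predictions. The helper below is a hypothetical sketch, and batch_size=1000 is an arbitrary choice rather than a documented limit.

```python
import json

def batched_payloads(rows, batch_size=1000):
    """Yield UTF-8 encoded {"data": [...]} bodies holding at most batch_size rows each."""
    for start in range(0, len(rows), batch_size):
        body = json.dumps({"data": rows[start:start + batch_size]})
        yield body.encode("utf-8")

# Usage against the deployed service (needs `service` and `X_test` from the cells above):
# y_hat = []
# for body in batched_payloads(X_test.tolist()):
#     y_hat.extend(service.run(input_data=body))
```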


from sklearn.metrics import confusion_matrix

conf_mx = confusion_matrix(y_test, y_hat)
print(conf_mx)
print('Overall accuracy:', np.average(y_hat == y_test))

# normalize the diagonal cells so that they don't overpower the rest of the cells when visualized
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx / row_sums
np.fill_diagonal(norm_conf_mx, 0)

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot(111)
cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)
ticks = np.arange(0, 10, 1)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(ticks)
ax.set_yticklabels(ticks)
fig.colorbar(cax)
plt.ylabel('true labels', fontsize=14)
plt.xlabel('predicted values', fontsize=14)
plt.savefig('conf.png')
plt.show()
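What sklearn's confusion_matrix and the normalization step compute can be spelled out in plain Python. This is a teaching sketch, not a replacement for sklearn: rows are true labels and columns are predicted labels (the convention sklearn uses), overall accuracy is the diagonal sum divided by the total count, and each row is divided by its sum before the diagonal is zeroed so that only error rates remain. Unlike the cell above, it guards against a row whose sum is zero.

```python
def confusion_counts(y_true, y_pred, n_classes=10):
    """m[i][j] counts samples whose true label is i and predicted label is j."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def overall_accuracy(m):
    """Correct predictions sit on the diagonal: accuracy = trace / total."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

def normalized_errors(m):
    """Divide each row by its sum, then zero the diagonal so only error rates remain."""
    out = []
    for i, row in enumerate(m):
        s = sum(row)
        out.append([0.0 if j == i or s == 0 else v / s for j, v in enumerate(row)])
    return out
```

For example, with y_true = [0, 0, 1, 1] and y_pred = [0, 1, 1, 1] over two classes, the matrix is [[1, 1], [0, 2]], the accuracy is 0.75, and the normalized error matrix is [[0.0, 0.5], [0.0, 0.0]].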

You can also test the web service with a plain HTTP request:

import json
import numpy as np
import requests

# Send a random row from the test set to the service for scoring
random_index = np.random.randint(0, len(X_test))  # the upper bound is exclusive
# Build the body with json.dumps so the numbers serialize as valid JSON
input_data = json.dumps({"data": [X_test[random_index].tolist()]})

headers = {'Content-Type': 'application/json'}

# For AKS deployments (or ACI deployments with auth_enabled=True) you also need to send the service key in the header
# primary_key, secondary_key = service.get_keys()
# headers = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + primary_key}

resp = requests.post(service.scoring_uri, data=input_data, headers=headers)

print("POST to url", service.scoring_uri)
#print("input data:", input_data)
print("label:", y_test[random_index])
print("prediction:", resp.text)
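If you prefer not to depend on requests, the same call can be made with the standard library. This is a minimal sketch, assuming the service returns the JSON-serialized output of run() as its response body; build_scoring_request and score are hypothetical helper names, and the Authorization header is only needed when key authentication is enabled, as noted in the comments of the previous cell.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, rows, api_key=None):
    """Build a POST request whose JSON body matches what score.py's run() expects."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only required when key-based auth is enabled on the service
        headers["Authorization"] = "Bearer " + api_key
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")

def score(scoring_uri, rows, api_key=None, timeout=60):
    """Send the request and decode the JSON response body."""
    request = build_scoring_request(scoring_uri, rows, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Usage (needs `service`, `X_test`, and `random_index` from the cells above):
# print(score(service.scoring_uri, [X_test[random_index].tolist()]))
```

Separating request construction from sending keeps the payload and headers easy to inspect without touching the network.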

This completes deploying a model from Jupyter through the workspace connection, as well as testing the deployed model.