Amazon AWS Official Blog

Deploying a General-Purpose LLM API Service on Amazon SageMaker

1. Overview

Many customers who use Amazon SageMaker inference endpoints run into the situation where their front-end applications are compatible with the OpenAI API but cannot call the SageMaker API. If you want such applications to quickly consume a model service deployed to an Amazon SageMaker inference endpoint without modifying their application code, you can use the OpenAI-API-compatible service implemented in this project, which uses Amazon SageMaker as the backend to generate text responses. The service supports streaming responses, so generated content is returned to the client in real time.
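
For example, an application that already uses the official OpenAI Python SDK only needs to point its base URL and API key at this service; the model field carries the SageMaker endpoint name. The snippet below is a minimal illustration with placeholder values, assuming the OpenAI Python SDK v1.x:

from openai import OpenAI

# Point the standard OpenAI client at the OpenAI-compatible service (all values are placeholders)
client = OpenAI(
    base_url="http://<ALB ADDRESS>/v1",         # the ALB in front of the ECS service
    api_key="xxxxxxxxxxxxxxxxxxxxxxxxxxx",      # the key stored in AWS Secrets Manager
)

# An ordinary chat completion call; "model" is the SageMaker endpoint name
response = client.chat.completions.create(
    model="<Sagemaker Endpoint Name>",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)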

2. Architecture

The architecture of the OpenAI Compatible API with Amazon SageMaker is shown in the diagram below:

3. Solution Deployment

3.1 Prerequisites

Amazon SageMaker endpoint

Deploy an Amazon SageMaker inference endpoint; for the detailed steps, refer to the Jupyter Notebook.
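
Before deploying the API service you can confirm that the endpoint is up. A minimal boto3 sketch; the endpoint name is a placeholder and must match the MODEL value used later:

import boto3

# Check that the SageMaker inference endpoint exists and is ready to serve traffic
sm = boto3.client("sagemaker", region_name="<AWS REGION>")
desc = sm.describe_endpoint(EndpointName="<Sagemaker Endpoint Name>")
print(desc["EndpointStatus"])  # expect "InService" before continuing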

AWS service permissions:

  • Permission to deploy to Amazon ECS
  • Permission to create and invoke Amazon SageMaker inference endpoints
  • Permission to create an Amazon EC2 Application Load Balancer (ALB)
  • Permission to push images to Amazon ECR

3.2 Installation and Deployment

Download the project code: https://github.com/leoou331/openai-compatible-api-streaming.git

Environment variable configuration

Create a .env file and set the following environment variables:

# Environment variables used by the test client
export OPENAI_BASE_URL="http://<ALB ADDRESS>/v1"  # load balancer address of the API service
export OPENAI_API_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxx"  # API key used to call the API

# Environment variables used by the server-side configuration
export API_KEY_CACHE_TTL="3600"  # API key cache TTL, in seconds
export AUTH_SECRET_ID="<AWS Secret Manager Secret ID for API Key>"  # AWS Secrets Manager secret ID that stores the API key
export MODEL="<Sagemaker Endpoint Name>"  # SageMaker endpoint name
export AWS_REGION="<AWS REGION>"  # AWS Region
export AWS_ACCOUNT_ID="<AWS Account ID>"  # AWS account ID
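
AUTH_SECRET_ID must point to an existing secret in AWS Secrets Manager that holds the API key clients will present. If you have not created one yet, the sketch below shows one way to do so with boto3; the secret name and the plain-string format are assumptions, so adjust them to whatever format the service expects:

import boto3

# Create a Secrets Manager secret holding the API key (name and format are assumptions)
secrets = boto3.client("secretsmanager", region_name="<AWS REGION>")
resp = secrets.create_secret(
    Name="openai-compatible-api/api-key",        # hypothetical secret name; use it as AUTH_SECRET_ID
    SecretString="xxxxxxxxxxxxxxxxxxxxxxxxxxx",  # the same value exported as OPENAI_API_KEY
)
print(resp["ARN"])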

Build and push the Docker image

First load the environment variables, then build the Docker image and push it to ECR:

source .env
./build_and_push.sh

The build_and_push.sh script does the following (a quick verification sketch follows the list):

  • Updates the environment variables in the Dockerfile; in particular, SAGEMAKER_ENDPOINT_NAME is set to the value of the MODEL environment variable
  • Builds the Docker image
  • Pushes the image to Amazon ECR
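
After the script completes, you can confirm that the image actually landed in ECR. A minimal boto3 sketch; the repository name openai-compatible-api and the latest tag are taken from the image URI used in the task definition below:

import boto3

# Confirm that the openai-compatible-api:latest image exists in ECR after the push
ecr = boto3.client("ecr", region_name="<AWS REGION>")
images = ecr.describe_images(
    repositoryName="openai-compatible-api",
    imageIds=[{"imageTag": "latest"}],
)
print(images["imageDetails"][0]["imagePushedAt"])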

Deploy the service

Amazon ECS deployment involves several steps, including creating the cluster, the task definition, the service, and the load balancer. The complete deployment process is as follows:

  1. Create the Amazon ECS cluster
    aws ecs create-cluster --cluster-name openai-compatible-api
    
  2. Create the ALB (Application Load Balancer) and note the ALB ARN and DNS name that are returned
       aws elbv2 create-load-balancer \
         --name openai-api-alb \
         --subnets subnet-xxxxxxxx subnet-xxxxxxxx subnet-xxxxxxxx \
         --security-groups sg-xxxxxxx \
         --scheme internet-facing \
         --type application
    
  3. Create the target group and note the target group ARN that is returned
    aws elbv2 create-target-group \
         --name streaming-target-group \
         --protocol HTTP \
         --port 8080 \
         --vpc-id vpc-xxxxxxxx \
         --target-type ip \
         --health-check-path /ping \
         --health-check-interval-seconds 30
    
  4. Create the listener; TARGET-GROUP-ARN is the target group ARN recorded in the previous step
       aws elbv2 create-listener \
         --load-balancer-arn <ALB-ARN> \
         --protocol HTTP \
         --port 80 \
         --default-actions Type=forward,TargetGroupArn=<TARGET-GROUP-ARN>
    
  5. Create the task execution IAM role. This role allows ECS tasks to pull ECR images, access CloudWatch Logs, and so on. If the role does not exist yet, create it:
       cat > task-execution-role-trust-policy.json << EOF
       {
         "Version": "2012-10-17",
         "Statement": [
           {
             "Effect": "Allow",
             "Principal": {
               "Service": "ecs-tasks.amazonaws.com"
             },
             "Action": "sts:AssumeRole"
           }
         ]
       }
       EOF
    
       # Create the role
       aws iam create-role \
         --role-name ecsTaskExecutionRole \
         --assume-role-policy-document file://task-execution-role-trust-policy.json
    
       # Attach the required execution policy
       aws iam attach-role-policy \
         --role-name ecsTaskExecutionRole \
         --policy-arn arn:aws-cn:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
    
       # Attach a policy for access to Secrets Manager
       aws iam attach-role-policy \
         --role-name ecsTaskExecutionRole \
         --policy-arn arn:aws-cn:iam::aws:policy/SecretsManagerReadWrite
       
       # Attach a policy for access to SageMaker
       aws iam attach-role-policy \
         --role-name ecsTaskExecutionRole \
         --policy-arn arn:aws-cn:iam::aws:policy/AmazonSageMakerFullAccess
    
  6. Create the task definition:

Create a file named `task-definition.json`:

cat > task-definition.json << EOF   
{
     "family": "streaming-service",
     "networkMode": "awsvpc",
     "executionRoleArn": "arn:aws-cn:iam::${AWS_ACCOUNT_ID}:role/ecsTaskExecutionRole",
     "taskRoleArn": "arn:aws-cn:iam::${AWS_ACCOUNT_ID}:role/ecsTaskExecutionRole",
     "containerDefinitions": [
       {
         "name": "streaming-container",
         "image": "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/openai-compatible-api:latest",
         "essential": true,
         "portMappings": [
           {
             "containerPort": 8080,
             "hostPort": 8080,
             "protocol": "tcp"
           }
         ],
         "environment": [
           {
             "name": "AWS_REGION",
             "value": "${AWS_REGION}"
           },
           {
             "name": "AUTH_SECRET_ID",
             "value": "${AUTH_SECRET_ID}"
           },
           {
             "name": "API_KEY_CACHE_TTL",
             "value": "${API_KEY_CACHE_TTL}"
           },
           {
             "name": "SAGEMAKER_ENDPOINT_NAME",
             "value": "${MODEL}"
           }
         ],
         "logConfiguration": {
           "logDriver": "awslogs",
           "options": {
             "awslogs-group": "/ecs/streaming-service",
             "awslogs-region": "${AWS_REGION}",
             "awslogs-stream-prefix": "ecs",
             "awslogs-create-group": "true"
           }
         },
         "healthCheck": {
           "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
           "interval": 30,
           "timeout": 5,
           "retries": 3,
           "startPeriod": 60
         }
       }
     ],
     "requiresCompatibilities": ["FARGATE"],
     "cpu": "1024",
     "memory": "2048"
   }
EOF

Register the task definition:

# Register the task definition
aws ecs register-task-definition --cli-input-json file://task-definition.json
  7. Create the ECS service

Create a file named service-definition.json:

   {
     "cluster": "openai-compatible-api",
     "serviceName": "openai-compatible-api",
     "taskDefinition": "streaming-service",
     "loadBalancers": [
       {
         "targetGroupArn": "<TARGET-GROUP-ARN>",
         "containerName": "streaming-container",
         "containerPort": 8080
       }
     ],
     "desiredCount": 1,
     "launchType": "FARGATE",
     "platformVersion": "LATEST",
     "networkConfiguration": {
       "awsvpcConfiguration": {
         "subnets": ["subnet-xxxxxx", "subnet-xxxxx", "subnet-xxxxx"],
         "securityGroups": ["sg-xxxxx"],
         "assignPublicIp": "ENABLED"
       }
     },
     "healthCheckGracePeriodSeconds": 60,
     "schedulingStrategy": "REPLICA",
     "deploymentController": {
       "type": "ECS"
     },
     "deploymentConfiguration": {
       "deploymentCircuitBreaker": {
         "enable": true,
         "rollback": true
       },
       "maximumPercent": 200,
       "minimumHealthyPercent": 100
     }
   }

Replace <TARGET-GROUP-ARN> with the target group ARN created earlier, update the subnet IDs and security groups accordingly, and then create the service:

aws ecs create-service --cli-input-json file://service-definition.json
  8. Get the ALB DNS name

Retrieve the DNS name of the ALB, which is used to access the service:

   aws elbv2 describe-load-balancers \
     --names openai-api-alb \
     --query 'LoadBalancers[0].DNSName' \
     --output text

Update `OPENAI_BASE_URL` in the `.env` file with the DNS name you obtained:

export OPENAI_BASE_URL="http://<ALB-DNS-NAME>/v1"
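
Once the service tasks are running, you can quickly confirm that the ALB reaches them by probing the health-check path configured on the target group in step 3. A minimal sketch using only the Python standard library:

import urllib.request

# Probe the ALB health-check path; HTTP 200 means a healthy target is registered
with urllib.request.urlopen("http://<ALB-DNS-NAME>/ping", timeout=5) as resp:
    print(resp.status)  # expect 200
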
  9. Monitor the service status
        aws ecs describe-services \
          --cluster openai-compatible-api \
          --services openai-compatible-api \
          --query 'services[0].{ServiceName:serviceName,Status:status,DesiredCount:desiredCount,RunningCount:runningCount,TaskDefinition:taskDefinition}' \
          --output table
    

This command displays a concise table containing the service name, status, desired task count, running task count, and the task definition revision in use, as shown in the screenshot below.
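
If you prefer to block until the deployment settles rather than re-run this command, boto3 also provides a services_stable waiter for ECS; a minimal sketch using the cluster and service names from the steps above:

import boto3

# Wait until the openai-compatible-api service reaches a steady state
ecs = boto3.client("ecs", region_name="<AWS REGION>")
waiter = ecs.get_waiter("services_stable")
waiter.wait(cluster="openai-compatible-api", services=["openai-compatible-api"])
print("service is stable")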

4. Testing and Usage

This solution includes a test script, openAI_client_test.debug.py, which can be used to verify that the API works.
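
At its core the script makes an ordinary streaming call through the OpenAI SDK. The sketch below approximates that logic (the actual script in the repository adds timing and debug output) and assumes the variables from the .env file are already exported:

import os
from openai import OpenAI

# Build the client from the same environment variables the test script relies on
client = OpenAI(base_url=os.environ["OPENAI_BASE_URL"], api_key=os.environ["OPENAI_API_KEY"])

# Request a streaming chat completion and print tokens as they arrive
stream = client.chat.completions.create(
    model=os.environ["MODEL"],
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()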

First, load the environment variables:

source .env

Run the test script:

python openAI_client_test.debug.py

The script uses the `OPENAI_BASE_URL`, `OPENAI_API_KEY`, and `MODEL` values set in the environment, sends a simple greeting message "Hello!", and receives and displays the response as a stream. The output looks like the following:

[ec2-user@ip-172-31-10-156 openai-compatible-api-streaming]$ ./openAI_client_test.debug.py 
[10:07:14.328] Script started
[10:07:14.328] Environment variable check:
[10:07:14.328] OPENAI_BASE_URL = http://xxxxxxx.elb.amazonaws.com.cn/v1
[10:07:14.328] MODEL = deepseek-ai-DeepSeek-R1-Distill-Qwen-1-5B-250326-0342
[10:07:14.328] OPENAI_API_KEY = set
[10:07:14.328] Initializing the OpenAI client...
[10:07:14.401] Creating chat completion request, model: deepseek-ai-DeepSeek-R1-Distill-Qwen-1-5B-250326-0342
[10:07:15.104] Starting to receive the streaming response...
Alright, the user said "Hello!" and I should respond warmly.

I'll greet them and offer my help.

Keeping it friendly and open-ended should work best.
</think>

Hello! How can I assist you today?

Full response: Alright, the user said "Hello!" and I should respond warmly.

I'll greet them and offer my help.

Keeping it friendly and open-ended should work best.
</think>

Hello! How can I assist you today?
[10:07:15.217] Script finished

5. Summary

This solution provides an OpenAI-compatible proxy that makes it easy to connect OpenAI-based applications to Amazon SageMaker inference endpoints. Its main features are:

  • Compatible with the OpenAI API and easy to integrate with existing tools
  • Supports streaming responses (Server-Sent Events)
  • API key authentication
  • Uses Amazon SageMaker as the backend inference service
  • Supports multiple deployment options (Docker, ECS, EKS)

*The specific Amazon Web Services generative AI-related services mentioned above are available only in Amazon Web Services overseas Regions. Amazon Web Services China introduces them only to help you understand leading industry technologies and develop your overseas business.

About the Author

欧智华

With more than 15 years of experience in the IT industry across R&D, consulting, and architecture, he has built deep expertise and hands-on experience. He currently focuses on helping customers implement AI solutions and is committed to driving digital transformation and intelligent upgrades for enterprises through technical innovation.