AWS Machine Learning Blog

Deploy Qwen models with Amazon Bedrock Custom Model Import

We’re excited to announce that Amazon Bedrock Custom Model Import now supports Qwen models. You can now import custom weights for Qwen2, Qwen2_VL, and Qwen2_5_VL architectures, including models like Qwen 2, Qwen 2.5 Coder, Qwen 2.5 VL, and QwQ 32B. You can bring your own customized Qwen models into Amazon Bedrock and deploy them in a fully managed, serverless environment—without having to manage infrastructure or model serving.

In this post, we cover how to deploy Qwen 2.5 models with Amazon Bedrock Custom Model Import, making them accessible to organizations looking to use state-of-the-art AI capabilities within the AWS infrastructure at an effective cost.

Overview of Qwen models

Qwen 2 and 2.5 are families of large language models, available in a wide range of sizes and specialized variants to suit diverse needs:

  • General language models: Models ranging from 0.5B to 72B parameters, with both base and instruct versions for general-purpose tasks
  • Qwen 2.5-Coder: Specialized for code generation and completion
  • Qwen 2.5-Math: Focused on advanced mathematical reasoning
  • Qwen 2.5-VL (vision-language): Image and video processing capabilities, enabling multimodal applications

Overview of Amazon Bedrock Custom Model Import

Amazon Bedrock Custom Model Import enables you to import and use your customized models alongside existing foundation models (FMs) through a single serverless, unified API. You can access your imported custom models on demand and without the need to manage the underlying infrastructure. You can also accelerate your generative AI application development by integrating your supported custom models with native Amazon Bedrock tools and features like Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Agents. Amazon Bedrock Custom Model Import is generally available in the US East (N. Virginia), US West (Oregon), and Europe (Frankfurt) AWS Regions.

Now, we’ll explore how you can use Qwen 2.5 models for two common use cases: as a coding assistant and for image understanding. Qwen2.5-Coder is a state-of-the-art code model, matching the capabilities of proprietary models like GPT-4o. It supports over 90 programming languages and excels at code generation, debugging, and reasoning. Qwen 2.5-VL brings advanced multimodal capabilities. According to Qwen, Qwen 2.5-VL is not only proficient at recognizing objects such as flowers and animals, but also at analyzing charts, extracting text from images, interpreting document layouts, and processing long videos.

Prerequisites

Before importing the Qwen model with Amazon Bedrock Custom Model Import, make sure that you have the following in place:

  1. An active AWS account
  2. An Amazon Simple Storage Service (Amazon S3) bucket to store the Qwen model files
  3. Sufficient permissions to create Amazon Bedrock model import jobs
  4. Confirmation that your Region supports Amazon Bedrock Custom Model Import

Use case 1: Qwen coding assistant

In this example, we demonstrate how to build a coding assistant using the Qwen2.5-Coder-7B-Instruct model.

  1. Go to Hugging Face, then search for and copy the Model ID Qwen/Qwen2.5-Coder-7B-Instruct:

You will use Qwen/Qwen2.5-Coder-7B-Instruct for the rest of the walkthrough. We don’t demonstrate fine-tuning steps, but you can also fine-tune before importing.

  2. Use the following command to download a snapshot of the model locally. The huggingface_hub Python library provides a utility called snapshot_download for this:
from huggingface_hub import snapshot_download

# Download the model weights and configuration files to a local folder
snapshot_download(repo_id="Qwen/Qwen2.5-Coder-7B-Instruct",
                  local_dir="./extractedmodel/")

Depending on your model size, this could take a few minutes. When completed, your Qwen Coder 7B model folder will contain the following files.

  • Configuration files: Including config.json, generation_config.json, tokenizer_config.json, tokenizer.json, vocab.json, and merges.txt
  • Model files: Four safetensors files and model.safetensors.index.json
  • Documentation: LICENSE and README.md

  3. Upload the model to Amazon S3, using the command line or boto3 (a boto3 sketch follows the command):

aws s3 cp ./extractedmodel s3://yourbucket/path/ --recursive
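If you prefer boto3, the following is a minimal sketch that mirrors the recursive copy above (the bucket name and key prefix are placeholders):

import os
import boto3

s3 = boto3.client("s3")

# Upload every file in the local model folder, preserving relative paths
local_dir = "./extractedmodel"
for root, _, files in os.walk(local_dir):
    for name in files:
        local_path = os.path.join(root, name)
        key = "path/" + os.path.relpath(local_path, local_dir)
        s3.upload_file(local_path, "yourbucket", key)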

  4. Start the model import job using the following API call:
import boto3

bedrock = boto3.client("bedrock")

# Start an import job pointing at the model artifacts in Amazon S3
response = bedrock.create_model_import_job(
    jobName="uniquejobname",
    importedModelName="uniquemodelname",
    roleArn="fullrolearn",
    modelDataSource={
        's3DataSource': {
            's3Uri': "s3://yourbucket/path/"
        }
    }
)

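The import job runs asynchronously. The following is a minimal sketch for polling its status with the boto3 get_model_import_job call, using the job name from the previous step:

import time

# Poll until the import job reaches a terminal state
while True:
    job = bedrock.get_model_import_job(jobIdentifier="uniquejobname")
    status = job["status"]  # InProgress, Completed, or Failed
    if status in ("Completed", "Failed"):
        break
    time.sleep(30)
print(f"Import job finished with status: {status}")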
You can also do this using the AWS Management Console for Amazon Bedrock.

  1. In the Amazon Bedrock console, choose Imported models in the navigation pane.
  2. Choose Import model.

  3. Enter the details, including a Model name, Import job name, and model S3 location.

  4. Create a new service role or use an existing service role. Then choose Import model.

  5. After you choose Import on the console, you should see the status as Importing while the model is being imported:

If you’re using your own role, make sure you add the trust relationship described in Create a service role for model import.
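The following is a minimal sketch, assuming the trust policy described in that guide; the role name and account ID are placeholders, and you should scope and verify the policy against the linked documentation:

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets Amazon Bedrock assume the role for model import;
# the aws:SourceAccount condition scopes it to your account (placeholder ID)
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "bedrock.amazonaws.com"},
        "Action": "sts:AssumeRole",
        "Condition": {
            "StringEquals": {"aws:SourceAccount": "111122223333"}
        },
    }],
}

iam.create_role(
    RoleName="MyBedrockModelImportRole",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)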

After your model is imported, wait for model inference to be ready, and then chat with the model on the playground or through the API. In the following example, we prompt the model to directly output Python code that lists items in an S3 bucket. Remember to use the right chat template to input prompts in the format required. For example, you can get the right chat template for any compatible model on Hugging Face using the following code:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

# Format the conversation with the model's chat template before sending
# the resulting text as the prompt to the imported model
prompt = "Write sample boto3 python code to list files in a bucket stored in the variable `my_bucket`"
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

Note that when using the invoke_model APIs, you must use the full Amazon Resource Name (ARN) for the imported model. You can find the model ARN in the Amazon Bedrock console by navigating to the Imported models section and then viewing the Model details page, as shown in the following figure.

After the model is ready for inference, you can use the Chat playground in the Amazon Bedrock console or the APIs to invoke the model.
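For example, the following is a minimal invocation sketch; the model ARN is a placeholder, and the request body mirrors the schema used in the vision example later in this post (verify the exact fields for your imported model):

import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Full ARN of the imported model, from the Model details page (placeholder)
model_arn = "arn:aws:bedrock:us-east-1:111122223333:imported-model/abc123"

response = client.invoke_model(
    modelId=model_arn,
    body=json.dumps({
        "prompt": text,  # the chat-template-formatted prompt from above
        "temperature": 0.1,
        "max_gen_len": 512,
    }),
    accept="application/json",
    contentType="application/json",
)
print(json.loads(response["body"].read()))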

Use case 2: Qwen 2.5 VL image understanding

The Qwen2.5-VL models offer multimodal capabilities, combining vision and language understanding in a single model. This section demonstrates how to deploy Qwen2.5-VL using Amazon Bedrock Custom Model Import and test its image understanding capabilities.

Import Qwen2.5-VL-7B to Amazon Bedrock

Download the model from Hugging Face and upload it to Amazon S3:

import os
from huggingface_hub import snapshot_download

hf_model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
local_directory = "extractedmodel-vl"  # local folder for the model files

# Enable faster downloads (requires the hf_transfer package)
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# Download the model locally
snapshot_download(repo_id=hf_model_id, local_dir=f"./{local_directory}")

Next, upload the model folder to Amazon S3 as before, and then import the model to Amazon Bedrock (through either the console or the API):

import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=imported_model_name,
    roleArn=role_arn,
    modelDataSource={
        's3DataSource': {
            's3Uri': s3_uri
        }
    }
)

Test the vision capabilities

After the import is complete, test the model with an image input. The Qwen2.5-VL model requires proper formatting of multimodal inputs:

import json

import boto3
from transformers import AutoProcessor

client = boto3.client("bedrock-runtime")
model_id = "<your imported model ARN>"  # full ARN from the Model details page

def generate_vl(messages, image_base64, temperature=0.3, max_tokens=4096, top_p=0.9):
    # Format the conversation with the Qwen2.5-VL chat template
    processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps({
            'prompt': prompt,
            'temperature': temperature,
            'max_gen_len': max_tokens,
            'top_p': top_p,
            'images': [image_base64]
        }),
        accept='application/json',
        contentType='application/json'
    )

    return json.loads(response['body'].read().decode('utf-8'))
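
# The image_to_base64 helper used below isn't defined in the post;
# the following is a minimal sketch
import base64

def image_to_base64(path):
    # Read an image file and return its contents as a base64-encoded string
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")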

# Using the model with an image
file_path = "cat_image.jpg"
base64_data = image_to_base64(file_path)

messages = [
    {
        "role": "user",
        "content": [
            {"image": base64_data},
            {"text": "Describe this image."}
        ]
    }
]

response = generate_vl(messages, base64_data)

# Print response
print("Model Response:")
if 'choices' in response:
    print(response['choices'][0]['text'])
elif 'outputs' in response:
    print(response['outputs'][0]['text'])
else:
    print(response)
    

When provided with an example image of a cat (such as the following image), the model accurately describes key features such as the cat’s position, fur color, eye color, and general appearance. This demonstrates the Qwen2.5-VL model’s ability to process visual information and generate relevant text descriptions.

The model’s response:

This image features a close-up of a cat lying down on a soft, textured surface, likely a couch or a bed. The cat has a tabby coat with a mix of dark and light brown fur, and its eyes are a striking green with vertical pupils, giving it a captivating look. The cat's whiskers are prominent and extend outward from its face, adding to the detailed texture of the image. The background is softly blurred, suggesting a cozy indoor setting with some furniture and possibly a window letting in natural light. The overall atmosphere of the image is warm and serene, highlighting the cat's relaxed and content demeanor. 

Pricing

You can use Amazon Bedrock Custom Model Import to serve your custom model weights for supported architectures alongside Amazon Bedrock hosted FMs in a fully managed way through On-Demand mode. Custom Model Import doesn’t charge for model import. You are charged for inference based on two factors: the number of active model copies and their duration of activity. Billing occurs in 5-minute increments, starting from the first successful invocation of each model copy. The pricing per model copy per minute varies based on factors including architecture, context length, Region, and compute unit version, and is tiered by model copy size. The number of custom model units required for hosting depends on the model’s architecture, parameter count, and context length.

Amazon Bedrock automatically manages scaling based on your usage patterns. If there are no invocations for 5 minutes, it scales to zero and scales up when needed, though this might involve cold-start latency of up to a minute. Additional copies are added if inference volume consistently exceeds single-copy concurrency limits. The maximum throughput and concurrency per copy is determined during import, based on factors such as input/output token mix, hardware type, model size, architecture, and inference optimizations.
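As a purely illustrative calculation (the rate below is a hypothetical placeholder, not an actual price): a model requiring 2 custom model units at a hypothetical $0.10 per unit per minute would cost 2 × 0.10 × 60 = $12 for each hour a single copy stays active, and would stop accruing charges once it scales to zero after the 5-minute idle window.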

For more information, see Amazon Bedrock pricing.

Clean up

To avoid ongoing charges after completing the experiments:

  1. Delete your imported Qwen models from Amazon Bedrock Custom Model Import using the console or the API.
  2. Optionally, delete the model files from your S3 bucket if you no longer need them.

Remember that while Amazon Bedrock Custom Model Import doesn’t charge for the import process itself, you are billed for model inference usage and storage.

Conclusion

Amazon Bedrock Custom Model Import empowers organizations to use powerful publicly available models like Qwen 2.5, among others, while benefiting from enterprise-grade infrastructure. The serverless nature of Amazon Bedrock eliminates the complexity of managing model deployments and operations, allowing teams to focus on building applications rather than infrastructure. With features like auto scaling, pay-per-use pricing, and seamless integration with AWS services, Amazon Bedrock provides a production-ready environment for AI workloads.

The combination of Qwen 2.5’s advanced AI capabilities and Amazon Bedrock’s managed infrastructure offers an optimal balance of performance, cost, and operational efficiency. Organizations can start with smaller models and scale up as needed, while maintaining full control over their model deployments and benefiting from AWS security and compliance capabilities.

For more information, refer to the Amazon Bedrock User Guide.


About the Authors

Ajit Mahareddy is an experienced Product and Go-To-Market (GTM) leader with over 20 years of experience in Product Management, Engineering, and Go-To-Market. Prior to his current role, Ajit led product management building AI/ML products at leading technology companies, including Uber, Turing, and eHealth. He is passionate about advancing Generative AI technologies and driving real-world impact with Generative AI.

Shreyas Subramanian is a Principal Data Scientist and helps customers by using generative AI and deep learning to solve their business challenges using AWS services. Shreyas has a background in large-scale optimization and ML and in the use of ML and reinforcement learning for accelerating optimization tasks.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Dharinee Gupta is an Engineering Manager at AWS Bedrock, where she focuses on enabling customers to seamlessly utilize open source models through serverless solutions. Her team specializes in optimizing these models to deliver the best cost-performance balance for customers. Prior to her current role, she gained extensive experience in authentication and authorization systems at Amazon, developing secure access solutions for Amazon offerings. Dharinee is passionate about making advanced AI technologies accessible and efficient for AWS customers.

Lokeshwaran Ravi is a Senior Deep Learning Compiler Engineer at AWS, specializing in ML optimization, model acceleration, and AI security. He focuses on enhancing efficiency, reducing costs, and building secure ecosystems to democratize AI technologies, making cutting-edge ML accessible and impactful across industries.

June Won is a Principal Product Manager with Amazon SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping applications and last mile delivery.