AWS to Deploy Cerebras Wafer-Scale AI Chip CS-3 Alongside Its Own Trainium
AWS has announced a partnership with Cerebras to deploy a hybrid AI inference system on Amazon Bedrock that combines Cerebras CS-3 with AWS Trainium chips. The system plays to each chip's strengths: Trainium handles prefill (prompt processing), while CS-3 handles decoding (output generation), with the two connected over AWS Elastic Fabric Adapter (EFA) networking.
AWS has partnered with Cerebras to integrate the company's wafer-scale AI chip, the CS-3, into its cloud infrastructure. The collaboration, announced on March 13, will bring a hybrid AI inference system to Amazon Bedrock in the coming months.
The system combines Cerebras CS-3, AWS Trainium chips, and AWS EFA (Elastic Fabric Adapter) networking. Under this architecture, Trainium chips handle the prefill stage (prompt processing), while CS-3 systems handle the decoding stage (output generation). The two components communicate over the EFA network.
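The division of labor described above can be pictured with a toy sketch. This is not AWS or Cerebras code: every name in it (`KVCache`, `prefill`, `transfer`, `decode`, `generate`) is hypothetical, and the announcement does not say what data passes between the two stages. The sketch assumes, as is common in split prefill/decode designs, that the prefill stage hands its cached attention state to the decode stage.

```python
from dataclasses import dataclass


@dataclass
class KVCache:
    """Hypothetical stand-in for the state that prefill hands to decode."""
    tokens: list[str]


def prefill(prompt: str) -> KVCache:
    # Stage 1 (Trainium in the article): process the whole prompt in one pass.
    return KVCache(tokens=prompt.split())


def transfer(cache: KVCache) -> KVCache:
    # Stand-in for the network hop (EFA in the article): hand over a copy.
    return KVCache(tokens=list(cache.tokens))


def decode(cache: KVCache, max_new_tokens: int) -> list[str]:
    # Stage 2 (CS-3 in the article): emit one token per step, each step
    # extending the state left by all previous steps.
    generated = []
    for step in range(max_new_tokens):
        token = f"<tok{step}>"  # placeholder; a real model would sample here
        cache.tokens.append(token)
        generated.append(token)
    return generated


def generate(prompt: str, max_new_tokens: int = 3) -> list[str]:
    return decode(transfer(prefill(prompt)), max_new_tokens)


print(generate("hello world"))  # ['<tok0>', '<tok1>', '<tok2>']
```

The point of the sketch is only the shape of the pipeline: one batch-style pass over the prompt, one handoff, then a strictly sequential loop.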
This hybrid approach is designed to play to the strengths of each chip. Prefill workloads are parallel in nature: they require high compute power but only moderate memory bandwidth. Decoding workloads are serial: they need less compute but more memory bandwidth. By assigning each stage to the chip better suited to it, AWS aims to improve performance and user experience for AI inference tasks.
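A rough back-of-the-envelope calculation illustrates the contrast. The sketch below uses a common approximation that is not taken from the announcement: a dense model performs about 2 floating-point operations per parameter per token, and reads its weights from memory once per forward pass. The function name, the 2-byte weight size, and the 2048-token prompt are all illustrative assumptions; attention and cache traffic are ignored.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float = 2.0) -> float:
    """Approximate FLOPs performed per byte of weights read in one forward pass."""
    # ~2 FLOPs (multiply + add) per parameter per token; each weight is read
    # once per pass, so the parameter count cancels out of the ratio.
    return 2.0 * tokens_per_pass / bytes_per_param


# Prefill: the whole prompt (assumed 2048 tokens) goes through in one pass.
prefill_intensity = arithmetic_intensity(tokens_per_pass=2048)

# Decode: each pass produces a single new token for a single request.
decode_intensity = arithmetic_intensity(tokens_per_pass=1)

print(prefill_intensity, decode_intensity)  # 2048.0 1.0
```

Under these assumptions prefill does thousands of operations for every byte it reads, so compute is the limit, while decode does about one, so memory bandwidth is the limit.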
The partnership marks a significant step in AWS's strategy to offer diverse AI computing options through its Bedrock platform, potentially providing customers with faster inference capabilities for large language models and other AI applications.