NextGen is the new architecture of Amazon OpenSearch Serverless, a fully managed search and vector engine. The core innovation is that it completely decouples compute and storage through a shared storage layer, allowing compute capacity to scale independently of data volume. This enables the service to provision resources in seconds and scale all the way to zero when idle.
AWS now distinguishes the two architectures by name: existing collections are called Classic, while the new architecture is called NextGen and is the default for collections created in the Console. In the API or CLI, you specify --generation NEXTGEN (or --generation CLASSIC to keep the old architecture).
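As a minimal sketch, the choice of generation could be wired into a deployment script as follows. The --generation flag and its two values are taken from the paragraph above and have not been checked against the CLI reference. The helper function, the collection name, and the default collection type are hypothetical. The command is only assembled and printed, never executed.

```python
import shlex

# Values for --generation as named in the text above (not verified here).
VALID_GENERATIONS = {"NEXTGEN", "CLASSIC"}


def build_create_collection_cmd(name, collection_type="VECTORSEARCH", generation="NEXTGEN"):
    """Assemble (but do not run) an AWS CLI command that creates a collection.

    Hypothetical helper: it only builds the argument list so the flag
    choice can be reviewed or logged before anything is executed.
    """
    if generation not in VALID_GENERATIONS:
        raise ValueError(f"generation must be one of {sorted(VALID_GENERATIONS)}")
    return [
        "aws", "opensearchserverless", "create-collection",
        "--name", name,
        "--type", collection_type,
        "--generation", generation,  # flag as described in the text above
    ]


# "agent-memory" is a placeholder collection name.
cmd = build_create_collection_cmd("agent-memory")
print(shlex.join(cmd))
```

Passing generation="CLASSIC" would keep the old architecture. Any other value is rejected before a command is built.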
The original Serverless architecture always kept a minimum of two OpenSearch Compute Units (OCUs) running. That makes sense for a production search engine with steady traffic, but it is wasteful for agent-type workloads, whose traffic arrives in bursts followed by long idle periods.
NextGen’s fundamental change is separating compute from storage. A new shared storage layer is accessed by both indexing OCUs and search OCUs, so the number of OCUs can scale up or down regardless of how much data is stored. You can keep multiple populated indexes without incurring compute costs while nothing is being indexed or searched.
The architecture diagram below illustrates how NextGen serves an AI agent: the agent creates embeddings via Bedrock and writes them to the Indexing OCU, while vector queries go through the Search OCU. The Indexing OCU and Search OCU together form the compute layer. They scale independently and share a separate Shared Storage Layer, which is what enables scale-to-zero when idle.
The new generation of OpenSearch Serverless was announced on May 28, 2026 and is generally available (GA). At launch, two collection types are supported: full-text search and vector search. Collections can be created via the Console, the AWS SDK, and the AWS CLI; AWS CloudFormation support is coming soon. Note that shared storage still incurs GB-month storage charges even when compute is at zero, so the 20x and 60% figures are conditional: they are tied to specific baselines and to workloads with significant idle time.
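The caveat about storage charges can be made concrete with some rough arithmetic. The sketch below compares a compute floor of two OCUs (as in Classic) with a floor of zero, and adds the same GB-month storage charge in both cases. The prices, the workload numbers, and the function itself are illustrative assumptions, not published AWS rates or a billing formula. The model also ignores billing granularity and scale-up time.

```python
HOURS_PER_MONTH = 730  # common approximation for one month

# Placeholder prices for illustration only, NOT actual AWS pricing.
OCU_HOUR_PRICE = 0.24   # assumed cost of one OCU for one hour
GB_MONTH_PRICE = 0.024  # assumed cost of storing one GB for one month


def monthly_cost(active_hours, ocus_when_active, min_ocus, storage_gb):
    """Estimate a monthly bill as compute (OCU-hours) plus storage (GB-months).

    min_ocus is the compute floor that keeps running while idle:
    2 models the Classic minimum, 0 models scale-to-zero.
    """
    if not 0 <= active_hours <= HOURS_PER_MONTH:
        raise ValueError("active_hours must be between 0 and HOURS_PER_MONTH")
    idle_hours = HOURS_PER_MONTH - active_hours
    active_ocu_hours = active_hours * max(ocus_when_active, min_ocus)
    idle_ocu_hours = idle_hours * min_ocus
    compute = (active_ocu_hours + idle_ocu_hours) * OCU_HOUR_PRICE
    storage = storage_gb * GB_MONTH_PRICE  # charged even when compute is zero
    return compute + storage


# Hypothetical agent workload: 20 busy hours per month, 2 OCUs while busy, 50 GB stored.
with_floor = monthly_cost(active_hours=20, ocus_when_active=2, min_ocus=2, storage_gb=50)
scale_to_zero = monthly_cost(active_hours=20, ocus_when_active=2, min_ocus=0, storage_gb=50)
idle_month = monthly_cost(active_hours=0, ocus_when_active=0, min_ocus=0, storage_gb=50)

print(f"two-OCU floor:  {with_floor:.2f}")
print(f"scale to zero:  {scale_to_zero:.2f}")
print(f"fully idle:     {idle_month:.2f}")
```

Under these assumptions the savings come almost entirely from idle hours: as active_hours approaches HOURS_PER_MONTH the two estimates converge, and the fully idle month still costs the storage charge rather than nothing.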
OpenSearch Serverless NextGen directly addresses the cost challenge of the AI agent era: bursty traffic followed by silence. Because compute is decoupled from storage and can scale to zero, you can deploy production-ready search and vector backends in minutes and pay for compute only while it is actually used. This is a natural fit for RAG and semantic search architectures, with no need to operate a separate vector database.