What We Shipped in 2024

In 2024, we radically transformed how organizations manage their Kubernetes infrastructure by shipping our most ambitious features yet. Here’s a look at each of them.

Continuous Deployment Engine

Our Continuous Deployment Engine takes aim at the most common challenges companies of all sizes face when managing Kubernetes: upgrading clusters and setting up robust deployment pipelines.

Managing the lifecycle of your Kubernetes deployment

Managing the lifecycle of your Kubernetes clusters can be a surprisingly difficult task, especially when dealing with things like rotating node groups or upgrading add-ons and controllers.

In our Continuous Deployment Engine, the Clusters tab acts as a pre-flight checklist for successful cluster upgrades, identifying which controllers are compatible with both the current and target Kubernetes versions. The version matrix feature further streamlines the process by mapping which controller versions align with specific Kubernetes versions. For a detailed guide, check out this blog post.

Meanwhile, our Global Services feature simplifies the management of essential Kubernetes add-ons, such as ingress controllers like NGINX, observability agents like Datadog, and CNI providers like Calico. Instead of managing these add-ons individually for each cluster, Plural lets users define a single Global Service resource that replicates the defined service across every cluster it targets. This ensures uniformity and reduces administrative overhead, even across hundreds of clusters.
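To make this concrete, a Global Service definition is a small custom resource that points at an existing service and a set of target clusters. The sketch below is illustrative only — the resource names and tag values are hypothetical, and the exact field names may differ from the released CRD schema (consult the Plural documentation for the authoritative version):

```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: GlobalService
metadata:
  name: datadog-agent
  namespace: infra
spec:
  # The existing service to replicate fleet-wide
  serviceRef:
    name: datadog-agent
    namespace: infra
  # Only clusters carrying these tags receive the service
  tags:
    tier: prod
```

Because targeting is tag-based, adding a new production cluster to the fleet automatically brings it in scope, with no per-cluster configuration required.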

Setting up robust deployment pipelines

Standard open-source tools in the CNCF ecosystem often lack support for complex deployment pipelines. Many are limited to simple one-repo-to-one-cluster deployments, forcing teams to hand-build intricate workflows that demand specialized knowledge.

Our deployment pipelines feature automates the entire deployment lifecycle with GitOps-driven workflows, generating automated pull requests for each pipeline stage. Approval gates then provide additional control by restricting promotions based on integration tests, manual approvals, or other validation criteria. In addition, pipelines can control global services, allowing for cross-cluster deployments to be orchestrated in a multi-staged, gated manner.
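Conceptually, a pipeline is a set of stages connected by gated edges. A minimal sketch of such a resource might look like the following — service and stage names are hypothetical, and the field layout may differ from the released CRD (see the Plural docs for the real schema):

```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: Pipeline
metadata:
  name: web-app
  namespace: infra
spec:
  stages:
    - name: dev
      services:
        - serviceRef:
            name: web-app
            namespace: dev
    - name: prod
      services:
        - serviceRef:
            name: web-app
            namespace: prod
  edges:
    - from: dev
      to: prod
      gates:
        # Promotion to prod waits on a manual approval
        - name: prod-approval
          type: APPROVAL
```

Swapping the approval gate for an integration-test gate, or pointing a stage at a global service, extends the same pattern to gated, cross-cluster rollouts.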

All of this runs on an egress-only network architecture, ensuring secure communication between managed Kubernetes clusters and a central hub cluster. Our approach relies on agents installed on each Kubernetes cluster managed by Plural. These agents initiate only outbound network connections, eliminating the need for inbound network access to managed clusters. This design significantly enhances security in two key ways: it minimizes the attack surface of managed clusters by restricting network access, and it eliminates the need for the management cluster to store sensitive kubeconfig files for each managed cluster.

Infrastructure as Code Management with Stacks

Stacks is our powerful solution for managing infrastructure as code (IaC), designed to enhance security, efficiency, and control over your infrastructure deployments.

The core workflow is simple:

1. Define your stack declaratively by specifying:

  •    The IaC type (Terraform, Ansible, etc.)
  •    The source code location in your git repository
  •    The target cluster for execution

2. Our deployment operator automatically detects commits to your tracked repository and executes runs on the targeted cluster. During execution, Stacks provides:

  •    Real-time stdout communication to the UI
  •    Comprehensive visibility into inputs and outputs
  •    Visual Terraform state diagrams
  •    Automated PR comments with plan results
  •    Fine-grained permission controls
  •    Configurable network location for IaC runs
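Putting the two steps together, a stack declaration might look roughly like the sketch below. The repository, folder, and cluster names are hypothetical, and the exact field names may differ from the released CRD (the Plural docs are authoritative):

```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: InfrastructureStack
metadata:
  name: networking
  namespace: infra
spec:
  # 1. The IaC type
  type: TERRAFORM
  # 2. Where the source code lives
  repositoryRef:
    name: infra-repo
    namespace: infra
  git:
    ref: main
    folder: terraform/networking
  # 3. The cluster where runs execute
  clusterRef:
    name: mgmt
    namespace: infra
```

From then on, any commit touching `terraform/networking` on `main` triggers a run on the target cluster, with output streamed back to the UI.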

Built on CRDs (Custom Resource Definitions), Stacks integrates with our Continuous Deployment Engine as well as monitoring tools like Datadog, which means Stacks can automatically detect and respond to infrastructure issues. For example, during Kubernetes cluster upgrades, stack runs can be automatically canceled if problems are detected, preventing extended outages and reducing operational risk. The CRD-based approach also enables the powerful self-service provisioning patterns brought to life in our Service Catalog.

When connected to GitHub or GitLab, Stacks automatically executes Terraform plans on pull requests and posts the results directly as PR comments.

Built-in Multi-Cluster Dashboard

Plural's Built-in Multi-Cluster Dashboard provides deep visibility and control over your Kubernetes resources, including core components like pods, deployments, and replica sets, as well as infrastructure elements such as networking and storage resources. The dashboard maintains live updates of cluster state and resource conditions across your entire fleet, providing real-time visibility into your infrastructure's health and status.

The dashboard is built on a sophisticated reverse tunneling Kubernetes authentication proxy that enables full cluster visibility and control without exposing internal cluster endpoints. At its core is the egress-only communication model mentioned previously. This architecture allows for the management of clusters across different networks, such as production and development VPCs or private networks, without the traditional complexity of juggling VPN credentials or maintaining separate environments. As a practical example, even clusters running on local development environments (like K3s on a laptop) can be securely managed through the same interface as production clusters.

Unlike solutions such as the EKS dashboard that require manual setup, Plural's dashboard comes pre-configured, with a number of enterprise-ready features such as Single Sign-On (SSO) integration and comprehensive audit logging that tracks all API requests through the dashboard.

Self-Service Catalog

Our Self-Service Catalog is built on the idea that infrastructure provisioning should be as simple as selecting from a menu, while being as powerful as custom code. 

With the Self-Service Catalog, platform teams can create a curated set of infrastructure components that developers can then deploy with confidence. For example, the platform team can define standardized provisioning paths that automatically enforce security policies, resource limits, and compliance requirements. This ensures that all infrastructure deployments follow organizational best practices without creating bottlenecks.

The service catalog comes pre-configured with a number of common components, including:

  • log aggregation with Elasticsearch
  • a scalable Prometheus-compatible metrics setup with VictoriaMetrics
  • data engineering tooling like Airbyte, Dagster, and MLflow
  • security tooling like OPA Gatekeeper and the Trivy operator

But it’s also extensible. Any infrastructure component that can be defined in Terraform or GitOps manifests can be added to your catalog, making it adaptable to your specific needs. Whether it's setting up multi-cluster observability with Prometheus and Grafana, implementing cost management with Kubecost, or deploying security scanning with Trivy, everything is available through a simple, consistent interface.
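As an illustration, exposing a new component through the catalog can be as simple as registering a PR automation against it. The sketch below is purely hypothetical — names, fields, and the configuration shape are illustrative, and the actual catalog schema lives in the Plural docs:

```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: PrAutomation
metadata:
  name: kubecost
spec:
  title: Install Kubecost
  documentation: Provisions Kubecost for cluster cost management
  # Surfaces this automation under a catalog category
  catalogRef:
    name: observability
  # Inputs the developer is prompted for at provisioning time
  configuration:
    - name: clusterName
      type: STRING
      documentation: Cluster to install Kubecost into
```

The automation then generates a pull request with the rendered Terraform or GitOps manifests, so every catalog deployment still flows through your normal review process.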

AI Insight Engine

Managing infrastructure typically involves endless hours of troubleshooting misconfigurations, responding to alerts, and answering repetitive developer questions. Looking into the space more closely, we realized that Large Language Models (LLMs) offer remarkable benefits to these workflows, thanks to the completeness of the Kubernetes API and the fact that so much of the core information is already in LLM training sets. We’ve incorporated LLMs into our product in a number of ways to address these issues.

Root Cause Analysis

Our AI Insight Engine performs automatic root cause analysis, leveraging a causal evidence graph that spans Terraform logs, Kubernetes objects, their ownership hierarchy, and GitOps manifests. This allows us to quickly pinpoint the root cause of any issue, eliminating the need for time-consuming manual digging.

AI-Powered Fixes and Collaboration

But detection is just the beginning. Plural AI goes further by offering actionable solutions through our AI Fix Engine. When an issue is identified, the AI suggests accurate code changes to resolve it. And if the AI's suggestion needs refinement, you can seamlessly collaborate in a chat with the AI to fine-tune the fix. 

Offload Internal Support with Explain With AI

Managing infrastructure also means answering countless internal questions. Plural AI makes this easier with Explain With AI, which offloads internal support to LLMs. This feature gives you instant, clear explanations of complicated configurations, reducing the burden on your team and freeing them up to focus on more strategic tasks.

Simplifying Terraform Plans

Terraform plans can be complex and challenging to decipher. With Plural AI’s Terraform Plan Explain, you can quickly receive AI-generated, easy-to-understand explanations of any intricate Terraform plan, making it more straightforward for your team to understand and act on.

Note that Plural allows you to bring the LLM already approved by your enterprise, so no third-party model provider sits between your data and your infrastructure.

Conclusion

While Kubernetes has become the de facto standard for container orchestration, it has traditionally required deep expertise to operate effectively. Our goal last year was to change that equation by systematically breaking down complex operations into manageable, automated workflows.

Whether it's AI-powered troubleshooting that eliminates hours of manual investigation, automated upgrade paths that safely handle cluster transitions, or self-service catalogs that empower developers to provision infrastructure confidently, we're removing the need for specialized Kubernetes expertise at every step. 

As we look ahead, we remain committed to this vision: making sophisticated infrastructure management accessible to all engineering teams. And we're just getting started.

Ready to simplify your infrastructure management? Get started with Plural today.