Apr 17, 2025
Virtuability Works With St. James’s Place To Deliver An MLOps Deployment & Governance Framework
Introduction
Virtuability, a Professional Services consulting company and AWS Select Services Partner, partnered with St. James’s Place, a UK-based financial services company, to implement a comprehensive MLOps deployment and governance framework.
The framework was designed to leverage Amazon SageMaker, AWS multi-account structures, and a robust set of governance and security controls to ensure scalable, reliable and secure ML environments.
Challenges
After an initial analysis, Virtuability made a number of recommendations to address the challenges St. James’s Place faced in scaling governance and operational efficiency from an initial handful of models to many models and use cases over time:
- Separation of concerns: Separate model training from model storage and model deployment by AWS account boundaries in a multi-account structure.
- Cost tracking: Mandate that internal, user-defined cost allocation tags are set on resource creation. This is particularly relevant for SageMaker resources and jobs, which can be costly to run.
- Model Training & Deployment Guardrails: Leverage Control Tower Proactive controls to mandate use of VPCs, encryption of S3 objects at rest and in transit, and other relevant controls.
- Streamlined Model Deployment: Provide a streamlined process for deploying and managing SageMaker models across multiple AWS accounts and environments.
- Shared Services: Use an isolated shared services account to host the SageMaker model registry, perform model container image builds and operate model deployment pipelines.
- Secure Foundation model consumption: Ensure that Bedrock model endpoints are used to access data plane operations (Invoke/Converse).
- Scalability: Standardise the model registry and model deployments to allow easy management of SageMaker models and model endpoints. Leverage CDK Pipelines together with CDK and CloudFormation infrastructure as code to provide an MLOps framework.
Why Virtuability?
Virtuability has a strong history of collaboration with customers in the SaaS and Financial services sectors. We are specialised AWS Cloud experts with a team of consultants who work in a number of technology domains.
Our AWS Services Partner status has over the years validated our expertise and ongoing commitment to AWS Cloud.
Solutions
To address the challenges that St. James’s Place faced, Virtuability leveraged various solutions.
Multi-account Structure with Control Tower
Virtuability previously advised and assisted St. James’s Place in the adoption and use of AWS Control Tower.
This allowed St. James’s Place to set up a multi-account structure for MLOps within the AWS Organization, which included the following accounts:
- Data Science account: Used for model training via SageMaker domains and SageMaker Studio
- Shared Services account: Used for the central model registry, the model pipeline factory (backed by CDK Pipelines), code repositories etc.
- Workload accounts: NonProd, PreProd and Prod accounts for Machine Learning workloads
SageMaker Domain
A SageMaker domain was configured in VPC-only mode with storage encryption. Outbound connectivity leveraged existing VPC and transit gateway connectivity as well as centralised VPC endpoints for private API access.
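As an illustration of what a VPC-only, encrypted domain means in practice, the sketch below assembles the keyword arguments for SageMaker's CreateDomain API. The field names (`AppNetworkAccessType`, `KmsKeyId`, `VpcId`, `SubnetIds`) are real API parameters; the function name, the default authentication mode and all identifiers passed in are assumptions made for the example, not details from the engagement.

```python
from typing import Any, Dict, List


def build_domain_request(
    domain_name: str,
    vpc_id: str,
    subnet_ids: List[str],
    kms_key_id: str,
    execution_role_arn: str,
    auth_mode: str = "IAM",  # assumption: the source does not state the auth mode
) -> Dict[str, Any]:
    """Return keyword arguments for SageMaker's CreateDomain call."""
    if not subnet_ids:
        raise ValueError("a VPC-only domain needs at least one subnet")
    if auth_mode not in ("IAM", "SSO"):
        raise ValueError("auth_mode must be 'IAM' or 'SSO'")
    return {
        "DomainName": domain_name,
        "AuthMode": auth_mode,
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # VpcOnly routes Studio traffic through the customer VPC instead of
        # SageMaker-managed internet access.
        "AppNetworkAccessType": "VpcOnly",
        # Customer-managed key used to encrypt the domain's storage volume.
        "KmsKeyId": kms_key_id,
        "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
    }
```

With boto3 available, the result could be passed as `boto3.client("sagemaker").create_domain(**request)`, although in a code-first setup the same properties would more likely be set through infrastructure as code.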
Cost allocation
Service Control Policies (SCPs) were used to enforce the presence of tag keys on specific SageMaker and other API calls that support tagging on creation. This ensured that training jobs and workloads were correctly allocated to owners and projects.
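A tag-enforcing SCP of this kind usually denies a create call when the request does not carry a required tag. The sketch below generates such a policy document. The `aws:RequestTag/<key>` condition key and the `Null` operator are standard IAM features; the tag keys, the list of actions and the function name are hypothetical and not taken from the actual policies.

```python
import json
from typing import Any, Dict, List


def build_tag_enforcement_scp(
    required_tag_keys: List[str], actions: List[str]
) -> Dict[str, Any]:
    """Build an SCP that denies the given actions when a required tag is missing."""
    statements = []
    for key in required_tag_keys:
        # Sid values must be alphanumeric, so strip other characters.
        sid_suffix = "".join(ch for ch in key if ch.isalnum())
        statements.append(
            {
                "Sid": "DenyCreateWithout" + sid_suffix,
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "*",
                # "Null": "true" matches when the tag key is absent from the request.
                "Condition": {"Null": {"aws:RequestTag/" + key: "true"}},
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_tag_enforcement_scp(
        ["cost-centre", "project"],  # hypothetical tag keys
        ["sagemaker:CreateTrainingJob", "sagemaker:CreateProcessingJob"],
    )
    print(json.dumps(policy, indent=2))
```

One statement is emitted per tag key because several keys inside a single `Null` condition are combined with AND, which would only deny requests that are missing all of the tags at once.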
Guardrails
Guardrails for model training and deployments were adopted at several levels:
- Control Tower Proactive (CloudFormation Hooks) and Preventive (Service Control Policies) controls were used to enforce internal controls such as encryption in transit and at rest as well as a requirement for resources to exist in a VPC.
- Additional Service Control Policies were used, as required, to enforce tagging and encryption of various resources and SageMaker jobs.
- A custom CDK bootstrap was used to enforce a permissions boundary on workloads, reducing runtime access to mostly data plane operations. It was also used to enforce IAM role and policy paths on workloads. Finally, the CDK bootstrap was customised to permit deployment of certain MLOps-related AWS resource types only.
- AWS IAM Identity Center was used to provide Data Scientists and MLOps Engineers with read-only access to the Data Science and workload accounts via the AWS Console. In addition, Data Scientists and MLOps Engineers were given access to SageMaker model training and SageMaker domain resources as required. Emphasis was placed on a code-first approach to developing and deploying resources, and to running SageMaker jobs, to ensure consistency.
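To make the VPC and encryption guardrails concrete, the following sketch builds an SCP that denies SageMaker job creation when no VPC subnets or no volume encryption key are supplied. `sagemaker:VpcSubnets` and `sagemaker:VolumeKmsKey` are documented SageMaker condition keys; the choice of actions, the Sids and the function name are assumptions, and the controls actually deployed may well differ.

```python
from typing import Any, Dict


def build_sagemaker_guardrail_scp() -> Dict[str, Any]:
    """Build an SCP requiring a VPC configuration and a KMS key on SageMaker jobs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyJobsOutsideVpc",
                "Effect": "Deny",
                "Action": [
                    "sagemaker:CreateTrainingJob",
                    "sagemaker:CreateProcessingJob",
                    "sagemaker:CreateModel",
                ],
                "Resource": "*",
                # Matches when the request carries no VPC subnets at all.
                "Condition": {"Null": {"sagemaker:VpcSubnets": "true"}},
            },
            {
                "Sid": "DenyUnencryptedVolumes",
                "Effect": "Deny",
                "Action": [
                    "sagemaker:CreateTrainingJob",
                    "sagemaker:CreateProcessingJob",
                    "sagemaker:CreateEndpointConfig",
                ],
                "Resource": "*",
                # Matches when no KMS key is given for the attached storage volume.
                "Condition": {"Null": {"sagemaker:VolumeKmsKey": "true"}},
            },
        ],
    }
```

Preventive controls like this block non-compliant API calls at request time, while proactive controls reject non-compliant CloudFormation resources before they are provisioned, so the two layers complement each other.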
Deployments
Model deployments were achieved using CDK Pipelines, with the workloads themselves defined as CDK stacks. Each pipeline consistently deploys a particular model across multiple, configurable environments, which in turn exist across multiple accounts.
A centralised CDK pipeline factory was developed to define and create new model deployment pipelines quickly.
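A factory like this is typically driven by declarative configuration: one model specification plus a list of environments yields the stages of a pipeline. The sketch below shows that idea in plain Python rather than CDK code. All class names, field names, account IDs and instance sizes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class EndpointSizing:
    instance_type: str
    instance_count: int


@dataclass(frozen=True)
class Environment:
    name: str
    account_id: str
    default_sizing: EndpointSizing


@dataclass(frozen=True)
class ModelSpec:
    model_package_group: str
    # Optional per-environment overrides, keyed by environment name.
    overrides: Dict[str, EndpointSizing] = field(default_factory=dict)


def plan_stages(model: ModelSpec, environments: List[Environment]) -> List[Dict[str, Any]]:
    """Return one deployment stage per environment, in the order given."""
    stages = []
    for env in environments:
        # A model-specific override wins over the environment default.
        sizing = model.overrides.get(env.name, env.default_sizing)
        stages.append(
            {
                "stage": model.model_package_group + "-" + env.name,
                "account": env.account_id,
                "instance_type": sizing.instance_type,
                "instance_count": sizing.instance_count,
            }
        )
    return stages


# Placeholder accounts and sizes for illustration only.
ENVIRONMENTS = [
    Environment("nonprod", "111111111111", EndpointSizing("ml.t2.medium", 1)),
    Environment("preprod", "222222222222", EndpointSizing("ml.m5.large", 1)),
    Environment("prod", "333333333333", EndpointSizing("ml.m5.xlarge", 2)),
]
```

Calling `plan_stages(ModelSpec("churn-model"), ENVIRONMENTS)` would produce three stages; in a CDK implementation each entry would correspond to a pipeline stage targeting the given account.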
The pipeline factory and pipelines cater for different resource requirements on a per-environment and per-model basis. For example, a development environment typically doesn’t require the same size or number of SageMaker model resources as a production environment. The deployed resources include API Gateways and Lambda functions that provide a unified and consistent access point for SageMaker models. Systems Manager Parameter Store was leveraged for dynamic model registration with the Lambda function. In other words, introducing a new model did not require changes to the API Gateway, the Lambda function code or their deployment.
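The dynamic registration idea can be sketched as a request handler that resolves a model name to an endpoint name at request time. This is a minimal illustration, not the deployed function: the parameter path layout, the `model` path parameter and the function names are hypothetical, and the two callables stand in for wrappers around the Parameter Store and SageMaker runtime clients so that the example runs without AWS access.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical naming scheme for the registration parameters.
PARAMETER_PREFIX = "/mlops/models"


def resolve_endpoint(model_name: str, get_parameter: Callable[[str], str]) -> str:
    """Look up the endpoint name registered for a model."""
    return get_parameter(PARAMETER_PREFIX + "/" + model_name + "/endpoint-name")


def handle_request(
    event: Dict[str, Any],
    get_parameter: Callable[[str], str],
    invoke_endpoint: Callable[[str, str], str],
) -> Dict[str, Any]:
    """Route an API Gateway proxy event to the endpoint registered for the model."""
    model_name = event["pathParameters"]["model"]
    try:
        endpoint_name = resolve_endpoint(model_name, get_parameter)
    except KeyError:
        # In this sketch the lookup raises KeyError for unregistered models;
        # a real wrapper would translate the service's not-found error.
        body = json.dumps({"error": "unknown model: " + model_name})
        return {"statusCode": 404, "body": body}
    result = invoke_endpoint(endpoint_name, event.get("body") or "")
    return {"statusCode": 200, "body": result}
```

Because the mapping lives in parameters rather than in code, registering a new model only means writing a new parameter, which matches the behaviour described above.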
New model versions automatically trigger the CDK pipeline to start deployment.
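One common way to implement such a trigger is an Amazon EventBridge rule on SageMaker model package state changes. The sketch below builds such an event pattern and includes a deliberately simplified matcher to show which events it would select. The `source`, `detail-type` and detail field names follow the SageMaker event format; filtering on the `Approved` status and the use of EventBridge at all are assumptions, since the source does not say how the trigger was wired.

```python
from typing import Any, Dict


def model_approved_pattern(model_package_group: str) -> Dict[str, Any]:
    """EventBridge pattern for approved model versions in one package group."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group],
            "ModelApprovalStatus": ["Approved"],
        },
    }


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher: supports exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```

The matcher ignores EventBridge features such as prefix or numeric matching; it exists only to make the pattern's behaviour easy to check locally.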
A centralised Model Registry was created to easily onboard new model packages. This was achieved using the CDK with a configurable set of model package groups.
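The configurable set of model package groups can be pictured as a small list of entries that is turned into infrastructure definitions. The sketch below is a plain-Python stand-in for the CDK code: it emits the kind of CloudFormation template a CDK stack would roughly synthesise. `AWS::SageMaker::ModelPackageGroup` and its properties are real CloudFormation names; the configuration shape, logical ID scheme and group names are hypothetical.

```python
from typing import Any, Dict, List


def synth_registry_template(groups: List[Dict[str, str]]) -> Dict[str, Any]:
    """Turn a list of model package group configs into a CloudFormation template."""
    resources: Dict[str, Any] = {}
    for group in groups:
        name = group["name"]
        # Logical IDs must be alphanumeric: "churn-model" becomes "ChurnModelGroup".
        logical_id = "".join(part.capitalize() for part in name.split("-")) + "Group"
        properties = {"ModelPackageGroupName": name}
        if group.get("description"):
            properties["ModelPackageGroupDescription"] = group["description"]
        resources[logical_id] = {
            "Type": "AWS::SageMaker::ModelPackageGroup",
            "Properties": properties,
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}
```

Onboarding a new model then amounts to appending one entry to the configuration and redeploying the registry stack.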
Finally, a centralised framework was created to build and publish model container images used for model endpoint inference.
Foundation models
Secure access to a select list of Bedrock-provided foundation models was required. This was achieved by creating Bedrock VPC endpoints in the VPC.
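A VPC endpoint for the Bedrock runtime can also carry an endpoint policy that limits which foundation models may be invoked through it. The sketch below builds the endpoint service name and such a policy. The service name format and the `bedrock:InvokeModel` actions (which also authorise the Converse operations) are standard; the use of an endpoint policy, the region and the model IDs are assumptions for illustration, as the source does not name the approved models.

```python
from typing import Any, Dict, List


def runtime_endpoint_service_name(region: str) -> str:
    """Interface endpoint service name for Bedrock data plane calls."""
    return "com.amazonaws." + region + ".bedrock-runtime"


def bedrock_endpoint_policy(region: str, model_ids: List[str]) -> Dict[str, Any]:
    """VPC endpoint policy allowing invocation of the listed foundation models only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowApprovedModelsOnly",
                "Effect": "Allow",
                "Principal": "*",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                # Foundation model ARNs have an empty account field.
                "Resource": [
                    "arn:aws:bedrock:" + region + "::foundation-model/" + model_id
                    for model_id in model_ids
                ],
            }
        ],
    }


# Illustrative values only.
EXAMPLE_POLICY = bedrock_endpoint_policy(
    "eu-west-2", ["anthropic.claude-3-haiku-20240307-v1:0"]
)
```

Requests for any model not listed fall outside the single Allow statement and are rejected at the endpoint, independently of the caller's IAM permissions.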
AWS Enablers
AWS offers a suite of powerful tools and services that enable the MLOps Deployment and Governance Framework.
AWS Cloud Development Kit (CDK)
The AWS CDK is an open-source software development framework that allows you to define cloud infrastructure using familiar programming languages such as TypeScript and Python. By leveraging the CDK, developers can create and manage AWS resources as infrastructure as code at scale. This approach ensures consistency, repeatability, scale and version control for cloud infrastructure.
CDK Pipelines
CDK Pipelines is a feature within the AWS CDK that provides continuous integration and continuous delivery (CI/CD) for AWS applications through AWS CodePipeline. It streamlines building, testing and deploying applications by defining the pipeline as code. This allows for seamless and consistent deployment processes across different environments, ensuring that new versions of applications are delivered rapidly and reliably.
AWS CloudFormation
AWS CloudFormation is a service that provides a common language for describing and provisioning infrastructure resources in the AWS Cloud environment. Templates can be version-controlled and reused. CloudFormation automates the provisioning and management of resources, making it easier to deploy and update infrastructure consistently. CloudFormation is used internally by the CDK to deploy stacks and stack sets.
AWS Lambda
AWS Lambda is a serverless compute service that allows code to run without provisioning or managing servers. Lambda provides an excellent integration point with other services such as SageMaker model endpoints and API Gateway.
AWS Control Tower
Control Tower is a service that simplifies the setup and governance of a secure, multi-account AWS environment. It orchestrates the capabilities of several other AWS services, including AWS Organizations, AWS Service Catalog and AWS IAM Identity Center. Control Tower applies controls (sometimes called guardrails) to ensure that accounts adhere to best practices, standards and regulatory requirements.
Business Outcomes
- Faster, hands-off and automated release and deployment of new model versions through to production via continuous delivery, reduced from days to hours
- Onboarding of new models can be achieved in hours rather than days
- A small team of DevOps Engineers can maintain the operational and security requirements of the business while also supporting the requirements of the Data Scientists and MLOps Engineers
Conclusion
The collaboration between Virtuability and St. James’s Place has successfully delivered a comprehensive MLOps Deployment and Governance Framework. By leveraging AWS services such as SageMaker, Control Tower, CDK, CloudFormation and Lambda, the framework ensures a scalable, reliable, and secure environment for managing machine learning models. The adoption of a multi-account structure, robust security controls, and automated deployment processes has addressed key challenges related to governance and operational efficiency.
Overall, the MLOps Deployment and Governance Framework provides a solid foundation for St. James’s Place to continue innovating and scaling their machine learning initiatives, while maintaining a high level of security and operational efficiency.