2024 Platform Engineering Strategic Plan
Platform engineering is focused on five core principles and practices: normalizing the technology stack, standardizing to reduce variability, expanding DevOps practices, automating infrastructure delivery, and providing self-service tools to appropriate stakeholders. Aligning our goals with these areas lets us better understand our progress as a team and identify opportunities for improvement.
Under each of the following sections you will find a brief description of the topic, followed by a list of the work streams the platform engineering team is managing in support of that area.
Overview
Normalize Tech Stack
It is important to have a defined tech stack to streamline the onboarding of new team members and customers. This includes tools used internally within ICS, as well as tools we support for our customers.
- ICS Tech Stack Inventory - Ongoing
  - Platform engineering is working to create documentation of the various technologies, platforms, systems, and vendors ICS interacts with.
  - The inventory will help identify outliers that should be considered for standardization.
- ICS Git Training - Ongoing
  - Moving all of ICS towards managing code assets via Git source control.
- Databricks Training - Ongoing
  - Ensuring ICS is proficient in using Databricks to fulfill customer requests.
- Minimize Technical Debt - Ongoing
  - Actively exploring solutions that can be transitioned to platform engineering.
  - This includes historical solutions that were created in an ad-hoc fashion and can now be updated to fit existing patterns/platforms defined by the platform team.
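As a rough illustration of how the tech stack inventory could surface outliers, the sketch below groups inventory records by category and flags any tool that is not the most widely used one in its category. The record layout, the team names, and the Subversion entry are hypothetical placeholders, not actual ICS inventory data, and "most widely used" is only one possible definition of the standard.

```python
from collections import Counter, defaultdict

# Hypothetical inventory records: (team, category, tool). Illustrative only.
INVENTORY = [
    ("team-a", "source control", "Git"),
    ("team-b", "source control", "Git"),
    ("team-c", "source control", "Subversion"),
    ("team-a", "analytics", "Databricks"),
    ("team-b", "analytics", "Databricks"),
]


def find_outliers(records):
    """Return {category: [tools]} for tools that are not the most-used tool in their category."""
    by_category = defaultdict(Counter)
    for _team, category, tool in records:
        by_category[category][tool] += 1

    outliers = {}
    for category, counts in by_category.items():
        # Treat the most common tool as the de facto standard (an assumption).
        standard, _count = counts.most_common(1)[0]
        others = sorted(tool for tool in counts if tool != standard)
        if others:
            outliers[category] = others
    return outliers
```

Calling `find_outliers(INVENTORY)` on the sample data flags only the non-Git source control entry. A real inventory would likely need a curated list of approved tools rather than a simple majority rule.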
Standardize and Reduce Variability
A standardized environment, both internally and externally, makes interacting with our products and services intuitive. Defining standardized schemas, queries/libraries, and APIs reduces the cognitive load on our users. Platform engineering strives to build solutions that surface repeatable patterns our customers can easily apply across a variety of projects.
- Internal Infrastructure Pattern Definitions - Ongoing
  - Platform engineering is defining the architectural patterns we will use for various solutions. These patterns will be repeated across projects to reduce the learning curve when cross-training and collaborating.
- Data Lake Schemas and Resources - Ongoing
  - The team provides support to the Data Warehousing team by defining consistent patterns in schemas, volumes, source control integration, and other Databricks-related resources.
Expand DevOps Practices
DevOps practices are the foundation on which infrastructure automation and self-service platforms sit. To reduce internal support costs and empower our customers, the platform team takes a DevOps-first approach when building the infrastructure that ICS runs on.
- Terraform migration/adoption - Ongoing
  - The team has begun creating Terraform code to manage ICS cloud resources that are created now and in the future.
  - Platform engineering is also working to identify existing resources that are not managed using Terraform. These resources will be migrated to Terraform when possible.
- Documentation - Ongoing
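One way to approach the identification of resources not yet managed by Terraform is to compare a list of existing cloud resource IDs against the IDs recorded in Terraform state. The sketch below assumes the state has been exported with `terraform show -json` and parsed into a dictionary, and that each managed resource exposes an `id` attribute (common, but provider-dependent). The sample state, the resource addresses, and the ID strings are hypothetical, and the case-insensitive comparison is an assumption that may not suit every provider.

```python
def managed_ids(state):
    """Collect the `id` attribute of every managed resource in parsed `terraform show -json` output."""
    ids = set()
    root = state.get("values", {}).get("root_module", {})
    stack = [root]
    while stack:
        module = stack.pop()
        for resource in module.get("resources", []):
            # Skip data sources; only "managed" resources are owned by Terraform.
            if resource.get("mode") != "managed":
                continue
            resource_id = (resource.get("values") or {}).get("id")
            if resource_id:
                ids.add(resource_id.lower())
        # Child modules nest the same structure, so walk them as well.
        stack.extend(module.get("child_modules", []))
    return ids


def unmanaged(cloud_ids, state):
    """Return the cloud resource IDs that do not appear in the Terraform state."""
    managed = managed_ids(state)
    return sorted(i for i in cloud_ids if i.lower() not in managed)


# Hypothetical state fragment, trimmed to the fields used above.
SAMPLE_STATE = {
    "values": {
        "root_module": {
            "resources": [
                {"address": "example_storage.raw", "mode": "managed",
                 "values": {"id": "/example/storage/raw"}},
                {"address": "data.example_lookup.x", "mode": "data",
                 "values": {"id": "/example/lookup/x"}},
            ],
            "child_modules": [
                {"resources": [
                    {"address": "module.lake.example_workspace.main", "mode": "managed",
                     "values": {"id": "/example/workspace/main"}},
                ]},
            ],
        }
    }
}
```

The list passed as `cloud_ids` would come from whatever inventory export the cloud provider offers; resources returned by `unmanaged` are candidates for import into Terraform.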
Automate Infrastructure Delivery
By building on these DevOps practices, the platform engineering team reduces the effort needed to manage new and existing services. Deploying modern solutions is complex and error-prone. Our goal is to wrap this risk and complexity into repeatable processes that automate the deployment of the platforms we support.
- REDCap Automated Tests - Finalizing
  - The first version of the suite of automated tests for our REDCap deployments is wrapping up. The tests cover both core functionality of REDCap and many of the most-used external modules. These tests will be used in an automated deployment pipeline in the future, and will help verify REDCap upgrades before releasing them to our customers.
- REDCap Automated Upgrades - Pending
- Automated Creation of AI Endpoints - Investigating
  - The team is in the early stages of investigating a solution to allow us to easily deliver metered API access to various LLMs. This includes both OpenAI and open-source models.
- Improving RDC Deployment Pipeline - Pending
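To make the idea of metered API access to LLMs more concrete, the sketch below tracks token usage per API key against a fixed allowance and rejects requests that would exceed it. This is only an illustration of the metering concept, not the design under investigation: the `UsageMeter` class, the `QuotaExceeded` exception, and the key names are all hypothetical, and a real service would persist usage and obtain token counts from the model provider's responses.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when a request would push a key past its token allowance."""


@dataclass
class UsageMeter:
    quotas: dict                              # api_key -> total token allowance
    used: dict = field(default_factory=dict)  # api_key -> tokens consumed so far

    def record(self, api_key, tokens):
        """Record `tokens` of usage for `api_key` and return the remaining allowance."""
        if api_key not in self.quotas:
            raise KeyError(f"unknown API key: {api_key}")
        total = self.used.get(api_key, 0) + tokens
        if total > self.quotas[api_key]:
            # Reject without recording, so the stored usage stays within the quota.
            raise QuotaExceeded(f"{api_key} would exceed its allowance")
        self.used[api_key] = total
        return self.quotas[api_key] - total
```

For example, a meter created with `UsageMeter({"team-a": 1000})` returns `600` after `record("team-a", 400)` and raises `QuotaExceeded` once a later call would take the total past 1000.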
Provide Self-Service Capabilities
Platform engineering is dedicated to empowering both members of I2DB and our larger customer base. To this end, the team is focused on providing self-service solutions that users can leverage to reduce friction and ensure they have what they need to succeed.
- Self-Service Data Pipelines - Ongoing
  - Platform engineering is currently rolling out the Databasin tool's web interface to the ICS teams. This tool will enable ICS team members to easily configure automated data pipelines that run within Databricks.
Team Metrics
Shared Infrastructure
- Databricks
- RDC
- ...
Automated Workflows and Processes
- Data lake onboarding - In progress
- REDCap Testing - In progress
- REDCap Deployment - Pending
Self-Service Platforms
- Databricks
- SAS
- REDCap
- MDClone
- Databasin - In progress
- Atlas - In progress
- GIC - In progress
Access Control Management
- MDClone
- Databricks
- ...
Development Environments
- All teams in Databricks (count)
- Atlas
- REDCap
Deployment Pipelines
- REDCap Deployment - Pending
- Atlas - In progress
Managing Costs and Resources
- Databricks
- Azure Storage
- OpenAI
Application Infrastructure
- MDClone
- GIC
- REDCap
- ...