3 posts tagged with "continuous integration"

Lars Kamp
Julia Schottenstein

dbt Labs' mission is to empower data practitioners to create and disseminate organizational knowledge with its open-source product dbt. dbt helps write and execute data transformation jobs by compiling code to SQL and running it against your cloud warehouse.

When raw data from production or SaaS apps arrives in a cloud warehouse for analysis, it's not in a usable state. Analytics engineers need to prepare, clean, join, and transform the data to match business needs. These needs could include visualizing data for a sales forecast, feeding data into a machine learning model, or preparing operational analytics with infrastructure data. The analytics engineering workflow covers all the steps from raw data extraction to data modeling and end uses like reporting or data science.

Today, over 16,000 companies use dbt. dbt has become a foundational technology for the analytics engineering workflow, which is very similar to the DevOps workflow. dbt applies software engineering principles to working with data. To "productionize" data, engineers develop, test, and integrate it—and then also provide observability and alerting once it's in production. All of this functionality is included in dbt Cloud, the commercial version of dbt.

Julia Schottenstein heads Product at dbt Labs. In this episode, Julia walks us through the evolution of dbt from a tool for data teams at start-ups to enterprise deployments where sometimes thousands of analytics engineers collaborate through dbt. We cover all aspects of the modern data stack—cloud warehouses, ETL, data pipelines, and orchestration—with an outlook on the wider use of data in the enterprise by both humans and applications:

  • dbt's semantic layer, which attaches a single, shared definition to each business metric (e.g., revenue, customer, churn)

    The semantic layer in dbt holds the definition of each metric, which keeps results consistent while staying flexible: users can slice and dice a metric along any dimension. Metrics are computed at query time rather than read from an already materialized view.

  • Continuous integration and deployment (CI/CD) for data

    Building data pipelines is expensive, and data transformation can take a long time with large data sets and complex queries. dbt Cloud ships a purpose-built CI tool that builds the absolute minimum set of code and data to test changes.

  • How dbt works, with its directed acyclic graph (DAG)

    The DAG is a visual representation of data models and the dependencies between them. dbt started out with SQL to run all transformations, but has since opened up to other languages such as Python.
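
To make the DAG and the "absolute minimum set" idea from the CI/CD point a little more concrete, here is a minimal Python sketch. It is not dbt's implementation: the model names, the `DEPENDS_ON` mapping, and both helper functions are hypothetical, and it assumes the graph really is acyclic. Given a set of modified models, it selects those models plus everything downstream of them, then orders the selection so that each model runs after its parents.

```python
from collections import deque

# Hypothetical model graph (an assumption for illustration):
# each model maps to the models it depends on, i.e. its parents.
DEPENDS_ON = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "orders_enriched": ["stg_orders", "stg_customers"],
    "revenue_report": ["orders_enriched"],
    "customer_list": ["stg_customers"],
}


def downstream_closure(depends_on, modified):
    """Return the modified models plus every model that depends on them."""
    # Invert the edges so we can walk from a parent to its children.
    children = {name: [] for name in depends_on}
    for child, parents in depends_on.items():
        for parent in parents:
            children[parent].append(child)

    selected = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


def build_order(depends_on, selected):
    """Order the selected models so each runs after its selected parents."""
    order = []
    visited = set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in depends_on[node]:
            # Unselected parents are assumed to exist already and are skipped.
            if parent in selected:
                visit(parent)
        order.append(node)

    for node in sorted(selected):
        visit(node)
    return order


impacted = downstream_closure(DEPENDS_ON, ["stg_orders"])
plan = build_order(DEPENDS_ON, impacted)
print(plan)
```

In this toy graph, a change to `stg_orders` leaves `stg_customers` and `customer_list` untouched, so a CI run would only need to rebuild and test three of the seven models.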

Lars Kamp

In the old world of software engineering, developer productivity was measured by lines of code. However, time has shown that code quantity is a poor measure of productivity. So why do engineering organizations continue to rely on this metric? Because they do not have a "single-pane" view across all the different systems that hold data on the activities that actually correlate with productivity.

That's where Faros AI comes in. Faros AI connects the dots between engineering data sources—ticketing, source control, CI/CD, and more—providing visibility and insight into a company's engineering processes.

Vitaly Gordon is the founder and CEO of Faros AI. Vitaly came up with the concept for Faros AI when he was VP of Engineering in the Machine Learning Group at Salesforce. An engineering leader's job is not only about code; it also carries business responsibilities. For Vitaly, that meant interacting with other functions of the business, like sales and marketing.

In those meetings, Vitaly realized that other functions used standardized metrics to measure the performance of their part of the business. Examples are customer acquisition cost (CAC), lifetime value (LTV), and net dollar retention (NDR). These functions built data pipelines to acquire the necessary data and compute these metrics. Surprisingly, engineering did not have that same understanding of its own processes.

An example of an engineering metrics framework is DORA. DORA is an industry-standard benchmark that correlates deployment frequency, lead time, change failure rate, and time to restoration with actual business outcomes and employee satisfaction. For hyperscalers like Google and Meta, these metrics are so important that they employ thousands of people just to build and report them.

So, how do you calculate DORA metrics for your business? With data, of course. But it turns out that the data needed to calculate these metrics is locked inside the dozens of engineering tools used to build and deliver software. While those tools have APIs, they are optimized for workflows, not for exporting data. If you're not a hyperscaler with the budget to employ thousands of people, what do you do? You can turn to Faros AI, which does all the heavy lifting of acquiring data and calculating metrics for you.
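
For a rough sense of what the calculation looks like once the data has been pulled out of those tools, here is a minimal Python sketch. It is not how Faros AI computes anything: the `Deployment` record, its field names, the use of medians, and the sample data are all assumptions made for illustration. It derives the four DORA metrics named above from a flat list of deployment records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record joined from source control, CI/CD, and incident tools."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # whether it caused a production failure
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments, window_days):
    """Compute the four DORA metrics over a reporting window of `window_days`."""
    if not deployments:
        raise ValueError("need at least one deployment")

    total = len(deployments)
    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployment_frequency_per_day": total / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "change_failure_rate": len(failures) / total,
        # None when no failed deployment has been restored in the window.
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up sample data covering one week.
sample = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15)),
    Deployment(datetime(2024, 1, 2, 10), datetime(2024, 1, 3, 10),
               failed=True, restored_at=datetime(2024, 1, 3, 12)),
    Deployment(datetime(2024, 1, 4, 8), datetime(2024, 1, 4, 20)),
    Deployment(datetime(2024, 1, 5, 8), datetime(2024, 1, 5, 10)),
]
metrics = dora_metrics(sample, window_days=7)
for name, value in metrics.items():
    print(name, value)
```

The arithmetic is the easy part. The hard part described above is producing records like `Deployment` in the first place, since commit times, deploy times, and incident times usually live in different tools.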

The lessons learned from the modern data stack (MDS) come in when building data pipelines to connect data from disparate tools. In this episode, we explore the open-source Faros Community Edition and the data stack that powers it.

Lars Kamp
Jon Edvald

Jon Edvald is the founder and CEO of Garden, an end-to-end cloud delivery platform that accelerates your development, testing, and CI/CD workflows.

In this conversation, Jon covers how the shift from monolithic applications to microservices has taken us from a single codebase to individual deliverables that are getting smaller and smaller. With the introduction of containers, an application now consists of many discrete components—which continue to get even smaller with the arrival of serverless. And where teams previously had to manage five to ten codebases, they are now dealing with hundreds or even thousands. Testing and deploying these different codebases has become a graph problem.

Beyond adopting containers and Kubernetes, the complexity of that graph of system components pushes the boundaries of existing DevOps toolchains. Each component in the graph carries its own setup overhead, and with existing tools that overhead becomes unmanageable as the number of components grows.

Garden solves this issue by factoring out things that are undifferentiated across different teams, allowing them to focus on their own business problems. Garden builds a directed graph of everything that needs to happen to transition from a bunch of Git repositories to a fully built, deployed, and tested system.
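
The idea of a directed graph of build, deploy, and test steps can be sketched in a few lines of Python. This is not Garden's configuration format or API: the action names and the `ACTIONS` mapping are hypothetical, and the sketch only uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group actions into batches whose members could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical actions for two services (an assumption for illustration).
# Each action maps to the set of actions that must finish before it starts.
ACTIONS = {
    "build.api": set(),
    "build.web": set(),
    "deploy.api": {"build.api"},
    "deploy.web": {"build.web", "deploy.api"},
    "test.api": {"deploy.api"},
    "test.web": {"deploy.web"},
}


def execution_batches(actions):
    """Group actions into ordered batches; a batch's members are independent."""
    sorter = TopologicalSorter(actions)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle

    batches = []
    while sorter.is_active():
        # Everything returned by get_ready() has all of its dependencies done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(ACTIONS)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

With only two services the graph is small enough to reason about by hand; the point made in the text is that with hundreds of components it no longer is, which is why the graph has to be computed rather than maintained manually.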

Listen to this episode to learn more about the industrialization of continuous integration (CI), infrastructure as code (with popular tools like Terraform and Pulumi), and how Garden helps developers ship more software faster.