
Lars Kamp
Hassan Khajeh-Hosseini

When developers deploy resources, there is little to no insight for them to understand how much a resource is going to cost.

Infracost is changing this by shifting the cost component of cloud resources "left"—i.e. into the hands of developers in a new approach to FinOps.

The existing paradigm of cloud financial management and the traditional FinOps way of managing cloud spend mean waiting for the cloud bill to arrive, then trying to identify opportunities for cost savings.

First-generation FinOps companies like Flexera, Cloudability, and CloudHealth emerged around 2011. They provided an improved user interface for complex billing data, and followed the monthly billing cycle of cloud providers.

However, a monthly cycle is too slow for today's automated and dynamic cloud environments driven by infrastructure-as-code. A new generation of tools has shortened these cycles, and the delay between a cloud bill and its analysis has come down to a day or less.

In broad terms, efforts to lower the cloud bill are based on a simple formula:

$$\text{cost} = \text{usage} \times \text{price}$$

Existing approaches mostly focus on the "price" component of the equation. Procurement mechanisms to lower the price point of a cloud resource include reserved instances, enterprise discount programs, savings plans, etc. Finance "slices and dices" the cloud bill after resources have been deployed to optimize price points and the overall size of the cloud bill.

However, the procurement-driven approach doesn't account for the "usage" component of the equation, which is a function of developer activity. Finance lacks the context that developers have when deploying resources, while developers lack visibility into resource prices and the cost of their deployments.

Infracost is closing this gap by providing cloud cost estimates for Terraform in pull requests to show engineering teams how code changes affect their cloud bills. Infracost adds comments to pull requests (e.g., "this change will increase your cloud bill by 25%") which are visible to engineering management, FinOps, and product teams.
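As a rough illustration of the idea, the sketch below applies the cost formula to a "before" and "after" set of resources and renders a pull-request style comment. It is not Infracost's implementation: the function names, the resource tuples, the hourly price, and the 730-hour month are all assumptions made for this example.

```python
# Minimal sketch of a pull-request cost comment (hypothetical names and prices).
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def monthly_cost(resources):
    """Apply cost = usage x price to (name, hourly_price, count) tuples."""
    return sum(price * HOURS_PER_MONTH * count for _, price, count in resources)


def cost_change_comment(before, after):
    """Describe how a change shifts the estimated monthly bill."""
    old, new = monthly_cost(before), monthly_cost(after)
    if old == 0:
        return f"This change adds ${new:,.2f} per month to your cloud bill."
    pct = (new - old) / old * 100
    direction = "increase" if pct >= 0 else "decrease"
    return (
        f"This change will {direction} your cloud bill by {abs(pct):.0f}% "
        f"(${old:,.2f} -> ${new:,.2f} per month)."
    )


# Hypothetical change: scale a web tier from 4 to 5 instances at $0.10/hour.
before = [("web", 0.10, 4)]
after = [("web", 0.10, 5)]
print(cost_change_comment(before, after))
```

The point of the sketch is that the developer controls the usage side (the instance count), so showing the estimate at review time puts the cost next to the decision that causes it.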

Hassan Khajeh-Hosseini is Co-Founder and CEO at Infracost, which he co-founded with his brother Ali Khajeh-Hosseini and their friend Alistair Scott. The founding team has a decade of cloud cost history together, with two previous cloud cost start-ups founded and exited.

In this episode, Hassan walks us through the science and engineering behind building Infracost. We also discuss broader infrastructure trends, including "cloud financial engineering" and the general "shift left" of testing, security, and (of course) cost in the development process.

Lars Kamp

IT asset management (ITAM) is an established category in the IT market, with its own Gartner Magic Quadrant.

Gartner defines ITAM as "[providing] an accurate account of technology asset lifecycle costs and risks to maximize the business value of technology strategy, architecture, funding, contractual and sourcing decisions." ITAM is usually divided into two subcategories: software asset management (SAM) and hardware asset management (HAM).

With cloud computing and SaaS tools, the requirements for ITAM have changed.

In the old world of IT, there was tight control over who could purchase servers and software licenses. IT was a (literal) gatekeeper that determined who could push a new server into a rack and provision that server with software.

That control is gone in today's world, where developers and employees have the flexibility to swipe a credit card or push a button in a console to "procure" cloud resources and software.

There are, of course, benefits of giving employees flexibility—namely, "development velocity", the speed to build and launch new products.

A challenge remains to optimize the value of these infrastructure expenditures, however, which means balancing "development velocity" and "business velocity." Without balance, the result is tool and infrastructure sprawl, as well as out-of-control spending. Decentralized procurement may sound great on paper, but usually leads to the "worst best deal."

Balancing business with development velocity is Amit Mizrahi's job as Head of Strategic Operations at Wix.

Wix's flagship product is their free website builder, around which they've also built a portfolio of e-commerce products. The Wix company mantra is "to measure everything," and Amit's work includes measuring the ROI on Wix's IT assets—a tall order when Wix's employees number nearly 6,000.

In this episode, Amit walks us through how he built an ITAM program at Wix from scratch. The ITAM program is part of the "Value & Impact Center of Excellence at Wix," which has two pillars:

  1. ITAM: Managing procurement and operations for everything related to SaaS products and tools within Wix.
  2. FinOps: An organizational function that is in charge of monitoring cloud activities, governing cloud spending, and educating teams on financial-driven KPIs. (See Episode 5: Shifting From FinOps to Financial Engineering.)

To understand the business value of tools, Amit and his team built an internal data integration and analytics layer that extracts usage data from all tooling—an abstraction across Wix's IT assets. This abstraction layer is coupled with procurement processes that create alignment between development and business velocity for Wix.

Lars Kamp
Yevgeny Pats

ELT describes the process of extracting raw data from a source, loading it into a destination, and then transforming the data for analytics purposes. ELT has become mainstream with the rise of cloud warehouses and data lakes, in a shift away from ETL.

ETL was the dominant paradigm when storage and compute were expensive and pre-aggregating data (i.e., transform) made economic sense. But ETL comes with a trade-off—aggregating data before analysis also means losing fidelity, granularity, and the flexibility to iterate and re-run an analysis in a different way.

The cloud has driven down the cost of compute and storage so that it no longer makes sense to pre-aggregate data in an external processing layer, resulting in the shift to ELT. Today, we can store raw data in data lakes at high fidelity and with the flexibility to write queries tailored to any use case.
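A small sketch of the trade-off, using Python's built-in `sqlite3` as a stand-in for a cloud warehouse. The table name, columns, and rows are made up for illustration. Because the raw rows are loaded first, the same data can later be aggregated by day or by region; an ETL job that stored only daily totals could not answer the second question.

```python
import sqlite3

# Hypothetical raw events: (day, region, amount).
raw_events = [
    ("2024-01-01", "emea", 10),
    ("2024-01-01", "amer", 20),
    ("2024-01-02", "emea", 5),
]

# Extract + Load: store the raw rows unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (day TEXT, region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", raw_events)

# Transform: aggregate at query time, in whichever way the analysis needs.
by_day = conn.execute(
    "SELECT day, SUM(amount) FROM raw_events GROUP BY day ORDER BY day"
).fetchall()
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM raw_events GROUP BY region ORDER BY region"
).fetchall()

print(by_day)
print(by_region)
```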

The main use case for ELT until now has been sales and marketing data, where data sources include systems like Salesforce, Marketo, and Google Analytics.

A new type of data source is cloud infrastructure data, which encompasses information about cloud resources like compute instances, storage buckets, or databases. Cloud infrastructure data describes the configuration of and relationships between cloud resources.

Examples of cloud infrastructure data include not only general properties like start date, name, and tags; but also resource-specific properties like price or type. This data is available via the cloud APIs that infrastructure-as-code tools like Terraform and Pulumi use to deploy resources.

CloudQuery is a high-performance open-source ELT framework built for developers. CloudQuery extracts data from cloud APIs and loads it into databases, data lakes, or streaming platforms for further analysis.

With raw infrastructure data, CloudQuery users are building solutions for security, cost, and governance use cases by writing SQL queries. Querying raw infrastructure data with SQL provides more flexibility and coverage than an opinionated DevOps tool could provide.
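To make this concrete, here is a sketch of a governance-style query over infrastructure data. The `cloud_instances` table and its columns are hypothetical and do not reflect CloudQuery's actual schema, and `sqlite3` stands in for whatever database the data was loaded into.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of synced compute instances (not CloudQuery's real schema).
conn.execute(
    "CREATE TABLE cloud_instances "
    "(id TEXT, instance_type TEXT, state TEXT, owner_tag TEXT)"
)
conn.executemany(
    "INSERT INTO cloud_instances VALUES (?, ?, ?, ?)",
    [
        ("i-1", "m5.large", "running", "team-a"),
        ("i-2", "m5.xlarge", "running", None),
        ("i-3", "t3.micro", "stopped", None),
    ],
)

# Governance check: running instances that nobody has claimed with an owner tag.
untagged_running = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM cloud_instances "
        "WHERE state = 'running' AND owner_tag IS NULL ORDER BY id"
    )
]
print(untagged_running)
```

Changing the policy (for example, to also flag stopped instances) is a one-line change to the `WHERE` clause rather than a feature request to a tool vendor.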

In this episode, I chat with Yevgeny Pats, CEO and co-founder at CloudQuery. We cover the "why now?" for infrastructure data, and the change in mindset observed among infrastructure engineers and their shift to using data lakes.

Watch this episode to also see a demo of CloudQuery, and learn how the tool evolved from a niche data sync solution to a high-performing ELT framework.

Lars Kamp
Kevin Hu

In this episode, we interview Kevin Hu, co-founder and CEO at Metaplane. Metaplane offers data observability for the modern data stack. Kevin calls Metaplane the "Datadog for data," in reference to observability for microservices and cloud-native stacks.

As data volume and tool usage grow, so does the potential for something to break—resulting in errors and data downtime. In the modern data stack, the chain of SQL-based transformations between the original data source and the computed result is long and complex. For this reason, it's often nearly impossible to pinpoint the source of data errors.

Metaplane's focus is data criticality, and Metaplane has built instrumentation to understand exactly where errors occur. When data is mission-critical to the business, data teams become "solution-aware."

We take a walk down memory lane in this episode. We discuss the early days of the cloud warehouse market and the paradigm shift to separate storage and compute that, overnight, turned Snowflake into a market leader.

As a result of this shift, the market for analytics expanded and spawned a new generation of data tooling across categories like data integration and ETL, customer data platforms, data catalogs, reverse ETL, and data observability by companies like RudderStack, Airbyte, Census, Hightouch, and, of course, Metaplane.

Lars Kamp

Apache Iceberg is a new table format that offers both the simplicity of SQL and separation of storage and compute. The Iceberg table format works with any compute engine, so users are not limited to working with a single engine. Popular engines (e.g., Spark, Trino, Flink, and Hive) and modern cloud warehouses (e.g., Snowflake, Redshift, and BigQuery) can work with Iceberg tables at the same time.

A table format is a layer that sits between the file format and database. Iceberg is an abstraction layer above file formats like Parquet, Avro, and ORC born out of necessity at Netflix. Like many other companies at the time, Netflix shifted from MPP data warehouses to the Hadoop ecosystem in the 2010s. MPP warehouses like Teradata were hitting scale limitations and becoming too expensive at Netflix's scale.

The Hadoop ecosystem abandoned the table abstraction layer in favor of scale. In Hadoop, we deal directly with file systems like HDFS. The conventional wisdom at the time was that bringing compute to storage was easier than moving the data to compute. Hadoop scales compute and disk together, which turned out to be incredibly hard to manage in the on-premise world.

Early on, Netflix shifted to the cloud and started storing data in Amazon S3 instead, which separated storage from compute. Snowflake, the cloud warehouse, also picked up on that principle, bringing back SQL semantics and tables from "old" data warehouses.

Netflix wanted both the separation of storage and compute and SQL table semantics. They wanted to add, remove, and rename columns without having to deal with S3 paths. But rather than going with another proprietary vendor, Netflix wanted to stay in open source and open formats. And thus, Iceberg was developed and eventually donated to the Apache Software Foundation. Today, Iceberg is also in use at companies like Apple and LinkedIn.
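The toy sketch below illustrates why a table format can make column changes cheap. It assumes, as Iceberg's design does, that columns are tracked by stable field IDs, so a rename or drop only touches table metadata and leaves the data files alone. Everything else here (the dictionaries, function names, and sample rows) is a simplification for illustration and not Iceberg's API or file layout.

```python
# Data files hold values keyed by a stable field ID, never by column name.
data_file = [{1: "alice", 2: 30}, {1: "bob", 2: 25}]

# Table metadata maps field IDs to the current column names.
schema_v1 = {1: "name", 2: "age"}


def rename_column(schema, old, new):
    """Return new metadata with one column renamed; data files are untouched."""
    return {fid: (new if name == old else name) for fid, name in schema.items()}


def drop_column(schema, name):
    """Return new metadata without the column; its values are simply not read."""
    return {fid: n for fid, n in schema.items() if n != name}


def read(schema, rows):
    """Project stored rows through the current schema."""
    return [{schema[fid]: v for fid, v in row.items() if fid in schema} for row in rows]


schema_v2 = rename_column(schema_v1, "name", "full_name")
schema_v3 = drop_column(schema_v2, "age")

print(read(schema_v1, data_file))
print(read(schema_v2, data_file))
print(read(schema_v3, data_file))
```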

Tabular commercializes Apache Iceberg. Working with open-source Iceberg tables still requires understanding of object stores and distributed data processing engines and how various components interact with each other. Tabular lowers the bar for adoption and removes the heavy lifting.

Jason Reid is a co-founder and heads Product at Tabular. In this episode, Jason walks us through the benefits of using an open table format like Iceberg and how it works with existing analytics infrastructure and tooling of the modern data stack like dbt.

Lars Kamp
Julia Schottenstein

dbt Labs' mission is to empower data practitioners to create and disseminate organizational knowledge with its open-source product dbt. dbt helps write and execute data transformation jobs by compiling code to SQL and running it against your cloud warehouse.

When raw data from production or SaaS apps arrives in a cloud warehouse for analysis, it's not in a usable state. Analytics engineers need to prepare, clean, join, and transform the data to match business needs. These needs could include visualizing data for a sales forecast, feeding data into a machine learning model, or preparing operational analytics with infrastructure data. The analytics engineering workflow covers all the steps from raw data extraction to data modeling and end uses like reporting or data science.

Today, over 16,000 companies use dbt. dbt has become a foundational technology for the analytics engineering workflow, which is very similar to the DevOps workflow. dbt applies software engineering principles to working with data. To "productionize" data, engineers develop, test, and integrate it—and then also provide observability and alerting once it's in production. All of this functionality is included in dbt Cloud, the commercial version of dbt.

Julia Schottenstein heads Product at dbt Labs. In this episode, Julia walks us through the evolution of dbt from a tool for data teams at start-ups to enterprise deployments where sometimes thousands of analytics engineers collaborate through dbt. We cover all aspects of the modern data stack—cloud warehouses, ETL, data pipelines, and orchestration—with an outlook on the wider use of data in the enterprise by both humans and applications:

  • dbt's semantic layer, which assigns definitions (e.g., revenue, customer, churn) to a specific metric

    The semantic layer in dbt contains the definitions for each metric, ensuring consistency and flexibility—users can slice and dice a metric along any dimension. Metrics are computed at the time of a query rather than pointing to an already materialized view.

  • Continuous integration and deployment (CI/CD) for data

    Building data pipelines is expensive, and data transformation can take a long time with large data sets and complex queries. dbt Cloud ships a purpose-built CI tool that builds the absolute minimum set of code and data to test changes.

  • How dbt works, with its directed acyclic graph (DAG)

    The DAG is a visual representation of data models and the connections between them. dbt started out with SQL to run all transformations, but now also supports other languages such as Python.
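The DAG and the "minimum set" idea from the CI point above can be sketched with Python's standard-library `graphlib` (Python 3.9+). The model names and dependencies are hypothetical, and the selection logic is a simplified illustration, not dbt's actual algorithm: a valid build order is a topological sort, and the models worth rebuilding after a change are the changed models plus everything downstream of them.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model maps to the set of models it depends on.
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "revenue": {"orders_enriched"},
    "customer_count": {"stg_customers"},
}


def build_order(deps):
    """Return one valid order in which every model follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def affected_models(deps, changed):
    """Return the changed models plus all models downstream of them."""
    affected = set(changed)
    grew = True
    while grew:
        grew = False
        for model, parents in deps.items():
            if model not in affected and parents & affected:
                affected.add(model)
                grew = True
    return affected


order = build_order(deps)
to_rebuild = affected_models(deps, {"stg_orders"})
print(order)
print(sorted(to_rebuild))
```

In this example a change to `stg_orders` leaves `stg_customers` and `customer_count` untouched, so a CI run only needs to rebuild three of the five models.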

Lars Kamp
Michael Driscoll

Creating an analytics dashboard is a time-consuming process that involves stitching together many components: ELT pipelines, cloud warehouses, transformation and semantic layers, data catalogs, and a dashboard tool. The flexibility of the Modern Data Stack (MDS) also means a great deal of complexity and many design decisions.

Rill Data is on a mission to radically simplify how developers create operational dashboards. Rill offers blazing fast dashboards that come bundled with a real-time analytical database and a modeling layer.

Michael Driscoll is the co-founder and CEO of Rill Data. In this episode, Mike demos the latest 0.16 release of Rill Developer.

There are three pieces of infrastructure that form a Rill dashboard application:

  • Sources: Rill ships with a CLI you can use to import data from an object store like AWS S3 or Google Cloud Storage. Rill treats the object store as the source of truth and imports data for the "last-mile ETL." As data in the object store changes, Rill orchestrates incremental updates.
  • Runtime: The runtime itself consists of a database (DuckDB), a web UI for rendering the dashboards (SvelteKit), and a middleware written in Go. Rill Enterprise replaces DuckDB with Apache Druid to process large data sets.
  • Models: Configuration code that parameterizes the dashboards, using YAML and SQL.

Bringing these things together in one application is an opinionated way to transform data to dashboards that Mike says covers "80%+ of the use cases that [they've] come across when building operational dashboards." Rill customers create dashboards to build analytics for their advertising, marketplace, and infrastructure operations.

Rill's stack is a departure from point-and-click interfaces, moving towards what Mike calls "BI-as-code." Source definitions and metrics are implemented in YAML, and models create a SQL query. The combination of SQL and YAML creates a BI layer that can be checked into a Git repository, which can then be managed automatically by CI workflows.
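A minimal sketch of the "BI-as-code" idea follows. A plain Python dictionary stands in for the YAML file, and its field names (`name`, `expression`, `model`) are assumptions for this example, not Rill's actual configuration schema. `sqlite3` stands in for the embedded database. The metric is declared once as data and compiled into a SQL query for whichever dimension the dashboard asks for.

```python
import sqlite3

# Hypothetical metric definition; in a real project this would live in a
# version-controlled YAML file next to the SQL model.
metric = {
    "name": "total_spend",
    "expression": "SUM(cost)",
    "model": "SELECT service, region, cost FROM billing",
}


def metric_query(metric, dimension):
    """Compile the metric definition into SQL grouped by one dimension."""
    return (
        f"SELECT {dimension}, {metric['expression']} AS {metric['name']} "
        f"FROM ({metric['model']}) GROUP BY {dimension} ORDER BY {dimension}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE billing (service TEXT, region TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO billing VALUES (?, ?, ?)",
    [("compute", "us", 100.0), ("storage", "us", 20.0), ("compute", "eu", 50.0)],
)

by_service = conn.execute(metric_query(metric, "service")).fetchall()
by_region = conn.execute(metric_query(metric, "region")).fetchall()
print(by_service)
print(by_region)
```

Because the definition is just text, a change to a metric shows up as a diff in a pull request and can be reviewed and tested like any other code change.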

We also cover broader trends in our discussion, including the convergence of engineering and analytics cultures as engineers adopt practices from analytics to work with infrastructure data. Watch this episode to learn more about building data infrastructure for engineering teams with SQL and YAML with Rill.

Lars Kamp

Some studies estimate that nine out of ten copies of data are precomputed. Precomputation requires a lot of engineering and batch processing. Computing directly on raw data instead can reduce the amount of data you need to manage, store, and secure by up to 90%. Yet some precomputation has often still been required because of bottlenecks in I/O, storage, or compute.

FeatureBase is the first analytical database built entirely on bitmaps.

Bitmaps lay out data differently from both the row-oriented layout of transactional databases and the columnar layout of analytical databases; bitmaps store data at the level of the individual value. Due to the nature of bitmaps, the data pertaining to each unique value within a row or column can be accessed independently without having to scan the row or column. The I/O for typical analytical workloads is only a fraction of that of traditional analytical queries.
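The sketch below shows the basic idea with uncompressed bitmaps held in Python integers: one bitmap per distinct value, where bit i is set if row i has that value. It is a teaching example with made-up data, not FeatureBase's storage engine. A filter on two columns becomes a bitwise AND of two bitmaps, and no row is scanned.

```python
# Hypothetical rows: (country, device).
rows = [("us", "mobile"), ("de", "desktop"), ("us", "desktop"), ("us", "mobile")]


def build_bitmaps(values):
    """Build one bitmap per distinct value; bit i is set when row i has the value."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


country = build_bitmaps(r[0] for r in rows)
device = build_bitmaps(r[1] for r in rows)

# WHERE country = 'us' AND device = 'mobile' is a single bitwise AND.
matches = country["us"] & device["mobile"]
match_count = bin(matches).count("1")
matching_rows = [i for i in range(len(rows)) if (matches >> i) & 1]

print(bin(country["us"]), bin(device["mobile"]), bin(matches))
print(match_count, matching_rows)
```

Only the two bitmaps named in the filter are read; the bitmaps for `de` and `desktop` are never touched, which is where the I/O savings come from.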

Bitmaps are more efficient when it comes to storing, transporting, and managing data—they are orders of magnitude faster than today's popular cloud warehouses, and also an order of magnitude more efficient at storing data. Their efficiency makes them ideal for real-time processing and artificial intelligence workloads.

In fact, that's what positions FeatureBase as the database between real-time streaming engines like Kafka on one end, and cloud warehouses as long-term storage engines on the other end. FeatureBase is the working memory in-between the two.

Higinio "H.O." Maycotte is Founder and CEO at FeatureBase. In this session, we explore the mathematical pillars of databases and bitmaps.

The data footprint and scale of some of FeatureBase's customers are nothing short of breathtaking. One of their advertising customers processes 120 billion updates a day, or roughly 1.39 million updates per second. FeatureBase allowed them to reduce their server count from 1,000 servers to just 11, saving them millions of dollars per year.

The team at FeatureBase has invested over $30 million in R&D and nine years of their lives to advance the use of bitmaps in databases. Watch this fascinating session with H.O. to learn more about math, bitmaps, and modern real-time processing data architecture.

Lars Kamp
Patrick DeVivo

Software engineering is often more art than science, making it difficult to measure productivity. There are ways to use data to be more effective as an individual contributor or an engineering leader, but surprisingly, engineering organizations and teams typically are not data-driven.

MergeStat is on a mission to change this with open-source, operational analytics for software engineering organizations. MergeStat started as an experiment to bring together two technologies: SQL and Git repositories. MergeStat provides data integration for your Git repositories, facilitating the exploration of legacy code and identification of code that hadn't been touched in a while and maybe deserved new attention.

From there, the use cases evolved. Today, MergeStat is used by organizations that have hundreds or even thousands of repositories. MergeStat is data infrastructure for Git repositories, where anyone can query the history and contents of their code bases.

Behind the scenes, MergeStat syncs data from the tools used to build and ship software into a PostgreSQL instance, as APIs provided by these tools are not always easy to understand and extract data from. MergeStat puts a lot of the usual work into implementing good API data consumption, like pagination and respecting rate limits.
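The sketch below shows what "pagination and respecting rate limits" typically involves. `FakeApi` is an in-memory stand-in invented for this example (it is not a real client and not MergeStat's code); it returns results a page at a time and periodically answers with a 429-style response that asks the caller to wait.

```python
import time


class FakeApi:
    """Hypothetical paginated API that rate-limits every Nth call."""

    def __init__(self, items, page_size, limit_every):
        self.items = items
        self.page_size = page_size
        self.limit_every = limit_every
        self.calls = 0

    def get_page(self, cursor):
        self.calls += 1
        if self.calls % self.limit_every == 0:
            # Simulated "too many requests" response.
            return {"status": 429, "retry_after": 0.01}
        end = cursor + self.page_size
        next_cursor = end if end < len(self.items) else None
        return {"status": 200, "items": self.items[cursor:end], "next": next_cursor}


def fetch_all(api, sleep=time.sleep):
    """Follow the cursor until exhausted, backing off when rate limited."""
    items, cursor = [], 0
    while cursor is not None:
        resp = api.get_page(cursor)
        if resp["status"] == 429:
            sleep(resp["retry_after"])  # wait, then retry the same page
            continue
        items.extend(resp["items"])
        cursor = resp["next"]
    return items


api = FakeApi(list(range(7)), page_size=3, limit_every=2)
result = fetch_all(api)
print(result, api.calls)
```

A production sync would also cap retries and persist the cursor so an interrupted run can resume, which is the kind of work the paragraph above says the tool takes off the user's hands.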

From there, a user can query their data directly in MergeStat, or use other business intelligence tools and dashboards that know how to speak to PostgreSQL. See this example Grafana dashboard for GitHub pull requests.

Patrick DeVivo is Founder and CEO at MergeStat. In this session, we start out with a general overview of MergeStat and how it's used today.

Patrick explains how MergeStat is a general-purpose engine that companies use to craft the queries that fit their organization. We go into a few MergeStat use cases that Patrick sees today:

  • In some cases, the actual data collection is the use case. For example, with audits the action is to deliver the list of pull requests that didn't follow best practices.
  • Understanding the different versions of a programming language in use. If you're a Go shop, a single query aggregates the different Go versions used across all repositories.
  • Finding pull requests that have been open for a long time or were merged without review.
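The last use case can be sketched as two SQL queries. The `pull_requests` table, its columns, the sample rows, and the 30-day threshold are all assumptions for this example and do not reflect MergeStat's actual schema; `sqlite3` is used here only so the sketch is self-contained, whereas MergeStat itself syncs into PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for synced pull request data.
conn.execute(
    "CREATE TABLE pull_requests (repo TEXT, number INTEGER, state TEXT, "
    "created_at TEXT, merged_at TEXT, review_count INTEGER)"
)
conn.executemany(
    "INSERT INTO pull_requests VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("api", 1, "open", "2024-01-01", None, 0),
        ("api", 2, "merged", "2024-03-01", "2024-03-02", 0),
        ("web", 3, "merged", "2024-03-01", "2024-03-03", 2),
        ("web", 4, "open", "2024-03-28", None, 1),
    ],
)

TODAY = "2024-04-01"  # fixed date so the example is reproducible

# Pull requests that have been open for more than 30 days.
stale = conn.execute(
    "SELECT repo, number FROM pull_requests "
    "WHERE state = 'open' AND julianday(?) - julianday(created_at) > 30 "
    "ORDER BY repo, number",
    (TODAY,),
).fetchall()

# Pull requests that were merged without a single review.
unreviewed = conn.execute(
    "SELECT repo, number FROM pull_requests "
    "WHERE state = 'merged' AND review_count = 0 ORDER BY repo, number"
).fetchall()

print(stale)
print(unreviewed)
```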

Patrick's advice is to use MergeStat in a way that is positive and constructive to take action. Watch this episode to learn more about data integration for the software development lifecycle.

Lars Kamp

In the old world of software engineering, developer productivity was measured by lines of code. However, time has shown that code quantity is a poor measure of productivity. So why do engineering organizations continue to rely on this metric? Because they do not have a "single-pane" view across all the different systems that hold data on the activities that actually correlate with productivity.

That's where Faros AI comes in. Faros AI connects the dots between engineering data sources—ticketing, source control, CI/CD, and more—providing visibility and insight into a company's engineering processes.

Vitaly Gordon is the founder and CEO of Faros AI. Vitaly came up with the concept for Faros AI when he was VP of Engineering in the Machine Learning Group at Salesforce. As an engineering leader, the job is not only about code; it also comes with business responsibilities. For Vitaly, that meant interacting with other functions of the business, like sales and marketing.

In those meetings, Vitaly realized that other functions used standardized metrics to measure the performance of their business. Examples are customer acquisition cost (CAC), lifetime value (LTV), and net dollar retention (NDR). These functions built data pipelines to acquire the necessary data and compute these metrics. Surprisingly, engineering did not have that same understanding of their processes.

An example of an engineering metrics framework is DORA (DevOps Research and Assessment). DORA is an industry-standard benchmark that correlates deployment frequency, lead time, change failure rate, and time to restoration with actual business outcomes and employee satisfaction. For hyperscalers like Google and Meta, these metrics are so important that they employ thousands of people just to build and report them.

So, how do you calculate DORA metrics for your business? With data, of course. But, it turns out the data to calculate these metrics is locked inside the dozens of engineering tools used to build and deliver software. While those tools have APIs, they are optimized for workflows, not for exporting data. If you're not a hyperscaler with the budget to employ thousands of people, what do you do? You can turn to Faros AI, which does all the heavy lifting of acquiring data and calculating metrics for you.
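Once the data is in one place, the four metrics themselves are simple to compute. The sketch below uses a handful of made-up deployment records with hypothetical field names (`committed`, `deployed`, `failed`, `restored`); it is not Faros AI's data model, and the choice of the median as the summary statistic is an assumption for this example.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records joined from source control, CI/CD, and incidents.
deployments = [
    {"committed": datetime(2024, 1, 1, 9), "deployed": datetime(2024, 1, 1, 15),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 2, 9), "deployed": datetime(2024, 1, 3, 9),
     "failed": True, "restored": datetime(2024, 1, 3, 11)},
    {"committed": datetime(2024, 1, 4, 9), "deployed": datetime(2024, 1, 4, 11),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 5, 9), "deployed": datetime(2024, 1, 5, 13),
     "failed": False, "restored": None},
]


def dora_metrics(deployments, period_days):
    """Compute the four DORA metrics over a non-empty list of deployments."""
    lead_times = [d["deployed"] - d["committed"] for d in deployments]
    failures = [d for d in deployments if d["failed"]]
    restore_times = [d["restored"] - d["deployed"] for d in failures]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


metrics = dora_metrics(deployments, period_days=8)
for name, value in metrics.items():
    print(name, value)
```

The arithmetic is the easy part; as the text notes, the hard part is joining commits, deployments, and incidents from separate tools into records like these.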

The lessons learned from the modern data stack (MDS) come in when building data pipelines to connect data from disparate tools. In this episode, we explore the open-source Faros Community Edition and the data stack that powers it.