Cloud2SQL is a tool based on Resoto's collector plugins that allows you to collect data from various cloud infrastructure sources and export it to a database (like Snowflake, PostgreSQL, MariaDB, or MySQL) or write it as Parquet, SQLite, or CSV files for ingestion in your data lake.
In this post, I will guide you through the process of installing Cloud2SQL and demonstrate how to export data from AWS to a local SQLite database file.
Whether you are looking to integrate cloud data into your existing SQL workflows or simply want an easy way to access and analyze data from multiple cloud sources, Cloud2SQL is an excellent tool to consider.
To install Cloud2SQL, all you need is Python 3.9 or newer. Create a new virtual environment and install the
$ pip3 install --user cloud2sql[all]
If you only require support for a specific database, you can choose between
cloud2sql[mysql] instead of installing
Snowflake currently only supports Python 3.10. If you are using Python 3.11, use a database-specific package instead of
Cloud2SQL supports all sources supported by Resoto. By default, Cloud2SQL ships with AWS, Google Cloud, DigitalOcean, and Kubernetes source plugins pre-installed.
Let's load some AWS data into a local SQLite database file. For authentication with AWS, we have all options supported by the AWS CLI. You can either export
AWS_SECRET_ACCESS_KEY, or create a profile in your AWS config file and export
The configuration file is a YAML file that specifies sources to load data from and destinations to export data to. Create a file
This tells Cloud2SQL to load the AWS collector. Within the
aws section, you can configure additional options, but they are optional.
Advanced configuration examples
- AWS / K8S / Snowflake
- GCP / PostgreSQL
- DigitalOcean / Parquet
# IAM role name to assume
# List of AWS profiles to collect
# List of AWS Regions to collect (null for all)
# Scrape the entire AWS organization
# Assume given role in current account
# Do not scrape current account
# Configure access to k8s clusters.
# - name: 'k8s-cluster-name'
# certificate_authority_data: 'CERT'
# server: 'https://k8s-cluster-server.example.com'
# token: 'TOKEN'
# Configure access via kubeconfig files.
# - path: "/path/to/kubeconfig"
# all_contexts: false
# contexts: ["context1", "context2"]
- path: /path/to/kubeconfig
# GCP service account file(s)
# Empty string to use the default service account e.g.:
# service_account: [""]
# DigitalOcean API tokens for the teams to be collected
config-template.yaml for a list of all supported configuration options.
When collecting multiple accounts, the
scrape_org options instruct Cloud2SQL to fetch the list of all organization accounts and specify a role to assume. Alternatively, you can specify the list of accounts to collect using the
profiles option if those profiles have been defined in your AWS CLI config file.
account_pool_size controls the number of accounts that are being collected in parallel. Increasing the value uses more CPU cores and memory but also makes the collection of multiple accounts finish faster.
Running Cloud2SQL is as simple as executing the following command:
$ cloud2sql --config myconfig.yml
And that's it! After a couple of minutes, when the collection is done, you will have a copy of your cloud infrastructure in
Let's take a look at the exported data! Open the SQLite database:
$ sqlite3 resoto.db
sqlite3 command is not already installed on your machine, the package name in most Linux distributions as well as homebrew (MacOS) and Chocolatey (Windows) is
.tables command lists all the tables that were created during Cloud2SQL's collect run, while the
.schema command outputs a table's fields.
For instance, if you have IAM server certificates, we can inspect the schema of the
> .schema aws_iam_server_certificate
CREATE TABLE IF NOT EXISTS "aws_iam_server_certificate" (
_id VARCHAR NOT NULL,
PRIMARY KEY (_id)
We could then find all certificates expiring in the next thirty days:
> SELECT name, expires, account from aws_iam_server_certificate WHERE datetime(expires) BETWEEN datetime('now') AND datetime('now', 'start of day', '+30 day');
See Resource Data Models for a full list of currently supported resources.
Now that you have a good understanding of how to install and configure Cloud2SQL, why not give it a try and see how it can help you streamline your cloud data management and analysis processes?
Simply follow the steps above to get started, and feel free to reach out to us on Discord if you have any questions or encounter any issues along the way. We're always happy to help!