BigQuery ML now in CARTO: supercharge your spatial analysis with AI
We recently launched Workflows Extension Packages, a powerful new feature that allows users to extend the functionality of CARTO Workflows by creating packages of custom components built from SQL stored procedures. With these new packages, users can design and integrate components tailored to their specific needs, boosting both the flexibility and the capabilities of the CARTO platform.
But that’s not all: CARTO has also developed a selection of pre-built extension packages to tackle common analytical challenges, and today we are thrilled to present the BigQuery ML Extension Package, our new AI-driven tool to elevate your spatial analysis!
In this post, we’ll explore how these new machine learning features work and show you how to incorporate them into your workflows to enhance your spatial data analysis. All of this can be achieved entirely inside your BigQuery data warehouse with CARTO’s fully cloud-native approach to spatial analysis, ensuring speed, scalability and security.
The new Extension Package allows users to exploit BigQuery’s ML capabilities directly from Workflows, enabling seamless integration of machine learning models into automated pipelines. With minimal coding required, users can quickly build, train, and deploy predictive models—such as customer churn prediction or sales forecasting—using data stored in Google BigQuery.
This integration empowers businesses to unlock valuable insights, automate data-driven decisions, and drive better outcomes through machine learning, all within the simplicity of a low-code environment such as CARTO Workflows.
With this toolbox, users can either train their own models or import custom models trained outside of BigQuery. They can then run predictions, understand the model’s output, evaluate the model’s performance, or perform targeted spatial analysis based on the model’s outputs. The available components cover each of these tasks.
All components are built on top of BigQuery ML’s capabilities, with each component invoking a specific BigQuery ML procedure. You can refer to the BigQuery ML Extension Package and the BigQuery ML SQL Reference documentation for further details. Note that to be able to manage models, you will need to have been granted the roles/ml.developer role.
Now, let’s dive into a real-world scenario and explore how to predict customer churn for a telecom provider offering telephone and internet services.
Many industries, including telecoms, face the challenge of customer churn driven by dissatisfaction with specific products or services. In this example, we’ll dive into telecom customer churn data to uncover the key reasons behind customer departures and develop targeted strategies to boost retention and satisfaction.
For this use case, we will be using IBM’s Telco Customer Churn Dataset, which contains information about a fictional telco company that provides home phone and Internet services to 7043 customers in California. This dataset provides essential insights into each customer's profile, covering everything from subscribed services and tenure to socio-demographic information and sentiment data.
First, let’s take a look at the data in the map below (open in full screen here). From the map widgets (the dynamic graphical elements on the right of the map) we can see that 26.54% of customers churned this quarter, resulting in a $3.68M revenue loss. Regions like Los Angeles and San Diego combine a large number of customers with a higher number of lost customers, positioning them as high-priority areas for improving customer retention.
Now, it is time to create our first workflow using the BigQuery ML Extension! We will load the telco dataset, in which we have pre-selected some relevant features (e.g. those correlated with churn), into our data warehouse, and we will train a classification model to estimate which customers are prone to churn and which are not. Below is the complete workflow, which is also available as a template that you can open directly in your workspace here:
First, we use the Where component to select only those rows for which the churn_label is available. We then split this data into training and test sets through random sampling (RAND() < 0.7) using another Where component.
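For reference, the filtering and splitting that these Where components perform corresponds roughly to the SQL below. This is only a sketch: the table names (project.dataset.telco_churn and the derived tables) are illustrative, and the random split value is materialized so that the training and test sets stay complementary.

-- Keep only the rows where the churn label is available, and store a random split value
CREATE OR REPLACE TABLE `project.dataset.telco_labeled` AS
SELECT *, RAND() AS split_value
FROM `project.dataset.telco_churn`
WHERE churn_label IS NOT NULL;

-- Roughly 70% of the rows for training
CREATE OR REPLACE TABLE `project.dataset.telco_train` AS
SELECT * EXCEPT (split_value) FROM `project.dataset.telco_labeled` WHERE split_value < 0.7;

-- The remaining rows for evaluation
CREATE OR REPLACE TABLE `project.dataset.telco_test` AS
SELECT * EXCEPT (split_value) FROM `project.dataset.telco_labeled` WHERE split_value >= 0.7;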
Secondly, we will use the training data to create a classification model that estimates the probability of churn for a customer (where a label of 1 means churn and 0 means no churn) given specific socio-demographic, contract type and sentiment characteristics. Simply connect the Create Classification Model component to the input data and set up the model’s parameters. You will need to give the model a Fully Qualified Name (FQN), which determines where the model will be stored. This also makes it possible to call the model from a different workflow using the Get Model by Name component.
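Under the hood, this component issues a BigQuery ML CREATE MODEL statement. A minimal sketch of what it runs, assuming a logistic regression model and using illustrative table, column and model names, looks like this:

CREATE OR REPLACE MODEL `project.dataset.churn_model`
OPTIONS (
  model_type = 'LOGISTic_REG',        -- binary classifier; other BigQuery ML model types are available
  input_label_cols = ['churn_label'],
  enable_global_explain = TRUE        -- needed later for global explainability
) AS
SELECT
  satisfaction_score,
  contract,
  number_of_referrals,
  online_security,
  churn_label
FROM `project.dataset.telco_train`;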
Next, we will Evaluate the performance of our model using the test data. Based on the classification metrics (see below), the results look very promising. The high accuracy indicates that the model correctly predicts the majority of instances, and the low log loss suggests that the model’s probability estimates are close to the actual values. With precision and recall both performing well, we can be confident that the model is making correct positive predictions, and the F1 score confirms that precision and recall are well balanced. Additionally, the ROC AUC score shows that our model has a strong ability to distinguish between clients churning and not churning. Overall, these metrics indicate that our model is well tuned and handles the classification task effectively.
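In SQL terms, the Evaluate component corresponds to a call to ML.EVALUATE against the held-out test data, along the lines of the sketch below (model and table names are again illustrative):

-- Returns precision, recall, accuracy, f1_score, log_loss and roc_auc for a classification model
SELECT *
FROM ML.EVALUATE(
  MODEL `project.dataset.churn_model`,
  (SELECT * FROM `project.dataset.telco_test`)
);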
With a model that performs well, we can now run predictions to estimate which customers are prone to churn. We will use the Predict component to do so.
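The equivalent BigQuery ML call is ML.PREDICT. In the sketch below, customer_id stands in for whatever identifier column your table uses; the predicted_churn_label and predicted_churn_label_probs columns are generated by BigQuery ML from the name of the label column:

SELECT
  customer_id,
  predicted_churn_label,
  predicted_churn_label_probs
FROM ML.PREDICT(
  MODEL `project.dataset.churn_model`,
  (SELECT * FROM `project.dataset.telco_test`)
);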
Lastly, to better understand our model, we can take a look at its explainability, which estimates each feature’s importance when it comes to churn. The Global Explain component provides the feature importance of the model predictors for each class (churn vs no churn). If the Class level explain option is not enabled, the overall feature importances are returned rather than the per-class ones. For further details, we can also use the Explain Predict component, which provides feature attributions indicating how much each feature in your model contributed to the final prediction for each given customer.
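These two components map to the ML.GLOBAL_EXPLAIN and ML.EXPLAIN_PREDICT functions in BigQuery ML. A sketch of both calls, keeping the illustrative names used above and assuming the model was created with enable_global_explain set to TRUE:

-- Overall feature importances; set class_level_explain to TRUE to break them down per class
SELECT *
FROM ML.GLOBAL_EXPLAIN(
  MODEL `project.dataset.churn_model`,
  STRUCT(FALSE AS class_level_explain)
);

-- Per-customer feature attributions for the top contributing features
SELECT *
FROM ML.EXPLAIN_PREDICT(
  MODEL `project.dataset.churn_model`,
  (SELECT * FROM `project.dataset.telco_test`),
  STRUCT(3 AS top_k_features)
);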
From the results for the overall feature importances, we can see that the most important features when it comes to estimating churn are the customer’s overall satisfaction rating of the company (satisfaction_score), the customer’s current contract type (contract), the number of referrals the customer has made (number_of_referrals), and whether or not the customer has subscribed to an additional online security service (online_security).
We can visualize the results in the following map (open in full screen here), which shows which customers are prone to churn and the probability with which they are predicted to do so.
Now that you’ve seen the power of the new BigQuery ML Extension Package integrated into Workflows, it’s time to dive in and experience the impact for yourself! Whether you’re looking to unlock valuable insights from your data or make predictions with machine learning models, this extension is designed to help you achieve your goals faster and more intuitively.
Ready to see how it can transform your data-driven decisions? Learn more about Extension Packages in our Documentation - or watch them in action in our upcoming webinar with Google: Analyzing the Earth at scale. You can also sign up for a free 14-day trial today and take your analytics and workflows to the next level!