What's new in Databricks - February 2025
February 2025 Release Highlights
Automatic liquid clustering is now in Public Preview
Serverless Compute can now use instance profiles for data access
Data Engineering
You can now write from pipelines to external services with Delta Live Tables sinks
The Delta Live Tables sink API is in Public Preview. With Delta Live Tables sinks, you can write data transformed by your pipeline to targets such as event streaming services like Apache Kafka or Azure Event Hubs, as well as external tables managed by Unity Catalog or the Hive metastore. Documentation
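A sink definition might look like the following sketch, assuming the preview dlt Python API (create_sink and append_flow); the sink name, topic, broker address, and source table are illustrative placeholders:

```python
import dlt

# Define a Kafka sink as a write target for the pipeline (preview API).
# Option keys and values below are illustrative placeholders.
dlt.create_sink(
    name="kafka_sink",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "broker:9092",
        "topic": "orders_enriched",
    },
)

# An append flow routes rows from a streaming source into the sink.
@dlt.append_flow(name="orders_to_kafka", target="kafka_sink")
def orders_to_kafka():
    # Kafka sinks expect a `value` column (string or binary).
    return (
        spark.readStream.table("orders_silver")
        .selectExpr("to_json(struct(*)) AS value")
    )
```

This runs only inside a Delta Live Tables pipeline, where the dlt module and the spark session are provided by the runtime.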
Standard access mode compute now supports more Scala streaming functions
Standard access mode compute now supports the Scala streaming function DataStreamWriter.foreach on Databricks Runtime 16.1 and above. On Databricks Runtime 16.2 and above, the functions DataStreamWriter.foreachBatch and KeyValueGroupedDataset.flatMapGroupsWithState are also supported.
Automatic liquid clustering is now in Public Preview
You can now enable automatic liquid clustering on Unity Catalog managed tables. Automatic liquid clustering intelligently selects clustering keys to optimize data layout for your queries. Documentation
New functions allowed in Delta Lake generated columns (DBR 16.2+)
On Databricks Runtime 16.2 and above, you can use the timestampdiff and timestampadd functions in Delta Lake generated column expressions. Documentation
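As an illustration, a generated column computed with timestampadd might be declared like this (a sketch; the table and column names are hypothetical, and spark is the Databricks session):

```python
# Sketch: a Delta table whose generated column uses timestampadd (DBR 16.2+).
spark.sql("""
    CREATE TABLE events (
      event_time TIMESTAMP,
      -- Materialize an SLA deadline one hour after the event time.
      sla_deadline TIMESTAMP GENERATED ALWAYS AS (
        timestampadd(HOUR, 1, event_time)
      )
    ) USING DELTA
""")
```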
Support for SQL pipeline syntax
SQL pipeline syntax structures a standard query as a sequence of composable steps chained with the |> operator, rather than a single monolithic statement.
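A pipeline query reads top to bottom, with each |> step transforming the previous result. A sketch (table and column names are hypothetical):

```python
# Sketch of SQL pipeline syntax: filter, aggregate, sort, and limit
# expressed as a chain of |> steps instead of nested clauses.
spark.sql("""
    FROM orders
    |> WHERE order_date >= '2025-01-01'
    |> AGGREGATE SUM(amount) AS total GROUP BY customer_id
    |> ORDER BY total DESC
    |> LIMIT 10
""").show()
```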
Trailing blank insensitive collations
Support is added for trailing blank insensitive collations, extending the collation support added in Databricks Runtime 16.1. For example, these collations treat 'Youssef' and 'Youssef ' as equal.
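The comparison semantics amount to ignoring trailing blanks before comparing. A minimal plain-Python illustration of that equivalence (not the Databricks collation API itself):

```python
def trailing_blank_insensitive_eq(a: str, b: str) -> bool:
    """Compare two strings while ignoring trailing blanks, mirroring
    the semantics of a trailing-blank-insensitive collation."""
    return a.rstrip(" ") == b.rstrip(" ")

print(trailing_blank_insensitive_eq("Youssef", "Youssef "))  # True
print(trailing_blank_insensitive_eq("Youssef", " Youssef"))  # False: leading blanks still count
```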
Governance
Unity Catalog-governed access to external cloud services using service credentials is now GA
Service credentials enable simple and secure authentication with your cloud tenant’s services from Databricks. Service credentials support Scala and Python SDKs. Documentation
Delta Sharing behavior change
Shares created using the SQL command ALTER SHARE <share> ADD TABLE <table> now have history sharing (WITH HISTORY) enabled by default.
Preview files in volumes
Volumes now display previews for common file formats in Catalog Explorer, including images, text files, JSON, YAML, and CSV.
Platform
You can now download results as Excel in notebooks connected to SQL warehouses
Serverless compute can now use instance profiles for data access
Notebooks are supported as workspace files
You can now programmatically interact with notebooks from anywhere the workspace filesystem is available, including writing, reading, and deleting notebooks like any other file. Documentation.
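Because notebooks behave like ordinary files under the workspace filesystem, standard file APIs apply. A sketch (the path is a placeholder, and the /Workspace mount is only available on Databricks compute):

```python
import os

# Hypothetical notebook path under the workspace filesystem mount.
nb_path = "/Workspace/Users/someone@example.com/my_notebook.py"
copy_path = nb_path.replace(".py", "_copy.py")

# Read a notebook's source like any other file.
with open(nb_path) as f:
    source = f.read()

# Write a modified copy alongside it.
with open(copy_path, "w") as f:
    f.write(source)

# Delete the copy when done.
os.remove(copy_path)
```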
OAuth secrets for service principals now have a configurable lifetime
Newly created OAuth secrets default to a maximum lifetime of two years, whereas previously, they did not expire.
GenAI & ML
Connect AI agent tools to external services (Public Preview)
You can connect AI agent tools to external applications such as Slack, Google Calendar, or any service with an API using HTTP requests. Set up authentication to the external service using a bearer token, OAuth 2.0 machine-to-machine, or OAuth 2.0 user-to-machine shared credentials. Requirements: your workspace must be Unity Catalog enabled, you must have network connectivity from a Databricks compute resource to the external service, you must use compute with dedicated access mode (formerly single user access mode) on Databricks Runtime 15.4 or above, and you must have a pro or serverless SQL warehouse. Documentation
Updates to model serving billing records
To improve cost observability, model serving billing records are now logged every five minutes rather than at one-hour intervals.
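With finer-grained records, you can inspect recent serving cost at five-minute resolution. A hedged sketch that assumes the system.billing.usage system table and its billing_origin_product column; verify the table schema and the MODEL_SERVING value against your workspace's system tables before relying on it:

```python
# Sketch: list recent model serving usage records, now emitted
# every five minutes. Table and column names are assumptions.
spark.sql("""
    SELECT usage_start_time, usage_end_time, sku_name, usage_quantity
    FROM system.billing.usage
    WHERE billing_origin_product = 'MODEL_SERVING'
      AND usage_start_time >= current_timestamp() - INTERVAL 1 DAY
    ORDER BY usage_start_time DESC
""").show()
```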
MLflow Tracing is GA
You can track inputs, outputs, and other metadata associated with each step of a model or agent request.
Tracing lets you pinpoint the source of bugs and unexpected behavior, compare performance across models or agents, and build new datasets to improve quality.
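Instrumentation is typically a decorator or context manager. A minimal sketch using MLflow's mlflow.trace decorator and mlflow.start_span; the application functions themselves are hypothetical:

```python
import mlflow

@mlflow.trace
def retrieve_docs(query: str) -> list:
    # Inputs and outputs of this step are captured on the trace.
    return [f"doc about {query}"]

@mlflow.trace
def answer(query: str) -> str:
    docs = retrieve_docs(query)
    # A manual span records metadata for a sub-step.
    with mlflow.start_span(name="generate") as span:
        span.set_inputs({"docs": docs})
        response = f"Answer based on {len(docs)} document(s)"
        span.set_outputs({"response": response})
    return response

answer("liquid clustering")
```

Each call produces a trace whose steps, inputs, and outputs are browsable in the MLflow UI.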
AI/BI
AI/BI dashboards
Quickly navigate to the most popular dashboards: Dashboard thumbnails are now shown for all dashboards published with embedded credentials. The dashboards listing page attempts to show thumbnails for the four most popular dashboards you can access. Dashboards you don't have access to do not appear on the listing page.
Pivot tables support more cells: Pivot tables now accommodate up to 1,000 rows and 1,000 columns, up from the previous limit of 100 rows and 100 columns.
Edit box plot display names: You can now edit the Y-axis display names in box plots, enabling a more customized presentation.
Multiple Y fields for generated charts: Visualizations generated using the Databricks Assistant now support multiple Y fields.
ColorBy performance optimization: Rendering is now optimized for charts with a very large number of groupings. This optimization prevents performance issues and crashes.
Customize sort order and label angles: Control the sorting order of data on the axis and adjust the angle of labels in visualizations. See Format axis settings.
Custom column widths for tables: All column types in table visualizations now support custom widths. Drag the handle at the top of a column to adjust its size.
Enhanced value display in stacked bar and pie charts: Stacked bar charts and pie charts now display raw values and percentages together.
Clone dashboard pages: You can now duplicate dashboard pages. See Clone a page.
Updated timezone handling: Visualizations now use the timezone from the dataset or compute resource instead of the browser settings. If a widget includes two columns with different time zones, the second is formatted to match the first.
AI/BI Genie
Edit parameters in a response: You can now edit the parameter values used to generate a response to a trusted question. See Review a response.
View data sources: Genie now displays the tables used as source data for each response.
Avoid unnecessary wait times: You can now cancel a SQL query execution during the Waiting for warehouse state to avoid unnecessary wait times.
Improved reasoning about generated SQL: Genie’s model for translating text into SQL now uses Chain-of-Thought reasoning to break down questions into manageable steps: first, identifying useful columns; next, planning the SQL generation; and finally, combining the parts into a single SQL query. This upgrade results in more robust and accurate SQL translations. You should see improvements in Genie’s ability to pick precise filter conditions and improved reasoning on nuanced questions.
Sharing a Genie space now sends an email notification to the recipient. See Share a Genie space.