What's new in Databricks - May 2024
Data+AI Summit is coming! 500+ sessions on topics ranging from #ML, data science, data engineering, #GenAI, and data governance!
Register now and find your learning path: https://dbricks.co/4cBIag9
Get a $500 discount if you use this code to register for DAIS: SUMCL5VDG
AI, LLM & Data Science
Foundation Model Fine-Tuning is now in Public Preview, with a new dbdemos demo!
You can now easily fine-tune an existing foundation model (Llama 3, DBRX, ...) to optimize its performance for your use case, reduce cost, and increase privacy and security.
Databricks makes it easy to prepare and clean your training dataset. Once it's ready, just select your table in Unity Catalog (UC) and Databricks automatically tunes the LLM for your use case!
Once the fine-tuning run completes, your LLM is automatically registered in Unity Catalog, ready to be deployed and tested!
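Databricks runs the tuning itself, but preparing the training data is still your job. Below is a minimal plain-Python sketch of typical cleanup rules. It assumes an instruction-style dataset with `prompt` and `response` columns, so check the fine-tuning documentation for the format your task type actually expects. `clean_examples` is a hypothetical helper, not a Databricks API. In practice you would apply the same rules with Spark and save the result as a Unity Catalog table.

```python
import json
from typing import Iterable

def clean_examples(rows: Iterable[dict]) -> list[dict]:
    """Hypothetical helper: keep rows with non-empty prompt/response strings,
    trimmed of surrounding whitespace and de-duplicated.
    Assumes a prompt/response schema (verify against the fine-tuning docs)."""
    seen = set()
    cleaned = []
    for row in rows:
        prompt = row.get("prompt")
        response = row.get("response")
        if not isinstance(prompt, str) or not isinstance(response, str):
            continue  # drop rows with missing or non-string fields
        prompt, response = prompt.strip(), response.strip()
        if not prompt or not response:
            continue  # drop rows that are empty after trimming
        if (prompt, response) in seen:
            continue  # drop exact duplicates
        seen.add((prompt, response))
        cleaned.append({"prompt": prompt, "response": response})
    return cleaned

raw = [
    {"prompt": " What is Delta Lake? ", "response": "An open table format."},
    {"prompt": "What is Delta Lake?", "response": "An open table format."},
    {"prompt": "", "response": "orphan answer"},
    {"prompt": "Define UC", "response": None},
]
cleaned = clean_examples(raw)
print(json.dumps(cleaned))
```

Only the first row survives. The second row duplicates it once whitespace is trimmed, the third has an empty prompt, and the fourth has no response.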
New capabilities for Databricks Vector Search
Databricks Vector Search is a vector database built into the Databricks Data Intelligence Platform and integrated with its governance and productivity tools. A vector database is optimized to store and retrieve embeddings. Embeddings are mathematical representations of the semantic content of data, typically text or image data. They are generated by a large language model and are a key component of many GenAI applications that depend on finding documents or images that are similar to each other.
New capabilities include the following:
PrivateLink and IP access lists are now supported.
Customer Managed Keys (CMK) are now supported on endpoints created on or after May 8, 2024. Vector Search support for CMK is in Public Preview.
Improved audit logs and cost attribution tracking. See Audit log reference.
You can now save generated embeddings as a Delta table. See Create a vector search index.
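To make "retrieve similar embeddings" concrete, here is a brute-force sketch in plain Python. It scores every stored vector against a query with cosine similarity and returns the closest matches. This is not the Vector Search API. Production vector databases typically use approximate nearest-neighbor indexes instead of scoring every vector, so the code shows the idea only, not how the service is implemented. The document IDs and 3-dimensional vectors are made up for illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exhaustive search: score every stored embedding, return the k most similar."""
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings"; real ones come from a model and are much larger.
index = {
    "doc_delta": [0.9, 0.1, 0.0],
    "doc_spark": [0.8, 0.2, 0.1],
    "doc_cooking": [0.0, 0.1, 0.9],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```

The two documents pointing roughly in the query's direction rank first, and `doc_cooking` (similarity 0) is left out.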
Pre-trained models in Unity Catalog
A selection of open-source, pre-trained GenAI models is now included in the system.ai schema in Unity Catalog! This makes it super easy to reference them for deployments!
Data Engineering on Databricks
Databricks Assistant autocomplete
Databricks Assistant autocomplete provides AI-powered suggestions in real time as you type in notebooks, queries, and files. Learn more
Notebooks now detect and autocomplete column names for Spark Connect DataFrames
dbt-databricks connector adopts the decoupled dbt architecture
dbt-databricks connector 1.8.0 is the first version to adopt the new decoupled dbt architecture. Rather than depending directly on dbt-core, the connector now depends on a shared abstraction layer between the adapter and dbt-core; it still pulls in dbt-core for now, to free customers from having to specify versions for both libraries. As a result, the connector no longer needs to match its feature version to that of dbt-core and is free to adopt semantic versioning. This means connector developers no longer need to ship significant features such as compute-per-model as patch releases.
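Semantic versioning ties the version bump to the kind of change. A major bump means breaking changes, a minor bump means new backward-compatible features, and a patch bump means backward-compatible fixes. Below is a minimal plain-Python sketch of that classification. It ignores pre-release tags, and the version numbers in the usage lines are only examples, not a statement about actual connector releases.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers (pre-release tags not handled)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def bump_kind(old: str, new: str) -> str:
    """Classify the change between two versions under semantic versioning."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:  # tuples compare element by element
        raise ValueError(f"{new} is not newer than {old}")
    if n[0] != o[0]:
        return "major"  # breaking changes
    if n[1] != o[1]:
        return "minor"  # new, backward-compatible features
    return "patch"      # backward-compatible bug fixes

feature_release = bump_kind("1.8.0", "1.9.0")
fix_release = bump_kind("1.8.0", "1.8.1")
print(feature_release, fix_release)
```

Under this scheme, a significant feature like compute-per-model would be released with a minor bump rather than a patch.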
Governance & Delta Sharing
Bind storage credentials and external locations to specific workspaces
You can now bind storage credentials and external locations to specific workspaces, preventing access to those objects from other workspaces. It’s useful if you use
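Below is a conceptual sketch of the binding rule in plain Python. An unbound object stays usable from any workspace, while a bound (isolated) object is usable only from the workspaces it is bound to. `SecurableObject`, its fields, and the workspace IDs are hypothetical and are not the Unity Catalog API. The sketch also assumes that binding is an extra check on top of the user's privileges, which are not modeled here.

```python
from dataclasses import dataclass, field

@dataclass
class SecurableObject:
    """Hypothetical model of a storage credential or external location."""
    name: str
    isolated: bool = False  # False: usable from every workspace on the metastore
    bound_workspaces: set[int] = field(default_factory=set)

def can_access(obj: SecurableObject, workspace_id: int) -> bool:
    """Workspace check only; user privileges would still be checked separately."""
    if not obj.isolated:
        return True
    return workspace_id in obj.bound_workspaces

prod_location = SecurableObject("prod_location", isolated=True, bound_workspaces={1001})
shared_location = SecurableObject("shared_location")

from_prod = can_access(prod_location, 1001)
from_dev = can_access(prod_location, 2002)
shared_from_dev = can_access(shared_location, 2002)
print(from_prod, from_dev, shared_from_dev)
```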
Attribute tag values for Unity Catalog
Attribute tag values in Unity Catalog can now be up to 1000 characters long.
Row filters and column masks are now GA
The ability to apply row filters and column masks to tables is now GA on Databricks Runtime 12.2 LTS and above, with many new capabilities. Learn more
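In Unity Catalog, row filters and column masks are SQL user-defined functions attached to a table. They are evaluated for the user running the query, typically based on that user's group membership. The plain-Python model below only illustrates those semantics: a filter decides which rows are returned, and a mask rewrites a column's value. The group names, columns, and helper functions are hypothetical, and this is not how Databricks evaluates the policies.

```python
from typing import Callable

Row = dict[str, str]
RowFilter = Callable[[Row, set[str]], bool]
ColumnMask = Callable[[str, set[str]], str]

def apply_policies(rows: list[Row], groups: set[str], row_filter: RowFilter,
                   column_masks: dict[str, ColumnMask]) -> list[Row]:
    """Return the rows the filter admits for this caller, with masked columns rewritten."""
    visible = []
    for row in rows:
        if not row_filter(row, groups):
            continue  # row is hidden from this caller
        masked = dict(row)
        for column, mask in column_masks.items():
            masked[column] = mask(row[column], groups)
        visible.append(masked)
    return visible

def us_only_unless_admin(row: Row, groups: set[str]) -> bool:
    """Hypothetical filter: admins see every row, others only US rows."""
    return "admin" in groups or row["region"] == "US"

def mask_ssn(value: str, groups: set[str]) -> str:
    """Hypothetical mask: only the hr group sees the full value."""
    return value if "hr" in groups else "***-**-" + value[-4:]

rows = [
    {"name": "Ana", "region": "US", "ssn": "123-45-6789"},
    {"name": "Ben", "region": "EU", "ssn": "987-65-4321"},
]
analyst_view = apply_policies(rows, {"analyst"}, us_only_unless_admin, {"ssn": mask_ssn})
admin_view = apply_policies(rows, {"admin"}, us_only_unless_admin, {"ssn": mask_ssn})
print(analyst_view)
print(admin_view)
```

The analyst sees only the US row with a masked SSN. The admin sees both rows, but the SSNs are still masked because the admin is not in the `hr` group, which shows that the filter and the mask are independent policies.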
New Tableau connector for Delta Sharing
OAuth is supported in Lakehouse Federation for Snowflake
New dashboard helps Databricks Marketplace providers monitor listing usage
The new Provider Analytics Dashboard enables Databricks Marketplace providers to monitor listing views, requests, and installs. The dashboard pulls data from the Marketplace system tables. Learn more.
Platform admin
Unified login is now supported with AWS PrivateLink
Unified login allows you to manage one single sign-on (SSO) configuration in your account that is used for the account and Databricks workspace. Unified login is now supported with private connectivity using AWS PrivateLink between users and their Databricks workspaces. Learn more
The compute metrics UI is now available on all Databricks Runtime versions
The compute metrics UI has been rolled out to all Databricks Runtime versions. Previously, these metrics were available only on compute resources running on Databricks Runtime 13.3 and above.
In a nutshell…
Credential passthrough and Hive metastore table access controls are deprecated
Databricks JDBC driver 2.6.38 has been released
You can now use AWS Graviton instance types in workspaces that enable the compliance security profile or enhanced security monitoring
Compute now uses EBS gp3 volumes for autoscaling local storage
Bulk move and delete workspace objects from the workspace browser
Compute plane outbound IP addresses must be added to a workspace IP allow list
Featured articles
How to leverage zero-shot and few-shot learning for text classification on Databricks
Keeping Your Databricks Direct Vector Access Index Fresh in Near Real Time
A guide to quick and scalable historical data loads into Databricks using Azure Data Factory
Unified Logging for Databricks Notebooks and ADF with Azure Log Analytics
Five things that can go wrong when building RAG applications - Part 1