Unleashing the Power of AI: Automating SQL Query Generation and Real-Time Data Streaming with Confluent and Google Cloud

September 17, 2024

Pascal Vantrepote

Senior Director of Innovation, Confluent

Merlin Yamssi

Lead Solutions Consultant, AI/ML CoE Partner Engineering, Google Cloud

In today's data-driven world, companies need real-time insights from vast datasets to optimize supply chains, predict demand fluctuations, and improve patient outcomes. However, extracting these insights often involves complex SQL queries and intricate data pipelines. This is where large language models (LLMs) and real-time data streaming converge to enhance data analytics.

This blog explores how to harness LLMs to automate SQL query generation, streamlining data analysis workflows. We'll delve into how to integrate LLMs with Confluent and Vertex AI to create a powerful, end-to-end solution for real-time data processing and insights. This integration can be useful for the following use cases:

Accelerated data exploration: LLMs can generate SQL queries based on natural language prompts, enabling users to quickly explore datasets without SQL expertise.
Automated report generation: By understanding data schema and business requirements, LLMs can generate complex SQL queries for report generation, saving time and reducing errors.
Data pipeline optimization: LLMs can analyze existing SQL queries and suggest improvements, leading to more efficient data processing pipelines.
Anomaly detection: By identifying patterns in historical data, LLMs can generate SQL queries to detect anomalies and outliers.

In addition, integrating LLMs with Confluent and Vertex AI helps resolve numerous challenges and problems in the data analytics space:

Writing complex SQL queries: Writing and optimizing complex SQL queries can be time-consuming and error-prone, requiring specialized data engineering skills.
Analysis of real-time data: Traditional batch processing methods often lack the speed and agility needed for real-time decision-making.
Data silos: Disparate data sources and formats hinder a holistic view of business operations.

To address these challenges and maximize the benefits of LLMs, Confluent and Google Cloud Vertex AI play essential roles:

Automated SQL generation: LLMs on Vertex AI/Gemini translate natural language requests into efficient SQL queries, empowering business users to access data without specialized skills.
Real-time data streaming: Confluent facilitates continuous data ingestion, processing, and delivery, ensuring that insights are readily available for immediate action.
Unified data platform: Integrating Confluent with Google Cloud services like BigQuery (or Cloud SQL) creates a centralized data platform, breaking down data silos and providing a comprehensive view.

By combining these technologies, organizations can create robust and scalable solutions for automating SQL query generation.

In the next section, we'll dive deeper into the technical implementation and explore how to integrate these technologies.

Solution details

In the following demo, we look at COVID-19 data and use Google's Speech-to-Text to convert spoken queries into text, which sets the stage for the rest of the data processing workflow. Confluent Cloud's FlinkAI manages the calls to Gemini, Google's remote inference engine, which generates SQL queries from the user's input. Apache Kafka provides the real-time data streaming that connects the services in Confluent's microservices architecture. Once the data has been retrieved and processed, Gemini condenses the findings into a clear summary, which Google's Text-to-Speech converts back into natural-sounding speech. By combining Google's AI capabilities with FlinkAI in Confluent Cloud, the solution delivers fast, accurate data-driven insights through voice commands, and it shows how voice and AI integration can make those insights more accessible to everyone.

How it works:

Voice input: Users interact with a voice-enabled interface, stating their data queries in natural language, such as "Give me all COVID-19 cases in France in 2021."
Speech-to-text: Google Speech-to-Text service converts the spoken input into text.
SQL query formation: The text is processed by Gemini to generate an SQL query.
Query execution: The SQL query is executed to fetch the relevant data.
Data summarization: Gemini then summarizes the retrieved data into a concise format.
Text-to-speech: A text-to-speech service converts the summarized text back into natural-sounding speech, which is delivered to the user.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Xmxcwz2.max-1100x1100.png

The diagram above shows the overall flow from the user - in natural language - to the AI models and back to the user.
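
The flow above can be sketched as a chain of functions. This is a minimal illustration, not the demo's implementation: the class `VoiceQueryPipeline`, its method names, and the stub stages are all hypothetical, and in the demo the stages exchange messages through Kafka topics rather than calling each other directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/**
 * Hypothetical sketch of the voice-to-insight flow. Each stage is a plain
 * function so that a stub can later be swapped for a real service client.
 */
final class VoiceQueryPipeline {

    private final Function<byte[], String> speechToText;   // audio -> question text
    private final Function<String, String> sqlGenerator;   // question -> SQL
    private final Function<String, String> queryExecutor;  // SQL -> markdown table
    private final Function<String, String> summarizer;     // table -> summary text
    private final Function<String, byte[]> textToSpeech;   // summary -> audio

    VoiceQueryPipeline(Function<byte[], String> speechToText,
                       Function<String, String> sqlGenerator,
                       Function<String, String> queryExecutor,
                       Function<String, String> summarizer,
                       Function<String, byte[]> textToSpeech) {
        this.speechToText = speechToText;
        this.sqlGenerator = sqlGenerator;
        this.queryExecutor = queryExecutor;
        this.summarizer = summarizer;
        this.textToSpeech = textToSpeech;
    }

    /** Runs the stages in the order listed in "How it works". */
    byte[] answer(byte[] spokenQuestion) {
        return speechToText
                .andThen(sqlGenerator)
                .andThen(queryExecutor)
                .andThen(summarizer)
                .andThen(textToSpeech)
                .apply(spokenQuestion);
    }

    /** Stub wiring for illustration only; no Google or Confluent service is called. */
    static VoiceQueryPipeline withStubs() {
        return new VoiceQueryPipeline(
                audio -> new String(audio, StandardCharsets.UTF_8),
                question -> "SELECT 1; -- " + question,
                sql -> "| result |\n| ------ |\n| 1      |",
                // Subtract the header and separator lines to count data rows.
                table -> "The query returned " + (table.split("\n").length - 2) + " row(s).",
                summary -> summary.getBytes(StandardCharsets.UTF_8));
    }
}
```

Keeping each stage behind a `Function` makes it possible to test the ordering of the steps without network access, then replace the stubs one at a time.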

The solution implements the following features:

Natural language interface: Users can interact with data using simple, intuitive language, eliminating the need for SQL expertise.
Automated query optimization: Gemini on Vertex AI uses its knowledge of data structures and query patterns to generate efficient queries.
Real-time data pipelines: Confluent's streaming capabilities provide insights with minimal latency, enabling proactive decision-making.
Scalability and security: The solution builds on the scalability and security of Google Cloud to support data integrity and compliance, for example with healthcare regulations.

Let's take a deeper look at the various components.

1. Speech to Text
The initial phase of our interactive data processing system begins with converting speech to text. Utilizing a KStream application, the solution handles audio inputs where each audio file is processed to extract textual queries. This process involves the AudioProcessor class which, upon receiving an audio file, leverages Google's Speech-to-Text API to perform accurate and fast speech recognition. Once the audio is processed, the resulting text is encapsulated as an SQLRequest, which contains both the query and session information. This SQLRequest is then forwarded to another topic within our streaming architecture, setting the stage for subsequent SQL generation and data retrieval steps. This transition from audio to text helps ensure that user queries are quickly and accurately converted into actionable database queries, ready for deeper analysis and response generation.

import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import com.google.cloud.speech.v1.SpeechSettings;
import com.google.protobuf.ByteString;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

import java.io.IOException;
import java.util.List;

/**
 * Processor that converts an audio query to text and forwards it as an SQLRequest.
 * AudioQuery and SQLRequest are the demo's own message classes.
 */
@Slf4j
public class AudioProcessor implements Processor<String, AudioQuery, String, SQLRequest> {

    private ProcessorContext<String, SQLRequest> context;

    @Override
    public void init(ProcessorContext<String, SQLRequest> context) {
        this.context = context;
    }

    @Override
    public void process(Record<String, AudioQuery> record) {
        final AudioQuery audioQuery = record.value();

        SpeechSettings settings;
        try {
            settings = SpeechSettings.newBuilder().setEndpoint("speech.googleapis.com:443").build();
        } catch (IOException e) {
            log.error("Error creating SpeechSettings.", e);
            throw new RuntimeException(e);
        }

        try (SpeechClient speechClient = SpeechClient.create(settings)) {
            final byte[] audioFile = audioQuery.getAudio();

            log.info("Processing audio for session id: {}", audioQuery.getSessionId());

            // Builds the sync recognize request
            RecognitionConfig config =
                    RecognitionConfig.newBuilder()
                            .setEncoding(RecognitionConfig.AudioEncoding.WEBM_OPUS)
                            .setLanguageCode("en-US")
                            .build();
            RecognitionAudio audio = RecognitionAudio.newBuilder()
                    .setContent(ByteString.copyFrom(audioFile))
                    .build();
            RecognizeResponse response = speechClient.recognize(config, audio);
            List<SpeechRecognitionResult> results = response.getResultsList();
            if (results.isEmpty()) {
                log.info("No results found for session id: {}", audioQuery.getSessionId());
                return;
            }

            // Use the top alternative of the first result as the user's query
            final String query = results.get(0).getAlternatives(0).getTranscript();
            final SQLRequest sqlRequest = new SQLRequest(query, audioQuery.getSessionId());
            context.forward(new Record<>(audioQuery.getSessionId(), sqlRequest, record.timestamp()));

            log.info("Results: {}", results);

        } catch (Exception e) {
            log.error("Error processing audio for session id: {}", audioQuery.getSessionId(), e);
            throw new RuntimeException(e);
        }
    }
}

Sample code to convert audio file to text
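
The post does not show the `AudioQuery` and `SQLRequest` message classes that the processor reads and writes. The sketch below is one plausible shape for them, inferred only from the calls in the sample (`getAudio()`, `getSessionId()`, and `new SQLRequest(query, sessionId)`). The field names, the `AudioQuery` constructor order, the `getQuery()` accessor, and the defensive copies are assumptions, and the serialization needed to put these objects on a Kafka topic is omitted.

```java
import java.util.Objects;

/** Hypothetical input message: raw audio plus the session it belongs to. */
final class AudioQuery {
    private final String sessionId;
    private final byte[] audio;

    // Constructor argument order is an assumption; the post never constructs an AudioQuery.
    public AudioQuery(String sessionId, byte[] audio) {
        this.sessionId = Objects.requireNonNull(sessionId, "sessionId");
        // Copy the array so later changes by the caller cannot alter the message.
        this.audio = Objects.requireNonNull(audio, "audio").clone();
    }

    public String getSessionId() {
        return sessionId;
    }

    public byte[] getAudio() {
        return audio.clone();
    }
}

/** Hypothetical output message: the transcribed question plus the session id. */
final class SQLRequest {
    private final String query;
    private final String sessionId;

    // Argument order matches the call in the AudioProcessor sample.
    public SQLRequest(String query, String sessionId) {
        this.query = Objects.requireNonNull(query, "query");
        this.sessionId = Objects.requireNonNull(sessionId, "sessionId");
    }

    public String getQuery() {
        return query;
    }

    public String getSessionId() {
        return sessionId;
    }
}
```

Carrying the session id on every message is what lets the later stages route the final audio answer back to the user who asked the question.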

2. Human query to SQL
A standout feature of our demo is the sqlgenerator model, which turns spoken queries into precise SQL commands. This model's capability hinges on its sophisticated prompt system, designed to handle complex natural language inputs. The prompt details the database schema, guiding the AI to generate SQL queries that are accurate and contextually aware. For example, when a user asks for the total number of COVID-19 tests and first-dose vaccinations for each country in the latest week, the model constructs a query based on an intricate understanding of database structures and relationships, as described in the prompt. This involves parsing and translating diverse data types and table relationships into a cohesive SQL command. Additionally, it outputs a comprehensive JSON description that elucidates the query's purpose and the data schema it impacts. This intricate prompt design underscores our innovative approach, enabling complex database queries through straightforward voice commands.

CREATE MODEL sqlgenerator
INPUT(query STRING)
OUTPUT(`sql` STRING)
COMMENT 'Human to SQL'
WITH (
  'provider' = 'googleai',
  'task' = 'text_generation',
  'googleai.endpoint' = 'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent',
  'googleai.PARAMS.top_p' = '0.95',
  'googleai.PARAMS.top_k' = '64',
  'googleai.PARAMS.temperature' = '1',
  'googleai.api_key' = '{{sessionconfig/sql.secrets.gcp_key}}',
  'googleai.system_prompt' = 'Given the database schema and structures provided below, generate a Postgres-compatible SQL query to select the required data, and provide a detailed description in JSON format about the query, the data schema, and data types.

Here is the schema:

Table 1: `vaccine_data.covid_testing_data`
- `country`: character varying(255)
- `country_code`: character varying(10)
- `year_week`: character varying(20)
...

Table 2: `vaccine_data.covid_vaccination_data`
- `reportingcountry`: character varying(255)
- `denominator`: double precision
...

Table 3: `vaccine_data.hospital_occupancy`
- `indicator`: text NOT NULL
- `date`: text NOT NULL
...

An example request for data:
"Please provide the total number of tests done and the total first-dose vaccinations for each country in the latest week available."

Expected SQL Query:
```sql
SELECT
    t.country,
    MAX(t.year_week) AS latest_week,
    SUM(t.tests_done) AS total_tests_done,
    SUM(v.firstdose) AS total_first_dose_vaccinations
FROM
    vaccine_data.covid_testing_data t
JOIN
    vaccine_data.covid_vaccination_data v
ON
    t.country = v.reportingcountry
    AND t.year_week = v.year_week
GROUP BY
    t.country;
```

Expected JSON output:
```json
{
  "query_description": "This query retrieves the total number of tests done and the total number of first-dose vaccinations for each country in the latest week available from the respective tables.",
  "schema": {
    "country": "character varying(255)",
    "latest_week": "character varying(20)",
    "total_tests_done": "integer",
    "total_first_dose_vaccinations": "integer"
  },
  "sql_query": "SELECT t.country, MAX(t.year_week) AS latest_week, SUM(t.tests_done) AS total_tests_done, SUM(v.firstdose) AS total_first_dose_vaccinations FROM vaccine_data.covid_testing_data t JOIN vaccine_data.covid_vaccination_data v ON t.country = v.reportingcountry AND t.year_week = v.year_week GROUP BY t.country;"
}
```

Human: '
);

FlinkAI SQL query to invoke Google Vertex AI/Gemini to convert human query to SQL.

JSON result after execution of the remote inference that converts a human query to SQL
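
The post does not show how the generated SQL is pulled out of the model's response before execution. As a rough sketch, assuming the model follows the prompt's example and wraps the statement in a code fence tagged `sql`, a downstream service could extract it with a regular expression. The class `GeneratedSqlExtractor` and its method are hypothetical; a real service might instead parse the `sql_query` field of the JSON description.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts the first sql-tagged fenced block from model output. */
final class GeneratedSqlExtractor {

    // Three backticks, built programmatically so this listing stays easy to embed.
    private static final String FENCE = "`".repeat(3);

    // Captures the body between an opening "sql" fence and the next closing fence.
    // DOTALL lets "." match newlines, since generated queries span several lines.
    private static final Pattern SQL_BLOCK = Pattern.compile(
            Pattern.quote(FENCE + "sql") + "\\s*(.*?)\\s*" + Pattern.quote(FENCE),
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    private GeneratedSqlExtractor() {
    }

    /** Returns the SQL text, or empty if the response has no non-blank sql block. */
    static Optional<String> extractSql(String modelResponse) {
        if (modelResponse == null) {
            return Optional.empty();
        }
        Matcher matcher = SQL_BLOCK.matcher(modelResponse);
        if (!matcher.find()) {
            return Optional.empty();
        }
        String sql = matcher.group(1).trim();
        return sql.isEmpty() ? Optional.empty() : Optional.of(sql);
    }
}
```

Returning an `Optional` forces the caller to decide what to do when the model answers without a usable query, for example by replying to the user that the question could not be translated.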

3. Query execution
Following the generation of the SQL query by the sqlgenerator model, the subsequent step is its execution, which retrieves the desired data from the databases. Once the data is obtained, the system formats the results into a markdown table. This structuring is crucial as it prepares the data for further processing, tailoring it for readability and further analysis. The formatted table is then processed by another call to the Gemini inference engine, which creates a concise, human-readable summary that can be presented verbally.

{
   "sessionId":"bb76112c-3900-121d-d365-12bee9de421b",
   "executedQuery":"SELECT * FROM vaccine_data.covid_testing_data WHERE country = 'France' AND year_week LIKE '2021%';",
   "renderedResult":"| country | country_code | year_week | level    | region | region_name | new_cases | tests_done | population | testing_rate       | positivity_rate     | testing_data_source |\n| ------- | ------------ | --------- | -------- | ------ | ----------- | --------- | ---------- | ---------- | ------------------ | ------------------- | ------------------- |\n| France  | FR           | 2021-W43  | national | FR     | France      | 20604     | 3374137    | 67391582   | 50.06763307619044  | 0.6106450330854971  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 21668     | 3047258    | 67391582   | 45.217190479368774 | 0.7110654890396546  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 30112     | 2408920    | 67391582   | 35.745117246245975 | 1.2500207561894956  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 25332     | 2277710    | 67391582   | 33.79813817102558  | 1.1121696791953322  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 36249     | 4897647    | 67391582   | 72.6744625166983   | 0.7401309240947745  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 32067     | 3014996    | 67391582   | 44.73846600010072  | 1.0635835006082928  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 30575     | 2250466    | 67391582   | 33.39387403014222  | 1.3586075061787204  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 43273     | 2802684    | 67391582   | 41.588042850811846 | 1.543984266510245   | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 17001     | 3704239    | 67391582   | 54.965900637263566 | 0.4589606664148831  | TESSy COVID-19      |\n| France  | FR           | 
2021-W43  | national | FR     | France      | 36215     | 2197339    | 67391582   | 32.605541149041436 | 1.6481298516068756  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 41156     | 4780081    | 67391582   | 70.92994196218751  | 0.8609895941093885  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 12976     | 4880462    | 67391582   | 72.41946034150082  | 0.26587646825239086 | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 40344     | 4867416    | 67391582   | 72.22587533261944  | 0.8288586798416243  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 46205     | 2474580    | 67391582   | 36.719422909525996 | 1.8671855425971278  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 43638     | 2268119    | 67391582   | 33.65582069285745  | 1.923973124866905   | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 27657     | 3130413    | 67391582   | 46.451098298894365 | 0.8834936476432982  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 39714     | 3542094    | 67391582   | 52.559887969390594 | 1.121201187771979   | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 13068     | 2071816    | 67391582   | 30.742949468080447 | 0.6307509933314541  | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 39389     | 1600183    | 67391582   | 23.744553140182997 | 2.461530962396176   | TESSy COVID-19      |\n| France  | FR           | 2021-W43  | national | FR     | France      | 17297     | 2863145    | 67391582   | 42.485202380321034 | 0.6041258825522284  | TESSy COVID-19      |",
   "description":"This query returns all COVID-19 testing data for France in 2021, including new cases, tests done, and positivity rate.",
   "query":"give me all the coid 19 in France in 2021"
}

JSON result after the execution of the generated SQL query.
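
The markdown formatting step is described but not shown. The sketch below illustrates one way to render query results in the padded layout seen in `renderedResult`. The class `MarkdownTableRenderer` is hypothetical, and it takes plain lists of strings rather than a JDBC result set to stay self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical renderer that turns column headers and rows into a padded markdown table. */
final class MarkdownTableRenderer {

    private MarkdownTableRenderer() {
    }

    static String render(List<String> headers, List<List<String>> rows) {
        // Each column is as wide as its longest cell, with a minimum of three
        // characters so the separator row always has at least three dashes.
        int[] widths = new int[headers.size()];
        for (int i = 0; i < widths.length; i++) {
            widths[i] = Math.max(3, clean(headers.get(i)).length());
        }
        for (List<String> row : rows) {
            for (int i = 0; i < widths.length; i++) {
                widths[i] = Math.max(widths[i], cell(row, i).length());
            }
        }

        List<String> lines = new ArrayList<>();
        lines.add(formatRow(headers, widths));

        List<String> dashes = new ArrayList<>();
        for (int width : widths) {
            dashes.add("-".repeat(width));
        }
        lines.add(formatRow(dashes, widths));

        for (List<String> row : rows) {
            lines.add(formatRow(row, widths));
        }
        return String.join("\n", lines);
    }

    private static String formatRow(List<String> row, int[] widths) {
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < widths.length; i++) {
            String value = cell(row, i);
            cells.add(value + " ".repeat(widths[i] - value.length()));
        }
        return "| " + String.join(" | ", cells) + " |";
    }

    // Missing cells render as empty strings.
    private static String cell(List<String> row, int index) {
        return index < row.size() ? clean(row.get(index)) : "";
    }

    // Null becomes empty, and pipes are escaped so they cannot split a cell.
    private static String clean(String value) {
        return value == null ? "" : value.replace("|", "\\|");
    }
}
```

A text table like this keeps the column names next to the values, which gives the summarization model the context it needs in a single string field.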

4. Table summarization
Building upon the data retrieved and organized into a markdown table, the system calls the "sqlsummary" model, powered by FlinkAI. This model is tasked with generating an easily understandable summary of the table's data, tailored for audio delivery. It uses a sophisticated prompt mechanism, crucial for directing the AI's text-generation capabilities. The prompt specifies that the summary should not only recite the data but also provide a coherent narrative about its context, trends, and significant points such as highs and lows.

CREATE MODEL sqlsummary
INPUT(renderedResult STRING)
OUTPUT(response STRING)
COMMENT 'SQL Summary'
WITH (
  'provider' = 'googleai',
  'task' = 'text_generation',
  'googleai.endpoint' = 'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent',
  'googleai.PARAMS.top_p' = '0.95',
  'googleai.PARAMS.top_k' = '64',
  'googleai.PARAMS.temperature' = '1',
  'googleai.api_key' = '{{sessionconfig/sql.secrets.gcp_key}}',
  'googleai.system_prompt' = 'Using the information provided in the JSON document, create a detailed summary for the table in the `renderedResult` field. The summary should contain details of the data in the table and be clear, detailed, and suitable for an audio format.

Please generate a summary that includes the following elements:
1. A brief overview of the context and source of the data.
2. Details about ranges and typical values.
3. Any notable highs and lows in new cases.
4. General trends that can be observed from the data.

Ensure that the summary is clear, detailed, and presented in a way that makes it suitable for being read out loud.

Below is the JSON document:'
);

FlinkAI SQL query to invoke Google Vertex AI/Gemini, which summarizes the JSON result into text suitable for audio.

{
   "sessionId":"bb76112c-3900-121d-d365-12bee9de421b",
   "executedQuery":"SELECT * FROM vaccine_data.covid_testing_data WHERE country = 'France' AND year_week LIKE '2021%';",
   "response":"This data comes from the TESSy COVID-19 database and presents the COVID-19 testing data for France during the year 2021, specifically for week 43. \n\nOver this week, the number of new daily cases in France fluctuated. Daily new cases ranged from a low of 12,976 to a high of 46,205.  The number of tests conducted each day also varied, ranging from approximately 1.6 million to nearly 4.9 million. This resulted in a testing rate, which represents the number of tests per 100,000 people, ranging from approximately 23.7 to 72.7. \n\nNotably, the positivity rate, which is the percentage of tests that came back positive, saw a significant high of around 2.46 on one of the days. This indicates a higher transmission rate during that time. \n\nOverall, while the testing rate remained relatively stable, the fluctuating number of new cases and positivity rate during week 43 in France suggests an evolving COVID-19 situation. \n",
   "description":"This query returns all COVID-19 testing data for France in 2021, including new cases, tests done, and positivity rate.",
   "query":"give me all the coid 19 in France in 2021"
}

Result of the summarization.

5. Text-to-speech
The final stage in the data processing pipeline is the text-to-speech conversion, where the summarized text generated by the system is transformed into audible speech. This is accomplished using a KStream application, which takes the prepared text summaries and processes them through a text-to-speech service. This service is configured to deliver high-quality audio output that captures the essence of the data in a clear and engaging manner. The application helps ensure that the speech output is free of any formatting remnants from the text summary, for a clean and professional listening experience. Once the conversion is complete, the audio is then pushed to another topic within the system, to make it accessible for further use.

import com.google.cloud.texttospeech.v1.AudioConfig;
import com.google.cloud.texttospeech.v1.AudioEncoding;
import com.google.cloud.texttospeech.v1.SsmlVoiceGender;
import com.google.cloud.texttospeech.v1.SynthesisInput;
import com.google.cloud.texttospeech.v1.SynthesizeSpeechRequest;
import com.google.cloud.texttospeech.v1.SynthesizeSpeechResponse;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.TextToSpeechSettings;
import com.google.cloud.texttospeech.v1.VoiceSelectionParams;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
// Markdown parsing; assumed here to be the commonmark-java library
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.text.TextContentRenderer;

import java.io.IOException;

/**
 * Processor that converts text to speech.
 */
@Slf4j
public class TextProcessor implements Processor<String, SQLResponse, String, byte[]> {

    private ProcessorContext<String, byte[]> context;
    private final TextContentRenderer renderer = TextContentRenderer.builder().build();
    private final Parser parser = Parser.builder().build();

    @Override
    public void init(ProcessorContext<String, byte[]> context) {
        this.context = context;
    }

    @Override
    public void process(Record<String, SQLResponse> record) {
        final SQLResponse sqlResponse = record.value();

        log.info("Processing text for session id: {}", sqlResponse.getSessionId());

        final TextToSpeechSettings settings;
        try {
            settings = TextToSpeechSettings.newBuilder().setEndpoint("texttospeech.googleapis.com:443").build();
        } catch (IOException e) {
            log.error("Error creating TextToSpeechSettings.", e);
            throw new RuntimeException(e);
        }

        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create(settings)) {
            // Make sure the summary doesn't include any markdown formatting
            Node document = parser.parse(sqlResponse.getResponse());
            final String renderedText = renderer.render(document);

            SynthesizeSpeechRequest request =
                    SynthesizeSpeechRequest.newBuilder()
                            .setInput(SynthesisInput.newBuilder()
                                    .setText(renderedText)
                                    .build())
                            .setVoice(VoiceSelectionParams.newBuilder()
                                    .setLanguageCode("en-US")
                                    .setSsmlGender(SsmlVoiceGender.FEMALE)
                                    .build())
                            .setAudioConfig(AudioConfig.newBuilder()
                                    .setAudioEncoding(AudioEncoding.MP3)
                                    .addEffectsProfileId("telephony-class-application")
                                    .build())
                            .build();

            // Call Text-to-Speech API
            SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(request);
            final byte[] audio = response.getAudioContent().toByteArray();

            // Now push the audio response to the output topic
            context.forward(new Record<>(sqlResponse.getSessionId(), audio, record.timestamp()));
        } catch (Exception e) {
            log.error("Error processing text for session id: {}", sqlResponse.getSessionId(), e);
            throw new RuntimeException(e);
        }
    }
}

Sample code to convert text to audio file

Outcomes and benefits

By leveraging LLMs with Confluent and Vertex AI, organizations unlock the full potential of their data and gain a competitive advantage, achieving:

Increased data accessibility: Empowering non-technical users to explore data through natural language
Improved data analysis efficiency: Automating routine SQL tasks, freeing up analyst time for higher-value activities
Enhanced data quality: Identifying and correcting errors in SQL queries through LLM analysis
Faster time-to-insights: Accelerating data exploration and analysis processes
Cost reduction: Optimizing query performance and reducing reliance on SQL experts

Confluent and Google Cloud: A powerful combination for AI-driven data

Together, Confluent and Google Cloud partner to help organizations harness the full potential of their data. By combining Confluent's real-time data-streaming capabilities with Google Cloud's robust infrastructure and AI services, including Vertex AI, businesses can create innovative solutions that drive growth and efficiency.

Key benefits of this collaboration for automating SQL query generation with LLMs include:

Real-time data foundation: Confluent ensures LLMs have access to the freshest data for accurate and relevant query generation.
Scalable AI infrastructure: Vertex AI provides the ideal platform for deploying and managing LLMs.
Data integration and enrichment: Confluent's connectors and data processing capabilities enable seamless integration of diverse data sources.
Accelerated time-to-insights: By automating SQL query generation, businesses can expedite data exploration and analysis.
Improved decision making: Real-time insights derived from automated queries inform better business decisions.

Through this powerful combination, organizations can break down data silos, optimize data pipelines, and unlock the true value of their data assets.

Ready to unlock the full potential of your data?

Contact us to learn how Confluent and Google Cloud can help you build intelligent, data-driven applications. Begin experimenting with Vertex AI and Confluent Cloud on the Google Cloud Marketplace today!

*FlinkAI is currently in public preview as of this writing.
