This article leverages the Google Agent Development Kit (ADK) and the underlying Gemini LLM to build an agentic app with the Gemini Live API, written in Python and deployed to Google Cloud Run.
Aren’t There a Billion Python ADK Demos?
Yes there are.
Python has traditionally been the main language for ML and AI tooling. The goal of this article is to provide a minimal, working ADK streaming multi-modal agent using the latest Gemini Live models.
In the Spirit of Mr. McConaughey’s “alright, alright, alright”
So what is different about this lab compared to all the others out there?
This is one of the first implementations of the latest Gemini 3.1 Flash Live model with the Agent Development Kit (ADK). The starting point for the demo was an existing Codelab, which was updated and re-engineered with Gemini CLI.
The original Codelab is here:
Way Back Home - Building an ADK Bi-Directional Streaming Agent | Google Codelabs
What Is Python?
Python is an interpreted language that allows for rapid development and testing, and it has deep libraries for working with ML and AI.
Python Version Management
One of the downsides of the wide deployment of Python has been managing the language versions across platforms and maintaining a supported version.
The pyenv tool enables deploying consistent versions of Python:
GitHub - pyenv/pyenv: Simple Python version management
As of this writing, the mainstream Python version is 3.13. To validate your current Python version:
admin@ip-172-31-70-211:~/gemini-cli-aws/mcp-lightsail-python-aws$ python --version
Python 3.13.12
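If you need to install or switch versions, a typical pyenv flow looks like this (a sketch; adjust the version number to what pyenv offers on your system):

```shell
# Install and activate Python 3.13 via pyenv, if pyenv is present.
if command -v pyenv >/dev/null 2>&1; then
  pyenv install -s 3.13.12   # -s skips the build when already installed
  pyenv global 3.13.12
fi
python3 --version
```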
Google Cloud Run
Google Cloud Run is a fully managed, serverless compute platform that lets you run containerized applications and services without managing infrastructure. It automatically scales up or down, even to zero, based on traffic, so you pay only for the resources used, billed in 100-millisecond increments.
More info is available here:
Gemini Live Models
Gemini Live is a conversational AI feature from Google that enables free-flowing, real-time voice, video, and screen-sharing interactions, allowing you to brainstorm, learn, or problem-solve through natural dialogue. Powered by the Gemini 3.1 Flash Live model, it provides low-latency, human-like, and emotionally aware speech in over 200 countries.
More details are available here:
Gemini 3.1 Flash Live Preview | Gemini API | Google AI for Developers
The Gemini Live models bring unique real-time capabilities that can be used directly from an agent. A summary of the model is also available here:
https://deepmind.google/models/model-cards/gemini-3-1-flash-live/
Gemini CLI
If not pre-installed you can download the Gemini CLI to interact with the source files and provide real-time assistance:
npm install -g @google/gemini-cli
Testing the Gemini CLI Environment
Once you have all the tools and the correct Node.js version in place, you can test the startup of Gemini CLI. You will need to authenticate with an API key or your Google Account:
▝▜▄ Gemini CLI v0.33.1
▝▜▄
▗▟▀ Logged in with Google /auth
▝▀ Gemini Code Assist Standard /upgrade no sandbox (see /docs) /model Auto (Gemini 3) | 239.8 MB
Node Version Management
Gemini CLI needs a consistent, up-to-date version of Node. The nvm tool can be used to get a standard Node environment.
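A typical nvm flow looks like this (a sketch; nvm is a shell function loaded from your profile, so it is sourced rather than invoked as a binary):

```shell
# Load nvm if it is installed, then pin the current LTS release of Node.
export NVM_DIR="$HOME/.nvm"
if [ -s "$NVM_DIR/nvm.sh" ]; then
  . "$NVM_DIR/nvm.sh"
  nvm install --lts
  nvm use --lts
fi
node --version 2>/dev/null || echo "node not found"
```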
Agent Development Kit
The Google Agent Development Kit (ADK) is an open-source, Python-based framework designed to streamline the creation, deployment, and orchestration of sophisticated, multi-agent AI systems. It treats agent development like software engineering, offering modularity, state management, and built-in tools (like Google Search) to build autonomous agents.
The ADK can be installed from here:
Where do I start?
The strategy for starting multimodal real-time agent development is an incremental, step-by-step approach.
First, the basic development environment is set up with the required system variables and a working Gemini CLI configuration.
Then, a minimal ADK agent is built and tested locally. Finally, the entire solution is deployed to Google Cloud Run.
Set Up the Basic Environment
At this point you should have a working Python environment and a working Gemini CLI installation. All of the relevant code examples and documentation are available in GitHub. The repo has a wide variety of samples, but this lab will focus on the ‘level_3_gemini’ setup.
The next step is to clone the GitHub repository to your local environment:
cd ~
git clone https://github.com/xbill9/way-back-home
cd way-back-home
cd level_3_gemini
Then run init.sh from the cloned directory.
The script will attempt to determine your shell environment and set the correct variables:
source init.sh
If your session times out or you need to re-authenticate, you can run the set_env.sh script to reset your environment variables:
source set_env.sh
Variables like PROJECT_ID need to be set up for use in the various build scripts, so the set_env script can be used to reset the environment after a time-out.
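The exact contents of set_env.sh live in the repo, but a sketch of the kind of variables such a script exports looks like this (everything beyond PROJECT_ID is an illustrative assumption):

```shell
# Illustrative sketch only; the real script derives these from your setup.
export PROJECT_ID="$(gcloud config get-value project 2>/dev/null || true)"
export REGION="us-central1"                  # assumed default region
export GOOGLE_API_KEY="${GOOGLE_API_KEY:-}"  # Gemini API key, set by you
echo "PROJECT_ID=${PROJECT_ID:-unset} REGION=${REGION}"
```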
Build the User Interface
The front end files provide the user interface:
xbill@penguin:~/way-back-home/level_3_gemini$ make frontend
cd frontend && npm install && npm run build
up to date, audited 219 packages in 577ms
49 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
> frontend@0.0.0 build
> vite build
vite v7.3.1 building client environment for production...
✓ 33 modules transformed.
dist/index.html 0.46 kB │ gzip: 0.29 kB
dist/assets/index-xOQlTZZB.css 21.60 kB │ gzip: 4.54 kB
dist/assets/index-0hbet2qm.js 214.56 kB │ gzip: 67.44 kB
✓ built in 1.01s
xbill@penguin:~/way-back-home/level_3_gemini$
Test The User Interface
The mock server test script lets you exercise the interface and grant the browser multimedia permissions without making any external model calls or consuming tokens:
xbill@penguin:~/way-back-home/level_3_gemini$ make mock
. ./mock.sh
http://127.0.0.1:8080/
Serving static files from: /home/xbill/way-back-home/level_3_gemini/frontend/dist
INFO: Started server process [24714]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
The deployed mock front end will look similar to this:
Verify The ADK Installation
To verify the setup, run the ADK CLI locally with the biometric_agent:
xbill@penguin:~/way-back-home/level_3_gemini$ make testadk
. ./testadk.sh
connect to local ADK CLI
Log setup complete: /tmp/agents_log/agent.20260401_101649.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
/home/xbill/.pyenv/versions/3.13.12/lib/python3.13/site-packages/google/adk/cli/cli.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
credential_service = InMemoryCredentialService()
/home/xbill/.pyenv/versions/3.13.12/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
super().__init__()
Running agent biometric_agent, type exit to exit.
[biometric_agent]: Scanner Online.
Test The ADK Web Interface
This tests the Audio / Video ADK agent interactions:
xbill@penguin:~/way-back-home/level_3_gemini$ make adk
. ./runadk.sh
connect on http://127.0.0.1:8000/
2026-04-01 10:18:01,644 - INFO - service_factory.py:266 - Using in-memory memory service
2026-04-01 10:18:01,644 - INFO - local_storage.py:84 - Using per-agent session storage rooted at /home/xbill/way-back-home/level_3_gemini/backend/app
2026-04-01 10:18:01,644 - INFO - local_storage.py:110 - Using file artifact service at /home/xbill/way-back-home/level_3_gemini/backend/app/.adk/artifacts
/home/xbill/.pyenv/versions/3.13.12/lib/python3.13/site-packages/google/adk/cli/fast_api.py:192: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
credential_service = InMemoryCredentialService()
/home/xbill/.pyenv/versions/3.13.12/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
super().__init__()
INFO: Started server process [26851]
INFO: Waiting for application startup.
+-----------------------------------------------------------------------------+
| ADK Web Server started |
| |
| For local testing, access at http://0.0.0.0:8000. |
+-----------------------------------------------------------------------------+
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Then use the web interface, either on the local interface (127.0.0.1) or the catch-all interface (0.0.0.0), depending on your environment:
Special note for Google Cloud Shell deployments: add a CORS allow_origins exemption to allow the ADK agent to run:
adk web --host 0.0.0.0 --allow_origins 'regex:.*'
Lint and Test the Main Python Code
The final step is to build, lint, and test the main Python code.
To Lint:
xbill@penguin:~/way-back-home/level_3_gemini$ make lint
ruff check .
All checks passed!
ruff format --check .
10 files already formatted
cd frontend && npm run lint
> frontend@0.0.0 lint
> eslint .
To Test:
xbill@penguin:~/way-back-home/level_3_gemini$ make test
python -m pytest
============================================================ test session starts ============================================================
platform linux -- Python 3.13.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/xbill
configfile: pyproject.toml
plugins: anyio-4.11.0
collected 11 items
backend/app/biometric_agent/test_agent.py ..... [45%]
test_live_connection.py .. [63%]
test_ws_backend.py .. [81%]
test_ws_backend_v2.py .. [100%]
============================================================ 11 passed in 3.20s =============================================================
xbill@penguin:~/way-back-home/level_3_gemini$
Running Locally
The main Python Code can then be run locally:
xbill@penguin:~/way-back-home/level_3_gemini$ make run
. ./biosync.sh
Local URL
http://127.0.0.1:8080/
2026-04-01 10:59:43,112 - INFO - System Config: 2.0 FPS, 10.0s Heartbeat
Warning: Frontend build not found at /home/xbill/way-back-home/level_3_gemini/frontend/dist
Please run 'npm run build' in the frontend directory.
INFO: Started server process [1178]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
Then connect to the local front end:
Deploying to Google Cloud Run
A utility script runs the deployment to Cloud Run. Run the deploy script from the local system:
xbill@penguin:~/way-back-home/level_3_gemini$ source deploy.sh
Deploying container to Cloud Run service [biometric-scout] in project [comglitn] region [us-central1]
⠏ Deploying...
⠏ Creating Revision...
. Routing traffic...
✓ Setting IAM Policy... 0.0s 0.0s
You can validate the final result by checking the messages:
Service [biometric-scout] revision [biometric-scout-00014-rh7] has been deployed and is serving 100 percent of traffic.
Service URL: https://biometric-scout-1056842563084.us-central1.run.app
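deploy.sh wraps the deployment; the underlying gcloud invocation likely looks similar to the following (printed rather than executed so nothing deploys by accident; the service name and region follow the output above, and the flags are an assumption):

```shell
# Print the probable deploy command; copy it out to run it for real.
cat <<'EOF'
gcloud run deploy biometric-scout \
  --source . \
  --region us-central1 \
  --allow-unauthenticated
EOF
```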
Once the container is deployed, you can then get the endpoint:
xbill@penguin:~/way-back-home/level_3_gemini$ make endpoint
https://biometric-scout-fgasxpwzoq-uc.a.run.app
The service will be visible in the Cloud Run console:
Running the Web Interface
Start a connection to the Cloud Run deployed app:
https://biometric-scout-fgasxpwzoq-uc.a.run.app
Then connect to the app :
Then use the Live model to process audio and video:
Finally — complete the sequence:
So What Really Changed in the Lab? It looks the Same to Me!
This project was a great learning experience: a first-hand look at working with different LLM models, common issues with the ADK, and the nuances of working with a real-time Live model.
The key changes include:
- Gemini Models vs Vertex AI Models
The original lab used LLM models hosted on Vertex AI with PROJECT_ID and REGION authentication. The Gemini 3.1 Flash Live model is currently supported only through the Gemini API, which requires an API key. When the authentication method and API change, the available models also change, so you have to verify that the model you want to use is available on that API.
- Support for Gemini 3.1 Live
The API support for the 3.1 version of the Live model is slightly different and required a translation layer for the current version of the ADK. This "monkey patch" acts as a workaround until full support for the 3.1 Live model lands in the ADK. Several open GitHub issues track this, and they are noted in the GEMINI.md file.
- Cloud Shell CORS over-ride
Recent versions of the ADK are more restrictive about cross-origin requests when running from Cloud Shell.
- ADK CLI Support
The ADK agent was extended to detect when it is running from the CLI and avoid throwing errors related to the Live model.
- Audio Support Client Side
The client-side audio API was updated to the recommended AudioWorklet implementation to move audio processing off the main thread.
- Re-Architected the Protocol to be Mostly JSON-Free
The original version wrapped the binary messages in JSON, which makes them easier to work with but also introduces overhead. The updated code sends the audio and video streams as raw binary data.
- Gemini CLI Development support
README.md and GEMINI.md were added to the project- as well as an Agent Skill for working with Live Models.
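The mostly JSON-free protocol from the changes above can be sketched in a few lines of Python. The one-byte tags (0x01 audio, 0x02 video) match the project's binary protocol; the helper names here are hypothetical:

```python
# Hypothetical sketch of a tag-prefixed binary frame: one type byte,
# then the raw media payload, instead of base64-in-JSON wrapping.
AUDIO, VIDEO = 0x01, 0x02


def frame(kind: int, payload: bytes) -> bytes:
    """Prefix raw media bytes with a one-byte type tag."""
    return bytes([kind]) + payload


def parse(message: bytes) -> tuple[int, bytes]:
    """Split a received frame back into (tag, payload)."""
    return message[0], message[1:]


pcm_chunk = b"\x00\x01" * 4  # stand-in for raw 16-bit PCM audio
tag, payload = parse(frame(AUDIO, pcm_chunk))
print(tag == AUDIO and payload == pcm_chunk)  # → True
```

Because the payload is never re-encoded, the per-message overhead drops to a single byte, which matters at a steady audio-plus-video frame rate.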
Gemini CLI Code Review
As a final step, Gemini CLI was used for a full code review of the project:
✦ Based on my comprehensive review of the "Alpha Rescue Drone - Biometric Security System," the codebase is exceptionally well-structured,
follows modern engineering standards, and strictly adheres to the specialized requirements for Gemini 3.1 Flash Live and the Google Agent
Development Kit (ADK).
Core Architecture & Integration
The system excels at orchestrating a high-frequency, multimodal data stream. The use of a binary protocol (0x01 for audio, 0x02 for video)
in useGeminiSocket.js minimizes overhead, while the monkey-patching utility in patch_adk.py demonstrates a sophisticated understanding of
the evolving Gemini 3.1 Live API. This proactive approach ensures compatibility with features like send_realtime_input that are not yet
natively handled by the standard ADK.
Backend Analysis
- Agent Intelligence: The instructions in agent.py are precise and prioritize safety (offensive gesture detection) and hidden overrides
("Devil's Horns") effectively.
- Resilience: The backend implements a heartbeat stimulus (CONTINUE_SURVEILLANCE) and a neural handshake, which are critical for
maintaining model engagement during visual-only periods.
- Model Fallback: The get_model_id() logic is clever, automatically switching to gemini-2.5-flash for CLI runs to prevent 404 errors,
while defaulting to the high-performance gemini-3.1-flash-live-preview for production.
Frontend & UI/UX
- Aesthetic: The "Neon Cyan" HUD in BiometricLock.jsx provides a high-fidelity, futuristic feel that aligns with the "Mission Alpha"
theme.
- Feedback Loop: The synchronization between the random sequence generation, visual matching, and the special "Heavy Metal" audio event
creates a compelling user experience.
- Efficiency: Utilizing toBlob with JPEG compression (0.6) for the 2 FPS video stream is an optimal choice for balancing visual clarity
with network latency.
Testing & Maintainability
- Validation: The project maintains a clean slate with 100% test and linting compliance. The inclusion of mock.sh is a significant
developer productivity boost.
- Standards: The use of Ruff for Python and ESLint for the frontend ensures long-term code quality.
Recommendations
1. Connection Resilience: While the SYSTEM_ERROR closes the socket correctly, the frontend could benefit from a more graceful
"Reconnecting..." state if the network fluctuates during a mission.
2. Patch Management: As the Gemini Live API stabilizes, keep an eye on official ADK updates to deprecate the manual unrolling of
media_chunks in patch_adk.py.
3. Security: Verify that the .env variables are correctly injected into the Cloud Run environment via the Makefile or cloudbuild.yaml
without being exposed in the build logs.
Overall, the system is a robust and creative implementation of real-time multimodal AI. Authentication Protocol: Stable. 🤘
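The get_model_id() fallback the review mentions can be sketched like this; the environment marker and exact structure are assumptions for illustration, while the model names are the ones used in this lab:

```python
import os

LIVE_MODEL = "gemini-3.1-flash-live-preview"  # streaming/web path
CLI_FALLBACK = "gemini-2.5-flash"             # plain ADK CLI path


def get_model_id() -> str:
    # ADK_CLI_RUN is an illustrative marker for "running from the CLI";
    # the real detection logic lives in the repo's agent code.
    if os.environ.get("ADK_CLI_RUN") == "1":
        return CLI_FALLBACK
    return LIVE_MODEL


os.environ["ADK_CLI_RUN"] = "1"
print(get_model_id())  # → gemini-2.5-flash
```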
Summary
The Agent Development Kit was used to build a multi-modal agent based on the Gemini Live model. The agent was tested locally with the CLI and then deployed to Google Cloud Run. Several key takeaways and lessons learned from transitioning to a new Gemini Live model were summarized. Finally, Gemini CLI was used for a complete project code review.