Background

Google Agentic Era Hackathon

3/31/2025

On 18 March 2025, our team at Norma had the pleasure of attending the Google Agentic Era Hackathon in Paris — an exclusive event designed for Google Partners.

What is the Agentic Era hackathon about?

As AI continues to evolve, we are entering what Google calls the Agentic Era — a time when AI agents are no longer just tools, but collaborators capable of reasoning, planning, and autonomously taking actions to achieve goals. This hackathon was an opportunity to explore and build with these next-generation AI capabilities, using Google’s cutting-edge tools and platforms.

The Agent Starter Pack

To help us get started, Google provided every team with an “Agent Starter Pack”: a ready-to-use setup that bundles the tools needed to build and deploy an AI agent. The pack combines several Google products and frameworks, such as:

  • Vertex AI for building, deploying, and managing machine learning models
  • LangChain for prompt orchestration and tool integration
  • Firebase and BigQuery for real-time data storage and querying
  • Google Cloud Logging for accessing and monitoring application logs
  • Google Workspace APIs for integrations with tools like Gmail, Calendar, and Docs

The full starter pack is available on GitHub: agent-starter-pack

Our challenge was to build on this foundation to design and prototype a functional AI agent with real-world utility.

What we designed

We decided to build an “AI-Powered Agent for Production Monitoring & Code Analysis”.

Our project focused on solving a common but time-consuming problem for developers: identifying and resolving critical issues hidden within cloud logs.

We designed an AI-powered agent that automates issue detection, performs code analysis, and suggests actionable solutions, all seamlessly integrated into Slack, our team’s daily communication tool.

Core Functionalities:

  • Automated issue detection: the agent monitors production logs in real time, identifying anomalies and potential failures.
  • Code analysis & root cause identification: it investigates the source of the error and pinpoints the origin of the bug.
  • Actionable insights: instead of just reporting problems, the agent explains them clearly and proposes potential fixes.
  • Slack Bot integration: instant Slack notifications are triggered only for critical issues, enabling focused troubleshooting.

Agent code overview

The agent-starter-pack comes with a full Streamlit app and several code files, including agent.py.

1. After installing the required libraries and configuring environment variables, we defined a system prompt that is passed to the gemini-2.0-flash-001 language model.
from datetime import datetime
from zoneinfo import ZoneInfo

LOCATION = "us-central1"
LLM = "gemini-2.0-flash-001"

system_message = f"""
You are an advanced production monitoring agent responsible for overseeing our deployed environment. Your tasks include:

1. Querying logs and traces from the past 24 hours—do not look beyond this period.
2. Answer questions such as:
- "Can you give me the list of calls we had in the Cloud Run service in the last 12 hours?"
- "What was the last issue faced?"
- "Analyze the logs and traces to determine which part of the project’s code might be the root cause."
3. For any incident detected:
- Extract key details (error messages, timestamps, etc.) from logs and traces.
- Identify if the error references a specific source file.
- If a file is referenced, search the corresponding Github repository to locate the relevant code.
- Identify the specific code line(s) that might be causing the issue.
- Propose a concise, actionable fix based on your analysis.
4. If no anomalies or errors are found, confirm that the system is operating normally.

Always include relevant context (e.g., error details, affected file names) in your responses and ensure that the timeframe does not exceed 24 hours.
Current date and time (Europe/Paris): {datetime.now(ZoneInfo("Europe/Paris")).strftime('%Y-%m-%d %H:%M')}
"""

2. We then set up a Slack alert context, allowing the agent to send error logs retrieved from GCP along with their associated severity level.

import json
import os

import requests

# Slack incoming-webhook URL; the environment-variable name is an assumption.
SLACK_WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL", "")


# 1. Create an alert context
def send_slack_alert(error_logs, severity="ERROR", dry_run=False):
    """Sends an error message with GCP logs to Slack."""
    payload = {
        "attachments": [
            {
                "fallback": f"*{severity}* in production",
                "text": f"```{error_logs}```",
            }
        ]
    }

    if dry_run:
        print("Slack alert:", json.dumps(payload, indent=2))
        return

    headers = {"Content-Type": "application/json"}
    response = requests.post(SLACK_WEBHOOK_URL, data=json.dumps(payload), headers=headers)

    if response.status_code != 200:
        print(f"Error while sending message to Slack: {response.text}")

3. We developed four tools using LangGraph to enhance agent capabilities:

  • Retrieve GCP logs
  • Fetch associated GitHub repository
  • Access related files in the repository
  • Search code within the GitHub repo

These tools were bundled and passed to the agent, enabling dynamic decision-making by the LLM based on user input.

Here is an example: the tool that searches the GitHub repository.

import json
import os

from github import Auth, Github
from langchain_core.tools import tool

# GitHub personal access token; the environment-variable name is an assumption.
ACCESS_TOKEN = os.environ.get("GITHUB_TOKEN", "")


@tool
def search_github_repo(query: str) -> str:
    """
    Search the GitHub repository for files matching the query.

    Args:
        query (str): A keyword or regex pattern to filter file names.

    Returns:
        str: A list of file paths (as a string) that match the query.
    """
    auth = Auth.Token(ACCESS_TOKEN)
    g = Github(auth=auth)
    try:
        repo = g.get_repo("hello-world-norma/hello-world-api-service")
        # Retrieve the entire repository tree (adjust the branch name as needed)
        tree = repo.get_git_tree("main", recursive=True)
        matching_files = [file.path for file in tree.tree if query.lower() in file.path.lower()]
        return json.dumps(matching_files, indent=2)
    except Exception as e:
        return f"GitHub repository search error: {e}"
    finally:
        g.close()
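
The log-retrieval tool follows the same pattern. Our exact implementation is not reproduced here, but a minimal sketch of what get_gcp_logs could look like with the google-cloud-logging client is shown below; the filter, the selected fields, and the result limit are illustrative assumptions:

import json
from datetime import datetime, timedelta, timezone

from google.cloud import logging as gcp_logging
from langchain_core.tools import tool


@tool
def get_gcp_logs(query: str) -> str:
    """
    Retrieve log entries from Cloud Logging for the past 24 hours.

    Args:
        query (str): An optional Cloud Logging filter fragment, e.g. 'severity>=ERROR'.

    Returns:
        str: A JSON string listing the matching log entries.
    """
    client = gcp_logging.Client()
    # Restrict the search to the last 24 hours, as required by the system prompt.
    since = (datetime.now(timezone.utc) - timedelta(hours=24)).isoformat()
    log_filter = f'timestamp>="{since}"'
    if query:
        log_filter = f"{log_filter} AND {query}"

    entries = client.list_entries(filter_=log_filter, max_results=50)
    results = [
        {"timestamp": str(entry.timestamp), "severity": entry.severity, "payload": str(entry.payload)}
        for entry in entries
    ]
    return json.dumps(results, indent=2)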

All tools were then grouped into a tools list, which is passed to the agent.

tools = [get_gcp_logs, query_github_file, search_github_repo, search_github_code]

4. We instantiated the language model using ChatVertexAI, which is capable of recognizing when and how to call the tools based on user inputs.

from langchain_google_vertexai import ChatVertexAI

# 3. Set up the language model
llm = ChatVertexAI(
    model=LLM, location=LOCATION, temperature=0, max_tokens=1024, streaming=True
).bind_tools(tools)

5. Finally, we defined the agent’s behavior using LangGraph, connecting the model and the tools into a structured workflow. The LLM dynamically decides whether to call a tool; if not, its response message is returned directly to the user.

from langchain_core.messages import BaseMessage
from langchain_core.runnables import RunnableConfig
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode


# 4. Define workflow components
def should_continue(state: MessagesState) -> str:
    """Determines whether to use tools or end the conversation."""
    last_message = state["messages"][-1]
    return "tools" if last_message.tool_calls else END


def call_model(state: MessagesState, config: RunnableConfig) -> dict[str, BaseMessage]:
    """Calls the language model and returns the response."""
    messages_with_system = [{"type": "system", "content": system_message}] + state["messages"]
    # Forward the RunnableConfig object to ensure the agent is capable of streaming the response.
    response = llm.invoke(messages_with_system, config)
    return {"messages": response}


# 5. Create the workflow graph
workflow = StateGraph(MessagesState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", ToolNode(tools))
workflow.set_entry_point("agent")

# 6. Define graph edges
workflow.add_conditional_edges("agent", should_continue)
workflow.add_edge("tools", "agent")

# 7. Compile the workflow
agent = workflow.compile()
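
Once compiled, the graph can be called like any LangGraph runnable. The Streamlit frontend streams the response instead, but for a quick check a plain synchronous call works:

from langchain_core.messages import HumanMessage

# Ask the agent a monitoring question; it decides on its own whether to call tools first.
result = agent.invoke(
    {"messages": [HumanMessage(content="What was the last issue faced?")]}
)

# The final answer is the last message appended by the graph.
print(result["messages"][-1].content)

If the model decides to call a tool, the graph routes through the tools node and back to the agent before this final message is produced.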

User Experience

Developers can interact with the agent directly in the Streamlit app, ask for context, get summaries of errors, and receive guidance on next steps. The agent highlights potential causes in the code and offers targeted support, making it especially useful for teams managing multiple projects in production.

Technical Implementation on GCP:

Our implementation was based on the Agent Starter Pack.

We:

  • Set up the agent-starter-pack in Google Cloud.
  • Added tools for log monitoring in GCP.
  • Enabled code repository access to enhance contextual understanding of issues.
  • Integrated a Slack webhook to connect the agent directly to our communication channel.

Business Impact:

  • Faster issue resolution: cuts down debugging time so devs can focus on building features.
  • Minimized downtime: proactive monitoring helps prevent service disruptions.
  • 24/7 support: the agent works continuously, supporting multiple environments.
  • Improved productivity: repetitive debugging tasks are offloaded to the agent, reducing stress for engineers and cognitive load.

Difficulties faced

We made a point of building a secure, production-ready solution that went beyond a demonstration of AI capabilities: it had to prove its value in the real world and be deployable in a professional environment. This meant that our team had to navigate not only the technical complexities of building an AI agent, but also deliver real added value within a very limited timeframe.

To make sure the solution was effective, we tested it against one of our real production environments, which, a few days earlier, had alerted us to a 422 error. This allowed us to diagnose the origin of the error and resolve the problem more quickly.

Much of our effort went into meeting the hackathon’s evaluation criteria, which included presenting a demo, ensuring that the agent was safely deployed in production, and justifying the relevance and impact of our use case.

We also ran into issues with IAM permissions, especially when accessing GCP logs and interacting with other cloud services. Properly configuring access rights took additional time and troubleshooting.

In addition, the Slack alert integration is only partially working: alert messages are sent to Slack automatically, but their content needs refinement to ensure clarity, relevance, and completeness, since it currently contains either overly detailed logs or an empty message.

Improvements

To move the project forward, we need to optimize the way the tools are queried. The structure of the system prompt is also key to defining the LLM’s role, and refining it is a clear avenue for improvement.

Finally, we want to better manage the formatting and structure of GCP logs so they are clearly presented to users. This would make it easier to interpret errors and act quickly in production environments.

Video demonstration

