Orchestrating AI Agents in Your Android CI/CD Pipeline
This article is part of the AI Agents in Android Development series. Before continuing, I recommend checking out:
- Beyond the Chat: Why You Need AI Agents on a Multi-Agent Environment in Android – the theoretical foundation of what agents are and why they matter.
- Your Virtual Staff: Configuring Sentinel, Bolt, and Palette – how to set up each agent in your repository using AGENTS.md.
- Autonomous AI Agents in Android: Beyond the Assistant – the conceptual leap to agents that act without human intervention.
You've configured your agents. You know what Sentinel does, what Bolt does, what Scribe does. You call them manually when needed and they work great. But there's a next level: making those agents activate automatically, at exactly the right moment, as a natural part of your workflow. That's what integrating them into your CI/CD pipeline means.
In this article, weβll build an architecture where three specialized agents collaborate in a coordinated way every time you open a Pull Request in your Android project.
🗺️ The Architecture: Three Agents, One Pipeline
The classic Android CI/CD pipeline has well-known steps: compile, run tests, analyze lint, and publish. What we'll do is insert AI agents as additional jobs that run in parallel or in sequence depending on their dependencies.
Our three agents will be:
- Sentinel – Code Reviewer. Analyzes the PR diff, verifies Kotlin/Clean Architecture conventions, looks for security issues, and posts review comments.
- Scribe – Documenter. Generates or updates KDoc for new functions, updates CHANGELOG.md with PR changes, and verifies that UseCases have proper descriptions.
- Bolt – Performance. Runs Android benchmarks before and after the change, compares results, and comments if there are performance regressions.
Key principle: Each agent should be stateless within its job. It receives event context (the PR diff, the repo state), does its work, and communicates output via GitHub comments or artifacts. It doesn't depend on the previous agent having finished – unless the logic specifically requires it.
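To make that contract concrete, here is a minimal Python sketch of the stateless agent shape. All names here are illustrative – they are not part of the actual scripts shown later:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentContext:
    """Everything an agent receives from the pipeline: no hidden state."""
    pr_number: int
    diff: str
    agents_config: str  # contents of AGENTS.md

@dataclass
class AgentResult:
    """Everything an agent hands back to the pipeline."""
    summary: str
    passed: bool
    comments: list = field(default_factory=list)

def run_agent(ctx: AgentContext) -> AgentResult:
    # A stateless agent derives its output only from ctx, so re-running
    # the job with the same context yields the same result.
    ok = "TODO" not in ctx.diff  # placeholder check, not a real review
    return AgentResult(summary=f"Reviewed PR #{ctx.pr_number}", passed=ok)
```

Because the function depends only on its input, any job can be retried in isolation without replaying the rest of the pipeline.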
⚙️ The Workflow YAML Structure
The workflow skeleton has three agent jobs. Sentinel runs first (its comments can block the merge if critical issues are found), while Scribe and Bolt run in parallel after Sentinel approves.
```yaml
# .github/workflows/ai-agents-pipeline.yml
name: AI Agents Pipeline

on:
  pull_request:
    types: [opened, synchronize, reopened]
    branches: [main, develop]

permissions:
  contents: write
  pull-requests: write
  issues: write

jobs:
  # ── JOB 1: Sentinel – Code Review Agent ──────────────────────────────
  sentinel-review:
    name: "🛡️ Sentinel: Code Review"
    runs-on: ubuntu-latest
    outputs:
      review_passed: ${{ steps.review.outputs.passed }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Get PR diff
        id: diff
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > pr_diff.txt
          echo "diff_lines=$(wc -l < pr_diff.txt)" >> "$GITHUB_OUTPUT"
      - name: Run Sentinel Agent
        id: review
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          python scripts/agents/sentinel_review.py \
            --diff pr_diff.txt \
            --pr-number "$PR_NUMBER" \
            --agents-config AGENTS.md \
            --output review_result.json
      - name: Post review comment
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const result = JSON.parse(fs.readFileSync('review_result.json'));
            if (result.comments.length > 0) {
              await github.rest.pulls.createReview({
                owner: context.repo.owner,
                repo: context.repo.repo,
                pull_number: context.payload.pull_request.number,
                body: result.summary,
                event: result.critical_issues ? 'REQUEST_CHANGES' : 'COMMENT',
                comments: result.comments
              });
            }

  # ── JOB 2: Scribe – Documentation Agent ──────────────────────────────
  scribe-docs:
    name: "📝 Scribe: Documentation"
    runs-on: ubuntu-latest
    needs: sentinel-review
    if: needs.sentinel-review.outputs.review_passed == 'true'
    steps:
      - uses: actions/checkout@v4
        with:
          # Check out the PR branch itself (not the detached merge commit)
          # so the documentation commit can be pushed back.
          ref: ${{ github.head_ref }}
          token: ${{ secrets.GITHUB_TOKEN }}
      - name: Run Scribe Agent
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          python scripts/agents/scribe_docs.py \
            --branch ${{ github.head_ref }} \
            --base ${{ github.base_ref }} \
            --update-kdoc \
            --update-changelog
      - name: Commit documentation updates
        run: |
          git config user.name "Scribe Agent"
          git config user.email "scribe-agent@arceapps.github.io"
          git add -A
          git diff --staged --quiet || \
            git commit -m "docs(scribe): auto-update KDoc and CHANGELOG [skip ci]"
          git push

  # ── JOB 3: Bolt – Performance Agent ──────────────────────────────────
  bolt-benchmarks:
    name: "⚡ Bolt: Performance Benchmarks"
    runs-on: ubuntu-latest
    needs: sentinel-review
    if: needs.sentinel-review.outputs.review_passed == 'true'
    steps:
      - uses: actions/checkout@v4
      - name: Setup JDK 21
        uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
      - name: Run Macrobenchmarks
        # Connected benchmarks need an attached emulator or device on the runner.
        run: ./gradlew :benchmark:connectedBenchmarkAndroidTest
      - name: Run Bolt Analysis
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python scripts/agents/bolt_benchmarks.py \
            --results benchmark/outputs/ \
            --baseline benchmark/baseline.json \
            --pr-number ${{ github.event.pull_request.number }} \
            --threshold-regression 10
```
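Note what the `--threshold-regression 10` flag implies: Bolt needs a pure comparison step between the baseline and the current run. Here is a hedged sketch of that logic, assuming a flat metric-name to median-milliseconds mapping (the real Macrobenchmark output JSON is richer than this):

```python
def find_regressions(baseline: dict, current: dict, threshold_pct: float) -> list:
    """Return (metric, delta %) pairs where the current value exceeds
    the baseline by more than threshold_pct."""
    regressions = []
    for metric, base_value in baseline.items():
        cur_value = current.get(metric)
        if cur_value is None or base_value == 0:
            continue  # metric missing in this run, or baseline unusable
        delta_pct = (cur_value - base_value) / base_value * 100
        if delta_pct > threshold_pct:
            regressions.append((metric, round(delta_pct, 1)))
    return regressions
```

If the returned list is non-empty, the script can format it into the PR comment that Bolt posts.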
🤖 Implementing Each Agent as a Python Script
The GitHub Actions workflow is the orchestrator, but the actual agent logic lives in Python scripts that call the LLM API. Here's the structure for sentinel_review.py:
```python
# scripts/agents/sentinel_review.py
import argparse
import json
import os

from openai import OpenAI


def load_agents_config(path: str) -> str:
    """Reads AGENTS.md to extract project conventions."""
    with open(path) as f:
        return f.read()


def build_sentinel_prompt(diff: str, config: str) -> str:
    return f"""You are Sentinel, an agent specialized in Android/Kotlin code review.

PROJECT CONVENTIONS:
{config}

PULL REQUEST DIFF:
{diff}

Review the code according to the conventions. For each issue found, indicate:
- file: file path
- line: approximate line number
- severity: critical | warning | suggestion
- body: description of the issue and how to fix it

Respond in JSON with this structure:
{{
  "summary": "Review summary in Markdown",
  "critical_issues": boolean,
  "comments": [
    {{"path": "...", "position": N, "body": "..."}}
  ]
}}"""


def run_sentinel(diff_path: str, pr_number: str, config_path: str, output_path: str):
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    diff = open(diff_path).read()[:15000]  # context limit
    config = load_agents_config(config_path)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": build_sentinel_prompt(diff, config)}],
        response_format={"type": "json_object"},
    )

    result = json.loads(response.choices[0].message.content)
    result["passed"] = not result["critical_issues"]

    with open(output_path, "w") as f:
        json.dump(result, f, indent=2)

    # Output for GitHub Actions
    with open(os.environ.get("GITHUB_OUTPUT", "/dev/null"), "a") as gho:
        gho.write(f"passed={'true' if result['passed'] else 'false'}\n")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--diff")
    parser.add_argument("--pr-number")
    parser.add_argument("--agents-config")
    parser.add_argument("--output")
    args = parser.parse_args()
    run_sentinel(args.diff, args.pr_number, args.agents_config, args.output)
```
Security note: Never send the full diff if it exceeds the model's context window. Implement chunking, or filter only the relevant `.kt` files to maintain analysis quality.
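One way to do that filtering, sketched here as a hypothetical helper (it is not part of sentinel_review.py as shown above), is to split the unified diff on its `diff --git` headers and keep only the Kotlin files:

```python
def filter_diff_by_extension(diff: str, extensions=(".kt", ".kts")) -> str:
    """Keep only the per-file chunks of a unified git diff whose path
    ends with one of the given extensions."""
    kept, keep_current = [], False
    for line in diff.splitlines():
        if line.startswith("diff --git "):
            # e.g. "diff --git a/app/src/Main.kt b/app/src/Main.kt"
            path = line.split(" b/")[-1]
            keep_current = path.endswith(extensions)
        if keep_current:
            kept.append(line)
    return "\n".join(kept)
```

Running this before truncating to the context limit means the budget is spent on the code Sentinel can actually review, rather than on lockfiles and generated resources.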
📋 AGENTS.md as the Pipeline's Source of Truth
The key to keeping agents consistent across runs is having all of them read the same AGENTS.md. This file defines the conventions Sentinel must verify, the tone Scribe must use, and the thresholds Bolt must respect.
```markdown
<!-- AGENTS.md – excerpt from the CI/CD section -->

## Pipeline Rules

### Sentinel (Code Review)
- CRITICAL: Every public function must have KDoc
- CRITICAL: UseCases must be in the `domain.usecase` package
- WARNING: Avoid business logic in ViewModels
- SUGGESTION: Prefer `StateFlow` over `LiveData` in new code

### Scribe (Documentation)
- KDoc format: first paragraph = description, @param = all non-obvious parameters
- CHANGELOG: follow Keep A Changelog format, [Unreleased] section
- Language: English for KDoc, project language for internal comments

### Bolt (Performance)
- Regression threshold: 10% on startup time, 15% on list operations
- Baseline file: benchmark/baseline.json (update on each release)
- Priority metrics: TimeToFullDisplayMs, FrameOverrunMs
```
With this contract, any new agent added to your toolset in the future can read the same AGENTS.md and behave consistently. It's like automated onboarding for your AI agents.
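A small, hypothetical helper makes that onboarding concrete: each agent extracts only its own `### Agent (...)` section before building its prompt, so Sentinel never sees Bolt's thresholds and vice versa. This sketch assumes the heading convention from the excerpt above:

```python
def extract_agent_section(agents_md: str, agent_name: str) -> str:
    """Return the body of the '### <agent_name> (...)' section of
    AGENTS.md, so each agent is prompted only with its own rules."""
    section, capturing = [], False
    for line in agents_md.splitlines():
        if line.startswith("### "):
            # A new level-3 heading opens or closes the capture window.
            capturing = line[4:].startswith(agent_name)
            continue
        if capturing:
            section.append(line)
    return "\n".join(section).strip()
```

Scoping prompts this way also keeps each LLM call smaller and cheaper as AGENTS.md grows.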
🔀 Advanced Coordination: Conditional Flows
The basic example is linear: Sentinel → (Scribe + Bolt). But real pipelines need more sophisticated conditional logic.
Skip Scribe for Small PRs
```yaml
scribe-docs:
  needs: sentinel-review
  if: |
    needs.sentinel-review.outputs.review_passed == 'true' &&
    github.event.pull_request.changed_files > 3
```
Security Agent Only for Data Layer Changes
```yaml
security-scan:
  needs: sentinel-review
  # Note: a job-level `if` cannot read `steps.*` from another job; expose
  # the changed-file list as an output of sentinel-review instead.
  if: |
    contains(github.event.pull_request.labels.*.name, 'data-layer') ||
    contains(needs.sentinel-review.outputs.changed_files, 'data/repository')
```
Auto-Correction Loop
The most advanced pattern is the self-healing pipeline: if Sentinel finds simple style issues (non-critical), it fixes them automatically, commits, and re-triggers the workflow.
```yaml
- name: Auto-fix style issues
  if: steps.review.outputs.has_style_issues == 'true'
  run: |
    python scripts/agents/sentinel_autofix.py \
      --issues review_result.json \
      --apply-fixes
    git commit -am "fix(sentinel): auto-fix style issues [skip ci]"
    git push
```
Watch out for infinite loops! Always add the `[skip ci]` marker to agent commits, or implement a check that detects whether the commit was made by the agent, to avoid re-triggering the pipeline.
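Such a guard can live at the top of each agent script. A hedged sketch follows; the default author names match the `git config` identity Scribe uses above, and you would extend the tuple for your other agents:

```python
def should_skip_run(commit_message: str, commit_author: str,
                    agent_authors=("Scribe Agent", "Sentinel Agent")) -> bool:
    """Guard against self-triggered runs: skip when the triggering commit
    was produced by one of our agents or carries the [skip ci] marker."""
    return "[skip ci]" in commit_message or commit_author in agent_authors
```

Calling this first and exiting early gives you a second line of defense even if a CI provider ever ignores the `[skip ci]` marker for a given event type.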
🚀 Gradual Rollout: From Pilot to Production
Don't implement all of this at once. A gradual rollout reduces risk and gives you time to calibrate your agents:
Weeks 1-2: Sentinel in COMMENT mode (non-blocking). Observe the quality of its reviews and adjust the prompt.
Weeks 3-4: Sentinel in REQUEST_CHANGES mode for critical issues. Add Scribe in read-only mode (generates KDoc but doesnβt commit it).
Week 5+: Full pipeline. Bolt active with conservative thresholds (30% regression before alerting). Fine-tune until you reach 10%.
This approach gives you time to trust your agents before granting them write permissions on the repository.
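One way to keep that rollout mechanical rather than ad hoc is to encode the phase in a small table the Sentinel script consults. The phase names and values below are illustrative, not prescribed:

```python
# Maps a rollout phase to (review event for critical issues, Bolt
# regression threshold in %); None means Bolt is disabled in that phase.
ROLLOUT_PHASES = {
    "pilot":   ("COMMENT", None),          # weeks 1-2: observe only
    "enforce": ("REQUEST_CHANGES", None),  # weeks 3-4: Sentinel blocks
    "full":    ("REQUEST_CHANGES", 30.0),  # week 5+: Bolt on, conservative
}

def review_event(phase: str, has_critical_issues: bool) -> str:
    """Decide how Sentinel posts its review in the current phase."""
    blocking_event, _ = ROLLOUT_PHASES[phase]
    return blocking_event if has_critical_issues else "COMMENT"
```

Promoting the pipeline then becomes a one-line change (or an environment variable) instead of a workflow rewrite.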
Conclusion
Integrating AI agents into your CI/CD pipeline isn't about replacing your current process – it's about adding a layer of specialized intelligence at exactly the moments where it delivers the most value. Sentinel ensures code consistency without the tech lead having to manually review every PR. Scribe makes sure documentation doesn't fall behind the code. Bolt prevents performance regressions from reaching production undetected.
The AGENTS.md file acts as the contract that keeps all agents – human and AI – working under the same rules. And GitHub Actions is the orchestrator that decides when and in what order each one acts.
The natural next step is adding a planning agent that coordinates the others using a framework like CrewAI or LangGraph. But that's a story for another time.
You might also be interested in
Autonomous AI Agents in Android Development: Beyond the Assistant
How autonomous AI agents transform Android development: from multi-agent frameworks to pipelines that open PRs and run tests on their own.
Semantic Versioning in CI/CD: The Science of Continuous Delivery
Master semantic versioning in CI/CD pipelines. Learn to calculate versions automatically and ensure traceability in your Android deployments.
Automated Deployment to Google Play Store with GitHub Actions
Learn how to configure a robust Continuous Deployment pipeline that automatically compiles, signs, and publishes your Android App to Google Play Store.