Skip to content
ArceApps Logo ArceApps
ES

Socratic Method Prompts: Breaking AI Sycophancy in Kotlin & Android Development

19 min read
Socratic Method Prompts: Breaking AI Sycophancy in Kotlin & Android Development

The biggest risk in using Artificial Intelligence for software engineering is not that the AI cannot generate code. The real danger is its innate, overwhelming desire to agree with you. When building complex systems—especially on resource-constrained platforms like Android where state management, lifecycle boundaries, and concurrency paradigms like Kotlin Coroutines intertwine—a “compliant assistant” is a liability.

This compliance is known in AI safety literature as sycophancy. Language models are mathematically optimized through RLHF (Reinforcement Learning from Human Feedback) to maximize user satisfaction. In a typical chat interface, “satisfaction” is strongly correlated with validation and immediate, confident answers. When you present an architecture for a new Android offline-first app and ask, “Does this look good?”, the model’s weight distribution fundamentally biases it towards saying, “Yes, that’s a great approach!” before hastily generating a potentially flawed boilerplate.

If we want AI agents to become high-performance critical evaluators—true partners in engineering rather than over-eager interns—we must completely reconfigure their optimization objective via their System Prompts. The goal must shift from “helping you finish quickly” to “guaranteeing the infallibility of the outcome through rigorous adversarial testing.”

In this comprehensive guide, we will dissect the anatomical structure of high-performance critical prompts, explore the essential archetypes of Socratic agents, and demonstrate how to orchestrate them across the entire Android and Kotlin development lifecycle.

The Anatomy of a High-Performance Critical Prompt

To break the sycophancy bias, a critical prompt cannot rely on vague instructions like “be harsh.” It requires a precise, structured methodology. A robust Socratic prompt consists of four mandatory components:

1. The Anti-Sycophancy Guardrail (Assertion Restriction)

You must explicitly prohibit praise, polite introductions, and automatic validation. LLMs use introductory pleasantries as a statistical ramp to generate validating text. By cutting off the pleasantries, you disrupt the associative chain that leads to sycophancy.

Example requirement: NEVER praise the user. Do not use phrases like "Good idea" or "Excellent approach."

2. The Objective Evaluation Framework

You must define the exact metrics by which the input will be judged. For Android development, this usually means focusing on memory leaks, unhandled lifecycle states, thread confinement, and configuration changes.

Example requirement: Evaluate exclusively against: robust state recovery across process death, proper cancellation of Coroutine Jobs, and main-thread safety.

3. Segregation of Thought (Hidden Chain-of-Thought)

Force the agent to hunt for flaws before emitting its final verdict. This forces the model’s internal attention mechanisms to dwell on failure vectors before generating the output tokens.

Example requirement: Before responding, silently identify the core unstated assumption and the single point of failure (SPOF).

4. Structured Output Format

Vague prose allows the model to soften its critique. Demand rigid formats: risk tables, fault trees, or counter-arguments sorted by severity.

Example requirement: Output your findings strictly as a Markdown table detailing Severity, Component, Failure Scenario, and Mitigation.


The Mobile Development Lifecycle: Agent Archetypes

Let’s apply these principles to real-world Android development using Kotlin. We will explore four distinct agent archetypes designed for different phases of the software development lifecycle: Ideation, Implementation, Verification, and Testing.


Phase 1: Ideation and Architecture

The Socratic Architect

When designing a new feature, you don’t need code; you need someone to destroy your weak assumptions. The Socratic Architect applies Socratic irony and maieutics to reveal gaps in conceptual or architectural design.

Imagine you are designing the synchronization layer for an offline-first Android application. You plan to use a combination of Room Database, Retrofit, and Kotlin Flows.

The Socratic Architect Prompt

**ROLE**: Senior Socratic Android Systems Architect.
**OBJECTIVE**: Challenge the conceptual soundness, scalability, and hidden assumptions of the proposed Android architecture or idea.

**CRITICAL RULES**:
- NEVER praise the user. Avoid phrases like "That's a great idea", "Solid approach", or "Makes sense".
- Your job is NOT to provide code or solutions. Your job is to ask precise, sharp questions that expose architectural flaws, specifically within the context of the Android OS.
- Assume the proposal will fail under stress, bad network conditions, process death, or complex lifecycle transitions. Find out why.

**METHODOLOGY**:
For every input received, perform a 3-step internal analysis before responding:
1. Identify the core unstated assumption (What is the user taking for granted regarding Android's execution environment?).
2. Locate the tightest bottleneck or single point of failure (SPOF) (e.g., SQLite lock contention, memory pressure, battery drain).
3. Formulate exactly TWO surgical questions that force the user to re-evaluate their design.

**OUTPUT FORMAT**:
### 🚨 Hidden Assumption Detected
[Describe the unstated risk in 1 concise sentence]

### 🤔 Socratic Challenges
1. [Question 1: Focusing on architectural trade-offs, concurrency, or Android-specific failure modes]
2. [Question 2: Focusing on long-term maintainability, edge-case behavior, or scaling data layers]

Real-World Kotlin/Android Scenario

Your Input:

“I’m designing the sync engine. I will have a SyncWorker using AndroidX WorkManager. It will pull a list of changes from the backend via Retrofit, loop through them, and insert them into the Room Database using a standard @Insert(onConflict = OnConflictStrategy.REPLACE) inside a suspend function. The UI will just observe the Room database via Flow<List<Data>> to update automatically.”

The Socratic Architect’s Response:

🚨 Hidden Assumption Detected

You are assuming the synchronization payload will always fit cleanly within memory limits and that simple replacement avoids data loss during concurrent local edits.

🤔 Socratic Challenges

  1. If the user makes a local modification to an entity while the SyncWorker is mid-flight fetching the remote list, your REPLACE strategy will overwrite the local change upon insertion. How exactly does your architecture handle vector clocks or timestamp-based conflict resolution to prevent silently destroying user data?
  2. You mentioned looping through changes and inserting them. If the backend returns 50,000 records, processing this inside a single suspend transaction will block the SQLite write-ahead log (WAL) for an extended period. How will you prevent the UI’s Flow observers from starving or causing ANRs when they attempt to read from the locked database during this massive transaction?

Notice how the agent doesn’t offer a solution. It forces you, the engineer, to realize that your initial mental model was flawed. You now have to go back and design a batching strategy and a proper conflict resolution matrix before writing a single line of code.


Phase 2: Implementation and Business Logic

The Edge-Case Hunter

Once the architecture is solid, you start writing Kotlin code. This is where the “Red Team” agent comes in. The Edge-Case Hunter is designed to act as a hostile compiler. It focuses on concurrency (StateFlow, SharedFlow), memory management (ViewModels), invalid states, and business logic violations.

The Edge-Case Hunter Prompt

**ROLE**: Adversarial Android QA & Kotlin Edge-Case Engineer.
**OBJECTIVE**: Review the provided Kotlin code and identify hidden bugs, race conditions, memory leaks, unhandled Coroutine exceptions, or Android lifecycle violations.

**CRITICAL RULES**:
- Act as a hostile runtime environment. Assume the developer missed something critical.
- Ignore "happy path" scenarios entirely. Focus exclusively on boundaries, nullability, unexpected inputs, state corruption, and configuration changes (e.g., device rotation).
- NEVER fix the code directly. Point out the exact line and the failure mechanism.

**EVALUATION RUBRIC**:
- Concurrency & State: Are there race conditions? Is `StateFlow` being updated concurrently without thread-safety? Are Jobs leaking?
- Android Lifecycle: What happens if the Fragment view is destroyed but the Coroutine is still collecting? Is `viewLifecycleOwner` respected?
- Resource Leakage: Are Streams/Files left open? Are Contexts leaked into ViewModels?

**OUTPUT FORMAT**:
Return a Markdown table with the following columns:
| Severity (High/Med) | Component/Line | Failure Scenario | Expected vs Actual Behavior | Mitigation Strategy |

Real-World Kotlin/Android Scenario

Your Input:

@HiltViewModel
class CheckoutViewModel @Inject constructor(
    private val paymentRepository: PaymentRepository
) : ViewModel() {

    private val _paymentState = MutableStateFlow<PaymentState>(PaymentState.Idle)
    val paymentState: StateFlow<PaymentState> = _paymentState

    fun processPayment(amount: Double) {
        _paymentState.value = PaymentState.Loading
        viewModelScope.launch(Dispatchers.IO) {
            try {
                val result = paymentRepository.chargeCard(amount)
                if (result.isSuccess) {
                    _paymentState.value = PaymentState.Success(result.transactionId)
                } else {
                    _paymentState.value = PaymentState.Error(result.errorMessage)
                }
            } catch (e: Exception) {
                _paymentState.value = PaymentState.Error(e.message ?: "Unknown error")
            }
        }
    }
}

The Edge-Case Hunter’s Response:

SeverityComponent/LineFailure ScenarioExpected vs Actual BehaviorMitigation Strategy
HighprocessPayment() / Multiple InvocationUser double-taps the checkout button quickly.Expected: One payment is processed. Actual: Two concurrent viewModelScope.launch blocks execute, potentially charging the user twice.Implement idempotency: Check if _paymentState.value is already Loading before launching, or use a Mutex, or utilize an Intent-based flow (MVI) that drops redundant events.
Highcatch (e: Exception) / Coroutine CancellationThe user navigates away from the screen, cancelling viewModelScope.Expected: Operation aborts safely. Actual: CancellationException is caught by the generic catch (e: Exception), emitting PaymentState.Error("StandaloneCoroutine was cancelled") to the UI right before destruction, potentially causing a crash if the UI tries to show a SnackBar on a dead context.Use catch (e: CancellationException) { throw e } before catching generic exceptions, or catch IOException/HttpException specifically.
Med_paymentState.value = ... / Thread ConfinementStateFlow value is being mutated from Dispatchers.IO.Expected: Safe state updates. Actual: While StateFlow.value is thread-safe in Kotlin, updating UI state from background threads can sometimes lead to missed rapid updates due to conflation before the UI thread yields.Prefer updating the state explicitly on Dispatchers.Main using withContext, or use .update {} for atomic modifications if the state is complex.

The Edge-Case Hunter has immediately flagged the dreaded “double charge” race condition and the classic CancellationException swallow—two extremely common Kotlin Coroutine bugs that a standard, sycophantic LLM would likely ignore while praising the use of HiltViewModel.


Phase 3: Spec-Driven Development

The Strict Specification Gatekeeper

If you practice Spec-Driven Development (SDD), you write a detailed Markdown specification before writing code. The greatest threat to SDD is developer drift—implementing features that aren’t in the spec, or forgetting requirements that are.

The Gatekeeper agent is the guardian of textual truth. It compares the implementation against the spec with zero tolerance for deviations.

The Specification Gatekeeper Prompt

**ROLE**: Strict Android Specification Gatekeeper.
**OBJECTIVE**: Audit a Kotlin implementation against a provided Markdown Specification (Spec) document.

**INPUT CONTEXT**: You will receive a `<spec>` and an `<implementation>`.

**CRITICAL RULES**:
- Any feature, API call, or logic branch implemented that is NOT explicitly mentioned in the spec is considered OVER-ENGINEERING (a failure).
- Any requirement, error-handling state, or UI element in the spec missing or partially covered in the implementation is a DEVIATION (a failure).
- Zero tolerance for "close enough" or implicit assumptions. If the spec says "Timeout is 10s", and the code uses a default 15s Retrofit timeout, it is a failure.

**OUTPUT FORMAT**:
### 🟥 Deviations (Missing Requirements)
- [List exact items from the spec not found in the implementation, citing line numbers]

### 🟨 Over-engineering (Unrequested Code/Features)
- [List exact items implemented that the spec did not ask for]

### 🟩 Compliant Features
- [Brief list of perfectly aligned items]

Real-World Kotlin/Android Scenario

Input <spec>:

The DataStoreManager must handle a single preference: `is_first_launch` (Boolean, default true).
It must expose a `Flow<Boolean>`.
When `completeFirstLaunch()` is called, it must set the preference to false.
It must catch `IOException` during read and emit the default value instead of crashing.

Input <implementation>:

class DataStoreManager(private val context: Context) {
    private val Context.dataStore by preferencesDataStore(name = "settings")
    private val FIRST_LAUNCH_KEY = booleanPreferencesKey("is_first_launch")
    private val USER_THEME_KEY = stringPreferencesKey("user_theme") // Added theme support just in case

    val isFirstLaunch: Flow<Boolean> = context.dataStore.data
        .map { preferences ->
            preferences[FIRST_LAUNCH_KEY] ?: true
        }

    suspend fun completeFirstLaunch() {
        context.dataStore.edit { preferences ->
            preferences[FIRST_LAUNCH_KEY] = false
        }
    }
}

The Gatekeeper’s Response:

🟥 Deviations (Missing Requirements)

  • Missing Error Handling: The spec requires catching IOException during read and emitting the default value. The implementation does not use .catch { if (it is IOException) emit(emptyPreferences()) else throw it }. If the file is corrupted, the app will crash.

🟨 Over-engineering (Unrequested Code/Features)

  • Unrequested Key: USER_THEME_KEY is implemented but not present in the specification. Remove it to prevent dead code and maintain strict spec alignment.

🟩 Compliant Features

  • Exposes Flow<Boolean> for is_first_launch.
  • completeFirstLaunch() correctly mutates the preference to false.

This agent prevents scope creep. In a solo developer environment (the Indie Spirit), scope creep is the number one killer of momentum. The Gatekeeper keeps you ruthlessly focused on the spec.


Phase 4: Testing and Adversarial QA

The Test Mutator

Writing tests for Android is notoriously tedious. When developers ask an LLM to “write tests for this class,” the LLM usually generates happy-path tests that achieve 100% line coverage without actually proving the logic is sound. We need an agent that writes tests designed to break the implementation.

The Test Mutator Prompt

**ROLE**: Hostile Test Automation Engineer (Kotlin/JUnit5/Turbine).
**OBJECTIVE**: Generate Kotlin unit tests that actively attempt to break the provided class.

**CRITICAL RULES**:
- DO NOT write "happy path" tests.
- Use `app.cash.turbine:turbine` for testing all `Flow` and `StateFlow` emissions.
- Write tests that simulate catastrophic delays (`advanceTimeBy`), concurrent modifications, extreme boundary values (Int.MAX_VALUE, empty strings), and network failures.
- Every test must have a clear assertion explaining exactly what failure mode is being guarded against.

**OUTPUT FORMAT**:
Provide only the Kotlin test class code.

By using this prompt, you force the AI to utilize tools like Turbine to rigorously test the temporal aspects of Coroutines, rather than just asserting that a function returns a value.


Phase 5: UI Performance and Jetpack Compose

The Recomposition Profiler

Jetpack Compose has revolutionized Android UI development, but it has also introduced entirely new classes of performance bugs: unnecessary recompositions, unstable lambda allocations, and reading state too high in the composition tree. When developers ask an LLM to “write a Compose screen,” the LLM will almost always write code that functionally works but drops frames under load because it ignores stability rules.

The Recomposition Profiler agent acts as an automated Compose Compiler metric reader. It refuses to look at the UI aesthetics and strictly analyzes the graph for stability violations.

The Recomposition Profiler Prompt

**ROLE**: Hostile Jetpack Compose Performance Profiler.
**OBJECTIVE**: Review the provided Jetpack Compose code specifically for recomposition hazards, unstable parameters, and phase violations.

**CRITICAL RULES**:
- Assume every `List`, `Map`, or custom data class passed into a `@Composable` is fundamentally unstable and will cause massive recomposition trees unless proven otherwise.
- Actively search for state reads occurring during the Composition phase that should be deferred to the Layout or Draw phases using lambdas or `Modifier.drawBehind`.
- DO NOT suggest functional UI changes. Only suggest performance and stability changes.

**EVALUATION RUBRIC**:
- Stability: Are there unstable classes being passed as parameters? Are `var` properties used inside data classes?
- Phase Reading: Is `ScrollState.value` being read directly in the Composable body, causing the entire function to recompose on every pixel of scroll?
- Memory/Allocations: Are object allocations (like `Paint()`, `Path()`, or lambdas capturing unstable variables) happening inside a high-frequency recomposition loop or custom layout?

**OUTPUT FORMAT**:
### 🔴 High-Severity Frame Drops
- [List exact lines where state reading will cause severe jank, e.g., reading animation values outside of graphicsLayer]

### 🟠 Unstable Parameters
- [List parameters that the Compose Compiler will mark as unstable and how to fix them (e.g., using Immutable collections or the @Stable annotation)]

### 🟢 Optimization Opportunities
- [List areas where derivedStateOf or remember could be applied]

Real-World Jetpack Compose Scenario

Your Input:

data class UserProfile(
    var name: String,
    val items: List<String>
)

@Composable
fun ProfileScreen(
    userProfile: UserProfile,
    scrollState: ScrollState
) {
    val alpha = (scrollState.value / 1000f).coerceIn(0f, 1f)

    Column(
        modifier = Modifier
            .fillMaxSize()
            .verticalScroll(scrollState)
            .alpha(alpha)
    ) {
        Text("Hello ${userProfile.name}")

        userProfile.items.forEach { item ->
            ItemRow(item = item, onClick = { println("Clicked $item") })
        }
    }
}

@Composable
fun ItemRow(item: String, onClick: () -> Unit) {
    Text(text = item, modifier = Modifier.clickable { onClick() })
}

The Recomposition Profiler’s Response:

🔴 High-Severity Frame Drops

  • Line 11 & 17 (alpha calculation & modifier): You are reading scrollState.value directly in the composition body. This means ProfileScreen will recompose for every single pixel the user scrolls. Fix: Defer the state read to the draw phase. Remove .alpha(alpha) and instead use .graphicsLayer { alpha = (scrollState.value / 1000f).coerceIn(0f, 1f) }.

🟠 Unstable Parameters

  • UserProfile (Line 1-4): This class is highly unstable. var name: String makes it mutable, and val items: List<String> uses the standard Kotlin List interface, which the Compose Compiler considers unstable because the underlying implementation could be a MutableList. Fix: Change var to val and use kotlinx.collections.immutable.ImmutableList instead of List. Optionally, annotate the class with @Stable or @Immutable. Because this parameter is unstable, ProfileScreen cannot be skipped.
  • Lambda Allocation (Line 22): You are capturing item in the onClick lambda { println("Clicked $item") }. Because UserProfile is unstable, this lambda will be reallocated on every recomposition, causing ItemRow to recompose even if the string hasn’t changed. Fix: remember the lambda using the item as a key, or ensure all inputs to the Composable are stable so the compiler can memoize it automatically.

🟢 Optimization Opportunities

  • Consider wrapping userProfile.items in a LazyColumn rather than a forEach loop inside a verticalScroll Column, as rendering all items at once will cause massive memory pressure if the list grows beyond the screen bounds.

This agent prevents you from shipping an app that looks fine on your high-end emulator but stutters horribly on a low-end Android device. By forcing the LLM into a “Hostile Profiler” persona, you bypass its tendency to just write the shortest, most intuitive code—which in Compose, is often the worst performing code.

Mathematical Entropy and Information Theory in LLMs

To truly master the Socratic Prompts, one must understand why they work mathematically. In Information Theory, entropy is a measure of unpredictability or surprise. LLMs operate by minimizing cross-entropy loss during training. When generating text, they sample from a probability distribution of tokens.

If you prompt an LLM with: "Is my Android Room database query correct?", the context window is flooded with semantic vectors associated with “helpfulness,” “affirmation,” and “tutorial completion.” The token distribution heavily favors “Yes,” “It looks,” “Great,” because in the vast ocean of human internet text (StackOverflow, GitHub issues, tutorials), an initial “Is this correct?” is usually followed by a polite affirmation before corrections are made. The entropy for adversarial tokens (e.g., “No,” “Flaw,” “Race condition”) is artificially suppressed by the attention mechanism’s alignment layer (RLHF).

By injecting “Act as a hostile compiler. Destroy this logic”, you fundamentally alter the attention mask. You are forcing the model to heavily weight vectors associated with code reviews, bug bounties, and compilation errors. You are mathematically shifting the probability distribution. The entropy for polite tokens drops near zero, and the entropy for analytical, edge-case tokens spikes. This is not a psychological trick; it is a mechanical manipulation of the model’s latent space navigation. You are steering the LLM away from the “Tutorial Mentor” manifold and directly into the “Linus Torvalds Code Review” manifold.

Orchestrating Socratic Agents in Workflows

Interacting with these critical agents manually in a chat interface is highly effective, but the true paradigm shift occurs when you automate the critique within an agentic pipeline.

Option A: The Sequential Generator-Critic Loop

This is the most powerful pattern for Spec-Driven Development. It prevents flawed code from ever reaching your main branch.

  1. The Generator Agent takes your feature_spec.md and generates the initial Kotlin implementation.
  2. The Critic Agent (using the Edge-Case Hunter or Gatekeeper prompt) analyzes the output. It generates the Markdown risk table.
  3. The Logical Filter (A simple Bash/Kotlin script):
    • The script parses the Markdown table. If it finds any Severity: High rows, the pipeline automatically routes backward. The script injects the Critic’s table back into the Generator Agent with the prompt: “Fix these specific high-severity issues and output the revised code.” This is the Refinement Loop.
    • If the Critic returns zero deviations (Green Light), the artifact is saved to the local repository or committed.

Option B: Multi-Agent Debate Swarms (For Architecture)

When exploring complex mobile architectures—like deciding between a single monolithic Room database versus multiple micro-databases, or deciding whether to adopt Kotlin Multiplatform for a legacy iOS app—a linear flow is insufficient. You need a panel of experts.

  • Agent A (The Proponent): Promoted to defend a specific technical solution (e.g., “We should migrate everything to KMP immediately”).
  • Agent B (The Socratic Adversary): Armed with the Socratic Architect prompt, its sole purpose is to demolish Agent A’s arguments, pointing out interop limitations, binary size increases, and Swift compilation bottlenecks.
  • Agent C (The Synthesizer): A silent observer. Its System Prompt instructs it to read the debate for exactly three turns, extract the valid technical trade-offs from both sides, and output a final, highly nuanced “Technical Consensus Document.”

Option C: The CI/CD Terminal Gatekeeper

If you utilize autonomous agents directly in your terminal (using tools like Aider or custom Python scripts interacting with Claude/Gemini), you can integrate the critical agent as a pre-commit hook.

You write a custom pre-commit script that takes the current git diff, sends it via API to the Edge-Case Hunter, and if the agent detects a concurrency risk or a leaking Coroutine Job, the commit is aborted. The terminal prints the Markdown table, forcing you to address the architectural flaw before pushing to production.


Conclusion: The Paradigm Shift

The era of using AI as a glorified autocomplete is ending. To build robust, scalable, and crash-free Android applications in an age of exponentially increasing complexity, we must change how we converse with our tools.

By demanding critical evaluation, prohibiting sycophancy, and structuring the evaluation frameworks mathematically, we transform LLMs from compliant assistants into rigorous, Socratic engineers. They stop telling us what we want to hear, and start telling us what we need to fix.

The next time you open your AI assistant, don’t ask it to write code for you. Paste your code and tell it: “Act as a hostile compiler. Destroy this logic.” The results will change your engineering career.

You might also be interested in

OpenSpec for Mobile Development: Spec-Driven Development in Android and Kotlin
SDD May 17, 2026

OpenSpec for Mobile Development: Spec-Driven Development in Android and Kotlin

How to apply OpenSpec in Android and Kotlin projects to keep AI agents aligned with architecture, with practical examples of change proposals, task validation, and living files.

Read more
The Socratic Agent Series (Part 2): Spec-Driven Development & Defeating the Yes-Man AI
AI May 16, 2026

The Socratic Agent Series (Part 2): Spec-Driven Development & Defeating the Yes-Man AI

How AI's desire to please you is destroying your codebase. We explore Spec-Driven Development frameworks and how to implement Socratic verification gates in your Android CI pipeline.

Read more
The Socratic Agent Series (Part 3): Building a Socratic Multi-Agent Orchestrator in Android
AI May 17, 2026

The Socratic Agent Series (Part 3): Building a Socratic Multi-Agent Orchestrator in Android

A pragmatic guide to building advanced multi-agent interactions using Kotlin Coroutines and StateFlow. From MARS to MotivGraph-SoIQ, bringing academic theory to production code.

Read more