Strong Coast

Campaign Brain

How It Works

Data-driven content tailoring using psychographic segmentation, machine learning, and self-improving feedback loops.

The psychographic segments below are derived from the MPA National Survey 2024 (n=10,003). This campaign uses that evidence base to create audience-specific content grounded in local context.

Content Engine — Continuous Improvement Loop

Each cycle improves the next.

Preloaded Data

  • Survey Data — campaign-specific respondent base
  • Segmentation — 8 psychographic segments
  • Campaign Research — strategic compendium + factual sources
  • Persona Profiles — language, tone, beliefs

Connected Services

  • LLM Providers — OpenRouter (active), Anthropic Claude, Google Gemini, OpenAI, Ollama, DeepSeek
  • Hybrid RAG + Graph — semantic vector search, knowledge graph (RyuGraph), Observer Agent cache, full context (CAG)
  • External APIs — Meta (Facebook + Instagram), image search (3 APIs), ElevenLabs TTS, embedding API

Micro-Targeted Content Creation

  • Generate — hybrid retrieval & tailoring: RAG + graph + Observer context shapes the persona framing
  • Validation Gate (auto-check) — a second LLM agent validates the draft against campaign rules and modifies it before passing it to the user
  • Editorial Review — human edit, then approve or reject
  • Scheduling, Publishing & Targeted Advertising — schedule posts, publish directly to Facebook & Instagram, and run targeted ads, all via the Meta API

Measurement & Learning

  • Editorial Feedback — scoring (approve/reject/edit → −1 to +1), rejection-reason pattern analysis, learned "do more / do less" rules per persona
  • Performance Data — engagement (reach, clicks, shares, comments), conversions (actions taken from content), composite score (editorial + engagement blend)

Measurement feeds back into the next cycle: top posts become examples, editorial patterns become rules, and engagement data refines targeting.
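The composite score mentioned under Performance Data can be pictured as a weighted blend. A minimal sketch, assuming a 60/40 editorial-to-engagement weighting and hypothetical field names (not the system's actual formula):

```python
# Illustrative blend of an editorial signal in [-1, +1] with engagement
# lift vs. the persona's baseline; the 60/40 weighting is an assumption.
def composite_score(editorial: float, engagement: float, baseline: float,
                    editorial_weight: float = 0.6) -> float:
    """Return a composite score in [-1, +1]."""
    # Normalize engagement against the persona baseline and clamp to [-1, 1].
    lift = max(-1.0, min(1.0, (engagement - baseline) / max(baseline, 1.0)))
    return editorial_weight * editorial + (1.0 - editorial_weight) * lift

# An approved-after-edit post (+0.5 editorial) that doubled baseline reach:
print(composite_score(editorial=0.5, engagement=2000, baseline=1000))  # 0.7
```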

Psychographic Personas

The personas below were derived from the MPA National Survey 2024 (n=10,003) and represent distinct psychographic segments with specific beliefs, values, and communication preferences. Content is never generic — every post is tailored to a specific audience.

Persona                    Size    Priority  Key Insight
Elder Progressives         8.0%    HIGH      Support MPAs in principle but only 39% believe they work
Local Economy Advocates    10.7%   HIGH      81% believe MPAs work — need the economic case
Traditional Rural Elders   12.9%   HIGH      Most skeptical (30% believe MPAs work), need local framing
Precarious Disengaged      12.9%   MEDIUM    45% "unsure" — need awareness first
Young Urban Bros           12.3%   MEDIUM    39% "depends" — need practical facts, no preaching
Conservationists           20.8%   ALIGNED   71% "very good idea" — organic amplifiers
Young Progressives         10.5%   ALIGNED   77% positive — social amplification
Free Speech Warriors       6.8%    SKIP      43% climate skeptics — not persuadable

What the System Knows About Each Persona

When generating content, the LLM receives a comprehensive brief for the target persona: demographics, beliefs, effective messages, and language rules. The full list of generation inputs appears under Pass 1 of the pipeline below.

📚 Campaign Research & Issue Data

Every post is grounded in real research. The system doesn't fabricate facts — it extracts them from a campaign-specific research document that provides the substance for each post. The persona determines the framing; the research provides the content.

The research documents uploaded to this campaign provide the factual substance, and the system draws from different sections depending on the topic. Each generated post pulls from a different section of the research, so a single batch can cover policy, economics, and on-the-ground impacts, all framed through the same persona's values while teaching different concrete facts.

Factual source verification is built in. A Works Cited section of 5 primary sources provides hard numbers for zone assessments, economic impacts, federal commitments, species data, and survey findings. Every claim in generated content must trace to these sources.

Campaign-Interchangeable Architecture

Each campaign uploads three data layers that drive the entire system:

  • Polling data — survey responses that feed psychographic segmentation for the active campaign
  • Psychographic research — persona profiles with beliefs, language rules, messaging strategies, and targeting priorities derived from the polling
  • Issue research — campaign-specific knowledge documents (status reports, ecological data, policy analysis) that provide the factual substance for content

Upload your research data and personas, and the system handles the rest.
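To make the interchangeability concrete, a hypothetical upload manifest might look like the following; the structure, keys, and file names are illustrative assumptions, not the actual upload schema:

```python
# Illustrative manifest for the three campaign data layers.
campaign_layers = {
    "polling_data": "uploads/mpa_national_survey_2024.csv",    # feeds segmentation
    "psychographic_research": "uploads/persona_profiles.md",   # beliefs, language rules
    "issue_research": [                                        # factual substance
        "uploads/zone_status_report.md",
        "uploads/ecological_data.md",
        "uploads/policy_analysis.md",
    ],
}
```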

The Self-Improving Feedback Loop

The system learns from three distinct feedback channels, each operating on a different timescale and requiring different thresholds.

1. Admin Feedback (Instant Rules)

When an admin identifies a problem in generated content, they can create a Universal Lesson that applies immediately to ALL future generation. No threshold is required: one bad example becomes one rule.

  • Avoid rules — "Never use em dashes", "Avoid 'it's not X, it's Y' patterns"
  • Require rules — "Always include humans in image directions"
  • Style rules — Tone and framing preferences

These rules are injected into the generation prompt as MANDATORY RULES before persona-specific guidance.
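A minimal sketch of that injection order, with the function name and section labels as assumptions:

```python
# Universal lessons go in first, ahead of persona-specific guidance.
def build_generation_prompt(lessons: list[str], persona_brief: str, topic: str) -> str:
    rules = "\n".join(f"- {lesson}" for lesson in lessons)
    return (
        "MANDATORY RULES (apply to all output):\n"
        f"{rules}\n\n"
        f"PERSONA BRIEF:\n{persona_brief}\n\n"
        f"TOPIC: {topic}"
    )

prompt = build_generation_prompt(
    lessons=["Never use em dashes", "Always include humans in image directions"],
    persona_brief="Local Economy Advocates: lead with jobs, revenue, working waterfronts.",
    topic="Economic benefits of marine protected areas for coastal communities",
)
```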

2. Edit-Based Learning (Auto-Detected Patterns)

When an admin edits post copy, the system analyzes the diff to detect teachable patterns:

  • Em dash removal detected? Suggest "avoid em dashes" lesson
  • AI reframing pattern removed? Suggest corresponding rule
  • Smart quotes converted to straight quotes? Log the pattern

The API returns suggested lessons from detected edits. One click to make them universal rules.
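A sketch of how such detection could work using Python's standard difflib; the two heuristics below mirror the examples above, and a real detector would cover more patterns:

```python
import difflib

SMART_QUOTES = "\u201c\u201d\u2018\u2019"

def detect_lessons(original: str, edited: str) -> list[str]:
    suggestions = []
    # Collect the lines the editor removed or changed away from.
    removed = "\n".join(
        line[2:] for line in difflib.ndiff(original.splitlines(), edited.splitlines())
        if line.startswith("- ")
    )
    if "\u2014" in removed and "\u2014" not in edited:   # em dash stripped out
        suggestions.append("Avoid em dashes")
    if any(q in original for q in SMART_QUOTES) and not any(q in edited for q in SMART_QUOTES):
        suggestions.append("Use straight quotes, not smart quotes")
    return suggestions
```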

3. Engagement Metrics (Statistical Learning)

When Meta API integration is active, the system pulls real engagement data:

  • Reach, impressions, clicks, shares, comments
  • Posts scored based on engagement vs. baseline for that persona
  • Statistical analysis after 3+ high-performing and 3+ low-performing posts

This is the only feedback loop with a threshold: you need enough data to detect real patterns, not noise.
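The readiness check itself is simple. A sketch, with the high/low performance cutoffs as illustrative assumptions:

```python
# Statistical learning runs only once both buckets hold at least 3 posts.
MIN_SAMPLES = 3

def ready_for_analysis(post_scores: list[float], persona_baseline: float) -> bool:
    high = [s for s in post_scores if s >= persona_baseline * 1.5]  # assumed cutoff
    low = [s for s in post_scores if s <= persona_baseline * 0.5]   # assumed cutoff
    return len(high) >= MIN_SAMPLES and len(low) >= MIN_SAMPLES
```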

Where Lessons Live

All lessons are visible on the Analysis page in the "Lessons Learned" panel. Each lesson shows its source (admin rule, edit detection, or engagement analysis) and can be edited or deleted.

Three-Pass Generation Pipeline (Detail)

Every piece of content goes through three distinct LLM passes, each with a specific purpose. This catches errors that a single-pass system would miss.

Pass 1: Draft Generation

The LLM receives:

  • Full persona profile (demographics, beliefs, effective messages, language rules)
  • Universal lessons (MANDATORY RULES section)
  • Content stream guardrails (Stream A-E with specific failure modes)
  • Research document excerpts (retrieved via hybrid RAG — see architecture below)
  • Gold standard exemplars (top-scoring approved posts for this persona)
  • Learned preferences (do more / do less patterns from editorial history)

Output: Raw draft with title, copy, image direction, rationale, and research citations.

Pass 2: Editorial Review

A second LLM pass reviews the draft for:

  • Hallucination check — Are all claims traceable to the research document?
  • Tone violations — Does it match the persona's voice?
  • Persona leaks — Any segment labels, poll data, or internal jargon exposed?
  • Stream compliance — Does it follow the content stream's guardrails?
  • AI patterns — Em dashes, reframing clichés, defensive framing?

Output: Revised copy with editorial notes explaining changes made.

Pass 3: Quality Gate (Self-Scoring)

A third LLM pass scores the content 0-10 on:

  • Factual accuracy — Claims verified against source documents
  • Persona alignment — Voice, tone, framing match
  • Stream compliance — Follows content stream guardrails
  • Engagement potential — Is this actually interesting?

Threshold: 7/10. Posts scoring below 7 are automatically regenerated (up to 2 attempts). Posts that still fail are rejected and not saved to the database.
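Taken together, the three passes form a short loop. A condensed sketch, where generate_draft, editorial_review, and quality_score are hypothetical stand-ins for the three LLM calls described above:

```python
QUALITY_THRESHOLD = 7
MAX_ATTEMPTS = 3  # initial generation plus up to 2 regenerations

def generate_post(persona: str, topic: str) -> dict | None:
    for _ in range(MAX_ATTEMPTS):
        draft = generate_draft(persona, topic)       # Pass 1: draft generation
        revised = editorial_review(draft, persona)   # Pass 2: check & fix
        score = quality_score(revised, persona)      # Pass 3: self-scoring, 0-10
        if score >= QUALITY_THRESHOLD:
            return revised                           # saved to the database
    return None                                      # rejected, never saved
```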

Programmatic Safety Net

In addition to LLM-based quality scoring, regex patterns detect hard fails that force rejection:

  • Em dashes (—) anywhere in copy
  • "It's not X, it's Y" and "isn't just" reframing patterns
  • En dashes (–) in body copy

These patterns are detected after LLM scoring and override the score to 3 (hard fail). This ensures known AI tells never reach the dashboard, even if the LLM misses them.
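A sketch of these checks with Python's re module; the production expressions may differ in detail:

```python
import re

HARD_FAIL_PATTERNS = [
    re.compile("\u2014"),                                   # em dash anywhere
    re.compile(r"\bit.?s not\b.{1,60}?,\s*it.?s\b", re.I),  # "it's not X, it's Y"
    re.compile(r"\bisn.?t just\b", re.I),                   # "isn't just" reframing
    re.compile("\u2013"),                                   # en dash in body copy
]

def apply_safety_net(copy: str, llm_score: int) -> int:
    """Override the LLM quality score to 3 (hard fail) on any known AI tell."""
    if any(p.search(copy) for p in HARD_FAIL_PATTERNS):
        return 3
    return llm_score
```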

📷 Intelligent Image Sourcing

Every post needs a relevant image. The system uses a multi-layer approach to find the best visual match for each piece of content:

Layer 1: Local Catalog (searched first)

A curated repository of campaign-ready images is organized by themes and tagged with locations, subjects, and persona relevance. Images are sourced from permissively licensed libraries. The LLM's "image direction" output drives a content-based tag search so visuals match the post topic, not generic stock scenery.

Layer 2: External API Search (fallback)

If no local image matches well enough, the system searches three external APIs:

  • Wikimedia Commons — free, high-quality scientific and nature photography (no API key required)
  • Pexels — professional stock photography with generous free API tier
  • Unsplash — high-quality creative photography with free API tier

Search queries are extracted from the LLM's image direction field, so each image search is specific to the post's content (for example, place-based geography, species, or community work contexts).

Deduplication

The system tracks recently used images across the current batch and the last 20 posts, penalizing reuse so each post gets a distinct visual.
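A sketch of that penalty, assuming each candidate image carries a relevance score; the 0.5 penalty factor is an illustrative assumption:

```python
# Down-weight images already used in the current batch or the last 20 posts.
def penalize_reused(candidates: list[dict], recently_used_ids: set[str],
                    penalty: float = 0.5) -> list[dict]:
    for c in candidates:
        if c["image_id"] in recently_used_ids:
            c["score"] *= penalty
    return sorted(candidates, key=lambda c: c["score"], reverse=True)
```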

🔌 Hybrid Graph RAG Architecture

The system doesn't use a single retrieval method. It combines three complementary approaches to find the best research context for each generation, selecting the optimal mode based on topic, persona, and available data.

Layer 1: Semantic Vector Search (RAG)

Campaign research documents are chunked by section heading, embedded using OpenAI's text-embedding-3-small model, and stored in a SQLite vector database (sqlite-vec). At generation time, the topic is embedded and compared against all chunks using cosine similarity. Results are re-ranked by:

  • Source tier boost — Tier 1 (core vetted) gets 1.5x, Tier 2 (team-reviewed) gets 1.2x, Tier 4 (unreviewed) gets 0.6x
  • Temporal decay — older chunks lose 3% relevance per reflection cycle, keeping fresh research prominent
  • Persona affinity — chunks that performed well for a specific persona are boosted for that persona's future generations
  • Topic canonicalization — a 22-topic taxonomy maps free-text topics to canonical categories for consistent retrieval

Every retrieval event is logged with chunk ID, similarity score, tier, and whether the resulting content was approved or rejected — creating a feedback loop that improves source quality over time.
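Taken together, the re-ranking can be pictured as one multiplicative adjustment over the raw cosine similarity. A sketch, where the multiplicative combination and the neutral Tier 3 default are assumptions; the tier multipliers and 3% decay come from the description above:

```python
TIER_BOOST = {1: 1.5, 2: 1.2, 4: 0.6}  # Tier 3 presumably neutral (1.0)

def rerank_score(similarity: float, tier: int, cycles_old: int,
                 persona_affinity: float = 1.0) -> float:
    decay = 0.97 ** cycles_old  # lose 3% relevance per reflection cycle
    return similarity * TIER_BOOST.get(tier, 1.0) * decay * persona_affinity
```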

Layer 2: Knowledge Graph (RyuGraph)

An embedded RyuGraph (a Kuzu fork) graph database at data/graph.db captures cross-entity relationships that vector search alone can't surface. No external server is needed — the graph runs in-process.

  • Entity nodes — zones, species, communities, threats, governance structures
  • Relationship edges — "salmon spawn in zone 310", "LNG tankers threaten whale habitat", "First Nation co-governs MPA"
  • Path queries — given a topic like "salmon recovery", the graph traverses connections to find related economic impacts, governance frameworks, and community stories that vector search might miss
  • Cross-persona transfer — topic insights that work for one persona can be discovered and adapted for others via graph traversal
  • Research gap detection — identifies under-covered topics and content lineage across generation cycles

Graph context is injected alongside vector results, giving the LLM a richer, more interconnected understanding of the topic. The system falls back gracefully to SQL-based entity lookups when the ryugraph package isn't installed.
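For illustration, a path query might look like the sketch below, assuming the ryugraph package mirrors Kuzu's Python API (Database, Connection, execute) and using a hypothetical schema; neither the labels nor the property names are the actual ones:

```python
import ryugraph  # assumed Kuzu-style API; SQL fallback applies if unavailable

db = ryugraph.Database("data/graph.db")
conn = ryugraph.Connection(db)
# Hypothetical traversal: from a species to its zones, then to threats.
result = conn.execute(
    "MATCH (s:Species {name: 'salmon'})-[:SPAWNS_IN]->(z:Zone)"
    "<-[:THREATENS]-(t:Threat) RETURN z.id, t.name"
)
```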

Layer 3: Observer Agent Bypass

For topics the system has seen before, the Observer Agent provides a pre-compressed research briefing that bypasses both vector search and graph queries entirely:

  • Cached observations — previous research compressions are stored per persona + topic hash
  • Cache hits skip the entire retrieval pipeline, saving embedding API calls and latency
  • Cache misses trigger fresh compression; the result is then cached for future use

The system automatically selects the best retrieval mode. The Primary Sources panel on each content card shows which method was used (Semantic search, Knowledge graph, Observer compression, or Full context).
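A sketch of the cache keying; the exact key format and backing store are assumptions:

```python
import hashlib

# Observations are keyed by persona plus a hash of the normalized topic.
def observation_key(persona_id: str, topic: str) -> str:
    topic_hash = hashlib.sha256(topic.strip().lower().encode()).hexdigest()[:16]
    return f"{persona_id}:{topic_hash}"

def cached_briefing(cache: dict, persona_id: str, topic: str) -> str | None:
    # A hit here skips vector search and graph queries entirely.
    return cache.get(observation_key(persona_id, topic))
```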

Full Context Mode (CAG)

For models with large context windows, the system can load the entire research corpus into a single prompt with prompt caching enabled. This is the most thorough mode — the LLM sees every fact, every citation, every zone profile — but costs more tokens. Prompt caching (supported via Anthropic and OpenRouter) reduces repeat costs by 90% for the cached system prompt portion.
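A sketch of a full-context call with prompt caching through the Anthropic SDK; the corpus path, model string, and prompt are illustrative:

```python
import anthropic

research_corpus = open("research/compiled.md").read()  # hypothetical path

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": research_corpus,                  # the entire research corpus
        "cache_control": {"type": "ephemeral"},   # cached across repeat calls
    }],
    messages=[{"role": "user",
               "content": "Draft a post for Local Economy Advocates on MPA jobs."}],
)
```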

👁 Observer Agent

The Observer Agent is an autonomous background process that compresses research, maintains editorial memory, and continuously improves retrieval quality between generation cycles.

Research Compression

Raw campaign research can be 150,000+ characters across dozens of documents. The Observer compresses this into ~30,000 characters of structured observations per persona, focusing on the facts most relevant to each audience:

  • Compression ratios of 5:1 to 8:1 while preserving all cited statistics and source references
  • Persona-specific framing — economic data emphasized for Local Economy Advocates, ecological data for Elder Progressives
  • Results cached by persona + topic hash for instant retrieval on repeat topics

Editorial Memory

After each editorial review cycle, the Observer analyzes patterns in approvals, rejections, and edits:

  • Mini editorial reflections — an LLM summarizes recent feedback into actionable rules ("this persona responds well to economic framing but rejects governance-heavy posts")
  • Preference extraction — "do more" / "do less" rules derived from approval/rejection patterns, scoped by persona and topic
  • Stale preference pruning — preferences older than 90 days with low evidence counts are automatically deactivated to prevent outdated rules from persisting

Periodic Reflection

The Observer runs periodic maintenance tasks that keep the knowledge base healthy:

  • Temporal decay — applies a 0.97x multiplier to relevance scores on older RAG chunks, ensuring fresh research surfaces above stale data (sketched after this list)
  • Low-relevance pruning — observations that consistently fail to contribute to approved content are expired
  • Cache metrics — hit/miss rates, compression ratios, and retrieval performance are logged for the pipeline monitoring dashboard
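The decay step itself is a one-line update. A sketch, with the table and column names as assumptions; the 0.97 multiplier is the one described above:

```python
import sqlite3

def apply_temporal_decay(db_path: str = "data/content.db") -> None:
    # Uniformly age all chunk relevance scores by one reflection cycle.
    with sqlite3.connect(db_path) as conn:
        conn.execute("UPDATE rag_chunks SET relevance_score = relevance_score * 0.97")
        conn.commit()
```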

Publishing & Analytics (Coming Soon)

The infrastructure is built for Meta API integration: scheduled publishing to Facebook and Instagram plus engagement analytics via the Meta Graph API.

Under the Hood

Backend

  • Python 3 + FastAPI — async API server
  • SQLite + sqlite-vec — content database + vector embeddings for RAG
  • RyuGraph / Kuzu — embedded knowledge graph at data/graph.db (no external server needed; falls back to SQL-based entity lookups when ryugraph package isn't installed)
  • Hosted on Dreamhost via FastCGI (passenger_wsgi.py)

Frontend

  • Vanilla JS (no framework), Chart.js 4.4 for dashboards
  • 8 pages — Dashboard, Analysis, Settings, Pipeline Intelligence, Admin, Calendar, How It Works, Persona Detail

LLM Providers (switchable via env var)

  • OpenRouter — currently active (Claude Sonnet 4.5 with prompt caching for cost efficiency)
  • Anthropic, Google Gemini, OpenAI, Ollama, DeepSeek, Kimi — provider-agnostic switching with per-call token usage and cost tracking

External APIs

  • Meta Graph API — Facebook / Instagram publishing + analytics
  • Wikimedia Commons, Pexels, Unsplash — image search
  • ElevenLabs — text-to-speech
  • OpenAI Embeddings API — text-embedding-3-small for RAG

Data Flow

Campaign research (MD files) → Chunked & embedded into vector store → Observer Agent compresses research (150K → 30K chars) → Hybrid retrieval (semantic + graph + observer) selects context → Three-pass LLM pipeline (Draft → Check & Fix → Quality Gate) → Image matching (local catalog → external APIs) → Human editorial review → Feedback loop → Next generation cycle

Learning Index

Live log of all editorial signals indexed for semantic retrieval. These entries are retrieved at generation time to help the AI learn from past editorial decisions.
