Data-driven content tailoring using psychographic segmentation, machine learning, and self-improving feedback loops.
The psychographic segments below were derived from the MPA National Survey 2024 (n=10,003) and represent distinct audiences with specific beliefs, values, and communication preferences. This campaign uses that evidence base to create audience-specific content grounded in local context. Content is never generic — every post is tailored to a specific persona.
| Persona | Size | Priority | Key Insight |
|---|---|---|---|
| Elder Progressives | 8.0% | HIGH | Support MPAs in principle but only 39% believe they work |
| Local Economy Advocates | 10.7% | HIGH | 81% believe MPAs work — need the economic case |
| Traditional Rural Elders | 12.9% | HIGH | Most skeptical (30% believe MPAs work), need local framing |
| Precarious Disengaged | 12.9% | MEDIUM | 45% "unsure" — need awareness first |
| Young Urban Bros | 12.3% | MEDIUM | 39% "depends" — need practical facts, no preaching |
| Conservationists | 20.8% | ALIGNED | 71% "very good idea" — organic amplifiers |
| Young Progressives | 10.5% | ALIGNED | 77% positive — social amplification |
| Free Speech Warriors | 6.8% | SKIP | 43% climate skeptics — not persuadable |
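The persona table above can be represented as plain data that drives targeting. This is an illustrative sketch (the `Persona` class and field names are assumptions, not the system's actual schema); it shows how the SKIP priority would exclude a segment from generation.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    """One psychographic segment from the survey-derived table (fields are illustrative)."""
    name: str
    size_pct: float   # share of survey respondents
    priority: str     # HIGH / MEDIUM / ALIGNED / SKIP
    key_insight: str  # framing guidance fed into the LLM brief

PERSONAS = [
    Persona("Elder Progressives", 8.0, "HIGH",
            "Support MPAs in principle but only 39% believe they work"),
    Persona("Free Speech Warriors", 6.8, "SKIP",
            "43% climate skeptics; not persuadable"),
]

# Only HIGH / MEDIUM / ALIGNED personas are targeted for generation.
targets = [p for p in PERSONAS if p.priority != "SKIP"]
```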
When generating content, the LLM receives a comprehensive brief for the target persona.
Every post is grounded in real research. The system doesn't fabricate facts — it extracts from a campaign-specific research document that provides the substance for each post. The persona determines the framing; the research provides the content.
The research documents uploaded to this campaign provide the factual substance. The system draws from different sections depending on the topic:
Each generated post draws from a different section of the research, so a single batch can cover policy, economics, and on-the-ground impacts, all framed through the same persona's values while teaching different concrete facts.
Factual source verification is built in. A Works Cited section (5 primary sources) provides hard numbers for zone assessments, economic impacts, federal commitments, species data, and survey findings. Every claim in generated content must trace to these sources.
Campaign-Interchangeable Architecture
Each campaign uploads three data layers that drive the entire system:
Upload your research data and personas, and the system handles the rest.
The system learns from three distinct feedback channels, each operating on a different timescale and requiring different thresholds.
1. Admin Feedback (Instant Rules)
When an admin identifies a problem in generated content, they can create a Universal Lesson that applies immediately to ALL future generation. No threshold required - one bad example = one rule.
These rules are injected into the generation prompt as MANDATORY RULES before persona-specific guidance.
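The rule-injection step can be sketched as simple prompt assembly. The function name and prompt layout below are assumptions for illustration; the point is that universal lessons are prepended as MANDATORY RULES before any persona guidance.

```python
def build_generation_prompt(universal_lessons, persona_brief, topic):
    """Assemble the prompt: admin rules first, then persona guidance (illustrative layout)."""
    rules = "\n".join(f"- {lesson}" for lesson in universal_lessons)
    return (
        "MANDATORY RULES (apply to ALL output):\n"
        f"{rules}\n\n"
        f"PERSONA BRIEF:\n{persona_brief}\n\n"
        f"TOPIC: {topic}"
    )

prompt = build_generation_prompt(
    ["Never use the phrase 'dive into'"],
    "Local Economy Advocates: lead with the economic case",
    "MPA zone expansion",
)
```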
2. Edit-Based Learning (Auto-Detected Patterns)
When an admin edits post copy, the system analyzes the diff to detect teachable patterns:
The API returns suggested lessons from detected edits. One click to make them universal rules.
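One plausible way to detect teachable patterns from an edit diff is a word-level comparison; this sketch uses Python's `difflib` and a deliberately simple heuristic (deletions become "avoid" lessons, replacements become "prefer" lessons). The real detection logic is not specified in this document.

```python
import difflib

def detect_lessons(original: str, edited: str):
    """Compare an admin edit against the draft and surface candidate lessons (illustrative heuristic)."""
    orig_words, edit_words = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(None, orig_words, edit_words)
    lessons = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            lessons.append(f"Prefer '{' '.join(edit_words[j1:j2])}' "
                           f"over '{' '.join(orig_words[i1:i2])}'")
        elif op == "delete":
            lessons.append(f"Avoid '{' '.join(orig_words[i1:i2])}'")
    return lessons

lessons = detect_lessons("Our vibrant coastal community thrives",
                         "Our coastal community thrives")
```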
3. Engagement Metrics (Statistical Learning)
When Meta API integration is active, the system pulls real engagement data:
This is the only feedback loop with a threshold - you need enough data to detect real patterns, not noise.
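The "enough data, not noise" requirement can be made concrete with a standard two-proportion z-test; the minimum-impressions cutoff and z threshold below are illustrative assumptions, not the system's actual values.

```python
from math import sqrt

def engagement_lift_is_real(ctr_a, n_a, ctr_b, n_b, z=1.96):
    """Accept an engagement pattern only when the lift clears statistical noise.
    n_a/n_b are impression counts; the 100-impression floor is an assumed minimum."""
    if min(n_a, n_b) < 100:  # not enough data yet
        return False
    p = (ctr_a * n_a + ctr_b * n_b) / (n_a + n_b)       # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))        # standard error
    return abs(ctr_a - ctr_b) / se > z
```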
Where Lessons Live
All lessons are visible on the Analysis page in the "Lessons Learned" panel. Each lesson shows its source (admin rule, edit detection, or engagement analysis) and can be edited or deleted.
Every piece of content goes through three distinct LLM passes, each with a specific purpose. This catches errors that a single-pass system would miss.
Pass 1: Draft Generation
The LLM receives:
Output: Raw draft with title, copy, image direction, rationale, and research citations.
Pass 2: Editorial Review
A second LLM pass reviews the draft for:
Output: Revised copy with editorial notes explaining changes made.
Pass 3: Quality Gate (Self-Scoring)
A third LLM pass scores the content 0-10 on:
Threshold: 7/10. Posts scoring below 7 are automatically regenerated (up to 2 attempts). Posts that still fail are rejected and not saved to the database.
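The regenerate-or-reject behavior reduces to a small control loop. This is a minimal sketch assuming `generate` and `score` callables; the stub generator at the bottom is purely illustrative.

```python
def generate_with_quality_gate(generate, score, threshold=7, max_attempts=2):
    """Initial draft plus up to `max_attempts` regenerations; reject if all score below threshold."""
    for _ in range(1 + max_attempts):
        post = generate()
        if score(post) >= threshold:
            return post
    return None  # rejected: never saved to the database

# Stub generator whose drafts score 5, then 6, then 8 (illustrative).
drafts = iter([("draft-1", 5), ("draft-2", 6), ("draft-3", 8)])
state = {}
def fake_generate():
    state["post"], state["score"] = next(drafts)
    return state["post"]

accepted = generate_with_quality_gate(fake_generate, lambda p: state["score"])
```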
Programmatic Safety Net
In addition to LLM-based quality scoring, regex patterns detect hard fails that force rejection:
These patterns are detected after LLM scoring and override the score to 3 (hard fail). This ensures known AI tells never reach the dashboard, even if the LLM misses them.
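The score-override mechanism can be sketched in a few lines. The specific deny-list patterns below are illustrative examples of "AI tells", not the campaign's actual regex set.

```python
import re

# Illustrative "AI tell" patterns; the real deny-list is campaign-specific.
HARD_FAIL_PATTERNS = [
    re.compile(r"\bas an AI\b", re.IGNORECASE),
    re.compile(r"\bdelve into\b", re.IGNORECASE),
    re.compile(r"\bin today's fast-paced world\b", re.IGNORECASE),
]

def apply_safety_net(copy: str, llm_score: int) -> int:
    """Override the LLM's quality score to 3 (hard fail) when a known tell slips through."""
    if any(p.search(copy) for p in HARD_FAIL_PATTERNS):
        return 3
    return llm_score
```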
Every post needs a relevant image. The system uses a multi-layer approach to find the best visual match for each piece of content:
Layer 1: Local Catalog (searched first)
A curated repository of campaign-ready images is organized by themes and tagged with locations, subjects, and persona relevance. Images are sourced from permissively licensed libraries. The LLM's "image direction" output drives a content-based tag search so visuals match the post topic, not generic stock scenery.
Layer 2: External API Search (fallback)
If no local image matches well enough, the system searches three external APIs:
Search queries are extracted from the LLM's image direction field, so each image search is specific to the post's content (for example, place-based geography, species, or community work contexts).
Deduplication
The system tracks recently used images across the current batch and the last 20 posts, penalizing reuse so each post gets a distinct visual.
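The reuse penalty can be modeled with a fixed-size window over recent image IDs. The class name, window default, and penalty value are assumptions; the 20-post window comes from the description above.

```python
from collections import deque

class ImageDeduper:
    """Penalize images used in the current batch or the last N posts (N=20 per the doc)."""
    def __init__(self, window=20):
        self.recent = deque(maxlen=window)

    def score(self, image_id: str, base_score: float, penalty: float = 0.5) -> float:
        """Subtract a reuse penalty if this image appeared recently (penalty value is illustrative)."""
        return base_score - penalty if image_id in self.recent else base_score

    def mark_used(self, image_id: str):
        self.recent.append(image_id)

dedupe = ImageDeduper()
dedupe.mark_used("img-001")
```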
The system doesn't use a single retrieval method. It combines three complementary approaches to find the best research context for each generation, selecting the optimal mode based on topic, persona, and available data.
Layer 1: Semantic Vector Search (RAG)
Campaign research documents are chunked by section heading, embedded using OpenAI's text-embedding-3-small model, and stored in a SQLite vector database (sqlite-vec). At generation time, the topic is embedded and compared against all chunks using cosine similarity. Results are re-ranked by:
Every retrieval event is logged with chunk ID, similarity score, tier, and whether the resulting content was approved or rejected — creating a feedback loop that improves source quality over time.
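The retrieval-plus-rerank step can be sketched without the vector database. This toy version computes cosine similarity in pure Python and blends in a hypothetical per-chunk approval rate standing in for the logged feedback signal; the 0.8/0.2 weights are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(topic_vec, chunks, top_k=3):
    """Rank research chunks by similarity to the embedded topic, re-ranked by feedback.
    `chunks` is [(chunk_id, embedding, approval_rate)]; weights are illustrative."""
    scored = [
        (cid, 0.8 * cosine(topic_vec, vec) + 0.2 * approval)
        for cid, vec, approval in chunks
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]

results = retrieve([1.0, 0.0],
                   [("zone-profile", [1.0, 0.0], 0.0),
                    ("economics", [0.0, 1.0], 1.0)],
                   top_k=1)
```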
Layer 2: Knowledge Graph (RyuGraph)
An embedded RyuGraph (a Kuzu fork) graph database at data/graph.db captures cross-entity relationships that vector search alone can't surface. No external server needed — the graph runs in-process.
Graph context is injected alongside vector results, giving the LLM a richer, more interconnected understanding of the topic. Falls back gracefully to SQL-based entity lookups when the ryugraph package isn't installed.
Layer 3: Observer Agent Bypass
For topics the system has seen before, the Observer Agent provides a pre-compressed research briefing that bypasses both vector search and graph queries entirely:
The system automatically selects the best retrieval mode. The Primary Sources panel on each content card shows which method was used (Semantic search, Knowledge graph, Observer compression, or Full context).
Full Context Mode (CAG)
For models with large context windows, the system can load the entire research corpus into a single prompt with prompt caching enabled. This is the most thorough mode — the LLM sees every fact, every citation, every zone profile — but costs more tokens. Prompt caching (supported via Anthropic and OpenRouter) reduces repeat costs by 90% for the cached system prompt portion.
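A full-context (CAG) request with a cache breakpoint might be assembled like this. The helper and model name are assumptions for illustration; the `cache_control: {"type": "ephemeral"}` block follows Anthropic's prompt-caching API, where the cached system-prompt prefix is reused across calls.

```python
def build_cached_request(research_corpus: str, user_prompt: str,
                         model: str = "claude-sonnet-4"):
    """Full-context request: the entire corpus goes in the system prompt with a
    cache breakpoint so repeat generations reuse the cached prefix (sketch)."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {"type": "text",
             "text": research_corpus,
             "cache_control": {"type": "ephemeral"}},  # mark prefix as cacheable
        ],
        "messages": [{"role": "user", "content": user_prompt}],
    }

req = build_cached_request("FULL RESEARCH CORPUS ...", "Write a post for Elder Progressives")
```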
The Observer Agent is an autonomous background process that compresses research, maintains editorial memory, and continuously improves retrieval quality between generation cycles.
Research Compression
Raw campaign research can be 150,000+ characters across dozens of documents. The Observer compresses this into ~30,000 characters of structured observations per persona, focusing on the facts most relevant to each audience:
Editorial Memory
After each editorial review cycle, the Observer analyzes patterns in approvals, rejections, and edits:
Periodic Reflection
The Observer runs periodic maintenance tasks that keep the knowledge base healthy:
The infrastructure is built for Meta API integration:
Backend
- data/graph.db (no external server needed; falls back to SQL-based entity lookups when the ryugraph package isn't installed)
- passenger_wsgi.py

Frontend
LLM Providers (switchable via env var)
External APIs
Data Flow
Campaign research (MD files) → Chunked & embedded into vector store → Observer Agent compresses research (150K → 30K chars) → Hybrid retrieval (semantic + graph + observer) selects context → Three-pass LLM pipeline (Draft → Check & Fix → Quality Gate) → Image matching (local catalog → external APIs) → Human editorial review → Feedback loop → Next generation cycle
Live log of all editorial signals indexed for semantic retrieval. These entries are retrieved at generation time to help the AI learn from past editorial decisions.