Data-driven content tailoring using psychographic segmentation, machine learning, and self-improving feedback loops.
The psychographic segments below were derived from the MPA National Survey 2024 (n=10,003) and represent distinct audiences with specific beliefs, values, and communication preferences. This campaign uses that evidence base to create audience-specific content grounded in local context. Content is never generic — every post is tailored to a specific persona.
| Persona | Size | Priority | Key Insight |
|---|---|---|---|
| Elder Progressives | 8.0% | HIGH | Support MPAs in principle but only 39% believe they work |
| Local Economy Advocates | 10.7% | HIGH | 81% believe MPAs work — need the economic case |
| Traditional Rural Elders | 12.9% | HIGH | Most skeptical (30% believe MPAs work), need local framing |
| Precarious Disengaged | 12.9% | MEDIUM | 45% "unsure" — need awareness first |
| Young Urban Bros | 12.3% | MEDIUM | 39% "depends" — need practical facts, no preaching |
| Conservationists | 20.8% | ALIGNED | 71% "very good idea" — organic amplifiers |
| Young Progressives | 10.5% | ALIGNED | 77% positive — social amplification |
| Free Speech Warriors | 6.8% | SKIP | 43% climate skeptics — not persuadable |
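The persona table above can be represented as plain data that drives targeting. This is an illustrative sketch (the `Persona` class and field names are assumptions, not the system's actual schema); it shows how the SKIP priority would exclude a segment from generation.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    """One psychographic segment from the survey-derived table (fields are illustrative)."""
    name: str
    size_pct: float   # share of survey respondents
    priority: str     # HIGH / MEDIUM / ALIGNED / SKIP
    key_insight: str  # framing guidance fed into the LLM brief

PERSONAS = [
    Persona("Elder Progressives", 8.0, "HIGH",
            "Support MPAs in principle but only 39% believe they work"),
    Persona("Free Speech Warriors", 6.8, "SKIP",
            "43% climate skeptics; not persuadable"),
]

# Only HIGH / MEDIUM / ALIGNED personas are targeted for generation.
targets = [p for p in PERSONAS if p.priority != "SKIP"]
```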
When generating content, the LLM receives a comprehensive brief for the target persona.
Every post is grounded in real research. The system doesn't fabricate facts — it extracts from a campaign-specific research document that provides the substance for each post. The persona determines the framing; the research provides the content.
The research documents uploaded to this campaign provide the factual substance. The system draws from different sections depending on the topic:
Each generated post draws from a different section of the research, so a single batch can cover policy, economics, and on-the-ground impacts, all framed through the same persona's values while teaching different concrete facts.
Factual source verification is built in. A Works Cited section (5 primary sources) provides hard numbers for zone assessments, economic impacts, federal commitments, species data, and survey findings. Every claim in generated content must trace to these sources.
Campaign-Interchangeable Architecture
Each campaign uploads three data layers that drive the entire system:
Upload your research data and personas, and the system handles the rest.
The system learns from three distinct feedback channels, each operating on a different timescale and requiring different thresholds.
1. Admin Feedback (Instant Rules)
When an admin identifies a problem in generated content, they can create a Universal Lesson that applies immediately to ALL future generation. No threshold required - one bad example = one rule.
These rules are injected into the generation prompt as MANDATORY RULES before persona-specific guidance.
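The rule-injection step can be sketched as simple prompt assembly. The function name and prompt layout below are assumptions for illustration; the point is that universal lessons are prepended as MANDATORY RULES before any persona guidance.

```python
def build_generation_prompt(universal_lessons, persona_brief, topic):
    """Assemble the prompt: admin rules first, then persona guidance (illustrative layout)."""
    rules = "\n".join(f"- {lesson}" for lesson in universal_lessons)
    return (
        "MANDATORY RULES (apply to ALL output):\n"
        f"{rules}\n\n"
        f"PERSONA BRIEF:\n{persona_brief}\n\n"
        f"TOPIC: {topic}"
    )

prompt = build_generation_prompt(
    ["Never use the phrase 'dive into'"],
    "Local Economy Advocates: lead with the economic case",
    "MPA zone expansion",
)
```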
2. Edit-Based Learning (Auto-Detected Patterns)
When an admin edits post copy, the system analyzes the diff to detect teachable patterns:
The API returns suggested lessons from detected edits. One click to make them universal rules.
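One plausible way to detect teachable patterns from an edit diff is a word-level comparison; this sketch uses Python's `difflib` and a deliberately simple heuristic (deletions become "avoid" lessons, replacements become "prefer" lessons). The real detection logic is not specified in this document.

```python
import difflib

def detect_lessons(original: str, edited: str):
    """Compare an admin edit against the draft and surface candidate lessons (illustrative heuristic)."""
    orig_words, edit_words = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(None, orig_words, edit_words)
    lessons = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            lessons.append(f"Prefer '{' '.join(edit_words[j1:j2])}' "
                           f"over '{' '.join(orig_words[i1:i2])}'")
        elif op == "delete":
            lessons.append(f"Avoid '{' '.join(orig_words[i1:i2])}'")
    return lessons

lessons = detect_lessons("Our vibrant coastal community thrives",
                         "Our coastal community thrives")
```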
3. Engagement Metrics (Statistical Learning)
When Meta API integration is active, the system pulls real engagement data:
This is the only feedback loop with a threshold - you need enough data to detect real patterns, not noise.
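The "enough data, not noise" requirement can be made concrete with a standard two-proportion z-test; the minimum-impressions cutoff and z threshold below are illustrative assumptions, not the system's actual values.

```python
from math import sqrt

def engagement_lift_is_real(ctr_a, n_a, ctr_b, n_b, z=1.96):
    """Accept an engagement pattern only when the lift clears statistical noise.
    n_a/n_b are impression counts; the 100-impression floor is an assumed minimum."""
    if min(n_a, n_b) < 100:  # not enough data yet
        return False
    p = (ctr_a * n_a + ctr_b * n_b) / (n_a + n_b)       # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))        # standard error
    return abs(ctr_a - ctr_b) / se > z
```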
Where Lessons Live
All lessons are visible on the Analysis page in the "Lessons Learned" panel. Each lesson shows its source (admin rule, edit detection, or engagement analysis) and can be edited or deleted.
Every piece of content goes through three distinct LLM passes, each with a specific purpose. This catches errors that a single-pass system would miss.
Pass 1: Draft Generation
The LLM receives:
Output: Raw draft with title, copy, image direction, rationale, and research citations.
Pass 2: Editorial Review
A second LLM pass reviews the draft for:
Output: Revised copy with editorial notes explaining changes made.
Pass 3: Quality Gate (Self-Scoring)
A third LLM pass scores the content 0-10 on:
Threshold: 7/10. Posts scoring below 7 are automatically regenerated (up to 2 attempts). Posts that still fail are rejected and not saved to the database.
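The regenerate-or-reject behavior reduces to a small control loop. This is a minimal sketch assuming `generate` and `score` callables; the stub generator at the bottom is purely illustrative.

```python
def generate_with_quality_gate(generate, score, threshold=7, max_attempts=2):
    """Initial draft plus up to `max_attempts` regenerations; reject if all score below threshold."""
    for _ in range(1 + max_attempts):
        post = generate()
        if score(post) >= threshold:
            return post
    return None  # rejected: never saved to the database

# Stub generator whose drafts score 5, then 6, then 8 (illustrative).
drafts = iter([("draft-1", 5), ("draft-2", 6), ("draft-3", 8)])
state = {}
def fake_generate():
    state["post"], state["score"] = next(drafts)
    return state["post"]

accepted = generate_with_quality_gate(fake_generate, lambda p: state["score"])
```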
Programmatic Safety Net
In addition to LLM-based quality scoring, regex patterns detect hard fails that force rejection:
These patterns are detected after LLM scoring and override the score to 3 (hard fail). This ensures known AI tells never reach the dashboard, even if the LLM misses them.
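The score-override mechanism can be sketched in a few lines. The specific deny-list patterns below are illustrative examples of "AI tells", not the campaign's actual regex set.

```python
import re

# Illustrative "AI tell" patterns; the real deny-list is campaign-specific.
HARD_FAIL_PATTERNS = [
    re.compile(r"\bas an AI\b", re.IGNORECASE),
    re.compile(r"\bdelve into\b", re.IGNORECASE),
    re.compile(r"\bin today's fast-paced world\b", re.IGNORECASE),
]

def apply_safety_net(copy: str, llm_score: int) -> int:
    """Override the LLM's quality score to 3 (hard fail) when a known tell slips through."""
    if any(p.search(copy) for p in HARD_FAIL_PATTERNS):
        return 3
    return llm_score
```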
Every post needs a relevant image. The system uses a multi-layer approach to find the best visual match for each piece of content:
Layer 1: Local Catalog (searched first)
A curated repository of campaign-ready images is organized by themes and tagged with locations, subjects, and persona relevance. Images are sourced from permissively licensed libraries. The LLM's "image direction" output drives a content-based tag search so visuals match the post topic, not generic stock scenery.
Layer 2: External API Search (fallback)
If no local image matches well enough, the system searches three external APIs:
Search queries are extracted from the LLM's image direction field, so each image search is specific to the post's content (for example, place-based geography, species, or community work contexts).
Deduplication
The system tracks recently used images across the current batch and the last 20 posts, penalizing reuse so each post gets a distinct visual.
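The reuse penalty can be modeled with a fixed-size window over recent image IDs. The class name, window default, and penalty value are assumptions; the 20-post window comes from the description above.

```python
from collections import deque

class ImageDeduper:
    """Penalize images used in the current batch or the last N posts (N=20 per the doc)."""
    def __init__(self, window=20):
        self.recent = deque(maxlen=window)

    def score(self, image_id: str, base_score: float, penalty: float = 0.5) -> float:
        """Subtract a reuse penalty if this image appeared recently (penalty value is illustrative)."""
        return base_score - penalty if image_id in self.recent else base_score

    def mark_used(self, image_id: str):
        self.recent.append(image_id)

dedupe = ImageDeduper()
dedupe.mark_used("img-001")
```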
The system doesn't use a single retrieval method. It combines three complementary approaches to find the best research context for each generation, selecting the optimal mode based on topic, persona, and available data.
Layer 1: Semantic Vector Search (RAG)
Campaign research documents are chunked by section heading, embedded using OpenAI's text-embedding-3-small model, and stored in a SQLite vector database (sqlite-vec). At generation time, the topic is embedded and compared against all chunks using cosine similarity. Results are re-ranked by:
Every retrieval event is logged with chunk ID, similarity score, tier, and whether the resulting content was approved or rejected — creating a feedback loop that improves source quality over time.
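The retrieval-plus-rerank step can be sketched without the vector database. This toy version computes cosine similarity in pure Python and blends in a hypothetical per-chunk approval rate standing in for the logged feedback signal; the 0.8/0.2 weights are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(topic_vec, chunks, top_k=3):
    """Rank research chunks by similarity to the embedded topic, re-ranked by feedback.
    `chunks` is [(chunk_id, embedding, approval_rate)]; weights are illustrative."""
    scored = [
        (cid, 0.8 * cosine(topic_vec, vec) + 0.2 * approval)
        for cid, vec, approval in chunks
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]

results = retrieve([1.0, 0.0],
                   [("zone-profile", [1.0, 0.0], 0.0),
                    ("economics", [0.0, 1.0], 1.0)],
                   top_k=1)
```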
Layer 2: Knowledge Graph (RyuGraph)
An embedded RyuGraph (a Kuzu fork) graph database at data/graph.db captures cross-entity relationships that vector search alone can't surface. No external server needed — the graph runs in-process.
Graph context is injected alongside vector results, giving the LLM a richer, more interconnected understanding of the topic. Falls back gracefully to SQL-based entity lookups when the ryugraph package isn't installed.
Layer 3: Observer Agent Bypass
For topics the system has seen before, the Observer Agent provides a pre-compressed research briefing that bypasses both vector search and graph queries entirely:
The system automatically selects the best retrieval mode. The Primary Sources panel on each content card shows which method was used (Semantic search, Knowledge graph, Observer compression, or Full context).
Full Context Mode (CAG)
For models with large context windows, the system can load the entire research corpus into a single prompt with prompt caching enabled. This is the most thorough mode — the LLM sees every fact, every citation, every zone profile — but costs more tokens. Prompt caching (supported via Anthropic and OpenRouter) reduces repeat costs by 90% for the cached system prompt portion.
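A full-context (CAG) request with a cache breakpoint might be assembled like this. The helper and model name are assumptions for illustration; the `cache_control: {"type": "ephemeral"}` block follows Anthropic's prompt-caching API, where the cached system-prompt prefix is reused across calls.

```python
def build_cached_request(research_corpus: str, user_prompt: str,
                         model: str = "claude-sonnet-4"):
    """Full-context request: the entire corpus goes in the system prompt with a
    cache breakpoint so repeat generations reuse the cached prefix (sketch)."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": [
            {"type": "text",
             "text": research_corpus,
             "cache_control": {"type": "ephemeral"}},  # mark prefix as cacheable
        ],
        "messages": [{"role": "user", "content": user_prompt}],
    }

req = build_cached_request("FULL RESEARCH CORPUS ...", "Write a post for Elder Progressives")
```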
The Observer Agent is an autonomous background process that compresses research, maintains editorial memory, and continuously improves retrieval quality between generation cycles.
Research Compression
Raw campaign research can be 150,000+ characters across dozens of documents. The Observer compresses this into ~30,000 characters of structured observations per persona, focusing on the facts most relevant to each audience:
Editorial Memory
After each editorial review cycle, the Observer analyzes patterns in approvals, rejections, and edits:
Periodic Reflection
The Observer runs periodic maintenance tasks that keep the knowledge base healthy:
The infrastructure is built for Meta API integration:
Backend
- data/graph.db (no external server needed; falls back to SQL-based entity lookups when the ryugraph package isn't installed)
- passenger_wsgi.py

Frontend
LLM Providers (switchable via env var)
External APIs
Data Flow
Campaign research (MD files) → Chunked & embedded into vector store → Observer Agent compresses research (150K → 30K chars) → Hybrid retrieval (semantic + graph + observer) selects context → Three-pass LLM pipeline (Draft → Check & Fix → Quality Gate) → Image matching (local catalog → external APIs) → Human editorial review → Feedback loop → Next generation cycle
Live log of all editorial signals indexed for semantic retrieval. These entries are retrieved at generation time to help the AI learn from past editorial decisions.