One Agentic

Foundation Document

AI Due Diligence Workspace

Version 1.0 · May 2026 · Founders & Engineering

Section 01

The Problem We're Solving

What's genuinely broken in VC due diligence today. No softening.

Early-stage venture investing runs on incomplete information. The information that actually matters is hard to get, expensive to gather, and inconsistently applied across deals and partners. The product doesn't exist yet, the track record is thin, and the market may not have validated itself. You are evaluating a thesis about people and timing, armed with a deck, a meeting, and whatever your network happens to surface.

The result is a diligence process that is structurally prone to two distinct failure modes: funding the wrong people, and missing the right ones. Both are expensive. Neither is obviously avoidable without better infrastructure.

The Reality on the Ground

A typical seed-stage deal gets 2–10 hours of diligence from an analyst.^[1] Enough to sanity-check the deck, not enough to find what matters.
Founder background checks are informal — a few warm calls if you're lucky, a LinkedIn scan if you're not.
Larger funds with analysts do more — but it's inconsistent between partners and deal cycles, and completely non-transferable when someone leaves.
Smaller funds simply do less. They bet on gut and network, which is fine until it isn't.
Reference checks are structurally broken: founders provide references who will speak well of them. Genuine back-channel references are only available to investors with strong networks — which means diligence quality correlates with who you know, not how well you look.
Competitive pressure to move fast compresses the time available for diligence. Investors who slow down to think carefully often lose the deal.^[8] This creates a systematic incentive to cut diligence short on precisely the deals that move fastest.
The feedback loop is brutally long. Warning signs (missed milestones, failed fundraises, board friction) surface within 1–2 years, but full resolution rarely arrives in fewer than 7–10 years. More critically, even when a company is clearly struggling, the attribution is murky: bad execution, bad timing, or bad judgment at the point of investment? A VC can be systematically wrong in their selection criteria for years, attributing failures to circumstance and wins to skill.

What Gets Missed as a Result

Founders with inflated or misrepresented track records that a 30-minute background check would surface.
Conflicting information about founders and startups that surfaces across different sources — one account of a prior exit, a different one on LinkedIn, a third in a news article — and no structured way to reconcile it.
Reference signals that don't show up in formal calls but are findable if you know where to look.
Pattern breaks — the thing that's slightly different about this founder or this market that changes the entire thesis.
Market timing errors — identifying whether a market will exist is hard enough; identifying when is a separate, harder question that almost nobody answers reliably from a deck alone.

The Two Failure Modes That Define the Problem

Underneath all of this are two specific errors that diligence is supposed to prevent — and rarely does well:

Funding the Wrong People

Early-stage traction is easy to manufacture. Vanity metrics, curated reference lists, inflated prior exit narratives, coordinated social proof — a motivated founder can construct a compelling signal picture without the underlying substance. Investors evaluating these signals at speed, without structured verification, are routinely misled. Not because they're credulous, but because the signals are designed to deceive and the tools to verify them don't exist in one place.

Missing the Right People

The inverse failure is just as costly and far less discussed. Genuinely exceptional founders who are first-timers, outside the network, from underrepresented geographies, or simply quiet about their work produce weak online signals. They don't have warm intros to partners, polished decks, or press coverage. Pattern-matching on surface signals — LinkedIn pedigree, prior exits, familiar schools — systematically screens out this cohort. The best deal a fund ever passed on is often invisible in their post-mortems.

The tools that exist today automate the shallow stuff (CRM enrichment, news alerts). They are not built to spot fake traction, surface conflicting information across sources, or identify founders who lack a strong online presence. Nothing is built around what a VC team actually needs to make a better decision, and nothing is designed to reduce both false positives and false negatives at once.

Section 02

What We're Building

Plain mechanics. What the product is and what a team does inside it.

An AI workspace where VC teams — partners and analysts together — run due diligence on startups. The product combines automated enrichment with a thesis-aware evaluation engine and a human-directed agent, producing a picture of the opportunity that is specific to how each fund actually invests.

Thesis Capture

Before any deal is evaluated, a fund encodes their investment thesis — stage, markets, team profiles, the signals they weight, and what they've decided not to invest in. This is the foundation against which every deal is assessed. It is not a one-time setup; it is a living configuration. The output is never "this is a good startup" — it is "this is or isn't a fit for this fund's specific thesis, and here is why."
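
As a rough illustration of what an encoded thesis and a fit verdict could look like, here is a minimal sketch. The class names, field names, and the `assess_fit` function are hypothetical and not the product's actual schema. The sketch only checks stage, market, and exclusions; weighted signals are stored but not scored.

```python
from dataclasses import dataclass, field


@dataclass
class FundThesis:
    """Hypothetical shape for an encoded thesis; field names are illustrative."""
    stages: set[str]
    markets: set[str]
    exclusions: set[str] = field(default_factory=set)  # what the fund will not back
    signal_weights: dict[str, float] = field(default_factory=dict)  # not scored in this sketch


@dataclass
class Deal:
    name: str
    stage: str
    market: str
    tags: set[str] = field(default_factory=set)


def assess_fit(thesis: FundThesis, deal: Deal) -> tuple[bool, list[str]]:
    """Return (is_fit, reasons). The verdict is about thesis fit, not startup quality."""
    reasons = []
    if deal.stage not in thesis.stages:
        reasons.append(f"stage '{deal.stage}' is outside the thesis")
    if deal.market not in thesis.markets:
        reasons.append(f"market '{deal.market}' is outside the thesis")
    for tag in sorted(deal.tags & thesis.exclusions):
        reasons.append(f"'{tag}' is on the fund's exclusion list")
    if reasons:
        return False, reasons
    return True, ["stage and market match the thesis; no exclusions triggered"]


# Made-up example values, for illustration only.
thesis = FundThesis(
    stages={"pre-seed", "seed"},
    markets={"developer tools", "fintech"},
    exclusions={"gambling"},
)
deal = Deal(name="ExampleCo", stage="series-b", market="fintech", tags={"payments"})
is_fit, reasons = assess_fit(thesis, deal)
```

Returning the reasons alongside the verdict mirrors the requirement that the output always says why a deal is or isn't a fit for this fund.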

Workflow Design

Each fund defines their own diligence workflow — the steps they run, the questions they ask, the structure that fits how they invest. Workflows are created conversationally, through the AI agent, not through a visual builder. The fund describes what they want; the agent generates the workflow. Pre-built templates are available as starting points.

Open question: To what degree does a fund's workflow definition influence which data sources the product queries? Whether source selection is fully product-controlled or partly shaped by the workflow is still being decided.

The Core Interaction Loop

A deal enters through whichever path fits the moment: a pitch deck, a company URL, a founder name, or a structured intake form. The product extracts what it needs from whatever it receives.
The agent runs enrichment automatically — founder background signals, company context, market references, public signals — and evaluates the deal against the fund's thesis and workflow.
The product surfaces conflicts proactively: when independent sources tell different stories about the same fact, when a claimed background has no verifiable trace, or when signals look manufactured. Each flag is specific and sourced. The product flags; it does not conclude.
The VC reviews the output and decides what to do next: move forward, pass, or direct the agent to do more — in plain language.
The agent executes, returns with results, and the VC reviews again. The loop continues until they have enough to decide.
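
The conflict-surfacing step in the loop above can be sketched as follows. This is a minimal illustration that assumes claims have already been extracted into a normalized form. The `Claim` type, its fields, and `find_conflicts` are hypothetical names, and the sample data is invented. It covers only the first kind of flag (independent sources disagreeing on the same fact), not untraceable backgrounds or manufactured signals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str  # who or what the claim is about, e.g. a founder
    fact: str     # which fact is being claimed, e.g. "prior_exit"
    value: str    # the value this source reports
    source: str   # where the claim came from


def find_conflicts(claims):
    """Group claims by (subject, fact) and flag groups whose sources disagree."""
    grouped = defaultdict(list)
    for claim in claims:
        grouped[(claim.subject, claim.fact)].append(claim)

    flags = []
    for (subject, fact), group in grouped.items():
        distinct_values = {c.value for c in group}
        if len(distinct_values) > 1:
            flags.append({
                "subject": subject,
                "fact": fact,
                # Keep every source so each flag stays specific and sourced.
                "evidence": [(c.source, c.value) for c in group],
            })
    return flags


# Invented sample claims, for illustration only.
claims = [
    Claim("Founder A", "prior_exit", "acquired for $40M", "pitch deck"),
    Claim("Founder A", "prior_exit", "acqui-hire, terms undisclosed", "news article"),
    Claim("Founder A", "prior_role", "CTO", "pitch deck"),
    Claim("Founder A", "prior_role", "CTO", "LinkedIn"),
]
flags = find_conflicts(claims)
```

The function returns the disagreement and its sources without ranking which account is true, in line with the rule that the product flags and does not conclude.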

What It Is Not

It is not a decision engine. It does not tell investors whether to invest. It is not a fully automated system: the VC directs every external action, including any outreach to founders — the agent drafts, the VC sends.

See also Product Strategy — the interaction loop in full detail, build sequencing, and what VCs will pay for.

Section 03

Vision

Where this ends up in 10 years if everything goes right.

Every capital allocation decision in early-stage investing is made with complete, unbiased information about the people and markets behind it — not just the ones a well-networked analyst happened to surface.

Today, the quality of a VC's due diligence is bounded by how much time their team has and who they happen to know. A tier-1 fund with ten analysts misses things. A two-partner fund misses most things.

We're building toward a world where the depth of diligence is no longer a function of headcount or network — where a small fund can evaluate a founder as thoroughly as the best team on Sand Hill Road, and where the signals that actually predict outcomes (founder resilience, market timing, category dynamics) are surfaced for every deal, not just the ones that get lucky.

Section 04

Mission

What we're doing right now, for whom, and why it matters.

We build tools that help VC teams — partners and analysts alike — see what they're actually investing in: the founder's real track record, the market's real dynamics, and the risks that don't show up in a pitch deck.

Our job right now is to make sure that when a VC team sits down to evaluate a startup, they're not flying half-blind. We combine what the team already knows with data that would take hours to surface manually — founder background signals, market nuance, competitive dynamics — and turn it into a clear, structured picture of the opportunity and the risk.

We're not replacing the judgment of a good investor. We're making sure that judgment is applied to the full picture, not just the parts that were easy to find.

Section 05

Who We're Building For

Specific user profiles. Who they are, what their day looks like, what they'll pay to fix.

Primary Users

The Analyst or Associate

Usually 1–3 years in, responsible for sourcing and first-pass diligence. Spends 40–60% of their time on research that feels repetitive and low-leverage. Wants to do good work but is capped by time. Will become a daily user if the product makes them look sharper in partner meetings.

The Partner

Reviews deals, relies on analyst output, fills gaps with their own network. Inconsistent in how deeply they go depending on how busy the week is. Will use the product to gut-check, not to grind. Needs to trust the output before they'll rely on it.

Fund Profile We're Optimizing for Now

Early-stage focus (Pre-Seed through Series A)
1–15 person team
Receiving 50–200+ inbound deals per month, declining most after light-touch screening
Advancing 5–30 deals to active pipeline evaluation, ranging from quick first-pass analysis to deep multi-source research on the most serious candidates
No dedicated data or research infrastructure

One Agentic supports the full deal lifecycle for this profile — from quick screening runs at the top of funnel to intensive research on deals approaching a partner decision.

Note: This profile will expand. But this is who we're building for at launch, and every product decision should be filtered through them first.

Section 06

The Market

How VC due diligence tooling works today and why this market is winnable now.

Market Structure

VC due diligence tooling is fragmented. Funds use a patchwork of: general CRM tools (Affinity, Salesforce), data providers (PitchBook, Crunchbase, CB Insights), background check services, and internal spreadsheets or Notion wikis for synthesis.

Most tools in the market today solve one layer of the diligence problem and stop — CRM enrichment, data lookup, or relationship tracking. The few that venture further into workflow, like Harmonic, are built primarily around sourcing and signal monitoring rather than the evaluation of a specific deal once it's in consideration. The gap isn't that nothing exists — it's that nothing is built around the full diligence workflow: structured capture, deep research, and AI synthesis, configured around how a specific fund actually thinks.

Why Now

LLMs are good enough to synthesize unstructured data meaningfully for the first time. Two years ago this product wasn't buildable at the quality bar VCs would accept.
VC funds are under pressure to move faster. Deal cycles compressed. Teams that can diligence faster without losing depth have a real edge.
Founder data is more accessible than ever — public signals, social footprints, news, court records, company filings — but nobody has connected the dots for VC.
AI tooling fatigue is creating an opening for products that are specific and useful over products that are generic and impressive.

Market Size — Honest Estimate

The more useful lens for sizing this market is users, not funds. A typical early-stage fund in our target profile (Pre-Seed through Series A, 1–15 people) has 2–5 people actively doing diligence work at any given time. Applied across an estimated 4,000–8,000 actively deploying early-stage funds globally,^[2] that puts the realistic addressable user base at roughly 8,000–40,000 people.

The product is sold as a tiered subscription — Seed, Series, and Max — each with a fixed number of seats and a shared monthly credit pool that scales with deal volume. Credits are consumed as workflows run; a fund in a slow quarter draws less, a fund in active deployment draws more. Self-serve tiers require no sales conversation and are billed annually by default. Full pricing structure and unit economics are in the Pricing Model & Assumptions.

Our current model assumes an average of 4 interactions per standard workflow, deep research sessions running north of 50K tokens, and a 90/10 standard-to-deep split. On those assumptions, a typical 5-person fund runs approximately 275 workflows per month^[3] and spends roughly $175/month, or ~$2,100/year. Those workflows span quick top-of-funnel screenings across a large inbound volume, enrichment runs on deals in active consideration, and deeper research sessions on a handful approaching a decision. Full pricing assumptions and sensitivity analysis are documented separately in the Pricing Model & Assumptions.
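
The per-fund figures above can be checked with a few lines of arithmetic. This sketch uses only the numbers stated in the paragraph. The implied cost per workflow is a derived average, not a published price, and the variable names are illustrative.

```python
workflows_per_month = 275   # typical 5-person fund, per the model
standard_pct = 90           # 90/10 standard-to-deep split
monthly_spend_usd = 175

standard_runs = workflows_per_month * standard_pct / 100   # 247.5 standard workflows
deep_runs = workflows_per_month - standard_runs            # 27.5 deep research runs
annual_spend_usd = monthly_spend_usd * 12                  # 2100

# Derived average across both workflow types (an implication, not a list price).
implied_cost_per_workflow = monthly_spend_usd / workflows_per_month
```

The derived average works out to roughly $0.64 per workflow. It blends standard and deep runs, so it will move if the 90/10 mix shifts.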

Geographic note: The fund count above is a global estimate. The near-term reachable market in year one is more concentrated — primarily North America (an estimated 2,000–2,500 active early-stage funds) and Western Europe (1,500–2,000), accounting for roughly 3,500–4,500 funds combined. Asia-Pacific, LATAM, and MENA represent real long-term volume but convert more slowly given discovery friction, varying diligence norms, and the time required to build regional trust. The optimistic scenario assumes meaningful penetration beyond North America and Western Europe; the base case does not depend on it.

TAM — Early-Stage VC Segment

Scenario	Active Funds	Avg Yearly Spend	ARR
Base	6,000	$2,100	~$12.6M
Optimistic	8,000	$3,500	~$28M
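
The ARR column is simply active funds multiplied by average yearly spend. A short sketch reproducing the two rows, using only the table's own inputs (the function name is illustrative):

```python
def annual_recurring_revenue(active_funds: int, avg_yearly_spend_usd: int) -> int:
    """ARR if every active fund in the scenario paid the average yearly spend."""
    return active_funds * avg_yearly_spend_usd


scenarios = {
    "Base": (6_000, 2_100),
    "Optimistic": (8_000, 3_500),
}
arr = {
    name: annual_recurring_revenue(funds, spend)
    for name, (funds, spend) in scenarios.items()
}
# arr["Base"] is 12,600,000 and arr["Optimistic"] is 28,000,000
```

Both figures assume full penetration of the fund count in each scenario, which is what makes them TAM numbers rather than forecasts.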

TAM — Angel Investor Segment

Active angel investors — solo operators evaluating deals independently, without analyst teams — represent a discrete second segment running the same core diligence workflow. Based on 66,000 active US angels identified by the Angel Capital Association,^[7] and 40 workflows per month per angel at the blended cost, the computed annual ARPU is —/year.

Scenario	Active Angels	ARPU/yr	ARR
Base	66,000	—	—
Optimistic	150,000	—	—

Combined TAM — VC + Angel

Scenario	VC ARR	Angel ARR	Combined ARR
Base	—	—	—
Optimistic	—	—	—

The VC and angel figures above represent the primary modelled segments. Four additional categories — multi-stage and larger VC funds (4,000 estimated firms), private equity funds with venture or growth arms (2,500), family offices doing direct deals (3,500), and corporate VC units (1,250) — run materially identical due diligence workflows at equal or greater intensity. Applying usage-based ARPU multipliers derived from deal volume and team size yields a combined ballpark TAM for these excluded segments of approximately —, bringing the full modelled opportunity to roughly — at the base case and roughly — in the optimistic scenario. An optimistic range for the extended segments has not yet been modelled; the optimistic total above uses the same extended segment figure as the base case. See the Excluded Segments Appendix for full assumptions and segment-level detail.

Full Market Summary

Segment	Base	Optimistic
VC — Early-Stage	—	—
Angel Investors	—	—
Combined VC + Angel	—	—
Extended Segments (base-case only — optimistic TBD)	—	—
Grand Total	—	—

Section 07

Business Model

How the product makes money and the assumptions it depends on.

The Model: Tiered Seats + Shared Credits

The product is sold as a tiered subscription. Each tier includes a fixed number of seats and a monthly credit pool shared across the organization. Credits are consumed when workflows run — standard diligence workflows draw less, deep research sessions draw more. Any tier can purchase additional credits at any time; purchased credits are pooled at the org level, not assigned per seat. Billing is annual by default; monthly is available at a small premium.

The Three Self-Serve Tiers and Enterprise

Tier	Seats included	Monthly credit pool	Price
Seed	2 (TBD)	Sized for small funds, low deal volume	[TBD] / mo, billed annually
Series	5 (TBD)	Sized for a typical early-stage fund in active deployment	[TBD] / mo, billed annually
Max	Unlimited (TBD)	Large pool for high deal volume; priority support included	[TBD] / mo, billed annually
Enterprise	Unlimited	Negotiated volume; no pool cap	Annual contract, custom pricing

Seed, Series, and Max are self-serve — funds sign up, pick a tier, and start without a sales conversation. The Enterprise tier is sold directly: it requires a scoping call and is priced on a custom annual contract. In addition to negotiated credit volume, Enterprise includes capabilities not available on self-serve tiers: a dedicated customer success manager, CRM and Slack integrations, API access, SSO and audit logs, and priority model routing for high-volume deal periods. Specific credit pool sizes and price points are finalized in the Pricing Model & Assumptions.

Additional Credits

Credits purchased on top of the included monthly pool are shared across all org members — not assigned to individual seats. A team in an active deployment sprint can top up once and every member draws from the same pool. Unused additional credits roll over; included monthly credits reset on renewal.
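
A minimal sketch of the pool rules described above: one org-level pool, included credits that reset on renewal, and purchased credits that roll over. The class and method names are hypothetical, and the credit amounts are made up. Drawing included credits before purchased ones is an assumption; the text does not specify the order.

```python
from dataclasses import dataclass


@dataclass
class OrgCreditPool:
    included_monthly: int          # credits included in the tier each period
    included_remaining: int        # what is left of the included pool this period
    additional_remaining: int = 0  # purchased top-ups, shared across the org

    def top_up(self, credits: int) -> None:
        """Add purchased credits to the shared pool (not to any one seat)."""
        self.additional_remaining += credits

    def consume(self, credits: int) -> bool:
        """Draw credits for a workflow run; return False if the org lacks enough."""
        if credits > self.included_remaining + self.additional_remaining:
            return False
        # Assumption: included credits are drawn before purchased ones.
        from_included = min(credits, self.included_remaining)
        self.included_remaining -= from_included
        self.additional_remaining -= credits - from_included
        return True

    def renew(self) -> None:
        """Included credits reset; unused additional credits roll over untouched."""
        self.included_remaining = self.included_monthly


pool = OrgCreditPool(included_monthly=1000, included_remaining=1000)
pool.top_up(200)
ran = pool.consume(1100)      # uses all 1000 included, then 100 purchased
blocked = pool.consume(500)   # only 100 credits remain, so this run is refused
pool.renew()
```

After renewal in this example, the included pool is back to 1000 and the 100 unused purchased credits are still available.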

Pilot Motion and High-Touch Onboarding

All self-serve accounts are eligible for a structured onboarding call in the first two weeks. For Enterprise accounts, and for any early-cohort fund where it makes sense, we offer a white-glove pilot: the team manually reviews the AI's output against the fund's last five completed deals — mapping what the platform would have surfaced against what actually happened post-investment. Where the AI would have caught something the team missed, we show it. This is a proof point, not a demo: it produces a fund-specific number rather than a general capability claim. We run these pilots manually during the early-partner cohort and productize if the conversion rate justifies it.

Unit Economics Assumptions

The following need to be validated with real usage data. Full sensitivity analysis is in the Pricing Model & Assumptions.

Workflow mix: Approximately 90% standard workflows, 10% deep research runs.^[4] If the mix shifts toward deep research, ARPU and cost both increase. Watch this ratio closely in the first 60 days of live usage.
Credit consumption per workflow type: Standard and deep research sessions have materially different draw rates. These must be set before launch and revisited as model routing evolves.
Overage revenue: Additional credit purchases are a meaningful expansion lever. A fund that regularly tops up is a natural upgrade candidate to the next tier.
Gross margin target: Margins should support reinvestment in data coverage and model improvement. Retention and expansion matter more than margin at this stage.

The Expansion Path

Revenue grows in two ways: new funds adopting the product, and existing funds expanding — moving from partial deal coverage to running every deal through the platform, or upgrading tiers as headcount or volume grows. The second lever is handled by customer success: usage reviews, tier upgrade conversations when credit utilization signals the fund has outgrown their plan, and a named account owner for Enterprise accounts. For self-serve tiers, the expansion trigger is usage data; for Enterprise, it is a structured quarterly review.

Section 08

Competition

Named plainly. What they do well, where they fall short. Update this quarterly.

Harmonic

Harmonic is a well-funded, widely adopted sourcing platform used by 1,000+ investors. Its core strength is finding high-potential companies early — often 6–12 months before they become visible elsewhere — using proprietary data on startups, funding rounds, headcount signals, and founder movement. Their Scout AI agent adds a meaningful research layer: it can map markets, assess founding team background, map competitive positioning, and draft outreach from a single prompt. This is not a data aggregator; it's a capable sourcing workflow with a growing AI layer.

Where Harmonic ends and One Agentic begins is the transition from sourcing to evaluation. Harmonic is built for top-of-funnel discovery — finding deals worth looking at. It is not built around the structured risk assessment a team runs once a deal is in active consideration: verifying claims, surfacing signal conflicts across sources, identifying red flags in unstructured data, or producing a structured diligence output shaped by a fund's specific thesis.

Positioning: Harmonic finds the deal; we evaluate it. Do not underestimate them — Scout is real and improving. The differentiation must be depth of evaluation, not presence of AI. Harmonic users are a natural acquisition target: they already pay for sourcing and will understand the gap.

Affinity

Affinity is the dominant CRM for private capital, trusted by some of the most recognized names in venture and private equity. Its core product is relationship intelligence: automatically mapping a firm's collective network across email and calendar to surface warm intro paths, relationship strength scores, and connection history across every team member. Deal Sourcing and Deal Management sit on top of that foundation, with enriched company data from 40+ sources, pipeline tracking, and AI-powered meeting notes. Their AI layer is real — Affinity MCP connects directly to Claude, ChatGPT, Copilot, and Gemini, allowing teams to search deals, prep for meetings, and update the CRM from inside any AI tool.

Where Affinity stops is independent pre-investment evaluation. Their "due diligence" feature means capturing and organizing what your team already knows — meeting notes, reference call records, communication history. It does not mean researching a founder independently, surfacing public risk signals, verifying claims, or synthesizing a structured risk picture from outside data. Their data enrichment is firmographic (funding rounds, headcount, growth metrics), not founder-level depth. Affinity knows who you've talked to; it does not tell you what you're actually investing in.

Positioning: Complementary to Affinity, not a replacement. Their CRM tracks the relationship and the pipeline; we handle the evaluation that sits in between sourcing and decision. Watch for Affinity expanding their AI layer into enrichment and research — the MCP integration means they could route external research into their CRM workflow faster than expected. Win Affinity users by filling the gap they openly leave.

Attio

Modern, highly customizable CRM with a purpose-built VC product and a growing AI feature set. Its AI layer includes call recording with auto-generated meeting summaries, AI-drafted follow-up emails, and contact enrichment. More accessible pricing than Affinity, making it increasingly attractive to smaller and tech-forward funds.

That said, Attio is fundamentally a CRM. Its AI is productivity-oriented — saving time on admin, automating follow-ups — not intelligence-focused. It does not perform founder background research, does not synthesize public signals into a structured risk picture, and does not support the collaborative diligence workflow a team needs before a partner meeting.

Positioning: Attio manages pipeline; we surface what you're actually investing in. Watch for product expansion into enrichment and synthesis.

Clay

Clay is a GTM data enrichment and automation platform used primarily by sales and marketing teams. It pulls from a wide range of data sources and lets users build AI-powered research workflows through Claygent, its AI agent. A technically capable VC analyst can wire together a credible research workflow in Clay: enriching founder and company profiles from multiple sources and pushing results to a CRM.

Clay has no VC-specific product, no structured diligence output, and no concept of a fund's investment thesis. The threat is indirect — it's cheap, powerful, and already in the hands of tech-forward analysts building their own version of what we offer. Every analyst who assembles their diligence workflow in Clay is a prospect we didn't convert.

Positioning: The DIY alternative. Win by offering what Clay can't: VC-specific structure, workflow depth, and outputs a partner trusts without assembly required.

AlphaSense

AlphaSense is an enterprise-grade market intelligence platform used by large financial firms — hedge funds, PE, investment banks, and consulting firms. It covers an extensive library of expert transcripts, broker research, company filings, and financial data. Its Deep Research AI agent autonomously generates investment-grade briefings; its Generative Search reasons across qualitative and structured data simultaneously. A Gartner Magic Quadrant leader in competitive and market intelligence.

The overlap with One Agentic is narrow but real for larger or institutional VC funds doing deep market diligence. For our core target it falls well short: no VC-specific workflow, no early-stage founder research, no product shaped around pre-seed to Series A evaluation. Enterprise pricing puts it out of reach for most funds we're targeting, and it is predominantly oriented toward public-company and later-stage research.

Positioning: Not a direct competitor for our core market. Relevant at the edges for larger, more institutional funds. Win by being purpose-built for early-stage VC and an order of magnitude cheaper.

PitchBook / CB Insights

Data depth is their moat. Very expensive, very slow to use for actual diligence workflows. Built for LP reports and market research, not day-to-day deal evaluation.

Generic AI (ChatGPT, Perplexity, Claude)

The real ambient competitor. Any analyst can already use these to do faster research. Our advantage must be: structure, workflow integration, proprietary data, and outputs a VC team trusts enough to put in front of a partner.

LinkedIn / Sales Navigator

The de facto founder research tool — every analyst runs a LinkedIn check as a first step. Strong on employment history, network connections, and social signals. No AI synthesis, no structured output, no workflow. Ambient and unavoidable: already in use at every fund, never going away. We work alongside it, not against it.

Notion / Airtable

The "build your own" alternative. Many small funds run their entire diligence workflow in a Notion wiki or Airtable base. Zero switching cost to start, zero ceiling on capability. No enrichment, no AI synthesis, no consistency. An analyst who has already built a diligence structure here is our most reachable prospect — they've proven they want structure, they just haven't paid for a dedicated tool yet.

Emerging Vertical AI Competitors

Several stealth products are being built in this space. We don't have full visibility. Our moat must be built on workflow depth and data coverage — not on being first. First-mover advantage in B2B SaaS is weak. Depth of product is not.

Competitive Pricing Position

Product	Annual Cost (5-person fund)	Entry Model
One Agentic	~$2,100/yr (self-serve) · custom (enterprise)	Shared credit pool, no per-seat lock-in; self-serve (Seed–Max) or enterprise contract with dedicated onboarding
Attio (Plus)	~$1,740/yr ($29/seat/mo × 5)	Per seat, billed annually, self-serve
Affinity	~$10,000–$13,500/yr ($2,000–2,700/seat/yr × 5)	Per seat, billed annually
Harmonic	~$25,000–$30,000/yr (~$10,000/user/yr)^[5]	Enterprise contract, 3-user minimum (~$30K/yr)

We sit between Attio and Affinity on price — comparable to a CRM for self-serve plans, a fraction of the cost of a sourcing tool even at the Enterprise tier. Unlike per-seat models, our shared credit pool means a team of five costs the same whether two people run every workflow or all five do. Self-serve plans require no contract and no conversation to start. Enterprise accounts get a scoped annual contract, a dedicated CSM, and a white-glove pilot before committing.
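
The annual figures in the pricing table follow from simple per-seat arithmetic. The sketch below reproduces them from the table's stated unit prices. The helper name is illustrative, and the Harmonic figure is computed at the 3-user minimum the table cites.

```python
TEAM_SIZE = 5


def per_seat_annual(price_per_seat_per_year: int, seats: int = TEAM_SIZE) -> int:
    """Annual cost under a per-seat model: price scales with the number of seats."""
    return price_per_seat_per_year * seats


attio_plus = per_seat_annual(29 * 12)                 # $29/seat/mo for 5 seats -> 1740
affinity_low = per_seat_annual(2_000)                 # 10000
affinity_high = per_seat_annual(2_700)                # 13500
harmonic_minimum = per_seat_annual(10_000, seats=3)   # 3-user minimum -> 30000

# Shared credit pool: a flat figure for the fund, independent of how many
# of the five people actually run workflows.
one_agentic_self_serve = 2_100
```

On these inputs the self-serve estimate lands above Attio Plus and below the low end of Affinity, which is the positioning the paragraph above describes.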

Competitor Comparison

Competitor	Pricing	What They Do Well	Critical Gap	How We Win
Harmonic	Enterprise contract; ~$10,000/user/yr; estimated ~$25,000–$30,000/yr for VC teams (3-user minimum)	Best-in-class sourcing platform; broad startup and founder data; Scout AI agent for market mapping, team research, and outreach; strong brand	Built for top-of-funnel discovery, not deal evaluation. No structured risk assessment, claim verification, or diligence output configured around a fund's thesis.	Position as the evaluation layer that activates after sourcing. Harmonic finds deals; we tell you what you're actually investing in. Target Harmonic users directly.
Affinity	Enterprise; estimated ~$24,000–$60,000+/yr for a VC team	Dominant CRM for private capital; deep relationship intelligence and network mapping; real AI layer with MCP integrations; enriched company data from 40+ sources; trusted by leading names in the industry	Pre-investment evaluation is not a supported workflow. Due diligence means organizing what your team already knows — not independent founder research, claim verification, or risk synthesis from outside data.	Complementary positioning: Affinity tracks the relationship and pipeline; we handle the evaluation between sourcing and decision. Target Affinity users who have no structured diligence layer.
Attio	Accessible; starts ~$408/user/yr; Pro plan ~$828/user/yr	Modern, flexible CRM; purpose-built VC product; growing AI layer; accessible pricing	CRM-first by design. No founder background research, no public signal synthesis, no structured risk output.	Complementary positioning: Attio manages pipeline; we produce diligence depth. Target tech-forward funds.
Clay	Tiered; starts ~$1,788/yr; scales with usage and data credits	Flexible data enrichment with a wide range of sources and AI research agents; cheap and fast to get started; already in use by tech-forward analysts	No VC-specific product, no structured diligence output, no fund thesis awareness. Requires significant analyst time to build and maintain a workflow.	Win on depth and readiness. Clay requires assembly; we deliver a purpose-built workflow out of the box. Target analysts already building in Clay: they've proven the need.
AlphaSense	Enterprise; estimated $20,000–$50,000+/yr; out of reach for most small funds	Enterprise-grade market intelligence; deep research AI; extensive library of expert transcripts, filings, and financial data; Gartner Magic Quadrant leader	No early-stage VC workflow, no founder research capability. Oriented toward public-company and later-stage analysis.	Win on fit and price. Purpose-built for early-stage VC; an order of magnitude cheaper. Relevant only at the edges where larger funds overlap.
PitchBook / CB Insights	Enterprise; $15,000–$30,000+/yr per seat^[6]	Enormous structured data moat; strong brand trust with large funds; deep market research capability	Very expensive; slow and complex UX; built for LP reports and market research, not day-to-day deal evaluation.	Undercut on price and out-execute on workflow speed. Offer AI synthesis they don't provide.
Generic AI Tools	Free to ~$240–$2,400/yr (ChatGPT, Perplexity, Claude plans)	Free or cheap; fast; zero onboarding; capable enough for basic research tasks	No VC-specific structure; no persistent deal context; no proprietary data; inconsistent, untrusted output.	Beat them on structure, consistency, and trust. A repeatable workflow with VC-specific framing and reliable output.
LinkedIn / Sales Navigator	~$1,200/user/yr (Sales Navigator Core)	Universal adoption; strong on employment history, network signals, and social footprint; zero learning curve	No synthesis, no structured output, no workflow. A research starting point, not a diligence tool.	Complementary. We incorporate what LinkedIn surfaces and take it further. Not a displacement play.
Notion / Airtable	Free to ~$240/user/yr; effectively zero cost for most use cases	Flexible, familiar, zero upfront cost; funds have already built custom diligence structures here	No enrichment, no AI synthesis, no consistency across deals. Entirely dependent on analyst effort.	Best conversion target. They've already built the habit of structured diligence — we offer the same structure with automation and depth built in.
Emerging Vertical AI	Unknown; likely early-access or seed-stage pricing	AI-native; no legacy to defend; potentially well-funded; moving fast	No established customer relationships; no workflow depth yet; building from zero; unknown data coverage.	Ship fast, build workflow depth, and acquire early design partners. Win 10 reference customers who won't leave.

Action: Conduct a structured competitive review every quarter. Name new entrants. Update this section.

Section 09

Our Differentiators

What's actually defensible. Not what sounds good in a pitch.

The things that are actually hard to replicate are: proprietary data with real verification depth, and a product that encodes how the best investors actually think — not how a generic AI assumes they do.

Data Is the Differentiator — Specifically, What We Can Verify

Founder data is not hard to find. It is hard to trust. Public records, LinkedIn histories, press coverage, company filings — an analyst can aggregate these in an afternoon. What no analyst can reliably do is detect a fabricated claim, catch the inconsistency between what a founder says today and what they said two years ago, or surface the signal that's been deliberately buried. The differentiation is not in aggregation — it's in verification. Identifying what's been manufactured, what's been omitted, and what contradicts. That's a judgment problem we can encode. It's also one that compounds: the more deals we run, the better our detection gets.

Simplicity Wins the VC Market

VCs are not engineers. They will not configure complex tooling to get value from a research product. Every layer of setup, every prompt they have to write, every integration they have to maintain, is a reason to stop using the product. The bar is: a partner with no technical background should reach a trusted output without help. Complexity is a competitor's problem. Simplicity is a genuine moat in this market.

Usefulness Is the Floor. Trust Is the Ceiling.

A VC who finds the output interesting but doesn't rely on it has not been converted. The product only creates value when a partner puts it in front of a Monday meeting and stands behind it. That requires two things generic AI cannot provide: consistency across every deal type, and a clear audit trail for why a signal was flagged. An analyst can tolerate occasional noise. A partner cannot. Every product decision should be tested against the partner standard — not whether it works, but whether it's trustworthy enough to act on.

Distilled VC Knowledge, Not Generic AI

A general-purpose model can research a founder. It cannot do it through the lens of a seed-stage investor running a thesis-driven fund. The difference between a useful output and a trusted one is whether the framing reflects how a VC actually thinks: what signals matter at pre-seed versus Series A, what patterns indicate resilience versus polish, what questions a well-prepared deck was specifically designed to avoid. Encoding that knowledge — and sharpening it with every deal run through the platform — is what gets harder to replicate over time.

The Moat Is Not the Workflow

Automated workflows, AI agents, enrichment pipelines — we can build them, and so can anyone else, in days. If our differentiation lives primarily in the workflow layer, we have a head start, not a moat. We are not defensible because of what we assembled. We are defensible if we build the things that are genuinely hard to assemble: verification depth that improves with every deal, and VC knowledge that's been deliberately encoded rather than assumed.

Section 10

What We're Not Building

Deliberate exclusions. This is a commitment, not a to-do list.

Scope pressure will come. Partners will ask for things that sound reasonable. Investors will suggest pivots. This section is a forcing function to stay honest about what we are.

We Are Not Building

A replacement for human judgment. The product is useful because it extends what a good investor can see, not because it removes the investor from the loop. We surface signal. The call is always theirs.
A general AI research assistant. We are purpose-built for VC due diligence. Breadth is not the goal. A product that does everything for everyone does nothing well for the people we're building for.
A portfolio monitoring tool. Post-investment is a different workflow, a different user need, and a different product. We do not follow deals past the investment decision.
A tool for founders. Our user is the investor evaluating a company, not the company being evaluated. We will not build features that serve or face the founder directly.
A data vendor. We use data to produce diligence outputs. We do not license, sell, or expose raw data as a product.

Section 11

How the Product Works Today

Honest MVP state. Where the AI is strong. Where it isn't. No spin.

What's Working

The core input-to-output workflow functions end-to-end.
Automated enrichment pulls meaningful data from available public sources.
AI synthesis produces structured output that is materially better than what an analyst would produce in the same time.

What's Manual and Needs to Be Reduced

Some data source connections require manual steps. This will be a bottleneck at scale.
Output quality is inconsistent across different types of founders and markets. Edge cases expose model weaknesses.
The workflow UX needs work for partners specifically — they need a faster on-ramp than analysts do.

Where the AI Is Weak Right Now

Private company data is sparse and the AI knows it — it hedges, which is correct but frustrating for users.
Founder character signals are the hardest to get right. We have a signal gap here that data coverage will help but not fully solve.
Market nuance requires more curated data than we currently have. The AI's world knowledge is a starting point, not a finish line.

This section should be updated after every sprint cycle. It is a live document of what is true, not a snapshot from day one.

Section 12

The Road to Launch

What must be true before we charge money. Not a timeline — a checklist of readiness.

Engineering Readiness

Core diligence workflow is stable and reproducible across deal types.
Automated enrichment runs without manual intervention for standard cases.
Output quality is consistent enough that a partner would not be embarrassed putting it in a meeting.
The product does not fail silently — when data is missing or uncertain, it says so.
Monitoring and alerting are live: the team knows when something breaks before a customer reports it.
Internal technical documentation exists and is current: system architecture, runbooks, and deployment procedures.
Authentication, access controls, and role permissions are locked down. Data is encrypted at rest and in transit. A penetration test is completed or formally scheduled before the first paying customer.
The product is not hardwired to a single model provider. Switching between providers — or routing specific workflows to open-weights models — is an architectural option, not a migration project.

Data Readiness

The sources designated as live at launch (see §17 — Authorized Data Sources) are connected with real production coverage — not demo data, not manual lookups. Sources designated as post-launch are documented as such and do not silently degrade output quality when absent.
Fallback behavior is defined and tested for every integrated source: when a provider is unavailable, rate-limited, or returns an empty result, the product surfaces that gap explicitly in the output rather than completing silently with partial data.
We have run at least 20 real deals through the product with real VC users and iterated on the output.
We know exactly what data we have, what we don't, and how that affects output quality for different deal types.
Data handling policies are defined: what we collect, how long we retain it, who has access, and how we delete it on customer request.

UX Readiness

Both partner and analyst workflows have been tested with real users, not just internal team members.
Onboarding gets a new user to a first useful output in under 20 minutes.
User-facing documentation and help content exist before any paying customer touches the product. Error messaging is clear enough that a non-technical partner can act on it without escalating.

Commercial Readiness

We have at least 3 paying pilot customers who are using the product on real deals.
We know what we're charging and why, and the value exchange is clear to the customer.
We have a support process for when something goes wrong.

Legal & Compliance Readiness

Privacy policy and terms of service are reviewed by counsel and live on the product.
Data processing agreements (DPAs) are drafted and ready — VC funds with institutional LPs will ask for them.
Data licensing agreements are fully executed for every commercial data provider in the launch catalog — LexisNexis Risk Solutions, Thomson Reuters CLEAR, Dun & Bradstreet, and any equivalent licensed source. Usage restrictions, resale prohibitions, and audit rights in those agreements are understood and accounted for in how the product surfaces data.
We know which jurisdictions we can and cannot serve given our current data handling practices.

Enterprise Tier

The Enterprise tier — custom contracts, SSO, audit logs, API access, CRM/Slack integrations, dedicated CSM, and priority model routing — is not available at launch. Self-serve tiers (Seed, Series, Max) are the only commercial offering until the following conditions are met post-launch: enterprise-specific features are fully built and tested, a CSM function is staffed or formally covered, and at least one Enterprise pilot has been completed with a design partner on a structured contract. The Enterprise row in the pricing table is visible to set expectations; the product will direct inbound enterprise interest to a waitlist until these gates are cleared.

Section 13

Open Questions

What we genuinely don't know yet. These are not weaknesses to hide — they're risks to work against.

This section is the most important section to keep current. The moment we stop adding to it is the moment we start lying to ourselves.

Product Questions

Do VCs trust AI-surfaced founder signals enough to act on them, or are we still in a "helpful summary" phase where the output informs but doesn't influence? We don't fully know yet.
What's the right balance between depth of output and speed of delivery? We're currently optimizing for depth. Is that what paying customers want?
How much does output quality need to improve before a partner, not just an analyst, relies on it — and what does "good enough" look like specifically for the verification layer, where a false signal could cost a fund a deal or a reputation?
Should the product eventually learn from outcomes — tracking whether investments made using our output performed — and fold that signal back into the model? And if so, how long before that data is meaningful?
Is conflicting signal detection compelling enough to drive early adoption on its own — before the rest of the product is fully mature? VCs will act on a genuine conflict when one is surfaced; that's not the question. The question is whether the prospect of that detection is a strong enough hook for a fund to adopt early, despite gaps elsewhere in the product. If yes, it changes how we sequence what we build and how we position the product to the first cohort of design partners.

Data Questions

What data sources do we actually have access to at quality and scale versus what we plan to have? This gap needs to be explicit.
How do we handle markets and founders where public data is thin? Southeast Asia, MENA, first-time founders with no track record.
What's our position on using data that is technically public but that founders would find invasive if they knew we'd used it?

Market Questions

We are leading with partners and GPs as the primary point of entry for enterprise accounts. The open question is now: how many stakeholders are involved in signing off at a fund of our target size, and what does each need to see? We need to map the typical enterprise buying committee before we can finalize the demo flow and contract structure.
Are emerging market VC funds a realistic early market, or are they under-resourced relative to the willingness to pay we need?
Will AI-native funds build this themselves rather than buy it?
Private equity firms and family offices run variations of the same diligence workflow with meaningful budgets and less tooling sophistication than large VC funds. Do they represent a near-term expansion market — or a distraction from going deep in early-stage VC first?

Commercial Questions

What is the minimum feature set that justifies a paying transaction? Not feature completeness — the value threshold at which a VC team feels the exchange is clearly worth money. We need to know this before we open billing.
Is the analyst or the partner the actual internal champion for adoption within a fund? This changes who we sell to, how we onboard, and what the product needs to do on day one.
At what fund size and deal volume does consumption-based pricing break down, and what model replaces it?

Strategic Questions

What does a well-resourced competitor (PitchBook, Affinity, or a well-funded new entrant) do to this product if they decide to compete directly in 18 months?
What is the second product we build once diligence is solid? Do we expand vertically — deeper into VC — or horizontally into adjacent segments first?

Last updated: May 2026. Assign an owner to each question and a target date by which we need an answer.

Section 14

Core Bets

What we've decided is true. Not open questions — commitments. If any of these turn out to be wrong, we need to revisit the strategy, not just the tactics.

Open Questions track uncertainty. Core Bets track conviction. This section forces us to be explicit about the assumptions we're already building on, so that when one of them is tested, we recognize it for what it is — not a tactical problem, but a foundational rethink.

Bet 1: Direct Enterprise Sales Is the Most Likely Winning GTM Motion

We are betting that going direct — founder-led outreach, demo, and structured onboarding — is the fastest path to defensible early revenue and the strongest early relationship with funds. The sequencing is deliberate: self-serve tiers (Seed, Series, Max) are the commercial offering at launch, giving us real usage data and early reference customers before we commit engineering and operational resources to Enterprise. Direct enterprise sales follows once the Enterprise tier is built and a CSM function is in place — as described in the Road to Launch. The conviction in this bet is about the eventual GTM shape, not the launch order. If direct sales proves too slow to close at the pace we need once Enterprise opens, we revert: self-serve becomes the primary motion and we optimize for the analyst discovering and activating the product without any human contact.

Bet 2: Output Quality Is Good Enough to Build Trust at Launch

We are betting that current AI capabilities — with real data coverage and VC-specific framing — produce outputs that are meaningfully better than what an analyst generates in the same time. Not perfect. Not comprehensive. Better. A launch product that hedges clearly and is right more often than not is enough to build early trust. If the gap between our output quality and analyst-generated work is not obvious to a partner in the first real deal run, this bet is wrong and we need to delay launch and close the quality gap first.

Bet 3: Data Depth and Quality Will Improve Fast Enough Not to Permanently Damage Trust

We are betting that the gaps in our current data — sparse private company data, thin founder signals in underrepresented geographies, weak market nuance — will narrow fast enough through improving depth and quality that early users develop tolerance for honest uncertainty rather than resentment of inadequate output. We ship with clear hedging language on every data gap. We do not pretend to have coverage we don't have. If our honest acknowledgment of gaps creates the impression of a weak product rather than a trustworthy one, this bet is wrong.

Note: this bet is about the early trust-building period. It is not a bet that we need to match sourcing platforms on raw data volume — that distinction is the subject of Bet 8.

Bet 4: The Partner Is the Right Entry Point for the Sale

In a direct enterprise motion, we lead with the partner or GP — the person who can authorize the spend and owns the quality of the fund's diligence process. Analysts become the day-to-day users once the fund commits. If we find that partners consistently defer to analysts and procurement stalls without a bottoms-up champion, we revert: analyst-led discovery moves to the front, and onboarding is redesigned to convert without a partner-facing pitch.

Bet 5: Tiered Pricing + Shared Credits Holds Through Early Growth

We are betting that the Seed / Series / Max structure, with org-level shared credit pools and top-up flexibility, covers the range of fund sizes and deal volumes we'll encounter in early cohorts without requiring a structural pricing change. The biggest risk is mispricing the included credit pools: too small and customers feel nickel-and-dimed by overages; too large and we leave margin on the table. We watch credit utilization rates and overage frequency per tier closely and treat either extreme as a signal to rebalance.
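One way to make that monitoring concrete is sketched below. It assumes a hypothetical per-organization monthly usage record (tier, included credits, consumed credits); the field names and the 40% utilization and 25% overage thresholds are illustrative assumptions, not agreed values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyUsage:
    """One org's credit usage for one billing month (hypothetical schema)."""
    tier: str       # "Seed", "Series", or "Max"
    included: int   # credits included in the plan; assumed to be positive
    consumed: int   # credits actually used, including any top-ups


def tier_stats(records):
    """Return {tier: (mean utilization, share of org-months with overage)}."""
    by_tier = defaultdict(list)
    for r in records:
        by_tier[r.tier].append(r)
    stats = {}
    for tier, rows in by_tier.items():
        utilization = sum(r.consumed / r.included for r in rows) / len(rows)
        overage_rate = sum(r.consumed > r.included for r in rows) / len(rows)
        stats[tier] = (utilization, overage_rate)
    return stats


def rebalance_signal(utilization, overage_rate, low=0.40, high_overage=0.25):
    """Illustrative thresholds only: flag pools that look too small or too large."""
    if overage_rate > high_overage:
        return "pool may be too small"
    if utilization < low:
        return "pool may be too large"
    return "ok"


# Made-up numbers, purely to show the calculation.
records = [
    MonthlyUsage("Seed", 100, 130),
    MonthlyUsage("Seed", 100, 90),
    MonthlyUsage("Max", 1000, 200),
    MonthlyUsage("Max", 1000, 300),
]
for tier, (util, over) in tier_stats(records).items():
    print(tier, round(util, 2), round(over, 2), rebalance_signal(util, over))
```

With these made-up records, Seed shows frequent overages (pool possibly too small) and Max shows low utilization (pool possibly too large), which are the two extremes the bet says to watch for.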

Bet 6: Early-Stage VC Is the Right Beachhead Before Expanding

We are betting that going deep in Pre-Seed through Series A — and winning a reputation there — is the right path before expanding to growth equity, PE, family offices, or corporate venture. Spreading across segments early fragments the product, diffuses the brand, and means we're mediocre everywhere instead of trusted somewhere. The adjacent segments are real and reachable. We get there by being dominant in the beachhead first.

Bet 7: Agentic Depth and Proprietary Data Are the Compounding Moat

We are betting that our defensible position is built on two compounding advantages — neither of which lives in the workflow layer, and neither of which a competitor can replicate quickly.

The first is agentic depth. We invest continuously in improving our agents' reasoning — not just the workflows they execute. As AI capabilities mature, we move toward agents that reason across conflicting signals, adapt to context, and operate with increasing autonomy, proactively surfacing what a fund needs rather than waiting to be asked. Every meaningful advance in the underlying AI is an improvement in our product, not just a shift in our cost structure. This compounds as the field moves.

The second is proprietary data. Every deal run through the platform generates data we hold and improve — on founders, companies, markets, and which signals actually predicted what. Competitors who don't run diligence workflows don't collect this. Over time, the signal quality we can bring to a new deal improves because of every deal that came before it.

The combination is what's defensible: better agents operating on better data, both improving with usage. Building only one of these gives us a head start, not a moat. If we ship without actively investing in both tracks — letting the agent layer stagnate while data improves, or vice versa — this bet fails on its own terms.

Bet 8: We Deliver Meaningful Value Before We Have Harmonic-Scale Data Volume

Harmonic has years of data collection at scale. We don't, and we won't at launch. We are betting that our value doesn't depend on matching that. A fund running 10 deals a month doesn't need a database of 10 million companies — they need deep, reliable analysis on the 10 companies in front of them. Our agentic synthesis, verification layer, and VC-specific reasoning deliver value on the deals that matter to a fund, using the data available right now.

We grow data coverage because it sharpens the product — not because the product is unusable without it. The falsification condition is specific: if early users consistently report that missing data on their specific deals — not uncertainty hedging, but actually absent information on companies they're actively researching — is the primary reason they don't trust the output, this bet is wrong and data volume becomes a prerequisite before we can grow.

Bet 9: Agents Can Detect Conflicting Signals and Surface What Founders Would Rather Stay Hidden

We are betting that our agents can do something genuinely hard: reason across multiple independent data sources and detect when they don't agree. A founder's account of a prior exit that doesn't match public filings. A co-founder relationship that appears in one source and disappears in another. A company quietly rebranded after a failure that a pitch deck never mentions. A role that surfaces in a court record but not in any professional profile.

This is not aggregation — any analyst with time can aggregate. The bet is that agents can identify the discrepancy itself: what's inconsistent, what's implausibly absent, and what patterns across sources suggest deliberate omission rather than incomplete data. And that they can do this at a speed and coverage no analyst working manually can match.

The bet fails in two ways: agents miss the conflicts that matter at a rate that makes the detection layer untrustworthy, or they generate enough noise that investors stop reading the flags altogether.
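The simplest form of this discrepancy check can be sketched as follows. It assumes each source has already been reduced to normalized claims about a subject; the `Claim` schema, the source names, and the exact-match comparison are all hypothetical simplifications. Real detection would also need entity resolution and fuzzy matching, and a source's silence is only a prompt for review, not evidence of omission.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str   # founder or company identifier
    field: str     # what is being claimed, e.g. "prior_exit"
    value: str     # normalized value as reported by the source
    source: str    # origin of the claim, kept for attribution


def find_conflicts(claims):
    """Group claims by (subject, field) and report fields where sources disagree."""
    grouped = defaultdict(list)
    for c in claims:
        grouped[(c.subject, c.field)].append(c)
    conflicts = []
    for (subject, field), group in grouped.items():
        if len({c.value for c in group}) > 1:
            conflicts.append({
                "subject": subject,
                "field": field,
                # Each value stays attributed to its source; no verdict is drawn.
                "reported": sorted((c.source, c.value) for c in group),
            })
    return conflicts


def find_absences(claims, expected_sources):
    """Report (subject, field) pairs that some expected sources never mention."""
    seen = defaultdict(set)
    for c in claims:
        seen[(c.subject, c.field)].add(c.source)
    return {key: sorted(expected_sources - sources)
            for key, sources in seen.items()
            if expected_sources - sources}


claims = [
    Claim("founder-1", "prior_exit", "acquired", "pitch_deck"),
    Claim("founder-1", "prior_exit", "dissolved", "state_filing"),
    Claim("founder-1", "board_role_acme", "director", "court_record"),
]
conflicts = find_conflicts(claims)
absences = find_absences(claims, {"pitch_deck", "state_filing", "court_record"})
```

Here `conflicts` contains the disputed prior exit with both versions attributed, and `absences` shows that the board role appears in the court record but in neither the pitch deck nor the state filing.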

Section 15

Technical Architecture Principles

The engineering commitments the team has agreed on. Not a spec — a set of principles that inform every technical decision before we've made it explicitly.

These principles exist so that the founding engineer is not making foundational architectural calls alone. They represent the team's agreed-on priorities at this stage. They should be revisited when the context changes — not ignored when they're inconvenient.

Model-Agnostic by Design

No part of the core product should be hardwired to a single model provider. Prompt logic, output parsing, and routing must be designed so that switching the underlying model — or routing specific workflow types to different models — is a configuration change, not a migration. This is not about hedging on Anthropic; it's about not creating a dependency that costs us months to unwind if pricing changes, quality diverges, or a better model for a specific task type emerges. We also want the option to route lighter tasks to cheaper, faster models and reserve deeper synthesis for the model that earns it.
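A minimal sketch of what "a configuration change, not a migration" could look like is below. The provider functions, model names, and workflow types are hypothetical stand-ins that call no real vendor API; the point is only that workflow code asks for a workflow type and the routing table decides which provider and model serve it.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A provider is anything that turns (model_name, prompt) into text. Real adapters
# would wrap a vendor SDK; these stand-ins are hypothetical and make no network calls.
Provider = Callable[[str, str], str]


def fake_provider_a(model: str, prompt: str) -> str:
    return f"[a:{model}] {prompt}"


def fake_provider_b(model: str, prompt: str) -> str:
    return f"[b:{model}] {prompt}"


PROVIDERS: Dict[str, Provider] = {
    "provider_a": fake_provider_a,
    "provider_b": fake_provider_b,
}


@dataclass(frozen=True)
class Route:
    provider: str
    model: str


# Routing lives in configuration (inline here; it could equally be loaded from a file).
ROUTES: Dict[str, Route] = {
    "entity_extraction": Route("provider_b", "small-fast"),  # lighter task
    "deep_synthesis": Route("provider_a", "large"),          # deeper synthesis
}
DEFAULT_ROUTE = Route("provider_a", "large")


def complete(workflow: str, prompt: str, routes: Dict[str, Route] = ROUTES) -> str:
    """Resolve the route for a workflow type and call the configured provider."""
    route = routes.get(workflow, DEFAULT_ROUTE)
    return PROVIDERS[route.provider](route.model, prompt)


print(complete("entity_extraction", "List the founders."))
print(complete("deep_synthesis", "Summarize the deal."))
```

Switching a workflow to a different model then means editing one entry in `ROUTES`; prompt logic and output parsing would need the same provider-neutral treatment, which this sketch does not cover.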

Strict Workspace Isolation

No fund's data touches another fund's workflow. Deal data, enrichment results, synthesis outputs, and any persistent context associated with a customer are isolated at the storage and access layer — not just at the application layer. This is not primarily a security requirement: it is a product trust requirement. A fund that suspects their deal flow could influence another fund's output will not use the product. Design for isolation from the start; retrofitting it is expensive and incomplete.
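The access pattern this implies can be sketched with an in-memory stand-in, shown below. The class and method names are hypothetical. In a real deployment the same partitioning would be enforced by the storage system itself rather than by Python objects; that mechanism is left open here.

```python
class WorkspaceStore:
    """In-memory stand-in for a storage layer partitioned by workspace."""

    def __init__(self):
        self._partitions = {}   # workspace_id -> {record_id: payload}

    def put(self, workspace_id: str, record_id: str, payload: dict) -> None:
        self._partitions.setdefault(workspace_id, {})[record_id] = payload

    def get(self, workspace_id: str, record_id: str) -> dict:
        # Lookups never leave the caller's partition, so a record id from another
        # workspace is indistinguishable from a record that does not exist.
        try:
            return self._partitions[workspace_id][record_id]
        except KeyError:
            raise LookupError(f"{record_id!r} not found in this workspace") from None

    def scoped(self, workspace_id: str) -> "ScopedStore":
        return ScopedStore(self, workspace_id)


class ScopedStore:
    """Handle passed to workflow code: the workspace is fixed at creation."""

    def __init__(self, store: WorkspaceStore, workspace_id: str):
        self._store = store
        self._workspace_id = workspace_id

    def put(self, record_id: str, payload: dict) -> None:
        self._store.put(self._workspace_id, record_id, payload)

    def get(self, record_id: str) -> dict:
        return self._store.get(self._workspace_id, record_id)


store = WorkspaceStore()
fund_a = store.scoped("fund-a")
fund_b = store.scoped("fund-b")
fund_a.put("deal-1", {"company": "Acme"})
print(fund_a.get("deal-1"))
```

The design choice is that workflow code only ever receives a `ScopedStore`, so there is no call it can make that names another fund's workspace.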

Fail Loudly, Never Silently

When data is missing, a source fails, or the model's confidence is low, the product says so explicitly — in the output, in the UI, in the logs. Silent degradation — where a workflow completes but with incomplete data and no signal to the user — is the worst possible failure mode for a product that VC teams are supposed to trust. The output's value is tied to its honesty about its own limitations. A hedge that is clearly labeled is useful. An overconfident output that is quietly wrong is damaging.
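One possible shape for this, covering the cases named in the launch checklist (a provider that is unavailable, rate-limited, or returns an empty result), is sketched below. The `Status` values, the `RateLimited` exception, and the source names are assumptions for illustration; real source adapters would raise their own error types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    OK = "ok"
    UNAVAILABLE = "unavailable"
    RATE_LIMITED = "rate_limited"
    EMPTY = "empty"


class RateLimited(Exception):
    """Hypothetical error a source adapter raises when throttled."""


@dataclass
class SourceResult:
    source: str
    status: Status
    records: list
    detail: Optional[str] = None


def fetch(source: str, fetcher: Callable[[], list]) -> SourceResult:
    """Run one source lookup and always return an explicit status."""
    try:
        records = fetcher()
    except RateLimited as exc:
        return SourceResult(source, Status.RATE_LIMITED, [], str(exc))
    except Exception as exc:   # any other failure is reported, never swallowed
        return SourceResult(source, Status.UNAVAILABLE, [], repr(exc))
    if not records:
        return SourceResult(source, Status.EMPTY, [])
    return SourceResult(source, Status.OK, records)


def coverage_notes(results: List[SourceResult]) -> List[str]:
    """Lines for the output's coverage section: one per source that is not OK."""
    return [
        f"{r.source}: {r.status.value}" + (f" ({r.detail})" if r.detail else "")
        for r in results
        if r.status is not Status.OK
    ]


def throttled() -> list:
    raise RateLimited("retry after 60s")


results = [
    fetch("court_records", lambda: [{"case": "123"}]),
    fetch("corporate_filings", lambda: []),
    fetch("job_postings", throttled),
]
print(coverage_notes(results))
```

Because every lookup returns a `SourceResult`, a workflow cannot finish without knowing which sources contributed nothing, and the coverage notes can be rendered in the output, the UI, and the logs from the same data.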

Don't Over-Engineer for Scale We Don't Have

We are not building for 10,000 concurrent funds at launch. We are building for reliability and correctness at the scale of early design partners. Infrastructure decisions should be driven by the next milestone, not the eventual ceiling. Premature scalability investments are a form of procrastination. When we hit a scale problem, we solve it. Until then, the constraint is product quality, not throughput.

The Non-Technical Partner Is the UX Bar

Every interface decision should be tested against a single standard: can a partner with no technical background reach a trusted output without asking for help? This is not about dumbing the product down. It is about designing for the actual user. Analysts can tolerate friction and configuration. Partners cannot. We build for the partner and let the analyst benefit from the depth underneath. If a feature requires explanation to use, it is not ready to ship to partners.

Observability Is Not Optional

Logging, alerting, and monitoring are launch requirements, not post-launch improvements. We need to know when something breaks before a customer reports it. We need to trace the full path of a workflow — from input to enrichment to synthesis to output — to diagnose quality issues in production. Observability also informs product decisions: usage patterns, failure rates, and latency hotspots are the data that drives the next sprint's priorities.
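Tracing the path from input to output could start as simply as the sketch below: one trace id per workflow run and one structured event per stage, recorded even when the stage fails. The stage names follow the paragraph above; the event fields and the `stage` helper are hypothetical, and a production setup would more likely use a dedicated tracing library.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager

logger = logging.getLogger("workflow")


@contextmanager
def stage(trace_id: str, name: str, events: list):
    """Time one workflow stage and record a structured event, even on failure."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except Exception:
        outcome = "error"
        raise
    finally:
        event = {
            "trace_id": trace_id,
            "stage": name,
            "outcome": outcome,
            "duration_ms": round((time.perf_counter() - start) * 1000, 2),
        }
        events.append(event)
        logger.info(json.dumps(event))


def run_workflow(stages=("input", "enrichment", "synthesis", "output")) -> list:
    """Run placeholder stages under one trace id and return the recorded events."""
    trace_id = uuid.uuid4().hex
    events = []
    for name in stages:
        with stage(trace_id, name, events):
            pass   # real stage work would go here
    return events


events = run_workflow()
print([e["stage"] for e in events])
```

The same events can feed alerting (any `outcome` of `error`) and sprint planning (slow stages by `duration_ms`), which is the dual use the paragraph describes.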

Security Is Baked In, Not Bolted On

Authentication, access controls, data encryption at rest and in transit, and role-based permissions are first-class requirements. We operate in a market where customer data includes non-public information about investment targets and fund strategy. A breach is not a recoverable event for an early-stage product in this market. The penetration test referenced in §12 (Engineering Readiness) is a hard gate, not a nice-to-have.

Section 16

Data Ethics & Principles

Where we draw the lines on data. This is a values agreement, not a legal one. It needs to be resolved before the first real deal runs through the product.

The product surfaces sensitive information about real people: founders who have not consented to being researched by us specifically. This creates obligations that go beyond legal compliance. The team needs to agree on these principles explicitly. The question in §13 about "technically public but invasive" data cannot stay open. Every person in this company needs to know the answer before we ship.

What Data We Use

We draw on the broadest set of legally and ethically defensible sources available for professional due diligence on business principals — public records, licensed commercial data, premium legal and court APIs, regulatory databases, and commercial signal providers. The full catalog is in §17 — Authorized Data Sources. The test for inclusion is not whether data is freely accessible: it is whether the source is legitimate, the access method is authorized, and the use is consistent with the purpose for which the data exists. We do not misrepresent who we are to access any source, bypass access restrictions, or use data derived from unauthorized or leaked datasets.

What Is Off-Limits

Non-public personal information. Home addresses, personal phone numbers, private communications, and information derived from hacked or leaked datasets are off-limits. Accessible is not the same as appropriate.
Protected characteristics. Race, religion, national origin, sexual orientation, disability, and family status are not signals we surface, model on, or score. This is not only a legal requirement in many jurisdictions — it is a product integrity requirement. A diligence tool that encodes demographic bias is not one we are willing to build.
Aggregating data on private individuals who are not principals. We research founders, co-founders, and named executives: people who have taken on a public role by raising capital and representing a company. We do not surface information about family members, employees who have not taken public roles, or individuals who are connected to a founder but have not assumed the reputational exposure that comes with founding a company.

Handling Unverified Damaging Information

This is the hardest case. The product will sometimes surface information that, if true, is material to an investment decision — and that we cannot independently verify. A single news article alleging fraud. A forum post describing a prior company failure not mentioned on a founder's profile. A conflict between two sources about how a previous exit actually resolved.

Our policy: surface the signal, label the uncertainty, and never draw a conclusion the data doesn't support. We flag conflicting information as conflicting. We attribute everything to its source. We do not synthesize unverified signals into a verdict. The investor reads the signal and decides what to do with it. Our job is to make sure they don't miss it — not to interpret it for them when we don't have the full picture.

On "Technically Public but Invasive" Data

The test is not whether we can access it. The test is whether a reasonable founder who discovered we had used it would feel the use was fair given the context — that they raised capital, represented a company publicly, and are being evaluated by a potential investor. Information that clears that bar is in scope. Information that doesn't — personal data that happens to be accessible but was never intended to be part of a professional profile — is not.

When we're unsure, we err on the side of exclusion. The marginal value of including a borderline signal is almost always less than the reputational cost of being wrong about it.

How We Handle What We Find

Diligence outputs are shared only with the fund that ran the workflow. We do not aggregate findings across funds, reference outputs from one customer in another customer's context, or retain synthesis outputs beyond what is needed to serve the customer who generated them.
Customers can delete their data. We honor deletion requests in full. Retention periods are defined, communicated, and enforced — not theoretical.
If the product makes a material error — surfaces false information that a fund acts on — we want to know. We create a clear path for customers to report this. We treat these reports as the most important feedback we receive.

This section should be reviewed by counsel before the first paying customer. It is a values document, not a legal document, but the legal version should be consistent with what's written here, not in tension with it.

Section 17

Authorized Data Sources

The specific data categories and providers we draw on. The test for inclusion: legitimate source, authorized access method, purpose consistent with professional due diligence on business principals.

Court & Legal Records

Federal and state civil and criminal filings, judgments, bankruptcies, and enforcement actions accessed via professional legal data APIs. Providers include PACER (federal courts), UniCourt, Docket Alarm, and CourtListener. This is one of the highest-signal categories: a founder's name appearing in a breach-of-contract judgment, a dismissed fraud case, or a restraining order leaves a record that rarely surfaces in any professional profile. Premium APIs make this programmatic and structured; manual research on the same records is slow and inconsistent.

Regulatory & Compliance Databases

Purpose-built regulatory records covering licensed professionals and sanctioned entities. Includes FINRA BrokerCheck (broker and adviser disciplinary history), SEC EDGAR enforcement actions and investment adviser registrations, OFAC sanctions and SDN lists, NFA BASIC (commodities and futures), the FinCEN beneficial ownership registry, and state securities regulator filings via NASAA. With the exception of the FinCEN registry, where access is restricted by law to authorized users, these databases are public and authoritative, but they require structured programmatic access to query at scale and are routinely missed in manual diligence.

Business & Corporate Records

Secretary of State filings across all 50 US states: incorporations, officer and director records, registered agent history, and dissolution records. UCC lien filings, fictitious business name (DBA) registrations, and business license records. OpenCorporates and equivalent services for international jurisdictions. These records surface prior company histories, undisclosed affiliations, and entity structures that founders may not volunteer. Particularly useful for detecting quietly abandoned or rebranded ventures.

Licensed Commercial Data

Structured business and people data from providers with established legal rights to the data they sell, used by law firms, compliance teams, and financial institutions for the same purpose. Includes LexisNexis Risk Solutions, Thomson Reuters CLEAR, Dun & Bradstreet, and equivalent providers. Not "public" in the casual sense — but legitimate, licensed, and purpose-built for background research on business principals. This category delivers data quality and coverage that no public-record scraping can match, particularly for business credit history, prior address and affiliation records, and cross-jurisdictional identity resolution.

Intellectual Property Records

USPTO patent and trademark filings, EPO and WIPO for international patents, and copyright registrations. Filing history validates or contradicts claimed deep-tech backgrounds: the inventors listed, the filing dates, the assignee history, and whether IP was assigned away from a founder in a prior company. Particularly relevant for founders whose pitch centers on proprietary technology.

Alternative & Commercial Signal Data

Commercially licensed datasets that surface operational signals not visible in corporate records. Job posting data (Thinknum, Revelio Labs) reveals whether a company is actually hiring in the direction it claims. Web traffic trends (SimilarWeb) and app store metrics (Sensor Tower, data.ai) provide independent product traction signals. GitHub commit history and contributor data are available via public API and are relevant for technical founders and engineering-led companies. These are signals, not verdicts, and are labeled as such in output.

International Corporate & Sanctions Records

Companies House (UK) via free public API, GLEIF global legal entity identifiers, and equivalent registries for EU and APAC jurisdictions. For sanctions and politically exposed persons (PEPs): OpenSanctions (open-source, multi-jurisdiction), Dow Jones Risk & Compliance, and World-Check (LSEG, formerly Refinitiv and Thomson Reuters). Cross-border deals require cross-border record access; a clean US profile does not rule out a problematic international history.

Professional Profiles & Published Record

Publicly accessible professional histories (LinkedIn, company websites, published bios), news coverage and press archives (Factiva, LexisNexis News, GDELT), published interviews and podcast appearances, and web archives (Wayback Machine) for historical site content. Domain WHOIS records for prior business affiliations. These are the conventional starting points — useful for establishing a baseline and identifying what a founder claims, against which all other sources are compared.

This catalog is a working list, not a closed one. New sources are added when they meet the access and purpose tests in §16. Any provider whose use would require misrepresenting who we are, bypassing an access restriction, or aggregating data in ways the source explicitly prohibits is not added, regardless of value.
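
The inclusion test can be written down as a checklist that every proposed source passes through. The following is a minimal sketch, not an existing implementation: `SourceCandidate`, `admit`, and all field names are hypothetical and chosen only to mirror the criteria stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCandidate:
    """Hypothetical record describing a proposed data source."""
    name: str
    legitimate_source: bool      # provider has rights to the data it offers
    authorized_access: bool      # licensed API, public registry, or equivalent
    purpose_consistent: bool     # professional diligence on business principals
    requires_misrepresentation: bool = False
    bypasses_access_restriction: bool = False
    prohibited_aggregation: bool = False


def admit(source: SourceCandidate) -> tuple[bool, list[str]]:
    """Return (admitted, rejection_reasons).

    The commercial value of the data is deliberately not an input.
    """
    reasons: list[str] = []
    if not source.legitimate_source:
        reasons.append("source is not legitimate")
    if not source.authorized_access:
        reasons.append("access method is not authorized")
    if not source.purpose_consistent:
        reasons.append("purpose is inconsistent with professional due diligence")
    if source.requires_misrepresentation:
        reasons.append("requires misrepresenting who we are")
    if source.bypasses_access_restriction:
        reasons.append("requires bypassing an access restriction")
    if source.prohibited_aggregation:
        reasons.append("aggregates data in ways the source prohibits")
    return (not reasons, reasons)


if __name__ == "__main__":
    candidate = SourceCandidate("Example Registry", True, True, True)
    print(admit(candidate))
```

Returning the list of reasons rather than a bare boolean keeps a rejected source auditable: the record shows which test it failed, not just that it failed.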

See also: Pricing estimates for all sources in this catalog are compiled in the Data Source Pricing Reference.

Appendix

Claims to Validate

Assumptions flagged for verification before this document is treated as settled.

[1] A typical seed-stage deal gets 2–10 hours of diligence from an analyst. (Section 1 — The Problem We're Solving)
[2] There are 4,000–8,000 actively deploying early-stage funds globally, with 2,000–2,500 in North America and 1,500–2,000 in Western Europe. These figures drive every TAM calculation in the document. Source and methodology need to be cited. (Section 6 — The Market)
[3] A typical 5-person early-stage fund runs approximately 275 workflows per month across screening, enrichment, and deep research. This is the single most consequential input in the pricing model: it drives the ~$2,100/year ARPU figure and all TAM projections that depend on it. Needs to be grounded in design partner usage data before launch. (Section 6 — The Market; Pricing Model)
[4] The standard-to-deep research workflow mix is 90/10. If this shifts meaningfully toward deep research, both ARPU and model cost increase materially. This ratio is assumed, not observed. (Section 7 — Business Model)
[5] Harmonic is priced at approximately $25,000–$30,000/year (~$10,000/user/year). Competitor pricing figures are used to position One Agentic's pricing as an order of magnitude cheaper; if these benchmarks are stale or misquoted, the positioning argument weakens. (Section 8 — Competition)
[6] PitchBook and CB Insights are priced at $15,000–$30,000+/year per seat. Same caveat as Harmonic: these are the anchors for the high end of the competitive pricing comparison. (Section 8 — Competition)
[7] There are approximately 66,000 active US angel investors, sourced from the Angel Capital Association. This is the entire basis for the angel investor TAM segment. The ACA figure should be verified for recency and definition of "active." (Section 6 — TAM, Angel Investor Segment)
[8] Investors who slow down to think carefully often lose the deal, creating a systematic incentive to under-diligence on fast-moving rounds. This is asserted as structural fact in the problem framing. It is directionally credible but not cited. A data point or investor survey backing this claim would significantly strengthen Section 1. (Section 1 — The Problem We're Solving)
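
To see how much rides on the workflow-volume and fund-count claims, the arithmetic can be run directly from the figures quoted above. This is a rough sketch under two labeled assumptions that the list does not confirm: that the ~$2,100/year ARPU is per fund, and that ARPU scales linearly with workflow volume. The function names are illustrative, and the document's own TAM model may segment the market differently.

```python
WORKFLOWS_PER_MONTH = 275       # assumed in the list above, not observed
ARPU_PER_YEAR = 2_100           # USD; ASSUMPTION: per fund, not per seat
FUNDS_GLOBAL = (4_000, 8_000)   # low and high fund-count estimates


def implied_revenue_per_workflow(arpu_per_year: float,
                                 workflows_per_month: float) -> float:
    """Blended revenue per workflow implied by ARPU and monthly volume."""
    return arpu_per_year / (workflows_per_month * 12)


def revenue_ceiling(fund_counts: tuple[int, int],
                    arpu_per_year: float) -> tuple[float, float]:
    """Annual revenue if every fund in the range paid the stated ARPU."""
    low, high = fund_counts
    return (low * arpu_per_year, high * arpu_per_year)


def arpu_at_volume(workflows_per_month: float) -> float:
    """ARPU at a different volume. ASSUMPTION: revenue is linear in volume."""
    return ARPU_PER_YEAR * (workflows_per_month / WORKFLOWS_PER_MONTH)


if __name__ == "__main__":
    per_workflow = implied_revenue_per_workflow(ARPU_PER_YEAR, WORKFLOWS_PER_MONTH)
    print(f"implied revenue per workflow: ${per_workflow:.2f}")
    print("revenue ceiling (low, high):", revenue_ceiling(FUNDS_GLOBAL, ARPU_PER_YEAR))
    print("ARPU if real usage is half the assumed volume:",
          arpu_at_volume(WORKFLOWS_PER_MONTH / 2))
```

Under these assumptions, halving the observed workflow volume halves ARPU and everything downstream of it, which is why the volume figure is worth grounding in design partner data first.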