AIx2 RLUI: Reinforcement Learning from User Interactions, a Unique Technology for Organic User Feedback and Off-Policy Learning

We coined the term “RLUI” in collaboration with Stanford researchers, building on our research presented in the book Reinforcement Learning Bit by Bit. RLUI is critical to the success of our AI agents for deal matching and due diligence, because those agents rely on information uniquely gathered in our vertically integrated platform—information that would otherwise be lost. Importantly, RLUI also gives our agents a strong network effect. Our approach differs from traditional methods like Reinforcement Learning from Human Feedback (RLHF) by focusing on organic user interactions. Instead of requiring explicit feedback (such as having users rank multiple AI outputs in a labeling interface), RLUI passively observes how professionals use our platform—what deals they open, how they edit memos, which data points they add or delete—and uses that behavioral data to learn in real time. And unlike purely historical data, RLUI lets us collect off-policy data for the reinforcement learning of our agents.

Below, we’ll walk you through why RLUI matters for private equity (PE), venture capital (VC), and other private market funds, and how our vertically integrated platform makes it possible.

Imagine hiring a new analyst on your investment team who quietly watches how your most seasoned partners discuss deals, perform due diligence, and write memos. They take note of every subtle preference—like how you structure risk assessments or the specific financial ratios you care about—and quickly adapt to deliver insights that match your style and needs. That’s basically what Reinforcement Learning from User Interactions (RLUI) does in the world of AI-driven private investments.

Why Generic AI Agents Struggle in Private Investments

Large language models (LLMs) like ChatGPT have impressed the world with their ability to chat and reason generally. But they can fall short when working in highly specialized domains like private equity or venture capital. Why?

  1. Specialized Jargon & Context
    Finance professionals deal with acronyms and concepts—CAGR, IRR, post-money valuations, convertible notes—that might confuse a general-purpose AI trained mostly on open-web data.

  2. Confidential or Proprietary Data
    Fund managers rely on data not available publicly, like internal due diligence docs, specialized research reports, and private deal memos. Generic AI systems don’t have access to that.

  3. Rapidly Evolving Needs
    Strategies can pivot quickly based on market conditions. If an AI is not constantly updating, it becomes stale.

RLUI addresses these challenges by embedding the AI agent in your day-to-day workflows, letting it “watch” and learn from every document edit, every deal view, and every snippet of due diligence conversation—without having to rely on external annotators.

What Exactly is RLUI?

To understand RLUI, start with the more familiar approach: Reinforcement Learning from Human Feedback (RLHF) gathers explicit ratings from human labelers, who compare AI outputs and pick the best one. RLHF works well for certain tasks, but it can be costly and it lacks real-world nuance—especially if the labelers don’t have the domain expertise of a private equity partner.

RLUI takes an entirely different tack:

  • It captures organic user interactions inside a platform.

  • It interprets these interactions as positive or negative “rewards.”

  • It updates its AI models continuously—learning what types of deals or memos meet real user needs.

For instance, if you open a suggested healthcare deal and spend time reading it, that’s a positive signal—the AI probably got something right. If you repeatedly ignore deals in a certain category, the AI picks up on that preference and suggests fewer of them in the future.
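To make this concrete, here is a minimal sketch in Python of how passive interactions could be turned into scalar rewards. The event names, weights, and thresholds below are illustrative assumptions for this post, not a description of our production pipeline.

```python
# Illustrative sketch: mapping organic platform interactions to scalar rewards
# for an RLUI-style learner. Event kinds, weights, and caps are hypothetical.

from dataclasses import dataclass

@dataclass
class InteractionEvent:
    user_id: str
    item_id: str                 # e.g., a recommended deal or a generated memo
    kind: str                    # "deal_opened", "deal_ignored", "memo_edited", ...
    dwell_seconds: float = 0.0
    edit_fraction: float = 0.0   # share of AI-generated text the user rewrote

def implicit_reward(event: InteractionEvent) -> float:
    """Interpret a passive interaction as a positive or negative reward."""
    if event.kind == "deal_opened":
        # Longer reading time -> stronger positive signal, capped at 1.0
        return min(event.dwell_seconds / 120.0, 1.0)
    if event.kind == "deal_ignored":
        return -0.2
    if event.kind == "memo_edited":
        # Heavy rewrites suggest the draft missed the mark
        return 1.0 - 2.0 * event.edit_fraction
    return 0.0

# Example: a user reads a suggested healthcare deal for about two minutes
event = InteractionEvent("analyst_7", "deal_healthcare_42", "deal_opened", dwell_seconds=118)
print(implicit_reward(event))  # ~0.98 -> positive signal for this recommendation
```

In practice the reward shaping would be learned and calibrated per user rather than hand-set like this, but the core idea is the same: ordinary behavior becomes a training signal without any labeling step.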

Why You Need a Vertically Integrated Platform

This all works best when you have a single environment that handles:

  1. Deal Recommendations
    Where the AI can surface opportunities based on sector, geography, or your past investment history.

  2. Due Diligence Workflows
    The ability to edit memos, attach references, or comment on risk assessments.

  3. Collaboration
    Real-time chat or note-sharing among team members, all tracked in one place.

This “one-stop shop” ensures the AI has a clear line of sight into user behavior without collecting extra or sensitive data or asking a human expert to rate the tasks explicitly. The AI sees how you move from one recommended deal to another, which paragraphs in a memo you rewrite, and so on—no extra labeling steps needed.
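As a rough illustration of what a single environment buys you, the sketch below shows one logging helper that every surface of a platform could call. The field names and the JSONL file used as a sink are assumptions made for the example; a real system would more likely write to a message queue or a warehouse table.

```python
# Illustrative sketch: one append-only event log shared by the recommendation,
# due-diligence, and collaboration surfaces. Field names are hypothetical.

import json
import time
from pathlib import Path

LOG_PATH = Path("rlui_events.jsonl")  # assumed sink for this example

def log_event(surface: str, action: str, payload: dict) -> None:
    """Append one user interaction from any part of the platform."""
    record = {
        "ts": time.time(),
        "surface": surface,   # "recommendations" | "due_diligence" | "collaboration"
        "action": action,     # "deal_opened", "memo_paragraph_rewritten", "comment_added", ...
        "payload": payload,
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")

# The same helper is called from every workflow, so the learner sees one
# coherent behavioral trace instead of fragmented, tool-specific logs.
log_event("recommendations", "deal_opened", {"deal_id": "deal_healthcare_42"})
log_event("due_diligence", "memo_paragraph_rewritten", {"memo_id": "memo_17", "paragraph": 3})
```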

Real-Time Adaptation vs. Historical Data

Traditional training might rely heavily on historical deals—looking at which ones succeeded. That’s useful, but it’s limited to the past and to on-policy data. RLUI complements it with real-time signals that also include off-policy data, i.e., cases where the AI took an action that was not entirely correct and a human investor corrected it directly in the platform.

Over time, the AI becomes like a new analyst who not only learns from final outcomes but also from how deals evolve on the go.
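One hedged sketch of how such corrections could feed off-policy learning: treat each human edit as a preference pair in which the edited text is preferred over the original AI draft. The data class and the DPO-style (chosen, rejected) format are illustrative assumptions about one reasonable way to use this data, not a specification of our training pipeline.

```python
# Illustrative sketch: turning an in-platform human correction into an
# off-policy training example as a (chosen, rejected) preference pair.

from dataclasses import dataclass

@dataclass
class Correction:
    prompt: str        # the task the agent was given
    ai_draft: str      # what the agent produced
    human_final: str   # what the investor actually kept after editing

def to_preference_pair(c: Correction) -> dict:
    """The human-edited text is treated as preferred over the original draft."""
    return {"prompt": c.prompt, "chosen": c.human_final, "rejected": c.ai_draft}

corrections = [
    Correction(
        prompt="Summarize key risks for the Series B SaaS deal.",
        ai_draft="The company has no notable risks.",
        human_final="Customer concentration and a short cash runway are the main risks.",
    )
]

# These pairs capture behavior the current policy did NOT produce on its own
# (the human override), which is what makes them off-policy data.
training_batch = [to_preference_pair(c) for c in corrections]
print(training_batch[0]["chosen"])
```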

Stronger Network Effects (and Your Competitive Moat)

Imagine 10, 20, or 100 different investment teams all using the same core AI system (with each team’s data strictly private to them). The agent “observes” countless ways investors clarify strategies, prioritize metrics, and shape memos across various industries and stages.

None of these specifics leak from one team to another; everyone’s data is protected. Yet the system still builds a high-level understanding of “what works” in different scenarios. This creates a network effect that constantly levels up the AI’s expertise. And because these interactions are unique to AIx2’s platform, it forms a defensive moat—competitors can’t easily replicate that rich dataset of real-time user behavior.

Early Results Speak Volumes

In pilot studies with ~50 users, we found that:

  1. Deal Recommendation Acceptance went up by nearly 30% after just three months with RLUI.

  2. Memo Editing Overhead dropped by almost 50%, meaning memos generated by the AI needed fewer changes.

  3. User Satisfaction Scores improved steadily, showing that real users felt the AI was adding more value.

As the AI better understands your specific criteria and editing style, it reduces busywork, surfaces higher-quality deals, and ensures your team can focus on strategic tasks rather than repetitive data gathering.

[Chart: pilot results—Recommendation Acceptance (%) and Memo Edit Overhead (%) on the primary y-axis; User Satisfaction (1–5) on a secondary y-axis.]

Looking Ahead

While RLUI shows great promise, a few points to keep in mind:

  • Cold Start: New users won’t see immediate benefits since the AI needs time to observe your behaviors. However, it ramps up quickly.

  • Niche Subdomains: If your fund focuses on something really specialized—like deep-sea shipping startups—you may need extra curated data.

  • Scaling: RLUI’s true potential will shine as we onboard more users and funds, further refining the AI’s understanding across a diverse range of investment styles.