AIx2 MEMORA: A Vector Space Backbone Representing the US Private Market and User Preferences for Advanced Deal Sourcing & Due Diligence

AIx2 Memora is a technology stack grounded in vector space embeddings and based on our research with Stanford and Harvard researchers (see paper here). It enables curated search and robust reasoning tailored specifically to the private markets. The embedding space is a common base that represents both the entire private market and user preferences, and our curated search and reasoning algorithms operate on it. It acts as the brain and central intelligence of the AIx2 technology. In our previous research, we showed how this technique can deliver high-performance curated search and reasoning for online platforms.

We discuss the technical underpinnings of our approach and explain how it outperforms off-the-shelf large language models (LLMs) for deal sourcing, preference extraction, and due diligence.

1. Introduction to Vector Space Embeddings

A vector space embedding is a mathematical representation of data (such as text, entities, or structured financial data) as points in a high-dimensional numerical space. Instead of relying solely on traditional keyword-based matching, embeddings capture semantic and contextual relationships between data points. For example, companies that operate in similar industries or have correlated financial metrics will lie close to each other in the embedding space.
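
To make "close to each other" concrete, the following minimal sketch measures closeness with cosine similarity, a common choice for embeddings (the source does not state which distance AIx2 uses). The company names and three-dimensional vectors are hypothetical toy values; real embeddings typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "saas_payroll_co": [0.9, 0.1, 0.2],
    "saas_billing_co": [0.8, 0.2, 0.3],
    "biotech_lab_co": [0.1, 0.9, 0.4],
}

sim_saas = cosine_similarity(embeddings["saas_payroll_co"], embeddings["saas_billing_co"])
sim_cross = cosine_similarity(embeddings["saas_payroll_co"], embeddings["biotech_lab_co"])

# The two SaaS companies score higher than the SaaS/biotech pair.
print(f"same sector: {sim_saas:.3f}, cross sector: {sim_cross:.3f}")
```

In this toy setup the two SaaS vectors point in nearly the same direction, so their similarity is higher than that of the SaaS and biotech pair.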

This concept has been used widely in natural language processing (NLP) to represent words and documents. However, AIx2 extends and optimizes these embeddings explicitly for finance, making them the bedrock of a powerful, curated search engine that caters to private investment needs.

2. Building a Finance-Specific Embedding Space

The first step in our methodology is to create an optimized embedding space for finance:

  1. Domain-Specific Training
    We train embedding models on finance-relevant corpora, including regulatory filings, analyst reports, and specialized financial literature. This ensures that the model captures domain-specific nuances—like sector trends, risk factors, and valuation metrics.

  2. Fine-Tuning
    Our next step involves fine-tuning on specialized datasets, such as private deal terms, private market valuations, and performance metrics. This ensures the embeddings not only learn general financial language but also capture the intricacies of private market transactions.

  3. Integration with Structured Data
    Financial data is not purely textual. We leverage structured datasets from platforms such as Bloomberg, Capital IQ, and other private databases. By incorporating structured features (e.g., revenues, valuations, fund performance metrics) into the training pipeline, the embedding space more accurately encodes quantitative factors crucial to private market investors.

The result is an embedding space that comprehensively reflects the financial world, capturing both textual context and key numerical signals relevant to private equity (PE), venture capital (VC), and other private investors.
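
As a rough illustration of how textual and numerical signals can share one vector, the sketch below appends a standardized log-revenue feature to a text embedding. This is plain feature concatenation, a simpler stand-in for incorporating structured features into a training pipeline, and not necessarily how AIx2 does it. The function names, the `numeric_weight` parameter, and the sample values are all assumptions.

```python
import math

def zscore(values):
    """Standardize a list of numbers to mean 0 and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) or 1.0  # guard against a constant column
    return [(v - mean) / std for v in values]

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def combine(text_vecs, revenues_usd, numeric_weight=0.5):
    """Append a weighted, standardized log-revenue feature to each text vector."""
    log_rev = zscore([math.log10(r) for r in revenues_usd])
    return [
        l2_normalize(l2_normalize(vec) + [numeric_weight * feat])
        for vec, feat in zip(text_vecs, log_rev)
    ]

# Hypothetical text embeddings and annual revenues in USD.
text_vecs = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9]]
revenues_usd = [5e6, 6e6, 4e8]
company_vecs = combine(text_vecs, revenues_usd)
```

The log transform keeps one very large company from dominating the scale, and `numeric_weight` controls how strongly the numerical feature influences distances relative to the text signal.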

3. Mapping the Entire US Private Market

Once we have a robust embedding for finance, the next step is to map the entire US private market into this vector space:

  • Aggregating Data
    We ingest a broad range of data from third-party repositories (Bloomberg, Capital IQ, etc.) as well as publicly available information on the web. This includes company profiles, performance metrics, news articles, press releases, hiring patterns, patent filings, and more.

  • Harmonization & Normalization
    Different data sources often use varying structures and terminologies. We employ a data harmonization pipeline to normalize all this information before embedding it. This ensures consistent representation within the vector space.

  • Dynamic Updating
    Private market data is constantly evolving: new funding rounds, exits, emerging players, and so on. A continuous ingestion process keeps our embedding space up to date, reflecting the latest developments in the market.
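
The harmonization and updating steps above can be sketched as follows. The field aliases, unit multipliers, and sample records are hypothetical; real providers use their own schemas, and the embedding step itself is omitted here.

```python
# Hypothetical mapping from source-specific field names to one canonical schema.
FIELD_ALIASES = {
    "company": "name", "company_name": "name", "name": "name",
    "rev_usd_m": "revenue_usd", "revenue": "revenue_usd",
    "sector": "sector", "industry": "sector",
}
# Hypothetical unit conversions: some sources report revenue in USD millions.
UNIT_MULTIPLIERS = {"rev_usd_m": 1_000_000, "revenue": 1}

def harmonize(record):
    """Map a raw source record onto the canonical schema."""
    out = {}
    for key, value in record.items():
        source_key = key.lower()
        canonical = FIELD_ALIASES.get(source_key)
        if canonical is None:
            continue  # drop fields the canonical schema does not know
        if canonical == "revenue_usd":
            value = float(value) * UNIT_MULTIPLIERS[source_key]
        elif isinstance(value, str):
            value = value.strip().lower()
        out[canonical] = value
    return out

def upsert(index, record):
    """Insert the record, or overwrite an older one with the same name."""
    index[record["name"]] = record
    return index

index = {}
upsert(index, harmonize({"Company": " Acme Robotics ", "rev_usd_m": "12.5", "Industry": "Robotics"}))
# A later record from another source replaces the stale entry.
upsert(index, harmonize({"company_name": "Acme Robotics", "revenue": 14000000, "sector": "robotics", "ticker": None}))
```

Keying the index on a normalized name is a simplification; a production pipeline would more likely resolve entities with stable identifiers and then re-embed the updated record.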

4. Mapping User Data into the Same Space

AIx2 also embeds the private data of a user—such as a PE fund, VC, or family office—into the same vector space. This user data may include:

  • Investment Preferences: Sector focus, typical check sizes, stage preferences, risk tolerance, and return expectations.

  • Existing Portfolio: Current holdings, performance history, strategic objectives.

  • Qualitative Criteria: Personalized concerns around governance, ESG factors, or operational synergies.

By embedding both the private market and the user’s private data into the same high-dimensional space, we can compare the two directly and extract user preferences. This lets us model an investor’s style, context, and priorities, which are represented as a unique “signature” in the embedding space.
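
One simple way to picture such a signature is as a weighted average of the embeddings of the user's existing holdings, optionally blended with a vector for their stated criteria. The sketch below illustrates that idea only; the weighting scheme, the `stated_weight` parameter, and the sample vectors are assumptions rather than a description of the AIx2 method.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def preference_signature(holdings, stated_pref=None, stated_weight=0.3):
    """Build a unit-length preference vector.

    holdings: list of (embedding, weight) pairs, e.g. weight = capital invested.
    stated_pref: optional embedding of the user's stated criteria.
    """
    dim = len(holdings[0][0])
    total = sum(weight for _, weight in holdings)
    centroid = l2_normalize(
        [sum(vec[i] * weight for vec, weight in holdings) / total for i in range(dim)]
    )
    if stated_pref is not None:
        stated = l2_normalize(stated_pref)
        centroid = [
            (1 - stated_weight) * c + stated_weight * s
            for c, s in zip(centroid, stated)
        ]
    return l2_normalize(centroid)

# Hypothetical portfolio: three units of capital in one company, one in another.
holdings = [([1.0, 0.0], 3.0), ([0.0, 1.0], 1.0)]
signature = preference_signature(holdings)
blended = preference_signature(holdings, stated_pref=[0.0, 1.0])
```

Because the first holding carries more capital, the signature leans toward it; blending in the stated preference pulls the vector back toward the second direction.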

5. Preference Extraction for Curated Search & Reasoning

Armed with these comprehensive embeddings, AIx2 can deliver curated search results that go beyond simple keyword matches:

  1. Nearest-Neighbor Queries
    By locating the user’s preference vector, we can find companies or funds that reside close to that vector in the embedding space—indicating strong alignment with the user’s criteria.

  2. Clustering & Similarity
    We can segment the market into clusters that reveal hidden opportunities, white spaces, or potential synergies. This helps identify under-the-radar deals or unconventional partners with shared strategies.

  3. Advanced Reasoning Algorithms
    Our approach enables more than just matching—it supports robust reasoning about risk mitigation, synergy analysis, or exit strategy viability. Because we have structured data integrated, we can run finance-specific algorithms that evaluate deal attractiveness based on fundamentals, benchmarks, and peer comparisons.
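
The nearest-neighbor query described above can be sketched as a brute-force search that ranks every candidate by cosine similarity to the preference vector. The catalog names and vectors are hypothetical, and a system at market scale would more plausibly use an approximate nearest-neighbor index than this linear scan.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query, catalog, k=2):
    """Return the k (name, score) pairs most similar to the query vector."""
    scored = ((cosine(query, vec), name) for name, vec in catalog.items())
    return [(name, score) for score, name in heapq.nlargest(k, scored)]

# Hypothetical company embeddings and a hypothetical user preference vector.
catalog = {
    "growth_saas": [0.9, 0.1, 0.1],
    "early_biotech": [0.1, 0.9, 0.2],
    "fintech_infra": [0.7, 0.2, 0.4],
}
user_preference = [0.85, 0.1, 0.2]

matches = top_k(user_preference, catalog, k=2)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

With these toy values, `growth_saas` ranks first and `fintech_infra` second, while `early_biotech` falls outside the top two.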

6. Outperforming GPT & Other LLMs

Why not just ask ChatGPT or another LLM directly for deals or due diligence?

  1. Limited Context Window
    Large language models have a constrained input size. You cannot easily feed large volumes of private user data and confidential documents into GPT without risking data leakage or hitting token limits.

  2. Security & Privacy
    Passing sensitive internal data through public APIs poses security risks. Our system ensures private data stays within a secure vector database, mitigating leakage concerns.

  3. Structured Finance Representation
    GPT lacks direct access to your full, structured financial database. AIx2’s embedding system, on the other hand, is built on top of an integrated knowledge graph and numerical database, enabling granular reasoning about user preferences and private market dynamics.

  4. Continuous Updates
    AIx2 continuously ingests private market data. Our models are updated on the fly, ensuring real-time reflection of market changes. GPT-based models are typically static between major training updates.

7. Enhanced Deal Sourcing & Due Diligence

By maintaining an up-to-date embedding space:

  • Deal Sourcing
    We rapidly pinpoint investment opportunities aligned with user preferences, strategies, and portfolio needs—whether it’s a new startup at Series B or a niche SaaS company entering the growth stage.

  • LP/GP Identification
    The same embedding space includes representations of LPs, GPs, and individuals, allowing you to find potential co-investors, limited partners, or operators who share strategic outlooks and risk profiles.

  • Due Diligence
    Because we embed user-specific data alongside broader market information, we can tailor due diligence content to the user’s perspective. Factors like synergy with the user’s existing portfolio, known risk triggers, or upcoming milestones are automatically flagged and integrated into diligence reports.
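
As one hedged illustration of portfolio-aware flagging, the sketch below lists existing holdings whose embeddings are very close to a candidate company. High similarity could signal either synergy or concentration risk, and that judgment is left to the analyst. The threshold, names, and vectors are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def flag_portfolio_overlap(candidate, portfolio, threshold=0.9):
    """Return (holding, score) pairs at or above the threshold, highest first."""
    flags = []
    for name, vec in portfolio.items():
        score = cosine(candidate, vec)
        if score >= threshold:
            flags.append((name, round(score, 3)))
    return sorted(flags, key=lambda item: item[1], reverse=True)

# Hypothetical existing holdings and a hypothetical candidate deal.
portfolio = {"holding_a": [1.0, 0.0], "holding_b": [0.0, 1.0]}
candidate = [0.9, 0.1]

flags = flag_portfolio_overlap(candidate, portfolio)
```

Here only `holding_a` is flagged, because the candidate points in almost the same direction as that holding and is nearly orthogonal to `holding_b`.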

8. A Continually Improving System

Our system’s performance improves over time as it learns from:

  • New Market Data
    Fresh transaction data, updated financials, or new market reports further refine the embeddings of companies, funds, and individuals.

  • User-Specific Feedback
    As users confirm deals or refine preferences, we capture that feedback to iteratively update their preference representation in the vector space.

  • Model Advancements
    We regularly invest in research to improve our embeddings—through better architectures, alternative loss functions, or domain-specific heuristics. This continuous R&D ensures that AIx2 remains a state-of-the-art solution for private markets.
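
The user-specific feedback loop above can be pictured as nudging the preference vector toward deals the user accepts and away from deals they reject. The sketch below uses a simple interpolation step, similar in spirit to Rocchio-style relevance feedback; the update rule and the `rate` parameter are assumptions, not the documented AIx2 mechanism.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def update_preference(pref, item, accepted, rate=0.2):
    """Move the preference toward an accepted item or away from a rejected one."""
    sign = 1.0 if accepted else -1.0
    moved = [p + sign * rate * (x - p) for p, x in zip(pref, item)]
    return l2_normalize(moved)

# Hypothetical starting preference and a hypothetical deal embedding.
pref = [1.0, 0.0]
deal = [0.0, 1.0]

after_accept = update_preference(pref, deal, accepted=True)
after_reject = update_preference(pref, deal, accepted=False)
```

A small `rate` keeps any single decision from overwriting a preference learned from many earlier signals.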

9. Conclusion

Through a dedicated finance-focused vector space embedding, AIx2 delivers tailored deal sourcing, advanced reasoning, and comprehensive due diligence in the private markets. By mapping both the market and the user’s private data into this continually updated space, we surface insights that general-purpose LLMs cannot provide on their own, while safeguarding data privacy and delivering targeted, domain-specific intelligence.