How OpenAI's AI Agent Queries 600 Petabytes

TL;DR: OpenAI revealed how its internal AI agent, Kepler, analyzes over 600 petabytes of data. It uses techniques like RAG and automated code analysis to overcome context limits, offering a blueprint for building large-scale AI systems.

By Neeraj Dhiman · 1h ago · 2 min read · updated 2m ago

Key facts

Category: AI
Impact: High
Published: 1h ago
Source: InfoQ

Full summary

OpenAI shared how its internal AI agent queries 600+ petabytes of data by overcoming common large language model limitations.

OpenAI has revealed how it built an internal AI data analyst, named Kepler, to navigate more than 600 petabytes of internal data. In a presentation, engineer Bonnie Xu explained that the agent was created to help teams make sense of this dataset. A primary challenge for any large language model is its limited context window, which restricts how much information it can consider at once. To work around this, OpenAI's team combined several techniques. Kepler uses automated code crawling to understand the structure of the company's data repositories. It also relies heavily on Retrieval-Augmented Generation (RAG), a method that lets the model pull relevant, up-to-date information from external knowledge bases when answering a query, so that only the material needed for the question has to fit in the context window.
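To make the retrieval idea concrete, here is a minimal sketch of the general RAG pattern in Python. It is not Kepler's implementation, which the presentation summary does not describe at this level. The table names, descriptions, the `retrieve` and `build_prompt` helpers, and the character budget are all hypothetical, and simple word-overlap scoring stands in for whatever retrieval method OpenAI actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical table descriptions standing in for a knowledge base.
DOCS = {
    "events.daily_active_users": "daily active users per product and region",
    "billing.invoices": "customer invoices with amounts, currency and payment status",
    "infra.gpu_usage": "gpu hours consumed per cluster and team",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, docs, k=2):
    """Return up to k document names that share words with the question, best first."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(name + " " + text))), name)
              for name, text in docs.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]

def build_prompt(question, docs, k=2, max_chars=500):
    """Put only the retrieved entries into the prompt, within a size budget."""
    lines, used = [], 0
    for name in retrieve(question, docs, k):
        entry = f"- {name}: {docs[name]}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the (assumed) context budget
        lines.append(entry)
        used += len(entry)
    return "Relevant tables:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"

print(build_prompt("How many gpu hours did each team use?", DOCS))
```

The point of the sketch is the shape of the flow: the model never sees the whole catalog, only the few entries that score well against the question and fit the budget.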

These methods provide a useful blueprint for developers and CTOs building similar AI systems, and the insights go beyond managing context windows. OpenAI also detailed its approach to keeping the agent reliable and improving it over time. Kepler uses a "scoped semantic memory" to learn from its interactions. For evaluation, the team developed a pipeline using Abstract Syntax Tree (AST) based grading. Instead of only checking whether the final answer is correct, this method analyzes the structure of the code the AI generates. That helps confirm the underlying logic is sound, catch regressions, and build trust in the agent's analytical capabilities.
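As a rough illustration of structure-based grading, the sketch below compares a generated snippet against a reference by their syntax trees rather than by their text. It uses Python's standard `ast` module as a stand-in, since the summary does not say which language or parser OpenAI's pipeline works on. The `fingerprint` and `same_structure` helpers are hypothetical names, and exact tree equality is an assumed grading rule, likely stricter than a production grader would be.

```python
import ast

class _Normalize(ast.NodeTransformer):
    """Rename variables and arguments to placeholders so naming differences are ignored."""

    def __init__(self):
        self.names = {}

    def _alias(self, name):
        # First distinct name becomes v0, the next v1, and so on.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

def fingerprint(source):
    """Parse the source and return a dump of its name-normalized syntax tree."""
    tree = _Normalize().visit(ast.parse(source))
    return ast.dump(tree)

def same_structure(candidate, reference):
    """True when both snippets parse to the same tree shape; False on a syntax error."""
    try:
        return fingerprint(candidate) == fingerprint(reference)
    except SyntaxError:
        return False

reference = "total = sum(x * 2 for x in rows)"
print(same_structure("result = sum(item * 2 for item in data)", reference))
print(same_structure("result = sum(item + 2 for item in data)", reference))
```

The first comparison differs from the reference only in variable names, so the trees match. The second swaps multiplication for addition, which changes the tree even though the code looks similar on the surface.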

The development of specialized agents like Kepler reflects a broader industry trend. Companies are moving beyond general-purpose chatbots and building custom AI tools tailored to their own internal data and workflows. These agents act as copilots for data scientists, engineers, and business analysts, speeding up the work of extracting insights from large, complex datasets. By sharing its approach, OpenAI offers a glimpse of where enterprise AI may be heading: bespoke agents that help teams navigate the data-rich environments of modern technology companies.

Why it matters

OpenAI's internal playbook offers a rare look at how to solve core LLM challenges like context limits and reliable evaluation when building production-grade AI agents at a massive scale.

Business impact

Companies building their own AI data analysts or copilots can adopt these proven techniques (RAG, AST-based grading) to accelerate development, improve reliability, and unlock insights from vast internal datasets, creating a significant competitive advantage.

Tags

#openai, #ai agents, #rag, #llms, #data analysis

Primary source: InfoQ

© 2026 Notifire. An independent, AI-assisted publication.