talk to the world's public data

turbocharge your LLM by connecting it to the world’s largest public database

built for the analysts doing the work that matters.

  • academic researchers: skip the cleanup. start with the question.
  • think tanks: produce policy briefs grounded in joined evidence.
  • nonprofits: measure impact against open public baselines.
  • corporate analysts: macroeconomic context on demand, grounded in source data.
  • independent analysts: publish real work on real data. get spotlighted.

questions.

point luna is an organization on a mission to build the largest relational database of public data in the world, and integrate it with ai agents to automate data analysis and democratize decision-making.

our first product is an mcp server that connects ai clients — claude, cursor, chatgpt, and others — to point luna and queries the warehouse on their behalf. it’s free to get started. sign up.
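to make "queries the warehouse on their behalf" a little more concrete, here is a rough sketch of the kind of message an ai client sends to an mcp server. mcp is built on JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. the tool name `run_query` and the sql text are hypothetical placeholders for illustration, not point luna's actual interface.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and query, used only to show the message shape.
request = build_tool_call(
    1,
    "run_query",
    {"sql": "select county_fips, median_rent from housing limit 5"},
)
print(request)
```

in practice the ai client builds and sends these messages for you, so you only ever type the question.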

we collect public datasets, ingest them, clean them up, document them, and make them easy to query. everything is joined at the geographic level, so you can ask questions across data that historically lived in incompatible silos. a shared geographic key also makes it easy to add new datasets.

the default unit of analysis is the county, with state, metro, and zip available where the underlying data supports it. counties give us a useful balance: granular enough to surface real local variation, broad enough that most public data is reliably reported at that grain, and stable enough that you can build time-series across decades.
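as a rough illustration of what joining at the county grain means, the sketch below matches two small tables on the 5-digit county FIPS code. the figures are made up for the example and the table names are assumptions, not real point luna tables. the FIPS codes are kept as strings so leading zeros survive.

```python
# Made-up values keyed by 5-digit county FIPS code (strings keep leading zeros).
unemployment_rate = {"06037": 5.1, "17031": 4.8, "48201": 4.4}
median_rent = {"06037": 1950, "17031": 1400, "36061": 2100}


def join_on_county(left: dict, right: dict) -> dict:
    """Inner join: keep only the counties that appear in both tables."""
    shared = sorted(left.keys() & right.keys())
    return {fips: (left[fips], right[fips]) for fips in shared}


joined = join_on_county(unemployment_rate, median_rent)
for fips, (rate, rent) in joined.items():
    print(fips, rate, rent)
```

counties missing from either table drop out of the result, which is one reason coverage notes matter when you combine sources.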

sources include the U.S. Census Bureau, BLS, BEA, CDC, IRS, HUD, FHFA, EPA, FBI, NHTSA, the Department of Education, MIT Election Lab, federal program records, and a long tail of state and local agencies. the full catalog and current ingestion status live in our data library.

some concrete examples to give you an idea:

  • federal and state agency data released under open data policies
  • academic datasets released under permissive licenses
  • aggregated administrative records that have been de-identified and published

we do not include:

  • anything containing pii or anything that could be re-identified
  • proprietary or licensed data, even if it’s been leaked or scraped
  • data behind paywalls or restricted access agreements
  • anything we can’t fully document the provenance of

every dataset in point luna is documented end-to-end: where it came from, when we pulled it, what we changed during cleaning, and what its known limitations are.

we’re currently focused on U.S. data at the county, metro, and state level, organized into the domains people actually make decisions on:

  • economy — employment, wages, industry mix, ai exposure, business dynamism
  • demographics — population, race/ethnicity, age, education, migration, language
  • health — mortality, chronic disease, healthcare access, healthcare cost, environmental health
  • housing — home values, rents, cost burden, mortgage activity, vacancy, supply
  • education — test scores, attainment, enrollment, school finance, post-college outcomes
  • elections — federal and state-level results, partisan trends, competitiveness
  • energy & climate — air quality, emissions, electricity costs, climate exposure (early)
  • immigration — foreign-born population, language, naturalization (early)

coverage isn’t uniform — some domains have a decade of clean time-series, others are still being built out. the honest answer for what’s ready today is in the dataset catalog, with reliability tiers and coverage notes on every table.

hyperlocal data. 311 records, building permits, local crime data, and other municipal sources. most U.S. metros publish this, but it’s a mess of incompatible formats. we’re prioritizing the cities where there’s a clear use case — tell us if you have one.

international data. Canada, UK, EU, and India are next on the roadmap. cross-country comparison is one of the highest-leverage things you can do with structured public data, especially for policy questions.

more domains. crime, transportation, social safety net, mobility/opportunity, family/aging, and democracy indicators are all on the longer-term list.

if there’s a use case driving you to ask, tell us — we prioritize based on what people will actually use.

you can tell us about data you’d like to see added. at minimum, name the dataset and explain your use case. we’ll review the request and, if it’s accepted, give you a timeline.

every addition goes through our review process before it lands in the public schema. we check:

  • provenance — where the data comes from, who publishes it, and whether the license allows redistribution
  • refresh cadence and reliability — how often it updates, how stable the schema is, and how trustworthy the source has been historically
  • overlap — whether we already have a comparable or better source covering the same ground
  • integrity — automated and manual checks for cleanliness, completeness, and consistency before it’s exposed to users
  • use case — why is this data important and actionable?

datasets that pass review get a public manifest, a reliability tier, and a documentation page. datasets that fail get an honest writeup of why we’re not including them, so the next person asking the same question doesn’t have to redo the work.
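one way to picture the review is as a checklist where every item has to pass. the sketch below is an assumption about how such a checklist could be modeled, not a description of our internal tooling. the class and field names are hypothetical and simply mirror the five checks listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class ReviewResult:
    """One flag per review check; True means the check passed."""
    provenance: bool
    refresh_cadence: bool
    overlap: bool  # True when no comparable or better source already exists
    integrity: bool
    use_case: bool


def failed_checks(result: ReviewResult) -> list[str]:
    """Return the names of the checks that did not pass, in declaration order."""
    return [f.name for f in fields(result) if not getattr(result, f.name)]


def passes_review(result: ReviewResult) -> bool:
    """A dataset passes only when every check passed."""
    return not failed_checks(result)


# Hypothetical candidate that fails the integrity check.
candidate = ReviewResult(
    provenance=True,
    refresh_cadence=True,
    overlap=True,
    integrity=False,
    use_case=True,
)
print(passes_review(candidate), failed_checks(candidate))
```

keeping the failed checks by name, rather than a single pass/fail flag, is what makes it possible to write up why a dataset was not included.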

point luna is free to get started. we expect to introduce paid tiers, but we are committed to keeping subscription costs low because we genuinely believe this is something the world needs.

point luna runs through your ai client over mcp, the model context protocol. connect the mcp →

point luna is built by a small team with backgrounds in data science and software engineering in big tech. if you want to contribute — code, data, analysis, editorial work, or just sharp questions — we want to hear from you.

ready to see what public data can actually do?

sign up