democratizing data.

the people’s think tank making the world’s public data usable — in minutes instead of weeks, accelerated by ai.

where point luna is going.

01

the warehouse

in beta

every major public dataset, cleaned, documented, and joined. request access and start querying within days.

get early access
02

the mcp

beta soon

plug point luna into your existing ai client. ask questions in plain english, get answers grounded in source data.

join the waitlist
03

the research

coming soon

rigorous investigations published openly. data and methodology released alongside every report.

built for the analysts doing the work that matters.

  • academic researchers — skip the cleanup. start with the question.
  • think tanks — produce policy briefs grounded in joined evidence.
  • nonprofits — measure impact against open public baselines.
  • corporate analysts — macroeconomic context on demand, grounded in source data.
  • independent analysts — publish real work on real data. get spotlighted.

get early access.

  • free during beta
  • access to our data warehouse with public data (most requests granted within 2 business days)
  • early access to the mcp when it ships
  • direct line to the team

questions.

point luna is an organization on a mission to build the largest relational database of public data in the world, and integrate it with ai agents to automate data analysis and democratize decision-making.

our first product is an mcp server that lets ai clients — claude, cursor, chatgpt, and others — query the point luna warehouse directly. it’s free during the beta. join the waitlist to get early access.
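for a sense of what "plug into your existing ai client" looks like: mcp servers are typically registered in the client's config file. the server name and package below are placeholders for illustration, not the actual point luna distribution — the setup guide has the real details.

```json
{
  "mcpServers": {
    "point-luna": {
      "command": "npx",
      "args": ["-y", "point-luna-mcp"]
    }
  }
}
```

once registered, the client discovers the server's tools automatically and can issue warehouse queries on your behalf.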

we collect public datasets, ingest them, clean them up, document them, and make them easy to query — all joined together at the geographic level so you can ask questions across data that historically lived in incompatible silos. because every table shares the same geographic keys, a new dataset only needs to be joined once to become queryable against everything already in the warehouse.

the default unit of analysis is the county, with state, metro, and zip available where the underlying data supports it. counties give us a useful balance: granular enough to surface real local variation, broad enough that most public data is reliably reported at that grain, and stable enough that you can build time-series across decades.
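a toy sketch of why a shared county key matters. the tables, fields, and values below are illustrative, not the actual warehouse schema — but the join pattern is the same: both tables carry a 5-digit county FIPS code (2-digit state + 3-digit county), so cross-domain questions reduce to a simple key lookup.

```python
# illustrative only: toy rows keyed by 5-digit county FIPS codes,
# the kind of shared geographic key that makes cross-domain joins trivial.
unemployment = {
    "06037": 5.1,  # Los Angeles County, CA
    "17031": 4.6,  # Cook County, IL
}
median_rent = {
    "06037": 1950,
    "17031": 1350,
}

# because both tables share the FIPS key, a cross-domain question
# ("how do rents look in high-unemployment counties?") is a one-line join
joined = {
    fips: {"unemployment_rate": unemployment[fips], "median_rent": rent}
    for fips, rent in median_rent.items()
    if fips in unemployment
}
print(joined["06037"])  # {'unemployment_rate': 5.1, 'median_rent': 1950}
```

in the warehouse this is a SQL join rather than a dict comprehension, but the key design is identical: standardize the geography once, and every new dataset composes with every existing one.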

sources include the U.S. Census Bureau, BLS, BEA, CDC, IRS, HUD, FHFA, EPA, FBI, NHTSA, the Department of Education, MIT Election Lab, federal program records, and a long tail of state and local agencies. the full catalog and current ingestion status live in our data library.

some concrete examples of what qualifies:

  • federal and state agency data released under open data policies
  • academic datasets released under permissive licenses
  • aggregated administrative records that have been de-identified and published

we do not include:

  • anything containing pii or anything that could be re-identified
  • proprietary or licensed data, even if it’s been leaked or scraped
  • data behind paywalls or restricted access agreements
  • anything we can’t fully document the provenance of

every dataset in point luna is documented end-to-end: where it came from, when we pulled it, what we changed during cleaning, and what its known limitations are.

we’re currently focused on U.S. data at the county, metro, and state level, organized into the domains people actually make decisions on:

  • economy — employment, wages, industry mix, ai exposure, business dynamism
  • demographics — population, race/ethnicity, age, education, migration, language
  • health — mortality, chronic disease, healthcare access, healthcare cost, environmental health
  • housing — home values, rents, cost burden, mortgage activity, vacancy, supply
  • education — test scores, attainment, enrollment, school finance, post-college outcomes
  • elections — federal and state-level results, partisan trends, competitiveness
  • energy & climate — air quality, emissions, electricity costs, climate exposure (early)
  • immigration — foreign-born population, language, naturalization (early)

coverage isn’t uniform — some domains have a decade of clean time-series, others are still being built out. the honest answer for what’s ready today is in the dataset catalog, with reliability tiers and coverage notes on every table.

hyperlocal data. 311 records, building permits, local crime data, and other municipal sources. most U.S. metros publish this, but it’s a mess of incompatible formats. we’re prioritizing the cities where there’s a clear use case — join the waitlist if you have one.

international data. Canada, UK, EU, and India are next on the roadmap. cross-country comparison is one of the highest-leverage things you can do with structured public data, especially for policy questions.

more domains. crime, transportation, social safety net, mobility/opportunity, family/aging, and democracy indicators are all on the longer-term list.

if there’s a use case driving you to ask, join the waitlist and tell us — we prioritize based on what people will actually use.

when you request early access, feel free to provide context on data you'd like to see added. at minimum, name the dataset and explain your use case; we'll review it and provide a timeline if accepted.

every addition goes through our review process before it lands in the public schema. we check:

  • provenance — where the data comes from, who publishes it, and whether the license allows redistribution
  • refresh cadence and reliability — how often it updates, how stable the schema is, and how trustworthy the source has been historically
  • overlap — whether we already have a comparable or better source covering the same ground
  • integrity — automated and manual checks for cleanliness, completeness, and consistency before it’s exposed to users
  • use case — whether the data is important and actionable for the questions people actually ask

datasets that pass review get a public manifest, a reliability tier, and a documentation page. datasets that fail get an honest writeup of why we’re not including them, so the next person asking the same question doesn’t have to redo the work.

point luna is free during the beta. we expect to introduce paid tiers, but we're committed to keeping subscription costs low because we genuinely believe this is something the world needs.

there’s one cost worth naming clearly upfront, because it isn’t ours:

BigQuery compute. point luna lives in Google BigQuery as a public dataset. Google includes 1 TB of query processing per month at no charge, and applies standard BigQuery pricing above that. most users — anyone running typical county-level analyses — never come close to exhausting the free tier. heavy users (full-table scans, large joins, repeated queries on big sources like HMDA) can rack up real costs, and we'd rather you know that going in than be surprised later.
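a rough sketch of how the free-tier math works. the on-demand rate below is an assumption based on Google's published US pricing at the time of writing; the actual rate varies by region and can change, so check the BigQuery pricing page before relying on it.

```python
def estimate_monthly_cost(tib_scanned: float,
                          free_tib: float = 1.0,
                          rate_per_tib: float = 6.25) -> float:
    """estimate BigQuery on-demand query cost for one month.

    assumptions (verify against Google's current pricing page):
      - free_tib: BigQuery's monthly free query allowance (~1 TiB)
      - rate_per_tib: on-demand USD rate per TiB above the free tier
    """
    billable = max(0.0, tib_scanned - free_tib)
    return billable * rate_per_tib

# typical county-level analysis: comfortably inside the free tier
print(estimate_monthly_cost(0.2))  # 0.0
# heavy use, e.g. repeated full scans of a large source like HMDA
print(estimate_monthly_cost(5.0))  # 25.0
```

the practical takeaway: the cost driver is bytes scanned, so selecting only the columns you need and filtering on partitioned fields keeps most workloads free.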

haven’t signed up for early access yet? start here →

got an email from us? follow our setup guide →

point luna is built by a small team with big-tech backgrounds in data science and software engineering. if you want to contribute — code, data, analysis, editorial work, or just sharp questions — we want to hear from you.

ready to see what public data can actually do?

get early access