democratizing data.

the people’s think tank making the world’s public data usable — in minutes instead of weeks, accelerated by ai.

where point luna is going.

01

the warehouse

in beta

every major public dataset, cleaned, documented, and joined. request access and start querying within days.

get early access
02

the mcp

beta soon

plug point luna into your existing ai client. ask questions in plain english, get answers grounded in source data.

join the waitlist
03

the research

coming soon

rigorous investigations published openly. data and methodology released alongside every report.

built for the analysts doing the work that matters.

  • academic researchers — skip the cleanup. start with the question.
  • think tanks — produce policy briefs grounded in joined evidence.
  • nonprofits — measure impact against open public baselines.
  • corporate analysts — macroeconomic context on demand, grounded in source data.
  • independent analysts — publish real work on real data. get spotlighted.

get early access.

  • free during beta
  • access to our data warehouse with public data (most requests granted within 2 business days)
  • early access to the mcp when it ships
  • direct line to the team

questions.

point luna is an organization on a mission to build the largest relational database of public data in the world, and integrate it with ai agents to automate data analysis and democratize decision-making.

our first product is an mcp server that lets ai clients — claude, cursor, chatgpt, and others — query the point luna warehouse directly. it’s free during the beta. join the waitlist to get early access.
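for a sense of what "plug into your existing ai client" looks like: mcp servers are typically registered in the client's config file. the server name and package below are placeholders for illustration, not the actual point luna distribution — the setup guide has the real details.

```json
{
  "mcpServers": {
    "point-luna": {
      "command": "npx",
      "args": ["-y", "point-luna-mcp"]
    }
  }
}
```

once registered, the client discovers the server's tools automatically and can issue warehouse queries on your behalf.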

we collect public datasets, ingest them, clean them up, document them, and make them easy to query — all joined together at the geographic level so you can ask questions across data that historically lived in incompatible silos. because every table shares the same geographic keys, a new dataset only needs to be joined once to become queryable against everything already in the warehouse.

the default unit of analysis is the county, with state, metro, and zip available where the underlying data supports it. counties give us a useful balance: granular enough to surface real local variation, broad enough that most public data is reliably reported at that grain, and stable enough that you can build time-series across decades.
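a toy sketch of why a shared county key matters. the tables, fields, and values below are illustrative, not the actual warehouse schema — but the join pattern is the same: both tables carry a 5-digit county FIPS code (2-digit state + 3-digit county), so cross-domain questions reduce to a simple key lookup.

```python
# illustrative only: toy rows keyed by 5-digit county FIPS codes,
# the kind of shared geographic key that makes cross-domain joins trivial.
unemployment = {
    "06037": 5.1,  # Los Angeles County, CA
    "17031": 4.6,  # Cook County, IL
}
median_rent = {
    "06037": 1950,
    "17031": 1350,
}

# because both tables share the FIPS key, a cross-domain question
# ("how do rents look in high-unemployment counties?") is a one-line join
joined = {
    fips: {"unemployment_rate": unemployment[fips], "median_rent": rent}
    for fips, rent in median_rent.items()
    if fips in unemployment
}
print(joined["06037"])  # {'unemployment_rate': 5.1, 'median_rent': 1950}
```

in the warehouse this is a SQL join rather than a dict comprehension, but the key design is identical: standardize the geography once, and every new dataset composes with every existing one.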

sources include the U.S. Census Bureau, BLS, BEA, CDC, IRS, HUD, FHFA, EPA, FBI, NHTSA, the Department of Education, MIT Election Lab, federal program records, and a long tail of state and local agencies. the full catalog and current ingestion status live in our data library.

some concrete examples of what qualifies:

  • federal and state agency data released under open data policies
  • academic datasets released under permissive licenses
  • aggregated administrative records that have been de-identified and published

we do not include:

  • anything containing pii or anything that could be re-identified
  • proprietary or licensed data, even if it’s been leaked or scraped
  • data behind paywalls or restricted access agreements
  • anything we can’t fully document the provenance of

every dataset in point luna is documented end-to-end: where it came from, when we pulled it, what we changed during cleaning, and what its known limitations are.

we’re currently focused on U.S. data at the county, metro, and state level, organized into the domains people actually make decisions on:

  • economy — employment, wages, industry mix, ai exposure, business dynamism
  • demographics — population, race/ethnicity, age, education, migration, language
  • health — mortality, chronic disease, healthcare access, healthcare cost, environmental health
  • housing — home values, rents, cost burden, mortgage activity, vacancy, supply
  • education — test scores, attainment, enrollment, school finance, post-college outcomes
  • elections — federal and state-level results, partisan trends, competitiveness
  • energy & climate — air quality, emissions, electricity costs, climate exposure (early)
  • immigration — foreign-born population, language, naturalization (early)

coverage isn’t uniform — some domains have a decade of clean time-series, others are still being built out. the honest answer for what’s ready today is in the dataset catalog, with reliability tiers and coverage notes on every table.

hyperlocal data. 311 records, building permits, local crime data, and other municipal sources. most U.S. metros publish this, but it’s a mess of incompatible formats. we’re prioritizing the cities where there’s a clear use case — join the waitlist if you have one.

international data. Canada, UK, EU, and India are next on the roadmap. cross-country comparison is one of the highest-leverage things you can do with structured public data, especially for policy questions.

more domains. crime, transportation, social safety net, mobility/opportunity, family/aging, and democracy indicators are all on the longer-term list.

if there’s a use case driving you to ask, join the waitlist and tell us — we prioritize based on what people will actually use.

when you request early access, feel free to provide context on data you'd like to see added. at minimum, name the dataset and explain your use case; we'll review it and provide a timeline if accepted.

every addition goes through our review process before it lands in the public schema. we check:

  • provenance — where the data comes from, who publishes it, and whether the license allows redistribution
  • refresh cadence and reliability — how often it updates, how stable the schema is, and how trustworthy the source has been historically
  • overlap — whether we already have a comparable or better source covering the same ground
  • integrity — automated and manual checks for cleanliness, completeness, and consistency before it’s exposed to users
  • use case — whether the data is important and actionable for the questions people actually ask

datasets that pass review get a public manifest, a reliability tier, and a documentation page. datasets that fail get an honest writeup of why we’re not including them, so the next person asking the same question doesn’t have to redo the work.

point luna is free during the beta. we expect to introduce paid tiers, but we're committed to keeping subscription costs low because we genuinely believe this is something the world needs.

there’s one cost worth naming clearly upfront, because it isn’t ours:

BigQuery compute. point luna lives in Google BigQuery as a public dataset. Google includes 1 TB of query processing per month at no charge, and applies standard BigQuery pricing above that. most users — anyone running typical county-level analyses — never come close to exhausting the free tier. heavy users (full-table scans, large joins, repeated queries on big sources like HMDA) can rack up real costs, and we'd rather you know that going in than be surprised later.
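a rough sketch of how the free-tier math works. the on-demand rate below is an assumption based on Google's published US pricing at the time of writing; the actual rate varies by region and can change, so check the BigQuery pricing page before relying on it.

```python
def estimate_monthly_cost(tib_scanned: float,
                          free_tib: float = 1.0,
                          rate_per_tib: float = 6.25) -> float:
    """estimate BigQuery on-demand query cost for one month.

    assumptions (verify against Google's current pricing page):
      - free_tib: BigQuery's monthly free query allowance (~1 TiB)
      - rate_per_tib: on-demand USD rate per TiB above the free tier
    """
    billable = max(0.0, tib_scanned - free_tib)
    return billable * rate_per_tib

# typical county-level analysis: comfortably inside the free tier
print(estimate_monthly_cost(0.2))  # 0.0
# heavy use, e.g. repeated full scans of a large source like HMDA
print(estimate_monthly_cost(5.0))  # 25.0
```

the practical takeaway: the cost driver is bytes scanned, so selecting only the columns you need and filtering on partitioned fields keeps most workloads free.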

haven’t signed up for early access yet? start here →

got an email from us? follow our setup guide →

point luna is built by a small team with big-tech backgrounds in data science and software engineering. if you want to contribute — code, data, analysis, editorial work, or just sharp questions — we want to hear from you.

ready to see what public data can actually do?

get early access