Social Explorer includes a wide range of demographic, economic, social, environmental, and thematic datasets. When a user submits a question, the AI Assistant automatically selects the dataset that can produce the most accurate, complete, and reliable answer. This decision is based on geographic requirements, variable availability, time coverage, and methodological consistency.
How the AI Selects a Dataset
The AI first evaluates the geographic level requested, because some datasets support only certain geographies. For example, ACS 1-year estimates are published only for areas with populations of 65,000 or more, while ACS 5-year estimates support all geographic levels, including census tracts and block groups. For small or detailed areas, the AI selects datasets that provide full coverage.
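The geography check can be pictured with a minimal Python sketch. It is an illustration only, not Social Explorer's implementation: the function name, the level names, and the preference for 1-year data whenever an area qualifies are assumptions made for the example. The 65,000 threshold is the Census Bureau's publication cutoff for ACS 1-year estimates.

```python
from typing import Optional

# Census Bureau publication threshold for ACS 1-year estimates.
ACS_1YR_MIN_POPULATION = 65_000

# Hypothetical level names for geographies published only in 5-year data.
SMALL_AREA_LEVELS = {"census_tract", "block_group"}


def pick_acs_dataset(geo_level: str, population: Optional[int] = None) -> str:
    """Return 'ACS 1-year' or 'ACS 5-year' for a requested geography.

    Assumption: 1-year data is preferred whenever the area qualifies.
    """
    if geo_level in SMALL_AREA_LEVELS:
        # Tracts and block groups are never covered by 1-year estimates.
        return "ACS 5-year"
    if population is not None and population >= ACS_1YR_MIN_POPULATION:
        return "ACS 1-year"
    # Unknown or small population: fall back to the dataset with full coverage.
    return "ACS 5-year"


print(pick_acs_dataset("census_tract"))
print(pick_acs_dataset("place", population=3_800_000))
print(pick_acs_dataset("place", population=12_000))
```

When the population is unknown, the sketch defaults to 5-year data, because that is the choice that cannot fail on coverage.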
Next, the AI checks which datasets contain the required variables. Many variables exist only in specific tables or only for certain years. If a variable is not available in a 1-year dataset, the AI automatically switches to a 5-year dataset or another appropriate source.
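The variable check amounts to walking a preference-ordered list of datasets and taking the first one that contains every requested variable. The sketch below assumes a hypothetical in-memory catalog. The dataset names, variable identifiers, and function name are invented for illustration and do not reflect the actual Social Explorer catalog.

```python
from typing import Dict, Iterable, Optional, Sequence, Set

# Hypothetical catalog: dataset name -> variable identifiers it contains.
CATALOG: Dict[str, Set[str]] = {
    "ACS 2022 1-year": {"median_household_income", "total_population"},
    "ACS 2022 5-year": {
        "median_household_income",
        "total_population",
        "detailed_ancestry",
    },
}


def first_dataset_with(
    variables: Iterable[str],
    preference_order: Sequence[str],
    catalog: Dict[str, Set[str]] = CATALOG,
) -> Optional[str]:
    """Return the first preferred dataset containing all requested variables."""
    needed = set(variables)
    for name in preference_order:
        if needed <= catalog.get(name, set()):
            return name
    return None  # No dataset in the list can serve the request.


order = ["ACS 2022 1-year", "ACS 2022 5-year"]
print(first_dataset_with(["median_household_income"], order))
print(first_dataset_with(["detailed_ancestry"], order))
```

In this sketch, the second call skips the 1-year dataset because the variable is missing from it, which mirrors the automatic switch to 5-year data described above.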
If the question names a specific year or asks for a comparison across years, the AI selects datasets with compatible timeframes and survey methodologies so that the comparison is valid. For non-ACS topics such as crime, health, or environmental data, the AI selects relevant datasets from the broader Social Explorer Data Library.
Examples of Dataset Selection
If a user asks for the population of Miami in 2020, the AI selects ACS 2020 5-year estimates for full city coverage (the Census Bureau released only experimental 1-year estimates for 2020). If a user asks to compare median income between Los Angeles and New York in 2022, the AI uses the ACS 2022 1-year estimates when both cities qualify. For median income by census tract in Cook County, the AI selects ACS 5-year estimates because tract-level data is not available in 1-year datasets.
Fallback Logic and Transparency
When the initially selected dataset cannot fulfill the request, the AI applies fallback rules. These may include switching from 1-year to 5-year data, adjusting the geographic level, or selecting an alternative variable. If no fallback succeeds, the AI explains why the request cannot be completed.
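One way to picture the fallback rules is as an ordered chain, where each rule rewrites the request and the chain stops at the first version that can be fulfilled. The following Python sketch rests on several assumptions: the rule order, the geography ladder, the variable substitution table, and all names are hypothetical and are not taken from Social Explorer's implementation.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Request:
    dataset: str
    geo_level: str
    variable: str


def to_five_year(req: Request) -> Optional[Request]:
    # Switch from 1-year to 5-year data.
    if req.dataset == "ACS 1-year":
        return replace(req, dataset="ACS 5-year")
    return None


def to_coarser_geography(req: Request) -> Optional[Request]:
    # Hypothetical ladder from finer to coarser geography.
    ladder = {"block_group": "census_tract", "census_tract": "county"}
    coarser = ladder.get(req.geo_level)
    return replace(req, geo_level=coarser) if coarser else None


def to_alternative_variable(req: Request) -> Optional[Request]:
    # Hypothetical substitution table for closely related variables.
    alternatives = {"median_family_income": "median_household_income"}
    alt = alternatives.get(req.variable)
    return replace(req, variable=alt) if alt else None


Rule = Tuple[str, Callable[[Request], Optional[Request]]]

FALLBACKS: List[Rule] = [
    ("switched to 5-year data", to_five_year),
    ("adjusted geographic level", to_coarser_geography),
    ("selected alternative variable", to_alternative_variable),
]


def resolve(
    req: Request,
    can_fulfill: Callable[[Request], bool],
    fallbacks: List[Rule] = FALLBACKS,
) -> Tuple[Optional[Request], List[str]]:
    """Apply fallbacks cumulatively; return (request or None, steps applied)."""
    applied: List[str] = []
    current = req
    if can_fulfill(current):
        return current, applied
    for label, rule in fallbacks:
        candidate = rule(current)
        if candidate is None:
            continue  # Rule does not apply to this request.
        current = candidate
        applied.append(label)
        if can_fulfill(current):
            return current, applied
    return None, applied  # Caller explains why the request cannot be completed.
```

The list of applied steps is returned alongside the result so that a caller can report which adjustments were made, or explain which ones were tried when the request still cannot be completed.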
Every AI response includes a Sources panel showing the dataset, table, variables, geography, and relevant methodology notes. This ensures complete transparency, allowing users to understand and verify how the dataset was selected.
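As a rough illustration of what a Sources entry carries, the record below holds the five items listed above. The class name, field names, and text layout are assumptions made for this sketch, not the actual Sources panel format. B19013 is the Census Bureau's table ID for median household income and is used here only as sample data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceRecord:
    """Hypothetical shape of one Sources entry."""

    dataset: str
    table: str
    variables: List[str]
    geography: str
    methodology_notes: List[str] = field(default_factory=list)

    def summary(self) -> str:
        # One line per item, followed by any methodology notes.
        lines = [
            f"Dataset: {self.dataset}",
            f"Table: {self.table}",
            f"Variables: {', '.join(self.variables)}",
            f"Geography: {self.geography}",
        ]
        lines += [f"Note: {note}" for note in self.methodology_notes]
        return "\n".join(lines)


record = SourceRecord(
    dataset="ACS 2022 1-year",
    table="B19013",
    variables=["Median household income"],
    geography="Place: Los Angeles, CA; New York, NY",
    methodology_notes=["1-year estimates cover areas with 65,000 or more residents."],
)
print(record.summary())
```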