
Brock Webb
This book exists to help official statistics move beyond traditional methods, into applied machine learning, and on to generative AI (GenAI).
Federal statistical agencies collect, process, and publish some of the most consequential data in the world. The methods behind that work are evolving. Machine learning, large language models, and agentic AI are not future possibilities; they are current tools with real applications in survey operations, data processing, and statistical production.
This book covers what those methods are, how they work, when they apply, and when they do not. Every chapter includes plain-language explanation, practical context for federal operations, and guidance on explaining methods to leadership and stakeholders.
All examples use public datasets from the Census Bureau, Bureau of Labor Statistics, and other federal sources. No proprietary data, no restricted access required.
How This Book Is Organized
The book has fifteen chapters across five parts, each building on the previous one.
The examples use public U.S. Census Bureau data because it is accessible through a stable, well-documented API. The concepts (when to apply machine learning, how to evaluate AI systems, what questions to ask) apply across all of official statistics.
Contents
- AI for Official Statistics
- Foreword
- Version Information
- Introduction to AI for Official Statistics
- Chapter 1 - Regression and Classification for Survey Data
- Chapter 2 - Cross-Validation and Model Selection
- Chapter 3 - Decision Trees and Random Forests
- Chapter 4 - Neural Networks Basics
- Chapter 5 - Graph Thinking and Record Linkage
- Chapter 6 - Dimension Reduction and Geographic Segmentation
- Chapter 7 - Imputation Methods for Survey Data
- Chapter 8 - Bias, Fairness, and Equity in Federal AI/ML
- Chapter 9 - Synthetic Data Generation for Federal Statistics
- Chapter 10 - Statistical Disclosure Limitation in the Age of AI
- Chapter 11 - Transformers for Survey Text Classification
- Chapter 12 - Large Language Models for Survey Operations
- Chapter 13 - Agentic AI for Federal Statistical Operations
- Chapter 14 - Evaluating AI Systems for Federal Use
- Chapter 15 - Capstone: Reproducible AI-Assisted Research
- Bibliography and References
- Glossary
- Index