Synthetic Data and Generative AI covers the foundations of machine learning, modern approaches to solving complex problems, and the systematic generation and use of synthetic data. Emphasis is on scalability, automation, testing, optimization, and interpretability (explainable AI). For instance, regression techniques, including logistic and Lasso, are presented as a single method without using advanced linear algebra. Confidence regions and prediction intervals are built using parametric bootstrap, without statistical models or probability distributions. Models (including generative models and mixtures) are mostly used to create rich synthetic data to test and benchmark various methods.
- Emphasizes numerical stability and performance of algorithms (computational complexity)
- Focuses on explainable AI/interpretable machine learning, with heavy use of synthetic data and generative models, a new trend in the field
- Includes a new, easier construction of confidence regions that does not rely on statistical theory, as well as a simple alternative to the powerful, well-known XGBoost technique
- Covers automation of data cleaning, favoring easier solutions when possible
- Includes chapters dedicated fully to synthetic data applications: fractal-like terrain generation with the diamond-square algorithm, and synthetic star clusters evolving over time and bound by gravity
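To give a flavor of the terrain-generation application mentioned in the last item, here is a minimal Python sketch of the classic diamond-square algorithm. It is an illustration under stated assumptions, not the book's implementation: the function name `diamond_square`, its parameters (`n`, `roughness`, `seed`), the uniform noise, and the edge handling (averaging only the neighbors that fall inside the grid) are all choices made for this example.

```python
import numpy as np


def diamond_square(n, roughness=0.5, seed=0):
    """Return a (2**n + 1) x (2**n + 1) heightmap (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    size = 2 ** n + 1
    h = np.zeros((size, size))

    # Seed the four corners with random heights.
    h[0, 0], h[0, -1], h[-1, 0], h[-1, -1] = rng.uniform(-1, 1, 4)

    step = size - 1   # side length of the current squares
    scale = 1.0       # noise amplitude, shrunk at every level
    while step > 1:
        half = step // 2

        # Diamond step: the center of each square gets the mean of
        # its four corners plus random noise.
        for y in range(half, size, step):
            for x in range(half, size, step):
                avg = (h[y - half, x - half] + h[y - half, x + half]
                       + h[y + half, x - half] + h[y + half, x + half]) / 4
                h[y, x] = avg + rng.uniform(-scale, scale)

        # Square step: each edge midpoint gets the mean of its
        # in-grid neighbors (up, down, left, right) plus noise.
        for y in range(0, size, half):
            x_start = half if (y // half) % 2 == 0 else 0
            for x in range(x_start, size, step):
                total, count = 0.0, 0
                for dy, dx in ((-half, 0), (half, 0), (0, -half), (0, half)):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < size and 0 <= xx < size:
                        total += h[yy, xx]
                        count += 1
                h[y, x] = total / count + rng.uniform(-scale, scale)

        step = half
        scale *= roughness  # smaller values give smoother terrain

    return h


terrain = diamond_square(5, roughness=0.5, seed=42)
print(terrain.shape)
```

In this sketch, lowering `roughness` makes the noise decay faster from one level to the next, which yields smoother terrain; values closer to 1 give more jagged, fractal-like relief.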