Josh Kraus
Data Scientist, Former Musician & Educator
Browse my work below.
Built an end-to-end machine learning pipeline in Python to predict NCAA tournament
outcomes and generate optimized bracket picks. Engineered models in AutoGluon with
walk-forward cross-validation and backtesting. Developed a Monte Carlo simulation
framework using importance-weighted bracket selection via binomial log-likelihood, and
built feature pipelines drawing on web-scraped KenPom and Sports Reference data.
Stack: Python, AutoGluon, BeautifulSoup, scikit-learn
Key techniques: Automated Machine Learning, walk-forward cross-validation, class weights, Monte Carlo simulation
Highlight: 63% backtested accuracy over 10 years
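The Monte Carlo bracket-selection idea can be sketched as follows. This is a simplified, hypothetical illustration (independent games, no round structure, toy probabilities), not the project's actual implementation: sample many brackets from the model's win probabilities, score each by its binomial log-likelihood, and keep the most likely one.

```python
import math
import random

def simulate_bracket(win_probs):
    """Sample one bracket: each game resolves per its model win probability."""
    return [random.random() < p for p in win_probs]

def binomial_log_likelihood(bracket, win_probs):
    """Log-likelihood of a sampled bracket under the model's probabilities."""
    return sum(
        math.log(p if picked else 1.0 - p)
        for picked, p in zip(bracket, win_probs)
    )

def best_bracket(win_probs, n_sims=10_000, seed=42):
    """Monte Carlo search: draw many brackets, keep the most likely one."""
    random.seed(seed)
    candidates = (simulate_bracket(win_probs) for _ in range(n_sims))
    return max(candidates, key=lambda b: binomial_log_likelihood(b, win_probs))

probs = [0.9, 0.65, 0.55, 0.7]   # toy model win probabilities
picks = best_bracket(probs)       # with independent games, favors every favorite
```

In a real bracket the games are not independent (later matchups depend on earlier winners), which is where the full simulation framework earns its keep.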
Developed a production-grade async web scraper in Python to extract and analyze flight
pricing data. Architected a modular batch processing system supporting concurrent scraping
with configurable rate limiting, retry logic, and error handling, achieving a 97.5% success rate.
Implemented a comprehensive test suite with 300 unit tests (95% code coverage) and
established CI/CD pipeline for automated testing and versioned releases.
Stack: Python, Playwright, pytest, GitHub Actions
Key techniques: Async batch architecture, retry/error handling, CI/CD, unit testing
Highlight: 97.5% scrape success rate across hundreds of routes; 300 unit tests at 95% coverage
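The core pattern behind the batch architecture can be sketched with stdlib asyncio alone. The `fetch_price` stub below stands in for a Playwright page scrape and is entirely hypothetical; the point is the semaphore-bounded concurrency and exponential-backoff retry loop.

```python
import asyncio

async def fetch_price(route, attempt):
    """Stand-in for a real Playwright scrape; first attempt may fail."""
    await asyncio.sleep(0.01)  # simulate network latency
    if attempt == 0 and hash(route) % 3 == 0:
        raise ConnectionError(f"transient failure on {route}")
    return {"route": route, "price": 199}

async def scrape_with_retry(route, sem, max_retries=3):
    """Bounded concurrency via semaphore; exponential backoff between tries."""
    async with sem:
        for attempt in range(max_retries):
            try:
                return await fetch_price(route, attempt)
            except ConnectionError:
                await asyncio.sleep(0.01 * 2 ** attempt)  # backoff
        return None  # retries exhausted; caller logs the failure

async def scrape_batch(routes, concurrency=5):
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(scrape_with_retry(r, sem) for r in routes))

routes = [f"JFK-LAX-{d}" for d in range(10)]
results = asyncio.run(scrape_batch(routes))
```

Bounding concurrency with a semaphore is what makes the rate limiting configurable: one parameter controls how many pages are in flight at once.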
Deep learning models have shown strong accuracy in medical image classification, but their black-box nature makes them unviable in clinical settings: they can't be interpreted, validated, or approved by the FDA as diagnostic tools. This project explored whether interpretable traditional ML models could serve as a feasible alternative, using the NIH Chest X-ray dataset (112,000+ radiographs) to classify pleural effusion versus no finding. Seven classifiers were trained and tuned, including kernel-SVM, Gradient Boosting, an ANN, and Naïve Bayes, across balanced and imbalanced training sets, with four data augmentation techniques tested systematically. The kernel-SVM on balanced data achieved an AUC of 0.75, matching published deep learning benchmarks on the same dataset at a fraction of the computational cost and with full interpretability. A key finding was the profound impact of class imbalance: models trained on imbalanced data achieved misleadingly high accuracy while completely failing to detect the minority class.
Stack: Python, scikit-learn, Keras, ImgAug, NumPy
Key techniques: Kernel-SVM, Gradient Boosting, ANN, hyperparameter tuning, data augmentation
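The class-imbalance finding is worth making concrete. A toy sketch (illustrative numbers, not the project's data): a degenerate model that always predicts the majority class scores high on accuracy while never detecting a single positive case.

```python
# Toy labels: 95% "no finding" (0), 5% "effusion" (1), mirroring heavy imbalance.
y_true = [0] * 95 + [1] * 5

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority_recall = (
    sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    / sum(t == 1 for t in y_true)
)

print(accuracy)         # 0.95 — looks strong on paper
print(minority_recall)  # 0.0  — misses every effusion case
```

This is why the balanced training sets, not raw accuracy, drove the model comparison: per-class recall and AUC expose what a headline accuracy number hides.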
Identifying zip codes with strong short-term rental returns requires synthesizing property values, occupancy data, and local socioeconomic factors. This project built a machine learning pipeline to classify U.S. zip codes as profitable or not for Airbnb investment, defined as an annualized rental yield of 18% or higher. Data was scraped from Rabbu using Selenium across 4,300+ zip codes and merged with Zillow home value data and U.S. Census socioeconomic variables, resulting in a 73-feature dataset with significant class imbalance (only 5.5% profitable). After evaluating 12 classifiers via PyCaret and testing resampling techniques including SMOTE and GANs, a LightGBM model achieved 81.1% precision on the minority class. The model identified average daily rate, home value, and occupancy as the top predictors, and surfaced a counterintuitive finding: the most profitable zip codes tended to have lower-than-average home values, suggesting that high-priced markets are often harder to generate returns in despite commanding higher nightly rates.
Stack: Python, PySpark, LightGBM, PyCaret, Selenium, Pandas, scikit-learn, imbalanced-learn
Key techniques: Web scraping, data integration, class imbalance (SMOTE, GANs), LightGBM, feature importance
Highlight: 81.1% minority-class precision across 4,300+ U.S. zip codes; identified key drivers of STR profitability with actionable insights for real estate investors
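The classification target can be sketched in a few lines. The exact yield formula used in the project isn't stated here, so this is an assumed gross-yield definition (daily rate × 365 × occupancy, over home value) with hypothetical example numbers.

```python
def annualized_yield(avg_daily_rate, occupancy, home_value):
    """Gross annualized rental yield: yearly revenue over purchase price."""
    yearly_revenue = avg_daily_rate * 365 * occupancy
    return yearly_revenue / home_value

def label_profitable(avg_daily_rate, occupancy, home_value, threshold=0.18):
    """Binary target used for classification: yield at or above 18%."""
    return annualized_yield(avg_daily_rate, occupancy, home_value) >= threshold

# A $150k home at $120/night and 60% occupancy:
# 120 * 365 * 0.6 = 26,280 / 150,000 ≈ 0.175 → just below the 18% bar
```

Framing profitability as a threshold on a continuous yield is also what produces the 5.5% positive rate: few zip codes clear an 18% gross return, hence the imbalance handling above.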
Before transitioning to data science, my master's thesis examined a persistent gap in music education: most college music students report feeling unprepared by their high school programs for collegiate music theory and aural skills courses, yet educators largely report teaching these subjects regularly. Was this a real discrepancy, and if so, why? To investigate, I designed and administered an IRB-approved quantitative survey to 102 public school music educators across North Carolina and Florida, analyzing responses using descriptive statistics, Spearman's rho correlation tests, and repeated measures ANOVA. Results showed no significant discrepancy between perceived importance and reported implementation, but revealed that educators' lack of confidence in their students' enjoyment of these subjects was the barrier most strongly associated with reduced teaching frequency, a finding with direct implications for curriculum design and teacher training.
Methods: Survey design, Spearman's correlation, repeated measures ANOVA with Tukey-Kramer post-hoc tests
Highlight: This project predates my data science career, but demonstrates defining a measurable research question, designing a rigorous data collection instrument, and drawing defensible conclusions from noisy, real-world data
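For the statistically curious, Spearman's rho, the workhorse of the survey analysis, is just the Pearson correlation of rank-transformed data. A minimal pure-Python sketch (tie handling via average ranks; not the thesis code, which used standard statistical software):

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# A perfectly monotone (if nonlinear) relationship yields rho = 1.0:
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```

Rank-based correlation suits ordinal Likert-scale survey responses, where distances between answer categories aren't meaningful but their ordering is.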