Real Good Research

Our research is all open source, and we are excited to share our results (and failures) with you.

Our Focus Areas

  • Investigating Computational Efficiency

    How can we save precious computational resources without sacrificing accuracy or interpretability?

    A scatter plot graph depicting data points in black, with red vertical error bars and a blue trend line.
  • Understanding High Dimensionality

    How do we model high-dimensional data (many observations and many variables) and the complex relationships within it?

    A colorful electromagnetic spectrum chart showing wavelengths from 700nm to 10,000nm, with the spectrum spanning from red on the right to violet on the left.
  • Expanding Statistical Models

    How can we learn from machine learning methodologies to update statistically validated models for today's complex problems?

    Graph of a sine wave with data points, fit curve, and confidence interval bands.

Select Current Research Projects

How can we help nonprofits plan for their future and learn from each other?

Mayleen Cortez-Rodriguez

An important part of our mission is to support nonprofit organizations with real good data science. This summer, we’ve been working on a revenue prediction tool for nonprofits. The goal? To provide a free and simple-to-use tool that helps nonprofits plan for the future. Using data from the National Center for Charitable Statistics, which offers a database with financial information on over a million nonprofits since 1989, we find other organizations from the same subsector that had similar revenue trends in the past and use Gaussian Processes to predict an organization’s future revenue.

Line chart comparing revenue projections of a company to similar organizations from 2022 to 2026, with multiple colored lines and shaded area indicating variation.
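
To make the idea concrete, here is a minimal sketch of the two steps described above: picking peers with similar past revenue trends, then fitting a Gaussian Process to forecast revenue. Everything in it is an assumption for illustration, not the tool's actual implementation: the revenue figures are made up, the similarity measure (distance between standardized trends), the pooling of peer data, the RBF kernel, and the helper names `most_similar` and `gp_predict` are all hypothetical choices.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=3.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of years."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.3):
    """Posterior mean and standard deviation of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def standardize(series):
    """Remove scale so that only the shape of the trend is compared."""
    return (series - series.mean()) / series.std()

def most_similar(target, candidates, k=2):
    """Indices of the k candidates whose standardized trend is closest to the target's."""
    z = standardize(target)
    dists = [np.linalg.norm(z - standardize(c)) for c in candidates]
    return np.argsort(dists)[:k]

# Made-up yearly revenue (in $1000s); not real nonprofit data.
years = np.arange(2015, 2023, dtype=float)
target = np.array([410., 425., 450., 470., 455., 500., 530., 560.])
candidates = np.array([
    [200., 210., 220., 236., 228., 250., 262., 280.],  # growing, like the target
    [900., 870., 860., 820., 800., 760., 750., 700.],  # declining
    [300., 312., 330., 342., 335., 366., 390., 410.],  # growing, like the target
])

# Step 1: find peers with a similar past trend.
peers = most_similar(target, candidates, k=2)

# Step 2 (assumption): pool the standardized series of the target and its
# peers, fit one GP, and forecast on the target's own scale.
x_pool = np.tile(years, len(peers) + 1)
y_pool = np.concatenate([standardize(target)] +
                        [standardize(candidates[i]) for i in peers])
future_years = np.arange(2023, 2027, dtype=float)
mean_z, std_z = gp_predict(x_pool, y_pool, future_years)

forecast_mean = mean_z * target.std() + target.mean()
forecast_std = std_z * target.std()
for year, m, s in zip(future_years, forecast_mean, forecast_std):
    print(f"{int(year)}: {m:.0f} +/- {2 * s:.0f} (in $1000s)")
```

The uncertainty band widens the further the forecast reaches past the last observed year. Note that a zero-mean GP with an RBF kernel pulls long-range forecasts back toward the historical average, so a real tool would likely need a trend-aware mean function or kernel.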

Can we measure the impact of policy on the unhoused population among older adults?

Dr. Imène Goumiri

This research presents a comprehensive, data-driven assessment of Continuum of Care (CoC) effectiveness in supporting unhoused older adults (65+) across California. The analysis leverages publicly available data, rigorous statistical modeling, and an innovative interpretation methodology to identify factors impacting the unhoused population and to inform policy and resource allocation. By integrating diverse datasets, including unhoused counts, population demographics, and geographical information, and employing both Random Forest and ANCOVA models, we aim to uncover the underlying drivers contributing to housing insecurity in this vulnerable demographic. A key innovation is the "prediction set" approach, which enables a counterfactual comparison of CoC performance by standardizing confounding variables. Our findings highlight the critical role of housing affordability and economic stability, provide a ranking of CoCs based on their effectiveness under controlled conditions, and offer concrete policy recommendations for targeted interventions and best practice dissemination. This study contributes to a more transparent and evidence-based approach to addressing California's escalating crisis of unhoused residents.

Bar chart showing California homelessness by age group from 2017 to 2024, with the highest homelessness among those under 18 and declining in older age groups.
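
One way to picture the "prediction set" idea is the sketch below. It is our reading of the description above, not the study's actual code: it fits an ANCOVA-style linear model (one effect per CoC plus continuous covariates) and then predicts every CoC's outcome at the same standardized covariate values, so that differences between CoCs are no longer mixed up with differences in local conditions. The data are simulated, and the covariate names (`rent_burden`, `unemployment`), the CoC labels, and the choice of overall means as the reference values are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_coc = 60
coc_names = ["CoC A", "CoC B", "CoC C"]
true_effect = np.array([0.0, 1.5, -1.0])  # made-up CoC effects (lower is better)
rent_center = np.array([5.0, 2.0, 8.0])   # made-up confounder level per CoC

# Simulated observations: the outcome depends on the CoC and on local conditions.
coc = np.repeat(np.arange(3), n_per_coc)
rent_burden = rng.normal(rent_center[coc], 1.0)
unemployment = rng.normal(4.0, 1.0, size=coc.size)
rate = (true_effect[coc] + 1.0 * rent_burden + 0.5 * unemployment
        + rng.normal(0.0, 0.1, size=coc.size))

def design_matrix(coc_idx, rent, unemp, n_cocs=3):
    """One indicator column per CoC plus the two continuous covariates."""
    indicators = np.eye(n_cocs)[coc_idx]
    return np.column_stack([indicators, rent, unemp])

# ANCOVA-style fit by ordinary least squares.
X = design_matrix(coc, rent_burden, unemployment)
coef, *_ = np.linalg.lstsq(X, rate, rcond=None)

# Prediction set (assumption): each CoC evaluated at the same covariate
# values, here the overall means across all observations.
ref_rent, ref_unemp = rent_burden.mean(), unemployment.mean()
pred_X = design_matrix(np.arange(3), np.full(3, ref_rent), np.full(3, ref_unemp))
adjusted = pred_X @ coef

# Compare with the naive ranking based on raw averages.
raw = np.array([rate[coc == c].mean() for c in range(3)])
raw_rank = [coc_names[i] for i in np.argsort(raw)]
adj_rank = [coc_names[i] for i in np.argsort(adjusted)]
print("Ranking by raw average:     ", raw_rank)
print("Ranking under equal conditions:", adj_rank)
```

In this simulated example the two rankings disagree: the CoC that looks best on raw averages only benefits from a low rent burden, while the CoC that looks worst performs best once conditions are held equal. With a purely additive model like this one, the adjusted ranking reduces to the fitted CoC effects; with a more flexible model such as a Random Forest, the choice of reference covariate values can matter more.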

How can we help identify when AIs are lying?

Dr. Amanda Muyskens

Because machine learning algorithms are a “black box,” the mistakes that AIs make are more believable to the people reading them. This leads to misinformation and limits the usefulness of AI to the public. We believe that an AI’s response should reflect the model’s confidence in its answer to the question it is asked. However, this requires new mathematical methodology, which is exactly what our team is developing. Our goal is to give you a heads-up when a response should be treated with more skepticism, leading to fewer hallucinations and more truth.

Watch this space for more research projects.

Dr. Mandy discusses why this problem is challenging: the curse of dimensionality.