Real Good Research

Our research is all open source, and we are excited to share our results (and failures) with you.

Our Focus Areas

  • Investigating Computational Efficiency

    How can we save precious computational resources without sacrificing accuracy or interpretability?

  • Understanding High Dimensionality

    How do we model high-dimensional data (many observations and many variables) and the complex relationships within it?

  • Expanding Statistical Models

    How can we learn from machine learning methodologies to update statistically validated models for today's complex problems?

Select Current Research Projects

How can we help nonprofits plan for their future and learn from each other?

Mayleen Cortez-Rodriguez

An important part of our mission is to support nonprofit organizations with real good data science. This summer, we’ve been working on a revenue prediction tool for nonprofits. The goal? To provide a free and simple-to-use tool that helps nonprofits plan for the future. Using data from the National Center for Charitable Statistics, which offers a database with financial information on over a million nonprofits since 1989, we find other organizations from the same subsector that had similar revenue trends in the past and use Gaussian processes to predict an organization’s future revenue. Check out this sneak preview of what our tool can do, and stay tuned: we’ll be launching in the next couple of weeks!
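
For readers curious about the mechanics, here is a minimal sketch of Gaussian process regression on a single revenue series. It is not the code behind our tool: the squared-exponential kernel, the hyperparameters, the helper names, and the revenue values are all assumptions chosen for illustration, and the step of finding similar organizations is left out.

```python
import math

def rbf(a, b, length=2.0, var=1.0):
    """Squared-exponential kernel between two scalar time points (assumed kernel)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def gp_predict(xs, ys, x_new, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    n = len(xs)
    # Kernel matrix of the observed points, with observation noise on the diagonal.
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    k_star = [rbf(x, x_new) for x in xs]
    z = solve_lower(L, ys)        # L^-1 y
    v = solve_lower(L, k_star)    # L^-1 k*
    mean = sum(vi * zi for vi, zi in zip(v, z))          # k*^T K^-1 y
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)   # k** - k*^T K^-1 k*
    return mean, var

# Made-up example: years since first filing and standardized revenue.
years = [0.0, 1.0, 2.0, 3.0, 4.0]
revenue = [-1.0, -0.5, 0.1, 0.4, 0.9]
mean, var = gp_predict(years, revenue, x_new=5.0)
print(f"predicted standardized revenue: {mean:.2f} (variance {var:.2f})")
```

The variance is the useful part for planning: it grows as the forecast moves further from the observed years, so the prediction comes with an honest measure of uncertainty.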

Can we identify how policy impacts homelessness in older adults?

Dr. Imène Goumiri

This research presents a comprehensive, data-driven assessment of Continuum of Care (CoC) effectiveness in combating homelessness among older adults (65+) across California. The analysis leverages publicly available data, rigorous statistical modeling, and an innovative interpretation methodology to identify factors influencing homelessness and to provide actionable insights for policy and resource allocation. By integrating diverse datasets including homelessness counts, population demographics, and geographical information, and employing both Random Forest and ANCOVA models, we aim to uncover the underlying drivers of homelessness in this vulnerable demographic. A key innovation is the "prediction set" approach, which enables a counterfactual comparison of CoC performance by standardizing confounding variables. Our findings highlight the critical role of housing affordability and economic stability, provide a ranking of CoCs based on their effectiveness under controlled conditions, and offer concrete policy recommendations for targeted interventions and best practice dissemination. This study contributes to a more transparent and evidence-based approach to addressing California's escalating homelessness crisis.
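
To make the idea of comparing CoCs under standardized conditions concrete, here is a minimal sketch based on classical ANCOVA adjusted means with a single covariate. It is one plausible reading of the general idea, not the study's actual "prediction set" implementation; the function name, the CoC labels, the covariate, and all numbers are hypothetical.

```python
from statistics import mean

def ancova_adjusted_means(groups, x_ref=None):
    """Predict each group's outcome at one shared covariate value.

    groups: dict mapping a group name to a list of (covariate, outcome) pairs.
    x_ref:  covariate value at which all groups are compared; defaults to
            the grand mean of the covariate.
    """
    sxy = sxx = 0.0
    centers = {}
    for name, rows in groups.items():
        xbar = mean(x for x, _ in rows)
        ybar = mean(y for _, y in rows)
        centers[name] = (xbar, ybar)
        # Accumulate within-group sums for the pooled (common) slope.
        sxy += sum((x - xbar) * (y - ybar) for x, y in rows)
        sxx += sum((x - xbar) ** 2 for x, _ in rows)
    slope = sxy / sxx
    if x_ref is None:
        x_ref = mean(x for rows in groups.values() for x, _ in rows)
    # Shift every group mean along the common slope to the reference value.
    return {name: ybar + slope * (x_ref - xbar)
            for name, (xbar, ybar) in centers.items()}

# Hypothetical data: (housing cost index, homelessness rate) per county-year.
data = {
    "coc_a": [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],     # low-cost region
    "coc_b": [(4.0, 11.0), (5.0, 13.0), (6.0, 15.0)],  # high-cost region
}
adjusted = ancova_adjusted_means(data)
print(adjusted)  # adjusted values are 8.0 for coc_a and 10.0 for coc_b
```

In this toy data the raw group means differ by 8 (5 versus 13), but once both CoCs are evaluated at the same housing cost the gap shrinks to 2. Most of the raw difference came from the confounder, which is the kind of distortion a standardized comparison is meant to remove.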

How can we help identify when AIs are lying?

Dr. Amanda Muyskens

Because machine learning algorithms are a “black box,” the mistakes that AIs make are more believable to the humans reading them. This leads to misinformation and limits the usefulness of AI to the public. We believe that an AI’s response should reflect the model’s confidence in its answer to the question it is asked. However, this requires new mathematical methodology, which is exactly what our team is developing. Our goal is to give you a heads-up when a response should be treated more skeptically, leading to fewer hallucinations and more truth.

Watch this space for more research projects.