Greg Tozzi

Data Scientist | Retired Senior Military Officer | Active Clearance

About Me

I'm a retired senior military officer (Captain, U.S. Coast Guard) with an active security clearance and a passion for the practice and promise of data science. To pursue that passion, I recently completed the UC Berkeley School of Information's Master of Information and Data Science program. I believe data science must be focused on the needs of decision-makers. Dedicated and driven, I bring a proven track record of delivering results under extremely challenging circumstances, whether working individually, as a collaborator, or as a leader. If you're looking for someone who turns intent into action and who brings a rare mix of sharp data science skills and deep expertise in leadership, operations, policy, and strategy, let's connect.

READ MY RESUME

What I Do

Statistical Programming and ML. Python with scikit-learn, R with caret, and a bit of Julia to keep things interesting.

Data Visualization. Building clean and interactive web apps using D3, Vega-Lite/Altair, and Shiny.

Deep Learning. Building models in TensorFlow's sequential and functional APIs, custom loss functions, training in the cloud, and deploying to the edge.

NLP. Classification using deep learning. Transformer models with Hugging Face. Word2vec models with Gensim.

Causal Inference. Designing experiments, A/B tests, randomization inference.

Tying Data Science to Decision-Making. Explaining implications to executive leaders and driving action.

Education, Fellowships, and Certifications

Degrees

UC Berkeley School of Information

Master of Information and Data Science

May 2021

UC Berkeley Haas School of Business

Master of Business Administration

December 2008

Massachusetts Institute of Technology

Master of Science, Naval Architecture and Marine Engineering

May 2004

READ MY THESIS

Massachusetts Institute of Technology

Master of Science, Mechanical Engineering

May 2004

U.S. Coast Guard Academy

Bachelor of Science, Naval Architecture and Marine Engineering

Fellowships

Center for a New American Security

MIT Seminar XXI

Certifications

Databricks Certified Data Engineer Associate

December 2022

Professional Scrum Product Owner I

April 2022

KNIME L1

January 2022

Certified TensorFlow Developer

January 2021

Published Work

Getting it Righter, Faster | CNAS | August 2020

In a report published by the Center for a New American Security, a leading Washington, DC-based security policy think tank, Dr. Kathryn McNabb Cochran and I argue that effective prediction is the cornerstone of agile decision-making. We survey predictive methodologies available to policymakers and present the results of a consulting engagement conducted with Good Judgment, Inc. and the U.S. Department of State. We identify key political and structural impediments to acting on model outputs and provide targeted recommendations to overcome them. Our work was informed by interviews with current and former cabinet secretaries, leaders and program managers at IARPA and In-Q-Tel, and members of the academic community.

READ THE REPORT

Projects

Rule5.ai - Probabilistic Prediction of Vessel Trajectories

My team and I trained a novel deep learning model on automatic identification system (AIS) data. The model provides tactically meaningful probabilistic predictions of the trajectories of vessels in San Francisco Bay over the course of 30 minutes using six minutes of input data. Our approach stands apart from recently published deterministic models.

TensorFlow | AWS Athena | H3 | D3 | Leaflet
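The core idea can be sketched simply: instead of regressing a single future position, score discrete spatial cells and turn the scores into a probability distribution. This toy uses a rectangular grid as a stand-in for the H3 hexagonal index, and the logits are invented for illustration; nothing here comes from the actual model.

```python
import math

# Toy stand-in for Uber's H3 index: bucket positions into a rectangular
# grid. The cell size and the example logits below are illustrative only.
def to_cell(lat, lon, res=0.01):
    """Map a lat/lon position to a discrete grid cell."""
    return (math.floor(lat / res), math.floor(lon / res))

def cell_probabilities(logits):
    """Softmax over candidate cells: each future cell gets a probability."""
    m = max(logits.values())
    exps = {cell: math.exp(v - m) for cell, v in logits.items()}
    z = sum(exps.values())
    return {cell: e / z for cell, e in exps.items()}

# Hypothetical model scores for three candidate cells 30 minutes ahead
logits = {
    to_cell(37.80, -122.40): 2.0,
    to_cell(37.81, -122.40): 1.0,
    to_cell(37.80, -122.41): 0.5,
}
probs = cell_probabilities(logits)
```

A distribution over cells, rather than a point estimate, is what makes the prediction tactically useful: a watchstander can see not just the most likely track but how much probability mass sits elsewhere.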

VISIT Rule5.ai

Visualizing San Francisco Bay Vessel Traffic

My team and I built a visualization of vessel traffic in San Francisco Bay using a large set of transponder data collected by the Coast Guard and provided by NOAA. Our aim was to provide professional mariners with a tool to understand vessel traffic patterns. To that end, we conducted nearly a dozen user tests with professional mariners and prioritized features using the MoSCoW framework.

Python | Vega-Lite | Altair | GeoPandas | Mapshaper | HTML/CSS | JavaScript

TRY THE APP

VIEW THE REPO

Toward Automated Celestial Navigation with Deep Learning

A colleague and I set out to demonstrate that automated celestial navigation could be carried out at the edge. We built an unconventional deep regression model in TensorFlow using the functional API, trained it in the cloud with a custom loss function on synthetic images we created using open-source astronomy software, deployed the model to an NVIDIA Jetson TX2, and showed promising results against synthetic test images.

Python | TensorFlow | Keras | ktrain | Bash | Docker | Edge

VIEW THE REPO

Detecting Evidence of Gender Discrimination in Fijian Court Documents

A colleague and I built models to detect evidence of gender discrimination in court documents hand-labeled by domain experts. We explored the challenges that arise from large variation in document sizes and content. Our modeling effort spanned word2vec, CNNs, and cutting-edge BERT-based transformer models.

Python | NLP | TensorFlow | ktrain | Gensim | Hugging Face | Explainable AI

READ THE REPORT

VIEW THE REPO

A/B Testing for Email Fundraising

My team and I consulted with a non-profit to optimize their end-of-year email fundraising drive. We designed an experiment to test choices of "from" and subject lines, performed power calculations, provided random assignments to the client, built an analysis pipeline using the Mailchimp API, and conducted statistical tests on the results. We even managed to find a statistically significant result.

R | data.table | knitr | Causal Inference | Regression Analysis | Robust Statistics
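The power calculation step can be illustrated with a standard normal-approximation formula for a two-proportion test; the rates and thresholds below are invented for illustration, not the client's numbers, and the project itself did this work in R.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p0, p1, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for a two-proportion
    test, e.g. a baseline open rate p0 vs. a hoped-for open rate p1."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_b = NormalDist().inv_cdf(power)          # quantile for desired power
    p_bar = (p0 + p1) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(num / (p1 - p0) ** 2)
```

Running the calculation before launch tells the client whether their email list is even large enough to detect the effect they hope for; a smaller expected lift demands a much larger list.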

READ THE REPORT

VIEW THE REPO

Detecting Pneumonia in Pediatric Chest X-Rays with Deep Learning

I applied a CNN to detect pneumonia in pediatric chest X-rays, achieving 92% accuracy. I provided visualizations of intermediate activations to give a sense of what happens in the convolutional layers.

Python | TensorFlow | Keras | NumPy
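What those intermediate activations show can be seen in miniature with a hand-rolled convolution; this toy (not the project's Keras code) runs a vertical-edge filter over a synthetic image, and the activation map lights up only along the intensity boundary:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution (really cross-correlation, as in most
    deep learning frameworks). Each output pixel is a filter activation."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A dark-left / bright-right image and a vertical-edge filter: the
# activation is zero everywhere except over the edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0],
                   [-1.0, 1.0]])
activation = conv2d(image, kernel)
```

Plotting such activation maps layer by layer is exactly how the visualizations reveal early layers responding to edges and textures while deeper layers respond to larger structures.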

VIEW THE REPO

Crime in North Carolina

I conducted a classic econometric analysis of crime data from North Carolina. I developed and compared regression models, considering the Gauss-Markov assumptions in each case and settling on a parsimonious model.

R | knitr | Regression Analysis | ggplot2 | Robust Statistics
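The robust-statistics angle can be sketched in a few lines; the analysis itself was done in R, and the toy data below is noiseless and invented, but the mechanics are the same: fit OLS via the normal equations, then compute White's HC0 standard errors for when the homoskedasticity condition of the Gauss-Markov theorem is in doubt.

```python
import numpy as np

def ols_hc0(X, y):
    """OLS coefficients plus White's HC0 heteroskedasticity-robust
    standard errors (the 'sandwich' estimator)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)  # sandwich filling
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Noiseless toy data: y = 1 + 2x, so the fit is exact and SEs collapse.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
beta, se = ols_hc0(X, y)
```

With real, heteroskedastic crime data, the robust standard errors diverge from the classical ones, which is precisely the signal that the homoskedasticity assumption deserves scrutiny.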

READ THE REPORT

VIEW THE REPO

Modeling Probabilistic Forecasting

As a term paper for George Mason University's Introduction to Computational Social Science, I built a model of probabilistic forecasting based on information learned from human interaction. I wrote the paper in Shiny so that readers can interact with the model.

R | Shiny | Bokeh | Computational Social Science

READ THE INTERACTIVE REPORT

VIEW THE REPO

Iris Classifier Shiny App

This was my first data product, which I produced in 2015. I built a Shiny application to demonstrate classification using random forests and decision trees as part of Johns Hopkins University's Data Science Specialization on Coursera.

R | Shiny | highcharter | Decision Trees | Random Forests

TRY THE APP

VIEW THE REPO

Contact