Ringworm Scanner

Instantly check photos of skin for signs of ringworm using on-device AI.
(No images ever leave your phone.)

Download on the App Store


Methodology

Ringworm is highly contagious and spreads easily from person to person. This makes it important to identify early, especially for people who take part in sports and other activities where close contact or shared equipment is common, such as Brazilian jiu-jitsu, wrestling, boxing and yoga.

Ringworm lesions, typically circular or oval-shaped, can be easily mistaken for other skin conditions. This can lead to delayed treatment and the spread of the infection to others.

I built a training dataset from public sources and used it to train a vision-transformer model. When looking for publicly available datasets, I found that most of those containing ringworm images covered many variations of ringworm but had few examples of the circular lesions (tinea corporis) that are more commonly seen and confused with other conditions. I also added more data for eczema, a condition that is commonly mistaken for ringworm.

The dataset contains 2160 training samples and 543 validation samples. The classes are:

The evaluation performance for ringworm is as follows:

Validation performance for ringworm

                  Predicted Negative   Predicted Positive
Actual Negative                  520                    0
Actual Positive                    2                   21

Metrics:

In its current state the model performs well on the validation set for ringworm, with no false positives and 2 missed cases out of 23, and it should improve with more data. I plan to add more data to the training set and more classes to the model to improve performance.

Comparison to GPT-4o

I ran an experiment with ChatGPT, specifically the GPT-4o model, since that is what people will typically have access to via the app. I wanted to compare the performance of a bespoke model against GPT (which is amazing at many things and only getting better). The results also raise questions about the efficacy of using GPT for medical diagnosis.

I asked the model to act as a medical imaging expert and to output a binary True or False for each skin condition in my test set (enforced using a Pydantic output class). I requested that it output True if it was more likely than not that the condition is present. This was to mimic the 50% probability threshold used when evaluating the fine-tuned vision model.
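As a rough illustration of what enforcing a binary answer with a Pydantic class can look like, here is a minimal sketch. It assumes Pydantic v2, and the class name, field names and helper function are hypothetical: the actual output class and prompt are not shown here, and only ringworm and eczema are named as conditions in the text. The call to GPT-4o itself is left out; the sketch only covers validating a reply.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class SkinConditionFlags(BaseModel):
    """Hypothetical schema: one boolean per skin condition."""

    # True means "more likely than not that the condition is present".
    ringworm: bool
    eczema: bool


def parse_reply(raw_json: str) -> Optional[SkinConditionFlags]:
    """Return validated flags, or None if the reply does not fit the schema."""
    try:
        return SkinConditionFlags.model_validate_json(raw_json)
    except ValidationError:
        return None


# A well-formed reply is accepted.
ok = parse_reply('{"ringworm": true, "eczema": false}')

# A hedged, non-boolean answer is rejected rather than silently accepted.
bad = parse_reply('{"ringworm": "possibly", "eczema": false}')

print(ok)
print(bad)
```

Forcing every answer into a boolean is what makes the GPT-4o output directly comparable with a classifier evaluated at a fixed threshold.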

                  Predicted Negative   Predicted Positive
Actual Negative                  502                   18
Actual Positive                    6                   17

Metrics:

The results show that while GPT-4o demonstrates reasonable recall at 73.9%, the precision is significantly lower at 48.6% compared to our bespoke model’s 100% precision. This means GPT-4o generates many false positives – incorrectly identifying non-ringworm conditions as ringworm in 18 cases. This highlights the value of purpose-built models for medical applications where precision is critical.
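The precision and recall figures quoted here follow directly from the two confusion matrices. The sketch below recomputes them from the four counts in each table; the class and variable names are only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryConfusion:
    """Counts for one class (here: ringworm) on the 543-image validation set."""

    tn: int  # actual negative, predicted negative
    fp: int  # actual negative, predicted positive
    fn: int  # actual positive, predicted negative
    tp: int  # actual positive, predicted positive

    @property
    def precision(self) -> float:
        # Of everything flagged as ringworm, the share that really was ringworm.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Of all real ringworm cases, the share that was flagged.
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tn + self.fp + self.fn + self.tp)


# Counts copied from the two tables above.
bespoke = BinaryConfusion(tn=520, fp=0, fn=2, tp=21)
gpt4o = BinaryConfusion(tn=502, fp=18, fn=6, tp=17)

for name, cm in [("bespoke", bespoke), ("GPT-4o", gpt4o)]:
    print(
        f"{name}: precision={cm.precision:.1%} recall={cm.recall:.1%} "
        f"f1={cm.f1:.1%} accuracy={cm.accuracy:.1%}"
    )
# bespoke: precision=100.0% recall=91.3% f1=95.5% accuracy=99.6%
# GPT-4o: precision=48.6% recall=73.9% f1=58.6% accuracy=95.6%
```

Because 520 of the 543 validation images are negatives, accuracy stays above 95% for both models. The difference shows up in precision and recall, which is why those are the figures worth comparing.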


Why on-device?


Privacy Policy