Scoring Overview

Purpose

The Image2Biomass competition scores every submission with a single metric. You estimate how much plant matter (biomass) is in a pasture image, and the organizers measure how close those estimates are to the real values. The metric they use is called a weighted R².

Weighted R²

This is the main math formula used to score our predictions. Don’t worry if it looks intimidating. We’ll break it down step by step:

R_{w}^{2} = 1 - \frac{\sum_{j} w_{j} (y_{j} - {\hat{y}}_{j})^{2}}{\sum_{j} w_{j} (y_{j} - {\bar{y}}_{w})^{2}}

In simple terms:

y_j: the actual (true) value of biomass.
ŷ_j: the value you predicted.
w_j: the weight of this row, which says how much it counts toward the score (rows for more important components get higher weights).
ȳ_w: the weighted average of all actual values, so higher-weight rows pull it harder.

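The score tells us what fraction of the weighted variation in the true values is explained by our predictions. A perfect score is 1.0. A score of 0 means the predictions do no better than always guessing the weighted mean ȳ_w, and the score goes negative if they do worse than that.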
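To make the formula concrete, here is a minimal sketch in plain Python. The function name `weighted_r2` and the sample numbers are made up for illustration; this is not the organizers' scoring code, only a direct transcription of the formula above.

```python
from typing import Sequence


def weighted_r2(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted R^2 = 1 - SS_res / SS_tot (illustrative helper, not official code)."""
    # Weighted mean of the true values.
    y_bar_w = sum(w * y for w, y in zip(weights, y_true)) / sum(weights)
    # Weighted squared error of the predictions.
    ss_res = sum(w * (y - p) ** 2 for w, y, p in zip(weights, y_true, y_pred))
    # Weighted spread of the true values around their weighted mean.
    # Assumes the true values are not all identical, otherwise this is 0.
    ss_tot = sum(w * (y - y_bar_w) ** 2 for w, y in zip(weights, y_true))
    return 1.0 - ss_res / ss_tot


# Made-up example rows.
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 33.0]
weights = [0.5, 0.25, 0.25]

perfect = weighted_r2(y_true, y_true, weights)  # predictions equal the truth
score = weighted_r2(y_true, y_pred, weights)    # close, but not perfect
print(perfect, score)
```

With these numbers, the perfect predictions score 1.0 and the slightly-off predictions score a little below it.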

Weighted Mean

This is how we calculate the weighted average of all the true biomass values:

{\bar{y}}_{w} = \frac{\sum_{j} w_{j} y_{j}}{\sum_{j} w_{j}}

This gives more influence to rows that are considered more important (have a higher weight).
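A tiny worked example may help here. The values and weights below are invented, and `weighted_mean` is just an illustrative helper name.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


values = [10.0, 20.0]
weights = [1.0, 3.0]  # the second row counts three times as much

plain_mean = sum(values) / len(values)    # (10 + 20) / 2 = 15.0
y_bar_w = weighted_mean(values, weights)  # (1*10 + 3*20) / 4 = 17.5
print(plain_mean, y_bar_w)
```

The weighted mean lands closer to 20 than the plain mean does, because the row with value 20 carries the larger weight.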

Residual Sum of Squares

SS_{res} = \sum_{j} w_{j} (y_{j} - {\hat{y}}_{j})^{2}

This adds up all the squared differences between our predictions and the truth, multiplied by how important each one is. The smaller this value, the better.
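As a quick sketch with invented numbers, the weighted residual sum of squares can be computed like this (`ss_residual` is an illustrative name, not part of any competition code):

```python
def ss_residual(y_true, y_pred, weights):
    """Weighted sum of squared prediction errors."""
    return sum(w * (y - p) ** 2 for w, y, p in zip(weights, y_true, y_pred))


y_true = [10.0, 20.0]
y_pred = [12.0, 17.0]
weights = [1.0, 3.0]

# Row 1: 1 * (10 - 12)^2 = 4
# Row 2: 3 * (20 - 17)^2 = 27
ss_res = ss_residual(y_true, y_pred, weights)
print(ss_res)  # 31.0
```

Most of the total comes from the second row: its error is only slightly larger, but its weight multiplies the squared error by three.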

Total Sum of Squares

SS_{tot} = \sum_{j} w_{j} (y_{j} - {\bar{y}}_{w})^{2}

This measures how spread out the true values are, again weighted by importance. It is the weighted squared error you would get by always predicting the weighted mean, so it sets the baseline for comparison.
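Here is a small sketch of the same calculation with invented numbers. The helper name `ss_total` is illustrative only.

```python
def ss_total(y_true, weights):
    """Weighted squared spread of the true values around their weighted mean."""
    y_bar_w = sum(w * y for w, y in zip(weights, y_true)) / sum(weights)
    return sum(w * (y - y_bar_w) ** 2 for w, y in zip(weights, y_true))


y_true = [10.0, 20.0]
weights = [1.0, 3.0]

# Weighted mean is 17.5, so:
# Row 1: 1 * (10 - 17.5)^2 = 56.25
# Row 2: 3 * (20 - 17.5)^2 = 18.75
ss_tot = ss_total(y_true, weights)
print(ss_tot)  # 75.0

# If every true value is the same there is no spread at all.
flat = ss_total([5.0, 5.0], weights)
print(flat)  # 0.0
```

The second case is worth noticing: when the true values have no spread, the denominator of the R² formula is zero and the score is undefined.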

Component Weights

The scoring system gives more weight to some components. This pie chart shows how each one contributes to the final score:
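As a rough sketch of how per-component weights could turn into the per-row w_j used in the formulas, consider the snippet below. The component names and weight values are placeholders invented for this example; they are not the competition's actual components or weights.

```python
# Hypothetical component weights (placeholder names and values).
COMPONENT_WEIGHTS = {"component_a": 0.6, "component_b": 0.4}

# Each scored row belongs to one component (made-up rows).
row_components = ["component_a", "component_b", "component_a"]

# Look up the weight w_j for every row.
row_weights = [COMPONENT_WEIGHTS[name] for name in row_components]
print(row_weights)

# Share of the total weight per component, which is what a pie chart displays.
total = sum(COMPONENT_WEIGHTS.values())
shares = {name: w / total for name, w in COMPONENT_WEIGHTS.items()}
print(shares)
```

Every row of the same component gets the same weight, so an error on a high-weight component costs more than the same error on a low-weight one.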

What Do These Biomass Values Look Like?

This bar chart shows the average amount of each biomass type in the training data. It helps you understand what is “typical.”

Fit vs. Mean?

This chart compares our model’s weighted R² score to a naive baseline that always predicts the mean. It shows that our model explains much more of the variation in the data than the baseline does.
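To see why the mean is a natural reference point, here is a minimal sketch with invented numbers. It assumes the baseline predicts the weighted mean of the same true values it is scored on; `weighted_r2` is an illustrative helper, not the official scorer.

```python
def weighted_r2(y_true, y_pred, weights):
    """Weighted R^2 = 1 - SS_res / SS_tot (illustrative helper)."""
    y_bar_w = sum(w * y for w, y in zip(weights, y_true)) / sum(weights)
    ss_res = sum(w * (y - p) ** 2 for w, y, p in zip(weights, y_true, y_pred))
    ss_tot = sum(w * (y - y_bar_w) ** 2 for w, y in zip(weights, y_true))
    return 1.0 - ss_res / ss_tot


y_true = [10.0, 20.0, 30.0]
weights = [0.5, 0.25, 0.25]

# Baseline: predict the weighted mean (17.5 here) for every row.
y_bar_w = sum(w * y for w, y in zip(weights, y_true)) / sum(weights)
baseline_score = weighted_r2(y_true, [y_bar_w] * len(y_true), weights)

# A predictor that is worse than the mean: always guess 0.
bad_score = weighted_r2(y_true, [0.0] * len(y_true), weights)

print(baseline_score, bad_score)
```

Under this assumption the baseline scores 0, because its SS_res equals SS_tot, and the always-zero predictor scores below 0. Any model score above 0 is therefore an improvement over guessing the mean.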