| Train rows | Unique train images | States | Months | Missingness | Best CV strategy |
|---|---|---|---|---|---|
| 1785 | 357 | NSW, Tas, Vic, WA | 10 | 0 (no missing values) | spatial_statewise |
NaN or dash entries in tables mark statistics that do not apply (e.g., the mean of a string or categorical column); they do not indicate missing data.
Dataset shape: 1785 rows × 9 columns (0.677 MB). Full descriptive statistics are saved as metadata_summary.html.
No missing values were detected in this dataset.
Histograms showing biomass target distributions (grams).
Heatmap of pairwise correlations between numeric biomass targets.
Distribution of Dry_Total_g per species.
Average target over time, showing seasonal pattern if present.
Temporal sampling analysis: global interval distribution, per-state timelines, and sampling density heatmap.
Image metadata summary: resolution distribution, file sizes, aspect ratios, and RGB channel statistics. Full metadata saved to image_metadata.csv.
Data integrity analysis: duplicate sample IDs, missing image files, invalid target names, and train/test image count consistency.
| train_rows | test_rows | unique_train_images | unique_test_images | duplicate_sample_ids | missing_image_files | invalid_target_rows |
|---|---|---|---|---|---|---|
| 1785 | 5 | 357 | 1 | 0 | 0 | 0 |
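
The duplicate-ID and invalid-target counts in this table could be reproduced along the following lines. This is a minimal sketch, not the report's own implementation: it assumes each `sample_id` has the form `<image_id>__<target_name>` (as in the sample rows shown in this report) and that the valid target names are the five listed in the autocorrelation table. The function name `integrity_report` is hypothetical, and the missing-image-file check is left out because it needs access to the image directory.

```python
from collections import Counter

# Target names as they appear elsewhere in this report.
VALID_TARGETS = {"Dry_Green_g", "Dry_Dead_g", "Dry_Clover_g", "GDM_g", "Dry_Total_g"}

def integrity_report(sample_ids):
    """Count duplicate sample IDs, invalid target names, and unique images.

    Assumes each sample_id has the form '<image_id>__<target_name>'.
    """
    counts = Counter(sample_ids)
    # Every repeat beyond the first occurrence counts as one duplicate.
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    images, invalid = set(), 0
    for sid in sample_ids:
        image_id, sep, target = sid.partition("__")
        images.add(image_id)
        if not sep or target not in VALID_TARGETS:
            invalid += 1
    return {
        "rows": len(sample_ids),
        "unique_images": len(images),
        "duplicate_sample_ids": duplicates,
        "invalid_target_rows": invalid,
    }

# Toy input: one image with all five targets.
ids = [f"ID1070112260__{t}" for t in sorted(VALID_TARGETS)]
report = integrity_report(ids)
print(report)
```

With five target rows per image, 357 unique images account for the 1785 training rows reported above.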
State Distribution
Counts per state.
| state | count | pct |
|---|---|---|
| NSW | 75 | 0.210084 |
| Tas | 138 | 0.386555 |
| Vic | 112 | 0.313725 |
| WA | 32 | 0.089636 |
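
Note that `pct` in this table is a fraction of the 357 unique training images, not a percentage. A minimal sketch of how such a table can be derived, assuming one state label per unique image (the helper name `distribution` is illustrative only):

```python
from collections import Counter

def distribution(labels):
    """Return (label, count, pct) rows sorted by label; pct is a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(k, n, n / total) for k, n in sorted(counts.items())]

# One state label per unique training image, using the counts from the table above.
labels = ["NSW"] * 75 + ["Tas"] * 138 + ["Vic"] * 112 + ["WA"] * 32
rows = distribution(labels)
for state, count, pct in rows:
    print(f"{state}: {count} ({pct:.1%})")
```

The same helper applies unchanged to the month counts in the temporal distribution.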
Temporal Distribution
Sampling by month.
| month | count | pct |
|---|---|---|
| 1 | 17 | 0.047619 |
| 2 | 24 | 0.067227 |
| 4 | 10 | 0.028011 |
| 5 | 42 | 0.117647 |
| 6 | 53 | 0.148459 |
| 7 | 41 | 0.114846 |
| 8 | 37 | 0.103641 |
| 9 | 67 | 0.187675 |
| 10 | 29 | 0.081232 |
| 11 | 37 | 0.103641 |
Spatiotemporal Matrix
State × Month sparsity.
| State | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NSW | 17 | 24 | 0 | 10 | 13 | 0 | 0 | 0 | 0 | 11 | 0 | 0 |
| Tas | 0 | 0 | 0 | 0 | 29 | 28 | 10 | 0 | 34 | 0 | 37 | 0 |
| Vic | 0 | 0 | 0 | 0 | 0 | 25 | 19 | 29 | 21 | 18 | 0 | 0 |
| WA | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 8 | 12 | 0 | 0 | 0 |
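
The sparsity figures reported later for the spatiotemporal bins (30 empty bins out of 48, ratio 0.625) follow directly from this matrix. A short sketch, with the values copied from the table above and a hypothetical helper name `sparsity`:

```python
# Rows are states, columns are months 1..12; values copied from the matrix above.
matrix = {
    "NSW": [17, 24, 0, 10, 13, 0, 0, 0, 0, 11, 0, 0],
    "Tas": [0, 0, 0, 0, 29, 28, 10, 0, 34, 0, 37, 0],
    "Vic": [0, 0, 0, 0, 0, 25, 19, 29, 21, 18, 0, 0],
    "WA":  [0, 0, 0, 0, 0, 0, 12, 8, 12, 0, 0, 0],
}

def sparsity(matrix):
    """Return (empty_bins, total_bins, ratio) for a state-by-month count matrix."""
    cells = [n for row in matrix.values() for n in row]
    empty = sum(1 for n in cells if n == 0)
    return empty, len(cells), empty / len(cells)

empty_bins, total_bins, ratio = sparsity(matrix)
print(empty_bins, total_bins, ratio)
```

Only 18 of the 48 state-month combinations contain any images, which is why binning by state and month leaves most folds empty.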
Target Distributions by State & Month
Target spread across State and Month.
Leakage Diagnostics
Temporal proximity & clusters.
| sample_id | image_path | Sampling_Date | State | Species | Pre_GSHH_NDVI | Height_Ave_cm | target_name | target | image_id | date_ordinal | prev_date | delta_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ID1070112260__Dry_Clover_g | train/ID1070112260.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.60 | 6.0 | Dry_Clover_g | 0.0000 | ID1070112260 | 735613 | — | — |
| ID1275072698__Dry_Clover_g | train/ID1275072698.jpg | 2015-01-15 | NSW | Lucerne | 0.74 | 42.0 | Dry_Clover_g | 0.0000 | ID1275072698 | 735613 | 2015-01-15 | 0.0 |
| ID1314135397__Dry_Clover_g | train/ID1314135397.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.75 | 6.0 | Dry_Clover_g | 0.0000 | ID1314135397 | 735613 | 2015-01-15 | 0.0 |
| ID1357758282__Dry_Clover_g | train/ID1357758282.jpg | 2015-01-15 | NSW | Lucerne | 0.77 | 62.0 | Dry_Clover_g | 0.0000 | ID1357758282 | 735613 | 2015-01-15 | 0.0 |
| ID1472525822__Dry_Clover_g | train/ID1472525822.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.70 | 7.0 | Dry_Clover_g | 0.0000 | ID1472525822 | 735613 | 2015-01-15 | 0.0 |
| ID1473228876__Dry_Clover_g | train/ID1473228876.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.72 | 8.0 | Dry_Clover_g | 0.0000 | ID1473228876 | 735613 | 2015-01-15 | 0.0 |
| ID147528735__Dry_Clover_g | train/ID147528735.jpg | 2015-01-15 | NSW | Lucerne | 0.74 | 52.0 | Dry_Clover_g | 0.0000 | ID147528735 | 735613 | 2015-01-15 | 0.0 |
| ID1573329652__Dry_Clover_g | train/ID1573329652.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.61 | 6.0 | Dry_Clover_g | 0.0000 | ID1573329652 | 735613 | 2015-01-15 | 0.0 |
| ID1624268863__Dry_Clover_g | train/ID1624268863.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.56 | 4.0 | Dry_Clover_g | 0.0000 | ID1624268863 | 735613 | 2015-01-15 | 0.0 |
| ID1859251563__Dry_Clover_g | train/ID1859251563.jpg | 2015-01-15 | NSW | Lucerne | 0.83 | 70.0 | Dry_Clover_g | 0.0000 | ID1859251563 | 735613 | 2015-01-15 | 0.0 |
| ID1948354837__Dry_Clover_g | train/ID1948354837.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.63 | 5.0 | Dry_Clover_g | 0.0000 | ID1948354837 | 735613 | 2015-01-15 | 0.0 |
| ID554314721__Dry_Clover_g | train/ID554314721.jpg | 2015-01-15 | NSW | Lucerne | 0.69 | 49.0 | Dry_Clover_g | 0.0000 | ID554314721 | 735613 | 2015-01-15 | 0.0 |
| ID576621307__Dry_Clover_g | train/ID576621307.jpg | 2015-01-15 | NSW | Lucerne | 0.73 | 63.0 | Dry_Clover_g | 0.0000 | ID576621307 | 735613 | 2015-01-15 | 0.0 |
| ID663006174__Dry_Clover_g | train/ID663006174.jpg | 2015-01-15 | NSW | Lucerne | 0.66 | 49.0 | Dry_Clover_g | 0.0000 | ID663006174 | 735613 | 2015-01-15 | 0.0 |
| ID670276799__Dry_Clover_g | train/ID670276799.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.66 | 6.0 | Dry_Clover_g | 0.0000 | ID670276799 | 735613 | 2015-01-15 | 0.0 |
| ID710341728__Dry_Clover_g | train/ID710341728.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.84 | 11.0 | Dry_Clover_g | 0.0000 | ID710341728 | 735613 | 2015-01-15 | 0.0 |
| ID871463897__Dry_Clover_g | train/ID871463897.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.79 | 12.0 | Dry_Clover_g | 10.0981 | ID871463897 | 735613 | 2015-01-15 | 0.0 |
| ID1103883611__Dry_Clover_g | train/ID1103883611.jpg | 2015-02-24 | NSW | Phalaris | 0.48 | 26.0 | Dry_Clover_g | 0.0000 | ID1103883611 | 735653 | 2015-01-15 | 40.0 |
| ID1121692672__Dry_Clover_g | train/ID1121692672.jpg | 2015-02-24 | NSW | Phalaris | 0.52 | 38.0 | Dry_Clover_g | 0.0000 | ID1121692672 | 735653 | 2015-02-24 | 0.0 |
| ID1211362607__Dry_Clover_g | train/ID1211362607.jpg | 2015-02-24 | NSW | Ryegrass | 0.58 | 7.0 | Dry_Clover_g | 0.0000 | ID1211362607 | 735653 | 2015-02-24 | 0.0 |
Showing first 20 rows only.
| sample_id | image_path | Sampling_Date | State | Species | Pre_GSHH_NDVI | Height_Ave_cm | target_name | target | image_id | date_ordinal | prev_date | delta_days | cluster | cluster_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ID1070112260__Dry_Clover_g | train/ID1070112260.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.60 | 6.0 | Dry_Clover_g | 0.0000 | ID1070112260 | 735613 | — | — | 0 | 1 |
| ID1275072698__Dry_Clover_g | train/ID1275072698.jpg | 2015-01-15 | NSW | Lucerne | 0.74 | 42.0 | Dry_Clover_g | 0.0000 | ID1275072698 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID1314135397__Dry_Clover_g | train/ID1314135397.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.75 | 6.0 | Dry_Clover_g | 0.0000 | ID1314135397 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID1357758282__Dry_Clover_g | train/ID1357758282.jpg | 2015-01-15 | NSW | Lucerne | 0.77 | 62.0 | Dry_Clover_g | 0.0000 | ID1357758282 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID1472525822__Dry_Clover_g | train/ID1472525822.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.70 | 7.0 | Dry_Clover_g | 0.0000 | ID1472525822 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID1473228876__Dry_Clover_g | train/ID1473228876.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.72 | 8.0 | Dry_Clover_g | 0.0000 | ID1473228876 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID147528735__Dry_Clover_g | train/ID147528735.jpg | 2015-01-15 | NSW | Lucerne | 0.74 | 52.0 | Dry_Clover_g | 0.0000 | ID147528735 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID1573329652__Dry_Clover_g | train/ID1573329652.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.61 | 6.0 | Dry_Clover_g | 0.0000 | ID1573329652 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID1624268863__Dry_Clover_g | train/ID1624268863.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.56 | 4.0 | Dry_Clover_g | 0.0000 | ID1624268863 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID1859251563__Dry_Clover_g | train/ID1859251563.jpg | 2015-01-15 | NSW | Lucerne | 0.83 | 70.0 | Dry_Clover_g | 0.0000 | ID1859251563 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID1948354837__Dry_Clover_g | train/ID1948354837.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.63 | 5.0 | Dry_Clover_g | 0.0000 | ID1948354837 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID554314721__Dry_Clover_g | train/ID554314721.jpg | 2015-01-15 | NSW | Lucerne | 0.69 | 49.0 | Dry_Clover_g | 0.0000 | ID554314721 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID576621307__Dry_Clover_g | train/ID576621307.jpg | 2015-01-15 | NSW | Lucerne | 0.73 | 63.0 | Dry_Clover_g | 0.0000 | ID576621307 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID663006174__Dry_Clover_g | train/ID663006174.jpg | 2015-01-15 | NSW | Lucerne | 0.66 | 49.0 | Dry_Clover_g | 0.0000 | ID663006174 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID670276799__Dry_Clover_g | train/ID670276799.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.66 | 6.0 | Dry_Clover_g | 0.0000 | ID670276799 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID710341728__Dry_Clover_g | train/ID710341728.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.84 | 11.0 | Dry_Clover_g | 0.0000 | ID710341728 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID871463897__Dry_Clover_g | train/ID871463897.jpg | 2015-01-15 | NSW | Fescue_CrumbWeed | 0.79 | 12.0 | Dry_Clover_g | 10.0981 | ID871463897 | 735613 | 2015-01-15 | 0.0 | 1 | 1 |
| ID1103883611__Dry_Clover_g | train/ID1103883611.jpg | 2015-02-24 | NSW | Phalaris | 0.48 | 26.0 | Dry_Clover_g | 0.0000 | ID1103883611 | 735653 | 2015-01-15 | 40.0 | 0 | 2 |
| ID1121692672__Dry_Clover_g | train/ID1121692672.jpg | 2015-02-24 | NSW | Phalaris | 0.52 | 38.0 | Dry_Clover_g | 0.0000 | ID1121692672 | 735653 | 2015-02-24 | 0.0 | 1 | 2 |
| ID1211362607__Dry_Clover_g | train/ID1211362607.jpg | 2015-02-24 | NSW | Ryegrass | 0.58 | 7.0 | Dry_Clover_g | 0.0000 | ID1211362607 | 735653 | 2015-02-24 | 0.0 | 1 | 2 |
Showing first 20 rows only.
| target_name | autocorr |
|---|---|
| Dry_Green_g | — |
| Dry_Dead_g | — |
| Dry_Clover_g | 0.377647 |
| GDM_g | — |
| Dry_Total_g | — |
{
"module": "leakage_diagnostics",
"temporal_deltas_csv": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/temporal_deltas.csv",
"temporal_clusters_csv": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/temporal_clusters.csv",
"autocorrelation_csv": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/autocorrelation.csv",
"state_date_counts_csv": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/state_date_counts.csv",
"state_date_heatmap": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/state_date_heatmap.png",
"min_temporal_gap_days": 0.0,
"cluster_window_days": 3
}
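
The `delta_days` and `cluster_id` columns above are consistent with a simple gap-based grouping: sort by sampling date, take the difference to the previous date, and open a new cluster whenever the gap exceeds `cluster_window_days`. The sketch below illustrates that idea only; it is inferred from the sample rows rather than taken from the report's code, it ignores any per-state grouping the actual module may apply, and the function name `temporal_clusters` is hypothetical.

```python
from datetime import date

def temporal_clusters(dates, window_days=3):
    """Assign a cluster id to each date; a new cluster starts when the gap
    to the previous (sorted) date exceeds window_days."""
    rows, cluster_id, prev = [], 0, None
    for d in sorted(dates):
        delta = None if prev is None else (d - prev).days
        if delta is None or delta > window_days:
            cluster_id += 1  # first sample, or gap too large: open a new cluster
        rows.append((d, delta, cluster_id))
        prev = d
    return rows

# Two sampling dates from the rows above, several images each.
dates = [date(2015, 1, 15)] * 3 + [date(2015, 2, 24)] * 2
rows = temporal_clusters(dates)
min_gap = min(delta for _, delta, _ in rows if delta is not None)
print(rows, min_gap)
```

Because many images share a sampling date, the minimum gap is 0 days, which is what triggers the leakage warning for month-wise splits.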
Coherence Diagnostics
Mass-balance consistency.
| image_id | State | Sampling_Date | Dry_Clover_g | Dry_Dead_g | Dry_Green_g | Dry_Total_g | GDM_g | ce_abs | ce_rel | month |
|---|---|---|---|---|---|---|---|---|---|---|
| ID1011485656 | Tas | 2015-09-04 | 0.0000 | 31.9984 | 16.2751 | 48.2735 | 16.2750 | 0.000000e+00 | 0.000000e+00 | 9 |
| ID1012260530 | NSW | 2015-04-01 | 0.0000 | 0.0000 | 7.6000 | 7.6000 | 7.6000 | 0.000000e+00 | 0.000000e+00 | 4 |
| ID1025234388 | WA | 2015-09-01 | 6.0500 | 0.0000 | 0.0000 | 6.0500 | 6.0500 | 0.000000e+00 | 0.000000e+00 | 9 |
| ID1028611175 | Tas | 2015-05-18 | 0.0000 | 30.9703 | 24.2376 | 55.2079 | 24.2376 | 0.000000e+00 | 0.000000e+00 | 5 |
| ID1035947949 | Tas | 2015-09-11 | 0.4343 | 23.2239 | 10.5261 | 34.1844 | 10.9605 | 1.000000e-04 | 2.925311e-06 | 9 |
| ID1036339023 | Vic | 2015-09-30 | 23.0755 | 2.6135 | 32.1910 | 57.8800 | 55.2665 | -7.105427e-15 | -1.227614e-16 | 9 |
| ID1049634115 | Vic | 2015-07-02 | 1.5083 | 3.0167 | 13.5750 | 18.1000 | 15.0833 | 3.552714e-15 | 1.962825e-16 | 7 |
| ID1051144034 | WA | 2015-09-01 | 55.3200 | 0.0000 | 0.0000 | 55.3200 | 55.3200 | 0.000000e+00 | 0.000000e+00 | 9 |
| ID1052620238 | Tas | 2015-05-18 | 0.0000 | 11.2291 | 20.1707 | 31.3998 | 20.1707 | 0.000000e+00 | 0.000000e+00 | 5 |
| ID105271783 | Vic | 2015-06-30 | 5.2698 | 8.5635 | 27.6667 | 41.5000 | 32.9365 | 0.000000e+00 | 0.000000e+00 | 6 |
| ID1053972079 | Tas | 2015-09-04 | 21.0801 | 7.9393 | 1.3688 | 30.3882 | 22.4489 | 0.000000e+00 | 0.000000e+00 | 9 |
| ID1058383417 | Tas | 2015-05-19 | 0.1000 | 9.5000 | 13.1000 | 22.7000 | 13.2000 | -3.552714e-15 | -1.565072e-16 | 5 |
| ID1062837331 | Vic | 2015-09-29 | 19.9800 | 3.9623 | 35.4077 | 59.3500 | 55.3877 | 7.105427e-15 | 1.197208e-16 | 9 |
| ID1070112260 | NSW | 2015-01-15 | 0.0000 | 9.8765 | 22.1235 | 32.0000 | 22.1235 | 0.000000e+00 | 0.000000e+00 | 1 |
| ID1078930021 | Vic | 2015-06-26 | 0.0000 | 5.0189 | 32.9811 | 38.0000 | 32.9811 | 0.000000e+00 | 0.000000e+00 | 6 |
| ID1084819986 | Tas | 2015-09-04 | 21.4551 | 13.0742 | 2.6819 | 37.2112 | 24.1370 | -7.105427e-15 | -1.909486e-16 | 9 |
| ID1088965591 | NSW | 2015-04-01 | 0.0000 | 0.3443 | 59.5557 | 59.9000 | 59.5557 | 0.000000e+00 | 0.000000e+00 | 4 |
| ID1098771283 | Tas | 2015-11-09 | 8.3760 | 15.1673 | 0.4528 | 23.9961 | 8.8287 | 0.000000e+00 | 0.000000e+00 | 11 |
| ID1103883611 | NSW | 2015-02-24 | 0.0000 | 12.9166 | 82.7834 | 95.7000 | 82.7834 | 0.000000e+00 | 0.000000e+00 | 2 |
| ID1108283583 | Vic | 2015-08-19 | 5.7730 | 7.2162 | 40.4108 | 53.4000 | 46.1838 | -7.105427e-15 | -1.330604e-16 | 8 |
Showing first 20 rows only.
| State | ce_abs mean | ce_abs std | ce_abs min | ce_abs max | ce_abs count | ce_rel mean | ce_rel std | ce_rel min | ce_rel max | ce_rel count |
|---|---|---|---|---|---|---|---|---|---|---|
| NSW | 7.579122514774402e-16 | 5.549925956896144e-15 | -1.4210854715202004e-14 | 2.842170943040401e-14 | 75 | 5.9984927158471725e-18 | 6.247577426811178e-17 | -1.5313421029312505e-16 | 2.1466547908160125e-16 | 75 |
| Tas | 7.079683055580708e-17 | 4.1854806385090615e-05 | -0.00010000000000331966 | 0.00010000000000331966 | 138 | -2.225080908964888e-09 | 1.4454018973314639e-06 | -6.447286981508574e-06 | 5.565418714263998e-06 | 138 |
| Vic | -0.002758928571428676 | 0.02917872405675814 | -0.30879999999999974 | 0.00010000000000331966 | 112 | -0.0002706022894376321 | 0.002863136958345086 | -0.030300651542507235 | 5.555555555542608e-06 | 112 |
| WA | 4.440892098500626e-16 | 2.51214793389404e-15 | 0.0 | 1.4210854715202004e-14 | 32 | 6.153376885826002e-18 | 3.480875618531301e-17 | 0.0 | 1.9690806034643207e-16 | 32 |
| month | ce_abs mean | ce_abs std | ce_abs min | ce_abs max | ce_abs count | ce_rel mean | ce_rel std | ce_rel min | ce_rel max | ce_rel count |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 8.359326303060002e-16 | 2.3597520885057272e-15 | 0.0 | 7.105427357601002e-15 | 17 | 2.1894926912979286e-17 | 6.26264395700414e-17 | 0.0 | 2.1466547908160125e-16 | 17 |
| 2.0 | 2.9605947323337506e-16 | 6.7827319990657174e-15 | -1.4210854715202004e-14 | 1.4210854715202004e-14 | 24 | -1.9701267837655182e-18 | 6.917261940531651e-17 | -1.5313421029312505e-16 | 1.3482784359774196e-16 | 24 |
| 4.0 | -7.105427357601002e-16 | 2.2469334198890887e-15 | -7.105427357601002e-15 | 0.0 | 10 | -1.518253708889103e-17 | 4.8011397860877965e-17 | -1.518253708889103e-16 | 0.0 | 10 |
| 5.0 | -2.380952380185536e-06 | 1.5430334996855132e-05 | -0.00010000000000331966 | 2.842170943040401e-14 | 42 | -5.527983814683679e-08 | 3.582542969093264e-07 | -2.3217532023551785e-06 | 1.7999815978723246e-16 | 42 |
| 6.0 | -0.005828301886791941 | 0.042416702884588084 | -0.30879999999999974 | 0.00010000000000331966 | 53 | -0.0005717458196387811 | 0.004162109937349363 | -0.030300651542507235 | 5.099257038418804e-06 | 53 |
| 7.0 | 2.4390243899349377e-06 | 2.7274574955527273e-05 | -9.999999999976694e-05 | 9.999999999976694e-05 | 41 | 8.75461234554813e-08 | 1.13899935025474e-06 | -4.098360655728153e-06 | 5.555555555542608e-06 | 41 |
| 8.0 | -1.3513513513578039e-05 | 4.808650863058296e-05 | -0.00010000000000331966 | 0.00010000000000331966 | 37 | -4.3627897605013946e-07 | 1.2882967921235893e-06 | -4.201680672259115e-06 | 2.066115702547927e-06 | 37 |
| 9.0 | 1.3256394323882465e-16 | 3.8924947208054574e-05 | -0.00010000000000331966 | 0.00010000000000331966 | 67 | 7.139112535396771e-08 | 1.4557161838070191e-06 | -6.447286981508574e-06 | 5.565418714263998e-06 | 67 |
| 10.0 | 1.0344827586305295e-05 | 4.092525928270141e-05 | -0.00010000000000331966 | 0.00010000000000331966 | 29 | 1.6841733627103754e-07 | 9.393314908829852e-07 | -2.9498525074725564e-06 | 2.994011976147295e-06 | 29 |
| 11.0 | 2.702702702672399e-06 | 4.992486847735313e-05 | -0.00010000000000331966 | 0.00010000000000331966 | 37 | -7.476005563815007e-10 | 1.2284833849897595e-06 | -4.063388866305036e-06 | 3.555606124294022e-06 | 37 |
{
"module": "coherence_diagnostics",
"per_sample_csv": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_per_sample.csv",
"state_summary_csv": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_state_summary.csv",
"month_summary_csv": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_month_summary.csv",
"histogram_png": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_histogram.png",
"state_boxplot_png": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_state_boxplot.png",
"month_boxplot_png": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_month_boxplot.png",
"n_samples": 357,
"ce_abs_mean": -0.0008655462184872013,
"ce_abs_std": 0.016343440616208217,
"ce_rel_mean": -8.489569601730977e-05,
"ce_rel_std": 0.0016036799014134855
}
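
The per-sample rows above are consistent with a mass-balance error defined as the reported total minus the sum of its three components, with the relative error obtained by dividing by the total. This definition is inferred from the tabulated values (for example row ID1035947949), not from the report's source, so the sketch below should be read under that assumption; `coherence_error` is a hypothetical name.

```python
def coherence_error(clover, dead, green, total):
    """Return (ce_abs, ce_rel): the gap between the reported total and the
    sum of its components, in grams and as a fraction of the total."""
    ce_abs = total - (clover + dead + green)
    ce_rel = ce_abs / total if total else 0.0
    return ce_abs, ce_rel

# Row ID1035947949 from the per-sample table above.
ce_abs, ce_rel = coherence_error(0.4343, 23.2239, 10.5261, 34.1844)
print(ce_abs, ce_rel)
```

Values on the order of 1e-15 in the tables are floating-point rounding noise, and values near 1e-4 correspond to rounding at four decimal places. The only sizable deviation in the summaries is the minimum of about -0.31 g in Vic (month 6).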
Clustering
Structure of state+month embeddings.
| pc1 | pc2 |
|---|---|
| 1.880687 | -0.595691 |
| 1.880687 | -0.595691 |
| 1.880687 | -0.595691 |
| 1.880687 | -0.595691 |
| 1.880687 | -0.595691 |
| -3.215288 | -0.771861 |
| -3.215288 | -0.771861 |
| -3.215288 | -0.771861 |
| -3.215288 | -0.771861 |
| -3.215288 | -0.771861 |
| 2.204927 | 1.377851 |
| 2.204927 | 1.377851 |
| 2.204927 | 1.377851 |
| 2.204927 | 1.377851 |
| 2.204927 | 1.377851 |
| -2.066397 | 0.052790 |
| -2.066397 | 0.052790 |
| -2.066397 | 0.052790 |
| -2.066397 | 0.052790 |
| -2.066397 | 0.052790 |
Showing first 20 rows only.
{
"module": "clustering",
"pca_embedding_csv": "workspace/outputs/reports/eda/split_analytics/clustering/pca_embedding.csv",
"pca_plot_png": "workspace/outputs/reports/eda/split_analytics/clustering/pca_scatter.png",
"umap_available": false,
"state_codes": {
"NSW": 0,
"Tas": 1,
"Vic": 2,
"WA": 3
},
"umap_embedding_csv": null,
"umap_plot_png": null
}
Train/Test Comparison
Distributional alignment where metadata exists.
{
"module": "train_test_comparison",
"state_comparison": null,
"month_comparison": null,
"state_month_comparison": null,
"notes": [
"State comparison skipped: 'State' column missing in train or test.",
"Month comparison skipped: 'Sampling_Date' missing in test.csv; add test metadata to enable.",
"State \u00d7 Month comparison skipped: test.csv lacks State/Sampling_Date metadata."
]
}
Fold Feasibility
Strategy feasibility assessment.
| strategy | status | insufficient_states | num_states | min_samples_per_state | reason | min_gap | unsafe_months | empty_bins | total_bins | sparsity_ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| spatial_statewise | partially_feasible | ['NSW', 'Tas', 'Vic', 'WA'] | 4.0 | 32.0 | — | — | — | — | — | — |
| temporal_monthwise | high_leakage_risk | — | — | — | Temporal gap below safety threshold | 0.0 | [1, 2, 4, 5, 6, 7, 8, 9, 10, 11] | — | — | — |
| spatiotemporal_bins | sparse | — | — | — | — | — | — | 30.0 | 48.0 | 0.625 |
| cluster_based | not_evaluable | — | — | — | No UMAP embedding available | — | — | — | — | — |
| random_kfold | feasible | — | — | — | Baseline only | — | — | — | — | — |
{
"spatial_statewise": {
"status": "partially_feasible",
"insufficient_states": [
"NSW",
"Tas",
"Vic",
"WA"
],
"num_states": 4,
"min_samples_per_state": 32
},
"temporal_monthwise": {
"status": "high_leakage_risk",
"reason": "Temporal gap below safety threshold",
"min_gap": 0.0,
"unsafe_months": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
]
},
"spatiotemporal_bins": {
"status": "sparse",
"empty_bins": 30,
"total_bins": 48,
"sparsity_ratio": 0.625
},
"cluster_based": {
"status": "not_evaluable",
"reason": "No UMAP embedding available"
},
"random_kfold": {
"status": "feasible",
"reason": "Baseline only"
}
}
Split Candidate Evaluation
Strategy scoring & recommended CV approach.
| strategy | score | folds_csv |
|---|---|---|
| spatial_statewise | 1 | workspace/outputs/reports/eda/split_analytics/candidate_folds/spatial_statewise_folds.csv |
| random_kfold | 1 | workspace/outputs/reports/eda/split_analytics/candidate_folds/random_kfold_folds.csv |
{
"best_strategy": "spatial_statewise",
"strategies": {
"spatial_statewise": {
"evaluation": {
"sample_counts": {
"0": 75,
"1": 138,
"2": 112,
"3": 32
},
"state_coverage": {
"0": [
"NSW"
],
"1": [
"Tas"
],
"2": [
"Vic"
],
"3": [
"WA"
]
},
"month_coverage": {
"0": [
1,
2,
4,
5,
10
],
"1": [
5,
6,
7,
9,
11
],
"2": [
6,
7,
8,
9,
10
],
"3": [
7,
8,
9
]
},
"leakage_violation": false
},
"score": 1,
"folds_csv": "workspace/outputs/reports/eda/split_analytics/candidate_folds/spatial_statewise_folds.csv"
},
"random_kfold": {
"evaluation": {
"sample_counts": {
"0": 70,
"1": 63,
"2": 84,
"3": 80,
"4": 60
},
"state_coverage": {
"0": [
"NSW",
"Tas",
"Vic",
"WA"
],
"1": [
"NSW",
"Tas",
"Vic",
"WA"
],
"2": [
"NSW",
"Tas",
"Vic",
"WA"
],
"3": [
"NSW",
"Tas",
"Vic",
"WA"
],
"4": [
"NSW",
"Tas",
"Vic",
"WA"
]
},
"month_coverage": {
"0": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
],
"1": [
1,
2,
5,
6,
7,
8,
9,
10,
11
],
"2": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
],
"3": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
],
"4": [
1,
2,
5,
6,
7,
8,
9,
10,
11
]
},
"leakage_violation": false
},
"score": 1,
"folds_csv": "workspace/outputs/reports/eda/split_analytics/candidate_folds/random_kfold_folds.csv"
}
},
"scores_csv": "workspace/outputs/reports/eda/split_analytics/strategy_scores.csv",
"scores_json": "workspace/outputs/reports/eda/split_analytics/strategy_scores.json"
}
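
The recommended `spatial_statewise` strategy amounts to leave-one-state-out cross-validation: each fold holds out every image from a single state. A minimal sketch of the fold assignment, assuming the state-to-index mapping shown in the clustering summary (the helper names `statewise_folds` and `train_val_indices` are illustrative, not part of the report's code):

```python
from collections import Counter

STATE_CODES = {"NSW": 0, "Tas": 1, "Vic": 2, "WA": 3}  # as in the clustering summary

def statewise_folds(states):
    """Map each sample's state to a fold index (one held-out fold per state)."""
    return [STATE_CODES[s] for s in states]

def train_val_indices(folds, held_out):
    """Split sample indices into (train, validation) for one held-out fold."""
    train = [i for i, f in enumerate(folds) if f != held_out]
    val = [i for i, f in enumerate(folds) if f == held_out]
    return train, val

# One state label per unique training image, matching the reported counts.
states = ["NSW"] * 75 + ["Tas"] * 138 + ["Vic"] * 112 + ["WA"] * 32
folds = statewise_folds(states)
sample_counts = dict(Counter(folds))
train_idx, val_idx = train_val_indices(folds, held_out=3)
print(sample_counts, len(train_idx), len(val_idx))
```

Both candidates received the same score of 1, so the preference for the state-wise split rests on its stricter separation rather than on the score itself. The folds are unbalanced: holding out WA validates on only 32 images, so per-fold metrics for that state are likely to be noisy.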