Exploratory Data Analysis Report

Report Overview

Train rows: 1785
Unique train images: 357
States: NSW, Tas, Vic, WA
Months: 10
Missingness: 0 (no missing values)
Best CV strategy: spatial_statewise

NaN/dash entries in tables indicate values not applicable (e.g., mean of a string column), not missing data.

Metadata Summary

Dataset shape: 1785 rows × 9 cols (0.677 MB). Full descriptive statistics saved as metadata_summary.html.

Dash entries indicate statistics that are not applicable (e.g., the mean of a categorical column).

Missingness

No missing values detected in this dataset.

Target Distributions

Histograms showing biomass target distributions (grams).

Target Correlations

Heatmap of pairwise correlations between numeric biomass targets.

Species Biomass

Distribution of Dry_Total_g per species.

Temporal Trends

Average target value over time, showing any seasonal pattern that is present.

Temporal Distance

Temporal sampling analysis: global interval distribution, per-state timelines, and sampling density heatmap.

Image Metadata

Image metadata summary: resolution distribution, file sizes, aspect ratios, and RGB channel statistics. Full metadata saved to image_metadata.csv.

Data Integrity

Data integrity analysis: duplicate sample IDs, missing image files, invalid target names, and train/test image count consistency.

Integrity Summary

train_rows test_rows unique_train_images unique_test_images duplicate_sample_ids missing_image_files invalid_target_rows
1785 5 357 1 0 0 0
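
The checks summarized above can be sketched roughly as follows. This is a minimal illustration, not the report's actual implementation: the function name `integrity_summary`, the `existing_files` argument, and the toy rows are hypothetical, while the column names (`sample_id`, `image_path`, `target_name`) and the five target names are taken from the tables in this report.

```python
from collections import Counter

# Target names as they appear in this report's autocorrelation table.
VALID_TARGETS = {"Dry_Green_g", "Dry_Dead_g", "Dry_Clover_g", "GDM_g", "Dry_Total_g"}

def integrity_summary(rows, existing_files):
    """Count basic integrity problems in long-format rows (one row per image/target).

    rows: list of dicts with 'sample_id', 'image_path', 'target_name'.
    existing_files: set of image paths known to exist on disk (assumed input).
    """
    id_counts = Counter(r["sample_id"] for r in rows)
    images = {r["image_path"] for r in rows}
    return {
        "rows": len(rows),
        "unique_images": len(images),
        # Number of sample IDs that occur more than once.
        "duplicate_sample_ids": sum(1 for c in id_counts.values() if c > 1),
        # Referenced images that are not present on disk.
        "missing_image_files": len(images - existing_files),
        "invalid_target_rows": sum(
            1 for r in rows if r["target_name"] not in VALID_TARGETS
        ),
    }

# Toy example (hypothetical data): one image with all five targets.
toy_rows = [
    {"sample_id": f"ID1__{t}", "image_path": "train/ID1.jpg", "target_name": t}
    for t in sorted(VALID_TARGETS)
]
summary = integrity_summary(toy_rows, existing_files={"train/ID1.jpg"})
print(summary)
```

With the counts in the table above, 1785 train rows over 357 unique images gives 5 rows per image, consistent with one row per target.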

State Distribution

Counts per state.

Table

state count pct
NSW 75 0.210084
Tas 138 0.386555
Vic 112 0.313725
WA 32 0.089636

Summary

[
{
"state": "NSW",
"count": 75,
"pct": 0.2100840336
},
{
"state": "Tas",
"count": 138,
"pct": 0.3865546218
},
{
"state": "Vic",
"count": 112,
"pct": 0.3137254902
},
{
"state": "WA",
"count": 32,
"pct": 0.0896358543
}
]

Temporal Distribution

Sampling by month.

Table

month count pct
1 17 0.047619
2 24 0.067227
4 10 0.028011
5 42 0.117647
6 53 0.148459
7 41 0.114846
8 37 0.103641
9 67 0.187675
10 29 0.081232
11 37 0.103641

Summary

[
{
"month": 1,
"count": 17,
"pct": 0.0476190476
},
{
"month": 2,
"count": 24,
"pct": 0.0672268908
},
{
"month": 4,
"count": 10,
"pct": 0.0280112045
},
{
"month": 5,
"count": 42,
"pct": 0.1176470588
},
{
"month": 6,
"count": 53,
"pct": 0.1484593838
},
{
"month": 7,
"count": 41,
"pct": 0.1148459384
},
{
"month": 8,
"count": 37,
"pct": 0.1036414566
},
{
"month": 9,
"count": 67,
"pct": 0.18767507
},
{
"month": 10,
"count": 29,
"pct": 0.081232493
},
{
"month": 11,
"count": 37,
"pct": 0.1036414566
}
]

Spatiotemporal Matrix

State × Month sparsity.

Table

State 1 2 3 4 5 6 7 8 9 10 11 12
NSW 17 24 0 10 13 0 0 0 0 11 0 0
Tas 0 0 0 0 29 28 10 0 34 0 37 0
Vic 0 0 0 0 0 25 19 29 21 18 0 0
WA 0 0 0 0 0 0 12 8 12 0 0 0

Summary

{
"columns": [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12
],
"index": [
"NSW",
"Tas",
"Vic",
"WA"
],
"data": [
[
17,
24,
0,
10,
13,
0,
0,
0,
0,
11,
0,
0
],
[
0,
0,
0,
0,
29,
28,
10,
0,
34,
0,
37,
0
],
[
0,
0,
0,
0,
0,
25,
19,
29,
21,
18,
0,
0
],
[
0,
0,
0,
0,
0,
0,
12,
8,
12,
0,
0,
0
]
]
}
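
As a cross-check on the matrix above, the number of empty State × Month bins can be recomputed directly from the table. The sketch below is illustrative only; the variable names are assumptions, and the counts are copied from the table.

```python
# State x month image counts, copied from the table above (months 1..12).
matrix = {
    "NSW": [17, 24, 0, 10, 13, 0, 0, 0, 0, 11, 0, 0],
    "Tas": [0, 0, 0, 0, 29, 28, 10, 0, 34, 0, 37, 0],
    "Vic": [0, 0, 0, 0, 0, 25, 19, 29, 21, 18, 0, 0],
    "WA": [0, 0, 0, 0, 0, 0, 12, 8, 12, 0, 0, 0],
}

total_bins = sum(len(row) for row in matrix.values())
empty_bins = sum(1 for row in matrix.values() for n in row if n == 0)
sparsity_ratio = empty_bins / total_bins

# Months (1-based) with no samples in any state.
unsampled_months = [
    m + 1 for m in range(12) if all(row[m] == 0 for row in matrix.values())
]

print(empty_bins, total_bins, sparsity_ratio, unsampled_months)
# 30 48 0.625 [3, 12]
```

The result agrees with the 30 empty bins out of 48 (sparsity ratio 0.625) reported for spatiotemporal_bins under Fold Feasibility, and with the 10 sampled months in the overview.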

Target Distributions by State & Month

Target spread across State and Month.

Leakage Diagnostics

Temporal proximity & clusters.

Table

sample_id image_path Sampling_Date State Species Pre_GSHH_NDVI Height_Ave_cm target_name target image_id date_ordinal prev_date delta_days
ID1070112260__Dry_Clover_g train/ID1070112260.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.60 6.0 Dry_Clover_g 0.0000 ID1070112260 735613
ID1275072698__Dry_Clover_g train/ID1275072698.jpg 2015-01-15 NSW Lucerne 0.74 42.0 Dry_Clover_g 0.0000 ID1275072698 735613 2015-01-15 0.0
ID1314135397__Dry_Clover_g train/ID1314135397.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.75 6.0 Dry_Clover_g 0.0000 ID1314135397 735613 2015-01-15 0.0
ID1357758282__Dry_Clover_g train/ID1357758282.jpg 2015-01-15 NSW Lucerne 0.77 62.0 Dry_Clover_g 0.0000 ID1357758282 735613 2015-01-15 0.0
ID1472525822__Dry_Clover_g train/ID1472525822.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.70 7.0 Dry_Clover_g 0.0000 ID1472525822 735613 2015-01-15 0.0
ID1473228876__Dry_Clover_g train/ID1473228876.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.72 8.0 Dry_Clover_g 0.0000 ID1473228876 735613 2015-01-15 0.0
ID147528735__Dry_Clover_g train/ID147528735.jpg 2015-01-15 NSW Lucerne 0.74 52.0 Dry_Clover_g 0.0000 ID147528735 735613 2015-01-15 0.0
ID1573329652__Dry_Clover_g train/ID1573329652.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.61 6.0 Dry_Clover_g 0.0000 ID1573329652 735613 2015-01-15 0.0
ID1624268863__Dry_Clover_g train/ID1624268863.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.56 4.0 Dry_Clover_g 0.0000 ID1624268863 735613 2015-01-15 0.0
ID1859251563__Dry_Clover_g train/ID1859251563.jpg 2015-01-15 NSW Lucerne 0.83 70.0 Dry_Clover_g 0.0000 ID1859251563 735613 2015-01-15 0.0
ID1948354837__Dry_Clover_g train/ID1948354837.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.63 5.0 Dry_Clover_g 0.0000 ID1948354837 735613 2015-01-15 0.0
ID554314721__Dry_Clover_g train/ID554314721.jpg 2015-01-15 NSW Lucerne 0.69 49.0 Dry_Clover_g 0.0000 ID554314721 735613 2015-01-15 0.0
ID576621307__Dry_Clover_g train/ID576621307.jpg 2015-01-15 NSW Lucerne 0.73 63.0 Dry_Clover_g 0.0000 ID576621307 735613 2015-01-15 0.0
ID663006174__Dry_Clover_g train/ID663006174.jpg 2015-01-15 NSW Lucerne 0.66 49.0 Dry_Clover_g 0.0000 ID663006174 735613 2015-01-15 0.0
ID670276799__Dry_Clover_g train/ID670276799.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.66 6.0 Dry_Clover_g 0.0000 ID670276799 735613 2015-01-15 0.0
ID710341728__Dry_Clover_g train/ID710341728.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.84 11.0 Dry_Clover_g 0.0000 ID710341728 735613 2015-01-15 0.0
ID871463897__Dry_Clover_g train/ID871463897.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.79 12.0 Dry_Clover_g 10.0981 ID871463897 735613 2015-01-15 0.0
ID1103883611__Dry_Clover_g train/ID1103883611.jpg 2015-02-24 NSW Phalaris 0.48 26.0 Dry_Clover_g 0.0000 ID1103883611 735653 2015-01-15 40.0
ID1121692672__Dry_Clover_g train/ID1121692672.jpg 2015-02-24 NSW Phalaris 0.52 38.0 Dry_Clover_g 0.0000 ID1121692672 735653 2015-02-24 0.0
ID1211362607__Dry_Clover_g train/ID1211362607.jpg 2015-02-24 NSW Ryegrass 0.58 7.0 Dry_Clover_g 0.0000 ID1211362607 735653 2015-02-24 0.0

Showing first 20 rows only.

Table

sample_id image_path Sampling_Date State Species Pre_GSHH_NDVI Height_Ave_cm target_name target image_id date_ordinal prev_date delta_days cluster cluster_id
ID1070112260__Dry_Clover_g train/ID1070112260.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.60 6.0 Dry_Clover_g 0.0000 ID1070112260 735613 0 1
ID1275072698__Dry_Clover_g train/ID1275072698.jpg 2015-01-15 NSW Lucerne 0.74 42.0 Dry_Clover_g 0.0000 ID1275072698 735613 2015-01-15 0.0 1 1
ID1314135397__Dry_Clover_g train/ID1314135397.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.75 6.0 Dry_Clover_g 0.0000 ID1314135397 735613 2015-01-15 0.0 1 1
ID1357758282__Dry_Clover_g train/ID1357758282.jpg 2015-01-15 NSW Lucerne 0.77 62.0 Dry_Clover_g 0.0000 ID1357758282 735613 2015-01-15 0.0 1 1
ID1472525822__Dry_Clover_g train/ID1472525822.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.70 7.0 Dry_Clover_g 0.0000 ID1472525822 735613 2015-01-15 0.0 1 1
ID1473228876__Dry_Clover_g train/ID1473228876.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.72 8.0 Dry_Clover_g 0.0000 ID1473228876 735613 2015-01-15 0.0 1 1
ID147528735__Dry_Clover_g train/ID147528735.jpg 2015-01-15 NSW Lucerne 0.74 52.0 Dry_Clover_g 0.0000 ID147528735 735613 2015-01-15 0.0 1 1
ID1573329652__Dry_Clover_g train/ID1573329652.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.61 6.0 Dry_Clover_g 0.0000 ID1573329652 735613 2015-01-15 0.0 1 1
ID1624268863__Dry_Clover_g train/ID1624268863.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.56 4.0 Dry_Clover_g 0.0000 ID1624268863 735613 2015-01-15 0.0 1 1
ID1859251563__Dry_Clover_g train/ID1859251563.jpg 2015-01-15 NSW Lucerne 0.83 70.0 Dry_Clover_g 0.0000 ID1859251563 735613 2015-01-15 0.0 1 1
ID1948354837__Dry_Clover_g train/ID1948354837.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.63 5.0 Dry_Clover_g 0.0000 ID1948354837 735613 2015-01-15 0.0 1 1
ID554314721__Dry_Clover_g train/ID554314721.jpg 2015-01-15 NSW Lucerne 0.69 49.0 Dry_Clover_g 0.0000 ID554314721 735613 2015-01-15 0.0 1 1
ID576621307__Dry_Clover_g train/ID576621307.jpg 2015-01-15 NSW Lucerne 0.73 63.0 Dry_Clover_g 0.0000 ID576621307 735613 2015-01-15 0.0 1 1
ID663006174__Dry_Clover_g train/ID663006174.jpg 2015-01-15 NSW Lucerne 0.66 49.0 Dry_Clover_g 0.0000 ID663006174 735613 2015-01-15 0.0 1 1
ID670276799__Dry_Clover_g train/ID670276799.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.66 6.0 Dry_Clover_g 0.0000 ID670276799 735613 2015-01-15 0.0 1 1
ID710341728__Dry_Clover_g train/ID710341728.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.84 11.0 Dry_Clover_g 0.0000 ID710341728 735613 2015-01-15 0.0 1 1
ID871463897__Dry_Clover_g train/ID871463897.jpg 2015-01-15 NSW Fescue_CrumbWeed 0.79 12.0 Dry_Clover_g 10.0981 ID871463897 735613 2015-01-15 0.0 1 1
ID1103883611__Dry_Clover_g train/ID1103883611.jpg 2015-02-24 NSW Phalaris 0.48 26.0 Dry_Clover_g 0.0000 ID1103883611 735653 2015-01-15 40.0 0 2
ID1121692672__Dry_Clover_g train/ID1121692672.jpg 2015-02-24 NSW Phalaris 0.52 38.0 Dry_Clover_g 0.0000 ID1121692672 735653 2015-02-24 0.0 1 2
ID1211362607__Dry_Clover_g train/ID1211362607.jpg 2015-02-24 NSW Ryegrass 0.58 7.0 Dry_Clover_g 0.0000 ID1211362607 735653 2015-02-24 0.0 1 2

Showing first 20 rows only.

Table

target_name autocorr
Dry_Green_g
Dry_Dead_g
Dry_Clover_g 0.377647
GDM_g
Dry_Total_g

Summary

{
"module": "leakage_diagnostics",
"temporal_deltas_csv": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/temporal_deltas.csv",
"temporal_clusters_csv": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/temporal_clusters.csv",
"autocorrelation_csv": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/autocorrelation.csv",
"state_date_counts_csv": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/state_date_counts.csv",
"state_date_heatmap": "workspace/outputs/reports/eda/split_analytics/leakage_diagnostics/state_date_heatmap.png",
"min_temporal_gap_days": 0.0,
"cluster_window_days": 3
}
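
The `delta_days`, `cluster`, and `cluster_id` columns in the tables above appear to follow a simple rule: sort by date, take the gap to the previous sample, and start a new cluster whenever the gap exceeds `cluster_window_days`. The sketch below reproduces that pattern under this assumption. The function name is hypothetical, and whether the report computes gaps per state or globally is not stated.

```python
from datetime import date

def temporal_clusters(dates, window_days=3):
    """Return (delta_days, cluster, cluster_id) tuples for dates sorted ascending.

    delta_days: gap in days to the previous sample (None for the first sample).
    cluster: 1 if the gap is within window_days, else 0.
    cluster_id: incremented whenever a sample does not join the previous cluster.
    """
    out = []
    cluster_id = 0
    prev = None
    for d in sorted(dates):
        delta = None if prev is None else (d - prev).days
        in_cluster = int(delta is not None and delta <= window_days)
        if not in_cluster:
            cluster_id += 1
        out.append((delta, in_cluster, cluster_id))
        prev = d
    return out

# Dates taken from the first rows of the table above.
sample_dates = [
    date(2015, 1, 15), date(2015, 1, 15),
    date(2015, 2, 24), date(2015, 2, 24),
]
result = temporal_clusters(sample_dates)
print(result)
# [(None, 0, 1), (0, 1, 1), (40, 0, 2), (0, 1, 2)]
```

Because many samples share a sampling date, the minimum gap is 0 days, which is what drives the `min_temporal_gap_days` value in the summary.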

Coherence Diagnostics

Mass-balance consistency.

Table

image_id State Sampling_Date Dry_Clover_g Dry_Dead_g Dry_Green_g Dry_Total_g GDM_g ce_abs ce_rel month
ID1011485656 Tas 2015-09-04 0.0000 31.9984 16.2751 48.2735 16.2750 0.000000e+00 0.000000e+00 9
ID1012260530 NSW 2015-04-01 0.0000 0.0000 7.6000 7.6000 7.6000 0.000000e+00 0.000000e+00 4
ID1025234388 WA 2015-09-01 6.0500 0.0000 0.0000 6.0500 6.0500 0.000000e+00 0.000000e+00 9
ID1028611175 Tas 2015-05-18 0.0000 30.9703 24.2376 55.2079 24.2376 0.000000e+00 0.000000e+00 5
ID1035947949 Tas 2015-09-11 0.4343 23.2239 10.5261 34.1844 10.9605 1.000000e-04 2.925311e-06 9
ID1036339023 Vic 2015-09-30 23.0755 2.6135 32.1910 57.8800 55.2665 -7.105427e-15 -1.227614e-16 9
ID1049634115 Vic 2015-07-02 1.5083 3.0167 13.5750 18.1000 15.0833 3.552714e-15 1.962825e-16 7
ID1051144034 WA 2015-09-01 55.3200 0.0000 0.0000 55.3200 55.3200 0.000000e+00 0.000000e+00 9
ID1052620238 Tas 2015-05-18 0.0000 11.2291 20.1707 31.3998 20.1707 0.000000e+00 0.000000e+00 5
ID105271783 Vic 2015-06-30 5.2698 8.5635 27.6667 41.5000 32.9365 0.000000e+00 0.000000e+00 6
ID1053972079 Tas 2015-09-04 21.0801 7.9393 1.3688 30.3882 22.4489 0.000000e+00 0.000000e+00 9
ID1058383417 Tas 2015-05-19 0.1000 9.5000 13.1000 22.7000 13.2000 -3.552714e-15 -1.565072e-16 5
ID1062837331 Vic 2015-09-29 19.9800 3.9623 35.4077 59.3500 55.3877 7.105427e-15 1.197208e-16 9
ID1070112260 NSW 2015-01-15 0.0000 9.8765 22.1235 32.0000 22.1235 0.000000e+00 0.000000e+00 1
ID1078930021 Vic 2015-06-26 0.0000 5.0189 32.9811 38.0000 32.9811 0.000000e+00 0.000000e+00 6
ID1084819986 Tas 2015-09-04 21.4551 13.0742 2.6819 37.2112 24.1370 -7.105427e-15 -1.909486e-16 9
ID1088965591 NSW 2015-04-01 0.0000 0.3443 59.5557 59.9000 59.5557 0.000000e+00 0.000000e+00 4
ID1098771283 Tas 2015-11-09 8.3760 15.1673 0.4528 23.9961 8.8287 0.000000e+00 0.000000e+00 11
ID1103883611 NSW 2015-02-24 0.0000 12.9166 82.7834 95.7000 82.7834 0.000000e+00 0.000000e+00 2
ID1108283583 Vic 2015-08-19 5.7730 7.2162 40.4108 53.4000 46.1838 -7.105427e-15 -1.330604e-16 8

Showing first 20 rows only.

Table

State ce_abs_mean ce_abs_std ce_abs_min ce_abs_max ce_abs_count ce_rel_mean ce_rel_std ce_rel_min ce_rel_max ce_rel_count
NSW 7.579122514774402e-16 5.549925956896144e-15 -1.4210854715202004e-14 2.842170943040401e-14 75 5.9984927158471725e-18 6.247577426811178e-17 -1.5313421029312505e-16 2.1466547908160125e-16 75
Tas 7.079683055580708e-17 4.1854806385090615e-05 -0.00010000000000331966 0.00010000000000331966 138 -2.225080908964888e-09 1.4454018973314639e-06 -6.447286981508574e-06 5.565418714263998e-06 138
Vic -0.002758928571428676 0.02917872405675814 -0.30879999999999974 0.00010000000000331966 112 -0.0002706022894376321 0.002863136958345086 -0.030300651542507235 5.555555555542608e-06 112
WA 4.440892098500626e-16 2.51214793389404e-15 0.0 1.4210854715202004e-14 32 6.153376885826002e-18 3.480875618531301e-17 0.0 1.9690806034643207e-16 32

Table

month ce_abs_mean ce_abs_std ce_abs_min ce_abs_max ce_abs_count ce_rel_mean ce_rel_std ce_rel_min ce_rel_max ce_rel_count
1.0 8.359326303060002e-16 2.3597520885057272e-15 0.0 7.105427357601002e-15 17 2.1894926912979286e-17 6.26264395700414e-17 0.0 2.1466547908160125e-16 17
2.0 2.9605947323337506e-16 6.7827319990657174e-15 -1.4210854715202004e-14 1.4210854715202004e-14 24 -1.9701267837655182e-18 6.917261940531651e-17 -1.5313421029312505e-16 1.3482784359774196e-16 24
4.0 -7.105427357601002e-16 2.2469334198890887e-15 -7.105427357601002e-15 0.0 10 -1.518253708889103e-17 4.8011397860877965e-17 -1.518253708889103e-16 0.0 10
5.0 -2.380952380185536e-06 1.5430334996855132e-05 -0.00010000000000331966 2.842170943040401e-14 42 -5.527983814683679e-08 3.582542969093264e-07 -2.3217532023551785e-06 1.7999815978723246e-16 42
6.0 -0.005828301886791941 0.042416702884588084 -0.30879999999999974 0.00010000000000331966 53 -0.0005717458196387811 0.004162109937349363 -0.030300651542507235 5.099257038418804e-06 53
7.0 2.4390243899349377e-06 2.7274574955527273e-05 -9.999999999976694e-05 9.999999999976694e-05 41 8.75461234554813e-08 1.13899935025474e-06 -4.098360655728153e-06 5.555555555542608e-06 41
8.0 -1.3513513513578039e-05 4.808650863058296e-05 -0.00010000000000331966 0.00010000000000331966 37 -4.3627897605013946e-07 1.2882967921235893e-06 -4.201680672259115e-06 2.066115702547927e-06 37
9.0 1.3256394323882465e-16 3.8924947208054574e-05 -0.00010000000000331966 0.00010000000000331966 67 7.139112535396771e-08 1.4557161838070191e-06 -6.447286981508574e-06 5.565418714263998e-06 67
10.0 1.0344827586305295e-05 4.092525928270141e-05 -0.00010000000000331966 0.00010000000000331966 29 1.6841733627103754e-07 9.393314908829852e-07 -2.9498525074725564e-06 2.994011976147295e-06 29
11.0 2.702702702672399e-06 4.992486847735313e-05 -0.00010000000000331966 0.00010000000000331966 37 -7.476005563815007e-10 1.2284833849897595e-06 -4.063388866305036e-06 3.555606124294022e-06 37

Summary

{
"module": "coherence_diagnostics",
"per_sample_csv": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_per_sample.csv",
"state_summary_csv": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_state_summary.csv",
"month_summary_csv": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_month_summary.csv",
"histogram_png": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_histogram.png",
"state_boxplot_png": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_state_boxplot.png",
"month_boxplot_png": "workspace/outputs/reports/eda/split_analytics/coherence_diagnostics/coherence_month_boxplot.png",
"n_samples": 357,
"ce_abs_mean": -0.0008655462184872013,
"ce_abs_std": 0.016343440616208217,
"ce_rel_mean": -8.489569601730977e-05,
"ce_rel_std": 0.0016036799014134855
}
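
The per-sample rows above are consistent with a mass-balance error defined as Dry_Total_g minus the sum of the three components, with the relative error obtained by dividing by Dry_Total_g. The sketch below uses that inferred definition. The function name and the handling of a zero total are assumptions, not taken from the report.

```python
def coherence_error(clover, dead, green, total):
    """Mass-balance error: reported total minus the sum of its components.

    Returns (ce_abs, ce_rel). The formula and sign convention are inferred
    from the sample rows in this report; returning 0.0 for a zero total is
    an assumption made here to avoid division by zero.
    """
    ce_abs = total - (clover + dead + green)
    ce_rel = ce_abs / total if total else 0.0
    return ce_abs, ce_rel

# Row ID1035947949 from the per-sample table (all values in grams).
ce_abs, ce_rel = coherence_error(
    clover=0.4343, dead=23.2239, green=10.5261, total=34.1844
)
print(f"{ce_abs:.6e} {ce_rel:.6e}")
```

Errors on the order of 1e-15 in the tables are floating-point round-off, and errors of about 1e-4 match rounding of the targets to four decimals. The -0.3088 g minimum for Vic (month 6) is the only value in the summaries that looks like a genuine inconsistency.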

Clustering

Structure of state+month embeddings.

Table

pc1 pc2
1.880687 -0.595691
1.880687 -0.595691
1.880687 -0.595691
1.880687 -0.595691
1.880687 -0.595691
-3.215288 -0.771861
-3.215288 -0.771861
-3.215288 -0.771861
-3.215288 -0.771861
-3.215288 -0.771861
2.204927 1.377851
2.204927 1.377851
2.204927 1.377851
2.204927 1.377851
2.204927 1.377851
-2.066397 0.052790
-2.066397 0.052790
-2.066397 0.052790
-2.066397 0.052790
-2.066397 0.052790

Showing first 20 rows only.

Summary

{
"module": "clustering",
"pca_embedding_csv": "workspace/outputs/reports/eda/split_analytics/clustering/pca_embedding.csv",
"pca_plot_png": "workspace/outputs/reports/eda/split_analytics/clustering/pca_scatter.png",
"umap_available": false,
"state_codes": {
"NSW": 0,
"Tas": 1,
"Vic": 2,
"WA": 3
},
"umap_embedding_csv": null,
"umap_plot_png": null
}

Train/Test Comparison

Distributional alignment where metadata exists.

Summary

{
"module": "train_test_comparison",
"state_comparison": null,
"month_comparison": null,
"state_month_comparison": null,
"notes": [
"State comparison skipped: 'State' column missing in train or test.",
"Month comparison skipped: 'Sampling_Date' missing in test.csv; add test metadata to enable.",
"State \u00d7 Month comparison skipped: test.csv lacks State/Sampling_Date metadata."
]
}

Fold Feasibility

Strategy feasibility assessment.

Table

strategy status insufficient_states num_states min_samples_per_state reason min_gap unsafe_months empty_bins total_bins sparsity_ratio
spatial_statewise partially_feasible ['NSW', 'Tas', 'Vic', 'WA'] 4.0 32.0
temporal_monthwise high_leakage_risk Temporal gap below safety threshold 0.0 [1, 2, 4, 5, 6, 7, 8, 9, 10, 11]
spatiotemporal_bins sparse 30.0 48.0 0.625
cluster_based not_evaluable No UMAP embedding available
random_kfold feasible Baseline only

Summary

{
"spatial_statewise": {
"status": "partially_feasible",
"insufficient_states": [
"NSW",
"Tas",
"Vic",
"WA"
],
"num_states": 4,
"min_samples_per_state": 32
},
"temporal_monthwise": {
"status": "high_leakage_risk",
"reason": "Temporal gap below safety threshold",
"min_gap": 0.0,
"unsafe_months": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
]
},
"spatiotemporal_bins": {
"status": "sparse",
"empty_bins": 30,
"total_bins": 48,
"sparsity_ratio": 0.625
},
"cluster_based": {
"status": "not_evaluable",
"reason": "No UMAP embedding available"
},
"random_kfold": {
"status": "feasible",
"reason": "Baseline only"
}
}

Summary

{
"violations": [
{
"strategy": "spatial_statewise",
"status": "partially_feasible",
"insufficient_states": [
"NSW",
"Tas",
"Vic",
"WA"
],
"num_states": 4,
"min_samples_per_state": 32
},
{
"strategy": "temporal_monthwise",
"status": "high_leakage_risk",
"reason": "Temporal gap below safety threshold",
"min_gap": 0.0,
"unsafe_months": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
]
},
{
"strategy": "spatiotemporal_bins",
"status": "sparse",
"empty_bins": 30,
"total_bins": 48,
"sparsity_ratio": 0.625
},
{
"strategy": "cluster_based",
"status": "not_evaluable",
"reason": "No UMAP embedding available"
},
{
"strategy": "random_kfold",
"status": "feasible",
"reason": "Baseline only"
}
]
}

Split Candidate Evaluation

Strategy scoring & recommended CV approach.

Table

strategy score folds_csv
spatial_statewise 1 workspace/outputs/reports/eda/split_analytics/candidate_folds/spatial_statewise_folds.csv
random_kfold 1 workspace/outputs/reports/eda/split_analytics/candidate_folds/random_kfold_folds.csv

Summary

{
"spatial_statewise": {
"evaluation": {
"sample_counts": {
"0": 75,
"1": 138,
"2": 112,
"3": 32
},
"state_coverage": {
"0": [
"NSW"
],
"1": [
"Tas"
],
"2": [
"Vic"
],
"3": [
"WA"
]
},
"month_coverage": {
"0": [
1,
2,
4,
5,
10
],
"1": [
5,
6,
7,
9,
11
],
"2": [
6,
7,
8,
9,
10
],
"3": [
7,
8,
9
]
},
"leakage_violation": false
},
"score": 1,
"folds_csv": "workspace/outputs/reports/eda/split_analytics/candidate_folds/spatial_statewise_folds.csv"
},
"random_kfold": {
"evaluation": {
"sample_counts": {
"0": 70,
"1": 63,
"2": 84,
"3": 80,
"4": 60
},
"state_coverage": {
"0": [
"NSW",
"Tas",
"Vic",
"WA"
],
"1": [
"NSW",
"Tas",
"Vic",
"WA"
],
"2": [
"NSW",
"Tas",
"Vic",
"WA"
],
"3": [
"NSW",
"Tas",
"Vic",
"WA"
],
"4": [
"NSW",
"Tas",
"Vic",
"WA"
]
},
"month_coverage": {
"0": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
],
"1": [
1,
2,
5,
6,
7,
8,
9,
10,
11
],
"2": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
],
"3": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
],
"4": [
1,
2,
5,
6,
7,
8,
9,
10,
11
]
},
"leakage_violation": false
},
"score": 1,
"folds_csv": "workspace/outputs/reports/eda/split_analytics/candidate_folds/random_kfold_folds.csv"
}
}

Summary

{
"best_strategy": "spatial_statewise",
"strategies": {
"spatial_statewise": {
"evaluation": {
"sample_counts": {
"0": 75,
"1": 138,
"2": 112,
"3": 32
},
"state_coverage": {
"0": [
"NSW"
],
"1": [
"Tas"
],
"2": [
"Vic"
],
"3": [
"WA"
]
},
"month_coverage": {
"0": [
1,
2,
4,
5,
10
],
"1": [
5,
6,
7,
9,
11
],
"2": [
6,
7,
8,
9,
10
],
"3": [
7,
8,
9
]
},
"leakage_violation": false
},
"score": 1,
"folds_csv": "workspace/outputs/reports/eda/split_analytics/candidate_folds/spatial_statewise_folds.csv"
},
"random_kfold": {
"evaluation": {
"sample_counts": {
"0": 70,
"1": 63,
"2": 84,
"3": 80,
"4": 60
},
"state_coverage": {
"0": [
"NSW",
"Tas",
"Vic",
"WA"
],
"1": [
"NSW",
"Tas",
"Vic",
"WA"
],
"2": [
"NSW",
"Tas",
"Vic",
"WA"
],
"3": [
"NSW",
"Tas",
"Vic",
"WA"
],
"4": [
"NSW",
"Tas",
"Vic",
"WA"
]
},
"month_coverage": {
"0": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
],
"1": [
1,
2,
5,
6,
7,
8,
9,
10,
11
],
"2": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
],
"3": [
1,
2,
4,
5,
6,
7,
8,
9,
10,
11
],
"4": [
1,
2,
5,
6,
7,
8,
9,
10,
11
]
},
"leakage_violation": false
},
"score": 1,
"folds_csv": "workspace/outputs/reports/eda/split_analytics/candidate_folds/random_kfold_folds.csv"
}
},
"scores_csv": "workspace/outputs/reports/eda/split_analytics/strategy_scores.csv",
"scores_json": "workspace/outputs/reports/eda/split_analytics/strategy_scores.json"
}
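
The recommended spatial_statewise strategy amounts to leave-one-state-out cross-validation, with the fold index equal to the state code shown in the clustering summary. A minimal sketch follows, assuming folds are assigned per image so that all target rows of an image stay in the same fold. The function name and the toy image IDs are hypothetical.

```python
from collections import Counter

# State codes as listed in the clustering summary of this report.
STATE_CODES = {"NSW": 0, "Tas": 1, "Vic": 2, "WA": 3}

def statewise_folds(image_states):
    """Map image_id -> fold index, so each fold holds out exactly one state."""
    return {image_id: STATE_CODES[state] for image_id, state in image_states.items()}

# Toy metadata (hypothetical image IDs): image_id -> state.
image_states = {
    "img_a": "NSW",
    "img_b": "Tas",
    "img_c": "Tas",
    "img_d": "Vic",
    "img_e": "WA",
}
folds = statewise_folds(image_states)
sample_counts = Counter(folds.values())

# For each fold, validate on the held-out state and train on the rest.
for fold in sorted(sample_counts):
    val_ids = [i for i, f in folds.items() if f == fold]
    train_ids = [i for i, f in folds.items() if f != fold]
    assert not set(val_ids) & set(train_ids)  # no image appears on both sides
    print(fold, len(train_ids), len(val_ids))
```

On the real data this yields validation folds of 75, 138, 112, and 32 images. The WA fold is the smallest, so its validation score is likely to be the noisiest.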