Architecture
Service architecture, components, and data flow
This page describes the internal architecture of the Plant Disease Predictor, covering how its components fit together and how data moves through the system when you submit a leaf image for analysis. Understanding the architecture helps you integrate the application correctly, diagnose unexpected behaviour, and reason about performance characteristics. The system is built around three layers: a Streamlit-based web frontend, a TensorFlow/Keras CNN inference engine, and a pre-trained model artifact backed by a class-label index.
The application is composed of the following components. Each entry describes what the component is, what it does, and what it means for you as a developer consuming the service.
1. Streamlit Frontend
Type: Web UI / entry point
The frontend is built with Streamlit and serves as the single user-facing surface of the application. It renders a file-upload widget (st.file_uploader) that accepts jpg, jpeg, and png images, a two-column result panel that shows a 150 × 150 thumbnail of your image alongside a Classify button, and a success message displaying the predicted class label once inference completes.
For API consumers embedding this service or extending it: the frontend is the boundary at which image bytes enter the system. If you are building a headless integration, this is the layer you would replace with a REST or gRPC endpoint.
2. Image Preprocessor
Type: Transformation layer
Implemented in load_and_preprocess_image() inside main.py, this component uses Pillow to open the uploaded image and NumPy to convert it into a tensor the model can consume. It performs three deterministic operations in order:
- Resize to 224 × 224 pixels (the input resolution expected by the CNN).
- Convert to a NumPy array and expand dims to shape (1, 224, 224, 3) — adding the batch dimension the model requires.
- Cast to float32 and normalise pixel values to the [0, 1] range by dividing by 255.
This normalisation must match whatever normalisation was applied during training; mismatches here are a common source of degraded prediction accuracy.
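The three operations above can be sketched as follows, using Pillow and NumPy as in main.py. The `convert("RGB")` call is an added assumption here, to guarantee three channels even for images with an alpha channel:

```python
import numpy as np
from PIL import Image

def load_and_preprocess_image(image_file, target_size=(224, 224)):
    """Turn an uploaded image into a model-ready tensor (sketch)."""
    img = Image.open(image_file).convert("RGB")   # assumption: force 3 channels
    img = img.resize(target_size)                 # 1. resize to 224 x 224
    arr = np.array(img)                           # 2a. shape (224, 224, 3)
    arr = np.expand_dims(arr, axis=0)             # 2b. shape (1, 224, 224, 3)
    return arr.astype("float32") / 255.0          # 3. normalise to [0, 1]
```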
3. CNN Inference Engine
Type: ML model runtime
The core prediction logic is a TensorFlow / Keras Sequential convolutional neural network loaded once at application startup from the .h5 model artifact. The network was trained on the PlantVillage dataset (colour images, 38 disease/healthy classes) using Conv2D layers on a GPU (Nvidia T4 via Google Colab).
At inference time the engine receives the preprocessed (1, 224, 224, 3) float32 tensor and returns a probability vector of shape (1, 38) — one softmax score per class.
4. Argmax Decoder
Type: Post-processing step
Implemented inline inside predict_image_class(), this step calls np.argmax(predictions, axis=1)[0] to select the index of the highest-confidence class from the probability vector. The index is an integer in the range [0, 37].
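A minimal sketch of the decoding step, factored into a standalone function (the real logic lives inline in predict_image_class()):

```python
import numpy as np

def decode_prediction(predictions, class_indices):
    """Map a (1, num_classes) probability vector to a label string."""
    idx = np.argmax(predictions, axis=1)[0]   # highest-confidence index
    return class_indices[str(idx)]            # JSON keys are strings
```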
5. Class Index Map (class_indices.json)
Type: Configuration / lookup artifact
A JSON file that maps integer class indices (as string keys) to human-readable disease or health-status labels such as "Tomato___healthy" or "Apple___Black_rot". The file is loaded once at startup and kept in memory for the lifetime of the process. The labels follow the PlantVillage dataset naming convention: {PlantName}___{ConditionName}.
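An illustrative two-entry fragment of the mapping and the naming convention (the real file covers all 38 classes):

```python
import json

# Hypothetical fragment in the shape of class_indices.json.
raw = '{"0": "Apple___Black_rot", "1": "Tomato___healthy"}'
class_indices = json.loads(raw)

label = class_indices[str(0)]          # integer index cast to string key
plant, condition = label.split("___")  # {PlantName}___{ConditionName}
```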
6. Model Artifact (plant_disease_prediction_model.h5)
Type: Serialised model weights
A Keras HDF5 file containing the trained CNN architecture and weights. It is loaded at process startup with tf.keras.models.load_model(). Because it is loaded into memory once and reused for every request, cold-start latency is paid only once per process lifecycle.
7. Training Pipeline (Offline)
Type: Offline / build-time component
The model was trained in a separate Google Colab notebook (PlantDiseasePrediction_CNN_ImageClassifier.ipynb) using the PlantVillage dataset (sourced from Kaggle). Training used ImageDataGenerator for on-the-fly augmentation and ran on a GPU T4 accelerator. The output of this pipeline — the .h5 file and class_indices.json — are the artifacts that the runtime system depends on. You do not need to interact with the training pipeline to consume the application.
The following traces the complete lifecycle of a single prediction request from the moment you upload an image to the moment you receive a result.
Step 1 — Image Upload
You select or programmatically provide a jpg, jpeg, or png file through the Streamlit st.file_uploader widget. The file is held in memory as a file-like object (uploaded_image); it is never written to disk by the application at this stage.
Step 2 — Display Thumbnail
Before classification, Streamlit opens the uploaded bytes with Pillow (Image.open(uploaded_image)) and renders a 150 × 150 pixel thumbnail in the left column. This is purely presentational and does not affect the image data sent to the model.
Step 3 — Trigger Inference
When you press the Classify button, predict_image_class(model, uploaded_image, class_indices) is called. Control passes to the preprocessor.
Step 4 — Preprocessing
load_and_preprocess_image() receives the same file-like object, re-opens it with Pillow, resizes it to 224 × 224, converts it to a NumPy array of shape (224, 224, 3), expands dims to (1, 224, 224, 3), and normalises pixel values to float32 in [0, 1]. The resulting tensor is returned to the caller.
Step 5 — Model Inference
The preprocessed tensor is passed to model.predict(). The already-loaded TensorFlow/Keras CNN performs a forward pass through its Conv2D layers and returns a probability array of shape (1, 38). Each position in the array corresponds to one of the 38 PlantVillage classes.
Step 6 — Decoding
np.argmax(predictions, axis=1)[0] selects the index of the maximum probability value, yielding a single integer. That integer is converted to a string key and used to look up the class label in the in-memory class_indices dictionary.
Step 7 — Result Presentation
The resolved label string (e.g. "Tomato___Late_blight") is displayed in the right column via st.success(f'Prediction: {str(prediction)}'). No confidence score or alternative predictions are surfaced in the current implementation.
Startup sequence (relevant to cold-start behaviour)
Before any request can be served, two blocking operations occur when the Python process starts:
- tf.keras.models.load_model("plant_disease_prediction_model.h5") — loads and compiles the CNN into memory.
- json.load(open("class_indices.json")) — reads the label map into a Python dict.
Both artifacts must be present in the working directory (or their configured paths) or the application will fail to start.
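One way to surface this failure mode early is a small pre-flight check; check_artifacts() is a sketch, not part of the current codebase:

```python
import os

REQUIRED_ARTIFACTS = ("plant_disease_prediction_model.h5", "class_indices.json")

def check_artifacts(workdir="."):
    """Fail fast with a clear message if a startup artifact is missing."""
    missing = [name for name in REQUIRED_ARTIFACTS
               if not os.path.exists(os.path.join(workdir, name))]
    if missing:
        raise FileNotFoundError(f"missing startup artifacts: {missing}")
```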
The following design decisions shape how the application behaves and what trade-offs you inherit as a developer integrating it.
Decision 1: Sequential CNN with Conv2D layers
A Keras Sequential model using Conv2D layers was chosen as the classification backbone. Sequential models are straightforward to define, train, and serialise, making them a practical fit for an image-classification task with a fixed input size and a closed set of 38 output classes. The trade-off is that Sequential models offer less architectural flexibility than the Functional API or subclassed models, but that flexibility is not required here.
Decision 2: Input resolution fixed at 224 × 224
All images are resized to 224 × 224 pixels before inference. This resolution is a common convention for CNN image classifiers (aligned with architectures such as VGG and MobileNet) and balances spatial detail against computational cost. Fixing the input resolution means the model's weights are valid only for this size; if you supply images at a different resolution, the preprocessor silently resizes them, which may discard fine-grained detail in very high-resolution photographs.
Decision 3: Pixel normalisation to [0, 1]
Pixel values are divided by 255 to produce float32 values in [0, 1]. This normalisation range was chosen to match the convention used during training. Using a different normalisation at inference time (e.g. [-1, 1]) would silently produce incorrect predictions without raising an error, so this value is effectively a contract between the training pipeline and the runtime.
Decision 4: Model and class map loaded once at startup
Both plant_disease_prediction_model.h5 and class_indices.json are loaded into memory when the Streamlit process starts, rather than on each request. This eliminates per-request disk I/O and model deserialisation overhead, making individual inference calls faster. The cost is a longer cold-start time and higher baseline memory usage for the process.
Decision 5: Streamlit as the deployment framework
Streamlit was chosen for its low-friction path from a Python ML script to a browser-accessible UI, and for its compatibility with Docker-based deployment. It requires no separate frontend codebase. The trade-off is that Streamlit is not a conventional REST API framework; it is not designed for high-concurrency or headless programmatic access, which limits its suitability as a backend service in a microservices architecture without modification.
Decision 6: PlantVillage dataset (colour variant)
The model was trained exclusively on the colour variant of the PlantVillage dataset, ignoring the segmented and grayscale variants that are also present in the dataset. Colour images provide richer feature signals (e.g. yellowing, browning) relevant to disease identification. Segmented images would have been an alternative but introduce a dependency on a prior segmentation step that is not available at inference time.
The following known limitations and alternatives are important to understand before integrating or extending the application.
Limitation 1: No REST API surface
The application exposes its functionality exclusively through a Streamlit UI. There is no HTTP endpoint you can call programmatically with a raw image and receive a JSON response. If you need machine-to-machine integration, you must either wrap the inference logic in a separate Flask/FastAPI service or invoke the Streamlit interface programmatically — neither of which is supported out of the box.
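If you do wrap the inference logic yourself, the handler body might look like the following framework-agnostic sketch; classify_bytes() is hypothetical, and `model` stands in for any object with a Keras-style .predict(). You would mount this behind a Flask or FastAPI route:

```python
import io
import numpy as np
from PIL import Image

def classify_bytes(image_bytes, model, class_indices):
    """Hypothetical endpoint body: raw image bytes in, JSON-ready dict out."""
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB").resize((224, 224))
    tensor = np.expand_dims(np.asarray(img), axis=0).astype("float32") / 255.0
    preds = model.predict(tensor)              # shape (1, 38) in the real app
    idx = int(np.argmax(preds, axis=1)[0])
    return {"class": class_indices[str(idx)],  # JSON-serialisable result
            "confidence": float(preds[0, idx])}
```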
Limitation 2: No confidence score returned
The current implementation returns only the top-1 predicted class label. The full probability vector from model.predict() is computed but discarded after argmax. You cannot currently distinguish a high-confidence prediction (e.g. 98%) from a low-confidence one (e.g. 34%), which matters for downstream decision-making in agricultural contexts where a wrong recommendation has real consequences.
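Since the full probability vector is already computed, exposing confidence is a small change. A sketch of a top-k decoder (top_k() is a suggested extension, not existing code):

```python
import numpy as np

def top_k(predictions, class_indices, k=3):
    """Return the k most probable labels with their softmax scores,
    instead of discarding everything but the argmax."""
    probs = predictions[0]                 # drop the batch dimension
    order = np.argsort(probs)[::-1][:k]    # highest score first
    return [(class_indices[str(i)], float(probs[i])) for i in order]
```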
Limitation 3: Closed class set (38 classes)
The model can only classify diseases and health states present in the PlantVillage training dataset. If you submit an image of a plant species or disease not represented in those 38 classes, the model will still return one of the 38 labels — potentially with high confidence — rather than indicating an out-of-distribution input. There is no rejection or uncertainty threshold implemented.
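A minimal sketch of what a rejection rule could look like. The 0.6 threshold is an arbitrary placeholder that would need calibration on held-out data, and a fixed softmax threshold is a weak out-of-distribution detector: unfamiliar inputs can still score above it.

```python
import numpy as np

def predict_or_reject(predictions, class_indices, threshold=0.6):
    """Illustrative rejection rule: treat a weak top-1 score as unknown."""
    idx = int(np.argmax(predictions, axis=1)[0])
    confidence = float(predictions[0, idx])
    if confidence < threshold:
        return None, confidence            # reject as out-of-scope
    return class_indices[str(idx)], confidence
```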
Limitation 4: Single-process, single-threaded inference
Streamlit runs as a single Python process. The TensorFlow model runs synchronously inside the request handler. Under concurrent usage, requests will queue rather than be processed in parallel. For production workloads with multiple simultaneous users, you would need to move inference behind a dedicated serving layer (e.g. TensorFlow Serving, Triton, or a multi-worker FastAPI deployment).
Limitation 5: Input image quality not validated
The preprocessor will accept and resize any image that Pillow can open, regardless of whether it contains a leaf, is blurred, or is entirely irrelevant. There is no input validation, quality check, or pre-screening step. Garbage-in / garbage-out applies directly.
Alternative not chosen: Grayscale or segmented images
The PlantVillage dataset includes grayscale and segmented variants. Training on segmented images could improve robustness to background clutter, but would require a segmentation preprocessing step at inference time that is not currently implemented. Grayscale images would reduce input dimensionality and model size but sacrifice the colour cues that are diagnostically significant for plant diseases.
Alternative not chosen: Transfer learning from a pre-trained backbone
The model is trained from scratch using a custom Sequential CNN. Using a pre-trained backbone (e.g. MobileNetV2 or EfficientNet with ImageNet weights) would typically yield higher accuracy with less training data and shorter training time, but was not the approach taken here. If you need higher accuracy, replacing the backbone with a pre-trained model through transfer learning is the most impactful architectural change available.
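A sketch of what that replacement could look like with a MobileNetV2 backbone. This is not the project's training code; note also that MobileNetV2 has its own preprocess_input convention, so the [0, 1] normalisation contract would need revisiting. Pass weights="imagenet" in practice; None here keeps the sketch runnable offline:

```python
import tensorflow as tf

def build_transfer_model(num_classes=38):
    """Transfer-learning sketch: frozen MobileNetV2 backbone + fresh head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=None)
    base.trainable = False                 # freeze the pre-trained backbone
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```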