Architecture
Service architecture, components, and data flow
This page describes the internal architecture of the Plant Disease Predictor, covering how its components fit together and how data moves through the system when you submit a leaf image for analysis. Understanding the architecture helps you integrate the application correctly, diagnose unexpected behaviour, and reason about performance characteristics. The system is built around three layers: a Streamlit-based web frontend, a TensorFlow/Keras CNN inference engine, and a pre-trained model artifact backed by a class-label index.
The application is composed of the following components. Each entry describes what the component is, what it does, and what it means for you as a developer consuming the service.
1. Streamlit Frontend
Type: Web UI / entry point
The frontend is built with Streamlit and serves as the single user-facing surface of the application. It renders a file-upload widget (st.file_uploader) that accepts jpg, jpeg, and png images, a two-column result panel that shows a 150 × 150 thumbnail of your image alongside a Classify button, and a success message displaying the predicted class label once inference completes.
For API consumers embedding this service or extending it: the frontend is the boundary at which image bytes enter the system. If you are building a headless integration, this is the layer you would replace with a REST or gRPC endpoint.
2. Image Preprocessor
Type: Transformation layer
Implemented in load_and_preprocess_image() inside main.py, this component uses Pillow to open the uploaded image and NumPy to convert it into a tensor the model can consume. It performs three deterministic operations in order:
- Resize to 224 × 224 pixels (the input resolution expected by the CNN).
- Convert to a NumPy array and expand dims to shape (1, 224, 224, 3) — adding the batch dimension the model requires.
- Cast to float32 and normalise pixel values to the [0, 1] range by dividing by 255.
This normalisation must match whatever normalisation was applied during training; mismatches here are a common source of degraded prediction accuracy.
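The three operations above can be sketched as follows, using Pillow and NumPy as in main.py. The `convert("RGB")` call is an added assumption here, to guarantee three channels even for images with an alpha channel:

```python
import numpy as np
from PIL import Image

def load_and_preprocess_image(image_file, target_size=(224, 224)):
    """Turn an uploaded image into a model-ready tensor (sketch)."""
    img = Image.open(image_file).convert("RGB")   # assumption: force 3 channels
    img = img.resize(target_size)                 # 1. resize to 224 x 224
    arr = np.array(img)                           # 2a. shape (224, 224, 3)
    arr = np.expand_dims(arr, axis=0)             # 2b. shape (1, 224, 224, 3)
    return arr.astype("float32") / 255.0          # 3. normalise to [0, 1]
```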
3. CNN Inference Engine
Type: ML model runtime
The core prediction logic is a TensorFlow / Keras Sequential convolutional neural network loaded once at application startup from the .h5 model artifact. The network was trained on the PlantVillage dataset (colour images, 38 disease/healthy classes) using Conv2D layers on a GPU (Nvidia T4 via Google Colab).
At inference time the engine receives the preprocessed (1, 224, 224, 3) float32 tensor and returns a probability vector of shape (1, 38) — one softmax score per class.
4. Argmax Decoder
Type: Post-processing step
Implemented inline inside predict_image_class(), this step calls np.argmax(predictions, axis=1)[0] to select the index of the highest-confidence class from the probability vector. The index is an integer in the range [0, 37].
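A minimal sketch of the decoding step, factored into a standalone function (the real logic lives inline in predict_image_class()):

```python
import numpy as np

def decode_prediction(predictions, class_indices):
    """Map a (1, num_classes) probability vector to a label string."""
    idx = np.argmax(predictions, axis=1)[0]   # highest-confidence index
    return class_indices[str(idx)]            # JSON keys are strings
```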
5. Class Index Map (class_indices.json)
Type: Configuration / lookup artifact
A JSON file that maps integer class indices (as string keys) to human-readable disease or health-status labels such as "Tomato___healthy" or "Apple___Black_rot". The file is loaded once at startup and kept in memory for the lifetime of the process. The labels follow the PlantVillage dataset naming convention: {PlantName}___{ConditionName}.
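An illustrative two-entry fragment of the mapping and the naming convention (the real file covers all 38 classes):

```python
import json

# Hypothetical fragment in the shape of class_indices.json.
raw = '{"0": "Apple___Black_rot", "1": "Tomato___healthy"}'
class_indices = json.loads(raw)

label = class_indices[str(0)]          # integer index cast to string key
plant, condition = label.split("___")  # {PlantName}___{ConditionName}
```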
6. Model Artifact (plant_disease_prediction_model.h5)
Type: Serialised model weights
A Keras HDF5 file containing the trained CNN architecture and weights. It is loaded at process startup with tf.keras.models.load_model(). Because it is loaded into memory once and reused for every request, cold-start latency is paid only once per process lifecycle.
7. Training Pipeline (Offline)
Type: Offline / build-time component
The model was trained in a separate Google Colab notebook (PlantDiseasePrediction_CNN_ImageClassifier.ipynb) using the PlantVillage dataset (sourced from Kaggle). Training used ImageDataGenerator for on-the-fly augmentation and ran on a GPU T4 accelerator. The output of this pipeline — the .h5 file and class_indices.json — are the artifacts that the runtime system depends on. You do not need to interact with the training pipeline to consume the application.
The following traces the complete lifecycle of a single prediction request from the moment you upload an image to the moment you receive a result.
Step 1 — Image Upload
You select or programmatically provide a jpg, jpeg, or png file through the Streamlit st.file_uploader widget. The file is held in memory as a file-like object (uploaded_image); it is never written to disk by the application at this stage.
Step 2 — Display Thumbnail
Before classification, Streamlit opens the uploaded bytes with Pillow (Image.open(uploaded_image)) and renders a 150 × 150 pixel thumbnail in the left column. This is purely presentational and does not affect the image data sent to the model.
Step 3 — Trigger Inference
When you press the Classify button, predict_image_class(model, uploaded_image, class_indices) is called. Control passes to the preprocessor.
Step 4 — Preprocessing
load_and_preprocess_image() receives the same file-like object, re-opens it with Pillow, resizes it to 224 × 224, converts it to a NumPy array of shape (224, 224, 3), expands dims to (1, 224, 224, 3), and normalises pixel values to float32 in [0, 1]. The resulting tensor is returned to the caller.
Step 5 — Model Inference
The preprocessed tensor is passed to model.predict(). The already-loaded TensorFlow/Keras CNN performs a forward pass through its Conv2D layers and returns a probability array of shape (1, 38). Each position in the array corresponds to one of the 38 PlantVillage classes.
Step 6 — Decoding
np.argmax(predictions, axis=1)[0] selects the index of the maximum probability value, yielding a single integer. That integer is converted to a string key and used to look up the class label in the in-memory class_indices dictionary.
Step 7 — Result Presentation
The resolved label string (e.g. "Tomato___Late_blight") is displayed in the right column via st.success(f'Prediction: {str(prediction)}'). No confidence score or alternative predictions are surfaced in the current implementation.
Startup sequence (relevant to cold-start behaviour)
Before any request can be served, two blocking operations occur when the Python process starts:
- tf.keras.models.load_model("plant_disease_prediction_model.h5") — loads and compiles the CNN into memory.
- json.load(open("class_indices.json")) — reads the label map into a Python dict.
Both artifacts must be present in the working directory (or their configured paths) or the application will fail to start.
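One way to surface this failure mode early is a small pre-flight check; check_artifacts() is a sketch, not part of the current codebase:

```python
import os

REQUIRED_ARTIFACTS = ("plant_disease_prediction_model.h5", "class_indices.json")

def check_artifacts(workdir="."):
    """Fail fast with a clear message if a startup artifact is missing."""
    missing = [name for name in REQUIRED_ARTIFACTS
               if not os.path.exists(os.path.join(workdir, name))]
    if missing:
        raise FileNotFoundError(f"missing startup artifacts: {missing}")
```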
The following design decisions shape how the application behaves and what trade-offs you inherit as a developer integrating it.
Decision 1: Sequential CNN with Conv2D layers
A Keras Sequential model using Conv2D layers was chosen as the classification backbone. Sequential models are straightforward to define, train, and serialise, making them a practical fit for an image-classification task with a fixed input size and a closed set of 38 output classes. The trade-off is that Sequential models offer less architectural flexibility than the Functional API or subclassed models, but that flexibility is not required here.
Decision 2: Input resolution fixed at 224 × 224
All images are resized to 224 × 224 pixels before inference. This resolution is a common convention for CNN image classifiers (aligned with architectures such as VGG and MobileNet) and balances spatial detail against computational cost. Fixing the input resolution means the model's weights are valid only for this size; if you supply images at a different resolution, the preprocessor silently resizes them, which may discard fine-grained detail in very high-resolution photographs.
Decision 3: Pixel normalisation to [0, 1]
Pixel values are divided by 255 to produce float32 values in [0, 1]. This normalisation range was chosen to match the convention used during training. Using a different normalisation at inference time (e.g. [-1, 1]) would silently produce incorrect predictions without raising an error, so this value is effectively a contract between the training pipeline and the runtime.
Decision 4: Model and class map loaded once at startup
Both plant_disease_prediction_model.h5 and class_indices.json are loaded into memory when the Streamlit process starts, rather than on each request. This eliminates per-request disk I/O and model deserialisation overhead, making individual inference calls faster. The cost is a longer cold-start time and higher baseline memory usage for the process.
Decision 5: Streamlit as the deployment framework
Streamlit was chosen for its low-friction path from a Python ML script to a browser-accessible UI, and for its compatibility with Docker-based deployment. It requires no separate frontend codebase. The trade-off is that Streamlit is not a conventional REST API framework; it is not designed for high-concurrency or headless programmatic access, which limits its suitability as a backend service in a microservices architecture without modification.
Decision 6: PlantVillage dataset (colour variant)
The model was trained exclusively on the colour variant of the PlantVillage dataset, ignoring the segmented and grayscale variants that are also present in the dataset. Colour images provide richer feature signals (e.g. yellowing, browning) relevant to disease identification. Segmented images would have been an alternative but introduce a dependency on a prior segmentation step that is not available at inference time.
The following known limitations and alternatives are important to understand before integrating or extending the application.
Limitation 1: No REST API surface
The application exposes its functionality exclusively through a Streamlit UI. There is no HTTP endpoint you can call programmatically with a raw image and receive a JSON response. If you need machine-to-machine integration, you must either wrap the inference logic in a separate Flask/FastAPI service or invoke the Streamlit interface programmatically — neither of which is supported out of the box.
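If you do wrap the inference logic yourself, the handler body might look like the following framework-agnostic sketch; classify_bytes() is hypothetical, and `model` stands in for any object with a Keras-style .predict(). You would mount this behind a Flask or FastAPI route:

```python
import io
import numpy as np
from PIL import Image

def classify_bytes(image_bytes, model, class_indices):
    """Hypothetical endpoint body: raw image bytes in, JSON-ready dict out."""
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB").resize((224, 224))
    tensor = np.expand_dims(np.asarray(img), axis=0).astype("float32") / 255.0
    preds = model.predict(tensor)              # shape (1, 38) in the real app
    idx = int(np.argmax(preds, axis=1)[0])
    return {"class": class_indices[str(idx)],  # JSON-serialisable result
            "confidence": float(preds[0, idx])}
```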
Limitation 2: No confidence score returned
The current implementation returns only the top-1 predicted class label. The full probability vector from model.predict() is computed but discarded after argmax. You cannot currently distinguish a high-confidence prediction (e.g. 98%) from a low-confidence one (e.g. 34%), which matters for downstream decision-making in agricultural contexts where a wrong recommendation has real consequences.
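Since the full probability vector is already computed, exposing confidence is a small change. A sketch of a top-k decoder (top_k() is a suggested extension, not existing code):

```python
import numpy as np

def top_k(predictions, class_indices, k=3):
    """Return the k most probable labels with their softmax scores,
    instead of discarding everything but the argmax."""
    probs = predictions[0]                 # drop the batch dimension
    order = np.argsort(probs)[::-1][:k]    # highest score first
    return [(class_indices[str(i)], float(probs[i])) for i in order]
```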
Limitation 3: Closed class set (38 classes)
The model can only classify diseases and health states present in the PlantVillage training dataset. If you submit an image of a plant species or disease not represented in those 38 classes, the model will still return one of the 38 labels — potentially with high confidence — rather than indicating an out-of-distribution input. There is no rejection or uncertainty threshold implemented.
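A minimal sketch of what a rejection rule could look like. The 0.6 threshold is an arbitrary placeholder that would need calibration on held-out data, and a fixed softmax threshold is a weak out-of-distribution detector: unfamiliar inputs can still score above it.

```python
import numpy as np

def predict_or_reject(predictions, class_indices, threshold=0.6):
    """Illustrative rejection rule: treat a weak top-1 score as unknown."""
    idx = int(np.argmax(predictions, axis=1)[0])
    confidence = float(predictions[0, idx])
    if confidence < threshold:
        return None, confidence            # reject as out-of-scope
    return class_indices[str(idx)], confidence
```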
Limitation 4: Single-process, single-threaded inference
Streamlit runs as a single Python process. The TensorFlow model runs synchronously inside the request handler. Under concurrent usage, requests will queue rather than be processed in parallel. For production workloads with multiple simultaneous users, you would need to move inference behind a dedicated serving layer (e.g. TensorFlow Serving, Triton, or a multi-worker FastAPI deployment).
Limitation 5: Input image quality not validated
The preprocessor will accept and resize any image that Pillow can open, regardless of whether it contains a leaf, is blurred, or is entirely irrelevant. There is no input validation, quality check, or pre-screening step. Garbage-in / garbage-out applies directly.
Alternative not chosen: Grayscale or segmented images
The PlantVillage dataset includes grayscale and segmented variants. Training on segmented images could improve robustness to background clutter, but would require a segmentation preprocessing step at inference time that is not currently implemented. Grayscale images would reduce input dimensionality and model size but sacrifice the colour cues that are diagnostically significant for plant diseases.
Alternative not chosen: Transfer learning from a pre-trained backbone
The model is trained from scratch using a custom Sequential CNN. Using a pre-trained backbone (e.g. MobileNetV2 or EfficientNet with ImageNet weights) would typically yield higher accuracy with less training data and shorter training time, but was not the approach taken here. If you need higher accuracy, replacing the backbone with a pre-trained model through transfer learning is the most impactful architectural change available.
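A sketch of what that replacement could look like with a MobileNetV2 backbone. This is not the project's training code; note also that MobileNetV2 has its own preprocess_input convention, so the [0, 1] normalisation contract would need revisiting. Pass weights="imagenet" in practice; None here keeps the sketch runnable offline:

```python
import tensorflow as tf

def build_transfer_model(num_classes=38):
    """Transfer-learning sketch: frozen MobileNetV2 backbone + fresh head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=None)
    base.trainable = False                 # freeze the pre-trained backbone
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```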