Work in Progress

Project: Plum Process

Universal Image Processing Scripts

Where the idea came from: Whilst getting out of a swamp, I called Miss Plum, and whilst we sorted out the logistics of getting "Homer" out of the swamp, the idea to collate my loose scripts into a repo came to my mind. Hence the name: Plum Process

Aims of the Repo

A collection of image processing scripts with the target of automating image analysis for Petabyte sized datasets

Repo Sub Aim: Stitching for small images and splitting for large images

The repo Contains several collision detection algorithms and annotation quality checkers

Designed to be as flexible as possible, allowing scripts to be integrated across several vision projects, speeding up development time and working as boilerplate code for computer vision applications

(Thumbnails are references to personal events based around what I was doing when I was developing these scripts)

Algorithms within Repo

Gilbert–Johnson–Keerthi Collision Algorithm 3D
(gjk.py)

Purpose of the Code

Collision Detection: The primary goal is to determine if two convex 3D shapes overlap.
Distance Calculation: Unlike simple boolean collision checks, this implementation computes the minimum Euclidean distance between non-intersecting shapes.
Closest Point Identification: It identifies the specific pair of points (one on each object) that are closest to each other.
Generic Solver: It uses the GJK (Gilbert–Johnson–Keerthi) algorithm, which allows it to handle any convex shape (cubes, spheres, capsules, convex hulls) as long as a "support function" (a way to find the furthest point in a direction) is provided.

--------------------------------------------------------------------

Algorithmic Breakdown

The core logic resides in the gjk function, which iteratively builds a geometric shape (a simplex) inside the Minkowski Difference of the two objects.

1. Initialization (polyhedron_intersect_*)

Support Setup: A GJKSupport object is initialized with arbitrary starting points from both shapes.
Simplex Setup: A GJKSimplex container is created to hold vertices (up to 4) that define the current search space (point, line, triangle, or tetrahedron).

2. The GJK Iteration Loop (gjk function)

The function gjk is called repeatedly. In each pass, it attempts to get closer to the origin (0,0,0) within the Minkowski Difference configuration space.

Step A: Generate Support Point
- It calculates a search direction vector based on the previous iteration.
- It finds the "support point"—the vertex furthest along that direction—and adds it to the current simplex.
- Significance: If the new support point doesn't pass the origin along the search direction, the objects definitely do not intersect, and the algorithm terminates early.
Step B: Reduce Simplex (The if s.cnt == ... blocks) This is the heaviest part of the code. It analyzes the current simplex to see which feature is closest to the origin.
- Line Case (cnt=2): It checks if the origin lies between the two points or "behind" one of them. It discards the vertex furthest away.
- Triangle Case (cnt=3): It uses cross products to determine if the closest point to the origin lies on the triangle face or on one of its edges. It discards vertices that don't contribute to the closest feature.
- Tetrahedron Case (cnt=4): It checks the voronoi regions of the tetrahedron faces.
  - If the origin is inside the tetrahedron, it sets s.hit = True (Collision detected).
  - If outside, it identifies which face the origin is closest to, discards the vertex not part of that face, and sets the new search direction.
Step C: Compute Closest Point & New Direction
- It calculates the actual coordinates of the closest point (pnt) on the current simplex using barycentric coordinates (bc).
- It checks convergence: if the distance squared (d2) hasn't improved significantly, the algorithm assumes it has found the minimum distance and stops.
- New Direction: The new search direction is set to the vector from the closest point towards the origin (-pnt).

3. Result Analysis (gjk_analyze)

Once the loop terminates (either by finding a hit or converging on a minimum distance):

Barycentric Interpolation: It uses the cached barycentric coordinates (bc) to map the abstract simplex result back to real-world coordinates.
Point Reconstruction: It reconstructs the specific contact points p0 (on object A) and p1 (on object B).

4. Quadric Adjustment (gjk_quad)

Radius Handling: GJK works on "hard" vertices. To support shapes like spheres or capsules, the code treats them as points or lines first.
Surface Projection: After the core GJK finishes, this function subtracts the radius of the sphere/capsule from the calculated distance to get the true surface-to-surface distance.

Shamos-Hoey Algorithm (Validity of segmentation.py)

Purpose of the Code

Polygon Validation: Checks if a list of vertices forms a valid, non-self-intersecting shape.
Intersection Localization: If an intersection exists, it calculates the exact (x, y) coordinate where the edges cross.
High Performance: Optimized for handling polygons with many vertices using a sorted "active list" structure.

--------------------------------------------------------------------

Algorithmic Breakdown

The algorithm visualizes a vertical line sweeping across the plane from left to right.

1. Input & Preprocessing

Segment Creation: The list of (x, y) points is converted into a list of edges (Segments).
Normalization: Each segment is ordered left-to-right (if p1.x > p2.x, the points are swapped) to simplify logic.

2. Event Queue Generation

Instead of checking the entire shape at once, the algorithm breaks the problem down into "Events":

Left Event (Start): The sweep line hits the start of a segment.
Right Event (End): The sweep line hits the end of a segment.
Sorting: All events are sorted by their x-coordinate. The sweep line processes them in this order.

3. The Sweep Line Loop (_process_events)

The code iterates through the sorted events, maintaining a Status Structure (SweepLineStatus) containing all segments currently intersecting the vertical sweep line, sorted by their y-coordinate.

When hitting a LEFT Event (Segment Start):
1. Add the segment to the Status structure.
2. Check for intersection with its immediate neighbors (the segment directly above and directly below it in the Status list).
When hitting a RIGHT Event (Segment End):
1. Remove the segment from the Status structure.
2. Check for intersection between the new neighbors (the segments that were previously separated by the removed segment).

4. Intersection Logic (_segments_intersect)

CCW Check: Uses the cross product (Counter-Clockwise check) to determine orientation without expensive trigonometry.
Adjacency Filter: Explicitly ignores intersections between segments that share a vertex (e.g., segment AB and BC touch at B, which is valid and not a self-intersection).
Epsilon: Uses a small epsilon value (1e-9) to handle floating-point inaccuracies.

5. Termination

Early Exit: As soon as any intersection is found, the function halts and returns True (plus the point).
Completion: If the event queue is emptied with no intersections found, the polygon is confirmed as simple.

Gilbert–Johnson–Keerthi Collision Algorithm 2D
(gjk_nesterov_2d.py)

Purpose of the Code

High-Performance Collision Check: Uses numba (JIT compilation) to achieve C-like speeds in Python.
Exact Distance: Computes the precise Euclidean distance between non-overlapping shapes, not just a boolean "hit/miss".
Convergence Acceleration: Uses Nesterov Momentum to modify the search direction. Standard GJK can "zigzag" as it approaches the closest points; Nesterov acceleration smooths this path, reducing the number of iterations required to reach a specific tolerance.

Algorithmic Breakdown

The algorithm operates in Minkowski Difference Space. It tries to find the point in the Minkowski difference (A - B) that is closest to the origin (0,0).

1. Setup & Support Functions

Instead of passing raw geometry (lists of vertices), the algorithm takes Support Functions.

A support function takes a direction vector vec{d} and returns the vertex of the shape furthest in that direction.
Benefit: This abstraction allows the solver to handle circles, ellipses, boxes, and polygons uniformly without changing the core logic.

2. The Main Loop (gjk_nesterov_accelerated_2d)

The solver iterates until it finds the origin or the distance converges.

Step A: Momentum Update (The "Nesterov" part)
- Standard GJK sets the next search direction strictly based on the current closest point.
- This implementation: Calculates a weighted average of the current search direction and the previous one (momentum * ray + ...).
- Effect: It predicts where the search is heading, dampening oscillations and converging faster.
Step B: Simplex Expansion
- It calculates a new support point s = support1(-d) - support2(d).
- It checks the Frank-Wolfe duality gap. If the gap is small enough, the algorithm knows it has reached the optimal distance and terminates early.
Step C: Simplex Projection (Reduction)
The algorithm maintains a "Simplex" (a set of 1 to 3 points) and constantly simplifies it to keep only the features closest to the origin.
- Line Case (project_line_origin_2d): Projects the origin onto the line segment. If the projection is on the segment, the simplex stays a line. If it's off the ends, the simplex reduces to a single point.
- Triangle Case (project_triangle_origin_2d):
  - It checks if the origin is inside the triangle. If yes, inside = True (Collision detected).
  - If no, it identifies which edge is closest to the origin, discards the vertex opposite to it, and downgrades the simplex to a line.

3. Geometric Primitives (numba optimized)

The heavy lifting of vector math is offloaded to small, JIT-compiled functions:

triple_product_2d: Used to find perpendicular vectors (normals) in 2D.
cross_2d: Standard 2D cross product to check winding order/orientation.
origin_inside_triangle_2d: The "win condition" for intersection.

Key Features Summary

@numba.njit Decorators: All low-level math functions are compiled to machine code, bypassing the slow Python interpreter loop.
Implicit Geometry: Shapes are defined mathematically (centers/radii) rather than just as mesh data, saving memory.
Robustness: Includes an inflation and tolerance parameter to handle floating-point inaccuracies, ensuring the loop doesn't run forever when shapes are barely touching.

Example Usage Logic

Define Shapes: You wrap your polygon data or bounding boxes in the provided _support functions (e.g., polygon_support(verts)).
Run Solver: Call gjk_nesterov_accelerated_2d_intersection(poly_a, poly_b).
Pipeline Integration:
- If True (Intersecting): You might calculate the IOU (Intersection over Union) or use a depth heuristic to decide which polygon occludes the other.
- If False (Disjoint): The distance return value tells you how close they are, which is useful for clustering or tracking objects across frames.

AKAZE feature matching algorithm (akaze_feature_matching.py)

Purpose of the Code

Feature Extraction: Automatically detects distinct features in images using the AKAZE method.
Normalization Study: It allows users to run detection on both "raw" images and "normalized" images (using percentile-based normalization) to see which yields better features.
Dataset Generation: It converts complex OpenCV objects (KeyPoint) into structured Pandas DataFrames and CSV files, making the feature data ready for Machine Learning or statistical analysis.
Visual Debugging: It generates diagnostic plots showing exactly where the algorithm "looked" and found features.

Algorithmic Breakdown

The class ImageFeatureMatcher encapsulates the entire workflow for a single image.

1. Preprocessing & Normalization (_norm_img, _prepare_gray_image)

Computer vision feature detectors generally operate on grayscale images.

Original Path: Simply converts the BGR image to Grayscale.
Normalized Path: Uses csbdeep.utils.normalize (a library often used in microscopy/bio-imaging) to normalize pixel intensities between the 1st and 99.8th percentiles. This improves contrast and robustness against lighting variations.

2. Feature Detection & Description (keypoints_and_descriptors)

Detector: It initializes an AKAZE (Accelerated-KAZE) detector. AKAZE is a fast, nonlinear scale-space detector that finds features that are robust to scaling and rotation.
Compute: It runs detectAndCompute, which performs two tasks:
1. Detect: Finds (x, y) coordinates of interesting points.
2. Describe: Generates a binary vector (descriptor) describing the texture/gradient around that point.

3. Data Serialization (keypoints_to_dataframe)

OpenCV returns C++ style objects that are hard to analyze in Python. This step flattens them:

Keypoint Attributes: Extracts x, y, size (diameter of the meaningful area), angle (orientation), and response (strength of the feature).
Descriptor Unpacking: The 61-byte (or similar) binary descriptor array is split into individual columns (descriptor_0, descriptor_1, ...). This creates a "tabular" representation of the visual features.

4. Visualization (plot_features)

It draws the keypoints onto the image using cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS.
Rich Keypoints: Instead of just dots, this draws circles corresponding to the size of the feature and lines indicating the orientation. This allows you to visually verify if the algorithm is tracking stable structures (like corners or blobs).

5. Orchestration (analyze_and_save_all_data)

This wrapper runs the entire pipeline twice:

Once for the Normalized version.
Once for the Original version.
It saves CSVs and Plots for both, and finally merges them into a massive "Master CSV" for the image.

Key Components

cv2.AKAZE_create(): The specific algorithm used. AKAZE is known for being computationally efficient and good at preserving edges compared to Gaussian-based methods like SIFT.
csbdeep: A specialized library for deep learning in microscopy. Its presence suggests this code might be intended for scientific or medical images where contrast normalization is critical.
Pandas Integration: The code invests heavily in converting unstructured vision data into structured DataFrames, facilitating downstream analysis (e.g., "Do normalized images yield larger keypoints on average?").

Graham Scan algorithm
(graham_scan_convex_algo.py)

Algorithmic Breakdown

The GrahamScan class orchestrates a pipeline of operations:

1. Preprocessing (The Preprocessor pipeline)

Before the main algorithm starts, the data is cleaned to improve speed and stability.

Interior Filtering (SweepLineFilter): Identifies the 4 extreme points (min/max X and Y). It forms a quadrilateral connecting them and discards all points inside, as they cannot possibly be on the hull. This drastically reduces the dataset size.
Coordinate Compression (CoordinateCompressor): Maps complex floating-point coordinates to integer indices based on a tolerance. This prevents "jitter" where two points are mathematically distinct but computationally identical.

2. Geometric Calculation (PointCloud)

Instead of calculating angles one by one, the code uses Parallelized Numba Functions:

Pivot Selection: Finds the point with the lowest Y coordinate (and lowest X if there's a tie).
Polar Angles (_parallel_polar_angles): Computes the angle of every point relative to the pivot using arctan2. This runs on multiple threads simultaneously.
Distances (_parallel_distances_sq): Computes squared distances to handle collinear points (points lying on the same line).

3. Sorting (SortStrategy)

The Graham Scan requires points to be processed in counter-clockwise order.

ArraySortStrategy: Standard fast sort. Used for normal datasets.
GeneratorSortStrategy: Used for massive datasets. Instead of creating a huge sorted array in memory, it lazily yields indices one by one.

4. The Stack Traversal (_graham_stack)

This is the core of the Graham Scan, JIT-compiled for speed.

It iterates through the sorted points.
It maintains a stack of "hull candidates".
The Left Turn Check: For every new point, it checks the cross product of the last two points.
- If the turn is Clockwise (Right), the previous point creates a concavity and is popped (removed).
- If the turn is Counter-Clockwise (Left), the point is added.

Key Features & Optimizations

Python 3.14 Free-Threading Support: The ExecutorFactory checks sys._is_gil_enabled(). If the Global Interpreter Lock (GIL) is disabled (a feature coming in Python 3.14), it uses ThreadPoolExecutor for true parallelism. Otherwise, it falls back to ProcessPoolExecutor.
Numba prange: The decorators @njit(parallel=True) and loops using prange allow the CPU to process the mathematical heavy lifting (angle/distance calculations) across all available cores automatically.
Strategy Pattern: The code uses Object-Oriented patterns to swap behaviors. For example, you can switch between ArraySortStrategy and GeneratorSortStrategy via configuration without changing the core logic.
FastMath: The @njit(fastmath=True) flag relaxes strict IEEE 754 floating-point compliance, allowing the compiler to use faster CPU instructions (like AVX) for trigonometric functions.

Interactive Bounding Box Heatmap Visualizer
(heatmap_visuals)

Purpose of the Code

Spatial Distribution Analysis: Visualizes where objects appear most frequently on the image canvas.
Dataset Debugging: Helps identify biases (e.g., a self-driving car dataset where pedestrians only appear on the right side).
Interactive Exploration: Provides a GUI (built on Plotly) to toggle layers, change color schemes, and adjust smoothing levels dynamically without re-running the script.

Algorithmic Breakdown

The "algorithm" is distributed across four modules that form a pipeline: Data Modeling -> Processing -> Configuration -> Visualization.

1. Data Ingestion (heatmap_models.py)

BoundingBox Class: A dataclass that accepts raw coordinates (x0, y0, x1, y1).
Centroid Calculation: It automatically computes the geometric center (center_x, center_y) of each box. These center points are the primary input for the density estimation.

2. Density Estimation (heatmap_processors.py)

This module transforms discrete box coordinates into a continuous density field.

Binning (Histogram): It divides the canvas into a grid. The grid resolution is determined by the bin_divisor config setting. It uses numpy.histogram2d to count how many box centers fall into each grid cell.
Gaussian Smoothing: To convert the blocky histogram into a smooth heatmap, it applies a Gaussian Filter (scipy.ndimage.gaussian_filter). This spreads the "heat" of a single point to neighboring cells, creating a continuous probability gradient.

3. Configuration Management (heatmap_config_loader.py & .yaml)

Centralized Control: Instead of hardcoding visual settings, all parameters (canvas size, colors, opacity, UI layout) are stored in heatmap_config.yaml.
Loader: The Config class wraps dictionary access, abstracting the YAML loading process.

4. Visualization Construction (heatmap_viz.py)

This is the orchestrator that builds the Plotly figure using the processed data.

Layering: It constructs the visualization in three distinct layers:
- Heatmap Trace: The bottom layer showing the smoothed density field (z data).
- Scatter Trace: Dots representing the exact centers of specific boxes.
- Shapes: Wireframe rectangles representing the actual bounding boxes, generated by the BoxShapeFactory.
Interactive Menus (MenuBuilder):
- Visibility: Uses Plotly's update method to toggle the visibility of the scatter points and shape layers.
- Smoothing: Uses the restyle method to swap the Heatmap's z data with pre-computed arrays of different smoothing levels (Low/Medium/High).

Key Components

BoxShapeFactory (heatmap_viz.py): A factory class that converts the data-centric BoundingBox objects into visual Plotly shape dictionaries (type: "rect"). This separation of concerns keeps the plotting logic clean.
HeatmapData (heatmap_processors.py): The mathematical engine. It lazily calculates properties like x_centers and y_centers (the coordinates for the heatmap axes) only when requested, ensuring efficiency.
MenuBuilder (heatmap_viz.py): Dynamically generates the Plotly layout buttons based on the configuration file. For example, if you add "Magma" to the colorscales list in YAML, it automatically appears in the dropdown menu.

Chromatic Aberration (CA) Correction Algorithm
(chromatic_abbr_fix)

Purpose of the Code

Artifact Removal: Corrects color fringing along high-contrast edges (e.g., purple/green edges on tree branches against a bright sky).
Edge-Preserving Correction: Unlike simple blurring, it uses Guided Filtering to smooth out color errors without losing the sharpness of the original image details.
Robustness: Includes "safety" mechanisms to prevent the algorithm from creating new artifacts (over-correction) in flat areas.

Algorithmic Breakdown

The system is organized into a pipeline: Configuration -> Filtering Kernel -> Channel Modeling-> Blending -> Orchestration.

1. Configuration & Utilities (config.py, generators.py)

Singleton Config: The Config class acts as a central repository for hyperparameters like radius (filter size), strength, and numerical epsilons.
Parameter Generation: generators.py provides iterators to feed parameters into the pipeline and includes a heap-based utility (find_optimal_radius_candidates) to guess the best filter radius based on edge gradient statistics.

2. The Math Engine: Filters & Manifolds (filters.py, mainfolds.py)

This is the mathematical core. It assumes that the distortion in the Red/Blue channels is locally linear with respect to the Green channel.

Guided Filter: Implemented in filters.py, this is the heavy lifter. It filters an image (the source) using the structure of a second image (the guide). It ensures that corrections follow the actual edges of the object.
Manifold Construction:
- Log-Ratios: LogRatioComputer calculates the intensity difference between the target (e.g., Red) and the guide (Green) in log-space, effectively linearizing the relationship.
- Manifold Fitting: ManifoldBuilder uses the Guided Filter to fit two local models ("manifolds")—an upper and a lower bound—that describe how the color channel should look based on the guide's structure.
- Interpolation: CorrectionInterpolator blends these two manifolds to predict the corrected pixel value.

3. Correction Logic (corrector.py, blending.py)

Channel Correction: ChannelCorrector ties the math together. It takes a target channel and a guide, computes the manifolds, and generates a corrected image.
Safety Mechanisms:
- Constraints: ModeConstraint allows users to restrict corrections to only "brighten" or "darken" pixels, useful if the CA is strictly one-sided.
- Blending: SafetyBlender compares the "ratio change" between the original and corrected versions. If the correction is too drastic (exceeding a threshold), it blends the original pixel back in to prevent artifacts.

4. Orchestration (pipeline.py, api.py)

Pipeline: CACorrectRGB is the main class. It validates the input image, normalizes it, and iterates through the channels using channel_generator. It skips the guide channel (Green) and applies the correction to the others.
API: api.py provides a simplified function correct_chromatic_aberration that handles the boilerplate of setting up the classes and generators.

Key Components

GuidedFilter (filters.py): This is the most critical component. By calculating local covariance and variance between the guide (Green) and the source (Red/Blue), it allows the algorithm to distinguish between actual image details (which should be kept) and color fringing (which should be smoothed out).
SafetyBlender (blending.py): Chromatic aberration correction can sometimes "break" valid colors in highly textured areas. The safety blender calculates a deviation score; if the correction deviates too far from the local color ratio, it is rejected or reduced.
Synthetic Testing (main.py): The create_synthetic_test_image function generates a test pattern with mathematically perfect Gaussian blobs that are spatially offset. This allows the algorithm's performance to be verified against a ground truth where the exact offset is known.

COCO Dataset Annotation Transformer (annotation_transformer.py)

Purpose of the Code

Automated Label Migration: Instead of manually re-labeling images after correcting their perspective or alignment, this script calculates the transformation mathematically and moves the labels to the correct new positions.
Dataset Integrity: Ensures that transformed labels remain valid by clamping them to image boundaries and removing "degenerate" (collapsed or empty) annotations.
Homography Estimation: Uses computer vision feature matching to discover the relationship between the original and corrected images without needing manual calibration parameters.

Algorithmic Breakdown

The process follows a pipeline: Load Data -> Compute Transform -> Warp Annotations -> Export.

1. Transformation Discovery (compute_transformation_matrix)

For every image in the dataset, the script finds the geometric relationship between the "Original" and "Corrected" versions:

Feature Extraction: It uses an ImageFeatureMatcher (implied external dependency) to detect keypoints (likely corners/edges) in both images.
Matching: It pairs these points using a Brute-Force Matcher (cv2.BFMatcher).
Filtering: It keeps only "good" matches where the distance is less than the mean plus half a standard deviation to remove outliers.
Homography: It feeds the matched points into cv2.findHomography using RANSAC. This computes a 3 x 3 matrix that can map any point (x, y) from the old image to the new one.

2. Geometry Transformation

Once the matrix is known, the script applies it to different annotation types:

Bounding Boxes (transform_bbox):
- Converts the box (x, y, w, h) into 4 corner points.
- Applies the perspective transform to each corner.
- Clamping: Forces points that fall outside the image (e.g., negative coordinates) back to the edge using clamp_point.
- Re-bounding: Calculates the new bounding box that encloses the warped corners.
Segmentations (transform_segmentation):
- Iterates through polygon vertices, transforms them, and clamps them.
- Validation: Checks the area of the new polygon using the Shoelace formula (_polygon_area). If the polygon has collapsed (area approx 0), it is discarded.
Keypoints (transform_keypoints):
- Transforms individual (x, y) coordinates.
- Preserves the visibility flag (the 3rd value in COCO keypoints).

3. Dataset Reconstruction (transform_dataset_annotations)

Deep Copy: Starts with a copy of the original COCO JSON structure.
Image Updates: Updates the width, height, and file_name fields to match the corrected images.
Annotation Filtering: Iterates through all annotations, transforming them using the cached matrices. If a transformation fails or results in an invalid annotation (e.g., a box shrinks to 0 width), that annotation is removed from the dataset.
Export: Saves the new structure as a valid COCO JSON file.

Key Components

clamp_point:
A safety utility that ensures transformed coordinates never exceed the image dimensions.
Python

x = max(0, min(x, width - 1))

This prevents "out of bounds" errors in downstream training pipelines.
Homography with RANSAC:
In compute_transformation_matrix, the use of RANSAC (Random Sample Consensus) is critical. Feature matching often produces false positives (wrong matches). RANSAC randomly samples matches to find a model that fits the majority, effectively ignoring the outliers that would otherwise ruin the transformation matrix.
Degeneracy Checks:
The code proactively cleans the dataset.
- if bbox_w <= 1 or bbox_h <= 1: return None
- if ... self._polygon_area(transformed_points) > 1
  This ensures the output dataset doesn't contain "garbage" labels that could cause training loss to result in NaN.

Percentile based Image Greyscale Algorithm

Purpose of the Code

Dimensionality Reduction: Converts 3-channel color images (Blue-Green-Red) into single-channel 2D matrices, reducing computational load.
Contrast Enhancement: Uses percentile-based normalization to make features distinct, even if the original image is very dark or has bright outliers.
Standardization: Ensures that all images entering a pipeline have the same data type (uint8) and intensity distribution.

Algorithmic Breakdown

The logic is split into two dependent functions.

1. Grayscale Conversion (to_grayscale)

Validation: It first calls validate_image (a function not included in this snippet) to ensure the input is valid (e.g., not None, not empty).
Dimensionality Check: It inspects image.ndim to determine if the image is color (3 dimensions: Height, Width, Channels) or already grayscale (2 dimensions).
Conversion:
- If Color (3 dims): Uses OpenCV's cv2.cvtColor with the COLOR_BGR2GRAY flag to mathematically combine the channels.
- If Gray (2 dims): Returns a copy of the image to ensure the original data remains immutable.

2. Robust Normalization (to_normalized_grayscale)

This function wraps the grayscale conversion with an intensity adjustment step.

Preprocessing: Calls to_grayscale to ensure it is working with a single channel.
Percentile Calculation: Instead of simple Min-Max scaling (which is sensitive to dead pixels or glare), it uses csbdeep.utils.normalize.
- It finds the 1st percentile (low) and 99.8th percentile (high) of pixel values.
- It clips the image data to this range and scales it between 0 and 1.
Re-quantization: The normalized data (now floating-point) is multiplied by 255 and cast back to uint8 (integers 0-255), making it compatible with standard image viewers and OpenCV functions.

Key Components

csbdeep.utils.normalize: This is a specialized function from the CSBDeep library (Content-Aware Image Restoration). It is preferred over standard normalization because it handles outliers robustly. If an image has one extremely bright pixel (glare), standard normalization would make the rest of the image black. Percentile normalization ignores that outlier, preserving the contrast of the actual object.
validate_image: An external dependency (missing from this snippet) that acts as a guard clause. It likely checks if the input array is None or has a size of 0, preventing cryptic "Segmentation Faults" or NoneType errors later in the pipeline.

Image Metadata Extractor (img_metadata_extractor.py)

Purpose of the Code

Multi-Format Extraction: It doesn't rely on a single standard; it pulls data from the older EXIF standard, the press-oriented IPTC standard, and the modern, XML-based XMP standard.
Data Cleaning: It normalizes the messy output often found in metadata (e.g., removing XML namespaces like exif:) and converts string representations of numbers back into actual floats/integers.
Unification: It flattens all found metadata into a single key-value structure, making it easy to export to JSON or a database.

Algorithmic Breakdown

The Metadata_Extractor class follows a linear extraction pipeline orchestrated by the run_metadata_extractor method.

1. Initialization

The class is instantiated with a path to an image file.

2. Extraction Phase

The script opens the file once and passes the file object to three specialized readers:

EXIF Extraction (read_exif_metadata): Uses the exifread library to parse standard camera tags (shutter speed, ISO, date digitized).
IPTC Extraction (read_iptc_metadata): Uses the IPTCInfo library to read legacy media metadata. It actively filters out empty or None values to keep the result clean.
XMP Extraction (read_xmp_metadata):
- Byte Scanning: Unlike the other methods, this manually scans the raw bytes of the image to find the start (<x:xmpmeta) and end (</x:xmpmeta>) tags.
- XML Parsing: Once the XMP block is isolated, it decodes the bytes to a string and uses BeautifulSoup to parse the XML structure.
- Attribute Scraping: It locates the rdf:Description tag and extracts all its attributes as key-value pairs.

3. Cleaning Phase (xmp_metadata_cleaner)

Raw XMP data is often cluttered. The cleaner performs two tasks:

Namespace Stripping: It removes prefixes like exif: or photoshop: from keys (e.g., converting exif:FNumber to just FNumber).
Type Inference: It uses the static method _convert_value to check if a string looks like a number (using regex) and converts it to a Python float if possible.

4. Unification Phase

The dictionaries from all three sources are merged.
Type Standardization: Finally, every key and value in the combined dictionary is cast to a string (str()) to ensure consistent serialization (e.g., for JSON export) and prevent encoding errors later.

Key Components

Manual XMP Parsing: Instead of relying on a heavy XMP library, the code implements a lightweight "search and parse" strategy.
This is robust because XMP is simply an XML packet embedded in the file header, making it accessible even without specialized drivers.
_convert_value: A utility method that attempts to restore numeric data types from string-based metadata.

Automatic Lens Distortion Correction
(lens_dist_fix.py)

Purpose of the Code

Optical Correction: Removes geometric distortions caused by the physical properties of the camera lens (e.g., straight lines appearing curved).
Metadata-Driven: Fully automates the process by reading the EXIF data (Camera Make, Model, Focal Length, Aperture) to determine exactly how the image was shot.
Database Lookup: Relies on the lensfun library, which contains pre-calibrated profiles for thousands of commercial cameras and lenses.

Algorithmic Breakdown

The process follows a pipeline: Parse Metadata -> Query Database -> Initialize Modifier -> Remap Pixels.

1. Metadata Parsing (_conv_fraction)

Format Conversion: EXIF data often stores numbers as strings of fractions (e.g., "28/10" for a 2.8 aperture). The helper function _conv_fraction converts these into usable floating-point numbers (2.8).
Extraction: The main function lens_fix extracts critical shooting parameters from the input metadata_dict: Camera Make/Model, Lens Model, Focal Length, Aperture, and Focus Distance.

2. Profile Selection

Camera Lookup: It queries the lensfunpy.Database to find the camera body. This determines the sensor size and crop factor.
Lens Lookup: It searches for the specific lens used. If the exact lens model isn't in the metadata, it tries to find compatible lenses for that camera.
Failure Handling: If the camera or lens isn't found in the database, the script prints an error and aborts to avoid applying incorrect corrections.

3. Distortion Modeling

Modifier Initialization: It creates a lensfunpy.Modifier object. This is the mathematical engine that models the lens's physical path.
Parameter Setting: It feeds the specific shooting conditions (Focal Length, Aperture, Subject Distance) into the modifier. This is crucial because a zoom lens distorts differently at 18mm vs. 55mm, and differently at f/2.8 vs. f/11.

4. Image Rectification

Geometry Calculation: The modifier generates an undistort_map. This is a set of coordinates that tells the computer, "The pixel at (x, y) should actually move to (x', y') to be straight".
Pixel Remapping: The script uses OpenCV's cv2.remap to physically move the pixels according to the map.
- Interpolation: It uses cv2.INTER_LANCZOS4, a high-quality resampling method, to ensure the corrected image remains sharp and doesn't look jagged.
Export: The corrected image is saved to the output directory with an undistorted_ prefix.

Key Components

lensfunpy:
A Python wrapper for the C++ library lensfun. It is the industry standard for open-source lens correction (used by software like Darktable and RawTherapee).
cv2.remap:
This function performs the actual pixel manipulation. It is highly optimized and can warp images arbitrarily given a coordinate map.
Distance Handling: The script attempts to find the focus distance (RelativeAltitude or SubjectDistance). This is a subtle but important detail: lens distortion changes slightly depending on how close the subject is (breathing). If unknown, it defaults to 0 (infinity).

Nested BoundingBox Finder (nested_bbox_finder.py)

Purpose of the Code

Geometric Analysis: Identifies complex relationships between rectangles, specifically distinguishing between partial overlaps and full containment (nesting).
Data Validation/Cleaning: Can be used to filter out redundant bounding boxes in computer vision tasks (like Non-Maximum Suppression) or layout analysis.
Test Data Generation: Includes robust utilities to generate synthetic datasets with guaranteed nesting and overlapping properties for verification.

Algorithmic Breakdown

The core logic revolves around the find_overlapping function, which orchestrates the sweep-line process.

1. Data Structures

Event: Represents a vertical line segment (left or right edge of a rectangle). It stores the x-coordinate, the rectangle's bounds, and whether it's a "start" (left edge) or "end" (right edge).
IntervalUnionQuery: This is the heavy lifter. It implements a Segment Tree (or a similar interval tree structure) over the y-coordinates.
- It discretizes the y-space using unique y-coordinates from all rectangles.
- It supports range updates (modify_interval): incrementing a range when a rectangle starts and decrementing it when it ends.
- It tracks "active" regions to detect when multiple rectangles cover the same y-interval simultaneously.

2. The Sweep-Line Algorithm (find_overlapping)

The algorithm "sweeps" a vertical line across the plane from left to right.

Initialization:
- Rectangles are normalized (ensuring x0 < x1, etc.).
- _build_events: Converts rectangles into Event objects (2 per rectangle).
- Events are sorted by x-coordinate.
Processing Events:
The code iterates through the sorted events:
- Start Event (Left Edge):
  - Checks active_rects to see if this new rectangle is fully contained within the y-bounds of any currently active rectangle. If so, it marks it as a nesting candidate.
  - Adds the rectangle to the IntervalUnionQuery structure (increments coverage count).
- End Event (Right Edge):
  - Verifies nesting candidates: If a candidate's "outer" rectangle is still active when the candidate ends, full containment is confirmed.
  - Removes the rectangle from the IntervalUnionQuery structure (decrements coverage count).
Overlap Detection:
- The IntervalUnionQuery records "sweep events" whenever the coverage count changes.
- find_overlaps: Analyzes these changes to identify continuous regions where the coverage count is ge 2 (meaning 2+ rectangles overlap).

3. Post-Processing

Transitivity (_find_transitivity): Uses networkx to build a graph where nodes are rectangles and edges represent overlaps. Connected components in this graph represent groups of overlapping rectangles.
Invalid Index Identification: find_invalid_inds helps filter out "bad" rectangles (e.g., keeping only the largest one in an overlapping group), useful for tasks like deduplication.

4. Test Data Generators

The script includes sophisticated functions to generate unit test data:

generate_nested_rectangles: Creates groups of rectangles that are guaranteed to be strictly inside one another.
generate_mixed_test_rectangles: Creates a complex scene with a mix of nested, overlapping, and isolated rectangles to stress-test the algorithm.

Key Components

Coordinate Compression:
The IntervalUnionQuery doesn't work on continuous float coordinates. It maps unique y-coordinates to integer indices (y_map), allowing the segment tree to work on a discrete grid.
Efficiency:
By sorting events by X and using a tree for Y, the algorithm avoids checking every rectangle against every other rectangle. This makes it scalable for large numbers of inputs.

Resolution Grouper
(resolution_grouper.py)

Purpose of the Code

File Organization: Automatically declutters folders containing mixed image sizes (e.g., separating high-res wallpapers from low-res thumbnails).
Resolution Grouping: Sorts images strictly by pixel dimensions without resizing or modifying the actual content.
Efficiency: Uses lazy evaluation to handle folders with thousands of images without consuming excessive memory.

Algorithmic Breakdown

The ImageSorter class is designed as an Iterator, allowing it to process files one by one rather than loading everything at once.

1. Initialization & Discovery (__init__)

Path Setup: Converts the input string to a pathlib.Path object for robust cross-platform file handling.
Generator Creation: Instead of creating a list of all files (which could be huge), it creates a generator expression (self._files). This lazily yields files only when requested.
Filtering: It checks files against a whitelist of extensions (.jpg, .png, .webp, etc.) to ignore non-image files.

2. The Iteration Loop (__next__)

The class implements the Python iterator protocol (__iter__ and __next__).

When the class is iterated over (e.g., in the sort method), __next__ retrieves the next file path from the generator and passes it to _process_image.
This pattern supports "streaming" execution, allowing the process to be stopped or monitored easily.

3. Dimension Extraction (_get_dimensions)

Metadata Reading: Uses PIL.Image.open() to access the file.
Lazy Loading: Crucially, Pillow (PIL) reads only the image header to determine img.size. It does not load the pixel data, making this operation extremely fast even for large files.
Error Handling: It wraps the read operation in a try...except block to safely handle corrupted or invalid image files (returning None to skip them).

4. File Operation (_move_to_target)

Target Calculation: Constructs the folder name string dynamically: "{width}x{height}".
Directory Safety: Calls target_dir.mkdir(exist_ok=True) to ensure the subfolder exists; if it is already there, it proceeds without error.
Collision Check: Checks if destination.exists() to prevent overwriting files that have the same name in the target folder.
Relocation: Uses shutil.move to physically transfer the file.

Key Components

pathlib.Path: The script eschews string manipulation for file paths. Using Path objects allows for clean syntax like target_dir / file_path.name, handling operating system separators (\ vs /) automatically.
Lazy Generator (self._files):
This is a memory optimization. If the folder contains 100,000 images, the script does not build a 100,000-item list in RAM. It finds and processes one file at a time.
Pillow (PIL.Image): The external dependency PIL allows the script to "see" the image properties without the overhead of decoding the visual data.

Object Detection Dataset Optimization Pipeline (annotated_processor/ object_detection)

Purpose of the Code

Automated Quality Assurance: Detects "bad" images (blurry, overexposed, poor texture) before they pollute a training set.
Geometric Normalization: Corrects lens distortion in images and—crucially—mathematically warps the corresponding ground truth annotations (bounding boxes, masks) to match the new geometry.
Hyperparameter Optimization: Analyzes the physical properties of objects (aspect ratio, size, location) to mathematically derive optimal anchor box sizes and augmentation strategies.
Latent Pattern Discovery: Uses a neural network autoencoder to find hidden groups of objects (e.g., "small objects in corners") that might be hard for a model to learn.

Algorithmic Breakdown

The system operates in four distinct stages: Ingestion & QA -> Correction -> Clustering -> Recommendation.

1. Image Quality Analysis (obj_det_main.py, spatial_frequency_analysis.py, exposure_analysis.py)

For every image in the dataset, the system performs a forensic analysis:

Frequency Domain Analysis: SpatialFrequencyAnalyzer converts images to the frequency domain (FFT). It calculates ratios of high-vs-low frequency energy to detect blur and computes "directional energy" to find motion blur.
Photographic Analysis: ExposureAnalyzer behaves like a digital light meter. It checks dynamic range, detects clipped highlights/shadows, and even calculates how many "stops" of exposure compensation are needed.
Feature Extraction: ImageFeatureMatcher uses the AKAZE algorithm to find distinctive keypoints, which serve as landmarks for alignment.

2. Geometric Correction Pipeline (obj_det_main.py)

Lens Fix: The script applies lens distortion correction (using metadata and likely lensfun as seen in previous contexts) to straighten curved lines.
Annotation Morphing: Once the image is corrected, transform_annotations_for_dataset finds the transformation matrix between the original and corrected image. It then warps all COCO annotations (boxes, polygons) so they perfectly align with the new, undistorted image.

3. Autoencoder-Enhanced Clustering (annotation_clustering.py)

This is the advanced machine learning component. Instead of just clustering based on box width/height, it:

Feature Engineering: Extracts dozens of descriptors for every annotation: compactness, elongation, distance_from_center, seg_circularity, etc..
Deep Compression: Trains a PyTorch Autoencoder (Deep, Sparse, or Variational) to compress these features into a latent space. This forces the model to learn the most essential structural characteristics of the objects.
Clustering: Runs algorithms like KMeans or DBSCAN on these "learned" features to group objects into semantic clusters (e.g., "long thin objects", "large centered objects").

4. Optimization Recommendations (object_detection_feature_optimizer.py)

Based on the clusters found, this module acts as a data scientist advisor:

Anchor Boxes: Calculates the optimal width/height for anchor boxes by analyzing the centroids of the discovered clusters.
Augmentation Strategy: Checks feature importance. If "rotation" or "color" separates the clusters heavily, it suggests adding rotation or color jitter augmentations.
Class Balancing: Identifies "hard" clusters (those with mixed classes or high variance) and suggests increasing loss weights for those specific samples.

Key Components

AutoencoderTrainer (annotation_clustering.py): A complete PyTorch training loop that learns to encode annotation features. It includes reconstruction loss (MSE) and regularization (KL-divergence for VAEs, L1 for Sparse) to ensure the learned features are robust.
SpatialFrequencyAnalyzer (spatial_frequency_analysis.py): This class is critical for filtering out low-quality data. By analyzing the radial_profile of the power spectrum, it can objectively determine if an image is sharp enough for training without human inspection.
general_stats.py: A comprehensive visualization engine that builds an interactive HTML dashboard (using Plotly) and a navigation system, allowing users to explore the dataset distributions and the results of the analysis.

Data-Centric Computer Vision Pipeline (annotated_processor/ seg_splitter)

Purpose of the Code

Dataset Lifecycle Management: Handles the creation of synthetic data, merging of datasets, and resolution-based splitting while managing unique IDs efficiently.
Image Restoration & Label Alignment: Corrects optical defects (Chromatic Aberration, Lens Distortion) and mathematically transforms the ground truth annotations (bounding boxes, polygons) to align with the corrected images.
Tiling Strategy: Implements a sweep-line algorithm to slice high-resolution images into smaller tiles with calculated overlaps, ensuring objects aren't lost at the edges.
Deep Analytics: Uses unsupervised learning (Autoencoders) to cluster objects by latent features, providing actionable advice on anchor box sizing and augmentation strategies.

Algorithmic Breakdown

The pipeline operates as a modular toolbox, roughly following this data flow:

1. Data Generation & Management (coco_dataset_*.py, coco_bloom_filt.py)

Synthetic Generation: The CocoDatasetGenerator creates dummy datasets with random polygons for testing pipelines.
Bloom Filter ID Tracking: To prevent ID collisions when merging huge datasets without loading all IDs into memory, it uses a Bloom Filter.
- It hashes IDs using mmh3 into a bit array.
- It allows the system to check might_exist(id) with a defined false positive rate (0.1%), significantly reducing memory usage compared to Python sets.

2. Slicing & Tiling (image_slice_coords_finder.py, preprocess.py)

For high-res satellite or medical imagery, the system slices images into tiles.

Sweep-Line Algorithm: Instead of a naive grid, it uses a sweep-line approach with an IntervalUnionQuery (Segment Tree variant).
Overlap Management: It calculates optimal slice positions to maintain specific overlap ratios (e.g., 20% overlap) to ensure objects on boundaries are preserved in at least one tile.

3. Image Restoration Engine (pipeline.py, corrector.py, obj_det_main.py)

This phase improves image quality before training.

Chromatic Aberration (CA) Correction: The CACorrectRGB pipeline aligns color channels.
- Manifold Building: It uses a Guided Filter to create "manifolds" (local linear models) of the Red/Blue channels relative to the Green channel.
- Log-Ratio: It computes the log-difference between channels to linearize the color error.
Lens Correction: obj_det_main.py orchestrates lens distortion removal and triggers the AnnotationTransformer to warp the corresponding bounding boxes and masks using the computed homography.

4. Quality & Feature Analysis (spatial_frequency_analysis.py, exposure_analysis.py)

Frequency Domain: Converts images to the Fourier domain (FFT) to analyze the Radial Profile of frequencies. This detects blur (lack of high frequencies) or specific noise patterns.
Exposure: Calculates dynamic range utilization and histogram entropy to detect clipped highlights or crushed shadows.

5. Deep Optimization (annotation_clustering.py, object_detection_feature_optimizer.py)

Instead of manual analysis, this module "learns" the dataset structure.

Feature Extraction: Extracts extensive metadata from annotations: elongation, compactness, corner_distance, etc..
Autoencoder: Trains a PyTorch Autoencoder (Sparse, Variational, or Deep) to compress these features into a latent representation, capturing non-linear relationships between object size, shape, and position.
Actionable Insights:
- Clustering: Groups the latent features (K-Means/DBSCAN) to find semantic object types.
- Recommendations: The optimizer suggests specific Anchor Box sizes based on cluster centroids and recommends augmentations (e.g., "Apply rotation" if orientation is a key differentiating feature).

Key Components

COCOAnnotationBloomFilter (coco_bloom_filt.py):
A memory-efficient probabilistic data structure. By storing existence in a bitarray rather than a list of integers, it scales to millions of annotations with constant memory footprint.
IntervalUnionQuery (image_slice_coords_finder.py):
A specialized Segment Tree implementation used during image slicing. It efficiently tracks coverage across the Y-axis as the sweep-line moves across the X-axis, ensuring that overlaps are calculated correctly in O(N \log N) time.
GuidedFilter (filters.py):
Used in the restoration pipeline. Unlike a Gaussian blur, it smooths noise (or color fringing) while preserving edges by using a reference image (the Green channel) as a structural guide.

Work in Progress

Project: Plum Process

Universal Image Processing Scripts

Aims of the Repo

Gilbert–Johnson–Keerthi Collision Algorithm 3D (gjk.py)

Purpose of the Code

Algorithmic Breakdown

Shamos-Hoey Algorithm (Validity of segmentation.py)

Purpose of the Code

Algorithmic Breakdown

Gilbert–Johnson–Keerthi Collision Algorithm 2D (gjk_nesterov_2d.py)

Purpose of the Code

Algorithmic Breakdown

Key Features Summary

Example Usage Logic

AKAZE feature matching algorithm (akaze_feature_matching.py)

Purpose of the Code

Algorithmic Breakdown

Key Components

Graham Scan algorithm (graham_scan_convex_algo.py)

Algorithmic Breakdown

Key Features & Optimizations

Interactive Bounding Box Heatmap Visualizer(heatmap_visuals)

Purpose of the Code

Algorithmic Breakdown

Key Components

Chromatic Aberration (CA) Correction Algorithm (chromatic_abbr_fix)

Purpose of the Code

Algorithmic Breakdown

Key Components

COCO Dataset Annotation Transformer (annotation_transformer.py)

Purpose of the Code

Algorithmic Breakdown

Key Components

Percentile based Image Greyscale Algorithm

Purpose of the Code

Algorithmic Breakdown

Key Components

Image Metadata Extractor (img_metadata_extractor.py)

Purpose of the Code

Algorithmic Breakdown

Key Components

Automatic Lens Distortion Correction (lens_dist_fix.py)

Purpose of the Code

Algorithmic Breakdown

Key Components

Nested BoundingBox Finder (nested_bbox_finder.py)

Purpose of the Code

Algorithmic Breakdown

Key Components

Resolution Grouper(resolution_grouper.py)

Purpose of the Code

Algorithmic Breakdown

Key Components

Object Detection Dataset Optimization Pipeline (annotated_processor/ object_detection)

Purpose of the Code

Algorithmic Breakdown

Key Components

Data-Centric Computer Vision Pipeline (annotated_processor/ seg_splitter)

Purpose of the Code

Algorithmic Breakdown

Key Components

Gilbert–Johnson–Keerthi Collision Algorithm 3D
(gjk.py)

Gilbert–Johnson–Keerthi Collision Algorithm 2D
(gjk_nesterov_2d.py)

Graham Scan algorithm
(graham_scan_convex_algo.py)

Interactive Bounding Box Heatmap Visualizer
(heatmap_visuals)

Chromatic Aberration (CA) Correction Algorithm
(chromatic_abbr_fix)

Automatic Lens Distortion Correction
(lens_dist_fix.py)

Resolution Grouper
(resolution_grouper.py)