If you’ve ever assembled a seemingly “high-res” image dataset only to discover it’s full of crunchy block artifacts, smeared edges, and over-sharpened halos, you’re not alone. At AI Tech Inspire, we spotted a practical question that hits home for anyone prepping data for generative models: how do you automatically detect over-compressed, low-quality images without a pristine reference image?

What kicked off the discussion

  • Small dataset (~1,000 images) aimed at a generative AI project.
  • Images are technically high-resolution (1 MP+), yet many suffer from JPEG artifacts, upscaled blur, and over-compression.
  • Existing attempts: Laplacian variance (sharpness) catches blur but misses compression; edge density + contrast heuristics are inconsistent; manual review is not scalable.
  • Seeking an open-source, no-reference image quality assessment (IQA) solution to score perceptual quality.
  • Bonus: ability to run in Node.js, ONNX, or TensorFlow.js for seamless JS pipeline integration.
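
The Laplacian-variance check mentioned above is worth keeping as a cheap baseline even though it misses compression. A minimal numpy sketch (the helper name and the synthetic 64×64 test images are illustrative, not from the original thread):

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the discrete Laplacian; low values suggest blur.
    `gray` is a 2-D float array (a grayscale image)."""
    lap = (
        np.roll(gray, 1, axis=0) + np.roll(gray, -1, axis=0)
        + np.roll(gray, 1, axis=1) + np.roll(gray, -1, axis=1)
        - 4.0 * gray
    )
    # Drop the wrap-around border introduced by np.roll.
    return float(lap[1:-1, 1:-1].var())

rng = np.random.default_rng(0)
noisy = rng.random((64, 64))    # lots of high-frequency detail -> high variance
flat = np.full((64, 64), 0.5)   # no detail at all -> zero variance
```

The catch, as the thread notes: a heavily compressed image can still score "sharp" here because block edges and ringing count as high-frequency energy.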

Why this matters for your models

Generative models are bluntly honest: garbage in, garbage out. Low-quality inputs—especially those that are “fake sharp” (over-compressed, upscaled, or halo-heavy)—teach the model to imitate artifacts. The result: outputs with blocky textures, ringing around edges, and washed-out detail. For training data curation, automated filtering is the difference between scaling to thousands of images and sinking hours into manual triage.

Key takeaway: A high pixel count is not the same as high perceptual quality. Detecting low-quality images early prevents expensive downstream debugging.

No-reference IQA models to try first

No-reference Image Quality Assessment (NR-IQA) models estimate perceived quality without needing an original, clean reference. These are strong candidates for catching “JPEG hell.”

  • Classical NR-IQA: BRISQUE, NIQE, PIQE, and BLIINDS-II. They analyze natural scene statistics and are often sensitive to compression artifacts and blur. Pros: light, fast, widely implemented. Cons: less robust to diverse content and stylized images.
  • Deep NR-IQA: NIMA (Neural Image Assessment), MUSIQ, MANIQA, and CONTRIQUE. These use CNN/Transformer features and generally correlate better with human opinion scores. Pros: higher accuracy and better generalization. Cons: heavier models but still practical for batch scoring.

Many of these have community ports and can be exported to ONNX or run in TF.js. If you prefer to prototype in Python first, PyTorch-based toolkits such as the piq library implement both classical and deep metrics, and the models can often be exported to ONNX for production use in JS.

Turn this into a production-friendly Node/ONNX/TF.js pipeline

Here’s a pragmatic path that balances accuracy with deployment simplicity:

  • Start with a deep NR-IQA baseline (e.g., NIMA or MUSIQ). These produce a mean opinion score (MOS) or quality score per image, which is a good single number to rank images.
  • Export to ONNX for easy integration with onnxruntime-node. Many community repos show torch.onnx.export (for PyTorch) or tf2onnx (for TensorFlow) steps.
  • Run in Node.js using onnxruntime-node (CPU by default); for GPU, use the CUDA or DirectML execution providers where available. If you're targeting the browser instead, onnxruntime-web with the WebGPU backend is an option.
  • TF.js option: Convert a TensorFlow SavedModel to TF.js format using the tensorflowjs_converter. Then score images with @tensorflow/tfjs-node for speed.

Minimal setup examples:

# Node (ONNX)
npm install onnxruntime-node

# Node (TF.js)
npm install @tensorflow/tfjs-node

# Export PyTorch model to ONNX (illustrative)
# In Python
import torch
from model import IQAModel
model = IQAModel().eval()
dummy = torch.randn(1, 3, 384, 384)
torch.onnx.export(
    model, dummy, "iqamodel.onnx",
    opset_version=15,
    input_names=["input"], output_names=["score"],
    dynamic_axes={"input": {0: "batch"}},
)
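
Whichever runtime ends up doing the scoring, preprocessing has to match the export: the dummy tensor above is 1×3×384×384 float, so incoming images need the same layout and scale. A dependency-free numpy sketch (the nearest-neighbor resize is a stand-in for whatever resampling your real pipeline uses, e.g. sharp in Node):

```python
import numpy as np

def to_model_input(img_hwc: np.ndarray, size: int = 384) -> np.ndarray:
    """Nearest-neighbor resize to (size, size), scale uint8 [0, 255]
    to float32 [0, 1], and reorder HWC -> 1xCxHxW for the ONNX model."""
    h, w, _ = img_hwc.shape
    ys = np.arange(size) * h // size          # source row for each output row
    xs = np.arange(size) * w // size          # source col for each output col
    resized = img_hwc[ys][:, xs]              # nearest-neighbor resize
    x = resized.astype(np.float32) / 255.0    # normalize to [0, 1]
    return np.transpose(x, (2, 0, 1))[None]   # HWC -> NCHW with batch dim

batch = to_model_input(np.zeros((720, 1280, 3), dtype=np.uint8))
```

Getting this wrong (BGR vs RGB, [0, 255] vs [0, 1], HWC vs NCHW) silently skews every score, so it's worth a unit test on shape and dtype.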

Heuristics that specifically catch “JPEG hell”

NR-IQA scores are great, but pairing them with lightweight detectors boosts precision for compression-specific artifacts:

  • Blockiness score: Measure gradient jumps across 8×8 block boundaries. High periodic jumps at multiples of 8 pixels often signal JPEG blocking. Quick to compute, and very indicative of over-compression.
  • Ringing/oversharpening: Around strong edges, look for alternating gradient signs that extend several pixels outward. A high edge contrast with oscillating halos suggests aggressive sharpening or low-quality deconvolution.
  • DCT histogram irregularity: For images still in JPEG, quantization tables and DCT coefficient histograms can reveal harsh compression. If the file was saved as PNG after compression, fall back to spatial-domain checks.
  • Upscaled blur: Compare image size to high-frequency energy (e.g., via a high-pass or Laplacian). A large image with consistently weak high-frequency content indicates upscaling without detail.
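
The blockiness idea in the first bullet fits in a few lines of numpy: compare the average horizontal gradient at 8-pixel block boundaries against the average everywhere else. The helper name and synthetic test images below are illustrative:

```python
import numpy as np

def blockiness(gray: np.ndarray) -> float:
    """Ratio of mean column-gradient at 8-pixel boundaries to elsewhere.
    Values well above 1 suggest JPEG-style blocking. `gray` is 2-D float."""
    diff = np.abs(np.diff(gray, axis=1))       # horizontal gradients, shape (H, W-1)
    cols = np.arange(diff.shape[1])
    at_block = (cols % 8) == 7                 # gradient between columns 7|8, 15|16, ...
    boundary = diff[:, at_block].mean()
    interior = diff[:, ~at_block].mean() + 1e-8
    return float(boundary / interior)

# Synthetic "JPEG hell": constant 8x8 tiles with random levels.
rng = np.random.default_rng(1)
blocky = np.kron(rng.random((8, 8)), np.ones((8, 8)))      # flat inside each tile
smooth = np.linspace(0, 1, 64)[None].repeat(64, axis=0)    # uniform gradient
```

A ratio near 1 means no periodic structure; values well above 1 are a strong blocking signal. Run the same check on the transpose to catch vertical block boundaries too.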

These features act like fast prefilters. Use them to avoid spending model time on obvious rejects, or to add explainability to your decisions: “rejected due to blockiness + low NR-IQA score.”

A practical scoring recipe

To keep things deterministic and maintainable, combine signals into a small composite score:

  • Q = NR-IQA score (normalized 0–1).
  • B = blockiness (higher is worse, normalized 0–1).
  • R = ringing severity (higher is worse, normalized 0–1).
  • F = high-frequency energy ratio (lower can indicate upscaled blur).

One simple rule-based decision:

reject if (Q < 0.35) or (B > 0.6) or ((F < 0.2) and (Q < 0.45))

It’s intentionally interpretable. You can evolve it into a tiny logistic model trained on your own accept/reject annotations later.
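
As a starting point, the rule reads directly as a small function (R is omitted because the rule above doesn't use it yet, but it slots in the same way):

```python
def should_reject(q: float, b: float, f: float) -> bool:
    """Rule-based quality gate from the composite scores:
    q = NR-IQA score (0-1, higher is better),
    b = blockiness (0-1, higher is worse),
    f = high-frequency energy ratio (low can mean upscaled blur)."""
    return (q < 0.35) or (b > 0.6) or (f < 0.2 and q < 0.45)
```

Because each clause maps to one failure mode, the function doubles as the explanation string for your curation UI.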

Calibration: don’t skip the 200-image sanity check

Even with robust models, quality thresholds are dataset-dependent. Sample ~200 images across sources, manually tag them as keep/reject, and compute your metrics. A quick ROC or precision-recall plot tells you where to set a conservative cutoff. It’s a one-time investment that pays off in fewer false rejects.
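
A dependency-free sketch of that threshold sweep, using hypothetical scores and keep/reject labels in place of your real annotations:

```python
def precision_recall(scores, labels, cutoff):
    """Treat score >= cutoff as 'keep'. labels: True = genuinely good image.
    Returns (precision, recall) for the 'keep' class."""
    kept = [l for s, l in zip(scores, labels) if s >= cutoff]
    tp = sum(kept)                                            # good images kept
    fn = sum(l for s, l in zip(scores, labels) if s < cutoff and l)  # good images rejected
    precision = tp / len(kept) if kept else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall

# Hypothetical annotations from the ~200-image sanity check.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [True, True, True, False, True, False]
results = {c: precision_recall(scores, labels, c) for c in (0.25, 0.35, 0.5)}
```

Sweep the cutoff over your labeled sample and pick the most conservative value that keeps recall acceptable — that's your quality gate.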

For faster triage, keep a shortlist view in your curation UI. Use j/k to scan, and display per-image reasons: “Low IQA (0.28), high blockiness (0.72)”. Teams move faster when the system explains itself.

Speed and scaling tips

  • Batching: Many NR-IQA models support batches; score images in packs of 8–32 for better utilization.
  • Patching: For very large images, score center and corners with patches (e.g., 384×384) and average.
  • Parallel I/O: Decode with sharp in Node, and pre-resize to the model’s target size to minimize memory copies.
  • Mixed precision: If you target WebGPU or CUDA-backed runtimes, FP16 inference improves throughput without hurting ranking performance.
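
The patching tip can be sketched as a small helper that crops the center and four corners of a large image for averaged scoring (the helper name and patch size are illustrative):

```python
import numpy as np

def five_patches(img: np.ndarray, size: int = 384):
    """Center + four corner crops for patch-wise scoring of large images."""
    h, w = img.shape[:2]
    cy, cx = (h - size) // 2, (w - size) // 2
    anchors = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size), (cy, cx)]
    return [img[y:y + size, x:x + size] for y, x in anchors]

patches = five_patches(np.zeros((1080, 1920, 3)))
```

Score each patch, then average (or take the worst patch if you want to be strict about localized artifacts).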

Where these models shine (and where they don’t)

NR-IQA models excel at “natural” photos—portraits, landscapes, everyday web content. They can be less reliable on stylized art, heavy CGI, or screenshots with crisp UI edges. If your dataset includes synthetic content for generative tasks like Stable Diffusion training, consider a dual-path strategy: a photo-trained NR-IQA for natural content and a specialized classifier (e.g., artifact detector trained on synthetic examples) for stylized domains. Uploading a subset to a personal space on Hugging Face for quick experiments can accelerate comparisons.

Suggested tooling map

  • Classical metrics: BRISQUE/NIQE/PIQE via Python bindings; export results as JSON for Node.
  • Deep NR-IQA: NIMA, MUSIQ, MANIQA, CONTRIQUE (PyTorch or TensorFlow). Export to ONNX and run with onnxruntime-node.
  • JS-first: Convert a TensorFlow SavedModel to TF.js and score with @tensorflow/tfjs-node.
  • Pre/post: PyTorch or TensorFlow for quick prototyping; deploy to Node once thresholds stabilize.

The bottom line: a hybrid of NR-IQA scoring plus targeted artifact heuristics is a reliable way to auto-filter over-compressed, low-quality images. It’s fast enough for a ~1k image dataset, and robust enough to scale. The payoff is immediate—cleaner training data, fewer artifacts in downstream generations, and a reproducible quality gate you can defend to your team.

If you’ve been relying on resolution and sharpness alone, try adding a deep NR-IQA score and a blockiness detector this week. Chances are, you’ll catch the exact class of images that “look fine at first glance” but quietly poison model performance. And that’s the kind of quiet win that keeps shipping velocity high.
