Skip to main content

Vision Algorithm Acceleration

This guide covers techniques for optimizing and accelerating computer vision algorithms on Qualcomm platforms, leveraging specialized hardware blocks and optimized libraries.

FastCV Library

Qualcomm's FastCV library provides hardware-accelerated implementations of common computer vision functions:

#include "fastcv.h"

void optimizeImageProcessing() {
// Initialize FastCV
fcvSetOperationMode(FASTCV_OP_PERFORMANCE);

// Example: Accelerated image conversion from YUV to RGB
fcvColorYCbCr420PlanarToRGB888u8(
yuv_data, // Input YUV data
width, // Image width
height, // Image height
stride, // YUV stride
rgb_output, // Output buffer
rgb_stride // RGB stride
);

// Example: Accelerated image resizing
fcvScaleu8(
input_image, // Input image
input_width, // Input width
input_height, // Input height
input_stride, // Input stride
output_image, // Output image
output_width, // Output width
output_height, // Output height
output_stride // Output stride
);
}

Key Accelerated Functions

FastCV provides optimized implementations of:

Function CategoryAccelerated Operations
Image ConversionColor space conversion, format transformations
Image FilteringGaussian blur, median filter, sobel operators
Feature DetectionFAST corners, Harris corners, edge detection
Matrix OperationsMultiplication, inverse, determinant
Image ArithmeticAddition, subtraction, multiplication
Geometric TransformsResize, rotate, warp, perspective transform

Compute DSP Offloading

For more advanced use cases, the Hexagon DSP can be directly programmed:

#include "hexagon_types.h"
#include "hexagon_protos.h"

// Example of offloading a custom algorithm to the DSP
int dspOffloadExample() {
// Initialize DSP communication
hexagon_nn_config config;
config.batches = 1;
config.enable_training = 0;

// Set up the DSP handle
hexagon_nn_handle_t handle;
int result = hexagon_nn_init(&handle);

if (result != 0) {
return -1; // Failed to initialize DSP
}

// Execute custom operations on DSP
// ...

// Clean up
hexagon_nn_teardown(handle);
return 0;
}

Parallel Processing with HVX

Hexagon Vector eXtensions (HVX) can be leveraged for SIMD-style parallel processing:

#include <hexagon_types.h>
#include <hvx.h>

// Example of HVX vector processing
void hvxAcceleration() {
// Data setup for HVX processing
HVX_Vector* src = (HVX_Vector*)input_data;
HVX_Vector* dst = (HVX_Vector*)output_data;

// Enable HVX units
HEXAGON_Enable_HVX();

// Process data with HVX vector instructions
for (int i = 0; i < vector_count; i++) {
// Example: Vector addition
*dst++ = Q6_V_vadd_VV(*src++, *offset);
}

// Disable HVX units to save power
HEXAGON_Disable_HVX();
}

Using the Computer Vision SDK

Qualcomm provides additional acceleration through the Computer Vision SDK:

#include "qcv.h"

void acceleratedFeatureDetection() {
// Initialize the SDK
qcv_handle_t handle;
qcv_status_t status = qcv_initialize(&handle);

if (status != QCV_SUCCESS) {
return; // Failed to initialize
}

// Set up feature detection
qcv_feature_detector_t detector;
qcv_feature_detector_create(handle, &detector);

// Configure detector
qcv_feature_detector_set_type(detector, QCV_FEATURE_TYPE_ORB);

// Run detection
qcv_feature_points_t keypoints;
qcv_feature_detector_detect(
detector,
input_image,
width,
height,
stride,
&keypoints
);

// Process detected features
// ...

// Clean up resources
qcv_feature_detector_destroy(detector);
qcv_shutdown(handle);
}

Acceleration Through NNAPI

For Android-based Qualcomm platforms, the Neural Network API provides hardware acceleration:

// Java example for NNAPI acceleration
import android.util.Size;
import androidx.camera.core.ImageAnalysis;
import androidx.camera.core.ImageProxy;
import android.media.Image;

// Image analyzer with NNAPI acceleration
class VisionAnalyzer implements ImageAnalysis.Analyzer {
private NeuralNetworkInference mNNAPI;

public VisionAnalyzer() {
// Initialize NNAPI with accelerator preference
mNNAPI = new NeuralNetworkInference.Builder()
.setDevice(NeuralNetworks.PREFER_SUSTAINED_SPEED)
.build();

// Load and compile model
mNNAPI.loadModel("model.tflite");
}

@Override
public void analyze(ImageProxy image) {
Image.Plane[] planes = image.getPlanes();

// Process image using hardware acceleration
ByteBuffer buffer = planes[0].getBuffer();
Size size = new Size(image.getWidth(), image.getHeight());

// Run accelerated processing
mNNAPI.execute(buffer, size);

image.close();
}
}

Performance Tips

  1. Memory Management:

    • Align buffers to cache line boundaries (typically 128 bytes)
    • Use direct, uncached memory for large transfers
    • Minimize memory copies between CPU and DSP domains
  2. Algorithm Selection:

    • Choose algorithms that map well to parallel processing
    • Consider approximation algorithms when absolute precision isn't critical
    • Use fixed-point math where possible on DSP
  3. Profiling and Optimization:

    • Profile with Snapdragon Profiler to identify bottlenecks
    • Batch operations to amortize overhead costs
    • Balance workloads across CPU, GPU, and DSP

Case Study: Feature Detection Optimization

ImplementationProcessing TimePower Consumption
OpenCV on CPU25ms100% (baseline)
FastCV on CPU12ms60%
FastCV with DSP5ms30%
Custom HVX2ms15%