Neural Network Optimization for Qualcomm Platforms
This guide covers techniques for optimizing deep learning models to achieve maximum performance and efficiency on Qualcomm's heterogeneous computing architecture.
Model Conversion Workflow
Converting models for Qualcomm hardware involves these key steps:
- Export from Training Framework: Save models from PyTorch, TensorFlow, or other frameworks
- Convert to ONNX: Use framework-specific exporters to create ONNX representation
- Convert to DLC: Use SNPE tools to convert ONNX to Qualcomm's DLC format
- Quantize: Optionally convert to INT8 precision for improved performance
- Deploy: Load the optimized model on the target device
Conversion Examples
TensorFlow to DLC
# 1. Export SavedModel from TensorFlow
python -c "import tensorflow as tf; model = tf.keras.applications.MobileNetV2(); model.save('model_tf')"
# 2. Convert to ONNX using tf2onnx
python -m tf2onnx.convert --saved-model model_tf --output model.onnx
# 3. Convert to DLC
snpe-onnx-to-dlc --input_network model.onnx --output_path model.dlc
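Before converting, it can be worth sanity-checking the exported graph; a minimal sketch using the onnx Python package (assumed to be installed alongside tf2onnx):
python -c "
import onnx
model = onnx.load('model.onnx')
onnx.checker.check_model(model)  # raises if the graph is structurally invalid
print('inputs:', [i.name for i in model.graph.input])
"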
PyTorch to DLC
# 1. Export PyTorch model to ONNX
python -c "
import torch
import torchvision.models as models
model = models.resnet18(pretrained=True)
model.eval()  # export in inference mode so BatchNorm/Dropout are frozen
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, 'model.onnx',
                  opset_version=11, input_names=['input'], output_names=['output'])
"
# 2. Convert to DLC
snpe-onnx-to-dlc --input_network model.onnx --output_path model.dlc
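Optionally, verify that the export is numerically faithful before converting to DLC; a minimal sketch assuming onnxruntime is installed and using the 'input' name from the export above:
import numpy as np
import onnxruntime as ort
import torch
import torchvision.models as models

# Reference output from the original PyTorch model
model = models.resnet18(pretrained=True).eval()
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    ref = model(x).numpy()

# Output from the exported ONNX graph
sess = ort.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
out = sess.run(None, {'input': x.numpy()})[0]
print('max abs diff:', np.abs(ref - out).max())  # expect ~1e-5 or smaller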
Quantization
Quantizing models from FP32 to INT8 roughly quarters model size and typically improves performance on Qualcomm hardware; it matters most for the fixed-point DSP and AIP runtimes:
# Generate calibration list (text file with paths to preprocessed raw input tensors)
ls calibration_data/*.raw > cal_list.txt
# Run quantization with calibration data
snpe-dlc-quantize --input_dlc model.dlc \
                  --output_dlc model_quantized.dlc \
                  --input_list cal_list.txt \
                  --use_enhanced_quantizer
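The quantizer consumes the same raw float32 tensors the model sees at inference time. A minimal preprocessing sketch (the directory names, 224x224 input size, and /255 normalization are assumptions; match your model's actual preprocessing):
import os
import numpy as np
from PIL import Image

os.makedirs('calibration_data', exist_ok=True)
with open('cal_list.txt', 'w') as f:
    for name in sorted(os.listdir('calibration_images')):
        if not name.lower().endswith('.jpg'):
            continue
        img = Image.open(os.path.join('calibration_images', name)).convert('RGB')
        arr = np.asarray(img.resize((224, 224)), dtype=np.float32) / 255.0  # assumed normalization
        raw_path = os.path.join('calibration_data', name.rsplit('.', 1)[0] + '.raw')
        arr.tofile(raw_path)  # float32, HWC layout
        f.write(raw_path + '\n')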
Quantization-Aware Training
For best results, train models with quantization awareness:
# TensorFlow example
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Start from a trained float Keras model (a small stand-in shown here)
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Wrap the model so fake-quantization ops are inserted during training
quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)

# Fine-tune with quantization awareness (train_data/train_labels: your dataset)
q_aware_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
q_aware_model.fit(train_data, train_labels, epochs=10)
Layer Optimization
Some layers require special handling for optimal performance:
| Layer Type | Optimization |
|---|---|
| Convolutions | Use multiples of 8 for filter counts |
| Activations | Prefer ReLU over sigmoid/tanh |
| Pooling | Use fixed-size pooling when possible |
| Custom Ops | Replace with supported primitives |
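As an illustration of the table's guidance, a minimal PyTorch sketch (the specific channel counts here are arbitrary examples):
import torch.nn as nn

# Channel counts are multiples of 8; ReLU instead of sigmoid/tanh; fixed-size pooling
block = nn.Sequential(
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)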
Architecture Recommendations
Model architectures that perform well on Qualcomm platforms:
- MobileNet Family: Designed for mobile inference with depthwise separable convolutions
- EfficientNet: Balanced accuracy/efficiency scaling
- MnasNet: Architecture optimized for mobile devices
- SqueezeNet: Very compact model with competitive accuracy
Benchmarking Your Models
Use snpe-net-run with profiling enabled to measure on-target performance (the SDK's snpe_bench.py script automates full benchmark runs):
snpe-net-run --container model.dlc \
             --input_list input_list.txt \
             --output_dir results \
             --perf_profile high_performance \
             --profiling_level detailed
Analyze the results to identify bottlenecks:
snpe-diagview --input_log results/SNPEDiag_0.log
Runtime Selection
Qualcomm platforms support multiple runtimes with different performance characteristics:
| Runtime | Best For |
|---|---|
| CPU | Compatibility and development |
| GPU | Most vision models, balance of performance/power |
| DSP | Maximum power efficiency |
| AIP | Latest platforms, highest performance |
Example: Selecting Optimal Runtime
// Initialize SNPE with a runtime priority list: GPU preferred, DSP then CPU as fallbacks
zdl::DlSystem::RuntimeList runtimeList;
runtimeList.add(zdl::DlSystem::Runtime_t::GPU);
runtimeList.add(zdl::DlSystem::Runtime_t::DSP);
runtimeList.add(zdl::DlSystem::Runtime_t::CPU);
// Availability can be checked up front with
// zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::GPU)

// Load the DLC and build the network; SNPE falls back through the list in order
std::unique_ptr<zdl::DlContainer::IDlContainer> container =
    zdl::DlContainer::IDlContainer::open(zdl::DlSystem::String(modelPath.c_str()));
zdl::SNPE::SNPEBuilder builder(container.get());
std::unique_ptr<zdl::SNPE::SNPE> snpe =
    builder.setRuntimeProcessorOrder(runtimeList)
           .build();  // setOutputLayers() can select specific output layers if needed
Advanced Optimization Techniques
For expert users and production deployment:
- Layer Fusion: Merge sequential operations (e.g., convolution + batch norm + activation) where possible; see the sketch after this list
- Mixed Precision: Use different precision for different layers
- Custom Layers: Implement optimized versions of custom operations
- Memory Planning: Control memory allocation for large models
- Profile-Guided Optimization: Use actual usage data to optimize execution paths
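As one example of layer fusion, a minimal PyTorch sketch that folds a conv + batch norm + ReLU triple before export (the module names are specific to torchvision's resnet18):
import torch
from torch.ao.quantization import fuse_modules
from torchvision import models

model = models.resnet18(weights=None).eval()  # fusion requires eval mode
# Fold the stem's conv1 + bn1 + relu into a single fused module
fused = fuse_modules(model, [['conv1', 'bn1', 'relu']])
Fusing before ONNX export removes the separate BatchNorm nodes, which can simplify the graph the SNPE converter sees.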