IREX has come a long way in building highly efficient AI algorithms for both CPU and GPU platforms. Our optimization happens at multiple layers:
- Neural Network Architecture Optimization: our scientists continuously work to reach the optimal tradeoff between network complexity and accuracy.
- Quantization: switching from 32-bit floating-point (high-precision) to 8-bit integer (low-precision) math yields a substantial performance gain of 3X to 4X with only a minor accuracy reduction of about 0.3%. Most of the time saving comes from the 4X reduction in memory-bandwidth utilization, which is the bottleneck for most high-load AI applications.
- Superior Inference Engine: IREX has developed proprietary inference technology for CPUs that enables efficient inference on standard servers without GPU cards. The framework delivers 1.5X-2X acceleration compared to OpenVINO. Further, on CPUs supporting the next-generation Vector Neural Network Instructions (VNNI), the IREX inference engine provides an additional performance gain of up to 50% compared to AVX-512. Averaged across our AI algorithms, the VNNI gain is 23%.
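The internals of the IREX engine are proprietary, but the quantization idea behind the bandwidth saving is standard. The sketch below is a minimal, hypothetical illustration (not IREX code) of symmetric per-tensor INT8 quantization in NumPy: weights shrink from 4 bytes to 1 byte per value, and the rounding error stays bounded by half a quantization step.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map float32 values to int8.

    A single scale factor covers the whole tensor; real engines often
    use finer (per-channel) scales for better accuracy.
    """
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 values from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # toy weight matrix

q, scale = quantize_int8(w)

# int8 is 1 byte vs 4 bytes for float32 -> 4X less memory traffic
ratio = w.nbytes // q.nbytes

# Worst-case quantization error is half a step (scale / 2)
err = np.abs(dequantize(q, scale) - w).max()
```

The 4X drop in bytes moved per inference is exactly the memory-bandwidth reduction cited above; the small, bounded rounding error is why accuracy drops only fractionally.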
Facebook has achieved a similar level of performance gain. Applying such reduced-precision techniques helped Facebook conserve data-center capacity while deploying models of up to 5X the complexity that could not otherwise run on traditional general-purpose CPUs.
Most IREX AI modules for threat detection, face recognition, and vehicle analytics run in real time, with a single thread per video stream. A modern multi-core CPU can therefore serve as many camera streams as it has cores; no GPU or camera upgrade is required.
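As a rough sketch of this deployment model (one stream per thread, one thread per core), here is a hypothetical Python outline; `analyze_stream` is a stand-in for a per-stream inference loop, not an actual IREX API.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def analyze_stream(stream_id):
    """Hypothetical single-threaded per-stream analytics loop.

    In a real deployment this would read frames from a camera and run
    detection on each one; here it just counts simulated frames.
    """
    frames_processed = 0
    for _ in range(100):  # stand-in for "while the stream is open"
        frames_processed += 1  # per-frame inference would happen here
    return stream_id, frames_processed

# One worker thread per CPU core -> one camera stream per core
n_cores = os.cpu_count() or 1
with ThreadPoolExecutor(max_workers=n_cores) as pool:
    results = list(pool.map(analyze_stream, range(n_cores)))
```

Because each stream needs only one core, capacity scales linearly with core count: a 32-core server can handle 32 real-time streams under this model.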