
Accuracy vs Speed
- 2 ULP accuracy for sci math in floating point and 2 LSB in fixed point
- vectorized variants for most functions
- dedicated functions for specific problem sizes for better performance
- utilizes all acceleration features of Tensilica cores
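
The 2 ULP / 2 LSB accuracy figures can be made concrete with a small test helper. The sketch below makes three assumptions: IEEE-754 binary32 floats, a double-precision reference value for floating point, and an exact integer reference for fixed point. The helper names (ulp_distance, within_ulp, within_lsb) are hypothetical and are not part of the library API.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto an ordered integer line so that adjacent
   representable floats differ by exactly 1 (+0.0f and -0.0f both map to 0). */
static int64_t float_to_ordered(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the bits without aliasing UB */
    return (bits < 0) ? (int64_t)INT32_MIN - bits : (int64_t)bits;
}

/* Distance between two finite floats in ULP (NaN handling omitted). */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return (d < 0) ? -d : d;
}

/* Floating point: the result must lie within tol ULP of a double-precision
   reference rounded to float. */
static int within_ulp(float got, double reference, int64_t tol)
{
    return ulp_distance(got, (float)reference) <= tol;
}

/* Fixed point: the result must lie within tol LSB of the reference. */
static int within_lsb(int32_t got, int32_t reference, int32_t tol)
{
    int64_t d = (int64_t)got - reference;
    return ((d < 0) ? -d : d) <= tol;
}
```

For example, 1.0000001f is the float immediately above 1.0f, so ulp_distance returns 1 for that pair. It returns 0 for -0.0f and +0.0f.
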

A few examples (for B20):

Function                    Floating point      16-bit fixed point
sine                        1 pts/cycle         5.8 pts/cycle
logarithm                   0.8 pts/cycle       4.0 pts/cycle
full arctangent             0.5 pts/cycle       1.2 pts/cycle
complex QR decomposition    38 cycles/matrix    19 cycles/matrix
complex FFT, 1024 points    1484 cycles         545 cycles

IntegrIT Services
- modifications according to specific requirements
- adaptation of the library to a framework/application or a customized DSP core
- migration of an API from one core to another
- reference code and test harness for simulation models running on the PC without the Tensilica simulator
- test data sets for functional and performance validation on a customizable core/system
- direct porting from your Matlab code to an optimized version for your SoC
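
Running the same reference code on a PC without the simulator usually relies on compile-time selection between an optimized kernel and a portable C path. The sketch below assumes a hypothetical configuration macro HAVE_TARGET_SIMD and a hypothetical optimized kernel dot_q15_simd. Neither is a real Tensilica or NatureDSP name. On a PC build the macro defaults to 0, so only the reference path is compiled.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL configuration macro: a real build would derive it from the
   core's configuration headers. */
#ifndef HAVE_TARGET_SIMD
#define HAVE_TARGET_SIMD 0
#endif

/* Portable reference path: dot product of two Q15 vectors. The result is the
   raw sum of Q30 products, accumulated in 64 bits so it cannot overflow for
   realistic lengths. This is what runs on a PC without a simulator. */
static int64_t dot_q15_ref(const int16_t *x, const int16_t *y, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i];
    return acc;
}

/* Public entry point: the implementation is chosen at compile time, so
   callers see one API on every build. */
int64_t dot_q15(const int16_t *x, const int16_t *y, size_t n)
{
#if HAVE_TARGET_SIMD
    return dot_q15_simd(x, y, n);   /* hypothetical intrinsic-based kernel */
#else
    return dot_q15_ref(x, y, n);
#endif
}
```

Keeping both paths behind one entry point means the target build and the PC build can share a single test harness and compare results.
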
Data types and accuracy
- 8-bit, 16-bit, 32-bit fixed point arithmetic
- single precision floating point
- double precision floating point
- half precision (16-bit floating point)
- excellent 2 ULP accuracy for all floating point math
- ANSI C/IEEE-754 errno/exception support for sci math
- streaming data formats for better SIMD vectorization
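
As an illustration of the fixed-point data types, the 16-bit format is commonly used as Q15. In Q15, an int16_t value v represents v / 32768. The sketch below shows Q15 multiplication with rounding and saturation, plus conversions to and from float. The function names are hypothetical, and the rounding mode (round half up) is one common choice, not necessarily the library's.

```c
#include <stdint.h>

/* Q15 format: int16_t value v represents v / 32768, range [-1, 1 - 2^-15]. */

/* Clamp a 32-bit intermediate to the int16_t range (saturation). */
static int16_t sat_q15(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Q15 * Q15 -> Q15. The 32-bit product is Q30; adding 2^14 before shifting
   right by 15 rounds to nearest (ties toward +infinity).
   Assumes >> on negative values is an arithmetic shift (true on common compilers). */
static int16_t mul_q15(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    return sat_q15((p + (1 << 14)) >> 15);
}

/* float -> Q15 with rounding to nearest and saturation (NaN not handled). */
static int16_t float_to_q15(float f)
{
    float s = f * 32768.0f;
    if (s >= 32767.0f)  return INT16_MAX;
    if (s <= -32768.0f) return INT16_MIN;
    return (int16_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
}

/* Q15 -> float (exact: every Q15 value is representable in binary32). */
static float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}
```

Saturation matters in one case: (-1) * (-1) = +1 cannot be represented in Q15, so mul_q15(INT16_MIN, INT16_MIN) clamps to INT16_MAX instead of wrapping around.
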
NatureDSP: from Matlab to SoC
- powerful set of math routines for numerous applications (>1k routines per library)
- filters, FFTs, sci math, neural networks, image processing, matrix decompositions, etc.
- portable code – faster adaptation to new devices
- ported and adapted to many Cadence cores – from the simplest HiFi2 to the very powerful B20 with up to 128 MAC/cycle
- supports all data types, including modern half precision floating point
- highly optimized for every core
- powerful test engine and very large test set (typ. 10 Gbytes)
- strong validation methodology
- friendly for integration into any framework/OS
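
Half precision (IEEE-754 binary16) has 1 sign bit, 5 exponent bits with a bias of 15, and 10 fraction bits. The sketch below makes that layout concrete by converting a binary16 bit pattern to float. The function name half_to_float is hypothetical. A core with native FP16 support would perform this conversion in hardware.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE-754 binary16 bit pattern to float.
   Handles zeros, subnormals, normals, infinities and NaNs. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                                  /* +/- zero */
        } else {
            /* subnormal half (mant * 2^-24): shift until the implicit bit appears */
            int e = -1;
            do { e++; mant <<= 1; } while ((mant & 0x400u) == 0);
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | ((mant & 0x3FFu) << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);         /* infinity or NaN */
    } else {
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);  /* rebias exponent */
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, 0x3C00 decodes to 1.0f, and 0x7BFF decodes to 65504.0f, the largest finite half-precision value.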

Applications and Cores
- audio/speech
- IoT, robotics
- communications
- ADAS
- recognition, neural networks
- image processing

Cadence DSP Cores
- HiFi2, HiFi mini, HiFi3/3z, HiFi4, HiFi5
- ConnX D2
- Fusion F1, G3, G6, J6
- Vision P5, P6, Q7
- Vectra LX
- BBE 16EP, 32EP, 64EP
- B10, B20

Commercial cores
- TI C64x, C55x
- ADI SHARC, Blackfin
- ARM cores from v5TE to v8 (Neon v2)

Main Features
- broad range of DSP categories – from filtering to image processing
- C reference code for faster porting and migration across cores
- excellent validation/testing methodology
- strong coding discipline – ready for MISRA-C certification
- all popular data types supported, including half precision floating point
- ANSI C/IEEE-754 errno/exception support
- code conditionalization to support the configurability of Tensilica cores
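
ANSI C/IEEE-754 errno support means that special inputs are reported the way the C standard library reports them. The sketch below shows this for log(). It assumes a fast kernel that is only valid for positive, finite inputs. The names log_checked, log_kernel_fn and placeholder_kernel are hypothetical.

Following the C standard's math error rules:
- a negative argument is a domain error: errno = EDOM, result NaN;
- zero is a pole error: errno = ERANGE, result -HUGE_VALF.

Raising the FE_INVALID and FE_DIVBYZERO exception flags is left out for brevity.

```c
#include <errno.h>
#include <math.h>

/* HYPOTHETICAL: signature of an optimized log kernel valid only for
   positive, finite inputs. */
typedef float (*log_kernel_fn)(float);

/* Wrap a fast kernel with C/IEEE-754 special-case handling for log(). */
static float log_checked(float x, log_kernel_fn kernel)
{
    if (isnan(x))  return x;                               /* NaN propagates quietly */
    if (x < 0.0f)  { errno = EDOM;   return NAN; }         /* domain error */
    if (x == 0.0f) { errno = ERANGE; return -HUGE_VALF; }  /* pole error: log(+-0) = -inf */
    if (isinf(x))  return x;                               /* log(+inf) = +inf, no error */
    return kernel(x);
}

/* Placeholder kernel for the usage example only; a real one would compute log(x). */
static float placeholder_kernel(float x)
{
    (void)x;
    return 0.0f;
}
```

Keeping the special cases in a thin wrapper lets the vectorized kernel stay branch-free on the common path.
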
Algorithmic Features
- memory layout and cache optimization
- code size optimization
- sci math for all data types
- all popular matrix decompositions (Cholesky, QR, Gauss-Jordan, LU, SVD, eigenvalues, direct inversion) for floating point and even for fixed point data
- hundreds of different FFTs (including all LTE non-radix-2 sizes) for communications and image processing
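
For illustration only, here is what a plain-C reference version of one listed decomposition might look like: LU with partial pivoting, in double precision. The names lu_decompose and lu_solve are hypothetical. The sketch does not reflect the library's API, its optimized kernels, or its fixed-point variants.

```c
#include <stddef.h>

/* In-place LU decomposition with partial pivoting of an n x n row-major
   matrix a, so that P*A = L*U. On return the strict lower triangle holds L
   (unit diagonal implied) and the upper triangle holds U. perm[i] is the
   original row now in row i. Returns 0 on success, -1 if a pivot is zero. */
static int lu_decompose(double *a, size_t *perm, size_t n)
{
    for (size_t i = 0; i < n; i++) perm[i] = i;

    for (size_t k = 0; k < n; k++) {
        /* choose the largest |a[i][k]| on or below the diagonal as pivot */
        size_t p = k;
        double best = 0.0;
        for (size_t i = k; i < n; i++) {
            double v = a[i * n + k];
            if (v < 0.0) v = -v;
            if (v > best) { best = v; p = i; }
        }
        if (best == 0.0) return -1;

        if (p != k) {                              /* swap full rows k and p */
            for (size_t j = 0; j < n; j++) {
                double t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
            size_t t = perm[k]; perm[k] = perm[p]; perm[p] = t;
        }

        for (size_t i = k + 1; i < n; i++) {       /* eliminate below the pivot */
            double l = a[i * n + k] / a[k * n + k];
            a[i * n + k] = l;                      /* store the L multiplier */
            for (size_t j = k + 1; j < n; j++)
                a[i * n + j] -= l * a[k * n + j];
        }
    }
    return 0;
}

/* Solve A x = b from the factors produced by lu_decompose (x must not alias b). */
static void lu_solve(const double *lu, const size_t *perm, const double *b,
                     double *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {               /* forward: L y = P b */
        double s = b[perm[i]];
        for (size_t j = 0; j < i; j++) s -= lu[i * n + j] * x[j];
        x[i] = s;
    }
    for (size_t i = n; i-- > 0;) {                 /* backward: U x = y */
        double s = x[i];
        for (size_t j = i + 1; j < n; j++) s -= lu[i * n + j] * x[j];
        x[i] = s / lu[i * n + i];
    }
}
```

Partial pivoting keeps the multipliers bounded by 1 in magnitude. This is the usual reason to prefer it over plain Gaussian elimination.
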
Testing and specification
- excellent testing methodology
- simulation model to run on a PC or Linux server without the simulator
- Matlab-generated test data
- good coverage: 5…15 Gbytes of test data per library
- dedicated accuracy tests for validation of sci math
- validation of out-of-bounds memory accesses
- internal tests for code coverage
- detailed cycle performance tests
- very detailed performance specification (typ. 300…700 pages per core)
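
A common technique for validating out-of-bounds memory accesses is guard zones. The output buffer is surrounded by memory filled with a known pattern, the routine runs, and the test checks that the pattern is untouched. The sketch below assumes a hypothetical kernel signature (kernel_fn) and arbitrary guard size and fill values. It catches stray writes into the guard zones, not reads.

```c
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS 32                  /* guard zone size on each side (arbitrary) */
#define GUARD_FILL  ((int16_t)0x5A5A)   /* pattern unlikely to be written by accident */

/* HYPOTHETICAL signature of a routine under test: writes n outputs to dst. */
typedef void (*kernel_fn)(int16_t *dst, const int16_t *src, size_t n);

/* buf must hold 2 * GUARD_WORDS + n elements. Returns 1 if both guard zones
   are intact after the kernel runs, 0 on underrun or overrun. */
static int run_with_guards(kernel_fn kernel, int16_t *buf,
                           const int16_t *src, size_t n)
{
    size_t total = 2 * GUARD_WORDS + n;
    for (size_t i = 0; i < total; i++) buf[i] = GUARD_FILL;

    kernel(buf + GUARD_WORDS, src, n);

    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (buf[i] != GUARD_FILL) return 0;                    /* underrun */
        if (buf[GUARD_WORDS + n + i] != GUARD_FILL) return 0;  /* overrun */
    }
    return 1;
}

/* Example kernels: one correct, one that writes one element past the end. */
static void copy_ok(int16_t *dst, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) dst[i] = src[i];
}

static void copy_overrun(int16_t *dst, const int16_t *src, size_t n)
{
    copy_ok(dst, src, n);
    dst[n] = 0;   /* bug: lands in the trailing guard zone */
}
```

Because the stray write lands inside the test's own buffer, the faulty kernel is detected without causing undefined behavior in the harness.
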
Coding Features
- detailed API documents
- reference Matlab code for creating the test data
- reference C code for faster migration to another platform/core/SoC
- ported code written in plain C with core-specific intrinsics
- already applies best practices for memory layout and generic optimization
- no memory allocation or system calls inside – ready to be ported and run under any framework or operating system
- strong coding discipline – ready for MISRA-C certification
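
One common way to keep memory allocation out of a library is an alloc/init/process pattern:
1. The caller asks how many bytes a routine needs.
2. The caller provides that memory.
3. The library builds its state inside it.

The sketch below applies this pattern to a Q15 FIR filter. All names (fir_q15_t, fir_q15_alloc, fir_q15_init, fir_q15_process) are hypothetical and are not the library's actual API.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL FIR object that lives entirely in caller-provided memory. */
typedef struct {
    const int16_t *h;   /* M Q15 coefficients (caller keeps them alive) */
    int16_t *delay;     /* M-sample delay line, placed right after this struct */
    size_t M;           /* number of taps, M >= 1 */
    size_t pos;         /* slot that receives the next input (holds the oldest sample) */
} fir_q15_t;

/* Step 1: bytes the caller must provide for an M-tap filter. */
static size_t fir_q15_alloc(size_t M)
{
    return sizeof(fir_q15_t) + M * sizeof(int16_t);
}

/* Step 2: build the object inside objmem (aligned for fir_q15_t); no malloc. */
static fir_q15_t *fir_q15_init(void *objmem, const int16_t *h, size_t M)
{
    fir_q15_t *f = (fir_q15_t *)objmem;
    f->h = h;
    f->M = M;
    f->pos = 0;
    f->delay = (int16_t *)(f + 1);
    for (size_t i = 0; i < M; i++) f->delay[i] = 0;
    return f;
}

/* Step 3: y[n] = sum_k h[k] * x[n-k] in Q15 with rounding and saturation.
   Filter state persists across calls. */
static void fir_q15_process(fir_q15_t *f, int16_t *y, const int16_t *x, size_t N)
{
    for (size_t n = 0; n < N; n++) {
        f->delay[f->pos] = x[n];                       /* newest overwrites oldest */
        int64_t acc = 0;
        size_t idx = f->pos;
        for (size_t k = 0; k < f->M; k++) {
            acc += (int32_t)f->h[k] * f->delay[idx];   /* h[k] * x[n-k] */
            idx = (idx == 0) ? f->M - 1 : idx - 1;
        }
        f->pos = (f->pos + 1 == f->M) ? 0 : f->pos + 1;
        acc = (acc + (1 << 14)) >> 15;                 /* Q30 -> Q15 (arithmetic shift assumed) */
        if (acc > INT16_MAX) acc = INT16_MAX;
        if (acc < INT16_MIN) acc = INT16_MIN;
        y[n] = (int16_t)acc;
    }
}
```

The caller decides where the object lives: static memory, a stack buffer, or a pool owned by the OS or framework. This is what makes the routine portable across environments.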