Accuracy vs Speed
  • sci math accuracy of 2 ULP for floating point and 2 LSB for fixed point

  • vectorized variants for most functions

  • dedicated functions for specific problem sizes for better performance

  • utilizes all acceleration features of Tensilica cores

A few examples (for B20):

Function                     Floating point      16-bit fixed point
sine                         1 pts/cycle         5.8 pts/cycle
logarithm                    0.8 pts/cycle       4.0 pts/cycle
full arctangent              0.5 pts/cycle       1.2 pts/cycle
complex QR decomposition     38 cycles/matrix    19 cycles/matrix
complex FFT 1024 pts         1484 cycles         545 cycles
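As a rough illustration of what a 2 LSB bound means for 16-bit fixed point, the sketch below measures the error of a Q15 result against a double-precision reference. It assumes the Q15 format (1 sign bit, 15 fractional bits) and round-to-nearest conversion. The names `to_q15` and `lsb_error_q15` are hypothetical and are not part of the NatureDSP API.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value to Q15 (assumed format), rounding to nearest
   and saturating to the int16_t range. Hypothetical helper. */
int16_t to_q15(double x)
{
    assert(x == x);                        /* reject NaN */
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)  return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    return (int16_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
}

/* Absolute error of a Q15 result against a reference value, in LSBs. */
int lsb_error_q15(int16_t result, double reference)
{
    int diff = (int)result - (int)to_q15(reference);
    return diff < 0 ? -diff : diff;
}
```

Under these assumptions, a fixed-point result meets the 2 LSB target when `lsb_error_q15(result, reference) <= 2`.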

IntegrIT Services
  • modifications according to the specific requirements

  • adaptation of the library to a framework/application or customized DSP core

  • migration of an API from one core to another

  • reference code and test harness for simulation models running on a PC without the Tensilica simulator

  • test data set for functional and performance validation on customizable core/system

  • porting directly from your Matlab code to an optimized version for your SoC

Data types and accuracy
  • 8-bit, 16-bit, 32-bit fixed point arithmetic

  • single precision floating point 

  • double precision floating point 

  • half precision (16-bit floating point)

  • excellent 2 ULP accuracy for all floating point math 

  • ANSI C/IEEE-754 errno/exception support for sci math

  • streaming data formats for better SIMD vectorization
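One common way to check a floating-point result against a 2 ULP bound is to count how many representable values lie between the result and a correctly rounded reference. The sketch below does this for single precision, assuming `float` is a 32-bit IEEE-754 type. The function names are hypothetical, and the library's own accuracy tests may define ULP error differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map an IEEE-754 single-precision value to an integer that grows
   monotonically with the value (+0.0f and -0.0f both map to 0). */
int64_t float_to_ordered(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);        /* read the bit pattern safely */
    return bits < 0 ? (int64_t)INT32_MIN - bits : (int64_t)bits;
}

/* Distance between two non-NaN floats in units in the last place. */
int64_t ulp_distance(float a, float b)
{
    assert(a == a && b == b);              /* NaN has no meaningful distance */
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return d < 0 ? -d : d;
}
```

With this measure, a result would pass a 2 ULP check when `ulp_distance(result, reference) <= 2`, where `reference` is, for example, a double-precision computation rounded to `float`.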

NatureDSP: from Matlab to SoC
  • powerful set of math routines for numerous applications (>1k routines per library)

  • filters, FFTs, sci math, neural, image processing, matrix decompositions, etc.

  • portable code – faster adaptation to new devices

  • ported and adapted to many Cadence cores – from the simplest HiFi2 to the very powerful B20 with up to 128 MAC/cycle

  • supports all data types including modern half precision floating point

  • highly optimized for every core

  • powerful test engine and very large test set (typically 10 Gbytes)

  • strong validation methodology

  • friendly for integration into any framework/OS

Applications and Cores
  • audio/speech

  • IoT, robotics

  • communications

  • ADAS

  • recognition, neural networks

  • image processing

Cadence DSP Cores

  • HiFi2, HiFi mini, HiFi3/3z, HiFi4, HiFi5

  • ConnX D2

  • Fusion F1, G3, G6, J6

  • Vision P5, P6, Q7

  • Vectra LX

  • BBE 16EP, 32EP, 64EP

  • B10, B20

Commercial cores

  • TI C64x, C55x

  • ADI SHARC, Blackfin

  • ARM cores from v5TE to v8 (Neon v2)

Main Features
  • broad range of DSP categories – from filtering to image processing

  • C reference code for faster porting and migration across cores

  • excellent validation/testing methodology

  • strong coding discipline – ready for MISRA-C certification

  • all popular data types supported, including half precision floating point

  • ANSI C/IEEE-754 errno/exception support

  • code conditionalization to support configurability of Tensilica cores

Algorithmic Features
  • memory layout and cache optimization

  • code size optimization

  • sci math for all data types

  • all popular matrix decompositions (Cholesky, QR, Gauss-Jordan, LU, SVD, eigenvalues, direct inversion) for floating point and even for fixed point data

  • hundreds of different FFTs (including all LTE non-radix-2 sizes) for communications and image processing

Testing and specification
  • excellent testing methodology

  • simulation model to run on a PC or Linux server without a simulator

  • Matlab-generated test data

  • good coverage – 5…15 Gbytes of test data per library

  • dedicated accuracy tests for validation of sci math

  • validation of out-of-bounds memory accesses

  • internal tests for code coverage 

  • detailed cycle performance tests

  • very detailed specification of performance (typically 300…700 pages per core)
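Out-of-bounds writes are often caught in a test harness by surrounding each buffer with guard bands and checking them after the call. The sketch below shows that general technique only. The helper names, the 16-byte guard size and the 0xA5 fill pattern are assumptions for illustration and do not describe the actual NatureDSP test engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_BYTES   16      /* assumed guard band size */
#define GUARD_PATTERN 0xA5    /* assumed fill pattern */

/* Fill the guard bands of an arena that holds `size` payload bytes plus
   2 * GUARD_BYTES, and return a pointer to the payload region. */
uint8_t *guard_arm(uint8_t *arena, size_t size)
{
    assert(arena != NULL);
    memset(arena, GUARD_PATTERN, GUARD_BYTES);
    memset(arena + GUARD_BYTES + size, GUARD_PATTERN, GUARD_BYTES);
    return arena + GUARD_BYTES;
}

/* Return 1 if both guard bands are intact, 0 if either was overwritten. */
int guard_check(const uint8_t *arena, size_t size)
{
    assert(arena != NULL);
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (arena[i] != GUARD_PATTERN) return 0;
        if (arena[GUARD_BYTES + size + i] != GUARD_PATTERN) return 0;
    }
    return 1;
}
```

A harness would call `guard_arm` before running the function under test on the returned payload pointer and `guard_check` afterwards. This approach detects stray writes only; stray reads need other means, such as a memory-checking simulator.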

Coding Features
  • detailed API documents

  • reference Matlab code for creating the test data

  • reference C code for faster migration to another platform/core/SoC

  • ported code written in plain C with core-specific intrinsics

  • already utilizes the best practices for memory layout and generic optimization

  • no memory allocation or system calls inside – the code is ready to port and run under any framework or operating system

  • strong coding discipline – ready for MISRA-C certification
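The "no memory allocation inside" property usually means that the caller owns all state and working memory. A minimal sketch of that pattern, using a hypothetical moving-average filter (the `mavg_*` names and the interface are illustrative assumptions, not the NatureDSP API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter state; every field lives in caller-owned memory. */
typedef struct {
    size_t   len;     /* window length */
    size_t   pos;     /* index of the oldest sample */
    int32_t  sum;     /* running sum of the window */
    int16_t *delay;   /* delay line supplied by the caller */
} mavg_t;

/* Initialize the filter over a caller-supplied delay line of `len` samples.
   Nothing is allocated and no system call is made. */
void mavg_init(mavg_t *f, int16_t *delay, size_t len)
{
    assert(f != NULL && delay != NULL && len > 0);
    f->len = len;
    f->pos = 0;
    f->sum = 0;
    f->delay = delay;
    for (size_t i = 0; i < len; i++) delay[i] = 0;
}

/* Push one sample and return the current window average. */
int16_t mavg_process(mavg_t *f, int16_t x)
{
    f->sum += x - f->delay[f->pos];        /* replace the oldest sample */
    f->delay[f->pos] = x;
    f->pos = (f->pos + 1 == f->len) ? 0 : f->pos + 1;
    return (int16_t)(f->sum / (int32_t)f->len);
}
```

Because the state and delay line can be placed on the stack, in a static buffer, or in a framework-managed pool, code in this style runs unchanged on bare metal or under an operating system.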