New Horizons

Welcome to my blog

My name is Sven Andersson and I
work as a consultant in embedded
system design, implemented in ASIC
and FPGA.
In my spare time I write this blog
and I hope it will inspire others to
learn more about this fantastic field.
I live in Stockholm, Sweden, and have
my own company.


You are welcome to contact me
and ask questions or make comments
about my blog.



Saturday, February 22, 2014
Zynq design from scratch. Part 15.

Benchmarking the ARM Cortex-A9 processor

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term 'benchmark' is also commonly used for the elaborately designed benchmarking programs themselves.

Benchmarking is usually associated with assessing performance characteristics of computer hardware, for example, the floating point operation performance of a CPU, but there are circumstances when the technique is also applicable to software. Software benchmarks are, for example, run against compilers or database management systems.

CPU core benchmarking

Although it doesn’t reflect how you would use a processor in a real application, sometimes it’s important to isolate the CPU’s core from the other elements of the processor and focus on one key element. For example, you might want to ignore memory and I/O effects and focus primarily on the pipeline operation. This is CoreMark’s domain. CoreMark is capable of testing a processor’s basic pipeline structure, as well as basic read/write operations, integer operations, and control operations.


CoreMark is a benchmark that aims to measure the performance of central processing units (CPUs) used in embedded systems. It was developed in 2009 by Shay Gal-On at EEMBC and is intended to become an industry standard, replacing the antiquated Dhrystone benchmark. The code is written in C and contains implementations of the following algorithms: list processing (find and sort), matrix manipulation (common matrix operations), state machine (determine if an input stream contains valid numbers), and CRC.
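To give a feeling for the kind of integer and bit-manipulation work such a benchmark exercises, here is a minimal sketch of a bitwise CRC-16 routine. It is not taken from the CoreMark sources; the function name crc16_update and the choice of the common CRC-16/ARC parameters (reflected polynomial 0xA001, initial value 0) are my own assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative CRC-16/ARC (reflected polynomial 0xA001).
 * Hypothetical helper, not CoreMark's own CRC code.
 * Pass 0 as the initial crc; the result can be fed back in
 * to continue over further data. */
uint16_t crc16_update(uint16_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                      /* fold the next byte in */
        for (int bit = 0; bit < 8; bit++) {  /* process it bit by bit */
            if (crc & 1u)
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            else
                crc >>= 1;
        }
    }
    return crc;
}
```

Running it over the ASCII string "123456789" with an initial value of 0 gives 0xBB3D, the published check value for CRC-16/ARC. The inner loop is nothing but shifts, XORs and a branch, which is the pipeline-bound work a core benchmark wants to measure.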

Downloading CoreMark

The test suite can be downloaded from

Here is the result after unpacking. We will create a new application in SDK called CoreMark and copy the marked C files to the src directory.


Trying to compile the CoreMark program without modifications gives the following error:

undefined reference to `clock_gettime'

We are running this application "bare metal" (without an OS). This means we don't have access to a real-time clock (RTC) and we cannot use the library routines in time.h. It looks like we have to write our own "clock_gettime" routine.
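For comparison, this is roughly what timing looks like on a hosted system where the operating system does provide clock_gettime. It is a minimal sketch and not the CoreMark port code: the helper names read_clock and timespec_diff_secs are made up for this example, and I am assuming a POSIX system with CLOCK_MONOTONIC available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() in <time.h> */
#endif
#include <assert.h>
#include <time.h>

/* Difference between two timestamps in seconds (stop must not be
 * earlier than start). Hypothetical helper for this example. */
double timespec_diff_secs(struct timespec start, struct timespec stop)
{
    assert(stop.tv_sec >= start.tv_sec);
    return (double)(stop.tv_sec - start.tv_sec)
         + (double)(stop.tv_nsec - start.tv_nsec) / 1e9;
}

/* Read the monotonic clock. This only links where an OS supplies
 * clock_gettime(), which is exactly what is missing on bare metal. */
struct timespec read_clock(void)
{
    struct timespec now;
    int rc = clock_gettime(CLOCK_MONOTONIC, &now);
    assert(rc == 0);
    (void)rc;
    return now;
}
```

A hosted program would call read_clock() before and after the work to be measured and pass both values to timespec_diff_secs(). On bare metal, the same two readings have to come from a hardware timer instead.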

Bare-metal application development

Xilinx software design tools facilitate the development of embedded software applications for many runtime environments. Xilinx embedded design tools create a set of hardware platform data files that include:

• An XML-based hardware description file describing processors, peripherals, memory maps, and additional system data
• A bitstream file containing optional Programmable Logic (PL) programming data
• A block RAM Memory Map (BMM) file
• PS configuration data used by the Zynq-7000 AP SoC First Stage Bootloader (FSBL).

The bare-metal Board Support Package (BSP) is a collection of libraries and drivers that form the lowest layer of your application. The runtime environment is a simple, semi-hosted and single-threaded environment that provides basic features, including boot code, cache functions, exception handling, basic file I/O, C library support for memory allocation and other calls, processor hardware access macros, timer functions, and other functions required to support bare-metal applications. Using the hardware platform data and bare-metal BSP, you can develop, debug, and deploy bare-metal applications using SDK.

Board support package

The BSP <standalone_bsp_0> we generated in our first software project stores all the information about our board setup and all the software we need to start writing a bare-metal program. The libsrc directory contains low-level drivers and example code to be used when writing software to access the hardware in the processing system. We will take a closer look at the scutimer_v1_02_a directory.

Writing our own clock_gettime

We will use one of the timers available in the ARM processor to count clock cycles and measure time intervals. Let's take a look at the timer setup. Here is a picture taken from chapter 8 in the Zynq-7000 Technical Reference Manual.


Each Cortex-A9 processor has its own private 32-bit timer and 32-bit watchdog timer. Both processors share a global 64-bit timer. These timers are always clocked at 1/2 of the CPU frequency, so with the CPU running at 667 MHz they count at about 333 MHz. On the system level, there is a 24-bit watchdog timer and two 16-bit triple timer/counters. The system watchdog timer is clocked at 1/4 or 1/6 of the CPU frequency, or can be clocked by an external signal from an MIO pin or from the PL. The two triple timer/counters are always clocked at 1/4 or 1/6 of the CPU frequency, and are used to count the widths of signal pulses from an MIO pin or from the PL. Read more about the timers in the Cortex-A9 MPCore Technical Reference Manual chapter 4.
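The clocking described above is what lets us turn raw counter values into seconds. Here is a minimal sketch of that arithmetic, assuming the private timer counts down, ticks at half the CPU frequency, and is further divided by (prescaler + 1). The function names are my own and the 667 MHz figure is the nominal CPU clock used in this blog.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed on a down-counting timer: the start value is the
 * larger one. Assumes the counter did not reach zero in between. */
uint32_t elapsed_ticks(uint32_t start, uint32_t stop)
{
    return start - stop;
}

/* Convert private-timer ticks to seconds.
 * cpu_hz    : CPU clock in Hz (the timer runs at cpu_hz / 2)
 * prescaler : 8-bit prescaler register value; the divider is
 *             prescaler + 1 */
double ticks_to_seconds(uint32_t ticks, uint32_t cpu_hz, uint8_t prescaler)
{
    assert(cpu_hz >= 2);
    double tick_hz = (cpu_hz / 2.0) / (prescaler + 1.0);
    return ticks / tick_hz;
}
```

With a prescaler of 0 and a 667 MHz CPU clock, a 32-bit counter covers 2^32 / 333.5 MHz, which is just under 13 seconds. That is worth keeping in mind because CoreMark has to run for at least 10 seconds to give a valid result; a larger prescaler extends the range at the cost of resolution.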

Program example

Here is an example program that uses the ARM CPU private timer to measure the time it takes to run the CoreMark benchmark program. It is called from core_portme.c to read the timer counter register before the benchmark starts and again when it has finished.

ee_u32 GetTimerValue(ee_u32 TimerIntrId, ee_u16 Mode)
{
    int                 Status;
    XScuTimer_Config    *ConfigPtr;
    volatile ee_u32     CntValue  = 0;
    XScuTimer           *TimerInstancePtr = &Timer;

    if (Mode == 0) {

      // Initialize the Private Timer so that it is ready to use

      ConfigPtr = XScuTimer_LookupConfig(TimerIntrId);

      Status = XScuTimer_CfgInitialize(TimerInstancePtr, ConfigPtr,
                                       ConfigPtr->BaseAddr);

      if (Status != XST_SUCCESS) {
          return XST_FAILURE;
      }

      // Load the timer prescaler register.

      XScuTimer_SetPrescaler(TimerInstancePtr, TIMER_RES_DIVIDER);

      // Load the timer counter register.

      XScuTimer_LoadTimer(TimerInstancePtr, TIMER_LOAD_VALUE);

      // Start the timer counter and read start value

      XScuTimer_Start(TimerInstancePtr);
      CntValue = XScuTimer_GetCounterValue(TimerInstancePtr);

    }
    else {

       // Read stop value and stop the timer counter

       CntValue = XScuTimer_GetCounterValue(TimerInstancePtr);
       XScuTimer_Stop(TimerInstancePtr);

    }

    return CntValue;
}
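To show how a routine like this plugs into the benchmark, here is a minimal sketch of the timing hooks that a CoreMark port provides (start_time, stop_time, get_time and time_in_secs). This is not the author's core_portme.c. So that the sketch can be compiled on a host PC, GetTimerValue is replaced by a stand-in that returns a simulated counter, the typedefs are defined locally, and the tick rate of 333.5 MHz (half of 667 MHz, prescaler 0) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the CoreMark port types (assumed widths). */
typedef uint32_t ee_u32;
typedef uint16_t ee_u16;
typedef ee_u32   CORE_TICKS;
typedef double   secs_ret;

/* Assumed tick rate: half of a 667 MHz CPU clock, prescaler 0. */
#define TIMER_TICKS_PER_SEC 333500000.0

/* Simulated down counter; on the target this is the real timer. */
ee_u32 fake_counter = 0xFFFFFFFFu;

/* Hypothetical stand-in for the hardware routine shown above. */
ee_u32 GetTimerValue(ee_u32 TimerIntrId, ee_u16 Mode)
{
    (void)TimerIntrId;
    (void)Mode;
    return fake_counter;
}

static CORE_TICKS start_val, stop_val;

void start_time(void) { start_val = GetTimerValue(0, 0); }
void stop_time(void)  { stop_val  = GetTimerValue(0, 1); }

/* The timer counts down, so elapsed ticks = start - stop. */
CORE_TICKS get_time(void)
{
    assert(start_val >= stop_val);  /* counter must not have wrapped */
    return start_val - stop_val;
}

secs_ret time_in_secs(CORE_TICKS ticks)
{
    return (secs_ret)ticks / TIMER_TICKS_PER_SEC;
}
```

The benchmark calls start_time() just before the measured loop and stop_time() right after it, then converts get_time() to seconds. The subtraction order matters here because the private timer is a down counter.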


Compiling the modified code

Here is all the source code that will be compiled. The modified files core_portme.h and core_portme.c are ready to be downloaded.

Compilation setup

Right-click the CoreMark project and select C/C++ Build Settings. We will define the following symbols

and select the highest optimization level (-O3).

Compilation printout

Running CoreMark

Here is a printout from the CoreMark program.

CoreMark benchmark result

With 1998 iterations/sec and the CPU running at 667 MHz, we get a CoreMark value of 1998/667 ≈ 3.0 CoreMark/MHz. All you compiler experts out there, please let me know about other ways to improve this result.
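The arithmetic behind that number can be written down as a small sketch. The function names are my own, and the only assumption is that the score is simply iterations per second divided by the CPU clock in MHz.

```c
#include <assert.h>

/* Iterations per second from an iteration count and elapsed time. */
double iterations_per_sec(unsigned long iterations, double elapsed_secs)
{
    assert(elapsed_secs > 0.0);
    return (double)iterations / elapsed_secs;
}

/* Normalize a CoreMark score to the CPU clock (CoreMark/MHz). */
double coremark_per_mhz(double iter_per_sec, double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return iter_per_sec / (cpu_hz / 1e6);
}
```

Calling coremark_per_mhz(1998.0, 667e6) gives a value just under 3.0, matching the figure above. Normalizing to MHz makes it possible to compare cores that run at different clock frequencies.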

More benchmarking

Z-7020 based ZC702 evaluation platform


Posted at 11:41 by svenand

