## Introduction

On this page we will present the design of a sine and cosine computer.

## Specification

We will assume that the input angle is represented in radians, and that it is
in the range between `-π/2`

and `π/2`

. Other angles can be handled by adding a
factor `π`

first (modulo `2π`

) and changing the sign of the sine and cosine
result.

The floating point numbers are represented as integers. Technically, this means
rounding them to a fixed number of bits, and then multiplying them with the
factor `2**F`

, where `F`

is the number of bits after the point.

Both the sine and the cosine of the input angle will be computed. The interface of the module looks as follows:

def SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset): """ Sine and cosine computer. This module computes the sine and cosine of an input angle. The floating point numbers are represented as integers by scaling them up with a factor corresponding to the number of bits after the point. Ports: ----- cos_z0: cosine of the input angle sin_z0: sine of the input angle done: output flag indicating completion of the computation z0: input angle; -pi/2 <= z0 <= pi/2 start: input that starts the computation on a posedge clock: clock input reset: reset input """

## Unit test

We will first write a unit test for the design. The idea is to use the `cos`

and `sin`

functions from the `math`

module to compute the expected results on a
number of input angles, and to compare them with the outputs from the design
under test. Here is the code:

from math import pi, sin, cos, log import random from myhdl import * from SineComputer import SineComputer, SineComputer_v def bench(fractionSize, errorMargin, nrTests=100): """ Test bench for SineComputer. fractionSize: number of bits after the point errorMargin: margin for rounding errors on result nrTests: number of tests vectors """ # scaling factor to represent floats as integers M = 2**fractionSize # maximum angle ZMAX = int(round(M*pi/2)) # error margin shorthand D = errorMargin # signals cos_z0 = Signal(intbv(0, min=-D, max=M+D)) sin_z0 = Signal(intbv(0, min=-M-D, max=M+D)) z0 = Signal(intbv(0, min=-ZMAX, max=ZMAX+1)) done = Signal(False) start = Signal(False) clock = Signal(bool(0)) reset = Signal(True) # design under test dut = SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset) # clock generator @always(delay(10)) def clockgen(): clock.next = not clock # test vector setup testAngles = [-pi/2, -pi/4, 0.0, pi/4, pi/2] testAngles.extend([random.uniform(-pi/2, pi/2) for i in range(nrTests)]) # actual test @instance def check(): yield clock.negedge reset.next = False for z in testAngles: yield clock.negedge z0.next = int(round(M*z)) start.next = True yield clock.negedge start.next = False yield done.posedge exp_cos_z0 = int(round(cos(z)*M)) exp_sin_z0 = int(round(sin(z)*M)) assert abs(cos_z0 - exp_cos_z0) < D assert abs(sin_z0 - exp_sin_z0) < D raise StopSimulation return dut, clockgen, check

As the design will perform its calculations in finite precision, there will be rounding errors. Therefore, the comparisons between expected and actual results are performed using an error margin. We may want to use the test bench for designs with different characteristics; therefore, the error margin is made a parameter.

The precision is specified in terms of the number of bits after the point,
using the parameter `fractionSize`

.

To represent the numbers, we use the `intbv`

class, which is basically an
integer-like type with bit-vector capabilities. Note that we constrain the
`intbv`

instances by specifying the range of valid integer values, not by a bit
width. For high-level, algorithmic work, this is much easier. Moreover, it
enables fine-grained range error checking at run-time.

By filling in the parameters of the `bench`

function, we can construct an
actual test bench that runs a simulation, as follows:

def test_bench(): fractionSize = 18 errorMargin = fractionSize tb = bench(fractionSize, errorMargin) sim = Simulation(tb) sim.run()

A function with a name starting with `test_`

is automatically recognized and run by a unit testing framework such as `py.test`

.

## Algorithm

To implement the design, we will use the Cordic algorithm, a very popular algorithm to compute trigonometric functions in hardware. On this page, we are mainly interested in the mechanical characteristics of the algorithm and their hardware implications. For more information and background on the algorithm itself, please consult other sources, such as this paper by Ray Andraka.

The Cordic algorithm is an iterative algorithm based on vector rotations over
elementary angles. The algorithm normally operates in one of two modes. In
rotation mode, it rotates a vector (x_{0}, y_{0}) in the
Cartesian plane over an input angle z_{0}. The Cordic equations for
this mode are:

x_{i +1} = x_{i} - y_{i}d_{i}2^{-i}

y_{i +1} = y_{i} - x_{i}d_{i}2^{-i}

z_{i +1} = y_{i} - d_{i}tan^{-i}(2^{-i})

where

d_{i} = -1 if z_{i} < 0, else +1.

These equations can be implemented with relatively simple hardware. This is the
characteristic that makes the Cordic algorithm attractive. In particular,
multiplications with a factor 2^{-i} are simply shift-right operations.
Also, the arctangent values tan^{-i}(2^{-i}) can be precomputed
and stored in a small look-up table.

The Cordic equations can be used for a variety of computations. For our purposes, it can be shown that

x_{n} = cos z_{0}

y_{n} = sin z_{0}

for the following initial conditions:

x_{0} = 1 / A_{n}

y_{0} = 0

where

A_{n} = Product[ sqrt(1 + 2^{-2i}) ] with i = 0 ... n-1

## Design

The Cordic algorithm can be implemented in many ways, with various characteristics and advantages. On this page, we will implement a parallel, iterative processor, which is a fairly straightforward mapping of the equations into a bit-parallel data path and a state machine.

from math import atan, sqrt, ceil, floor, pi from myhdl import * t_State = enum("WAITING", "CALCULATING") def SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset): """ Sine and cosine computer. This module computes the sine and cosine of an input angle. The floating point numbers are represented as integers by scaling them up with a factor corresponding to the number of bits after the point. Ports: ----- cos_z0: cosine of the input angle sin_z0: sine of the input angle done: output flag indicated completion of the computation z0: input angle start: input that starts the computation on a posedge clock: clock input reset: reset input """ # angle input bit width W = len(z0) # angle input z0 represents number between -pi/2 and pi/2 # scaling factor corresponds to the nr of bits after the point M = 2 ** (W-2) # nr of iterations equals nr of significant input bits N = W-1 # calculate X0 An = 1.0 for i in range(N): An *= (sqrt(1 + 2**(-2*i))) # X0 X0 = int(round(M*1/An)) # tuple with elementary angles angles = tuple([int(round(M*atan(2**(-i)))) for i in range(N)]) # iterative cordic processor @instance def processor(): x = intbv(0, min=sin_z0.min, max=sin_z0.max) y = intbv(0, min=sin_z0.min, max=sin_z0.max) z = intbv(0, min=z0.min, max=z0.max) dx = intbv(0, min=sin_z0.min, max=sin_z0.max) dy = intbv(0, min=sin_z0.min, max=sin_z0.max) dz = intbv(0, min=z0.min, max=z0.max) i = intbv(0, min=0, max=N) state = t_State.WAITING while True: yield clock.posedge, reset.posedge if reset: state = t_State.WAITING cos_z0.next = 1 sin_z0.next = 0 done.next = False x[:] = 0 y[:] = 0 z[:] = 0 i[:] = 0 else: if state == t_State.WAITING: if start: x[:] = X0 y[:] = 0 z[:] = z0 i[:] = 0 done.next = False state = t_State.CALCULATING elif state == t_State.CALCULATING: dx[:] = y >> i dy[:] = x >> i dz[:] = angles[int(i)] if (z >= 0): x -= dx y += dy z -= dz else: x += dx y -= dy z += dz if i == N-1: cos_z0.next = x sin_z0.next = y state = t_State.WAITING done.next = True else: i += 1 return processor

The actual computation is done by the `processor`

generator. Note that outside
the generator function, we calculate some data such as the `X0`

constant, and
the look-up table of elementary arctangents, represented by the `angles`

tuple.

The internal number variables are represented by `intbv`

instances. The dual
nature of this class comes in very handy. On the one hand, we can constrain the
instances as integer subtypes by specifying the valid integer range at
construction time. On the other hand, we can access their two's complement
representation as a bit vector, for example for slicing or right-shifting.

It seems obvious that a type that unifies the integer and the bit vector views
should be very useful for hardware design. One would therefore expect a similar
feature in other HDLs. However, I believe that it is actually a unique
capability offered by MyHDL. Other HDLs seem to try to solve the issues by
creating more and more integer and bit-vector like types. In MyHDL, a single
type does it all - the `intbv`

class.

`py.test`

confirms that this is a valid impementation:

> py.test testing-mode: inprocess executable: /usr/local/bin/python (2.4.2-final-0) using py lib: /usr/local/lib/python2.4/site-packages/py <rev unknown> test_SineComputer.py[1] .

## Automatic conversion to Verilog

Now that we have a working design (and only now!) we can attempt to convert it
to Verilog automatically. In MyHDL, this is done by using the `toVerilog`

function. For example, consider the instantiation of the design under test in
the test bench:

# design under test dut = SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset)

To convert the design instance to Verilog, this line can be replaced by the following:

# design under test dut = toVerilog(SineComputer, cos_z0, sin_z0, done, z0, start, clock, reset)

As before, the `dut`

object is a simulatable design instance, but as a side
effect of the instantiation, an equivalent Verilog module file will be
generated. The Verilog output is as follows:

module SineComputer ( cos_z0, sin_z0, done, z0, start, clock, reset ); output signed [19:0] cos_z0; reg signed [19:0] cos_z0; output signed [19:0] sin_z0; reg signed [19:0] sin_z0; output done; reg done; input signed [19:0] z0; input start; input clock; input reset; always @(posedge clock or posedge reset) begin: _SineComputer_processor reg [5-1:0] i; reg [1-1:0] state; reg signed [20-1:0] dz; reg signed [20-1:0] dx; reg signed [20-1:0] dy; reg signed [20-1:0] y; reg signed [20-1:0] x; reg signed [20-1:0] z; if (reset) begin state = 1'b0; cos_z0 <= 1; sin_z0 <= 0; done <= 0; x = 0; y = 0; z = 0; i = 0; end else begin // synthesis parallel_case full_case casez (state) 1'b0: begin if (start) begin x = 159188; y = 0; z = z0; i = 0; done <= 0; state = 1'b1; end end 1'b1: begin dx = $signed(y >>> $signed({1'b0, i})); dy = $signed(x >>> $signed({1'b0, i})); // synthesis parallel_case full_case case (i) 0: dz = 205887; 1: dz = 121542; 2: dz = 64220; 3: dz = 32599; 4: dz = 16363; 5: dz = 8189; 6: dz = 4096; 7: dz = 2048; 8: dz = 1024; 9: dz = 512; 10: dz = 256; 11: dz = 128; 12: dz = 64; 13: dz = 32; 14: dz = 16; 15: dz = 8; 16: dz = 4; 17: dz = 2; default: dz = 1; endcase if ((z >= 0)) begin x = x - dx; y = y + dy; z = z - dz; end else begin x = x + dx; y = y - dy; z = z + dz; end if ((i == (19 - 1))) begin cos_z0 <= x; sin_z0 <= y; state = 1'b0; done <= 1; end else begin i = i + 1; end end endcase end end endmodule

## A discussion about some convertor features

With this example, some interesting features of the Verilog convertor can be illustrated.

### Taking advantage of the elaboration phase

It is important to realize that the conversion occurs on a design *instance*.
This means that the code has already been *elaborated* by the Python
interpreter. Therefore, the convertor works on the simulatable data structure,
which is a (hierarchical) list of generators. This means that only the source
code of generator functions is converted. The pleasant consequence is that the
restrictions of the "convertible subset" apply only to the code inside
generator functions, not to any code outside them.

For example, consider how the look-up table of elementary arctangents is set up
in the `SineComputer`

design:

# tuple with elementary angles angles = tuple([int(round(M*atan(2**(-i)))) for i in range(N)])

This line uses things like a list comprehension and a call to the trigonometric
function `atan`

from the `math`

library, At this point, this is beyond the
scope of the convertible subset, and it may stay like that forever. But as this
code is outside a generator function, it doesn't matter.

Inside the `processor`

function, the lookup-up table is used as follows:

z[:] = angles[int(i)]

This is just fine for the convertor. Note how this single-line lookup is expanded into a case statement in the Verilog output.

Note that we have been talking about the convertible subset, and not about the "synthesizable subset". The reason is that the convertible subset is much less restrictive. You can find more information about the convertible subset in the MyHDL manual.

Obviously, MyHDL code intended for synthesis also has to take synthesis-related restrictions into account. But again, these restrictions only apply to the code inside generator functions, because only that code is actually converted.

The described behavior is a unique feature of the MyHDL design flow. Outside generator functions, you can use Python's full power to describe designs. As long as the code inside them obeys the constraints of the convertible subset, the design instance can always be converted to Verilog. And if that code also obeys the constraints of the synthesizable subset, the result will be synthesizable Verilog.

### Handling negative numbers

One important feature of the convertor is that it handles the details of signed
and unsigned representations automatically. When writing synthesizable code, a
MyHDL designer can use a high-level view for integer operations by using the
`intbv`

type, and rely on the underlying two's complement representation to do
the right thing automatically. In contrast, a Verilog designer is forced to
deal with low-level representational issues explicitly. This can become very
tricky, especially with negative numbers and the signed representation.

First of all, note in the Verilog output that the convertor infers which
variables have to be declared as `signed`

. This is the easy part.

Now consider the following line in the MyHDL code of `SineComputer`

:

dx[:] = y >> i

I believe it's quite clear what this is supposed to do. With the underlying
two's complement representation, it works for positive and negative values of
`y`

.

Now consider how this has been translated into Verilog:

dx = $signed(y >>> $signed({1'b0, i}));

The convertor has to deal with several potential pitfalls. The fundamental problem is that Verilog uses an unsigned interpretation by default, which is the opposite from what you should do to get the naturally expected results.

First, the convertor uses arithmetic shift (`>>>`

) instead of bit-wise shift
(`>>`

), to have the sign bit shifted in instead of a zero.

Secondly, the second (unsigned) operand is typecasted to `$signed`

, after
adding a sign bit. Otherwise, Verilog will interprete all operands in a mixed
expression as unsigned. Having said that, the Verilog literature seems to
indicate that a shift operation is an exception to this rule. But the convertor
doesn't take risks and inserts the typecast as a general measure. It may be
redundant in this case.

Finally, the whole expression is typecasted to `$signed`

. Actually, I would
have expected that this typecast would not be necessary - after all, we are
shifting a signed number. However, my current simulator (Cver) tells me that it
is. It may be wrong, I don't know. Anyway, the cast is an additional safety
net.

The message you should get from this discussion is the following: working with the signed representation in Verilog is tricky. I believe that writing the code in a natural, high-level way in MyHDL, and letting the convertor take care of the low-level representation issues, is a better option. (Of course, we still need to make sure that the convertor gets it right, which is hard enough.)

## Verilog co-simulation

Clearly we will want to verify that the Verilog output from the convertor is correct. For this purpose, MyHDL supports co-simulation with Verilog.

To set up a co-simulation, we need to create a Cosimulation object for the Verilog design. The Verilog convertor makes this task easier. In addition to the Verilog code for the design itself, it also generates a Verilog test bench stub that defines an interface between the Verilog design and a Cosimulation object.

The following function creates a Cosimulation object for our design:

def SineComputer_v(cos_z0, sin_z0, done, z0, start, clock, reset): toVerilog(SineComputer, cos_z0, sin_z0, done, z0, start, clock, reset) cmd = "cver -q +loadvpi=myhdl_vpi:vpi_compat_bootstrap " + \ "SineComputer.v tb_SineComputer.v" return Cosimulation(cmd, **locals())

We start by doing the Verilog conversion itself first. Then, we define the command to start up the Verilog simulator. This is of course simulator-specific. The command shown is for the open-source Cver simulator. It loads a vpi module that defines the interface between the MyHDL simulator and the Verilog simulator. Also, both the Verilog code for the design and the test bench stub are compiled.

The Cosimulation object is then constructed with the command as its first
parameter, followed by a number of keyword arguments. The keyword arguments
make the link between signal names declared in the Verilog test bench stub and
signals in the MyHDL code. When the signal names in MyHDL and Verilog are
identical we can use a little trick and simply pass the local namespace
dictionary `locals()`

to the constructor.

In the test bench code, we replace the instantiation of the MyHDL module with this function:

dut = SineComputer_v(cos_z0, sin_z0, done, z0, start, clock, reset)

When running py.test again, we now run a co-simulation between the MyHDL test bench and the Verilog code:

testing-mode: inprocess executable: /usr/local/bin/python (2.4.2-final-0) using py lib: /usr/local/lib/python2.4/site-packages/py <rev unknown> test_SineComputer.py[1] Copyright (c) 1991-2005 Pragmatic C Software Corp. All Rights reserved. Licensed under the GNU General Public License (GPL). See the 'COPYING' file for details. NO WARRANTY provided.

So the test bench confirms that the Verilog code is correct.

**Historical note** When this example was developed with MyHDL 0.5, this test
failed, showing that the convertor had bugs. It turned out that the handling of
signed variables with shift operations had not been implemented and tested
properly. Therefore, this example has been the trigger to fix these bugs and
develop MyHDL 0.5.1.

## Implementation

To confirm synthesizablity, this example was synthesized with Xilinx ISE and targeted to a Spartan FPGA. For detailed information, you can review the synthesis report.