

In the next two sections I will show you a simple example of multi-threading.



+ This example shows you how to start three threads.
+ You will see the effect of threading.



In this section I will do a code walkthrough. At each step I'll show the registers involved, the C code, and the assembly code.

In the next section I will go through compiling the code and running it in the debugger.

## MT TC Example: Intrinsics, macros, and #defines for the MIPS® MT ASE

These allow easy access from C code to special MT instructions and operations. The intrinsics are defined in the file include/mips/mt.h; put `#include <mips/mt.h>` in your C source file. Refer to the mt.h file for more information.

I have already shown you some of the macros that allow you to program in C. As a reminder, these are located in the include/mips/mt.h file.

| Field | Bits | MVPControl - CP0 #0-Sel1 | Read/Write | Reset State |
|-------|------|--------------------------|------------|-------------|
| VPC | 1 | VPE Configuration State. If set, allows writing to normally read-only configuration register fields on conventional MIPS32 CPUs. | R/W | 0 |
| EVP | 0 | Enable Virtual Processors. If set, execute instructions for all threads on activated VPEs. If cleared, execute instructions only for the thread which is running when cleared. | R/W | 0 |

| Assembly code | Comment |
|---------------|---------|
| … | load the value for the combined VPC and EVP fields |

The first thing the code needs to do is to put the processor into a mode where we can use the CP0 registers to configure the threads we want to run.

Two fields of the MVPControl register are used here, VPC and EVP.

+ Setting the VPC field allows us to write configuration register fields that are normally read-only on a conventional MIPS32 processor.

+ Clearing EVP stops instruction execution for every thread except the one that cleared it, so we can configure all the other threads.

+ The mips32\_getmvpcontrol macro reads the MVPControl register. I clear the EVP bit using the MVPCONTROL\_EVP #define and set the VPC bit using the MVPCONTROL\_VPC #define. The value is then written back using the mips32\_setmvpcontrol macro.

+ Here is what the assembly code looks like.
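The read-modify-write described above can be modeled on a host as plain bit manipulation. This is a minimal sketch rather than the mt.h implementation: `MVPC_EVP`, `MVPC_VPC` and `mvpcontrol_enter_config` are hypothetical names, the masks simply mirror the bit positions in the MVPControl table (VPC is bit 1, EVP is bit 0), and the function works on a value instead of the real CP0 register.

```c
#include <stdint.h>

/* Hypothetical local masks mirroring the MVPControl table:
   EVP is bit 0, VPC is bit 1. On target, mt.h provides
   MVPCONTROL_EVP and MVPCONTROL_VPC instead. */
#define MVPC_EVP (UINT32_C(1) << 0)
#define MVPC_VPC (UINT32_C(1) << 1)

/* Enter configuration mode: stop the other threads (clear EVP)
   and allow writes to configuration fields (set VPC).
   All other MVPControl bits are left unchanged. */
uint32_t mvpcontrol_enter_config(uint32_t mvpcontrol)
{
    mvpcontrol &= ~MVPC_EVP;
    mvpcontrol |= MVPC_VPC;
    return mvpcontrol;
}
```

For example, a value with EVP set (0x1) becomes 0x2: other threads are stopped and configuration writes are allowed.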

## Set the target TC so we can access TC1's CP0 registers with the mttr and mftr instructions

| Field | Bits | VPEControl - CP0#1-Sel1 | Read/Write | Reset State |
|-------|------|-------------------------|------------|-------------|
| TargTC | 7-0 | Target TC number to be used on MTTR and MFTR instructions | R/W | 0 |

C code: `mips32_mt_settarget(TC1);`

| Assembly code | Comment |
|---------------|---------|
| `mfc0 t0, c0_vpecontrol` | read the VPEControl register |
| `li t1, TC1` | load target TC number |
| `ins t0, t1, 0, 8` | insert TC number into VPEControl register |
| `mtc0 t0, c0_vpecontrol` | write new value to VPEControl register |
| `ehb` | ensure write has completed before continuing |

Assuming the thread we are executing on is thread 0, the target TC needs to be configured for thread 1. To do this, use the TargTC field in the VPEControl register.

Once this is done, the mttr (move to thread register) and mftr (move from thread register) instructions will be directed to thread 1.

+ This is simple to do in C using the mips32\_mt\_settarget macro.
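Setting the target is a field insert into VPEControl. Here is a host-side sketch of what the slide's `ins` instruction (insert an 8-bit field at bit 0) does to the register value, assuming the TargTC layout from the table (bits 7-0); `VPEC_TARGTC_MASK` and `vpecontrol_set_targtc` are hypothetical names, not mt.h definitions.

```c
#include <stdint.h>

/* TargTC occupies VPEControl bits 7-0 (from the table).
   Hypothetical local mask name, not taken from mt.h. */
#define VPEC_TARGTC_MASK UINT32_C(0xFF)

/* Replace bits 7-0 with the TC number and leave every
   other VPEControl bit unchanged. */
uint32_t vpecontrol_set_targtc(uint32_t vpecontrol, unsigned tc)
{
    return (vpecontrol & ~VPEC_TARGTC_MASK) | ((uint32_t)tc & VPEC_TARGTC_MASK);
}
```

Only the low 8 bits change, so other VPEControl fields (such as TE at bit 15) survive the update.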

## Halt TC1

| Field | Bits | TCHalt - CP0#2-Sel4 | Read/Write | Reset State |
|-------|------|---------------------|------------|-------------|
| H | 0 | Thread Halted. If set, the thread has been halted and cannot be allocated, activated, or scheduled. | R/W | 1 |

C code: `mips32_mt_settchalt(TCHALT_H);`

| Assembly code | Comment |
|---------------|---------|
| `li t0, 1` | load the H field |
| `mttc0 t0, c0_tchalt` | write the value to the TCHalt register |
| `ehb` | ensure write has completed before continuing |

Before continuing, the target thread must be halted; otherwise the changes being made will have unpredictable results. To make sure thread 1 is halted before configuring it, set the H field in its TCHalt register.

+ In C I can use the mips32\_mt\_settchalt macro with the TCHALT\_H #define.

## Bind TC1 to VPE0

| Field | Bits | TCBind - CP0#2-Sel2 | Read/Write | Reset State |
|-------|------|---------------------|------------|-------------|
| CurVPE | 3-0 | ID number of the VPE the TC is bound to | R/W | 0 |

| Assembly code | Comment |
|---------------|---------|
| `mttc0 zero, c0_tcbind` | |
| `ehb` | ensure write has completed before continuing |

Bind thread 1 to VPE 0 using its TCBind register.

+ The mips32\_mt\_settcbind macro writes the register.
+ Here is the assembly code.
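Binding is another field update, this time on CurVPE (TCBind bits 3-0). A hypothetical host-side helper, assuming that layout from the table; `TCBIND_CURVPE_MASK` and `tcbind_set_curvpe` are illustrative names only.

```c
#include <stdint.h>

/* CurVPE occupies TCBind bits 3-0 (from the table).
   Hypothetical local mask name. */
#define TCBIND_CURVPE_MASK UINT32_C(0xF)

/* Return a TCBind value that binds the TC to the given VPE,
   leaving the bits above CurVPE unchanged. */
uint32_t tcbind_set_curvpe(uint32_t tcbind, unsigned vpe)
{
    return (tcbind & ~TCBIND_CURVPE_MASK) | ((uint32_t)vpe & TCBIND_CURVPE_MASK);
}
```

With `vpe` equal to 0 the field ends up as 0, which is consistent with the assembly writing the zero register to TCBind.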



Next I set up the thread's stack pointer and global pointer.

I have previously allocated space for the thread's stack and set the variable TC1\_stack\_top to the last word in the stack, since stacks grow down.

I use the mips32\_mt\_setsp macro to write the stack pointer register.

+ Here is the assembly code.
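Reserving the stack and computing its top might look like the following sketch. Only the name `TC1_stack_top` comes from the text; the array `tc1_stack` and the size `TC1_STACK_WORDS` are hypothetical, and on target the pointer would then be handed to the mips32\_mt\_setsp macro.

```c
#include <stdint.h>

/* Hypothetical stack size; the original example's size is not shown. */
#define TC1_STACK_WORDS 1024

/* Space reserved for thread 1's stack. */
static uint32_t tc1_stack[TC1_STACK_WORDS];

/* Stacks grow down, so the initial stack pointer is the last
   word of the array rather than the first. */
uint32_t *const TC1_stack_top = &tc1_stack[TC1_STACK_WORDS - 1];
```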

The global pointer is used to reference the global variables in the small data area. These variables are shared by all threads.

+ To set the global pointer I use the mips32\_mt\_setgp macro with \_gp, an external variable set up by the linker.

+ Here is the assembly code. Notice that I just copy the current thread's gp register to the target thread.

| Field | Bits | TCRestart - CP0#2-Sel3 | Read/Write | Reset State |
|-------|------|------------------------|------------|-------------|
| Restart Address | 31-0 | Address at which execution is started | R/W | 0 |

C code: `mips32_mt_settcrestart(startTC1);`

| Assembly code | Comment |
|---------------|---------|
| `li t0, _startTC1` | load the start address |
| `mttc0 t0, c0_tcrestart` | write it to the target TC's TCRestart register |
| `ehb` | ensure write has completed before continuing |

Use the TCRestart register to tell the CPU where the target TC should start fetching instructions. Use the startTC1 function pointer as the address to start the TC from.

+ The mips32\_mt\_settcrestart macro sets the starting address using the startTC1 function pointer, which points to the thread's starting function.

| Field | Bits | TCStatus - CP0#2-Sel1 | Read/Write | Reset State |
|-------|------|-----------------------|------------|-------------|
| A | 13 | Activated. If set, run instructions for this TC. Also set by FORK and cleared by `YIELD $0`. | R/W | 1 |
| DA | 15 | Dynamic Allocation enable. If set, the TC can be allocated by FORK or deallocated by YIELD. | R/W | 0 |

| Assembly code | Comment |
|---------------|---------|
| `ori v0, v0, 0xa000` | or in the A and DA bits |
| `mttc0 v0, c0_tcstatus` | write the TCStatus register |

Next, use the TCStatus register to activate the thread and make it available to the FORK and YIELD instructions.

+ To activate the thread, set the A (Activated) field.

+ To make the thread yieldable, it must be marked as dynamically allocatable, so set the DA bit.

Note: This example does not use the FORK instruction, but it does use the YIELD instruction at the end of execution, so we need to enable dynamic thread allocation by setting the DA field.

+ I use the mips32\_mt\_gettcstatus macro to get the current value, the TCSTATUS\_A and TCSTATUS\_DA #defines to set the bits, and the mips32\_mt\_settcstatus macro to write the value back to the register.
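The two bits set here add up to the immediate used in the assembly: bit 13 is 0x2000 and bit 15 is 0x8000, together 0xA000. A host-side sketch with hypothetical local names standing in for TCSTATUS\_A and TCSTATUS\_DA, assuming the bit positions from the table:

```c
#include <stdint.h>

/* Bit positions from the TCStatus table (hypothetical local names;
   mt.h provides TCSTATUS_A and TCSTATUS_DA on target). */
#define TCST_A  (UINT32_C(1) << 13)  /* Activated */
#define TCST_DA (UINT32_C(1) << 15)  /* Dynamic Allocation enable */

/* Set A and DA while preserving the rest of TCStatus,
   like OR-ing in 0xA000. */
uint32_t tcstatus_activate(uint32_t tcstatus)
{
    return tcstatus | TCST_A | TCST_DA;
}
```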

| Field | Bits | TCHalt - CP0#2-Sel4 | Read/Write | Reset State |
|-------|------|---------------------|------------|-------------|
| H | 0 | Thread Halted. If set, the thread has been halted and cannot be allocated, activated, or scheduled. | R/W | 1 |

| Assembly code | Comment |
|---------------|---------|
| `mttc0 zero, c0_tchalt` | H is the only bit in the register, so move the value in the zero register to TCHalt |
| `ehb` | ensure write has completed before continuing |

The last step in configuring the thread is to un-halt it. Once un-halted, the thread can be scheduled and its instructions fetched as soon as I enable multi-threading. Clearing the H bit in the TCHalt register un-halts the thread.

+ The mips32\_mt\_settchalt macro with a zero argument will clear the H bit in the TCHalt register.
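Putting the thread-configuration steps together, the order matters more than any single field: the TC is halted before anything else is touched and un-halted only at the end. The sketch below models that sequence on a host with a plain struct standing in for the target TC's CP0 registers; `struct tc_regs` and `configure_tc` are hypothetical, and on target each assignment would instead be one of the mt.h macros described above.

```c
#include <stdint.h>

/* Host-side stand-in for the per-TC registers touched in this
   walkthrough (hypothetical; on target these are CP0 registers
   reached with mttc0 or the mt.h macros). */
struct tc_regs {
    uint32_t  tchalt;    /* bit 0: H */
    uint32_t  tcbind;    /* bits 3-0: CurVPE */
    uint32_t  tcstatus;  /* bit 13: A, bit 15: DA */
    uintptr_t tcrestart; /* address where execution starts */
    uintptr_t sp;        /* stack pointer */
    uintptr_t gp;        /* global pointer */
};

void configure_tc(struct tc_regs *tc, unsigned vpe, uintptr_t start,
                  uintptr_t stack_top, uintptr_t gp)
{
    tc->tchalt = 1;                                             /* 1. halt first */
    tc->tcbind = (tc->tcbind & ~UINT32_C(0xF)) | (vpe & 0xFu);  /* 2. bind to VPE */
    tc->sp = stack_top;                                         /* 3. private stack */
    tc->gp = gp;                                                /* 4. shared globals */
    tc->tcrestart = start;                                      /* 5. start address */
    tc->tcstatus |= (UINT32_C(1) << 13) | (UINT32_C(1) << 15);  /* 6. set A and DA */
    tc->tchalt = 0;                                             /* 7. un-halt last */
}
```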



To set up thread 2, I set the target TC to 2 and then configure it the same way as thread 1.

+ The one thing that must be different for each thread is the stack; otherwise the stack will be corrupted. You can use the same starting code address, since each thread has its own stack and therefore its own context. However, this example uses slightly different code for each thread.
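One way to give each worker thread its own stack is a two-dimensional array with one row per TC. This is a sketch under assumptions: `tc_stacks`, `STACK_WORDS` and `stack_top_for` are hypothetical names, and the stack size is arbitrary.

```c
#include <stdint.h>

#define NUM_WORKER_TCS 2     /* TC1 and TC2 */
#define STACK_WORDS    1024  /* hypothetical per-thread stack size */

/* One private stack per worker TC; sharing a stack would let the
   threads overwrite each other's frames. */
static uint32_t tc_stacks[NUM_WORKER_TCS][STACK_WORDS];

/* Initial stack pointer for worker n (0 -> TC1, 1 -> TC2):
   the last word of that thread's own stack, since stacks grow down. */
uint32_t *stack_top_for(unsigned n)
{
    return &tc_stacks[n][STACK_WORDS - 1];
}
```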

## Enable Threading

| Field | Bits | VPEControl - CP0#1-Sel1 | Read/Write | Reset State |
|-------|------|-------------------------|------------|-------------|
| TE | 15 | Thread Enable. If unset, only one TC may execute. | R/W | 0 |

| Assembly code | Comment |
|---------------|---------|
| `mfc0 v0, c0_vpecontrol` | read in the VPEControl register |
| `ori v0, v0, …` | |
| … | ensure write has completed before continuing |

After I have initialized all the threads, I need to enable threading on the VPE. I do this by setting the TE bit in the VPEControl register.

+ To do this I use the mips32\_mt\_getvpecontrol macro to get the current value of the VPEControl register, the VPECONTROL\_TE #define to set the TE bit, and the mips32\_mt\_setvpecontrol macro to write it back.
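Setting TE is again a read-modify-write. A host-side sketch, assuming TE at VPEControl bit 15 as in the table; `VPEC_TE` and `vpecontrol_enable_threads` are hypothetical names.

```c
#include <stdint.h>

/* TE is VPEControl bit 15 (from the table); hypothetical local name. */
#define VPEC_TE (UINT32_C(1) << 15)

/* Enable multi-threading on the VPE, keeping TargTC and other fields. */
uint32_t vpecontrol_enable_threads(uint32_t vpecontrol)
{
    return vpecontrol | VPEC_TE;
}
```

Because the existing value is OR-ed rather than overwritten, a TargTC of 2 left over from configuring thread 2 is preserved: an input of 0x2 yields 0x8002.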

## Turn off configuration flag and enable Virtual Processing • MVPControl - CP0#0-Sel1

| Field | Bits | MVPControl - CP0 #0-Sel1 | Read/Write | Reset State |
|-------|------|--------------------------|------------|-------------|
| VPC | 1 | VPE Configuration State. If set, allows writing to normally read-only configuration register fields on conventional MIPS32 CPUs. | R/W | 0 |
| EVP | 0 | Enable Virtual Processors. If set, execute instructions for all threads on activated VPEs. If cleared, execute instructions only for the thread which is running when cleared. | R/W | 0 |

| Assembly code | Comment |
|---------------|---------|
| `mfc0 t0, c0_mvpcontrol` | read the MVPControl register |
| `li t1, 1` | load the value for the combined VPC and EVP fields |
| `ins t0, t1, 0, 2` | insert VPC and EVP fields into MVPControl |
| `mtc0 t0, c0_mvpcontrol` | write the MVPControl register |

Finally, to enable multi-threading and start all enabled threads executing, I need to turn off configuration mode and enable virtual processing. Both are controlled by the MVPControl register.

+ I use the mips32\_getmvpcontrol macro to read the register, the MVPCONTROL\_VPC #define to clear the VPC bit (turning off the configuration state), and the MVPCONTROL\_EVP #define to set the EVP bit (enabling virtual processing). Then I write the register back using the mips32\_setmvpcontrol macro.
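The final MVPControl update is the mirror image of the one made at the start. A host-side sketch with hypothetical names, assuming the table's layout (VPC is bit 1, EVP is bit 0):

```c
#include <stdint.h>

/* Hypothetical local masks mirroring the MVPControl table. */
#define MVPC_EVP (UINT32_C(1) << 0)
#define MVPC_VPC (UINT32_C(1) << 1)

/* Leave configuration mode (clear VPC) and start all
   activated threads (set EVP). Other bits are preserved. */
uint32_t mvpcontrol_leave_config(uint32_t mvpcontrol)
{
    mvpcontrol &= ~MVPC_VPC;
    mvpcontrol |= MVPC_EVP;
    return mvpcontrol;
}
```

Starting from the configuration-mode value 0x2, the result is 0x1, the same as inserting the two-bit value 1 into bits 1-0 as the slide's `ins` instruction does.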

## Start count function on TC0

| C code |
|--------|
| `count(0);` |
| `return (0); // Never gets here.` |

At this point all threads will be scheduled and will start running code.

Now bring the thread that executed the initialization code, thread 0, into the mix by calling the count function.

Note: the code never executes the return statement, because all threads, including thread 0, are eventually yielded.

## Count Functions

+ Two simple functions.
+ Both increment a counter element in the same array using the argument given, which is the TC number.
+ Count function: increments the array element using a cached address in KSEG0. When the counter reaches 2000, the TC yields (deallocates).
+ Ncount function: increments the array element using an uncached address in KSEG1. This function stalls waiting to read the count value.

The rest of the code shows multi-threading in action and how the threads behave when they access memory through the cache or straight out of RAM.

+ To do this there are two simple functions, each of which increments a counter in a global array.

+ The first function, Count, accesses the counter array using a cached address.

+ The second function, Ncount, increments a counter in the same array but through an uncached address.
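The difference between the two functions is only which address window they use. In the MIPS32 memory map, KSEG0 (0x80000000-0x9FFFFFFF) and KSEG1 (0xA0000000-0xBFFFFFFF) both map the low 512 MB of physical memory without the TLB; KSEG1 is always uncached, while KSEG0 is the normally cached view. A sketch of converting between the two views, with hypothetical helper names:

```c
#include <stdint.h>

#define KSEG0_BASE       UINT32_C(0x80000000)
#define KSEG1_BASE       UINT32_C(0xA0000000)
#define KSEG_OFFSET_MASK UINT32_C(0x1FFFFFFF)  /* low 512 MB physical window */

/* Same physical location, uncached view. */
uint32_t kseg0_to_kseg1(uint32_t kseg0_addr)
{
    return KSEG1_BASE | (kseg0_addr & KSEG_OFFSET_MASK);
}

/* Same physical location, cached view. */
uint32_t kseg1_to_kseg0(uint32_t kseg1_addr)
{
    return KSEG0_BASE | (kseg1_addr & KSEG_OFFSET_MASK);
}
```

On a 32-bit target, Ncount could then reach its counter through something like `(volatile uint32_t *)kseg0_to_kseg1((uint32_t)&counters[tc])`, where `counters` is a hypothetical name for the shared array.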



You will see that the threads that use the cached address to increment their counters do not stall, so they execute more often than the thread writing through the uncached address. This is because the threads using the cached address are allowed to execute while the thread using the uncached address stalls waiting for its load to complete.

Fine-grained multi-threading allows other threads to execute while one thread is stalled. In a non-threaded CPU, those stall cycles would normally be wasted.
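The stall-hiding effect can be illustrated with a toy scheduler. This is not a cycle-accurate model of the core: it assumes each increment takes one issue slot, that an uncached access stalls its TC for a fixed, made-up `UNCACHED_STALL` cycles, that TCs are picked round robin, and that TC2 is the one running Ncount. Only the terminal count of 2000 comes from the text; `simulate` and the other names are hypothetical.

```c
#include <stdbool.h>

#define NUM_TCS        3
#define TERMINAL_COUNT 2000  /* from the text: Count yields at 2000 */
#define UNCACHED_STALL 10    /* assumed stall cycles per uncached access */

/* Each cycle, one ready TC (round robin) issues one increment. A TC that
   increments through the uncached address then stalls, and its issue
   slots go to the other TCs. Records the cycle at which each TC
   reaches TERMINAL_COUNT and stops (the "yield"). */
void simulate(unsigned long finish_cycle[NUM_TCS])
{
    const bool uncached[NUM_TCS] = { false, false, true }; /* assume TC2 runs Ncount */
    unsigned counter[NUM_TCS] = { 0 };
    unsigned stall[NUM_TCS] = { 0 };
    bool done[NUM_TCS] = { false };
    unsigned remaining = NUM_TCS;
    unsigned next = 0;          /* round-robin starting point */
    unsigned long cycle = 0;

    while (remaining > 0) {
        cycle++;
        for (unsigned i = 0; i < NUM_TCS; i++)   /* outstanding stalls drain */
            if (stall[i] > 0)
                stall[i]--;

        for (unsigned k = 0; k < NUM_TCS; k++) { /* pick the next ready TC */
            unsigned tc = (next + k) % NUM_TCS;
            if (done[tc] || stall[tc] > 0)
                continue;
            counter[tc]++;
            if (uncached[tc])
                stall[tc] = UNCACHED_STALL;
            if (counter[tc] == TERMINAL_COUNT) {
                done[tc] = true;
                finish_cycle[tc] = cycle;
                remaining--;
            }
            next = (tc + 1) % NUM_TCS;
            break;
        }
    }
}
```

Under these assumptions TC0 and TC1 reach 2000 well before TC2, because nearly every cycle in which TC2 is stalled is used by one of them; this is the same ordering the text predicts for the real run.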



The example also shows the effect of the YIELD instruction.

+ Each thread uses the YIELD instruction to terminate itself once it reaches the terminal count.

+ You'll see that the threads using the count function terminate before the thread executing the Ncount function, because they got to run while Ncount was stalled.