#### BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI, PILANI CAMPUS

#### First Semester 2023-24, Mid Semester Exam (Closed Book)

G626: Hardware Software Co-design.

Duration: 90 minutes Date of exam: 13/10/2023 Max. Marks: 50

Q1. Apply hierarchical clustering to the following topology. The weights on the links of the topology give the closeness function. During clustering, assume the modified closeness function to be the arithmetic mean of the prior weights (as discussed in class). At the end, summarize the clusters that exist at each step of the clustering. [4M]
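As an illustration of the update rule described in Q1, the following is a minimal sketch of hierarchical clustering with an arithmetic-mean closeness update. The node names and weights are hypothetical and are not the topology of the question. The sketch also assumes that the mean is taken only over the prior links that actually exist, and that the pair with the highest closeness is merged first.

```python
def hierarchical_clustering(closeness):
    """Repeatedly merge the two closest clusters.

    closeness maps frozenset({a, b}) -> weight for each link.
    Returns the list of clusters present after each step.
    """
    closeness = dict(closeness)
    clusters = set()
    for pair in closeness:
        clusters |= pair
    history = [sorted(clusters)]

    while closeness:
        # Merge the pair with the highest closeness value.
        best = max(closeness, key=closeness.get)
        a, b = sorted(best)
        merged = a + b  # hypothetical naming: concatenate the two names
        del closeness[best]
        clusters -= {a, b}

        # Assumption: the new closeness to every other cluster is the
        # arithmetic mean of the prior weights that exist.
        for other in clusters:
            keys = (frozenset({a, other}), frozenset({b, other}))
            prior = [closeness.pop(k) for k in keys if k in closeness]
            if prior:
                closeness[frozenset({merged, other})] = sum(prior) / len(prior)

        clusters.add(merged)
        history.append(sorted(clusters))
    return history


# Hypothetical four-node topology (not the one in the question).
example = {
    frozenset({"A", "B"}): 20,
    frozenset({"A", "C"}): 10,
    frozenset({"B", "C"}): 8,
    frozenset({"C", "D"}): 4,
}
steps = hierarchical_clustering(example)
for step in steps:
    print(step)
```

In this hypothetical example, A and B merge first; the new closeness between AB and C is the mean of 10 and 8, i.e. 9.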



Q2. Consider the following data flow graph, architecture graph and additional details. **a, b and c** are the inputs to the system and **d** is the output. Note that P1, P2, etc. are the various procedures, and all communication between the modules happens over the shared bus as shown in the architecture graph. The additional details give the time (in ms) that a particular procedure/process takes to execute on a particular component. [9M]



| Procedure | Component  | Time (ms) |
|-----------|------------|-----------|
| P1        | FPGA       | 12        |
| P1        | DSP        | 9         |
| P2        | FPGA       | 18        |
| P2        | DSP        | 6         |
| P3        | FPGA       | 18        |
| P3        | DSP        | 6         |
| P4        | FPGA       | 24        |
| P4        | UltraScale | 18        |
| P4        | DSP        | 12        |
| C1        | Shared Bus | ~0        |
| C2        | Shared Bus | ~0        |
| C3        | Shared Bus | ~0        |

- a. What is the order of solution space of this design problem? [2M]
- b. Draw the specification graph for the given problem. [2M]
- c. Draw the implementation graph given that the system must generate the output within a deadline of 24 ms and under a cost constraint of cost < 25k. (Assume each of the modules, i.e. FPGA, DSP and UltraScale, costs 10k.) Clearly highlight/draw the following: i. feasible allocation, ii. feasible binding, iii. scheduling of the different tasks/functions. [5M]

## Q3. Design Problem – 15 M

Implement a real-time video denoising system using the PYNQ framework on a PYNQ-Z2 board. The system should capture video from a USB camera, identify specific objects in the video stream using custom IP logic, and trigger an alert through PS GPIO (). The design must use DMA for data transfer, an AXI interface for communication, PS GPIO for control and alerts, and PS/PL interface for memory allocation and buffer management.

## **Requirements:**

- PYNQ Framework: Utilize the PYNQ framework for developing Python code that interfaces with the Zynq PS and PL.
- DMA: Use Direct Memory Access for efficient data movement between PS and PL.
- AXI Interface: Implement the AXI interface to facilitate communication between the PS and the custom IP block.
- AXI and MMIO: Implement AXI interface and use MMIO to interact with the custom IP. Assume the IP Base address is 0x500000000 and the address range is 0x1000.
- PS GPIO: Utilize PS GPIO for triggering alerts when a specified object is identified.
- PS/PL Interface Memory Allocation and Buffer: Implement memory allocation and buffer management for real-time object tracking and identification.
- Custom IP: Design a custom IP block for video denoising.

#### Task:

- Draw an end-to-end pipeline diagram capturing the whole process.
- Draw a basic block-level diagram as seen in the IP Integrator tool.
- Write basic Python code using the relevant APIs for each of the requirements. Begin by importing a custom overlay file called custom.bit.

# Q4) Vitis AI and ML [10M]

- a) What are the three components of Vitis AI? How is Vitis AI packaged and delivered? [2M]
- b) What is the advantage of using the DPU architecture over other Verilog-based architectures? [1M]
- c) What happens in the Optimization step and the Quantization step? [2M]
- d) Differentiate between the two types of loss functions we studied in class. [2M]
- e) What would be the result of operation of Max Pooling with 2 X 2 Filter on this matrix? [1M]

| 7 | 8 | 1 | 2 |
|---|---|---|---|
| 5 | 6 | 1 | 2 |
| 1 | 1 | 3 | 4 |
|   |   |   |   |

- f) What would be the dimension on a 32 X 32 X 3 image with 5 filters of size 2 X 2 with stride of 2? [2M]
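The size arithmetic behind the pooling and filter questions above can be sketched as follows. This is only an illustration under stated assumptions: zero padding unless given, a pooling stride equal to the 2 X 2 window size, and a hypothetical 4 X 4 matrix that is not the matrix from the question. The function names are made up for this sketch.

```python
def conv_output_shape(height, width, num_filters, kernel, stride, padding=0):
    """Output shape (H, W, C) of a convolution layer.

    Uses the standard formula floor((N - F + 2P) / S) + 1 per spatial axis.
    The number of output channels equals the number of filters.
    """
    out_h = (height - kernel + 2 * padding) // stride + 1
    out_w = (width - kernel + 2 * padding) // stride + 1
    return out_h, out_w, num_filters


def max_pool_2x2(matrix):
    """2 x 2 max pooling with stride 2 (assumed) on an even-sized matrix."""
    return [
        [
            max(matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1])
            for c in range(0, len(matrix[0]), 2)
        ]
        for r in range(0, len(matrix), 2)
    ]


# 32 x 32 x 3 input, 5 filters of size 2 x 2, stride 2, no padding (assumed).
shape = conv_output_shape(32, 32, num_filters=5, kernel=2, stride=2)
print(shape)  # (16, 16, 5)

# Hypothetical 4 x 4 matrix, chosen only to show the mechanics.
sample = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [6, 0, 9, 8],
    [7, 1, 3, 2],
]
pooled = max_pool_2x2(sample)
print(pooled)  # [[4, 5], [7, 9]]
```

Each 2 X 2 window contributes its largest entry, so a 4 X 4 input shrinks to 2 X 2.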

# Q5) PS-PL basic, interface and AXI [12M]

- (a) What is the AXI protocol and what is its main purpose in the context of an SoC? Write a short explanation of the three flavours of AXI4. [2M]
- (b) Compare the AXI Read Transaction and the AXI Write Transaction with the help of a diagram. [2M]
- (c) Mention 4 PS external interfaces and 4 PL external interfaces. Write a short line explaining each interface. [2M]
- (d) Describe in detail the 9 interfaces available at the PS-PL boundary. Specify the Master and Slave direction. [2M]
- (e) Draw a simple diagram of the APU of the Zynq-7000 device showing the major components (including the different memory blocks) and their interconnections. No need to specify the exact memory. [2M]
- (f) What is the full form and functionality of the SCU? Write 3 points on its functionality. [2M]