

### **Central Signal Processor Consortium**

### Sean Dougherty, NRC Herzberg CSP Consortium Lead

Thanks to Dave Stevens, Grant Hampson, Mike Pleasance, Ben Stappers, Willem van Straten, Sonja Vrcic

13 June 2017 SKA Engineering Meeting, Rotterdam

### CSP Product Breakdown Structure



## **Progress Against Plan**



- Significant effort supporting the Cost Control Project (CCP)
- Finalizing Level 2 requirements against Level 1 Rev 10
- Signal Modelling mature
- All ICDs (internal and external) being worked aggressively
- Mid.CBF Frequency Slice Architecture adopted
  - substantial cost savings, reduced risk
- Working towards the Pre-CDR milestones
  - main content of sub-element CDRs



## **Schedule**



| Milestone | Description                                       | Date        |
|-----------|---------------------------------------------------|-------------|
| 13        | Sub-element CDRs & Prototype Test Reports         |             |
| 13a       | Pulsar Timing Sub-element                         | 5 Oct 2017  |
| 13b       | LMC Sub-element                                   | 3 Oct 2017  |
| 13c       | Pulsar Search Sub-element                         | 30 Oct 2017 |
| 13d       | Mid CBF Sub-element                               | 1 Nov 2017  |
| 13e       | Low CBF Sub-element                               | 6 Nov 2017  |
| 14        | Submission of Stage 2 (CDR) Data Package          | 14 Dec 2017 |
| 15        | Review of Stage 2 Data Package (CDR)              | 29 Jan 2018 |
| 16        | Closure of Stage 2                                | 31 Mar 2018 |





## **Current State of CSP Design**

- All sub-elements approaching Pre-CDR status
- Some design updates being worked
  - cost, power, risk, and construction schedule issues
- Designs are much more mature and refined
- Biggest remaining issue is total cost






## **CSP Issues and Risks**

- Impacts of CCP
- Schedule implications
- Funding issues for some consortium members
- Unresolved System Level Issues
  - L1 Rev 10 requirements definitions
  - Some ICDs e.g. Transient buffer, VLBI
- Adequate CDR review resources from SKAO
  - Extensive volume of materials
  - Many CDRs in a short period
- Post-CDR?
  - Need to keep momentum




## **Perentie Team – aka "CSP\_Low.CBF"**

### CSIRO, ASTRON and AUT

- Average 7.8 FTE
- 4.1/2.9/0.8 respectively

Penticton "Convergence" workshop Nov 15-18<sup>th</sup>

- Hardware platform documentation
- Liquid cooling for LRU and sub-rack



Thank you to Mike and Sonja for a great tour of the NRC-DRAO facilities




# **Detailed Costing Analysis**

Firmware workshop 30<sup>th</sup> Jan – 3<sup>rd</sup> Feb at ASTRON: detailed analysis of all firmware modules and estimation of effort.

Independent estimates from 7 engineers, with confidence within ~8%

Costs reduced by 22%:

- €22.5M (Sept '16)
- €17.6M (Jun '17)



\*Newsflash\*: new volume FPGA pricing and fewer QSFPs => a further €0.9M saving
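The 22% figure follows directly from the two estimates. The sketch below recomputes it; applying the further €0.9M saving to the June '17 figure is an assumption made here for illustration, not a number from the slides.

```python
# Recompute the firmware cost reduction quoted on the slide.
# All figures are in millions of euros.

def percent_reduction(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    return 100.0 * (before - after) / before

SEPT_2016 = 22.5         # September 2016 estimate
JUNE_2017 = 17.6         # June 2017 estimate
NEWSFLASH_SAVING = 0.9   # volume FPGA pricing and fewer QSFPs

reduction = percent_reduction(SEPT_2016, JUNE_2017)
# Assumption: the extra saving comes off the June 2017 total.
with_newsflash = percent_reduction(SEPT_2016, JUNE_2017 - NEWSFLASH_SAVING)

print(f"Sept '16 -> Jun '17: {reduction:.1f}% reduction")
print(f"Including the further saving: {with_newsflash:.1f}% reduction")
```

The first figure (about 21.8%) rounds to the 22% on the slide; under the stated assumption the newsflash would take the total reduction to roughly 26%.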





# Monitor and Control Environment (MACE)

MACE prototyping goals:

- HW: networking, servers
- Gemini LRU protocol: Tango server
- SW: client, engineering interface, Tango
- ARGS: Automatic Register Generation System
- FW: protocol interface firmware, register endpoints

Secondment of Leon Hiemstra (FW engineer) from ASTRON to CSIRO for MACE prototyping in July/Aug/Sep

Extra modes (subarrays, substations, zooms and VLBI) increase the complexity of the control software






## Gemini FPGA LRU

- Collaborative design effort
- Single LRU for all FPGA processing
- 1.3 Tbps external I/O per LRU
- Production board in July
- HBM version early 2018
- HBM breaks memory buffer I/O bottleneck

**CSP** Consortium



Gemini Proof of Concept (POC) board (has HMC)

### Gemini first production board – 18 layers – 915 parts



## **Gemini Sub-rack**

288-FPGA farm: 24 sub-racks of 12 Gemini LRUs

Complete mechanical design supports:

- liquid cooling
- 48V DC Power
- optical mesh (inter sub-rack and rack)

Very low mean-time-to-repair:

- Hot swappable LRU design
- Non-interfering front panel LRU optical connections
- Optical/liquid/power quick connects on backplane

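The farm size and its raw I/O capacity can be checked from the numbers on the Gemini slides. One FPGA per LRU is inferred here from 288 FPGAs across 24 × 12 LRUs, and the aggregate is a peak external I/O capacity, not an expected data rate.

```python
# Size of the Low.CBF FPGA farm from the Gemini LRU and sub-rack slides.
SUB_RACKS = 24
LRUS_PER_SUB_RACK = 12
FPGAS_PER_LRU = 1               # inferred: 288 FPGAs over 288 LRUs
EXTERNAL_IO_PER_LRU_TBPS = 1.3  # from the Gemini LRU slide

lrus = SUB_RACKS * LRUS_PER_SUB_RACK
fpgas = lrus * FPGAS_PER_LRU
# Peak capacity if every LRU's external I/O were fully used at once.
aggregate_io_tbps = lrus * EXTERNAL_IO_PER_LRU_TBPS

print(f"{lrus} LRUs, {fpgas} FPGAs, ~{aggregate_io_tbps:.0f} Tbps aggregate external I/O capacity")
```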




## **Disruptive forces in the past 6 months**

Cost control project


- Expending effort on alternatives
- CDR slippage announced, no clear outcome yet
- Only 100 calendar days to CDR submission

Rev 10 requirements + ECPs

- Major changes for zooms, substations, VLBI
- In some cases results in complete redesign

Funding to CSIRO – successful 2018 (likely 2019)

- Lots of work required for the application, though

Requirements churn and project delays

- Collaborators prioritizing other projects (e.g. APERTIF)







### CSP Mid CBF Update: SKA1 Mid.CBF

Mike Pleasance, NRC, June 2017



# **System Architecture**

- Architecture change to Frequency Slice Architecture
- Reduced cost by €19M
- Reduced construction risk
- Reduced commensality in Bands 4 and 5
- 369 Processing LRUs (2U Server boxes)
- 738 Stratix 10 FPGAs
- ~20 Tbps Input Data Rate
- ~10 Tbps Output Data Rate (SDP, PSS, PST)
- ~240kW of power
- Air cooling (liquid possible)
- Flexible power density:
  - 28, 20 or 14 Processing Racks
    - pending INFRA-SA
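The "flexible power density" options are easiest to compare as arithmetic. This sketch assumes the whole ~240 kW is LRU load spread evenly over the processing racks; the actual split depends on the INFRA-SA outcome and on non-LRU equipment.

```python
# Spread the Mid.CBF totals over the rack options listed on the slide.
TOTAL_POWER_KW = 240.0   # "~240 kW of power", treated here as all LRU power
PROCESSING_LRUS = 369
FPGAS = 738

def per_rack(racks: int) -> tuple[float, float]:
    """Average power (kW) and LRU count per rack, assuming an even spread."""
    return TOTAL_POWER_KW / racks, PROCESSING_LRUS / racks

print(f"FPGAs per LRU: {FPGAS // PROCESSING_LRUS}")
print(f"Average power per LRU: ~{1000 * TOTAL_POWER_KW / PROCESSING_LRUS:.0f} W")
for racks in (28, 20, 14):
    kw, lrus = per_rack(racks)
    print(f"{racks} racks: ~{kw:.1f} kW and ~{lrus:.1f} LRUs per rack")
```

Halving the rack count (28 to 14) doubles the power per rack, from roughly 8.6 kW to roughly 17 kW, which is the trade-off the facility design has to absorb.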




## **Hardware Design Iteration**

### HW design to reduce risk

- Single processing LRU used throughout the system for VCC and FSP
- 2U air cooled (adaptable to liquid cooling) server box with two FPGA processing boards
- DDR4 for bulk memory instead of emerging HMC
- AC powered using COTS ATX 1+1 redundant power supply





## **Firmware Design**

- Targeting Altera Stratix 10 FPGAs
  - 14 nm process, available today: reduced risk
  - Up to 28 Tera-Ops / second (fixed point multiply/accumulate)
  - Hyperflex Technology: Clock rates up to 1GHz
  - Up to 144 Transceivers, 128 @ 30 Gbps + 16 @ 17.4 Gbps
  - Quad-core ARM Cortex-A53 embedded processor
  - First Stratix 10 device received by NRC in March
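The transceiver counts above translate into a raw serial capacity per device. The sketch multiplies lanes by line rate; it is an upper bound, since encoding overhead and any unused lanes reduce the usable payload rate.

```python
# Raw aggregate transceiver line rate of one Stratix 10 device,
# using the lane counts quoted on the slide.
lanes = {
    30.0: 128,  # Gbps per lane -> number of lanes
    17.4: 16,
}

raw_gbps = sum(rate * count for rate, count in lanes.items())
print(f"{sum(lanes.values())} lanes, ~{raw_gbps:.1f} Gbps "
      f"(~{raw_gbps / 1000:.1f} Tbps) raw per FPGA")
```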



- Estimated ~3700 Person Days of Firmware design effort in construction phase
  - ~3200 Person Days Signal Processing (~25 IP Blocks)
  - ~500 Person Days Infrastructure / Utilities (~10 IP Blocks)
- Designs running at clock rates of 450-500 MHz
- Currently prototyping highest risk signal processing IP blocks to confirm resource usage and performance
  - Digital Resampler / Fine Delay Tracking
  - Band 5 Very Coarse Channelizer
  - PSS / PST Beamformers
  - Correlator
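The effort estimate breaks down into an average cost per IP block. This only divides the quoted totals by the approximate block counts; individual blocks such as the correlator will differ widely from the average.

```python
# Average firmware effort per IP block implied by the construction estimate.
effort = {
    # category: (person-days, approximate number of IP blocks)
    "signal processing": (3200, 25),
    "infrastructure/utilities": (500, 10),
}

total = sum(person_days for person_days, _ in effort.values())
print(f"Total: ~{total} person-days")
for name, (person_days, blocks) in effort.items():
    print(f"{name}: ~{person_days / blocks:.0f} person-days per IP block")
```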

## Path to CDR

- A lot of work since Stellenbosch
- On target for Mid.CBF CDR in Nov.



#### Deliverables

- Mid.CBF Sub-element Requirement Specifications (EB-1)
- Mid.CBF Detailed Design Document (EB-4a)
- Mid.CBF Development Plan (EB-5)
- Mid.CBF Signal Processing Matlab Model (EB-7)
- Mid.CBF Prototyping Results

#### Prototyping

- Design and prototype Stratix 10 FPGA boards
- Design and prototype air cooling thermal solution
- Complete algorithm signal modelling
- Prototype high risk signal processing firmware blocks
- Prototype monitor and control software infrastructure
- Prototype 26 Gbps and 100 GbE communication links





### **Pulsar Search: PSS-Low/Mid**

Manchester, Oxford, MPIfR, STFC, ASTRON, INAF, NZA, Swinburne

## **Progress Summary**

#### Acceleration Search

- Benchmarking of the latest FPGA implementations shows promising results.
- Industrial contract signed to develop prototype FPGA implementation.
- GPU version continually improving in speed

#### Single Pulse Search

- End-to-end single pulse search using GPUs and CPUs almost completed. All FPGA components now available.
- **Pipeline:** New data types added. More modules added and progress on FPGA implementation.
- **Hardware:** ProtoNIP hardware arrived/arriving in South Africa for assembly this month.
- **Industry:** Working with many companies on hardware, firmware and software review.
- **Testing:** Full test vector machine infrastructure to generate test vectors rapidly is built. Now building test vectors.






## **Design Status**

**PSS CDR Deliverables Status** 

- Most CDR deliverables were baselined at the end of March 2017
- 100% complete except for ECPs and updates that track further development or requirements changes
- The notable exceptions are:
  - Level 2 Requirements Specifications (~80%)
    - Dependent on completion of the CSP Level 2 requirements definitions; they also drive:
      - The Test Specifications
      - The Verification Matrix
  - Development Plan also requires a construction Statement of Work
  - Expanded Prototype Results

Level 3 requirements almost fully updated.



# **ProtoNIP: PSS prototype in the Karoo**

- 18x 2U, dual socket, dual accelerator servers
- 2x 14-core Intel Xeon CPUs
- Arria 10 PCIe FPGA board
- NVIDIA P100 GPU card
- 512 GB TruDDR4 RAM
- Rack allocation for tests of power, cooling, system stability, running Cheetah/Panda prototype software.
- Majority of hardware in RSA
- Operational July 2017



## **Software/Firmware Developments**

**GPU Single Pulse Pipeline:** Through work with Zenotech, peak identification in the single pulse search is greatly improved: redundancy is reduced, giving 50-100x fewer candidates and therefore less post-processing.
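Reducing redundant candidates usually means recognising that one bright pulse fires in many neighbouring DM trials and time samples. The sketch below shows a generic friends-of-friends grouping in (DM, time) that keeps only the strongest member of each group. It is not the Zenotech or AstroAccelerate kernel; the `Candidate` type and the `dm_tol`/`time_tol` tolerances are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    dm_index: int     # trial dispersion-measure index (hypothetical layout)
    time_index: int   # sample index of the detection
    snr: float        # signal-to-noise ratio

def sift(candidates, dm_tol=2, time_tol=4):
    """Group candidates that are neighbours in (DM, time), friends-of-friends
    style, and return the highest-S/N member of each group, strongest first."""
    n = len(candidates)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            a, b = candidates[i], candidates[j]
            if (abs(a.dm_index - b.dm_index) <= dm_tol
                    and abs(a.time_index - b.time_index) <= time_tol):
                parent[find(i)] = find(j)  # merge the two groups

    best = {}
    for i, cand in enumerate(candidates):
        root = find(i)
        if root not in best or cand.snr > best[root].snr:
            best[root] = cand
    return sorted(best.values(), key=lambda c: c.snr, reverse=True)

# One pulse smeared over neighbouring DM trials, plus an unrelated event.
raw = [Candidate(10, 1000, 8.0), Candidate(11, 1001, 12.5),
       Candidate(12, 1002, 9.1), Candidate(40, 5000, 7.2)]
print(sift(raw))
```

Four raw detections collapse to two reported candidates; on real data, with many DM trials per pulse, this kind of grouping is what produces order-of-magnitude reductions.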



We can now run a GPU version of the single pulse code significantly faster than real time. The AstroAccelerate code library is being placed into the Cheetah framework.

**FPGA Implementation of Single Pulse:** now achieving real-time performance for most scenarios.



#### Harmonic Summing - University of Auckland



An OpenCL implementation of harmonic summing, a complex algorithm used in acceleration searching, has been established.
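Harmonic summing adds the power at a trial fundamental frequency to the power at its integer multiples, so a pulsar whose power is spread over many harmonics stands out above single noise spikes. The sketch is a minimal one-dimensional incoherent harmonic sum on integer bins. It is not the Auckland OpenCL code; real acceleration searches apply it across the frequency/frequency-derivative plane, interpolate fractional bins and normalise the sums.

```python
def harmonic_sum(power, num_harmonics):
    """Incoherent harmonic sum: out[k] = sum of power[h * k] for h = 1..num_harmonics,
    skipping harmonics that fall beyond the end of the spectrum."""
    n = len(power)
    out = [0.0] * n
    for k in range(1, n):          # bin 0 (DC) has no meaningful harmonics
        total = 0.0
        for h in range(1, num_harmonics + 1):
            idx = h * k
            if idx >= n:
                break
            total += power[idx]
        out[k] = total
    return out

# Toy power spectrum: flat noise floor plus a signal with fundamental at
# bin 5 whose power is spread evenly over its first four harmonics.
spectrum = [1.0] * 32
for h in (1, 2, 3, 4):
    spectrum[5 * h] += 3.0

summed = harmonic_sum(spectrum, 4)
best = max(range(1, len(summed)), key=summed.__getitem__)
print(best, summed[best])
```

No single bin of the toy spectrum is much above the floor, but the summed value at the true fundamental is four times the summed noise level.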

## **Test Vector Infrastructure & LMC Prototype**

**Test vector pipeline**: Produces SKA test vectors, used to verify that prototype pipelines adhere to functional and scientific requirements. Consists of software and hardware:

- Rack-mountable workstation, 2 x Xeon E5-2630 v4 CPUs, 512 GB of RAM.
- 24 TB of disk capacity (NVMe & SSD), can store 1,000 SKA test vectors.
- 2 x Nvidia GTX 1080 Ti (Pascal) GPUs.
- Python software in Docker container, executing pulsar tools (Tempo, Sigproc ...).
- Working, test vectors being produced.
- Directly interfaced with a ProtoNIP node.

#### **PSS LMC evolutionary prototype:**

- Main Tango Control Devices in place.
- Init/program/start scan working; interfaces with (small) real nodes.
- Interface with simulated pipelines.
- Compatible with the (evolving) SKA standards.
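The init/program/start scan sequence is essentially a state machine that rejects commands issued out of order. The sketch below illustrates that idea in plain Python. The class, state names and command names are hypothetical; they do not reproduce the PSS LMC Tango devices, the PyTango API or the SKA Control System Guidelines.

```python
class PssPipelineController:
    """Hypothetical command sequencing for a PSS pipeline node."""

    # command -> (states the command is allowed from, resulting state)
    TRANSITIONS = {
        "init": ({"UNINITIALISED"}, "READY"),
        "program": ({"READY", "PROGRAMMED"}, "PROGRAMMED"),
        "start_scan": ({"PROGRAMMED"}, "SCANNING"),
        "end_scan": ({"SCANNING"}, "PROGRAMMED"),
    }

    def __init__(self):
        self.state = "UNINITIALISED"

    def command(self, name):
        """Apply a command if it is valid in the current state; return the new state."""
        allowed_from, target = self.TRANSITIONS[name]
        if self.state not in allowed_from:
            raise RuntimeError(f"'{name}' not allowed in state {self.state}")
        self.state = target
        return self.state

ctrl = PssPipelineController()
for cmd in ("init", "program", "start_scan"):
    print(cmd, "->", ctrl.command(cmd))
```

Keeping the allowed transitions in one table makes it straightforward to check the controller against an interface document as the standards evolve.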



## **Industrial Engagement**

Design work with FPGA designers Covnetics is complete.

Contract for a prototype implementation of Fourier Domain Acceleration Search signed last week.

Industrial collaboration with Zenotech (work done by James Sharpe) has led to our GPU peak-finding kernels.

Two contracts with software companies SRSLearning and RealTech:

- DevOps (development and operations) support, including documentation
- C++ programming for pipelines.

Through the ProtoNIP work we are working closely with:

- OCF for the servers and GPUs
- Sarsen Technologies and Bittware for the FPGAs.





### Pulsar Timing: PST-Low/Mid

Facilitates gravitational wave science and tests of gravity in the strong-field regime using pulsars

PST Design team based at

- Swinburne University of Technology,
- Auckland University of Technology, and
- Max Planck Institute for Radio Astronomy


Digital signal processing pipeline performs:

- Radio Frequency Interference mitigation
- Phase-coherent dispersion removal
- Integrated pulse profile formation
- Dynamic spectrum formation
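Integrated pulse profile formation amounts to folding: each sample is assigned a pulse phase and averaged into a phase bin. The sketch folds at a constant period. It is illustrative only; a real PST pipeline folds coherently dedispersed voltages using a pulsar timing model rather than a fixed period, and the `fold` function here is hypothetical.

```python
def fold(samples, sample_time, period, n_bins):
    """Average a time series into n_bins pulse-phase bins at a constant period.

    sample_time and period must use the same units (e.g. seconds)."""
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for i, value in enumerate(samples):
        t = i * sample_time
        # Phase bin of this sample; min() guards against rounding up to n_bins.
        b = min(int(n_bins * (t % period) / period), n_bins - 1)
        totals[b] += value
        counts[b] += 1
    return [tot / cnt if cnt else 0.0 for tot, cnt in zip(totals, counts)]

# Toy data: a pulse every 10 samples on a flat baseline of 1.0.
samples = [5.0 if i % 10 == 0 else 1.0 for i in range(1000)]
profile = fold(samples, sample_time=1.0, period=10.0, n_bins=10)
print(profile)
```

Averaging many rotations is what lifts a weak pulsar above the noise; the dynamic spectrum in the list above comes from keeping the frequency axis instead of summing over it.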



## PST-Low/Mid Baseline Solution

- Custom software on COTS hardware
- PST processes 16 phased array beams:
  - 16 servers, each with 4 GPUs (Nvidia Pascal/Volta)
  - Space: 2 racks
  - Power: 17 kW
- Design is fully compliant for Mid and Low
  - Developing signal model to demonstrate cost-saving ideas for CBF
- Prototyping progress
  - Prototype installed at SKA-SA in November 2015
  - Installed and integrated with MeerKAT in September 2016
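The baseline sizing can be read as simple per-unit figures. This sketch assumes the 16 beams are spread evenly, one per server, and that power divides evenly across servers and racks; neither assumption is stated on the slide.

```python
# Per-unit view of the PST baseline (even spread assumed throughout).
BEAMS = 16
SERVERS = 16
GPUS_PER_SERVER = 4
POWER_KW = 17.0
RACKS = 2

gpus = SERVERS * GPUS_PER_SERVER
beams_per_server = BEAMS / SERVERS
print(f"{gpus} GPUs in total, {beams_per_server:g} beam(s) per server")
print(f"~{POWER_KW / SERVERS:.2f} kW per server, ~{POWER_KW / RACKS:.1f} kW per rack")
```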



# Prototype PST Hardware



- Currently in operation at MeerKAT
- Provides platform for de-risking SKA PST design
- Informs costing, RAMS, ILS, etc. in real-world environment

### MeerKAT Commissioning Science



- MeerKAT pulsar timing programs currently under way (as part of commissioning science).
- Results provide extremely useful diagnostic information for the telescope as a whole.



### **CSP LMC Sub-element Report**

Sonja Vrcic, NRC Herzberg CSP.LMC Lead

13 June 2017 SKA Engineering Meeting, Rotterdam

### **CSP Local Monitor and Control Sub-element**

- Provides a single point of access for TM for observation configuration and control.
- Maintains and reports to TM overall status of the CSP and its capabilities.
- Provides advice to TM regarding the significance of alarms and events and suggests corrective actions.
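Reporting an overall CSP status to TM implies rolling up the states reported by CBF, PSS and PST. The sketch uses the simplest possible rule, "worst state wins". The `Health` values, their ordering and `csp_health` are assumptions for illustration; the real CSP.LMC rollup is defined by the design documents and may weight capabilities differently.

```python
from enum import IntEnum

class Health(IntEnum):
    # Ordered from best to worst so that max() selects the most severe state.
    OK = 0
    DEGRADED = 1
    FAILED = 2
    UNKNOWN = 3

def csp_health(sub_elements: dict[str, Health]) -> Health:
    """Roll up sub-element health into one CSP-level value (worst case wins)."""
    if not sub_elements:
        return Health.UNKNOWN
    return max(sub_elements.values())

report = {"CBF": Health.OK, "PSS": Health.DEGRADED, "PST": Health.OK}
print(csp_health(report).name)
```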







## **CSP LMC Team**

| Name              | Institution             | Country   | Role                                                     |
|-------------------|-------------------------|-----------|----------------------------------------------------------|
| Sonja Vrcic       | NRC                     | Canada    | Sub-element lead, Design, ICDs                           |
| Nicolas Loubser   | MDA                     | Canada    | ILS/RAMS/Opex                                            |
| Carlo Baffa       | INAF                    | Italy     | PSS Interface and related requirements and functionality |
| Elisabetta Giani  | INAF                    | Italy     | Lead prototyping effort                                  |
| Marina Vela       | INAF                    | Italy     | Prototyping tools and GUIs                               |
| Andrew Jameson    | Swinburne University    | Australia | PST interface and related requirements and functionality |
| Rajesh Warange    | NCRA                    | India     | Test Plan                                                |





## **Requirements and Design**

- Requirements and design are driven by interfaces and project standards.
- Interfaces:
  - Telescope Manager
  - Correlator and Beamformer (CBF)
  - Pulsar Search Engine (PSS)
  - Pulsar Timing Engine (PST)
- Standards
  - SKA Control System Guidelines
  - TANGO Control System
  - Software Engineering Standards





## **CDR Deliverables**

- CSP.LMC Sub-element CDR Package:
  - Requirements (Level 3)
  - Detailed Design Document
  - Development Plan
  - Prototyping Report
  - Costing
- LMC team maintains the relevant Interface Control Documents:
  - CSP to Telescope Manager
  - LMC to CBF, PSS and PST
- In addition, CSP.LMC made a significant contribution to the development of project-wide initiatives:
  - Control System Guidelines (Design Patterns)
  - Software Engineering



## **Status**

- ✓ Overall requirements and design are well understood.
- ✓ Good progress has been achieved on prototyping.
- A lot of work still remains on the CDR deliverables to incorporate the latest guidelines, requirements and interfaces.





## Challenges

- Additional effort is required to keep up with constant changes:
  - Standards are being developed and defined in parallel with design;
  - Requirements and interfaces evolve.
- Requirements for volume of data and data rates are difficult to estimate.
- In the meantime, work on prototyping continues.





### Thank you!



