release 1.2.3
kristopk committed Jun 15, 2017
1 parent 480507c commit c7575f3
Showing 8 changed files with 150 additions and 35 deletions.
49 changes: 49 additions & 0 deletions ERRATA.md
@@ -0,0 +1,49 @@

# AWS EC2 FPGA HDK+SDK Errata

Any items in this release marked as WIP (Work-in-progress) or NA (Not available yet) are not currently supported by the 1.2.0 release.

## Integrated DMA in Beta Release. AWS Shell now includes DMA capabilities on behalf of the CL
* The DMA bus toward the CL is multiplexed over sh_cl_dma_pcis AXI4 interface so the same address space can be accessed via DMA or directly via PCIe AppPF BAR4
* DMA usage is covered in the new [CL_DRAM_DMA example](./hdk/cl/examples/cl_dram_dma) RTL verification/simulation and Software
* A corresponding AWS Elastic DMA ([EDMA](./sdk/linux_kernel_drivers/edma)) driver is provided.
* [EDMA Installation Readme](./sdk/linux_kernel_drivers/edma/edma_install.md) provides installation and usage guidelines
* The initial release supports a single queue in each direction
* DMA support is in Beta stage with a known issue for DMA READ transactions that cross 4K address boundaries. See [Kernel_Drivers_README](./sdk/linux_kernel_drivers/edma/README.md) for more information on restrictions for this release

## Implementation Restrictions

* PCIE AXI4 interfaces between Custom Logic(CL) and Shell(SH) have the following restrictions:
  * All PCIe transactions must adhere to the PCIe Express base spec
  * 4Kbyte Address boundary for all transactions(PCIe restriction)
  * Multiple outstanding outbound PCIe Read transactions with same ID not supported
  * PCIE extended tag not supported, so read-request is limited to 32 outstanding
  * Address must match DoubleWord(DW) address of the transaction
  * WSTRB(write strobe) must reflect appropriate valid bytes for AXI write beats
  * Only Increment burst type is supported
  * AXI lock, memory type, protection type, Quality of service and Region identifier are not supported
* PCIE AXI4 interfaces between Custom Logic(CL) and Shell(SH) must follow the AMBA AXI4 protocol specification.
* Prior to running on F1 instance, it is highly recommended that developers run logic simulations with the ARM or Xilinx AXI4 protocol checker
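
Two of the restrictions above (the 4Kbyte address boundary and the Increment-only burst type) are easy to check in a test harness before simulation. The following is a minimal Python sketch, not part of the HDK; the function names and the string encoding of the burst type are hypothetical and exist only for illustration.

```python
# Hypothetical pre-simulation checker; not an HDK or Xilinx API.
PAGE_4K = 4096


def crosses_4k(addr, num_bytes):
    """Return True if [addr, addr + num_bytes) spans more than one 4 KB page."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    first_page = addr // PAGE_4K
    last_page = (addr + num_bytes - 1) // PAGE_4K
    return first_page != last_page


def check_request(addr, num_bytes, burst="INCR"):
    """Return a list of restriction violations for one AXI request."""
    problems = []
    if crosses_4k(addr, num_bytes):
        problems.append("crosses 4K boundary")
    if burst != "INCR":  # only the Increment burst type is supported
        problems.append("unsupported burst type")
    return problems
```

A check like this complements, but does not replace, the AXI4 protocol checker recommended above.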


## Unsupported Features (Planned for future releases)

* PCI-M AXI interface is not supported in this release.
* FPGA to FPGA communication over PCIe for F1.16xl
* FPGA to FPGA over the 400Gbps Ring for F1.16xl
* Aurora and Reliable Aurora modules for the FPGA-to-FPGA
* Preserving the DRAM content between different AFI loads (by the same running instance)
* Cadence RTL simulations tools
* All AXI-4 interfaces (PCIM, DDR4) do not support AxSIZE other than 0b110 (64B)

## Known Bugs/Issues

* The PCI-M AXI interface is not supported in this release.
  * The interface is included in cl_ports.vh and required in a CL design, but not enabled for functional use

* The integrated DMA function is in Beta stage. Known issues:
  * DMA READ addresses crossing 4K page boundaries. The failure can be triggered by READ transfers that start on an address other than 4K aligned AND cross the 4K page boundary. READ transfers that do not cross the 4K boundary OR transfers that start at the beginning of a 4K page and greater than 4K size are not susceptible to the error. WRITE transfers are not affected by this issue. Developers should use 4K aligned address boundaries on any READ transfer that can cross a 4K boundary to avoid the issue.
  * Transfer sizes of 8KB or less are supported with the integrated DMA engine for this revision of the Shell. Integrated DMA with large transfer sizes (16KB or greater) can cause timeouts between the Shell and CL if the Shell can’t respond with all data before the timeout. Please see documentation on how to [detect that a timeout has occurred](./hdk/docs/HOWTO_detect_shell_timeout.md)
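
The two DMA conditions above can be expressed as a small planning helper on the software side. This is a sketch under stated assumptions, not the EDMA driver's behavior: `read_is_susceptible` and `split_read` are hypothetical names, and the splitting policy (finish the partial page first, then issue 4K-aligned chunks of at most 8KB) is one possible way to stay inside the documented limits.

```python
# Hypothetical helpers; they model the errata text, not driver internals.
PAGE_4K = 4096
MAX_DMA_CHUNK = 8 * 1024  # transfers of 8KB or less are supported


def read_is_susceptible(addr, size):
    """Errata condition: READ starts off a 4K boundary AND crosses one."""
    unaligned = (addr % PAGE_4K) != 0
    crosses = (addr // PAGE_4K) != ((addr + size - 1) // PAGE_4K)
    return unaligned and crosses


def split_read(addr, size):
    """Split a READ into (addr, size) chunks that avoid both known issues."""
    chunks = []
    while size > 0:
        offset = addr % PAGE_4K
        # An unaligned chunk stops at the next 4K boundary; an aligned
        # chunk may run up to the 8KB limit.
        limit = PAGE_4K - offset if offset else MAX_DMA_CHUNK
        n = min(size, limit)
        chunks.append((addr, n))
        addr += n
        size -= n
    return chunks
```

For example, a READ of 0x300 bytes at 0x0F00 is susceptible as a single transfer, while the two chunks produced by `split_read` are not.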



34 changes: 12 additions & 22 deletions RELEASE_NOTES.md
@@ -1,6 +1,7 @@

# AWS EC2 FPGA HDK+SDK Release Notes

See [Errata](./ERRATA.md) for additional documentation of unsupported features and known bugs/issues.

## AWS EC2 F1 Platform Features:
* 1-8 Xilinx UltraScale+ VU9P based FPGA slots
Expand All @@ -26,8 +27,15 @@
* 1 DDR controller implemented in the SH (always available)
* 3 DDR controllers implemented in the CL (configurable number of implemented controllers allowed)

# Release 1.2.3
* New [Errata](./ERRATA.md)
* Added debug probes (.ltx) generation to build scripts
* Fixed a bug in the simulation model that affected the AXI behavior of wlast on unaligned addresses
* Added [timeout debug documentation](./hdk/docs/HOWTO_detect_shell_timeout.md)

# Release 1.2.2
* Expanded [clock recipes](./hdk/docs/clock_recipes.csv)
* Virtual JTAG documentation updates
* Reduced DCP build times by 13% (34 mins) for cl_dram_dma example by adding an option to disable virtual jtag
* Included encryption of .sv files for CL examples

Expand All @@ -43,12 +51,12 @@
## NOTE on Release 1.2.0
Release 1.2.0 is the first Generally Available release of the Shell, HDK, and SDK. This release provides F1 developers with documentation and tools to start building their Custom Logic (CL) designs to work with the F1 instances.

Any items in this release marked as WIP (Work-in-progress) or NA (Not available yet) are not currently supported by the 1.2.0 release.
Any items in this release marked as WIP (Work-in-progress) or NA (Not available yet) are not currently supported by the 1.2.X release.


## Release 1.2.0 Content Overview

This is the first Generally Available release of the AWS EC2 FPGA Development Kit. Major updates are included for both the HDK and SDK directories. 1.2.0 a required version for all Developers running on F1 instances, and prior releases of the FPGA Development Kit are not supported.
This is the first Generally Available release of the AWS EC2 FPGA Development Kit. Major updates are included for both the HDK and SDK directories. 1.2.X is a required version for all Developers running on F1 instances, and prior releases of the FPGA Development Kit are not supported.

**All AFIs created with previous HDK versions will no longer correctly load on an F1 instance**, hence a `fpga-load-local-image` command executed with an AFI created prior to 1.2.0 will return an error and not load.

Expand Down Expand Up @@ -194,7 +202,7 @@ Additional tunable auxiliary clocks are generated by the Shell and fed to the CL

* Matching the new Shell/CL interface
* Add support for 32-bit peek/poke via ocl\_ AXI-L bus
* Adding Virtual JTAG support with Xilinx ILA and VIO debug cores (WIP)
* Virtual JTAG support with Xilinx ILA and VIO debug cores
* Demonstrate the use of Virtual LED and Virtual DIPSwitch
* Runtime software examples, leveraging fpga_pci and fpga_mgmt C-libraries
* Updated PCIe Vendor ID and Device ID
Expand All @@ -208,7 +216,7 @@ Additional tunable auxiliary clocks are generated by the Shell and fed to the CL
* Using SystemVerilog Bus constructs to simplify the code
* Demonstrate the use of User interrupts
* Demonstrate the use of bar1\_ AXI-L bus
* Includes Runtime C-code application under [CL_DRAM_DMA software](./hdk/cl/examples/cl_dram_dma/software) (WIP)
* Includes Runtime C-code application under [CL_DRAM_DMA software](./hdk/cl/examples/cl_dram_dma/software)
* See [CL_DRAM_DMA README](./hdk/cl/examples/cl_dram_dma/README.md)


Expand Down Expand Up @@ -284,24 +292,6 @@ Additional tunable auxiliary clocks are generated by the Shell and fed to the CL
* Only Increment burst type is supported
* AXI lock, memory type, protection type, Quality of service and Region identifier are not supported

## Unsupported Features (Planned for future releases)

* PCI-M AXI interface is not supported in this release.
* FPGA to FPGA communication over PCIe for F1.16xl
* FPGA to FPGA over the 400Gbps Ring for F1.16xl
* Aurora and Reliable Aurora modules for the FPGA-to-FPGA
* Preserving the DRAM content between different AFI loads (by the same running instance)
* Cadence RTL simulations tools
* All AXI-4 interfaces (PCIM, DDR4) do not support AxSIZE other than 0b110 (64B)

## Known Bugs/Issues

* The PCI-M AXI interface is not supported in this release. The interface is included in cl_ports.vh and required in a CL design, but not enabled for functional use in this release.

* The integrated DMA function is in Beta stage. There is a known issue with DMA READ addresses crossing 4K page boundaries. The failure can be triggered by READ transfers that start on an address other than 4K aligned AND cross the 4K page boundary. READ transfers that do not cross the 4K boundary OR transfers that start at the beginning of a 4K page and greater than 4K size are not susceptible to the error. WRITE transfers are not affected by this issue. Developers should use 4K aligned address boundaries on any READ transfer that can cross a 4K boundary to avoid the issue.

* aws_dcp_verify flow (aws_dcp_verify.tcl) does not work. The script will be fixed in a future release. Currently the script will always give an error even if the DCP is OK.

## Supported Tools and Environment

* The HDK and SDK are designed for **Linux** environment and has not been tested on other platforms
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -347,6 +347,10 @@ if { $failval==0 } {
puts "AWS FPGA: ([clock format [clock seconds] -format %T]) writing post synth checkpoint.";

write_checkpoint -force $CL_DIR/build/checkpoints/${timestamp}.CL.post_synth.dcp

# Generate debug probes file
write_debug_probes -force -no_partial_ltxfile -file $CL_DIR/build/checkpoints/${timestamp}.debug_probes.ltx

close_project
#Set param back to default value
set_param sta.enableAutoGenClkNamePersistence 1
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -476,6 +476,10 @@ report_timing_summary -file $CL_DIR/build/reports/${timestamp}.SH_CL_final_timin
puts "AWS FPGA: ([clock format [clock seconds] -format %T]) writing final DCP to to_aws directory.";

write_checkpoint -force $CL_DIR/build/checkpoints/to_aws/${timestamp}.SH_CL_routed.dcp

# Generate debug probes file
write_debug_probes -force -no_partial_ltxfile -file $CL_DIR/build/checkpoints/${timestamp}.debug_probes.ltx

close_project

# ################################################
Expand Down
19 changes: 11 additions & 8 deletions hdk/common/verif/models/sh_bfm/sh_bfm.sv
Expand Up @@ -1872,14 +1872,16 @@ module sh_bfm #(
bit last_beat;
logic [5:0] start_addr;
bit aligned;
bit last_data_beat;

num_of_data_beats = 0;
byte_cnt = 0;
num_bytes = 0;
aligned_addr = 0;
last_beat = 0;
start_addr = 0;
aligned = 0;
last_data_beat = 0;
byte_cnt = 0;
num_bytes = 0;
aligned_addr = 0;
last_beat = 0;
start_addr = 0;
aligned = 0;

for (int chan = 0; chan < 4; chan++) begin
if ((h2c_dma_started[chan] != 1'b0) && (h2c_dma_list[chan].size() > 0)) begin
Expand Down Expand Up @@ -1922,9 +1924,10 @@ module sh_bfm #(
axi_data.data = 0;
axi_data.strb = 64'b0;
axi_data.id = chan;
axi_data.last = (((num_of_data_beats - 1) - burst_cnt) == 0) ? 1 : 0;
last_data_beat = (((num_of_data_beats - 1) - burst_cnt) == 0) ? 1 : 0;
num_bytes = last_beat ? (dop.len + dop.cl_addr[5:0])%64 : 64;
if(axi_data.last) begin
axi_data.last = (j == axi_cmd.len) ? 1 : 0;
if(last_data_beat) begin
for(int i=0; i < num_bytes; i++) begin
axi_data.data = axi_data.data | tb.hm_get_byte(.addr(dop.buffer + byte_cnt)) << 8*i;
axi_data.strb = axi_data.strb | 1 << i;
Expand Down
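
The hunk above separates two ideas that the old code merged: `axi_data.last` now follows the burst's own length (`j == axi_cmd.len`), while `last_data_beat` only controls how many bytes of the final data beat are valid. A minimal single-burst model in Python may help when reasoning about unaligned writes. It assumes a 64-byte data bus and that `axi_cmd.len` holds the AXI AWLEN value (beats minus one); `write_beats` is a hypothetical name, and the BFM itself may split a DMA transfer into several bursts, which this sketch does not model.

```python
# Illustrative model only; it is not a translation of sh_bfm.sv.
BUS_BYTES = 64  # 512-bit data bus, matching the 64-byte beats in the model


def write_beats(addr, length):
    """Return a (strb, last) tuple per beat for one write burst."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = addr % BUS_BYTES           # byte lane of the first valid byte
    end = first + length               # exclusive end, relative to beat 0
    num_beats = (end + BUS_BYTES - 1) // BUS_BYTES
    awlen = num_beats - 1              # AXI encodes burst length as beats - 1
    beats = []
    for j in range(num_beats):
        lo = first if j == 0 else 0
        hi = min(end - j * BUS_BYTES, BUS_BYTES)
        strb = ((1 << (hi - lo)) - 1) << lo  # one strobe bit per valid byte
        last = (j == awlen)            # mirrors the j == axi_cmd.len check
        beats.append((strb, last))
    return beats
```

An 8-byte write at address 0x3C spills into a second beat, so the model returns two beats with `last` set only on the second.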
68 changes: 68 additions & 0 deletions hdk/docs/HOWTO_detect_shell_timeout.md
@@ -0,0 +1,68 @@

# AXI Timeouts

* The Shell provides a timeout mechanism which terminates any outstanding AXI transaction after 2.5 µs. Each interface has its own timeout. Upon the first timeout, the metrics registers are updated with the offending address and a counter is incremented. Upon further timeouts the counter is incremented again. These metrics registers can be read with the fpga-describe-local-image tool, described in the [Amazon FPGA Image Management Tools README](../../sdk//userspace/fpga_mgmt_tools/README.md)

* Timeouts can occur for three reasons:
1. The CL doesn’t respond to the address (reserved address space)
2. The CL has a protocol violation on AXI which hangs the bus
3. The address targets the F1 card’s DDR memory and the CL design’s latency exceeds the timeout value.

* Best practice is to ensure addresses to reserved address space are fully decoded in your CL design.
* DMA accesses to DDR accumulate, which can sometimes lead to timeouts.
* CL designs which have multiple masters to DDR will also incur arbitration delays.
* If you suspect a timeout, debug by reading the metrics registers. The saved offending address should help determine whether the access went to DDR or to registers/RAMs inside the FPGA. If it went to a target inside the FPGA, the developer should investigate protocol violations.
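
To budget CL latency against the timeout, it can help to convert the window into clock cycles. The sketch below does only that arithmetic; the clock frequency is an input you must supply for your own design, and `timeout_cycles` is a hypothetical helper name.

```python
# Converts the documented 2.5 us window into whole clock cycles.
SHELL_TIMEOUT_NS = 2500  # 2.5 us, from the text above


def timeout_cycles(clock_mhz):
    """Whole clock cycles that fit in the Shell timeout window."""
    # ns * MHz / 1000 = cycles; integer math avoids floating-point rounding
    return SHELL_TIMEOUT_NS * clock_mhz // 1000
```

For an assumed 250 MHz interface clock this gives 625 cycles, which has to cover queued DMA accesses and any DDR arbitration delay.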

# How to detect a shell timeout has occurred

* Shell-CL interface timeouts can be detected by checking for non-zero timeout counters. These metrics can be read using this command:
```
$ sudo fpga-describe-local-image -S 0 --metrics
AFI 0 agfi-0f0e045f919413242 loaded 0 ok 0 0x04151701
AFIDEVICE 0 0x1d0f 0xf000 0000:00:1d.0
sdacl-slave-timeout=0
virtual-jtag-slave-timeout=0
ocl-slave-timeout=0
bar1-slave-timeout=0
dma-pcis-timeout=0
pcim-range-error=0
pcim-axi-protocol-error=0
pcim-axi-protocol-4K-cross-error=0
pcim-axi-protocol-bus-master-enable-error=0
pcim-axi-protocol-request-size-error=0
pcim-axi-protocol-write-incomplete-error=0
pcim-axi-protocol-first-byte-enable-error=0
pcim-axi-protocol-last-byte-enable-error=0
pcim-axi-protocol-bready-error=0
pcim-axi-protocol-rready-error=0
pcim-axi-protocol-wchannel-error=0
sdacl-slave-timeout-addr=0x0
sdacl-slave-timeout-count=0
virtual-jtag-slave-timeout-addr=0x0
virtual-jtag-slave-timeout-count=0
ocl-slave-timeout-addr=0x8001
ocl-slave-timeout-count=0
bar1-slave-timeout-addr=0x2001
bar1-slave-timeout-count=0
dma-pcis-timeout-addr=0x0
dma-pcis-timeout-count=0
pcim-range-error-addr=0x0
pcim-range-error-count=0
pcim-axi-protocol-error-addr=0x0
pcim-axi-protocol-error-count=0
pcim-write-count=0
pcim-read-count=0
DDR0
write-count=0
read-count=0
DDR1
write-count=0
read-count=0
DDR2
write-count=29797854199
read-count=4
DDR3
write-count=0
read-count=0
```
* For detailed information on metrics, see [Amazon FPGA Image Management Tools README](../../sdk//userspace/fpga_mgmt_tools/README.md)
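
Checking for non-zero timeout counters can be scripted. The following Python sketch parses text in the `key=value` layout shown in the sample output above and reports every timeout counter that is not zero. The function name is hypothetical, and the parser assumes the output format matches the sample; it ignores the `-addr` entries because a saved address can be non-zero while its counter is still 0, as the `ocl-slave-timeout-addr` line in the sample shows.

```python
# Hypothetical parser for captured `fpga-describe-local-image --metrics` text.
def nonzero_timeouts(metrics_text):
    """Return {counter_name: value} for every non-zero timeout counter."""
    counters = {}
    for line in metrics_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # header lines such as "AFI ..." or "DDR0"
        if "timeout" not in key or key.endswith("-addr"):
            continue  # keep only timeout flags and counts
        count = int(value, 0)  # base 0 accepts both decimal and 0x-prefixed
        if count != 0:
            counters[key] = count
    return counters
```

An empty result means no timeout has been recorded on any interface since the counters were last cleared.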
5 changes: 1 addition & 4 deletions hdk/docs/Virtual_JTAG_XVC.md
Expand Up @@ -129,10 +129,7 @@ Upon successful connection, Vivado's Hardware panel will be populated with a deb

5) Select the debug bridge instance from the Vivado Hardware panel

6) You will need a "Probes file" in the next step. Once you run the EC2 API create-fpga-image and the process of creating the AFI is complete, a "Probes file" is generated that has a ".ltx" extension.
```
$ aws s3 cp s3://<bucket-name>/<logs-folder-name>/*_debug_probes.ltx $CL_DIR #copy to the example directory
```
6) You will need a "Probes file" in the next step. A "Probes file" with an ".ltx" extension is generated during the build process and written to the checkpoints directory.

7) In the Hardware Device Properties window select the appropriate “Probes file” for your design by clicking the icon next to the “Probes file” entry, selecting the file, and clicking “OK”. This will refresh the hardware device and it should now show the debug cores present in your design. Note the Probes file is written out during the design implementation, and typically has the extension ".ltx".

Expand Down
2 changes: 1 addition & 1 deletion hdk/hdk_version.txt
@@ -1 +1 @@
HDK_VERSION=1.2.2
HDK_VERSION=1.2.3
