802.11n TX aggregation support
Overview
A large part of correct 802.11n behaviour is implementing 802.11n TX aggregation.
Sponsored by
This work was initially sponsored by Hobnob, Inc. from June 2011 through December 2011.
It is now being done as a side-project by Adrian Chadd, separate from his employment.
Qualcomm Atheros, Inc. have also provided source code and documentation under an NDA which allows for contribution back to an open source project.
Location
The initial work was done in a FreeBSD project subversion branch, base/user/adrian/if_ath_tx/. That branch is now obsolete. Development work was then done in various branches in https://gitorious.org/~adrianchadd/freebsd/adrianchadd-freebsd-work before being merged back into -HEAD. Those branches are also now mostly obsolete.
The current work is in -HEAD.
Status
The code in -HEAD is stable in both station and hostap modes.
IBSS should negotiate 802.11n between FreeBSD nodes.
TDMA and mesh currently do not have 802.11n support.
Timeline
The initial project scope was to implement 802.11n TX aggregation. The completed milestones, which are now in -HEAD:
- AMPDU TX support for 802.11n NICs
- Software retransmission support
- net80211 changes needed for AMPDU TX (it already has AMPDU TX support but some further extensions are likely required)
- STA, Hostap support
- Correct BAW tracking
- Fix some locking issues in the driver, but ignore net80211 lock-order issues.
- Minimal changes needed to ath_rate_sample to support 802.11n
- Change ath_rate_onoe and ath_rate_amrr to work for legacy rates, but not 802.11n
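The BAW (block-ack window) tracking listed above comes down to modular arithmetic on the 12-bit 802.11 sequence number. The following is a minimal sketch of that arithmetic, not the ath(4) implementation: the function names are hypothetical, and the window size of 64 is an assumed value (the negotiated window may be smaller).

```python
SEQ_RANGE = 4096   # 802.11 sequence numbers are 12 bits wide
BAW_SIZE = 64      # assumed window size; the negotiated value may be smaller


def baw_offset(left_edge, seqno):
    """Distance of seqno from the window's left edge, modulo the seqno space."""
    return (seqno - left_edge) % SEQ_RANGE


def baw_within(left_edge, seqno, baw_size=BAW_SIZE):
    """Return True if seqno falls inside the window starting at left_edge."""
    return baw_offset(left_edge, seqno) < baw_size


def baw_slide(left_edge, completed):
    """Advance the left edge past every consecutively completed seqno.

    'completed' is a set of seqnos that were acked or given up on; entries
    that the left edge passes are removed from the set.
    """
    while left_edge in completed:
        completed.remove(left_edge)
        left_edge = (left_edge + 1) % SEQ_RANGE
    return left_edge
```

The modulo keeps the comparison correct across the 4095 to 0 wrap, and a sequence number just behind the left edge shows up as a very large offset, so it is rejected rather than treated as being in the window.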
The second phase, which is also now in -HEAD:
- Extend net80211 AMPDU-TX support to be per-TID, rather than per-AC
- Correct BAR TX
- Properly serialise TX/RX (which occur in parallel) with reset and channel change operations
The third phase, which is now also in -HEAD:
- Filtered frames support (for nodes which go into power saving mode, so frames aren't dropped)
- net80211 -> driver notification of node sleep and wake
- Teach net80211 power-saving hostap support (TIM bitmap) about filtered frame queue depth in the ath driver (as a work-around for now, as the net80211 driver only knows about PS buffered frames, not filtered frames from the NIC.)
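The TIM work-around above can be pictured as follows: a sleeping station's TIM bit has to be set if either the net80211 power-save queue or the driver's filtered-frame queue holds traffic for it. This is a simplified sketch under stated assumptions: the class, field, and function names are hypothetical, and it builds the full 251-octet virtual bitmap rather than the partial bitmap that goes into a beacon.

```python
from dataclasses import dataclass

TIM_BITMAP_LEN = 251   # full traffic-indication virtual bitmap, AIDs 0..2007


@dataclass
class SleepingNode:
    aid: int                       # association ID of the station
    ps_queue_depth: int = 0        # frames buffered by the power-save queue
    filtered_queue_depth: int = 0  # frames filtered by the NIC, held by the driver


def tim_bit_needed(node):
    """A station has pending traffic if either queue is non-empty."""
    return node.ps_queue_depth > 0 or node.filtered_queue_depth > 0


def build_tim_bitmap(nodes):
    """Set bit (aid % 8) of octet (aid // 8) for each station with traffic."""
    bitmap = bytearray(TIM_BITMAP_LEN)
    for node in nodes:
        if tim_bit_needed(node):
            bitmap[node.aid // 8] |= 1 << (node.aid % 8)
    return bitmap
```

Checking only `ps_queue_depth` would leave the bit clear for a station whose frames were all filtered by the NIC, which is the case the work-around is meant to cover.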
What I'm currently working on:
- Fix the current TX path locking so there's a per-TID queue lock separate from the hardware TXQ locks
- .. then overhaul how frame TX queuing is handled - right now all non-QoS frames go into TID 16 at the best effort AC, which is not actually correct (eg some management frames may want to go out at a higher AC.) This needs to be supported.
- And then a general locking strategy review and tidyup, as the current method is neither optimal nor efficient.
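To make the non-QoS queuing problem concrete, here is a small sketch of the behaviour described above next to one possible alternative. The TID-to-AC table is the usual 802.11e user-priority mapping; the function names are hypothetical, and sending management frames at the voice AC is only an assumption chosen for illustration, not a decision recorded on this page.

```python
NONQOS_TID = 16   # the software TID used for all non-QoS frames

# Usual 802.11e user-priority to access-category mapping for TIDs 0..7.
TID_TO_AC = ["BE", "BK", "BK", "BE", "VI", "VI", "VO", "VO"]


def current_ac(tid):
    """Behaviour described above: every non-QoS frame goes out at best effort."""
    if tid == NONQOS_TID:
        return "BE"
    return TID_TO_AC[tid & 7]


def proposed_ac(tid, is_mgmt=False, mgmt_ac="VO"):
    """Possible alternative: let non-QoS management frames use a higher AC.

    mgmt_ac is an assumed default, not something the driver defines.
    """
    if tid == NONQOS_TID and is_mgmt:
        return mgmt_ac
    return current_ac(tid)
```

The point of the split is that the AC decision for TID 16 would depend on the frame type rather than on the TID alone.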
Future work, for which there's no fixed timeline:
- Implement correct (any!) locking in net80211 for AMPDU RX and TX state tracking
- MIMO PS support - STA and hostap
- Improve net80211 rate control to support 802.11n and multi-rate retry
- Implement a sample rate control module for net80211
- (Maybe) port the minstrel rate control module from Linux/mac80211
- Correct how channel scanning and general off-channel TX occurs; maintain the current pending TX/RX queues, so scan/bgscan can be done without upsetting 802.11n aggregation sessions.
Future work:
- Power saving support in STA/IBSS mode
- 802.11n IBSS/TDMA/Mesh support
- Push per-TID AMPDU TX queue management into net80211
- Fix up / verify fast frames support (which needs per-AC state kept for each node; this is done in net80211 but I've never tried it.)
- RIFS TX?
- AR5416 burst support?
Task list
(This is incomplete and mostly relevant to Adrian.)
| Task | Status |
| Per-TID software queue | Complete |
| Per-AC software queue | TODO |
| STA operation | Complete |
| Hostap operation | Complete |
| Filtered frames | Complete |
| 802.11n protection | Complete |
| Correct BAR TX | Complete |
| Software retransmission of aggregate session frames | Complete |
| Test backwards compatibility with previous non-11n chipsets | Complete |
| TX aggregation statistics | Complete |
| Extend net80211 AMPDU TX state to be per-TID, not just per-AC | Complete |
| Rate control changes | Complete |
| AR5416 RTS 8K TX limitation | Complete |
| Reset serialisation | Complete |
| AR5416 BA reset workaround | Complete |
| Ignore RSSI for non-final aggregate frames | TODO |
| Implement per-node and per-tid scheduling, rather than just per-TXQ | Rejected |
| Migrate node/TID locking from the TXQ lock to the ATH_NODE lock | Rejected |
| Migrate node/TID locking to a global ath TX lock | Complete |
| Implement filtered frames support, for aggregate traffic sessions | Complete |
| Implement filtered frames support, in both aggregate and non-aggregate modes | Complete |
| Extend net80211 power-save support (hostap) to be driver software-queue aware | Complete |
| Extend net80211/ath(4) power-save support (hostap) to correctly handle station sleep/wakeup and ps-poll/uapsd based frame transmission | Complete |
| When entering off-channel mode (for scan), don't simply purge the software/hardware frame queue, but maintain its contents until the scan has completed. This is required for scan/bgscan support with active aggregate sessions. | TODO |
| .. fix bgscan (see above) | TODO |
Known Issues
- Many of the status fields don't apply for intermediary RX frames in an aggregate. This includes RSSI values, which should be discarded by the RSSI tracking logic for those frames.
- Unfortunately net80211 and the radiotap code will also need to be told which frames have valid RSSI and which don't, or they may use invalid RSSI for various decisions.
- The channel width change in net80211 needs to be extended to defer the actual channel width change/update (ie, the channel flags and the vap->iv_bss->ni_chw field update) until _after_ the queued channel width change net80211 task has occurred. Otherwise there's a small window where ni_chw doesn't reflect what the hardware actually has configured, resulting in unpredictable messes.
- .. and it's likely a good idea to add some code to if_ath_tx_ht.c for now to at least log when this has occurred.
- bgscan doesn't work - frames currently in flight are dropped once a channel change occurs. This needs to be rectified.
- pspoll and general power save support also need to be addressed.
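The RSSI issue at the top of this list could be handled with a validity check in front of whatever averaging the RSSI tracker does. A minimal sketch, assuming the RX status carries "is part of an aggregate" and "more subframes follow" flags: the field names are hypothetical, and the moving average is just a stand-in for the real tracking logic.

```python
from dataclasses import dataclass


@dataclass
class RxStatus:
    rssi: int
    is_aggr: bool = False     # frame is a subframe of an aggregate
    more_aggr: bool = False   # True for every subframe except the final one


def rssi_is_valid(rs):
    """Only non-aggregate frames and the final subframe carry a usable RSSI."""
    return not (rs.is_aggr and rs.more_aggr)


class RssiTracker:
    """Stand-in tracker: a simple exponentially weighted moving average."""

    def __init__(self, weight=0.25):
        self.weight = weight
        self.avg = None

    def update(self, rs):
        """Fold one frame into the average, skipping frames with invalid RSSI."""
        if not rssi_is_valid(rs):
            return self.avg
        if self.avg is None:
            self.avg = float(rs.rssi)
        else:
            self.avg += self.weight * (rs.rssi - self.avg)
        return self.avg
```

The same `rssi_is_valid()` style of predicate is what net80211 and the radiotap code would need to be told about, so they can skip the same frames.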
Fixed issues
- TDMA is broken - don't even enable it or you'll get immediate TX hangs.
- There is some code from ath9k which serialises register access for earlier PCI NICs. This may help some users who are using the AR5416/AR9160/AR9220/AR9227 PCI versions in multi-core machines.
- There seems to be a race condition with the (bg?) scan code, where the stack is given some frames destined for another station; it then corrupts the current encryption key RSC and AMPDU RX state.
- There's an issue with RX'ing high throughput UDP streams - RXEOL/RXORN interrupts
- Frames shouldn't be flushed from the TX queue on an interface reset (eg stuck beacon)!
- I need to add some code to lock things in the ath driver so concurrent tasks don't try fondling the hardware at the wrong time.
- There is also some code which serialises access to the MAC for things like TX, RX and interrupt fiddling. This is so two concurrent tasks (eg a callout and a taskqueue call) don't interleave and potentially leave things in an inconsistent state.
- The RX queue shouldn't be flushed on an RX DMA hang - the queued frames should first be handled so the RX A-MPDU BAW state isn't lost (along with the frame contents!)
- Sequence number allocation needs to be deferred until the frame is first queued to the hardware, otherwise race conditions can occur on multi-CPU machines where out of order sequence numbers get allocated and added to the BAW, with the lower seqno losing the race and never being added (as it's immediately outside of the TX BAW.)
- I need to implement correct BAR TX handling; it's just not done at all at present.
- State transitions (eg band/width/stream changes) need to be kept in mind when doing interface resets; the hardware doesn't like being given rates that the PHY isn't currently set up for.
- The rate control code makes some occasional poor choices.
- Management frames are sent via ic->ic_raw_xmit() and this calls into the driver directly to TX. This is a problem for things like AMPDU, BAR and software retransmit/busy frames, as bursts of UDP traffic may cause the TX queue to fill up. This is 100% reproducible by doing a 200mbit UDP TX without aggregation up - the driver pauses the queue and then never gets a free ath_buf to send a management frame with.
- .. for now, implement a per-node limit and a global limit for non-management frames, so there's always room for management frames to be added.
- Berislav has reported some regressions in performance/stability.
- He's also reported asymmetric TX performance (ie, TCP/UDP through STA->AP doesn't match AP->STA).
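The management-frame starvation fix described in this list (a per-node limit plus a global limit for non-management frames) can be sketched as a buffer pool with a reserve. This is an illustration only: the class name, method names, and all of the numbers are hypothetical and are not taken from the ath(4) driver.

```python
class TxBufPool:
    """Hypothetical TX buffer pool that keeps room for management frames."""

    def __init__(self, total=200, mgmt_reserve=8, per_node_limit=64):
        self.free = total
        self.mgmt_reserve = mgmt_reserve      # buffers data frames may not use
        self.per_node_limit = per_node_limit  # max data buffers held per node
        self.per_node = {}                    # node -> data buffers in use

    def alloc(self, node, is_mgmt=False):
        """Try to take one buffer; return True on success."""
        if self.free == 0:
            return False
        if not is_mgmt:
            # Global limit: data frames must leave the reserve untouched.
            if self.free <= self.mgmt_reserve:
                return False
            # Per-node limit: one busy node can't take the whole pool.
            if self.per_node.get(node, 0) >= self.per_node_limit:
                return False
            self.per_node[node] = self.per_node.get(node, 0) + 1
        self.free -= 1
        return True

    def release(self, node, is_mgmt=False):
        """Return one buffer previously taken with alloc()."""
        if not is_mgmt:
            self.per_node[node] -= 1
        self.free += 1
```

With this shape, a UDP flood stalls once only the reserve is left, while a management frame queued through the raw transmit path can still get a buffer.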