Review the file api-reference/peripherals/spi_master.rst.

2019-07-11 19:00:58 +08:00 · 2019-07-11 19:00:58 +08:00 · deda0091d9
parent 7e3676f307
commit deda0091d9
2 changed files with 259 additions and 342 deletions
--- a/docs/_static/miso_timing_waveform_async.png
+++ b/docs/_static/miso_timing_waveform_async.png
--- a/docs/en/api-reference/peripherals/spi_master.rst
+++ b/docs/en/api-reference/peripherals/spi_master.rst
@ -1,256 +1,246 @@
-SPI Master driver
+SPI Master Driver
 =================

-Overview
--------
+SPI Master driver is a program that controls ESP32's SPI peripherals while they function as masters.

-The ESP32 has four SPI peripheral devices, called SPI0, SPI1, HSPI and VSPI. SPI0 is entirely dedicated to
-the flash cache the ESP32 uses to map the SPI flash device it is connected to into memory. SPI1 is
-connected to the same hardware lines as SPI0 and is used to write to the flash chip. HSPI and VSPI
-are free to use. SPI1, HSPI and VSPI all have three chip select lines, allowing them to drive up to
-three SPI devices each as a master.

-The spi_master driver
-^^^^^^^^^^^^^^^^^^^^^
+Overview of ESP32's SPI peripherals
+-----------------------------------

-The spi_master driver allows easy communicating with SPI slave devices, even in a multithreaded environment.
-It fully transparently handles DMA transfers to read and write data and automatically takes care of
-multiplexing between different SPI slaves on the same master.
+ESP32 integrates four SPI peripherals.

-.. note::
-
-    **Notes about thread safety**
-
-    The SPI driver API is thread safe when multiple SPI devices on the same bus are accessed from different tasks. However, the driver is not thread safe if the same SPI device is accessed from multiple tasks.
-
-    In this case, it is recommended to either refactor your application so only a single task accesses each SPI device, or to add mutex locking around access of the shared device.
+- SPI0 and SPI1 are used internally to access the ESP32's attached flash memory and thus are currently not open to users. They share one signal bus via an arbiter.
+- SPI2 and SPI3 are general purpose SPI controllers, sometimes referred to as HSPI and VSPI, respectively. They are open to users. SPI2 and SPI3 have independent signal buses with the same respective names. Each bus has three CS lines to drive up to three SPI slaves.


 Terminology
-^^^^^^^^^^^
+-----------

-The spi_master driver uses the following terms:
+The terms used in relation to the SPI master driver are given in the table below.

-* Host: The SPI peripheral inside the ESP32 initiating the SPI transmissions. One of SPI, HSPI or VSPI. (For
-  now, only HSPI or VSPI are actually supported in the driver; it will support all 3 peripherals
-  somewhere in the future.)
-* Bus: The SPI bus, common to all SPI devices connected to one host. In general the bus consists of the
-  miso, mosi, sclk and optionally quadwp and quadhd signals. The SPI slaves are connected to these
-  signals in parallel.
+=================  =========================================================================================
+Term               Definition
+=================  =========================================================================================
+**Host**           The SPI controller peripheral inside ESP32 that initiates SPI transmissions over the bus, and acts as an SPI Master. This may be the SPI2 or SPI3 peripheral. (The driver will also support the SPI1 peripheral in the future.)
+**Device**         SPI slave device. An SPI bus may be connected to one or more Devices. Each Device shares the MOSI, MISO and SCLK signals but is only active on the bus when the Host asserts the Device's individual CS line.
+**Bus**            A signal bus, common to all Devices connected to one Host. In general, a bus includes the following lines: MISO, MOSI, SCLK, one or more CS lines, and, optionally, QUADWP and QUADHD. So Devices are connected to the same lines, with the exception that each Device has its own CS line. Several Devices can also share one CS line if connected in the daisy-chain manner.
+- **MISO**         Master In, Slave Out, a.k.a. Q. Data transmission from a Device to Host.
+- **MOSI**         Master Out, Slave In, a.k.a. D. Data transmission from a Host to Device.
+- **SCLK**         Serial Clock. Oscillating signal generated by a Host that keeps the transmission of data bits in sync.
+- **CS**           Chip Select. Allows a Host to select individual Device(s) connected to the bus in order to send or receive data.
+- **QUADWP**       Write Protect signal. Only used for 4-bit (qio/qout) transactions.
+- **QUADHD**       Hold signal. Only used for 4-bit (qio/qout) transactions.
+- **Assertion**    The action of activating a line. The opposite action of returning the line back to inactive (back to idle) is called *de-assertion*.
+**Transaction**    One instance of a Host asserting a CS line, transferring data to and from a Device, and de-asserting the CS line. Transactions are atomic, which means they can never be interrupted by another transaction.
+**Launch edge**    Edge of the clock at which the source register *launches* the signal onto the line.
+**Latch edge**     Edge of the clock at which the destination register *latches in* the signal.
+=================  =========================================================================================

-  - miso - Also known as q, this is the input of the serial stream into the ESP32

-  - mosi - Also known as d, this is the output of the serial stream from the ESP32
+Driver Features
+---------------

-  - sclk - Clock signal. Each data bit is clocked out or in on the positive or negative edge of this signal
+The SPI master driver governs communications of Hosts with Devices. The driver supports the following features:

-  - quadwp - Write Protect signal. Only used for 4-bit (qio/qout) transactions.
+- Multi-threaded environments
+- Transparent handling of DMA transfers while reading and writing data
+- Automatic time-division multiplexing of data coming from different Devices on the same signal bus

-  - quadhd - Hold signal. Only used for 4-bit (qio/qout) transactions.
+.. warning::

-* Device: A SPI slave. Each SPI slave has its own chip select (CS) line, which is made active when
-  a transmission to/from the SPI slave occurs.
-* Transaction: One instance of CS going active, data transfer from and/or to a device happening, and
-  CS going inactive again. Transactions are atomic, as in they will never be interrupted by another
-  transaction.
+    The SPI master driver has the concept of multiple Devices connected to a single bus (sharing a single ESP32 SPI peripheral). As long as each Device is accessed by only one task, the driver is thread safe. However, if multiple tasks try to access the same SPI Device, the driver is **not thread-safe**. In this case, it is recommended to either:

-SPI transactions
-^^^^^^^^^^^^^^^^
+    - Refactor your application so that each SPI Device is only accessed by a single task at a time.
+    - Add a mutex lock around the shared Device using :c:macro:`xSemaphoreCreateMutex`.

-A transaction on the SPI bus consists of five phases, any of which may be skipped:

-* The command phase. In this phase, a command (0-16 bit) is clocked out.
-* The address phase. In this phase, an address (0-64 bit) is clocked out.
-* The write phase. The master sends data to the slave.
-* The dummy phase. The phase is configurable, used to meet the timing requirements.
-* The read phase. The slave sends data to the master.
+SPI Transactions
+----------------

-In full duplex mode, the read and write phases are combined, and the SPI host reads and
-writes data simultaneously. The total transaction length is decided by
-``command_bits + address_bits + trans_conf.length``, while the ``trans_conf.rx_length``
-only determins length of data received into the buffer.
+An SPI bus transaction consists of five phases which can be found in the table below. Any of these phases can be skipped.

-While in half duplex mode, the host have independent write and read phases. The length of write phase and read phase are
-decided by ``trans_conf.length`` and ``trans_conf.rx_length`` respectively.
+==============  =========================================================================================================
+Phase           Description
+==============  =========================================================================================================
+**Command**     In this phase, a command (0-16 bit) is written to the bus by the Host.
+**Address**     In this phase, an address (0-64 bit) is transmitted over the bus by the Host.
+**Write**       Host sends data to a Device. This data follows the optional command and address phases and is indistinguishable from them at the electrical level.
+**Dummy**       This phase is configurable and is used to meet the timing requirements.
+**Read**        Device sends data to its Host.
+==============  =========================================================================================================

-The command and address phase are optional in that not every SPI device will need to be sent a command
-and/or address. This is reflected in the device configuration: when the ``command_bits`` or ``address_bits``
-fields are set to zero, no command or address phase is done.
+.. todo::

-Something similar is true for the read and write phase: not every transaction needs both data to be written
-as well as data to be read. When ``rx_buffer`` is NULL (and SPI_TRANS_USE_RXDATA) is not set) the read phase
-is skipped. When ``tx_buffer`` is NULL (and SPI_TRANS_USE_TXDATA) is not set) the write phase is skipped.
+   Add a package diagram.
+
+
+The attributes of a transaction are determined by the bus configuration structure :cpp:type:`spi_bus_config_t`, device configuration structure :cpp:type:`spi_device_interface_config_t`, and transaction configuration structure :cpp:type:`spi_transaction_t`.
+
+An SPI Host can send full-duplex transactions, during which the read and write phases occur simultaneously. The total transaction length is determined by the sum of the following members:
+
+- :cpp:member:`spi_device_interface_config_t::command_bits`
+- :cpp:member:`spi_device_interface_config_t::address_bits`
+- :cpp:member:`spi_transaction_t::length`
+
+The member :cpp:member:`spi_transaction_t::rxlength`, by contrast, determines only the length of data received into the buffer.
+
+In half-duplex transactions, the read and write phases are not simultaneous (one direction at a time). The lengths of the write and read phases are determined by :cpp:member:`length` and :cpp:member:`rxlength` members of the struct :cpp:type:`spi_transaction_t` respectively.
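The length rules above can be modelled in a few lines of plain C. This is only a sketch: ``demo_lengths_t`` and ``demo_total_bits`` are hypothetical names rather than part of the driver, all lengths are assumed to be in bits, and the dummy phase is left out for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the length-related fields discussed above.
 * In the driver they live in spi_device_interface_config_t and
 * spi_transaction_t. */
typedef struct {
    uint8_t  command_bits;  /* length of the command phase, in bits */
    uint8_t  address_bits;  /* length of the address phase, in bits */
    uint32_t length;        /* models spi_transaction_t::length, in bits */
    uint32_t rxlength;      /* models spi_transaction_t::rxlength, in bits */
} demo_lengths_t;

/* Number of bits clocked over the bus for one transaction
 * (dummy bits are ignored in this model). */
uint32_t demo_total_bits(const demo_lengths_t *l, bool half_duplex)
{
    assert(l != NULL);

    /* Command, address, and write data are always sent one after another. */
    uint32_t bits = (uint32_t)l->command_bits + l->address_bits + l->length;

    /* In half-duplex mode the read phase follows the write phase, so it
     * adds its own clock cycles. In full-duplex mode reading overlaps
     * with writing, and rxlength only limits how much is stored. */
    if (half_duplex) {
        bits += l->rxlength;
    }
    return bits;
}
```

With an 8-bit command, a 24-bit address, ``length`` of 32 and ``rxlength`` of 16, this model gives 64 bits in full-duplex mode and 80 bits in half-duplex mode.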
+
+The command and address phases are optional, as not every SPI device requires a command and/or address. This is reflected in the Device's configuration: if :cpp:member:`command_bits` and/or :cpp:member:`address_bits` are set to zero, no command or address phase will occur.
+
+The read and write phases can also be optional, as not every transaction requires both writing and reading data. If :cpp:member:`rx_buffer` is NULL and :cpp:type:`SPI_TRANS_USE_RXDATA` is not set, the read phase is skipped. If :cpp:member:`tx_buffer` is NULL and :cpp:type:`SPI_TRANS_USE_TXDATA` is not set, the write phase is skipped.
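The skip rules for the read and write phases can be sketched as two predicates. The struct and flag names below are hypothetical stand-ins with arbitrary values; real code would use :cpp:type:`spi_transaction_t` together with ``SPI_TRANS_USE_RXDATA`` and ``SPI_TRANS_USE_TXDATA``. The sketch also keeps the buffer pointers separate from the inline data, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flag values for this sketch only. */
#define DEMO_USE_RXDATA (1u << 0)
#define DEMO_USE_TXDATA (1u << 1)

/* Hypothetical, reduced model of a transaction descriptor. */
typedef struct {
    uint32_t    flags;      /* bitwise OR of DEMO_USE_* flags */
    const void *tx_buffer;  /* data to send, or NULL */
    void       *rx_buffer;  /* where to store received data, or NULL */
} demo_transaction_t;

/* The write phase happens unless tx_buffer is NULL and the
 * "use inline TX data" flag is not set. */
bool demo_has_write_phase(const demo_transaction_t *t)
{
    assert(t != NULL);
    return t->tx_buffer != NULL || (t->flags & DEMO_USE_TXDATA) != 0;
}

/* The read phase happens unless rx_buffer is NULL and the
 * "use inline RX data" flag is not set. */
bool demo_has_read_phase(const demo_transaction_t *t)
{
    assert(t != NULL);
    return t->rx_buffer != NULL || (t->flags & DEMO_USE_RXDATA) != 0;
}
```

A descriptor with both pointers set to NULL and no flags therefore has neither a read nor a write phase, which leaves only the command, address, and dummy phases.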
+
+The driver supports two types of transactions: interrupt transactions and polling transactions. The programmer can choose a different transaction type for each Device. If your Device requires both transaction types, see :ref:`mixed_transactions`.

-The driver offers two different kinds of transactions: the interrupt
-transactions and the polling transactions. Each device can choose one kind of
-transaction to send. See :ref:`mixed_transactions` if your device do require
-both kinds of transactions.

 .. _interrupt_transactions:

-Interrupt transactions
-""""""""""""""""""""""""
+Interrupt Transactions
+^^^^^^^^^^^^^^^^^^^^^^

-The interrupt transactions use an interrupt-driven logic when the
-transactions are in-flight. The routine will get blocked, allowing the CPU to
-run other tasks, while it is waiting for a transaction to be finished.
+Interrupt transactions will block the transaction routine until the transaction completes, thus allowing the CPU to run other tasks.
+
+An application task can queue multiple transactions, and the driver will automatically handle them one-by-one in the interrupt service routine (ISR). It allows the task to switch to other procedures until all the transactions complete.

-Interrupt transactions can be queued into a device, the driver automatically
-send them one-by-one in the ISR. A task can queue several transactions, and
-then do something else before the transactions are finished.

 .. _polling_transactions:

-Polling transactions
-""""""""""""""""""""
+Polling Transactions
+^^^^^^^^^^^^^^^^^^^^

-The polling transactions don't rely on the interrupt, the routine keeps polling
-the status bit of the SPI peripheral until the transaction is done.
+Polling transactions do not use interrupts. The routine keeps polling the SPI Host's status bit until the transaction is finished.

-All the tasks that do interrupt transactions may get blocked by the queue, at
-which point they need to wait for the ISR to run twice before the transaction
-is done. Polling transactions save the time spent on queue handling and
-context switching, resulting in a smaller transaction interval smaller. The
-disadvantage is that the the CPU is busy while these transactions are in
-flight.
+All the tasks that use interrupt transactions can be blocked by the queue. At this point, they will need to wait for the ISR to run twice before the transaction is finished. Polling transactions save time otherwise spent on queue handling and context switching, which results in smaller transaction intervals. The disadvantage is that the CPU is busy while these transactions are in progress.

-The ``spi_device_polling_end`` routine spends at least 1us overhead to
-unblock other tasks when the transaction is done. It is strongly recommended
-to wrap a series of polling transactions inside of ``spi_device_acquire_bus``
-and ``spi_device_release_bus`` to avoid the overhead. (See
-:ref:`bus_acquiring`)
+The :cpp:func:`spi_device_polling_end` routine needs an overhead of at least 1 us to unblock other tasks when the transaction is finished. It is strongly recommended to wrap a series of polling transactions using the functions :cpp:func:`spi_device_acquire_bus` and :cpp:func:`spi_device_release_bus` to avoid the overhead. For more information, see :ref:`bus_acquiring`.

-Command and address phases
+
+Command and Address Phases
 ^^^^^^^^^^^^^^^^^^^^^^^^^^

-During the command and address phases, ``cmd`` and ``addr`` field in the
-``spi_transaction_t`` struct are sent to the bus, while nothing is read at the
-same time. The default length of command and address phase are set in the
-``spi_device_interface_config_t`` and by ``spi_bus_add_device``. When the the
-flag ``SPI_TRANS_VARIABLE_CMD`` and ``SPI_TRANS_VARIABLE_ADDR`` are not set in
-the ``spi_transaction_t``,the driver automatically set the length of these
-phases to the default value as set when the device is initialized respectively.
+During the command and address phases, the members :cpp:member:`cmd` and :cpp:member:`addr` in the struct :cpp:type:`spi_transaction_t` are sent to the bus, and nothing is read at this time. The default lengths of the command and address phases are set in :cpp:type:`spi_device_interface_config_t` by calling :cpp:func:`spi_bus_add_device`. If the flags :cpp:type:`SPI_TRANS_VARIABLE_CMD` and :cpp:type:`SPI_TRANS_VARIABLE_ADDR` in the member :cpp:member:`spi_transaction_t::flags` are not set, the driver automatically sets the lengths of these phases to the default values configured during Device initialization.

-If the length of command and address phases needs to be variable, declare a
-``spi_transaction_ext_t`` descriptor, set the flag ``SPI_TRANS_VARIABLE_CMD``
-or/and ``SPI_TRANS_VARIABLE_ADDR`` in the ``flags`` of ``base`` member and
-configure the rest part of ``base`` as usual. Then the length of each phases
-will be ``command_bits`` and ``address_bits`` set in the ``spi_transaction_ext_t``.
+If the lengths of the command and address phases need to be variable, declare the struct :cpp:type:`spi_transaction_ext_t`, set the flags :cpp:type:`SPI_TRANS_VARIABLE_CMD` and/or :cpp:type:`SPI_TRANS_VARIABLE_ADDR` in the ``flags`` field of the member :cpp:member:`spi_transaction_ext_t::base`, and configure the rest of ``base`` as usual. The length of each phase will then be equal to :cpp:member:`command_bits` and :cpp:member:`address_bits` set in the struct :cpp:type:`spi_transaction_ext_t`.

-Write and read phases
+
+Write and Read Phases
 ^^^^^^^^^^^^^^^^^^^^^

-Normally, data to be transferred to or from a device will be read from or written to a chunk of memory
-indicated by the ``rx_buffer`` and ``tx_buffer`` members of the transaction structure.
-When DMA is enabled for transfers, these buffers are highly recommended to meet the requirements as below:
+Normally, the data that needs to be transferred to or from a Device will be read from or written to a chunk of memory indicated by the members :cpp:member:`tx_buffer` and :cpp:member:`rx_buffer`, respectively, of the structure :cpp:type:`spi_transaction_t`. If DMA is enabled for transfers, the buffers are required to be:

-  1. allocated in DMA-capable memory using ``pvPortMallocCaps(size, MALLOC_CAP_DMA)``;
-  2. 32-bit aligned (start from the boundary and have length of multiples of 4 bytes).
+  1. Allocated in DMA-capable internal memory. If :ref:`external PSRAM is enabled<dma-capable-memory>`, this means using ``pvPortMallocCaps(size, MALLOC_CAP_DMA)``.
+  2. 32-bit aligned (starting from a 32-bit boundary and having a length of multiples of 4 bytes).

-If these requirements are not satisfied, efficiency of the transaction will suffer due to the allocation and
-memcpy of temporary buffers.
+If these requirements are not satisfied, the transaction efficiency will be affected due to the allocation and copying of temporary buffers.
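The alignment requirement can be checked before a buffer is handed to the driver. The helper below is a hypothetical sketch in plain C, not a driver function. It covers only the second requirement (start address and length) and cannot tell whether the memory is DMA-capable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The check below is written in terms of 4-byte (32-bit) words. */
static_assert(sizeof(uint32_t) == 4, "uint32_t is expected to be 4 bytes");

/* Returns true if buf starts on a 32-bit boundary and len_bytes is a
 * multiple of 4. It does NOT check that the memory is DMA-capable. */
bool demo_is_dma_friendly(const void *buf, size_t len_bytes)
{
    bool aligned     = ((uintptr_t)buf % sizeof(uint32_t)) == 0;
    bool whole_words = (len_bytes % sizeof(uint32_t)) == 0;
    return buf != NULL && aligned && whole_words;
}
```

A buffer that fails this check may still work, but according to the paragraph above the transaction becomes less efficient because of the temporary buffer that has to be allocated and copied.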
+
+.. note::
+
+    Half-duplex transactions with both read and write phases are not supported when using DMA. For details and workarounds, see :ref:`spi_known_issues`.

-.. note::  Half duplex transactions with both read and write phases are not supported when using DMA. See
-  :ref:`spi_known_issues` for details and workarounds.

 .. _bus_acquiring:

-Bus acquiring
+Bus Acquiring
 ^^^^^^^^^^^^^

-Sometimes you may want to send spi transactions exclusively, continuously, to
-make it as fast as possible. You may use ``spi_device_acquire_bus`` and
-``spi_device_release_bus`` to realize this. When the bus is acquired,
-transactions to other devices (no matter polling or interrupt) are pending
-until the bus is released.
+Sometimes you might want to send SPI transactions exclusively and continuously so that they take as little time as possible. For this, you can use bus acquiring, which suspends transactions (both polling and interrupt) to other Devices until the bus is released. To acquire and release a bus, use the functions :cpp:func:`spi_device_acquire_bus` and :cpp:func:`spi_device_release_bus`.

-Using the spi_master driver
-^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Initialize a SPI bus by calling ``spi_bus_initialize``. Make sure to set the correct IO pins in
-  the ``bus_config`` struct. Take care to set signals that are not needed to -1.
+Driver Usage
+------------

- Tell the driver about a SPI slave device connected to the bus by calling spi_bus_add_device.
-  Make sure to configure any timing requirements the device has in the ``dev_config`` structure.
-  You should now have a handle for the device, to be used when sending it a transaction.
+.. todo::

- To interact with the device, fill one or more spi_transaction_t structure with any transaction
-  parameters you need. Then send them either in a polling way or the interrupt way:
+   Organize the Driver Usage into subsections that will reflect the general usage experience of the users, e.g.,
+
+   Configuration
+   
+   Add stuff about the configuration API here, and the various options in configuration (e.g., configure for interrupt vs. polling), and optional configuration
+
+   Transactions
+   
+   Describe how to execute a normal transaction (i.e., where data is larger than 32 bits). Describe how to configure between big and little-endian. 
+
+   - Add subsub section on how to optimize when transmitting less than 32 bits
+   - Add subsub section on how to transmit mixed transactions to the same device
+
+
+- Initialize an SPI bus by calling the function :cpp:func:`spi_bus_initialize`. Make sure to set the correct I/O pins in the struct :cpp:type:`spi_bus_config_t`. Set the signals that are not needed to ``-1``.
+
+- Register a Device connected to the bus with the driver by calling the function :cpp:func:`spi_bus_add_device`. Make sure to configure any timing requirements the device might need with the parameter ``dev_config``. You should now have obtained the Device's handle which will be used when sending a transaction to it.
+
+- To interact with the Device, fill one or more :cpp:type:`spi_transaction_t` structs with any transaction parameters required. Then send the structs either using a polling transaction or an interrupt transaction:

    - :ref:`Interrupt <interrupt_transactions>`
-        Either queue all transactions by calling ``spi_device_queue_trans``,
-        and at a later time query the result using
-        ``spi_device_get_trans_result``, or handle all requests
-        synchroneously by feeding them into ``spi_device_transmit``.
+        Either queue all transactions by calling the function :cpp:func:`spi_device_queue_trans` and, at a later time, query the result using the function :cpp:func:`spi_device_get_trans_result`, or handle all requests synchronously by feeding them into :cpp:func:`spi_device_transmit`.

    - :ref:`Polling <polling_transactions>`
-        Call the ``spi_device_polling_transmit`` to send polling
-        transactions. Alternatively, you can send a polling transaction by
-        ``spi_device_polling_start`` and ``spi_device_polling_end`` if you
-        want to insert something between them.
+        Call the function :cpp:func:`spi_device_polling_transmit` to send polling transactions. Alternatively, if you want to insert something in between, send the transactions by using :cpp:func:`spi_device_polling_start` and :cpp:func:`spi_device_polling_end`.

- Optional: to do back-to-back transactions to a device, call
-  ``spi_device_acquire_bus`` before and ``spi_device_release_bus`` after the
-  transactions.
+- (Optional) To perform back-to-back transactions with a Device, call the function :cpp:func:`spi_device_acquire_bus` before sending transactions and :cpp:func:`spi_device_release_bus` after the transactions have been sent.

- Optional: to unload the driver for a device, call ``spi_bus_remove_device`` with the device
-  handle as an argument
+- (Optional) To unload the driver for a certain Device, call :cpp:func:`spi_bus_remove_device` with the Device handle as an argument.

- Optional: to remove the driver for a bus, make sure no more drivers are attached and call
-  ``spi_bus_free``.
+- (Optional) To remove the driver for a bus, make sure no more drivers are attached and call :cpp:func:`spi_bus_free`.

-Tips
-""""
+The example code for the SPI master driver can be found in the :example:`peripherals/spi_master` directory of ESP-IDF examples.

-1. Transactions with small amount of data:
-    Sometimes, the amount of data is very small making it less than optimal allocating a separate buffer
-    for it. If the data to be transferred is 32 bits or less, it can be stored in the transaction struct
-    itself. For transmitted data, use the ``tx_data`` member for this and set the ``SPI_TRANS_USE_TXDATA`` flag
-    on the transmission. For received data, use ``rx_data`` and set ``SPI_TRANS_USE_RXDATA``. In both cases, do
-    not touch the ``tx_buffer`` or ``rx_buffer`` members, because they use the same memory locations
-    as ``tx_data`` and ``rx_data``.

-2. Transactions with integers other than uint8_t
-    The SPI peripheral reads and writes the memory byte-by-byte. By default,
-    the SPI works at MSB first mode, each bytes are sent or received from the
-    MSB to the LSB. However, if you want to send data with length which is
-    not multiples of 8 bits, unused bits are sent.
+Transactions with Data Not Exceeding 32 Bits
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-    E.g. you write ``uint8_t data = 0x15`` (00010101B), and set length to
-    only 5 bits, the sent data is ``00010B`` rather than expected ``10101B``.
+When the transaction data size is equal to or less than 32 bits, it will be sub-optimal to allocate a buffer for the data. The data can be directly stored in the transaction struct instead. For transmitted data, it can be achieved by using the :cpp:member:`tx_data` member and setting the :cpp:type:`SPI_TRANS_USE_TXDATA` flag on the transmission. For received data, use :cpp:member:`rx_data` and set :cpp:type:`SPI_TRANS_USE_RXDATA`. In both cases, do not touch the :cpp:member:`tx_buffer` or :cpp:member:`rx_buffer` members, because they use the same memory locations as :cpp:member:`tx_data` and :cpp:member:`rx_data`.

-    Moreover, ESP32 is a little-endian chip whose lowest byte is stored at
-    the very beginning address for uint16_t and uint32_t variables. Hence if
-    a uint16_t is stored in the memory, it's bit 7 is first sent, then bit 6
-    to 0, then comes its bit 15 to bit 8.

-    To send data other than uint8_t arrays, macros ``SPI_SWAP_DATA_TX`` is
-    provided to shift your data to the MSB and swap the MSB to the lowest
-    address; while ``SPI_SWAP_DATA_RX`` can be used to swap received data
-    from the MSB to it's correct place.
+Transactions with Integers Other Than ``uint8_t``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-GPIO matrix and IOMUX
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
+An SPI Host reads and writes data into memory byte by byte. By default, data is sent with the most significant bit (MSB) first, as LSB first is used only in rare cases. If a value of less than 8 bits needs to be sent, the bits should be written into memory in the MSB-first manner.

-Most peripheral signals in ESP32 can connect directly to a specific GPIO, which is called its IOMUX pin. When a
-peripheral signal is routed to a pin other than its IOMUX pin, ESP32 uses the less direct GPIO matrix to make this
-connection.
+For example, if ``0b00010`` needs to be sent, it should be written into a ``uint8_t`` variable, and the length for reading should be set to 5 bits. The Device will still receive 8 bits with 3 additional "random" bits, so the reading must be performed correctly.

-If the driver is configured with all SPI signals set to their specific IOMUX pins (or left unconnected), it will bypass
-the GPIO matrix. If any SPI signal is configured to a pin other than its IOMUx pin, the driver will automatically route
-all the signals via the GPIO Matrix. The GPIO matrix samples all signals at 80MHz and sends them between the GPIO and
-the peripheral.
+On top of that, ESP32 is a little-endian chip, which means that the least significant byte of ``uint16_t`` and ``uint32_t`` variables is stored at the smallest address. Hence, if a ``uint16_t`` value is stored in memory, bits [7:0] are sent first, followed by bits [15:8].

-When the GPIO matrix is used, signals faster than 40MHz cannot propagate and the setup time of MISO is more easily
-violated, since the input delay of MISO signal is increased. The maximum clock frequency with GPIO Matrix is 40MHz
-or less, whereas using all IOMUX pins allows 80MHz.
+When the data to be transmitted is not a ``uint8_t`` array, the following macros can be used to transform the data into a format that the SPI driver can send directly:

-.. note:: More details about influence of input delay on the maximum clock frequency, see :ref:`timing_considerations` below.
+- :c:macro:`SPI_SWAP_DATA_TX` for data to be transmitted
+- :c:macro:`SPI_SWAP_DATA_RX` for data received
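The transformation can be illustrated with a portable sketch. It assumes, based on the description of these macros, that the data is first shifted so that its top bit becomes the MSB of a 32-bit word and that the byte order is then swapped so the most significant byte lands at the lowest address on a little-endian chip. The ``demo_`` helpers are hypothetical and are not the driver's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit byte swap. */
static uint32_t demo_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Prepare the lowest len_bits bits of data for MSB-first transmission:
 * left-align them in the word, then swap the bytes so that the most
 * significant byte sits at the lowest address in little-endian memory. */
uint32_t demo_swap_data_tx(uint32_t data, unsigned len_bits)
{
    assert(len_bits >= 1 && len_bits <= 32);
    return demo_bswap32(data << (32 - len_bits));
}

/* Inverse step for received data: undo the byte swap, then shift the
 * len_bits received bits back down to bit 0. */
uint32_t demo_swap_data_rx(uint32_t data, unsigned len_bits)
{
    assert(len_bits >= 1 && len_bits <= 32);
    return demo_bswap32(data) >> (32 - len_bits);
}
```

For the 5-bit value ``0b10101`` (``0x15``), the transmit helper returns ``0x000000A8``. On a little-endian chip the first byte in memory is then ``0xA8`` (``0b10101000``), so the first five bits on the wire are ``10101``. A 16-bit value ``0x1234`` becomes ``0x00003412``, which places ``0x12`` at the lowest address and sends it first.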

-IOMUX pins for SPI controllers are as below:
+
+.. _mixed_transactions:
+
+Notes on Sending Mixed Transactions to the Same Device
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To reduce coding complexity, send only one type of transaction (interrupt or polling) to one Device. However, you can still send both interrupt and polling transactions alternately. The notes below explain how to do this.
+
+A polling transaction should be initiated only after all other polling and interrupt transactions are finished.
+
+Since an unfinished polling transaction blocks other transactions, please do not forget to call the function :cpp:func:`spi_device_polling_end` after :cpp:func:`spi_device_polling_start` to allow other transactions or to allow other Devices to use the bus. Remember that if there is no need to switch to other tasks during your polling transaction, you can initiate a transaction with :cpp:func:`spi_device_polling_transmit` so that it will be ended automatically.
+
+In-flight polling transactions are disturbed by the ISR operation to accommodate interrupt transactions. Always make sure that all the interrupt transactions sent to the ISR are finished before you call :cpp:func:`spi_device_polling_start`. To do that, you can keep calling :cpp:func:`spi_device_get_trans_result` until all the transactions are returned.
+
+To have better control of the calling sequence of functions, send mixed transactions to the same Device only within a single task.
+
+
+GPIO Matrix and IO_MUX
+----------------------
+
+Most of ESP32's peripheral signals have a direct connection to their dedicated IO_MUX pins. However, the signals can also be routed to any other available pins using the less direct GPIO matrix. If at least one signal is routed through the GPIO matrix, then all signals will be routed through it.
+
+The GPIO matrix introduces flexibility of routing but also brings the following disadvantages:
+
+- Increases the input delay of the MISO signal, which makes MISO setup time violations more likely. If SPI needs to operate at high speeds, use dedicated IO_MUX pins.
+- Allows signals with clock frequencies only up to 40 MHz, as opposed to 80 MHz if IO_MUX pins are used.
+
+.. note::
+
+    For more details about the influence of the MISO input delay on the maximum clock frequency, see :ref:`timing_considerations`.
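The routing rule and its effect on the clock limit can be sketched as follows. The function names and the pin arrays are hypothetical, and the sketch assumes that a signal set to ``-1`` (unused) does not force the GPIO matrix. The 80 MHz and 40 MHz figures are the upper bounds stated above; the usable frequency may be lower because of the MISO input delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PIN_UNUSED (-1)

/* Returns true if every used signal sits on its dedicated IO_MUX pin.
 * pins[i] is the GPIO chosen for signal i, and iomux_pins[i] is the
 * dedicated IO_MUX GPIO for the same signal. */
bool demo_uses_iomux_only(const int *pins, const int *iomux_pins, size_t count)
{
    assert(pins != NULL && iomux_pins != NULL);
    for (size_t i = 0; i < count; i++) {
        if (pins[i] != DEMO_PIN_UNUSED && pins[i] != iomux_pins[i]) {
            /* One mismatch is enough: all signals then go through
             * the GPIO matrix. */
            return false;
        }
    }
    return true;
}

/* Upper bound on the SPI clock frequency for a given pin assignment. */
uint32_t demo_max_clock_hz(const int *pins, const int *iomux_pins, size_t count)
{
    return demo_uses_iomux_only(pins, iomux_pins, count) ? 80000000u
                                                         : 40000000u;
}
```

Moving a single signal to a GPIO other than its dedicated one is therefore enough to lower the upper bound for the whole bus from 80 MHz to 40 MHz.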
+
+The IO_MUX pins for SPI buses are given below.

 +----------+------+------+
-| Pin Name | HSPI | VSPI |
+| Pin Name | SPI2 | SPI3 |
 +          +------+------+
 |          | GPIO Number |
 +==========+======+======+
@ -267,98 +257,64 @@ IOMUX pins for SPI controllers are as below:
 | QUADHD   | 4    | 21   |
 +----------+------+------+

-note * Only the first device attaching to the bus can use CS0 pin.
+* Only the first Device attached to the bus can use the CS0 pin.

-.. _mixed_transactions:
-
-Notes to send mixed transactions to the same device
-"""""""""""""""""""""""""""""""""""""""""""""""""""
-
-Though we suggest to send only one type (interrupt or polling) of
-transactions to one device to reduce coding complexity, it is supported to
-send both interrupt and polling transactions alternately. Notes below is to
-help you do this.
-
-The polling transactions should be started when all the other transactions
-are finished, no matter they are polling or interrupt.
-
-An unfinished polling transaction forbid other transactions from being sent.
-Always call ``spi_device_polling_end`` after ``spi_device_polling_start`` to
-allow other device using the bus, or allow other transactions to be started
-to the same device. You can use ``spi_device_polling_transmit`` to simplify
-this if you don't need to do something during your polling transaction.
-
-An in-flight polling transaction would get disturbed by the ISR operation
-caused by interrupt transactions. Always make sure all the interrupt
-transactions sent to the ISR are finished before you call
-``spi_device_polling_start``. To do that, you can call
-``spi_device_get_trans_result`` until all the transactions are returned.
-
-It is strongly recommended to send mixed transactions to the same device in
-only one task to control the calling sequence of functions.
-
-Speed and Timing Considerations
-------------------------------

 .. _speed_considerations:

-Transferring speed
-^^^^^^^^^^^^^^^^^^
+Transfer Speed Considerations
+-----------------------------

-There're three factors limiting the transferring speed: (1) The transaction interval, (2) The SPI clock frequency used.
-(3) The cache miss of SPI functions including callbacks.
-When large transactions are used, the clock frequency determines the transferring speed; while the interval effects the
-speed a lot if small transactions are used.
+There are three factors limiting the transfer speed:

-    1. Transaction interval: It takes time for the software to setup spi
-       peripheral registers as well as copy data to FIFOs, or setup DMA links.
-       When the interrupt transactions are used, an extra overhead is appended,
-       from the cost of FreeRTOS queues and the time switching between tasks and
-       the ISR.
+- Transaction interval
+- SPI clock frequency
+- Cache miss of SPI functions, including callbacks

-            1. For **interrupt transactions**, the CPU can switched to other
-               tasks when the transaction is in flight. This save the cpu time
-               but increase the interval (See :ref:`interrupt_transactions`).
-               For
-               **polling transactions**, it does not block the task but do
-               polling when the transaction is in flight. (See
-               :ref:`polling_transactions`).
+The main parameter that determines the transfer speed for large transactions is clock frequency. For multiple small transactions, the transfer speed is mostly determined by the length of transaction intervals.

-            2.  When the DMA is enabled, it needs about 2us per transaction to setup the linked list. When the master is
-                transferring, it automatically read data from the linked list. If the DMA is not enabled,
-                CPU has to write/read each byte to/from the FIFO by itself. Usually this is faster than 2us, but the
-                transaction length is limited to 64 bytes for both write and read.

-       Typical transaction interval with one byte data is as below:
+Transaction Interval
+^^^^^^^^^^^^^^^^^^^^

-       +--------+----------------+--------------+
-       |        | Typical Transaction Time (us) |
-       +========+================+==============+
-       |        | Interrupt      | Polling      |
-       +--------+----------------+--------------+
-       | DMA    | 24             | 8            |
-       +--------+----------------+--------------+
-       | No DMA | 22             | 7            |
-       +--------+----------------+--------------+
+Transaction interval is the time that software requires to set up SPI peripheral registers and to copy data to FIFOs, or to set up DMA links.

-    2. SPI clock frequency: Each byte transferred takes 8 times of the clock period *8/fspi*. If the clock frequency is
-       too high, some functions may be limited to use. See :ref:`timing_considerations`.
+When interrupt transactions are used, extra overhead is added by the cost of FreeRTOS queues and the time needed for switching between tasks and the ISR.

-    3. The cache miss: the default config puts only the ISR into the IRAM.
-       Other SPI related functions including the driver itself and the callback
-       may suffer from the cache miss and wait for some time while reading code
-       from the flash. Select :ref:`CONFIG_SPI_MASTER_IN_IRAM` to put the whole
-       SPI driver into IRAM, and put the entire callback(s) and its callee
-       functions into IRAM to prevent this.
+For **interrupt transactions**, the CPU can switch to other tasks while a transaction is in progress. This saves CPU time but increases the interval. See :ref:`interrupt_transactions`. For **polling transactions**, the driver does not block the task but polls until the transaction is finished. For more information, see :ref:`polling_transactions`.

-For an interrupt transaction, the overall cost is *20+8n/Fspi[MHz]* [us] for n bytes tranferred
-in one transaction. Hence the transferring speed is : *n/(20+8n/Fspi)*. Example of transferring speed under 8MHz
-clock speed:
+If DMA is enabled, setting up the linked list requires about 2 us per transaction. When a master is transferring data, it automatically reads the data from the linked list. If DMA is not enabled, the CPU has to write each byte to the FIFO and read each byte from it by itself. Usually, this is faster than 2 us, but the transaction length is limited to 64 bytes for both write and read.
+
+Typical transaction interval timings for one byte of data are given below.
+
++--------+----------------+--------------+
+|        | Typical Transaction Time (us) |
++========+================+==============+
+|        | Interrupt      | Polling      |
++--------+----------------+--------------+
+| DMA    | 24             | 8            |
++--------+----------------+--------------+
+| No DMA | 22             | 7            |
++--------+----------------+--------------+
+
+
+SPI Clock Frequency
+^^^^^^^^^^^^^^^^^^^
+
+Transferring each byte takes eight times the clock period *8/fspi*. If the clock frequency is too high, the use of some functions might be limited. See :ref:`timing_considerations`.
+
+
+Cache Miss
+^^^^^^^^^^
+
+The default config puts only the ISR into the IRAM. Other SPI related functions, including the driver itself and the callback, might suffer from cache misses and have to wait until the code is read from the flash. To prevent cache misses, select :ref:`CONFIG_SPI_MASTER_IN_IRAM` to put the whole SPI driver into IRAM, and also place the entire callback(s) and their callee functions into IRAM.
+
+For an interrupt transaction, the overall cost is *20+8n/Fspi[MHz]* [us] for n bytes transferred in one transaction. Hence, the transfer speed is: *n/(20+8n/Fspi)*. An example of the transfer speed at an 8 MHz clock speed is given in the following table.

 +-----------+----------------------+--------------------+------------+-------------+
 | Frequency | Transaction Interval | Transaction Length | Total Time | Total Speed |
 |           |                      |                    |            |             |
-| (MHz)     | (us)                 | (bytes)            | (us)       | (kBps)      |
+| (MHz)     | (us)                 | (bytes)            | (us)       | (KBps)      |
 +===========+======================+====================+============+=============+
 | 8         | 25                   | 1                  | 26         | 38.5        |
 +-----------+----------------------+--------------------+------------+-------------+
@ -371,129 +327,90 @@ clock speed:
 | 8         | 25                   | 128                | 153        | 836.6       |
 +-----------+----------------------+--------------------+------------+-------------+

-When the length of transaction is short, the cost of transaction interval is really high. Please try to squash data
-into one transaction if possible to get higher transfer speed.
+When a transaction length is short, the cost of transaction interval is high. If possible, try to squash several short transactions into one transaction to achieve a higher transfer speed.
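
The arithmetic behind the table above can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical and not part of the driver, the interval is passed in as a parameter (the table uses 25 us), and 1 KB is taken as 1000 bytes.

```c
/* Hypothetical helpers, not part of the SPI master driver API. */

/* Estimated duration of one transaction in microseconds:
 * the software interval plus 8 clock periods per transferred byte. */
static double spi_transaction_time_us(double interval_us, double freq_mhz,
                                      unsigned len_bytes)
{
    return interval_us + 8.0 * len_bytes / freq_mhz;
}

/* Effective throughput in KBps (assuming 1 KB = 1000 bytes). */
static double spi_throughput_kbps(double interval_us, double freq_mhz,
                                  unsigned len_bytes)
{
    return 1000.0 * len_bytes /
           spi_transaction_time_us(interval_us, freq_mhz, len_bytes);
}
```

With a 25 us interval at 8 MHz, a 1-byte transaction takes 26 us while a 128-byte transaction takes 153 us, so the fixed interval dominates short transactions and is amortized over long ones.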
+
+Please note that the ISR is disabled during flash operation by default. To keep sending transactions during flash operations, enable :ref:`CONFIG_SPI_MASTER_ISR_IN_IRAM` and set :cpp:class:`ESP_INTR_FLAG_IRAM` in the member :cpp:member:`spi_bus_config_t::intr_flags`. In this case, all the transactions queued before starting flash operations will be handled by the ISR in parallel. Also note that the callback of each Device and their callee functions should be in IRAM, or your callback will crash due to cache miss. For more details, see :ref:`iram-safe-interrupt-handlers`.

-BTW, the ISR is disabled during flash operation by default. To keep sending
-transactions during flash operations, enable
-:ref:`CONFIG_SPI_MASTER_ISR_IN_IRAM` and set :cpp:class:`ESP_INTR_FLAG_IRAM`
-in the ``intr_flags`` member of :cpp:class:`spi_bus_config_t`. Then all the
-transactions queued before the flash operations will be handled by the ISR
-continuously during flash operation. Note that the callback of each devices,
-and their callee functions, should be in the IRAM in this case, or your
-callback will crash due to cache miss.

 .. _timing_considerations:

-Timing considerations
-^^^^^^^^^^^^^^^^^^^^^
+Timing Considerations
+---------------------

-As shown in the figure below, there is a delay on the MISO signal after SCLK
-launch edge and before it's latched by the internal register. As a result,
-the MISO pin setup time is the limiting factor for SPI clock speed. When the
-delay is too large, setup slack is < 0 and the setup timing requirement is
-violated, leads to the failure of reading correctly.
+As shown in the figure below, there is a delay on the MISO line after the SCLK launch edge and before the signal is latched by the internal register. As a result, the MISO pin setup time is the limiting factor for the SPI clock speed. When the delay is too long, the setup slack is < 0, and the setup timing requirement is violated, which results in the failure to perform the reading correctly.

 .. image:: /../_static/spi_miso.png
+   :scale: 40 %
+   :align: center

-.. wavedrom don't support rendering pdflatex till now(1.3.1), so we use the png here
+.. wavedrom does not support rendering pdflatex till now(1.3.1), so we use the png here

 .. image:: /../_static/miso_timing_waveform.png

-The maximum frequency allowed is related to the *input delay* (maximum valid
-time after SCLK on the MISO bus), as well as the usage of GPIO matrix. The
-maximum frequency allowed is reduced to about 33~77% (related to existing
-*input delay*) when the GPIO matrix is used. To work at higher frequency, you
-have to use the IOMUX pins or the *dummy bit workaround*. You can get the
-maximum reading frequency of the master by ``spi_get_freq_limit``.
+The maximum allowed frequency is dependent on:
+
+- ``input_delay_ns`` - maximum data valid time on the MISO bus after a clock cycle on SCLK starts
+- Whether the IO_MUX pins or the GPIO matrix is used
+
+When the GPIO matrix is used, the maximum allowed frequency is reduced to about 33~77% of the value achievable with IO_MUX pins, depending on the existing *input delay*. To retain a higher frequency, you have to use the IO_MUX pins or the *dummy bit workaround*. You can obtain the maximum reading frequency of the master by using the function :cpp:func:`spi_get_freq_limit`.

 .. _dummy_bit_workaround:

-**Dummy bit workaround:** We can insert dummy clocks (during which the host does not read data) before the read phase
-actually begins. The slave still sees the dummy clocks and gives out data, but the host does not read until the read
-phase. This compensates the lack of setup time of MISO required by the host, allowing the host reading at higher
-frequency.
+**Dummy bit workaround**: Dummy clocks, during which the Host does not read data, can be inserted before the read phase begins. The Device still sees the dummy clocks and sends out data, but the Host does not read until the read phase comes. This compensates for the lack of the MISO setup time required by the Host and allows the Host to do reading at a higher frequency.

-In the ideal case (the slave is so fast that the input delay is shorter than an apb clock, 12.5ns), the maximum
-frequency host can read (or read and write) under different conditions is as below:
+In the ideal case, if the Device is so fast that the input delay is shorter than one APB clock cycle (12.5 ns), the maximum frequency at which the Host can read (or read and write) under different conditions is as follows:

 +-------------+-------------+------------+-----------------------------+
 | Frequency Limit (MHz)     | Dummy Bits | Comments                    |
 +-------------+-------------+ Used       +                             +
-| GPIO matrix | IOMUX pins  | By Driver  |                             |
+| GPIO matrix | IO_MUX pins | By Driver  |                             |
 +=============+=============+============+=============================+
 | 26.6        | 80          | No         |                             |
 +-------------+-------------+------------+-----------------------------+
-| 40          | --          | Yes        | Half Duplex, no DMA allowed |
+| 40          | --          | Yes        | Half-duplex, no DMA allowed |
 +-------------+-------------+------------+-----------------------------+

-And if the host only writes, the *dummy bit workaround* is not used and the frequency limit is as below:
+If the Host only writes data, the *dummy bit workaround* and the frequency check can be disabled by setting the bit ``SPI_DEVICE_NO_DUMMY`` in the member :cpp:member:`spi_device_interface_config_t::flags`. When they are disabled, the output frequency can be 80 MHz, even if the GPIO matrix is used.

-+-------------------+------------------+
-| GPIO matrix (MHz) | IOMUX pins (MHz) |
-+===================+==================+
-| 40                | 80               |
-+-------------------+------------------+

-The spi master driver can work even if the *input delay* in the ``spi_device_interface_config_t`` is set to 0.
-However, setting a accurate value helps to: (1) calculate the frequency limit in full duplex mode, and (2) compensate
-the timing correctly by dummy bits in half duplex mode. You may find the maximum data valid time after the launch edge
-of SPI clocks in the AC characteristics chapter of the device specifications, or measure the time on a oscilloscope or
-logic analyzer.
+The SPI master driver can work even if the :cpp:member:`input_delay_ns` in the structure :cpp:type:`spi_device_interface_config_t` is set to 0. However, setting an accurate value helps to:

-.. wavedrom don't support rendering pdflatex till now(1.3.1), so we use the png here
+- Calculate the frequency limit for full-duplex transactions
+- Compensate the timing correctly with dummy bits for half-duplex transactions

-.. image:: /../_static/miso_timing_waveform_async.png
+You can approximate the maximum data valid time after the launch edge of SPI clocks by checking the statistics in the AC characteristics chapter of your Device's specification, or by measuring the time with an oscilloscope or logic analyzer.

-As shown in the figure above, the input delay is usually:
+Please note that the actual PCB layout and excessive loads may increase the input delay. This means that non-optimal wiring and/or a load capacitor on the bus will most likely lead to input delay values exceeding the values given in the Device specification or measured while the bus is floating.

-    *[input delay] = [sample delay] + [slave output delay]*
+Some typical delay values are shown in the following table.

-    1. The sample delay is the maximum random delay due to the
-       asynchronization of SCLK and peripheral clock of the slave. It's usually
-       1 slave peripheral clock if the clock is asynchronize with SCLK, or 0 if
-       the slave just use the SCLK to latch the SCLK and launch MISO data. e.g.
-       for ESP32 slaves, the delay is 12.5ns (1 apb clock), while it is reduced
-       to 0 if the slave is in the same chip as the master.
++----------------------------------------+------------------+
+| Device                                 | Input delay (ns) |
++========================================+==================+
+| Ideal Device                           |      0           |
++----------------------------------------+------------------+
+| ESP32 slave using IO_MUX*              |      50          |
++----------------------------------------+------------------+
+| ESP32 slave using GPIO matrix*         |      75          |
++----------------------------------------+------------------+
+| ESP32's slave device is on a different physical chip.     |
++-----------------------------------------------------------+

-    2. The slave output delay is the time for the MOSI to be stable after the
-       launch edge. e.g. for ESP32 slaves, the output delay is 37.5ns (3 apb
-       clocks) when IOMUX pins in the slave is used, or 62.5ns (5 apb clocks) if
-       through the GPIO matrix.
+The MISO path delay (valid time) consists of the slave's *input delay* plus the master's *GPIO matrix delay*. This delay determines the frequency limit above which full-duplex transfers will not work, as well as the number of dummy bits used in half-duplex transactions. The frequency limit is:

-Some typical delays are shown in the following table:
+    *Freq limit [MHz] = 80 / (floor(MISO delay[ns]/12.5) + 1)*
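
As a rough illustration of the formula above, the following C sketch computes the frequency limit from a MISO path delay. The helper names are hypothetical and not part of the driver (the driver's own query is :cpp:func:`spi_get_freq_limit`). The sketch assumes a non-negative delay and a 12.5 ns APB clock period, and it adds two APB cycles when the master routes signals through the GPIO matrix.

```c
/* Hypothetical helpers, not part of the SPI master driver API. */

#define APB_PERIOD_NS 12.5 /* one APB clock cycle at 80 MHz */

/* Frequency limit in MHz for a given MISO path delay (must be >= 0 ns). */
static double spi_freq_limit_mhz(double miso_delay_ns)
{
    /* For non-negative values, the integer cast is equivalent to floor(). */
    int apb_cycles = (int)(miso_delay_ns / APB_PERIOD_NS);
    return 80.0 / (apb_cycles + 1);
}

/* MISO path delay: the Device's input delay plus two APB cycles
 * if the master uses the GPIO matrix. */
static double spi_miso_path_delay_ns(double input_delay_ns,
                                     int master_uses_gpio_matrix)
{
    return input_delay_ns +
           (master_uses_gpio_matrix ? 2.0 * APB_PERIOD_NS : 0.0);
}
```

For example, an input delay of 50 ns with IO_MUX pins gives *80 / (4 + 1) = 16 MHz*, while an ideal Device behind the GPIO matrix (25 ns path delay) gives *80 / (2 + 1)*, or about 26.6 MHz.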

-+--------------------+------------------+
-| Device             | Input delay (ns) |
-+====================+==================+
-| Ideal device       |      0           |
-+--------------------+------------------+
-| ESP32 slave IOMUX* |      50          |
-+--------------------+------------------+
-| ESP32 slave GPIO*  |      75          |
-+--------------------+------------------+
-| ESP32 slave is on an independent      |
-| chip, 12.5ns sample delay included.   |
-+---------------------------------------+
-
-The MISO path delay(tv), consists of slave *input delay* and master *GPIO matrix delay*, finally determines the
-frequency limit, above which the full duplex mode will not work, or dummy bits are used in the half duplex mode. The
-frequency limit is:
-
-    *Freq limit[MHz] = 80 / (floor(MISO delay[ns]/12.5) + 1)*
-
-The figure below shows the relations of frequency limit against the input delay. 2 extra apb clocks should be counted
-into the MISO delay if the GPIO matrix in the master is used.
+The figure below shows the relationship between frequency limit and input delay. Two extra APB clock cycle periods should be added to the MISO delay if the master uses the GPIO matrix.

 .. image:: /../_static/spi_master_freq_tv.png

-Corresponding frequency limit for different devices with different *input delay* are shown in the following
-table:
+Corresponding frequency limits for different Devices with different *input delay* times are shown in the table below.

 +--------+------------------+----------------------+-------------------+
 | Master | Input delay (ns) | MISO path delay (ns) | Freq. limit (MHz) |
 +========+==================+======================+===================+
-| IOMUX  | 0                | 0                    | 80                |
+| IO_MUX | 0                | 0                    | 80                |
 + (0ns)  +------------------+----------------------+-------------------+
 |        | 50               | 50                   | 16                |
 +        +------------------+----------------------+-------------------+
@ -512,27 +429,27 @@ table:
 Known Issues
 ------------

-1. Half duplex mode is not compatible with DMA when both writing and reading phases exist.
+1. Half-duplex transactions are not compatible with DMA when both writing and reading phases are used.

   If such transactions are required, you have to use one of the alternative solutions:

-   1. use full-duplex mode instead.
-   2. disable the DMA by setting the last parameter to 0 in bus initialization function just as below:
+   1. Use full-duplex transactions instead.
+   2. Disable DMA by setting the bus initialization function's last parameter to 0 as follows:
      ``ret=spi_bus_initialize(VSPI_HOST, &buscfg, 0);``

-      this may prohibit you from transmitting and receiving data longer than 64 bytes.
-   3. try to use command and address field to replace the write phase.
+      This can prohibit you from transmitting and receiving data longer than 64 bytes.
+   3. Try using the command and address fields to replace the write phase.

-2. Full duplex mode is not compatible with the *dummy bit workaround*, hence the frequency is limited. See :ref:`dummy
+2. Full-duplex transactions are not compatible with the *dummy bit workaround*, hence the frequency is limited. See :ref:`dummy
   bit speed-up workaround <dummy_bit_workaround>`.

-3. ``cs_ena_pretrans`` is not compatible with command, address phases in full duplex mode.
+3. ``cs_ena_pretrans`` is not compatible with the command and address phases of full-duplex transactions.


 Application Example
 -------------------

-Display graphics on the 320x240 LCD of WROVER-Kits: :example:`peripherals/spi_master`.
+The code example for displaying graphics on an ESP32-WROVER-KIT's 320x240 LCD screen can be found in the :example:`peripherals/spi_master` directory of ESP-IDF examples.


 API Reference - SPI Common