An Asynchronous Interface with Robust Control for Globally-Asynchronous Locally-Synchronous Systems

ABSTRACT: Contemporary digital systems must be based on the “System-on-Chip” (SoC) concept. In the aerospace industry in particular, these systems must overcome additional engineering challenges concerning reliability, safety and low power. An interesting style for aerospace SoC design is the GALS (Globally Asynchronous, Locally Synchronous) paradigm, which can be used for Very Large Scale Integration – Deep-Sub-Micron (VLSI_DSM) design. Currently, the major drawback in the design of a GALS system is the asynchronous interface (asynchronous wrapper – AW) when it is implemented in VLSI_DSM. A typical AW design style is based on asynchronous controllers (called ports) that provide communication between modules, but the port controllers are generally subject to essential hazard, which decreases the reliability and safety of the full system. To address this drawback, this paper proposes an AW with a robust port controller that is free of essential hazard and allows full autonomy for the locally synchronous modules, making the system as fault tolerant as possible. It follows the Delay Insensitive (DI) model, interacting with the environment in the Generalized Fundamental Mode (GFM) without the need to insert any delay elements. Additional delay elements, although proposed in some previous work, are not desirable in aerospace applications. The proposed interface can work in Ib/Ob mode, showing that the DI model is more robust than the QDI model; therefore, it needs neither to meet isochronic fork requirements nor timing analysis. Since an interface with similar properties was not found in the literature, the proposed architecture has great potential for implementation in practical VLSI_DSM designs, including aerospace ones, since it overcomes the main engineering challenges of this industry.


INTRODUCTION
Contemporary digital systems are usually implemented in Very Large Scale Integration (VLSI) and must be based on the "System-on-Chip" (SoC) concept. The reason is the ever-growing demand for higher performance, reusability and low power (De Micheli, 2009; Muller-Glaser et al., 2004). In the aerospace industry in particular, these systems must overcome additional engineering challenges concerning reliability, safety, high complexity and the unavailability of component failure data, and must be as fault tolerant as possible (Sues et al., 2005; Bertuccelli, 2008). SoC circuits are composed of functional modules, which can be intellectual property cores (IP-cores) from many different vendors. These IP-cores are pre-designed, verified, tested and optimized for high performance, reducing both cost and development time. When SoC circuits are implemented in deep-sub-micron (DSM) technologies (VLSI_DSM) (for example, 70 nm, 500M transistors per chip and f = 2.5 GHz), wire delays become large compared to gate delays, and the difference between the minimum and maximum gate delays is significant (Jain et al., 2001; Martin et al., 2006). Therefore, when SoC circuits are implemented using only a global clock signal, they are subject to speed and power penalties (clock skew, distribution networks etc.), which makes timing analysis very complex (Friedman, 2001). Besides that, the harsh environment found in aerospace applications, with large temperature variations, can make this timing analysis even more difficult.
Oliveira, D.L., Lussari, E., Sato, S.S. and Faria, L.A.

Asynchronous design methodologies (Martin et al., 2006; Myers, 2004) can naturally eliminate such challenges by removing the clock signal from the design. Different classes of asynchronous circuits may be used to implement SoCs, which can be built from completely asynchronous modules, but these kinds of circuits are not a widely accepted solution. The main reasons for this are: a) lack of reliable tools for asynchronous design; b) difficulties in hazard-free design and testing; c) limited culture of asynchronous design; and d) lack of asynchronous IPs (Hardt et al., 2000).
The aerospace industry imposes many additional challenges on the design of dedicated systems, such as the high complexity of systems; main power generation systems; mission profiles and environment; high demand for new technologies; high reliability and safety requirements; unavailability of component failure data; component sizes; and especially tight schedules, which leave no room for errors. Any problem in an aerospace system can lead to large losses of aircraft (or spacecraft), crews, missions and revenues. In this context, reliability and robustness are important, leading to lower maintenance cost and lower failure frequency. The objective is always to maximize system performance while satisfying constraints that ensure reliable operation (Sues et al., 2005; Bertuccelli, 2008).
Given this situation and the features of both synchronous and asynchronous systems, intermediate solutions between "totally synchronous" and "totally asynchronous" were proposed, such as the Globally Asynchronous, Locally Synchronous (GALS) methodology. The term GALS was first used by Chapiro (1984) in his PhD thesis. A GALS system consists of many synchronous functional modules that communicate asynchronously. In this paper, we refer to GALS systems as digital systems partitioned into functional modules (which may be IPs), each carrying its own clock signal, with no relation between the clocks of different modules. An asynchronous communication scheme is provided for the communication between modules in different clock domains. In order to handle the asynchronous communication between these modules, an interface circuit, called an asynchronous wrapper (AW), has to be added around each synchronous module. The term AW was first used by Bormann et al. (1997). This local interface may be built using local clocks, FIFOs, asynchronous controllers (input ports, output ports) etc. Techan et al. (2007) show different styles of asynchronous interfaces dedicated to GALS systems. Figure 1 shows a generic interface with a synchronous module as an example.
GALS systems have been successfully used in many implementations, including Application Specific Integrated Circuits (ASIC) (Gurkaynak et al., 2006; Amini et al., 2006; Miller et al., 2005) and Field Programmable Gate Arrays (FPGA) (Jia et al., 2005; Kumala et al., 2006; Yuan et al., 2005). FPGA devices have become a common choice for implementing digital circuits (Muller-Glaser, 2004), and their use has grown considerably in recent years. High-performance FPGAs with up to 50 million gates can be easily found nowadays, allowing complex digital systems, such as GALS, to be programmed on them (De Micheli, 2009) and to be implemented in DSM CMOS technology.
Asynchronous interfaces that use communication ports are of main interest, since they remove the asynchronous handshake scheme from the synchronous modules, allowing each synchronous module to be developed using standard synchronous design techniques. Although the GALS methodology solves the problems related to the global clock signal, the communication between modules is still performed in the asynchronous paradigm and is therefore subject to all its inherent problems.

IMPLEMENTATIONS OF PORTS: DIFFERENT APPROACHES
Different kinds of ports have been synthesized in the logic synthesis style (Myers, 2004). As an example, the ports proposed by Amini et al. (2006) were specified as Signal Transition Graphs (STG), a Petri-net-like specification (Chu, 1987), and synthesized with the Petrify tool (Cortadella et al., 1997). These ports must meet the isochronic fork requirement (Myers, 2004), but satisfying this requirement in VLSI_DSM is very difficult. Furthermore, the STG specification, as well as its synthesis method, is not familiar to designers used to the synchronous paradigm. The ports proposed by Muttersbach et al. (2000), Muttersbach (2001), Reddy Ravi (2001) and Pontes et al. (2007) were specified in Extended Burst-Mode (XBM) and Burst-Mode (BM). These ports were implemented, respectively, with the 3D (Yun et al., 1999) and Minimalist (Fuhrer et al., 1999) tools. They interact with the environment in the generalized fundamental mode (GFM), which requires timing analysis and leaves them subject to essential hazard, especially in DSM technology. The insertion of delay elements is a possible solution to this last drawback in VLSI_DSM, but it degrades the testability and cycle time of the system. The insertion of delay elements is also inadequate when implementing GALS in FPGAs, because these devices are not designed to favor it.

AVOIDING ESSENTIAL HAZARD IN PORT CONTROLLERS: INCREASING THE SYSTEM'S RELIABILITY
The XBM specification is well suited to describing port controllers, not only because it is "familiar" to designers used to the synchronous paradigm, but also because the method that synthesizes ports described in XBM is simpler than synthesis from STG (Myers, 2004). Yun et al. (1999) and Nowick (1993) proposed the insertion of delay elements on the feedback wires in order to avoid essential hazard in burst-mode controllers. Oliveira et al. (2008) proposed a sufficient condition that guarantees essential-hazard-free operation of burst-mode controllers without the need for extra delay elements, when mapped on VLSI_DSM or any type of LUT-based FPGA. The absence of delay elements is highly desirable in FPGA devices (where such elements are difficult to implement) and, furthermore, in aerospace applications, in which the harsh environment may change the behavior of electronic components.
This paper proposes robust port controllers for asynchronous interfaces used in the GALS style. They are completely free of essential hazard and are described in the XBM specification. The robust controller design for asynchronous interfaces is proposed as a solution to the increasing demand for high-reliability aerospace electronic systems. The paper also shows that the method proposed by Oliveira et al. (2011) to synthesize BM controllers free of essential hazard is extended to XBM controllers. The proposed ports are implemented in the following architectures: "Huffman machine with feedback output" and "standard RS". The use of both architectures enables better system performance, besides being more reliable and providing safer operation for aerospace applications. A new AW for GALS with robust ports is also proposed. Since a major drawback in the design of asynchronous wrappers is the synthesis of these ports, the robustness of the proposed AW is an important contribution. The ports are easily implemented both in VLSI_DSM and in LUT-based FPGAs. Other advantages of this wrapper are: 1) total autonomy for the locally synchronous modules when interacting with the proposed AW; and 2) its ports interact with the environment in Ib/Ob mode, thus not requiring timing analysis and being more robust than in GFM. In this mode, a new input burst is accepted as soon as all signals of the output burst have changed their values. These features make the proposed architecture a good option for aerospace implementations, since it increases the reliability of the full system and overcomes some of the main challenges of this industry.

DIFFERENT STYLES OF GALS DESIGN
Since the synchronous modules of a GALS system operate at different frequencies and/or different phases, the communication between them is subject to metastability (Ginosar, 2003). Metastability occurs when a signal violates the setup time or the hold time of a memory element: for some time, the output voltage assumes an intermediate value, after which the circuit settles to a random logic value. Metastability may occur in a timing window defined by the sum of the "setup" and "hold" times. The GALS design style is therefore determined by how metastability is treated, and different treatments are found in the literature. Techan et al. (2007) propose a taxonomy that classifies these styles into three main groups: a) weak synchronous interface; b) pausible clock interface; and c) asynchronous interface.
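The timing window mentioned above can be made concrete with a minimal sketch. It assumes an idealized memory element whose critical window starts one setup time before the sampling clock edge and ends one hold time after it; the function names and the picosecond values are illustrative and do not come from the paper.

```python
def window_width_ps(setup_ps, hold_ps):
    """Width of the critical window: setup time plus hold time."""
    return setup_ps + hold_ps


def falls_in_window(data_edge_ps, clock_edge_ps, setup_ps, hold_ps):
    """True if a data transition lands inside the setup/hold window."""
    window_start = clock_edge_ps - setup_ps  # setup time precedes the clock edge
    window_end = clock_edge_ps + hold_ps     # hold time follows the clock edge
    return window_start < data_edge_ps < window_end
```

A data edge for which `falls_in_window` returns True is one that may drive the memory element into metastability; the sketch only detects the violation and does not model how long the element takes to resolve.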

WEAK SYNCHRONOUS INTERFACE
This style has three variants: a) heterochronous; b) mesochronous; and c) plesiochronous. In the heterochronous style, the clocks of the synchronous modules run at different nominal frequencies (Techan et al., 2007). In the mesochronous style (from Greek, meso means average), the clocks are generated by the same oscillator and show the same average frequency, but with different, unknown phases (Techan et al., 2007). Finally, in the plesiochronous style (from Greek, plesio means "almost equal"), the clocks operate at the same nominal frequency but are generated by different oscillators (Techan et al., 2007). These styles start from knowledge of the clocks and are based on FIFOs, phase adjusters and, sometimes, synchronizers. Their advantage is that they enable low latency and high clock frequencies; their drawback is that they always require a rigorous timing analysis. Figure 2 shows a mesochronous interface that uses a phase adjuster (timing recovery circuit, TRC).
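The three variants can be summarized as a small decision rule. The sketch below is only an illustration of the taxonomy as described above; the function name and its parameters (two nominal frequencies and a flag saying whether both clocks come from one oscillator) are assumptions made for the example.

```python
def classify_weak_synchronous(freq_a_hz, freq_b_hz, same_oscillator):
    """Classify a pair of clock domains by nominal frequency and clock source."""
    if freq_a_hz != freq_b_hz:
        # Different nominal frequencies
        return "heterochronous"
    if same_oscillator:
        # Same frequency from one oscillator, unknown phase difference
        return "mesochronous"
    # Same nominal frequency, but independent oscillators
    return "plesiochronous"
```

For example, two 100 MHz domains fed by separate crystals would be classified as plesiochronous, since their actual frequencies may drift slightly apart.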

PAUSIBLE CLOCK INTERFACE
This style, first proposed by Chapiro (1984), tackles the problem of metastability by interrupting the clock signal. When data are ready for transmission, the clock is interrupted, enabling data synchronization. The synchronous modules have pausible clock signals. Most often, these clocks are locally generated using a ring oscillator and a mutual-exclusion circuit, or arbiter, which properly generates the pause and restart of the clock (Yun et al., 1999). The potential advantages of this style are robustness in the treatment of metastability and power reduction. Its weakness is the possibility of "deadlock" and "jitter" (Mullins et al., 2007). Different architectures have been proposed for pausible clocks, for example, one involving a FIFO (Techan et al., 2007). Figure 3 shows an architecture involving a pausible clock as an example.

ASYNCHRONOUS INTERFACE
This style uses circuits known as synchronizers, together with handshaking signals. The synchronous modules have clocks running freely at different frequencies, without any prior knowledge about their timing. Data are synchronized from one clock domain to another. Some examples of data synchronizers are the well-known "two registers", or "double latches" (Mullins et al., 2007), and more elaborate synchronization schemes, such as the "synchronization pipeline" (Sjogren et al., 2000) and "FIFOs" (Dobkin et al., 2006). These synchronizers do not totally eliminate failures due to metastability, since a nonzero probability of failure remains (Dobkin et al., 2006). The "two register" synchronizer has the advantages of simplicity and robustness; its disadvantages are increased area and power and, especially, a high latency penalty of two clock cycles. Figure 4 shows the architecture of an asynchronous interface as an example.
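The two-cycle latency of the "two register" synchronizer can be seen in a purely logical sketch. The class and function names below are illustrative assumptions; the model treats each register as an ideal flip-flop and deliberately ignores metastability itself, so it shows only the latency cost, not the failure probability.

```python
class TwoRegisterSynchronizer:
    """Two flip-flops in series, both clocked by the receiving domain."""

    def __init__(self):
        self.ff1 = 0  # first stage, samples the asynchronous input
        self.ff2 = 0  # second stage, output seen by the synchronous module

    def clock_edge(self, async_in):
        # Both registers sample on the same edge: ff2 takes the old ff1.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


def edges_until_visible(sync, value=1, max_edges=10):
    """Count receiving-clock edges until `value` appears at the output."""
    for n in range(1, max_edges + 1):
        if sync.clock_edge(value) == value:
            return n
    return None
```

Starting from a cleared synchronizer, `edges_until_visible(TwoRegisterSynchronizer())` counts two edges before the new value reaches the output, which is the two-cycle penalty described above.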

COMMUNICATION CONTROLLERS (PORTS)
GALS systems require asynchronous communication links, which can use two kinds of communication protocols: two-phase or four-phase handshaking. The ports can work as active (generating the "request" signal) or passive (generating the "acknowledge" signal). In GALS design there are two types of communication controllers: a) "demand" ports; and b) "poll" (inquiry) ports. In a demand port, the data being transferred are required immediately after the communication. Therefore, in this type of controller the clock must be stopped (paused) immediately and reactivated (restarted) when the communication is done. In a poll port, the clock is not stopped immediately. The port defines when it is "safe" to send the data, and the clock is stopped (paused) only when additional time is needed to resolve possible metastability.
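As an illustration of the two handshaking protocols, the sketch below lists the signal events of one transfer between an active port (driving `req`) and a passive port (driving `ack`). It follows the usual textbook definitions of four-phase (return-to-zero) and two-phase (transition) signalling; the function names are assumptions, and the sketch is sequential, so it does not model concurrency or delays.

```python
def four_phase_transfer(data):
    """Return-to-zero handshake: req+, ack+, req-, ack- for one data item."""
    trace = []
    trace.append(("req", 1))  # active port: data valid, request raised
    received = data           # passive port captures the data
    trace.append(("ack", 1))  # passive port acknowledges
    trace.append(("req", 0))  # active port releases the request
    trace.append(("ack", 0))  # passive port releases the acknowledge
    return received, trace


def two_phase_transfer(data, req, ack):
    """Transition signalling: one toggle on req and one on ack per data item."""
    req ^= 1                  # any transition on req is a request
    received = data           # passive port captures the data
    ack ^= 1                  # any transition on ack is an acknowledge
    return received, req, ack
```

The four-phase version spends two extra events returning both wires to zero, while the two-phase version leaves the wires at their new levels and uses the next toggle for the next transfer.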

XBM-EHF SPECIFICATION: CONDITION
BM is a specification based on a state transition graph. It was first proposed by Davis et al. (1979), later formalized by Nowick (1993), and extended by Yun et al. (1999) as XBM. It allows multiple input changes and is usually used to describe Mealy Asynchronous Finite State Machines (FSM). These machines interact with the environment in GFM. In GFM, a new input burst can only occur if the controller is stable (with no activity in the ports or in the lines). The XBM specification supports the BM specification and introduces two kinds of input signals: a) conditional signals, which are level sensitive and show non-monotonic behavior; and b) "directed don't care signals", which can be activated concurrently with the output signals.
In this paper, the XBM specification is illustrated with the HP benchmark Biufifo2dma (see Fig. 5), with four inputs (cntgt1, dackn, fain, ok), two outputs (dreq, frout) and initial state 0. The description fain- dackn+ / frout+ in transition 4→3 means that the output (frout: 0→1) will follow the input burst (fain: 1→0 AND dackn: 0→1). Signals not enclosed in angle brackets and ending with + or - are "terminating". Signals enclosed in angle brackets are "conditionals", which are level sensitive with non-monotonic behavior. The input signals dackn, fain and ok are transition sensitive signals (TSS). The level sensitive signal cntgt1 is used to describe the mutual exclusion between transitions 2→5 and 2→4. The "directed don't care signal" fain* in transition 2→4 means that fain may either change its value or remain at its previous value. Every state transition should have at least one "compulsory" signal. A compulsory signal is an input signal that, in the previous state transition, is not a directed don't care.
A TSS input signal in an XBM specification is considered a context signal in a transition A→B if it does not change its value during that transition (it is not on the label). On the other hand, it is considered a trigger signal if it appears on the label of that transition. The input burst of each state transition can be represented by an input transition cube (ITC). For example, the ITC of state transition 0→1 in Fig. 5 is cntgt1, dackn, fain, ok = 2102 (the number 2 means "don't care"). In this example, ok is a trigger signal, while dackn and fain are context signals (whose values are 1 and 0, respectively).
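The ITC encoding of the example can be reproduced with a minimal sketch. It assumes, consistently with the value 2102 given above, that context signals keep their fixed value while every other input (trigger signals and level signals that are not sampled in the transition) is written as 2; the function name and the dictionary representation are illustrative choices, not part of the paper.

```python
# Input order taken from the Biufifo2dma example in the text.
INPUT_ORDER = ["cntgt1", "dackn", "fain", "ok"]


def input_transition_cube(context_values, order=INPUT_ORDER):
    """Build the ITC string: context signals keep their value, others are '2'."""
    return "".join(
        str(context_values[name]) if name in context_values else "2"
        for name in order
    )


# Transition 0->1: dackn = 1 and fain = 0 are context signals, ok is the trigger.
itc_0_to_1 = input_transition_cube({"dackn": 1, "fain": 0})
```

With the context values of transition 0→1, the function yields the string 2102 quoted in the text.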
Definition 1.1: Let A and B be a pair of total states in an XBM specification, and Ib/Ob be the input/output burst for the A→B transition. Let Es be a "terminating" input (Es ∈ Ib). Es is considered an essential signal if it is a context signal on all transitions that lead to state A and is a trigger signal on the transition A→B.
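Definition 1.1 can be checked mechanically on a small specification. The sketch below is a simplified reading of the definition: each transition is reduced to the set of signal names on its input label (polarities, conditionals and directed don't cares are ignored), and a specification is accepted when every transition has at least one essential signal. The `Transition` type, the function names and both toy specifications are hypothetical and are not taken from the benchmarks in the paper.

```python
from collections import namedtuple

# src/dst are state names; triggers is the set of input signals on the label.
Transition = namedtuple("Transition", "src dst triggers")


def essential_signals(spec, src, dst):
    """Triggers of src->dst that are context on every transition entering src."""
    target = next(t for t in spec if t.src == src and t.dst == dst)
    incoming = [t for t in spec if t.dst == src]
    return {
        s for s in target.triggers
        if all(s not in t.triggers for t in incoming)
    }


def has_essential_signal_everywhere(spec):
    """True if every transition has at least one essential signal."""
    return all(essential_signals(spec, t.src, t.dst) for t in spec)


# Hypothetical four-state cycle: a+, b+, a-, b-
spec_ok = [
    Transition(0, 1, {"a"}), Transition(1, 2, {"b"}),
    Transition(2, 3, {"a"}), Transition(3, 0, {"b"}),
]
# Hypothetical two-state cycle: a+, a- (a is a trigger on every transition)
spec_bad = [Transition(0, 1, {"a"}), Transition(1, 0, {"a"})]
```

In `spec_ok` each trigger was a context signal on the transition entering its source state, so every transition has an essential signal; in `spec_bad` the only signal is a trigger everywhere and no essential signal exists.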
For instance (see Fig. ...). As an example, Fig. 6 shows the HP-mp-for-pkt benchmark described by a BM specification. On all transition labels there is at least one essential signal. Therefore, it is a BM-EHF specification. Figure 7 shows the state flow map of HP-mp-for-pkt. As shown by Oliveira et al. (2008), who apply the generalized Unger rule to check for essential hazard, this state flow map is subject to essential hazard. The essential hazard depends on the values assigned to the don't-care entries when the logic-hazard-free cover is obtained.

ESSENTIAL CUBE CONDITION
Lemma 1.1 is a necessary and sufficient condition for an essential-hazard-free specification, but not for a hazard-free implementation. The super-state concept guarantees the latter. The concept of super-state was presented by Oliveira et al. (2008) and is used to obtain an EHF implementation. To simplify the EHF implementation, in this article we generalize the concept of super-state by introducing the idea of the essential cube.
Definition 1.2: Consider an input burst Ib (a, b, ..., n) and an output burst Ob (x, y, ..., m). We call a super-state the set of single total states defined by all 0/1 combinations of a subset S_Ib of the input burst signals, keeping all the remaining input signals and all the output signals constant.
Definition 1.3: Consider an XBM-EHF specification and a super-state F of the state transition T, so that F ∈ XBM-EHF, where T is labeled by Ib/Ob. We call the essential cube of transition T the set of all total states given by the 0/1 combinations of the input burst and output burst signals (Ib/Ob). The states that are not reachable in the cube of T are encoded with the values of the context signals, while the trigger signals are don't-care.
A super-state XBM flow map is derived from an XBM-EHF specification by applying Definition 1.2 to all total states. The essential cube is composed of 2^N states, in which N is the total number of input signals plus output signals that are labeled in a state transition. Figures 8a-d are part of the flow map for the BM specification described in Fig. 6. Cells in blue are used to compose super-states and essential cubes (applying Definition 1.3). For example, the 0→1 transition (see Fig. 8a,b and 9a,b) creates super-state 1, composed of two total states: AckPB Req Ackout Allockoutbound Ack AllocPB RTS = [0010001, 0000001]. State 0010001 is the final total state. Figure 9b shows the essential cube of the state transition 0→1, in which the next total states in blue are not reachable and belong to the essential cube. Due to the delays of gates and wires, total states that are nominally not reachable can become reachable. Similarly, the 1→2 transition (see Fig. 8c,d and 9c,d) creates super-state 2, composed of four total states: AckPB Req Ackout Allockoutbound Ack AllocPB RTS = [1000010, 1100010, 0100010, 0000010]. State 0100010 is the final total state. Figures 9b and 9d show, respectively, the essential cubes of the transitions 0→1 and 1→2. Lemma 1.2 and Theorem 1.1 show the robustness of our controllers.
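The two super-states listed above can be reproduced by enumerating the 0/1 combinations of the varying signals while holding the others at their final values. In this sketch the signal order is the one given in the text; the function names are illustrative, and the choice of which signals vary in each transition (Ackout for 0→1, AckPB and Req for 1→2) is inferred from the listed bit patterns rather than stated explicitly in the paper.

```python
from itertools import product

# Signal order as printed in the text.
SIGNALS = ["AckPB", "Req", "Ackout", "Allockoutbound", "Ack", "AllocPB", "RTS"]


def super_state(final_state, varying):
    """All total states from 0/1 combinations of `varying`, others held constant."""
    base = dict(zip(SIGNALS, final_state))
    states = set()
    for bits in product("01", repeat=len(varying)):
        state = dict(base)
        state.update(zip(varying, bits))
        states.add("".join(state[name] for name in SIGNALS))
    return states


def essential_cube_size(n_labeled_signals):
    """Number of total states in an essential cube: 2^N."""
    return 2 ** n_labeled_signals


super_state_1 = super_state("0010001", ["Ackout"])        # transition 0->1
super_state_2 = super_state("0100010", ["AckPB", "Req"])  # transition 1->2
```

The enumeration gives two total states for super-state 1 and four for super-state 2, matching the lists in the paragraph above.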
Lemma 1.2: Let T (B→A) be a state transition of an XBM-EHF specification labeled by Ib/Ob, and let Is ∈ Ib be any input signal and Os ∈ Ob be any output signal. If T is described by an essential cube, then, whatever the order and the arrival times of the activations of Is and Os, the total states generated belong to the essential cube of T, and they all lead to the final total state A.
Proof: As an essential cube, T consists of 2^N total states, in which N is the number of signals that compose the input burst (Ib) and the output burst (Ob). Since the next states that are not reachable in the transition B→A are encoded so that the signals of Ib and Ob are don't-care, any combination of activations of the signals Is and Os in T generates total states that belong to the essential cube of T and lead to the final state A. Therefore, the essential cube is free of essential hazard.
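The counting argument of the proof can be made concrete with a brute-force check. The sketch below builds the cube of all 0/1 combinations of the labeled signals and then fires those signals in every possible order, verifying that each intermediate total state stays inside the cube and that all orders end in the same final state. The signal names `a`, `b`, `x` and `ctx` are hypothetical, and the check only illustrates the lemma on a toy transition; it is not a substitute for the proof.

```python
from itertools import permutations, product


def essential_cube(start, labeled):
    """All total states from 0/1 combinations of the labeled Ib/Ob signals."""
    cube = set()
    for bits in product((0, 1), repeat=len(labeled)):
        state = dict(start)
        state.update(zip(labeled, bits))
        cube.add(tuple(sorted(state.items())))
    return cube


def orderings_stay_in_cube(start, labeled):
    """Fire the labeled signals in every order and check cube membership."""
    cube = essential_cube(start, labeled)
    finals = set()
    for order in permutations(labeled):
        state = dict(start)
        for name in order:
            state[name] ^= 1  # one signal of Ib or Ob changes
            if tuple(sorted(state.items())) not in cube:
                return False
        finals.add(tuple(sorted(state.items())))
    return len(finals) == 1   # every order reaches the same final total state


# Hypothetical transition: inputs a, b and output x are labeled; ctx is context.
start_state = {"a": 0, "b": 1, "x": 0, "ctx": 1}
labeled_signals = ["a", "b", "x"]
```

Because the context signal `ctx` never changes, every interleaving of `a`, `b` and `x` remains within the 2^3 states of the cube, which is the situation the lemma describes.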
Theorem 1.1: An XBM-EHF specification has an EHF implementation in the "Huffman machine with feedback output" or "standard RS" architectures if, for every state transition T (B→A) ∈ XBM-EHF, all of its activations are covered by the essential cube of T.
Proof: By Lemma 1.2, if the XBM specification is EHF, then every state transition T ∈ XBM has an essential cube. Since the context signals of transition T keep a constant value in all the next states of the essential cube, both reachable and not reachable, the activation of the next state belongs to the essential cube regardless of the gate and wire delays of the architectures. Since the essential cube is EHF according to Lemma 1.2, the implementations in both architectures are EHF.

ASYNCHRONOUS WRAPPERS: ARCHITECTURE
The main objective of the proposed architecture is to provide a weak interaction between the locally synchronous module (LSM) and the asynchronous interface. Figure 10 shows the two signals, "data available" and "data accept", which are the only ones used for communication between the LSM and the interface. When data available = '1', data are ready to be transmitted; when data accept = '1', the data have been received. Our architecture is based on the architecture proposed by Reddy Ravi (2001).
Figure 11 shows the architecture of the proposed output communication control, which implements the weak interaction between the interface and the LSM, while Figs. 12 and 13 show the proposed input and output asynchronous wrappers, respectively, with the insertion of a gated-clock generator. Finally, Fig. 14 shows the full proposed AW, which receives and transmits data.

GATED-CLOCK GENERATOR
In this paper, a gated-clock generator (GCG), composed basically of two synchronizers and a gated clock, is also proposed. Figure 15 shows the timing diagram of the proposed GCG, with the activation and deactivation of signal GCLK. Figure 16 shows the architecture of the GCG, Fig. 17 shows the topology of the gated clock and Fig. 18 shows the topology of its synchronizer. The GCLK signal is paused as follows: R_CLK switches 0→1 and, two clock cycles later, the signal "Stop" switches 0→1, which interrupts signal GCLK.

DESIGN: PORTS (AFSM)
The input/output ports used in the proposed AW were previously proposed by Muttersbach et al. (2000) and Muttersbach (2001). They are described in the XBM specification (as shown in Figs. 19 and 20). The XBM specification of the input/output ports satisfies the essential signal concept and is therefore XBM-EHF.

PROCEDURE: SYNTHESIS OF PORTS
The port design method starts from the XBM description and proceeds in four synthesis steps:
• Use the algorithm of Yun et al. (1999) to derive the minimum set of XBM flow charts;
• Encode the XBM flow tables using the adjacency diagram (Unger, 1969);
• For each coded XBM flow table, insert the essential super-states, as seen in the previous section;
• Perform logic-hazard-free logic minimization for each "non-input" signal in the "standard RS" and "Huffman machine with feedback output" architectures (Oliveira et al., 2008).
Figure 21 shows the state flow map of the output port, with the introduction of a state signal 'Z' to solve conflicts, while Fig. 22 shows all the minterms (black and blue) used in the logic cover, which ensure that the output port is free of essential hazard. The output port was implemented in the "Huffman machine with feedback output" and "standard RS" architectures. Figures 23-26 show the logic-hazard-free cover using Karnaugh maps. Finally, Figs. 27 and 28 show, respectively, the logic circuits of the output and input ports.

DISCUSSION & SIMULATION
Oliveira et al. (2011) present a list of advantages of GALS systems, which leads to the conclusion that GALS design can play a relevant role in the future of digital design in all kinds of applications, including aerospace ones. However, a major drawback to this use is the asynchronous interface.
For this kind of application, the proposed hazard-free asynchronous interface has great potential and is highly desirable for the aerospace industry, since it overcomes the main challenges of this industry and increases the reliability of the full system. In the treatment of essential hazard, our ports support any type of mapping, either for VLSI_DSM or for PLD devices. The interface follows the Delay Insensitive (DI) model (Myers, 2004), restricted to interact with the environment in GFM, but without the insertion of any delay elements. It can work in Ib/Ob mode, showing that the DI model is more robust than the QDI model and therefore does not need to meet isochronic fork requirements. An interface with similar properties was not found in the literature. Figures 29 and 30 show simulations of the I/O ports of the proposed AW, which show that the proposed architecture satisfies the XBM specification and is hazard-free and robust.

CONCLUSION
GALS systems implemented in VLSI_DSM are an interesting design style for SoCs; however, typical problems concerning the asynchronous interface, especially in AW design, prove to be major drawbacks. In aerospace applications, in which reliability and safety are major constraints, these drawbacks are prohibitive. To address this situation, a new AW architecture was proposed to overcome the previously discussed problems. It is a good option for designers who need to implement GALS in VLSI_DSM, including aerospace applications, since it improves the reliability of the system by eliminating essential hazards. The results showed that the proposed architecture is completely free of essential hazard and allows full autonomy for the locally synchronous modules. It follows the DI model, interacts with the environment in GFM without the delay elements suggested by previous papers, and can work in Ib/Ob mode, proving to be more robust than the QDI model; therefore, it needs neither to meet isochronic fork requirements nor timing analysis. Since an interface with similar properties was not found in the literature, the proposed architecture has great potential for implementation in VLSI_DSM systems, including aerospace ones, in which the harsh environment imposes additional challenges on designers. Future work will address a robust asynchronous interface for the implementation of GALS involving FIFOs, and an application aimed at software-defined radio.