Precursor

Mobile, Open Hardware, RISC-V System-on-Chip (SoC) Development Kit

Jun 02, 2021

Project update 18 of 38

Supply Chain, Xous, EMC, and Keyboard Updates

by bunnie

Supply Chain Update

It seems the parade of news covering the “collapse” of the global semiconductor supply chain has been steadily growing. While there is definitely a shortage, this is a cycle the chip industry is familiar with, complete with panic buying and opportunistic market manipulators. And if the past is any predictor of the future, in about 12-18 months we’re going to see a glut of capacity, which may lead to a shuttering of smaller foundries, which then leads to further consolidation in an already too consolidated industry. I am optimistic, however, that the groups working on open silicon will be well-poised to capitalize on the current boom-bust cycle with new, interesting ways to utilize some of the excess foundry capacity in a couple of years.

Fortunately, the pre-purchase of Precursor’s inventory seems to have gone fairly smoothly, with the exception, of course, of the FPGAs. These critical components are still tracking for a mid-November delivery; however, I personally give 50/50 odds that in the next couple of months I will get an email from the distributor saying delivery has been pushed out even further. We’re doing our best to avoid that outcome, but honestly, our tiny order accounts for maybe two wafers out of the 100k/month rate a modern fab can churn out. We can’t influence the tides, but we’re doing our best to plot a course that a tiny interest our size can safely navigate.

Xous OS Update

The upside of the production delay is I’ve had a couple of solid months to focus on building infrastructure for the Xous operating system. Since the last time I wrote you about the release of Xous 0.8, we’ve added a few key features that make the operating system start to feel less like scaffolding and more like a real operating system: suspend/resume, audio playback, firmware updates, and hardware-accelerated SHA-512. Since we push our code to GitHub in real-time as we write it, I refer readers looking for in-depth information to our wiki, project board, issues list, and development branches.

However, I thought it might be interesting to talk a little more about “suspend/resume”. If you want to look for an operation that causes an OS to burst at its seams and fall apart, it’s cutting power to the CPU and peripherals while you’re away from the keyboard, and then asking the OS to magically bring everything back “like nothing happened”. Robust suspend/resume operation is, in my opinion, a key feature that distinguishes a “research-only” OS from a “production” OS. I feel a lack of attention to this feature is unacceptable for any battery-powered end user product. Thus, I prioritized integrating suspend/resume very early on, so it looms over every future architectural decision, and so it is baked into the CI flow.

I think clear documentation goes hand-in-hand with maintainability, so I made sure to extensively document the suspend/resume process as I developed it. In Precursor, we have the advantage that all RAM is battery-backed; but, we have the distinct disadvantage that our FPGA-based SoC – including the CPU – gets completely obliterated upon suspend. It’s the opposite scenario compared to most traditional devices, where one might be able to rely on some CPU register state being kept around in a hibernate or suspend mode, while perhaps main RAM gets wiped.

Xobs laid out the battle plan for Suspend/Resume, which in retrospect was delightful in its elegant simplicity: suspend the OS in an interrupt context, so that all of the processes would already be paged to battery-backed RAM. Then, when we reboot, we have the loader drop us back into the interrupt context. This allows us to re-use the entire OS’s interrupt handler mechanism to coordinate all the messy work of saving RISC-V architectural registers and thread state. Furthermore, the interrupt handler requires minimal setup to enter, as it runs from a well-known location and all other interrupts are already masked upon entry. The interrupt handler code itself just needs to know if its entry was called by the OS for a Suspend or by the bootloader for a Resume.

To accomplish this, we added a new hardware block called the “Suspend/Resume Helper”. Being able to conjure hardware blocks out of thin air to solve OS problems is one of the luxuries of writing an OS for an FPGA-based platform! The Suspend/Resume Helper, or SUSRES for short, has a couple of bits that indicate the reason for the current SUSRES interrupt (e.g., Suspend or Resume), as well as a bit to trigger an interrupt to kick off the entire suspend/resume process. It also has a couple of helper registers that can reach into the “Ticktimer” device, our 64-bit monotonic time-keeping peripheral that counts the number of milliseconds since boot. This ability to reach into the Ticktimer’s state without having to invoke OS calls is crucial for ensuring we don’t lose a single tick of time, or worse yet, “go backwards” in time upon a resume due to a race condition on the resume-side.
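To make the time-keeping concern concrete, here is a minimal software model of the idea, not the actual SUSRES hardware or any Xous API. The names `Reason`, `Ticktimer`, `Susres`, and `handle` are all hypothetical, and the sketch assumes the counter is frozen before it is captured and reloaded before it is released. It also assumes time spent suspended is not counted, which the hardware may treat differently.

```rust
/// Why the helper's interrupt fired. The names here are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reason {
    Suspend,
    Resume,
}

/// Software stand-in for the Ticktimer: a 64-bit count of milliseconds.
pub struct Ticktimer {
    ms: u64,
    paused: bool,
}

impl Ticktimer {
    /// A freshly powered-up timer starts at zero and runs freely.
    pub fn new() -> Self {
        Ticktimer { ms: 0, paused: false }
    }

    /// Advances the count unless the timer is paused.
    pub fn tick(&mut self, elapsed_ms: u64) {
        if !self.paused {
            self.ms += elapsed_ms;
        }
    }

    /// Current count in milliseconds.
    pub fn now(&self) -> u64 {
        self.ms
    }
}

/// Stand-in for the helper block. `saved_ms` models a value that survives
/// the power cycle (an assumption of this sketch).
pub struct Susres {
    saved_ms: u64,
}

impl Susres {
    pub fn new() -> Self {
        Susres { saved_ms: 0 }
    }

    /// Single entry point for both directions, selected by `reason`.
    pub fn handle(&mut self, reason: Reason, timer: &mut Ticktimer) {
        match reason {
            Reason::Suspend => {
                // Freeze first, then capture, so no tick can slip in
                // between the read and the power-off.
                timer.paused = true;
                self.saved_ms = timer.ms;
            }
            Reason::Resume => {
                // Hold the counter still, load the saved value, and only
                // then let it run, so time never appears to move backwards.
                timer.paused = true;
                timer.ms = self.saved_ms;
                timer.paused = false;
            }
        }
    }
}
```

In this model a power cycle is represented by constructing a new `Ticktimer`, which starts at zero. Calling `handle(Reason::Resume, ..)` on it before any other code reads the time is what keeps the count continuous.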

The other major trick was figuring out how to coordinate saving and restoring peripherals. These bits of state exist outside of the RISC-V architectural registers, and include the graphics frame buffer, SoC peripheral registers, as well as state within any of the off-chip I2C devices (such as the audio CODEC) which would lose power with the SoC. In order to facilitate this, we created a convenience object that can be bound to the hardware implementation which manages pushing and popping register state into RAM before and after a suspend/resume cycle.

Thus, for many hardware-facing servers, suspend/resume consists of just a few lines of code a bit like these in the main event loop:

Some(api::Opcode::SuspendResume) => xous::msg_scalar_unpack!(msg, token, _, _, _, {
    trng.suspend();
    susres.suspend_until_resume(token).expect("couldn't execute suspend/resume");
    trng.resume();
}),

The token is a unique ID the suspend/resume manager uses to track which hardware blocks have reported in and to make an informed decision on whether or not we will just go ahead and do a hard-suspend.
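As a rough illustration of the token bookkeeping, the sketch below tracks which registered servers have reported in. It is not the Xous suspend/resume manager: `SuspendTracker`, `Decision`, and every method name are hypothetical. What to do when some servers never report (keep waiting, time out, or force the suspend anyway) is a policy choice deliberately left out.

```rust
use std::collections::HashSet;

/// Outcome of checking whether every server has reported in.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Every registered server has saved its state.
    Suspend,
    /// These tokens have not reported yet (sorted for stable output).
    Missing(Vec<u32>),
}

/// Hypothetical tracker: hands out tokens and records who is ready.
pub struct SuspendTracker {
    next_token: u32,
    registered: HashSet<u32>,
    ready: HashSet<u32>,
}

impl SuspendTracker {
    pub fn new() -> Self {
        SuspendTracker {
            next_token: 0,
            registered: HashSet::new(),
            ready: HashSet::new(),
        }
    }

    /// Issues a unique token to a server that manages hardware state.
    pub fn register(&mut self) -> u32 {
        let token = self.next_token;
        self.next_token += 1;
        self.registered.insert(token);
        token
    }

    /// Records that the server holding `token` has saved its state.
    /// Returns false if the token was never issued.
    pub fn report_ready(&mut self, token: u32) -> bool {
        if self.registered.contains(&token) {
            self.ready.insert(token);
            true
        } else {
            false
        }
    }

    /// Compares the registered set against the ready set.
    pub fn decide(&self) -> Decision {
        let mut missing: Vec<u32> =
            self.registered.difference(&self.ready).copied().collect();
        missing.sort_unstable();
        if missing.is_empty() {
            Decision::Suspend
        } else {
            Decision::Missing(missing)
        }
    }
}
```

Keeping the registered and ready sets separate means the manager can name exactly which servers are holding up a suspend, rather than only counting them.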

The hardware driver itself would need these extensions:

impl Trng {
    pub fn new() -> Trng {
        // other setup code here, including construction of `trng`
        // push the list of registers to save/restore across suspend/resume
        trng.susres_manager.push(RegOrField::Reg(utra::trng_server::CONTROL), None);
        trng.susres_manager.push(RegOrField::Reg(utra::trng_server::AV_CONFIG), None);
        trng.susres_manager.push(RegOrField::Reg(utra::trng_server::RO_CONFIG), None);
        // more setup code here
        trng
    }


    pub fn suspend(&mut self) {
        self.susres_manager.suspend();
    }
    pub fn resume(&mut self) {
        self.susres_manager.resume();
        // hardware specific post-resume tasks, such as:
        // pump the engine to discard the initial 0's in the execution pipeline
        self.get_trng(2);
    }
}

When everything works right, from the server’s standpoint, the .suspend_until_resume() method “does nothing”, even though the system goes through a full power-off. Note any servers which don’t touch hardware are completely oblivious to suspend/resume and so require no special code.
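The internals of the `susres_manager` object used in the driver above are not shown in this update. The following is a simplified sketch of how such a save/restore helper might work, under loud assumptions: `FakeCsr` is a hypothetical in-memory stand-in for a memory-mapped register block, `RegManager` is an invented name, and `push` takes a bare register offset rather than the `RegOrField` value and optional second argument seen above.

```rust
/// Hypothetical stand-in for a memory-mapped register block. Real hardware
/// would be accessed with volatile reads and writes, not a Vec.
pub struct FakeCsr {
    regs: Vec<u32>,
}

impl FakeCsr {
    pub fn new(count: usize) -> Self {
        FakeCsr { regs: vec![0; count] }
    }

    pub fn read(&self, offset: usize) -> u32 {
        self.regs[offset]
    }

    pub fn write(&mut self, offset: usize, value: u32) {
        self.regs[offset] = value;
    }

    /// Models the SoC losing power: every register returns to zero.
    pub fn power_cycle(&mut self) {
        for reg in self.regs.iter_mut() {
            *reg = 0;
        }
    }
}

/// Remembers which registers matter and holds their values in ordinary
/// memory, which stands in for battery-backed RAM here.
pub struct RegManager {
    tracked: Vec<usize>, // register offsets, in restore order
    saved: Vec<u32>,     // values captured at suspend, same order
}

impl RegManager {
    pub fn new() -> Self {
        RegManager { tracked: Vec::new(), saved: Vec::new() }
    }

    /// Adds a register to the save/restore list.
    pub fn push(&mut self, offset: usize) {
        self.tracked.push(offset);
    }

    /// Copies every tracked register out of the hardware.
    pub fn suspend(&mut self, csr: &FakeCsr) {
        self.saved = self.tracked.iter().map(|&o| csr.read(o)).collect();
    }

    /// Writes the saved values back in the order they were pushed.
    pub fn resume(&self, csr: &mut FakeCsr) {
        for (&offset, &value) in self.tracked.iter().zip(self.saved.iter()) {
            csr.write(offset, value);
        }
    }
}
```

Restoring in push order lets a driver list a configuration register before the control register that depends on it. Registers that were never pushed come back at their reset value, which is why hardware-specific fix-ups such as the `get_trng(2)` call above still belong in the driver's own `resume()`.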

The upshot is we now have a system that can robustly cycle through a CPU-off-to-CPU-on state in a fraction of a second, picking up right where the user left off. Because our LCD can display information even when the CPU is powered off with virtually zero power draw, we can create the illusion of a single user session that can stretch for days with light usage.

UI during Suspend state. The CPU is off, but the LCD is keeping state. The CPU can be awoken by holding down F1+F4, by an incoming Wi-Fi packet via the EC, or by an alarm from the Real Time Clock; both the EC and the Real Time Clock remain running when the CPU is off.

Going through and implementing suspend/resume in detail was a terrific exercise that heightened my understanding of the Xous architecture, from what happens before boot to beyond.

EMC Testing Update

On the EMC testing front, I’m happy to report we finally have testing samples in China. It took much longer than we had thought to even get the samples into the country. In a pre-pandemic world, I would have tucked three Precursor prototypes into my suitcase, walked across the border into China, and hand-delivered them to the testing facility. Unfortunately, travel to China simply is not feasible currently, so we adopted the strategy of mailing the devices into China for testing. This turned into a Kafkaesque nightmare, because in order to import the devices to China, we had to produce a CCC certificate (China’s version of the FCC); but, of course, the purpose for importing the devices in the first place is to go for FCC/CCC-style testing. We resolved the circular dependency by breaking the prototypes down into their constituent parts, importing the individual pieces, and re-assembling them at the AQS offices in China. That way, we did not have to pass scrutiny as a fully-assembled phone-like communications device. Currently, the devices are in a queue for the actual test and finally, maybe finally in about another month, I’ll have some news about whether or not we passed the test.

The three samples sent for EMC testing, before they were broken down for shipping.

Keyboard Update

We also recently received production samples for the keyboard. A total of five overlays are planned for broad release: QWERTY, AZERTY, QWERTZ, Dvorak, and a “blank black” template one can ink for a fully custom layout.

Overall, I’m fairly pleased with how the samples came out; the production molds look sharp and the printing is crisp for the main text. The alt-text, accessible by a press-hold on a given key, is a bit harder to see, but it’s a tight-rope to walk between visual clutter and ease of finding the octothorp. I personally prefer a cleaner design for the primary characters; although it makes finding a symbol harder initially, I’m betting that eventually users will learn the location of special symbols important to their workflow and it’ll become second nature.

I’m also continuing to develop a Braille version of the keyboard in parallel: we’re now approaching our third-generation of prototypes. It turns out chording is much more demanding in terms of responsiveness and key feel, so our latest prototypes are exploring the trade-offs between portability and tactile feel. One version is attaching full-sized Cherry-MX key-switches to the bezel for ultimate key responsiveness, but with a significant sacrifice in pocket-ability. However, it just might be the case this is the right trade-off for users who live in a world defined by tactile feedback. We also have another version in the works that uses a lower-profile (but still a bit thick for a mobile phone) snap-action tactile switch that balances portability against tactile feel. We’ve also thought about using laptop keyboard “scissor switches”, but unfortunately custom versions of those are quite a bit beyond our development budget at the moment.

That’s it for this update! The plan for this month is mostly to continue improving Xous while we wait on eggshells to hear back from the EMC testing lab. Xobs has been furiously cooking up std support in Rust for Xous, which will be a game-changer that un-gates the development of a number of higher-level features such as the Plausibly Deniable Data Base (PDDB for short, our quirky, non-POSIX answer to a filesystem) and networking support. I’m probably going to take another pass at power-saving extensions; in particular, Florent (the maintainer of LiteX) has kindly prototyped some extensions that will allow us to gate clocks and reprogram the PLL on the fly. If that goes smoothly, I might kick the tires a bit more on some core cryptographic primitives in preparation for the first draft of the PDDB. Stay safe and get vaccinated if you have the privilege to readily do so!

