NASA, Air Force define cutting-edge next-generation space computer
- 11 April, 2013 21:19
It's not every day a customer wants a computer that can do everything from directing a landing on the surface of Mars, to controlling descent onto a speeding asteroid, to coordinating a flight of satellites -- but NASA and the Air Force's space group aren't everyday customers.
NASA and the Air Force Research Laboratory's Space Vehicles Directorate put out a call today for research and development of what the agencies call the Next Generation Space Processor (NGSP) -- likely the next computer system to fly aboard a variety of future spacecraft.
The processor is envisioned to be a radiation-hardened, general-purpose multicore chip and associated software, applicable to a broad variety of military and civil space missions, and a broad range of spacecraft sizes and power/mass/volume constraints, NASA stated. Processor applications could include autonomous pinpoint landing with hazard detection and avoidance during entry, descent and landing during moon or Mars missions; real-time segmented mirror control for large space-based telescopes; onboard real-time analysis of multi-megapixel-level hyperspectral image data; autonomous onboard situational analysis and real-time mission planning; and real-time mode-based spacecraft-level fault protection.
"Computer processors and applications aboard spacecraft will need to transform dramatically to take advantage of computational leaps in technology and new mission needs," said Michael Gazarik, associate administrator for NASA's Space Technology Mission Directorate at the agency's headquarters in Washington, in a statement. "NASA's Space Technology Program is teaming with the Air Force to develop the next generation spaceflight processor requirements and propose solutions to meet future high performance space computing needs in the upcoming decades."
The Air Force noted that its requirements could change, as its "future space computing needs have not yet been extensively analyzed, but are expected to be similar to those defined in [NASA's]." Expected deviations from the NASA requirements include enhanced radiation hardness; relaxed objectives for low power, power management and fault tolerance; and processor throughput performance objectives (expected to be similar to those for future NASA missions, though in some cases somewhat higher, the Air Force said). Key drivers for the Air Force include onboard processing of high-rate sensor data; goal-directed autonomous operations; situational assessment and rapid response; multi-platform operations; and model-based integrated system health management.
The specific processor requirements list is extensive. According to the NASA and Air Force announcement, the systems should:
- Provide a minimum of 24 processor cores, both to support highly parallel applications and to provide a high degree of granularity for power management, fault tolerance and program unit distribution. Processor cores should be at least 32 bits wide with full IEEE 754 floating-point capability.
- Be based on commercially available hardware and software IP (processing cores, external I/O and memory interfaces, software stack and development environment).
- Be capable of executing multiple concurrent applications and parallel processing across the set of cores.
- Provide a minimum of 24 GOPS and 10 GFLOPS performance (concurrent) at 7W or less.
- Provide support for real-time processing.
- Maintain time knowledge across processor array to 1 microsecond or better.
- Support synchronization between cores to 0.1 millisecond or better.
- Support low latency and deterministic timing of communication both internal and external to the processor.
- Be able to receive and distribute real-time interrupts.
- Be radiation hardened to at least 1 Mrad TID.
- Provide built-in self test and ability to remove faulty cores or otherwise recover correct operation.
- Provide a boot sequence that ensures booting up in a known good state.
- Provide 100,000 hours of operation between non-recoverable permanent faults.
- Be able to reset individual cores or a cluster of cores, as determined by the smallest unit of granularity.
- Be able to support a watchdog timer at each core and a synchronized time base across all cores.
- Provide the ability to power off unused cores and other resources.
- Provide "smooth" performance/power scaling, from maximum performance to a minimal throughput at 1W or less utilizing a single core (or smallest unit of granularity), by powering down cores or cluster of cores and other unused resources.
- Support a sleep mode, dissipating less than 100mW and performing no processing while awaiting an external event in order to "wake up" in an operational state. This sleep state can be applied to the entire chip and optionally to individual cores or regions. Upon waking from a sleep state, the processor and/or cores shall resume execution from the point at which they were put to sleep or from a well-defined wake up state.
- Support power and redundancy management on external memory and I/O.
- Provide the ability to autonomously, in real time, detect errors, prevent propagation of those errors past well-defined error containment boundaries, recover proper state and resume proper execution.
- Afford the ability to detect errors and faults in microprocessor cores via acceptance testing, duplex execution and comparison, and triplication and voting -- note that these fault tolerance modes may require features in the hardware and system software to enable efficient operation.
- Provide the ability to detect and route around faulty internal interconnect links and switches without excessive software complication while maximizing the use of the remaining resources.
- Deliver the ability to prevent a single hardware error within a processor core, interconnection fabric or memory/I/O control core, from causing a violation of virtual memory boundaries.
- Provide the ability to enforce partitioning of groups of cores, interconnects and memories into fault containment regions in order to prevent error propagation and to guarantee a working part of the system if a fault occurs in another region, including the ability to prevent these cores from writing into regions of memory reserved for other core groups.
- Offer the ability to concurrently detect and handle errors on internal and external interconnects and memories, and to support fault tolerance using redundant banks and interfaces in external memory and I/O.
- Offer hardware and/or software hooks for fault isolation and reconfiguration.
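The duplex-comparison and triplication-and-voting modes in the list above are classic fault-tolerance techniques. A minimal sketch of triple modular redundancy (TMR) voting -- function names here are illustrative, not drawn from the NGSP requirements -- might look like:

```python
# Illustrative sketch of triple modular redundancy (TMR) voting, one of
# the fault-detection modes named in the requirements. Names are
# hypothetical, not from the NGSP specification.

def tmr_vote(results):
    """Majority-vote over three redundant computation results.

    Returns the value that at least two copies agree on, or raises if
    all three disagree (an uncorrectable fault).
    """
    a, b, c = results
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("TMR vote failed: all three results disagree")

def run_triplicated(fn, *args):
    # On flight hardware the three copies would run on separate cores;
    # here they run sequentially for illustration.
    return tmr_vote([fn(*args) for _ in range(3)])

# A single upset corrupting one copy is masked by the other two:
assert tmr_vote([42, 42, 7]) == 42
assert run_triplicated(lambda x: x * x, 6) == 36
```

As the announcement notes, doing this efficiently requires hardware and system-software support -- in practice the voting happens in hardware or at memory-write boundaries, not in application code.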
"It is understood by AFRL and NASA that the objectives listed below are both extremely challenging and may not be complete. [Vendors] are encouraged to suggest alternative approaches to meet the top level objective of defining a processor and associated software suite that provides high performance, fault tolerant, power scalable computing suitable for the broad range of space/mission environments and especially the extreme environments expected in future NASA and Air Force missions," the agencies stated.
One of the main issues that comes with building any space system is radiation. NASA has discussed space missions' use of radiation-hardened computer chips in the past, noting that such systems contain extra transistors that take more energy to switch on and off, so cosmic rays can't trigger them as easily. Rad-hard chips continue to do accurate calculations when ordinary chips might "glitch." NASA said it relies almost exclusively on these extra-durable chips to make computers space-worthy. But these custom-made chips have downsides: They're expensive, power hungry and slow -- as much as 10 times slower than an equivalent CPU in a modern consumer desktop PC. That makes giving spacecraft as much computing horsepower as possible an ongoing challenge.
An example of what a NASA system handles today: According to NASA, the International Space Station's U.S. segment alone runs 1.5 million lines of flight software code on 44 computers communicating via 100 data networks that transfer 400,000 signals (pressure or temperature measurements, valve positions, etc.). The main control computers in the U.S. segment have 1.5 gigabytes of total storage; by comparison, many modern PCs have hard drives larger than 500 gigabytes.
Follow Michael Cooney on Twitter: @nwwlayer8 and on Facebook.