Wednesday, September 7, 2016

New colours, new chips and waterproofing tipped for the iPhone 7 next week

It would be a surprise of unprecedented proportions if Apple didn't launch a brand new iPhone next Wednesday, and those charged with speculating about what's in store are having one last stab at some educated guesses this weekend.
KGI Securities analyst Ming-Chi Kuo, who has a respectable record at predicting Apple's next movements, has gone on record with a fresh batch of claims this weekend about the upcoming iPhone 7 - some of them we've heard before, and some of them are new.
Ming-Chi Kuo says the new iPhones will rock a new Apple A10 chip, which is apparently going to be substantially quicker than the A9 (and clocked at up to 2.4GHz). There are going to be two new colours, "dark black" and "piano black", while the classic silver grey look is going to be retired, he says.
Waterproofing, headphone jacks and more
We've previously heard rumours of waterproofing and the KGI Securities analyst is in agreement, predicting the iPhone 7 will have the same rating as the Apple Watch - specifically, being able to last for 30 minutes under 1 metre of water.
The headphone jack is indeed going away, Ming-Chi Kuo reckons, and he also agrees with many tipsters that a new dual-lens camera is coming to the larger 5.5-inch iPhone this year. That larger iPhone 7 Plus will have an extra gigabyte of RAM compared with the smaller model (3GB vs 2GB) to help with image processing.
16GB and 64GB storage options will be ditched, so your choice is going to be between 32GB, 128GB and 256GB, according to the analyst, and there are going to be numerous other minor upgrades too. In just a few days' time we're going to know for sure what Apple's been making - and we'll bring you the news as soon as it breaks.

Tuesday, September 6, 2016

Quanta Compute Plug—a computer the size and shape of an AC adapter


One of the biggest IT trade shows in Asia is under way right now in Taipei, Taiwan, and one of the products on show is grabbing a lot of attention: the Quanta Compute Plug, a complete computer that is the size and shape of an AC adapter. Initial reports suggest its main purpose is to turn a flat-screen TV into a truly "smart" TV.


Computers have been shrinking for many years, perhaps in reaction to the development of smartphones and tablet computers. Now, instead of taking up a desk or serving as a portable device such as a laptop, computers are being built by their makers as fully functional machines the size of thumb drives, or in this case, an AC adapter, complete with prongs. Of course it does not come with a hard drive, keyboard, mouse or screen, but some of those can be added because it has two USB 3.0 ports and one HDMI port.
The Compute Plug was demonstrated at the show by Microsoft VP Nick Parker during his keynote address, because the device runs Windows 10 (though which edition is still not clear). Parker showed how it can be used with Cortana and a Bluetooth headset for hands-free operation.
The idea behind such computers is apparently to allow consumers to have multiple inexpensive computers in their home, each dedicated to specific tasks that typically cannot be done with a phone or tablet. The Compute Plug, for example, would allow running full-blown Word with a full-sized keyboard, sparing students, or perhaps journalists, the need to lug around a laptop. The downside of such a tiny computer is, of course, its low-power processor, hence its more dedicated use: it is not powerful enough for demanding games, but it is certainly enough to push 4K video to a screen. The Plug is not the first wall-plug computer, but it is the smallest thus far, and the first to run Windows.
Unfortunately, Parker did not give any specifics regarding when the little computer might be for sale, or how much it might cost. That presumably will come later from Quanta reps.

Monday, September 5, 2016

New chip design makes parallel programs run many times faster and requires one-tenth the code


Computer chips have stopped getting faster. For the past 10 years, chips' performance improvements have come from the addition of processing units known as cores.

In theory, a program on a 64-core machine would be 64 times as fast as it would be on a single-core machine. But it rarely works out that way. Most computer programs are sequential, and splitting them up so that chunks of them can run in parallel causes all kinds of complications.
In the May/June issue of the Institute of Electrical and Electronics Engineers' journal Micro, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new chip design they call Swarm, which should make parallel programs not only much more efficient but easier to write, too.
In simulations, the researchers compared Swarm versions of six common algorithms with the best existing parallel versions, which had been individually engineered by seasoned software developers. The Swarm versions were between three and 18 times as fast, yet they generally required only one-tenth as much code, or even less. And in one case, Swarm achieved a 75-fold speedup on a program that computer scientists had so far failed to parallelize.
"Multicore systems are really hard to program," says Daniel Sanchez, an assistant professor in MIT's Department of Electrical Engineering and Computer Science, who led the project. "You have to explicitly divide the work that you're doing into tasks, and then you need to enforce some synchronization between tasks accessing shared data. What this architecture does, essentially, is to remove all sorts of explicit synchronization, to make parallel programming much easier. There's an especially hard set of applications that have resisted parallelization for many, many years, and those are the kinds of applications we've focused on in this paper."
Many of those applications involve the exploration of what computer scientists call graphs. A graph consists of nodes, typically depicted as circles, and edges, typically depicted as line segments connecting the nodes. Frequently, the edges have associated numbers called "weights," which might represent, say, the strength of correlations between data points in a data set, or the distances between cities.
Graphs crop up in a wide range of computer science problems, but their most intuitive use may be to describe geographic relationships. Indeed, one of the algorithms that the CSAIL researchers evaluated is the standard algorithm for finding the fastest driving route between two points.
Setting priorities
In principle, exploring graphs would seem to be something that could be parallelized: Different cores could analyze different regions of a graph or different paths through the graph at the same time. The problem is that with most graph-exploring algorithms, it gradually becomes clear that whole regions of the graph are irrelevant to the problem at hand. If, right off the bat, cores are tasked with exploring those regions, their exertions end up being fruitless.
Of course, fruitless analysis of irrelevant regions is a problem for sequential graph-exploring algorithms, too, not just parallel ones. So computer scientists have developed a host of application-specific techniques for prioritizing graph exploration. An algorithm might begin by exploring just those paths whose edges have the lowest weights, for instance, or it might look first at those nodes with the lowest number of edges.
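Exploring the lowest-weight paths first is the idea behind Dijkstra's algorithm, the textbook approach to shortest-route problems such as the driving example above. The sequential sketch below is illustrative only: the road graph and its travel times are made up, and this is not code from the Swarm paper.

```python
import heapq

def shortest_path_lengths(graph, source):
    """Sequential Dijkstra: always expand the unvisited node with the lowest
    tentative distance, so low-weight paths are explored first."""
    dist = {source: 0}
    visited = set()
    frontier = [(0, source)]  # (tentative distance, node); heapq pops the smallest
    while frontier:
        d, node = heapq.heappop(frontier)
        if node in visited:
            continue  # stale entry: a shorter path to this node was already settled
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                heapq.heappush(frontier, (new_dist, neighbor))
    return dist

# Hypothetical road graph: node -> list of (neighbor, travel time in minutes)
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 7)],
    "B": [("D", 3)],
}
print(shortest_path_lengths(roads, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 6}
```

The heap is what makes the prioritization cheap: regions of the graph reachable only through heavy edges sit at the bottom of the queue and may never be expanded at all.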
What distinguishes Swarm from other multicore chips is that it has extra circuitry for handling that type of prioritization. It time-stamps tasks according to their priorities and begins working on the highest-priority tasks in parallel. Higher-priority tasks may engender their own lower-priority tasks, but Swarm slots those into its queue of tasks automatically.
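To make the ordering rule concrete, here is a minimal single-threaded Python model of a timestamp-ordered task queue. The `TaskQueue` class and its `enqueue(timestamp, fn, *args)` method are invented names, not Swarm's interface. The model simply runs tasks strictly in timestamp order, which is the order Swarm's hardware must appear to preserve even while it executes many tasks in parallel.

```python
import heapq
import itertools

class TaskQueue:
    """Toy, single-threaded model of a timestamp-ordered task queue.
    A task may enqueue children with equal or later timestamps."""

    def __init__(self):
        self._heap = []
        # Tie-breaker keeps equal timestamps FIFO and avoids comparing functions.
        self._tiebreak = itertools.count()

    def enqueue(self, timestamp, fn, *args):
        heapq.heappush(self._heap, (timestamp, next(self._tiebreak), fn, args))

    def run(self):
        while self._heap:
            timestamp, _, fn, args = heapq.heappop(self._heap)
            fn(self, timestamp, *args)

log = []

def parent(queue, ts, name):
    log.append((ts, name))
    # Spawn two lower-priority (later-timestamp) child tasks.
    queue.enqueue(ts + 2, child, name + ".a")
    queue.enqueue(ts + 1, child, name + ".b")

def child(queue, ts, name):
    log.append((ts, name))

q = TaskQueue()
q.enqueue(5, parent, "p5")
q.enqueue(0, parent, "p0")
q.run()
print(log)
# [(0, 'p0'), (1, 'p0.b'), (2, 'p0.a'), (5, 'p5'), (6, 'p5.b'), (7, 'p5.a')]
```

Note that the children of `p0` run before `p5` even though `p5` was enqueued first: position in the queue is determined by timestamp alone.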
Occasionally, tasks running in parallel may come into conflict. For instance, a task with a lower priority may write data to a particular memory location before a higher-priority task has read the same location. In those cases, Swarm automatically backs out the results of the lower-priority tasks. It thus maintains the synchronization between cores accessing the same data that programmers previously had to worry about themselves.
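The "back out" step can be modeled in software with an undo log. The sketch below is a generic speculation technique, not necessarily Swarm's exact hardware mechanism, and it is deliberately simplified: it tracks which uncommitted tasks wrote each address (but not which tasks read it) and never commits. Each write saves the old value; when an earlier-timestamp task touches an address that a later task has already written, the later task, and any still-later task that overwrote its data, is rolled back. The class and method names are hypothetical.

```python
class SpeculativeMemory:
    """Toy model of speculative writes with an undo log and rollback."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.undo_logs = {}  # task timestamp -> [(addr, old value), ...] in write order
        self.writers = {}    # addr -> timestamps of uncommitted tasks that wrote it
        self.aborted = []    # timestamps of rolled-back tasks, in abort order

    def write(self, ts, addr, value):
        self._abort_later_writers(ts, addr)
        self.undo_logs.setdefault(ts, []).append((addr, self.data.get(addr)))
        self.writers.setdefault(addr, set()).add(ts)
        self.data[addr] = value

    def read(self, ts, addr):
        self._abort_later_writers(ts, addr)
        return self.data.get(addr)

    def _abort_later_writers(self, ts, addr):
        # Abort newest first so each undo restores the value its task overwrote.
        for later in sorted((w for w in self.writers.get(addr, ()) if w > ts), reverse=True):
            self.abort(later)

    def abort(self, ts):
        if ts not in self.undo_logs:
            return  # already rolled back via a cascade
        # Later tasks that overwrote this task's data must be rolled back first.
        for addr, _ in self.undo_logs[ts]:
            self._abort_later_writers(ts, addr)
        for addr, old in reversed(self.undo_logs.pop(ts)):
            self.data[addr] = old
            self.writers[addr].discard(ts)
        self.aborted.append(ts)

mem = SpeculativeMemory({"x": 0})
mem.write(ts=7, addr="x", value=99)  # lower-priority task runs ahead and writes x
seen = mem.read(ts=3, addr="x")      # higher-priority task now reads x
print(seen, mem.aborted)             # 0 [7]
```

In a full system the aborted task would then be re-run after the higher-priority task, so it sees the value it would have seen in a sequential execution.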
Indeed, from the programmer's perspective, using Swarm is pretty painless. When the programmer defines a function, he or she simply adds a line of code that loads the function into Swarm's queue of tasks. The programmer does have to specify the metric—such as edge weight or number of edges—that the program uses to prioritize tasks, but that would be necessary, anyway. Usually, adapting an existing sequential algorithm to Swarm requires the addition of only a few lines of code.
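From the programmer's side, a shortest-path search written in this style might look like the sketch below. The `enqueue` and `run_all` functions are hypothetical stand-ins backed by an ordinary software priority queue, not Swarm's actual primitives. The point is that the task timestamp is simply the priority metric (here, path distance), and each task spawns work for its neighbors with one extra call.

```python
import heapq
import itertools

# Hypothetical stand-in for a hardware task-enqueue primitive (not Swarm's real API):
# here it just pushes onto a software priority queue ordered by timestamp.
_queue, _order = [], itertools.count()

def enqueue(timestamp, fn, *args):
    heapq.heappush(_queue, (timestamp, next(_order), fn, args))

def run_all():
    while _queue:
        ts, _, fn, args = heapq.heappop(_queue)
        fn(ts, *args)

# Application code: the timestamp *is* the priority metric (distance from the start).
graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 7)], "B": [("D", 3)]}
dist = {}

def visit(ts, node):
    if node in dist:  # already reached by an earlier (shorter) task
        return
    dist[node] = ts
    for neighbor, weight in graph.get(node, []):
        enqueue(ts + weight, visit, neighbor)  # the one added line: spawn a child task

enqueue(0, visit, "A")
run_all()
print(dist)  # {'A': 0, 'C': 1, 'B': 3, 'D': 6}
```

Duplicate tasks for the same node are harmless here: the one with the lowest timestamp runs first, and later ones return immediately.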
Keeping tabs
The hard work falls to the chip itself, which Sanchez designed in collaboration with Mark Jeffrey and Suvinay Subramanian, both MIT graduate students in electrical engineering and computer science; Cong Yan, who did her master's as a member of Sanchez's group and is now a PhD student at the University of Washington; and Joel Emer, a professor of the practice in MIT's Department of Electrical Engineering and Computer Science, and a senior distinguished research scientist at the chip manufacturer NVIDIA.
The Swarm chip has extra circuitry to store and manage its queue of tasks. It also has a circuit that records the memory addresses of all the data its cores are currently working on. That circuit implements something called a Bloom filter, which crams data into a fixed allotment of space and answers yes/no questions about its contents. If too many addresses are loaded into the filter, it will occasionally yield false positives—indicating "yes, I'm storing that address"—but it will never yield false negatives.
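A Bloom filter is a standard data structure, and a software version fits in a few lines. This sketch uses SHA-256 for convenience (hardware would use cheap hash circuits on the address bits), and the filter size and hash count are arbitrary assumptions. Each item sets a few bits; a lookup answers "possibly present" only if all of those bits are set, which is why false positives can occur but false negatives cannot.

```python
import hashlib

class BloomFilter:
    """Fixed-size set-membership filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions from independent salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly stored"; False means "definitely not stored".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

f = BloomFilter()
addresses = (0x1000, 0x1040, 0x2080)  # hypothetical memory addresses being tracked
for addr in addresses:
    f.add(addr)
print(f.might_contain(0x1040))  # True
```

As more addresses are added, more bits get set and the chance that an untracked address happens to hit only set bits (a false positive) rises, which matches the behavior described above.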
The Bloom filter is one of several circuits that help Swarm identify memory access conflicts. The researchers were able to show that time-stamping makes synchronization between cores easier to enforce. For instance, each data item is labeled with the time stamp of the last task that updated it, so tasks with later time-stamps know they can read that data without bothering to determine who else is using it.
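The last-writer rule can be expressed as a one-line check. In this simplified model (the `Line` record and `read_needs_conflict_check` function are hypothetical), a reader whose timestamp is later than the last writer's is reading data it is supposed to see in timestamp order, so it can skip the more expensive conflict lookup; an earlier-timestamp reader must check, because it may be looking at a value from its "future".

```python
from dataclasses import dataclass

@dataclass
class Line:
    value: int
    last_writer_ts: int  # timestamp of the task that last updated this data item

def read_needs_conflict_check(line: Line, reader_ts: int) -> bool:
    """Only readers that are earlier than the last writer need a full conflict check."""
    return reader_ts < line.last_writer_ts

item = Line(value=42, last_writer_ts=10)
print(read_needs_conflict_check(item, 15))  # False: later reader, safe to read
print(read_needs_conflict_check(item, 5))   # True: earlier reader, must check
```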
Finally, all the cores occasionally report the time stamps of the highest-priority tasks they're still executing. If a core has finished tasks that have earlier time stamps than any of those reported by its fellows, it knows it can write its results to memory without courting any conflicts.
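The commit rule above amounts to computing a global "horizon": the earliest timestamp of any task still running anywhere. A minimal sketch, with hypothetical function and parameter names:

```python
def commit_ready(finished, reports):
    """finished: timestamps of tasks this core has completed but not yet committed.
    reports: each core's earliest still-running task timestamp (None if idle).
    A finished task can commit if it is earlier than every still-running task,
    because no unfinished task can then precede it and conflict with it."""
    running = [ts for ts in reports if ts is not None]
    horizon = min(running) if running else float("inf")
    return sorted(ts for ts in finished if ts < horizon)

# Hypothetical snapshot: this core finished tasks 3, 8 and 12; the other cores
# report earliest running tasks at 10 and 9, and one core is idle.
print(commit_ready([3, 8, 12], [10, None, 9]))  # [3, 8]
```

Task 12 must wait: tasks 9 and 10 are still running and could yet write data that task 12 should have seen.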
"I think their architecture has just the right aspects of past work on transactional memory and thread-level speculation," says Luis Ceze, an associate professor of computer science and engineering at the University of Washington. "'Transactional memory' refers to a mechanism to make sure that multiple processors working in parallel don't step on each other's toes. It guarantees that updates to shared memory locations occur in an orderly way. 
"Thread-level speculation is a related technique that uses transactional-memory ideas for parallelization: Do it without being sure the task is parallel, and if it's not, undo and re-execute serially. Sanchez's architecture uses many good pieces of those ideas and technologies in a creative way."

Sunday, September 4, 2016

Engineers develop the first on-chip RF circulator that doubles WiFi speeds with a single antenna


Last year, Columbia Engineering researchers were the first to invent full-duplex radio integrated circuits (ICs), a technology that can be implemented in nanoscale CMOS to enable simultaneous transmission and reception at the same frequency in a wireless radio. That system required two antennas, one for the transmitter and one for the receiver. Now the team, led by Electrical Engineering Associate Professor Harish Krishnaswamy, has developed a breakthrough technology that needs only one antenna, enabling an even smaller overall system. This is the first time researchers have integrated a non-reciprocal circulator and a full-duplex radio on a nanoscale silicon chip. The circulator research was published online on April 15 in Nature Communications, and the paper detailing the single-chip full-duplex radio with the circulator and additional echo cancellation was presented at the 2016 IEEE International Solid-State Circuits Conference on February 2.


"This technology could revolutionize the field of telecommunications," says Krishnaswamy, director of the Columbia High-Speed and Mm-wave IC (CoSMIC) Lab. "Our circulator is the first to be put on a silicon chip, and we get literally orders of magnitude better performance than prior work. Full-duplex communications, where the transmitter and the receiver operate at the same time and at the same frequency, has become a critical research area and now we've shown that WiFi capacity can be doubled on a nano scale silicon chip with a single antenna. This has enormous implications for devices like smartphones and tablets."
Krishnaswamy's group has been working on silicon radio chips for full-duplex communications for several years and became particularly interested in the role of the circulator, a component that enables full-duplex communications in which the transmitter and the receiver share the same antenna. To do this, the circulator has to "break" Lorentz Reciprocity, a fundamental physical property of most electronic structures that requires electromagnetic waves to travel in the same manner in the forward and reverse directions.
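In circuit terms, a linear network is reciprocal when its scattering matrix S equals its own transpose: the transmission from port j to port i matches the transmission from port i to port j. The ideal three-port circulator matrix below is the standard textbook one (it is not taken from the Columbia papers) and is plainly not symmetric. With the transmitter on port 1, the antenna on port 2 and the receiver on port 3, the signal flows TX to antenna and antenna to RX, while the direct TX-to-RX entry is zero.

```python
# Ideal three-port circulator. S[i][j] is the wave leaving port i+1
# for a wave entering port j+1: 1 -> 2, 2 -> 3, 3 -> 1.
S_CIRCULATOR = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

# A matched, lossless two-port through-connection, for contrast.
S_THRU = [
    [0, 1],
    [1, 0],
]

def is_reciprocal(s):
    """A linear network is reciprocal when its scattering matrix is symmetric (S == S^T)."""
    n = len(s)
    return all(s[i][j] == s[j][i] for i in range(n) for j in range(n))

print(is_reciprocal(S_CIRCULATOR))  # False
print(is_reciprocal(S_THRU))        # True
```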
"Reciprocal circuits and systems are quite restrictive because you can't control the signal freely," says PhD student Negar Reiskarimian, who developed the circulator and is lead author of the Nature Communications paper. "We wanted to create a simple and efficient way, using conventional materials, to break Lorentz Reciprocity and build a low-cost nanoscale circulator that would fit on a chip. This could open up the door to all kinds of exciting new applications."
The traditional way of breaking Lorentz Reciprocity and building radio-frequency circulators has been to use magnetic materials such as ferrites, which lose reciprocity when an external magnetic field is applied. But these materials are not compatible with silicon chip technology, and ferrite circulators are bulky and expensive. Krishnaswamy and his team were able to design a highly miniaturized circulator that uses switches to rotate the signal across a set of capacitors, emulating the non-reciprocal "twist" of the signal that is seen in ferrite materials. Beyond the circulator itself, they also built a prototype of their full-duplex system, a silicon IC that included both their circulator and an echo-cancelling receiver, and demonstrated its capability at the 2016 IEEE International Solid-State Circuits Conference this past February.
"Being able to put the circulator on the same chip as the rest of the radio has the potential to significantly reduce the size of the system, enhance its performance, and introduce new functionalities critical to full duplex," says PhD student Jin Zhou, who integrated the circulator with the full-duplex receiver that featured additional echo cancellation.

Non-reciprocal circuits and components have applications in many different scenarios, from radio-frequency full-duplex communications and radar to building isolators that prevent high-power transmitters from being damaged by back-reflections from the antenna. The ability to break reciprocity also opens up new possibilities in radio-frequency signal processing that are yet to be discovered. Full-duplex communications is of particular interest to researchers because of its potential to double network capacity, compared to half-duplex communications that current cell phones and WiFi radios use. The Krishnaswamy group is already working on further improving the performance of their circulator, and exploring "beyond-circulator" applications of non-reciprocity.
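The "double network capacity" claim is easiest to see with idealized arithmetic. In this sketch, which assumes perfect self-interference cancellation and uses the Shannon limit only as a stand-in per-direction rate (the channel numbers are hypothetical), half duplex makes the two directions take turns, while full duplex lets both use the whole channel at once.

```python
import math

def shannon_capacity(bandwidth_hz, snr_linear):
    """Upper bound on one direction's data rate, in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

def two_way_throughput(bandwidth_hz, snr_linear, full_duplex):
    """Combined throughput of both directions of a two-node link.
    Half duplex: the directions take turns, each getting half the airtime.
    Ideal full duplex: both directions transmit continuously on the same
    channel, assuming each node's own transmit echo is cancelled perfectly."""
    per_direction = shannon_capacity(bandwidth_hz, snr_linear)
    airtime_share = 1.0 if full_duplex else 0.5
    return 2 * airtime_share * per_direction

bw, snr = 20e6, 1000.0  # hypothetical 20 MHz channel at 30 dB SNR
ratio = two_way_throughput(bw, snr, True) / two_way_throughput(bw, snr, False)
print(ratio)  # 2.0
```

In practice any residual echo from the node's own transmitter lowers the effective SNR of the full-duplex link, which is why the amount of echo cancellation matters so much.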
"What really excites me about this research is that we were able to make a contribution at a theoretically fundamental level, which led to the publication in Nature Communications, and also able to demonstrate a practical RF circulator integrated with a full-duplex receiver that exhibited a factor of nearly a billion in echo cancellation, making it the first practical full-duplex receiver chip and which led to the publication in the 2016 IEEE ISSCC," Krishnaswamy adds. "It is rare for a single piece of research, or even a research group, to bridge fundamental theoretical contributions with implementations of practical relevance. It is extremely rewarding to supervise graduate students who were able to do that!"

Friday, September 2, 2016

Self-folding robot walks, swims, climbs, dissolves


A demo that sparked interest at the ICRA 2015 conference in Seattle featured an origami robot built by researchers from the computer science and artificial intelligence lab at MIT and the department of informatics at Technische Universität in Germany. The paper, "An untethered miniature origami robot that self-folds, walks, swims, and degrades," was co-authored by Shuhei Miyashita. The robot does just what the title suggests, and a video shows it performing each move.

One can watch the robot walking on a trajectory, walking on human skin, delivering a block; swimming (the robot has a boat-shaped body so that it can float on water with roll and pitch stability); carrying a load (0.3 g robot); climbing a slope; and digging through a stack. It also shows how a polystyrene model robot dissolves in acetone.
Evan Ackerman of IEEE Spectrum also reported on the Seattle demo. Unfolded, the robot has a magnet and PVC sandwiched between laser-cut structural layers (polystyrene or paper). How it folds: when the sheet is placed on a heating element, the PVC contracts, and where the structural layers have been cut, it creates folds, said Ackerman. The self-folding takes place on a flat sheet; the robot folded itself in a few seconds. Kelsey Atherton, writing in Popular Science, said, "Underneath it all, hidden like the Wizard of Oz behind his curtain, sit four electromagnetic coils, which turn on and off and makes the robot move forward in a direction set by its shape."
When placed in the tank of acetone, the robot dissolves, except for the magnet. The authors noted "minimal body materials" in their design enabled the robot to completely dissolve in a liquid environment, "a difficult challenge to accomplish if the robot had a more complex architecture."
Possible future directions include self-folding magnetic sensors into the robot's body, which could lead to autonomous operation and, eventually, operation even inside the human body. The authors wrote, "Such autonomous '4D-printed' robots could be used at unreachable sites, including those encountered in both in vivo and bionic biological treatment."
Atherton suggested, for example, that future designs based on this robot could be even smaller and could work as medical devices sent under the skin.
Origami robots, reconfigurable robots that can fold themselves into arbitrary shapes, were discussed in an article last year in MIT News, quoting Ronald Fearing, a professor of electrical engineering and computer science at the University of California at Berkeley.
Origami robotics, he said, is "a pretty powerful concept, because cutting planar things and folding is an inherently very low-cost process." He said, "Folding, I think, is a good way to get to the smaller robots."

Thursday, September 1, 2016

New microchip demonstrates efficiency and scalable design


Princeton University researchers have built a new computer chip that promises to boost the performance of data centers, which lie at the core of online services from email to social media.

Data centers - essentially giant warehouses packed with computer servers - enable cloud-based services, such as Gmail and Facebook, as well as store the staggeringly voluminous content available via the internet. Surprisingly, the computer chips at the hearts of the biggest servers that route and process information often differ little from the chips in smaller servers or everyday personal computers.
By designing their chip specifically for massive computing systems, the Princeton researchers say they can substantially increase processing speed while slashing energy needs. The chip architecture is scalable; designs can be built that go from a dozen processing units (called cores) to several thousand. Also, the architecture enables thousands of chips to be connected together into a single system containing millions of cores. Called Piton, after the metal spikes driven by rock climbers into mountainsides to aid in their ascent, it is designed to scale.
"With Piton, we really sat down and rethought computer architecture in order to build a chip specifically for data centers and the cloud," said David Wentzlaff, an assistant professor of electrical engineering and associated faculty in the Department of Computer Science at Princeton University. "The chip we've made is among the largest chips ever built in academia and it shows how servers could run far more efficiently and cheaply."
Other Princeton researchers involved in the project since its 2013 inception are Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Jonathan Balkind, Alexey Lavrov, Matthew Matl, Xiaohua Liang, and Samuel Payne, who is now at NVIDIA. The Princeton team designed the Piton chip, which was manufactured for the research team by IBM. Primary funding for the project has come from the National Science Foundation, the Defense Advanced Research Projects Agency, and the Air Force Office of Scientific Research.
The current version of the Piton chip measures six by six millimeters. The chip has over 460 million transistors, each of which is as small as 32 nanometers, too small to be seen by anything but an electron microscope. The bulk of these transistors are contained in 25 cores, the independent processors that carry out the instructions in a computer program. Most personal computer chips have four or eight cores. In general, more cores mean faster processing times, so long as software ably exploits the hardware's available cores to run operations in parallel. Therefore, computer manufacturers have turned to multi-core chips to squeeze further gains out of conventional approaches to computer hardware.
In recent years, companies and academic institutions have produced chips with many dozens of cores, but Wentzlaff said the readily scalable architecture of Piton can enable thousands of cores on a single chip and half a billion cores across a data center.
"What we have with Piton is really a prototype for future commercial server systems that could take advantage of a tremendous number of cores to speed up processing," said Wentzlaff.
The Piton chip's design focuses on exploiting commonality among programs running simultaneously on the same chip. One method to do this is called execution drafting. It works very much like the drafting in bicycle racing, when cyclists conserve energy behind a lead rider who cuts through the air, creating a slipstream.
At a data center, multiple users often run programs that rely on similar operations at the processor level. The Piton chip's cores can recognize these instances and execute identical instructions consecutively, so that they flow one after another, like a line of drafting cyclists. Doing so can increase energy efficiency by about 20 percent compared to a standard core, the researchers said.
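The article describes execution drafting only by analogy, so the following is a toy model rather than Piton's pipeline logic. A hypothetical `draft_schedule` function walks two threads' instruction streams and, whenever their next instructions are identical, issues them back to back, counting each such pair as one "drafted" instruction. Real hardware would compare much more than an opcode name, and the resynchronization rule used here (let thread A run ahead until the streams match again) is an arbitrary simplification.

```python
def draft_schedule(thread_a, thread_b):
    """Interleave two instruction streams, issuing identical next instructions
    consecutively ("drafting"). Returns the issue order and the drafted count."""
    schedule, drafted = [], 0
    i = j = 0
    while i < len(thread_a) or j < len(thread_b):
        a = thread_a[i] if i < len(thread_a) else None
        b = thread_b[j] if j < len(thread_b) else None
        if a is not None and a == b:
            # Identical instructions: the follower rides in the leader's slipstream.
            schedule += [("A", a), ("B", b)]
            drafted += 1
            i += 1
            j += 1
        elif a is not None:
            schedule.append(("A", a))  # streams differ: let A run ahead
            i += 1
        else:
            schedule.append(("B", b))  # A is finished: drain B
            j += 1
    return schedule, drafted

sched, n = draft_schedule(["add", "mul", "load"], ["add", "load"])
print(n)      # 2
print(sched)  # [('A', 'add'), ('B', 'add'), ('A', 'mul'), ('A', 'load'), ('B', 'load')]
```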
A second innovation incorporated into the Piton chip regulates when competing programs access memory that sits off the chip. Called a memory traffic shaper, this function acts like a traffic cop at a busy intersection, considering each program's needs, adjusting memory requests, and waving them through appropriately so they do not clog the system. This approach can yield an 18 percent performance jump compared to conventional allocation.
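The article does not say how Piton's shaper decides which requests to wave through, so this sketch substitutes a common, generic rate-limiting technique, a per-program token bucket, purely to illustrate the traffic-cop idea. The class name, rates and burst size are all assumptions. Each program earns request tokens at a fixed rate; a greedy program that tries to flood memory is held to its share, while a light program is never blocked.

```python
class TokenBucketShaper:
    """Per-program token bucket: each program earns `rate` memory-request
    tokens per cycle, up to `burst`. A request issues only if a token is available."""

    def __init__(self, rates, burst):
        self.rates = dict(rates)
        self.burst = burst
        self.tokens = {prog: float(burst) for prog in self.rates}

    def tick(self):
        # Called once per cycle: refill every program's bucket, capped at burst.
        for prog, rate in self.rates.items():
            self.tokens[prog] = min(self.burst, self.tokens[prog] + rate)

    def try_issue(self, prog):
        if self.tokens[prog] >= 1:
            self.tokens[prog] -= 1
            return True
        return False  # request must wait for a later cycle

shaper = TokenBucketShaper({"greedy": 0.5, "light": 0.5}, burst=2)
issued = {"greedy": 0, "light": 0}
for cycle in range(100):
    shaper.tick()
    for _ in range(4):  # the greedy program tries 4 requests every cycle
        issued["greedy"] += shaper.try_issue("greedy")
    if cycle % 4 == 0:  # the light program asks once every 4 cycles
        issued["light"] += shaper.try_issue("light")
print(issued)  # {'greedy': 51, 'light': 25}
```

The greedy program gets its initial burst and then one request every other cycle, while every request from the light program goes through.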
The Piton chip also gains efficiency through its management of memory stored on the chip itself. This memory, known as the cache memory, is the fastest in the computer and is used for frequently accessed information. In most designs, cache memory is shared across all of the chip's cores. But that strategy can backfire when multiple cores access and modify the cache memory. Piton sidesteps this problem by assigning areas of the cache and specific cores to dedicated applications. The researchers say the system can increase efficiency by 29 percent when applied to a 1,024-core architecture. They estimate that these savings would multiply as the system is deployed across millions of cores in a data center.
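The benefit of dedicating cache areas to applications can be shown with a generic cache-partitioning sketch. This is not Piton's actual mechanism: the hypothetical `PartitionedCache` is fully associative with LRU replacement, and the partition sizes are arbitrary. Because each application can only fill and evict lines in its own partition, a streaming application cannot push a database's hot data out of the cache.

```python
from collections import OrderedDict

class PartitionedCache:
    """Toy fully associative LRU cache split into per-application partitions."""

    def __init__(self, partition_sizes):
        self.sizes = dict(partition_sizes)
        self.parts = {app: OrderedDict() for app in self.sizes}

    def access(self, app, addr):
        """Return True on a hit, False on a miss (after filling the line)."""
        part = self.parts[app]
        if addr in part:
            part.move_to_end(addr)  # mark as most recently used
            return True
        if len(part) >= self.sizes[app]:
            part.popitem(last=False)  # evict this app's least recently used line only
        part[addr] = True
        return False

cache = PartitionedCache({"db": 2, "stream": 2})
cache.access("db", 0xA)
cache.access("db", 0xB)
for addr in range(100):  # a streaming app touches 100 lines it never reuses
    cache.access("stream", addr)
print(cache.access("db", 0xA))  # True: the database's line survived
```

With a single shared LRU cache of the same total size, the streaming loop would have evicted both of the database's lines.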
The researchers said these improvements could be implemented while keeping costs in line with current manufacturing standards. To hasten further developments leveraging and extending the Piton architecture, the Princeton researchers have made its design open source and thus available to the public.
"We're very pleased with all that we've achieved with Piton in an academic setting, where there are far fewer resources than at large, commercial chipmakers," said Wentzlaff. "We're also happy to give out our design to the world as open source, which has long been commonplace for software, but is almost never done for hardware."