Intel's RISC-y Business

by apatrizio ‎19-09-2011 06:00 AM - edited ‎03-10-2011 01:09 PM

For the better part of a decade, Intel has been nibbling and gnawing away at the Unix server market that RISC-based servers owned – lock, stock, and SAN – just 15 years ago.

Back in the mid-1990s, the Unix business was thriving. Sun was the big shot with its SPARC-based servers running Solaris; IBM was on its tail with AIX running on POWER-based servers; HP was in the mix with HP-UX running on PA-RISC; and SGI was the darling of Hollywood after its IRIX-based MIPS machines brought Jurassic Park to life.

Today, Oracle owns what's left of Sun, a one-time $18 billion-a-year company. SGI exists in name only, after Rackable bought what was left of it and assumed its name. Google now occupies SGI's former Mountain View offices. Big iron is less than 10% of IBM's total business, and much of that is x86 servers; the PA-RISC chip has been reincarnated as the Itanium.

And Intel? It just kept chugging along, improving on the Xeon with each generation. As Linux vendors made Linux increasingly viable on the server (aided in part by IBM, SGI, and Sun, who bowed to the inevitable a decade ago), Linux-based Xeon servers took more and more share away from the Big Four Unix vendors. All four tried to make the switch to Xeon/Linux servers, with varying degrees of success.

Today, IDC puts Xeon-based Linux at about 90% of the *nix server market. The last 10% is for Oracle, HP and IBM, SGI's IRIX having long since gone extinct.

RISC machines have retreated to the highest ground of computing, the five 9s domain (99.999% uptime) of mission-critical computing, and Xeon has thus far been unable to follow. With its latest Xeon, though, Intel appears to believe it can break into the RISC market.
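
To put that in perspective, here is a minimal sketch of the arithmetic behind "nines" of availability. It assumes five nines means 99.999% uptime and that the downtime budget is simply the unavailable fraction of the period; the function name `downtime_budget_seconds` is hypothetical, chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(nines: int, period_days: float) -> float:
    """Maximum downtime, in seconds, allowed over `period_days`
    at the given number of nines (5 nines = 99.999% uptime)."""
    unavailability = 10.0 ** -nines  # e.g. 5 nines -> 0.00001
    return period_days * SECONDS_PER_DAY * unavailability


# Compare a few availability levels over a 30-day month and a 365-day year.
for n in (3, 5, 7):
    per_month = downtime_budget_seconds(n, 30)
    per_year = downtime_budget_seconds(n, 365)
    print(f"{n} nines: {per_month:.2f} s per 30 days, {per_year:.2f} s per year")
```

At five nines the budget works out to roughly 26 seconds per 30 days, or a little over five minutes per year. Each extra nine cuts that budget by a factor of ten.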

Taking a RISC

The Xeon E7 line is the successor to the 7500, introduced last year. Developed under the codename “Westmere-EX,” it's a 10-core chip with Hyper-Threading, so it can power through 20 threads at a time. Whereas the Xeon 5x00 line, the staple of the Xeon family, targets dual- and four-socket systems, the E7 is aimed at systems with eight sockets or more. SGI and Cray, for example, have 256-socket machines.

The Xeon 7500, a.k.a. Nehalem-EX, was a huge leap over the prior generation, as much as three times faster. That's because it was completely re-architected from the ground up. The E7 is an incremental change over Nehalem-EX, but it still delivers anywhere from 20% to 42% more performance than its predecessor.

With the launch of the E7 earlier this year, it seemed Intel was finally ready to make its final push, calling out RISC by name. “The days of IT organizations being forced to deploy expensive, closed RISC architectures for mission-critical applications are nearing an end,” said Kirk Skaugen, vice president and general manager of Intel's Data Center Group, in a statement announcing the E7 line.

Bold words. Can the E7 really dethrone UltraSparc/Power/PA-RISC and, of course, Intel's own Itanium processors? Intel thinks so.

“From our perspective, the Power/Sparc/mainframe market still represents a $15 billion dollar server spend, so whether you choose to deploy a Xeon or Itanium, we have solutions to address this market. Will Xeon pull some sales from it? Sure, but at the same time we're viewing this as having a portfolio to go after the Power, Sparc, and mainframe markets,” said Patrick Buddenbaum, mission critical marketing director at Intel.

The Xeon and Itanium share a common chipset and common memory buffer, and their RAS (reliability/availability/serviceability) capabilities are almost the same. So it really is up to the OEM in terms of system implementation and what they choose to build around it, said Buddenbaum.

"I do think it's going to be a progression over time where we will see bigger, more robust systems coming out," he said.

But HP is going slowly in that regard. It has its Xeon line in the ProLiant family, which runs Linux and Windows Server; it also has two lines of mission-critical Itanium-based Integrity servers, the Non-Stop and the Superdome, which run HP-UX, HP's very mature Unix OS. A lot of what starts in the Integrity servers migrates down, notes Jim Lofink, cross-portfolio product marketing manager for HP Integrity.

"A lot of Itanium functionality is cascading down to Xeon. So instead of being completely different environments, we look at Itanium being a more high-end environment to Xeon rather than being from two different families," he said.

"If you look at Integrity products and ProLiant Xeons, a lot of functionality cascades from Itanium down to Xeon. The PREMA architecture on ProLiants started on Itanium side. Instance management capabilities started on Integrity side," said Lofink.

More Than Chips

It really does fall to big iron vendors like HP to bring their infrastructure to the E7, because at the chip level, for all intents and purposes, "it's pretty much equivalent [to Itanium]," said Buddenbaum.

"If you run through every RAS feature, Itanium may have a few more capabilities. But from an end-user perspective, the difference would not be very noticeable. So much of that ties back to OEM implementation and how they bring their platform to market," he said.

Jed Scaramella, research manager for servers at IDC, agrees. "You still do have to design your system around the CPU. There's still the management and the redundancy to be built around it. There is the operating system. That's a big part of it. People still consider Unix on RISC a more stable environment than Linux on x86," he said.

IDC has noted that RISC sales have increased for the last two quarters while x86 is trailing off slightly. Scaramella noted that when the economy began showing signs of recovery, x86 servers were the first to pick up steam because they tend to be cheaper and an easier deployment choice.

RISC systems tend to be bigger, scale-up systems, often used in consolidation of older hardware. It takes longer to approve the purchase of a million-dollar server than a $3,000 two-socket Xeon server, Scaramella explained. So cheaper, smaller x86 servers were the first to start selling.

Most Unix-based RISC systems deployed today are sold to existing customers, but there are occasional new customer wins here and there. By and large, though, these systems are being used in 10-to-1 consolidations where old systems are retired and virtualized systems deployed.

So Scaramella doesn't expect Xeon, even the E7, will displace RISC any time soon. "These products are slow-moving because [IT organizations] are risk-averse. They aren't going to move their systems over any time soon. They like to keep their workloads siloed," he said.

Slowly But Surely

But Lofink thinks customers are warming, slightly, to non-RISC environments. "While there are more and more customers who want to put more mission-critical needs in Xeon environment, there are still customers who for business purposes cannot support mission-critical needs in an x86 environment," he said.

"There are some things on hardware beyond the chip for mission-critical computing," added Lofink. "The Superdome crossbar fabric provides additional hardware resiliency not available on our Xeon servers. Our Non-Stops have triple redundancy so you never go down. Integrity is one step below that. The next level down there is Xeon, with good enough availability."

But would HP make those Non-Stop hardware features and HP-UX available for the Xeon E7? Not likely. "We have no plans for an HP-UX port. The rest depends on market needs. Would our customers want HP-UX on a Xeon or would they want to go to Windows? We have customers who go both ways. I can't say anything either way," he said.

But today, the full high-reliability ecosystem isn’t there, he added, and if the world is going in the x86 direction, a Xeon environment has to have the full ecosystem to support all mission-critical needs.

It's not just reliability; it's also reporting. The ability to go back and debug crashes by getting memory dumps is easy on a Unix or mainframe system, less so on a Linux system. "If there is a crash you get dumps on everything [on Unix]. Will those come to x86 in time? Yes, but it's not there today," said Lofink.

Buddenbaum said that despite whatever claims Oracle has made about the end of the Itanium, Intel is committed to the chip. "We remain firmly committed to the Itanium architecture and are pleased OEMs are bringing new solutions to the market," he said. "At the high-end RISC market, we spend a lot of time working with the ISV community. I spend my time equally between OEMs and the ISV community, making sure we have platforms ready and making sure they are all focusing their innovation on Xeon and Itanium."

Scaramella believes servers are becoming more specialized and general purpose servers are going away. "They put a lot of design into these systems. We call them general purpose but the servers are becoming less general. There has been some specialization creeping into the market for some time, whether it's scale out, SMB, mission-critical or high-performance. General purpose is a misnomer now," he said.

And RISC is here to stay, he adds. "We think x86 will creep up a little by little but it won't gain by leaps and bounds – and non-x86 won't go away for a long time."

Comments
by Bart G(anon) on ‎19-09-2011 02:29 PM
"five nines" is 99.999%, not 99.99999%.
by Zachary Stern(anon) on ‎19-09-2011 04:18 PM

Technically correct, Bart, but there isn't much difference.

99.999% availability means no more than 25.92 seconds of downtime per 30-day period. At that point, the exact times are irrelevant. If you have any downtime at all, you've lost 5 nines, at either standard.

by Name(anon) on ‎19-09-2011 04:24 PM
"Can the E7 really dethrone UltraSparc/Power/PA-RISC and, of course, Intel's own Itanium processors?" This implies that Itanium is RISC, when it is the absolute opposite: it is VLIW (very long instruction word) and not RISC in any way. That said, the article is right that Xeons and other processors will eat into the other CPU lines. Performance is such that actual architectures don't matter much, so you might as well use the mass-produced CPUs.
by Chris(anon) on ‎19-09-2011 05:09 PM

There is a 5 minute difference between 99.999 and 99.99999 per year. That could equate to "cpu exploded, quick replace it and reboot"

Quite a bit of difference between 316 seconds and 3.16

by Doug(anon) on ‎19-09-2011 05:48 PM

The difference between 5 nines and 7 nines is huge.    Five is hard, even for telcos, seven is near-impossible.

by Friendly Admin(anon) on ‎19-09-2011 06:00 PM

Believe me, I can't wait until I can run much cheaper Linux on Intel, but for my web hosting bang for the buck SPARC is still where it's at, even at the new higher Oracle pricing. I just bought some SPARC T3 units which are 256 threads per processor, so sorry Intel only pushing 20 threads per proc still does not impress.

Web hosting is of course a special breed where we take full advantage of threads, so a SPARC box that's 3X the price of a Xeon box can handle about 8X the number of concurrent users per box, based on real-world testing of the same app I did on last year's Linux/Intel 8-core Xeon versus last year's Solaris/SPARC T2. That was a 16-thread 7500 versus a 64-thread T2. I think the gap between this year's 20-thread Xeon 7800 versus a 256-thread SPARC T3 would be widening, not closing, no?

by Joss(anon) on ‎20-09-2011 02:14 AM

Hope Intel's new CPU will be BIGENDIAN. All transmission equipment should be BIGENDIAN unless we redefine all 3GPP protocols.

by Wanderson(anon) on ‎20-09-2011 06:56 AM

The article provides some insights into the thinking of Intel, HP and others, according to their statements.

However, it does get confusing at times when the reference is - supposedly - X86 versus RISC, then switches to and mixes UNIX versus Linux with a mention of Windows thrown in, although I am uncertain as to what context.

Maybe an article dedicated to UNIX versus Linux, including BSD, and maybe a (realistic and credible) comparison of these with Windows Server, would help.

by alpha754293(anon) on ‎20-09-2011 11:39 AM

The question is how well will those systems handle when you throw 150,000 bank accounts at it per hour and tell it to calculate the interest.

The other half of the equation - the operating systems.

The big iron UNIXes are infinitely more secure than Linux. (I'm not even gonna go there with Windows. Too easy.)

When was the last time you heard about a big mainframe being breached?

It's actually quite amusing to watch a POWER5 crunch through the interest calculations on those accounts, while you see the Xeon boxes SWEATING; and the IBM is just like "what? Is that all you've got? ONLY 150k? psshhhh...."

The downside: AIX admins are a dying breed. At least with Solaris, you can play around with it and learn it on x86 systems so that when you transition into the workforce, you don't have to learn everything from scratch going to a SPARC system. SOME stuff changes, but not much, and not by much.

by Go BSD(anon) on ‎20-09-2011 12:03 PM

Linux will probably continue to struggle to have the security to replace these systems.

Unix members have turned toward the BSDs, or else work a little harder on hardening Linux. It is hard to get more secure than OpenBSD. And NetBSD can install on almost anything. FreeBSD is the most common and probably secure enough too. The BSDs are not hard to learn for an AIX/Sun admin.

Jared
http://www.rhyous.com

by tcobob1(anon) on ‎20-09-2011 02:31 PM

Does anyone have any thoughts on how Linux on IBMZ plays in this discussion, in particular, with mention of cheaper, what are some Total Cost of Ownership findings?

by Michael Mahon(anon) on ‎20-09-2011 08:17 PM

It is grossly incorrect to say that "7 nines" is essentially equivalent to "5 nines" availability.

5 nines availability is about 5 minutes of unavailability per year, and periods of years are exactly what high availability systems are about.

When a failure occurs, a high-availability system will perform an automatic recovery which may take milliseconds to minutes.  For a 5 nines system, if the MTTF is on the order of a month, then recovery must, indeed, be less than 20 seconds. If MTTF is on the order of a year, then recovery may take a few minutes.
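
As a rough sketch of that arithmetic, assume the usual steady-state model in which availability = MTTF / (MTTF + MTTR). The model and the function name `max_recovery_seconds` are assumptions made for illustration, not something taken from the article.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_recovery_seconds(availability: float, mttf_seconds: float) -> float:
    """Largest mean recovery time (MTTR) that still meets `availability`,
    assuming A = MTTF / (MTTF + MTTR) and solving for MTTR."""
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must be strictly between 0 and 1")
    return mttf_seconds * (1.0 - availability) / availability


five_nines = 0.99999

# One failure per month versus one failure per year.
monthly = max_recovery_seconds(five_nines, 30 * SECONDS_PER_DAY)
yearly = max_recovery_seconds(five_nines, 365 * SECONDS_PER_DAY)
print(f"MTTF of 30 days:  recover within about {monthly:.1f} s")
print(f"MTTF of 365 days: recover within about {yearly:.1f} s")
```

Under this model a monthly failure leaves roughly 26 seconds for recovery, and a yearly failure a little over five minutes. Either way there is no room for a human in the loop.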

Achieving 5 nines or better means employing significant redundancy so that errors can be detected rapidly and either masked (in the case of TMR systems) or rolled-back and re-run.  Human intervention is practically out of the question.

There are some applications where 6 or 7 nines of availability are desirable, but this level can only be reached with TMR (Triple Modular Redundancy) and self-checking systems.  While this technology is pretty well developed for CPUs, getting the I/O subsystem and networks to this level is a work in progress.

In summary, two orders of magnitude is never "irrelevant".  Just admit that you made a mistake.

-michael

 


The HP Input Output site is sponsored by HP and features articles and content from HP and third-party contributors. Third-party articles and content, while paid for by HP, do not necessarily represent the views and opinions of HP. HP does not endorse this content and is not responsible for its accuracy, availability and quality.
