“Al Gore Invented the Internet” is the vampire of memes. Despite having been debunked by both Vint Cerf and Leonard Kleinrock, that not-quite-a-claim still arose when Cerf, a.k.a. “one of the fathers of the Internet,” was introduced at a recent public lecture at UCLA. Cerf, dressed to kill (vampires) in a three-piece suit complete with silk pocket square (but sans wooden stake), once again struck it dead.
According to Cerf, Gore was “pivotal” in creating the commercial Internet we know today. Gore championed the High Performance Computing Act of 1991, which Cerf described as “legislation which funded the expansion of the NSF Net and which also permitted commercial traffic to flow on government sponsored backbones.”
The Act also created the Networking and Information Technology Research and Development (NITRD) Program, which coordinates the efforts of 15 U.S. federal agencies, such as the NSF and NIST, to fund research into advanced information technologies.
Without Gore’s initiative, says Cerf, the private sector “would not likely have gone forward with any investments.” The expansion meant that businesses didn’t have to build a national-scale backbone to find out if “there was a business in serving the Internet capability on a commercial basis,” says Cerf. Once the business community realized the potential, the gold rush was on.
So, no, you shouldn’t joke about Gore “inventing” the Internet. But maybe you can blame everything from Facebook to TMZ on him.
After clarifying Gore’s role in the origin of the commercial Internet, Cerf reminded the audience that the Internet of the future is likely to be very different from what we know today. North America currently has nearly 80% user penetration; Asia, with its far larger population, has only 26%. “If you’re interested in going into business with Internet-based applications on a global scale, these numbers and their growth rates are important to you,” says Cerf. “The users of the network are going to color its culture, its flavor, its interests, its policies, its rules, and regulations.”
If that isn’t enough to guard you against complacency, Cerf had this unsettling comment about that victory over PIPA and SOPA we all celebrated: “The only reason that the various legislators backed away from PIPA and SOPA is because it was an election year, and the topic had become toxic. You can’t rely on that. If this had been 2011 or 2013, there’s no guarantee that the action taken by this groundswell would necessarily have had the effect that it did.”
The slayer warns this vampire will arise again. “The problem is severe. It is going to come back. If we don’t want the Net to be suppressed, if we don’t want speech to be curtailed, we’re going to have to figure out how to deal with this,” says Cerf. “We can’t say to the Intellectual Property community, ‘This is not a problem, buzz off.’ Because it is a problem. Piracy is real. We need to think our way through to find a way to monetize these works, without harming the freedom and openness of the Internet.”
Another of Cerf’s forward-looking themes is that we have begun to live in an era in which Metcalfe’s Law applies to data as well as to networks. (The reference to Metcalfe is mine, not Cerf’s.) I don’t mean the precise formulation (the value of a network is proportional to the square of the number of connections), but the underlying concept: an increase in size creates a far more powerful network. (Readers: If I’ve overlooked a pre-existing axiom that expresses this idea for data, please fill me in in the comments. Thanks!)
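To make the intuition concrete, here’s a toy sketch (my illustration, not Cerf’s, and not a real valuation model): the number of potential pairwise connections among n nodes, or by analogy n data items, grows roughly with the square of n, so a tenfold increase in size yields roughly a hundredfold increase in potential connections.

```python
# Toy illustration of the Metcalfe-style intuition (my analogy, not Cerf's):
# potential pairwise links among n nodes grow as n(n-1)/2, roughly n^2 / 2.

def potential_links(n):
    """Number of distinct pairs among n nodes (or, by analogy, n data items)."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, potential_links(n))
# Prints 45, 4950, 499500 -- each 10x growth in size yields ~100x more pairs.
```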
The Internet has produced, in Cerf’s words, a “huge corpus of material,” a dataset unprecedented in size and scope. Unleash a Bayesian algorithm on the dataset, and you get, to use one of Cerf’s examples, much better machine translation because there’s a greater body of real-world samples for the algorithm to learn from.
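To see why a bigger corpus helps, here’s a deliberately simplified sketch (mine, not Google’s translation system): estimate word-translation probabilities by counting observations in a parallel corpus. With a handful of examples the estimates are little better than coin flips; with many more, the relative frequencies settle toward something a translator could actually use.

```python
from collections import Counter

# A deliberately simplified sketch (not Google's system): estimate
# word-translation probabilities by counting observations in a parallel corpus.

def translation_probs(aligned_pairs):
    """aligned_pairs: list of (source_word, target_word) observations."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    return {(src, tgt): n / source_counts[src]
            for (src, tgt), n in pair_counts.items()}

tiny_corpus = [("chat", "cat"), ("chat", "chat room")]   # 2 observations
big_corpus = tiny_corpus + [("chat", "cat")] * 98        # 100 observations

print(translation_probs(tiny_corpus)[("chat", "cat")])   # 0.5  -- a coin flip
print(translation_probs(big_corpus)[("chat", "cat")])    # 0.99 -- far more confident
```

The point is simply that the reliability of the estimates comes from the volume of real-world samples, not from any cleverness in the counting.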
Another example of what large datasets may ultimately provide for machine learning was just published on arXiv and reported in The New York Times. Google researchers used the “huge corpus” to train a neural network to accurately recognize images of cats. (Much current image recognition still relies on metadata: a human-supplied tag indicating that the picture shows a cat.) Still, it will surprise none of you that more than one academic expert feels (off the record) that Google is “over-selling” its results.
Andrew Y. Ng, leader of Google’s “Deep Learning” team, is a scheduled guest lecturer at the graduate summer course on deep learning that began July 9 at UCLA’s Institute for Pure and Applied Mathematics (IPAM). If you’re unable to attend, IPAM usually posts videos from its courses.
During the Q&A session after Cerf’s talk, the notion that data might overwhelm even Google peeked through. Surprisingly, Cerf said, he wasn’t familiar with Google’s much-derided decision to drop the + operator for searches.
When I asked why Google never employed the Boolean “near” operator offered by AltaVista, the early darling of search engines, Cerf replied that a lot of people had asked why Google never used Boolean operators.
“The argument against it is two-fold. First, there’s a small portion of the population that uses Booleans,” he explained. “We were looking for something that was dead easy for everybody. We also hoped, frankly, that our ranking procedures would substitute for some of the complexity that a Boolean search would generate.”
Cerf added that there’s also a scaling problem. “We’d have to create large scale subsets of the corpus of material and then merge them. We have so much stuff that even with our significant computing capabilities, that might not have been feasible for the scale we’re talking about.”
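For the curious, here’s a rough sketch of what that merging looks like (my simplification, not Google’s internals): a Boolean AND means intersecting per-term result sets, and a “near” operator additionally requires storing and comparing word positions, so every query multiplies the amount of index data that has to be pulled together and merged.

```python
from collections import defaultdict

# A rough sketch (my simplification, not Google's internals) of Boolean and
# proximity search over an inverted index with word positions.

def build_index(docs):
    """Map each term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def boolean_and(index, t1, t2):
    # The merge step Cerf alludes to: intersect the two per-term result sets.
    return set(index[t1]) & set(index[t2])

def near(index, t1, t2, window=3):
    # Positional merge: keep docs where the terms occur within `window` words.
    hits = set()
    for doc_id in boolean_and(index, t1, t2):
        positions1, positions2 = index[t1][doc_id], index[t2][doc_id]
        if any(abs(p1 - p2) <= window for p1 in positions1 for p2 in positions2):
            hits.add(doc_id)
    return hits

docs = {
    1: "vint cerf spoke about the internet at ucla",
    2: "the internet pioneer vint cerf discussed search engines",
    3: "altavista offered a near operator for boolean search",
}
index = build_index(docs)
print(boolean_and(index, "vint", "internet"))  # {1, 2}
print(near(index, "vint", "cerf", window=1))   # {1, 2}
```

On three toy documents the merges are trivial; on a web-scale corpus the per-term result sets for common words run to enormous lengths, which is presumably the cost Cerf is describing.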
As someone who still laments the demise of AltaVista, I’d love to hear from any of you who have the background to confirm or possibly dispute that explanation.