WASHINGTON - The Bush administration will soon be packed and gone, but part of its legacy will live on in cyberspace.
A consortium of government and nonprofit agencies plans to capture snapshots of every federal government Web site before Jan. 20, when the next president moves into the White House and starts remaking the federal bureaucracy to fit his agenda. The goal of the 2008 "end-of-term harvest" is to preserve millions of agency records in an online archive that librarians hope will provide a valuable trove for historians, government scholars and the public.
The need for such an archive is greater than ever, librarians say. Many federal agency records exist only in digital form and are in danger of disappearing when the administration changes. Digital records are telling, they say, because an administration's policy priorities are often reflected in the face it presents to the world online.
"These sites provide a record for the future about the workings of the government in this day and time," said Martha Anderson, a top official with the Library of Congress's National Digital Information Infrastructure Preservation Program, which will lead the archiving effort. "Most government information today is published on the Web and not published in paper."
Kris Carpenter, director of the Web group at the Internet Archive, one of the nonprofit partners, said the project will capture an estimated 125 million pages from federal Web sites. She said the group plans to take one snapshot of those pages in late August, a second two or three weeks before the November election and a third shortly before the presidential inauguration in January.
The entire trove, which experts say can be stored on servers the size of a standard file cabinet, should be available free on the group's Web site beginning in February, she said.
"In the past 10 years, there has been a radical change in the way that information is published by the government to its constituents," Carpenter said. "The majority of information is actually made available through these Web sites. But they are changing very rapidly as new information is introduced, and especially as administration changes occur."
Programs such as President Bush's No Child Left Behind initiative are unlikely to maintain a high profile under the next president, Anderson said. Capturing the information about that program on the Education Department's Web site could be a valuable resource for researchers looking at Bush's record, she said.
Other officials involved in the project said an agency that deserves special focus is the Department of Homeland Security, which was created in 2003 in response to the Sept. 11, 2001, attacks.
"These things become important for public policy [makers]," Anderson said. "They also become fodder for history books eventually."
The San Francisco-based Internet Archive will use powerful computer software to automatically capture and store material from federal Web sites. Other partners include the California Digital Library and the University of North Texas Libraries, which will focus on in-depth looks at specific agencies; and the Government Printing Office, which maintains paper records in the Federal Depository Library Program and will offer curating advice.
Retrieving information after it has vanished is difficult, said Suzanne Sears, head of the Government Documents Department at the University of North Texas Libraries, one of nearly 1,300 federal depository libraries across the U.S.
"When Bush took over, the second that he was sworn in, the White House Web site went from being this massive collection of important links to a picture of Bush and his biography and a picture of [Vice President] Cheney and his biography," Sears said. "We don't want to see that happen again. . . . It's just very important for history's sake that this material is archived the way that the printed material has always been archived."
The National Archives and Records Administration collected federal Web site records at the end of the Clinton administration in 2000 and again in 2004 at the end of Bush's first term. But the comprehensive records are not easily accessible to the public online, said Susan Cooper, a spokeswoman for the Archives.
The Archives does not plan to capture all agency Web sites again, Cooper said, noting that it will work with agencies to determine which Web records should be preserved.
© 2008 The Washington Post