Where is the Digital Library?
Judith Boettcher [JB]
Howard Strauss [HS]
Cliff Lynch [CL]
September 28, 2000
JB: Welcome to the CREN Tech Talk series for Fall of 2000 and to this session on "Where is the Digital Library?" with Cliff Lynch. You are here because it's time to discuss the core technologies for your future campus. This is Judith Boettcher, your CREN host for today, and a special thanks to our CREN member institutions for their support of this series.
Let me welcome Howard Strauss of Princeton as the technology anchor for Tech Talk. Howard is a well-known web technology expert and portal expert. Howard?
HS: Thank you, Judith. I'm Howard Strauss, the technology anchor for the Tech Talk series of technology webcasts. In this webcast, I invite you to join Judith and me in a lively technical dialogue with our guest expert, Cliff Lynch, that will answer the questions you'd like answered and ask those very important follow-up questions. You can join in this dialogue by sending your questions via e-mail to expert@cren.net anytime during this webcast. If we don't get to your questions during the webcast, we'll provide an answer in the webcast archives.
In 1943, Thomas J. Watson, the founder of IBM, said, "I think there is a world market for maybe five computers." Before you fall over laughing at this bad prediction, it should be remembered that the ENIAC wasn't built until three years later, and that Watson's prediction in fact held true for ten years.
Just two years later, in 1945, Vannevar Bush, Director of the Office of Scientific Research and Development, in the article "As We May Think," described a futuristic device called the Memex. A Memex, said Bush, "is a device in which an individual stores all his books, records and communications and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged, intimate supplement to his memory." In the Memex, Bush said, would be books of all sorts, pictures, current periodicals and newspapers. And there is provision for direct entry, he said, so that individuals can add personal annotations to the material. Bush's Memex, described two years before the first digital computer was built, is a fairly good prototype of the digital libraries we have been trying to build ever since then.
While we have collected and digitized many petabytes of data from books, images, sounds and movies, there are still many fundamental problems to be solved before all of us have access to a device like the Memex. Although we traditionally think about a library as a large building in town or on campus that we can spend many pleasurable hours visiting, a digital library just exists on some kind of magnetic or optical storage medium. To use a digital library, we need some means to search it, access it and render the bits in a format that serves our purpose, not to mention some means to get all the information legally digitized and stored there.
Traditional libraries have vast and growing collections of books so it is convenient to access everything you want in one place. In fact, the bigger the library, the more prestigious and desirable the university or municipality thinks it is. Given humans' propensity to collect stuff and the fact that books have been available for centuries, it is no wonder that our physical libraries are bulging at the seams. Will all physical libraries soon be replaced by Memex-like devices that are part of our cell phones and PDAs? Should we immediately halt our library construction projects and put the money into buying petabytes of disk storage and fiber optic networks?
The online article, "The Transformation of the Public Library," imagines Andrew Carnegie coming back to tour the nation's 8,929 public libraries and being amazed at all the changes. If Andrew comes back again in five or ten years, what will he see then? Will we even have libraries to visit? And will he find digital librarians instead of real ones? Many institutions are already addressing the issues of building and accessing digital libraries on many fronts, but predicting where we are going and how best to get there is always problematic. Nonetheless, our intrepid expert, Cliff Lynch, will consult his personal Memex to help us better understand digital libraries on today's webcast of Tech Talk. Judith?
JB: Thank you very much, Howard, and I can't wait for a Memex that really is easily accessed and searched and I can find-even find things, you know.
HS: Actually, we have-by the way, we have a link to the original article on the website.
JB: You know, that is great. In fact, I went out and printed it out too, and I was amazed. And to remember that it was printed in 1945, you know. Can you-can we imagine? Most everything I have, I throw away after it gets 12 months old, but 1945, it's longer ago than most of us think about right now. With that, let me move into introducing Cliff Lynch who is here with us today. Cliff is the Executive Director of the Coalition for Networked Information which is jointly sponsored by the Association of Research Libraries and Educause. CNI is a nonprofit organization focused on effective use of networked information to enhance scholarship and intellectual productivity. Prior to joining CNI three years ago, Lynch was at the University of California Office of the President for 18 years, most of that time as Director of Library Automation. More about Cliff is at the site with some interesting links, including a free report online called "The Digital Dilemma: Intellectual Property in the Information Age." Welcome, Cliff. Thanks so much for joining us here on Tech Talks.
HS: Okay, Cliff, since we are not sure exactly what a digital library is, maybe you could start out by telling us what a digital library is and what people are going to be able to do with a digital library.
CL: Okay, I think that's an excellent place to start, especially because there's not a lot of consensus around exactly what a digital library is. You know, I was very entertained to hear you mention Vannevar Bush's article because, in fact, there is a whole prehistory which actually goes back quite a long time before Vannevar Bush in trying to envision these kinds of things that today we think of as digital libraries, going way back into the '20s and '30s.
HS: And the web only goes back so far, Cliff!
CL: Yeah! Let me at least say just a few things about digital libraries and what we kind of mean by that or what various people mean by that. I believe I first heard the term "digital library" sometime in the mid-eighties from Vint Cerf when he was talking about some of the new applications that the-at that time-really still just emergent Internet would usher in.
And digital library is one of these wonderful phrases. On the one hand, it's terribly resonant because it brings in all of the kind of cultural stuff that comes with libraries and maps it onto the networked environment. At the same time, it's sort of an oxymoronic term as well. When we use the term "library," we really take it to mean one of three things. A building, a collection, or an organization that manages a building and a collection. And when you start talking about digital collections, that makes sense, but in a digital library-it doesn't make sense when you use "library" in the organizational or place sense. Now, having said that, I guess the kind of working definition I use for digital library is it's a system that houses a sort of a coherent and managed collection of information to support some purpose or end, and that gives you tools for working with that. It's worth saying that there's still very much of an open issue about how passive or active a digital library is.
HS: What do you mean by passive or active?
CL: You think of a physical library with collections of materials, basically, you go there to read. And certainly, you can emulate that in a digital library. On the other hand, there's a notion which also emerged in the late '80s called the collaboratory, which is a place where researchers can get together on the net, control scientific apparatus, analyze data, work together, author materials. Things like that. And you can see a kind of digital library that moves beyond just a place to read and is a place to actively engage data, refine it, create new information as well. So you really can find digital library systems that kind of run the spectrum from rather passive stores of information all the way through things that start looking a lot more like collaboratories.
Let me say just a couple other things to kind of put digital libraries in context. What we're seeing, one of the sort of key questions is how do digital libraries relate to existing libraries as organizations, be they public libraries, be they academic libraries that are part of universities, whatever? Many, many institutions now are building extensive electronic collections, extending and somewhat supplanting their print collections. And these can be thought of as digital libraries within a library-as-organization. But not all digital libraries have their roots in historic library organizations. For example, we're finding digital libraries to support science, to support various kinds of scholarly inquiry. Much of that's being put together by working scholars, by scientific organizations, by professional societies.
HS: But is that done under the auspices of a library or just totally outside?
CL: Totally outside, and in some cases, these things may then be marketed to libraries on a commercial basis or they may be-they may just be sort of sitting out there supported by scientific communities. It's also worth pointing out, some of the earliest things that I think of as digital libraries are, in fact, commercial. To give you just one good example, there's a system called Lexis which is a major, major commercial system for attorneys and this has got all the legislation, all the case law, Law Review articles, things like this. It's virtually impossible to do law in the United States today without access to this system, and it's a very expensive commercial system which is used by most law offices. They in fact give it away to law students to get them familiar with it early in their law training. But that's a purely commercial operation.
HS: What are the things that are pushing us to go from books to bits? Why the push to do that? Is it just because we can do it now?
CL: Well, I think there's two things. There's a push, if you will, and there's a pull. The push is that libraries as physical things are really expensive. It's expensive to print paper, it's expensive to ship it around. Paper deteriorates, people razor-blade articles out of journals. Paper is really expensive stuff to manage and it's expensive to build buildings to house it. So there's a lot of sort of efficiency and greater ease of access that's part of the push.
HS: So are you saying that there's some strong economic reasons for going to digital libraries, that digital libraries somehow are cheaper than physical libraries?
CL: Not necessarily cheaper, but certainly they displace a lot of costs. I think the jury is still out about are they truly cheaper? Certainly, they seem to promise more service at the same level. For example, a digital library never needs to be closed the way a physical library is. You have the convenience of consulting it at a distance. All of those things are very attractive. But let me say a word about the pull because I think it's easy to overlook this.
I think it's a terrible mistake to think about digital libraries simply as places where paper is translated to bits. If you talk to scholars and to authors, many of them are really, really excited about the capabilities of authoring for the digital media, of being able to create very rich, complex, interactive multimedia works that can be updated and expanded incrementally, that can include models, that can be linked up to databases, that can interconnect with other people's works. And so I think there's a tremendous amount of excitement, particularly in the scholarly world, about how as we author works that are intended and designed for the digital medium, we have a much richer set of abilities to communicate and convey information than we did when we were limited to the printed page. And as we see this sort of authorship for the digital medium growing, that is the pull that brings us to digital libraries because that material cannot be reduced to paper, in many cases, without losing a tremendous amount of content.
HS: So when you talk about digital libraries, you're talking about storing all kinds of stuff other than books. Are there other things-I mean, do we have digital libraries for movies and sounds and are there folks trying to do that kind of thing?
CL: Oh, yes. In fact, there are extensive digital libraries of visual materials. Museums are very aggressively, in some cases, moving to make digital images of stuff in their collections available in digital libraries and there are some consortia within the museum world that are moving this agenda along. There are a few sound archives, digital libraries of sound and music, and recently we've seen a few very exciting developments in the video area.
For example, there is a project which I'm pretty sure you have a link to on the page called Survivors of the Shoah, which is a project that Steven Spielberg has been funding. And what they have done is they have tracked down several thousand Holocaust survivors, built up biographies, interviewed them on video and done very good transcripts of this. But they've also digitized the video, so they actually have a database now which I believe is around 160 terabytes in size of these thousands of hours of video interviews. And, you know, this is just one sort of specific, highly-focused historical and cultural study.
JB: So what you're describing, Cliff, is not, you know, taking our existing libraries and making them digital as much as what you've been talking about now is what the new digital-what new categories and new types of both libraries and authorship we're going to see in the future.
CL: Yes, and it's not just authorship. Data plays an important role here as well. For example, one of the digital library projects that was funded under the first round of National Science Foundation NASA funding back a few years ago was a project called Alexandria at the University of California at Santa Barbara. And what the Alexandria digital library was focused on was geospatial information so they combined digital analogues of topographic maps, aerial photos, remote sensing data. They used tools like gazetteers to provide access to it and primarily, what that database was about was trying to manage geospatial information.
HS: Okay, even though you said this is a lot more than books, is this kind of thing going to get rid of printed versions of books?
CL: Well, I don't believe that we're going to - personally, I don't think we're going to see the rapid demise of the printed book.
HS: But I mean in libraries. I think that I - I mean, I personally will always carry around a paperback book with me or [inaudible].
JB: You think?
HS: I think I will always do that!
JB: I don't know, I was just talking with someone else about this recently and about the idea of, you know, are we really - or are we going to see the development of other little digital appliances? Are they going to really get past some of the barriers of the screen development and the battery development and all the rest of that? So that it means rather than carrying one book, you can carry five.
CL: Well, you know, don't underestimate that. There has certainly been some progress in these kind of appliance book readers over the last couple of years, and in fact, I think people think about these the wrong way sometimes because they think of these appliance book readers as sort of a substitute for a hardcover book, and it's just when you finish that one, you can pour another book into it.
JB: Um-hum.
CL: Even the current generation of them hold ten, 15 books and, you know, if you just sort of turn the technology crank another generation or two, you can readily believe people will be walking around with these little portable appliances with a thousand-book library in them.
HS: Well, yeah, in fact, I can believe that they'll have access to all the world's knowledge, but you'll have to read it on this funny little screen. But in libraries, are we seeing paper disappearing or are we just seeing in addition to all the paper you have, we're also having electronic journals and CD-ROMs and all that kind of stuff? What's happening, what's the trend?
CL: Well, the fundamental problem is that reading on screens is not a lot of fun, with current screen technology. Particularly if you're reading something accurately, you know, carefully and at some length. So in fact, it's very interesting. We're seeing paper play this sort of role as almost a user interface through printing on demand.
JB: Interesting thought.
CL: I'll give you an example. There are a lot of journals now that are available in digital form and in some cases, the publishers are just starting to phase out the print. You know, they're publishing both in digital and in print form and the print will sort of phase out. Now, if you look at studies of how people are actually using these things online-
HS: I bet they all print them.
CL: Yes. Well, not exactly. What happens if you look at what they're doing is that they're doing a lot of skimming of abstracts or sort of fact checking and things like that. But much of what they're doing is making decisions about what they do and don't need to read. And then when they decide they really do need to read it, they print it. It's just, they don't necessarily file away the paper after they're finished reading it. They, you know, read it and pitch it, and you know, if they need another one a year from now, rather than having a huge filing cabinet full of this, they'll just print it again. So you're starting to see a lot of that kind of behavior around journals.
Now, your typical journal article is what, 15, 20, 25 pages. It's the kind of thing you can demand-print pretty easily. You can get a staple through it or a paper clip or something. Books are a lot more problematic. You need a pretty high end printer to really print books on demand in a reasonable way, so we are seeing some experiments with people, you know, going to digital books, but the nuisance of reading them online if it's the kind of book you read linearly is such that these are meeting kind of a mixed reception. Now, it's worth remembering, of course, that there are many genres of book that are historically published between two covers and not all of them are read linearly from beginning to end. One of the things-
HS: Textbooks, you mean.
CL: Textbooks are an example and encyclopedia's another good example.
HS: Or the dictionary.
CL: Yes. And if you look, you know, you'll see that encyclopedias have moved to the net almost completely in many cases. They're not selling a lot of copies of Britannica in print.
JB: Um-hum.
HS: But I bet a lot of people are printing up articles [inaudible], rather than reading them online.
CL: That's right. They-
HS: They're probably exhibiting just the behavior that you described, skimming online until you find something interesting, then printing it.
CL: And then maybe printing it, yeah. And - but in fact, these things have been much improved by moving to the net because of the ability to update the thing incrementally, to include a lot more pictorial material that was very expensive to put out on paper.
JB: Um-hum.
CL: So you know, another really wonderful example-I don't know if either of you have ever had the sort of scarring experience of trying to use Science Citation Index in Print.
HS: Um-hum!
CL: If you think about that, that's a book that never should have been a book. It should have been a database online from day one.
HS: I think its lineage goes back to before it could have been anything else.
CL: That's exactly right, but you know, I mean, that was just a database waiting to get out.
HS: Yeah! Cliff, you mentioned the Digital Library Initiative and the follow-on, the Digital Library Initiative 2. Could you tell us a little about that? Is that where all the important work is being done on digital libraries?
CL: That's certainly not where all the important work is being done on digital libraries by any means, but it was an important step along the road. I mean, what's happened actually in the world of digital libraries is you have these kind of three thematic lines of development. One is commercial products, and we talked a little bit about those. The second is the work that libraries, primarily in the university library context, have been doing in extending their collections to encompass more and more digital material. The third stream, and this is the one that the initiative you're talking about speaks to, is really a research stream. There are a lot of hard problems, computer science problems, human factors problems, problems around economics and the management of intellectual property, user behavior questions.
There's a very large research agenda that speaks to our capabilities to develop digital libraries and back about, I guess it was five years or so now, we had a kind of an unusual three-agency program involving NASA, the Advanced Research Projects Agency of the Department of Defense and the National Science Foundation, where they got together and basically put out a call for proposals for major digital library research initiatives. And they awarded six of these. Each was funded at, oh, say, roughly a million dollars a year for four years, I guess it was. Each one was centered at a university, but also had a very large array of partners, technology companies, content suppliers, all kinds of people.
So while there were, you know, sort of six flagship institutions, each one of them really was kind of a consortium. The six awards were actually quite interesting. There was the one I mentioned earlier at UC Santa Barbara which addressed geospatial information.
JB: Um-hum.
CL: There was an award to UC Berkeley which really looked at environmental information primarily and they did a whole lot of different kinds of content. There was a project at Stanford which I would characterize as more of a sort of technology underpinnings kind of project. It really didn't create a tremendous amount of content for people to access, but tried to work some of the supporting technology issues. There was an award to the University of Illinois at Urbana-Champaign which really addressed the construction of SGML databases based off of journals. And let's see, I think we've missed-that's five. Right? So there should be one more here.
HS: Okay, when you get it. We have a couple questions, e-mail questions that have come in. One is from someone who is from Vienna who says she's a friend of yours, Cliff. Emile Levin from the UN Ideal Library.
CL: Yes, that's actually a he, not a she.
HS: That's a he. I'm sorry, Emile! He says, "Okay, Cliff, the professional over-40 crowd wants to hold paper. How do you move them away from paper?"
CL: I don't know that you do. Other than by creating content that doesn't reduce to paper, where they really will want the whole thing. I think that certainly there is some generational element here about how comfortable various people are reading on-screen as opposed to reading with paper. I've never seen, to be honest, any kind of rigorous study of this, but one hears at least anecdotal evidence that younger people who really grew up with screens are a bit more comfortable reading on them than older people that really only encountered them first in their teens or 20s or 30s. But I don't think you kind of abruptly transition people away from paper. That's been tried and usually unsuccessfully.
HS: Okay, we have another question from Richard Danielson at Laurentian University. And Richard says-that's kind of cute.
JB: Assuming...
HS: [inaudible] assuming. "Assuming that I am a poor undergraduate student..."
JB: Are there any other kind?
HS: I don't know of any! "Where could I go electronically to obtain free research articles for my assignments? Are there many large national or international sites which are free and which will allow me to download full text versions of peer-reviewed research?"
CL: Great question! Most peer-reviewed research is owned by somebody and in most cases, these publishers license this material, often for very large sums of money, to universities for use by that specific university community. So if you are an undergraduate at many universities and colleges, you will find that your institutional library is spending a great deal of money every year licensing access to this kind of content, and as a member of the university community there, you can use that content.
Now, for people who aren't affiliated with a university or college or whose institution has not licensed this kind of material, things get more complicated. There are some public repositories. Some scholarly societies just make their material available. There has been a lot of work in an area called e-print archiving. For example, in physics, there is a pretty well-established practice that authors submit electronic copies of their articles to a public database which is at Los Alamos and anybody can read these articles there.
HS: So one suggestion to Richard is-Richard is at Laurentian University. Probably he has some resources right there, then.
CL: I would imagine so, and I would urge him to go talk to his institutional library.
HS: Okay, we have another e-mail.
JB: Are you going to ask the question about the role of digital paper?
JB: This was one that came in, I thought it linked in well with where we were. John Charles from Cal State Hayward asks, "What about the future role of digital paper?" Which kind of brings some of these threads together, Cliff.
HS: Just one point, Judith. The reason I didn't ask this is the message I got from John Charles was completely blank!
JB: Oh, you didn't see the subject. The question was in the subject.
HS: The whole question was in the subject?
JB: That's a first.
HS: Oh, yeah, you're right. The whole question!
JB: That's a first!
HS: I was looking for-
CL: Let's try and get him an answer. Let me give you a little background on--you know, as we've talked about things like appliances for reading e-books, today the display screens on those are very similar technology from computing. They're typically LCD panel type displays.
JB: Um-hum.
CL: Now, there are at least two major research efforts that I know, one of them is based at Xerox Palo Alto Research. And there's another effort at a company called E-Ink, which is, as I understand it, sort of a spinoff of research done at MIT and still has some ties to places like the MIT Media Lab. Both of these research efforts are working with notions of coming up with some kind of paper-like plastic substance that would be very thin and very light, that would have the same kind of visual characteristics in terms of resolution and reflectivity as we're accustomed to in paper and that could provide kind of a new generation of display technology which would be much closer to paper in terms of quality of an experience in viewing. It's really tough to know what to make of this. Both of these projects, at least last time I looked at them, were still very much in the lab. E-Ink, as I understand it, is targeting as their first product not a kind of consumer thing I was talking about a minute ago, but signage that can be electronically rewritten, which is a rather different set of engineering problems. So, you know, it's tough to say. I mean, the physics, the material science of the stuff is fascinating, but my reading is that it's at least a couple of years away from turning into real product, and maybe much more than that. It's just too early to tell.
HS: Okay, we have another question. It's kind of a long one. Actually, it's three questions here, from Lee Watson Healy. And Lee says, "In our research in the information industry, our brains report that they are moving toward the digital library, but more slowly than expected. They tell us a number of key gating factors are affecting the rate of movement toward digital, particularly in the shift from physical library paradigm to networking and published journals. First question is..." I guess it's a question. "Availability of electronic content. Retrospective print journal collections get heavy use in the sciences for example, and much of this content may never be digitized."
CL: Okay, I know there are two more questions, but let me take them one at a time.
HS: That's why I stopped right there.
CL: Let me get this one.
HS: We don't know how much digital memory you have!
CL: Yeah! The retrospective conversion issue is a very critical one. You know, if you look at what's available in digital form from most journals, history starts in, you know, some time in the late '90s.
HS: Yes, an exception, of course, that I should mention and I'm sure that you're aware of is JSTOR.
CL: Yes, I was going to say a few words about that. Now, if you look, though, at what many of the publishers started doing, now, they just started putting the stuff out in the late '90s and certainly one of the phenomena we see is that when material is available online, it tends to crowd out print material because it's so much more convenient to have it online. This just seems to be a sort of reader laziness phenomenon, I guess, is the best way to put it. There hasn't been a lot of systematic efforts until recently to start converting back material, particularly by the publishers. There is a project called JSTOR which was set up a few years ago with some seed money from the Mellon Foundation which has identified something on the order of, let's say, 100 to 200 kind of really core academic journals and is digitizing them all the way back to the start of publishing of each of those journals. And some of them go quite far back. For example, I understand that they're currently working on the Proceedings of the Royal Academy, so that goes back several hundred years.
JB: Um-hum.
CL: But this is only a few hundred journals. We're starting to see other scholarly societies go back. The physicists are starting to go back and digitize retrospectively. A tremendous amount of the astronomy literature is already digitized retrospectively back to day one because it turns out that's just not that large a literature and they managed to find some money to do it. But I think it is going to be a slow process to get these back runs converted.
Now, I want to make one other point here that's very easy to overlook, but which ties back to intellectual property issues. The historic practice has been that as a condition of publication in a scholarly journal, when you submit an article, you transfer all copyright to the publisher. So in fact, what happens with the scholarly journals is whoever published that has the rights to all the back run and basically, you can just cut a deal with the publisher. Somebody can, or the publisher can decide, if they want to, they're going to digitize all this back material.
When you start talking about digitizing old books, it's a completely different story because it's very common in the book publishing world to have agreements where, when the book goes out of print, a year or two later the rights revert to the author. So as we look at digitizing all the books, as opposed to scholarly journals from the last 70 or so years, in many, many, many cases that has to be taken back to the author or the author's heirs, on a book-by-book basis so the rights clearance is very likely for books going to be more expensive than the digitizing.
HS: Talking about the intellectual property rights problem, I mean, it seems like there are more problems with intellectual property rights and access to stuff as soon as you put it on a network.
CL: There are always more problems with intellectual property issues!
HS: How are people dealing with these things on digital libraries? Are they just restricting access to the things?
CL: Yeah, typically what you're doing - I mean, it's very interesting because we moved to the digital world. We're restructuring a lot of the relationships between content, institutions who try and exercise some stewardship over it like libraries, readers, and these things are going to have some profound social consequences. For instance, you know, typically what you did in the print world was you bought a copy of a book and once you owned that copy, you had the right to share it with whoever you want, keep it as long as you want.
HS: Well, in fact, you could put it in the public library and anybody could walk in and read it.
CL: Yes, and when I say "you," you know, I'm including libraries in that. In fact, that whole notion of sort of what is called technically the doctrine of first sale really was what allowed libraries to operate. Now, in the digital world, what we're doing is you often don't own anything. What you do is you license it, and you're typically licensing it to a specific and limited user community.
HS: So in a public library - I mean, if I buy a book, anybody can walk in and read it. But if I get it in digital form, it might be very restricted.
CL: Yes.
HS: How - I mean, public libraries, if they go all digital, might look quite different.
CL: In fact, what's happened is we're seeing quite a different situation when you compare university libraries and public libraries as they go digital. If you think of the university library, they can write a contract which says, "We have a pretty finite user community: the faculty, the students and the staff of University X. And we're going to write a license that gives those people" - and you can count up how many there are - "access to that content." What you're starting to find in many public libraries is that because the user community for a public library is so large and so amorphous, the library's unable to negotiate or unable to afford a license that would permit its entire user community to have access to the digital content. So you see these odd compromises. For example, public libraries licensing digital content, but licensing it only for use of people who are physically present in the public library.
HS: You have to go there to look at the book.
CL: Yeah, and I mean, it's crazy.
HS: [inaudible] crazy.
CL: Because it misses so much of the benefit of the digital content, but it's just sort of a pragmatic compromise and it's the best they can do.
JB: We have a question that starts, I think, focusing us on a couple of challenges in the future, Cliff. One is that once everyone has a digital library, whether it's stored in a warehouse or on a PDA, how do we locate in these libraries? So in other words, you know, rather than having one library, we're finding that we've got just thousands of libraries, right? So how do we find the content truly that we're looking for? Is there going to be any, you know, large global organization about how we structure content?
HS: We should mention that that question was from Nia Phytos Alekhouvu, or something close to that.
JB: Right, I wasn't going to - you see, you're braver than I.
HS: And he's from Minneapolis, Minnesota. I said "he," but who knows? I've been wrong [inaudible] percent of the time.
JB: North Highway 169, for those of you who are in that area!
CL: [inaudible] I know him. That's - this is in a certain way true. What we're finding is that the way the world is going, it's not like there's one monster digital library, but in fact, there are many, many, many specialized ones with kind of funny overlaps sometimes. And this question of how you find the right libraries to search in is a fundamental and hard research problem. Picking the right ones, you know, it's the sort of thing that for specialists in a given area, pragmatically turns out to be a bit of a non-problem.
One of the things you learn as part of your professional training in an area is where the major information resources are. The place where it's a big issue is for students, for casual information seekers, for people trying to do interdisciplinary work and for these people, we really don't have a good answer today. There has been some research done in how to describe these things. Most of it has been relatively simplistic. There has been some human effort devoted to essentially compiling descriptive directories and catalogues of these information resources and that's, you know, worked to some extent. But it's a problem we don't really have an entirely satisfactory solution to yet. And it's worth noting, incidentally, when you think of things like the web search engines -
JB: Um-hum.
CL: This is where they tend to fail. In fact, there's a tremendous amount of content on - accessible through the web, in part in various digital libraries, but this content isn't visible to the web indexers.
HS: That's right, because it lives on some database and -
CL: Yeah, in databases and it only manifests in dynamically created HTML pages in response to queries.
JB: That's a good thought, yeah.
CL: And so in fact, the web indexers can't even see most of the major content resources.
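The point Cliff makes here, that content held in databases and shown only on dynamically created pages is invisible to web indexers, can be illustrated with a small sketch. Everything in it is hypothetical: the site, the page names, the `crawl` and `search_page` functions and the two journal titles are invented for illustration and do not describe any real indexer or library system.

```python
# Hypothetical site: two static pages, plus a database reachable only by query.
STATIC_PAGES = {
    "/": {"text": "Welcome to the library", "links": ["/about"]},
    "/about": {"text": "About our collections", "links": ["/"]},
}
DATABASE = ["Journal of Astronomy, vol. 1", "Journal of Physics, vol. 7"]


def search_page(query):
    """Build a results list on demand; it exists only in response to a query."""
    return [item for item in DATABASE if query.lower() in item.lower()]


def crawl(start="/"):
    """Follow hyperlinks only, the way a simple web indexer would."""
    seen, frontier = {}, [start]
    while frontier:
        url = frontier.pop()
        if url in seen or url not in STATIC_PAGES:
            continue
        seen[url] = STATIC_PAGES[url]["text"]
        frontier.extend(STATIC_PAGES[url]["links"])
    return seen


index = crawl()
# The index holds the two static pages but none of the journal records,
# because no hyperlink leads to them; only a query to search_page does.
```

A person who types "astronomy" into the search form gets the journal record back, while the link-following indexer never learns that the record exists.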
HS: So how does - this brings us to the question of how do you search digital libraries? How do you find this information?
CL: And as I say, right now, it's really piecemeal, it's -
HS: So you've got to go from one to another.
CL: Yeah, it's largely directories of good places to look.
HS: And at each one, the way you search, it's going to be different.
CL: Yes, although there has been a lot of work on federating them in various ways so that you can search across multiple libraries. Stanford has done a lot of work in this area, Cornell's done some work, as have others. There seem to be some fundamentally hard trade-offs between precision and reach. You know, the more you want to search across multiple libraries, the less accurate you tend to be able to be in asking for what you want.
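One way to picture the trade-off between precision and reach that Cliff describes is the sketch below. It assumes, purely for illustration, that a federated query can use only the fields every participating library supports; the two catalogs, the field names (`title`, `author`, `target`) and the functions are hypothetical and are not drawn from the Stanford or Cornell work he mentions.

```python
# Hypothetical catalogs: each library exposes its own set of searchable fields.
LIBRARIES = {
    "astronomy": [
        {"title": "Spectra of Variable Stars", "author": "Lee", "target": "Cepheid"},
        {"title": "Star Catalog Notes", "author": "Ortiz", "target": "Quasar"},
    ],
    "humanities": [
        {"title": "Variable Readings of a Star Myth", "author": "Lee"},
    ],
}


def common_fields(names):
    """Fields that every record in every selected library supports."""
    field_sets = [set(rec) for name in names for rec in LIBRARIES[name]]
    return set.intersection(*field_sets)


def federated_search(names, **query):
    """Search several libraries, using only the fields they all share."""
    usable = common_fields(names)
    dropped = sorted(set(query) - usable)  # constraints we cannot apply
    active = {f: v for f, v in query.items() if f in usable}
    hits = [
        rec["title"]
        for name in names
        for rec in LIBRARIES[name]
        if all(v.lower() in rec[f].lower() for f, v in active.items())
    ]
    return hits, dropped


narrow = federated_search(["astronomy"], author="Lee", target="Cepheid")
wide = federated_search(["astronomy", "humanities"], author="Lee", target="Cepheid")
```

Searching the astronomy library alone can honor the `target` constraint and returns one title. Adding the humanities library widens the reach, but the `target` constraint has to be dropped, so the result picks up a title that only matches on author.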
HS: Are people looking at some kind of XML solution in the sense that the hotel industry and the car rental industry and other folks have sat down and decided that here is a bunch of XML fields, here's what they mean, so we can actually exchange data so we can look into each other's databases in some consistent way. Are digital libraries doing something like that?
CL: Oh, digital libraries have done a tremendous amount of work in how to structure knowledge. The trouble is that these XML-ish sort of structured information solutions work within particular communities. For example, there's a lot of work going on in molecular biology today in how do we exchange various bits of genetic and molecular biology information between digital libraries. But you've got kind of a common universe of discourse there. When you start talking about the literature of the humanities, this becomes much harder. All of a sudden, your data elements aren't so precise and it becomes much harder to talk about what you're interchanging other than text that is sort of read by the human mind.
HS: On a total other topic, just abandoning the e-mail that we're getting and things, what are some of the funding issues involved here? How do universities afford to put up digital libraries? And again, is this a push thing or a pull thing or both?
CL: I think it's both. I mean, most of the money that universities are spending on extending their libraries into digital form is either reprogrammed money from their existing library operations or in some cases a certain amount of new money to support digital work. When you look at scanning and other digitization of older content kinds of activities, because those have this sort of one-off character, many institutions have been successful in finding grant funding of one sort or another to underwrite those conversions, sometimes on kind of an opportunistic basis where they find some sort of fund that's interested in improving access to content in a specific area.
One of the places where I believe we've got some difficult issues facing us is when you look at some of the situations where individual scholars or groups of scholars have cobbled together the resources to build, you know, sort of a discipline-based digital library. The sustainability on that is much more problematic because, you know, often they really don't have a long-term funding strategy for that. They're just doing it for the love of it and because it makes a real difference to research and teaching in their discipline and I think in the next decade, we're going to be faced with a lot of issues about how we institutionalize the support for the more successful of those projects.
HS: But like with journals, you don't have a choice, right? With a lot of journals that are now only available in digital format, your library, you want this journal, you have no choice. Right? Are there going to be more and more things like that? I mean, are we going to start to see books that are only available in digital format, that's it?
CL: Um-hum!
JB: I think -
CL: We're already there, to an extent. There are at least some things now that really are only available in digital format and to the extent that it's important to have access to those materials to support your research and teaching missions, you've got to do it.
JB: In fact, you know, I think I just saw something recently that the Oxford English Dictionary is planning their next edition, that they are making that decision, to only go digital.
CL: Yes.
HS: Only digital?
JB: Is that right?
CL: Yes, you're starting to see that for a lot of reference works now.
JB: Um-hum.
CL: Big ones like that.
JB: As we're getting, obviously, more questions, but also we're getting close to the end of our time, Cliff. Do you want to maybe make a comment and summarize in terms of are there really, you know, one or two tough problems or maybe tough, good answers that you'd like to encourage our folks to look at or think about?
CL: I can try and do a little summarizing. I mean, I think this is a very exciting time for digital libraries because they're showing up in a lot of contexts and they're really starting to make a difference in the way we conduct teaching, learning, scholarship, provide information access in a wide variety of fields. I think that, you know, certainly as we've touched upon in the last 45 minutes or so, there are lots of technology issues here, but I think in some ways the dominant ones are going to turn out to be organizational, institutional, economic and rights kind of issues. And I think we've kind of touched on some of those as we've gone along. Questions of how you sustain these things over the long run, I think, are going to be very crucial, and how you ensure there's enough funding that they're here five years from now, maybe, when the person who was the driver in creating it has moved on or lost interest. I think we've got some tremendously hard problems about how we're going to retrospectively convert a lot of the riches of the twentieth century into digital form because of these rights encumbrances which are so costly to sort out, even if the material has only minimal scholarly value.
JB: What about, you know, a campus library today? Do you have any particular suggestions or recommendations on how a particular campus can participate in this revolution and evolution as we move to digital libraries? I mean, you know, how many more buildings should be built or how much of their collection should be digital vs., you know, analog?
CL: Well, I mean, I think you're going to see a slow but steady conversion to digital material. The building issue is tough because libraries aren't just book and journal warehouses, they're social places, they're learning places, they are sometimes intellectual centers for a campus community and what you're seeing now with much of the new construction in libraries or remodeling in libraries is that they're really rethinking how they use space. They're doing lots of group collaboration rooms, teaching and lecture facilities, they're putting in places where people can work together with multimedia, you know. One of the most elaborate attempts at this, for example, is the Media Union that the University of Michigan built a few years ago, which really combines library and studio and lab space in a very unusual and innovative way.
So I think, you know, when you talk about campus strategies, this is something that really the whole campus needs to come together and talk about: their kind of strategies for digital content. And it's really got to be connected into the teaching and research and public service missions of the institution, as you see institutions, for example, talking about extending their distance education offerings.
Often digital libraries will go hand-in-hand with that because just as you want to project teaching across the net, you also need to project the information content to support teaching and learning across the net. One other point that I think is worth making is that a lot of the action, certainly not all of it but a fair amount of it up until recently, has been centered mostly at the large research institutions who have the resources, the know-how, the people to build systems themselves. I believe that we are right on the sort of dividing line now where we're starting to see more and more commercial offerings come into play that allow smaller institutions that may not have that kind of engineering capability to build local systems.
HS: Why don't they just take advantage of the large institutions? It seems like the large institutions could share their libraries with these smaller institutions.
CL: Well, they can't, really, in some ways, because of licensing constraints. There have been some situations where groups of institutions have come together into consortia and have licensed things, mounted it at one lead institution or two lead institutions. But, you know, at some level this is not the business that research institutions need to be in in the long run, and I think you're going to see a set of commercial offerings that really have a very high impact on smaller institutions. We're already seeing that with JSTOR. JSTOR is offering that collection of journals to rather small academic institutions at pretty affordable rates, and all of a sudden, your user communities have a resource they just didn't have before and the evidence is they're using it heavily and to good advantage in both teaching and research.
HS: Do we have time to get another question in here, Judith?
JB: Yes, I think -
HS: Let's do it anyway.
HS: Okay, we have a question from Richard Stringer High who wants you to talk a little bit about the role of the librarian in a digital library.
JB: It's funny, we talked about this just before we started, right?
HS: Yes.
CL: Yeah, I think that certainly we've seen librarians very involved in the sort of operational digital libraries. The engagement of librarians in some of the research work has been spottier. For example, in those NSF projects I described earlier, most of those were funded through computer science departments and in some cases, libraries and librarians were not particularly engaged in those. The focus turned out to be mainly about technological underpinning. I think we've got an interesting sort of nomenclature problem here. Sometimes we think of librarians as the professionals who work in libraries, and in fact, we're going to see some digital libraries in contexts other than libraries. I think people with librarian training and skills are going to be very involved with that, although they may go by other names.
JB: Isn't that reflected in - I've seen a lot of library schools change their names over the last three or four years to Information Systems.
CL: Yes, sometimes they just use very generic terms like School of Information.
JB: Yeah.
CL: And certainly, I also think that as you work at building and running digital libraries, many of the sort of traditional library skills are relevant, but there's a whole bundle of new skills that also come into play.
JB: Okay. Howard, did you want to try and combine those two questions regarding Napster and Gnutella?
HS: Not only that, but I think I'll answer the questions!
HS: Unless Cliff disagrees pretty strongly. But I think, Cliff, we had a couple questions about whether we should be using Napster and Gnutella to solve digital library problems and I hope you're going to say, "No, that's not the kind of thing we want to get into." At any rate, that's what I would say.
CL: I guess I sort of view things like Napster and Gnutella as a little bit orthogonal to the question of digital libraries. They're really kind of file-sharing technologies.
HS: Well, one person, Cliff, said, "Gee, if I can't get to this information I really need because I don't have access, what about some kind of Napster thing? Then we could all share it."
JB: Get links into the poor students.
HS: Sounds a little illegal to me.
CL: Well, I mean, actually, there's a very interesting proposal floating around which, if memory serves, comes out of Yale, but I wouldn't absolutely swear to that, called Docster, where the idea is you basically extend off of fair use to do sharing of articles among various people who've legitimately gotten copies of it. And, you know, if you're interested in that kind of thing, there's some papers on that on the web that you might find interesting reading.
JB: If you have one specific one that you'd like to highlight, Cliff, we can do that. I think we may want to - may need to close, although we have enough content and questions and you have enough information that we could go on for hours, literally. Howard, do you have any final comment?
HS: No, I think we're way over, and especially when we're in a discussion about Napster and Gnutella, this might be a time to stop here. I agree, I think we should - we didn't cover even a third of the material that we had hoped to, so I commit you on the air, Cliff, to come back and do some more of this with us sometime.
CL: I would be delighted to. There's a tremendous amount we've seen to talk about here. One of the things I find so fascinating about this area is the way it merges technology, organizational and social kinds of issues and I'd be delighted to come and explore them again with you in future.
JB: Thanks so much, Cliff. Well, let's ask our folks to be sure to set aside Thursdays at 4:00 Eastern time for this continuing series of Tech Talks and to remind everybody that our next session, which will be two weeks from today on October 12 will be a special Live from Educause session. In fact, our third annual Live from Educause session! At this session, we'll ask another very important question, focusing on portals, i.e., just where are campuses with campus and student portals and just what are faculty and students doing with portals? Our Live from Educause session is the only time each year that our hosts and experts are all together in one physical spot.
HS: It's actually a strange thing!
JB: That's right, that's right. In line with this being a special event, we have three experts ready to talk about this question, and they are Orrin [inaudible] from the University of Washington, Michael Hanberg of the University of North Carolina Portal Project and Penny Turgeon of the Worcester Polytechnic Institute. For those of you who will be in Nashville, do come by and join in and ask the experts live questions. For those of you who won't be in Nashville, be sure to listen in and send your questions in by e-mail.
Many thanks to everyone who supports these Tech Talks and everyone who helped make this event possible today. A special thanks to guest expert, Cliff Lynch; to technology anchor, Howard Strauss; to Terry Calhoun, our Tech Talk web guru; to David Smith, Patty Gaul of CREN; Jason Russell, Gail Terkeurst and the support team at Merit Network; to Susie Berneis, audio file transcriber. The audio file and the text will be up in about a week. And finally, a thanks to all of you for being here. You were here because it's time. Bye, Cliff and thanks for saying you'll come back. Bye, Howard. Bye, everyone.
HS: Bye, Judith. Bye, Cliff.
CL: Bye-bye.
END OF WEBCAST