TechTalks Transcript

What Are Likely to be Your Best Uses for XML?

Judith Boettcher [JB]
Howard Strauss [HS]
Michael Sperberg-McQueen [MS]

February 8, 2001

JB: Welcome to the CREN Tech Talk series for spring of 2001 and to this session on "What Are Likely to be Your Best Uses for XML?" You are here because it's time to discuss the core technologies for your future campus. This is Judith Boettcher, your CREN host for today, and our session is coming to you today with the support of the CREN member institutions.

I'd like to welcome Howard Strauss, as usual, as our customary wonderful technology anchor for Tech Talk, and he's a well-known web technology expert, portal expert and - I must say - Princeton booster. Welcome, Howard!

HS: That just means I'm going to answer all the questions that come from Princeton today, right?

JB: That's right!

HS: Before anybody else's. No, I really don't do that! But thank you, Judith. I'm Howard Strauss, the technology anchor for the Tech Talk series of technology webcasts. In this webcast, I invite you to join Judith, me and our guest expert, Michael Sperberg-McQueen, in a lively technical dialogue about XML that will answer the questions you'd like answered, and to ask those very important follow-up questions. You can join in this dialogue by sending your XML questions via e-mail to expert@cren.net anytime during this webcast. If we don't get to your questions during the webcast, we'll provide an answer in the webcast archives.

In the late 1960s, IBM was faced with a rapidly expanding sea of legal documents that needed to be retrieved, modified, classified and managed by its computer systems. IBM turned to three researchers, Charles F. Goldfarb, Edward Mosher and Raymond Lorie, who in 1969 developed GML. GML stands for Generalized Markup Language and, not coincidentally, is also the initials of the three researchers' last names.

GML employed a standard set of tags that enabled legal documents to be marked up and formatted in a consistent way that made automated processing of these documents more effective. GML was later extended and standardized to become SGML, or Standard Generalized Markup Language, and was adopted as an ISO standard in 1986. Even today, it remains the last word in markup languages and is used in the most modern publishing software, such as Adobe's FrameMaker.

In 1989, Tim Berners-Lee and Anders Berglund at CERN needed a formatting language for the new World Wide Web they had just invented. They created HTML, the Hypertext Markup Language, loosely based on SGML. Today, HTML has been used to create about a billion web pages.

HTML describes the format of a document. It does not describe the meaning of a document. While HTML is okay for making pretty web pages, it is useless as a document abstraction language and nearly useless in enabling web searching. XML, the Extensible Markup Language, was developed by the World Wide Web Consortium, commonly referred to as W3C, to address these problems. W3C issued XML 1.0 as a Recommendation only in 1998. XML describes nothing about the format of a document, but does describe its meaning, and unlike HTML, it is easily extensible.

At first glance, XML looks just like HTML. It has paired tags mixed with text, but the tag names are quite different from those in HTML. In fact, the XML tags used for different kinds of documents are likely to be very different from each other, and since there is no formatting information, you might wonder how XML can produce a formatted web page. The short answer is that it can't. To do any formatting, XML must be paired with a stylesheet language such as XSL, the Extensible Stylesheet Language. Once paired with XSL, the same XML can be used to produce many different renditions of a document: for example, one for a web page, one for a printer and one for a database. For a while, it was thought that XML's most important use would be to replace HTML, but having a general way to describe documents to computer systems turns out to have dozens of other important uses, some even more far-reaching than improving upon HTML.
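
To make the "one XML source, many renditions" idea concrete, here is a minimal Python sketch. It is not XSL: three ordinary functions built on the standard library's xml.etree.ElementTree stand in for three stylesheets, and the memo vocabulary (memo, to, subject, body) is hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# A document marked up for meaning only; it carries no formatting.
# The element names are hypothetical.
SOURCE = """<memo>
  <to>Faculty Senate</to>
  <subject>Spring schedule</subject>
  <body>Classes begin next week.</body>
</memo>"""


def render_html(memo: ET.Element) -> str:
    """Stand-in for one stylesheet: an HTML fragment for a web page."""
    return (
        f"<h1>{escape(memo.findtext('subject'))}</h1>\n"
        f"<p><b>To:</b> {escape(memo.findtext('to'))}</p>\n"
        f"<p>{escape(memo.findtext('body'))}</p>"
    )


def render_text(memo: ET.Element) -> str:
    """Stand-in for a second stylesheet: plain text for a printer."""
    subject = memo.findtext("subject")
    return "\n".join([
        subject.upper(),
        "=" * len(subject),
        "To: " + memo.findtext("to"),
        memo.findtext("body"),
    ])


def render_row(memo: ET.Element) -> tuple:
    """A third rendition: a tuple ready to insert into a database table."""
    return (memo.findtext("to"), memo.findtext("subject"), memo.findtext("body"))


if __name__ == "__main__":
    memo = ET.fromstring(SOURCE)
    print(render_html(memo))
    print(render_text(memo))
    print(render_row(memo))
```

None of the three renderers changes the source; adding a fourth rendition means writing one more function (in XSL, one more stylesheet), not re-marking the document.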

Today, XML is being used in every nook and cranny of information technology. In many areas, it has become the essential interface for making systems work together. Specialized XML vocabularies are appearing for mathematics, chemistry, biology, tourism and even managing Java programs. Just a few years ago, XML was going to be the next big thing. Today, it is everywhere and is much bigger than almost anyone had imagined. You need to take immediate action to learn enough about XML to understand its impact on every byte that anyone on or off your campus processes. Tom Lehrer offers fine advice when he sings, "Don't be nervous/Don't be flustered/Don't be scared./Be prepared!" We'll start your preparation as we delve into some of the most interesting uses of XML on today's webcast of Tech Talk. Judith?

JB: Thank you, Howard, and I didn't know you were going to break into song. Is that the surprise promised?

HS: That was the surprise, all right.

JB: All right, that's great! Well, with that as an introduction, I think we will be talking a lot more about form and substance today, and to do that, I'd like to welcome a new expert to Tech Talk, Michael Sperberg-McQueen from the World Wide Web Consortium, also known as W3C. Michael is a member of the technical staff at W3C and serves as co-chair, with David Hollander, of two groups: the XML Schema Working Group and the XML Coordination Group. Before that, Michael was a member of the Academic Computing Center at the University of Illinois at Chicago for 12 years, and before that, he was at (guess where!) Princeton University. A Princeton connection! While at UIC, Michael had special responsibility for document processing. Michael, I bet that's the link to XML that you'll share with us.

MS: Yes.

JB: During his time at Princeton, he served as a consultant on humanities computing questions. In Michael's biography, he notes that he has a Ph.D. in Comparative Literature, but "strayed into computing as a student and never strayed back out."

HS: Hard to get out, isn't it?

JB: Right! I enjoyed seeing that. There's lots more about Michael at the website, so be sure to explore it. Michael, thanks for being here and joining us all the way from Bergen, Norway. I think that you're the "most distant" but still close expert that we've had on Tech Talk. Welcome!

MS: It's a great pleasure to be here. Thank you for having me.

HS: We didn't mention that all of Michael's answers are going to be in Norwegian, did we?

MS: Ja [Norwegian for "yes"], [inaudible].

JB: But they'll be translated and transcribed!

HS: By this very fancy technology we have here, using XML.

JB: XML, right.

HS: Michael, I sort of hinted at what XML was, but maybe you could give us your view of what it is and why it is so important?

MS: Okay, well, first of all, it is, of course, an acronym for Extensible Markup Language. The word "extensible" is there to distinguish it from finite, concrete markup languages like HTML. Second, it's a spec developed in an open, non-proprietary way by the World Wide Web Consortium. And third, it's... well, some people say it is a markup language. I prefer to say it is a markup meta-language. That is, it's a language for defining markup languages. A language like HTML defines a specific set of tags that can be used with text. Other applications, like Chemical Markup Language, define a different set of tags for different purposes, and XML is the meta-language that you use to define languages like HTML or CML or any of the others that we'll be mentioning.
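
Because XML fixes the syntax rather than the vocabulary, one generic parser can read documents written in quite different markup languages. The Python sketch below parses an XHTML fragment and a purchase order with the same standard-library call; the purchase-order element names are hypothetical, not taken from any published vocabulary.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Two documents in two different XML vocabularies.
xhtml_doc = f"""<html xmlns="{XHTML_NS}">
  <body><p>Hello, world.</p></body>
</html>"""

# Hypothetical purchase-order vocabulary (not a published standard).
order_doc = """<purchaseOrder>
  <orderNumber>4711</orderNumber>
  <price>129.95</price>
</purchaseOrder>"""


def element_names(xml_text: str) -> list:
    """Parse any well-formed XML and list its element names in document order.

    The parser knows nothing about either vocabulary; it only enforces
    the syntax rules that XML defines for all of them.
    """
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()]


print(element_names(xhtml_doc))  # namespaced names, e.g. '{http://www.w3.org/1999/xhtml}p'
print(element_names(order_doc))  # ['purchaseOrder', 'orderNumber', 'price']
```

What the tags mean is left entirely to the application that receives them; that division of labor is what makes XML a meta-language.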

HS: Okay, even though this is not going to be a really technical discussion, perhaps you could talk about a couple of things that come up whenever anybody talks about XML. Namely, we hear about DTD's and schemas and stylesheet languages. How do those things fit together?

MS: Okay, a DTD is a Document Type Definition. It is the way, defined by SGML and by the XML 1.0 spec, of declaring the set of element types that occur in your documents and that get marked by start-tags and end-tags, and of declaring the ways in which they can relate to each other. So you can specify, in a sort of document grammar, that paragraphs occur within the body of the text, but the body of the text does not occur inside of a paragraph. Similarly, you can say that the order number occurs within a purchase order, but if you have a purchase order tag appearing within an order number, then you have data corruption and you had better not process that baby! DTD's are a well-established technology. They're important, they're widely deployed, they're a perfectly good thing.
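
As a rough illustration of the containment rules a DTD declares, here is a Python sketch that checks only which elements may appear inside which. The DTD fragment in the comment and its element names are hypothetical; a real validating parser also enforces order and repetition, which this sketch deliberately ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD fragment that this sketch loosely mirrors:
#   <!ELEMENT purchaseOrder (orderNumber, date, item+)>
#   <!ELEMENT orderNumber   (#PCDATA)>
#   <!ELEMENT date          (#PCDATA)>
#   <!ELEMENT item          (#PCDATA)>
# Only "which children are allowed" is captured, not order or counts.
ALLOWED_CHILDREN = {
    "purchaseOrder": {"orderNumber", "date", "item"},
    "orderNumber": set(),
    "date": set(),
    "item": set(),
}


def containment_errors(root: ET.Element) -> list:
    """Return a message for every element found where the grammar forbids it."""
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            errors.append(f"undeclared element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors


good = ET.fromstring(
    "<purchaseOrder><orderNumber>4711</orderNumber>"
    "<date>2001-02-08</date><item>Paper</item></purchaseOrder>")
bad = ET.fromstring("<orderNumber><purchaseOrder>4711</purchaseOrder></orderNumber>")

print(containment_errors(good))  # []
print(containment_errors(bad))   # ['<purchaseOrder> not allowed inside <orderNumber>']
```

The second document is the "purchase order inside an order number" case: it is perfectly well-formed XML, and only the grammar reveals that it is corrupt.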

HS: Do you have one for, like, each different kind of document with XML?

MS: Yes, basically, you will. You can have a DTD for general purpose documents, you can have a DTD for the exchange of the component information data sheets that you get with semiconductor components. There are specialized DTD's for aircraft repair manuals and on and on and on. There is a small but thriving industry of people who earn their living helping other people design DTD's for industrial applications. The problem, the only problem with DTD's is they don't go quite far enough, as you mentioned in your intro.

One of the big surprises in the development of XML is that although it comes out of the publishing world and the document processing world, it has been embraced with somewhat surprising fervor by database people and programming language people and e-commerce people and all sorts of people whose material is a document or a text only in the very broad sense of the word: things like purchase orders.

Well, yes, I'm sure any literary critic would tell you that of course a purchase order is a text, but we don't normally think of it quite that way. The problem is that for applications like that, you would like to be able to specify not only that the order number occurs in a particular location and a date in a particular location, but also that the order number has a specific data type, the date has a specific data type, and the price had better be a 6.2 decimal, that is, a decimal number with two digits to the right of the decimal point.
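
The datatype checks described here can be sketched in a few lines of Python. The element names are hypothetical, and "6.2" is read (as an assumption) as six digits in all, exactly two after the point. XML Schema would express the price rule with the totalDigits and fractionDigits facets on xs:decimal; since fractionDigits sets a maximum rather than an exact count, this sketch is somewhat stricter than that facet.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Assumed reading of "6.2": six digits in all, exactly two after the point.
PRICE_PATTERN = re.compile(r"[0-9]{1,4}\.[0-9]{2}")
INTEGER_PATTERN = re.compile(r"[0-9]+")


def datatype_errors(order: ET.Element) -> list:
    """Check element *contents*, which a DTD's #PCDATA cannot constrain."""
    errors = []
    number = order.findtext("orderNumber", default="")
    if not INTEGER_PATTERN.fullmatch(number):
        errors.append(f"orderNumber {number!r} is not an integer")
    when = order.findtext("date", default="")
    try:
        date.fromisoformat(when)
    except ValueError:
        errors.append(f"date {when!r} is not an ISO 8601 date")
    price = order.findtext("price", default="")
    if not PRICE_PATTERN.fullmatch(price):
        errors.append(f"price {price!r} is not a 6.2 decimal")
    return errors


good = ET.fromstring(
    "<purchaseOrder><orderNumber>4711</orderNumber>"
    "<date>2001-02-08</date><price>129.95</price></purchaseOrder>")
bad = ET.fromstring(
    "<purchaseOrder><orderNumber>A-17</orderNumber>"
    "<date>Feb 8</date><price>129.9</price></purchaseOrder>")

print(datatype_errors(good))      # []
print(len(datatype_errors(bad)))  # 3
```

Both documents pass a structure-only check; only the datatype layer notices that the second one is unusable.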

HS: So schemas add some additional information to DTD's. They're DTD's with more information in them.

MS: Schemas are DTD's ++, yes.

HS: And these schemas and DTD's, do they look like XML?

MS: DTD's don't, schemas do. That is, DTD's are in a specialized notation. They are not themselves XML documents. That's one of the frustrations over time. Those of us who work with DTD's have gotten frustrated because we have wonderful tools for dealing with structured information, but they don't work on DTD's because DTD's aren't XML [inaudible].
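
A small Python sketch makes the difference tangible. A W3C XML Schema document can be read with ordinary XML tools, here the standard library's ElementTree, while DTD declarations are rejected by an XML parser because they use their own notation. The namespace URI is the one used by the XML Schema Recommendation; the declared element names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A tiny schema: it is itself an XML document.
SCHEMA = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="purchaseOrder"/>
  <xs:element name="orderNumber" type="xs:integer"/>
</xs:schema>"""

# A DTD declaration: a special, non-XML notation.
DTD = "<!ELEMENT orderNumber (#PCDATA)>"


def declared_elements(schema_text: str) -> list:
    """Use ordinary XML tooling to list the element names a schema declares."""
    root = ET.fromstring(schema_text)
    return [el.get("name") for el in root.findall(f"{{{XS}}}element")]


def is_well_formed_xml(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(declared_elements(SCHEMA))   # ['purchaseOrder', 'orderNumber']
print(is_well_formed_xml(SCHEMA))  # True
print(is_well_formed_xml(DTD))     # False
```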

HS: So it would be fair to say that DTD's are going away and schemas are replacing them?

MS: If you take a long enough view...

JB: [inaudible] like that to happen.

MS: Some people would like that to happen. Other people would like that not to happen. I think if you take a long enough view, that the advantages of using XML as the notation for defining a document type are very strong. But DTD's won't be going away in the next couple of years, at least.

HS: In my intro, I said that W3C, which is your organization now, made XML a Recommendation only in 1998. What's the state of that now?

MS: Recommendation is as far as the W3C goes. Formally, we are not a standards organization; we are a membership organization that develops specifications, develops technology for the web. And Recommendation is the highest status any of our specifications can reach. We avoid calling them standards because we are not an [inaudible] accredited standards body or anything like that.

HS: So is there a standard for XML?

MS: I think, in fact, there probably is an ISO standard for a particular profile of SGML that turns out to be effectively XML. I'm not actually sure that it has progressed all the way to being an international standard or not. I occasionally hear people having problems because they can't specify XML on a request for bids because their government rules won't allow them to require anything that's not an international standard. So I'm not sure that it's all the way there, but I believe there is work in that direction.

HS: Okay, is XML going to replace HTML? I mean, is it happening now? How far along are we?

MS: Well, XML won't replace HTML because they are at two different levels. In the definition of HTML, XML has already replaced SGML, in the sense that the most recent W3C Recommendation for HTML is actually XHTML, the XML reformulation of HTML 4.0. The question about whether...

HS: When I sit down to build a web page today, I build it one way or another in HTML, whether I use FrontPage or one of these other things to build it, or whether I write it myself. Tomorrow, when I do this, am I going to start with XML?

MS: You may well start with some vocabulary, some XML vocabulary other than HTML. But I expect HTML will be around for a long time. It's a bit like the old programming language joke. What programming language will science and engineering be using 20 years from now?

HS: FORTRAN!

MS: We don't know. We only know that it will be called FORTRAN. And I don't know what markup language we'll be using 20 years from now, but I do know that there will be an important variant of it called HTML.

HS: But aren't people using XML today to generate web pages?

MS: An awful lot of people are using XML to generate web pages. Let me say again: XML in vocabularies other than HTML. But by and large, what you get is dynamic translation to HTML for all of the browsers that don't currently support direct display of XML. There are... IE 5.0 has pretty good XML support. I am told that Mozilla has visible XML support, although not quite as...

HS: By Mozilla, you mean Netscape?

MS: I mean the open source version of Netscape. I'm not sure that the XML support has gotten into the Netscape released product yet because Netscape tends to lag a little bit behind the bleeding edge versions of Mozilla.

HS: Okay, if XML is not going to replace HTML, what are people on campuses using it for?

MS: Well, in academic departments, people are using it for all sorts of things. There are XML vocabularies for biological information and for physics and for geology and for computer science and for mathematics.

HS: Can you take one of those and just tell us how it's being used? Like biology or any one of them.

MS: Okay, well, I'll take...

HS: [inaudible] academics. Okay, fine. I'm in chemistry.

JB: We're going to go to chemistry, okay.

HS: How's this going to change what I used to do before XML vs. after XML?

MS: Okay, before XML, if you wanted to put up information about, say, a family of molecules you were interested in on the web to share with your collaborators at other institutions, you ended up having to put up some textual description in HTML and maybe some descriptions of the molecular information in some ad hoc format that you wrote with your colleagues or that fits some chemical manipulation program.

HS: So I've got some proprietary program somewhere?

MS: You have some...

HS: Molecule describer.

MS: Proprietary program. With XML, it's possible, and the Chemical Markup Language demonstrates this...

HS: That's CML.

MS: CML, yes. CML is one of my favorites, probably because it was one of the earliest applications of XML by someone who was not a member of the working group. It provides ways to describe the structure of molecules, and Peter Murray-Rust, who developed it, also developed a Java-based browser, a little Java applet, so that when you pulled down an XML description of a molecule from his site, you would see in one window a translation of the molecular information into a chemical formula and in another window a translation into the kind of semi-English that chemists use to describe molecules. And in a third window, you'd see a 3-D molecule that you could rotate. It was all from the same underlying representation, so the views never got out of sync.
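
Here is a minimal Python sketch of one such derived view: computing a chemical formula from an XML description of a molecule. The fragment is CML-style (CML does use molecule, atomArray and atom elements with an elementType attribute) but is heavily simplified, with no namespace, bonds or coordinates, so treat it as an illustration rather than valid CML. The formula follows the Hill convention: carbon first, hydrogen second, then everything else alphabetically, or purely alphabetical when there is no carbon.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, CML-style description of ethanol (illustrative, not full CML).
ETHANOL = """<molecule id="ethanol">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="O"/>
    <atom id="a4" elementType="H"/>
    <atom id="a5" elementType="H"/>
    <atom id="a6" elementType="H"/>
    <atom id="a7" elementType="H"/>
    <atom id="a8" elementType="H"/>
    <atom id="a9" elementType="H"/>
  </atomArray>
</molecule>"""


def hill_formula(molecule: ET.Element) -> str:
    """Derive a Hill-order formula from the atom list."""
    counts = Counter(atom.get("elementType") for atom in molecule.iter("atom"))
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    # Counts of 1 are written without a digit, as chemists do.
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


print(hill_formula(ET.fromstring(ETHANOL)))  # C2H6O
```

A 3-D viewer or a text describer would read the same atom elements, which is why the views cannot drift apart.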

JB: So the same information generated all three of those manifestations?

MS: Exactly. So it was a very compact illustration of the ability to use the same information for different purposes and look at it in very different ways.

HS: And was it using some stylesheet language to do that? Was it using XSL? To get the three different versions?

MS: No, he hard-coded the 3-D interpretation of the Chemical Markup Language in his applet.

JB: In the applet?

MS: Yes. Nowadays, we're getting to the point where you might be able to do a sort of stylesheet kind of thing with that, with the [inaudible] on [inaudible] and so forth, as they move into 3-D, using XML for 3-D.

HS: What about in distance education? Is XML going to have some impact there?

MS: So far, I haven't seen a lot of impact, but I have always thought that was a great place to use XML. On every campus I see, we're spending enormous amounts of resources putting material online in a form that's more than just a scanned-in syllabus. And yet a lot of the effort is going into software-specific formats that are going to be out of date in maybe three years, maybe five years, certainly in ten or 20 years, and in some cases that's long, long before the information will be out of date.

JB: So it sounds like you're recommending that there are some applications that would be very, very good to use XML, like starting now?

MS: Yes!

JB: Yes, now!

MS: I would say any information that you expect to care about in five years, you will do much better putting it in XML than putting it into old-style HTML or into a proprietary word processor.

HS: What about databases? When we talk about information, a lot of our information is in databases. How does this XML stuff fit into our databases?

MS: Well, that's the tricky bit. The old HTML stuff, and historically documents in general, have not fitted very well into databases at all. One of the premises from which the original development of XML went forward was: well, the database people don't need any particular help for data interchange. They have comma-delimited format and they're perfectly happy with that, so we'll focus just on documents. What we discovered after XML came out and the database people started saying, "Oh, that looks interesting! Let's try that!" was that they weren't happy with comma-delimited format, because you lose too much of the contextual information. It's too easy for the schema to get stripped off the top or not to get dumped in the first place, and the self-labeling of XML was very important. And also, as time has moved forward, the SQL community at least has...
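
Once a comma-delimited line loses its header, "4711,2001-02-08,129.95" is just three strings. A minimal Python sketch of the self-labeling alternative, assuming hypothetical field names: when the column names travel inside the data, every value carries its own label.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A record whose header line has been lost, and the header it should have had.
# Field and record names are hypothetical.
BARE_ROW = "4711,2001-02-08,129.95"
HEADER = ["orderNumber", "date", "price"]


def row_to_xml(fieldnames, line):
    """Wrap one comma-delimited line in elements named after its columns."""
    values = next(csv.reader(io.StringIO(line)))
    record = ET.Element("order")
    for name, value in zip(fieldnames, values):
        ET.SubElement(record, name).text = value
    return ET.tostring(record, encoding="unicode")


print(row_to_xml(HEADER, BARE_ROW))
# <order><orderNumber>4711</orderNumber><date>2001-02-08</date><price>129.95</price></order>
```

The XML form is longer, but a receiver that never saw the header can still tell the order number from the price. (zip silently stops at the shorter input, so a real converter should also check that the counts match.)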

HS: SQL, when you say "sequel."

MS: Sorry, yes. The SQL community...

HS: Doesn't sound quite the same when you hear it as when you see it.

MS: ...has reintroduced a lot of notions that are familiar from pre-relational, hierarchical databases and that would be very hard to express in a comma-delimited format. Things like repeatable fields, having arrays as the values of individual columns in a table: that's all much more easily marked up in XML than in the old-style comma-delimited formats. So the database people became interested in XML, and I think the biggest impact of XML in the long run is going to be the fact that you have a single format that can handle both the kind of data that we're used to working with in word processors and the kind of data that we're used to working with in database management systems.
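
Repeatable fields show the contrast well. In the Python sketch below, a course with several instructors (hypothetical data and element names) needs an ad hoc inner separator in the comma-delimited version, here a semicolon chosen arbitrarily, while in XML the element simply repeats.

```python
import xml.etree.ElementTree as ET

# Comma-delimited: the second column secretly holds a list, packed with ";".
CSV_LINE = "HIST101,Smith;Jones;Lee"

# XML: the repeated field is just a repeated element.
XML_RECORD = """<course code="HIST101">
  <instructor>Smith</instructor>
  <instructor>Jones</instructor>
  <instructor>Lee</instructor>
</course>"""


def instructors_from_csv(line):
    code, packed = line.split(",", 1)
    return code, packed.split(";")  # breaks if a name ever contains ";"


def instructors_from_xml(text):
    course = ET.fromstring(text)
    return course.get("code"), [i.text for i in course.findall("instructor")]


print(instructors_from_csv(CSV_LINE))    # ('HIST101', ['Smith', 'Jones', 'Lee'])
print(instructors_from_xml(XML_RECORD))  # ('HIST101', ['Smith', 'Jones', 'Lee'])
```

Both readers return the same answer, but only the XML reader relies on a convention that every XML tool already understands.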

HS: That's so that we can also exchange information between different databases? Is that one of the hopes, that if everybody uses XML?

MS: Very, very definitely. That's one of the big interests of the database management system vendors. They're building their systems to use XML as a source format or as a target format for database dumps or exports, and to the extent that they do that, it will become easier and easier to move highly structured or hierarchical data from one database management system to another. And if you can have that take a side trip through someone's word processor, all the better.

Up until now, we've had this enormous, this big divide. In many institutions, we went through years of struggle to get all the mission-critical data into centralized data repositories under the control of database management systems. It was a lot of work, but I think most people feel it paid off, and no one seems to want to go back to flat files that get overwritten and messed up. But even so, if you measure the amount of information in an organization, the amount that's actually stored in the centralized databases is, by many measures, less than half of the total.

HS: The rest is in documents.

MS: The rest is in documents. Your undergraduate catalogue, your graduate catalogue, your admissions policy, your policy handbooks. All of those things are documents. They're kind of tree-like and leggy. They don't fit really well into neat relational tables and they're hard to manage with a database management system. But having the same format for that data and for relational data is a big step forward, and once we start seeing and exploiting the advantages, then I think no one will ever want to go back.

HS: If we really do want to exchange information, don't we need a common XML schema to do this? Or are things not going to work?

MS: Yes and no.

HS: I understand the "yes" part. As to both sides.

MS: XML is not a silver bullet. Part of the problem of data interchange is agreeing on the format and XML helps with that a lot.

HS: But [inaudible], don't we also agree on the names? I mean, don't we also say, "I'm going to call a person's name NAME rather than PROPERNAME or LASTNAME or whatever."

MS: You really do need to do that, if you want it to be understood at the other side. And that is why the spread of XML is accompanied by the spread of projects in industry after industry to agree on sets of names that people can use. But the reason I said no is that, somewhat to my surprise, a colleague of mine in the health industry says, "No, no, what we're interested in is the fact that it allows you to put in stuff that you haven't agreed on! We have the stuff that we agreed on, but we don't want to write two export routines, one to send to you guys and one to send to the other branch hospital down the road. To that other branch hospital down the road, we have to send this additional information. If we tag this using a simple naming convention, you won't know what it is, but you will know that you don't know what it is and you will know that you can skip it." And the ability to put in information, knowing that your data interchange partners will know that they can skip it, is an enormous liberation for the healthcare people, because they have been struggling with formats that made that impossible.
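
The "you will know that you can skip it" behavior can be sketched in Python. The agreed-on field names and the extra element are hypothetical; the point is that the reader keeps what it recognizes and can report, by name, exactly what it ignored.

```python
import xml.etree.ElementTree as ET

# Fields the two hospitals agreed on (hypothetical names).
KNOWN_FIELDS = {"patientId", "name", "admitted"}

# One record, sent to every partner, carrying one extra element
# that only the branch hospital down the road needs.
RECORD = """<admission>
  <patientId>P-1001</patientId>
  <name>Doe, Jane</name>
  <admitted>2001-02-08</admitted>
  <wardBedCode>7B-12</wardBedCode>
</admission>"""


def read_admission(text):
    """Keep the agreed fields; note, then skip, anything unrecognized."""
    record, skipped = {}, []
    for child in ET.fromstring(text):
        if child.tag in KNOWN_FIELDS:
            record[child.tag] = child.text
        else:
            skipped.append(child.tag)  # self-labeled, so we know what we ignore
    return record, skipped


record, skipped = read_admission(RECORD)
print(record)   # {'patientId': 'P-1001', 'name': 'Doe, Jane', 'admitted': '2001-02-08'}
print(skipped)  # ['wardBedCode']
```

One export routine now serves both partners: each takes the fields it understands and passes over the rest.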

HS: I certainly see the importance of being able to do that, but it seems that the part you agree on is also very important.

MS: Oh, don't...

HS: If we just got the advantage of the part you didn't know about.

JB: But then there's all that stuff, you know, that you do encounter and then you say, "Well, what do we do with this?" And then yes, it sounds like we have the option to say, "Oh, I don't have to worry about it. It's there, but I can choose to either take it or not."

MS: Exactly. The self-labeling of the data means that it is much easier to have evolution of vocabularies. You can write your version 1 processors in such a way that when they see a version 2 document, they don't necessarily fall over and die. They may need to know, "Oh, for this version 2 document, I should fall over and die because I don't know how to deal with it properly and I can't skip this." But the evolution of vocabularies is a lot easier with data that clearly marks what it is and which version it is.
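
Self-labeling also lets a version 1 processor decide, element by element, whether it may skip something new or must refuse the whole document. The Python sketch below assumes a hypothetical convention: a version attribute on the root and mustUnderstand="true" on elements that cannot safely be ignored. The idea resembles SOAP's mustUnderstand attribute, but the names and rules here are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Elements this version 1 processor knows how to handle (hypothetical names).
KNOWN = {"orderNumber", "price"}


class CannotProcess(Exception):
    """Raised when a document contains an element we may not safely skip."""


def process(text):
    """Return (known fields, skipped tags), refusing mustUnderstand elements."""
    root = ET.fromstring(text)
    fields, skipped = {}, []
    for child in root:
        if child.tag in KNOWN:
            fields[child.tag] = child.text
        elif child.get("mustUnderstand") == "true":
            # Unknown and flagged as essential: fall over, deliberately.
            raise CannotProcess(
                f"<{child.tag}> in a version {root.get('version', '1')} "
                "document must be understood")
        else:
            skipped.append(child.tag)  # unknown but optional: skip it
    return fields, skipped


optional_extra = ('<order version="2"><orderNumber>4711</orderNumber>'
                  '<price>129.95</price><giftWrap>yes</giftWrap></order>')
essential_extra = ('<order version="2"><orderNumber>4711</orderNumber>'
                   '<currency mustUnderstand="true">NOK</currency></order>')

print(process(optional_extra))  # ({'orderNumber': '4711', 'price': '129.95'}, ['giftWrap'])
try:
    process(essential_extra)
except CannotProcess as err:
    print("refused:", err)
```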

JB: I do want to remind our listeners that now is a good time to send in any questions you might have for Michael on XML, at the address expert@cren.net. Michael, earlier you gave a heartfelt yes when we asked whether it was really important for more people to be using XML. But what are some of the reasons that people aren't using it? What are some of the barriers, perhaps, to using XML? And is there something we can do about that?

MS: Well, one of the barriers is, of course, that the people that you're exchanging data with don't use it. I was working on XML before I left the University of Illinois and I would have been very happy to use XML in a number of projects there, but the source of the data didn't want to provide it in XML. Actually, it was a classic academic computer center, administrative computer center situation and so they didn't want to talk to us at all, let alone agree to use [inaudible] the data format.

JB: Okay. It's kind of a critical mass problem.

MS: It's a critical mass problem. There's a certain degree of chicken-and-egg problem. In some cases, XML is going to require a certain amount of design and agreement because, as Howard says, you do have to agree on some things in order to actually have the data exchange make any sense. And those things take time.

HS: Getting people to agree on things is probably the most difficult part, probably more difficult than any technical problem.

MS: The political problems are very, very definitely big ones.

HS: Okay, we have a - I'm sorry.

MS: Get those things down.

HS: We have a couple of questions that have come in.

MS: Sure.

HS: One came in from Rick Chavez from the Mitre Corporation, and Rick asks, "Are there any standard DTD's or schemas that define document releasability over secure networks?"

MS: Okay!

JB: That means that you're thinking about it, right?

HS: If you're waiting for me to answer it, it wasn't going to happen.

MS: I suspect there are, but I suspect that they are done by people who won't tell me about it because I don't have the right security clearance. There is certainly a lot of SGML and XML use in the defense industry and in intelligence, and I would be surprised if no one had done that, so I would certainly talk to people knowledgeable about the CALS initiative. I think anyone at Mitre is likely to know about that already.

JB: What does that stand for?

MS: It has stood for a variety of things. When I learned it, it was Computer Aided Logistics and Support, or later, I think, Computer Aided Life Cycle and then I think most recently, it's been renamed Commerce At Light Speed. But it's a project to encourage or require defense contractors to deliver documentation in SGML or, I suppose now, XML form and so there are a lot of people associated with CALS dealing with problems like security. But I can't give you any more detailed leads than that.

HS: Okay, that's fine. We have a question from Ron De Gray at St. Joseph College and he asks, "Are there public domain XML editors analogous to HTML editors? If so, where can we get them?"

MS: Okay, sure. The XML editor I use is a mode for Emacs called PSGML mode, developed by a fellow at Linköping University in Sweden and extended for XML by David Megginson, who has been at a variety of organizations in Canada; the University of Ottawa is where he was when I first met him. PSGML mode now supports XML, and it makes a fine editor if you're an Emacs kind of person. If you're not, there's also XED from the Language Technology Group at the University of Edinburgh, and we should be sure to put pointers to these on the website.

JB: [inaudible], okay.

MS: You can certainly find a variety of other XML editors, public domain and commercial, on the website that Robin Cover maintains, a pointer to which is on the Tech Talk website.

HS: The browsers include HTML editors. Do they include XML editors as well, or are they going to?

MS: Well, the browser vendors don't tell me about their product plans, so I don't know for sure. I have heard speculation that that might happen in at least some cases, specifically in at least Microsoft's case because they are making such a big commitment to XML. But I don't think that has happened yet.

HS: Going back to these groups that are trying to come up with some standards to share information with XML like OTA, the Open Travel Alliance, or SIF, which is something that K-12 folks are using. Is there something like that going on for people in higher education?

MS: Well, I have been through Robin Cover's website without finding anything that jumped out at me. There are a lot of efforts based in higher education for disciplines, for research areas, but for the kinds of tasks that arise in administrative computing in higher education, I have not yet seen signs of such an effort.

HS: So although we have rental car agencies and hotels and K-12 really trying to pin down XML schemas, we don't have this going on in higher ed? What about this eduPerson initiative? Is that related to XML? Is that the beginning of this kind of thing, or how does that fit in?

MS: It looks... I'm not deeply familiar with eduPerson, but it looks to me as though eduPerson is not currently using XML. They're apparently developing a list of the properties that people have, as a profile for the Lightweight Directory Access Protocol, LDAP. But of course, any set of properties that's sufficiently precise to serve in an LDAP environment would represent the kind of data analysis or document analysis that you need for a successful XML effort. So capturing that information in the form of an XML DTD or an XML schema would be a natural next step, if people had that interest.

HS: Talking about next steps, where's this XML thing going right now? You talked about a few places where it's used. Is there some direction that it's taking right now or in the near future?

MS: All sorts of directions, in fact, so many that it's sometimes hard to get an overall sense of a unitary direction. There are an awful lot of people working on XML based vocabularies for a lot of things. Even within the W3C, we have XML being used as the basis for the privacy preferences profiles, for scalable vector graphics, for XSL itself and XML query and so forth. We have continuing development of the XML schema language which is XS, to be distinguished from a variety of other schema languages for XML, some of them proprietary, some of them open and all of them available in bewildering profusion.

The goal of any XML schema language is, as we said earlier, to get a better match to the needs of programming language, e-commerce and database people, so you get more information about data types, slightly more control over things like type inheritance, and so on. The other big direction in the W3C is the development of XML Query on the one side, which, by providing a standard query language, should make it much easier to search large bodies of XML material, whether it's document-like or database-like. And there's an XML Protocols activity, which is exploring what's necessary to use XML as the basis for messages that go back and forth, so that you can use XML to define line protocols.

HS: Tell me a little bit more about this XML Query. Is that going to replace SQL, and why not just use SQL?

MS: Well, the main reason not to use SQL is that SQL is very tightly tied to the relational model, and XML is tightly tied to the notion that the data looks less like a table than like a tree. So hierarchical nesting and so forth, for which there are no convenient constructs in SQL, are very central to querying XML. There's certainly an awful lot of interest in avoiding gratuitous incompatibilities with SQL, because a lot of the same organizations are involved; there are a lot of SQL vendors active in the XML Query Working Group. But...
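
To see why nesting matters for querying, here is a Python sketch over a hypothetical course catalogue. It uses the limited XPath subset built into ElementTree's find and findall, not XML Query itself; the point is only that a path expression can follow the hierarchy directly, where SQL would need the tree flattened into several joined tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: departments contain courses, which contain sections.
CATALOGUE = """<catalogue>
  <department name="Chemistry">
    <course code="CHEM101">
      <section term="spring"/>
      <section term="fall"/>
    </course>
    <course code="CHEM210">
      <section term="fall"/>
    </course>
  </department>
  <department name="History">
    <course code="HIST101">
      <section term="spring"/>
    </course>
  </department>
</catalogue>"""


def courses_in_department(root, name):
    """Walk down the tree: department -> course."""
    return [c.get("code") for c in root.findall(f"./department[@name='{name}']/course")]


def courses_offered_in(root, term):
    """Codes of courses with at least one section in the given term."""
    return [course.get("code")
            for course in root.iter("course")
            if course.find(f"section[@term='{term}']") is not None]


root = ET.fromstring(CATALOGUE)
print(courses_in_department(root, "Chemistry"))  # ['CHEM101', 'CHEM210']
print(courses_offered_in(root, "spring"))        # ['CHEM101', 'HIST101']
```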

HS: I mean, if XML Query ever becomes a thing that lots and lots of folks use, is that how people will query the web?

MS: That's certainly one possibility. The working group's scope statement is careful to say that its task is to develop a query language that can be used over finite bodies of material, which means they don't have to go back to square one if it turns out you can't use it to survey infinite bodies of material, which for all intents and purposes is what the web is becoming. On the other hand, I believe there's nothing in the current spec that makes it impossible to query arbitrarily large and unbounded bodies of material.

HS: Like the web.

MS: Like the web, exactly.

HS: Unbounded. We have a question from Mies Martin at Michigan Technological University and he says, "Is RDF an XML schema? If not, what's the difference between RDF and an XML schema?"

MS: Okay, RDF is an acronym for the Resource Description Framework, which is again a recommendation issued by the W3C a couple of years ago. It's the outgrowth of earlier work on content description and content rating systems. RDF itself is not an XML schema. There is a schema language for RDF called RDF Schema, which was developed before the W3C launched the XML Schema Working Group, and it differs from XML Schema in that RDF schemas are designed to constrain the underlying RDF data model, which, if I can wax technical for a brief moment, is simply a directed graph with labels on the arcs. XML Schema constrains, basically, trees. There are a few non-treelike things like keys and key references in an XML schema, but if you just look at the element structure, XML Schema is there for constraining trees and RDF Schema is there for constraining arbitrary graphs. RDF Schema does, or will, use the data type system developed for XML Schema, and that's one reason it's not a recommendation yet: they're waiting for XML Schema to become a recommendation.
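To make the graph-versus-tree distinction concrete, here is a minimal Python sketch. It is neither RDF syntax nor an RDF library: the triples and the `ex:` names are hypothetical placeholders. In the labelled graph one node can be the target of several arcs, whereas in an XML tree every element has exactly one parent, so a shared value must be repeated (or linked by something like a key reference).

```python
import xml.etree.ElementTree as ET

# RDF's model as described above: a directed graph with labels on the arcs.
# Each (subject, predicate, object) triple is one labelled arc.
# The "ex:" names are hypothetical placeholders, not a real vocabulary.
triples = [
    ("ex:paper1", "ex:author", "ex:alice"),
    ("ex:paper2", "ex:author", "ex:alice"),
    ("ex:alice", "ex:worksAt", "ex:univ"),
]

def incoming(node, graph):
    """Return (subject, label) for every arc that points at node."""
    return [(s, p) for s, p, o in graph if o == node]

# In the graph, one node is the target of two arcs.
print(incoming("ex:alice", triples))

# In an XML tree each element has exactly one parent, so the same author
# appears as two separate elements.
doc = ET.fromstring(
    "<papers>"
    "<paper id='paper1'><author>alice</author></paper>"
    "<paper id='paper2'><author>alice</author></paper>"
    "</papers>"
)
parent_of = {child: parent for parent in doc.iter() for child in parent}
authors = doc.findall("./paper/author")
print(len(authors), [parent_of[a].get("id") for a in authors])
```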

HS: Another standards group that I've heard mentioned is OASIS. Is that a standards group or is that a project? What is OASIS?

MS: OASIS is another membership organization. It grew out of an earlier organization called SGML Open, and its scope is support for and encouragement of open information standards: SGML, XML and related specifications. OASIS is interested in helping people with the development of specific XML vocabularies, and it's a place you can go if you have a particular need for standardization in a particular industry or a particular group. W3C cannot standardize everything.

HS: You wouldn't know that by looking at the web pages!

JB: Actually, is this the group, then, that helps support the chemists and the physicists and all these other discipline groups developing their own XML vocabularies?

MS: Yes, if you need an umbrella organization to provide a sort of framework for the development of a specification, gathering public comment and so forth and progressing it through rational stages, W3C does that for some areas but we have finite resources and we have to pick and choose so we try to concentrate on areas that seem very fundamental or of extremely wide application. And so we work on schema, we work on XML Query, we've developed vocabularies for graphic languages and so forth, but if the insurance industry needs something for the insurance industry, W3C is probably not the right place to go. OASIS may well be the right place to go.

JB: And we didn't mention math yet, did we? I think that's an important thing to mention.

MS: I don't think you have mentioned math.

HS: No, we didn't mention math!

JB: Do you want to mention math?

HS: No, we didn't mention math. We didn't mention MathML. Tell us all there is to know about MathML.

MS: Oh, I can't tell you all there is to know, but I can tell you a little bit. MathML is...

HS: That's Math Markup Language.

MS: Math Markup Language.

HS: I'm sorry. Can't see what we're reading here.

MS: It's a project of the W3C, and in fact, there is a recommendation out, MathML 1.0. They're now working on MathML 2.0, developing a tag set for marking up mathematics in such a way that...

HS: Okay, when you say "tag set," you mean XML schema. Is that a tag set? No?

MS: Tag set is a little broader. An XML schema defines an XML tag set, but you can also have a tag set that doesn't have a schema document. It's probably not a good idea, but for current purposes...

HS: Purposes of this talk!

MS: Basically, yes.

HS: [inaudible] schema.

MS: They've developed a DTD fragment, and they are now developing an XML schema for mathematics, so that you can display math without necessarily having to do what people mostly do today, which is do the math in LaTeX, generate Encapsulated PostScript, generate bitmaps from that and embed the bitmaps in your document.
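For contrast with the LaTeX-to-bitmap route, here is a small Python sketch that builds x² + 1 as presentation MathML using `xml.etree.ElementTree`. The namespace URI and the `math`, `mrow`, `msup`, `mi`, `mn`, `mo` elements are MathML's; the helper names `mml` and `x_squared_plus_one` are hypothetical. The point is that the formula stays structured markup rather than an image.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("", MATHML_NS)  # serialize MathML as the default namespace

def mml(tag: str) -> str:
    """Qualify a tag name with the MathML namespace."""
    return f"{{{MATHML_NS}}}{tag}"

def x_squared_plus_one() -> str:
    """Build presentation MathML for x^2 + 1 and return it as a string."""
    math = ET.Element(mml("math"))
    row = ET.SubElement(math, mml("mrow"))
    sup = ET.SubElement(row, mml("msup"))     # base with a superscript
    ET.SubElement(sup, mml("mi")).text = "x"  # identifier
    ET.SubElement(sup, mml("mn")).text = "2"  # number
    ET.SubElement(row, mml("mo")).text = "+"  # operator
    ET.SubElement(row, mml("mn")).text = "1"
    return ET.tostring(math, encoding="unicode")

print(x_squared_plus_one())
```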

JB: Ouch.

MS: "Ouch" is right.

HS: Okay, we must have said something in that last question that caused people to send in their e-mail. Either that or we're at the end.

JB: And we're already behind here!

HS: Right, when people send in all their questions. Anyway, we did just get a rash of questions. If we could try to go through them quickly, I'll even take them in the order they came in, Judith, instead of taking Princeton ones first here.

JB: Well, that's a good change!

MS: That's very big of you, Howard!

HS: A real change here! Right, we have one from Greg Spies at the University of Delaware. He said, "Do all current browsers in use allow XML?" Answer that quickly!

MS: No.

HS: Okay, Greg, the answer is NO!

MS: Yes. But talk to your browser vendor.

HS: Oh, I would say, more than talk to your browser vendor. I'd say fuss with your browser vendor because your browser vendor ought to support this stuff. It's a nuisance that they don't. Okay, we have a question from Lucy Stein at Rutgers, up the road. That's close to Princeton, so perhaps I'm showing some prejudice here.

JB: We'll [inaudible].

HS: Lucy says, "Can databases like Oracle or SQL Server directly export to XML format?"

MS: Many of them can. Those that cannot are working on it right now as far as I'm told, yes.
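Vendor products have their own built-in XML export features, which are not shown here. As a generic illustration of what such an export amounts to, here is a minimal Python sketch that uses SQLite (standard `sqlite3` module) as a stand-in database. The `courses` table, its columns and the `row` element name are hypothetical, and the sketch assumes column names are valid XML names and the table name is trusted.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_table_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Write every row of `table` as a <row> element with one child per column."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for record in cur:
        row = ET.SubElement(root, "row")
        for name, value in zip(columns, record):
            ET.SubElement(row, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (code TEXT, title TEXT, credits INTEGER)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [("CS101", "Intro to Computing", 3), ("MA201", "Linear Algebra", 4)],
)
xml_out = export_table_to_xml(conn, "courses")
print(xml_out)
```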

HS: Okay. She continues, "Because XML is browser limited, only works with IE 5, how shall we use it?" I think her point is that if only some of the browsers support it, is it ready for prime time?

MS: It's still... yes, it's ready for prime time, because even if you have to translate it into HTML, having XML as your source allows you to translate to HTML at one door and into PostScript at another door, so you can have a single source for both the paper and, say, the web version of your undergraduate catalogue.

JB: Now, I think that's really important and we want to make certain. So you're saying, then, that even though we do have that [inaudible] of browsers, we shouldn't wait for the browsers before...

MS: There is no need to wait for the browsers.

JB: Okay. I think that's important.

MS: The browsers, having the browsers support XML directly will certainly be a big help and a big advantage.

JB: Okay.

MS: But XML is worth deploying even without that because it means you can single-source your information and that means the print version and the web version of your undergraduate catalogue never need to get out of sync again.
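Single-sourcing can be sketched in a few lines of Python. This is a hypothetical example: one XML catalogue source is rendered two ways, with a fixed-width plain-text listing standing in for the print (PostScript) route. In practice the translations might well be written in XSLT; plain Python is used here only to keep the sketch self-contained.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single source for a course catalogue.
SOURCE = """<catalogue>
  <course code="CS101"><title>Intro to Computing</title><credits>3</credits></course>
  <course code="MA201"><title>Linear Algebra</title><credits>4</credits></course>
</catalogue>"""

def to_html(root: ET.Element) -> str:
    """Web door: render the catalogue as an HTML list."""
    items = "".join(
        f"<li><b>{escape(c.get('code'))}</b> {escape(c.findtext('title'))} "
        f"({escape(c.findtext('credits'))} credits)</li>"
        for c in root.findall("course")
    )
    return f"<ul>{items}</ul>"

def to_print(root: ET.Element) -> str:
    """Paper door: render the same data as fixed-width text lines."""
    return "\n".join(
        f"{c.get('code'):<8}{c.findtext('title'):<24}{c.findtext('credits')} cr."
        for c in root.findall("course")
    )

root = ET.fromstring(SOURCE)
web = to_html(root)
paper = to_print(root)
print(web)
print(paper)
```

Editing a title in `SOURCE` changes both outputs at once, which is why the two versions cannot drift out of sync.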

JB: That sounds wonderful! Okay, we've got more questions, Howard, right?

HS: Yes, in fact, we have about nine questions from the next person who sent a question in here and we are running very late here right now, so perhaps I should just grab one or two of the questions that we got from Eric Tribeau from - I'm not sure where, but we can - oh, from Bridgewater State College. But Eric says, "Why should I put my data into XML format? How does this make life easier, vs. placing data in some known proprietary format? And aren't XML schemas proprietary to begin with?"

MS: Well, it only makes life easier if you expect your data to be of interest to you longer than you can guarantee that you will want to work with that vendor. If you don't mind being locked in to one vendor, then there's no reason to avoid a proprietary format. Experience shows, however, that most of us care about our data a lot longer than we care about that vendor, or more to the point, a lot longer than that vendor may care about us! So it's sheer self-preservation. If you have your data in a proprietary format, effectively the owner of that format owns your data. You have to pay dues. And if it's in an open format, you own your data.

HS: Okay, I think we ought to start asking the final questions here and the kind of question we normally ask around this time, like long after we've run out of time...

JB: Now is a good time!

HS: Right, now is a good time to do that. For folks sitting out there at universities who really haven't done very much with XML, how should they get started?

MS: Well, it's certainly worthwhile to...

HS: Besides listening to this webcast.

MS: It's certainly worthwhile to find some small application in which your... sorry. Start with documents, start with simple documents. Use XHTML rather than old-style HTML, or design your own specialized markup language for specialized information. If you're involved in exchanging data between two locations or two devices on your system, you may find yourself having to invent a format. My colleagues at Illinois invented a format for sending jobs to the printer, and it looked a lot like XML. And acquire some of the public domain parsers. Play with them. Write applications. That's the software developer's point of view, of course.
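A first experiment with a parser can be as small as this Python sketch, which uses the standard library's `xml.etree.ElementTree` (a non-validating parser). It shows one practical difference between old-style HTML and XHTML: an unclosed `<br>` and a missing end tag make the text not well-formed XML, while the XHTML version parses. The `is_well_formed` helper is a hypothetical name, and it checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML (well-formedness only, no validation)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

old_html = "<p>First line<br>Second line"
xhtml = '<p xmlns="http://www.w3.org/1999/xhtml">First line<br />Second line</p>'
print(is_well_formed(old_html), is_well_formed(xhtml))  # False True
```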

HS: Okay, sort of a related question. I mean, if you were to look over to your business office or the place that's doing your financial type data processing, should those folks sort of stop right now and take another look at the way they're doing stuff in databases and exchanging information and say, "How should we fit XML into this?" I mean, is it early enough to start doing that, to really play with some of your key systems that keep track of important systems at the university?

MS: I would certainly say that for any new project you take up, or any new system, you should look at building XML into it, and over time, think about converting older systems. Quite frequently, XML can be used very effectively as just a bridge between that legacy system, whose internal format is never going to change and which is not going to be replaced for a long, long time, and other systems. If you can get the old behemoth to produce XML, and maybe that means you write a translator from the format it does produce into XML, it can be a lot easier for other people around the campus to read and process the XML. So I would go for it through osmosis. I wouldn't necessarily push the button and say, "All right, everything stops. Nothing happens for another six months while everybody gets intensive XML training, and then we'll start developing applications again." But that's just general policy. Nobody wants to do that.
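The "translator from the format it does produce into XML" can be sketched in Python. Everything here is assumed for illustration: the legacy system is imagined to emit fixed-width text records, and the `LAYOUT` column positions and element names are hypothetical, not any real system's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width layout of the legacy extract: (field, start, end).
LAYOUT = [("id", 0, 6), ("name", 6, 26), ("dept", 26, 30)]

def legacy_to_xml(lines: list[str]) -> ET.Element:
    """Translate fixed-width legacy records into an <employees> XML tree."""
    root = ET.Element("employees")
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the extract
        emp = ET.SubElement(root, "employee")
        for field, start, end in LAYOUT:
            ET.SubElement(emp, field).text = line[start:end].strip()
    return root

# Sample extract built to match the assumed layout.
extract = [
    "000123" + "Jane Smith".ljust(20) + "CHEM",
    "000456" + "Raj Patel".ljust(20) + "PHYS",
]
root = legacy_to_xml(extract)
print(ET.tostring(root, encoding="unicode"))
```

The legacy system itself is left untouched; only the small translator needs to know its layout, and every other consumer on campus reads the XML.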

HS: Do we expect that using XML is going to save universities money?

MS: Certainly in the long run. I have been in projects where we spent inordinate amounts of time writing screen scrapers to get information out of the administrative databases. In the amount of time we spent on the screen scrapers, they and we working together could have easily developed an XML format and written an XML exporter and an XML importer on our side. And I think XML would have paid for itself in that one project.

HS: That's great! That means we can go off to our financial people and actually make a case for XML.

JB: Of course, the other thing is that, from what you're saying, we may have to start asking more questions about our data. You know, is this data and content that we're going to have around for a long time, or is it really fairly short-term data? And therefore the projects where it would really be important to use XML are the longer-term projects, or the content that's going to be important for a longer time. Is that what you're saying?

MS: Yes.

JB: Oh, okay, wrong question.

MS: Exactly.

JB: Okay. We did have a question come in regarding the reference on OASIS, and I'd just like to mention online here that OASIS is referenced in Robin Cover's one-man survey of all things XML. So it's just one link down in there. Howard, how are we doing here? Do you have a final question?

HS: I have a final question.

JB: Okay.

HS: Michael, you've been talking about people doing all this XML stuff and I wonder, what kind of training does somebody need to be able to do XML? If we're going to send these people out to get trained, what kind of training should they get?

MS: Well, there are a couple of branches. Software developers, I think, should get general XML training. They should certainly look at XSLT because that simplifies the process [inaudible] data.

HS: Just tell us - I know what that means, but you should [inaudible].

JB: It came by pretty fast for me. What was it again?

HS: XSLT.

MS: XSLT. Extensible Stylesheet Language Transformations.

HS: And that does what? Tell us.

MS: Basically, it allows you to write, in a compact and fully declarative way, mappings that translate from one XML document form into another XML document form. Actually, it can also go into non-XML forms, but basically it maps from one XML tree to another. And my life has changed since I no longer have to use Perl or Python or Rexx for processing XML, for just mapping it into some other format. XSLT does everything much more easily and much faster and much more compactly, I might say.

HS: So that's one of the things that developers, software developers ought to...

MS: Developers should certainly look at that. The kind of application for which you're now using Perl may well stay in Perl, or may be better done in XSLT if the data's coming in in XML. For other people, I think it depends on the application. Quite frequently, if you're deploying XML because you're working with partners who are using a specific XML vocabulary for exchange of data, say, for ordering or something like that, then you'll probably want training in that particular application, in that particular vocabulary, both for the people developing software for it and for the people using that software.
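Working with a partner's vocabulary usually means handling its namespace and element names exactly as agreed. Here is a minimal Python sketch of reading a purchase order; the `urn:example:purchase-order` namespace, the `order`/`line` elements and their attributes are hypothetical stand-ins for whatever vocabulary the partners actually use.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical partner vocabulary; the namespace URI is a placeholder.
NS = {"po": "urn:example:purchase-order"}

ORDER = """<po:order xmlns:po="urn:example:purchase-order" number="A-17">
  <po:line sku="PAPER-A4" qty="10" unitPrice="4.50"/>
  <po:line sku="TONER-BK" qty="2" unitPrice="61.00"/>
</po:order>"""

def order_total(xml_text: str) -> Decimal:
    """Sum qty * unitPrice over the order lines, using Decimal for money."""
    order = ET.fromstring(xml_text)
    return sum(
        int(line.get("qty")) * Decimal(line.get("unitPrice"))
        for line in order.findall("po:line", NS)  # namespace-qualified lookup
    )

print(order_total(ORDER))
```

A lookup for plain `line`, without the namespace, finds nothing, which is one reason training in the specific vocabulary matters.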

There are a lot of XML courses out there, there are a lot of consultants willing and able to teach. Certainly for big production systems, it makes sense to bring in a consultant to help you develop your vocabulary, if you're developing your own, or even to help you figure out how to adapt an existing standard vocabulary to your needs. And those consultants can also advise you in much more detail about the kind of training that's appropriate for a particular project.

HS: But you're not soliciting for business right now?

MS: I am not soliciting for business. I am not myself available as a consultant because I'm too busy.

JB: Actually, when you talk about all these different vocabularies, I'm thinking, you know, it's just another variation on all the dialects we all grew up with, right?

MS: In a way, yes!

JB: We all have these various... I guess it's a codification, perhaps, of the discipline vocabularies that we've had all along.

MS: Yes.

HS: So now we can talk to each other in a well-defined way.

MS: Exactly. Communication can...

JB: Only if you have the right browser.

MS: We can pinpoint exactly where communication fails. It's no longer a global problem.

JB: Okay, all right. Let's see, Michael, do you have a final comment or something that you'd like to add? We've really enjoyed having you here.

MS: Well, I think that XML offers a lot of opportunities for application in higher education because this is the area that's responsible for our cultural memory, and I think having a data format that allows the data to live for a lot longer than it would otherwise live is a very good thing and a very good match for higher education.

JB: Okay, and Howard? Anything else?

HS: No.

JB: Okay.

HS: Or we'll be here 'til seven!

JB: Well, just a couple points, I think, for those of you who missed it at the beginning. Michael is staying up perhaps past his bedtime. It's close to 11:00 where you are right now?

MS: Yes, exactly.

JB: So when you sign off, you'll probably go to bed. The other thing is, I'd just like to make a comment that as we go forward into the future here, I was reading something that talked about how our analog brains are really going to be much slower in another 20 years than the computing brains and so as we build these languages, I guess the computers are going to be able to talk to each other faster than we can talk to each other.

HS: They can certainly do that now!

JB: Well, you know, you're right! There it goes. But they'll be able to do it even more efficiently. So there we go. Thanks, everyone, for being here today. It's time for our closing notes. Be sure to plan on joining us two weeks from today, when our special guest expert, Phil Schecter from the Burton Group, will be talking about web access management and security. The security expert Mark Bruhn from Indiana University, who's been a guest on Tech Talks before, will be the CREN co-host at that time.

Many thanks to all the institutions who support these Tech Talks, and also thanks to the Tech Talk team who helped make this event possible today. A special thanks to our Tech Talk expert, Michael Sperberg-McQueen; to technology anchor, Howard Strauss; to Terry Calhoun, Tech Talk web guru; to Jason Russell, Gayle Terkeurst and the support team at Merit Network; to Susie Berneis who is our audio file transcriber; and finally, a thanks to all of you for being here. You were here because it's time. Bye, Michael.

MS: Bye-bye.

JB: Bye, Howard.

HS: Bye, Judith.

JB: See you in two weeks, all!

HS: Bye-bye.

MS: So long.

END OF WEBCAST