This version of the tutorial applies to MarkLogic 6 and MarkLogic 7. For MarkLogic 8 and later, see the updated version.
In this tutorial we will walk you through the creation of a simple web-based MarkLogic application. It builds on the foundation that we laid in Part 1. If you haven't completed that tutorial yet, now might be a good time, as we are going to pick up where it left off and build an application on the setup we performed there. This tutorial is not designed to teach you to be some AJAX-wielding web ninja, nor is it designed to make you an XQuery guru. What it will do is show you how to create a simple web-based application that leverages the power of MarkLogic, and along the way we'll pick up some best practices for building our applications. Enough already, on to the actual tutorial!
Creating an HTTP Server
In addition to being the industry's only operational database for Big Data, MarkLogic is also an HTTP server. Surprise! It is this feature that allows us to build web applications directly on the server using XQuery and to expose functionality to other services via an XML-RPC style interface. Query Console is really nothing more than a web application running directly on MarkLogic that provides a programming interface to the server.
Your MarkLogic server is administered through your web browser, at a default address of http://localhost:8001. Go to that location and you should see the Admin Interface.
In order to create our HTTP server we need to expand the "Groups" entry in the tree on the left side of the screen until we see the "App Servers" entry. Click on "App Servers" and on the top right you will be presented with a series of tabs. Click on the tab labeled "Create HTTP" and you will be presented with yet another form.
There are only four items that we need to complete to create our HTTP server. The first is the name. I would hate to break a trend at this point, so let's call our HTTP server Shakespeare. We also need to define the root directory for our HTTP server. This is the directory where the server will look for content to serve in response to HTTP requests. For those of you who have done some web application development in Java, this is the same as configuring the web root of your application server. Pick a location on your file system where you want to store the XQuery code that we are going to write for this application. Bear in mind that the server is going to need read/write access to this area, so be careful to pick someplace where directory permissions won't be an issue.
The next item that we need to configure here is the port that our HTTP server will listen on. I'm going to choose 8010 as I know that is a free port and it's easy to remember. Feel free to choose a different port if that one isn't free on your machine.
The final thing that we need to do here is to attach this HTTP server to our database. All HTTP servers have to be associated with a database so that the server has a context for evaluating any XQuery that comes in via the HTTP server. Simply select the "Shakespeare" database from the drop-down list, then click on the "ok" button. That's it! Our HTTP server is all set up!
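If you would rather script this step than click through the forms, the same four settings can be sketched with MarkLogic's Admin API and run from Query Console. Treat this as an illustration rather than part of the tutorial proper: the group name Default and the root path below are assumptions, so substitute your own values.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumptions: the server lives in the "Default" group and the root
   path is a placeholder for wherever you keep your XQuery code. :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
let $config := admin:http-server-create(
                 $config,
                 $group,
                 "Shakespeare",                 (: server name :)
                 "/path/to/your/xquery/",       (: root directory (placeholder) :)
                 8010,                          (: port :)
                 0,                             (: 0 = load modules from the file system :)
                 xdmp:database("Shakespeare"))  (: content database :)
return admin:save-configuration($config)
```

The form in the browser does the same thing, so there is no need to run this if you have already created the server by hand.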
Where to Put Our Code?
Now that we've created our HTTP server we're pretty much ready to start creating our web application. When you created the HTTP server you were asked for a root directory. I don't know what you chose, but I chose /Users/clarkrichey/Documents/MarkLogic/Education/Tutorials/Tutorial 2/xquery/. For the purpose of our discussion here, I don't care about the full path, only the name of the root directory which, in my case, is xquery. The actual name doesn't really matter much, but as a matter of style xquery is probably a good choice as it clearly indicates that this is where we are going to be storing the XQuery code that we are going to write. I suppose that code or src would do as well, but I like the fact that it's clear what kind of code is going to be found here: XQuery. This also gives us flexibility in larger projects where we may be working with Java or .NET code alongside the XQuery code.
Hello World!
At this point your xquery directory (or whatever you called it if you're doing your own thing here) should be empty. No files, no subdirectories. Zippy. So, let's try something. Let's go to http://localhost:8010 and see what happens. You should have been prompted for your login credentials and then you should have received a 404 Not Found error. This seems logical, as we just verified that the root directory for our HTTP server is empty. So, let's fix that. Create a file called default.xqy in your xquery directory and paste the following code into that file:
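A minimal default.xqy along the lines described in the rest of this section might look like the following. It is a sketch only: the title and body text are assumptions, and the important parts are the content-type call, the comma, the DOCTYPE string, and the HTML element.

```xquery
xquery version "1.0-ml";

(: Tell the browser that the response is HTML. :)
xdmp:set-response-content-type("text/html"),

(: The DOCTYPE declaration, returned as a plain string. :)
'<!DOCTYPE html>',

(: The page itself; the wording here is a placeholder. :)
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>
```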
Now, let's try this again. Let's go to http://localhost:8010. Ah ha! Now we see our requisite Hello World web page. Now that things are working, we should take a moment to talk about what exactly is happening. First off, when the HTTP server is presented with a request that does not specify a page (as in the request we just made), it automatically looks for a file named default.xqy in the HTTP server's root directory (this is all assuming that no URL rewriter has been set up, which is a topic for another time). Now that we have created just such a page, it was processed and the results were returned to our browser. We would have seen the exact same results if we had instead gone to http://localhost:8010/default.xqy.
Creating Web Content
OK, that's all well and good, but what was all that weird code we put into the file? Well, I'm glad you asked. The MarkLogic HTTP server provides us a way to dynamically create web pages in much the same way that you can with JSP or PHP pages. However, because MarkLogic is an XQuery platform and because we gave our file the .xqy extension, the server is expecting that this file will contain valid XQuery code. More specifically, the server is expecting that the file will contain a main module. A main module is simply some code that can be directly executed as an XQuery program. It must include, at a minimum, a query body consisting of an XQuery expression (which in turn can contain other XQuery expressions, and so on). Our main module contains an XQuery sequence expression whose first part is a call to xdmp:set-response-content-type. This function sets the content type (MIME type) of the response. We used this call to set a content type of text/html so that the browser would know to interpret the results as HTML, because most browsers do not intrinsically know what to do with content ending in .xqy.
However, as you will note from the documentation, the call to xdmp:set-response-content-type returns an empty sequence. Clearly, an empty sequence is not what we want in order to create a valid web page. In order to get our HTML returned to the browser we have to include it as part of the sequence that is returned. We did that by adding to the empty sequence returned by xdmp:set-response-content-type. The comma (,) that we placed after xdmp:set-response-content-type("text/html") indicated that what followed was the next part of the sequence: the DOCTYPE declaration, written as a string, followed by the HTML element that we wanted sent to the browser. I realize that all of this returning of sequences appended to sequences sounds a bit daunting at first, but I assure you that with just a little practice it becomes second nature in no time at all. Additional information on sequences as return types from XQuery expressions can be found in the "Expressions return items" section in the XQuery and XSLT Reference Guide.
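If the idea of an empty sequence disappearing into a larger sequence still feels abstract, this little expression, which you can paste into Query Console, may help. It is only an illustration and not part of the application; the string and the element are placeholders.

```xquery
xquery version "1.0-ml";

(: An empty sequence, a string, and an element joined with commas.
   The empty sequence contributes no items, so two items remain. :)
let $items := ((), "<!DOCTYPE html>", <html/>)
return fn:count($items)
```

This returns 2, which is exactly why the empty result of the content-type call never shows up in the page that reaches the browser.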
Dynamic Content
That was a good start, but returning static HTML really isn't very useful for actually building applications. In order to do something useful and interesting we need to return dynamic content. Well, as I alluded to earlier, we have the ability to include script that will be evaluated dynamically, much as you can with JSP or PHP pages. The main difference here is that instead of embedding Java or PHP in our pages we're going to embed XQuery to provide our dynamic functionality. Adding that functionality couldn't be simpler. All we need to do is take the XQuery code that we want evaluated and enclose it within curly braces, {}. So, let's try that out by adding a very simple XQuery expression to our default.xqy page.
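As a sketch of what that change might look like (the surrounding markup and wording are assumptions; the only part that matters is the expression inside the curly braces):

```xquery
xquery version "1.0-ml";

xdmp:set-response-content-type("text/html"),
'<!DOCTYPE html>',
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <p>Hello World!</p>
    {(: Everything inside the braces is evaluated as XQuery
        and the result is placed into the page. :)}
    <p>This page was served by MarkLogic version {xdmp:version()}.</p>
  </body>
</html>
```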
Now when we view this page in our browser we see the version of the MarkLogic server dynamically displayed as part of the HTML. This is due to the server evaluating the XQuery expression xdmp:version() that it encountered within the {} and returning the result as part of the HTML response. This ability to embed XQuery directly within our HTML will serve as the foundation for building up much more complex web applications. Let's continue our exploration of this capability by creating a small application that actually leverages the Shakespeare content that we went through so much effort (well...at least a little effort) to load into the server. Rather than having you go through all of the effort of copying and pasting some code, why don't you just download the simple application that I wrote so that we can discuss it in some more depth?
Let's get Modular with it
Go ahead and expand the zip file you just downloaded into the same directory where you placed the default.xqy file that we worked with earlier. What I want to do now is take a little bit of time to talk about how this very simple application is structured. Before we do, though, let me provide a disclaimer. There is no single correct way to structure your XQuery application. There are, however, some good fundamental practices and concepts that will help you to create a good structure for your projects, and those are what we will be looking at now. So, after unzipping the application you should notice the addition of two new files, search.xqy and results.xqy, as well as a new directory named modules. Let's ignore the modules directory for a moment and focus on those two new XQuery files. search.xqy is a very simple bit of code that creates a form allowing users to enter some text for the speaker they are searching for and then submits that form to the results.xqy page.
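If you don't have the download handy, a form page of this kind could be sketched roughly as follows. This is not the actual search.xqy from the zip file: the field name speaker, the labels, and the use of GET are all assumptions made for illustration.

```xquery
xquery version "1.0-ml";

xdmp:set-response-content-type("text/html"),
'<!DOCTYPE html>',
<html>
  <head>
    <title>Shakespeare Search</title>
  </head>
  <body>
    {(: The form posts back to results.xqy in the same directory.
        "speaker" is a hypothetical field name. :)}
    <form action="results.xqy" method="get">
      <label for="speaker">Speaker: </label>
      <input type="text" id="speaker" name="speaker"/>
      <input type="submit" value="Search"/>
    </form>
  </body>
</html>
```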
OK, clearly the results.xqy page is where a lot of the work must be happening. Let's dive in and see what's going on. A quick peek at this page shows that starting on line 10 we are looping through some sequence of SPEECH elements and displaying the LINE elements contained in each speech. Where did we get these search results from? Let's look more closely at line 10. Here we're calling some function called find-speech in the search-lib namespace. That sounds promising, but what is that function and where did it come from? Well, if we look at the code on line 1 we see that we are importing a module in the search-lib namespace and that we expect to find the file containing that module at the relative path modules/search-lib.xqy. Hmmmmmm...that's interesting. Do you remember how we talked about main modules earlier? Well, there is another type of module called a library module, and that is what we are importing. Library modules, unlike main modules, are not directly executable by the server. Instead they house reusable bits of code, typically functions, that we can access from elsewhere in our application as we did here. Think of library modules like JAR files in Java or DLLs in .NET. It's not exactly the same thing, but the idea is close enough. So, according to that import statement we just looked at on line 1, we should be able to find this library module in a file called search-lib.xqy within the modules directory. Let's pop that file open and see what we find!
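Before we open the library, here is a rough sketch of the shape of a page like results.xqy, so the import and the loop are easier to picture. It is not the downloaded file, so the line numbers mentioned above will not match it. The request field name speaker and the markup are assumptions; the namespace URI, the module path, and the function name come from the application itself.

```xquery
xquery version "1.0-ml";

(: Bring the library module's functions into scope. :)
import module namespace search-lib = "http://www.marklogic.com/tutorial2/search-lib"
  at "modules/search-lib.xqy";

(: "speaker" is a hypothetical form field name; "" is used if it is missing. :)
declare variable $speaker := xdmp:get-request-field("speaker", "")[1];

xdmp:set-response-content-type("text/html"),
'<!DOCTYPE html>',
<html>
  <head>
    <title>Search Results</title>
  </head>
  <body>
    <h1>Speeches by {$speaker}</h1>
    {
      (: One block per matching SPEECH, one paragraph per LINE. :)
      for $speech in search-lib:find-speech($speaker)
      return
        <div>
          {
            for $line in $speech/LINE
            return <p>{fn:string($line)}</p>
          }
        </div>
    }
  </body>
</html>
```

Note that the html element in this sketch deliberately carries no default namespace declaration. Adding one would change how the unprefixed LINE step in the path is interpreted, and the loop would quietly find nothing.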
The first interesting thing in this rather short and simple file appears on line 2, where we are declaring the namespace that is associated with this module (module namespace search-lib = "http://www.marklogic.com/tutorial2/search-lib";). Note that this is the same namespace we used when we imported the module into our results.xqy file. After that little bit of module housekeeping is taken care of, we jump right into declaring functions to be used in this module. In this case we only have one function defined, search-lib:find-speech. This is the function that we called from our results.xqy page in order to find lines spoken by a particular speaker. As you can see, this function takes a single string as the search parameter and it returns a sequence of zero or more SPEECH elements. This query is accomplished in a single line of XQuery where we do a case-insensitive query (where we also allow for wildcards) to find all SPEECH elements with a child element, SPEAKER, that matches our search term. While powerful, this query is simple enough that we are able to easily accomplish it in a single line of code. Why then did we go through all of the hassle to put this very simple query into a module in a completely separate file that we then had to import in order to use? Surely it wasn't just a completely arbitrary example to demonstrate the use of modules, was it?
Of course, the answer is no. There is a much more important reason why we separated out that search function, and it has to do with the fundamental concepts of code modularity and reuse. Simply put, we are employing a technique to separate the implementation of our search (contained in the search-lib module) from the use of those results, which in this case is to display some very simple XHTML in our results.xqy page. This simple technique is going to allow us to reuse our search code from within other portions of our application. Additionally, if we need to modify the way search works, perhaps by making the search case-sensitive, we have a single place to make that modification instead of having to track down every place we pasted the search code in order to maintain consistent search behavior. The concept behind this technique is probably not new to you if you have been programming for any period of time in any other language. The really important point here was to demonstrate how to implement that technique in XQuery.
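To make that concrete, here is a hedged sketch of what a library module like modules/search-lib.xqy could contain. The namespace declaration and the function name and signature follow the description above, but the query itself is my assumption about how such a search might be written (cts:search with cts:element-value-query); the downloaded file may well do it differently.

```xquery
xquery version "1.0-ml";

(: A library module declares its namespace instead of having a query body. :)
module namespace search-lib = "http://www.marklogic.com/tutorial2/search-lib";

(: Returns every SPEECH whose SPEAKER matches $speaker,
   ignoring case and allowing wildcard characters such as "*". :)
declare function search-lib:find-speech($speaker as xs:string) as element(SPEECH)*
{
  cts:search(
    fn:collection()//SPEECH,
    cts:element-value-query(
      xs:QName("SPEAKER"),
      $speaker,
      ("case-insensitive", "wildcarded")))
};
```

With the search kept in one place like this, making it case-sensitive would be a matter of changing a single option here, and every page that imports the module would pick up the new behavior.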
Summary
Hopefully you learned a few things during the course of this tutorial. We covered the basic techniques for getting an XQuery-based web application up and running. Along the way we talked about some best practices for laying out our project, and we looked at how modules allow us to reuse portions of our code within our application while also making our code more maintainable. We will build upon those concepts in an upcoming tutorial. In the meantime, get out there and start building your own web-based applications on MarkLogic!