CRUD

The Java API's DocumentManager interface defines all the CRUD-related functionality for MarkLogic. Once you have a DocumentManager instance (represented as docMgr in the table below), you can use the following methods for creating, reading, updating, and deleting content:

Task	Method
Create	docMgr.write(docURI, handle)
Read	docMgr.read(docURI, handle)
Update (Patch)	docMgr.patch(docURI, handle)
Delete	docMgr.delete(docURI)

Those four methods, read(), write(), patch(), and delete(), along with their various overloaded versions, provide MarkLogic's core CRUD functionality. You use the same write() method whether you're creating a document for the first time or later updating (replacing) it. In other words, there's no real distinction between creating and updating an entire document. However, if you want to act differently depending on whether a document already exists, you can first check for it by calling the exists() method. And if you only want to update part of a document (a patch), use the patch() method.
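To make these semantics concrete without a running server, here is a minimal in-memory sketch. The `InMemoryDocStore` class is hypothetical and is not part of the MarkLogic Java API; it only mimics the behavior described above: one `write()` call both creates and replaces, an `exists()` check lets you branch beforehand (here it simply returns a boolean), and `delete()` removes the document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in that mimics the write/exists/read/delete
// semantics described above. It is NOT the MarkLogic DocumentManager.
public class InMemoryDocStore {
    private final Map<String, String> docs = new HashMap<>();

    // Creates the document if the URI is new, otherwise replaces it entirely.
    public void write(String uri, String content) {
        docs.put(uri, content);
    }

    public boolean exists(String uri) {
        return docs.containsKey(uri);
    }

    public String read(String uri) {
        String content = docs.get(uri);
        if (content == null) {
            throw new IllegalStateException("Document does not exist: " + uri);
        }
        return content;
    }

    public void delete(String uri) {
        docs.remove(uri);
    }

    public static void main(String[] args) {
        InMemoryDocStore store = new InMemoryDocStore();
        String uri = "/example/flipper.json";
        // Branch on existence before writing, if your logic needs to.
        System.out.println(store.exists(uri) ? "updating" : "creating");
        store.write(uri, "{\"name\":\"Flipper\"}");
        // The same call again is now a full replacement, not a merge.
        store.write(uri, "{\"name\":\"Flipper\",\"species\":\"dolphin\"}");
        System.out.println(store.read(uri));
        store.delete(uri);
        System.out.println(store.exists(uri));
    }
}
```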

Notice the "handle" arguments in the above table. Handles provide a consistent way of wrapping many different kinds of content (and metadata) representations for use by MarkLogic. Example handle types include DOMHandle, FileHandle, InputStreamHandle, StringHandle, etc. Typically, you pass an "empty" handle to read(), which then populates the handle with content from the database for your subsequent use. Conversely, you pass a pre-populated handle to write(), which then inserts the encapsulated content into the database.
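The handle idea can be sketched in plain Java. The `TextHandle` class and the tiny manager below are hypothetical stand-ins, not MarkLogic's `StringHandle` or `DocumentManager`; they only show the flow described above: a populated handle goes into `write()`, and an empty handle goes into `read()`, which fills it and returns it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the handle pattern. Not the MarkLogic classes.
public class HandleSketch {
    // A minimal handle wrapping one representation (here, a String).
    public static class TextHandle {
        private String content;

        public TextHandle() {}                                   // empty: pass to read()
        public TextHandle(String content) { this.content = content; } // populated: pass to write()

        public String get() { return content; }
        public void set(String content) { this.content = content; }
    }

    private final Map<String, String> db = new HashMap<>();

    // write() takes the content out of a pre-populated handle.
    public void write(String uri, TextHandle handle) {
        db.put(uri, handle.get());
    }

    // read() fills the caller's empty handle and returns it for convenience.
    public TextHandle read(String uri, TextHandle handle) {
        handle.set(db.get(uri));
        return handle;
    }

    public static void main(String[] args) {
        HandleSketch docMgr = new HandleSketch();
        docMgr.write("/example/foo.txt", new TextHandle("some text"));
        TextHandle result = docMgr.read("/example/foo.txt", new TextHandle());
        System.out.println(result.get()); // prints "some text"
    }
}
```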

As you may recall, MarkLogic supports four primary document formats:

XML
JSON
text
binary

The Java API provides a sub-interface for each document format, plus a generic one:

DocumentManager
- XMLDocumentManager
- JSONDocumentManager
- TextDocumentManager
- BinaryDocumentManager
- GenericDocumentManager

When you're working with just one document format, such as JSON, you would use the corresponding format-specific interface, such as JSONDocumentManager. However, if you are processing documents of multiple or unknown formats, you also have the option of using GenericDocumentManager.

For more details on all the classes in the MarkLogic Java API, see the online javadocs.

Create a JSON document

Let's get started by loading a JSON document (src/main/resources/data/flipper.json) into the database. In Eclipse, take a look at Example_01_CreateJSON.java. Since we're loading a JSON document, we need an instance of JSONDocumentManager to start working with it. That's where our DatabaseClient instance (client) comes in:

// create a manager for JSON documents
JSONDocumentManager docMgr = client.newJSONDocumentManager();

Once we have a document manager, we can use it for any number of documents with that type (JSON in this case). In other words, you do not need to create a new document manager for each individual document you want to work with.

Before passing docStream (the InputStream encapsulating flipper.json) to our document manager, we need to wrap it with a handle object. In this case, we use the InputStreamHandle wrapper:

// create a handle on the content
InputStreamHandle handle = new InputStreamHandle(docStream);

Finally, we choose the URI that the document will be associated with in the database and call our document manager's write() method:

// write the document content
docMgr.write("/example/flipper.json", handle);

This has the effect of writing the given content (encapsulated in handle) to the database, using the given URI ("/example/flipper.json").

Go ahead and run the program in Eclipse (Run->Run). The flipper.json document will be inserted into the database, assuming your REST server is up and running with the expected configuration. We can verify that the content has been loaded by viewing the following URL in the browser (making direct use of the REST API):

http://localhost:8011/v1/documents?uri=/example/flipper.json&format=json

(You can log in as either the admin user or one of the REST users you configured during setup.)
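If you check documents from code or scripts rather than typing URLs by hand, a small helper can build these REST URLs. This sketch assumes the REST server from your setup is on localhost:8011, as in the URL above; the `RestUrls` class is hypothetical. It percent-encodes the uri parameter (so `/` becomes `%2F`), which the server decodes back to the same document URI.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds the REST API URLs used in this tutorial.
// Assumes the REST server configured during setup listens on localhost:8011.
public class RestUrls {
    static final String BASE = "http://localhost:8011/v1";

    // GET /v1/documents?uri=...[&format=...]
    public static String documentUrl(String uri, String format) {
        String url = BASE + "/documents?uri=" + URLEncoder.encode(uri, StandardCharsets.UTF_8);
        return format == null ? url : url + "&format=" + format;
    }

    // GET /v1/search?collection=...&format=json
    public static String collectionSearchUrl(String collection) {
        return BASE + "/search?collection="
                + URLEncoder.encode(collection, StandardCharsets.UTF_8)
                + "&format=json";
    }

    public static void main(String[] args) {
        System.out.println(documentUrl("/example/flipper.json", "json"));
        System.out.println(documentUrl("/example/flipper.xml", null));
        System.out.println(collectionSearchUrl("myCollection"));
    }
}
```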

Create an XML document

Now let's load an XML document into the database. Take a look at Example_02_CreateXML.java. This is very similar to loading a JSON document, except that now we use an XMLDocumentManager:

// create a manager for XML documents
XMLDocumentManager docMgr = client.newXMLDocumentManager();

Given the variable docStream pointing to an InputStream for src/main/resources/data/flipper.xml, we can now create a handle for the content:

// create a handle on the content
InputStreamHandle handle = new InputStreamHandle(docStream);

Finally, to load the document into the database, we call the write() method:

// write the document content
docMgr.write("/example/flipper.xml", handle);

After you run the program, you can verify the content has been loaded by accessing the REST API directly in your browser using this URL:

http://localhost:8011/v1/documents?uri=/example/flipper.xml

Create a text document (with a collection)

Creating a text document is quite similar, except that now (you guessed it) we use a TextDocumentManager. See Example_03_CreateText.java:

// create a manager for text documents
TextDocumentManager docMgr = client.newTextDocumentManager();

This time, rather than loading text from a file, let's just load it from a string:

// create a handle on the document's content
StringHandle content = new StringHandle("some text");

And to spice things up a bit, let's associate this document with a collection. Recall that documents can be associated with collections and directories. Directories are an implicit part of the document's URI, but a collection is a tag that's independent of the URI. To add a collection tag to this document, we effectively need to send not only the document's content, but also some associated metadata. For that, we start by constructing a DocumentMetadataHandle:

// create a handle for the document's associated metadata
DocumentMetadataHandle metadata = new DocumentMetadataHandle();

Then we add a collection tag to the metadata:

// add a collection tag
metadata.getCollections().addAll("myCollection");

Finally, we use the version of write() that takes a metadata handle as its second argument:

// write the document content & metadata
docMgr.write("/example/foo.txt", metadata, content);

After you run the example, verify that the document was added to the "myCollection" collection by opening the following URL in your browser:

http://localhost:8011/v1/search?collection=myCollection&format=json

You should see a reference to "/example/foo.txt" in the result, which means that document is indeed tagged with "myCollection". Don't worry about understanding the format of these search results. The Java API provides its own way of performing these searches (as we'll see), so we're cheating a little here just for the purpose of some immediate gratification.
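The directory/collection distinction from this section can be shown in a few lines of plain Java. `directoryOf` is a hypothetical helper, not a MarkLogic API: it derives the directory purely from the URI, while the collection lives in a separate set that has nothing to do with the URI.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrates directories (implied by the URI) versus collections (separate tags).
// The helper name is hypothetical.
public class DirectoryVsCollection {
    // Everything up to and including the last '/' is the document's directory.
    public static String directoryOf(String uri) {
        int slash = uri.lastIndexOf('/');
        return slash < 0 ? "" : uri.substring(0, slash + 1);
    }

    public static void main(String[] args) {
        String uri = "/example/foo.txt";
        Set<String> collections = new LinkedHashSet<>();
        collections.add("myCollection"); // independent of the URI
        System.out.println("Directory: " + directoryOf(uri));
        System.out.println("Collections: " + collections);
    }
}
```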

Create a binary document (with properties)

The only document format we haven't loaded yet is binary, so let's make this complete by loading an image into the database. In the last example, we associated the document with a collection. This time, let's associate the document with another kind of metadata: properties.

Documents can have several kinds of metadata associated with them:

collections,
permissions,
quality, and
properties.

We've seen what collections are (essentially tags). Permissions associate roles (such as "rest-writer") with capabilities (such as "update") for the document. Quality is a numeric value that can be used to boost a document's ranking in search results. Properties are arbitrary name/value pairs that you can associate with a document, outside of and in addition to its actual content. All of these are encapsulated in Java by the aforementioned DocumentMetadataHandle.
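One way to picture what DocumentMetadataHandle carries is a plain Java record with one field per kind of metadata. The `DocMetadata` record, its field types, and the sample property name are assumptions for illustration only; the real handle exposes this information through its own accessors, such as `getCollections()` and `getProperties()`.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java model of the four kinds of document metadata.
// DocumentMetadataHandle plays this role in the real Java API; this record is not it.
public class MetadataModel {
    public record DocMetadata(
            Set<String> collections,              // tags, independent of the URI
            Map<String, Set<String>> permissions, // role -> capabilities
            int quality,                          // numeric search-ranking boost
            Map<String, Object> properties) {}    // arbitrary name/value pairs

    public static DocMetadata example() {
        Set<String> collections = new LinkedHashSet<>(Set.of("myCollection"));
        Map<String, Set<String>> permissions = new LinkedHashMap<>();
        permissions.put("rest-reader", Set.of("read"));
        permissions.put("rest-writer", Set.of("update"));
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("reviewed", Boolean.TRUE); // hypothetical property name
        return new DocMetadata(collections, permissions, 0, properties);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```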

In this case, we're going to make use of MarkLogic's ability to automatically extract metadata from binary files and store it as a set of properties associated with the binary document. Since MarkLogic does this automatically on the server side, we don't even need to worry about encapsulating the metadata on our end. The only thing we have to do is enable this behavior (which is uniquely available for binary documents, via BinaryDocumentManager). Take a look at the relevant lines in Example_04_CreateBinary.java:

// create a manager for binary documents
BinaryDocumentManager docMgr = client.newBinaryDocumentManager();
 
// enable automatic metadata extraction into properties
docMgr.setMetadataExtraction(MetadataExtraction.PROPERTIES);

Once we have that, we create our handle as usual and write it to the database, without having to mention anything about properties or metadata:

// create a handle on the document's content
InputStreamHandle content = new InputStreamHandle(pngStream);
 
// write the document content
docMgr.write("/example/mlfavicon.png", content);

Go ahead and run the program. We'll see how to read the properties back using Java, but, again, in the interest of immediate gratification, follow this URL in your browser:

http://localhost:8011/v1/documents?uri=/example/mlfavicon.png&category=properties&format=json

Although this particular example doesn't include a lot of interesting properties, MarkLogic can handle hundreds of different kinds of binary documents (images, videos, office documents, etc.), automatically extracting useful metadata.

Read a JSON document

Now that we have a document of each format in the database, let's read them back, starting with the JSON document. Open Example_05_ReadJSON.java.

The first few steps (creating the DatabaseClient and getting the JSONDocumentManager) are identical to when we created a new JSON document. But this time, instead of calling the write() method, we call the read() method to read a database document having a specific URI. When we wrote content, we provided a handle already populated with an InputStream (for flipper.json). But when reading content, we instead supply an empty handle, which gets populated when the method executes.

Broadly, there are two kinds of handles:

Write handles, which define a representation for writing database content, and
Read handles, which define a representation for reading database content.

Many handle types can function as both read and write handles. Take a look at the full list of handles in the javadoc: both the built-in handle types and the extra handle types. What type of handle you use depends on how you prefer to locally interact with the data (JSON, in this case) in Java. A StringHandle will return the JSON as a string. A JacksonHandle will return the JSON as a tree structure (JsonNode), which can optionally be mapped to a POJO (plain old Java object), based on a data binding configuration that you provide.

In this case, we'll just use a StringHandle:

// create a handle to receive the document content
StringHandle handle = new StringHandle();

Then we call read(), which populates (and returns, for that matter) the handle:

// read the document content
docMgr.read("/example/flipper.json", handle);

To see that our StringHandle has in fact been populated with the document content, let's call the handle's get() method to retrieve the string, and then print it to the console:

// print out the document content
System.out.println(handle.get());

Run the program. You should see the JSON document printed to the console.

Read an XML document

Now let's read our XML document from the database. Open up Example_06_ReadXML.java. This is very similar to reading a JSON document, except that we're back to using an XMLDocumentManager. Although a StringHandle will work here as well, let's pick a read handle that is specifically tailored to XML. Some of the XML-specific handle types available include: DOMHandle, JDOMHandle, DOM4JHandle, XOMHandle, and JAXBHandle, the last of which lets you marshal POJOs to XML and unmarshal XML into POJOs.

In this case, we'll use a DOMHandle, which represents the given document as a DOM tree (and can function as either a read or write handle).

// create a handle to receive the document content
DOMHandle handle = new DOMHandle();

Then we call read(), which populates (and returns) the handle:

// read the document content
docMgr.read("/example/flipper.xml", handle);

To see that our DOMHandle has in fact been populated with the document content, we call the handle's get() method to retrieve the encapsulated DOM object and print the outermost element's name to the console:

// access the document content
Document document = handle.get();
 
String rootName = document.getDocumentElement().getTagName();
System.out.println("Read /example/flipper.xml content with the <"+rootName+"/> root element");
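DOMHandle.get() hands you an ordinary `org.w3c.dom.Document`, so everything you know about the JDK's DOM API applies. To experiment offline, you can produce the same kind of object with the JDK's own parser. The XML string below is a hypothetical stand-in, since the contents of flipper.xml aren't shown here, and the `DomRootName` class is illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Builds an org.w3c.dom.Document (the type DOMHandle.get() returns) from a string,
// using only the JDK, and reads the root element's name.
public class DomRootName {
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static String rootName(String xml) {
        Document document = parse(xml);
        return document.getDocumentElement().getTagName();
    }

    public static void main(String[] args) {
        String xml = "<dolphin><name>Flipper</name></dolphin>"; // hypothetical content
        System.out.println("Parsed content with the <" + rootName(xml) + "/> root element");
    }
}
```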

Now run the program. You should see the printed message, including the document element's name.

Read a text document (and its collections)

Open Example_07_ReadText.java. Reading a text document works much the same way. This time, however, we're also going to fetch the document's metadata, in addition to its content. That means we need to create two handles: one for the content (we'll use StringHandle) and one for the metadata (DocumentMetadataHandle):

// create a handle to receive the document content
StringHandle content = new StringHandle();

// create a handle to receive the document metadata
DocumentMetadataHandle metadata = new DocumentMetadataHandle();

Next, we retrieve both the content and metadata in one call to a 3-argument form of the read() method:

// read the document content
docMgr.read("/example/foo.txt", metadata, content);

To verify we've got everything back, we'll print the content and its collections to the console:

// print the document content
System.out.println(content.get());
 
// iterate over the collections and print each one
for (String collection : metadata.getCollections()) {
    System.out.println("Collection: " + collection);
}

Note that the metadata handle's getCollections() method returns an object, which, for your convenience, implements the java.util.Set interface, making it easy to iterate over its members. Run the program to see the console output.
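Because getCollections() returns a `java.util.Set`, the usual set rules apply: adding a collection that is already present changes nothing, and `contains()` is a simple membership test. The sketch below uses a plain `LinkedHashSet` as a stand-in for that object (an assumption for illustration); it happens to keep insertion order, so don't rely on any particular order from the real handle.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// A LinkedHashSet standing in for the Set returned by getCollections(),
// to show ordinary Set behavior. The tag() helper is hypothetical.
public class CollectionSetDemo {
    public static Set<String> tag(String... names) {
        Set<String> collections = new LinkedHashSet<>();
        for (String name : names) {
            collections.add(name); // adding an existing member is a no-op
        }
        return collections;
    }

    public static void main(String[] args) {
        Set<String> collections = tag("myCollection", "shakespeare", "myCollection");
        for (String collection : collections) {
            System.out.println("Collection: " + collection);
        }
        System.out.println(collections.contains("shakespeare"));
    }
}
```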

Read a binary document (and its properties)

To read binary content from the database, we need to use a handle class that implements the BinaryReadHandle interface (we'll use BytesHandle). Reading metadata works the same way as for any other document format: use a DocumentMetadataHandle. See Example_08_ReadBinary.java:

// get a manager for binary documents
BinaryDocumentManager docMgr = client.newBinaryDocumentManager();
 
// create a handle to receive the document content
BytesHandle content = new BytesHandle();
 
// create a handle to receive the document metadata
DocumentMetadataHandle metadata = new DocumentMetadataHandle();

We saw that one unique feature of the binary document manager is the ability to enable automatic metadata extraction when inserting binary documents. Another unique feature is support for "range requests," which are useful for serving large binaries, such as video. The BinaryDocumentManager interface provides overloaded versions of the read() method for retrieving parts of binaries (called sub-binaries) by specifying a range of bytes. In our case, we don't need to do that, since we're dealing with a small document; we'll just grab the entire thing in one request:

// read the document content & metadata
docMgr.read("/example/mlfavicon.png", metadata, content);
 
// get the document content as a byte array
byte[] contentBytes = content.get();
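If you did need range requests, you would first decide which byte ranges to ask for. The hypothetical `ByteRanges.chunks` helper below splits a binary of known length into fixed-size (start, length) pairs, the last of which may be shorter. It uses zero-based offsets; check the BinaryDocumentManager javadoc for the exact start-offset convention its read() overloads expect before passing these values along.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a binary of totalLength bytes into fixed-size ranges,
// the kind of (start, length) pairs you might request as sub-binaries.
// Offsets are zero-based here; verify the convention the real API expects.
public class ByteRanges {
    public record Range(long start, long length) {}

    public static List<Range> chunks(long totalLength, long chunkSize) {
        if (totalLength < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("totalLength must be >= 0 and chunkSize > 0");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < totalLength; start += chunkSize) {
            // The final chunk may be shorter than chunkSize.
            ranges.add(new Range(start, Math.min(chunkSize, totalLength - start)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : chunks(10, 4)) {
            System.out.println(r);
        }
    }
}
```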

Once we've read the content and metadata from the server, let's print some useful information to the console:

// print the image size to the console
System.out.println("Binary document size in bytes: " + contentBytes.length);
 
// iterate over the properties and print each one
for (Map.Entry<QName,Object> prop : metadata.getProperties().entrySet()) {
    System.out.println(prop.getKey().getLocalPart() + ": " + prop.getValue());
}

After printing the document's size in bytes, the above code iterates through the document's properties, making use of the fact that getProperties() returns an object that implements java.util.Map. Run the program to see each of the document's properties (key/value pairs) printed to the console.
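The map keys are `javax.xml.namespace.QName` objects rather than plain strings, because each property has a namespace as well as a local name. That is why the loop calls getLocalPart(). This JDK-only sketch builds such a map with made-up property names and values (not ones MarkLogic actually extracts from the PNG) to show what getLocalPart() keeps and what it drops.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;

// Shows QName-keyed properties like those returned by getProperties().
// Property names, namespace, and values here are hypothetical.
public class QNamePropertiesDemo {
    public static Map<QName, Object> sampleProperties() {
        Map<QName, Object> props = new LinkedHashMap<>();
        props.put(new QName("http://example.com/ns", "width"), 16);
        props.put(new QName("http://example.com/ns", "height"), 16);
        props.put(new QName("reviewed"), Boolean.TRUE); // no namespace
        return props;
    }

    public static void main(String[] args) {
        for (Map.Entry<QName, Object> prop : sampleProperties().entrySet()) {
            QName name = prop.getKey();
            // toString() gives "{namespace}local"; getLocalPart() drops the namespace.
            System.out.println(name.getLocalPart() + " (" + name + "): " + prop.getValue());
        }
    }
}
```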

Read only a document's metadata

Example_09_ReadMetadata.java shows that, thanks to the readMetadata() method, you don't have to download a document's content if all you want to do is read its metadata:

// read just the document's metadata
docMgr.readMetadata("/example/mlfavicon.png", metadata);

Run the program to see the document's properties, retrieved without downloading the image itself.

Read a document's metadata as raw XML

In addition to the tailor-made DocumentMetadataHandle POJO, we can also use other kinds of handles to receive and work with metadata in its raw form (as served up by the REST API). Take a look at Example_10_ReadMetadataAsXML.java. For our metadata handle, this time we're using a regular StringHandle:

// create a handle to receive the document metadata as XML
StringHandle metadata = new StringHandle();
 
// read just the document's metadata
docMgr.readMetadata("/example/mlfavicon.png", metadata);
 
// dump the metadata as raw XML
System.out.println(metadata.get());

Run the program to see the underlying XML representation of the document's metadata (collections, permissions, properties, and quality).
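Once you have the metadata as a raw XML string, you can pull values out of it with the JDK's DOM parser. The sample string below is an illustrative approximation of the REST API's metadata XML, not a captured response; to avoid depending on its exact namespace, the hypothetical `collections` helper matches `collection` elements by local name in any namespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Extracts collection names from raw metadata XML using only the JDK.
public class RawMetadataXml {
    // Returns the text of every element whose local name is "collection", in any namespace.
    public static List<String> collections(String metadataXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(metadataXml)));
            NodeList nodes = doc.getElementsByTagNameNS("*", "collection");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse metadata XML", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative approximation of metadata XML, not a captured server response.
        String xml = "<rapi:metadata xmlns:rapi=\"http://marklogic.com/rest-api\">"
                + "<rapi:collections><rapi:collection>myCollection</rapi:collection></rapi:collections>"
                + "<rapi:quality>0</rapi:quality>"
                + "</rapi:metadata>";
        System.out.println(collections(xml));
    }
}
```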

Read a document's metadata as raw JSON

Example_11_ReadMetadataAsJSON.java shows how you can read a document's metadata as JSON (as served up by the REST API). This is just like the previous example, except that we specify JSON as the format when creating our StringHandle:

// create a handle to receive the document metadata as JSON
StringHandle metadata = new StringHandle().withFormat(Format.JSON);

Run the program to see the JSON representation of document metadata.

Update (patch) a document

To update (replace) an entire document, you use the write() method just as when creating a document for the first time. See any of the "Create …" sections above for examples. (It is, however, possible to update a document's metadata without changing its content; for that you'd use your document manager's writeMetadata() method.)

If you'd like to update just part of a document, see the documentation for Partial Updates (PATCH).

Delete a document

No matter what kind of DocumentManager you use, deleting works the same. So in this case, let's just use a generic document manager to delete each of the documents that we've created so far. Open up Example_12_DeleteDocuments.java, and find the relevant lines of code:

// create a generic manager for documents
GenericDocumentManager docMgr = client.newDocumentManager();
 
// delete the documents
docMgr.delete("/example/flipper.json");
docMgr.delete("/example/flipper.xml");
docMgr.delete("/example/foo.txt");
docMgr.delete("/example/mlfavicon.png");

You can verify that a document has been deleted by running one of the earlier "Read" examples again; it will fail with an error reporting that the document does not exist.

Perform multiple database updates in the same transaction

As preparation for the search and query examples, let's populate the database with more data, including a bunch of JSON, XML, and image files from inside src/main/resources/data. The JSON documents describe talks given at the last MarkLogic World conference; the XML consists of a set of Shakespeare plays (associated with the "shakespeare" collection on load); and the images are photos with embedded metadata.

Take a look at Example_13_LoadUsingTransaction.java and run it. Besides calling write() a number of times, this program shows how to make a series of database updates either completely succeed or completely fail: all or nothing. In the previous examples, each update occurred in its own transaction, but in this example every call to write() is associated with the same transaction. First we create a transaction:

// start the transaction
Transaction transaction = client.openTransaction();

Then we reference the same transaction in every write() call we make:

// load each document in the same transaction
mgr.write(uri, metadata, doc, transaction);

None of the changes are committed (or globally visible) until we later call commit():

// Commit the transaction
transaction.commit();

Go ahead and populate the database by running the program. If for some reason the transaction gets stalled (for example, due to early program termination), see Rolling Back a Transaction for instructions on how to remedy the problem.
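The all-or-nothing idea can be modeled with a small in-memory class. `StagedStore` is hypothetical, not the MarkLogic API: writes are staged and only become visible on `commit()`, and `loadAll` shows the usual shape of wrapping the whole batch so that any failure rolls everything back. With the real Java API, the equivalent is calling `transaction.rollback()` in a catch block when something goes wrong before `transaction.commit()`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of a multi-statement transaction:
// staged writes become visible only on commit(); rollback() discards them.
public class StagedStore {
    private final Map<String, String> committed = new HashMap<>();
    private final Map<String, String> staged = new LinkedHashMap<>();

    public void write(String uri, String content) { staged.put(uri, content); }
    public boolean isVisible(String uri) { return committed.containsKey(uri); }
    public void commit() { committed.putAll(staged); staged.clear(); }
    public void rollback() { staged.clear(); }

    // Loads every document or none: any failure triggers a rollback.
    public static void loadAll(StagedStore store, Map<String, String> docs) {
        try {
            for (Map.Entry<String, String> doc : docs.entrySet()) {
                if (doc.getValue() == null) {
                    throw new IllegalArgumentException("No content for " + doc.getKey());
                }
                store.write(doc.getKey(), doc.getValue());
            }
            store.commit();
        } catch (RuntimeException e) {
            store.rollback();
            throw e;
        }
    }

    public static void main(String[] args) {
        StagedStore store = new StagedStore();
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("/example/a.json", "{}");
        docs.put("/example/b.xml", "<b/>");
        loadAll(store, docs);
        System.out.println(store.isVisible("/example/a.json"));
    }
}
```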

For loading larger amounts of data, I recommend checking out MarkLogic Content Pump, which lets you efficiently load large numbers of documents asynchronously, automatically dividing them up into appropriately sized chunks for loading.
