In-Memory Transaction Example


Some applications use XML documents in a transient manner. That is, they create and store XML documents as a part of their run time, but there is no need for the documents to persist between application restarts. For this class of applications, overall throughput can be improved by abandoning the transactional durability guarantee. To do this, you keep your environment, containers, and logs entirely in-memory so as to avoid the performance impact of unneeded disk I/O.

To do this:

Refrain from specifying a home directory when you open your environment. The exception to this is if you are using the DB_CONFIG configuration file. In that case, you must identify the environment's home directory so that the configuration file can be found.
Configure your environment to back your regions from system memory instead of the filesystem.
Configure your logging subsystem such that log files are kept entirely in-memory.
Increase the size of your in-memory log buffer so that it is large enough to hold the largest set of concurrent write operations.
Increase the size of your in-memory cache so that it can hold your entire data set. You do not want your cache to page to disk.
Specify an empty string when you open your container. Note that for in-memory operations, you are limited to just one container.
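
The log buffer sizing step above comes down to simple arithmetic. The following is only a rough sketch: the helper name estimateLogBufferBytes, the per-document byte count, and the overhead factor are assumptions made for illustration and are not part of the BDB XML API. Measure your own workload before settling on a value.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Berkeley DB XML): a rough upper bound on
// the log space needed when every writer thread has one transaction in
// flight at the same time.
//   numThreads     - number of concurrent writer threads
//   docsPerTxn     - documents written by each transaction
//   approxDocBytes - assumed average size of one document, in bytes
//   overheadFactor - assumed multiplier for log record overhead
std::uint64_t estimateLogBufferBytes(std::uint64_t numThreads,
                                     std::uint64_t docsPerTxn,
                                     std::uint64_t approxDocBytes,
                                     std::uint64_t overheadFactor)
{
    return numThreads * docsPerTxn * approxDocBytes * overheadFactor;
}
```

With the example's defaults of 5 threads and 10 documents per transaction, and an assumed 4 KB per document with an overhead factor of 2, the estimate is 409600 bytes, which is well below the 10 MB log buffer that the example configures.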

As an example, this section takes the program presented in Transaction Example and updates it so that the environment, container, log files, and regions are all kept entirely in-memory.

To begin, we simplify the beginning of our example a bit. Because we no longer need an environment home directory, we can remove all the code that we used to determine path delimiters.

// File TxnGuideInMemory.cpp

// We assume an ANSI-compatible compiler
#include "dbxml/DbXml.hpp"
#include <cstdlib>
#include <iostream>
#include <pthread.h>
#include <sstream>
#include <string>

#ifdef _WIN32
extern int getopt(int, char * const *, const char *);
#else
#include <unistd.h>   // Declares getopt() and optarg
#endif

using namespace DbXml;

// Printing of pthread_t is implementation-specific, so we
// create our own thread IDs for reporting purposes.
int global_thread_num;
int global_num_deadlocks;
pthread_mutex_t thread_num_lock, thread_num_deadlocks;

// Forward declarations
int usage(void);
void *writerThread(void *);

struct ThreadVars {
    XmlContainer container;
    bool useReadCommitted;
    int numNodes;
};

Next, we modify the usage() function so that it no longer mentions the -h option, which was used to specify the environment home directory.

// Usage function
int
usage()
{
    std::cerr << "\nThis program writes XML documents to a DB XML "
              << "container. The documents are written using any number\n"
              << "of threads that will perform writes "
              << "using 50 transactions. Each transaction writes \n"
              << "10 documents. You can choose to perform the "
              << "writes using default isolation, or using \n"
              << "READ COMMITTED isolation. If READ COMMITTED "
              << "is used, the application will see fewer deadlocks."
              << std::endl;
    std::cerr << "\nNote that you can vary the size of the documents "
              << "written to the container by defining the number of \n"
              << "nodes in the documents. Up to a point, and depending "
              << "on your system's performance, increasing the number \n"
              << "of nodes will increase the number of deadlocks that "
              << "your application will see." << std::endl;
    std::cerr << "Command line options are: " << std::endl;
    std::cerr << " [-t <number of threads>]" << std::endl;
    std::cerr << " [-n <number of nodes per document>]" << std::endl;
    std::cerr << " [-w]       (create a Wholedoc container)"   << std::endl;
    std::cerr << " [-2]       (use READ COMMITTED isolation)" << std::endl;
    return (EXIT_FAILURE);
}

We are also able to eliminate the containerName and dbHomeDir variables from our main().

int
main(int argc, char *argv[])
{

    DbEnv *envp = NULL;
    XmlManager *mgrp = NULL;

    ThreadVars threadInfo;
    threadInfo.useReadCommitted = false;
    threadInfo.numNodes = 1;  // Default used when -n is not given

    // Initialize globals
    global_thread_num = 0;
    global_num_deadlocks = 0;

    int ch, i;
    int numThreads = 5;
    u_int32_t envFlags;
    XmlContainer::ContainerType containerType =
        XmlContainer::NodeContainer;

    // Application name
    const char *progName = "TxnGuide";

Parsing the command line arguments is somewhat simpler now too. We no longer care about the difference in file path delimiters between Windows and UNIX systems, and we no longer support the -h option.

    // Parse the command line arguments
    while ((ch = getopt(argc, argv, "n:t:w2")) != -1)
        switch (ch) {
        case 'n':
            threadInfo.numNodes = atoi(optarg);
            break;
        case 't':
            numThreads = atoi(optarg);
            break;
        case '2':
            threadInfo.useReadCommitted = true;
            break;
        case 'w':
            containerType = XmlContainer::WholedocContainer;
            break;
        case '?':
        default:
            return (usage());
        }
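
Note that atoi() returns 0 for input that is not a number, which the example later clamps to 1 without telling the user. If you prefer to detect bad input explicitly, a stricter parser can be sketched with strtol(). The helper name parsePositiveInt and its fallback parameter are hypothetical and are not part of the example program.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not in the original example): parse a positive
// decimal integer, returning 'fallback' when the text is empty, contains
// trailing garbage, is out of range, or is less than 1.
int parsePositiveInt(const char *text, int fallback)
{
    if (text == NULL || *text == '\0')
        return fallback;

    errno = 0;
    char *end = NULL;
    long value = std::strtol(text, &end, 10);

    // Reject overflow, partially parsed input, and non-positive values.
    if (errno != 0 || *end != '\0' || value < 1 || value > INT_MAX)
        return fallback;

    return static_cast<int>(value);
}
```

In the switch statement above, a call such as parsePositiveInt(optarg, 1) could stand in for atoi(optarg), under the assumption that 1 is an acceptable default for both options.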

Until now we have only eliminated things from the program. This is to be expected; after all, we need to collect less information in order to operate and so our code should be slightly simpler.

But now we need to start adding information to tell the Berkeley DB library that it must keep information in-memory only. We start by making the environment private with the DB_PRIVATE flag; this causes all the region files to be kept in memory.

Note that we also remove the DB_RECOVER flag from the environment open flags. Because our containers, logs, and regions are maintained in-memory, there can never be anything to recover.

    // Find out how many nodes we'll write to the container
    threadInfo.numNodes = threadInfo.numNodes < 1 ? 1 :
                          threadInfo.numNodes;

    // Find out how many threads
    numThreads = numThreads < 1 ? 1 : numThreads;

    std::cout << "Number nodes per document:       "
              << threadInfo.numNodes << std::endl;
    std::cout << "Number of writer threads:        " << numThreads
              << std::endl;

    std::string msg = threadInfo.useReadCommitted ?
                        "Read Committed " :
                        "Default";
    std::cout << "Isolation level:                 " << msg
              << std::endl;

    msg = containerType == XmlContainer::WholedocContainer ?
                           "Wholedoc storage" : "Node storage";
    std::cout << "Container type:                  " << msg << "\n\n"
              << std::endl;

    // Env open flags
    envFlags =
      DB_CREATE     |  // Create the environment if it does not exist
      // Removed DB_RECOVER flag
      DB_INIT_LOCK  |  // Initialize the locking subsystem
      DB_INIT_LOG   |  // Initialize the logging subsystem
      DB_INIT_TXN   |  // Initialize the transactional subsystem.
      DB_INIT_MPOOL |  // Initialize the memory pool (in-memory cache)
      DB_PRIVATE    |  // Region files are not backed by the filesystem.
                       // Instead, they are backed by heap memory.
      DB_THREAD;       // Cause the environment to be free-threaded

Now we configure our environment to keep the log files in memory, increase the log buffer size to 10 MB, and increase our in-memory cache to 10 MB. These values should be more than enough for our application's workload.

    try {
        envp = new DbEnv(0);

        // Specify in-memory logging
        envp->set_flags(DB_LOG_INMEMORY, 1);

        // Specify the size of the in-memory log buffer.
        envp->set_lg_bsize(10 * 1024 * 1024);

        // Specify the size of the in-memory cache
        envp->set_cachesize(0, 10 * 1024 * 1024, 1);
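
The set_cachesize() method takes the cache size as a count of gigabytes plus a count of additional bytes, followed by the number of caches. If you compute the size as a single byte total, it has to be split first. The following is a minimal sketch of that split; the CacheSize struct and the splitCacheSize name are hypothetical helpers, not part of the Berkeley DB API.

```cpp
#include <cstdint>

// Hypothetical holder for the first two set_cachesize() arguments.
struct CacheSize {
    std::uint32_t gbytes;  // Whole gigabytes
    std::uint32_t bytes;   // Remaining bytes (always less than 1 GB)
};

// Hypothetical helper: split a total byte count into the
// gigabytes-plus-bytes pair expected by set_cachesize().
CacheSize splitCacheSize(std::uint64_t totalBytes)
{
    const std::uint64_t oneGB = 1024ULL * 1024ULL * 1024ULL;

    CacheSize size;
    size.gbytes = static_cast<std::uint32_t>(totalBytes / oneGB);
    size.bytes  = static_cast<std::uint32_t>(totalBytes % oneGB);
    return size;
}
```

For the 10 MB cache used in this example, the split yields 0 gigabytes and 10485760 bytes, which matches the arguments passed to set_cachesize() above.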

Next, we open the environment and set up our lock detection. This is identical to how the example previously worked, except that we do not provide a location for the environment's home directory.

        // Indicate that we want to internally perform deadlock 
        // detection.  Also indicate that the transaction with 
        // the fewest number of write locks will receive the 
        // deadlock notification in the event of a deadlock.
        envp->set_lk_detect(DB_LOCK_MINWRITE);

        // Open the environment
        envp->open(NULL, envFlags, 0); 

        // Create and open a DB XML Manager.
        mgrp = new XmlManager(envp,
                              DBXML_ADOPT_DBENV); // Close the env when
                                                  // the manager closes.

When we open our container, we provide an empty string for the container name. This causes the container to be kept entirely in memory.

        // Open the container
        threadInfo.container =
            mgrp->openContainer("",
                DBXML_TRANSACTIONAL | // Container is transactional
                DB_THREAD           |
                DB_CREATE,            // Create the container if it does
                                      // not exist.
                containerType,        // Type of container to create
                0);

After that, our main() function is unchanged, except that our error messages are changed so as to not reference the environment home directory.

        // Initialize a pthread mutex. Used to help provide thread ids.
        (void)pthread_mutex_init(&thread_num_lock, NULL);
        // Initialize a pthread mutex. Used to count the number of
        // deadlocks encountered by the various threads in this example.
        (void)pthread_mutex_init(&thread_num_deadlocks, NULL);

        // Start the writer threads.
        pthread_t writerThreads[numThreads];
        for (i = 0; i < numThreads; i++)
            (void)pthread_create(
                &writerThreads[i], NULL,
                writerThread, (void *)&threadInfo);

        // Join the writers
        for (i = 0; i < numThreads; i++)
            (void)pthread_join(writerThreads[i], NULL);

    } catch(DbException &e) {
        std::cerr << "Error opening database environment: "
                  << std::endl;
        std::cerr << e.what() << std::endl;
        return (EXIT_FAILURE);
    } catch(XmlException &xe) {
        std::cerr << "Error opening XmlManager and Container: "
                  << std::endl;
        std::cerr << xe.what() << std::endl;
        return (EXIT_FAILURE);
    } catch(std::exception &ee) {
        std::cerr << "Unknown error: "
                  << ee.what() << std::endl;
        return (EXIT_FAILURE);
    }

    try {
        // Close our manager if it was opened.
        if (mgrp != NULL)
            delete mgrp;

        // We don't have to close our container or
        // environment handles. The container closes
        // when it goes out of scope. The environment
        // is closed when the manager is deleted, because
        // we specified DBXML_ADOPT_DBENV on the manager
        // open.

    } catch(XmlException &xe) {
        std::cerr << progName << ": Error closing manager and environment."
                  << std::endl;
        std::cerr << xe.what() << std::endl;
        return (EXIT_FAILURE);
    } catch(std::exception &ee) {
        std::cerr << progName << ": Error closing manager and environment."
                  << std::endl;
        std::cerr << ee.what() << std::endl;
        return (EXIT_FAILURE);
    }

    // Final status message and return.

    std::cout << "I'm all done." << std::endl;
    std::cout << "I saw " << global_num_deadlocks
              << " deadlocks in this program run."
              << std::endl;
    return (EXIT_SUCCESS);
}
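
The two mutexes initialized in main() protect the global counters that the writer threads share. Because writerThread() is not shown in this section, the following is only a sketch of the locking pattern, not the example's actual code. The names next_thread_num, next_thread_num_lock, and claimThreadNum are hypothetical.

```cpp
#include <pthread.h>

// Hypothetical stand-ins for the example's global counter and its mutex.
static int next_thread_num = 0;
static pthread_mutex_t next_thread_num_lock = PTHREAD_MUTEX_INITIALIZER;

// Hand out a small, printable thread number. The counter is only read
// and modified while the mutex is held, so two threads can never be
// given the same number.
int claimThreadNum()
{
    pthread_mutex_lock(&next_thread_num_lock);
    int mine = ++next_thread_num;
    pthread_mutex_unlock(&next_thread_num_lock);
    return mine;
}
```

The same lock, update, unlock sequence would apply to a shared deadlock counter, with its own mutex guarding each increment.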

That completes the updates we must make in order to cause the application to keep its environment, container, and logs entirely in memory. The writerThread() function is left entirely unchanged.

If you would like to experiment with this code, you can find the example in the following location in your BDB XML distribution:

BDBXML_INSTALL/dbxml/examples/cxx/txn