Jaguar User Guide

Matt Welsh
Jaguar v2.1 - Last modified 17 May 2000

Introduction

This is the user guide for Jaguar v2.1, dated 17 May 2000. Jaguar is a system developed by Matt Welsh at the UC Berkeley Computer Science Division which supports high-performance Java communication and I/O. For more information, see the Jaguar Project web pages.

Structural Overview

Before we discuss actually using Jaguar, it might help to get an idea of how the system works. Please read the papers and talks at the Jaguar project web site for background on Jaguar's motivation.

Jaguar consists of two compilers: a front-end compiler which converts Java bytecode to Jaguar bytecode, and a back-end compiler which converts Jaguar bytecode to machine code. The front-end compiler is itself written entirely in Java and is invoked with the command jaguarc. There are two back-ends available: one is a modification of OpenJIT, and the other is a modification to the GCJ Java front-end to GCC. OpenJIT-Jaguar acts like a regular JIT compiler, except that it understands Jaguar bytecode. GCJ-Jaguar compiles Jaguar bytecode to object files, which are then linked into an application binary, which is invoked just as one would any other program. The binary is also linked against a runtime library (libgcj) which provides all of the standard Java libraries, garbage collection, threads, and so forth.

jaguarc: The Jaguar front-end compiler

The real meat of how Jaguar works is embedded in jaguarc, the front-end compiler. jaguarc takes a Java classfile and converts it to Jaguar bytecode. Jaguar bytecode is identical to Java bytecode, with the exception that Jaguar bytecode has several new instructions not supported by Java. Currently, these instructions are

These instructions are necessary for Jaguar bytecode to perform certain operations "outside of the Java sandbox", for example, to manipulate network interface registers, machine-specific data structures, and "external" memory regions outside of the Java heap.

Clearly, we don't want to allow Java application code to use these instructions directly; Jaguar's approach is to translate Java bytecode to Jaguar bytecode at compile time, by applying a set of translation rules. These translation rules embody the privileged Jaguar driver code for a specific function, such as implementing Pre-Serialized Objects or manipulating VIA network buffers. In some sense, you can think of the Jaguar driver as a specialized "library" for a specific device or low-level system operation which one wishes to perform from Java; however, this driver code is inlined into the Java application at compile time.

Because Jaguar drivers are implemented through bytecode translation, a number of interesting properties arise. First, the resulting code is very efficient. Jaguar driver code is directly inlined with application code, and can be optimized (by the compiler) along with it. Also, the high cost of native methods is entirely avoided. Native methods incur a high overhead for both the method call itself as well as for copying data between Java and the native code context. This is particularly bad using JNI, found in most JVMs. The performance problems with native methods are documented in the papers on Jaguar.

Secondly, rather than hiding all of the driver code behind a method-call interface (as is done with native methods), we can translate an arbitrary set of Java bytecodes into Jaguar bytecode with inlined Jaguar driver code. For example, field accesses can be translated into specialized Jaguar driver code which implements fast object serialization; this is what is done with Pre-Serialized Objects. This is a powerful concept, as it allows Java applications to express system-level operations in a natural way (e.g., through field accesses, use of operators, or method calls as appropriate), yet the implementation of those operations can be extremely efficient. Again, the Jaguar papers contain much more detail about this.

jaguarc takes a Java .class file and produces either a Jaguar class file, which ends in the extension .jagc, or an annotated Java classfile, which is a copy of the original classfile with additional information for use by the back-end compiler. Jaguar classfiles are used by GCJ-Jaguar, while annotated classfiles are used by OpenJIT-Jaguar. The basic idea is that when you use a Jaguar-enabled JIT, the JVM only understands "standard" Java classfiles; using the annotations allows us to endow the classfile with additional information which the back-end compiler can use. The static Java compiler has no such limitation, since it is modified directly to support Jaguar classfiles.

By default jaguarc produces Jaguar classfiles; using the switch --annotate causes it to produce an annotated classfile instead. The rule of thumb is to use jaguarc --annotate whenever you are using OpenJIT-Jaguar, and just jaguarc when using GCJ-Jaguar. Running jaguarc Foo.class results in the Jaguar class file Foo.jagc. Running jaguarc --annotate Foo.class results in the Jaguar-annotated class file Foo.class, and a backup of the original classfile in Foo.class.bak.

The Jaguar back-end compiler

As mentioned above, there are two back-end compilers used by Jaguar. OpenJIT-Jaguar is a modification to the OpenJIT compiler from Fujitsu Labs and the Tokyo Institute of Technology. GCJ-Jaguar is a modification to GCJ, a static Java compiler based on GCC (formerly EGCS). To use Jaguar you will need to download and install one of these back-end compilers. Both can be used to compile and run standard Java applications as well.

The design of Jaguar is such that it is easy to modify an existing JVM, JIT, or static Java compiler to support Jaguar bytecodes. If there is interest I should be able to easily modify Kaffe to support Jaguar. I would very much like to take Sun's or IBM's JVM/JIT solutions and adapt Jaguar to them; however, this depends on getting source code for those systems.

Using Jaguar

How you use Jaguar depends on whether you are using OpenJIT-Jaguar or GCJ-Jaguar as your back end compiler. For most people I recommend using OpenJIT-Jaguar, but GCJ-Jaguar generates faster code (with the loss of some flexibility).

Here's how using OpenJIT-Jaguar works:

  1. The Java source code is compiled to Java bytecode (.class files).
  2. The Java bytecode is annotated using jaguarc --annotate.
  3. The annotated bytecode is compiled to machine code by OpenJIT-Jaguar at runtime. Setting JAVA_COMPILER to Jaguar accomplishes this.

Using GCJ-Jaguar is very much the same as using GCJ by itself. It works like this:

  1. The Java source code is compiled to Java bytecode (.class files).
  2. The Java bytecode is translated to Jaguar bytecode (.jagc files) using jaguarc.
  3. The Jaguar bytecode is compiled to machine code, using gcj. Normally this would be done in several steps, e.g., by compiling each .jagc file to an object file, and then linking many object files together into an executable.

Using jaguarc

jaguarc takes the following syntax:

jaguarc [options] classfile
Only one classfile may be specified on the commandline at a time.

The options can be:

The rulefile is a file containing a list of Jaguar translation rules to apply to the bytecode being processed. Each line of the rulefile has the format

classname [ arguments ... ]
The classname is the fully-qualified Java classname of the rule to apply, while arguments are optional arguments passed to that rule at initialization time. The format and structure of the arguments is up to the rule itself.

The rulefile included in Jaguar v2.0 (in the file jaguar2/default-jaguar-rules) is just:

Jaguar.compiler.rules.PSOBufferRule
Jaguar.compiler.rules.PSORule
Jaguar.compiler.rules.ViaDescrRule
Jaguar.compiler.rules.ViaDBRule
Note that none of these rules have any additional arguments. Here, PSOBufferRule is a driver for Pre-Serialized Object buffers (also known as External Objects in the Jaguar papers); PSORule is a driver for PSO field accesses and method calls. ViaDescrRule is a driver for operations on JaguarVia queue descriptors, and ViaDBRule for JaguarVia doorbell operations.

Clearly, the rules listed in the rulefile must be all accessible to jaguarc (that is, they must be on your CLASSPATH).
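
The rulefile format described above is simple enough to read by hand. Here is a minimal sketch, in plain Java, of how a tool might parse such a file and check that each rule class can be found on the CLASSPATH. The names RuleFileSketch, RuleEntry, parseLine and isOnClasspath are made up for illustration and are not part of Jaguar. The sketch also assumes that arguments are separated by whitespace and that blank lines are skipped, which this guide does not actually specify, and it is written for a current JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Jaguar: parses rulefile lines of the form
//   classname [ arguments ... ]
class RuleFileSketch {
    // One parsed rulefile line: the rule's class name plus its raw arguments.
    static final class RuleEntry {
        final String className;
        final List<String> arguments;

        RuleEntry(String className, List<String> arguments) {
            this.className = className;
            this.arguments = arguments;
        }
    }

    // Assumption: tokens are separated by whitespace. Returns null for a blank line.
    static RuleEntry parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String[] tokens = trimmed.split("\\s+");
        List<String> args =
            new ArrayList<String>(Arrays.asList(tokens).subList(1, tokens.length));
        return new RuleEntry(tokens[0], args);
    }

    // Parses every non-blank line of a rulefile's contents, in order.
    static List<RuleEntry> parse(List<String> lines) {
        List<RuleEntry> entries = new ArrayList<RuleEntry>();
        for (String line : lines) {
            RuleEntry entry = parseLine(line);
            if (entry != null) {
                entries.add(entry);
            }
        }
        return entries;
    }

    // True if the named class can be located through this class's class loader,
    // that is, if it is visible on the current CLASSPATH.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, RuleFileSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false;
        }
    }
}
```

Running isOnClasspath over each parsed className before invoking jaguarc is one way to catch a CLASSPATH mistake early.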

jaguarc takes the classfile specified on the command line and outputs either an annotated classfile (when using the --annotate option) or a Jaguar classfile (with the filename extension .jagc).

Running Applications using OpenJIT-Jaguar

First compile your Java sourcecode to .class files using javac. (I have not tested alternate bytecode compilers, such as Jikes, but they should work.) Next, annotate each class file using jaguarc --annotate. Note that you don't strictly need to run jaguarc on every classfile in your application; only those classfiles that you know will be using Jaguar drivers (such as the PSO libraries) need this treatment. For example you don't need to run jaguarc on any of the standard class libraries in your JDK installation.

Now all that should be required is to set the JAVA_COMPILER environment variable to Jaguar. You will know that it is running if the following line is printed when running a Java application:

   Jaguar/OpenJIT-1.1.10(X86 JDK1.1.7B)

Compiling Applications using GCJ-Jaguar

To compile a Java application to a running binary using GCJ-Jaguar, here's what you do:

  1. Compile each Java source file to bytecode using javac:
      javac myclass.java
    
    (You can also use:
      gcj -C myclass.java
    
    for this purpose, however, GCJ does not yet support inner classes when compiling source. I prefer to use the standard Sun javac as the bytecode it produces is well-understood by many compilers.)

  2. Compile each class file to Jaguar bytecode using jaguarc:
      jaguarc myclass.class
    

    Note that you don't strictly need to run jaguarc on every classfile in your application; only those classfiles that you know will be using Jaguar drivers (such as the PSO libraries) need this treatment. For example you don't need to run jaguarc on any of the standard class libraries in your GCJ installation.

  3. To compile each Jaguar classfile to an object file, use:
      gcj -O6 -c -o myclass.o myclass.jagc
    

    (Of course, you can also do:

      gcj -O6 -c -o myclass.o myclass.class
    
    if you don't wish to use any Jaguar drivers from myclass.)

  4. You can then take a collection of .o files and produce a shared library:
      gcj -O6 -shared -o mylibrary.so myclass1.o myclass2.o ...
    

    You can even combine the last two steps into one:

      gcj -O6 -shared -o mylibrary.so myclass1.jagc myclass2.jagc ...
    

  5. And now you're ready to link the whole application:
      gcj -O6 --main=MyApplication -o MyApplication MyApplication.o mylibrary.so
    
    Here, MyApplication is the class which contains the method
    public static void main(String args[]) { ... }
    that is, the main method for your application. The above example assumes that MyApplication.o was in turn compiled from running gcj on MyApplication.jagc.

  6. To run your application, just do:
      ./MyApplication
    
    that is, just run it like any other program!

Of course, it's not necessary that all of the above files be in the same directory or in the same Java package; I'm only showing it that way for simplicity's sake. Take a look at the Makefiles included in the Jaguar release for an example of using this for a multi-package, multi-directory project.
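
As a compact recap of steps 1 through 3 above, here is a small sketch that only builds the command lines for one class as strings; it does not run anything. The class GcjJaguarBuildPlan and its method are hypothetical, the commands and flags are exactly the ones shown in the steps, and the sketch assumes a single class in the current directory with no package.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: lists the per-class commands from the steps above.
// It builds strings and never executes them.
class GcjJaguarBuildPlan {
    // usesJaguarDrivers selects between compiling the .jagc file (after
    // running jaguarc) and compiling the plain .class file directly.
    static List<String> commandsFor(String className, boolean usesJaguarDrivers) {
        List<String> commands = new ArrayList<String>();
        commands.add("javac " + className + ".java");
        String input = className + ".class";
        if (usesJaguarDrivers) {
            commands.add("jaguarc " + className + ".class");
            input = className + ".jagc";
        }
        commands.add("gcj -O6 -c -o " + className + ".o " + input);
        return commands;
    }

    public static void main(String[] args) {
        for (String command : commandsFor("myclass", true)) {
            System.out.println(command);
        }
    }
}
```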

An additional option to gcj has been added for GCJ-Jaguar:

  gcj -fjaguar-debug ...
will print out additional debugging information while compiling .class or .jagc files. Right now this displays the bytecode PC currently being processed during compilation. This can be useful if you are debugging a Jaguar driver that seems to be giving the compiler problems.

Note that gcj can also compile Java sourcecode directly, as in:

  gcj -O6 -c -o myclass.o myclass.java   # DON'T DO THIS!
While this is fine for classes you don't intend to use Jaguar drivers with, there is the additional caveat that GCJ does not (yet) support inner classes when compiling directly from source. It's usually better to compile down to bytecode first and then to machine code.

Other Tools

jag-dump

Another tool included with Jaguar is jag-dump, which dumps the contents of a Java (plain or annotated) or Jaguar classfile, showing you the bytecode for each method in the class. This is like a Jaguar-enhanced version of javap. This is useful to debug Jaguar translation rules (see below).

To use it, just type

jag-dump [ .class file | .jagc file ]

For example,

% jag-dump Hello.jagc
Hello.jagc is a Jaguar classfile

Method main ([Ljava/lang/String;)V:
[0] getstatic java/lang/System.out
[3] ldc 0x1
[5] invokevirtual java/io/PrintStream.println (Ljava/lang/String;)V
[8] return

Method <init> ()V:
[0] aload_0
[1] invokespecial java/lang/Object.<init> ()V
[4] return

If you are dealing with an annotated classfile, jag-dump will show the original bytecode as well as the "patched" Jaguar bytecode resulting from the application of the Jaguar translation rules.

Java Debugging with gdb

It is possible to debug Java programs compiled with GCJ using GDB. This is very useful, as it allows you to drill down and debug not only high-level Java code, but also debug into native methods, debug multithreaded applications, inspect machine registers, look at the machine code for the portion of the application you're debugging, and more.

To do this, you need a recent version of GDB 4.18 with support for Java and Linux threads; an RPM for Red Hat systems is available from the Jaguar download page. (Note that not all versions of GDB 4.18 include support for Java and Linux threads. It appears as though the GDB 4.18 included with Red Hat 6.1 will work, while previous versions may not.)

When debugging GCJ-compiled Java programs, you need to tell GDB to ignore the SIGPWR and SIGXCPU signals (which are used by the garbage collector). This can be done with the GDB commands:

handle SIGPWR nostop noprint
handle SIGXCPU nostop noprint
Alternatively, you can place these two lines in the file .gdbinit in the directory where you're running GDB.

Here is an example of debugging a simple test program (which uses multiple Java threads) in GDB:

$ javac TestT.java
$ gcj -g -O --main=TestT -o TestT TestT.class
$ gdb TestT
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) handle SIGPWR nostop noprint
Signal        Stop      Print   Pass to program Description
SIGPWR        No        No      Yes             Power fail/restart
(gdb) handle SIGXCPU nostop noprint
Signal        Stop      Print   Pass to program Description
SIGXCPU       No        No      Yes             CPU time limit exceeded
(gdb) break TestT.main
Breakpoint 1 at 0x8049fa2: file TestT.java, line 64.
(gdb) run
Starting program: /disks/now/grad/mdw/src/ninja/test/mdw/TestT 
[New Thread 16843 (manager thread)]
[New Thread 16835 (initial thread)]
[New Thread 16844]
[Switching to Thread 16844]

Breakpoint 1, TestT.main (args=@806cff0) at TestT.java:64
64          TestT a1 = new TestT(1,false);
(gdb) where 
#0  TestT.main (args=@806cff0) at TestT.java:64
#1  0x4011033a in java::lang::FirstThread::run (this=@8064f90)
    at /home/cs/mdw/disks/enclave1/libgcj-991104/libjava/java/lang/natFirstThread.cc:52
#2  0x400ccdfa in java.lang.Thread.run_ (this=@8064f90)
    at /home/cs/mdw/disks/enclave1/libgcj-991104/libjava/java/lang/Thread.java:119
#3  0x4011554a in java::lang::Thread::run__ (obj=@8064f90)
    at /home/cs/mdw/disks/enclave1/libgcj-991104/libjava/java/lang/natThread.cc:286
#4  0x4012524a in really_start (x=@805fef0)
    at /home/cs/mdw/disks/enclave1/libgcj-991104/libjava/posix-threads.cc:316
#5  0x401d7ba6 in GC_start_routine (arg=@807ffe0)
    at /home/cs/mdw/disks/enclave1/libgcj-991104/boehm-gc/linux_threads.c:533
#6  0x401eece9 in pthread_start_thread (arg=@bf7ffe7c) at manager.c:204

Note that the stack trace includes both Java code and the native methods in the libgcj runtime library!

You can examine threads using the info threads and thread commands:

(gdb) info threads
* 3 Thread 16844  TestT.main (args=@806cff0) at TestT.java:64
  2 Thread 16835 (initial thread)  0x4022a1bb in __sigsuspend (set=0xbffff4f4)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
  1 Thread 16843 (manager thread)  0x402b37d0 in __poll (fds=0x808fef0, 
    nfds=1, timeout=2000) at ../sysdeps/unix/sysv/linux/poll.c:45
(gdb) thread 2
[Switching to thread 2 (Thread 16835 (initial thread))]
#0  0x4022a1bb in __sigsuspend (set=0xbffff4f4)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
48      ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
Current language:  auto; currently c
(gdb)

Using Pre-Serialized and External Objects

External Objects and Pre-Serialized Objects (PSOs), described in the Jaguar papers, are the most generally useful feature provided by Jaguar. External Objects allow you to create and manipulate memory regions outside of the Java heap directly from your application; these memory regions could correspond to network buffers, memory-mapped files, or explicitly-managed application heaps. PSOs take this a step further and allow you to map Java objects onto such an external memory region; every field access to these objects automatically "serializes" the data such that it can be directly transmitted over the network, or saved to disk for later use.

The point of PSOs is that the standard Java object serialization is extremely slow, but many applications still want to be able to communicate Java objects over the network, or read and write them from disk files. Using PSOs you can avoid the overhead of object serialization altogether; see the Jaguar papers for more information on this. PSOs mapped onto a disk file give you a kind of "fast persistence" as well as a way to do very efficient disk I/O (since the data is read from or written to a memory-mapped disk file, rather than using the standard Java I/O libraries).

The test programs in jaguar2/classpath/Jaguar/PSO/test demonstrate the use of External Objects and PSOs. PSOTest creates a MemoryPSOBuffer -- an External Object which maps onto a region of memory outside of the Java heap -- and creates a PSO within it. FilePSOBuffer does the same thing, except that the External Object is mapped from a file specified on the command line.

The PSO itself is in the class MyObject.java in the test directory. To specify your own PSOs, you create a class which:

Note that PSOAllocator is an interface which is implemented by subclasses of PSOBuffer. That is, MemoryPSOBuffer and FilePSOBuffer are two valid arguments for the constructor of Jaguar.PSO.PSO. The idea is that the PSOAllocator is responsible for mapping the PSO onto itself.

Any public fields of your class which are one of the Java primitive types (byte, int, long, etc.) or are references to subclasses of Jaguar.PSO.PSO will be represented in the "pre-serialized" format of the object. That is, an assignment of any value to that field writes the value into the External Object memory out of which the PSO is mapped; any read of the field's value will read from the External Object memory as well. Other fields will be treated as normal Java fields.

What this means is that you can do something like:

FilePSOBuffer fb = new FilePSOBuffer(filename);
MyObject mo = new MyObject(fb); // Map PSO onto file
System.out.println("The field value is: "+mo.someInt);
The value of mo.someInt will be read from the data contained in the file.
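
The rule for which fields end up in the pre-serialized format (public fields of a primitive type, or public references to PSO classes) can be sketched with ordinary Java reflection. This is an illustration only, written for a current JDK: StandInPSO plays the role of Jaguar.PSO.PSO, the Example class and all method names are made up, and the real decision is made by the Jaguar driver at translation time, not by reflection. How static fields are treated is not stated in this guide, so the sketch does not special-case them.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Illustration only; these classes are stand-ins, not Jaguar's.
class FieldRuleSketch {
    static class StandInPSO { }               // plays the role of Jaguar.PSO.PSO

    static class Example extends StandInPSO {
        public int someInt;                   // public primitive: pre-serialized
        public Example next;                  // public PSO reference: pre-serialized
        public String label;                  // non-PSO reference: normal Java field
        private long hidden;                  // not public: normal Java field
    }

    // Applies the rule to a single field: it must be public, and its type must
    // be primitive or assignable to the PSO base class.
    static boolean isPreSerialized(Field f, Class<?> psoBase) {
        if (!Modifier.isPublic(f.getModifiers())) {
            return false;
        }
        Class<?> type = f.getType();
        return type.isPrimitive() || psoBase.isAssignableFrom(type);
    }

    // Names of the fields declared by cls that the rule selects.
    static List<String> preSerializedFields(Class<?> cls, Class<?> psoBase) {
        List<String> names = new ArrayList<String>();
        for (Field f : cls.getDeclaredFields()) {
            if (isPreSerialized(f, psoBase)) {
                names.add(f.getName());
            }
        }
        return names;
    }
}
```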

The format of fields stored in PSOs is defined by the Jaguar driver which implements PSOs (Jaguar.compiler.rules.PSORule). Basically the contents of the PSO are laid out like a C struct, although fields are not aligned to an integer multiple of their size (that is, an int field isn't necessarily aligned to an offset multiple of 4). Object references to other PSOs within the same External Object are stored as two 32-bit words: the offset into the External Object to the PSO, and its size (if it is a PSOArray; otherwise -1 is stored). References to non-PSO objects aren't part of the pre-serialized format of the object. References to PSOs not in the same container are stored with an offset and size of -1, meaning that those references can't be recovered from the pre-serialized form. (However, such references are stored in the Java object itself.)

The PSOTest program dumps the output of the External Object after it's done, so you can see the format for yourself.
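
To make the layout description more concrete, here is a rough sketch that mimics it with a plain java.nio.ByteBuffer: fields are packed back to back with no alignment padding, and a reference is written as two 32-bit words (offset, then size or -1). This is not Jaguar code. The field sizes, the declaration-order layout, and the byte order (ByteBuffer's default, big-endian) are all assumptions made for the example, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustration only: mimics the packed, struct-like layout described above
// using a plain ByteBuffer. This is not Jaguar code or Jaguar's API.
class PackedLayoutSketch {
    static final int NO_VALUE = -1; // stored when an offset or size does not apply

    // Offsets of consecutive fields with no alignment padding: each field
    // starts exactly where the previous one ended.
    static int[] packedOffsets(int[] fieldSizes) {
        int[] offsets = new int[fieldSizes.length];
        int next = 0;
        for (int i = 0; i < fieldSizes.length; i++) {
            offsets[i] = next;
            next += fieldSizes[i];
        }
        return offsets;
    }

    // Writes a reference as two 32-bit words: the target offset, then the size.
    static void putReference(ByteBuffer buf, int at, int targetOffset, int size) {
        buf.putInt(at, targetOffset);
        buf.putInt(at + 4, size);
    }

    public static void main(String[] args) {
        // Hypothetical object: byte flag; int count; long stamp; reference next.
        int[] sizes = {1, 4, 8, 8};
        int[] off = packedOffsets(sizes);
        ByteBuffer buf = ByteBuffer.allocate(21);
        buf.put(off[0], (byte) 1);
        buf.putInt(off[1], 42);                  // lands at offset 1, not a multiple of 4
        buf.putLong(off[2], 123456789L);
        putReference(buf, off[3], 64, NO_VALUE); // a non-array PSO at offset 64
        System.out.println("count is stored at offset " + off[1]);
    }
}
```

In this example the int field starts at offset 1 rather than 4, which is the "not aligned" property the text describes.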

The PSOArray class is a "generic" PSO with various methods allowing you to treat it like an array. This is a nice way to get access to external object memory without having to create PSOs; however, the types that you can represent in this way are of course limited. For example,

FilePSOBuffer fb = new FilePSOBuffer(filename);
PSOArray pa = new PSOArray(fb,fb.getSize()); // Map PSOArray onto file
for (int i = 0; i < fb.getSize(); i++) { byte b = pa.readByte(i); }
This reads the contents of the file one byte at a time, through the PSOArray mapped onto the FilePSOBuffer.

Using JaguarVia

JaguarVia is a Jaguar interface to the Berkeley VIA communications substrate, which obtains round-trip times of about 80 microseconds for small messages, and over 488 megabits/sec of bandwidth for large messages on the Myrinet system area network. This is basically identical to the performance of Berkeley VIA as measured from C code.

To use JaguarVia effectively you should be familiar with the VIA architecture as well as Berkeley's implementation of it on Myrinet. See the Berkeley VIA web pages for more information about this. Note that you need to have Berkeley VIA installed and working (from C) before you attempt to use it with JaguarVia.

JaguarVia is a Java implementation of the VIPL (VI Provider Library), the quasi-standard API for applications that wish to communicate using a VIA interface. If you have a C program written to use the VIPL, it is pretty straightforward to port it to Java using JaguarVia.

JaguarVia obtains high performance in 3 ways:

Going through native methods would require a high method-call overhead for all 3 of these operations, as well as an expensive copy between C and Java heap memory for network buffers. Also, JaguarVia is arguably "safer" than use of the C-based libvia library, since it's implemented almost entirely in Java, with Jaguar drivers providing the bare minimum functionality for the 3 operations described above.

The easiest way to see how JaguarVia works is to look at the test programs in jaguar2/classpath/Jaguar/JaguarVia/test. ViaPingpong is a simple ping-pong benchmark (used to measure round-trip latency), and ViaWindow is a streaming packet benchmark (used to measure bandwidth).

Unfortunately VIA is a fairly low-level API, so it's not particularly easy to use at this level. Also, it does not provide any flow control, reliable transmission, or error correction; it is up to the application to implement these features on top of VIA. Note that this is not a limitation of JaguarVia itself; it is the philosophy of VIA that applications should implement their own protocols on top of this bare-bones interface. See www.viarch.org for background on VIA itself.

I have a library which implements a simple reliable transmission protocol over JaguarVia (implemented entirely in Java, of course), which I plan to make available soon. Bug me about it if you are interested in using JaguarVia and would like to see it.

Jaguar Translation Rules

As mentioned above, it's the translation rules from Java bytecode to Jaguar bytecode which are responsible for implementing Jaguar drivers. Jaguar translation rules are subclasses of Jaguar.compiler.rules.Rule which convert a sequence of Java bytecode to some new sequence of Jaguar bytecode. Remember that Jaguar bytecode is the same as Java bytecode, but with a few additional instructions.

You shouldn't have to concern yourself with how Jaguar translation rules work unless you're a developer wishing to build a new Jaguar driver (say, to interface to a new device or change the way External Objects work). A good way to get a feel for how they work is to look at the code for one of the simpler rules, such as ViaDBRule (which transforms calls to methods on the VIA_Doorbell class into direct access to the VIA doorbell register).

The most important structure used in transformation rules is Jaguar.compiler.classfile.CodeTree, which represents a linked list of Java or Jaguar bytecode instructions. insertInsn() lets you insert an instruction, getInsn() reads an instruction, and deleteInsn() deletes an instruction. Jaguar.compiler.classfile.Insn is the class representing an instruction, and it has several subclasses (such as FieldRefInsn or JumpInsn). Jaguar instructions are represented by the class JaguarInsn.
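
To give a flavor of what a translation rule does with such an instruction list, here is a toy model written for a current JDK. It is not the real Jaguar.compiler.classfile.CodeTree or Insn: the classes ToyCodeTree and ToyInsn, the index-based method signatures, and the opcode name "toy_privileged_op" are all invented for this sketch. The real classes and the real Jaguar instructions should be taken from the source code in the release.

```java
import java.util.LinkedList;

// Toy model only. The real Jaguar.compiler.classfile.CodeTree and Insn classes
// are not reproduced here; every signature below is a guess for illustration.
class ToyRuleSketch {
    static final class ToyInsn {
        final String opcode;
        final String operand;

        ToyInsn(String opcode, String operand) {
            this.opcode = opcode;
            this.operand = operand;
        }
    }

    // A linked list of instructions with insert, get and delete operations.
    static final class ToyCodeTree {
        private final LinkedList<ToyInsn> insns = new LinkedList<ToyInsn>();

        void insertInsn(int index, ToyInsn insn) { insns.add(index, insn); }
        ToyInsn getInsn(int index) { return insns.get(index); }
        void deleteInsn(int index) { insns.remove(index); }
        int size() { return insns.size(); }
    }

    // A toy "rule": replaces each invokevirtual of targetMethod with a single
    // inlined instruction and returns how many call sites were rewritten.
    // "toy_privileged_op" is a made-up opcode name, not a Jaguar instruction.
    static int apply(ToyCodeTree code, String targetMethod) {
        int rewritten = 0;
        for (int i = 0; i < code.size(); i++) {
            ToyInsn insn = code.getInsn(i);
            if (insn.opcode.equals("invokevirtual") && targetMethod.equals(insn.operand)) {
                code.deleteInsn(i);
                code.insertInsn(i, new ToyInsn("toy_privileged_op", targetMethod));
                rewritten++;
            }
        }
        return rewritten;
    }
}
```

The point of the sketch is the shape of the transformation: the method call disappears from the instruction stream and driver code takes its place, so nothing is left to pay method-call overhead at run time.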

Soon I plan to write a "Jaguar Driver Developer's Guide" which will describe all of this in more detail; for now I hope you can learn enough by reading the code for some of the Jaguar drivers included in the release to write your own.

Caveats and Missing Features

Using OpenJIT-Jaguar, you should be able to make use of all of the features supported by standard JVMs.

Using GCJ-Jaguar, however, you will have to work around some missing features and bugs in that compiler. GCJ-Jaguar is based on GCJ 2.95.2, which has several important missing features:

Note that all of these features, with the exception of serialization, are supported by more recent versions of GCJ. Soon I plan to release a new version of Jaguar which is up-to-date with a more recent GCJ snapshot. In other words, this is a temporary situation.

Under Linux, GCJ only supports native threads, implemented using the Linux kernel threads mechanism. As has been pointed out by several groups, Linux kernel threads don't scale well and have performance problems under some circumstances (for example, contended locks are slow). The good news is that GCJ is entirely open source and it should be possible to augment or replace the threading mechanism with something better. Any volunteers?

Despite these shortcomings I think that GCJ is an excellent Java platform for doing high-performance Java research. I selected GCJ for this project for several reasons: First, it's open source (very important for research projects); second, it does in fact support native threads; and third, the code it produces is very efficient. It's also a lot easier to work with a static compiler rather than a heavyweight JVM and JIT compiler; there's a lot less complexity involved.

Reporting Bugs

I am interested in helping people to use Jaguar, so please don't hesitate to report any problems or bugs to me.

If you do have a bug report, please e-mail me with a complete description of the problem. Please send me the following information as well:

You can use GDB to debug the problem yourself; see the section on debugging with GDB for details on that.

Credits

Thanks to the following people for their help and feedback:

M. Welsh
Berkeley Jaguar Project