MTA is a tool that analyzes use of Posix threads in a program, and detects incorrect use of mutexes and events, performance problems, and potential deadlocks.
License blurb
System requirements
(Linux, HP/UX,
Solaris)
Build & Installation
How MTA works
Compiling your program
Running with MTA event logging
(LD_PRELOAD, mtarun,
direct linking,
naming objects)
Running the analyzer
(options,
output,
messages)
MTA is Copyright 2004 Ivan Skytte Jørgensen.
This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.
Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:
Neither the interposer shared library nor the analyzer interacts directly with the compiler so there are no requirements regarding the compiler unless you want to build MTA yourself. The compiler used for the pre-built binaries may have inserted references to compiler specific shared libraries and these dependencies are listed.
Requirements:
Build requirements:
Requirements:
Both 32-bit and 64-bit programs are supported.
Build requirements:
Requirements:
Build requirements:
This section assumes that you are installing MTA version 0.3 on Solaris, and that you want MTA to be installed in /opt/mta-0.3-sparc-solaris. The installation procedure is the same for other operating systems except that the name of the shared librariy is different.
./configure make make test make install
The configure script will detect the operating system and the architecture and select a suitable compiler. The configure script supports a few command-line arguments:
Failures in the make test step means that you have some digging to do. The most common case is that all the test cases fails with "mta interposer could not load libpthread" which indicates that your OS has developed faster than MTA and you will have to find out which library is the correct library to load.
On newer Linux distributions with the TLS-based libc mta does not always know where the real pthread library is, as some of the libpthread.so files is actually ld.so scripts which dlopen() cannot load. In this case the tests will fail and you should override the PTHREAD_LIB_NAME in mta.c
The make install step is optional. If you know how to juggle LD_PRELOAD and LD_LIBRARY_PATH then it is fine with me.
Add /opt/mta-0.3-sparc-solaris/bin to your PATH
Add /opt/mta-0.3-sparc-solaris/lib to your LD_LIBRARY_PATH
There are no "man" pages or "info" files.
Overview of the package:
MTA works by intercepting most pthread_???? calls and logging them to an event log mta.log. libmta.so then calls the real functions in libpthread.so. libmta.so also logs the return from the pthread functions including timestamps and return codes.
The event log can later be analyzed with mtanalyze which can find potential and real performance problems, incorrect use, scalability problems, non-portable behavior, potential and real deadlocks, uncommon behavior, optimization possibilities, etc.
Compile and link your program as usual, including -lpthread on the linker line.
MTA does not currently support programs that fork().
Slightly longer description: If you link directly with libmta.so and the
child process exec()s immediately without calling pthread functions it should
work. The problem is that events from the parent process and the child
process ends up in the same log file which will confuse mtanalyze.
Before you run with MTA event logging you should have debugged your program. Running a few times with Purify, Insure++, IDH, ElectricFence, Valgrind, MPatrol or the like is recommended.
Run a few tests on you program. You do not usually need to run extensive test - just enough to get good coverage. libmta.so will generate a complete event log called "mta.log" which can become quite large (500MB is not unheard of). Make sure you have enough space in the current directory. For best results shut down your program properly.
You can run your program with MTA event logging in 3 ways as described in the following sections
You can use the LD_PRELOAD feature on Linux, Solaris and HP/UX. Assuming that you program is called "threadtest" you can force libmta.so to be loaded before the program is run (and override symbols in libpthread.so) with:
LD_PRELOAD=libmta.so ./threadtest
mtarun is a shell script which uses the LD_PRELOAD feature, but also removes old logs, checks for write permission to the current directory, etc. :
mtarun ./threadtest
You can also link directly with libmta.so. Just insert libmta.so before -lpthread:
CC ... -L/where_mta_is_installed/lib -lmta -lpthread
This does not work under HP/UX as it causes libpthread.1 to loop forever calling ksleep()
libmta.so can only capture the address of mutexes and conditions, and the thread-id of threads. So mtanalyze can only report that e.g. a mutex at address 08fe34004 were not unlocked by thread 17. You can usually find out what is at address 08fe34004 using a debugger, and what thread 17 is using a simple log file. This can be a bit tedious. An alternative to this is to call mtanalyze_name_{thread,mutex,condition}() prototyped in mta_naming.h and located in mta_naming.o. This requires you to modify your program.
The program mtanalyze reads the event log file and analyzes it:
mtanalyze
You will usually want to pipe the output to a pager (more/less):
mtanalyze | more
Or send the output to a file::
mtanalyze -of analysis.txt
Options can be specified to mtanalyze. If you are using recursive mutexes in you program you will have to specify +r. If you renamed the mta.log file you have to specify -if otherfilename. If you want mtanalyze to output its report to something else than stdout your can specify -of outputfilename.
This is a brief summary of the most common options that mtanalyze accepts. The list is not complete. For a complete list run mtanalyze with the --help option:
mtanalyze --help
Options requiring an argument must be separated from the argument. e.g.. +w32 is not valid.
Option(s) | Description |
---|---|
-of filename | Set output filename. Default is standard output. |
-if filename | Set event log filename. Default is mta.log |
+r -r |
Enable/disable recursive mutexes. If recursive mutexes are not used more strict analysis can be done and more errors can be caught. |
+w x -w x |
Enable/disable warning x. Individual warnings and suggestions can be enabled and disabled. |
-wmcr x | Warn mutex contention ratio. If the ratio of contended lock operations is above this level a warning is generated. |
-wmlht x | Warn mutex long hold time. Generate warning if a mutex is held for more than the specified time. |
-hm x | Dump x most used mutexes. By default the 5 most frequently used mutexes are written to the report. |
+ad -ad |
Enable/disable deadlock analysis. mtanalyze can find real and potential deadlocks. |
Example:
mtanalyze +r -wmcr 0.4 +ad -w 32
This makes mtanalyze allow recursive mutexes, sets the threshold for warning about mutex contention to 0.4 (40% of the lock operations), enables deadlock analysis and disables warning 32.
mtanalyze will output event log reading progress to standard output so you
can see when it has finished reading the event log and started analyzing it.
The report produced could look like this:
=============================================================================== ===== Start of pre-analysis ------------------------------------------------------------------------------- ----- Start of mutex-discovery --- Found 2 mutexes ----- End of mutex-discovery ------------------------------------------------------------------------------- ----- Start of condition-discovery --- Found 0 conditions ----- End of condition-discovery ------------------------------------------------------------------------------- ----- Start of thread-discovery --- Found 2 threads ----- End of thread-discovery ===== End of pre-analysis =============================================================================== =============================================================================== ===== Start of mutex analysis ------------------------------------------------------------------------------- ----- Start of mutex use analysis ----- End of mutex use analysis ------------------------------------------------------------------------------- ----- Start of mutex performance analysis --- Suggestion (message code 33): Mutex 400679d4 is only used by thread 1024. It may be possible to eliminate the mutex --- Warning (message code 27): High contention ratio for mutex 08049a04 (50.00% lock operations were contended) --- Warning found in event thread-id: 1026 timestamp= 999943270.736070 mtx-addr = 08049a04 cond-addr = 00000000 event = mutex_lock Warning (message code 28): In at least one occurrence it took 2.001 seconds to lock mutex --- Warning found in event thread-id: 1024 timestamp= 999943270.735746 mtx-addr = 08049a04 cond-addr = 00000000 event = mutex_locked Warning (message code 29): Mutex 08049a04 were held for 2.001 seconds here ----- End of mutex performance analysis ------------------------------------------------------------------------------- ----- Start of mutex deadlock analysis Building lock traces To be examined: 2 threads, 0 traces and 0 lock operations Skipping analysis since deadlocks are not possible ----- End of mutex deadlock analysis -------------------------------------------------------------------------------
mtanalyze can generate a lot of different errors, warnings and suggestions. The following table lists all the message that mtanalyze currently generates.
1 |
pthread_mutex_init() is not necessarily idempotent and could cause a resource leak. It will also create problems if another thread currently owns the mutex. |
2 |
This message usually follows message 1 if the pthread library do not check if a mutex already exists |
3 |
This message usually follows message 1 if the pthread library checks if a mutex already exists at the specified address. |
4 |
pthread_mutex_destroy is not necessarily idempotent and could cause all sorts of problems ranging from nothing to core dumps. |
5 |
This could indicate that two threads tries to destroy a mutex simultaneously. |
6 |
Don't do that. |
7 |
This could indicate that a thread was trying to lock a mutex while another thread destroyed it. |
8 |
This is impossible. This most likely indicates that something has overwritten the stack of the calling thread. |
9 |
Also impossible. This most likely indicates that something has overwritten the stack of the calling thread. |
10 |
pthread_mutex_unlock() is called for an unowned mutex AND that mutex has not been pthread_mutex_init()'ed. If the mutex were known message 24 would have been generated instead. |
11 |
A thread tried to unlock a mutex which has been destroyed in the meantime. At least the pthread_mutex_unlock() call returned after another thread had destroyed the mutex. This may or may not be a problem. |
12 |
Very strange and most likely impossible. |
13 |
A threads attempts to lock a mutex it already owns. |
14 |
Most likely a bug in the pthread implementation or another thread has overwritten the mutex structure. |
15 |
Don't do that. |
16 |
The mutex specified in pthread_cond_wait/timedwait() must be held by the calling thread. |
17 |
The mutex specified in pthread_cond_wait/timedwait() must be held by the calling thread. |
18 |
Posix does not guarantee this to work and pthread libraries differ a great deal on how they treat this situation. It invokes undefined behavior. |
19 |
This probably indicates a bug in the pthread library (or someone has overwritten the calling thread's stack) |
20 |
The event log is probably not complete, or exit() has been called while a thread tried to lock a mutex. |
21 |
Same as message 20. |
22 |
The behavior of pthread_mutex_trylock() on an already held recursive mutex differs in the various pthread implementations. Don't rely on this to work in the same way in other implementations. For non-recursive mutexes it could indicate a flaw in the program logic. |
23 |
pthread_mutex_lock() rarely fails and programs do not usually test the return value. Look into why it failed. |
24 |
Don't do that. |
25 |
This is a very unlikely error and a program does not usually test for the return value. Investigate why it failed. |
26 |
The event log is either incomplete or the program does not shut down releasing all resources. |
27 |
Many threads wants a mutex and have to wait. |
28 |
It took a non-trivial amount of time to lock a mutex. This could indicate a system pushed beyond its limits. This message is usually followed by message 29. |
29 |
Mutexes are not meant to be held for a long time. |
30 |
On at least on occasion a mutex were held for unusually long time. This could indicate unwanted task switches or that something during the hold time took unexpectedly long time. |
31 |
Many pthread_mutex_trylock() operations fails on a mutex and thus wasting CPU time. |
32 |
Consider using a non-recursive mutex and locking it before manipulating what the mutex protects. |
33 |
A mutex was only used by one thread so the mutex is unnecessary. If the mutex operations are in your program and not in a system library you may want to consider removing the mutex. |
34 |
A call to pthread_cond_wait/timedwait() specified a mutex which has been destroyed. |
35 |
The message says it all. |
36 |
The call to mtanalyze_name_mutex() specified a mutex which has already been named |
37 |
The mutex was created (and possibly destroyed) but never used |
99 |
internal error in mtanalyze |
101 |
The specified condition has been destroyed prior to the call. |
102 |
This message is either a result of message 101, or someone destroyed the condition while threads were waiting for it. |
103 |
Probably impossible. Or something has overwritten the stack of the calling thread. |
104 |
pthread_cond_init is not necessarily idempotent and this could lead to resource leaks. |
105 |
pthread_cond_init() returned creating a condition where a condition already exists. |
106 |
- |
107 |
A thread is trying to destroy a condition which has already been destroyed. pthread_cond_destroy is not necessarily idempotent and this could lead to errors. |
108 |
This could indicate that two threads were trying to destroy the same condition simultaneously. |
109 |
Programs usually do not test the return code from pthread_cond_broadcast. Investigate the error and why it failed. |
110 |
pthread_cond_destroy() could not destroy a condition. Investigate why. |
111 |
A condition were destroyed while other threads were waiting on it. |
112 |
pthread_cond_init() failed and programs do not usually test the return value. Investigate why it failed. |
113 |
The decision to wake just one thread with pthread_cond_signal() can usually not be made without holding the mutex. This is most likely a programming error. |
114 |
Programs usually do not test the return code from pthread_cond_signal(). Investigate the error and why it failed. |
115 |
Programs usually use the same mutex for a condition. While it is well-defined not to do so it is usually an error. |
116 |
pthread_cond_wait/timedwait() were called specifying a mutex different from what other waiting threads specified. This is undefined. |
117 |
Programs do not usually test the return value from pthread_cond_wait() as it rarely fails. Investigate why it happened. |
118 |
Many of the pthread_cond_signal() calls did not wake any threads. You can probably implement stronger tests before calling pthread_cond_signal() |
119 |
Some threads were still waiting on a condition when the program terminated. Or maybe the event log is incomplete. |
120 |
The call to mtanalyze_name_condition() specified a condition which has already been named |
121 |
Some of the threads awakened with pthread_cond_broadcast() did very little work before waiting on the condition again. Maybe you can use pthread_cond_signal() instead. |
122 |
A call to pthread_cond_wait() or pthread_cond_timedwait() returned but nothing has called pthread_cond_signal() or pthread_cond_broadcast() on the condition since the thread went to sleep. This is spurious wakeup and explicitly allowed by the Posix standard. However, spurious wakeups are generally rare and it is more likely that a bug has caused the thread to wake up. |
199 |
Internal error in mtanalyze |
201 |
The call to mtanalyze_name_thread() specified a thread which has already been named. You can only name a thread once. |
202 |
A unreasonably large number of threads were present. Consider implementing a "worker pool" or in the case of thread-per-connection making the clients re-use the connections or implement shorter idle time-outs. |
203 |
POSIX only guarantees 64 threads in a process. Your program exceeded that limit and is not portable. |
204 |
A thread lived for a very short time. This could indicate a thread-per-request program which does not scale well and uses unnecessary resources. |
205 |
The message says it all. |
206 |
The pthread_attr_t supplied to the pthread_create() call specified PTHREAD_CREATE_DETACHED, but the thread detached itself before pthread_create() returned. It causes undefined behaviour to detached a thread more than once. |
207 |
A call to pthread_detach() specified an invalid thread-id. The thread may have been created detached and finished already, already been detached or already been joined. In any case, this causes undefined behaviour. |
208 |
An already-detached thread-id was specified to pthread_detach(). |
209 |
210 |
A pthread_join() call specified an invalid thread-id. The thread-id may be uninitialized, overwritten, but can also be a thread-id which is no longer valid because the thread has been detached or joined and exited. |
211 |
A pthread_join() call specified an invalid thread-id. The thread has been detached and cannot be the target for a join. |
212 |
You cannot join a thread twice. |
213 |
A thread was created and finished, but it was never joined nor detached. This may cause a resource leak (eg. stack). |
214 |
A thread was created but it never returned from its thread function nor called pthread_exit(). This may cause problem if you use shared libraries and the thread wakes up duing shutdown. |
215 |
(fatal) The analyzer suspects that the trace file is corrupted or other foul play. |
216 |
A thread performed pthread_exit() or returned, and while looking for lingering events generated by the thread library or cleanup handlers, a pthread_create() returned with the thread ID. This can either be the original pthread_create() returning, or the the thread ID being reused. mtanalyze detected that it was a re-use, in which case the thread lifetime analysis may be slightly off. |
217 |
A thread never returned from pthread_mutex_lock(), pthread_cond_(timed)wait, or pthread_join. |
218 |
A thread is calling pthread_exit() from its cleanup handlers. This causes undefined behavior. |
219 |
While scanning for lingering operations of an exiting thread, mtanalyze found a pthread_create() call returning in a (other thread) which did not call pthread_create(). It is most likely a corrupt trace file, or the createing thread was interrupted in the middle of a pthread_create() call by a signal or a debugger. Re-run your application. |