Introduction
For much of my career (prior to 1995 when Java appeared on the scene), debugging software was all about answering a frustrating, but critical question: "is there a problem with my algorithm or have I corrupted memory and, hence, the runtime system?" I have traced problems down to memory allocation / deallocation issues many many times.
Fortunately, software can help us write software. Just as we use IDEs and debuggers to make us more efficient, we can use automatic garbage collection (GC) to avoid dynamic memory programming errors and increase productivity dramatically. You are not good at tracking garbage manually and programmers typically do lots of extra copying to resolve "who frees what" issues between programmers.
Simply put, GC relieves the programmer from having to track and deallocate dynamic memory--you do not have to write code to deallocate data structures. GC reduces cognitive load. People estimate about 10% of CPU time for automatically collecting garbage. An excellent trade.
The basic idea and terms
The core GC strategy is:
- At some frequency, distinguish live from dead objects (tracing) and reclaim dead ones.
- Live objects are reachable from roots, which are globals, locals, registers of active methods etc...
Analogy: Imagine walking to the refrigerator and getting out a bowl of grapes. You pick up the bunch by the stem and look in the bottom of the bowl; there are a bunch of black and blue moldy grapes--that's the "garbage". Anything not reachable from the stem has gone "bad." (This analogy is attributed to Randy Nelson, now at Pixar, formerly of the Flying Karamazov Brothers
).
Imagine a heap of dynamic memory for a running program. Certain variables point into the heap to your data structures. When those variables no longer point at a data structure, nothing can reach the data structure so it's garbage:

After garbage collecting, you'd see something like this:

GC often implicitly assumes that the objects in memory are typed so you know how to find the pointers within, say, an AST node. This does not mean that GC only works in interpreters--compiled languages such as C++ can store runtime type information now.
Further, w/o type information you can still do conservative collection. If it looks like a pointer, assume it is. That unfortunately leaves some objects around because you still (erroneously) think you have a pointer to those objects (in fact they are probably just integers). Java has lots of type information in the objects as well as the bytecodes that operate on data, hence, does not have to be conservative in general. Conservative is bad in my opinion since "conversative GC implies leaking memory".
Common Strategies
There are two main user-perspective categories of GC: disruptive and nondisruptive (often called pauseless). I can remember SmallTalk and LISP programs halting in the middle and saying "Sorry...garbage collecting...". Ack! A disruptive GC is one that noticeably halts your program and usually means it's doing a full collection of a memory space and literally turning off your program for a bit. If you interleave little bits of collection alongside the running program, you call it an incremental collector; if it runs at the same times as the program, you call it a concurrent collector. These collectors are a lot harder to implement because you must deal with the program altering data structures while the collector sniffs the structures. If you are building a real-time system, however, incremental collectors are pretty much a requirement.
Reference Counting
This simplest GC strategy is reference counting, which adds a counter for every "object" in your system. When you copy a reference to that object, you increment the count by one. When a reference to an object goes away (such as when a local variable goes out of scope), the count is decremented by one. If, at that time, the count goes to 0, the object is garbage and all pointers emanating from it should be decremented. Then that object is reclaimed (as are any other objects that go to 0 during the count update phase).
Reference counting is mostly nondisruptive, but can't handle cycles. A cycle looks like this:

and then after you dereference the object and decrement its count, it is still greater than 0 because an object inside the heap refers to it. These objects are no longer reachable from a root and will never be reclaimed.

Cycles occur often enough: as in circular queues, doubly-linked trees etc...
The cost is also high as it is proportional to the amount of work done in program.
Disruptive, Stop-And-Collect Schemes
Mark and sweep collectors are two-phase collectors that first walk the live objects, marking them, and then finds all the dead objects (i.e., anything that is not live).
This is pretty easy to understand and build but badly fragments memory, wastes time walking dead objects (assuming you don't have to run destructors), and has bad virtual memory characteristics.
An improvement on this strategy is called mark and compact because it walks memory moving all live objects to the front of the heap, thus, leaving a nice big contiguous block of free memory afterwards. This removes fragmentation concerns and helps locality for virtual memory because objects are kept in same order in which they were allocated. Still have to walk the memory a lot and you have to update pointers since you are moving things around. I believe you still have to walk the dead objects to free them.
If instead of moving live objects to the head of a single heap, you copy live objects to another memory space, you have a copying collector. You only have to walk the live objects (updating their pointers as you move them)--anything left in the old space is garbage. The term scavenging is often used to refer to this process. This has the advantages of the mark and compact algorithm but is more efficient because it can free all dead objects by ignoring entire old heap.
Note that if you have a finalize() method (a destructor), it implies you have to walk garbage even if not strictly required by your strategy.
Allocation for any copying collector is fast because you just have to bump a pointer in the heap; all free memory is contiguous after collection.
Nondisruptive, Generational Schemes
The disruptive stop-and-collect schemes are so disruptive because they have so much work to do--they must deal with the entire heap. If your heap is 600M, then it has *lots*to do.
Observation: most objects live only a short time while some tend to live a long time (think about System object in Java).
A generational collector takes advantage of this observation by having a "younger" and an "older" generation. Objects that live a few "generations" (i.e., collection runs), are moved to the "older" generation, which reduces the amount of live objects the collector must traverse in the younger generation.
Note the similarity to the mark and copy algorithm; here, though, the "when to copy" algorithm is very different. A generational copying collector moves objects to another generation when it has survived a few generations. Mark and copy copies all live objects upon each activation, thus, not significantly reducing its workload for future generations. Also, there may be many generations, not just two spaces as in a mark and copy scheme.
In the end, even generational schemes must stop-and-collect the older generations. Hopefully this can be hidden from the user such as when the system is waiting for user input. Java does this by making the collector the lowest priority thread. When nothing else is running, the collector starts up (and hopefully finishes).
Nondisruptive, Incremental Tracing Collectors
If you must avoid stopping to do collection, you can interleave collection with the running program. Such an incremental collector faces a concurrency problem with the mutator (the program) because while it's looking for (tracing) live objects, the mutator can rearrange the objects.
One means of dealing with this concurrency is a write barrier, which tracks pointer writes. The write barrier can record writes of pointers into objects and re-traverse some objects later. This allows writes to occur while the collector is operating. Reads are obviously not an issue here.
Incremental collectors can be combined with generational schemes for added efficiency.
Hard real-time systems use incremental collectors--you need fixed-cost GC operations.
Java
Pre-JDK 1.2, Java had a conservative mark and sweep algorithm. After JDK 1.2 / HotSpot, Java uses a generational scheme with "train algorithm" for fast GC of mature generation. I find that for very little CPU cost, the -Xincr incremental collector to be very nondisruptive. The latest Java collector has a very fast allocator as well.