NanoJPEG: a compact JPEG decoder
If you have followed my work, you know that I like compact, single-file implementations of decoders for various media formats, and where such a thing doesn’t exist, I tend to write or at least port one myself. Now I’d like to add a third format to that list: Baseline JPEG images.
There are already two decoders on the web that go by the name »Tiny JPEG Decoder«: One of them actually isn’t tiny at all, it’s nothing but a huge load of C++ bloat. Luc Saillard’s decoder at least deserves its name somehow – I use it for my demo engine currently. It’s far from perfect, though – the color conversion code is awful, for example. It may be reasonably fast, but it’s bloated (dedicated conversion routines for every common format) and it lacks a proper chroma upsampling filter, resulting in ugly artifacts.
Since I was writing a JPEG decoder for work anyway, I decided to write another one at home, too. My goals were compact code, reasonable quality (read: a proper upscaling filter is a must-have) and decent, but not necessarily good speed. I think I have achieved that. Here are the bullet points:
- decodes baseline JPEG only, no progressive or lossless JPEG
- supports 8-bit grayscale and YCbCr images, no 16 bit, CMYK or other color spaces
- supports any power-of-two chroma subsampling ratio
- supports restart markers
- the four points above mean that it should be able to decode all digital camera JPEG files and many other JPEG files
- below 900 lines of code (and that already includes over 200 lines of comments and empty lines!)
- converts YCbCr to RGB
- uses a bicubic chrominance upsampling filter (this is actually better than the mere bilinear filter of libjpeg!)
- a little slower than libjpeg
- memory requirements: ~512 KiB (static) + 1x the decoded image size for grayscale images or 2x the decoded image size for color images
- very simple API
- input: memory dumps of JPEG files
- output: memory dumps of raw, uncompressed 8-bit grayscale or 24-bit RGB pixels
- output format is compatible with the PGM/PPM file formats as well as OpenGL GL_LUMINANCE8/GL_RGB8 texture formats
- not fault-tolerant – any bitstream error will stop the decoder immediately and return an error to the application
- 100% pure C code
- no warnings with GCC 4.3 -pedantic and MSVC /W3
- 32-bit integer arithmetic only
- supposed to be endianness independent
- 64-bit clean
- not thread-safe
- platform-independent
- includes some provisions to build ultra-small Win32 executables
- open source (free-beer, but maybe not really free-speech software)
- single C file
- example program included
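Since the output is a plain dump of 8-bit grayscale or 24-bit RGB pixels, turning it into a PGM or PPM file only takes a short text header. The following is a minimal sketch of that step, not code taken from NanoJPEG: the names pnm_header and write_pnm are hypothetical, and the pixel buffer, width, height and color flag are assumed to come from whatever the decoder returned.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of NanoJPEG: formats the header of a
 * binary PGM ("P5") or PPM ("P6") file for 8-bit samples into buf.
 * Returns the header length in bytes, or -1 if it does not fit. */
int pnm_header(char *buf, size_t cap, int width, int height, int is_color)
{
    int n;
    assert(buf != NULL && width > 0 && height > 0);
    n = snprintf(buf, cap, "P%d\n%d %d\n255\n", is_color ? 6 : 5, width, height);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Hypothetical helper: writes the header followed by the raw pixel dump.
 * Returns 0 on success, -1 on any I/O failure. */
int write_pnm(const char *path, const unsigned char *pixels,
              int width, int height, int is_color)
{
    char header[64];
    size_t size = (size_t)width * (size_t)height * (is_color ? 3u : 1u);
    int hlen = pnm_header(header, sizeof header, width, height, is_color);
    FILE *f;
    int ok;
    if (hlen < 0) return -1;
    f = fopen(path, "wb");           /* binary mode matters on Windows */
    if (!f) return -1;
    ok = fwrite(header, 1, (size_t)hlen, f) == (size_t)hlen
      && fwrite(pixels, 1, size, f) == size;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}
```

The same buffer could presumably be handed to OpenGL as texture data without any header at all, which is why no conversion step appears here.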
If you want to check it out, here it is:
The code compiles to less than 6 kB of x86 code; with a suitable main() function that doesn’t use any C runtime library calls, it’s easily possible to create an 8 KiB Win32 executable that does what the built-in example program does: Decode a JPEG file to PGM or PPM. (Well, almost. The Win32-only version lacks error messages :) This can be reduced further by using Crinkler, and voilà: Here’s a working JPEG decompression program for Windows in just 4085 bytes:
(However, note that it only works properly when run from the command line!)
Update (2009-12-03): Scott Graham has written a C++ port of NanoJPEG. It’s a single header file (only .h, no .cpp) that wraps NanoJPEG into a class and makes it thread-safe, too. The configuration options are limited, though: It always uses libc and the bicubic chroma upsampler, so it’s not suitable for extremely size-critical applications – but these are rare cases anyway :)
Update (2010-05-12): After a little delay of just (*cough*) two months, I updated nanojpeg.c to version 1.1. It fixes two compiler warnings with newer GCCs and, most importantly, a bug that caused NanoJPEG to reject valid files where the number of macroblocks is divisible by the restart interval – oops :)
Note that the new release is just an update of the C file, the example .exe file is still based on version 1.0.

Wow. That is impressive for 6k! Looking forward to seeing what you guys present at Evoke ;-)
Great work! I could really use this. Only need to port it to Pascal first :)
How can this be used to resize a jpg? What extra code would have to be written to use the data parsed by nanojpeg to resize a jpg and create that new file?
Don: Well, it would need a scaler and a file writer, obviously ;)
Depending on what exactly is required, a scaler can be anything between 100 and a few 1000 lines. If you restrict yourself to weighted-average downscaling and no upscaling, you may make it in the aforementioned 100 lines.
The file writer is another tough question: If all you need is uncompressed BMP, TGA or PPM, you might make it in under 50 lines. If you want PNG or JPEG, however, it will be much more. A JPEG encoder, for example, will be roughly the same size as the decoder, maybe a little more. A PNG encoder will be even larger because it requires a more or less full reimplementation of zlib.
I am adding this conversation to this comment section in case anyone else ever wonders about this stuff. :)
I just need simple resizing as I’m doing it to a much smaller size and don’t need amazing perfection in the image. But what I’m not sure about is how I do this scaling. What data do I change? Where can I find information on exactly which pieces to work with (qtables? Huffman?), what gets discarded, and which arrays of info are changed? Do you think you could give me a little direction with this?
I need either a JPEG or PNG but if it’s a JPEG, isn’t it just writing everything back out in the same order? So, I write the JPEG “magic code”, then the qtables, sof, huffman, etc. Or am I missing something?
I’m going to be using this on image files from a Palm Pre. Any idea how I:
1. Find out if it is correctly decoding them?
2. Find out info on adding in the proper reading of Exif info?
Don: »Weighted average downscaling« simply means taking the average of the original pixels that make up a target pixel. It’s weighted because you may end up dividing source pixels in the middle and you have to consider that in the averaging step.
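To make the weighting concrete, here is a small sketch for 8-bit grayscale data. It is only an illustration under assumptions: downscale_gray and overlap are hypothetical names, the image is a tightly packed buffer, and dimensions stay at or below 65535 so the 32-bit products cannot overflow. Each source pixel contributes in proportion to the area it shares with the target pixel, which is exactly the "dividing source pixels in the middle" case.

```c
#include <assert.h>
#include <stdint.h>

/* Length of the overlap of the half-open intervals [a0,a1) and [b0,b1). */
static uint32_t overlap(uint32_t a0, uint32_t a1, uint32_t b0, uint32_t b1)
{
    uint32_t lo = a0 > b0 ? a0 : b0;
    uint32_t hi = a1 < b1 ? a1 : b1;
    return hi > lo ? hi - lo : 0;
}

/* Hypothetical helper: area-weighted average downscaling of an 8-bit
 * grayscale image. Requires 0 < dw <= sw and 0 < dh <= sh. */
void downscale_gray(const uint8_t *src, uint32_t sw, uint32_t sh,
                    uint8_t *dst, uint32_t dw, uint32_t dh)
{
    const uint64_t total = (uint64_t)sw * sh;  /* sum of all weights */
    uint32_t dx, dy, sx, sy;
    assert(dw > 0 && dw <= sw && dh > 0 && dh <= sh);
    for (dy = 0; dy < dh; ++dy) {
        /* On a common grid of sh*dh units, target row dy spans
         * [dy*sh, (dy+1)*sh) and source row sy spans [sy*dh, (sy+1)*dh). */
        uint32_t y0 = dy * sh, y1 = y0 + sh;
        for (dx = 0; dx < dw; ++dx) {
            uint32_t x0 = dx * sw, x1 = x0 + sw;
            uint64_t sum = 0;
            for (sy = y0 / dh; sy * dh < y1; ++sy) {
                uint32_t wy = overlap(y0, y1, sy * dh, (sy + 1) * dh);
                for (sx = x0 / dw; sx * dw < x1; ++sx) {
                    uint32_t wx = overlap(x0, x1, sx * dw, (sx + 1) * dw);
                    sum += (uint64_t)wx * wy * src[sy * sw + sx];
                }
            }
            /* Divide by the total weight, rounding to nearest. */
            dst[dy * dw + dx] = (uint8_t)((sum + total / 2) / total);
        }
    }
}
```

Shrinking a three-pixel row to two pixels, for instance, gives the middle source pixel a weight of one third in each target pixel and the outer pixels a weight of two thirds. An RGB version would run the same loop once per channel.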
However, it seems that what you really want is downscaling right in the JPEG domain. In fact, JPEG makes it possible to reduce an image by 1/2, 1/4 and 1/8 without decoding all the pixels first. If you want 1/8 only, it’s really simple: Just use the DC value directly as the pixel value and don’t perform the IDCT at all. You still need to decode the AC coefficients though, because otherwise you wouldn’t know where to find the next DC coefficient, so all the Huffman stuff needs to be kept. You can reduce the quantization tables to the DC component, though.
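The 1/8 shortcut works because, with the DCT scaling JPEG uses, the dequantized DC coefficient of an 8×8 block is eight times the block's average sample value (before the level shift of 128 is added back). A hedged sketch of that single step follows; dc_to_sample is a hypothetical name, dc is assumed to be the DC coefficient after the differential coding has been undone, and q0 is assumed to be the first entry of the matching quantization table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turns one block's DC coefficient into the single
 * output sample of a 1/8-scale image. The result is one component sample
 * (for example luma); color images still need the YCbCr to RGB step. */
uint8_t dc_to_sample(int dc, int q0)
{
    int v;
    assert(q0 > 0);
    v = dc * q0;                         /* dequantize */
    v = (v >= 0 ? v + 4 : v - 4) / 8;    /* DC is 8x the block mean; round */
    v += 128;                            /* undo the JPEG level shift */
    if (v < 0) v = 0;                    /* clamp to the 8-bit range */
    if (v > 255) v = 255;
    return (uint8_t)v;
}
```

Calling this once per block, in block order, yields an image with one eighth of the original width and height per component.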
Creating a JPEG file from the downscaled file isn’t just a matter of rearranging some bits – after all, you just created a completely new (smaller) image. So nothing saves you from performing the DCTs, quantization and Huffman coding, I’m afraid.
You can get the official Exif standard for free at the CIPA website.
As I mentioned in the previous comment, NanoJPEG correctly decodes images with YCbCrPositioning set to centered. I just had a look at example photos from the Palm Pre, and it seems to use centered positioning indeed, so everything is fine :)
Hello, I tried to use your nanojpeg in my application, but compared to basic libpng and the internal JPEG decoder it was much slower.
For example, I decoded 100 images of 256×256 pixels (map tiles):
library | nanojpeg | Quartz2D | libjpeg (compiled with -O3)
average time | 16 seconds | 8 seconds | 6 seconds
But your code interface was exactly what I want, especially compared to libjpeg or libpng.
molind: You’re right, NanoJPEG is anything but fast. That wasn’t the design goal anyway – it is optimized for code size and simplicity, not for speed.
I would like to implement a JPEG decoder in an embedded system, with only one predefined format to decode, and it seems that the amount of RAM required by nanojpeg is too big!
Is it possible to reduce it, especially the “vlctab[4][65536]” array?
Thanks.
yrt: The vlctab array is the main part of the Huffman decoder. With this array, the decoder can directly look up any Huffman code in just a few clock cycles. There are two ways to reduce the memory required for this table (neither of which I’m going to implement, because it’s a non-issue on most systems):
First, the array can be reduced to look only, say, 8 or 12 bits instead of 16 into the bitstream. The base tables would then shrink to 256 or 4096 entries each, but additional tables (of size 256 or 16, respectively) would be required for longer codes. The number of additional tables required is, however, not constant and determined by the JPEG file itself. In the worst case (which is completely unrealistic, though), the tables could even be larger than the original table. Also, decoding would be a little bit slower and you would need some kind of malloc(), which may also be problematic on embedded systems.
The other option would be storing the Huffman code book as a tree and parsing it bit by bit. This would not require much memory (should be possible in 512 or 768 bytes per table), but it would be extremely slow.
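For readers curious what the low-memory, bit-by-bit option could look like, here is a rough sketch. It is not NanoJPEG code and not a literal tree: it swaps in the closely related length-by-length decoding of canonical Huffman codes, similar in spirit to the bit-serial decoding procedure in the JPEG standard, which needs only three small per-length arrays plus the symbol list. HuffTable, BitReader, huff_build and huff_decode are hypothetical names; the bit reader ignores byte stuffing and markers, and the table is assumed to be valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact code book, built from the 16 per-length counts
 * and the symbol list of a JPEG DHT segment. */
typedef struct {
    int32_t  maxcode[17];    /* largest code of each length, -1 if none */
    uint16_t mincode[17];    /* smallest code of each length */
    uint16_t valptr[17];     /* index of the first symbol of each length */
    const uint8_t *symbols;  /* symbols in order of increasing code */
} HuffTable;

void huff_build(HuffTable *t, const uint8_t counts[16], const uint8_t *symbols)
{
    uint32_t code = 0;
    uint16_t k = 0;
    int len;
    assert(t != NULL && counts != NULL && symbols != NULL);
    t->symbols = symbols;
    for (len = 1; len <= 16; ++len) {
        uint8_t n = counts[len - 1];
        t->valptr[len] = k;
        t->mincode[len] = (uint16_t)code;
        t->maxcode[len] = n ? (int32_t)(code + n - 1) : -1;
        code = (code + n) << 1;   /* first code of the next length */
        k = (uint16_t)(k + n);
    }
}

/* Minimal MSB-first bit reader over a byte buffer (no marker handling). */
typedef struct { const uint8_t *data; size_t size, bitpos; } BitReader;

static int next_bit(BitReader *br)
{
    size_t byte = br->bitpos >> 3;
    int bit;
    if (byte >= br->size) return -1;
    bit = (br->data[byte] >> (7 - (br->bitpos & 7))) & 1;
    br->bitpos++;
    return bit;
}

/* Decodes one symbol, one bit per iteration. Returns the symbol, or -1
 * on an invalid code or when the data runs out. */
int huff_decode(const HuffTable *t, BitReader *br)
{
    int32_t code = 0;
    int len;
    for (len = 1; len <= 16; ++len) {
        int bit = next_bit(br);
        if (bit < 0) return -1;
        code = (code << 1) | bit;
        if (t->maxcode[len] >= 0 && code <= t->maxcode[len])
            return t->symbols[t->valptr[len] + (code - t->mincode[len])];
    }
    return -1;
}
```

The per-table state here is roughly 140 bytes plus at most 256 symbol bytes, but every symbol costs up to 16 loop iterations instead of a single table lookup, which is the speed penalty described above.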
Hello KeyJ,
great work! I was looking for a tiny JPEG decoder and this is the one I could use in my application. I am very new to image processing programming. Therefore, I would like to ask: if you have a control flow diagram of the NanoJPEG application, could you please send it to me (or upload it here)? That would make it easier for me to adapt your code in a short time.
Thank you.
With best regards,
Mohsin Reza
Dresden, Germany
e-mail: smmohsin.reza@gmail.com
Mohsin: Sorry, I don’t have a control flow diagram of the decoder – it would be very complex anyway. You could use source code analysis or profiling tools to get a call graph at least, but again, I doubt that this would be very useful.
I was very glad to find your program. I would like to translate it to Pascal, but I do not know C programming, and many details are unclear to me. I do not have a C compiler. Could you send me the uncompressed build of the program (the 8 KiB Win32 executable)? I would check with a disassembler what exactly each C line does.
thank you very much
genk: The argument »I don’t have a C compiler« doesn’t count. GCC is available for free on almost any conceivable platform. If you’re on Windows, there’s additionally Microsoft Visual C++ Express Edition, lcc-win32, Digital Mars, Pelles C and perhaps a few others. Furthermore, I can’t think of a harder way of learning C than reading code in a disassembler ;)
I don’t know programming in C, but I know assembly very well. If I download a free compiler, I’m still not going to understand the C.
genk: But if you have a compiler, you can compile the code to assembly and see what the compiler did, line for line. This is what gcc -S does, for example.
OK. Thanks.
awesome work dood, using this in an embedded project & its a charm :)
have optimised it a bit more for speed, it now operates on average around 3x as fast as original
once again thanks
:*)
Hello KeyJ,
I saved a JPEG file with Photoshop CS4 (baseline standard in Photoshop’s options), but NanoJPEG can’t decode it. Its error code is NJ_SYNTAX_ERROR.
Can you give me any suggestion about this problem?
Thanks!
shaw: That’s interesting – I could understand if NanoJPEG threw NJ_UNSUPPORTED, but NJ_SYNTAX_ERROR is strange indeed. Could you send me a file that exhibits the problem, preferably via e-mail?
I’ve sent the test file to your email.
Thanks!
shaw: This bug is actually already fixed in NanoJPEG 1.1, but even though this version is already two months old, I somehow forgot to upload it :( I’ll post the update tonight.
I’ve tried it and it works well.
Thanks!
hi KeyJ,
I just wanted to understand the code.
why are there no comments on many portions of the code ?
does any one have the comments ??
Is there a way to bypass the CHROMA_FILTER in the subroutine CONVERT? It is taking the bulk of the computation time.
I am actually trying to optimize the code for speed.
any suggestions on this ?
rajendra: There is, and it’s in fact one of the few places of NanoJPEG that are nicely documented :)
Just define NJ_CHROMA_FILTER=0 at compile time and NanoJPEG will use a much faster upsampling algorithm at the expense of greatly reduced quality.