
Wednesday, 2 October 2013

Validate an IMAGE File: Is It a REAL IMAGE or a File Wearing Image-Like Clothes?

It is a great feeling to be back here... I was not able to find time for myself for quite some time :-( Anyway...
This post is about a concern that my colleague Mr X raised while I was doing some creative work for an event in my organization. It required users to upload their pictures, and I had to use them for further processing in an existing application.

The limitation I had was that the existing processing application was written by someone else, and he was no longer with us. That app was designed (don't know how or why) to work with .gif files only.

So I had two options: either ask everyone for photographs in any format and convert them (I mean, manually rename them to my desired format... haha, I know it's cheating), or ask them to send ONLY the desired format.

I opted for the second. Also, the picture-collecting app needed to be on the INTRANET, so 1000 people using it and uploading large files was a concern... (I know this because I used to do it myself whenever someone put an app up for something... haha... doing their STRESS testing :p)
The approach I took to validate the input pictures was to check the file extension of the input file.
I have seen this in many places and thought it was quite right. Then Mr X renamed a Doc file as GIF and uploaded it to my app. My app considered it ALL RIGHT (because of the .gif extension), but when I opened it, the photo viewer displayed "Nothing to Display". So Mr X asked me to work out something that would check for a REAL IMAGE file and not a file wearing clothes like an IMAGE. I googled, found some very good articles, and collated them here to gather the various techniques for checking/validating a real image.

Below are the various techniques to validate the input files:

1. Check the Image File Extension
This is a very lame technique, but if you are sure that users will only upload REAL image files, you can use this simple approach to validate that your input image is in the desired format.
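A minimal sketch of the extension check, written in Python rather than the post's C# (the function name `has_allowed_extension` and the `.gif`-only allow-list are illustrative assumptions). It never reads the file's content, which is exactly why Mr X's renamed Doc file got through.

```python
import os

# Hypothetical allow-list: the app in this post accepted .gif only
ALLOWED_EXTENSIONS = {".gif"}

def has_allowed_extension(filename: str) -> bool:
    """Check only the file name; the file's bytes are never read."""
    _, ext = os.path.splitext(filename)
    # Compare case-insensitively so "photo.GIF" is accepted too
    return ext.lower() in ALLOWED_EXTENSIONS

print(has_allowed_extension("team_photo.GIF"))   # True
print(has_allowed_extension("team_photo.jpeg"))  # False
print(has_allowed_extension("report.doc.gif"))   # True, even if it is really a Doc file
```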


 



The other two approaches, which I found fascinating, look inside the file for the hexadecimal codes that image files start with, and conclude the format of the input image from them. Before discussing the techniques, let's have a brief look at the background:

[ The details below are copied shamelessly from a few different sources ;-) ]



· JPEG format:
The byte sequence for a valid JPEG file should be like this (the Char row shows each byte as a Latin-1 character):

Byte:  1   2   3   4   5     6     7   8   9   10  11
Hex:   FF  D8  FF  E0  Skip  Skip  4A  46  49  46  00
Char:  ÿ   Ø   ÿ   à   Skip  Skip  J   F   I   F

   
The points to notice are the start-of-image bytes FF D8, the application marker FF E0 and the zero-terminated "JFIF" string.

The first part to look at is the first two bytes of the file. The hex values FF D8 identify the start of the image file. This is often enough to know that you have an actual JPEG file. The next two bytes are the application marker, typically FF E0. This marker can change depending on the application used to modify or save the image. Someone has noted: "I have seen this marker as FF E1 when pictures were created by Canon digital cameras."
The next two bytes (the length of the application segment) are skipped. Read the next five bytes to identify the specific application marker. This would typically be 4A 46 49 46 (JFIF), followed by 00 to terminate the string. Normally this zero-terminated string will be "JFIF", but in the previous example of Canon digital cameras it will be 45 78 69 66 (Exif).
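The JPEG check above can be sketched in Python as follows (the function names are hypothetical, and only the JFIF and Exif identifiers mentioned above are recognised). Byte positions in the table are 1-based, so bytes 7 to 11 are `data[6:11]`.

```python
def is_jpeg(data: bytes) -> bool:
    # FF D8 (start of image) is often enough on its own
    return data[:2] == b"\xff\xd8"

def jpeg_app_identifier(data: bytes):
    """Return 'JFIF' or 'Exif' if the first application segment names one, else None."""
    if not is_jpeg(data) or len(data) < 11:
        return None
    # Bytes 3-4: application marker, typically FF E0 (JFIF) or FF E1 (Exif)
    if data[2] != 0xFF or data[3] not in (0xE0, 0xE1):
        return None
    # Bytes 5-6 (segment length) are skipped; bytes 7-11 hold the zero-terminated name
    name = data[6:11]
    if name in (b"JFIF\x00", b"Exif\x00"):
        return name[:4].decode("ascii")
    return None
```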


· TIFF (Tag Image File Format):
The byte sequence for a valid TIFF file in Intel byte order should be like this:

Byte:  1   2   3
Hex:   49  49  2A
Char:  I   I   *

The TIFF image format was designed to become a standard for image file exchange. Even though it is widely used, it never became the standard that was envisioned. These days you will most commonly see this format used by document scanners. The image header for a TIFF image is a fixed 8-byte segment that always occurs at the beginning of the file. To ensure TIFF images can be read properly by both PCs (Intel processors) and Macintosh computers, the header must indicate a byte order, which in this case is the first two bytes of the file. The first two bytes will be either hex 49 49 (II) for the Intel format or 4D 4D (MM) for the Macintosh integer format, which was based on Motorola processors. The next two bytes hold the number 42 (hex 2A) in that byte order: 2A 00 for II, or 00 2A for MM. This number should never change.
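A minimal Python sketch of the TIFF header check, assuming the first bytes of the file are already in memory (`tiff_byte_order` is a hypothetical name). It reads the magic number 42 in whichever byte order the first two bytes announce, so both II and MM files are accepted.

```python
def tiff_byte_order(data: bytes):
    """Return 'little' (II) or 'big' (MM) for a valid TIFF header, else None."""
    if len(data) < 8:  # the TIFF header is a fixed 8-byte segment
        return None
    order = data[:2]
    if order == b"II":
        byteorder = "little"  # Intel
    elif order == b"MM":
        byteorder = "big"     # Motorola / Macintosh
    else:
        return None
    # Bytes 3-4 hold the magic number 42, stored in the byte order chosen above
    if int.from_bytes(data[2:4], byteorder) != 42:
        return None
    return byteorder
```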

· BMP format:
The byte sequence for a valid BMP file should be like this:

Byte:  1   2
Hex:   42  4D
Char:  B   M
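Two bytes are a weak signature on their own: any text file that happens to start with "BM" passes. This hypothetical Python helper adds one consistency check that goes beyond the post: in the BMP file header, bytes 3 to 6 store the total file size as a little-endian 32-bit integer, which should match the actual length. Some encoders may write an inaccurate size, so treat a mismatch as suspicious rather than as proof.

```python
import struct

def looks_like_bmp(data: bytes) -> bool:
    """'BM' signature plus a check that the stored file size matches the data."""
    if len(data) < 14 or data[:2] != b"BM":  # 14 bytes = BMP file header
        return False
    # Bytes 3-6: total file size as an unsigned 32-bit little-endian integer
    (declared_size,) = struct.unpack_from("<I", data, 2)
    return declared_size == len(data)
```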

· GIF (Graphics Interchange Format):

A GIF file (pronounced "jiff") is a compressed image format. It uses LZW, a lossless data compression algorithm; zip and gzip are lossless too, although they use a different algorithm (DEFLATE). Lossless compression ensures that there is no data loss or image degradation. GIF files are widely used for animated images, and in the early years of the internet you would be hard pressed to find a website that did not use some form of animated GIF.

To identify a GIF file, read its first three bytes. The byte sequence for a valid GIF file should be like this:

Byte:  1   2   3
Hex:   47  49  46
Char:  G   I   F
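A minimal Python sketch of the GIF signature check (`gif_version` is a hypothetical name). Besides the three "GIF" bytes, it also looks at bytes 4 to 6, which hold the format version, "87a" or "89a".

```python
def gif_version(data: bytes):
    """Return '87a' or '89a' for a GIF header, else None."""
    # Bytes 1-3: the signature "GIF"
    if data[:3] != b"GIF":
        return None
    # Bytes 4-6: the version
    version = data[3:6]
    if version in (b"87a", b"89a"):
        return version.decode("ascii")
    return None
```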



Approach 2 may or may not use the byte-code check described above; I AM STILL IN SEARCH of the LOGIC for this approach...!!!
But Approach 3 checks the CODE of the HEADERS...

So now we have two approaches.

The first is to open/load a file into memory and then check whether it is a valid file; the other is to check only the header of the file without loading the whole image.

The former consumes MORE MEMORY but is more reliable than the latter.

2. Load and Check for a Valid File
(Memory consuming but more reliable approach)

The idea is to load the file as an image: if it is not in a valid image format, loading throws an exception.

So just handle the exception and you have your problem solved.
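Python's standard library has no image decoder, so as a stand-in for loading the file as an image, here is a minimal sketch (the name `check_gif_structure` is hypothetical) that loads a whole GIF into memory, walks its block structure and raises `ValueError` at the first inconsistency. It skips the LZW-compressed pixel data instead of decoding it, so it is stronger than a header check but weaker than a real image loader.

```python
def check_gif_structure(data: bytes) -> None:
    """Walk every block of an in-memory GIF; raise ValueError if the layout is broken."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("missing GIF signature")
    pos = 6

    def need(n: int) -> None:
        # Fail cleanly instead of reading past the end of the data
        if pos + n > len(data):
            raise ValueError("file ends unexpectedly")

    def skip_color_table(packed: int) -> None:
        nonlocal pos
        if packed & 0x80:  # color-table flag
            size = 3 * 2 ** ((packed & 0x07) + 1)
            need(size)
            pos += size

    def skip_sub_blocks() -> None:
        nonlocal pos
        while True:
            need(1)
            size = data[pos]
            pos += 1
            if size == 0:  # a zero-length sub-block ends the chain
                return
            need(size)
            pos += size

    # Logical screen descriptor: width(2), height(2), packed(1), background(1), aspect(1)
    need(7)
    packed = data[pos + 4]
    pos += 7
    skip_color_table(packed)  # global color table, if any

    while True:
        need(1)
        block = data[pos]
        pos += 1
        if block == 0x3B:  # trailer: the file is complete
            return
        if block == 0x21:  # extension: label byte, then sub-blocks
            need(1)
            pos += 1
            skip_sub_blocks()
        elif block == 0x2C:  # image descriptor: left, top, width, height (2 each), packed(1)
            need(9)
            packed = data[pos + 8]
            pos += 9
            skip_color_table(packed)  # local color table, if any
            need(1)
            pos += 1  # LZW minimum code size
            skip_sub_blocks()  # compressed pixel data, not decoded
        else:
            raise ValueError(f"unexpected block 0x{block:02X} at offset {pos - 1}")
```

Called on a Word document renamed to .gif, this raises `ValueError` at the very first check, because the file does not start with GIF87a or GIF89a.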


The various other options available for Image formats are:-





3. Check the Header of the Input Image File

This check reads the first two bytes, and then the next two bytes, to decide whether the file is a valid JPEG image.



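So the expected values of the first two bytes should be as below. They are read together as one 16-bit little-endian integer, which is why each byte pair appears swapped: a BMP file starts with 42 4D ("BM"), which reads as 0x4D42.

Format  First two bytes
BMP     0x4D42
JPG     0xD8FF
PNG     0x5089
GIF     0x4947
TIF     0x4949 (Intel byte order; a Motorola-order TIFF gives 0x4D4D)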
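A minimal Python sketch of this comparison, assuming the upload's bytes are in memory (`format_from_first_two_bytes` is a hypothetical name). `struct.unpack_from("<H", ...)` reads the first two bytes as one little-endian unsigned 16-bit integer, which produces the swapped-looking values in the table.

```python
import struct

# First two bytes of each format, read as one little-endian unsigned 16-bit integer
SIGNATURES = {
    0x4D42: "BMP",  # 42 4D  "BM"
    0xD8FF: "JPG",  # FF D8
    0x5089: "PNG",  # 89 50
    0x4947: "GIF",  # 47 49  "GI"
    0x4949: "TIF",  # 49 49  "II" (Intel byte order only)
}

def format_from_first_two_bytes(data: bytes):
    """Return the format name from the table, or None if nothing matches."""
    if len(data) < 2:
        return None
    (code,) = struct.unpack_from("<H", data, 0)
    return SIGNATURES.get(code)
```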

Let's move to Visual Studio and do some work...

I have a few sample files with me.


I have read the first bytes of each file's header and used them as an identifier to tell the formats apart.

The code goes here:



The output is:


The appropriate comparison will give us the right validation check.

As far as file size is concerned, I have used:

    int filesize = FileUpload1.PostedFile.ContentLength;

    if (filesize > maxSize)
    {
        // maxSize is the maximum allowed number of bytes: raise a warning here :-)
    }
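Putting the size limit and a content check together, here is a minimal Python sketch (the names and the 2 MB limit are illustrative assumptions, since the post does not give a value for maxSize). It assumes the whole upload is already in memory as bytes, and a file of exactly `max_size` bytes is accepted.

```python
# Hypothetical limit: the post does not say what maxSize was
MAX_SIZE = 2 * 1024 * 1024

def validate_upload(data: bytes, max_size: int = MAX_SIZE) -> None:
    """Raise ValueError for an oversized upload or one whose content is not a GIF."""
    # Size first: it is the cheapest check
    if len(data) > max_size:
        raise ValueError(f"file is {len(data)} bytes; the limit is {max_size}")
    # Then the content: the first six bytes must be a GIF signature, whatever the name says
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("content is not a GIF")
```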

For a few more operations on images, see Dot Net Perls for Image.