It is a great feeling to be back here... I have not been able to find time for myself for quite some time.. :-( Anyways...
This post is regarding a concern which my colleague Mr X raised while I was doing some creative work in my organization for an event. It required the users to upload their pictures, and I had to use them for some further processing in an existing application.
The limitation I had was that the existing processing application was written by someone else, and he was no longer available to us. That app was designed (don't know how and why) to work with .gif files only.
So I had two options: either ask everyone for photographs in any format and convert them (I mean manually renaming them to my desired format.. haha, I know it's cheating), or ask them to send ONLY the desired format.
I opted for the second. Also, the pic-collecting app needed to be on the INTRANET, so 1000 people using it and uploading large files was a concern... (I know this because I also used to do it whenever someone put his app up for something... haha.. doing their STRESS testing. :p..)
The approach I took to validate the input pictures was to check the file extension of the input file.
This is what I have seen in many places, and I found it quite right, until Mr X renamed a Doc file as GIF and uploaded it to my app. My app considered it ALL RIGHT (due to the .gif extension). But when I opened it, the photo viewer displayed: "Nothing to Display". And Mr X asked me to work out something that would check for a REAL image file and not a file wearing clothes like an image.
I googled, found some very good articles, and collated them here to have the various techniques to check/validate a real image in one place.
Below are the various techniques to validate the input files:
1. Check the Image File Extension
This is a very lame technique, but if you are sure that users will only upload REAL image files, then you can use this simple approach to validate that your input image is in the desired format.
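The original code for this step is not reproduced here, so what follows is only a minimal sketch of an extension check, assuming the desired format is .gif as in my case. The class and method names (ExtensionCheck, HasGifExtension) are made up for illustration.

```csharp
using System;
using System.IO;

public static class ExtensionCheck
{
    // Hypothetical helper: returns true when the file name ends in ".gif",
    // ignoring case. It looks only at the name, never at the file's contents.
    public static bool HasGifExtension(string fileName)
    {
        string ext = Path.GetExtension(fileName); // ".gif", "" or null
        return string.Equals(ext, ".gif", StringComparison.OrdinalIgnoreCase);
    }
}
```

Note that a Doc file renamed to something.gif passes this check, which is exactly the weakness Mr X pointed out.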
The other two approaches, which I found very fascinating, check the contents of the image file for certain hexadecimal codes and conclude the image format from them. Before discussing the techniques, let's have a brief look at the background:
[ the details below are copied shamelessly from a few different sources.. ;-) ]
· JPEG format:
The byte sequence for a valid JPEG file should be like this:
Byte | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Hex | FF | D8 | FF | E0 | Skip | Skip | 4A | 46 | 49 | 46 | 00
Char | ÿ | Ø | ÿ | à | Skip | Skip | J | F | I | F | NUL
The points to notice are the marker bytes (bytes 1 to 4) and the identifier string (bytes 7 to 11) in the table above.
The first part to look at is the first two bytes of the file. The hex values FF D8 identify the start of the image file. This is often enough to know that you have an actual JPEG file. The next two bytes are the application marker, typically FF E0. This marker can change depending on the application used to modify or save the image. Someone has quoted: "I have seen this marker as FF E1 when pictures were created by Canon digital cameras."
The next two bytes are skipped. Read the next five bytes to identify the application marker specifically. This would typically be 4A 46 49 46 (JFIF) followed by 00 to terminate the string. Normally this zero-terminated string will be "JFIF", but in the previous example of Canon digital cameras the string will be 45 78 69 66 (Exif).
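The JPEG checks described above can be sketched roughly like this. It is only an illustration under stated assumptions: the names JpegHeader and LooksLikeJpeg are made up, and the check accepts only the FF E0 / FF E1 markers with the "JFIF" / "Exif" strings mentioned in the text, so it may reject valid JPEG files written with other markers.

```csharp
using System;
using System.Text;

public static class JpegHeader
{
    // Hypothetical helper: inspects the first bytes of a file.
    // Bytes 1-2 must be FF D8, bytes 3-4 must be FF E0 or FF E1,
    // bytes 5-6 are skipped, bytes 7-10 must spell "JFIF" or "Exif".
    public static bool LooksLikeJpeg(byte[] header)
    {
        if (header == null || header.Length < 10) return false;
        if (header[0] != 0xFF || header[1] != 0xD8) return false;   // start of image
        if (header[2] != 0xFF) return false;
        if (header[3] != 0xE0 && header[3] != 0xE1) return false;   // application marker
        string id = Encoding.ASCII.GetString(header, 6, 4);         // bytes 7-10
        return id == "JFIF" || id == "Exif";
    }
}
```

If checking FF D8 alone is good enough for your case, the method can simply return after the first comparison.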
· TIFF (Tagged Image File Format):
The byte sequence for a valid TIFF file should be like this:
Byte | 1 | 2 | 3
Hex | 49 | 49 | 2A
Char | I | I | *
The TIFF image format was designed to become a standard in image file exchange. Even though it is widely used, it never did become the standard that was envisioned. Most commonly now you might see this format used by document scanners. The image header for a TIFF image is a fixed 8-byte segment that always occurs at the beginning of the file. To ensure TIFF images can be read properly by PCs (Intel processors) and Macintosh computers, the header must indicate a byte order, which is given by the first two bytes of the file. The first two bytes will be either hex 49 49 (II) for the Intel format or 4D 4D (MM) for the Macintosh integer format, which was based on Motorola processors. Next comes the number 42 (hex 2A), stored as two bytes in the file's own byte order: 2A 00 after II, or 00 2A after MM. This number should never change.
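Here is a small sketch of the TIFF check for both byte orders. The names TiffHeader and LooksLikeTiff are made up for illustration; the only assumption beyond the text above is that the number 42 occupies two bytes written in the file's own byte order.

```csharp
using System;

public static class TiffHeader
{
    // Hypothetical helper: accepts "II" followed by 2A 00 (Intel order)
    // or "MM" followed by 00 2A (Motorola order).
    public static bool LooksLikeTiff(byte[] header)
    {
        if (header == null || header.Length < 4) return false;
        bool intel = header[0] == 0x49 && header[1] == 0x49
                  && header[2] == 0x2A && header[3] == 0x00;
        bool motorola = header[0] == 0x4D && header[1] == 0x4D
                     && header[2] == 0x00 && header[3] == 0x2A;
        return intel || motorola;
    }
}
```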
· BMP format
The byte sequence for a valid BMP file should be like this:
Byte | 1 | 2
Hex | 42 | 4D
Char | B | M
· GIF (Graphics Interchange Format)
A GIF file (pronounced as "jiff") is a compressed image format. It uses lossless data compression (LZW), which ensures that compressing the image causes no data loss or image degradation. GIF files are largely used for animated images, and in the early years of the internet you would be hard pressed to find a website not using some form of animated GIF file.
To identify a GIF file, read the first three bytes of the file. The byte sequence for a valid GIF file should be like this:
Byte | 1 | 2 | 3
Hex | 47 | 49 | 46
Char | G | I | F
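The BMP and GIF signatures above are plain prefixes, so one tiny helper can check both. This is just a sketch; the names SimpleSignatures, StartsWith, LooksLikeBmp and LooksLikeGif are made up for illustration.

```csharp
using System;

public static class SimpleSignatures
{
    // Returns true when data begins with the given signature bytes.
    private static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }

    // "BM" = 42 4D
    public static bool LooksLikeBmp(byte[] data)
    {
        return StartsWith(data, 0x42, 0x4D);
    }

    // "GIF" = 47 49 46
    public static bool LooksLikeGif(byte[] data)
    {
        return StartsWith(data, 0x47, 0x49, 0x46);
    }
}
```

With this, the renamed Doc file from Mr X's test fails LooksLikeGif no matter what its extension says.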
Approach 2 may or may not use the byte checks described above; I AM STILL IN SEARCH of the LOGIC behind this approach....!!!
But Approach 3 checks the header bytes directly...
So now we have two approaches.
The first is to open/load a file into memory and then check whether it is a valid file, and the other is to check only the header of the file without loading the whole image.
The former approach is MEMORY CONSUMING but more reliable than the latter.
2. Load and Check for a Valid File
(Memory-consuming but more reliable approach)
The code for this approach loads the file and throws an exception if it is not in a valid image format.
So just handle the exception and you have your problem resolved.
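The original snippet is not reproduced here, so this is only a sketch of the load-and-check idea, assuming the System.Drawing (GDI+) classes are what does the loading. Image.FromFile throws an OutOfMemoryException when the file is not in a valid image format. The names LoadCheck and IsRealGif are made up for illustration.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

public static class LoadCheck
{
    // Hypothetical helper: returns true only if GDI+ can decode the file
    // and reports its format as GIF. The file must exist, otherwise
    // Image.FromFile throws FileNotFoundException, which is not caught here.
    public static bool IsRealGif(string path)
    {
        try
        {
            using (Image img = Image.FromFile(path))
            {
                return img.RawFormat.Equals(ImageFormat.Gif);
            }
        }
        catch (OutOfMemoryException)
        {
            // Image.FromFile signals "not a valid image format" with this type.
            return false;
        }
    }
}
```

The whole image gets decoded just to answer a yes/no question, which is where the memory cost of this approach comes from.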
The various other options available
for Image formats are:-
3. Check the Header of the Input Image File
The code for this approach looks at the first two bytes, and then the next two bytes, to check for a valid JPEG image.
When the first two bytes of the file are read as a single 16-bit little-endian value, the expected values for comparison are:
Format | First two bytes
BMP | 0x4d42
JPG | 0xd8ff
PNG | 0x5089
GIF | 0x4947
TIF | 0x4949
Let's take it to VS and do some work...
I have a few files with me, and I have tried to fetch the first two bytes of their headers to use as identifiers for telling the formats apart.
The appropriate comparison will give us the right validation check.
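The comparison against the two-byte values listed above can be sketched as follows. This is an illustration, not the original code: the names HeaderSniffer and Identify are made up. The two bytes are combined low byte first, which gives the same number that BinaryReader.ReadUInt16 would return.

```csharp
using System;
using System.IO;

public static class HeaderSniffer
{
    // Hypothetical helper: reads two bytes from the stream and maps them,
    // as a little-endian 16-bit value, to a format name.
    public static string Identify(Stream stream)
    {
        int first = stream.ReadByte();
        int second = stream.ReadByte();
        if (first < 0 || second < 0) return "unknown"; // fewer than 2 bytes

        int magic = first | (second << 8); // low byte comes first in the file
        switch (magic)
        {
            case 0x4D42: return "BMP";
            case 0xD8FF: return "JPG";
            case 0x5089: return "PNG";
            case 0x4947: return "GIF";
            case 0x4949: return "TIF"; // Intel byte order only
            default: return "unknown";
        }
    }
}
```

Only the "II" flavor of TIFF is covered by 0x4949; a Motorola-order file would start with 0x4d4d instead.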
As far as file size is concerned, I have used:
int fileSize = FileUpload1.PostedFile.ContentLength;
Now a check like
if (fileSize >= maxSize)
where maxSize is the maximum number of bytes allowed, lets you raise a warning..
:-)
For a few more operations on images, see Dot Net Perls for Image.