
Project report

 

Skeletonization/thinning

 

Brief description:

Skeletonization, also known as “thinning”, is a preprocessing procedure widely used in computer vision, pattern recognition, image processing and computer graphics. It applies to line-drawing images whose characteristics have little to do with the thickness of the lines, as in optical character recognition (OCR).

Definition:

Skeletonization or thinning is a process by which a one-pixel-wide representation (or skeleton) of an object is obtained. In other words, after pixels have been peeled off, the pattern should still be recognizable.

Hence the skeleton obtained must have the following properties:

•  As thin as possible (ideally one pixel wide)

•  Connected

•  Centered

I used a simple thinning algorithm based on Hilditch's algorithm.

Hilditch's algorithm performs multiple passes over the pattern. On each pass, it checks every pixel and changes a pixel p1 from black to white if it satisfies the following four conditions:

  • 2 <= B(p1) <= 6
  • A(p1) = 1
  • p2.p4.p8 = 0 or A(p2) != 1
  • p2.p4.p6 = 0 or A(p4) != 1

The algorithm stops when nothing changes (no more pixels can be removed).

 

Here B(p1) is the number of non-zero neighbors of p1, and

A(p1) is the number of 0,1 patterns (transitions from 0 to 1) in the sequence p2, p3, p4, p5, p6, p7, p8, p9, p2 (taken clockwise around p1).

For example, consider the following picture:

Here B(p1) = 2 and A(p1) = 1.
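The two quantities can be sketched in a few lines of Python. This is only an illustration, not the code used in the project: it assumes a binary image stored as a list of lists (1 = black, 0 = white) and the usual neighbor layout p9 p2 p3 / p8 p1 p4 / p7 p6 p5, which the report does not spell out. The function names and the 3x3 patch are hypothetical; the patch is just one neighborhood that gives B(p1) = 2 and A(p1) = 1.

```python
# Assumed neighbor layout around p1 (not stated in the report):
#   p9 p2 p3
#   p8 p1 p4
#   p7 p6 p5
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def neighbours(img, r, c):
    """Return [p2, ..., p9] clockwise; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in OFFSETS]


def B(img, r, c):
    """Number of non-zero neighbors of the pixel at (r, c)."""
    return sum(neighbours(img, r, c))


def A(img, r, c):
    """Number of 0 -> 1 transitions in the cyclic sequence p2, ..., p9, p2."""
    n = neighbours(img, r, c)
    seq = n + n[:1]  # close the cycle by appending p2 again
    return sum(1 for a, b in zip(seq, seq[1:]) if a == 0 and b == 1)


# Hypothetical patch: p1 is the centre, only p2 and p3 are black.
patch = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
print(B(patch, 1, 1), A(patch, 1, 1))
```

Treating out-of-image pixels as white is a design choice made here so that border pixels can be evaluated without padding the image.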

The four conditions explained

•  Condition 1 ensures that p1 is a boundary pixel (B(p1) <= 6) and that no end-point pixel or isolated pixel is removed (B(p1) >= 2).

•  Condition 2 ensures that connectivity is preserved.

•  Condition 3 ensures that 2-pixel-wide vertical lines do not get completely eroded.

•  Condition 4 ensures that 2-pixel-wide horizontal lines do not get completely eroded.
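The four conditions and the stopping rule can be put together as in the following sketch. It rests on assumptions the report does not state: the image is a list of lists with 1 = black and 0 = white, the neighbor layout is p9 p2 p3 / p8 p1 p4 / p7 p6 p5, and all pixels that qualify in a pass are removed together at the end of that pass. The names `neighbours`, `transitions` and `hilditch_thin` are made up for illustration.

```python
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def neighbours(img, r, c):
    """Return [p2, ..., p9] clockwise; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in OFFSETS]


def transitions(img, r, c):
    """A(.): number of 0 -> 1 transitions in p2, ..., p9, p2 around (r, c)."""
    n = neighbours(img, r, c)
    seq = n + n[:1]
    return sum(1 for a, b in zip(seq, seq[1:]) if a == 0 and b == 1)


def hilditch_thin(img):
    """Apply the four conditions pass by pass until no pixel is removed."""
    img = [row[:] for row in img]  # work on a copy, leave the input untouched
    h, w = len(img), len(img[0])
    while True:
        to_clear = []
        for r in range(h):
            for c in range(w):
                if img[r][c] != 1:
                    continue
                p2, p3, p4, p5, p6, p7, p8, p9 = neighbours(img, r, c)
                b = p2 + p3 + p4 + p5 + p6 + p7 + p8 + p9
                cond1 = 2 <= b <= 6
                cond2 = transitions(img, r, c) == 1
                # A(p2) and A(p4) are evaluated at the north and east neighbors.
                cond3 = p2 * p4 * p8 == 0 or transitions(img, r - 1, c) != 1
                cond4 = p2 * p4 * p6 == 0 or transitions(img, r, c + 1) != 1
                if cond1 and cond2 and cond3 and cond4:
                    to_clear.append((r, c))
        if not to_clear:  # nothing changed: stop
            return img
        for r, c in to_clear:  # assumed: removals take effect after the pass
            img[r][c] = 0


# Usage: a solid 3x5 block surrounded by a one-pixel white border.
block = [[1 if 1 <= r <= 3 and 1 <= c <= 5 else 0 for c in range(7)]
         for r in range(5)]
for row in hilditch_thin(block):
    print("".join("#" if v else "." for v in row))
```

Removing pixels immediately while scanning (instead of at the end of the pass) is another possible reading of the description and can give a different skeleton.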

When I applied the algorithm as it stands to different fonts of Assamese script, the results were poor, with the size of the characters remaining almost the same.

So I made some modifications to the algorithm, which proved helpful in thinning Assamese characters. Assamese script is similar to Hindi script. The distinctive feature of both is the “headline” property: the headline is a horizontal line that connects all the characters in a word. Almost every word in Assamese or Hindi has this headline. I used this property in the modifications.

 

The modified algorithm is presented below.

The algorithm performs multiple passes over the pattern. On each pass, it checks every pixel and changes a pixel p1 from black to white if it satisfies the following conditions:

  • 2 <= B(p1) <= 6
  • A(p1) = 1
  • A(p4) != 1

The algorithm stops when no more pixels can be removed.

Condition 3 of the original algorithm is removed because no free-standing vertical lines are present: they are all connected by the headline, so dropping it caused no problems for me. The output of this stage contains thin vertical lines as needed, but the width of the horizontal lines remains the same and there is more noise. Therefore the output is fed to a second stage that uses only the following conditions:

  • 2 <= B(p1) <= 6
  • A(p1) = 1

Again, the algorithm stops when no more pixels can be removed.

The output of this second stage is at the expected level.
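The two stages described above can be sketched as one routine that is run twice, first with the A(p4) != 1 test and then without it. As before, this is an illustration under assumptions not stated in the report: the image is a list of lists with 1 = black and 0 = white, the neighbor layout is p9 p2 p3 / p8 p1 p4 / p7 p6 p5, and the pixels selected in a pass are removed together once the pass is finished. The names `thin`, `thin_two_stage` and `check_east` are hypothetical.

```python
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def neighbours(img, r, c):
    """Return [p2, ..., p9] clockwise; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in OFFSETS]


def transitions(img, r, c):
    """A(.): number of 0 -> 1 transitions in p2, ..., p9, p2 around (r, c)."""
    n = neighbours(img, r, c)
    seq = n + n[:1]
    return sum(1 for a, b in zip(seq, seq[1:]) if a == 0 and b == 1)


def thin(img, check_east):
    """Repeat passes until stable.

    Always requires 2 <= B(p1) <= 6 and A(p1) = 1; when check_east is True
    it also requires A(p4) != 1, where p4 is the east neighbor of p1.
    """
    img = [row[:] for row in img]  # work on a copy
    h, w = len(img), len(img[0])
    while True:
        to_clear = []
        for r in range(h):
            for c in range(w):
                if img[r][c] != 1:
                    continue
                if not 2 <= sum(neighbours(img, r, c)) <= 6:
                    continue
                if transitions(img, r, c) != 1:
                    continue
                if check_east and transitions(img, r, c + 1) == 1:
                    continue
                to_clear.append((r, c))
        if not to_clear:  # no more pixels can be removed
            return img
        for r, c in to_clear:  # assumed: removals take effect after the pass
            img[r][c] = 0


def thin_two_stage(img):
    """Stage 1 keeps the A(p4) != 1 test; stage 2 drops it."""
    stage1 = thin(img, check_east=True)
    return thin(stage1, check_east=False)
```

A call such as `thin_two_stage(binary_image)` returns a new image and leaves the input unchanged, so the intermediate stage-1 result can also be inspected separately with `thin(binary_image, check_east=True)`.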

The output for words in two different fonts:

Font 1: [two pairs of images: original and after thinning]

Font 2: [two pairs of images: original and after thinning]

Drawbacks:

On observation, I found the following drawbacks in the output:

•  The character “na” is not thinned properly.

•  Noise is present in the output.

The noise can be removed using noise removal algorithms.

 

 

References:

•  Godfried Toussaint, “Chapter 7: Skeletons”.

•  Anbumani Subramanian and Bhadri Kubendran, “Optical Character Recognition of Printed Tamil Characters”. http://www.ee.vt.edu/~anbumani/tamilocr/

•  B.Tech thesis of Mandeep, alumnus of IITK.

•  Daniela Azar, project explanation. http://cgm.cs.mcgill.ca/~godfried/teaching/projects97/azar/skeleton.html

•  C. Lee and P. S. Wang, “A Simple and Robust Thinning Algorithm”, College of Computer Science, Northeastern University, Boston, MA 02115, USA.

 

(This was done as a project in winter 2003 by Thanniru Ramakrishna under the guidance of Dr. S. V. Rao, Computer Science Department, IIT Guwahati.)

 

 

 


 
 
 
 
 
