Web Extraction & Web Harvesting Services (Manual / Automated)

Web Research, Data Mining, Online Data Entry, and Online Capture are some of the services that Nirvana groups under Web Extraction / Web Harvesting. Nirvana provides all of these services to help our clients achieve cost savings of as much as 50%.

We can extract data from the Internet in any format and deliver it in the formats best suited to the client. Our staff is trained in searching for text, URLs, individual contact information…

Some of our web extraction projects have included:
•  Collecting predefined data from websites to an Excel spreadsheet, Access database, or any structured database or text format
•  Web Mining - searching the web and creating a database of target websites
•  Online Invoice Data Entry and Submission
•  Database population for online stores and B2C sites
•  Web extraction of information related to sports news
•  Blog entries for sites using web sources
•  Online tagging of photos
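
As an illustration of the first project type above (collecting predefined data from a web page into a spreadsheet-friendly format), here is a minimal sketch in Python using only the standard library. It is not a description of Nirvana's actual tooling. The names `TableRowParser`, `html_table_to_csv`, and `SAMPLE_HTML` are hypothetical, and the sketch assumes the target data sits in a plain HTML table that has already been downloaded.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table cells found in `html` as CSV text (hypothetical helper)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample page content, used only for demonstration.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget, large</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>4.50</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The resulting CSV text can be opened directly in Excel or imported into Access or another database. Real sites usually need more robust parsing and site-specific rules, which is one reason the choice between manual and automated extraction is made per project.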

Methodology

We begin with a preliminary analysis of the client's requirements and of the target site to understand the data structure and project needs. We then determine the most suitable method of Web Extraction (Manual / Automated) and confirm it with the client. Output and database formats are discussed and finalized, after which the extraction goes into production. Clients receive daily or weekly updates.

Typical Case

The client provided us with a list of links for classes across one of the larger states in the USA. Nirvana used these links to compile a list of the types of classes offered, which was then further categorized by age group, timings, etc. An online database was then created for the client, with many fields representing information on all of these classes. This project lasted 5 months.
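
The categorization step in a project like this can be sketched as follows. This is only an illustration under stated assumptions: the age bands, time slots, field names, sample records, and `example.com` links are all made up, and an in-memory SQLite database stands in for the client's online database.

```python
import sqlite3

# Hypothetical age bands (inclusive); the real groupings are not stated.
AGE_GROUPS = [
    (0, 5, "preschool"),
    (6, 12, "child"),
    (13, 17, "teen"),
    (18, 200, "adult"),
]


def age_group(min_age):
    """Map a class's minimum age to an assumed age-group label."""
    for low, high, label in AGE_GROUPS:
        if low <= min_age <= high:
            return label
    return "unknown"


def time_slot(start_hour):
    """Map a 24-hour start time to an assumed part-of-day label."""
    if start_hour < 12:
        return "morning"
    if start_hour < 17:
        return "afternoon"
    return "evening"


def build_database(records):
    """Load categorized class records into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE classes ("
        "name TEXT, class_type TEXT, age_group TEXT, "
        "time_slot TEXT, source_url TEXT)"
    )
    conn.executemany(
        "INSERT INTO classes VALUES (?, ?, ?, ?, ?)",
        [
            (
                r["name"],
                r["class_type"],
                age_group(r["min_age"]),
                time_slot(r["start_hour"]),
                r["url"],
            )
            for r in records
        ],
    )
    conn.commit()
    return conn


# Made-up records standing in for data captured from the client's links.
SAMPLE_RECORDS = [
    {"name": "Beginner Pottery", "class_type": "art", "min_age": 8,
     "start_hour": 10, "url": "https://example.com/classes/1"},
    {"name": "Teen Coding Club", "class_type": "technology", "min_age": 14,
     "start_hour": 16, "url": "https://example.com/classes/2"},
    {"name": "Evening Yoga", "class_type": "fitness", "min_age": 18,
     "start_hour": 19, "url": "https://example.com/classes/3"},
]

if __name__ == "__main__":
    db = build_database(SAMPLE_RECORDS)
    query = "SELECT name, age_group, time_slot FROM classes ORDER BY name"
    for row in db.execute(query):
        print(row)
```

Keeping the source link alongside each categorized record makes it straightforward to recheck an entry against the original page during the daily or weekly updates.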

 

 

