Business Data Solutions & Services

Database Portfolio

Web scraping, or web content mining, is the process of harvesting or extracting useful data from HTML web pages on the Internet and restructuring it into pre-formatted containers such as CSV files, spreadsheets (Excel), XML or SQL databases (primarily MySQL), so that the extracted data is well organized, indexed in order and semantically accessible.
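As a rough illustration of the process described above, here is a minimal sketch in Python that pulls the rows out of an HTML table and rewrites them as CSV. The sample page, the `TableParser` class and the `html_table_to_csv` function are hypothetical names made up for this example; a real project would fetch live pages and handle far messier markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real project would fetch the HTML over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Extract table rows from HTML and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The same rows could just as well be written to a spreadsheet, an XML file or a MySQL table; CSV is used here only because it needs nothing beyond the standard library.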

Take a peek at our Portfolio or browse it by the Databases Index below.


Many web screen scraping tools exist, but they are hard to learn and adapt. To fully satisfy your specific requirements and changing demands, you will need custom web scraping services rather than ready-made web scraper tools or web scraping software. A custom scraping project can scrape not only web pages but also other online materials such as PDF, Flash, audio and even video. The results are highly structured and semantic.

Let us give you a quote.

Need data scraped and reshaped? Feel free to contact us for a quote.


We do NOT scrape privately copyrighted materials (see Note 1 below) or materials whose copyright holder expressly prohibits unauthorized distribution. Whether a scraping project is legal also depends on how you use the scraped data, e.g. for personal or commercial use. If you are not sure whether your scraping project is legal, consult an attorney. Contact us for further information.

What can we do for you?

We are here to address a number of common data problems for your business, including but not limited to:

  • Web scraping and information manipulation to produce new content for your web site
  • Providing a large database for your web applications to build upon
  • Data mining of product data from multiple e-commerce sites
  • Data mining of large bodies of text
  • Data extraction from multiple web sites to create meta search engines that present a consistent comparison of results from different sources, such as credit card rates, insurance quotes and property prices
  • Data migration from a legacy system to a new one, with complete data restructuring and modeling
  • Custom-coded programs and scripts for automated data collection and processing
  • Automated entry of large collections of data into a web site via forms (bulk data submission to web forms)
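To give a feel for the "consistent comparison" mentioned in the list above, here is a minimal Python sketch that maps records scraped from two differently structured sources onto one common shape and ranks them. The site names, field names and rates are invented sample data, and `Offer`, `from_site_a`, `from_site_b` and `compare` are hypothetical helpers, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One credit card offer in a common, source-independent shape."""
    source: str
    name: str
    rate_percent: float


# Invented sample records, as two hypothetical sites might expose them.
SITE_A = [
    {"card": "Alpha Card", "apr": "19.9%"},
    {"card": "Beta Card", "apr": "15.5%"},
]
SITE_B = [
    {"title": "Gamma Card", "annual_rate": 0.129},
]


def from_site_a(record):
    # Site A gives the rate as text with a percent sign.
    return Offer("site-a", record["card"], float(record["apr"].rstrip("%")))


def from_site_b(record):
    # Site B gives the rate as a fraction, so convert it to percent.
    return Offer("site-b", record["title"], round(record["annual_rate"] * 100, 2))


def compare(offers):
    """Return the offers sorted from the lowest rate to the highest."""
    return sorted(offers, key=lambda offer: offer.rate_percent)


offers = compare(
    [from_site_a(r) for r in SITE_A] + [from_site_b(r) for r in SITE_B]
)
for offer in offers:
    print(f"{offer.name:<12} {offer.rate_percent:>6.2f}%  ({offer.source})")
```

Keeping one small adapter function per source means that a change in one site's layout only touches that site's adapter, not the comparison logic.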

Market & business intelligence, competition analysis

Our services provide you with more than just data: they give you the information you need to better understand your market and competitors. Examples of using our services in business intelligence include:

  • Periodically reading your competitors' product listings and comparing them
  • Routinely crawling for recent information or news in your market and using it as recompiled content or a research base
  • Getting alerts about emerging products and services
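One simple way to picture the periodic comparison and alerting described above is to diff two snapshots of a competitor's listing. The sketch below assumes each snapshot has already been scraped into a dictionary of product name to price; the product names, prices and the `diff_listings` function are made up for illustration.

```python
def diff_listings(previous, current):
    """Compare two {product: price} snapshots of the same listing.

    Returns the products that appeared, the products that disappeared,
    and a mapping of product -> (old price, new price) for price changes.
    """
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = {
        name: (previous[name], current[name])
        for name in sorted(set(previous) & set(current))
        if previous[name] != current[name]
    }
    return added, removed, changed


# Invented snapshots from two consecutive crawls.
last_week = {"Widget": 9.99, "Gadget": 24.50, "Doohickey": 5.00}
this_week = {"Widget": 8.99, "Gadget": 24.50, "Gizmo": 12.00}

added, removed, changed = diff_listings(last_week, this_week)
print("New products:", added)
print("Dropped products:", removed)
print("Price changes:", changed)
```

Running such a comparison on a schedule and notifying someone whenever `added` is non-empty is one possible form of a new-product alert.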

Development languages and targeted platforms

Web page scraping can be done in a variety of languages in addition to SQL (MySQL and MSSQL), such as:

  • Python
  • PHP
  • Perl
  • Ruby (on Rails)
  • ASP
  • .NET (C#, VB)
  • Java

To make the data content easy to present and administer, you can choose to have it scraped for a specific platform, such as a blog or forum script:

  • WordPress
  • Joomla
  • vBulletin
  • phpBB
  • MyBB
  • ...

All of the above web programming languages are fully capable of screen-scraping web pages for web data. We work primarily in PHP for all web page screen scraping projects.

We are not a cheap solution.

Web scraping takes a tremendous amount of time in raw data analysis and information restructuring. To make the results future-proof and versatile enough to fit as many situations as possible, you need a knowledgeable team with experienced hands.

What are web scrapers?

Simply put, web scrapers, web data miners and content extractors are all screen scrapers: programs that scrape web data, extract the information and recombine it for new uses.

Notes:

  1. We don't scrape privately copyrighted materials such as articles or blog posts. As a general rule of thumb, plain facts such as business listings and historical data cannot be copyrighted, though this is not always so. Public domain works, such as those released by governments, are always free to be copied and distributed in any way possible.