Today I want to describe how to use C# for scraping search results.
I was surprised to find a common stereotype that R and Python are the best, or even the only, tools for scraping. Most people who needed parsing functionality wanted it in one of those languages. While I agree that both are great languages that really enhance a developer's toolbox, I want to show that you can achieve similar results in C#.
Below is a screenshot of a C# program that sends a request to Google and receives the result:
The application workflow is pretty simple.
- You enter a search request. In the screenshot you can see that the search request was i7 skylake.
- Press the Request button.
- The application sends the request to Google.
- The application parses the response from the first page of Google results and shows the URLs and their titles.
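One detail in the third step is worth a small illustration: the search text ends up in the query string, so it is safer to URL-encode it first. The following is only a minimal sketch; the class name SearchUrl and the method name Build are hypothetical and are not part of the original program.

```csharp
using System;

public static class SearchUrl
{
    // Hypothetical helper (not from the original program): builds the search URL
    // and escapes the user's text so spaces and symbols are valid in a query string.
    public static string Build(string searchText)
    {
        return "http://google.com/search?q=" + Uri.EscapeDataString(searchText.Trim());
    }

    public static void Main()
    {
        // The search request from the screenshot.
        Console.WriteLine(Build("i7 skylake"));
        // prints http://google.com/search?q=i7%20skylake
    }
}
```

Uri.EscapeDataString turns the space into %20, so the request URL stays well-formed even when the search text contains characters such as & or #.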
The program itself can be downloaded from here
Below I also provide the heart of that program:
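// Needs System.IO, System.Net, System.Text, System.Text.RegularExpressions, System.Windows.Forms and the HtmlAgilityPack package.
public partial class Form1 : Form
{
    HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();
    private void btn1_Click(object sender, EventArgs e)
    {
        StringBuilder bufferForHtml = new StringBuilder();
        // The buffer size is an assumption; the original value was lost from the listing.
        byte[] encodedBytes = new byte[8192];
        // Escape the search text so spaces and symbols are valid in the query string.
        var urlForSearch = "http://google.com/search?q=" + Uri.EscapeDataString(txtSearch.Text.Trim());
        var request = (HttpWebRequest)System.Net.WebRequest.Create(urlForSearch);
        var response = (HttpWebResponse)request.GetResponse();
        using (Stream responseFromGoogle = response.GetResponseStream())
        {
            // GetEncoding() is not built into HttpWebResponse; presumably an extension method from the downloadable project.
            var enc = response.GetEncoding();
            int count = 0;
            do
            {
                count = responseFromGoogle.Read(encodedBytes, 0, encodedBytes.Length);
                if (count != 0)
                {
                    var tempString = enc.GetString(encodedBytes, 0, count);
                    bufferForHtml.Append(tempString);
                }
            } while (count > 0);
        }
        string sbb = bufferForHtml.ToString();
        var processedHtml = new HtmlAgilityPack.HtmlDocument
        {
            OptionOutputAsXml = true
        };
        processedHtml.LoadHtml(sbb);
        var doc = processedHtml.DocumentNode;
        foreach (var link in doc.SelectNodes("//a[@href]"))
        {
            string hrefValue = link.GetAttributeValue("href", string.Empty);
            // Drop everything from the first '&' onward.
            int index = hrefValue.IndexOf("&");
            if (index > 0)
                hrefValue = hrefValue.Substring(0, index);
            // The pattern was garbled in the listing; "&quot;" is an assumed reconstruction.
            string output = Regex.Replace(link.InnerText, "&quot;\\.?", string.Empty);
            // Links that meet the criteria are then added to the list box (that part is not shown in this listing).
        }
    }
}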
If you like C#, you'll see that the program sends a request, receives the response, decodes the result, and then parses the URLs; those that meet certain criteria are added to the list box.
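The last two steps, trimming each URL and keeping only the ones that match, can be tried out without any network access. Below is a rough sketch under stated assumptions: the class LinkCleaner, the prefix-based criterion, and the sample hrefs are all made up for illustration, since the criteria the program actually uses are not shown above.

```csharp
using System;
using System.Collections.Generic;

public static class LinkCleaner
{
    // Cuts the href at the first '&', mirroring the IndexOf/Substring step in the program.
    public static string TrimAtAmpersand(string href)
    {
        int index = href.IndexOf("&", StringComparison.Ordinal);
        return index > 0 ? href.Substring(0, index) : href;
    }

    // Hypothetical criterion (an assumption, not the author's rule):
    // keep only hrefs that start with the given prefix, trimmed at the first '&'.
    public static List<string> Keep(IEnumerable<string> hrefs, string prefix)
    {
        var kept = new List<string>();
        foreach (var href in hrefs)
        {
            if (href.StartsWith(prefix, StringComparison.Ordinal))
                kept.Add(TrimAtAmpersand(href));
        }
        return kept;
    }

    public static void Main()
    {
        // Made-up sample hrefs, for illustration only.
        var sample = new[] { "/url?q=http://example.com/page&sa=U", "/preferences?hl=en" };
        foreach (var href in Keep(sample, "/url?q="))
            Console.WriteLine(href);
    }
}
```

Keeping the filtering in a small static class like this makes it easy to change the criterion later without touching the button click handler.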