Thursday, October 12, 2006

parsing strings

Today I am attempting to parse the string I acquired from the Google search results - I am going to parse it for website addresses.
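A minimal sketch of one way to pull addresses out of the string, assuming the links sit inside href="..." attributes with double quotes (the real markup of the results page has not been checked here). ExtractLinks is a hypothetical helper name, not something built into VB6.

```vb
' Hypothetical helper: collects every href="..." value found in html.
' Assumption: links are written as href="address" with double quotes.
Public Function ExtractLinks(ByVal html As String) As Collection
    Dim links As New Collection
    Dim startPos As Long
    Dim endPos As Long
    Const MARKER As String = "href="""

    startPos = InStr(1, html, MARKER, vbTextCompare)
    Do While startPos > 0
        startPos = startPos + Len(MARKER)      ' first character of the address
        endPos = InStr(startPos, html, """")   ' the closing quote
        If endPos = 0 Then Exit Do             ' unterminated attribute, stop looking
        links.Add Mid$(html, startPos, endPos - startPos)
        startPos = InStr(endPos + 1, html, MARKER, vbTextCompare)
    Loop

    Set ExtractLinks = links
End Function
```

Looping over the result with For Each and printing each item with Debug.Print would show what was found. Adverts and Google's own links would still be in the list, so some filtering is needed afterwards.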

Thursday, October 05, 2006

reading the text file

I haven't updated in a very long time. Why? Let's just say I had a very long holiday...

Well anyway, I left off having managed to get a URL and stick the code in a text file. Next I needed to be able to look at the text file so I could begin searching it for URLs and emails. It took me ages to find out how, but http://www.garybeene.com/vb/tut-file.htm was a very helpful site. Here's the code to open a text file, read each line into the temp string and then add each line to the alltext string - finally it displays the result in a textbox.

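' Open the saved search results for reading
Open "N:\vbspider\vbspider\searchresult.txt" For Input As #1
While Not EOF(1)
    ' Read one line, then add it to the text collected so far
    Line Input #1, temp$
    alltext$ = alltext$ & temp$ & vbCrLf
Wend
' Show the whole file in the textbox
Text1.Text = alltext$
Close #1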

Tuesday, July 18, 2006

Woho

Managed to get that part done - it's gone to the Google site and pulled the source code of the results triggered by the keyword the client enters. I had a problem with getting Google to search for the value within the variable 'keyword' rather than actually searching for the word 'keyword'. In the end James, a fellow programmer, suggested that I take the 'keyword' out of the quotes and then simply add an ampersand (&) to the end with 'keyword' added afterwards. As I understand it, the ampersand joins two strings together into one, so with 'keyword' outside the quotes it adds the value of the variable to the end of the search query URL Google uses. Come to think of it, I should probably enter some screenshots here to show the first parts of my development.
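As a rough sketch of what that suggestion amounts to (not the actual project code), the address can be built like this. BuildSearchUrl is a hypothetical helper name, and swapping spaces for "+" is an added assumption so that multi-word keywords survive in the address.

```vb
' Hypothetical helper: builds the search address for a keyword.
Public Function BuildSearchUrl(ByVal keyword As String) As String
    ' Inside the quotes, keyword would just be the literal text "keyword".
    ' Outside the quotes, & appends whatever the variable holds.
    ' Assumption: spaces are replaced with "+" for multi-word searches.
    BuildSearchUrl = "http://www.google.co.uk/search?q=" & Replace(keyword, " ", "+")
End Function
```

Calling BuildSearchUrl("link exchange") should give http://www.google.co.uk/search?q=link+exchange.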

Here's the code:





Here's the form:






That's about it, now I have to get VB to search through the HTML source code and find the relevant link partners.

Looks like we're going somewhere

Right, I'm back on this day, Tuesday, just like I said I would be. I've been having a look at the code that I left off with and I'm beginning to understand it a little. It seems that DownloadFile is in fact a command within VB's library, but it needs to have a few of its settings changed to allow it to run - not quite sure yet what exactly, but I'm getting there. I've decided that I'm going to do the first part now: create some input boxes to allow the user to enter some keywords that their website is related to - they will then click a button to pull up a list of potential link partners using Google's search engine. This is really simple. Firstly the user enters their keyword, this keyword is stored in a variable, probably 'keyword', and then the data miner code I've got will go to http://www.google.co.uk/search?q='keyword'. This will pull the source for the search results, and then it's on to pulling out the URLs, and the correct ones too, because of course there are the adverts and irrelevant Google links.

Friday, July 14, 2006

Starting to get confusing

I still don't know exactly how the code works and I'm having difficulty learning how to manipulate the text files - I think my brain has overheated and met its processing limit today. I shall continue with this later, probably next Tuesday :)

wow! That's interesting

Hey, I've just been looking through that forum again, the one that had the 'sendrequest' thing in it - I looked back there to see if anyone had referenced winsock in their code examples (there were a lot of different examples). The first bit of code I looked at did not use winsock - it was small and used some other method. I'm not really sure how it works, but anyway I copied it into VB6, ran it and voila! It went to Google, copied the source code and stuck it in a text file named google! WOHO!!!! Now that I know I can retrieve data from the net, all I have to do is learn how to manipulate the text files - you know, search through them looking for emails and such. My only concern is that I will be downloading a lot of data. I need to know how to delete the files once they've been dealt with, or find a more efficient way to just view the files rather than download them, as it could be quite demanding on the computer's internet connection. Well, I need to investigate. I'm gonna look at how it friggin' works and then go ahead with the searchinating of files - oh and don't worry, winsock still hasn't gone, I'll keep it in the back of my mind, it seemed rather interesting.
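On the worry about files piling up, here is a minimal sketch of reading a downloaded file into a string and then deleting it with VB6's Kill statement. ReadAndDelete is a hypothetical helper name, and the file is assumed to be plain text that nothing else has open.

```vb
' Hypothetical helper: returns the contents of a text file, then deletes the file.
Public Function ReadAndDelete(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim temp As String
    Dim allText As String

    fileNum = FreeFile                     ' next unused file number
    Open filePath For Input As #fileNum
    Do While Not EOF(fileNum)
        Line Input #fileNum, temp          ' one line at a time
        allText = allText & temp & vbCrLf
    Loop
    Close #fileNum                         ' the file must be closed before Kill

    Kill filePath                          ' removes the file from disk
    ReadAndDelete = allText
End Function
```

This only tidies up the disk afterwards. The page still has to be downloaded once, so it does nothing for the load on the internet connection.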

winsock- the future?

Went to the Microsoft website to find out more about winsock. It seems to be a pretty nifty command: it only needs two bits of information to connect to a remote host/computer, the IP address of the computer and the port you want to connect to. I didn't know much about the ports, but fortunately there was a tutorial located at http://www.freevbcode.com/ShowCode.asp?ID=3025 which included a list of ports associated with different transfer protocols. In this case I wanted to connect to a website and thus needed to connect to port 80. I've written the code for connecting to a website, however when I run it I have no idea whether it has actually connected, so I'm hoping I can find some way of returning errors. I'm also having another problem. From what I can gather, winsock works like this - the client connects to the server and sends a request or ping, and the server has been set up so that it waits for this request and sends back the appropriate data. This is cool if you're making an online game or a chat program, however I just want to mine data from websites. The website will not send data back, and I'm not sure how I'm going to actually get hold of the data once connected without getting the server set up with the appropriate code. I still think this may be possible so I'm going to carry on with winsock for a while. If not, I shall try to pursue another method.
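For what it's worth, a web server does answer once it receives an HTTP request over the connection, so a sketch along these lines may be enough to see whether the connection worked and to get a page back. It assumes a form with a Winsock control named Winsock1 (a hypothetical name) and uses www.google.co.uk as the host. The request is a bare-bones HTTP/1.0 GET and has not been tried against the real site.

```vb
' Form code. Assumes a Winsock control named Winsock1 is on the form.
Private response As String

Private Sub Form_Load()
    Winsock1.RemoteHost = "www.google.co.uk"
    Winsock1.RemotePort = 80               ' port 80 is used for websites
    Winsock1.Connect
End Sub

Private Sub Winsock1_Connect()
    ' Fires once the connection has been made, so it is safe to send here.
    Winsock1.SendData "GET / HTTP/1.0" & vbCrLf & _
                      "Host: www.google.co.uk" & vbCrLf & vbCrLf
End Sub

Private Sub Winsock1_DataArrival(ByVal bytesTotal As Long)
    ' The reply arrives in pieces, so each piece is added to the end.
    Dim chunk As String
    Winsock1.GetData chunk, vbString
    response = response & chunk
End Sub

Private Sub Winsock1_Close()
    ' The server has finished sending.
    Winsock1.Close
    Debug.Print response
End Sub

Private Sub Winsock1_Error(ByVal Number As Integer, Description As String, _
        ByVal Scode As Long, ByVal Source As String, ByVal HelpFile As String, _
        ByVal HelpContext As Long, CancelDisplay As Boolean)
    ' Fires if the connection fails, which answers "did it connect?"
    MsgBox "Winsock error " & Number & ": " & Description
End Sub
```

The response string will hold the HTTP headers as well as the page source, so the headers would need trimming off before searching it.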

doh!

Had a look at this 'sendrequest' command, and what a dumbass I was, it's not a command at all. I should have looked at the code a little closer: it was a function the programmer had created. So I copied the code into VB6 and tried to run it, but unfortunately it requires MSXML to run. I may be able to get a copy, but first I would like to explore other methods I could use, so that I can pursue the one that suits me best. I've done some researching and I've found something called inet and winsock. Winsock looks most promising as it appears to run in a standard exe project on VB6, which is the type of programming I'm most familiar with.
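For reference, a function that leans on MSXML to fetch a page might look roughly like this. It is a guess at the general shape, not the forum's SendRequest code. FetchPage is a hypothetical name, and it only works on a machine where MSXML is installed.

```vb
' Hypothetical helper: fetches a page with MSXML and returns its source.
' Assumption: MSXML is installed, otherwise CreateObject raises an error.
Public Function FetchPage(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")

    http.Open "GET", url, False            ' False = wait until the page has loaded
    http.send
    FetchPage = http.responseText
End Function
```

Because the object is created at run time with CreateObject, no project reference has to be added first.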

I'll report back soon :)

Tuesday, July 11, 2006

I've been doing some researching. I've done a similar project in PHP, however VB will be different. From experience I know that the most vital part of a data miner is its ability to connect to the web - I'm still unsure whether VB6 can actually grab data from the internet, and I'm also unsure if it needs to be installed on a server for it to work. From the research I've done, I've found what looks like a very simple way to grab data from the net and put it into a string - the operator is named SendRequest(). I found it at http://www.vbforums.com/showthread.php?s=5b8d2ced56fada595326414b91d974ba&p=2514645#post2514645

However I've just tried it out and it doesn't seem to know what the operator is. It calls it an undefined function - which basically means that the operator SendRequest is not installed in its library of commands. Perhaps it needs to be installed on a server, or maybe a higher version of VB is required - I'll do some more research to see if I can find out any more about the SendRequest command, will report back soon.

Welcome

This is the very beginning of my Visual Basic spider project for school. It aims to collect data from a number of websites and then use the information to increase the search position of any given website. The spider will first identify keywords related to the website, find websites that are related to it, search those websites for email addresses and then email the owner of each website asking for a link exchange.

That's all for now :)