Difference between revisions of "Open Search"
From Hack Sphere Labs Wiki
Revision as of 18:03, 18 May 2013
Elastic Search
- http://www.elasticsearch.org/videos/to-infinity-and-beyond/
Decentralized Search
I really think that decentralized search will be the future. Reasons:
- More freedom.
- DMCA requests? PFFFTT. (Try getting 300 million people to remove a result.)
- Content is owned by no one, so no single owner can quietly bend the results.
- When you know what is under the hood, you know what you are getting and how to change it.
- I like the fact that it will confuse ISPs when every user's computer is maxing out the "unlimited" bandwidth that they purchase every month.
- You can index anything and everything if you want. Facebook, Twitter, etc. can't block the entire internet.
Finally someone has released something:
Resources and Links
Custom Search Engine
The original Google search paper (Brin and Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine") is a great starting point for building a search engine. The document retrieval and storage processes are easier to build today than they were back then; Scrapy, for example, covers the crawling side:
- http://doc.scrapy.org/en/latest/intro/install.html
- http://readthedocs.org/docs/scrapy/en/0.12/intro/tutorial.html
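To give a feel for the retrieval step, here is a minimal sketch that uses only the Python standard library rather than Scrapy. It assumes the page has already been downloaded and just pulls out the outgoing links (for the crawl queue) and the visible text (for the index). The names PageParser and parse_page are made up for this example; they are not part of Scrapy or any other library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects outgoing links and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(base_url, html):
    """Return (links, text) for an already-downloaded page."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)
```

A real crawler would add fetching, politeness delays, robots.txt handling and de-duplication on top of this, which is exactly the part Scrapy takes care of.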
The indexing and index storage could start out the same as everyone else's, but in the end they should be different. All search engines are pretty much the same right now: they use similar methods to index information. This is where the true experiment comes in.
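As a baseline for that experiment, the conventional starting point is an inverted index: a map from each term to the documents that contain it. The sketch below is a toy in-memory version, not what any particular engine does. The class name InvertedIndex, the tokenizer, and the scoring by raw term counts are all assumptions chosen to keep it short.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        # term -> {doc_id: number of occurrences in that document}
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in tokenize(text):
            counts = self.postings[term]
            counts[doc_id] = counts.get(doc_id, 0) + 1

    def search(self, query):
        """Return doc ids containing every query term, best match first."""
        terms = tokenize(query)
        if not terms:
            return []
        # Check membership first so a lookup never adds empty entries.
        if any(term not in self.postings for term in terms):
            return []
        matches = set(self.postings[terms[0]])
        for term in terms[1:]:
            matches &= set(self.postings[term])

        # Score: total occurrences of the query terms in the document.
        def score(doc_id):
            return sum(self.postings[term][doc_id] for term in terms)

        # Highest score first; ties broken by doc id for stable output.
        return sorted(matches, key=lambda d: (-score(d), d))
```

Usage would look like idx = InvertedIndex(); idx.add("page1", text); idx.search("decentralized search"). The experiments come in where the score function is swapped out for something less ordinary.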
Here is a reference to crawl data you can just download, from the Common Crawl Foundation: http://search.slashdot.org/story/11/11/15/0057200/common-crawl-foundation-providing-data-for-search-researchers
Algorithm Experiments
- Image Matching?
- IP relationships?
- Last updated
- Links
- Metadata
- Features of site (Web 2.0-ey?)
- Information Category (Forums, Blogs, Feeds)
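Of the signals listed above, links are the one the Google paper turns into a ranking, via PageRank. As a reference point for the other experiments, here is a rough sketch of PageRank by power iteration. It uses the common normalized variant where the ranks sum to 1, with a damping factor of 0.85; the function name, the input format (a dict of page to outgoing links) and the handling of pages without outgoing links are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}

    # Start with rank spread evenly over all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share up front.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank
```

The other ideas in the list (last updated, metadata, information category) could each produce a per-page number in the same way and be mixed into a final score, which is where the experimenting would happen.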
Notes
The Plan
- Get the Google paper
- Pull the objects (components) out of it
- Create the objects
- Import them onto a local (virtual) server
- Set up a server on a decently fast consumer internet connection to test
- Move to a dedicated box