Apache Solr

Solr Basics

Solr is an open-source search server written in Java and built on the Apache Lucene library. It is meant to run as a standalone web application. It exposes REST-like HTTP endpoints, which you can extend by setting up your own.

Solr can also be embedded in software.

Solr is not a database. It is an index of data, i.e. a list of information (data, metadata, fields, etc.) with references to where the data came from.
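To make the idea of an index with references back to its sources concrete, here is a toy sketch of an inverted index. It is only an illustration under simplifying assumptions (whitespace tokenising, lowercase matching) and not how Lucene actually stores its data. The function name and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical documents, keyed by a reference to where each one came from.
docs = {
    "doc1": "Mona Lisa by Leonardo",
    "doc2": "The Last Supper by Leonardo",
}
index = build_inverted_index(docs)
print(index["leonardo"])  # ids of the documents that mention "leonardo"
```

A lookup returns references to documents rather than the documents themselves, which is the distinction the paragraph above draws between an index and a database.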

Check out this good intro tutorial in 5 minutes.

Overview of Solr architecture

Installing and running Solr locally

For development I start by downloading Solr and running a local instance. You can download Solr from the Solr website. Make sure you have Java installed too.

Once you have installed it, navigate to the directory that contains start.jar and run:

java -Dsolr.solr.home=path/to/your/core -jar start.jar

Then in a browser go to:

http://localhost:8983/solr/#/

Running Queries

Solr mind map

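Pagination is controlled by the start and rows parameters. start is the zero-based offset of the first record to return (not the page number), and rows is the number of records per page, so page n starts at (n - 1) * rows. For example, the first page of 10 records:

curl -X GET "http://localhost:8983/solr/paintings/select?q=*:*&start=0&rows=10&wt=json&indent=true"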
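Solr's start parameter is a zero-based offset of the first row returned, so a page number has to be converted before it goes into the URL. The sketch below shows one way to do that. The helper names page_to_start and select_url are hypothetical, and the base URL and core name are taken from the command above.

```python
from urllib.parse import urlencode


def page_to_start(page, rows):
    """Convert a 1-based page number into Solr's zero-based start offset."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must be >= 1")
    return (page - 1) * rows


def select_url(base, core, page, rows, query="*:*"):
    """Build a select URL for the given page; nothing is sent over the network."""
    params = {
        "q": query,
        "start": page_to_start(page, rows),
        "rows": rows,
        "wt": "json",
    }
    return f"{base}/{core}/select?{urlencode(params)}"


# Third page of 10 records: rows 20 to 29.
url = select_url("http://localhost:8983/solr", "paintings", page=3, rows=10)
print(url)
```

urlencode percent-encodes the query (for example the colon in *:*), which Solr decodes on its side.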

Add an example entity:

curl 'http://localhost:8983/solr/paintings/update?commit=true&wt=json' -H 'Content-type:application/json' -d ' [ { "uri" : "http://en.wikipedia.org/wiki/Mona_Lisa", "title" : "Mona Lisa", "museum" : "unknown" } ]'

Add the same entity as above with more fields:

curl 'http://localhost:8983/solr/paintings/update?commit=true&wt=json' -H 'Content-type:application/json' -d '[ { "uri" : "http://en.wikipedia.org/wiki/Mona_Lisa", "title" : "Mona Lisa", "artist" : "Leonardo Da Vinci", "museum" : "Louvre" } ]'
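The same update can be prepared from a script. This is a minimal sketch that assumes the endpoint and field names used in the curl command above. The helper name build_add_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_add_request(base, core, docs):
    """Prepare (but do not send) a JSON update request for a list of documents."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base}/{core}/update?commit=true&wt=json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


painting = {
    "uri": "http://en.wikipedia.org/wiki/Mona_Lisa",
    "title": "Mona Lisa",
    "artist": "Leonardo Da Vinci",
    "museum": "Louvre",
}
req = build_add_request("http://localhost:8983/solr", "paintings", [painting])
print(req.get_method(), req.full_url)
# To send it to a running Solr instance: urllib.request.urlopen(req)
```

Building the body with json.dumps avoids the quoting mistakes that are easy to make when JSON is typed inside a shell string.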

Find out what is in the index (a select needs neither commit=true nor a Content-type header):

curl 'http://localhost:8983/solr/paintings/select?q=*:*&wt=json'

List all the fields using the CSV output (with rows=0 only the header row of field names is returned):

curl -X GET 'http://localhost:8983/solr/paintings/select?q=*:*&rows=0&wt=csv'
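If the field list is needed in a script, the header line of the CSV response can be parsed directly. The sketch below works on a hypothetical response body. The sample text is an assumption based on the fields indexed earlier, not captured Solr output.

```python
import csv
import io


def field_names_from_csv(csv_text):
    """Return the column names from the first line of a CSV response body."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


# Hypothetical body for a rows=0 request: just the header line.
sample = "uri,title,artist,museum\n"
print(field_names_from_csv(sample))
```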

To post PDF documents:

curl -X POST 'http://localhost:8983/solr/pdfs_1/update/extract?extractFormat=text&literal.annotation=The+Wikipedia+Page+About+Apache+Lucene&commit=true' -F 'Lucene.pdf=@Lucene.pdf'
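The literal.* parameters attach extra field values to the extracted document, and they have to be URL-encoded (spaces become +). A small sketch of building that URL, assuming the same core and parameters as the command above; the helper name extract_url is hypothetical.

```python
from urllib.parse import urlencode


def extract_url(base, core, literals, commit=True):
    """Build an update/extract URL with one literal.<field> parameter per entry."""
    params = {"extractFormat": "text"}
    for name, value in literals.items():
        params[f"literal.{name}"] = value
    if commit:
        params["commit"] = "true"
    return f"{base}/{core}/update/extract?{urlencode(params)}"


url = extract_url(
    "http://localhost:8983/solr",
    "pdfs_1",
    {"annotation": "The Wikipedia Page About Apache Lucene"},
)
print(url)
```

The PDF itself would still be uploaded as a multipart form field, as the curl -F option does above.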

Spellcheck and autosuggest

curl -X GET 'http://localhost:8983/suggest?spellcheck.build=true&wt=json&email=YOUR_EMAIL&api_key=YOUR_API_KEY&core_name=DKFO'

Running a spellcheck

http://localhost:8983/suggest?spellcheck.q=new+y&wt=json&spellcheck.build=true 
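spellcheck.build=true asks Solr to rebuild the spellcheck dictionary, which is normally only needed after the index has changed, so it can be left off ordinary lookups. The sketch below builds both variants of the URL. The helper name spellcheck_url is hypothetical and the /suggest path is taken from the URLs above.

```python
from urllib.parse import urlencode


def spellcheck_url(base, text, build=False):
    """Build a suggest URL; add spellcheck.build only when a rebuild is wanted."""
    params = {"spellcheck.q": text, "wt": "json"}
    if build:
        params["spellcheck.build"] = "true"
    return f"{base}/suggest?{urlencode(params)}"


lookup = spellcheck_url("http://localhost:8983", "new y")
rebuild = spellcheck_url("http://localhost:8983", "new y", build=True)
print(lookup)
print(rebuild)
```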

A good article on Google-like spell checks is on opensolr.com.

To enable auto suggest and spellcheck, follow these steps on opensolr.

Once data is indexed, build your spellcheck dictionary by requesting the suggest URL with spellcheck.build=true, as in the examples above.

Queries with weighting

http://localhost:8983/select?q=text:moon
&fq=language:en-gb
&wt=json
&start=0
&rows=1
&mlt=on
&mlt.qf=language:en-gb
&mlt.fl=language,title,subject
&mlt.mindf=1
&mlt.mintf=1
&mlt.minwl=3
&mlt.count
&fl=language,url,title,introSummary,themeImage,score
&sort=score+desc
&omitHeader=true
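A long query string like this is easier to maintain as a list of parameters that is encoded in one step. The sketch below mirrors the parameters shown above. The helper name more_like_this_url is hypothetical, and mlt.count is left out because the source gives no value for it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def more_like_this_url(base, query, language, mlt_fields, rows=1):
    """Assemble a select URL with MoreLikeThis switched on."""
    params = [
        ("q", query),
        ("fq", f"language:{language}"),
        ("start", 0),
        ("rows", rows),
        ("mlt", "on"),
        ("mlt.qf", f"language:{language}"),
        ("mlt.fl", ",".join(mlt_fields)),
        ("mlt.mindf", 1),
        ("mlt.mintf", 1),
        ("mlt.minwl", 3),
        # mlt.count is omitted: the original URL lists it without a value.
        ("fl", "language,url,title,introSummary,themeImage,score"),
        ("sort", "score desc"),
        ("wt", "json"),
        ("omitHeader", "true"),
    ]
    return f"{base}/select?{urlencode(params)}"


url = more_like_this_url(
    "http://localhost:8983", "text:moon", "en-gb", ["language", "title", "subject"]
)
print(url)

# Decode the query string again to check what Solr would receive.
decoded = parse_qs(urlsplit(url).query)
print(decoded["mlt.fl"])
```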

Atomic Updates

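You must include all the unique attributes, in this case "id", "title" and "url". Each field you want to change is wrapped in an object keyed by a modifier, here "set", which replaces the field's current value. For example: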

[{ 
    "id": "7645", 
    "title" : "Marks test",
    "introSummary" : {
        "set" : "test test test"
    }
}]

The same kind of update sent with curl (the body must be a complete JSON array, with both the opening and closing brackets):

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{ "id": "1422034025023", "title": { "set" : "Marks test change" } }]'
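Generating the atomic-update body in code avoids unbalanced brackets in hand-written JSON. This is a minimal sketch that only covers the "set" modifier used above; the helper name atomic_set is hypothetical, and "id" is assumed to be the unique key as in the examples.

```python
import json


def atomic_set(doc_id, changes, id_field="id"):
    """Build an atomic-update payload that sets each given field to a new value."""
    doc = {id_field: doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return [doc]


payload = atomic_set("1422034025023", {"title": "Marks test change"})
body = json.dumps(payload)
print(body)
```

The resulting string can be passed to curl with -d or sent as the body of a POST to the update endpoint.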

To delete

Using curl to POST XML (the query *:* matches every document, so this empties the index):

curl -X POST 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>*:*</query></delete>'

Using curl with JSON:

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '{ "delete" : { "query" : "*:*" } }'
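Because a delete with the query *:* removes every document, a script that builds delete bodies may want a guard against doing that by accident. The sketch below is one possible approach. The function names and the allow_all flag are hypothetical, and the JSON shapes follow the delete-by-query command above plus Solr's delete-by-id form.

```python
import json


def delete_by_query(query, allow_all=False):
    """Build a delete-by-query body, refusing *:* unless explicitly allowed."""
    if query.strip() == "*:*" and not allow_all:
        raise ValueError("refusing to delete every document without allow_all=True")
    return json.dumps({"delete": {"query": query}})


def delete_by_id(doc_id):
    """Build a body that deletes the single document with this unique key."""
    return json.dumps({"delete": {"id": doc_id}})


print(delete_by_id("7645"))
print(delete_by_query("title:test"))
```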

Committing a change

curl -X POST 'http://localhost:8983/solr/update' --data '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
