CONFIGURE SOLR
Koala Framework supports several fulltext search backends; one of them is Apache Solr.
Folder structure:
Web
=> components, controllers, css ...
=> solr => solr.xml
=> solr => solrconfig_master.xml
=> solr => FOLDER => conf => schema.xml
=> solr => FOLDER => conf => solrconfig.xml
=> solr => FOLDER => data (this folder has to be created manually and must be writable by Solr, because Solr stores its indices there)
Folder structure example for a multilanguage setup (de, en):
Web
=> components, controllers, css ...
=> solr => solr.xml
=> solr => solrconfig_master.xml
=> solr => master => conf => schema.xml
=> solr => master => conf => solrconfig.xml
=> solr => en => conf => schema.xml
=> solr => en => conf => solrconfig.xml
Create a solr.xml
Different configurations can live in different folders, but each folder has to be registered as a core in solr.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
<cores adminPath="/admin/cores">
<core name="FOLDER1" instanceDir="FOLDER1" />
<core name="FOLDER2" instanceDir="FOLDER2" />
</cores>
</solr>
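For the multilanguage folder structure shown earlier, solr.xml might look like the following sketch. It assumes that each core is simply named after its folder (master and en); adjust the names if your setup differs.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Sketch only: core names are assumed to match the folder names -->
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <!-- one core per language folder below Web/solr -->
    <core name="master" instanceDir="master" />
    <core name="en" instanceDir="en" />
  </cores>
</solr>
```

The core name is also the path segment used in the Solr URL, so it replaces FOLDER when you open the admin interface of a specific core.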
Create a solrconfig_master.xml
This file defines the default configuration that is shared by the individual cores.
A sample configuration:
<?xml version="1.0" encoding="UTF-8" ?>
<config>
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
<luceneMatchVersion>LUCENE_35</luceneMatchVersion>
<dataDir>${solr.data.dir:}</dataDir>
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
<indexDefaults>
<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>
<!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<lockType>native</lockType>
</indexDefaults>
<mainIndex>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>32</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<unlockOnStartup>false</unlockOnStartup>
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
<!-- The number of commit points to be kept -->
<str name="maxCommitsToKeep">1</str>
<!-- The number of optimized commit points to be kept -->
<str name="maxOptimizedCommitsToKeep">0</str>
<!--
Delete all commit points once they have reached the given age.
Supports DateMathParser syntax e.g.
-->
<!--
<str name="maxCommitAge">30MINUTES</str>
<str name="maxCommitAge">1DAY</str>
-->
</deletionPolicy>
<infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>
<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
</updateHandler>
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.FastLRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<documentCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>
</query>
<requestDispatcher handleSelect="true" >
<requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048000" />
<httpCaching never304="true" />
</requestDispatcher>
<requestHandler name="/search" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<!-- query fields -->
<str name="qf">title^10 normalContent^1 contentstrong^2 contenth1^5 contenth2^3 contenth3^2 contenth4^1.5 contenth5^1.3 contenth6^1.2 keywords^12</str>
<str name="fl">componentId score</str>
</lst>
</requestHandler>
<requestHandler name="/update"
class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/json"
class="solr.JsonUpdateRequestHandler"
startup="lazy" />
<requestHandler name="/admin/"
class="solr.admin.AdminHandlers" />
<queryResponseWriter name="json" class="solr.JSONResponseWriter" />
</config>
Create schema.xml
Adjust this configuration to the language of the indexed content. The available tokenizers and filters are described in the Solr documentation.
For each field you can also define whether its data is stored in the index or only indexed for searching. Indexed fields (such as type) can be used to add filters to the search query.
<?xml version="1.0" encoding="UTF-8" ?>
<schema version="1.2">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="date" class="solr.TrieDateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="bool" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
</types>
<fields>
<field name="componentId" type="string" indexed="true" stored="true" required="true" />
<field name="created" type="date" indexed="true" stored="true" /> <!-- todo change stored to false -->
<field name="lastModified" type="date" indexed="true" stored="false" />
<field name="content" type="text" indexed="false" stored="true" required="true" />
<field name="title" type="text" indexed="true" stored="true" required="true" />
<field name="normalContent" type="text" indexed="true" stored="false" />
<field name="contentstrong" type="text" indexed="true" stored="false" />
<field name="contenth1" type="text" indexed="true" stored="false" />
<field name="contenth2" type="text" indexed="true" stored="false" />
<field name="contenth3" type="text" indexed="true" stored="false" />
<field name="contenth4" type="text" indexed="true" stored="false" />
<field name="contenth5" type="text" indexed="true" stored="false" />
<field name="contenth6" type="text" indexed="true" stored="false" />
<field name="keywords" type="text" indexed="true" stored="false" />
<field name="type" type="string" indexed="true" stored="false" /> <!-- eg. news -->
</fields>
<types>
<fieldType name="text" class="solr.TextField" positionIncrementGap="3">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"
splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.GermanStemFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="German2" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
</analyzer>
</fieldType>
</types>
<uniqueKey>componentId</uniqueKey>
</schema>
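The sample schema above is tuned for German content. As a rough sketch of a language adjustment, the text field type in the schema.xml of an English core could replace the two German stemming filters with the English Snowball stemmer. The choice of filters here is an assumption for illustration, not a tested recommendation; the remaining filters are copied unchanged from the sample.

```xml
<!-- Sketch for an English core; goes inside <types> in place of the German "text" type -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="3">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- English stemming instead of GermanStemFilterFactory / Snowball "German2" -->
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  </analyzer>
</fieldType>
```

Keeping the type name text means the field definitions can stay identical across all cores; only the analyzer chain differs per language. After changing the analyzer, the index has to be rebuilt for the change to take effect.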
Create solrconfig.xml
Use this file to change or extend the master configuration for the needs of a specific core. In the simplest case it only includes the master configuration:
<?xml version="1.0" encoding="UTF-8" ?>
<xi:include href="../../solrconfig_master.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
Start Solr
For local development you can start solr manually:
php bootstrap.php fulltext start-solr
After Solr has started, run the initial indexing:
php bootstrap.php fulltext rebuild --debug
On a production server, running Solr inside Tomcat is recommended.
Access Solr
You can open the solr web interface for debugging purposes:
http://DOMAIN:8983/solr/FOLDER/admin/
Configure backend
To use the Solr backend, add these lines to config.ini:
fulltext.backend = Kwf_Util_Fulltext_Backend_Solr
;optional
fulltext.solr.port = 8983
fulltext.solr.path = /solr/example
Index refreshing controller
A controller or cronjob should refresh the index regularly so that the search results stay up to date.
To index changed content, run php bootstrap.php fulltext update-changed regularly.
In addition, php bootstrap.php fulltext check-contents should be called once a day.