This guide provides an example of how to load CSV and JSON data sets into the Siren platform. You should adapt it for use with your own data sets.
Note
The data sets used in the example contain millions of records. If you use these data sets, loading may take a long time to complete.
Install the Siren platform as described in Installing the Siren Platform.
Install Logstash (https://www.elastic.co/downloads/logstash).
The example uses publicly available data from Companies House. If you want to try it for yourself, you can download:
The company data as one CSV file (http://download.companieshouse.gov.uk/en_output.html).
The person of significant control data as one JSON file (http://download.companieshouse.gov.uk/en_pscdata.html).
Extract the CSV and TXT files. Edit the example scripts to match the path and file names.
Create a plain text file with the following content:
input {
  file {
    path => "<location of BasicCompanyDataAsOneFile-date.csv>"
    start_position => beginning
  }
}
filter {
  csv {
    separator => ","
    autodetect_column_names => true
    autogenerate_column_names => true
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9220"]
    index => "company"
  }
}
Edit the path to match the location of the CSV file and save the configuration as logstash_csv.conf in the same folder as the data set.
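Before importing millions of rows, it may help to check that the CSV is parsed as expected. The following is a minimal sketch, not part of the original example. It assumes that you have copied the header row and a few records into a small sample file, that you want to print events to the console instead of indexing them, and that you are running on Windows (hence the NUL value):

```
input {
  file {
    # Assumption: a small sample file that you create yourself
    path => "<location of a small sample CSV file>"
    start_position => "beginning"
    # Do not persist the read position, so the file is re-read on every run.
    # Use "NUL" on Windows or "/dev/null" on Linux and macOS.
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    separator => ","
    autodetect_column_names => true
    autogenerate_column_names => true
  }
}
output {
  # Print each parsed event to the console for inspection
  stdout { codec => rubydebug }
}
```

If the header row is not picked up as column names, note that the csv filter documentation states that autodetect_column_names requires a single pipeline worker. Running Logstash with the -w 1 option may help in that case.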
Create a plain text file with the following content:
input {
  file {
    type => "json"
    path => "<location of persons-with-significant-control-snapshot-date.txt>"
    start_position => beginning
  }
}
filter {
  json {
    source => "message"
  }
  mutate {
    # Nested fields are referenced as [parent][child]
    uppercase => [ "[data][name]" ]
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9220"]
    index => "persons-control"
  }
}
Edit the path to match the location of the TXT file and save the configuration as logstash_json.conf in the same folder as the data set.
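Some lines in a large snapshot may fail to parse as JSON. As a minimal sketch, assuming you want to keep such lines out of the index, the output section could route events based on the _jsonparsefailure tag that the json filter adds by default when parsing fails. The location for rejected lines is a placeholder and not part of the original example:

```
output {
  if "_jsonparsefailure" in [tags] {
    # Assumption: write unparseable lines to a local file for later inspection
    file {
      path => "<location for rejected lines>"
    }
  } else {
    elasticsearch {
      hosts => ["127.0.0.1:9220"]
      index => "persons-control"
    }
  }
}
```

This keeps the index free of events that contain only the raw, unparsed message, at the cost of having to review the rejected lines separately.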
From the command prompt, navigate to the logstash/bin folder and run Logstash with the configuration files you created earlier. For example:
logstash -f C:\data\logstash_csv.conf
logstash -f C:\data\logstash_json.conf
Tip
You can speed up the import process by installing a second instance of Logstash and running the imports concurrently.