Tuesday, April 12, 2016

“We have the entire database” - New Technology

Mossack Fonseca’s internal database is part of the Panama leak. This is revealed by María del Mar Cabra of the international journalist network ICIJ.

After 11.5 million documents leaked from the Panamanian law firm Mossack Fonseca, the company told its clients that it had been the victim of a hacker attack, and it is now investigating exactly what information may have leaked.

But according to María del Mar Cabra, head of data management at the ICIJ (International Consortium of Investigative Journalists), Mossack Fonseca’s internal database is included in the collection of Panama documents.

It is ICIJ that, together with Süddeutsche Zeitung, has handled the files, which were first passed to the German newspaper by a whistleblower calling himself John Doe. They include details of 214,488 letterbox companies and their business transactions.

New Technology met her recently when the investigative journalists’ association FGJ held its annual seminar in Gothenburg.

– We have the entire database of structured information.

And beyond that, there are also unstructured documents in the leak, she said.

María del Mar Cabra also revealed new details about the journalist network’s handling of the data.

ICIJ’s data journalists and programmers work in teams to support journalists around the world, SVT among them, who are working with the documents on various stories, and they are now continuing to dig into the content.

The 2.6 terabytes of files that “John Doe” passed to reporters at the German newspaper last year make up the largest leak ICIJ has handled so far. Among other things, it consists of a large number of e-mails, which in turn have attachments.

The first thing Süddeutsche Zeitung did was to use the software Nuix, a search engine that reads the files as text and processes images with OCR, making them searchable. It was run locally on the newspaper’s network, María del Mar Cabra tells New Technology.

ICIJ, however, did not have its own access to Nuix, she says. Instead, they used other software to search the documents.

Using an automated process, it took ICIJ’s team eleven days to make the leak searchable. Among other things, they used 35 virtual servers to speed up the loading process.

– ICIJ mostly uses open-source programs, which we can improve ourselves. Not least for reasons of cost, she says. The network’s budget is limited to $1-1.5 million per year.

The program ICIJ used to ingest and index the documents is called Solr. To search them, the team then used the software Blacklight.
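
Solr is built on Apache Lucene, whose core data structure is an inverted index: a map from each word to the documents that contain it, so a search only looks up a few entries instead of reading every file. The sketch below is a minimal, hypothetical illustration of that idea in plain Python. It is not Solr’s API, and the document IDs and texts are invented sample data.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters (real engines use configurable analyzers).
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, for illustration only.
docs = {
    "email-001": "Please register the new company in Panama",
    "email-002": "Invoice for company services attached",
    "memo-003": "Board meeting minutes, Panama office",
}
index = build_index(docs)
print(sorted(search(index, "Panama company")))  # ['email-001']
```

Building the index is the slow, one-off step, which is the kind of work that can be spread across many servers; each later search is then just a handful of set lookups and intersections.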

All the metadata in the material was then stored using the Linkurious platform, which is built on the Swedish graph database Neo4j, said María del Mar Cabra.

Metadata is data that surrounds communication, such as information about who emailed whom, when, how and from where. This metadata is stored in a way that lets the connections between names and companies be traced. The relationships are then presented as graphs in Linkurious, which is a visualization tool.
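
A graph database such as Neo4j stores records as nodes (people, companies) joined by typed relationships (emailed, director of), so tracing a chain of connections becomes a path search. The sketch below imitates that idea with a plain Python adjacency list and a breadth-first search. It is not Neo4j’s or Linkurious’s API, and all names, relationship labels and the company “Oceanic Holdings Ltd” are hypothetical.

```python
from collections import deque

# Hypothetical metadata records: (source, relationship, target).
edges = [
    ("Alice", "EMAILED", "Bob"),
    ("Bob", "EMAILED", "Carol"),
    ("Carol", "DIRECTOR_OF", "Oceanic Holdings Ltd"),
    ("Dave", "SHAREHOLDER_OF", "Oceanic Holdings Ltd"),
]


def build_graph(edges):
    """Undirected adjacency list, so connections can be traced in either direction."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, []).append((rel, src))
    return graph


def trace(graph, start, goal):
    """Breadth-first search: shortest chain of nodes from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _rel, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(edges)
print(trace(graph, "Alice", "Dave"))
# ['Alice', 'Bob', 'Carol', 'Oceanic Holdings Ltd', 'Dave']
```

Because breadth-first search explores one hop at a time, the first path it finds is the shortest one, which is usually the most interesting link between a person and a company.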

The Economist has called the record-sized leak “the leak of the century”.

It can be compared with WikiLeaks’ 2010 leak of US State Department cables, at 1.7 gigabytes, Luxleaks in 2014, at 4 gigabytes, or Swissleaks in 2015, with files totaling 3.3 gigabytes.
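
Expressed as ratios, the comparison is easier to grasp. The short calculation below uses the sizes given in this article and assumes decimal units (1 terabyte = 1,000 gigabytes). It compares raw data volume only, not the number of documents.

```python
# Sizes as given in the article. Assumption: 1 TB = 1,000 GB.
PANAMA_LEAK_GB = 2600.0  # 2.6 terabytes

EARLIER_LEAKS_GB = {
    "WikiLeaks embassy cables (2010)": 1.7,
    "Luxleaks (2014)": 4.0,
    "Swissleaks (2015)": 3.3,
}


def size_ratios(big: float, others: dict[str, float]) -> dict[str, int]:
    """How many times larger `big` is than each earlier leak, rounded to a whole number."""
    return {name: round(big / size) for name, size in others.items()}


for name, ratio in size_ratios(PANAMA_LEAK_GB, EARLIER_LEAKS_GB).items():
    print(f"about {ratio}x {name}")
```

By this measure the Panama leak is roughly 1,529 times the size of the 2010 cables, 650 times Luxleaks and 788 times Swissleaks.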
