Data and methods

The data and methods used in Trase are improved constantly as we develop better methodologies and identify new data sources. Trase will continue to improve the quality and accuracy of its data, and is committed to communicating their limitations transparently, including on this website and via reports and peer-reviewed publications.

The Trase 1.0 beta platform was first launched on 11 November 2016 at UNFCCC COP 22 in Marrakech. The information presented here reflects the status of Trase 1.1, released on 20 March 2017 at the Tropical Forest Alliance General Assembly in Brasilia. Future updates will follow a comprehensive versioning and documentation system so users can track changes and understand the logic behind them.

Mapping of global supply chains in Trase

To map global supply chains of agricultural commodities, Trase uses the SEI-PCS approach (Spatially Explicit Information on Production to Consumption Systems), developed by researchers at the Stockholm Environment Institute. Central to SEI-PCS is the use of multiple, independent datasets to triangulate flows of traded commodities between producers and consumers. All of these datasets are either publicly available or available for purchase; no private or confidential information is used. The Trase 1.1 release is focussed on linking sourcing regions at subnational scales (e.g. municipalities in Brazil and departments in Paraguay) to exporters, importers and consumer countries engaged in international trade of agricultural commodities. The subnational resolution depends on the availability and accuracy of the datasets, and can be improved as new data become available.

Trase 1.1 includes Brazilian and Paraguayan soy, but the same approaches and methods will be applied to other commodities and countries as Trase expands its coverage. To learn about plans for expansion to other geographies and commodities please read the Trase strategy to 2020.

SEI-PCS brings together multiple datasets from domestic and international sources (see below). The basic structure of the supply chain is defined by per-shipment and/or per-day customs declarations and bills of lading compiled from official sources. The information is acquired for specific trade codes (e.g. a specific type of soy oil), which together make up the total trade for a given commodity (e.g. soybeans, and raw equivalents of soybeans for derived products). Once the basic structure is in place, we identify the sub-national origin of individual material trade flows by triangulating the information with a wide range of other datasets, including data on the logistics of trading companies, production and taxation. This cross-validation approach is implemented as a logic-based decision tree.

In the case of Brazilian soy, the decision tree includes self-declarations by companies on their supply chain sourcing and logistics, official data on silo and warehouse location and ownership, crushing facilities per company, official production data per municipality, the municipal domicile and state of production associated with each export operation, and the National Registry of Legal Entities (CNPJ), which identifies the presence of companies in different locations and their economic activities as reported to the Federal Government. Where it is not possible to triangulate the municipal origin in the trade data with these auxiliary datasets, expert knowledge of the geography of individual supply chains and transport networks can sometimes be used to improve the identification of sourcing regions.

Where it is not possible to identify sourcing regions with a minimum level of accuracy, the region of origin is classified as unknown. These cases mostly involve import-export trading companies with limited participation in the physical supply chain. The shipments associated with these companies are linked to ports of export or company headquarters, making it impossible to trace the soy back to its origin with our current approach. However, even in those cases the state of production is often known (from customs and tax data), as are all the downstream traders, total volumes and country of destination. The SEI-PCS approach is highly flexible and scalable. In the future we aim to include new datasets on sanitary registries, domestic transportation movements, fiscal notes per sale operation on the producer side and others, helping to increase confidence in individual supply chains and to resolve unknown connections (which currently account for about 20% of the total exported volume of Brazilian soy).
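As an illustration only, the logic of such a decision tree can be sketched as an ordered cascade of checks. The field names, the order in which evidence is consulted and the `assign_logistic_hub` function are all assumptions made for this sketch; they are not the rules Trase actually applies.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Evidence:
    """Hypothetical evidence gathered for one export record."""
    state_of_production: Optional[str] = None    # from customs and tax data
    declared_municipality: Optional[str] = None  # company self-declaration
    facility_municipality: Optional[str] = None  # silo, warehouse or crusher of the exporter
    registry_municipality: Optional[str] = None  # company registry entry (e.g. CNPJ)
    is_trading_office_only: bool = False         # trader linked only to a port or headquarters


def assign_logistic_hub(ev: Evidence, state_of: Dict[str, str]) -> str:
    """Return a municipality name, or 'UNKNOWN' if no candidate survives the checks.

    `state_of` maps each municipality to its state, so that every candidate
    can be cross-validated against the state of production.
    """
    if ev.is_trading_office_only:
        # Port or headquarters addresses say nothing about where the soy was sourced.
        return "UNKNOWN"
    # Assumed priority: self-declaration, then facilities, then registry.
    candidates = (
        ev.declared_municipality,
        ev.facility_municipality,
        ev.registry_municipality,
    )
    for municipality in candidates:
        if municipality is None:
            continue
        # Accept the candidate only if it does not contradict the known state.
        if (ev.state_of_production is None
                or state_of.get(municipality) == ev.state_of_production):
            return municipality
    return "UNKNOWN"
```

For example, a record whose self-declared municipality lies in a different state from its state of production falls through to the next source of evidence, and a record with no consistent candidate is classified as unknown.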

This SEI-PCS approach, based on thousands of detailed records, provides a robust link between supply chain actors, traded volumes and the municipalities where the commodity is first handled (e.g. stored in warehouses or crushed), which we call “logistic hubs” on the platform.

In the current 1.1 release Trase includes very little information linking individual farms in municipalities of production with these logistic hubs. To make this linkage we run a highly constrained optimization model that relies on linear programming to allocate flows of commodities from municipalities with known levels of production (based on official production data) to logistic hubs with known levels of “demand” (as determined by our cross-validated supply chain map), based on minimum transportation costs and state-level origin of production data.
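The allocation step described above is a classic transportation problem. The sketch below is a toy version under stated assumptions: it uses a small successive-shortest-path min-cost-flow routine in pure Python (which reaches the same optimum as a linear-programming solver on this kind of problem) rather than the model Trase actually runs, and the function name, municipality and hub labels, volumes and costs are all invented for illustration. The state-of-origin constraint is represented simply by leaving out routes that are not allowed.

```python
from collections import defaultdict

INF = float("inf")


def allocate_min_cost(supply, demand, cost):
    """Allocate production to hubs at minimum total transport cost.

    supply: {municipality: tonnes produced}
    demand: {hub: tonnes required}
    cost:   {(municipality, hub): cost per tonne}; a missing pair is a forbidden route
    Returns {(municipality, hub): tonnes allocated}.
    """
    graph = defaultdict(list)  # node -> list of [to, capacity, unit_cost, reverse_index]

    def add_edge(u, v, capacity, unit_cost):
        graph[u].append([v, capacity, unit_cost, len(graph[v])])
        graph[v].append([u, 0, -unit_cost, len(graph[u]) - 1])

    for m, tonnes in supply.items():
        add_edge("SRC", ("m", m), tonnes, 0)
    for h, tonnes in demand.items():
        add_edge(("h", h), "SNK", tonnes, 0)
    for (m, h), unit_cost in cost.items():
        add_edge(("m", m), ("h", h), INF, unit_cost)

    remaining = sum(demand.values())
    while remaining > 0:
        # Bellman-Ford: cheapest path from source to sink in the residual graph.
        dist = defaultdict(lambda: INF)
        dist["SRC"] = 0
        prev = {}
        changed = True
        while changed:
            changed = False
            for u in list(graph):
                if dist[u] == INF:
                    continue
                for i, (v, capacity, unit_cost, _) in enumerate(graph[u]):
                    if capacity > 0 and dist[u] + unit_cost < dist[v]:
                        dist[v] = dist[u] + unit_cost
                        prev[v] = (u, i)
                        changed = True
        if "SNK" not in prev:
            raise ValueError("demand cannot be met with the allowed routes")
        # Find the bottleneck along the path, then push that much flow.
        push, v = remaining, "SNK"
        while v != "SRC":
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = "SNK"
        while v != "SRC":
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        remaining -= push

    # The flow on a forward edge equals the capacity of its reverse edge.
    flows = {}
    for m in supply:
        for v, _, _, rev in graph[("m", m)]:
            if v != "SRC" and graph[v][rev][1] > 0:
                flows[(m, v[1])] = graph[v][rev][1]
    return flows


# Toy data: municipality C is in another state, so it has no route to hub X.
flows = allocate_min_cost(
    supply={"A": 100, "B": 80, "C": 50},
    demand={"X": 120, "Y": 60},
    cost={("A", "X"): 1, ("A", "Y"): 4, ("B", "X"): 2, ("B", "Y"): 1, ("C", "Y"): 3},
)
print(flows)  # {('A', 'X'): 100, ('B', 'X'): 20, ('B', 'Y'): 60}
```

In this toy case the cheapest routes are filled first and municipality C, the most expensive source, supplies nothing. Real transport costs depend on routes that may not be known, which is why modelled volumes can differ from actual ones.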

Supply chain maps provided by Trase

In both Paraguay and Brazil, the SEI-PCS approach provides the exported volumes per company, through the correct ports and importers, to specific countries of import. The Trase platform offers this information in raw commodity equivalents, transforming the exported tonnes of e.g. soy cake and soy oil into the equivalent tonnes of soybeans needed for their production. It is straightforward to disaggregate current flows into the different sub-products, but this capability is not yet implemented. The current release accounts for 100% of the exported volumes (by sea, land, air and inland waterways), mapping 100% of the exporters and the countries of import. Re-exports, based on bilateral trade matrices developed by FAO and a mass balance approach, are not yet implemented and will be included in a later release.
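A minimal sketch of the conversion to raw commodity equivalents is shown below. The factor values and the names `EQUIVALENCE_FACTORS` and `to_soybean_equivalent` are placeholders invented for this example; the factors Trase uses, and how they are derived, are not given here.

```python
# Placeholder factors: tonnes of soybeans per tonne of product.
# These numbers are illustrative only and are NOT the factors used by Trase.
EQUIVALENCE_FACTORS = {
    "soybeans": 1.0,
    "soy_cake": 1.25,
    "soy_oil": 5.0,
}


def to_soybean_equivalent(product, tonnes, factors=EQUIVALENCE_FACTORS):
    """Convert an exported volume of a soy product into soybean-equivalent tonnes."""
    if product not in factors:
        raise KeyError(f"no equivalence factor for product {product!r}")
    return tonnes * factors[product]


def convert_shipments(shipments, factors=EQUIVALENCE_FACTORS):
    """Add a 'soybean_equivalent_tonnes' field to each shipment record."""
    return [
        {**s, "soybean_equivalent_tonnes": to_soybean_equivalent(s["product"], s["tonnes"], factors)}
        for s in shipments
    ]
```

Because each record keeps its original product and tonnage alongside the converted value, flows expressed this way can later be disaggregated back into sub-products.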

In addition to accurately mapping the actors that make up a supply chain and the commodity volumes traded by them, the most relevant advance of SEI-PCS is to link those actors with the places of production, for example at the municipal scale in the case of Brazil. The approach estimates the location with an error that depends on the scale of analysis. For Brazilian soy, the error should be minimal at state and biome levels, where it depends only on the data provided by the government and the traders in their official customs declarations and bills of lading. At the logistic hub level (storage silos and warehouses), municipal accuracy is high for the flows that can be resolved, which accounted, for example, for 79% of the total volume of Brazilian soy traded in 2015. At the municipal production level, volumes are estimated with a model that minimises transportation costs, constrained by the state of origin of the goods. The use of a transportation cost model means that modelled volumes may differ from reality where unknown transportation routes are used. This uncertainty will be reduced in future releases by further mapping of supply chain actors and logistics using additional datasets, as well as by mapping back to individual soy producing properties wherever possible. The aspiration of Trase is to work collaboratively with users of the platform, including both companies and governments, who can contribute their own data to improve the usability of the platform.

Vital statistics for Brazilian and Paraguayan soy in Trase 1.1

Brazil produces some 30% of the world's soy, second only to the USA, and is the largest exporter, shipping some 100 million tonnes in 2016 alone, 40 million of which went to China. Trase maps these exports in unprecedented detail, linking over 2000 municipalities of soy production to countries of consumption worldwide via more than 79,000 unique supply chain pathways, encompassing 385 exporting companies, dozens of ports and hundreds of importing companies. Currently, 21% of the volumes are identified only by their state of production and not by a specific municipality of production. However, 100% of total exports are identified, including accurate information on total volumes per shipment, FOB value in USD, exporter name and country of import. Currently, the importing companies are obtained from the customs declarations of the Brazilian Government, which include foreign companies that trade the soy but are not necessarily the companies that physically handle the commodity in countries of destination. Future releases will include more information on the companies that physically handle imports.

Paraguay is the 7th largest soy producer. Trase links soy production in 9 departments to a total of 8.2 million tonnes of soy exports (2016), encompassing a total of 56 exporters and 31 countries of import. The current release of Trase does not include information on importers of Paraguayan soy in countries of destination.

Trase distils the complexity of international commodity supply chains into accessible data visualizations and summaries, enabling the user to extract individual supply chain flows for countries, companies and municipalities of interest in a few clicks. A major innovation of the Trase platform is to link these complex material flows to a wide range of social and economic indicators associated with different sourcing areas. All the information is made available through a data portal on the Trase platform.

Data

The Trase platform uses several core and auxiliary datasets to build confidence in the allocation of export volumes to sub-national localities.

  • Customs declarations and tax records for exports, collected by government institutions.
  • Bills of lading: legal documents issued by a carrier (or its agent) to acknowledge receipt of cargo for shipment. A bill of lading is a conclusive receipt and evidence of the contract of carriage, ensuring that exporters receive payment and importers receive the merchandise.
  • Self-declarations by traders on their own logistics and sourcing areas, published on the internet.
  • National Registry of Legal Entities (CNPJ), identifying the presence of companies in different locations and their economic activities as reported to the Federal Government of Brazil.
  • Production data from the finest available administrative division.
  • Silo and warehouse registration records identifying their location and ownership, obtained from public and/or private sector sources.
  • Records of industrial processing facilities (e.g. crushing facilities) identifying their location and ownership, obtained from public and/or private sector sources.
  • Other sources, such as sanitary registries, domestic transportation databases and fiscal notes per sale operation, which will be explored progressively. Other data sources may be useful for specific country/commodity combinations (e.g. slaughterhouse locations and cattle sanitary inspections for beef exports), on an ad hoc basis.

The data require a considerable amount of pre-processing, including formatting, consolidation of company names, countries and locations, as well as a transformation of the different sub-commodities to their raw equivalents. When both customs declarations and bills of lading are available, we match them as far as possible, using algorithms that detect common per-shipment patterns in date, commodity descriptor, exporter, port of export, FOB value, volume in tonnes and country of destination. Matching not only makes the dataset more robust, but also consolidates information that may be present in one dataset and not the other, making the result more comprehensive.
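The matching step can be sketched roughly as follows. This is a simplified illustration under assumptions: the field names, the three-day date window, the 2% volume tolerance and the first-match strategy are all invented for the example, and the algorithms Trase uses may differ substantially.

```python
from datetime import date


def is_match(customs, bol, max_days=3, volume_tolerance=0.02):
    """Decide whether a customs declaration and a bill of lading describe one shipment.

    Exporter, port and destination must agree exactly; dates and volumes
    may differ within the given (assumed) tolerances.
    """
    same_keys = all(customs[k] == bol[k] for k in ("exporter", "port", "destination"))
    close_dates = abs((customs["date"] - bol["date"]).days) <= max_days
    close_volume = abs(customs["tonnes"] - bol["tonnes"]) <= volume_tolerance * customs["tonnes"]
    return same_keys and close_dates and close_volume


def match_records(customs_records, bol_records, **tolerances):
    """Pair each customs declaration with the first unused matching bill of lading.

    Returns (merged, unused_bols). A merged record keeps the customs values
    where both sources have a field, and gains any field found only in the
    bill of lading (for example an importer name).
    """
    unused = list(bol_records)
    merged = []
    for c in customs_records:
        hit = next((b for b in unused if is_match(c, b, **tolerances)), None)
        if hit is not None:
            unused.remove(hit)
            merged.append({**hit, **c})
        else:
            merged.append(dict(c))
    return merged, unused
```

Keeping unmatched bills of lading in a separate list makes it easy to review the records that the tolerances failed to pair.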

Trase data sources (release 1.1)

See a description of all data sources used in the first release of Trase