Smart networks are loose clusters of Internet of Things (IoT) devices located around the world. They serve as lightweight intermediate servers for thousands of data companies that depend on full access to publicly available data. Millions of legally leased IoT devices act as proxies that fuel the data economy. Yet these smart networks, also called proxy networks, are rarely reviewed, analyzed or evaluated to find out how they actually affect the modern data economy.
This year, the tech site Proxyway published its second Proxy Market Research report. The report does a good job of showing some of the inner workings of open data aggregators, with a focus on the technical aspects of smart networks.
The data researcher’s toolbox
Gathering data for research is not as easy as it might seem. Not every public data source offers researchers easy access through an API. Any researcher who gathers or scrapes web data is bound to run into limitations. Even APIs cap the number of requests, most online content is localized, and some websites ban non-local IP addresses. On top of that, public data providers create artificial barriers to entry by profiling user devices, browser user agents or even operating systems.
To build a sufficiently representative dataset, or to access precise and up-to-date data quickly, researchers use a variety of tools. Accessing data on one site might be as easy as sending an API call, while another might require logging data from thousands of pages by hand. To automate such tasks, researchers use scripts called data scrapers that load each page and extract the relevant information.
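As a rough illustration of the "pull the information" step, the sketch below parses one page that has already been downloaded and collects the prices on it, using only Python's standard `html.parser` module. The markup (`<span class="price">`) and the function name `extract_prices` are hypothetical; real sites use their own structure, and this is not the method of any tool mentioned in the report.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collect the text of every <span class="price"> element (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.prices: list[str] = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a price span.
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html: str) -> list[str]:
    """Parse one already-downloaded page and return the price strings it contains."""
    parser = PriceExtractor()
    parser.feed(html)
    parser.close()
    return parser.prices


sample_page = """
<ul>
  <li>Widget <span class="price">$19.99</span></li>
  <li>Gadget <span class="price sale">$5.00</span></li>
  <li>Other <span class="name">not a price</span></li>
</ul>
"""
print(extract_prices(sample_page))  # ['$19.99', '$5.00']
```

A scraper repeats this parsing step for every page it fetches; the fetching itself is where the rate limits and IP blocks described below come into play.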
Like any other automated program, a data scraper can easily overload a server, which is why most websites block users that act too quickly or request too much data in a short time. IP addresses, user agents and other fingerprinting techniques tell website servers when a scraper is connecting to them. This makes user agent changers and advanced Python or JavaScript code and libraries indispensable in data research.
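One common way for a scraper to avoid overloading a server in the first place is to throttle itself. The sketch below is a minimal sliding-window rate limiter, assuming a hypothetical budget of `max_requests` per rolling `window` seconds; the class name and parameters are illustrative only. The clock is injectable so the behavior can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable


class RateLimiter:
    """Allow at most `max_requests` requests in any rolling `window` seconds."""

    def __init__(self, max_requests: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent: deque = deque()  # timestamps of requests still inside the window

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits the budget, otherwise return False."""
        now = self.clock()
        # Forget requests that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False


# Simulated clock so the example runs instantly.
fake_now = [0.0]
limiter = RateLimiter(max_requests=2, window=1.0, clock=lambda: fake_now[0])
print([limiter.try_acquire() for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0
print(limiter.try_acquire())  # True
```

A scraper would call `try_acquire()` before each request and sleep briefly whenever it returns False.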
Data scrapers come in many forms and configurations: some are proprietary, while others are built in-house. Browser libraries, cookie configurations and other settings also differ for each project's data gathering stage. But the one element common to every good data research operation, and the one tool found in every data researcher's toolbox, is a proxy network that lets them change their IP address.
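To show what "changing the IP address" can look like in practice, here is a minimal round-robin rotation sketch. The proxy URLs are hypothetical (203.0.113.0/24 is an address range reserved for documentation), and real proxy networks often handle rotation on their side instead. Each yielded mapping uses the same scheme-to-proxy shape that the `proxies` argument of the Python `requests` library accepts; no request is actually sent here.

```python
from itertools import cycle
from typing import Iterator

# Hypothetical proxy endpoints; 203.0.113.0/24 is reserved for documentation.
PROXY_POOL = [
    "http://user:pass@203.0.113.10:8000",
    "http://user:pass@203.0.113.11:8000",
    "http://user:pass@203.0.113.12:8000",
]


def rotating_proxies(pool: list[str]) -> Iterator[dict[str, str]]:
    """Yield one proxies mapping per request, cycling through the pool in order."""
    if not pool:
        raise ValueError("proxy pool is empty")
    for proxy_url in cycle(pool):
        # Route both HTTP and HTTPS traffic through the same proxy.
        yield {"http": proxy_url, "https": proxy_url}


proxies = rotating_proxies(PROXY_POOL)
for page in ["/p/1", "/p/2", "/p/3", "/p/4"]:
    chosen = next(proxies)
    # In a real scraper (hypothetical BASE_URL):
    # requests.get(BASE_URL + page, proxies=chosen, timeout=10)
    print(page, "->", chosen["https"])
```

Round-robin is the simplest policy; a production setup might instead pick proxies by location or drop ones that keep failing.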
What researchers say
Data is extremely valuable, so it is no surprise that few researchers share their data acquisition tricks. We need to piece bits of information together to see the full picture, and Proxyway’s 2020 report contains a detail that helps. In the methodology section, the authors note that they reached out to over two dozen market data researchers to ask how they used their tools. They found that data aggregators needed fast and reliable tools, while market researchers did not mind slower data collection if it meant lower costs and safer access to local data.
Surveyed experts confirmed that their work would not be possible without robust smart networks of IoT devices, especially mobile and desktop devices. These lightweight servers are also called residential proxies. Most respondents said that residential proxies provided the best access to data.
Yet these official, legal networks of millions of IP addresses had never been transparently tested to determine how well they actually work for gathering market data. This is what led Proxyway’s co-founder Adam Dubois to look at the use of proxy networks in a new light.
Not all data research proxies are of the same quality
When Adam and his partner Chris Prosser started testing 9 leading smart networks of residential proxies, their jaws dropped: the tools available to data access companies varied in quality by a factor of more than three. Dubois, co-author of the proxy market report, notes that market research companies have clear requirements for the proxy networks that supply connectivity for large-scale, cost-effective data gathering. By surveying 31 experts in the field, the research duo identified those requirements and combined them into a single quality score.
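The report's actual scoring formula is not described in this article. As a hedged illustration of how several expert-defined criteria can be folded into one number, the sketch below computes a weighted average of metrics already normalized to the 0 to 1 range. The criteria names, weights and provider figures are all hypothetical and are not Proxyway's.

```python
# Hypothetical criteria and weights; these are NOT Proxyway's published values.
WEIGHTS = {"success_rate": 0.4, "speed": 0.3, "pool_size": 0.2, "features": 0.1}


def quality_score(metrics: dict[str, float],
                  weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of criteria normalized to 0..1, scaled to 0..100."""
    total_weight = sum(weights.values())
    score = sum(weights[name] * metrics[name] for name in weights)
    # Dividing by the total weight means the weights need not sum to exactly 1.
    return round(100 * score / total_weight, 1)


# Made-up, already-normalized test results for two fictional providers.
provider_a = {"success_rate": 0.98, "speed": 0.90, "pool_size": 0.80, "features": 1.0}
provider_b = {"success_rate": 0.60, "speed": 0.30, "pool_size": 0.20, "features": 0.5}

print(quality_score(provider_a))  # 92.2
print(quality_score(provider_b))  # 42.0
```

Weights are where the survey of use cases matters: a data aggregator might weight speed heavily, while a market researcher might weight cost and local coverage instead.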
‘A major proxy provider gives you a powerful tool for a variety of uses. When we talk about market data gathering, scraping or aggregating public data and prices, some providers are at least three times better than others when we consider their speed, scale, efficiency and functionality,’ said Mr. Dubois. He noted that Oxylabs, a decade-old proxy provider, performed best for open data aggregation in this year’s tests.
The cost of the right decision
As the Proxyway research shows, the cost efficiency of smart networks for a market research project can vary as much as threefold depending on the provider. Because the testing data is constantly updated, Adam and Chris revise their proxy provider reviews every month. This is the first year that anyone has defined clear proxy network use cases, created scoring systems with experts from those fields, and run dozens of technical tests on the largest smart networks.
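Why can cost efficiency differ so much between providers? One plausible factor is that failed requests still cost money. The sketch below computes an effective cost per 1,000 successfully scraped pages, assuming hypothetical per-gigabyte pricing, a fixed average page size, and that every failed attempt is retried and billed like a successful one. None of these figures come from the report.

```python
def cost_per_1k_successes(price_per_gb: float, avg_page_mb: float,
                          success_rate: float) -> float:
    """Effective cost of 1,000 successful pages when failed attempts are also billed."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    attempts = 1000 / success_rate            # retries needed to land 1,000 successes
    gigabytes = attempts * avg_page_mb / 1024  # traffic consumed by all attempts
    return round(gigabytes * price_per_gb, 2)


# Hypothetical providers: X costs more per GB but succeeds far more often.
print(cost_per_1k_successes(price_per_gb=12, avg_page_mb=0.5, success_rate=0.95))  # 6.17
print(cost_per_1k_successes(price_per_gb=8, avg_page_mb=0.5, success_rate=0.40))   # 9.77
```

Under these made-up numbers, the provider with the lower sticker price ends up costing more per usable page, which is why headline prices alone say little about cost efficiency.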
‘Testing proxies is the easy part; what’s hard is figuring out what professionals and researchers care about in any given use case,’ said Chris Prosser, the co-author of the research report.
The research project
The research project surveyed and interviewed industry experts from Dec 14, 2019 to Jan 31, 2020. The technical proxy network tests ran from January through February and into March. To see the precise technical specifications of the tests and how each of the 9 proxy networks performed, visit the Proxy Market Research page.
About Proxyway
Proxyway is a community site dedicated to the research and testing of the largest proxy providers. Its mission is to inform and educate readers – both regular people and tech geeks – about proxy services. Proxyway was founded in 2018 by two tech enthusiasts, Adam and Chris.
The post How smart networks enable open data research appeared first on RCR Wireless News.