OSINT and Big Data in Intelligence Collection

almost 3 years ago

Intro

I'm stuck halfway through a master's program in Intelligence Studies (spy, not IQ). Last year I shared some of the work I had done. Since my arm is fugazi, I'm going back to sharing that work.

The elephant in the room

I doubt there is anybody left in the country who has any trust at all in the IC. I certainly don't, and that sometimes shows up in my work.

I do think there is still value here for you, the reader. Espionage, like information war or a gun, is a tool to be used when necessary. While the everyday Hiver doesn't have a taxpayer-filled bag of money to support their own agency, there is a lot in this education that is valuable to me (and, I think, to you). Part of that value is that if you can explain where the IC has fallen short of where it should be, your arguments against it, for its reform, or for its disbandment will carry more weight.

And on to the show

The Question

Categorize the challenges facing the OSINT community and analyze some of the implications of these challenges for the future. Make sure you place your discussion within the context of the literature.

…

Before I get started, I’ll share this:
Internet Tools and Resources for Open Source intelligence, http://www.onstrat.com/osint/

OSINT faces many of the same challenges as the other disciplines. However, Williams and Blum (2018) point out that OSINT is still not fully defined and that its place as an intelligence discipline is still subject to debate. Omand et al. (2012) also charge “that social sciences have not developed an approach to robustly sample social media data sets.”

One of the more controversial issues is the security-versus-privacy debate, and OSINT puts a different twist on it. Eijkman and Weggemans (2013) correctly note that the primary mission of intelligence is the security of its charge, but they also raise the privacy issue. In doing so, they seem to miss the distinction between “public” data and “private” data. They attempt to redefine “privacy”, qualifying their argument with “at least when privacy is defined as the ‘freedom from unreasonable constraints on the construction of one’s identity’”.

However, they redeem themselves by recognizing that this problem occurs when “data is disconnected from the context in which they intended it to be”, and we know that good intel is verified against other sources.

A second issue is the sheer amount of data that can be collected with OSINT methods. Lim (2016) discusses a way of mitigating information overload in the context of falsification, suggesting that “Big Data” approaches can serve three functions:

The first involves the inductive collection of data with the aim of discerning general trends and anomalies.
The second and perhaps more important function related to the formulation of intelligence hypotheses, a stage requiring as little inhibition from cognitive bias, and as much imagination and (informed) speculation as possible.
Big Data allow the intelligence analyst to cut through the overwhelming morass of supporting facts in order to adduce those with refutative value – the search for that one black swan also being naturally far more defined than for the thousandth white one

Lim’s last point addresses falsification directly.
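To make the falsification point concrete, here is a minimal Python sketch. The swan hypothesis, the report records, and the helper `find_refutations` are hypothetical illustrations, not Lim's method. It only shows the asymmetry Lim describes: a thousand confirming reports never prove the hypothesis, while one refuting report is enough to reject it, so the search should be aimed at the refutations.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical report record: (report_id, observed_colour)
Report = Tuple[str, str]


def find_refutations(reports: Iterable[Report],
                     hypothesis: Callable[[Report], bool]) -> List[Report]:
    """Return every report that contradicts the hypothesis.

    Confirming reports are deliberately ignored: no number of them
    proves the hypothesis, but a single refuting report disproves it."""
    return [r for r in reports if not hypothesis(r)]


# Hypothesis under test (illustrative): "all swans are white"
def all_swans_white(report: Report) -> bool:
    return report[1] == "white"


# 1,000 white swans and one black one
reports = [(f"r{i}", "white") for i in range(1000)] + [("r1000", "black")]
refuting = find_refutations(reports, all_swans_white)
confirming = len(reports) - len(refuting)

print(f"{confirming} confirming reports, {len(refuting)} refuting")
print("hypothesis rejected" if refuting else "hypothesis survives (for now)")
```

The design choice is the whole point: the function returns only the black swans, because the thousandth white one adds nothing to the test.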

Schneider (2014) addresses another side of the information overload issue: coordination. The sheer number of agencies creates its own problems in this area.

Big Data concepts can also overwhelm the analytic capabilities of some. Anderson (2008) claims that Big Data methodology makes the scientific method obsolete. You have to love it when people who don't understand the scientific method tell you that it doesn't work because there is too much data, or because they cherry-picked enough sources to arrive at a consensus. (The point here is that the scientific method has nothing to do with consensus, but rather with whether a hypothesis tests true or not.)

Eijkman and Weggemans (2013) highlight a third issue with OSINT: “A second obstacle is the multiplication of individual sources.” A contemporary example would be a supposed “Red Flag” gun confiscation confrontation from earlier this week, where multiple “sources” simply echoed the initial unproven claim. Silvey (2019) notes the danger of using social media in this way.
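One way to think about the multiplication of sources is as a graph problem: trace each report back to where it got the claim, then count distinct origins instead of reports. Below is a minimal Python sketch, assuming we already know who repeated whom. The report names and the `cites` mapping are hypothetical and are not drawn from the Red Flag case.

```python
from typing import Dict, Optional

# Hypothetical "repeated from" links: each report maps to the report it
# echoed, or None if it claims first-hand knowledge.
cites: Dict[str, Optional[str]] = {
    "original_post": None,
    "blog_a": "original_post",
    "forum_b": "blog_a",
    "video_c": "original_post",
    "news_d": "forum_b",
}


def origin(report: str, graph: Dict[str, Optional[str]]) -> str:
    """Follow the 'repeated from' links back to the first-hand source."""
    seen = set()
    while graph[report] is not None:
        if report in seen:  # guard against circular reporting
            raise ValueError(f"circular reporting involving {report!r}")
        seen.add(report)
        report = graph[report]
    return report


independent = {origin(r, cites) for r in cites}
print(f"{len(cites)} reports, {len(independent)} independent origin(s): {independent}")
```

Five “sources” collapse to one origin, which is the situation the Red Flag example describes: volume without independence.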

Omand et al (2012), on the other hand, note the benefits of social media OSINT.

Crowd-sourced information
Research and understanding
Near real-time situational awareness
Insight into groups
Identification of criminal intent or criminal elements in the course of an enquiry, both for the prevention and prosecution of crime

Does anyone remember using Twitter to keep track of the “Green Movement” in Iran back in 2009?

So, while OSINT has problems (as does any other collection method), it remains the most used source of raw data because of its capabilities.

References:
Anderson, C. (2008, June 23). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. Retrieved from https://www.wired.com/2008/06/pb-theory/

Eijkman, Q., & Weggemans, D. (2013). Open source intelligence and privacy dilemmas: Is it time to reassess state accountability? Security and Human Rights, 23(4), 285–296. https://doi.org/10.1163/18750230-99900033

Lim, K. (2016). Big Data and Strategic Intelligence. Intelligence and National Security, 31(4), 619–635. https://doi.org/10.1080/02684527.2015.1062321

Omand, D., Bartlett, J., & Miller, C. (2012). Introducing Social Media Intelligence (SOCMINT). Intelligence & National Security, 27(6), 801–823. https://doi.org/10.1080/02684527.2012.716965

Schneider, K. (2014, March 18). Challenges and Capabilities Increase the Role of Open Source Intelligence. Retrieved November 28, 2019, from SIGNAL Magazine website: https://www.afcea.org/content/challenges-and-capabilities-increase-role-open-source-intelligence

Silvey, M. (2019, November 25). Whiskey_Warrior_556 is the Mike Brown of the 2A Community. Retrieved November 28, 2019, from 2A Cops website: https://2acops.com/2019/11/24/whiskey_warrior_556-is-the-mike-brown-of-the-2a-community/

Williams, H., & Blum, I. (2018). Defining Second Generation Open Source Intelligence (OSINT) for the Defense Enterprise. Santa Monica, CA: RAND Corporation. https://doi.org/10.7249/RR1964

Discussion

Too much data can cause information overload, and if all of it comes from the same origin point, more of it doesn't do much good. Having more sources is good, because it allows for cross-checking of information. Having a method of vetting sources is even better; I like the old A-1 model, a matrix that rates data by the likelihood of the information against the source's past reliability. So it's not a case of "more is better" as much as "better is better" ;>
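Here is a small Python sketch of how an A-1 style rating works, assuming the common six-by-six Admiralty/NATO scales (a letter for source reliability, a number for information credibility). The report names and the helpers `grade` and `describe` are hypothetical. The idea is that the two judgements are made independently and then combined, so a reliable source can still pass on doubtful information.

```python
# Source reliability (letter) and information credibility (number), using
# the common six-by-six Admiralty/NATO labels.
RELIABILITY = {
    "A": "completely reliable",
    "B": "usually reliable",
    "C": "fairly reliable",
    "D": "not usually reliable",
    "E": "unreliable",
    "F": "reliability cannot be judged",
}
CREDIBILITY = {
    1: "confirmed by other sources",
    2: "probably true",
    3: "possibly true",
    4: "doubtful",
    5: "improbable",
    6: "truth cannot be judged",
}


def grade(reliability: str, credibility: int) -> str:
    """Combine the two independent judgements into a rating such as 'B2'."""
    if reliability not in RELIABILITY or credibility not in CREDIBILITY:
        raise ValueError("rating outside the A-F / 1-6 scales")
    return f"{reliability}{credibility}"


def describe(rating: str) -> str:
    """Spell out a rating like 'B2' in words."""
    return (f"{rating}: source {RELIABILITY[rating[0]]}, "
            f"information {CREDIBILITY[int(rating[1:])]}")


# Hypothetical incoming reports, each rated on both axes
reports = [("tip from new contact", "F", 3),
           ("liaison service report", "B", 2),
           ("long-standing asset", "A", 1)]

# Sorting on (letter, number) puts A1 first. F and 6 mean "cannot be
# judged", not "worst", so this ordering is only a rough triage.
for name, rel, cred in sorted(reports, key=lambda r: (r[1], r[2])):
    print(f"{name:<24} {describe(grade(rel, cred))}")
```

Keeping the letter and the number separate is what makes "better is better" workable: you can see at a glance whether a weak rating comes from the source, the information, or both.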
You can assume that any data that isn't technically produced (i.e., not IMINT, SIGINT, etc.) is subject to bias. Even technical collection is subject to bias in who selects what to pass on, and it is also vulnerable to counterintelligence (CI) methods. Still, more data does allow biases to be compared.
Another method is triangulation of data. For example, I take the information that possibly biased source “A” gives me about source “C”, and the information that possibly biased source “B” gives me about source “C”, and then compare both against the material that “C” itself presents. Doing so lets me judge the reliability of each source, and it often illuminates the bias that shaped how the data was presented. This was a method used by Panamanian dictator Torrijos.
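The triangulation step can be sketched in Python as a comparison of each secondhand account against C's own material, topic by topic. The topics, claims, and the `compare` helper are hypothetical. Note the assumption built in here: C's self-presentation is treated as the reference point, but it has its own bias, so a divergence shows where A or B and C disagree, not who is right.

```python
from typing import Dict, List, Tuple

# Hypothetical claims keyed by topic. C's own published material is the
# reference point; A and B are possibly biased sources describing C.
c_own = {"leadership": "civilian", "intentions": "negotiate", "strength": "small"}
a_about_c = {"leadership": "military", "intentions": "seize", "strength": "small"}
b_about_c = {"leadership": "civilian", "intentions": "negotiate", "strength": "large"}


def compare(reported: Dict[str, str],
            reference: Dict[str, str]) -> Tuple[float, List[str]]:
    """Return the share of overlapping topics where the source matches the
    reference, plus the topics where it diverges (a hint at its bias)."""
    shared = [t for t in reported if t in reference]
    if not shared:  # no overlapping topics: no basis for comparison
        return 0.0, []
    diverging = [t for t in shared if reported[t] != reference[t]]
    return 1 - len(diverging) / len(shared), diverging


for name, account in (("A", a_about_c), ("B", b_about_c)):
    agreement, diverging = compare(account, c_own)
    print(f"{name}: {agreement:.0%} agreement with C, diverges on {diverging}")
```

The list of diverging topics matters more than the score: A consistently paints C as more aggressive than C presents itself, which is exactly the kind of bias pattern the method is meant to surface.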

…

There may also be a problem with over-reliance on technology. Agrell (2012) argues that technology has been misused as a “universal solution”, and Warner (2012) claims that “technology’s impact on intelligence has been incompletely examined”.

Agrell, W. (2012). The Next 100 Years? Reflections on the Future of Intelligence. Intelligence and National Security, 27(1), 118–132. https://doi.org/10.1080/02684527.2012.621601

Warner, M. (2012). Reflections on Technology and Intelligence Systems. Intelligence and National Security, 27(1), 133–153. https://doi.org/10.1080/02684527.2012.621604

…
SOCMINT does come with its own set of issues. How it is perceived is one of them: Eijkman and Weggemans (2013) note that context is important in gathering SOCMINT.
While I was looking for a study discussing the problems of OSINT, I ran across a paper by Gradecki and Curry (2017), which in turn led me to the following website: The Crowd Sourced Intelligence Agency (CSIA).
It's an interactive example of what agencies are seeing as they take in OSINT data; it "allows the viewer to experience how intelligence agents view social media posts and two machine-learning classifiers for predictive policing. Like OSINT interfaces used by intelligence agencies and government contractors, the CSIA recontextualizes social media posts by removing them from their original context and reframing them as a potential threat to national security."
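The recontextualization problem can be illustrated with a deliberately naive Python sketch: a keyword flagger that scores each post with no knowledge of who wrote it, to whom, or why. The watch-list terms, the posts, and the `flag` helper are made up for illustration; this is not either of the CSIA's classifiers, just the simplest case of scoring text stripped from its context.

```python
import re
from typing import Set

# Hypothetical watch-list terms. Real systems are far more elaborate, but
# the blind spot shown here is the same: the text is scored on its own.
WATCH_TERMS = {"shoot", "attack", "bomb", "target"}


def flag(post: str) -> Set[str]:
    """Return the watch-list terms found in a post, ignoring all context."""
    words = set(re.findall(r"[a-z']+", post.lower()))
    return words & WATCH_TERMS


posts = [
    "Heading out to shoot the sunrise over the lake",
    "That comedian will bomb if he tells that joke again",
    "Our sales target for Q3 is an attack on the midsize market",
    "Lunch at noon?",
]

for post in posts:
    hits = flag(post)
    print(f"{'FLAGGED' if hits else 'ok':<8} {sorted(hits)}  {post}")
```

Three harmless posts get flagged because the words survive while the context that explains them does not, which is the point Eijkman and Weggemans make about data disconnected from its intended context.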

CSIA. (n.d.). Retrieved October 26, 2019, from http://www.crowdsourcedintel.org/about

Gradecki, J., & Curry, D. (2017). Crowd-Sourced Intelligence Agency: Prototyping counterveillance. Big Data & Society, 4(1), 205395171769325. https://doi.org/10.1177/2053951717693259
