walls.corpus

By Nathan L. Walls

Articles tagged “civics”

wget of mass destruction

David E. Sanger and Eric Schmitt, reporting for the New York Times, have published an article titled “Snowden Used Low-Cost Tool to Best N.S.A.”. I know they’re reporting for a general audience, but I believe the article does a disservice by allowing anonymous national security “officials” to put simple automation into scare quotes:

Using “web crawler” software designed to search, index and back up a website, Mr. Snowden “scraped data out of our systems” while he went about his day job, according to a senior intelligence official. “We do not believe this was an individual sitting at a machine and downloading this much material in sequence,” the official said. The process, he added, was “quite automated.”

The findings are striking because the N.S.A.’s mission includes protecting the nation’s most sensitive military and intelligence computer systems from cyberattacks, especially the sophisticated attacks that emanate from Russia and China. Mr. Snowden’s “insider attack,” by contrast, was hardly sophisticated and should have been easily detected, investigators found.

Automation gonna automate, I suppose. We’ve seen this dance before with Aaron Swartz, Chelsea Manning and now Edward Snowden: the national security-industrial complex takes a disingenuously naïve view of automation tools, particularly around Swartz at MIT and Snowden here, suggesting there was a mix of luck and quite possibly something nefarious behind all this automation. The New York Times should approach statements made by agency officials skeptically. This sort of programming is not hard. Moreover, no one has to work particularly hard to hide it. In fact, what might look to some like “hiding” is simply polite engineering under a different lens.

One key is the not-at-all-advanced concept of throttling. Well-behaved web crawlers (also known as spiders) are respectful about how many requests they issue in a given amount of time. A flood of requests all at once will attract exactly the sort of attention that the unnamed officials seem so reluctant to acknowledge Snowden barely drew to himself.

First, lots of requests in a short amount of time show up in log files as exactly that and quickly become a pattern. Patterns attract attention. Assuming the NSA and its various contractors audit access logs (which itself is something I’d automate), spreading requests out over time makes them less likely to arouse suspicion. Moreover, unless an audit is looking for a particular type of activity, that manual or automated audit will not care a whit about well-throttled crawler traffic, because it looks a lot like expected traffic. It’s “hiding” to the same degree someone of average height and dress is “hiding” as they walk down a Manhattan sidewalk.
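
To make the throttling concrete, here’s a minimal sketch using stock wget. The flags are standard wget options; the URL is a placeholder, not a claim about what Snowden actually ran.

    # Pause between requests and cap bandwidth so the traffic pattern looks
    # closer to a person clicking links than a script hammering a server.
    #   --wait/--random-wait  space requests out over time
    #   --limit-rate          keep network usage inside a normal band
    wget --recursive --level=5 --no-parent \
         --wait=10 --random-wait \
         --limit-rate=200k \
         https://intranet.example/wiki/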

Second, setting aside access logs, system activity monitors seem more likely to catch a misbehaving web crawler. System activity monitors look at how much work a machine is doing at a given time. Typical checks look at how busy the CPU is, how much RAM is in use, overall network activity, what processes (“programs”) are running and so on. Some servers have automated checks in place; some don’t. For the sake of discussion, I’ll assert the servers hosting the content Snowden accessed were monitored in such a fashion. Now, assume each server’s activity varies but stays within an average band. Unless what Snowden was doing with his web crawler pushed one of these checks out of bounds, it was unlikely to attract attention. Normal activity gets ignored.
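
For illustration, an automated check of that sort often boils down to comparing a reading against a threshold. A rough sketch on a Linux box, with the threshold and alert address as placeholders:

    # Read the one-minute load average and complain if it is out of bounds.
    load=$(cut -d ' ' -f 1 /proc/loadavg)
    if awk -v l="$load" 'BEGIN { exit !(l > 8.0) }'; then
      echo "1-minute load average is $load" | mail -s "load out of bounds" ops@example.com
    fi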

On to the alleged crawling software itself.

In interviews, officials declined to say which web crawler Mr. Snowden had used, or whether he had written some of the software himself. Officials said it functioned like Googlebot, a widely used web crawler that Google developed to find and index new pages on the web. What officials cannot explain is why the presence of such software in a highly classified system was not an obvious tip-off to unauthorized activity.

First, Snowden’s job was as a systems administrator. Systems administration and development jobs involve access to decidedly non-top-secret technologies like *NIX servers, which typically ship with a wide array of scripting languages (Perl and Python most likely, Ruby very possibly). Or perhaps Snowden is a shell scripter. Bash will get the job done.

As software goes, a basic web crawler is not exceptionally hard. I assert that if it’s written with tools likely already resident on any average server or *NIX-based laptop (e.g. Mac OS X or Linux, or possibly Windows with PowerShell), there’s really nothing about one that would raise any particular suspicion. Effectively, the raw pieces of the web crawler were quite likely already present. Writing a text file to marshal those raw pieces together is unlikely to raise suspicion, because a systems administrator or software developer already has scores of similar files lying around. There’s no magic “web crawler” bit that flips and alerts anyone.
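
As a rough sketch of just how few raw pieces are involved, and emphatically not a reconstruction of anyone’s actual tooling, here is the shape of a crawler built entirely from things that ship on a typical *NIX machine. The starting URL is a placeholder, and for simplicity it only follows absolute links.

    #!/usr/bin/env bash
    # Fetch a page, save it, pull out its links, queue the unseen ones, repeat.
    mkdir -p saved
    declare -A seen
    queue=("https://intranet.example/wiki/Start")
    while ((${#queue[@]})); do
      url=${queue[0]}
      queue=("${queue[@]:1}")
      [[ -n ${seen[$url]} ]] && continue
      seen[$url]=1
      page=$(curl -s "$url")
      printf '%s\n' "$page" > "saved/$(printf '%s' "$url" | tr '/:' '__')"
      # Queue any absolute links we have not seen yet; a real script would
      # also resolve relative links against $url.
      while read -r link; do
        [[ -z ${seen[$link]} ]] && queue+=("$link")
      done < <(printf '%s\n' "$page" | grep -oE 'href="https?://[^"]+"' | sed 's/^href="//; s/"$//')
      sleep $((RANDOM % 10 + 5))   # throttle: pause 5-14 seconds between fetches
    done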

As a thought experiment, what happens if every machine is audited and new and modified files are flagged, logged and sent off somewhere for analysis? Probably nothing. In a large working group, a lot of these files are going to look very similar to one another and carry innocuous or cryptic names, and it would be a nigh-impossible task to write software that could meaningfully determine what each new file is for and, if it’s a program, what it does. Surely, no one is going to look at every one of these files by hand. It’d be soul-sucking work.
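
The audit side of that thought experiment is itself a one-liner; the paths, file extensions and one-day window below are arbitrary choices for illustration:

    # Flag scripts touched in the last day under home directories. On a busy
    # development box this produces a long list of innocuous, similar-looking
    # files, which is exactly the auditor's problem.
    find /home -type f -mtime -1 \( -name '*.sh' -o -name '*.py' -o -name '*.pl' \) 2>/dev/null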

Put another way: hammers, screwdrivers, wrenches, pliers, saws and knives aren’t noteworthy tools in a toolbox. A new hammer on a construction site is unlikely to attract any attention. Similarly, just as carpenters use jigs, painters use scaffolding and auto mechanics use impact wrenches, ramps and hydraulic lifts to make their jobs easier, faster, more consistent and less tedious, systems engineers and developers use scripts. Now, imagine a construction site or factory inspecting everyone’s tool bag and workspace constantly for anything “inappropriate”. It wouldn’t be terribly effective, and it’d be a huge burden and expense on the actual work. Imagine your average TSA security line at the office park.

There’s also some question about the web crawler having Snowden’s credentials:

When inserted with Mr. Snowden’s passwords, the web crawler became especially powerful. Investigators determined he probably had also made use of the passwords of some colleagues or supervisors.

But he was also aided by a culture within the N.S.A., officials say, that “compartmented” relatively little information. As a result, a 29-year-old computer engineer, working from a World War II-era tunnel in Oahu and then from downtown Honolulu, had access to unencrypted files that dealt with information as varied as the bulk collection of domestic phone numbers and the intercepted communications of Chancellor Angela Merkel of Germany and dozens of other leaders.

Officials say web crawlers are almost never used on the N.S.A.’s internal systems, making it all the more inexplicable that the one used by Mr. Snowden did not set off alarms as it copied intelligence and military documents stored in the N.S.A.’s systems and linked through the agency’s internal equivalent of Wikipedia.

As noted above, there’s nothing particularly special about a web crawler versus any other manner of script. It’s easy to hand utilities like wget and curl authentication parameters and have them keep login cookies. It’s also easy for such a web crawler to announce itself to the server it requests information from in any way it likes. There’s a convention of sending an identification string, as Google and Yahoo do for their web crawlers, but it’s just as easy to have a crawler identify itself as Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko, which is to say Internet Explorer 11. Add in the polite engineering of not requesting every page the crawler sees the instant it finishes the preceding one, and it becomes far less obvious that the traffic hitting a web server is coming from a script instead of a human clicking links. There’s not necessarily anything nefarious going on.
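
Both pieces are ordinary curl usage. A sketch, with the endpoint, form fields, cookie file and user-agent string all placeholders:

    # Log in once and keep the session cookie.
    curl -s -c cookies.txt \
         -d 'username=someuser&password=notreal' \
         https://intranet.example/login

    # Later requests reuse the cookie and announce themselves as a desktop
    # browser rather than a crawler.
    curl -s -b cookies.txt \
         -A 'Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko' \
         https://intranet.example/wiki/SomePage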

If Snowden had access to all of these systems, and accessing what sounds equivalent to a corporate intranet was not going to arouse suspicion, there’s little I can think of about this conceptual web crawler that would tip the balance toward getting caught. If the NSA wasn’t going to catch Snowden doing all of the work himself, it’s no more likely they were going to catch an automated process he wrote.

I don’t find any part of this story surprising from a technical standpoint. What I do find somewhat distressing is that unnamed officials think this is special or confers villainous status on Snowden. It doesn’t, just as it should not have with Aaron Swartz. Said officials should actually know better, and if they don’t, they need to find technical advisors who will correctly inform them.

I bring this all up because I would like reporters on stories such as this to find an average systems administrator, security analyst or software engineer to provide perspective. The New York Times has an excellent digital staff with developers who could easily demonstrate what a similar script would look like and how it would work. Surely, a news organization that builds great interactive stories and is growing more comfortable in its own clothes online can exercise some agency, and draw on some of the experience behind that comfort, to call officials on bad, self-serving analysis like this.

Open government in Raleigh and BarCampRDU

Above: Cristóbal Palmer, Justis Peters and Dan Sterling open BarCampRDU 2013

May 18, 2013 marked the return of BarCampRDU after an 18-month absence. There were five session slots followed by lightning talks. All the sessions I attended were in some way interesting (see below for a list), but I want to start by focusing on one by Jason Hibbets, “Open Source All the Cities!”.

Overall, I was impressed at Hibbets’ passion for civic involvement. I was also pleasantly surprised by the number of Open Data/Open Government initiatives already in-progress in Raleigh. It’s something I’ve had a desire to see and participate in, but did not know was thriving. In particular, I’ll be making use of SeeClickFix to request some traffic calming in my neighborhood. I also want to have a look through Raleigh’s open data portal to see what I can pull out and futz with.

For as much as government is complained about, a lot of the time it’s just “there” for a lot of folks. I don’t know if there was ever a golden age of civic involvement, but I hypothesize that we have better neighborhoods and cities if we learn more about how to effect change in our neighborhoods and follow up on it. What I mean by that is making sure problems with infrastructure are reported, instead of assuming someone in Public Works knows. It means asking for sidewalks, crosswalks and traffic calming to help make neighborhoods more walkable. It means reporting problem properties, starting a neighborhood watch, having a block party, cleaning a neighborhood creek or just clearing the storm drains down the block.

These aren’t partisan issues. Think of civics as a sense of stewardship for what’s around you. I am a steward. My neighbors are stewards, if they choose to be. City government is a steward, too, and one that I can influence. You can, too.

My raw notes for Jason’s talk:

  • Explaining open source to a non-technical/unfamiliar audience
    • Open source is like a recipe
      • Ingredients
      • Process
  • Principles
    • Transparency
      • Code
      • Roadmaps
      • Bug reports
      • Creates accountability
    • Collaboration
      • Hierarchy vs. peer-to-peer
      • Foster innovative ideas
    • Rapid prototyping
      • Release early, release often
      • Fail faster
    • Meritocracy
      • Best ideas rise to the top
        • Best code tends to win
    • Passion
      • Projects exist because people are trying to scratch their own itch
  • Open source as a philosophy past software engineering
    • opensource.com is where discussion happens about things that can have open source principles applied to other areas
  • Civics beyond government
    • Opening up a channel back to government
    • Creating a community
  • Elements of an open source city
    • Culture/participation
    • Open government and data policies
  • Catalyst: CityCamp Raleigh
    • 200 people each of the last two years
    • Triangle Wiki
      • Had a Triangle Wiki Day
        • 50 people showed up including city council members, mayor
    • RGreenway team built the RGreenway mobile app
      • iPhone
      • Android
    • Goal is to do something after the Camp
    • CityShape – Mayor’s Challenge
      • “Where should we put more density?”
    • SeeClickFix
      • Bug tracking for city infrastructure
        • Potholes
        • Tree branches
        • Graffiti
      • During Hurricane Sandy, people used the tool to organize + source help
      • Bonner Gaylord
        • Open Gov advocate and city councilman
        • Piloted in his district and city council ended up adopting
      • Goal is to break down political boundary barriers (e.g. Raleigh and Cary)
      • You can set up watch areas to see when other people submit items in your area
      • City council, mayor and local media all get pinged when an item gets submitted
    • Open government resolution
      • City of Raleigh will consider open source software
      • Established an open data portal
        • City put $50,000 toward that initiative and hired an open data manager
      • Open Raleigh website
      • Data Portal
  • Code for America
    • Peace Corps for Geeks
    • Built “Adopt a Fire Hydrant” for Boston
    • Other “Adopta” solutions
      • Sidewalks in Chicago
      • Bus stops in Raleigh
      • Storm drains in Seattle
      • Tsunami Sirens in Honolulu
    • Brigade program
      • Code for Raleigh
        • Active
        • Participated in “Race for Reuse”
          • We already have applications that are good, how can we increase their adoption?
      • Cary just adopted a brigade
      • Durham is close to having one
      • Proposed: Triangle Code for America Division
        • Get multiple brigades together to share expertise
  • He wrote a book, The Foundation for an Open Source City
    • Used IndieGoGo to fund
  • Get involved!

Other sessions

The other sessions I attended were:

  • Organizing Tech. Conferences
    • BarCampRDU 2013 organizer Jeremy Davis led a retrospective on the organization of BarCampRDU 2013 and the elements that make a technical conference successful
    • Worth a post on its own
  • Git Internals
    • Jimmy Thrasher did a nice whiteboard session of how Git does what it does
  • Getting Kanban-odoros Done!
    • Chris Imershein led a high-level session about
      • Getting Things Done
      • Personal Kanban
      • The Pomodoro Technique
  • Pebble API
  • Lightning Talks
    • There were several, but two in particular I liked.
      • Justis Peters talking about several Coursera courses he has taken or is taking around machine learning and neuroscience
      • A talk on the value of full-spectrum lighting (if someone can tell me the name of the woman who presented that, I would love to update the post accordingly)

Event photos

I took a few. Check out the set I put together over on Flickr.

Thank you

The BarCampRDU 2013 organizing group and volunteers did a fantastic job with the event. Thank you, all, for your efforts in rebooting the event. Having BarCampRDU back is great for the local community. Thank you, too, sponsors.

I’m looking forward to BarCampRDU 2014.

Triangle voting guides

With the November election just a couple of weeks away, it’s past time for me to get ready for early voting. In my last post, I pointed out early voting and registration info. Hopefully you’re already registered. If not, you can register and vote at one-stop voting in North Carolina until Nov. 3.

What about selecting candidates? On my ballot, I have 32 offices and one bond referendum to vote on. Twenty-one of those offices are contested, and of those, only three have a third-party (Libertarian) or unaffiliated candidate. Several judicial races are uncontested. The race for North Carolina attorney general is also uncontested. That’s a pretty sad state of affairs, partly related to North Carolina’s ballot access policies.

Outside of the presidential races, I have a fair amount of research to do. I use a mix of candidate websites and voting guides.

Checking around today, here’s what I found that covers North Carolina broadly, the Triangle or Raleigh:

My own approach is going to be breaking up my sample ballot into chunks and researching three or four races a day, instead of trying to get through everything at once.

Your vote

On November 6, 2012, you need to vote. Not just for president, either.

In North Carolina, there are races for governor, lieutenant governor and other Council of State offices, the state legislature, Congress, judgeships, county commissioner and more. The down-ballot races get far less attention, but they’re where your vote is more likely to matter. Please take the time to get ready.

First, if you’re in North Carolina and you have not already registered, you can do so until Oct. 12. Starting Oct. 18, you can vote early. If you haven’t already registered, you can register and vote the same day. See the NC Board of Elections site for more. In Wake County, visit the county Board of Elections early voting site for more info.

Second, know how you’re going to vote before you go. Research. In the Triangle, the N&O has a voting guide prepared by the NC Center for Voter Education.

Third, know when and where you’re going to vote, whether on Nov. 6 or through early voting.

No excuses: check your schedule, make sure you can vote, then make sure you’re prepared to vote. It matters.
