Hashing Algorithms and Security – Computerphile


Let’s say you want to transfer a file from one computer to another, and it is really
important to know that it’s got there intact, in one piece. You could send it multiple
times and then compare them all, but what generally gets used is something called a hash
algorithm.

A hash algorithm is kind of like the check digit in a bar code or on a credit card. I think
James Grime talked about this a long, long time ago on Numberphile. The last digit in a bar
code or on a credit card is determined by all the other digits on it, and if you change one
of those digits, the last one changes as well. So as you type it into a computer, you can
know instantly if you’ve missed a key somewhere. So a hash algorithm is kind of like that.
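
As an illustration of the check-digit idea just described, here is a minimal sketch of the Luhn scheme, the check commonly used on payment card numbers. The video does not name a specific scheme, so choosing Luhn is an assumption, and the function name and sample digits are made up for this example.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk the payload right to left and double every second digit,
    # starting with the rightmost one (the digit next to the check digit).
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    # The check digit tops the total up to a multiple of 10.
    return (10 - total % 10) % 10


print(luhn_check_digit("7992739871"))  # 3
print(luhn_check_digit("7992739872"))  # 1 (one digit changed, so the check digit changes)
```

A single mistyped digit always produces a different check digit under this scheme, which is the property the hash algorithm scales up to whole files.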
But for an entire file that might be megabytes or gigabytes in size, what it gives you is a
code of 16 or 32 or 64 characters, generally hexadecimal, basically just one long number
expressed in that way, that is a “sum up” of everything that’s in that file. You do all
these manipulations to it and crush it down, crush it down and crush it down, and what comes
out is this thing that says: this is a summary of that file. You can never make it work
backwards; you can’t pull that data back out. But it’s like a signature, it’s like a
confirmation that this file is really what it says it is.

The simplest hash algorithm I can think of would just be something like: add up all the
digits in the file, which is 4, 9, 14, 23. That’s not a good hash algorithm, for a few
reasons.

Hash algorithms have three main requirements. The first one is speed. It’s got to be
reasonably fast: it should be able to churn through a big file in a second or two at most.
But it also shouldn’t be too quick; if it’s too quick, it’s easy to break, and I’ll explain
that later.

The second requirement is that if you change one byte, one bit, anywhere in the file, at the
start, in the middle, at the end, then the whole hash should be completely different. This
is something called the avalanche effect. If you’re interested in how this is achieved, do
look up the actual algorithms themselves. It would take me an hour to explain even vaguely
how they work in a friendly way, but if it’s your kind of thing, do look it up. Suffice it
to say: one bit gets flipped anywhere in the message, then the whole hash is completely and
utterly different.

The third requirement is that you’ve got to be able to avoid what are called hash
collisions. This is where you have two documents
which have the same hash. Obviously, there is a mathematical principle called the pigeonhole
principle: if you have 50 pigeons and 25 pigeonholes, you have to stuff two pigeons into one
of the pigeonholes. That’s a terrible analogy when you say it like this, but if I could
explain it: there are incredible numbers of possible documents out there, while the hash,
meanwhile, is just one fairly long number, so there will be files out there which naturally
have the same hash. And that’s okay, because it is so unlikely that we can deal with that;
it’s never going to happen naturally.

But if you can artificially create a hash collision, if you can, say, create a file and
change the name, then we have a problem, and that’s where security comes into these. Because
if I can make a file that sums to a certain hash, then I can fake documents; I can send
different things and have this signature match.

So let’s say I have an important document, something that’s, I don’t know, the “permission
to go to the moon”. I don’t know why I said that. Oh yeah, “permission to go to the moon”,
let’s say that, and it’s got someone’s name on it. That file is sent, and along with it,
through other channels, comes this hash to verify that this is actually the document. Now
let’s say I can intercept that file and I can change it. Because the hash algorithm is
broken, I can change the name and change the data and change whatever; I can send someone
else to the moon, because I can make this hash the same through carefully tweaking the
bytes. Now, it’s incredibly difficult to do that.
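
To make the collision idea concrete, here is a rough sketch under stated assumptions. The digits 4, 5, 5, 9 are inferred from the running totals 4, 9, 14, 23 mentioned earlier and may not be what was on screen. `tiny_hash` is a hypothetical, deliberately weakened hash (SHA-256 cut down to two bytes) so that a collision turns up in a fraction of a second; it is not how real attacks on MD5 work, and all names here are invented for illustration.

```python
import hashlib
from itertools import count


def digit_sum_hash(digits: str) -> int:
    """The 'add up all the digits' hash: trivially easy to collide."""
    return sum(int(ch) for ch in digits)


def tiny_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """SHA-256 truncated to n_bytes, a weakened stand-in for a real hash."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2):
    """Hash successive messages until two different ones share a tiny_hash.

    With only 2**(8 * n_bytes) possible outputs, the pigeonhole principle
    guarantees a repeat once there are more messages than outputs.
    """
    seen = {}
    for i in count():
        msg = f"permission to go to the moon #{i}".encode()
        h = tiny_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg


# Reordering the digits leaves the sum unchanged, so both "files" match.
print(digit_sum_hash("4559"), digit_sum_hash("9554"))  # 23 23

a, b = find_collision()
print(a, b, tiny_hash(a).hex())
```

Each extra byte of output multiplies the number of pigeonholes by 256, which is why full-length modern hashes are so much harder to collide than this toy.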
In practice you’d want a massive file and a lot of computer code. But there are old hash
algorithms like MD5, which was used for many, many years, which now have these collisions
out in the wild and are considered broken, because you can get a file, not a document with
text in but computer code, anything like that, where it’s possible to send something
malicious and have it come out with the same hash.

So this is important; this is where speed comes in. If the hash is too slow, no one will
want to use it. But if the hash is too fast, if you can create new ones in a few processor
cycles, then you can fairly easily create documents that match a particular hash. It is in a
very real sense an arms race. As I said, for many years MD5 was the accepted algorithm, and
it’s still used for a few things, but MD5 is now thoroughly broken, because computers are
fast enough and there are a few sort-of interesting tricks you can use to try and create
hash collisions deliberately.

The other problem with MD5 is that, because it was used so much and it was used everywhere
on the web, Google has become an exceptionally good resource for breaking it. You wouldn’t
want to store a password this way (I’ll talk about that in a later video; don’t use
something like this for storing passwords), but people did; for many years people did. And
in a lot of cases a word will be stored next to its MD5 hash for some reason. If you type an
MD5 hash into Google, frequently the word it was hashing comes out, which means that for
pretty much every word in the English language, and a lot of other passwords besides, the
MD5 can be solved by typing it into Google.

So MD5 is comprehensively, constantly broken. So everyone moved to something called SHA-1,
and now there are rumors that that might start to be broken soon, if it hasn’t already,
because computers keep getting faster and hash collisions are easier to generate. So
everyone is moving to SHA-2, which for the time being is secure. SHA-3 is going through the
process of being ratified by all the agencies now, and in a few years that’ll be the
standard.

I mean, ultimately, I should really emphasize this: **Don’t use this for storing
passwords**. I’ll talk about that in a later video. These are used for verifying files, for
verifying transmission, and that’s all they should be used for.

There is one last thing, which is that
occasionally you will see download sites offering software who say: here’s the file we’re
going to send you, click here to download it, and if you want to be safe, here’s the hash of
the file so you can be sure it’s the right one. That’s a terrible idea. I mean, it will
verify you’ve got the download intact, but they’re selling this as “we guarantee that this
software is safe, and you can check it against that hash”, which is a bad idea, because if
someone has been able to get into their website and change the software they’re sending,
it’s pretty trivial to change that hash as well.

So that is hash algorithms: that is taking a big chunk of data and turning it into a small
amount to verify it. And in a later video I will talk about how that’s used, and how that
shouldn’t be used, for actually keeping things secure.

This episode of Computerphile was brought to you by audible.com, and you can go to
audible.com/computerphile and download a free book. They’ve got a huge range that you can
listen to on all kinds of devices: your phone, or in the car, things like that. I was
thinking about a book to recommend, and it made me think about the first audiobook I ever
listened to, and that was Treasure Island. I listened to it on a cassette next to my bed as
I was going to sleep each night. I checked the Audible website; they do have Treasure
Island, so that’s my recommendation today. Why don’t you check it out?
audible.com/computerphile, free book, and thanks to them for supporting our videos.
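
The file-verification use described in the video (checking that a download arrived intact) can be sketched roughly as follows. This is a minimal example, assuming SHA-256 from Python's standard `hashlib` as the algorithm; the function names and the chunk size are illustrative choices, not anything specified in the video.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare a file's hash with a published hex digest, ignoring case."""
    return file_sha256(path) == published_hex.strip().lower()
```

A match only shows that the bytes received are the bytes that were hashed. As the video points out, if the hash is published on the same site as the file, someone who can replace the file can replace the hash too.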

81 comments on “Hashing Algorithms and Security – Computerphile”

  1. Green Brain Seaside says:

    How about lastpass, how secure is their method of storage & managing passwords

  2. Djane Rey Mabelin says:

    do a video about rainbow tables

  3. Surfurplex says:

    Sometimes websites deny a password reset since the new password is "too similar" to the old one. How do they know this if all they have is a hash?

  4. Joseph Joestar says:

    Numberphile: BOARING
    Computerphile: OMG DIS IZ DA BEST THING EVA

  5. Liam Coleman says:

    would a root hash be too slow

  6. Przemek Kołowski says:

    I thought hashes for files on websites (like Microsoft Windows ISO images) are used for you to verify that your download did not corrupt the file.

  7. 123sendodo says:

    Just watched "Youtube doesn't know your password" on Tom's Channel… Now it's the same guy talking about similar stuff on another channel… I'm confused.

  8. Kim says:

    If I make a hash algorithm in PHP or JS, how do I hide that algorithm securely from users? I could make a kind of secure hash algorithm, but that is useless if everyone can just read the instructions

  9. Seth Mitchell says:

    Can you just use multiple quick-cycle hashes, or is that just a really stupid, poorly thought out idea some runon-sentence-using, highly-allergic teenager types out on a poorly-constructed desktop computer in their bedroom at an hour far beyond his or her bedtime while under the influence of one of many mind-altering substances that exist in the world today?

  10. andu alem says:

    wow good explanation but i have this Q one of my boy ask me the 
    *. How i can Write a program that integers 1 to 20 to a binary search tree. Assume the root node is created with       value 10.
    **  Assume the data structure:
    StructNode{
    Int value;
    Node*next;
            };
          Node *head=NULL;
     Assume also that there is a value 10 in the linked list.Write a code that deletes a node with this value.Consider all the following cases:
    a. The node is at the head
    b. The node is at the middle
    c. The node is at the end

  11. Ambrus Sümegi says:

    Writing hashes next to download buttons has never been intended to ensure that the software isn't maliciously altered. It's for people with crappy connections who want to make sure everything got through as intended.

  12. Eric Taylor says:

    4:10 Could this be used, say, not for changing the name of the next lunar astronaut, which MAY get you on the fast track but probably won't (after all, they are bound to notice you are grossly unqualified for such a mission), but instead to manipulate troop movement orders in Pakistan? If I could get six or seven armor battalions to suddenly be ordered to the India-Pakistan border, well, that's bound to get India to respond, which could begin a chain of events that ends in nuclear war.
    Even if it is discovered that it was fake orders that started it, it might go out of control before it could be stopped.

  13. Rakesh P Gopal says:

    The software or file download that has the hash along with it is actually secure. Provided they sign the hash. That is they run RSA on the hash using the Private key of the company. So, nobody can change the hash. If they should change the hash, they need the private key of the company.

  14. xenontesla122 says:

    3:41 That's an… interesting rocket.

  15. urbex2007 says:

    How do you verify the hash of a file on Windows?  It's not very easy is it?
    GCHQ in the UK routinely intercept people downloading files and send ones that have been tampered with.  They did this in 2013 to people using the Tor Project site.  When people requested the Tor Browser Bundle they sent their own modified version hoping to monitor people using that network.  It was only ever picked up by McAfee as it did something to trigger it.  They do it on other sites like BoingBoing and target people using LiveLeak.  Nothing is safe any more now we are all spied on!

  16. Skrapion says:

    The hash for file downloads is usually used by open source projects, where the executable may be mirrored by countless universities which the software author doesn't have control over. In such a case, it certainly is not trivial to compromise both locations.

  17. Yoshis Vids says:

    Damn Tom, I'm amazed from your knowledge in every video of yours I watch here and on your personal channel, would love if you could recommend some good books/ resources other then this and your personal channel.

  18. Justin Garofolo says:

    if my md5 key is like a randomly generated string of 3000 characters and numbers, will that highly decrease the chance of something else hash collision it?

  19. Jacob H says:

    Since when is the moon shaped like a banana?

  20. redesigned says:

    When using hashes for file or packet verification, wouldn't using multiple hash types on the same file/packet and comparing all the hash types applied provide much greater reliability? The chances of multiple hash types having overlapping collisions is infinitesimally small with just 2 hash types let alone more.

    Thanks for the great videos!

  21. Brak says:

    uh ssh1 has been broken wait 2013 oh lol.

  22. Jaime Dantas says:

    Excellent video!

  23. Kahr Kunne says:

    Giving the hash for a file is not intended to look "safe", at least I've never seen a site like that. Mostly when it's used it's to verify that your file didn't corrupt while downloading, which could be problematic if it's, say, a bootable disk file.

  24. jony4real says:

    7:55 Like if you remember the time when everyone used cassette tapes!

  25. Skippy the Magnificent says:

    3:51 I like how the Moon is banana-shaped…

  26. Mars says:

    Very clear, no bullshit introduction to skip. Right to the core. Thanks a lot.

  27. Timur Sultanov says:

    Well I always thought that hash was there on those download sites for protection against network glitches rather than hacker attacks…

  28. chase like the bank says:

    "if you have 50 pigeons into 25 pigeon holes, you have to stuff 2 of the pigeons into 1 of the holes"

  29. Harish Bisht says:

    Permission to go to moon :))))))

  30. Michael Bellerue says:

    7:06 Heeeey look at that! Someone gets it! And 3 years before the Linux Mint fiasco. Well done.

  31. invalidusername says:

    It's still fine to use md5 for hashing passwords as long as you salt them

  32. Hacking says:

    if hacker stole hashing algorithm from server what will happen
    do he is capable to get passwords

  33. H32 says:

    THANKS SO SO MUCH!!!!!!!!!!!!!!

  34. Divya Kk says:

    This was a very nice video..My first comment on any video on youtube !

  35. tan8_197 says:

    The moon shaped like a banana

  36. Dilip Tien says:

    3:51 how is the rocket appearing behind the moon? The rest of the moon is still there

  37. Christopher Butler says:

    Just realized the opening title card says "<computerphile>" and the end title says "</computerphile>"….

  38. Ryan0911 says:

    1:34 Isn't that the first 6 digits of pi?

  39. May says:

    the moon shaped like a banana 3:45

  40. Hany Heggy says:

    it is very useful

  41. bbsonjohn says:

    SHA1 is broken – February 23, 2017

  42. Yasser Alshalaan says:

    SHA1 officially broken by Google today lol

  43. Ashton Pinch says:

    RIP SHA1. 1995-2017

  44. Akshay Aradhya says:

    What does a fingerprint have to do with hashing ?

  45. Michael Murphy says:

    I thought the verification hash offered by those websites was just to check that you got a complete successful download.

  46. Sakata Samig says:

    Wish computerphile was my computing teacher.

  47. Gummans Gubbe says:

    And as people is getting poorer and poorer and governments are getting richer we will have this already?

  48. Gaurav Raj Ghimire says:

    4 years after this video was made… sha1 has been broken

  49. Gradyn Wursten says:

    SHA1 has been broken, sha256 is the standard

  50. Jiany Star Massa vich says:

    Awesome vid

  51. Thành Bùi says:

    Does anyone know What important requirement must a hash function fulfill?

  52. Matthew N says:

    I reckon they should make a video explaining the difference between checksums and hashes.

  53. Mehdi Bounya says:

    6:46 I think the hash is not used to verify that the file wasn't manipulated, but just to verify that the file is not damaged.

  54. Fun Monkey says:

    Almost there guys! Almost got that golden play button! 😀

  55. Uniform Health says:

    how vulnerable is bitcoin vs lite coin to hash collisions

  56. David says:

    I know this is super old but I always thought it was funny that Kali offered the hash for the exact same reason that you mentioned.

  57. Ajai .A says:

    Thank you!

  58. Juan Contreras says:

    hash codes on websites offering a download are also used to make sure the download went well and nothing got corrupted (or involuntarily changed by a machine error or noise)

  59. Robin Östringer says:

    3:41 actual moonlanding footage of 1969 (colorized)

  60. Nice Trade says:

    interesting now we have Blockchain :^)

  61. xev790 says:

    all your hashes are belong to us

  62. Its_me_Bonniee says:

    Awesome!! Very clear, thanks! 🙂

  63. Big Nasty says:

    James talked about it on Numberphile video number 1 on 11.11.11.

  64. Mike Suarez says:

    Permission to go to a banana

  65. killwize says:

    Google broke SHA-1 and told everyone on Feb 23, 2017.

  66. Bogomil Gospodinov says:

    nobody uses md5 without a salt to store password in a db, so a lot of that is exaggerated

  67. Riley Griffin says:

    Theres someone outside your window at 2:08 o.o

  68. Jen Wilson says:

    Thanks so much for this video! Really enjoyed it and made me better at my job.

  69. Zachary Perkins says:

    But what if you you 2 hashes? so i send a file, and it generates a 2 hashes using 2 different algorithms? Surely that lowers the chances of hash collisions astronomically.

  70. RadekG G says:

    1:58 – it should be different? it's just nice to have, hence the pigeon stuff. "Should" is not a word you can use in a definition.

  71. Sparrow says:

    4:20 I'm still using MD to this day.

    oh wait.

  72. Calder Johnson says:

    Hello I am from future, yes sha1 is broken.

  73. Tan Vorn says:

    So is it possible for two or more different document to generate the same hash?

  74. Alix Goldpoint says:

    thanks for the md5 update android developement uses it a lot

  75. jalaj61 says:

    very nicely explained …thanks alot

  76. R M says:

    let that one slip in @ 4:31. Thanks for the great explanation.

  77. Adam Miller says:

    So you settled on round fins for your rocket…

  78. Paul Fragemann says:

    Sha512 ist the Standard today

  79. Albert Renshaw says:

    Do a video on Lamport Signatures and quantum secure encryption
